TY - JOUR
T1 - Improving the Utility of Tobacco-Related Problem List Entries Using Natural Language Processing
AU - Harris, Daniel R.
AU - Henderson, Darren W.
AU - Corbeau, Alexandria
N1 - Publisher Copyright: ©2020 AMIA - All rights reserved.
PY - 2020
Y1 - 2020
N2 - We present findings on using natural language processing to classify tobacco-related entries from problem lists found within patients' electronic health records. Problem lists describe health-related issues recorded during a patient's medical visit; these problems are typically followed up on during subsequent visits and are updated for relevance or accuracy. The mechanics of problem lists vary across electronic health record systems. In general, they manifest either as pre-generated generic problems that may be selected from a master list or as text boxes where a healthcare professional may enter free text describing the problem. Using commonly available natural language processing tools, we classified tobacco-related problems into three classes: active-user, former-user, and non-user; we further demonstrate that rule-based post-processing may significantly increase precision in identifying these classes (+32%, +22%, and +35%, respectively). We used these classes to generate tobacco time-spans that reconstruct a patient's tobacco-use history and better support secondary data analysis. We bundle this as an open-source toolkit with flow visualizations indicating how patient tobacco-related behavior changes longitudinally; the visualizations can also capture and display contradictory information, such as smokers being flagged as having never smoked.
AB - We present findings on using natural language processing to classify tobacco-related entries from problem lists found within patients' electronic health records. Problem lists describe health-related issues recorded during a patient's medical visit; these problems are typically followed up on during subsequent visits and are updated for relevance or accuracy. The mechanics of problem lists vary across electronic health record systems. In general, they manifest either as pre-generated generic problems that may be selected from a master list or as text boxes where a healthcare professional may enter free text describing the problem. Using commonly available natural language processing tools, we classified tobacco-related problems into three classes: active-user, former-user, and non-user; we further demonstrate that rule-based post-processing may significantly increase precision in identifying these classes (+32%, +22%, and +35%, respectively). We used these classes to generate tobacco time-spans that reconstruct a patient's tobacco-use history and better support secondary data analysis. We bundle this as an open-source toolkit with flow visualizations indicating how patient tobacco-related behavior changes longitudinally; the visualizations can also capture and display contradictory information, such as smokers being flagged as having never smoked.
UR - http://www.scopus.com/inward/record.url?scp=85105323271&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105323271&partnerID=8YFLogxK
M3 - Article
C2 - 33936427
AN - SCOPUS:85105323271
SN - 1559-4076
VL - 2020
SP - 534
EP - 543
JO - AMIA ... Annual Symposium proceedings. AMIA Symposium
JF - AMIA ... Annual Symposium proceedings. AMIA Symposium
ER -