TY - JOUR
T1 - Ascertaining Framingham heart failure phenotype from inpatient electronic health record data using natural language processing
T2 - A multicentre Atherosclerosis Risk in Communities (ARIC) validation study
AU - Moore, Carlton R.
AU - Jain, Saumya
AU - Haas, Stephanie
AU - Yadav, Harish
AU - Whitsel, Eric
AU - Rosamond, Wayne
AU - Heiss, Gerardo
AU - Kucharska-Newton, Anna M.
N1 - Publisher Copyright: © 2021 Author(s) (or their employer(s)).
PY - 2021/6/14
Y1 - 2021/6/14
AB - Objectives: Using free-text clinical notes and reports from hospitalised patients, determine the performance of natural language processing (NLP) ascertainment of Framingham heart failure (HF) criteria and phenotype. Study design: A retrospective observational study of patients hospitalised in 2015 at four hospitals participating in the Atherosclerosis Risk in Communities (ARIC) study was used to determine NLP performance in the ascertainment of Framingham HF criteria and phenotype. Setting: Four ARIC study hospitals, each representing an ARIC study region in the USA. Participants: A stratified random sample of hospitalisations occurring during 2015 was drawn for this study, identified using a broad range of International Classification of Diseases, Ninth Revision, diagnostic codes indicative of an HF event. A randomly selected set of 394 hospitalisations was used as the derivation dataset and 406 hospitalisations were used as the validation dataset. Intervention: Use of NLP on free-text clinical notes and reports to ascertain Framingham HF criteria and phenotype. Primary and secondary outcome measures: NLP performance as measured by sensitivity, specificity, positive-predictive value (PPV) and agreement in ascertainment of Framingham HF criteria and phenotype. Manual medical record review by trained ARIC abstractors was used as the reference standard. Results: Overall, performance of NLP ascertainment of Framingham HF phenotype in the validation dataset was good, with 78.8%, 81.7%, 84.4% and 80.0% for sensitivity, specificity, PPV and agreement, respectively. Conclusions: By decreasing the need for manual chart review, our results on the use of NLP to ascertain Framingham HF phenotype from free-text electronic health record data suggest that validated NLP technology holds the potential for significantly improving the feasibility and efficiency of conducting large-scale epidemiologic surveillance of HF prevalence and incidence.
KW - cardiac epidemiology
KW - health informatics
KW - heart failure
UR - http://www.scopus.com/inward/record.url?scp=85108243073&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108243073&partnerID=8YFLogxK
U2 - 10.1136/bmjopen-2020-047356
DO - 10.1136/bmjopen-2020-047356
M3 - Article
C2 - 34127492
AN - SCOPUS:85108243073
SN - 2044-6055
VL - 11
JO - BMJ Open
JF - BMJ Open
IS - 6
M1 - e047356
ER -