TY - GEN
T1 - Supervised extraction of diagnosis codes from EMRs
T2 - 2013 1st IEEE International Conference on Healthcare Informatics, ICHI 2013
AU - Rios, Anthony
AU - Kavuluru, Ramakanth
PY - 2013
Y1 - 2013
AB - Extracting diagnosis codes from medical records is a complex task carried out by trained coders, who read all the documents associated with a patient's visit. With the growing popularity of electronic medical records (EMRs), computational approaches to code extraction have been proposed in recent years. Machine learning approaches to multi-label text classification provide an important methodology for this task, given that each EMR can be associated with multiple codes. In this paper, we study the role of feature selection, training data selection, and probabilistic threshold optimization in improving different multi-label classification approaches. We conduct experiments on two datasets: a recent gold standard dataset used for this task, and a second, larger and more complex EMR dataset that we curated from the University of Kentucky Medical Center. While conventional approaches achieve results comparable to the state of the art on the gold standard dataset, on our more complex in-house dataset we show that feature selection, training data selection, and probabilistic thresholding provide significant gains in performance.
UR - http://www.scopus.com/inward/record.url?scp=84893437311&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893437311&partnerID=8YFLogxK
U2 - 10.1109/ICHI.2013.15
DO - 10.1109/ICHI.2013.15
M3 - Conference contribution
AN - SCOPUS:84893437311
SN - 9780769550893
T3 - Proceedings - 2013 IEEE International Conference on Healthcare Informatics, ICHI 2013
SP - 66
EP - 73
BT - Proceedings - 2013 IEEE International Conference on Healthcare Informatics, ICHI 2013
Y2 - 9 September 2013 through 11 September 2013
ER -