Unsupervised extraction of diagnosis codes from EMRs using knowledge-based and extractive text summarization techniques

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Scopus citations

Abstract

Diagnosis codes are extracted from medical records for billing and reimbursement and for secondary uses such as quality control and cohort identification. In the US, these codes come from the standard terminology ICD-9-CM, derived from the International Classification of Diseases (ICD). ICD-9 codes are generally extracted by trained human coders, who read all artifacts available in a patient's medical record and follow specific coding guidelines. To assist coders in this manual process, this paper proposes an unsupervised ensemble approach to automatically extract ICD-9 diagnosis codes from textual narratives included in electronic medical records (EMRs). Earlier attempts at automatic extraction focused on individual documents such as radiology reports and discharge summaries. Here we use a more realistic dataset and extract ICD-9 codes from EMRs of 1000 inpatient visits at the University of Kentucky Medical Center. Using named entity recognition (NER), graph-based concept-mapping of medical concepts, and extractive text summarization techniques, we achieve an example-based average recall of 0.42 with an average precision of 0.47; compared with a baseline of using only NER, we notice a 12% improvement in recall with the graph-based approach and a 7% improvement in precision using the extractive text summarization approach. Although diagnosis codes are complex concepts often expressed in text with significant long-range, non-local dependencies, our present work shows the potential of unsupervised methods in extracting a portion of codes. As such, our findings are especially relevant for code extraction tasks where obtaining large amounts of training data is difficult.
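The abstract reports example-based average precision and recall. As a rough illustration of how such scores are commonly computed, the Python sketch below calculates precision and recall separately for each visit and then averages them over all visits. The function name `example_based_scores`, the handling of empty code sets (scored as 0.0), and the sample codes are assumptions made for illustration; they are not taken from the paper or its dataset.

```python
from typing import List, Set, Tuple


def example_based_scores(
    gold: List[Set[str]], predicted: List[Set[str]]
) -> Tuple[float, float]:
    """Return (average precision, average recall) across visits.

    Each list element is the set of diagnosis codes for one visit.
    Assumption: a visit with no predicted codes gets precision 0.0,
    and a visit with no gold codes gets recall 0.0.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same visits")
    if not gold:
        raise ValueError("at least one visit is required")

    precisions = []
    recalls = []
    for gold_codes, predicted_codes in zip(gold, predicted):
        # Codes that were both predicted and assigned by the coders.
        overlap = len(gold_codes & predicted_codes)
        precisions.append(overlap / len(predicted_codes) if predicted_codes else 0.0)
        recalls.append(overlap / len(gold_codes) if gold_codes else 0.0)

    n = len(gold)
    return sum(precisions) / n, sum(recalls) / n


# Hypothetical two-visit example (codes chosen only for illustration).
gold = [{"428.0", "401.9"}, {"250.00"}]
predicted = [{"428.0", "486"}, {"250.00", "401.9"}]
avg_precision, avg_recall = example_based_scores(gold, predicted)
print(f"precision={avg_precision:.2f} recall={avg_recall:.2f}")
```

Averaging per visit, rather than pooling all codes, gives each visit equal weight regardless of how many codes it carries.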

Original language: English
Title of host publication: Advances in Artificial Intelligence - 26th Canadian Conference on Artificial Intelligence, Canadian AI 2013, Proceedings
Pages: 77-88
Number of pages: 12
State: Published - 2013
Event: 26th Canadian Conference on Artificial Intelligence, Canadian AI 2013 - Regina, SK, Canada
Duration: May 28 2013 to May 31 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7884 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 26th Canadian Conference on Artificial Intelligence, Canadian AI 2013
Country/Territory: Canada
City: Regina, SK
Period: 5/28/13 to 5/31/13

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
