Association rule mining has received significant attention from both the data mining and machine learning communities. While data mining researchers focus more on designing efficient algorithms to mine rules from large datasets, the learning community has explored applications of rule mining to classification. A major problem with rule mining algorithms is the explosion of rules even for moderately sized datasets, which makes it very difficult for end users to identify rules that are both statistically significant and potentially novel and that could lead to interesting new insights and hypotheses. Researchers have proposed many domain-independent interestingness measures with which one can rank the rules and potentially glean useful rules from the top-ranked ones. However, these measures have not been fully explored for rule mining in clinical datasets, owing to the relatively large sizes of the datasets often encountered in healthcare and to limited access to domain experts for review and analysis. In this paper, using an electronic medical record (EMR) dataset of diagnoses and medications from over three million patient visits to the University of Kentucky medical center and affiliated clinics, we conduct a thorough evaluation of dozens of interestingness measures proposed in the data mining literature, including some new composite measures. Using cumulative relevance metrics from information retrieval, we compare these interestingness measures against human judgments obtained from a practicing psychiatrist for association rules involving the depressive disorders class as the consequent. Our results not only surface new interesting associations for depressive disorders but also indicate classes of interestingness measures that weight rule novelty and statistical strength in contrasting ways, offering new insights for end users in identifying interesting rules.
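The evaluation idea summarized above, ranking rules by an interestingness measure and scoring that ranking against expert judgments, can be illustrated with a minimal sketch. The abstract does not name the specific measures or the exact cumulative relevance metric, so lift and normalized discounted cumulative gain (NDCG) are used here only as stand-ins. The function names, counts, and judgment grades are hypothetical and are not taken from the paper.

```python
from math import log2


def rule_measures(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence, and lift of a rule A -> C from raw counts.

    Lift is one common interestingness measure; it is an assumed example,
    not necessarily one of the measures evaluated in the paper.
    """
    support = n_both / n_total
    confidence = n_both / n_antecedent
    lift = confidence / (n_consequent / n_total)
    return {"support": support, "confidence": confidence, "lift": lift}


def ndcg(relevances):
    """NDCG of graded judgments listed in ranked order (best-ranked first)."""

    def dcg(rels):
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(r / log2(i + 2) for i, r in enumerate(rels))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical rules: (label, counts, expert judgment grade 0-2).
rules = [
    ("rule_a", (1000, 100, 200, 50), 2),
    ("rule_b", (1000, 400, 200, 90), 0),
    ("rule_c", (1000, 50, 200, 20), 1),
]

# Rank rules by lift, highest first, then score the ranking against judgments.
ranked = sorted(rules, key=lambda r: rule_measures(*r[1])["lift"], reverse=True)
score = ndcg([grade for _, _, grade in ranked])
```

A measure whose ranking places the expert's highly graded rules near the top yields a score closer to 1, which is one way different measures could be compared on the same rule set.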
|Title of host publication||ACM-BCB 2016 - 7th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics|
|Number of pages||8|
|State||Published - Oct 2 2016|
|Event||7th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2016 - Seattle, United States|
Duration: Oct 2 2016 → Oct 5 2016
Bibliographical note
Funding Information:
We are grateful to anonymous reviewers for their helpful comments that improved the presentation of this paper and for interesting suggestions to extend our work using an event sequence framework. This work is supported by the National Center for Advancing Translational Sciences through Grant UL1TR000117 and the Kentucky Lung Cancer Research Program through Grant PO2-415-1400004000-1. The content of this paper is the responsibility of the authors and does not necessarily represent the official views of the NIH.
Copyright 2016 ACM.
Keywords
- Association rule mining
- Electronic medical records
- Rule interestingness measures
ASJC Scopus subject areas
- Health Informatics
- Biomedical Engineering
- Computer Science Applications