Using Common Table Expressions to Build a Scalable Boolean Query Generator for Clinical Data Warehouses

Daniel R. Harris, Darren W. Henderson, Ramakanth Kavuluru, Arnold J. Stromberg, Todd R. Johnson

Research output: Contribution to journalArticlepeer-review

4 Scopus citations

Abstract

We present a custom, Boolean query generator utilizing common-table expressions (CTEs) that is capable of scaling with big datasets. The generator maps user-defined Boolean queries, such as those interactively created in clinical-research and general-purpose healthcare tools, into SQL. We demonstrate the effectiveness of this generator by integrating our study into the Informatics for Integrating Biology and the Bedside (i2b2) query tool and show that it is capable of scaling. Our custom generator replaces and outperforms the default query generator found within the Clinical Research Chart cell of i2b2. In our experiments, 16 different types of i2b2 queries were identified by varying four constraints: date, frequency, exclusion criteria, and whether selected concepts occurred in the same encounter. We generated nontrivial, random Boolean queries based on these 16 types; the corresponding SQL queries produced by both generators were compared by execution times. The CTE-based solution significantly outperformed the default query generator and provided a much more consistent response time across all query types (M = 2.03, SD = 6.64 versus M = 75.82, SD = 238.88 s). Without costly hardware upgrades, we provide a scalable solution based on CTEs with very promising empirical results centered on performance gains. The evaluation methodology used for this provides a means of profiling clinical data warehouse performance.

Original languageEnglish
Article number6674997
Pages (from-to)1607-1613
Number of pages7
JournalIEEE Journal of Biomedical and Health Informatics
Volume18
Issue number5
DOIs
StatePublished - Sep 2014

Bibliographical note

Publisher Copyright:
© 2013 IEEE.

Keywords

  • Biomedical computing
  • biomedical informatics
  • data systems
  • data warehouses
  • health information management
  • large-scale systems

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics
  • Electrical and Electronic Engineering
  • Health Information Management

Fingerprint

Dive into the research topics of 'Using Common Table Expressions to Build a Scalable Boolean Query Generator for Clinical Data Warehouses'. Together they form a unique fingerprint.

Cite this