Improving Cervical Precancer Surveillance: Validity of Claims-Based Prediction Models in ICD-9 and ICD-10 Eras

Jaimie Z. Shing, Marie R. Griffin, Linh D. Nguyen, James C. Slaughter, Edward F. Mitchel, Manideepthi Pemmaraju, Alyssa B. Rentuza, Pamela C. Hull

Research output: Contribution to journal, Article, peer-review

3 Scopus citations


Background: Human papillomavirus (HPV) vaccine impact on cervical precancer (cervical intraepithelial neoplasia grade 2 or higher [CIN2+]) is observable sooner than impact on cancer. Biopsy-confirmed CIN2+ is not included in most US cancer registries. Billing codes could provide surrogate metrics; however, the transition from the International Classification of Diseases, Ninth Revision (ICD-9) to the Tenth Revision (ICD-10) disrupts trends. We built, validated, and compared claims-based models to identify CIN2+ events in both ICD eras.

Methods: A database of pathology-confirmed CIN2+ in Davidson County (Nashville), Tennessee, from the HPV Vaccine Impact Monitoring Project (HPV-IMPACT) provided gold standard events. Cervical diagnostic procedures (N = 8549) among Davidson County women aged 18-39 years in Tennessee Medicaid, 2008-2017, were randomly split into 60% training and 40% testing sets. Relevant diagnosis, procedure, and screening codes were used to build three models: CIN2+ tissue diagnosis codes alone, least absolute shrinkage and selection operator (LASSO), and random forest. Model-classified index events were counted to estimate incident events.

Results: HPV-IMPACT identified 983 incident CIN2+ events. Models identified 1007 (LASSO), 1245 (CIN2+ tissue diagnosis codes alone), and 957 (random forest) incident events. LASSO performed well in the ICD-9 and ICD-10 eras, respectively: sensitivity was 77.3% (95% confidence interval [CI] = 72.5% to 81.5%) vs 81.1% (95% CI = 71.5% to 88.6%), specificity was 93.0% (95% CI = 91.9% to 94.0%) vs 90.2% (95% CI = 87.2% to 92.7%), positive predictive value was 61.3% (95% CI = 56.6% to 65.8%) vs 60.3% (95% CI = 51.0% to 69.1%), negative predictive value was 96.6% (95% CI = 95.8% to 97.3%) vs 96.3% (95% CI = 94.1% to 97.8%), accuracy was 91.0% (95% CI = 89.9% to 92.1%) vs 88.8% (95% CI = 85.9% to 91.2%), and the C-index was 85.1% (95% CI = 82.9% to 87.4%) vs 85.6% (95% CI = 81.4% to 89.9%). Performance did not differ statistically significantly between eras (all 95% confidence intervals overlapped).

Conclusions: Results confirmed model utility, with good performance across both ICD eras for CIN2+ surveillance. Validated claims-based models may be used in future CIN2+ trend analyses to estimate HPV vaccine impact where population-based biopsy data are unavailable.
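The two numeric comparisons in the Results can be reproduced from the reported figures alone. The sketch below is illustrative and is not the authors' analysis code: the variable and function names are hypothetical, and only the counts and interval bounds come from the abstract. It checks whether each pair of ICD-9 and ICD-10 confidence intervals overlaps and computes how far each model's incident-event count falls from the HPV-IMPACT gold standard.

```python
# Values transcribed from the abstract; all names here are hypothetical.
GOLD_STANDARD_EVENTS = 983  # incident CIN2+ events identified by HPV-IMPACT

model_event_counts = {
    "LASSO": 1007,
    "CIN2+ tissue diagnosis codes alone": 1245,
    "random forest": 957,
}

# Reported 95% CIs for LASSO (percent), as (ICD-9 interval, ICD-10 interval).
lasso_cis = {
    "sensitivity": ((72.5, 81.5), (71.5, 88.6)),
    "specificity": ((91.9, 94.0), (87.2, 92.7)),
    "positive predictive value": ((56.6, 65.8), (51.0, 69.1)),
    "negative predictive value": ((95.8, 97.3), (94.1, 97.8)),
    "accuracy": ((89.9, 92.1), (85.9, 91.2)),
    "C-index": ((82.9, 87.4), (81.4, 89.9)),
}


def intervals_overlap(a, b):
    """Return True if closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def percent_difference(count, reference):
    """Signed difference of count from reference, as a percentage of reference."""
    return (count - reference) / reference * 100


overlap_by_metric = {
    metric: intervals_overlap(icd9, icd10)
    for metric, (icd9, icd10) in lasso_cis.items()
}

count_difference_pct = {
    model: round(percent_difference(n, GOLD_STANDARD_EVENTS), 1)
    for model, n in model_event_counts.items()
}

if __name__ == "__main__":
    for metric, overlaps in overlap_by_metric.items():
        print(f"{metric}: intervals overlap = {overlaps}")
    for model, pct in count_difference_pct.items():
        print(f"{model}: {pct:+.1f}% vs gold standard")
```

Interval overlap is a conservative heuristic for "no significant difference" rather than a formal hypothesis test, so this check only mirrors the criterion stated in the abstract.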

Original language: English
Article number: pkaa112
Journal: JNCI Cancer Spectrum
Issue number: 1
State: Published - Feb 1 2021

Bibliographical note

Publisher Copyright:
© 2020 The Author(s). Published by Oxford University Press.

ASJC Scopus subject areas

  • Oncology
  • Cancer Research

