Testing and Validating Semi-automated Approaches to the Occupational Exposure Assessment of Polycyclic Aromatic Hydrocarbons

Albeliz Santiago-Colón, Carissa M. Rocheleau, Stephen Bertke, Annette Christianson, Devon T. Collins, Emma Trester-Wilson, Wayne Sanderson, Martha A. Waters, Jennita Reefhuis

Research output: Contribution to journal, Article, peer-review


Introduction: When direct measurement of occupational exposure or biomonitoring is not possible, retrospective exposure assessment methods are often used. Among the common retrospective methods, assigning exposure estimates through review of detailed job descriptions by multiple expert raters is typically the most valid, but also the most time-consuming and expensive. Screening protocols that prioritize a subset of jobs for expert rater review can reduce the cost and time of exposure assessment, but there are often few data with which to evaluate different screening approaches. We used existing job-by-job exposure assessment data (assigned by consensus between multiple expert raters) from a large, population-based study of women to create and test screening algorithms for polycyclic aromatic hydrocarbons (PAHs) that would be suitable for use in other population-based studies.

Methods: We evaluated three approaches to creating a screening algorithm: a machine-learning algorithm, a set of a priori decision rules created by experts based on features (such as keywords) found in the job description, and a hybrid algorithm incorporating both sets of criteria. All coded jobs held by mothers of infants participating in the National Birth Defects Prevention Study (NBDPS) (n = 35,424) were used in developing or testing the screening algorithms. The job narrative fields considered for all approaches included job title, type of product made by the company, main activities or duties, and chemicals or substances handled. Each screening approach was evaluated against the consensus rating of two or more expert raters.

Results: The machine-learning algorithm considered over 30,000 keywords and industry/occupation codes (separately and in combination). Overall, the sensitivity of the hybrid method (87.1%) was similar to that of the expert decision rules (85.5%) and higher than that of the machine-learning algorithm (67.7%). Specificity was highest for the machine-learning algorithm (98.1%), compared with the expert decision rules (89.2%) and the hybrid approach (89.1%). Using different probability cutoffs in the hybrid approach resulted in improvements in sensitivity (24-30%) without much loss of specificity (7-18%).

Conclusion: Both the expert decision rules and the machine-learning algorithm performed reasonably well in identifying the majority of jobs with potential exposure to PAHs. The hybrid screening approach identified 87% of all jobs exposed to PAHs while requiring review of only approximately 20% of the total jobs; sensitivity could be further increased, albeit with a decrease in specificity, by adjusting the algorithm. The resulting screening algorithm could be applied to other population-based studies of women. The process of developing the algorithm also provides a useful illustration of the strengths and potential pitfalls of these approaches to developing exposure assessment algorithms.
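The sensitivity and specificity figures above, and the effect of moving the probability cutoff in the hybrid approach, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy data, and in particular the rule that a job is flagged when either an expert decision rule fires or the model probability reaches the cutoff are all assumptions made for illustration, since the abstract does not state how the two sets of criteria are combined.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(
    flagged: Sequence[bool], exposed: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of screening flags vs. reference ratings."""
    if len(flagged) != len(exposed):
        raise ValueError("flagged and exposed must have the same length")
    tp = sum(1 for f, e in zip(flagged, exposed) if f and e)
    fn = sum(1 for f, e in zip(flagged, exposed) if not f and e)
    tn = sum(1 for f, e in zip(flagged, exposed) if not f and not e)
    fp = sum(1 for f, e in zip(flagged, exposed) if f and not e)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one exposed and one unexposed job")
    return tp / (tp + fn), tn / (tn + fp)


def hybrid_flags(
    rule_hits: Sequence[bool], ml_probs: Sequence[float], cutoff: float = 0.5
) -> List[bool]:
    """Assumed combination: flag a job if a rule fires OR probability >= cutoff."""
    return [bool(r) or p >= cutoff for r, p in zip(rule_hits, ml_probs)]


# Hypothetical toy data (not from the study): eight jobs, three truly exposed.
exposed = [True, True, True, False, False, False, False, False]
rule_hits = [True, False, False, True, False, False, False, False]
ml_probs = [0.9, 0.6, 0.3, 0.1, 0.4, 0.05, 0.3, 0.02]

for cutoff in (0.5, 0.25):
    flags = hybrid_flags(rule_hits, ml_probs, cutoff)
    sens, spec = sensitivity_specificity(flags, exposed)
    share_reviewed = sum(flags) / len(flags)
    print(f"cutoff={cutoff}: sensitivity={sens:.2f}, "
          f"specificity={spec:.2f}, share flagged for review={share_reviewed:.2f}")
```

On this toy data, lowering the cutoff flags more jobs for expert review, which raises sensitivity and lowers specificity. That is the same direction of trade-off the abstract describes, although the magnitudes here carry no meaning.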

Original language: English
Pages (from-to): 682-693
Number of pages: 12
Journal: Annals of Work Exposures and Health
Issue number: 6
State: Published - Jul 1 2021

Bibliographical note

Publisher Copyright:
© 2021 Published by Oxford University Press on behalf of The British Occupational Hygiene Society 2021.


  • National Birth Defects Prevention Study
  • exposure assessment
  • female worker
  • jobs
  • machine-learning algorithm
  • occupation
  • polycyclic aromatic hydrocarbons
  • population-based
  • prediction model
  • regularized logistic regression

ASJC Scopus subject areas

  • General Medicine

