Using Machine Learning to Evaluate Attending Feedback on Resident Performance

Sara E. Neves, Michael J. Chen, Cindy M. Ku, Suzanne Karan, Amy N. Dilorenzo, Randall M. Schell, Daniel E. Lee, Carol Ann B. Diachun, Stephanie B. Jones, John D. Mitchell

Research output: Contribution to journal › Article › peer-review



BACKGROUND: High-quality and high-utility feedback allows for the development of improvement plans for trainees. The current manual assessment of the quality of this feedback is time-consuming and subjective. We propose the use of machine learning to rapidly distinguish the quality of attending feedback on resident performance.

METHODS: Using a preexisting databank of 1925 manually reviewed feedback comments from 4 anesthesiology residency programs, we trained machine learning models to predict whether comments contained 6 predefined feedback traits (actionable, behavior focused, detailed, negative feedback, professionalism/communication, and specific) and to predict the utility score of the comment on a scale of 1-5. Comments with ≥4 feedback traits were classified as high-quality and comments with utility scores ≥4 were classified as high-utility; otherwise comments were considered low-quality or low-utility, respectively. We used RapidMiner Studio (RapidMiner, Inc, Boston, MA), a data science platform, to train, validate, and score performance of models.

RESULTS: Models for predicting the presence of feedback traits had accuracies of 74.4%-82.2%. Predictions on utility category were 82.1% accurate, with 89.2% sensitivity and 89.8% class precision for low-utility predictions. Predictions on quality category were 78.5% accurate, with 86.1% sensitivity and 85.0% class precision for low-quality predictions. A research assistant with no prior experience in machine learning spent 15 to 20 hours becoming familiar with the software, creating models, and reviewing performance on the predictions made. The program read data, applied models, and generated predictions within minutes. In contrast, a recent manual feedback scoring effort by an author took 15 hours to collate and score 200 comments over the course of 2 weeks.

CONCLUSIONS: Harnessing the potential of machine learning allows for rapid assessment of attending feedback on resident performance. Using predictive models to rapidly screen for low-quality and low-utility feedback can aid programs in improving feedback provision, both globally and by individual faculty.
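The categorization rule and the reported metrics can be illustrated with a minimal Python sketch. This is not the authors' RapidMiner workflow: the `Comment` class, the function names, and the toy labels are hypothetical and exist only for illustration. The sketch also assumes that sensitivity is computed with the "low" category as the positive class, which the abstract does not state explicitly.

```python
from dataclasses import dataclass

# The six predefined feedback traits named in the abstract (identifiers are illustrative).
TRAITS = (
    "actionable",
    "behavior_focused",
    "detailed",
    "negative_feedback",
    "professionalism_communication",
    "specific",
)


@dataclass(frozen=True)
class Comment:
    """Hypothetical container for one manually reviewed feedback comment."""
    traits: frozenset  # subset of TRAITS judged present in the comment
    utility: int       # utility score on the 1-5 scale


def quality_category(comment: Comment) -> str:
    # At least 4 of the 6 traits present -> high-quality; otherwise low-quality.
    return "high" if len(comment.traits) >= 4 else "low"


def utility_category(comment: Comment) -> str:
    # Utility score of 4 or 5 -> high-utility; otherwise low-utility.
    return "high" if comment.utility >= 4 else "low"


def low_class_metrics(actual, predicted):
    """Accuracy, plus sensitivity and class precision treating 'low' as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == "low" and p == "low" for a, p in pairs)   # low correctly flagged
    fn = sum(a == "low" and p == "high" for a, p in pairs)  # low missed
    fp = sum(a == "high" and p == "low" for a, p in pairs)  # high wrongly flagged
    correct = sum(a == p for a, p in pairs)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "class_precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    actual = ["low", "low", "low", "high", "high"]
    predicted = ["low", "low", "high", "low", "high"]
    print(low_class_metrics(actual, predicted))
```

Under this reading, sensitivity is the share of truly low-category comments that the model flags, and class precision is the share of flagged comments that are truly low-category. These are the two quantities that matter when models are used to screen for weak feedback.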

Original language: English
Pages (from-to): 545-555
Number of pages: 11
Journal: Anesthesia and Analgesia
Issue number: 2
State: Published - Feb 1 2021

Bibliographical note

Publisher Copyright:
© 2021 Lippincott Williams and Wilkins. All rights reserved.

ASJC Scopus subject areas

  • Anesthesiology and Pain Medicine

