Evolving phenotypes of non-hospitalized patients that indicate long COVID

Research output: Contribution to journalArticlepeer-review

72 Scopus citations


Background: For some SARS-CoV-2 survivors, recovery from the acute phase of the infection has been grueling with lingering effects. Many of the symptoms characterized as the post-acute sequelae of COVID-19 (PASC) could have multiple causes or are similarly seen in non-COVID patients. Accurate identification of PASC phenotypes will be important to guide future research and help the healthcare system focus its efforts and resources on adequately controlled age- and gender-specific sequelae of a COVID-19 infection. Methods: In this retrospective electronic health record (EHR) cohort study, we applied a computational framework for knowledge discovery from clinical data, MLHO, to identify phenotypes that positively associate with a past positive reverse transcription-polymerase chain reaction (RT-PCR) test for COVID-19. We evaluated the post-test phenotypes in two temporal windows at 3–6 and 6–9 months after the test and by age and gender. Data from longitudinal diagnosis records stored in EHRs from Mass General Brigham in the Boston Metropolitan Area was used for the analyses. Statistical analyses were performed on data from March 2020 to June 2021. Study participants included over 96 thousand patients who had tested positive or negative for COVID-19 and were not hospitalized. Results: We identified 33 phenotypes among different age/gender cohorts or time windows that were positively associated with past SARS-CoV-2 infection. All identified phenotypes were newly recorded in patients’ medical records 2 months or longer after a COVID-19 RT-PCR test in non-hospitalized patients regardless of the test result. Among these phenotypes, a new diagnosis record for anosmia and dysgeusia (OR 2.60, 95% CI [1.94–3.46]), alopecia (OR 3.09, 95% CI [2.53–3.76]), chest pain (OR 1.27, 95% CI [1.09–1.48]), chronic fatigue syndrome (OR 2.60, 95% CI [1.22–2.10]), shortness of breath (OR 1.41, 95% CI [1.22–1.64]), pneumonia (OR 1.66, 95% CI [1.28–2.16]), and type 2 diabetes mellitus (OR 1.41, 95% CI [1.22–1.64]) is one of the most significant indicators of a past COVID-19 infection. Additionally, more new phenotypes were found with increased confidence among the cohorts who were younger than 65. Conclusions: The findings of this study confirm many of the post-COVID-19 symptoms and suggest that a variety of new diagnoses, including new diabetes mellitus and neurological disorder diagnoses, are more common among those with a history of COVID-19 than those without the infection. Additionally, more than 63% of PASC phenotypes were observed in patients under 65 years of age, pointing out the importance of vaccination to minimize the risk of debilitating post-acute sequelae of COVID-19 among younger adults.

Original languageEnglish
Article number249
JournalBMC Medicine
Issue number1
StatePublished - Dec 2021

Bibliographical note

Publisher Copyright:
© 2021, The Author(s).


  • Electronic health records
  • Machine learning
  • Phenotypes
  • Post-acute sequelae of SARS-CoV-2

ASJC Scopus subject areas

  • General Medicine


Dive into the research topics of 'Evolving phenotypes of non-hospitalized patients that indicate long COVID'. Together they form a unique fingerprint.

Cite this