Joint learning for biomedical NER and entity normalization: Encoding schemes, counterfactual examples, and zero-shot evaluation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

Named entity recognition (NER) and normalization (EN) form an indispensable first step for many biomedical natural language processing applications. In biomedical information science, recognizing entities (e.g., genes, diseases, or drugs) and normalizing them to concepts in standard terminologies or thesauri (e.g., Entrez, ICD-10, or RxNorm) is crucial for identifying more informative relations among them that drive disease etiology, progression, and treatment. In this effort we pursue two high-level strategies to improve biomedical NER and EN. The first is to decouple standard entity encoding tags (e.g., "B-Drug" for the beginning of a drug) into type tags (e.g., "Drug") and positional tags (e.g., "B"). The second is to use additional counterfactual training examples to keep models from learning spurious correlations between surrounding context and normalized concepts in the training data. We conduct elaborate experiments using the MedMentions dataset, the largest dataset of its kind for NER and EN in biomedicine. We find that our first strategy performs better in entity normalization when compared with the standard coding scheme. The second strategy, a form of data augmentation, uniformly improves performance in span detection, typing, and normalization. The gains from counterfactual examples are more prominent when evaluating in zero-shot settings, for concepts that have never been encountered during training.
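
The tag decoupling mentioned in the abstract can be illustrated with a minimal sketch. It assumes a BIO-style scheme in which joint tags look like "B-Drug" and tokens outside any entity carry "O"; the function names `decouple_tags` and `recouple_tags` are hypothetical, and giving outside tokens an "O" type tag is an assumption made here for illustration, not necessarily the scheme used in the paper.

```python
from typing import List, Tuple


def decouple_tags(tags: List[str]) -> Tuple[List[str], List[str]]:
    """Split joint tags such as "B-Drug" into positional and type tags.

    Assumption: tokens outside any entity are tagged "O" and receive
    "O" in both output sequences.
    """
    positions: List[str] = []
    types: List[str] = []
    for tag in tags:
        if tag == "O":
            positions.append("O")
            types.append("O")
        else:
            # Split on the first hyphen only, so types that contain
            # hyphens themselves are preserved intact.
            position, _, entity_type = tag.partition("-")
            positions.append(position)
            types.append(entity_type)
    return positions, types


def recouple_tags(positions: List[str], types: List[str]) -> List[str]:
    """Rebuild joint tags from the two decoupled sequences."""
    return [p if p == "O" else f"{p}-{t}" for p, t in zip(positions, types)]


joint = ["O", "B-Drug", "I-Drug", "O", "B-Disease"]
positions, types = decouple_tags(joint)
```

In a setup like this, a model would predict the positional and type sequences with separate output layers, and `recouple_tags` would restore the standard joint encoding for evaluation.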

Original language: English
Title of host publication: Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2021
ISBN (Electronic): 9781450384506
State: Published - Jan 18 2021
Event: 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2021 - Virtual, Online, United States
Duration: Aug 1 2021 → Aug 4 2021

Publication series

Name: Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2021

Conference

Conference: 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2021
Country/Territory: United States
City: Virtual, Online
Period: 8/1/21 → 8/4/21

Bibliographical note

Publisher Copyright:
© 2021 ACM.

Keywords

  • biomedical natural language processing
  • deep neural networks
  • entity normalization
  • information extraction
  • named entity recognition

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Biomedical Engineering
  • Health Informatics
