Caption-based topical descriptors for microscopic images as published in academic papers

Sujin Kim, Shannon Lamkin, Pam Duncan

Research output: Contribution to journal › Article › peer-review

1 Scopus citation


Background: Visual findings summarized in the figures and tables of academic papers are invaluable sources for biomedical researchers. The captions associated with these visual findings are often neglected when retrieving biomedical images from published academic papers.

Objectives: This study assesses caption-based topical descriptors for microscopic images of breast neoplasms, as published in academic papers retrieved through the PubMed Central database.

Method: Human indexers and an automatic keyword finder called TAPoR each generated topical descriptors from the collected captions. The study then compared the human-generated descriptors with the machine-generated descriptors. Finally, a set of core descriptors was developed from both sets and automatically mapped to the Unified Medical Language System (UMLS) Metathesaurus through a MetaMap Transfer engine.

Results: Major topical descriptors included histologic disease names, laboratory procedures, and genetic functions and components. Human indexers provided more relevant descriptors than TAPoR. The UMLS Metathesaurus mapping identified several semantic types, including Indicator, Reagent, or Diagnostic Aid; Organic Chemical; Laboratory Procedure; Spatial Concept; Qualitative Concept; and Quantitative Concept.

Discussion: The findings suggest that caption-based descriptors can complement title-based or abstract-based literature indexing for retrieving figure images in articles. For a metadata framework for online microscopic image description, the semantic types can serve as a core metadata set. The findings can also be used to standardise a microscopic image description protocol for training medical students.

Conclusions: It is incumbent upon libraries and other information agencies to promote and maintain an interest in the opportunities and challenges associated with biomedical imaging.
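The comparison step described under Method can be illustrated with a minimal sketch. It is not the study's procedure: a plain term-frequency count stands in for TAPoR, the stop-word list and function names are hypothetical, the sample captions are invented for illustration, and the UMLS mapping through MetaMap Transfer is not reproduced.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the study's actual TAPoR settings are not given.
STOP_WORDS = {"a", "an", "and", "of", "the", "in", "for", "with", "on", "by", "to"}


def keyword_descriptors(captions, top_n=5):
    """Return the top_n most frequent non-stop-word terms across all captions."""
    counts = Counter()
    for caption in captions:
        # Lowercase the caption and keep alphanumeric or hyphenated tokens.
        for token in re.findall(r"[a-z][a-z0-9-]*", caption.lower()):
            if token not in STOP_WORDS:
                counts[token] += 1
    return {term for term, _ in counts.most_common(top_n)}


def compare_descriptors(human, machine):
    """Summarise the overlap between two sets of descriptors."""
    shared = human & machine
    union = human | machine
    return {
        "shared": shared,
        "human_only": human - machine,
        "machine_only": machine - human,
        # Jaccard index: size of the intersection divided by size of the union.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Invented example captions and human-assigned descriptors, for illustration only.
captions = [
    "Immunohistochemical staining of invasive ductal carcinoma",
    "Ductal carcinoma in situ, hematoxylin and eosin staining",
]
human = {"ductal", "carcinoma", "immunohistochemistry"}
machine = keyword_descriptors(captions, top_n=3)
summary = compare_descriptors(human, machine)
```

A set-overlap measure such as this only shows where the two descriptor sets agree; judging which descriptors are more relevant, as the study did, still requires human assessment.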

Original language: English
Pages (from-to): 235-243
Number of pages: 9
Journal: Health Information and Libraries Journal
Issue number: 3
State: Published - Sep 2010


Keywords

  • Caption-based indexing
  • Image indexing
  • MetaMap Transfer
  • Unified Medical Language System (UMLS)

ASJC Scopus subject areas

  • Health Informatics
  • Library and Information Sciences
  • Health Information Management

