Visual Question Answering Using Semantic Information from Image Descriptions

Tasmia Tasrin, Md Sultan Al Nahian, Brent Harrison

Research output: Contribution to journal › Conference article › peer-review

Abstract

In this work, we propose a deep neural architecture for visual question answering (VQA) that uses an attention mechanism to combine region-based image features, the natural-language question, and semantic knowledge extracted from descriptions of image regions in order to produce open-ended answers. Combining region-based visual features with region-based textual information about the image helps a model answer questions more accurately, and potentially do so with less training data. We evaluate the proposed architecture on a VQA task against a strong baseline and show that our method achieves excellent results.
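The abstract describes question-guided attention over image regions, fused with a semantic embedding of region descriptions. The paper does not specify the exact architecture here, so the following is a minimal NumPy sketch of that general idea under assumed shapes and names (`attend`, `fuse`, dot-product attention scoring are all illustrative choices, not the authors' implementation):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(region_feats, query):
    """Score each image region against the query (here a plain dot
    product) and return the attention-weighted visual context."""
    scores = region_feats @ query            # (num_regions,)
    weights = softmax(scores)                # distribution over regions
    return weights @ region_feats, weights   # (feat_dim,), (num_regions,)

def fuse(region_feats, question_emb, semantic_emb):
    """Attend over regions with the question embedding, then concatenate
    the attended visual context with the question embedding and the
    semantic embedding of the region descriptions. A real VQA model
    would feed this joint vector to an answer classifier/decoder."""
    visual_ctx, weights = attend(region_feats, question_emb)
    joint = np.concatenate([visual_ctx, question_emb, semantic_emb])
    return joint, weights

# Toy example: 4 regions with 8-dim features, 8-dim question/semantic embeddings.
regions = np.random.default_rng(0).normal(size=(4, 8))
question = np.ones(8)
semantic = np.zeros(8)
joint, w = fuse(regions, question, semantic)
```

The design point illustrated is only the fusion step: the question selects which regions matter, while the semantic (textual) channel is carried alongside the visual one rather than replacing it.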

Original language: English
Journal: Proceedings of the International Florida Artificial Intelligence Research Society Conference, FLAIRS
Volume: 34
DOIs
State: Published - 2021
Event: 34th International Florida Artificial Intelligence Research Society Conference, FLAIRS-34 2021 - North Miami Beach, United States
Duration: May 16, 2021 – May 19, 2021

Bibliographical note

Publisher Copyright:
© 2021 by the authors. All rights reserved.

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software

