A hierarchical approach for visual storytelling using image description

Md Sultan Al Nahian, Tasmia Tasrin, Sagar Gandhi, Ryan Gaines, Brent Harrison

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

11 Scopus citations

Abstract

One of the primary challenges of visual storytelling is developing techniques that can maintain the context of the story over long event sequences to generate human-like stories. In this paper, we propose a hierarchical deep learning architecture based on encoder-decoder networks to address this problem. To better help our network maintain this context while also generating long and diverse sentences, we incorporate natural language image descriptions along with the images themselves to generate each story sentence. We evaluate our system on the Visual Storytelling (VIST) dataset [7] and show that our method outperforms state-of-the-art techniques on a suite of different automatic evaluation metrics. The empirical results from this evaluation demonstrate the necessity of the different components of our proposed architecture and show the effectiveness of the architecture for visual storytelling.

Original language: English
Title of host publication: Interactive Storytelling - 12th International Conference on Interactive Digital Storytelling, ICIDS 2019, Proceedings
Editors: Rogelio E. Cardona-Rivera, R. Michael Young, Anne Sullivan
Pages: 304-317
Number of pages: 14
DOIs
State: Published - 2019
Event: 12th International Conference on Interactive Digital Storytelling, ICIDS 2019 - Salt Lake City, United States
Duration: Nov 19 2019 to Nov 22 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11869 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 12th International Conference on Interactive Digital Storytelling, ICIDS 2019
Country/Territory: United States
City: Salt Lake City
Period: 11/19/19 to 11/22/19

Bibliographical note

Publisher Copyright:
© Springer Nature Switzerland AG 2019.

Keywords

  • Deep learning
  • Natural language processing
  • Visual storytelling

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
