End-to-end let's play commentary generation using multi-modal video representations

Chengxi Li, Sagar Gandhi, Brent Harrison

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

In this paper, we explore how multi-modal video representations can be applied in an end-to-end fashion to automatically generate game commentary from Let's Play videos using deep learning. We introduce a comprehensive pipeline that takes videos directly from YouTube and then uses a sequence-to-sequence strategy to learn how to generate appropriate commentary. We evaluate our framework on Let's Play commentaries for the game Getting Over It with Bennett Foddy. To assess the quality of the generated commentary, we use perplexity to evaluate language models built on different input video representations, each highlighting aspects of gameplay that might influence commentary.
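To make the approach described in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of a sequence-to-sequence model that encodes precomputed per-frame video features and decodes commentary tokens, together with a perplexity computation over held-out commentary. All class names, feature dimensions, vocabulary sizes, and data in this example are hypothetical placeholders.

```python
# Hypothetical sketch of a video-to-commentary sequence-to-sequence model
# and a perplexity metric; dimensions and data are illustrative only.
import math
import torch
import torch.nn as nn

class VideoToCommentary(nn.Module):
    def __init__(self, feat_dim=2048, hidden=512, vocab_size=10000, emb=256):
        super().__init__()
        # Encoder: GRU over a sequence of per-frame video feature vectors
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        # Decoder: GRU over commentary token embeddings, seeded with the encoder state
        self.embed = nn.Embedding(vocab_size, emb)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, frame_feats, token_ids):
        # frame_feats: (batch, n_frames, feat_dim); token_ids: (batch, seq_len)
        _, h = self.encoder(frame_feats)       # final hidden state summarizes the clip
        dec_out, _ = self.decoder(self.embed(token_ids), h)
        return self.out(dec_out)               # (batch, seq_len, vocab_size) logits

def perplexity(model, frame_feats, targets, pad_id=0):
    # Perplexity = exp(mean token-level cross-entropy), ignoring padding tokens.
    # Decoder input is the target shifted right by one (teacher forcing).
    logits = model(frame_feats, targets[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets[:, 1:].reshape(-1),
        ignore_index=pad_id,
    )
    return math.exp(loss.item())

if __name__ == "__main__":
    model = VideoToCommentary()
    feats = torch.randn(2, 16, 2048)           # 2 clips, 16 frames of pooled CNN features
    toks = torch.randint(1, 10000, (2, 12))    # 2 commentary snippets of 12 token ids
    print("perplexity:", perplexity(model, feats, toks))
```

Swapping in different input video representations (for example, frame-level CNN features versus optical-flow or game-state features) only changes what is fed to the encoder; the perplexity comparison across those variants mirrors the evaluation strategy described in the abstract.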

Original language: English
Title of host publication: Proceedings of the 14th International Conference on the Foundations of Digital Games, FDG 2019
Editors: Foaad Khosmood, Johanna Pirker, Thomas Apperley, Sebastian Deterding
ISBN (Electronic): 9781450372176
DOIs
State: Published - Aug 26 2019
Event: 14th International Conference on the Foundations of Digital Games, FDG 2019 - San Luis Obispo, United States
Duration: Aug 26 2019 - Aug 30 2019

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 14th International Conference on the Foundations of Digital Games, FDG 2019
Country/Territory: United States
City: San Luis Obispo
Period: 8/26/19 - 8/30/19

Bibliographical note

Publisher Copyright:
© 2019 ACM.

Keywords

  • Commentary generation
  • Multi-modality
  • Sequence to sequence

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
