Automatic content generation for video self modeling

Ju Shen, Anusha Raghunathan, Sen Ching S. Cheung, Rita Patel

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

8 Scopus citations


Video self modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of himself or herself. Its effectiveness in rehabilitation and education has been repeatedly demonstrated, but technical challenges remain in creating video content that depicts previously unseen behaviors. In this paper, we propose a novel system that re-renders new talking-head sequences suitable for VSM treatment of patients with voice disorders. After the raw footage is captured, a new speech track is either synthesized using text-to-speech or selected, based on voice similarity, from a database of clean speech recordings. Voice conversion is then applied to match the new speech to the original voice. Time markers extracted from the original and new speech tracks are used to re-sample the video track for lip synchronization. We use an adaptive re-sampling strategy to minimize motion jitter, and apply bilinear and optical-flow-based interpolation to preserve image quality. Both objective measurements and subjective evaluations demonstrate the effectiveness of the proposed techniques.
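The marker-driven re-sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`warp_time`, `resample_plan`, `blend`) are hypothetical, the markers are assumed to be matched pairs of timestamps in seconds with strictly increasing values on the new track, and a plain linear cross-fade between neighboring frames stands in for the bilinear and optical-flow-based interpolation. The adaptive strategy for reducing motion jitter is not modeled.

```python
from bisect import bisect_right


def warp_time(t_new, new_markers, orig_markers):
    """Map a time on the new speech track to a time on the original track.

    Assumes matched marker lists of equal length, with new_markers strictly
    increasing. Uses piecewise-linear interpolation between marker pairs.
    """
    if len(new_markers) != len(orig_markers) or len(new_markers) < 2:
        raise ValueError("need at least two matched marker pairs")
    # Clamp times outside the marked range to the end points.
    if t_new <= new_markers[0]:
        return orig_markers[0]
    if t_new >= new_markers[-1]:
        return orig_markers[-1]
    i = bisect_right(new_markers, t_new) - 1
    alpha = (t_new - new_markers[i]) / (new_markers[i + 1] - new_markers[i])
    return orig_markers[i] + alpha * (orig_markers[i + 1] - orig_markers[i])


def resample_plan(new_markers, orig_markers, fps):
    """For each output frame, return (source frame index, blend weight).

    The weight is the fractional position between the source frame and the
    next one; the caller must handle an index at the end of the footage.
    """
    n_out = int(round((new_markers[-1] - new_markers[0]) * fps)) + 1
    plan = []
    for k in range(n_out):
        t_new = new_markers[0] + k / fps
        pos = warp_time(t_new, new_markers, orig_markers) * fps
        idx = int(pos)
        plan.append((idx, pos - idx))
    return plan


def blend(frame_a, frame_b, w):
    """Linear cross-fade of two frames given as flat lists of pixel values."""
    return [(1 - w) * a + w * b for a, b in zip(frame_a, frame_b)]
```

Each `(index, weight)` pair tells a renderer which two source frames to combine for one output frame. A weight of 0 reuses an original frame unchanged, and any other weight requires a synthesized in-between frame.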

Original language: English
Title of host publication: Electronic Proceedings of the 2011 IEEE International Conference on Multimedia and Expo, ICME 2011
State: Published - 2011
Event: 2011 12th IEEE International Conference on Multimedia and Expo, ICME 2011 - Barcelona, Spain
Duration: Jul 11 2011 to Jul 15 2011

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X


Conference: 2011 12th IEEE International Conference on Multimedia and Expo, ICME 2011

Bibliographical note

Funding Information:
This research was supported by the 2006FY Industrial Technology Research Grant Program of the New Energy and Industrial Technology Development Organization (NEDO) Japan, 2009FY Grants-in-Aid for Scientific Research of Japan Society for the Promotion of Science, and the 2009FY Research Grant of the Sound Technology Promotion Foundation of Japan.


Keywords

  • computational multimedia
  • frame interpolation
  • positive feedforward
  • video self modeling
  • voice disorder
  • voice imitation

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications

