Automatic video self modeling for voice disorder

Ju Shen, Changpeng Ti, Anusha Raghunathan, Sen-ching S. Cheung, Rita Patel

Research output: Contribution to journal › Article › peer-review


Video self modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of him- or herself. In speech-language pathology, VSM has been used successfully for language treatment in children with autism and in individuals with the fluency disorder of stuttering. Technical challenges remain in creating VSM content that depicts previously unseen behaviors. In this paper, we propose a novel system that synthesizes new video sequences for VSM treatment of patients with voice disorders. Starting with a video recording of a voice-disorder patient, the proposed system replaces the coarse speech with clean, healthier speech that resembles the patient’s original voice. The replacement speech is either synthesized with a text-to-speech engine or selected from a database of clean speech samples based on a voice similarity metric. To realign the replacement speech with the original video, we propose a novel audiovisual algorithm that combines audio segmentation with lip-state detection to identify corresponding time markers in the audio and video tracks. Lip synchronization is then accomplished with an adaptive video re-sampling scheme that minimizes motion jitter and preserves spatial sharpness. Objective measurements and subjective evaluations on a dataset of 31 subjects demonstrate the effectiveness of the proposed techniques.
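The re-sampling step described in the abstract can be illustrated with a minimal sketch. It is not the paper's adaptive scheme: it assumes the corresponding time markers are already known, maps each output time on the replacement-audio timeline back to the original video timeline by piecewise-linear interpolation, and picks the nearest source frame. The function names `map_time` and `resample_frame_indices` and all parameters are hypothetical, and the audio markers are assumed to be strictly increasing.

```python
from bisect import bisect_right


def map_time(t, audio_markers, video_markers):
    """Map a time on the replacement-audio timeline to the original video
    timeline using piecewise-linear interpolation between paired markers."""
    # Clamp times that fall outside the marked range.
    if t <= audio_markers[0]:
        return video_markers[0]
    if t >= audio_markers[-1]:
        return video_markers[-1]
    # Locate the marker interval [a0, a1) that contains t.
    i = bisect_right(audio_markers, t) - 1
    a0, a1 = audio_markers[i], audio_markers[i + 1]
    v0, v1 = video_markers[i], video_markers[i + 1]
    frac = (t - a0) / (a1 - a0)
    return v0 + frac * (v1 - v0)


def resample_frame_indices(audio_markers, video_markers, fps, n_source_frames):
    """Return, for each output frame, the index of the source frame to show
    so that the video markers line up with the audio markers."""
    duration = audio_markers[-1] - audio_markers[0]
    n_out = int(round(duration * fps)) + 1
    indices = []
    for k in range(n_out):
        t = audio_markers[0] + k / fps
        s = map_time(t, audio_markers, video_markers)
        # Nearest-frame selection, clamped to the valid frame range.
        idx = min(max(int(round(s * fps)), 0), n_source_frames - 1)
        indices.append(idx)
    return indices


if __name__ == "__main__":
    # Hypothetical markers in seconds: the middle lip event occurs at 0.5 s
    # in the original video but at 1.0 s in the replacement speech.
    audio = [0.0, 1.0, 2.0]
    video = [0.0, 0.5, 2.0]
    print(resample_frame_indices(audio, video, fps=10, n_source_frames=21))
```

Nearest-frame selection repeats or drops frames, which is one source of the motion jitter the abstract mentions. A scheme closer to the paper's stated goals would presumably interpolate between frames or adapt the sampling rate, but those details are not given here.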

Original language: English
Pages (from-to): 5329-5351
Number of pages: 23
Journal: Multimedia Tools and Applications
Issue number: 14
State: Published - Jul 1 2015

Bibliographical note

Funding Information:
Part of this material is based upon work supported by the National Science Foundation under Grant No. 1237134. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Publisher Copyright:
© 2014, Springer Science+Business Media New York.

Keywords


  • Audio segmentation
  • Computational multimedia
  • Frame interpolation
  • Lip reading
  • Positive feedforward
  • Video self modeling
  • Voice disorder
  • Voice imitation

ASJC Scopus subject areas

  • Software
  • Media Technology
  • Hardware and Architecture
  • Computer Networks and Communications

