Dysarthric speech augmentation using prosodic transformation and masking for subword end-to-end ASR

Mohammad Soleymanpour, Michael T. Johnson, Jeffrey Berry

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

End-to-end speech recognition systems are effective, but training an end-to-end model requires a large amount of data. For applications such as dysarthric speech recognition, sufficient training data is not available. In this paper, we propose a specialized data augmentation approach to improve the performance of an end-to-end dysarthric ASR system based on subword models. The proposed approach consists of two methods: prosodic transformation and time-feature masking. Prosodic transformation modifies the speaking rate and pitch of normal speech to control prosodic characteristics such as loudness, intonation, and rhythm. Time and feature masking applies masks to the Mel Frequency Cepstral Coefficients (MFCCs) for robustness-focused augmentation. Results show that augmenting normal speech with prosodic transformation plus masking decreases the character error rate (CER) by 5.4% and the word error rate (WER) by 5.6%, and additionally applying masking to dysarthric speech decreases CER by 11.3% and WER by 11.4%.
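The time and feature masking step can be illustrated with a minimal SpecAugment-style sketch. It assumes the MFCCs are stored as a list of frames (time) by coefficients (features); the function name `mask_mfcc`, the number and maximum width of the masks, and the zero fill value are hypothetical choices for illustration, since the abstract does not state the paper's masking parameters. The sketch covers only masking, not the prosodic transformation.

```python
import random
from typing import List, Optional

Matrix = List[List[float]]  # frames x MFCC coefficients


def mask_mfcc(
    mfcc: Matrix,
    num_time_masks: int = 2,
    max_time_width: int = 10,
    num_feature_masks: int = 2,
    max_feature_width: int = 3,
    fill: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Matrix:
    """Return a masked copy of an MFCC matrix (hypothetical helper).

    Mask counts, widths, and the fill value are illustrative assumptions,
    not the settings used in the paper.
    """
    rng = rng or random.Random()
    out = [list(frame) for frame in mfcc]  # copy so the input stays untouched
    num_frames = len(out)
    num_coeffs = len(out[0]) if out else 0

    # Time masking: overwrite a run of consecutive frames with the fill value.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, num_frames))
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            out[t] = [fill] * num_coeffs

    # Feature masking: overwrite a run of consecutive coefficients in every frame.
    for _ in range(num_feature_masks):
        width = rng.randint(0, min(max_feature_width, num_coeffs))
        start = rng.randint(0, num_coeffs - width)
        for frame in out:
            for c in range(start, start + width):
                frame[c] = fill

    return out


# Usage: 100 frames x 13 MFCCs of dummy values, masked with a fixed seed.
features = [[1.0] * 13 for _ in range(100)]
augmented = mask_mfcc(features, rng=random.Random(0))
```

Because the sketch masks a copy, the clean features remain available and a new random mask can be drawn each time an utterance is reused; whether the paper resamples masks this way is not stated in the abstract.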

Original language: English
Title of host publication: 2021 11th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2021
Pages: 42-46
Number of pages: 5
ISBN (Electronic): 9781665427869
State: Published - 2021
Event: 11th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2021 - Virtual, Bucharest, Romania
Duration: Oct 13 2021 to Oct 15 2021

Publication series

Name: 2021 11th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2021

Conference

Conference: 11th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2021
Country/Territory: Romania
City: Virtual, Bucharest
Period: 10/13/21 to 10/15/21

Bibliographical note

Publisher Copyright:
© 2021 IEEE.

Keywords

  • Data augmentation
  • Dysarthria
  • Dysarthric ASR
  • Speech recognition
  • Subword model

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Safety, Risk, Reliability and Quality
  • Communication
