Abstract
End-to-end speech recognition systems are effective, but training an end-to-end model requires a large amount of data, and for applications such as dysarthric speech recognition sufficient data is not available. In this paper, we propose a specialized data augmentation approach to improve the performance of an end-to-end dysarthric ASR system based on sub-word models. The approach combines two methods: prosodic transformation and time-feature masking. Prosodic transformation modifies the speaking rate and pitch of normal speech to control prosodic characteristics such as loudness, intonation, and rhythm. Time and feature masking applies masks to the Mel Frequency Cepstral Coefficients (MFCCs) to make the model more robust. Results show that augmenting normal speech with prosodic transformation plus masking decreases CER by 5.4% and WER by 5.6%, and that additionally masking dysarthric speech decreases CER by 11.3% and WER by 11.4%.
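The time and feature masking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mask_time_and_features`, the maximum mask widths, the single mask per axis, and the fill value of 0.0 are all assumptions chosen for illustration, and the MFCC matrix is represented as a plain list of frames rather than the output of a real feature extractor.

```python
import random


def mask_time_and_features(mfcc, max_time_width=10, max_feat_width=3,
                           mask_value=0.0, rng=None):
    """Return a copy of `mfcc` (frames x coefficients) with one time mask
    and one feature mask applied. All parameter defaults are assumptions."""
    rng = rng or random.Random()
    n_frames = len(mfcc)
    n_coeffs = len(mfcc[0]) if n_frames else 0
    masked = [list(frame) for frame in mfcc]  # copy so the input is untouched
    if n_frames == 0 or n_coeffs == 0:
        return masked

    # Time mask: overwrite a run of consecutive frames.
    t_width = rng.randint(0, min(max_time_width, n_frames))
    t_start = rng.randint(0, n_frames - t_width)
    for t in range(t_start, t_start + t_width):
        masked[t] = [mask_value] * n_coeffs

    # Feature mask: overwrite a run of consecutive coefficients in every frame.
    f_width = rng.randint(0, min(max_feat_width, n_coeffs))
    f_start = rng.randint(0, n_coeffs - f_width)
    for frame in masked:
        for f in range(f_start, f_start + f_width):
            frame[f] = mask_value

    return masked


# Usage: a toy 50-frame, 13-coefficient matrix filled with ones.
features = [[1.0] * 13 for _ in range(50)]
augmented = mask_time_and_features(features, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the augmentation reproducible. In a training pipeline the masks would typically be redrawn for every utterance so that the model sees a different corrupted view each time.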
Original language | English |
---|---|
Title of host publication | 2021 11th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2021 |
Pages | 42-46 |
Number of pages | 5 |
ISBN (Electronic) | 9781665427869 |
State | Published - 2021 |
Event | 11th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2021 - Virtual, Bucharest, Romania. Duration: Oct 13 2021 → Oct 15 2021 |
Publication series
Name | 2021 11th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2021 |
---|---|
Conference
Conference | 11th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2021 |
---|---|
Country/Territory | Romania |
City | Virtual, Bucharest |
Period | 10/13/21 → 10/15/21 |
Bibliographical note
Publisher Copyright: © 2021 IEEE.
Keywords
- Data augmentation
- Dysarthria
- Dysarthric ASR
- Speech recognition
- Subword model
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Safety, Risk, Reliability and Quality
- Communication