Abstract
Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility through slow, uncoordinated control of speech production muscles. Automatic speech recognition (ASR) systems may help dysarthric talkers communicate more effectively. Robust dysarthria-specific ASR requires sufficient training speech, which is not readily available. Recent advances in multi-speaker end-to-end text-to-speech (TTS) synthesis suggest the possibility of using synthesis for data augmentation. In this paper, we aim to improve multi-speaker end-to-end TTS systems to synthesize dysarthric speech for improved training of a dysarthria-specific DNN-HMM ASR. For the synthesized speech, we add dysarthria severity level and pause insertion mechanisms alongside other control parameters such as pitch, energy, and duration. Results show that a DNN-HMM model trained on additional synthetic dysarthric speech achieves a WER improvement of 12.2% compared to the baseline, and that adding the severity level and pause insertion controls decreases WER by 6.5%, showing the effectiveness of these parameters. Audio samples are available at https://mohammadelc.github.io/SpeechGroupUKY/.
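The results above are reported as word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions; the abstract does not describe the scoring tool used, nor whether the reported improvements are absolute or relative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Illustrative sketch only: tokenization is plain whitespace splitting,
    which is an assumption and not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words.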
Original language | English |
---|---|
Title of host publication | 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings |
Pages | 7382-7386 |
Number of pages | 5 |
ISBN (Electronic) | 9781665405409 |
State | Published - 2022 |
Event | 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022 - Hybrid, Singapore. Duration: May 22 2022 → May 27 2022 |
Publication series
Name | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
---|---|
Volume | 2022-May |
ISSN (Print) | 1520-6149 |
Conference
Conference | 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022 |
---|---|
Country/Territory | Singapore |
City | Hybrid |
Period | 5/22/22 → 5/27/22 |
Bibliographical note
Publisher Copyright: © 2022 IEEE
Keywords
- Data augmentation
- Dysarthria
- Speech-To-Text
- Synthesized speech
- Speech recognition
ASJC Scopus subject areas
- Software
- Signal Processing
- Electrical and Electronic Engineering