Abstract
This paper presents a Mispronunciation Detection and Diagnosis (MDD) system based on a range of Automatic Speech Recognition (ASR) models and feature types. The research has two goals: to assess how well speech recognition systems detect and diagnose the pronunciation errors common among non-native (L2) speakers of English, and to assess how much the information in Electromagnetic Articulography (EMA) data improves the performance of such MDD systems. To evaluate detection and diagnosis, the phoneme sequences recognized by the ASR models were aligned with human-labeled phonetic transcripts as well as with the original phonetic prompts. This three-way alignment determined the MDD-related metrics of each ASR system. The MDD systems used ASR engines based on GMM-HMM, DNN, and RNN architectures. Articulatory features derived from the Electromagnetic Articulography corpus of Mandarin-Accented English (EMA-MAE) were used alongside acoustic features to compare the performance of the MDD systems. The best-performing system, which combined acoustic and articulatory features, achieved an accuracy of 82.4%, a diagnostic accuracy of 75.8%, and a false rejection rate of 17.2%.
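The three-way comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the three phoneme sequences (prompt, human label, ASR output) have already been aligned position by position, and it uses the metric definitions commonly found in the MDD literature (true acceptance, false rejection, false acceptance, true rejection, with true rejections split into correct diagnoses and diagnosis errors). The abstract does not spell out its own definitions, so the function names, the metric formulas, and the toy phoneme triples below are all assumptions made for illustration.

```python
from collections import Counter


def mdd_counts(triples):
    """Count MDD outcomes from (canonical, annotated, recognized) phoneme triples.

    Assumption: the three sequences were aligned beforehand, so each triple
    describes one position in the prompt.
    """
    counts = Counter()
    for canonical, annotated, recognized in triples:
        if annotated == canonical:
            # The speaker pronounced the phoneme correctly.
            counts["TA" if recognized == canonical else "FR"] += 1
        elif recognized == canonical:
            # A mispronunciation that the recognizer missed.
            counts["FA"] += 1
        else:
            # A mispronunciation that the recognizer flagged.
            counts["TR"] += 1
            # The diagnosis is correct only if the recognizer names the same
            # phoneme as the human annotator.
            counts["CD" if recognized == annotated else "DE"] += 1
    return counts


def safe_div(numerator, denominator):
    """Return NaN instead of raising when a category is empty."""
    return numerator / denominator if denominator else float("nan")


def mdd_metrics(counts):
    """Derive rates from the outcome counts (assumed, commonly used definitions)."""
    ta, fr, fa, tr = (counts[k] for k in ("TA", "FR", "FA", "TR"))
    return {
        "detection_accuracy": safe_div(ta + tr, ta + fr + fa + tr),
        "diagnostic_accuracy": safe_div(counts["CD"], counts["CD"] + counts["DE"]),
        "false_rejection_rate": safe_div(fr, ta + fr),
        "false_acceptance_rate": safe_div(fa, fa + tr),
    }


# Hypothetical toy data: (prompt phoneme, human label, ASR output).
toy_triples = [
    ("th", "th", "th"),
    ("th", "th", "s"),
    ("th", "s", "s"),
    ("r", "l", "w"),
    ("v", "w", "v"),
    ("ae", "ae", "ae"),
]

counts = mdd_counts(toy_triples)
metrics = mdd_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A real evaluation would also need to handle insertions and deletions produced by the alignment step, for example by introducing a gap symbol as a fourth kind of label. That is omitted here to keep the sketch short.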
Original language | English |
---|---|
Title of host publication | 2021 11th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2021 |
Pages | 62-67 |
Number of pages | 6 |
ISBN (Electronic) | 9781665427869 |
State | Published - 2021 |
Event | 11th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2021 - Virtual, Bucharest, Romania. Duration: Oct 13 2021 → Oct 15 2021 |
Publication series
Name | 2021 11th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2021 |
---|
Conference
Conference | 11th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2021 |
---|---|
Country/Territory | Romania |
City | Virtual, Bucharest |
Period | 10/13/21 → 10/15/21 |
Bibliographical note
Publisher Copyright: © 2021 IEEE.
Keywords
- Articulatory features
- Automatic speech recognition (ASR)
- Mispronunciation detection and diagnosis
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Safety, Risk, Reliability and Quality
- Communication