Abstract
In this paper we introduce a new speaker-independent method for acoustic-to-articulatory inversion. The proposed architecture, Speaker Independent-Articulatory WaveNet (SI-AWN), models the relationship between acoustic and articulatory features by conditioning the articulatory trajectories on acoustic features, and then applies the learned model to unseen target speakers. We evaluate SI-AWN on the Electromagnetic Articulography corpus of Mandarin Accented English (EMA-MAE), pooling acoustic-articulatory data from 35 reference speakers and testing on target speakers that include male, female, native, and non-native speakers. The results suggest that SI-AWN improves acoustic-to-articulatory inversion performance by 21 percent compared with the baseline Maximum Likelihood Linear Regression-Parallel Reference Speaker Weighting (MLLR-PRSW) method. To the best of our knowledge, this is the first application of a WaveNet-like synthesis approach to speaker-independent acoustic-to-articulatory inversion, and the results are comparable to or better than those of the best currently published systems.
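The abstract does not give SI-AWN's layer configuration. As a rough illustration of what "conditioning the articulatory trajectories on acoustic features" can mean in a WaveNet-like model, the following pure-Python sketch stacks gated dilated causal convolutions over past articulatory frames and adds a per-frame projection of the acoustic features inside each filter and gate. This is not the authors' implementation. The class name `ConditionedWaveNetSketch`, the channel count, the dilations (1, 2, 4), the kernel size of 2, the linear output layer, and the random weights are all hypothetical choices, and there is no training loop.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def causal_dilated_conv(x, taps, dilation):
    """Causal dilated 1-D convolution: tap k reads frame t - k*dilation.

    Frames before t = 0 are treated as zeros, so output t never sees x[t+1:].
    """
    out = []
    for t in range(len(x)):
        acc = [0.0] * len(taps[0])
        for k, w in enumerate(taps):
            src = t - k * dilation
            if src >= 0:
                acc = [a + b for a, b in zip(acc, matvec(w, x[src]))]
        out.append(acc)
    return out


class ConditionedWaveNetSketch:
    """Hypothetical toy stack: gated dilated causal convolutions whose filter
    and gate receive a per-frame projection of the acoustic features."""

    def __init__(self, art_dim, ac_dim, channels=8, dilations=(1, 2, 4), seed=0):
        rng = random.Random(seed)
        c = channels
        self.inp = rand_matrix(c, art_dim, rng)  # 1x1 input projection
        self.layers = []
        for d in dilations:
            self.layers.append({
                "d": d,
                "filt": [rand_matrix(c, c, rng) for _ in range(2)],  # kernel size 2
                "gate": [rand_matrix(c, c, rng) for _ in range(2)],
                "cond_f": rand_matrix(c, ac_dim, rng),  # acoustic -> filter
                "cond_g": rand_matrix(c, ac_dim, rng),  # acoustic -> gate
                "res": rand_matrix(c, c, rng),
                "skip": rand_matrix(c, c, rng),
            })
        self.out = rand_matrix(art_dim, c, rng)  # linear map: skip sum -> articulators

    def forward(self, articulatory, acoustic):
        """Predict frame t from articulatory frames < t and acoustic frames <= t."""
        T, art_dim = len(articulatory), len(articulatory[0])
        # Shift right by one frame so the prediction for t cannot see frame t.
        prev = [[0.0] * art_dim] + articulatory[:-1]
        h = [matvec(self.inp, f) for f in prev]
        skip_sum = [[0.0] * len(h[0]) for _ in range(T)]
        for layer in self.layers:
            f = causal_dilated_conv(h, layer["filt"], layer["d"])
            g = causal_dilated_conv(h, layer["gate"], layer["d"])
            new_h = []
            for t in range(T):
                cf = matvec(layer["cond_f"], acoustic[t])
                cg = matvec(layer["cond_g"], acoustic[t])
                # Gated activation unit: tanh(filter + cond) * sigmoid(gate + cond).
                z = [math.tanh(a + b) / (1.0 + math.exp(-(c + e)))
                     for a, b, c, e in zip(f[t], cf, g[t], cg)]
                new_h.append([x + r for x, r in zip(h[t], matvec(layer["res"], z))])
                skip_sum[t] = [s + k for s, k in zip(skip_sum[t], matvec(layer["skip"], z))]
            h = new_h
        return [matvec(self.out, frame) for frame in skip_sum]


# Usage with arbitrary sizes and random stand-in data (not EMA-MAE features).
rng = random.Random(1)
T, ART, AC = 12, 6, 13
art = [[rng.gauss(0, 1) for _ in range(ART)] for _ in range(T)]
ac = [[rng.gauss(0, 1) for _ in range(AC)] for _ in range(T)]
model = ConditionedWaveNetSketch(ART, AC)
pred = model.forward(art, ac)
```

Because the articulatory input is shifted by one frame and every convolution is causal, the prediction for frame t depends only on earlier articulatory frames and on acoustic frames up to t. At inference time in an autoregressive setup, those earlier articulatory frames would be the model's own previous outputs, generated one frame at a time.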
Original language | English |
---|---|
Title of host publication | 2021 11th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2021 |
Pages | 156-161 |
Number of pages | 6 |
ISBN (Electronic) | 9781665427869 |
State | Published - 2021 |
Event | 11th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2021 - Virtual, Bucharest, Romania. Duration: Oct 13, 2021 → Oct 15, 2021 |
Publication series
Name | 2021 11th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2021 |
---|---|
Conference
Conference | 11th International Conference on Speech Technology and Human-Computer Dialogue, SpeD 2021 |
---|---|
Country/Territory | Romania |
City | Virtual, Bucharest |
Period | 10/13/21 → 10/15/21 |
Bibliographical note
Publisher Copyright: © 2021 IEEE.
Keywords
- Acoustic-to-articulatory inversion
- Deep autoregressive model
- Speaker-independent
- WaveNet
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Safety, Risk, Reliability and Quality
- Communication