This paper presents Articulatory-WaveNet, a novel deep autoregressive method for acoustic-to-articulatory inversion. Traditional methods such as the Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) do not model the frame-level interdependency of observations. We address this problem with dilated causal convolutional layers that predict articulatory trajectories from acoustic feature sequences. The new model achieves an average Root Mean Square Error (RMSE) of 1.08 mm and a correlation of 0.82 on the English speaker subset of the ElectroMagnetic Articulography-Mandarin Accented English (EMA-MAE) corpus, an improvement of 59% in RMSE and 30% in correlation over the previous GMM-HMM based inversion model. To the best of our knowledge, this is the first application of a WaveNet synthesis approach to acoustic-to-articulatory inversion, and the results are comparable to or better than the best currently published systems.
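The abstract names two ingredients that are easy to illustrate in isolation: dilated causal convolution and the RMSE/correlation evaluation metrics. The following is a minimal NumPy sketch, not the paper's implementation. The function names (`dilated_causal_conv1d`, `receptive_field`, `rmse`, `pearson_correlation`), the single-channel input, and the kernel size and dilation schedule in the usage lines are all assumptions chosen for illustration.

```python
import numpy as np


def dilated_causal_conv1d(x, weights, dilation):
    """Single-channel dilated causal convolution (illustrative only).

    Computes y[t] = sum_k weights[k] * x[t - k * dilation], treating
    samples before t = 0 as zero, so y[t] never depends on future input.
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    num_steps = len(x)
    pad = (len(weights) - 1) * dilation
    # Left-pad with zeros so every output step sees only past samples.
    x_padded = np.concatenate([np.zeros(pad), x])
    y = np.zeros(num_steps)
    for k, w in enumerate(weights):
        start = pad - k * dilation
        y += w * x_padded[start:start + num_steps]
    return y


def receptive_field(kernel_size, dilations):
    """Number of input steps visible to one output of a stacked network."""
    return 1 + (kernel_size - 1) * sum(dilations)


def rmse(pred, target):
    """Root mean square error between two trajectories of equal length."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))


def pearson_correlation(pred, target):
    """Pearson correlation coefficient between two trajectories."""
    return float(np.corrcoef(pred, target)[0, 1])


# Hypothetical usage: kernel size 2 with dilations doubling per layer.
example_output = dilated_causal_conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], dilation=2)
example_field = receptive_field(kernel_size=2, dilations=[1, 2, 4, 8])
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth while the parameter count grows only linearly, which is the usual motivation for this layer type in WaveNet-style models.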
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - 2020
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: Oct 25 2020 → Oct 29 2020
Bibliographical note
Publisher Copyright:
Copyright © 2020 ISCA
Copyright 2020 Elsevier B.V., All rights reserved.
Keywords
- Acoustic-to-articulatory inversion
- Deep autoregressive model
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Modeling and Simulation