Acoustic-to-articulatory inversion with deep autoregressive articulatory-wavenet

Narjes Bozorg, Michael T. Johnson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

This paper presents a novel deep autoregressive method for Acoustic-to-Articulatory Inversion called Articulatory-WaveNet. In traditional methods such as Gaussian Mixture Model-Hidden Markov Model (GMM-HMM), mapping the frame-level interdependency of observations has not been considered. We address this problem by introducing the Articulatory-WaveNet with dilated causal convolutional layers to predict the articulatory trajectories from acoustic feature sequences. This new model has an average Root Mean Square Error (RMSE) of 1.08 mm and a correlation of 0.82 on the English speaker subset of the ElectroMagnetic Articulography-Mandarin Accented English (EMA-MAE) corpus. Articulatory-WaveNet represents an improvement of 59% for RMSE and 30% for correlation over the previous GMM-HMM based inversion model. To the best of our knowledge, this paper introduces the first application of a WaveNet synthesis approach to the problem of Acoustic-to-Articulatory Inversion, and results are comparable to or better than the best currently published systems.
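
The abstract names two ingredients that a small example may make more concrete: dilated causal convolution, and the RMSE and correlation scores used for evaluation. The sketch below is illustrative only and is not the authors' implementation. The function names, kernel size, dilation schedule, and toy numbers are all assumptions made for this example, since the abstract does not specify them.

```python
import math


def dilated_causal_conv1d(x, weights, dilation):
    """Hypothetical helper: y[t] = sum_k weights[k] * x[t - k * dilation].

    Only current and past samples are read (causal); indices before the
    start of the sequence are treated as zero.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of input frames seen by a stack of dilated causal layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


def rmse(pred, target):
    """Root mean square error between two equal-length trajectories."""
    n = len(pred)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / n)


def pearson(pred, target):
    """Pearson correlation between two equal-length trajectories."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    return cov / math.sqrt(var_p * var_t)


if __name__ == "__main__":
    # Toy input sequence and a two-tap kernel (assumed values).
    frames = [1.0, 2.0, 3.0, 4.0]
    print(dilated_causal_conv1d(frames, [1.0, 1.0], dilation=2))

    # Assumed dilation schedule 1, 2, 4, 8 with kernel size 2.
    print(receptive_field(2, [1, 2, 4, 8]))

    # Toy predicted vs. measured sensor positions in mm (made-up numbers).
    predicted = [1.0, 2.0, 3.0]
    measured = [1.0, 2.0, 5.0]
    print(rmse(predicted, measured), pearson(predicted, measured))
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth. This is the usual motivation for dilated stacks in WaveNet-style models, because each predicted frame can then depend on a long span of preceding frames.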

Original language: English
Title of host publication: Interspeech 2020
Pages: 3725-3729
Number of pages: 5
State: Published - 2020
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: Oct 25 2020 to Oct 29 2020

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2020-October
ISSN (Print): 2308-457X
ISSN (Electronic): 1990-9772

Conference

Conference: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Country/Territory: China
City: Shanghai
Period: 10/25/20 to 10/29/20

Bibliographical note

Publisher Copyright:
Copyright © 2020 ISCA

Keywords

  • Acoustic-to-articulatory inversion
  • Deep autoregressive model
  • Speaker-dependent
  • WaveNet

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
