Minimum mean-squared error estimation of mel-frequency cepstral coefficients using a novel distortion model

Kevin M. Indrebo, Richard J. Povinelli, Michael T. Johnson

Research output: Contribution to journal › Article › peer-review

21 Scopus citations

Abstract

In this paper, a new method for statistical estimation of Mel-frequency cepstral coefficients (MFCCs) in noisy speech signals is proposed. Previous research has shown that model-based feature domain enhancement of speech signals for use in robust speech recognition can improve recognition accuracy significantly. These methods, which typically work in the log spectral or cepstral domain, must face the high complexity of distortion models caused by the nonlinear interaction of speech and noise in these domains. In this paper, an additive cepstral distortion model (ACDM) is developed, and used with a minimum mean-squared error (MMSE) estimator for recovery of MFCC features corrupted by additive noise. The proposed ACDM-MMSE estimation algorithm is evaluated on the Aurora2 database, and is shown to provide significant improvement in word recognition accuracy over the baseline.
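The abstract does not give the ACDM equations, so the following is only a generic illustration of MMSE estimation under an additive distortion model, not the authors' algorithm. It assumes the observed feature vector is y = x + d, that the clean features x follow a Gaussian mixture prior with diagonal covariances, and that the distortion d is Gaussian and independent of x. The function name `mmse_additive_gmm` and all of its parameters are hypothetical and chosen for this sketch.

```python
import numpy as np


def mmse_additive_gmm(y, weights, means, variances, d_mean, d_var):
    """Illustrative MMSE estimate of x given y = x + d (diagonal covariances).

    Assumptions made for this sketch only (not taken from the paper):
      x ~ sum_k weights[k] * N(means[k], diag(variances[k]))
      d ~ N(d_mean, diag(d_var)), independent of x
    Shapes: y, d_mean, d_var are (D,); means, variances are (K, D); weights is (K,).
    """
    y = np.asarray(y, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    d_mean = np.asarray(d_mean, dtype=float)
    d_var = np.asarray(d_var, dtype=float)

    # Under mixture component k, y is Gaussian with these moments.
    y_mean = means + d_mean          # (K, D)
    y_var = variances + d_var        # (K, D)

    # Log-likelihood of y under each component, then posterior p(k | y).
    log_lik = -0.5 * np.sum(
        np.log(2.0 * np.pi * y_var) + (y - y_mean) ** 2 / y_var, axis=1
    )
    log_post = np.log(weights) + log_lik
    log_post -= np.max(log_post)     # stabilize before exponentiating
    post = np.exp(log_post)
    post /= post.sum()

    # Per-component conditional mean E[x | y, k] (Wiener-style gain).
    gain = variances / y_var
    cond_means = means + gain * (y - y_mean)

    # MMSE estimate: posterior-weighted average of the conditional means.
    return post @ cond_means
```

With a single zero-mean, unit-variance component and zero-mean, unit-variance distortion, the estimate is y / 2: the observation is shrunk halfway toward the prior mean. As the distortion variance goes to zero, the estimate approaches y minus the distortion mean.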

Original language: English
Pages (from-to): 1654-1661
Number of pages: 8
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 16
Issue number: 8
State: Published - Nov 2008

Bibliographical note

Funding Information:
Manuscript received January 15, 2008; revised May 15, 2008. Current version published October 17, 2008. This work was supported by the Graduate Assistance in Areas of National Need (GAANN) program, funded by the U.S. Department of Education. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Abeer Alwan.

Keywords

  • Parameter estimation
  • Robustness
  • Speech recognition

ASJC Scopus subject areas

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering
