Bayesian speaker adaptation based on a new hierarchical probabilistic model

Wen Lin Zhang, Wei Qiang Zhang, Bi Cheng Li, Dan Qu, Michael T. Johnson

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

In this paper, a new hierarchical Bayesian speaker adaptation method called HMAP is proposed. It combines the advantages of three conventional algorithms, maximum a posteriori (MAP), maximum-likelihood linear regression (MLLR), and eigenvoice, resulting in excellent performance across a wide range of adaptation conditions. The new method efficiently utilizes intra-speaker and inter-speaker correlation information by modeling phone and speaker subspaces in a consistent hierarchical Bayesian way. The phone variations for a specific speaker are assumed to be located in a low-dimensional subspace. The phone coordinate, which is shared among different speakers, implicitly contains the intra-speaker correlation information. For a specific speaker, the phone variations, represented by speaker-dependent eigenphones, are concatenated into a supervector. The eigenphone supervector space is also a low-dimensional speaker subspace, which contains inter-speaker correlation information. Using principal component analysis (PCA), a new hierarchical probabilistic model for the generation of the speech observations is obtained. Speaker adaptation based on the new hierarchical model is derived using the maximum a posteriori criterion in a top-down manner. Both batch adaptation and online adaptation schemes are proposed. With tuned parameters, the new method can handle varying amounts of adaptation data automatically and efficiently. Experimental results on a Mandarin Chinese continuous speech recognition task show good performance under all testing conditions.
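As background for the MAP criterion mentioned in the abstract, the following is a minimal sketch of the conventional MAP update of a single Gaussian mean, one of the three baseline algorithms named above. It is not the HMAP method itself. The function name `map_mean`, the prior weight `tau`, and the hard assignment of frames to one Gaussian are assumptions made for illustration only. The sketch shows how a MAP estimate can move from the prior towards the data as more adaptation data becomes available.

```python
def map_mean(prior_mean, frames, tau):
    """Conventional MAP estimate of a Gaussian mean (illustrative only).

    prior_mean: speaker-independent mean, a list of floats
    frames:     adaptation vectors assumed to belong to this Gaussian
    tau:        hypothetical prior weight (tau > 0); larger values trust the prior more
    """
    n = len(frames)
    dim = len(prior_mean)
    # Per-dimension sum of the adaptation frames.
    sums = [sum(frame[d] for frame in frames) for d in range(dim)]
    # Weighted interpolation between the prior mean and the data.
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


prior = [0.0, 0.0]
speaker_frames = [[1.0, 2.0]] * 3

# With few frames the estimate stays close to the prior mean.
few = map_mean(prior, speaker_frames, tau=10.0)
# With many frames it approaches the sample mean [1.0, 2.0].
many = map_mean(prior, speaker_frames * 100, tau=10.0)
```

With no adaptation frames the function returns the prior mean unchanged, which is the behaviour one would expect from any MAP-style scheme when data is absent.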

Original language: English
Article number: 6178005
Pages (from-to): 2002-2015
Number of pages: 14
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 20
Issue number: 7
DOIs
State: Published - 2012

Bibliographical note

Funding Information:
Manuscript received June 28, 2011; revised December 30, 2011; accepted March 13, 2012. Date of publication April 05, 2012; date of current version May 07, 2012. This work was supported in part by the National Natural Science Foundation of China under Grants 60872142, 61005019, and 61175017. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Brian Mak.

Funding

Funders                                                Funder number
National Natural Science Foundation of China (NSFC)    61005019, 61175017, 60872142

    Keywords

    • Eigenphones
    • eigenvoices
    • hierarchical model
    • maximum a posteriori (MAP)
    • speaker adaptation

    ASJC Scopus subject areas

    • Acoustics and Ultrasonics
    • Electrical and Electronic Engineering
