Abstract
The shifted delta cepstrum (SDC) is a widely used feature extraction method for language recognition (LRE). Because it incorporates multiple frames and therefore spans a wide temporal context, SDC outperforms traditional delta and acceleration feature vectors. However, it also introduces correlation into the concatenated feature vector, which increases redundancy and may degrade the performance of backend classifiers. In this paper, we first propose a time-frequency cepstral (TFC) feature vector, which is obtained by performing a temporal discrete cosine transform (DCT) on the cepstrum matrix and selecting the transformed elements in a zigzag scan order. Beyond this, we increase discriminability by applying heteroscedastic linear discriminant analysis (HLDA) to the full cepstrum matrix. By imposing block diagonal matrix constraints, the large HLDA problem is reduced to several smaller HLDA problems, yielding a block diagonal HLDA (BDHLDA) algorithm with much lower computational complexity. Finally, the BDHLDA method is extended to the GMM domain, using the simpler TFC features during re-estimation to significantly improve computation speed. Experiments on the NIST 2003 and 2007 LRE evaluation corpora show that TFC is more effective than SDC, and that the GMM-based BDHLDA yields a lower equal error rate (EER) and minimum average cost (Cavg) than either the TFC or the SDC approach.
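The TFC construction described in the abstract (a temporal DCT over a block of cepstral frames, followed by zigzag selection of the transformed elements) can be illustrated with a minimal sketch. The abstract does not state the context width, the number of retained elements, the DCT normalization, or the exact zigzag convention, so the `context` and `keep` parameters, the orthonormal DCT-II basis, and the JPEG-style scan below are all assumptions, and the function names are hypothetical rather than taken from the paper.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis of shape (n, n); row k is the k-th basis vector.

    Assumption: the abstract does not specify a DCT normalization.
    """
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * t + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)
    return basis


def zigzag_indices(rows, cols):
    """(row, col) pairs ordered by anti-diagonal, alternating direction.

    Assumption: a JPEG-style scan; the paper may use a different convention.
    """
    order = []
    for s in range(rows + cols - 1):
        diag = [(i, s - i) for i in range(rows) if 0 <= s - i < cols]
        order.extend(diag if s % 2 else diag[::-1])
    return order


def tfc_features(cepstra, context=5, keep=10):
    """Sketch of TFC-style features from cepstra of shape (frames, coeffs).

    For each sliding block of `context` frames, build a (coeffs, context)
    cepstrum matrix, apply the DCT along the time axis, and keep the first
    `keep` elements in zigzag order. `context` and `keep` are illustrative
    values, not settings taken from the paper.
    """
    n_frames, n_coeffs = cepstra.shape
    basis = dct_matrix(context)
    rows, cols = zip(*zigzag_indices(n_coeffs, context)[:keep])
    rows, cols = list(rows), list(cols)
    out = []
    for start in range(n_frames - context + 1):
        block = cepstra[start:start + context].T  # (coeffs, context)
        transformed = block @ basis.T             # DCT along the time axis
        out.append(transformed[rows, cols])
    return np.array(out)
```

For example, `tfc_features(np.random.randn(100, 7), context=5, keep=10)` returns an array of shape `(96, 10)`. Because the zigzag scan visits low-order cepstral and low-order temporal indices first, truncating it keeps the slowly varying components of the block and discards the rest.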
Original language | English |
---|---|
Article number | 5444973 |
Pages (from-to) | 266-276 |
Number of pages | 11 |
Journal | IEEE Transactions on Audio, Speech and Language Processing |
Volume | 19 |
Issue number | 2 |
State | Published - 2011 |
Bibliographical note
Funding Information: Manuscript received October 07, 2009; revised March 24, 2010; accepted March 24, 2010. Date of publication April 08, 2010; date of current version October 27, 2010. This work was supported in part by the National Natural Science Foundation of China and Microsoft Research Asia under Grant 60776800, in part by the National Natural Science Foundation of China and Research Grants Council under Grant 60931160443, and in part by the National High Technology Development Program of China under Grants 2006AA010101, 2007AA04Z223, 2008AA02Z414, and 2008AA040201. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Gokhan Tur.
Keywords
- Language recognition (LRE)
- block diagonal heteroscedastic linear discriminant analysis (BDHLDA)
- time-frequency cepstrum (TFC)
ASJC Scopus subject areas
- Acoustics and Ultrasonics
- Electrical and Electronic Engineering