Homogenous ensemble phonotactic language recognition based on SVM supervector reconstruction

Wei Wei Liu, Wei Qiang Zhang, Michael T. Johnson, Jia Liu

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Currently, acoustic and phonotactic spoken language recognition (SLR) systems are widely used for language recognition. To achieve better performance, researchers combine multiple subsystems, and the combined result is often much better than that of a single SLR system. Phonotactic SLR subsystems may vary in their acoustic feature vectors or include multiple language-specific phone recognizers and different acoustic models. These methods achieve good performance but usually at a high computational cost. In this paper, a new form of diversification for phonotactic language recognition systems is proposed that uses vector space models obtained by support vector machine (SVM) supervector reconstruction (SSR). In this architecture, the subsystems share the same feature extraction, decoding, and N-gram counting preprocessing steps, but each models the data in a different vector space by using the SSR algorithm, without significant additional computation. We term this a homogeneous ensemble phonotactic language recognition (HEPLR) system. The system integrates three different SSR algorithms: relative SVM supervector reconstruction, functional SVM supervector reconstruction, and perturbing SVM supervector reconstruction. All of the algorithms are combined using a linear discriminant analysis-maximum mutual information (LDA-MMI) backend to improve language recognition evaluation (LRE) accuracy. Evaluated on the National Institute of Standards and Technology (NIST) LRE 2009 task, the proposed HEPLR system achieves better performance than a baseline phone recognition-vector space modeling (PR-VSM) system with minimal extra computational cost. The HEPLR system yields equal error rates (EERs) of 1.39%, 3.63%, and 14.79% for the 30-, 10-, and 3-s test conditions, respectively, representing relative improvements of 6.06%, 10.15%, and 10.53% over the baseline system.
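The relationship between the reported EERs and the relative improvements can be checked with a little arithmetic. The sketch below assumes the common definition of relative improvement, (baseline − system) / baseline, which the abstract does not state explicitly. The baseline EERs it derives are therefore only implied values, subject to rounding in the reported figures, and are not numbers quoted from the paper. The function and variable names are illustrative.

```python
# Reported HEPLR EERs (%) and relative improvements (%) from the abstract,
# keyed by test-segment duration in seconds.
reported = {30: (1.39, 6.06), 10: (3.63, 10.15), 3: (14.79, 10.53)}


def implied_baseline_eer(system_eer, relative_improvement_pct):
    """Invert rel = (baseline - system) / baseline to recover the baseline EER.

    Assumption: this is the definition of relative improvement used;
    the abstract does not spell it out.
    """
    return system_eer / (1.0 - relative_improvement_pct / 100.0)


# Implied (not reported) baseline EER for each test condition.
implied = {
    duration: implied_baseline_eer(eer, rel)
    for duration, (eer, rel) in reported.items()
}

for duration, baseline in implied.items():
    print(f"{duration:>2}-s condition: implied baseline EER of about {baseline:.2f}%")
```

Under that assumption, the implied baseline EERs are roughly 1.48%, 4.04%, and 16.53% for the 30-, 10-, and 3-s conditions.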

Original language: English
Article number: 42
Pages (from-to): 1-13
Number of pages: 13
Journal: Eurasip Journal on Audio, Speech, and Music Processing
Volume: 2014
Issue number: 1
State: Published - 2014

Bibliographical note

Publisher Copyright:
© 2014, Liu et al.; licensee Springer.

Funding

This project is supported by the National Natural Science Foundation of China under grant nos. 61370034, 61273268, and 61403224.

Funder: National Natural Science Foundation of China (NSFC)
Funder numbers: 61403224, 61273268, 61370034

    Keywords

    • Phone recognition-vector space modeling (PR-VSM)
    • Phonotactic language recognition
    • Support vector machine (SVM) supervector reconstruction

    ASJC Scopus subject areas

    • Acoustics and Ultrasonics
    • Electrical and Electronic Engineering
