Phone lattice reconstruction for embedded language recognition in LVCSR

Yuxiang Shan, Yan Deng, Jia Liu, Michael T. Johnson

Research output: Contribution to journal › Article › peer-review


An increasing number of multilingual applications require language recognition (LRE) as a frontend, but can tolerate only a low additional computational cost. This article demonstrates a novel architecture for embedding phone-based language recognition into a large vocabulary continuous speech recognition (LVCSR) decoder by sharing the same decoding process but generating separate lattices. To compensate for the prior bias introduced by the pronunciation dictionary and the language model of the LVCSR decoder, three different phone lattice reconstruction algorithms are proposed. The underlying goal of these algorithms is to override pronunciation and grammar restrictions in order to provide richer phonetic information. All of the new algorithms incorporate a vector space modeling backend for improved LRE accuracy. Evaluated on a Mandarin/English detection task, the proposed integrated LVCSR-LRE system using the frame-expanded N-best phone lattice achieves performance comparable to that of a state-of-the-art phone recognition-vector space modeling (PRVSM) system, but with an added computational cost three times lower than that of a separate PRVSM system.
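As a rough illustration of what a vector space modeling backend operates on, the sketch below turns an N-best list of phone sequences into a normalized phone n-gram vector and compares two such vectors. It is a minimal sketch under stated assumptions, not the system described in the article: the function names `ngram_vector` and `cosine`, the use of hypothesis weights that sum to 1, and the cosine comparison (standing in for whatever classifier the backend actually uses) are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def ngram_vector(nbest, n=2):
    """Build an L2-normalized phone n-gram vector from an N-best list.

    nbest: list of (phone_sequence, weight) pairs. The weights are assumed
    (for this sketch) to be hypothesis posteriors that sum to 1.
    """
    counts = Counter()
    for phones, weight in nbest:
        # Each n-gram contributes its hypothesis weight (an expected count).
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i:i + n])] += weight
    norm = sqrt(sum(v * v for v in counts.values()))
    return {k: v / norm for k, v in counts.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity of two L2-normalized sparse vectors."""
    return sum(w * v.get(k, 0.0) for k, w in u.items())


# Hypothetical two-best phone hypotheses for one utterance.
utterance = [(["n", "i", "h", "ao"], 0.75), (["n", "i", "h", "aw"], 0.25)]
vec = ngram_vector(utterance)
```

In a full system, vectors like `vec` would be scored by a trained per-language classifier; the cosine function here only shows that the representation is an ordinary sparse vector.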

Original language: English
Article number: 15
Journal: Eurasip Journal on Audio, Speech, and Music Processing
Issue number: 1
State: Published - 2012

Bibliographical note

Funding Information:
This research was supported by the National Natural Science Foundation of China (NSFC) (Nos. 90920302 and 61005019), by the NSFC and Research Grants Council (RGC) of Hong Kong (No. 60931160443), and in part by the National High Technology Development Program of China (863 Program) (No. 2008AA040201).


Keywords

  • Language recognition
  • Lattice reconstruction
  • Speech recognition
  • Vector space modeling

ASJC Scopus subject areas

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering

