Abstract
A novel method combining filter banks and reconstructed phase spaces (RPSs) is proposed for the modeling and classification of speech. Reconstructed phase spaces, which are based on dynamical systems theory, have an advantage over spectral-based analysis methods in that they can capture nonlinear or higher-order statistics. Recent work has shown that the natural measure of a reconstructed phase space can be used for modeling and classification of phonemes. In this work, sub-banding of speech, which has previously been examined for recognition of noise-corrupted speech, is studied in combination with phase space reconstruction. This sub-banding, which is motivated by empirical psychoacoustical studies, is shown to dramatically improve the phoneme classification accuracy of RPS-based approaches. Experiments that examine the performance of fused sub-banded reconstructed phase spaces for phoneme classification are presented. Comparisons against a cepstral-based classifier show that the proposed approach is competitive with state-of-the-art methods for modeling and classification of phonemes. Combining cepstral-based features with the sub-band RPS features improves on a cepstral-only baseline.
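The two building blocks named in the abstract, sub-banding and phase space reconstruction, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the filter bank design, the embedding dimension, or the time lag, so the FFT-mask band-pass, the band edges, `dimension=3`, and `lag=6` below are all assumptions chosen for illustration, and the function names are hypothetical. Modeling the natural measure of each phase space (the distribution of its points) and fusing the sub-band classifiers are not shown.

```python
import numpy as np


def bandpass_fft(signal, sample_rate, low_hz, high_hz):
    """Crude band-pass (assumed stand-in for a real filter bank):
    zero every FFT bin outside [low_hz, high_hz) and invert."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    mask = (freqs >= low_hz) & (freqs < high_hz)
    return np.fft.irfft(spectrum * mask, n=len(signal))


def reconstruct_phase_space(signal, dimension, lag):
    """Time-delay embedding: row n is
    [x[n], x[n + lag], ..., x[n + (dimension - 1) * lag]]."""
    signal = np.asarray(signal, dtype=float)
    n_points = len(signal) - (dimension - 1) * lag
    if n_points <= 0:
        raise ValueError("signal too short for this dimension and lag")
    columns = [signal[i * lag : i * lag + n_points] for i in range(dimension)]
    return np.column_stack(columns)


def subband_phase_spaces(signal, sample_rate, band_edges_hz, dimension=3, lag=6):
    """Embed each sub-band separately; dimension and lag are assumed values."""
    spaces = []
    for low_hz, high_hz in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        band = bandpass_fft(signal, sample_rate, low_hz, high_hz)
        spaces.append(reconstruct_phase_space(band, dimension, lag))
    return spaces


if __name__ == "__main__":
    sample_rate = 16000  # assumed sampling rate
    t = np.arange(1024) / sample_rate
    # Synthetic two-tone frame standing in for a speech segment.
    frame = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)
    spaces = subband_phase_spaces(frame, sample_rate, [0, 1000, 2000, 4000, 8000])
    for space in spaces:
        print(space.shape)  # one (points, dimension) array per sub-band
```

Each returned array holds one point cloud per sub-band. In an approach like the one described, a statistical model would be fit to each cloud and the per-band scores combined, but those choices are outside what the abstract specifies.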
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 760-774 |
Number of pages | 15 |
Journal | Speech Communication |
Volume | 48 |
Issue number | 7 |
State | Published - Jul 2006 |
Bibliographical note
Funding Information: This material is based on work supported by the National Science Foundation under Grant No. IIS-0113508 and the Department of Education GAANN Fellowship.
Keywords
- Dynamical systems
- Nonlinear signal processing
- Speech recognition
- Sub-bands
ASJC Scopus subject areas
- Software
- Modeling and Simulation
- Communication
- Language and Linguistics
- Linguistics and Language
- Computer Vision and Pattern Recognition
- Computer Science Applications