TY - GEN
T1 - Physiologically-motivated feature extraction for speaker identification
AU - Wang, Jianglin
AU - Johnson, Michael T.
PY - 2014
Y1 - 2014
N2 - This paper introduces the use of three physiologically-motivated features for speaker identification: Residual Phase Cepstrum Coefficients (RPCC), Glottal Flow Cepstrum Coefficients (GLFCC), and Teager Phase Cepstrum Coefficients (TPCC). These features capture speaker-discriminative characteristics from different aspects of glottal source excitation patterns. The proposed physiologically-driven features give better results with lower model complexity and provide complementary information that can improve overall system performance even for larger amounts of data. Results on speaker identification using the YOHO corpus demonstrate that these physiologically-driven features are both more accurate than, and complementary to, traditional mel-frequency cepstral coefficients (MFCC). In particular, incorporating the proposed glottal source features offers significant overall improvement in the robustness and accuracy of speaker identification.
AB - This paper introduces the use of three physiologically-motivated features for speaker identification: Residual Phase Cepstrum Coefficients (RPCC), Glottal Flow Cepstrum Coefficients (GLFCC), and Teager Phase Cepstrum Coefficients (TPCC). These features capture speaker-discriminative characteristics from different aspects of glottal source excitation patterns. The proposed physiologically-driven features give better results with lower model complexity and provide complementary information that can improve overall system performance even for larger amounts of data. Results on speaker identification using the YOHO corpus demonstrate that these physiologically-driven features are both more accurate than, and complementary to, traditional mel-frequency cepstral coefficients (MFCC). In particular, incorporating the proposed glottal source features offers significant overall improvement in the robustness and accuracy of speaker identification.
KW - Glottal source excitation and GMM-UBM
KW - Speaker distinctive feature
KW - Speaker identification
UR - http://www.scopus.com/inward/record.url?scp=84905258919&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84905258919&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2014.6853886
DO - 10.1109/ICASSP.2014.6853886
M3 - Conference contribution
AN - SCOPUS:84905258919
SN - 9781479928927
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 1690
EP - 1694
BT - 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
T2 - 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Y2 - 4 May 2014 through 9 May 2014
ER -