TY - GEN
T1 - Residual phase cepstrum coefficients with application to cross-lingual speaker verification
AU - Wang, Jianglin
AU - Johnson, Michael T.
PY - 2012
Y1 - 2012
N2 - Speaker identification and verification has received a great deal of attention from the speech community, and significant gains in robustness and accuracy have been obtained over the past decade [1], [2]. However, the features used for identification are still primarily representations of overall spectral characteristics, and thus the models are primarily phonetic in nature, differentiating speakers based on overall pronunciation patterns. This creates difficulties in terms of the amount of enrollment data and complexity of the models required to cover the phonetic space, especially in tasks such as cross-lingual verification where enrollment and testing data may not have similar phonetic coverage. This paper introduces the use of a new feature for speaker verification, residual phase cepstral coefficients (RPCC), to capture speaker characteristics from their vocal excitation patterns. Results on a cross-lingual speaker verification task taken from the NIST 2004 SRE demonstrate that these RPCC features are significantly more accurate than traditional mel-frequency cepstral coefficients (MFCC) when the amount of enrollment data available for training is limited. Additionally, because of the significant differences in the nature of the features, combining MFCC and RPCC features shows an improvement in verification results over MFCCs alone.
AB - Speaker identification and verification has received a great deal of attention from the speech community, and significant gains in robustness and accuracy have been obtained over the past decade [1], [2]. However, the features used for identification are still primarily representations of overall spectral characteristics, and thus the models are primarily phonetic in nature, differentiating speakers based on overall pronunciation patterns. This creates difficulties in terms of the amount of enrollment data and complexity of the models required to cover the phonetic space, especially in tasks such as cross-lingual verification where enrollment and testing data may not have similar phonetic coverage. This paper introduces the use of a new feature for speaker verification, residual phase cepstral coefficients (RPCC), to capture speaker characteristics from their vocal excitation patterns. Results on a cross-lingual speaker verification task taken from the NIST 2004 SRE demonstrate that these RPCC features are significantly more accurate than traditional mel-frequency cepstral coefficients (MFCC) when the amount of enrollment data available for training is limited. Additionally, because of the significant differences in the nature of the features, combining MFCC and RPCC features shows an improvement in verification results over MFCCs alone.
KW - GMM
KW - Glottal source excitation
KW - Residual phase cepstrum
KW - Speaker verification
KW - UBM
UR - http://www.scopus.com/inward/record.url?scp=84878403757&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84878403757&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84878403757
SN - 9781622767595
T3 - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
SP - 1554
EP - 1557
BT - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
T2 - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Y2 - 9 September 2012 through 13 September 2012
ER -