Investigation of Frame Alignments for GMM-based Digit-prompted Speaker Verification

Yi Liu, Liang He, Wei Qiang Zhang, Jia Liu, Michael T. Johnson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Frame alignments can be computed by different methods in GMM-based speaker verification. By incorporating a phonetic Gaussian mixture model (PGMM), we are able to compare the performance of alignments extracted from a deep neural network (DNN) with those from a conventional hidden Markov model (HMM) in digit-prompted speaker verification. Based on the different characteristics of these two alignments, we present a novel content verification method that improves system security without much computational overhead. Our experiments on the RSR2015 Part-3 digit-prompted task show that the DNN-based alignment performs on par with the HMM alignment. The results also demonstrate the effectiveness of the proposed Kullback-Leibler (KL) divergence based scoring in rejecting speech with incorrect pass-phrases.

Original language: English
Title of host publication: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings
Pages: 1467-1472
Number of pages: 6
ISBN (Electronic): 9789881476852
State: Published - Jul 2 2018
Event: 10th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Honolulu, United States
Duration: Nov 12 2018 to Nov 15 2018

Publication series

Name: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings

Conference

Conference: 10th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018
Country/Territory: United States
City: Honolulu
Period: 11/12/18 to 11/15/18

Bibliographical note

Publisher Copyright:
© 2018 APSIPA organization.

ASJC Scopus subject areas

  • Information Systems
