Voice conversion based on Gaussian mixture modules with Minimum Distance Spectral Mapping

Gui Jin, Michael T. Johnson, Jia Liu, Xiaokang Lin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Voice conversion (VC) is the task of modifying a source speaker's voice to match that of a specific target speaker. Traditional methods use Gaussian mixture models (GMMs), but over-smoothing often severely degrades the quality of the converted speech. More recent approaches such as Dynamic Frequency Warping (DFW) preserve more spectral detail during transformation but require explicit formant frequency estimates, and errors in those estimates lead to poor similarity between the converted speech and the target speaker. This paper proposes a new method for voice conversion called Minimum Distance Spectral Mapping (MDSM), based on a frequency-warped point-to-point mapping that robustly and accurately transforms formant frequencies while also maintaining spectral details. The proposed MDSM method uses a minimum distance alignment between source and target speakers rather than direct formant estimates, which increases robustness and preserves other spectral details such as formant bandwidth. Results show that the proposed method offers a good trade-off between voice quality and identity similarity, outperforming traditional GMM and DFW in both subjective and objective evaluations.
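
The abstract does not spell out how the minimum distance alignment is computed, so the following is only an illustrative sketch and not the authors' MDSM algorithm. It assumes that each speaker is represented by a single spectral envelope (a list of magnitudes per frequency bin) and swaps in a generic dynamic-programming alignment along the frequency axis as a stand-in. The function names `min_distance_warp` and `warp_source` are hypothetical.

```python
def min_distance_warp(src, tgt):
    """Find a monotonic bin-to-bin alignment with minimal total distance.

    `src` and `tgt` are spectral envelopes (one magnitude per frequency bin).
    Returns a list of (source_bin, target_bin) pairs running from the lowest
    bins to the highest bins. The local distance is an assumption: |a - b|.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(src[i] - tgt[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            # Choose the cheapest predecessor: diagonal, up, or left.
            best, arg = inf, None
            for pi, pj in ((i - 1, j - 1), (i - 1, j), (i, j - 1)):
                if pi >= 0 and pj >= 0 and cost[pi][pj] < best:
                    best, arg = cost[pi][pj], (pi, pj)
            cost[i][j] = d + best
            back[i][j] = arg
    # Trace the optimal path back from the top bins to the bottom bins.
    path = [(n - 1, m - 1)]
    while back[path[-1][0]][path[-1][1]] is not None:
        i, j = path[-1]
        path.append(back[i][j])
    path.reverse()
    return path


def warp_source(src, path, num_target_bins):
    """Point-to-point mapping: move source values to their aligned target bins.

    When several source bins map to one target bin, their values are averaged.
    """
    sums = [0.0] * num_target_bins
    counts = [0] * num_target_bins
    for i, j in path:
        sums[j] += src[i]
        counts[j] += 1
    return [s / c for s, c in zip(sums, counts)]


# Toy envelopes: the same peak shape, centred on bin 2 (source) and bin 4 (target).
source_envelope = [0, 1, 5, 1, 0, 0, 0, 0]
target_envelope = [0, 0, 0, 1, 5, 1, 0, 0]
path = min_distance_warp(source_envelope, target_envelope)
warped = warp_source(source_envelope, path, len(target_envelope))
```

In this toy case the source peak is moved to the target's peak position while its shape is carried over unchanged, without any explicit formant estimate; that is the property the abstract attributes to alignment-based mapping. A real system would operate on frame-wise spectra and would need constraints and smoothing that this sketch omits.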

Original language: English
Title of host publication: 2015 5th International Conference on Information Science and Technology, ICIST 2015
Pages: 356-359
Number of pages: 4
ISBN (Electronic): 9781479974894
DOIs
State: Published - Oct 2 2015
Event: 5th International Conference on Information Science and Technology, ICIST 2015 - Changsha, Hunan, China
Duration: Apr 24 2015 to Apr 26 2015

Publication series

Name: 2015 5th International Conference on Information Science and Technology, ICIST 2015

Conference

Conference: 5th International Conference on Information Science and Technology, ICIST 2015
Country/Territory: China
City: Changsha, Hunan
Period: 4/24/15 to 4/26/15

Bibliographical note

Funding Information:
The work was supported by the National Natural Science Foundation of China under Grant Nos. 61273268, 61370034, and 61403224.

Publisher Copyright:
© 2015 IEEE.

Keywords

  • Gaussian mixture models
  • Point-to-point mapping
  • Voice conversion
  • Frequency warping

ASJC Scopus subject areas

  • Information Systems
