SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks

Vasily Zadorozhnyy, Qiang Ye, Kazuhito Koishida

Research output: Contribution to journal › Conference article › peer-review

Abstract

In recent years, Generative Adversarial Networks (GANs) have produced significant improvements on speech enhancement (SE) tasks. However, they remain challenging to train. In this work, we introduce several improvements to GAN training schemes that can be applied to most GAN-based SE models. We propose consistency loss functions that target the inconsistency between the time and time-frequency domains caused by the Fourier and inverse Fourier transforms. We also present a self-correcting optimization scheme for training the GAN discriminator on SE tasks, which helps avoid "harmful" training directions for parts of the discriminator loss function. We have tested the proposed methods on several state-of-the-art GAN-based SE models and obtained consistent improvements, including new state-of-the-art results on the Voice Bank+DEMAND dataset.
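The STFT consistency idea mentioned in the abstract can be illustrated with a short sketch: a spectrogram predicted by the generator is generally not the STFT of any waveform, so sending it through an inverse STFT and then re-projecting it with a forward STFT produces a different spectrogram, and the gap between the two can be penalized. The code below is a minimal, hedged illustration of that round-trip penalty, not the authors' implementation; the function name, tensor shapes, and FFT parameters (`n_fft`, `hop`) are illustrative assumptions.

```python
import torch

def stft_consistency_loss(est_spec: torch.Tensor,
                          n_fft: int = 512,
                          hop: int = 256) -> torch.Tensor:
    """Illustrative STFT consistency penalty (not the paper's exact loss).

    est_spec: complex spectrogram predicted by the generator,
              shape (batch, n_fft // 2 + 1, frames).
    """
    window = torch.hann_window(n_fft, device=est_spec.device)
    # Map the predicted spectrogram back to the time domain ...
    wav = torch.istft(est_spec, n_fft=n_fft, hop_length=hop, window=window)
    # ... and re-project it onto the time-frequency plane.
    reproj = torch.stft(wav, n_fft=n_fft, hop_length=hop, window=window,
                        return_complex=True)
    # A predicted spectrogram is usually "inconsistent": no waveform maps
    # exactly onto it, so reproj differs from est_spec. The loss shrinks that gap.
    return torch.mean(torch.abs(reproj - est_spec) ** 2)
```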

Original language: English
Pages (from-to): 2463-2467
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2023-August
DOIs
State: Published - 2023
Event: 24th Annual Conference of the International Speech Communication Association, Interspeech 2023 - Dublin, Ireland
Duration: Aug 20 2023 - Aug 24 2023

Bibliographical note

Publisher Copyright:
© 2023 International Speech Communication Association. All rights reserved.

Keywords

  • GAN
  • MetricGAN
  • STFT Consistency
  • Self-Correcting Optimization
  • Speech Enhancement
  • Voice Bank+DEMAND

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
