Abstract
In recent years, Generative Adversarial Networks (GANs) have significantly improved results on speech enhancement (SE) tasks. However, they are challenging to train. In this work, we introduce several improvements to GAN training schemes that can be applied to most GAN-based SE models. We propose consistency loss functions, which target the inconsistency in the time and time-frequency domains caused by the Fourier and inverse Fourier transforms. We also present self-correcting optimization for training a GAN discriminator on SE tasks, which helps avoid “harmful” training directions for parts of the discriminator loss function. We tested the proposed methods on several state-of-the-art GAN-based SE models and obtained consistent improvements, including new state-of-the-art results on the Voice Bank+DEMAND dataset.
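To illustrate the kind of inconsistency the abstract refers to, the following is a minimal NumPy sketch of the classic STFT consistency measure: a complex spectrogram is consistent if it is unchanged by an inverse STFT followed by a forward STFT. This is not the loss defined in the paper, whose exact form is not given in the abstract. The function names (`stft`, `istft`, `consistency_loss`), the Hann window, and the frame and hop sizes are all assumptions chosen for illustration.

```python
import numpy as np

N_FFT, HOP = 64, 16  # illustrative sizes, not taken from the paper


def _window(n_fft):
    # Periodic Hann window (assumed here; any analysis window could be used).
    return np.hanning(n_fft + 1)[:-1]


def stft(x, n_fft=N_FFT, hop=HOP):
    """Framed, windowed real FFT. Returns an array of shape (frames, n_fft // 2 + 1)."""
    w = _window(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * w for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=N_FFT, hop=HOP):
    """Least-squares inverse STFT: windowed overlap-add divided by the summed squared window."""
    w = _window(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * w
    length = n_fft + hop * (spec.shape[0] - 1)
    signal = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        signal[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += w ** 2
    # Guard against division by zero where the window sum vanishes (first sample).
    return signal / np.maximum(norm, 1e-12)


def consistency_loss(spec, n_fft=N_FFT, hop=HOP):
    """Mean squared distance between a spectrogram and its STFT(ISTFT(.)) projection."""
    projected = stft(istft(spec, n_fft, hop), n_fft, hop)
    return float(np.mean(np.abs(spec - projected) ** 2))


# Usage: the STFT of a real signal is consistent; a perturbed spectrogram is not.
rng = np.random.default_rng(0)
x = rng.standard_normal(N_FFT + HOP * 60)
clean_spec = stft(x)
noise = rng.standard_normal(clean_spec.shape) + 1j * rng.standard_normal(clean_spec.shape)

loss_consistent = consistency_loss(clean_spec)
loss_perturbed = consistency_loss(clean_spec + 0.1 * noise)
print(loss_consistent, loss_perturbed)
```

The first value should be at the level of floating-point round-off, while the second is clearly larger, because an arbitrarily modified spectrogram generally does not correspond to any time-domain signal. A model that edits spectrograms directly can produce outputs of this second kind, which is the sort of mismatch a consistency term would penalize.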
Original language | English |
---|---|
Pages (from-to) | 2463-2467 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2023-August |
State | Published - 2023 |
Event | 24th International Speech Communication Association, Interspeech 2023 - Dublin, Ireland; Duration: Aug 20 2023 → Aug 24 2023 |
Bibliographical note
Publisher Copyright: © 2023 International Speech Communication Association. All rights reserved.
Keywords
- GAN
- MetricGAN
- STFT Consistency
- Self-Correcting Optimization
- Speech Enhancement
- Voice Bank+DEMAND
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modeling and Simulation