A Two-stage Complex Network using Cycle-consistent Generative Adversarial Networks for Speech Enhancement

by Guochen Yu, et al.

Cycle-consistent generative adversarial networks (CycleGAN) have shown promising performance for speech enhancement (SE), but one intractable shortcoming of CycleGAN-based SE systems is that noise components propagate throughout the cycle and cannot be completely eliminated. Additionally, conventional CycleGAN-based SE systems estimate only the spectral magnitude and leave the phase unaltered. Motivated by the multi-stage learning concept, we propose a novel two-stage denoising system that combines a CycleGAN-based magnitude enhancing network with a subsequent complex spectral refining network. Specifically, in the first stage, a CycleGAN-based model estimates only the magnitude, which is then coupled with the original noisy phase to obtain a coarsely enhanced complex spectrum. The second stage then further suppresses the residual noise components and estimates the clean phase with a complex spectral mapping network, a purely complex-valued network composed of complex 2D convolution/deconvolution and complex temporal-frequency attention blocks. Experimental results on two public datasets demonstrate that the proposed approach consistently surpasses previous one-stage CycleGANs and other state-of-the-art SE systems on various evaluation metrics, especially in background noise suppression.
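The hand-off between the two stages, where an estimated magnitude is coupled with the original noisy phase, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `couple_magnitude_with_phase`, the array shapes, and the fixed attenuation standing in for the CycleGAN output are all assumptions made for illustration.

```python
import numpy as np


def couple_magnitude_with_phase(enhanced_mag, noisy_spec):
    """Combine an estimated magnitude with the phase of the noisy spectrum.

    Hypothetical helper: returns a coarse complex spectrum whose magnitude
    is `enhanced_mag` and whose phase is that of `noisy_spec`.
    """
    noisy_phase = np.angle(noisy_spec)  # phase of the noisy input, in radians
    return enhanced_mag * np.exp(1j * noisy_phase)


rng = np.random.default_rng(0)

# Assumed toy noisy complex spectrogram: 4 frames x 5 frequency bins.
noisy_spec = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))

# Stand-in for the stage-one magnitude estimate (a real system would use
# the output of the magnitude-enhancing network here).
enhanced_mag = 0.5 * np.abs(noisy_spec)

# Coarsely enhanced complex spectrum that a second stage would refine.
coarse_spec = couple_magnitude_with_phase(enhanced_mag, noisy_spec)
```

Because the phase is copied unchanged from the noisy input, any phase error remains in `coarse_spec`, which is why the abstract assigns phase estimation to the complex-valued second stage.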




Joint magnitude estimation and phase recovery using Cycle-in-Cycle GAN for non-parallel speech enhancement

For the lack of adequate paired noisy-clean speech corpus in many real s...

Know Your Enemy, Know Yourself: A Unified Two-Stage Framework for Speech Enhancement

Traditional spectral subtraction-type single channel speech enhancement ...

Speech Enhancement Based on Cyclegan with Noise-informed Training

Speech enhancement (SE) approaches can be classified into supervised and...

Taylor, Can You Hear Me Now? A Taylor-Unfolding Framework for Monaural Speech Enhancement

While the deep learning techniques promote the rapid development of the ...

Magnitude or Phase? A Two Stage Algorithm for Dereverberation

In this work we present a new single-microphone speech dereverberation a...

A deep complex network with multi-frame filtering for stereophonic acoustic echo cancellation

In hands-free communication system, the coupling between the loudspeaker...

Phase-aware Single-stage Speech Denoising and Dereverberation with U-Net

In this work, we tackle a denoising and dereverberation problem with a s...
