Joint magnitude estimation and phase recovery using Cycle-in-Cycle GAN for non-parallel speech enhancement

09/26/2021
by Guochen Yu, et al.

Because adequate paired noisy-clean speech corpora are lacking in many real scenarios, non-parallel training is a promising approach for DNN-based speech enhancement. However, because of the severe mismatch between input and target speech, many previous studies estimate only the magnitude spectrum and leave the phase unaltered, which degrades speech quality under low signal-to-noise ratio conditions. To tackle this problem, we decouple the difficult optimization of the original spectrum into spectral magnitude and phase, and propose a novel Cycle-in-Cycle generative adversarial network (dubbed CinCGAN) that jointly estimates the spectral magnitude and phase information stage by stage from unpaired data. In the first stage, we pretrain a magnitude CycleGAN to coarsely estimate the spectral magnitude of clean speech. In the second stage, we incorporate the pretrained CycleGAN into a complex-valued CycleGAN as a cycle-in-cycle structure to simultaneously recover phase information and refine the overall spectrum. Experimental results demonstrate that the proposed approach significantly outperforms previous baselines under non-parallel training. An evaluation in which the models are trained on standard paired data also shows that CinCGAN achieves remarkable performance, especially in reducing background noise and speech distortion.
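The magnitude/phase decoupling and the two-stage data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the toy generators, and the choice to pair the stage-one magnitude with the noisy phase before refinement are all assumptions made for illustration, and the cycle-consistency term is shown as a plain L1 round-trip distance.

```python
import numpy as np


def split_spectrum(spec):
    """Decouple a complex spectrum into magnitude and phase."""
    return np.abs(spec), np.angle(spec)


def merge_spectrum(mag, phase):
    """Rebuild a complex spectrum from magnitude and phase."""
    return mag * np.exp(1j * phase)


def cycle_consistency_loss(x, g_forward, g_backward):
    """Mean L1 distance between x and its round trip through both generators."""
    return float(np.mean(np.abs(g_backward(g_forward(x)) - x)))


def two_stage_enhance(noisy_spec, mag_generator, complex_generator):
    """Hypothetical two-stage pipeline (assumed data flow, not from the paper).

    Stage 1: coarse magnitude estimate, paired here with the noisy phase.
    Stage 2: a complex-valued generator refines the whole spectrum.
    """
    mag, phase = split_spectrum(noisy_spec)
    coarse_spec = merge_spectrum(mag_generator(mag), phase)
    return complex_generator(coarse_spec)


# Toy stand-ins for trained generators (assumptions, for demonstration only).
def toy_noisy_to_clean_mag(mag):
    return 0.5 * mag


def toy_clean_to_noisy_mag(mag):
    return 2.0 * mag


def toy_complex_refiner(spec):
    return spec  # identity: a real model would adjust real and imaginary parts
```

Calling `two_stage_enhance(noisy_spec, toy_noisy_to_clean_mag, toy_complex_refiner)` on a complex array returns a spectrum with halved magnitude and unchanged phase, which only demonstrates the plumbing; in a trained system both generators would be neural networks.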

research
11/03/2020

Two Heads Are Better Than One: A Two-Stage Approach for Monaural Noise Reduction in the Complex Domain

In low signal-to-noise ratio conditions, it is difficult to effectively ...
research
09/05/2021

A Two-stage Complex Network using Cycle-consistent Generative Adversarial Networks for Speech Enhancement

Cycle-consistent generative adversarial networks (CycleGAN) have shown t...
research
07/28/2021

CycleGAN-based Non-parallel Speech Enhancement with an Adaptive Attention-in-attention Mechanism

Non-parallel training is a difficult but essential task for DNN-based sp...
research
02/03/2022

A deep complex network with multi-frame filtering for stereophonic acoustic echo cancellation

In hands-free communication system, the coupling between the loudspeaker...
research
10/27/2020

Phase Aware Speech Enhancement using Realisation of Complex-valued LSTM

Most of the deep learning based speech enhancement (SE) methods rely on ...
research
04/30/2022

Taylor, Can You Hear Me Now? A Taylor-Unfolding Framework for Monaural Speech Enhancement

While the deep learning techniques promote the rapid development of the ...
research
10/31/2022

Magnitude or Phase? A Two Stage Algorithm for Dereverberation

In this work we present a new single-microphone speech dereverberation a...
