CMGAN: Conformer-based Metric GAN for Speech Enhancement

03/28/2022
by   Ruizhe Cao, et al.

Recently, the convolution-augmented transformer (Conformer) has achieved promising performance in automatic speech recognition (ASR) and time-domain speech enhancement (SE), as it can capture both local and global dependencies in the speech signal. In this paper, we propose a Conformer-based metric generative adversarial network (CMGAN) for SE in the time-frequency (TF) domain. In the generator, we use two-stage Conformer blocks to aggregate all magnitude and complex spectrogram information by modeling both time and frequency dependencies. The estimation of the magnitude and complex spectrograms is decoupled in the decoder stage, and the two estimates are then jointly incorporated to reconstruct the enhanced speech. In addition, a metric discriminator is employed to further improve the quality of the enhanced speech by optimizing the generator with respect to a corresponding evaluation score. Quantitative analysis on the Voice Bank+DEMAND dataset shows that CMGAN outperforms various previous models, achieving a PESQ of 3.41 and an SSNR of 11.10 dB.
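
The abstract does not spell out how the decoupled magnitude and complex estimates are recombined. The sketch below shows one plausible scheme, under assumptions that are not taken from the text above: the magnitude branch is assumed to output a real-valued mask that scales the noisy magnitude while reusing the noisy phase, and the complex branch is assumed to output an additive complex correction. The function name `combine_branches` and all argument names are hypothetical.

```python
import numpy as np

def combine_branches(noisy_spec, mag_mask, complex_residual):
    """Recombine two decoder outputs into one enhanced spectrogram.

    Assumed (hypothetical) scheme, not confirmed by the abstract:
      noisy_spec       -- complex STFT of the noisy input, shape (freq, time)
      mag_mask         -- real-valued mask from the magnitude branch
      complex_residual -- complex correction from the complex branch
    """
    noisy_mag = np.abs(noisy_spec)
    noisy_phase = np.angle(noisy_spec)
    # Scale the noisy magnitude by the mask and keep the noisy phase.
    masked = mag_mask * noisy_mag * np.exp(1j * noisy_phase)
    # Add the complex-branch output to refine both magnitude and phase.
    return masked + complex_residual

# Toy example with random data standing in for a real STFT.
rng = np.random.default_rng(0)
noisy = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
identity = combine_branches(noisy, np.ones((4, 5)), np.zeros((4, 5), dtype=complex))
halved = combine_branches(noisy, np.full((4, 5), 0.5), np.zeros((4, 5), dtype=complex))
```

With a mask of ones and a zero residual the input passes through unchanged, which makes a convenient sanity check before plugging in learned outputs.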

research
09/22/2022

CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement

Convolution-augmented transformers (Conformers) are recently proposed in...
research
03/18/2021

TSTNN: Two-stage Transformer based Neural Network for Speech Enhancement in the Time Domain

In this paper, we propose a transformer-based architecture, called two-s...
research
08/17/2017

An instrumental intelligibility metric based on information theory

We propose a new monaural intrusive instrumental intelligibility metric ...
research
10/26/2022

SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks

In recent years, Generative Adversarial Networks (GANs) have produced si...
research
10/20/2020

Investigating Cross-Domain Losses for Speech Enhancement

Recent years have seen a surge in the number of available frameworks for...
research
10/24/2022

TridentSE: Guiding Speech Enhancement with 32 Global Tokens

In this paper, we present TridentSE, a novel architecture for speech enh...
research
12/19/2020

DCCRGAN: Deep Complex Convolution Recurrent Generator Adversarial Network for Speech Enhancement

Generative adversarial network (GAN) still exists some problems in deali...
