CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement

09/22/2022
by   Sherif Abdulatif, et al.

Convolution-augmented transformers (Conformers) have recently been proposed for various speech-domain applications, such as automatic speech recognition (ASR) and speech separation, because they can capture both local and global dependencies. In this paper, we propose a conformer-based metric generative adversarial network (CMGAN) for speech enhancement (SE) in the time-frequency (TF) domain. The generator encodes the magnitude and complex spectrogram information using two-stage conformer blocks to model both time and frequency dependencies. The decoder then decouples the estimation into two branches: a magnitude mask decoder branch that filters out unwanted distortions, and a complex refinement branch that further improves the magnitude estimation and implicitly enhances the phase information. Additionally, we include a metric discriminator to alleviate metric mismatch by optimizing the generator with respect to a corresponding evaluation score. Objective and subjective evaluations show that CMGAN outperforms state-of-the-art methods in three speech enhancement tasks (denoising, dereverberation and super-resolution). For instance, quantitative denoising analysis on the Voice Bank+DEMAND dataset indicates that CMGAN outperforms various previous models, achieving a PESQ of 3.41 and an SSNR of 11.10 dB.
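The decoupled decoder described above can be illustrated with a minimal NumPy sketch. It is an assumption-laden illustration, not the authors' implementation: the abstract does not say how the two branch outputs are combined, so the additive combination, the reuse of the noisy phase, the mask range of [0, 1], and the names `combine_branches`, `mask` and `refinement` are all hypothetical. Random arrays stand in for the network outputs.

```python
import numpy as np

def combine_branches(noisy_spec, mask, refinement):
    """Combine a magnitude mask with a complex correction (illustrative only).

    noisy_spec : complex array (freq, time), the noisy spectrogram
    mask       : real array, assumed in [0, 1], stand-in for the mask branch
    refinement : complex array, stand-in for the complex refinement branch
    """
    noisy_mag = np.abs(noisy_spec)
    noisy_phase = np.angle(noisy_spec)
    # Mask branch: scale the noisy magnitude and reuse the noisy phase.
    masked = mask * noisy_mag * np.exp(1j * noisy_phase)
    # Refinement branch (assumed additive): a complex term can change
    # both the magnitude and the phase of the estimate.
    return masked + refinement

# Random stand-ins for a tiny spectrogram and the two branch outputs.
rng = np.random.default_rng(0)
shape = (5, 4)
noisy = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mask = rng.uniform(0.0, 1.0, shape)

masked_only = combine_branches(noisy, mask, np.zeros(shape, dtype=complex))
refined = combine_branches(noisy, mask, 0.1j * np.ones(shape))
```

Under these assumptions, `masked_only` keeps the noisy phase unchanged, while `refined` shifts it. This is one way to read the claim that a complex branch can enhance phase information implicitly, without estimating phase directly.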


research
03/28/2022

CMGAN: Conformer-based Metric GAN for Speech Enhancement

Recently, convolution-augmented transformer (Conformer) has achieved pro...
research
12/19/2020

DCCRGAN: Deep Complex Convolution Recurrent Generator Adversarial Network for Speech Enhancement

Generative adversarial network (GAN) still exists some problems in deali...
research
06/09/2021

Deep Interaction between Masking and Mapping Targets for Single-Channel Speech Enhancement

The most recent deep neural network (DNN) models exhibit impressive deno...
research
08/17/2017

An instrumental intelligibility metric based on information theory

We propose a new monaural intrusive instrumental intelligibility metric ...
research
10/12/2021

Foster Strengths and Circumvent Weaknesses: a Speech Enhancement Framework with Two-branch Collaborative Learning

Recent single-channel speech enhancement methods usually convert wavefor...
research
06/30/2022

GLD-Net: Improving Monaural Speech Enhancement by Learning Global and Local Dependency Features with GLD Block

For monaural speech enhancement, contextual information is important for...
research
05/15/2023

ForkNet: Simultaneous Time and Time-Frequency Domain Modeling for Speech Enhancement

Previous research in speech enhancement has mostly focused on modeling t...
