Speech Enhancement using Separable Polling Attention and Global Layer Normalization followed with PReLU

05/06/2021
by   Dengfeng Ke, et al.

Single-channel speech enhancement is a challenging task in the speech community. Recently, various neural-network-based methods have been applied to speech enhancement. Among these models, PHASEN and T-GSA achieve state-of-the-art performance on the publicly available VoiceBank+DEMAND corpus. Both models reach a COVL score of 3.62. PHASEN achieves the highest CSIG score of 4.21, while T-GSA achieves the highest PESQ score of 3.06. However, both models are very large, and the trade-off between model performance and model size is hard to reconcile. In this paper, we introduce three techniques to shrink the PHASEN model and improve its performance. First, separable polling attention is proposed to replace the frequency transformation blocks in PHASEN. Second, global layer normalization followed by PReLU replaces batch normalization followed by ReLU. Finally, the BLSTM in PHASEN is replaced with Conv2d operations and the phase stream is simplified. With all these modifications, the PHASEN model shrinks from 33M parameters to 5M parameters, while performance on VoiceBank+DEMAND improves to a CSIG score of 4.30, a PESQ score of 3.07, and a COVL score of 3.73.
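
The second technique, global layer normalization followed by PReLU, can be illustrated with a minimal NumPy sketch. The abstract does not define the normalization precisely, so this assumes a common formulation in which each sample is normalized over all non-batch axes and then scaled by a per-channel gain and bias. The function names, the tensor layout (batch, channels, time, frequency), and the PReLU slope of 0.25 are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def global_layer_norm(x, gamma, beta, eps=1e-8):
    """Normalize each sample over all non-batch axes (assumed formulation).

    x:     array of shape (batch, channels, time, freq)  [assumed layout]
    gamma: per-channel gain, shape (channels,)
    beta:  per-channel bias, shape (channels,)
    """
    axes = tuple(range(1, x.ndim))             # every axis except batch
    mean = x.mean(axis=axes, keepdims=True)    # one mean per sample
    var = x.var(axis=axes, keepdims=True)      # one variance per sample
    x_hat = (x - mean) / np.sqrt(var + eps)
    shape = (1, -1) + (1,) * (x.ndim - 2)      # broadcast over channel axis
    return gamma.reshape(shape) * x_hat + beta.reshape(shape)


def prelu(x, alpha=0.25):
    """PReLU: identity for x >= 0, slope alpha for x < 0.

    alpha is learnable in a real network; 0.25 here is an assumed value.
    """
    return np.where(x >= 0, x, alpha * x)


# Usage: normalize a random feature map, then apply PReLU.
rng = np.random.default_rng(0)
features = rng.normal(loc=3.0, scale=2.0, size=(2, 4, 5, 6))
gamma = np.ones(4)
beta = np.zeros(4)
activated = prelu(global_layer_norm(features, gamma, beta))
```

Unlike batch normalization, the statistics here are computed per sample, so the result does not depend on the other items in the batch.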

10/27/2021

Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions

The deep learning based time-domain models, e.g. Conv-TasNet, have shown...
11/10/2022

Speech Enhancement with Fullband-Subband Cross-Attention Network

FullSubNet has shown its promising performance on speech enhancement by ...
02/20/2020

iSEGAN: Improved Speech Enhancement Generative Adversarial Networks

Popular neural network-based speech enhancement systems operate on the m...
04/08/2021

MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

The discrepancy between the cost function used for training a speech enh...
11/16/2018

Exploring Tradeoffs in Models for Low-latency Speech Enhancement

We explore a variety of neural networks configurations for one- and two-...
07/25/2020

Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement

This paper investigates different trade-offs between the number of model...
10/24/2022

TridentSE: Guiding Speech Enhancement with 32 Global Tokens

In this paper, we present TridentSE, a novel architecture for speech enh...