Frequency Gating: Improved Convolutional Neural Networks for Speech Enhancement in the Time-Frequency Domain

11/08/2020
by   Koen Oostermeijer, et al.

One of the strengths of traditional convolutional neural networks (CNNs) is their inherent translational invariance. For speech enhancement in the time-frequency domain, however, this property cannot be fully exploited because the signal is not invariant along the frequency axis. In this paper, we propose to remedy this inefficiency with a method we call Frequency Gating, which computes multiplicative weights for the kernels of the CNN in order to make them frequency dependent. Several mechanisms are explored: temporal gating, in which the weights depend on prior time frames; local gating, in which the weights are generated from a single time frame and the frames adjacent to it; and frequency-wise gating, in which each kernel is assigned a weight independent of the input data. Experiments with an autoencoder neural network with skip connections show that both local and frequency-wise gating outperform the baseline and are therefore viable ways to improve CNN-based speech enhancement networks. In addition, we introduce a loss function based on the extended short-time objective intelligibility score (ESTOI), which we show outperforms the standard mean squared error (MSE) loss function.
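To make the frequency-wise variant concrete, the following is a minimal sketch and not the authors' implementation. It assumes one multiplicative weight per kernel per frequency bin, which is one reading of the abstract. Because convolution is linear, scaling a kernel's weights by a value that depends only on the output frequency bin is the same as scaling that kernel's response in that bin, and that is what the code does. The names `conv2d_same`, `frequency_wise_gated_conv`, and `gates` are hypothetical and chosen for illustration.

```python
def conv2d_same(x, kernel):
    """Zero-padded 2D cross-correlation of x[freq][time] with an odd-sized kernel."""
    n_freq, n_time = len(x), len(x[0])
    k_freq, k_time = len(kernel), len(kernel[0])
    pad_f, pad_t = k_freq // 2, k_time // 2
    out = [[0.0] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        for t in range(n_time):
            acc = 0.0
            for i in range(k_freq):
                for j in range(k_time):
                    ff, tt = f + i - pad_f, t + j - pad_t
                    # Positions outside the spectrogram count as zeros.
                    if 0 <= ff < n_freq and 0 <= tt < n_time:
                        acc += kernel[i][j] * x[ff][tt]
            out[f][t] = acc
    return out


def frequency_wise_gated_conv(x, kernel, gates):
    """Scale the kernel's response in frequency bin f by gates[f].

    Assumption: gates holds one input-independent weight per frequency bin
    for this kernel; in a trained network these would be learned parameters.
    """
    if len(gates) != len(x):
        raise ValueError("need exactly one gate per frequency bin")
    response = conv2d_same(x, kernel)
    return [[gates[f] * v for v in row] for f, row in enumerate(response)]


# Toy input: 3 frequency bins by 4 time frames, all ones.
spectrogram = [[1.0] * 4 for _ in range(3)]
identity_kernel = [[1.0]]
gated = frequency_wise_gated_conv(spectrogram, identity_kernel, [0.0, 0.5, 2.0])
```

With the identity kernel, each frequency row of `gated` is simply the input row scaled by its gate, so the same kernel now responds differently in different bins. Temporal and local gating would replace the fixed `gates` list with values computed from the input frames; the abstract does not specify how, so that part is not sketched here.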


research
06/08/2023

Convolutional Recurrent Neural Network with Attention for 3D Speech Enhancement

3D speech enhancement can effectively improve the auditory experience an...
research
05/23/2020

Exploring the Best Loss Function for DNN-Based Low-latency Speech Enhancement with Temporal Convolutional Networks

Recently, deep neural networks (DNNs) have been successfully used for sp...
research
06/06/2018

Spatial Frequency Loss for Learning Convolutional Autoencoders

This paper presents a learning method for convolutional autoencoders (CA...
research
08/24/2020

AMRConvNet: AMR-Coded Speech Enhancement Using Convolutional Neural Networks

Speech is converted to digital signals using speech coding for efficient...
research
12/08/2019

A Supervised Speech enhancement Approach with Residual Noise Control for Voice Communication

For voice communication, it is important to extract the speech from its ...
research
08/11/2020

PoCoNet: Better Speech Enhancement with Frequency-Positional Embeddings, Semi-Supervised Conversational Data, and Biased Loss

Neural network applications generally benefit from larger-sized models, ...
research
03/21/2019

Data-driven design of perfect reconstruction filterbank for DNN-based sound source enhancement

We propose a data-driven design method of perfect-reconstruction filterb...
