Constrained Convolutional-Recurrent Networks to Improve Speech Quality with Low Impact on Recognition Accuracy

02/16/2018
by Rasool Fakoor, et al.

For a speech-enhancement algorithm, it is highly desirable to improve perceptual quality and recognition rate simultaneously. Because of computational costs and model complexity, it is challenging to train a model that effectively optimizes both metrics at the same time. In this paper, we propose a speech-enhancement method that improves perceptual quality by combining local and global contextual structure through convolutional-recurrent neural networks. At the same time, we introduce a new constraint on the objective function, using a language model/decoder, that limits the impact on recognition rate. Based on experiments conducted with real user data, we demonstrate that our new context-augmented machine-learning approach for speech enhancement improves PESQ and WER by an additional 24.5 respectively, when compared to the best-performing methods in the literature.
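The abstract does not give the form of the constrained objective, so the following is only a minimal sketch of one common way such a constraint could be expressed: an enhancement loss plus a weighted recognition penalty. The names `mean_squared_error`, `constrained_objective`, `recognition_cost`, and `weight` are hypothetical, and the mean-squared-error term is an assumed stand-in for whatever enhancement loss the paper actually uses.

```python
from typing import Sequence


def mean_squared_error(enhanced: Sequence[float], clean: Sequence[float]) -> float:
    """Average squared difference between enhanced and clean feature values."""
    if len(enhanced) != len(clean) or len(clean) == 0:
        raise ValueError("feature sequences must be non-empty and equal in length")
    return sum((e - c) ** 2 for e, c in zip(enhanced, clean)) / len(clean)


def constrained_objective(
    enhanced: Sequence[float],
    clean: Sequence[float],
    recognition_cost: float,
    weight: float = 0.1,
) -> float:
    """Enhancement loss plus a weighted recognition penalty.

    Assumption: `recognition_cost` is a scalar supplied by a language
    model/decoder (for example, a negative log-likelihood of the reference
    transcript). The additive, weighted form is illustrative only and is
    not taken from the paper.
    """
    return mean_squared_error(enhanced, clean) + weight * recognition_cost


# Toy values for illustration; these are not data from the paper.
loss = constrained_objective([0.5, 1.0], [0.0, 1.0], recognition_cost=2.0, weight=0.5)
print(loss)
```

In this sketch, a larger `weight` makes the optimizer trade perceptual-quality gains for a smaller effect on recognition, which is the balance the abstract describes.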


research
05/20/2022

NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement

Acoustic echo cancellation (AEC) plays an important role in the full-dup...
research
04/08/2021

Phoneme-based Distribution Regularization for Speech Enhancement

Existing speech enhancement methods mainly separate speech from noises a...
research
08/24/2020

AMRConvNet: AMR-Coded Speech Enhancement Using Convolutional Neural Networks

Speech is converted to digital signals using speech coding for efficient...
research
05/02/2018

Convolutional-Recurrent Neural Networks for Speech Enhancement

We propose an end-to-end model based on convolutional and recurrent neur...
research
05/20/2020

TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids

Modern speech enhancement algorithms achieve remarkable noise suppressio...
research
10/24/2022

TridentSE: Guiding Speech Enhancement with 32 Global Tokens

In this paper, we present TridentSE, a novel architecture for speech enh...
research
08/11/2020

PoCoNet: Better Speech Enhancement with Frequency-Positional Embeddings, Semi-Supervised Conversational Data, and Biased Loss

Neural network applications generally benefit from larger-sized models, ...
