Weighted Speech Distortion Losses for Neural-network-based Real-time Speech Enhancement

01/28/2020
by   Yangyang Xia, et al.

This paper investigates several aspects of training a recurrent neural network (RNN) that impact the objective and subjective quality of enhanced speech for real-time single-channel speech enhancement. Specifically, we focus on an RNN that enhances short-time speech spectra on a single-frame-in, single-frame-out basis, a framework adopted by most classical signal processing methods. We propose two novel mean-squared-error-based learning objectives that enable separate control over the importance of speech distortion versus noise reduction. The proposed loss functions are evaluated by widely accepted objective quality and intelligibility measures and compared to other competitive online methods. In addition, we study the impact of feature normalization and varying batch sequence lengths on the objective quality of enhanced speech. Finally, we show subjective ratings for the proposed approach and a state-of-the-art real-time RNN-based method.
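To make the idea of separately weighting speech distortion and noise reduction concrete, here is a minimal sketch of one possible mean-squared-error-style objective for a single frame. It is an illustration under stated assumptions, not necessarily the exact losses proposed in the paper: the function name `weighted_distortion_loss`, the weight `alpha`, and the assumption that clean-speech and noise magnitudes are available separately during training are all hypothetical choices made for this example.

```python
def weighted_distortion_loss(gain, speech_mag, noise_mag, alpha=0.5):
    """Weighted sum of a speech-distortion term and a residual-noise term.

    gain, speech_mag, noise_mag: equal-length sequences of per-bin values
    for one frame (gain in [0, 1], magnitudes non-negative).
    alpha: hypothetical weight in [0, 1]; larger values penalize speech
    distortion more, smaller values penalize residual noise more.
    """
    if not (len(gain) == len(speech_mag) == len(noise_mag)):
        raise ValueError("all inputs must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    n = len(gain)
    # Speech distortion: energy of the speech removed by the gain.
    distortion = sum((s - g * s) ** 2 for g, s in zip(gain, speech_mag)) / n
    # Residual noise: energy of the noise that passes through the gain.
    residual = sum((g * x) ** 2 for g, x in zip(gain, noise_mag)) / n
    return alpha * distortion + (1.0 - alpha) * residual
```

In this sketch an all-pass gain leaves the speech undistorted but keeps all the noise, while a zero gain removes all the noise but also all the speech; `alpha` sets the trade-off between the two extremes.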

research
12/08/2019

A Supervised Speech Enhancement Approach with Residual Noise Control for Voice Communication

For voice communication, it is important to extract the speech from its ...
research
02/14/2020

Real-time speech enhancement using equilibriated RNN

We propose a speech enhancement method using a causal deep neural networ...
research
02/11/2022

A Novel Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning

Current deep learning (DL) based approaches to speech intelligibility en...
research
08/14/2019

Components Loss for Neural Networks in Mask-Based Speech Enhancement

Estimating time-frequency domain masks for single-channel speech enhance...
research
07/29/2020

Investigation of Phase Distortion on Perceived Speech Quality for Hearing-impaired Listeners

Phase serves as a critical component of speech that influences the quali...
research
09/24/2017

A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement

Despite noise suppression being a mature area in signal processing, it r...
research
02/05/2019

An Enhanced Interleaving Frame Loss Concealment Method for Voice Over IP Network Services

This paper focuses on AMR WB G.722.2 speech codec, and discusses the unu...
