A Training Framework for Stereo-Aware Speech Enhancement using Deep Neural Networks

12/09/2021
by Bahareh Tolooshams, et al.

Deep learning-based speech enhancement has shown unprecedented performance in recent years. The most popular mono speech enhancement frameworks are end-to-end networks that map the noisy mixture to an estimate of the clean speech. With growing computational power and the availability of multichannel microphone recordings, prior works have aimed to incorporate spatial statistics along with spectral information to boost performance. Despite improvements in the enhancement performance of the mono output, spatial image preservation and subjective evaluation have received little attention in the literature. This paper proposes a novel stereo-aware framework for speech enhancement, i.e., a training loss for deep learning-based speech enhancement that preserves the spatial image while enhancing the stereo mixture. The proposed framework is model independent, so it can be applied to any deep learning-based architecture. We evaluate the trained models extensively, both objectively and subjectively through a listening test. We show that regularizing with an image preservation loss improves overall performance and better preserves the stereo aspect of the speech.
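The idea of adding an image preservation term to an enhancement loss can be illustrated with a minimal sketch. The abstract does not specify the loss terms, so everything here is an assumption made for illustration: mean squared error stands in for the enhancement term, the interchannel level difference (ILD) stands in for the spatial image cue, and the names `interchannel_level_difference`, `stereo_aware_loss`, and `weight` are hypothetical.

```python
import numpy as np


def interchannel_level_difference(stereo, eps=1e-8):
    """Return the left/right energy ratio in dB for a (2, n_samples) signal.

    Assumption: ILD is used here only as one plausible spatial cue; the
    abstract does not say which cues the proposed loss relies on.
    """
    left_energy = np.sum(stereo[0] ** 2) + eps
    right_energy = np.sum(stereo[1] ** 2) + eps
    return 10.0 * np.log10(left_energy / right_energy)


def stereo_aware_loss(estimate, target, weight=0.1):
    """Enhancement term plus a weighted image preservation term.

    Assumption: MSE is a placeholder for the enhancement loss, and
    `weight` is a hypothetical regularization strength.
    """
    enhancement = np.mean((estimate - target) ** 2)
    image = abs(
        interchannel_level_difference(estimate)
        - interchannel_level_difference(target)
    )
    return enhancement + weight * image


# Toy stereo target: the source is louder in the left channel.
rng = np.random.default_rng(0)
source = rng.standard_normal(16000)
target = np.stack([source, 0.5 * source])

# An estimate with swapped channels has the wrong spatial image.
swapped = target[::-1].copy()

loss_identical = stereo_aware_loss(target, target)
loss_without_image_term = stereo_aware_loss(swapped, target, weight=0.0)
loss_with_image_term = stereo_aware_loss(swapped, target, weight=1.0)
```

In this toy setup, the swapped estimate is penalized more heavily once the image term is switched on, which is the kind of behavior a stereo-aware regularizer is meant to encourage. Because the loss depends only on the estimated and target signals, it places no constraint on the network that produces the estimate.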

research
03/03/2020

Phonetic Feedback for Speech Enhancement With and Without Parallel Speech Data

While deep learning systems have gained significant ground in speech enh...
research
10/12/2021

MetricGAN-U: Unsupervised speech enhancement/ dereverberation based only on noisy/ reverberated speech

Most of the deep learning-based speech enhancement models are learned in...
research
06/05/2023

On the Behavior of Intrusive and Non-intrusive Speech Enhancement Metrics in Predictive and Generative Settings

Since its inception, the field of deep speech enhancement has been domin...
research
11/25/2022

Stereo Speech Enhancement Using Custom Mid-Side Signals and Monaural Processing

Speech Enhancement (SE) systems typically operate on monaural input and ...
research
07/31/2018

Lip-Reading Driven Deep Learning Approach for Speech Enhancement

This paper proposes a novel lip-reading driven deep learning framework f...
research
09/19/2023

Efficient Multi-Channel Speech Enhancement with Spherical Harmonics Injection for Directional Encoding

Multi-channel speech enhancement extracts speech using multiple micropho...
research
03/14/2022

TaylorBeamformer: Learning All-Neural Beamformer for Multi-Channel Speech Enhancement from Taylor's Approximation Theory

While existing end-to-end beamformers achieve impressive performance in ...
