Unsupervised Speech Enhancement with speech recognition embedding and disentanglement losses

11/16/2021
by   Viet Anh Trinh, et al.

Speech enhancement has recently achieved great success with various deep learning methods. However, most conventional speech enhancement systems are trained with supervised methods, which pose two significant challenges. First, most training datasets for speech enhancement systems are synthetic. When clean speech and noisy corpora are mixed to create synthetic datasets, domain mismatches arise between the synthetic data and real-world recordings of noisy speech or audio. Second, there is a trade-off: improving speech enhancement performance can degrade automatic speech recognition (ASR) performance. We therefore propose an unsupervised loss function to tackle these two problems. The function extends the MixIT loss function with a speech recognition embedding loss and a disentanglement loss. Our results show that the proposed function effectively improves speech enhancement performance compared to a baseline trained in a supervised way on the noisy VoxCeleb dataset. While fully unsupervised training is unable to exceed the corresponding baseline, joint supervised and unsupervised training achieves similar speech quality and better ASR performance than the best supervised baseline.
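For readers unfamiliar with MixIT (mixture invariant training), the following is a minimal sketch of its core idea, not the authors' implementation: a model separates the sum of several reference mixtures into estimated sources, and the loss is taken over the best assignment of those estimates back to the mixtures. The function names `mse` and `mixit_loss` are hypothetical, and mean squared error is an assumption standing in for the signal-level loss, chosen only to keep the example dependency-free. The ASR embedding and disentanglement terms described in the abstract are not shown, since the abstract does not specify them.

```python
from itertools import product


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mixit_loss(mixtures, estimates):
    """Brute-force MixIT-style loss (illustrative; MSE is an assumed stand-in).

    mixtures:  list of reference mixtures, each a list of samples
    estimates: list of estimated sources, separated from the sum of mixtures
    Returns (best_loss, best_assignment), where best_assignment[j] is the
    index of the mixture that estimate j is assigned to.
    """
    n_mix = len(mixtures)
    n_samples = len(mixtures[0])
    best_loss, best_assignment = float("inf"), None
    # Each estimate is assigned to exactly one mixture; try every option.
    for assignment in product(range(n_mix), repeat=len(estimates)):
        # Remix: sum the estimates assigned to each mixture.
        remixed = [[0.0] * n_samples for _ in range(n_mix)]
        for src, mix_idx in zip(estimates, assignment):
            for t, value in enumerate(src):
                remixed[mix_idx][t] += value
        loss = sum(mse(m, r) for m, r in zip(mixtures, remixed))
        if loss < best_loss:
            best_loss, best_assignment = loss, assignment
    return best_loss, best_assignment


# Toy example: mixture 0 is "speech + noise", mixture 1 is "noise only".
noisy_speech = [1.0, 2.0, 3.0]
noise_only = [0.5, 0.5, 0.5]
estimated_sources = [[1.0, 1.0, 1.0], [0.0, 1.0, 2.0], [0.5, 0.5, 0.5]]
loss, assignment = mixit_loss([noisy_speech, noise_only], estimated_sources)
```

In this toy case the first two estimates sum to the first mixture and the third matches the second, so the search recovers that assignment. Because no clean reference signal is needed, only mixtures, this kind of loss can be computed on real noisy recordings.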

research
10/24/2022

Time-Domain Speech Enhancement for Robust Automatic Speech Recognition

It has been shown that the intelligibility of noisy speech can be improv...
research
09/11/2021

Incorporating Real-world Noisy Speech in Neural-network-based Speech Enhancement Systems

Supervised speech enhancement relies on parallel databases of degraded s...
research
06/01/2021

A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech Recognizer And Large Scale Synthetic Data

We consider the problem of recognizing speech utterances spoken to a dev...
research
07/15/2022

Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments

This paper describes noisy speech recognition for an augmented reality h...
research
03/26/2018

Spectral feature mapping with mimic loss for robust speech recognition

For the task of speech enhancement, local learning objectives are agnost...
research
01/03/2019

Deep Speech Enhancement for Reverberated and Noisy Signals using Wide Residual Networks

This paper proposes a deep speech enhancement method which exploits the ...
research
07/28/2020

Neural Kalman Filtering for Speech Enhancement

Statistical signal processing based speech enhancement methods adopt exp...