Incorporating Real-world Noisy Speech in Neural-network-based Speech Enhancement Systems

09/11/2021
by   Yangyang Xia, et al.
0

Supervised speech enhancement relies on parallel databases of degraded speech signals and their clean reference signals during training. This setting prohibits the use of real-world degraded speech data that may better represent the scenarios where such systems are used. In this paper, we explore methods that enable supervised speech enhancement systems to train on real-world degraded speech data. Specifically, we propose a semi-supervised approach for speech enhancement in which we first train a modified vector-quantized variational autoencoder that solves a source separation task. We then use this trained autoencoder to further train an enhancement network using real-world noisy speech data by computing a triplet-based unsupervised loss function. Experiments show promising results for incorporating real-world data in training speech enhancement systems.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/16/2021

Unsupervised Speech Enhancement with speech recognition embedding and disentanglement losses

Speech enhancement has recently achieved great success with various deep...
research
08/11/2020

PoCoNet: Better Speech Enhancement with Frequency-Positional Embeddings, Semi-Supervised Conversational Data, and Biased Loss

Neural network applications generally benefit from larger-sized models, ...
research
02/08/2023

A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech

Recent Text-to-Speech (TTS) systems trained on reading or acted corpora ...
research
08/09/2022

Subjective Evaluation of Deep Neural Network Based Speech Enhancement Systems in Real-World Conditions

Subjective evaluation results for two low-latency deep neural networks (...
research
02/13/2021

Multi-Channel Speech Enhancement using Graph Neural Networks

Multi-channel speech enhancement aims to extract clean speech from a noi...
research
01/25/2023

On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems

The performance of neural network-based speech enhancement systems is pr...
research
03/09/2020

Tackling real noisy reverberant meetings with all-neural source separation, counting, and diarization system

Automatic meeting analysis is an essential fundamental technology requir...

Please sign up or login with your details

Forgot password? Click here to reset