Incorporating Real-world Noisy Speech in Neural-network-based Speech Enhancement Systems

09/11/2021
by Yangyang Xia, et al.
Facebook
Carnegie Mellon University

Supervised speech enhancement relies on parallel databases of degraded speech signals and their clean reference signals during training. This setting prohibits the use of real-world degraded speech data that may better represent the scenarios where such systems are used. In this paper, we explore methods that enable supervised speech enhancement systems to train on real-world degraded speech data. Specifically, we propose a semi-supervised approach for speech enhancement in which we first train a modified vector-quantized variational autoencoder that solves a source separation task. We then use this trained autoencoder to further train an enhancement network using real-world noisy speech data by computing a triplet-based unsupervised loss function. Experiments show promising results for incorporating real-world data in training speech enhancement systems.
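
The abstract does not say how the triplets are formed or which distance is used, so the following is only a minimal sketch of a generic triplet margin loss, not the authors' implementation. The function names, the Euclidean distance, the margin value, and the toy two-dimensional "embeddings" are all assumptions made for illustration; in the paper's setting the three inputs would presumably be representations produced with the trained autoencoder.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Generic triplet loss: max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`. The margin of 1.0 is an assumed default.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Hypothetical embeddings, chosen only so the distances are easy to check.
anchor = [0.0, 0.0]
near = [3.0, 4.0]   # distance 5 from the anchor
far = [6.0, 8.0]    # distance 10 from the anchor

# The positive is already much closer than the negative, so nothing is penalized.
loss_satisfied = triplet_margin_loss(anchor, near, far)

# Swapping the roles violates the ordering, so the loss is positive.
loss_violated = triplet_margin_loss(anchor, far, near)
```

A loss of this shape needs no clean reference signal, only a relative ordering among three items, which is what makes it usable on real-world noisy recordings in principle.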

11/16/2021

Unsupervised Speech Enhancement with speech recognition embedding and disentanglement losses

Speech enhancement has recently achieved great success with various deep...
02/08/2023

A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech

Recent Text-to-Speech (TTS) systems trained on reading or acted corpora ...
02/13/2021

Multi-Channel Speech Enhancement using Graph Neural Networks

Multi-channel speech enhancement aims to extract clean speech from a noi...
08/09/2022

Subjective Evaluation of Deep Neural Network Based Speech Enhancement Systems in Real-World Conditions

Subjective evaluation results for two low-latency deep neural networks (...
01/25/2023

On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems

The performance of neural network-based speech enhancement systems is pr...
03/09/2020

Tackling real noisy reverberant meetings with all-neural source separation, counting, and diarization system

Automatic meeting analysis is an essential fundamental technology requir...