Improved Speech Enhancement with the Wave-U-Net

11/27/2018
by Craig Macartney, et al.

We study the use of the Wave-U-Net architecture, a model introduced by Stoller et al. for separating music vocals from accompaniment, for speech enhancement. This end-to-end learning method for audio source separation operates directly in the time domain, which permits integrated modelling of phase information and allows large temporal contexts to be taken into account. Our experiments show that the proposed method improves on the state of the art for speech enhancement on the Voice Bank corpus (VCTK) dataset in several metrics, namely PESQ, CSIG, CBAK, COVL and SSNR. We also find that fewer hidden layers suffice for speech enhancement than in the original system designed for singing voice separation in music. We see this initial result as an encouraging signal to further explore speech enhancement in the time domain, both as an end in itself and as a pre-processing step for speech recognition systems.
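SSNR (segmental signal-to-noise ratio), one of the metrics listed in the abstract, averages a per-frame SNR between the clean reference and the enhanced time-domain signal. The sketch below illustrates the general idea only; it is not the evaluation code used in the paper. Its parameters are assumptions: non-overlapping frames of 480 samples (30 ms at an assumed 16 kHz sample rate) and per-frame values clamped to [-10, 35] dB, a range commonly used in speech enhancement evaluation.

```python
import math
from typing import Sequence


def segmental_snr(clean: Sequence[float], enhanced: Sequence[float],
                  frame_len: int = 480, min_db: float = -10.0,
                  max_db: float = 35.0, eps: float = 1e-10) -> float:
    """Mean per-frame SNR (dB) over non-overlapping frames.

    Simplified sketch (assumptions): frame_len=480 is 30 ms at an assumed
    16 kHz sample rate; each frame's SNR is clamped to [min_db, max_db].
    Trailing samples that do not fill a whole frame are ignored.
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")
    n_frames = len(clean) // frame_len
    if n_frames == 0:
        raise ValueError("signal shorter than one frame")
    total = 0.0
    for i in range(n_frames):
        start = i * frame_len
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        # Energy of the clean reference in this frame.
        signal_energy = sum(x * x for x in s)
        # Energy of the residual error (clean minus enhanced).
        noise_energy = sum((x - y) ** 2 for x, y in zip(s, e))
        # eps avoids division by zero and log of zero for silent frames.
        snr = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        total += min(max(snr, min_db), max_db)
    return total / n_frames


# Usage: an enhanced signal that is 0.9x the clean one has a residual
# energy 1/100 of the signal energy in every frame, i.e. about 20 dB.
clean = [1.0] * 960
enhanced = [0.9] * 960
print(round(segmental_snr(clean, enhanced), 3))
```

Clamping keeps near-silent or perfectly reconstructed frames from dominating the average, which is why a per-frame bound is typically applied before averaging.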
