Kernel Machines Beat Deep Neural Networks on Mask-based Single-channel Speech Enhancement

11/06/2018
by   Like Hui, et al.
4

We apply a fast kernel method for mask-based single-channel speech enhancement. Specifically, our method solves a kernel regression problem associated to a non-smooth kernel function (exponential power kernel) with a highly efficient iterative method (EigenPro). Due to the simplicity of this method, its hyper-parameters such as kernel bandwidth can be automatically and efficiently selected using line search with subsamples of training data. We observe an empirical correlation between the regression loss (mean square error) and regular metrics for speech enhancement. This observation justifies our training target and motivates us to achieve lower regression loss by training separate kernel model per frequency subband. We compare our method with the state-of-the-art deep neural networks on mask-based HINT and TIMIT. Experimental results show that our kernel method consistently outperforms deep neural networks while requiring less training time.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/27/2018

Student-Teacher Learning for BLSTM Mask-based Speech Enhancement

Spectral mask estimation using bidirectional long short-term memory (BLS...
research
04/07/2019

VoiceID Loss: Speech Enhancement for Speaker Verification

In this paper, we propose VoiceID loss, a novel loss function for traini...
research
09/03/2019

On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement

Many deep learning-based speech enhancement algorithms are designed to m...
research
08/14/2019

Components Loss for Neural Networks in Mask-Based Speech Enhancement

Estimating time-frequency domain masks for single-channel speech enhance...
research
11/16/2018

Exploring Tradeoffs in Models for Low-latency Speech Enhancement

We explore a variety of neural networks configurations for one- and two-...
research
04/02/2019

Speech denoising by parametric resynthesis

This work proposes the use of clean speech vocoder parameters as the tar...
research
08/11/2020

PoCoNet: Better Speech Enhancement with Frequency-Positional Embeddings, Semi-Supervised Conversational Data, and Biased Loss

Neural network applications generally benefit from larger-sized models, ...

Please sign up or login with your details

Forgot password? Click here to reset