Data-driven design of perfect reconstruction filterbank for DNN-based sound source enhancement

03/21/2019
by Daiki Takeuchi, et al.

We propose a data-driven method for designing a perfect-reconstruction filterbank (PRFB) for sound-source enhancement (SSE) based on deep neural networks (DNNs). DNNs have been used to estimate a time-frequency (T-F) mask in the short-time Fourier transform (STFT) domain. Their training is more stable when a simple cost function such as the mean-squared error (MSE) is used than when a more advanced cost, such as an objective sound quality assessment score, is used. However, such a simple cost function carries strong assumptions about the statistics of the target and/or noise, which are often not satisfied, and this mismatch degrades performance. In this paper, we propose to design the frequency scale of the PRFB from training data so that the assumption underlying the MSE is satisfied. To design the frequency scale, the warped filterbank frame (WFBF) is adopted as the PRFB. The frequency characteristic of the learned WFBF lay between those of the STFT and the wavelet transform, and its effectiveness was confirmed by comparison with a standard STFT-based DNN whose input feature is compressed into the mel scale.
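
The pipeline the abstract refers to (analysis transform, T-F mask, MSE cost, perfect-reconstruction synthesis) can be illustrated with a minimal NumPy sketch. This is not the paper's WFBF and not its training procedure: it assumes a plain STFT with a square-root Hann window at 50% overlap as a stand-in PRFB, and an oracle ratio mask in place of a DNN estimate. The function names `stft` and `istft`, the test signal, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

N_FFT, HOP = 256, 128  # assumed frame length and 50% overlap

# Periodic sqrt-Hann window: its square sums to 1 across 50%-overlapped
# frames, so using it for both analysis and synthesis reconstructs exactly.
WIN = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N_FFT) / N_FFT))

def stft(x):
    """Analysis: windowed frames -> one-sided spectra (frames x bins)."""
    x = np.pad(x, (N_FFT, N_FFT))  # zero-pad so edge samples are fully covered
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WIN for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, length):
    """Synthesis: inverse FFT, window again, overlap-add, remove padding."""
    frames = np.fft.irfft(X, n=N_FFT, axis=1) * WIN
    out = np.zeros(HOP * (len(X) - 1) + N_FFT)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
    return out[N_FFT:N_FFT + length]

# Toy data (assumption): a 500 Hz tone as the target, white noise as interference.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
s = np.sin(2 * np.pi * 500 * t)          # target
n = 0.5 * rng.standard_normal(len(t))    # noise
x = s + n                                # observed mixture

S, N, X = stft(s), stft(n), stft(x)

# Perfect reconstruction check: synthesis undoes analysis.
x_rec = istft(X, len(x))

# Oracle ratio mask, standing in for the mask a DNN would estimate.
mask = np.abs(S) / (np.abs(S) + np.abs(N) + 1e-12)

# MSE cost in the T-F domain, before and after masking.
mse_noisy = np.mean(np.abs(X - S) ** 2)
mse_masked = np.mean(np.abs(mask * X - S) ** 2)

# Enhanced waveform obtained through the same perfect-reconstruction synthesis.
s_hat = istft(mask * X, len(x))
```

In this sketch the frequency axis is the uniform STFT grid; the abstract's proposal is to learn the frequency scale of the filterbank from data instead, which the code above does not attempt.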

research
11/25/2019

Invertible DNN-based nonlinear time-frequency transform for speech enhancement

We propose an end-to-end speech enhancement method with trainable time-f...
research
10/22/2018

DNN-based Source Enhancement to Increase Objective Sound Quality Assessment Score

We propose a training method for deep neural network (DNN)-based source ...
research
06/21/2018

On the Equivalence between Objective Intelligibility and Mean-Squared Error for Deep Neural Network based Speech Enhancement

Although speech enhancement algorithms based on deep neural networks (DN...
research
05/21/2019

DNN-Based Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement

Multi-frame approaches for single-microphone speech enhancement, e.g., t...
research
11/08/2020

Frequency Gating: Improved Convolutional Neural Networks for Speech Enhancement in the Time-Frequency Domain

One of the strengths of traditional convolutional neural networks (CNNs)...
research
02/14/2020

Stable Training of DNN for Speech Enhancement based on Perceptually-Motivated Black-Box Cost Function

Improving subjective sound quality of enhanced signals is one of the mos...
research
07/18/2018

Deep neural network based speech separation optimizing an objective estimator of intelligibility for low latency applications

Mean square error (MSE) has been the preferred choice as loss function i...
