Spectral feature mapping with mimic loss for robust speech recognition

03/26/2018
by Deblin Bagchi, et al.

For the task of speech enhancement, local learning objectives are agnostic to the phonetic structures that are helpful for speech recognition. We propose adding a global criterion to ensure that de-noised speech is useful for downstream tasks such as ASR. We first train a spectral classifier on clean speech to predict senone labels. The spectral classifier is then joined with our speech enhancer to form a noisy speech recognizer. This combined model is trained to imitate the output of the spectral classifier alone on clean speech. This mimic loss is combined with the traditional local criterion to train the speech enhancer to produce de-noised speech. Feeding the de-noised speech to an off-the-shelf Kaldi training recipe for the CHiME-2 corpus yields significant improvements in WER.
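The combined training objective described above can be sketched as follows. This is a minimal, framework-free illustration that assumes both the local criterion and the mimic loss are mean squared errors and that the two terms are summed with a weighting factor. The names `mimic_training_loss`, `alpha`, and the toy linear classifier are hypothetical and are not the authors' code; the networks, distance, and weighting used in the paper may differ.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Apply a toy linear layer (no bias) to one spectral frame."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    assert len(a) == len(b), "vectors must have the same length"
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mimic_training_loss(
    enhancer: Callable[[Vector], Vector],
    classifier: Callable[[Vector], Vector],
    noisy: Vector,
    clean: Vector,
    alpha: float = 1.0,
) -> float:
    """Local fidelity loss plus a weighted mimic loss for one frame.

    Hypothetical sketch: `classifier` stands in for the spectral classifier
    pre-trained on clean speech. It is treated as frozen, so in training only
    the enhancer's parameters would be updated.
    """
    enhanced = enhancer(noisy)
    # Local criterion: the enhanced spectrum should match the clean spectrum.
    fidelity = mse(enhanced, clean)
    # Mimic criterion: the classifier's output on enhanced speech should
    # imitate its output on the parallel clean speech.
    mimic = mse(classifier(enhanced), classifier(clean))
    return fidelity + alpha * mimic


if __name__ == "__main__":
    # Toy two-bin "spectra" and a fixed linear classifier (assumed values).
    classifier = lambda x: matvec([[1.0, 0.0], [0.5, 0.5]], x)
    clean = [1.0, 2.0]
    noisy = [1.5, 2.5]
    # An enhancer that passes noisy speech through unchanged is penalized.
    print(mimic_training_loss(lambda x: list(x), classifier, noisy, clean, alpha=0.5))
    # An enhancer that recovers the clean frame exactly incurs zero loss.
    print(mimic_training_loss(lambda _: list(clean), classifier, noisy, clean))
```

A practical implementation would use an automatic-differentiation framework so that gradients of the mimic term flow back through the frozen classifier into the enhancer; this sketch only evaluates the loss for a single frame.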

research
09/25/2018

An Exploration of Mimic Architectures for Residual Network Based Spectral Mapping

Spectral mapping uses a deep neural network (DNN) to map directly from n...
research
10/29/2019

Does Speech enhancement of publicly available data help build robust Speech Recognition Systems?

Automatic speech recognition (ASR) systems play a key role in many comme...
research
11/16/2021

Unsupervised Speech Enhancement with speech recognition embedding and disentanglement losses

Speech enhancement has recently achieved great success with various deep...
research
02/16/2023

PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement

Despite rapid advancement in recent years, current speech enhancement mo...
research
11/07/2020

Dual Application of Speech Enhancement for Automatic Speech Recognition

In this work, we exploit speech enhancement for improving a recurrent ne...
research
09/30/2022

Blind Signal Dereverberation for Machine Speech Recognition

We present a method to remove unknown convolutive noise introduced to sp...
research
07/17/2020

SkipConvNet: Skip Convolutional Neural Network for Speech Dereverberation using Optimally Smoothed Spectral Mapping

The reliability of using fully convolutional networks (FCNs) has been su...