Robust Front-End for Multi-Channel ASR using Flow-Based Density Estimation

07/25/2020
by   Hyeongju Kim, et al.
0

For multi-channel speech recognition, speech enhancement techniques such as denoising or dereverberation are conventionally applied as a front-end processor. Deep learning-based front-ends using such techniques require aligned clean and noisy speech pairs which are generally obtained via data simulation. Recently, several joint optimization techniques have been proposed to train the front-end without parallel data within an end-to-end automatic speech recognition (ASR) scheme. However, the ASR objective is sub-optimal and insufficient for fully training the front-end, which still leaves room for improvement. In this paper, we propose a novel approach which incorporates flow-based density estimation for the robust front-end using non-parallel clean and noisy speech. Experimental results on the CHiME-4 dataset show that the proposed method outperforms the conventional techniques where the front-end is trained only with ASR objective.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/09/2020

Improving noise robust automatic speech recognition with single-channel time-domain enhancement network

With the advent of deep learning, research on noise-robust automatic spe...
research
07/11/2022

pMCT: Patched Multi-Condition Training for Robust Speech Recognition

We propose a novel Patched Multi-Condition Training (pMCT) method for ro...
research
03/24/2022

Computing Optimal Location of Microphone for Improved Speech Recognition

It was shown in our earlier work that the measurement error in the micro...
research
02/11/2021

An Investigation of End-to-End Models for Robust Speech Recognition

End-to-end models for robust automatic speech recognition (ASR) have not...
research
07/22/2021

Multitask-Based Joint Learning Approach To Robust ASR For Radio Communication Speech

To realize robust end-to-end Automatic Speech Recognition(E2E ASR) under...
research
04/19/2019

An Investigation of End-to-End Multichannel Speech Recognition for Reverberant and Mismatch Conditions

Sequence-to-sequence (S2S) modeling is becoming a popular paradigm for a...
research
04/19/2019

Dry, Focus, and Transcribe: End-to-End Integration of Dereverberation, Beamforming, and ASR

Sequence-to-sequence (S2S) modeling is becoming a popular paradigm for a...

Please sign up or login with your details

Forgot password? Click here to reset