An Investigation into the Effectiveness of Enhancement in ASR Training and Test for CHiME-5 Dinner Party Transcription

09/26/2019
by Catalin Zorila, et al.

Despite the strong modeling power of neural network acoustic models, speech enhancement has been shown to deliver additional word error rate improvements if multi-channel data is available. However, there has been a longstanding debate over whether enhancement should also be carried out on the ASR training data. In an extensive experimental evaluation on the acoustically very challenging CHiME-5 dinner party data we show that: (i) cleaning up the training data can lead to substantial error rate reductions, and (ii) enhancement in training is advisable as long as enhancement in test is at least as strong as in training. This approach stands in contrast to, and delivers larger gains than, the common strategy reported in the literature of augmenting the training database with additional artificially degraded speech. Together with an acoustic model topology consisting of initial CNN layers followed by factorized TDNN layers, we achieve a word error rate of 41.6%, a new single-system state-of-the-art result on the CHiME-5 data. This is an 8% relative improvement compared to the best word error rate published so far for a speech recognizer without system combination.
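
As a reading aid for the relative improvement quoted in the abstract, the following is a minimal sketch of how a relative word error rate (WER) reduction is usually computed, and how a rounded relative figure maps back to an approximate baseline. The function names are hypothetical and not taken from the paper, and the sketch assumes the abstract's figures (41.6 and 8) are percentages. Because the relative figure is rounded, the implied baseline is only approximate and should not be read as the exact WER of the earlier system.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline_wer(new_wer: float, relative_reduction: float) -> float:
    """Baseline WER implied by a new WER and a relative reduction (fraction)."""
    if not 0 <= relative_reduction < 1:
        raise ValueError("relative_reduction must be in [0, 1)")
    return new_wer / (1.0 - relative_reduction)


if __name__ == "__main__":
    # Assumption: 41.6 is a WER in percent and 0.08 is the rounded
    # relative improvement quoted in the abstract.
    approx_baseline = implied_baseline_wer(41.6, 0.08)
    print(f"approximate implied baseline WER: {approx_baseline:.1f}%")
    print(f"check: {relative_wer_reduction(approx_baseline, 41.6):.3f}")
```

Note that a relative reduction differs from an absolute one: under these assumptions, the absolute gap between the implied baseline and 41.6% is only a few percentage points.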

research
03/13/2019

Frequency Domain Multi-channel Acoustic Modeling for Distant Speech Recognition

Conventional far-field automatic speech recognition (ASR) systems typica...
research
05/16/2019

Learning discriminative features in sequence training without requiring framewise labelled data

In this work, we try to answer two questions: Can deeply learned feature...
research
05/29/2019

Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR

In this paper, we present Hitachi and Paderborn University's joint effor...
research
03/13/2019

Multi-Geometry Spatial Acoustic Modeling for Distant Speech Recognition

The use of spatial information with multiple microphones can improve far...
research
12/11/2021

Perceptual Loss with Recognition Model for Single-Channel Enhancement and Robust ASR

Single-channel speech enhancement approaches do not always improve autom...
research
12/02/2020

Combining Spatial Clustering with LSTM Speech Models for Multichannel Speech Enhancement

Recurrent neural networks using the LSTM architecture can achieve signif...
research
11/28/2018

Acoustics-guided evaluation (AGE): a new measure for estimating performance of speech enhancement algorithms for robust ASR

One challenging problem of robust automatic speech recognition (ASR) is ...
