Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription

04/22/2020
by Andrei Andrusenko, et al.

While end-to-end ASR systems have proven competitive with the conventional hybrid approach, they are prone to accuracy degradation in noisy and low-resource conditions. In this paper, we argue that, even in such difficult cases, some end-to-end approaches perform close to the hybrid baseline. To demonstrate this, we use the CHiME-6 Challenge data as an example of challenging environments and noisy conditions of everyday speech. We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches, along with RNN versus Transformer architectures. We also compare acoustic features and speech enhancement methods, and we evaluate the effectiveness of neural network language models for hypothesis re-scoring in low-resource conditions. Our best end-to-end model, based on RNN-Transducer together with improved beam search, is only 3.8% absolute WER worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline. With Guided Source Separation based speech enhancement, this approach outperforms the hybrid baseline system by 2.7% absolute WER and the best previously known end-to-end system by 25.7% absolute WER.
