Multiple-hypothesis CTC-based semi-supervised adaptation of end-to-end speech recognition

03/29/2021
by   Cong-Thanh Do, et al.

This paper proposes an adaptation method for end-to-end speech recognition. In this method, multiple automatic speech recognition (ASR) 1-best hypotheses are integrated into the computation of the connectionist temporal classification (CTC) loss function. Integrating multiple ASR hypotheses helps alleviate the impact of errors in the ASR hypotheses on the computation of the CTC loss when ASR hypotheses are used in place of manual transcriptions. When applied in semi-supervised adaptation scenarios, where part of the adaptation data is unlabeled, the CTC loss of the proposed method is computed from different ASR 1-best hypotheses obtained by decoding the unlabeled adaptation data. Experiments are performed in clean and multi-condition training scenarios, where the CTC-based end-to-end ASR systems are trained on Wall Street Journal (WSJ) clean training data and CHiME-4 multi-condition training data, respectively, and tested on Aurora-4 test data. The proposed adaptation method yields 6.6% and 5.8% relative word error rate (WER) reductions in the clean and multi-condition training scenarios, respectively, compared to a baseline system that is adapted, using back-propagation fine-tuning, with only the part of the adaptation data that has manual transcriptions.
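The idea of computing a CTC loss from several 1-best hypotheses can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the per-hypothesis losses are combined, so the sketch assumes a plain sum, and the function names `ctc_neg_log_likelihood` and `multi_hypothesis_ctc_loss` are hypothetical. The CTC term itself is the standard forward algorithm, written in pure Python for clarity rather than speed.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_neg_log_likelihood(log_probs, labels, blank=0):
    """Standard CTC forward algorithm.

    log_probs: list of T frames, each a list of per-symbol log-probabilities.
    labels:    target label sequence (ints, no blanks).
    Returns -log P(labels | log_probs); inf if no alignment exists.
    """
    # Interleave blanks: [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S, T = len(ext), len(log_probs)

    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]
            if s >= 1:
                cands.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    total = logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -total


def multi_hypothesis_ctc_loss(log_probs, hypotheses, blank=0):
    """Assumed combination rule: sum the CTC losses of all 1-best hypotheses."""
    return sum(ctc_neg_log_likelihood(log_probs, h, blank) for h in hypotheses)


# Toy usage: 2 frames, uniform posteriors over {blank, 1, 2},
# and two competing 1-best hypotheses for the same unlabeled utterance.
frames = [[math.log(1 / 3)] * 3 for _ in range(2)]
loss = multi_hypothesis_ctc_loss(frames, [[1], [2]])
```

In a semi-supervised setup of the kind described above, the hypotheses for an unlabeled utterance would come from decoding it with different ASR systems or decoding configurations, while labeled utterances would keep the usual single-reference CTC loss. Averaging instead of summing would only rescale the gradient by the number of hypotheses.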


