Sequence-level self-learning with multiple hypotheses

12/10/2021
by Kenichi Kumatani, et al.

In this work, we develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR). For untranscribed speech data, the hypothesis produced by an ASR system must serve as the label. However, imperfect ASR results make it difficult for unsupervised learning to consistently improve recognition performance, especially when multiple powerful teacher models are unavailable. In contrast to conventional unsupervised learning approaches, we adopt a multi-task learning (MTL) framework in which the n-th best ASR hypothesis is used as the label of the n-th task. The seq2seq network is updated through the MTL framework so as to find a common representation that covers multiple hypotheses, thereby alleviating the effect of hard-decision errors. We first demonstrate the effectiveness of our self-learning methods through ASR experiments on an accent adaptation task from US to British English speech. Our results show that our method reduces the WER on the British speech data from 14.55% to 10.36% relative to a baseline model trained only on US English data. Moreover, we investigate the effect of the proposed methods in a federated learning scenario.
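To make the multi-task objective concrete, the sketch below shows one way a loss over n-best pseudo-labels could be assembled in PyTorch. It is a minimal illustration, not the paper's implementation: the model interface (a seq2seq forward pass returning teacher-forced token logits for a given target sequence) and the score-based interpolation weights are assumptions introduced here for clarity.

# Minimal sketch of a multi-hypothesis MTL loss for self-learning on
# untranscribed speech. Assumes model(features, targets) returns per-token
# logits of shape (L, vocab_size) under teacher forcing; the softmax
# weighting of hypotheses by their ASR scores is an illustrative choice.
import torch
import torch.nn.functional as F

def multi_hypothesis_loss(model, speech_features, nbest_hypotheses, nbest_scores):
    # speech_features:  (T, D) tensor of acoustic features for one utterance
    # nbest_hypotheses: list of 1-D LongTensors of token ids (pseudo-labels)
    # nbest_scores:     1-D tensor of ASR log-scores, one per hypothesis
    weights = torch.softmax(nbest_scores, dim=0)  # interpolation weights (assumption)
    total = 0.0
    for weight, hypothesis in zip(weights, nbest_hypotheses):
        logits = model(speech_features, hypothesis)        # (L, vocab_size)
        task_loss = F.cross_entropy(logits, hypothesis)    # loss for the n-th task
        total = total + weight * task_loss                 # weighted sum over tasks
    return total

Updating the seq2seq parameters with this combined loss pushes the network toward a representation consistent with all retained hypotheses, rather than committing to a single, possibly erroneous, 1-best transcript.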


