Hierarchical Transformer-based Large-Context End-to-end ASR with Large-Context Knowledge Distillation

02/16/2021
by Ryo Masumura, et al.

We present a novel large-context end-to-end automatic speech recognition (E2E-ASR) model and an effective training method for it based on knowledge distillation. Common E2E-ASR models focus mainly on utterance-level processing, in which each utterance is transcribed independently. Large-context E2E-ASR models, in contrast, take long-range sequential contexts beyond utterance boundaries into account and are therefore well suited to sequences of utterances such as discourses and conversations. However, the transformer architecture, which has recently achieved state-of-the-art performance among utterance-level ASR systems, has not yet been introduced into large-context ASR systems. We expect the transformer architecture to be effective for capturing not only contexts within the input speech but also long-range sequential contexts beyond utterance boundaries. This paper therefore proposes a hierarchical transformer-based large-context E2E-ASR model that combines the transformer architecture with hierarchical encoder-decoder based large-context modeling. In addition, to enable the proposed model to exploit long-range sequential contexts, we propose large-context knowledge distillation, which distills knowledge from a pre-trained large-context language model during training. We evaluate the effectiveness of the proposed model and training method on Japanese discourse ASR tasks.
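The abstract describes two components: a hierarchical model in which an utterance-level transformer ASR is conditioned on an encoding of the preceding utterances, and a training objective that distills knowledge from a pre-trained large-context language model. The sketch below is a minimal, hedged PyTorch-style reading of those two ideas; the module names, feature sizes, the mean-pooled context fusion, and the loss interpolation are illustrative assumptions, not the authors' exact architecture or objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalLargeContextASR(nn.Module):
    """Utterance-level transformer ASR conditioned on a long-range context encoder (sketch)."""
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=6, feat_dim=80):
        super().__init__()
        self.speech_proj = nn.Linear(feat_dim, d_model)   # acoustic features -> model dim
        self.asr = nn.Transformer(d_model=d_model, nhead=nhead,
                                  num_encoder_layers=num_layers,
                                  num_decoder_layers=num_layers,
                                  batch_first=True)
        # Hierarchical part: a second transformer encoder over per-utterance summary
        # vectors of the preceding utterances, yielding a long-range context vector.
        ctx_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.context_encoder = nn.TransformerEncoder(ctx_layer, num_layers=2)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, speech, prev_utt_summaries, tgt_tokens):
        # speech: (B, T, feat_dim); prev_utt_summaries: (B, U, d_model); tgt_tokens: (B, L)
        ctx = self.context_encoder(prev_utt_summaries)     # contextualize previous utterances
        ctx_vec = ctx.mean(dim=1, keepdim=True)            # pool into one context vector
        memory_in = self.speech_proj(speech)
        tgt = self.embed(tgt_tokens) + ctx_vec             # inject context into the decoder input
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(
            tgt_tokens.size(1)).to(tgt.device)             # causal mask for teacher forcing
        dec = self.asr(memory_in, tgt, tgt_mask=tgt_mask)
        return self.out(dec)                               # (B, L, vocab_size)

def large_context_kd_loss(asr_logits, teacher_lm_logits, labels, temperature=2.0, alpha=0.5):
    """Cross-entropy on the transcripts interpolated with KL distillation from a frozen
    large-context LM teacher evaluated at the same token positions (assumed formulation)."""
    ce = F.cross_entropy(asr_logits.transpose(1, 2), labels)
    kd = F.kl_div(F.log_softmax(asr_logits / temperature, dim=-1),
                  F.softmax(teacher_lm_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2
    return (1.0 - alpha) * ce + alpha * kd
```

Under this reading, the large-context LM teacher sees the whole utterance sequence, so matching its output distributions pushes the ASR decoder toward hypotheses consistent with the long-range context, even though the student itself conditions only on a pooled summary of the previous utterances.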
