
Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition

by Ye Bai, et al.

Integrating an external language model into a sequence-to-sequence speech recognition system is non-trivial. Previous works use linear interpolation or a fusion network to integrate external language models, but these approaches introduce external components and increase decoding computation. In this paper, we instead propose a knowledge-distillation-based training approach to integrating external language models into a sequence-to-sequence model. A recurrent neural network language model, trained on large-scale external text, generates soft labels to guide the sequence-to-sequence model's training; the language model thus plays the role of the teacher. This approach adds no external component to the sequence-to-sequence model at test time, and it can be flexibly combined with the shallow fusion technique during decoding. Experiments are conducted on the public Chinese datasets AISHELL-1 and CLMAD. Our approach achieves a character error rate of 9.3%, improving on the vanilla sequence-to-sequence model.
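The two mechanisms in the abstract can be sketched in a few lines. Below is a minimal, hedged illustration (not the authors' implementation): a per-token distillation loss that mixes the usual hard-label cross-entropy with a cross-entropy against the teacher LM's soft labels, plus a shallow fusion score that log-linearly combines the seq2seq and LM probabilities at decode time. The function names and the weights `lam` and `beta` are hypothetical; real systems would compute these over batched tensors.

```python
import math

def distillation_loss(student_probs, teacher_probs, hard_label, lam=0.5):
    """Per-token training loss for LM-teacher knowledge distillation.

    student_probs: seq2seq output distribution over the vocabulary.
    teacher_probs: soft labels from the external RNN language model.
    hard_label:    index of the ground-truth token.
    lam:           interpolation weight (hypothetical name) between the
                   hard-label term and the soft-label term.
    """
    # Standard cross-entropy on the ground-truth transcript.
    ce_hard = -math.log(student_probs[hard_label])
    # Cross-entropy against the teacher's distribution: this is the
    # term through which the LM's knowledge guides the seq2seq model.
    ce_soft = -sum(t * math.log(s)
                   for t, s in zip(teacher_probs, student_probs) if t > 0)
    return lam * ce_hard + (1.0 - lam) * ce_soft

def shallow_fusion_score(s2s_prob, lm_prob, beta=0.3):
    """Decode-time score for one candidate token under shallow fusion:
    log p_seq2seq + beta * log p_LM (beta is a tunable LM weight)."""
    return math.log(s2s_prob) + beta * math.log(lm_prob)
```

For example, with `lam=1.0` the loss reduces to the ordinary hard-label cross-entropy, and with `lam=0.0` the seq2seq model is trained purely toward the teacher's distribution; the paper's point is that, unlike fusion networks, none of this machinery survives into the deployed model.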




Related research

- An analysis of incorporating an external language model into a sequence-to-sequence model
- Candidate Fusion: Integrating Language Modelling into a Sequence-to-Sequence Handwritten Word Recognition Architecture
- Integrating Whole Context to Sequence-to-sequence Speech Recognition
- Cold Fusion: Training Seq2Seq Models Together with Language Models
- Memory Attentive Fusion: External Language Model Integration for Transformer-based Sequence-to-Sequence Model
- Early Stage LM Integration Using Local and Global Log-Linear Combination
- Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions