Improved training for online end-to-end speech recognition systems

11/06/2017
by   Suyoun Kim, et al.
0

Achieving high accuracy with end-to-end speech recognizers requires careful parameter initialization prior to training. Otherwise, the networks may fail to find a good local optimum. This is particularly true for low-latency online networks, such as unidirectional LSTMs. Currently, the best strategy to train such systems is to bootstrap the training from a tied-triphone system. However, this is time consuming, and more importantly, is impossible for languages without a high-quality pronunciation lexicon. In this work, we propose an initialization strategy that uses teacher-student learning to transfer knowledge from a large, well-trained, offline end-to-end speech recognition model to an online end-to-end model, eliminating the need for a lexicon or any other linguistic resources. We also explore curriculum learning and label smoothing and show how they can be combined with the proposed teacher-student learning for further improvements. We evaluate our methods on a Microsoft Cortana personal assistant task and show that the proposed method results in a 19 baseline system.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/22/2022

Pseudo Label Is Better Than Human Label

State-of-the-art automatic speech recognition (ASR) systems are trained ...
research
03/17/2020

High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model

While the community keeps promoting end-to-end models over conventional ...
research
10/27/2017

BridgeNets: Student-Teacher Transfer Learning Based on Recursive Neural Networks and its Application to Distant Speech Recognition

Despite the remarkable progress achieved on automatic speech recognition...
research
11/26/2020

Streaming end-to-end multi-talker speech recognition

End-to-end multi-talker speech recognition is an emerging research trend...
research
01/06/2020

Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition

Teacher-student (T/S) has shown to be effective for domain adaptation of...
research
09/28/2021

Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification

End-to-end speech-to-intent classification has shown its advantage in ha...
research
12/08/2015

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

We show that an end-to-end deep learning approach can be used to recogni...

Please sign up or login with your details

Forgot password? Click here to reset