Training ASR models by Generation of Contextual Information

10/27/2019
by Kritika Singh, et al.

Supervised ASR models have reached unprecedented levels of accuracy, thanks in part to ever-increasing amounts of labelled training data. However, in many applications and locales, only moderate amounts of data are available, which has led to a surge in semi- and weakly-supervised learning research. In this paper, we conduct a large-scale study evaluating the effectiveness of weakly-supervised learning for speech recognition by using loosely related contextual information as a surrogate for ground-truth labels. For weakly supervised training, we use 50k hours of public English social media videos, along with their respective titles and post text, to train an encoder-decoder transformer model. Our best encoder-decoder models achieve an average 20.8% WER reduction over a 1000-hour supervised baseline, and an average 13.4% WER reduction when only the weakly supervised encoder is used for CTC fine-tuning. Our results show that our setup for weak supervision improved both the encoder's acoustic representations and the decoder's language generation abilities.
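As a reading aid for the WER figures above, here is a minimal Python sketch of how word error rate and a WER reduction are commonly computed. It is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not say whether the reported reductions are relative or absolute, and the sketch assumes relative. The function names `word_error_rate` and `relative_wer_reduction` are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent (assumed interpretation, see lead-in)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Illustrative values only; these are not results from the paper.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
example_reduction = relative_wer_reduction(baseline_wer=25.0, new_wer=20.0)
```

In the usage lines, the hypothesis differs from the reference by one substitution and one deletion over six reference words, and a drop from a WER of 25.0 to 20.0 corresponds to a 20% relative reduction.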

research
05/16/2020

Large scale weakly and semi-supervised learning for low-resource video ASR

Many semi- and weakly-supervised approaches have been investigated for o...
research
06/09/2022

Joint Encoder-Decoder Self-Supervised Pre-training for ASR

Self-supervised learning (SSL) has shown tremendous success in various s...
research
08/04/2020

Weakly Supervised Construction of ASR Systems with Massive Video Data

Building Automatic Speech Recognition (ASR) systems from scratch is sign...
research
06/21/2021

Two-Stream Consensus Network: Submission to HACS Challenge 2021 Weakly-Supervised Learning Track

This technical report presents our solution to the HACS Temporal Action ...
research
06/07/2022

LegoNN: Building Modular Encoder-Decoder Models

State-of-the-art encoder-decoder models (e.g. for machine translation (M...
research
10/12/2021

Word Order Does Not Matter For Speech Recognition

In this paper, we study training of automatic speech recognition system ...
research
05/30/2023

Understanding temporally weakly supervised training: A case study for keyword spotting

The currently most prominent algorithm to train keyword spotting (KWS) m...
