Weakly Supervised Construction of ASR Systems with Massive Video Data

08/04/2020
by Mengli Cheng, et al.

Building Automatic Speech Recognition (ASR) systems from scratch is challenging, mostly because annotating a large amount of audio data with transcripts is time-consuming and expensive. Although several unsupervised pre-training models have been proposed, applying such models directly may still be sub-optimal if more labeled training data can be obtained at low cost. In this paper, we present a weakly supervised framework for constructing ASR systems from massive video data. Because videos often contain human speech aligned with subtitles, we treat videos as an important knowledge source and propose an effective approach, based on Optical Character Recognition (OCR), to extract high-quality audio aligned with transcripts from videos. After weakly supervised pre-training, the underlying ASR model can be fine-tuned on any domain-specific target training dataset. Extensive experiments show that our framework can produce state-of-the-art results on six public datasets for Mandarin speech recognition.
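To make the idea of subtitle-aligned audio concrete, here is a minimal sketch of one possible alignment step. It is an illustration, not the authors' pipeline: it assumes some OCR engine (not shown) has already produced one text string per sampled video frame, and it simply merges consecutive frames showing the same subtitle into timed segments. The names `Segment`, `group_subtitle_frames`, and the `min_duration` threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str     # subtitle text used as the weak transcript


def group_subtitle_frames(frames: Iterable[Tuple[float, str]],
                          min_duration: float = 0.5) -> List[Segment]:
    """Merge consecutive frames that show the same subtitle into segments.

    `frames` is a time-ordered sequence of (timestamp_seconds, ocr_text);
    an empty string means no subtitle was detected in that frame.
    Both the input layout and `min_duration` are assumptions of this sketch.
    """
    segments: List[Segment] = []
    current_text = ""
    start = 0.0
    last_ts = 0.0
    for ts, text in frames:
        text = text.strip()
        if text != current_text:
            # Subtitle changed: close the running segment at this timestamp.
            if current_text and ts - start >= min_duration:
                segments.append(Segment(start, ts, current_text))
            current_text, start = text, ts
        last_ts = ts
    # Close the segment still open at the end of the video.
    if current_text and last_ts - start >= min_duration:
        segments.append(Segment(start, last_ts, current_text))
    return segments
```

Each resulting segment gives a time span and a candidate transcript; the matching audio could then be cut out and paired with the text as a weakly labeled training example. A real system would also need to handle OCR noise and subtitles that are not synchronized with the speech, which this sketch ignores.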

