Deep Contextualized Acoustic Representations For Semi-Supervised Speech Recognition

12/03/2019
by   Shaoshi Ling, et al.
0

We propose a novel approach to semi-supervised automatic speech recognition (ASR). We first exploit a large amount of unlabeled audio data via representation learning, where we reconstruct a temporal slice of filterbank features from past and future context frames. The resulting deep contextualized acoustic representations (DeCoAR) are then used to train a CTC-based end-to-end ASR system using a smaller amount of labeled audio data. In our experiments, we show that systems trained on DeCoAR consistently outperform ones trained on conventional filterbank features, giving 42 the baseline on WSJ eval92 and LibriSpeech test-clean, respectively. Our approach can drastically reduce the amount of labeled data required; unsupervised training on LibriSpeech then supervision with 100 hours of labeled data achieves performance on par with training on all 960 hours directly.

READ FULL TEXT
research
12/11/2020

DeCoAR 2.0: Deep Contextualized Acoustic Representations with Vector Quantization

Recent success in speech representation learning enables a new way to le...
research
11/19/2019

End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures

We study ResNet-, Time-Depth Separable ConvNets-, and Transformer-based ...
research
05/19/2023

Unsupervised ASR via Cross-Lingual Pseudo-Labeling

Recent work has shown that it is possible to train an unsupervised autom...
research
02/24/2020

Semi-Supervised Speech Recognition via Local Prior Matching

For sequence transduction tasks like speech recognition, a strong struct...
research
08/04/2020

"This is Houston. Say again, please". The Behavox system for the Apollo-11 Fearless Steps Challenge (phase II)

We describe the speech activity detection (SAD), speaker diarization (SD...
research
04/24/2019

Realizing Petabyte Scale Acoustic Modeling

Large scale machine learning (ML) systems such as the Alexa automatic sp...
research
04/03/2022

Automatic Dialect Density Estimation for African American English

In this paper, we explore automatic prediction of dialect density of the...

Please sign up or login with your details

Forgot password? Click here to reset