Deep Contextualized Acoustic Representations For Semi-Supervised Speech Recognition

12/03/2019
by   Shaoshi Ling, et al.
Amazon
0

We propose a novel approach to semi-supervised automatic speech recognition (ASR). We first exploit a large amount of unlabeled audio data via representation learning, where we reconstruct a temporal slice of filterbank features from past and future context frames. The resulting deep contextualized acoustic representations (DeCoAR) are then used to train a CTC-based end-to-end ASR system using a smaller amount of labeled audio data. In our experiments, we show that systems trained on DeCoAR consistently outperform ones trained on conventional filterbank features, giving 42 the baseline on WSJ eval92 and LibriSpeech test-clean, respectively. Our approach can drastically reduce the amount of labeled data required; unsupervised training on LibriSpeech then supervision with 100 hours of labeled data achieves performance on par with training on all 960 hours directly.

READ FULL TEXT
12/11/2020

DeCoAR 2.0: Deep Contextualized Acoustic Representations with Vector Quantization

Recent success in speech representation learning enables a new way to le...
11/19/2019

End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures

We study ResNet-, Time-Depth Separable ConvNets-, and Transformer-based ...
05/19/2023

Unsupervised ASR via Cross-Lingual Pseudo-Labeling

Recent work has shown that it is possible to train an unsupervised autom...
02/24/2020

Semi-Supervised Speech Recognition via Local Prior Matching

For sequence transduction tasks like speech recognition, a strong struct...
08/04/2020

"This is Houston. Say again, please". The Behavox system for the Apollo-11 Fearless Steps Challenge (phase II)

We describe the speech activity detection (SAD), speaker diarization (SD...
04/24/2019

Realizing Petabyte Scale Acoustic Modeling

Large scale machine learning (ML) systems such as the Alexa automatic sp...
04/03/2022

Automatic Dialect Density Estimation for African American English

In this paper, we explore automatic prediction of dialect density of the...

Please sign up or login with your details

Forgot password? Click here to reset