DeCoAR 2.0: Deep Contextualized Acoustic Representations with Vector Quantization

12/11/2020
by   Shaoshi Ling, et al.

Recent success in speech representation learning enables a new way to leverage unlabeled data to train speech recognition models. In speech representation learning, a large amount of unlabeled data is used in a self-supervised manner to learn a feature representation. A smaller amount of labeled data is then used to train a downstream ASR system on the new feature representations. Based on our previous work, DeCoAR, and inspiration from other speech representation learning approaches, we propose DeCoAR 2.0, a Deep Contextualized Acoustic Representation with vector quantization. We introduce several modifications over DeCoAR: first, we use Transformers in the encoding module instead of LSTMs; second, we introduce a vector quantization layer between the encoder and reconstruction modules; third, we propose an objective that combines the reconstruction loss with a vector quantization diversity loss to train speech representations. Our experiments show consistent improvements over other speech representations in different data-sparse scenarios. Without fine-tuning, a lightweight ASR model trained on 10 hours of LibriSpeech labeled data with DeCoAR 2.0 features outperforms the model trained on the full 960-hour dataset with filterbank features.
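
As a rough illustration of the third modification, the sketch below combines a reconstruction term with a codebook diversity term. The abstract does not give the exact formulas, so everything here is an assumption: the L1 reconstruction error, the perplexity-based diversity penalty (a common choice for vector quantization layers), the weight `diversity_weight`, and all function names are hypothetical and not taken from the paper.

```python
import math

def l1_reconstruction_loss(predicted, target):
    """Mean absolute error between predicted and target frames.

    Both arguments are lists of frames; each frame is a list of floats.
    Assumption: L1 is used here only for illustration.
    """
    total, count = 0.0, 0
    for p_frame, t_frame in zip(predicted, target):
        for p, t in zip(p_frame, t_frame):
            total += abs(p - t)
            count += 1
    return total / count

def diversity_loss(code_probs):
    """Penalty that shrinks as codebook usage becomes more uniform.

    code_probs holds one probability distribution over codebook entries
    per frame. The penalty is (V - perplexity) / V, where V is the number
    of codes and perplexity is exp(entropy) of the averaged distribution.
    Assumption: this form is a common choice, not the paper's definition.
    """
    num_codes = len(code_probs[0])
    avg = [sum(frame[v] for frame in code_probs) / len(code_probs)
           for v in range(num_codes)]
    entropy = -sum(p * math.log(p) for p in avg if p > 0)
    return (num_codes - math.exp(entropy)) / num_codes

def combined_loss(predicted, target, code_probs, diversity_weight=0.1):
    """Reconstruction loss plus a weighted diversity penalty.

    diversity_weight is a hypothetical hyperparameter.
    """
    return (l1_reconstruction_loss(predicted, target)
            + diversity_weight * diversity_loss(code_probs))
```

With this formulation, the diversity term is near zero when all codebook entries are used equally often and grows as usage collapses onto a few entries, which is the behavior a diversity loss is meant to encourage.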


