wav2vec: Unsupervised Pre-training for Speech Recognition

04/11/2019
by Steffen Schneider, et al.

We explore unsupervised pre-training for speech recognition by learning representations of raw audio. wav2vec is trained on large amounts of unlabeled audio data and the resulting representations are then used to improve acoustic model training. We pre-train a simple multi-layer convolutional neural network optimized via a noise contrastive binary classification task. Our experiments on WSJ reduce WER of a strong character-based log-mel filterbank baseline by up to 32% when only a few hours of transcribed data is available. Our approach achieves 2.78% WER, outperforming the best reported character-based system in the literature while using three orders of magnitude less labeled training data.
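To make the pre-training objective concrete, the sketch below is a minimal, hypothetical PyTorch illustration of the general idea described in the abstract: a convolutional encoder maps raw audio to latent features, a convolutional context network aggregates them, and the model is trained with a noise contrastive binary classification loss that separates true future latents from distractor samples. The class name, layer sizes, strides, number of prediction steps, and the within-utterance negative-sampling scheme are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Wav2VecSketch(nn.Module):
    """Minimal sketch: conv encoder + conv context network trained with a
    noise contrastive binary classification loss (sizes are illustrative)."""

    def __init__(self, dim=512, steps=3):
        super().__init__()
        # Encoder: raw waveform -> latent features z (hypothetical kernels/strides)
        self.encoder = nn.Sequential(
            nn.Conv1d(1, dim, kernel_size=10, stride=5), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=8, stride=4), nn.ReLU(),
        )
        # Context network: aggregates past latents into context vectors c
        self.context = nn.Sequential(
            nn.Conv1d(dim, dim, kernel_size=3, padding=2), nn.ReLU(),
        )
        # One projection per prediction offset k = 1..steps
        self.proj = nn.ModuleList(nn.Linear(dim, dim) for _ in range(steps))

    def forward(self, wav):                          # wav: (batch, samples)
        z = self.encoder(wav.unsqueeze(1))           # (batch, dim, T)
        c = self.context(z)[..., : z.size(-1)]       # causal crop back to length T
        return z.transpose(1, 2), c.transpose(1, 2)  # both (batch, T, dim)

    def loss(self, wav, n_negatives=10):
        z, c = self.forward(wav)
        total = 0.0
        for k, proj in enumerate(self.proj, start=1):
            if z.size(1) <= k:
                continue
            pred = proj(c[:, :-k])                   # predict z_{t+k} from c_t
            pos = z[:, k:]                           # true future latents
            # Negatives: latents drawn from other time steps of the same utterance
            idx = torch.randint(0, pos.size(1),
                                (pos.size(0), pos.size(1), n_negatives))
            neg = torch.gather(
                pos.unsqueeze(2).expand(-1, -1, n_negatives, -1), 1,
                idx.unsqueeze(-1).expand(-1, -1, -1, pos.size(-1)))
            pos_logit = (pred * pos).sum(-1)              # (batch, T-k)
            neg_logit = (pred.unsqueeze(2) * neg).sum(-1) # (batch, T-k, n_neg)
            # Binary classification: true futures -> 1, distractors -> 0
            total = total + F.binary_cross_entropy_with_logits(
                pos_logit, torch.ones_like(pos_logit))
            total = total + F.binary_cross_entropy_with_logits(
                neg_logit, torch.zeros_like(neg_logit))
        return total

# Example usage (hypothetical shapes): two one-second clips at 16 kHz
model = Wav2VecSketch()
loss = model.loss(torch.randn(2, 16000))
loss.backward()
```

The representations that feed a downstream acoustic model would be the context vectors c; in this sketch the contrastive loss is the only training signal, matching the unsupervised setting described above.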

research
10/22/2019

Improving Transformer-based Speech Recognition Using Unsupervised Pre-training

Speech recognition technologies are gaining enormous popularity in vario...
research
10/22/2020

Self-training and Pre-training are Complementary for Speech Recognition

Self-training and unsupervised pre-training have emerged as effective ap...
research
10/28/2019

Unsupervised pre-training for sequence to sequence speech recognition

This paper proposes a novel approach to pre-train encoder-decoder sequen...
research
05/20/2020

A Further Study of Unsupervised Pre-training for Transformer Based Speech Recognition

Building a good speech recognition system usually requires large amounts...
research
07/29/2020

Transformer based unsupervised pre-training for acoustic representation learning

Computational audio analysis has become a central issue in associated ar...
research
01/14/2022

Learning from One and Only One Shot

Humans can generalize from only a few examples and from little pre-train...
