Wav2Vec-Aug: Improved self-supervised training with limited data

06/27/2022
by Anuroop Sriram, et al.

Self-supervised learning (SSL) of speech representations has received much attention over the last few years, but most work has focused on languages and domains with an abundance of unlabeled data. However, for many languages even unlabeled data is scarce, which limits the effectiveness of SSL. In this work, we focus on the problem of applying SSL to domains with limited available data by leveraging data augmentation for Wav2Vec 2.0 pretraining. Further, we propose improvements to each component of the model, which together yield a relative word error rate (WER) improvement of up to 13% over Wav2Vec 2.0 on LibriSpeech test-clean/other.
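The abstract does not specify which augmentations are applied during pretraining. As a minimal, hypothetical sketch of the kind of waveform-level augmentation that could feed a Wav2Vec 2.0 pretraining pipeline, additive noise mixed at a target signal-to-noise ratio might look like this (the function name and parameters are illustrative, not taken from the paper):

```python
import numpy as np

def augment_waveform(wav, noise_snr_db=10.0, rng=None):
    """Additive-noise augmentation: mix Gaussian noise into a waveform
    so that the result has (approximately) the requested SNR in dB.

    wav          : 1-D float array of audio samples
    noise_snr_db : target signal-to-noise ratio in decibels
    rng          : optional numpy Generator for reproducibility
    """
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(wav ** 2)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10.0 ** (noise_snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wav.shape)
    return wav + noise
```

In a pretraining loop, such a transform would typically be applied on the fly to each raw audio example before it reaches the feature encoder, so the model never sees the same perturbed waveform twice.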


Related research

- 01/19/2021: UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data
  In this paper, we propose a unified pre-training approach called UniSpee...
- 02/07/2022: Self-supervised Speaker Recognition Training Using Human-Machine Dialogues
  Speaker recognition, recognizing speaker identities based on voice alone...
- 09/09/2021: SanitAIs: Unsupervised Data Augmentation to Sanitize Trojaned Neural Networks
  The application of self-supervised methods has resulted in broad improve...
- 04/05/2022: Self-supervised learning – A way to minimize time and effort for precision agriculture?
  Machine learning, satellites or local sensors are key factors for a sust...
- 05/28/2022: Applying Self-Supervised Learning to Medicine: Review of the State of the Art and Medical Implementations
  Machine learning has become an increasingly ubiquitous technology, as bi...
- 03/20/2023: Cocktail HuBERT: Generalized Self-Supervised Pre-training for Mixture and Single-Source Speech
  Self-supervised learning leverages unlabeled data effectively, improving...
- 10/01/2021: Incremental Layer-wise Self-Supervised Learning for Efficient Speech Domain Adaptation On Device
  Streaming end-to-end speech recognition models have been widely applied ...
