Data Augmenting Contrastive Learning of Speech Representations in the Time Domain

07/02/2020
by   Eugene Kharitonov, et al.
Contrastive Predictive Coding (CPC), which learns by predicting future segments of speech from past segments, is emerging as a powerful algorithm for representation learning of speech signals. However, it still under-performs other methods on unsupervised evaluation benchmarks. Here, we introduce WavAugment, a time-domain data augmentation library, and find that applying augmentation to the past segments is generally more efficient and yields better performance than other methods. We find that a combination of pitch modification, additive noise and reverberation substantially increases the performance of CPC (relative improvement of 18-22%), beating the reference Libri-light results with 600 times less data. Using an out-of-domain dataset, time-domain data augmentation can push CPC to be on par with the state of the art on the Zero Speech Benchmark 2017. We also show that time-domain data augmentation consistently improves downstream limited-supervision phoneme classification tasks by 12-15% relative.
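
As a rough illustration of what time-domain augmentation of the past segment might look like, the sketch below applies reverberation and additive noise to the first part of a waveform and leaves the rest untouched. It is not WavAugment's API and not the authors' implementation: the function names, the single split point between "past" and "future", the toy impulse response and the SNR value are all assumptions made for this example, and pitch modification is omitted because it needs a resampling step that does not fit a short standard-library sketch.

```python
import math
import random


def add_noise(signal, noise, snr_db):
    """Mix `noise` into `signal` at the requested signal-to-noise ratio in dB."""
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[:len(signal)]
    sig_power = sum(x * x for x in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    # Choose scale so that sig_power / (scale**2 * noise_power) == 10**(snr_db / 10).
    scale = math.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(signal, noise)]


def reverberate(signal, impulse_response):
    """Convolve `signal` with an impulse response, truncated to the input length."""
    out = [0.0] * len(signal)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            if i + j < len(out):
                out[i + j] += x * h
    return out


def augment_past_only(waveform, split, noise, snr_db, impulse_response):
    """Augment samples before `split` (hypothetical "past"); keep the rest as-is."""
    past, future = list(waveform[:split]), list(waveform[split:])
    past_aug = add_noise(reverberate(past, impulse_response), noise, snr_db)
    return past_aug + future


# Toy usage: a 440 Hz tone at 16 kHz, Gaussian noise, and a two-tap echo.
rng = random.Random(0)
waveform = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]
augmented = augment_past_only(waveform, split=800, noise=noise,
                              snr_db=10.0, impulse_response=[1.0, 0.0, 0.5])
```

In this sketch the noise is scaled relative to the already reverberated signal, so the requested SNR is measured after reverberation; a different ordering of the two effects would be an equally reasonable choice.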
