data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup

11/02/2022
by Vasista Sai Lodagala, et al.

In this paper, we propose a new Self-Supervised Learning (SSL) algorithm called data2vec-aqc for speech representation learning from unlabeled speech data. Our goal is to improve SSL for speech in domains where both unlabeled and labeled data are limited. Building on the recently introduced data2vec, we add modules to the data2vec framework that leverage data augmentations, quantized representations, and clustering. The interaction between these modules helps solve the cross-contrastive loss as an additional self-supervised objective. data2vec-aqc achieves up to 14.1% and 20.9% relative WER improvement over the existing state-of-the-art data2vec system on the test-clean and test-other sets, respectively, of LibriSpeech, without the use of any language model. Our proposed model also achieves up to 17.8% relative WER improvement over the baseline data2vec when fine-tuned on a subset of the Switchboard data.
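To make the cross-contrastive idea concrete, the sketch below shows a symmetric cross-view contrastive objective between two augmented views of the same utterance: the student's output for one view is contrasted against the teacher's targets for the *other* view, and vice versa. This is a minimal NumPy illustration under assumed shapes (frames × dimensions); the function names, the InfoNCE-style formulation, and the temperature value are illustrative assumptions, not the paper's exact implementation, which operates on masked transformer outputs and quantized teacher representations.

```python
import numpy as np

def contrastive_loss(anchors, targets, temperature=0.1):
    """InfoNCE-style loss: each anchor frame should match its
    corresponding target frame against all other frames as distractors."""
    # L2-normalize so dot products become cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=-1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=-1, keepdims=True)
    logits = a @ t.T / temperature                 # (T, T) similarity matrix
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    # positive pairs lie on the diagonal
    return -np.mean(np.diag(log_probs))

def cross_contrastive_loss(student_v1, teacher_v2, student_v2, teacher_v1):
    """Symmetric cross-view objective: student outputs of each view are
    contrasted against teacher targets of the other augmented view."""
    return 0.5 * (contrastive_loss(student_v1, teacher_v2)
                  + contrastive_loss(student_v2, teacher_v1))

# Toy usage: 8 frames of 16-dim representations for two augmented views
rng = np.random.default_rng(0)
s1, t2 = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
s2, t1 = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
loss = cross_contrastive_loss(s1, t2, s2, t1)
print(float(loss))
```

In the actual teacher-student setup, the teacher is typically an exponential-moving-average copy of the student and its targets are not backpropagated through; here both sides are plain arrays for simplicity.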


