Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute

06/11/2023
by William Chen, et al.

Self-supervised learning (SSL) has led to great strides in speech processing. However, the resources needed to train these models have become prohibitively large as they continue to scale. Currently, only a few groups with substantial resources are capable of creating SSL models, which harms reproducibility. In this work, we optimize HuBERT SSL to fit within academic constraints. We reproduce HuBERT independently of the original implementation, with no performance loss. Our code and training optimizations make SSL feasible with only 8 GPUs, instead of the 32 used in the original work. We also explore a semi-supervised route, using an ASR model to skip the first pre-training iteration. Within one iteration of pre-training, our models improve over HuBERT on several tasks. Furthermore, our HuBERT Large variant requires only 8 GPUs, achieving performance similar to the original trained on 128. As our contribution to the community, all models, configurations, and code are made open-source in ESPnet.
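For readers unfamiliar with the objective being optimized here, the following is a minimal, illustrative PyTorch sketch of HuBERT-style masked-prediction pre-training, not the authors' ESPnet code: the encoder, prediction head, and masking scheme are simplified assumptions. In HuBERT, cluster IDs come from k-means over acoustic features in the first iteration; the semi-supervised route described above would instead derive targets from an ASR model so that first iteration can be skipped.

```python
# Illustrative sketch of a HuBERT-style masked-prediction loss.
# `encoder`, `MaskedPredictionHead`, and zeroing masked frames are
# simplifications (real HuBERT uses a learned mask embedding).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedPredictionHead(nn.Module):
    """Projects encoder outputs to logits over discrete cluster targets."""

    def __init__(self, hidden_dim: int, num_clusters: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, num_clusters)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.proj(hidden_states)  # (batch, frames, num_clusters)


def masked_prediction_loss(encoder, head, features, targets, mask):
    """Cross-entropy over masked frames, the core of HuBERT pre-training.

    features: (batch, frames, feat_dim) acoustic features
    targets:  (batch, frames) integer cluster IDs, e.g. from k-means
              (iteration 1) or from an ASR model in a semi-supervised setup
    mask:     (batch, frames) boolean, True where frames are masked
    """
    # Hide the masked frames from the encoder (simplified: zero them out).
    masked_features = features.masked_fill(mask.unsqueeze(-1), 0.0)
    hidden = encoder(masked_features)   # (batch, frames, hidden_dim)
    logits = head(hidden)               # (batch, frames, num_clusters)
    # Only masked positions contribute to the loss.
    return F.cross_entropy(logits[mask], targets[mask])
```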


