Continual-wav2vec2: an Application of Continual Learning for Self-Supervised Automatic Speech Recognition

07/26/2021
by   Samuel Kessler, et al.

We present a method for continual learning of speech representations for multiple languages using self-supervised learning (SSL), and for applying these representations to automatic speech recognition. There is an abundance of unannotated speech, so creating self-supervised representations from raw audio and finetuning on a small annotated dataset is a promising direction for building speech recognition systems. Wav2vec models perform SSL on raw audio in a pretraining phase and then finetune on a small fraction of annotated data. SSL models have produced state-of-the-art results for ASR. However, these models are very expensive to pretrain with self-supervision. We tackle the problem of learning new language representations continually from audio without forgetting previous language representations. We use ideas from continual learning to transfer knowledge from a previous task to speed up pretraining for a new language task. Our continual-wav2vec2 model can decrease pretraining times by 32% when learning a new language task, and can learn this new audio-language representation without forgetting previous language representations.
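The idea amounts to pretraining wav2vec2-style speech representations on one language and then continuing SSL pretraining on a new language while reusing, and partially freezing, what was already learned. The sketch below illustrates this continual-pretraining loop only; it is not the authors' implementation. The ToySpeechEncoder, the contrastive_ssl_loss placeholder, and the choice to freeze the convolutional feature extractor between tasks are illustrative assumptions, and the real wav2vec 2.0 objective (quantized targets with contrastive and diversity terms) is replaced by a stand-in.

```python
import torch
import torch.nn as nn

# Hypothetical encoder standing in for the wav2vec 2.0 network:
# a convolutional feature extractor followed by a transformer context network.
class ToySpeechEncoder(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.feature_extractor = nn.Conv1d(1, dim, kernel_size=10, stride=5)
        self.context_network = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )

    def forward(self, wav):                             # wav: (batch, samples)
        z = self.feature_extractor(wav.unsqueeze(1))    # (batch, dim, frames)
        return self.context_network(z.transpose(1, 2))  # (batch, frames, dim)

def contrastive_ssl_loss(ctx):
    # Placeholder for the wav2vec 2.0 contrastive/diversity objective:
    # here we simply predict the (detached) next frame from the current one.
    return nn.functional.mse_loss(ctx[:, :-1], ctx[:, 1:].detach())

def pretrain(model, batches, trainable=None, lr=1e-4):
    # Optimize either the whole model or only the subset passed in `trainable`.
    params = list(trainable) if trainable is not None else list(model.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for wav in batches:
        loss = contrastive_ssl_loss(model(wav))
        opt.zero_grad()
        loss.backward()
        opt.step()

# Task 1: self-supervised pretraining from scratch on language A.
model = ToySpeechEncoder()
lang_a = (torch.randn(8, 16000) for _ in range(100))   # stand-in for raw audio batches
pretrain(model, lang_a)
torch.save(model.state_dict(), "ckpt_lang_a.pt")

# Task 2: continue pretraining on language B from the language-A checkpoint,
# freezing the feature extractor so low-level audio features are reused.
# This is one simple way to transfer knowledge and limit forgetting.
model.load_state_dict(torch.load("ckpt_lang_a.pt"))
for p in model.feature_extractor.parameters():
    p.requires_grad = False
lang_b = (torch.randn(8, 16000) for _ in range(100))
pretrain(model, lang_b, trainable=model.context_network.parameters())
```

Warm-starting from the previous checkpoint is what shortens pretraining on the new language; freezing part of the network is one of several possible mechanisms for limiting forgetting of the earlier representation.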


Related research

07/11/2022
Online Continual Learning of End-to-End Speech Recognition Models
Continual Learning, also known as Lifelong Learning, aims to continually...

06/16/2021
SPeCiaL: Self-Supervised Pretraining for Continual Learning
This paper presents SPeCiaL: a method for unsupervised pretraining of re...

03/29/2022
Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition
We investigate the performance of self-supervised pretraining frameworks...

06/16/2022
DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR
Self-supervised learning (SSL) in the pretraining stage using un-annotat...

05/29/2023
DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes
Lifelong audio feature extraction involves learning new sound classes in...

05/05/2020
Temporal Event Segmentation using Attention-based Perceptual Prediction Model for Continual Learning
Temporal event segmentation of a long video into coherent events require...

12/30/2021
Continually Learning Self-Supervised Representations with Projected Functional Regularization
Recent self-supervised learning methods are able to learn high-quality i...
