Self-Supervised Audio-Visual Speech Representations Learning By Multimodal Self-Distillation

12/06/2022
by   Jing-Xuan Zhang, et al.

In this work, we present AV2vec, a novel method for learning audio-visual speech representations through multimodal self-distillation. AV2vec consists of a student module and a teacher module: the student performs a masked latent feature regression task on multimodal target features that the teacher generates online, and the teacher's parameters are updated as a momentum-based moving average of the student's. Because the target features are generated online, AV2vec does not require the iterative training steps of AV-HuBERT, and its total training time is reduced to less than one-fifth. We further propose AV2vec-MLM, which augments AV2vec with a masked language model (MLM)-style loss through multitask learning. Our experimental results show that AV2vec achieved performance comparable to the AV-HuBERT baseline, while AV2vec-MLM outperformed the baselines and achieved the best results on the downstream tasks.
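The training mechanism outlined above, a momentum (exponential-moving-average) teacher producing online regression targets for a masked student, can be sketched roughly as follows. This is a minimal illustrative sketch in PyTorch style, not the paper's implementation: the `student` and `teacher` encoders, the `mask_inputs` helper, the momentum value, and the MSE loss choice are all assumptions introduced here for illustration.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of momentum-teacher self-distillation with masked
# latent feature regression. All names and hyperparameters here are
# hypothetical and not taken from the AV2vec paper.

@torch.no_grad()
def momentum_update(teacher, student, momentum=0.999):
    # Teacher parameters track an exponential moving average of the student's.
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.data.mul_(momentum).add_(p_s.data, alpha=1.0 - momentum)

def masked_regression_loss(student, teacher, audio, video, mask_inputs):
    # The teacher sees the unmasked audio-visual input and produces
    # target features online (no offline clustering / iteration step).
    with torch.no_grad():
        targets = teacher(audio, video)
    # The student sees masked inputs and regresses the teacher's features
    # at the masked positions.
    masked_audio, masked_video, mask = mask_inputs(audio, video)
    predictions = student(masked_audio, masked_video)
    return F.mse_loss(predictions[mask], targets[mask])
```

After each optimizer step on the student, `momentum_update(teacher, student)` would be called so the teacher lags the student smoothly; the AV2vec-MLM variant would add an MLM-style classification loss to this regression objective in a multitask fashion.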
