ASR is all you need: cross-modal distillation for lip reading

by Triantafyllos Afouras et al.

The goal of this work is to train strong models for visual speech recognition without requiring human-annotated ground-truth data. We achieve this by distilling from an Automatic Speech Recognition (ASR) model that has been trained on a large-scale audio-only corpus. We use a cross-modal distillation method that combines CTC with a frame-wise cross-entropy loss. Our contributions are fourfold: (i) we show that ground-truth transcriptions are not necessary to train a lip reading system; (ii) we show how arbitrary amounts of unlabelled video data can be leveraged to improve performance; (iii) we demonstrate that distillation significantly speeds up training; and (iv) we obtain state-of-the-art results on the challenging LRS2 and LRS3 datasets when training only on publicly available data.
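The combined objective described above can be sketched as follows. This is a minimal PyTorch illustration, not the authors' exact formulation: the function name, the `alpha` mixing weight, and the use of the teacher's per-frame posteriors as soft targets are assumptions for the sketch.

```python
import torch
import torch.nn.functional as F

def cross_modal_distillation_loss(student_logits, teacher_probs, pseudo_tokens,
                                  input_lengths, target_lengths, alpha=0.5):
    """Hypothetical combination of the two losses in the abstract:
    a CTC loss on pseudo-transcriptions decoded from the ASR teacher,
    plus a frame-wise cross-entropy against the teacher's posteriors.

    student_logits: (T, B, V) raw scores from the lip-reading student.
    teacher_probs:  (T, B, V) per-frame posteriors from the ASR teacher.
    pseudo_tokens:  (B, S) token ids from the teacher's decoded output.
    """
    log_probs = F.log_softmax(student_logits, dim=-1)
    # Sequence-level term: align the student with the pseudo-transcript via CTC.
    ctc = F.ctc_loss(log_probs, pseudo_tokens, input_lengths, target_lengths,
                     blank=0, zero_infinity=True)
    # Frame-level term: cross-entropy between teacher and student distributions.
    ce = -(teacher_probs * log_probs).sum(dim=-1).mean()
    # alpha trades off the two terms (an assumed hyperparameter).
    return alpha * ctc + (1 - alpha) * ce
```

Because both terms use only the teacher's outputs, no human transcription is needed, which is how unlabelled video can be consumed at arbitrary scale.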

