Sub-word Level Lip Reading With Visual Attention

10/14/2021
by Prajwal K R, et al.

The goal of this paper is to learn strong lip reading models that can recognise speech in silent videos. Most prior works deal with the open-set visual speech recognition problem by adapting existing automatic speech recognition techniques on top of trivially pooled visual features. Instead, in this paper we focus on the unique challenges encountered in lip reading and propose tailored solutions. To that end, we make the following contributions: (1) we propose an attention-based pooling mechanism to aggregate visual speech representations; (2) we use sub-word units for lip reading for the first time and show that this allows us to better model the ambiguities of the task; (3) we propose a training pipeline that balances the lip reading performance with other key factors such as data and compute efficiency. Following the above, we obtain state-of-the-art results on the challenging LRS2 and LRS3 benchmarks when training on public datasets, and even surpass models trained on large-scale industrial datasets by using an order of magnitude less data. Our best model achieves 22.6% word error rate on the LRS2 dataset, a performance unprecedented for lip reading models, significantly reducing the performance gap between lip reading and automatic speech recognition.
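
The abstract contrasts "trivially pooled" visual features with attention-based pooling but does not show what the latter computes. The sketch below is a generic attention-pooling illustration, not the authors' architecture: it assumes each video frame yields a set of spatial feature vectors and that a single query vector (learned in a real model, hand-set here) scores every location, so the frame embedding becomes a softmax-weighted average instead of a plain mean. The helper names `attention_pool` and `mean_pool` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_pool(features: Sequence[Sequence[float]]) -> Vector:
    """Baseline: average all spatial locations with equal weight."""
    n = len(features)
    return [sum(f[k] for f in features) / n for k in range(len(features[0]))]


def attention_pool(features: Sequence[Sequence[float]], query: Sequence[float]) -> Vector:
    """Pool one frame's spatial features into a single vector.

    features: one d-dimensional vector per spatial location (e.g. H*W of them).
    query: a d-dimensional vector; hypothetical here, learned in a real model.
    """
    d = len(query)
    # Scaled dot-product score between the query and every spatial location.
    scores = [dot(query, f) / math.sqrt(d) for f in features]
    weights = softmax(scores)
    # Attention-weighted sum over locations gives the frame embedding.
    return [sum(w * f[k] for w, f in zip(weights, features)) for k in range(d)]


if __name__ == "__main__":
    # Four spatial locations; only the first carries a strong signal.
    frame = [[10.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
    print("attention:", attention_pool(frame, [1.0, 0.0]))
    print("mean:     ", mean_pool(frame))
```

With the query aligned to the first location, `attention_pool` returns a vector close to that location's features (about [9.97, 0.0025]), while `mean_pool` blends every location into [2.5, 0.75]. A zero query gives uniform weights and reduces attention pooling to mean pooling, which is why a learned query can only help if the training signal pushes it toward informative regions such as the lips.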


Related research

01/21/2023
A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset
In recent years, significant progress has been made in automatic lip rea...

11/15/2020
Learn an Effective Lip Reading Model without Pains
Lip reading, also known as visual speech recognition, aims to recognize ...

10/16/2018
LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild
Large-scale datasets have successively proven their fundamental importan...

10/03/2017
Resolution limits on visual speech recognition
Visual-only speech recognition is dependent upon a number of factors tha...

09/28/2020
A Study on Lip Localization Techniques used for Lip reading from a Video
In this paper some of the different techniques used to localize the lips...

06/15/2018
Deep Lip Reading: a comparison of models and an online application
The goal of this paper is to develop state-of-the-art models for lip rea...

03/06/2020
Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition
Recent advances in deep learning have heightened interest among research...
