Deep Lip Reading: a comparison of models and an online application

06/15/2018
by   Triantafyllos Afouras, et al.

The goal of this paper is to develop state-of-the-art models for lip reading (visual speech recognition). We develop three architectures and compare their accuracy and training times: (i) a recurrent model using LSTMs; (ii) a fully convolutional model; and (iii) the recently proposed transformer model. The recurrent and fully convolutional models are trained with a Connectionist Temporal Classification (CTC) loss and use an explicit language model for decoding, whereas the transformer is a sequence-to-sequence model. Our best-performing model improves the state-of-the-art word error rate on the challenging BBC-Oxford Lip Reading Sentences 2 (LRS2) benchmark dataset by over 20 percent. As a further contribution, we investigate the fully convolutional model when used for online (real-time) lip reading of continuous speech, and show that it achieves high performance with low latency.
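
The headline result above is stated as a word error rate (WER). As a minimal sketch of the standard definition, and not the authors' evaluation code, WER can be computed as the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a predicted one, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenisation are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenisation is a plain whitespace split, which is
    an assumption and may differ from any benchmark's official scoring.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.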

research
09/06/2018

Deep Audio-Visual Speech Recognition

The goal of this work is to recognise phrases and sentences being spoken...
research
04/10/2021

Lip reading using external viseme decoding

Lip-reading is the operation of recognizing speech from lip movements. T...
research
10/14/2021

Sub-word Level Lip Reading With Visual Attention

The goal of this paper is to learn strong lip reading models that can re...
research
07/02/2018

Exploring End-to-End Techniques for Low-Resource Speech Recognition

In this work we present simple grapheme-based system for low-resource sp...
research
09/03/2022

Training Strategies for Improved Lip-reading

Several training strategies and temporal models have been recently propo...
research
09/12/2020

DualLip: A System for Joint Lip Reading and Generation

Lip reading aims to recognize text from talking lip, while lip generatio...
research
03/09/2020

Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading

Lip-reading aims to infer the speech content from the lip movement seque...
