Lip Reading Sentences in the Wild

11/16/2016
by Joon Son Chung, et al.

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio. Unlike previous works that have focussed on recognising a limited number of words or phrases, we tackle lip reading as an open-world problem: unconstrained natural language sentences and in-the-wild videos. Our key contributions are: (1) a 'Watch, Listen, Attend and Spell' (WLAS) network that learns to transcribe videos of mouth motion to characters; (2) a curriculum learning strategy to accelerate training and to reduce overfitting; (3) a 'Lip Reading Sentences' (LRS) dataset for visual speech recognition, consisting of over 100,000 natural sentences from British television. The WLAS model trained on the LRS dataset surpasses the performance of all previous work on standard lip reading benchmark datasets, often by a significant margin. Its lip reading performance also beats that of a professional lip reader on videos from BBC television, and we demonstrate that visual information improves speech recognition performance even when the audio is available.
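The abstract names a curriculum learning strategy without spelling it out. As an illustration only, the sketch below assumes a simple length-based curriculum: early stages train on very short utterances (a single word), and later stages admit progressively longer sentences. The `Clip` class, the `curriculum_stages` function and the word-count schedule are hypothetical and are not the authors' implementation; the actual strategy may, for example, crop long sentences into shorter segments rather than filter them out.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Clip:
    """Hypothetical training example: a video identifier plus its transcript."""
    video_id: str
    transcript: str

    @property
    def num_words(self) -> int:
        # str.split() with no argument collapses runs of whitespace
        return len(self.transcript.split())


def curriculum_stages(clips: List[Clip], max_words_schedule: List[int]) -> Iterator[List[Clip]]:
    """Yield the training subset for each curriculum stage (assumed scheme).

    Stage k keeps only clips whose transcript has at most
    max_words_schedule[k] words, so the model sees short, easier
    utterances first and longer sentences later.
    """
    for limit in max_words_schedule:
        yield [clip for clip in clips if clip.num_words <= limit]


# Usage with toy transcripts (made-up data, for illustration only)
clips = [
    Clip("a", "hello"),
    Clip("b", "good morning"),
    Clip("c", "the weather today is fine"),
]
for stage, subset in enumerate(curriculum_stages(clips, [1, 2, 8])):
    print(stage, [clip.video_id for clip in subset])
```

Run as-is, this prints `0 ['a']`, then `1 ['a', 'b']`, then `2 ['a', 'b', 'c']`. Filtering by word count keeps the sketch free of word-level timing information; a cropping-based curriculum would additionally need word boundaries in the video to cut matching sub-clips.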

research
09/06/2018

Deep Audio-Visual Speech Recognition

The goal of this work is to recognise phrases and sentences being spoken...
research
11/17/2021

It's About Time: Analog Clock Reading in the Wild

In this paper, we present a framework for reading analog clocks in natur...
research
10/03/2017

Visual speech recognition: aligning terminologies for better understanding

We are at an exciting time for machine lipreading. Traditional research ...
research
02/12/2021

End-to-end Audio-visual Speech Recognition with Conformers

In this work, we present a hybrid CTC/Attention model based on a ResNet-...
research
02/15/2018

Deep Learning for Lip Reading using Audio-Visual Information for Urdu Language

Human lip-reading is a challenging task. It requires not only knowledge ...
research
03/18/2016

A Readability Analysis of Campaign Speeches from the 2016 US Presidential Campaign

Readability is defined as the reading level of the speech from grade 1 t...
research
10/16/2018

LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild

Large-scale datasets have successively proven their fundamental importan...