Deep Audio-Visual Speech Recognition

09/06/2018
by Triantafyllos Afouras, et al.

The goal of this work is to recognise phrases and sentences spoken by a talking face, with or without the audio. Unlike previous works that have focussed on recognising a limited number of words or phrases, we tackle lip reading as an open-world problem: unconstrained natural language sentences and in-the-wild videos. Our key contributions are: (1) we compare two models for lip reading, one trained with a CTC loss and the other with a sequence-to-sequence loss, both built on top of the transformer self-attention architecture; (2) we investigate to what extent lip reading is complementary to audio speech recognition, especially when the audio signal is noisy; (3) we introduce and publicly release a new dataset for audio-visual speech recognition, LRS2-BBC, consisting of thousands of natural sentences from British television. The models that we train surpass the performance of all previous work on a lip reading benchmark dataset by a significant margin.
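
As a rough illustration of what sets the CTC model in contribution (1) apart: a CTC-trained network emits one label (or a special blank) per input frame, and the frame labels are turned into text by merging repeats and then dropping blanks. A sequence-to-sequence decoder instead emits output characters one at a time, each conditioned on the previous ones. The sketch below shows only the standard CTC collapse rule. The blank symbol `_`, the function name `ctc_collapse`, and the toy input are assumptions made for illustration, not the authors' code.

```python
from itertools import groupby

# Hypothetical blank symbol; real systems reserve a dedicated class index.
BLANK = "_"


def ctc_collapse(frame_labels, blank=BLANK):
    """Apply the CTC collapse rule: merge repeated labels, then drop blanks."""
    # Step 1: merge runs of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove blanks, which only serve to separate genuine repeats.
    return "".join(label for label in merged if label != blank)


# Toy per-frame predictions (one label per video frame).
frames = list("hh_e_ll_ll_oo")
print(ctc_collapse(frames))
```

The blank between the two `ll` runs is what lets the double letter survive the merge step; without it, the two runs would collapse into a single `l`.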


