Can DNNs Learn to Lipread Full Sentences?

05/29/2018
by George Sterpu, et al.

Finding visual features and suitable models for lipreading tasks that are more complex than those with a well-constrained vocabulary has proven challenging. This paper explores state-of-the-art Deep Neural Network architectures for lipreading based on a Sequence-to-Sequence Recurrent Neural Network. We report results for both hand-crafted and 2D/3D Convolutional Neural Network visual front-ends, online monotonic attention, and a joint Connectionist Temporal Classification-Sequence-to-Sequence loss. The system is evaluated on the publicly available TCD-TIMIT dataset, with 59 speakers and a vocabulary of over 6000 words. Results show a major improvement over a Hidden Markov Model framework. A fuller analysis of performance across visemes demonstrates that the network is not only learning the language model, but actually learning to lipread.
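
The abstract names a joint Connectionist Temporal Classification-Sequence-to-Sequence loss but does not state how the two terms are combined. The sketch below is one plausible reading, assuming a simple linear interpolation with a weight `weight`, as is common in joint CTC/attention training; it is not taken from the paper. The function names (`ctc_nll`, `seq2seq_nll`, `joint_loss`), the blank index, and the default weight of 0.5 are all hypothetical choices made for illustration.

```python
import math


def log_sum_exp(*xs):
    """Numerically stable log(sum(exp(x))) over the arguments."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_nll(log_probs, target, blank=0):
    """CTC negative log-likelihood via the forward algorithm.

    log_probs: list of T frames, each a list of per-symbol log-probabilities
               (encoder outputs). target: label indices without blanks.
    """
    # Interleave blanks: [b, y1, b, y2, ..., b]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    size = len(ext)

    # Initialise: a path may start on the first blank or the first label.
    alpha = [-math.inf] * size
    alpha[0] = log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new_alpha = [-math.inf] * size
        for s in range(size):
            terms = [alpha[s]]                  # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])      # advance by one
            # Skip a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new_alpha[s] = log_sum_exp(*terms) + log_probs[t][ext[s]]
        alpha = new_alpha

    # Valid paths end on the last label or the trailing blank.
    total = log_sum_exp(alpha[-1], alpha[-2]) if size > 1 else alpha[-1]
    return -total


def seq2seq_nll(step_log_probs, target):
    """Cross-entropy of the decoder: one log-probability vector per output step."""
    return -sum(step[y] for step, y in zip(step_log_probs, target))


def joint_loss(enc_log_probs, dec_log_probs, target, weight=0.5):
    """Assumed form: weight * CTC + (1 - weight) * seq2seq (not from the paper)."""
    return (weight * ctc_nll(enc_log_probs, target)
            + (1.0 - weight) * seq2seq_nll(dec_log_probs, target))
```

With `weight=1.0` the sketch reduces to pure CTC training and with `weight=0.0` to pure sequence-to-sequence training, which makes the interpolation easy to sanity-check on toy inputs before swapping in a framework's own CTC implementation.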


