Lip Reading Using Convolutional Auto Encoders as Feature Extractor

05/31/2018
by   Dharin Parekh, et al.

Visual recognition of speech from lip movement is called lip-reading. Recent developments in this nascent field use different neural networks as feature extractors, whose outputs serve as input to a model that maps the temporal relationship and classifies the sequence. Though end-to-end sentence-level lip-reading is the current trend, we propose a new model which employs word-level classification and breaks the set benchmarks for standard datasets. In our model we use convolutional autoencoders as feature extractors, whose features are then fed to a Long short-term memory (LSTM) model. We tested our proposed model on BBC's LRW dataset, MIRACL-VC1 and the GRID dataset. The proposed model achieved a classification accuracy of 98% on MIRACL-VC1, as compared to 93.4% for the baseline, and on BBC's LRW it also performed better than the baseline model of convolutional neural networks and a Long short-term memory model (Garg et al., 2016). By showing the features learned by the models, we indicate how the proposed model works better than the baseline model. The same model can also be extended for end-to-end sentence-level classification.
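
The pipeline described in the abstract (a convolutional encoder applied to each frame, followed by an LSTM over the resulting feature sequence and a word-level classifier) can be illustrated with a minimal pure-Python sketch. This is not the authors' network: the frame size, layer count, hidden size, number of word classes, random weights and all function names are assumptions chosen for illustration. Only the encoder half of the autoencoder is shown; the decoder used during reconstruction training is omitted, and a real implementation would use a deep-learning framework rather than nested lists.

```python
import math
import random

def conv2d_relu(image, kernel, stride=2):
    """One encoder layer: valid 2D convolution with a stride, followed by ReLU."""
    k = len(kernel)
    out = []
    for i in range(0, len(image) - k + 1, stride):
        row = []
        for j in range(0, len(image[0]) - k + 1, stride):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(k) for b in range(k))
            row.append(max(0.0, s))
        out.append(row)
    return out

def encode_frame(frame, kernels):
    """Encoder half of the autoencoder: stacked conv layers, flattened to a vector."""
    x = frame
    for kernel in kernels:
        x = conv2d_relu(x, kernel)
    return [v for row in x for v in row]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, W):
    """One LSTM step. W maps a gate name to (weights over [x; h], bias)."""
    z = x + h  # concatenate input features and previous hidden state

    def gate(name, act):
        weights, bias = W[name]
        return [act(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(weights, bias)]

    i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
    g = gate("g", math.tanh)
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def classify_clip(frames, kernels, W, W_out):
    """Encode every frame, run the LSTM over the sequence, softmax over words."""
    hidden = len(W["i"][1])
    h, c = [0.0] * hidden, [0.0] * hidden
    for frame in frames:
        h, c = lstm_step(encode_frame(frame, kernels), h, c, W)
    logits = [sum(w * v for w, v in zip(row, h)) for row in W_out]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs)), probs

# Hypothetical sizes: 16x16 mouth crops, 5 frames per clip, 10 word classes.
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

FRAME, HIDDEN, CLASSES, T = 16, 8, 10, 5
kernels = [rand_matrix(3, 3), rand_matrix(3, 3)]   # untrained encoder weights
frames = [[[rng.random() for _ in range(FRAME)] for _ in range(FRAME)]
          for _ in range(T)]

features = encode_frame(frames[0], kernels)         # per-frame feature vector
W = {name: (rand_matrix(HIDDEN, len(features) + HIDDEN), [0.0] * HIDDEN)
     for name in ("i", "f", "o", "g")}
W_out = rand_matrix(CLASSES, HIDDEN)

predicted, probs = classify_clip(frames, kernels, W, W_out)
print(len(features), predicted)
```

With untrained random weights the predicted class is meaningless; the sketch only shows how per-frame encoder features flow into the recurrent classifier. In a trained system the encoder weights would come from fitting the autoencoder to reconstruct lip-region frames.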

research
09/19/2017

Convolutional Long Short-Term Memory Networks for Recognizing First Person Interactions

In this paper, we present a novel deep learning based approach for addre...
research
09/19/2017

Learning to Detect Violent Videos using Convolutional Long Short-Term Memory

Developing a technique for the automatic analysis of surveillance videos...
research
01/25/2016

Long Short-Term Memory-Networks for Machine Reading

In this paper we address the question of how to render sequence-level ne...
research
05/11/2017

Object-Level Context Modeling For Scene Classification with Context-CNN

Convolutional Neural Networks (CNNs) have been used extensively for comp...
research
11/20/2016

Fast Video Classification via Adaptive Cascading of Deep Models

Recent advances have enabled "oracle" classifiers that can classify acro...
research
07/22/2018

Rapid Autonomous Car Control based on Spatial and Temporal Visual Cues

We present a novel approach to modern car control utilizing a combinatio...
research
05/21/2020

SpotFast Networks with Memory Augmented Lateral Transformers for Lipreading

This paper presents a novel deep learning architecture for word-level li...
