LCANet: End-to-End Lipreading with Cascaded Attention-CTC

03/13/2018
by Kai Xu, et al.

Machine lipreading is a special type of automatic speech recognition (ASR) that transcribes human speech by visually interpreting the movement of related face regions, including the lips, face, and tongue. Recently, lipreading methods based on deep neural networks have shown great potential and have exceeded the accuracy of experienced human lipreaders on some benchmark datasets. However, lipreading is still far from solved, and existing methods tend to have high error rates on in-the-wild data. In this paper, we propose LCANet, an end-to-end lipreading system based on deep neural networks. LCANet encodes input video frames using a stacked 3D convolutional neural network (CNN), a highway network, and a bidirectional GRU network. The encoder effectively captures both short-term and long-term spatio-temporal information. More importantly, LCANet incorporates a cascaded attention-CTC decoder to generate output text. Cascading CTC with attention partially removes the defect of CTC's conditional independence assumption within the hidden neural layers, which yields a notable performance improvement as well as faster convergence. The experimental results show that the proposed system achieves a 1.3% CER and 3.0% WER on the GRID corpus database, leading to a 12.3% improvement compared to the state-of-the-art methods.
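
To make the conditional independence assumption concrete, here is a minimal sketch of plain best-path CTC decoding in Python. It is not LCANet's decoder and not the authors' code: the function name `ctc_greedy_decode`, the blank symbol `_`, the three-symbol alphabet, and the per-frame probabilities are all hypothetical values chosen for illustration. Each frame's symbol is picked without looking at any other frame's choice, which is the limitation the abstract says the cascaded attention stage partially addresses.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen for this illustration


def ctc_greedy_decode(frame_probs, alphabet, blank=BLANK):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Each frame is decided on its own, independently of its neighbours.
    # This per-frame independence is the CTC assumption the abstract refers to.
    path = [
        alphabet[max(range(len(probs)), key=probs.__getitem__)]
        for probs in frame_probs
    ]
    # Collapse consecutive duplicates, then remove the blank symbol.
    merged = [symbol for symbol, _ in groupby(path)]
    return "".join(symbol for symbol in merged if symbol != blank)


# Made-up per-frame distributions over the alphabet ["_", "a", "b"].
alphabet = [BLANK, "a", "b"]
frame_probs = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank, separates the two runs of "a"
    [0.10, 0.70, 0.20],  # a, kept because a blank came before it
    [0.10, 0.10, 0.80],  # b
]
decoded = ctc_greedy_decode(frame_probs, alphabet)
print(decoded)
```

The blank frame is what lets the decoder emit a doubled letter; without it, the two runs of `a` would collapse into one.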
