Pushing the boundaries of audiovisual word recognition using Residual Networks and LSTMs

11/03/2018
by Themos Stafylakis, et al.

Visual and audiovisual speech recognition are witnessing a renaissance, largely due to the advent of deep learning methods. In this paper, we present a deep learning architecture for lipreading and audiovisual word recognition, which combines Residual Networks equipped with spatiotemporal input layers and Bidirectional LSTMs. The lipreading architecture attains an 11.92% misclassification rate on the Lipreading-in-the-Wild database, which is composed of excerpts from BBC-TV, each containing one of the 500 target words. Audiovisual experiments are performed using both intermediate and late integration, under several types and levels of environmental noise, and notable improvements over the audio-only network are reported, even in the case of clean speech. A further analysis of the utility of target word boundaries is provided, as well as of the capacity of the network to model the linguistic context of the target word. Finally, we examine difficult word pairs and discuss how visual information helps attain higher recognition accuracy.


