Visual-Only Recognition of Normal, Whispered and Silent Speech

02/18/2018
by Stavros Petridis, et al.

Silent speech interfaces have recently been proposed as a way to enable communication when the acoustic signal is not available. This introduces the need to build visual speech recognition systems for silent and whispered speech. However, almost all recently proposed systems have been trained on vocalised data only. This is in contrast with evidence in the literature suggesting that lip movements change depending on the speech mode. In this work, we introduce a new, publicly available audiovisual database containing normal, whispered and silent speech. To the best of our knowledge, this is the first study to investigate the differences between the three speech modes using the visual modality only. We show an absolute decrease in classification rate of up to 3.7% when training and testing on normal and whispered speech, respectively, and vice versa. An even higher decrease, of up to 8.5%, is observed when silent speech is involved. This reveals that there are indeed visual differences between the three speech modes, and that the common assumption that vocalised training data can be used directly to train a silent speech recognition system may not be true.
