Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding

04/11/2022
by Sanjana Sankar, et al.

This paper proposes a simple and effective approach for automatic recognition of Cued Speech (CS), a visual communication tool that helps people with hearing impairment understand spoken language through hand gestures that, in complement to lipreading, uniquely identify the uttered phonemes. The proposed approach combines a pre-trained hand and lips tracker, used for visual feature extraction, with a phonetic decoder based on a multistream recurrent neural network trained with the connectionist temporal classification (CTC) loss and combined with a pronunciation lexicon. The system is evaluated on an updated version of the French CS dataset CSF18, whose phonetic transcription has been manually checked and corrected. With a decoding accuracy at the phonetic level of 70.88%, the proposed system outperforms our previous CNN-HMM decoder and competes with more complex baselines.
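
As a minimal sketch of what lexicon-constrained CTC decoding involves, the Python below contrasts unconstrained best-path decoding with scoring each entry of a pronunciation lexicon by the CTC forward algorithm. It assumes the network outputs per-frame log-probabilities over phoneme indices with the CTC blank at index 0; the toy lexicon and all function names are hypothetical. It handles isolated words only and does not model the multistream RNN or the hand/lips tracker, so it is an illustration of the general technique rather than the authors' decoder.

```python
import math

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates all -inf inputs."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def greedy_ctc_decode(log_probs):
    """Best-path decoding: per-frame argmax, merge repeats, then drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in log_probs]
    out, prev = [], None
    for s in best:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return out


def ctc_log_likelihood(log_probs, labels):
    """CTC forward algorithm: log P(labels | frames), summed over all alignments."""
    ext = [BLANK]  # label sequence interleaved with blanks: _ l1 _ l2 _ ...
    for label in labels:
        ext += [label, BLANK]
    S = len(ext)
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new = [-math.inf] * S
        for s in range(S):
            cands = [alpha[s]]  # stay on the same extended symbol
            if s >= 1:
                cands.append(alpha[s - 1])  # advance by one
            # skip over a blank only between two different non-blank labels
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new
    # a valid path ends on the last label or on the trailing blank
    return logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]


def lexicon_decode(log_probs, lexicon):
    """Return the lexicon word whose phoneme sequence has the highest CTC score."""
    return max(lexicon, key=lambda w: ctc_log_likelihood(log_probs, lexicon[w]))


if __name__ == "__main__":
    # Toy example: symbols {0: blank, 1: phoneme "a", 2: phoneme "b"}, 4 frames.
    probs = [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
    lp = [[math.log(p) for p in row] for row in probs]
    print(greedy_ctc_decode(lp))
    print(lexicon_decode(lp, {"ab": [1, 2], "ba": [2, 1], "aa": [1, 1]}))
```

Restricting the search to lexicon entries means the decoder can only output phoneme sequences that form valid words, which is the intuition behind combining a CTC-trained network with a pronunciation lexicon; a full sentence decoder would instead search over word sequences, for example with a beam search or a weighted finite-state transducer.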
