An Attention Self-supervised Contrastive Learning based Three-stage Model for Hand Shape Feature Representation in Cued Speech

06/26/2021
by   Jianrong Wang, et al.

Cued Speech (CS) is a communication system for deaf or hearing-impaired people in which a speaker aids a lipreader at the phonetic level by clarifying potentially ambiguous mouth movements with hand shapes and positions. Feature extraction from multi-modal CS is a key step in CS recognition. Recent supervised deep learning based methods suffer from noisy CS data annotations, especially for the hand shape modality. In this work, we first propose a self-supervised contrastive learning method to learn the feature representation of images without using labels. Secondly, a small amount of manually annotated CS data is used to fine-tune the first module. Thirdly, we present a module that combines Bi-LSTM and self-attention networks to further learn sequential features with temporal and contextual information. In addition, to enlarge the volume and diversity of the currently limited CS datasets, we build a new British English dataset containing 5 native CS speakers. Evaluation results on both French and British English datasets show that our model achieves over 90 of 8.75 phoneme recognition correctness compared with the state-of-the-art.
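
The first stage described above (label-free contrastive learning on hand images) can be illustrated with a minimal sketch. The abstract does not state which contrastive objective is used, so the NT-Xent loss from SimCLR is assumed here purely for illustration; the function names, the temperature value, and the toy 2-D embeddings are all hypothetical and not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(views_a, views_b, temperature=0.5):
    """Assumed NT-Xent loss: views_a[i] and views_b[i] are embeddings of
    two augmentations of the same unlabeled image i."""
    z = list(views_a) + list(views_b)
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same image
        # Similarities to every other embedding (the positive is included).
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # Numerically stable log-sum-exp over the logits.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - pos_logit
    return total / (2 * n)


# Toy example: two "images", each with two augmented views.
anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]     # views close to their anchors
mismatched = [[0.1, 0.9], [0.9, 0.1]]  # views close to the wrong anchor
print(nt_xent_loss(anchors, matched), nt_xent_loss(anchors, mismatched))
```

Under this assumed objective, the loss is lower when the two views of the same image lie close together in embedding space, which is the property that lets an encoder be pre-trained without hand shape labels before the fine-tuning stage.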

research
01/03/2020

Re-synchronization using the Hand Preceding Model for Multi-modal Fusion in Automatic Continuous Cued Speech Recognition

Cued Speech (CS) is an augmented lip reading complemented by hand coding...
research
06/25/2021

Cross-Modal Knowledge Distillation Method for Automatic Cued Speech Recognition

Cued Speech (CS) is a visual communication system for the deaf or hearin...
research
01/03/2020

A New Re-synchronization Method based Multi-modal Fusion for Automatic Continuous Cued Speech Recognition

Cued Speech (CS) is an augmented lip reading complemented by hand coding...
research
12/02/2022

Cross-Modal Mutual Learning for Cued Speech Recognition

Automatic Cued Speech Recognition (ACSR) provides an intelligent human-m...
research
06/05/2023

A Novel Interpretable and Generalizable Re-synchronization Model for Cued Speech based on a Multi-Cuer Corpus

Cued Speech (CS) is a multi-modal visual coding system combining lip rea...
research
12/12/2018

EasiCSDeep: A deep learning model for Cervical Spondylosis Identification using surface electromyography signal

Cervical spondylosis (CS) is a common chronic disease that affects up to...
research
03/28/2023

Data Efficient Contrastive Learning in Histopathology using Active Sampling

Deep Learning based diagnostics systems can provide accurate and robust ...
