An Attention Self-supervised Contrastive Learning based Three-stage Model for Hand Shape Feature Representation in Cued Speech

06/26/2021
by   Jianrong Wang, et al.

Cued Speech (CS) is a communication system for deaf or hearing-impaired people, in which a speaker aids a lipreader at the phonetic level by clarifying potentially ambiguous mouth movements with hand shapes and positions. Feature extraction from multi-modal CS is a key step in CS recognition. Recent supervised deep-learning-based methods suffer from noisy CS data annotations, especially for the hand shape modality. In this work, we first propose a self-supervised contrastive learning method to learn the feature representation of images without using labels. Secondly, a small amount of manually annotated CS data is used to fine-tune the first module. Thirdly, we present a module that combines Bi-LSTM and self-attention networks to further learn sequential features with temporal and contextual information. In addition, to enlarge the volume and diversity of the currently limited CS datasets, we build a new British English dataset containing 5 native CS speakers. Evaluation results on both French and British English datasets show that our model achieves over 90% [...] of 8.75% phoneme recognition correctness compared with the state of the art.
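
The abstract does not say which contrastive objective the first stage uses. As a minimal sketch only, assuming a SimCLR-style NT-Xent loss (an assumption, not confirmed by the abstract), the idea of pulling two augmented views of the same hand image together while pushing other images apart could look like this. The names `l2_normalize`, `nt_xent_loss`, and the `temperature` default are hypothetical and chosen for illustration.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """NT-Xent loss over paired embeddings (assumed objective, SimCLR-style).

    view_a[i] and view_b[i] are embeddings of two augmentations of image i.
    No labels are needed: the pairing itself is the supervision signal.
    """
    n = len(view_a)
    z = [l2_normalize(v) for v in view_a + view_b]  # 2n unit vectors
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same image
        sims = [
            sum(a * b for a, b in zip(z[i], z[k])) / temperature
            for k in range(2 * n)
        ]
        # Log-sum-exp over every sample except i itself (numerically stable).
        others = [s for k, s in enumerate(sims) if k != i]
        m = max(others)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in others))
        total += log_denominator - sims[pos]
    return total / (2 * n)


# Toy usage: two "images", each with two slightly different views.
aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1], [0.1, 1.0]])
swapped = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 1.0], [1.0, 0.1]])
print(aligned, swapped)
```

In this toy setup the loss is lower when the paired views really do come from the same image than when the pairing is swapped. That is the property a label-free pretraining stage relies on before the small annotated set is used for fine-tuning.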

