Cross-Modal Mutual Learning for Cued Speech Recognition

12/02/2022
by Lei Liu, et al.

Automatic Cued Speech Recognition (ACSR) provides an intelligent human-machine interface for visual communication, where the Cued Speech (CS) system uses lip movements and hand gestures to code spoken language for hearing-impaired people. Previous ACSR approaches often rely on direct feature concatenation as the main fusion paradigm. However, the asynchronous modalities in CS (i.e., lip, hand shape, and hand position) can interfere with each other under feature concatenation. To address this challenge, we propose a transformer-based cross-modal mutual learning framework to promote multi-modal interaction. Compared with vanilla self-attention, our model forces the modality-specific information of each modality to pass through a modality-invariant codebook, collating linguistic representations for the tokens of each modality. The shared linguistic knowledge is then used to re-synchronize the multi-modal sequences. Moreover, we establish a novel large-scale multi-speaker CS dataset for Mandarin Chinese. To our knowledge, this is the first work on ACSR for Mandarin Chinese. Extensive experiments are conducted on different languages (i.e., Chinese, French, and British English). Results demonstrate that our model outperforms the state-of-the-art by a large margin.
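The core idea of routing each modality through a shared, modality-invariant codebook can be sketched as below. This is a minimal illustration, not the authors' implementation: single-head dot-product attention over a fixed codebook matrix, with all sizes and names chosen for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def codebook_attention(tokens, codebook):
    """Re-express modality-specific tokens via a shared codebook.

    Instead of attending directly to tokens of other (asynchronous)
    modalities, each token attends over the K modality-invariant
    codebook entries, so lip, hand-shape, and hand-position streams
    are all projected into a common linguistic space before fusion.

    tokens:   (T, d) features of one modality
    codebook: (K, d) shared entries (learnable in a real model)
    """
    scores = tokens @ codebook.T / np.sqrt(codebook.shape[1])  # (T, K)
    weights = softmax(scores, axis=-1)                         # rows sum to 1
    return weights @ codebook                                  # (T, d)

rng = np.random.default_rng(0)
d, K = 16, 8
codebook = rng.normal(size=(K, d))

# Asynchronous streams of different lengths, as in CS data.
lip  = rng.normal(size=(12, d))
hand = rng.normal(size=(9, d))

lip_shared  = codebook_attention(lip, codebook)
hand_shared = codebook_attention(hand, codebook)
```

Because both outputs now lie in the span of the same K codebook entries, the streams share a common representation that can be used to re-align (re-synchronize) them before fusion, in the spirit of the framework described above.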

Related research

01/03/2020
Re-synchronization using the Hand Preceding Model for Multi-modal Fusion in Automatic Continuous Cued Speech Recognition
Cued Speech (CS) is an augmented lip reading complemented by hand coding...

06/25/2021
Cross-Modal Knowledge Distillation Method for Automatic Cued Speech Recognition
Cued Speech (CS) is a visual communication system for the deaf or hearin...

08/07/2023
Cuing Without Sharing: A Federated Cued Speech Recognition Framework via Mutual Knowledge Distillation
Cued Speech (CS) is a visual coding tool to encode spoken languages at t...

05/24/2005
Multi-Modal Human-Machine Communication for Instructing Robot Grasping Tasks
A major challenge for the realization of intelligent robots is to supply...

01/03/2020
A New Re-synchronization Method based Multi-modal Fusion for Automatic Continuous Cued Speech Recognition
Cued Speech (CS) is an augmented lip reading complemented by hand coding...

01/03/2020
A Pilot Study on Mandarin Chinese Cued Speech
Cued Speech (CS) is a communication system developed for deaf people, wh...

06/26/2021
An Attention Self-supervised Contrastive Learning based Three-stage Model for Hand Shape Feature Representation in Cued Speech
Cued Speech (CS) is a communication system for deaf people or hearing im...
