Improving Continuous Sign Language Recognition with Consistency Constraints and Signer Removal

12/26/2022
by Ronglai Zuo, et al.

Most deep-learning-based continuous sign language recognition (CSLR) models share a similar backbone consisting of a visual module, a sequential module, and an alignment module. However, because training samples are limited, a connectionist temporal classification (CTC) loss alone may not train such CSLR backbones sufficiently. In this work, we propose three auxiliary tasks to enhance CSLR backbones. The first task enhances the visual module, which is especially sensitive to insufficient training, from the perspective of consistency. Specifically, since sign language information is conveyed mainly by signers' facial expressions and hand movements, a keypoint-guided spatial attention module is developed to make the visual module focus on these informative regions, i.e., to enforce spatial attention consistency. Second, noting that the output features of the visual and sequential modules both represent the same sentence, we impose a sentence embedding consistency constraint between the two modules to strengthen the representation power of both features. We call the CSLR model trained with these two auxiliary tasks consistency-enhanced CSLR; it performs well on signer-dependent datasets, in which all signers appear during both training and testing. Third, to make the model more robust in the signer-independent setting, we further propose a signer removal module based on feature disentanglement, which removes signer information from the backbone. Extensive ablation studies validate the effectiveness of these auxiliary tasks. With a transformer-based backbone, our model achieves state-of-the-art or competitive performance on five benchmarks: PHOENIX-2014, PHOENIX-2014-T, PHOENIX-2014-SI, CSL, and CSL-Daily.
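
The sentence embedding consistency idea can be illustrated with a minimal sketch. The abstract does not state how the sentence embeddings are built or which loss is used, so everything here is an assumption for illustration only: frame-level features are mean-pooled into one vector per module, and the constraint is written as a cosine-distance penalty. The function names (`mean_pool`, `cosine_similarity`, `sentence_consistency_loss`) are hypothetical and do not come from the paper.

```python
import math


def mean_pool(frames):
    """Average per-frame feature vectors into one sentence-level embedding.

    Assumption: mean pooling stands in for whatever aggregation the paper uses.
    """
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sentence_consistency_loss(visual_frames, sequential_frames):
    """Hypothetical consistency penalty: 1 - cosine similarity of the two
    pooled embeddings. It is near 0 when both modules agree in direction."""
    visual_embedding = mean_pool(visual_frames)
    sequential_embedding = mean_pool(sequential_frames)
    return 1.0 - cosine_similarity(visual_embedding, sequential_embedding)


# Toy features: two frames, two feature dimensions per module.
visual = [[1.0, 0.0], [0.0, 1.0]]
aligned_sequential = [[2.0, 2.0], [4.0, 4.0]]
orthogonal_sequential = [[1.0, -1.0], [1.0, -1.0]]

loss_same = sentence_consistency_loss(visual, aligned_sequential)
loss_diff = sentence_consistency_loss(visual, orthogonal_sequential)
```

In this toy setup the penalty is close to zero when the pooled visual and sequential embeddings point in the same direction and grows as they diverge. In training, such a term would be added to the CTC loss with a weighting factor; the weighting and the exact form of the constraint are not given in the abstract.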


