SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition

10/11/2021
by Hezhen Hu, et al.

Hand gestures play a critical role in sign language. Current deep-learning-based sign language recognition (SLR) methods may suffer from insufficient interpretability and overfitting due to limited sign data sources. In this paper, we introduce SignBERT, the first self-supervised pre-trainable model with an incorporated hand prior for SLR. SignBERT views the hand pose as a visual token, which is derived from an off-the-shelf pose extractor. The visual tokens are then embedded with gesture state, temporal and hand chirality information. To take full advantage of available sign data sources, SignBERT first performs self-supervised pre-training by masking and reconstructing visual tokens. Jointly with several mask modeling strategies, we attempt to incorporate the hand prior in a model-aware manner to better model hierarchical context over the hand sequence. Then, with a prediction head added, SignBERT is fine-tuned to perform the downstream SLR task. To validate the effectiveness of our method on SLR, we perform extensive experiments on four public benchmark datasets, i.e., NMFs-CSL, SLR500, MSASL and WLASL. Experimental results demonstrate the effectiveness of both self-supervised learning and the imported hand prior. Furthermore, we achieve state-of-the-art performance on all benchmarks with a notable gain.
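
The masking-and-reconstruction idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the token layout, masking ratio, mask placeholder, or loss, so the 21-joint 2D pose shape, the zero placeholder, the mean-squared-error objective, and the names `mask_pose_tokens` and `reconstruction_loss` are all assumptions chosen for illustration. It shows only frame-level token masking, not the hand-model-aware strategies the paper describes.

```python
import numpy as np


def mask_pose_tokens(poses, mask_ratio=0.3, rng=None):
    """Randomly mask whole frames (tokens) of a hand-pose sequence.

    poses: array of shape (T, J, 2) with T frames, J joints, 2D coordinates
           (shape is an assumption, not taken from the paper).
    Returns a masked copy and a boolean mask of shape (T,), True where masked.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_frames = poses.shape[0]
    num_masked = max(1, int(round(mask_ratio * num_frames)))
    masked_idx = rng.choice(num_frames, size=num_masked, replace=False)

    mask = np.zeros(num_frames, dtype=bool)
    mask[masked_idx] = True

    masked = poses.copy()
    masked[mask] = 0.0  # zero placeholder for masked tokens (assumption)
    return masked, mask


def reconstruction_loss(pred, target, mask):
    """Mean squared error computed over masked tokens only (assumed loss)."""
    diff = pred[mask] - target[mask]
    return float(np.mean(diff ** 2))


# Toy data standing in for the output of an off-the-shelf pose extractor.
rng = np.random.default_rng(0)
poses = rng.normal(size=(16, 21, 2))

masked, mask = mask_pose_tokens(poses, mask_ratio=0.25, rng=rng)

# A "model" that returns its masked input unchanged reconstructs nothing,
# while a perfect reconstruction gives zero loss.
loss_identity = reconstruction_loss(masked, poses, mask)
loss_perfect = reconstruction_loss(poses, poses, mask)
```

In an actual pre-training setup, a sequence model would take `masked` (plus any embeddings) as input and be trained to lower the loss on the masked positions; fine-tuning would then replace the reconstruction objective with a classification head.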
