BEST: BERT Pre-Training for Sign Language Recognition with Coupling Tokenization

02/10/2023
by Weichao Zhao, et al.

In this work, we leverage the success of BERT pre-training and model domain-specific statistics to enrich the sign language recognition (SLR) model. Considering the dominance of the hands and body in sign language expression, we organize them as pose triplet units and feed them into the Transformer backbone in a frame-wise manner. Pre-training is performed by reconstructing the masked triplet units from the corrupted input sequence, which learns hierarchical correlation context cues within and among triplet units. Notably, unlike the highly semantic word tokens in BERT, a pose unit is a low-level signal that originally lies in continuous space, which prevents the direct adoption of BERT's cross-entropy objective. To this end, we bridge this semantic gap via coupling tokenization of the triplet unit, which adaptively extracts a discrete pseudo label from each pose triplet unit to represent the semantic gesture/body state. After pre-training, we fine-tune the pre-trained encoder on the downstream SLR task jointly with a newly added task-specific layer. Extensive experiments validate the effectiveness of the proposed method, which achieves new state-of-the-art performance on all four benchmarks with a notable gain.
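For readers who want a concrete picture of the pre-training objective, the following PyTorch sketch illustrates the idea under simplifying assumptions. The names `CoupledTokenizer` and `BESTPretrainer`, the codebook size, the masking ratio, and the 150-dimensional flattened triplet unit are illustrative placeholders, not the paper's actual implementation; in particular, the nearest-codeword tokenizer below is a fixed VQ-style stand-in, whereas the paper learns its coupling tokenizer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoupledTokenizer(nn.Module):
    """Maps each continuous pose triplet unit to a discrete pseudo label via
    nearest-codeword lookup in a codebook. In this sketch the codebook stays
    at its random initialization; the paper trains its tokenizer instead."""

    def __init__(self, dim: int, codebook_size: int = 512):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    @torch.no_grad()
    def forward(self, units: torch.Tensor) -> torch.Tensor:
        # units: (B, T, dim) -> squared distance to every codeword: (B, T, K)
        dists = (units.unsqueeze(2) - self.codebook.weight).pow(2).sum(-1)
        return dists.argmin(dim=-1)  # discrete pseudo labels, shape (B, T)


class BESTPretrainer(nn.Module):
    def __init__(self, unit_dim: int = 150, dim: int = 256,
                 codebook_size: int = 512, depth: int = 4):
        super().__init__()
        self.embed = nn.Linear(unit_dim, dim)      # frame-wise unit embedding
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.tokenizer = CoupledTokenizer(dim, codebook_size)
        self.head = nn.Linear(dim, codebook_size)  # predicts the pseudo label

    def forward(self, units: torch.Tensor,
                mask_ratio: float = 0.5) -> torch.Tensor:
        # units: (B, T, unit_dim), one flattened pose triplet unit per frame
        x = self.embed(units)
        labels = self.tokenizer(x)                 # targets, no gradient
        mask = torch.rand(x.shape[:2], device=x.device) < mask_ratio
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        logits = self.head(self.encoder(x))        # (B, T, codebook_size)
        # BERT-style objective: cross-entropy on the masked positions only
        return F.cross_entropy(logits[mask], labels[mask])
```

A minimal usage example on random data, just to show the expected shapes:

```python
model = BESTPretrainer()
clips = torch.randn(2, 64, 150)  # 2 clips, 64 frames, flattened triplet units
loss = model(clips)
loss.backward()
```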
