ZS-SLR: Zero-Shot Sign Language Recognition from RGB-D Videos

by Razieh Rastgoo, et al.

Sign Language Recognition (SLR) is a challenging research area in computer vision. To tackle the annotation bottleneck in SLR, we formulate the problem of Zero-Shot Sign Language Recognition (ZS-SLR) and propose a two-stream model that takes two input modalities: RGB and Depth videos. To benefit from the capabilities of vision Transformers, we use two vision Transformer models, one for human detection and one for visual feature representation. We configure a Transformer encoder-decoder architecture as a fast and accurate human detection model to overcome the challenges of current human detection models. Using human keypoints, the detected human body is segmented into nine parts. A spatio-temporal representation of the human body is obtained using a vision Transformer and an LSTM network. A semantic space maps the visual features to the lingual embeddings of the class labels, which are obtained via a Bidirectional Encoder Representations from Transformers (BERT) model. We evaluated the proposed model on four datasets, Montalbano II, MSR Daily Activity 3D, CAD-60, and NTU-60, obtaining state-of-the-art results relative to existing ZS-SLR models.
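
The final step described above, matching visual features against label embeddings in a shared semantic space, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state the form of the mapping or the similarity measure, so the linear projection `W`, the cosine similarity, the function names, and the toy vectors (standing in for Transformer/LSTM features and BERT label embeddings) are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def project(W, x):
    """Apply a linear map W (list of rows) to a feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def zero_shot_predict(visual_feature, W, label_embeddings):
    """Return the label whose embedding is most similar to the projected feature.

    Assumption: a linear projection plus cosine similarity; the paper may use
    a different mapping and a different scoring function.
    """
    z = project(W, visual_feature)
    return max(label_embeddings, key=lambda name: cosine(z, label_embeddings[name]))

# Toy stand-ins: a 3-d "visual" feature projected into a 2-d "semantic" space.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
# Hypothetical label embeddings for classes unseen during training.
label_embeddings = {"hello": [1.0, 0.1], "thanks": [0.1, 1.0]}

prediction = zero_shot_predict([0.9, 0.2, 0.5], W, label_embeddings)
print(prediction)
```

Because classification only needs an embedding for each label, new sign classes can be added at test time by embedding their names, without any video examples of those classes.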


Multi-Modal Zero-Shot Sign Language Recognition

Zero-Shot Learning (ZSL) has rapidly advanced in recent years. Towards o...

Zero-Shot Sign Language Recognition: Can Textual Data Uncover Sign Languages?

We introduce the problem of zero-shot sign language recognition (ZSSLR),...

TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning

Zero-shot learning (ZSL) tackles the novel class recognition problem by ...

Two-Stream Network for Sign Language Recognition and Translation

Sign languages are visual languages using manual articulations and non-m...

Using Motion History Images with 3D Convolutional Networks in Isolated Sign Language Recognition

Sign language recognition using computational models is a challenging pr...

A Transformer-Based Contrastive Learning Approach for Few-Shot Sign Language Recognition

Sign language recognition from sequences of monocular images or 2D poses...

Natural Language-Assisted Sign Language Recognition

Sign languages are visual languages which convey information by signers'...