Human Part-wise 3D Motion Context Learning for Sign Language Recognition

08/18/2023
by Taeryung Lee, et al.

In this paper, we propose P3D, a human part-wise motion context learning framework for sign language recognition. Our main contributions lie in two dimensions: learning the part-wise motion context and employing a pose ensemble to utilize 2D and 3D poses jointly. First, our empirical observations indicate that part-wise context encoding benefits the performance of sign language recognition. While previous methods of sign language recognition learned motion context from the sequence of the entire pose, we argue that such methods cannot exploit part-specific motion context. In order to utilize part-wise motion context, we propose the alternating combination of a part-wise encoding Transformer (PET) and a whole-body encoding Transformer (WET). PET encodes the motion contexts from a part sequence, while WET merges them into a unified context. By learning part-wise motion context, our P3D achieves superior performance on WLASL compared to previous state-of-the-art methods. Second, our framework is the first to ensemble 2D and 3D poses for sign language recognition. Since the 3D pose holds rich motion context and depth information to distinguish the words, our P3D outperforms the previous state-of-the-art methods employing a pose ensemble.
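
The alternating part-wise and whole-body encoding described above can be illustrated with a minimal data-flow sketch. This is not the authors' implementation: the part layout `PARTS`, the helper names, and the stand-in encoder `context_mix` are all hypothetical. In place of a Transformer, `context_mix` simply adds the mean of its input block to every entry, which is enough to show where context is confined to one part (the PET-like stage) and where it is shared across the whole body (the WET-like stage).

```python
from typing import Callable, Dict, List

# One feature vector (list of floats) per frame.
Sequence = List[List[float]]
Encoder = Callable[[Sequence], Sequence]

# Hypothetical part layout: feature columns owned by each body part.
PARTS: Dict[str, List[int]] = {
    "body": [0, 1],
    "left_hand": [2, 3],
    "right_hand": [4, 5],
}


def context_mix(seq: Sequence) -> Sequence:
    """Stand-in for a Transformer encoder (assumption, not the paper's model).

    Adds the mean over all frames and features of `seq` to every entry,
    so each output value depends on the whole block it was given.
    """
    total = sum(sum(frame) for frame in seq)
    count = sum(len(frame) for frame in seq)
    mean = total / count
    return [[x + mean for x in frame] for frame in seq]


def part_wise_stage(seq: Sequence, parts: Dict[str, List[int]],
                    encoder: Encoder) -> Sequence:
    """PET-like stage: encode each part's sequence in isolation."""
    out = [list(frame) for frame in seq]
    for cols in parts.values():
        part_seq = [[frame[c] for c in cols] for frame in seq]
        encoded = encoder(part_seq)
        for t, enc_frame in enumerate(encoded):
            for c, value in zip(cols, enc_frame):
                out[t][c] = value
    return out


def whole_body_stage(seq: Sequence, encoder: Encoder) -> Sequence:
    """WET-like stage: encode all parts together so context is merged."""
    return encoder(seq)


def alternate(seq: Sequence, parts: Dict[str, List[int]], n_blocks: int,
              part_encoder: Encoder = context_mix,
              whole_encoder: Encoder = context_mix) -> Sequence:
    """Alternate the part-wise and whole-body stages n_blocks times."""
    for _ in range(n_blocks):
        seq = part_wise_stage(seq, parts, part_encoder)
        seq = whole_body_stage(seq, whole_encoder)
    return seq


# Two frames in which only the right hand carries a signal.
demo: Sequence = [[0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 1, 1]]
result = alternate(demo, PARTS, n_blocks=2)
```

In this toy input only the right-hand columns are non-zero. After the part-wise stage the other parts are still zero, because each part only sees its own sequence; after the whole-body stage every column has picked up some of the right-hand signal. A real model would replace `context_mix` with learned attention layers and add a classification head on top.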


research
12/20/2020

Can Everybody Sign Now? Exploring Sign Language Video Generation from 2D Poses

Recent work has addressed the generation of human poses represented by ...
research
08/06/2016

Signs in time: Encoding human motion as a temporal image

The goal of this work is to recognise and localise short temporal signal...
research
04/20/2021

Evaluating the Immediate Applicability of Pose Estimation for Sign Language Recognition

Signed languages are visual languages produced by the movement of the ha...
research
12/21/2022

SLGTformer: An Attention-Based Approach to Sign Language Recognition

Sign language is the preferred method of communication of deaf or mute p...
research
11/26/2020

Depth-Aware Action Recognition: Pose-Motion Encoding through Temporal Heatmaps

Most state-of-the-art methods for action recognition rely only on 2D spa...
research
07/08/2022

Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation

Sign language is the window for people differently-abled to express thei...
research
02/10/2023

BEST: BERT Pre-Training for Sign Language Recognition with Coupling Tokenization

In this work, we are dedicated to leveraging the BERT pre-training succe...