StepNet: Spatial-temporal Part-aware Network for Sign Language Recognition

12/25/2022
by Xiaolong Shen et al.

Sign language recognition (SLR) aims to overcome the communication barrier faced by people who are deaf or hard of hearing. Most existing approaches fall into two lines, skeleton-based and RGB-based methods, and both have limitations: RGB-based approaches usually overlook the fine-grained hand structure, while skeleton-based methods do not take facial expressions into account. To address both limitations, we propose a new framework, the Spatial-temporal Part-aware Network (StepNet), which is based on RGB parts. As the name implies, StepNet consists of two modules: Part-level Spatial Modeling and Part-level Temporal Modeling. Without using any keypoint-level annotations, Part-level Spatial Modeling implicitly captures appearance-based properties, such as hands and faces, in the feature space. Part-level Temporal Modeling, in turn, captures the pertinent properties over time by implicitly mining long- and short-term context. Extensive experiments show that StepNet, thanks to its spatial-temporal modules, achieves competitive Top-1 per-instance accuracy on three widely used SLR benchmarks, i.e., 56.89 … The proposed method is also compatible with optical flow input and can yield higher performance when the two inputs are fused. We hope that this work can serve as a preliminary step toward helping people who are deaf.
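The abstract reports Top-1 per-instance accuracy and mentions fusing with optical flow, but does not spell out either. As a hedged illustration, the sketch below computes per-instance top-1 accuracy (every test video counts equally) and contrasts it with per-class accuracy (every sign class counts equally), a metric pair commonly reported in sign language recognition. It also shows score-level late fusion as one assumed way to combine RGB and flow predictions. The function names, the `flow_weight` parameter, and the weighted-average fusion rule are hypothetical; the abstract does not say how StepNet fuses the two streams.

```python
from collections import defaultdict
from typing import List, Sequence

Scores = Sequence[Sequence[float]]  # one row of class scores per test video


def argmax(row: Sequence[float]) -> int:
    """Index of the highest score (the first one wins on ties)."""
    return max(range(len(row)), key=row.__getitem__)


def top1_per_instance(scores: Scores, labels: Sequence[int]) -> float:
    """Share of test videos whose top-1 prediction equals the label."""
    hits = sum(int(argmax(row) == y) for row, y in zip(scores, labels))
    return hits / len(labels)


def top1_per_class(scores: Scores, labels: Sequence[int]) -> float:
    """Mean of per-class top-1 accuracies over the classes present in `labels`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row, y in zip(scores, labels):
        total[y] += 1
        correct[y] += int(argmax(row) == y)
    return sum(correct[c] / total[c] for c in total) / len(total)


def late_fuse(rgb: Scores, flow: Scores, flow_weight: float = 0.5) -> List[List[float]]:
    """Hypothetical score-level fusion: weighted average of the two streams' scores."""
    return [
        [(1.0 - flow_weight) * r + flow_weight * f for r, f in zip(r_row, f_row)]
        for r_row, f_row in zip(rgb, flow)
    ]


if __name__ == "__main__":
    # Toy scores for 4 videos over 3 classes (made-up numbers, not StepNet results).
    labels = [0, 1, 1, 2]
    rgb = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.4, 0.35, 0.25], [0.1, 0.2, 0.7]]
    flow = [[0.5, 0.4, 0.1], [0.1, 0.6, 0.3], [0.2, 0.7, 0.1], [0.2, 0.2, 0.6]]
    print("RGB per-instance:", top1_per_instance(rgb, labels))
    print("RGB per-class:", top1_per_class(rgb, labels))
    print("Fused per-instance:", top1_per_instance(late_fuse(rgb, flow), labels))
```

On this contrived data the RGB scores give 0.75 per-instance but about 0.83 per-class, because the single missed video belongs to a class with only two samples. Averaging over videos rewards frequent classes, while averaging over classes weights rare signs equally.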

Related research

Multi-View Spatial-Temporal Network for Continuous Sign Language Recognition (04/19/2022)
Sign language is a beautiful visual language and is also the primary lan...

Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble (10/12/2021)
Sign language is commonly used by deaf or mute people to communicate but...

Spatial Temporal Graph Attention Network for Skeleton-Based Action Recognition (08/18/2022)
It's common for current methods in skeleton-based action recognition to ...

DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition (10/12/2022)
Graph convolution networks (GCN) have been widely used in skeleton-based...

Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation (07/08/2022)
Sign language is the window for people differently-abled to express thei...

Spatial-Temporal Graph Convolutional Networks for Sign Language Recognition (01/31/2019)
The recognition of sign language is a challenging task with an important...

Exploring Sub-skeleton Trajectories for Interpretable Recognition of Sign Language (02/03/2022)
Recent advances in tracking sensors and pose estimation software enable ...