PiSLTRc: Position-informed Sign Language Transformer with Content-aware Convolution

07/27/2021
by Pan Xie, et al.

Owing to the Transformer's superiority in learning long-term dependencies, sign language Transformer models have achieved remarkable progress in Sign Language Recognition (SLR) and Translation (SLT). However, several issues with the Transformer limit its sign language understanding. First, the self-attention mechanism learns sign video representations in a frame-wise manner, neglecting the temporal semantic structure of sign gestures. Second, the attention mechanism with absolute position encoding is direction- and distance-unaware, which limits its ability to model relations between frames. To address these issues, we propose a new model architecture, namely PiSLTRc, with two distinctive characteristics: (i) content-aware and position-aware convolution layers. Specifically, we explicitly select relevant features using a novel content-aware neighborhood gathering method, then aggregate these features with position-informed temporal convolution layers, thus generating robust neighborhood-enhanced sign representations. (ii) Injecting relative position information into the attention mechanism in the encoder, the decoder, and even the encoder-decoder cross-attention. Compared with the vanilla Transformer model, our model performs consistently better on three large-scale sign language benchmarks: PHOENIX-2014, PHOENIX-2014-T, and CSL. Furthermore, extensive experiments demonstrate that the proposed method achieves state-of-the-art translation quality with a +1.6 BLEU improvement.
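To make the second idea concrete, below is a minimal NumPy sketch of relative-position-aware attention in the style of Shaw et al. (referenced under related research below). This is an illustrative assumption, not the paper's actual implementation: the function name, the clipping distance `max_dist`, and the learned table `rel_emb` are hypothetical stand-ins showing how a per-offset embedding makes attention scores direction- and distance-aware.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relative_attention(q, k, v, rel_emb, max_dist):
    """Single-head attention with Shaw-style relative position scores.

    q, k, v:  (seq_len, d) query/key/value matrices.
    rel_emb:  (2 * max_dist + 1, d) learned embedding per clipped offset.
    """
    n, d = q.shape
    # Content-content scores, as in vanilla attention.
    scores = q @ k.T                                    # (n, n)
    # Signed offsets j - i, clipped to [-max_dist, max_dist],
    # then shifted to index into rel_emb. The sign of the offset
    # is what makes the mechanism direction-aware.
    idx = np.clip(np.arange(n)[None, :] - np.arange(n)[:, None],
                  -max_dist, max_dist) + max_dist       # (n, n)
    # Content-position scores: q_i dotted with the embedding of offset (j - i).
    rel_scores = np.einsum('id,ijd->ij', q, rel_emb[idx])
    attn = softmax((scores + rel_scores) / np.sqrt(d))
    return attn @ v                                     # (n, d)

# Usage: 5 frames, 8-dim features, offsets clipped at distance 3.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(5, 8)) for _ in range(3))
rel_emb = rng.normal(size=(2 * 3 + 1, 8))
out = relative_attention(q, k, v, rel_emb, max_dist=3)
```

In the full model, the same offset-indexed table would be added inside each attention sub-layer (encoder self-attention, decoder self-attention, and cross-attention), replacing reliance on absolute position encodings.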


Related research

05/19/2022 · Cross-Enhancement Transformer for Action Segmentation
Temporal convolutions have been the paradigm of choice in action segment...

03/06/2018 · Self-Attention with Relative Position Representations
Relying entirely on an attention mechanism, the Transformer introduced b...

12/12/2022 · P-Transformer: Towards Better Document-to-Document Neural Machine Translation
Directly training a document-to-document (Doc2Doc) neural machine transl...

09/19/2018 · Close to Human Quality TTS with Transformer
Although end-to-end neural text-to-speech (TTS) methods (such as Tacotro...

01/12/2021 · Context Matters: Self-Attention for Sign Language Recognition
This paper proposes an attentional network for the task of Continuous Si...

09/27/2021 · Multiplicative Position-aware Transformer Models for Language Understanding
Transformer models, which leverage architectural improvements like self-...

05/05/2023 · Online Gesture Recognition using Transformer and Natural Language Processing
The Transformer architecture is shown to provide a powerful machine tran...
