Is context all you need? Scaling Neural Sign Language Translation to Large Domains of Discourse

08/18/2023
by   Ozge Mercanoglu Sincan, et al.

Sign Language Translation (SLT) is a challenging task that aims to generate spoken language sentences from sign language videos, two modalities that differ in both grammar and word/gloss order. From a Neural Machine Translation (NMT) perspective, the straightforward way to train translation models is to use sign language phrase-spoken language sentence pairs. However, human interpreters rely heavily on context to understand the conveyed information, especially in sign language interpretation, where the vocabulary size may be significantly smaller than that of the spoken language equivalent. Taking direct inspiration from how humans translate, we propose a novel multi-modal transformer architecture that tackles the translation task in a context-aware manner, as a human would. We use the context from previous sequences and confident predictions to disambiguate weaker visual cues. To achieve this we use complementary transformer encoders, namely: (1) a Video Encoder, which captures low-level video features at the frame level, (2) a Spotting Encoder, which models the recognized sign glosses in the video, and (3) a Context Encoder, which captures the context of the preceding sign sequences. We combine the information coming from these encoders in a final transformer decoder to generate spoken language translations. We evaluate our approach on the recently published large-scale BOBSL dataset, which contains 1.2M sequences, and on the SRF dataset, which was part of the WMT-SLT 2022 challenge. We report significant improvements over state-of-the-art translation performance using contextual information, nearly doubling the reported BLEU-4 scores of baseline approaches.
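The three-encoder design above can be sketched in PyTorch. This is a minimal illustrative sketch, not the authors' implementation: the class name, layer counts, and dimensions are all assumptions. The key idea it shows is that each modality (video frames, spotted glosses, preceding context) is encoded independently, and the decoder cross-attends to the concatenation of the three memories.

```python
# Hypothetical sketch of a context-aware multi-encoder SLT model.
# Three independent transformer encoders (video, spotting, context)
# produce memories that are concatenated along the sequence axis,
# so a single decoder can cross-attend to all three sources.
import torch
import torch.nn as nn


class ContextAwareSLT(nn.Module):
    def __init__(self, d_model=64, nhead=4, num_layers=2, vocab_size=100):
        super().__init__()

        def make_encoder():
            layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=num_layers)

        self.video_encoder = make_encoder()     # frame-level video features
        self.spotting_encoder = make_encoder()  # recognized sign glosses
        self.context_encoder = make_encoder()   # preceding-sequence context
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=num_layers)
        self.out = nn.Linear(d_model, vocab_size)  # spoken-language vocabulary

    def forward(self, video, spots, context, tgt):
        # Encode each modality separately, then fuse by concatenating
        # the memories so the decoder attends across all of them.
        memory = torch.cat([
            self.video_encoder(video),
            self.spotting_encoder(spots),
            self.context_encoder(context),
        ], dim=1)
        return self.out(self.decoder(tgt, memory))


model = ContextAwareSLT()
logits = model(
    torch.randn(2, 16, 64),  # (batch, frames, d_model) video features
    torch.randn(2, 5, 64),   # spotted gloss embeddings
    torch.randn(2, 10, 64),  # preceding-sentence context embeddings
    torch.randn(2, 7, 64),   # shifted target token embeddings
)
print(logits.shape)  # torch.Size([2, 7, 100])
```

Concatenating encoder memories is only one fusion strategy; the paper's actual combination mechanism inside the decoder may differ (e.g. separate cross-attention blocks per encoder).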

