Sign Language Translation with Iterative Prototype

08/23/2023
by Huijie Yao, et al.

This paper presents IP-SLT, a simple yet effective framework for sign language translation (SLT). IP-SLT adopts a recurrent structure and enhances the semantic representation (prototype) of the input sign language video through iterative refinement. The idea mimics human reading, in which a sentence can be digested repeatedly until it is accurately understood. Technically, IP-SLT consists of three modules: feature extraction, prototype initialization, and iterative prototype refinement. The initialization module generates the initial prototype from the visual features produced by the feature extraction module. The iterative refinement module then uses cross-attention to polish the previous prototype by aggregating it with the original video features. Through repeated refinement, the prototype converges to a more stable and accurate state, leading to a fluent and appropriate translation. In addition, to leverage the sequential dependence among prototypes, we further propose an iterative distillation loss that compresses the knowledge of the final iteration into the previous ones. Because autoregressive decoding is executed only once at inference, IP-SLT can be applied to various SLT systems with acceptable overhead. Extensive experiments on public benchmarks demonstrate the effectiveness of IP-SLT.
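The pipeline described in the abstract (initialize a prototype from video features, refine it repeatedly with cross-attention, and pull earlier iterations toward the final one) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract gives no layer details, so the mean-pooling initialization, the weight-free dot-product attention, the averaging update, and the mean-squared-error stand-in for the distillation loss are all assumptions, as are the function names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_prototype(features, num_slots):
    """Assumed initialization: mean-pool the frame features into num_slots vectors.

    Requires len(features) >= num_slots so that every group is non-empty.
    """
    count = len(features)
    slots = []
    for k in range(num_slots):
        group = features[k * count // num_slots:(k + 1) * count // num_slots]
        slots.append([sum(col) / len(group) for col in zip(*group)])
    return slots


def refine(prototype, features):
    """One refinement step: prototype vectors attend over the video features.

    Assumption: no learned projections; the update averages each query with
    its attended context instead of using a trained residual block.
    """
    dim = len(features[0])
    refined = []
    for query in prototype:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in features]
        weights = softmax(scores)
        context = [sum(w * f[j] for w, f in zip(weights, features))
                   for j in range(dim)]
        refined.append([0.5 * (q + c) for q, c in zip(query, context)])
    return refined


def iterate_prototypes(features, num_slots=2, num_iters=3):
    """Return the initial prototype followed by every refined prototype."""
    prototypes = [init_prototype(features, num_slots)]
    for _ in range(num_iters):
        prototypes.append(refine(prototypes[-1], features))
    return prototypes


def distillation_loss(prototypes):
    """Assumed stand-in: mean squared error of each earlier prototype to the final one."""
    final = prototypes[-1]
    total = 0.0
    for earlier in prototypes[:-1]:
        squared = [(a - b) ** 2
                   for vec_e, vec_f in zip(earlier, final)
                   for a, b in zip(vec_e, vec_f)]
        total += sum(squared) / len(squared)
    return total / (len(prototypes) - 1)


# Toy "video": six frames with two-dimensional features.
features = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
            [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
prototypes = iterate_prototypes(features, num_slots=2, num_iters=3)
final_prototype = prototypes[-1]  # the only prototype a decoder would consume
loss = distillation_loss(prototypes)
```

In this sketch only `final_prototype` would be handed to a translation decoder, which mirrors the abstract's point that autoregressive decoding runs once at inference; the distillation term matters only during training.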


research
12/23/2014

A prototype Malayalam to Sign Language Automatic Translator

Sign language, which is a medium of communication for deaf people, uses ...
research
06/06/2023

Iterative Translation Refinement with Large Language Models

Large language models have shown surprising performances in understandin...
research
10/20/2016

Iterative Refinement for Machine Translation

Existing machine translation decoding algorithms generate translations i...
research
11/25/2020

Sign language segmentation with temporal convolutional networks

The objective of this work is to determine the location of temporal boun...
research
08/12/2022

Non-Autoregressive Sign Language Production via Knowledge Distillation

Sign Language Production (SLP) aims to translate expressions in spoken l...
research
09/14/2021

Progressively Guide to Attend: An Iterative Alignment Framework for Temporal Sentence Grounding

A key solution to temporal sentence grounding (TSG) exists in how to lea...
research
08/19/2023

Scalable Video Object Segmentation with Simplified Framework

The current popular methods for video object segmentation (VOS) implemen...
