Aligning Subtitles in Sign Language Videos

05/06/2021
by Hannah Bull, et al.

The goal of this work is to temporally align asynchronous subtitles in sign language videos. In particular, we focus on sign-language interpreted TV broadcast data comprising (i) a video of continuous signing, and (ii) subtitles corresponding to the audio content. Previous work exploiting such weakly-aligned data only considered finding keyword-sign correspondences, whereas we aim to localise a complete subtitle text in continuous signing. We propose a Transformer architecture tailored for this task, which we train on manually annotated alignments covering over 15K subtitles that span 17.7 hours of video. We use BERT subtitle embeddings and CNN video representations learned for sign recognition to encode the two signals, which interact through a series of attention layers. Our model outputs frame-level predictions, i.e., for each video frame, whether it belongs to the queried subtitle or not. Through extensive evaluations, we show substantial improvements over existing alignment baselines that do not make use of subtitle text embeddings for learning. Our automatic alignment model opens up possibilities for advancing machine translation of sign languages by providing continuously synchronized video-text data.
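
The abstract describes the overall data flow clearly enough to sketch it: subtitle text encoded with BERT, video frames encoded with a sign-recognition CNN, the two streams interacting through attention layers, and one binary prediction per frame. The PyTorch sketch below illustrates that flow only; it is not the authors' implementation, and the module name SubtitleVideoAligner, the feature dimensions, and the use of a standard nn.TransformerDecoder for the attention stack are assumptions made for illustration.

```python
# Minimal sketch (not the paper's code) of subtitle-to-video alignment:
# pre-extracted BERT subtitle-token embeddings and pre-extracted CNN
# frame features interact through attention layers, and the model emits
# a belongs-to-subtitle logit for every video frame.

import torch
import torch.nn as nn


class SubtitleVideoAligner(nn.Module):
    def __init__(self, text_dim=768, video_dim=1024, d_model=256,
                 n_heads=4, n_layers=2):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, d_model)    # BERT embeddings -> shared space
        self.video_proj = nn.Linear(video_dim, d_model)  # CNN frame features -> shared space
        # Video frames act as queries and subtitle tokens as memory, so the
        # two signals interact through a stack of attention layers.
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.frame_head = nn.Linear(d_model, 1)          # per-frame alignment logit

    def forward(self, video_feats, text_feats):
        # video_feats: (B, T_frames, video_dim); text_feats: (B, T_tokens, text_dim)
        v = self.video_proj(video_feats)
        t = self.text_proj(text_feats)
        fused = self.decoder(tgt=v, memory=t)            # (B, T_frames, d_model)
        return self.frame_head(fused).squeeze(-1)        # (B, T_frames) frame-level logits


# Toy usage: 200 video frames queried with a 12-token subtitle.
model = SubtitleVideoAligner()
logits = model(torch.randn(2, 200, 1024), torch.randn(2, 12, 768))
probs = torch.sigmoid(logits)  # probability each frame belongs to the queried subtitle
```

Training such a model on the annotated alignments would reduce to a per-frame binary classification loss (e.g. binary cross-entropy against the ground-truth subtitle span), which matches the frame-level prediction task described in the abstract.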

Related research

08/04/2022  Automatic dense annotation of large-vocabulary sign language videos
07/23/2020  BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues
08/08/2023  Gloss Alignment Using Word Embeddings
06/24/2021  Towards Automatic Speech to Sign Language Generation
01/07/2022  Sign Language Video Retrieval with Free-Form Textual Queries
11/16/2022  Weakly-supervised Fingerspelling Recognition in British Sign Language Videos
11/02/2022  Two-Stream Network for Sign Language Recognition and Translation
