Learning from What is Already Out There: Few-shot Sign Language Recognition with Online Dictionaries

by Matyáš Boháček, et al.

Today's sign language recognition models require large training corpora of laboratory-like videos, whose collection demands an extensive workforce and substantial financial resources. As a result, only a handful of such systems are publicly available, and their support for less widely used sign languages is limited. Online text-to-video dictionaries inherently hold annotated data covering various attributes and sign languages, so using them to train models in a few-shot fashion offers a promising path toward the democratization of this technology. In this work, we collect and open-source the UWB-SL-Wild few-shot dataset, the first-of-its-kind training resource consisting of dictionary-scraped videos. This dataset represents the actual distribution and characteristics of available online sign language data. We select glosses that directly overlap with the existing datasets WLASL100 and ASLLVD and share their class mappings to allow for transfer learning experiments. Apart from providing baseline results on a pose-based architecture, we introduce a novel approach to training sign language recognition models in a few-shot scenario, resulting in state-of-the-art results on the ASLLVD-Skeleton and ASLLVD-Skeleton-20 datasets, with top-1 accuracies of 30.97 % and 95.45 %, respectively.


Can Everybody Sign Now? Exploring Sign Language Video Generation from 2D Poses

Recent work has addressed the generation of human poses represented by ...

Zero-Shot Sign Language Recognition: Can Textual Data Uncover Sign Languages?

We introduce the problem of zero-shot sign language recognition (ZSSLR),...

An Open-Source Gloss-Based Baseline for Spoken to Signed Language Translation

Sign language translation systems are complex and require many component...

Scaling up sign spotting through sign language dictionaries

The focus of this work is sign spotting - given a video of an isolated s...

Combining Efficient and Precise Sign Language Recognition: Good pose estimation library is all you need

Sign language recognition could significantly improve the user experienc...

WLASL-LEX: a Dataset for Recognising Phonological Properties in American Sign Language

Signed Language Processing (SLP) concerns the automated processing of si...

Pose-Guided Sign Language Video GAN with Dynamic Lambda

We propose a novel approach for the synthesis of sign language videos us...
