
Fingerspelling recognition in the wild with iterative visual attention

by Bowen Shi, et al.
The University of Chicago
Toyota Technological Institute at Chicago

Sign language recognition is a challenging gesture sequence recognition problem, characterized by quick and highly coarticulated motion. In this paper we focus on recognition of fingerspelling sequences in American Sign Language (ASL) videos collected in the wild, mainly from YouTube and Deaf social media. Most previous work on sign language recognition has focused on controlled settings where the data is recorded in a studio environment and the number of signers is limited. Our work aims to address the challenges of real-life data, reducing the need for detection or segmentation modules commonly used in this domain. We propose an end-to-end model based on an iterative attention mechanism, without explicit hand detection or segmentation. Our approach dynamically focuses on increasingly high-resolution regions of interest. It outperforms prior work by a large margin. We also introduce a newly collected data set of crowdsourced annotations of fingerspelling in the wild, and show that performance can be further improved with this additional data set.
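The idea of focusing on increasingly high-resolution regions of interest can be illustrated with a small sketch. This is not the authors' implementation. The box representation, the attention-centroid rule, the fixed zoom ratio, and the names `attention_centroid`, `zoom`, and `iterative_zoom` are all assumptions made for illustration; in the paper the attention comes from a learned model, which is replaced here by a caller-supplied function.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: (top, left, height, width), normalized to [0, 1].
Box = Tuple[float, float, float, float]
AttentionMap = List[List[float]]  # non-negative weights over a grid covering the box


def attention_centroid(att: AttentionMap) -> Tuple[float, float]:
    """Attention-weighted centre (row, col) in box-relative [0, 1] coordinates."""
    total = sum(sum(row) for row in att)
    if total <= 0:
        return 0.5, 0.5  # no signal: fall back to the centre of the box
    n_rows, n_cols = len(att), len(att[0])
    r = sum((i + 0.5) / n_rows * w for i, row in enumerate(att) for w in row) / total
    c = sum((j + 0.5) / n_cols * w for row in att for j, w in enumerate(row)) / total
    return r, c


def zoom(box: Box, centre: Tuple[float, float], ratio: float) -> Box:
    """Shrink `box` by `ratio` around `centre`, clamped to stay inside `box`."""
    top, left, h, w = box
    new_h, new_w = h * ratio, w * ratio
    cy = top + centre[0] * h  # centre in frame coordinates
    cx = left + centre[1] * w
    new_top = min(max(cy - new_h / 2, top), top + h - new_h)
    new_left = min(max(cx - new_w / 2, left), left + w - new_w)
    return new_top, new_left, new_h, new_w


def iterative_zoom(
    attend: Callable[[Box], AttentionMap],
    num_iters: int = 3,
    ratio: float = 0.5,  # assumed fixed zoom factor per iteration
) -> List[Box]:
    """Repeatedly attend to the current region and zoom in on where attention falls.

    `attend` stands in for a learned attention model: given the current box,
    it returns an attention map over that region. Because each box is smaller
    than the last, cropping it from the original frame at a fixed output size
    yields a progressively higher-resolution view of the region of interest.
    """
    box: Box = (0.0, 0.0, 1.0, 1.0)  # start from the full frame
    boxes = [box]
    for _ in range(num_iters):
        box = zoom(box, attention_centroid(attend(box)), ratio)
        boxes.append(box)
    return boxes
```

For example, calling `iterative_zoom` with a toy `attend` that always puts its weight in the lower-right grid cell produces boxes that shrink toward the lower-right corner of the frame, with no separate hand detector involved.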




American Sign Language fingerspelling recognition in the wild

We address the problem of American Sign Language fingerspelling recognit...

A Fine-Grained Visual Attention Approach for Fingerspelling Recognition in the Wild

Fingerspelling in sign language has been the means of communicating tech...

Toward American Sign Language Processing in the Real World: Data, Tasks, and Methods

Sign language, which conveys meaning through gestures, is the chief mean...

Searching for fingerspelled content in American Sign Language

Natural language processing for sign language video - including tasks li...

On the Importance of Signer Overlap for Sign Language Detection

Sign language detection, identifying if someone is signing or not, is be...

Unsupervised Sign Language Phoneme Clustering using HamNoSys Notation

Traditionally, sign language resources have been collected in controlled...

MS-ASL: A Large-Scale Data Set and Benchmark for Understanding American Sign Language

Computer Vision has been improved significantly in the past few decades....

Code Repositories


Real-time fingerspelling video recognition achieving 74.4% letter accuracy on ChicagoFSWild+



These datasets are used for machine-learning research



ASL Fingerspelling recognition in the wild
