SignDiff: Learning Diffusion Models for American Sign Language Production

08/30/2023
by   Sen Fang, et al.

Over the past decade, the field of Sign Language Production (SLP) has lacked a large-scale, pre-trained deep learning model for continuous American Sign Language (ASL) production. This limitation hampers communication for individuals with disabilities who rely on ASL. To address this issue, we undertook the secondary development and utilization of How2Sign, one of the largest publicly available ASL datasets. Despite its significance, prior sign language researchers have not effectively employed this corpus because of the intricacies involved in American Sign Language Production (ASLP). To conduct large-scale ASLP, we propose SignDiff, a dual-condition diffusion pre-training model, built on the latest work in related fields, that can generate human sign language speakers from a skeleton pose. SignDiff includes a novel Frame Reinforcement Network called FR-Net, similar to dense human pose estimation work, which enhances the correspondence between text lexical symbols and sign language dense pose frames and reduces the occurrence of multiple fingers in the diffusion model's output. In addition, our ASLP method proposes two new improved modules and a new loss function to improve the accuracy and quality of sign language skeletal posture and to enhance the model's ability to train on large-scale data. We propose the first baseline for ASL production and report BLEU-4 scores of 17.19 and 12.85 on the How2Sign dev and test sets, respectively. We also evaluated our model on the previous mainstream dataset, PHOENIX14T, where the main experiments achieved state-of-the-art (SOTA) results. In addition, our image quality exceeds all previous results by 10 percentage points on the SSIM metric. Finally, we conducted ablation studies and qualitative evaluations for discussion.
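
For readers unfamiliar with the BLEU-4 metric cited above, the following is a minimal sketch of a sentence-level BLEU-4 score with a single reference and no smoothing. It is illustrative only: the function names `ngrams` and `bleu4` are hypothetical, and the abstract does not state the evaluation pipeline actually used. Published scores such as 17.19 are normally corpus-level values scaled by 100, so this sketch is not expected to reproduce them.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Sentence-level BLEU-4 in [0, 1], single reference, no smoothing.

    Illustrative sketch only; not the evaluation code behind the
    scores reported in the abstract.
    """
    log_precisions = []
    for n in range(1, 5):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives zero
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)

    # Geometric mean of the four modified n-gram precisions.
    return bp * math.exp(sum(log_precisions) / 4)


if __name__ == "__main__":
    ref = "the signer greets the audience warmly".split()
    hyp = "the signer greets the audience".split()
    print(round(bleu4(hyp, ref), 4))
```

The score is the brevity penalty times the geometric mean of the 1-gram to 4-gram modified precisions, which is why a hypothesis sharing no 4-gram with the reference scores zero unless smoothing is applied.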


research
03/29/2022

Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production

Sign languages are visual languages, with vocabularies as rich as their ...
research
04/02/2020

BosphorusSign22k Sign Language Recognition Dataset

Sign Language Recognition is a challenging research domain. It has recen...
research
10/24/2019

Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison

Vision-based sign language recognition aims at helping the hearing-impai...
research
10/13/2022

SDW-ASL: A Dynamic System to Generate Large Scale Dataset for Continuous American Sign Language

Despite tremendous progress in natural language processing using deep le...
research
02/22/2023

Multi-View Bangla Sign Language(MV-BSL) Dataset and Continuous BSL Recognition

Being able to express our thoughts, feelings, and ideas to one another i...
research
06/27/2023

YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English Parallel Corpus

Machine learning for sign languages is bottlenecked by data. In this pap...
research
12/06/2022

SignNet: Single Channel Sign Generation using Metric Embedded Learning

A true interpreting agent not only understands sign language and transla...
