Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production

03/29/2022
by Ben Saunders, et al.

Sign languages are visual languages, with vocabularies as rich as their spoken language counterparts. However, current deep learning-based Sign Language Production (SLP) models produce under-articulated skeleton pose sequences from constrained vocabularies, which limits their applicability. To be understandable and accepted by the deaf, an automatic SLP system must be able to generate co-articulated, photo-realistic signing sequences for large domains of discourse. In this work, we tackle large-scale SLP by learning to co-articulate between dictionary signs, a method capable of producing smooth signing while scaling to unconstrained domains of discourse. To learn sign co-articulation, we propose a novel Frame Selection Network (FS-Net) that improves the temporal alignment of interpolated dictionary signs to continuous signing sequences. Additionally, we propose SignGAN, a pose-conditioned human synthesis model that produces photo-realistic sign language videos directly from skeleton pose. We also propose a novel keypoint-based loss function that improves the quality of synthesized hand images. We evaluate our SLP model on the large-scale meineDGS (mDGS) corpus, conducting an extensive user evaluation that shows our FS-Net approach improves the co-articulation of interpolated dictionary signs. Finally, we show that SignGAN significantly outperforms all baseline methods on quantitative metrics, human perceptual studies and native deaf signer comprehension.


