Reconstructing Signing Avatars From Video Using Linguistic Priors

by Maria-Paola Forte, et al.

Sign language (SL) is the primary method of communication for the 70 million Deaf people around the world. Video dictionaries of isolated signs are a core SL learning tool. Replacing these with 3D avatars can aid learning and enable AR/VR applications, improving access to technology and online media. However, little work has attempted to estimate expressive 3D avatars from SL video; occlusion, noise, and motion blur make this task difficult. We address this by introducing novel linguistic priors that are universally applicable to SL and provide constraints on 3D hand pose that help resolve ambiguities within isolated signs. Our method, SGNify, captures fine-grained hand pose, facial expression, and body movement fully automatically from in-the-wild monocular SL videos. We evaluate SGNify quantitatively by using a commercial motion-capture system to compute 3D avatars synchronized with monocular video. SGNify outperforms state-of-the-art 3D body-pose- and shape-estimation methods on SL videos. A perceptual study shows that SGNify's 3D reconstructions are significantly more comprehensible and natural than those of previous methods and are on par with the source videos. Code and data are available.

