Reconstructing Signing Avatars From Video Using Linguistic Priors

04/20/2023
by   Maria-Paola Forte, et al.
Sign language (SL) is the primary method of communication for the 70 million Deaf people around the world. Video dictionaries of isolated signs are a core SL learning tool. Replacing these with 3D avatars can aid learning and enable AR/VR applications, improving access to technology and online media. However, little work has attempted to estimate expressive 3D avatars from SL video; occlusion, noise, and motion blur make this task difficult. We address this by introducing novel linguistic priors that are universally applicable to SL and provide constraints on 3D hand pose that help resolve ambiguities within isolated signs. Our method, SGNify, captures fine-grained hand pose, facial expression, and body movement fully automatically from in-the-wild monocular SL videos. We evaluate SGNify quantitatively by using a commercial motion-capture system to compute 3D avatars synchronized with monocular video. SGNify outperforms state-of-the-art 3D body-pose- and shape-estimation methods on SL videos. A perceptual study shows that SGNify's 3D reconstructions are significantly more comprehensible and natural than those of previous methods and are on par with the source videos. Code and data are available at http://sgnify.is.tue.mpg.de.

research
08/19/2020

FrankMocap: Fast Monocular 3D Hand and Body Motion Capture by Regression and Integration

Although the essential nuance of human motion is often conveyed as a com...
research
11/10/2021

Dance In the Wild: Monocular Human Animation with Neural Dynamic Appearance Synthesis

Synthesizing dynamic appearances of humans in motion plays a central rol...
research
10/11/2022

HiFECap: Monocular High-Fidelity and Expressive Capture of Human Performances

Monocular 3D human performance capture is indispensable for many applica...
research
11/29/2021

Human Performance Capture from Monocular Video in the Wild

Capturing the dynamically deforming 3D shape of clothed human is essenti...
research
07/23/2020

Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics

We propose a novel learned deep prior of body motion for 3D hand shape s...
research
02/07/2022

Imposing Temporal Consistency on Deep Monocular Body Shape and Pose Estimation

Accurate and temporally consistent modeling of human bodies is essential...
research
05/17/2021

A Fine-Grained Visual Attention Approach for Fingerspelling Recognition in the Wild

Fingerspelling in sign language has been the means of communicating tech...
