
Audio- and Gaze-driven Facial Animation of Codec Avatars

by   Alexander Richard, et al.

Codec Avatars are a recent class of learned, photorealistic face models that accurately represent the geometry and texture of a person in 3D (e.g., for virtual reality) and are almost indistinguishable from video. In this paper we describe the first approach to animating these parametric models in real time that can be deployed on commodity virtual reality hardware using audio and/or eye tracking. Our goal is to display expressive conversations between individuals that exhibit important social signals such as laughter and excitement, derived solely from latent cues in our lossy input signals. To this end we collected over 5 hours of high-frame-rate 3D face scans across three participants, including traditional neutral speech as well as expressive and conversational speech. We investigate a multimodal fusion approach that dynamically identifies which sensor encoding should animate which parts of the face at any time. See the supplemental video, which demonstrates our ability to generate full face motion far beyond the typically neutral lip articulations seen in competing work.
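As a rough illustration only (not the authors' architecture, which the abstract does not detail), the idea of dynamically weighting which modality drives which face region can be sketched as a per-region soft gating between an audio encoding and a gaze encoding. All names, shapes, and scoring matrices below are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_modalities(audio_enc, gaze_enc, W_audio, W_gaze):
    """Per-region soft gating between two modality encodings (illustrative sketch).

    audio_enc, gaze_enc: (D,) feature vectors from hypothetical modality encoders.
    W_audio, W_gaze: (R, D) hypothetical learned scoring matrices, one row per
    face region (e.g., mouth, eyes, brows).
    Returns an (R, 2) matrix: for each region, the relative weight given to
    the audio vs. gaze encoding when driving that region's animation.
    """
    # Score each region against each modality, then normalize across modalities.
    scores = np.stack([W_audio @ audio_enc, W_gaze @ gaze_enc], axis=-1)  # (R, 2)
    return softmax(scores, axis=-1)
```

In a full system the fused, region-weighted code would then condition the avatar decoder; here the output is just a set of per-region mixing weights that sum to one.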



