Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation

06/02/2023
by Federico Nocentini, et al.

This paper presents a novel approach for generating 3D talking heads from raw audio input. Our method builds on the idea that speech-related movements can be comprehensively and efficiently described by the motion of a few control points located on the movable parts of the face, i.e., landmarks. The underlying musculoskeletal structure then allows us to learn how the motion of these landmarks influences the geometric deformation of the whole face. To this end, the proposed method employs two distinct models: the first learns to generate the motion of a sparse set of landmarks from the given audio, and the second expands this landmark motion into a dense motion field, which is used to animate a given 3D mesh in its neutral state. Additionally, we introduce a novel loss function, named Cosine Loss, which minimizes the angle between the generated motion vectors and the ground-truth ones. Using landmarks in 3D talking head generation offers several advantages, including consistency, reliability, and no need for manual annotation. Our approach is designed to be identity-agnostic, enabling high-quality facial animation for any user without additional data or training.
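
As an illustration of the Cosine Loss idea described in the abstract, the following is a minimal NumPy sketch of a loss that penalizes the angle between predicted and ground-truth motion vectors. The abstract does not give the exact formulation, so the `1 - cos(theta)` form, the mean reduction, the function name `cosine_loss`, the `eps` parameter, and the `(..., 3)` array layout are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def cosine_loss(pred_motion, gt_motion, eps=1e-8):
    """Mean of (1 - cos(theta)) over per-point displacement vectors.

    Assumed layout (hypothetical): arrays of shape (..., 3), where the last
    axis holds the x, y, z displacement of one landmark or vertex.
    """
    pred = np.asarray(pred_motion, dtype=np.float64)
    gt = np.asarray(gt_motion, dtype=np.float64)

    # Dot product and norms along the last (xyz) axis.
    dot = np.sum(pred * gt, axis=-1)
    norms = np.linalg.norm(pred, axis=-1) * np.linalg.norm(gt, axis=-1)

    # eps guards against division by zero for points that do not move;
    # such points get cos = 0 and therefore contribute a loss of 1.
    cos = dot / np.maximum(norms, eps)

    # 0 when directions match, 1 when orthogonal, 2 when opposite.
    return float(np.mean(1.0 - cos))
```

A loss of this form depends only on the direction of each motion vector, not on its magnitude, so in practice it would presumably be combined with a distance-based reconstruction term; that pairing is also an assumption here.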

