READ Avatars: Realistic Emotion-controllable Audio Driven Avatars

03/01/2023
by   Jack Saunders, et al.

We present READ Avatars, a 3D-based approach for generating 2D avatars that are driven by audio input with direct and granular control over emotion. Previous methods are unable to achieve realistic animation because the mapping from audio to expression is many-to-many. We alleviate this issue by introducing an adversarial loss into the audio-to-expression generation process, which removes the smoothing effect of regression-based models and improves the realism and expressiveness of the generated avatars. We further note that audio should be used directly when generating mouth interiors, something other 3D-based methods do not attempt; we address this with audio-conditioned neural textures, which are resolution-independent. To evaluate our method, we perform quantitative and qualitative experiments, including a user study, and propose a new metric for measuring how well an actor's emotion is reconstructed in the generated avatar. Our results show that our approach outperforms state-of-the-art audio-driven avatar generation methods across several metrics. A demo video can be found at <https://youtu.be/QSyMl3vV0pA>
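The abstract's claim that regression-only training over-smooths expressions can be illustrated with a toy example. The sketch below is not the paper's implementation; the expression codes, the stand-in discriminator, and the loss weight are all hypothetical, chosen only to show why an L2 loss collapses a many-to-many audio-to-expression mapping onto the mean, while an adversarial term penalizes that unrealistic mean.

```python
import numpy as np

# Hypothetical illustration (not the paper's code): two equally valid
# expression codes for the same audio frame, i.e. a many-to-many mapping.
targets = np.array([[1.0, 0.0], [-1.0, 0.0]])

def regression_loss(pred, targets):
    # Average L2 loss over all plausible targets; it is minimized by their
    # mean, which is why pure regression yields over-smoothed expressions.
    return np.mean((targets - pred) ** 2)

def toy_discriminator(pred, targets):
    # Stand-in discriminator: scores 1.0 on a real mode and decays with
    # distance to the nearest one. In the actual method a trained GAN
    # discriminator plays this role.
    d = np.min(np.linalg.norm(targets - pred, axis=1))
    return np.exp(-d)

def total_loss(pred, targets, adv_weight=1.0):
    # Regression term plus a non-saturating adversarial term -log D(pred).
    return regression_loss(pred, targets) - adv_weight * np.log(
        toy_discriminator(pred, targets))

mean_pred = targets.mean(axis=0)  # [0, 0]: regression optimum, unrealistic
mode_pred = targets[0]            # one realistic expression mode

print(regression_loss(mean_pred, targets))  # 0.5: regression prefers the mean
print(regression_loss(mode_pred, targets))  # 1.0
print(total_loss(mean_pred, targets))       # 1.5: adversarial term flips it
print(total_loss(mode_pred, targets))       # 1.0
```

Under the L2 loss alone the smoothed mean wins; once the adversarial term is added, the realistic mode has the lower total loss, which is the mechanism the abstract appeals to.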


research · 09/06/2022
Read it to me: An emotionally aware Speech Narration Application
In this work we try to perform emotional style transfer on audios. In pa...

research · 08/08/2020
Speech Driven Talking Face Generation from a Single Image and an Emotion Condition
Visual emotion expression plays an important role in audiovisual speech ...

research · 04/23/2023
Towards Controllable Audio Texture Morphing
In this paper, we propose a data-driven approach to train a Generative A...

research · 05/30/2022
EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model
Although significant progress has been made to audio-driven talking face...

research · 09/10/2023
Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation
Audio-driven talking-head synthesis is a popular research topic for virt...

research · 04/24/2022
EMOCA: Emotion Driven Monocular Face Capture and Animation
As 3D facial avatars become more widely used for communication, it is cr...

research · 06/25/2022
Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms
We describe our approach for the generative emotional vocal burst task (...
