Let's face it: Probabilistic multi-modal interlocutor-aware generation of facial gestures in dyadic settings

06/11/2020
by Patrik Jonell, et al.

To enable more natural face-to-face interactions, conversational agents need to adapt their behavior to their interlocutors. One key aspect of this is generation of appropriate non-verbal behavior for the agent, for example facial gestures, here defined as facial expressions and head movements. Most existing gesture-generating systems do not utilize multi-modal cues from the interlocutor when synthesizing non-verbal behavior. Those that do, typically use deterministic methods that risk producing repetitive and non-vivid motions. In this paper, we introduce a probabilistic method to synthesize interlocutor-aware facial gestures - represented by highly expressive FLAME parameters - in dyadic conversations. Our contributions are: a) a method for feature extraction from multi-party video and speech recordings, resulting in a representation that allows for independent control and manipulation of expression and speech articulation in a 3D avatar; b) an extension to MoGlow, a recent motion-synthesis method based on normalizing flows, to also take multi-modal signals from the interlocutor as input and subsequently output interlocutor-aware facial gestures; and c) subjective and objective experiments assessing the use and relative importance of the different modalities in the synthesized output. The results show that the model successfully leverages the input from the interlocutor to generate more appropriate behavior.
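
To make contribution (b) more concrete, the sketch below shows one way a Glow-style affine coupling step can be conditioned on an external context vector, here imagined as the agent's own speech features concatenated with the interlocutor's speech and facial features. This is an illustrative sketch only, not the authors' implementation: the class name, layer sizes, and feature dimensions are assumptions, and MoGlow itself produces its conditioning through a recurrent (LSTM) network over a window of control inputs, which is collapsed here to a single context vector for brevity.

```python
import torch
import torch.nn as nn


class ConditionalAffineCoupling(nn.Module):
    """One Glow-style affine coupling step whose scale/shift network is
    conditioned on a context vector (agent speech + interlocutor features).
    Hypothetical example; not the paper's actual architecture."""

    def __init__(self, dim, context_dim, hidden=256):
        super().__init__()
        self.half = dim // 2
        out_dim = dim - self.half
        self.net = nn.Sequential(
            nn.Linear(self.half + context_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * out_dim),
        )

    def forward(self, x, context):
        # Split the facial-gesture feature vector; transform the second half
        # with a scale and shift predicted from the first half plus context.
        x_a, x_b = x[:, : self.half], x[:, self.half :]
        log_s, t = self.net(torch.cat([x_a, context], dim=-1)).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)          # bound scales for numerical stability
        z_b = x_b * torch.exp(log_s) + t
        log_det = log_s.sum(dim=-1)        # Jacobian term for the flow likelihood
        return torch.cat([x_a, z_b], dim=-1), log_det

    def inverse(self, z, context):
        # Exact inverse used at synthesis time: sample z ~ N(0, I), map back.
        z_a, z_b = z[:, : self.half], z[:, self.half :]
        log_s, t = self.net(torch.cat([z_a, context], dim=-1)).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)
        x_b = (z_b - t) * torch.exp(-log_s)
        return torch.cat([z_a, x_b], dim=-1)


if __name__ == "__main__":
    # Hypothetical sizes: a 50-dim FLAME-style expression/pose frame,
    # conditioned on 80-dim agent speech + 80-dim interlocutor features.
    coupling = ConditionalAffineCoupling(dim=50, context_dim=160)
    x = torch.randn(8, 50)
    context = torch.randn(8, 160)
    z, log_det = coupling(x, context)
    x_rec = coupling.inverse(z, context)
    print(torch.allclose(x, x_rec, atol=1e-5))  # True: the coupling is invertible
```

Because every coupling step is invertible with a tractable Jacobian, the model can be trained by maximum likelihood and then sample diverse, non-deterministic gesture sequences at synthesis time, which is what distinguishes this family of models from deterministic regression approaches.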
