To React or not to React: End-to-End Visual Pose Forecasting for Personalized Avatar during Dyadic Conversations

10/05/2019
by Chaitanya Ahuja, et al.

Nonverbal behaviours such as gestures, facial expressions, body posture, and paralinguistic cues have been shown to complement or clarify verbal messages. Hence, to improve telepresence in the form of an avatar, it is important to model these behaviours, especially in dyadic interactions. Creating such personalized avatars requires modeling not only the intrapersonal dynamics between an avatar's speech and its body pose, but also the interpersonal dynamics with the interlocutor present in the conversation. In this paper, we introduce a neural architecture named Dyadic Residual-Attention Model (DRAM), which integrates intrapersonal (monadic) and interpersonal (dyadic) dynamics using selective attention to generate sequences of body pose conditioned on the audio and body pose of the interlocutor and the audio of the human operating the avatar. We evaluate our proposed model on dyadic conversational data consisting of pose and audio of both participants, confirming the importance of adaptive attention between monadic and dyadic dynamics when predicting avatar pose. We also conduct a user study to analyze judgments of human observers. Our results confirm that body pose generated by DRAM is more natural, and captures intrapersonal and interpersonal dynamics better, than that of non-adaptive monadic/dyadic models.
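The idea of adaptively weighting monadic and dyadic dynamics can be illustrated with a small sketch. This is not the authors' DRAM implementation: the function names (`sigmoid`, `blend_pose`, `blend_sequence`), the scalar gate per time step, and the residual form `monadic + alpha * (dyadic - monadic)` are all assumptions made for illustration. In the actual model, the two pose predictions and the attention weight would presumably come from learned networks.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function, mapping a logit to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def blend_pose(monadic_pose, dyadic_pose, gate_logit):
    """Blend two pose predictions for one time step (hypothetical helper).

    monadic_pose: joint values predicted from the avatar's own audio.
    dyadic_pose:  joint values predicted from the interlocutor's audio and pose.
    gate_logit:   unnormalized attention score; a higher value puts more
                  weight on the dyadic prediction.
    """
    if len(monadic_pose) != len(dyadic_pose):
        raise ValueError("pose vectors must have the same length")
    alpha = sigmoid(gate_logit)
    # Residual form: start from the monadic pose and add a gated correction.
    return [m + alpha * (d - m) for m, d in zip(monadic_pose, dyadic_pose)]


def blend_sequence(monadic_seq, dyadic_seq, gate_logits):
    """Apply blend_pose independently at every time step of a sequence."""
    if not (len(monadic_seq) == len(dyadic_seq) == len(gate_logits)):
        raise ValueError("sequences must have the same number of time steps")
    return [
        blend_pose(m, d, g)
        for m, d, g in zip(monadic_seq, dyadic_seq, gate_logits)
    ]
```

With a strongly negative gate logit the output stays close to the monadic prediction, and with a strongly positive one it approaches the dyadic prediction. A non-adaptive baseline would correspond to fixing the gate at one extreme for every time step.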

research
02/13/2021

Learning Speech-driven 3D Conversational Gestures from Video

We propose the first approach to automatically and jointly synthesize bo...
research
05/20/2022

Analysis of Co-Laughter Gesture Relationship on RGB videos in Dyadic Conversation Contex

The development of virtual agents has enabled human-avatar interactions ...
research
07/23/2020

Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics

We propose a novel learned deep prior of body motion for 3D hand shape s...
research
04/10/2023

Robust Body Exposure (RoBE): A Graph-based Dynamics Modeling Approach to Manipulating Blankets over People

Robotic caregivers could potentially improve the quality of life of many...
research
07/02/2019

Language2Pose: Natural Language Grounded Pose Forecasting

Generating animations from natural language sentences finds its applicat...
research
07/10/2022

A Probabilistic Model Of Interaction Dynamics for Dyadic Face-to-Face Settings

Natural conversations between humans often involve a large number of non...
research
12/19/2017

Audio to Body Dynamics

We present a method that gets as input an audio of violin or piano playi...
