Affective Faces for Goal-Driven Dyadic Communication

01/26/2023
by Scott Geng, et al.

We introduce a video framework for modeling the association between verbal and non-verbal communication during dyadic conversation. Given the input speech of a speaker, our approach retrieves a video of a listener whose facial expressions would be socially appropriate given the context. Our approach further allows the listener to be conditioned on their own goals, personalities, or backgrounds. It models conversations through a composition of large language models and vision-language models, creating internal representations that are interpretable and controllable. To study multimodal communication, we propose a new video dataset of unscripted conversations covering diverse topics and demographics. Experiments and visualizations show that our approach outputs listeners that are significantly more socially appropriate than baselines. However, many challenges remain, and we release our dataset publicly to spur further progress. See our website for video results, data, and code: https://realtalk.cs.columbia.edu.
