Responsive Listening Head Generation: A Benchmark Dataset and Baseline

by Mohan Zhou, et al.
Microsoft, Inc.
Harbin Institute of Technology

Responsive listening during face-to-face conversations is a critical element of social interaction and is well established in psychological research. Through non-verbal signals in response to the speaker's words, intonations, or behaviors in real time, listeners show how engaged they are in the dialogue. In this work, we build the Responsive Listener Dataset (RLD), a conversation video corpus collected from public resources featuring 67 speakers and 76 listeners with three different attitudes. We define the responsive listening head generation task as the synthesis of a non-verbal head with motions and expressions reacting to multiple inputs, including the audio and visual signals of the speaker. Unlike speech-driven gesture or talking head generation, we introduce more modalities in this task, hoping to benefit several research fields, including human-to-human interaction, video-to-video translation, and cross-modal understanding and generation. Furthermore, we release an attitude-conditioned listening head generation baseline. Project page: <>.



Interactive Conversational Head Generation

We introduce a new conversation head generation benchmark for synthesizi...

Speech2Video: Cross-Modal Distillation for Speech to Video Generation

This paper investigates a novel task of talking face video generation so...

Visual-Aware Text-to-Speech

Dynamically synthesizing talking speech that actively responds to a list...

Text-based Editing of Talking-head Video

Editing talking-head video to change the speech content or to remove fil...

Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline

In dyadic speaker-listener interactions, the listener's head reactions a...

Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary

With the advance of deep learning technology, automatic video generation...

What comprises a good talking-head video generation?: A Survey and Benchmark

Over the years, performance evaluation has become essential in computer ...
