Responsive Listening Head Generation: A Benchmark Dataset and Baseline

12/27/2021
by   Mohan Zhou, et al.
Responsive listening during face-to-face conversations is a critical element of social interaction and is well established in psychological research. By responding in real time with non-verbal signals to the speaker's words, intonations, or behaviors, listeners show how engaged they are in the dialogue. In this work, we build the Responsive Listener Dataset (RLD), a conversation video corpus collected from public resources, featuring 67 speakers and 76 listeners with three different attitudes. We define the responsive listening head generation task as the synthesis of a non-verbal head whose motions and expressions react to multiple inputs, including the audio and visual signals of the speaker. Unlike speech-driven gesture or talking head generation, this task introduces more modalities, which we hope will benefit several research fields, including human-to-human interaction, video-to-video translation, and cross-modal understanding and generation. Furthermore, we release an attitude-conditioned listening head generation baseline. Project page: <https://project.mhzhou.com/rld>.

research
07/05/2023

Interactive Conversational Head Generation

We introduce a new conversation head generation benchmark for synthesizi...
research
07/10/2021

Speech2Video: Cross-Modal Distillation for Speech to Video Generation

This paper investigates a novel task of talking face video generation so...
research
06/21/2023

Visual-Aware Text-to-Speech

Dynamically synthesizing talking speech that actively responds to a list...
research
06/04/2019

Text-based Editing of Talking-head Video

Editing talking-head video to change the speech content or to remove fil...
research
07/19/2023

Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline

In dyadic speaker-listener interactions, the listener's head reactions a...
research
04/29/2021

Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary

With the advance of deep learning technology, automatic video generation...
research
05/07/2020

What comprises a good talking-head video generation?: A Survey and Benchmark

Over the years, performance evaluation has become essential in computer ...
