Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline

07/19/2023
by Zhigang Chang, et al.

In dyadic speaker-listener interactions, the listener's head reactions, together with the speaker's head movements, constitute an important form of non-verbal semantic expression. The listener head generation task aims to synthesize responsive listener head videos from the speaker's audio and reference images of the listener. Compared with talking-head generation, capturing the correlation cues from the speaker's audio and visual information is more challenging. Following the ViCo baseline scheme, we propose a high-performance solution that enhances the hierarchical semantic extraction capability of the audio encoder module and improves the decoder, renderer, and post-processing modules. Our solution took first place on the official leaderboard for the listening head generation track. This paper is a technical report for the ViCo@2023 Conversational Head Generation Challenge at the ACM Multimedia 2023 conference.
