Lip Movements Generation at a Glance

03/28/2018
by   Lele Chen, et al.

Cross-modality generation is an emerging topic that aims to synthesize data in one modality based on information in a different modality. In this paper, we consider such a task: given arbitrary audio speech and one lip image of an arbitrary target identity, generate synthesized lip movements of the target identity saying the speech. Performing well in this task requires a model not only to preserve the target identity, the photo-realism of the synthesized images, and the consistency and smoothness of lip images in a sequence, but, more importantly, to learn the correlations between audio speech and lip movements. To solve these problems jointly, we explore how best to model the audio-visual correlations in building and training a lip-movement generator network. Specifically, we devise a method to fuse audio and image embeddings to generate multiple lip images at once, and propose a novel correlation loss to synchronize lip changes and speech changes. Our final model uses a combination of four losses for a comprehensive treatment of lip-movement generation; it is trained in an end-to-end fashion and is robust to lip shapes, view angles, and different facial characteristics. Thorough experiments on three datasets, ranging from lab-recorded to lips in-the-wild, show that our model significantly outperforms other state-of-the-art methods extended to this task.
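The abstract does not give the exact form of the correlation loss, so the following is only a minimal sketch of one plausible formulation: penalizing low Pearson correlation between frame-to-frame changes in the audio embedding and frame-to-frame changes in the lip-image embedding. The function name and the use of delta magnitudes are assumptions for illustration, not the paper's definition.

```python
import numpy as np

def correlation_loss(audio_feats, lip_feats, eps=1e-8):
    """Hypothetical correlation loss sketch.

    audio_feats, lip_feats: (T, D) arrays of per-frame embeddings.
    Returns 1 - Pearson correlation between the per-frame change
    magnitudes of the two streams (0 = perfectly synchronized changes).
    """
    # Temporal deltas capture *changes* rather than absolute content.
    da = np.diff(audio_feats, axis=0)   # (T-1, D)
    dl = np.diff(lip_feats, axis=0)     # (T-1, D)

    # Reduce each delta to a scalar change magnitude per frame step.
    a = np.linalg.norm(da, axis=1)
    l = np.linalg.norm(dl, axis=1)

    # Pearson correlation between the two change sequences.
    a = a - a.mean()
    l = l - l.mean()
    corr = (a * l).sum() / (np.sqrt((a ** 2).sum() * (l ** 2).sum()) + eps)

    # High loss when changes are uncorrelated or anti-correlated.
    return 1.0 - corr
```

In training, such a term would be added to the reconstruction, adversarial, and identity-preserving losses, encouraging the generator to move the lips exactly when the speech signal changes.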


research
08/09/2021

AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person

Automatically generating videos in which synthesized speech is synchroni...
research
08/06/2020

Unsupervised Cross-Domain Singing Voice Conversion

We present a wav-to-wav generative model for the task of singing voice c...
research
12/06/2021

One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning

Audio-driven one-shot talking face generation methods are usually traine...
research
05/09/2019

Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss

We devise a cascade GAN approach to generate talking face video, which i...
research
06/28/2022

Show Me Your Face, And I'll Tell You How You Speak

When we speak, the prosody and content of the speech can be inferred fro...
research
07/20/2018

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation

Talking face generation aims to synthesize a sequence of face images tha...
research
12/17/2018

High-Resolution Talking Face Generation via Mutual Information Approximation

Given an arbitrary speech clip and a facial image, talking face generati...
