Stochastic Talking Face Generation Using Latent Distribution Matching

11/21/2020
by Ravindra Yadav, et al.

The ability to envisage a talking face from a voice alone is a unique human capability. A number of recent works have addressed this task. We differ from these approaches by generating a variety of talking faces from a single audio input. A system that can produce only one talking face for a given input would be almost robotic in nature. In contrast, we present an unsupervised stochastic audio-to-video generation model that captures multiple modes of the video distribution and therefore allows diverse generations from a single audio input. We ensure that all of the diverse generations are plausible through a principled multi-modal variational autoencoder framework. We demonstrate its efficacy on the challenging LRW and GRID datasets, where it performs better than the baseline while also generating multiple diverse lip-synchronized videos.

research
04/01/2021

Collaborative Learning to Generate Audio-Video Jointly

There have been a number of techniques that have demonstrated the genera...
research
01/16/2022

Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels

In this paper, we present a dynamic convolution kernel (DCK) strategy fo...
research
05/08/2017

You said that?

We present a method for generating a video of a talking face. The method...
research
08/11/2021

FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset

While significant advancements have been made in the generation of deepf...
research
06/11/2021

GANs N' Roses: Stable, Controllable, Diverse Image to Image Translation (works for videos too!)

We show how to learn a map that takes a content code, derived from a fac...
research
12/14/2020

Multi Modal Adaptive Normalization for Audio to Video Generation

Speech-driven facial video generation has been a complex problem due to ...
research
05/11/2022

Diverse Video Generation from a Single Video

GANs are able to perform generation and manipulation tasks, trained on a...
