One Shot Audio to Animated Video Generation

02/19/2021
by Neeraj Kumar, et al.

We consider the challenging problem of audio-to-animated-video generation. We propose a novel method, OneShotAu2AV, that generates an animated video of arbitrary length from an audio clip and a single unseen image of a person. The proposed method consists of two stages. In the first stage, OneShotAu2AV generates a talking-head video in the human domain given the audio and the person's image. In the second stage, the talking-head video is converted from the human domain to the animated domain. The first stage uses a spatially adaptive normalization-based multi-level generator and multiple multi-level discriminators, trained with several adversarial and non-adversarial losses. The second stage leverages an attention-based, normalization-driven GAN architecture together with a temporal-predictor-based recycle loss, a blink loss, and a lip-sync loss for unsupervised generation of the animated video. The input audio clip is not restricted to any specific language, which gives the method multilingual applicability. OneShotAu2AV can generate animated videos that have (a) lip movements in sync with the audio, (b) natural facial expressions such as blinks and eyebrow movements, and (c) head movements. Experimental evaluation demonstrates the superior performance of OneShotAu2AV compared to U-GAT-IT and RecycleGan on multiple quantitative metrics, including KID (Kernel Inception Distance), word error rate, and blinks/sec.
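
To make the two building blocks named in the abstract concrete, here is a minimal, illustrative PyTorch sketch of a spatially adaptive (SPADE-style) normalization layer of the kind the stage-1 generator is described as using. The layer widths and the choice of conditioning input (e.g., an audio-derived keypoint or segmentation map) are assumptions for illustration, not the authors' exact configuration.

```python
# Illustrative SPADE-style layer: normalize features, then modulate them
# spatially with gamma/beta maps predicted from a conditioning map.
# Hidden width and conditioning semantics are assumed, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatiallyAdaptiveNorm(nn.Module):
    def __init__(self, feat_channels, cond_channels, hidden=128):
        super().__init__()
        # parameter-free normalization of the incoming feature map
        self.norm = nn.BatchNorm2d(feat_channels, affine=False)
        # convolutions that predict per-pixel modulation from the condition
        self.shared = nn.Sequential(
            nn.Conv2d(cond_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.gamma = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)

    def forward(self, x, cond):
        # resize the conditioning map to the feature resolution
        cond = F.interpolate(cond, size=x.shape[2:], mode="nearest")
        h = self.shared(cond)
        # modulate the normalized features with spatially varying scale/shift
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)
```

The stage-2 recycle loss driven by a temporal predictor can be sketched roughly as below, following the Recycle-GAN formulation the abstract alludes to: translate consecutive source-domain frames to the target domain, predict the next target-domain frame, map it back, and compare with the real future frame. The names G_xy, G_yx, P_y, the concatenation-based predictor input, and the L1 distance are illustrative assumptions.

```python
# Rough sketch of a temporal-predictor-based recycle loss (Recycle-GAN style).
# G_xy: source -> target generator, G_yx: target -> source generator,
# P_y: temporal predictor in the target domain. All are assumed nn.Modules.
import torch
import torch.nn.functional as F

def recycle_loss(x_t, x_t1, x_t2, G_xy, G_yx, P_y):
    # map two consecutive source frames into the target (animated) domain
    y_t, y_t1 = G_xy(x_t), G_xy(x_t1)
    # temporal predictor estimates the next target-domain frame
    y_t2_pred = P_y(torch.cat([y_t, y_t1], dim=1))
    # map the prediction back and compare with the real future source frame
    return F.l1_loss(G_yx(y_t2_pred), x_t2)
```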

Related research:

12/14/2020 - Robust One Shot Audio to Video Generation
Audio to Video generation is an interesting problem that has numerous ap...

04/01/2021 - Collaborative Learning to Generate Audio-Video Jointly
There have been a number of techniques that have demonstrated the genera...

12/14/2020 - Multi Modal Adaptive Normalization for Audio to Video Generation
Speech-driven facial video generation has been a complex problem due to ...

11/02/2020 - Facial Keypoint Sequence Generation from Audio
Whenever we speak, our voice is accompanied by facial movements and expr...

12/15/2020 - HeadGAN: Video-and-Audio-Driven Talking Head Synthesis
Recent attempts to solve the problem of talking head synthesis using a s...

06/20/2023 - Audio-Driven 3D Facial Animation from In-the-Wild Videos
Given an arbitrary audio clip, audio-driven 3D facial animation aims to ...

05/09/2019 - Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss
We devise a cascade GAN approach to generate talking face video, which i...
