Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary

04/29/2021
by Sibo Zhang, et al.

With advances in deep learning, automatic video generation from audio or text has become an emerging and promising research topic. In this paper, we present a novel approach to synthesizing talking-head video from text. The method builds a phoneme-pose dictionary and trains a generative adversarial network (GAN) to generate video from interpolated phoneme poses. Compared to audio-driven video generation algorithms, our approach has several advantages: 1) it needs only a fraction of the training data that an audio-driven approach requires; 2) it is more flexible and is not vulnerable to speaker variation; 3) it significantly reduces preprocessing, training, and inference time. We perform extensive experiments comparing the proposed method with state-of-the-art talking-face generation methods on a benchmark dataset and on datasets of our own. The results demonstrate the effectiveness and superiority of our approach.
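
To make the idea of a phoneme-pose dictionary with interpolated poses concrete, here is a minimal sketch in Python. It is not the authors' implementation: the pose representation (a short list of floats), the dictionary entries, the phoneme symbols, the function names, and the use of plain linear interpolation are all assumptions chosen for illustration. The GAN stage that would render video frames from the poses is not shown.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical pose representation: a flat list of keypoint values.
Pose = List[float]

# Hypothetical toy phoneme-pose dictionary (values are made up for illustration).
PHONEME_POSES: Dict[str, Pose] = {
    "sil": [0.0, 0.0],  # silence, mouth at rest
    "AA": [1.0, 0.8],   # open vowel
    "M": [0.1, 0.0],    # closed lips
}


def lerp(a: Pose, b: Pose, w: float) -> Pose:
    """Linearly blend pose a toward pose b with weight w in [0, 1]."""
    return [x + (y - x) * w for x, y in zip(a, b)]


def interpolate_poses(
    keys: Sequence[Tuple[str, float]],
    fps: float,
    poses: Dict[str, Pose] = PHONEME_POSES,
) -> List[Pose]:
    """Turn timed phonemes into one pose per video frame.

    keys: (phoneme, time in seconds) pairs, sorted by time.
    Assumption: linear interpolation between neighbouring key poses.
    """
    if not keys:
        return []
    if len(keys) == 1:
        return [list(poses[keys[0][0]])]

    start, end = keys[0][1], keys[-1][1]
    n_frames = int(round((end - start) * fps)) + 1
    frames: List[Pose] = []
    seg = 0  # index of the key pose at or before the current frame time
    for i in range(n_frames):
        t = start + i / fps
        # Advance to the segment that contains time t.
        while seg < len(keys) - 2 and t >= keys[seg + 1][1]:
            seg += 1
        (p0, t0), (p1, t1) = keys[seg], keys[seg + 1]
        # Clamp the blend weight to guard against rounding at the ends.
        w = 0.0 if t1 == t0 else min(max((t - t0) / (t1 - t0), 0.0), 1.0)
        frames.append(lerp(poses[p0], poses[p1], w))
    return frames


if __name__ == "__main__":
    timed = [("sil", 0.0), ("AA", 0.5), ("M", 1.0)]
    for frame in interpolate_poses(timed, fps=4):
        print(frame)
```

In a pipeline of this kind, the resulting per-frame pose sequence would be the conditioning input to the video generator. The phoneme timings would have to come from a text-to-speech or alignment step, which this sketch takes as given.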


