Text-based Editing of Talking-head Video

by Ohad Fried, et al.

Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e., no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression, and scene illumination per frame. To edit a video, the user only has to edit the transcript; an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation into a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis.
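The abstract does not state the cost that the segment-selection optimization minimizes. The sketch below is therefore only a simplified stand-in, not the authors' method. It assumes exact phoneme matches and picks the fewest contiguous corpus segments (i.e., the fewest splice points) whose concatenation spells out the edited phoneme sequence, using dynamic programming. The function name `select_segments` and the phoneme labels are hypothetical.

```python
def select_segments(target, corpus):
    """Hypothetical sketch: cover `target` (edited phoneme sequence) with the
    fewest contiguous runs copied from `corpus` (annotated input phonemes).

    Assumes exact phoneme matching only; the real method's cost is not
    given in the abstract. Returns a list of half-open (start, end) corpus
    spans in playback order, or None if some phoneme never occurs.
    """
    n, m = len(target), len(corpus)
    INF = float("inf")
    # best[i] = (number of segments, spans) for the cheapest cover of target[:i]
    best = [(INF, None)] * (n + 1)
    best[0] = (0, [])
    for i in range(n):
        cost, spans = best[i]
        if cost == INF:
            continue  # target[:i] cannot be covered, nothing to extend
        for s in range(m):
            # Grow a run starting at corpus[s] while it keeps matching target[i:]
            k = 0
            while i + k < n and s + k < m and corpus[s + k] == target[i + k]:
                k += 1
                if cost + 1 < best[i + k][0]:
                    best[i + k] = (cost + 1, spans + [(s, s + k)])
    return best[n][1]
```

For example, with `corpus = ["HH", "AH", "L", "OW", "W", "ER", "L", "D"]`, reordering the two words as `["W", "ER", "L", "D", "HH", "AH", "L", "OW"]` yields the two spans `[(4, 8), (0, 4)]`. Minimizing splices is only one plausible objective; a practical system would presumably also weigh viseme similarity and how smoothly the stitched face parameters join at each boundary.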

