End to End Lip Synchronization with a Temporal AutoEncoder

03/30/2022
by Yoav Shalev, et al.

We study the problem of syncing the lip movement in a video with the audio stream. Our solution finds an optimal alignment using a dual-domain recurrent neural network trained on synthetic data, which we generate by dropping and duplicating video frames. Once the alignment is found, we modify the video to sync the two sources. Our method substantially outperforms prior methods from the literature on a variety of existing and new benchmarks. As an application, we demonstrate that it can robustly align text-to-speech generated audio with an existing video stream. Our code and samples are available at https://github.com/itsyoavshalev/End-to-End-Lip-Synchronization-with-a-Temporal-AutoEncoder.
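To make the synthetic-data idea concrete, the following minimal Python sketch shows one way a misaligned frame sequence could be produced by randomly dropping and duplicating frames. It is an illustration under stated assumptions, not the authors' procedure: the function name `corrupt_frame_indices`, the per-frame probabilities `drop_prob` and `dup_prob`, and the choice to work on frame indices rather than pixel data are all hypothetical and do not come from the abstract.

```python
import random


def corrupt_frame_indices(num_frames, drop_prob=0.1, dup_prob=0.1, seed=None):
    """Return source-frame indices after random drops and duplications.

    Hypothetical helper: probabilities and sampling scheme are assumptions,
    not taken from the paper.
    """
    rng = random.Random(seed)
    corrupted = []
    for i in range(num_frames):
        r = rng.random()
        if r < drop_prob:
            continue  # frame i is dropped entirely
        corrupted.append(i)
        if r < drop_prob + dup_prob:
            corrupted.append(i)  # frame i is shown twice
    return corrupted


# Position k of the corrupted video shows original frame indices[k],
# so the list itself can serve as the ground-truth alignment.
indices = corrupt_frame_indices(num_frames=20, seed=0)
print(indices)
```

Because the returned list records which original frame occupies each position of the corrupted video, it can act as the alignment target while the audio track is left untouched.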

