DiffSynth: Latent In-Iteration Deflickering for Realistic Video Synthesis

08/07/2023
by Zhongjie Duan, et al.

In recent years, diffusion models have emerged as the most powerful approach to image synthesis. However, applying these models directly to video synthesis is challenging, as it often produces noticeable flickering artifacts. Although recently proposed zero-shot methods can alleviate flicker to some extent, generating coherent videos remains difficult. In this paper, we propose DiffSynth, a novel approach that converts image synthesis pipelines into video synthesis pipelines. DiffSynth consists of two key components: a latent in-iteration deflickering framework and a video deflickering algorithm. The latent in-iteration deflickering framework applies video deflickering in the latent space of diffusion models, effectively preventing flicker from accumulating across intermediate steps. In addition, we propose a video deflickering algorithm, named the patch blending algorithm, which remaps objects across frames and blends them together to improve video consistency. A notable advantage of DiffSynth is its general applicability to a variety of video synthesis tasks, including text-guided video stylization, fashion video synthesis, image-guided video stylization, video restoration, and 3D rendering. In the task of text-guided video stylization, DiffSynth makes it possible to synthesize high-quality videos without cherry-picking. The experimental results demonstrate the effectiveness of DiffSynth. All videos can be viewed on our project page, and the source code will also be released.
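To make the core idea concrete, the minimal Python sketch below shows what "in-iteration" deflickering could look like inside a diffusers-style denoising loop: a deflickering operation is applied to the frame latents at every sampling step rather than once after sampling, so flicker never has a chance to accumulate. The `unet`/`scheduler` interface and the simple temporal-average `deflicker_latents` are illustrative assumptions only; the paper's actual patch blending algorithm remaps objects across frames before blending.

```python
import torch

def deflicker_latents(latents: torch.Tensor, blend_weight: float = 0.5) -> torch.Tensor:
    # Hypothetical stand-in for the paper's patch blending algorithm:
    # blend each frame's latent with a temporal average of its neighbors.
    # latents: (num_frames, channels, height, width)
    smoothed = latents.clone()
    smoothed[1:-1] = (latents[:-2] + latents[1:-1] + latents[2:]) / 3.0
    return (1.0 - blend_weight) * latents + blend_weight * smoothed

@torch.no_grad()
def sample_video_latents(unet, scheduler, latents, text_emb):
    # Denoising loop with latent in-iteration deflickering: the deflicker
    # step runs inside every iteration, operating directly on the latent
    # space, so flicker cannot build up across intermediate steps.
    for t in scheduler.timesteps:
        noise_pred = unet(latents, t, encoder_hidden_states=text_emb).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample
        latents = deflicker_latents(latents)  # applied in latent space
    return latents
```

Treating all frames as a single batch keeps the sketch simple; in practice the blending would be guided by the cross-frame patch remapping described in the paper rather than a plain temporal average.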

Related research

06/13/2023
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
Large text-to-image diffusion models have exhibited impressive proficien...

03/23/2023
Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
Recent text-to-video generation approaches rely on computationally heavy...

04/12/2023
DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion
We present DreamPose, a diffusion-based method for generating animated f...

05/06/2023
AADiff: Audio-Aligned Video Synthesis with Text-to-Image Diffusion
Recent advances in diffusion models have showcased promising results in ...

12/28/2022
Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models
Recent CLIP-guided 3D optimization methods, e.g., DreamFields and PureCL...

05/10/2023
Sketching the Future (STF): Applying Conditional Control Techniques to Text-to-Video Models
The proliferation of video content demands efficient and flexible neural...

08/15/2023
Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model
The rising demand for creating lifelike avatars in the digital realm has...
