Dual-Stream Diffusion Net for Text-to-Video Generation

08/16/2023
by Binhui Liu et al.

With the emergence of diffusion models, text-to-video generation has recently attracted increasing attention. An important bottleneck, however, is that generated videos often carry flickers and artifacts. In this work, we propose a dual-stream diffusion net (DSDN) to improve the consistency of content variations in generated videos. In particular, the two designed diffusion streams, a video content branch and a motion branch, not only run separately in their private spaces to produce personalized video variations as well as content, but are also aligned between the content and motion domains through our designed cross-transformer interaction module, which benefits the smoothness of the generated videos. Besides, we introduce a motion decomposer and a motion combiner to facilitate operations on video motion. Qualitative and quantitative experiments demonstrate that our method produces smooth, continuous videos with fewer flickers.
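
The abstract names three components (the two streams, the cross-transformer interaction module, and the motion decomposer/combiner) but gives no implementation details. Below is a minimal PyTorch sketch of how such a design could be wired together. The class names, the differencing-based decomposer and combiner, and all tensor shapes are illustrative assumptions, not the authors' code; the actual model would apply these modules to diffusion latents inside the denoising loop.

```python
import torch
import torch.nn as nn

class MotionDecomposer(nn.Module):
    """Split a clip into a content anchor frame and frame-to-frame motion
    residuals. Simple temporal differencing is assumed here; the paper's
    decomposer may be learned and operate on diffusion latents."""
    def forward(self, video):                      # video: (B, T, C, H, W)
        content = video[:, :1]                     # first frame as content anchor
        motion = video[:, 1:] - video[:, :-1]      # T-1 temporal differences
        return content, motion

class MotionCombiner(nn.Module):
    """Invert the decomposer: cumulatively sum the anchor and the
    residuals back into T frames."""
    def forward(self, content, motion):
        return torch.cumsum(torch.cat([content, motion], dim=1), dim=1)

class CrossTransformerInteraction(nn.Module):
    """Bidirectional cross-attention aligning the two streams: content
    tokens query motion tokens and vice versa, with residual connections
    so each stream keeps its private pathway."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.to_content = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.to_motion = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, c, m):                       # c, m: (B, N, dim) tokens
        c_upd, _ = self.to_content(c, m, m)        # content attends to motion
        m_upd, _ = self.to_motion(m, c, c)         # motion attends to content
        return c + c_upd, m + m_upd

# Shape-level round trip: decompose, then recombine losslessly.
video = torch.randn(2, 8, 3, 32, 32)
content, motion = MotionDecomposer()(video)
assert torch.allclose(MotionCombiner()(content, motion), video, atol=1e-5)

# Align toy per-stream feature tokens with the interaction module.
interact = CrossTransformerInteraction(dim=64)
c_feat, m_feat = torch.randn(2, 16, 64), torch.randn(2, 16, 64)
c_feat, m_feat = interact(c_feat, m_feat)
print(c_feat.shape, m_feat.shape)                  # torch.Size([2, 16, 64]) each
```

The residual cross-attention reflects the stated design goal: each branch denoises in its own space, while the interaction module injects information from the other domain to keep content and motion aligned.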

Related research

07/26/2023
VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet
Recently, diffusion models like StableDiffusion have achieved impressive...

02/26/2021
Dual-MTGAN: Stochastic and Deterministic Motion Transfer for Image-to-Video Synthesis
Generating videos with content and motion variations is a challenging ta...

12/22/2022
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
To reproduce the success of text-to-image (T2I) generation, recent works...

07/15/2022
WordSig: QR streams enabling platform-independent self-identification that's impossible to deepfake
Deepfakes can degrade the fabric of society by limiting our ability to t...

02/06/2023
Structure and Content-Guided Video Synthesis with Diffusion Models
Text-guided generative diffusion models unlock powerful image creation a...

01/30/2021
Video Reenactment as Inductive Bias for Content-Motion Disentanglement
We introduce a self-supervised motion-transfer VAE model to disentangle ...

06/12/2023
AI-Generated Image Detection using a Cross-Attention Enhanced Dual-Stream Network
With the rapid evolution of AI Generated Content (AIGC), forged images p...
