ModelScope Text-to-Video Technical Report

08/12/2023
by   Jiuniu Wang, et al.

This paper introduces ModelScopeT2V, a text-to-video synthesis model that evolves from a text-to-image synthesis model (i.e., Stable Diffusion). ModelScopeT2V incorporates spatio-temporal blocks to ensure consistent frame generation and smooth movement transitions. The model can adapt to varying frame numbers during training and inference, making it suitable for both image-text and video-text datasets. ModelScopeT2V brings together three components (i.e., VQGAN, a text encoder, and a denoising UNet), comprising 1.7 billion parameters in total, of which 0.5 billion are dedicated to temporal capabilities. The model demonstrates superior performance over state-of-the-art methods across three evaluation metrics. The code and an online demo are available at <https://modelscope.cn/models/damo/text-to-video-synthesis/summary>.
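Since the released checkpoint is served through the ModelScope library linked above, a minimal usage sketch might look as follows. This assumes the modelscope package is installed with its video dependencies and exposes the text-to-video-synthesis task under its standard pipeline API; the prompt text is illustrative.

```python
# Minimal sketch: invoking the released ModelScopeT2V checkpoint via the
# ModelScope pipeline API. Assumes `pip install modelscope` plus its video
# dependencies; model ID follows the demo page linked above.
from modelscope.pipelines import pipeline
from modelscope.outputs import OutputKeys

# Build the text-to-video pipeline from the published checkpoint.
t2v = pipeline('text-to-video-synthesis', model='damo/text-to-video-synthesis')

# The pipeline takes a dict with a 'text' prompt and returns the path to a
# video file rendered from the generated frames.
result = t2v({'text': 'A panda eating bamboo on a rock.'})
print(result[OutputKeys.OUTPUT_VIDEO])
```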


