Probabilistic Adaptation of Text-to-Video Models

06/02/2023
by   Mengjiao Yang, et al.
5

Large text-to-video models trained on internet-scale data have demonstrated exceptional capabilities in generating high-fidelity videos from arbitrary textual descriptions. However, adapting these models to tasks with limited domain-specific data, such as animation or robotics videos, poses a significant computational challenge, since finetuning a pretrained large model can be prohibitively expensive. Inspired by how a small modifiable component (e.g., prompts, prefix-tuning) can adapt a large language model to perform new tasks without requiring access to the model weights, we investigate how to adapt a large pretrained text-to-video model to a variety of downstream domains and tasks without finetuning. In answering this question, we propose Video Adapter, which leverages the score function of a large pretrained video diffusion model as a probabilistic prior to guide the generation of a task-specific small video model. Our experiments show that Video Adapter is capable of incorporating the broad knowledge and preserving the high fidelity of a large pretrained video model in a task-specific small video model that is able to generate high-quality yet specialized videos on a variety of tasks such as animation, egocentric modeling, and modeling of simulated and real-world robotics data. More videos can be found on the website https://video-adapter.github.io/.

READ FULL TEXT

page 2

page 6

page 7

page 8

page 9

page 10

research
04/30/2021

GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions

Generating videos from text is a challenging task due to its high comput...
research
10/05/2022

Imagen Video: High Definition Video Generation with Diffusion Models

We present Imagen Video, a text-conditional video generation system base...
research
12/22/2022

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

To reproduce the success of text-to-image (T2I) generation, recent works...
research
04/29/2022

Flamingo: a Visual Language Model for Few-Shot Learning

Building models that can be rapidly adapted to numerous tasks using only...
research
12/13/2022

RT-1: Robotics Transformer for Real-World Control at Scale

By transferring knowledge from large, diverse, task-agnostic datasets, m...
research
06/06/2019

Scaling Autoregressive Video Models

Due to the statistical complexity of video, the high degree of inherent ...
research
08/16/2023

OnUVS: Online Feature Decoupling Framework for High-Fidelity Ultrasound Video Synthesis

Ultrasound (US) imaging is indispensable in clinical practice. To diagno...

Please sign up or login with your details

Forgot password? Click here to reset