MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance

08/19/2023
by Ernie Chu et al.

This study introduces an efficient and effective method, MeDM, that utilizes pre-trained image Diffusion Models for video-to-video translation with consistent temporal flow. The proposed framework can render videos from scene position information, such as a normal G-buffer, or perform text-guided editing on videos captured in real-world scenarios. We employ explicit optical flows to construct a practical coding that enforces physical constraints on generated frames and mediates independent frame-wise scores. By leveraging this coding, maintaining temporal consistency in the generated videos can be framed as an optimization problem with a closed-form solution. To ensure compatibility with Stable Diffusion, we also suggest a workaround for modifying observed-space scores in latent-space Diffusion Models. Notably, MeDM does not require fine-tuning or test-time optimization of the Diffusion Models. Through extensive qualitative, quantitative, and subjective experiments on various benchmarks, the study demonstrates the effectiveness and superiority of the proposed approach.
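To make the mediation idea concrete, here is a minimal sketch of what a closed-form consistency step could look like. It assumes the optical-flow coding groups pixels across frames into trajectories of the same scene point, in which case the least-squares projection of independent frame-wise predictions onto the set of temporally consistent videos reduces to averaging each trajectory. The function `mediate_frames` and its trajectory-label input are hypothetical illustrations, not the authors' implementation:

```python
import torch

def mediate_frames(frames: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of flow-based mediation (not the MeDM code).

    frames: (T, C, H, W) per-frame predictions from an image diffusion model.
    labels: (T, H, W) integer ids assigning each pixel to a flow trajectory;
            pixels sharing an id are assumed to depict the same scene point
            under the precomputed optical flow.

    Returns frames in which every pixel takes the mean value of its
    trajectory, i.e. the closed-form least-squares solution to "make
    corresponding pixels identical while staying close to the predictions."
    """
    T, C, H, W = frames.shape
    flat = frames.permute(1, 0, 2, 3).reshape(C, -1)  # (C, T*H*W)
    ids = labels.reshape(-1)                          # (T*H*W,)
    n = int(ids.max()) + 1

    # Per-trajectory sums and member counts, then means.
    sums = torch.zeros(C, n, dtype=flat.dtype, device=flat.device)
    sums.index_add_(1, ids, flat)
    counts = torch.zeros(n, dtype=flat.dtype, device=flat.device)
    counts.index_add_(0, ids, torch.ones_like(ids, dtype=flat.dtype))
    means = sums / counts.clamp_min(1)

    # Scatter each trajectory mean back to every member pixel.
    return means[:, ids].reshape(C, T, H, W).permute(1, 0, 2, 3)
```

In a pipeline built on Stable Diffusion, a pixel-space correction like this would presumably be applied to decoded observations at each denoising step, which is where the paper's workaround for modifying observed-space scores in latent-space models comes in.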

Related research

05/30/2023 · Video ControlNet: Towards Temporally Consistent Synthetic-to-Real Video Translation Using Conditional Image Diffusion Models
In this study, we present an efficient and effective approach for achiev...

07/26/2023 · VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet
Recently, diffusion models like StableDiffusion have achieved impressive...

06/13/2023 · Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
Large text-to-image diffusion models have exhibited impressive proficien...

12/12/2020 · Evaluation and Comparison of Diffusion Models with Motif Features
Diffusion models simulate the propagation of influence in networks. The ...

08/21/2023 · EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints
Motivated by the superior performance of image diffusion models, more an...

08/24/2023 · APLA: Additional Perturbation for Latent Noise with Adversarial Training Enables Consistency
Diffusion models have exhibited promising progress in video generation. ...

05/06/2023 · AADiff: Audio-Aligned Video Synthesis with Text-to-Image Diffusion
Recent advances in diffusion models have showcased promising results in ...