AADiff: Audio-Aligned Video Synthesis with Text-to-Image Diffusion

05/06/2023
by Seungwoo Lee, et al.

Recent advances in diffusion models have shown promising results in text-to-video (T2V) synthesis. However, because these T2V models rely solely on text for guidance, they tend to struggle to model detailed temporal dynamics. In this paper, we introduce a novel T2V framework that additionally employs audio signals to control the temporal dynamics, enabling an off-the-shelf text-to-image (T2I) diffusion model to generate audio-aligned videos. We propose audio-based regional editing and signal smoothing to strike a good balance between two conflicting desiderata of video synthesis: temporal flexibility and coherence. We empirically demonstrate the effectiveness of our method through experiments and present practical applications for content creation.
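As a rough illustration of the signal smoothing idea, the sketch below derives one audio magnitude per video frame and smooths it with a centered moving average. The abstract does not specify how the audio signal is computed or smoothed, so the RMS magnitude, the moving-average filter, and the names `frame_magnitudes`, `smooth`, and `normalize` are all assumptions made for illustration, not the method of the paper.

```python
import math

def frame_magnitudes(samples, sample_rate, fps):
    """RMS magnitude of the audio samples falling within each video frame.

    Assumption: RMS is used as the per-frame magnitude; the abstract does
    not say which audio feature drives the temporal dynamics.
    """
    hop = sample_rate // fps  # number of audio samples per video frame
    if hop <= 0:
        raise ValueError("sample_rate must be at least fps")
    n_frames = len(samples) // hop
    mags = []
    for i in range(n_frames):
        window = samples[i * hop:(i + 1) * hop]
        mags.append(math.sqrt(sum(s * s for s in window) / len(window)))
    return mags

def smooth(signal, window):
    """Centered moving average; edge frames average over available neighbors.

    Assumption: a moving average stands in for the unspecified smoothing.
    """
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def normalize(signal):
    """Scale a non-negative signal so that its peak equals 1."""
    peak = max(signal) if signal else 0.0
    return [x / peak for x in signal] if peak > 0 else list(signal)
```

In this sketch, a larger `window` trades temporal flexibility for coherence: the per-frame control signal reacts less sharply to sudden changes in the audio, so consecutive frames would be driven by more similar values.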


