V2Meow: Meowing to the Visual Beat via Music Generation

05/11/2023
by Kun Su, et al.

Generating high-quality music that complements the visual content of a video is a challenging task. Most existing visually conditioned music generation systems generate symbolic music data, such as MIDI files, rather than raw audio waveforms. Given the limited availability of symbolic music data, such methods can only generate music for a few instruments or for specific types of visual input. In this paper, we propose a novel approach called V2Meow that can generate high-quality music audio that aligns well with the visual semantics of a diverse range of video input types. Specifically, the proposed music generation system is a multi-stage autoregressive model trained on O(100K) music audio clips paired with video frames, mined from in-the-wild music videos; no parallel symbolic music data is involved. V2Meow is able to synthesize high-fidelity music audio waveforms conditioned solely on pre-trained visual features extracted from an arbitrary silent video clip, and it also allows high-level control over the music style of the generated examples by supporting text prompts in addition to the video-frame conditioning. Through both qualitative and quantitative evaluations, we demonstrate that our model outperforms several existing music generation systems in terms of both visual-audio correspondence and audio quality.
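The abstract describes generation as a multi-stage autoregressive process driven by per-frame visual features, with optional text-prompt control over style. The toy sketch below illustrates only that data flow and is not V2Meow's architecture. Everything in it is a hypothetical choice for illustration: the two-stage coarse-to-fine split, the `ToyARStage` class, the untrained random weights, greedy decoding, the vocabulary and feature sizes, and fusing style by adding a text-derived vector to each frame feature. The abstract does not say how conditioning signals are combined, and the final step of turning tokens into a waveform is omitted.

```python
import random
from typing import List, Optional, Sequence

# Hypothetical toy sizes; the abstract does not give real vocabulary or feature sizes.
VOCAB_SIZE = 8
FEATURE_DIM = 4


class ToyARStage:
    """Untrained stand-in for one autoregressive stage (hypothetical).

    Next-token logits are a fixed random linear map of the current conditioning
    vector plus a random bias that depends on the previously emitted token.
    """

    def __init__(self, cond_dim: int, vocab_size: int, seed: int) -> None:
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        # One row of cond_dim weights per candidate output token.
        self.w_cond = [[rng.gauss(0.0, 1.0) for _ in range(cond_dim)]
                       for _ in range(vocab_size)]
        # w_prev[k][p]: contribution of previous token p to the logit of token k.
        self.w_prev = [[rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
                       for _ in range(vocab_size)]

    def next_logits(self, cond: Sequence[float], prev_token: int) -> List[float]:
        return [sum(w * c for w, c in zip(self.w_cond[k], cond)) + self.w_prev[k][prev_token]
                for k in range(self.vocab_size)]

    def generate(self, conds: Sequence[Sequence[float]], start_token: int = 0) -> List[int]:
        """Emit one token per conditioning vector, each depending on the previous token."""
        tokens: List[int] = []
        prev = start_token
        for cond in conds:
            logits = self.next_logits(cond, prev)
            # Greedy argmax keeps the sketch deterministic; a sampler could replace it.
            prev = max(range(self.vocab_size), key=logits.__getitem__)
            tokens.append(prev)
        return tokens


def generate_music_tokens(frame_features: Sequence[Sequence[float]],
                          style_embedding: Optional[Sequence[float]] = None,
                          fine_per_coarse: int = 2) -> List[int]:
    """Two hypothetical stages: visual features -> coarse tokens -> finer tokens."""
    for frame in frame_features:
        if len(frame) != FEATURE_DIM:
            raise ValueError(f"expected {FEATURE_DIM}-dim frame features, got {len(frame)}")
    if style_embedding is not None and len(style_embedding) != FEATURE_DIM:
        raise ValueError(f"expected a {FEATURE_DIM}-dim style embedding")

    # Optional style control (hypothetical fusion): add a text-derived vector to every frame.
    if style_embedding is not None:
        conds = [[f + s for f, s in zip(frame, style_embedding)] for frame in frame_features]
    else:
        conds = [list(frame) for frame in frame_features]
    coarse = ToyARStage(FEATURE_DIM, VOCAB_SIZE, seed=1).generate(conds)

    # Stage 2 sees each coarse token as a one-hot vector, repeated onto a finer time grid.
    one_hot = [[1.0 if i == t else 0.0 for i in range(VOCAB_SIZE)]
               for t in coarse for _ in range(fine_per_coarse)]
    return ToyARStage(VOCAB_SIZE, VOCAB_SIZE, seed=2).generate(one_hot)


if __name__ == "__main__":
    frames = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.0, 0.2]]
    tokens = generate_music_tokens(frames, style_embedding=[1.0, 0.0, -1.0, 0.5])
    print(len(tokens))  # 4: two finer tokens per video frame
```

In a trained system each stage would be a learned network rather than a random linear map, and the finest tokens would still need a separate decoder to become audio; the sketch only shows how frame-level conditioning and an optional style signal can flow through successive autoregressive stages.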

Related research

Dance2MIDI: Dance-driven multi-instruments music generation (01/22/2023)
Dance-driven music generation aims to generate musical pieces conditione...

Audeo: Audio Generation for a Silent Performance Video (06/23/2020)
We present a novel system that gets as an input video frames of a musici...

Quantized GAN for Complex Music Generation from Dance Videos (04/01/2022)
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal fr...

Listen to Dance: Music-driven choreography generation using Autoregressive Encoder-Decoder Network (11/02/2018)
Automatic choreography generation is a challenging task because it often...

Diverse and Vivid Sound Generation from Text Descriptions (05/03/2023)
Previous audio generation mainly focuses on specified sound classes such...

Generating Realistic Images from In-the-wild Sounds (09/05/2023)
Representing wild sounds as images is an important but challenging task ...

TräumerAI: Dreaming Music with StyleGAN (02/09/2021)
The goal of this paper to generate a visually appealing video that respo...