Music2Video: Automatic Generation of Music Video with fusion of audio and text

01/11/2022
by   Joel Jang, et al.

Image creation with generative adversarial networks has been widely adopted in the multi-modal regime with the advent of multi-modal representation models pre-trained on large corpora. Because multiple modalities share a common representation space, such models can guide generative models to create images from text or even from an audio source. Departing from previous methods that rely solely on either text or audio, we exploit the expressiveness of both modalities. Based on the fusion of text and audio, we create videos whose content is consistent with both of the provided modalities. Our method includes a simple approach to automatically segment the video into variable-length intervals and to maintain temporal consistency in the generated video. Our proposed framework for generating music videos shows promising results at the application level, where users can interactively feed in a music source and a text source to create artistic music videos. Our code is available at https://github.com/joeljang/music2video.
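The abstract does not specify how the text and audio signals are combined, but one common way to fuse two modalities in a shared representation space is to score candidate images against a convex combination of the (normalized) text and audio embeddings. The sketch below illustrates that idea with plain NumPy vectors standing in for real encoder outputs; the function name `fused_guidance_score`, the mixing weight `alpha`, and the toy 4-dimensional embeddings are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    return v / np.linalg.norm(v)

def fused_guidance_score(image_emb, text_emb, audio_emb, alpha=0.5):
    """Cosine similarity of an image embedding against a convex combination
    of text and audio embeddings, assumed to live in one shared space.
    alpha weights text vs. audio guidance (alpha=1.0 -> text only)."""
    target = normalize(alpha * normalize(text_emb)
                       + (1.0 - alpha) * normalize(audio_emb))
    return float(np.dot(normalize(image_emb), target))

# Toy example in a 4-d "shared" space (real encoders output ~512-d vectors).
text  = np.array([1.0, 0.0, 0.0, 0.0])
audio = np.array([0.0, 1.0, 0.0, 0.0])
img_a = np.array([1.0, 1.0, 0.0, 0.0])  # aligned with both modalities
img_b = np.array([0.0, 0.0, 1.0, 0.0])  # aligned with neither

score_a = fused_guidance_score(img_a, text, audio)  # high: matches the fusion
score_b = fused_guidance_score(img_b, text, audio)  # low: orthogonal to it
```

In a guided-generation loop, a score like this would serve as the objective: the generator's latent is updated to increase the image's similarity to the fused target, so the frames track both the lyrics/prompt and the music.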

