Multi-Instrumentalist Net: Unsupervised Generation of Music from Body Movements

12/07/2020
by   Kun Su, et al.

We propose a novel system that takes as input the body movements of a musician playing a musical instrument and generates music in an unsupervised setting. Learning to generate multi-instrument music from videos without instrument labels is a challenging problem. To achieve the transformation, we built a pipeline named 'Multi-instrumentalistNet' (MI Net). At its base, the pipeline learns a discrete latent representation of the music of various instruments from log-spectrograms using a Vector-Quantized Variational Autoencoder (VQ-VAE) with multi-band residual blocks. The pipeline is then trained together with an autoregressive prior conditioned on the musician's body-keypoint movements, which are encoded by a recurrent neural network. Joint training of the prior with the body-movement encoder disentangles the music into latent features indicating the musical components and the instrumental features. The latent space forms distributions clustered into distinct instruments, from which new music can be generated. Furthermore, the VQ-VAE architecture supports detailed music generation with additional conditioning. We show that a MIDI file can further condition the latent space so that the pipeline generates the exact musical content played by the instrument in the video. We evaluate MI Net on two datasets containing videos of 13 instruments and obtain generated music of reasonable audio quality, easily associated with the corresponding instrument and consistent with the audio content of the music.
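At the heart of the pipeline described above is the VQ-VAE's vector-quantization step, which snaps each continuous encoder output to its nearest entry in a learned codebook, yielding the discrete latent codes over which the autoregressive prior is trained. The following is a minimal NumPy sketch of that lookup only; the codebook size, latent dimension, and function name are illustrative and not taken from the paper.

```python
import numpy as np

def quantize(z_e, codebook):
    """Map encoder outputs z_e of shape (N, D) to their nearest
    codebook entries from a codebook of shape (K, D).

    Returns the discrete code indices and the quantized vectors z_q.
    """
    # Squared Euclidean distance between every latent and every code,
    # computed via broadcasting: (N, 1, D) - (1, K, D) -> (N, K, D).
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)   # (N,) discrete latent codes
    z_q = codebook[indices]          # (N, D) quantized latents
    return indices, z_q

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # K=8 codes, D=4 dims (illustrative)
z_e = rng.normal(size=(16, 4))       # 16 encoder output vectors
indices, z_q = quantize(z_e, codebook)
print(indices.shape, z_q.shape)      # -> (16,) (16, 4)
```

In a full VQ-VAE these discrete indices are what the conditioned autoregressive prior predicts, while the decoder reconstructs the log-spectrogram from the quantized vectors.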


Related research

07/21/2020 · Foley Music: Learning to Generate Music from Videos
In this paper, we introduce Foley Music, a system that can synthesize pl...

05/21/2018 · A Universal Music Translation Network
We present a method for translating music across musical instruments, ge...

07/13/2023 · Real-time Percussive Technique Recognition and Embedding Learning for the Acoustic Guitar
Real-time music information retrieval (RT-MIR) has much potential to aug...

06/23/2020 · Audeo: Audio Generation for a Silent Performance Video
We present a novel system that gets as an input video frames of a musici...

11/25/2021 · A-Muze-Net: Music Generation by Composing the Harmony based on the Generated Melody
We present a method for the generation of Midi files of piano music. The...

04/20/2020 · Music Gesture for Visual Sound Separation
Recent deep learning approaches have achieved impressive performance on ...

11/26/2020 · Real-time error correction and performance aid for MIDI instruments
Making a slight mistake during live music performance can easily be spot...
