Self-Supervised Disentanglement of Harmonic and Rhythmic Features in Music Audio Signals

09/06/2023
by Yiming Wu, et al.

Latent variable disentanglement aims to infer the multiple informative latent representations that underlie a data generation process, and it is a key factor in controllable data generation. In this paper, we propose a deep neural network-based self-supervised learning method to infer the disentangled rhythmic and harmonic representations behind music audio generation. We train a variational autoencoder that generates an audio mel-spectrogram from two latent features representing the rhythmic and harmonic content. In the training phase, the variational autoencoder is trained to reconstruct the input mel-spectrogram given its pitch-shifted version. At each forward computation during training, a vector rotation operation is applied to one of the latent features, under the assumption that the dimensions of the feature vectors are related to pitch intervals. As a result, in the trained variational autoencoder, the rotated latent feature represents the pitch-related information of the mel-spectrogram, and the unrotated latent feature represents the pitch-invariant information, i.e., the rhythmic content. The proposed method was evaluated using a predictor-based disentanglement metric on the learned features. Furthermore, we demonstrate its application to the automatic generation of music remixes.
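
The vector rotation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact operation, so this example assumes the rotation is a circular shift in which one position corresponds to one semitone; the function name `rotate_latent`, the 12-dimensional harmonic latent, and the 3-dimensional rhythmic latent are all hypothetical and chosen only for illustration.

```python
from typing import List


def rotate_latent(z: List[float], shift: int) -> List[float]:
    """Circularly rotate latent vector z by `shift` positions.

    Assumption (not stated in the abstract): each dimension corresponds
    to one semitone step, so a pitch shift of `shift` semitones maps to
    a circular shift of `shift` positions.
    """
    n = len(z)
    if n == 0:
        return []
    k = shift % n  # normalise negative and oversized shifts
    return z[-k:] + z[:-k] if k else list(z)


# Hypothetical latents: one rotated ("harmonic"), one left untouched ("rhythmic").
z_harmonic = [float(i) for i in range(12)]
z_rhythmic = [0.5, -1.0, 2.0]

shift = 3  # pitch shift, in semitones, applied to the input audio
z_harmonic_rot = rotate_latent(z_harmonic, shift)

# Rotating back by the same amount recovers the original vector,
# while the rhythmic latent is never modified.
assert rotate_latent(z_harmonic_rot, -shift) == z_harmonic
```

In this sketch only the harmonic latent changes with the pitch shift, which mirrors the idea that the unrotated feature is left to carry pitch-invariant content.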

research
11/24/2021

3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces

Learning a disentangled, interpretable, and structured latent representa...
research
10/13/2020

A variational autoencoder for music generation controlled by tonal tension

Many of the music generation systems based on neural networks are fully ...
research
01/04/2021

Transformer-based Conditional Variational Autoencoder for Controllable Story Generation

We investigate large-scale latent variable models (LVMs) for neural stor...
research
11/16/2022

Conditional variational autoencoder to improve neural audio synthesis for polyphonic music sound

Deep generative models for audio synthesis have recently been significan...
research
03/22/2022

Upmixing via style transfer: a variational autoencoder for disentangling spatial images and musical content

In the stereo-to-multichannel upmixing problem for music, one of the mai...
research
05/14/2020

Semi-supervised Neural Chord Estimation Based on a Variational Autoencoder with Discrete Labels and Continuous Textures of Chords

This paper describes a statistically-principled semi-supervised method o...
research
03/14/2020

Semi-supervised Disentanglement with Independent Vector Variational Autoencoders

We aim to separate the generative factors of data into two latent vector...
