Unsupervised Generative Adversarial Alignment Representation for Sheet music, Audio and Lyrics

07/29/2020
by Donghuo Zeng, et al.

Sheet music, audio, and lyrics are the three main modalities involved in writing a song. In this paper, we propose an unsupervised generative adversarial alignment representation (UGAAR) model to learn deep discriminative representations shared across these three musical modalities: sheet music, lyrics, and audio. The model is a deep neural network architecture with three branches that are trained jointly. In particular, the proposed model transfers the strong relationship between audio and sheet music to the audio-lyrics and sheet-lyrics pairs by learning their correlation in a shared latent subspace. We use the canonical correlation analysis (CCA) components of audio and sheet music to establish a new ground truth. The generative (G) model learns the correlation of the two transferred pairs (audio-lyrics and sheet-lyrics) to generate a new audio-sheet pair for fixed lyrics, which it uses to challenge the discriminative (D) model. The discriminative model aims to distinguish whether its input comes from the generative model or from the ground truth. The two models are trained simultaneously in an adversarial way to strengthen deep alignment representation learning. Our experimental results demonstrate the feasibility of the proposed UGAAR model for alignment representation learning among sheet music, audio, and lyrics.
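The abstract does not spell out how the CCA ground truth or the adversarial objective are computed, so the following is only a minimal sketch under stated assumptions. It uses plain linear CCA between paired audio and sheet-music feature vectors (the paper may use a deep or differently regularized variant), and the standard binary cross-entropy GAN losses rather than UGAAR's actual objective. The helper names `linear_cca` and `gan_losses`, the regularization constant `reg`, and the feature dimensions are hypothetical.

```python
import numpy as np


def linear_cca(X, Y, k, reg=1e-4):
    """Hypothetical helper: plain linear CCA between paired views.

    X: (n, dx) audio features, Y: (n, dy) sheet-music features; row i of X
    and row i of Y describe the same piece. Returns projection matrices
    (Wx, Wy), the column means used for centering, and the top-k canonical
    correlations in descending order.
    """
    n = X.shape[0]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    # Covariances; a small ridge term keeps the inverse square roots stable.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Symmetric inverse square root via eigendecomposition.
        vals, vecs = np.linalg.eigh(S)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Sxx_is, Syy_is = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Sxx_is @ Sxy @ Syy_is, full_matrices=False)
    Wx = Sxx_is @ U[:, :k]
    Wy = Syy_is @ Vt.T[:, :k]
    return Wx, Wy, mx, my, s[:k]


def gan_losses(d_real, d_fake, eps=1e-8):
    """Standard GAN losses (assumed, not UGAAR's confirmed objective).

    d_real: discriminator probabilities on ground-truth (CCA-aligned) pairs.
    d_fake: discriminator probabilities on generated audio-sheet pairs.
    """
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    g_loss = -np.mean(np.log(d_fake + eps))  # non-saturating generator loss
    return d_loss, g_loss


# Usage on synthetic data: both views share a hypothetical 2-D latent factor.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))
audio = z @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(500, 8))
sheet = z @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(500, 6))
Wx, Wy, mx, my, corrs = linear_cca(audio, sheet, k=2)
audio_emb = (audio - mx) @ Wx  # projected audio, one row per piece
sheet_emb = (sheet - my) @ Wy  # projected sheet music, same row order
```

Rows of `audio_emb` and `sheet_emb` with the same index form CCA-aligned pairs. In the scheme the abstract describes, pairs like these would act as the "real" inputs to the discriminator, while the generator's audio-sheet pair for fixed lyrics would be the "fake" input scored by `gan_losses`.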
