Learning long-term music representations via hierarchical contextual constraints

02/13/2022
by Shiqi Wei, et al.

Learning symbolic music representations, especially disentangled representations with probabilistic interpretations, has been shown to benefit both music understanding and generation. However, most models are only applicable to short-term music, while learning long-term music representations remains a challenging task. We have seen several studies attempting to learn hierarchical representations directly in an end-to-end manner, but these models have not been able to achieve the desired results and the training process is not stable. In this paper, we propose a novel approach to learn long-term symbolic music representations through contextual constraints. First, we use contrastive learning to pre-train a long-term representation by constraining its difference from the short-term representation (extracted by an off-the-shelf model). Then, we fine-tune the long-term representation by a hierarchical prediction model such that a good long-term representation (e.g., an 8-bar representation) can reconstruct the corresponding short-term ones (e.g., the 2-bar representations within the 8-bar range). Experiments show that our method stabilizes the training and the fine-tuning steps. In addition, the designed contextual constraints benefit both reconstruction and disentanglement, significantly outperforming the baselines.
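
To make the contrastive pre-training step more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the loss, so the InfoNCE-style objective, the cosine similarity, the temperature value, the mean pooling of the 2-bar vectors, and the names `mean_pool`, `cosine`, and `info_nce` are all assumptions made for illustration. The idea shown is only that each long-term (e.g., 8-bar) vector is pulled toward a summary of its own short-term (e.g., 2-bar) vectors and pushed away from the summaries of other segments.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length vectors (assumed pooling of 2-bar codes)."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(long_vecs, short_summaries, temperature=0.1):
    """Mean InfoNCE-style loss (an assumed objective, not taken from the paper).

    long_vecs[i] and short_summaries[i] form the positive pair; every other
    short_summaries[j] in the batch acts as a negative for long_vecs[i].
    """
    total = 0.0
    for i, z_long in enumerate(long_vecs):
        logits = [cosine(z_long, z_s) / temperature for z_s in short_summaries]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(long_vecs)


# Toy batch: two 8-bar segments, each with four hypothetical 2-bar vectors.
short_a = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.8, 0.0]]
short_b = [[0.0, 1.0], [0.1, 0.9], [0.0, 0.8], [0.1, 1.0]]
summaries = [mean_pool(short_a), mean_pool(short_b)]

long_vecs = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical long-term encoder outputs

aligned_loss = info_nce(long_vecs, summaries)
swapped_loss = info_nce(long_vecs, summaries[::-1])
print(aligned_loss < swapped_loss)
```

In this toy batch the loss is lower when each long-term vector is paired with the summary of its own segment than when the pairs are swapped, which is the behaviour a contrastive constraint of this kind is meant to reward. The fine-tuning stage described in the abstract (predicting the 2-bar representations from the 8-bar one) would be a separate objective and is not sketched here.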

research
11/03/2022

Convolution channel separation and frequency sub-bands aggregation for music genre classification

In music, short-term features such as pitch and tempo constitute long-te...
research
01/05/2023

HierVL: Learning Hierarchical Video-Language Embeddings

Video-language embeddings are a promising avenue for injecting semantics...
research
02/05/2020

Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions

Automatic music generation is an interdisciplinary research topic that c...
research
04/25/2023

GTN-Bailando: Genre Consistent Long-Term 3D Dance Generation based on Pre-trained Genre Token Network

Music-driven 3D dance generation has become an intensive research topic ...
research
05/17/2022

The Power of Reuse: A Multi-Scale Transformer Model for Structural Dynamic Segmentation in Symbolic Music Generation

Symbolic Music Generation relies on the contextual representation capabi...
research
12/12/2018

MorpheuS: generating structured music with constrained patterns and tension

Automatic music generation systems have gained in popularity and sophist...
research
08/07/2023

TempFuser: Learning Tactical and Agile Flight Maneuvers in Aerial Dogfights using a Long Short-Term Temporal Fusion Transformer

Aerial dogfights necessitate understanding the tactically changing maneu...
