A Comparative Analysis of Different Pitch and Metrical Grid Encoding Methods in the Task of Sequential Music Generation

01/31/2023
by Yuqiang Li, et al.

Pitch and meter are two fundamental music features for symbolic music generation tasks, where researchers usually choose different encoding methods depending on specific goals. However, the advantages and drawbacks of different encoding methods are seldom discussed. This paper presents an integrated analysis of the influence of two low-level features, pitch and meter, on the performance of a token-based sequential music generation model. First, the commonly used MIDI number encoding and a less used class-octave encoding are compared. Second, a dense intra-bar metric grid is imposed on the encoded sequence as auxiliary features. Different complexities and resolutions of the metric grid are compared. For complexity, the single-token approach and the multiple-token approach are compared; for grid resolution, 0 (ablation), 1 (bar-level), 4 (downbeat-level), 12 (8th-triplet-level), up to 64 (64th-note-grid-level) are compared; for duration resolution, 4, 8, 12 and 16 subdivisions per beat are compared. All encodings are tested on separately trained Transformer-XL models for a melody generation task. Measured by how closely the distributions of several objective evaluation metrics match those of the test dataset, results suggest that the class-octave encoding significantly outperforms the taken-for-granted MIDI encoding on pitch-related metrics; finer grids and multiple-token grids improve the rhythmic quality, but also suffer from over-fitting at an early training stage. Results display a general phenomenon of over-fitting from two aspects: the pitch embedding space and the test loss of the single-token grid encoding. From a practical perspective, we demonstrate the feasibility of using smaller networks and lower embedding dimensions for the generation task, and raise the concern that over-fitting occurs easily. The findings can also contribute to future models in terms of feature engineering.
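
To make the compared encodings concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the token names (`Pitch_`, `Class_`, `Octave_`, `Grid_`), the function names, and the reading of grid resolution as the number of equal slots per bar are all hypothetical choices made for this example.

```python
from typing import List, Optional


def midi_number_tokens(pitch: int) -> List[str]:
    """Encode a pitch as one token carrying its MIDI number (0-127)."""
    return [f"Pitch_{pitch}"]


def class_octave_tokens(pitch: int) -> List[str]:
    """Encode a pitch as a pitch-class token plus an octave token.

    Assumption: the octave index is simply pitch // 12, with no offset
    applied to match a particular octave-naming convention.
    """
    octave, pitch_class = divmod(pitch, 12)
    return [f"Class_{pitch_class}", f"Octave_{octave}"]


def grid_token(onset_beats: float, beats_per_bar: float,
               resolution: int) -> Optional[str]:
    """Return a single-token intra-bar grid position for an onset.

    Assumption: `resolution` is the number of equal slots per bar.
    A resolution of 0 is treated as the ablation case (no grid token).
    """
    if resolution == 0:
        return None
    slot = int(onset_beats / beats_per_bar * resolution)
    return f"Grid_{slot}"


# Middle C (MIDI 60) under both pitch encodings.
print(midi_number_tokens(60))
print(class_octave_tokens(60))

# An onset 2.5 beats into a 4-beat bar at several grid resolutions.
for res in (0, 1, 4, 12, 64):
    print(res, grid_token(2.5, 4, res))
```

In this sketch the class-octave form shares one `Class_` token across all octaves of the same note name, while the MIDI number form gives every pitch its own unrelated token. A multiple-token grid would emit more than one token per position; that variant is not shown because the abstract does not specify its layout.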
