PiRhDy: Learning Pitch-, Rhythm-, and Dynamics-aware Embeddings for Symbolic Music
Learning definitive embeddings for symbolic music remains a fundamental challenge in computational musicology and deep learning. Analogous to natural language, music can be modeled as a sequence of tokens, which has motivated most existing solutions to adapt word embedding models to build music embeddings. However, music differs from natural language in two key aspects: (1) a musical token is multi-faceted, comprising pitch, rhythm, and dynamics information; and (2) musical context is two-dimensional, as each musical token depends on both melodic and harmonic contexts. In this work, we provide a comprehensive solution by proposing a novel framework named PiRhDy that seamlessly integrates pitch, rhythm, and dynamics information. PiRhDy adopts a hierarchical strategy that decomposes into two steps: (1) token (i.e., note event) modeling, which represents pitch, rhythm, and dynamics separately and integrates them into a single token embedding; and (2) context modeling, which uses melodic and harmonic knowledge to train the token embedding. We conduct a thorough study of each component and sub-strategy of PiRhDy, and further validate our embeddings on three downstream tasks: melody completion, accompaniment suggestion, and genre classification. Results indicate a significant advancement of the neural approach to symbolic music, as well as PiRhDy's potential as a pretrained tool for a broad range of symbolic music applications.
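To make the first step of the hierarchy concrete, below is a minimal sketch of multi-faceted token modeling in PyTorch. It embeds pitch, rhythm, and dynamics separately and fuses them into a single token embedding. The vocabulary sizes, embedding dimension, and concatenation-plus-projection fusion are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class TokenEmbedding(nn.Module):
    """Hypothetical sketch of PiRhDy-style token modeling: pitch, rhythm,
    and dynamics are embedded separately, then integrated into one token
    embedding. The fusion scheme here is an assumption for illustration."""

    def __init__(self, n_pitch=128, n_rhythm=64, n_dynamics=32, dim=64):
        super().__init__()
        self.pitch = nn.Embedding(n_pitch, dim)        # e.g., MIDI pitch 0-127
        self.rhythm = nn.Embedding(n_rhythm, dim)      # quantized duration/onset class
        self.dynamics = nn.Embedding(n_dynamics, dim)  # quantized velocity class
        self.fuse = nn.Linear(3 * dim, dim)            # integrate facets into one vector

    def forward(self, pitch_ids, rhythm_ids, dynamics_ids):
        # Concatenate the three facet embeddings, then project to a single
        # token embedding that downstream context modeling would consume.
        x = torch.cat([self.pitch(pitch_ids),
                       self.rhythm(rhythm_ids),
                       self.dynamics(dynamics_ids)], dim=-1)
        return self.fuse(x)

# Usage: embed four note events (values are arbitrary examples)
emb = TokenEmbedding()
tokens = emb(torch.tensor([60, 62, 64, 65]),   # pitches as MIDI numbers
             torch.tensor([4, 4, 8, 8]),       # rhythm classes
             torch.tensor([10, 10, 12, 12]))   # dynamics classes
print(tokens.shape)  # torch.Size([4, 64])
```

In the second step, embeddings like these would be trained against their melodic (sequential) and harmonic (simultaneous) contexts; that context-modeling objective is not sketched here.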