HCLAS-X: Hierarchical and Cascaded Lyrics Alignment System Using Multimodal Cross-Correlation

07/10/2023
by   Minsung Kang, et al.
0

In this work, we address the challenge of lyrics alignment, which involves aligning the lyrics and vocal components of songs. This problem requires the alignment of two distinct modalities, namely text and audio. To overcome this challenge, we propose a model that is trained in a supervised manner, utilizing the cross-correlation matrix of latent representations between vocals and lyrics. Our system is designed in a hierarchical and cascaded manner. It predicts synced time first on a sentence-level and subsequently on a word-level. This design enables the system to process long sequences, as the cross-correlation uses quadratic memory with respect to sequence length. In our experiments, we demonstrate that our proposed system achieves a significant improvement in mean average error, showcasing its robustness in comparison to the previous state-of-the-art model. Additionally, we conduct a qualitative analysis of the system after successfully deploying it in several music streaming services.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/25/2022

Audio-text Retrieval in Context

Audio-text retrieval based on natural language descriptions is a challen...
research
07/29/2020

Unsupervised Generative Adversarial Alignment Representation for Sheet music, Audio and Lyrics

Sheet music, audio, and lyrics are three main modalities during writing ...
research
07/27/2021

Audio-to-Score Alignment Using Deep Automatic Music Transcription

Audio-to-score alignment (A2SA) is a multimodal task consisting in the a...
research
07/27/2023

Cascaded Cross-Modal Transformer for Request and Complaint Detection

We propose a novel cascaded cross-modal transformer (CCMT) that combines...
research
06/25/2019

Acoustic Modeling for Automatic Lyrics-to-Audio Alignment

Automatic lyrics to polyphonic audio alignment is a challenging task not...
research
02/18/2019

End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model

Time-aligned lyrics can enrich the music listening experience by enablin...

Please sign up or login with your details

Forgot password? Click here to reset