End-to-End Lyrics Recognition with Self-supervised Learning

09/26/2022
by Xiangyu Zhang, et al.

Lyrics recognition is an important task in music processing. Although traditional approaches such as the hybrid HMM-TDNN model achieve good performance, studies applying end-to-end models and self-supervised learning (SSL) to this task are limited. In this paper, we first establish an end-to-end baseline for lyrics recognition and then explore the performance of SSL models on the lyrics recognition task. We evaluate a variety of upstream SSL models trained with different methods (masked reconstruction, masked prediction, autoregressive reconstruction, and contrastive learning). Our end-to-end self-supervised models, evaluated on the DAMP music dataset, outperform the previous state-of-the-art (SOTA) system by 5.23 set even without a language model trained on a large corpus. Moreover, we investigate the effect of background music on the performance of self-supervised learning models and conclude that the SSL models cannot extract features efficiently in the presence of background music. Finally, we study the out-of-domain generalization ability of the SSL features, given that those models were not trained on music datasets.

