Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model

08/24/2022
by Yixiao Zhang, et al.

Lyric interpretations can help people understand songs and their lyrics quickly, and can also make it easier to manage, retrieve and discover songs efficiently from the growing mass of music archives. In this paper we propose BART-fusion, a novel model for generating lyric interpretations from lyrics and music audio that combines a large-scale pre-trained language model with an audio encoder. We employ a cross-modal attention module to incorporate the audio representation into the lyrics representation to help the pre-trained language model understand the song from an audio perspective, while preserving the language model's original generative performance. We also release the Song Interpretation Dataset, a new large-scale dataset for training and evaluating our model. Experimental results show that the additional audio information helps our model to understand words and music better, and to generate precise and fluent interpretations. An additional experiment on cross-modal music retrieval shows that interpretations generated by BART-fusion can also help people retrieve music more accurately than with the original BART.
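The abstract names a cross-modal attention module that injects audio information into the lyrics representation without spelling out the computation. The block below is a minimal pure-Python sketch of generic single-head cross-attention under stated assumptions. Lyric token states act as queries, and audio frame embeddings act as both keys and values. Learned projections and multiple heads are omitted. A residual connection adds the attended audio context back onto each lyric state, which is one plausible way to augment the language model's representation rather than replace it. The function name `cross_modal_attention` is hypothetical, and this is not the authors' BART-fusion implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(lyrics: List[Vector], audio: List[Vector]) -> List[Vector]:
    """Fuse audio frames into lyric token states (hypothetical simplification).

    Assumptions: single head, no learned Q/K/V projections, and lyrics and
    audio already share one embedding dimension. Each lyric state is a query;
    audio frames serve as keys and values. The attended audio context is added
    back through a residual connection, so the original lyric state is kept.
    """
    dim = len(lyrics[0])
    if any(len(v) != dim for v in lyrics + audio):
        raise ValueError("lyrics and audio vectors must share one dimension")
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in lyrics:
        # Scaled dot-product score between this lyric token and each audio frame.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in audio]
        weights = softmax(scores)
        # Attention-weighted sum of audio frames: the audio context for this token.
        context = [sum(w * frame[d] for w, frame in zip(weights, audio)) for d in range(dim)]
        # Residual connection: original lyric state plus its audio context.
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused


# Usage: two lyric tokens attending to a single audio frame.
print(cross_modal_attention([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]))
```

With only one audio frame, every lyric token gives it full attention weight, so each token simply gains that frame's embedding. With several frames, tokens whose states align more closely with a frame draw more of their context from it.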

research
12/14/2020

Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval

The goal of audio captioning is to translate input audio into its descri...
research
11/24/2017

Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval

Little research focuses on cross-modal correlation learning where tempor...
research
09/21/2023

Self-Supervised Contrastive Learning for Robust Audio-Sheet Music Retrieval Systems

Linking sheet music images to audio recordings remains a key problem for...
research
04/30/2021

Cross-Modal Music-Video Recommendation: A Study of Design Choices

In this work, we study music/video cross-modal recommendation, i.e. reco...
research
12/21/2022

RECAP: Retrieval Augmented Music Captioner

With the prevalence of stream media platforms serving music search and r...
research
05/23/2023

When the Music Stops: Tip-of-the-Tongue Retrieval for Music

We present a study of Tip-of-the-tongue (ToT) retrieval for music, where...
research
06/26/2019

Learning Soft-Attention Models for Tempo-invariant Audio-Sheet Music Retrieval

Connecting large libraries of digitized audio recordings to their corres...
