Low Resource Audio-to-Lyrics Alignment From Polyphonic Music Recordings

by Emir Demirel, et al.

Lyrics alignment for long music recordings can be memory-intensive when performed in a single pass. In this study, we present a novel method that performs audio-to-lyrics alignment with a low memory footprint regardless of the duration of the music recording. The proposed system first spots anchor words within the audio signal. The recording is then segmented at these anchors, and a second-pass alignment is performed to obtain the word timings. We show that our audio-to-lyrics alignment system performs competitively with the state of the art while requiring far fewer computational resources. In addition, we use our lyrics alignment system to segment music recordings into sentence-level chunks, and we report lyrics transcription scores on a number of benchmark test sets using these segmented recordings. Finally, our experiments highlight the importance of the source separation step for good performance on the transcription and alignment tasks. For reproducibility, we publicly share our code with the research community.
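The abstract's two-pass idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`segment_by_anchors`, `align_lyrics`) and the pluggable `align_segment` callback are hypothetical, and the sketch assumes anchor timestamps from the first pass are already available. The point it shows is that the second pass only ever aligns one short segment at a time, so memory use scales with segment length rather than total recording duration.

```python
def segment_by_anchors(duration, anchors):
    """Split the interval [0, duration) into segments bounded by the
    anchor-word timestamps found in the first pass."""
    bounds = [0.0] + sorted(anchors) + [float(duration)]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]

def align_lyrics(duration, anchors, lyric_chunks, align_segment):
    """Second pass: align each lyric chunk within its own short segment.
    `align_segment(start, end, words)` is a placeholder for any aligner
    that returns (word, timestamp) pairs for one segment."""
    word_timings = []
    segments = segment_by_anchors(duration, anchors)
    for (start, end), words in zip(segments, lyric_chunks):
        word_timings.extend(align_segment(start, end, words))
    return word_timings
```

For instance, a 10-second recording with one anchor at 4 s yields the segments `(0.0, 4.0)` and `(4.0, 10.0)`, each aligned independently before the word timings are concatenated.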






