LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

06/29/2023
by   Le Zhuo, et al.
0

We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal. Our novel, training-free approach utilizes Whisper, a weakly supervised robust speech recognition model, and GPT-4, today's most performant chat-based large language model. In the proposed method, Whisper functions as the "ear" by transcribing the audio, while GPT-4 serves as the "brain," acting as an annotator with a strong performance for contextualized output selection and correction. Our experiments show that LyricWhiz significantly reduces Word Error Rate compared to existing methods in English and can effectively transcribe lyrics across multiple languages. Furthermore, we use LyricWhiz to create the first publicly available, large-scale, multilingual lyrics transcription dataset with a CC-BY-NC-SA copyright license, based on MTG-Jamendo, and offer a human-annotated subset for noise level estimation and evaluation. We anticipate that our proposed method and dataset will advance the development of multilingual lyrics transcription, a challenging and emerging task.

READ FULL TEXT
research
06/05/2023

N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition

Whisper, the recently developed multilingual weakly supervised model, is...
research
01/30/2023

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining

While neural text-to-speech (TTS) has achieved human-like natural synthe...
research
11/03/2022

Crosslingual Generalization through Multitask Finetuning

Multitask prompted finetuning (MTF) has been shown to help large languag...
research
11/04/2022

A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability

In this paper, we introduce our work of building a Streaming Multilingua...
research
07/28/2021

Goal-Oriented Script Construction

The knowledge of scripts, common chains of events in stereotypical scena...
research
05/02/2023

MultiLegalSBD: A Multilingual Legal Sentence Boundary Detection Dataset

Sentence Boundary Detection (SBD) is one of the foundational building bl...
research
11/28/2019

GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors

The lack of large-scale datasets has been a major hindrance to the devel...

Please sign up or login with your details

Forgot password? Click here to reset