Towards Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription

07/20/2022
by Longshen Ou, et al.

Automatic speech recognition (ASR) has progressed significantly in recent years, driven by large-scale datasets and self-supervised learning (SSL) methods. However, its counterpart in the singing domain, automatic lyric transcription (ALT), suffers from limited data and the degraded intelligibility of sung lyrics, and has therefore developed more slowly. To close the performance gap between ALT and ASR, we exploit the similarities between speech and singing. In this work, we propose a transfer-learning-based ALT solution that adapts wav2vec 2.0, an SSL ASR model, to the singing domain. We maximize the effectiveness of transfer learning by exploring the influence of different transfer starting points. We further improve performance by extending the original CTC model to a hybrid CTC/attention model. Our method surpasses previous approaches by a large margin on various ALT benchmark datasets. Further experiments show that, even with a tiny proportion of the training data, our method still achieves competitive performance.
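
The abstract does not say how the CTC and attention branches are combined during training. A common formulation for hybrid CTC/attention models interpolates the two losses with a single weight, and the minimal sketch below illustrates that idea only. The function name `hybrid_loss`, the parameter `ctc_weight`, and its default of 0.3 are assumptions for illustration, not values taken from the paper.

```python
def hybrid_loss(ctc_loss: float, attention_loss: float, ctc_weight: float = 0.3) -> float:
    """Interpolate a CTC loss and an attention-decoder loss.

    ctc_weight is a hypothetical hyperparameter in [0, 1]:
    1.0 trains with CTC only, 0.0 with the attention decoder only.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    # Weighted sum of the two per-batch loss values.
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


# Example: equal weighting of the two branches.
combined = hybrid_loss(ctc_loss=2.0, attention_loss=4.0, ctc_weight=0.5)
```

In a real training loop the two arguments would be the scalar losses produced by the CTC head and the attention decoder on the same batch; plain floats are used here to keep the sketch self-contained.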


