Computational Pronunciation Analysis in Sung Utterances

by Emir Demirel, et al.

Recent automatic lyrics transcription (ALT) approaches focus on building stronger acoustic models or in-domain language models, while pronunciation is seldom addressed. This paper presents a novel computational analysis of pronunciation variation in sung utterances and proposes a new pronunciation model adapted for singing. The singing-adapted model is tested on multiple public datasets via word recognition experiments. It outperforms the standard speech dictionary in all settings, reporting the best results on ALT in a cappella recordings using n-gram language models. For reproducibility, we share the sentence-level annotations used in testing, providing a new benchmark evaluation set for ALT.

