Audio to score matching by combining phonetic and duration information

07/12/2017
by Rong Gong, et al.

We approach the problem of matching a sung phrase's audio to its score by using phonetic and duration information, focusing on jingju a cappella singing. We argue that, because each mode in jingju music has a basic melodic contour, using melodic information alone (such as the pitch contour) results in ambiguous matching. We therefore propose a matching approach based on phonetic and duration information. Phonetic information is extracted with an acoustic model trained on our data, and duration information is incorporated through the hidden Markov model (HMM) variants we investigate. We build a model for each lyric path in our scores and perform the matching by ranking the posterior probabilities of the decoded most likely state sequences. We investigate three acoustic models: (i) convolutional neural networks (CNNs), (ii) deep neural networks (DNNs), and (iii) Gaussian mixture models (GMMs). We also compare two duration models: (i) a hidden semi-Markov model (HSMM) and (ii) a post-processor duration model. Results show that CNNs perform best on our (small) audio dataset and that the HSMM outperforms the post-processor duration model.
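The ranking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the acoustic model has already produced a frame-by-phoneme matrix of log observation scores (`log_obs`, a hypothetical input), represents each candidate lyric path as a list of phoneme indices, and decodes a plain left-to-right HMM with fixed transition probabilities. The explicit duration modeling (HSMM or post-processor) is omitted, and the Viterbi best-path log score stands in for the posterior probability that the paper ranks. The names `viterbi_log_score` and `rank_candidates` are illustrative only.

```python
import numpy as np


def viterbi_log_score(log_obs, phone_ids, log_self=np.log(0.5), log_next=np.log(0.5)):
    """Best-path log score of a left-to-right HMM over the given phoneme sequence.

    log_obs   : (T, P) array, log observation score of each phoneme class per frame
                (assumed to come from some acoustic model).
    phone_ids : phoneme indices of one candidate lyric path, in order.
    log_self, log_next : assumed fixed log transition probabilities
                (stay in the state / advance to the next state).
    """
    emit = log_obs[:, list(phone_ids)]      # (T, S) emissions for this path's states
    n_frames, n_states = emit.shape
    delta = np.full(n_states, -np.inf)
    delta[0] = emit[0, 0]                   # the path must start in the first state
    for t in range(1, n_frames):
        stay = delta + log_self
        move = np.full(n_states, -np.inf)
        move[1:] = delta[:-1] + log_next    # advance by exactly one state
        delta = np.maximum(stay, move) + emit[t]
    return delta[-1]                        # the path must end in the last state


def rank_candidates(log_obs, candidates):
    """Return candidate names sorted from best to worst decoded score."""
    scores = {name: viterbi_log_score(log_obs, ids) for name, ids in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy data: 9 frames, 3 phoneme classes; frames 0-2 favor class 0,
    # frames 3-5 favor class 1, frames 6-8 favor class 2.
    probs = np.full((9, 3), 0.1)
    for t in range(9):
        probs[t, t // 3] = 0.8
    toy_log_obs = np.log(probs)
    toy_candidates = {"phrase_a": [0, 1, 2], "phrase_b": [2, 1, 0]}
    print(rank_candidates(toy_log_obs, toy_candidates))
```

With the toy data, the candidate whose phoneme order matches the frames should rank first. Replacing the fixed self-transition with an explicit per-state duration distribution would turn this into a semi-Markov decoder, which is the kind of variant the abstract compares.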

