Calibration of Phone Likelihoods in Automatic Speech Recognition

06/14/2016
by David A. van Leeuwen, et al.

In this paper we study the probabilistic properties of the posteriors in a speech recognition system that uses a deep neural network (DNN) for acoustic modeling. We do this by reducing Kaldi's DNN shared pdf-id posteriors to phone likelihoods, and by using test-set forced alignments to evaluate them with a calibration-sensitive metric. Individual frame posteriors are in principle well calibrated, because the DNN is trained with cross entropy as the objective function, which is a proper scoring rule. When entire phones are assessed, we observe that it is best to average the log likelihoods over the duration of the phone. Further scaling of the average log likelihoods by the logarithm of the duration slightly improves the calibration, and this improvement is retained when tested on independent test data.
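The phone-level scoring described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names are hypothetical, the phone prior is assumed flat, and the duration-dependent scale `1 + beta * log(T)` is only one possible reading of "scaling by the logarithm of the duration"; the abstract does not give the exact form. Mean cross entropy stands in here for the calibration-sensitive metric, which the abstract does not name.

```python
import math


def average_log_likelihoods(frame_loglikes):
    """Average per-phone log likelihoods over the frames of one phone segment.

    frame_loglikes: list of frames, each a list with one log likelihood per phone.
    """
    n_frames = len(frame_loglikes)
    n_phones = len(frame_loglikes[0])
    return [
        sum(frame[p] for frame in frame_loglikes) / n_frames
        for p in range(n_phones)
    ]


def duration_scaled(avg_loglikes, n_frames, beta=0.0):
    """Scale averaged log likelihoods by a factor that depends on log duration.

    ASSUMPTION: the factor (1 + beta * log(n_frames)) is illustrative only;
    beta is a hypothetical parameter that would be fitted on held-out data.
    """
    scale = 1.0 + beta * math.log(n_frames)
    return [scale * x for x in avg_loglikes]


def log_softmax(scores):
    """Turn scores into log posteriors, assuming a flat prior over phones."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def cross_entropy(segments, labels, beta=0.0):
    """Mean negative log posterior of the aligned phone over all segments.

    Cross entropy is a proper scoring rule, so it is sensitive to calibration.
    """
    total = 0.0
    for frames, label in zip(segments, labels):
        avg = average_log_likelihoods(frames)
        scores = duration_scaled(avg, len(frames), beta)
        total -= log_softmax(scores)[label]
    return total / len(segments)
```

In such a setup, `segments` would come from forced alignments of the test set and `labels` from the aligned phone identities; comparing `cross_entropy` for different values of `beta` on independent data would show whether the duration scaling helps.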

