A higher order Minkowski loss for improved prediction ability of acoustic model in ASR

12/02/2021
by   Vishwanath Pratap Singh, et al.
0

Conventional automatic speech recognition (ASR) system uses second-order minkowski loss during inference time which is suboptimal as it incorporates only first order statistics in posterior estimation [2]. In this paper we have proposed higher order minkowski loss (4th Order and 6th Order) during inference time, without any changes during training time. The main contribution of the paper is to show that higher order loss uses higher order statistics in posterior estimation, which improves the prediction ability of acoustic model in ASR system. We have shown mathematically that posterior probability obtained due to higher order loss is function of second order posterior and thus the method can be incorporated in standard ASR system in an easy manner. It is to be noted that all changes are proposed during test(inference) time, we do not make any change in any training pipeline. Multiple baseline systems namely, TDNN1, TDNN2, DNN and LSTM are developed to verify the improvement incurred due to higher order minkowski loss. All experiments are conducted on LibriSpeech dataset and performance metrics are word error rate (WER) on "dev-clean", "test-clean", "dev-other" and "test-other" datasets.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/02/2021

A Mixture of Expert Based Deep Neural Network for Improved ASR

This paper presents a novel deep learning architecture for acoustic mode...
research
02/05/2021

Intermediate Loss Regularization for CTC-based Speech Recognition

We present a simple and efficient auxiliary loss function for automatic ...
research
10/07/2021

Learning Higher-Order Dynamics in Video-Based Cardiac Measurement

Computer vision methods typically optimize for first-order dynamics (e.g...
research
10/12/2021

Improving Character Error Rate Is Not Equal to Having Clean Speech: Speech Enhancement for ASR Systems with Black-box Acoustic Models

A deep neural network (DNN)-based speech enhancement (SE) aiming to maxi...
research
04/09/2021

Feature Replacement and Combination for Hybrid ASR Systems

Acoustic modeling of raw waveform and learning feature extractors as par...
research
11/29/2017

Stream Attention for far-field multi-microphone ASR

A stream attention framework has been applied to the posterior probabili...
research
08/16/2018

Automatic Chord Recognition with Higher-Order Harmonic Language Modelling

Common temporal models for automatic chord recognition model chord chang...

Please sign up or login with your details

Forgot password? Click here to reset