Power-Law Nonlinearity with Maximally Uniform Distribution Criterion for Improved Neural Network Training in Automatic Speech Recognition

12/22/2019
by   Chanwoo Kim, et al.

In this paper, we describe the Maximum Uniformity of Distribution (MUD) algorithm with the power-law nonlinearity. We hypothesize that neural network training becomes more stable if the feature distribution is not heavily skewed. We propose two types of MUD approaches: power function-based MUD and histogram-based MUD. In both approaches, we first obtain the mel filterbank coefficients and then apply a nonlinearity function to each filterbank channel. With power function-based MUD, we apply a power function-based nonlinearity whose coefficients are chosen to maximize the likelihood under the assumption that the nonlinearity outputs follow a uniform distribution. With histogram-based MUD, the empirical Cumulative Distribution Function (CDF) from the training database is used to transform the original distribution into a uniform distribution. MUD processing does not use any prior knowledge (e.g. a logarithmic relation) about the relationship between the energy of the incoming signal and the intensity perceived by a human. Experimental results using an end-to-end speech recognition system demonstrate that power function-based MUD gives better results than the conventional Mel Frequency Cepstral Coefficients (MFCCs). On the LibriSpeech database, we could achieve 4.02 on test-clean and 13.34 (LMs). The major contribution of this work is a new algorithm for designing the compressive nonlinearity in a data-driven way, which is much more flexible than previous approaches and may be extended to other domains as well.
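The two MUD variants described above can be illustrated for a single filterbank channel with the minimal sketch below. It is not the authors' implementation. The functional form `y = (x / peak) ** a`, the closed-form maximum-likelihood estimate derived from it, and all function names (`fit_power_coefficient`, `power_mud`, `fit_histogram_mud`) are assumptions made for illustration; the abstract does not specify them. In practice one coefficient (or one empirical CDF) would be fitted per channel.

```python
import bisect
import math
import random


def fit_power_coefficient(samples, peak=None):
    """Estimate a in y = (x / peak) ** a, assuming y ~ Uniform(0, 1).

    Assumed model (not taken from the paper): by change of variables the
    input density is a * x ** (a - 1) / peak ** a, so the log-likelihood is
    N * log(a) + (a - 1) * sum(log(x / peak)) - N * log(peak).
    Setting its derivative with respect to a to zero gives
    a = -N / sum(log(x / peak)).
    """
    positive = [x for x in samples if x > 0.0]
    if not positive:
        raise ValueError("need at least one positive sample")
    if peak is None:
        peak = max(positive)
    log_ratio_sum = sum(math.log(x / peak) for x in positive)
    if log_ratio_sum >= 0.0:
        raise ValueError("samples must lie below the peak on average")
    return -len(positive) / log_ratio_sum


def power_mud(x, a, peak):
    """Apply the assumed power nonlinearity, clipping x / peak to [0, 1]."""
    ratio = min(max(x / peak, 0.0), 1.0)
    return ratio ** a


def fit_histogram_mud(samples):
    """Return a function that maps x to its empirical CDF value."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("need at least one sample")
    count = len(ordered)

    def transform(x):
        # Fraction of training samples that are <= x.
        return bisect.bisect_right(ordered, x) / count

    return transform


if __name__ == "__main__":
    # Synthetic, heavily skewed "channel energies" (hypothetical data).
    random.seed(0)
    energies = [(1.0 - random.random()) ** 10.0 for _ in range(10000)]

    a_hat = fit_power_coefficient(energies)
    peak = max(energies)
    ecdf = fit_histogram_mud(energies)

    power_out = [power_mud(x, a_hat, peak) for x in energies]
    hist_out = [ecdf(x) for x in energies]
    print("estimated power coefficient:", a_hat)
    print("mean after power MUD:", sum(power_out) / len(power_out))
    print("mean after histogram MUD:", sum(hist_out) / len(hist_out))
```

Both transforms map the skewed inputs into the interval [0, 1]. The power-function variant needs only one parameter per channel, while the histogram variant stores the sorted training values (or a quantized version of them) and follows the training distribution exactly.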

research
02/15/2020

Small energy masking for improved neural network training for end-to-end speech recognition

In this paper, we present a Small Energy Masking (SEM) algorithm, which ...
research
04/18/2019

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

We present SpecAugment, a simple data augmentation method for speech rec...
research
11/05/2018

Manner of Articulation Detection using Connectionist Temporal Classification to Improve Automatic Speech Recognition Performance

Conventionally, the manner of articulations in speech signal are derived...
research
05/21/2020

End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming

Despite successful applications of end-to-end approaches in multi-channe...
research
01/16/2023

BayesSpeech: A Bayesian Transformer Network for Automatic Speech Recognition

Recent developments using End-to-End Deep Learning models have been show...
research
02/25/2016

Adaptive Frequency Cepstral Coefficients for Word Mispronunciation Detection

Systems based on automatic speech recognition (ASR) technology can provi...
research
09/10/2018

An Optimization-Based Generative Model of Power Laws Using a New Information Theory Based Metric

In this paper, we propose an optimization-based mechanism to explain pow...
