A Distributed Optimisation Framework Combining Natural Gradient with Hessian-Free for Discriminative Sequence Training

03/12/2021
by Adnan Haider, et al.

This paper presents a novel natural gradient and Hessian-free (NGHF) optimisation framework for neural network training that can operate efficiently in a distributed manner. It relies on the linear conjugate gradient (CG) algorithm to combine the natural gradient (NG) method with local curvature information from Hessian-free (HF) or other second-order methods. A solution to a numerical issue in CG allows effective parameter updates to be generated with far fewer CG iterations than usually used (e.g. 5-8 instead of 200). This work also presents a novel preconditioning approach to improve the progress made by individual CG iterations for models with shared parameters. Although applicable to other training losses and model structures, NGHF is investigated in this paper for lattice-based discriminative sequence training of hybrid hidden Markov model acoustic models, using standard recurrent neural network, long short-term memory, and time delay neural network models for output probability calculation. Automatic speech recognition experiments are reported on the multi-genre broadcast data set for a range of different acoustic model types. These experiments show that NGHF achieves larger word error rate reductions than standard stochastic gradient descent or Adam, while requiring orders of magnitude fewer parameter updates.
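
To make the core recipe concrete, the sketch below shows how truncated linear CG can solve A d = g for an update direction d, where g is the gradient (or an NG direction) and the product A v supplies local curvature (e.g. a Gauss-Newton or Fisher matrix-vector product, as in HF methods). This is a minimal illustration of the general technique under stated assumptions, not the authors' implementation: the names truncated_cg, curvature_vp and precond are hypothetical, and the small iteration cap simply mirrors the 5-8 CG iterations mentioned in the abstract.

```python
import numpy as np

def truncated_cg(curvature_vp, grad, max_iters=8, tol=1e-10, precond=None):
    """Approximately solve A d = grad with (preconditioned) linear CG.

    curvature_vp: function v -> A @ v for a positive semi-definite
        curvature matrix A (e.g. a Gauss-Newton or Fisher product,
        as used in Hessian-free optimisation).
    grad: gradient vector; for natural gradient this would be a
        Fisher-based direction.
    max_iters: deliberately small, matching the truncated-CG regime
        the abstract describes (a handful of iterations, not hundreds).
    precond: optional function r -> M^{-1} r (illustrative stand-in
        for the paper's preconditioning; any SPD preconditioner works).
    """
    d = np.zeros_like(grad)           # current solution estimate
    r = grad.copy()                   # residual r = grad - A d (d = 0)
    z = precond(r) if precond else r  # preconditioned residual
    p = z.copy()                      # initial search direction
    rz = r @ z
    for _ in range(max_iters):
        Ap = curvature_vp(p)
        alpha = rz / (p @ Ap)         # step length along p
        d += alpha * p
        r -= alpha * Ap
        if r @ r < tol:               # residual small enough: stop early
            break
        z = precond(r) if precond else r
        rz_new = r @ z
        p = z + (rz_new / rz) * p     # conjugate direction update
        rz = rz_new
    return d

# Toy usage on a synthetic SPD matrix standing in for a Fisher/GN matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((10, 10))
A = M @ M.T + 10.0 * np.eye(10)
g = rng.standard_normal(10)
direction = truncated_cg(lambda v: A @ v, g, max_iters=8)
```

In practice the matrix A is never formed explicitly; only the matrix-vector product is needed, which is what lets HF-style methods scale to neural network parameter counts.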

Related research

10/03/2018 · Combining Natural Gradient with Hessian Free Methods for Sequence Training
This paper presents a new optimisation approach to train Deep Neural Net...

04/06/2018 · Sequence Training of DNN Acoustic Models With Natural Gradient
Deep Neural Network (DNN) acoustic models often use discriminative seque...

07/10/2019 · Acoustic Model Optimization Based On Evolutionary Stochastic Gradient Descent with Anchors for Automatic Speech Recognition
Evolutionary stochastic gradient descent (ESGD) was proposed as a popula...

11/20/2020 · Improving RNN-T ASR Accuracy Using Untranscribed Context Audio
We present a new training scheme for streaming automatic speech recognit...

01/16/2013 · Training Neural Networks with Stochastic Hessian-Free Optimization
Hessian-free (HF) optimization has been successfully used for training d...

06/17/2019 · Adversarial Training for Multilingual Acoustic Modeling
Multilingual training has been shown to improve acoustic modeling perfor...

10/21/2021 · Asynchronous Decentralized Distributed Training of Acoustic Models
Large-scale distributed training of deep acoustic models plays an import...
