Exponential Moving Average Model in Parallel Speech Recognition Training

03/03/2017
by   Xu Tian, et al.
0

As training data rapid growth, large-scale parallel training with multi-GPUs cluster is widely applied in the neural network model learning currently.We present a new approach that applies exponential moving average method in large-scale parallel training of neural network model. It is a non-interference strategy that the exponential moving average model is not broadcasted to distributed workers to update their local models after model synchronization in the training process, and it is implemented as the final model of the training system. Fully-connected feed-forward neural networks (DNNs) and deep unidirectional Long short-term memory (LSTM) recurrent neural networks (RNNs) are successfully trained with proposed method for large vocabulary continuous speech recognition on Shenma voice search data in Mandarin. The character error rate (CER) of Mandarin speech recognition further degrades than state-of-the-art approaches of parallel training.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/21/2017

Deep LSTM for Large Vocabulary Continuous Speech Recognition

Recurrent neural networks (RNNs), especially long short-term memory (LST...
research
03/17/2017

Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling

Deep learning models (DLMs) are state-of-the-art techniques in speech re...
research
05/04/2021

Performance Evaluation of Deep Convolutional Maxout Neural Network in Speech Recognition

In this paper, various structures and methods of Deep Artificial Neural ...
research
06/09/2021

A Comparative Study on Neural Architectures and Training Methods for Japanese Speech Recognition

End-to-end (E2E) modeling is advantageous for automatic speech recogniti...
research
07/12/2018

A Comparison of Adaptation Techniques and Recurrent Neural Network Architectures

Recently, recurrent neural networks have become state-of-the-art in acou...
research
10/04/2019

Distributed Learning of Deep Neural Networks using Independent Subnet Training

Stochastic gradient descent (SGD) is the method of choice for distribute...
research
08/23/2023

Stabilizing RNN Gradients through Pre-training

Numerous theories of learning suggest to prevent the gradient variance f...

Please sign up or login with your details

Forgot password? Click here to reset