Distributed Deep Learning Strategies For Automatic Speech Recognition

04/10/2019
by Wei Zhang, et al.

In this paper, we propose and investigate a variety of distributed deep learning strategies for automatic speech recognition (ASR) and evaluate them with a state-of-the-art long short-term memory (LSTM) acoustic model on the 2000-hour Switchboard (SWB2000) dataset, one of the most widely used benchmarks for ASR performance. We first investigate which hyper-parameters (e.g., learning rate) enable training with a sufficiently large batch size without impairing model accuracy. We then implement several distributed strategies, including Synchronous (SYNC), Asynchronous Decentralized Parallel SGD (ADPSGD), and a hybrid of the two (HYBRID), to study their runtime/accuracy trade-off. We show that we can train the LSTM model using ADPSGD in 14 hours with 16 NVIDIA P100 GPUs to reach a 7.6% WER on the Hub5-2000 Switchboard (SWB) test set and a 13.1% WER on the CallHome (CH) test set. Furthermore, we can train the model using HYBRID in 11.5 hours with 32 NVIDIA V100 GPUs without loss in accuracy.
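As a rough illustration of the difference between synchronous and decentralized averaging, the following toy sketch may help. It is not the authors' implementation: it uses a scalar "model" per learner, a quadratic loss, a fixed ring topology, and lock-step rounds, all of which are assumptions made for illustration. In particular, it does not model the asynchrony of ADPSGD, and the names `sync_step`, `ring_step`, and `run` are hypothetical.

```python
def grad(w, target):
    # Gradient of the toy loss 0.5 * (w - target)**2 for one learner.
    return w - target


def sync_step(weights, targets, lr):
    # Synchronous style: gradients are averaged across all learners
    # (as an all-reduce would do), so every learner applies the same update.
    g = sum(grad(w, t) for w, t in zip(weights, targets)) / len(weights)
    return [w - lr * g for w in weights]


def ring_step(weights, targets, lr):
    # Decentralized style: each learner averages its weights with its two
    # ring neighbours, then applies only its own local gradient.
    n = len(weights)
    mixed = [
        (weights[i - 1] + weights[i] + weights[(i + 1) % n]) / 3
        for i in range(n)
    ]
    return [m - lr * grad(w, t) for m, w, t in zip(mixed, weights, targets)]


def run(step, steps=200, lr=0.1):
    # Four learners, each holding a different data "shard" (its own target).
    targets = [1.0, 2.0, 3.0, 4.0]
    weights = [0.0] * len(targets)
    for _ in range(steps):
        weights = step(weights, targets, lr)
    return weights


if __name__ == "__main__":
    print("sync:", run(sync_step))
    print("ring:", run(ring_step))
```

In this toy setting the synchronous learners stay identical at every step, while the ring learners keep slightly different weights whose average still tracks the same optimum. Each ring learner only exchanges data with two neighbours per round rather than taking part in a global reduction.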

research
07/10/2019

A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition

Modern Automatic Speech Recognition (ASR) systems rely on distributed de...
research
10/21/2021

Asynchronous Decentralized Distributed Training of Acoustic Models

Large-scale distributed training of deep acoustic models plays an import...
research
05/25/2022

Heterogeneous Reservoir Computing Models for Persian Speech Recognition

Over the last decade, deep-learning methods have been gradually incorpor...
research
02/04/2020

Improving Efficiency in Large-Scale Decentralized Distributed Training

Decentralized Parallel SGD (D-PSGD) and its asynchronous variant Asynchr...
research
03/25/2022

Impact of Dataset on Acoustic Models for Automatic Speech Recognition

In Automatic Speech Recognition, GMM-HMM had been widely used for acoust...
research
11/29/2022

Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation

The neural transducer is an end-to-end model for automatic speech recogn...
research
06/05/2018

LSTM Benchmarks for Deep Learning Frameworks

This study provides benchmarks for different implementations of LSTM uni...
