Neural Architecture Search for Speech Recognition

by Shoukang Hu, et al.

Deep neural network (DNN) based automatic speech recognition (ASR) systems are often designed using expert knowledge and empirical evaluation. In this paper, a range of neural architecture search (NAS) techniques are used to automatically learn two hyper-parameters that heavily affect the performance and model complexity of state-of-the-art factored time delay neural network (TDNN-F) acoustic models: i) the left and right splicing context offsets; and ii) the dimensionality of the bottleneck linear projection at each hidden layer. These techniques include the standard DARTS method, which fully integrates the estimation of architecture weights and TDNN parameters in lattice-free MMI (LF-MMI) training; Gumbel-Softmax DARTS, which reduces the confusion between candidate architectures; Pipelined DARTS, which circumvents the overfitting of architecture weights using held-out data; and Penalized DARTS, which further incorporates resource constraints to adjust the trade-off between performance and system complexity. Parameter sharing among candidate architectures was also used to facilitate efficient search over up to 7^28 different TDNN systems. Experiments conducted on a 300-hour Switchboard conversational telephone speech recognition task suggest the NAS auto-configured TDNN-F systems consistently outperform the baseline LF-MMI trained TDNN-F systems using manual expert configurations. Absolute word error rate reductions of up to 1.0% and a relative model size reduction of 28% were obtained.
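The DARTS variants named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the candidate bottleneck dimensions, the per-candidate costs, the penalty weight `eta` and all function names are hypothetical, chosen only to show how softmax architecture weights, Gumbel-Softmax sampling and a resource penalty fit together.

```python
import math
import random

# Hypothetical candidate bottleneck dimensions for one hidden layer
# (illustrative values, not taken from the paper).
CANDIDATE_DIMS = [25, 50, 80, 100, 120, 160, 200]


def softmax(logits, temperature=1.0):
    """Standard DARTS: turn architecture logits into mixture weights."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, temperature, rng):
    """Gumbel-Softmax: add Gumbel(0, 1) noise, then apply a tempered softmax.

    Lower temperatures push the weights toward a one-hot choice, which is
    how this relaxation sharpens the distinction between candidates.
    """
    noisy = []
    for x in logits:
        u = rng.uniform(1e-12, 1.0 - 1e-12)  # keep u strictly inside (0, 1)
        noisy.append(x - math.log(-math.log(u)))
    return softmax(noisy, temperature)


def expected_cost(weights, costs):
    """Expected resource cost of a layer under the architecture weights."""
    return sum(w * c for w, c in zip(weights, costs))


def penalized_loss(task_loss, weights, costs, eta):
    """Penalized DARTS (sketch): task loss plus a weighted resource term.

    `eta` is a hypothetical trade-off coefficient; larger values favour
    smaller candidates.
    """
    return task_loss + eta * expected_cost(weights, costs)


def search_space_size(choices_per_decision, num_decisions):
    """Number of distinct systems if every decision is made independently."""
    return choices_per_decision ** num_decisions


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0] * len(CANDIDATE_DIMS)  # uniform starting point
    weights = gumbel_softmax(logits, temperature=0.5, rng=rng)
    # Assumption: cost is proportional to the bottleneck dimension.
    loss = penalized_loss(1.0, weights, CANDIDATE_DIMS, eta=0.001)
    print(weights, loss)
    print(search_space_size(7, 28))
```

As an illustration of the search-space figure, seven candidates at each of 28 independent decisions give 7^28 systems; enumerating them is infeasible, which is why the abstract relies on parameter sharing among candidates.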




