Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks

by Shoukang Hu, et al.

State-of-the-art automatic speech recognition (ASR) system development is data and computation intensive. The optimal design of deep neural networks (DNNs) for these systems often requires expert knowledge and empirical evaluation. In this paper, a range of neural architecture search (NAS) techniques are used to automatically learn two types of hyper-parameters of factored time delay neural networks (TDNN-Fs): i) the left and right splicing context offsets; and ii) the dimensionality of the bottleneck linear projection at each hidden layer. These techniques include the differentiable neural architecture search (DARTS) method, which integrates architecture learning with lattice-free MMI (LF-MMI) training; Gumbel-Softmax and pipelined DARTS methods, which reduce the confusion over candidate architectures and improve the generalization of architecture selection; and Penalized DARTS, which incorporates resource constraints to balance the trade-off between performance and system complexity. Parameter sharing among TDNN-F architectures allows an efficient search over up to 7^28 different systems. Statistically significant word error rate (WER) reductions of up to 1.2 were obtained over a state-of-the-art baseline LF-MMI TDNN-F system trained on the 300-hour Switchboard corpus and featuring speed perturbation, i-Vector and learning hidden unit contribution (LHUC) based speaker adaptation, as well as RNNLM rescoring. Performance contrasts on the same task against recent end-to-end systems reported in the literature suggest the best NAS auto-configured system achieves state-of-the-art WERs of 9.9 [...] sets respectively with up to 96 [...] Bayesian learning shows that ...
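The search ideas named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the candidate bottleneck dimensions, the cost vector, the penalty weight `eta`, and all function names are hypothetical, and the split of the search space into 28 choices with 7 candidates each is an assumption made only to match the 7^28 figure. The sketch shows a softmax relaxation over candidates (DARTS style), a Gumbel-Softmax sample that sharpens the choice, and a resource penalty added to the task loss (in the spirit of Penalized DARTS).

```python
import numpy as np

# Hypothetical candidate bottleneck dimensions for one layer (assumption).
CANDIDATE_DIMS = [25, 50, 80, 100, 120, 160, 200]


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.asarray(x, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()


def gumbel_softmax(logits, temperature, rng):
    """Draw a relaxed one-hot sample; lower temperature gives sharper samples."""
    logits = np.asarray(logits, dtype=float)
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    gumbel_noise = -np.log(-np.log(u))
    return softmax((logits + gumbel_noise) / temperature)


def penalized_loss(task_loss, alpha, costs, eta):
    """Task loss plus eta times the expected cost under softmax(alpha).

    `costs` is a per-candidate resource cost (for example a parameter
    count); this form of the penalty is an illustrative assumption.
    """
    expected_cost = float(softmax(alpha) @ np.asarray(costs, dtype=float))
    return task_loss + eta * expected_cost


def select_architecture(alpha, candidates):
    """After the search, keep the candidate with the largest weight."""
    return candidates[int(np.argmax(alpha))]


def search_space_size(num_choices, num_candidates):
    """Number of distinct systems if every choice is made independently."""
    return num_candidates ** num_choices


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha = np.zeros(len(CANDIDATE_DIMS))  # architecture weights, untrained
    print("softmax weights:", softmax(alpha))
    print("gumbel sample:  ", gumbel_softmax(alpha, temperature=0.5, rng=rng))
    print("penalized loss: ", penalized_loss(1.0, alpha, CANDIDATE_DIMS, eta=0.01))
    print("search space:   ", search_space_size(28, 7))
```

In a real system the architecture weights `alpha` would be learned jointly with the shared network parameters by gradient descent on the LF-MMI objective; here they are fixed values used only to show the arithmetic.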


