Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks

01/08/2022
by Shoukang Hu, et al.

State-of-the-art automatic speech recognition (ASR) system development is data and computation intensive. The optimal design of deep neural networks (DNNs) for these systems often requires expert knowledge and empirical evaluation. In this paper, a range of neural architecture search (NAS) techniques are used to automatically learn two types of hyper-parameters of factored time delay neural networks (TDNN-Fs): i) the left and right splicing context offsets; and ii) the dimensionality of the bottleneck linear projection at each hidden layer. These techniques include: the differentiable neural architecture search (DARTS) method, which integrates architecture learning with lattice-free MMI (LF-MMI) training; Gumbel-Softmax and pipelined DARTS methods, which reduce the confusion over candidate architectures and improve the generalization of architecture selection; and Penalized DARTS, which incorporates resource constraints to balance the trade-off between performance and system complexity. Parameter sharing among TDNN-F architectures allows an efficient search over up to 7^28 different systems. Statistically significant word error rate (WER) reductions of up to 1.2 … are obtained over a state-of-the-art 300-hour Switchboard corpus trained baseline LF-MMI TDNN-F system featuring speed perturbation, i-Vector and learning hidden unit contribution (LHUC) based speaker adaptation, as well as RNNLM rescoring. Performance contrasts on the same task against recent end-to-end systems reported in the literature suggest the best NAS auto-configured system achieves state-of-the-art WERs of 9.9 … sets respectively, with up to 96 … Bayesian learning shows that ...
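The DARTS-style search named above can be illustrated with a small, dependency-free sketch. It assumes a single architecture decision (for instance, the bottleneck dimension of one TDNN-F layer) with seven hypothetical candidates. The function names, candidate values, logits, toy outputs, and the expected-cost form of the resource penalty are all illustrative assumptions, not the paper's configuration. `softmax` gives continuous DARTS weights over the candidates, `gumbel_softmax` perturbs the logits with Gumbel noise before softening them, `penalized_loss` adds one plausible resource term, and `select_candidate` makes the final discrete choice. `search_space_size` only shows the arithmetic behind a count such as 7^28, assuming 28 decisions with 7 candidates each.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of architecture logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, temperature, rng):
    """Gumbel-Softmax relaxation: add Gumbel(0, 1) noise to the logits, then
    apply a temperature-scaled softmax. Lower temperatures push the weights
    towards a one-hot choice of a single candidate."""
    noisy = []
    for z in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u inside (0, 1)
        noisy.append(z - math.log(-math.log(u)))
    return softmax(noisy, temperature)


def mixed_output(candidate_outputs, weights):
    """DARTS-style mixed operation: weighted sum of every candidate's output."""
    dim = len(candidate_outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, candidate_outputs))
            for i in range(dim)]


def expected_cost(weights, costs):
    """Expected resource cost (e.g. parameter count) under the relaxed choice."""
    return sum(w * c for w, c in zip(weights, costs))


def penalized_loss(task_loss, weights, costs, penalty):
    """Task loss plus a penalty on expected cost (hypothetical penalty form)."""
    return task_loss + penalty * expected_cost(weights, costs)


def select_candidate(logits, candidates):
    """After search, keep the candidate with the largest architecture logit."""
    best = max(range(len(logits)), key=lambda k: logits[k])
    return candidates[best]


def search_space_size(num_decisions, num_candidates):
    """Number of discrete systems when every decision has the same candidate count."""
    return num_candidates ** num_decisions


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical bottleneck dimensions for one TDNN-F layer (7 candidates).
    dims = [32, 64, 96, 128, 160, 192, 256]
    logits = [0.0, 0.1, 0.3, 1.0, 0.2, 0.0, -0.5]

    weights = softmax(logits)
    soft_weights = gumbel_softmax(logits, temperature=0.5, rng=rng)
    # Toy 2-dimensional output for each candidate.
    outputs = [[d / 100.0, 1.0] for d in dims]

    print("softmax weights:", [round(w, 3) for w in weights])
    print("gumbel weights: ", [round(w, 3) for w in soft_weights])
    print("mixed output:   ", mixed_output(outputs, weights))
    # Resource cost is taken to be proportional to the bottleneck dimension.
    print("penalized loss: ", penalized_loss(2.0, weights, dims, penalty=1e-3))
    print("selected dim:   ", select_candidate(logits, dims))  # -> 128
    print("7^28 systems:   ", search_space_size(28, 7))
```

In a real system the weights would scale actual layer outputs, and the logits would be updated by gradient descent jointly with LF-MMI training; this sketch deliberately omits both and only shows how the relaxed weights, the Gumbel perturbation, and a resource penalty fit together.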
