Neural Architecture Search for Speech Recognition

07/17/2020
by Shoukang Hu, et al.

Deep neural network (DNN) based automatic speech recognition (ASR) systems are often designed using expert knowledge and empirical evaluation. In this paper, a range of neural architecture search (NAS) techniques are used to automatically learn two hyper-parameters that heavily affect the performance and model complexity of state-of-the-art factored time delay neural network (TDNN-F) acoustic models: i) the left and right splicing context offsets; and ii) the dimensionality of the bottleneck linear projection at each hidden layer. These techniques include the standard DARTS method, which fully integrates the estimation of architecture weights and TDNN parameters in lattice-free MMI (LF-MMI) training; Gumbel-Softmax DARTS, which reduces the confusion between candidate architectures; Pipelined DARTS, which circumvents the overfitting of architecture weights using held-out data; and Penalized DARTS, which further incorporates resource constraints to adjust the trade-off between performance and system complexity. Parameter sharing among candidate architectures was also used to facilitate efficient search over up to 7^28 different TDNN systems. Experiments conducted on a 300-hour Switchboard conversational telephone speech recognition task suggest that the NAS auto-configured TDNN-F systems consistently outperform the baseline LF-MMI trained TDNN-F systems using manual expert configurations. Absolute word error rate reductions of up to 1.0% and a model size reduction of 28% were obtained.
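To make the DARTS variants named in the abstract more concrete, the following is a minimal sketch in plain Python rather than the authors' implementation. It shows how architecture weights over candidate choices can be obtained with a softmax (standard DARTS), sharpened with Gumbel noise and a temperature (Gumbel-Softmax DARTS), and combined with a resource cost term (in the spirit of Penalized DARTS). The candidate bottleneck dimensions, the cost values, the penalty scale `eta`, and all function names are assumptions chosen for illustration only; the abstract does not specify them.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for x in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append(x - math.log(-math.log(u)))
    return softmax(noisy, temperature)


def penalized_loss(task_loss, weights, costs, eta):
    """Task loss plus eta times the expected resource cost of the choice."""
    expected_cost = sum(w * c for w, c in zip(weights, costs))
    return task_loss + eta * expected_cost


# Hypothetical candidate bottleneck dimensions for one hidden layer.
candidate_dims = [25, 50, 80, 100, 120, 160, 200]
# Architecture parameters, one per candidate (uniform before any training).
arch_logits = [0.0] * len(candidate_dims)

# Standard DARTS: softmax weights mix the outputs of all candidates.
darts_weights = softmax(arch_logits)

# Gumbel-Softmax DARTS: a low temperature pushes the sampled weights
# toward a single candidate, reducing confusion between choices.
rng = random.Random(0)
gumbel_weights = gumbel_softmax(arch_logits, temperature=0.1, rng=rng)

# Penalty term: here the cost of a candidate is simply its dimension.
loss = penalized_loss(task_loss=1.0, weights=darts_weights,
                      costs=candidate_dims, eta=0.001)

# Size of a search space with 28 independent choice points of 7 options
# each (an assumed reading of the "7^28" figure in the abstract).
search_space_size = 7 ** 28
```

In a real system the architecture logits would be trained jointly with the TDNN-F parameters under the LF-MMI criterion, and the candidate with the largest weight would be kept at each choice point once the search finishes.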


