Layer Pruning on Demand with Intermediate CTC

06/17/2021
by   Jaesong Lee, et al.

Deploying an end-to-end automatic speech recognition (ASR) model on mobile or embedded devices is challenging because the device's available computational power and its energy-consumption requirements change dynamically in practice. To address this, we present a training and pruning method for ASR based on connectionist temporal classification (CTC) that allows the model depth to be reduced at run time without any extra fine-tuning. To achieve this, we adopt two regularization methods, intermediate CTC and stochastic depth, to train a model whose performance does not degrade much after pruning. We present an in-depth analysis of layer behaviors using singular vector canonical correlation analysis (SVCCA), along with efficient strategies for finding layers that are safe to prune. Using the proposed method, we show that a Transformer-CTC model can be pruned to various depths on demand, improving the real-time factor from 0.005 to 0.002 on GPU, while each pruned sub-model maintains the accuracy of an individually trained model of the same depth.
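
The ideas named in the abstract can be illustrated with a small, framework-free sketch. It is not the authors' implementation: the toy residual layers stand in for Transformer blocks, and the names `encode`, `combined_loss`, `real_time_factor`, the weight `w`, and the choice to prune by keeping only the first `depth` layers are all assumptions made for illustration. The weighted sum in `combined_loss` is one common way to mix a final CTC loss with an intermediate one; the abstract does not state the exact formula.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_residual_layer(scale: float) -> Layer:
    """Toy residual layer x + scale * x (hypothetical stand-in for a Transformer block)."""
    def layer(x: Vector) -> Vector:
        return [v + scale * v for v in x]
    return layer


def encode(layers: List[Layer], x: Vector, depth: Optional[int] = None,
           drop_prob: float = 0.0, rng: Optional[random.Random] = None) -> Vector:
    """Run the first `depth` layers of the stack.

    depth     -- number of layers kept at run time (assumed pruning scheme:
                 truncate the stack, no fine-tuning afterwards).
    drop_prob -- stochastic depth: during training each layer is skipped with
                 this probability, leaving only the identity (residual) path.
    """
    depth = len(layers) if depth is None else depth
    if not 0 < depth <= len(layers):
        raise ValueError("depth must be between 1 and len(layers)")
    rng = rng or random.Random()
    for layer in layers[:depth]:
        if drop_prob > 0.0 and rng.random() < drop_prob:
            continue  # layer skipped: input passes through unchanged
        x = layer(x)
    return x


def combined_loss(final_ctc: float, intermediate_ctc: float, w: float = 0.3) -> float:
    """Weighted mix of final and intermediate CTC losses (assumed form, w is illustrative)."""
    return (1.0 - w) * final_ctc + w * intermediate_ctc


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: processing time divided by audio duration."""
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    stack = [make_residual_layer(1.0) for _ in range(4)]
    print(encode(stack, [1.0]))            # full depth
    print(encode(stack, [1.0], depth=2))   # pruned to two layers on demand
    print(real_time_factor(1.0, 500.0))    # 1 s of compute for 500 s of audio
```

In this sketch, lowering `depth` at run time trades accuracy for speed without retraining, which is the behavior the abstract describes; training with a nonzero `drop_prob` and an intermediate loss is what is meant to make the truncated stack still usable.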

research
02/05/2021

Intermediate Loss Regularization for CTC-based Speech Recognition

We present a simple and efficient auxiliary loss function for automatic ...
research
03/17/2021

Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation

End-to-end automatic speech recognition (ASR), unlike conventional ASR, ...
research
09/21/2023

CoMFLP: Correlation Measure based Fast Search on ASR Layer Pruning

Transformer-based speech recognition (ASR) model with deep layers exhibi...
research
05/19/2022

Insights on Neural Representations for End-to-End Speech Recognition

End-to-end automatic speech recognition (ASR) models aim to learn a gene...
research
10/15/2021

Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR via Supernet

From wearables to powerful smart devices, modern automatic speech recogn...
research
03/14/2023

I3D: Transformer architectures with input-dependent dynamic depth for speech recognition

Transformer-based end-to-end speech recognition has achieved great succe...
research
06/15/2023

MobileASR: A resource-aware on-device personalisation framework for automatic speech recognition in mobile phones

We describe a comprehensive methodology for developing user-voice person...