I3D: Transformer architectures with input-dependent dynamic depth for speech recognition

03/14/2023
by   Yifan Peng, et al.
0

Transformer-based end-to-end speech recognition has achieved great success. However, the large footprint and computational overhead make it difficult to deploy these models in some real-world applications. Model compression techniques can reduce the model size and speed up inference, but the compressed model has a fixed architecture which might be suboptimal. We propose a novel Transformer encoder with Input-Dependent Dynamic Depth (I3D) to achieve strong performance-efficiency trade-offs. With a similar number of layers at inference time, I3D-based models outperform the vanilla Transformer and the static pruned model via iterative layer pruning. We also present interesting analysis on the gate probabilities and the input-dependency, which helps us better understand deep encoders.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/22/2020

Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset

Recently, Transformer based end-to-end models have achieved great succes...
research
11/18/2021

Quality and Cost Trade-offs in Passage Re-ranking Task

Deep learning models named transformers achieved state-of-the-art result...
research
10/30/2019

Lightweight and Efficient End-to-End Speech Recognition Using Low-Rank Transformer

High performing deep neural networks come at the cost of computational c...
research
05/07/2021

SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts

Recently, Mixture of Experts (MoE) based Transformer has shown promising...
research
05/18/2021

Vision Transformer for Fast and Efficient Scene Text Recognition

Scene text recognition (STR) enables computers to read text in natural s...
research
04/27/2020

Explicitly Modeling Adaptive Depths for Transformer

The vanilla Transformer conducts a fixed number of computations over all...
research
06/17/2021

Layer Pruning on Demand with Intermediate CTC

Deploying an end-to-end automatic speech recognition (ASR) model on mobi...

Please sign up or login with your details

Forgot password? Click here to reset