HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition

04/13/2022
by Ji Won Yoon, et al.

Pre-training with self-supervised models such as Hidden-unit BERT (HuBERT) and wav2vec 2.0 has brought significant improvements in automatic speech recognition (ASR). However, these models usually incur a high computational cost to achieve their strong performance, which slows down inference. To improve model efficiency, we propose an early exit scheme for ASR, HuBERT-EE, that allows the model to stop inference dynamically. In HuBERT-EE, multiple early exit branches are added at intermediate layers, and each branch is used to decide whether the model can exit early with its prediction. Experimental results on the LibriSpeech dataset show that HuBERT-EE can accelerate the inference of a large-scale HuBERT model while balancing the trade-off between word error rate (WER) and latency.
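
The early exit idea described above can be illustrated with a minimal, library-free sketch. This is not the authors' implementation: the abstract does not state the exit criterion, so the sketch assumes an entropy threshold on each branch's frame-level posteriors, and it assumes one branch per layer for simplicity. All names here (`early_exit_forward`, `mean_entropy`, `toy_layer`, `toy_branch`, `threshold`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Frames = List[List[float]]      # hidden states: one feature vector per frame
Posteriors = List[List[float]]  # one probability distribution per frame


def mean_entropy(posteriors: Posteriors) -> float:
    """Average Shannon entropy (in nats) over all frames."""
    total = 0.0
    for dist in posteriors:
        total -= sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(posteriors)


def early_exit_forward(
    hidden: Frames,
    layers: Sequence[Callable[[Frames], Frames]],
    branches: Sequence[Callable[[Frames], Posteriors]],
    threshold: float,
) -> Tuple[Posteriors, int]:
    """Run layers in order and stop at the first confident branch.

    Returns the posteriors of the branch that was used and the number
    of layers that were actually executed.
    """
    if not layers or len(layers) != len(branches):
        raise ValueError("need one branch per layer and at least one layer")
    posteriors: Posteriors = []
    depth = 0
    for depth, (layer, branch) in enumerate(zip(layers, branches), start=1):
        hidden = layer(hidden)
        posteriors = branch(hidden)
        # Assumed criterion: low average entropy means the branch is confident.
        if mean_entropy(posteriors) < threshold:
            break
    return posteriors, depth


def softmax(vec: List[float]) -> List[float]:
    """Numerically stable softmax over one frame."""
    m = max(vec)
    exps = [math.exp(v - m) for v in vec]
    s = sum(exps)
    return [e / s for e in exps]


def toy_layer(hidden: Frames) -> Frames:
    """Stand-in for a Transformer layer: doubles every feature."""
    return [[2.0 * v for v in frame] for frame in hidden]


def toy_branch(hidden: Frames) -> Posteriors:
    """Stand-in for an exit branch: softmax over the features of each frame."""
    return [softmax(frame) for frame in hidden]


# Toy input: two frames, two output classes, three layers with a branch each.
toy_input: Frames = [[1.0, 0.0], [0.0, 1.0]]
toy_layers = [toy_layer] * 3
toy_branches = [toy_branch] * 3
_, layers_used = early_exit_forward(toy_input, toy_layers, toy_branches, 0.2)
print("layers used:", layers_used)
```

In this toy setup, raising `threshold` makes the model exit earlier (lower latency, less reliable predictions), while lowering it pushes computation toward the final layer, which mirrors the WER versus latency trade-off mentioned in the abstract.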

research
09/14/2021

Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition

This paper is a study of performance-efficiency trade-offs in pre-traine...
research
09/18/2023

Training dynamic models using early exits for automatic speech recognition on resource-constrained devices

The possibility of dynamically modifying the computational load of neura...
research
11/01/2022

Avoid Overthinking in Self-Supervised Models for Speech Recognition

Self-supervised learning (SSL) models reshaped our approach to speech, l...
research
11/21/2022

You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model

Large-scale Transformer models bring significant improvements for variou...
research
10/22/2022

Guided contrastive self-supervised pre-training for automatic speech recognition

Contrastive Predictive Coding (CPC) is a representation learning method ...
research
11/21/2022

SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale

End-to-end automatic speech recognition systems represent the state of t...
research
12/13/2021

PM-MMUT: Boosted Phone-mask Data Augmentation using Multi-modeling Unit Training for Robust Uyghur E2E Speech Recognition

Consonant and vowel reduction are often encountered in Uyghur speech, wh...