Training dynamic models using early exits for automatic speech recognition on resource-constrained devices

09/18/2023
by   George August Wright, et al.

The ability to dynamically adjust the computational load of neural models at inference time is crucial for on-device processing, where computational power is limited and time-varying. Established approaches to neural model compression exist, but they yield architecturally static models. In this paper, we investigate early-exit architectures, which rely on intermediate exit branches, applied to large-vocabulary speech recognition. This enables dynamic models that adapt their computational cost to the available resources and to the desired recognition performance. Unlike previous work, we not only build on pre-trained backbones but also train models from scratch with an early-exit architecture. Experiments on public datasets show that early-exit models trained from scratch not only preserve performance when using fewer encoder layers, but also improve task accuracy compared to single-exit models or pre-trained backbones. Additionally, we investigate an exit selection strategy based on posterior probabilities as an alternative to frame-based entropy.
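The entropy-based exit selection mentioned above can be sketched as follows: each intermediate exit produces frame-level posterior distributions, and inference can stop at the first exit whose average per-frame entropy falls below a confidence threshold. A minimal sketch in plain Python (the function names, threshold value, and data layout are illustrative assumptions, not the paper's implementation):

```python
import math

def avg_frame_entropy(posteriors):
    """Average per-frame entropy (in nats) of a sequence of
    posterior distributions, one distribution per frame."""
    total = 0.0
    for frame in posteriors:
        total += -sum(p * math.log(p) for p in frame if p > 0)
    return total / len(posteriors)

def select_exit(exit_posteriors, threshold=0.5):
    """Return the index of the first exit whose average frame
    entropy is below the threshold (i.e., the model is confident
    enough); fall back to the final exit otherwise."""
    for i, posteriors in enumerate(exit_posteriors):
        if avg_frame_entropy(posteriors) < threshold:
            return i
    return len(exit_posteriors) - 1
```

A posterior-based alternative, as investigated in the paper, would instead compare a statistic of the maximum posterior probability per frame against a threshold; the control flow is the same, only the confidence measure changes.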


Related research

- HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition (04/13/2022)
- Analyzing Robustness of End-to-End Neural Models for Automatic Speech Recognition (08/17/2022)
- HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch (10/18/2022)
- Development of Automatic Speech Recognition for Kazakh Language using Transfer Learning (03/08/2020)
- PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition (06/10/2021)
- Multimodal Integration for Large-Vocabulary Audio-Visual Speech Recognition (07/28/2020)
- The THUEE System Description for the IARPA OpenASR21 Challenge (06/29/2022)
