A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes

04/13/2022
by   Shaojin Ding, et al.
0

In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios. Moreover, the model can significantly reduce model size and power consumption without loss of quality. Namely, with the dynamic cascaded encoder model, we explore three techniques to maximally boost the performance of each model size: 1) Use separate decoders for each sub-model while sharing the encoders; 2) Use funnel-pooling to improve the encoder efficiency; 3) Balance the size of causal and non-causal encoders to improve quality and fit deployment constraints. Overall, the proposed large-medium model has 30 smaller size and reduces power consumption by 33 cascaded encoder model. The triple-size model that unifies the large, medium, and small models achieves 37 while substantially reducing the engineering efforts of having separate models.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/29/2022

Streaming parallel transducer beam search with fast-slow cascaded encoders

Streaming ASR with strict latency constraints is required in many speech...
research
10/27/2020

Cascaded encoders for unifying streaming and non-streaming ASR

End-to-end (E2E) automatic speech recognition (ASR) models, by now, have...
research
11/21/2020

A Better and Faster End-to-End Model for Streaming ASR

End-to-end (E2E) models have shown to outperform state-of-the-art conven...
research
03/31/2023

Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR

Conformer models maintain a large number of internal states, the vast ma...
research
11/28/2022

E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model

We explore unifying a neural segmenter with two-pass cascaded encoder AS...
research
04/05/2021

Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency

We propose a dynamic encoder transducer (DET) for on-device speech recog...
research
11/21/2017

Multiple-Instance, Cascaded Classification for Keyword Spotting in Narrow-Band Audio

We propose using cascaded classifiers for a keyword spotting (KWS) task ...

Please sign up or login with your details

Forgot password? Click here to reset