Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR

03/31/2023
by   Rami Botros, et al.

Conformer models maintain a large number of internal states, the vast majority of which are associated with self-attention layers. With limited memory bandwidth, reading these states from memory at each inference step can slow down inference. In this paper, we design an optimized Conformer that is small enough to meet on-device restrictions and has fast inference on TPUs. We explore various ideas to improve execution speed, including replacing lower Conformer blocks with convolution-only blocks, strategically downsizing the architecture, and utilizing an RNNAttention-Performer. Our optimized Conformer can be readily incorporated into a cascaded-encoder setting, allowing a second-pass decoder to operate on its output and improve accuracy whenever more resources are available. Altogether, we find that these optimizations can reduce latency by a factor of 6.8x at a reasonable trade-off in quality. With the cascaded second pass, we show that the recognition accuracy is completely recoverable. Thus, our proposed encoder can double as a strong standalone on-device encoder and as the first part of a high-performance ASR pipeline.
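As a rough illustration of why self-attention states dominate, the sketch below counts the values a streaming encoder would carry between inference steps. It is a back-of-the-envelope model, not the paper's accounting: it assumes each attention layer caches keys and values over a fixed left context, and each convolution module caches `kernel_size - 1` past frames. All function names and sizes are hypothetical.

```python
def attention_state_floats(left_context: int, model_dim: int) -> int:
    """Cached keys plus cached values for one attention layer (assumed layout)."""
    return 2 * left_context * model_dim


def conv_state_floats(kernel_size: int, model_dim: int) -> int:
    """Past frames a causal depthwise convolution needs to keep (assumed layout)."""
    return (kernel_size - 1) * model_dim


def encoder_state_floats(
    num_layers: int,
    left_context: int,
    kernel_size: int,
    model_dim: int,
    num_conv_only: int = 0,
) -> int:
    """Total cached values when `num_conv_only` blocks drop self-attention."""
    attention_layers = num_layers - num_conv_only
    attention_total = attention_layers * attention_state_floats(left_context, model_dim)
    # Every block keeps its convolution module, with or without attention.
    conv_total = num_layers * conv_state_floats(kernel_size, model_dim)
    return attention_total + conv_total


# Hypothetical sizes, chosen only for illustration (not taken from the paper).
baseline = encoder_state_floats(num_layers=12, left_context=64, kernel_size=15, model_dim=512)
reduced = encoder_state_floats(
    num_layers=12, left_context=64, kernel_size=15, model_dim=512, num_conv_only=4
)
print(f"baseline state: {baseline} values, with 4 conv-only blocks: {reduced} values")
```

Under these assumed sizes, one attention layer holds roughly nine times as many cached values as one convolution module, which is why swapping lower blocks for convolution-only blocks shrinks the state that must be read at each step.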

research
11/28/2022

E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model

We explore unifying a neural segmenter with two-pass cascaded encoder AS...
research
11/21/2020

A Better and Faster End-to-End Model for Streaming ASR

End-to-end (E2E) models have shown to outperform state-of-the-art conven...
research
04/13/2022

A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes

In this paper, we propose a dynamic cascaded encoder Automatic Speech Re...
research
12/15/2022

UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units

Direct speech-to-speech translation (S2ST), in which all components can ...
research
08/05/2020

Hybrid Transformer/CTC Networks for Hardware Efficient Voice Triggering

We consider the design of two-pass voice trigger detection systems. We f...
research
08/30/2020

Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition

Recent advances of end-to-end models have outperformed conventional mode...
