Compute Cost Amortized Transformer for Streaming ASR

07/05/2022
by Yi Xie, et al.

We present a streaming, Transformer-based end-to-end automatic speech recognition (ASR) architecture that achieves efficient neural inference through compute cost amortization. Our architecture creates sparse computation pathways dynamically at inference time, resulting in selective use of compute resources throughout decoding and enabling significant reductions in compute with minimal impact on accuracy. The fully differentiable architecture is trained end-to-end with an accompanying lightweight arbitrator mechanism that operates at the frame level to make dynamic decisions on each input, while a tunable loss function regularizes the overall level of compute against predictive performance. We report empirical results from experiments using the compute amortized Transformer-Transducer (T-T) model conducted on LibriSpeech data. Our best model can achieve a 60% reduction in compute cost with only a minor relative increase in word error rate (WER).
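The abstract describes the mechanism only at a high level. Below is a minimal training-time sketch, not the authors' implementation, of how a frame-level arbitrator with a straight-through gate and a compute-regularized loss could be wired up in PyTorch. All class, function, and parameter names here (AmortizedBlock, arbitrator, compute_weight) are illustrative assumptions.

```python
# Sketch of frame-level compute amortization: a lightweight "arbitrator"
# decides per frame whether to run the full Transformer block or a cheap
# identity bypass. A straight-through estimator keeps the hard decision
# differentiable, and a penalty on the gate probabilities regularizes the
# overall compute budget. Names are illustrative, not from the paper.

import torch
import torch.nn as nn


class AmortizedBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        # Expensive path: a standard Transformer encoder layer.
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Lightweight arbitrator: one gating logit per frame.
        self.arbitrator = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor):
        # x: (batch, time, d_model)
        logits = self.arbitrator(x).squeeze(-1)   # (batch, time)
        p = torch.sigmoid(logits)
        hard = (p > 0.5).float()
        # Straight-through: hard decision on the forward pass,
        # soft gradient through p on the backward pass.
        gate = hard + p - p.detach()
        heavy = self.block(x)
        # Gated frames take the full block; the rest keep the identity path.
        g = gate.unsqueeze(-1)
        out = g * heavy + (1.0 - g) * x
        return out, p


def total_loss(task_loss, gate_probs, compute_weight=0.01):
    # Tunable loss: task loss (e.g. the transducer loss) plus a compute
    # penalty; compute_weight trades accuracy against the fraction of
    # frames routed through the expensive path.
    return task_loss + compute_weight * gate_probs.mean()
```

Note that this sketch computes the expensive path for every frame and masks it, which is convenient for batched training; at inference time the point of the arbitrator is that gated-off frames skip the full block entirely, which is where the compute savings come from.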


Related research

08/13/2020
Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition
Transformer has achieved competitive performance against state-of-the-ar...

04/19/2021
Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers
This paper addresses end-to-end automatic speech recognition (ASR) for l...

08/03/2021
Amortized Neural Networks for Low-Latency Speech Recognition
We introduce Amortized Neural Networks (AmNets), a compute cost- and lat...

03/23/2023
Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition
Transformer-based models have recently made significant achievements in ...

11/09/2019
Fully Quantizing a Simplified Transformer for End-to-end Speech Recognition
While significant improvements have been made in recent years in terms o...

04/20/2020
WHALETRANS: E2E WHisper to nAturaL spEech conversion using modified TRANSformer network
In this article, we investigate whispered-to natural-speech conversion m...

05/21/2020
Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning
In this work, we study leveraging extra text data to improve low-resourc...
