Amortized Neural Networks for Low-Latency Speech Recognition

08/03/2021
by Jonathan Macoskey, et al.

We introduce Amortized Neural Networks (AmNets), a compute cost- and latency-aware network architecture that is particularly well suited to sequence modeling tasks. We apply AmNets to the Recurrent Neural Network Transducer (RNN-T) to reduce compute cost and latency for automatic speech recognition (ASR). The AmNets RNN-T architecture enables the network to dynamically switch between encoder branches on a frame-by-frame basis. Branches are constructed with variable levels of compute cost and model capacity; here, we obtain the variable-compute branches with two well-known techniques: sparse pruning and matrix factorization. Frame-by-frame switching is determined by an arbitrator network that adds negligible compute overhead. We present results for both variants on LibriSpeech data and show that the proposed architecture can reduce inference cost by up to 45% and bring latency to nearly real-time without incurring a loss in accuracy.
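The abstract describes per-frame routing between encoder branches of different compute cost, decided by a lightweight arbitrator. Below is a minimal, hypothetical PyTorch sketch of that idea; the class names, dimensions, and the straight-through Gumbel-softmax router are illustrative assumptions of ours, not the paper's implementation (which this page does not include). The cheap branch uses a low-rank (matrix-factorized) projection, one of the two compute-reduction techniques named in the abstract.

```python
# Hypothetical AmNets-style branched encoder layer (illustrative only).
import torch
import torch.nn as nn


class Arbitrator(nn.Module):
    """Tiny per-frame router that picks which encoder branch handles each frame."""

    def __init__(self, input_dim: int, num_branches: int = 2):
        super().__init__()
        self.proj = nn.Linear(input_dim, num_branches)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, input_dim) -> hard one-hot routing decisions.
        # Straight-through Gumbel-softmax keeps the router differentiable.
        logits = self.proj(frames)
        return nn.functional.gumbel_softmax(logits, tau=1.0, hard=True)


class BranchedEncoderLayer(nn.Module):
    """One encoder layer with a full-capacity branch and a cheap low-rank branch."""

    def __init__(self, input_dim: int, hidden_dim: int, rank: int = 32):
        super().__init__()
        # Full-compute branch: dense projection.
        self.full = nn.Linear(input_dim, hidden_dim)
        # Reduced-compute branch: matrix-factorized (low-rank) projection.
        self.cheap = nn.Sequential(
            nn.Linear(input_dim, rank, bias=False),
            nn.Linear(rank, hidden_dim),
        )
        self.arbitrator = Arbitrator(input_dim, num_branches=2)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        route = self.arbitrator(frames)      # (B, T, 2), one-hot per frame
        full_out = self.full(frames)         # (B, T, H)
        cheap_out = self.cheap(frames)       # (B, T, H)
        # Frame-by-frame switching: each frame keeps exactly one branch's output.
        return route[..., 0:1] * full_out + route[..., 1:2] * cheap_out


if __name__ == "__main__":
    layer = BranchedEncoderLayer(input_dim=80, hidden_dim=512)
    x = torch.randn(4, 100, 80)              # 4 utterances, 100 frames of 80-dim features
    print(layer(x).shape)                    # torch.Size([4, 100, 512])
```

In this sketch both branches are evaluated for every frame, which is convenient for training; the inference-time savings the abstract reports would come from dispatching each frame only to the branch the arbitrator selects.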

Related research

Bifocal Neural ASR: Exploiting Keyword Spotting for Inference Optimization (08/03/2021)
We present Bifocal RNN-T, a new variant of the Recurrent Neural Network ...

Compute Cost Amortized Transformer for Streaming ASR (07/05/2022)
We present a streaming, Transformer-based end-to-end automatic speech re...

Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition (04/03/2023)
We present dual-attention neural biasing, an architecture designed to bo...

Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification (09/13/2022)
Language identification is critical for many downstream tasks in automat...

Latency-Controlled Neural Architecture Search for Streaming Speech Recognition (05/08/2021)
Recently, neural architecture search (NAS) has attracted much attention ...

Delta Networks for Optimized Recurrent Network Computation (12/16/2016)
Many neural networks exhibit stability in their activation patterns over...

Serverless architecture efficiency: an exploratory study (01/13/2019)
Cloud service providers propose services to insensitive customers to use ...
