Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency

04/05/2021
by   Yangyang Shi, et al.
0

We propose a dynamic encoder transducer (DET) for on-device speech recognition. One DET model scales to multiple devices with different computation capacities without retraining or finetuning. To trading off accuracy and latency, DET assigns different encoders to decode different parts of an utterance. We apply and compare the layer dropout and the collaborative learning for DET training. The layer dropout method that randomly drops out encoder layers in the training phase, can do on-demand layer dropout in decoding. Collaborative learning jointly trains multiple encoders with different depths in one single model. Experiment results on Librispeech and in-house data show that DET provides a flexible accuracy and latency trade-off. Results on Librispeech show that the full-size encoder in DET relatively reduces the word error rate of the same size baseline by over 8 lightweight encoder in DET trained with collaborative learning reduces the model size by 25 gets similar accuracy as a baseline model with better latency on a large in-house data set by assigning a lightweight encoder for the beginning part of one utterance and a full-size encoder for the rest.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/16/2021

Collaborative Training of Acoustic Encoders for Speech Recognition

On-device speech recognition requires training models of different sizes...
research
10/07/2021

Enabling On-Device Training of Speech Recognition Models with Federated Dropout

Federated learning can be used to train machine learning models on the e...
research
11/03/2020

Dynamic latency speech recognition with asynchronous revision

In this work we propose an inference technique, asynchronous revision, t...
research
04/06/2021

Flexi-Transducer: Optimizing Latency, Accuracy and Compute forMulti-Domain On-Device Scenarios

Often, the storage and computational constraints of embeddeddevices dema...
research
04/13/2022

A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes

In this paper, we propose a dynamic cascaded encoder Automatic Speech Re...
research
11/23/2022

Pruned Lightweight Encoders for Computer Vision

Latency-critical computer vision systems, such as autonomous driving or ...
research
12/18/2010

Self-Organising Stochastic Encoders

The processing of mega-dimensional data, such as images, scales linearly...

Please sign up or login with your details

Forgot password? Click here to reset