On-demand compute reduction with stochastic wav2vec 2.0

04/25/2022
by Apoorv Vyas, et al.

Squeeze and Efficient Wav2vec (SEW) is a recently proposed architecture that squeezes the input to the transformer encoder for compute-efficient pre-training and inference with wav2vec 2.0 (W2V2) models. In this work, we propose stochastic compression for on-demand compute reduction in W2V2 models. Instead of using a fixed squeeze factor, we sample it uniformly during training. We further introduce query and key-value pooling mechanisms that can be applied to each transformer layer for additional compression. Our results for models pre-trained on the 960h Librispeech dataset and fine-tuned on 10h of transcribed data show that, using the same stochastic model, we obtain a smooth trade-off between word error rate (WER) and inference time, with only marginal WER degradation compared to W2V2 and SEW models trained for a specific setting. We further show that the same stochastically pre-trained model can be fine-tuned for a specific configuration to recover the WER difference, yielding significant computational savings compared to pre-training models from scratch.
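The abstract names two ingredients: a squeeze factor sampled uniformly during training, and per-layer query and key-value pooling. The snippet below is a minimal illustrative sketch of those ideas in PyTorch, not the paper's implementation; the class and function names, the use of average pooling, and the set of candidate squeeze factors are assumptions made for illustration.

```python
import random

import torch
import torch.nn as nn
import torch.nn.functional as F


class PooledSelfAttention(nn.Module):
    """Self-attention block with optional query and key-value pooling.

    Pooling the queries shortens the layer's output sequence; pooling the
    keys/values shrinks the attention matrix. Average pooling and the
    argument names here are illustrative assumptions.
    """

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    @staticmethod
    def _pool(x: torch.Tensor, factor: int) -> torch.Tensor:
        # x: (batch, time, dim) -> average-pool along time by `factor`.
        if factor <= 1:
            return x
        return F.avg_pool1d(x.transpose(1, 2), kernel_size=factor).transpose(1, 2)

    def forward(self, x: torch.Tensor, q_pool: int = 1, kv_pool: int = 1) -> torch.Tensor:
        q = self._pool(x, q_pool)
        kv = self._pool(x, kv_pool)
        out, _ = self.attn(q, kv, kv)
        return out


def sample_squeeze_factor(choices=(1, 2, 3, 4)) -> int:
    # Sample the squeeze factor uniformly per training step instead of
    # fixing it, so a single model covers several compute budgets.
    return random.choice(choices)


if __name__ == "__main__":
    layer = PooledSelfAttention(dim=64, num_heads=4)
    feats = torch.randn(2, 100, 64)                       # (batch, frames, dim)
    factor = sample_squeeze_factor()                      # e.g. 2
    squeezed = PooledSelfAttention._pool(feats, factor)   # squeeze encoder input
    out = layer(squeezed, q_pool=1, kv_pool=2)            # key-value pooled attention
    print(factor, out.shape)
```

At inference, fixing the squeeze and pooling factors to different values would trade accuracy for speed with the same set of weights, which is the on-demand behavior the abstract describes.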


