When to Use Efficient Self Attention? Profiling Text, Speech and Image Transformer Variants

06/14/2023
by Anuj Diwan, et al.

We present the first unified study of the efficiency of self-attention-based Transformer variants spanning text, speech and vision. We identify input length thresholds (tipping points) at which efficient Transformer variants become more efficient than vanilla models, using a variety of efficiency metrics (latency, throughput, and memory). To conduct this analysis for speech, we introduce L-HuBERT, a novel local-attention variant of a self-supervised speech model. We observe that these thresholds are (a) much higher than typical dataset sequence lengths and (b) dependent on the metric and modality, showing that choosing the right model depends on modality, task type (long-form vs. typical context) and resource constraints (time vs. memory). By visualising the breakdown of the computational costs for Transformer components, we also show that non-self-attention components exhibit significant computational costs. We release our profiling toolkit at https://github.com/ajd12342/profiling-transformers.
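The released toolkit (linked above) contains the full profiling setup. As a rough, self-contained sketch of how such a latency tipping point can be measured, the snippet below times full self-attention against a blocked local-attention layer across sequence lengths; the model width, head count, and window size are illustrative assumptions, not the paper's configuration, and this is not the released toolkit.

```python
# Minimal tipping-point sketch (illustrative, not the authors' toolkit).
# Compares O(n^2) full self-attention with O(n * WINDOW) blocked local attention.
import time
import torch
import torch.nn.functional as F

DIM, HEADS, WINDOW = 768, 12, 128  # illustrative hyperparameters

def full_attention(x):
    # Standard O(n^2) scaled dot-product self-attention (projections omitted).
    b, n, _ = x.shape
    q = k = v = x.view(b, n, HEADS, DIM // HEADS).transpose(1, 2)
    out = F.scaled_dot_product_attention(q, k, v)
    return out.transpose(1, 2).reshape(b, n, DIM)

def local_attention(x):
    # Blocked local attention: chunks of size WINDOW attend to themselves and
    # their immediate neighbours, giving O(n * WINDOW) cost instead of O(n^2).
    # Boundary chunks see zero-padded neighbours (left unmasked for brevity);
    # assumes WINDOW divides n.
    b, n, _ = x.shape
    c = n // WINDOW
    q = k = v = (x.view(b, c, WINDOW, HEADS, DIM // HEADS)
                  .permute(0, 3, 1, 2, 4))           # (b, heads, c, W, head_dim)
    pad = torch.zeros_like(k[:, :, :1])
    prev_k = torch.cat([pad, k[:, :, :-1]], dim=2)   # chunk i-1
    next_k = torch.cat([k[:, :, 1:], pad], dim=2)    # chunk i+1
    k3 = torch.cat([prev_k, k, next_k], dim=3)       # (b, heads, c, 3W, head_dim)
    out = F.scaled_dot_product_attention(q, k3, k3)  # q == k == v in this sketch
    return out.permute(0, 2, 3, 1, 4).reshape(b, n, DIM)

@torch.no_grad()
def latency(fn, n, reps=10):
    x = torch.randn(1, n, DIM)
    fn(x)  # warm-up
    t0 = time.perf_counter()
    for _ in range(reps):
        fn(x)
    return (time.perf_counter() - t0) / reps

for n in [512, 1024, 2048, 4096, 8192]:
    t_full, t_local = latency(full_attention, n), latency(local_attention, n)
    print(f"n={n:5d}  full={t_full * 1e3:7.1f} ms  local={t_local * 1e3:7.1f} ms")
```

At short lengths the local variant's chunking overhead can make it slower than the dense baseline; the crossover length where it wins is the kind of tipping point the paper measures, and in practice it shifts with hardware, metric (latency vs. memory) and modality.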


Related research

02/23/2022
FastRPB: a Scalable Relative Positional Encoding for Long Sequence Tasks
Transformers achieve remarkable performance in various domains, includin...

09/29/2022
Dilated Neighborhood Attention Transformer
Transformers are quickly becoming one of the most heavily applied deep l...

05/16/2020
Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory
Transformer-based acoustic modeling has achieved great success for both...

12/21/2020
Sub-Linear Memory: How to Make Performers SLiM
The Transformer architecture has revolutionized deep learning on sequent...

11/17/2020
s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis
Neural end-to-end text-to-speech (TTS), which adopts either a recurrent...

05/28/2021
An Attention Free Transformer
We introduce Attention Free Transformer (AFT), an efficient variant of T...
