AttMEMO: Accelerating Transformers with Memoization on Big Memory Systems

01/23/2023
by Yuan Feng, et al.

Transformer models have gained popularity because of their superior inference accuracy and throughput. However, transformers are computation-intensive, leading to long inference times. Existing work on accelerating transformer inference is limited by either modifications to the transformer architecture or the need for specialized hardware. In this paper, we identify opportunities to use memoization to accelerate the self-attention mechanism in transformers without these limitations. Built upon the observation that there is rich similarity in attention computation across inference sequences, we build a memoization database that leverages emerging big memory systems. We introduce a novel embedding technique that finds semantically similar inputs in order to identify computation similarity. We also introduce a series of techniques, such as memory mapping and selective memoization, to avoid memory copies and unnecessary overhead. We enable a 22% reduction in inference latency with negligible loss in inference accuracy.
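To make the idea of attention memoization concrete, the sketch below shows one way a similarity-based cache over attention outputs could work, assuming a cosine-similarity lookup over a cheap embedding of each input. The names (MemoCache, embed_key, memoized_attention), the mean-pooling embedding, and the similarity threshold are illustrative assumptions, not the paper's actual implementation or API.

    # Hypothetical sketch of attention memoization; names and the embedding
    # scheme are assumptions for illustration, not AttMEMO's actual code.
    import numpy as np

    class MemoCache:
        """Stores embeddings of past attention inputs and their attention outputs."""
        def __init__(self, threshold: float = 0.95):
            self.keys = []        # low-dimensional embeddings of inputs
            self.values = []      # cached attention outputs
            self.threshold = threshold

        def lookup(self, key_emb: np.ndarray):
            """Return a cached output whose key is cosine-similar enough, else None."""
            for k, v in zip(self.keys, self.values):
                sim = float(k @ key_emb /
                            (np.linalg.norm(k) * np.linalg.norm(key_emb) + 1e-8))
                if sim >= self.threshold:
                    return v
            return None

        def insert(self, key_emb: np.ndarray, output: np.ndarray):
            self.keys.append(key_emb)
            self.values.append(output)

    def attention(q, k, v):
        """Standard scaled dot-product attention (NumPy reference)."""
        scores = q @ k.T / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

    def embed_key(x: np.ndarray) -> np.ndarray:
        """Hypothetical cheap embedding: mean-pool the token representations."""
        return x.mean(axis=0)

    def memoized_attention(x, wq, wk, wv, cache: MemoCache):
        """Reuse a cached attention output when a semantically similar input is found."""
        key_emb = embed_key(x)
        cached = cache.lookup(key_emb)
        if cached is not None:
            return cached                     # cache hit: skip the expensive computation
        q, k, v = x @ wq, x @ wk, x @ wv      # cache miss: compute and memoize
        out = attention(q, k, v)
        cache.insert(key_emb, out)
        return out

In a full system, as the abstract suggests, the lookup would run against a memoization database held in big memory (avoiding copies via memory mapping), and memoization would be applied selectively to the layers where reuse pays off rather than to every attention computation.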


Related research

05/30/2023 - Blockwise Parallel Transformer for Long Context Large Models
Transformers have emerged as the cornerstone of state-of-the-art natural...

10/21/2021 - Transformer Acceleration with Dynamic Sparse Attention
Transformers are the mainstream of NLP applications and are becoming inc...

08/29/2021 - TCCT: Tightly-Coupled Convolutional Transformer on Time Series Forecasting
Time series forecasting is essential for a wide range of real-world appl...

03/23/2023 - Primer: Fast Private Transformer Inference on Encrypted Data
It is increasingly important to enable privacy-preserving inference for ...

09/23/2022 - Faith: An Efficient Framework for Transformer Verification on GPUs
Transformer verification draws increasing attention in machine learning ...

10/04/2022 - Memory in humans and deep language models: Linking hypotheses for model augmentation
The computational complexity of the self-attention mechanism in Transfor...

03/22/2021 - DeepViT: Towards Deeper Vision Transformer
Vision transformers (ViTs) have been successfully applied in image class...
