CPSAA: Accelerating Sparse Attention using Crossbar-based Processing-In-Memory Architecture

10/13/2022
by Huize Li, et al.

The attention mechanism spends substantial computational effort on unnecessary calculations, which significantly limits system performance. Sparse attention has been proposed to convert some dense-dense matrix multiplication (DDMM) operations into sampled dense-dense matrix multiplication (SDDMM) and sparse matrix multiplication (SpMM) operations. However, current sparse attention solutions introduce massive off-chip random memory access. We propose CPSAA, a novel crossbar-based, processing-in-memory (PIM) sparse attention accelerator. First, we present a novel attention calculation mode. Second, we design a novel PIM-based sparsity pruning architecture. Finally, we present novel crossbar-based methods. Experimental results show that CPSAA achieves average performance improvements of 89.6X, 32.2X, 17.8X, 3.39X, and 3.84X and energy savings of 755.6X, 55.3X, 21.3X, 5.7X, and 4.9X compared with GPU, FPGA, SANGER, ReBERT, and ReTransformer, respectively.
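To make the DDMM-to-SDDMM/SpMM conversion concrete, here is a minimal NumPy sketch of sparse attention in software. It is an illustration under stated assumptions, not CPSAA's design: the pruning mask is simply taken as an input, the function names `sparse_attention` and `dense_masked_attention` are hypothetical, and nothing about crossbars or in-memory pruning is modeled.

```python
import numpy as np

def sparse_attention(q, k, v, mask):
    """Attention restricted to positions where mask is True.

    q, k, v: (n, d) arrays; mask: (n, n) boolean array with at least
    one True per row (assumption made for this sketch).
    """
    n, d = q.shape
    rows, cols = np.nonzero(mask)          # coordinates of kept scores

    # SDDMM: compute q_i . k_j only at the sampled (i, j) positions.
    scores = np.einsum("ij,ij->i", q[rows], k[cols]) / np.sqrt(d)

    # Row-wise softmax over the kept scores only.
    row_max = np.full(n, -np.inf)
    np.maximum.at(row_max, rows, scores)
    exp = np.exp(scores - row_max[rows])
    row_sum = np.zeros(n)
    np.add.at(row_sum, rows, exp)
    probs = exp / row_sum[rows]

    # SpMM: sparse probability matrix times dense V.
    out = np.zeros((n, v.shape[1]))
    np.add.at(out, rows, probs[:, None] * v[cols])
    return out


def dense_masked_attention(q, k, v, mask):
    """Reference: full DDMM, then mask, softmax, and a dense product."""
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs @ v
```

Both functions should agree on any mask with at least one kept entry per row. The sparse version touches only the kept positions, which is where the savings come from, and its gather-style indexing (`q[rows]`, `k[cols]`, `v[cols]`) hints at the irregular memory access the abstract refers to.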


research
12/17/2020

SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning

The attention mechanism is becoming increasingly popular in Natural Lang...
research
07/21/2020

TCIM: Triangle Counting Acceleration With Processing-In-MRAM Architecture

Triangle counting (TC) is a fundamental problem in graph analysis and ha...
research
09/01/2022

Sparse Attention Acceleration with Synergistic In-Memory Pruning and On-Chip Recomputation

As its core computation, a self-attention mechanism gauges pairwise corr...
research
04/28/2022

Attention Mechanism with Energy-Friendly Operations

Attention mechanism has become the dominant module in natural language p...
research
08/21/2017

GraphR: Accelerating Graph Processing Using ReRAM

This paper presents GRAPHR, the first ReRAM-based graph processing accel...
research
09/20/2022

Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design

Attention-based neural networks have become pervasive in many AI tasks. ...
research
08/22/2022

Performance Modeling Sparse MTTKRP Using Optical Static Random Access Memory on FPGA

Electrical static random memory (E-SRAM) is the current standard for int...
