Softermax: Hardware/Software Co-Design of an Efficient Softmax for Transformers

03/16/2021
by Jacob R. Stevens, et al.

Transformers have transformed the field of natural language processing. Their performance is largely attributed to the use of stacked self-attention layers, each of which consists of matrix multiplies as well as softmax operations. As a result, unlike in other neural networks, the softmax operation accounts for a significant fraction of the total run-time of Transformers. To address this, we propose Softermax, a hardware-friendly softmax design. Softermax consists of base replacement, low-precision softmax computations, and an online normalization calculation. We show that Softermax results in 2.35x the energy efficiency at 0.90x the size of a comparable baseline, with negligible impact on network accuracy.
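The abstract names three ingredients: base replacement, low-precision arithmetic, and an online normalization calculation. The following is a minimal sketch of how those ideas fit together, not the paper's actual design: it assumes the base replacement means using 2^x instead of e^x, models the online normalizer as a single-pass running max/sum (in the spirit of the "Online normalizer calculation for softmax" work cited below), and uses a hypothetical quantize helper with an n_frac_bits parameter as a crude stand-in for the paper's low-precision hardware arithmetic.

```python
import numpy as np

def softermax_like(scores, n_frac_bits=8):
    """Sketch of a hardware-friendly softmax under the assumptions above:
    base-2 exponentials, a one-pass online max/normalizer, and coarse
    fixed-point-style quantization of intermediates."""
    scale = 1 << n_frac_bits  # hypothetical number of fractional bits

    def quantize(x):
        # crude stand-in for low-precision fixed-point arithmetic
        return np.round(x * scale) / scale

    running_max = -np.inf
    running_sum = 0.0
    for s in scores:
        new_max = max(running_max, s)
        # when the running max changes, rescale the accumulated sum
        # so only one pass over the scores is needed (online normalizer)
        running_sum = running_sum * 2.0 ** (running_max - new_max) \
                      + 2.0 ** (s - new_max)
        running_sum = quantize(running_sum)
        running_max = new_max

    # second pass: normalize each base-2 exponential by the running sum
    return np.array([quantize(2.0 ** (s - running_max)) / running_sum
                     for s in scores])

if __name__ == "__main__":
    x = np.array([1.0, 2.5, -0.5, 3.0])
    print(softermax_like(x))             # sums to ~1 up to quantization error
    print(np.exp(x) / np.exp(x).sum())   # standard base-e softmax, for comparison
```

Note that swapping e^x for 2^x changes the output distribution slightly (2^x = e^(x ln 2)); the abstract's claim of negligible accuracy impact suggests the networks tolerate or are adapted to this change, but the exact mechanism is described in the full paper, not here.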

Related research

- Softmax-free Linear Transformers (07/05/2022)
- Sinkformers: Transformers with Doubly Stochastic Attention (10/22/2021)
- ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers (07/07/2023)
- Normalized Attention Without Probability Cage (05/19/2020)
- Convex Bounds on the Softmax Function with Applications to Robustness Verification (03/03/2023)
- Online normalizer calculation for softmax (05/08/2018)
- SwiftTron: An Efficient Hardware Accelerator for Quantized Transformers (04/08/2023)
