
HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression

11/30/2022
by   Jiaqi Gu, et al.
The University of Texas at Austin

Transformers have attained superior performance in natural language processing and computer vision. However, their self-attention and feedforward layers are overparameterized, limiting inference speed and energy efficiency. Tensor decomposition is a promising technique to reduce parameter redundancy by leveraging tensor algebraic properties to express the parameters in a factorized form. Prior efforts used manual or heuristic factorization settings without hardware-aware customization, resulting in poor hardware efficiency and large performance degradation. In this work, we propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions and automates the choice of tensorization shape and decomposition rank with hardware-aware co-optimization. We jointly investigate tensor contraction path optimizations and a fused Einsum mapping strategy to bridge the gap between theoretical benefits and real hardware efficiency improvement. Our two-stage knowledge distillation flow resolves the trainability bottleneck and thus significantly boosts the final accuracy of factorized Transformers. Overall, we experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss and achieve a better efficiency-accuracy Pareto frontier than hand-tuned and heuristic baselines.
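
To make the factorized-form idea concrete, here is a minimal sketch (not the paper's implementation; the two-core factorization, tensorization shapes, and rank below are illustrative assumptions) of a feed-forward weight stored as two small tensor cores and applied with a single einsum contraction, the kind of Einsum workload whose contraction path and mapping HEAT co-optimizes:

    # Illustrative sketch only: a linear layer whose dense weight is replaced
    # by two small cores contracted with torch.einsum. Shapes and rank are
    # assumed for the example, not taken from the paper.
    import torch

    d_in, d_out, rank = 768, 3072, 64        # e.g. a BERT feed-forward layer
    i1, i2 = 24, 32                          # tensorization of d_in:  768  = 24 * 32
    o1, o2 = 48, 64                          # tensorization of d_out: 3072 = 48 * 64

    # Two small cores implicitly define the (d_out, d_in) weight.
    core_a = torch.randn(o1, i1, rank) * 0.02   # (48, 24, r)
    core_b = torch.randn(o2, i2, rank) * 0.02   # (64, 32, r)

    def factorized_linear(x):
        """y = x W^T, with W given by contracting the two cores over rank r."""
        b = x.shape[0]
        x4 = x.reshape(b, i1, i2)                        # fold the input dimension
        # Sum over the folded input indices (i, j) and the shared rank index r.
        y4 = torch.einsum("bij,air,cjr->bac", x4, core_a, core_b)
        return y4.reshape(b, d_out)

    x = torch.randn(8, d_in)
    print(factorized_linear(x).shape)                    # torch.Size([8, 3072])

    dense_params = d_in * d_out                          # 2,359,296
    fact_params = core_a.numel() + core_b.numel()        # 73,728 + 131,072 = 204,800
    print(dense_params, fact_params)

With these illustrative shapes the two cores hold roughly 205K parameters versus about 2.36M for the dense weight. The order in which the einsum operands are contracted determines the actual FLOPs and memory traffic, which is why contraction path optimization and fused Einsum mapping matter for translating such theoretical savings into real hardware efficiency.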


