Vision Transformer Compression with Structured Pruning and Low Rank Approximation

03/25/2022
by Ankur Kumar, et al.

The Transformer architecture has gained popularity due to its ability to scale with large datasets. Consequently, there is a need to reduce model size and latency, especially for on-device deployment. We focus on the vision transformer proposed for image recognition (Dosovitskiy et al., 2021) and explore the application of different compression techniques, such as low rank approximation and pruning, for this purpose. Specifically, we investigate a structured pruning method proposed recently in Zhu et al. (2021) and find that it mostly prunes the feedforward blocks, and does so with severe degradation in accuracy. To mitigate this, we propose a hybrid compression approach in which we compress the attention blocks using low rank approximation and apply the aforementioned pruning, at a lower rate, to the feedforward blocks in each transformer layer. Our technique results in a 50% increase in classification error, whereas we obtain a 44% relative increase in error when only pruning is applied. We propose further enhancements to bridge the accuracy gap but leave them as future work.
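As a rough illustration of the low-rank step described above, the sketch below factorizes a linear projection (such as a ViT attention block's query, key, or value projection) into two smaller linear layers via truncated SVD. This is a minimal PyTorch example under stated assumptions, not the paper's implementation: the helper name low_rank_factorize, the 768-dimensional layer, and the rank of 128 are illustrative choices only.

```python
# Minimal sketch: replace a dense projection with a rank-r factorization.
# Assumptions (not from the paper): layer size 768, rank 128, PyTorch.
import torch
import torch.nn as nn


def low_rank_factorize(linear: nn.Linear, rank: int) -> nn.Sequential:
    """Approximate `linear` by two smaller linear layers whose product is
    the best rank-`rank` approximation of its weight matrix."""
    W = linear.weight.data                      # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]                # (out_features, rank), singular values absorbed
    V_r = Vh[:rank, :]                          # (rank, in_features)

    first = nn.Linear(linear.in_features, rank, bias=False)
    second = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)
    first.weight.data.copy_(V_r)
    second.weight.data.copy_(U_r)
    if linear.bias is not None:
        second.bias.data.copy_(linear.bias.data)
    return nn.Sequential(first, second)


# Usage: approximate a 768-dim projection (typical of ViT-Base) at rank 128.
# Parameter count drops from 768*768 to 2*768*128.
proj = nn.Linear(768, 768)
approx = low_rank_factorize(proj, rank=128)
x = torch.randn(4, 768)
print(torch.norm(proj(x) - approx(x)) / torch.norm(proj(x)))  # relative approximation error
```

The rank controls the trade-off between compression and fidelity; in the hybrid approach this factorization would be applied to the attention blocks while the feedforward blocks are compressed with structured pruning at a lower rate.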

Related research:

06/20/2023 - LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
Transformer models have achieved remarkable results in various natural l...

11/02/2021 - Low-Rank+Sparse Tensor Compression for Neural Networks
Low-rank tensor compression has been proposed as a promising approach to...

03/19/2020 - Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression
In this paper, we analyze two popular network compression techniques, i....

05/24/2022 - Compression-aware Training of Neural Networks using Frank-Wolfe
Many existing Neural Network pruning approaches either rely on retrainin...

02/27/2023 - Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding
Self-supervised speech representation learning (SSL) has shown to be eff...

08/24/2021 - Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation
In this thesis, we introduce Greenformers, a collection of model efficie...

01/13/2023 - GOHSP: A Unified Framework of Graph and Optimization-based Heterogeneous Structured Pruning for Vision Transformer
The recently proposed Vision transformers (ViTs) have shown very impress...