A Highly Effective Low-Rank Compression of Deep Neural Networks with Modified Beam-Search and Modified Stable Rank

11/30/2021
by Moonjung Eo, et al.

Compression has emerged as an essential deep learning research topic, especially for edge devices with limited computation power and storage capacity. Among the main compression techniques, low-rank compression via matrix factorization has been known to suffer from two problems: it requires extensive tuning, and the resulting compression performance is typically unimpressive. In this work, we propose a low-rank compression method that uses a modified beam search for automatic rank selection and a modified stable rank for compression-friendly training. The resulting BSR (Beam-search and Stable Rank) algorithm requires tuning only a single hyperparameter for the desired compression ratio. In terms of the accuracy versus compression-ratio trade-off curve, BSR is superior to previously known low-rank compression methods, and it performs on par with or better than state-of-the-art structured pruning methods. As with pruning, BSR can easily be combined with quantization for additional compression.
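For readers unfamiliar with the mechanics, the following is a minimal PyTorch sketch, not the authors' implementation, of the primitive that methods like BSR build on: a linear layer's weight matrix is replaced by a rank-r product obtained from a truncated SVD, and the standard (unmodified) stable rank, the quantity the paper's modified variant is based on, is computed as ||W||_F^2 / ||W||_2^2. The function names factorize_linear and stable_rank and the chosen rank are illustrative assumptions; in the paper, the modified beam search would select per-layer ranks automatically.

```python
# Minimal sketch (not the paper's code): truncated-SVD low-rank compression
# of a linear layer, plus the standard stable rank that the paper's
# "modified stable rank" regularizer builds on. Names are illustrative.
import torch
import torch.nn as nn


def stable_rank(W: torch.Tensor) -> float:
    """Standard stable rank: ||W||_F^2 / ||W||_2^2 (always <= rank(W))."""
    s = torch.linalg.svdvals(W)          # singular values, descending
    return float((s ** 2).sum() / s[0] ** 2)


def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace an (out x in) Linear with two Linears: in->r and r->out.

    Parameter count drops from out*in to r*(out+in), a saving whenever
    r < out*in / (out+in).
    """
    W = layer.weight.data                          # shape (out, in)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    Ur = U[:, :rank] * S[:rank]                    # (out, r), singular values absorbed
    Vr = Vh[:rank, :]                              # (r, in)

    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = Vr
    second.weight.data = Ur
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)


# Usage: compress one layer and inspect the approximation error.
layer = nn.Linear(512, 512)
print("stable rank before:", stable_rank(layer.weight.data))
compressed = factorize_linear(layer, rank=64)
x = torch.randn(8, 512)
print("max abs output error:", (layer(x) - compressed(x)).abs().max().item())
```

On top of this primitive, a rank-selection search such as the paper's modified beam search would compare candidate per-layer rank assignments against the accuracy and compression-ratio trade-off, rather than fixing a single rank by hand as done here.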


Related research

06/25/2023 · Low-Rank Prune-And-Factorize for Language Model Compression
The components underpinning PLMs – large weight matrices – were shown to...

02/20/2018 · DeepThin: A Self-Compressing Library for Deep Neural Networks
As the industry deploys increasingly large and complex neural networks t...

10/30/2018 · DeepTwist: Learning Model Compression via Occasional Weight Distortion
Model compression has been introduced to reduce the required hardware re...

03/22/2023 · Low Rank Optimization for Efficient Deep Learning: Making A Balance between Compact Architecture and Fast Training
Deep neural networks have achieved great success in many data processing...

08/04/2020 · PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning
Lossy gradient compression has become a practical tool to overcome the c...

11/21/2022 · Learning Low-Rank Representations for Model Compression
Vector Quantization (VQ) is an appealing model compression method to obt...

10/06/2015 · Structured Transforms for Small-Footprint Deep Learning
We consider the task of building compact deep learning pipelines suitabl...
