Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation

09/11/2020
by Murad Tukan, et al.

A common technique for compressing a neural network is to compute the rank-k ℓ_2 approximation A_k,2 of the matrix A∈ℝ^n×d that corresponds to a fully connected layer (or embedding layer). Here, d is the number of neurons in the layer, n is the number in the next one, and A_k,2 can be stored in O((n+d)k) memory instead of O(nd). This ℓ_2-approximation minimizes the sum, over every entry of the matrix A - A_k,2, of its absolute value raised to the power p=2, among all matrices A_k,2∈ℝ^n×d of rank k. While it can be computed efficiently via SVD, the ℓ_2-approximation is known to be very sensitive to outliers ("far-away" rows). Hence, machine learning uses, e.g., Lasso regression, ℓ_1-regularization, and ℓ_1-SVM, which rely on the ℓ_1-norm. This paper suggests replacing the rank-k ℓ_2 approximation with an ℓ_p approximation, for p∈[1,2]. We then provide practical and provable approximation algorithms to compute it for any p≥1, based on modern techniques in computational geometry. Extensive experimental results on the GLUE benchmark for compressing BERT, DistilBERT, XLNet, and RoBERTa confirm this theoretical advantage. For example, our approach achieves 28% compression of RoBERTa's embedding layer with only a 0.63% additive drop in accuracy (without fine-tuning), averaged over all tasks in GLUE, compared to an 11% drop using the existing ℓ_2-approximation. Open code is provided for reproducing and extending our results.
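The paper's provable algorithms are in the linked open code; as a rough illustration only, the following minimal numpy sketch contrasts the SVD-based rank-k ℓ_2 solution with a common iteratively-reweighted-least-squares (IRLS) heuristic for the rank-k ℓ_p approximation, p∈[1,2]. This is not the paper's method: the function names and parameters (lp_low_rank, n_iters, eps) are illustrative assumptions, and IRLS carries no approximation guarantee.

    # Minimal sketch (assumptions labeled): rank-k l2 approximation via
    # truncated SVD, versus an IRLS heuristic for the rank-k l_p cost.
    # For p < 2, rows with large residuals are down-weighted, so outlier
    # rows pull the subspace less than in the l2 (SVD) solution.
    import numpy as np

    def l2_low_rank(A, k):
        """Best rank-k approximation in the l2 sense via SVD.
        Returns factors L (n x k) and R (k x d): O((n+d)k) storage."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        return U[:, :k] * s[:k], Vt[:k, :]

    def lp_low_rank(A, k, p=1.0, n_iters=50, eps=1e-8):
        """Heuristic rank-k l_p approximation (not the paper's provable
        algorithm): iteratively reweighted least squares over the rows."""
        n, d = A.shape
        w = np.ones(n)
        for _ in range(n_iters):
            # Weighted l2 subproblem: SVD of the row-rescaled matrix.
            _, _, Vt = np.linalg.svd(np.sqrt(w)[:, None] * A,
                                     full_matrices=False)
            V = Vt[:k, :]                      # current rank-k row space
            B = A @ V.T @ V                    # project rows of A onto it
            r = np.linalg.norm(A - B, axis=1)  # per-row residuals
            w = np.maximum(r, eps) ** (p - 2)  # reweight; small for outliers
        return A @ V.T, V                      # factors, O((n+d)k) storage

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        A = rng.standard_normal((500, 64)) @ rng.standard_normal((64, 100))
        A[:5] += 100 * rng.standard_normal((5, 100))   # inject outlier rows
        for name, (L, R) in [("l2", l2_low_rank(A, 10)),
                             ("l1", lp_low_rank(A, 10, p=1.0))]:
            res = np.linalg.norm(A - L @ R, axis=1)
            print(name, "median row error:", np.median(res))

On such synthetic data with a few corrupted rows, the ℓ_1-style fit typically yields a lower median per-row error than the SVD fit, which is the robustness effect the abstract describes.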


