Cascaded Projection: End-to-End Network Compression and Acceleration

03/12/2019
by Breton Minnehan, et al.

We propose a data-driven approach for deep convolutional neural network compression that achieves high accuracy with high throughput and low memory requirements. Current network compression methods either find a low-rank factorization of the features, which requires additional memory, or select only a subset of features by pruning entire filter channels. We propose the Cascaded Projection (CaP) compression method, which projects the output and input filter channels of successive layers onto a unified low-dimensional space using a low-rank projection. The projection is optimized to minimize both the classification loss and the difference between the next layer's features in the compressed and uncompressed networks. To solve this non-convex optimization problem, we propose a new technique that optimizes a proxy matrix using backpropagation and Stochastic Gradient Descent (SGD) with geometric constraints. The cascaded projection approach improves all critical aspects of network compression: accuracy, memory consumption, parameter count, and processing speed. CaP achieves state-of-the-art results compressing VGG16 and ResNet networks, with more than a 4x reduction in the number of computations and strong top-5 accuracy on the ImageNet dataset both before and after fine-tuning.
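The abstract describes three ingredients: a shared low-rank projection applied to the output channels of one layer and the input channels of the next, a loss that combines classification error with the difference between next-layer features of the compressed and uncompressed networks, and a proxy-matrix optimization kept orthonormal by a geometric constraint. The sketch below illustrates that idea in PyTorch; it is not the authors' code, and the names (`orthonormalize`, `cap_loss`, `compress_layer_pair`) and the mixing weight `alpha` are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F


def orthonormalize(proxy):
    # Project the proxy matrix onto an orthonormal basis via QR decomposition,
    # one simple way to realize the geometric constraint mentioned in the abstract.
    q, _ = torch.linalg.qr(proxy)
    return q


def cap_loss(feat_orig, feat_comp, logits_comp, labels, alpha=1.0):
    # Loss sketched from the abstract: difference between the next layer's
    # features in the compressed and uncompressed networks, plus classification
    # loss. `alpha` is a hypothetical mixing weight, not taken from the paper.
    recon = F.mse_loss(feat_comp, feat_orig)
    cls = F.cross_entropy(logits_comp, labels)
    return recon + alpha * cls


def compress_layer_pair(w_l, w_next, rank):
    """Compress two successive conv layers with a shared low-rank projection.

    w_l:    (C_out, C_in, k, k)   weights of layer l
    w_next: (C_next, C_out, k, k) weights of layer l+1
    rank:   number of channels kept in the shared low-dimensional space
    """
    c_out = w_l.shape[0]
    proxy = torch.randn(c_out, rank, requires_grad=True)
    # ... in the full method the proxy would first be optimized with SGD and
    #     backpropagation against cap_loss; that loop is omitted here ...
    P = orthonormalize(proxy.detach())  # (C_out, rank) with P^T P = I
    # Mix the output channels of layer l down to `rank` channels.
    w_l_c = torch.einsum('or,oikl->rikl', P, w_l)
    # Project the input channels of layer l+1 into the same low-dimensional space.
    w_next_c = torch.einsum('or,nokl->nrkl', P, w_next)
    return w_l_c, w_next_c
```

After the projection, layer l emits only `rank` feature channels and layer l+1 consumes only `rank` input channels, which is why a single shared projection reduces parameters, memory, and computation at the same time.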

