Make $\ell_1$ Regularization Effective in Training Sparse CNN

05/18/2021
by   Juncai He, et al.

Compressed sensing with $\ell_1$ regularization is among the most powerful and popular sparsification techniques in many applications, so why has it not been used to obtain sparse deep learning models such as convolutional neural networks (CNNs)? This paper aims to answer that question and to show how to make it work. Following Xiao (J Mach Learn Res 11(Oct):2543–2596, 2010), we first demonstrate that stochastic gradient descent (SGD) and its variants, the commonly used training algorithms, are not an appropriate match for $\ell_1$ regularization, and we then replace them with a training algorithm based on the regularized dual averaging (RDA) method. The RDA method of Xiao (2010) was originally designed for convex problems, but with new theoretical insight and algorithmic modifications (proper initialization and adaptivity), we make it an effective match for $\ell_1$ regularization on highly non-convex CNNs, achieving state-of-the-art sparsity compared with other weight-pruning methods without compromising accuracy (for example, 95% sparsity for ResNet-18 on CIFAR-10).
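The paper's own modifications (initialization and adaptivity) are not reproduced here, but the core idea can be illustrated with the plain $\ell_1$-RDA update from Xiao (2010) that the abstract builds on: instead of stepping along the current stochastic gradient, each iterate is computed in closed form from the running average of all past gradients, with a soft-threshold at the regularization level $\lambda$ that sets small coordinates to exactly zero. Below is a minimal NumPy sketch on a toy separable quadratic; the function name, the toy objective, and the parameter values are illustrative choices, not the paper's.

```python
import numpy as np

def l1_rda_update(gbar, t, lam, gamma):
    """Closed-form l1-RDA step (Xiao, 2010): soft-threshold the running
    average gradient at lam, then scale by -sqrt(t)/gamma.  Any coordinate
    whose average gradient magnitude is below lam becomes exactly zero."""
    shrunk = np.sign(gbar) * np.maximum(np.abs(gbar) - lam, 0.0)
    return -(np.sqrt(t) / gamma) * shrunk

# Toy illustration on f(x) = 0.5 * ||x - c||^2 + lam * ||x||_1.
# The large first coordinate survives (shrunk by lam); the small
# second coordinate is driven to an exact zero, not merely a tiny value.
c = np.array([2.0, 0.01])
lam, gamma = 0.1, 1.0
x = np.zeros_like(c)
gbar = np.zeros_like(c)
for t in range(1, 501):
    g = x - c                          # gradient of the smooth part at x
    gbar = ((t - 1) * gbar + g) / t    # running average of past gradients
    x = l1_rda_update(gbar, t, lam, gamma)

print(x)  # first coordinate near 1.9 (= 2.0 - lam); second is exactly 0.0
```

This exact sparsity is the contrast with SGD-style updates, where the subgradient of the $\ell_1$ term only nudges weights toward zero and they rarely land on it; RDA's thresholded closed-form solution zeroes them outright at every step.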


Related research

- 08/21/2020, Training Sparse Neural Networks using Compressed Sensing: Pruning the weights of neural networks is an effective and widely-used t...
- 07/11/2018, Modified Regularized Dual Averaging Method for Training Sparse Convolutional Neural Networks: We proposed a modified regularized dual averaging method for training sp...
- 07/07/2023, Smoothing the Edges: A General Framework for Smooth Optimization in Sparse Regularization using Hadamard Overparametrization: This paper introduces a smooth method for (structured) sparsity in ℓ_q a...
- 04/07/2020, Orthant Based Proximal Stochastic Gradient Method for ℓ_1-Regularized Optimization: Sparsity-inducing regularization problems are ubiquitous in machine lear...
- 06/26/2018, The Sparse Recovery Autoencoder: Linear encoding of sparse vectors is widely popular, but is most commonl...
- 09/30/2022, Sparse tree-based initialization for neural networks: Dedicated neural network (NN) architectures have been designed to handle...
- 09/18/2018, Enhanced 3DTV Regularization and Its Applications on Hyper-spectral Image Denoising and Compressed Sensing: The 3-D total variation (3DTV) is a powerful regularization term, which ...
