Parsimonious Deep Learning: A Differential Inclusion Approach with Global Convergence

05/23/2019
by Yanwei Fu, et al.

Over-parameterization is ubiquitous nowadays in training neural networks, benefiting both optimization, by easing the search for global optima, and generalization, by reducing prediction error. However, compact networks are desired in many real-world applications, and directly training small networks may get trapped in poor local optima. In this paper, instead of pruning or distilling an over-parameterized model into a compact one, we propose a parsimonious learning approach based on differential inclusions of inverse scale spaces, which generates a family of models from simple to complex with better efficiency and interpretability than stochastic gradient descent in exploring the model space. It admits a simple discretization, the Split Linearized Bregman Iteration, with provable global convergence: from any initialization, the algorithmic iterates converge to a critical point of the empirical risk. The proposed method can be exploited to grow the complexity of a neural network progressively. Numerical experiments on MNIST, CIFAR-10/100, and ImageNet show that the method is promising for training large-scale models with favorable interpretability.
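The abstract names the Split Linearized Bregman Iteration (SplitLBI) as the discretization of the underlying differential inclusion. As a rough illustration of how such an iteration couples a dense weight vector W with a sparse companion variable Gamma along a regularization path, below is a minimal numpy sketch on a toy sparse linear-regression problem. The toy loss, the hyper-parameter values (alpha, kappa, nu), and the function names (split_lbi, soft_threshold) are illustrative assumptions based on the standard SplitLBI formulation in the literature, not the authors' reference implementation.

```python
# Minimal sketch of Split Linearized Bregman Iteration (SplitLBI) on a toy
# sparse linear-regression loss. W is trained by (scaled) gradient descent,
# while its sparse "shadow" Gamma is driven through an l1 proximal map,
# tracing a path from simple (very sparse) to complex models.
import numpy as np

def soft_threshold(z, lam=1.0):
    """Proximal map of the l1 norm: shrinks entries toward zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def split_lbi(X, y, n_iters=500, alpha=1e-3, kappa=10.0, nu=1.0):
    """Run SplitLBI on least squares; returns the dense W, sparse Gamma, and the path of Gamma."""
    n, d = X.shape
    W = np.zeros(d)      # model parameters
    Gamma = np.zeros(d)  # sparse companion variable selecting important weights
    Z = np.zeros(d)      # dual variable accumulating the coupling signal
    path = []
    for _ in range(n_iters):
        # Gradients of the augmented loss L(W) + ||W - Gamma||^2 / (2 * nu)
        grad_fit = X.T @ (X @ W - y) / n       # data-fitting gradient
        grad_W = grad_fit + (W - Gamma) / nu   # coupling pulls W toward Gamma
        grad_Gamma = (Gamma - W) / nu
        # SplitLBI updates: gradient step on W, Bregman-style step on (Z, Gamma)
        W = W - kappa * alpha * grad_W
        Z = Z - alpha * grad_Gamma
        Gamma = kappa * soft_threshold(Z, 1.0)
        path.append(Gamma.copy())
    return W, Gamma, path

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))
    w_true = np.zeros(50)
    w_true[:5] = 3.0                       # only 5 informative features
    y = X @ w_true + 0.1 * rng.standard_normal(200)
    W, Gamma, path = split_lbi(X, y)
    print("nonzeros in Gamma:", np.count_nonzero(Gamma))
```

In this sketch the support of Gamma grows as the iteration proceeds, which is the sense in which the method generates a family of models from simple to complex ones.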


Related research

- DessiLBI: Exploring Structural Sparsity of Deep Networks via Differential Inclusion Paths (07/04/2020)
  Over-parameterization is ubiquitous nowadays in training neural networks...
- Improved Overparametrization Bounds for Global Convergence of Stochastic Gradient Descent for Shallow Neural Networks (01/28/2022)
  We study the overparametrization bounds required for the global converge...
- Towards Understanding the Importance of Noise in Training Neural Networks (09/07/2019)
  Numerous empirical evidence has corroborated that the noise plays a cruc...
- Economical ensembles with hypernetworks (07/25/2020)
  Averaging the predictions of many independently trained neural networks ...
- Qualitatively characterizing neural network optimization problems (12/19/2014)
  Training neural networks involves solving large-scale non-convex optimiz...
- Convergence Analysis for Training Stochastic Neural Networks via Stochastic Gradient Descent (12/17/2022)
  In this paper, we carry out numerical analysis to prove convergence of a...
- Neural Networks with Few Multiplications (10/11/2015)
  For most deep learning algorithms training is notoriously time consuming...
