Redundancy in active paths of deep networks: a random active path model

05/02/2017
by Haiping Huang, et al.

Deep learning has become a powerful and popular tool for a wide range of machine learning tasks, yet understanding its mechanism from a theoretical perspective remains extremely challenging. In this work, we study the robustness of a deep network's generalization ability against the removal of a given number of connections between layers. A critical value of this number separates a robust (redundant) regime from a sensitive regime. This empirical behavior is captured qualitatively by a random active path model, in which paths from input to output are constructed randomly and independently; the empirical critical value corresponds to the termination of a paramagnetic phase in this model. The model also provides a qualitative understanding of the drop probability used in the DropConnect algorithm and of its relationship to the redundancy phenomenon. Finally, we combine DropConnect on the forward pass with random feedback alignment on the backward pass during training, and observe faster learning and improved test performance when classifying a benchmark handwritten-digit dataset.
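As a rough illustration of the training scheme described above (a minimal sketch, not the authors' implementation), the snippet below combines DropConnect masking on the forward pass with random feedback alignment on the backward pass for a one-hidden-layer network on synthetic data. All layer sizes, the learning rate, and the drop probability `p` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, lr, p = 784, 256, 10, 0.01, 0.5
W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_hid, n_in))
W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hid), (n_out, n_hid))
# Fixed random feedback matrix used in place of W2.T on the backward pass.
B = rng.normal(0.0, 1.0 / np.sqrt(n_out), (n_hid, n_out))

for t in range(1000):
    x = rng.normal(size=n_in)               # synthetic input (stand-in for a digit image)
    y = np.eye(n_out)[rng.integers(n_out)]  # synthetic one-hot target
    # DropConnect: independently drop each weight of W1 with probability p.
    mask = rng.random(W1.shape) >= p
    h = np.maximum(0.0, (W1 * mask) @ x)    # ReLU hidden activity
    err = W2 @ h - y                        # squared-error gradient at the output
    # Random feedback alignment: carry the error back through the fixed
    # matrix B rather than through W2.T (no weight transport).
    delta_h = (B @ err) * (h > 0.0)
    W2 -= lr * np.outer(err, h)
    W1 -= lr * np.outer(delta_h, x) * mask  # update only the sampled connections
```

The two design points this sketch is meant to show: the feedback matrix B is drawn once and never updated, so the backward pass does not need the transpose of the forward weights; and the DropConnect mask is resampled at every step, with the gradient applied only to the connections that were active on that step.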
