Unreasonable Effectiveness of Learning Neural Networks: From Accessible States and Robust Ensembles to Basic Algorithmic Schemes

05/20/2016
by Carlo Baldassi et al.

In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare, but extremely dense and accessible, regions of configurations in the network weight space. We define a novel measure, which we call the "robust ensemble" (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models, and also provide a general algorithmic scheme that is straightforward to implement: define a cost function given by a sum of a finite number of replicas of the original cost function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful new algorithms, ranging from Markov chains to message passing to gradient-descent processes, in which the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.
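For concreteness, the scheme described above amounts to replacing the original cost E(w) with a replicated cost of the form E_R(w^1, ..., w^y; w*) = sum_a [ E(w^a) + gamma * d(w^a, w*) ], where w* is the driving (reference) assignment, d is a distance between configurations, and gamma controls how strongly the replicas are centered. The exact distance and update rules depend on the algorithm (Monte Carlo, message passing, or gradient descent). The sketch below is a minimal, hypothetical illustration of the gradient-descent variant on a toy continuous landscape; it is not the authors' implementation, and the landscape, coupling, and schedule are assumptions made for this example.

```python
# Illustrative sketch only (not the authors' code): "replicated" gradient
# descent in which y coupled replicas of a toy cost function are attracted to a
# shared center (the "driving assignment"), so the dynamics favors wide, dense
# regions of low cost over isolated sharp minima. The toy landscape, the
# quadratic coupling, and the gamma schedule are hypothetical choices.
import numpy as np

rng = np.random.default_rng(0)

def cost(w):
    # Toy rugged 1-D landscape: a broad parabola with many local traps.
    return 0.05 * w**2 + np.sin(6.0 * w)

def grad_cost(w):
    return 0.1 * w + 6.0 * np.cos(6.0 * w)

y = 7          # number of replicas
lr = 0.01      # gradient-descent step size
steps = 2000

replicas = rng.uniform(-4.0, 4.0, size=y)   # independent initial configurations
center = replicas.mean()                    # the driving assignment

for t in range(steps):
    # Gradually strengthen the coupling, in the spirit of the paper's
    # "focusing" protocols that increase the interaction over time.
    gamma = 0.1 + 2.0 * t / steps
    # Each replica descends its own cost plus a quadratic attraction to the center.
    replicas -= lr * (grad_cost(replicas) + gamma * (replicas - center))
    # The center is pulled toward the mean of the replicas (elastic coupling).
    center -= lr * gamma * y * (center - replicas.mean())

print("center:", round(center, 3), "cost(center):", round(cost(center), 3))
print("replica spread:", round(replicas.std(), 3))
```

As the coupling grows, the replicas collapse onto the center, which ends up in a wide, low-cost region rather than in an isolated narrow minimum; in the paper this idea is instantiated for discrete-weight networks with suitable distances and with Monte Carlo, message-passing, and gradient-based updates.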

Related research:

09/18/2015 - Subdominant Dense Clusters Allow for Simple Learning and High Computational Performance in Neural Networks with Discrete Synapses
10/26/2017 - On the role of synaptic stochasticity in training low-precision neural networks
05/26/2022 - Avoiding Barren Plateaus with Classical Deep Neural Networks
05/29/2019 - How to iron out rough landscapes and get optimal performances: Replicated Gradient Descent and its application to tensor PCA
07/17/2019 - On the geometry of solutions and on the capacity of multi-layer neural networks with ReLU activations
02/23/2020 - Automatic Cost Function Learning with Interpretable Compositional Networks
