Representation mitosis in wide neural networks

06/07/2021
by Diego Doimo, et al.

Deep neural networks (DNNs) defy the classical bias-variance trade-off: adding parameters to a DNN that exactly interpolates its training data will typically improve its generalisation performance. Explaining the mechanism behind the benefit of such over-parameterisation is an outstanding challenge for deep learning theory. Here, we study the last-layer representation of various deep architectures such as Wide-ResNets for image classification and find evidence for an underlying mechanism that we call *representation mitosis*: if the last hidden representation is wide enough, its neurons tend to split into groups which carry identical information and differ from each other only by statistically independent noise. As in a mitosis process, the number of such groups, or "clones", increases linearly with the width of the layer, but only if the width is above a critical value. We show that a key ingredient to activate mitosis is continuing the training process until the training error is zero. Finally, we show that in one of the learning tasks we considered, a wide model with several automatically developed clones performs significantly better than a deep ensemble based on architectures in which the last layer has the same size as the clones.
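To make the "clones" picture concrete, here is a minimal sketch (not the authors' code) of how one might probe for clone structure: split the last hidden layer into disjoint groups of neurons and check whether a linear probe trained on each group alone reaches roughly the same accuracy. The activations and labels below are synthetic placeholders that merely mimic the "identical information plus independent noise" picture; in practice they would come from the last hidden layer of a trained network (e.g. a Wide-ResNet) evaluated on held-out data.

```python
# Hypothetical clone-probing sketch with placeholder data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, width, n_classes, n_groups = 2000, 512, 10, 4
labels = rng.integers(0, n_classes, size=n_samples)

# Placeholder activations: a class-dependent signal shared by all neurons
# plus independent per-neuron noise (an assumption, standing in for real
# last-layer activations).
signal = rng.standard_normal((n_classes, width))
activations = signal[labels] + rng.standard_normal((n_samples, width))

# Split the layer into equally sized groups of neurons and train a linear
# probe on each group separately.
for g, cols in enumerate(np.array_split(np.arange(width), n_groups)):
    X_tr, X_te, y_tr, y_te = train_test_split(
        activations[:, cols], labels, test_size=0.3, random_state=0
    )
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"group {g}: probe accuracy = {probe.score(X_te, y_te):.3f}")

# If the groups behave like clones, the per-group accuracies should be close
# to one another (and to the accuracy of a probe on the full layer).
```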


