Do Deep Learning Models Have Too Many Parameters? An Information Theory Viewpoint

02/20/2018
by Léonard Blier, et al.

Deep learning models often have more parameters than observations, and still perform well. This is sometimes described as a paradox. In this work, we show experimentally that despite their huge number of parameters, deep neural networks can compress the data losslessly even when taking the cost of encoding the parameters into account. Such a compression viewpoint originally motivated the use of variational methods in neural networks. However, we show that these variational methods provide surprisingly poor compression bounds, despite being explicitly built to minimize such bounds. This might explain the relatively poor practical performance of variational methods in deep learning. Better encoding methods, imported from the Minimum Description Length (MDL) toolbox, yield much better compression values on deep networks, corroborating the hypothesis that good compression on the training set correlates with good test performance.
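
To make the comparison concrete, consider the two codelengths at stake. A variational code spends roughly E_{θ∼q}[−log p(data | θ)] + KL(q ‖ prior) bits, so the parameter cost appears explicitly as the KL term. Prequential (online) coding, a standard tool from the MDL literature, instead encodes each observation with the model trained on the observations seen so far, so the parameters are never transmitted separately. The sketch below illustrates prequential coding on a toy binary sequence; the Bernoulli predictor with Laplace smoothing is an illustrative stand-in of ours, not the models used in the paper.

import math

def prequential_codelength(seq):
    """Prequential (online) codelength, in bits, of a binary sequence.

    Each symbol is encoded with the predictor fitted to all *previous*
    symbols, then the predictor is updated, so no separate parameter
    cost is ever paid.
    """
    counts = [1, 1]  # Laplace smoothing: one virtual count per symbol
    bits = 0.0
    for x in seq:
        p = counts[x] / sum(counts)  # predictive probability of this symbol
        bits += -math.log2(p)        # ideal code length for this symbol
        counts[x] += 1               # update the model after encoding
    return bits

data = [0, 0, 1, 0, 0, 0, 1, 0] * 100  # 800 symbols, 25% ones
print(f"prequential: {prequential_codelength(data):.1f} bits")
print(f"uniform    : {len(data):.1f} bits")  # baseline: 1 bit per symbol

The gap between the prequential codelength and the uniform baseline is the compression achieved; the same accounting can be carried out with a neural network as the sequential predictor, which is the kind of measurement the paper reports for deep models.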
