A Study of the Mathematics of Deep Learning

04/28/2021
by Anirbit Mukherjee, et al.

"Deep Learning"/"Deep Neural Nets" is a technological marvel that is now increasingly deployed at the cutting-edge of artificial intelligence tasks. This dramatic success of deep learning in the last few years has been hinged on an enormous amount of heuristics and it has turned out to be a serious mathematical challenge to be able to rigorously explain them. In this thesis, submitted to the Department of Applied Mathematics and Statistics, Johns Hopkins University we take several steps towards building strong theoretical foundations for these new paradigms of deep-learning. In chapter 2 we show new circuit complexity theorems for deep neural functions and prove classification theorems about these function spaces which in turn lead to exact algorithms for empirical risk minimization for depth 2 ReLU nets. We also motivate a measure of complexity of neural functions to constructively establish the existence of high-complexity neural functions. In chapter 3 we give the first algorithm which can train a ReLU gate in the realizable setting in linear time in an almost distribution free set up. In chapter 4 we give rigorous proofs towards explaining the phenomenon of autoencoders being able to do sparse-coding. In chapter 5 we give the first-of-its-kind proofs of convergence for stochastic and deterministic versions of the widely used adaptive gradient deep-learning algorithms, RMSProp and ADAM. This chapter also includes a detailed empirical study on autoencoders of the hyper-parameter values at which modern algorithms have a significant advantage over classical acceleration based methods. In the last chapter 6 we give new and improved PAC-Bayesian bounds for the risk of stochastic neural nets. This chapter also includes an experimental investigation revealing new geometric properties of the paths in weight space that are traced out by the net during the training.
