Generalization by design: Shortcuts to Generalization in Deep Learning

by   Petr Taborsky, et al.

We take a geometrical viewpoint and present a unifying view on supervised deep learning with the Bregman divergence loss function - this entails frequent classification and prediction tasks. Motivated by simulations we suggest that there is principally no implicit bias of vanilla stochastic gradient descent training of deep models towards "simpler" functions. Instead, we show that good generalization may be instigated by bounded spectral products over layers leading to a novel geometric regularizer. It is revealed that in deep enough models such a regularizer enables both, extreme accuracy and generalization, to be reached. We associate popular regularization techniques like weight decay, drop out, batch normalization, and early stopping with this perspective. Backed up by theory we further demonstrate that "generalization by design" is practically possible and that good generalization may be encoded into the structure of the network. We design two such easy-to-use structural regularizers that insert an additional generalization layer into a model architecture, one with a skip connection and another one with drop-out. We verify our theoretical results in experiments on various feedforward and convolutional architectures, including ResNets, and datasets (MNIST, CIFAR10, synthetic data). We believe this work opens up new avenues of research towards better generalizing architectures.


page 1

page 2

page 3

page 4


Drop-Activation: Implicit Parameter Reduction and Harmonic Regularization

Overfitting frequently occurs in deep learning. In this paper, we propos...

Robust Implicit Regularization via Weight Normalization

Overparameterized models may have many interpolating solutions; implicit...

Train longer, generalize better: closing the generalization gap in large batch training of neural networks

Background: Deep learning models are typically trained using stochastic ...

Implicit Regularization of Stochastic Gradient Descent in Natural Language Processing: Observations and Implications

Deep neural networks with remarkably strong generalization performances ...

Deep Learning: Generalization Requires Deep Compositional Feature Space Design

Generalization error defines the discriminability and the representation...

Mitigating the Effect of Dataset Bias on Training Deep Models for Chest X-rays

Deep learning has gained tremendous attention on CAD (Computer-aided Dia...

Constrained Deep Learning using Conditional Gradient and Applications in Computer Vision

A number of results have recently demonstrated the benefits of incorpora...

Please sign up or login with your details

Forgot password? Click here to reset