## 1 Introduction

Implicit biases introduced by optimization algorithms play a crucial role in learning deep neural networks (neyshabur2015search; neyshabur2015path; hochreiter1997flat; keskar2016large; chaudhari2016entropy; dinh2017sharp; andrychowicz2016learning; neyshabur2017geometry; zhang2017understanding; wilson2017marginal; hoffer2017train; Smith2018). Large-scale neural networks used in practice are highly over-parameterized, with far more trainable parameters than training examples. Consequently, the optimization objectives for learning such high-capacity models have many global minima that fit the training data perfectly. However, minimizing the training loss with a specific optimization algorithm takes us not to just any global minimum, but to special global minima, e.g., global minima that minimize some regularizer.
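A classical instance of this phenomenon, in the simpler linear setting, is that gradient descent on an under-determined least-squares problem, initialized at zero, converges to the minimum-ℓ2-norm solution among all interpolating solutions — i.e., the algorithm itself acts as an ℓ2 regularizer. A minimal NumPy sketch (the dimensions, seed, and learning rate are illustrative choices, not from the text):

```python
import numpy as np

# Under-determined least squares: 5 examples, 20 parameters,
# so infinitely many w satisfy Xw = y exactly ("global minima").
rng = np.random.default_rng(0)
n, d = 5, 20
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Gradient descent on L(w) = 0.5 * ||Xw - y||^2, initialized at zero.
w = np.zeros(d)
lr = 0.01  # below 2 / lambda_max(X^T X) for this problem size
for _ in range(50_000):
    w -= lr * X.T @ (X @ w - y)

# Every iterate stays in the row space of X, so gradient descent
# converges to the minimum-l2-norm interpolating solution X^+ y.
w_min_norm = np.linalg.pinv(X) @ y
print(np.allclose(X @ w, y, atol=1e-6))        # fits the data exactly
print(np.allclose(w, w_min_norm, atol=1e-6))   # and is the min-norm fit
```

The point of the sketch: all interpolating solutions achieve zero training loss, yet the optimizer deterministically picks out one of them, and which one depends on the algorithm and its initialization.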