Implicit Bias of Gradient Descent on Linear Convolutional Networks

06/01/2018 ∙ by Suriya Gunasekar, et al. ∙ University of Southern California ∙ Toyota Technological Institute at Chicago

We show that gradient descent on full-width linear convolutional networks of depth L converges to a linear predictor related to the ℓ_{2/L} bridge penalty in the frequency domain. This is in contrast to linear fully connected networks, where gradient descent converges to the hard margin linear support vector machine solution, regardless of depth.




1 Introduction

Implicit biases introduced by optimization algorithms play a crucial role in learning deep neural networks (neyshabur2015search; neyshabur2015path; hochreiter1997flat; keskar2016large; chaudhari2016entropy; dinh2017sharp; andrychowicz2016learning; neyshabur2017geometry; zhang2017understanding; wilson2017marginal; hoffer2017train; Smith2018). Large-scale neural networks used in practice are highly over-parameterized, with far more trainable model parameters than training examples. Consequently, optimization objectives for learning such high-capacity models have many global minima that fit the training data perfectly. However, minimizing the training loss using a specific optimization algorithm takes us not to just any global minimum but to some special global minimum, e.g., a global minimum minimizing some regularizer