
Revisiting "Qualitatively Characterizing Neural Network Optimization Problems"

by Jonathan Frankle, et al.

We revisit and extend the experiments of Goodfellow et al. (2014), who showed that - for then state-of-the-art networks - "the objective function has a simple, approximately convex shape" along the linear path between initialization and the trained weights. We do not find this to be the case for modern networks on CIFAR-10 and ImageNet. Instead, although loss is roughly monotonically non-increasing along this path, it remains high until close to the optimum. In addition, training quickly becomes linearly separated from the optimum by loss barriers. We conclude that, although Goodfellow et al.'s findings describe the "relatively easy to optimize" MNIST setting, behavior is qualitatively different in modern settings.
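The experiment described above evaluates the loss along the straight line in parameter space between the random initialization and the trained weights, theta(alpha) = (1 - alpha) * theta_init + alpha * theta_final. A minimal sketch of that procedure, using a toy least-squares loss on synthetic data as a stand-in for a network's training loss (all names here — `loss`, `w_init`, `w_final` — are illustrative, not from the paper's code):

```python
import numpy as np

# Toy stand-in for a network's training loss: least squares on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
w_true = rng.normal(size=10)
y = X @ w_true

def loss(w):
    """Mean squared error of parameters w on the synthetic data."""
    return float(np.mean((X @ w - y) ** 2))

w_init = rng.normal(size=10)  # stand-in for the random initialization
w_final = w_true              # stand-in for the trained (optimal) weights

# Evaluate the loss along the linear path theta(alpha) = (1-a)*init + a*final.
alphas = np.linspace(0.0, 1.0, 25)
path_losses = [loss((1 - a) * w_init + a * w_final) for a in alphas]
```

On a convex toy objective like this one, the curve decreases smoothly toward the optimum; the paper's point is that for modern networks the analogous curve stays high until alpha is close to 1.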



