Do optimization methods in deep learning applications matter?

02/28/2020
by Buse Melis Ozyildirim, et al.

With advances in deep learning, exponential data growth, and increasing model complexity, the development of efficient optimization methods is attracting much research attention. Many implementations favor Conjugate Gradient (CG) and Stochastic Gradient Descent (SGD) as practical and elegant ways to achieve fast convergence; however, these optimizers also exhibit significant limitations across deep learning applications. Recent research explores higher-order optimization methods as potentially better approaches, but these pose severe computational challenges for practical use. Comparing first- and higher-order optimization methods, our experiments reveal that Levenberg-Marquardt (LM) achieves significantly better convergence but suffers from very long processing times, which increases the training cost of both classification and reinforcement learning problems. Our experiments compare off-the-shelf optimization methods (CG, SGD, LM, and L-BFGS) on standard CIFAR, MNIST, CartPole, and FlappyBird benchmarks. The paper presents arguments on which optimization methods to use and, further, which would benefit from parallelization efforts to improve pretraining time and convergence rate.
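As a minimal sketch (not the paper's implementation), the snippet below illustrates how two of the off-the-shelf optimizers mentioned above, first-order SGD and quasi-Newton L-BFGS, can be swapped into an otherwise identical PyTorch training loop on MNIST. The model architecture, learning rates, and single-epoch loop are illustrative assumptions; Levenberg-Marquardt is not provided by torch.optim and is omitted here.

```python
# Hedged sketch: comparing SGD and L-BFGS on MNIST in PyTorch.
# All hyperparameters and the tiny model are assumptions for illustration.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def train_one_epoch(optimizer_name="sgd"):
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128),
                          nn.ReLU(), nn.Linear(128, 10))
    loss_fn = nn.CrossEntropyLoss()
    data = DataLoader(
        datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor()),
        batch_size=128, shuffle=True)

    if optimizer_name == "sgd":
        opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    else:
        # L-BFGS re-evaluates the loss several times per step via a closure
        opt = torch.optim.LBFGS(model.parameters(), lr=0.1, max_iter=10)

    for x, y in data:
        def closure():
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            return loss
        # Both optimizers accept a closure; SGD simply calls it once,
        # while L-BFGS uses it for its internal line search.
        opt.step(closure)

train_one_epoch("sgd")
train_one_epoch("lbfgs")
```

The closure pattern is what makes the optimizers interchangeable here: first-order methods only need one gradient evaluation per step, whereas quasi-Newton methods may call the closure repeatedly, which is where their extra per-step cost comes from.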


Related research

09/04/2019 - Quasi-Newton Optimization Methods For Deep Learning Applications
03/11/2019 - Accelerating Minibatch Stochastic Gradient Descent using Typicality Sampling
04/11/2020 - Exploit Where Optimizer Explores via Residuals
11/08/2019 - MindTheStep-AsyncPSGD: Adaptive Asynchronous Parallel Stochastic Gradient Descent
12/09/2015 - Efficient Distributed SGD with Variance Reduction
12/20/2019 - Second-order Information in First-order Optimization Methods
09/21/2017 - Neural Optimizer Search with Reinforcement Learning
