Quasi-Newton Optimization Methods For Deep Learning Applications

09/04/2019
by Jacob Rafati, et al.

Deep learning algorithms often require solving a highly nonlinear and nonconvex unconstrained optimization problem. Methods for solving optimization problems in large-scale machine learning, such as deep learning and deep reinforcement learning (RL), are generally restricted to the class of first-order algorithms, like stochastic gradient descent (SGD). While SGD iterates are inexpensive to compute, they have slow theoretical convergence rates and require exhaustive trial-and-error to tune many learning parameters. Using second-order curvature information to compute search directions can yield more robust convergence on nonconvex problems; however, computing the Hessian matrix is not computationally practical for large-scale problems. Quasi-Newton methods instead construct an approximation to the Hessian matrix and use it to build a quadratic model of the objective function. Like SGD, quasi-Newton methods require only first-order gradient information, yet they can achieve superlinear convergence, which makes them attractive alternatives to SGD. The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method is one of the most popular quasi-Newton methods; it constructs positive-definite Hessian approximations. In this chapter, we propose efficient optimization methods based on L-BFGS, using both line-search and trust-region strategies. Our methods bridge the gap between first- and second-order methods by using gradient information to compute low-rank updates to Hessian approximations. We provide a formal convergence analysis of these methods as well as empirical results on deep learning applications, such as image classification tasks and deep reinforcement learning on a set of ATARI 2600 video games. Our results show robust convergence with favorable generalization characteristics, as well as fast training times.
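To make the L-BFGS idea concrete, the sketch below shows the standard two-loop recursion (Nocedal and Wright, Algorithm 7.4), which computes a search direction p = -H_k g_k from the m most recent curvature pairs (s_i, y_i) without ever forming the Hessian approximation H_k explicitly. This is a minimal NumPy illustration of the textbook recursion, not the authors' implementation; the names `two_loop_direction` and `history` are ours.

```python
# Minimal sketch of the L-BFGS two-loop recursion.
# history holds the m most recent pairs (s, y), oldest first, where
# s = x_{k+1} - x_k and y = grad_{k+1} - grad_k.
import numpy as np

def two_loop_direction(grad, history):
    """Return the L-BFGS search direction -H_k @ grad."""
    q = grad.copy()
    cache = []
    # First loop: newest pair to oldest.
    for s, y in reversed(history):
        rho = 1.0 / y.dot(s)
        alpha = rho * s.dot(q)
        q -= alpha * y
        cache.append((rho, alpha, s, y))
    # Initial approximation H_k^0 = gamma * I, with the usual scaling
    # gamma = s^T y / y^T y from the most recent pair.
    if history:
        s, y = history[-1]
        gamma = s.dot(y) / y.dot(y)
    else:
        gamma = 1.0
    r = gamma * q
    # Second loop: oldest pair to newest.
    for rho, alpha, s, y in reversed(cache):
        beta = rho * y.dot(r)
        r += (alpha - beta) * s
    return -r  # descent direction -H_k @ grad
```

In a line-search variant, this direction is combined with a step length satisfying the (strong) Wolfe conditions, which guarantees s^T y > 0 and keeps the Hessian approximation positive definite; a trust-region variant instead minimizes the quadratic model g^T p + (1/2) p^T B_k p subject to ||p|| <= delta_k, using the same stored curvature pairs.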

