GPU Accelerated Sub-Sampled Newton's Method

02/26/2018
by Sudhir B. Kylasa, et al.

First-order methods, which rely solely on gradient information, are commonly used in diverse machine learning (ML) and data analysis (DA) applications. This is attributed to the simplicity of their implementations, as well as their low per-iteration computational and storage costs. However, they suffer from significant disadvantages; most notably, their performance degrades with increasing problem ill-conditioning. Furthermore, they often involve a large number of hyper-parameters and are notoriously sensitive to parameters such as the step size. By incorporating additional information from the Hessian, second-order methods have been shown to be resilient to many such adversarial effects. However, these advantages of using curvature information come at the cost of higher per-iteration costs, which in big-data regimes can be computationally prohibitive. In this paper, we show that, contrary to conventional belief, second-order methods, when implemented appropriately, can be more efficient than first-order alternatives in many large-scale ML/DA applications. In particular, in convex settings, we consider variants of classical Newton's method in which the Hessian and/or the gradient are randomly sub-sampled. We show that by effectively leveraging the power of GPUs, such randomized Newton-type algorithms can be significantly accelerated and can easily outperform state-of-the-art implementations of existing techniques in popular ML/DA software packages such as TensorFlow. Additionally, these randomized methods incur only a small memory overhead compared to first-order methods. In particular, we show that for million-dimensional problems, our GPU-accelerated sub-sampled Newton's method achieves higher test accuracy in milliseconds, as compared with tens of seconds for first-order alternatives.
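To make the idea concrete, below is a minimal NumPy sketch of a sub-sampled Newton iteration for L2-regularized logistic regression, assuming the Newton direction is computed from a Hessian estimated on a random subsample and solved approximately with a matrix-free conjugate gradient solver. This is not the paper's GPU implementation; the sampling scheme, inner solver, and names such as `subsample_fraction` are illustrative assumptions.

```python
# Sketch of a sub-sampled Newton method (Hessian sub-sampling only) for
# L2-regularized logistic regression. Illustrative only; the paper's GPU
# implementation and algorithmic details are not reproduced here.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def full_gradient(w, X, y, lam):
    # Gradient of the average logistic loss over the full data set, plus L2 term.
    p = sigmoid(X @ w)
    return X.T @ (p - y) / X.shape[0] + lam * w

def subsampled_hessian_vec(w, X, y, lam, idx):
    # Returns a function v -> H_S v, where H_S is the Hessian estimated
    # on the random subsample indexed by `idx`.
    Xs = X[idx]
    p = sigmoid(Xs @ w)
    d = p * (1.0 - p)                      # diagonal term of the logistic Hessian
    def hv(v):
        return Xs.T @ (d * (Xs @ v)) / len(idx) + lam * v
    return hv

def conjugate_gradient(hv, b, tol=1e-6, max_iter=100):
    # Approximately solve H_S x = b using matrix-free conjugate gradient.
    x = np.zeros_like(b)
    r = b - hv(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Hp = hv(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def subsampled_newton(X, y, lam=1e-4, subsample_fraction=0.05,
                      step_size=1.0, iters=20, rng=np.random.default_rng(0)):
    n, d = X.shape
    w = np.zeros(d)
    s = max(1, int(subsample_fraction * n))
    for _ in range(iters):
        g = full_gradient(w, X, y, lam)
        idx = rng.choice(n, size=s, replace=False)     # Hessian subsample
        hv = subsampled_hessian_vec(w, X, y, lam, idx)
        direction = conjugate_gradient(hv, g)          # approximate Newton direction
        w -= step_size * direction                     # a line search could be used here
    return w
```

Because the dominant costs in such a scheme are dense matrix-vector products, one could in principle move a sketch like this to the GPU by swapping NumPy for a GPU array library such as CuPy; the performance results reported in the paper, however, are specific to its own GPU implementation.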

