Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method

05/22/2017
by   Mark Eisen, et al.

We consider large scale empirical risk minimization (ERM) problems, where both the problem dimension and the number of samples are large. In these cases, most second-order methods are infeasible due to the high cost of both computing the Hessian over all samples and computing its inverse in high dimensions. In this paper, we propose a novel adaptive sample size second-order method, which reduces the cost of computing the Hessian by solving a sequence of ERM problems corresponding to a subset of samples, and lowers the cost of computing the Hessian inverse using a truncated eigenvalue decomposition. We show that while we geometrically increase the size of the training set at each stage, a single iteration of the truncated Newton method is sufficient to solve the new ERM within its statistical accuracy. Moreover, for a large number of samples we are allowed to double the size of the training set at each stage, and the proposed method subsequently reaches the statistical accuracy of the full training set after approximately two effective passes. In addition to this theoretical result, we show empirically on a number of well-known data sets that the proposed truncated adaptive sample size algorithm outperforms stochastic alternatives for solving ERM problems.

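The abstract outlines the two key ingredients: grow the training subset geometrically, and at each stage apply a single Newton step whose Hessian inverse is approximated by a truncated eigenvalue decomposition. Below is a minimal, illustrative sketch of that idea (not the authors' implementation), assuming an L2-regularized logistic regression loss as the ERM instance; the rank parameter k, the initial subset size n0, and all function names are assumptions introduced here for illustration.

```python
import numpy as np

def logistic_loss_grad_hess(w, X, y, lam):
    """Gradient and Hessian of the L2-regularized logistic loss on (X, y), labels in {-1, +1}."""
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-y * z))        # sigma(y_i * x_i^T w)
    g = -(X.T @ ((1.0 - p) * y)) / len(y) + lam * w
    d = p * (1.0 - p)                        # per-sample curvature weights
    H = (X.T * d) @ X / len(y) + lam * np.eye(X.shape[1])
    return g, H

def truncated_newton_step(w, g, H, k):
    """One Newton step using only the top-k eigenpairs of H (truncated inverse)."""
    vals, vecs = np.linalg.eigh(H)           # eigenvalues in ascending order
    vals_k, vecs_k = vals[-k:], vecs[:, -k:]
    return w - vecs_k @ ((vecs_k.T @ g) / vals_k)

def adaptive_sample_truncated_newton(X, y, lam=1e-3, n0=128, k=20):
    """Grow the training subset geometrically; one truncated Newton step per stage."""
    n, dim = X.shape
    w = np.zeros(dim)
    m = min(n0, n)
    while True:
        g, H = logistic_loss_grad_hess(w, X[:m], y[:m], lam)
        w = truncated_newton_step(w, g, H, min(k, dim))
        if m == n:
            break
        m = min(2 * m, n)                    # double the subset size each stage
    return w
```

In this sketch each stage performs one rank-k truncated Newton update on the current subset before the subset size is doubled, mirroring the doubling scheme described above only in spirit; the step-size choice, the statistical-accuracy stopping criterion, and the paper's specific truncation rule are omitted.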

