Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization

09/18/2020
by Pan Zhou, et al.

Stochastic variance-reduced gradient (SVRG) algorithms have been shown to work favorably in solving large-scale learning problems. Despite this remarkable success, the stochastic gradient complexity of SVRG-type algorithms usually scales linearly with data size and thus can still be expensive for huge data. To address this deficiency, we propose a hybrid stochastic-deterministic minibatch proximal gradient (HSDMPG) algorithm for strongly convex problems that enjoys provably improved data-size-independent complexity guarantees. More precisely, for a quadratic loss F(θ) of n components, we prove that HSDMPG can attain an ϵ-optimization-error 𝔼[F(θ)-F(θ^*)]≤ϵ within 𝒪((κ^1.5 ϵ^0.75 log^1.5(1/ϵ)+1)/ϵ ∧ (κ√(n) log^1.5(1/ϵ)+n log(1/ϵ))) stochastic gradient evaluations, where κ is the condition number. For generic strongly convex loss functions, we prove a nearly identical complexity bound, though at the cost of slightly increased logarithmic factors. For large-scale learning problems, our complexity bounds are superior to those of the prior state-of-the-art SVRG algorithms with or without dependence on data size. In particular, in the case of ϵ=𝒪(1/√(n)), which is on the order of the intrinsic excess error bound of a learning model and thus sufficient for generalization, the stochastic gradient complexity bounds of HSDMPG for quadratic and generic loss functions are respectively 𝒪(n^0.875 log^1.5(n)) and 𝒪(n^0.875 log^2.25(n)), which, to the best of our knowledge, achieve optimal generalization in less than a single pass over data for the first time. Extensive numerical results demonstrate the computational advantages of our algorithm over the prior ones.
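The step from the general bound to the n^0.875 rate can be sketched as follows. This is a rough check under two assumptions that the abstract does not state: the first branch of the bound is read as a single fraction over ϵ, and the condition number scales as κ = 𝒪(√n) (as would happen, for example, with a regularization strength on the order of 1/√n).

```latex
% Sketch only. Assumptions (not stated in the abstract):
%   (a) first branch of the bound is (kappa^{1.5} eps^{0.75} log^{1.5}(1/eps) + 1) / eps
%   (b) kappa = O(sqrt{n})
% Setting: eps = 1/sqrt{n}
\begin{align*}
\frac{\kappa^{1.5}\,\epsilon^{0.75}\log^{1.5}(1/\epsilon) + 1}{\epsilon}
  &= \kappa^{1.5}\,\epsilon^{-0.25}\log^{1.5}(1/\epsilon) + \frac{1}{\epsilon} \\
  &= \mathcal{O}\!\left(n^{0.75}\cdot n^{0.125}\,\log^{1.5}(\sqrt{n}) + \sqrt{n}\right) \\
  &= \mathcal{O}\!\left(n^{0.875}\log^{1.5}(n)\right).
\end{align*}
```

Under the same assumptions the second branch, κ√(n) log^1.5(1/ϵ) + n log(1/ϵ), is of order n log^1.5(n), so the minimum is attained by the first branch, whose polynomial factor n^0.875 grows more slowly than n.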

02/19/2018

Generalization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization

The success of deep learning has led to a rising interest in the general...
05/28/2022

Stochastic Gradient Methods with Compressed Communication for Decentralized Saddle Point Problems

We propose two stochastic gradient algorithms to solve a class of saddle...
09/09/2020

Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for Strongly Convex Decentralized Optimization

We study stochastic decentralized optimization for the problem of traini...
11/15/2017

Random gradient extrapolation for distributed and stochastic optimization

In this paper, we consider a class of finite-sum convex optimization pro...
04/22/2021

A Dimension-Insensitive Algorithm for Stochastic Zeroth-Order Optimization

This paper concerns a convex, stochastic zeroth-order optimization (S-ZO...
06/17/2021

Stochastic Bias-Reduced Gradient Methods

We develop a new primitive for stochastic optimization: a low-bias, low-...
09/12/2016

Less than a Single Pass: Stochastically Controlled Stochastic Gradient Method

We develop and analyze a procedure for gradient-based optimization that ...