Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization
Stochastic variance-reduced gradient (SVRG) algorithms have been shown to work favorably in solving large-scale learning problems. Despite this remarkable success, the stochastic gradient complexity of SVRG-type algorithms usually scales linearly with data size and thus can still be expensive for huge data. To address this deficiency, we propose a hybrid stochastic-deterministic minibatch proximal gradient (HSDMPG) algorithm for strongly convex problems that enjoys provably improved, data-size-independent complexity guarantees. More precisely, for a quadratic loss F(θ) of n components, we prove that HSDMPG can attain an ϵ-optimization error 𝔼[F(θ)−F(θ^*)]≤ϵ within 𝒪(κ^1.5ϵ^0.75log^1.5(1/ϵ)+1/ϵ∧(κ√(n)log^1.5(1/ϵ)+nlog(1/ϵ))) stochastic gradient evaluations, where κ is the condition number. For generic strongly convex loss functions, we prove a nearly identical complexity bound, though at the cost of slightly increased logarithmic factors. For large-scale learning problems, our complexity bounds are superior to those of the prior state-of-the-art SVRG algorithms, with or without dependence on data size. In particular, in the case of ϵ=𝒪(1/√(n)), which is on the order of the intrinsic excess error bound of a learning model and thus sufficient for generalization, the stochastic gradient complexity bounds of HSDMPG for quadratic and generic loss functions are respectively 𝒪(n^0.875log^1.5(n)) and 𝒪(n^0.875log^2.25(n)), which to the best of our knowledge achieve, for the first time, optimal generalization in less than a single pass over the data. Extensive numerical results demonstrate the computational advantages of our algorithm over prior ones.
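To make the hybrid stochastic-deterministic idea concrete, the sketch below shows a generic minibatch proximal gradient loop that corrects each stochastic minibatch gradient with a periodically refreshed full (deterministic) gradient anchor, in the spirit of SVRG-style variance reduction. This is an illustrative toy on an L1-regularized least-squares problem, not the paper's actual HSDMPG algorithm or its complexity-optimal schedule; all names and hyperparameters here are the sketch's own assumptions.

```python
# Illustrative hybrid stochastic-deterministic minibatch proximal gradient
# (a simplification, NOT the paper's HSDMPG): minimize
#   F(theta) = (1/2n) * ||A @ theta - b||^2 + lam * ||theta||_1
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def hybrid_prox_sgd(A, b, lam=1e-3, eta=0.1, batch=8,
                    anchor_every=50, iters=600, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    theta = np.zeros(d)
    theta_anchor = theta.copy()
    # Deterministic full-gradient anchor of the smooth part.
    full_grad = A.T @ (A @ theta_anchor - b) / n
    for t in range(iters):
        if t % anchor_every == 0:
            # Periodically refresh the deterministic anchor.
            theta_anchor = theta.copy()
            full_grad = A.T @ (A @ theta_anchor - b) / n
        idx = rng.integers(0, n, size=batch)
        Ai, bi = A[idx], b[idx]
        # Variance-reduced estimate: minibatch gradient at theta,
        # corrected by the minibatch gradient at the anchor plus the
        # full deterministic gradient.
        g = (Ai.T @ (Ai @ theta - bi)
             - Ai.T @ (Ai @ theta_anchor - bi)) / batch + full_grad
        # Proximal step handles the nonsmooth L1 term.
        theta = soft_threshold(theta - eta * g, eta * lam)
    return theta
```

The anchor frequency `anchor_every` controls the stochastic/deterministic trade-off: refreshing rarely keeps per-iteration cost near the minibatch size, while refreshing often drives the gradient-estimate variance down, which is the tension the paper's complexity bounds quantify.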