Stochastic Compositional Gradient Descent: Algorithms for Minimizing Compositions of Expected-Value Functions

11/14/2014
by Mengdi Wang, et al.

Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function involving expected values, or of a composition of two expected-value functions, i.e., problems of the form min_x E_v[f_v(E_w[g_w(x)])]. In order to solve this stochastic composition problem, we propose a class of stochastic compositional gradient descent (SCGD) algorithms that can be viewed as stochastic versions of the quasi-gradient method. The SCGD algorithms update the solution based on noisy sample gradients of f_v and g_w, and use an auxiliary variable to track the unknown quantity E_w[g_w(x)]. We prove that SCGD converges almost surely to an optimal solution of a convex optimization problem, as long as such a solution exists. The convergence involves the interplay of two iterations running on different time scales. For nonsmooth convex problems, SCGD achieves a convergence rate of O(k^-1/4) in the general case and O(k^-2/3) in the strongly convex case, after taking k samples. For smooth convex problems, SCGD can be accelerated to converge at a rate of O(k^-2/7) in the general case and O(k^-4/5) in the strongly convex case. For nonconvex problems, we prove that any limit point generated by SCGD is a stationary point, and we provide the corresponding convergence rate analysis. The stochastic setting in which one wants to optimize a composition of expected-value functions is very common in practice, and the proposed SCGD methods find wide applications in learning, estimation, dynamic programming, etc.
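To make the two-time-scale update concrete, the following is a minimal sketch of the basic SCGD iteration described above (fast auxiliary update tracking E_w[g_w(x)], slow quasi-gradient step in x). The toy problem, the noise model, the helper names sample_g and sample_grad_f, and the step-size exponents are illustrative assumptions for demonstration, not the paper's experiments.

```python
# Minimal SCGD sketch on an assumed toy problem:
#   inner function  g_w(x) = x + w,   so E_w[g_w(x)] = x
#   outer function  f_v(y) = ||y||^2 + v.y,  so E_v[f_v(y)] = ||y||^2
# The composed objective E_v[f_v(E_w[g_w(x)])] = ||x||^2 is minimized at x = 0.
import numpy as np

rng = np.random.default_rng(0)
d = 5

def sample_g(x):
    """Noisy sample of g_w(x) and its Jacobian (identity for this toy g)."""
    w = rng.normal(scale=0.1, size=d)
    return x + w, np.eye(d)

def sample_grad_f(y):
    """Noisy sample of the gradient of f_v at y."""
    v = rng.normal(scale=0.1, size=d)
    return 2.0 * y + v

x = np.ones(d)    # decision variable
y = np.zeros(d)   # auxiliary variable tracking E_w[g_w(x)]

for k in range(1, 10001):
    alpha = 1.0 / k**0.75   # slow time scale for x (illustrative choice)
    beta = 1.0 / k**0.5     # fast time scale for y (illustrative choice)

    g_val, g_jac = sample_g(x)
    y = (1.0 - beta) * y + beta * g_val           # track E_w[g_w(x)]
    x = x - alpha * g_jac.T @ sample_grad_f(y)    # quasi-gradient step

print("approximate minimizer:", x)   # should approach 0 on this toy problem
```

Because the auxiliary variable y is averaged with the larger step size beta, it tracks the inner expectation faster than x moves, which is the two-time-scale interplay referred to in the abstract.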


Related research
09/04/2018

Compositional Stochastic Average Gradient for Machine Learning and Related Applications

Many machine learning, statistical inference, and portfolio optimization...
10/26/2017

Duality-free Methods for Stochastic Composition Optimization

We consider the composition optimization with two expected-value functio...
09/02/2017

A convergence analysis of the perturbed compositional gradient flow: averaging principle and normal deviations

We consider in this work a system of two stochastic differential equatio...
09/07/2018

A Fast Anderson-Chebyshev Mixing Method for Nonlinear Optimization

Anderson mixing (or Anderson acceleration) is an efficient acceleration ...
06/16/2023

Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima

Sharpness-Aware Minimization (SAM) is an optimizer that takes a descent ...
12/14/2020

Noisy Linear Convergence of Stochastic Gradient Descent for CV@R Statistical Learning under Polyak-Łojasiewicz Conditions

Conditional Value-at-Risk (CV@R) is one of the most popular measures of ...
