Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum Minimization
We propose an adaptive variance-reduction method, called AdaSpider, for minimization of L-smooth, non-convex functions with a finite-sum structure. In essence, AdaSpider combines an AdaGrad-inspired [Duchi et al., 2011, McMahan Streeter, 2010], but a fairly distinct, adaptive step-size schedule with the recursive stochastic path integrated estimator proposed in [Fang et al., 2018]. To our knowledge, Adaspider is the first parameter-free non-convex variance-reduction method in the sense that it does not require the knowledge of problem-dependent parameters, such as smoothness constant L, target accuracy ϵ or any bound on gradient norms. In doing so, we are able to compute an ϵ-stationary point with Õ(n + √(n)/ϵ^2) oracle-calls, which matches the respective lower bound up to logarithmic factors.
READ FULL TEXT