Almost Sure Convergence Rates of Stochastic Zeroth-order Gradient Descent for Łojasiewicz Functions
We prove almost sure convergence rates of Stochastic Zeroth-order Gradient Descent (SZGD) algorithms for Łojasiewicz functions. The SZGD algorithm iterates as $x_{t+1} = x_t - \eta_t \widehat{\nabla} f(x_t)$, $t = 0, 1, 2, 3, \cdots$, where $f$ is the objective function satisfying the Łojasiewicz inequality with Łojasiewicz exponent $\theta$, $\eta_t$ is the step size (learning rate), and $\widehat{\nabla} f(x_t)$ is the approximate gradient estimated using zeroth-order information. We show that, for smooth Łojasiewicz functions, the sequence $\{x_t\}_{t \in \mathbb{N}}$ governed by SZGD converges to a bounded point $x_\infty$ almost surely, and $x_\infty$ is a critical point of $f$. If $\theta \in (0, 1/2]$, then $f(x_t) - f(x_\infty)$, $\sum_{s=t}^{\infty} \| x_s - x_\infty \|^2$, and $\| x_t - x_\infty \|$ ($\| \cdot \|$ is the Euclidean norm) converge to zero linearly almost surely. If $\theta \in (1/2, 1)$, then $f(x_t) - f(x_\infty)$ (and $\sum_{s=t}^{\infty} \| x_{s+1} - x_s \|^2$) converges to zero at rate $o\left(t^{\frac{1}{1-2\theta}} \log t\right)$ almost surely, and $\| x_t - x_\infty \|$ converges to zero at rate $o\left(t^{\frac{1-\theta}{1-2\theta}} \log t\right)$ almost surely. To the best of our knowledge, this paper provides the first almost sure convergence rate guarantee for stochastic zeroth-order algorithms for Łojasiewicz functions.
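As an illustration of the kind of iteration the abstract describes, the following is a minimal sketch of an SZGD-style update in Python, assuming a two-point Gaussian-smoothing gradient estimator with smoothing parameter `mu`; the function names and parameters here are illustrative assumptions, not the paper's exact estimator or step-size schedule.

```python
import numpy as np

def szgd(f, x0, eta=0.05, mu=1e-4, n_iters=1000, rng=None):
    """Sketch of stochastic zeroth-order gradient descent (SZGD).

    The gradient of f is never computed exactly; it is estimated from
    function evaluations only, here via a two-point Gaussian-smoothing
    estimator (an illustrative choice, not necessarily the paper's).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for t in range(n_iters):
        u = rng.standard_normal(x.shape)  # random search direction
        # Zeroth-order estimate of the directional derivative along u,
        # scaled by u to form the gradient estimate grad-hat f(x_t).
        grad_est = (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
        x = x - eta * grad_est            # x_{t+1} = x_t - eta_t * grad-hat f(x_t)
    return x

# Example: a smooth Łojasiewicz function (a quadratic, exponent theta = 1/2).
x_inf = szgd(lambda x: np.sum(x ** 2), x0=np.ones(5), n_iters=2000)
```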