Path Length Bounds for Gradient Descent and Flow
We provide path length bounds on gradient descent (GD) and gradient flow (GF) curves for various classes of smooth convex and nonconvex functions. We make six distinct contributions: (a) we prove a meta-theorem that if GD converges linearly towards an optimal set, then its path length is upper bounded by the distance to the optimal set multiplied by a function of the rate of convergence; (b) under the Polyak-Łojasiewicz (PL) condition (a generalization of strong convexity that allows for certain nonconvex functions), we show that this multiplicative factor is at most √κ, where κ is the condition number; (c) we show a lower bound of Ω(√d ∧ κ^(1/4)) times the length of the direct path on the worst-case path length for PL functions, where d is the dimension; (d) for the special case of quadratics, we show that the bound is Θ(min{√d, √κ}) and in some cases can be independent of κ; (e) under the weaker assumption of just convexity, where there is no natural notion of a condition number, we prove that the path length can be at most 2^(10d^2) times the length of the direct path; (f) finally, for separable quasiconvex functions, the path length is both upper and lower bounded by Θ(√d) times the length of the direct path.
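To make the compared quantities concrete, the following is a minimal sketch, not taken from the paper, that measures the GD path length on a diagonal (hence separable) quadratic and compares it with the length of the direct path from the starting point to the minimizer. The function name `gd_path_length`, the step size 1/L, the test function, and the starting point are all illustrative assumptions.

```python
import math


def gd_path_length(hessian_diag, x0, num_steps=5000):
    """Run GD on f(x) = 0.5 * sum(h_i * x_i**2), whose minimizer is the origin.

    Returns (path_length, direct_distance), where path_length is the sum of
    the step lengths ||x_{k+1} - x_k|| over num_steps iterations and
    direct_distance is ||x0 - x*|| = ||x0||.
    """
    smoothness = max(hessian_diag)   # L: largest curvature
    step = 1.0 / smoothness          # assumed step size 1/L
    x = list(x0)
    path_length = 0.0
    for _ in range(num_steps):
        grad = [h * xi for h, xi in zip(hessian_diag, x)]
        x_next = [xi - step * g for xi, g in zip(x, grad)]
        path_length += math.dist(x, x_next)
        x = x_next
    direct_distance = math.hypot(*x0)
    return path_length, direct_distance


if __name__ == "__main__":
    # Illustrative instance: d = 2, condition number kappa = 100.
    path, direct = gd_path_length([1.0, 100.0], [1.0, 1.0])
    print(f"path length     = {path:.4f}")
    print(f"direct distance = {direct:.4f}")
    print(f"ratio           = {path / direct:.4f}")
```

On this instance each coordinate shrinks monotonically towards zero, so the path length cannot exceed the ℓ1 norm of the starting point, which is at most √d times the direct distance. The ratio therefore stays below √2 even though κ = 100, in line with the κ-independent behaviour mentioned for quadratics.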