Backtracking Gradient Descent allowing unbounded learning rates

by Tuyen Trung Truong, et al.

In unconstrained optimisation on a Euclidean space, to prove convergence of Gradient Descent (GD) processes x_{n+1} = x_n − δ_n ∇f(x_n), it is usually required that the learning rates δ_n be bounded: δ_n ≤ δ for some positive δ. Under this assumption, if the sequence x_n converges to a critical point z, then for large n the updates are small, because ||x_{n+1} − x_n|| ≲ ||∇f(x_n)||. This may also force the sequence to converge to a bad minimum. If we can allow, at least theoretically, the learning rates δ_n to be unbounded, then we may obtain better convergence to better minima. A previous joint paper by the author showed convergence for the usual version of Backtracking GD under very general assumptions on the cost function f. In this paper, we allow the learning rates δ_n to be unbounded, in the sense that there is a function h: (0,∞) → (0,∞) with lim_{t→0} t h(t) = 0 such that δ_n ≲ max{h(x_n), δ} for all n satisfying Armijo's condition, and we prove convergence under the same assumptions as in the mentioned paper. It will be shown that this growth rate of h is best possible if one wants convergence of the sequence {x_n}. A specific way of choosing δ_n in a discrete manner connects to the Two-way Backtracking GD defined in the mentioned paper. We provide some results which either improve or are implicitly contained in those of the mentioned paper and of another recent paper on avoidance of saddle points.
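To make the setting concrete, here is a minimal sketch of the usual (bounded) version of Backtracking GD that the paper generalizes: at each step the learning rate is shrunk from a fixed cap δ_0 until Armijo's condition f(x − δ∇f(x)) ≤ f(x) − α δ ||∇f(x)||² holds. The function names, the test objective, and the parameter values below are illustrative, not taken from the paper.

```python
import numpy as np

def backtracking_gd(f, grad_f, x0, delta0=1.0, alpha=0.5, beta=0.5, n_iter=100):
    """Standard Backtracking GD sketch with a fixed learning-rate cap delta0.

    At each iteration, delta is shrunk geometrically (factor beta) until
    Armijo's condition  f(x - delta*g) <= f(x) - alpha*delta*||g||^2  holds,
    then the step x <- x - delta*g is taken.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        g = grad_f(x)
        if np.dot(g, g) == 0.0:  # already at a critical point
            break
        delta = delta0
        # Backtrack until Armijo's condition is satisfied.
        while f(x - delta * g) > f(x) - alpha * delta * np.dot(g, g):
            delta *= beta
        x = x - delta * g
    return x

# Illustrative use on the quadratic f(x) = ||x||^2, whose unique minimum is 0.
x_min = backtracking_gd(lambda x: np.dot(x, x),
                        lambda x: 2.0 * x,
                        x0=[3.0, -4.0])
```

The paper's unbounded variant would replace the fixed cap `delta0` by a cap of the form max{h(x_n), δ}, allowing arbitrarily large admissible learning rates near certain points while still guaranteeing convergence.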


Asymptotic behaviour of learning rates in Armijo's condition

Fix a constant 0<α <1. For a C^1 function f:ℝ^k→ℝ, a point x and a posit...

Almost Sure Convergence Rates of Stochastic Zeroth-order Gradient Descent for Łojasiewicz Functions

We prove almost sure convergence rates of Zeroth-order Gradient Descent ...

Convergence to minima for the continuous version of Backtracking Gradient Descent

The main result of this paper is: Theorem. Let f:R^k→R be a C^1 funct...

Analysis of gradient descent methods with non-diminishing, bounded errors

The main aim of this paper is to provide an analysis of gradient descent...

Learning Unitaries by Gradient Descent

We study the hardness of learning unitary transformations by performing ...

Convergence of online k-means

We prove asymptotic convergence for a general class of k-means algorithm...

Adam Can Converge Without Any Modification on Update Rules

Ever since Reddi et al. 2018 pointed out the divergence issue of Adam, m...