U-Clip: On-Average Unbiased Stochastic Gradient Clipping
U-Clip is a simple amendment to gradient clipping that can be applied to any iterative gradient optimization algorithm. Like regular clipping, U-Clip uses gradients that are clipped to a prescribed size (e.g. with component-wise or norm-based clipping), but instead of discarding the clipped portion of the gradient, U-Clip maintains a buffer of these values that is added to the gradients on the next iteration (before clipping). We show that the cumulative bias of the U-Clip updates is bounded by a constant. This implies that the clipped updates are unbiased on average. Convergence follows via a lemma that guarantees convergence with updates $u_i$ as long as $\sum_{i=1}^{t} (u_i - g_i) = o(t)$, where $g_i$ are the gradients. Extensive experimental exploration is performed on CIFAR10, with further validation given on ImageNet.
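The buffer mechanism described above can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names (`clip_componentwise`, `u_clip_step`, `run_u_clip`), the clip threshold `gamma`, and the toy gradients are all assumptions made for illustration, and only component-wise clipping is shown. The sketch also makes one consequence of the definition visible: because the buffer stores exactly what clipping removed, the cumulative difference between updates and gradients equals minus the current buffer, so bounding the cumulative bias amounts to bounding the buffer.

```python
def clip_componentwise(v, gamma):
    """Clip each component of v to the interval [-gamma, gamma]."""
    return [max(-gamma, min(gamma, x)) for x in v]


def u_clip_step(grad, buffer, gamma):
    """One U-Clip-style step (illustrative sketch).

    Adds the carried-over buffer to the gradient before clipping, then
    stores whatever clipping removed as the new buffer.
    Returns (update, new_buffer).
    """
    carried = [g + b for g, b in zip(grad, buffer)]
    update = clip_componentwise(carried, gamma)
    new_buffer = [c - u for c, u in zip(carried, update)]
    return update, new_buffer


def run_u_clip(grads, gamma):
    """Apply u_clip_step to a sequence of gradients, starting from a zero buffer."""
    buffer = [0.0] * len(grads[0])
    updates = []
    for g in grads:
        u, buffer = u_clip_step(g, buffer, gamma)
        updates.append(u)
    return updates, buffer


# Toy gradients (hypothetical values chosen for illustration only).
grads = [[3.0, -0.5], [0.5, 0.25], [-2.0, 0.0]]
updates, final_buffer = run_u_clip(grads, gamma=1.0)

# Cumulative bias: sum over steps of (u_i - g_i), per component.
cumulative_bias = [
    sum(u[k] - g[k] for u, g in zip(updates, grads))
    for k in range(len(grads[0]))
]
```

In this toy run the first gradient component is clipped from 3.0 to 1.0, and the excess of 2.0 is carried forward and released over the following steps, so that after three steps the updates sum to the same total as the raw gradients. Plain clipping would instead discard that excess permanently.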