Adaptive Online Learning with Varying Norms
Given any increasing sequence of norms $\|\cdot\|_0, \ldots, \|\cdot\|_{T-1}$, we provide an online convex optimization algorithm that outputs points $w_t$ in some domain $W$ in response to convex losses $\ell_t : W \to \mathbb{R}$ and guarantees regret $R_T(u) = \sum_{t=1}^T \ell_t(w_t) - \ell_t(u) \le \tilde{O}\left(\|u\|_{T-1}\sqrt{\sum_{t=1}^T \|g_t\|_{t-1,\star}^2}\right)$, where $g_t$ is a subgradient of $\ell_t$ at $w_t$. Our method does not require tuning to the value of $u$ and allows for arbitrary convex $W$. We apply this result to obtain new "full-matrix"-style regret bounds. Along the way, we provide a new examination of the full-matrix AdaGrad algorithm, suggesting a better learning rate value that improves significantly upon the prior analysis. We use our new techniques to tune AdaGrad on the fly, realizing our improved bound in a concrete algorithm.
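For context, a minimal sketch of the standard full-matrix AdaGrad update that the abstract refers to is given below: it preconditions each gradient step by the inverse square root of the accumulated gradient outer products. The fixed scalar learning rate `eta` and the regularizer `eps` are assumptions for illustration; the paper's contribution concerns choosing this scale better and on the fly, which is not reproduced here.

```python
import numpy as np

def full_matrix_adagrad(grads, eta=1.0, eps=1e-8):
    """Sketch of the full-matrix AdaGrad update (fixed learning rate assumed)."""
    d = grads[0].shape[0]
    G = np.zeros((d, d))               # running sum of gradient outer products
    w = np.zeros(d)
    iterates = []
    for g in grads:
        G += np.outer(g, g)            # G_t = sum_{s <= t} g_s g_s^T
        # Preconditioner: (G_t + eps * I)^{-1/2} via eigendecomposition
        vals, vecs = np.linalg.eigh(G + eps * np.eye(d))
        precond = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
        w = w - eta * precond @ g      # full-matrix preconditioned step
        iterates.append(w.copy())
    return iterates
```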