Stochastic linear optimization never overfits with quadratically-bounded losses on general data

02/14/2022
by   Matus Telgarsky, et al.

This work shows that a diverse collection of linear optimization methods, when run on general data, fails to overfit despite lacking any explicit constraints or regularization: with high probability, their trajectories stay near the curve of optimal constrained solutions over the population distribution. The analysis is powered by an elementary but flexible proof scheme that handles many settings, summarized as follows. First, the data can be general: unlike in other implicit bias works, it need not satisfy large-margin or other structural conditions; moreover, it can arrive sequentially IID, sequentially following a Markov chain, or as a batch, and it can have heavy tails. Second, while the main analysis is for mirror descent, rates are also provided for the Temporal-Difference fixed-point method from reinforcement learning; all prior high-probability analyses in these settings required bounded iterates, bounded updates, bounded noise, or some equivalent. Third, the losses are general: for instance, the logistic and squared losses can be handled simultaneously, unlike in other implicit bias works. In all of these settings, not only is low population error guaranteed with high probability, but low sample complexity is also guaranteed whenever there exists any low-complexity near-optimal solution, even if the global problem structure, and in particular the global optima, have high complexity.
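The setting the abstract describes can be pictured with a minimal sketch: plain stochastic gradient descent (mirror descent under the Euclidean mirror map) on the logistic loss over a linear model, with no projection or explicit penalty, while monitoring the iterate's norm, its distance to a fixed low-norm comparator, and the average logistic loss along the trajectory. Everything below is an illustrative assumption of this sketch, not the paper's construction: the synthetic data model, the comparator `w_ref`, and the step-size schedule are all choices made here for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear classification data (illustrative only; not the paper's setup).
d, n = 20, 5000
w_ref = rng.normal(size=d)
w_ref /= np.linalg.norm(w_ref)                 # a fixed low-norm comparator
X = rng.normal(size=(n, d))
y = np.sign(X @ w_ref + 0.1 * rng.normal(size=n))

def logistic_grad(w, x, yi):
    """Gradient in w of log(1 + exp(-yi <x, w>)), with a numerically stable sigmoid."""
    m = yi * (x @ w)
    if m >= 0:
        s = 1.0 / (1.0 + np.exp(-m))
    else:
        e = np.exp(m)
        s = e / (1.0 + e)
    return -(1.0 - s) * yi * x

# One pass of plain SGD (mirror descent with the Euclidean mirror map):
# no projection, no explicit regularization, one fresh example per step.
w = np.zeros(d)
for t in range(n):
    eta = 0.5 / np.sqrt(t + 1)                 # decaying step size (a choice of this sketch)
    w -= eta * logistic_grad(w, X[t], y[t])
    if (t + 1) % 1000 == 0:
        avg_loss = np.mean(np.logaddexp(0.0, -y * (X @ w)))
        print(f"step {t + 1:5d}  ||w|| = {np.linalg.norm(w):6.3f}  "
              f"dist to comparator = {np.linalg.norm(w - w_ref):6.3f}  "
              f"logistic loss = {avg_loss:6.3f}")
```

Running the script shows the unconstrained iterate remaining at modest norm and close to the low-norm comparator while the loss decreases; the paper's contribution is to prove high-probability guarantees of this flavor for general mirror descent, general data processes, and quadratically-bounded losses, which the sketch does not attempt to reproduce.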

