Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
We develop a new approach to obtaining high-probability regret bounds for online learning with bandit feedback against an adaptive adversary. Whereas existing approaches all require carefully constructed optimistic and biased loss estimators, our approach uses standard unbiased estimators and relies on a simple increasing learning-rate schedule, together with logarithmically homogeneous self-concordant barriers and a strengthened Freedman's inequality. Besides its simplicity, our approach enjoys several advantages. First, the high-probability regret bounds it yields are data-dependent and can be much smaller than the worst-case bounds, resolving an open problem posed by Neu (2015). Second, resolving another open problem of Bartlett et al. (2008) and Abernethy and Rakhlin (2009), our approach leads to the first general and efficient algorithm with a high-probability regret bound for adversarial linear bandits, whereas previous methods are either inefficient or applicable only to specific action sets. Finally, our approach also applies to learning adversarial Markov Decision Processes and yields the first algorithm with a high-probability small-loss bound for this problem.
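The two ingredients named in the abstract, standard unbiased loss estimators and an increasing learning-rate schedule, can be illustrated in the simplest setting. Below is a minimal sketch for adversarial multi-armed bandits with an Exp3-style exponential-weights update. The paper's actual algorithm handles linear bandits and MDPs via online mirror descent with a logarithmically homogeneous self-concordant barrier, and its learning-rate schedule is what drives the high-probability analysis; the function name, growth factor, and trigger condition here are illustrative assumptions, not the paper's method.

import numpy as np

def exp3_increasing_lr(losses, eta0=0.1, growth=1.1, rng=None):
    """Exp3-style learner on a (T, K) array of losses in [0, 1],
    using unbiased importance-weighted estimates and a learning
    rate that only ever increases."""
    rng = np.random.default_rng() if rng is None else rng
    T, K = losses.shape
    cum_est = np.zeros(K)  # cumulative unbiased loss estimates
    eta = eta0
    total = 0.0
    for t in range(T):
        # Exponential weights over the current estimates
        # (subtracting the min only for numerical stability).
        w = np.exp(-eta * (cum_est - cum_est.min()))
        p = w / w.sum()
        arm = rng.choice(K, p=p)
        loss = losses[t, arm]
        total += loss
        # Standard unbiased estimator: est_i = loss * 1{arm = i} / p_i,
        # so E[est] equals the full loss vector losses[t].
        est = np.zeros(K)
        est[arm] = loss / p[arm]
        cum_est += est
        # Illustrative increasing schedule: bump the learning rate when
        # a very unlikely arm was played, i.e. when the estimator's
        # variance proxy (1 / p[arm]) was large. The paper's trigger
        # condition differs.
        if p[arm] < 1.0 / (4 * K):
            eta *= growth
    return total

Note the contrast with prior high-probability analyses: those inject a bias into the estimator (e.g., est[arm] = loss / (p[arm] + gamma)) to cap its range, whereas here the estimator stays unbiased and the increasing learning-rate schedule does the stabilizing work.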