-
More Adaptive Algorithms for Adversarial Bandits
We develop a novel and generic algorithm for the adversarial multi-armed...
read it
-
An Optimal Algorithm for Adversarial Bandits with Arbitrary Delays
We propose a new algorithm for adversarial multi-armed bandits with unre...
read it
-
Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits
We study agents communicating over an underlying network by exchanging m...
read it
-
Small-loss bounds for online learning with partial information
We consider the problem of adversarial (non-stochastic) online learning ...
read it
-
Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
We develop a new approach to obtaining high probability regret bounds fo...
read it
-
Improved Confidence Bounds for the Linear Logistic Model and Applications to Linear Bandits
We propose improved fixed-design confidence bounds for the linear logist...
read it
-
Stochastic Online Learning with Probabilistic Graph Feedback
We consider a problem of stochastic online learning with general probabi...
read it
A Closer Look at Small-loss Bounds for Bandits with Graph Feedback
We study small-loss bounds for the adversarial multi-armed bandits problem with graph feedback, that is, adaptive regret bounds that depend on the loss of the best arm or related quantities, instead of the total number of rounds. We derive the first small-loss bound for general strongly observable graphs, resolving an open problem proposed in (Lykouris et al., 2018). Specifically, we develop an algorithm with regret Õ(√(κ L_*)) where κ is the clique partition number and L_* is the loss of the best arm, and for the special case where every arm has a self-loop, we improve the regret to Õ(min{√(α T), √(κ L_*)}) where α≤κ is the independence number. Our results significantly improve and extend those by Lykouris et al. (2018) who only consider self-aware undirected graphs. Furthermore, we also take the first attempt at deriving small-loss bounds for weakly observable graphs. We first prove that no typical small-loss bounds are achievable in this case, and then propose algorithms with alternative small-loss bounds in terms of the loss of some specific subset of arms. A surprising side result is that Õ(√(T)) regret is achievable even for weakly observable graphs as long as the best arm has a self-loop. Our algorithms are based on the Online Mirror Descent framework but require a suite of novel techniques that might be of independent interest. Moreover, all our algorithms can be made parameter-free without the knowledge of the environment.
READ FULL TEXT
Comments
There are no comments yet.