Bandit algorithms: Letting go of logarithmic regret for statistical robustness

06/22/2020
by   Kumar Ashutosh, et al.
0

We study regret minimization in a stochastic multi-armed bandit setting and establish a fundamental trade-off between the regret suffered under an algorithm, and its statistical robustness. Considering broad classes of underlying arms' distributions, we show that bandit learning algorithms with logarithmic regret are always inconsistent and that consistent learning algorithms always suffer a super-logarithmic regret. This result highlights the inevitable statistical fragility of all `logarithmic regret' bandit algorithms available in the literature—for instance, if a UCB algorithm designed for σ-subGaussian distributions is used in a subGaussian setting with a mismatched variance parameter, the learning performance could be inconsistent. Next, we show a positive result: statistically robust and consistent learning performance is attainable if we allow the regret to be slightly worse than logarithmic. Specifically, we propose three classes of distribution oblivious algorithms that achieve an asymptotic regret that is arbitrarily close to logarithmic.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/16/2011

Regret lower bounds and extended Upper Confidence Bounds policies in stochastic multi-armed bandit problem

This paper is devoted to regret lower bounds in the classical model of s...
research
11/18/2021

From Optimality to Robustness: Dirichlet Sampling Strategies in Stochastic Bandits

The stochastic multi-arm bandit problem has been extensively studied und...
research
06/17/2020

Constrained regret minimization for multi-criterion multi-armed bandits

We consider a stochastic multi-armed bandit setting and study the proble...
research
03/04/2019

Learning Modular Safe Policies in the Bandit Setting with Application to Adaptive Clinical Trials

The stochastic multi-armed bandit problem is a well-known model for stud...
research
09/22/2021

On Optimal Robustness to Adversarial Corruption in Online Decision Problems

This paper considers two fundamental sequential decision-making problems...
research
03/29/2021

Distributed learning in congested environments with partial information

How can non-communicating agents learn to share congested resources effi...
research
02/19/2021

Learning to Persuade on the Fly: Robustness Against Ignorance

We study a repeated persuasion setting between a sender and a receiver, ...

Please sign up or login with your details

Forgot password? Click here to reset