Regret Distribution in Stochastic Bandits: Optimal Trade-off between Expectation and Tail Risk

04/10/2023
by David Simchi-Levi, et al.

We study the trade-off between expectation and tail risk of the regret distribution in the stochastic multi-armed bandit problem. We fully characterize the interplay among three desired properties for policy design: worst-case optimality, instance-dependent consistency, and light-tailed risk. We show how the order of the expected regret exactly determines the decay rate of the regret tail probability in both the worst-case and instance-dependent scenarios. A novel policy is proposed that characterizes the optimal regret tail probability for any regret threshold. Concretely, for any given α∈[1/2, 1) and β∈[0, α], our policy achieves a worst-case expected regret of Õ(T^α) (we call it α-optimal) and an instance-dependent expected regret of Õ(T^β) (we call it β-consistent), while the probability of incurring an Õ(T^δ) regret (δ≥α in the worst-case scenario and δ≥β in the instance-dependent scenario) decays exponentially in a polynomial of T. This decay rate is proved to be the best achievable. Moreover, we discover an intrinsic gap in the optimal tail rate under the instance-dependent scenario between whether the time horizon T is known a priori or not; interestingly, this gap disappears in the worst-case scenario. Finally, we extend our proposed policy design to (1) a stochastic multi-armed bandit setting with non-stationary baseline rewards, and (2) a stochastic linear bandit setting. Our results reveal insights into the trade-off between regret expectation and regret tail risk in both the worst-case and instance-dependent scenarios, indicating that more sub-optimality and inconsistency leave room for lighter-tailed risk of incurring a large regret, and that knowing the planning horizon in advance can make a difference in alleviating tail risk.
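
To make the tail-risk claim concrete, the abstract's guarantees can be summarized in the following schematic form. This is a paraphrase for readability only: the regret notation R_T, the constant c, and the exponent p(α, β, δ) are placeholders introduced here rather than notation taken from the paper, and the exact polynomial rate is the quantity the paper characterizes and proves optimal.

\[
\mathbb{E}[R_T] = \tilde{O}(T^{\alpha}) \ \text{(worst case)}, \qquad
\mathbb{E}[R_T] = \tilde{O}(T^{\beta}) \ \text{(instance-dependent)},
\]
\[
\mathbb{P}\left( R_T \ge c\, T^{\delta} \right) \le \exp\left( -\Omega\left( T^{\,p(\alpha,\beta,\delta)} \right) \right),
\qquad \delta \ge \alpha \ \text{(worst case)}, \quad \delta \ge \beta \ \text{(instance-dependent)},
\]

where T is the time horizon and the polynomial exponent p(α, β, δ) > 0 captures the phrase "decays exponentially in a polynomial of T". Consistent with the abstract's takeaway, larger α and β (more sub-optimality and inconsistency) leave room for a faster tail decay.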
