Optimal Learning for Structured Bandits

07/14/2020
by Bart P. G. Van Parys, et al.

We study structured multi-armed bandits, the problem of online decision-making under uncertainty in the presence of structural information. In this problem, the decision-maker must discover the best course of action despite observing only uncertain rewards over time. The decision-maker knows certain structural information about the reward distributions and would like to minimize regret by exploiting this information, where regret is the performance difference against a benchmark policy that knows the best action ahead of time. In the absence of structural information, the classical UCB and Thompson sampling algorithms are well known to suffer only minimal regret. As recently pointed out, however, neither algorithm is capable of exploiting structural information that is commonly available in practice. We propose a novel learning algorithm, which we call "DUSA", whose worst-case regret matches the information-theoretic regret lower bound up to a constant factor and which can handle a wide range of structural information. DUSA solves a dual counterpart of the regret lower bound at the empirical reward distribution and follows the suggestion made by the dual problem. It is the first computationally viable learning policy for structured bandit problems that suffers asymptotically minimal regret.
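For context, the classical UCB policy that the abstract contrasts against can be sketched in a few lines. The following is an illustrative UCB1 baseline on Bernoulli-reward arms, not the paper's DUSA algorithm; the arm means, horizon, and helper names are assumptions made for the example.

```python
# Illustrative UCB1 index policy for an (unstructured) multi-armed bandit.
# This is the classical baseline mentioned in the abstract, NOT DUSA;
# rewards are assumed Bernoulli for simplicity.
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given means.

    Returns (total_reward, pull_counts)."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k        # number of pulls per arm
    sums = [0.0] * k        # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1     # pull each arm once to initialize
        else:
            # index = empirical mean + exploration bonus
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts
```

Because UCB1 treats arms independently, it cannot exploit structure such as a known relationship between arm means; that gap is what a structure-aware policy like the paper's DUSA is designed to close.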


