Structure Adaptive Algorithms for Stochastic Bandits

07/02/2020
by Rémy Degenne, et al.

We study reward maximisation in a wide class of structured stochastic multi-armed bandit problems, where the mean rewards of the arms satisfy given structural constraints, e.g. linear, unimodal or sparse. Our aim is to develop methods that are flexible (in that they easily adapt to different structures), powerful (in that they perform well empirically and/or provably match instance-dependent lower bounds) and efficient (in that the per-round computational burden is small). We develop asymptotically optimal algorithms from instance-dependent lower bounds using iterative saddle-point solvers. Our approach generalises recent iterative methods for pure exploration to reward maximisation, where a major challenge arises from the estimation of the sub-optimality gaps and their reciprocals. Still, we manage to achieve all of the above desiderata. Notably, our technique avoids the computational cost of the full-blown saddle-point oracle employed by previous work, while at the same time enabling finite-time regret bounds. Our experiments reveal that our method successfully leverages the structural assumptions, while its regret is at worst comparable to that of vanilla UCB.
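For context, one common formulation of the instance-dependent lower bound that this line of work builds on (in the Graves–Lai style; the paper's exact statement may differ in its details) is the following, where \Theta denotes the structure set, \mu the true mean vector with optimal arm a^*(\mu) and gaps \Delta_a, and \mathrm{KL} the Kullback–Leibler divergence between the reward distributions of two arms:

\liminf_{T \to \infty} \frac{\mathbb{E}[R_T]}{\log T} \;\ge\; C(\mu),
\qquad
C(\mu) \;=\; \min_{w \in \mathbb{R}_{\ge 0}^{K}} \;\sum_{a} w_a \Delta_a
\quad \text{s.t.} \quad
\inf_{\lambda \in \Lambda(\mu)} \;\sum_{a} w_a \, \mathrm{KL}(\mu_a, \lambda_a) \;\ge\; 1,

with \Lambda(\mu) = \{\lambda \in \Theta : \lambda_{a^*(\mu)} = \mu_{a^*(\mu)},\ a^*(\lambda) \ne a^*(\mu)\} the set of confusing alternative instances. The outer minimisation over the exploration allocation w and the inner infimum over \lambda form the saddle-point problem that iterative solvers can approximate online, with empirical means plugged in for \mu; the difficulty specific to regret minimisation, alluded to above, is that the objective itself depends on the gaps \Delta_a, which must also be estimated.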
