Adaptive Discretization for Adversarial Bandits with Continuous Action Spaces

06/22/2020
by   Chara Podimata, et al.

Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces such as the [0,1] interval, where similar actions are guaranteed to have similar rewards. A central theme here is the adaptive discretization of the action space, which gradually "zooms in" on the more promising regions thereof. The goal is to take advantage of "nicer" problem instances, while retaining near-optimal worst-case performance. While the stochastic version of the problem is well-understood, the general version with adversarially chosen rewards is not. We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds. In particular, we recover the worst-case optimal regret bound for the adversarial version, and the instance-dependent regret bound for the stochastic version. Further, an application of our algorithm to dynamic pricing (a setting in which the algorithm repeatedly adjusts the price of a product) enjoys these regret bounds without any smoothness assumptions.
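To make the "zooming in" idea concrete, here is a minimal sketch of adaptive discretization on the action space [0,1], written in the style of the classical stochastic zooming approach rather than the adversarial algorithm of this paper: each cell of the partition keeps an empirical mean, and a cell is split once its statistical uncertainty drops below its width, so promising regions are refined while poor regions stay coarse. The function names (`adaptive_discretization`, `reward_fn`), the `lipschitz_slack` parameter, and the i.i.d.-reward assumption are illustrative choices, not part of the paper.

```python
import math
import random

def adaptive_discretization(reward_fn, horizon, lipschitz_slack=1.0):
    """Minimal adaptive-discretization (zooming-style) sketch on [0, 1].

    Illustrates the general idea only; it is NOT the paper's adversarial
    algorithm and assumes stochastic (i.i.d.) rewards in [0, 1]-ish range.
    """
    # Each cell: [left, right, pulls, empirical_mean]
    cells = [[0.0, 1.0, 0, 0.0]]
    for t in range(1, horizon + 1):
        # Optimistic index: empirical mean + confidence radius + Lipschitz slack
        def index(c):
            left, right, n, mean = c
            conf = math.inf if n == 0 else math.sqrt(2 * math.log(t + 1) / n)
            return mean + conf + lipschitz_slack * (right - left)

        cell = max(cells, key=index)
        left, right, n, mean = cell
        action = (left + right) / 2.0          # play the cell's midpoint
        reward = reward_fn(action)             # observe a (noisy) reward
        cell[2] = n + 1
        cell[3] = mean + (reward - mean) / (n + 1)

        # "Zoom in": once uncertainty is smaller than the cell width, split it.
        conf = math.sqrt(2 * math.log(t + 1) / cell[2])
        if conf <= (right - left):
            cells.remove(cell)
            mid = (left + right) / 2.0
            cells.append([left, mid, 0, 0.0])
            cells.append([mid, right, 0, 0.0])
    return cells

# Example: a Lipschitz reward curve peaked at x = 0.7, plus Gaussian noise.
cells = adaptive_discretization(
    lambda x: max(0.0, 1.0 - abs(x - 0.7)) + random.gauss(0, 0.1),
    horizon=5000,
)
print(sorted((round(l, 3), round(r, 3)) for l, r, _, _ in cells))
```

Running the example, the partition ends up much finer near the peak at 0.7 than elsewhere, which is exactly the instance-dependent behavior adaptive discretization is meant to exploit.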


