Old Dog Learns New Tricks: Randomized UCB for Bandit Problems

10/11/2019
by Sharan Vaswani, et al.

We propose RandUCB, a bandit strategy that uses theoretically derived confidence intervals similar to upper confidence bound (UCB) algorithms but, akin to Thompson sampling (TS), uses randomization to trade off exploration and exploitation. In the K-armed bandit setting, we show that there are infinitely many variants of RandUCB, all of which achieve the minimax-optimal O(√(KT)) regret after T rounds. Moreover, in a specific multi-armed bandit setting, we show that both UCB and TS can be recovered as special cases of RandUCB. For structured bandits, where each arm is associated with a d-dimensional feature vector and rewards are distributed according to a linear or generalized linear model, we prove that RandUCB achieves the minimax-optimal O(d√T) regret even in the case of infinitely many arms. We demonstrate the practical effectiveness of RandUCB with experiments in both the multi-armed and structured bandit settings. Our results illustrate that RandUCB matches the empirical performance of TS while obtaining the theoretically optimal regret bounds of UCB algorithms, thus achieving the best of both worlds.
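To make the idea concrete, here is a minimal sketch of a RandUCB-style rule for the K-armed Bernoulli bandit: each arm is scored by its empirical mean plus a UCB-style confidence width, but the width is scaled by a randomly drawn multiplier Z instead of a fixed constant. The support of Z and its Gaussian-shaped sampling distribution below are illustrative assumptions (one choice among the infinitely many variants the abstract mentions), not the paper's exact tuning.

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_ucb(true_means, horizon, z_support=np.linspace(0.0, 2.0, 20)):
    """Run a RandUCB-style strategy for `horizon` rounds; return cumulative regret."""
    k = len(true_means)
    counts = np.zeros(k)   # pulls per arm
    sums = np.zeros(k)     # summed rewards per arm
    # Sampling distribution for the multiplier Z: probabilities proportional
    # to a Gaussian density on the support (assumed here for illustration).
    probs = np.exp(-0.5 * z_support**2)
    probs /= probs.sum()
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1    # pull each arm once to initialize estimates
        else:
            z = rng.choice(z_support, p=probs)  # randomized confidence scaling
            means = sums / counts
            widths = np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(means + z * widths))
        reward = float(rng.random() < true_means[arm])  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        regret += max(true_means) - true_means[arm]
    return regret

print(rand_ucb(np.array([0.4, 0.5, 0.6]), horizon=10_000))
```

Note how the two extremes of the multiplier recover familiar behavior: drawing Z = 0 acts greedily on the empirical means, while drawing a large Z behaves like a standard UCB index, which is the sense in which randomization interpolates between exploitation and exploration.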


