Adapting multi-armed bandits policies to contextual bandits scenarios

11/11/2018
by David Cortes

This work explores adaptations of successful multi-armed bandits policies to the online contextual bandits scenario with binary rewards, using binary classification algorithms such as logistic regression as black-box oracles. Some of these adaptations are achieved through bootstrapping or approximate bootstrapping, while others rely on other forms of randomness, resulting in more scalable approaches than previous works, and the ability to work with any type of classification algorithm. In particular, the Adaptive-Greedy algorithm shows a lot of promise, in many cases achieving better performance than upper confidence bound and Thompson sampling strategies, at the expense of more hyperparameters to tune.
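
The abstract describes injecting exploration into a contextual bandit by fitting standard classifiers as black-box oracles on bootstrap resamples of each arm's history. Below is a minimal sketch of one such policy in that spirit: a bootstrapped Thompson-sampling-style bandit with a per-arm logistic regression oracle. The class name, method names, and the use of scikit-learn's LogisticRegression are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


class BootstrappedTS:
    """Bootstrapped Thompson-sampling sketch with one black-box oracle per arm.

    Each round, a fresh bootstrap resample of an arm's (context, reward) history
    is used to fit a logistic regression oracle; the arm whose oracle predicts
    the highest reward probability is played.
    """

    def __init__(self, n_arms, seed=0):
        self.n_arms = n_arms
        self.rng = np.random.default_rng(seed)
        # per-arm history of contexts and binary rewards
        self.X = [[] for _ in range(n_arms)]
        self.y = [[] for _ in range(n_arms)]

    def _fit_oracle(self, arm):
        """Fit a logistic regression on a bootstrap resample of the arm's data."""
        X = np.asarray(self.X[arm])
        y = np.asarray(self.y[arm])
        idx = self.rng.integers(0, len(y), size=len(y))  # resample with replacement
        Xb, yb = X[idx], y[idx]
        if len(np.unique(yb)) < 2:  # the oracle needs both reward classes to fit
            return None
        return LogisticRegression(max_iter=1000).fit(Xb, yb)

    def select_arm(self, context):
        """Pick the arm whose sampled oracle predicts the highest reward probability."""
        context = np.asarray(context).reshape(1, -1)
        scores = np.empty(self.n_arms)
        for a in range(self.n_arms):
            oracle = self._fit_oracle(a) if len(self.y[a]) > 0 else None
            if oracle is not None:
                scores[a] = oracle.predict_proba(context)[0, 1]
            else:
                # unexplored or single-class arms get an optimistic random score
                scores[a] = self.rng.uniform(0.5, 1.0)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        """Record the observed binary reward for the chosen arm."""
        self.X[arm].append(np.asarray(context))
        self.y[arm].append(int(reward))


# one illustrative interaction round
policy = BootstrappedTS(n_arms=3)
x = np.random.normal(size=5)
arm = policy.select_arm(x)
policy.update(arm, x, reward=1)
```

Refitting a fresh bootstrap resample every round, as done here for brevity, stands in for keeping an ensemble of bootstrapped oracles per arm; the randomness of the resample is what drives exploration.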
