Simple Regret Minimization for Contextual Bandits

10/17/2018
by Aniket Anand Deshmukh, et al.

There are two variants of the classical multi-armed bandit (MAB) problem that have received considerable attention from machine learning researchers in recent years: contextual bandits and simple regret minimization. Contextual bandits are a sub-class of MABs where, at every time step, the learner has access to side information that is predictive of the best arm. Simple regret minimization assumes that the learner only incurs regret after a pure exploration phase. In this work, we study simple regret minimization for contextual bandits. Motivated by applications where the learner has separate training and autonomous modes, we assume that the learner experiences a pure exploration phase, where feedback is received after every action but no regret is incurred, followed by a pure exploitation phase in which regret is incurred but there is no feedback. We present the Contextual-Gap algorithm and establish performance guarantees on the simple regret, i.e., the regret during the pure exploitation phase. Our experiments examine a novel application to adaptive sensor selection for magnetic field estimation in interplanetary spacecraft, and demonstrate considerable improvement over algorithms designed to minimize the cumulative regret.
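The two-phase protocol described in the abstract can be sketched with a simplified linear-reward contextual bandit. Note this is an illustrative sketch, not the paper's Contextual-Gap algorithm: it uses uniform round-robin exploration instead of a gap-based arm-selection rule, and all names (`theta`, `theta_hat`, the noise level, the horizon lengths) are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, d = 3, 5
# Hidden per-arm reward parameters (unknown to the learner).
theta = rng.normal(size=(n_arms, d))

def reward(arm, x):
    """Noisy linear reward for pulling `arm` in context `x`."""
    return theta[arm] @ x + 0.1 * rng.normal()

# --- Pure exploration phase: feedback after every action, no regret incurred.
# Round-robin exploration (a simplification of the paper's gap-based rule),
# with per-arm ridge-regression estimates of the reward parameters.
A = [np.eye(d) for _ in range(n_arms)]   # regularized Gram matrices
b = [np.zeros(d) for _ in range(n_arms)]
for t in range(600):
    x = rng.normal(size=d)
    arm = t % n_arms
    r = reward(arm, x)
    A[arm] += np.outer(x, x)
    b[arm] += r * x
theta_hat = np.array([np.linalg.solve(A[a], b[a]) for a in range(n_arms)])

# --- Pure exploitation phase: act greedily on the learned estimates,
# receive no feedback, and accumulate simple regret.
simple_regret = 0.0
n_eval = 200
for _ in range(n_eval):
    x = rng.normal(size=d)
    chosen = int(np.argmax(theta_hat @ x))
    best = int(np.argmax(theta @ x))
    simple_regret += theta[best] @ x - theta[chosen] @ x

print("average simple regret:", simple_regret / n_eval)
```

With enough exploration samples per arm, the estimated parameters track the true ones and the average simple regret during exploitation approaches zero; shortening the exploration phase makes it grow.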


Related research:

- Regret Minimization in Stochastic Contextual Dueling Bandits (02/20/2020): We consider the problem of stochastic K-armed dueling bandit in the cont...
- Duelling Bandits with Weak Regret in Adversarial Environments (12/10/2018): Research on the multi-armed bandit problem has studied the trade-off of ...
- Regret Minimization with Performative Feedback (02/01/2022): In performative prediction, the deployment of a predictive model trigger...
- Compliance-Aware Bandits (02/09/2016): Motivated by clinical trials, we study bandits with observable non-compl...
- Combinatorial Bandits for Sequential Learning in Colonel Blotto Games (09/11/2019): The Colonel Blotto game is a renowned resource allocation problem with a...
- Nested bandits (06/19/2022): In many online decision processes, the optimizing agent is called to cho...
- Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization (07/05/2023): Simple regret minimization is a critical problem in learning optimal tre...
