Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization

Simple regret minimization is a critical problem in learning optimal treatment-assignment policies across domains such as healthcare and e-commerce, yet it remains understudied in the contextual bandit setting. We propose a new family of computationally efficient bandit algorithms for the stochastic contextual bandit setting, flexible enough to be adapted for cumulative regret minimization (with near-optimal minimax guarantees) and simple regret minimization (with state-of-the-art guarantees). Furthermore, our algorithms adapt to model misspecification and extend to the continuous-arm setting. These advantages come from constructing and relying on "conformal arm sets" (CASs), which provide, at every context, a set of arms that contains the context-specific optimal arm with some probability over the context distribution. Our positive results on simple and cumulative regret guarantees are contrasted with a negative result: no algorithm can achieve instance-dependent simple regret guarantees while simultaneously achieving minimax-optimal cumulative regret guarantees.
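The abstract does not specify how a conformal arm set is built; the following is a minimal hypothetical sketch, assuming the CAS at a context is taken to be all arms whose predicted reward falls within a confidence width of the best predicted arm, with the played arm then drawn from that set. The function name `conformal_arm_set`, the `width` parameter, and the toy scores are all illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def conformal_arm_set(scores, width):
    """Hypothetical CAS: all arms whose predicted reward is within
    `width` of the best predicted reward at this context."""
    best = scores.max()
    return np.flatnonzero(scores >= best - width)

# Toy illustration: predicted rewards for 5 arms at one context.
scores = np.array([0.2, 0.9, 0.85, 0.3, 0.88])
cas = conformal_arm_set(scores, width=0.1)  # arms 1, 2, 4 are plausible optima
arm = rng.choice(cas)  # explore uniformly within the set
```

Shrinking `width` as estimates sharpen would recover greedy play, while a larger `width` keeps more plausibly optimal arms in play, which is the kind of knob that could trade off cumulative versus simple regret.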

