Action Centered Contextual Bandits

11/09/2017
by Kristjan Greenewald, et al.

Contextual bandits have become popular as they offer a middle ground between very simple approaches based on multi-armed bandits and very complex approaches using the full power of reinforcement learning. They have demonstrated success in web applications and have a rich body of associated theoretical guarantees. Linear models are well understood theoretically and preferred by practitioners because they are not only easily interpretable but also simple to implement and debug. Furthermore, if the linear model is true, we get very strong performance guarantees. Unfortunately, in emerging applications in mobile health, the time-invariant linear model assumption is untenable. We provide an extension of the linear model for contextual bandits that has two parts: baseline reward and treatment effect. We allow the former to be complex but keep the latter simple. We argue that this model is plausible for mobile health applications. At the same time, it leads to algorithms with strong performance guarantees as in the linear model setting, while still allowing for complex nonlinear baseline modeling. Our theory is supported by experiments on data gathered in a recently concluded mobile health study.
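To make the decomposition concrete, below is a minimal Python sketch of Thompson sampling over the linear treatment-effect part only, with the baseline reward left unmodeled. It assumes the reward decomposes as r_t = f(s_t) + 1{a_t != 0} * s_t @ theta_{a_t}, where the baseline f may be arbitrarily complex. The class name, the exploration scale v, and the `centered_reward` input are illustrative assumptions; the paper's actual algorithm removes the baseline by randomizing over actions (action centering), which this sketch simplifies.

```python
import numpy as np

class ActionCenteredTS:
    """Sketch: Thompson sampling on a linear treatment effect.

    Assumed reward model (simplification of the paper's setup):
        r_t = f(s_t) + 1{a_t != 0} * s_t @ theta_{a_t}
    The baseline f is never modeled; only theta_a is learned, from
    rewards that have been centered to remove the baseline.
    """

    def __init__(self, n_actions, dim, v=1.0):
        self.n_actions = n_actions      # nonzero (treatment) actions
        self.v = v                      # exploration scale (assumed)
        # Per-arm Bayesian linear regression sufficient statistics.
        self.B = [np.eye(dim) for _ in range(n_actions)]
        self.b = [np.zeros(dim) for _ in range(n_actions)]

    def choose(self, context, rng):
        # Sample theta_a from each arm's posterior and take the arm with
        # the largest sampled effect; fall back to action 0 (no treatment)
        # if all sampled effects are negative, since the baseline reward
        # accrues regardless of the action taken.
        best_a, best_val = 0, 0.0
        for a in range(self.n_actions):
            mean = np.linalg.solve(self.B[a], self.b[a])
            cov = self.v ** 2 * np.linalg.inv(self.B[a])
            theta = rng.multivariate_normal(mean, cov)
            val = float(context @ theta)
            if val > best_val:
                best_a, best_val = a + 1, val
        return best_a

    def update(self, context, action, centered_reward):
        # `centered_reward` should estimate the treatment effect alone,
        # e.g. the observed reward minus a baseline estimate; the paper
        # instead centers via action randomization.
        if action == 0:
            return
        a = action - 1
        self.B[a] += np.outer(context, context)
        self.b[a] += centered_reward * context

# Hypothetical usage:
rng = np.random.default_rng(0)
bandit = ActionCenteredTS(n_actions=2, dim=5)
context = rng.normal(size=5)
action = bandit.choose(context, rng)
bandit.update(context, action, centered_reward=0.3)
```

The "do nothing" fallback reflects the key point of the model: because the baseline is earned whether or not a treatment is delivered, only the treatment effect needs to be estimated well, and that part can safely be kept linear.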


Related research

12/15/2018
Balanced Linear Contextual Bandits
Contextual bandit algorithms are sensitive to the estimation method of t...

08/21/2020
Offline Contextual Multi-armed Bandits for Mobile Health Interventions: A Case Study on Emotion Regulation
Delivering treatment recommendations via pervasive electronic devices su...

03/30/2022
Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracle
Many popular contextual bandit algorithms estimate reward models to info...

11/08/2021
Universal and data-adaptive algorithms for model selection in linear contextual bandits
Model selection in contextual bandits is an important complementary prob...

12/01/2022
AC-Band: A Combinatorial Bandit-Based Approach to Algorithm Configuration
We study the algorithm configuration (AC) problem, in which one seeks to...

04/15/2019
Introduction to Multi-Armed Bandits
Multi-armed bandits are a simple but very powerful framework for algorithms ...

09/15/2021
Estimation of Warfarin Dosage with Reinforcement Learning
In this paper, we attempt to use reinforcement learning to model t...
