Estimation Considerations in Contextual Bandits

11/19/2017
by Maria Dimakopoulou, et al.

Contextual bandit algorithms seek to learn a personalized treatment assignment policy, balancing exploration against exploitation. Although a number of algorithms have been proposed, there is little guidance available to help applied researchers select among the various approaches. Motivated by the econometrics and statistics literatures on causal effect estimation, we study a new consideration in the exploration vs. exploitation trade-off: the way exploration is conducted in the present may contribute to the bias and variance of the potential outcome model estimates in subsequent stages of learning. We leverage parametric and non-parametric statistical estimation methods, as well as causal effect estimation methods, to propose new contextual bandit designs. Through a variety of simulations, we show how alternative design choices affect learning performance and provide insights into why we observe these effects.
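The bias mechanism described in the abstract can be illustrated with a small simulation. This is a minimal sketch under illustrative assumptions, not the authors' design: a single arm, a one-dimensional context, a hypothetical logging policy that pulls the arm with probability p(x) = 0.1 + 0.8x, and self-normalized inverse propensity weighting (IPW) as one standard causal-estimation correction. Because the policy pulls the arm more often when x is large, the plain average of observed rewards overstates the population mean E[Y(1)], while weighting each observation by 1/p(x) recovers it. The function names `simulate`, `naive_mean`, and `ipw_mean` are hypothetical.

```python
import random


def simulate(n=50_000, seed=0):
    """Log n rounds of a hypothetical context-dependent assignment policy.

    Assumed setup (illustrative only): context x ~ Uniform(0, 1); the policy
    pulls arm 1 with probability p(x) = 0.1 + 0.8 * x, so it mostly exploits
    arm 1 for large x but keeps a 0.1 exploration floor. Arm 1's potential
    outcome is Y(1) = x + Gaussian noise, so the true E[Y(1)] is 0.5.
    """
    rng = random.Random(seed)
    log = []  # (propensity, reward) for every round in which arm 1 was pulled
    for _ in range(n):
        x = rng.random()
        p = 0.1 + 0.8 * x
        if rng.random() < p:
            reward = x + rng.gauss(0.0, 0.1)
            log.append((p, reward))
    return log


def naive_mean(log):
    # Plain average over rounds where arm 1 was pulled. It is biased because
    # the policy pulled arm 1 more often in high-x contexts.
    return sum(r for _, r in log) / len(log)


def ipw_mean(log):
    # Self-normalized inverse propensity weighting: weighting each observation
    # by 1/p(x) re-creates the population distribution of x.
    total_w = sum(1.0 / p for p, _ in log)
    return sum(r / p for p, r in log) / total_w


if __name__ == "__main__":
    log = simulate()
    print(f"naive estimate of E[Y(1)]: {naive_mean(log):.3f}")
    print(f"IPW estimate of E[Y(1)]:   {ipw_mean(log):.3f}")
```

Under this setup the naive average converges to E[x p(x)] / E[p(x)] = (0.05 + 0.8/3) / 0.5 ≈ 0.633, while the weighted estimate converges to 0.5. The weighting removes the bias at the cost of extra variance from rounds with small p(x), a simple instance of the bias and variance considerations raised in the abstract.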
