Exploiting the Natural Exploration in Contextual Bandits

04/28/2017
by Hamsa Bastani, et al.

The contextual bandit literature has traditionally focused on algorithms that address the exploration-exploitation trade-off. In particular, greedy policies that exploit current estimates without any exploration may be sub-optimal in general. However, exploration-free greedy policies are desirable in many practical settings where exploration may be prohibitively costly or unethical (e.g., clinical trials). We prove that, for a general class of context distributions, the greedy policy benefits from natural exploration induced by the varying contexts and becomes asymptotically rate-optimal for the two-armed contextual bandit. Through simulations, we also demonstrate that these results generalize to more than two arms if the dimension of the contexts is large enough. Motivated by these results, we introduce Greedy-First, a new algorithm that uses only observed contexts and rewards to determine whether to follow a greedy policy or to explore. We prove that this algorithm is asymptotically optimal without any additional assumptions on the distribution of contexts or the number of arms. Extensive simulations demonstrate that Greedy-First successfully reduces experimentation and outperforms existing exploration-based contextual bandit algorithms such as Thompson sampling, UCB, and ϵ-greedy.
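The "greedy policy" the abstract refers to can be made concrete with a short sketch. Below is a minimal, illustrative implementation, assuming a linear reward model per arm with ridge-regression estimates; the class name, parameters, and regularizer are our own choices, not the authors' code. The policy always pulls the arm whose current estimate predicts the highest reward and never adds an exploration bonus, so any learning comes from the natural variation in the contexts.

```python
import numpy as np

class GreedyLinearBandit:
    """Exploration-free greedy policy for a linear contextual bandit.

    Assumes each arm i has an unknown parameter beta_i with expected
    reward <x, beta_i> for context x. Illustrative sketch only.
    """

    def __init__(self, n_arms, dim, reg=1.0):
        # Per-arm ridge statistics: A_i = reg*I + sum x x^T, b_i = sum r x
        self.A = [reg * np.eye(dim) for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select_arm(self, context):
        # Greedy choice: argmax of estimated mean reward, no exploration
        # bonus. Ties go to the lowest-index arm; in practice one would
        # seed each arm with a few initial pulls before trusting estimates.
        estimates = [context @ np.linalg.solve(A, b)
                     for A, b in zip(self.A, self.b)]
        return int(np.argmax(estimates))

    def update(self, arm, context, reward):
        # Update the ridge-regression statistics for the pulled arm only.
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```

The abstract does not spell out Greedy-First's decision rule, so the check below is only a hypothetical instantiation of the stated idea that observed contexts and rewards alone determine whether to stay greedy: keep the greedy policy while every arm's Gram matrix (the A statistics above) stays well conditioned, and switch to an exploration-based fallback (e.g., ϵ-greedy) otherwise. The threshold lam0 and the linear-in-t growth condition are assumptions for illustration, not the paper's exact rule.

```python
def greedy_first_should_explore(gram_matrices, t, lam0=0.1):
    # Hypothetical switching test: if any arm's minimum eigenvalue grows
    # sublinearly in t, the contexts are not supplying enough natural
    # exploration and the algorithm should fall back to an exploring policy.
    return any(np.linalg.eigvalsh(G).min() < lam0 * t
               for G in gram_matrices)
```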

Related research

- Meta-Learning for Contextual Bandit Exploration (01/23/2019)
  We describe MELEE, a meta-learning algorithm for learning a good explora...
- Practical Evaluation and Optimization of Contextual Bandit Algorithms (02/12/2018)
  We study and empirically optimize contextual bandit learning, exploratio...
- Ungeneralizable Contextual Logistic Bandit in Credit Scoring (12/15/2022)
  The application of reinforcement learning in credit scoring has created ...
- A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem (01/10/2018)
  Bandit learning is characterized by the tension between long-term explor...
- High-dimensional Contextual Bandit Problem without Sparsity (06/19/2023)
  In this research, we investigate the high-dimensional linear contextual ...
- Adaptive Exploration in Linear Contextual Bandit (10/15/2019)
  Contextual bandits serve as a fundamental model for many sequential deci...
- Estimation Considerations in Contextual Bandits (11/19/2017)
  Contextual bandit algorithms seek to learn a personalized treatment assi...