A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem

01/10/2018
by Sampath Kannan, et al.

Bandit learning is characterized by the tension between long-term exploration and short-term exploitation. However, as has recently been noted, in settings in which the choices of the learning algorithm correspond to important decisions about individual people (such as criminal recidivism prediction, lending, and sequential drug trials), exploration corresponds to explicitly sacrificing the well-being of one individual for the potential future benefit of others. This raises a fairness concern. In such settings, one might like to run a "greedy" algorithm, which always makes the (myopically) optimal decision for the individuals at hand - but doing this can result in a catastrophic failure to learn. In this paper, we consider the linear contextual bandit problem and revisit the performance of the greedy algorithm. We give a smoothed analysis, showing that even when contexts may be chosen by an adversary, small perturbations of the adversary's choices suffice for the algorithm to achieve "no regret", perhaps (depending on the specifics of the setting) with a constant amount of initial training data. This suggests that "generically" (i.e. in slightly perturbed environments), exploration and exploitation need not be in conflict in the linear setting.
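To make the setting concrete, the greedy algorithm here always pulls the arm that looks myopically best under the current least-squares estimate, with no explicit exploration; smoothing means each adversarial context is perturbed by small Gaussian noise before the learner sees it. The following is a minimal simulation sketch, not the paper's exact model: the parameter values, the uniform stand-in for adversarial contexts, the Gaussian perturbation scale `sigma`, and the single shared ridge estimator are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T, sigma = 5, 3, 2000, 0.1   # dimension, arms, rounds, perturbation scale (assumed)
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)     # unknown true parameter, unit norm

# Ridge-regression state for the greedy estimator (shared across arms for simplicity)
A = np.eye(d)       # regularized Gram matrix
b = np.zeros(d)     # reward-weighted sum of chosen contexts
regret = 0.0

for t in range(T):
    base = rng.uniform(-1, 1, size=(K, d))             # stand-in for adversarial contexts
    contexts = base + sigma * rng.normal(size=(K, d))  # smoothed perturbation
    theta_hat = np.linalg.solve(A, b)                  # current least-squares estimate
    arm = int(np.argmax(contexts @ theta_hat))         # greedy: myopically optimal arm
    reward = contexts[arm] @ theta + 0.05 * rng.normal()
    A += np.outer(contexts[arm], contexts[arm])        # update estimator with observed data only
    b += reward * contexts[arm]
    regret += np.max(contexts @ theta) - contexts[arm] @ theta

print(f"average per-round regret after {T} rounds: {regret / T:.4f}")
```

The key point the sketch illustrates is that the perturbation alone supplies the diversity the estimator needs: because perturbed contexts are not confined to any lower-dimensional subspace, the Gram matrix gains information in every direction even though the algorithm never deliberately explores.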


Related research

- Exploiting the Natural Exploration In Contextual Bandits (04/28/2017)
- The Externalities of Exploration and How Data Diversity Helps Exploitation (06/01/2018)
- Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis (02/26/2020)
- Greedy Algorithm almost Dominates in Smoothed Contextual Bandits (05/19/2020)
- A Smoothed Analysis of Online Lasso for the Sparse Linear Contextual Bandit Problem (07/16/2020)
- Homomorphically Encrypted Linear Contextual Bandit (03/17/2021)
- High-dimensional Contextual Bandit Problem without Sparsity (06/19/2023)
