Risk and optimal policies in bandit experiments

12/13/2021
by Karun Adusumilli, et al.

This paper provides a decision-theoretic analysis of bandit experiments. The bandit setting corresponds to a dynamic programming problem, but solving this directly is typically infeasible. Working within the framework of diffusion asymptotics, we define a suitable notion of asymptotic Bayes risk for bandit settings. For normally distributed rewards, the minimal Bayes risk can be characterized as the solution to a nonlinear second-order partial differential equation (PDE). Using a limit of experiments approach, we show that this PDE characterization also holds asymptotically under both parametric and non-parametric distributions of the rewards. The approach further identifies the state variables to which it is asymptotically sufficient to restrict attention, and therefore suggests a practical strategy for dimension reduction. The upshot is that we can approximate the dynamic programming problem defining the bandit setting with a PDE that can be solved efficiently using sparse matrix routines. We derive near-optimal policies from the numerical solutions to these equations. The proposed policies substantially dominate existing methods such as Thompson sampling. The framework also allows for substantial generalizations of the bandit problem, such as time discounting and pure exploration motives.
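For orientation, the Thompson sampling baseline named in the abstract can be sketched as follows. This is not the paper's PDE-based policy; it is a minimal illustration of the comparison method only, assuming normally distributed rewards with known noise variance and an independent N(0, prior_var) prior on each arm's mean. The function name `thompson_sampling` and all of its parameters are hypothetical choices made for this example.

```python
import random


def thompson_sampling(true_means, n_rounds, prior_var=1.0, noise_var=1.0, seed=0):
    """Gaussian Thompson sampling (illustrative sketch, not the paper's method).

    Assumes a N(0, prior_var) prior on each arm mean and known noise_var.
    Returns the pull counts per arm and the cumulative expected regret.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k      # number of pulls of each arm
    sums = [0.0] * k      # sum of observed rewards for each arm
    best = max(true_means)
    regret = 0.0

    for _ in range(n_rounds):
        draws = []
        for a in range(k):
            # Conjugate normal update: posterior precision and mean.
            post_prec = 1.0 / prior_var + counts[a] / noise_var
            post_mean = (sums[a] / noise_var) / post_prec
            # Draw one sample from the posterior of arm a.
            draws.append(rng.gauss(post_mean, (1.0 / post_prec) ** 0.5))

        # Pull the arm whose posterior draw is largest.
        arm = max(range(k), key=lambda a: draws[a])
        reward = rng.gauss(true_means[arm], noise_var ** 0.5)

        counts[arm] += 1
        sums[arm] += reward
        regret += best - true_means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = thompson_sampling([0.0, 0.5], n_rounds=1000)
    print("pulls per arm:", counts)
    print("cumulative expected regret:", regret)
```

The regret tracked here is the expected shortfall relative to always pulling the best arm, which is one common way to compare a policy such as this against alternatives.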

research
05/24/2018

New Insights into Bootstrapping for Bandits

We investigate the use of bootstrapping in the bandit setting. We first ...
research
05/18/2012

Thompson Sampling: An Asymptotically Optimal Finite Time Analysis

The question of the optimality of Thompson Sampling for solving the stoc...
research
02/11/2022

A PDE-Based Analysis of the Symmetric Two-Armed Bernoulli Bandit

This work addresses a version of the two-armed Bernoulli bandit problem ...
research
07/11/2023

Reliable optimal controls for SEIR models in epidemiology

We present and compare two different optimal control approaches applied ...
research
10/25/2021

Deterministic particle flows for constraining SDEs

Devising optimal interventions for diffusive systems often requires the ...
research
02/03/2023

An Asymptotically Optimal Algorithm for the One-Dimensional Convex Hull Feasibility Problem

This work studies the pure-exploration setting for the convex hull feasi...
