Instance-optimal PAC Algorithms for Contextual Bandits

07/05/2022
by   Zhaoqi Li, et al.
0

In the stochastic contextual bandit setting, regret-minimizing algorithms have been extensively researched, but their instance-minimizing best-arm identification counterparts remain seldom studied. In this work, we focus on the stochastic bandit problem in the (ϵ,δ)-PAC setting: given a policy class Π the goal of the learner is to return a policy π∈Π whose expected reward is within ϵ of the optimal policy with probability greater than 1-δ. We characterize the first instance-dependent PAC sample complexity of contextual bandits through a quantity ρ_Π, and provide matching upper and lower bounds in terms of ρ_Π for the agnostic and linear contextual best-arm identification settings. We show that no algorithm can be simultaneously minimax-optimal for regret minimization and instance-dependent PAC for best-arm identification. Our main result is a new instance-optimal and computationally efficient algorithm that relies on a polynomial number of calls to an argmax oracle.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/22/2023

Context-lumpable stochastic bandits

We consider a contextual bandit problem with S contexts and A actions. I...
research
07/05/2023

Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization

Simple regret minimization is a critical problem in learning optimal tre...
research
10/09/2018

Bridging the gap between regret minimization and best arm identification, with application to A/B tests

State of the art online learning procedures focus either on selecting th...
research
01/06/2022

Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach

Controlling antenna tilts in cellular networks is imperative to reach an...
research
07/30/2020

A PAC algorithm in relative precision for bandit problem with costly sampling

This paper considers the problem of maximizing an expectation function o...
research
10/07/2020

Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

In the classical multi-armed bandit problem, instance-dependent algorith...
research
12/12/2019

Sublinear Optimal Policy Value Estimation in Contextual Bandits

We study the problem of estimating the expected reward of the optimal po...

Please sign up or login with your details

Forgot password? Click here to reset