Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

10/07/2020
by Dylan J. Foster, et al.

In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm. Are similar guarantees possible for contextual bandits? While positive results are known for certain special cases, there is no general theory characterizing when and how instance-dependent regret bounds for contextual bandits can be achieved for rich, general classes of policies. We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds. We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case. Finally, we provide structural results that tie together a number of complexity measures previously proposed throughout contextual bandits, reinforcement learning, and active learning and elucidate their role in determining the optimal instance-dependent regret. In a large-scale empirical evaluation, we find that our approach often gives superior results for challenging exploration problems. Turning our focus to reinforcement learning with function approximation, we develop new oracle-efficient algorithms for reinforcement learning with rich observations that obtain optimal gap-dependent sample complexity.

research
06/06/2022

Asymptotic Instance-Optimal Algorithms for Interactive Decision Making

Past research on interactive decision making problems (bandits, reinforc...
research
07/04/2021

Robust Restless Bandits: Tackling Interval Uncertainty with Deep Reinforcement Learning

We introduce Robust Restless Bandits, a challenging generalization of re...
research
04/24/2023

Instance-Optimality in Interactive Decision Making: Toward a Non-Asymptotic Theory

We consider the development of adaptive, instance-dependent algorithms f...
research
07/05/2022

Instance-optimal PAC Algorithms for Contextual Bandits

In the stochastic contextual bandit setting, regret-minimizing algorithm...
research
02/03/2023

Multiplier Bootstrap-based Exploration

Despite the great interest in the bandit problem, designing efficient al...
research
05/21/2021

Parallelizing Contextual Linear Bandits

Standard approaches to decision-making under uncertainty focus on sequen...
research
07/05/2021

Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination

A recurring theme in statistical learning, online learning, and beyond i...
