Oracle-Efficient Pessimism: Offline Policy Optimization in Contextual Bandits

06/13/2023
by   Lequn Wang, et al.
0

We consider policy optimization in contextual bandits, where one is given a fixed dataset of logged interactions. While pessimistic regularizers are typically used to mitigate distribution shift, prior implementations thereof are not computationally efficient. We present the first oracle-efficient algorithm for pessimistic policy optimization: it reduces to supervised learning, leading to broad applicability. We also obtain best-effort statistical guarantees analogous to those for pessimistic approaches in prior work. We instantiate our approach for both discrete and continuous actions. We perform extensive experiments in both settings, showing advantage over unregularized policy optimization across a wide range of configurations.

READ FULL TEXT

page 21

page 22

page 23

research
06/10/2020

Efficient Contextual Bandits with Continuous Actions

We create a computationally tractable algorithm for contextual bandits w...
research
07/12/2021

Adapting to Misspecification in Contextual Bandits

A major research direction in contextual bandits is to develop algorithm...
research
02/18/2020

Adaptive Estimator Selection for Off-Policy Evaluation

We develop a generic data-driven method for estimator selection in off-p...
research
11/27/2021

Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization

Offline policy learning (OPL) leverages existing data collected a priori...
research
03/05/2020

Generalized Policy Elimination: an efficient algorithm for Nonparametric Contextual Bandits

We propose the Generalized Policy Elimination (GPE) algorithm, an oracle...
research
10/24/2022

PAC-Bayesian Offline Contextual Bandits With Guarantees

This paper introduces a new principled approach for offline policy optim...
research
10/24/2022

Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions

We consider local kernel metric learning for off-policy evaluation (OPE)...

Please sign up or login with your details

Forgot password? Click here to reset