Multi-action Offline Policy Learning with Bayesian Optimization

03/17/2020
by Fang Cai, et al.

We study an offline multi-action policy learning algorithm based on doubly robust estimators from causal inference, using argmax linear policy function classes. For general policy classes, we connect the regret bound to a higher-dimensional generalization of the VC dimension, and we specialize this result to prove optimal regret bounds for the argmax linear function class. We also study several optimization approaches to the non-smooth, non-convex problem associated with the argmax linear class, including convex relaxation, softmax relaxation, and Bayesian optimization. We find that Bayesian optimization with the Gradient-based Adaptive Stochastic Search (GASS) algorithm consistently outperforms convex relaxation in terms of policy value and is much faster than softmax relaxation. Finally, we apply the algorithms to simulated data and to the warfarin dataset. On the warfarin dataset, the offline algorithm trained using only a subset of features achieves state-of-the-art accuracy.
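The abstract does not give formulas, so the following is only a minimal sketch of the objects it names, under stated assumptions. It assumes the standard doubly robust score, Gamma[i, a] = mu_hat[i, a] + 1{a_i = a} (y_i - mu_hat[i, a_i]) / e_hat[i, a_i], an argmax linear policy pi(x) = argmax_a theta_a . x, and a temperature-scaled softmax as the smooth relaxation. All function and parameter names (`dr_scores`, `mu_hat`, `e_hat`, `temperature`) are hypothetical and are not taken from the paper, and the paper's exact estimator and relaxation may differ.

```python
import numpy as np


def dr_scores(actions, rewards, mu_hat, e_hat):
    """Doubly robust scores, one per (sample, action) pair.

    actions: (n,) observed action indices (assumed logged data)
    rewards: (n,) observed rewards
    mu_hat:  (n, k) estimated mean reward for each action (assumed given)
    e_hat:   (n, k) estimated propensities (assumed given, strictly positive)
    """
    n = mu_hat.shape[0]
    rows = np.arange(n)
    gamma = mu_hat.astype(float)  # astype returns a copy, inputs stay untouched
    # Inverse-propensity correction, applied only to the action actually taken.
    gamma[rows, actions] += (rewards - mu_hat[rows, actions]) / e_hat[rows, actions]
    return gamma


def argmax_linear_policy(theta, X):
    """Pick the action with the largest linear score; theta is (k, d), X is (n, d)."""
    return np.argmax(X @ theta.T, axis=1)


def policy_value(theta, X, gamma):
    """Estimated value of the argmax linear policy (non-smooth in theta)."""
    chosen = argmax_linear_policy(theta, X)
    return gamma[np.arange(X.shape[0]), chosen].mean()


def softmax_policy_value(theta, X, gamma, temperature=1.0):
    """Smooth surrogate: weight each action's score by softmax probabilities."""
    logits = (X @ theta.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return (probs * gamma).sum(axis=1).mean()
```

In this sketch `policy_value` is piecewise constant in `theta`, which is one way to see why the abstract calls the problem non-smooth and non-convex: a gradient-free search can maximize it directly, while `softmax_policy_value` trades exactness for differentiability and approaches the argmax value as `temperature` shrinks.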

