Semi-Parametric Efficient Policy Learning with Continuous Actions

05/24/2019
by Mert Demirer, et al.

We consider off-policy evaluation and optimization with continuous action spaces. We focus on observational data where the data collection policy is unknown and needs to be estimated. We take a semi-parametric approach in which the value function takes a known parametric form in the treatment, while we remain agnostic about how it depends on the observed contexts. We propose a doubly robust off-policy estimate for this setting and show that off-policy optimization based on this estimate is robust to errors in estimating the policy function or the regression model. Our results also apply when the model does not satisfy the semi-parametric form; in that case, regret is measured with respect to the best projection of the true value function onto this functional space. Our work extends prior approaches to policy optimization from observational data that considered only discrete actions. We provide an experimental evaluation of our method on a synthetic data example motivated by optimal personalized pricing and costly resource allocation.
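To make the setting concrete, the sketch below illustrates the general flavor of a doubly robust value estimate when the value function is assumed linear in a known treatment feature map phi(t). It is a minimal illustration using NumPy and scikit-learn, not the paper's exact estimator; the feature map, the nuisance models, the synthetic data-generating process, and all helper names are assumptions made for this example.

```python
# Minimal sketch (not the paper's exact estimator): doubly robust off-policy
# value estimation when the value is assumed linear in a known treatment
# feature map phi(t), i.e. E[Y | x, t] ~ theta(x) . phi(t).
# All names, models, and data below are illustrative assumptions.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def phi(t):
    """Known treatment feature map: here [1, t] (intercept + linear term)."""
    return np.column_stack([np.ones_like(t), t])

# --- synthetic observational data (logging policy unknown to the learner) ---
n = 5000
X = rng.uniform(-1, 1, size=(n, 3))
T = 0.5 * X[:, 0] + rng.normal(scale=0.5, size=n)        # logging policy
theta_true = np.column_stack([X[:, 1], 1.0 - X[:, 0]])   # true theta(x)
Y = np.sum(theta_true * phi(T), axis=1) + rng.normal(scale=0.1, size=n)

# --- nuisance estimates (any ML regressor; ideally cross-fit in practice) ---
outcome_model = RandomForestRegressor(n_estimators=100, random_state=0)
outcome_model.fit(np.column_stack([X, T]), Y)

def y_hat(X, t):
    return outcome_model.predict(np.column_stack([X, t]))

# Second moment of phi(T) under the logging policy; ideally conditional on x,
# estimated here by a crude global average for simplicity.
Phi = phi(T)
Sigma_hat = Phi.T @ Phi / n                                # E[phi(T) phi(T)']

# Plug-in estimate of theta(x), recovered from the fitted outcome model by
# evaluating at t = 0 and t = 1 (exact only if the fit were linear in t).
def theta_hat(X):
    t0, t1 = np.zeros(len(X)), np.ones(len(X))
    base = y_hat(X, t0)                                    # intercept part
    slope = y_hat(X, t1) - base                            # linear part
    return np.column_stack([base, slope])

# --- doubly robust correction of theta(x), then evaluate a target policy ---
residual = Y - y_hat(X, T)                                 # outcome residual
correction = (np.linalg.solve(Sigma_hat, Phi.T) * residual).T
theta_dr = theta_hat(X) + correction                       # per-sample scores

def policy(X):
    """Target policy to evaluate: a simple deterministic rule in x."""
    return np.clip(X[:, 1], -1, 1)

V_dr = np.mean(np.sum(theta_dr * phi(policy(X)), axis=1))
V_dm = np.mean(np.sum(theta_hat(X) * phi(policy(X)), axis=1))
print(f"direct-method value : {V_dm:.3f}")
print(f"doubly robust value : {V_dr:.3f}")
```

In this illustrative form, the residual correction has mean zero when the outcome regression is correct, so the doubly robust score reduces to the plain regression (direct-method) estimate in expectation while remaining protected against errors in either nuisance component; that trade-off is the intuition behind preferring such scores for off-policy optimization.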
