
Semi-Parametric Efficient Policy Learning with Continuous Actions

by   Mert Demirer, et al.

We consider off-policy evaluation and optimization with continuous action spaces. We focus on observational data where the data collection policy is unknown and needs to be estimated. We take a semi-parametric approach in which the value function takes a known parametric form in the treatment, but we are agnostic about how it depends on the observed contexts. We propose a doubly robust off-policy estimate for this setting and show that off-policy optimization based on this estimate is robust to estimation errors of the policy function or the regression model. Our results also apply when the model does not satisfy the semi-parametric form; in that case we measure regret relative to the best projection of the true value function onto this function space. Our work extends prior approaches to policy optimization from observational data that considered only discrete actions. We provide an experimental evaluation of our method on synthetic data motivated by optimal personalized pricing and costly resource allocation.
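The doubly robust idea described above can be illustrated with a minimal sketch. This is not the paper's exact estimator: it uses a partially linear working model (outcome linear in the continuous action with a constant-slope fit) and the standard residual-on-residual orthogonal moment, in which errors in the two nuisance estimates enter only as a product, the source of the robustness to nuisance estimation error the abstract mentions. The data-generating process, the polynomial nuisance regressions, and the target policy `pi` are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical observational DGP: outcome linear in the continuous
# action a, with a context-dependent slope theta(x).
x = rng.uniform(-1, 1, size=n)
theta = 1.0 + 0.5 * x              # true slope theta(x)
e = 0.3 * x                        # logging policy mean E[a | x]
a = e + rng.normal(0.0, 1.0, size=n)   # logged continuous actions
y = theta * a + np.sin(2 * x) + rng.normal(0.0, 0.5, size=n)

def poly_fit_predict(x, t, deg=3):
    """Simple polynomial regression, a stand-in for a flexible ML learner."""
    return np.polyval(np.polyfit(x, t, deg), x)

# Stage 1: nuisance estimates m(x) = E[y | x] and e(x) = E[a | x].
m_hat = poly_fit_predict(x, y)
e_hat = poly_fit_predict(x, a)

# Stage 2: orthogonal residual-on-residual slope estimate. Nuisance
# errors enter this moment only as a product, so it tolerates slow
# convergence of either m_hat or e_hat (but not both).
ry, ra = y - m_hat, a - e_hat
theta_hat = np.sum(ra * ry) / np.sum(ra ** 2)

# Off-policy value gain of a target policy pi(x) over the logging mean,
# under the constant-slope working model.
pi = lambda x: 0.8 * np.ones_like(x)   # hypothetical target policy
gain = np.mean(theta_hat * (pi(x) - e_hat))
print(theta_hat, gain)
```

With a symmetric context distribution and homoskedastic logging noise, `theta_hat` converges to the average slope E[theta(x)] = 1, matching the abstract's point that even when the working model is misspecified, the estimate targets the best projection of the true value function onto the assumed functional form.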




Related articles:

Deep Jump Q-Evaluation for Offline Policy Evaluation in Continuous Action Space

First-order Policy Optimization for Robust Policy Evaluation

Safe Policy Learning under Regression Discontinuity Designs

Efficient Policy Learning from Surrogate-Loss Classification Reductions

Policy Evaluation and Optimization with Continuous Treatments

Discussion of Kallus (2020) and Mo, Qi, and Liu (2020): New Objectives for Policy Learning

Rejoinder: Learning Optimal Distributionally Robust Individualized Treatment Rules