Policy Evaluation and Optimization with Continuous Treatments

02/16/2018
by Nathan Kallus, et al.

We study the problem of policy evaluation and learning from batched contextual bandit data when treatments are continuous, going beyond previous work on discrete treatments. Previous work for discrete treatment/action spaces focuses on inverse probability weighting (IPW) and doubly robust (DR) methods that use a rejection sampling approach for evaluation and the equivalent weighted classification problem for learning. In the continuous setting, this reduction fails as we would almost surely reject all observations. To tackle the case of continuous treatments, we extend the IPW and DR approaches to the continuous setting using a kernel function that leverages treatment proximity to attenuate discrete rejection. Our policy estimator is consistent and we characterize the optimal bandwidth. The resulting continuous policy optimizer (CPO) approach using our estimator achieves convergent regret and approaches the best-in-class policy for learnable policy classes. We demonstrate that the estimator performs well and, in particular, outperforms a discretization-based benchmark. We further study the performance of our policy optimizer in a case study on personalized dosing based on a dataset of Warfarin patients, their covariates, and final therapeutic doses. Our learned policy outperforms benchmarks and nears the oracle-best linear policy.
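The abstract's central idea, replacing exact-match rejection sampling with a kernel over treatment proximity, reduces to a short estimator. Below is a minimal Python sketch of a kernelized IPW value estimate, assuming a Gaussian kernel, a known logging density f(t|x) (the generalized propensity), and hypothetical names (`kernel_ipw_value`, `policy`, `propensity_density`); it illustrates the estimator family rather than reproducing the authors' implementation.

```python
import numpy as np

def gaussian_kernel(u):
    """Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kernel_ipw_value(X, T, Y, policy, propensity_density, h):
    """Kernelized IPW estimate of a target policy's value from logged data.

    X : (n, d) contexts;  T : (n,) logged continuous treatments;
    Y : (n,) observed outcomes.
    policy : maps contexts to the treatments the target policy would assign.
    propensity_density : f(t | x), density of the logging policy.
    h : bandwidth; larger h lowers variance but adds smoothing bias.
    """
    pi_t = policy(X)
    # Smooth analogue of rejection sampling: weight each observation by how
    # close its logged treatment is to the target policy's treatment.
    w = gaussian_kernel((pi_t - T) / h) / (h * propensity_density(T, X))
    return float(np.mean(w * Y))

# Toy check: the logging policy draws T ~ N(0, 1) independently of context,
# and the outcome peaks (at 0) when t equals the first context coordinate.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))
T = rng.normal(size=n)
Y = -(T - X[:, 0]) ** 2 + rng.normal(scale=0.1, size=n)
f = lambda t, x: np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)
print(kernel_ipw_value(X, T, Y, policy=lambda x: x[:, 0],
                       propensity_density=f, h=0.5))  # near the true value 0,
                                                      # up to O(h^2) bias
```

With a hard indicator in place of the kernel, every observation would be rejected almost surely, since pi(x_i) never exactly equals t_i for continuous treatments; the bandwidth h is exactly the tuning knob whose optimal choice the abstract characterizes.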

Related research

06/09/2019 · Balanced Off-Policy Evaluation in General Action Spaces
In many practical applications of contextual bandits, online learning is...

12/18/2021 · Off-Policy Evaluation Using Information Borrowing and Context-Based Switching
We consider the off-policy evaluation (OPE) problem in contextual bandit...

05/24/2019 · Semi-Parametric Efficient Policy Learning with Continuous Actions
We consider off-policy evaluation and optimization with continuous actio...

10/24/2022 · Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions
We consider local kernel metric learning for off-policy evaluation (OPE)...

05/19/2020 · Treatment recommendation with distributional targets
We study the problem of a decision maker who must provide the best possi...

08/31/2016 · Recursive Partitioning for Personalization using Observational Data
We study the problem of learning to choose from m discrete treatment opt...
