Discussion of Kallus (2020) and Mo, Qi, and Liu (2020): New Objectives for Policy Learning

10/09/2020
by Sijia Li, et al.

We discuss the thought-provoking new objective functions for policy learning that were proposed in "More efficient policy learning via optimal retargeting" by Nathan Kallus and "Learning optimal distributionally robust individualized treatment rules" by Weibin Mo, Zhengling Qi, and Yufeng Liu. We show that it is important to take the curvature of the value function into account when working within the retargeting framework, and we introduce two ways to do so. We also describe more efficient approaches for leveraging calibration data when learning distributionally robust policies.
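To make the retargeting idea concrete: the standard inverse-propensity-weighted (IPW) policy objective averages 1{A_i = pi(X_i)} Y_i / P(A_i | X_i), and retargeting multiplies each term by a positive weight w(X_i). Over a sufficiently rich policy class this reweighting leaves the maximizing policy unchanged, yet it can sharply reduce the variance of the objective; Kallus (2020) solves for the variance-minimizing weights. The Python below is a minimal sketch with illustrative names (retargeted_value, pi, e, w) and simple overlap-style weights w(X) = e(X)(1 - e(X)) standing in for the optimized ones; it is an assumption-laden toy, not the paper's implementation.

    # Minimal sketch of a retargeted IPW policy objective.
    # Names here are illustrative, not from the paper; Kallus (2020)
    # instead optimizes w to minimize a variance bound.
    import numpy as np

    def retargeted_value(pi, X, A, Y, e, w):
        """Retargeted IPW estimate of the value of policy pi.

        pi: maps covariates X to actions in {0, 1}
        e:  logged propensities P(A = 1 | X)
        w:  positive retargeting weights w(X); w = 1 recovers plain IPW
        """
        prop = np.where(A == 1, e, 1.0 - e)        # P(A_i | X_i)
        match = (pi(X) == A).astype(float)         # 1{pi(X_i) = A_i}
        return np.mean(w * match / prop * Y)

    # Toy logged data: treatment helps exactly when X > 0.
    rng = np.random.default_rng(0)
    n = 1000
    X = rng.normal(size=n)
    e = 1.0 / (1.0 + np.exp(-X))                   # logging propensities
    A = rng.binomial(1, e)
    Y = A * X + rng.normal(size=n)

    pi = lambda X: (X > 0).astype(int)             # candidate policy
    print(retargeted_value(pi, X, A, Y, e, np.ones(n)))   # plain IPW
    print(retargeted_value(pi, X, A, Y, e, e * (1 - e)))  # retargeted

Note that changing w changes the scale of the estimand, so retargeted values are only comparable across policies for a fixed w; what retargeting preserves (under the richness conditions of the paper) is which policy comes out on top.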


Related research

12/05/2020 · Rejoinder: New Objectives for Policy Learning
I provide a rejoinder for discussion of "More Efficient Policy Learning ...

10/17/2021 · Rejoinder: Learning Optimal Distributionally Robust Individualized Treatment Rules
We thank the editors for the opportunity for this discussion and the ...

10/31/2011 · Optimal and Approximate Q-value Functions for Decentralized POMDPs
Decision-theoretic planning is a popular approach to sequential decision...

07/24/2022 · Towards Using Fully Observable Policies for POMDPs
The Partially Observable Markov Decision Process (POMDP) is a framework appl...

05/24/2019 · Semi-Parametric Efficient Policy Learning with Continuous Actions
We consider off-policy evaluation and optimization with continuous actio...

06/20/2019 · More Efficient Policy Learning via Optimal Retargeting
Policy learning can be used to extract individualized treatment regimes ...

01/27/2023 · Policy-Value Alignment and Robustness in Search-based Multi-Agent Learning
Large-scale AI systems that combine search and learning have reached sup...
