Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning

06/06/2020
by Nathan Kallus, et al.

We study the efficient off-policy evaluation of natural stochastic policies, which are defined in terms of deviations from the behavior policy. This is a departure from the literature on off-policy evaluation, in which most work considers the evaluation of explicitly specified policies. Crucially, offline reinforcement learning with natural stochastic policies can help alleviate issues of weak overlap, lead to policies that build upon current practice, and make policies easier to implement. Compared with the classic case of a pre-specified evaluation policy, the efficiency bound, which measures the best-achievable estimation error, is inflated when evaluating natural stochastic policies, because the evaluation policy itself is unknown. In this paper we derive the efficiency bounds for two major types of natural stochastic policies: tilting policies and modified treatment policies. We then propose efficient nonparametric estimators that attain the efficiency bounds under very lax conditions. These estimators also enjoy a (partial) double robustness property.
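To make "defined in terms of deviations from the behavior policy" concrete, the following is a minimal sketch for a single state with a small discrete action set. The abstract does not spell out either definition, so the readings used here are assumptions: a tilting policy is taken to reweight the behavior policy's action probabilities by a known function and renormalize, and a modified treatment policy is taken to pass the behavior action through a known map. The function names `tilt_policy` and `modify_treatment`, and the example probabilities, are hypothetical and not from the paper.

```python
from typing import Callable, Dict


def tilt_policy(behavior_probs: Dict[int, float],
                tilt: Callable[[int], float]) -> Dict[int, float]:
    """Assumed tilting policy: reweight each behavior probability by tilt(a), then renormalize."""
    unnormalized = {a: tilt(a) * p for a, p in behavior_probs.items()}
    total = sum(unnormalized.values())
    if total <= 0:
        raise ValueError("tilt assigns zero total weight to the behavior policy's actions")
    return {a: w / total for a, w in unnormalized.items()}


def modify_treatment(behavior_probs: Dict[int, float],
                     shift: Callable[[int], int]) -> Dict[int, float]:
    """Assumed modified treatment policy: push the behavior distribution through a -> shift(a)."""
    out: Dict[int, float] = {}
    for a, p in behavior_probs.items():
        target = shift(a)
        out[target] = out.get(target, 0.0) + p
    return out


# Hypothetical behavior policy at one state over actions {0, 1, 2}.
behavior = {0: 0.5, 1: 0.3, 2: 0.2}

# Tilting: double the relative weight on action 2.
tilted = tilt_policy(behavior, lambda a: 2.0 if a == 2 else 1.0)

# Modified treatment: raise every action by one level, capped at 2.
shifted = modify_treatment(behavior, lambda a: min(a + 1, 2))
```

In both cases the evaluation policy depends on the behavior policy. In an offline setting the behavior policy would typically have to be estimated from data, which is consistent with the abstract's remark that the evaluation policy itself is unknown.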
