Off-Policy Evaluation via the Regularized Lagrangian

07/07/2020
by   Mengjiao Yang, et al.

The recently proposed distribution correction estimation (DICE) family of estimators has advanced the state of the art in off-policy evaluation from behavior-agnostic data. While these estimators all perform some form of stationary distribution correction, they arise from different derivations and objective functions. In this paper, we unify these estimators as regularized Lagrangians of the same linear program. The unification allows us to expand the space of DICE estimators to new alternatives that demonstrate improved performance. More importantly, by analyzing the expanded space of estimators both mathematically and empirically, we find that dual solutions offer greater flexibility in navigating the tradeoff between optimization stability and estimation bias, and generally provide superior estimates in practice.
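To make the "Lagrangian of a linear program" idea concrete, here is a minimal tabular sketch. It is not an estimator from the paper. It assumes the standard discounted setting in which the policy value is normalized by (1 - gamma), and it uses a small hand-made Markov chain over state-action pairs. The names `P`, `R`, `MU`, `solve_q`, `solve_d`, and `lagrangian` are hypothetical and chosen only for illustration. The sketch computes the value function Q (primal variable) and the discounted stationary distribution d (dual variable) exactly, then checks that the unregularized Lagrangian takes the same value at both solutions.

```python
# Illustrative sketch only: a tiny hand-made Markov chain over state-action
# pairs, not an environment or estimator from the paper.

GAMMA = 0.9

# Hypothetical quantities (assumptions for illustration):
# P[i][j] is the probability of moving from pair i to pair j under the
# target policy.
P = [
    [0.7, 0.3, 0.0],
    [0.1, 0.6, 0.3],
    [0.2, 0.0, 0.8],
]
R = [1.0, 0.0, 2.0]   # reward for each state-action pair
MU = [0.5, 0.5, 0.0]  # initial pair distribution, mu_0(s) * pi(a|s)


def solve_q(iters=2000):
    """Fixed-point iteration for Q = R + gamma * P Q."""
    n = len(R)
    q = [0.0] * n
    for _ in range(iters):
        q = [R[i] + GAMMA * sum(P[i][j] * q[j] for j in range(n))
             for i in range(n)]
    return q


def solve_d(iters=2000):
    """Fixed-point iteration for d = (1 - gamma) * mu + gamma * P^T d."""
    n = len(R)
    d = list(MU)
    for _ in range(iters):
        d = [(1 - GAMMA) * MU[j] + GAMMA * sum(P[i][j] * d[i] for i in range(n))
             for j in range(n)]
    return d


def lagrangian(d, q):
    """L(d, Q) = (1 - gamma) * <mu, Q> + <d, R + gamma * P Q - Q>."""
    n = len(R)
    init_term = (1 - GAMMA) * sum(MU[i] * q[i] for i in range(n))
    residual = [R[i] + GAMMA * sum(P[i][j] * q[j] for j in range(n)) - q[i]
                for i in range(n)]
    return init_term + sum(d[i] * residual[i] for i in range(n))


q_star = solve_q()
d_star = solve_d()

# Value computed from the primal variable Q and from the dual variable d.
primal_value = (1 - GAMMA) * sum(m * q for m, q in zip(MU, q_star))
dual_value = sum(d * r for d, r in zip(d_star, R))
saddle_value = lagrangian(d_star, q_star)
```

In this sketch `primal_value`, `dual_value`, and `saddle_value` agree up to floating-point error, and `lagrangian(d_star, q)` returns the same number for any `q`. With sampled data neither Q nor d is known exactly, so estimators in this family optimize a Lagrangian of this kind instead, which is where the choice of regularization enters.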


