GenDICE: Generalized Offline Estimation of Stationary Values

02/21/2020
by   Ruiyi Zhang, et al.
11

An important problem that arises in reinforcement learning and Monte Carlo methods is estimating quantities defined by the stationary distribution of a Markov chain. In many real-world applications, access to the underlying transition operator is limited to a fixed set of data that has already been collected, without additional interaction with the environment being available. We show that consistent estimation remains possible in this challenging scenario, and that effective estimation can still be achieved in important applications. Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions, derived from fundamental properties of the stationary distribution, and exploiting constraint reformulations based on variational divergence minimization. The resulting algorithm, GenDICE, is straightforward and effective. We prove its consistency under general conditions, provide an error analysis, and demonstrate strong empirical performance on benchmark problems, including off-line PageRank and off-policy policy evaluation.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/02/2020

Batch Stationary Distribution Estimation

We consider the problem of approximating the stationary distribution of ...
research
02/28/2022

LobsDICE: Offline Imitation Learning from Observation via Stationary Distribution Correction Estimation

We consider the problem of imitation from observation (IfO), in which th...
research
06/10/2019

DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections

In many real-world reinforcement learning applications, access to the en...
research
01/29/2020

GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values

We present GradientDICE for estimating the density ratio between the sta...
research
11/05/2020

Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity

Quality-Diversity (QD) is a concept from Neuroevolution with some intrig...
research
03/24/2020

Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning

Off-policy estimation for long-horizon problems is important in many rea...
research
10/06/2019

Statistical Analysis of Stationary Solutions of Coupled Nonconvex Nonsmooth Empirical Risk Minimization

This paper has two main goals: (a) establish several statistical propert...

Please sign up or login with your details

Forgot password? Click here to reset