Distributional Offline Policy Evaluation with Predictive Error Guarantees

02/19/2023
by Runzhe Wu, et al.

We study the problem of estimating the distribution of the return of a policy using an offline dataset that was not generated by that policy, i.e., distributional offline policy evaluation (OPE). We propose an algorithm called Fitted Likelihood Estimation (FLE), which solves a sequence of Maximum Likelihood Estimation (MLE) problems and can integrate any state-of-the-art probabilistic generative model, as long as the model can be trained via MLE. FLE can be used in both the finite-horizon and the infinite-horizon discounted settings, where rewards can be multi-dimensional vectors. In our theoretical results, we show that in the finite-horizon and infinite-horizon discounted settings, FLE can learn distributions that are close to the ground truth under total variation distance and Wasserstein distance, respectively. Our theoretical results hold under the conditions that the offline data covers the test policy's traces and that the supervised learning MLE procedures succeed. Experimentally, we demonstrate the performance of FLE with two generative models: Gaussian mixture models and diffusion models. For the multi-dimensional reward setting, FLE with diffusion models is capable of estimating the complicated distribution of the return of a test policy.
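The abstract does not spell out the individual steps of FLE, so the following is only a minimal sketch of one plausible reading of "a sequence of MLE problems" in the finite-horizon case. It assumes a small tabular problem, a single Gaussian per (step, state, action) as the generative model (the paper uses Gaussian mixtures and diffusion models), and a backward pass in which each step's MLE targets are the observed reward plus a return sampled from the model fitted at the next step. The function name `fitted_likelihood_sketch`, the data layout, and the toy dataset are all hypothetical and are not taken from the paper.

```python
import numpy as np


def fitted_likelihood_sketch(data, policy, horizon, n_states, n_actions, seed=0):
    """Hypothetical sketch: backward sequence of Gaussian MLE fits.

    data   : list of (h, s, a, r, s_next) offline transitions
    policy : int array of shape (horizon, n_states), the test policy's action
    Returns per-(h, s, a) Gaussian means and standard deviations of the return.
    """
    rng = np.random.default_rng(seed)
    # Extra slot at index `horizon` stays zero: no return after the last step.
    mean = np.zeros((horizon + 1, n_states, n_actions))
    std = np.zeros((horizon + 1, n_states, n_actions))

    for h in reversed(range(horizon)):
        targets = {}
        for step, s, a, r, s_next in data:
            if step != h:
                continue
            if h + 1 < horizon:
                # Sample a next-step return from the model fitted one step later,
                # at the action the test policy would take in s_next.
                a_next = policy[h + 1, s_next]
                z_next = rng.normal(mean[h + 1, s_next, a_next],
                                    std[h + 1, s_next, a_next])
            else:
                z_next = 0.0
            targets.setdefault((s, a), []).append(r + z_next)

        # Gaussian MLE per (s, a): sample mean and (biased) sample std.
        for (s, a), zs in targets.items():
            zs = np.asarray(zs)
            mean[h, s, a] = zs.mean()
            std[h, s, a] = zs.std()

    return mean[:horizon], std[:horizon]


# Toy dataset (an assumption for illustration): one state, one action,
# reward 1.0 at each of 3 steps, five transitions per step.
toy_data = [(h, 0, 0, 1.0, 0) for h in range(3) for _ in range(5)]
toy_policy = np.zeros((3, 1), dtype=int)
mean, std = fitted_likelihood_sketch(toy_data, toy_policy, horizon=3,
                                     n_states=1, n_actions=1)
```

In this toy case the rewards are deterministic, so the fitted return model at the first step concentrates on the total reward over the horizon. Swapping the per-cell Gaussian for a richer conditional generative model trained by MLE is the flexibility the abstract describes.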