Conditional Importance Sampling for Off-Policy Learning

10/16/2019
by Mark Rowland, et al.

The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives on, and a deeper understanding of, existing off-policy algorithms, and it reveals a broad space of unexplored algorithms. We analyse this space theoretically and concretely investigate several algorithms that arise from the framework.
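
To make the core idea concrete, below is a minimal one-step sketch, not the paper's algorithms (which operate on full trajectories): an ordinary importance sampling estimate of a target policy's expected reward is compared with a "conditional" variant in which the ratio pi(a)/mu(a) is replaced by its conditional expectation given a coarser statistic of the action. The policies mu and pi, the grouping statistic group, and the rewards r_group are invented for illustration. When the reward is measurable with respect to the conditioning statistic, the conditional estimator remains unbiased while its variance can only decrease (a Rao-Blackwell argument).

import numpy as np

rng = np.random.default_rng(0)

# Illustrative one-step ("bandit") setup: 4 actions, a behaviour policy mu
# that generates the data, and a target policy pi whose value we estimate.
# All quantities here are invented for the sketch.
mu = np.array([0.4, 0.3, 0.2, 0.1])       # behaviour policy
pi = np.array([0.1, 0.2, 0.3, 0.4])       # target policy

# A coarser statistic of the action: actions {0, 1} form group 0, {2, 3} group 1.
group = np.array([0, 0, 1, 1])
r_group = np.array([1.0, 0.2])            # reward depends only on the group

true_value = np.dot(pi, r_group[group])   # E_pi[r], the quantity to estimate

# Conditional expectation of the IS ratio given the group:
#   E_mu[pi(A)/mu(A) | group(A) = g] = P_pi(g) / P_mu(g)
p_mu_g = np.array([mu[group == g].sum() for g in range(2)])
p_pi_g = np.array([pi[group == g].sum() for g in range(2)])
cond_ratio = p_pi_g / p_mu_g

# Draw behaviour data and form the two estimators.
n = 100_000
actions = rng.choice(4, size=n, p=mu)
rewards = r_group[group[actions]]

rho = pi[actions] / mu[actions]           # ordinary IS ratio per sample
rho_cond = cond_ratio[group[actions]]     # conditional IS ratio per sample

is_samples = rho * rewards
cis_samples = rho_cond * rewards

print(f"true value     : {true_value:.3f}")
print(f"ordinary IS    : mean {is_samples.mean():.3f}, variance {is_samples.var():.4f}")
print(f"conditional IS : mean {cis_samples.mean():.3f}, variance {cis_samples.var():.4f}")

Running this sketch, both estimators concentrate near the true value, but the conditional estimator's empirical variance is markedly smaller, which is the variance-reduction effect that conditioning the importance sampling ratio is meant to exploit.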

Related research

Importance Sampling Placement in Off-Policy Temporal-Difference Methods (03/18/2022)
A central challenge to applying many off-policy reinforcement learning a...

Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling (10/15/2019)
We establish a connection between the importance sampling estimators typ...

An Empirical Comparison of Off-policy Prediction Learning Algorithms in the Four Rooms Environment (09/10/2021)
Many off-policy prediction learning algorithms have been proposed in the...

The Importance of Pessimism in Fixed-Dataset Policy Optimization (09/15/2020)
We study worst-case guarantees on the expected return of fixed-dataset p...

Asymptotic optimality of adaptive importance sampling (06/04/2018)
Adaptive importance sampling (AIS) uses past samples to update the sampl...

Efficiency of adaptive importance sampling (06/04/2018)
The sampling policy of stage t, formally expressed as a probability dens...

Importance Resampling for Off-policy Prediction (06/11/2019)
Importance sampling (IS) is a common reweighting strategy for off-policy...