Conditional Importance Sampling for Off-Policy Learning

10/16/2019
by Mark Rowland, et al.

The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
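The following toy sketch is not taken from the paper; it is only meant to illustrate the flavour of the idea, using the classic full-trajectory versus per-decision importance sampling estimators, where the per-decision weights can be viewed as conditional expectations of the full importance sampling ratio given a trajectory prefix. The tiny MDP, policy tables, and all function names are assumptions introduced purely for illustration.

```python
import numpy as np

# Illustrative sketch (not the paper's method): a tiny finite-horizon MDP
# comparing ordinary full-trajectory importance sampling (IS) with
# per-decision IS, whose weights are conditional expectations of the full
# IS ratio. Both estimators are unbiased; the conditioned one typically
# has lower variance. All quantities here are toy assumptions.

rng = np.random.default_rng(0)
T, gamma = 3, 0.9
n_states, n_actions = 2, 2

# Behaviour policy mu (uniform) and target policy pi (prefers action 1).
mu = np.full((n_states, n_actions), 0.5)
pi = np.array([[0.3, 0.7], [0.2, 0.8]])

def step(s, a):
    """Toy deterministic dynamics: reward s + a, next state equals the action."""
    return s + a, a

def sample_episode(policy):
    s, rewards, rhos = 0, [], []
    for _ in range(T):
        a = rng.choice(n_actions, p=policy[s])
        r, s_next = step(s, a)
        rewards.append(r)
        rhos.append(pi[s, a] / mu[s, a])  # per-step IS ratio pi/mu
        s = s_next
    return np.array(rewards), np.array(rhos)

def full_is_estimate(rewards, rhos):
    # Ordinary IS: the whole discounted return is weighted by the product
    # of all per-step ratios along the trajectory.
    return np.prod(rhos) * np.sum(gamma ** np.arange(T) * rewards)

def per_decision_is_estimate(rewards, rhos):
    # Per-decision IS: each reward is weighted only by the ratios of the
    # actions taken up to that time step, i.e. the conditional expectation
    # of the full trajectory ratio given the prefix (future ratios have
    # conditional expectation 1 under the behaviour policy).
    return np.sum(gamma ** np.arange(T) * np.cumprod(rhos) * rewards)

n_episodes = 100_000
full, per_dec = [], []
for _ in range(n_episodes):
    rewards, rhos = sample_episode(mu)
    full.append(full_is_estimate(rewards, rhos))
    per_dec.append(per_decision_is_estimate(rewards, rhos))

# On-policy Monte Carlo ground truth for comparison.
on_policy = [np.sum(gamma ** np.arange(T) * sample_episode(pi)[0])
             for _ in range(n_episodes)]

print(f"on-policy return : {np.mean(on_policy):.3f}")
print(f"full IS          : {np.mean(full):.3f}  (std {np.std(full):.3f})")
print(f"per-decision IS  : {np.mean(per_dec):.3f}  (std {np.std(per_dec):.3f})")
```

In this sketch the per-decision weights arise by replacing the full trajectory ratio with its conditional expectation given the actions taken so far; estimators of this conditioned form are the kind of object the paper's framework organizes and generalizes.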


Related research:

- Importance Sampling Placement in Off-Policy Temporal-Difference Methods (03/18/2022). A central challenge to applying many off-policy reinforcement learning a...
- Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling (10/15/2019). We establish a connection between the importance sampling estimators typ...
- Value-aware Importance Weighting for Off-policy Reinforcement Learning (06/27/2023). Importance sampling is a central idea underlying off-policy prediction i...
- Sample Dropout: A Simple yet Effective Variance Reduction Technique in Deep Policy Optimization (02/05/2023). Recent success in Deep Reinforcement Learning (DRL) methods has shown th...
- Asymptotic optimality of adaptive importance sampling (06/04/2018). Adaptive importance sampling (AIS) uses past samples to update the sampl...
- An Empirical Comparison of Off-policy Prediction Learning Algorithms in the Four Rooms Environment (09/10/2021). Many off-policy prediction learning algorithms have been proposed in the...
- Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning (01/26/2023). Off-policy learning from multistep returns is crucial for sample-efficie...
