Importance Resampling for Off-policy Prediction

06/11/2019
by Matthew Schlegel et al.

Importance sampling (IS) is a common reweighting strategy for off-policy prediction in reinforcement learning. While it is consistent and unbiased, it can produce high-variance updates to the value-function weights. In this work, we explore resampling as an alternative to reweighting. We propose Importance Resampling (IR) for off-policy prediction, which resamples experience from a replay buffer and applies standard on-policy updates. The approach avoids using importance sampling ratios in the update, instead correcting the data distribution before the update. We characterize the bias and consistency of IR, particularly in comparison with Weighted IS (WIS). We demonstrate in several microworlds that IR has improved sample efficiency and lower-variance updates compared to IS and several variance-reduced IS strategies, including variants of WIS and V-trace, which clips IS ratios. We also provide a demonstration showing that IR improves over IS for learning a value function from images in a racing car simulator.
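
The contrast between reweighting and resampling can be illustrated on a toy problem. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a tabular value function, stores the ratio rho = pi(a|s) / mu(a|s) alongside each buffered transition, and draws the mini-batch with probability proportional to rho before applying an ordinary TD(0) update with no ratio in it. All names (Transition, ir_td0_step, the estimator functions) are hypothetical. The one-step estimators contrast IR with IS, WIS, and ratio clipping; V-trace proper applies truncated ratios inside a multi-step target, so clipped_is_estimate only mimics the truncation step.

```python
import random
from typing import List, NamedTuple, Sequence


class Transition(NamedTuple):
    state: int
    reward: float
    next_state: int
    rho: float  # hypothetical field: pi(a|s) / mu(a|s) for the action actually taken


def is_estimate(rewards: Sequence[float], rhos: Sequence[float]) -> float:
    """Ordinary IS: reweight each sample by its ratio, divide by the sample count."""
    return sum(p * r for p, r in zip(rhos, rewards)) / len(rewards)


def wis_estimate(rewards: Sequence[float], rhos: Sequence[float]) -> float:
    """Weighted IS: divide by the sum of the ratios instead of the count."""
    return sum(p * r for p, r in zip(rhos, rewards)) / sum(rhos)


def clipped_is_estimate(rewards: Sequence[float], rhos: Sequence[float],
                        c_bar: float = 1.0) -> float:
    """Truncation in the spirit of V-trace: clip each ratio at c_bar, then reweight."""
    return sum(min(c_bar, p) * r for p, r in zip(rhos, rewards)) / len(rewards)


def ir_estimate(rewards: Sequence[float], rhos: Sequence[float],
                k: int, rng: random.Random) -> float:
    """IR: resample k items with probability proportional to rho, then take a
    plain unweighted mean (no ratio appears in the average)."""
    idx = rng.choices(range(len(rewards)), weights=rhos, k=k)
    return sum(rewards[i] for i in idx) / k


def ir_td0_step(values: List[float], buffer: Sequence[Transition], k: int,
                alpha: float, gamma: float, rng: random.Random) -> None:
    """One IR mini-batch step for a tabular value function: resample by rho,
    then apply the on-policy TD(0) update averaged over the batch."""
    batch = rng.choices(buffer, weights=[t.rho for t in buffer], k=k)
    deltas = [0.0] * len(values)
    for t in batch:
        td_error = t.reward + gamma * values[t.next_state] - values[t.state]
        deltas[t.state] += td_error / k  # average the TD errors over the batch
    for s, d in enumerate(deltas):
        values[s] += alpha * d


if __name__ == "__main__":
    # Hypothetical one-step toy: mu picks actions 0/1 with probability 0.5 each,
    # pi picks action 1 with probability 0.9; action 1 pays 1, action 0 pays 0,
    # so the target value E_pi[r] is 0.9.
    rewards = [0.0, 1.0, 0.0, 1.0]
    rhos = [0.1 / 0.5, 0.9 / 0.5, 0.1 / 0.5, 0.9 / 0.5]  # [0.2, 1.8, 0.2, 1.8]
    rng = random.Random(0)
    print(is_estimate(rewards, rhos))               # 0.9
    print(wis_estimate(rewards, rhos))              # 0.9
    print(clipped_is_estimate(rewards, rhos))       # 0.5 (biased by clipping)
    print(ir_estimate(rewards, rhos, 10_000, rng))  # close to 0.9

    # State 1 never appears as a source state, so its value stays 0 (terminal).
    buffer = [Transition(0, r, 1, p) for r, p in zip(rewards, rhos)]
    values = [0.0, 0.0]
    for _ in range(2000):
        ir_td0_step(values, buffer, k=32, alpha=0.1, gamma=0.9, rng=rng)
    print(values[0])  # should settle near 0.9
```

Conditioned on the buffer, the expected IR mini-batch mean is sum_i rho_i r_i / sum_i rho_i, which is exactly the WIS estimate; this is one way to see why IR's bias is naturally compared with WIS. The clipped estimate of 0.5 in the toy shows how truncating ratios trades variance for bias toward the behavior policy.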

research
06/27/2023

Value-aware Importance Weighting for Off-policy Reinforcement Learning

Importance sampling is a central idea underlying off-policy prediction i...
research
09/10/2021

An Empirical Comparison of Off-policy Prediction Learning Algorithms in the Four Rooms Environment

Many off-policy prediction learning algorithms have been proposed in the...
research
11/12/2018

Importance Weighted Evolution Strategies

Evolution Strategies (ES) emerged as a scalable alternative to popular R...
research
09/15/2022

On the Reuse Bias in Off-Policy Reinforcement Learning

Importance sampling (IS) is a popular technique in off-policy evaluation...
research
07/03/2022

USHER: Unbiased Sampling for Hindsight Experience Replay

Dealing with sparse rewards is a long-standing challenge in reinforcemen...
research
06/04/2018

Efficiency of adaptive importance sampling

The sampling policy of stage t, formally expressed as a probability dens...
research
06/04/2018

Asymptotic optimality of adaptive importance sampling

Adaptive importance sampling (AIS) uses past samples to update the sampl...
