Sample Dropout: A Simple yet Effective Variance Reduction Technique in Deep Policy Optimization

02/05/2023
by Zichuan Lin, et al.

Recent success in Deep Reinforcement Learning (DRL) methods has shown that policy optimization with respect to an off-policy distribution via importance sampling is effective for sample reuse. In this paper, we show that the use of importance sampling can introduce high variance in the objective estimate. Specifically, we show in a principled way that the variance of the importance sampling estimate grows quadratically with the importance ratios, and that large ratios can consequently jeopardize the effectiveness of surrogate objective optimization. We then propose a technique called sample dropout, which bounds the estimation variance by dropping samples whose ratio deviation is too high. We instantiate sample dropout on representative policy optimization algorithms, including TRPO, PPO, and ESPO, and demonstrate that it consistently boosts the performance of those DRL algorithms on both continuous and discrete action control tasks, including MuJoCo, DMControl, and Atari video games. Our code is open-sourced at <https://github.com/LinZichuan/sdpo.git>.
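To make the idea in the abstract concrete, here is a minimal Python sketch of dropping samples by ratio deviation before averaging an importance-weighted surrogate. It is an illustration under stated assumptions, not the authors' implementation: the function name `sample_dropout_surrogate`, the threshold `delta`, the deviation measure `|ratio - 1|`, and the choice to average over kept samples only are all hypothetical, since the abstract does not specify them.

```python
import numpy as np


def sample_dropout_surrogate(logp_new, logp_old, advantages, delta=0.2):
    """Importance-weighted surrogate over samples whose ratio stays near 1.

    Assumptions (not taken from the paper): deviation is |ratio - 1|,
    `delta` is a hypothetical threshold, and the mean is taken over the
    kept samples only.
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    # Importance ratio pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    ratios = np.exp(logp_new - logp_old)

    # Keep a sample only if its ratio deviates from 1 by at most delta.
    keep = np.abs(ratios - 1.0) <= delta

    # If every sample is dropped, return a zero objective.
    if not keep.any():
        return 0.0, keep

    surrogate = float(np.mean(ratios[keep] * advantages[keep]))
    return surrogate, keep


# Usage: the third sample has ratio 2.0, so it is dropped with delta=0.2.
logp_old = np.log([0.5, 0.5, 0.5])
logp_new = np.log([0.5, 0.55, 1.0])
value, kept = sample_dropout_surrogate(logp_new, logp_old, [1.0, 2.0, 3.0])
print(value, kept)
```

Because the dropped samples have the largest ratios, they are also the ones that would contribute most to the quadratic growth in variance described above. In an actual training loop the same mask would be applied to a differentiable loss in the framework of choice.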

research
03/18/2022

Importance Sampling Placement in Off-Policy Temporal-Difference Methods

A central challenge to applying many off-policy reinforcement learning a...
research
05/07/2019

Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning

In importance sampling (IS)-based reinforcement learning algorithms such...
research
09/17/2018

Policy Optimization via Importance Sampling

Policy optimization is an effective reinforcement learning approach to s...
research
05/03/2023

Enhancing Precision with the Local Pivotal Method: A General Variance Reduction Approach

The local pivotal method (LPM) is a successful sampling method for takin...
research
11/22/2019

Importance Sampling of Many Lights with Reinforcement Lightcuts Learning

In this manuscript, we introduce a novel technique for sampling and inte...
research
09/10/2021

An Empirical Comparison of Off-policy Prediction Learning Algorithms in the Four Rooms Environment

Many off-policy prediction learning algorithms have been proposed in the...
research
10/16/2019

Conditional Importance Sampling for Off-Policy Learning

The principal contribution of this paper is a conceptual framework for o...
