Policy Optimization via Importance Sampling

09/17/2018
by Alberto Maria Metelli, et al.

Policy optimization is an effective reinforcement learning approach for solving continuous control tasks. Recent achievements have shown that alternating on-line and off-line optimization is a successful strategy for efficient trajectory reuse. However, deciding when to stop optimizing and collect new trajectories is non-trivial, as it requires accounting for the variance of the objective function estimate. In this paper, we propose a novel model-free policy search algorithm, POIS, applicable in both action-based and parameter-based settings. We first derive a high-confidence bound for importance sampling estimation; we then define a surrogate objective function, which is optimized off-line using a batch of trajectories. Finally, the algorithm is tested on a selection of continuous control tasks, with both linear and deep policies, and compared with state-of-the-art policy optimization methods.
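To make the idea of a variance-aware surrogate concrete, here is a minimal sketch, assuming a one-dimensional parameter-based setting in which the hyperpolicy is a Gaussian N(mu, sigma^2) over a single policy parameter. It computes an importance-sampling estimate of a target hyperpolicy's expected return from a batch collected under a behavioral hyperpolicy, then subtracts a penalty `lam * sqrt(d_2 / N)`, where `d_2` is the exponentiated 2-Rényi divergence between the two Gaussians (closed form for equal variances). The penalty grows as the target moves away from the behavioral distribution, which is what limits how far off-line optimization can go before new trajectories are needed. The helper names, the toy return `-(theta - 2)^2`, the coefficient `lam`, and the grid search used in place of gradient ascent are all illustrative assumptions, not the paper's exact bound or optimizer.

```python
import math
import random
from typing import List, Sequence


def gaussian_logpdf(x: float, mu: float, sigma: float) -> float:
    """Log-density of the 1-D Gaussian N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def renyi2_gaussian(mu_t: float, mu_b: float, sigma: float) -> float:
    """Exponentiated 2-Renyi divergence d_2(p_t || p_b) = E_{p_b}[(p_t / p_b)^2].
    For two Gaussians sharing sigma this equals exp((mu_t - mu_b)^2 / sigma^2)."""
    return math.exp((mu_t - mu_b) ** 2 / sigma ** 2)


def surrogate(mu_t: float, mu_b: float, sigma: float,
              thetas: Sequence[float], returns: Sequence[float], lam: float) -> float:
    """Hypothetical penalised importance-sampling objective.

    thetas were sampled from the behavioral hyperpolicy N(mu_b, sigma^2);
    returns[i] is the return obtained when running the policy with thetas[i]."""
    n = len(thetas)
    # Importance weights p_t(theta) / p_b(theta), computed in log space.
    weights = [math.exp(gaussian_logpdf(th, mu_t, sigma) - gaussian_logpdf(th, mu_b, sigma))
               for th in thetas]
    is_estimate = sum(w * r for w, r in zip(weights, returns)) / n
    # Confidence penalty: shrinks with more samples, grows with the divergence.
    penalty = lam * math.sqrt(renyi2_gaussian(mu_t, mu_b, sigma) / n)
    return is_estimate - penalty


def offline_step(mu_b: float, sigma: float, thetas: Sequence[float],
                 returns: Sequence[float], lam: float, candidates: Sequence[float]) -> float:
    """Reuse the same batch to pick the candidate mean maximising the surrogate.
    Grid search is a stand-in for the gradient-based optimizer a real method would use."""
    return max(candidates, key=lambda m: surrogate(m, mu_b, sigma, thetas, returns, lam))


def toy_return(theta: float) -> float:
    """Hypothetical task whose best parameter is theta = 2."""
    return -(theta - 2.0) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    mu_b, sigma, n = 0.0, 1.0, 200
    thetas: List[float] = [rng.gauss(mu_b, sigma) for _ in range(n)]
    returns = [toy_return(th) for th in thetas]
    grid = [i * 0.25 for i in range(9)]  # candidate means 0.0, 0.25, ..., 2.0
    for lam in (0.0, 1.0, 10.0):
        print(lam, offline_step(mu_b, sigma, thetas, returns, lam, grid))
```

With `lam = 0` the batch is trusted fully and the selected mean can move far from `mu_b`; raising `lam` keeps the update closer to the behavioral distribution, mirroring the trade-off between reusing trajectories and collecting new ones.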
