Low-Variance Policy Gradient Estimation with World Models

10/29/2020
by Michal Nauman, et al.

In this paper, we propose World Model Policy Gradient (WMPG), an approach that reduces the variance of policy gradient estimates using learned world models (WMs). In WMPG, a WM is trained online and used to imagine trajectories. The imagined trajectories are used in two ways: first, to calculate a without-replacement estimator of the policy gradient; second, their return serves as an informed baseline. We compare the proposed approach with AC and MAC on a set of environments of increasing complexity (CartPole, LunarLander and Pong) and find that WMPG has better sample efficiency. Based on these results, we conclude that WMPG can improve sample efficiency in cases where a robust latent representation of the environment can be learned.
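The abstract names two uses of imagined trajectories: a without-replacement policy gradient estimator and a return-based baseline. The sketch below illustrates both ideas at a single state with a discrete softmax policy. As a stand-in, it assumes the unordered set estimator of Kool, van Hoof and Welling (2020), which may differ from the estimator actually used in WMPG. The function names, `imagined_returns` (a hypothetical mapping from each sampled action to the return of a world-model rollout) and the scalar `baseline` (assumed to be computed independently of the sampled set, for example from a separately imagined trajectory) are illustrative and not taken from the paper.

```python
import itertools

import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of action logits."""
    z = np.exp(logits - np.max(logits))
    return z / z.sum()


def set_probability(p, subset):
    """Probability that sampling len(subset) actions without replacement from
    the categorical distribution p yields exactly `subset`, in any order."""
    total = 0.0
    for order in itertools.permutations(subset):
        prob, remaining = 1.0, 1.0
        for a in order:
            prob *= p[a] / remaining  # renormalize over actions not yet drawn
            remaining -= p[a]
        total += prob
    return total


def leave_one_out_ratio(p, subset, s):
    """R(S, s) = P(S minus s | s excluded) / P(S), as in the unordered set estimator."""
    rest = [a for a in subset if a != s]
    q = p.copy()
    q[s] = 0.0
    q /= q.sum()  # distribution conditioned on action s being excluded
    return set_probability(q, rest) / set_probability(p, subset)


def unordered_set_gradient(logits, subset, imagined_returns, baseline=0.0):
    """Estimate the gradient of E_{a~pi}[G(a)] w.r.t. the softmax logits from a
    set of distinct actions sampled without replacement.

    imagined_returns: hypothetical mapping action -> return of a trajectory
        imagined by a world model after taking that action.
    baseline: scalar assumed to be independent of `subset` (e.g. the return of
        a separately imagined trajectory), which keeps the estimate unbiased.
    """
    p = softmax(logits)
    eye = np.eye(len(p))
    grad = np.zeros_like(p)
    for s in subset:
        grad_p_s = p[s] * (eye[s] - p)  # d pi(s) / d logits for a softmax policy
        advantage = imagined_returns[s] - baseline
        grad += grad_p_s * leave_one_out_ratio(p, subset, s) * advantage
    return grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = np.array([0.5, -0.2, 1.0])
    imagined_returns = {0: 1.0, 1: 3.0, 2: -0.5}  # hypothetical WM rollout returns
    # Draw two distinct actions from the current policy.
    sampled = rng.choice(len(logits), size=2, replace=False, p=softmax(logits))
    subset = [int(a) for a in sampled]
    print(subset, unordered_set_gradient(logits, subset, imagined_returns, baseline=1.0))
```

Because the baseline does not depend on which actions were sampled, subtracting it leaves the estimate unbiased: the correction term has expectation `baseline` times the gradient of the summed action probabilities, which is zero. Enumerating every action subset is a convenient way to check this numerically, but only for tiny action spaces, since `set_probability` loops over all orderings of the subset.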
