Expected Policy Gradients for Reinforcement Learning

01/10/2018
by   Kamil Ciosek, et al.

We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by Expected SARSA, EPG integrates (or sums) across actions when estimating the gradient, instead of relying only on the action in the sampled trajectory. For continuous action spaces, we first derive a practical result for Gaussian policies and quadric critics and then extend it to an analytical method for the universal case, covering a broad class of actors and critics, including Gaussian, exponential families, and reparameterised policies with bounded support. For Gaussian policies, we show that it is optimal to explore using covariance proportional to the matrix exponential of the scaled Hessian of the critic with respect to the actions. EPG also provides a general framework for reasoning about policy gradient methods, which we use to establish a new general policy gradient theorem, of which the stochastic and deterministic policy gradient theorems are special cases. Furthermore, we prove that EPG reduces the variance of the gradient estimates without requiring deterministic policies and with little computational overhead. Finally, we show that EPG outperforms existing approaches on six challenging domains involving the simulated control of physical systems.
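
To make the "sums across actions" idea concrete, here is a minimal sketch for a single state with a small discrete action set. It is an illustration under stated assumptions, not the authors' implementation: the policy is assumed to be a softmax over logits, the critic is assumed to be a fixed vector of action values, and the names `softmax`, `spg_gradient`, and `epg_gradient` are hypothetical. The sampled estimator uses only one action, while the summed estimator weights every action by the critic, so for a given state it has no variance from action sampling.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    exps = np.exp(shifted)
    return exps / exps.sum()


def spg_gradient(logits, q_values, action):
    """Single-sample estimate: grad_logits log pi(action) * Q(action).

    Assumes a softmax policy, for which
    d log pi(a) / d logits = onehot(a) - pi.
    """
    pi = softmax(logits)
    grad_log_pi = -pi.copy()
    grad_log_pi[action] += 1.0
    return grad_log_pi * q_values[action]


def epg_gradient(logits, q_values):
    """Sum over all actions: sum_a grad_logits pi(a) * Q(a).

    For a softmax policy this sum has the closed form
    pi_k * (Q_k - V), where V = sum_a pi(a) Q(a).
    """
    pi = softmax(logits)
    state_value = pi @ q_values
    return pi * (q_values - state_value)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = np.array([0.5, -1.0, 2.0])    # hypothetical policy parameters
    q_values = np.array([1.0, 3.0, -2.0])  # hypothetical critic output
    pi = softmax(logits)

    # Draw many single-action estimates and compare them with the summed one.
    actions = rng.choice(len(pi), size=5000, p=pi)
    samples = np.stack([spg_gradient(logits, q_values, a) for a in actions])

    print("mean of sampled estimates:", samples.mean(axis=0))
    print("summed estimate:          ", epg_gradient(logits, q_values))
    print("per-component variance of sampled estimates:", samples.var(axis=0))
```

In this toy setting the average of the sampled estimates approaches the summed estimate as the number of samples grows, which matches the abstract's claim that summing over actions lowers variance without making the policy deterministic. The continuous-action case described in the abstract replaces the sum with an integral and is not covered by this sketch.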


