Marginal Policy Gradients for Complex Control

06/13/2018
by   Carson Eisenach, et al.
0

Many complex domains, such as robotics control and real-time strategy (RTS) games, require an agent to learn a continuous control. In the former, an agent learns a policy over R^d and in the latter, over a discrete set of actions each of which is parametrized by a continuous parameter. Such problems are naturally solved using policy based reinforcement learning (RL) methods, but unfortunately these often suffer from high variance leading to instability and slow convergence. We show that in many cases a substantial portion of the variance in policy gradient estimators is completely unnecessary and can be eliminated without introducing bias. Unnecessary variance is introduced whenever policies over bounded action spaces are modeled using distributions with unbounded support, by applying a transformation T to the sampled action before execution in the environment. Recent works have studied variance reduced policy gradients for actions in bounded intervals, but to date no variance reduced methods exist when the action is a direction -- constrained to the unit sphere -- something often seen in RTS games. To address these challenges we: (1) introduce a stochastic policy gradient method for directional control; (2) introduce the marginal policy gradient framework, a powerful technique to obtain variance reduced policy gradients for arbitrary T; (3) show that marginal policy gradients are guaranteed to reduce variance, quantifying that reduction exactly; (4) validate our framework by applying the methods to a popular RTS game and a navigation task, demonstrating improvement over a policy gradient baseline.

READ FULL TEXT

page 11

page 12

research
02/21/2018

Clipped Action Policy Gradient

Many continuous control tasks have bounded action spaces and clip out-of...
research
01/17/2013

Efficient Sample Reuse in Policy Gradients with Parameter-based Exploration

The policy gradient approach is a flexible and powerful reinforcement le...
research
01/10/2018

Expected Policy Gradients for Reinforcement Learning

We propose expected policy gradients (EPG), which unify stochastic polic...
research
05/08/2019

Smoothing Policies and Safe Policy Gradients

Policy gradient algorithms are among the best candidates for the much an...
research
10/24/2022

On All-Action Policy Gradients

In this paper, we analyze the variance of stochastic policy gradient wit...
research
02/20/2021

Causal Policy Gradients

Policy gradient methods can solve complex tasks but often fail when the ...
research
07/11/2021

Coordinate-wise Control Variates for Deep Policy Gradients

The control variates (CV) method is widely used in policy gradient estim...

Please sign up or login with your details

Forgot password? Click here to reset