Fourier Policy Gradients

02/19/2018
by Matthew Fellows, et al.

We propose a new way of deriving policy gradient updates for reinforcement learning. Our technique, based on Fourier analysis, recasts the integrals that arise in expected policy gradients (EPG) as convolutions and turns them into multiplications. The resulting analytical solutions allow us to capture the low-variance benefits of EPG in a broad range of settings. For the critic, we treat trigonometric and radial basis functions, two function families with the universal approximation property. The choice of policy can be almost arbitrary, including mixtures or hybrid continuous-discrete probability distributions. Moreover, we derive a general family of sample-based estimators for stochastic policy gradients, which unifies existing results on sample-based approximation. We believe that this technique has the potential to shape the next generation of policy gradient approaches, powered by analytical results.
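To give a feel for what an analytical solution of this kind can look like, here is a minimal sketch. It is not taken from the paper. It assumes a one-dimensional Gaussian policy a ~ N(mu, sigma^2) and a critic made of a single cosine term, Q(a) = cos(omega * a). Under those assumptions the expected critic value follows from the Gaussian characteristic function, E[cos(omega * a)] = cos(omega * mu) * exp(-omega^2 * sigma^2 / 2), so the gradient with respect to mu is available in closed form. The function names and parameter values are hypothetical, and the sampled score-function estimator is included only for comparison.

```python
import math
import random


def analytic_expected_critic(mu, sigma, omega):
    """Closed form of E[cos(omega * a)] for a ~ N(mu, sigma^2).

    Assumption for illustration: the critic is a single cosine term.
    """
    return math.cos(omega * mu) * math.exp(-0.5 * (omega * sigma) ** 2)


def analytic_gradient_mu(mu, sigma, omega):
    """Derivative of the closed form above with respect to the policy mean."""
    return -omega * math.sin(omega * mu) * math.exp(-0.5 * (omega * sigma) ** 2)


def sampled_gradient_mu(mu, sigma, omega, n_samples, seed=0):
    """Score-function (likelihood-ratio) estimate of the same gradient."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        a = rng.gauss(mu, sigma)
        score = (a - mu) / sigma ** 2  # d/dmu of log N(a; mu, sigma^2)
        total += math.cos(omega * a) * score
    return total / n_samples


if __name__ == "__main__":
    mu, sigma, omega = 0.3, 0.5, 2.0  # hypothetical values
    print("analytic gradient:", analytic_gradient_mu(mu, sigma, omega))
    print("sampled gradient: ", sampled_gradient_mu(mu, sigma, omega, 200_000))
```

The analytic gradient needs no action samples at all, whereas the sampled estimate only approaches it as the number of samples grows. That gap is the kind of variance benefit the abstract attributes to EPG.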
