Bag of Tricks for Natural Policy Gradient Reinforcement Learning

01/22/2022
by   Brennan Gebotys, et al.

Natural policy gradient methods are popular reinforcement learning methods that improve the stability of policy gradient methods by preconditioning the gradient with the inverse of the Fisher information matrix. However, using them well is challenging because many implementation details must be set correctly to achieve good performance. To the best of the authors' knowledge, no study has investigated strategies for setting these details in a comprehensive and systematic manner. To address this, we implemented and compared strategies that impact performance in natural policy gradient reinforcement learning across five different second-order approximations. These strategies include varying batch sizes and optimizing the critic network using the natural gradient. We also report insights about the fundamental trade-offs when optimizing for performance (stability, sample efficiency, and computation time). Experimental results indicate that the proposed collection of strategies can improve results by 86 […] benchmark, with TENGraD exhibiting the best approximation performance among the tested approximations. Code for this study is available at https://github.com/gebob19/natural-policy-gradient-reinforcement-learning.
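
To make the preconditioning step in the abstract concrete, here is a minimal sketch of a natural gradient computation. It is not taken from the paper or its repository. It assumes a toy state-free softmax policy over a few discrete actions, an empirical Fisher matrix built from score vectors, and a damped conjugate gradient solve; the names `natural_gradient`, `fisher_vec`, and the `damping` parameter are hypothetical and chosen for illustration only.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def score(probs, action):
    # Gradient of log pi(action) w.r.t. the logits: onehot(action) - probs
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]

def natural_gradient(logits, actions, advantages, damping=1e-3, iters=50):
    """Return (vanilla_grad, natural_grad) for a toy softmax policy (illustrative only)."""
    probs = softmax(logits)
    scores = [score(probs, a) for a in actions]
    n, batch = len(logits), len(actions)
    # Vanilla policy gradient estimate: mean of advantage * score
    g = [sum(adv * s[i] for adv, s in zip(advantages, scores)) / batch
         for i in range(n)]

    def fisher_vec(v):
        # Computes (F + damping * I) v, where F is the mean of score * score^T,
        # without ever forming F explicitly.
        out = [damping * x for x in v]
        for s in scores:
            c = dot(s, v) / batch
            for i in range(n):
                out[i] += c * s[i]
        return out

    # Conjugate gradient solve of (F + damping * I) x = g
    x = [0.0] * n
    r = list(g)
    p = list(g)
    rs = dot(r, r)
    for _ in range(iters):
        if rs < 1e-20:
            break
        Ap = fisher_vec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return g, x
```

The damping term is an assumption of this sketch: the softmax Fisher matrix is singular along the all-ones direction, so a small multiple of the identity keeps the linear system well posed. Using Fisher-vector products with conjugate gradient avoids storing or inverting the full matrix, which is one common way such updates are approximated at scale.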


research
05/04/2021

On the Linear convergence of Natural Policy Gradient Algorithm

Markov Decision Processes are classically solved using Value Iteration a...
research
05/25/2020

Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO

We study the roots of algorithmic progress in deep policy gradient algor...
research
05/28/2021

A nearly Blackwell-optimal policy gradient method

For continuing environments, reinforcement learning methods commonly max...
research
01/08/2021

Learning Low-Correlation GPS Spreading Codes with a Policy Gradient Algorithm

With the birth of the next-generation GPS III constellation and the upco...
research
02/10/2017

Batch Policy Gradient Methods for Improving Neural Conversation Models

We study reinforcement learning of chatbots with recurrent neural networ...
research
07/12/2022

Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning

In lifelong learning, an agent learns throughout its entire life without...
research
06/30/2021

Inverse Design of Grating Couplers Using the Policy Gradient Method from Reinforcement Learning

We present a proof-of-concept technique for the inverse design of electr...
