Episodic Policy Gradient Training

12/03/2021
by   Hung Le, et al.
13

We introduce a novel training procedure for policy gradient methods wherein episodic memory is used to optimize the hyperparameters of reinforcement learning algorithms on-the-fly. Unlike other hyperparameter searches, we formulate hyperparameter scheduling as a standard Markov Decision Process and use episodic memory to store the outcome of used hyperparameters and their training contexts. At any policy update step, the policy learner refers to the stored experiences, and adaptively reconfigures its learning algorithm with the new hyperparameters determined by the memory. This mechanism, dubbed as Episodic Policy Gradient Training (EPGT), enables an episodic learning process, and jointly learns the policy and the learning algorithm's hyperparameters within a single run. Experimental results on both continuous and discrete environments demonstrate the advantage of using the proposed method in boosting the performance of various policy gradient algorithms.

READ FULL TEXT

page 2

page 13

page 17

research
02/18/2019

Fast Efficient Hyperparameter Tuning for Policy Gradients

The performance of policy gradient methods is sensitive to hyperparamete...
research
10/20/2020

Proximal Policy Gradient: PPO with Policy Gradient

In this paper, we propose a new algorithm PPG (Proximal Policy Gradient)...
research
10/18/2018

Trust Region Policy Optimization of POMDPs

We propose Generalized Trust Region Policy Optimization (GTRPO), a Reinf...
research
02/23/2022

Consistent Dropout for Policy Gradient Reinforcement Learning

Dropout has long been a staple of supervised learning, but is rarely use...
research
10/08/2017

Recurrent Network-based Deterministic Policy Gradient for Solving Bipedal Walking Challenge on Rugged Terrains

This paper presents the learning algorithm based on the Recurrent Networ...
research
04/17/2020

Deep Reinforcement Learning for Adaptive Learning Systems

In this paper, we formulate the adaptive learning problem—the problem of...
research
05/18/2023

Deep Metric Tensor Regularized Policy Gradient

Policy gradient algorithms are an important family of deep reinforcement...

Please sign up or login with your details

Forgot password? Click here to reset