Policy Gradient RL Algorithms as Directed Acyclic Graphs

12/14/2020
by Juan Jose Garau Luis, et al.
Meta Reinforcement Learning (RL) methods focus on automating the design of RL algorithms that generalize to a wide range of environments. The framework introduced in (Anonymous, 2020) addresses the problem by representing different RL algorithms as Directed Acyclic Graphs (DAGs), and using an evolutionary meta learner to modify these graphs and find good agent update rules. While the search language used to generate graphs in the paper serves to represent numerous already-existing RL algorithms (e.g., DQN, DDQN), it has limitations when it comes to representing Policy Gradient algorithms. In this work we try to close this gap by extending the original search language and proposing graphs for five different Policy Gradient algorithms: VPG, PPO, DDPG, TD3, and SAC.
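
To make the idea of an update rule expressed as a DAG concrete, here is a minimal sketch in Python. It is not the search language of (Anonymous, 2020) or of this paper: the operation table `OPS`, the node names, and the `evaluate` helper are all hypothetical, chosen only to illustrate how a DQN-style squared TD error, (r + gamma * max Q_target(s', a') - Q(s, a))^2, could be written as a graph whose nodes are evaluated in topological order. Scalars and plain lists stand in for network outputs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical operation set; a real search language would be much richer.
OPS = {
    "max": lambda xs: max(xs),
    "mul": lambda a, b: a * b,
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "square": lambda a: a * a,
}

# Each internal node maps to (operation, ordered list of parent nodes).
# Names that never appear as keys (reward, gamma, ...) are graph inputs.
DQN_GRAPH = {
    "max_next_q": ("max", ["q_target_next"]),
    "discounted": ("mul", ["gamma", "max_next_q"]),
    "td_target": ("add", ["reward", "discounted"]),
    "td_error": ("sub", ["td_target", "q_sa"]),
    "loss": ("square", ["td_error"]),
}


def evaluate(graph, inputs, output="loss"):
    """Evaluate nodes in topological order and return the output node's value."""
    deps = {name: set(parents) for name, (_, parents) in graph.items()}
    values = dict(inputs)
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(deps).static_order():
        if name in values:
            continue  # input node, value already supplied
        op, parents = graph[name]
        values[name] = OPS[op](*(values[p] for p in parents))
    return values[output]


loss = evaluate(
    DQN_GRAPH,
    {"q_target_next": [1.0, 3.0], "gamma": 0.5, "reward": 1.0, "q_sa": 2.0},
)
print(loss)
```

In this toy form, a meta learner that mutates the graph would amount to editing entries of `DQN_GRAPH` (swapping an operation or rewiring a parent) and re-evaluating. Representing Policy Gradient methods would presumably require further node types, such as log-probabilities of sampled actions, which this sketch does not attempt.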
