Parameter Sharing in Coagent Networks

01/28/2020
by Modjtaba Shokrian Zini, et al.

In this paper, we prove a theorem that generalizes the Coagent Network Policy Gradient Theorem (Kostas et al., 2019) to the setting where parameters are shared among the function approximators involved. This provides the theoretical foundation for using any pattern of parameter sharing, and for leveraging the freedom in the network's graph structure to possibly exploit relational bias in a given task. As another application, we apply our result to give a more intuitive proof of the Hierarchical Option-Critic Policy Gradient Theorem, first shown in (Riemer et al., 2019).
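To make the shared-parameter statement concrete, here is a minimal, hypothetical sketch in Python, not the paper's algorithm or notation: two linear-softmax coagents execute in sequence while reading from one shared parameter matrix, and a REINFORCE-style update sums their local log-probability gradients into that matrix, which is the kind of accumulation the generalized theorem licenses. The toy network, names, and learning rate are all assumptions for illustration.

# A minimal sketch (not the paper's method): two coagents share one
# parameter matrix, and the policy gradient accumulates each coagent's
# local REINFORCE gradient into the shared parameters.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()            # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

# One shared parameter matrix used by both coagents (2 actions, 3 features).
theta = rng.normal(scale=0.1, size=(2, 3))

def coagent_act(theta, x):
    """Sample an action; return it with grad of log pi(a|x) w.r.t. theta."""
    probs = softmax(theta @ x)
    a = rng.choice(2, p=probs)
    # For a linear-softmax policy: d log pi(a|x) / d theta[k,j]
    #   = (1[k == a] - probs[k]) * x[j]
    grad = -np.outer(probs, x)
    grad[a] += x
    return a, grad

# One (fake) episode: coagent 1 reads the state; coagent 2 reads an
# encoding of coagent 1's action; both use the same theta.
x1 = rng.normal(size=3)        # observed state features
a1, g1 = coagent_act(theta, x1)
x2 = np.eye(3)[a1]             # coagent 2's input encodes coagent 1's action
a2, g2 = coagent_act(theta, x2)
G = 1.0                        # stand-in return for this episode

# Shared-parameter update: sum the coagents' local gradients.
theta += 0.1 * G * (g1 + g2)

The only change relative to the unshared case is the accumulation g1 + g2: each coagent contributes its local gradient to every parameter it touches, regardless of where that parameter appears in the network's graph.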


