Difference Rewards Policy Gradients

12/21/2020
by Jacopo Castellini, et al.

Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. However, many of these methods do not address a key challenge: multi-agent credit assignment, that is, assessing an agent's contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly tackles this challenge by combining difference rewards with policy gradients, allowing decentralized policies to be learned when the reward function is known. By differencing the reward function directly, Dr.Reinforce avoids the difficulties associated with learning the Q-function, as is done by Counterfactual Multiagent Policy Gradients (COMA), a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the effectiveness of a version of Dr.Reinforce that learns an additional reward network, which is used to estimate the difference rewards.
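To make the idea of differencing a known reward function concrete, here is a minimal sketch. It assumes one common form of difference reward, in which an agent's baseline is the expected reward when its own action is resampled from its policy while the other agents' actions stay fixed; the exact estimator used by Dr.Reinforce may differ. The names `difference_reward` and `coverage_reward`, and the toy reward itself, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence, Tuple

JointAction = Tuple[int, ...]


def difference_reward(
    reward_fn: Callable[[JointAction], float],
    joint_action: JointAction,
    agent: int,
    policy_probs: Sequence[float],
) -> float:
    """Reward of the joint action minus a counterfactual baseline.

    The baseline averages reward_fn over the agent's own actions,
    weighted by its policy, with the other agents' actions held fixed
    (an assumed form of difference reward, not necessarily the paper's).
    """
    baseline = 0.0
    for action, prob in enumerate(policy_probs):
        counterfactual = list(joint_action)
        counterfactual[agent] = action
        baseline += prob * reward_fn(tuple(counterfactual))
    return reward_fn(joint_action) - baseline


def coverage_reward(joint_action: JointAction) -> float:
    """Hypothetical shared reward: how many distinct actions are covered."""
    return float(len(set(joint_action)))


joint_action = (0, 0, 1)   # three agents, two actions each
uniform_policy = [0.5, 0.5]  # assumed policy for every agent

diffs = [
    difference_reward(coverage_reward, joint_action, i, uniform_policy)
    for i in range(len(joint_action))
]
print(diffs)  # [0.0, 0.0, 0.5]
```

All three agents receive the same shared reward of 2, yet only the third agent gets a non-zero difference reward, because it alone covers action 1. In a policy gradient method, a signal of this kind would take the place of the shared reward when weighting each agent's log-probability gradient.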

