Effects of Spectral Normalization in Multi-agent Reinforcement Learning

12/10/2022
by Kinal Mehta, et al.

A reliable critic is central to on-policy actor-critic learning. Learning one is challenging in multi-agent sparse-reward scenarios for two reasons: 1) the joint action space grows exponentially with the number of agents, and 2) this growth, combined with reward sparseness and environment noise, leads to large sample requirements for accurate learning. We show that regularising the critic with spectral normalization (SN) enables it to learn more robustly, even in multi-agent on-policy sparse-reward scenarios. Our experiments show that the regularised critic learns quickly from sparse rewarding experience in the complex SMAC and RWARE domains. These findings highlight the importance of regularising the critic for stable learning.
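To make the central operation concrete, the following is a minimal sketch of spectral normalization applied to a single weight matrix. It is not the authors' implementation, which the abstract does not describe. It assumes the common formulation in which a layer's weight matrix W is divided by an estimate of its largest singular value, obtained by power iteration, so that the rescaled matrix has spectral norm close to 1. All function names and parameters (`spectral_norm`, `spectrally_normalize`, `n_iters`, `seed`) are hypothetical and chosen for illustration only.

```python
import math
import random


def matvec(W, v):
    """Compute W v for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matvec_t(W, u):
    """Compute W^T u for a matrix stored as a list of rows."""
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]


def l2_normalize(x, eps=1e-12):
    """Scale x to (approximately) unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in x))
    return [a / (norm + eps) for a in x]


def spectral_norm(W, n_iters=50, seed=0):
    """Estimate the largest singular value of W by power iteration."""
    rng = random.Random(seed)
    # Random starting vector in the input space of W.
    v = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(len(W[0]))])
    for _ in range(n_iters):
        u = l2_normalize(matvec(W, v))    # left singular vector estimate
        v = l2_normalize(matvec_t(W, u))  # right singular vector estimate
    u = l2_normalize(matvec(W, v))
    # sigma is approximately u^T W v
    return sum(a * b for a, b in zip(u, matvec(W, v)))


def spectrally_normalize(W, n_iters=50):
    """Return W divided by its estimated spectral norm (assumed nonzero)."""
    sigma = max(spectral_norm(W, n_iters), 1e-12)  # guard against division by zero
    return [[w / sigma for w in row] for row in W]


# Hypothetical 2x2 critic weight matrix with singular values 3 and 1.
W = [[3.0, 0.0], [0.0, 1.0]]
W_sn = spectrally_normalize(W)
print(spectral_norm(W), spectral_norm(W_sn))
```

In a critic network, this rescaling would be applied to the weights of the chosen layers at each forward pass; which layers are normalized is a design choice that the abstract does not specify.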

research
06/12/2020

Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning

Exploration in multi-agent reinforcement learning is a challenging probl...
research
10/31/2019

PIC: Permutation Invariant Critic for Multi-Agent Deep Reinforcement Learning

Sample efficiency and scalability to a large number of agents are two im...
research
06/12/2020

Potential Field Guided Actor-Critic Reinforcement Learning

In this paper, we consider the problem of actor-critic reinforcement lea...
research
12/31/2020

Multi-Agent Reinforcement Learning for Unmanned Aerial Vehicle Coordination by Multi-Critic Policy Gradient Optimization

Recent technological progress in the development of Unmanned Aerial Vehi...
research
01/18/2020

Effects of sparse rewards of different magnitudes in the speed of learning of model-based actor critic methods

Actor critic methods with sparse rewards in model-based deep reinforceme...
research
09/07/2018

Improving On-policy Learning with Statistical Reward Accumulation

Deep reinforcement learning has obtained significant breakthroughs in re...
research
10/15/2020

Cooperative-Competitive Reinforcement Learning with History-Dependent Rewards

Consider a typical organization whose worker agents seek to collectively...
