Revisiting the Gumbel-Softmax in MADDPG

02/23/2023
by   Callum Rhys Tilbury, et al.
0

MADDPG is an algorithm in multi-agent reinforcement learning (MARL) that extends the popular single-agent method, DDPG, to multi-agent scenarios. Importantly, DDPG is an algorithm designed for continuous action spaces, where the gradient of the state-action value function exists. For this algorithm to work in discrete action spaces, discrete gradient estimation must be performed. For MADDPG, the Gumbel-Softmax (GS) estimator is used – a reparameterisation which relaxes a discrete distribution into a similar continuous one. This method, however, is statistically biased, and a recent MARL benchmarking paper suggests that this bias makes MADDPG perform poorly in grid-world situations, where the action space is discrete. Fortunately, many alternatives to the GS exist, boasting a wide range of properties. This paper explores several of these alternatives and integrates them into MADDPG for discrete grid-world scenarios. The corresponding impact on various performance metrics is then measured and analysed. It is found that one of the proposed estimators performs significantly better than the original GS in several tasks, achieving up to 55 higher returns, along with faster convergence.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/10/2022

Deep Multi-Agent Reinforcement Learning with Hybrid Action Spaces based on Maximum Entropy

Multi-agent deep reinforcement learning has been applied to address a va...
research
03/12/2019

Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces

Deep Reinforcement Learning (DRL) has been applied to address a variety ...
research
03/22/2021

Regularized Softmax Deep Multi-Agent Q-Learning

Tackling overestimation in Q-learning is an important problem that has b...
research
01/21/2023

Decentralized Multi-agent Filtering

This paper addresses the considerations that comes along with adopting d...
research
06/02/2018

Efficient Entropy for Policy Gradient with Multidimensional Action Space

In recent years, deep reinforcement learning has been shown to be adept ...
research
05/31/2021

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning

Reinforcement Learning in large action spaces is a challenging problem. ...
research
05/31/2023

Handling Large Discrete Action Spaces via Dynamic Neighborhood Construction

Large discrete action spaces remain a central challenge for reinforcemen...

Please sign up or login with your details

Forgot password? Click here to reset