Multiagent Soft Q-Learning

04/25/2018
by Ermo Wei, et al.

Policy gradient methods are often applied to reinforcement learning in continuous multiagent games. These methods perform local search in the joint-action space and, as we show, they are susceptible to a game-theoretic pathology known as relative overgeneralization. To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous control. We compare our method to MADDPG, a state-of-the-art approach, and show that our method achieves better coordination in multiagent cooperative tasks, converging to better local optima in the joint action space.
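
For context, single-agent soft Q-learning replaces the hard max in the Bellman backup with a soft maximum over a continuous action space. The following is a standard formulation (in the style of Haarnoja et al.'s soft Q-learning), given here as background rather than quoted from this paper; alpha denotes the temperature parameter:

    V(s) = alpha * log ∫ exp( Q(s, a) / alpha ) da
    Q(s, a) <- r(s, a) + gamma * E_{s'}[ V(s') ]

As alpha goes to 0, this backup recovers the usual max-based Q-learning target. The multiagent method described in the abstract can be read as applying an analogous soft backup over the agents' joint continuous actions, which is how it aims to avoid the poor local optima that relative overgeneralization induces in local policy-gradient search.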

