Is Vanilla Policy Gradient Overlooked? Analyzing Deep Reinforcement Learning for Hanabi

03/22/2022
by Bram Grooten, et al.

In pursuit of enhanced multi-agent collaboration, we analyze several on-policy deep reinforcement learning algorithms on the recently published Hanabi benchmark. Our results suggest a perhaps counter-intuitive finding: Vanilla Policy Gradient outperforms Proximal Policy Optimization (PPO) across multiple random seeds in a simplified version of this multi-agent cooperative card game. In analyzing this behavior, we examine Hanabi-specific metrics and hypothesize a reason for PPO's plateau. In addition, we provide proofs for the maximum length of a perfect game (71 turns) and of any game (89 turns). Our code can be found at: https://github.com/bramgrooten/DeepRL-for-Hanabi
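For readers unfamiliar with the two algorithms being compared, the following is a minimal sketch of their standard textbook surrogate losses, not the code from the linked repository. The function names `vpg_loss` and `ppo_clip_loss`, the plain-list inputs, and the default `clip_eps=0.2` are assumptions chosen for illustration.

```python
import math


def vpg_loss(log_probs, advantages):
    """Vanilla Policy Gradient surrogate loss: -mean(log pi(a|s) * A)."""
    n = len(log_probs)
    return -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / n


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss: -mean(min(r * A, clip(r, 1-eps, 1+eps) * A)).

    r is the probability ratio pi_new(a|s) / pi_old(a|s), computed from
    log-probabilities. clip_eps=0.2 is an illustrative default (assumption).
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum removes any incentive to push the ratio
        # beyond the clip range in the direction that improves the objective.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

The only structural difference in this sketch is the clipping of the probability ratio: when the new and old policies coincide, the ratio is 1 and the PPO loss reduces to the negative mean advantage.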


