Accelerating Training in Pommerman with Imitation and Reinforcement Learning

11/12/2019
by Hardik Meisheri, et al.

The Pommerman simulation was recently developed to mimic the classic Japanese game Bomberman, and focuses on competitive gameplay in a multi-agent setting. We focus on the 2×2 team version of Pommerman, developed for a competition at NeurIPS 2018. Our methodology first trains an agent through imitation learning on a noisy expert policy, then continues training with reinforcement learning using proximal policy optimization (PPO). The basic PPO approach is modified to give a stable transition out of the imitation learning phase, using reward shaping, heuristic action filters, and curriculum learning. The proposed methodology beats heuristic and pure reinforcement learning baselines after a combined 100,000 training games, significantly faster than other non-tree-search methods in the literature. We present results against multiple agents provided by the developers of the simulation, including some that we have enhanced. We include a sensitivity analysis over different parameters, and highlight undesirable effects of some strategies that initially appear promising. Since Pommerman is a complex multi-agent competitive environment, the strategies developed here provide insights into several real-world problems with characteristics such as partial observability, decentralized execution (without communication), and very sparse and delayed rewards.
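The abstract does not spell out how the heuristic action filters are applied, so the following is only a minimal sketch of one common way to do it: actions rejected by a filter are given zero probability and the policy is renormalized over the remaining ones. The function name `filtered_policy`, the boolean `allowed` mask, and the fallback when every action is rejected are illustrative assumptions, not the authors' implementation. The only detail taken from the environment is that Pommerman has six discrete actions (stop, up, down, left, right, bomb).

```python
import math

NUM_ACTIONS = 6  # Pommerman actions: stop, up, down, left, right, bomb


def filtered_policy(logits, allowed):
    """Softmax over `logits`, restricted to actions marked True in `allowed`.

    `allowed` is a hypothetical mask produced by some heuristic filter,
    for example one that rejects moves into a blast zone.
    """
    if len(logits) != len(allowed):
        raise ValueError("logits and allowed must have the same length")
    if not any(allowed):
        # Assumed fallback: if the filter rejects everything, ignore it.
        allowed = [True] * len(logits)
    # Subtract the largest allowed logit for numerical stability.
    peak = max(l for l, ok in zip(logits, allowed) if ok)
    weights = [math.exp(l - peak) if ok else 0.0
               for l, ok in zip(logits, allowed)]
    total = sum(weights)
    return [w / total for w in weights]


# Usage: uniform logits, with only "stop" and "down" considered safe.
probs = filtered_policy([0.0] * NUM_ACTIONS,
                        [True, False, True, False, False, False])
```

In this example the two permitted actions share the probability mass equally and every filtered action has probability zero, so sampling from `probs` can never select a rejected move.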

