School of hard knocks: Curriculum analysis for Pommerman with a fixed computational budget

02/23/2021
by Omkar Shelke, et al.
Pommerman is a hybrid cooperative/adversarial multi-agent environment characterized by partial observability, limited or no communication, sparse and delayed rewards, and restrictive computational time limits. These properties make it a difficult environment for reinforcement learning (RL) approaches. In this paper, we focus on developing a curriculum for learning a robust and promising policy within a constrained computational budget of 100,000 games, starting from a fixed base policy (which is itself trained to imitate a noisy expert policy). All RL algorithms starting from the base policy use vanilla proximal policy optimization (PPO) with the same reward function; the only difference between their training is the mix and sequence of opponent policies. One might expect that beginning training with simpler opponents and then gradually increasing the opponent difficulty would facilitate faster learning and lead to more robust policies than a baseline in which all available opponent policies are introduced from the start. We test this hypothesis and show that, within constrained computational budgets, it is in fact better to "learn in the school of hard knocks", i.e., against all available opponent policies nearly from the start. We also include ablation studies examining how modifying the base environment properties of ammo and bomb blast strength affects agent performance.
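
To make the comparison concrete, the two opponent schedules contrasted in the abstract could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the opponent names, the equal-length stages, the uniform sampling, and the function names are all assumptions introduced here. The abstract also says opponents are introduced "nearly" from the start, which this sketch simplifies to the first game.

```python
import random

BUDGET = 100_000  # total training games, as stated in the abstract

# Hypothetical opponent pool, ordered easiest to hardest (placeholder names).
OPPONENTS = ["static", "random", "rule_based", "self_play"]


def staged_pool(game_idx, budget=BUDGET, opponents=OPPONENTS):
    """Gradual curriculum (assumed form): unlock one more opponent per equal-sized stage."""
    stage_len = budget // len(opponents)
    unlocked = min(game_idx // stage_len + 1, len(opponents))
    return opponents[:unlocked]


def hard_knocks_pool(game_idx, budget=BUDGET, opponents=OPPONENTS):
    """'School of hard knocks' (simplified): every opponent is available from game 0."""
    return list(opponents)


def sample_opponent(pool_fn, game_idx, rng):
    """Pick the opponent for one game uniformly from the currently available pool."""
    return rng.choice(pool_fn(game_idx))


if __name__ == "__main__":
    rng = random.Random(0)
    for idx in (0, 30_000, 99_999):
        print(idx, staged_pool(idx), sample_opponent(hard_knocks_pool, idx, rng))
```

Under this sketch both schedules consume the same budget of games and differ only in which opponents can be drawn at a given game index, mirroring the abstract's statement that the mix and sequence of opponents is the only thing varied between training runs.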

research
10/08/2020

Guided Curriculum Learning for Walking Over Complex Terrain

Reliable bipedal walking over complex terrain is a challenging problem, ...
research
06/26/2022

Improving Policy Optimization with Generalist-Specialist Learning

Generalization in deep reinforcement learning over unseen environment va...
research
12/01/2019

Automated curriculum generation for Policy Gradients from Demonstrations

In this paper, we present a technique that improves the process of train...
research
10/20/2022

Task Phasing: Automated Curriculum Learning from Demonstrations

Applying reinforcement learning (RL) to sparse reward domains is notorio...
research
07/18/2018

Backplay: "Man muss immer umkehren"

A long-standing problem in model free reinforcement learning (RL) is tha...
research
02/17/2022

Robust Reinforcement Learning via Genetic Curriculum

Achieving robust performance is crucial when applying deep reinforcement...
research
05/09/2021

Improving Cost Learning for JPEG Steganography by Exploiting JPEG Domain Knowledge

Although significant progress in automatic learning of steganographic co...
