On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman

07/26/2019
by Chao Gao, et al.

How best to explore in domains with sparse, delayed, and deceptive rewards is an important open problem in reinforcement learning (RL). This paper considers one such domain, the recently proposed multi-agent benchmark Pommerman. The domain is very challenging for RL: past work has shown that model-free RL algorithms fail to achieve significant learning unless the environment's complexity is artificially reduced. In this paper, we illuminate the reasons behind this failure by providing a thorough analysis of the hardness of random exploration in Pommerman. Because model-free random exploration is typically futile, we develop a model-based automatic reasoning module that enables safer exploration by pruning actions that would certainly lead to the agent's death. We empirically demonstrate that this module can significantly improve learning.
