Safer Deep RL with Shallow MCTS: A Case Study in Pommerman

04/10/2019
by Bilal Kartal, et al.

Safe reinforcement learning has many variants, and it is still an open research problem. Here, we focus on how to use action guidance by means of a non-expert demonstrator to avoid catastrophic events in a domain with sparse, delayed, and deceptive rewards: the recently proposed multi-agent benchmark of Pommerman. This domain is very challenging for reinforcement learning (RL); past work has shown that model-free RL algorithms fail to achieve significant learning. In this paper, we shed light on the reasons behind this failure by exemplifying and analyzing the high rate of catastrophic events (i.e., suicides) that happen under random exploration in this domain. While model-free random exploration is typically futile, we propose a new framework in which even a non-expert simulated demonstrator, e.g., a planning algorithm such as Monte Carlo tree search (MCTS) with a small number of rollouts, can be integrated into asynchronous distributed deep reinforcement learning methods. Compared to vanilla deep RL algorithms, our proposed methods both learn faster and converge to better policies on a two-player mini version of the Pommerman game.
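The abstract describes the framework only at a high level. As a rough, non-authoritative illustration, here is a minimal sketch of a shallow UCT search with a small simulation budget acting as a non-expert demonstrator, plus a cross-entropy term that would pull a learner's policy toward the demonstrator's action. Everything here is an assumption for illustration: the one-dimensional corridor is a hypothetical toy stand-in (not Pommerman) in which one action is an immediate suicide, and the names `mcts_action`, `action_guidance_loss`, `GOAL`, `MAX_DEPTH`, and `GAMMA` are invented. The choice of a cross-entropy auxiliary loss is also an assumption; the paper's actual integration into asynchronous workers may differ.

```python
import math
import random

# Hypothetical toy stand-in for Pommerman (NOT the real game): a 1-D corridor.
# The agent starts at cell 0 and is rewarded for reaching cell GOAL; action 2
# is an immediate, catastrophic "suicide" (like standing on one's own bomb).
GOAL, MAX_DEPTH, GAMMA = 2, 10, 0.9
ACTIONS = (0, 1, 2)  # 0 = left, 1 = right, 2 = suicide


def step(pos, action):
    """Return (next_pos, reward, done) for the toy corridor."""
    if action == 2:
        return pos, -1.0, True
    nxt = max(0, pos - 1) if action == 0 else pos + 1
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


class Node:
    def __init__(self, pos, depth, reward=0.0, done=False):
        self.pos, self.depth, self.reward = pos, depth, reward
        # States at the depth cap are treated as terminal (no further search).
        self.done = done or depth >= MAX_DEPTH
        self.children = {}  # action -> Node
        self.visits, self.value_sum = 0, 0.0


def ucb(parent, child, c):
    """UCB1 score: mean return plus an exploration bonus."""
    return (child.value_sum / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))


def rollout(pos, depth, rng):
    """Discounted return of a uniformly random playout (the cheap default policy)."""
    ret, discount = 0.0, 1.0
    while depth < MAX_DEPTH:
        pos, reward, done = step(pos, rng.choice(ACTIONS))
        ret += discount * reward
        discount *= GAMMA
        depth += 1
        if done:
            break
    return ret


def mcts_action(root_pos, iterations=200, c=1.4, seed=0):
    """Shallow UCT with a small budget; returns the most-visited root action."""
    rng = random.Random(seed)
    root = Node(root_pos, 0)
    for _ in range(iterations):
        node, path = root, [root]
        # Selection: follow UCB1 while the current node is fully expanded.
        while not node.done and len(node.children) == len(ACTIONS):
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb(parent, ch, c))
            path.append(node)
        # Expansion: add one untried action as a new child.
        if not node.done:
            action = rng.choice([a for a in ACTIONS if a not in node.children])
            pos, reward, done = step(node.pos, action)
            child = Node(pos, node.depth + 1, reward, done)
            node.children[action] = child
            node = child
            path.append(child)
        # Simulation: random rollout from the leaf (0 if the leaf is terminal).
        g = 0.0 if node.done else rollout(node.pos, node.depth, rng)
        # Backpropagation: each node accumulates the discounted return
        # obtained from the moment its parent took the action leading to it.
        for n in reversed(path):
            g = n.reward + GAMMA * g
            n.visits += 1
            n.value_sum += g
    return max(root.children, key=lambda a: root.children[a].visits)


def action_guidance_loss(logits, demo_action):
    """Cross-entropy -log pi(demo_action) under a softmax policy (stable form)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[demo_action]
```

In this toy, uniformly random playouts frequently end in suicide, yet even a 200-simulation search avoids action 2 and prefers moving right toward the goal. In a hypothetical training loop, the auxiliary term would be added to the usual actor-critic objective, e.g. `loss = actor_critic_loss + lam * action_guidance_loss(logits, mcts_action(pos))`, where `lam` is a hypothetical weighting coefficient.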
