Action Guidance with MCTS for Deep Reinforcement Learning

07/25/2019
by Bilal Kartal, et al.

Deep reinforcement learning has achieved great successes in recent years; however, sample inefficiency remains one of its main challenges. In this paper, we focus on how action guidance from a non-expert demonstrator can improve sample efficiency in a domain with sparse, delayed, and possibly deceptive rewards: the recently proposed multi-agent benchmark Pommerman. We propose a new framework in which even a non-expert simulated demonstrator, e.g., a planning algorithm such as Monte Carlo tree search with a small number of rollouts, can be integrated within asynchronous distributed deep reinforcement learning methods. Compared to a vanilla deep RL algorithm, our proposed methods both learn faster and converge to better policies on a two-player mini version of the Pommerman game.
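
The abstract does not spell out how the demonstrator's actions enter training. One common way to realize action guidance, assumed here purely for illustration, is to add a supervised imitation term to the usual RL loss, so that the policy is nudged toward the action the planner chose. The sketch below shows that idea in plain Python; the function names (`imitation_loss`, `guided_loss`) and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def imitation_loss(logits, demo_action):
    """Cross-entropy between the policy and the demonstrator's chosen action.

    `demo_action` is the index of the action suggested by the (possibly
    non-expert) demonstrator, e.g. a shallow MCTS planner.
    """
    probs = softmax(logits)
    return -math.log(probs[demo_action])


def guided_loss(rl_loss, logits, demo_action, weight=1.0):
    """Assumed combined objective: RL loss plus a weighted imitation term.

    `weight` is a hypothetical knob trading off the RL signal against
    the demonstrator's guidance.
    """
    return rl_loss + weight * imitation_loss(logits, demo_action)
```

With a uniform policy over four actions, the imitation term equals log 4; it shrinks as the policy puts more probability on the demonstrator's action, which is the sense in which the demonstrator "guides" learning under this assumption.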

research
04/10/2019

Safer Deep RL with Shallow MCTS: A Case Study in Pommerman

Safe reinforcement learning has many variants and it is still an open re...
research
11/30/2018

Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL

Deep reinforcement learning (DRL) has achieved great successes in recent...
research
05/12/2020

Unbiased Deep Reinforcement Learning: A General Training Framework for Existing and Future Algorithms

In recent years deep neural networks have been successfully applied to t...
research
12/26/2020

Towards sample-efficient episodic control with DAC-ML

The sample-inefficiency problem in Artificial Intelligence refers to the...
research
07/24/2019

Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning

Deep reinforcement learning has achieved great successes in recent years...
research
03/23/2020

Importance of using appropriate baselines for evaluation of data-efficiency in deep reinforcement learning for Atari

Reinforcement learning (RL) has seen great advancements in the past few ...
research
11/25/2019

Biologically inspired architectures for sample-efficient deep reinforcement learning

Deep reinforcement learning requires a heavy price in terms of sample ef...
