Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL

11/30/2018
by   Bilal Kartal, et al.

Deep reinforcement learning (DRL) has achieved great successes in recent years with the help of novel methods and greater compute power. However, several challenges remain, such as convergence to locally optimal policies and long training times. In this paper, we first augment the Asynchronous Advantage Actor-Critic (A3C) method with a novel self-supervised auxiliary task, Terminal Prediction, which measures temporal closeness to terminal states; we call the resulting method A3C-TP. Second, we propose a new framework in which planning algorithms such as Monte Carlo tree search, or other sources of (simulated) demonstrators, can be integrated into asynchronous distributed DRL methods. Compared to vanilla A3C, both proposed methods learn faster and converge to better policies on a two-player mini version of the Pommerman game.
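As a rough illustration of what a "temporal closeness to terminal states" signal could look like, the sketch below labels each step of a finished episode with its normalized position and scores predictions with a mean squared error. The abstract does not give the target definition, the loss, or the weighting, so the normalized-step target, the MSE loss, the function names, and the `tp_weight` value are all assumptions made for this example rather than the authors' stated formulation.

```python
def terminal_prediction_targets(episode_length):
    """Assumed self-supervised labels: step t of an episode with N steps
    gets target (t + 1) / N, so the final step is labeled 1.0."""
    if episode_length <= 0:
        raise ValueError("episode_length must be positive")
    return [(t + 1) / episode_length for t in range(episode_length)]


def terminal_prediction_loss(predictions, targets):
    """Mean squared error between predicted and target closeness values
    (an assumed choice of loss)."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)


def combined_loss(a3c_loss, tp_loss, tp_weight=0.5):
    """Add the auxiliary term to the usual actor-critic loss.
    tp_weight is a hypothetical hyperparameter, not a value from the abstract."""
    return a3c_loss + tp_weight * tp_loss


# Labels can only be computed once the episode has ended and its length is known.
targets = terminal_prediction_targets(4)
predictions = [0.2, 0.5, 0.7, 0.9]  # made-up outputs of an auxiliary head
aux = terminal_prediction_loss(predictions, targets)
total = combined_loss(a3c_loss=1.0, tp_loss=aux)
```

Because the labels come from the episode itself, no extra annotation or reward signal is needed, which is what makes such a task self-supervised.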

research
07/25/2019

Action Guidance with MCTS for Deep Reinforcement Learning

Deep reinforcement learning has achieved great successes in recent years...
research
07/24/2019

Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning

Deep reinforcement learning has achieved great successes in recent years...
research
04/10/2019

Safer Deep RL with Shallow MCTS: A Case Study in Pommerman

Safe reinforcement learning has many variants and it is still an open re...
research
02/01/2023

Alphazzle: Jigsaw Puzzle Solver with Deep Monte-Carlo Tree Search

Solving jigsaw puzzles requires to grasp the visual features of a sequen...
research
09/20/2021

Seriema: RDMA-based Remote Invocation with a Case-Study on Monte-Carlo Tree Search

We introduce Seriema, a middleware that integrates RDMA-based remote inv...
research
02/14/2021

Costly Features Classification using Monte Carlo Tree Search

We consider the problem of costly feature classification, where we seque...
research
02/11/2023

UGAE: A Novel Approach to Non-exponential Discounting

The discounting mechanism in Reinforcement Learning determines the relat...
