Policy Gradient Algorithms with Monte-Carlo Tree Search for Non-Markov Decision Processes

06/02/2022
by Tetsuro Morimura, et al.

Policy gradient (PG) is a reinforcement learning (RL) approach that optimizes a parameterized policy model for expected return using gradient ascent. Given a well-parameterized policy model, such as a neural network, with appropriate initial parameters, PG algorithms work well even when the environment does not have the Markov property. Otherwise, they can be trapped on a plateau or suffer from peakiness effects. As another successful RL approach, algorithms based on Monte-Carlo Tree Search (MCTS), which include AlphaZero, have obtained groundbreaking results, especially in board games. They are also suitable for non-Markov decision processes. However, since standard MCTS cannot learn state representations, the tree-search space can be too large to search. In this work, we examine a mixture policy of PG and MCTS so that each compensates for the other's weaknesses while retaining its own strengths. We derive conditions for asymptotic convergence using results from two-timescale stochastic approximation and propose an algorithm that satisfies these conditions. The effectiveness of the proposed methods is verified through numerical experiments on non-Markov decision processes.
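To make the idea of a mixture policy concrete, here is a minimal sketch in Python. It is not the algorithm from the paper: the abstract does not specify how the two policies are combined, so the convex combination, the mixing weight `lam`, the use of normalized visit counts as the search policy, and all function names are assumptions made for illustration. The sketch mixes a softmax policy with a search-derived distribution and computes the likelihood-ratio gradient of the mixture with respect to the softmax preferences, treating the search part as fixed.

```python
import math

def softmax(prefs):
    """Turn action preferences (the PG parameters here) into probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def visit_policy(counts):
    """Normalize tree-search visit counts; fall back to uniform if no visits."""
    total = sum(counts)
    if total == 0:
        return [1.0 / len(counts)] * len(counts)
    return [c / total for c in counts]

def mixture_policy(prefs, counts, lam):
    """Assumed mixture: (1 - lam) * pi_theta + lam * pi_search."""
    pg = softmax(prefs)
    search = visit_policy(counts)
    return [(1.0 - lam) * p + lam * s for p, s in zip(pg, search)]

def grad_log_mixture(prefs, counts, lam, action):
    """Gradient of log pi_mix(action) w.r.t. the softmax preferences.

    The search distribution is treated as a constant, so only the
    softmax component contributes to the gradient.
    """
    pg = softmax(prefs)
    mix = mixture_policy(prefs, counts, lam)
    scale = (1.0 - lam) * pg[action] / mix[action]
    return [
        scale * ((1.0 if i == action else 0.0) - pg[i])
        for i in range(len(prefs))
    ]

if __name__ == "__main__":
    prefs = [0.0, 0.0, 0.0]   # hypothetical initial preferences
    counts = [2, 10, 4]       # hypothetical MCTS visit counts
    print(mixture_policy(prefs, counts, lam=0.5))
    print(grad_log_mixture(prefs, counts, lam=0.5, action=1))
```

With `lam = 0` the gradient reduces to the usual softmax score function used in REINFORCE, and with `lam = 1` it vanishes because the sampled action no longer depends on the learned parameters.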


research
03/24/2015

Geometry and Determinism of Optimal Stationary Control in Partially Observable Markov Decision Processes

It is well known that for any finite state Markov decision process (MDP)...
research
05/23/2022

Learning to branch with Tree MDPs

State-of-the-art Mixed Integer Linear Program (MILP) solvers combine sys...
research
07/25/2020

Simulation Based Algorithms for Markov Decision Processes and Multi-Action Restless Bandits

We consider multi-dimensional Markov decision processes and formulate a ...
research
06/13/2022

Relative Policy-Transition Optimization for Fast Policy Transfer

We consider the problem of policy transfer between two Markov Decision P...
research
04/03/2018

Renewal Monte Carlo: Renewal theory based reinforcement learning

In this paper, we present an online reinforcement learning algorithm, ca...
research
08/15/2023

Formally-Sharp DAgger for MCTS: Lower-Latency Monte Carlo Tree Search using Data Aggregation with Formal Methods

We study how to efficiently combine formal methods, Monte Carlo Tree Sea...
research
10/20/2011

A Version of Geiringer-like Theorem for Decision Making in the Environments with Randomness and Incomplete Information

Purpose: In recent years Monte-Carlo sampling methods, such as Monte Car...
