Monte-Carlo Tree Search for Policy Optimization

12/23/2019
by Xiaobai Ma, et al.

Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points. Although gradient-free methods (e.g., genetic algorithms or evolution strategies) help mitigate these issues, poor initialization and local optima are still concerns in highly nonconvex spaces. This paper presents a method for policy optimization based on Monte-Carlo tree search and gradient-free optimization. Our method, called Monte-Carlo tree search for policy optimization (MCTSPO), provides a better exploration-exploitation trade-off through the use of the upper confidence bound heuristic. We demonstrate improved performance on reinforcement learning tasks with deceptive or sparse reward functions compared to popular gradient-based and deep genetic algorithm baselines.
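To illustrate the exploration-exploitation trade-off mentioned in the abstract, here is a minimal sketch of the standard UCB1 rule commonly used for child selection in Monte-Carlo tree search. It is not taken from the paper: the function names, the dictionary fields `mean_return` and `visits`, and the default exploration constant `c` are all assumptions made for illustration, and MCTSPO's exact variant may differ.

```python
import math


def ucb1_score(mean_return, visits, parent_visits, c=math.sqrt(2)):
    """Standard UCB1 score: mean return plus an exploration bonus.

    Hypothetical helper for illustration; not the paper's exact formula.
    """
    if visits == 0:
        # Unvisited children are always tried first.
        return math.inf
    return mean_return + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    `children` is a list of dicts with assumed keys
    "mean_return" and "visits".
    """
    # Assumption: the parent's visit count is the sum of its children's.
    parent_visits = sum(child["visits"] for child in children)
    return max(
        range(len(children)),
        key=lambda i: ucb1_score(
            children[i]["mean_return"], children[i]["visits"], parent_visits, c
        ),
    )


if __name__ == "__main__":
    children = [
        {"mean_return": 1.0, "visits": 10},  # well explored, high mean
        {"mean_return": 0.9, "visits": 1},   # barely explored
    ]
    print(select_child(children))         # exploration bonus included
    print(select_child(children, c=0.0))  # pure exploitation
```

With a positive `c`, a rarely visited child can outrank a well-explored one with a slightly higher mean return; setting `c` to zero reduces the rule to greedy selection on the mean.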


research
04/26/2022

An Efficient Dynamic Sampling Policy For Monte Carlo Tree Search

We consider the popular tree-based search strategy within the framework ...
research
03/11/2013

Monte-Carlo utility estimates for Bayesian reinforcement learning

This paper introduces a set of algorithms for Monte-Carlo Bayesian reinf...
research
09/12/2023

Update Monte Carlo tree search (UMCTS) algorithm for heuristic global search of sizing optimization problems for truss structures

Sizing optimization of truss structures is a complex computational probl...
research
08/07/2023

MCTS guided Genetic Algorithm for optimization of neural network weights

In this research, we investigate the possibility of applying a search st...
research
05/18/2018

Multifunction Cognitive Radar Task Scheduling Using Monte Carlo Tree Search and Policy Networks

A modern radar may be designed to perform multiple functions, such as su...
research
05/17/2021

Efficient yield optimization with limited gradient information

In this work an efficient strategy for yield optimization with uncertain...
research
11/28/2001

Gradient-based Reinforcement Planning in Policy-Search Methods

We introduce a learning method called "gradient-based reinforcement plan...
