Feedback-Based Tree Search for Reinforcement Learning

05/15/2018
by Daniel R. Jiang, et al.

Inspired by recent successes of Monte-Carlo tree search (MCTS) in a number of artificial intelligence (AI) application domains, we propose a model-based reinforcement learning (RL) technique that iteratively applies MCTS on batches of small, finite-horizon versions of the original infinite-horizon Markov decision process. The terminal condition of the finite-horizon problems, or the leaf-node evaluator of the decision tree generated by MCTS, is specified using a combination of an estimated value function and an estimated policy function. The recommendations generated by the MCTS procedure are then provided as feedback in order to refine, through classification and regression, the leaf-node evaluator for the next iteration. We provide the first sample complexity bounds for a tree search-based RL algorithm. In addition, we show that a deep neural network implementation of the technique can create a competitive AI agent for the popular multi-player online battle arena (MOBA) game King of Glory.
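
The feedback loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and rests on several assumptions: a toy six-state chain MDP stands in for the real problem, an exhaustive depth-limited lookahead stands in for MCTS, lookup tables stand in for the deep networks fitted by regression and classification, and the leaf evaluator mixes the value estimate with a short policy rollout using an arbitrary weight of 0.5. All function and parameter names are hypothetical.

```python
# Toy sketch of a feedback-based tree search loop (assumptions throughout).
N_STATES, ACTIONS, GAMMA = 6, (0, 1), 0.9


def step(s, a):
    """Deterministic chain: action 1 moves right, action 0 moves left.
    The reward is 1 whenever the next state is the last state."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)


def rollout(s, policy, steps):
    """Discounted return from following `policy` for `steps` steps."""
    total, discount = 0.0, 1.0
    for _ in range(steps):
        s, r = step(s, policy[s])
        total += discount * r
        discount *= GAMMA
    return total


def leaf_value(s, value, policy, rollout_steps=3, mix=0.5):
    """Leaf evaluator: mix of the value estimate and a short policy rollout.
    The mixing weight and rollout length are assumed, not from the paper."""
    return mix * value[s] + (1 - mix) * rollout(s, policy, rollout_steps)


def search(s, depth, value, policy):
    """Exhaustive finite-horizon lookahead (a simple stand-in for MCTS).
    Returns (recommended_action, estimated_root_value)."""
    best_a, best_q = None, float("-inf")
    for a in ACTIONS:
        s2, r = step(s, a)
        if depth == 1:
            q = r + GAMMA * leaf_value(s2, value, policy)
        else:
            q = r + GAMMA * search(s2, depth - 1, value, policy)[1]
        if q > best_q:  # ties keep the earlier action
            best_a, best_q = a, q
    return best_a, best_q


def feedback_tree_search(iterations=5, depth=2):
    value = [0.0] * N_STATES   # initial value estimate
    policy = [0] * N_STATES    # initial policy: always move left
    for _ in range(iterations):
        # Solve a batch of small finite-horizon problems, one per state.
        results = [search(s, depth, value, policy) for s in range(N_STATES)]
        # Feedback: tabular stand-ins for classification and regression.
        policy = [a for a, _ in results]
        value = [v for _, v in results]
    return value, policy


if __name__ == "__main__":
    value, policy = feedback_tree_search()
    print("policy:", policy)
    print("value:", [round(v, 3) for v in value])
```

In this toy setting the search recommendations near the rewarding state improve first, and each round of feedback lets the refined leaf evaluator carry that information one or two states further along the chain than the lookahead depth alone would reach.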

