
Planning in Markov Decision Processes with Gap-Dependent Sample Complexity

by Anders Jonsson, et al.

We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process whose transitions have finite support. We prove an upper bound on the number of calls to the generative model needed for MDP-GapE to identify a near-optimal action with high probability. This problem-dependent sample complexity result is expressed in terms of the sub-optimality gaps of the state-action pairs that are visited during exploration. Our experiments reveal that MDP-GapE is also effective in practice, in contrast with other algorithms with sample complexity guarantees in the fixed-confidence setting, which are mostly of theoretical interest.
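The abstract does not give implementation details, but the fixed-confidence setting it refers to can be illustrated with a minimal sketch: repeatedly sample returns for each root action from a generative model, and stop once the empirical gap between the best action and every other action exceeds their combined confidence widths. This is a hypothetical toy (a bandit-style stopping rule on the root actions only, with Hoeffding-style widths), not the MDP-GapE algorithm itself, which additionally performs tree search over trajectories.

```python
import math
import random

def generative_model(action, rng):
    # Hypothetical toy "MDP": each root action yields a Bernoulli return.
    # Action 0 is clearly better (p=0.9) than action 1 (p=0.1).
    p = 0.9 if action == 0 else 0.1
    return 1.0 if rng.random() < p else 0.0

def identify_best_action(n_actions, delta, rng, max_calls=100_000):
    """Fixed-confidence best-action identification (sketch only):
    return an action that is best with probability at least 1 - delta,
    together with the number of generative-model calls used."""
    counts = [0] * n_actions
    sums = [0.0] * n_actions
    calls = 0
    best = 0
    while calls < max_calls:
        # Sample every action once per round (uniform allocation).
        for a in range(n_actions):
            sums[a] += generative_model(a, rng)
            counts[a] += 1
            calls += 1
        means = [sums[a] / counts[a] for a in range(n_actions)]
        # Hoeffding-style confidence width; shrinks as counts grow.
        widths = [
            math.sqrt(math.log(2 * n_actions * counts[a] ** 2 / delta)
                      / (2 * counts[a]))
            for a in range(n_actions)
        ]
        best = max(range(n_actions), key=lambda a: means[a])
        # Stop when the empirical gap to every other action exceeds the
        # combined uncertainty, separating the best action w.h.p.
        if all(means[best] - widths[best] >= means[a] + widths[a]
               for a in range(n_actions) if a != best):
            break
    return best, calls

rng = random.Random(0)
action, calls = identify_best_action(2, delta=0.1, rng=rng)
print(action, calls)
```

The sample complexity of this stopping rule is gap-dependent in the same spirit as the paper's bound: the larger the gap between the best and second-best action, the sooner the confidence intervals separate and the fewer generative-model calls are needed.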


On the Sample Complexity of Reinforcement Learning with a Generative Model

We consider the problem of learning the optimal action-value function in...

Active Model Estimation in Markov Decision Processes

We study the problem of efficient exploration in order to learn an accur...

Practical Open-Loop Optimistic Planning

We consider the problem of online planning in a Markov Decision Process ...

Speeding Up Planning in Markov Decision Processes via Automatically Constructed Abstractions

In this paper, we consider planning in stochastic shortest path (SSP) pr...

A Markov Decision Process Approach to Active Meta Learning

In supervised learning, we fit a single statistical model to a given dat...

An Extensible and Modular Design and Implementation of Monte Carlo Tree Search for the JVM

Flexible implementations of Monte Carlo Tree Search (MCTS), combined wit...

Open Loop Execution of Tree-Search Algorithms

In the context of tree-search stochastic planning algorithms where a gen...