Bounded Optimal Exploration in MDP

04/05/2016
by Kenji Kawaguchi, et al.

Within the framework of probably approximately correct Markov decision processes (PAC-MDP), much theoretical work has focused on methods to attain near optimality after a relatively long period of learning and exploration. However, practical concerns require the attainment of satisfactory behavior within a short period of time. In this paper, we relax the PAC-MDP conditions to reconcile theoretically driven exploration methods and practical needs. We propose simple algorithms for discrete and continuous state spaces, and illustrate the benefits of our proposed relaxation via theoretical analyses and numerical examples. Our algorithms also maintain anytime error bounds and average loss bounds. Our approach accommodates both Bayesian and non-Bayesian methods.
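As background (this formulation follows the standard PAC-MDP literature, e.g. Strehl and Littman's notation, and is not quoted from the paper itself), the condition being relaxed requires that, with probability at least 1 - \delta, an algorithm acts \epsilon-suboptimally on only polynomially many timesteps:

    \Pr\big[\, \#\{ t : V^{\mathcal{A}_t}(s_t) < V^*(s_t) - \epsilon \} > \mathrm{poly}(|S|, |A|, 1/\epsilon, 1/\delta, 1/(1-\gamma)) \,\big] \le \delta

In practice this polynomial can be very large, which is the gap between near optimality "after a relatively long period of learning" and the short-horizon behavior the paper targets; the proposed relaxation weakens this sample-complexity requirement while, as the abstract states, retaining anytime error bounds and average loss bounds.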

Related research

06/03/2022 · PAC Statistical Model Checking of Mean Payoff in Discrete- and Continuous-Time MDP
06/17/2019 · Of Cores: A Partial-Exploration Framework for Markov Decision Processes
10/06/2018 · Bayes-CPACE: PAC Optimal Exploration in Continuous Space Bayes-Adaptive Markov Decision Processes
12/28/2020 · Blackwell Online Learning for Markov Decision Processes
11/24/2021 · Reinforcement Learning for General LTL Objectives Is Intractable
08/31/2018 · Directed Exploration in PAC Model-Free Reinforcement Learning
03/10/2022 · Data-driven Abstractions with Probabilistic Guarantees for Linear PETC Systems
