Incremental Model-based Learners With Formal Learning-Time Guarantees

06/27/2012
by Alexander L. Strehl, et al.

Model-based learning algorithms have been shown to use experience efficiently when learning to solve Markov Decision Processes (MDPs) with finite state and action spaces. However, their high computational cost due to repeatedly solving an internal model inhibits their use in large-scale problems. We propose a method based on real-time dynamic programming (RTDP) to speed up two model-based algorithms, RMAX and MBIE (model-based interval estimation), resulting in computationally much faster algorithms with little loss compared to existing bounds. Specifically, our two new learning algorithms, RTDP-RMAX and RTDP-IE, have considerably smaller computational demands than RMAX and MBIE. We develop a general theoretical framework that allows us to prove that both are efficient learners in a PAC (probably approximately correct) sense. We also present an experimental evaluation of these new algorithms that helps quantify the tradeoff between computational and experience demands.
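The central idea in the abstract is to replace RMAX's repeated solving of the whole internal model with one RTDP-style Bellman backup at the current state on each step. The sketch below illustrates that idea. It is a minimal sketch under stated assumptions and is not the authors' pseudocode. The class name `RTDPRMax`, the "known" threshold `m`, the update threshold `eps1`, and the assumption that rewards lie in [0, 1] (so the optimistic value is 1/(1 - gamma)) were all chosen for this illustration. RTDP-IE applies the same incremental idea to MBIE's interval-estimation model and is not shown.

```python
from collections import defaultdict


class RTDPRMax:
    """Hypothetical sketch of an RTDP-style RMAX learner for a finite MDP.

    Assumptions of this sketch (not taken from the paper): rewards lie in [0, 1],
    a state-action pair is "known" after m visits, and a backup is applied only
    when it lowers Q(s, a) by at least eps1.
    """

    def __init__(self, n_states, n_actions, gamma=0.95, m=5, eps1=1e-3):
        self.n_actions = n_actions
        self.gamma = gamma
        self.m = m
        self.eps1 = eps1
        v_max = 1.0 / (1.0 - gamma)  # optimistic initial value, RMAX-style
        self.q = [[v_max] * n_actions for _ in range(n_states)]
        self.counts = defaultdict(int)  # n(s, a)
        self.reward_sums = defaultdict(float)  # summed rewards for (s, a)
        self.next_counts = defaultdict(lambda: defaultdict(int))  # n(s, a, s')

    def act(self, s):
        # Greedy w.r.t. the optimistic Q-values; max() keeps the first (lowest) index on ties.
        row = self.q[s]
        return max(range(self.n_actions), key=lambda a: row[a])

    def observe(self, s, a, r, s_next):
        # Collect experience only until (s, a) becomes known; the model is then frozen.
        if self.counts[(s, a)] < self.m:
            self.counts[(s, a)] += 1
            self.reward_sums[(s, a)] += r
            self.next_counts[(s, a)][s_next] += 1
        # Unknown pairs keep their optimistic value; known pairs get one backup per visit.
        if self.counts[(s, a)] >= self.m:
            self._backup(s, a)

    def _backup(self, s, a):
        # Single Bellman backup using the empirical model for (s, a) only.
        n = self.counts[(s, a)]
        r_hat = self.reward_sums[(s, a)] / n
        expected_next = sum(
            c / n * max(self.q[s2]) for s2, c in self.next_counts[(s, a)].items()
        )
        target = r_hat + self.gamma * expected_next
        # Accept only updates that lower the estimate by at least eps1.
        if self.q[s][a] - target >= self.eps1:
            self.q[s][a] = target


def run_demo(steps=20):
    """Toy one-state MDP: action 0 pays 0, action 1 pays 1, and the state never changes."""
    agent = RTDPRMax(n_states=1, n_actions=2, gamma=0.5, m=2)
    s = 0
    for _ in range(steps):
        a = agent.act(s)
        agent.observe(s, a, float(a), s)  # reward equals the action index
    return agent


if __name__ == "__main__":
    print(run_demo().q)  # [[1.0, 2.0]]
```

In the toy run, the learner visits each action `m = 2` times and then stops choosing action 0. At that point the learned values match the optimal ones for gamma = 0.5: 1/(1 - 0.5) = 2 for action 1 and 0 + 0.5 * 2 = 1 for action 0. The design choice this sketch illustrates is that each step costs a single backup over the observed successors of (s, a). RMAX, by contrast, re-solves the entire learned model whenever that model changes.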

research
05/27/2019

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

State-of-the-art efficient model-based Reinforcement Learning (RL) algor...
research
06/11/2020

PAC Bounds for Imitation and Model-based Batch Learning of Contextual Markov Decision Processes

We consider the problem of batch multi-task reinforcement learning with ...
research
12/21/2019

Can Agents Learn by Analogy? An Inferable Model for PAC Reinforcement Learning

Model-based reinforcement learning algorithms make decisions by building...
research
06/26/2013

Scaling Up Robust MDPs by Reinforcement Learning

We consider large-scale Markov decision processes (MDPs) with parameter ...
research
09/05/2020

A Hybrid PAC Reinforcement Learning Algorithm

This paper offers a new hybrid probably asymptotically correct (PAC) rei...
research
05/15/2019

Stochastic approximation with cone-contractive operators: Sharp ℓ_∞-bounds for Q-learning

Motivated by the study of Q-learning algorithms in reinforcement learnin...
research
02/19/2016

Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models

In this paper we study a model-based approach to calculating approximate...