Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models

02/19/2016
by Bernardo Ávila Pires, et al.

In this paper we study a model-based approach to computing approximately optimal policies in Markov decision processes (MDPs). In particular, we derive novel bounds on the loss incurred by using a policy derived from a factored linear model, a class of models that generalizes many earlier models that come with strong computational guarantees. For the first time in the literature, we derive performance bounds for model-based techniques in which the model inaccuracy is measured in weighted norms. Moreover, our bounds show decreased sensitivity to the discount factor and, unlike similar bounds derived for other approaches, are insensitive to measure mismatch. As in previous work, our proofs are based on contraction arguments, with two main differences: we use carefully constructed norms that build on Banach lattices, and we assume the contraction property only for operators acting on "compressed" spaces. This weakens previous assumptions while strengthening previous results.
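
For readers unfamiliar with the contraction arguments mentioned above, the following minimal sketch may help. It is not the construction used in the paper: it checks only the textbook fact that the Bellman optimality operator of a finite MDP is a contraction with modulus equal to the discount factor in the ordinary (unweighted) supremum norm. The tiny random MDP and the names `bellman_optimality`, `sup_norm_distance`, and `random_mdp` are illustrative assumptions, not taken from the paper.

```python
import random


def bellman_optimality(v, P, r, gamma):
    """Apply (Tv)(x) = max_a [ r[a][x] + gamma * sum_y P[a][x][y] * v[y] ]."""
    n_states = len(v)
    return [
        max(
            r[a][x] + gamma * sum(P[a][x][y] * v[y] for y in range(n_states))
            for a in range(len(P))
        )
        for x in range(n_states)
    ]


def sup_norm_distance(u, v):
    """Unweighted supremum-norm distance between two value functions."""
    return max(abs(a - b) for a, b in zip(u, v))


def random_mdp(n_states, n_actions, rng):
    """Hypothetical toy MDP: random row-stochastic kernels and rewards in [0, 1)."""
    P, r = [], []
    for _ in range(n_actions):
        rows = []
        for _ in range(n_states):
            weights = [rng.random() + 1e-9 for _ in range(n_states)]
            total = sum(weights)
            rows.append([w / total for w in weights])
        P.append(rows)
        r.append([rng.random() for _ in range(n_states)])
    return P, r


rng = random.Random(0)
gamma = 0.9
P, r = random_mdp(n_states=5, n_actions=3, rng=rng)

# Two arbitrary value functions to compare before and after applying T.
u = [rng.uniform(-10, 10) for _ in range(5)]
v = [rng.uniform(-10, 10) for _ in range(5)]

before = sup_norm_distance(u, v)
after = sup_norm_distance(
    bellman_optimality(u, P, r, gamma),
    bellman_optimality(v, P, r, gamma),
)

# Contraction: ||Tu - Tv|| <= gamma * ||u - v|| (small slack for rounding).
contraction_holds = after <= gamma * before + 1e-12
print(before, after, contraction_holds)
```

The abstract's claims concern weighted norms and operators acting on "compressed" spaces, which this sketch does not attempt to reproduce. It only illustrates the basic inequality that such arguments build on.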


