
Model-Based Reinforcement Learning with Value-Targeted Regression

by Alex Ayoub, et al.

This paper studies model-based reinforcement learning (RL) for regret minimization. We focus on finite-horizon episodic RL where the transition model P belongs to a known family of models 𝒫, a special case of which is when the models in 𝒫 take the form of linear mixtures: P_θ = ∑_{i=1}^{d} θ_i P_i. We propose a model-based RL algorithm based on the optimism principle: in each episode, the set of models that are 'consistent' with the data collected so far is constructed. The criterion of consistency is based on the total squared error that the model incurs on the task of predicting values, as determined by the last value estimate, along the observed transitions. The next value function is then chosen by solving the optimistic planning problem with the constructed set of models. We derive a bound on the regret which, in the special case of linear mixtures, takes the form Õ(d√(H^3 T)), where H, T, and d are the horizon, the total number of steps, and the dimension of θ, respectively. In particular, this regret bound is independent of the total number of states or actions, and is close to a lower bound of Ω(√(HdT)). For a general model family 𝒫, the regret bound is derived using the notion of the so-called Eluder dimension proposed by Russo and Van Roy (2014).
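To make the consistency criterion concrete for the linear-mixture case, the sketch below shows one plausible reading of value-targeted regression: the model is fit only to predict the value of the next state, not the full next-state distribution. Everything here is an illustrative assumption rather than the paper's algorithm: the tabular sizes, the randomly generated basis models, the ridge-regularized least-squares form, and the helper names `value_features` and `vtr_ridge_estimate` are all hypothetical. Optimistic planning over the resulting confidence set is not shown.

```python
import numpy as np


def value_features(P_basis, V, s, a):
    """Feature vector x with x_i = sum_{s'} P_i(s' | s, a) * V(s').

    P_basis has shape (d, S, A, S); V has shape (S,).
    Under a linear mixture P_theta, E[V(s') | s, a] = x . theta.
    """
    return P_basis[:, s, a, :] @ V


def vtr_ridge_estimate(X, y, lam=1.0):
    """Ridge solution of min_theta sum_t (x_t . theta - y_t)^2 + lam * ||theta||^2.

    Returns the estimate and the regularized Gram matrix, which is the
    kind of quantity a confidence set around the estimate would use.
    """
    d = X.shape[1]
    gram = lam * np.eye(d) + X.T @ X
    theta_hat = np.linalg.solve(gram, X.T @ y)
    return theta_hat, gram


# Hypothetical toy problem: d basis models over S states and A actions.
rng = np.random.default_rng(0)
d, S, A = 3, 5, 2
P_basis = rng.dirichlet(np.ones(S), size=(d, S, A))   # shape (d, S, A, S)
theta_star = np.array([0.5, 0.3, 0.2])                # assumed true mixture weights
P_true = np.tensordot(theta_star, P_basis, axes=1)    # shape (S, A, S)
V = rng.uniform(0.0, 1.0, size=S)                     # stand-in for the last value estimate

# Collect transitions; the regression target is V at the observed next state.
X, y = [], []
for _ in range(5000):
    s, a = rng.integers(S), rng.integers(A)
    s_next = rng.choice(S, p=P_true[s, a])
    X.append(value_features(P_basis, V, s, a))
    y.append(V[s_next])

theta_hat, gram = vtr_ridge_estimate(np.array(X), np.array(y))
print("estimated mixture weights:", theta_hat)
```

The design point this illustrates is that the regression target is a scalar, V(s'), so the estimation problem has dimension d regardless of how many states or actions there are, which matches the abstract's claim that the regret bound does not depend on the size of the state or action space.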




Horizon-Free Reinforcement Learning for Latent Markov Decision Processes

We study regret minimization for reinforcement learning (RL) in Latent M...

Exponential Family Model-Based Reinforcement Learning via Score Matching

We propose an optimistic model-based algorithm, dubbed SMRL, for finite-...

Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

In this work, we propose a novel Kernelized Stein Discrepancy-based Post...

Branching Reinforcement Learning

In this paper, we propose a novel Branching Reinforcement Learning (Bran...

Minimax Regret Bounds for Reinforcement Learning

We consider the problem of provably optimal exploration in reinforcement...

Model Selection with Near Optimal Rates for Reinforcement Learning with General Model Classes

We address the problem of model selection for the finite horizon episodi...

A Tractable Algorithm For Finite-Horizon Continuous Reinforcement Learning

We consider the finite horizon continuous reinforcement learning problem...