
Polynomial Time Reinforcement Learning in Correlated FMDPs with Linear Value Functions

by Siddartha Devic, et al.

Many reinforcement learning (RL) environments in practice feature enormous state spaces that can be described compactly by a "factored" structure, modeled by Factored Markov Decision Processes (FMDPs). We present the first polynomial-time algorithm for RL with FMDPs that does not rely on an oracle planner and, instead of requiring a linear transition model, only requires a linear value function with a suitable local basis with respect to the factorization. Under this assumption, we can solve FMDPs in polynomial time by constructing an efficient separation oracle for convex optimization. Importantly, and in contrast to prior work, we do not assume that the transitions on different factors are independent.
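To make the key primitive concrete: a separation oracle for a convex set either certifies that a query point lies in the set or returns a hyperplane separating the point from it; paired with a cutting-plane method such as the ellipsoid algorithm, this suffices for polynomial-time convex optimization. The sketch below is an illustrative toy oracle for a polytope {x : Ax ≤ b}, not the paper's construction (which builds the oracle from the factored value-function basis); the function name and tolerance are our own.

```python
import numpy as np

def separation_oracle(A, b, x, tol=1e-9):
    """Toy separation oracle for the polytope {x : A x <= b}.

    Returns None if x is (approximately) feasible; otherwise returns
    the most-violated constraint (a_i, b_i), whose hyperplane
    a_i . x = b_i separates x from the polytope.
    """
    violations = A @ x - b          # positive entries mark violated rows
    i = int(np.argmax(violations))  # most-violated constraint
    if violations[i] <= tol:
        return None                 # x is inside the polytope
    return A[i], b[i]

# Usage: the unit box in R^2, queried at an infeasible and a feasible point.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)

print(separation_oracle(A, b, np.array([2.0, 0.0])))   # violates x_0 <= 1
print(separation_oracle(A, b, np.array([0.5, 0.5])))   # None (feasible)
```

A cutting-plane solver queries such an oracle at each iterate: a returned hyperplane prunes the search region, while a None answer means the iterate is feasible.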



Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes

We study reinforcement learning (RL) with linear function approximation....

Optimistic Initialization and Greediness Lead to Polynomial Time Learning in Factored MDPs - Extended Version

In this paper we propose an algorithm for polynomial-time reinforcement ...

Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity

Reinforcement learning (RL) is empirically successful in complex nonline...

A polynomial-time algorithm for learning nonparametric causal graphs

We establish finite-sample guarantees for a polynomial-time algorithm fo...

State Aggregation Learning from Markov Transition Data

State aggregation is a model reduction method rooted in control theory a...

Oracle-Efficient Reinforcement Learning in Factored MDPs with Unknown Structure

We consider provably-efficient reinforcement learning (RL) in non-episod...

Computational-Statistical Gaps in Reinforcement Learning

Reinforcement learning with function approximation has recently achieved...