Polynomial Time Reinforcement Learning in Correlated FMDPs with Linear Value Functions

07/12/2021
by Siddartha Devic, et al.

Many reinforcement learning (RL) environments in practice feature enormous state spaces that can be described compactly by a "factored" structure and modeled as Factored Markov Decision Processes (FMDPs). We present the first polynomial-time algorithm for RL in FMDPs that does not rely on an oracle planner. Instead of requiring a linear transition model, it requires only a linear value function with a suitable local basis with respect to the factorization. Under this assumption, we can solve FMDPs in polynomial time by constructing an efficient separation oracle for convex optimization. Importantly, and in contrast to prior work, we do not assume that the transitions on various factors are independent.
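
To make the phrase "linear value function with a local basis" concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm. The binary factors, the scopes, the feature definition, and the names `SCOPES`, `basis`, and `value` are all assumptions chosen for illustration. The sketch only shows the structural idea: each basis function reads a small subset of the state factors, and the value is a weighted sum of those local features.

```python
from itertools import product

# Hypothetical toy setup: a state is a tuple of 4 binary factors.
NUM_FACTORS = 4

# Each basis function reads only a small "scope" of factors (a local basis).
# These scopes are illustrative assumptions, not taken from the paper.
SCOPES = [(0, 1), (1, 2), (3,)]


def basis(i, state):
    """Return the i-th local feature, which depends only on SCOPES[i]."""
    local = tuple(state[j] for j in SCOPES[i])
    # Toy feature: the number of active factors inside the scope.
    return float(sum(local))


def value(weights, state):
    """Linear value function: V(s) = sum_i w_i * phi_i(s restricted to scope i)."""
    return sum(w * basis(i, state) for i, w in enumerate(weights))


weights = [0.5, -1.0, 2.0]
state = (1, 0, 1, 1)
v = value(weights, state)

# The full state space grows exponentially in the number of factors,
# while each feature only distinguishes 2**len(scope) local configurations.
num_states = len(list(product([0, 1], repeat=NUM_FACTORS)))
```

In this toy setting the value function is described by three weights even though there are 16 joint states. That gap between the size of the parameterization and the size of the state space is what makes a factored representation compact.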

12/12/2022

Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes

We study reinforcement learning (RL) with linear function approximation....
04/21/2009

Optimistic Initialization and Greediness Lead to Polynomial Time Learning in Factored MDPs - Extended Version

In this paper we propose an algorithm for polynomial-time reinforcement ...
06/15/2021

Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity

Reinforcement learning (RL) is empirically successful in complex nonline...
06/22/2020

A polynomial-time algorithm for learning nonparametric causal graphs

We establish finite-sample guarantees for a polynomial-time algorithm fo...
11/06/2018

State Aggregation Learning from Markov Transition Data

State aggregation is a model reduction method rooted in control theory a...
09/13/2020

Oracle-Efficient Reinforcement Learning in Factored MDPs with Unknown Structure

We consider provably-efficient reinforcement learning (RL) in non-episod...
02/11/2022

Computational-Statistical Gaps in Reinforcement Learning

Reinforcement learning with function approximation has recently achieved...