Polynomial Time Reinforcement Learning in Correlated FMDPs with Linear Value Functions

07/12/2021
by   Siddartha Devic, et al.
0

Many reinforcement learning (RL) environments in practice feature enormous state spaces that may be described compactly by a "factored" structure, that may be modeled by Factored Markov Decision Processes (FMDPs). We present the first polynomial-time algorithm for RL with FMDPs that does not rely on an oracle planner, and instead of requiring a linear transition model, only requires a linear value function with a suitable local basis with respect to the factorization. With this assumption, we can solve FMDPs in polynomial time by constructing an efficient separation oracle for convex optimization. Importantly, and in contrast to prior work, we do not assume that the transitions on various factors are independent.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/12/2022

Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes

We study reinforcement learning (RL) with linear function approximation....
research
06/15/2021

Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity

Reinforcement learning (RL) is empirically successful in complex nonline...
research
04/21/2009

Optimistic Initialization and Greediness Lead to Polynomial Time Learning in Factored MDPs - Extended Version

In this paper we propose an algorithm for polynomial-time reinforcement ...
research
06/22/2020

A polynomial-time algorithm for learning nonparametric causal graphs

We establish finite-sample guarantees for a polynomial-time algorithm fo...
research
11/06/2018

State Aggregation Learning from Markov Transition Data

State aggregation is a model reduction method rooted in control theory a...
research
02/11/2022

Computational-Statistical Gaps in Reinforcement Learning

Reinforcement learning with function approximation has recently achieved...
research
10/20/2022

Dynamic selection of p-norm in linear adaptive filtering via online kernel-based reinforcement learning

This study addresses the problem of selecting dynamically, at each time ...

Please sign up or login with your details

Forgot password? Click here to reset