
Near-optimal Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms for the Non-episodic Setting

by Ziping Xu, et al.

We study reinforcement learning in factored Markov decision processes (FMDPs) in the non-episodic setting, focusing on regret analyses that provide both upper and lower bounds. We propose two near-optimal, oracle-efficient algorithms for FMDPs. Assuming oracle access to an FMDP planner, they enjoy a Bayesian and a frequentist regret bound, respectively, both of which reduce to the near-optimal bound O(DS√(AT)) for standard non-factored MDPs. Our lower bound depends on the span of the bias vector rather than the diameter D, and we show via a simple Cartesian product construction that FMDPs with a bounded span can have an arbitrarily large diameter, which suggests that bounds depending on the diameter can be extremely loose. We therefore propose another algorithm whose regret bound depends only on the span but which relies on a computationally stronger oracle. Our algorithms outperform the previous near-optimal algorithms on computer network administrator simulations.
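For readers unfamiliar with the quantities named in the abstract, the block below sketches their standard definitions for a communicating average-reward MDP with rewards in [0, 1]. The notation (R(T), ρ*, h*, τ, sp) is an assumption chosen here for illustration and may differ from the paper's own; it is not taken from the source.

```latex
% Assumed notation, not the paper's: \rho^* is the optimal average reward (gain),
% h^* the optimal bias vector, and \tau(s' \mid s, \pi) the first time policy \pi
% reaches s' when started from s.

% Non-episodic regret after T steps, with r_t the reward collected at step t:
\[
  R(T) \;=\; T\rho^* \;-\; \sum_{t=1}^{T} r_t .
\]

% Diameter: worst-case expected travel time between two states under the best policy.
\[
  D \;=\; \max_{s \neq s'} \; \min_{\pi} \; \mathbb{E}\bigl[\tau(s' \mid s, \pi)\bigr] .
\]

% The bias vector h^* solves the average-reward Bellman optimality equation:
\[
  \rho^* + h^*(s) \;=\; \max_{a} \Bigl\{ r(s,a) + \sum_{s'} P(s' \mid s, a)\, h^*(s') \Bigr\} .
\]

% Span of the bias vector, and its standard relation to the diameter
% (for rewards in [0,1]):
\[
  \mathrm{sp}(h^*) \;=\; \max_{s} h^*(s) \;-\; \min_{s} h^*(s) \;\le\; D .
\]
```

Because the span never exceeds the diameter under these assumptions, a bound stated in terms of the span is never looser than its diameter counterpart in this respect, which is the sense in which the abstract calls diameter-dependent bounds potentially loose.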




Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

We present an algorithm based on the Optimism in the Face of Uncertainty...

Decentralized Cooperative Reinforcement Learning with Hierarchical Information Structure

Multi-agent reinforcement learning (MARL) problems are challenging due t...

Oracle-Efficient Reinforcement Learning in Factored MDPs with Unknown Structure

We consider provably-efficient reinforcement learning (RL) in non-episod...

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

Modern tasks in reinforcement learning are always with large state and a...

Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning

We introduce SCAL, an algorithm designed to perform efficient exploratio...

Nonstationary Reinforcement Learning with Linear Function Approximation

We consider reinforcement learning (RL) in episodic Markov decision proc...

Peek Search: Near-Optimal Online Markov Decoding

We resolve the fundamental problem of online decoding with ergodic Marko...

Code Repositories


Code for paper: Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting
