Near-optimal Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms for the Non-episodic Setting

02/06/2020
by   Ziping Xu, et al.
9

We study reinforcement learning in factored Markov decision processes (FMDPs) in the non-episodic setting. We focus on regret analyses providing both upper and lower bounds. We propose two near-optimal and oracle-efficient algorithms for FMDPs. Assuming oracle access to an FMDP planner, they enjoy a Bayesian and a frequentist regret bound respectively, both of which reduce to the near-optimal bound O(DS√(AT)) for standard non-factored MDPs. Our lower bound depends on the span of the bias vector rather than the diameter D and we show via a simple Cartesian product construction that FMDPs with a bounded span can have an arbitrarily large diameter, which suggests that bounds with a dependence on diameter can be extremely loose. We, therefore, propose another algorithm that only depends on span but relies on a computationally stronger oracle. Our algorithms outperform the previous near-optimal algorithms on computer network administrator simulations.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/12/2019

Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

We present an algorithm based on the Optimism in the Face of Uncertainty...
research
11/01/2021

Decentralized Cooperative Reinforcement Learning with Hierarchical Information Structure

Multi-agent reinforcement learning (MARL) problems are challenging due t...
research
09/13/2020

Oracle-Efficient Reinforcement Learning in Factored MDPs with Unknown Structure

We consider provably-efficient reinforcement learning (RL) in non-episod...
research
06/23/2020

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

Modern tasks in reinforcement learning are always with large state and a...
research
02/12/2018

Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning

We introduce SCAL, an algorithm designed to perform efficient exploratio...
research
10/08/2020

Nonstationary Reinforcement Learning with Linear Function Approximation

We consider reinforcement learning (RL) in episodic Markov decision proc...
research
10/16/2018

Peek Search: Near-Optimal Online Markov Decoding

We resolve the fundamental problem of online decoding with ergodic Marko...

Please sign up or login with your details

Forgot password? Click here to reset