Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes

06/24/2020
by Yi Tian, et al.

We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components. Assuming the factorization is known, we propose two model-based algorithms. The first one achieves minimax optimal regret guarantees for a rich class of factored structures, while the second one enjoys better computational complexity with a slightly worse regret. A key new ingredient of our algorithms is the design of a bonus term to guide exploration. We complement our algorithms by presenting several structure-dependent lower bounds on regret for FMDPs that reveal the difficulty hiding in the intricacy of the structures.
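
To make "conditionally independent transition components" concrete, here is a minimal sketch of a factored transition under the standard FMDP convention, where each next-state component depends only on a small subset (its scope) of the current state-action vector, so the full transition probability is a product of per-component factors. The toy scopes, the probability tables, and the name `transition_prob` are illustrative assumptions, not taken from the paper.

```python
import itertools
from math import prod

# Hypothetical toy FMDP: two binary state components and one binary action.
# x = (s[0], s[1], a) is the state-action vector; scopes[i] lists the indices
# of x that component i of the next state is allowed to depend on.
scopes = [(0, 2), (1,)]

# factors[i] maps the scoped values of x to a distribution over the next
# value of component i. All numbers are made up for illustration.
factors = [
    {(0, 0): [0.9, 0.1], (0, 1): [0.2, 0.8], (1, 0): [0.5, 0.5], (1, 1): [0.1, 0.9]},
    {(0,): [0.7, 0.3], (1,): [0.4, 0.6]},
]

def transition_prob(next_state, state, action):
    """P(next_state | state, action) as a product of per-component factors."""
    x = (*state, action)
    return prod(
        factors[i][tuple(x[j] for j in scopes[i])][next_state[i]]
        for i in range(len(factors))
    )

# Sanity check: the factored probabilities sum to 1 over all next states.
total = sum(
    transition_prob(ns, (0, 1), 1)
    for ns in itertools.product([0, 1], repeat=2)
)
```

In this sketch the model needs only 4 + 2 conditional distributions rather than one for each of the 8 state-action pairs over 4 next states, which is the kind of structural saving a known factorization provides.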

research
10/01/2020

Minimax Optimal Reinforcement Learning for Discounted MDPs

We study the reinforcement learning problem for discounted Markov Decisi...
research
01/31/2023

Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments

We study variance-dependent regret bounds for Markov decision processes ...
research
12/08/2020

Minimax Regret Optimisation for Robust Planning in Uncertain Markov Decision Processes

The parameters for a Markov Decision Process (MDP) often cannot be speci...
research
08/21/2020

Refined Analysis of FPL for Adversarial Markov Decision Processes

We consider the adversarial Markov Decision Process (MDP) problem, where...
research
05/27/2019

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

State-of-the-art efficient model-based Reinforcement Learning (RL) algor...
research
09/13/2020

Oracle-Efficient Reinforcement Learning in Factored MDPs with Unknown Structure

We consider provably-efficient reinforcement learning (RL) in non-episod...
research
04/22/2019

Non-Stationary Markov Decision Processes, a Worst-Case Approach using Model-Based Reinforcement Learning

This work tackles the problem of robust zero-shot planning in non-statio...
