An Option-Dependent Analysis of Regret Minimization Algorithms in Finite-Horizon Semi-Markov Decision Processes

05/10/2023
by   Gianluca Drappo, et al.
0

A large variety of real-world Reinforcement Learning (RL) tasks is characterized by a complex and heterogeneous structure that makes end-to-end (or flat) approaches hardly applicable or even infeasible. Hierarchical Reinforcement Learning (HRL) provides general solutions to address these problems thanks to a convenient multi-level decomposition of the tasks, making their solution accessible. Although often used in practice, few works provide theoretical guarantees to justify this outcome effectively. Thus, it is not yet clear when to prefer such approaches compared to standard flat ones. In this work, we provide an option-dependent upper bound to the regret suffered by regret minimization algorithms in finite-horizon problems. We illustrate that the performance improvement derives from the planning horizon reduction induced by the temporal abstraction enforced by the hierarchical structure. Then, focusing on a sub-setting of HRL approaches, the options framework, we highlight how the average duration of the available options affects the planning horizon and, consequently, the regret itself. Finally, we relax the assumption of having pre-trained options to show how in particular situations, learning hierarchically from scratch could be preferable to using a standard approach.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/20/2022

Horizon-Free Reinforcement Learning for Latent Markov Decision Processes

We study regret minimization for reinforcement learning (RL) in Latent M...
research
03/25/2017

Exploration--Exploitation in MDPs with Options

While a large body of empirical results show that temporally-extended ac...
research
10/26/2021

Average-Reward Learning and Planning with Options

We extend the options framework for temporal abstraction in reinforcemen...
research
02/12/2020

Regret Bounds for Discounted MDPs

Recently, it has been shown that carefully designed reinforcement learni...
research
10/15/2020

Local Differentially Private Regret Minimization in Reinforcement Learning

Reinforcement learning algorithms are widely used in domains where it is...
research
04/15/2019

Disentangling Options with Hellinger Distance Regularizer

In reinforcement learning (RL), temporal abstraction still remains as an...
research
12/30/2022

POMRL: No-Regret Learning-to-Plan with Increasing Horizons

We study the problem of planning under model uncertainty in an online me...

Please sign up or login with your details

Forgot password? Click here to reset