Algorithms for Batch Hierarchical Reinforcement Learning

03/29/2016
by Tiancheng Zhao, et al.

Hierarchical Reinforcement Learning (HRL) exploits temporal abstraction to solve large Markov Decision Processes (MDPs) and to provide transferable subtask policies. In this paper, we introduce an off-policy HRL algorithm: Hierarchical Q-value Iteration (HQI). We show that it is possible to effectively learn recursively optimal policies for any valid hierarchical decomposition of the original MDP, given a fixed dataset collected from a flat stochastic behavioral policy. We first formally prove the convergence of the algorithm for tabular MDPs. Our experiments on the Taxi domain then show that HQI converges faster than flat Q-value Iteration and readily accommodates state abstraction. We also demonstrate that our algorithm can learn optimal policies for different hierarchical structures from the same fixed dataset, which enables model comparison without recollecting data.
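For readers unfamiliar with the flat baseline mentioned in the abstract, the following is a minimal sketch of tabular Q-value iteration run over a fixed batch of transitions. It is not the HQI algorithm from the paper, which additionally uses a hierarchical decomposition. The function name `batch_q_iteration`, the `(s, a, r, s_next, done)` tuple format, and the default values for `gamma` and `n_iters` are assumptions made for illustration only.

```python
import numpy as np


def batch_q_iteration(transitions, n_states, n_actions, gamma=0.95, n_iters=200):
    """Flat tabular Q-value iteration over a fixed batch (illustrative sketch).

    transitions: list of (s, a, r, s_next, done) tuples; this format is an
    assumption for the example, not something specified by the paper.
    """
    s, a, r, s_next, done = (np.array(x) for x in zip(*transitions))
    done = done.astype(float)

    # Number of samples stored for each (state, action) pair.
    counts = np.zeros((n_states, n_actions))
    np.add.at(counts, (s, a), 1)

    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        # Bootstrapped target for every stored sample; terminal samples
        # do not bootstrap from the next state.
        targets = r + gamma * (1.0 - done) * q[s_next].max(axis=1)
        sums = np.zeros_like(q)
        np.add.at(sums, (s, a), targets)
        # Average the targets per (state, action); unseen pairs stay at 0.
        q = np.divide(sums, counts, out=np.zeros_like(q), where=counts > 0)
    return q


# Toy 3-state chain (hypothetical data): action 1 moves right, action 0 moves
# left, and reaching state 2 ends the episode with reward 1.
demo = [
    (0, 1, 0.0, 1, False),
    (1, 1, 1.0, 2, True),
    (1, 0, 0.0, 0, False),
    (0, 0, 0.0, 0, False),
]
q_demo = batch_q_iteration(demo, n_states=3, n_actions=2, gamma=0.9)
greedy_policy = q_demo.argmax(axis=1)  # greedy action per state
```

Because the update only averages over stored samples, the same dataset can be reused without further interaction with the environment, which is the batch setting the abstract describes.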


research
01/16/2015

Value Iteration with Options and State Aggregation

This paper presents a way of solving Markov Decision Processes that comb...
research
09/20/2018

Logically-Constrained Neural Fitted Q-Iteration

This paper proposes a method for efficient training of the Q-function fo...
research
04/26/2022

BATS: Best Action Trajectory Stitching

The problem of offline reinforcement learning focuses on learning a good...
research
03/13/2018

Hierarchical Reinforcement Learning: Approximating Optimal Discounted TSP Using Local Policies

In this work, we provide theoretical guarantees for reward decomposition...
research
03/22/2023

Wasserstein Auto-encoded MDPs: Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees

Although deep reinforcement learning (DRL) has many success stories, the...
research
12/12/2016

Online Reinforcement Learning for Real-Time Exploration in Continuous State and Action Markov Decision Processes

This paper presents a new method to learn online policies in continuous ...
research
06/23/2022

Recursive Reinforcement Learning

Recursion is the fundamental paradigm to finitely describe potentially i...
