Hierarchical Reinforcement Learning: Approximating Optimal Discounted TSP Using Local Policies

03/13/2018
by Tom Zahavy, et al.

In this work, we provide theoretical guarantees for reward decomposition in deterministic MDPs. Reward decomposition is a special case of Hierarchical Reinforcement Learning that allows one to learn many policies in parallel and combine them into a composite solution. Our approach maps this problem to a Reward Discounted Traveling Salesman Problem and then derives approximate solutions for it. In particular, we focus on local approximate solutions, i.e., solutions that observe only information about the current state. Local policies are easy to implement and require little computation because they do not perform planning. While local deterministic policies, such as Nearest Neighbor, are used in practice for hierarchical reinforcement learning, we propose three stochastic policies that guarantee better performance than any deterministic policy.
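To make the setting concrete, here is a minimal sketch of a discounted-reward TSP objective and of the Nearest Neighbor local policy mentioned above. It rests on assumptions that are not taken from the paper: cities are points in the plane, each city carries a unit reward, travel time equals Euclidean distance, and the agent collects gamma**t the first time it reaches a city at time t. All function names are hypothetical. The sketch does not implement the paper's three stochastic policies. The brute-force search is included only as a reference optimum on tiny instances.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def discounted_tour_value(points: Sequence[Point], start: Point,
                          tour: Sequence[int], gamma: float) -> float:
    """Assumed objective: sum of gamma**t over cities, t = arrival time (distance travelled)."""
    t = 0.0
    value = 0.0
    current = start
    for idx in tour:
        t += math.dist(current, points[idx])  # travel time = Euclidean distance (assumption)
        value += gamma ** t                    # unit reward, discounted by arrival time
        current = points[idx]
    return value


def nearest_neighbor_tour(points: Sequence[Point], start: Point) -> List[int]:
    """Local deterministic policy: always move to the closest unvisited city (ties -> lower index)."""
    unvisited = set(range(len(points)))
    current = start
    tour: List[int] = []
    while unvisited:
        nxt = min(unvisited, key=lambda i: (math.dist(current, points[i]), i))
        tour.append(nxt)
        unvisited.remove(nxt)
        current = points[nxt]
    return tour


def optimal_tour_value(points: Sequence[Point], start: Point, gamma: float) -> float:
    """Exhaustive search over visiting orders; only feasible for a handful of cities."""
    return max(discounted_tour_value(points, start, order, gamma)
               for order in permutations(range(len(points))))


if __name__ == "__main__":
    # Toy instance on a line: one city to the left, a cluster slightly farther to the right.
    cities = [(-1.0, 0.0), (1.1, 0.0), (1.2, 0.0), (1.3, 0.0)]
    origin = (0.0, 0.0)
    nn = nearest_neighbor_tour(cities, origin)
    print(nn, discounted_tour_value(cities, origin, nn, 0.9), optimal_tour_value(cities, origin, 0.9))
```

On this toy instance, Nearest Neighbor first visits the single city at distance 1 and then has to backtrack to the cluster. Its discounted value (about 3.04 with gamma = 0.9) is therefore strictly below the brute-force optimum (about 3.33), which visits the cluster first. This is the kind of gap between local and optimal solutions that the paper's guarantees address.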
