Planning in Hierarchical Reinforcement Learning: Guarantees for Using Local Policies

02/26/2019
by   Tom Zahavy, et al.

We consider a setting of hierarchical reinforcement learning in which the reward is a sum of components. For each component, we are given a policy that maximizes it, and our goal is to assemble, from these individual policies, a policy that maximizes the sum of the components. We provide theoretical guarantees for assembling such policies in deterministic MDPs with collectible rewards. Our approach formulates this problem as a traveling salesman problem with discounted reward. We focus on local solutions, i.e., policies that use only information from the current state; they are therefore easy to implement and do not require substantial computational resources. We propose three local stochastic policies and prove that, in the worst case, they guarantee better performance than any deterministic local policy; experimental results suggest that they also perform better on average.
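To make the discounted-reward objective concrete, the following is a minimal sketch, not the paper's method. It assumes unit rewards placed at points on a line, a travel time equal to the distance moved, and a discount factor `gamma`; all function names are hypothetical. It scores a visiting order by the sum of `gamma ** t` over arrival times `t`, and compares a simple deterministic local rule (always go to the nearest remaining reward, chosen here only for illustration) against the best order found by brute force. The three stochastic policies from the abstract are not specified there and are not reproduced.

```python
from itertools import permutations


def discounted_return(order, positions, start=0.0, gamma=0.9):
    """Sum of gamma**t over collected rewards, t being the arrival time.

    Assumption: each reward is worth 1 and travel time equals distance.
    """
    elapsed = 0.0
    total = 0.0
    current = start
    for idx in order:
        elapsed += abs(positions[idx] - current)  # time to reach this reward
        total += gamma ** elapsed                 # reward discounted by arrival time
        current = positions[idx]
    return total


def nearest_neighbor_order(positions, start=0.0):
    """Deterministic local rule: always move to the closest remaining reward.

    Ties are broken by the lower index so the result is reproducible.
    """
    remaining = set(range(len(positions)))
    current = start
    order = []
    while remaining:
        nxt = min(remaining, key=lambda i: (abs(positions[i] - current), i))
        order.append(nxt)
        remaining.remove(nxt)
        current = positions[nxt]
    return order


def best_order(positions, start=0.0, gamma=0.9):
    """Brute-force optimum over all visiting orders (small instances only)."""
    return max(
        permutations(range(len(positions))),
        key=lambda order: discounted_return(order, positions, start, gamma),
    )


if __name__ == "__main__":
    rewards = [1.0, -2.0, 5.0]  # hypothetical reward locations on a line
    greedy = nearest_neighbor_order(rewards)
    optimal = best_order(rewards)
    print("nearest-neighbor order:", greedy, discounted_return(greedy, rewards))
    print("brute-force order:     ", list(optimal), discounted_return(optimal, rewards))
```

The brute-force search is exponential in the number of rewards and serves only as a reference point for small instances; the local rule needs nothing beyond the current position and the set of remaining rewards.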


