A compact, hierarchical Q-function decomposition

06/27/2012
by Bhaskara Marthi, et al.

Previous work in hierarchical reinforcement learning has faced a dilemma: either ignore the values of the different possible exit states of a subroutine, thereby risking suboptimal behavior, or represent those values explicitly, thereby incurring a possibly large representation cost, because exit values refer to nonlocal aspects of the world (i.e., all subsequent rewards). This paper shows that, in many cases, one can avoid both of these problems. The solution is based on recursively decomposing the exit value function in terms of Q-functions at higher levels of the hierarchy. This leads to an intuitively appealing runtime architecture in which a parent subroutine passes to its child a value function on the exit states, and the child reasons about how its choices affect the exit value. We also identify structural conditions on the value function and transition distributions that allow much more concise representations of exit state distributions, leading to further state abstraction. In essence, the only variables whose exit values need be considered are those that the parent cares about and the child affects. We demonstrate the utility of our algorithms on a series of increasingly complex environments.
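The runtime idea described above, where a parent hands its child a value function over exit states, can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a single-step, undiscounted child decision, and every name and number in it (`child_q`, `greedy`, the two paths, the two doors) is hypothetical and chosen only to show why ignoring exit values can lead to a suboptimal choice.

```python
from typing import Callable, Dict

# Hypothetical toy setup: the child subroutine picks one of two paths.
# Each path has a local reward and a distribution over exit states.
local_reward: Dict[str, float] = {"short_path": -1.0, "long_path": -3.0}
exit_dist: Dict[str, Dict[str, float]] = {
    "short_path": {"door_A": 1.0},
    "long_path": {"door_B": 1.0},
}

def child_q(
    local_reward: Dict[str, float],
    exit_dist: Dict[str, Dict[str, float]],
    exit_value: Callable[[str], float],
) -> Dict[str, float]:
    """Assumed simplification: Q(a) = local reward of a
    plus the expected exit value under the function passed by the parent."""
    return {
        a: local_reward[a]
        + sum(p * exit_value(s) for s, p in exit_dist[a].items())
        for a in local_reward
    }

def greedy(q: Dict[str, float]) -> str:
    """Return the action with the highest Q-value."""
    return max(q, key=q.get)

# Case 1: the child ignores exit values (treats every exit as worth 0).
q_ignore = child_q(local_reward, exit_dist, lambda s: 0.0)

# Case 2: the parent passes down its own valuation of the exit states.
parent_exit_value = {"door_A": 0.0, "door_B": 5.0}
q_with = child_q(local_reward, exit_dist, lambda s: parent_exit_value[s])

print(greedy(q_ignore), q_ignore)
print(greedy(q_with), q_with)
```

In this toy case the child that ignores exit values takes the locally cheaper path, while the child that receives the parent's exit values prefers the path whose exit the parent rates more highly. The child never stores the exit values itself; it only evaluates the function it was handed, which is the representational saving the abstract alludes to.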


