
Hierarchical model-based policy optimization: from actions to action sequences and back

11/28/2019
by Daniel McNamee, et al.
UCL

We develop a normative framework for hierarchical model-based policy optimization based on applying second-order methods in the space of all possible state-action paths. The resulting natural path gradient performs policy updates in a manner that is sensitive to the long-range correlational structure of the induced stationary state-action densities. We demonstrate that the natural path gradient can be computed exactly given an environment dynamics model and depends on expressions akin to higher-order successor representations. In simulation, we show that the prioritization of local policy updates in the resulting policy flow indeed reflects the intuitive state-space hierarchy in several toy problems.
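For intuition, the sketch below shows a conventional tabular natural policy gradient in which a successor-representation matrix supplies exact policy evaluation. This is not the paper's natural path gradient; the toy MDP (transition tensor P, rewards r, discount gamma) and the step size are hypothetical stand-ins, and the update relies on the standard result that the natural gradient for a tabular softmax policy reduces to an advantage update on the logits (Kakade, 2002).

import numpy as np

# Hypothetical toy MDP: P[s, a, s'] transition tensor, r[s, a] rewards.
n_s, n_a, gamma = 6, 2, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))
r = rng.random((n_s, n_a))

theta = np.zeros((n_s, n_a))  # softmax policy logits

def softmax(x):
    z = np.exp(x - x.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

for _ in range(300):
    pi = softmax(theta)
    P_pi = np.einsum('sa,sap->sp', pi, P)              # state dynamics under pi
    M = np.linalg.inv(np.eye(n_s) - gamma * P_pi)      # successor representation (I - gamma P_pi)^-1
    v = M @ (pi * r).sum(axis=1)                       # exact policy evaluation via the SR
    q = r + gamma * np.einsum('sap,p->sa', P, v)
    adv = q - v[:, None]
    # Natural gradient step for a tabular softmax policy: move the logits
    # along the advantage; the SR captures the long-range state correlations
    # that enter the evaluation step.
    theta += 0.1 * adv

print("greedy policy:", softmax(theta).argmax(axis=1))

The second-order structure here enters only through the exact evaluation matrix M; the paper's contribution is to build the preconditioner itself from correlations over entire state-action paths rather than per-state Fisher blocks.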


