
Hierarchical model-based policy optimization: from actions to action sequences and back

by Daniel McNamee et al.

We develop a normative framework for hierarchical model-based policy optimization based on applying second-order methods in the space of all possible state-action paths. The resulting natural path gradient performs policy updates in a manner that is sensitive to the long-range correlational structure of the induced stationary state-action densities. We demonstrate that the natural path gradient can be computed exactly given an environment dynamics model and depends on expressions akin to higher-order successor representations. In simulation, we show that the prioritization of local policy updates in the resulting policy flow indeed reflects the intuitive state-space hierarchy in several toy problems.
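To make the successor-representation connection concrete, here is a minimal tabular sketch (not the paper's algorithm; the MDP, softmax parameterization, and all variable names are illustrative assumptions). Given a known dynamics model, the discounted successor representation M = (I - γ P_π)^{-1}, which the abstract says the natural path gradient depends on (via higher-order analogues), can be computed in closed form:

```python
import numpy as np

# Illustrative toy MDP with known dynamics P[s, a, s'] and rewards r[s, a].
# All sizes and the softmax policy are assumptions for the sketch.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)

P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)          # normalize to transition probabilities
r = rng.random((n_states, n_actions))

theta = np.zeros((n_states, n_actions))    # tabular policy logits

def policy(theta):
    """Softmax policy pi(a|s) from logits."""
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def successor_representation(theta):
    """M = (I - gamma * P_pi)^(-1): expected discounted state occupancies."""
    pi = policy(theta)
    P_pi = np.einsum('sa,sat->st', pi, P)  # state-to-state transition matrix
    return np.linalg.inv(np.eye(n_states) - gamma * P_pi)

M = successor_representation(theta)
# Sanity check: each row of M sums to 1 / (1 - gamma) for a stochastic P_pi.
print(M.sum(axis=1))
```

Long-range state correlations under the current policy show up directly in the off-diagonal structure of M, which is what makes a second-order (natural-gradient-style) update sensitive to them.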




Related research

- Revisit Policy Optimization in Matrix Form: In the tabular case, when the reward and environment dynamics are known, pol...
- Joint action loss for proximal policy optimization: PPO (Proximal Policy Optimization) is a state-of-the-art policy gradient...
- Characterizing optimal hierarchical policy inference on graphs via non-equilibrium thermodynamics: Hierarchies are of fundamental interest in both stochastic optimal contr...
- Exploring Model-based Planning with Policy Networks: Model-based reinforcement learning (MBRL) with model-predictive control ...
- Discretizing Continuous Action Space for On-Policy Optimization: In this work, we show that discretizing the action space for continuous cont...
- Stochastic Dimension-reduced Second-order Methods for Policy Optimization: In this paper, we propose several new stochastic second-order algorithms...
- The Equivalent Conversions of the Role-Based Access Control Model: The problems which are important for the effective functioning of an acc...