
Data-efficient Hindsight Off-policy Option Learning

by Markus Wulfmeier, et al.

Solutions to most complex tasks can be decomposed into simpler, intermediate skills that are reusable across wider ranges of problems. We follow this concept and introduce Hindsight Off-policy Options (HO2), a new algorithm for efficient and robust option learning. The algorithm relies on critic-weighted maximum likelihood estimation and an efficient dynamic programming inference procedure over off-policy trajectories. We can backpropagate through the inference procedure across time and through the policy components at every time-step, making it possible to train all components' parameters off-policy, independently of the data-generating behavior policy. Experimentally, we demonstrate that HO2 outperforms competitive baselines and solves demanding robot stacking and ball-in-cup tasks from raw pixel inputs in simulation. We further compare autoregressive option policies with simple mixture policies, providing insights into the relative impact of two types of abstraction common in the options framework: action abstraction and temporal abstraction. Finally, we illustrate challenges caused by stale data in off-policy option learning and provide effective solutions.
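
The abstract does not spell out the dynamic programming inference procedure, so the following is only a minimal sketch of the general idea, assuming it resembles a standard forward (filtering) pass over a discrete latent option. The function name `forward_option_filter`, its arguments, and the fixed, state-independent transition table are hypothetical simplifications for illustration; in an actual option-learning agent these quantities would typically depend on the state and come from learned networks.

```python
import math


def forward_option_filter(init, trans, action_lik):
    """Forward pass over a discrete latent option (illustrative sketch only).

    init[k]          : assumed prior probability of option k at the first step
    trans[i][k]      : assumed probability of moving from option i to option k
    action_lik[t][k] : likelihood of the logged action at step t under option k

    Returns (beliefs, log_likelihood), where beliefs[t][k] is the probability
    of option k given the actions up to and including step t.
    """
    num_options = len(init)
    prior = list(init)
    beliefs = []
    log_likelihood = 0.0
    for lik_t in action_lik:
        # Weight the predicted option distribution by how well each option
        # explains the action that was actually taken.
        joint = [prior[k] * lik_t[k] for k in range(num_options)]
        evidence = sum(joint)
        log_likelihood += math.log(evidence)
        posterior = [j / evidence for j in joint]
        beliefs.append(posterior)
        # Propagate the belief one step through the option transition table.
        prior = [
            sum(posterior[i] * trans[i][k] for i in range(num_options))
            for k in range(num_options)
        ]
    return beliefs, log_likelihood


if __name__ == "__main__":
    beliefs, ll = forward_option_filter(
        init=[0.5, 0.5],
        trans=[[0.9, 0.1], [0.1, 0.9]],
        action_lik=[[0.8, 0.2], [0.8, 0.2]],
    )
    print(beliefs, ll)
```

Each step costs a number of operations quadratic in the number of options and linear in trajectory length, which is what makes this kind of recursion cheap compared with enumerating all option sequences. Because every operation is a sum, product, or division, the same recursion written in an automatic-differentiation framework would allow gradients to flow through all time-steps.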


