Data-efficient Hindsight Off-policy Option Learning

07/30/2020
by   Markus Wulfmeier, et al.

Solutions to most complex tasks can be decomposed into simpler, intermediate skills that are reusable across wider ranges of problems. We follow this concept and introduce Hindsight Off-policy Options (HO2), a new algorithm for efficient and robust option learning. The algorithm relies on critic-weighted maximum likelihood estimation and an efficient dynamic programming inference procedure over off-policy trajectories. We can backpropagate through the inference procedure over time and through the policy components at every time-step, making it possible to train all components' parameters off-policy, independently of the data-generating behavior policy. Experimentally, we demonstrate that HO2 outperforms competitive baselines and solves demanding robot stacking and ball-in-cup tasks from raw pixel inputs in simulation. We further compare autoregressive option policies with simple mixture policies, providing insights into the relative impact of two types of abstractions common in the options framework: action abstraction and temporal abstraction. Finally, we illustrate challenges caused by stale data in off-policy option learning and provide effective solutions.
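To make the phrase "dynamic programming inference procedure over off-policy trajectories" more concrete, here is a minimal sketch of a generic HMM-style forward recursion over options. It is not taken from the paper: the function names, the state-independent termination probabilities `beta`, and the state-independent switching distribution `switch_probs` are simplifying assumptions made for illustration, and the full method may differ in its exact transition model. The sketch takes the per-step likelihood of each logged action under each option's low-level policy and returns the marginal likelihood of the action sequence, summing over all option sequences in time linear in the trajectory length.

```python
def option_transition(prev_option, next_option, beta, switch_probs):
    """Assumed transition p(o_t | o_{t-1}): keep the previous option with
    probability 1 - beta[prev_option], otherwise resample from switch_probs."""
    stay = (1.0 - beta[prev_option]) if next_option == prev_option else 0.0
    return stay + beta[prev_option] * switch_probs[next_option]


def forward_option_inference(action_likelihoods, init_probs, beta, switch_probs):
    """Forward pass over one logged trajectory.

    action_likelihoods[t][o] is the (strictly positive) likelihood of the
    logged action at step t under option o's low-level policy.
    Returns (priors, likelihood): priors[t][o] is the option probability
    before observing the action at step t, and likelihood is the marginal
    probability of the whole action sequence.
    """
    num_options = len(init_probs)
    prior = list(init_probs)
    priors = []
    likelihood = 1.0
    for step_likelihoods in action_likelihoods:
        priors.append(prior)
        # Joint probability of each option and the logged action at this step.
        joint = [prior[o] * step_likelihoods[o] for o in range(num_options)]
        step_marginal = sum(joint)
        likelihood *= step_marginal
        # Condition on the logged action, then propagate one step forward.
        posterior = [j / step_marginal for j in joint]
        prior = [
            sum(
                posterior[p] * option_transition(p, o, beta, switch_probs)
                for p in range(num_options)
            )
            for o in range(num_options)
        ]
    return priors, likelihood


if __name__ == "__main__":
    # Two options; the logged actions are far more likely under option 0.
    priors, likelihood = forward_option_inference(
        action_likelihoods=[[0.9, 0.1], [0.9, 0.1]],
        init_probs=[0.5, 0.5],
        beta=[0.2, 0.2],
        switch_probs=[0.5, 0.5],
    )
    print(priors, likelihood)
```

Written in plain Python, this only evaluates the recursion. For training, the same computation would be expressed in an automatic differentiation framework so that gradients can flow through every step, and a long trajectory would typically call for log-space arithmetic to avoid underflow.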

01/01/2020

Options of Interest: Temporal Abstraction with Interest Functions

Temporal abstraction refers to the ability of an agent to use behaviours...

11/10/2017

Learning with Options that Terminate Off-Policy

A temporally abstract action, or an option, is specified by a policy and...

11/04/2020

Diversity-Enriched Option-Critic

Temporal abstraction allows reinforcement learning agents to represent k...

06/25/2020

SOAC: The Soft Option Actor-Critic Architecture

The option framework has shown great promise by automatically extracting...

09/20/2021

Context-Specific Representation Abstraction for Deep Option Learning

Hierarchical reinforcement learning has focused on discovering temporall...

05/23/2019

Soft Options Critic

The option-critic paper and several variants have successfully demonstra...