Data-efficient Hindsight Off-policy Option Learning

07/30/2020
by Markus Wulfmeier, et al.

Solutions to most complex tasks can be decomposed into simpler, intermediate skills that are reusable across a wider range of problems. We follow this concept and introduce Hindsight Off-policy Options (HO2), a new algorithm for efficient and robust option learning. The algorithm relies on critic-weighted maximum likelihood estimation and an efficient dynamic programming inference procedure over off-policy trajectories. We can backpropagate through the inference procedure, through time and through the policy components at every time step, which makes it possible to train all components' parameters off-policy, independently of the data-generating behavior policy. Experimentally, we demonstrate that HO2 outperforms competitive baselines and solves demanding robot stacking and ball-in-cup tasks from raw pixel inputs in simulation. We further compare autoregressive option policies with simple mixture policies, providing insights into the relative impact of two types of abstraction common in the options framework: action abstraction and temporal abstraction. Finally, we illustrate challenges caused by stale data in off-policy option learning and provide effective solutions.
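
The two ingredients named in the abstract can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes each option o has a low-level policy π^L(a | s, o), a termination probability β(s, o), and a high-level controller π^C(o | s) that picks a new option whenever the current one terminates, and that their outputs along one trajectory are available as precomputed probability tables. `option_forward` runs a scaled forward (HMM-style) recursion that sums out the unobserved options and returns the filtered option posteriors plus the log-likelihood of the trajectory's actions. `critic_weights` turns critic values Q(s, a_j) for sampled actions into normalized weights exp(Q / η) / Z, an MPO-style weighting assumed here as a stand-in for the critic-weighted maximum likelihood objective. All function names, argument layouts, and the temperature η are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def option_forward(
    low_level: Sequence[Sequence[float]],
    controller: Sequence[Sequence[float]],
    termination: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], float]:
    """Hypothetical forward recursion over latent options for one trajectory.

    Assumed inputs (all precomputed along the trajectory):
      low_level[t][o]   = pi_L(a_t | s_t, o)   likelihood of the taken action
      controller[t][o]  = pi_C(o | s_t)        high-level option distribution
      termination[t][o] = beta(s_t, o)         prob. that option o ends at s_t
    Returns the filtered posteriors p(o_t | s_0..t, a_0..t) and
    log p(a_0..T | s_0..T) with the options summed out.
    """
    num_options = len(low_level[0])
    posteriors: List[List[float]] = []
    log_likelihood = 0.0
    for t in range(len(low_level)):
        if t == 0:
            # The controller chooses the initial option.
            prior = list(controller[0])
        else:
            prev = posteriors[-1]
            prior = [0.0] * num_options
            for o_prev, p_prev in enumerate(prev):
                beta = termination[t][o_prev]
                # Continue the previous option with probability 1 - beta ...
                prior[o_prev] += p_prev * (1.0 - beta)
                # ... or terminate it and resample from the controller.
                for o in range(num_options):
                    prior[o] += p_prev * beta * controller[t][o]
        # Weight each option by how well its low-level policy explains a_t.
        joint = [prior[o] * low_level[t][o] for o in range(num_options)]
        norm = sum(joint)  # p(a_t | a_0..t-1, s_0..t)
        log_likelihood += math.log(norm)
        posteriors.append([j / norm for j in joint])
    return posteriors, log_likelihood


def critic_weights(q_values: Sequence[float], temperature: float) -> List[float]:
    """Assumed MPO-style weights: softmax of Q(s, a_j) / temperature."""
    scaled = [q / temperature for q in q_values]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    posteriors, ll = option_forward(
        low_level=[[0.9, 0.1], [0.8, 0.3]],
        controller=[[0.5, 0.5], [0.5, 0.5]],
        termination=[[0.0, 0.0], [0.1, 0.1]],
    )
    print(posteriors[-1], ll)  # ll is approximately log(0.365)
    print(critic_weights([1.0, 2.0, 3.0], temperature=1.0))
```

If the same recursion were written with array operations in an autodiff framework such as JAX or PyTorch instead of Python lists, it would be differentiable end to end. That is the property the abstract relies on when it backpropagates through the inference over time and into every policy component. In a sketch along these lines, the critic-weighted log-likelihood of sampled actions would then serve as the training loss.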

research
01/01/2020

Options of Interest: Temporal Abstraction with Interest Functions

Temporal abstraction refers to the ability of an agent to use behaviours...
research
11/10/2017

Learning with Options that Terminate Off-Policy

A temporally abstract action, or an option, is specified by a policy and...
research
06/25/2020

SOAC: The Soft Option Actor-Critic Architecture

The option framework has shown great promise by automatically extracting...
research
10/18/2021

MDP Abstraction with Successor Features

Abstraction plays an important role for generalisation of knowledge and ...
research
09/20/2021

Context-Specific Representation Abstraction for Deep Option Learning

Hierarchical reinforcement learning has focused on discovering temporall...
research
05/23/2019

Soft Options Critic

The option-critic paper and several variants have successfully demonstra...
research
09/09/2019

Option Encoder: A Framework for Discovering a Policy Basis in Reinforcement Learning

Option discovery and skill acquisition frameworks are integral to the fu...
