SOAC: The Soft Option Actor-Critic Architecture

06/25/2020
by Chenghao Li, et al.

The option framework has shown great promise by automatically extracting temporally extended sub-tasks from a long-horizon task. Methods have been proposed for concurrently learning low-level intra-option policies and a high-level option-selection policy. However, existing methods typically suffer from two major challenges: ineffective exploration and unstable updates. In this paper, we present a novel and stable off-policy approach that builds on the maximum entropy model to address these challenges. Our approach introduces an information-theoretic intrinsic reward that encourages the identification of diverse and effective options. In addition, we use a probabilistic inference model to reduce the optimization problem to fitting optimal trajectories. Experimental results demonstrate that our approach significantly outperforms prior on-policy and off-policy methods on a range of MuJoCo benchmark tasks while still providing benefits for transfer learning. In these tasks, our approach learns a diverse set of options, each covering a strongly coherent region of the state-action space.


