
SOAC: The Soft Option Actor-Critic Architecture

06/25/2020
by Chenghao Li, et al.
Tsinghua University

The option framework has shown great promise for automatically extracting temporally extended sub-tasks from a long-horizon task. Methods have been proposed for concurrently learning low-level intra-option policies and a high-level option-selection policy. However, existing methods typically suffer from two major challenges: ineffective exploration and unstable updates. In this paper, we present a novel and stable off-policy approach that builds on the maximum entropy model to address these challenges. Our approach introduces an information-theoretic intrinsic reward that encourages the identification of diverse and effective options. Meanwhile, we use a probabilistic inference model to simplify the optimization problem to fitting optimal trajectories. Experimental results demonstrate that our approach significantly outperforms prior on-policy and off-policy methods on a range of MuJoCo benchmark tasks while still providing benefits for transfer learning. In these tasks, our approach learns a diverse set of options, each with a strongly coherent state-action space.
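As a rough illustration of the maximum entropy idea mentioned above, the sketch below applies it to a high-level option-selection step. It is not the paper's algorithm: the option values, the temperature, and every function name here are hypothetical, chosen only to show how a softmax (Boltzmann) policy over options trades expected value against entropy.

```python
import math
import random


def softmax(values, temperature=1.0):
    """Boltzmann distribution over values; a higher temperature is closer to uniform."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_value(values, temperature=1.0):
    """Maximum-entropy value: temperature * log-sum-exp(values / temperature)."""
    m = max(values)
    lse = math.log(sum(math.exp((v - m) / temperature) for v in values))
    return m + temperature * lse


def select_option(option_values, temperature, rng):
    """Sample an option index from the softmax policy over option values."""
    probs = softmax(option_values, temperature)
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, probs


# Hypothetical option values for a single state (illustrative numbers only).
option_values = [1.0, 0.5, -0.2]
temperature = 0.5
rng = random.Random(0)

option, probs = select_option(option_values, temperature, rng)
expected_value = sum(p * v for p, v in zip(probs, option_values))

# The soft value equals expected value plus temperature-weighted entropy.
soft_v = soft_value(option_values, temperature)
bonus_form = expected_value + temperature * entropy(probs)
```

In this toy setting, raising the temperature spreads probability more evenly across options, which is one simple way an entropy term can encourage exploration over options.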

