DAC: The Double Actor-Critic Architecture for Learning Options

04/29/2019
by Shangtong Zhang et al.

We reformulate the option framework as two parallel augmented MDPs. Under this novel formulation, all policy optimization algorithms can be used off the shelf to learn intra-option policies, option termination conditions, and a master policy over options. We apply an actor-critic algorithm on each augmented MDP, yielding the Double Actor-Critic (DAC) architecture. Furthermore, we show that, when state-value functions are used as critics, one critic can be expressed in terms of the other, and hence only one critic is necessary. Our experiments on challenging robot simulation tasks demonstrate that DAC outperforms previous gradient-based option learning algorithms by a large margin and significantly outperforms its hierarchy-free counterparts in a transfer learning setting.
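To make the reformulation concrete, the sketch below (a minimal illustration, not the authors' implementation; the identifiers high_mdp_policy, beta, and master_pi are hypothetical) writes out the policy of the high-level augmented MDP. It is the standard call-and-return execution rule: the previous option continues with probability 1 - beta, and otherwise the master policy samples a fresh option.

def high_mdp_policy(prev_option, state, beta, master_pi):
    """Distribution over the next option o_t given (o_{t-1}, s_t).

    beta(state, option) -> termination probability in [0, 1]
    master_pi(state)    -> array of option probabilities summing to 1
    """
    b = beta(state, prev_option)        # chance the previous option terminates
    probs = b * master_pi(state)        # re-selection by the master policy
    probs[prev_option] += 1.0 - b       # continuation of the previous option
    return probs                        # still a valid distribution, sums to 1

Because this is an ordinary stochastic policy over an augmented state (o_{t-1}, s_t), any off-the-shelf policy-gradient method can evaluate and differentiate it, which is what lets DAC train the master policy, termination conditions, and intra-option policies with standard algorithms, as the abstract states.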

Related research

07/12/2021 · Cautious Actor-Critic
The oscillating performance of off-policy learning and persisting errors...

11/06/2018 · ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search
In this paper, we propose an actor ensemble algorithm, named ACE, for co...

06/25/2020 · SOAC: The Soft Option Actor-Critic Architecture
The option framework has shown great promise by automatically extracting...

11/11/2019 · Provably Convergent Off-Policy Actor-Critic with Function Approximation
We present the first provably convergent off-policy actor-critic algorit...

05/23/2019 · Soft Options Critic
The option-critic paper and several variants have successfully demonstra...

04/01/2019 · Multitask Soft Option Learning
We present Multitask Soft Option Learning (MSOL), a hierarchical multita...

12/04/2018 · Natural Option Critic
The recently proposed option-critic architecture of Bacon et al. provides a...
