
Safe Option-Critic: Learning Safety in the Option-Critic Architecture

07/21/2018
by Arushi Jain, et al.
McGill University

Designing hierarchical reinforcement learning algorithms that induce a notion of safety is not only vital for safety-critical applications but also brings a better understanding of an artificially intelligent agent's decisions. While learning end-to-end options automatically has been fully realized recently, we propose a solution to learning safe options. We introduce the idea of controllability of states based on the temporal difference errors in the option-critic framework. We then derive the policy-gradient theorem with controllability and propose a novel framework called safe option-critic. We demonstrate the effectiveness of our approach in the four-rooms grid-world, cartpole, and three games in the Arcade Learning Environment (ALE): MsPacman, Amidar and Q*Bert. Learning end-to-end options with the proposed notion of safety reduces the variance of the return and boosts performance in environments with intrinsic variability in the reward structure. More importantly, the proposed algorithm outperforms vanilla options in all the environments and outperforms primitive actions in two out of three ALE games.
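The abstract ties controllability to temporal difference (TD) errors but does not state the formula. The sketch below assumes, for illustration only, that the controllability of a state-option pair is the negative running average of its squared TD error, so pairs whose outcomes are hard to predict score lower. It uses a generic one-step TD error rather than the option-critic targets that account for option termination. All names here (`ControllabilityTracker`, `td_error`, `safe_objective`, `step_size`, and the trade-off weight `psi`) are hypothetical and are not taken from the paper's code.

```python
from collections import defaultdict


class ControllabilityTracker:
    """Hypothetical helper: controllability of a (state, option) pair,
    assumed here to be the negative running mean of squared TD errors."""

    def __init__(self, step_size: float = 0.1) -> None:
        self.step_size = step_size
        # Running estimate of E[delta^2] for each (state, option) pair.
        self.sq_td = defaultdict(float)

    def update(self, state, option, td_err: float) -> None:
        key = (state, option)
        # Exponential moving average of the squared TD error.
        self.sq_td[key] += self.step_size * (td_err ** 2 - self.sq_td[key])

    def controllability(self, state, option) -> float:
        # Closer to zero means the outcome is more predictable (assumed "safer").
        # Unseen pairs default to 0.0 without being inserted into the table.
        return -self.sq_td.get((state, option), 0.0)


def td_error(reward: float, gamma: float, q_next: float, q_current: float) -> float:
    """Generic one-step TD error: delta = r + gamma * Q(s', o) - Q(s, o)."""
    return reward + gamma * q_next - q_current


def safe_objective(q_value: float, controllability: float, psi: float) -> float:
    # Hypothetical safety-augmented score: value estimate plus a
    # psi-weighted controllability bonus (psi trades return for safety).
    return q_value + psi * controllability


if __name__ == "__main__":
    tracker = ControllabilityTracker(step_size=0.5)
    tracker.update("s0", 0, td_error(1.0, 0.5, 2.0, 0.0))  # delta = 2.0
    print(tracker.controllability("s0", 0))
    print(safe_objective(1.0, tracker.controllability("s0", 0), psi=0.25))
```

With `step_size=0.5`, a single TD error of 2.0 moves the squared-error estimate from 0 to 2.0, giving a controllability of -2.0; a larger `psi` would push an agent further toward state-option pairs with small, stable TD errors.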

