
Safe Option-Critic: Learning Safety in the Option-Critic Architecture

by Arushi Jain, et al.
McGill University

Designing hierarchical reinforcement learning algorithms that induce a notion of safety is not only vital for safety-critical applications but also brings a better understanding of an artificially intelligent agent's decisions. While learning end-to-end options automatically has been fully realized recently, we propose a solution to learning safe options. We introduce the idea of controllability of states based on the temporal difference errors in the option-critic framework. We then derive the policy-gradient theorem with controllability and propose a novel framework called safe option-critic. We demonstrate the effectiveness of our approach in the four-rooms grid-world, cartpole, and three games in the Arcade Learning Environment (ALE): MsPacman, Amidar and Q*Bert. Learning end-to-end options with the proposed notion of safety reduces the variance of the return and boosts performance in environments with intrinsic variability in the reward structure. More importantly, the proposed algorithm outperforms vanilla options in all the environments and outperforms primitive actions in two of the three ALE games.
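The abstract does not spell out how controllability is computed from temporal difference errors. The sketch below is one plausible reading, not the authors' implementation: it assumes controllability is the negative squared one-step TD error and that it is added to the value as a weighted regularizer. The function names and the weight `psi` are hypothetical, introduced only for illustration.

```python
def td_error(reward, gamma, q_current, q_next, done=False):
    """One-step TD error: r + gamma * Q(next) - Q(current)."""
    # No bootstrapping from the next state at the end of an episode.
    bootstrap = 0.0 if done else gamma * q_next
    return reward + bootstrap - q_current


def controllability(delta):
    """ASSUMPTION: controllability is the negative squared TD error.

    Large TD errors (hard-to-predict outcomes) give low controllability.
    """
    return -(delta ** 2)


def regularized_value(q_value, delta, psi):
    """ASSUMPTION: value plus psi-weighted controllability.

    psi = 0 recovers the unregularized value; larger psi penalizes
    state-option pairs whose returns are harder to predict.
    """
    return q_value + psi * controllability(delta)


if __name__ == "__main__":
    delta = td_error(reward=1.0, gamma=0.9, q_current=2.0, q_next=3.0)
    print(delta, controllability(delta), regularized_value(2.0, delta, psi=0.1))
```

Under these assumptions, a state-option pair with a consistently small TD error keeps most of its value, while one with high-variance outcomes is penalized in proportion to `psi`, which is in line with the variance reduction the abstract reports.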




Diversity-Enriched Option-Critic

Temporal abstraction allows reinforcement learning agents to represent k...

Attention Option-Critic

Temporal abstraction in reinforcement learning is the ability of an agen...

On the Role of Weight Sharing During Deep Option Learning

The options framework is a popular approach for building temporally exte...

Learnings Options End-to-End for Continuous Action Tasks

We present new results on learning temporally extended actions for conti...

Natural Option Critic

The recently proposed option-critic architecture Bacon et al. provide a ...

Soft Options Critic

The option-critic paper and several variants have successfully demonstra...

When Waiting is not an Option : Learning Options with a Deliberation Cost

Recent work has shown that temporally extended actions (options) can be ...