Safe Option-Critic: Learning Safety in the Option-Critic Architecture

07/21/2018
by Arushi Jain, et al.

Designing hierarchical reinforcement learning algorithms that induce a notion of safety is vital for safety-critical applications and also improves our understanding of an artificially intelligent agent's decisions. Although options can now be learned end-to-end automatically, learning safe options has not been addressed; we propose a solution. We introduce the idea of controllability of states, based on the temporal difference errors in the option-critic framework. We then derive the policy-gradient theorem with controllability and propose a novel framework called safe option-critic. We demonstrate the effectiveness of our approach in the four-rooms grid-world, cartpole, and three games in the Arcade Learning Environment (ALE): MsPacman, Amidar and Q*Bert. Learning end-to-end options with the proposed notion of safety reduces the variance of the return and improves performance in environments with intrinsic variability in the reward structure. More importantly, the proposed algorithm outperforms vanilla options in all the environments and primitive actions in two of the three ALE games.
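The abstract does not give the definition of controllability, so the following is only a minimal sketch of one plausible reading: it assumes controllability is the negative squared one-step temporal difference error, added to the value estimate with a weight. The function names and the coefficient `psi` are hypothetical and are not taken from the paper.

```python
def td_error(reward, gamma, q_sa, q_next, done):
    """One-step temporal difference error for a single transition."""
    # Bootstrap from the next value unless the episode has ended.
    target = reward + (0.0 if done else gamma * q_next)
    return target - q_sa


def safe_objective(q_sa, delta, psi):
    """Value estimate regularized by a controllability term.

    Assumption: controllability is the negative squared TD error,
    weighted by a hypothetical coefficient psi >= 0.
    """
    controllability = -(delta ** 2)
    return q_sa + psi * controllability


# Two transitions with the same value estimate but different "surprise".
delta_stable = td_error(reward=1.0, gamma=0.5, q_sa=2.0, q_next=2.0, done=False)
delta_noisy = td_error(reward=5.0, gamma=0.5, q_sa=2.0, q_next=2.0, done=False)

score_stable = safe_objective(q_sa=2.0, delta=delta_stable, psi=0.25)
score_noisy = safe_objective(q_sa=2.0, delta=delta_noisy, psi=0.25)
```

Under this assumption, a transition with a large temporal difference error scores lower than one with the same value estimate and a small error, which is one way an agent could be steered away from states with highly variable outcomes.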

research
11/04/2020

Diversity-Enriched Option-Critic

Temporal abstraction allows reinforcement learning agents to represent k...
research
01/07/2022

Attention Option-Critic

Temporal abstraction in reinforcement learning is the ability of an agen...
research
12/31/2019

On the Role of Weight Sharing During Deep Option Learning

The options framework is a popular approach for building temporally exte...
research
11/30/2017

Learnings Options End-to-End for Continuous Action Tasks

We present new results on learning temporally extended actions for conti...
research
12/04/2018

Natural Option Critic

The recently proposed option-critic architecture Bacon et al. provide a ...
research
05/23/2019

Soft Options Critic

The option-critic paper and several variants have successfully demonstra...
research
09/14/2017

When Waiting is not an Option : Learning Options with a Deliberation Cost

Recent work has shown that temporally extended actions (options) can be ...
