Situationally Aware Options

11/20/2017
by Daniel J. Mankowitz et al.

Hierarchical abstractions, also known as options, are a type of temporally extended action (Sutton et al., 1999) that enables a reinforcement learning agent to plan at a higher level, abstracting away from lower-level details. In this work, we learn reusable options whose parameters can vary based on the current situation, encouraging different behaviours. In principle, these behaviours can include vigor, defence or even risk-averseness. These are examples of what we refer to, in the broader context, as Situational Awareness (SA). We incorporate SA, in the form of vigor, into hierarchical RL by defining and learning situationally aware options in a Probabilistic Goal Semi-Markov Decision Process (PG-SMDP). This is achieved using our Situationally Aware oPtions (SAP) policy gradient algorithm, which comes with a theoretical convergence guarantee. We learn reusable options in different scenarios (i.e., winning/losing) in a RoboCup soccer domain. These options learn to execute with different levels of vigor, resulting in human-like behaviours such as 'time-wasting' in the winning scenario. We show the potential of the agent to exit bad local optima using reusable options in RoboCup. Finally, using SAP, the agent mitigates feature-based model misspecification in a Bottomless Pit of Death domain.
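To make the idea of an option whose behaviour depends on the situation more concrete, here is a minimal sketch. It is not the SAP algorithm and nothing in it is learned: the `Option` class, `run_option`, the one-dimensional toy dynamics, the `GOAL` value and the hand-picked step sizes are all hypothetical, chosen only to illustrate an option in the usual three-part sense (initiation set, intra-option policy, termination condition) that moves with more or less vigor depending on whether the agent is "winning" or "losing".

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Option:
    """Toy option: initiation set, intra-option policy, termination condition."""
    can_start: Callable[[int], bool]      # initiation set membership
    policy: Callable[[int, str], int]     # (state, situation) -> action
    should_stop: Callable[[int], bool]    # termination condition


def run_option(option: Option, state: int, situation: str,
               max_steps: int = 100) -> Tuple[int, int]:
    """Run the option until it terminates; return (final_state, steps_taken)."""
    if not option.can_start(state):
        raise ValueError("option cannot be initiated in this state")
    steps = 0
    while not option.should_stop(state) and steps < max_steps:
        # Hypothetical deterministic 1-D dynamics: the action is added to the state.
        state += option.policy(state, situation)
        steps += 1
    return state, steps


GOAL = 12  # hypothetical goal position on a line


def vigor_policy(state: int, situation: str) -> int:
    # Assumption for illustration: large steps (high vigor) when losing,
    # small steps (low vigor, akin to time-wasting) when winning.
    step = 3 if situation == "losing" else 1
    return min(step, GOAL - state)  # never overshoot the goal


move_to_goal = Option(
    can_start=lambda s: s < GOAL,
    policy=vigor_policy,
    should_stop=lambda s: s >= GOAL,
)

losing_result = run_option(move_to_goal, 0, "losing")
winning_result = run_option(move_to_goal, 0, "winning")
print("losing :", losing_result)
print("winning:", winning_result)
```

The same option reaches the goal in both situations, but takes fewer steps when losing than when winning. In the work summarised above, the situation-dependent parameters are learned by a policy gradient method rather than fixed by hand as they are here.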


research
10/10/2016

Situational Awareness by Risk-Conscious Skills

Hierarchical Reinforcement Learning has been previously shown to speed u...
research
10/27/2018

Learning Abstract Options

Building systems that autonomously create temporal abstractions from dat...
research
03/29/2022

Multi-Agent Asynchronous Cooperation with Hierarchical Reinforcement Learning

Hierarchical multi-agent reinforcement learning (MARL) has shown a signi...
research
03/25/2017

Exploration--Exploitation in MDPs with Options

While a large body of empirical results show that temporally-extended ac...
research
12/22/2022

Reusable Options through Gradient-based Meta Learning

Hierarchical methods in reinforcement learning have the potential to red...
research
05/22/2019

Learning Robust Options by Conditional Value at Risk Optimization

Options are generally learned by using an inaccurate environment model (...
research
02/09/2018

Learning Robust Options

Robust reinforcement learning aims to produce policies that have strong ...
