Policy Entropy for Out-of-Distribution Classification

05/25/2020
by   Andreas Sedlmeier, et al.
19

One critical prerequisite for the deployment of reinforcement learning systems in the real world is the ability to reliably detect situations on which the agent was not trained. Such situations could lead to potential safety risks when wrong predictions lead to the execution of harmful actions. In this work, we propose PEOC, a new policy entropy based out-of-distribution classifier that reliably detects unencountered states in deep reinforcement learning. It is based on using the entropy of an agent's policy as the classification score of a one-class classifier. We evaluate our approach using a procedural environment generator. Results show that PEOC is highly competitive against state-of-the-art one-class classification algorithms on the evaluated environments. Furthermore, we present a structured process for benchmarking out-of-distribution classification in reinforcement learning.

READ FULL TEXT
research
12/31/2019

Uncertainty-Based Out-of-Distribution Classification in Deep Reinforcement Learning

Robustness to out-of-distribution (OOD) data is an important goal in bui...
research
11/21/2022

Examining Policy Entropy of Reinforcement Learning Agents for Personalization Tasks

This effort is focused on examining the behavior of reinforcement learni...
research
02/27/2019

Introspection Learning

Traditional reinforcement learning agents learn from experience, past or...
research
12/28/2022

Don't do it: Safer Reinforcement Learning With Rule-based Guidance

During training, reinforcement learning systems interact with the world ...
research
09/29/2021

Improving Safety in Deep Reinforcement Learning using Unsupervised Action Planning

One of the key challenges to deep reinforcement learning (deep RL) is to...
research
10/17/2022

You Only Live Once: Single-Life Reinforcement Learning

Reinforcement learning algorithms are typically designed to learn a perf...
research
04/11/2018

KS(conf ): A Light-Weight Test if a ConvNet Operates Outside of Its Specifications

Computer vision systems for automatic image categorization have become a...

Please sign up or login with your details

Forgot password? Click here to reset