Learning to Be Cautious

10/29/2021
by Montaser Mohammedalamen, et al.

A key challenge in reinforcement learning is to develop agents that behave cautiously in novel situations. It is generally impossible to anticipate every situation an autonomous system may face or to specify in advance which behavior would best avoid bad outcomes. An agent that could learn to be cautious would overcome this challenge by discovering for itself when and how to behave cautiously. In contrast, current approaches typically embed task-specific safety information or explicit cautious behaviors into the system, which is error-prone and imposes extra burdens on practitioners. In this paper, we present both a sequence of tasks in which cautious behavior becomes increasingly non-obvious and an algorithm demonstrating that it is possible for a system to learn to be cautious. The essential features of our algorithm are that it characterizes reward function uncertainty without task-specific safety information and uses this uncertainty to construct a robust policy. Specifically, we construct robust policies with a k-of-N counterfactual regret minimization (CFR) subroutine, given a learned reward function uncertainty represented by a neural network ensemble belief. These policies exhibit caution in each of our tasks without any task-specific safety tuning.
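To make the k-of-N idea concrete, here is a minimal sketch of the robust objective it optimizes: sample N candidate reward functions from the learned uncertainty (here stand-in reward vectors rather than a neural-network ensemble), then score a policy by the average of its k worst outcomes. The CFR subroutine the paper uses to optimize this objective is omitted, and all names and numbers below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ensemble belief: N sampled reward vectors over 5 states,
# standing in for samples drawn from a neural-network ensemble.
reward_samples = [rng.normal(loc=1.0, scale=s, size=5) for s in (0.1, 0.5, 2.0)]

def k_of_n_value(policy, reward_samples, k):
    """Robust value of a policy: mean of the k worst of N sampled reward outcomes.

    A pessimistic policy prefers actions whose worst plausible rewards are
    still acceptable; k = N recovers the ordinary expected value.
    """
    values = sorted(float(policy @ r) for r in reward_samples)  # ascending
    return float(np.mean(values[:k]))  # average over the k worst samples

# A uniform state-visitation policy, purely for illustration.
uniform = np.full(5, 0.2)

robust_v = k_of_n_value(uniform, reward_samples, k=1)            # worst case
expected_v = k_of_n_value(uniform, reward_samples, k=len(reward_samples))

# The robust (k=1) value can never exceed the mean (k=N) value.
assert robust_v <= expected_v
```

Lowering k makes the objective more pessimistic, which is how the degree of caution is controlled without encoding any task-specific safety information.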


Related research

- 10/07/2017: Meta Inverse Reinforcement Learning via Maximum Reward Sharing for Human Motion Analysis
- 07/26/2019: Environment Probing Interaction Policies
- 10/06/2020: Safety Aware Reinforcement Learning (SARL)
- 03/02/2023: Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning
- 11/01/2021: On the Expressivity of Markov Reward
- 03/12/2019: Imitation Learning of Factored Multi-agent Reactive Models
- 03/19/2023: CLIP4MC: An RL-Friendly Vision-Language Model for Minecraft
