Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear

11/03/2016
by Zachary C. Lipton, et al.

To use deep reinforcement learning in the wild, we might hope for an agent that can avoid catastrophic mistakes. Unfortunately, even in simple environments, the popular deep Q-network (DQN) algorithm is doomed by a Sisyphean curse. Owing to the use of function approximation, these agents may eventually forget experiences as they become exceedingly unlikely under a new policy. Consequently, for as long as they continue to train, DQNs may periodically repeat avoidable catastrophic mistakes. In this paper, we learn a reward shaping that accelerates learning and guards oscillating policies against repeated catastrophes. First, we demonstrate the unacceptable performance of DQNs on two toy problems. We then introduce intrinsic fear, a new method that mitigates these problems by avoiding dangerous states. Our approach incorporates a second model, trained via supervised learning, to predict the probability of catastrophe within a small number of steps. This score then acts to penalize the Q-learning objective. Equipped with intrinsic fear, our DQNs solve the toy environments and improve on the Atari games Seaquest, Asteroids, and Freeway.
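The mechanism the abstract describes can be sketched in two pieces: labeling states near a catastrophe as positive training examples for the supervised fear model, and subtracting the fear model's predicted probability from the Q-learning target. The function names, the fear-radius parameter `k_r`, and the penalty coefficient `fear_coeff` below are illustrative placeholders, not the paper's exact notation; this is a minimal sketch of the idea, not the authors' implementation.

```python
import numpy as np


def fear_labels(trajectory_len, ended_in_catastrophe, k_r):
    """Label each state in a trajectory for the supervised fear model.

    States within k_r steps of a terminal catastrophe are labeled 1
    (dangerous); all other states are labeled 0 (safe). The fear model
    is then trained as an ordinary binary classifier on these labels.
    """
    labels = [0] * trajectory_len
    if ended_in_catastrophe:
        for i in range(max(0, trajectory_len - k_r), trajectory_len):
            labels[i] = 1
    return labels


def intrinsic_fear_target(reward, next_q_values, fear_prob,
                          gamma=0.99, fear_coeff=1.0, done=False):
    """Q-learning target penalized by the fear model's output.

    fear_prob is the fear model's estimated probability that the next
    state leads to catastrophe within a few steps; subtracting it
    (scaled by fear_coeff) shapes the reward to repel the policy from
    dangerous states.
    """
    bootstrap = 0.0 if done else gamma * float(np.max(next_q_values))
    return reward - fear_coeff * fear_prob + bootstrap
```

For example, with reward 1.0, a best next-state Q-value of 2.0, discount 0.9, and a fear score of 0.3, the shaped target is 1.0 - 0.3 + 0.9 * 2.0 = 2.5, rather than the unshaped 2.8.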

