Redeeming Intrinsic Rewards via Constrained Optimization

11/14/2022
by Eric Chen et al.

State-of-the-art reinforcement learning (RL) algorithms typically rely on random sampling (e.g., ϵ-greedy) for exploration, but this approach fails on hard exploration tasks like Montezuma's Revenge. To address this challenge, prior work incentivizes exploration by rewarding the agent when it visits novel states. Such intrinsic rewards (also called exploration bonuses or curiosity) often lead to excellent performance on hard exploration tasks. On easy exploration tasks, however, the agent gets distracted by intrinsic rewards and performs unnecessary exploration even when sufficient task (also called extrinsic) reward is available. Consequently, such an overly curious agent performs worse than an agent trained with only the task reward. This inconsistency in performance across tasks prevents the widespread use of intrinsic rewards with RL algorithms. We propose a principled constrained optimization procedure called Extrinsic-Intrinsic Policy Optimization (EIPO) that automatically tunes the importance of the intrinsic reward: it suppresses the intrinsic reward when exploration is unnecessary and increases it when exploration is required. The result is superior exploration that does not require manual tuning to balance the intrinsic reward against the task reward. Consistent performance gains across 61 Atari games validate our claim. The code is available at https://github.com/Improbable-AI/eipo.
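To make the idea of automatically tuning the intrinsic-reward weight concrete, here is a minimal sketch of a Lagrangian-style dual update: the weight on the intrinsic reward shrinks when the mixed (extrinsic + intrinsic) policy's task return falls behind an extrinsic-only baseline, and grows when the constraint is comfortably satisfied. This is an illustrative assumption-laden sketch, not the paper's implementation; the class and names (`IntrinsicWeightTuner`, `alpha`, `lr_dual`) and the use of scalar return estimates are hypothetical.

```python
# Illustrative sketch of EIPO's core idea (not the authors' code):
#   max_pi  J_E(pi) + alpha * J_I(pi)   s.t.  J_E(pi) >= J_E(pi_extrinsic_only)
# handled with a dual-variable update on alpha, the intrinsic-reward weight.

class IntrinsicWeightTuner:
    def __init__(self, alpha: float = 1.0, lr_dual: float = 0.01):
        self.alpha = alpha      # current weight on the intrinsic reward
        self.lr_dual = lr_dual  # step size for the dual (multiplier) update

    def update(self, ret_mixed_ext: float, ret_ext_only: float) -> float:
        """Adjust alpha from estimated extrinsic returns of the mixed
        policy and an extrinsic-only baseline policy (both assumed given,
        e.g. from recent rollouts)."""
        # Positive violation: exploration is hurting the task, so
        # suppress the intrinsic reward; negative: safe to explore more.
        violation = ret_ext_only - ret_mixed_ext
        self.alpha = max(0.0, self.alpha - self.lr_dual * violation)
        return self.alpha

    def shaped_reward(self, r_ext: float, r_int: float) -> float:
        # Reward the policy-gradient learner would actually optimize.
        return r_ext + self.alpha * r_int


if __name__ == "__main__":
    tuner = IntrinsicWeightTuner()
    # Mixed policy lags the extrinsic-only baseline: alpha shrinks.
    print(tuner.update(ret_mixed_ext=8.0, ret_ext_only=10.0))   # 0.98
    # Mixed policy outperforms the baseline: alpha grows again.
    print(tuner.update(ret_mixed_ext=12.0, ret_ext_only=10.0))  # 1.0
```

The design point the sketch illustrates is that the balance between task and exploration reward becomes a constrained-optimization variable updated from data, rather than a hyperparameter tuned by hand per game.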


Related research:

05/24/2023 - Successor-Predecessor Intrinsic Exploration
Exploration is essential in reinforcement learning, particularly in envi...

11/04/2019 - Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards
While using shaped rewards can be beneficial when solving sparse reward ...

04/04/2022 - Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization
We present Reward-Switching Policy Optimization (RSPO), a paradigm to di...

10/21/2019 - Exploration via Sample-Efficient Subgoal Design
The problem of exploration in unknown environments continues to pose a c...

06/18/2021 - MADE: Exploration via Maximizing Deviation from Explored Regions
In online reinforcement learning (RL), efficient exploration remains par...

09/21/2018 - Constrained Exploration and Recovery from Experience Shaping
We consider the problem of reinforcement learning under safety requireme...

05/22/2023 - Developmental Curiosity and Social Interaction in Virtual Agents
Infants explore their complex physical and social environment in an orga...
