Decoupling Exploration and Exploitation in Reinforcement Learning

07/19/2021
by Lukas Schäfer, et al.

Intrinsic rewards are commonly applied to improve exploration in reinforcement learning. However, these approaches suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters. In this work, we propose Decoupled RL (DeRL), which trains separate policies for exploration and exploitation. DeRL can be applied with both on-policy and off-policy RL algorithms. We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards. We show that DeRL is more robust to the scale and speed of decay of intrinsic rewards and converges to the same evaluation returns as intrinsically motivated baselines in fewer interactions.
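As a rough illustration of the decoupling described in the abstract, the sketch below trains an exploration value function on extrinsic plus intrinsic reward, while an exploitation value function is updated from the same transitions using the extrinsic reward only, so it is not biased by the shaping term. The toy chain environment, tabular Q-learning, count-based bonus, and all hyperparameters are illustrative assumptions and not the paper's actual setup.

```python
import numpy as np

# Minimal sketch of decoupled exploration/exploitation (illustrative only).
# Toy sparse-reward chain: action 1 moves right, action 0 moves left,
# reward 1 is given only when the right end of the chain is reached.

N_STATES, N_ACTIONS = 10, 2
GAMMA, ALPHA, BETA = 0.95, 0.1, 0.5      # discount, learning rate, intrinsic scale

q_explore = np.zeros((N_STATES, N_ACTIONS))   # trained on extrinsic + intrinsic reward
q_exploit = np.zeros((N_STATES, N_ACTIONS))   # trained on extrinsic reward only
visit_counts = np.ones(N_STATES)              # for a count-based intrinsic bonus


def step(state, action):
    """Sparse-reward chain dynamics."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done


rng = np.random.default_rng(0)
for episode in range(500):
    state = 0
    for t in range(200):                      # cap episode length for the toy example
        # Behaviour comes from the exploration policy (epsilon-greedy here).
        if rng.random() < 0.1:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(q_explore[state]))
        next_state, ext_reward, done = step(state, action)

        # Intrinsic bonus shapes only the exploration policy's update.
        visit_counts[next_state] += 1
        int_reward = BETA / np.sqrt(visit_counts[next_state])

        target_explore = ext_reward + int_reward + GAMMA * np.max(q_explore[next_state]) * (not done)
        q_explore[state, action] += ALPHA * (target_explore - q_explore[state, action])

        # Exploitation policy learns off-policy from the same transition,
        # but from the extrinsic reward alone, so it stays unbiased by shaping.
        target_exploit = ext_reward + GAMMA * np.max(q_exploit[next_state]) * (not done)
        q_exploit[state, action] += ALPHA * (target_exploit - q_exploit[state, action])

        state = next_state
        if done:
            break

print("greedy exploitation policy:", np.argmax(q_exploit, axis=1))
```

In this sketch the exploration policy alone generates the behaviour, which mirrors the decoupling idea: the shaped, non-stationary bonus can decay or be rescaled without directly distorting the policy that is finally evaluated.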


Related research

VASE: Variational Assorted Surprise Exploration for Reinforcement Learning (10/31/2019)
Exploration in environments with continuous control and sparse rewards r...

Intrinsic Exploration as Multi-Objective RL (04/06/2020)
Intrinsic motivation enables reinforcement learning (RL) agents to explo...

Improving robot navigation in crowded environments using intrinsic rewards (02/13/2023)
Autonomous navigation in crowded environments is an open problem with ma...

RBED: Reward Based Epsilon Decay (10/30/2019)
ε-greedy is a policy used to balance exploration and exploitation in man...

Reinforcement Learning with Probabilistically Complete Exploration (01/20/2020)
Balancing exploration and exploitation remains a key challenge in reinfo...

SEREN: Knowing When to Explore and When to Exploit (05/30/2022)
Efficient reinforcement learning (RL) involves a trade-off between "expl...

Intrinsically-Motivated Reinforcement Learning: A Brief Introduction (03/03/2022)
Reinforcement learning (RL) is one of the three basic paradigms of machi...
