MULEX: Disentangling Exploitation from Exploration in Deep RL

07/01/2019
by   Lucas Beyer, et al.

An agent learning through interactions should balance its action selection process between probing the environment to discover new rewards and using the information acquired in the past to adopt useful behaviour. This trade-off is usually achieved by perturbing either the agent's actions (e.g., ε-greedy or Gibbs sampling) or the agent's parameters (e.g., NoisyNet), or by modifying the reward it receives (e.g., exploration bonus, intrinsic motivation, or hand-shaped rewards). Here, we adopt a disruptive but simple and generic perspective, where we explicitly disentangle exploration and exploitation. Different losses are optimized in parallel: one comes from the true objective (maximizing cumulative rewards from the environment), and the others are related to exploration. Each loss is used in turn to learn a policy that generates transitions, and all transitions are shared in a single replay buffer. Off-policy methods are then applied to these transitions to optimize each loss. We showcase our approach on a hard-exploration environment, show its sample-efficiency and robustness, and discuss further implications.
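
The scheme described in the abstract (several losses, one policy per loss, policies taking turns to act, one shared replay buffer, off-policy updates for every loss) can be illustrated with a small tabular sketch. This is not the paper's implementation: the toy chain environment, the tabular Q-learning update, the count-based bonus, the optimistic initialisation of the exploration table, and all names and constants below are assumptions chosen only to keep the example short and self-contained.

```python
import random

# Hypothetical toy setup; none of these constants come from the paper.
N_STATES, N_ACTIONS, HORIZON = 8, 2, 20
GAMMA = 0.9
OPTIMISTIC = 1.0 / (1.0 - GAMMA)  # upper bound on any return with rewards in [0, 1]


def step(state, action):
    """Toy chain: action 1 moves right, action 0 moves left; reward only at the far end."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def greedy(q, state, rng):
    """Greedy action for table q, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in range(N_ACTIONS) if q[state][a] == best])


def q_update(q, s, a, r, s2, lr=0.5):
    """One off-policy Q-learning step on a stored transition."""
    q[s][a] += lr * (r + GAMMA * max(q[s2]) - q[s][a])


def train(episodes=200, batch=32, seed=0):
    rng = random.Random(seed)
    # One value table per loss: the true objective and an exploration objective.
    q_exploit = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    # Assumption: optimistic start so untried actions look attractive to the explorer.
    q_explore = [[OPTIMISTIC] * N_ACTIONS for _ in range(N_STATES)]
    visits = [0] * N_STATES
    buffer = []  # single replay buffer shared by both policies

    for ep in range(episodes):
        # The policies take turns generating transitions (here: alternate by episode).
        q_act = q_exploit if ep % 2 == 0 else q_explore
        s = 0
        for _ in range(HORIZON):
            a = greedy(q_act, s, rng)
            s2, r = step(s, a)
            visits[s2] += 1
            buffer.append((s, a, r, s2))
            s = s2

        # Every loss is optimised off-policy on the same sampled transitions.
        for s, a, r, s2 in rng.sample(buffer, min(batch, len(buffer))):
            q_update(q_exploit, s, a, r, s2)          # environment reward
            bonus = 1.0 / (visits[s2] ** 0.5)         # assumed count-based novelty bonus
            q_update(q_explore, s, a, bonus, s2)      # exploration reward
    return q_exploit, q_explore, buffer
```

Calling `train()` returns both tables and the shared buffer; the greedy policy for the true objective can then be read off with `[max(range(N_ACTIONS), key=lambda a: q_exploit[s][a]) for s in range(N_STATES)]`. The point of the design is that the exploitation table never sees the bonus, so its greedy policy is not distorted by exploration, yet it still learns from everything the exploration policy discovers.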


research
10/24/2022

MEET: A Monte Carlo Exploration-Exploitation Trade-off for Buffer Sampling

Data selection is essential for any data-based optimization technique, s...
research
04/06/2020

Intrinsic Exploration as Multi-Objective RL

Intrinsic motivation enables reinforcement learning (RL) agents to explo...
research
10/30/2019

RBED: Reward Based Epsilon Decay

ε-greedy is a policy used to balance exploration and exploitation in man...
research
03/22/2019

DQN with model-based exploration: efficient learning on environments with sparse rewards

We propose Deep Q-Networks (DQN) with model-based exploration, an algori...
research
05/20/2021

Don't Do What Doesn't Matter: Intrinsic Motivation with Action Usefulness

Sparse rewards are double-edged training signals in reinforcement learni...
research
01/20/2021

Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments

Exploration under sparse reward is a long-standing challenge of model-fr...
research
06/03/2014

Changing the Environment Based on Empowerment as Intrinsic Motivation

One aspect of intelligence is the ability to restructure your own enviro...
