RBED: Reward Based Epsilon Decay

10/30/2019
by Aakash Maroti, et al.

ε-greedy is a policy used to balance exploration and exploitation in many reinforcement learning settings. When the agent uses an on-policy algorithm to learn optimal behaviour, it makes sense for the agent to explore more initially and to exploit more as it approaches the target behaviour. This shift from heavy exploration to heavy exploitation can be represented as a decay in the ε value, where ε denotes how much an agent is allowed to explore. This paper proposes a new approach to ε decay in which the decay is driven by feedback from the environment. It presents one such approach, based on rewards, and compares it against standard exponential decay. In the environments tested, the new approach produces more consistent results that, on average, perform better.
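
The contrast between the two schedules can be illustrated with a minimal sketch. The abstract does not specify the decay rule, so everything below is an assumption made for illustration: the class name `RewardBasedDecay`, its parameters (`step`, `reward_threshold`, `reward_increment`), and the rule itself (lower ε only when an episode's reward reaches a threshold, then raise the threshold) are hypothetical and may differ from the method in the full text.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """ε-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # random (exploratory) action
    return max(range(len(q_values)), key=lambda a: q_values[a])  # greedy action


def exponential_decay(epsilon, rate=0.99, minimum=0.01):
    """Standard schedule: multiply epsilon by a fixed factor after every episode."""
    return max(minimum, epsilon * rate)


class RewardBasedDecay:
    """Hypothetical reward-driven schedule (an assumption, not the paper's exact rule).

    Epsilon is lowered only when the latest episode reward reaches the
    current threshold; the threshold is then raised so that the next
    decrease requires better performance.
    """

    def __init__(self, epsilon=1.0, minimum=0.01, step=0.05,
                 reward_threshold=0.0, reward_increment=1.0):
        self.epsilon = epsilon
        self.minimum = minimum
        self.step = step
        self.reward_threshold = reward_threshold
        self.reward_increment = reward_increment

    def update(self, episode_reward):
        """Call once per episode; returns the epsilon to use next."""
        if self.epsilon > self.minimum and episode_reward >= self.reward_threshold:
            self.epsilon = max(self.minimum, self.epsilon - self.step)
            self.reward_threshold += self.reward_increment
        return self.epsilon
```

Under this sketch, exponential decay reduces ε on a fixed timetable regardless of how the agent is doing, whereas the reward-driven schedule leaves ε unchanged until the environment's feedback shows that the agent has improved.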

research
07/19/2021

Decoupling Exploration and Exploitation in Reinforcement Learning

Intrinsic rewards are commonly applied to improve exploration in reinfor...
research
07/01/2019

MULEX: Disentangling Exploitation from Exploration in Deep RL

An agent learning through interactions should balance its action selecti...
research
10/24/2022

MEET: A Monte Carlo Exploration-Exploitation Trade-off for Buffer Sampling

Data selection is essential for any data-based optimization technique, s...
research
01/01/2020

Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning

Reinforcement learning with sparse rewards is still an open challenge. C...
research
02/09/2014

Recommandation mobile, sensible au contexte de contenus évolutifs: Contextuel-E-Greedy

We introduce in this paper an algorithm named Contextuel-E-Greedy that t...
research
09/06/2016

Q-Learning with Basic Emotions

Q-learning is a simple and powerful tool in solving dynamic problems whe...
research
04/27/2023

Exploring the flavor structure of quarks and leptons with reinforcement learning

We propose a method to explore the flavor structure of quarks and lepton...
