MEET: A Monte Carlo Exploration-Exploitation Trade-off for Buffer Sampling

10/24/2022
by Julius Ott, et al.

Data selection is essential for any data-based optimization technique, such as Reinforcement Learning. State-of-the-art sampling strategies for the experience replay buffer improve the performance of the Reinforcement Learning agent. However, they do not incorporate uncertainty in the Q-Value estimation. Consequently, they cannot adapt the sampling strategies, including exploration and exploitation of transitions, to the complexity of the task. To address this, this paper proposes a new sampling strategy that leverages the exploration-exploitation trade-off. This is enabled by the uncertainty estimation of the Q-Value function, which guides the sampling to explore more significant transitions and, thus, learn a more efficient policy. Experiments on classical control environments demonstrate stable results across various environments. They show that the proposed method outperforms state-of-the-art sampling strategies for dense rewards w.r.t. convergence and peak performance by 26
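
To make the idea of uncertainty-guided buffer sampling concrete, here is a minimal sketch. It is not the paper's actual priority formula, which the abstract does not give. It assumes that several Q-value estimates are available per stored transition (for example, from an ensemble or multiple heads), treats their mean as the exploitation term and their standard deviation as the exploration term, and turns the combined score into sampling probabilities with a softmax. The names `sampling_probabilities`, `sample_batch`, `explore_weight`, and `temperature` are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def sampling_probabilities(q_estimates, explore_weight=1.0, temperature=1.0):
    """Map per-transition Q-value estimates to sampling probabilities.

    q_estimates: one list of Q-value estimates per stored transition
    (assumed here to come from an ensemble; this is an illustration,
    not the formula used in the paper).
    """
    scores = []
    for estimates in q_estimates:
        mean_q = statistics.fmean(estimates)   # exploitation term
        std_q = statistics.pstdev(estimates)   # uncertainty (exploration) term
        scores.append(mean_q + explore_weight * std_q)

    # Numerically stable softmax over the combined scores.
    top = max(scores)
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_batch(q_estimates, batch_size, rng, **kwargs):
    """Draw transition indices (with replacement) from the buffer."""
    probs = sampling_probabilities(q_estimates, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


# Toy buffer with three transitions and three Q-value estimates each.
q_estimates = [
    [1.0, 1.0, 1.0],    # certain, moderate value
    [1.0, 3.0, -1.0],   # same mean, high uncertainty
    [0.0, 0.0, 0.0],    # certain, low value
]
probs = sampling_probabilities(q_estimates)
batch = sample_batch(q_estimates, batch_size=4, rng=random.Random(0))
```

In this toy buffer the second transition receives the highest sampling probability because its estimates disagree most. Setting `explore_weight=0.0` reduces the scheme to purely value-based (exploitative) sampling.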

research
04/23/2018

State Distribution-aware Sampling for Deep Q-learning

A critical and challenging problem in reinforcement learning is how to l...
research
07/01/2019

MULEX: Disentangling Exploitation from Exploration in Deep RL

An agent learning through interactions should balance its action selecti...
research
11/15/2017

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

We present a new algorithm that significantly improves the efficiency of...
research
07/22/2023

Balancing Exploration and Exploitation in Hierarchical Reinforcement Learning via Latent Landmark Graphs

Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) is a promis...
research
10/30/2019

RBED: Reward Based Epsilon Decay

ε-greedy is a policy used to balance exploration and exploitation in man...
research
04/15/2011

Polyethism in a colony of artificial ants

We explore self-organizing strategies for role assignment in a foraging ...
research
11/21/2019

Sample-Efficient Reinforcement Learning with Maximum Entropy Mellowmax Episodic Control

Deep networks have enabled reinforcement learning to scale to more compl...
