Reinforcement learning with experience replay and adaptation of action dispersion

07/30/2022
by Paweł Wawrzyński, et al.

Effective reinforcement learning requires a proper balance of exploration and exploitation, defined by the dispersion of the action distribution. However, this balance depends on the task, the current stage of the learning process, and the current environment state. Existing methods that designate the action distribution dispersion require problem-dependent hyperparameters. In this paper, we propose to designate the action distribution dispersion automatically, using the following principle: this distribution should have sufficient dispersion to enable the evaluation of future policies. To that end, the dispersion should be tuned to assure sufficiently high probability densities of the actions in the replay buffer and of the modes of the distributions that generated them, yet it should not be higher than that. This way, a policy can be effectively evaluated based on the actions in the buffer, while exploratory randomness in actions decreases as the policy converges. We verify this principle on the challenging benchmarks Ant, HalfCheetah, Hopper, and Walker2D, with good results. Our method makes the action standard deviations converge to values similar to those resulting from trial-and-error optimization.
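
To give a rough intuition for the principle in the abstract, the sketch below fits a per-dimension action standard deviation so that actions stored in a replay buffer have high density under the current policy. It is an illustration only, not the paper's algorithm: it assumes a diagonal Gaussian policy, uses plain maximum likelihood as a stand-in for "sufficiently high density", and omits the term concerning the modes of the distributions that generated the actions. The names `fit_action_std` and `mean_log_density`, and the synthetic data, are hypothetical.

```python
import numpy as np

def fit_action_std(buffer_actions, current_means, min_std=1e-3):
    """Per-dimension std that maximizes the Gaussian log-density of the
    buffered actions around the current policy's means (closed-form MLE).
    Assumption: diagonal Gaussian policy; min_std is an arbitrary floor."""
    buffer_actions = np.asarray(buffer_actions, dtype=float)
    current_means = np.asarray(current_means, dtype=float)
    sq_dev = (buffer_actions - current_means) ** 2
    return np.maximum(np.sqrt(sq_dev.mean(axis=0)), min_std)

def mean_log_density(buffer_actions, current_means, std):
    """Average log-density of buffered actions under N(current_means, std^2)."""
    z = (np.asarray(buffer_actions) - np.asarray(current_means)) / std
    per_dim = -0.5 * z ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)
    return float(np.mean(np.sum(per_dim, axis=1)))

# Synthetic replay buffer: actions drawn around old policy means with std 0.5.
rng = np.random.default_rng(0)
old_means = rng.normal(size=(1000, 2))
actions = old_means + 0.5 * rng.normal(size=(1000, 2))

# Current policy still far from the buffered behaviour: larger std is needed.
early_std = fit_action_std(actions, old_means + 1.0)
# Current policy close to the buffered behaviour: the fitted std shrinks.
late_std = fit_action_std(actions, old_means)

print("early std:", early_std)
print("late std: ", late_std)
```

In this toy setting the fitted dispersion is larger while the current policy's means differ from the buffered actions and shrinks as they agree, which mirrors the abstract's remark that exploratory randomness decreases when the policy converges.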

research
04/23/2018

State Distribution-aware Sampling for Deep Q-learning

A critical and challenging problem in reinforcement learning is how to l...
research
07/27/2022

Safe and Robust Experience Sharing for Deterministic Policy Gradient Algorithms

Learning in high dimensional continuous tasks is challenging, mainly whe...
research
06/20/2022

MASER: Multi-Agent Reinforcement Learning with Subgoals Generated from Experience Replay Buffer

In this paper, we consider cooperative multi-agent reinforcement learnin...
research
01/15/2020

Continuous-action Reinforcement Learning for Playing Racing Games: Comparing SPG to PPO

In this paper, a novel racing environment for OpenAI Gym is introduced. ...
research
10/28/2022

Using Contrastive Samples for Identifying and Leveraging Possible Causal Relationships in Reinforcement Learning

A significant challenge in reinforcement learning is quantifying the com...
research
03/15/2023

Replay Buffer With Local Forgetting for Adaptive Deep Model-Based Reinforcement Learning

One of the key behavioral characteristics used in neuroscience to determ...
research
06/08/2018

Fidelity-based Probabilistic Q-learning for Control of Quantum Systems

The balance between exploration and exploitation is a key problem for re...
