Prioritizing Starting States for Reinforcement Learning

11/27/2018
by Arash Tavakoli, et al.

Online, off-policy reinforcement learning algorithms can use an experience memory to store and replay past experiences. In prior work, this approach was used to stabilize training by breaking the temporal correlations of the updates and by avoiding the rapid forgetting of possibly rare experiences. In this work, we propose a conceptually simple framework that uses an experience memory to help exploration by prioritizing the states from which the agent starts acting in the environment. Importantly, the framework is also compatible with on-policy algorithms. Given the capacity to restart the agent in states corresponding to its past observations, we achieve this objective by (i) enabling the agent to restart in states belonging to significant past experiences (e.g., states near goals), and (ii) promoting faster coverage of the state space by starting from a more diverse set of states. Given a good measure of priority for identifying significant past transitions, we expect case (i) to help exploration considerably in certain problems (e.g., sparse-reward tasks), whereas we hypothesize that case (ii) will be beneficial in general, even without any prioritization. We show empirically that our approach improves learning performance for both off-policy and on-policy deep reinforcement learning methods, with the most notable improvement in a task with very sparse rewards.
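The restart idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `StartStateMemory`, the function `choose_start`, and the parameters `capacity`, `restart_prob`, and `priority` are hypothetical names introduced here, and the sketch assumes the environment can be reset to any stored state. Sampling with `prioritized=True` loosely corresponds to case (i), and uniform sampling to case (ii).

```python
import random
from collections import deque


class StartStateMemory:
    """Bounded memory of restorable states (hypothetical helper, not from the paper)."""

    def __init__(self, capacity=1000, seed=None):
        # Oldest entries are evicted first once capacity is reached.
        self.states = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, priority=1.0):
        # Assumption: priorities are positive, so every stored state stays reachable.
        if priority <= 0:
            raise ValueError("priority must be positive")
        self.states.append(state)
        self.priorities.append(priority)

    def sample(self, prioritized=True):
        if not self.states:
            return None  # nothing stored yet
        states = list(self.states)
        if prioritized:
            # Case (i): favor states from significant past experiences.
            weights = list(self.priorities)
            return self.rng.choices(states, weights=weights, k=1)[0]
        # Case (ii): uniform sampling, which only diversifies the starting states.
        return self.rng.choice(states)


def choose_start(memory, default_start, restart_prob=0.5, prioritized=True):
    """Pick an episode's starting state: from memory with probability
    restart_prob, otherwise the environment's default start."""
    if memory.rng.random() < restart_prob:
        state = memory.sample(prioritized)
        if state is not None:
            return state
    return default_start
```

In a training loop, the agent would call `choose_start` at the beginning of each episode and `add` for visited states, with the priority supplied by whatever significance measure is in use (for example, the magnitude of a temporal-difference error, which is an assumption here rather than a detail taken from the abstract).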

