State Distribution-aware Sampling for Deep Q-learning

04/23/2018
by Weichao Li, et al.

A critical and challenging problem in reinforcement learning is how to learn the state-action value function from the experience replay buffer while maintaining sample efficiency and converging quickly to a high-quality solution. In prior work, transitions are either sampled uniformly at random from the replay buffer or sampled according to a priority measured by temporal-difference (TD) error. However, these approaches do not fully account for the intrinsic distribution of transitions in the state space and can result in redundant, unnecessary TD updates, slowing the convergence of the learning procedure. To overcome this problem, we propose a novel state distribution-aware sampling method that balances the replay frequency of transitions drawn from a skewed state distribution, taking into account both the occurrence frequencies of transitions and the uncertainty of their state-action values. Consequently, our approach reduces unnecessary TD updates and devotes more updates to state-action values with higher uncertainty, making experience replay more effective and efficient. Extensive experiments on classic control tasks and Atari 2600 games on the OpenAI Gym platform demonstrate the effectiveness of our approach in comparison with the standard DQN baseline.
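The abstract describes sampling transitions according to both how often similar states occur and how uncertain their state-action values are. The snippet below is a minimal Python sketch of that idea, assuming a simple state-binning scheme and using the absolute TD error as a stand-in for value uncertainty; the class name, the `alpha` and `eps` parameters, and the priority rule are illustrative assumptions, not the paper's exact algorithm.

```python
from collections import defaultdict, deque

import numpy as np


class DistributionAwareReplayBuffer:
    """Sketch of a replay buffer whose sampling probability favors
    transitions from rarely visited states and transitions whose
    state-action values are still uncertain (proxied by |TD error|).

    Illustrative only: the state discretisation and the priority rule
    are assumptions, not the authors' exact method.
    """

    def __init__(self, capacity=10000, alpha=0.6, eps=1e-3):
        self.alpha = alpha                      # how strongly priorities shape sampling
        self.eps = eps                          # keeps every priority strictly positive
        self.buffer = deque(maxlen=capacity)
        self.state_counts = defaultdict(int)    # occurrence count per state bin

    def _state_key(self, state):
        # Hypothetical discretisation: round the state so that nearby states
        # fall into the same bin and share an occurrence count.
        return tuple(np.round(np.asarray(state, dtype=float), 1))

    def add(self, state, action, reward, next_state, done, td_error=1.0):
        key = self._state_key(state)
        self.state_counts[key] += 1
        self.buffer.append((state, action, reward, next_state, done, abs(td_error), key))

    def sample(self, batch_size):
        # Priority = (uncertainty proxy) / (state occurrence count): transitions
        # from over-represented states are replayed less, while rare states and
        # transitions with large TD error are replayed more.
        priorities = np.array([
            (td + self.eps) / self.state_counts[key]
            for (_, _, _, _, _, td, key) in self.buffer
        ]) ** self.alpha
        probs = priorities / priorities.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx]
```

In a DQN training loop, such a buffer would replace uniform sampling: transitions are added with their most recent TD error, and each minibatch is drawn with `sample(batch_size)` so that updates concentrate on under-represented states and uncertain value estimates.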

