Log In Sign Up

Generalized Data Distribution Iteration

by   Jiajun Fan, et al.

To obtain higher sample efficiency and superior final performance simultaneously has been one of the major challenges for deep reinforcement learning (DRL). Previous work could handle one of these challenges but typically failed to address them concurrently. In this paper, we try to tackle these two challenges simultaneously. To achieve this, we firstly decouple these challenges into two classic RL problems: data richness and exploration-exploitation trade-off. Then, we cast these two problems into the training data distribution optimization problem, namely to obtain desired training data within limited interactions, and address them concurrently via i) explicit modeling and control of the capacity and diversity of behavior policy and ii) more fine-grained and adaptive control of selective/sampling distribution of the behavior policy using a monotonic data distribution optimization. Finally, we integrate this process into Generalized Policy Iteration (GPI) and obtain a more general framework called Generalized Data Distribution Iteration (GDI). We use the GDI framework to introduce operator-based versions of well-known RL methods from DQN to Agent57. Theoretical guarantee of the superiority of GDI compared with GPI is concluded. We also demonstrate our state-of-the-art (SOTA) performance on Arcade Learning Environment (ALE), wherein our algorithm has achieved 9620.33 normalized score (HNS), 1146.39 records using only 200M training frames. Our performance is comparable to Agent57's while we consume 500 times less data. We argue that there is still a long way to go before obtaining real superhuman agents in ALE.


GDI: Rethinking What Makes Reinforcement Learning Different From Supervised Learning

Deep Q Network (DQN) firstly kicked the door of deep reinforcement learn...

Distance-Sensitive Offline Reinforcement Learning

In offline reinforcement learning (RL), one detrimental issue to policy ...

Off-Policy Policy Gradient Algorithms by Constraining the State Distribution Shift

Off-policy deep reinforcement learning (RL) algorithms are incapable of ...

QD-RL: Efficient Mixing of Quality and Diversity in Reinforcement Learning

We propose a novel reinforcement learning algorithm,QD-RL, that incorpor...

Evolutionary Reinforcement Learning

Deep Reinforcement Learning (DRL) algorithms have been successfully appl...

Reinforcement Learning-Based Automatic Berthing System

Previous studies on automatic berthing systems based on artificial neura...

Generalized Sampling in Julia

Generalized sampling is a numerically stable framework for obtaining rec...