
Generalized Data Distribution Iteration

06/07/2022
by Jiajun Fan, et al.

Achieving high sample efficiency and superior final performance simultaneously has been one of the major challenges in deep reinforcement learning (DRL). Previous work could handle one of these challenges but typically failed to address both concurrently. In this paper, we tackle the two challenges simultaneously. To achieve this, we first decouple them into two classic RL problems: data richness and the exploration-exploitation trade-off. We then cast these two problems into a training data distribution optimization problem, namely obtaining the desired training data within limited interactions, and address them concurrently via i) explicit modeling and control of the capacity and diversity of the behavior policy and ii) more fine-grained and adaptive control of the selective/sampling distribution of the behavior policy using a monotonic data distribution optimization. Finally, we integrate this process into Generalized Policy Iteration (GPI) and obtain a more general framework called Generalized Data Distribution Iteration (GDI). We use the GDI framework to introduce operator-based versions of well-known RL methods from DQN to Agent57. We provide a theoretical guarantee of the superiority of GDI over GPI. We also demonstrate state-of-the-art (SOTA) performance on the Arcade Learning Environment (ALE), wherein our algorithm has achieved 9620.33 normalized score (HNS), 1146.39 records using only 200M training frames. Our performance is comparable to Agent57's while we consume 500 times less data. We argue that there is still a long way to go before obtaining real superhuman agents in ALE.
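
The idea of adaptively controlling the sampling distribution over behavior policies can be illustrated with a toy sketch. This is not the authors' algorithm: the names `BehaviorSelector`, `toy_return`, and `run`, the softmax form of the distribution, the running-average update, and the four-policy toy environment are all assumptions made for illustration. The sketch only shows a loop in which a distribution over an indexed family of behavior policies is shifted toward indices that yield better data, where "better" is crudely approximated by episode return.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class BehaviorSelector:
    """Hypothetical sampling distribution over an indexed family of behavior policies."""

    def __init__(self, num_policies, step_size=0.1, temperature=1.0):
        self.scores = [0.0] * num_policies
        self.step_size = step_size
        self.temperature = temperature

    def distribution(self):
        return softmax(self.scores, self.temperature)

    def sample(self, rng):
        indices = range(len(self.scores))
        return rng.choices(indices, weights=self.distribution())[0]

    def update(self, index, episode_return):
        # Running average of the returns observed for this policy index.
        self.scores[index] += self.step_size * (episode_return - self.scores[index])


def toy_return(index, rng):
    """Stand-in for an environment rollout: policy index 2 is best by construction."""
    means = [0.1, 0.5, 1.0, 0.3]
    return means[index] + rng.gauss(0.0, 0.05)


def run(num_iterations=2000, seed=0):
    rng = random.Random(seed)
    selector = BehaviorSelector(num_policies=4, temperature=0.2)
    for _ in range(num_iterations):
        index = selector.sample(rng)          # choose which behavior policy collects data
        episode_return = toy_return(index, rng)
        selector.update(index, episode_return)  # shift the sampling distribution
        # A real agent would also run policy evaluation and improvement here.
    return selector


if __name__ == "__main__":
    print(run().distribution())
```

In this toy setting the distribution concentrates on the index with the highest mean return. A real system would replace `toy_return` with actual environment interaction and interleave the selector update with the usual policy evaluation and improvement steps.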

06/11/2021

GDI: Rethinking What Makes Reinforcement Learning Different From Supervised Learning

Deep Q Network (DQN) firstly kicked the door of deep reinforcement learn...
05/23/2022

Distance-Sensitive Offline Reinforcement Learning

In offline reinforcement learning (RL), one detrimental issue to policy ...
11/16/2019

Off-Policy Policy Gradient Algorithms by Constraining the State Distribution Shift

Off-policy deep reinforcement learning (RL) algorithms are incapable of ...
06/15/2020

QD-RL: Efficient Mixing of Quality and Diversity in Reinforcement Learning

We propose a novel reinforcement learning algorithm, QD-RL, that incorpor...
05/21/2018

Evolutionary Reinforcement Learning

Deep Reinforcement Learning (DRL) algorithms have been successfully appl...
12/03/2021

Reinforcement Learning-Based Automatic Berthing System

Previous studies on automatic berthing systems based on artificial neura...
07/14/2016

Generalized Sampling in Julia

Generalized sampling is a numerically stable framework for obtaining rec...