Striving for Simplicity in Off-policy Deep Reinforcement Learning

by Rishabh Agarwal, et al.

Reflecting on the advances of off-policy deep reinforcement learning (RL) algorithms since the development of DQN in 2013, it is important to ask: are the complexities of recent off-policy methods really necessary? To isolate the contributions of the various factors of variation in off-policy deep RL and to help design simpler algorithms, this paper investigates a set of related questions. First, can effective policies be learned given access only to logged offline experience? Second, how much of the benefit of recent distributional RL algorithms is attributable to improvements in exploration rather than exploitation? Third, can simpler off-policy RL algorithms outperform distributional RL without learning explicit distributions over returns? This paper uses a batch RL experimental setup on Atari 2600 games to investigate these questions. Unexpectedly, we find that batch RL algorithms trained solely on the logged experiences of a DQN agent are able to significantly outperform online DQN. Our experiments suggest that the benefits of distributional RL mainly stem from better exploitation. We present a simple and novel variant of ensemble Q-learning called Random Ensemble Mixture (REM), which enforces optimal Bellman consistency on random convex combinations of the Q-heads of a multi-head Q-network. The batch REM agent trained offline on DQN data outperforms the batch QR-DQN and online C51 algorithms.
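The REM idea in the abstract, Bellman consistency on a random convex combination of Q-heads, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the mixture weights are drawn or which loss is used, so the sampling scheme (normalized uniform draws), the function names, and the toy Q-values below are all assumptions made for illustration.

```python
import random


def sample_convex_weights(num_heads, rng):
    """Draw non-negative weights that sum to 1 (assumed scheme: normalized uniforms)."""
    raw = [rng.random() for _ in range(num_heads)]
    total = sum(raw)
    return [w / total for w in raw]


def mix_heads(q_heads, weights):
    """Convex combination of heads; q_heads[k][a] is head k's value for action a."""
    num_actions = len(q_heads[0])
    return [
        sum(w * head[a] for w, head in zip(weights, q_heads))
        for a in range(num_actions)
    ]


def rem_td_error(q_heads, next_q_heads, action, reward, done, gamma, weights):
    """TD error of the mixed Q-function on one logged transition.

    The same weights are applied to the current and next-state heads, so the
    Bellman optimality target is built from the same mixture being trained.
    """
    q_mix = mix_heads(q_heads, weights)
    next_q_mix = mix_heads(next_q_heads, weights)
    target = reward + (0.0 if done else gamma * max(next_q_mix))
    return target - q_mix[action]


# Hypothetical toy values: 3 heads, 2 actions, one logged transition.
rng = random.Random(0)
weights = sample_convex_weights(3, rng)
q_heads = [[1.0, 2.0], [0.5, 1.5], [2.0, 0.0]]
next_q_heads = [[0.0, 1.0], [1.0, 1.0], [3.0, 0.5]]
delta = rem_td_error(q_heads, next_q_heads, action=1, reward=1.0,
                     done=False, gamma=0.99, weights=weights)
```

In a training loop one would presumably resample the weights for each update and minimize a loss on `delta` over transitions drawn from the logged dataset; those details are likewise assumptions rather than something the abstract specifies.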




Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation

Most of the existing deep reinforcement learning (RL) approaches for ses...

Ensemble-based Offline-to-Online Reinforcement Learning: From Pessimistic Learning to Optimistic Exploration

Offline reinforcement learning (RL) is a learning paradigm where an agen...

Malaria Likelihood Prediction By Effectively Surveying Households Using Deep Reinforcement Learning

We build a deep reinforcement learning (RL) agent that can predict the l...

Bayesian Distributional Policy Gradients

Distributional Reinforcement Learning (RL) maintains the entire probabil...

Dirichlet policies for reinforced factor portfolios

This article aims to combine factor investing and reinforcement learning...

Efficient Reservoir Management through Deep Reinforcement Learning

Dams impact downstream river dynamics through flow regulation and disrup...

Bag of Policies for Distributional Deep Exploration

Efficient exploration in complex environments remains a major challenge ...
