
Discovering Reinforcement Learning Algorithms
Reinforcement learning (RL) algorithms update an agent's parameters acco...
Meta-Gradient Reinforcement Learning with an Objective Discovered Online
Deep reinforcement learning includes a broad family of algorithms that p...
What Can Learned Intrinsic Rewards Capture?
Reinforcement learning agents can include different components, such as ...
Hindsight Credit Assignment
We consider the problem of efficient credit assignment in reinforcement ...
Conditional Importance Sampling for Off-Policy Learning
The principal contribution of this paper is a conceptual framework for o...
Discovery of Useful Questions as Auxiliary Tasks
Arguably, intelligent agents ought to be able to discover their own ques...
Behaviour Suite for Reinforcement Learning
This paper introduces the Behaviour Suite for Reinforcement Learning, or...
General non-linear Bellman equations
We consider a general class of non-linear Bellman equations. These open ...
On Inductive Biases in Deep Reinforcement Learning
Many deep reinforcement learning algorithms contain inductive biases tha...
When to use parametric models in reinforcement learning?
We examine the question of when and how parametric models are most usefu...
Meta-learning of Sequential Strategies
In this report we review memory-based meta-learning as a tool for buildi...
Universal Successor Features Approximators
The ability of a reinforcement learning (RL) agent to learn about many r...
Deep Reinforcement Learning and the Deadly Triad
We know from reinforcement learning theory that temporal difference lear...
The Barbados 2018 List of Open Issues in Continual Learning
We want to make progress toward artificial general intelligence, namely ...
Multi-task Deep Reinforcement Learning with PopArt
The reinforcement learning community has made great strides in designing...
Observe and Look Further: Achieving Consistent Performance on Atari
Despite significant advances in the field of deep Reinforcement Learning...
Meta-Gradient Reinforcement Learning
The goal of reinforcement learning algorithms is to estimate and/or opti...
Distributed Prioritized Experience Replay
We propose a distributed architecture for deep reinforcement learning at...
Unicorn: Continual Learning with a Universal, Off-policy Agent
Some real-world domains are best characterized as a single task, but for...
Rainbow: Combining Improvements in Deep Reinforcement Learning
The deep reinforcement learning community has made several independent i...
StarCraft II: A New Challenge for Reinforcement Learning
This paper introduces SC2LE (StarCraft II Learning Environment), a reinf...
The Predictron: End-To-End Learning and Planning
One of the key challenges of artificial intelligence is to learn models ...
Learning values across many orders of magnitude
Most learning algorithms are not invariant to the scale of the function ...
Deep Reinforcement Learning in Large Discrete Action Spaces
Being able to reason in an environment with a large number of discrete a...
Estimating the Maximum Expected Value: An Analysis of (Nested) Cross Validation and the Maximum Sample Average
We investigate the accuracy of the two most common estimators for the ma...
Hado van Hasselt
