
Monte-Carlo Tree Search as Regularized Policy Optimization
The combination of Monte-Carlo tree search (MCTS) with deep reinforcemen...

Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-su...

Navigating the Landscape of Games
Games are traditionally recognized as one of the key testbeds underlying...

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning
Learning a good representation is an essential component for deep reinfo...

Leverage the Average: an Analysis of Regularization in RL
Building upon the formalism of regularized Markov decision processes, we...

Taylor Expansion Policy Optimization
In this work, we investigate the application of Taylor expansions in rei...

From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization
In this paper we investigate the Follow the Regularized Leader dynamics ...

Hindsight Credit Assignment
We consider the problem of efficient credit assignment in reinforcement ...

Conditional Importance Sampling for Off-Policy Learning
The principal contribution of this paper is a conceptual framework for o...

Adaptive Trade-Offs in Off-Policy Learning
A great variety of off-policy learning algorithms exist in the literatur...

A Generalized Training Approach for Multiagent Learning
This paper investigates a population-based training regime based on game...

Multiagent Evaluation under Incomplete Information
This paper investigates the evaluation of learned multiagent strategies ...

Neural Replicator Dynamics
In multiagent learning, agents interact in inherently nonstationary envi...

α-Rank: Multi-Agent Evaluation by Evolution
We introduce α-Rank, a principled evolutionary dynamics methodology, for...

The Termination Critic
In this work, we consider the problem of autonomously discovering behavi...

Statistics and Samples in Distributional Reinforcement Learning
We present a unifying framework for designing and analysing distribution...

World Discovery Models
As humans we are driven by a strong desire for seeking novelty in our wo...

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement
The ability to transfer skills across tasks has the potential to scale u...

Optimistic optimization of a Brownian
We address the problem of optimizing a Brownian motion. We consider a (r...

Universal Successor Features Approximators
The ability of a reinforcement learning (RL) agent to learn about many r...

Neural Predictive Belief Representations
Unsupervised representation learning has succeeded with excellent result...

Actor-Critic Policy Optimization in Partially Observable Multiagent Environments
Optimization of parameterized policies for reinforcement learning (RL) i...

Autoregressive Quantile Networks for Generative Modeling
We introduce autoregressive implicit quantile networks (AIQN), a fundame...

Implicit Quantile Networks for Distributional Reinforcement Learning
In this work, we build on recent advances in distributional reinforcemen...

Maximum a Posteriori Policy Optimisation
We introduce a new algorithm for reinforcement learning called Maximum a...

Observe and Look Further: Achieving Consistent Performance on Atari
Despite significant advances in the field of deep Reinforcement Learning...

Low-pass Recurrent Neural Networks - A memory architecture for longer-term correlation discovery
Reinforcement learning (RL) agents performing complex tasks must be able...

A Study on Overfitting in Deep Reinforcement Learning
Recent years have witnessed significant progresses in deep Reinforcement...

An Analysis of Categorical Distributional Reinforcement Learning
Distributional approaches to value-based reinforcement learning model th...

Learning to Search with MCTSnets
Planning problems are among the most important and well-studied problems...

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
In this work we aim to solve a large collection of tasks using a single ...

Distributional Reinforcement Learning with Quantile Regression
In reinforcement learning an agent interacts with the environment by tak...

The Uncertainty Bellman Equation and Exploration
We consider the exploration/exploitation problem in reinforcement learni...

A Distributional Perspective on Reinforcement Learning
In this paper we argue for the fundamental importance of the value distr...

Noisy Networks for Exploration
We introduce NoisyNet, a deep reinforcement learning agent with parametr...

Observational Learning by Reinforcement Learning
Observational learning is a type of learning that occurs as a function o...

The Cramér Distance as a Solution to Biased Wasserstein Gradients
The Wasserstein probability metric has received much attention from the ...

The Reactor: A Sample-Efficient Actor-Critic Architecture
In this work we present a new reinforcement learning agent, called React...

Automated Curriculum Learning for Neural Networks
We introduce a method for automatically selecting the path, or syllabus,...

Minimax Regret Bounds for Reinforcement Learning
We consider the problem of provably optimal exploration in reinforcement...

Learning to reinforcement learn
In recent years deep reinforcement learning (RL) systems have attained s...

Combining policy gradient and Q-learning
Policy gradient is an efficient technique for improving a policy in a re...

Successor Features for Transfer in Reinforcement Learning
Transfer in reinforcement learning refers to the notion that generalizat...

Memory-Efficient Backpropagation Through Time
We propose a novel approach to reduce memory consumption of the backprop...

Safe and Efficient Off-Policy Reinforcement Learning
In this work, we take a fresh look at some old and new algorithms for of...

Unifying Count-Based Exploration and Intrinsic Motivation
We consider an agent's uncertainty about its environment and the problem...

Q(λ) with Off-Policy Corrections
We propose and analyze an alternate approach to off-policy multi-step te...

Increasing the Action Gap: New Operators for Reinforcement Learning
This paper introduces new optimality-preserving operators on Q-functions...

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis
We consider the off-policy evaluation problem in Markov decision process...

Active Regression by Stratification
We propose a new active learning algorithm for parametric linear regress...