
Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall
We study the problem of learning a Nash equilibrium (NE) in an imperfect...

Taylor Expansion of Discount Factors
In practical reinforcement learning (RL), the discount factor used for e...

Concave Utility Reinforcement Learning: the Mean-field Game viewpoint
Concave Utility Reinforcement Learning (CURL) extends RL from linear to ...

Revisiting Peng's Q(λ) for Modern Reinforcement Learning
Off-policy multi-step reinforcement learning algorithms consist of conse...

Bootstrapped Representation Learning on Graphs
Current state-of-the-art self-supervised learning methods for graph neur...

Geometric Entropic Exploration
Exploration is essential for solving complex Reinforcement Learning (RL)...

Counterfactual Credit Assignment in Model-Free Reinforcement Learning
Credit assignment in reinforcement learning is the problem of measuring ...

Game Plan: What AI can do for Football, and What Football can do for AI
The rapid progress in artificial intelligence (AI) and machine learning ...

The Advantage Regret-Matching Actor-Critic
Regret minimization has played a key role in online learning, equilibriu...

Monte-Carlo Tree Search as Regularized Policy Optimization
The combination of Monte-Carlo tree search (MCTS) with deep reinforcemen...

Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-su...

Navigating the Landscape of Multiplayer Games to Probe the Drosophila of AI
Multiplayer games have a long history in being used as key testbeds for ...

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning
Learning a good representation is an essential component for deep reinfo...

Leverage the Average: an Analysis of Regularization in RL
Building upon the formalism of regularized Markov decision processes, we...

Taylor Expansion Policy Optimization
In this work, we investigate the application of Taylor expansions in rei...

From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization
In this paper we investigate the Follow the Regularized Leader dynamics ...

Hindsight Credit Assignment
We consider the problem of efficient credit assignment in reinforcement ...

Conditional Importance Sampling for Off-Policy Learning
The principal contribution of this paper is a conceptual framework for o...

Adaptive Trade-Offs in Off-Policy Learning
A great variety of off-policy learning algorithms exist in the literatur...

A Generalized Training Approach for Multiagent Learning
This paper investigates a population-based training regime based on game...

Multiagent Evaluation under Incomplete Information
This paper investigates the evaluation of learned multiagent strategies ...

Neural Replicator Dynamics
In multiagent learning, agents interact in inherently nonstationary envi...

α-Rank: Multi-Agent Evaluation by Evolution
We introduce α-Rank, a principled evolutionary dynamics methodology, for...

The Termination Critic
In this work, we consider the problem of autonomously discovering behavi...

Statistics and Samples in Distributional Reinforcement Learning
We present a unifying framework for designing and analysing distribution...

World Discovery Models
As humans we are driven by a strong desire for seeking novelty in our wo...

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement
The ability to transfer skills across tasks has the potential to scale u...

Optimistic optimization of a Brownian
We address the problem of optimizing a Brownian motion. We consider a (r...

Universal Successor Features Approximators
The ability of a reinforcement learning (RL) agent to learn about many r...

Neural Predictive Belief Representations
Unsupervised representation learning has succeeded with excellent result...

Actor-Critic Policy Optimization in Partially Observable Multiagent Environments
Optimization of parameterized policies for reinforcement learning (RL) i...

Autoregressive Quantile Networks for Generative Modeling
We introduce autoregressive implicit quantile networks (AIQN), a fundame...

Implicit Quantile Networks for Distributional Reinforcement Learning
In this work, we build on recent advances in distributional reinforcemen...

Maximum a Posteriori Policy Optimisation
We introduce a new algorithm for reinforcement learning called Maximum a...

Observe and Look Further: Achieving Consistent Performance on Atari
Despite significant advances in the field of deep Reinforcement Learning...

Low-pass Recurrent Neural Networks - A memory architecture for longer-term correlation discovery
Reinforcement learning (RL) agents performing complex tasks must be able...

A Study on Overfitting in Deep Reinforcement Learning
Recent years have witnessed significant progresses in deep Reinforcement...

An Analysis of Categorical Distributional Reinforcement Learning
Distributional approaches to value-based reinforcement learning model th...

Learning to Search with MCTSnets
Planning problems are among the most important and well-studied problems...

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
In this work we aim to solve a large collection of tasks using a single ...

Distributional Reinforcement Learning with Quantile Regression
In reinforcement learning an agent interacts with the environment by tak...

The Uncertainty Bellman Equation and Exploration
We consider the exploration/exploitation problem in reinforcement learni...

A Distributional Perspective on Reinforcement Learning
In this paper we argue for the fundamental importance of the value distr...

Noisy Networks for Exploration
We introduce NoisyNet, a deep reinforcement learning agent with parametr...

Observational Learning by Reinforcement Learning
Observational learning is a type of learning that occurs as a function o...

The Cramer Distance as a Solution to Biased Wasserstein Gradients
The Wasserstein probability metric has received much attention from the ...

The Reactor: A Sample-Efficient Actor-Critic Architecture
In this work we present a new reinforcement learning agent, called React...

Automated Curriculum Learning for Neural Networks
We introduce a method for automatically selecting the path, or syllabus,...

Minimax Regret Bounds for Reinforcement Learning
We consider the problem of provably optimal exploration in reinforcement...

Learning to reinforcement learn
In recent years deep reinforcement learning (RL) systems have attained s...