
Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall
We study the problem of learning a Nash equilibrium (NE) in an imperfect...
Taylor Expansion of Discount Factors
In practical reinforcement learning (RL), the discount factor used for e...
Concave Utility Reinforcement Learning: the Mean-field Game viewpoint
Concave Utility Reinforcement Learning (CURL) extends RL from linear to ...
Revisiting Peng's Q(λ) for Modern Reinforcement Learning
Off-policy multi-step reinforcement learning algorithms consist of conse...
Bootstrapped Representation Learning on Graphs
Current state-of-the-art self-supervised learning methods for graph neur...
Geometric Entropic Exploration
Exploration is essential for solving complex Reinforcement Learning (RL)...
Counterfactual Credit Assignment in Model-Free Reinforcement Learning
Credit assignment in reinforcement learning is the problem of measuring ...
Game Plan: What AI can do for Football, and What Football can do for AI
The rapid progress in artificial intelligence (AI) and machine learning ...
The Advantage Regret-Matching Actor-Critic
Regret minimization has played a key role in online learning, equilibriu...
Monte-Carlo Tree Search as Regularized Policy Optimization
The combination of Monte-Carlo tree search (MCTS) with deep reinforcemen...
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-su...
Navigating the Landscape of Multiplayer Games to Probe the Drosophila of AI
Multiplayer games have a long history in being used as key testbeds for ...
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning
Learning a good representation is an essential component for deep reinfo...
Leverage the Average: an Analysis of Regularization in RL
Building upon the formalism of regularized Markov decision processes, we...
Taylor Expansion Policy Optimization
In this work, we investigate the application of Taylor expansions in rei...
From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization
In this paper we investigate the Follow the Regularized Leader dynamics ...
Hindsight Credit Assignment
We consider the problem of efficient credit assignment in reinforcement ...
Conditional Importance Sampling for Off-Policy Learning
The principal contribution of this paper is a conceptual framework for o...
Adaptive Trade-Offs in Off-Policy Learning
A great variety of off-policy learning algorithms exist in the literatur...
A Generalized Training Approach for Multiagent Learning
This paper investigates a population-based training regime based on game...
Multiagent Evaluation under Incomplete Information
This paper investigates the evaluation of learned multiagent strategies ...
Neural Replicator Dynamics
In multiagent learning, agents interact in inherently nonstationary envi...
α-Rank: Multi-Agent Evaluation by Evolution
We introduce α-Rank, a principled evolutionary dynamics methodology, for...
The Termination Critic
In this work, we consider the problem of autonomously discovering behavi...
Statistics and Samples in Distributional Reinforcement Learning
We present a unifying framework for designing and analysing distribution...
World Discovery Models
As humans we are driven by a strong desire for seeking novelty in our wo...
Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement
The ability to transfer skills across tasks has the potential to scale u...
Optimistic optimization of a Brownian
We address the problem of optimizing a Brownian motion. We consider a (r...
Universal Successor Features Approximators
The ability of a reinforcement learning (RL) agent to learn about many r...
Neural Predictive Belief Representations
Unsupervised representation learning has succeeded with excellent result...
Actor-Critic Policy Optimization in Partially Observable Multiagent Environments
Optimization of parameterized policies for reinforcement learning (RL) i...
Autoregressive Quantile Networks for Generative Modeling
We introduce autoregressive implicit quantile networks (AIQN), a fundame...
Implicit Quantile Networks for Distributional Reinforcement Learning
In this work, we build on recent advances in distributional reinforcemen...
Maximum a Posteriori Policy Optimisation
We introduce a new algorithm for reinforcement learning called Maximum a...
Observe and Look Further: Achieving Consistent Performance on Atari
Despite significant advances in the field of deep Reinforcement Learning...
Low-pass Recurrent Neural Networks - A memory architecture for longer-term correlation discovery
Reinforcement learning (RL) agents performing complex tasks must be able...
A Study on Overfitting in Deep Reinforcement Learning
Recent years have witnessed significant progresses in deep Reinforcement...
An Analysis of Categorical Distributional Reinforcement Learning
Distributional approaches to value-based reinforcement learning model th...
Learning to Search with MCTSnets
Planning problems are among the most important and well-studied problems...
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
In this work we aim to solve a large collection of tasks using a single ...
Distributional Reinforcement Learning with Quantile Regression
In reinforcement learning an agent interacts with the environment by tak...
The Uncertainty Bellman Equation and Exploration
We consider the exploration/exploitation problem in reinforcement learni...
A Distributional Perspective on Reinforcement Learning
In this paper we argue for the fundamental importance of the value distr...
Noisy Networks for Exploration
We introduce NoisyNet, a deep reinforcement learning agent with parametr...
Observational Learning by Reinforcement Learning
Observational learning is a type of learning that occurs as a function o...
The Cramer Distance as a Solution to Biased Wasserstein Gradients
The Wasserstein probability metric has received much attention from the ...
The Reactor: A Sample-Efficient Actor-Critic Architecture
In this work we present a new reinforcement learning agent, called React...
Automated Curriculum Learning for Neural Networks
We introduce a method for automatically selecting the path, or syllabus,...
Minimax Regret Bounds for Reinforcement Learning
We consider the problem of provably optimal exploration in reinforcement...
Learning to reinforcement learn
In recent years deep reinforcement learning (RL) systems have attained s...
