
Taylor Expansion of Discount Factors
In practical reinforcement learning (RL), the discount factor used for e...
Revisiting Peng's Q(λ) for Modern Reinforcement Learning
Off-policy multi-step reinforcement learning algorithms consist of conse...
On The Effect of Auxiliary Tasks on Representation Dynamics
While auxiliary tasks play a key role in shaping the representations lea...
Game Plan: What AI can do for Football, and What Football can do for AI
The rapid progress in artificial intelligence (AI) and machine learning ...
Revisiting Fundamentals of Experience Replay
Experience replay is central to off-policy algorithms in deep reinforcem...
Navigating the Landscape of Multiplayer Games to Probe the Drosophila of AI
Multiplayer games have a long history in being used as key testbeds for ...
From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization
In this paper we investigate the Follow the Regularized Leader dynamics ...
Conditional Importance Sampling for Off-Policy Learning
The principal contribution of this paper is a conceptual framework for o...
Adaptive Trade-Offs in Off-Policy Learning
A great variety of off-policy learning algorithms exist in the literatur...
A Generalized Training Approach for Multiagent Learning
This paper investigates a population-based training regime based on game...
Multiagent Evaluation under Incomplete Information
This paper investigates the evaluation of learned multiagent strategies ...
Meta-learning of Sequential Strategies
In this report we review memory-based meta-learning as a tool for buildi...
Orthogonal Estimation of Wasserstein Distances
Wasserstein distances are increasingly used in a wide variety of applica...
α-Rank: Multi-Agent Evaluation by Evolution
We introduce α-Rank, a principled evolutionary dynamics methodology, for...
Statistics and Samples in Distributional Reinforcement Learning
We present a unifying framework for designing and analysing distribution...
Antithetic and Monte Carlo kernel estimators for partial rankings
In the modern age, rankings data is ubiquitous and it is useful for a va...
Gaussian Process Behaviour in Wide Deep Neural Networks
Whilst deep neural networks have shown great empirical success, there is...
Structured Evolution with Compact Architectures for Scalable Policy Optimization
We present a new method of blackbox optimization via gradient approximat...
An Analysis of Categorical Distributional Reinforcement Learning
Distributional approaches to value-based reinforcement learning model th...
Distributional Reinforcement Learning with Quantile Regression
In reinforcement learning an agent interacts with the environment by tak...
The Unreasonable Effectiveness of Structured Random Orthogonal Embeddings
We examine a class of embeddings based on structured random matrices wit...
Magnetic Hamiltonian Monte Carlo
Hamiltonian Monte Carlo (HMC) exploits Hamiltonian dynamics to construct...
Black-box α-divergence Minimization
Black-box alpha (BB-α) is a new approximate inference method based on th...
