
Statistics and Samples in Distributional Reinforcement Learning
We present a unifying framework for designing and analysing distribution...

Taylor Expansion Policy Optimization
In this work, we investigate the application of Taylor expansions in rei...

From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization
In this paper we investigate the Follow the Regularized Leader dynamics ...

Multiagent Evaluation under Incomplete Information
This paper investigates the evaluation of learned multiagent strategies ...

A Generalized Training Approach for Multiagent Learning
This paper investigates a population-based training regime based on game...

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement
The ability to transfer skills across tasks has the potential to scale u...

Neural Replicator Dynamics
In multiagent learning, agents interact in inherently nonstationary envi...

Conditional Importance Sampling for Off-Policy Learning
The principal contribution of this paper is a conceptual framework for o...

The Termination Critic
In this work, we consider the problem of autonomously discovering behavi...

Actor-Critic Policy Optimization in Partially Observable Multiagent Environments
Optimization of parameterized policies for reinforcement learning (RL) i...

Universal Successor Features Approximators
The ability of a reinforcement learning (RL) agent to learn about many r...

Automated Curriculum Learning for Neural Networks
We introduce a method for automatically selecting the path, or syllabus,...

Distributional Reinforcement Learning with Quantile Regression
In reinforcement learning an agent interacts with the environment by tak...

The Uncertainty Bellman Equation and Exploration
We consider the exploration/exploitation problem in reinforcement learni...

Memory-Efficient Backpropagation Through Time
We propose a novel approach to reduce memory consumption of the backprop...

A Distributional Perspective on Reinforcement Learning
In this paper we argue for the fundamental importance of the value distr...

The Reactor: A Sample-Efficient Actor-Critic Architecture
In this work we present a new reinforcement learning agent, called React...

Noisy Networks for Exploration
We introduce NoisyNet, a deep reinforcement learning agent with parametr...

Observational Learning by Reinforcement Learning
Observational learning is a type of learning that occurs as a function o...

The Cramer Distance as a Solution to Biased Wasserstein Gradients
The Wasserstein probability metric has received much attention from the ...

Minimax Regret Bounds for Reinforcement Learning
We consider the problem of provably optimal exploration in reinforcement...

Successor Features for Transfer in Reinforcement Learning
Transfer in reinforcement learning refers to the notion that generalizat...

Unifying Count-Based Exploration and Intrinsic Motivation
We consider an agent's uncertainty about its environment and the problem...

Learning to reinforcement learn
In recent years deep reinforcement learning (RL) systems have attained s...

Combining policy gradient and Q-learning
Policy gradient is an efficient technique for improving a policy in a re...

Increasing the Action Gap: New Operators for Reinforcement Learning
This paper introduces new optimality-preserving operators on Q-functions...

Safe and Efficient Off-Policy Reinforcement Learning
In this work, we take a fresh look at some old and new algorithms for of...

On Minimax Optimal Offline Policy Evaluation
This paper studies the off-policy evaluation problem, where one aims to ...

Bandit Algorithms for Tree Search
Bandit-based methods for tree search have recently gained popularity whe...

Q(λ) with Off-Policy Corrections
We propose and analyze an alternate approach to off-policy multi-step te...

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis
We consider the off-policy evaluation problem in Markov decision process...

Active Regression by Stratification
We propose a new active learning algorithm for parametric linear regress...

Finite-Time Analysis of Kernelised Contextual Bandits
We tackle the problem of online reward maximisation over a large finite ...

Thompson Sampling for 1-Dimensional Exponential Family Bandits
Thompson Sampling has been demonstrated in many complex bandit models, h...

Fast gradient descent for drifting least squares regression, with application to bandits
Online learning algorithms often need to recompute least squares regr...

Stochastic approximation for speeding up LSTD (and LSPI)
We propose a stochastic approximation (SA) based method with randomizati...

Toward Optimal Stratification for Stratified Monte-Carlo Integration
We consider the problem of adaptive stratified sampling for Monte Carlo ...

Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions
We consider the problem of adaptive stratified sampling for Monte Carlo ...

Regret Bounds for Restless Markov Bandits
We consider the restless Markov bandit problem, in which the state of ea...

On the Sample Complexity of Reinforcement Learning with a Generative Model
We consider the problem of learning the optimal action-value function in...

Thompson Sampling: An Asymptotically Optimal Finite Time Analysis
The question of the optimality of Thompson Sampling for solving the stoc...

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
In this work we aim to solve a large collection of tasks using a single ...

An Analysis of Categorical Distributional Reinforcement Learning
Distributional approaches to value-based reinforcement learning model th...

Learning to Search with MCTSnets
Planning problems are among the most important and well-studied problems...

A Study on Overfitting in Deep Reinforcement Learning
Recent years have witnessed significant progress in deep Reinforcement...

Low-pass Recurrent Neural Networks - A memory architecture for longer-term correlation discovery
Reinforcement learning (RL) agents performing complex tasks must be able...

Observe and Look Further: Achieving Consistent Performance on Atari
Despite significant advances in the field of deep Reinforcement Learning...

Autoregressive Quantile Networks for Generative Modeling
We introduce autoregressive implicit quantile networks (AIQN), a fundame...

Implicit Quantile Networks for Distributional Reinforcement Learning
In this work, we build on recent advances in distributional reinforcemen...

Maximum a Posteriori Policy Optimisation
We introduce a new algorithm for reinforcement learning called Maximum a...