
Statistics and Samples in Distributional Reinforcement Learning
We present a unifying framework for designing and analysing distribution...
02/21/2019 ∙ by Mark Rowland, et al. ∙ 64

Multiagent Evaluation under Incomplete Information
This paper investigates the evaluation of learned multiagent strategies ...
09/21/2019 ∙ by Mark Rowland, et al. ∙ 24

A Generalized Training Approach for Multiagent Learning
This paper investigates a population-based training regime based on game...
09/27/2019 ∙ by Paul Müller, et al. ∙ 20

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement
The ability to transfer skills across tasks has the potential to scale u...
01/30/2019 ∙ by Andre Barreto, et al. ∙ 12

Neural Replicator Dynamics
In multiagent learning, agents interact in inherently nonstationary envi...
06/01/2019 ∙ by Shayegan Omidshafiei, et al. ∙ 12

Conditional Importance Sampling for Off-Policy Learning
The principal contribution of this paper is a conceptual framework for o...
10/16/2019 ∙ by Mark Rowland, et al. ∙ 12

The Termination Critic
In this work, we consider the problem of autonomously discovering behavi...
02/26/2019 ∙ by Anna Harutyunyan, et al. ∙ 10

Actor-Critic Policy Optimization in Partially Observable Multiagent Environments
Optimization of parameterized policies for reinforcement learning (RL) i...
10/21/2018 ∙ by Sriram Srinivasan, et al. ∙ 8

Universal Successor Features Approximators
The ability of a reinforcement learning (RL) agent to learn about many r...
12/18/2018 ∙ by Diana Borsa, et al. ∙ 6

Automated Curriculum Learning for Neural Networks
We introduce a method for automatically selecting the path, or syllabus,...
04/10/2017 ∙ by Alex Graves, et al. ∙ 0

Distributional Reinforcement Learning with Quantile Regression
In reinforcement learning an agent interacts with the environment by tak...
10/27/2017 ∙ by Will Dabney, et al. ∙ 0

The Uncertainty Bellman Equation and Exploration
We consider the exploration/exploitation problem in reinforcement learni...
09/15/2017 ∙ by Brendan O'Donoghue, et al. ∙ 0

Memory-Efficient Backpropagation Through Time
We propose a novel approach to reduce memory consumption of the backprop...
06/10/2016 ∙ by Audrūnas Gruslys, et al. ∙ 0

A Distributional Perspective on Reinforcement Learning
In this paper we argue for the fundamental importance of the value distr...
07/21/2017 ∙ by Marc G. Bellemare, et al. ∙ 0

The Reactor: A Sample-Efficient Actor-Critic Architecture
In this work we present a new reinforcement learning agent, called React...
04/15/2017 ∙ by Audrūnas Gruslys, et al. ∙ 0

Noisy Networks for Exploration
We introduce NoisyNet, a deep reinforcement learning agent with parametr...
06/30/2017 ∙ by Meire Fortunato, et al. ∙ 0

Observational Learning by Reinforcement Learning
Observational learning is a type of learning that occurs as a function o...
06/20/2017 ∙ by Diana Borsa, et al. ∙ 0

The Cramer Distance as a Solution to Biased Wasserstein Gradients
The Wasserstein probability metric has received much attention from the ...
05/30/2017 ∙ by Marc G. Bellemare, et al. ∙ 0

Minimax Regret Bounds for Reinforcement Learning
We consider the problem of provably optimal exploration in reinforcement...
03/16/2017 ∙ by Mohammad Gheshlaghi Azar, et al. ∙ 0

Successor Features for Transfer in Reinforcement Learning
Transfer in reinforcement learning refers to the notion that generalizat...
06/16/2016 ∙ by Andre Barreto, et al. ∙ 0

Unifying Count-Based Exploration and Intrinsic Motivation
We consider an agent's uncertainty about its environment and the problem...
06/06/2016 ∙ by Marc G. Bellemare, et al. ∙ 0

Learning to reinforcement learn
In recent years deep reinforcement learning (RL) systems have attained s...
11/17/2016 ∙ by Jane X Wang, et al. ∙ 0

Combining policy gradient and Q-learning
Policy gradient is an efficient technique for improving a policy in a re...
11/05/2016 ∙ by Brendan O'Donoghue, et al. ∙ 0

Increasing the Action Gap: New Operators for Reinforcement Learning
This paper introduces new optimality-preserving operators on Q-functions...
12/15/2015 ∙ by Marc G. Bellemare, et al. ∙ 0

Safe and Efficient Off-Policy Reinforcement Learning
In this work, we take a fresh look at some old and new algorithms for of...
06/08/2016 ∙ by Remi Munos, et al. ∙ 0

On Minimax Optimal Offline Policy Evaluation
This paper studies the off-policy evaluation problem, where one aims to ...
09/12/2014 ∙ by Lihong Li, et al. ∙ 0

Bandit Algorithms for Tree Search
Bandit based methods for tree search have recently gained popularity whe...
08/09/2014 ∙ by Pierre-Arnaud Coquelin, et al. ∙ 0

Q(λ) with Off-Policy Corrections
We propose and analyze an alternate approach to off-policy multi-step te...
02/16/2016 ∙ by Anna Harutyunyan, et al. ∙ 0

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis
We consider the off-policy evaluation problem in Markov decision process...
09/17/2015 ∙ by Assaf Hallak, et al. ∙ 0

Active Regression by Stratification
We propose a new active learning algorithm for parametric linear regress...
10/22/2014 ∙ by Sivan Sabato, et al. ∙ 0

Finite-Time Analysis of Kernelised Contextual Bandits
We tackle the problem of online reward maximisation over a large finite ...
09/26/2013 ∙ by Michal Valko, et al. ∙ 0

Thompson Sampling for 1-Dimensional Exponential Family Bandits
Thompson Sampling has been demonstrated in many complex bandit models, h...
07/12/2013 ∙ by Nathaniel Korda, et al. ∙ 0

Fast gradient descent for drifting least squares regression, with application to bandits
Online learning algorithms require to often recompute least squares regr...
07/11/2013 ∙ by Nathaniel Korda, et al. ∙ 0

Stochastic approximation for speeding up LSTD (and LSPI)
We propose a stochastic approximation (SA) based method with randomizati...
06/11/2013 ∙ by L. A. Prashanth, et al. ∙ 0

Toward Optimal Stratification for Stratified Monte-Carlo Integration
We consider the problem of adaptive stratified sampling for Monte Carlo ...
03/12/2013 ∙ by Alexandra Carpentier, et al. ∙ 0

Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions
We consider the problem of adaptive stratified sampling for Monte Carlo ...
10/19/2012 ∙ by Alexandra Carpentier, et al. ∙ 0

Regret Bounds for Restless Markov Bandits
We consider the restless Markov bandit problem, in which the state of ea...
09/12/2012 ∙ by Ronald Ortner, et al. ∙ 0

On the Sample Complexity of Reinforcement Learning with a Generative Model
We consider the problem of learning the optimal action-value function in...
06/27/2012 ∙ by Mohammad Gheshlaghi Azar, et al. ∙ 0

Thompson Sampling: An Asymptotically Optimal Finite Time Analysis
The question of the optimality of Thompson Sampling for solving the stoc...
05/18/2012 ∙ by Emilie Kaufmann, et al. ∙ 0

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
In this work we aim to solve a large collection of tasks using a single ...
02/05/2018 ∙ by Lasse Espeholt, et al. ∙ 0

An Analysis of Categorical Distributional Reinforcement Learning
Distributional approaches to value-based reinforcement learning model th...
02/22/2018 ∙ by Mark Rowland, et al. ∙ 0

Learning to Search with MCTSnets
Planning problems are among the most important and well-studied problems...
02/13/2018 ∙ by Arthur Guez, et al. ∙ 0

A Study on Overfitting in Deep Reinforcement Learning
Recent years have witnessed significant progresses in deep Reinforcement...
04/18/2018 ∙ by Chiyuan Zhang, et al. ∙ 0

Low-pass Recurrent Neural Networks - A memory architecture for longer-term correlation discovery
Reinforcement learning (RL) agents performing complex tasks must be able...
05/13/2018 ∙ by Thomas Stepleton, et al. ∙ 0

Observe and Look Further: Achieving Consistent Performance on Atari
Despite significant advances in the field of deep Reinforcement Learning...
05/29/2018 ∙ by Tobias Pohlen, et al. ∙ 0

Autoregressive Quantile Networks for Generative Modeling
We introduce autoregressive implicit quantile networks (AIQN), a fundame...
06/14/2018 ∙ by Georg Ostrovski, et al. ∙ 0

Implicit Quantile Networks for Distributional Reinforcement Learning
In this work, we build on recent advances in distributional reinforcemen...
06/14/2018 ∙ by Will Dabney, et al. ∙ 0

Maximum a Posteriori Policy Optimisation
We introduce a new algorithm for reinforcement learning called Maximum a...
06/14/2018 ∙ by Abbas Abdolmaleki, et al. ∙ 0

Neural Predictive Belief Representations
Unsupervised representation learning has succeeded with excellent result...
11/15/2018 ∙ by Zhaohan Daniel Guo, et al. ∙ 0

World Discovery Models
As humans we are driven by a strong desire for seeking novelty in our wo...
02/20/2019 ∙ by Mohammad Gheshlaghi Azar, et al. ∙ 0