
Does Standard Backpropagation Forget Less Catastrophically Than Adam?
Catastrophic forgetting remains a severe hindrance to the broad applicat...
Average-Reward Off-Policy Policy Evaluation with Function Approximation
We consider off-policy policy evaluation with function approximation (FA...
Understanding the Pathologies of Approximate Policy Evaluation when Combined with Greedification in Reinforcement Learning
Despite empirical success, the theory of reinforcement learning (RL) wit...
Document-editing Assistants and Model-based Reinforcement Learning as a Path to Conversational AI
Intelligent assistants that follow commands or answer simple questions, ...
Inverse Policy Evaluation for Value-based Sequential Decision-making
Value-based methods for reinforcement learning lack generally applicable...
Learning and Planning in Average-Reward Markov Decision Processes
We introduce improved learning and planning algorithms for average-rewar...
Learning Sparse Representations Incrementally in Deep Reinforcement Learning
Sparse representations have been shown to be useful in deep reinforcemen...
Discounted Reinforcement Learning is Not an Optimization Problem
Discounted reinforcement learning is fundamentally incompatible with fun...
Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning
We explore fixed-horizon temporal difference (TD) methods, reinforcement...
Planning with Expectation Models
Distribution and sample models are two popular model choices in model-ba...
Learning Feature Relevance Through Step Size Adaptation in Temporal-Difference Learning
There is a long history of using meta learning as representation learnin...
Should All Temporal Difference Learning Use Emphasis?
Emphatic Temporal Difference (ETD) learning has recently been proposed a...
Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target
Multi-step methods such as Retrace(λ) and n-step Q-learning have become ...
Online Off-policy Prediction
This paper investigates the problem of online prediction learning, where...
Predicting Periodicity with Temporal Difference Learning
Temporal difference (TD) learning is an important approach in reinforcem...
Per-decision Multi-step Temporal Difference Learning with Control Variates
Multi-step temporal difference (TD) learning is an important approach in...
Integrating Episodic Memory into a Reinforcement Learning Agent using Reservoir Sampling
Episodic memory is a psychology term which refers to the ability to reca...
Two geometric input transformation methods for fast online reinforcement learning with neural nets
We apply neural nets with ReLU gates in online reinforcement learning. O...
TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent
In this paper, we introduce a method for adapting the step-sizes of temp...
Reactive Reinforcement Learning in Asynchronous Environments
The relationship between a reinforcement learning (RL) agent and an asyn...
Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods
This paper investigates estimating the variance of a temporal-difference...
A Deeper Look at Experience Replay
Experience replay plays an important role in the success of deep reinfor...
Communicative Capital for Prosthetic Agents
This work presents an overarching perspective on the role that machine i...
A First Empirical Study of Emphatic Temporal Difference Learning
In this paper we present the first empirical study of the emphatic tempo...
Integral Policy Iterations for Reinforcement Learning Problems in Continuous Time and Space
Policy iteration (PI) is a recursive process of policy evaluation and im...
Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks
Representations are fundamental to artificial intelligence. The performa...
Face valuing: Training user interfaces with facial expressions and reinforcement learning
An important application of interactive machine learning is extending or...
True Online Temporal-Difference Learning
The temporal-difference methods TD(λ) and Sarsa(λ) form a core part of m...
An Empirical Evaluation of True Online TD(λ)
The true online TD(λ) algorithm has recently been proposed (van Seijen a...
Temporal-Difference Learning to Assist Human Decision Making during the Control of an Artificial Limb
In this work we explore the use of reinforcement learning (RL) to help w...
Planning by Prioritized Sweeping with Small Backups
Efficient planning plays a crucial role in model-based reinforcement lea...
Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping
We consider the problem of efficiently learning optimal control policies...
Richard S. Sutton
Distinguished Research Scientist at DeepMind Technologies; Professor and iCORE Chair in the Department of Computing Science at the University of Alberta; AITF Chair in Reinforcement Learning and Artificial Intelligence, Department of Computing Science at the University of Alberta; Ph.D., Computer Science, University of Massachusetts, 1984