
Does Standard Backpropagation Forget Less Catastrophically Than Adam?
Catastrophic forgetting remains a severe hindrance to the broad applicat...

Average-Reward Off-Policy Policy Evaluation with Function Approximation
We consider off-policy policy evaluation with function approximation (FA...

Understanding the Pathologies of Approximate Policy Evaluation when Combined with Greedification in Reinforcement Learning
Despite empirical success, the theory of reinforcement learning (RL) wit...

Document-editing Assistants and Model-based Reinforcement Learning as a Path to Conversational AI
Intelligent assistants that follow commands or answer simple questions, ...

Inverse Policy Evaluation for Value-based Sequential Decision-making
Value-based methods for reinforcement learning lack generally applicable...

Learning and Planning in Average-Reward Markov Decision Processes
We introduce improved learning and planning algorithms for average-rewar...

Learning Sparse Representations Incrementally in Deep Reinforcement Learning
Sparse representations have been shown to be useful in deep reinforcemen...

Discounted Reinforcement Learning is Not an Optimization Problem
Discounted reinforcement learning is fundamentally incompatible with fun...

Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning
We explore fixed-horizon temporal difference (TD) methods, reinforcement...

Planning with Expectation Models
Distribution and sample models are two popular model choices in model-ba...

Learning Feature Relevance Through Step Size Adaptation in Temporal-Difference Learning
There is a long history of using meta learning as representation learnin...

Should All Temporal Difference Learning Use Emphasis?
Emphatic Temporal Difference (ETD) learning has recently been proposed a...

Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target
Multi-step methods such as Retrace(λ) and n-step Q-learning have become ...

Online Off-policy Prediction
This paper investigates the problem of online prediction learning, where...

Predicting Periodicity with Temporal Difference Learning
Temporal difference (TD) learning is an important approach in reinforcem...

Per-decision Multi-step Temporal Difference Learning with Control Variates
Multi-step temporal difference (TD) learning is an important approach in...

Integrating Episodic Memory into a Reinforcement Learning Agent using Reservoir Sampling
Episodic memory is a psychology term which refers to the ability to reca...

Two geometric input transformation methods for fast online reinforcement learning with neural nets
We apply neural nets with ReLU gates in online reinforcement learning. O...

TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent
In this paper, we introduce a method for adapting the step-sizes of temp...

Reactive Reinforcement Learning in Asynchronous Environments
The relationship between a reinforcement learning (RL) agent and an asyn...

Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods
This paper investigates estimating the variance of a temporal-difference...

A Deeper Look at Experience Replay
Experience replay plays an important role in the success of deep reinfor...

Communicative Capital for Prosthetic Agents
This work presents an overarching perspective on the role that machine i...

A First Empirical Study of Emphatic Temporal Difference Learning
In this paper we present the first empirical study of the emphatic tempo...

Integral Policy Iterations for Reinforcement Learning Problems in Continuous Time and Space
Policy iteration (PI) is a recursive process of policy evaluation and im...

Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks
Representations are fundamental to artificial intelligence. The performa...

Face valuing: Training user interfaces with facial expressions and reinforcement learning
An important application of interactive machine learning is extending or...

True Online Temporal-Difference Learning
The temporal-difference methods TD(λ) and Sarsa(λ) form a core part of m...

An Empirical Evaluation of True Online TD(λ)
The true online TD(λ) algorithm has recently been proposed (van Seijen a...

Temporal-Difference Learning to Assist Human Decision Making during the Control of an Artificial Limb
In this work we explore the use of reinforcement learning (RL) to help w...

Planning by Prioritized Sweeping with Small Backups
Efficient planning plays a crucial role in model-based reinforcement lea...

Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping
We consider the problem of efficiently learning optimal control policies...
Richard S. Sutton
Distinguished Research Scientist at DeepMind Technologies, Professor and iCORE chair in the Department of Computing Science at the University of Alberta, AITF Chair in Reinforcement Learning and Artificial Intelligence, Department of Computing Science at the University of Alberta, Ph.D., Computer Science, University of Massachusetts, 1984