
Muesli: Combining Improvements in Policy Optimization
We propose a novel policy update that combines regularized policy optimi...
Counterfactual Credit Assignment in Model-Free Reinforcement Learning
Credit assignment in reinforcement learning is the problem of measuring ...
On the role of planning in model-based deep reinforcement learning
Model-based planning is often thought to be necessary for deep, careful ...
Beyond Tabula-Rasa: a Modular Reinforcement Learning Approach for Physically Embedded 3D Sokoban
Intelligent robots need to achieve abstract objectives using concrete, s...
Physically Embedded Planning Problems: New Challenges for Reinforcement Learning
Recent work in deep reinforcement learning (RL) has produced algorithms ...
Value-driven Hindsight Modelling
Value estimation is a critical component of the reinforcement learning (...
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
Constructing agents with planning capabilities has long been one of the ...
Augmenting learning using symmetry in a biologically-inspired domain
Invariances to translation, rotation and other spatial transformations a...
An investigation of model-free planning
The field of reinforcement learning (RL) is facing increasingly challeng...
Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search
Learning policies on data synthesized by models can in principle quench ...
Learning to Search with MCTSnets
Planning problems are among the most important and well-studied problems...
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
The game of chess is the most widely-studied domain in the history of ar...
Imagination-Augmented Agents for Deep Reinforcement Learning
We introduce Imagination-Augmented Agents (I2As), a novel architecture f...
The Predictron: End-To-End Learning and Planning
One of the key challenges of artificial intelligence is to learn models ...
Learning values across many orders of magnitude
Most learning algorithms are not invariant to the scale of the function ...
Increasing the Action Gap: New Operators for Reinforcement Learning
This paper introduces new optimality-preserving operators on Q-functions...
Better Optimism By Bayes: Adaptive Planning with Rich Models
The computational costs of inference and planning have confined Bayesian...
Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search
Bayesian model-based reinforcement learning is a formally elegant approa...
Arthur Guez
