
Muesli: Combining Improvements in Policy Optimization
We propose a novel policy update that combines regularized policy optimi...

Counterfactual Credit Assignment in Model-Free Reinforcement Learning
Credit assignment in reinforcement learning is the problem of measuring ...

On the role of planning in model-based deep reinforcement learning
Model-based planning is often thought to be necessary for deep, careful ...

Beyond Tabula-Rasa: a Modular Reinforcement Learning Approach for Physically Embedded 3D Sokoban
Intelligent robots need to achieve abstract objectives using concrete, s...

Physically Embedded Planning Problems: New Challenges for Reinforcement Learning
Recent work in deep reinforcement learning (RL) has produced algorithms ...

Value-driven Hindsight Modelling
Value estimation is a critical component of the reinforcement learning (...

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
Constructing agents with planning capabilities has long been one of the ...

Augmenting learning using symmetry in a biologically-inspired domain
Invariances to translation, rotation and other spatial transformations a...

An investigation of model-free planning
The field of reinforcement learning (RL) is facing increasingly challeng...

Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search
Learning policies on data synthesized by models can in principle quench ...

Learning to Search with MCTSnets
Planning problems are among the most important and well-studied problems...

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
The game of chess is the most widely-studied domain in the history of ar...

Imagination-Augmented Agents for Deep Reinforcement Learning
We introduce Imagination-Augmented Agents (I2As), a novel architecture f...

The Predictron: End-To-End Learning and Planning
One of the key challenges of artificial intelligence is to learn models ...

Learning values across many orders of magnitude
Most learning algorithms are not invariant to the scale of the function ...

Increasing the Action Gap: New Operators for Reinforcement Learning
This paper introduces new optimality-preserving operators on Q-functions...

Better Optimism By Bayes: Adaptive Planning with Rich Models
The computational costs of inference and planning have confined Bayesian...

Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search
Bayesian model-based reinforcement learning is a formally elegant approa...
Arthur Guez