
Critic Regularized Regression
Offline reinforcement learning (RL), also known as batch RL, offers the ...
Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning
Standard planners for sequential decision making (including Monte Carlo ...
Exploiting Hierarchy for Learning and Transfer in KL-regularized RL
As reinforcement learning agents are tasked with solving more challengin...
Quinoa: a Q-function You Infer Normalized Over Actions
We present an algorithm for learning an approximate action-value soft Q...
A Generalized Training Approach for Multiagent Learning
This paper investigates a population-based training regime based on game...
dm_control: Software and Tasks for Continuous Control
The dm_control software package is a collection of Python libraries and ...
Value-driven Hindsight Modelling
Value estimation is a critical component of the reinforcement learning (...
Meta-learning of Sequential Strategies
In this report we review memory-based meta-learning as a tool for buildi...
Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics
Many real-world control problems involve both discrete decision variable...
Neural probabilistic motor primitives for humanoid control
We focus on the problem of learning a single motor module that can flexi...
Reusable neural skill embeddings for vision-guided whole body movement and object manipulation
Both in simulation settings and robotics, there is an ambition to produc...
The Termination Critic
In this work, we consider the problem of autonomously discovering behavi...
RL Unplugged: Benchmarks for Offline Reinforcement Learning
Offline methods for reinforcement learning have the potential to help br...
Stabilizing Transformers for Reinforcement Learning
Owing to their ability to both effectively integrate information over lo...
Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures
This paper addresses the problem of evaluating learning systems in safet...
Information asymmetry in KL-regularized RL
Many real world tasks exhibit rich structure that is repeated across dif...
Meta reinforcement learning as task inference
Humans achieve efficient learning by relying on prior knowledge about th...
Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models
Humans are masters at quickly learning many complex tasks, relying on an...
Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces
Direct optimization is an appealing approach to differentiating through ...
Mix&Match - Agent Curricula for Reinforcement Learning
We introduce Mix&Match (M&M), a training framework designed to facilita...
Graph networks as learnable physics engines for inference and control
Understanding and interacting with everyday physical scenes requires ric...
Relative Entropy Regularized Policy Iteration
We present an off-policy actor-critic algorithm for Reinforcement Learni...
Entropic Policy Composition with Generalized Policy Improvement and Divergence Correction
Deep reinforcement learning (RL) algorithms have made great strides in r...
Emergent Coordination Through Competition
We study the emergence of cooperative behaviors in reinforcement learnin...
Simple Sensor Intentions for Exploration
Modern reinforcement learning algorithms can learn solutions to increasi...
Regularized Hierarchical Policies for Compositional Transfer in Robotics
The successful application of flexible, general learning algorithms, s...
Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards
We propose a general and model-free approach for Reinforcement Learning ...
Emergence of Locomotion Behaviours in Rich Environments
The reinforcement learning paradigm allows, in principle, for complex be...
Learning Hierarchical Information Flow with Recurrent Neural Modules
We propose ThalNet, a deep learning model inspired by neocortical commun...
Imagination-Augmented Agents for Deep Reinforcement Learning
We introduce Imagination-Augmented Agents (I2As), a novel architecture f...
Learning model-based planning from scratch
Conventional wisdom holds that model-based planning is a powerful approa...
Metacontrol for Adaptive Imagination-Based Optimization
Many machine learning systems are built to solve the hardest examples of...
Distral: Robust Multitask Reinforcement Learning
Most deep reinforcement learning algorithms are data inefficient in comp...
Learning Continuous Control Policies by Stochastic Value Gradients
We present a unified framework for learning continuous control policies ...
Filtering Variational Objectives
When used as a surrogate objective for maximum likelihood estimation in ...
Learning and Transfer of Modulated Locomotor Controllers
We study a novel architecture and training procedure for locomotion task...
Continuous control with deep reinforcement learning
We adapt the ideas underlying the success of Deep Q-Learning to the cont...
Unsupervised Learning of 3D Structure from Images
A key goal of computer vision is to recover the underlying 3D structure ...
Kernel-Based Just-In-Time Learning for Passing Expectation Propagation Messages
We propose an efficient nonparametric strategy for learning a message op...
Passing Expectation Propagation Messages with Kernel Methods
We propose to learn a kernel-based message operator which takes as input...
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
We present a framework for efficient inference in structured image model...
Recurrent Models of Visual Attention
Applying convolutional neural networks to large images is computationall...
Weakly Supervised Learning of Foreground-Background Segmentation using Masked RBMs
We propose an extension of the Restricted Boltzmann Machine (RBM) that a...
Reinforcement and Imitation Learning for Diverse Visuomotor Skills
We propose a model-free deep reinforcement learning method that leverage...
Learning by Playing - Solving Sparse Reward Tasks from Scratch
We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm ...
Distributed Distributional Deterministic Policy Gradients
This work adopts the very successful distributional perspective on reinf...
Relational inductive biases, deep learning, and graph networks
Artificial intelligence (AI) has undergone a renaissance recently, makin...
Maximum a Posteriori Policy Optimisation
We introduce a new algorithm for reinforcement learning called Maximum a...
Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search
Learning policies on data synthesized by models can in principle quench ...
Hierarchical visuomotor control of humanoids
We aim to build complex humanoid agents that integrate perception, motor...
Nicolas Heess
PhD student at the Institute for Adaptive and Neural Computation, University of Edinburgh