
On Multi-objective Policy Optimization as a Tool for Reinforcement Learning
Many advances that have improved the robustness and efficiency of deep r...
From Motor Control to Team Play in Simulated Humanoid Football
Intelligent behaviour in the physical world exhibits structure at multip...
Neural Production Systems
Visual environments are structured, consisting of distinct objects or en...
Counterfactual Credit Assignment in Model-Free Reinforcement Learning
Credit assignment in reinforcement learning is the problem of measuring ...
Game Plan: What AI can do for Football, and What Football can do for AI
The rapid progress in artificial intelligence (AI) and machine learning ...
Behavior Priors for Efficient Reinforcement Learning
As we deploy reinforcement learning agents to solve increasingly challen...
Robust Constrained Reinforcement Learning for Continuous Control with Model Misspecification
Many real-world physical control systems are required to satisfy constra...
Learning Dexterous Manipulation from Suboptimal Experts
Learning dexterous manipulation in high-dimensional state-action spaces ...
Local Search for Policy Iteration in Continuous Control
We present an algorithm for local, regularized, policy improvement in re...
Beyond Tabula-Rasa: a Modular Reinforcement Learning Approach for Physically Embedded 3D Sokoban
Intelligent robots need to achieve abstract objectives using concrete, s...
Learning to swim in potential flow
Fish swim by undulating their bodies. These propulsive motions require c...
Physically Embedded Planning Problems: New Challenges for Reinforcement Learning
Recent work in deep reinforcement learning (RL) has produced algorithms ...
Importance Weighted Policy Learning and Adaptation
The ability to exploit prior experience to solve novel problems rapidly ...
Action and Perception as Divergence Minimization
We introduce a unified objective for action and perception of intelligen...
Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion
Modern Reinforcement Learning (RL) algorithms promise to solve difficult...
Data-efficient Hindsight Off-policy Option Learning
Solutions to most complex tasks can be decomposed into simpler, intermed...
Critic Regularized Regression
Offline reinforcement learning (RL), also known as batch RL, offers the ...
RL Unplugged: Benchmarks for Offline Reinforcement Learning
Offline methods for reinforcement learning have the potential to help br...
dm_control: Software and Tasks for Continuous Control
The dm_control software package is a collection of Python libraries and ...
Simple Sensor Intentions for Exploration
Modern reinforcement learning algorithms can learn solutions to increasi...
A Distributional View on Multi-Objective Policy Optimization
Many real-world problems require trading off multiple competing objectiv...
Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning
Standard planners for sequential decision making (including Monte Carlo ...
Value-driven Hindsight Modelling
Value estimation is a critical component of the reinforcement learning (...
Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics
Many real-world control problems involve both discrete decision variable...
Hindsight Credit Assignment
We consider the problem of efficient credit assignment in reinforcement ...
Reusable neural skill embeddings for vision-guided whole body movement and object manipulation
Both in simulation settings and robotics, there is an ambition to produc...
Quinoa: a Q-function You Infer Normalized Over Actions
We present an algorithm for learning an approximate action-value soft Q...
Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions
A plethora of problems in AI, engineering and the sciences are naturally...
Stabilizing Transformers for Reinforcement Learning
Owing to their ability to both effectively integrate information over lo...
Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models
Humans are masters at quickly learning many complex tasks, relying on an...
A Generalized Training Approach for Multiagent Learning
This paper investigates a population-based training regime based on game...
V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control
Some of the most successful applications of deep reinforcement learning ...
Regularized Hierarchical Policies for Compositional Transfer in Robotics
The successful application of flexible, general learning algorithms  s...
Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces
Direct optimization is an appealing approach to differentiating through ...
Meta reinforcement learning as task inference
Humans achieve efficient learning by relying on prior knowledge about th...
Meta-learning of Sequential Strategies
In this report we review memory-based meta-learning as a tool for buildi...
Information asymmetry in KL-regularized RL
Many real world tasks exhibit rich structure that is repeated across dif...
Exploiting Hierarchy for Learning and Transfer in KL-regularized RL
As reinforcement learning agents are tasked with solving more challengin...
The Termination Critic
In this work, we consider the problem of autonomously discovering behavi...
Emergent Coordination Through Competition
We study the emergence of cooperative behaviors in reinforcement learnin...
Value constrained model-free continuous control
The naive application of Reinforcement Learning algorithms to continuous...
Credit Assignment Techniques in Stochastic Computation Graphs
Stochastic computation graphs (SCGs) provide a formalism to represent st...
Self-supervised Learning of Image Embedding for Continuous Control
Operating directly from raw high dimensional sensory inputs like images ...
Relative Entropy Regularized Policy Iteration
We present an off-policy actor-critic algorithm for Reinforcement Learni...
Entropic Policy Composition with Generalized Policy Improvement and Divergence Correction
Deep reinforcement learning (RL) algorithms have made great strides in r...
Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures
This paper addresses the problem of evaluating learning systems in safet...
Neural probabilistic motor primitives for humanoid control
We focus on the problem of learning a single motor module that can flexi...
Hierarchical visuomotor control of humanoids
We aim to build complex humanoid agents that integrate perception, motor...
Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search
Learning policies on data synthesized by models can in principle quench ...
Maximum a Posteriori Policy Optimisation
We introduce a new algorithm for reinforcement learning called Maximum a...
Nicolas Heess
PhD student at the Institute for Adaptive and Neural Computation, University of Edinburgh