
Exploiting Hierarchy for Learning and Transfer in KL-regularized RL
As reinforcement learning agents are tasked with solving more challengin...

Quinoa: a Q-function You Infer Normalized Over Actions
We present an algorithm for learning an approximate action-value soft Q...

A Generalized Training Approach for Multiagent Learning
This paper investigates a population-based training regime based on game...

Value-driven Hindsight Modelling
Value estimation is a critical component of the reinforcement learning (...

Meta-learning of Sequential Strategies
In this report we review memory-based meta-learning as a tool for buildi...

Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics
Many real-world control problems involve both discrete decision variable...

Neural probabilistic motor primitives for humanoid control
We focus on the problem of learning a single motor module that can flexi...

Reusable neural skill embeddings for vision-guided whole body movement and object manipulation
Both in simulation settings and robotics, there is an ambition to produc...

The Termination Critic
In this work, we consider the problem of autonomously discovering behavi...

Stabilizing Transformers for Reinforcement Learning
Owing to their ability to both effectively integrate information over lo...

Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures
This paper addresses the problem of evaluating learning systems in safet...

Information asymmetry in KL-regularized RL
Many real-world tasks exhibit rich structure that is repeated across dif...

Meta reinforcement learning as task inference
Humans achieve efficient learning by relying on prior knowledge about th...

Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models
Humans are masters at quickly learning many complex tasks, relying on an...

Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces
Direct optimization is an appealing approach to differentiating through ...

Mix&Match - Agent Curricula for Reinforcement Learning
We introduce Mix&Match (M&M), a training framework designed to facilita...

Graph networks as learnable physics engines for inference and control
Understanding and interacting with everyday physical scenes requires ric...

Relative Entropy Regularized Policy Iteration
We present an off-policy actor-critic algorithm for Reinforcement Learni...

Entropic Policy Composition with Generalized Policy Improvement and Divergence Correction
Deep reinforcement learning (RL) algorithms have made great strides in r...

Emergent Coordination Through Competition
We study the emergence of cooperative behaviors in reinforcement learnin...

Regularized Hierarchical Policies for Compositional Transfer in Robotics
The successful application of flexible, general learning algorithms, s...

Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards
We propose a general and model-free approach for Reinforcement Learning ...

Emergence of Locomotion Behaviours in Rich Environments
The reinforcement learning paradigm allows, in principle, for complex be...

Learning Hierarchical Information Flow with Recurrent Neural Modules
We propose ThalNet, a deep learning model inspired by neocortical commun...

Imagination-Augmented Agents for Deep Reinforcement Learning
We introduce Imagination-Augmented Agents (I2As), a novel architecture f...

Learning model-based planning from scratch
Conventional wisdom holds that model-based planning is a powerful approa...

Metacontrol for Adaptive Imagination-Based Optimization
Many machine learning systems are built to solve the hardest examples of...

Distral: Robust Multitask Reinforcement Learning
Most deep reinforcement learning algorithms are data inefficient in comp...

Learning Continuous Control Policies by Stochastic Value Gradients
We present a unified framework for learning continuous control policies ...

Filtering Variational Objectives
When used as a surrogate objective for maximum likelihood estimation in ...

Learning and Transfer of Modulated Locomotor Controllers
We study a novel architecture and training procedure for locomotion task...

Continuous control with deep reinforcement learning
We adapt the ideas underlying the success of Deep Q-Learning to the cont...

Unsupervised Learning of 3D Structure from Images
A key goal of computer vision is to recover the underlying 3D structure ...

Kernel-Based Just-In-Time Learning for Passing Expectation Propagation Messages
We propose an efficient nonparametric strategy for learning a message op...

Passing Expectation Propagation Messages with Kernel Methods
We propose to learn a kernel-based message operator which takes as input...

Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
We present a framework for efficient inference in structured image model...

Recurrent Models of Visual Attention
Applying convolutional neural networks to large images is computationall...

Weakly Supervised Learning of Foreground-Background Segmentation using Masked RBMs
We propose an extension of the Restricted Boltzmann Machine (RBM) that a...

Reinforcement and Imitation Learning for Diverse Visuomotor Skills
We propose a model-free deep reinforcement learning method that leverage...

Learning by Playing - Solving Sparse Reward Tasks from Scratch
We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm ...

Distributed Distributional Deterministic Policy Gradients
This work adopts the very successful distributional perspective on reinf...

Relational inductive biases, deep learning, and graph networks
Artificial intelligence (AI) has undergone a renaissance recently, makin...

Maximum a Posteriori Policy Optimisation
We introduce a new algorithm for reinforcement learning called Maximum a...

Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search
Learning policies on data synthesized by models can in principle quench ...

Hierarchical visuomotor control of humanoids
We aim to build complex humanoid agents that integrate perception, motor...

Credit Assignment Techniques in Stochastic Computation Graphs
Stochastic computation graphs (SCGs) provide a formalism to represent st...

Value constrained model-free continuous control
The naive application of Reinforcement Learning algorithms to continuous...

Self-supervised Learning of Image Embedding for Continuous Control
Operating directly from raw high-dimensional sensory inputs like images ...

V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control
Some of the most successful applications of deep reinforcement learning ...

Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions
A plethora of problems in AI, engineering and the sciences are naturally...

Nicolas Heess
PhD student at the Institute for Adaptive and Neural Computation, University of Edinburgh