
Counterfactual Credit Assignment in Model-Free Reinforcement Learning
Credit assignment in reinforcement learning is the problem of measuring ...

Game Plan: What AI can do for Football, and What Football can do for AI
The rapid progress in artificial intelligence (AI) and machine learning ...

Behavior Priors for Efficient Reinforcement Learning
As we deploy reinforcement learning agents to solve increasingly challen...

Robust Constrained Reinforcement Learning for Continuous Control with Model Misspecification
Many real-world physical control systems are required to satisfy constra...

Learning Dexterous Manipulation from Suboptimal Experts
Learning dexterous manipulation in high-dimensional state-action spaces ...

Local Search for Policy Iteration in Continuous Control
We present an algorithm for local, regularized, policy improvement in re...

Beyond Tabula-Rasa: a Modular Reinforcement Learning Approach for Physically Embedded 3D Sokoban
Intelligent robots need to achieve abstract objectives using concrete, s...

Learning to swim in potential flow
Fish swim by undulating their bodies. These propulsive motions require c...

Physically Embedded Planning Problems: New Challenges for Reinforcement Learning
Recent work in deep reinforcement learning (RL) has produced algorithms ...

Importance Weighted Policy Learning and Adaption
The ability to exploit prior experience to solve novel problems rapidly ...

Action and Perception as Divergence Minimization
We introduce a unified objective for action and perception of intelligen...

Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion
Modern Reinforcement Learning (RL) algorithms promise to solve difficult...

Data-efficient Hindsight Off-policy Option Learning
Solutions to most complex tasks can be decomposed into simpler, intermed...

Critic Regularized Regression
Offline reinforcement learning (RL), also known as batch RL, offers the ...

RL Unplugged: Benchmarks for Offline Reinforcement Learning
Offline methods for reinforcement learning have the potential to help br...

dm_control: Software and Tasks for Continuous Control
The dm_control software package is a collection of Python libraries and ...

Simple Sensor Intentions for Exploration
Modern reinforcement learning algorithms can learn solutions to increasi...

A Distributional View on Multi-Objective Policy Optimization
Many real-world problems require trading off multiple competing objectiv...

Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning
Standard planners for sequential decision making (including Monte Carlo ...

Value-driven Hindsight Modelling
Value estimation is a critical component of the reinforcement learning (...

Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics
Many real-world control problems involve both discrete decision variable...

Hindsight Credit Assignment
We consider the problem of efficient credit assignment in reinforcement ...

Reusable neural skill embeddings for vision-guided whole body movement and object manipulation
Both in simulation settings and robotics, there is an ambition to produc...

Quinoa: a Q-function You Infer Normalized Over Actions
We present an algorithm for learning an approximate action-value soft Q...

Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions
A plethora of problems in AI, engineering and the sciences are naturally...

Stabilizing Transformers for Reinforcement Learning
Owing to their ability to both effectively integrate information over lo...

Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models
Humans are masters at quickly learning many complex tasks, relying on an...

A Generalized Training Approach for Multiagent Learning
This paper investigates a population-based training regime based on game...

V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control
Some of the most successful applications of deep reinforcement learning ...

Regularized Hierarchical Policies for Compositional Transfer in Robotics
The successful application of flexible, general learning algorithms - s...

Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces
Direct optimization is an appealing approach to differentiating through ...

Meta reinforcement learning as task inference
Humans achieve efficient learning by relying on prior knowledge about th...

Meta-learning of Sequential Strategies
In this report we review memory-based meta-learning as a tool for buildi...

Information asymmetry in KL-regularized RL
Many real world tasks exhibit rich structure that is repeated across dif...

Exploiting Hierarchy for Learning and Transfer in KL-regularized RL
As reinforcement learning agents are tasked with solving more challengin...

The Termination Critic
In this work, we consider the problem of autonomously discovering behavi...

Emergent Coordination Through Competition
We study the emergence of cooperative behaviors in reinforcement learnin...

Value constrained model-free continuous control
The naive application of Reinforcement Learning algorithms to continuous...

Credit Assignment Techniques in Stochastic Computation Graphs
Stochastic computation graphs (SCGs) provide a formalism to represent st...

Self-supervised Learning of Image Embedding for Continuous Control
Operating directly from raw high dimensional sensory inputs like images ...

Relative Entropy Regularized Policy Iteration
We present an off-policy actor-critic algorithm for Reinforcement Learni...

Entropic Policy Composition with Generalized Policy Improvement and Divergence Correction
Deep reinforcement learning (RL) algorithms have made great strides in r...

Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures
This paper addresses the problem of evaluating learning systems in safet...

Neural probabilistic motor primitives for humanoid control
We focus on the problem of learning a single motor module that can flexi...

Hierarchical visuomotor control of humanoids
We aim to build complex humanoid agents that integrate perception, motor...

Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search
Learning policies on data synthesized by models can in principle quench ...

Maximum a Posteriori Policy Optimisation
We introduce a new algorithm for reinforcement learning called Maximum a...

Mix&Match - Agent Curricula for Reinforcement Learning
We introduce Mix&Match (M&M) - a training framework designed to facilita...

Relational inductive biases, deep learning, and graph networks
Artificial intelligence (AI) has undergone a renaissance recently, makin...

Graph networks as learnable physics engines for inference and control
Understanding and interacting with everyday physical scenes requires ric...
Nicolas Heess
PhD student at the Institute for Adaptive and Neural Computation, University of Edinburgh