
Breaking the Deadly Triad with a Target Network
The deadly triad refers to the instability of a reinforcement learning a...

Deep Interactive Bayesian Reinforcement Learning via Meta-Learning
Agents that interact with other agents often do not know a priori what t...

Average-Reward Off-Policy Policy Evaluation with Function Approximation
We consider off-policy policy evaluation with function approximation (FA...

UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning
This paper focuses on cooperative value-based multi-agent reinforcement ...

My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control
Multitask Reinforcement Learning is a promising way to obtain models wit...

A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms
We investigate the discounting mismatch in actor-critic algorithm implem...

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning
Meta-learning is a powerful tool for learning policies that can adapt ef...

Exploiting Submodular Value Functions For Scaling Up Active Perception
In active perception tasks, an agent aims to select sensory actions that...

WordCraft: An Environment for Benchmarking Commonsense Agents
The ability to quickly solve a wide range of real-world tasks requires a...

Learning Retrospective Knowledge with Reverse Reinforcement Learning
We present a Reverse Reinforcement Learning (Reverse RL) approach for re...

Weighted QMIX: Expanding Monotonic Value Function Factorisation
QMIX is a popular Q-learning algorithm for cooperative MARL in the centr...

The Impact of Non-stationarity on Generalisation in Deep Reinforcement Learning
Non-stationarity arises in Reinforcement Learning (RL) even in stationar...

AI-QMIX: Attention and Imagination for Dynamic Multi-Agent Reinforcement Learning
Real world multi-agent tasks often involve varying types and quantities ...

Privileged Information Dropout in Reinforcement Learning
Using privileged information during training can improve the sample effi...

Maximizing Information Gain in Partially Observable Environments via Prediction Reward
Information gathering in a partially observable environment can be formu...

Per-Step Reward: A New Perspective for Risk-Averse Reinforcement Learning
We present a new per-step reward perspective for risk-averse control in ...

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
In many real-world settings, a team of agents must coordinate its behavi...

Deep Multi-Agent Reinforcement Learning for Decentralized Continuous Cooperative Control
Deep multi-agent reinforcement learning (MARL) holds the promise of auto...

Optimistic Exploration even with a Pessimistic Initialisation
Optimistic initialisation is an effective strategy for efficient explora...

Reinforcement Learning Enhanced Quantum-inspired Algorithm for Combinatorial Optimization
Quantum hardware and quantum-inspired algorithms are becoming increasing...

GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values
We present GradientDICE for estimating the density ratio between the sta...

Facial Feedback for Reinforcement Learning: A Case Study and Offline Analysis Using the TAMER Framework
Interactive reinforcement learning provides a way for agents to learn to...

VIABLE: Fast Adaptation via Backpropagating Learned Loss
In few-shot learning, typically, the loss function which is applied at t...

Provably Convergent Off-Policy Actor-Critic with Function Approximation
We present the first provably convergent off-policy actor-critic algorit...

VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
Trading off exploration and exploitation in an unknown environment is ke...

MAVEN: Multi-Agent Variational Exploration
Centralised training with decentralised execution is an important settin...

Deep Coordination Graphs
This paper introduces the deep coordination graph (DCG) for collaborativ...

Improving SAT Solver Heuristics with Graph Networks and Reinforcement Learning
We present GQSAT, a branching heuristic in a Boolean SAT solver trained ...

Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning
Gradient-based methods for optimisation of objectives in stochastic sett...

Growing Action Spaces
In complex tasks, such as those with large combinatorial action spaces, ...

A Survey of Reinforcement Learning Informed by Natural Language
To be successful in real-world tasks, Reinforcement Learning (RL) needs ...

Exploration with Unreliable Intrinsic Reward in Multi-Agent Reinforcement Learning
This paper investigates the use of intrinsic reward to guide exploration...

Deep Residual Reinforcement Learning
We revisit residual algorithms in both model-free and model-based reinfo...

DAC: The Double Actor-Critic Architecture for Learning Options
We reformulate the option framework as two parallel augmented MDPs. Unde...

Multitask Soft Option Learning
We present Multitask Soft Option Learning (MSOL), a hierarchical multita...

Generalized Off-Policy Actor-Critic
We propose a new objective, the counterfactual objective, unifying exist...

The Representational Capacity of Action-Value Networks for Multi-Agent Reinforcement Learning
Recent years have seen the application of deep reinforcement learning te...

Fast Efficient Hyperparameter Tuning for Policy Gradients
The performance of policy gradient methods is sensitive to hyperparamete...

The StarCraft Multi-Agent Challenge
In the last few years, deep multi-agent reinforcement learning (RL) has ...

Stable Opponent Shaping in Differentiable Games
A growing number of learning methods are actually games which optimise m...

Learning from Demonstration in the Wild
Learning from demonstration (LfD) is useful in settings where hand-codin...

Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
When observing the actions of others, humans carry out inferences about ...

VIREL: A Variational Inference Framework for Reinforcement Learning
Applying probabilistic models to reinforcement learning (RL) has become ...

Multi-Agent Common Knowledge Reinforcement Learning
In multi-agent reinforcement learning, centralised policies can only be ...

CAML: Fast Context Adaptation via Meta-Learning
We propose CAML, a meta-learning method for fast adaptation that partiti...

Deep Variational Reinforcement Learning for POMDPs
Many real-world sequential decision making problems are partially observ...

Contextual Policy Optimisation
Policy gradient methods have been successfully applied to a variety of r...

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
In many real-world settings, a team of agents must coordinate their beha...

TACO: Learning Task Decomposition via Temporal Alignment for Control
Many advanced Learning from Demonstration (LfD) methods consider the dec...

Fourier Policy Gradients
We propose a new way of deriving policy gradient updates for reinforceme...
Shimon Whiteson
Professor in the Department of Computer Science at the University of Oxford; Fellow of St Catherine's College; Chief Scientist at Morpheus Labs; Associate Professor at the University of Amsterdam from 2007 to 2015.