
Multitask Soft Option Learning
We present Multitask Soft Option Learning (MSOL), a hierarchical multita...
04/01/2019 ∙ by Maximilian Igl, et al. ∙ 98

Stable Opponent Shaping in Differentiable Games
A growing number of learning methods are actually games which optimise m...
11/20/2018 ∙ by Alistair Letcher, et al. ∙ 74

VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
Trading off exploration and exploitation in an unknown environment is ke...
10/18/2019 ∙ by Luisa Zintgraf, et al. ∙ 66

Deep Coordination Graphs
This paper introduces the deep coordination graph (DCG) for collaborativ...
09/27/2019 ∙ by Wendelin Böhmer, et al. ∙ 54

A Survey of Reinforcement Learning Informed by Natural Language
To be successful in real-world tasks, Reinforcement Learning (RL) needs ...
06/10/2019 ∙ by Jelena Luketina, et al. ∙ 52

Deep Residual Reinforcement Learning
We revisit residual algorithms in both model-free and model-based reinfo...
05/03/2019 ∙ by Shangtong Zhang, et al. ∙ 38

Improving SAT Solver Heuristics with Graph Networks and Reinforcement Learning
We present GQSAT, a branching heuristic in a Boolean SAT solver trained ...
09/26/2019 ∙ by Vitaly Kurin, et al. ∙ 37

The StarCraft Multi-Agent Challenge
In the last few years, deep multi-agent reinforcement learning (RL) has ...
02/11/2019 ∙ by Mikayel Samvelyan, et al. ∙ 32

Generalized Off-Policy Actor-Critic
We propose a new objective, the counterfactual objective, unifying exist...
03/27/2019 ∙ by Shangtong Zhang, et al. ∙ 28

Provably Convergent Off-Policy Actor-Critic with Function Approximation
We present the first provably convergent off-policy actor-critic algorit...
11/11/2019 ∙ by Shangtong Zhang, et al. ∙ 26

Fast Efficient Hyperparameter Tuning for Policy Gradients
The performance of policy gradient methods is sensitive to hyperparamete...
02/18/2019 ∙ by Supratik Paul, et al. ∙ 20

Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning
Gradient-based methods for optimisation of objectives in stochastic sett...
09/23/2019 ∙ by Gregory Farquhar, et al. ∙ 20

MAVEN: Multi-Agent Variational Exploration
Centralised training with decentralised execution is an important settin...
10/16/2019 ∙ by Anuj Mahajan, et al. ∙ 17

DAC: The Double Actor-Critic Architecture for Learning Options
We reformulate the option framework as two parallel augmented MDPs. Unde...
04/29/2019 ∙ by Shangtong Zhang, et al. ∙ 16

Exploration with Unreliable Intrinsic Reward in Multi-Agent Reinforcement Learning
This paper investigates the use of intrinsic reward to guide exploration...
06/05/2019 ∙ by Wendelin Böhmer, et al. ∙ 14

VIABLE: Fast Adaptation via Backpropagating Learned Loss
In few-shot learning, typically, the loss function which is applied at t...
11/29/2019 ∙ by Leo Feng, et al. ∙ 10

Learning from Demonstration in the Wild
Learning from demonstration (LfD) is useful in settings where hand-codin...
11/08/2018 ∙ by Feryal Behbahani, et al. ∙ 8

Growing Action Spaces
In complex tasks, such as those with large combinatorial action spaces, ...
06/28/2019 ∙ by Gregory Farquhar, et al. ∙ 7

Deep Variational Reinforcement Learning for POMDPs
Many real-world sequential decision making problems are partially observ...
06/06/2018 ∙ by Maximilian Igl, et al. ∙ 2

Contextual Policy Optimisation
Policy gradient methods have been successfully applied to a variety of r...
05/27/2018 ∙ by Supratik Paul, et al. ∙ 2

CAML: Fast Context Adaptation via Meta-Learning
We propose CAML, a meta-learning method for fast adaptation that partiti...
10/08/2018 ∙ by Luisa M Zintgraf, et al. ∙ 2

Expected Policy Gradients
We propose expected policy gradients (EPG), which unify stochastic polic...
06/15/2017 ∙ by Kamil Ciosek, et al. ∙ 0

Multi-Objective Deep Reinforcement Learning
We propose Deep Optimistic Linear Support Learning (DOL) to solve high-d...
10/09/2016 ∙ by Hossam Mossalam, et al. ∙ 0

Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks
We propose deep distributed recurrent Q-networks (DDRQN), which enable t...
02/08/2016 ∙ by Jakob N. Foerster, et al. ∙ 0

Alternating Optimisation and Quadrature for Robust Control
Bayesian optimisation has been successfully applied to a variety of rein...
05/24/2016 ∙ by Supratik Paul, et al. ∙ 0

Probably Approximately Correct Greedy Maximization
Submodular function maximization finds application in a variety of real...
02/25/2016 ∙ by Yash Satsangi, et al. ∙ 0

A Survey of Multi-Objective Sequential Decision-Making
Sequential decision-making problems with multiple objectives arise natur...
02/04/2014 ∙ by Diederik Marijn Roijers, et al. ∙ 0

Incremental Clustering and Expansion for Faster Optimal Planning in Dec-POMDPs
This article presents the state-of-the-art in optimal solution methods f...
02/04/2014 ∙ by Frans Adriaan Oliehoek, et al. ∙ 0

Exploiting Structure in Cooperative Bayesian Games
Cooperative Bayesian games (BGs) can model decision-making problems for ...
10/16/2012 ∙ by Frans A. Oliehoek, et al. ∙ 0

Exploiting Agent and Type Independence in Collaborative Graphical Bayesian Games
Efficient collaborative decision making is an important challenge for mu...
08/01/2011 ∙ by Frans A. Oliehoek, et al. ∙ 0

LipNet: End-to-End Sentence-level Lipreading
Lipreading is the task of decoding text from the movement of a speaker's...
11/05/2016 ∙ by Yannis M. Assael, et al. ∙ 0

Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning
Many real-world problems, such as network packet routing and urban traff...
02/28/2017 ∙ by Jakob Foerster, et al. ∙ 0

TreeQN and ATreeC: Differentiable Tree Planning for Deep Reinforcement Learning
Combining deep model-free reinforcement learning with online planning i...
10/31/2017 ∙ by Gregory Farquhar, et al. ∙ 0

Expected Policy Gradients for Reinforcement Learning
We propose expected policy gradients (EPG), which unify stochastic polic...
01/10/2018 ∙ by Kamil Ciosek, et al. ∙ 0

Learning with Opponent-Learning Awareness
Multi-agent settings are quickly gathering importance in machine learnin...
09/13/2017 ∙ by Jakob N. Foerster, et al. ∙ 0

Counterfactual Multi-Agent Policy Gradients
Cooperative multi-agent systems can be naturally used to model many real...
05/24/2017 ∙ by Jakob Foerster, et al. ∙ 0

Fourier Policy Gradients
We propose a new way of deriving policy gradient updates for reinforceme...
02/19/2018 ∙ by Matthew Fellows, et al. ∙ 0

TACO: Learning Task Decomposition via Temporal Alignment for Control
Many advanced Learning from Demonstration (LfD) methods consider the dec...
03/02/2018 ∙ by Kyriacos Shiarlis, et al. ∙ 0

DiCE: The Infinitely Differentiable Monte-Carlo Estimator
The score function estimator is widely used for estimating gradients of ...
02/14/2018 ∙ by Jakob Foerster, et al. ∙ 0

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
In many real-world settings, a team of agents must coordinate their beha...
03/30/2018 ∙ by Tabish Rashid, et al. ∙ 0

Multi-Agent Common Knowledge Reinforcement Learning
In multi-agent reinforcement learning, centralised policies can only be ...
10/27/2018 ∙ by Jakob N. Foerster, et al. ∙ 0

VIREL: A Variational Inference Framework for Reinforcement Learning
Applying probabilistic models to reinforcement learning (RL) has become ...
11/03/2018 ∙ by Matthew Fellows, et al. ∙ 0

Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
When observing the actions of others, humans carry out inferences about ...
11/04/2018 ∙ by Jakob N. Foerster, et al. ∙ 0

The Representational Capacity of Action-Value Networks for Multi-Agent Reinforcement Learning
Recent years have seen the application of deep reinforcement learning te...
02/20/2019 ∙ by Jacopo Castellini, et al. ∙ 0
Shimon Whiteson
Professor in the Department of Computer Science at the University of Oxford, Fellow of St Catherine's College, Chief Scientist at Morpheus Labs, Associate Professor at University of Amsterdam from 2007 to 2015