
WordCraft: An Environment for Benchmarking Commonsense Agents
The ability to quickly solve a wide range of real-world tasks requires a...
Learning Retrospective Knowledge with Reverse Reinforcement Learning
We present a Reverse Reinforcement Learning (Reverse RL) approach for re...
Weighted QMIX: Expanding Monotonic Value Function Factorisation
QMIX is a popular Q-learning algorithm for cooperative MARL in the centr...
The Impact of Non-stationarity on Generalisation in Deep Reinforcement Learning
Non-stationarity arises in Reinforcement Learning (RL) even in stationar...
AI-QMIX: Attention and Imagination for Dynamic Multi-Agent Reinforcement Learning
Real world multi-agent tasks often involve varying types and quantities ...
Privileged Information Dropout in Reinforcement Learning
Using privileged information during training can improve the sample effi...
Maximizing Information Gain in Partially Observable Environments via Prediction Reward
Information gathering in a partially observable environment can be formu...
Per-Step Reward: A New Perspective for Risk-Averse Reinforcement Learning
We present a new per-step reward perspective for risk-averse control in ...
Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
In many real-world settings, a team of agents must coordinate its behavi...
Deep Multi-Agent Reinforcement Learning for Decentralized Continuous Cooperative Control
Deep multi-agent reinforcement learning (MARL) holds the promise of auto...
Optimistic Exploration even with a Pessimistic Initialisation
Optimistic initialisation is an effective strategy for efficient explora...
Reinforcement Learning Enhanced Quantum-inspired Algorithm for Combinatorial Optimization
Quantum hardware and quantum-inspired algorithms are becoming increasing...
GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values
We present GradientDICE for estimating the density ratio between the sta...
Facial Feedback for Reinforcement Learning: A Case Study and Offline Analysis Using the TAMER Framework
Interactive reinforcement learning provides a way for agents to learn to...
VIABLE: Fast Adaptation via Backpropagating Learned Loss
In few-shot learning, typically, the loss function which is applied at t...
Provably Convergent Off-Policy Actor-Critic with Function Approximation
We present the first provably convergent off-policy actor-critic algorit...
VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
Trading off exploration and exploitation in an unknown environment is ke...
MAVEN: Multi-Agent Variational Exploration
Centralised training with decentralised execution is an important settin...
Deep Coordination Graphs
This paper introduces the deep coordination graph (DCG) for collaborativ...
Improving SAT Solver Heuristics with Graph Networks and Reinforcement Learning
We present GQSAT, a branching heuristic in a Boolean SAT solver trained ...
Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning
Gradient-based methods for optimisation of objectives in stochastic sett...
Growing Action Spaces
In complex tasks, such as those with large combinatorial action spaces, ...
A Survey of Reinforcement Learning Informed by Natural Language
To be successful in real-world tasks, Reinforcement Learning (RL) needs ...
Exploration with Unreliable Intrinsic Reward in Multi-Agent Reinforcement Learning
This paper investigates the use of intrinsic reward to guide exploration...
Deep Residual Reinforcement Learning
We revisit residual algorithms in both model-free and model-based reinfo...
DAC: The Double Actor-Critic Architecture for Learning Options
We reformulate the option framework as two parallel augmented MDPs. Unde...
Multitask Soft Option Learning
We present Multitask Soft Option Learning (MSOL), a hierarchical multita...
Generalized Off-Policy Actor-Critic
We propose a new objective, the counterfactual objective, unifying exist...
The Representational Capacity of Action-Value Networks for Multi-Agent Reinforcement Learning
Recent years have seen the application of deep reinforcement learning te...
Fast Efficient Hyperparameter Tuning for Policy Gradients
The performance of policy gradient methods is sensitive to hyperparamete...
The StarCraft Multi-Agent Challenge
In the last few years, deep multi-agent reinforcement learning (RL) has ...
Stable Opponent Shaping in Differentiable Games
A growing number of learning methods are actually games which optimise m...
Learning from Demonstration in the Wild
Learning from demonstration (LfD) is useful in settings where hand-codin...
Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
When observing the actions of others, humans carry out inferences about ...
VIREL: A Variational Inference Framework for Reinforcement Learning
Applying probabilistic models to reinforcement learning (RL) has become ...
Multi-Agent Common Knowledge Reinforcement Learning
In multi-agent reinforcement learning, centralised policies can only be ...
CAML: Fast Context Adaptation via Meta-Learning
We propose CAML, a meta-learning method for fast adaptation that partiti...
Deep Variational Reinforcement Learning for POMDPs
Many real-world sequential decision making problems are partially observ...
Contextual Policy Optimisation
Policy gradient methods have been successfully applied to a variety of r...
TACO: Learning Task Decomposition via Temporal Alignment for Control
Many advanced Learning from Demonstration (LfD) methods consider the dec...
Fourier Policy Gradients
We propose a new way of deriving policy gradient updates for reinforceme...
DiCE: The Infinitely Differentiable Monte-Carlo Estimator
The score function estimator is widely used for estimating gradients of ...
Expected Policy Gradients for Reinforcement Learning
We propose expected policy gradients (EPG), which unify stochastic polic...
TreeQN and ATreeC: Differentiable Tree Planning for Deep Reinforcement Learning
Combining deep model-free reinforcement learning with online planning i...
Learning with Opponent-Learning Awareness
Multi-agent settings are quickly gathering importance in machine learnin...
Counterfactual Multi-Agent Policy Gradients
Cooperative multi-agent systems can be naturally used to model many real...
Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning
Many real-world problems, such as network packet routing and urban traff...
LipNet: End-to-End Sentence-level Lipreading
Lipreading is the task of decoding text from the movement of a speaker's...
