Shimon Whiteson

research

∙ 08/24/2023

Bayesian Exploration Networks

Bayesian reinforcement learning (RL) offers a principled and elegant app...

0 Mattie Fellows, et al. ∙

research

∙ 05/19/2023

The Waymo Open Sim Agents Challenge

Simulation with realistic, interactive agents represents a key task for ...

0 Nico Montali, et al. ∙

research

∙ 03/19/2023

Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning

By enabling agents to communicate, recent cooperative multi-agent reinfo...

1 Yat Long Lo, et al. ∙

research

∙ 02/24/2023

Why Target Networks Stabilise Temporal Difference Methods

Integral to recent successes in deep reinforcement learning has been a c...

0 Mattie Fellows, et al. ∙

research

∙ 02/22/2023

Universal Morphology Control via Contextual Modulation

Learning a universal policy across different robot morphologies can sign...

0 Zheng Xiong, et al. ∙

research

∙ 02/15/2023

Trust-Region-Free Policy Optimization for Stochastic Policies

Trust Region Policy Optimization (TRPO) is an iterative method that simu...

0 Mingfei Sun, et al. ∙

research

∙ 01/19/2023

A Survey of Meta-Reinforcement Learning

While deep reinforcement learning (RL) has fueled multiple high-profile ...

0 Jacob Beck, et al. ∙

research

∙ 12/21/2022

Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios

Imitation learning (IL) is a simple and powerful way to use high-quality...

0 Yiren Lu, et al. ∙

research

∙ 12/14/2022

SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning

The availability of challenging benchmarks has played a key role in the ...

0 Benjamin Ellis, et al. ∙

research

∙ 12/14/2022

Particle-Based Score Estimation for State Space Model Learning in Autonomous Driving

Multi-object state estimation is a fundamental problem for robotic appli...

2 Angad Singh, et al. ∙

research

∙ 12/02/2022

Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula

ML-based motion planning is a promising approach to produce agents that ...

0 Eli Bronstein, et al. ∙

research

∙ 10/21/2022

Equivariant Networks for Zero-Shot Coordination

Successful coordination in Dec-POMDPs requires agents to adopt robust st...

1 Darius Muglich, et al. ∙

research

∙ 10/20/2022

Hypernetworks in Meta-Reinforcement Learning

Training a reinforcement learning (RL) agent on a real-world robotics ta...

0 Jacob Beck, et al. ∙

research

∙ 10/18/2022

Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving

We demonstrate the first large-scale application of model-based generati...

0 Eli Bronstein, et al. ∙

research

∙ 09/22/2022

An Investigation of the Bias-Variance Tradeoff in Meta-Gradients

Meta-gradients provide a general approach for optimizing the meta-parame...

0 Risto Vuorio, et al. ∙

research

∙ 06/26/2022

Generalized Beliefs for Cooperative AI

Self-play is a common paradigm for constructing solutions in Markov game...

2 Darius Muglich, et al. ∙

research

∙ 05/06/2022

Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation

Simulation is a crucial tool for accelerating the development of autonom...

8 Maximilian Igl, et al. ∙

research

∙ 01/31/2022

Monotonic Improvement Guarantees under Non-stationarity for Decentralized PPO

We present a new monotonic improvement guarantee for optimizing decentra...

0 Mingfei Sun, et al. ∙

research

∙ 01/31/2022

You May Not Need Ratio Clipping in PPO

Proximal Policy Optimization (PPO) methods learn a policy by iteratively...

0 Mingfei Sun, et al. ∙

research

∙ 01/11/2022

In Defense of the Unitary Scalarization for Deep Multi-Task Learning

Recent multi-task learning research argues against unitary scalarization...

0 Vitaly Kurin, et al. ∙

research

∙ 12/11/2021

Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency

Sample efficiency is crucial for imitation learning methods to be applic...

0 Mingfei Sun, et al. ∙

research

∙ 12/01/2021

On the Practical Consistency of Meta-Reinforcement Learning Algorithms

Consistency is the theoretical property of a meta learning algorithm tha...

0 Zheng Xiong, et al. ∙

research

∙ 10/27/2021

Reinforcement Learning in Factored Action Spaces using Tensor Decompositions

We present an extended abstract for the previously published work TESSER...

4 Anuj Mahajan, et al. ∙

research

∙ 10/27/2021

Model based Multi-agent Reinforcement Learning with Tensor Decompositions

A challenge in multi-agent reinforcement learning is to be able to gener...

0 Pascal Van Der Vaart, et al. ∙

research

∙ 08/11/2021

Truncated Emphatic Temporal Difference Methods for Prediction and Control

Emphatic Temporal Difference (TD) methods are a class of off-policy Rein...

0 Shangtong Zhang, et al. ∙

research

∙ 07/17/2021

Implicit Communication as Minimum Entropy Coupling

In many common-payoff games, achieving good performance requires players...

3 Samuel Sokota, et al. ∙

research

∙ 06/09/2021

Bayesian Bellman Operators

We introduce a novel perspective on Bayesian reinforcement learning (RL)...

0 Matthew Fellows, et al. ∙

research

∙ 06/06/2021

SoftDICE for Imitation Learning: Rethinking Off-policy Distribution Matching

We present SoftDICE, which achieves state-of-the-art performance for imi...

0 Mingfei Sun, et al. ∙

research

∙ 05/31/2021

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning

Reinforcement Learning in large action spaces is a challenging problem. ...

33 Anuj Mahajan, et al. ∙

research

∙ 03/22/2021

Regularized Softmax Deep Multi-Agent Q-Learning

Tackling overestimation in Q-learning is an important problem that has b...

0 Ling Pan, et al. ∙

research

∙ 03/01/2021

Snowflake: Scaling GNNs to High-Dimensional Continuous Control via Parameter Freezing

Recent research has shown that Graph Neural Networks (GNNs) can learn po...

8 Charlie Blake, et al. ∙

research

∙ 01/21/2021

Breaking the Deadly Triad with a Target Network

The deadly triad refers to the instability of a reinforcement learning a...

0 Shangtong Zhang, et al. ∙

research

∙ 01/11/2021

Deep Interactive Bayesian Reinforcement Learning via Meta-Learning

Agents that interact with other agents often do not know a priori what t...

0 Luisa Zintgraf, et al. ∙

research

∙ 01/08/2021

Average-Reward Off-Policy Policy Evaluation with Function Approximation

We consider off-policy policy evaluation with function approximation (FA...

0 Shangtong Zhang, et al. ∙

research

∙ 10/06/2020

UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning

This paper focuses on cooperative value-based multi-agent reinforcement ...

4 Tarun Gupta, et al. ∙

research

∙ 10/05/2020

My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control

Multitask Reinforcement Learning is a promising way to obtain models wit...

7 Vitaly Kurin, et al. ∙

research

∙ 10/02/2020

A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms

We investigate the discounting mismatch in actor-critic algorithm implem...

1 Shangtong Zhang, et al. ∙

research

∙ 10/02/2020

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Meta-learning is a powerful tool for learning policies that can adapt ef...

4 Luisa Zintgraf, et al. ∙

research

∙ 09/21/2020

Exploiting Submodular Value Functions For Scaling Up Active Perception

In active perception tasks, an agent aims to select sensory actions that...

5 Yash Satsangi, et al. ∙

research

∙ 07/17/2020

WordCraft: An Environment for Benchmarking Commonsense Agents

The ability to quickly solve a wide range of real-world tasks requires a...

17 Minqi Jiang, et al. ∙

research

∙ 07/09/2020

Learning Retrospective Knowledge with Reverse Reinforcement Learning

We present a Reverse Reinforcement Learning (Reverse RL) approach for re...

25 Shangtong Zhang, et al. ∙

research

∙ 06/18/2020

Weighted QMIX: Expanding Monotonic Value Function Factorisation

QMIX is a popular Q-learning algorithm for cooperative MARL in the centr...

0 Tabish Rashid, et al. ∙

research

∙ 06/10/2020

The Impact of Non-stationarity on Generalisation in Deep Reinforcement Learning

Non-stationarity arises in Reinforcement Learning (RL) even in stationar...

13 Maximilian Igl, et al. ∙

research

∙ 06/07/2020

AI-QMIX: Attention and Imagination for Dynamic Multi-Agent Reinforcement Learning

Real world multi-agent tasks often involve varying types and quantities ...

9 Shariq Iqbal, et al. ∙

research

∙ 05/19/2020

Privileged Information Dropout in Reinforcement Learning

Using privileged information during training can improve the sample effi...

14 Pierre-Alexandre Kamienny, et al. ∙

research

∙ 05/11/2020

Maximizing Information Gain in Partially Observable Environments via Prediction Reward

Information gathering in a partially observable environment can be formu...

32 Yash Satsangi, et al. ∙

research

∙ 04/22/2020

Per-Step Reward: A New Perspective for Risk-Averse Reinforcement Learning

We present a new per-step reward perspective for risk-averse control in ...

15 Shangtong Zhang, et al. ∙

research

∙ 03/19/2020

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

In many real-world settings, a team of agents must coordinate its behavi...

1 Tabish Rashid, et al. ∙

research

∙ 03/14/2020

Deep Multi-Agent Reinforcement Learning for Decentralized Continuous Cooperative Control

Deep multi-agent reinforcement learning (MARL) holds the promise of auto...

13 Christian Schroeder de Witt, et al. ∙

research

∙ 02/26/2020

Optimistic Exploration even with a Pessimistic Initialisation

Optimistic initialisation is an effective strategy for efficient explora...

5 Tabish Rashid, et al. ∙
