Remi Munos

research

∙ 09/01/2023

Local and adaptive mirror descents in extensive-form games

We study how to learn ϵ-optimal strategies in zero-sum imperfect informa...

0 Côme Fiegel, et al. ∙

research

∙ 05/29/2023

VA-learning as a more efficient alternative to Q-learning

In reinforcement learning, the advantage function is critical for policy...

0 Yunhao Tang, et al. ∙

research

∙ 05/29/2023

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

Multi-step learning applies lookahead over multiple time steps and has p...

0 Yunhao Tang, et al. ∙

research

∙ 05/29/2023

Towards a Better Understanding of Representation Dynamics under TD-learning

TD-learning is a foundational reinforcement learning (RL) algorithm for va...

0 Yunhao Tang, et al. ∙

research

∙ 05/28/2023

The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation

We study the problem of temporal-difference-based policy evaluation in r...

0 Mark Rowland, et al. ∙

research

∙ 05/22/2023

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

Mirror descent value iteration (MDVI), an abstraction of Kullback-Leible...

0 Toshinori Kitamura, et al. ∙

research

∙ 05/01/2023

Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition

Representation learning and exploration are among the key challenges for...

0 Yash Chandak, et al. ∙

research

∙ 03/14/2023

Fast Rates for Maximum Entropy Exploration

We consider the reinforcement learning (RL) setting, in which the agent ...

0 Daniil Tiapkin, et al. ∙

research

∙ 01/11/2023

An Analysis of Quantile Temporal-Difference Learning

We analyse quantile temporal-difference learning (QTD), a distributional...

0 Mark Rowland, et al. ∙

research

∙ 12/23/2022

Adapting to game trees in zero-sum imperfect information games

Imperfect information games (IIG) are games in which each player only pa...

0 Côme Fiegel, et al. ∙

research

∙ 12/06/2022

Understanding Self-Predictive Learning for Reinforcement Learning

We study the learning dynamics of self-predictive learning for reinforce...

0 Yunhao Tang, et al. ∙

research

∙ 11/18/2022

Curiosity in hindsight

Consider the exploration in sparse-reward or reward-free environments, s...

0 Daniel Jarrett, et al. ∙

research

∙ 09/28/2022

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

We consider reinforcement learning in an environment modeled by an episo...

0 Daniil Tiapkin, et al. ∙

research

∙ 07/15/2022

The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning

We study the multi-step off-policy learning approach to distributional R...

0 Yunhao Tang, et al. ∙

research

∙ 06/30/2022

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

We introduce DeepNash, an autonomous agent capable of learning to play t...

6 Julien Perolat, et al. ∙

research

∙ 06/16/2022

BYOL-Explore: Exploration by Bootstrapped Prediction

We present BYOL-Explore, a conceptually simple yet general approach for ...

0 Zhaohan Daniel Guo, et al. ∙

research

∙ 05/27/2022

KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal

In this work, we consider and analyze the sample complexity of model-fre...

6 Tadashi Kozuno, et al. ∙

research

∙ 03/30/2022

Marginalized Operators for Off-policy Reinforcement Learning

In this work, we propose marginalized operators, a new class of off-poli...

0 Yunhao Tang, et al. ∙

research

∙ 06/24/2021

Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation

Model-agnostic meta-reinforcement learning requires estimating the Hessi...

0 Yunhao Tang, et al. ∙

research

∙ 06/11/2021

Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall

We study the problem of learning a Nash equilibrium (NE) in an imperfect...

0 Tadashi Kozuno, et al. ∙

research

∙ 06/11/2021

Taylor Expansion of Discount Factors

In practical reinforcement learning (RL), the discount factor used for e...

0 Yunhao Tang, et al. ∙

research

∙ 06/07/2021

Concave Utility Reinforcement Learning: the Mean-field Game viewpoint

Concave Utility Reinforcement Learning (CURL) extends RL from linear to ...

0 Matthieu Geist, et al. ∙

research

∙ 02/27/2021

Revisiting Peng's Q(λ) for Modern Reinforcement Learning

Off-policy multi-step reinforcement learning algorithms consist of conse...

0 Tadashi Kozuno, et al. ∙

research

∙ 02/12/2021

Bootstrapped Representation Learning on Graphs

Current state-of-the-art self-supervised learning methods for graph neur...

0 Shantanu Thakoor, et al. ∙

research

∙ 01/06/2021

Geometric Entropic Exploration

Exploration is essential for solving complex Reinforcement Learning (RL)...

0 Zhaohan Daniel Guo, et al. ∙

research

∙ 11/18/2020

Counterfactual Credit Assignment in Model-Free Reinforcement Learning

Credit assignment in reinforcement learning is the problem of measuring ...

8 Thomas Mesnard, et al. ∙

research

∙ 11/18/2020

Game Plan: What AI can do for Football, and What Football can do for AI

The rapid progress in artificial intelligence (AI) and machine learning ...

11 Karl Tuyls, et al. ∙

research

∙ 08/27/2020

The Advantage Regret-Matching Actor-Critic

Regret minimization has played a key role in online learning, equilibriu...

0 Audrūnas Gruslys, et al. ∙

research

∙ 07/24/2020

Monte-Carlo Tree Search as Regularized Policy Optimization

The combination of Monte-Carlo tree search (MCTS) with deep reinforcemen...

5 Jean-Bastien Grill, et al. ∙

research

∙ 06/13/2020

Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning

We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-su...

0 Jean-Bastien Grill, et al. ∙

research

∙ 05/04/2020

Navigating the Landscape of Multiplayer Games to Probe the Drosophila of AI

Multiplayer games have a long history in being used as key testbeds for ...

11 Shayegan Omidshafiei, et al. ∙

research

∙ 04/30/2020

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning

Learning a good representation is an essential component for deep reinfo...

0 Daniel Guo, et al. ∙

research

∙ 03/31/2020

Leverage the Average: an Analysis of Regularization in RL

Building upon the formalism of regularized Markov decision processes, we...

7 Nino Vieillard, et al. ∙

research

∙ 03/13/2020

Taylor Expansion Policy Optimization

In this work, we investigate the application of Taylor expansions in rei...

37 Yunhao Tang, et al. ∙

research

∙ 02/19/2020

From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization

In this paper we investigate the Follow the Regularized Leader dynamics ...

32 Julien Perolat, et al. ∙

research

∙ 12/05/2019

Hindsight Credit Assignment

We consider the problem of efficient credit assignment in reinforcement ...

0 Anna Harutyunyan, et al. ∙

research

∙ 10/16/2019

Conditional Importance Sampling for Off-Policy Learning

The principal contribution of this paper is a conceptual framework for o...

12 Mark Rowland, et al. ∙

research

∙ 10/16/2019

Adaptive Trade-Offs in Off-Policy Learning

A great variety of off-policy learning algorithms exist in the literatur...

0 Mark Rowland, et al. ∙

research

∙ 09/27/2019

A Generalized Training Approach for Multiagent Learning

This paper investigates a population-based training regime based on game...

20 Paul Müller, et al. ∙

research

∙ 09/21/2019

Multiagent Evaluation under Incomplete Information

This paper investigates the evaluation of learned multiagent strategies ...

24 Mark Rowland, et al. ∙

research

∙ 06/01/2019

Neural Replicator Dynamics

In multiagent learning, agents interact in inherently nonstationary envi...

12 Shayegan Omidshafiei, et al. ∙

research

∙ 03/04/2019

α-Rank: Multi-Agent Evaluation by Evolution

We introduce α-Rank, a principled evolutionary dynamics methodology, for...

0 Shayegan Omidshafiei, et al. ∙

research

∙ 02/26/2019

The Termination Critic

In this work, we consider the problem of autonomously discovering behavi...

10 Anna Harutyunyan, et al. ∙

research

∙ 02/21/2019

Statistics and Samples in Distributional Reinforcement Learning

We present a unifying framework for designing and analysing distribution...

64 Mark Rowland, et al. ∙

research

∙ 02/20/2019

World Discovery Models

As humans we are driven by a strong desire for seeking novelty in our wo...

0 Mohammad Gheshlaghi Azar, et al. ∙

research

∙ 01/30/2019

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement

The ability to transfer skills across tasks has the potential to scale u...

12 Andre Barreto, et al. ∙

research

∙ 01/15/2019

Optimistic optimization of a Brownian

We address the problem of optimizing a Brownian motion. We consider a (r...

0 Jean-Bastien Grill, et al. ∙

research

∙ 12/18/2018

Universal Successor Features Approximators

The ability of a reinforcement learning (RL) agent to learn about many r...

6 Diana Borsa, et al. ∙

research

∙ 11/15/2018

Neural Predictive Belief Representations

Unsupervised representation learning has succeeded with excellent result...

0 Zhaohan Daniel Guo, et al. ∙

research

∙ 10/21/2018

Actor-Critic Policy Optimization in Partially Observable Multiagent Environments

Optimization of parameterized policies for reinforcement learning (RL) i...

8 Sriram Srinivasan, et al. ∙

