Odalric-Ambrym Maillard

research

∙ 09/19/2023

Monte-Carlo tree search with uncertainty propagation via optimal transport

This paper introduces a novel backup strategy for Monte-Carlo Tree Searc...

Tuan Dam, et al.

research

∙ 06/19/2023

AdaStop: sequential testing for efficient and reliable comparisons of Deep RL Agents

The reproducibility of many experimental results in Deep Reinforcement L...

Timothée Mathieu, et al.

research

∙ 10/05/2022

Bilinear Exponential Family of MDPs: Frequentist Regret Bound with Tractable Exploration and Planning

We study the problem of episodic reinforcement learning in continuous st...

Reda Ouhamma, et al.

research

∙ 09/15/2022

Risk-aware linear bandits with convex loss

In decision-making problems such as the multi-armed bandit, an agent lea...

Patrick Saux, et al.

research

∙ 08/24/2022

Collaborative Algorithms for Online Personalized Mean Estimation

We consider an online estimation problem involving a set of agents. Each...

Mahsa Asadi, et al.

research

∙ 03/07/2022

Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithm

In this paper, we study the stochastic bandits problem with k unknown he...

Debabrota Basu, et al.

research

∙ 01/18/2022

Bregman Deviations of Generic Exponential Families

We revisit the method of mixture technique, also known as the Laplace me...

Sayak Ray Chowdhury, et al.

research

∙ 12/02/2021

Indexed Minimum Empirical Divergence for Unimodal Bandits

We consider a multi-armed bandit problem specified by a set of one-dimen...

Hassan Saber, et al.

research

∙ 11/18/2021

From Optimality to Robustness: Dirichlet Sampling Strategies in Stochastic Bandits

The stochastic multi-arm bandit problem has been extensively studied und...

Dorian Baudry, et al.

research

∙ 10/27/2020

Sub-sampling for Efficient Non-Parametric Bandit Exploration

In this paper we propose the first multi-armed bandit algorithm based on...

Dorian Baudry, et al.

research

∙ 10/09/2020

Is Standard Deviation the New Standard? Revisiting the Critic in Deep Policy Gradients

Policy gradient algorithms have proven to be successful in diverse decis...

Yannis Flet-Berliac, et al.

research

∙ 09/09/2020

Improved Exploration in Factored Average-Reward MDPs

We consider a regret minimization task under the average-reward criterio...

Mohammad Sadegh Talebi, et al.

research

∙ 07/07/2020

Optimal Strategies for Graph-Structured Bandits

We study a structured variant of the multi-armed bandit problem specifie...

Hassan Saber, et al.

research

∙ 06/30/2020

Forced-exploration free Strategies for Unimodal Bandits

We consider a multi-armed bandit problem specified by a set of Gaussian ...

Hassan Saber, et al.

research

∙ 04/20/2020

Tightening Exploration in Upper Confidence Reinforcement Learning

The upper confidence reinforcement learning (UCRL2) strategy introduced ...

Hippolyte Bourel, et al.

research

∙ 02/25/2020

Robust Estimation, Prediction and Control with Linear Dynamics and Generic Costs

We develop a framework for the adaptive model predictive control of a li...

Edouard Leurent, et al.

research

∙ 10/09/2019

Model-Based Reinforcement Learning Exploiting State-Action Equivalence

Leveraging an equivalence property in the state-space of a Markov Decisi...

Mahsa Asadi, et al.

research

∙ 05/30/2019

Distribution-dependent and Time-uniform Bounds for Piecewise i.i.d Bandits

We consider the setup of stochastic multi-armed bandits in the case when...

Subhojyoti Mukherjee, et al.

research

∙ 05/27/2019

Learning Multiple Markov Chains via Adaptive Allocation

We study the problem of learning the transition matrices of a set of Mar...

M. Sadegh Talebi, et al.

research

∙ 04/09/2019

Practical Open-Loop Optimistic Planning

We consider the problem of online planning in a Markov Decision Process ...

Edouard Leurent, et al.

research

∙ 03/01/2019

Approximate Robust Control of Uncertain Dynamical Systems

This work studies the design of safe control policies for large-scale no...

Edouard Leurent, et al.

research

∙ 03/05/2018

Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs

The problem of reinforcement learning in an unknown and discrete Markov ...

Mohammad Sadegh Talebi, et al.

research

∙ 08/31/2017

Efficient tracking of a growing number of experts

We consider a variation on the problem of prediction with expert advice,...

Jaouad Mourtada, et al.

research

∙ 08/02/2017

Streaming kernel regression with provably adaptive mean, variance, and regularization

We consider the problem of streaming kernel regression, when the observa...

Audrey Durand, et al.

research

∙ 05/24/2017

Boundary Crossing Probabilities for General Exponential Families

We consider parametric exponential families of dimension K on the real l...

Odalric-Ambrym Maillard, et al.

research

∙ 09/07/2016

Random Shuffling and Resets for the Non-stationary Stochastic Bandit Problem

We consider a non-stationary formulation of the stochastic multi-armed b...

Robin Allesiardo, et al.
