
Problem Dependent View on Structured Thresholding Bandit Problems
We investigate the problem-dependent regime in the stochastic Thresholdi...

Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall
We study the problem of learning a Nash equilibrium (NE) in an imperfect...

Bandits with many optimal arms
We consider a stochastic bandit problem with a possibly infinite number ...

UCB Momentum Q-learning: Correcting the bias without forgetting
We propose UCBMQ, Upper Confidence Bound Momentum Q-learning, a new algo...

Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited
In this paper, we propose new problem-independent lower bounds on the sa...

Fast active learning for pure exploration in reinforcement learning
Realistic environments often provide agents with very limited feedback. ...

A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces
In this work, we propose KeRNS: an algorithm for episodic reinforcement ...

Optimal Strategies for Graph-Structured Bandits
We study a structured variant of the multi-armed bandit problem specifie...

Gamification of Pure Exploration for Linear Bandits
We investigate an active pure-exploration setting, that includes best-ar...

Forced-exploration free Strategies for Unimodal Bandits
We consider a multi-armed bandit problem specified by a set of Gaussian ...

The Influence of Shape Constraints on the Thresholding Bandit Problem
We investigate the stochastic Thresholding Bandit problem (TBP) under se...

Adaptive Reward-Free Exploration
Reward-free exploration is a reinforcement learning setting recently stu...

Planning in Markov Decision Processes with Gap-Dependent Sample Complexity
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algo...

Regret Bounds for Kernel-Based Reinforcement Learning
We consider the exploration-exploitation dilemma in finite-horizon reinf...

Fixed-Confidence Guarantees for Bayesian Best-Arm Identification
We investigate and provide new insights on the sampling rule called Top...

Non-Asymptotic Pure Exploration by Solving Games
Pure exploration (aka active testing) is the fundamental task of sequent...

Gradient Ascent for Active Exploration in Bandit Problems
We present a new algorithm based on gradient ascent for a general Act...

KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints
In the context of K-armed stochastic bandits with distribution only assu...

Thresholding Bandit for Dose-ranging: The Impact of Monotonicity
We analyze the sample complexity of the thresholding bandit problem, wit...

A minimax and asymptotically optimal algorithm for stochastic bandits
We propose the kl-UCB++ algorithm for regret minimization in stochastic...

Pierre Ménard