
Sub-sampling for Efficient Non-Parametric Bandit Exploration
In this paper we propose the first multi-armed bandit algorithm based on...
Is Standard Deviation the New Standard? Revisiting the Critic in Deep Policy Gradients
Policy gradient algorithms have proven to be successful in diverse decis...
Improved Exploration in Factored Average-Reward MDPs
We consider a regret minimization task under the average-reward criterio...
Optimal Strategies for Graph-Structured Bandits
We study a structured variant of the multi-armed bandit problem specifie...
Forced-exploration free Strategies for Unimodal Bandits
We consider a multi-armed bandit problem specified by a set of Gaussian ...
Tightening Exploration in Upper Confidence Reinforcement Learning
The upper confidence reinforcement learning (UCRL2) strategy introduced ...
Robust Estimation, Prediction and Control with Linear Dynamics and Generic Costs
We develop a framework for the adaptive model predictive control of a li...
Model-Based Reinforcement Learning Exploiting State-Action Equivalence
Leveraging an equivalence property in the state-space of a Markov Decisi...
Distribution-dependent and Time-uniform Bounds for Piecewise i.i.d Bandits
We consider the setup of stochastic multi-armed bandits in the case when...
Learning Multiple Markov Chains via Adaptive Allocation
We study the problem of learning the transition matrices of a set of Mar...
Practical Open-Loop Optimistic Planning
We consider the problem of online planning in a Markov Decision Process ...
Approximate Robust Control of Uncertain Dynamical Systems
This work studies the design of safe control policies for large-scale no...
Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs
The problem of reinforcement learning in an unknown and discrete Markov ...
Efficient tracking of a growing number of experts
We consider a variation on the problem of prediction with expert advice,...
Streaming kernel regression with provably adaptive mean, variance, and regularization
We consider the problem of streaming kernel regression, when the observa...
Boundary Crossing Probabilities for General Exponential Families
We consider parametric exponential families of dimension K on the real l...
Random Shuffling and Resets for the Non-stationary Stochastic Bandit Problem
We consider a non-stationary formulation of the stochastic multi-armed b...
Odalric-Ambrym Maillard
