
Sub-sampling for Efficient Non-Parametric Bandit Exploration
In this paper we propose the first multi-armed bandit algorithm based on...

Is Standard Deviation the New Standard? Revisiting the Critic in Deep Policy Gradients
Policy gradient algorithms have proven to be successful in diverse decis...

Improved Exploration in Factored Average-Reward MDPs
We consider a regret minimization task under the average-reward criterio...

Optimal Strategies for Graph-Structured Bandits
We study a structured variant of the multi-armed bandit problem specifie...

Forced-exploration free Strategies for Unimodal Bandits
We consider a multi-armed bandit problem specified by a set of Gaussian ...

Tightening Exploration in Upper Confidence Reinforcement Learning
The upper confidence reinforcement learning (UCRL2) strategy introduced ...

Robust Estimation, Prediction and Control with Linear Dynamics and Generic Costs
We develop a framework for the adaptive model predictive control of a li...

Model-Based Reinforcement Learning Exploiting State-Action Equivalence
Leveraging an equivalence property in the state-space of a Markov Decisi...

Distribution-dependent and Time-uniform Bounds for Piecewise i.i.d Bandits
We consider the setup of stochastic multi-armed bandits in the case when...

Learning Multiple Markov Chains via Adaptive Allocation
We study the problem of learning the transition matrices of a set of Mar...

Practical Open-Loop Optimistic Planning
We consider the problem of online planning in a Markov Decision Process ...

Approximate Robust Control of Uncertain Dynamical Systems
This work studies the design of safe control policies for large-scale no...

Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs
The problem of reinforcement learning in an unknown and discrete Markov ...

Efficient tracking of a growing number of experts
We consider a variation on the problem of prediction with expert advice,...

Streaming kernel regression with provably adaptive mean, variance, and regularization
We consider the problem of streaming kernel regression, when the observa...

Boundary Crossing Probabilities for General Exponential Families
We consider parametric exponential families of dimension K on the real l...

Random Shuffling and Resets for the Non-stationary Stochastic Bandit Problem
We consider a non-stationary formulation of the stochastic multi-armed b...
Odalric-Ambrym Maillard