
Improved Regret Bound and Experience Replay in Regularized Policy Iteration
In this work, we study algorithms for learning in infinite-horizon undis...

Optimization Issues in KL-Constrained Approximate Policy Iteration
Many reinforcement learning algorithms can be seen as versions of approx...

On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function
We consider the problem of local planning in fixed-horizon Markov Decisi...

The Elliptical Potential Lemma Revisited
This note proposes a new proof and new perspectives on the so-called Ell...

Regret Balancing for Bandit and RL Model Selection
We consider model selection in stochastic bandit and reinforcement learn...

Sample Efficient Graph-Based Optimization with Noisy Observations
We study sample complexity of optimizing "hill-climbing friendly" functi...

Model Selection in Contextual Stochastic Bandit Problems
We study model selection in stochastic bandit problems. Our approach rel...

Provably Efficient Adaptive Approximate Policy Iteration
Model-free reinforcement learning algorithms combined with value functio...

Exploration-Enhanced POLITEX
We study algorithms for average-cost reinforcement learning problems wit...

Thompson Sampling and Approximate Inference
We study the effects of approximate inference on the performance of Thom...

Bootstrapping Upper Confidence Bound
Upper Confidence Bound (UCB) method is arguably the most celebrated one ...

Large-Scale Markov Decision Problems via the Linear Programming Dual
We consider the problem of controlling a fully specified Markov decision...

New Insights into Bootstrapping for Bandits
We investigate the use of bootstrapping in the bandit setting. We first ...

Sharp Convergence Rates for Langevin Dynamics in the Nonconvex Setting
We study the problem of sampling from a distribution where the negative ...

Offline Evaluation of Ranking Policies with Click Models
Many web systems rank and present a list of items to users, from recomme...

Regret Bounds for Model-Free Linear Quadratic Control
Model-free approaches for reinforcement learning (RL) and continuous con...

Optimizing over a Restricted Policy Class in Markov Decision Processes
We address the problem of finding an optimal policy in a Markov decision...

A Continuation Method for Discrete Optimization and its Application to Nearest Neighbor Classification
The continuation method is a popular approach in nonconvex optimization...

Stochastic Low-Rank Bandits
Many problems in computer vision and recommender systems involve low-ran...

Posterior Sampling for Large Scale Reinforcement Learning
Posterior sampling for reinforcement learning (PSRL) is a popular algori...

Conservative Contextual Linear Bandits
Safety is a desirable property that can immensely increase the applicabi...

Hit-and-Run for Sampling and Planning in Non-Convex Spaces
We propose the Hit-and-Run algorithm for planning and sampling problems ...

Online learning in MDPs with side information
We study online learning of finite Markov decision process (MDP) problem...

Linear Programming for Large-Scale Markov Decision Problems
We consider the problem of controlling a Markov decision process (MDP) w...

Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
We study the problem of learning Markov decision processes with finite s...

Improved Mean and Variance Approximations for Belief Net Responses via Network Doubling
A Bayesian belief network models a joint distribution with a directed a...

Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems
The analysis of online least squares estimation is at the heart of many ...