
-
Improved Regret Bound and Experience Replay in Regularized Policy Iteration
In this work, we study algorithms for learning in infinite-horizon undis...
-
Optimization Issues in KL-Constrained Approximate Policy Iteration
Many reinforcement learning algorithms can be seen as versions of approx...
-
On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function
We consider the problem of local planning in fixed-horizon Markov Decisi...
-
The Elliptical Potential Lemma Revisited
This note proposes a new proof and new perspectives on the so-called Ell...
-
Regret Balancing for Bandit and RL Model Selection
We consider model selection in stochastic bandit and reinforcement learn...
-
Sample Efficient Graph-Based Optimization with Noisy Observations
We study sample complexity of optimizing "hill-climbing friendly" functi...
-
Model Selection in Contextual Stochastic Bandit Problems
We study model selection in stochastic bandit problems. Our approach rel...
-
Provably Efficient Adaptive Approximate Policy Iteration
Model-free reinforcement learning algorithms combined with value functio...
-
Exploration-Enhanced POLITEX
We study algorithms for average-cost reinforcement learning problems wit...
-
Thompson Sampling and Approximate Inference
We study the effects of approximate inference on the performance of Thom...
-
Bootstrapping Upper Confidence Bound
Upper Confidence Bound (UCB) method is arguably the most celebrated one ...
-
Large-Scale Markov Decision Problems via the Linear Programming Dual
We consider the problem of controlling a fully specified Markov decision...
-
New Insights into Bootstrapping for Bandits
We investigate the use of bootstrapping in the bandit setting. We first ...
-
Sharp Convergence Rates for Langevin Dynamics in the Nonconvex Setting
We study the problem of sampling from a distribution where the negative ...
-
Offline Evaluation of Ranking Policies with Click Models
Many web systems rank and present a list of items to users, from recomme...
-
Regret Bounds for Model-Free Linear Quadratic Control
Model-free approaches for reinforcement learning (RL) and continuous con...
-
Optimizing over a Restricted Policy Class in Markov Decision Processes
We address the problem of finding an optimal policy in a Markov decision...
-
A Continuation Method for Discrete Optimization and its Application to Nearest Neighbor Classification
The continuation method is a popular approach in non-convex optimization...
-
Stochastic Low-Rank Bandits
Many problems in computer vision and recommender systems involve low-ran...
-
Posterior Sampling for Large Scale Reinforcement Learning
Posterior sampling for reinforcement learning (PSRL) is a popular algori...
-
Conservative Contextual Linear Bandits
Safety is a desirable property that can immensely increase the applicabi...
-
Hit-and-Run for Sampling and Planning in Non-Convex Spaces
We propose the Hit-and-Run algorithm for planning and sampling problems ...
-
Online learning in MDPs with side information
We study online learning of finite Markov decision process (MDP) problem...
-
Linear Programming for Large-Scale Markov Decision Problems
We consider the problem of controlling a Markov decision process (MDP) w...
-
Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
We study the problem of learning Markov decision processes with finite s...
-
Improved Mean and Variance Approximations for Belief Net Responses via Network Doubling
A Bayesian belief network models a joint distribution with a directed a...
-
Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems
The analysis of online least squares estimation is at the heart of many ...