Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation
We study reinforcement learning (RL) with linear function approximation. Existing algorithms for this problem have only high-probability regret and/or Probably Approximately Correct (PAC) sample-complexity guarantees, which cannot guarantee convergence to the optimal policy. To overcome this limitation, we propose a new algorithm, FLUTE, which enjoys uniform-PAC convergence to the optimal policy with high probability. The uniform-PAC guarantee is the strongest guarantee for reinforcement learning in the literature: it directly implies both PAC and high-probability regret bounds, making our algorithm superior to all existing algorithms with linear function approximation. At the core of our algorithm are a novel minimax value-function estimator and a multi-level partition scheme for selecting training samples from historical observations. Both techniques are new and of independent interest.
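To make the distinction concrete, the uniform-PAC criterion bounds, for every accuracy level ε simultaneously, the number of episodes whose suboptimality gap exceeds ε. The sketch below (illustrative only, not the paper's FLUTE algorithm) counts ε-suboptimal episodes for a hypothetical run whose per-episode gaps shrink like 1/√k, and contrasts that with the single fixed ε of an ordinary PAC guarantee and with cumulative regret:

```python
# Illustrative sketch, not the paper's algorithm: the uniform-PAC
# criterion controls N(eps) = #{episodes with suboptimality gap > eps}
# for ALL eps > 0 at once, which implies both a PAC bound (any fixed
# eps) and a regret bound (sum of all gaps).

def mistakes_above(gaps, eps):
    """Number of episodes whose suboptimality gap exceeds eps."""
    return sum(1 for g in gaps if g > eps)

# Hypothetical per-episode gaps: shrinking like 1/sqrt(k), so the run
# converges to the optimal policy.
gaps = [1.0 / (k + 1) ** 0.5 for k in range(10_000)]

# A plain PAC guarantee fixes one eps in advance; uniform-PAC bounds
# the count for every eps simultaneously. For gaps ~ 1/sqrt(k) the
# count scales like 1/eps^2.
for eps in (0.5, 0.1, 0.05):
    print(f"eps={eps}: {mistakes_above(gaps, eps)} eps-suboptimal episodes")

# Cumulative regret is the sum of all gaps; a uniform-PAC bound implies
# a high-probability regret bound, but not vice versa: a regret bound
# alone allows infinitely many eps-suboptimal episodes for small eps.
regret = sum(gaps)
print(f"cumulative regret: {regret:.1f}")
```

Note that the converse fails: sublinear regret permits the gap to exceed any fixed ε infinitely often (ever more rarely), which is exactly the failure mode the uniform-PAC guarantee rules out.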