In some applications of reinforcement learning, a dataset of pre-collect...
Model-free algorithms for reinforcement learning typically require a con...
The Q-learning algorithm is a simple and widely-used stochastic approxim...
We introduce a new reinforcement learning principle that approximates th...
Actor-critic methods are widely used in offline reinforcement learning p...
In the stochastic linear contextual bandit setting there exist several m...
Policy optimization methods are popular reinforcement learning algorithm...
Several practical applications of reinforcement learning involve an agen...
There has been growing progress on theoretical analyses for provably eff...
We study the exploration problem with approximate linear action-value fu...
In order to make good decisions under uncertainty, an agent must learn fro...
We consider the exploration-exploitation dilemma in finite-horizon reinf...
Strong worst-case performance bounds for episodic reinforcement learning...
This paper focuses on the problem of determining as large a region as po...