
Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm
In this paper, we investigate the sample complexity of policy evaluation...
Leverage the Average: an Analysis of Regularization in RL
Building upon the formalism of regularized Markov decision processes, we...
Momentum in Reinforcement Learning
We adapt the optimization's concept of momentum to reinforcement learnin...
A Theory of Regularized Markov Decision Processes
Many recent successful (deep) reinforcement learning algorithms make use...
Anderson Acceleration for Reinforcement Learning
Anderson acceleration is an old and simple method for accelerating the c...
How to Combine Tree-Search Methods in Reinforcement Learning
Finite-horizon lookahead policies are abundantly used in Reinforcement L...
Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning
Multiple-step lookahead policies have demonstrated high empirical compet...
Beyond the One Step Greedy Approach in Reinforcement Learning
The famous Policy Iteration algorithm alternates between policy improvem...
Rate of Convergence and Error Bounds for LSTD(λ)
We consider LSTD(λ), the least-squares temporal-difference algorithm wit...
Approximate Policy Iteration Schemes: A Comparison
We consider the infinite-horizon discounted optimal control problem form...
Policy Search: Any Local Optimum Enjoys a Global Performance Guarantee
Local Policy Search is a popular reinforcement learning approach for han...
On the Performance Bounds of some Policy Search Dynamic Programming Algorithms
We consider the infinite-horizon discounted optimal control problem form...
Improved and Generalized Upper Bounds on the Complexity of Policy Iteration
Given a Markov Decision Process (MDP) with n states and a total number m ...
Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies
We consider approximate dynamic programming for the infinite-horizon sta...
Off-policy Learning with Eligibility Traces: A Survey
In the framework of Markov Decision Processes, off-policy learning, that...
On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
We consider infinite-horizon stationary γ-discounted Markov Decision Pro...
A Dantzig Selector Approach to Temporal Difference Learning
LSTD is a popular algorithm for value function approximation. Whenever t...
Approximate Modified Policy Iteration
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm ...
On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes
We consider infinite-horizon γ-discounted Markov Decision Processes, for...
Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view
We investigate projection methods, for evaluating a linear approximation...
