
Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm
In this paper, we investigate the sample complexity of policy evaluation...

Leverage the Average: an Analysis of Regularization in RL
Building upon the formalism of regularized Markov decision processes, we...

Momentum in Reinforcement Learning
We adapt the optimization's concept of momentum to reinforcement learnin...

A Theory of Regularized Markov Decision Processes
Many recent successful (deep) reinforcement learning algorithms make use...

Anderson Acceleration for Reinforcement Learning
Anderson acceleration is an old and simple method for accelerating the c...

How to Combine Tree-Search Methods in Reinforcement Learning
Finite-horizon lookahead policies are abundantly used in Reinforcement L...

Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning
Multiple-step lookahead policies have demonstrated high empirical compet...

Beyond the One Step Greedy Approach in Reinforcement Learning
The famous Policy Iteration algorithm alternates between policy improvem...

Rate of Convergence and Error Bounds for LSTD(λ)
We consider LSTD(λ), the least-squares temporal-difference algorithm wit...

Approximate Policy Iteration Schemes: A Comparison
We consider the infinite-horizon discounted optimal control problem form...

Policy Search: Any Local Optimum Enjoys a Global Performance Guarantee
Local Policy Search is a popular reinforcement learning approach for han...

On the Performance Bounds of some Policy Search Dynamic Programming Algorithms
We consider the infinite-horizon discounted optimal control problem form...

Improved and Generalized Upper Bounds on the Complexity of Policy Iteration
Given a Markov Decision Process (MDP) with n states and a total number m ...

Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies
We consider approximate dynamic programming for the infinite-horizon sta...

Off-policy Learning with Eligibility Traces: A Survey
In the framework of Markov Decision Processes, off-policy learning, that...

On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
We consider infinite-horizon stationary γ-discounted Markov Decision Pro...

A Dantzig Selector Approach to Temporal Difference Learning
LSTD is a popular algorithm for value function approximation. Whenever t...

Approximate Modified Policy Iteration
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm ...

On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes
We consider infinite-horizon γ-discounted Markov Decision Processes, for...

Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view
We investigate projection methods, for evaluating a linear approximation...
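Several entries above study policy iteration and modified policy iteration (MPI), a dynamic programming algorithm. As a rough illustration only, the sketch below shows exact tabular MPI: each iteration takes a policy that is greedy with respect to the current value estimate, then applies that policy's Bellman operator m times, so m = 1 gives value iteration and large m approaches policy iteration. The dense P[a][s][t] / R[a][s] layout, the function names and the two-state MDP are assumptions made for this example; none of the approximate schemes or error bounds analysed in the listed papers is implemented here.

```python
from typing import List, Tuple

# Hypothetical dense tabular MDP layout (an assumption of this sketch):
#   P[a][s][t] = probability of moving from state s to state t under action a
#   R[a][s]    = expected immediate reward for taking action a in state s
Matrix = List[List[float]]


def greedy(P: List[Matrix], R: Matrix, v: List[float], gamma: float) -> List[int]:
    """Return a deterministic policy that is greedy with respect to v."""
    n_states, n_actions = len(v), len(P)
    policy = []
    for s in range(n_states):
        # One-step lookahead value of each action in state s.
        q = [R[a][s] + gamma * sum(P[a][s][t] * v[t] for t in range(n_states))
             for a in range(n_actions)]
        policy.append(max(range(n_actions), key=lambda a: q[a]))
    return policy


def bellman_policy(P: List[Matrix], R: Matrix, policy: List[int],
                   v: List[float], gamma: float) -> List[float]:
    """One application of the Bellman operator T_pi of a deterministic policy."""
    n = len(v)
    return [R[policy[s]][s]
            + gamma * sum(P[policy[s]][s][t] * v[t] for t in range(n))
            for s in range(n)]


def modified_policy_iteration(P: List[Matrix], R: Matrix, gamma: float,
                              m: int, iterations: int) -> Tuple[List[int], List[float]]:
    """Exact MPI: a greedy step followed by m applications of T_pi.
    m = 1 is value iteration; m -> infinity approaches policy iteration."""
    v = [0.0] * len(R[0])
    policy = greedy(P, R, v, gamma)  # defined even if iterations == 0
    for _ in range(iterations):
        policy = greedy(P, R, v, gamma)
        for _ in range(m):
            v = bellman_policy(P, R, policy, v, gamma)
    return policy, v


# Toy MDP: action 0 = stay, action 1 = switch state.
# Being in state 1 yields reward 1, state 0 yields 0, whatever the action.
P = [[[1.0, 0.0], [0.0, 1.0]],   # stay
     [[0.0, 1.0], [1.0, 0.0]]]   # switch
R = [[0.0, 1.0],
     [0.0, 1.0]]

if __name__ == "__main__":
    for m in (1, 10):
        policy, v = modified_policy_iteration(P, R, gamma=0.9, m=m, iterations=300)
        print(m, policy, [round(x, 3) for x in v])
```

In this toy MDP the optimal policy switches out of state 0 and stays in state 1, with optimal values γ/(1−γ) = 9 and 1/(1−γ) = 10 for γ = 0.9; both choices of m should reach it, the larger m doing more evaluation work per greedy step.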