Convergence of SARSA with linear function approximation: The random horizon case

06/07/2023
by Lina Palmborg, et al.

The reinforcement learning algorithm SARSA combined with linear function approximation has been shown to converge for infinite horizon discounted Markov decision problems (MDPs). In this paper, we investigate the convergence of the algorithm for random horizon MDPs, a setting for which convergence has not previously been established. We show, analogously to earlier results for infinite horizon discounted MDPs, that if the behaviour policy is ε-soft and Lipschitz continuous with respect to the weight vector of the linear function approximation, with a small enough Lipschitz constant, then the algorithm converges with probability one for random horizon MDPs.
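The setting in the abstract can be illustrated with a minimal sketch of SARSA with linear function approximation and an ε-soft (here, ε-greedy) behaviour policy. Everything below is illustrative, not the paper's construction: the feature map, rewards, and transitions are made up, and the random horizon is modelled by a fixed per-step termination probability `p_term`, which plays the role that discounting plays in the infinite horizon case.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, n_features = 5, 2, 4

# Fixed feature map phi(s, a); here drawn at random purely for illustration.
phi = rng.standard_normal((n_states, n_actions, n_features))

def q(w, s, a):
    """Linear action-value estimate: q(s, a) = w . phi(s, a)."""
    return phi[s, a] @ w

def eps_soft_action(w, s, eps=0.1):
    """ε-soft policy: explore uniformly with probability eps, else act greedily."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax([q(w, s, a) for a in range(n_actions)]))

def sarsa_random_horizon(episodes=200, alpha=0.05, p_term=0.1):
    """SARSA with linear FA on a toy random-horizon MDP (placeholder dynamics)."""
    w = np.zeros(n_features)
    for _ in range(episodes):
        s = int(rng.integers(n_states))
        a = eps_soft_action(w, s)
        while True:
            r = rng.standard_normal()          # placeholder reward
            if rng.random() < p_term:
                # Random horizon: the episode terminates with probability
                # p_term at every step, so the TD target is just the reward.
                w += alpha * (r - q(w, s, a)) * phi[s, a]
                break
            s2 = int(rng.integers(n_states))   # placeholder transition
            a2 = eps_soft_action(w, s2)        # next action from the same policy
            # On-policy SARSA TD update of the weight vector.
            w += alpha * (r + q(w, s2, a2) - q(w, s, a)) * phi[s, a]
            s, a = s2, a2
    return w
```

Note that the behaviour policy depends on the current weight vector `w`; the paper's Lipschitz continuity condition constrains exactly how sensitively such a policy may respond to changes in `w`.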


Related research

02/15/2021
Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
We study reinforcement learning in an infinite-horizon average-reward se...

10/27/2021
Finite Horizon Q-learning: Stability, Convergence and Simulations
Q-learning is a popular reinforcement learning algorithm. This algorithm...

01/09/2023
Minimax Weight Learning for Absorbing MDPs
Reinforcement learning policy evaluation problems are often modeled as f...

02/14/2022
On the Chattering of SARSA with Linear Function Approximation
SARSA, a classical on-policy control algorithm for reinforcement learnin...

02/27/2021
Parallel Stochastic Mirror Descent for MDPs
We consider the problem of learning the optimal policy for infinite-hori...

09/07/2022
On the Convergence of Monte Carlo UCB for Random-Length Episodic MDPs
In reinforcement learning, Monte Carlo algorithms update the Q function ...

06/17/2020
A maximum-entropy approach to off-policy evaluation in average-reward MDPs
This work focuses on off-policy evaluation (OPE) with function approxima...
