Zap Q-Learning for Optimal Stopping Time Problems

04/25/2019
by Shuhang Chen et al.

We propose a novel reinforcement learning algorithm that approximates solutions to the problem of discounted optimal stopping in an irreducible, uniformly ergodic Markov chain evolving on a compact subset of R^n. Tsitsiklis and Van Roy took a dynamic programming approach to this problem, proposing a Q-learning algorithm that estimates the value function in a linear function approximation setting. The Zap-Q learning algorithm proposed in this work is the first designed to achieve optimal asymptotic variance. We prove convergence of the algorithm using ODE analysis, and the optimal asymptotic variance property is illustrated through fast convergence in a finance example.
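To make the structure of the algorithm concrete, here is a minimal Python sketch. It assumes the Tsitsiklis–Van Roy parameterization Q_theta(x) = theta @ phi(x) for the value of continuing, the fixed-point equation Q*(x) = beta * E[max(r(X_{n+1}), Q*(X_{n+1})) | X_n = x], and a two-time-scale matrix-gain ("Zap") update; the function names, step-size exponents, and regularization are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def zap_q_stopping(phi, reward, sample_next, x0, beta=0.95,
                   n_iter=50_000, rho=0.85, seed=0):
    """Zap-style Q-learning sketch for discounted optimal stopping.

    phi(x)              feature map, returns a length-d numpy array
    reward(x)           reward collected if we stop in state x
    sample_next(x, rng) draws the next state of the Markov chain
    """
    rng = np.random.default_rng(seed)
    d = phi(x0).shape[0]
    theta = np.zeros(d)           # Q_theta(x) = theta @ phi(x): value of continuing
    A = -np.eye(d)                # matrix-gain estimate, started negative definite
    x = x0
    for n in range(1, n_iter + 1):
        alpha = 1.0 / n           # slow step size for the parameter
        gamma = 1.0 / n**rho      # faster step size for the gain, rho in (0.5, 1)
        x_next = sample_next(x, rng)
        f, f_next = phi(x), phi(x_next)
        q_next = theta @ f_next
        cont = float(q_next >= reward(x_next))   # 1 if continuing looks optimal
        # TD term for the stopping problem: target is beta * max(stop, continue)
        td = beta * max(reward(x_next), q_next) - theta @ f
        # linearization of the TD term in theta, averaged on the fast time scale
        A_hat = np.outer(f, beta * cont * f_next - f)
        A += gamma * (A_hat - A)
        # Newton-like Zap step; small shift keeps the gain matrix invertible
        A_reg = A - 1e-6 * np.eye(d)
        theta -= alpha * np.linalg.solve(A_reg, f * td)
        x = x_next
    return theta
```

A hypothetical usage, in the spirit of the paper's finance example, with a put-style stopping reward on a toy mean-reverting chain:

```python
# Hypothetical toy usage: put-style payoff on an AR(1) chain.
theta = zap_q_stopping(
    phi=lambda x: np.array([1.0, x, x * x]),
    reward=lambda x: max(1.0 - x, 0.0),
    sample_next=lambda x, rng: 0.9 * x + 0.1 * rng.standard_normal(),
    x0=0.0,
)
```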
