On the Convergence of the Monte Carlo Exploring Starts Algorithm for Reinforcement Learning

02/10/2020
by Che Wang, et al.

A simple and natural algorithm for reinforcement learning is Monte Carlo Exploring Starts (MCES), in which the Q-function is estimated by averaging Monte Carlo returns and the policy is improved by choosing actions that maximize the current estimate of the Q-function. Exploration is performed by "exploring starts": each episode begins with a randomly chosen state and action and then follows the current policy. Establishing convergence for this algorithm has been an open problem for more than 20 years. We make headway on this problem by proving convergence for Optimal Policy Feed-Forward MDPs, which are MDPs whose states are not revisited within any episode under an optimal policy. Such MDPs include all deterministic environments (including Cliff Walking and other gridworld examples) and a large class of stochastic environments (including Blackjack). The convergence results presented here are a step toward resolving this long-standing open problem in reinforcement learning.
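
As an illustration of the algorithm described in the abstract, here is a minimal tabular sketch of MCES in Python. It is not the authors' code. The toy MDP (`TRANSITIONS`, `STATES`, `ACTIONS`), the `step` helper, and the `mces` function with its parameters are all hypothetical names chosen for this example. The toy MDP is deterministic and feed-forward, so every episode terminates and no state is revisited within an episode.

```python
import random
from collections import defaultdict

# Hypothetical toy MDP (not from the paper): deterministic and feed-forward.
# (state, action) -> (next_state, reward); next_state None marks termination.
TRANSITIONS = {
    (0, "a"): (1, 0.0), (0, "b"): (2, 1.0),
    (1, "a"): (3, 5.0), (1, "b"): (None, 2.0),
    (2, "a"): (3, 0.0), (2, "b"): (None, 3.0),
    (3, "a"): (None, 1.0), (3, "b"): (None, 0.0),
}
STATES = [0, 1, 2, 3]
ACTIONS = ["a", "b"]


def step(state, action):
    """Return (next_state, reward) for the toy MDP."""
    return TRANSITIONS[(state, action)]


def mces(states, actions, step_fn, num_episodes=5000, gamma=1.0, seed=0):
    """Tabular first-visit Monte Carlo with exploring starts (a sketch).

    Assumes every episode terminates under any policy.
    """
    rng = random.Random(seed)
    q = defaultdict(float)       # running average of returns per (s, a)
    visits = defaultdict(int)    # number of returns averaged per (s, a)
    policy = {s: actions[0] for s in states}

    for _ in range(num_episodes):
        # Exploring start: random initial state and action.
        state, action = rng.choice(states), rng.choice(actions)
        episode = []
        while state is not None:
            next_state, reward = step_fn(state, action)
            episode.append((state, action, reward))
            state = next_state
            if state is not None:
                action = policy[state]  # then follow the current policy

        # Walk backward so the earliest visit of each pair is written last.
        g = 0.0
        first_visit_return = {}
        for s, a, r in reversed(episode):
            g = r + gamma * g
            first_visit_return[(s, a)] = g

        for (s, a), ret in first_visit_return.items():
            visits[(s, a)] += 1
            q[(s, a)] += (ret - q[(s, a)]) / visits[(s, a)]
            # Greedy policy improvement at the visited state.
            policy[s] = max(actions, key=lambda b: q[(s, b)])

    return q, policy
```

Calling `q, policy = mces(STATES, ACTIONS, step)` returns the Q-estimates and the greedy policy. For this toy MDP, working backward from the terminal transitions by hand gives the optimal policy `{0: "a", 1: "a", 2: "b", 3: "a"}`, which the learned policy can be compared against.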

research
09/07/2022

On the Convergence of Monte Carlo UCB for Random-Length Episodic MDPs

In reinforcement learning, Monte Carlo algorithms update the Q function ...
research
07/21/2020

On the Convergence of Reinforcement Learning with Monte Carlo Exploring Starts

A basic simulation-based reinforcement learning algorithm is the Monte C...
research
03/13/2018

Active Reinforcement Learning with Monte-Carlo Tree Search

Active Reinforcement Learning (ARL) is a twist on RL where the agent obs...
research
04/03/2018

Renewal Monte Carlo: Renewal theory based reinforcement learning

In this paper, we present an online reinforcement learning algorithm, ca...
research
05/26/2022

Approximate Q-learning and SARSA(0) under the ε-greedy Policy: a Differential Inclusion Analysis

Q-learning and SARSA(0) with linear function approximation, under ϵ-gree...
research
06/25/2022

Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization

The class of deep deterministic off-policy algorithms is effectively app...
research
07/02/2022

Reinforcement Learning Approaches for the Orienteering Problem with Stochastic and Dynamic Release Dates

In this paper, we study a sequential decision making problem faced by e-...
