Reinforcement Learning Approaches for the Orienteering Problem with Stochastic and Dynamic Release Dates

07/02/2022
by Yuanyuan Li, et al.

In this paper, we study a sequential decision-making problem faced by e-commerce carriers: when to dispatch a vehicle from the central depot to serve customer requests, and in which order to serve them, under the assumption that the times at which parcels arrive at the depot are stochastic and dynamic. The objective is to maximize the number of parcels that can be delivered during the service hours. We propose two reinforcement learning approaches for solving this problem, one based on a policy function approximation (PFA) and the other on a value function approximation (VFA). Both methods are combined with a look-ahead strategy, in which future release dates are sampled in a Monte-Carlo fashion and a tailored batch approach is used to approximate the value of future states. Our PFA and VFA make good use of branch-and-cut-based exact methods to improve the quality of decisions. We also establish sufficient conditions for a partial characterization of the optimal policy and integrate them into PFA/VFA. In an empirical study based on 720 benchmark instances, we conduct a competitive analysis using upper bounds with perfect information, and we show that PFA and VFA greatly outperform two alternative myopic approaches. Overall, PFA provides the best solutions, while VFA (which benefits from a two-stage stochastic optimization model) achieves a better tradeoff between solution quality and computing time.
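
To make the look-ahead idea concrete, here is a minimal Python sketch of choosing a departure time by sampling future release dates. It is a toy illustration under strong assumptions and is not the paper's PFA or VFA: release dates are assumed uniform within known windows, the vehicle makes a single trip of fixed duration (no routing, no branch-and-cut), and every name here (`sample_release_dates`, `lookahead_departure`, `trip_time`, `candidates`) is hypothetical.

```python
import random


def sample_release_dates(pending, rng):
    """Draw one scenario of release dates.

    `pending` is a list of (earliest, latest) windows, one per parcel not yet
    at the depot. Uniform sampling is an assumption made for this sketch.
    """
    return [rng.uniform(lo, hi) for lo, hi in pending]


def lookahead_departure(now, n_ready, pending, trip_time, deadline,
                        candidates, n_samples=1000, seed=0):
    """Estimate, by Monte Carlo sampling, the expected number of parcels
    delivered for each candidate departure time, and return the best one.

    Toy model: a single trip of fixed duration `trip_time` that must end by
    `deadline`; it carries every parcel released by the departure time.
    """
    rng = random.Random(seed)
    totals = {s: 0.0 for s in candidates}
    for _ in range(n_samples):
        releases = sample_release_dates(pending, rng)
        for s in candidates:
            # Skip departures in the past or trips that would end too late.
            if s < now or s + trip_time > deadline:
                continue
            totals[s] += n_ready + sum(r <= s for r in releases)
    values = {s: totals[s] / n_samples for s in candidates}
    best = max(candidates, key=lambda s: values[s])
    return best, values


# Two parcels are ready now; three more are expected between t=1 and t=2.
best, values = lookahead_departure(
    now=0, n_ready=2, pending=[(1, 2)] * 3,
    trip_time=3, deadline=10, candidates=[0, 2, 8],
)
print(best, values)
```

In this example, waiting until t=2 collects all five parcels, leaving immediately delivers only the two that are ready, and departing at t=8 is infeasible because the trip would end after the deadline. A myopic rule that always dispatches as soon as parcels are available would pick t=0 here.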

research
01/23/2023

On The Convergence Of Policy Iteration-Based Reinforcement Learning With Monte Carlo Policy Evaluation

A common technique in reinforcement learning is to evaluate the value fu...
research
07/21/2020

On the Convergence of Reinforcement Learning with Monte Carlo Exploring Starts

A basic simulation-based reinforcement learning algorithm is the Monte C...
research
02/14/2019

On Reinforcement Learning Using Monte Carlo Tree Search with Supervised Learning: Non-Asymptotic Analysis

Inspired by the success of AlphaGo Zero (AGZ) which utilizes Monte Carlo...
research
03/11/2013

Monte-Carlo utility estimates for Bayesian reinforcement learning

This paper introduces a set of algorithms for Monte-Carlo Bayesian reinf...
research
02/10/2020

On the Convergence of the Monte Carlo Exploring Starts Algorithm for Reinforcement Learning

A simple and natural algorithm for reinforcement learning is Monte Carlo...
research
05/10/2023

Constant Approximation for Network Revenue Management with Markovian-Correlated Customer Arrivals

The Network Revenue Management (NRM) problem is a well-known challenge i...
research
12/30/2022

A deep real options policy for sequential service region design and timing

As various city agencies and mobility operators navigate toward innovati...
