One-step dispatching policy improvement in multiple-server queueing systems with Poisson arrivals

03/28/2018
by Olivier Bilenne, et al.

Policy iteration techniques for multiple-server dispatching rely on the computation of value functions. In this context, we consider the M/G/1-FCFS queue endowed with an arbitrarily designed cost function for the waiting times of the incoming jobs, and we study an undiscounted value function integrating the total cost surplus expected from each state relative to the steady-state costs. When coupled with random initial policies, this value function takes closed-form expressions for polynomial and exponential costs, or for piecewise compositions of the latter, thus hinting in the most general case at the derivation of interval bounds for the value function in the form of power series or trigonometric sums. The value function approximations induced by Taylor polynomial expansions of the cost function prove, however, to converge only for entire cost functions with low growth orders, and to diverge otherwise. A more suitable approach for assessing convergent interval bounds is found in the uniform approximation framework. Bernstein polynomials constitute straightforward, yet slowly convergent, cost-function approximators over intervals. The best convergence rate in the sense of D. Jackson's theorem is achieved by more sophisticated polynomial solutions derived from trigonometric sums. This study is organized as a guide to implementing multiple-server dispatching policies, from the specification of cost functions to the computation of interval bounds for the value functions and the implementation of the policy improvement step.
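To make the role of Bernstein polynomials concrete: the degree-n Bernstein approximation of a cost function c on [0, 1] is B_n(c; t) = Σ_{k=0}^{n} c(k/n) · C(n, k) · t^k (1 − t)^{n−k}, i.e. c is sampled at n + 1 equally spaced points and blended by binomial weights, which reduces a general cost to the polynomial case where closed-form value functions are available. The sketch below is not taken from the paper; the interval, degree, and example waiting-time cost are illustrative assumptions chosen only to show the construction.

```python
# Minimal sketch: degree-n Bernstein polynomial approximation of a
# waiting-time cost function c on an interval [a, b].
# (Interval, degree, and cost function are illustrative assumptions.)
from math import comb


def bernstein_approx(c, a, b, n):
    """Return the degree-n Bernstein approximation of c over [a, b]."""
    # Sample the cost function at the n+1 equally spaced Bernstein nodes.
    nodes = [c(a + (b - a) * k / n) for k in range(n + 1)]

    def B(x):
        t = (x - a) / (b - a)  # map x into [0, 1]
        return sum(nodes[k] * comb(n, k) * t**k * (1 - t) ** (n - k)
                   for k in range(n + 1))

    return B


if __name__ == "__main__":
    cost = lambda w: w ** 1.5          # illustrative waiting-time cost
    B20 = bernstein_approx(cost, 0.0, 10.0, 20)
    for w in (0.5, 2.0, 7.5):
        print(f"w = {w:4.1f}   cost = {cost(w):7.3f}   Bernstein ~ {B20(w):7.3f}")
```

As the abstract notes, this construction converges uniformly but slowly (roughly like the modulus of continuity of c at 1/sqrt(n)); faster-converging polynomial approximants in the sense of Jackson's theorem would be built from trigonometric sums instead.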

