The theory of matching started with Petersen and König and has attracted considerable interest in graph theory, with problems such as maximum matching. It was extended to the online matching setting (Manshadi et al., 2012; Feldman et al., 2009; Jaillet and Lu, 2014), where one population is static and the other arrives according to a stochastic process. In recent years, fully dynamic matching models, in which both populations are random, have been considered. The importance of matching models has been shown through applications in many fields: health (for Organ Sharing, [n. d.]; Ashlagi et al., 2013), ridesharing (Banerjee et al., 2018), power grids (Zdeborová et al., 2009), and pattern recognition (Schalkoff, 1991).
We study matching models from a queueing-theory perspective, where a supply item and a demand item arrive at the system at each time step and can either be matched or stay in buffers. Mairesse and Moyal (2016, Theorem 1) prove that, in a matching model where items arrive one by one, there exists no arrival distribution that verifies the necessary stability conditions for bipartite matching graphs. This result justifies why we assume arrivals by pairs, as in (Busic and Meyn, 2015; Busic et al., 2013). We consider a holding cost that is a function of the buffer sizes. Our objective is to find the optimal matching policy in the discounted cost problem and in the average cost problem for general bipartite matching graphs. For this purpose, we model the problem as a Markov Decision Process.
The search for good policies and the question of their performance for various matching models have received great interest in the recent literature. For example, the FCFS infinite bipartite matching model was introduced in (Caldentey et al., 2009) and further studied in (Adan et al., 2017; Adan and Weiss, 2012), which established the reversibility of the dynamics and the product form of the stationary distribution. In (Busic et al., 2013), the bipartite matching model was extended to other matching policies; the authors established necessary conditions for stability and studied the stability region of various policies, including priorities and MaxWeight. For ridesharing systems, state-dependent dispatch policies were identified in (Banerjee et al., 2018) that achieve exponential decay of the demand-dropping probability in the heavy-traffic regime. In (Gurvich and Ward, 2014), the authors presented the imbalance process and derived a lower bound on the holding costs.
Optimality results are scarce and have been derived only for some matching models in asymptotic regimes. An extension of the greedy primal-dual algorithm was developed in (Nazari and Stolyar, 2016) and proved to be asymptotically optimal for the long-term average matching reward. However, that work considers rewards on the edges, which differs from our model with holding costs. In (Busic and Meyn, 2015), the authors, based on a workload relaxation, identified a policy that is approximately optimal with bounded regret. Their results hold in the asymptotic heavy-traffic setting and thus cannot be used in our framework, as we consider arbitrary arrival rates under stability conditions.
We first consider a matching model with two supply and two demand classes. For this system, we show that the optimal matching policy is of threshold type for the diagonal edge, with priority given to the end edges of the matching graph. We also compute the optimal threshold in the average cost case.
For more general bipartite matching graphs, the optimal matching policy identified in the case can be generalized. We give a heuristic for general bipartite graphs in which threshold-type policies also perform very well according to our preliminary numerical experiments.
2. Model Description
We consider a (bipartite) matching graph where and are, respectively, the set of demand nodes (or queues) and the set of supply nodes. is the set of allowed matching pairs. Figure 1 depicts an example of a matching graph with three demand nodes and three supply nodes. In each time slot , a demand item and a supply item arrive at the system according to the i.i.d. arrival process . The demand item arrives at queue (i.e., ) with probability and the supply item arrives at queue (i.e., ) with probability .
We denote by the queue length of node at time slot , where . Let be the vector of the queue lengths of all the nodes. We must have for all . Matchings at time are carried out after the arrivals at time . Let . Hence, evolves over time according to the following expression:
where is the vector of the items that are matched at time which must belong to the set of admissible matchings. When the state of the system is , the set of admissible matchings is defined as:
where is the set of demand classes that can be matched with a class supply and is the set of supply classes that can be matched with a class demand. The extension to subsets and is and . is defined for all , where is the set of all the possible states of the system. The dynamics of the system can alternatively be written as
which is a Markov Decision Process where the control is denoted by . We consider a linear cost function of the buffer sizes of the nodes: . Our analysis in the following sections holds for more general cost functions, as long as they satisfy the assumptions of Theorem 2.1 and Theorem 2.2. We chose a linear cost function because it satisfies these assumptions and allows us to give an analytical form for the optimal threshold. The buffers of the nodes are infinite, so we are in the unbounded-cost setting.
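As an illustration, the pair-arrival dynamics and the admissible-matching constraint can be sketched as follows. All concrete names here (the two-class graph, the edge set, the arrival probabilities) are our own illustrative assumptions, not the paper's notation:

```python
import itertools
import random

# Illustrative two-class bipartite matching model (our own toy instance).
D, S = [0, 1], [0, 1]             # demand and supply classes
edges = {(0, 0), (0, 1), (1, 1)}  # allowed matching pairs (demand, supply)
alpha = [0.5, 0.5]                # arrival probabilities of demand classes
beta = [0.5, 0.5]                 # arrival probabilities of supply classes

def admissible_matchings(x_d, x_s):
    """Enumerate all matching vectors (one count per edge) feasible in state x."""
    edge_list = sorted(edges)
    caps = [min(x_d[d], x_s[s]) for d, s in edge_list]
    feasible = []
    for u in itertools.product(*[range(c + 1) for c in caps]):
        used_d = [0] * len(D)
        used_s = [0] * len(S)
        for (d, s), k in zip(edge_list, u):
            used_d[d] += k
            used_s[s] += k
        if all(used_d[d] <= x_d[d] for d in D) and all(used_s[s] <= x_s[s] for s in S):
            feasible.append(dict(zip(edge_list, u)))
    return feasible

def step(x_d, x_s, policy, rng):
    """One time slot: a (demand, supply) pair arrives, then a matching is applied."""
    d = rng.choices(D, weights=alpha)[0]
    s = rng.choices(S, weights=beta)[0]
    x_d, x_s = list(x_d), list(x_s)
    x_d[d] += 1
    x_s[s] += 1
    for (dd, ss), k in policy(x_d, x_s).items():  # apply the chosen matching
        x_d[dd] -= k
        x_s[ss] -= k
    return x_d, x_s, sum(x_d) + sum(x_s)          # linear holding cost c(x)
```

Because one demand and one supply item arrive in each slot and every match removes one of each, the total demand and total supply in the buffers stay equal at all times.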
A matching policy is a sequence of decision rules . A stationary matching policy is a matching policy whose decision rules depend only on the state of the system and not on the time . The goal is to obtain the optimal matching policy, that is, the policy that minimizes the cost of the system. We will study two optimization problems, which can be written as:
The average cost problem:
The discounted cost problem:
where is the discount factor. The notation indicates that the expectation is over the arrival process given that and using the matching policy to determine the decision rules for all .
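In generic MDP notation (our own symbols: $x_t$ for the state, $c$ for the one-step cost, $\theta$ for the discount factor, $\pi$ for the policy), the two objectives can be written as:

```latex
\text{(average cost)} \qquad g^{\pi}(x) \;=\; \limsup_{T \to \infty}
  \frac{1}{T}\, \mathbb{E}^{\pi}_{x}\!\left[ \sum_{t=0}^{T-1} c(x_t) \right],
\qquad\qquad
\text{(discounted cost)} \qquad v^{\pi}_{\theta}(x) \;=\;
  \mathbb{E}^{\pi}_{x}\!\left[ \sum_{t=0}^{\infty} \theta^{t}\, c(x_t) \right],
  \quad 0 < \theta < 1.
```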
For a given function , , , we define for all :
and in particular, we define and . A solution of the discounted cost problem can be obtained as a solution of the Bellman fixed point equation . In the average cost problem, the Bellman equation is given by .
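A solution of the discounted Bellman fixed-point equation can be approximated numerically by value iteration. The sketch below does this on a truncated state space for a toy two-class instance; the discount factor, truncation level, edge set and uniform pair-arrival distribution are all our own illustrative assumptions:

```python
import itertools

# Value iteration for the discounted Bellman equation v = T_theta v,
# on a truncated state space (all parameters are our own toy choices).
theta = 0.9                       # discount factor
N = 3                             # truncate each buffer at N
edges = [(0, 0), (0, 1), (1, 1)]  # allowed (demand, supply) pairs
arrivals = [((d, s), 0.25) for d in (0, 1) for s in (0, 1)]  # uniform pair arrivals

def cost(x):                      # linear holding cost on the buffer sizes
    return sum(x)

def matchings(y):
    """All admissible matching vectors in state y = (d0, d1, s0, s1)."""
    caps = [min(y[d], y[2 + s]) for d, s in edges]
    for u in itertools.product(*[range(c + 1) for c in caps]):
        used = [0, 0, 0, 0]
        for (d, s), k in zip(edges, u):
            used[d] += k
            used[2 + s] += k
        if all(used[i] <= y[i] for i in range(4)):
            yield u

def after_match(y, u):
    """State after removing the matched items, clipped to the truncation."""
    y = list(y)
    for (d, s), k in zip(edges, u):
        y[d] -= k
        y[2 + s] -= k
    return tuple(min(q, N) for q in y)

states = list(itertools.product(range(N + 1), repeat=4))
v = {x: 0.0 for x in states}
for _ in range(50):               # T_theta is a contraction, so sweeps converge
    new_v = {}
    for x in states:
        exp = 0.0
        for (d, s), p in arrivals:
            # Arrivals happen first, then the minimizing matching is applied.
            y = (x[0] + (d == 0), x[1] + (d == 1),
                 x[2] + (s == 0), x[3] + (s == 1))
            exp += p * min(v[after_match(y, u)] for u in matchings(y))
        new_v[x] = cost(x) + theta * exp
    v = new_v
```

On such a truncated instance, the resulting value function can be inspected directly, e.g. to check the monotonicity and convexity properties studied in Section 3.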
We say that a value function or a decision rule is structured if it satisfies a special property, such as being increasing, decreasing or convex. Throughout the article, by increasing we mean nondecreasing; when the strict notion is intended, we write strictly increasing. A policy is called structured when it only uses structured decision rules.
The framework of this work is that of property preservation under the Dynamic Programming operator. First, we identify a set of structured value functions and a set of structured decision rules such that, if the value function belongs to , then an optimal decision rule belongs to . Then, we show that the properties of are preserved by the Dynamic Programming operator and that they hold in the limit. Theorem 2.1 (Hyon and Jean-Marie, 2012, Theorem 1) allows us to conclude that there exists an optimal policy which can be chosen in the set of structured policies .
Theorem 2.1 ().
(Hyon and Jean-Marie, 2012, Theorem 1) Assume that the following properties hold: there exists a positive function on the state space such that
and, for every , , there exist , and some integer such that, for every -tuple of Markov deterministic decision rules and every
where denotes the -step transition matrix under policy . Let . Let be the set of functions on the state space which have a finite -weighted supremum norm, i.e., . Assume that
for each , there exists a deterministic Markov decision rule such that .
Let and be such that
implies that ;
implies that there exists a decision such that ;
is a subset of the set of value functions that is closed under simple (pointwise) convergence.
Then, there exists an optimal stationary policy that belongs to with .
This result is an adapted version of (Puterman, 2005, Theorem 6.11.3). The former removes the need to verify that (an assumption made in the latter), and its statement separates the structural requirements ((a), (b) and (c)) from the technical requirements related to the unboundedness of the cost function ((2), (3), (4) and ()).
In the case of the average cost problem, we will use the results of the discounted cost problem. We view the average cost problem as a limit when tends to one and show that the properties still hold in this limit. In order to prove optimality in the average cost case, we will use (Puterman, 2005, Theorem 8.11.1):
Theorem 2.2 ().
(Puterman, 2005, Theorem 8.11.1) Suppose that the following properties hold:
There exists a nonnegative function such that
There exists for which
for any sequence , , for which ,
implies that there exists a decision such that ;
Then and implies that the stationary matching policy which uses is lim sup average optimal.
3. The case
We consider the system formed by two supply nodes and two demand nodes with a -shaped matching graph (see Figure 2). Let be the edge between and , for i = 1, 2, and the edge between and . Let us also define as the imaginary edge between and (imaginary because ), which we introduce to ease the notation. To ensure stability, we assume that .
The set of all the possible states of the system is
and the set of possible matchings, when the state of the system is , is:
We will show that the optimal policy for this case has a specific structure. For this purpose, we first present the properties of the value function. Then, we show how these properties characterize the optimal decision rule and how they are preserved by the dynamic programming operator. Finally, we prove the desired results in Theorem 3.10 and Theorem 3.11.
3.1. Value Function Properties
Let be the vector of all zeros except in the -th coordinate, . Let , , and . We start by defining increasing properties in , and :
Definition 3.1 (Increasing property).
Let . We say that a function is increasing in or if
Remark 1 ().
The increasing property in can be interpreted as the fact that we prefer to match and rather than to match . Indeed, .
We also define the convexity in and as follows:
Definition 3.2 (Convexity property).
A function is convex in or if is increasing in , i.e.,
Likewise, is convex in or if is nondecreasing in , i.e.,
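For concreteness, with $f$ a function on the state space and $w$ a direction (a unit vector or a combination of unit vectors, our own generic notation), the increasing and convexity properties above are of the standard directional form:

```latex
\text{(increasing in } w\text{)} \qquad f(x + w) \;\ge\; f(x),
\qquad\qquad
\text{(convex in } w\text{)} \qquad f(x + 2w) - f(x + w) \;\ge\; f(x + w) - f(x),
```

for all states $x$ such that the displayed arguments remain in the state space.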
Definition 3.3 (Boundary property).
A function if
As we will show in Proposition 3.7, the properties , , and characterize the optimal decision rule. On the other hand, and are required to show that is preserved by the operator .
3.2. Optimal decision rule
In this section, we show that, for any , there is a control of threshold-type in with priority to and that minimizes the .
Definition 3.4 (Threshold-type decision rule).
A decision rule is said to be of threshold type in with priority to and if:
it matches all of and .
it matches only if the remaining items (in and ) are above a specific threshold, denoted by (with ).
This means that:
Remark 2 ().
If , the decision rule will never match . If , the decision rule will match until the remaining items in and are equal to the threshold .
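The threshold-type decision rule of Definition 3.4 can be sketched as follows for the two-by-two graph. The encoding is our own: we assume queues (d1, d2, s1, s2), "end" edges (d1, s1) and (d2, s2), and a diagonal edge (d1, s2); the actual orientation of the diagonal depends on the matching graph:

```python
def threshold_rule(x, k):
    """Matched counts (m11, m22, m12) in state x = (d1, d2, s1, s2).

    End edges have priority and are matched completely; the diagonal edge
    (d1, s2) is matched only down to the threshold k (our own encoding).
    """
    d1, d2, s1, s2 = x
    m11 = min(d1, s1)          # priority: match all of end edge (d1, s1)
    m22 = min(d2, s2)          # priority: match all of end edge (d2, s2)
    d1 -= m11
    s1 -= m11
    d2 -= m22
    s2 -= m22
    # Diagonal edge: match until the remaining items reach the threshold k.
    m12 = max(min(d1, s2) - k, 0)
    return m11, m22, m12
```

With k = 0 the diagonal edge is matched greedily, and with a very large k it is never matched, recovering the two extreme cases of Remark 2.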
In the remainder of the article, we consider that is the set of decision rules that are of threshold type in with priority to and (as defined in Definition 3.4) for any . In the next proposition, we establish that there exists an optimal decision rule with priority to and .
Proposition 3.5 ().
Let , let . For any , there exists such that , and . In particular, this result holds for the average operator: .
Let , , , and (the number of matchings in of ). The maximum number of matchings in is denoted by and in by .
Let be the number of possible matchings that can be transformed from into and matchings. We define a policy that removes the matchings in and matches times and , that is, . We verify that this policy is admissible, i.e., : (c) is true because . (a) is true because and . () and () are true because and . Then, we can use the fact that to show that .
Moreover, we define that matches all the possible and of , that is, of the remaining items when we apply : . We also verify that this policy is admissible, i.e., : (c), () and () are true because . If , then
If , then
If , then
In every case, (a) is true. Hence, since , it follows that .
As a result, we have , and . This was done for any , and because is finite for every , we can choose such that it belongs to , giving the final result. ∎
From this result, it follows that there exists an optimal decision rule that matches all possible and . In addition, due to Proposition 3.5 and (c) from the definition of , there exists such that and . Our goal now is to find the optimal number of matchings in (i.e., the optimal ). We first introduce some notation:
Definition 3.6 ().
Let , . We define:
the set of possible matchings in after having matched all possible and .
Remark 3 ().
The state of the system after having matched all possible and is of the form if and of the form otherwise (because of the definition of and ).
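Remark 3 can be checked mechanically. In our own encoding (queues (d1, d2, s1, s2), end edges (d1, s1) and (d2, s2)), after matching the end edges completely in a balanced state (total demand equal to total supply, as guaranteed by pair arrivals), the remaining state has at most one nonzero demand queue and one nonzero supply queue, on opposite ends of the diagonal:

```python
# Sketch (our own encoding) checking the state form stated in Remark 3.
def match_end_edges(x):
    """Apply the priority part of the rule: match the end edges completely."""
    d1, d2, s1, s2 = x
    m11 = min(d1, s1)
    m22 = min(d2, s2)
    return d1 - m11, d2 - m22, s1 - m11, s2 - m22

# Balanced states: total demand equals total supply, as with pair arrivals.
for x in [(3, 1, 2, 2), (0, 5, 3, 2), (2, 2, 2, 2)]:
    d1, d2, s1, s2 = match_end_edges(x)
    # Remaining state is of the form (d1, 0, 0, s2) or (0, d2, s1, 0).
    assert (d2 == 0 and s1 == 0) or (d1 == 0 and s2 == 0)
```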
Finally, we prove that a decision rule of threshold type in , with priority to and , is optimal. This is done by choosing the right for the different cases, such that is the optimal number of matchings in for a given .
Proposition 3.7 ().
Let . Let . There exists (see Definition 3.4) such that . In particular, this result holds for the average operator: .
Let and . We write and . We assumed that , so we can use Proposition 3.5: such that and with . We now have to prove that there exists such that
where (see Definition 3.4). If , then and we have , which satisfies (5). Otherwise, and . Therefore, the state of the system after having matched (or ), i.e., (or ), is of the form . Hence, comparing with comes down to comparing with ().
First of all, suppose that . We choose , so and . By assumption, we have for all and because , we have proven (5).
Then, suppose that . We choose , so and . By assumption and because is convex in , we have for all and because (with for all ), we have proven (5).
Finally, suppose that . Let . By definition of and by convexity of in , we have