# Periodic Reranking for Online Matching of Reusable Resources

We consider a generalization of the vertex weighted online bipartite matching problem where the offline vertices, called resources, are reusable. In particular, when a resource is matched it is unavailable for a deterministic time duration d after which it becomes available for a re-match. Thus, a resource can be matched to many different online vertices over a period of time. While recent work on the problem has resolved the asymptotic case where we have large starting inventory (i.e., many copies) of every resource, we consider the (more general) case of unit inventory and give the first algorithm that is provably better than the naïve greedy approach which has a competitive ratio of (exactly) 0.5. In particular, we achieve a competitive ratio of 0.589 against an LP relaxation of the offline problem.


## 1 Introduction

On a platform such as Airbnb, where heterogeneous customers arrive over time, ensuring a good match between property listings (resources) and customers is a challenging task. Of the many challenges, a well studied one stems from the sequential and uncertain (online) nature of demand. There is a wealth of work on policies for pricing and allocating resources when the demand is online (online matching (Mehta et al. 2013), online assortments (Golrezaei et al. 2014), pricing and network revenue management (Talluri et al. 2004)). Most of these works focus on resources that are not reusable and may be allocated at most once. However, a feature common to most online platforms in the sharing economy is that resources are reusable and a given unit of a resource may be re-allocated several times. This has led to a surge of interest in designing policies for online allocation of reusable resources. Nonetheless, many fundamental questions remain unanswered. In this paper, we study one such question.

We start by defining the setting of online matching with reusable resources (OMR). Consider a set $I$ of reusable resources, where each resource $i \in I$ has unit inventory and price $r_i$. Requests for resources arrive sequentially and each request is for up to one unit of some subset of resources. Formally, let $T$ denote the set of requests, also called arrivals, and let $\{a(t)\}_{t \in T}$ denote the set of arrival times. Upon arrival of request $t$ at time $a(t)$, we observe the set of edges $(i,t) \in E$ incident on $t$, and must make an immediate and irrevocable decision to match $t$ to a neighboring resource or reject the request. At time $a(t)$, we have no knowledge of future arrivals, which could arrive in an adversarial order. When resource $i$ is matched to an arrival, we obtain reward $r_i$ and the resource is unavailable for subsequent arrivals during the next $d$ units of time, i.e., if $i$ is matched to $t$, it is unavailable during the interval $(a(t), a(t)+d]$. After this interval, resource $i$ is available for a re-match. The duration $d$ is known to us and our objective is to maximize total reward. An online algorithm for this problem is evaluated relative to the optimal offline solution to the problem. The offline solution is computed with knowledge of the entire sequence of arrivals $T$ and arrival times $\{a(t)\}_{t \in T}$. This comparison is quantified through the competitive ratio, defined as follows,

$$\min_{G} \frac{\text{Online}(G)}{\text{Offline}(G)},$$

where $G$ represents an instance of the problem, characterized by the sets and values defined above. $\text{Online}(G)$ represents the (expected) total reward of a (randomized) online algorithm on instance $G$. We are interested in designing an online algorithm with a strong competitive ratio guarantee for this problem.

OMR is a fundamental generalization of the classic online bipartite matching (OM) problem (Karp et al. 1990), where resources have identical rewards and each resource can be matched at most once. To see this, observe that when $d$ is large enough that no matched resource returns before the last arrival, and rewards $r_i = 1$ for every $i \in I$, then every resource can be matched to at most one arrival and the objective simplifies to finding the maximum matching. For OM, it is well known that no deterministic algorithm can achieve a competitive ratio better than 0.5 (Karp et al. 1990). As OMR generalizes OM, this upper bound applies directly. In fact, the greedy algorithm that matches every arrival to an available neighbor (and breaks ties deterministically) is exactly 0.5 competitive for both OM (Karp et al. 1990) and OMR (Gong et al. 2021). The main algorithm design goal in these problems and their many variations is to design an online algorithm that outperforms greedy.
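To make the greedy baseline concrete, here is a minimal Python sketch of the OMR model. The function name `greedy_omr`, the data layout, and the tie-breaking rule (highest reward, then resource name) are illustrative assumptions, as is the convention that a resource matched at time $s$ is unavailable on $(s, s+d]$.

```python
from typing import Dict, List, Optional, Sequence, Tuple

def greedy_omr(
    rewards: Dict[str, float],
    arrivals: Sequence[Tuple[float, Sequence[str]]],
    d: float,
) -> Tuple[float, List[Optional[str]]]:
    """Greedy baseline: match each arrival to the available neighbor with the
    highest reward, breaking ties by resource name (a deterministic rule)."""
    busy_until = {i: float("-inf") for i in rewards}  # time each resource frees up
    total, choices = 0.0, []
    for time, neighbors in arrivals:  # arrivals must be sorted by time
        # Assumption: a resource matched at time s is unavailable on (s, s + d].
        free = [i for i in neighbors if time > busy_until[i]]
        if not free:
            choices.append(None)  # reject the request
            continue
        best = min(free, key=lambda i: (-rewards[i], i))
        busy_until[best] = time + d
        total += rewards[best]
        choices.append(best)
    return total, choices

# Hypothetical instance: greedy uses "a" first and must reject the second arrival.
rewards = {"a": 1.0, "b": 1.0}
arrivals = [(0.0, ["a", "b"]), (1.0, ["a"]), (5.0, ["a"])]
total, choices = greedy_omr(rewards, arrivals, d=2.0)
```

On this instance an offline solution could serve all three arrivals (by giving "b" to the first), while the deterministic greedy rule serves only two.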

For the OM problem, Karp et al. (1990) proposed the Ranking algorithm that randomly ranks resources at the beginning and then matches each arrival to the best ranked neighbor that is available. They showed that Ranking is $(1-1/e)$ competitive for OM and this is the best possible guarantee for any online algorithm. For the OMR problem, no algorithm with guarantee better than 0.5 was known prior to this work. We establish the following result for this problem.

###### Theorem 1

There is a randomized algorithm for OMR (Algorithm 1 with an appropriately chosen parameter $\beta$), with competitive ratio 0.589.

Next, we discuss previous work on this problem. This is followed by a discussion on the significance of the unit inventory setting.

### 1.1 Previous Work

Settings with uncertain sequentially arriving demand and reusable resources have received significant interest recently. We start by discussing work that is closest to our setting. A generalization of OMR was first studied by Gong et al. (2021). In their setting,

1. Usage durations are stochastic, i.e., when a resource is allocated, it is used for an independently sampled random duration.

2. Instead of matching an arrival to a resource, we offer an assortment (i.e., set) of resources to each arrival, and they choose up to one resource from the assortment according to a stochastic choice model that is revealed on arrival.

They showed that the greedy algorithm, which offers a revenue maximizing assortment to each arrival, is 0.5 competitive for this general setting. Subsequently, Feng et al. (2019, 2021) and Goyal et al. (2021) considered this setting with the additional structural assumption of large starting inventory for every resource, i.e., a large number of identical copies of each resource. Despite this assumption, $(1-1/e)$ is the best possible guarantee for the problem. Feng et al. (2019, 2021) showed that a classic inventory balancing algorithm, originally proposed for non-reusable resources, is $(1-1/e)$ competitive for reusable resources with deterministic (but not necessarily identical) usage times. Goyal et al. (2021) considered the general case of stochastic usage durations (same as Gong et al. (2021)), and demonstrated that classic approaches fail to improve on the performance of greedy in this more general setting. They proposed a novel algorithm that accounts for the stochastic nature of reusability by balancing “effective” inventory in a fluid way, and achieves the best possible guarantee of $(1-1/e)$ for arbitrary usage distributions. We note that a parallel stream of work considers the setting of reusable resources with stochastic arrivals (Dickerson et al. 2018, Rusmevichientong et al. 2020, Baek and Ma 2019, Feng et al. 2020). For a detailed review of these settings, see Gong et al. (2021) and Goyal et al. (2021).

The setting of unit inventory (considered in this paper) captures the large inventory setting as a special case. When we have multiple copies of a resource, we may treat each copy as a distinct resource with unit inventory. Thus, an algorithm designed for the unit inventory case can be generalized to settings with arbitrary (known) inventory without affecting its performance guarantee (see Gong et al. (2021) for a formal proof). Prior to this work, the 0.5 guarantee of greedy was the best known result for OMR.

For non-reusable resources, there is a wealth of work that improves on greedy and achieves the best possible guarantee of $(1-1/e)$. Recall that Karp et al. (1990) introduced the OM problem and showed (among other results) that the Ranking algorithm, which randomly ranks resources at the start and then matches every arrival to the best ranked unmatched resource, is $(1-1/e)$ competitive for OM and that this is the best possible guarantee achievable for the problem. The analysis of Ranking was clarified and considerably simplified by Birnbaum and Mathieu (2008) and Goel and Mehta (2008). In the more general setting where resources have arbitrary rewards $r_i$, Aggarwal et al. (2011) proposed the Perturbed Greedy algorithm and showed that it is $(1-1/e)$ competitive for this generalization of OM. In the large inventory setting, Kalyanasundaram and Pruhs (2000) considered the problem of online matching where the budget of every resource can be more than 1 and showed that as the budget grows to infinity, the natural (deterministic) algorithm that balances the budget used across resources is $(1-1/e)$ competitive. Also in the large inventory setting, Mehta et al. (2007) introduced the Adwords problem, which generalizes the OM setting by allowing multi-unit demand. They gave an online algorithm with a guarantee of $(1-1/e)$ for Adwords. Buchbinder et al. (2007) gave a primal-dual analysis for the result of Mehta et al. (2007). Subsequently, Devanur et al. (2013) proposed the randomized primal-dual framework that can be used to show the aforementioned results in a unified way. In addition to these settings, there is a vast body of work on online matching and (non-reusable) resource allocation in stochastic and hybrid/mixed models of arrival. For a comprehensive review of these works, see Mehta et al. (2013).

### 1.2 Significance of the Unit Inventory Setting

Reusability of resources is an undeniably important aspect of online platforms such as Airbnb, Upwork, and Thumbtack. In these settings, each resource is unique and there is often just one unit of inventory per resource. For example, on a platform such as Airbnb, where every listing is a reusable resource, a typical listing may be occupied by at most one customer at a time. A similar situation arises in the case of boutique hotels (Rusmevichientong et al. 2021). On platforms such as Upwork and Thumbtack (Feng et al. 2021), each freelancing agent can be modeled as a distinct reusable resource that can perform at most one task at a time. Settings with small (not necessarily unit) inventory also arise in applications where procuring new inventory is expensive, sales are slow moving, and resources can be reused many times before expiry. For instance, Besbes et al. (2020) consider a setting where resources are rotable spare parts for aircraft and the starting inventory for most parts is under 10 units (see Figure 9 in Besbes et al. (2020)).

In contrast, the large inventory assumption is natural for applications such as cloud computing, where each machine is a resource and the capacity of a machine is the number of jobs it can handle in parallel (Goyal et al. 2021). Another instance where the large inventory assumption is appropriate is make-to-order settings (Gong et al. 2021), where each production line or machine is a resource and the capacity is measured in the number of units of a good that the machine can manufacture in a day.

## 2 The Periodic Reranking Algorithm

At the start of the planning horizon, the PR algorithm (independently) samples a random seed $y_i$, uniformly from $[0,1]$, for every $i \in I$. Using this seed, and a monotonically increasing trade-off function $g$, the algorithm evaluates reduced prices $r_i(1 - g(y_i))$. Observe that the reduced prices change over time. In particular, after every $d$ units of time, PR samples new seeds for the resources. Re-sampling over periods of length $d$ ensures that resources have a new seed every time they return to the system after a match. Given the reduced prices, PR matches each arrival to an available neighbor with the highest reduced price at the moment of arrival. The name Periodic Reranking comes from the following observation. When rewards are identical (say $r_i = 1$ for every $i \in I$), due to the monotonicity of $g$, the algorithm is equivalent to reranking resources after every $d$ units of time and matching arrivals to the best ranked available neighbor.

When resources are non-reusable (that is, when $d$ is large enough that a matched resource never returns), the PR algorithm reduces to the Perturbed Greedy (PG) algorithm. For the PG algorithm, Aggarwal et al. (2011) showed that choosing $g(x) = e^{x-1}$ leads to the best possible guarantee of $(1-1/e)$ for OM with arbitrary rewards. In PR, we consider the family of functions $g(x) = e^{\beta(x-1)}$ parameterized by $\beta$. Our analysis dictates the choice of $\beta$. In particular, we choose the value of $\beta$ that optimizes the guarantee that can be achieved with our analysis.
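The description above can be sketched in a few lines of Python. This is a sketch under stated assumptions, not a transcription of Algorithm 1: the function name `periodic_reranking`, the period boundaries $[kd, (k+1)d)$, the lazy drawing of seeds, and the value of `beta` in the usage lines are all illustrative choices.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

def periodic_reranking(
    rewards: Dict[str, float],
    arrivals: Sequence[Tuple[float, Sequence[str]]],
    d: float,
    beta: float,
    rng: random.Random,
) -> List[Optional[str]]:
    """Sketch of PR: seeds are redrawn every d time units, and each arrival is
    matched to the available neighbor with the highest reduced price."""
    g = lambda x: math.exp(beta * (x - 1.0))  # assumed trade-off function
    seeds: Dict[Tuple[str, int], float] = {}  # (resource, period) -> seed
    busy_until = {i: float("-inf") for i in rewards}
    choices: List[Optional[str]] = []
    for time, neighbors in arrivals:  # arrivals sorted by time, times >= 0
        period = int(time // d)  # assumption: periods are [k*d, (k+1)*d)
        best, best_price = None, -1.0
        for i in neighbors:
            if time <= busy_until[i]:
                continue  # still in use from an earlier match
            key = (i, period)
            if key not in seeds:  # draw the seed the first time it is needed
                seeds[key] = rng.random()
            price = rewards[i] * (1.0 - g(seeds[key]))  # r_i * (1 - g(y))
            if price > best_price:
                best, best_price = i, price
        if best is not None:
            busy_until[best] = time + d
        choices.append(best)
    return choices

# Hypothetical usage: three resources, arrivals every 0.5 time units.
rng = random.Random(0)
rewards = {"a": 1.0, "b": 1.0, "c": 2.0}
arrivals = [(0.5 * k, ["a", "b", "c"]) for k in range(12)]
choices = periodic_reranking(rewards, arrivals, d=2.0, beta=1.0, rng=rng)
```

Because seeds are independent of the matching history, drawing them lazily is equivalent to drawing all of them at the start of each period.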

### 2.1 Intuition Behind Reranking and Periodicity

We start by making a key observation about the Ranking algorithm for OM through an example.

###### Example 2.1

Consider a setting with three (non-reusable) resources, numbered 1, 2, and 3, and two arrivals. The first arrival has an edge to resources 1 and 2. The second arrival has an edge to resources 2 and 3. Now, Ranking matches the first arrival to resource 2 with probability 0.5. Conditioned on resource 2 being available at the second arrival, the probability that it is matched is 1/3 (of the three equally likely rankings in which resource 1 outranks resource 2, only one also ranks resource 2 above resource 3). In contrast, if we rerank resources at the second arrival (equivalent to randomly picking a resource to match), this conditional probability is 0.5.
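The conditional probability in Example 2.1 can be checked by enumerating all six rankings of the three resources. The helper name below is hypothetical and the code is only a small verification sketch.

```python
from fractions import Fraction
from itertools import permutations

def conditional_match_probability() -> Fraction:
    """P(resource 2 is matched to arrival 2 | resource 2 is still available)
    under Ranking, by enumerating all 6 rankings of resources 1, 2, 3."""
    available = matched = 0
    for order in permutations([1, 2, 3]):
        rank = {res: pos for pos, res in enumerate(order)}  # 0 = best rank
        # Arrival 1 (neighbors 1, 2) takes the better-ranked of the two.
        first = min((1, 2), key=rank.__getitem__)
        if first == 2:
            continue  # resource 2 is gone before arrival 2
        available += 1
        # Arrival 2 (neighbors 2, 3) takes the better-ranked available one.
        if min((2, 3), key=rank.__getitem__) == 2:
            matched += 1
    return Fraction(matched, available)

ranking_prob = conditional_match_probability()
rerank_prob = Fraction(1, 2)  # a fresh uniform ranking of {2, 3} at arrival 2
```

Here `ranking_prob` evaluates to 1/3, compared with 1/2 when the ranking is redrawn at the second arrival.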

This example highlights the balancing property of Ranking. Resources that have been considered for fewer matches previously have a higher probability of being chosen for a match. When resources are reusable, the need to balance requires a careful adjustment. This is illustrated by the case where $d \to 0$, i.e., a matched resource returns right away. In this case, greedy is always optimal since matching a resource now has no impact on future availability of the resource. However, in general, greedy is no better than 0.5 competitive, demonstrating the need for at least some balancing (but less than Ranking). A natural way to accomplish this is to rerank resources. As illustrated in Example 2.1, reranking removes the balancing effect. Taking this idea to the extreme leads us to the following algorithm.

Frequent reranking: Consider the algorithm that reranks resources at every arrival. When vertex weights are identical, say $r_i = 1$ for every $i \in I$, this algorithm is equivalent to the following randomized algorithm: Match every arrival (that can be matched) by sampling a resource uniformly at random. This algorithm, called Random, is known to have the same worst case performance as greedy even for non-reusable resources (Karp et al. 1990).

So clearly, we need a less extreme approach to reranking. To motivate our next algorithm, observe that a decision to match a resource at time $a(t)$ does not affect the resource availability after time $a(t) + d$. In operationalizing this insight, we hope to use reranking to untangle the dependence between matching decisions for arrivals that are well separated across time. The next algorithm tries to accomplish this by reranking a resource every time it is reused.

Reranking on Return (RoR): A less aggressive reranking algorithm is as follows: Rerank a resource every time it returns to the system after a match. We call this the RoR algorithm. Notice that if a resource is not highly ranked then it may not be matched and its rank is not reset. Consequently, RoR does not fully succeed in untangling dependence between matching decisions at arrivals that are well separated. In particular, consider Example 2.1 with the two arrivals separated by more than $d$ units of time. While resource 2 is always available at the second arrival, the probability that it has a higher rank (at the second arrival) than resource 3 is $5/12$ ($= \tfrac{1}{2}\cdot\tfrac{1}{2} + \tfrac{1}{2}\cdot\tfrac{1}{3}$).
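The RoR probability in this example can be verified exactly. The sketch below assumes ranks come from i.i.d. continuous seeds, so every relative ordering of the four seeds involved (the initial seeds of resources 1, 2, 3 and the fresh seed resource 2 draws on return) is equally likely. The names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def ror_probability() -> Fraction:
    """P(resource 2 outranks resource 3 at the second arrival) under RoR.

    Four i.i.d. seeds are in play: the initial seeds of resources 1, 2, 3 and
    the fresh seed that resource 2 draws if it returns from a match. Every
    relative ordering of the four is equally likely, so we enumerate all 24.
    """
    labels = ("r1", "r2", "r3", "r2_fresh")
    favorable = total = 0
    for order in permutations(labels):
        rank = {name: pos for pos, name in enumerate(order)}  # 0 = best rank
        # Arrival 1 (neighbors 1, 2): resource 2 is used iff it outranks 1.
        two_was_matched = rank["r2"] < rank["r1"]
        # RoR reranks resource 2 only if it was matched and then returned.
        current = "r2_fresh" if two_was_matched else "r2"
        favorable += rank[current] < rank["r3"]
        total += 1
    return Fraction(favorable, total)

ror_prob = ror_probability()
```

The enumeration gives 10 favorable orderings out of 24, which is strictly below the value 1/2 that full independence between the two arrivals would give.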

Periodic Reranking overcomes this issue by reranking on a periodic schedule. Notice that PR generates a new rank more frequently than RoR. Revisiting Example 2.1 with PR, we see that the probability that resource 2 is matched to the second arrival (when both arrivals are well separated) is exactly 0.5, as desired. Within a period, PR maintains the same rank and provides the balancing effect of Ranking. Indeed, PR reduces to Ranking when rewards are identical and $d$ is large enough that the whole horizon fits in a single period.

## 3 Analysis of Periodic Reranking

Our analysis relies on the primal-dual framework of Devanur et al. (2013), which is a versatile and general technique for proving guarantees for online matching and related problems. To describe the framework, consider the following primal problem (adapted from Dickerson et al. (2018)), that upper bounds the optimal offline solution for OMR.

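$$\begin{aligned}
\text{Primal:}\quad \max\ & \sum_{(i,t)\in E} r_i\, x_{it} \\
\text{s.t.}\ & \sum_{\tau \le t \,\mid\, (i,\tau)\in E} \mathbb{1}\big(a(t)-a(\tau)\le d\big)\, x_{i\tau} \le 1 \quad \forall i\in I,\ t\in T \\
& \sum_{i\in I \,\mid\, (i,t)\in E} x_{it} \le 1 \quad \forall t\in T \\
& x_{it} \ge 0 \quad \forall (i,t)\in E
\end{aligned}$$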
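For reference, the dual of this linear program can be written out explicitly. The sketch below assumes the primal is a maximization (as an upper bound on the offline optimum requires) and names the dual variables $\lambda_t$ for the per-arrival constraints and $\theta_{it}$ for the per-resource constraints, which is the naming used in the dual fitting later in this section. The variable $x_{it}$ appears in the resource constraints indexed by $(i,\tau)$ for every $\tau \ge t$ with $a(\tau) - a(t) \le d$, which gives the sum in the dual constraint.

$$\begin{aligned}
\text{Dual:}\quad \min\ & \sum_{t\in T}\lambda_t + \sum_{i\in I,\, t\in T}\theta_{it} \\
\text{s.t.}\ & \lambda_t + \sum_{\tau \ge t \,\mid\, a(\tau)-a(t)\le d} \theta_{i\tau} \ge r_i \quad \forall (i,t)\in E \\
& \lambda_t \ge 0,\ \ \theta_{it} \ge 0 \quad \forall i\in I,\ t\in T
\end{aligned}$$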

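Dual certificate: For $t \in T$, let $t(d)$ denote the last arrival in the time interval $[a(t), a(t)+d]$. Let PR denote both the algorithm and its expected total reward. Now, suppose there exist non-negative values $\lambda_t$ and $\theta_{it}$ such that,

$$\text{(i)}\ \ \sum_{t\in T}\lambda_t + \sum_{i\in I,\, t\in T}\theta_{it} \le \text{PR}, \qquad \text{(ii)}\ \ \lambda_t + \sum_{\tau=t}^{t(d)}\theta_{i\tau} \ge \alpha\, r_i \quad \forall (i,t)\in E.$$

Then, by weak LP duality, we have that PR is $\alpha$ competitive.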

Recall that the PR algorithm works in fixed periods of length $d$. The horizon is divided into a finite number of periods; let $k(t)$ denote the period that contains arrival $t$. To ensure that period $k(t)-1$ is well defined for every $t \in T$, we add a dummy period prior to the first arrival. This period does not have any arrivals and simply ensures that every arrival has a preceding period. Let $y_i^k$ denote the $k$-th seed of resource $i$. Note that $y_i^{k(t)}$ is the seed of $i$ in period $k(t)$. Let $Y$ denote the vector of all random seeds. Given a resource $i$ and arrival $t$, let $Y_{-it}$ denote the vector of all seeds except $y_i^{k(t)}$ and $y_i^{k(t)-1}$. In other words, $Y_{-it}$ captures all seeds except the seed of $i$ at $t$ and in the period prior to $t$. We use $E_{y_i^{k(t)},\, y_i^{k(t)-1}}[\cdot]$ to denote expectation with respect to the randomness in seeds $y_i^{k(t)}$ and $y_i^{k(t)-1}$.

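In order to define our dual candidate, we first define random variables $\lambda_t(Y)$ and $\theta_{it}(Y)$, and subsequently set $\lambda_t = E_Y[\lambda_t(Y)]$ and $\theta_{it} = E_Y[\theta_{it}(Y)]$. Recall that $t(d)$ denotes the last arrival in the time interval $[a(t), a(t)+d]$. Inspired by Devanur et al. (2013), we set $\lambda_t(Y)$ and $\theta_{it}(Y)$ as follows.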

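Dual fitting: Initialize all dual variables to 0. Conditioned on $Y$, for each match $(i,t)$ in PR set,

$$\lambda_t(Y) = r_i\left(1 - g\left(y_i^{k(t)}\right)\right), \tag{1}$$

and increment $\theta_{it(d)}(Y)$ as follows,

$$\theta_{it(d)}(Y) = \theta_{it(d)}(Y) + r_i\, g\left(y_i^{k(t)}\right). \tag{2}$$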

The dual candidate given by (1) and (2) satisfies constraint (i) of the dual certificate.

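###### Proof.

Let $\text{PR}(Y)$ denote the matching output by PR given seed vector $Y$. From (2), we have

$$\theta_{i\tau}(Y) = r_i \sum_{t \,\mid\, t(d)=\tau,\ (i,t)\in \text{PR}(Y)} g\left(y_i^{k(t)}\right).$$

Observe that the sets $\{t \mid t(d) = \tau,\ (i,t) \in \text{PR}(Y)\}$, taken over all $i \in I$ and $\tau \in T$, partition the set of arrivals that are matched in $\text{PR}(Y)$. Overloading notation, let $\text{PR}(Y)$ also denote the total revenue of the matching $\text{PR}(Y)$. Then,

$$\sum_{t\in T}\lambda_t(Y) + \sum_{i\in I,\, t\in T}\theta_{it}(Y) = \sum_{(i,t)\in \text{PR}(Y)} r_i = \text{PR}(Y).$$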
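The accounting in this proof is easy to check numerically: each match splits its reward $r_i$ into a $\lambda$ part and a $\theta$ part. The sketch below applies updates (1) and (2) to a hand-written list of matches. The function name `fit_duals`, the input layout, the closed interval used to locate $t(d)$, and the choice $g(x) = e^{\beta(x-1)}$ with `beta=1.0` are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

def fit_duals(
    matches: List[Tuple[str, int, float]],
    rewards: Dict[str, float],
    arrival_times: List[float],
    d: float,
    beta: float,
) -> Tuple[Dict[int, float], Dict[Tuple[str, int], float]]:
    """Apply updates (1) and (2) to a list of matches (resource i, arrival
    index t, seed of i in the period of t). Arrival times must be sorted."""
    g = lambda x: math.exp(beta * (x - 1.0))  # assumed trade-off function
    lam: Dict[int, float] = {}
    theta: Dict[Tuple[str, int], float] = {}
    for i, t, y in matches:
        # t(d): index of the last arrival no later than a(t) + d.
        t_d = max(s for s, a in enumerate(arrival_times)
                  if a <= arrival_times[t] + d)
        lam[t] = rewards[i] * (1.0 - g(y))  # update (1)
        # Update (2): add r_i * g(y) to theta at (i, t(d)).
        theta[(i, t_d)] = theta.get((i, t_d), 0.0) + rewards[i] * g(y)
    return lam, theta

# Hypothetical matching with made-up seeds.
rewards = {"a": 1.0, "b": 3.0}
arrival_times = [0.0, 1.0, 2.5, 4.0]
matches = [("a", 0, 0.2), ("b", 1, 0.7), ("a", 3, 0.9)]
lam, theta = fit_duals(matches, rewards, arrival_times, d=2.0, beta=1.0)
revenue = sum(rewards[i] for i, _, _ in matches)
dual_value = sum(lam.values()) + sum(theta.values())
```

For any list of matches, `dual_value` equals `revenue` up to floating-point error, which is the identity displayed above.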

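Consider an edge $(i,t) \in E$ and seed vector $Y_{-it}$. Suppose that for the candidate solution given by (1) and (2), we have

$$E_{y_i^{k(t)},\, y_i^{k(t)-1}}\left[\lambda_t(Y) + \sum_{\tau=t}^{t(d)}\theta_{i\tau}(Y) \,\middle|\, Y_{-it}\right] \ge \alpha\, r_i, \tag{3}$$

for some value $\alpha > 0$. Then, constraint (ii) of the dual certificate is satisfied for edge $(i,t)$ with the same $\alpha$.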

###### Proof.

The lemma follows by taking expectation over $Y_{-it}$ on both sides of (3). ∎

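We now focus on proving (3) for every edge $(i,t) \in E$ and seed vector $Y_{-it}$. To this end, fix an arbitrary edge $(i,t) \in E$ and seed vector $Y_{-it}$. To simplify notation, let $y_1 = y_i^{k(t)-1}$ and $y_2 = y_i^{k(t)}$. Further, let $\text{PR}(y_1,y_2)$ denote the matching output by PR given seeds $y_1$ and $y_2$ for $i$, with other seeds fixed according to $Y_{-it}$. Let $S_\tau(y_1,y_2)$ denote the set of resources available at arrival $\tau$ in $\text{PR}(y_1,y_2)$. Given $y_1$, for every arrival $\tau$, define the critical threshold $y^c_\tau(y_1)$ as the solution to,

$$r_i\left(1 - g\left(y^c_\tau(y_1)\right)\right) = \max_{j\in S_\tau(y_1,1),\ (j,\tau)\in E} r_j\left(1 - g\left(y_j^{k(\tau)}\right)\right).$$

Due to the monotonicity of function $g$, there is at most one solution to this equation. If there is no solution, we let $y^c_\tau(y_1) = 0$.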
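For an exponential trade-off function the critical threshold has a closed form. The sketch below assumes $g(x) = e^{\beta(x-1)}$ and takes the right-hand side of the defining equation (the best competing reduced price) as an input. The function name is hypothetical, and returning 0 when no solution exists is an assumed convention rather than something this section states explicitly.

```python
import math

def critical_threshold(r_i: float, best_other_price: float, beta: float) -> float:
    """Solve r_i * (1 - g(y)) = best_other_price for y in [0, 1], where
    g(x) = exp(beta * (x - 1)) is the assumed trade-off function."""
    g0 = math.exp(-beta)
    if best_other_price > r_i * (1.0 - g0):
        # Even the best seed y = 0 cannot beat the competing reduced price.
        # Returning 0 here is an assumed convention for the no-solution case.
        return 0.0
    # Invert g: g(y) = 1 - p / r_i  =>  y = 1 + ln(1 - p / r_i) / beta.
    return 1.0 + math.log(1.0 - best_other_price / r_i) / beta

y_c = critical_threshold(r_i=2.0, best_other_price=0.5, beta=1.0)
```

With no competing resource (a competing price of 0) the threshold is 1, so any seed lets resource $i$ win the arrival.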

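Recall that we fixed $Y_{-it}$, so for simplicity let $\lambda_t(y_1,y_2) = \lambda_t(Y)$ and $\theta_{i\tau}(y_1,y_2) = \theta_{i\tau}(Y)$. We also write the conditional expectation $E_{y_i^{k(t)},\, y_i^{k(t)-1}}[\cdot \mid Y_{-it}]$ as $E_{y_1,y_2}[\cdot]$. The next lemma gives useful lower bounds on the expectations of $\lambda_t(y_1,y_2)$ and $\sum_{\tau=t}^{t(d)}\theta_{i\tau}(y_1,y_2)$. Recall that $k(t)$ is the period that contains $t$. Consider the sub-interval of period $k(t)$ that includes all arrivals prior to (and including) $t$, and the set of all resources that are available at some point of time in this sub-interval.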

Given such that , we have,

1. .

Since every resource is matched at most once within each period, the bounds in Lemma 3 are quite similar to their counterparts in the classic OM setting where resources are matched at most once (Devanur et al. 2013). For a proof, see Appendix 6.

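Given $y_1$ such that resource $i$ is not available at any point in the sub-interval of period $k(t)$ up to and including arrival $t$, we have $\sum_{\tau=t}^{t(d)}\theta_{i\tau}(y_1,y_2) \ge r_i\, g(y_1)$ for every $y_2 \in [0,1]$.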

###### Proof.

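Under the stated condition on $y_1$, resource $i$ is unavailable from the start of period $k(t)$ until (at least) time $a(t)$. Thus, $i$ is matched in period $k(t)-1$ to an arrival $t'$ such that $a(t) - a(t') \le d$.

Let $t'(d)$ denote the last arrival in the interval $[a(t'), a(t')+d]$. Since $a(t') \le a(t) \le a(t') + d$, we have $t \le t'(d) \le t(d)$. From (2), given match $(i,t')$, we increase the value of $\theta_{it'(d)}$ by $r_i\, g(y_1)$. Thus,

$$\sum_{\tau=t}^{t(d)}\theta_{i\tau}(y_1,y_2) \ge \theta_{it'(d)}(y_1,y_2) \ge r_i\, g(y_1) \quad \forall y_2\in[0,1].$$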

Notice that Lemma 3 and Lemma 3 apply to mutually exclusive scenarios. Lemma 3 applies to the case where $i$ returns in period $k(t)$ at some time prior to the arrival of $t$. On the other hand, Lemma 3 considers the case where $i$ is in use during the initial part of period $k(t)$ until at least time $a(t)$. For every $y_1$, we are in one of the two scenarios. Lemma 3 gives a sharp characterization of the set of values of $y_1$ that lead to each scenario.

Consider a value of $y_1$ for which $i$ is matched to some arrival in period $k(t)-1$. Then, for every smaller value of $y_1$, $i$ is matched in period $k(t)-1$ to the same arrival or to an arrival that precedes it.

###### Proof.

Recall that except $y_1$ and $y_2$, all seeds are fixed. The value of $y_2$ does not affect the output of PR in periods prior to $k(t)$. Similarly, the value of $y_1$ does not affect the matching prior to period $k(t)-1$. Let $z$ denote the value of $y_1$ in the statement and let $t_z$ denote the arrival matched to $i$ in period $k(t)-1$ when $y_1 = z$. Since every resource can be matched at most once during a single period, when $y_1 = z$, $t_z$ is the unique arrival matched to $i$ during period $k(t)-1$.

Now, consider the change in the matching during period $k(t)-1$ as we vary $y_1$ in the interval $[0, z)$. Suppose there exists a value $y$, with $y < z$, such that for $y_1 = y$, $i$ is not matched prior to $t_z$ in period $k(t)-1$ (if no such value exists, we are done). Then, for $y_1 = y$, $i$ is available at $t_z$ and the matching prior to $t_z$ is identical to the matching when $y_1 = z$. Hence, the set of resources available at $t_z$ is identical for both values of $y_1$. Since $r_i(1-g(y)) > r_i(1-g(z))$ (by monotonicity of function $g$), $i$ must be matched to $t_z$ when $y_1 = y$. This completes the proof. ∎

There exists values , such that and,

1. and is not matched to any arrival in period .

2. and is matched to some arrival in period .

3. and is matched to some arrival in period .

###### Proof.

Proof. Recall that denotes the set of all resources that are available at some point of time in interval . Observe that the value of does not influence the scenario i.e., whether is in (or not in) .

Let be the highest value such that for , is matched in period . Set is no such value exists. From Lemma 3, we have that for every , will continue to be matched in period and in fact, to (possibly) earlier arrivals. Thus, there exists a unique threshold such that is matched in period for every , and unmatched in period for every . Note that if is unmatched in period , then . This scenario corresponds to part of the lemma.

Next, let be the highest value such that for for . In other words, returns from its match in in time to be available at some arrival in . Set is no such value exists. From Lemma 3, for every , is matched (possibly) even earlier in period . Therefore, for every . This corresponds to part of the lemma.

Finally, by definitions of thresholds and , when , we have that is matched in period but . This corresponds to part and completes the proof. ∎

The following statements are true.

1. For every , we have .

2. For every , we have .

###### Proof.

Proof. From Lemma 3, we have that for every , resource is unmatched in period . Therefore, with fixed at 1, the matching output by PR is identical for every value of . This proves part .

Let denote the reduced price of the resource matched to arrival in the matching . Set if is unmatched. To prove part , fix an arbitrary value and consider the matching . From Lemma 3, we have that is matched in period but returns prior to arrival . Let denote the arrival matched to in period and let . Observe that every arrival , is in the interval . Thus, every resource is matched to at most one arrival in . Now, given that is strictly increasing in (for ), to prove , it suffices to show that . Note that for every . Therefore, when the reduced price of arrival matched to is 0, same as, if the arrival were unmatched. Combining this observation with the fact that PR matches each arrival greedily based on reduced prices, follows from,

$$S_\tau(1,1)\setminus\{i\} \subseteq S_\tau(y_1,1) \quad \forall y_1\in(z_1,z_2),\ \tau\in T'. \tag{4}$$

We prove (4) via induction over the set . The first arrival in is . From Lemma 3, prior to , resource is not matched to any arrival in period in the matching . Thus, and are identical prior to and . Now, suppose that (4) holds for all arrivals . We show that (4) holds for arrival as well.

For the sake of contradiction, suppose there exists a resource . Recall that for all . Thus, is matched to arrival in , where . Since every resource is matched to at most one arrival in , we have, . Thus, i.e., in , resource is available but not matched to . This contradicts the fact that PR matches greedily based on reduced prices. ∎

Let and . There exists values with , such that

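$$E_{y_1,y_2}\left[\lambda_t(y_1,y_2) + \sum_{\tau=t}^{t(d)}\theta_{i\tau}(y_1,y_2)\right] \ge r_i\Big[G(z_2) - G(z_1) + \big(1 - g(y^c_t(1))\big)(1 - z_1) + (1 - z_2)\big(G(y^c_t(1)) - G(0)\big) + z_1\big(1 - g(0)\big)\Big].$$
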
###### Proof.

Observe that for any random variable $X$ that depends on $(y_1, y_2)$, we have,

$$E_{y_1,y_2}[X] = (1-z_2)\,E_{y_1,y_2}[X \mid y_1 > z_2] + (z_2 - z_1)\,E_{y_1,y_2}[X \mid y_1\in(z_1,z_2)] + z_1\,E_{y_1,y_2}[X \mid y_1 < z_1].$$

We prove the main claim by establishing lower bounds on each of the three terms on the RHS.

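Case I: $y_1 > z_2$. From Lemma 3, when $y_1 > z_2$, $i$ is not matched in period $k(t)-1$. Thus, $\text{PR}(y_1,y_2) = \text{PR}(1,y_2)$, i.e., in this case, the matching does not change with $y_1$. From Lemma 3 and Lemma 3, we have

$$\lambda_t(y_1,y_2) \ge r_i\big(1 - g(y^c_t(1))\big) \quad \forall y_1\in(z_2,1],\ y_2\in[0,1].$$

Taking expectation over randomness in $y_1$ and $y_2$, we have

$$E_{y_1}\big[E_{y_2}[\lambda_t(y_1,y_2) \mid y_1 > z_2]\big] \ge E_{y_1}\big[r_i\big(1 - g(y^c_t(1))\big)\big] = r_i\big(1 - g(y^c_t(1))\big).$$

Finally, from Lemma 3 and Lemma 3, we have,

$$\begin{aligned}
E_{y_1}\left[E_{y_2}\left[\sum_{\tau=t}^{t(d)}\theta_{i\tau}(y_1,y_2) \,\middle|\, y_1 > z_2\right]\right]
&\ge E_{y_1}\Big[E_{y_2}\big[\mathbb{1}\big(y_2 < y^c_t(1)\big)\, r_i\, g(y_2)\big] \,\Big|\, y_1 > z_2\Big] \\
&= E_{y_1}\left[r_i\int_0^{y^c_t(1)} g(x)\,dx \,\middle|\, y_1 > z_2\right] \\
&= r_i\big(G(y^c_t(1)) - G(0)\big).
\end{aligned}$$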

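Case II: $y_1 \in (z_1, z_2)$. In this case, $i$ is not available in period $k(t)$ prior to arrival $t$ and the value of $y_2$ does not affect the matching until after arrival $t$. From Lemma 3, we have,

$$E_{y_1,y_2}\left[\sum_{\tau=t}^{t(d)}\theta_{i\tau}(y_1,y_2) \,\middle|\, y_1\in(z_1,z_2)\right] \ge r_i\int_{z_1}^{z_2} g(x)\,dx = r_i\big(G(z_2) - G(z_1)\big).$$

From Lemma 3, we have

$$E_{y_1,y_2}\big[\lambda_t(y_1,y_2) \mid y_1\in(z_1,z_2)\big] = E_{y_1}\big[\lambda_t(y_1,1) \mid y_1\in(z_1,z_2)\big] \ge r_i\big(1 - g(y^c_t(1))\big).$$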

Case III: . In this case, . From Lemma 3, we have

 Ey2[λt(y1,y2)∣y1

From part of Lemma 3,

 Ey2⎡⎣t(d)∑τ=tθiτ(y1,y2)∣y1

Combining (5) and (6), we have,

 Ey2⎡⎣λt(y1,y2)+t(d)∑τ=tθiτ(y1,y2)∣y1

The first equality uses the fact that , where is some constant. The second equality follows from the fact that , is a non-decreasing function of for . Thus,

 Ey1⎡⎣Ey2⎡⎣λt(y1,y2)+t(d)∑τ=tθiτ(y1,y2)∣y1

###### Proof.

Proof of Theorem 1. Let

$$f(z_1,z_2,x) = G(z_2) - G(z_1) + (1 - g(x))(1 - z_1) + (1 - z_2)\big(G(x) - G(0)\big) + z_1\big(1 - g(0)\big).$$

We show that $f(z_1,z_2,x) \ge 0.589$ for the chosen value of $\beta$. Then, using Lemma 3 completes the proof. First, using the fact that $G(x) = g(x)/\beta + c$, where $c$ is some constant, we have,

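$$\begin{aligned}
f(z_1,z_2,x) &= \frac{1}{\beta}\big(g(z_2) - g(z_1)\big) + (1 - z_1)(1 - g(x)) + \frac{1 - z_2}{\beta}\big(g(x) - g(0)\big) + z_1\big(1 - g(0)\big) \qquad (7) \\
&= \frac{1}{\beta}\big(g(z_2) - g(z_1)\big) + 1 - \frac{g(0)}{\beta}\big(1 - z_2 + \beta z_1\big) + \frac{g(x)}{\beta}\big(1 - z_2 + \beta z_1 - \beta\big)
\end{aligned}$$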
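The regrouping in (7) and the size of the resulting bound can be sanity-checked numerically. The sketch below assumes $g(x) = e^{\beta(x-1)}$ and $G(x) = g(x)/\beta$ (the additive constant cancels because $f$ only uses differences of $G$). The function names, the grid resolution, and the value `beta = 0.89` are illustrative assumptions and are not taken from the text above.

```python
import math

def f_direct(z1, z2, x, beta):
    """f(z1, z2, x) as defined, with G(x) = g(x) / beta (constant dropped)."""
    g = lambda u: math.exp(beta * (u - 1.0))
    G = lambda u: g(u) / beta
    return (G(z2) - G(z1) + (1 - g(x)) * (1 - z1)
            + (1 - z2) * (G(x) - G(0)) + z1 * (1 - g(0)))

def f_regrouped(z1, z2, x, beta):
    """Second line of (7): terms regrouped by g(0) and g(x)."""
    g = lambda u: math.exp(beta * (u - 1.0))
    return ((g(z2) - g(z1)) / beta
            + 1 - g(0) / beta * (1 - z2 + beta * z1)
            + g(x) / beta * (1 - z2 + beta * z1 - beta))

def grid_min(beta, steps=50):
    """Smallest value of f over a grid with 0 <= z1 <= z2 <= 1 and 0 <= x <= 1."""
    pts = [k / steps for k in range(steps + 1)]
    return min(f_direct(z1, z2, x, beta)
               for z1 in pts for z2 in pts if z1 <= z2 for x in pts)

beta = 0.89  # illustrative value only; not taken from the text
gap = abs(f_direct(0.2, 0.7, 0.4, beta) - f_regrouped(0.2, 0.7, 0.4, beta))
lowest = grid_min(beta)
```

Here `gap` should be zero up to rounding, and `lowest` can be compared against the 0.589 guarantee. Sweeping `beta` over a range is a quick way to see how the two cases below trade off against each other.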

To find the minimum of this function, consider the following cases.

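Case I: $1 - z_2 + \beta z_1 - \beta \ge 0$. In this case, (7) is minimized at $x = 0$. Thus,

$$f(z_1,z_2,x) \ge \frac{1}{\beta}\big(g(z_2) - g(z_1)\big) + 1 - g(0) \ge 1 - g(0) = 1 - e^{-\beta},$$

where we used the fact that $g(z_2) \ge g(z_1)$ for $z_2 \ge z_1$ and $g(0) = e^{-\beta}$.

Case II: $1 - z_2 + \beta z_1 - \beta < 0$. In this case, (7) is minimized at $x = 1$. Thus,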

 f(z1,z2,x) ≥ 1β(g(z