## 1 Introduction

Online bipartite matching and its variants and generalizations have been intensely studied in the last two decades [KVV90, MSVV05, DJK13, MP12, MWZ15, AGKM11, GM08, M13, DH09, CHN14, MGS12, KMT11, FMMM09, BM08]. Internet advertising is a major application domain for these problems, along with crowdsourcing [HV12, KOS14], resource allocation [CF09, MSL18, GNR14] and, more recently, personalized recommendations [GNR14]. Despite a rich body of work on these problems, several basic questions remain open. In this work, we study one such problem.

In the classical online bipartite matching problem introduced by Karp et al. [KVV90], we have a bipartite graph with vertex sets $I$ and $T$ and edge set $E$, where the vertices in $I$, which might correspond to resources in allocation problems, advertisers in Internet advertising, and tasks in crowdsourcing, are known in advance, and the vertices in $T$, also referred to as *arrivals* (corresponding to customers, ad slots, workers), are sequentially revealed one at a time. When a vertex $t \in T$ arrives, the set of edges $(i,t) \in E$ is revealed. After each arrival the (online) algorithm must make an irrevocable decision to offer at most one (available/unmatched) vertex $i$ with an edge to $t$, with the goal of maximizing the total number of matches. The performance of the algorithm is compared against an optimal offline algorithm which knows all edges in advance. In particular, for a sequence of arrivals $G$, let $OPT(G)$ denote the optimal value achievable by an offline algorithm and $\mathcal{A}(G)$ the expected value achieved by a (possibly randomized) online algorithm $\mathcal{A}$. The goal is to design an algorithm that maximizes the competitive ratio $\min_G \mathcal{A}(G)/OPT(G)$. (Footnote 1: We compare against a non-adaptive adversary and thus take the minimum over all fixed arrival sequences. This is a standard assumption in the literature on online matching.)

Karp et al. [KVV90] proposed and analyzed the RANKING algorithm, which attains the best possible competitive ratio of $(1-1/e)$ for online bipartite matching (see [BM08, GM08, DJK13] for a corrected and simplified analysis). This was generalized to vertex-weighted matchings in Aggarwal et al. [AGKM11]. While this gives a more or less complete picture for the classical setting described above, in all the applications we mentioned there is in fact a non-zero probability that any given match might fail. For instance, in Internet advertising users might not click on the ad, and an advertiser pays only if the user actually clicks, i.e., the match succeeds. Similarly, in personalized recommendations profit is earned only if the customer actually buys the offered product. Motivated by this observation, Mehta and Panigrahi [MP12] introduced a generalization of the problem where the objective is to maximize the number of “successful” matches. They associate a probability with every edge, $p_{it}$ for edge $(i,t)$, which is revealed online along with the edge. When an algorithm makes a match, the arrival accepts (*successful* match or reward) with the probability of the corresponding edge, and the success of a given match is independent of past ones. The outcome of the stochastic reward is revealed only after the match is made, and the objective is to maximize the expected number of successful matches. A natural generalization of this problem allows each resource (vertex in $I$) to have a possibly different reward $r_i$ (vertex-weighted rewards). This is the setting we study. (Footnote 2: If rewards are allowed to depend on both the arrival and the resource, the problem becomes much harder. It was shown in [AGKM11] that in this case no algorithm with competitive ratio better than is possible.)

The introduction of stochastic rewards to the setting of online matching raises an interesting question regarding the nature of the offline algorithm one should compare against. In addition to knowing all edges and their probabilities in advance, should the offline algorithm know the outcomes of the random rewards too? Unsurprisingly, such a benchmark turns out to be too strong for any meaningful bound, so Mehta and Panigrahi [MP12] introduce a more interesting benchmark via the following LP (for the case of unit rewards, $r_i = 1$),
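A sketch of this LP in assumed notation, where $x_{it}$ is the (fractional) extent to which arrival $t$ is offered resource $i$ and $p_{it}$ is the success probability of edge $(i,t)$. The symbols $x_{it}$ and $OPT(LP)$ are labels chosen here for illustration; the first constraint reflects the requirement, discussed in Section 1.3, that each resource is used at most once in expectation.

```latex
\begin{align*}
OPT(LP) \;=\; \max\ & \sum_{(i,t) \in E} p_{it}\, x_{it} \\
\text{s.t.}\ & \sum_{t : (i,t) \in E} p_{it}\, x_{it} \le 1 \quad \forall i \in I, \\
& \sum_{i : (i,t) \in E} x_{it} \le 1 \quad \forall t \in T, \\
& x_{it} \ge 0 \quad \forall (i,t) \in E.
\end{align*}
```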

As highlighted in Mehta and Panigrahi [MP12] and Mehta et al. [MWZ15], this benchmark is not subject to the stochasticity of rewards and allows us to evaluate the impact of both the online and the stochastic aspects of the problem. While it is possible to get $1/2$ competitive algorithms in even more general settings using deterministic greedy or even non-adaptive policies (see [GNR14, MWZ15]), beating $1/2$ has proven to be a major challenge. Progress has been made in special cases with remarkable new insights. In particular, Mehta and Panigrahi [MP12] found that with identical rewards and identical probabilities, i.e., $r_i = 1$ and $p_{it} = p$, assigning arrivals to available neighbors with the least number of failed matches in the past (called StochasticBalance) is 0.567 competitive as $p \to 0$. Further, no algorithm has a competitive ratio better than . For the same special case, they also showed that RANKING is 0.534 competitive for vanishing $p$. When the probabilities are heterogeneous but vanishingly small, Mehta et al. [MWZ15] gave a 0.534 competitive algorithm (called SemiAdaptive). No results beating $1/2$ are known to us when rewards are not identical, even for identical probabilities.

While the above work focuses on comparing against $OPT(LP)$, other benchmarks are also used in closely related literature. For instance, a natural comparison could be with algorithms that know the arrivals, edges and edge probabilities in advance, but for which the outcome of a match is not revealed a priori, so that the algorithm must adapt to each outcome in real time. Such *clairvoyant* algorithms appear to be an oft-used alternative benchmark [GNR14, MSL18, CF09]. Let $OPT$ denote the expected reward of the best clairvoyant algorithm. While it can be difficult to understand the performance of such algorithms, observe that an upper bound on $OPT$ is easily given by $OPT(LP)$ (see [MSL18, GNR14] for a formal proof), and the latter is very crisply stated. Moreover, in the absence of the stochastic component (all probabilities one), the two benchmarks converge and we have $OPT = OPT(LP)$. As a result, even when evaluating competitive ratios against clairvoyant algorithms one often chooses to use the upper bound $OPT(LP)$. To the best of our knowledge there are no better upper bounds known on $OPT$ in related literature.

Roadmap: In the next section, we summarize our results and techniques used followed by further discussion of related work. In Section 1.3, we discuss an obstacle to using the randomized primal-dual framework of Devanur et al. [DJK13], for the stochastic rewards problem. This motivates the consideration of our path based program in Section 2, where we formally state and prove our main results. Finally, in Section 3 we conclude with a review of some relevant open problems.

### 1.1 Our Results

Consider the following natural generalization of the RANKING algorithm.
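A minimal Python sketch of one such generalization, under the assumption (consistent with the description in Section 2) that it is vertex-weighted RANKING with each reward $r_i$ replaced by $p_{it} r_i$ for arrival $t$. The perturbation $1 - e^{y-1}$ is the one used by vertex-weighted RANKING [AGKM11]; the function name, argument layout, and return values are hypothetical and chosen only for illustration.

```python
import math
import random


def generalized_ranking(rewards, arrivals, rng=None):
    """Illustrative sketch of RANKING with stochastic rewards.

    rewards:  dict mapping resource i -> reward r_i.
    arrivals: list (in arrival order) of dicts mapping resource i -> p_it,
              one dict per arrival t, containing only t's neighbours.
    Returns (total_reward, offers), where offers lists (t, i, success).
    """
    rng = rng or random.Random()
    # One seed y_i ~ U[0, 1) per resource, drawn once before any arrival.
    y = {i: rng.random() for i in rewards}
    available = set(rewards)
    total, offers = 0.0, []
    for t, probs in enumerate(arrivals):
        candidates = [i for i in probs if i in available]
        if not candidates:
            continue
        # Assumed rule: maximise the perturbed expected reward
        # p_it * r_i * (1 - e^{y_i - 1}) over available neighbours.
        best = max(
            candidates,
            key=lambda i: probs[i] * rewards[i] * (1 - math.exp(y[i] - 1)),
        )
        # The outcome is revealed only after the offer is made.
        success = rng.random() < probs[best]
        offers.append((t, best, success))
        if success:
            # A successful match consumes the resource.
            available.discard(best)
            total += rewards[best]
    return total, offers
```

The sketch is adaptive in the sense used in the text: a resource whose offer fails stays available and may be offered again to a later arrival.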

*Remarks:* When $p_{it} = 1$ for every edge, this is exactly the vertex-weighted version of RANKING, shown to be $(1-1/e)$ competitive for the deterministic rewards case in Aggarwal et al. [AGKM11] and Devanur et al. [DJK13]. Additionally, the algorithm is adaptive, as it takes into account the outcome (success/failure) of every match (refer to [MP12, MWZ15] for detailed discussions on adaptivity/non-adaptivity).

Recall, in the classical vertex-weighted RANKING algorithm the values induce an ordering over the vertices , independent of arrivals [DJK13, AGKM11]. Therefore, given any two resources and an arrival with an edge to both, RANKING prefers to offer over whenever , regardless of . This fixed ordering/ranking is a key component in the analysis of RANKING for the classical case. However, in our setting Algorithm 1 may prefer over for arrival but vice versa for another arrival due to different values of . Yet, if decomposes into for all , we retain the arrival independent ordering. Similar special cases in the context of online matching (with deterministic edge weights) and related machines model in scheduling have been previously considered (see [CHN14] for a detailed discussion).

###### Theorem 1.

For decomposable probabilities $p_{it} = p_i p_t$, Algorithm 1 achieves the best possible competitive ratio of $(1-1/e)$ w.r.t. clairvoyant algorithms.

We remark that the case of decomposable probabilities (with otherwise arbitrary magnitudes) includes the case of identical probabilities for which Mehta and Panigrahi [MP12] showed that there is no competitive algorithm when comparing with . In contrast, Theorem 1 shows that one can achieve a guarantee of when comparing against . For the case of arbitrary probabilities , it is not clear if Algorithm 1 yields a guarantee better than and we leave this as an intriguing problem for future work. The analysis primarily breaks down due to the lack of a unique preference order across arrivals (Algorithm 1 may prefer over for arrival but vice versa for another arrival due to non-decomposable values ). Interestingly, if we additionally assume that the probabilities are vanishingly small (the case in [MWZ15]) i.e., probabilities are fully heterogeneous but for every , we can overcome this obstacle. This is accomplished by *fudging* the duals through adding an auxiliary (non-negative) term to guarantee dual feasibility. Of course, making the dual larger will in general always ensure feasibility, but possibly at the cost of making the dual objective incomparably large. In this case however, we can ensure that our auxiliary dual variables do not increase the dual objective by a significant factor. More formally, we show the following.

###### Theorem 2.

Our theorem requires probabilities to vanish at a rate of , and this matches the rate required in Mehta et al. [MWZ15], who also look at the same special case. We discuss this further in Section 2.2.1.

Overview of our techniques: In an effort to address the hurdle outlined in Section 1.3 and get a tighter bound on , we introduce a new path based program that has a constraint for every possible sample path instance of the stochastic reward process, along with constraints that enforce the independence of the clairvoyant’s decisions from the future outcomes on the sample path. Then, we establish a form of weak duality to compare the value of our online algorithm with the optimal value of this program. Combined, this allows us to leverage the general and powerful randomized primal-dual framework from [DJK13] to prove Theorem 1. To show Theorem 2, we define our dual variables in a novel way. Specifically, we introduce an auxiliary term into our dual variables. This guarantees feasibility, and by being careful we can prevent the dual objective from blowing up for small probabilities. Thus, the idea of a path based formulation and a novel dual fitting allows us to extend the reach of the randomized primal-dual framework to the setting of stochastic rewards. In particular, we believe that the notion of path based programs might be useful in other related settings that involve stochastic components and where one seeks to compare against clairvoyant algorithms.

Competitive Ratio (Against Clairvoyant)

| | Identical rewards | Vertex-weighted rewards |
|---|---|---|
| [GNR14] | | |
| General | 1/2 [MWZ15] | 1/2 [GNR14] |

### 1.2 Related Work

Karp et al. [KVV90] introduced the online bipartite matching model and proposed the optimal competitive RANKING algorithm. Birnbaum and Mathieu [BM08], Goel and Mehta [GM08] considerably simplified the original analysis. Subsequently, Devanur et al. [DJK13] gave an elegant randomized primal-dual interpretation that we also use here. Their framework applies to and simplifies more general settings, such as vertex-weighted matchings in Aggarwal et al. [AGKM11], and the related budgeted setting (AdWords problem of Mehta et al. [MSVV05]). There has also been a series of results in the random arrival model, where there is distributional information in the arrivals that can be exploited for better results [BSSX16, DH09, MGS12, KMT11, FMMM09]. For a detailed survey we refer the reader to the monograph by Mehta [M13].

Other settings closest to ours have been considered in Golrezaei et al. [GNR14] and Ma and Simchi-Levi [MSL18]. Golrezaei et al. [GNR14] consider a broad generalization of our setting where one offers an assortment of products to each arrival, and the arrival then chooses based on a *choice model* that is revealed online at the time of arrival. With the objective of maximizing total expected reward, they show that when the number of copies (inventory) of each resource approaches $\infty$, an *inventory balancing* algorithm is asymptotically $(1-1/e)$ competitive. They seek to compare against clairvoyant algorithms; however, for the purpose of analysis they use the upper bound for a suitably generalized LP. Nonetheless, they achieve the best possible competitive ratio asymptotically. However, for the case of unit inventory their guarantee converges to $1/2$. (Footnote 3: Guarantees for the case of unit inventory lead to stronger results that generalize to the case of arbitrary inventories. Consider a unit inventory setting where in place of each in the original setting, we have resources each with inventory of 1 and arrivals that have edges to all vertices for every edge in the original instance. Now the offline/clairvoyant algorithm knows all arrivals in advance and thus knows these copies represent the same resource. Therefore, remains unchanged and an algorithm for the unit inventory case can be used for arbitrary inventory levels without loss in guarantee.) Recall that beating $1/2$ is an open problem even in our setting, which is a special case of theirs. Note that their asymptotic result also implies a deterministic $(1-1/e)$ competitive algorithm for our setting when we have infinitely many copies of each vertex . More recently, Ma and Simchi-Levi [MSL18] also studied a generalization of our setting in the resource allocation framework, where each resource can be sold at multiple reward rates with possibly different probabilities of successful reward for each rate. Similar to Golrezaei et al. [GNR14], they focus on the asymptotic regime where the inventory of each resource approaches $\infty$ and give the optimal competitive ratio against clairvoyant algorithms, using as an upper bound on for analysis.

### 1.3 Preliminaries

Devanur et al. [DJK13] introduced a unifying primal-dual framework for understanding and analyzing algorithms for online matching. Given the LP for the stochastic rewards setting, a natural approach would be to explore a similar primal-dual algorithm and analysis. At the outset it might even seem that Algorithm 1 offers a natural extension, so the analysis might generalize directly. Yet there are certain obstacles on this path and previous work explores various novel approaches instead [MP12, MWZ15]. Let us understand one such hurdle from the context of the framework in [DJK13]. We start with the dual of the LP,

In line with [DJK13], when the algorithm offers vertex to arrival : if the match succeeds let us set dual variables to and and we let the variables be zero otherwise. The sum , clearly captures the reward of the algorithm. Further, if the setting also ensured that the dual constraints were satisfied to within a constant factor, we would have a corresponding competitive ratio guarantee for Algorithm 1. To that end, consider expectations over the success of the match . We have, and . While for a specific set of values , this assignment of dual values may not be close to feasible, in the deterministic case [DJK13] showed that the dual constraints are feasible to within a constant factor in expectation over the values i.e., . However, in case of stochastic rewards such a setting of duals need not be feasible to within any constant factor even in expectation (over values ). For instance, consider the dual constraint corresponding to match in our example,

Where is an indicator variable that is 1 if edge succeeds. Taking expectation over the outcome of the match with fixed we get,

For small , this approaches . So if is matched to only for and is otherwise unmatched we have, . Part of the problem seems to be that the formulation only insists that each resource is used at most once in expectation, giving rise to a term as we saw above. Alternatively, similar to [GNR14] we could define the duals to guarantee feasibility. For instance, we can let be the same as above but set (regardless of success/failure of the match). Now, . However, the sum can be much larger than the expected reward of the algorithm for match i.e., . The latter can even approach 0 for . It is not obvious (to us) if one can overcome these hurdles while still considering an expectation based LP. Our path dependent formulation circumvents this problem by imposing constraints on every sample path as opposed to in expectation over all paths.

### 1.4 Notation

We now review and introduce new notation before proceeding with a formal presentation of the results in the next section. First, recall the problem definition. We have a bipartite graph with a set of offline vertices and an arbitrary number of vertices arriving online. We use the index of arrivals to also denote their order in time. So assume vertex arrives at time . Now, all edges incident on vertex are revealed when arrives, along with a corresponding probability of success . On each new arrival, we must make an irrevocable decision to match the arrival to any one of the available offline neighbours. Once a match, say , is made it succeeds with probability , making unavailable to future arrivals and leading to a reward . The objective is to maximize the expected reward summed over all arrivals. Recall that denotes the expected reward of the best clairvoyant algorithm.

For edge , let be an indicator random variable that takes value one w.p. . Let denote a sample path given by an instance of stochastic rewards over all arrivals, and let represent the partial sample path described by up to and including arrival . In other words, determines values of random variables , for all edges incident on arrivals . Also, we define to be consistent with for every , meaning both represent the same path up to arrival . Let and represent the corresponding random variables. Let denote the value of random variable on sample path . Finally, let denote the universe of all sample paths (full and partial). Note that the number of (full) sample paths is at most for .

## 2 Main Results

We analyze Algorithm 1, which is equivalent to the vertex weighted version of the RANKING algorithm with rewards/revenues replaced by for arrival , and show a competitive ratio of in expectation against a non-adaptive adversary.

As we discussed in Section 1.3, considering an upper bound on the value of the clairvoyant via an LP that only imposes constraints in expectation raises several issues. So consider sample paths , and for every edge , let represent the decision of clairvoyant on whether is matched to on this sample path. Clearly, the following must be satisfied on every sample path ,

(1) | |||

(2) |

Constraints (1) capture the fact that any resource is used at most once on every sample path. This is in contrast to the LP earlier, where this condition was imposed only in expectation over all sample paths. Similarly, constraints (2) capture that is matched to at most one vertex on every sample path. Recall that we assume the clairvoyant knows all edges and edge probabilities in advance but not actual outcomes of future matches. In fact, when deciding the match for arrival we can let clairvoyant have access to values , for all edges incident on arrivals . Knowing if other matches in the past would have been (un)successful does not give any additional useful information to the clairvoyant, due to the irrevocability of decisions and independence of rewards across arrivals. However, must be independent of the edge random variables for all edges revealed in the future i.e., . In other words, we must have,

(3) |
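In assumed notation, with $\omega$ a sample path, $x^{\omega}_{it} \in [0,1]$ the clairvoyant's decision on edge $(i,t)$, and $\mathbb{1}^{\omega}(i,t)$ the success indicator of that edge on $\omega$, the three conditions described above might be written as follows. This is a sketch built from the verbal descriptions, reading "used" in (1) as "successfully matched"; the symbols are labels chosen here rather than a definitive statement of the program.

```latex
\begin{align*}
\sum_{t : (i,t) \in E} x^{\omega}_{it}\, \mathbb{1}^{\omega}(i,t) &\le 1 && \forall i \in I, \tag{1} \\
\sum_{i : (i,t) \in E} x^{\omega}_{it} &\le 1 && \forall t \in T, \tag{2} \\
x^{\omega}_{it} &= x^{\omega'}_{it} && \forall (i,t) \in E \text{ and all } \omega' \text{ that agree with } \omega \text{ up to arrival } t-1. \tag{3}
\end{align*}
```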

We now formulate a linear program (PBP) with these constraints and the objective of maximizing the total expected reward.

s.t. | ||||

Since a clairvoyant algorithm must satisfy all constraints in the program, the values generated by executing any clairvoyant algorithm over sample paths , yield a feasible solution for PBP. Let us simplify the objective of PBP, refer to the optimal objective value of PBP as and consider the following,

Where the first equality follows from the tower property of expectation, the second from condition (3), the third equality follows from the independence of each reward from past rewards, and the final equality also from condition (3).

*Remarks:* Note that, given a random seed for a randomized clairvoyant algorithm, variables corresponding to the output of the clairvoyant on sample path , are binary. However, can be fractional. Further note, in the deterministic case where all edge probabilities are one, PBP is equivalent to the classical LP.

The lemma that follows establishes a weak duality result which lets us upper bound using suitable dual values. We could, of course, take a dual directly. However, Lemma 1 allows a very natural dual fitting in the next section, since it avoids the additional set of dual variables that result from the equalities (3). The interested reader may refer to Appendix A for further discussion on proceeding directly via LP duality.

###### Lemma 1.

Consider non-negative variables satisfying,

for every edge . Then for every feasible solution for PBP,

Therefore, .

###### Proof.

Fix an arbitrary feasible solution for PBP, then for every edge and sample path we have,

Where we multiplied inequality (2) for by and inequality (1) for by and used non-negativity. Following the standard procedure for LP duality, let us sum over all for a fixed ,

Next, taking expectation (convex combination of linear constraints) over sample paths and using we have,

Where we used the tower property of expectation and condition (3) to get the final inequality. ∎

The above lemma implies that we need to set dual variables such that the ‘dual constraint’,

is satisfied for every edge in expectation, conditioned on knowing the success/failure of all edges before . Therefore, in analyzing the dual feasibility for any edge in the next section, we fix a sample path and work in expectation over the success/failure of edges incident on arrival .

### 2.1 Decomposable Probabilities ($p_{it} = p_i p_t$)

Given Lemma 1, we now aim to find dual variables and such that,

(4) |

and for every ,

(5) |

Equation (4) requires that the expected sum of dual variables matches the expected reward of Algorithm 1, where expectation is over the stochastic rewards and random variables (samples denoted using ) generated by the algorithm. Inequalities (5) require that the dual feasibility condition required by Lemma 1 is met to within a constant factor. If we can find such variables, then and would satisfy the condition in Lemma 1 resulting in a proof of competitiveness for Algorithm 1. In what follows, we will find such a setting of duals with .

To ease comparing the performance of Algorithm 1 to clairvoyant, we use the following coupling for the analysis. A sample path for the stochastic rewards is randomly sampled. Instead of using different sample paths for clairvoyant and Algorithm 1, we use the same path for both. So whenever a match is made by either algorithm, we use the corresponding variable from to see if it is successful. So w.l.o.g., both algorithms are subject to the same values .
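The coupling can be illustrated with a small Python sketch: all edge outcomes are drawn once, up front, and any number of policies are then evaluated against that same fixed sample path. The helper names (`sample_path`, `run_policy`) and the policy interface are hypothetical, chosen only for illustration.

```python
import random


def sample_path(edges, rng=None):
    """Draw one outcome per edge in advance.

    edges: dict mapping (i, t) -> success probability p_it.
    Returns a dict mapping (i, t) -> True/False (match would succeed or not).
    """
    rng = rng or random.Random()
    return {edge: rng.random() < p for edge, p in edges.items()}


def run_policy(policy, rewards, arrivals, path):
    """Run a matching policy against a fixed sample path.

    policy(t, candidates) returns one resource from candidates, or None.
    arrivals: list (in arrival order) of neighbour lists.
    """
    available = set(rewards)
    total = 0.0
    for t, neighbours in enumerate(arrivals):
        candidates = [i for i in neighbours if i in available]
        choice = policy(t, candidates) if candidates else None
        # The outcome is read from the shared path, not re-sampled.
        if choice is not None and path[(choice, t)]:
            available.discard(choice)
            total += rewards[choice]
    return total
```

Because the outcomes live in `path` rather than inside either algorithm, two policies that offer the same edge always observe the same success or failure, which is the property the coupling relies on.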

Consider the following process for setting duals. Initialize variables and to 0. Consider the online arrival process and execute Algorithm 1 for fixed . Let be the sample path of the stochastic rewards experienced by the algorithm. Now, whenever the algorithm offers to set,

(6) |

Clearly, is set uniquely since the algorithm offers at most one to arrival , and takes a non-zero value only if it is also accepted by some , and if this occurs is never re-set. Taking expectation over we define our dual variables as,

First, note that if algorithm matches to we have, . Therefore, the expected total reward earned by Algorithm 1 is given by, , as desired. So this setting of dual variables satisfies condition (4). Moreover, for all edges that are output by the algorithm for a given we have, . Thus, we do not face the issue highlighted in Section 1.3, where setting dual variables similar to above for the expectation based LP led to a term in the dual constraints.

In the rest of this section, we focus on showing that inequalities (5) are satisfied for every edge (i,t), with , as long as the probabilities are decomposable. We are interested in conditional expectations where the values of variables are fixed for all . So we proceed as above by fixing an arbitrary sample path up to time and perform the rest of the analysis conditioned on this path. So all expectations for the rest of this section are conditioned on , and we suppress this in the notation. Also for notational convenience, we write simply as and similarly, as . Also, expectation is written simply as , but we continue to use to distinguish the two. Now consider the following proposition.

###### Proposition 1.

Fix an edge and sample path up to (and including) arrival i.e., . Also fix (arbitrary) values for , denoted . Then for duals set according to (6), it suffices to show,

Where the expectation is conditioned on , and we ignored from sub/superscript.

###### Proof.

For every edge , . ∎

So with fixed arrival sequence, fixed , and fixed , we wish to show the above inequality for an arbitrary edge . Similar to the analysis for the classical case in [DJK13], our proof is broken into showing lower bounds on and . The bound on will be similar to Lemmas 1 and 2 in [DJK13]. In contrast, the bound on has an interesting subtlety absent in the classical case, owing to the fact that only some matches might be successful on (fixed) sample path . We discuss this further in the proof of Lemma 3.

In order to proceed with the proofs, consider the matching given by Algorithm 1 when it is executed with the reduced set of vertices . Unlike the deterministic case, here may have one offline vertex ‘matched’ multiple times (though only one match could have actually succeeded). Let denote the set of available neighbours at in this execution of Algorithm 1 without vertex . Define such that . Set if no such value exists. Due to the monotonicity of , if a value of exists it will be unique.

###### Lemma 2.

With fixed, for every . Thus, .

###### Proof.

Let be the set of available neighbours at when Algorithm 1 is executed with the full vertex set and value for . It suffices to show that for every , , which follows directly from . So fix and suppose is the first arrival is offered to by the algorithm. If , the output of the algorithm prior to arrival coincides with , and we have . When , the set of available matches for arrival is a superset (not necessarily strict) of the set . Inductively, this is true for every arrival after , including , giving us the desired. ∎

###### Lemma 3.

If the probabilities decompose such that for every , then for fixed we have, .

###### Proof.

We focus on the interval . There are two possibilities, either is successfully matched before and unavailable for or is available when arrives. We show that in the latter scenario the algorithm matches to . Let us first see how this proves the claim in the lemma. When is unavailable for , since was successfully matched to an arrival preceding . In case is available and thus matched to (by assumption), . Therefore, in both cases . Since this holds for all values of we have, .

To finish the proof we argue that for , is matched to if available. In the classical/deterministic case this follows directly from and the fact that could not have been offered to any arrival prior to . In our case, we still have . However, for some value of , may have been unsuccessfully matched to some arrival preceding , freeing up some () that was successfully matched to for larger . If is not matched/offered to any arrival preceding then the claim follows as in the deterministic case. If is matched (unsuccessfully) prior to and consequently is matched to some () such that , then consider the graph given by the difference between the current matching (with value ) and the matching , where is removed from consideration during the execution. On this difference graph, there exists a unique alternating path that includes both and . Using the decomposition of probabilities, for every edge on the alternating path we have, (note we may have ). In particular, and thus, . Therefore, if is available on arrival of , the algorithm matches to (since ties occur w.p. 0). While this completes the proof, observe that if there are no successful matches prior to in the matching , then is matched to (if available) for all for arbitrary (not necessarily decomposable) probabilities, and the claim in the lemma holds. This fact will be useful later. ∎

###### Proof.

∎

For general probabilities, let us see an example where Lemma 3 fails. Consider a 3x3 bipartite graph with arrival arriving first followed by and then . Let the vertices be labeled with edges , , . Let and consider probabilities , and . Note that such probabilities are not decomposable. We will focus on the dual feasibility of constraint corresponding to edge so we fix , with and consider the following sample path before arrives, and , . Observe that and consider and as varies. For , is offered to , is offered to and accepted by and is offered to therefore, and . For , is offered to but not accepted by and suppose is sufficiently smaller than so is offered to . Thus, but for . For , is offered to and accepted by , is offered to and accepted by and is unmatched with and . Combining all pieces we have . Clearly, Lemma 3 does not hold and the previous expectation can be . More concretely, let and be such that . Then for
