Graph connectivity is an important metric for measuring the functionality of a network. Connectivity-related problems typically ask for the minimum-size set of components (nodes or edges) whose removal disconnects a target set of nodes. This consideration has led to the investigation of many forms of cutting problems in networks, e.g., the minimum cut problem, the minimum multicut problem, the sparsest cut problem (Vazirani, 2013), and, most recently, the Length-Bounded Multicut (LB-MULTICUT) problem (Kuhnle et al., 2018a). In addition, various connectivity-based measures have formed the framework for assessing network resilience to external attacks (Grubesic et al., 2008; Sen et al., 2009; Shen et al., 2013, 2012; Nguyen et al., 2013; Dinh et al., 2014; Dinh and Thai, 2015b, a; Pan et al., 2018; Dinh et al., 2010; Mishra et al., 2014).
However, many network applications now consider factors beyond connectivity when determining network functionality. For example, in the Bitcoin network, to guarantee synchronization, not only is connectivity required, but the network is also configured to keep the broadcast time of transaction messages under several seconds (Apostolaki et al., 2017). As another example, consider a time-sensitive delivery on a road network, where edge weights represent the travel time between destinations. Connectivity between a source and a destination is insufficient when a guarantee on the delivery time is required.
Therefore, a natural question is whether a tech-savvy attacker can damage network functionality without impacting connectivity. In various forms, such attacks are in fact common, yet stealthy. For example, in the I-SIG system, real-time vehicle trajectory data transmitted using connected-vehicle (CV) technology are used to intelligently control the duration and sequence of traffic signals (CvA, 2018; CVp, 2018a, b; Checkoway et al., 2011; Koscher et al., 2010; Mazloom et al., 2016; Chen et al., 2018). An adversary can therefore compromise multiple vehicles and send malicious messages with false data (e.g., speed and location) to the I-SIG system to influence traffic control decisions. Previous works have shown that even a single attack vehicle can manipulate the intelligent traffic control algorithm in the I-SIG system and cause severe traffic jams (CvA, 2018; Chen et al., 2018). To understand the severity of such attacks, it is necessary to study which roads attackers may target and the minimum number of vehicles they must compromise to cause large-scale congestion, e.g., making travel between two given locations take several hours longer than usual. Such attacks can serve political or financial purposes, e.g., blocking the traffic of business competitors (CvA, 2018).
As another example, in the Bitcoin network, or any blockchain-based application, an attacker can aim to damage consensus among the public-ledger copies of major miners by delaying block propagation between them. Recent work (Apostolaki et al., 2017) has shown that, after receiving a request for block information from another node, a Bitcoin node can take up to 20 minutes to respond. An attacker can therefore flood Bitcoin nodes with more requests or "dust" messages than they can handle, thus delaying their block delivery. By flooding multiple nodes, the attacker can prevent miners from reaching consensus on a given state of the blockchain. The impact of this attack depends on the victim. If the victim is a merchant, it becomes vulnerable to double-spending attacks (Dou, 2018). If the victim is a miner, the attack wastes its computational power (Pinzón and Rocha, 2016). If the victim is a regular node, it will have an outdated view of the blockchain and thus be more vulnerable to temporal attacks that exploit lag in blockchain synchronization (Pinzón and Rocha, 2016; Dennis et al., 2016). Therefore, it is necessary to study which nodes are critical and how an attacker would attack them (e.g., with how much bandwidth consumption) to impair the Bitcoin network's functionality, e.g., causing major miners to take several hours to reach consensus.
With this motivation, we consider the Quality of Service Degradation (QoSD) problem. Given a directed graph representing a network, a threshold T, and a target set of node pairs, the objective is to identify a minimum budget to increase the edge (or node) weights so that the weighted, shortest-path distance between each target pair is no smaller than T. Intuitively, the goal of this problem is to assess how robust the network is: the greater the budget needed to increase edge weights, the more resilient the network is to perturbations of edge weights. In addition, the budget assigned to an edge in the solution indicates the importance of that edge to the desired functionality.
In the context of network reliability, Kuhnle et al. (Kuhnle et al., 2018a) recently studied a special case of our problem under the name LB-MULTICUT. In contrast to our problem, the objective there is to identify a minimum set of edges whose removal ensures the distance between each pair of nodes is no smaller than the threshold. Directly adopting LB-MULTICUT solutions for our QoSD problem is not feasible, since most of those solutions exploit the fact that their problem can be formulated as an Integer Program and exhibits submodular behavior. The QoSD objective, in contrast, is neither submodular nor supermodular, which makes devising an efficient algorithm more challenging. Moreover, modern networked systems are increasingly massive in scale, often with millions of vertices and edges; the need for a scalable algorithm on large-scale networks poses another challenge for our problem. Motivated by these observations, the main contributions of this work are as follows.
We provide three highly scalable algorithms for our problem: two iterative algorithms, IG and AT, where the guarantee of IG depends on the concave ratio, a metric measuring the concave property of the edge weight functions w.r.t. the budget to increase edge weights, and the guarantee of AT depends on h, the maximum number of edges of a path connecting a target pair, and on the number of nodes; and SA, a probabilistic approximation algorithm that returns an approximate result with high probability, with a guarantee also involving the maximum degree of the graph.
When the edge weight functions are linear w.r.t. the cost to increase edge weights, we propose LR, a randomized rounding algorithm based on an LP relaxation of the problem; LR provides an approximation guarantee.
We extensively evaluate our algorithms on both synthetic networks and large-scale, real-world networks. All four of our algorithms are demonstrated to scale to networks with millions of nodes and edges within a few hours and to return nearly optimal solutions. The experiments also show the trade-offs among our proposed algorithms in terms of runtime and solution quality.
Organization. The rest of this paper is organized as follows. Section 2 reviews the literature related to our problem. In Section 3, we formally define the problem and discuss its challenges. The four solutions, IG, AT, SA, and LR, are presented in Sections 4, 5, 6, and 7, respectively. In Section 8, we evaluate our algorithms, comparing against heuristic methods for the general case and against the algorithms of (Kuhnle et al., 2018a) for the special case. Finally, Section 9 concludes the paper.
2. Related works
Relationship with Kuhnle et al. (Kuhnle et al., 2018a). Kuhnle et al. studied the Length-Bounded Multicut problem (LB-MULTICUT). The objective of this problem is to identify a minimum set of edges whose removal ensures the distance between each pair of nodes of a given set is no smaller than a threshold. LB-MULTICUT is a special case of QoSD in which we impose two conditions: (1) the only way to increase an edge weight is to make the weight greater than T, and (2) the cost of doing so is uniform across edges.
Our QoSD problem is more general and realistic than LB-MULTICUT, as briefly discussed earlier. From the adversarial perspective, it is often impractical to remove edges from a network. Taking the I-SIG system as an example, the attacker can only degrade network functionality by compromising multiple vehicles, causing severe traffic jams on the road network, rather than physically destroying roads. Furthermore, in Bitcoin-based applications, the Bitcoin protocol only allows a maximum delay of 20 minutes for any packet delivery, and for any damaged P2P connection, the protocol creates another connection to maintain the connectivity of the Bitcoin network. Thus, LB-MULTICUT cannot be applied to these two applications.
Beyond the special case, LB-MULTICUT and QoSD are fundamentally different, so solutions to LB-MULTICUT do not readily apply to QoSD. More specifically, Kuhnle et al. proposed three approximation algorithms for LB-MULTICUT, namely MIA, TAG, and a greedy sampling-based algorithm (Kuhnle et al., 2018a). We now discuss the limits of these algorithms w.r.t. solving QoSD.
The general idea of MIA is to find multicuts of subgraphs of the input network such that each optimal multicut is a lower bound on the optimal solution of the LB-MULTICUT instance. Here the authors exploit the similarity between LB-MULTICUT and the multicut problem, where cutting a single edge of a path is sufficient to disconnect that path. For the multicut subroutine, MIA utilizes the approximation algorithm proposed by Agarwal et al. (Agarwal et al., 2007); MIA's performance guarantee is thus bounded in terms of the number of considered subgraphs. Our problem does not involve edge removals, so there is no clear connection with multicut. Therefore, we find it infeasible to apply MIA, even with modification, to solve our problem.
The next algorithm for LB-MULTICUT is TAG. TAG is a dynamic algorithm that uses a primal-dual solution to bound worst-case performance under incremental graph changes and improves the solution in practice by periodic pruning. TAG exploits the fact that cutting all edges in a maximal set of disjoint paths connecting the target pairs is sufficient to disconnect those pairs. However, this approach does not carry over to our problem: increasing the weights of those edges to their maximum does not guarantee that the shortest paths connecting the target pairs reach the threshold T.
The third algorithm is a greedy, sampling-based solution with an approximation guarantee in terms of h, the maximum number of edges of a single path connecting a target pair, which holds with high probability. Our SA algorithm is inspired by it in that we also use a greedy approach based on path samples, generated using probabilistic hints from shortest-path computations to guide the sampling. However, since our objective function is non-submodular, we prove an approximation guarantee that depends on the concave ratio, which measures the concave property of the edge weight functions. Moreover, we speed up the process of obtaining a feasible solution by allowing a finite budget, up to a chosen bound, to be added in each sampling step, and we prove that this bound does not impact the performance guarantee of SA.
Optimization on the Integer Lattice. As there is a finite budget to increase each edge weight, we model our problem as a minimization problem on the integer lattice: given a set of constraint functions on the integer lattice, the objective is to minimize the size of a vector satisfying all of them. Optimization on the integer lattice has received much attention recently; however, most of those works focus on the maximization version, which asks to maximize a function under a cardinality constraint. When the function is non-submodular, those works exploit either the submodularity ratio (Das and Kempe, 2011), the generalized curvature (Bian et al., 2017), or the diminishing-return ratio (Kuhnle et al., 2018b; Lehmann et al., 2006) to devise approximation algorithms with guarantees in terms of those parameters. However, those parameters can be small and computationally hard to obtain for several real-world objectives, which raises concerns about the resulting theoretical ratios. For example, Kuhnle et al. (Kuhnle et al., 2018b) proposed a fast maximization of non-submodular, monotone functions on the integer lattice whose approximation ratio degrades as the relevant parameters shrink. In our work, we utilize the concave property of the edge weight functions to introduce the concave ratio, which we use to prove the theoretical guarantees of IG and of the greedy selection in SA, and to bound the sampling size of SA. The concave ratio can be found easily from the derivatives of the edge weight functions, or by scanning through all edge weight functions. Since it can be small in some cases, we also devise AT from an improved algorithm, which discards the dependence on the concave ratio to obtain a better theoretical performance guarantee at the cost of a worse runtime.
Classical Multicut Problem. The Multicut problem asks for the minimum number of edges (or nodes) whose removal ensures each target pair is topologically disconnected. For the edge version in undirected graphs, an approximation was developed by Garg et al. (Garg et al., 1996) by considering multicommodity flow. In directed graphs, Gupta (Gupta, 2003) developed an approximation algorithm, which was later improved by Agarwal et al. (Agarwal et al., 2007). These solutions are based on the optimal solution of the linear relaxation modeling the problem instance. Our LR algorithm is inspired by this approach, but we must deal with the challenge that the LP-optimal value on an edge can be larger than 1. Therefore, the discretization techniques for the Multicut problem cannot be directly applied to our problem. We devise a randomized rounding technique that obtains a feasible solution with high probability while ensuring a provable performance ratio.
3. Problem Formulation
In this section, we formally define the Quality of Service Degradation (QoSD) problem in the form of cardinality minimization on the integer lattice and present the challenges in solving it.
We abstract the network as a weighted directed graph with a set of nodes and a set of directed edges. Each edge is associated with a monotonically increasing function that indicates the weight of the edge w.r.t. the budget spent to increase it; in other words, if we spend a given budget on an edge, the weight of that edge becomes the value of its weight function at that budget.
Let each edge have a maximum possible budget to increase its weight. A budget vector assigns to each edge the budget spent on it; componentwise it is bounded by these maxima, and the set of all such vectors is called the box. The overall budget to increase the weights of all edges is the sum of the entries of the budget vector, and the edge weight functions are collected into a set. Note that, for simplicity, the same notation is used for an edge and for the index of this edge, i.e., the entry of a budget vector on an edge denotes the budget to increase the weight of that edge; the same rule applies to unit vectors. Also, the left (right) neighbor of an edge denotes the edge immediately preceding (following) it.
A path is a sequence of vertices in which each consecutive pair is connected by an edge; equivalently, a path can be viewed as the sequence of those edges, and in this work we use the two views interchangeably. A simple path is a path containing no cycles (i.e., no repeated vertices). Under a budget vector, the length of a path is the sum of its edge weights under that budget. We now formally define QoSD as follows:
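To make these definitions concrete, here is a minimal sketch in Python (our own illustration, not the paper's code; all names are ours): each edge weight function maps a per-edge budget to a weight, and a path's length under a budget vector is the sum of its edges' weights under their budgets.

```python
def path_length(path_edges, weight_funcs, budget):
    """Length of a path (list of edges) under a budget vector.

    `budget` maps an edge to the integer budget spent on it; edges
    with no entry get budget 0, i.e., their initial weight.
    """
    return sum(weight_funcs[e](budget.get(e, 0)) for e in path_edges)

# Toy instance: two edges with monotonically increasing weight functions.
weight_funcs = {
    ("s", "v"): lambda b: 1 + 2 * b,   # linear in the budget
    ("v", "t"): lambda b: 1 + b ** 2,  # convex in the budget
}
path = [("s", "v"), ("v", "t")]

base = path_length(path, weight_funcs, {})                               # 1 + 1 = 2
bumped = path_length(path, weight_funcs, {("s", "v"): 1, ("v", "t"): 2})  # 3 + 5 = 8
```

With threshold T = 5, this toy path is feasible (length 2 < 5) under the zero budget and blocked (length 8 >= 5) under the second budget vector.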
Definition 1 ().
Quality of Service Degradation (QoSD). Given a directed graph, a set of edge weight functions, a box, and a target set of node pairs, determine a minimum budget vector such that, under it, the weighted, shortest-path distance between each target pair exceeds a threshold T. A problem instance may be represented by the corresponding tuple.
For each edge, denote its initial weight, i.e., its weight under the zero budget. In this work, we assume all initial weights are positive, which is justified by the fact that most networks have positive costs associated with their edges even without interference from external sources (e.g., propagation delay in communication networks, processing delay in blockchains).
For each target pair, consider the set of simple paths connecting it. A simple path connecting a target pair whose initial length is smaller than the threshold T is called a feasible path, and the set of all feasible paths is denoted accordingly. Since initial edge weights are positive, the number of edges of a feasible path is trivially upper bounded; we denote this bound by h.
Given a pair of nodes, if under a budget vector there exists no simple path between them of length smaller than T, we say the pair is separated by the budget vector. Also, given a feasible path, if its length under a budget vector is at least T, we say the path is blocked by the budget vector.
The problem can be formulated as follows:
Note that this is not an Integer Program in general, because the edge weight functions may not be linear.
We can view this formulation as cardinality minimization on the integer lattice subject to multiple constraints. Before going further, we introduce several notations and mathematical operators on the integer lattice that will be used throughout the theoretical analysis of our algorithms. Given two vectors, we have:
Moreover, we say one vector is dominated by another if the inequality holds in every element; the analogous rule applies to strict domination.
A unit vector has the same dimension as a budget vector, with value 1 in one element and 0 elsewhere; any budget vector can thus be written as a sum of unit vectors. Table 1 summarizes all the notation introduced so far.
Discussion. Given an instance of QoSD, the optimal solution can be obtained by formulating the problem as the following Integer Program (IP):
where the indicator variables encode the budget level chosen on each edge. The first constraint (Eq. 6) guarantees that the budget to increase the weight of each edge lies in its allowed range, and the second constraint (Eq. 7) ensures that the length of each feasible path is at least T. However, solving this IP is extremely expensive: not only is IP NP-hard in general (performance depends strongly on the solver used), but listing all paths for the second constraint is prohibitive in practice, requiring exponential time in the worst case. Our algorithms are designed to be efficient even when the number of feasible paths is large, and hence require neither a listing of the feasible paths nor an optimal solution of the linear relaxation of this IP formulation.
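For concreteness, the IP sketched above can plausibly be reconstructed as follows; the symbols here ($b_e$ for the budget on edge $e$, $B_e$ for its box bound, $w_e$ for its weight function, $T$ for the threshold, $\mathcal{P}$ for the feasible-path set) are our own notation, not necessarily the paper's:

```latex
\begin{align}
\min\quad & \sum_{e \in E} b_e \\
\text{s.t.}\quad & b_e \in \{0, 1, \dots, B_e\}, && \forall e \in E, \tag{6}\\
& \sum_{e \in \rho} w_e(b_e) \;\ge\; T, && \forall \rho \in \mathcal{P}. \tag{7}
\end{align}
```

When $w_e$ is nonlinear, constraint (7) can be linearized with indicator variables $x_{e,j} \in \{0,1\}$ selecting the budget level $j$ spent on $e$, i.e., $w_e(b_e) = \sum_j x_{e,j}\, w_e(j)$ with $\sum_j x_{e,j} = 1$, which matches the indicator variables mentioned in the discussion above.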
Hardness and Inapproximability. Since LB-MULTICUT is a special case of QoSD, QoSD is NP-hard. Furthermore, any inapproximability result for LB-MULTICUT or the Multicut problem also applies to QoSD. We summarize those results as follows:
Kuhnle et al. (Kuhnle et al., 2018a): under a standard complexity-theoretic assumption, there is no polynomial-time algorithm to approximate LB-MULTICUT within the stated factor.
Lee (Lee, 2016): when the number of target pairs is fixed and initial edge weights are uniform, the problem is inapproximable within the stated factor assuming the Unique Games Conjecture.
Chawla et al. (Chawla et al., 2006): there exists no approximation algorithm within the stated factor under the corresponding complexity assumption.
Node version of the problem. The node version of the problem asks for the minimum budget to increase node weights rather than edge weights in the problem definition above. All four of our algorithms can easily be adapted to the node version while keeping the same theoretical performance guarantees.
|Input directed graph|
|Vertex and edge sets of the graph, respectively|
|Number of vertices and edges, respectively|
|d|The maximum degree of the graph|
|The set of target pairs of nodes|
|The number of pairs in the target set|
|The threshold on the path length|
|The weight function of an edge w.r.t. a budget|
|The set of weight functions of all edges|
|The set of all feasible paths|
|The maximum number of edges of a feasible path|
|The maximum added cost in each iteration of SA|
|The budget vector; its entry on an edge is the budget on that edge|
|Unit vector: 1 in one element, 0 elsewhere|
|The concave ratio of the function set|
|Bias parameter in the sampling of SA|
|Optimal solution to the problem instance|
|Size of the optimal solution|
4. Iterative solution
There are two challenging tasks in solving the QoSD problem. First, the number of feasible paths can be extremely large, so we need to avoid listing all feasible paths, as discussed earlier. Second, the objective function of QoSD can be non-submodular, depending on the edge weight functions. We handle these challenges via two algorithms: Iterative Greedy (IG) and Adaptive Trading (AT). After discussing IG and AT, we provide the theoretical analysis and approximation guarantees of both algorithms.
To tackle the first challenge, instead of listing all feasible paths of the network, we build a set of candidate paths, a subset of the feasible paths such that blocking all candidate paths is sufficient to separate all target pairs. The candidate set is built incrementally and iteratively. In each iteration, we find a budget vector to block all current candidate paths, and set each edge's length to its weight under this budget. We then check whether this budget is sufficient to separate all target pairs, by testing whether some pair's shortest path has length smaller than T. If so, blocking the candidate set is not yet sufficient; we add the shortest paths of all pairs whose distance has not reached T into the candidate set and continue to the next iteration. Otherwise, the budget vector separates all pairs; we terminate the algorithm and return it. The full algorithm is presented in Alg. 1.
Since the number of edges of a feasible path can reach up to h, the number of feasible paths of the network is exponentially bounded in h. Because at least one feasible path is added into the candidate set in each iteration (line 3 of Alg. 1), the number of iterations in Alg. 1 is bounded by the same quantity. This is a large number, comparable to enumerating all feasible paths in the worst case. In our experiments, however, the number of iterations is much smaller, even on large and highly dense networks.
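The iterative framework of Alg. 1 can be sketched as follows (our own illustration, with all names ours): the blocking subroutine of line 5 is passed in as a parameter, here a naive placeholder that simply bumps the first edge of each candidate path until that path is blocked at threshold T = 5.

```python
import heapq

def shortest_path(graph, wfun, budget, s, t):
    """Dijkstra under the current budget; returns (edge_path, length)."""
    dist, prev, pq = {s: 0}, {}, [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v in graph.get(u, []):
            nd = d + wfun[(u, v)](budget.get((u, v), 0))
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    if t not in dist:
        return None, float("inf")
    path, v = [], t
    while v != s:                      # reconstruct the edge sequence
        path.append((prev[v], v))
        v = prev[v]
    return path[::-1], dist[t]

def iterative_framework(graph, wfun, pairs, T, block_fn):
    """Alg. 1 skeleton: grow a candidate set until blocking it separates all pairs."""
    candidates = []
    while True:
        budget = block_fn(candidates)          # line 5: block every candidate path
        violated = []
        for s, t in pairs:
            p, d = shortest_path(graph, wfun, budget, s, t)
            if p is not None and d < T:
                violated.append(p)             # this pair is not yet separated
        if not violated:
            return budget                      # feasible for the full instance
        candidates.extend(violated)            # line 3: enlarge the candidate set

# Toy run with a naive blocking subroutine.
graph = {"s": ["a"], "a": ["t"]}
wfun = {("s", "a"): lambda b: 1 + b, ("a", "t"): lambda b: 1 + b}

def naive_block(cands):
    budget = {e: 0 for e in wfun}
    for p in cands:                            # bump the first edge until blocked
        while sum(wfun[e](budget[e]) for e in p) < 5:
            budget[p[0]] += 1
    return budget

budget = iterative_framework(graph, wfun, [("s", "t")], 5, naive_block)
```

On this toy instance the framework converges in two iterations: the zero budget leaves the pair unseparated, the single s-to-t path is added as a candidate, and blocking it yields a budget of 3 on the first edge.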
Lemma 4.1 ().
The approximation guarantee of Alg. 1 equals the approximation guarantee of the algorithm that finds a budget vector blocking all paths in the candidate set.
Since the candidate set is a subset of all feasible paths, any solution blocking all feasible paths also blocks all candidate paths. Therefore, the optimal budget to block all candidate paths is at most the size of the optimal solution of QoSD. Assume the algorithm in line 5 of Alg. 1 returns a solution within a given factor of the optimum for the candidate set. Since the final budget vector is also a feasible solution to our problem, the output of Alg. 1 is within the same factor of the optimal QoSD solution. ∎
Now let us discuss the second challenge: how to block all paths in the candidate set (line 5 of Alg. 1). To address this, we propose two algorithms, Iterative Greedy and Adaptive Trading. Before delving into the details of each algorithm, we introduce a parameter that measures the concave property of the weight functions and is utilized in the performance analysis of our algorithms.
4.1. Concave property of weight functions
The concave ratio of a set of functions is defined as follows:
Definition 2 ().
(Concave ratio) The concave ratio of a set of non-negative functions is the largest scalar for which the following holds: for every function in the set, the marginal increase at any smaller argument is at least that scalar times the marginal increase at any larger argument.
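Reconstructing the missing display from Lemma 4.2 and the later remark that the ratio is small for convex functions, a plausible form of the definition is (the symbols $\gamma$ and $\mathcal{F}$ are ours):

```latex
\gamma_{\mathcal{F}} \;=\; \max\Bigl\{\, \gamma \in (0,1] \;:\;
w(x+1) - w(x) \;\ge\; \gamma\,\bigl(w(y+1) - w(y)\bigr),
\;\; \forall\, w \in \mathcal{F},\; 0 \le x \le y \,\Bigr\}.
```

Under this form, $\gamma_{\mathcal{F}} = 1$ when every function in the set is concave (marginal increases are non-increasing), and $\gamma_{\mathcal{F}}$ shrinks as the functions become more steeply convex, consistent with the discussion in Sections 4.2 and 4.3.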
In our problem, the set of non-negative functions consists of the weight functions of all edges. Therefore, for simplicity, we refer to the concave ratio of this set of weight functions simply as the concave ratio. We now utilize it to derive several useful observations for our solutions. First, given a path and a budget vector, define:
Let the objective be an arbitrary linear combination of these path functions; it can be presented as follows:
Given a vector , define:
We have the following lemma.
Lemma 4.2 ().
Given two budget vectors ordered by domination and a unit vector, we have:
Without loss of generality, we assume the unit vector has value 1 at a fixed element and 0 elsewhere. We prove that, given a feasible path, the marginal gain at the smaller budget vector is at least the concave ratio times the marginal gain at the larger one.
By definition, the entries of any budget vector cannot exceed the box bounds. We therefore consider three cases on where the two vectors stand relative to the box; all three cases guarantee the claimed inequality. Since the objective is a linear combination of the path functions, the lemma follows. ∎
Lemma 4.3 ().
Given three budget vectors ordered by domination, we have:
Decomposing the difference between the two larger vectors into unit vectors, we have:
which completes the proof. ∎
A budget vector is sufficient to block all paths in the candidate set iff every path in the set has length at least T under it. Therefore, to block all candidate paths, we find a minimum budget vector such that:
In the next subsections, we devise two approximation algorithms to find such and provide their performance guarantees.
4.2. Iterative Greedy algorithm
The first algorithm to block all candidate paths is Iterative Greedy (IG). The general idea is as follows: we iteratively add to the budget vector the unit vector that maximizes the marginal gain of the objective, until the budget is sufficient to block all candidate paths. Hence, the final overall budget equals the number of iterations of the algorithm. IG is fully presented in Alg. 3.
However, the objective function is neither submodular nor supermodular w.r.t. the budget vector. If every edge weight function is concave, the objective exhibits submodular behavior. On the other hand, if every weight function is convex, the objective value can be much more than the sum of the values of the unit vectors constituting the budget, a supermodular behavior. Because of this non-submodularity, the solution returned by IG may not have a constant approximation ratio. In fact, the concave ratio plays an important role in the performance guarantee of IG, as proved in Theorem 4.4 and further illustrated in the experimental evaluation.
Theorem 4.4 ().
IG returns a solution within factor of the optimal solution for blocking all paths in .
Denote an optimal solution to the QoSD instance, and the obtained solution before each iteration of Alg. 3. The key of our proof is that the gap between the objective value at the optimum and at the current solution is reduced in each iteration by a bounded factor. Specifically:
This is proved using the property of the concave ratio from Lemma 4.2 and the greedy selection rule.
Furthermore, since at least one feasible path must remain unblocked before the final iteration of the algorithm, we prove that the number of iterations is upper bounded accordingly. The theorem follows since the returned budget equals the number of iterations. ∎
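The greedy loop of IG can be sketched as follows (our own illustration; we assume, as a plausible reading of the text, that the objective being maximized is the truncated total path length, where each candidate path counts at most T, so that the objective saturates exactly when all candidate paths are blocked):

```python
def truncated_cover(paths, wfun, budget, T):
    """Sum over candidate paths of min(T, path length under `budget`).

    Reaches T * len(paths) exactly when every candidate path is blocked.
    """
    return sum(min(T, sum(wfun[e](budget[e]) for e in p)) for p in paths)

def iterative_greedy(paths, wfun, caps, T):
    budget = {e: 0 for e in wfun}
    target = T * len(paths)
    while truncated_cover(paths, wfun, budget, T) < target:
        base = truncated_cover(paths, wfun, budget, T)
        best, best_gain = None, -1.0
        for e in wfun:                       # try every unit increment
            if budget[e] >= caps[e]:         # respect the box constraint
                continue
            budget[e] += 1
            gain = truncated_cover(paths, wfun, budget, T) - base
            budget[e] -= 1
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:
            break                            # box exhausted; no feasible blocking
        budget[best] += 1                    # add the best unit vector
    return budget

# Toy instance: one candidate path of two linear-weight edges, T = 5.
paths = [[("s", "a"), ("a", "t")]]
wfun = {("s", "a"): lambda b: 1 + b, ("a", "t"): lambda b: 1 + b}
result = iterative_greedy(paths, wfun, {e: 10 for e in wfun}, 5)
```

Here three unit increments suffice: the path starts at length 2 and each unit of budget adds 1, so the final overall budget equals the number of greedy iterations, as stated above.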
4.3. Adaptive Trading algorithm
The concave ratio of the edge weight functions can be very small when the weight functions are convex, which makes the approximation guarantee of IG undesirable. In this section, we therefore propose a solution whose performance guarantee does not depend on the concave ratio. We name this algorithm Adaptive Trading (AT).
The algorithm still works iteratively and terminates only when a sufficient budget vector is found, but it differs from IG in how the solution is improved in each iteration. Specifically, in each iteration, the algorithm finds an edge and an amount of additional budget on it that together maximize the ratio between the increase of the objective and the additional budget. The additional budget in an iteration can therefore exceed one unit. To find such an amount, the simplest way is to scan through all possible amounts of additional budget for each edge; since the budget that can be added to an edge is upper bounded by its box constraint, the computational complexity of each iteration is bounded accordingly. Denote by a trade vector the vector whose element corresponding to the chosen edge has the chosen value and whose other elements are 0. AT is fully presented in Alg. 4, and its approximation guarantee is provided by Theorem 4.5.
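A sketch of AT under the same truncated-cover objective as our IG sketch (names and the toy instance are ours): note how, with a convex weight function, the best gain-to-cost ratio is achieved by a multi-unit step, which a unit-by-unit greedy cannot see.

```python
def adaptive_trading(paths, wfun, caps, T):
    """Each iteration adds, on a single edge, the budget amount that
    maximizes (gain in truncated cover) / (amount spent)."""
    def F(b):
        return sum(min(T, sum(wfun[e](b[e]) for e in p)) for p in paths)

    budget = {e: 0 for e in wfun}
    target = T * len(paths)
    while F(budget) < target:
        base = F(budget)
        best, best_ratio = None, 0.0
        for e in wfun:
            for amt in range(1, caps[e] - budget[e] + 1):  # scan all amounts
                budget[e] += amt
                ratio = (F(budget) - base) / amt
                budget[e] -= amt
                if ratio > best_ratio:
                    best, best_ratio = (e, amt), ratio
        if best is None:
            break                 # no progress possible within the box
        e, amt = best
        budget[e] += amt          # add the chosen trade vector
    return budget

# A convex weight function: unit steps gain little at first, so AT jumps ahead.
wfun = {("s", "t"): lambda b: 1 + b * b}
result = adaptive_trading([[("s", "t")]], wfun, {("s", "t"): 10}, 10)
```

With weight 1 + b^2 and T = 10, the ratios for amounts 1, 2, 3 are 1, 2, and 3 respectively, so AT selects a single trade of 3 units, blocking the path in one iteration.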
Theorem 4.5 ().
AT returns a solution within factor of the optimal solution for blocking all paths in .
Denote the obtained solution before each iteration of Alg. 4, and let an optimal complement be a minimum additional budget that, together with the current solution, blocks all candidate paths. Let the trade vector be what the algorithm adds to the solution in that iteration. The key of our proof is that the following inequality is always guaranteed after each iteration:
This is proved by utilizing the monotonicity of the objective w.r.t. the budget and the fact that the algorithm's selection maximizes the gain-to-cost ratio.
Furthermore, Eq. 14 helps us prove that the gap between the optimum and the current solution is reduced in each iteration by a bounded factor. Specifically:
Since at least one feasible path must remain unblocked before the final iteration of the algorithm, and utilizing the Cauchy inequality (cau, 2018), we bound the total budget, and the theorem follows. ∎
5. Sampling Approach
In this section, we introduce a sampling-based solution, SA, for QoSD, which attains its approximation guarantee with high probability, where the failure probability and estimation error can be made arbitrarily small. SA runs in polynomial time when its parameters are fixed.
We define a blocking metric of a budget vector as follows
It is easy to verify that a budget vector blocks all target pairs iff the blocking metric attains its maximum value.
In essence, SA attempts to minimize the total budget while ensuring the blocking metric is maximized. To do so, SA works in a greedy manner: in each iteration, SA finds a budget vector of bounded size to add to the current solution so as to maximize the gain of the blocking metric. Rather than an expensive listing of all feasible paths, an estimator based on a path-sampling procedure is employed to find this vector. This process is repeated until the budget vector is sufficient to block all paths. SA is fully presented in Alg. 5.
Since we do not list the feasible paths, two questions arise: (1) how to estimate the blocking metric; and (2) how many sample paths are needed to bound the error between the estimator and the actual value. In Section 5.1, we define the estimator employed in each iteration of Alg. 5. Section 5.2 provides the approximation guarantee of the greedy selection. Section 5.3 provides a lower bound on the number of sampled paths needed to bound the error. We then put these results together to obtain the performance guarantee of SA.
5.1. The estimator
Let an instance of QoSD be given, and denote the set of all simple paths in the graph. For each such path, define:
The blocking metric is then a sum of these per-path terms. Inspired by the estimation of the number of paths in a graph (Roberts and Kroese, 2007), we define the estimator in the following way: given a probability distribution over paths that assigns positive probability to every relevant path, and a set of paths sampled from it, the blocking metric can be estimated by
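The missing display is plausibly the standard importance-sampling estimator; in our own notation, with $q$ the sampling distribution, $\mathcal{R}$ the sample set, and $f_\rho$ the per-path term defined above:

```latex
\hat{F}(\mathbf{b}) \;=\; \frac{1}{|\mathcal{R}|} \sum_{\rho \in \mathcal{R}} \frac{f_\rho(\mathbf{b})}{q(\rho)},
\qquad
\mathbb{E}\bigl[\hat{F}(\mathbf{b})\bigr]
\;=\; \sum_{\rho} q(\rho)\,\frac{f_\rho(\mathbf{b})}{q(\rho)}
\;=\; \sum_{\rho} f_\rho(\mathbf{b})
\;=\; F(\mathbf{b}),
```

which is well-defined and unbiased (matching Lemma 5.1) as long as $q(\rho) > 0$ for every path with a nonzero term, exactly the condition imposed on the sampling distribution.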
Lemma 5.1 ().
The path-sampling estimator is an unbiased estimator of the blocking metric.
To sample paths, we utilize the following biased, self-avoiding random-walk technique, proposed by Kuhnle et al. (Kuhnle et al., 2018a). First, we randomly select a target pair and put its source into the sample path. At any moment, call the last node of the partial path its tail. The NeighborSelection procedure selects a node among the out-going neighbors of the tail to append to the path, as follows. Let the shortest-path tree directed toward the destination be precomputed, and let the tail's parent in this tree be its preferred next hop. If the preferred next hop is the only available neighbor, it is selected. If there are multiple available neighbors including the preferred one, the preferred neighbor is selected with the bias probability and each of the other neighbors with an equal share of the remaining probability. If the preferred next hop is unavailable, the next node is selected uniformly at random among the available neighbors. The sampling procedure ends when we reach the destination or the length of the path exceeds the bound. With this path-sampling procedure, the probability of generating a given path is easy to compute, and it is positive for every feasible path. The sampling technique is fully presented in Alg. 6.
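A simplified sketch of the walk (our own illustration): the hypothetical `tree_parent` map plays the role of the shortest-path tree, `delta` is the bias parameter, and the returned probability is the product of the per-step choice probabilities, as needed by the estimator.

```python
import random

def sample_path(graph, pairs, max_hops, tree_parent, delta=0.8):
    """One biased, self-avoiding random walk (a sketch of Alg. 6).

    `tree_parent[(t, u)]` is u's parent in the shortest-path tree toward t
    (a hypothetical precomputed structure). Returns (node_path, probability)
    on success, or (None, 0.0) when the walk dead-ends or runs too long.
    """
    s, t = random.choice(pairs)
    path, prob = [s], 1.0 / len(pairs)            # pairs drawn uniformly here
    while path[-1] != t and len(path) <= max_hops:
        u = path[-1]
        nbrs = [v for v in graph.get(u, []) if v not in path]  # self-avoiding
        if not nbrs:
            return None, 0.0                      # dead end: discard the sample
        pref = tree_parent.get((t, u))            # preferred next hop
        if pref in nbrs and len(nbrs) > 1:
            if random.random() < delta:           # follow the tree hint
                v, step = pref, delta
            else:                                 # or deviate to another neighbor
                v = random.choice([x for x in nbrs if x != pref])
                step = (1 - delta) / (len(nbrs) - 1)
        else:                                     # no usable hint: uniform choice
            v = random.choice(nbrs)
            step = 1.0 / len(nbrs)
        path.append(v)
        prob *= step                              # accumulate sampling probability
    return (path, prob) if path[-1] == t else (None, 0.0)

# Deterministic toy: a single pair and a single s-to-t route.
graph = {"s": ["a"], "a": ["t"]}
tree = {("t", "s"): "a", ("t", "a"): "t"}
walk, q = sample_path(graph, [("s", "t")], 4, tree)
```

On this toy graph every step has a single available neighbor, so the walk is deterministic and its probability is 1; in general, `q` is the product of the biased per-step probabilities and feeds the importance-sampling estimator.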
5.2. Greedy selection on the estimator
Having defined the estimator and the path-sampling procedure, we now find a budget vector of bounded size that maximizes the estimated gain. The vector is found in a greedy manner: we run a fixed number of iterations and, in each iteration, select the unit vector that maximizes the marginal gain of the estimator. Since this is straightforward, we omit the pseudo-code for finding the vector.
The question now is what approximation guarantee this greedy selection can provide. Note that the estimator is a finite combination of path functions; hence, it is submodular if all weight functions are concave and supermodular if they are convex. Greedily maximizing it therefore may not return a guaranteed approximation in general. Thus, similar to IG, we use the concave ratio to obtain the performance guarantee of the greedy selection.
Denote an optimal combination of unit vectors maximizing the estimator. Lemma 5.2 provides the approximation guarantee of the greedy selection.
Lemma 5.2 ().
Denote the budget vector obtained after greedily selecting the first unit vectors. The key of the proof is the following inequality:
This inequality is proved using the property of the concave ratio from Lemma 4.2 and the monotonicity of the estimator w.r.t. the budget. Using this inequality, we prove that
in which the lemma follows. ∎
5.3. Sample size and Performance guarantee
We have proved the performance guarantee of the greedy selection of the additional budget vector on the estimator. The question now is: how large must the sample set be to bound the error between the estimator and the actual blocking metric? In this part, we answer this question; putting the answer together with the guarantee of the greedy selection, we obtain the performance guarantee of SA.
To find the minimum number of samples, we utilize the following Chernoff bound.
Theorem 5.3 ().
Consider a path; we have:
where d is the maximum out-going degree of a node in the graph. Therefore, for any simple path,
Denote as an optimal solution that maximizes .
Lemma 5.4 ().
Given , if the number of sampled paths satisfies
the following condition is guaranteed:
This lemma is directly derived from Eq. 16.
Lemma 5.5 ().
Given , if the number of sampled paths satisfies
then, for all budget vectors satisfying the size bound, the stated inequality holds with probability at least
Let us consider an arbitrary budget vector ,
Using the union bound, to ensure the inequality holds for every budget vector within the size bound, we have
The lemma follows by letting ∎
Lemma 5.6 ().
Given , let and . If the number of sampled paths is at least