1 Introduction
1.1 LP-type problems
LP-type problems were defined by Sharir and Welzl [30] as problems characterized by a tuple where is a finite set and is a function that maps subsets from to values in a totally ordered set containing . The function is required to satisfy two conditions:

Monotonicity: For all sets , .

Locality: For all sets with and every element , if then .
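For reference, the two conditions can be written out as follows. The symbols are assumptions, since this copy of the text does not fix them: $H$ is the finite ground set, $f$ the objective function, $F$ and $G$ subsets of $H$, and $h$ a single element. This is the form in which the conditions are commonly stated in the LP-type literature.

```latex
\begin{align*}
\text{Monotonicity:}\quad
  & F \subseteq G \subseteq H \;\Longrightarrow\; f(F) \le f(G), \\
\text{Locality:}\quad
  & F \subseteq G \subseteq H,\; f(F) = f(G),\; h \in H: \quad
    f(G \cup \{h\}) > f(G) \;\Longrightarrow\; f(F \cup \{h\}) > f(F).
\end{align*}
```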
A minimal subset with for all proper subsets of is called a basis of . An optimal basis is a basis with . The maximum cardinality of a basis is called the (combinatorial) dimension of and denoted by . LP-type problems cover many important optimization problems.
Linear optimization
In this case, is the set of all linear constraints and denotes the optimal value in the polytope formed by with respect to the given objective function. W.l.o.g., we may assume that is nondegenerate, i.e., for every subset , is associated with a unique solution (by, for example, slightly perturbing the coefficients in the linear constraints). The monotonicity condition obviously holds in this case. Also, the locality condition holds since if (i.e., if is violated by the solution associated with ), then due to , is also violated by . The combinatorial dimension is simply the number of variables of the LP.
Smallest enclosing ball
In this case, is a set of points in a Euclidean space and denotes the radius of the smallest enclosing ball for . The monotonicity condition can be verified easily. Also the locality condition holds since if the smallest enclosing balls for and have the same radius (and thus they actually are the same ball) and point lies outside of the ball of , then must also lie outside of the ball of . Since in the 2-dimensional case at most 3 points are sufficient to determine the smallest enclosing ball for , the combinatorial dimension of this problem is 3. For dimensions, at most points are sufficient.
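The two conditions can be checked by brute force on a tiny instance. The sketch below does this for the 1-dimensional version of the smallest enclosing ball problem, where the "ball" is an interval and its radius is half the spread of the points. The function names and the convention that the empty set has value minus infinity are assumptions made for illustration.

```python
from itertools import combinations


def radius(points):
    """f(S): radius of the smallest enclosing interval of S (-inf for the empty set)."""
    if not points:
        return float("-inf")
    return (max(points) - min(points)) / 2


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def check_lp_type(ground, f):
    """Return (monotone, local) for f, testing all pairs F ⊆ G ⊆ ground."""
    monotone, local = True, True
    for G in subsets(ground):
        for F in subsets(G):
            if f(F) > f(G):
                monotone = False
            if f(F) == f(G):
                for h in ground:
                    # Locality: if h increases f on G, it must increase f on F.
                    if f(G | {h}) > f(G) and not f(F | {h}) > f(F):
                        local = False
    return monotone, local
```

The check is exponential in the size of the ground set, so it is only meant as a sanity test on a handful of elements, for example `check_lp_type({0, 1, 3, 7}, radius)`.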
Clarkson [6] proposed a very elegant randomized algorithm for solving LP-type problems (see Algorithm 1). In this algorithm, each has a multiplicity of , and is a multiset where each occurs times in . The algorithm requires a subroutine for computing for sets of size , but this is usually straightforward if is a constant. The runtime analysis is simple enough that we review it here, since it will also be helpful for the analysis of our distributed algorithms. In the following, let , and we say that an iteration of the repeat-loop is successful if .
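The multiplicity-doubling idea can be sketched on the 1-dimensional smallest enclosing interval problem, which has combinatorial dimension 2. This is an illustration under stated assumptions, not a transcription of Algorithm 1: the sample size `6 * d * d` and the success threshold `1 / (3 * d)` are common choices for this scheme and are assumed here, and sampling is done with replacement for simplicity.

```python
import random


def clarkson_interval(points, d=2, seed=0, max_iters=10_000):
    """Smallest enclosing interval of `points` via multiplicity doubling."""
    rng = random.Random(seed)
    r = 6 * d * d                      # assumed sample size
    n = len(points)
    if n <= r:
        return min(points), max(points)
    mu = [1] * n                       # multiplicity of each element
    for _ in range(max_iters):
        # Draw r elements with probability proportional to their multiplicity
        # (with replacement, a simplification of sampling from the multiset).
        sample = rng.choices(range(n), weights=mu, k=r)
        lo = min(points[i] for i in sample)
        hi = max(points[i] for i in sample)
        # Violators: elements whose addition would enlarge the interval.
        violators = [i for i in range(n) if points[i] < lo or points[i] > hi]
        if not violators:
            return lo, hi
        # Successful iteration: the violators carry little total multiplicity,
        # so doubling them increases the total only by a small factor.
        if sum(mu[i] for i in violators) <= sum(mu) / (3 * d):
            for i in violators:
                mu[i] *= 2
    raise RuntimeError("no solution found within max_iters iterations")
```

Elements of the optimal basis (here the minimum and the maximum) are violated in every iteration that does not already contain them, so their multiplicities grow much faster than the total, which is what forces termination.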
[17] Let be an LP-type problem of dimension and let be any multiplicity function. For any , where , the expected size of for a random multiset of size from is at most .
Proof.
Let be the set of all multisets of elements in , i.e., all results for . By definition of the expected value it holds
For and let be the indicator variable for the event that . Then we have
Equation (1) is true since choosing a set of size from and subsequently choosing some is the same as choosing a set of constraints from and the subsequent removal of from . Equation (2) follows from the fact that the dimension of — and therefore also of — is at most and the monotonicity condition, which implies that there are at most many with . Resolving the inequality to results in the lemma. ∎
From this lemma and the Markov inequality it immediately follows that the probability that
is at most . Moreover, it holds:
Proof.
Each successful iteration increases the multiplicity of by a factor of at most . Therefore, . On the other hand, for each successful iteration with , . Due to the monotonicity condition, and for any subset . Let be any maximal subset of (w.r.t. ) with . Since , there is an with and therefore, due to the locality condition, . Hence, there is a constraint in that is doubled at least times in successful iterations, which implies that . ∎
Lemma 1 implies that Clarkson’s algorithm must terminate after at most successful iterations (as otherwise ), so Clarkson’s algorithm performs at most iterations of the repeat-loop, in expectation. This bound is also best possible in the worst case for any : given that there is a unique optimal basis of size , its elements can have a multiplicity of at most after iterations, so the probability that is contained in is polynomially small in up to that point.
Clarkson’s algorithm has the advantage that it can easily be transformed into a distributed algorithm with expected runtime if nodes are available that are interconnected by a hypercube, for example, because in that case every round of the algorithm can be executed in communication rounds w.h.p. (By “with high probability”, or “w.h.p.” for short, we mean a probability of at least for any constant .) However, it has been completely open so far whether it is also possible to construct a distributed algorithm for LP-type problems with an expected runtime of (either with a variant of Clarkson’s algorithm or a different approach). We will show in this paper that this is possible when running certain variants of Clarkson’s algorithm in the gossip model, even if has a polynomial size.
1.2 Network Model
We assume that we are given a fixed node set of size consisting of the nodes . In our paper, we do not require the nodes to have IDs since all of our protocols work for fully anonymous nodes. Moreover, we assume the standard synchronous message passing model, i.e., the nodes operate in synchronous (communication) rounds, and all messages sent (or requested) in round will be received at the beginning of round .
In the (uniform) gossip model, a node can only send or receive messages via random push and pull operations. In a push operation, it can send a message to a node chosen uniformly at random while in a pull operation, it can ask a node chosen uniformly at random to send it a message. We will restrict the message size (i.e., its number of bits) to . A node may execute multiple push and pull operations in parallel in a round. The number of push and pull operations executed by it is called its (communication) work.
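To make the push and pull operations concrete, the following sketch simulates the spread of a single message in the uniform gossip model. It is only an illustration: the function name is hypothetical, each node performs exactly one operation per round (informed nodes push, uninformed nodes pull), and a node may pick itself as the random target.

```python
import random


def push_pull_rounds(n, seed=0):
    """Rounds until a message that starts at node 0 has reached all n nodes."""
    rng = random.Random(seed)
    informed = [False] * n
    informed[0] = True
    rounds = 0
    while not all(informed):
        snapshot = informed[:]          # state at the beginning of the round
        for v in range(n):
            target = rng.randrange(n)   # node chosen uniformly at random
            if snapshot[v]:
                informed[target] = True     # push: v sends the message
            elif snapshot[target]:
                informed[v] = True          # pull: v asks target for it
        rounds += 1
    return rounds
```

Working on a snapshot mirrors the synchronous model: a message sent or requested in one round only becomes usable in the next round.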
Protocols for the gossip model are usually very practical due to their fast convergence, their simplicity, and their stability under stress and disruptions. Many gossip-based protocols have already been presented in the past, including protocols for information dissemination, network coding, load-balancing, consensus, and quantile computations (see [8, 20, 21, 22, 23] for some examples). Also, gossip protocols can be used efficiently in the context of population protocols and overlay networks, two important areas of network algorithms. In fact, it is easy to see that any algorithm with runtime and maximum work in the gossip model can be emulated by overlay networks in time and with maximum work w.h.p. (since it is easy to set up (near)random overlay edges in hypercubic networks in time).
1.3 Related Work
There has already been a significant amount of work on finding efficient sequential and parallel algorithms for linear programs of constant dimension (i.e., a constant number of variables), which is a special case of LP-type problems of constant combinatorial dimension (see [10] for a very thorough survey). We just focus here on parallel and distributed algorithms. The fastest parallel algorithm known for the CRCW PRAM is due to Alon and Megiddo [2], which has a runtime of . It essentially follows the idea of Clarkson, with the main difference that it replicates elements in much more aggressively by exploiting the power of the CRCW PRAM. This is achieved by first compressing the violated elements into a small area and then replicating them by a factor of (instead of just 2). The best work-optimal algorithm for the CRCW PRAM is due to Goodrich [18], which is based on an algorithm by Dyer and Frieze [11] and has a runtime of . This also implies a work-optimal algorithm for the EREW PRAM, but the runtime increases to in this case. The fastest parallel algorithm known for the EREW PRAM is due to Dyer [9], which achieves a runtime of when using an time parallel sorting algorithm (like Cole’s algorithm). Since the runtime of any algorithm for solving a linear program of constant dimension in an EREW PRAM is known to be [10], the upper bound is optimal for .
Due to Ranade’s seminal work [28], it is known that any CRCW PRAM step can be emulated in a butterfly network in communication rounds, yielding an time algorithm for linear programs of constant dimension in the butterfly. However, it is not clear whether any of the parallel algorithms would work for arbitrary LP-type problems. Also, none of the proposed parallel algorithms seem to be easily adaptable to an algorithm that works efficiently (i.e., in time and with work) for the gossip model as they require processors to work together in certain groups or on certain memory locations in a coordinated manner, and assuming anonymous nodes would further complicate the matter.
Algorithms for (integer) linear programs have also been investigated in the distributed domain. Their study was initiated by Papadimitriou and Yannakakis [27]. Bartal, Byers and Raz [3] presented a distributed approximation scheme for positive linear programs with a polylogarithmic runtime. Kuhn, Moscibroda, and Wattenhofer [24] present a distributed approximation scheme for packing LPs and covering LPs. For certain cases, their scheme is even a local approximation scheme, i.e., it only needs to know a constant-distance neighborhood and therefore can be implemented in a constant number of rounds (given sufficiently large edge bandwidths). Floréen et al. [13] studied the problem of finding local approximation schemes for max-min linear programs, which are a generalized form of packing LPs. They show that in most cases there is no local approximation scheme and identify certain cases where a local approximation scheme can be constructed. Positive LPs and max-min LPs are a special form of LP-type problems, but to the best of our knowledge, no distributed algorithms have been formally studied for LP-type problems in general.
As mentioned above, LP-type problems were introduced by Sharir and Welzl [30]. Since then, various results have been shown, but only for sequential algorithms. Combining results by Gärtner [14] with Clarkson’s methods, Gärtner and Welzl [16] showed that an expected linear number of violation tests and basis computations is sufficient to solve arbitrary LP-type problems of constant combinatorial dimension. For the case of a large combinatorial dimension , Hansen and Zwick [19] proposed an algorithm with runtime . Extensions of LP-type problems were studied by Gärtner [14] (abstract optimization problems) and Skovron [31] (violator spaces). Gärtner et al. [15] and Brise and Gärtner [4] showed that Clarkson’s approach still works for violator spaces.
For the applications usually considered in the context of LP-type problems, the combinatorial dimension is closely related to the minimum size of an optimal basis. But there are also LP-type problems whose combinatorial dimension might be much larger. Prominent examples are the hitting set problem and the equivalent set cover problem. Both are known to be NP-hard problems. Also, Dinur and Steurer [7] showed that the set cover problem (and therefore the hitting set problem) cannot be approximated within a factor unless . Based on Clarkson’s algorithm, Brönnimann and Goodrich [5] and Agarwal and Pan [1] gave algorithms that compute an approximate set cover / hitting set of a geometric set cover instance in time. Their results imply, for example, an approximation algorithm for the hitting set problem that runs in time for range spaces induced by 2D axis-parallel rectangles, and an approximate set cover in time for range spaces induced by 2D disks. The currently best distributed algorithm for the set cover problem was presented by Even, Ghaffari, and Medina [12]. They present a deterministic distributed algorithm for computing an approximation of the set cover problem, for any constant , in rounds, where is the maximum element frequency and is the cardinality of the largest set. This almost matches the lower bound of Kuhn, Moscibroda, and Wattenhofer [25]. Their network is a bipartite graph with the nodes on one side representing the elements and the nodes on the other side representing the sets, and their algorithm follows the primal-dual scheme. However, it is unclear whether their algorithm can be adapted to the gossip model.
1.4 Our Results
In all of our results, we assume that initially, is randomly distributed among the nodes. This is easy to achieve in the gossip model if this is not the case (for example, each node initially represents its own point for the smallest enclosing ball problem) by performing a push operation on each element. The nodes are assumed to know , and we require the nodes to have a constant factor estimate of for the algorithms to provide a correct output, w.h.p., but they may not have any information about . For simplicity, we also assume that the nodes know . If not, they may perform a binary search on (by stopping the algorithm if it takes too long for some to switch to ), which does not affect our bounds below since they depend at least linearly on .
We usually assume that the dimension of the given LP-type problem is a constant (i.e., independent of ), though our proofs and results would also be true for nonconstant (as long as is sufficiently small compared to ). In Section 2, we start with the lightly loaded case (i.e., ) and prove the following theorem.
For any LP-type problem satisfying , the Low-Load Clarkson Algorithm finds an optimal solution in rounds with maximum work per round, w.h.p.
At a high level, the Low-Load Clarkson Algorithm is similar to the original Clarkson algorithm, but sampling a random multiset and termination detection are more complex now, and a filtering approach is needed to keep low at all times so that the work is low. In Section 3, we then consider the highly loaded case and prove the following theorem.
For any LP-type problem with and , the High-Load Clarkson Algorithm finds an optimal solution in rounds with maximum work per round, w.h.p. If we allow a maximum work of per round, for any constant , the runtime reduces to , w.h.p.
Note that as long as we only allow the nodes to spend polylogarithmic work per round, a trivial lower bound on the runtime when using Clarkson’s approach is since in rounds an element in can only be spread to nodes, so the probability of fetching it under the gossip model is minute.
The reason why we designed different algorithms for the lightly loaded and highly loaded cases is that the Low-Load Clarkson Algorithm is much more efficient than the High-Load Clarkson Algorithm concerning internal computations. Also, it is better concerning the work for the lightly loaded case, but its work does not scale well with an increasing . The main innovation for Theorem 1.4 is that we come up with a Chernoff-style bound for that holds for all LP-type problems. Gärtner and Welzl [17] also provided a Chernoff-style bound on for LP-type problems, but their proof only works for LP-type problems that are regular (i.e., for all with , all optimal bases of have a size of exactly ) and nondegenerate (i.e., every with has a unique optimal basis). While regularity can be enforced in the nondegenerate case, it is not known so far how to make a general LP-type problem nondegenerate without substantially changing its structure (though for most of the applications considered so far for LP-type problems, slight perturbations of the input would solve this problem). Since the duplication approach of Clarkson’s algorithm generates degenerate instances, their Chernoff-style bound therefore cannot be used here.
Finally, we will study two LP-type problems that can potentially have a very high combinatorial dimension even though the size of an optimal basis might just be a constant: the hitting set problem and the set cover problem.
Let be a set of elements and be a collection of subsets of . A subset is called a hitting set of if for all , . In the hitting set problem we are given , and the goal is to find a hitting set of minimum size.
First of all, it is easy to verify that , where for any subset of denotes the number of sets in intersected by , satisfies the monotonicity and locality conditions, so is an LP-type problem. However, its combinatorial dimension might be much larger than the size of a minimum hitting set. Nevertheless, we present a distributed gossip-based algorithm that is able to find an approximate solution efficiently.
We assume that every node knows so that it can locally evaluate . Note that knowing may not necessarily mean that every node knows because the sets might just be defined implicitly w.r.t. , e.g., the sets might represent polygons in some 2-dimensional space. Also, initially, the points in are randomly distributed among the nodes. Under these assumptions, we can show the following theorem.
For any hitting set problem with and and a minimum hitting set of size , our Hitting Set Algorithm finds a hitting set of size in rounds with maximum work per round, w.h.p.
Finally, let us review the set cover problem. Again, let be a set of elements and be a collection of subsets of , where we assume here that . A set is called a set cover of if . In the (simple form of the) set cover problem we are given , and the goal is to find a set cover of minimum size, i.e., a minimum number of sets.
It is easy to verify that , where for any subset of denotes the number of elements in covered by , satisfies the monotonicity and locality conditions, so is an LP-type problem.
We assume that every node knows so that it can locally evaluate , and initially the elements in are randomly distributed among the nodes. Note that even though some set might contain many elements in , we will assume here that every has a compact representation (like a polygon) so that it can be sent in one message.
We can then use our Hitting Set Algorithm to solve any set cover problem with the same bounds as in Theorem 1.4, because there is a well-known equivalent formulation as a hitting set problem: Given that , let and , where . Then a set cover in corresponds to a hitting set in .
2 Low-Load Clarkson Algorithm
Suppose that we have an arbitrary LP-type problem of dimension with . First, we present and analyze an algorithm for , and then we extend it to any .
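The equivalence can be illustrated with the standard dual construction: the new ground set consists of the original sets (represented by their indices), and every original element contributes one new set, namely the indices of all original sets that contain it. The names below are hypothetical, and this explicit representation is an assumption made for the example; the text above allows sets to be given implicitly.

```python
def dual_instance(elements, sets):
    """Map a set cover instance to a hitting set instance.

    Returns, for each element x, the frozenset of indices of the original
    sets that contain x. The new ground set is range(len(sets)).
    """
    return {x: frozenset(i for i, s in enumerate(sets) if x in s)
            for x in elements}


def is_set_cover(elements, sets, chosen):
    """True if the sets with indices in `chosen` cover every element."""
    covered = set().union(*(sets[i] for i in chosen))
    return covered >= set(elements)


def is_hitting_set(dual_sets, chosen):
    """True if `chosen` intersects every set of the dual instance."""
    chosen = set(chosen)
    return all(dual_sets[x] & chosen for x in dual_sets)
```

A selection of set indices covers all elements exactly when it hits every dual set, so any algorithm for one problem can be run on the transformed instance of the other.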
Recall that initially the elements of are assigned to the nodes uniformly and independently at random. Let us denote the set of these elements in node by to distinguish them from copies created later by the algorithm, and let .
At any time, denotes the (multi)set of elements in known to (including the elements in ) and , where represents the node set. Let . At a high level, our distributed algorithm is similar to the original Clarkson algorithm, but sampling a random multiset and termination detection are more complex now (which will be explained in dedicated subsections). In fact, the sampling might fail since a node might not be able to collect enough elements for . Also, a filtering approach is needed to keep low at all times (see Algorithm 2). However, it will never become too low since the algorithm never deletes an element in , so at any time. Note that never deleting an element in also guarantees that no element in will ever be washed out (which would result in incorrect solutions).
For the runtime analysis, we note that sampling can be done in one round (see Section 2.1), spreading just takes one round (by executing the push operations in parallel), and we just need one more round for processing the received elements , so for simplicity we just assume in the following that an iteration of the repeat loop takes one round. We start with a slight variant of Lemma 1.
Let be an LP-type problem of dimension and let be any multiset of of size . For any , the expected size of for a random multiset of size from is at most .
Proof.
According to Lemma 1, the expected size of for a random multiset is at most . Since every element in has a probability of to belong to , . ∎
This allows us to prove the following lemma.
For all , , w.h.p., and , w.h.p.
Proof.
Let the random variable be defined as and let . If the sampling of fails then, certainly, , and otherwise, for all . Thus, . Also, since the elements in are distributed uniformly and independently at random among the nodes at all times, the standard Chernoff bounds imply that w.h.p., and therefore also w.h.p. Unfortunately, the ’s are not independent since is not chosen independently of the other ’s, but the dependencies are minute: given that we have already determined for many ’s, where is sufficiently small, the probability that any one of the remaining elements is assigned to is , so that for any subset of size ,
This allows us to use a Chernoff-Hoeffding-style bound for wise negatively correlated random variables, which is a slight extension of Theorem 3 in [29]:
Let be random variables with for some . Suppose there is a and with for all subsets of size . Let and . Then it holds for all with that
Setting , with and , and large enough so that but so that inequality applies, which works for , is polynomially small in . ∎
Next, we show that will never be too large, so that the communication work of the nodes will never be too large.
For up to polynomially many rounds of the Low-Load Clarkson Algorithm, , w.h.p.
Proof.
Let and suppose that for some , which implies that . Then it holds for the size of at the end of a repeatround that
Since the decision to keep elements is done independently for each , it follows from the Chernoff bounds that is polynomially small in for . Moreover, Lemma 2 implies that can increase by at most in each round, w.h.p., so for polynomially many rounds of the algorithm, w.h.p. ∎
Thus, combining Lemma 2 and Lemma 2, the maximum work per round for pushing out some is bounded by w.h.p. Next we prove a lemma that adapts Lemma 1 to our setting.
Let be an arbitrary optimal basis of . If, for many rounds of the Low-Load Clarkson Algorithm, every node was successful in sampling a random multiset and no satisfies , then after these rounds.
Proof.
Let , , and let be the probability that . If node has chosen some with , then there must exist an with , which implies that under the condition that , . The ’s are the same for each since each has the same probability of picking some multiset of of size . Hence, we can simplify to and state that . Now, let be the probability that for a randomly chosen multiset in round , and fix any values for the so that for all and . Let be the multiplicity of at the end of round . Then, for all , , and
Hence, . Since for all , it follows that . Also, since , there must be a with . Therefore, there must be a with , which completes the proof. ∎
Since is bounded by w.h.p., the expected number of copies of should be at most w.h.p. as well. Due to Lemma 2, this cannot be the case if is sufficiently large. Thus, the algorithm must terminate within rounds w.h.p.
In order to complete the description of our algorithm, we need distributed algorithms satisfying the following claims:

The nodes succeed in sampling multisets uniformly at random in a round, w.h.p., with maximum work .

Once a node has chosen an with , all nodes are aware of that within communication rounds, w.h.p., so that the LowLoad Clarkson Algorithm can terminate. The maximum work for the termination detection is per round.
The next two subsections are dedicated to these algorithms.
2.1 Sampling random multisets
For simplicity, we assume here that every node knows the exact value of , but it is easy to see that the sampling algorithm also works if the nodes just know a constant factor estimate of , if the constant used below is sufficiently large.
Each node samples a multiset in a way that is as simple as it can possibly get: asks random nodes via pull operations to send it a random element in , where is a sufficiently large constant. Out of the returned elements, selects distinct elements at random for its multiset . If hasn’t received at least distinct elements, the sampling fails. Certainly, the work for each node is just .
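One sampling attempt of a single node can be sketched as follows. The number of pulls and the target sample size are left as parameters because their exact values are not fixed in this copy of the text; identifying a copy by its position in the contacted node's store is an assumption used to decide whether two returned elements are distinct.

```python
import random


def sample_by_pulls(stores, num_pulls, sample_size, rng):
    """One node's sampling attempt via pull operations.

    stores[v] is the list of element copies held by node v. Each pull
    contacts a uniformly random node, which answers with a uniformly random
    copy from its store (or nothing if its store is empty).
    Returns a list of `sample_size` elements, or None if the sampling fails.
    """
    received = {}                                   # (node, slot) -> element
    for _ in range(num_pulls):
        v = rng.randrange(len(stores))
        if stores[v]:
            slot = rng.randrange(len(stores[v]))
            received[(v, slot)] = stores[v][slot]   # repeated copies collapse
    if len(received) < sample_size:
        return None                                 # too few distinct copies
    keys = rng.sample(sorted(received), sample_size)
    return [received[k] for k in keys]
```

A call such as `sample_by_pulls(stores, 40, 5, random.Random(0))` either returns five elements or reports failure with `None`, matching the two outcomes described above.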
For any , node succeeds in sampling a multiset uniformly at random, w.h.p.
Proof.
Suppose that succeeds in receiving distinct elements in the sampling procedure above. Since the elements in are distributed uniformly and independently at random among the nodes, every multiset of size in has the same probability of representing these elements. Hence, it remains to show that succeeds in receiving at least elements w.h.p.
Consider any numbering of the pull requests from 1 to . For the th pull request of , two bad events can occur. First of all, the pull request might be sent to a node that does not have any elements. Since , the probability for that is at most . Second, the pull request might return an element that was already returned by one of the prior pull requests. Since this is definitely avoided if the th pull request selects a node that is different from the nodes selected by the prior pull requests, the probability for that is at most . So altogether, the probability that a pull request fails is at most .
Now, let the binary random variable be 1 if and only if the th pull request fails. Since the upper bound of for the failure holds independently of the other pull requests, it holds for any subset that . Hence, Theorem 2 implies that w.h.p. If is sufficiently large, then , which completes the proof. ∎
Note that our sampling strategy does not reveal any information about which elements are stored in , so each element still has a probability of to be stored in , which implies that Lemma 2 still holds.
2.2 Termination
We use the following strategy for each node :
Suppose that in iteration of the repeat loop, , i.e., . Then determines an optimal basis of , stores the entry in a dedicated set , and performs a push operation on . At the beginning of iteration of the repeat loop, works as described in Algorithm 3. In the comparison between and we assume w.l.o.g. that if and only if (otherwise, we use a lexicographic ordering of the elements as a tie breaker). The parameter in the algorithm is assumed to be a sufficiently large constant known to all nodes.
If the constant in the termination algorithm is large enough, it holds w.h.p.: Once a node satisfies , then all nodes output a value with after iterations, and if a node outputs a value , then .
Proof.
Using standard arguments, it can be shown that if the constant is large enough, then for every iteration , it takes at most iterations, w.h.p., until the basis with maximum injected into some at iteration (which we assume to be unique by using some tie breaking mechanism) is contained in all ’s. At this point, we have two cases. If , then for all , at any point from iteration to , and otherwise, there must be at least one at iteration with . In the first case, no will ever set in the entry to 0, so after an additional iterations, every still stores and therefore outputs . In the second case, there is at least one entry of the form at iteration . For this entry, it takes at most further iterations, w.h.p., to spread to all nodes so that at the end, no node outputs . ∎
Since the age of an entry is at most and for each age a node performs at most one push operation, every node has to execute just push operations in each round.
2.3 Extension to any
If , the probability that our sampling strategy might fail will get too large. Hence, we need to extend the Low-Load Clarkson algorithm so that we quickly reach a point where at any time afterwards. We do this by integrating a so-called pull phase into the algorithm.
Initially, a node sets its Boolean variable to if and only if (which would happen if none of the elements in has been assigned to it). Afterwards, it executes the algorithm shown in Algorithm 4. As long as (i.e., is still in its pull phase), keeps executing a pull operation in each iteration of the algorithm, which asks the contacted node to send it a copy of a random element in , until it successfully receives an element that way. Once this is the case, pushes the successfully pulled element to a random node (so that all elements are distributed uniformly and independently at random among the nodes), which will store it in , and starts executing the Low-Load Clarkson algorithm from above.
After rounds, all nodes have completed their pull phase, w.h.p.
Proof.
Note that no node will ever delete an element in , and pull requests only generate elements for , so the filtering approach of the Low-Load Clarkson algorithm cannot interfere with the pull phase. Thus, it follows from a slight adaptation of proofs in previous work on gossip algorithms (e.g., [22]) that for any , all nodes have completed their pull phase after at most rounds, w.h.p. ∎
Certainly, and at any time, and once all nodes have finished their pull phase, , so we are back to the situation of the original Low-Load Clarkson Algorithm.
During the time when some nodes are still in their pull phase, some nodes might already be executing Algorithm 2, which may cause the sampling of to fail for some nodes . However, the analyses of Lemma 2 and Lemma 2 already take that into account. Once all nodes have finished their pull phase, Lemma 2 applies, which means that after an additional rounds at least one node has found the optimal solution, w.h.p. Thus, after an additional rounds, all nodes will know the optimal solution and terminate. Altogether, we therefore still get the same runtime and work bounds as before, completing the proof of Theorem 1.4.
3 High-Load Clarkson Algorithm
If , then our LP-type algorithm in the previous section will become too expensive since, in expectation, might be in the order of , which is now . In this section, we present an alternative distributed LP-type algorithm that just causes work for any , but the internal computations are more expensive than in the algorithm presented in the previous section. Again, we assume that initially the elements in are randomly distributed among the nodes in . Let the initial be all elements of assigned that way to . As before, .
Irrespective of which elements get selected for the ’s in each round, is a random subset of because the elements in are assumed to be randomly distributed among the nodes and every element in is sent to a random node in . Hence, it follows from and the standard Chernoff bounds that is within , w.h.p., for any constant . Thus, we are computing bases of random multisets of size within , w.h.p. This, in turn, implies with , where , that . In the worst case, however, could be very large, so just bounding the expectation of does not suffice to show that our algorithm has a low work. Therefore, we need a proper extension of Lemma 2 that exploits higher moments. Note that it works for arbitrary LP-type problems, i.e., also problems that are nonregular and/or degenerate.
Let be an LP-type problem of dimension and let be any multiplicity function. For any and any , where , it holds for for a random multiset of size from that .
Proof.
By definition of the expected value it holds that
For and let be the indicator variable for the event that . Then we have