In distributed graph algorithms (or network algorithms) a number of individual entities are connected via a potentially large network. Starting with the breakthrough by Awerbuch et al. [AGLP89], and the seminal work of Linial [Lin92], Peleg [Pel00] and Naor and Stockmeyer [NS95], the area of distributed graph algorithms is growing rapidly. Recently, it has been receiving considerably more theoretical and practical attention motivated by the spread of multi-core computers, cloud computing, and distributed databases. We consider the standard synchronous message passing model (the model) where in each round bits can be transmitted over every edge where is the number of entities.
The common principle underlying all distributed graph algorithms (regardless of the model specification) is that the input of the algorithm is given in a distributed format, and consequently the goal of each vertex is to compute its own part of the output, e.g., whether it is a member of a computed maximal independent set, its own color in a valid coloring of the graph, its incident edges in the minimum spanning tree, or its chosen edge for a maximal matching solution. In most distributed algorithms, throughout execution, vertices learn much more than merely their own output but rather collect additional information on the input or output of (potentially) many other vertices in the network. This seems inherent in many distributed algorithms, as the output of one node is used in the computation of another. For instance, most randomized coloring (or MIS) algorithms [Lub86, BE13, BEPS16, HSS16, Gha16, CPS17] are based on the vertices exchanging their current color with their neighbors in order to decide whether they are legally colored.
In cases where the data is sensitive or private, these algorithms may raise security concerns. To exemplify this point, consider the task of computing the average salary in a distributed network. This is a rather simple distributed task: construct a BFS tree and let the nodes send their salaries from the leaves to the root, where each intermediate node sends to its parent in the tree the sum of all salaries received from its children. While the output goal has been achieved, privacy has been compromised, as intermediate nodes learn more information regarding the salaries of their subtrees. Additional motivations for secure distributed computation include private medical data, networks of selfish agents with private utility functions, and decentralized digital currencies such as Bitcoin.
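The salary example can be made concrete with a short sketch. The code below is only an illustration of the insecure aggregation described above, not part of the paper's construction; the function names, the toy graph, and the salary values are all assumptions. It records what each intermediate node sees, which makes the leakage explicit.

```python
from collections import deque

def bfs_tree(adj, root):
    """Return a parent map of a BFS tree rooted at `root` (keys are in BFS order)."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def convergecast_average(adj, root, salary):
    """Insecure aggregation: every node forwards its subtree sum to its parent."""
    parent = bfs_tree(adj, root)
    subtree_sum = dict(salary)
    learned = {}  # learned[p] = list of (child, subtree sum) that node p receives
    # Reverse BFS order guarantees children are handled before their parents.
    for v in reversed(list(parent)):
        p = parent[v]
        if p is not None:
            subtree_sum[p] += subtree_sum[v]
            learned.setdefault(p, []).append((v, subtree_sum[v]))
    return subtree_sum[root] / len(salary), learned

# Hypothetical 4-node network: 0 - 1 - 3 and 0 - 2.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
salary = {0: 50, 1: 70, 2: 40, 3: 100}
average, learned = convergecast_average(adj, 0, salary)
```

In this toy run the root obtains the correct average, but node 1 receives the exact salary of the leaf 3, which is precisely the kind of extra information a secure algorithm must hide.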
The community of distributed graph algorithms is commonly concerned with two main challenges, namely, locality (i.e., communication is only performed between neighboring nodes) and congestion (i.e., communication links have bounded bandwidth). Security is usually not specified as a desired requirement of the distributed algorithm and the main efficiency criterion is the round complexity (while respecting bandwidth limitation).
Although security is a largely unexplored objective in the area of distributed graph algorithms, the notion of security in multi-party computation (MPC) is one of the main themes in the cryptographic community. Broadly speaking, secure MPC protocols allow parties to jointly compute a function of their inputs without revealing anything about their inputs except the output of the function. There has been tremendous work on MPC protocols, starting from general feasibility results [Yao82, GMW87, BGW88, CCD88] that apply to any functionality to protocols that are designed to be extremely efficient for specific functionalities [BNP08, BLO16]. There is also a wide range of security notions: security may be information-theoretic or based on computational assumptions, and the adversary is either semi-honest or malicious and might collude with several parties. (A semi-honest adversary does not deviate from the described protocol, but may run any computation on the received transcript to gain additional information. A malicious adversary might arbitrarily deviate from the protocol.)
Most MPC protocols are designed for clique networks where every two parties have a secure channel between them. The works that do consider general graph topologies usually take the following framework. For a given function of interest, design first a protocol for securely computing in the simpler setting of a clique network, then “translate” this protocol to any given graph . Although this framework yields protocols that are secure in the strong sense (e.g., handling collusions and a malicious adversary), they do not quite fit the framework of distributed graph algorithms, and simulating these protocols in the model results in a large overhead in the round complexity. It is important to note that the blow-up in the number of rounds might occur regardless of the security requirement; for instance, when the desired function is non-local, its distributed computation in general graphs might be costly with respect to rounds even in the insecure setting. In the absence of distributed graph algorithms for general graphs that are both secure and efficient compared to their non-secure counterparts, we ask:
How to design distributed algorithms that are both efficient (in terms of round complexity) and secure (where nothing is learned but the desired output)?
We address this challenge by introducing a new framework for secure distributed graph algorithms in the model. Our approach is different from previous secure algorithms mentioned above and allows one to decouple between the price of locality and the price of security of a given function . In particular, instead of adopting a clique-based secure protocol for , we take the best distributed algorithm for computing , and then compile to a secure algorithm . This compiled algorithm respects the same bandwidth limitations, relies on no setup phase nor on any computational assumption and works for (almost) any graph. The price of security comes as an overhead in the number of rounds. Before presenting the precise parameters of the secure compiler, we first discuss the security notion used in this paper.
Our Security Notion.
Consider a (potentially insecure) distributed algorithm . Intuitively, we say that a distributed algorithm securely simulates if (1) both algorithms have the exact same output for every node (or the exact same output distribution if the algorithm is randomized) and (2) each node learns “nothing more” than its final output. This strong notion of security is known as “perfect privacy”, which provides pure information-theoretic guarantees and relies on no computational assumptions. The perfect privacy notion is formalized by the existence of an (unbounded) simulator [BGW88, Gol09, Can00, AL17], with the following intuition: a node learns nothing except its own output , from the messages it receives throughout the execution of the algorithm, if a simulator can produce the same output while receiving only and the graph .
Assume that one of the nodes in the network is an “adversary” that is trying to learn as much as possible from the execution of the algorithm. Then the security notion has some restrictions on the operations the adversary is allowed to perform: (1) The adversary is passive and only listens to the messages but does not deviate from the prescribed protocol; this is known in the literature as semi-honest security. (2) The adversary is not allowed to collude with other nodes in the network. (3) The adversary gets to see the entire graph. That is, in this framework, the topology of the graph is not considered private and is not protected by the security notion. The private bits of information that are protected by our compiler are: the inputs of the nodes (e.g., color) and the randomness chosen during the execution of the algorithm; as a result, the outputs of the nodes are private (see Definition 1 for precise details).
A Stronger Adversary.
The goal of this paper is to lay down the groundwork, especially the graph-theoretic infrastructure, for secure distributed graph algorithms. As a first step towards this goal, we consider a weaker notion of security than the usual one obtained in the MPC literature. In particular, in MPC, the adversary is allowed to collude with several other parties, and security still holds. As will be shown in this paper, this basic notion of semi-honest security already brings along many graph-theoretic challenges, which we believe will serve as the basis for stronger security guarantees in the future (e.g., malicious adversaries and collusions).
1.1 Our Results
Our end result is the first general compiler that can take any (possibly insecure) distributed algorithm to one that has perfect security.
Theorem 1 (Secure Simulation, Informal).
Let be a 2-vertex connected -vertex graph with diameter and maximal degree . (A graph is 2-vertex connected if it remains connected after the removal of any single vertex.) Let be a natural distributed algorithm that runs on in rounds. Then, can be transformed to an equivalent algorithm with perfect privacy which runs in rounds.
We note that our compiler works for any distributed algorithm rather than only on natural ones. The number of rounds will be proportional to the space complexity of the internal computation functions of the distributed algorithm (an explicit statement for any algorithm can be found in Remark 1).
This quite general framework is made possible due to fascinating connections between “secure cryptographic definitions” and natural combinatorial graph properties. Most notable is the cycle cover of a graph, namely, a covering decomposition of graph edges into cycles. While cycle covers have been studied in the literature, e.g., the well-known double cycle cover conjecture by Szekeres and Seymour [JT92] and the Chinese postman problem by Edmonds [EJ73], none of the known results satisfy our requirements. Instead, we prove a new theorem regarding cycle covers with low congestion, which we foresee being of independent interest and exploited in future work.
Low-Congestion Graph Structures.
Given a bridgeless graph , a -cycle cover is a collection of cycles of length at most d such that each edge appears in at least one and at most c of the cycles (c is the congestion of the cover). (A graph is bridgeless if it remains connected after the removal of any single edge.) A priori, it is not clear that cycle covers that enjoy both low congestion and short lengths even exist, nor whether it is possible to efficiently find them. Perhaps quite surprisingly, we prove the following.
Theorem 2 (Low Congestion Cycle Cover).
Every bridgeless graph with diameter has a -cycle cover where and . That is, the edges of can be covered by cycles such that each cycle is of length at most and each edge participates in at most cycles.
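To make the two parameters of a cycle cover tangible, the following sketch checks a candidate cover and reports its maximum cycle length and its congestion. It is a verifier under our own assumptions (cycles given as vertex lists, hypothetical function names, a toy graph), and it says nothing about how the cover of Theorem 2 is constructed.

```python
from collections import Counter

def cycle_edges(cycle):
    """Edges of a cycle given as a list of vertices (the last vertex closes to the first)."""
    return [frozenset((cycle[i], cycle[(i + 1) % len(cycle)]))
            for i in range(len(cycle))]

def cover_parameters(edges, cycles):
    """Return (max cycle length, congestion) of a cycle cover, or raise if it is invalid."""
    graph_edges = {frozenset(e) for e in edges}
    load = Counter()
    for cycle in cycles:
        used = cycle_edges(cycle)
        if not set(used) <= graph_edges:
            raise ValueError(f"cycle {cycle} uses an edge that is not in the graph")
        load.update(used)
    if set(load) != graph_edges:
        raise ValueError("some edge is not covered by any cycle")
    return max(len(c) for c in cycles), max(load.values())

# Toy example: the complete graph on 4 vertices covered by its four triangles.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
triangles = [[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]]
d, c = cover_parameters(k4_edges, triangles)
```

Here every edge lies on exactly two triangles, so the cover has cycle length 3 and congestion 2.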
Our secure compiler assumes that the nodes in the graph know the cycle cover. That is, each node knows its neighbors in the cycles it participates in. Thus, in order for the whole transformation to work, the network needs to run a distributed algorithm to learn the cycle cover. This can be achieved by performing a preprocessing phase to compute the cycle cover and then running the distributed algorithm. This preprocessing phase can be done once and for all, and moreover, we show that it can be constructed distributively in
Our low congestion cycle covers turn out to be a building block for constructing a more complex type of cover, which is actually what we use in the final secure compiler. We call this cover private neighborhood trees, which can be thought of as a generalization of cycle covers. Roughly speaking, a private neighborhood tree collection of a 2-vertex connected graph is a collection of trees, one per node , where each tree contains all the neighbors of but does not contain . Intuitively, the private neighborhood trees allow all neighbors of all nodes to exchange a secret without . Note that these covers exist if and only if the graph is 2-vertex connected. Similarly to low-congestion cycle covers, we define -private neighborhood trees, in which each tree has depth at most d and each edge belongs to at most c trees. This allows the distributed compiler to use all trees simultaneously in rounds, by employing the random delay approach [LMR94, Gha15].
Theorem 3 (Private Neighborhood Trees).
Every -vertex connected graph with diameter and maximum degree has a -private neighborhood trees with and .
The flow of our constructions is summarized in Figure 1.
Applications for Known Distributed Algorithms.
Theorem 1 enables us to compile almost all of the known distributed algorithms to a secure version of them. It is worth noting that deterministic algorithms for problems in which the nodes do not have any input cannot be made secure by our approach, since these algorithms only depend on the graph topology, which we do not try to hide. Our compiler is meaningful for algorithms where the nodes have input or for randomized algorithms which define a distribution over the output of the nodes. For instance, the randomized coloring algorithms (see e.g., [BE13]) which sample a random legal coloring of the graph can be made secure. Specifically, we get a distributed algorithm that (legally) colors a graph (or computes a legal configuration, in general), while the information that each node learns at the end is as if a centralized entity ran the algorithm for the entire network, and revealed each node’s output privately (i.e., revealing the final color of ).
Our approach captures global (e.g., MST) as well as many local problems [NS95]. The MIS algorithm of Luby [Lub86] along with our compiler yields a secure algorithm according to the notion described above. Slight variations of this algorithm also give the -round -coloring algorithm (e.g., Algorithm 19 of [BE13]). Combining it with our compiler we get a secure -coloring algorithm with round complexity of . Using the Matching algorithm of Israeli and Itai [II86] we get an secure maximal matching algorithm. Finally, another example comes from distributed algorithms for the Lovász local lemma (LLL), which have received a lot of attention recently [BFH16, FG17, CP17] for the class of bounded degree graphs. Using [CPS17], most of these (non-secure) algorithms for defective coloring, frugal coloring, and list vertex-coloring can be made secure within rounds.
2 Our Approach and Techniques
We next describe the high level ideas of our secure compiler. In Section 2.1, we describe how secure computation in the distributed setting boils down to a novel graph theoretic structure, namely, private neighborhood trees. The construction of these private trees is based on cycle covers. In Section 2.2, we describe the high level ideas of our new cycle cover theorem, which is one of the main technical contributions of this paper. Private trees are then constructed by using a simple reduction to cycle covers, as shown in Section 4.2.
2.1 From Security Requirements to Graph Structures
In this section, we give an overview of the construction of a secure compiler and begin by showing how to compile a single round distributed algorithm into a secure one. This single-round setting already captures most of the challenges of the compiler. At the end of the section, we describe the additional ideas required for generalizing this to arbitrary -round algorithms.
Secure Simulation of a Single Round.
Let be an -vertex graph with maximum degree , and for any node in the graph let be its initial state (including its ID, and private input). Any single round algorithm can be described by a function that maps the initial state of and the messages received from its neighbors, to the output of the algorithm for the node .
As a concrete running example, let be a single round algorithm that verifies vertex coloring in a graph. In this algorithm, the initial state of a node includes a color , and nodes exchange their color with their neighbors and output if and only if all of their neighbors are colored with a color that is different from . It is easy to see that in this simple algorithm, the nodes learn more than the final output of the algorithm, namely, they learn the color of their neighbors. Our goal is to compile this algorithm to a secure one, where nothing is learned except the final output. In particular, nodes should not learn the color of any other node in the network. This fits the model of Private Simultaneous Messages (PSM) that we describe next. (Other MPC protocols might be suitable here as well (e.g., [Yao82, GMW87, BGW88]); however, we find the star topology of the PSM model to be the best fit.)
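The running example is small enough to write out. The sketch below simulates the insecure single round on a toy graph and keeps track of each node's view; encoding the output as a Boolean, the function name, and the example colors are our assumptions rather than the paper's notation.

```python
def verify_coloring_round(adj, color):
    """One insecure round: every node sends its color to all of its neighbors."""
    views, outputs = {}, {}
    for u in adj:
        # The view of u contains the plaintext color of every neighbor.
        received = {v: color[v] for v in adj[u]}
        views[u] = received
        # u accepts iff no neighbor shares its color.
        outputs[u] = all(c != color[u] for c in received.values())
    return outputs, views

# Hypothetical triangle in which nodes 0 and 2 collide on "red".
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
color = {0: "red", 1: "green", 2: "red"}
outputs, views = verify_coloring_round(adj, color)
```

Node 1 only needs to output a single bit, yet its view contains the colors of both neighbors; the secure compiler is meant to deliver the same `outputs` while leaving `views` empty of such information.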
The PSM model was introduced by Feige, Kilian and Naor [FKN94] (and later generalized by [IK97]) as a “minimal” model of MPC for securely computing a function . In this model, there are clients that hold inputs which are all connected to a single server (i.e., a star topology). The clients share private randomness that is hidden from the server. The goal is for the server to compute while learning “nothing more” but this output. The protocol consists of a single round where each client sends a message to the server that depends on its own input and the randomness . The server, using these messages, computes the final output . In [FKN94], it was shown that any function admits such a PSM protocol with information-theoretic privacy. The complexity measure of the protocol is the size of the messages (and shared randomness) which are exponential in the space complexity of the function (see Definition 2 and Theorem 5 for precise details).
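As a feel for the model, here is a textbook-style toy PSM protocol for one specific function, the sum of the clients' inputs modulo a fixed modulus. It is not the general construction of [FKN94]; the function names and the choice of modulus are assumptions made for illustration. The shared randomness consists of masks that cancel out, so the server sees individually uniform messages whose sum is exactly the output.

```python
import secrets

def shared_masks(num_clients, modulus):
    """Shared client randomness: masks that sum to 0 modulo `modulus`."""
    masks = [secrets.randbelow(modulus) for _ in range(num_clients - 1)]
    masks.append(-sum(masks) % modulus)
    return masks

def client_message(x, mask, modulus):
    """The single message a client sends to the server."""
    return (x + mask) % modulus

def server_output(messages, modulus):
    """The server adds the messages; the masks cancel and only the sum remains."""
    return sum(messages) % modulus

MODULUS = 2 ** 16          # assumed to exceed the largest possible sum
inputs = [3, 5, 9]         # private inputs of three hypothetical clients
masks = shared_masks(len(inputs), MODULUS)
messages = [client_message(x, r, MODULUS) for x, r in zip(inputs, masks)]
result = server_output(messages, MODULUS)
```

Any proper subset of the messages is uniformly distributed, so the server learns the sum and nothing else about the individual inputs. General functions require the heavier machinery referenced in the text.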
Turning back to our single round distributed algorithm , the secure simulation of can be based on the PSM protocol for securely computing the function , the function that characterizes algorithm . In this view, each node in the graph acts as a server in the PSM protocol, while its (at most ) neighbors in the graph act as the clients.
In order to simulate the PSM protocol of [FKN94] in the model, one has to take care of several issues. The first issue concerns the bandwidth restriction; in the model, every neighbor can send only bits to in a single round. Note that the PSM messages are exponential in the space complexity of the function , and that in our setting the total input of has bits. Thus, in a naïve implementation only functions that are computable in logarithmic space can be computed with the desired overhead of rounds. Our goal is to capture a wider family of functions, in particular the class of natural algorithms in which is computable in polynomial time. Therefore, in our final compiler, we do not compute in a single round, but rather compute it gate-by-gate. Since in natural algorithms is computed by a circuit of polynomial size, and since a single gate is computable in logarithmic space, we incur a total round overhead that is polynomial in . In what follows, assume that is computable in logarithmic space.
Another issue to be resolved is that in the PSM model, the server did not hold an input whereas in our setting the function depends not only on the input of the neighbors, but on the input of the node as well. This subtlety is handled by having secret share its input to the neighbors. (A secret sharing of a value among several parties is a random tuple of shares, one per party, that combine to the value, e.g., by bitwise XOR.)
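A minimal sketch of additive (XOR) secret sharing follows. We assume XOR-based sharing over byte strings purely for illustration; the helper names are ours.

```python
import secrets
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def xor_share(secret: bytes, parties: int) -> list[bytes]:
    """Split `secret` into `parties` shares whose XOR equals the secret."""
    shares = [secrets.token_bytes(len(secret)) for _ in range(parties - 1)]
    # The last share is chosen so that all shares XOR back to the secret.
    shares.append(reduce(xor_bytes, shares, secret))
    return shares

def xor_reconstruct(shares: list[bytes]) -> bytes:
    """Recover the secret from all of its shares."""
    return reduce(xor_bytes, shares)

node_input = b"color=7"                 # hypothetical private input of a node
shares = xor_share(node_input, 3)       # one share per neighbor
recovered = xor_reconstruct(shares)
```

Every proper subset of the shares is uniformly random, so no single neighbor learns the node's input, while the function evaluated inside the PSM protocol can recombine all shares.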
There is one final critical missing piece that requires hard work: the neighbors of must share private randomness that is not known to . Thus, the secure simulation of a single round distributed algorithm can be translated into the following problem:
How to share a secret between the neighbors of each node in the graph while hiding it from itself?
Note that this task should be done for all nodes in the graph simultaneously. That is, for every node , we need the neighbors of to share a private random string that is hidden from . Our solution to this problem is information theoretic and builds upon specific graph structures. However, we begin by discussing a much simpler solution, yet, based on computational assumptions.
A Simple Solution Based on Computational Assumptions.
For simplicity, we will assume that our public-key encryption scheme has two properties: (1) the encryption does not increase the size of the plaintext, and (2) the length of the public-key is – the security parameter of the public-key scheme. We next describe an protocol that computes the secret which is shared by all neighbors of while hiding it from , under the public-key assumption.
Consider a node and let be its neighbors. For simplicity, assume that is a power of . First, computes the random string ; this string will be shared with all ’s nodes in phases. In each phase , we assume that all the vertices for know . We will show that at the end of the phase, all vertices know . This is done as follows. Each vertex sends its public key to via the common neighbor , encrypts with the key of and sends this encrypted information to via . As the length of the public key is and the length of the encrypted secret needed by the PSM protocol has bits, this can be done in rounds. It is easy to see that cannot learn the secret under the public-key assumption. Using this protocol with the PSM machinery yields a protocol that compiles any -round algorithm (even a non-natural one) into a secure algorithm with rounds. We note that it is not clear what is as a function of the number of nodes . Clearly, if , this overhead is quite large. The benefit of our perfect security scheme is that it relies on no computational assumptions and does not introduce an additional security parameter; as a result, the round complexity of the compiled algorithms depends only on the properties of the graph, e.g., number of nodes, maximum degree and diameter. Finally, the dependence on these graph parameters is existentially required.
Our Information-Theoretic Solution Based on Low-Congestion Structures.
Suppose two nodes, , wish to share information that is hidden from a node in the information-theoretic sense. Then, they must use a - path in that is free from . Hence, in order for the neighbors of a node to exchange private randomness, they must use a connected subgraph of that spans all the neighbors of but does not include . (This in particular explains our requirement for to be 2-vertex connected.) Using this subgraph , the neighbors can communicate privately (without ) and exchange shared randomness. In order to reduce the overhead of the compiler, we need the diameter of to be as small as possible. Moreover, in the compiled algorithm, we will have the neighbors of all nodes in the graph exchange randomness simultaneously. Since there are bandwidth limits, we need to have a minimal overlap of the different subgraphs . It is easy to see that for every vertex , there exists a tree of diameter that spans all the neighbors of . However, an arbitrary collection of trees where each might result in an edge that is common to trees. This is undesirable as it might lead to a blow-up of in the round complexity of our compiler.
Towards this end, we define the notion of private neighborhood trees, which provides us with the communication backbone for implementing this distributed protocol in general graph topologies for all nodes simultaneously. Roughly speaking, a private neighborhood tree collection of a 2-vertex connected graph is a collection of trees, one per node , where each tree contains all the neighbors of but does not contain . In -private neighborhood trees, each tree has depth at most d and each edge belongs to at most c trees. This allows us to use all trees simultaneously and exchange all the private randomness in rounds.
Let be a -vertex connected graph and let be the diameter of . By the discussion above, achieving -private neighborhood trees with and is easy, but yields an inefficient compiler. We show how to construct -private neighborhood trees for and ; these parameters are nearly optimal existentially. The construction builds on a simpler and more natural structure called a cycle cover. Using these private neighborhood trees, the neighbors of each node can exchange the bits of in rounds. This is done for all nodes simultaneously using the random delay approach. Note that unlike the computational setting, here the round complexity is existentially optimal (up to poly-logarithmic terms) and only depends on the parameters of the graph. An overview of the low-congestion cycle cover construction which underlies the construction of private trees is given in Section 2.2.
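The definition can be illustrated with the naive construction: for each node, run a BFS in the graph with that node removed and keep only the tree paths that reach its neighbors. This sketch is the simple baseline, not the low-congestion construction of the paper, and the function names and toy graph are assumptions. It also measures how many trees share the most loaded edge.

```python
from collections import Counter, deque

def private_tree(adj, u):
    """Edges of a tree in the graph minus u that spans all neighbors of u."""
    neighbors = list(adj[u])
    root = neighbors[0]
    parent = {root: None}
    queue = deque([root])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y != u and y not in parent:   # never route through u
                parent[y] = x
                queue.append(y)
    if any(v not in parent for v in neighbors):
        raise ValueError(f"removing node {u} disconnects its neighbors")
    edges = set()
    for v in neighbors:                      # keep only paths to the neighbors of u
        while parent[v] is not None:
            edges.add(frozenset((v, parent[v])))
            v = parent[v]
    return edges

def max_congestion(adj):
    """Largest number of private trees that share a single edge."""
    load = Counter()
    for u in adj:
        load.update(private_tree(adj, u))
    return max(load.values())

# Toy 2-vertex connected graph: a cycle on four nodes.
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
tree_of_0 = private_tree(cycle4, 0)
congestion = max_congestion(cycle4)
```

On the 4-cycle each tree is the path that avoids its node, and every edge ends up in two trees. On a graph that is not 2-vertex connected, `private_tree` raises an error for a cut vertex, matching the observation that such trees exist only for 2-vertex connected graphs.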
Secure Simulation of Many Rounds.
We have described how to securely simulate single round distributed algorithms. Consider an -round distributed algorithm . In a broad view, can be viewed as a collection of functions . At round , a node holds a state and needs to update its state according to a function that depends on and the messages it has received in this round. Moreover, the same function computes the messages that will send to its neighbors in the next round. That is,
Assume that the final state is the final output of the algorithm for node . A first attempt is to simply apply the solution for a single round for many times, round by round. As a result, the node learns all internal states and nothing more. This is of course undesirable as these internal states, for , might already reveal much more information than the final output. Instead, we simulate the computation of the internal states , in an oblivious manner without knowing any except for which is the final output of the algorithm.
Towards this end, in our scheme, the node holds an “encrypted” state, , instead of the actual state . The encryption we use is a simple one-time pad where the key is a random string such that . The key will be chosen by an arbitrary neighbor of . In addition to the state, the node should not be able to learn the messages sent to the neighbors in the original algorithm. Thus, each neighbor holds the key that is used to encrypt its incoming message to . Overall, at any given round , any node holds an encrypted state , and encrypted outgoing messages ; the neighbors of hold the corresponding decryption keys. To compute the new state and the messages that sends to its neighbors in the next round, we use the PSM protocol as described for a single round but with respect to a function which is related to the function and is defined as follows. The input of the function is an encrypted state (of ), encrypted messages from its neighbors, keys for decrypting the input, and new keys for encrypting the final output. First, the function decrypts the encrypted input to get the original state and the messages sent from its neighbors (i.e., the input for function ). Then, the function applies to get the next state and new outgoing messages from to its neighbors. Finally, it uses the new encryption keys to encrypt the new output and outputs the new encrypted data (states and messages to be sent). A summary of the algorithm for a single node is given in Figure 2. The full proof is given in Section 5.
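The decrypt, apply, re-encrypt structure of the wrapped function can be sketched as follows. States and messages are modeled as fixed-width integers with XOR as the one-time pad; this encoding, the names `wrapped_step` and `max_step`, and the example transition function are assumptions for illustration. In the compiler this function would be evaluated inside a PSM protocol, so no party sees the intermediate plaintext values that appear as local variables here.

```python
import secrets

BITS = 16  # assumed width of states and messages

def otp(value, key):
    """One-time pad on fixed-width integers: XOR both encrypts and decrypts."""
    return value ^ key

def wrapped_step(f, enc_state, enc_msgs, state_key, msg_keys,
                 new_state_key, new_msg_keys):
    """Decrypt the inputs, apply the original round function f, re-encrypt the outputs."""
    state = otp(enc_state, state_key)
    msgs = [otp(m, k) for m, k in zip(enc_msgs, msg_keys)]
    next_state, out_msgs = f(state, msgs)
    enc_next = otp(next_state, new_state_key)
    enc_out = [otp(m, k) for m, k in zip(out_msgs, new_msg_keys)]
    return enc_next, enc_out

def max_step(state, msgs):
    """Hypothetical round function: keep and forward the maximum value seen."""
    best = max([state] + msgs)
    return best, [best] * len(msgs)

keys = [secrets.randbits(BITS) for _ in range(6)]
state_key, msg_keys = keys[0], keys[1:3]
new_state_key, new_msg_keys = keys[3], keys[4:6]
enc_state = otp(7, state_key)
enc_msgs = [otp(3, msg_keys[0]), otp(12, msg_keys[1])]
enc_next, enc_out = wrapped_step(max_step, enc_state, enc_msgs, state_key,
                                 msg_keys, new_state_key, new_msg_keys)
```

Decrypting `enc_next` and `enc_out` with the new keys gives exactly what `max_step` would produce in the clear, while the node itself only ever holds padded values.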
2.2 Low Congestion Cycle Covers
As mentioned above, cycle covers provide the basis for the construction of private neighborhood trees, the graph infrastructure used by our compiler. We next give an overview of the construction of low congestion cycle covers of Theorem 2. The private neighborhood trees are constructed via a reduction to cycle covers as will be shown in Section 4.2.
Let be a bridgeless -vertex graph with diameter . Our approach is based on constructing a BFS tree in the graph and covering the edges by two procedures: the first constructs a low congestion cycle cover for the non-tree edges and the second covers the tree edges.
Covering the Non-Tree Edges.
Let be the set of non-tree edges. Since the diameter of might be large (e.g., ), to cover the edges of by short cycles (i.e., of length ), one must use the edges of . (The graph might be disconnected; when referring to its diameter, we refer to the maximum diameter over the connected components of .) A naïve approach is to cover every edge in by taking its fundamental cycle in (i.e., using the - path in ). Although this yields short cycles, the congestion on the tree edges might become . The key challenge is to use the edges of (as we indeed have to) in a way that the output cycles would be short without overloading any tree edge more than times.
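The weakness of the naïve approach is easy to observe experimentally. The sketch below builds a BFS tree, takes the fundamental cycle of every non-tree edge, and counts how many cycles pass through each tree edge. The helper names and the example graph (two "fans" hanging off the root, joined leaf to leaf) are assumptions chosen to make one tree edge heavily loaded.

```python
from collections import Counter, deque

def bfs_tree(adj, root):
    """Parent and depth maps of a BFS tree rooted at `root`."""
    parent, depth = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v], depth[v] = u, depth[u] + 1
                queue.append(v)
    return parent, depth

def tree_path_edges(parent, depth, u, v):
    """Tree edges on the unique tree path between u and v."""
    edges = []
    while u != v:
        if depth[u] < depth[v]:
            u, v = v, u                      # always climb from the deeper endpoint
        edges.append(frozenset((u, parent[u])))
        u = parent[u]
    return edges

def fundamental_cycle_load(adj, root):
    """Number of fundamental cycles that use each tree edge."""
    parent, depth = bfs_tree(adj, root)
    load = Counter()
    for u in adj:
        for v in adj[u]:
            if u < v and parent[u] != v and parent[v] != u:   # non-tree edge
                load.update(tree_path_edges(parent, depth, u, v))
    return load

def add_edge(adj, u, v):
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

# Root 0 with children 1 and 2; node 1 has leaves 3..7, node 2 has leaves 8..12,
# and leaf i under node 1 is joined to leaf i + 5 under node 2.
k = 5
adj = {}
add_edge(adj, 0, 1)
add_edge(adj, 0, 2)
for i in range(k):
    add_edge(adj, 1, 3 + i)
    add_edge(adj, 2, 3 + k + i)
for i in range(k):
    add_edge(adj, 3 + i, 3 + k + i)
load = fundamental_cycle_load(adj, 0)
```

All five fundamental cycles have length 5, yet each of them passes through the tree edge between nodes 0 and 1, and the load on that edge grows linearly with the number of leaf-to-leaf edges.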
Our approach is based on using the edges of the tree only for the purpose of connecting nodes that are somewhat close to each other (under some definition of closeness to be described later), in a way that would balance the overload on each tree edge. To realize this approach, we define a specific way of partitioning the nodes of the tree into blocks according to . In a very rough manner, a block consists of a set of nodes that have few incident edges in . To define these blocks, we number the nodes based on postorder traversal in and partition them into blocks containing nodes with consecutive numbering. The density of a block is the number of edges in with an endpoint in . Letting b be some threshold of constant value on the density, the blocks are partitioned such that every block is either (1) a singleton block consisting of one node with at least b edges in or (2) consists of at least two nodes but has a density bounded by . As a result, the number of blocks is not too large (say, at most ).
To cover the edges of by cycles, the algorithm considers the contracted graph obtained by contracting all nodes in a given block into one supernode and connecting two supernodes and , if there is an edge in whose one endpoint is in , and the other endpoint is in . This graph is in fact a multigraph as it might contain self-loops or multi-edges. We now use the fact that any -vertex graph with at least edges has girth at most . Since the contracted graph contains at most nodes and has edges, its girth is . The algorithm then repeatedly finds (edge-disjoint) short cycles (of length ) in this contracted graph, until we are left with at most edges. (That is, it computes a short cycle , omits the edges of from the contracted graph, and repeats.) The cycles computed in the contracted graph are then translated to cycles in the original graph by using the tree paths between nodes belonging to the same supernode (block). We note that this translation might result in cycles that are non-simple, and this is handled later on.
Our key insight is that even though the tree paths connecting two nodes in a given block might be long, i.e., of length , every tree edge is “used” by at most two blocks. That is, for each edge of the tree, there are at most 2 blocks such that the tree path of nodes in the block passes through . (If a block has only a single node, then it will use no tree edges.) Since the (non-singleton) blocks have constant density, we are able to bound the congestion on each tree edge . The translation of cycles in the contracted graph to cycles in the original graph yields -length cycles in the original graph where every edge belongs to cycles.
The above step already covers of the edges in . We continue this process for times until all edges of are covered, and thus get a factor in the congestion.
Finally, to make the output cycles simple, we have an additional “cleanup” step (procedure ) which takes the output collection of non-simple cycles and produces a collection of simple ones. In this process, some of the edges in the non-simple cycles might be omitted; however, we prove that only tree edges might get omitted and all non-tree edges remain covered by the simple cycles. This concludes the high level idea of covering the non-tree edges. We note that our blocking definition is also quite useful for distributed implementations. The reason is that although the blocks are not independent, in the sense that the tree path connecting two nodes in a given block passes through other blocks, this dependence is very limited.
Covering the Tree Edges.
Covering the tree edges turns out to be the harder case, where new ideas are required. Specifically, whereas in the non-tree edge case our goal is to find cycles that use the tree edges as rarely as possible, here we aim to find cycles that cover all edges in the tree, while still preventing any particular tree edge from participating in too many cycles.
The algorithm for covering the tree edges is recursive, where in each step we split the tree into two edge-disjoint subtrees that are balanced in terms of number of edges. To perform a recursive step, we would like to break the problem into two independent subproblems, one that covers the edges of and the other that covers the edges of . However, observe that there might be edges where the only cycle that covers them passes through (and vice versa). (Recall that the graph is 2-edge connected.) For every such node , let be the first node in that appears on the fundamental cycle of the edge .
To cover these tree edges, we employ two procedures, one on and the other on that together form the desired cycles (for an illustration, see Figures 11 and 9). First, we mark all nodes such that their is in . Then, we use an Algorithm called
We employ Algorithm on with the marked nodes as described above. Then for every pair that got matched by Algorithm , we add a virtual edge between and in . Since this virtual edge is a non-tree edge with both endpoints in , we have translated the dependency between and to covering a non-tree edge. At this point, we can simply use Algorithm on the tree and the non-virtual edges. This computes a cycle collection which covers all virtual edges . In the final step, we replace each virtual edge with the edge disjoint tree path and the paths between and (as well as the path connecting and ).
The above description is simplified and avoids many details and complications that we had to address in the full algorithm. For instance, in our algorithm, a given tree edge might be responsible for the covering of up to many tree edges. This prevents us from using the edge-disjoint paths of Algorithm in a naïve manner. In particular, our algorithm has to avoid multiple appearances of a given tree edge on the same cycle, since in such a case that tree edge might get omitted when the cycle is made simple, and would no longer be covered. See Section 4.1 for the precise details of the proof, and see Figure 3 for a summary of our algorithm.
Unless stated otherwise, the logarithms in this paper are base 2. For a distribution we denote by an element chosen from uniformly at random. For an integer we denote by the set . We denote by the uniform distribution over -bit strings. For two distributions (or random variables) we write if they are identical distributions. That is, for any it holds that .
For a tree , let be the subtree of rooted at , and let be the tree path between and . When is clear from the context, we may omit it and simply write . Let be a - path (possibly ) and be a - path; we denote by the concatenation of the two paths. The fundamental cycle of an edge is the cycle formed by taking and the tree path between and in , i.e., . For , let be the length (in edges) of the shortest path in . For every integer , let . When , we simply write . Let be the degree of in . For a subset of edges , let be the number of edges incident to in . For a subset of nodes , let . For a subset of vertices , let be the induced subgraph on .
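To make the notions of tree path and fundamental cycle concrete, the following is a minimal sketch. It assumes the tree is given as a parent map (the root maps to `None`); the function names `tree_path` and `fundamental_cycle` are ours and are used only for illustration.

```python
def tree_path(parent, u, v):
    """Return the unique tree path from u to v as a list of nodes.

    `parent` maps every node to its parent in the tree (root -> None).
    """
    # Walk from u up to the root, remembering the position of each ancestor.
    up_from_u = []
    x = u
    while x is not None:
        up_from_u.append(x)
        x = parent[x]
    position = {node: i for i, node in enumerate(up_from_u)}

    # Walk from v upwards until we meet the u-to-root path (the meeting
    # point is the lowest common ancestor of u and v).
    up_from_v = []
    x = v
    while x not in position:
        up_from_v.append(x)
        x = parent[x]
    lca = x

    # u -> ... -> lca, followed by the reversed climb lca -> ... -> v.
    return up_from_u[: position[lca] + 1] + up_from_v[::-1]


def fundamental_cycle(parent, u, v):
    """Closed walk formed by the non-tree edge (u, v) and the tree path v -> u."""
    return [u] + tree_path(parent, v, u)


# Example tree: 0 is the root with children 1 and 2; 3 hangs off 1, 4 off 2.
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 2}
print(tree_path(parent, 3, 4))
print(fundamental_cycle(parent, 3, 4))
```

The cycle is returned as a closed node sequence that starts and ends at the first endpoint of the non-tree edge, so its length in edges is one less than the length of the list.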
Fact 1 (Moore Bound, [Bol04]).
Every -vertex graph with at least edges has a cycle of length at most .
3.1 Distributed Algorithms
The Communication Model.
We use a standard message passing model, the model [Pel00], where the execution proceeds in synchronous rounds and in each round, each node can send a message of size to each of its neighbors. In this model, local computation is done for free at each node and the primary complexity measure is the number of communication rounds. Each node holds a processor with a unique and arbitrary ID of bits. Throughout, we make extensive use of the following tool, which is based on the random delay approach of [LMR94].
Theorem 4 ([Gha15, Theorem 1.3]).
Let be a graph and let be distributed algorithms in where each algorithm takes at most d rounds, and where for each edge of , at most c messages need to go through it, in total over all these algorithms. Then, there is a randomized distributed algorithm (using only private randomness) that, with high probability, produces a schedule that runs all the algorithms in rounds, after rounds of pre-computation.
A Distributed Algorithm.
Consider an -vertex graph with maximal degree . We model a distributed algorithm that works in rounds as describing functions as follows. Let be a node in the graph with input and neighbors . At any round , the memory of a node consists of a state, denoted by and messages that were received in the previous round.
Initially, we set to contain only the input of and its ID and initialize all messages to . At round the node updates its state to according to its previous state and the messages from the previous round, and prepares messages to send . To ease notation (and without loss of generality) we assume that each state contains the ID of the node , thus, we can focus on a single update function for every round that works for all nodes. The function gets the state and messages , and randomness and outputs the next state and outgoing messages:
At the end of the rounds, each node has a state and a final output of the algorithm. Without loss of generality, we assume that is the final output of the algorithm (we can always modify accordingly).
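The round structure described above can be sketched as a small synchronous simulator. This is only an illustration under our own assumptions: the names `run_synchronous` and `flood_max`, the dictionary-based state, and the example task (every node learns the maximum input) are not taken from the paper, and message-size limits are not enforced.

```python
import random


def run_synchronous(adj, inputs, update, rounds, seed=0):
    """Run `rounds` synchronous rounds of a single update function.

    adj:    dict node -> list of neighbours
    update: (state, received_messages, coins) -> (new_state, outgoing_messages)
    """
    rng = random.Random(seed)
    # Initial state holds only the node's ID and input; no messages yet.
    state = {v: {"id": v, "input": inputs[v]} for v in adj}
    inbox = {v: {u: None for u in adj[v]} for v in adj}
    for _ in range(rounds):
        outbox = {}
        for v in adj:
            coins = rng.getrandbits(32)  # private randomness of v for this round
            state[v], outbox[v] = update(state[v], inbox[v], coins)
        # Deliver: what u prepared for v shows up in v's inbox next round.
        inbox = {v: {u: outbox[u].get(v) for u in adj[v]} for v in adj}
    return state  # by convention, the final state is the node's output


def flood_max(state, messages, coins):
    """Example update function: keep and forward the largest value seen."""
    best = state.get("best", state["input"])
    for m in messages.values():
        if m is not None:
            best = max(best, m)
    new_state = dict(state, best=best)
    return new_state, {u: best for u in messages}


# Path 0 - 1 - 2 - 3 with one input per node.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
inputs = {0: 5, 1: 1, 2: 9, 3: 2}
final = run_synchronous(adj, inputs, flood_max, rounds=4)
print({v: final[v]["best"] for v in adj})
```

Note that in this example every node forwards values it has learned about other nodes, which is exactly the kind of leakage that a secure simulation is meant to prevent.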
Natural Distributed Algorithms.
We define a family of distributed algorithms which we call natural, which captures almost all known distributed algorithms. A natural distributed algorithm has two restrictions for any round : (1) the size of the state is bounded by , and (2) the function is computable in polynomial time. The input for is the state and at most messages, each of length . Thus, the input length for is bounded by , and the running time should be polynomial in this input length.
We introduce this family of algorithms mainly for simplifying the presentation of our main result. For these algorithms, our main statement can be described with minimal overhead. However, our results are general and work for any algorithm, with appropriate dependency on the size of the state and the running time of the function (i.e., the internal computation time at each node in round ).
We introduce some notation: For an algorithm , graph , input we denote by the random variable of the output of node while performing algorithm on the graph with inputs (recall that might be randomized and thus the output is a random variable and not a value). Denote by the collection of outputs (in some canonical ordering). Let be a random variable of the viewpoint of in the running of the algorithm . This includes messages sent to , its memory and random coins during all rounds of the algorithm.
Secure Distributed Computation.
Let be a distributed algorithm. Informally, we say that computes (or simulates ) in a secure manner if, when running the algorithm , every node learns the final output of but “nothing more”. This notion is captured by the existence of a simulator and is defined below.
Definition 1 (Perfect Privacy).
Let be a distributed (possibly randomized) algorithm, that works in rounds. We say that an algorithm computes with perfect privacy if for every graph , every and it holds that:
Correctness: For every input : .
Perfect Privacy: There exists a randomized algorithm (simulator) such that for every input it holds that
This security definition is known as the “semi-honest” model, where the adversary, acting as one of the nodes in the graph, is not allowed to deviate from the prescribed protocol, but can run arbitrary computation given all the messages it received. Moreover, we assume that the adversary does not collude with other nodes in the graph.
3.2 Cryptography with Perfect Privacy
One of the main cryptographic tools we use is a specific protocol for secure multiparty computation that has perfect privacy. Feige, Kilian and Naor [FKN94] suggested a model where two players having inputs and wish to compute a function in a secure manner. They achieve this by each sending a single message to a third party that is able to compute the output of the function from these messages, but learns nothing else about the inputs and . For the protocol to work, the two parties need to share private randomness that is not known to the third party. This model was later generalized to multiple players and is called the Private Simultaneous Messages (PSM) Model [IK97], which we formally describe next.
Definition 2 (The PSM model).
Let be a variant function. A protocol for consists of a pair of algorithms where and such that
For any it holds that:
There exists a randomized algorithm (simulator) such that for and for sampled from , it holds that
The communication complexity of the PSM protocol is the encoding length and the randomness complexity of the protocol is defined to be .
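As a small illustration of the three ingredients of a PSM protocol (per-party encoding with shared randomness, decoding by the third party, and a simulator that sees only the output), consider the specific two-party function that adds the inputs modulo a public modulus. This toy protocol and all function names are our own choices for exposition; it is not the general construction of [FKN94] or [IK97].

```python
import secrets


def sample_shared_randomness(modulus):
    """Randomness shared by the two parties and hidden from the third party."""
    return secrets.randbelow(modulus)


def encode_first(x, r, modulus):
    """Message of the party holding x."""
    return (x + r) % modulus


def encode_second(y, r, modulus):
    """Message of the party holding y."""
    return (y - r) % modulus


def decode(msg_first, msg_second, modulus):
    """The third party recovers (x + y) mod modulus; r cancels out."""
    return (msg_first + msg_second) % modulus


def simulate(output, modulus):
    """Sample a pair of messages given only the function output.

    The first message is uniform and the second is fixed by the output,
    which is exactly the distribution of the real messages.
    """
    msg_first = secrets.randbelow(modulus)
    return msg_first, (output - msg_first) % modulus


modulus = 101
r = sample_shared_randomness(modulus)
a, b = encode_first(17, r, modulus), encode_second(30, r, modulus)
print(decode(a, b, modulus))
```

Here the communication complexity is the bit length of the two messages and the randomness complexity is the bit length of the shared value, both about the logarithm of the modulus.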
Theorem 5 (Follows from [IK97]).
For every function that is computable by an -space TM there is an efficient perfectly secure protocol whose communication complexity and randomness complexity are .
We describe two additional tools that we will use, secret sharing and one-time-pad encryption.
Definition 3 (Secret Sharing).
Let be a message. We say is secret shared to shares by choosing random strings conditioned on . Each is called a share, and notice that the joint distribution of any shares is uniform over .
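A minimal sketch of such a sharing scheme follows. It assumes that the condition on the shares is that their bitwise XOR equals the message, which is the standard way to realize this definition; the helper names `xor_bytes`, `share` and `reconstruct` are ours.

```python
import secrets
from functools import reduce


def xor_bytes(a, b):
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def share(message, k):
    """Split `message` into k shares whose XOR equals the message."""
    # The first k-1 shares are independent uniform strings.
    shares = [secrets.token_bytes(len(message)) for _ in range(k - 1)]
    # The last share is chosen so that all k shares XOR to the message.
    last = reduce(xor_bytes, shares, message)
    return shares + [last]


def reconstruct(shares):
    """Recover the message from all k shares."""
    return reduce(xor_bytes, shares)


parts = share(b"salary=4200", 4)
print(reconstruct(parts))
```

Any proper subset of the shares is distributed uniformly, so all shares are needed to learn anything about the message.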
Definition 4 (One-Time-Pad Encryption).
Let be a message. A one-time pad is an extremely simple encryption scheme that has information-theoretic security. For a random key the “encryption” of according to is . It is easy to see that the encrypted message (without the key) is distributed as a uniform random string. To decrypt using the key we simply compute . The key might be referred to as the encryption key or the decryption key.
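The one-time pad can be sketched in a few lines, assuming (as is standard) that encryption is the bitwise XOR of the message with a uniformly random key of the same length. The function names are ours.

```python
import secrets


def otp_keygen(length):
    """Sample a uniformly random key of `length` bytes."""
    return secrets.token_bytes(length)


def otp_apply(data, key):
    """XOR `data` with `key`; the same call encrypts and decrypts."""
    if len(data) != len(key):
        raise ValueError("the key must be exactly as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"attack at dawn"
key = otp_keygen(len(message))
ciphertext = otp_apply(message, key)
print(otp_apply(ciphertext, key))
```

Since XOR is its own inverse, a single key serves for both directions, but it must never be reused for a second message.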
In Section 4 we describe the centralized constructions of our low-congestion covers. We start by showing the construction of cycle covers (in Section 4.1). We then use the cycle cover construction to compute private neighborhood trees in Section 4.2. Finally, Section 5 describes the secure simulation which generalizes PSM to general graphs.
4 Low-Congestion Covers
4.1 Cycle Cover
We give the formal definition of a cycle cover and prove our main theorem regarding low-congestion cycle covers. Intuitively, a cycle cover is a collection of cycles in the graph such that each edge is covered by at least one cycle from the collection. We care about two main parameters of the cycle cover that we wish to minimize: (1) cycle length: the maximal length of a cycle, and (2) edge congestion: the maximal number of cycles an edge participates in.
Definition 5 (Low-Congestion Cycle Cover).
For a given graph , a low-congestion cycle cover of is a collection of cycles that cover all edges of such that each cycle is of length at most and each edge appears in at most cycles in . That is, for every it holds that .
We also consider partial covers, that cover only a subset of edges . We say that a cycle cover is a cycle cover for , if all cycles are of length at most , each edge of appears in at least one of the cycles of , and no edge in appears in more than cycles in . That is, in this restricted definition, the covering is with respect to the subset of edges , however, the congestion limitation is with respect to all graph edges.
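The two parameters of a (partial) cycle cover can be checked mechanically. The sketch below assumes that every cycle is simple and given as a closed node sequence (first node repeated at the end), and that edges are undirected pairs; the function name `cover_parameters` is ours.

```python
from collections import Counter


def cover_parameters(cycles, edges_to_cover):
    """Return (max cycle length, max edge congestion, all edges covered?).

    Congestion is measured over every edge that appears in some cycle,
    while coverage is only required for `edges_to_cover`.
    """
    load = Counter()
    max_length = 0
    for cycle in cycles:
        # Undirected edges along the closed walk.
        cycle_edges = {frozenset(e) for e in zip(cycle, cycle[1:])}
        max_length = max(max_length, len(cycle) - 1)
        load.update(cycle_edges)
    congestion = max(load.values(), default=0)
    covered = all(load[frozenset(e)] >= 1 for e in edges_to_cover)
    return max_length, congestion, covered


# A 4-cycle 0-1-2-3 with the chord (0, 2), covered by two triangles.
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
cycles = [[0, 1, 2, 0], [0, 2, 3, 0]]
print(cover_parameters(cycles, edges))
```

In this example the chord lies on both triangles, so the cover has cycle length 3 and congestion 2.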
The main contribution of this section is an existential result regarding cycle covers with low congestion. Namely, we show that any graph that is 2-edge connected has a cycle cover where each cycle is at most the diameter of the graph (up to factors) and each edge is covered by cycles. Moreover, the proof is actually constructive, and yields a polynomial time algorithm that computes such a cycle cover.
For every bridgeless -vertex graph with diameter , there exists a -cycle cover with and .
The construction of a -cycle cover starts by constructing a BFS tree . The algorithm has two sub-procedures: the first computes a cycle collection for covering the non-tree edges , the second computes a cycle collection for covering the tree edges . We describe each cover separately. The pseudo-code for the algorithm is given in Figure 4. The algorithm uses two procedures, and which are given in Section 4.1.1 and Section 4.1.2 respectively.
4.1.1 Covering Non-Tree Edges
Covering the non-tree edges mainly uses the fact that as long as the graph has many edges, its girth is small. Specifically, using Fact 1, with we get that the girth of a graph with at least edges is at most . Hence, as long as the graph has at least edges, a cycle of length can be found. We get that all but edges in are covered by edge-disjoint cycles of length .
In this subsection, we show that the set of edges , i.e., the set of non-tree edges, can be covered by a -cycle cover denoted . Actually, what we show is slightly more general: if the tree is of depth the length of the cycles is at most . Lemma 1 will be useful for covering the tree edges as well and is used again in the next subsection (Section 4.1.2).
Let be a -vertex graph, let be a tree of depth . Then, there exists a -cycle cover for the edges of .
An additional useful property of the cover is that despite the fact that the length of the cycles in is , each cycle is used to cover edges.
Each cycle in is used to cover edges in .
The rest of this subsection is devoted to the proof of Lemma 1. A key component in the proof is a partitioning of the nodes of the tree into blocks. The partitioning is based on a numbering of the nodes from to and grouping nodes with consecutive numbers into blocks under certain restrictions. We define a numbering of the nodes by traversing the nodes of the tree in post order. That is, we let if is the node traversed. Using this mapping, we proceed to defining a partitioning of the nodes into blocks and show some of their useful properties.
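The post-order numbering can be computed as follows. This sketch assumes that numbers start at 1 and that the tree is given as a map from each node to its ordered list of children; the name `post_order_numbering` is ours.

```python
def post_order_numbering(children, root):
    """Map each node to its position (starting at 1) in a post-order traversal.

    `children` maps a node to the ordered list of its children; leaves may
    be missing from the map.
    """
    number = {}
    # Explicit stack of (node, iterator over its remaining children),
    # so that deep trees do not hit the recursion limit.
    stack = [(root, iter(children.get(root, [])))]
    while stack:
        node, remaining = stack[-1]
        child = next(remaining, None)
        if child is None:
            # All children are numbered, so the node itself comes next.
            stack.pop()
            number[node] = len(number) + 1
        else:
            stack.append((child, iter(children.get(child, []))))
    return number


# Root 'r' has children 'a' and 'b'; 'a' has children 'c' and 'd'.
children = {"r": ["a", "b"], "a": ["c", "d"]}
print(post_order_numbering(children, "r"))
```

In the example, the subtree rooted at 'a' receives the numbers 1, 2 and 3, illustrating that a subtree occupies a consecutive range of numbers.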
For a block of nodes and a subset of non-tree edges , the notation is the number of edges in that have an endpoint in the set . We call this the density of block with respect to . For a subset of edges , and a density bound b (which will be set to a constant), an -partitioning is a partitioning of the nodes of the graph into blocks that satisfies the following properties:
Every block consists of a consecutive subset of nodes (w.r.t. their numbering).
If a block has density then consists of a single node.
The total number of blocks is at most .
For any b and , there exists an -partitioning of the nodes of satisfying the above properties.
This partitioning can be constructed by a greedy algorithm that traverses nodes of in increasing order of their numbering and groups them into blocks while the density of the block does not exceed b (see Figure 5 for the precise procedure).
Indeed, properties 1 and 2 are satisfied directly by the construction. For property 3, let be the number of blocks with . By the construction, we know that for any such block the block that comes after satisfies . Let be the final partitioning. Then, we have pairs of blocks that have density at least b and the rest of the blocks that have density at least . Formally, we have
On the other hand, since it is a partitioning of we have that . Thus, we get that and therefore as required. ∎
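The greedy construction can be sketched as follows. This is our own reading of the description above and not a transcription of the procedure in Figure 5: nodes are scanned in increasing order of their numbering, and a block is closed as soon as adding the next node would push its density above the bound b. The names `incident_edges` and `greedy_partition` are ours.

```python
def incident_edges(edges):
    """Map each node to the set of (undirected) edges touching it."""
    incident = {}
    for u, v in edges:
        e = frozenset((u, v))
        incident.setdefault(u, set()).add(e)
        incident.setdefault(v, set()).add(e)
    return incident


def greedy_partition(order, edges, b):
    """Partition `order` (nodes by increasing number) into consecutive blocks.

    The density of a block is the number of given edges with an endpoint
    in the block. A block of density above b is always a single node.
    """
    incident = incident_edges(edges)
    blocks, current, current_edges = [], [], set()
    for v in order:
        merged = current_edges | incident.get(v, set())
        if current and len(merged) > b:
            # Adding v would exceed the bound: close the block, start anew.
            blocks.append(current)
            current, merged = [], set(incident.get(v, set()))
        current.append(v)
        current_edges = merged
    if current:
        blocks.append(current)
    return blocks


order = [1, 2, 3, 4, 5, 6]
uncovered = [(1, 2), (1, 3), (1, 4), (5, 6)]
print(greedy_partition(order, uncovered, b=2))
```

In the example, node 1 alone already has density 3, so it forms a singleton block, while every other block has density at most 2.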
Our algorithm for covering the edges of makes use of this block partitioning with . For any two nodes , we use the notation (or simply ) to denote the unique simple path in the tree from to . The algorithm begins with an empty collection and then performs iterations where each iteration works as follows: Let be the set of uncovered edges (initially ). Then, we partition the nodes of with respect to and density parameter b. Finally, we search for cycles of length at most between the blocks. If such a cycle exists, we map it to a cycle in by connecting nodes within a block by the path in the tree . This way a cycle of length between the blocks translates to a cycle of length in the original graph . Denote the resulting collection by .
We note that the cycles might not be simple. This might happen if and only if the tree paths and intersect for some . Notice that if an edge appears more than once in a cycle, then it must be a tree edge. Thus, we can transform any non-simple cycle into a collection of simple cycles that cover all edges that appeared only once in (the formal procedure is given in Figure 7). Since these cycles are constructed to cover only non-tree edges, this transformation does not hurt the cover of . The formal description of the algorithm is given in Figure 6.
We move to the analysis of the algorithm, and show that it yields the desired cycle cover. That is, we show three things: that every cycle has length at most , that each edge is covered by at most cycles, and that each edge has at least one cycle covering it.
The bound of the cycle length follows directly from the construction. The cycles added to the collection are of the form , where each are paths in the tree and thus are of length at most . Notice that the simplification process of the cycles can only make the cycles shorter. Since we get that the cycle lengths are bounded by .
To bound the congestion of the cycle cover we exploit the structure of the partitioning, and the fact that each block in the partition has a low density. We begin by showing that by the post-order numbering, all nodes in a given subtree have a contiguous range of numbers. For every , let be the minimal number of a node in the subtree of rooted at . That is, and similarly let .
For every and for every it holds that (1) and (2) iff .
The proof is by induction on the depth of . For the base case, we consider the leaf nodes , and hence with -depth, the claim holds vacuously. Assume that the claim holds for nodes in level and consider now a node in level . Let be the children of ordered from left to right. By the post-order traversal, the root is the last vertex visited in and hence