On the Local Communication Complexity of Counting and Modular Arithmetic

by Bala Kalyanasundaram, et al.
Georgetown University

In standard number-in-hand multi-party communication complexity, performance is measured as the total number of bits transmitted globally in the network. In this paper, we study a variation called local communication complexity, in which performance is instead measured as the maximum number of bits sent or received at any one player. We focus on a simple model where n players, each with one input bit, execute a protocol by exchanging messages to compute a function on the n input bits. We ask what can and cannot be solved with a small local communication complexity in this setting. We begin by establishing a non-trivial lower bound on the local complexity of a specific function: we prove that counting the number of 1's among the first 17 input bits distributed among the participants requires a local complexity strictly greater than 1. We further investigate whether harder counting problems of this type can yield stronger lower bounds, providing a largely negative answer by showing that constant local complexity is sufficient to count the number of 1 bits over the entire input, and therefore to compute any symmetric function. In addition to counting, we show that both sorting and searching can be computed in constant local complexity. We then use the counting solution as a subroutine to demonstrate that constant local complexity is also sufficient to compute many standard modular arithmetic operations on two operands, including comparisons, addition, subtraction, multiplication, division, and exponentiation. Finally, we establish that the function GCD(x,y), where x and y are in the range [1,n], has local complexity O(1). Our work highlights both new techniques for proving lower bounds on this metric and the power of even a small amount of local communication.




1 Introduction

In the standard study of number-in-hand multi-party communication complexity, the input bits of a function are partitioned among two or more players. The goal is for these players to work together to compute the function over the input bits by transmitting information using network channels or a shared blackboard. In the deterministic context, the communication complexity of a given protocol is the total number of bits transmitted or recorded in the worst case over all possible inputs.

More recently, a natural variation was introduced that instead counts the maximum number of bits sent or received by any one player [9, 4]. Though different names have been given to this measure, we call it local (communication) complexity. In [9], the authors show connections between this metric and efficient distributed pattern recognition. Later, in [4], the authors describe this metric as a “fundamental area to explore,” noting that measures of global communication obscure the local load at individual players, a critical factor in settings where local processing is an important resource to conserve. They further underscore this importance by establishing formal theoretical connections between local complexity and multi-party secure computation, streaming algorithms, and circuit complexity.

Informally, this model consists of n players connected by network channels. Each player gets a bit as input. They exchange messages according to some deterministic protocol to compute a function on these input bits. Each player maintains a single receive buffer into which all received bits are placed, and they read from their buffers one bit at a time. Executions proceed asynchronously to prevent the implicit encoding of information into silent rounds.

The study of local (communication) complexity remains in its early stages. To the best of our knowledge, for example, there are no known non-trivial lower bounds on the local complexity of specific functions, and only a small number of problems have been analyzed from the perspective of identifying the number of bits that must be received locally. The study of what can and cannot be solved with small local complexity is further motivated by the connection between this model and linear-size circuits: a strong lower bound on the local complexity of a specific function would imply a strong lower bound on the circuit complexity of that function. (Footnote 1: In [4], the authors note that a sufficiently strong lower bound on the local complexity of a given function implies a circuit complexity lower bound that would represent a major breakthrough in the study of the latter field.)

Results: We start by tackling the open problem of producing a lower bound on local complexity for a specific function. We focus on the natural challenge of counting the number of 1’s among the input bits, as this seems intuitively difficult to accomplish when restricted to communicating only a very small number of bits at each player. For a given bit sequence , let , for , be the number of 1 bits in this substring of . We formalize the -counting function, denoted , as follows: Assume players numbered to , such that each player receives input bit . We say a protocol executed by these players solves -counting if at least one player outputs , and no player outputs something different.

In Section 4, we prove that every protocol that counts the 1’s among the first 17 input bits has a local complexity strictly greater than 1. Though it is intuitive that you cannot count too high with such a small complexity, we emphasize that establishing such a claim is less obvious. In this case, it requires a novel combination of indistinguishability and information theory techniques.

We next explore whether we can strengthen this lower bound by increasing the number of input bits to be counted. A natural conjecture, for example, is that local complexity is required to count the 1’s in the first input bits. In Section 5, we disprove this conjecture with a protocol that solves -counting with local complexity . We then show how to count the 1’s over the entire input, and therefore compute every symmetric function, with only constant local complexity. These solutions borrow techniques from circuit design, recursively applying distributed adder circuits to aggregate the sums in an efficient distributed manner.

We conclude this study of counting by considering the two related problems of sorting and search. Counting the number of bits provides a straightforward solution to the problem of sorting the input bits, as in a setting with binary input values, sorting reduces to arranging all the bits to precede the bits. With this in mind, we show how to transform a counting solution into a sorting solution at the cost of only one extra bit of local complexity. Less obvious is the problem of searching for the position of the th -bit among the inputs. Deploying a more involved strategy, we show how to solve this problem with constant complexity using our counting solution as a subroutine.
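The reduction from sorting to counting can be illustrated with a small centralized sketch. It is not the distributed protocol itself: it assumes the count of 1 bits is already known, assumes that 0 bits are placed before 1 bits, and uses a hypothetical helper name `sorted_bit_at`.

```python
def sorted_bit_at(position: int, n: int, ones: int) -> int:
    """Bit at 0-indexed `position` after sorting n bits that contain `ones` 1 bits.

    Assumption (not from the paper): ascending order, all 0 bits before all 1 bits.
    """
    return 1 if position >= n - ones else 0


bits = [1, 0, 1, 1, 0, 0, 1]
count = sum(bits)  # stand-in for the output of a counting protocol
result = [sorted_bit_at(i, len(bits), count) for i in range(len(bits))]
assert result == sorted(bits)
```

Once the count is known, each position's output bit depends only on that position and the count, which is why sorting adds little beyond counting in the binary setting.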

Having established that symmetric functions can be computed with constant local complexity, we next turn our attention to the important class of 2-symmetric functions. Recall, a function is called 2-symmetric (or bi-symmetric) if the binary input bits can be split into two groups such that the function value does not change when we permute inputs within each of the groups. Applying our previous counting strategy, we can compute and with constant local complexity, in the sense that one player in the first partition learns its group’s count is , and one player in the second partition learns its group’s count is . The question remains whether we can move forward from here to efficiently compute interesting functions of the form .
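As a small illustration of the definition, the following sketch checks 2-symmetry by brute force for a fixed split of the inputs. For binary inputs, invariance under permutations within each group is equivalent to the function depending only on the two group counts. The helper names and example functions are hypothetical and chosen only for illustration.

```python
from itertools import product


def is_two_symmetric(f, n: int, split: int) -> bool:
    """Return True if f depends only on the number of 1 bits in
    bits[:split] and in bits[split:], checked over all 2^n inputs."""
    seen = {}
    for bits in product((0, 1), repeat=n):
        key = (sum(bits[:split]), sum(bits[split:]))
        value = f(bits)
        # The first input seen for each pair of counts fixes the expected value.
        if seen.setdefault(key, value) != value:
            return False
    return True


def first_group_has_more_ones(bits):
    # Depends only on the two group counts, so it is 2-symmetric for split=3.
    return int(sum(bits[:3]) > sum(bits[3:]))


def first_bit(bits):
    # Depends on the position of a 1 within the first group, so it is not.
    return bits[0]
```

The brute-force check is exponential in n and is meant only to make the definition concrete on tiny inputs.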

In Section 6, we provide some positive answers to this question by describing strategies for computing comparisons, addition, subtraction, multiplication, division, and exponentiation on the two counts, all in modular arithmetic, with only a constant local communication complexity. (Footnote 2: Modularity is required as we are dealing with unary encoding of inputs and outputs.) These results underscore the surprising power of computation with low local complexity, and therefore the importance of our lower bound. They also provide useful efficient subroutines for other computations, as many interesting problems have algebraic representations. To underscore this final point, we show, perhaps surprisingly, that GCD(x,y), where x and y are in the range [1,n], can also be computed in constant local complexity.

Discussion: We emphasize that, to the best of our knowledge, this work is the first to establish non-trivial bounds for what can and cannot be solved in a distributed fashion with small local complexity. There are many different problems we might have considered. Our choice to study basic distributed counting and arithmetic tasks was motivated by two factors: (1) they are natural and simple to define; and (2) they yield sharp computability thresholds (e.g., what can be counted with versus bit of local complexity).

Our decision to focus on deterministic protocols is similarly motivated by the simplicity of starting with the cleanest possible problem and model definitions. We note that the power of randomness for the problems studied here is not obvious, especially when considering the careful deterministic structuring of communication patterns often deployed by constant local complexity solutions. Non-determinism, by contrast, can be shown to be strictly stronger than determinism. (Footnote 3: It is straightforward to compute every 2-symmetric function with constant local complexity using a non-deterministic protocol. As mentioned in Section 6, however, a straightforward counting argument establishes that there exist 2-symmetric functions whose local complexity is not constant for deterministic protocols.)

Of course, our work, combined with prior results [9, 4], still only scratches the surface when it comes to the deep exploration of local complexity. Our goal here is not just to investigate this specific set of problems, but to help instigate going forward the broader embrace of this intriguing and fundamental metric by the distributed algorithm theory community.

2 Related Work

The local communication metric studied here was introduced in [9, 4]. Our paper is perhaps best understood as a follow-up to [4], which motivated this model, but largely focused on problems with large local complexity, leaving small local complexity as a topic for future exploration (a challenge we take on here). Our formal model definition (Section 3) is somewhat more detailed than in [4], as such formality was needed to prove concrete lower bounds on specific functions. Below we summarize existing work on communication complexity that predates and informs the work here and in [9, 4] on local complexity.

Naturally, local communication complexity can be understood within the lineage of standard (global) communication complexity results, as it shares a commitment to minimizing the exact number of bits required for computing functions with inputs spread between players. The study of (global) communication complexity started with Yao [14] in 1979. The main measure in this context is the total number of bits exchanged between two parties computing a function on their inputs. Later, Chandra, Furst, and Lipton [5] introduced a multi-party communication complexity setting which is often referred to as the “number on the forehead” model: there are now potentially more than two players; each player knows all the other players’ values, but not its own; and they communicate by writing on a shared blackboard. The complexity is the total number of bits displayed on the board. Babai, Nisan and Szegedy [1], among others, subsequently developed numerous bounds for this model (see the book by Kushilevitz and Nisan [11] for a thorough review of this period). Closer to our model is the subsequent work on so-called “number in hand” inputs, in which each player only knows its own input bits. Numerous papers consider multi-party number-in-hand computation, with several different communication assumptions: namely, the message passing, blackboard, and coordinator models. All of them measure the total number of bits sent/written in the network; e.g., [5, 1, 12, 6, 8, 13, 2].

Also relevant are synchronous models that similarly explore the amount of communication required to compute a function on data distributed among multiple servers. In recent years, for example, the massively parallel communication (MPC) model [3] has received increased attention. Inspired by Map Reduce/Hadoop-style systems (see [7]), this model typically bounds the amount of incoming communication at a given server in a given round by its local storage capacity. The goal is to find good trade-offs between rounds and storage required to compute given classes of functions (much of the early work in the MPC model, for example, focused on conjunctive queries on data [3, 10]). Closely related to these models is the congested clique (e.g., [12]), in which data is distributed among servers in a fully-connected network, and communication bounds are now placed on channels.

The above summary only samples the many papers that study the communication required for the synchronous computation of distributed functions. Though related in spirit to our work on local communication complexity, the results do not directly apply to our setting. In these synchronous models, the goal is to reduce the number of rounds required to compute a function, whereas we minimize the exact number of bits sent or received at every player.

3 Model

Here we formalize our multi-party communication model and the local complexity metric we study in this setting. Our definitions are more formal than in recent work on this metric [4] as such specificity is needed to study lower bounds for concrete functions. After introducing our formal definitions, we briefly discuss the specific choices we made in attempting to nail down a model that balanced simplicity, tractability, and fidelity to existing work.

In more detail, we model a collection of deterministic computational processes (called both players and nodes in the following) executing in a variation of the standard asynchronous message passing model modified to better suit the study of communication complexity. We do not model messages arriving over discrete channels. Instead, the contents of received messages are appended to a single string of received information that the receiver processes one bit at a time, allowing for fine-grained control of exactly how many bits are consumed.

Communication and Computation: Formally, we consider a set of n nodes in a fully-connected network topology. We assume each node maintains a receive string, which stores the bits of incoming messages. This string is initialized to be empty.

For a given node, if its receive string is not empty then the scheduler must eventually remove the first bit from this string and schedule a receive event at the node. As in the standard asynchronous message passing model, when a receive event is scheduled, the node can update its state given the new bit, and send new messages to other nodes. In more detail, if a node executes a send command during its processing of a receive event, for some message and destination, then the bit string of the message is appended to the end of the destination’s receive string. We treat the execution of the steps associated with a receive event, including any send commands, and the corresponding appending of sent message bits to other destinations’ receive strings, as one atomic step. Notably, this prevents bits in different messages from interleaving at a common receiver.

Also as in the standard asynchronous message passing model, we assume that each node can define an initialization event, which, like a receive event, can include send commands. For each node, the scheduler must eventually schedule the initialization event, and it must do this before scheduling any receive events. That is, each node gets a chance to initialize itself before it starts processing incoming bits.

In this paper, we study algorithms that assume each player in the network is provided a single bit as input. Each node is also able to invoke an output command as part of its initialization and/or receive event step computation. A problem in this setting can therefore be understood as a function that maps each possible binary input assignment to an integer in binary. We say nodes in our model solve or compute such a function if, provided an input assignment, at least one node outputs the value of the function on that assignment, and no node outputs anything different.

Local Communication Complexity: For a given execution and node, consider the total number of bits sent and the total number of bits received by the node in the execution. We define the local communication complexity of the node in the execution as the maximum of these two values. We then define the local communication complexity of the entire execution as the maximum of this quantity over all nodes. For a deterministic protocol, we define its local communication complexity to be the maximum value defined over every execution of the protocol. Finally, for a given function, we define the local communication complexity of the function to be the minimum over every protocol that correctly computes it.
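The four levels of this definition can be written compactly as follows. The symbols are assumptions introduced for this summary, since the original notation did not survive extraction: $\alpha$ is an execution, $u$ a node, $P$ a deterministic protocol, $f$ a function, and $\mathrm{sent}_\alpha(u)$ and $\mathrm{recv}_\alpha(u)$ are the bits sent and received by $u$ in $\alpha$.

```latex
\begin{align*}
\mathrm{lcc}_\alpha(u) &= \max\{\mathrm{sent}_\alpha(u),\ \mathrm{recv}_\alpha(u)\},\\
\mathrm{lcc}(\alpha)   &= \max_{u}\ \mathrm{lcc}_\alpha(u),\\
\mathrm{lcc}(P)        &= \max_{\alpha \text{ an execution of } P}\ \mathrm{lcc}(\alpha),\\
\mathrm{lcc}(f)        &= \min_{P \text{ computes } f}\ \mathrm{lcc}(P).
\end{align*}
```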

For the sake of concision, we often use the slightly abbreviated phrase local complexity to refer to the local communication complexity of a protocol or function.
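To make the model concrete, here is a minimal simulation sketch under simplifying assumptions. The class names `Network` and `ChainOr` are hypothetical, the scheduler is one particular fair schedule (round robin), and received bits are counted when they are appended to a receive string. The example protocol, a chain that computes the OR of the inputs, is an illustration and is not taken from the paper.

```python
from collections import deque


class Network:
    """Toy simulator: one receive string per node, consumed one bit at a time."""

    def __init__(self, inputs, protocol):
        self.inputs = list(inputs)
        self.n = len(self.inputs)
        self.protocol = protocol
        self.buffers = [deque() for _ in range(self.n)]
        self.sent = [0] * self.n
        self.received = [0] * self.n
        self.outputs = {}

    def send(self, src, dst, bits):
        # A whole message is appended atomically, so messages never interleave.
        self.buffers[dst].extend(bits)
        self.sent[src] += len(bits)
        self.received[dst] += len(bits)  # assumption: count bits on delivery

    def output(self, node, value):
        self.outputs[node] = value

    def run(self):
        # Every initialization event is scheduled before any receive event.
        for u in range(self.n):
            self.protocol.init(self, u)
        progress = True
        while progress:  # round robin over nodes with pending bits
            progress = False
            for u in range(self.n):
                if self.buffers[u]:
                    bit = self.buffers[u].popleft()
                    self.protocol.recv(self, u, bit)
                    progress = True

    def local_complexity(self):
        # Max over nodes of max(bits sent, bits received) in this execution.
        return max(max(s, r) for s, r in zip(self.sent, self.received))


class ChainOr:
    """Node 0 initiates; each node forwards the OR of the received bit and its input."""

    def init(self, net, u):
        if u == 0:
            if net.n == 1:
                net.output(0, net.inputs[0])
            else:
                net.send(0, 1, [net.inputs[0]])

    def recv(self, net, u, bit):
        value = bit | net.inputs[u]
        if u == net.n - 1:
            net.output(u, value)
        else:
            net.send(u, u + 1, [value])


net = Network([0, 1, 0, 0], ChainOr())
net.run()
print(net.outputs, net.local_complexity())
```

In this run every node sends at most one bit and receives at most one bit, so the execution has local complexity 1. The single-chain shape is the same structure the lower bound argument of Section 4 reasons about.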

Discussion: We opted for an asynchronous communication model as round numbers can leak information not captured by our complexity metric. We also avoided distinct channels for each sender/receiver pair, as these channels provide for free the identity of a given bit’s sender. Because we focus in this paper on protocols with very small local complexity, such leaks might end up being significant. Our solution to this issue was to introduce a common receive buffer at each receiver, to which incoming messages from all potential senders are appended. In this setup, for example, if a sender wants to deliver a single bit to a receiver it can do so, and this bit will be appended to the receiver’s buffer, but the receiver learns nothing about the source of the bit. If the sender wants the receiver to know who sent the bit, it has to actually send the bits required to encode its id. Another solution to avoid pairwise channels would have been to deploy a central coordinator through which all bits are sent (an approach sometimes deployed in the existing global communication complexity literature), but this centralization seemed incompatible with our focus on the local number of bits sent and received at each individual player.

We also note that several similar definitions of our local complexity metric are possible. We define local complexity at a given player in a given execution as the maximum of the number of bits it sent and the number of bits it received. One alternative would be to focus only on the bits sent when measuring local complexity. This trivially allows, however, all functions to be computed with a complexity of 1 by having all players send their input bit to a single pre-determined leader who locally computes the function.
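The gap between the two metrics for this leader-based protocol can be tallied directly. The sketch below is only an accounting exercise with a hypothetical function name. It assumes the leader does not send its own bit to itself.

```python
def leader_protocol_costs(n: int, leader: int = 0):
    """Per-player (sent, received) bit counts when every non-leader
    player sends its single input bit to the leader."""
    sent = [0 if u == leader else 1 for u in range(n)]
    received = [n - 1 if u == leader else 0 for u in range(n)]
    return sent, received


sent, received = leader_protocol_costs(5)
sent_only_metric = max(sent)
max_metric = max(max(s, r) for s, r in zip(sent, received))
print(sent_only_metric, max_metric)
```

Under the sent-only metric every player is charged at most one bit, while under the max-of-sent-and-received metric the leader is charged n - 1 bits, which is why the sent-only definition trivializes the problem.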

Another alternative is to measure only the bits received. This metric also seems to provide too much power to the players. It is not hard to show, for example, that it enables the computation of every bi-symmetric function with constant local complexity. The basic idea is to deploy the counting routines we present later in this paper, which enable one player in the first partition to learn the count of 1 bits in the first partition, and one player in the second partition to learn the count of 1 bits in the second partition. A close look at the routines reveals that the player that learns the count depends on the count itself (roughly speaking, the value of the count determines the position within the partition of the player that learns it). The player that learns the count from the first partition can now send a bit to every player in the second partition for which that player’s corresponding count would cause the bi-symmetric function to evaluate to 1.

Given these observations, our use of the maximum of the bits sent and the bits received seemed the right choice to capture our intuitive understanding of local complexity, while avoiding sweeping solutions to large classes of problems. More generally, we emphasize that there is rarely an obvious best way to model and measure multi-party communication complexity, as evidenced by the variety of definitions in the existing literature. And as we have learned, all decisions in such modelling involve trade-offs. We did our best here to arrive at a natural and straightforward definition that captures the local communication we wish to study while sidestepping both trivializing assumptions and artificial difficulties.

4 Counting Lower Bound

A natural starting place to study what can and cannot be solved with small local communication complexity is the fundamental task of counting. In more detail, we study the local communication complexity of the counting function. Formally, we seek to identify a parameter for the counting function (defined in the introduction) such that the local complexity of the function is strictly greater than 1. The core result of this section is a lower bound establishing that, for any sufficiently large network size, counting the 1’s among the first 17 input bits requires a local complexity strictly greater than 1. We emphasize that this is the first known lower bound on local complexity for a concrete function (prior work [4] contains only existential bounds based on counting arguments).

4.1 Proof Summary

At a high level, our proof strategy begins by focusing on the local complexity of the related threshold detection problem, which requires the nodes to determine if at least a given threshold number of the input bits are 1. We prove that any protocol that solves this problem with local complexity 1 is highly constrained in its operation, generating executions that can be understood as a bit traveling in a chain, from one node to the next, with the final node making a decision.

Given such a structure, we apply a combinatorial argument to show that, for a sufficiently large constant threshold, we can construct two execution chain prefixes such that: (1) the correct output is different for each chain (i.e., one chain has enough 1’s to exceed the threshold while the other does not); and (2) the node at the end of both prefixes sends the same bit to the next link, obfuscating the actual contents of its predecessors. The existence of these two prefixes can be deployed to generate an incorrect answer in at least one of the two cases, contradicting the assumption that the algorithm correctly solves threshold detection for this parameter.

Finally, once we bound threshold detection, we then use a reduction argument to obtain our final bound for the more natural counting problem.

4.2 Bounding Threshold Detection

We begin by proving a lower bound on the local complexity of the threshold detection boolean function, which evaluates to 1 if and only if at least a threshold number of the input bits are 1. Formally, we use to indicate this function for a given pair of parameters and , and define it as:
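One way to write this definition is given below. The symbol names are assumptions, since the original displayed equation did not survive extraction: $n$ is the number of input bits, $k$ is the threshold, $x_1,\dots,x_n$ are the input bits, and $T_{n,k}$ is a placeholder name for the function.

```latex
\[
T_{n,k}(x_1,\dots,x_n) =
\begin{cases}
1 & \text{if } \sum_{i=1}^{n} x_i \ge k,\\[2pt]
0 & \text{otherwise.}
\end{cases}
\]
```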

Our goal is to prove the following theorem, which establishes that, for sufficiently large parameter values, threshold detection requires a local complexity greater than 1.

Theorem 4.1.

Fix some network size , threshold . It follows: .

Before proceeding to the main proof of this theorem, we establish some useful preliminaries that formalize the constraints suffered by any threshold detection algorithm with a minimum local complexity of 1. In the following, we use the notation to represent the players. We say that a player is an initiator with respect to a given input bit if its initialization code for that input bit has it transmit a bit before receiving any bits. A key property of a minimal local complexity environment is that a correct protocol can only ever have one initiator:

Lemma 4.2.

Fix some , , , and protocol that computes with . There exists a player such that for every input assignment, is the only initiator among the players.


Proof. We first argue that there must be at least one initiator. Assume for contradiction that for some input assignment there are no initiators. It follows that no players send or receive any bits. Because we assume the protocol correctly computes the threshold function, some player must output the correct answer without ever having received any bits. Fix a player that outputs. If it outputs 1, meaning that at least the threshold number of input bits are set to 1 in our fixed assignment, it will do the same even when we set all other input bits to 0, leading to an incorrect output. Symmetrically, if it outputs 0, it will do the same when we set all other input bits to 1, again leading to an incorrect output. This contradicts the correctness of the protocol.

Moving forward, therefore, we consider the case in which there is more than one initiator. Once we have established that there cannot be more than one initiator, we will show that this one initiator must be the same for all input assignments. Assume for contradiction that there exists some input assignment for which the protocol has more than one initiator. Fix two such initiators. Assume that the initialization code of the first initiator, for its input bit, has it send a bit to some player, and that the initialization code of the second initiator, for its input bit, has it send a bit to some player. Using these observations on the behavior of the two initiators, we will identify an input assignment that we can leverage to derive a contradiction.

Fix and . Fix . Fix input values for the remaining players such that the total number of bits in the assignment is exactly (because we assume , this is always possible).

Consider an execution of the protocol with this assignment. Because this is an asynchronous system, an execution for a given input can depend on the scheduling of send and receive events. Assume a round robin scheduler that proceeds in rounds as follows: During the first round, it visits each player in order, scheduling each player to complete its initialization transmission (if any). In each subsequent round, it visits each player in order, scheduling for each the processing of bits transmitted in the previous round, and then completing any new transmissions these received bits generate. Call this execution .

In this execution we can break up communication into what we call chains, which capture the causal relationship of sends and receives beginning with a given root player. For example, if we fix as a root, and note that sends a bit to , which then enables to send a bit to some , and so on, we note that there is a chain rooted at that begins

Moving on, we note that by construction: . It follows that at least one player must output in . Fix one such player. We argue that this player cannot be in both of the chains rooted at the two initiators. If this were the case, then at some point, as we followed the two chains from their roots to this player, some node would have to be visited in both. This would require that node to receive at least 2 bits, which is not allowed in a protocol with a local complexity of 1.

Without loss of generality, assume that the outputting player is not in the chain rooted at the second initiator (the other case is symmetric). Consider a modified execution in which: (1) the input bit of the second initiator is flipped; and (2) we replace the round robin scheduler with one that first schedules the nodes in the communication chain from the first initiator to the outputting player, in order, leading to its output. After this, we can revert to the round robin scheduler strategy to ensure all pending players get chances to take steps.

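By construction, the modified execution is indistinguishable from the original execution with respect to the outputting player. Therefore, this player will output the same value in both executions. Because we flipped the input value of the second initiator in the modified execution, this output is wrong. This contradicts the assumption that the protocol always correctly computes the threshold function.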

We have now established that every input assignment has exactly one initiator. We now show that this initiator is the same for every assignment. To do so, assume for contradiction that one assignment has one player as its single initiator, and a second assignment has a different player as its single initiator. Consider a third assignment that is defined the same as the first, with the exception that the second player is given the same bit as in the second assignment. We have now identified an assignment with two initiators. We argued above, however, that every assignment has at most one initiator: a contradiction. ∎

The above lemma established that executions of protocols for threshold functions with minimum local complexity have a single initiator, meaning they can be described as a sequence of player/message pairs. We provide some notation to formalize this idea:

Definition 4.3.

Fix a protocol with a single initiator and a local complexity of . We can describe an execution prefix of this protocol containing the first transmissions with a single chain of the form: where for each , is the player to receive a bit, and the bit it receives is . If , then describes the bit sent in response to receiving . Define . Because is an initiator, by convention we set . We use the notation , for some , to indicate the concatenation of step to the end of chain .

When considering a chain that describes an execution prefix, we can label each step in the chain with a value pair, whose first value is the number of players involved in the chain up to and including the player at that step, and whose second value is the number of these players with an input bit of 1. The value pair for a given step captures, in some sense, a possible information scenario that could generate that step.
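The labeling can be sketched as follows. This is a minimal illustration with hypothetical names. It assumes the chain is given as the list of players in order of participation, starting with the initiator, and it counts each player once even if the initiator reappears later in the chain.

```python
def value_pairs(chain_players, inputs):
    """Label each chain step with (players involved so far,
    number of those players whose input bit is 1)."""
    involved = []
    pairs = []
    for player in chain_players:
        if player not in involved:
            involved.append(player)
        ones = sum(inputs[p] for p in involved)
        pairs.append((len(involved), ones))
    return pairs


inputs = {0: 1, 1: 0, 2: 0, 3: 1}
print(value_pairs([2, 0, 3], inputs))
```

For the chain that visits players 2, 0, and 3 with the inputs above, the steps are labeled (1, 0), (2, 1), and (3, 2).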

When considering the value pairs for a chain of an execution prefix of a protocol computing a threshold function, we say a pair of numbers is valid if two things are true: it is well-formed, in the sense that the observed values could show up as a value pair for a step in a chain (e.g., is not greater than ); and it is bivalent, in that the values are compatible with an output of either 0 or 1 as the chain extends, depending on the details of the extension. Formally:

Definition 4.4.

Suppose the function under consideration is . We say that a pair is valid with respect to this function if the values are:

  1. Well-Formed: and .

  2. Bivalent: and .
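The exact inequalities of this definition did not survive extraction, so the following is a sketch under one plausible reading, not the paper's stated conditions. It assumes a pair (x, y) records x players in the chain of which y hold a 1, with n players in total and threshold k. Well-formedness is read as 1 <= x <= n and 0 <= y <= x. Bivalence is read as y < k (output 0 is still possible if all remaining inputs are 0) together with y + (n - x) >= k (output 1 is still possible if all remaining inputs are 1).

```python
def is_valid(x: int, y: int, n: int, k: int) -> bool:
    """Validity of a value pair under the ASSUMED reading described above."""
    well_formed = 1 <= x <= n and 0 <= y <= x
    bivalent = y < k and y + (n - x) >= k
    return well_formed and bivalent


def valid_pairs(n: int, k: int):
    """Enumerate all pairs that are valid under the assumed reading."""
    return [(x, y)
            for x in range(1, n + 1)
            for y in range(x + 1)
            if is_valid(x, y, n, k)]


print(valid_pairs(4, 2))
```

Enumerating pairs this way is useful only for building intuition about how many distinct scenarios a chain step may need to distinguish. The counts depend on the assumed inequalities.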

When considering chains for an execution of a protocol that computes a given threshold function with a local complexity of 1, we might ask what properties of input bit assignments could possibly lead to a given step. We formalize this question by defining a set that captures all value pairs compatible with a given step:

Definition 4.5.

Given a protocol that computes a function with local complexity , a player , , and bit , we define the set to contain every pair that satisfies the following properties;

  1. is valid with respect to , and

  2. input assignment for that induces a chain that includes a step labeled with value pair .

Before tackling our main theorem, we have one last useful result to establish: that every valid pair for a given and value can show up in some chain.

Lemma 4.6.

Fix a protocol that computes a function with local complexity . Let be any valid pair for . There exists a player and bit , such that the set , defined with respect to , includes .


Proof. Fix a , , , and as specified by the lemma statement. By Lemma 4.2, every execution of has a single initiator and can be described by a chain. We will create such a chain step by step, setting the input bit for each player in the chain only after they appear in the chain receiving their bit. In more detail, for the first players that show up in the chain, we set their input to . For the remaining players, we set their input bits to . Notice that we can set these input bits after a player shows up in the chain because, in a setting with local complexity 1, after a player receives a bit, if it cannot output, it must send a bit to keep the execution going, regardless of its input. Its input bit can determine which player receives its transmission, which is why we have to build this assignment dynamically as the chain extends.

A straightforward contradiction argument establishes that none of the first players in this chain can avoid transmitting, and therefore extending the chain. This follows because, as constructed, this chain remains bivalent until at least player , in the sense that at every step, there exists an assignment of input bits to the players that have not yet participated that makes the correct output, and an assignment that makes the correct output.

Let be step in this chain. By construction: . ∎

We now have all the pieces required to tackle the proof of Theorem 4.1 by deploying a novel combinatorial argument. We begin by fixing and . We show that every valid must show up in at least one set. Because there are fewer such sets than valid pairs, the pigeonhole principle tells us that some must have multiple pairs. (It is here that the specific values of and matter, as they dictate the number of possible valid pairs.)

At a high-level, that means when receives bit in a chain, there are multiple possibilities regarding how many one bits appear in the chain leading up to this step. Because cannot distinguish between these value pairs we can, with care, craft an execution extension in which the protocol outputs the wrong value. In making this argument, extra mechanisms are required to deal with the possibility that the first player in the chain ends up the last player as well (this is possible because an initiator begins an execution without having yet received a bit). See Appendix for the proof of Theorem  4.1.

Once we have established our impossibility for and , we apply a reduction argument to generalize the results for larger values, by showing such solutions could be used to solve our original fixed-value case. This argument leverages the ability of the players to locally simulate additional players without expending extra communication bits.


Proof (of Theorem 4.1). Assume for contradiction that there exists a protocol that computes the -threshold detection function, denoted , with a local complexity of . We will prove that this protocol must sometimes output the wrong answer, contradicting the assumption that it is correct. We will then generalize this argument to larger and values using a reduction argument.

Let be the set of valid value pairs for . Simple counting establishes that . By Lemma 4.6, every must show up in some pair set . Because there are possible players and possible bits, there are possible pair sets. The pigeonhole principle therefore establishes that there exists a player and bit such that .

Going forward, we will use this target pair to create our contradiction. Consider the values in . By the definition of , each in this set is associated with at least one chain that ends with step , includes players, exactly of which have input bit . Call these source chains. Label each pair in with one of its source chains. Further label each of these source chains with a compatible input value assignment for the players in the chain (i.e., what is the input assignment to these players that generates the chain; choosing one arbitrarily if more than one assignment would create the same chain).

Because there are at least three pairs in , there must be two such pairs, , , such that the initiators in their respective source chains have the same input bit in their compatible value assignments. Notice, by Lemma 4.2, each of these source chains starts with the same initiator. To simplify notation, let us call this initiator for the purposes of our proof. As will become clear, it is important that has the same input bit in both source chains, as it is possible that the full chain we eventually consider will loop back to .

Moving forward, we will use to reference the relevant source chain associated with , and to be the relevant compatible input assignment. We define and analogously but now with respect to . Recall that by construction is assigned the same bit in and .

We consider two cases concerning the players that show up in chains and :

Case 1: .

By definition, both and are valid value pairs. It follows that both are bivalent, meaning that the input bits of the players that have sent or received bits so far are not sufficient to determine the value of the function. A straightforward contradiction argument establishes that no player in either chain can output until the chain extends further, as if any player outputs , the bits of the players not in the chain can be set to to make that answer incorrect, and if any player outputs , the remaining bits can be set to . Therefore, when we get to step in both chains, the output has not yet been determined.

Because , there must be a player in one set but not the other. Without loss of generality, say is only in . Fix any possible extension of (where “possible” means there is an input assignment to the players in such that when combined with the fixed assignments for players in , describes the steps of the resulting execution).

The key observation is that it must be the case that . This follows because if extends then it also extends , as both and end with the same step: receiving . However, cannot occur because it features both in and , meaning that this chain would require the same non-initiator player to receive bits, which it cannot given our assumption of a local complexity of . (We know that because it only shows up in one of the two chains, and , whereas is the single initiator in both.)

This observation creates an obstacle for the correctness of our protocol. We have just established that every way we can extend must omit . Consider the extension that occurs when we fix the input bits of all players that are not in and not , such that the total number of bits is . The execution corresponding to must eventually output. It does so, however, without sending a bit. If this execution outputs , then it is incorrect in the case where has bit , and if it outputs , then it is incorrect in the case where has bit .

Case 2: .

If then it follows that . Because , it also follows that . That is, the number of bits encountered before receives is different in versus . Player , of course, receives the same bit in both cases, so it must proceed without knowing if the count is or . The only player in these chains that can possibly receive another bit is the common initiator , as only initiators send a bit before receiving any bit. Since this initiator has the same input bit in both and (here is why it was important that we earlier identified two chains that satisfied this property), our protocol must eventually output without ever learning the true count of bits in the prefix leading up to ’s step.

To formalize this intuitive trouble, assume without loss of generality that . Because is valid, we know . Consider the extension that occurs when we set exactly of the players outside to have input bit . The input assignment corresponding to includes exactly bits, therefore some step in must correspond to a player outputting .

If we consider this same input assignment for the players outside of , we will get the same extension , as the last step in is the same as the last step in . The set is disjoint from the set with the possible exception of , as it is possible that the initiator ends the chain it started. By definition, however, all players in have the same input bit in the assignments corresponding to and (recall, we selected and specifically because their corresponding assignments give the same bit), and they receive and send the same bits in both, so the player in that outputs in the execution corresponding to also outputs in the execution corresponding to . This latter output, however, is incorrect, as the number of bits in the corresponding input assignment is strictly less than .

We have just established that any fixed protocol that attempts to compute with local complexity can be induced to output the wrong answer. This contradicts our assumption that such a protocol exists. We now use this result to generalize our impossibility to larger and values.

Fix any and values where , as specified by the theorem. Assume for contradiction we have a protocol that computes for these values with local complexity . We will now define a protocol , defined for players, that simulates in a distributed fashion to compute with a local complexity of —contradicting our above result that no such protocol exists.

In more detail, protocol has the players in collectively simulate the players in , such that the first players in start with input bit , and the rest (if any remain) with input bit . Our assumption that ensures that there are at least players in to initialize with a bit (as implies that , as needed). Notice, the output in this simulated setup is if and only if at least of the players in have input bit . Therefore, if we can correctly simulate in this setting we can compute .

We are left then to show how to correctly implement this simulation. We can assume without loss of generality that the single initiator in (as established by Lemma 4.2). It begins by running as specified. If the protocol has it send a bit to a player in , then it can send the bit as specified. If it is instead instructed to send a bit to a player in , it simulates locally that player receiving the bit and simulates that player’s subsequent send. It continues this simulation until a bit is sent to a player in , at which point the bit is actually sent to that player by . Continuing in this manner, can simulate running on all players.

Two properties support the correctness of this simulation. First, each player in can receive at most one message, so each player only needs to be simulated once, eliminating the need for multiple players in to coordinate the simulation of a single player. Second, given a chain that starts with a player , moves through one or more players in , and then ends at a player in , it is valid for to send a bit directly to , as saved the bit it was instructed to send to a player by (as it just locally simulated this communication), and the local complexity model does not convey the source of a received bit, so cannot distinguish from which player an incoming bit was sent. ∎

4.3 Generalizing from Threshold Detection to Counting

We now leverage our result on threshold detection to derive a lower bound on any protocol that solves counting. The reduction here is similar in construction to the argument deployed in the preceding proof to generalize the -threshold detection result to larger values of .

Theorem 4.7.

For every , it follows: .


Assume for contradiction that there exists a protocol that solves -counting with local complexity for some . We can use to define a new protocol that solves -counting also with local complexity . To do so, we deploy the same strategy from the reduction argument deployed in the proof of Theorem 4.1, and have the players participating in protocol execute , locally simulating the extra players expected by . They can simulate these extra players all starting with input bit .

We now have a protocol that solves -counting for some . We can use to compute the -threshold detection function in a network of size : run ; if has one of the first players output , then that same player outputs for the threshold detection result; otherwise, if a player beyond position outputs in , that same player outputs for the threshold detection result.

By Theorem 4.1, however, -threshold detection cannot be computed for with local complexity : a contradiction. ∎

5 Counting Upper Bounds

In the previous section, we proved that you cannot count to with only a single bit of local communication complexity. Here we explore how much additional complexity is required to count to higher values. We divide this investigation into three questions: (1) what is the largest such that we can solve -counting with a local complexity of ?; (2) what local complexity is required to solve -counting?; and (3) what other problems can be easily solved with low local complexity using these counting strategies as a subroutine?

We tackle the second question first, describing how to solve -counting with constant local complexity. This disproves the reasonable conjecture that the local complexity of -counting must grow as a function of (e.g., ). We then turn our attention to the question of how high we can count with a local complexity of only . Our solution, which deploys ideas from our -counting protocol in a more complex construction, solves -counting, demonstrating a stark discontinuity between and bits of local complexity. Finally, we establish two corollaries that deploy these strategies to solve both sorting and search with constant complexity.

5.1 Solving (n,n)-Counting with Constant Local Complexity

We begin by considering -counting, which we prove can be solved with local complexity . As mentioned, this disproves the natural conjecture that the local complexity of -counting must grow with . For ease of presentation, we begin with a strategy that assumes is a power of . This result can be generalized to an arbitrary at the cost of a more involved protocol.

We formalize this result below in Theorem 5.1. Its proof depends on the construction of a counting protocol that carefully minimizes the number of bits each individual node sends or receives. Given the importance of this strategy to all the results that follow in this section, we begin with a high-level summary of our protocol before proceeding with its formal description and analysis in the proof of Theorem 5.1.

Protocol Summary: At a high-level, the protocol that establishes Theorem 5.1 operates in two phases. During the first phase, a count of the number of bits is aggregated into a distributed counter in which nodes each hold a single counter bit. In slightly more detail, we start by partitioning the nodes into groups of constant size, and for each group aggregating the count of their bits into a distributed counter of constant size. We then begin repeatedly pairing up counters and having them sum up their values in a distributed manner using strategies derived from arithmetic circuit design, allowing them to calculate a sum without any single node involved in these counters needing to send or receive more than a constant number of bits.

At the end of the first phase, we have aggregated the total count into a distributed counter of size . In the second phase, the nodes that hold the counter bits help direct a descent through a binary tree with one leaf for each possible count. The goal is to arrive at the leaf corresponding to the value stored in the counter, consolidating knowledge of the entire count at a single node. To do so, each bit of the counter informs the nodes implementing its corresponding level of the tree of its counter bit value, propagating it in a chain of transmissions to prevent too much local communication. Therefore, when the tree descent arrives at each level, the specific node at which it arrives knows into which sub-tree to advance the descent.

The proof that follows details each of the steps that makes up these phases, carefully accounting for the exact number of bits sent and received in their implementation.

Formal Result: We now show that when implemented and analyzed carefully, the local complexity of the protocol summarized above is no more than .

Theorem 5.1.

For every and there exists a protocol that solves -counting with a local communication complexity of . This protocol can be used to compute any symmetric boolean function with the same local complexity.


We describe the repeated bottom-up binary addition process that stores the count in binary. Now imagine a complete binary tree with groups as leaves. The protocol proceeds one level at a time, starting from the leaf level, until it reaches the root. At each level, the protocol maintains the number of ’s in the sub-tree rooted at that level.

At the leaf level, we group 16 leaves at a time. There are leaves. Starting from the first group of 16, run a simple and naive count protocol to count the number of ’s and once the count is complete a message is sent to the first member of the next leaf to start the process. Within a group of 16, run a four-bit protocol from first member of the group to the 16th/last member of the group to count the number of ’s in a linear chain fashion. At the end, the sum is represented as 5 bits. The last member retains the least significant bit of the sum and sends one bit each to members 12, 13, 14, 15 such that these bit values put together form the sum in binary. As explained before, the 16th member of the group then sends a bit to the first member of the next leaf to start the counting process. In the end, the last leaf sends a bit to start the addition process at the next level of the tree . The recipient of the message is predetermined and will become clear when the processing of the next level is explained. At the end of the leaf-level processing, each member sends and receives at most 5 bits each.
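The leaf-level step can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each chain message is the running count encoded in four bits, that the first member receives a one-bit start signal, and that the last member spends one bit on the start signal for the next group. The name `count_group` and the bookkeeping lists are hypothetical.

```python
def count_group(bits):
    """Simulate the linear-chain count inside one group of 16 members.

    Assumption: each chain message carries the running count in 4 bits.
    Returns (total, sum_bits_lsb_first, max_bits_sent, max_bits_received).
    """
    assert len(bits) == 16 and all(b in (0, 1) for b in bits)
    sent = [0] * 16
    received = [0] * 16
    received[0] += 1  # start signal from the previous group (assumed 1 bit)

    running = 0
    for i in range(15):
        running += bits[i]      # at most 15 here, so it fits in 4 bits
        sent[i] += 4            # member i forwards the running count
        received[i + 1] += 4    # member i+1 receives it
    total = running + bits[15]  # between 0 and 16, so 5 bits are needed

    sum_bits = [(total >> j) & 1 for j in range(5)]  # least significant first
    # The last member keeps the least significant bit and sends the other
    # four bits to members 12..15 (1-indexed), i.e. list indices 11..14.
    for idx in (11, 12, 13, 14):
        sent[15] += 1
        received[idx] += 1
    sent[15] += 1  # start signal to the next group (assumed 1 bit)
    return total, sum_bits, max(sent), max(received)
```

Under these assumptions no member sends or receives more than 5 bits, which matches the bound stated for the leaf level.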

We now describe a bottom-up counting protocol that computes the sum of these counts in binary using a simple addition with carry and stores the results of intermediate sums in binary. The bits of the resultant binary number are stored distributively where every member stores a bit of the binary sum. This significantly reduces the local complexity. In order to show this, we keep track of the number of members available and the number of members used thus far in the process. Suppose we have computed the sum of s in groups of size . We will show how to compute the sum for a group of size . In each group of size , the binary bits of the number of s are kept in distinct locations. Let be the number of locations used exactly once for a group of size . Since exactly new locations are needed for the group of size , the recurrence relation for is and . Solving this recurrence relation, we get . Since , for all , each location stores the sum at most once and the locations can be fully pre-specified for each iteration. Each member has full knowledge of this participation.

In the computation of the sum for a group of size , the two sub-groups of size each have their sum stored in locations each. There are two phases in this computation. In the first phase, called the deposit phase, the bits to be added are deposited into a new location each. In the second phase, called the carry-add phase, the bits are added and the carry is rippled; the carry information, the third bit, is the signal to perform the computation.

The deposit phase for a group of size begins after the completion of the carry-add phase for all sub-groups of size and it is initiated when a bit is received from the player who completes the carry-add phase for the last group of size . In the deposit phase, the two least (respectively ith-least) significant bit locations from previous computations send their bits to the least (respectively ith-least) significant bit location for the resultant sum for the group of size .

Once this depositing process is complete, the last member sends a message "0" to the member representing the least significant bit of the new sum to start the carry-add phase. When a member of the new sum has received three bits, it computes the sum bit and the carry bit. It stores the sum bit and sends the carry bit to the next location. Recall that many groups of size exist, and all of them must be calculated before we move on to groups of size .

While the carry information represents the third bit that triggers the calculation, the most significant bit calculation of a group of size sends a 0-bit to the least significant bit of the next group to continue the calculation. The most significant bit calculation of the last group of size sends a message to start the deposit phase for the group of size . It is important to note that who participates in what is fully determined beforehand and everyone has full knowledge of this information.

We now calculate the number of bits sent and received by any node during this process. In the base case, each node sends and receives at most 5 bits each. Since each node participates in the calculation of one sum, it receives 3 bits (two bits plus a carry or control bit), and sends two bits (resultant sum bit to the next group-size and a carry bit to the current group-size).
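The carry-add phase can be sketched as a ripple-carry addition in which each location acts only after it has received three bits. This is an illustrative sketch, not the authors' code: the function names are hypothetical, bit lists are least significant first, and it assumes the top location of the new sum simply stores the final carry.

```python
def to_bits(value, width):
    """Binary expansion of value, least significant bit first."""
    return [(value >> j) & 1 for j in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << j for j, b in enumerate(bits))


def carry_add(a_bits, b_bits):
    """Add two equal-width counters location by location.

    Each location receives exactly three bits: the two deposited bits
    and a third bit that is either the control bit "0" (for the least
    significant location) or the carry from the previous location.
    """
    assert len(a_bits) == len(b_bits)
    result = []
    third = 0  # control bit "0" that starts the carry-add phase
    for a, b in zip(a_bits, b_bits):
        received = (a, b, third)
        s = sum(received)
        result.append(s & 1)  # sum bit stored at this location
        third = s >> 1        # carry bit sent to the next location
    result.append(third)      # assumed: top location stores the last carry
    return result
```

For example, `carry_add(to_bits(5, 4), to_bits(3, 4))` returns `[0, 0, 0, 1, 0]`, the five-bit representation of 8.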

Note that the final sum is in the range where both and are included. It occupies bits and the most significant bit is "1" if and only if the sum is . If this is the case, then the member corresponding to the most significant bit of the sum can declare the output of the function. For now, let us assume that the sum is less than , where the resultant sum is stored in locations. We now describe a method to let one member know the sum without sending all bits to the node. It is this process that fails when we try to compute bi-symmetric functions, which we will talk about later.

All members participate in a Binary Search Tree, once as a leaf and once as an internal node of the binary search tree. We will set the root of the tree and the leftmost descendants of the tree to be the members with the final count bit each. The root contains the most significant (that is th) bit assuming the sum is less than , while the significance of the bit decreases as we descend the tree along the leftmost path. Each member containing the sum bit on the leftmost branch of the tree will send the bit value, called the control-bit, to all members on the same level of the tree in a sequential fashion. This starts with the root; when a level finishes the message passing, the last member sends a message "0" to the leftmost member of the tree one level below the current level. So each member sends and receives one bit. When the last level finishes its processing, the last member sends a "0" message to the root to start the descending process.

Each leaf node has a number which starts with and ends in and they appear in order from left to right. Depending on the value of the control-bit, the root sends a message "0" to the left child if the control-bit is zero or to the right child if the control-bit is one. Upon receiving a message, any intermediate node will send a message "0" to the left or the right child as per the control bit it has. At the last level, message "0" is sent to the left or right child. When a node receives the last message "0", it consults its designated number in the range and outputs the value of the function. Each member sends and receives at most one bit. Except for the last bit, the total number of bits sent and received each by a member is at most 2. Therefore local communication complexity is at most 5+3+3 = 11. ∎
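The descent can be checked with a short simulation. The sketch below is illustrative only: it assumes leaves are numbered from 0 in left-to-right order, identifies each tree node by a hypothetical `(level, position)` pair, and models the sequential per-level dissemination simply as every node on a level ending up with that level's control bit.

```python
def disseminate_and_descend(count, depth):
    """Return (leaf_index, path) reached by following the control bits.

    count must satisfy 0 <= count < 2**depth. Level 0 is the root and
    holds the most significant of the depth counter bits.
    """
    assert 0 <= count < 2 ** depth
    bits = [(count >> (depth - 1 - lvl)) & 1 for lvl in range(depth)]

    # Dissemination: after the chain of transmissions along a level,
    # every node on that level knows the level's control bit.
    control = {}
    for lvl in range(depth):
        for pos in range(2 ** lvl):
            control[(lvl, pos)] = bits[lvl]

    # Descent: a single token moves left on control bit 0, right on 1.
    lvl, pos = 0, 0
    path = []
    while lvl < depth:
        path.append((lvl, pos))
        pos = 2 * pos + control[(lvl, pos)]
        lvl += 1
    return pos, path
```

With this numbering the token always arrives at the leaf whose index equals the count, and only one node per level is visited, so the descent itself costs each visited node one received bit and one sent bit.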

For a tighter result, a more involved construction and analysis can achieve the same complexity of even if is not a power of . We omit these details for the sake of concision.

5.2 Solving (n, (n/10))-Counting with Local Complexity of 2

We now turn our attention to counting with a local complexity of . We show that even with this small amount of communication, counting up to is possible. The proof for this theorem deploys the same general tree-based counter aggregation and subsequent dissemination strategies introduced in our -counting solution. We now, however, carefully implement these strategies in such a way that the nodes not counting their inputs can collaborate with the nodes that are counting to reduce the number of bits they need to send and receive from down to .

We begin by isolating and analyzing a key step of this efficient simulation: how to leverage helper nodes to implement the distributed counter addition strategy from our -counting solution with a local complexity of only . Recall that the simpler implementation of this addition step in our -counting solution, in which there were no helper nodes present to reduce communication, induced a local complexity of bits.

Lemma 5.2.

Suppose is an integer. Assume two sets of players with one input bit each, where each collection of bits is interpreted as a binary integer. There exists a protocol with local complexity that computes the binary sum of these two numbers and stores the result in a third set of players, using an additional set of players to support the computation, leading to a total of players involved. Though the overall local complexity is , the players involved in storing the two input numbers send bit each and receive none during the computation, and the players storing and the resultant sum send at most bit each during the computation.


Let be the bits defining the first number and let be the bits defining the second. Let be the resulting sum. In the following, we use the notation , for a given labelled bit , to denote the players responsible for bit . In addition to these players, we will use additional players, which we label and , for each .

We now describe the computation. For each , and send their bits to . Upon receiving two bits, each computes the XOR of the two bits and sends the result to . This value represents a tentative sum of the relevant two bits. Each also computes the AND of these two bits, encoding the tentative carry, and sends it to . For to compute the final sum for this bit position, it also needs to know the relevant carry, which it can receive from . Similarly, for to know the full carry to send to , it needs to learn not just the carry bit from , but also any carry resulting from the sum computed by .

Let us pull together these pieces: For , each and compute and communicate the following after receiving two bits in any order. computes the XOR of the two received bits and stores it as the resultant sum bit. computes the AND of the two received bits and sends it to . computes the OR of the two bits and sends it to . stores the bit it received as the most significant bit of the sum. We can bootstrap the relevant processes in position to send the correct value on initialization (e.g., has no carry bits to receive). It is easy to verify that the resultant sum computation is correct and that it meets the required communication bounds. ∎
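The gate-level logic of the lemma can be checked independently of how the messages are routed. The following sketch is an assumption-laden illustration: it uses the standard decomposition into a tentative sum (XOR), a tentative carry (AND), and an OR that merges the two possible carry sources, and it does not model which player sends which bit. The name `gate_level_add` is hypothetical and bit lists are least significant first.

```python
def gate_level_add(a_bits, b_bits):
    """Add two k-bit numbers using only XOR, AND and OR on single bits."""
    assert len(a_bits) == len(b_bits)
    carry = 0  # position 0 has no incoming carry
    out = []
    for a, b in zip(a_bits, b_bits):
        tentative_sum = a ^ b       # XOR of the two input bits
        tentative_carry = a & b     # AND of the two input bits
        out.append(tentative_sum ^ carry)  # final sum bit for this position
        # A carry leaves this position if the inputs generate one, or if
        # the tentative sum is 1 and a carry arrived from below. These two
        # cases cannot both hold, so OR combines them safely.
        carry = tentative_carry | (tentative_sum & carry)
    out.append(carry)  # most significant bit of the (k+1)-bit result
    return out
```

Exhaustively comparing the output with ordinary integer addition for small k is a quick way to confirm that the three gate roles suffice.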

We are now ready to describe a protocol that solves -counting with local complexity of . This protocol will leverage the distributed adding strategy captured in the preceding lemma to effectively count bits among the first positions in an efficient manner. The remaining nodes will be used to implement the results, totalling, and carrying roles needed by this addition.

Theorem 5.3.

For every and , such that , there exists a protocol that solves -counting with a local communication complexity of .


Assume and so we are computing the number of in the first positions. As in our proof of the -counting result, we assume for now that is a power of . (The same technique that eliminates this assumption for the -counting case applies here, but as before we omit it for the sake of clarity.) The computation follows the main idea of the proof of Theorem 5.1. However, we will use an additional members to reduce the number of bits used in the 11-bit protocol to 2 bits.

We partition players into 10 groups of members each. The goal is to count the number of ’s in the first group. The input contained in the remaining 9 groups will be ignored and the players will be used to support the counting of ’s in the first group.

The protocol is divided into six phases. In phase 1, the second group of players will receive information from the first group in the following way. Partition the input bits of the first group into collections of two positions at a time (say ) and perform a simple binary addition of the two bits for each of the n/2 collections, storing the 2-bit output of the binary addition in the corresponding two positions (say ) in the second group. It is easy to see that this can be accomplished where both players (say and , is odd) in the first group send one bit to each of the two players and . The players in the second group receive two bits each, but each player can still send 2 bits.

In the second phase, the third group will receive information from the second group such that every four bits of the third group contain the sum, in binary, of the number of 1’s in the corresponding four players of the first group. Note that three bits are sufficient to store the binary value between and and the extra position is used to reduce the number of bits sent/received to 2, as shown below. The protocol proceeds in the following way. Given two binary numbers and , receives both bits and while and receives both bits and . sends to and to . sends to and stores in . upon receiving the second bit, sends the AND of the bits it received to while storing the OR of these two bits in . Upon receiving two bits, stores the OR of these two bits in . Notice that each of the through receives at most bits while sending at most bit. Observe that the result of the addition of and is in . Each can still send more bit. The binary count in represents the number of ’s in through . This process is repeated so that the sum of the number of ’s is computed for every successive group of four ’s.

In the third phase, we will show that players in the fifth group, namely , will contain binary bits such that represents the count of number of ’s in positions through . In order to do so, we will perform simple binary addition of numbers and using the protocol explained in Lemma 5.2 where the parameter . We employ a total of players. Six players from the fourth group, through , four players from fifth group, through , and seven players from third group, and will be the 16 players that employ the protocol of Lemma 5.2. It is not hard to see that the communication limitations of Lemma 5.2 are met.

In the fourth phase, we set players in the seventh group, namely , to contain binary bits such that represents the count of number of ’s in positions through . This is done by performing binary addition of and . As before, we apply Lemma 5.2 where the parameter . We use players out of which 8 are carrying input bits, namely ’s, and 5 store the output bits, namely ’s. We need additional 8 players from sixth group, namely through , to perform the computation.

Starting with binary counts of ’s at a time, stored in ’s, we will perform repeated binary addition (bottom-up counting process) to compute the count of all locations. Unlike the previous four phases where we needed an additional group for each addition, the fifth phase performs all of the binary additions starting from bits to bits using only one additional group . This is the eighth group of players. As we perform repeated additions using Lemma 5.2, each addition involves players out of which are input/output players. Only players perform intermediate computations. Since , the number of players used to perform intermediate computations is no larger than the number of players involved in the input/output parts of the process. The proof of Theorem 5.1 shows that at most players are used in storing the input/output part of the addition process. Therefore, for the entire collection of additions involving bits to bits, the total number of players who perform intermediate computations is not larger than the number of players involved in the input/output parts of the addition. As argued in the proof of Theorem 5.1, if the final sum is then the most significant bit is and it can declare the output. Otherwise, each player of the remaining bits of the count will send its bit to another new player within the seventh group so that these new players have the capability to send two bits instead of only one. This transition is possible within the seventh group (’s) since the recurrence relation implies . Observe that is the number of locations used once in the process. Note that we do not use the calculation process of Theorem 5.1 since the number of bits sent and received by a player exceeds . We use the availability of the locations specified in the proof of Theorem 5.1 and perform computations as per Lemma 5.2. Let be the final count of the number of ’s in binary. Note that is the least significant bit of the count.

The sixth and last phase will contain two groups, namely the ninth group and the tenth group of players, where one of the ’s will output the correct count. As in the proof of Theorem 5.1, the players will form the internal nodes of a complete binary tree and the players will be the leaves. The root, which we label level 1, of the binary tree will get the most significant bit, namely . The nodes in level will get . The distribution of these bits starts with the least significant bit and ends with the most significant bit going to the root. This process does not proceed in parallel but sequentially, using token passing. This requires to receive a bit to start the seeding of the next level. This ensures that the first bit received by the nodes of the tree is the control bit. Once the root gets its control bit, it follows the binary search process described in the proof of Theorem 5.1. Only one leaf will receive a bit and it declares the correct count based on its position in the tree. ∎

5.3 Solving Sorting and Searching with Constant Local Complexity

The ability to count bits in the input with constant local complexity enables the solution of other natural problems with this same low complexity. We highlight two such problems here:

Sorting: The first problem is sorting. In the context of binary inputs distributed among nodes, , sorting reduces to gathering all the bits together. Formally, if , then to solve sorting for input , the nodes in should output , and the nodes in should output .

The counting solution analyzed in Theorem 5.1 has the nice property that not only does a node output , but the unique node that does so is . To extend this solution to sorting, therefore, it is sufficient for to disseminate a bit down the line from to , letting these preceding nodes know that they should also output . This increases the local complexity by a single bit from to . Formally:

Theorem 5.4.

The sorting of 1-bit inputs can be solved with local complexity .
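The reduction from sorting to counting can be sketched as follows. This is a centralized illustration, not the protocol itself: the function name is hypothetical, the count is computed with a plain sum instead of the counting protocol, and the sketch assumes that the sorted output places the ones at the lowest-numbered nodes. The loop stands in for the single bit relayed down the line of nodes.

```python
def sort_bits_via_count(input_bits):
    """Return the sorted outputs of nodes 1..n given their input bits.

    Assumption: ones are gathered at the lowest-numbered nodes.
    """
    n = len(input_bits)
    k = sum(input_bits)          # stand-in for the counting protocol
    outputs = [0] * n
    # Node k learns the count, then one bit is relayed from node k down to
    # node 1; every node that sees the relayed bit also outputs 1.
    for node in range(k, 0, -1):
        outputs[node - 1] = 1
    return outputs


result = sort_bits_via_count([0, 1, 0, 1, 1])
```

Each node on the relay path receives at most one extra bit and sends at most one extra bit, which is the intuition behind the small additive overhead over counting.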

Searching: Another natural problem is searching. In particular, for a given binary input assignment , we say that has the th “”, if: (1) , and (2) . We can therefore define a search problem, parameterized with , such that the goal is to output the id of the node with the th one. Building on our counting strategy, we can also solve this problem with constant local complexity:

Theorem 5.5.

For every network size and search location , searching for the th can be solved with constant local complexity.


First, run the protocol for counting the number of ones in the input, where the result is in binary form. This is the first tree of Theorem 5.1.

Now store in binary in locations. If then check whether the count is equal to and report the outcome.

From now on we assume that and, if there is ones, it will be between through . This occupies bits in binary form. Let the count of s be . Compare with . This comparison can be done in local communication complexity by comparing the most significant bits first to find out if . If then no such exists. Store in exactly the same place where is stored. Recall that we have the counts and , whose sum led to , still stored in the appropriate places. We compare with in local communication complexity to find out if .

If the answer is yes, then perform the subtraction . The subtraction is very similar to the addition we performed in Theorem 5.1. Store the result in exactly the same place where the bits of count are stored. One can now compare this new with .

But if the answer is no, then copy into where the bits of are stored and compare this with the count .

This process continues until we hit a final group of bits. One can then easily find the original th one in constant local communication complexity. ∎
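The descent used in this proof can be sketched centrally as a walk down the tree of partial counts. This is a hedged illustration rather than the protocol: the function names are hypothetical, ordinary integer comparison and subtraction stand in for the bitwise comparison and subtraction steps, and the rule "go left when the target is at most the left sub-count, otherwise subtract and go right" is one reading of the compare-then-subtract step described above.

```python
def build_count_tree(bits):
    """levels[0] holds the (zero-padded) input bits; levels[-1] holds the total."""
    size = 1
    while size < len(bits):
        size *= 2
    level = list(bits) + [0] * (size - len(bits))
    levels = [level]
    while len(level) > 1:
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def find_kth_one(bits, k):
    """Return the 1-indexed id of the node holding the k-th one, or None."""
    levels = build_count_tree(bits)
    if k < 1 or k > levels[-1][0]:
        return None              # fewer than k ones in the input
    index = 0
    for depth in range(len(levels) - 2, -1, -1):
        left = levels[depth][2 * index]
        if k <= left:
            index = 2 * index    # the k-th one lies in the left half
        else:
            k -= left            # skip the ones counted on the left
            index = 2 * index + 1
    return index + 1


third = find_kth_one([0, 1, 1, 0, 1], 3)
```

For the input above, the third one sits at node 5, and asking for a fourth one returns `None` because the total count is only three.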

6 Modular Arithmetic with Constant Local Complexity

In this section we turn our attention to 2-symmetric (also known as bi-symmetric) functions, in which the input bits can be partitioned into two sets, and the output of the function depends only on the total count of bits in each set.

We focus in particular on balanced 2-symmetric functions, of the form , where the partitions evenly divide the bits into two sets of size . One can therefore interpret a function of this type as calculating , for a function of the form .
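As a small illustration of this definition, the sketch below evaluates a balanced 2-symmetric function by first reducing each half of the input to its count. The function name, the choice of "first half" and "second half" as the two partitions, and the example function are assumptions made only for illustration.

```python
def evaluate_balanced_2_symmetric(bits, g):
    """Apply g to the counts of ones in the two halves of the input.

    Assumption: the input length is even and the partition is
    (first half, second half).
    """
    if len(bits) % 2 != 0:
        raise ValueError("a balanced partition needs an even number of bits")
    half = len(bits) // 2
    x = sum(bits[:half])   # count of ones in the first partition
    y = sum(bits[half:])   # count of ones in the second partition
    return g(x, y)


difference = evaluate_balanced_2_symmetric(
    [1, 1, 0, 1, 0, 1, 0, 0], lambda x, y: x - y
)
```

Any permutation of the bits within a half leaves the result unchanged, which is exactly the property that makes the function 2-symmetric.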

We turn our attention to balanced 2-symmetric functions in part because they are the natural next class to consider after we established in the previous section that symmetric functions can be solved with constant local complexity. We emphasize that the local complexity jump from symmetric to 2-symmetric is non-trivial. A straightforward counting argument establishes that there must exist 2-symmetric functions with a local complexity in . Identifying a specific function with this larger complexity would resolve a major open problem in circuit complexity. This follows from the ability of any linear-sized circuit to be simulated in our setting by a protocol with constant local complexity (e.g., see the discussion in [4]). A function that cannot be solved with constant local complexity is therefore a function that cannot be implemented by a linear-sized circuit.

Specifically, we begin by studying many standard modular arithmetic functions on two operands. Perhaps not surprisingly, given the connection between local and circuit complexity, we identify solutions to all functions considered that require only constant local complexity. We then build on these solutions to show that even the more complex GCD function can be implemented with constant local complexity. These results underscore the surprising power of distributed function computation with a very small number of bits sent and received at any one node.

6.1 Standard Arithmetic Functions with Constant Local Complexity

We begin by studying standard arithmetic functions, including basic mathematical and comparison operations, and their composition. In all cases, we prove that constant local complexity is sufficient. A reasonable starting place for designing a protocol to compute a given balanced 2-symmetric function is to first run two instances of our -counting solutions in parallel on and . This allows some player in the first partition to learn