The distributed complexity of locally checkable problems on paths is decidable

11/05/2018 ∙ by Alkida Balliu, et al. ∙ University of Michigan, Aalto University, ETH Zurich

Consider a computer network that consists of a path with n nodes. The nodes are labeled with inputs from a constant-sized set, and the task is to find output labels from a constant-sized set subject to some local constraints---more formally, we have an LCL (locally checkable labeling) problem. How many communication rounds are needed (in the standard LOCAL model of computing) to solve this problem? It is well known that the answer is always either O(1) rounds, or Θ(log* n) rounds, or Θ(n) rounds. In this work we show that this question is decidable (albeit PSPACE-hard): we present an algorithm that, given any LCL problem defined on a path, outputs the distributed computational complexity of this problem and the corresponding asymptotically optimal algorithm.


1 Introduction

To what extent is it possible to automate the design of algorithms and the study of computational complexity? While algorithm synthesis problems are typically undecidable, there are areas of theoretical computer science in which we can make use of computational techniques in algorithm design—at least in principle, and sometimes also in practice. One such area is the theory of distributed computing; see [9, 16, 25, 6, 5, 3, 10, 18] for examples of recent success stories. In this work we bring yet another piece of good news:

Consider this setting: there is a computer network that consists of a path with n nodes, the nodes are labeled with inputs from a constant-sized set, and the task is to find output labels from a constant-sized set subject to some local constraints. We show that for any given set of local constraints, it is decidable what the asymptotically optimal number of communication rounds needed to solve this problem is (as a function of n, for the worst-case input).

Background: LCLs and the LOCAL Model.

We focus on what are known as LCL (locally checkable labeling) problems [21] in the LOCAL model of distributed computing [19, 23]. We define the setting formally in Section 2, but in essence we look at the following question:

  • We are given an unknown input graph of maximum degree Δ; the nodes are labeled with input labels from a constant-size set Σ_in, and the nodes also have unique identifiers from a polynomially-sized set.

  • The task is to label the nodes with output labels from a constant-size set Σ_out, subject to some local constraints C; a labeling is globally feasible if it is locally feasible in all radius-r neighborhoods for some constant r.

  • Each node has to produce its own output label based on the information that it sees in its own radius-T(n) neighborhood, for some function T.

Here the local constraints C define an LCL problem. The rule that the nodes apply to determine their output labels is called a distributed algorithm in the LOCAL model, and the function T is the running time of the algorithm—here T(n) determines how far a node has to see in order to choose its own part of the solution, or equivalently, how many communication rounds are needed for each node to gather the relevant information if we view the input graph as a communication network.
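For concreteness, here is a minimal sketch (ours, not taken from the paper) of one simple LCL on paths (proper 3-coloring with no input labels and checking radius r = 1), written as a local feasibility check; the function names are illustrative.

```python
# Illustrative sketch (not from the paper): proper 3-coloring of a path,
# phrased as an LCL with checking radius r = 1 and no input labels.

SIGMA_OUT = {1, 2, 3}  # constant-size output label set

def locally_feasible(output, v):
    """Radius-1 check at node v on a path 0..len(output)-1: v's label is a
    valid color and differs from the colors of both neighbors."""
    if output[v] not in SIGMA_OUT:
        return False
    for u in (v - 1, v + 1):
        if 0 <= u < len(output) and output[u] == output[v]:
            return False
    return True

def globally_feasible(output):
    # A labeling is a correct solution iff every local check passes.
    return all(locally_feasible(output, v) for v in range(len(output)))

print(globally_feasible([1, 2, 3, 1, 2]))  # True
print(globally_feasible([1, 1, 2, 3, 1]))  # False: nodes 0 and 1 collide
```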

In this setting, the case of T(n) = O(n) is trivial, as in that many rounds all nodes can see the entire input. The key question is to determine which problems can be solved in sublinear time—here are some examples:

  • Vertex coloring with Δ+1 colors: can be solved in O(log* n) time [15, 8] and this is tight [19, 20] (see the sketch after this list).

  • Vertex coloring with Δ colors, for Δ ≥ 3: can be solved in polylogarithmic time [22] and requires at least logarithmic time [7] for deterministic algorithms.
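The following is a minimal, self-contained sketch (ours, not the authors' code) of the classic Cole-Vishkin color-reduction idea behind the O(log* n) bound mentioned above, specialized to a directed path; the helper names and the final reduction from 6 to 3 colors follow the standard textbook presentation.

```python
# Cole-Vishkin style coloring on a directed path: start from unique IDs as a
# (huge) proper coloring and repeatedly shrink the palette; O(log* n) rounds
# suffice to reach 6 colors, and 3 extra rounds reduce this to 3 colors.

def cv_step(colors):
    """One round: each node recolors itself using only its own color and its
    predecessor's color (the first node pretends to have a differing one)."""
    new = []
    for v, c in enumerate(colors):
        pred = colors[v - 1] if v > 0 else c ^ 1
        diff = c ^ pred
        i = (diff & -diff).bit_length() - 1        # lowest differing bit index
        new.append(2 * i + ((c >> i) & 1))
    return new

def three_coloring(ids):
    colors = list(ids)                             # unique IDs = initial coloring
    while max(colors) > 5:                         # O(log* n) iterations overall
        colors = cv_step(colors)
    for c in (3, 4, 5):                            # eliminate colors 3, 4, 5
        colors = [min({0, 1, 2} - {colors[v - 1] if v > 0 else -1,
                                   colors[v + 1] if v + 1 < len(colors) else -1})
                  if col == c else col
                  for v, col in enumerate(colors)]
    return colors

cols = three_coloring([23, 7, 12, 98, 41, 5])
print(cols)
assert all(a != b for a, b in zip(cols, cols[1:]))  # proper coloring of the path
```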

While the study of this setting was initiated already in the seminal work by Naor and Stockmeyer in 1995 [21], our understanding of these questions has rapidly advanced in the past three years [2, 1, 4, 7, 6, 11, 12, 13, 14, 24]. The big surprises have been these:

  • There are infinitely many different time complexities—for example, we can construct LCL problems with a time complexity of exactly Θ(n^α) for infinitely many rational values α.

  • Nevertheless, there are also wide gaps in the complexity landscape: for example, no LCL problem has a (deterministic) computational complexity that is between ω(log* n) and o(log n).

However, what is perhaps most relevant for us is the following observation: if we look at the case of Δ = 2 (paths and cycles), then the time complexity of any LCL problem is either O(1), Θ(log* n), or Θ(n), and the same holds for both deterministic and randomized algorithms [21, 5, 6].

Decidability of Time Complexities.

For a fixed Δ, any LCL problem has a trivial finite representation: simply enumerate all feasible radius-r local neighborhoods. Hence it makes sense to ask if, given an LCL problem, we can determine its time complexity. The following results are known by prior work:

  • If the input graph is an unlabeled path or cycle, the time complexity is decidable [21, 5].

  • If the input graph is a grid or toroidal grid, the time complexity is undecidable [21]. However, there is also some good news: in unlabeled toroidal grids, the time complexity falls in one of the classes O(1), Θ(log* n), or Θ(n); it is trivial to tell if the time complexity is O(1), and it is semi-decidable to tell if it is O(log* n) [5].

  • In the case of trees, there are infinitely many different time complexities, but there is a gap between ω(√n) and o(n), and it is decidable to tell on which side of the gap a given LCL problem lies [6].

Somewhat surprisingly, the seemingly simple case of labeled paths or cycles has remained open all the way since the 1995 paper by Naor and Stockmeyer [21], which defined LCLs with input labels but analyzed decidability questions only in the case of unlabeled graphs.

We initially expected that the question of paths with input labels is a mere technicality and the interesting open questions are related to much broader graph families, such as rooted trees, trees, and bounded-treewidth graphs. However, it turned out that the main obstacle for understanding decidability in any such graph family seems to lie in the fact that the structure of the graph can be used to encode arbitrary input labels, hence it is necessary to first understand how the input labels influence decidability—and it turns out that this makes all the difference in the case of paths.

In this work we show that the time complexity of a given LCL problem on labeled paths or cycles is decidable. However, we also show that decidability is far from trivial: the problem is PSPACE-hard, as LCL problems on labeled paths are expressive enough to capture linear bounded automata (Turing machines with bounded tapes).

2 Model

The LOCAL Model.

The model of computation we consider in this work is the LOCAL model of distributed computing [19, 23]. In the LOCAL model, each node of the input graph is considered as a computational entity that can communicate with the neighboring nodes in order to solve some given graph problem. Computation is divided into synchronous rounds, where in each round each node first sends messages of arbitrary size to its neighbors, then receives the messages sent by its neighbors, and finally performs some local computation of arbitrary complexity. Each node is equipped with a globally unique identifier (ID), which is simply a bit string of length O(log n), where n denotes the number of nodes of the input graph. In the beginning of the computation, each node is aware of its own ID, the number of nodes n and the maximum degree Δ of the input graph, and potentially some additional problem-specific input. Each node has to decide at some point that it terminates, upon which it returns a local output and does not take part in any further computation; the problem is solved correctly if the local outputs of all nodes together constitute a global output that satisfies the output constraints of the given problem.

Each node executes the same algorithm; the running time of the distributed algorithm is the number of rounds until the last node terminates. It is well known that, due to the unbounded message sizes, an algorithm with runtime T can be equivalently described as a function from the set of all possible radius-T neighborhoods to the set of allowed outputs. In other words, we can assume that in a T-round algorithm, each node first gathers the topology of, and the input labels contained in, its radius-T neighborhood, and then decides on its output based solely on the collected information.
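As a minimal illustration (ours, not the paper's formalism) of this equivalence on a path: a T-round algorithm is just one function applied to each node's radius-T view; all names below are illustrative.

```python
# A T-round LOCAL algorithm on a path, viewed as a single function that maps
# each node's radius-T view (its offset inside the view, plus the IDs and
# input labels of all nodes within distance T) to an output label.

def radius_T_view(ids, inputs, v, T):
    """Everything node v can learn in T communication rounds on a path."""
    lo, hi = max(0, v - T), min(len(ids), v + T + 1)
    return (v - lo,                    # where v sits inside its own view
            tuple(ids[lo:hi]),         # the IDs it has heard about
            tuple(inputs[lo:hi]))      # the input labels it has heard about

def run_local_algorithm(ids, inputs, T, f):
    """Every node applies the same local rule f to its radius-T view."""
    return [f(radius_T_view(ids, inputs, v, T)) for v in range(len(ids))]

# Toy rule with T = 1: output the parity of the largest ID in the view.
outputs = run_local_algorithm([17, 4, 42, 8], ["a", "b", "a", "a"], 1,
                              lambda view: max(view[1]) % 2)
print(outputs)  # [1, 0, 0, 0]
```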

Locally Checkable Labelings.

The class of problems we consider is locally checkable labeling (LCL) problems [21]. LCL problems are defined on graphs of bounded degree, i.e., we will assume that Δ = O(1). Formally, an LCL problem is given by a finite input label set Σ_in, a finite output label set Σ_out, an integer r, and a finite set C of graphs in which every node is labeled with a pair from Σ_in × Σ_out and one node is marked as the center. Each node of the input graph is assigned an input label from Σ_in before the computation begins, and the global output of a distributed algorithm is correct if the radius-r neighborhood of each node v, including the input labels given to the contained nodes and the output labels returned by the contained nodes, is isomorphic to an element of C in which v corresponds to the node marked as the center.

In the case of directed paths as our class of input graphs, we are interested in identifying the simplest possible form of LCL problems. For this purpose, we define k-normalized LCLs; these are LCL problems for which the input is just binary, and the size of the set of output labels is k. Moreover, the solution can be checked at each node v by just inspecting the input and output of v, and, separately, the output of v and the output of its predecessor. More formally, a k-normalized LCL problem is given by finite input and output label sets Σ_in = {0, 1} and Σ_out with |Σ_out| = k, a finite set C_node of pairs from Σ_in × Σ_out, and a finite set C_edge of pairs from Σ_out × Σ_out. The global output of a distributed algorithm for the k-normalized problem is correct if the following hold:

  • For each node v, we have (i(v), o(v)) ∈ C_node, where i(v) denotes the input label of v and o(v) the output label of v.

  • For each node v that has a predecessor, we have (o(u), o(v)) ∈ C_edge, where u is the predecessor of v, and o(u) and o(v) are the output labels of u and v, respectively.

It is straightforward to check that a k-normalized LCL problem is indeed a special case of an LCL problem with r = 1.
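The following is a minimal sketch (ours; the set names are illustrative) of what checking a k-normalized LCL on a directed path amounts to: one table of allowed (input, output) pairs per node and one table of allowed (predecessor output, output) pairs per edge.

```python
# Verifying a k-normalized LCL on a directed path: node constraints depend on
# (input, output) at a node, edge constraints on (predecessor output, output).

def check_normalized(inputs, outputs, node_pairs, edge_pairs):
    n = len(inputs)
    for v in range(n):
        if (inputs[v], outputs[v]) not in node_pairs:
            return False              # node constraint violated at v
        if v > 0 and (outputs[v - 1], outputs[v]) not in edge_pairs:
            return False              # edge constraint violated at (v-1, v)
    return True

# Toy 2-normalized problem: the output must echo the binary input, and two
# consecutive outputs may not both be 1.
node_pairs = {(0, 0), (1, 1)}
edge_pairs = {(0, 0), (0, 1), (1, 0)}
print(check_normalized([0, 1, 0, 0], [0, 1, 0, 0], node_pairs, edge_pairs))  # True
print(check_normalized([0, 1, 1, 0], [0, 1, 1, 0], node_pairs, edge_pairs))  # False
```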

3 Hardness

In this section we study the hardness of determining the distributed complexity of LCLs on paths and cycles with input labels. More precisely, we start by proving the existence of a family of LCL problems for consistently globally oriented paths such that, given an LCL problem in this family, it is PSPACE-hard to decide if its distributed complexity is O(1) or Θ(n). Our main result shows the following.

It is PSPACE-hard to distinguish whether a given LCL problem with input labels can be solved in time O(1) or needs time Θ(n) on globally oriented path graphs.

The high-level idea of the proof of the above result is as follows. We would like to encode the execution of Turing machines as LCLs on consistently oriented paths, and then define some LCL for which the complexity depends on the running time of the machine. This is fairly easy on oriented grids, for example, where we can use one dimension of the grid as the tape, and the other dimension as time. One may try to do the same on paths, by projecting everything onto a single dimension and concatenating the tape contents of consecutive steps. Unfortunately, the obtained encoding is not locally checkable, since the length of the tape may be non-constant. To avoid this problem, we consider Linear Bounded Automata (LBAs), that is, Turing machines that have a tape of fixed size s. We show that, if s is constant, we can encode the execution of an LBA M as an LCL for directed paths. Moreover, we show that by seeing this encoding as a two-party game between a prover and a disprover, we can encode the execution of M using input labels of constant size that do not even depend on s. If the execution of M is not correctly encoded in the input of the LCL, then nodes can disprove its correctness using suitable chains of error output labels. Moreover, we ensure that, if the execution of M is correctly encoded in the input of the LCL, it is not possible to produce a correct proof of non-correctness. Then, in order to obtain an LCL with a distributed complexity that depends on the execution time of M, we encode a secret input at the first node of the path. We then require that all nodes involved in a correct encoding must produce this same secret as output.

Figure 1: Illustration of a correct encoding of the execution of an LBA on a path; black nodes act as separators between the encoding of two consecutive steps of the LBA; in the example, the LBA executes a unary counter.

Figure 1 shows an example of an LBA that executes a unary counter, and its encoding as input to the nodes of a path. In this instance, all nodes must produce the secret given to the first node as output. Figure 2 shows an example of a wrong input (the tape has been copied incorrectly between two consecutive steps of the LBA). In this case, nodes are allowed to produce a chain of errors. Different types of errors will be handled using different types of error labels. In the example, all nodes that produce the error chain output a label indicating that the tape has been wrongly copied. We will show how to handle all possible errors (including the case in which the input tape is too long, i.e., much longer than s) with the error labels listed later in this section. Also, all error chains that we allow as outputs must be locally checkable.

Figure 2: Illustration of an incorrect encoding of the execution of an LBA on a path; in the example, the tape of the LBA is wrongly copied (the inputs in red are different, while they should be the same). The error output encodes the distance between the two nodes and the input symbol that was wrongly copied.

Another interesting problem is to identify, for an LCL that can be solved in constant time in a distributed manner, how big this constant can be. In particular, we first focus on identifying the simplest possible description of an LCL, and then we provide a lower bound on the complexity of a constant-time LCL as a function of the size of the description. For this purpose, we consider k-normalized LCLs, i.e., LCL problems for which the input labeling is just binary and there are k possible output labels. Also, the verifier for these LCLs is the simplest possible: it can only check whether the output of a node is correct w.r.t. its input, and, separately, whether the output of a node is correct w.r.t. the output of its predecessor. Therefore, we show how to convert an LCL to a k-normalized one by encoding the input in binary (Figure 3 shows an example), and obtain the following result.

There are k-normalized LCLs that can be solved in constant time, but for which this constant necessarily grows with the number of output labels k.

Figure 3: Illustration of the normalization of an LCL.

All results that we have described so far apply to globally oriented paths. Nevertheless, we show that our ideas and techniques can be generalized to work on undirected paths and cycles as well, obtaining essentially the same results. Finally, we will show how to lift these results to trees without input labels, proving the following result.

It is PSPACE-hard to distinguish whether a given LCL problem without input labels can be solved in time O(1) or needs time Θ(n) on trees of bounded degree.

3.1 Linear Bounded Automata

A Linear Bounded Automaton (LBA) is a Turing machine with a bounded tape of size at most s that is able to recognize the boundaries of the tape [17, p. 225]. More formally, we define an LBA as a tuple (Q, q_0, q_f, Γ, δ), where

  • Q is a finite set of states;

  • q_0 ∈ Q is the initial state;

  • q_f ∈ Q is the final state;

  • Γ is a finite set of tape alphabet symbols that contains integers as well as the special symbols ⊢ (left) and ⊣ (right);

  • δ is the transition function, where δ: Q × Γ → Q × Γ × {Left, Right, Stay}.

The tape of M is initialized as follows:

  • the first cell is marked with the symbol ⊢;

  • the last cell is marked with the symbol ⊣;

  • all other cells contain an integer symbol of Γ.

An execution of M is a sequence of configurations, where each configuration consists of the tape content, the current state, and the head position, and where:

  • the first configuration consists of the initialized tape, the initial state q_0, and the head placed at the left end of the tape;

  • each subsequent configuration is obtained by applying δ to the current state and the symbol under the head: the cell under the head is rewritten and the state is updated according to δ;

  • the new head position is

    • one cell to the left, if δ moves the head to the Left;

    • one cell to the right, if δ moves the head to the Right;

    • unchanged, if δ says Stay.
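For intuition, here is a small LBA simulator sketch (ours; the names, the transition encoding, and the crude step bound are illustrative assumptions, not the paper's notation): since the tape has fixed size s, there are only finitely many configurations, so the machine either halts within a bounded number of steps or provably loops.

```python
# Minimal LBA simulator: delta maps (state, symbol) to
# (new_state, new_symbol, move) with move in {-1, 0, +1}.

def run_lba(states, q0, qf, delta, tape):
    """Returns ('halts', step) if the final state qf is reached, and
    ('loops', None) if the machine exceeds the number of configurations."""
    s = len(tape)
    # crude bound: a non-halting run longer than this must repeat a configuration
    symbols = set(tape) | {y for (_, y, _) in delta.values()}
    max_steps = len(states) * s * len(symbols) ** s
    q, head, tape = q0, 1, list(tape)           # head starts right of the left marker
    for step in range(max_steps + 1):
        if q == qf:
            return ("halts", step)
        q, tape[head], move = delta[(q, tape[head])]
        head = min(max(head + move, 0), s - 1)  # the head never leaves the tape
    return ("loops", None)

# Toy unary counter on a 5-cell tape: walk right turning 0 into 1, halt at the
# right end marker "R".
delta = {("walk", 0): ("walk", 1, +1),
         ("walk", "R"): ("done", "R", 0)}
print(run_lba({"walk", "done"}, "walk", "done", delta, ["L", 0, 0, 0, "R"]))
```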

3.2 The Problem

We define a family of LCLs, in which each problem depends on the LBA M. The general idea is that the input of the LCL may encode the execution of M. If this is the case, nodes are required to solve a problem that requires a time proportional to the execution time of M. On the other hand, if it is not the case, nodes can produce an output that proves that this encoding is wrong. In order to define valid LCLs, we consider the case where s = O(1), that is, the size of the tape does not depend on the size of the distributed network.

3.2.1 Input Labels

We define the input labels of our LCL as follows:

  • two symbols that will be used as some kind of secret;

  • a label that acts as a separator between two consecutive steps of M;

  • labels that give information about the tape and the state of M: each such label describes the content of one tape cell, the current state (if the head is on that cell), and whether the head is on that cell;

  • an empty label, indicating an empty input.

Note that the size of the set of possible input labels does not depend on the size of the tape.

3.2.2 Encoding an LBA on a Path

Suppose we have a consistent global orientation of the path. Consider the execution of the LBA M starting from the initialized tape.

Definition 1.

The input of the LCL is a good input if the first node of the path has one of the secret symbols as input, and the rest of the path correctly encodes the execution of M starting from the initialized tape (see Figure 1). More precisely:

  • the first node of the path carries a secret symbol;

  • the nodes that follow are partitioned into consecutive blocks, one per step of the execution, each preceded by a separator label;

  • inside the block of a step, the j-th node describes the j-th tape cell at that step: its content, the current state if the head is on that cell, and whether the head is on that cell;

  • all other nodes have the empty label as input.
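To make the definition concrete, here is a small illustrative sketch (ours) of one plausible way to lay out such an encoding, following Figure 1; the exact label structure used in the paper may differ, so treat the tuple format below as an assumption.

```python
# Lay out an LBA execution as input labels on a path: a separator "#" between
# consecutive steps, then one label per tape cell recording
# (cell content, state if the head is here, head flag).

def encode_execution(configs, secret):
    """configs: list of (tape, state, head position) triples, one per step."""
    labels = [("secret", secret)]
    for tape, state, head in configs:
        labels.append("#")                                  # step separator
        for pos, cell in enumerate(tape):
            on_head = (pos == head)
            labels.append((cell, state if on_head else None, on_head))
    return labels

# Two steps of the toy unary counter from the simulator sketch above.
configs = [(["L", 0, 0, "R"], "walk", 1),
           (["L", 1, 0, "R"], "walk", 2)]
for label in encode_execution(configs, 1):
    print(label)
```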

3.2.3 Output Labels

The set of output labels is the following.

  • output labels corresponding to the possible secrets;

  • a generic error label;

  • error labels indicating that the machine is not correctly initialized;

  • error labels used in the case where the size of the tape is not correct, i.e., when the size of the encoded tape is not s;

  • error labels used when the tape of M is wrongly copied; such a label records a distance and the tape symbol that was wrongly copied;

  • error labels used in case nodes have inconsistent states;

  • error labels indicating that the transition of M is encoded incorrectly (this error captures also the case where the head is missing);

  • error labels used in the case where there is more than one head.

3.2.4 Constraints

The high-level idea is the following. If the path encodes a good input, then all nodes that take part in the encoding are required to output the secret given to the first node of the path. Otherwise, nodes can produce a locally checkable proof of an error (see Figure 2 for an example). While nodes may output a secret even in the case in which the input is not a good input, nodes must not be able to produce an error proof in the case in which the path encodes a good input. We describe all these requirements as locally checkable constraints.

An output labeling for the problem is correct if the following conditions are satisfied for all nodes of the path. Note that, although nodes do not know their position in the path, for the sake of simplicity we will denote by u the predecessor of a node v, if it exists.

  1. Each node produces exactly one output label.

  2. If then .

  3. If has no predecessors (i.e., ) and , then .

  4. If then , and if then .

  5. If , then

    • if then the node has no predecessor;

    • if then .

  6. If , then

    • if , then ;

    • if then and .

  7. If , then

    • if , then where , ;

    • if then where ;

    • if then .

  8. If , then , , and .

  9. If , let

    • if , then where , , ;

    • if and , or and , or and (i.e., if node is an “ final node”), then either is a final state or where or ;

    • otherwise, then .

  10. If

    • if then where and .

  11. If then one of the following conditions holds:

    • and has no predecessors;

    • and has a predecessor;

    • or is ;

    • ;

    • , , and

      • if then ;

      • if then either , or and:

        • if either , or or ;

        • if either , or or ;

        • if , either , or or ;

    • and where ;

    • and ;

    • where ;

    • ;

    • is an “ final node”;

    • and where and .

  12. If is of type , then must not be of type where .

The following property directly holds by definition of the constraints.

Property 1.

Each node is able to locally check all constraints by just inspecting its own input and output, and the ones of its predecessor (if it exists).

3.3 Upper Bound on the Complexity of the LCL

We need to consider two possible scenarios: either M terminates within some bounded time T, or M loops. In the case in which M loops, we show a simple algorithm that solves the LCL. As we know, any LCL problem for which a solution exists can be solved in O(n) rounds in the LOCAL model by gathering the whole graph and solving the problem locally. There always exists a solution for the problem if M loops; in fact:

  • If the first node of the path has a secret symbol as input, then all nodes output that secret, even if there are errors in the machine encoding.

  • Otherwise, all nodes output one fixed secret symbol.

It is easy to see that this output satisfies the constraints described above.
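As a minimal illustration of this generic O(n)-round strategy (ours; the helper names are illustrative): once a node has gathered the entire path it knows the full input, and all nodes can deterministically pick the same globally feasible labeling, e.g. by brute force over all candidate labelings.

```python
# Brute-force "gather everything and solve locally": every node enumerates all
# output labelings in the same fixed order and takes the first feasible one,
# so all nodes agree on the same global solution.

from itertools import product

def solve_by_gathering(inputs, out_labels, node_ok, edge_ok):
    """node_ok(i, o) and edge_ok(o_pred, o) encode the local constraints."""
    for cand in product(out_labels, repeat=len(inputs)):
        if all(node_ok(inputs[v], cand[v]) for v in range(len(inputs))) and \
           all(edge_ok(cand[v - 1], cand[v]) for v in range(1, len(inputs))):
            return list(cand)
    return None   # no solution exists for this input

# Toy instance: echo the input, but never two consecutive 1s in the output.
print(solve_by_gathering([0, 1, 0], [0, 1],
                         lambda i, o: i == o,
                         lambda a, b: not (a == 1 and b == 1)))  # [0, 1, 0]
```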

Suppose that M terminates. In this case, we show how to solve the problem in constant time. More precisely, if M terminates in T steps, we show a distributed algorithm that solves the problem in O(sT) rounds. Each node v starts by gathering its radius-O(sT) neighborhood B(v). Notice that, by definition, if the input is a good input, then for each node v that is taking part in the encoding of the execution of M (i.e., each node whose input is not the empty label), B(v) contains the first node of the path. Hence, if a node does not see the first node of the path after gathering its ball B(v), it means that the input is not a good input. So, after gathering its ball, each node does the following.

  • If the input of v is the empty label, then v outputs the generic error label.

  • If B(v) does not contain the first node of the path, then v outputs the generic error label.

  • If B(v) shows a good input, then v outputs the secret given to the first node of the path.

The remaining case that we still need to handle is when B(v) contains the first node of the path and the input of v is not the empty label, but B(v) does not look like a good input. We want nodes to produce a proof of an error in some consistent way. Thus, we show that nodes can identify the first error and produce a proof based on it. First of all, notice that, since v sees the first node in the path, v can compute its position on the path. Also, node v can identify the first node not satisfying the constraints of being a good input. Let p be the position of v in the path. Now we distinguish the following cases based on the first violated constraint (the output of each node will be determined by the first case encountered in the following list).

  1. If and , then, if , ; otherwise .

  2. If , it means that either the initial state is encoded incorrectly, or the tape is not initialized correctly, or the head is not initialized on the correct position. In this case, if , then , otherwise .

  3. If and , then the length of the tape is too long, and expected to have in input . Then, if , ; if then ; otherwise, .

  4. If and there exists a such that such that , then the length of the tape is too short, and did not expect to have a separator. In this case, if then ; if then ; otherwise .

  5. If where , , and , where , then the tape of has been copied incorrectly. In this case, if , then ; if then ; otherwise, .

  6. If and and there exists a such that and that , it means that nodes have inconsistent states. Consider the minimum satisfying the constraints. If then ; if then ; otherwise, .

  7. If none of the above is satisfied, it means that there exists a satisfying , such that and . Let . It holds that if is , , or , then is respectively , , or . If , where either , or , or is a final state, then there is some error in the transition (this captures also the case where there is no head). If , then ; if then ; otherwise, . Notice that this case captures also the one where the head is missing.

  8. If where , since all the above cases are not satisfied, it means that there exists a , such that , , and all nodes are labeled with some . That is, there are at least two heads, one on node and one on node . In this case, if , then ; if then ; if then , otherwise .

If the path encodes a good input, every node taking part in the encoding of the execution of M outputs the secret, and in this case it is easy to see that the output satisfies the constraints.

Therefore, assume that the path does not correctly encode the execution of M starting from the correct tape content. First of all, notice that the algorithm handles all possible errors in the machine encoding, that is, if the input is not good, at least one case of the list applies. Consider all nodes that do not have the empty label as input, that is, all nodes taking part in the encoding of the execution of M. If such a node v sees the first node of the path, i.e., if the distance between them is at most the gathering radius (notice that a good input has length O(sT)), then it is easy to see that the output satisfies the constraints. Some care is needed in the case where a node v outputs a generic error and does not see the first node: we need to show that also in this case the output is valid, meaning that the constraints are satisfied. In this case, the distance between v and the first node is strictly greater than the gathering radius, and since the encoding of the execution of M is not correct, then

  • either the path does not correctly encode the execution of M,

  • or the encoded machine is not correctly initialized and it loops.

In the first case, some node on the path between v and the first node will output some specific error, while in the second case the initial nodes will output an initialization error. In both scenarios the constraints for v are satisfied. The complexity of the algorithm is O(sT), which is a constant.

3.4 Lower Bound on the Complexity of the LCL

Let us define the target bound as follows: if M terminates in time T, the bound is proportional to sT; if M loops, the bound is Θ(n). We prove a matching lower bound on the complexity of the LCL by showing that this many rounds are needed in the case where the input is a good input. In particular, we show that, in a good input, all nodes taking part in the encoding must output the secret. The result then follows from the fact that some of these nodes require this many rounds in order to see which of the two secrets was given to the first node of the path.

First of all, we can ignore nodes that have the empty label as input since, in a good input, they are far from the first node of the path. Hence, assume that a node v not having the empty label as input does not output the secret. In this case, v can either output a generic error or a specific error. If all nodes output the generic error, the verifier rejects, since this is not allowed at the first node of the path. If all nodes starting from some node w output the generic error, and all nodes before w output the secret, then the verifier rejects at w. Therefore, let us assume that there is at least one node that outputs a specific error. We write succ(v) and dist(u, v) to denote respectively the successor of a node v in the path and the distance between two nodes u and v in the path.

  • If the specific error claims a wrong initialization, the verifier accepts only if this error produces a chain that starts from the beginning of the encoding and proceeds with increasing values. In order to be accepted, this chain must end at a node that witnesses a local error in the machine initialization, which is not possible in a good input.

  • If the specific error claims a wrong tape length, we could have two cases:

    • there is a chain of increasing values that starts from a node with a separator in input, and ends on a node that carries a separator too early (the tape is too short);

    • there is a chain of increasing values that starts from a node with a separator in input, and ends on a node that should have carried a separator but does not (the tape is too long).

    Since, in a good input, the distance between two nodes having a separator in input is always the same and determined by s, the above scenarios are not possible.

  • If the specific error claims a wrongly copied tape cell, there must be a chain of length exactly the distance between a cell and its copy in the next step, starting from a node encoding some tape content and ending on a node encoding a different content for the same cell. In a good input, the tape content of these two nodes must be the same.

  • If the specific error claims inconsistent states, it means that there must exist two neighbors having two different states, and this cannot happen in a good input.

  • If the specific error claims a wrongly encoded transition, there must be a chain that propagates the old state and old input, and the verifier accepts only if the endpoint of the chain acknowledges that the transition has been wrongly encoded, which cannot be the case in a good input.

  • If the specific error claims more than one head, there must be a chain, not passing through nodes having a separator in input, starting from a node with a head and ending on another node with a head. This is not possible in a good input.

Therefore, since nodes cannot output any kind of error, and since the generic error alone is not a valid output for the nodes encoding the LBA, these nodes must output a secret whose value matches the input of the first node of the path. Hence, solving the LCL requires the claimed number of rounds.

3.5 Normalizing an LCL Problem

We now show how to k-normalize an LCL and obtain a new LCL having roughly the same time complexity. We define three different verifiers depending on their view.


  • A verifier running at node , checks , , , and .

  • A verifier running at node , checks and .

  • A verifier running at node checks and .

Lemma 2.

Consider an LCL with given input and output label sets that can be solved in time T and can be locally checked with a suitable local verifier. It is possible to define a normalized LCL, with binary input labels, that can be solved in essentially the same time and can be locally checked with the verifiers described above.

Proof.

We define