Adversarially Robust Coloring for Graph Streams

09/23/2021
by   Amit Chakrabarti, et al.

A streaming algorithm is considered to be adversarially robust if it provides correct outputs with high probability even when the stream updates are chosen by an adversary who may observe and react to the past outputs of the algorithm. We grow the burgeoning body of work on such algorithms in a new direction by studying robust algorithms for the problem of maintaining a valid vertex coloring of an n-vertex graph given as a stream of edges. Following standard practice, we focus on graphs with maximum degree at most Δ and aim for colorings using a small number f(Δ) of colors. A recent breakthrough (Assadi, Chen, and Khanna; SODA 2019) shows that in the standard, non-robust, streaming setting, (Δ+1)-colorings can be obtained while using only Õ(n) space. Here, we prove that an adversarially robust algorithm running under a similar space bound must spend almost Ω(Δ^2) colors and that robust O(Δ)-coloring requires a linear amount of space, namely Ω(nΔ). We in fact obtain a more general lower bound, trading off the space usage against the number of colors used. From a complexity-theoretic standpoint, these lower bounds provide (i) the first significant separation between adversarially robust algorithms and ordinary randomized algorithms for a natural problem on insertion-only streams and (ii) the first significant separation between randomized and deterministic coloring algorithms for graph streams, since deterministic streaming algorithms are automatically robust. We complement our lower bounds with a suite of positive results, giving adversarially robust coloring algorithms using sublinear space. In particular, we can maintain an O(Δ^2)-coloring using Õ(n√Δ) space and an O(Δ^3)-coloring using Õ(n) space.


1 Introduction

A data streaming algorithm processes a huge input, supplied as a long sequence of elements, while using working memory (i.e., space) much smaller than the input size. The main algorithmic goal is to compute or estimate some function of the input σ while using space sublinear in the size of σ. For most (though not all) problems of interest, a streaming algorithm needs to be randomized in order to achieve sublinear space. For a randomized algorithm, the standard correctness requirement is that for each possible input stream it return a valid answer with high probability. A burgeoning body of work, much of it very recent [BJWY20, BY20, HKM20, KMNS21, BHM21, WZ21, ACSS21, BEO21] but traceable back to [HW13], addresses streaming algorithms that seek an even stronger correctness guarantee, namely that they produce valid answers with high probability even when working with an input generated by an active adversary. There is compelling motivation from practical applications for seeking this stronger guarantee: for instance, consider a user continuously interacting with a database and choosing future queries based on past answers received; or think of an online streaming or marketing service looking at a customer’s transaction history and recommending products to them based on it.

We may view the operation of a streaming algorithm 𝒜 as a game between a solver, who executes 𝒜, and an adversary, who generates a “hard” input stream σ. The standard notion of 𝒜 having error probability δ is that for every fixed σ that the adversary may choose, the probability over 𝒜’s random choices that it errs on σ is at most δ. Since the adversary has to make their choice before the solver does any work, they are oblivious to the actual actions of the solver. In contrast to this, an adaptive adversary is not required to fix all of σ in advance, but can generate the elements (tokens) of σ incrementally, based on outputs generated by the solver as it executes 𝒜. Clearly, such an adversary is much more powerful and can attempt to learn something about the solver’s internal state in order to generate input tokens that are bad for the particular random choices made by 𝒜. Indeed, such adversarial attacks are known to break many well known algorithms in the streaming literature [HW13, BJWY20]. Motivated by this, one defines a δ-error adversarially robust streaming algorithm to be one where the probability that an adaptive adversary can cause the solver to produce an incorrect output at some point of time is at most δ. Notice that a deterministic streaming algorithm (which, by definition, must always produce correct answers) is automatically adversarially robust.

Past work on such adversarially robust streaming algorithms has focused on statistical estimation problems and on sampling problems but, with the exception of [BHM21], there has not been much study of graph theoretic problems. This work focuses on graph coloring, a fundamental algorithmic problem on graphs. Recall that the goal is to efficiently process an input graph given as a stream of edges and assign colors to its vertices from a small palette so that no two adjacent vertices receive the same color. The main messages of this work are that (i) while there exist surprisingly efficient sublinear-space algorithms for coloring under standard streaming, it is provably harder to obtain adversarially robust solutions; but nevertheless, (ii) there do exist nontrivial sublinear-space robust algorithms for coloring.

To be slightly more detailed, suppose we must color an n-vertex input graph G that has maximum degree Δ. Producing a coloring using only χ(G) colors, where χ(G) is the chromatic number, is NP-hard, while producing a (Δ+1)-coloring admits a straightforward greedy algorithm, given offline access to G. Producing a good coloring given only streaming access to G and sublinear (i.e., o(nΔ) bits of) space is a nontrivial problem and the subject of much recent research [BG18, ACK19, BCG20, AA20, BBMU21], including the breakthrough result of Assadi, Chen, and Khanna [ACK19] that gives a (Δ+1)-coloring algorithm using only semi-streaming (i.e., Õ(n) bits of) space. (The notation Õ(·) hides factors polylogarithmic in n.) However, all of these algorithms were designed with only the standard, oblivious adversary setting in mind; an adaptive adversary can make all of them fail. This is the starting point for our exploration in this work.
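The offline greedy algorithm mentioned above can be sketched as follows. This is a minimal illustration, not code from the paper: the function name `greedy_coloring` and the edge-list input format are assumptions made here. Each vertex takes the smallest color not already used by a colored neighbor, so a vertex with at most Δ neighbors never needs a color beyond the first Δ+1.

```python
def greedy_coloring(n, edges):
    """Color vertices 0..n-1 greedily using at most (max degree + 1) colors.

    Hypothetical helper for illustration: `edges` is a list of (u, v) pairs.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    color = [None] * n
    for v in range(n):
        # Colors already taken by neighbors that were colored earlier.
        used = {color[w] for w in adj[v] if color[w] is not None}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
coloring = greedy_coloring(4, edges)
max_degree = 3  # vertex 2 is adjacent to 0, 1, and 3
```

Note that this procedure needs the whole adjacency structure in memory, which is exactly what a sublinear-space streaming algorithm cannot afford.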

1.1 Our Results and Contributions

We ask whether the graph coloring problem is inherently harder under an adversarial robustness requirement than it is for standard streaming. We answer this question affirmatively with the first major theorem in this work, which is the following (we restate the theorem with more detail and formality as Theorem 4.3).

Theorem 1.1.

A constant-error adversarially robust algorithm that processes a stream of edge insertions into an n-vertex graph and, as long as the maximum degree of the graph remains at most Δ, maintains a valid K-coloring (with K in an appropriate range; see Theorem 4.3) must use at least Ω(nΔ^2/K) bits of space.

We spell out some immediate corollaries of this result because of their importance as conceptual messages.

  • Robust coloring using O(Δ) colors.  In the setting of Theorem 1.1, if the algorithm is to use only O(Δ) colors, then it must use Ω(nΔ) space. In other words, a sublinear-space solution is ruled out.

  • Robust coloring using semi-streaming space.  In the setting of Theorem 1.1, if the algorithm is to run in only Õ(n) space, then it must use Ω̃(Δ^2) colors.

  • Separating robust from standard streaming with a natural problem.  Contrast the above two lower bounds with the guarantees of the [ACK19] algorithm, which handles the non-robust case. This shows that “maintaining an O(Δ)-coloring of a graph” is a natural (and well-studied) algorithmic problem where, even for insertion-only streams, the space complexities of the robust and standard streaming versions of the problem are well separated: in fact, the separation is roughly quadratic, by taking Δ = Θ(n). This answers an open question of [KMNS21], as we explain in greater detail in Section 1.2.

  • Deterministic versus randomized coloring.  Since every deterministic streaming algorithm is automatically adversarially robust, the lower bound in Theorem 1.1 applies to such algorithms. In particular, this settles the deterministic complexity of O(Δ)-coloring. Also, turning to semi-streaming algorithms, whereas a combinatorially optimal (Δ+1)-coloring is possible using randomization [ACK19], a deterministic solution must spend at least Ω̃(Δ^2) colors. (Here, “combinatorially optimal” refers to the following fact: if one must use at most f(Δ) colors for some function f, the best possible function that always works is f(Δ) = Δ+1.) These results address a broadly-stated open question of Assadi [Ass18]; see Section 1.2 for details.

We prove the lower bound in Theorem 1.1 using a reduction from a novel two-player communication game that we call subset-avoidance. In this game, Alice is given an a-sized subset A of the universe [t] (the notation [t] denotes the set {1, …, t}); she must communicate a possibly random message to Bob that causes him to output a b-sized subset B of [t] that, with high probability, avoids Alice’s set completely. We give a fairly tight analysis of the communication complexity of this game, showing an Ω(ab/t) lower bound, which is matched by a deterministic upper bound. The subset-avoidance problem is a natural one. We consider the definition of this game and its analysis, which is not complicated, to be additional conceptual contributions of this work; these might be of independent interest for future applications.

We complement our lower bound with some good news: we give a suite of upper bound results by designing adversarially robust coloring algorithms that handle several interesting parameter regimes. Our focus is on maintaining a valid coloring of the graph using poly(Δ) colors, where Δ is the current maximum degree, as an adversary inserts edges. In fact, some of these results hold even in a turnstile model, where the adversary might both add and delete edges. In this context, it is worth noting that the [ACK19] algorithm also works in a turnstile setting.

Theorem 1.2.

There exist adversarially robust algorithms for coloring an n-vertex graph achieving the following tradeoffs (shown in Table 1) between the space used for processing the stream and the number of colors spent, where Δ denotes the evolving maximum degree of the graph and, in the turnstile setting, m denotes a known upper bound on the stream length.

Model                    Colors    Space                     Notes                   Reference
Insertion-only           O(Δ^3)    Õ(n)                      external random bits    Theorem 5.5
Insertion-only           O(Δ^k)    Õ(n Δ^(1/k))              any k ∈ ℕ               Corollary 5.10
Strict Graph Turnstile   O(Δ^k)    Õ(n^(1−1/k) m^(1/k))      constant k ∈ ℕ          Theorem 5.9

Table 1: A summary of our adversarially robust coloring algorithms. A “strict graph turnstile” model requires the input to describe a simple graph at all times; see Section 3.

In each of these algorithms, for each stream update or query made by the adversary, the probability that the algorithm fails either by returning an invalid coloring or aborting is at most 1/poly(n).

We give a more detailed discussion of these results, including an explanation of the technical caveat noted in Table 1 for the O(Δ^3)-coloring algorithm, in Section 2.2.

1.2 Motivation, Context, and Related Work

Graph streaming has become widely popular [McG14], especially since the advent of large and evolving networks including social media, web graphs, and transaction networks. These large graphs are regularly mined for knowledge and such knowledge often informs their future evolution. Therefore, it is important to have adversarially robust algorithms for working with these graphs. Yet, the recent explosion of interest in robust algorithms has not focused much on graph problems. We now quickly recap some history.

Two influential works [MNS11, HW13] identified the challenge posed by adaptive adversaries to sketching and streaming algorithms. In particular, Hardt and Woodruff [HW13] showed that many statistical problems, including the ubiquitous one of ℓ_2-norm estimation, do not admit adversarially robust linear sketches of sublinear size. Recent works have given a number of positive results. Ben-Eliezer, Jayaram, Woodruff, and Yogev [BJWY20] considered such fundamental problems as distinct elements, frequency moments, and heavy hitters (these date back to the beginnings of the literature on streaming algorithms); for (1+ε)-approximating a function value, they gave two generic frameworks that can “robustify” a standard streaming algorithm, blowing up the space cost by roughly the flip number λ, defined as the maximum number of times the function value can change by a factor of 1+ε over the course of an m-length stream. For insertion-only streams and monotone functions, λ is roughly O(ε^(−1) log m), so this overhead is very small. Subsequent works [HKM20, WZ21, ACSS21] have improved this overhead, with the current best-known bound due to [ACSS21].

For insertion-only graph streams, a number of well-studied problems such as triangle counting, maximum matching size, and maximum subgraph density can be handled by the above framework because the underlying functions are monotone. For some problems such as counting connected components, there are simple deterministic algorithms that achieve an asymptotically optimal space bound, so there is nothing new to say in the robust setting. For graph sparsification, [BHM21] showed that the Ahn–Guha sketch [AG09] can be made adversarially robust with a slight loss in the quality of the sparsifier. Thanks to efficient adversarially robust sampling [BY20, BHM21], many sampling-based graph algorithms should yield corresponding robust solutions without much overhead. For problems calling for Boolean answers, such as testing connectivity or bipartiteness, achieving low error against an oblivious adversary automatically does so against an adaptive adversary as well, since a sequence of correct outputs from the algorithm gives away no information to the adversary. This is a particular case of a more general phenomenon captured by the notion of pseudo-determinism, discussed at the end of this section.

Might it be that for all interesting data streaming problems, efficient standard streaming algorithms imply efficient robust ones? The above framework does not automatically give good results for turnstile streams, where each token specifies either an insertion or a deletion of an item, or for estimating non-monotone functions. In either of these situations, the flip number can be very large. As noted above, linear sketching, which is the preeminent technique behind turnstile streaming algorithms (including ones for graph problems), is vulnerable to adversarial attacks [HW13]. This does not quite provide a separation between standard and robust space complexities, since it does not preclude efficient non-linear solutions. The very recent work [KMNS21] gives such a separation: it exhibits a function estimation problem for which the ratio between the adversarial and standard streaming complexities becomes exponential upon setting parameters appropriately. However, their function is highly artificial, raising the important question: Can a significant gap be shown for a natural streaming problem? (This open question was explicitly raised in the STOC 2021 workshop Robust Streaming, Sketching, and Sampling [Ste21].)

It is easy to demonstrate such a gap in graph streaming. Consider the problem of finding a spanning forest in a graph undergoing edge insertions and deletions. The celebrated Ahn–Guha–McGregor sketch [AGM12] solves this in Õ(n) space, but this sketch is not adversarially robust. Moreover, suppose that 𝒜 is an adversarially robust algorithm for this problem. Then we can argue that the memory state of 𝒜 upon processing an unknown graph G must contain enough information to recover G entirely: an adversary can repeatedly ask for a spanning forest, delete all returned edges, and recurse until the evolving graph becomes empty. Thus, for basic information theoretic reasons, 𝒜 must use Ω(n^2) bits of space, resulting in a quadratic gap between robust and standard streaming space complexities. Arguably, this separation is not very satisfactory, since the hardness arises from the turnstile nature of the stream, allowing the adversary to delete edges. Meanwhile, the [KMNS21] separation does hold for insert-only streams, but as we (and they) note, their problem is rather artificial.
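The recovery argument for spanning forests can be simulated in a few lines. In this sketch, which is an illustration only, an exact union-find routine (`spanning_forest`, a hypothetical stand-in) plays the role of the robust algorithm, and the set `current` models the graph that the algorithm's memory state must implicitly encode. The adversary only ever looks at the returned forests.

```python
def spanning_forest(n, edges):
    """Stand-in oracle: return a spanning forest of the current graph."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = []
    for u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v))
    return forest


def recover_graph(n, hidden_edges):
    """Adversary: query a forest, delete its edges, repeat until empty."""
    current = set(hidden_edges)  # models the graph held by the algorithm
    recovered = set()
    while True:
        forest = spanning_forest(n, current)
        if not forest:           # graph is empty: nothing left to learn
            break
        recovered.update(forest)
        current.difference_update(forest)  # deletion tokens sent by adversary
    return recovered


hidden = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (1, 4)}
recovered = recover_graph(5, hidden)
```

Every non-empty graph has a non-empty spanning forest, so each round removes at least one edge and the loop ends with the full edge set recovered.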

Hardness for Natural Problems.  We now make a simple, yet crucial, observation. Let missing-item-finding (mif) denote the problem where, given an evolving set S ⊆ [n] = {1, …, n}, we must be prepared to return an element in [n] ∖ S or report that none exists. When the elements of S are given as an input stream, mif admits the following polylog(n)-space solution against an oblivious adversary: maintain an ℓ_0-sampling sketch [JST11] for the characteristic vector of [n] ∖ S and use it to randomly sample a valid answer. In fact, this solution extends to turnstile streams. Now suppose that we have an adversarially robust algorithm 𝒜 for mif, handling insert-only streams. Then, given the memory state of 𝒜 after processing an unknown set S, an adaptive adversary can repeatedly query 𝒜 for a missing item x, record x, insert x as the next stream token, and continue until 𝒜 fails to find an item. At that point, the adversary will have recorded (w.h.p.) the set [n] ∖ S, so he can reconstruct S. As before, by basic information theory, this reconstructability implies that 𝒜 uses Ω(n) space.
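The adaptive attack on missing-item-finding can be illustrated with a small simulation. The names `mif_query` and `reconstruct_hidden_set` are hypothetical, the universe is taken to be {0, …, universe_size − 1} for convenience, and an exact solver that simply stores the set stands in for any correct robust algorithm. The point is only that the sequence of answers, and nothing else, determines the hidden set.

```python
def mif_query(universe_size, inserted):
    """Stand-in for a correct MIF algorithm: some missing item, or None."""
    for x in range(universe_size):
        if x not in inserted:
            return x
    return None


def reconstruct_hidden_set(universe_size, hidden_set):
    """Adversary: query a missing item, insert it, repeat until none is left."""
    state = set(hidden_set)  # models what the algorithm's memory must encode
    missing = []
    while (x := mif_query(universe_size, state)) is not None:
        missing.append(x)    # record the answer
        state.add(x)         # feed it back as the next stream token
    # The recorded answers form the complement of the hidden set.
    return set(range(universe_size)) - set(missing), missing


hidden = {1, 4, 7}
reconstructed, missing = reconstruct_hidden_set(10, hidden)
```

Since the adversary learns the hidden set from the memory state alone, that state must distinguish all possible hidden sets, which is the source of the linear space bound.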

This exponential gap between standard and robust streaming, based on well-known results, seems to have been overlooked—perhaps because mif does not conform to the type of problems, namely estimation of real-valued functions, that much of the robust streaming literature has focused on. That said, though mif is a natural problem and the hardness holds for insert-only streams, there is one important box that mif does not tick: it is not important enough on its own and so does not command a serious literature. This leads us to refine the open question of [KMNS21] thus: Can a significant gap be shown for a natural and well-studied problem with the hardness holding even for insertion-only streams?

With this in mind, we return to graph problems, searching for such a gap. In view of the generic framework of [BJWY20] and follow-up works, we should look beyond estimating some monotone function of the graph with scalar output. What about problems where the output is a big vector, such as approximate maximum matching (not just its size) or approximate densest subgraph (not just the density)? It turns out that the sketch switching technique of [BJWY20] can still be applied: since we need to change the output only when the estimates of the associated numerical values (matching size and density, respectively) change enough, we can proceed as in that work, switching to a new sketch with fresh randomness that remains unrevealed to the adversary. This gives us a robust algorithm incurring only logarithmic overhead.

But graph coloring is different. As our Theorem 1.1 shows, it does exhibit a quadratic gap for the right setting of parameters and it is, without doubt, a heavily-studied problem, even in the data streaming setting.

The above hardness of mif provides a key insight into why graph coloring is hard; see Section 2.1.

Connections with Other Work on Streaming Graph Coloring.  Graph coloring is, of course, a heavily-studied problem in theoretical computer science. For this discussion, we stick to streaming algorithms for this problem, which already has a significant literature [BG18, ACKP19, ACK19, BCG20, AA20, BBMU21].

Although it is not possible to χ(G)-color an input graph G in sublinear space [ACKP19], as [ACK19] shows, there is a semi-streaming algorithm that produces a (Δ+1)-coloring. This follows from their elegant palette sparsification theorem, which states that if each vertex samples roughly O(log n) colors from a palette of size Δ+1, then there exists a proper coloring of the graph where each vertex uses a color only from its sampled list. Hence, we only need to store edges between vertices whose lists intersect. If the edges of G are independent of the algorithm’s randomness, then the expected number of such “conflict” edges is Õ(n), leading to a semi-streaming algorithm. But note that an adaptive adversary can attack this algorithm by using a reported coloring to learn which future edges would definitely be conflict edges and inserting such edges to blow up the algorithm’s storage.
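The storage rule behind palette sparsification can be sketched as follows. This is a toy illustration under assumed parameters (the list size, the seed, and the helper names `sample_color_lists` and `conflict_edges` are all choices made here, not taken from [ACK19]): each vertex draws a small random list from the palette, and an edge is kept only if its endpoints' lists intersect.

```python
import random
from itertools import combinations


def sample_color_lists(n, palette_size, list_size, seed=0):
    """Each vertex samples `list_size` distinct colors from the palette."""
    rng = random.Random(seed)
    k = min(list_size, palette_size)
    return [set(rng.sample(range(palette_size), k)) for _ in range(n)]


def conflict_edges(edges, lists):
    """Keep only edges whose endpoints share at least one sampled color."""
    return [(u, v) for (u, v) in edges if lists[u] & lists[v]]


n = 12
edges = list(combinations(range(n), 2))  # complete graph, so Δ = n - 1
lists = sample_color_lists(n, palette_size=n, list_size=3)
stored = conflict_edges(edges, lists)
```

An edge whose endpoints have disjoint lists can never be monochromatic under a list coloring, which is why it is safe to discard. The attack described above works because a reported coloring reveals one color from each vertex's list, letting the adversary aim new edges at pairs that are guaranteed to be stored.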

There are some other semi-streaming algorithms (in the standard setting) that aim for Δ(1+ε)-colorings. One is palette-sparsification based [AA20] and so suffers from the above vulnerability against an adaptive adversary. Others [BG18, BCG20] are based on randomly partitioning the vertices into clusters and storing only intra-cluster edges, using pairwise disjoint palettes for the clusters. Here, the semi-streaming space bound hinges on the random partition being likely to assign each edge’s endpoints to different clusters. This can be broken by an adaptive adversary, who can use a reported coloring to learn many vertex pairs that are intra-cluster and then insert new edges at such pairs.

Finally, we highlight an important theoretical question about sublinear algorithms for graph coloring: Can they be made deterministic? This was explicitly raised by Assadi [Ass18] and, prior to this work, it was open whether, for (Δ+1)-coloring, any sublinear space bound could be obtained deterministically. Our Theorem 1.1 settles the deterministic space complexity of this problem, showing that even the weaker requirement of O(Δ)-coloring forces Ω(nΔ) space, which is linear in the input size.

Parameterizing Theorem 1.1 differently, we see that a robust (in particular, a deterministic) algorithm that is limited to semi-streaming space must spend Ω̃(Δ^2) colors. A major remaining open question is whether this can be matched, perhaps by a deterministic semi-streaming O(Δ^2)-coloring algorithm. In fact, it is not known how to get even a poly(Δ)-coloring deterministically. Our algorithmic results, summarized in Theorem 1.2, make partial progress on this question. Though we do not obtain deterministic algorithms, we obtain adversarially robust ones, and we do obtain poly(Δ)-colorings, though not all the way down to O(Δ^2) in semi-streaming space.

Other Related Work.  Pseudo-deterministic streaming algorithms [GGMW20] fall between adversarially robust and deterministic ones. Such an algorithm is allowed randomness, but for each particular input stream it must produce one fixed output (or output sequence) with high probability. Adversarial robustness is automatic, because when such an algorithm succeeds, it does not reveal any of its random bits through the outputs it gives. Thus, there is nothing for an adversary to base adaptive decisions on.

The well-trodden subject of dynamic graph algorithms deals with a model closely related to the adaptive adversary model: one receives a stream of edge insertions/deletions and seeks to maintain a solution after each update. There have been a few works on the Δ-based graph coloring problem in this setting [BCHN18, BGK19, HP20]. However, the focus of the dynamic setting is on optimizing the update time without any restriction on the space usage; this is somewhat orthogonal to the streaming setting where the primary goal is space efficiency, and update time, while practically important, is not factored into the complexity.

2 Overview of Techniques

2.1 Lower Bound Techniques

As might be expected, our lower bounds are best formalized through communication complexity. Recall that a typical communication-to-streaming reduction for proving a one-pass streaming space lower bound works as follows. We set up a communication game for Alice and Bob to solve, using one message from Alice to Bob. Suppose that Alice and Bob have inputs x and y in this game. The players simulate a purported efficient streaming algorithm 𝒜 (for P, the problem of interest) by having Alice feed some tokens into 𝒜 based on x, communicating the resulting memory state of 𝒜 to Bob, having Bob continue feeding tokens into 𝒜 based on y, and finally querying 𝒜 for an answer to P, based on which Bob can give a good output in the communication game. When this works, it follows that the space used by 𝒜 must be at least the one-way (and perhaps randomized) communication complexity of the game. Note, however, that this style of argument, where it is possible to solve the game by querying the algorithm only once, is also applicable to an oblivious adversary setting. Therefore, it cannot prove a lower bound any higher than the standard streaming complexity of P.

The way to obtain stronger lower bounds by using the purported adversarial robustness of 𝒜 is to design communication protocols where Bob, after receiving Alice’s message, proceeds to query 𝒜 repeatedly, feeding tokens into 𝒜 based on answers to such queries. In fact, in the communication games we shall use for our reductions, Bob will not have any input at all and the goal of the game will be for Bob to recover information about Alice’s input, perhaps indirectly. It should be clear that the lower bound for the mif problem, outlined in Section 1.2, can be formalized in this manner. For our main lower bound (Theorem 1.1), we use a communication game that can be seen as a souped-up version of mif.

The Subset-Avoidance Problem.  Recall the subset-avoidance problem described in Section 1.1 and denote it avoid(t, a, b). To restate: Alice is given a set A ⊆ [t] of size a and must induce Bob to output a set B ⊆ [t] of size b such that A ∩ B = ∅. The one-way communication complexity of this game can be lower bounded from first principles. Since each output of Bob is compatible with only (t−b choose a) possible input sets of Alice, she cannot send the same message on more than that many inputs. Therefore, she must be able to send roughly (t choose a)/(t−b choose a) distinct messages for a protocol to succeed with high probability. The number of bits she must communicate in the worst case is roughly the logarithm of this ratio, which we show is Ω(ab/t). Interestingly, this lower bound is tight and can in fact be matched by a deterministic protocol, as shown in Lemma 4.2.
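The counting heuristic above is easy to check numerically. The sketch below uses assumed symbol names (t for the universe size, a for the size of Alice's set, b for the size of Bob's output) and compares the logarithm of the ratio of binomial coefficients against the simpler quantity ab/(t ln 2). The comparison rests on the elementary inequality (t−i)/(t−b−i) ≥ t/(t−b) ≥ e^(b/t) applied to each of the a factors of the ratio; it illustrates only the counting step, not the full randomized lower bound proved in the paper.

```python
import math


def counting_bound_bits(t, a, b):
    """log2 of C(t, a) / C(t - b, a), the message count from the heuristic."""
    if a + b > t:
        raise ValueError("need a + b <= t so that Bob can avoid Alice's set")
    return math.log2(math.comb(t, a)) - math.log2(math.comb(t - b, a))


def simple_bound_bits(t, a, b):
    """The cleaner lower estimate a*b / (t * ln 2), i.e. Omega(ab/t) bits."""
    return a * b / (t * math.log(2))


t, a, b = 1000, 100, 200
exact = counting_bound_bits(t, a, b)
simple = simple_bound_bits(t, a, b)
```

For these hypothetical parameters the simple estimate is a little under 29 bits, and the exact logarithm of the ratio is at least as large.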

In the sequel, we shall need to consider a direct sum version of this problem that we call avoid^k(t, a, b), where Alice has a list of k subsets and Bob must produce his own list of k subsets, with his i-th subset avoiding the i-th subset of Alice. We extend our lower bound argument to show that the one-way complexity of avoid^k(t, a, b) is Ω(kab/t).

Using Graph Coloring to Solve Subset-Avoidance.  To explain how we reduce the subset-avoidance problem to graph coloring, we focus on a special case of Theorem 1.1 first. Suppose we have an adversarially robust (Δ+1)-coloring streaming algorithm 𝒜. We describe a protocol for solving subset-avoidance using 𝒜. Let us set t = (n choose 2) to have the universe correspond to all possible edges of an n-vertex graph. Suppose Alice’s set A has size a = Θ(n^2). We show that, given a set of n vertices, Alice can use public randomness to randomly map her elements to the set of vertex-pairs so that the corresponding edges induce a graph G that, w.h.p., has max-degree Δ = Θ(n). Alice proceeds to feed the edges of G into 𝒜 and then sends Bob the state of 𝒜.

Bob now queries 𝒜 to obtain a (Δ+1)-coloring of G. Then, he pairs up like-colored vertices to obtain a maximal pairing. Observe that he can pair up all but at most one vertex from each color class. Thus, he obtains at least (n−Δ−1)/2 such pairs. Since each pair is monochromatic, its two vertices don’t share an edge, and hence, Bob has retrieved missing edges that correspond to elements absent in Alice’s set. Since Alice used public randomness for the mapping, Bob knows exactly which elements these are. He now forms a matching with these pairs and inserts the edges into 𝒜. Once again, he queries 𝒜 to find a coloring of the modified graph. Observe that the matching can increase the max-degree of the original graph by at most 1. Therefore, this new coloring uses at most Δ+2 colors. Thus, Bob would retrieve at least (n−Δ−2)/2 new missing edges. He again adds to the graph the matching formed by those edges and queries 𝒜. It is crucial to note here that he can repeatedly do this and expect 𝒜 to output a correct coloring because of its adversarial robustness. Bob stops once the max-degree reaches n−1, since now the algorithm can color each vertex with a distinct color, preventing him from finding a missing edge.

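Summing up the sizes of all the matchings added by Bob, we see that he has found at least Σ_{i=Δ+1}^{n−1} ⌊(n−i)/2⌋ = Θ((n−Δ)^2) elements missing from Alice’s set. Since Δ ≤ cn for a constant c < 1 in this special case, this is Θ(n^2). Thus, Alice and Bob have solved the subset-avoidance problem where t = (n choose 2) and both set sizes are Θ(n^2). As outlined above, this requires Ω(n^2) communication. Hence, 𝒜 must use at least Ω(n^2) = Ω(nΔ) space.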
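Bob's side of the reduction can be simulated directly. In the sketch below, a deterministic greedy coloring (`greedy_coloring`, a hypothetical stand-in) plays the role of the robust algorithm; this is legitimate for illustration because, as noted in the introduction, a deterministic algorithm is automatically robust. The function `bob_find_non_edges` is likewise an assumed name. Bob repeatedly queries a coloring, pairs up like-colored vertices, and inserts the resulting matching as new edges.

```python
from itertools import combinations


def greedy_coloring(n, adj):
    """Deterministic stand-in for the robust coloring oracle."""
    color = [None] * n
    for v in range(n):
        used = {color[w] for w in adj[v] if color[w] is not None}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def bob_find_non_edges(n, alice_edges):
    adj = [set() for _ in range(n)]
    for u, v in alice_edges:
        adj[u].add(v)
        adj[v].add(u)

    found = []
    while True:
        color = greedy_coloring(n, adj)  # query the oracle
        classes = {}
        for v, c in enumerate(color):
            classes.setdefault(c, []).append(v)
        # Pair up like-colored vertices; at most one per class is left over.
        matching = [(cls[i], cls[i + 1])
                    for cls in classes.values()
                    for i in range(0, len(cls) - 1, 2)]
        if not matching:                 # every vertex has its own color
            break
        for u, v in matching:            # insert the matching as new edges
            adj[u].add(v)
            adj[v].add(u)
        found.extend(matching)
    return found


n = 6
alice_edges = {(0, 1), (0, 2), (2, 3), (4, 5)}
found = bob_find_non_edges(n, alice_edges)
all_pairs = set(combinations(range(n), 2))
```

Each pair is monochromatic in a valid coloring of the current graph, so it is a non-edge of Alice's graph and also differs from every pair Bob inserted earlier. With this particular greedy oracle the loop only stops at the complete graph, so Bob ends up listing every non-edge; a general (Δ+1)-coloring oracle gives the weaker per-round guarantee described in the text.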

With some further work, we can generalize the above argument to work for general values of Δ. For this generalization, we use the communication complexity of the direct sum version of subset-avoidance for suitable parameter settings. With more rigorous analysis, we can further generalize the result to apply not only to (Δ+1)-coloring algorithms but to any f(Δ)-coloring algorithm. That is, we can prove Theorem 4.3.

2.2 Upper Bound Techniques

It is useful to outline our algorithms in an order different from the presentation in Section 5.

A Sketch-Switching-Based O(Δ^2)-Coloring.  The main challenge in designing an adversarially robust coloring algorithm is that the adversary can compel the algorithm to change its output at every point in the stream: he queries the algorithm, examines the returned coloring, and inserts an edge between two vertices of the same color. Indeed, the sketch switching framework of [BJWY20] shows that for function estimation, one can get around this power of the adversary as follows. Start with a basic (i.e., oblivious-adversary) sketch for the problem at hand. Then, to deal with an adaptive adversary, run multiple independent basic sketches in parallel, changing outputs only when forced to because the underlying function has changed significantly. More precisely, maintain λ independent parallel sketches where λ is the flip number, defined as the maximum number of times the function value can change by the desired approximation factor over the course of the stream. Keep track of which sketch is currently being used to report outputs to the adversary. Upon being queried, re-use the most recently given output unless forced to change, in which case discard the current sketch and switch to the next in the list of λ sketches. Notice that this keeps the adversary oblivious to the randomness being used to compute future outputs: as soon as our output reveals any information about the current sketch, we discard it and never use it again to process a stream element.

This way of switching to a new sketch only when forced to ensures that λ sketches suffice, which is great for function estimation. However, since a graph coloring output can be forced to change at every point in a stream of length m, naively implementing this idea would require m parallel sketches, incurring a factor of m in space. We have to be more sophisticated. We combine the above idea with a chunking technique so as to reduce the number of times we need to switch sketches.

Suppose we split the m-length stream into k chunks, each of size m/k. We initialize k parallel sketches of a standard streaming (Δ+1)-coloring algorithm 𝒞, to be used one at a time as each chunk ends. We store (buffer) an entire chunk explicitly and when we reach its end, we say we have reached a “checkpoint,” use a fresh copy of 𝒞 to compute a (Δ+1)-coloring of the entire graph at that point, delete the chunk from our memory, and move on to store the next chunk. When a query arrives, we deterministically compute a (Δ+1)-coloring of the partial chunk in our buffer and “combine” it with the coloring we computed at the last checkpoint. The combination uses at most (Δ+1)^2 = O(Δ^2) colors. Since a single copy of 𝒞 takes Õ(n) space, the total space used by the sketches is Õ(nk). Buffering a chunk uses an additional Õ(m/k) space. Setting k to be √(m/n), we get the total space usage to be Õ(√(mn)) = Õ(n√Δ), since m = O(nΔ).
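The "combine" step can be read as taking the product of two colorings: give each vertex the pair (checkpoint color, buffer color). The sketch below is an illustration under that reading; the names `greedy_coloring` and `combine_colorings` are hypothetical, and a greedy coloring stands in for the output of the streaming sketch at the checkpoint.

```python
def greedy_coloring(n, edges):
    """Greedy coloring with at most (max degree + 1) colors."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    color = [None] * n
    for v in range(n):
        used = {color[w] for w in adj[v] if color[w] is not None}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def combine_colorings(checkpoint_coloring, buffer_coloring):
    """Product coloring: each vertex gets the pair of its two colors."""
    return list(zip(checkpoint_coloring, buffer_coloring))


n = 5
checkpoint_edges = [(0, 1), (1, 2), (2, 3)]  # graph at the last checkpoint
buffer_edges = [(0, 2), (3, 4), (1, 3)]      # edges buffered since then
checkpoint_coloring = greedy_coloring(n, checkpoint_edges)  # sketch stand-in
buffer_coloring = greedy_coloring(n, buffer_edges)          # deterministic
combined = combine_colorings(checkpoint_coloring, buffer_coloring)
max_degree = 3  # maximum degree of the union graph in this example
```

An edge from before the checkpoint separates its endpoints in the first coordinate, and a buffered edge separates them in the second, so the pair coloring is valid for the union. With at most Δ+1 values per coordinate, at most (Δ+1)^2 pairs can occur.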

Handling edge deletions is more delicate. This is because we can no longer express the current graph as a union of (the graph up to the most recent checkpoint) and (the buffered subgraph) as above. A chunk may now contain an update that deletes an edge which was inserted before the checkpoint, and hence, is not in store. Observe, however, that deleting an edge doesn’t violate the validity of a coloring. Hence, if we ignore these edge deletions, the only worry is that they might substantially reduce the maximum degree causing us to use many more colors than desired. Now, note that if we have a -coloring at the checkpoint, then as long as the current maximum degree remains above , we have a -coloring in store. Hence, combining that with a -coloring of the current chunk gives an -coloring. Furthermore, we can keep track of the maximum degree of the graph using only space and detect the points where it falls below half of what it was at the last checkpoint. We declare each such point as a new “ad hoc checkpoint,” i.e., use a fresh sketch to compute a -coloring there. Since the max-degree can decrease by a factor of at most times, we show that it suffices to have only times more parallel sketches initialized at the beginning of the stream. This incurs only an -factor overhead in space. We discuss the algorithm and its analysis in detail in Algorithm 3 and Lemma 5.8 respectively.

To generalize the above to an -coloring in space, we use recursion in a manner reminiscent of streaming coreset construction algorithms. Split the stream into chunks, each of size . Now, instead of storing a chunk entirely and coloring it deterministically, we can recursively color it with colors in space and combine the coloring with the -coloring at the last checkpoint. The recursion makes the analysis of this algorithm even more delicate, and careful work is needed to argue the space usage and to properly handle deletions in the turnstile setting. The details appear in Theorem 5.9.

A Palette-Sparsification-Based $O(\Delta^3)$-Coloring.  This algorithm uses a different approach to the problem of the adversary forcing color changes. It ensures that, every time an edge is added, one of its endpoints is randomly recolored, where the new color is drawn uniformly from a palette determined by the degree of the endpoint, excluding the colors currently held by neighboring vertices. Let $R_v$ denote the random string that drives this color-choosing process at vertex $v$. When the adversary inserts an edge $\{u,v\}$, the algorithm uses $R_u$ and $R_v$ to determine whether this edge could with significant probability end up with the same vertex color on both ends in the future. If so, the algorithm stores the edge; if not, it can be ignored entirely. It will turn out that when the number of colors is set to establish an $O(\Delta^3)$-coloring, only an $\widetilde{O}(1/\Delta)$ fraction of edges need to be stored, so the algorithm only needs to store $\widetilde{O}(n)$ bits of data related to the input. The proof of this storage bound has to contend with an adaptive adversary. We do so by first arguing that despite this adaptivity, the adversary cannot cause the algorithm to use more storage than the worst oblivious adversary could have. We can then complete the proof along traditional lines, using concentration bounds. The details appear in Algorithm 2 and Theorem 5.5.
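The edge-filtering idea in this paragraph can be illustrated with a deliberately simplified Python sketch. It keeps a single random palette per vertex and stores an edge only when the two endpoint palettes intersect, since only then could both endpoints ever receive the same color. The single-level palettes, the palette size, and the names `sample_palettes` and `filter_edges` are assumptions made for illustration; the algorithm in the paper uses degree-dependent levels and recoloring that this sketch omits.

```python
import random


def sample_palettes(n, palette_size, num_colors, rng):
    """One random palette per vertex, sampled with replacement (assumed sizes)."""
    return [
        frozenset(rng.randrange(num_colors) for _ in range(palette_size))
        for _ in range(n)
    ]


def filter_edges(edges, palettes):
    """Keep only edges whose endpoints could collide on some palette color."""
    stored = []
    for u, v in edges:
        if palettes[u] & palettes[v]:
            stored.append((u, v))
    return stored


if __name__ == "__main__":
    rng = random.Random(0)
    n = 50
    palettes = sample_palettes(n, palette_size=4, num_colors=2500, rng=rng)
    edges = [(u, v) for u in range(n) for v in range(u + 1, n)]
    stored = filter_edges(edges, palettes)
    print(f"stored {len(stored)} of {len(edges)} edges")
```

With palettes of size $s$ drawn from $C$ colors, a union bound puts the chance that two palettes intersect at no more than $s^2/C$, which is why a large color space lets most edges be dropped.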

There is a technical caveat here. The random string used at each vertex is about bits long. Thus, the algorithm can only be called semi-streaming if we agree that these random bits do not count towards the storage cost. In the standard streaming setting, this “randomness cost” is not a concern, for we can use the standard technique of invoking Nisan’s space-bounded pseudorandom generator [Nis90] to argue that the necessary bits can be generated on the fly and never stored. Unfortunately, it is not clear that this transformation preserves adversarial robustness. Despite this caveat, the algorithmic result is interesting as a contrast to our lower bounds, because the lower bounds do apply even in a model where random bits are free, and only actually computed input-dependent bits count towards the space complexity.

3 Preliminaries

Defining Adversarial Robustness.  For the purposes of this paper, a “streaming algorithm” is always one-pass and we always think of it as working against an adversary. In the standard streaming setting, this adversary is oblivious to the algorithm’s actual run. This can be thought of as a special case of the setup we now introduce in order to define adversarially robust streaming algorithms.

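Let $\mathcal{U}$ be a universe whose elements are called tokens. A data stream is a sequence in $\mathcal{U}^*$. A data streaming problem is specified by a relation $f \subseteq \mathcal{U}^* \times \mathcal{Z}$ where $\mathcal{Z}$ is some output domain: for each input stream $\sigma \in \mathcal{U}^*$, a valid solution is any $z \in \mathcal{Z}$ such that $(\sigma, z) \in f$. A randomized streaming algorithm $\mathcal{A}$ for $f$ running in $s$ bits of space and using $r$ random bits is formalized as a triple $(\mathrm{Init}, \mathrm{Proc}, \mathrm{Out})$ consisting of (i) a function $\mathrm{Init} \colon \{0,1\}^r \to \{0,1\}^s$, (ii) a function $\mathrm{Proc} \colon \{0,1\}^s \times \mathcal{U} \times \{0,1\}^r \to \{0,1\}^s$, and (iii) a function $\mathrm{Out} \colon \{0,1\}^s \times \{0,1\}^r \to \mathcal{Z}$. Given an input stream $\sigma = (x_1, \ldots, x_m)$ and a random string $R \in \{0,1\}^r$, the algorithm starts in state $w_0 = \mathrm{Init}(R)$, goes through a sequence of states $w_1, \ldots, w_m$, where $w_i = \mathrm{Proc}(w_{i-1}, x_i, R)$, and provides an output $z = \mathrm{Out}(w_m, R)$. The algorithm is $\delta$-error in the standard sense if $\Pr_R[(\sigma, z) \in f] \ge 1 - \delta$.

To define adversarially robust streaming, we set up a game between two players: Solver, who runs an algorithm as above, and Adversary, who adaptively generates a stream $\sigma = (x_1, \ldots, x_m)$ using a next-token function $g \colon \mathcal{Z}^* \to \mathcal{U}$ as follows. With $w_i$ as above, put $x_i = g(z_0, \ldots, z_{i-1})$ and $z_i = \mathrm{Out}(w_i, R)$. In words, Adversary is able to query the algorithm at each point of time and can compute an arbitrary deterministic function of the history of outputs provided by the algorithm to generate his next token. Fix (an upper bound on) the stream length $m$. Algorithm $\mathcal{A}$ is $\delta$-error adversarially robust if

$$\forall\, g \colon\ \Pr_R\big[\forall\, i \in [m] \colon ((x_1, \ldots, x_i), z_i) \in f\big] \ge 1 - \delta.$$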

In this work, we prove lower bounds for algorithms that are only required to be -error adversarially robust. On the other hand, the algorithms we design will achieve vanishingly small error of the form and moreover, they will be able to detect when they are about to err and can abort at that point.
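The Solver/Adversary game can be made concrete with a small Python sketch. The adversary below follows the strategy described earlier for coloring: it reads the latest coloring and inserts an edge between two like-colored vertices, subject to a degree cap. The solver is a deterministic store-everything greedy colorer, chosen only because deterministic algorithms are automatically robust; it is not space-efficient. The names `StoreEverythingColorer` and `run_game` are hypothetical.

```python
class StoreEverythingColorer:
    """Deterministic solver: keeps the whole graph and colors it greedily."""

    def __init__(self, n):
        self.n = n
        self.adj = [set() for _ in range(n)]

    def insert(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

    def query(self):
        color = [None] * self.n
        for v in range(self.n):
            used = {color[w] for w in self.adj[v] if color[w] is not None}
            # A vertex has at most n - 1 neighbors, so a free color exists.
            color[v] = next(c for c in range(self.n) if c not in used)
        return color


def run_game(algo, n, max_degree, max_rounds):
    """Play the adaptive game; return (all outputs valid?, edges inserted)."""
    edges, deg, inserted = [], [0] * n, 0
    for _ in range(max_rounds):
        coloring = algo.query()
        if any(coloring[u] == coloring[v] for u, v in edges):
            return False, inserted
        # Next-token function: depends only on the output just observed.
        pick = next(
            ((u, v) for u in range(n) for v in range(u + 1, n)
             if coloring[u] == coloring[v]
             and deg[u] < max_degree and deg[v] < max_degree),
            None,
        )
        if pick is None:
            break
        u, v = pick
        algo.insert(u, v)
        edges.append(pick)
        deg[u] += 1
        deg[v] += 1
        inserted += 1
    final = algo.query()
    return all(final[u] != final[v] for u, v in edges), inserted
```

Every edge the adversary inserts is monochromatic under the previous output, so each insertion forces the solver to change its answer.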

Graph Streams and the Coloring Problem.  Throughout this paper, an insert-only graph stream describes an undirected graph $G$ on the vertex set $[n]$, for some fixed $n$ that is known in advance, by listing its edges in some order: each token is an edge. A strict graph turnstile stream describes an evolving graph $G$ by using two types of tokens: an insertion token for an edge $\{u,v\}$, which causes $\{u,v\}$ to be added to $G$, and a deletion token for $\{u,v\}$, which causes $\{u,v\}$ to be removed. Such a stream satisfies the promises that each insertion is of an edge that was not already in $G$ and that each deletion is of an edge that was in $G$. When we use the term “graph stream” without qualification, it should be understood to mean an insert-only graph stream, unless the context suggests that either flavor is acceptable.
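The two promises of a strict graph turnstile stream are easy to state as code. The following Python sketch replays a token sequence and raises an error when a promise is violated. The token encoding `("ins", u, v)` / `("del", u, v)`, the 0-based vertex labels, and the function name `apply_strict_turnstile` are assumptions made for illustration.

```python
def apply_strict_turnstile(n, tokens):
    """Replay a strict turnstile stream on vertices 0..n-1; return the edge set.

    Raises ValueError if an insertion repeats a present edge or a deletion
    targets an absent edge, i.e. if the 'strict' promise is broken.
    """
    edges = set()
    for op, u, v in tokens:
        if not (0 <= u < n and 0 <= v < n) or u == v:
            raise ValueError(f"bad edge ({u}, {v})")
        e = (min(u, v), max(u, v))  # undirected: normalize endpoint order
        if op == "ins":
            if e in edges:
                raise ValueError(f"insertion of existing edge {e}")
            edges.add(e)
        elif op == "del":
            if e not in edges:
                raise ValueError(f"deletion of absent edge {e}")
            edges.remove(e)
        else:
            raise ValueError(f"unknown token type {op!r}")
    return edges
```

An insert-only graph stream is the special case in which every token has type `"ins"`.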

In this context, a semi-streaming algorithm is one that runs in $\widetilde{O}(n)$ bits of space, i.e., $O(n \operatorname{polylog} n)$ bits.

In the $K$-coloring problem, the input is a graph stream and a valid answer to a query is a vector in $[K]^n$ specifying a color for each vertex such that no two adjacent vertices receive the same color. The quantity $K$ may be given as a function of some graph parameter, such as the maximum degree $\Delta$. In reading the results in this paper, it will be helpful to think of $\Delta$ as a growing but sublinear function of $n$, such as $n^{\alpha}$ for $0 < \alpha < 1$. Since an output of the $K$-coloring problem is an $\Omega(n)$-sized object, we think of a semi-streaming coloring algorithm running in $\widetilde{O}(n)$ space as having “essentially optimal” space usage.

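One-Way Communication Complexity.  In this work, we shall only consider a special kind of two-player communication game: one where all input belongs to the speaking player Alice and her goal is to induce Bob to produce a suitable output. Such a game, $g$, is given by a relation $g \subseteq \mathcal{X} \times \mathcal{Z}$, where $\mathcal{X}$ is the input domain and $\mathcal{Z}$ is the output domain. In a protocol $\Pi$ for $g$, Alice and Bob share a random string $R$. Alice is given $x \in \mathcal{X}$ and sends Bob a message $\mathrm{msg}(x, R)$. Bob uses this to compute an output $z = \mathrm{out}(\mathrm{msg}(x, R), R)$. We say that $\Pi$ solves $g$ to error $\delta$ if $\forall\, x \in \mathcal{X} \colon \Pr_R[(x, z) \in g] \ge 1 - \delta$. The communication cost of $\Pi$ is $\mathrm{cost}(\Pi) := \max_{x, R} \mathrm{length}(\mathrm{msg}(x, R))$. The (one-way, randomized, public-coin) $\delta$-error communication complexity of $g$ is $R^{\to}_{\delta}(g) := \min\{\mathrm{cost}(\Pi) \colon \Pi \text{ solves } g \text{ to error } \delta\}$.

If $\Pi$ never uses $R$, it is deterministic. Minimizing over zero-error deterministic protocols gives us the one-way deterministic communication complexity of $g$, denoted $D^{\to}(g)$.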

A Result on Random Graphs.  During the proof of our main lower bound (in Section 4.2), we shall need the following basic lemma on the maximum degree of a random graph.

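Lemma 3.1.

Let $G$ be a graph with $m$ edges and $n$ vertices, drawn uniformly at random. Define $\Delta_G$ to be its maximum degree. Then for $0 \le \varepsilon \le 1$:

$$\Pr\left[\Delta_G \ge \frac{2m}{n}(1+\varepsilon)\right] \le 2n \exp\left(-\frac{\varepsilon^2}{3} \cdot \frac{2m}{n}\right). \tag{1}$$

Proof.

Let $\mathcal{G}(n,m)$ be the uniform distribution over graphs with $m$ edges and $n$ vertices. Observe the monotonicity property that for all $m$ and all thresholds $\ell$, $\Pr_{G \sim \mathcal{G}(n,m)}[\Delta_G \ge \ell] \le \Pr_{G \sim \mathcal{G}(n,m+1)}[\Delta_G \ge \ell]$. Next, let $\mathcal{H}(n,p)$ be the distribution over graphs on $n$ vertices in which each edge is included with probability $p$, independently of any others, and let $e(G)$ be the number of edges of a given graph $G$. Then with $p = m / \binom{n}{2}$,

$$\Pr_{G \sim \mathcal{G}(n,m)}[\Delta_G \ge \ell] \le \Pr_{H \sim \mathcal{H}(n,p)}[\Delta_H \ge \ell \mid e(H) \ge m] \le \frac{\Pr_{H \sim \mathcal{H}(n,p)}[\Delta_H \ge \ell]}{\Pr_{H \sim \mathcal{H}(n,p)}[e(H) \ge m]} \le 2 \Pr_{H \sim \mathcal{H}(n,p)}[\Delta_H \ge \ell].$$

The last step follows from the well-known fact that the median of a binomial distribution equals its expectation when the latter is integral; hence $\Pr[e(H) \ge m] \ge 1/2$.

Taking $\ell = (2m/n)(1+\varepsilon)$ and using a union bound and Chernoff’s inequality,

$$\Pr_{H \sim \mathcal{H}(n,p)}[\Delta_H \ge \ell] \le n \Pr\left[\mathrm{Bin}(n-1, p) \ge (1+\varepsilon)\frac{2m}{n}\right] \le n \exp\left(-\frac{\varepsilon^2}{3} \cdot \frac{2m}{n}\right). \qquad ∎$$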
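As a sanity check on this kind of statement, one can compare the empirical maximum degree of a uniformly random $m$-edge graph with a union-plus-Chernoff bound. The Python sketch below assumes the bound has the form $2n \exp(-\varepsilon^2 \mu / 3)$ with $\mu = 2m/n$ the average degree; that closed form and the helper names are assumptions of this sketch, not quoted from the lemma.

```python
import math
import random
from itertools import combinations


def max_degree_of_random_graph(n, m, rng):
    """Sample m distinct edges uniformly at random; return the max degree."""
    edges = rng.sample(list(combinations(range(n), 2)), m)
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return max(deg)


def chernoff_union_bound(n, m, eps):
    """Assumed bound on Pr[max degree >= (1 + eps) * 2m/n]."""
    mu = 2 * m / n  # average degree
    return 2 * n * math.exp(-eps * eps * mu / 3)


def empirical_tail(n, m, eps, trials, rng):
    """Fraction of sampled graphs whose max degree reaches the threshold."""
    threshold = (1 + eps) * 2 * m / n
    hits = sum(
        max_degree_of_random_graph(n, m, rng) >= threshold
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)
    n, m, eps = 60, 600, 0.8
    print("empirical:", empirical_tail(n, m, eps, trials=200, rng=rng))
    print("bound:    ", chernoff_union_bound(n, m, eps))
```

The bound is only informative when the average degree is large compared with $\ln n$; for sparse graphs the right-hand side exceeds 1.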

Algorithmic Results From Prior Work.  Our adversarially robust graph coloring algorithms in Section 5.2 will use, as subroutines, some previously known standard streaming algorithms for coloring. We summarize the key properties of these existing algorithms.

Fact 3.1 (Restatement of [ACK19], Result 2).

There is a randomized turnstile streaming algorithm for $(\Delta+1)$-coloring a graph with max-degree $\Delta$ in the oblivious adversary setting that uses $\widetilde{O}(n)$ bits of space and $\widetilde{O}(n)$ random bits. The failure probability can be made at most $1/n^p$ for any large constant $p$. ∎

In the adversarial model described above, we need to answer a query after each stream update. The algorithm mentioned in Fact 3.1 or other known algorithms using “about” $\Delta$ colors (e.g., [BCG20]) use at least post-processing time in the worst case to answer a query. Hence, using such algorithms in the adaptive adversary setting might be inefficient. We observe, however, that at least for insert-only streams, there exists an algorithm that is efficient in terms of both space and time. This is obtained by combining the algorithms of [BCG20] and [HP20] (see the discussion towards the end of Section 5.2 for details).

Fact 3.2.

In the oblivious adversary setting, there is a randomized streaming algorithm that receives a stream of edge insertions of a graph with max-degree and degeneracy and maintains a proper coloring of the graph using colors, space, and amortized update time. The failure probability can be made at most for any large constant . ∎

4 Hardness of Adversarially Robust Graph Coloring

In this section, we prove our first major result, showing that graph coloring is significantly harder when working against an adaptive adversary than it is in the standard setting of an oblivious adversary. We carry out the proof plan outlined in Section 2.1, first describing and analyzing our novel communication game of subset-avoidance (henceforth, avoid) and then reducing the avoid problem to robust coloring.

4.1 The Subset Avoidance Problem

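Let $\mathrm{avoid}(t, a, b)$ denote the following one-way communication game.

  • Alice is given $S \subseteq [t]$ with $|S| = a$;

  • Bob must produce $T \subseteq [t]$ with $|T| = b$ for which $T$ is disjoint from $S$.

Let $\mathrm{avoid}^k(t, a, b)$ be the problem of simultaneously solving $k$ instances of $\mathrm{avoid}(t, a, b)$.

Lemma 4.1.

The public-coin $\delta$-error communication complexity of $\mathrm{avoid}^k(t, a, b)$ is bounded thus:

$$R^{\to}_{\delta}(\mathrm{avoid}^k(t,a,b)) \ge \log(1-\delta) + k \log\left(\binom{t}{a} \Big/ \binom{t-b}{a}\right) \tag{2}$$

$$\ge \log(1-\delta) + \frac{kab}{t \ln 2}. \tag{3}$$

Proof.

Let $\Pi$ be a $\delta$-error protocol for $\mathrm{avoid}^k(t,a,b)$ and let $d = \mathrm{cost}(\Pi)$, as defined in Section 3. Since, for each input $(S_1, \ldots, S_k) \in \binom{[t]}{a}^k$, the error probability of $\Pi$ on that input is at most $\delta$, there must exist a fixing of the random coins of $\Pi$ so that the resulting deterministic protocol $\Pi'$ is correct on all inputs in a set $\mathcal{C} \subseteq \binom{[t]}{a}^k$ with $|\mathcal{C}| \ge (1-\delta)\binom{t}{a}^k$.

The protocol $\Pi'$ is equivalent to a function $\phi \colon \mathcal{C} \to \binom{[t]}{b}^k$ where

  • the range size $|\mathrm{Im}(\phi)| \le 2^d$, because $\mathrm{cost}(\Pi') \le d$, and

  • for each $(S_1, \ldots, S_k) \in \mathcal{C}$, the tuple $(T_1, \ldots, T_k) := \phi(S_1, \ldots, S_k)$ is a correct output for Bob, i.e., $S_i \cap T_i = \varnothing$ for each $i \in [k]$.

For any fixed $(T_1, \ldots, T_k) \in \binom{[t]}{b}^k$, the set of all $(S_1, \ldots, S_k) \in \binom{[t]}{a}^k$ for which each coordinate $S_i$ is disjoint from the corresponding $T_i$ is precisely the set $\binom{[t] \setminus T_1}{a} \times \cdots \times \binom{[t] \setminus T_k}{a}$. The cardinality of this set is exactly $\binom{t-b}{a}^k$. Thus, for any subset $\mathcal{D}$ of $\binom{[t]}{b}^k$, it holds that $|\phi^{-1}(\mathcal{D})| \le |\mathcal{D}| \binom{t-b}{a}^k$. Consequently,

$$(1-\delta)\binom{t}{a}^k \le |\mathcal{C}| = |\phi^{-1}(\mathrm{Im}(\phi))| \le 2^d \binom{t-b}{a}^k,$$

which, on rearrangement, gives eq. 2.

To obtain eq. 3, we note that

$$\binom{t}{a} \Big/ \binom{t-b}{a} = \prod_{i=0}^{a-1} \frac{t-i}{t-b-i} \ge \left(\frac{t}{t-b}\right)^{a} = \left(1 - \frac{b}{t}\right)^{-a} \ge e^{ab/t}, \tag{4}$$

which implies

$$\log\left(\binom{t}{a} \Big/ \binom{t-b}{a}\right) \ge \frac{ab}{t \ln 2}. \qquad ∎$$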

Since our data streaming lower bounds are based on the avoid problem, it is important to verify that we are not analyzing its communication complexity too loosely. To this end, we prove the following result, which says that the lower bound in Lemma 4.1 is close to being tight. In fact, a nearly matching upper bound can be obtained deterministically.

Lemma 4.2.

For any , , the deterministic complexity of is bounded thus:

(5)
Proof.

We claim there exists an ordered collection of subsets of of size , with the property that for each , there exists a set in which is disjoint from . In this case, Alice’s protocol is, given a set , to send the index of the first set in which is disjoint from ; Bob in turn returns the th element of . The number of bits needed to communicate such an index is at most , implying eq. 5.

We prove the existence of such an by the probabilistic method. Pick a subset of size uniformly at random. For any , define to be the set of subsets in which are disjoint from ; observe that . Then has the desired property if for all , it overlaps with . As

setting ensures the random set fails to have the desired property with probability strictly less than 1. Let be a realization of that does have the property. ∎
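The protocol in this proof can be tried out on tiny instances. The Python sketch below builds an ordered family of size-$b$ subsets that avoids every size-$a$ subset of a $t$-element ground set, then runs Alice's and Bob's steps. Note the swapped-in technique: instead of sampling a family of a fixed size and arguing existence, the sketch adds random sets one at a time until the property holds, which is a randomized incremental variant. The function names and 0-based ground set are assumptions of this sketch, and it requires $a + b \le t$.

```python
import math
import random
from itertools import combinations


def build_avoiding_family(t, a, b, rng):
    """Add random b-subsets until every a-subset is disjoint from some member."""
    all_b = list(combinations(range(t), b))
    uncovered = [set(s) for s in combinations(range(t), a)]
    family = []
    while uncovered:
        cand = set(rng.choice(all_b))
        family.append(cand)
        # Keep only the a-subsets that still intersect every chosen set.
        uncovered = [s for s in uncovered if s & cand]
    return family


def alice_message(S, family):
    """Alice sends the index of the first family member disjoint from S."""
    return next(j for j, T in enumerate(family) if not (T & S))


def bob_output(j, family):
    """Bob returns the j-th member of the shared family."""
    return family[j]


if __name__ == "__main__":
    rng = random.Random(0)
    fam = build_avoiding_family(t=8, a=3, b=2, rng=rng)
    bits = max(1, math.ceil(math.log2(len(fam))))
    print(f"family size {len(fam)}, message length {bits} bits")
```

Because the family is fixed before Alice sees her input, both players can hard-code it, and the message is just an index into it.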

4.2 Reducing Multiple Subset Avoidance to Graph Coloring

Having introduced and analyzed the avoid communication game, we are now ready to prove our main lower bound result, on the hardness of adversarially robust graph coloring.

Theorem 4.3 (Main lower bound).

Let be integers with , and , and .

Assume there is an adversarially robust coloring algorithm for insert-only streams of -vertex graphs which works as long as the input graph has maximum degree , and maintains a coloring with colors so that all colorings are correct with probability . Then requires at least bits of space, where

Proof.

Given an algorithm as specified, we can construct a public-coin protocol to solve the communication problem using exactly as much communication as requires storage space. The protocol for the more basic problem is described in Algorithm 1.

1:Require: Algorithm that colors graphs up to maximum degree , always using colors
2: publicly random bits to be used by
3: publicly random permutation of , drawn uniformly
4: an enumeration of the edges of the complete graph on vertices
5:
6:function Alice(S):
7:      ::INIT(), the initial state of
8:     for  from to  do
9:         if  then
10:               ::INSERT(Z, , )               
11:     return
12:
13:function Bob():
14:      empty list
15:     for  from to  do
16:         clr ::QUERY(, )
17:          maximal pairing of like-colored vertices, according to clr
18:         for each pair  do
19:               ::INSERT(, , ) is turned into a matching and inserted          
20:               
21:     if  then
22:         return fail
23:     else
24:         
25:         return      
Algorithm 1 Protocol for

To use to solve instances of avoid, we pick disjoint subsets of the vertex set , each of size . A streaming coloring algorithm on the vertex set with degree limit and using at most colors can be implemented by relabeling the vertices in to the vertices in some set and using . This can be done times in parallel, as the sets are disjoint. Note that a coloring of the entire graph on vertex set using colors is also a -coloring of the subgraphs supported on . To minimize the number of color queries made, Algorithm 1 can be implemented by alternating between adding elements from the matching in each instance (for Line 19), and making single color queries to the -vertex graph (for Line 16).

The guarantee that uses fewer than colors depends on the input graph stream having maximum degree at most . In Bob’s part of the protocol, adding a matching to the graph only increases the maximum degree of the graph represented by by at most one; since he does this times, in order for the maximum degree of the graph represented by to remain at most , we would like the random graph Alice inserts into the algorithm to have maximum degree . By Lemma 3.1, the probability that, given some , this random graph on has maximum degree is

Taking a union bound over all graphs, we find that

We can ensure that this happens with probability at most by requiring .

If all the random graphs produced by Alice have maximum degree , and the colorings requested by the protocol are all correct, then we will show that Bob’s part of the protocol recovers at least edges for each instance. Since the algorithm ’s random bits and the permutation’s random bits are independent, the probability that the maximum degree is low and the algorithm gives correct colorings on graphs of maximum degree at most is .

The edges that Bob inserts (Line 19) are fixed functions of the query output of on its state and random bits . None of the edges can already have been inserted by Alice or Bob, since each edge connects two vertices which have the same color. Because these edges only depend on the query output of , conditioned on this query output they are independent of and . This ensures that ’s correctness guarantee against an adversary applies here, and thus the colorings reported on Line 16 are correct.

Assuming all queries succeed, and the initial graph that Alice added has maximum degree , for each , the coloring produced will have at most colors. Let be the set of vertices covered by the matching , so that are the unmatched vertices. Since no pair of unmatched vertices can have the same color, . This implies , and since is an integer, we have . Thus each for loop iteration will add at least new edges to . The final value of the list will contain at least edges that were not added by Alice; Line 24 converts the first of these to elements of not in the set given to Alice.
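The counting step in this paragraph is a pigeonhole argument and can be checked directly. The Python sketch below computes a maximal pairing of like-colored vertices, as in Line 17 of Algorithm 1. Writing $n$ for the number of vertices and $K$ for the number of colors used (symbols chosen here for illustration), at most one vertex per color is left unmatched, so the pairing has at least $\lceil (n-K)/2 \rceil$ pairs. The function name is hypothetical.

```python
from collections import defaultdict


def maximal_like_colored_pairing(clr):
    """Pair up vertices that share a color; leaves at most one per color."""
    by_color = defaultdict(list)
    for v, c in enumerate(clr):
        by_color[c].append(v)
    pairs = []
    for verts in by_color.values():
        # Pair consecutive vertices of this color class; an odd one is left over.
        for i in range(0, len(verts) - 1, 2):
            pairs.append((verts[i], verts[i + 1]))
    return pairs
```

In a valid coloring, like-colored vertices are never adjacent, so every returned pair is a new edge, and the pairs are vertex-disjoint, so inserting all of them raises each degree by at most one.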

Finally, by applying Lemma 4.1, we find that the communication needed to solve independent copies of with failure probability satisfies

where we used to conclude . ∎

Applying the above Theorem 4.3 with “,” we immediately obtain the following corollary, which highlights certain parameter settings that are particularly instructive.

Corollary 4.4.

Let be a monotonically increasing function, and an integer for which and . Let be a coloring algorithm which works for graphs of maximum degree up to ; which at any point in time uses colors, where is the current graph’s maximum degree; and which has total failure probability against an adaptive adversary. Then the number of bits of space used by is lower-bounded as . In particular:

  • If —or, more generally, —then space is needed.

  • To ensure space, is needed.

  • If , then . ∎

5 Upper Bounds: Adversarially Robust Coloring Algorithms

We now turn to positive results. We show how to maintain a -coloring of a graph in an adversarially robust fashion. We design two broad classes of algorithms. The first, described in Section 5.1, is based on palette sparsification as in [ACK19, AA20], with suitable enhancements to ensure robustness. The resulting algorithm maintains an -coloring and uses bits of working memory. As noted in Section 2.2, the algorithm comes with the caveat that it requires a large pool of random bits: up to of them. As also noted there, it makes sense to treat this randomness cost as separate from the space cost.

The second class of algorithms, described in Section 5.2, is built on top of the sketch switching technique of [BJWY20], suitably modified to handle non-real-valued outputs. This time, the amount of randomness used is small enough that we can afford to store all random bits in working memory. These algorithms can be enhanced to handle strict graph turnstile streams as described in Section 3. For any such turnstile stream of length at most , we maintain an -coloring using space. More generally, we maintain an -coloring in space for any . In particular, for insert-only streams, this implies an -coloring in space.

5.1 An Algorithm Based on Palette Sparsification

We proceed to describe our palette-sparsification-based algorithm. It maintains a -coloring of the input graph , where is the evolving maximum degree of the input graph . With high probability, it will store only bits of information about ; an easy modification ensures that this bound is always maintained by having the algorithm abort if it is about to overshoot the bound.

The algorithm does need a large number of random bits—up to of them—where is the maximum degree of the graph at the end of the stream or an upper bound on the same. Due to the way the algorithm looks ahead at future random bits, must be known in advance.

The algorithm uses these available random bits to pick, for each vertex, lists of random color palettes, one at each of “levels.” The level- list at vertex is called and consists of colors picked uniformly at random with replacement from the set . The algorithm tracks each vertex’s degree. Whenever a vertex is recolored, its new color is always of the form , where and . Thus, when the maximum degree in is , the only colors that have been used are the initial default and colors from . The total number of colors is therefore at most .

The precise algorithm is given in Algorithm 2.

1:Input: Stream of edges of a graph , with maximum degree always .
2:
3:Random bits:
4:for each vertex  do
5:     for each  do
6:          list of colors sampled u.a.r. with replacement from      
7:
8:Initialize:
9:for each vertex  do
10:      tracks degree of
11:      maintains color of ; in general
12: empty list of edges
13:
14:Process(edge ):
15: maintain vertex degrees
16:
17:for  from to  do store edges that might be needed in the future
18:     if  and overlap then
19: