The chromatic number $\chi(G)$ of an $n$-vertex graph $G$ is the minimum number of colors needed to color the vertices of $G$ so that no two adjacent vertices get the same color. The chromatic number problem is NP-hard and even hard to approximate within a factor of $n^{1-\varepsilon}$ for any constant $\varepsilon > 0$ [FK98, Zuc07, KP06]. For any connected undirected graph $G$ with maximum degree $\Delta$, $\chi(G)$ is at most $\Delta+1$ [Viz64]. This existential coloring scheme can be made constructive across different models of computation. A seminal recent result is that the $(\Delta+1)$-coloring can be done in the streaming model [ACK19]. Of late, there has been interest in graph coloring problems in the sub-linear regime across a variety of models [AA20a, ACK19, BDH19, BG18, BCG19]. Keeping with the trend of coloring problems, these works look at assigning colors to vertices. Since the size of the output will be as large as the number of vertices, researchers study the semi-streaming model [McG14] for streaming graphs. In the semi-streaming model, $\widetilde{O}(n)$ space is allowed, where $\widetilde{O}(\cdot)$ hides a polylogarithmic factor.
In a marked departure from the above works that look at the classical coloring problem, the starting point of our work is (inarguably?) the simplest question one can ask in graph coloring: given a coloring function $f$ on the vertex set of a graph $G$, is $f$ a valid coloring, i.e., for any edge $e$, do both the endpoints of $e$ have different colors? This is the problem one encounters while proving that the chromatic number problem belongs to the class NP [GJ79]. Conflict-Est, the problem of estimating the number of monochromatic (or, conflicting) edges for a graph $G$ given a coloring function $f$, remains a simple problem in the RAM model; it even remains simple in the one-pass streaming model if the coloring function is marked on a public board, readable at all times. We show that the problem throws up interesting consequences if the color of a vertex is revealed only when the vertex is revealed in the stream. For a streaming graph, if the vertices are assigned colors arbitrarily or randomly on-the-fly as they are exposed, our results can also be used to estimate the number of conflicting edges. These problems also find use in estimating the number of conflicts in a job schedule and in verifying a given job schedule in a streaming setting. This can also be extended to problems in various domains like frequency assignment in wireless mobile networks and register allocation [EHKR09]. As the problem, by its nature, admits an estimate or a yes-no answer, the need for space to store all vertices, as in the semi-streaming model, goes away and we can focus on space-efficient algorithms in conventional graph streaming models like Vertex Arrival [CDK19]. We also note in passing that many of the trend-setting problems in streaming, like frequency moments, distinct elements, majority, etc., are simple problems in the ubiquitous RAM model, as is the coloring problem we solve here.
2.1 Notations and the streaming models
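Notations. We denote the set $\{1,\ldots,n\}$ by $[n]$. $G(V,E)$ denotes a graph where $V$ and $E$ denote the set of vertices and edges of $G$, respectively; $|V|=n$ and $|E|=m$. We will write only $V$ and $E$ for vertices and edges when the graph is clear from the context. We denote by $E_M \subseteq E$ the set of monochromatic edges. The set of neighbors of a vertex $u$ is denoted by $N_G(u)$ and the degree of $u$ is denoted by $d_G(u)$. Let $N_G(u) = N_G^-(u) \uplus N_G^+(u)$, where $N_G^-(u)$ and $N_G^+(u)$ denote the set of neighbors of $u$ that have been exposed already and are yet to be exposed, respectively, in the stream. Also, $d_G(u) = d_G^-(u) + d_G^+(u)$, where $d_G^-(u) = |N_G^-(u)|$ and $d_G^+(u) = |N_G^+(u)|$. For a monochromatic edge $(u,v) \in E_M$, we refer to $u$ and $v$ as monochromatic neighbors of each other. We define $d_M(u)$ to be the number of monochromatic neighbors of $u$ and hence, the monochromatic degree of $u$.
$\mathbb{E}[X]$ denotes the expectation of the random variable $X$. For an event $\mathcal{E}$, $\overline{\mathcal{E}}$ denotes the complement of $\mathcal{E}$.
$\Pr(\mathcal{E})$ denotes the probability of an event $\mathcal{E}$. The statement “event $\mathcal{E}$ occurs with high probability” is equivalent to $\Pr(\mathcal{E}) \geq 1 - 1/n^{c}$, where $c$ is an absolute constant. The statement “$a$ is a $(1\pm\varepsilon)$ multiplicative approximation of $b$” means $|b - a| \leq \varepsilon \cdot b$. For $x \in \mathbb{R}$, $\exp(x)$ denotes the standard exponential function, that is, $e^{x}$. By polylogarithmic, we mean $O\left((\log n / \varepsilon)^{O(1)}\right)$. The notation $\widetilde{O}(\cdot)$ hides a polylogarithmic term in $O(\cdot)$.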
Streaming models for graphs. As alluded to earlier, the crux of the problem depends on the way the coloring function is revealed in the stream. The details follow.
(i) Vertex Arrival (VA): The vertices of $G$ are exposed in an arbitrary order. After a vertex $v$ is exposed, all the edges between $v$ and the pre-exposed neighbors of $v$ are revealed. These edges are revealed one by one in an arbitrary order. Along with the vertex $v$, only the color of $v$ is exposed, and not the colors of any pre-exposed vertices. So, we can check the monochromaticity of an edge between $v$ and a pre-exposed neighbor $u$ only if $u$ and its color are explicitly stored.
(ii) Vertex Arrival with Degree Oracle (VAdeg) [MVV16, BS20]: This model works the same as the VA model in terms of exposure of the vertex and its color; but we are allowed to know the degree of the currently exposed vertex $v$ from a degree oracle on $G$.
(iii) Vertex Arrival in Random Order (VArand) [SK12, TGRV14]: This model works the same as the VA model, but the vertex sequence revealed is equally likely to be any one of the $n!$ permutations of the $n$ vertices.
(iv) Edge Arrival (EA): The stream consists of the edges of $G$ in an arbitrary order. As an edge is revealed, so are the colors of its endpoints. Thus the conflicts can be easily checked.
(v) Adjacency List (AL): The vertices of $G$ are exposed in an arbitrary order. When a vertex $v$ is exposed, all the edges that are incident to $v$ are revealed one by one in an arbitrary order. Note that in this model each edge is exposed twice, once for each exposure of an incident vertex. As in the VA model, here also only the color of $v$ is exposed.
As conflicts can be checked on the fly in the EA model, a counter of logarithmic size is enough to count the number of monochromatic edges. The AL model works almost the same as the VAdeg model. So, we focus on the three models VA, VAdeg and VArand in this work and show that they have a clear separation in their power vis-a-vis the problem we solve. A crucial takeaway from our work is that the random order assumption on the exposure of vertices yields substantial improvements in space complexity.
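The remark about the EA model can be illustrated with a minimal Python sketch. The stream format (tuples carrying both endpoints and their colors) and the function name are assumptions made for illustration only, not notation from this paper.

```python
from typing import Hashable, Iterable, Tuple

# Hypothetical stream item: (u, v, color_u, color_v) for an edge {u, v}.
EdgeItem = Tuple[Hashable, Hashable, int, int]


def count_conflicts_edge_arrival(stream: Iterable[EdgeItem]) -> int:
    """Count monochromatic edges in one pass over an Edge Arrival stream."""
    conflicts = 0  # the only state kept across the stream: one counter
    for _u, _v, color_u, color_v in stream:
        # Both endpoint colors arrive with the edge, so the check is local.
        if color_u == color_v:
            conflicts += 1
    return conflicts


ea_stream = [(1, 2, 0, 0), (2, 3, 0, 1), (1, 3, 0, 1), (3, 4, 1, 1)]
ea_conflicts = count_conflicts_edge_arrival(ea_stream)
```

Nothing about earlier edges or vertices is remembered, which is why the EA model is the easy case for this problem.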
2.2 Problem definitions, results and the ideas
Problem definition. Let the vertices of $G$ be colored with a function $f: V(G) \rightarrow [C]$, for $C \in \mathbb{N}$, where $[C] = \{1,\ldots,C\}$. An edge $(u,v) \in E(G)$ is said to be monochromatic or conflicting with respect to $f$ if $f(u) = f(v)$. A coloring function $f$ is called valid if no edge in $E(G)$ is monochromatic with respect to $f$. For a given parameter $\varepsilon \in (0,1)$, $f$ is said to be $\varepsilon$-far from being valid if at least $\varepsilon \cdot |E(G)|$ edges are monochromatic with respect to $f$. We study the following problems.
Problem 2.1 (Conflict Estimation aka Conflict-Est).
A graph $G$ and a coloring function $f$ are streaming inputs. Given an input parameter $\varepsilon > 0$, the objective is to estimate the number of monochromatic edges in $G$ within a $(1 \pm \varepsilon)$-factor.
Problem 2.2 (Conflict Separation aka Conflict-Sep).
A graph $G$ and a coloring function $f$ are streaming inputs. Given an input parameter $\varepsilon > 0$, the objective is to distinguish if the coloring function $f$ is valid or is $\varepsilon$-far from being valid.
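As an offline reference point for the two problems, the definitions can be written down directly. This is only a sketch: the function names are hypothetical, and it assumes that "$\varepsilon$-far" means at least an $\varepsilon$ fraction of the edges are monochromatic and that an estimate is acceptable when it is within a multiplicative $(1 \pm \varepsilon)$ factor of the true count.

```python
def monochromatic_edges(edges, color):
    """Return the edges whose two endpoints carry the same color."""
    return [(u, v) for u, v in edges if color[u] == color[v]]


def is_valid(edges, color):
    """A coloring is valid when no edge is monochromatic."""
    return len(monochromatic_edges(edges, color)) == 0


def is_eps_far(edges, color, eps):
    """Assumed reading: at least eps * |E| edges are monochromatic."""
    return len(monochromatic_edges(edges, color)) >= eps * len(edges)


def is_good_estimate(estimate, truth, eps):
    """Assumed reading of a (1 +/- eps) multiplicative approximation."""
    return abs(estimate - truth) <= eps * truth


edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
color = {1: 0, 2: 0, 3: 1, 4: 1}
conflicting = monochromatic_edges(edges, color)
```

These functions need the whole graph and coloring in memory, which is exactly what the streaming algorithms in this paper try to avoid.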
The results and the ideas involved. All our upper and lower bounds on space are for one-pass streaming algorithms. Table 1 states our results for the Conflict-Est problem, the main problem we solve in this paper, across different variants of the VA model. The main thrust of our work is on estimating monochromatic edges under random order stream. For random order stream, we present both upper and lower bounds in Sections 4 and 5. There is a gap between the upper and lower bounds in the VArand model, though we have a strong hunch that our upper bound is tight. Apart from the above, using a structural result on graphs, we show in Section 4.2 that the Conflict-Sep problem admits an easy algorithm in the VArand model. To give a complete picture across different variants of the VA models, we show matching upper and lower bounds for the VA and VAdeg models in Section 3 and Appendix E.
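Table 1: Space bounds for Conflict-Est across the variants of the VA model.
| Bound | VA | VArand | VAdeg |
| Upper bound | (Sec. 3, Thm. 3.1) | (Sec. 4, Thm. 4.1) | (Sec. 3, Thm. 3.2) |
| Lower bound | (Sec. E.1, Thm. E.1) | (Sec. 5, Thm. 5.1) | (Sec. E.2, Thm. E.2) |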
The promise on the number of monochromatic edges is a very standard assumption for estimating substructures in the world of graph streaming algorithms [KKP18, KMSS12, KMPV19, MVV16, BC17]; we cite only a few works here, from a large body of relevant literature.
We now briefly mention the salient ideas involved. For the simpler variant of Conflict-Est in the VA model, we first test a threshold condition (spelled out in Section 3). If it holds, we store all the vertices and their colors in the stream to determine the exact number of monochromatic edges. Otherwise, before the stream starts, we sample each pair of vertices in $\binom{V}{2}$ (the set of all size-2 subsets of $V$) independently with a fixed probability; note that some sampled pairs may not form edges in the graph. When the stream comes, we compute the number of monochromatic edges from this sample. The details are in Section 3. Though the algorithm looks extremely simple, it matches the lower bound for Conflict-Est in the VA model, presented in Appendix E. The VAdeg model, with its added power of a degree oracle, lets us know the degree of a vertex when it is exposed; as the edges to its pre-exposed neighbors are revealed, we also know how many of its neighbors have already been exposed and how many are yet to come. This allows us to use sampling to store vertices and to use a technique which we call sampling into the future, where indices of random neighbors, out of the neighbors yet to be exposed, are selected for future checking. The upper bound for Conflict-Est in the VAdeg model is presented in Section 3, and it is tight, as we prove a matching lower bound in Appendix E.
The algorithm for Conflict-Est in the VArand model is the mainstay of our work and is presented in Section 4. We redefine the degree in terms of the number of monochromatic neighbors a vertex has in the randomly sampled set. Here, we estimate the high monochromatic degree and low monochromatic degree vertices separately by sampling a random subset of vertices. While the monochromatic degree of the high degree vertices can be extrapolated from the sample, handling low monochromatic degree vertices individually in the same way does not work. To get around this, we group such vertices having similar monochromatic degrees and treat each group as a single entity. We also provide a lower bound for the VArand model, in Section 5, using a reduction from multi-party set disjointness, though a gap remains in the exponent between the upper and lower bounds.
The highlights of our work are as follows:
- We show that possibly the easiest graph coloring problem is worth studying over streams.
- For researchers working in streaming, the gold standard is the EA model, as most problems are non-trivial in this model. We point out a problem that is harder to solve in the VA model than in the EA model.
- We show that the three VA related models have a clear separation in their space complexities vis-a-vis the problem we solve. We exploit the random order of the arrival of the vertices to get substantial improvements in space complexity.
- We obtain lower bounds for all three models, but they match the upper bounds only for the VA and VAdeg models.
2.3 Prior works on graph coloring in semi-streaming model.
Bera and Ghosh [BG18] commenced the study of vertex coloring in the semi-streaming model. They devise a randomized one pass streaming algorithm that finds a vertex coloring in space. Assadi et al. [ACK19] find a proper vertex coloring using colors via various classes of sublinear algorithms. Their state of the art contributions can be attributed to a key result called the palette-sparsification theorem which states that for an -vertex graph with maximum degree , if colors are sampled independently and uniformly at random for each vertex from a list of colors, then with a high probability a proper coloring exists for the graph. They design a randomized one-pass dynamic streaming algorithm for the coloring using space. The algorithm takes post-processing time and assumes a prior knowledge of . Alon and Assadi [AA20b] improve the palette sparsification result of [ACK19]. They consider situations where the number of colors available is both more than and less than colors. They show that sampling colors per vertex is sufficient and necessary for a coloring. Bera et al. [BCG19] give a new graph coloring algorithm in the semi-streaming model where the number of colors used is parameterized by the degeneracy . The key idea is a low degeneracy partition, also employed in [BG18]. The numbers of colors used to properly color the graph is and post-processing time of the algorithm is improved to , without any prior knowledge about . Behnezhad et al. [BDH19] were the first to give one-pass W-streaming algorithms (streaming algorithms where outputs are produced in a streaming fashion as opposed to outputs given finally at the end) for edge coloring both when the edges arrive in a random order or in an adversarial fashion.
3 Conflict-Est in VA and VAdeg models
In this section, we design algorithms for the Conflict-Est problem in the VA and VAdeg models. We show matching lower bounds later in Appendix E. Mainly, we prove the following two theorems here.
Theorem 3.1. Given any graph and a coloring function as input in the stream, there exists an algorithm that solves the Conflict-Est problem in the VA model with high probability in space, where is a lower bound on the number of monochromatic edges in the graph.
Theorem 3.2. Given any graph and a coloring function as input in the stream, there exists an algorithm that solves the Conflict-Est problem in the VAdeg model with high probability in space, where is a lower bound on the number of monochromatic edges in the graph.
Before going to the algorithms for Conflict-Est problem in the VA and VAdeg model, we discuss as a warm-up, a two-pass algorithm for Conflict-Est in the VA model that uses space, where is the promised lower bound on the number of monochromatic edges in the graph. Here we assume that is known to the algorithm. However, this assumption can be removed easily in a setting with two passes.
A two-pass algorithm for Conflict-Est in VA model (described informally):
- If :
Our algorithm stores all the vertices and their colors. Thus we can determine the number of monochromatic edges exactly. The algorithm in this case is one pass and uses space.
- If :
In the first pass, store each edge with probability . In the second pass, we check each edge stored in the first pass for conflict. In this way, we determine the number of monochromatic edges in the sample, from which, we can obtain a desired approximation of the number of monochromatic edges in the graph. The space complexity of our algorithm in this case is .
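The sampling branch of the two-pass warm-up can be sketched as follows. This is a minimal illustration, not the paper's tuned algorithm: the sampling probability `p` is left as a free parameter, the stream is modelled as a list of hypothetical `(vertex, color, earlier_neighbors)` triples in the VA format, and scaling the sampled count by `1/p` is the usual inverse-probability estimator, assumed here.

```python
import random


def two_pass_estimate(stream, p, seed=0):
    """Estimate the number of monochromatic edges with two passes.

    `stream` must be re-iterable (a list), since it is read twice.
    """
    rng = random.Random(seed)

    # Pass 1: keep each revealed edge independently with probability p.
    sampled = []
    for u, _color, earlier_neighbors in stream:
        for v in earlier_neighbors:
            if rng.random() < p:
                sampled.append((v, u))

    # Pass 2: record colors only for endpoints of the sampled edges.
    needed = {x for edge in sampled for x in edge}
    colors = {}
    for u, color, _earlier in stream:
        if u in needed:
            colors[u] = color

    hits = sum(1 for a, b in sampled if colors[a] == colors[b])
    return hits / p  # scale the sampled count back up


va_stream = [(1, 0, []), (2, 0, [1]), (3, 1, [1, 2]), (4, 1, [3])]
exact_when_p_is_one = two_pass_estimate(va_stream, p=1.0)
```

With `p = 1.0` every edge is kept, so the sketch returns the exact count; smaller values of `p` trade accuracy for space.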
If only one pass is allowed, the second case of the above algorithm cannot be simulated in the VA model for the following reason. Consider an edge $(u,v)$ such that $u$ is exposed before $v$. We learn about the edge only when $v$ is exposed, but we can check whether $u$ and $v$ have the same color only if we have stored $u$ and its color. However, there is no clue about the edge when $u$ is exposed. So, to solve the problem in one pass, we sample each pair of vertices (without bothering whether there is an edge between them) independently with a fixed probability before the start of the stream, and determine the number of monochromatic edges in the sample to get an estimate of the number of monochromatic edges in the graph. This implies that the space complexity of the algorithm for Conflict-Est in the VA model is as stated in Theorem 3.1. In the VAdeg model, when a vertex $u$ is exposed, we get its degree and hence the number of its neighbors that are yet to be exposed. This degree information gives some statistics regarding how the vertex might be useful in the future. We exploit this advantage of the VAdeg model over the VA model to get an algorithm for Conflict-Est that has better space complexity (see Theorem 3.2).
3.1 Proof of Theorem 3.1
Our algorithm for Conflict-Est for VA model, first checks if . If yes, we store all the vertices along with their colors to estimate the number of monochromatic edges in the graph exactly. So, the space used by the algorithm is when . We will be done by giving an algorithm for Conflict-Est in VA model that uses space. This algorithm will only be executed when .
Let be the vertices of the graph. Our algorithm starts by generating a sample of vertex pairs where each is added to , independently, with probability . Note that is obtained before the start of the stream. Over the stream, we check the following for each : whether and is monochromatic. Let be the set of monochromatic edges in . Note that the expected value of is given by .
Note that the last inequality holds as and .
Putting together the space complexities of our algorithms for the case and , we have the desired bound on the space.
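The pair-sampling idea used in this proof can be sketched in Python. The sketch makes several assumptions for illustration: vertex identifiers are known before the stream starts, the sampling probability `p` is a free parameter rather than the value chosen in the analysis, the stream uses hypothetical `(vertex, color, earlier_neighbors)` triples, and the final estimate is the sampled count scaled by `1/p`. Enumerating all pairs up front costs quadratic time and is done only for clarity.

```python
import itertools
import random


def va_one_pass_estimate(vertex_ids, stream, p, seed=0):
    """One-pass estimate in the VA model via pre-sampled vertex pairs."""
    rng = random.Random(seed)

    # Before the stream: sample each unordered pair with probability p.
    sampled_pairs = {
        frozenset(pair)
        for pair in itertools.combinations(vertex_ids, 2)
        if rng.random() < p
    }
    tracked = set().union(*sampled_pairs) if sampled_pairs else set()

    colors = {}  # colors of tracked vertices seen so far
    hits = 0
    for u, color, earlier_neighbors in stream:
        if u in tracked:
            colors[u] = color
        for v in earlier_neighbors:
            # A sampled pair counts only if it is an actual edge of the
            # graph and both endpoints carry the same color.
            if frozenset((u, v)) in sampled_pairs and colors[v] == color:
                hits += 1
    return hits / p


va_stream = [(1, 0, []), (2, 0, [1]), (3, 1, [1, 2]), (4, 1, [3])]
pair_estimate = va_one_pass_estimate([1, 2, 3, 4], va_stream, p=1.0)
```

Because the pairs are fixed in advance, the color of the earlier endpoint is already stored by the time the edge is revealed, which is the obstacle the one-pass setting raises.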
3.2 Proof of Theorem 3.2
For simplicity of presentation, assume that we know the number of edges in the graph. We will discuss ways to remove this assumption later.
3.2.1 Algorithm for Conflict-Est in VAdeg model when $|E|$ is known
Our algorithm for Conflict-Est for VAdeg model, first checks if . If , we store all the vertices along with their colors to estimate the number of monochromatic edges in the graph exactly. So, the space used by the algorithm is when . We will be done by giving an algorithm for Conflict-Est in VAdeg model that uses space. This algorithm will be executed only when .
Let $V=\{v_1,\ldots,v_n\}$ and, w.l.o.g., let the vertices be exposed in the order $v_1,\ldots,v_n$. However, our algorithm does not know about the ordering of the vertices in the stream. Our algorithm stores the following information.
A random subset that will be generated over the stream;
a subset of vertices formed from the first elements in the pairs present in ; the colors of the vertices are also stored;
for each vertex , a number that denotes the number of neighbors in that have been exposed. So, is initialized to when gets exposed in the stream and is at most at any instance of the stream;
a subset of the set of monochromatic edges in .
When a vertex is exposed, our algorithm performs the following steps:
(i) Get from the degree oracle and from the exposed edges and compute ;
(ii) Add , with probability to , independently;
(iii) Add along with its color to if at least one is added to .
(iv) For each such that , increment by .
(v) For each such that , check whether forms a monochromatic edge. If yes, add to . (This step ensures independence so that Chernoff bounds can be used. See Remark 2 below.)
The main catch of the algorithm for Conflict-Est in VAdeg model is in Step-(ii). Due to the added power of degree oracle, we are able to sample edges that have not arrived explicitly in the stream. We referred to this phenomenon as sampling into the future in Section 2.2.
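The five steps performed on each exposed vertex, including sampling into the future, can be sketched as below. All identifiers (`sampled_slots`, `color_of`, `seen_later`) are hypothetical names chosen for this illustration, the degree oracle is modelled as a dictionary, the sampling probability `p` is a free parameter, and the returned estimate is the sampled count divided by `p`, which is an assumption of this sketch.

```python
import random


def vadeg_estimate(stream, degree, p, seed=0):
    """Sketch of sampling into the future in the VAdeg model.

    stream: list of (vertex, color, earlier_neighbors) triples.
    degree: dict acting as the degree oracle.
    """
    rng = random.Random(seed)
    sampled_slots = set()  # pairs (v, j): the j-th future neighbor of v
    color_of = {}          # stored vertices (with a sampled slot) and colors
    seen_later = {}        # per stored vertex: later neighbors seen so far
    hits = 0               # sampled monochromatic edges

    for u, color, earlier_neighbors in stream:
        # (i) future degree = oracle degree minus already exposed neighbors
        future_degree = degree[u] - len(earlier_neighbors)
        # (ii) sample indices of future neighbors independently
        picked = [j for j in range(1, future_degree + 1) if rng.random() < p]
        # (iii) store u and its color only if some slot was sampled
        if picked:
            sampled_slots.update((u, j) for j in picked)
            color_of[u] = color
            seen_later[u] = 0
        for v in earlier_neighbors:
            if v in color_of:
                # (iv) u is the next later neighbor of the stored vertex v
                seen_later[v] += 1
                # (v) check the edge only if exactly this slot was sampled
                if (v, seen_later[v]) in sampled_slots and color_of[v] == color:
                    hits += 1
    return hits / p


va_stream = [(1, 0, []), (2, 0, [1]), (3, 1, [1, 2]), (4, 1, [3])]
oracle = {1: 2, 2: 2, 3: 3, 4: 1}
future_estimate = vadeg_estimate(va_stream, oracle, p=1.0)
```

Step (v) deliberately ignores edges whose slot was not sampled, even when the earlier endpoint happens to be stored, so that each edge is counted on the strength of its own coin flip.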
At the end of the stream, we report as the estimate of . Now, we show that Consider a monochromatic edge . W.l.o.g., assume that is exposed sometime after is exposed in the stream. Let be such that has neighbors in . So, is the -th neighbor of exposed after the exposure of . From the description of the algorithm, is added to if and only if is added to . Note that can be added to only when the vertex is exposed in the stream. Before calculating and applying Chernoff bound, we focus on the following remark.
At the first look, it might appear that the monochromatic edges are not independently added to . For example, let us consider the following situation. Let , with and , is added to , that is, is present in and the color of is stored. So, when gets exposed along with its color, we can check whether is monochromatic irrespective of being added to . But the crucial point is that we add to only when is added to . However, s, with and , are added to , independently. That is, each monochromatic edge in is added to , independently.
Note that the last inequality holds as . Observe that the space used by the algorithm is . Note that . Applying Chernoff bound (See Lemma A.1 in Appendix A), we can say that with high probability. Putting together the space complexities of our algorithms for the case and , we have the desired bound on the space.
3.2.2 Modifying the algorithm in Section 3.2.1 when $|E|$ is unknown
In the modified algorithm, we maintain a counter defined as follows.
Consider the following observation about cnt that will be used in our analysis. As mentioned earlier, .
Observation 3.3. At any point of the streaming algorithm, cnt is a lower bound on $|E|$, the number of edges in the graph. Moreover, at the end of the stream, cnt becomes $|E|$. Also, cnt is non-decreasing.
We process the stream by maintaining and , as defined in the algorithm in Section 3.2.1, for the case , until cnt reaches , with a slight difference. Here, we add each to with probability instead of as in Section 3.2.1, where is a vertex exposed while cnt is less than and . So, we have the following observation that will be used later in our analysis.
With high probability, for all the instances in the stream while cnt is less than .
Let be the first exposed vertex in the stream when cnt is more than . Also, let , where denotes disjoint union. Observe that . We construct by selecting independently each element of with probability . Recall that . So, . The observation follows by applying Chernoff bound (see Lemma A.1 (iii) in Appendix A). ∎
However, the modified algorithm behaves differently once cnt is more than . Let be as defined earlier. We maintain two extra objects, as described below, after cnt crosses .
The set of vertices and their colors;
A counter that denotes the number of monochromatic edges having both the endpoints in .
The formal description of the modified algorithm is presented in Algorithm 1.
We describe the algorithm and its analysis by breaking the range of into two cases, that is, (or ) and (or ). We show that the space complexity of the modified algorithm is in the first case and is in the latter case with high probability. Observe that this will imply the desired result as claimed in Theorem 3.2.
In this case, by Observation 3.3, there will be an instance (say when vertex is exposed) such that cnt goes beyond for the first time. Then we start storing all the vertices and their colors in . We stop updating and after is exposed. However, we update until end of the stream as we were doing previously in Section 3.2.1. Along with , we maintain the number of monochromatic edges (say ) having both the endpoints in . Note that is maintained exactly. Finally, we report as the output, where or depending on whether or not, respectively. By Observation 3.4, with high probability, for all the instances when cnt is less than (that is before the exposure of ). Also, after the exposure of , we are storing all the vertices along with their colors explicitly. So, the space used by the algorithm is , with high probability. To see the correctness of the algorithm, let be the set of monochromatic edges having both the endpoints in . Note that . Let be the set of monochromatic edges having at least one vertex in the set , that is, . Using Chernoff bound arguments (see Lemma A.1 in Appendix A), we have the following lemma. The proof of the following lemma is presented in Appendix B.
Lemma 3.5. (i) If , then is a approximation to with probability at least .
(ii) If , with probability at least .
Now let us divide the analysis into two cases, that is, and .
In this case, we set . So, is the output, which is always bounded above by . By Lemma 3.5 (i), implies with probability at least . Note that and . Putting everything together, lies between and , with probability at least .
This finishes the proof for the case .
We have proved the correctness of Algorithm 1 by considering the cases and separately. We have also shown that the space complexity of Algorithm 1 is in the former case and is in the latter case with high probability. Hence, we are done with the proof of Theorem 3.2.
4 Conflict-Est and Conflict-Sep in VArand model
In this section, we mainly show that the power of randomness can be used to design a better solution for the Conflict-Est problem in the VArand model. The Conflict-Est problem is the main highlight of our work. We feel that the crucial use of randomness in the input to estimate a substructure (here, monochromatic edges) in a graph will be of independent interest.
In this variant, we are given an and a promised lower bound on , the number of monochromatic edges in , as input and our objective is to determine a -approximation to .
Theorem 4.1. Given any graph and a coloring function as input in the stream, the Conflict-Est problem in the VArand model can be solved with high probability in space, where is a lower bound on the number of monochromatic edges in the graph.
We prove the above theorem in Section 4.1. Note that the above algorithm can be used to solve Conflict-Sep in the VArand model. In Section 4.2, we give a simple algorithm for Conflict-Sep that exploits a structural property of the subgraph having only monochromatic edges. However, the space complexity of the algorithm for Conflict-Sep (in Section 4.2) is the same as that of the algorithm for Conflict-Est (in Section 4.1).
4.1 Conflict-Est in VArand model (Proof of Theorem 4.1)
The proof idea
A random sample comes for free – pick the first few vertices:
Let the vertices of the graph be revealed in a random order. The analysis works with a random subset of vertices sampled without replacement, whose size bound hides a polynomial factor of $\log n$ and $1/\varepsilon$. As we are dealing with a random order stream, the first vertices in the stream, as many as the sample size, can be treated as this random sample. We start by storing all the vertices of the sample as well as their colors. Observe that if the monochromatic degree of any vertex is large (roughly above a threshold), then it can be well approximated by looking at the number of monochromatic neighbors that the vertex has in the sample. As a vertex streams past, there is no way we can figure out its monochromatic degree unless we store its monochromatic neighbors that appear before it in the stream; if we could, we would be done. Our only savior is the stored random sample.
Classifying the vertices of the random sample based on its monochromatic degree:
Our algorithm proceeds by figuring out the influence of the color of each vertex on the monochromatic degrees of the vertices in the random sample. To estimate this, we count, for a vertex, the number of monochromatic neighbors that it has in the sample. We set a threshold on this count; its significance will be clear from the discussion below. Any vertex will be classified as a high$_m$ or low$_m$ degree vertex depending on its monochromatic degree within the sample: if the count clears the threshold, then it is a high$_m$ degree vertex, else it is a low$_m$ degree vertex. (We use the subscript m to stress the fact that the monochromatic degrees are induced by the sample.) This classification partitions the vertex set into the set of high$_m$ degree vertices and the set of low$_m$ degree vertices, and likewise the sample itself. Notice that, because of the definition of high$_m$ and low$_m$ degree vertices, these sets are determined by the vertices of the sample only.
Let and denote the sum of the monochromatic degrees of all the degree vertices and degree vertices in , respectively. So, and . Note that . We will describe how to approximate and separately. The formal algorithm is described in Algorithm 2 as Random-Order-Est that basically executes steps to approximate and in parallel.
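The use of the stream prefix as a free random sample, and the classification it induces, can be sketched as follows. This is a simplified illustration under stated assumptions, not Random-Order-Est itself: `sample_size` and `threshold` are free parameters, the comparison `>=` against the threshold is assumed, the extrapolation factor `n / sample_size` for high degree vertices is an assumed scaling, and the bucketing of low degree vertices is omitted (they are only counted here).

```python
def classify_with_prefix_sample(stream, n, sample_size, threshold):
    """Classify vertices by monochromatic neighbors inside the prefix sample.

    stream: list of (vertex, color, earlier_neighbors) triples, assumed to
    arrive in uniformly random vertex order.
    """
    sample_color = {}  # the stored sample: vertex -> color
    in_sample_count = {}  # monochromatic neighbors inside the sample
    high_sum = 0.0  # extrapolated degrees of high vertices outside the sample
    low_seen = 0  # number of low vertices outside the sample
    scale = n / sample_size

    for position, (u, color, earlier_neighbors) in enumerate(stream):
        in_sample = position < sample_size
        count = 0
        for v in earlier_neighbors:
            if v in sample_color and sample_color[v] == color:
                count += 1
                if in_sample:
                    in_sample_count[v] += 1  # update the earlier endpoint too
        if in_sample:
            sample_color[u] = color
            in_sample_count[u] = count
        elif count >= threshold:
            high_sum += count * scale  # extrapolate from the sample
        else:
            low_seen += 1
    return {"in_sample_count": in_sample_count,
            "high_sum": high_sum,
            "low_seen": low_seen}


rand_stream = [(1, 0, []), (2, 0, [1]), (3, 0, [1, 2]),
               (4, 0, [1, 2, 3]), (5, 1, [4])]
summary = classify_with_prefix_sample(rand_stream, n=5, sample_size=2, threshold=2)
```

A vertex arriving after the prefix sees all of its sample neighbors among its earlier neighbors, so its count inside the sample is exact without storing anything beyond the sample.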
Algorithm 2: Random-Order-Est (outline).
Notation used by the algorithm: the random ordering in which the vertices are revealed; for a vertex, the number of its monochromatic neighbors in the random sample; the (estimated) monochromatic neighbors of vertices; the set of high degree vertices; and a partition of vertices into buckets.
(1) Processing the vertices in the random sample, that is, the first vertices in the stream.
(2) Computation of some parameters based on the vertices in the random sample and their colors.
(3) Processing the remaining vertices in the stream.
(4) Post-processing, after the stream ends, to return the output: from the stored values, determine the bucket of each vertex, and then determine the final estimate.
To approximate , the random sample comes to rescue:
We can find , that is, a approximation of as described below. For each vertex and each monochromatic edge , , we see in the stream, we increase the value of for and for . After all the vertices in are revealed, we can determine by checking whether for each . For each vertex , we set its approximate monochromatic degree to be . We initialize the estimated sum of the monochromatic degree of high degree vertices as . For each vertex in the stream, we can determine , as we have stored all the vertices in along with their colors, and hence we can also determine whether is a degree vertex in . If is a degree vertex, we determine