Graph coloring is a fundamental and intensively studied problem in computer science: given a graph $G = (V, E)$, assign colors to the vertices in $V$ such that no two adjacent vertices share the same color. The minimum number of colors needed to color a graph is called the chromatic number of the graph. Finding the chromatic number is a notoriously hard problem: assuming $\mathrm{P} \neq \mathrm{NP}$, there is no polynomial time algorithm that approximates the chromatic number of a graph within a factor of $n^{1-\epsilon}$ for any constant $\epsilon > 0$, where $n$ is the number of vertices in the graph [FK96, Zuc06, KP06]. However, there are known upper bounds on the chromatic number. A graph with maximum degree $\Delta$ can be colored using at most $\Delta + 1$ colors. A simple linear time greedy coloring algorithm achieves such a coloring: it considers the vertices of the graph sequentially, assigning each vertex the first available color permitted by its already colored neighbors. A far superior bound is known for a rather large class of graph families: graphs with bounded degeneracy. A graph is $k$-degenerate if every subgraph of it has a vertex of degree at most $k$, and the degeneracy $\kappa$ of a graph is the minimum $k$ for which the graph is $k$-degenerate; it is easy to see that $\kappa \le \Delta$ for any graph. A graph with degeneracy $\kappa$ has a chromatic number of at most $\kappa + 1$. A greedy coloring algorithm on the following vertex ordering produces such a coloring: pick a vertex with minimum degree, remove the edges incident on this vertex, recursively order the remaining vertices, then place this vertex at the end of the list. Arboricity, the minimum number of forests into which the edges of an undirected graph can be partitioned, is closely related to degeneracy. For a graph with degeneracy $\kappa$ and arboricity $\alpha$, the following relations hold: $\alpha \le \kappa \le 2\alpha - 1$. Hence, a graph with arboricity bounded by $\alpha$ has a chromatic number of at most $2\alpha$, and the above greedy coloring algorithm produces such a coloring.
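The greedy coloring along a minimum-degree elimination ordering described above can be sketched as follows. This is a minimal offline illustration, not a streaming algorithm; the function names `build_adjacency`, `degeneracy_order`, and `greedy_coloring` are hypothetical, vertices are assumed to be labeled `0..n-1`, and the simple quadratic-time selection of a minimum degree vertex is chosen for brevity rather than efficiency.

```python
def build_adjacency(n, edges):
    """Adjacency sets of a simple undirected graph on vertices 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def degeneracy_order(n, edges):
    """Return (order, k): a greedy coloring order and the degeneracy k."""
    work = build_adjacency(n, edges)
    removed = []
    k = 0
    while work:
        # Pick a vertex of minimum degree in the remaining graph.
        v = min(work, key=lambda x: len(work[x]))
        k = max(k, len(work[v]))
        # Remove the edges incident on v, then v itself.
        for w in work[v]:
            work[w].discard(v)
        del work[v]
        removed.append(v)
    # A vertex removed earlier is placed later in the final list.
    return removed[::-1], k


def greedy_coloring(n, edges, order):
    """Assign each vertex the first color unused by its colored neighbors."""
    adj = build_adjacency(n, edges)
    color = {}
    for v in order:
        used = {color[w] for w in adj[v] if w in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color
```

When a vertex is colored, its already colored neighbors are exactly those that were still present when it was removed, so there are at most $k$ of them and at most $k + 1$ colors are ever used.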
In this work, we initiate the study of the graph coloring problem in the semi streaming model. In this model, the input graph is presented as a stream of edges; any algorithm must process the edges in the order of their arrival, in one or more passes, using $O(n \operatorname{polylog} n)$ space, where $n$ is the number of vertices [Mut05, FKM05]. The goal is to color the vertices of the graph using as few colors as possible. We first investigate whether we can find a $(1+\epsilon)\Delta$-vertex coloring of a graph with maximum degree $\Delta$ for any constant $\epsilon > 0$ in this model. We show that such a coloring can be found in one pass. Our space usage involves a factor polylogarithmic in $n$ and an $\epsilon$ dependent factor; we omit these factors for clarity, hiding them in the $\tilde{O}(\cdot)$ notation.
There is a randomized one pass streaming algorithm that finds a $(1+\epsilon)\Delta$-vertex coloring of the input graph using $\tilde{O}(n)$ space.
We then explore whether one can significantly improve the number of colors for more general graph families. In this regard, we consider bounded arboricity graph families. Specifically, we ask whether it is possible to color the vertices of a graph with arboricity $\alpha$ using at most $(2+\epsilon)\alpha$ colors for any constant $\epsilon > 0$. We answer this question in the affirmative as well, albeit at the expense of $O(\epsilon^{-1} \log n)$ passes over the stream.
There is a randomized $O(\epsilon^{-1} \log n)$ pass streaming algorithm that finds a $(2+\epsilon)\alpha$-vertex coloring of the input graph using $\tilde{O}(n)$ space, for any constant $\epsilon > 0$, where $\alpha$ is the arboricity of the input graph.
1.1 Our Techniques
We first give a high level overview of our $(1+\epsilon)\Delta$-coloring algorithm. We observe that, without any restriction on the space, there is a simple one pass algorithm for $(\Delta+1)$-coloring of the input graph. Assign every vertex the same color initially. Upon arrival of an edge, store it, and if both of its end-points share the same color, then recolor one of them using an “available” color (which always exists) that is not assigned to any of its neighbors. However, we do not have the luxury of storing the entire graph. Hence, we resort to a two phase coloring algorithm. In the first phase, we randomly partition the vertex set into subsets such that, for each subset, the maximum degree of the subgraph it induces is small with high probability (over the randomness of the partitioning). The random partitioning is realized by coloring each vertex independently and uniformly at random from a fixed-size color palette. We are then able to store each of these induced subgraphs entirely within the space budget. Note that the first phase coloring is not a proper coloring of the graph $G$; it is carried out to decompose the graph into smaller monochromatic subgraphs. In the second phase, we color each induced subgraph with a new palette using the aforementioned algorithm. Since the palettes are distinct, the second phase coloring results in a proper vertex coloring of the input graph. By setting various parameters suitably, we bound the number of colors by $(1+\epsilon)\Delta$. The detailed description of the algorithm is given in section 3.
We now present an overview of our second result, an $O(\epsilon^{-1} \log n)$ pass $(2+\epsilon)\alpha$-vertex coloring algorithm for graphs with arboricity $\alpha$. To design a color efficient algorithm for bounded arboricity graphs, we follow the broad strategy employed by Barenboim and Elkin in [BE10], who gave a distributed vertex coloring algorithm for such graphs. The crux of their algorithm is to find an orientation of the edges of the undirected input graph such that the maximum out-degree of any vertex in the oriented directed graph is bounded by $(2+\epsilon)\alpha$. Call such an orientation useful. We show how to find a useful orientation in $O(\epsilon^{-1} \log n)$ passes using $\tilde{O}(n)$ space in the streaming model, following the template of [BE10]. Even with a useful orientation of the edges, we are not quite done: the out-degree bound of $(2+\epsilon)\alpha$ can be prohibitively large under the space restriction of the semi streaming model. We employ the two phase coloring technique discussed above to randomly partition the input graph into induced subgraphs such that, within each induced subgraph, the maximum out-degree with respect to the useful orientation is small enough for the oriented subgraph to be stored. By appropriate settings of parameters, we prove that this is sufficient to get a vertex coloring algorithm that uses at most $(2+\epsilon)\alpha$ colors for any constant $\epsilon > 0$. We present this algorithm in detail in section 4.
1.2 Related Work
The graph coloring problem is one of the most central problems in the distributed computing model. A monograph by Barenboim and Elkin [BE13] gives an excellent overview of the state of the art. We mention below a few notable results in the synchronous message passing (SMP) model; for more detailed results, we refer to [BE13] and the references therein. In the SMP model, every vertex of the input graph is a processor, and the processors communicate with their neighbors (over the edges of the graph) in synchronous rounds. The running time of an algorithm is the number of rounds required. A randomized distributed coloring algorithm is given in [KSOS06]. This result was improved by Schneider and Wattenhofer [SW10], and subsequently by Barenboim et al. [BEPS16], who came up with a $(\Delta+1)$-coloring algorithm with running time $O(\log \Delta) + 2^{O(\sqrt{\log \log n})}$. Barenboim and Elkin [BE10] studied the vertex coloring problem for graphs with arboricity bounded by $\alpha$. They gave a deterministic algorithm that finds an $O(\alpha)$-coloring of the input graph, thereby remarkably stretching the class of graph families for which efficient coloring algorithms are known. The main challenge in converting any of these distributed algorithms into a streaming algorithm lies in reducing the number of rounds, since the number of rounds in the SMP model plausibly translates to the number of passes in the streaming setting. In the one pass streaming setting, however, it is not clear how to leverage the distributed algorithms to get an efficient streaming algorithm.
The graph coloring problem has been studied extensively in the dynamic setting, where the edges of the graph are inserted and deleted over time, and the goal is to maintain a valid vertex coloring of the graph after every update. Unlike the streaming setting, in this model there is no space restriction. The emphasis here is to use as few colors as possible while keeping the update time small. Bhattacharya et al. [BCHN18] gave a randomized algorithm that maintains a $(\Delta+1)$-vertex coloring with $O(\log \Delta)$ expected amortized update time. They also gave a deterministic algorithm that maintains a $(\Delta + o(\Delta))$-vertex coloring with $O(\operatorname{polylog} \Delta)$ amortized update time. Barba et al. [BCK17] studied various trade-offs between the number of colors used and the update time. However, the techniques used in the dynamic setting do not seem to be readily applicable in the streaming setting due to the fundamental differences between the models. The problem of edge coloring in dynamic graphs has been considered in [BM17, BCHN18]. There are many other heuristic based approaches known in this model, with emphasis on experimental supremacy [DGOP07, OB11, SIP16, HLT18].
In the streaming model, the problem of coloring a uniform hypergraph using two colors has been studied in [RSV15]. To the best of our knowledge, the general graph coloring problem has not been considered in the streaming model before.
Throughout this paper, we consider the input graph $G = (V, E)$ to be a simple undirected graph with $|V| = n$ and $|E| = m$. We work with the streaming model where the input graph is presented as a stream of edges in some adversarial order. We consider the cash-register variation of this model, in which edges, once inserted, are never deleted. For a vertex $v$, $N(v)$ denotes its set of neighbors and $\deg(v)$ denotes its degree. The maximum degree of a graph is denoted by $\Delta$.
An orientation of the edges of an undirected graph is an assignment of a direction to each edge of the graph. An oriented graph is a directed graph obtained by orienting the edges of the corresponding undirected graph. For a vertex $v$ in a directed graph $G$, $N^{+}_{G}(v)$ denotes the set of out-neighbors of $v$ and $\deg^{+}_{G}(v)$ denotes its out-degree. We drop the subscript if the graph is clear from the context. We denote the maximum out-degree of any vertex by $\Delta^{+}$. A directed acyclic graph or DAG is a directed graph that has no directed cycle.
Definition 2.1 (Degeneracy).
A graph is $k$-degenerate if every subgraph of it has a vertex of degree at most $k$. The degeneracy of a graph is the minimum $k$ for which the graph is $k$-degenerate.
Definition 2.2 (Arboricity).
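Arboricity of an undirected graph is the minimum number of forests into which the edges of the graph can be partitioned. We denote the arboricity of $G$ by $\alpha_G$. We drop the subscript if the underlying graph is clear from the context. By the work of Nash-Williams [NW64], we have
$$\alpha_G = \max_{H \subseteq G,\ |V(H)| \ge 2} \left\lceil \frac{|E(H)|}{|V(H)| - 1} \right\rceil .$$
It is easy to see that both the degeneracy and the arboricity of a graph are at most its maximum degree $\Delta$. A tighter bound is due to Chartrand et al. [CKW68]. It is also known that the arboricity is at most the degeneracy, which in turn is at most $2\alpha_G - 1$.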
Definition 2.3 (Vertex Coloring).
A proper vertex coloring of a graph $G = (V, E)$ with a set of colors $C$ is a function $f : V \to C$ such that $f(u) \neq f(v)$ for every edge $\{u, v\} \in E$.
The chromatic number of a graph is defined as the minimum number of colors needed to get a proper vertex coloring of the graph. We denote the chromatic number by $\chi(G)$. It trivially holds that $\chi(G) \le \Delta + 1$. For bounded degeneracy graphs, the upper bound on $\chi(G)$ improves to one more than the degeneracy. Similarly, for graphs with arboricity bounded by $\alpha$, we have $\chi(G) \le 2\alpha$.
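We apply the following versions of the Chernoff bound several times in the paper. Let $X_1, \dots, X_r$ be independent random variables that take values in $\{0, 1\}$. Let $X$ denote their sum and let $\mu$ denote its expected value. Then
$$\Pr[X \ge (1+\delta)\mu] \le \exp\left(-\frac{\delta^{2}\mu}{3}\right) \quad \text{for } 0 < \delta \le 1, \qquad \Pr[X \ge (1+\delta)\mu] \le \exp\left(-\frac{\delta\mu}{3}\right) \quad \text{for } \delta \ge 1.$$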
3 $(1+\epsilon)\Delta$-Vertex Coloring
In this section we design a simple one pass streaming algorithm that finds a $(1+\epsilon)\Delta$-vertex coloring of the input graph using $\tilde{O}(n)$ space. We assume that $\Delta$ is known to us. In the full version of this paper, we present a slightly modified algorithm that removes this assumption.
Suppose first that $\Delta$ is small enough that we are permitted to store every edge of the input graph. Then there is a trivial one pass streaming algorithm that maintains a $(\Delta+1)$-vertex coloring of the input graph after every edge update. The algorithm initializes every vertex with the same color. Upon arrival of each edge in the stream, the edge is stored, and if both of its end-points share the same color, then we recolor one of the end-points with an “available” color that is not assigned to any of its neighbors. Since there are $\Delta + 1$ colors in the palette and a vertex has at most $\Delta$ neighbors, there always exists an “available” color.
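The store-everything procedure above can be sketched in a few lines. This is a minimal illustration under the assumptions that vertices are labeled `0..n-1`, that the stream is a simple graph with maximum degree at most `max_degree`, and that colors are the integers `0..max_degree`; the function name `stream_color_store_all` is hypothetical.

```python
from collections import defaultdict


def stream_color_store_all(n, edge_stream, max_degree):
    """One pass (max_degree + 1)-coloring that stores every edge."""
    palette = range(max_degree + 1)
    color = [0] * n            # every vertex starts with the same color
    adj = defaultdict(set)     # all edges seen so far
    for u, v in edge_stream:
        adj[u].add(v)
        adj[v].add(u)
        if color[u] == color[v]:
            # Recolor u with a color unused by its stored neighbors.
            # u has at most max_degree neighbors, so one color is free.
            used = {color[w] for w in adj[u]}
            color[u] = next(c for c in palette if c not in used)
    return color
```

The invariant is that the coloring is proper on the edges seen so far: recoloring `u` with a color absent from its whole neighborhood cannot create a new conflict.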
Now we consider the interesting case when $\Delta$ is too large for the whole graph to be stored. In this case, we perform a two phase coloring of the input graph. In the first phase, we color each vertex independently and uniformly at random using $\ell$ colors, where $\ell$ is set in terms of a constant that is fixed later in the analysis to bound the probability of success. Note that this coloring may not be a proper coloring of the graph $G$; it is done to decompose the graph into smaller induced subgraphs (each color induces a subgraph) before the stream arrives. The first phase coloring results in $\ell$ monochromatic subgraphs of $G$. Let $V_i$ be the set of vertices with color $i$, and $G_i$ be the corresponding induced subgraph, for all $i \in [\ell]$. Denote by $\Delta_i$ the maximum degree of a vertex in the graph $G_i$. In the second phase, we color each subgraph $G_i$ in parallel, using distinct palettes. We show that with high probability every $\Delta_i$ is small, and hence a small palette suffices to color each subgraph $G_i$. We, in fact, store the entire graph $G_i$ for all $i \in [\ell]$ as the stream arrives, while recoloring the vertices as needed. This can be easily done by the algorithm described in the beginning of this section for the case of small $\Delta$. Since each $G_i$ is colored using a different palette, the second phase coloring results in a proper coloring of the original graph $G$. The final coloring of the vertices is due to this second phase. We present the coloring procedure in algorithm 1.
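The two phase procedure can be sketched as follows. This is an illustration under stated assumptions rather than a transcription of algorithm 1: the parameters `num_classes` (number of first phase colors) and `palette_size` (colors available to each monochromatic subgraph) are hypothetical inputs whose values the analysis is meant to fix, and a final color is represented as a pair (first phase color, second phase color) so that distinct subgraphs automatically use distinct palettes.

```python
import random
from collections import defaultdict


def two_phase_stream_coloring(n, edge_stream, num_classes, palette_size, seed=0):
    """One pass two phase coloring; returns a list of (class, inner) pairs."""
    rng = random.Random(seed)
    # Phase 1 (before the stream): a uniformly random class per vertex.
    cls = [rng.randrange(num_classes) for _ in range(n)]
    inner = [0] * n          # phase 2 color within the vertex's class
    adj = defaultdict(set)   # only monochromatic edges are stored
    for u, v in edge_stream:
        if cls[u] != cls[v]:
            continue  # end-points use different palettes, nothing to do
        adj[u].add(v)
        adj[v].add(u)
        if inner[u] == inner[v]:
            used = {inner[w] for w in adj[u]}
            free = next((c for c in range(palette_size) if c not in used), None)
            if free is None:
                # The event that the analysis bounds with high probability.
                raise RuntimeError("palette exhausted for a monochromatic subgraph")
            inner[u] = free
    return [(cls[v], inner[v]) for v in range(n)]
```

The total number of colors is at most `num_classes * palette_size`, and the stored edges are exactly those of the monochromatic subgraphs, which is what the space bound depends on.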
We now analyze algorithm 1. We first prove that, with high probability, algorithm 1 always finds an available color in line 21 during its execution. Since the palette of each monochromatic subgraph has a fixed number of colors, it is sufficient to show that the maximum degree of every monochromatic subgraph stays below the palette size with high probability, over the randomness of the first phase coloring. This is handled by lemma 3.1. Hence, algorithm 1 generates a proper coloring with high probability.
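Lemma 3.1.
Let $\Delta_i$ be the maximum degree of any vertex in the graph $G_i$, as defined in algorithm 1. Then, with high probability, $\Delta_i$ is within the bound required by algorithm 1 for all $i$.
Proof. Given any $i$, fix a vertex $v \in V_i$. For each neighbor $u$ of $v$ in $G$, let $X_u$ denote the indicator random variable such that $X_u = 1$ if $u$ has the same color as $v$ after the pre-processing step, and $X_u = 0$ otherwise. Let $X = \sum_{u \in N(v)} X_u$ denote the number of neighbors of $v$ with the same color as $v$ after the pre-processing step. Each neighbor receives the color of $v$ with probability $1/\ell$, where $\ell$ is the number of colors used in the first phase. By linearity of expectation, we get
$$\mathbb{E}[X] = \sum_{u \in N(v)} \mathbb{E}[X_u] = \frac{\deg(v)}{\ell} \le \frac{\Delta}{\ell}.$$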
Then we pick some such that .
Thus, by union bound, . Hence, probability that all vertices have at most neighbors with same color as themselves is at least . ∎
The main result in this section is captured in theorem 3.2 below.
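Theorem 3.2.
There is a randomized one pass streaming algorithm that produces a $(1+\epsilon)\Delta$-vertex coloring of a graph with maximum degree $\Delta$ using $\tilde{O}(n)$ space, for any constant $\epsilon > 0$. Furthermore, the worst case update time of the algorithm is dominated by the time to recolor a single vertex.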
The number of colors used by algorithm 1 can be upper bounded by . Assuming , we get
The space usage of the algorithm is dictated by the total size of the stored monochromatic subgraphs, which lemma 3.1 bounds with high probability. Hence, algorithm 1 requires $\tilde{O}(n)$ space. The bound on the update time follows from the recoloring time of a vertex in line 21. ∎
4 $(2+\epsilon)\alpha$-Vertex Coloring
In this section we discuss a $(2+\epsilon)\alpha$-vertex coloring algorithm in the semi streaming model, where $\alpha$ is the arboricity of the input graph. This significantly extends the class of graph families for which efficient coloring algorithms can be designed. In this section, we assume that $\alpha$ is known to the algorithm. In the full version of this paper we discuss how to remove this assumption, albeit at the expense of a slightly larger palette of colors.
To design a more color efficient algorithm for bounded arboricity graphs, at a high level we follow the strategy of Barenboim and Elkin [BE10]. They designed a distributed coloring algorithm with $\lfloor (2+\epsilon)\alpha \rfloor + 1$ colors in $O(\alpha \log n)$ rounds. We first discuss the central idea of their algorithm, and then discuss the challenges in implementing those ideas in the streaming model. Assume that, given a graph $G = (V, E)$ of arboricity $\alpha$ and a small constant $\epsilon > 0$, we partition the vertices in $V$ into $t = O(\epsilon^{-1} \log n)$ disjoint subsets $H_1, \dots, H_t$ such that the following property holds.
Bounded Degree Vertex Partition: Every vertex $v \in H_i$, for $i \in \{1, \dots, t\}$, has at most $(2+\epsilon)\alpha$ neighbours in the vertex set $H_i \cup H_{i+1} \cup \dots \cup H_t$.
Such a partitioning enables us to orient the edges so that the resulting directed graph is, in fact, a DAG with the maximum out-degree of any vertex bounded by $(2+\epsilon)\alpha$. For instance, consider the following orientation process. For an edge $\{u, v\}$, orient it from the vertex with the lower partition number to the one with the higher partition number. If both $u$ and $v$ are in the same partition, then orient the edge from the lower vertex id to the higher vertex id. It is not difficult to show that such an orientation is acyclic: every edge points from a smaller (partition number, vertex id) pair to a larger one, so no directed path can return to its starting vertex. Equivalently, if there is a cycle in the original graph, then there must be at least one vertex in that cycle that has two outgoing edges in the oriented graph. Since any DAG with maximum out-degree $d$ can be colored using at most $d + 1$ colors, the oriented graph leads to a $(\lfloor (2+\epsilon)\alpha \rfloor + 1)$-vertex coloring algorithm in the distributed setting, although not in a straightforward manner. The algorithm requires $O(\alpha \log n)$ rounds. Another interesting property of the vertex partitioning is that the edge orientations are implicitly defined by the partition itself. Hence we do not need to store edge specific information in order to maintain the oriented graph. We state this property of the bounded degree acyclic orientation of the edges of a graph in the following item.
Bounded Degree Acyclic Graph Orientation: Given a graph $G$ with arboricity $\alpha$ and a small constant $\epsilon > 0$, an orientation of the edges is called a bounded degree acyclic graph orientation if the orientation is acyclic and the maximum out-degree of any vertex under the orientation is at most $(2+\epsilon)\alpha$.
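The implicit orientation rule can be made concrete with a short sketch. It assumes a list `part` (a hypothetical name) that maps each vertex id to the index of its subset in the partition; the helper names `orient` and `out_degrees` are likewise illustrative. No per-edge information is stored: the direction of an edge is recomputed from the partition whenever it is needed.

```python
def orient(u, v, part):
    """Return (tail, head) for the undirected edge {u, v}.

    The edge points from the smaller (partition index, vertex id) pair
    to the larger one, so keys strictly increase along every arc and
    the orientation cannot contain a directed cycle.
    """
    return (u, v) if (part[u], u) < (part[v], v) else (v, u)


def out_degrees(n, edges, part):
    """Out-degree of every vertex under the implicit orientation."""
    out = [0] * n
    for u, v in edges:
        tail, _ = orient(u, v, part)
        out[tail] += 1
    return out
```

For a partition with the bounded degree vertex partition property, every out-neighbor of a vertex lies in its own subset or a later one, which is why the out-degrees inherit the degree bound of the partition.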
We now discuss the challenges in converting these ideas into an algorithm in the semi streaming model. The first challenge is to derive a vertex partitioning with the bounded degree vertex partition property using only $\tilde{O}(n)$ space. This turns out to be a rather easy task if we are allowed $O(\epsilon^{-1} \log n)$ passes. In order to find a vertex partitioning that satisfies this property, [BE10] gives a simple greedy algorithm that iteratively removes vertices of degree at most $(2+\epsilon)\alpha$ from the graph. They show that after $O(\epsilon^{-1} \log n)$ iterations, the desired partitioning is achieved. This process easily translates to an $O(\epsilon^{-1} \log n)$ pass, $\tilde{O}(n)$ space deterministic algorithm in the streaming model. For the sake of completeness we include a description of this algorithm in section 4.1.
The second challenge is to design a one pass streaming algorithm that can find a coloring for a DAG with maximum out-degree $(2+\epsilon)\alpha$. Such an algorithm is easy to design in the offline setting, where we can store the entire graph. For example, consider a greedy coloring algorithm that operates on the reverse topologically sorted ordering of the vertices. It assigns each vertex the first available color permitted by its already colored neighbors. It is easy to see that the algorithm uses at most one more color than the maximum out-degree. In the distributed setting, [BE10] devises an algorithm that requires $O(\alpha \log n)$ rounds. We overcome this obstacle by leveraging ideas from our $(1+\epsilon)\Delta$-coloring algorithm. Instead of working on the DAG directly, we consider a two phase coloring process. In the first phase, we color the vertices uniformly at random with a suitably chosen number of colors. This results in that many monochromatic subgraphs, such that each subgraph, when viewed as an oriented graph with respect to the vertex partitioning, has a small maximum out-degree. Hence, in the second phase, we use the offline algorithm to color each of the monochromatic subgraphs using a distinct palette. Setting the parameters suitably, we bound the number of colors by $(2+\epsilon)\alpha$.
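The offline greedy coloring of a DAG in reverse topological order can be sketched as follows. The sketch assumes Python 3.9 or later for the standard library module `graphlib`; the function name `color_dag` and the representation of arcs as `(tail, head)` pairs are illustrative choices. Passing each vertex's out-neighbors to `TopologicalSorter` as its "predecessors" makes `static_order()` emit every out-neighbor before the vertex itself, which is exactly a reverse topological order of the DAG.

```python
from graphlib import TopologicalSorter


def color_dag(n, arcs):
    """Greedy coloring of a DAG on vertices 0..n-1 given as (tail, head) arcs."""
    out = {v: set() for v in range(n)}
    for tail, head in arcs:
        out[tail].add(head)
    color = {}
    # Out-neighbors are emitted first, so they are already colored
    # when a vertex is reached. A CycleError is raised if arcs has a cycle.
    for v in TopologicalSorter(out).static_order():
        used = {color[w] for w in out[v]}
        # out-degree + 1 candidate colors always contain a free one.
        color[v] = next(c for c in range(len(out[v]) + 1) if c not in used)
    return color
```

Each vertex only has to avoid the colors of its out-neighbors, so a vertex of out-degree $d$ never needs a color index larger than $d$.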
4.1 Graph Orientation
In this section, we give an algorithm to find an orientation of the edges that is a bounded degree acyclic graph orientation. Our algorithm is a straightforward adaptation of the distributed edge orientation algorithm by Barenboim and Elkin [BE10] to the streaming model. The algorithm does not explicitly orient the edges; rather, it finds a partitioning of the vertex set that satisfies the bounded degree vertex partition property. The edge orientations are implicitly achieved by this partitioning. By the discussion in the beginning of section 4, it follows that the resulting orientation is a bounded degree acyclic graph orientation. We now present the procedure in algorithm 2.
In analyzing the algorithm, [BE10] showed that $O(\epsilon^{-1} \log n)$ iterations suffice. As a result, we have an $O(\epsilon^{-1} \log n)$-pass streaming algorithm.
Lemma 4.1.
There is an $O(\epsilon^{-1} \log n)$ pass, $\tilde{O}(n)$ space streaming algorithm that partitions the vertex set into $t$ disjoint subsets satisfying the bounded degree vertex partition property, for $t = O(\epsilon^{-1} \log n)$.
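The iterative removal of low degree vertices can be sketched as a multi-pass procedure. This is a minimal illustration in the spirit of the greedy process attributed to [BE10], not a transcription of algorithm 2: the function name `peel_partition` is hypothetical, the stream is modeled as a list that can be re-read once per pass, and only one degree counter and one subset index per vertex are kept.

```python
def peel_partition(n, edges, alpha, eps):
    """Return (part, passes): subset index per vertex and number of passes."""
    threshold = (2 + eps) * alpha
    part = [None] * n      # part[v] is None while v is still in the graph
    level = 0
    remaining = n
    while remaining > 0:
        # One pass: degrees in the subgraph induced by remaining vertices.
        deg = [0] * n
        for u, v in edges:
            if part[u] is None and part[v] is None:
                deg[u] += 1
                deg[v] += 1
        low = [v for v in range(n) if part[v] is None and deg[v] <= threshold]
        if not low:
            # Cannot happen when alpha is at least the true arboricity.
            raise ValueError("no low degree vertex left; alpha is too small")
        for v in low:
            part[v] = level
        remaining -= len(low)
        level += 1
    return part, level
```

A vertex placed in subset $i$ had degree at most $(2+\epsilon)\alpha$ among the vertices still present at that time, which are exactly the vertices of subset $i$ and all later subsets.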
4.2 Coloring Algorithm
In this section, we give a $(2+\epsilon)\alpha$-vertex coloring algorithm. Note that if $\alpha$ is at most polylogarithmic in $n$, then we can store the entire graph using $\tilde{O}(n)$ space, since a graph with arboricity $\alpha$ has at most $\alpha(n-1)$ edges. So we consider the interesting case when $\alpha$ is larger. We assume that $\alpha$ is known to us.
At first, we use algorithm 2 to partition the vertex set into disjoint subsets such that the bounded degree vertex partition property holds for some small constant. This constant is set as a function of the input parameter $\epsilon$. This ensures that the maximum out-degree of any vertex in the implicit orientation of the edges is bounded. In parallel, we consider a decomposition of the input graph into subgraphs. This is achieved by assigning every vertex a color picked independently and uniformly at random from a fixed set of colors, and then considering the monochromatic induced subgraphs. We have already demonstrated the usefulness of this idea in designing a $(1+\epsilon)\Delta$-vertex coloring algorithm in section 3. Following the same line of argument, we show that in every monochromatic induced subgraph, the maximum out-degree of any vertex with respect to the edge orientations is small. We give the details of this process in algorithm 3.
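The combination of the partition with the random decomposition can be sketched as follows. This is an illustration under several assumptions and should not be read as algorithm 3 itself: the partition `part` is assumed to be already computed (for example by the peeling passes of section 4.1), the monochromatic edges are collected in one additional pass, `num_classes` is a hypothetical parameter for the number of random colors, and `graphlib` requires Python 3.9 or later.

```python
import random
from graphlib import TopologicalSorter


def color_bounded_arboricity(n, edges, part, num_classes, seed=0):
    """Return a list of (class, inner) color pairs, one per vertex."""
    rng = random.Random(seed)
    cls = [rng.randrange(num_classes) for _ in range(n)]
    out = {v: set() for v in range(n)}
    for u, v in edges:  # one pass: keep only monochromatic edges
        if cls[u] != cls[v]:
            continue
        # Implicit orientation: smaller (subset index, id) pair is the tail.
        tail, head = (u, v) if (part[u], u) < (part[v], v) else (v, u)
        out[tail].add(head)
    inner = {}
    # Greedy coloring in reverse topological order; arcs never cross
    # classes, so every class is effectively colored on its own.
    for v in TopologicalSorter(out).static_order():
        used = {inner[w] for w in out[v]}
        inner[v] = next(c for c in range(len(out[v]) + 1) if c not in used)
    return [(cls[v], inner[v]) for v in range(n)]
```

Each class uses at most one more color than its largest monochromatic out-degree, so the total number of colors is at most `num_classes` times that quantity.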
We next analyze algorithm 3. From lemma 4.1 we have, . We first prove the bound on the number of colors used. It is easy to see that the algorithm produces a proper coloring, since each subgraph is colored using a distinct palette. The number of colors is bounded by . The space usage of the algorithm is . Hence, we focus on bounding , which is handled by lemma 4.2.
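Lemma 4.2.
Let the maximum out-degree of each oriented monochromatic subgraph be as defined in algorithm 3. Then, with high probability, it is within the bound required by algorithm 3 for every monochromatic subgraph.
Proof. Given any color class, fix a vertex $v$ in it. From the fact that the partition has the bounded degree vertex partition property and our definition of the orientation of the edges, it follows that the out-degree of $v$ is at most the degree bound guaranteed by the partition. For each out-neighbor $u$ of $v$, let $X_u$ denote the indicator random variable such that $X_u = 1$ if $u$ has the same color as $v$ after the pre-processing step, and $X_u = 0$ otherwise. Then $X = \sum_{u \in N^{+}(v)} X_u$ denotes the number of out-neighbors of $v$ with the same color as $v$ after the pre-processing step. Each out-neighbor receives the color of $v$ with probability $1/\ell$, where $\ell$ is the number of colors used in the pre-processing step. By linearity of expectation, we get
$$\mathbb{E}[X] = \sum_{u \in N^{+}(v)} \mathbb{E}[X_u] = \frac{\deg^{+}(v)}{\ell}.$$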
Then we can pick such that .
Then, by union bound, we get probability that all vertices have at most out-neighbors with same color as themselves is at least . ∎
Thus w.h.p. each oriented monochromatic subgraph will have . Then total number of colors used after the second phase is
where the second-to-last inequality follows by plugging in the chosen parameter values. Theorem 4.3 below captures our main result.
Theorem 4.3.
Given a graph with arboricity $\alpha$ and a small positive constant $\epsilon$, there is a randomized $O(\epsilon^{-1} \log n)$ pass streaming algorithm that finds a $(2+\epsilon)\alpha$-vertex coloring of the input graph using $\tilde{O}(n)$ space.
- [BCHN18] Sayan Bhattacharya, Deeparnab Chakrabarty, Monika Henzinger, and Danupon Nanongkai. Dynamic algorithms for graph coloring. In Proc. 39th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1–20, 2018.
- [BCK17] Luis Barba, Jean Cardinal, Matias Korman, Stefan Langerman, André van Renssen, Marcel Roeloffzen, and Sander Verdonschot. Dynamic graph coloring. In Workshop on Algorithms and Data Structures, pages 97–108, 2017.
- [BE10] Leonid Barenboim and Michael Elkin. Sublogarithmic distributed mis algorithm for sparse graphs using nash-williams decomposition. Distributed Computing, 22(5-6):363–379, 2010.
- [BE13] Leonid Barenboim and Michael Elkin. Distributed graph coloring: Fundamentals and recent developments. Synthesis Lectures on Distributed Computing Theory, 4(1):1–171, 2013.
- [BEPS16] Leonid Barenboim, Michael Elkin, Seth Pettie, and Johannes Schneider. The locality of distributed symmetry breaking. Journal of the ACM (JACM), 63(3):20, 2016.
- [BKV12] Bahman Bahmani, Ravi Kumar, and Sergei Vassilvitskii. Densest subgraph in streaming and mapreduce. International Conference on Very Large Data Bases, 5(5):454–465, 2012.
- [BM17] Leonid Barenboim and Tzalik Maimon. Fully-dynamic graph algorithms with sublinear time inspired by distributed computing. Procedia Computer Science, 108:89–98, 2017.
- [CKW68] Gary Chartrand, Hudson V Kronk, and Curtiss E Wall. The point-arboricity of a graph. Israel Journal of Mathematics, 6(2):169–175, 1968.
- [DGOP07] Antoine Dutot, Frédéric Guinand, Damien Olivier, and Yoann Pigné. On the decentralized dynamic graph coloring problem. Proc. Worksh. Compl. Sys. and Self-Org. Mod, 2007.
- [FK96] Uriel Feige and Joe Kilian. Zero knowledge and the chromatic number. In Annual IEEE Conference on Computational Complexity, page 278, 1996.
- [FKM05] Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and Jian Zhang. On graph problems in a semi-streaming model. Theor. Comput. Sci., 348(2–3):207–216, 2005. Preliminary version in Proc. 31st International Colloquium on Automata, Languages and Programming, pages 531–543, 2004.
- [HLT18] Bradley Hardy, Rhyd Lewis, and Jonathan Thompson. Tackling the edge dynamic graph colouring problem with and without future adjacency information. Journal of Heuristics, 24(3):321–343, 2018.
- [KP06] Subhash Khot and Ashok Kumar Ponnuswami. Better inapproximability results for maxclique, chromatic number and min-3lin-deletion. In International Colloquium on Automata, Languages and Programming, pages 226–237, 2006.
- [KSOS06] Kishore Kothapalli, Christian Scheideler, Melih Onus, and Christian Schindelhauer. Distributed coloring in $\tilde{O}(\sqrt{\log n})$ bit rounds. In Proceedings of the 20th international conference on Parallel and distributed processing, pages 44–44, 2006.
- [MTVV15] Andrew McGregor, David Tench, Sofya Vorotnikova, and Hoa T Vu. Densest subgraph in dynamic graph streams. In International Symposium on Mathematical Foundations of Computer Science, pages 472–482, 2015.
- [Mut05] S. Muthukrishnan. Data streams: Algorithms and applications. Found. Trends Theor. Comput. Sci., 1:117–236, 2005.
- [NW64] CSJA Nash-Williams. Decomposition of finite graphs into forests. Journal of the London Mathematical Society, 1(1):12–12, 1964.
- [OB11] Linda Ouerfelli and Hend Bouziri. Greedy algorithms for dynamic graph coloring. In Communications, Computing and Control Applications (CCCA), 2011 International Conference on, pages 1–5, 2011.
- [RSV15] Jaikumar Radhakrishnan, Saswata Shannigrahi, and Rakesh Venkat. Hypergraph two-coloring in the streaming model. arXiv preprint arXiv:1512.04188, 2015.
- [SIP16] Scott Sallinen, Keita Iwabuchi, Suraj Poudel, Maya Gokhale, Matei Ripeanu, and Roger Pearce. Graph colouring as a challenge problem for dynamic graph processing on distributed systems. In High Performance Computing, Networking, Storage and Analysis, SC16: International Conference for, pages 347–358. IEEE, 2016.
- [SW10] Johannes Schneider and Roger Wattenhofer. A new technique for distributed symmetry breaking. In Proceedings of the 29th ACM SIGACT-SIGOPS symposium on Principles of distributed computing, pages 257–266, 2010.
- [Zuc06] David Zuckerman. Linear degree extractors and the inapproximability of max clique and chromatic number. In Proc. 38th Annual ACM Symposium on the Theory of Computing, pages 681–690, 2006.