1. Introduction
Due to their learning, inference, and pattern-recognition abilities, machine learning techniques such as neural networks, probabilistic graphical models (PGMs) and other inference-based algorithms have become quite popular in artificial intelligence research. PGMs can easily express and solve intricate problems with many dependencies, making them a good match for problems such as graph coloring. The PGM process is similar to aspects of human reasoning, such as expressing a problem by using logic and observation, and applying inference to find a reasonable conclusion. With PGMs it is often possible to express and solve a problem from easily formulated relationships and observations, without the need to derive complex inverse relationships. This can be an aid for problems with many interdependencies that cannot be separated into independent parts to be approached individually and sequentially.
Although the cluster graph topology is well established in the PGM literature (Koller and Friedman, 2009), the overwhelmingly dominant topology encountered in the literature is the factor graph. We speculate that this is at least partially due to the absence of algorithms to automatically construct valid cluster graphs, whereas factor graphs are trivial to construct. To address this we detail a general-purpose construction algorithm termed LTRIP (Layered Trees Running Intersection Property). We have been covertly experimenting with this algorithm for a number of years (Streicher et al., 2016; Brink, 2016).
The graph coloring problem originated from the literal coloring of planar maps. It started with the four color map theorem, first noted by Francis Guthrie in 1852. He conjectured that four colors are sufficient to color neighboring counties differently for any planar map. It was ultimately proven by Kenneth Appel and Wolfgang Haken in 1976 and is notable for being the first major mathematical theorem with a computer-assisted proof. In general, the graph coloring problem deals with the labeling of nodes in an undirected graph such that adjacent nodes do not have the same label. The problem is core to a number of real-world applications, such as scheduling timetables for university subjects or sporting events, assigning taxis to customers, and assigning computer programming variables to computer registers (Lewis, 2015; Burke et al., 1994; Briggs, 1992). As graphical models became popular, message passing provided an exciting new approach to solving graph coloring and the closely related constraint satisfaction problems (Moon and Gunther, 2006; Kroc et al., 2009). For constraint satisfaction the survey propagation message passing technique seems to be particularly effective (Braunstein et al., 2005; Maneva et al., 2004; Kroc et al., 2012; Knuth, 2015). These techniques are primarily based on the factor graph PGM topology.
The work reported here forms part of a larger project aimed at developing an efficient alternative for the above message passing solutions to graph coloring. Cluster graphs and their efficient configuration are important in that work, hence our interest in those aspects here. Although we do also provide basic formulations for modeling graph coloring problems with PGMs, this is not the primary focus of the current paper, but instead only serves as a vehicle for comparing topologies.
This paper is structured as follows. Section 2 shows how the constraints of a graph coloring problem can be represented as “factors”. Furthermore, it is shown how these factors are linked up into graph structures on which inference can be applied. Section 3 discusses the factor graph and cluster graph topologies, as well as algorithms for automatically configuring them. The former is trivial; for the latter we provide the LTRIP algorithm in Section 3.3. Section 4 then integrates these ideas by expressing the well-known Sudoku puzzle (an instance of a graph coloring problem) as a PGM. The experiments in Section 5 show that, especially for complex cases, the cluster graph approach is simultaneously faster and more accurate. The last two sections consider possible future exploration and final conclusions.
2. Graph coloring with PGMs
This section provides a brief overview of graph coloring and PGMs, along with techniques of how to formulate a graph coloring problem as a PGM. We also explore the four color map theorem and illustrate through an example how to solve these and similar problems.
2.1. A general description of graph coloring problems
Graph coloring problems are NP-complete: they are easily defined and verified, but can be difficult to invert and solve. The problem is of significant importance as it is used in a variety of combinatorial and scheduling problems.
The general graph coloring problem deals with attaching labels (or “colors”) to nodes in an undirected graph, such that (a) no two nodes connected by an edge may have the same label, and (b) the number of different labels that may be used is minimized. Our focus is mostly on the actual labeling of a graph.
A practical example of such a graph coloring is the classical four color map problem that gave birth to the whole field: a cartographer is to color the regions of a planar map such that no two adjacent regions have the same color. To present this problem as graph coloring, an undirected graph is constructed by representing each region in the map as a node, and each boundary between two regions as an edge connecting those two corresponding nodes. Once the problem is represented in this form, a solution can be approached by any typical graph coloring algorithm. An example of this parametrization can be seen in Figure 1 (a) and (b); we refer to (c) and (d) later on.
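The map-to-graph parametrization described above can be sketched in a few lines. This is a minimal illustration only: the region names, the border list and the helper names `build_coloring_graph` and `is_valid_coloring` are hypothetical and do not correspond to the map in Figure 1.

```python
def build_coloring_graph(borders):
    """Return an adjacency dict {region: set of neighbours} from (region, region) borders."""
    graph = {}
    for a, b in borders:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def is_valid_coloring(graph, colors):
    """True if every node has a color and no edge joins two nodes of the same color."""
    return all(
        node in colors and all(colors[node] != colors.get(nb) for nb in nbrs)
        for node, nbrs in graph.items()
    )


# Hypothetical four-region map: r1 borders every other region,
# and r2-r3 and r3-r4 also share a border.
borders = [("r1", "r2"), ("r1", "r3"), ("r1", "r4"), ("r2", "r3"), ("r3", "r4")]
graph = build_coloring_graph(borders)
coloring = {"r1": "red", "r2": "green", "r3": "blue", "r4": "green"}
```

Checking a proposed labeling is cheap, which reflects the "easily verified" side of the problem; finding the labeling is the hard part.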
2.2. PGMs to represent graph coloring problems
PGMs are used as a tool to reason about large-scale probabilistic systems in a computationally feasible manner. They are known for their powerful inference over problems with many interdependencies. They are often useful for problems that are difficult to approach algorithmically, with graph coloring being a specific example.
In essence a PGM is a compact representation of a probabilistic space as the product of smaller, conditionally independent, distributions called factors. Each factor defines a probabilistic relationship over the variables within its associated cluster – a cluster being a set of random variables. For discrete variables this results in a discrete probability table over all possible outcomes of these variables. Instead of explicitly getting the product of these factors (which typically is not computationally feasible), a PGM connects them into an appropriate graph structure. Inference is done by passing messages (or beliefs) over the links in this structure until convergence is obtained. In combination with the initial factor distributions, these converged messages can then be used to obtain the (approximate) posterior marginal distributions over subsets of variables.
To factorize a graph coloring problem, we first need to parametrize the problem probabilistically. This is achieved by allowing each node in the graph to be represented by a discrete random variable that can take on a number of states. For graph coloring these states are the available labels for the node, e.g. four colors in the case of the four color map problem.
Now that we have the random variables of our system, and their domains, we need to capture the relationships between these variables in order to represent them as factors in our PGM. For graph coloring, no two adjacent nodes may have the same color, therefore their associated random variables may not have the same state. One representation of this system would then be to capture this relationship using factors with a scope of two variables, each taken as an adjacent pair of nodes from the coloring graph. Although this is a full representation of the solution space, there is a trade-off between accuracy and cluster size (we use size to mean cardinality) (Mateescu et al., 2010).
A clique is defined as a set of nodes that are all adjacent to each other within the graph, and a maximal clique is one that is not fully contained inside any other clique. To maximize the useful scope of factors, we prefer to define our factors directly on the maximal cliques of the graph. (We use the terms clique and cluster more or less interchangeably.) We then set the discrete probability tables of these factors to only allow states where all the variables are assigned different labels. In the next section we give an example of this.
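To make the notion of maximal cliques concrete, the sketch below enumerates them with the basic Bron-Kerbosch recursion. This is a standard textbook algorithm chosen here for illustration; the paper does not state which clique-finding method was used, and the example graph and function name are hypothetical.

```python
def maximal_cliques(graph):
    """Enumerate the maximal cliques of an undirected graph {node: set of neighbours}."""
    cliques = []

    def expand(clique, candidates, excluded):
        # No node can extend the clique and none was excluded: it is maximal.
        if not candidates and not excluded:
            cliques.append(frozenset(clique))
            return
        for node in list(candidates):
            expand(clique | {node}, candidates & graph[node], excluded & graph[node])
            # The node is now fully explored; move it from candidates to excluded.
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(graph), set())
    return cliques


# Hypothetical coloring graph with four nodes.
graph = {
    "r1": {"r2", "r3", "r4"},
    "r2": {"r1", "r3"},
    "r3": {"r1", "r2", "r4"},
    "r4": {"r1", "r3"},
}
cliques = maximal_cliques(graph)
```

Each returned clique would then become the scope of one factor.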
After finalizing the factors we can complete the PGM by linking these factors in a graph structure. There are several valid structure variants to choose from; in this paper we specifically focus on the factor graph versus the cluster graph structure. In the resulting graph structure, linked factors exchange information with each other about some, and not necessarily all, of the random variables they have in common. These variables are known as the separation set, or “sepset” for short, on the particular link of the graph. Whichever graph structure we choose must satisfy the so-called running intersection property (RIP) (Koller and Friedman, 2009, p.347). This property stipulates that for all variables in the system, any occurrence of a particular variable in two distinct clusters should have a unique (i.e. exactly one) path linking them up via a sequence of sepsets that all contain that particular variable. Several examples of this are evident in Figure 1 (d). In particular, note that one of the variables is absent from the sepset between two clusters that both contain it. If this were not so, there would have been two distinct sepset paths containing that variable between those two clusters. This would be invalid, broadly because it causes a type of positive feedback loop.
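The RIP requirement can be checked mechanically: for every variable, the links whose sepsets carry that variable must form a tree over exactly the clusters that contain it. The following is a minimal sketch under assumed data structures (clusters as a list of sets, sepsets as a dict keyed by pairs of cluster indices); the function name `satisfies_rip` is hypothetical.

```python
def satisfies_rip(clusters, sepsets):
    """clusters: list of sets; sepsets: {(i, j): set of variables} on undirected links."""
    clusters = [set(c) for c in clusters]
    for var in set().union(*clusters):
        nodes = {i for i, c in enumerate(clusters) if var in c}
        links = [edge for edge, sep in sepsets.items() if var in sep]
        # A sepset may only carry variables present in both clusters it links.
        if any(i not in nodes or j not in nodes for i, j in links):
            return False
        # A tree on n nodes has exactly n - 1 links ...
        if len(links) != len(nodes) - 1:
            return False
        # ... and is connected (checked by flooding outwards from one node).
        reached, frontier = set(), [next(iter(nodes))]
        while frontier:
            node = frontier.pop()
            if node in reached:
                continue
            reached.add(node)
            frontier += [b if a == node else a for a, b in links if node in (a, b)]
        if reached != nodes:
            return False
    return True


# Three hypothetical clusters sharing the variable "A".
clusters = [{"A", "B"}, {"A", "C"}, {"A", "D"}]
loop = {(0, 1): {"A"}, (1, 2): {"A"}, (0, 2): {"A"}}   # "A" travels around a loop
tree = {(0, 1): {"A"}, (1, 2): {"A"}}                  # "A" travels along a tree
```

Here `loop` violates the property because "A" can reach each cluster along two different sepset paths, whereas `tree` satisfies it.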
After establishing both the factors as well as linking them in a graph structure, we can do inference by using one of several belief propagation algorithms available.
2.3. Example: The four color map problem
We illustrate the above by means of the four color map problem. The example in Figure 1 can be expressed by seven random variables, grouped into five maximal cliques as shown. There will be no clique with more than four variables (otherwise four colors would not be sufficient, resulting in a counterexample to the theorem). These maximal cliques are represented as factors with uniform distributions over their valid (i.e. non-conflicting) colorings. We do so by assigning either a possibility or an impossibility to each joint state over the factor’s variables. More specifically we use a non-normalized discrete table and assign a “1” for outcomes where all variables have differing colors, and a “0” for cases with duplicate colors.
For example, the factor belief for one of the cliques of the puzzle in Figure 1 is shown in Table 1. These factors are connected into a graph structure, such as the cluster graph in Figure 1 (d). We can use belief propagation algorithms on this graph to find posterior beliefs.
Random variables (state)    Non-normalized probability
1  2  3  4                  1
1  2  4  3                  1
1  3  2  4                  1
1  3  4  2                  1
...
4  3  2  1                  1
elsewhere                   0
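A table of this kind can be generated rather than written out by hand. The sketch below stores it sparsely, keeping only the joint states with value 1 so that every absent state is implicitly 0; the function name `all_different_factor` and the dict-based representation are assumptions made for illustration.

```python
from itertools import permutations


def all_different_factor(num_variables, labels):
    """Sparse non-normalized table: joint states with all-distinct labels map to 1."""
    return {state: 1 for state in permutations(labels, num_variables)}


# Four variables with four available labels, as in the table above.
factor = all_different_factor(4, labels=(1, 2, 3, 4))
```

With four variables and four labels the table holds the 24 permutations of the labels. A clique with fewer variables than labels, for example three variables with four labels, is handled by the same call.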
We successfully tested this concept on various planar maps of size up to regions. These were generated by first generating super pixels using the SLIC algorithm (Achanta et al., 2012) to serve as the initially uncolored regions.
We hypothesize that systems configured as described above, utilizing only binary probabilities, always preserve all possible solutions; as yet we have found no counterexample to this. (This is certainly not true of loopy graphs that make use of non-binary probabilities.) The underlying reason seems to be that a state considered as possible within a particular factor will always be retained as such, except if a message from a neighboring factor flags it as impossible. In that case it is of course quite correct that it should be removed from the spectrum of possibilities.
However, in this four color map case the space of solutions can in principle be prohibitively large. We force our PGM to instead find a particular unique solution, by firstly fixing the colors in the largest clique, and secondly by very slightly biasing the other factor probabilities towards initial color preferences. This makes it possible to pick a particular unique coloring as the most likely posterior option. An example of a graph of 250 regions can be seen in Figure 2.
3. Factor vs cluster graph topologies
The graph structure of a PGM can make a big difference in the speed and accuracy of inference convergence. That said, factor graphs are the predominant structure in the literature, surprisingly so, since we found them to be inferior to a properly structured cluster graph. Cluster graphs allow for passing multivariate messages between factors, thereby maintaining some of the inter-variable correlations already known to the factor. This is in contrast to factor graphs, where information is only passed through univariate messages, thereby implicitly destroying such correlations.
A search on scholar.google.com (conducted on June 28, 2017) for articles relating to the use of factor graphs versus cluster graphs in PGMs returned the following counts:

5590 results for: probabilistic graphical models "factor graph",

661 results for: probabilistic graphical models "cluster graph", and

49 results for: probabilistic graphical models "factor graph" "cluster graph".
Among the latter 49 publications (excluding four items authored at our university), no cluster graph constructions are found other than for Bethe / factor graphs, junction trees, and the clustering of Bayes networks. We speculate that this relative scarcity of cluster graphs points to the absence of an automatic and generic procedure for constructing good RIP-satisfying cluster graphs.
3.1. Factor graphs
A factor graph built from a set of clusters can be expressed in cluster graph notation as a Bethe graph. For each available random variable, the Bethe graph contains an additional cluster consisting of only that variable. The factors associated with these additional clusters are all uniform (or vacuous) distributions and therefore do not alter the original product of distributions. Each cluster containing a given variable is linked to the vacuous cluster of that variable. This places the vacuous cluster at the hub of a star-like topology, with all the clusters that contain the variable radiating outwards from it. Due to this star-like topology the RIP requirement is trivially satisfied.
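The construction is simple enough to sketch directly. In the sketch below the clusters are given as sets of variable names, the single-variable hub clusters are appended after the original clusters, and every link carries a one-variable sepset; the function name `bethe_graph` and the index-pair representation of links are assumptions made for illustration.

```python
def bethe_graph(clusters):
    """Return (nodes, links) of the Bethe graph for a list of clusters.

    nodes: the original clusters followed by one single-variable cluster per variable.
    links: {(cluster index, hub index): sepset}, each sepset holding one variable.
    """
    clusters = [frozenset(c) for c in clusters]
    variables = sorted(set().union(*clusters))
    nodes = clusters + [frozenset({v}) for v in variables]
    links = {}
    for offset, var in enumerate(variables):
        hub = len(clusters) + offset  # index of the vacuous cluster for this variable
        for i, cluster in enumerate(clusters):
            if var in cluster:
                links[(i, hub)] = {var}
    return nodes, links


# Two hypothetical clusters sharing the variable "B".
nodes, links = bethe_graph([{"A", "B"}, {"B", "C"}])
```

Because each variable only ever travels over the star centered on its own hub, every message in this structure is univariate.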
3.2. Cluster graphs
A cluster graph built from a set of clusters is a non-unique undirected graph, where

(1) no cluster is a subset of another cluster,

(2) the clusters are used as the nodes,

(3) the nodes are connected by non-empty sepsets, each containing only variables common to the two clusters it links,

(4) and the sepsets satisfy the running intersection property.
Point (1) is not strictly necessary (see for instance the factor graph structure), but provides convenient computational savings. It can always be realized by simply assimilating non-obliging clusters into a superset cluster via distribution multiplication. Refer to Figure 1 (d) for an example of a typical cluster graph.
Although Koller and Friedman provide extensive theory on cluster graphs, they do not provide a general solution for constructing them (Koller and Friedman, 2009, p.404). Indeed, they state that “the choice of cluster graph is generally far from obvious, and it can make a significant difference to the [belief propagation] algorithm.” Furthermore, the need for such a construction algorithm is made clear by their experimental evidence, which indicates that faster convergence and an increase in accuracy can be obtained from better graph structuring. Therefore, since cluster graph theory is well established, an efficient and uncomplicated cluster graph construction algorithm will be useful. We provide the LTRIP algorithm for this purpose.
3.3. Cluster graph construction via LTRIP
The LTRIP algorithm is designed to satisfy the running intersection property for a cluster graph by layering the interconnections for each random variable separately into a tree structure, and then superimposing these layers to create the combined sepsets. More precisely, for each random variable in the system, all the clusters containing that variable are interconnected into a tree structure; this is then the layer for that variable. After finalizing all these layers, the sepset between any two cluster nodes in the cluster graph is the union of all the individual variable connections between them over all these layers.
While this procedure guarantees satisfying the RIP requirement, there is still considerable freedom in exactly how the tree structure on each separate layer is connected. In this we were guided by the assumption that it is beneficial to prefer linking clusters with a high degree of mutual information. We therefore chose to create trees that maximize the size of the sepsets between clusters. The full algorithm is detailed in Algorithm 1 with an illustration of the procedure in Figure 4. Note that other (unexplored) alternatives are possible for the connectionWeights function in the algorithm. In particular, it would be interesting to evaluate information theoretic considerations as criteria.
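The layering idea can be sketched compactly. The code below is an illustrative reading of the description above, not a transcription of Algorithm 1: it assumes that the connection weight between two clusters is the number of variables they share, and it builds each layer as a maximum spanning tree with Prim's algorithm. The function name `ltrip` and the data structures are hypothetical.

```python
def ltrip(clusters):
    """Return {(i, j): sepset} for a list of clusters, built one variable layer at a time."""
    clusters = [frozenset(c) for c in clusters]
    variables = sorted(set().union(*clusters))
    sepsets = {}
    for var in variables:
        # Layer for `var`: only the clusters that contain it take part.
        members = [i for i, c in enumerate(clusters) if var in c]
        in_tree = {members[0]}
        # Prim's algorithm for a maximum spanning tree over this layer.
        while len(in_tree) < len(members):
            i, j = max(
                ((a, b) for a in in_tree for b in members if b not in in_tree),
                # Assumed connection weight: number of variables the two clusters share.
                key=lambda e: len(clusters[e[0]] & clusters[e[1]]),
            )
            in_tree.add(j)
            # Superimpose the layer: add `var` to the sepset on this link.
            sepsets.setdefault((min(i, j), max(i, j)), set()).add(var)
    return sepsets


# Three hypothetical clusters forming a chain of overlaps.
sepsets = ltrip([{"A", "B", "C"}, {"B", "C", "D"}, {"C", "D", "E"}])
```

In this example the variable "C" occurs in all three clusters, yet it is placed on only two links, so no loop carries it. Because each layer is a tree over exactly the clusters containing its variable, the superimposed result satisfies the RIP by construction, whichever weight function is used.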
4. Modeling Sudoku via PGMs
The Sudoku puzzle is a well-known example of a graph coloring problem. A player is required to label a 9×9 grid using the integers “1” to “9”, such that 27 selected regions have no repeated entries. These regions are the nine rows, nine columns, and nine non-overlapping 3×3 subgrids of the puzzle. Each label is to appear exactly once in each region. If a Sudoku puzzle is under-constrained, i.e. too few of the values are known beforehand, multiple solutions are possible. A well-defined puzzle should have exactly one solution. We illustrate these constraints with a scaled-down Sudoku (with non-overlapping subgrids) in Figure 5 (a).
We use the Sudoku puzzle as a proxy for testing graph coloring via PGMs, since this is a well known puzzle with many freely available examples. However, it should be kept in mind that solving Sudoku puzzles per se is not a primary objective of this paper (in related work not reported on here we have developed a PGM system capable of easily solving all Sudoku puzzles we have encountered). We now show how to construct a PGM for a Sudoku puzzle, by following the same approach as described for the four color map problem.
4.1. Probabilistic representation
For the graph coloring and probabilistic representation of the Sudoku puzzle, each grid entry is taken as a node, and all nodes that are prohibited from sharing the same label are connected with edges, as seen in Figure 5 (b). It is apparent from the graph that each of the Sudoku’s “no-repeat regions” is also a maximal clique within the coloring graph.
The probabilistic representation for the scaled-down Sudoku therefore consists of one random variable for each cell within the puzzle. The factors of the system are set up according to the cliques present in the coloring graph. Three examples of these factors are a row constraint, a column constraint and a subgrid constraint. The entries of the discrete table of such a factor are exactly the same as those of Table 1. The full-sized Sudoku puzzles used in our experiments are set up in exactly the same manner as the scaled-down version, but now using 27 cliques, each of size nine.
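The cliques of a Sudoku can be listed directly from the grid geometry. In this sketch cells are identified by hypothetical (row, column) tuples, and the function name `sudoku_cliques` is an assumption; the 2-by-2 call reflects our reading that the scaled-down puzzle is a 4×4 grid, which is inferred from Table 1 rather than stated explicitly.

```python
def sudoku_cliques(box_rows=3, box_cols=3):
    """Return the no-repeat regions (rows, columns, subgrids) as sets of (row, col) cells."""
    n = box_rows * box_cols  # side length of the grid and number of labels
    rows = [[(r, c) for c in range(n)] for r in range(n)]
    cols = [[(r, c) for r in range(n)] for c in range(n)]
    boxes = [
        [(r, c) for r in range(br, br + box_rows) for c in range(bc, bc + box_cols)]
        for br in range(0, n, box_rows)
        for bc in range(0, n, box_cols)
    ]
    return [frozenset(region) for region in rows + cols + boxes]


full = sudoku_cliques()        # standard puzzle: 27 cliques of nine cells
small = sudoku_cliques(2, 2)   # assumed scaled-down puzzle: 12 cliques of four cells
```

Every cell belongs to exactly three cliques (its row, its column and its subgrid), which is what couples the factors together.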
We should also note that in the case of Sudoku puzzles, some of the values of the random variables are given beforehand. To integrate this into the system, we formally “observe” that variable. There are various ways to deal with this, one of which is to purge all the discrete distribution states not in agreement with the observations. Following this, the variable can be purged from all factor scopes altogether.
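One way to realize the purge-and-reduce step is sketched below on a sparse factor. The representation (a scope tuple plus a dict from joint states to values) and the function name `observe` are assumptions made for illustration, not the data structures of our implementation.

```python
from itertools import permutations


def observe(scope, table, variable, value):
    """Condition a sparse factor on variable == value and drop it from the scope."""
    if variable not in scope:
        return scope, table
    idx = scope.index(variable)
    new_scope = scope[:idx] + scope[idx + 1:]
    new_table = {
        # Keep only states that agree with the observation, minus the observed entry.
        state[:idx] + state[idx + 1:]: weight
        for state, weight in table.items()
        if state[idx] == value
    }
    return new_scope, new_table


# Hypothetical four-cell region with an all-different constraint; cell "b" is given as 3.
scope = ("a", "b", "c", "d")
table = {state: 1 for state in permutations((1, 2, 3, 4))}
new_scope, new_table = observe(scope, table, "b", 3)
```

After the observation the factor ranges over three variables and only retains states that avoid the observed label, so both the scope and the table shrink.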
4.2. Graph structure for the PGM
We have shown how to parametrize the Sudoku puzzle as a coloring graph, and furthermore, how to parametrize the graph probabilistically. This captures the relationships between the variables of the system via discrete probability distributions. The next step is to link the factors into a graph structure. We outlined factor graph construction in Section 3.1, as well as cluster graph construction via LTRIP in Section 3.3. We apply these two construction methods directly to the Sudoku clusters, thereby creating structures such as the cluster graph of Figure 6.
4.3. Message passing approach
For the sake of brevity we do not discuss the detail of belief propagation techniques here – this is adequately available from many resources, including our references. However, for completeness we list some settings we applied:

For the inference procedure we used the belief update procedure, also known as the Lauritzen-Spiegelhalter algorithm (Lauritzen and Spiegelhalter, 1988).

The convergence of the system, as well as the message passing schedule, is determined according to the Kullback-Leibler divergence between the newest and immediately preceding sepset beliefs.

Max-normalization and max-marginalization are used in order to find the maximum posterior solution over the system.

To make efficient use of memory and processing resources, all discrete distributions support sparse representations.
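As an illustration of two of the settings above, the sketch below applies max-marginalization and max-normalization to a sparse table. It assumes that max-normalization means scaling a table so that its largest entry equals 1; the function names, the scope-plus-dict representation and the example values are hypothetical.

```python
def max_marginalize(scope, table, keep):
    """Max-marginalize a sparse table onto the variables in `keep` (scope order preserved)."""
    idx = [i for i, var in enumerate(scope) if var in keep]
    result = {}
    for state, weight in table.items():
        key = tuple(state[i] for i in idx)
        # Keep the best value among all states that agree on the retained variables.
        result[key] = max(result.get(key, 0.0), weight)
    return tuple(scope[i] for i in idx), result


def max_normalize(table):
    """Scale a table so that its largest entry equals 1 (assumed meaning of the term)."""
    top = max(table.values())
    return {state: weight / top for state, weight in table.items()}


# Hypothetical two-variable belief; states absent from the dict have value 0.
scope = ("x", "y")
belief = {(1, 2): 2.0, (1, 3): 6.0, (2, 1): 3.0}
msg_scope, msg = max_marginalize(scope, belief, keep={"x"})
msg = max_normalize(msg)
```

Replacing the sum of ordinary marginalization with a maximum is what steers the propagation towards the most probable joint assignment rather than towards marginal probabilities.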
5. Experimental investigation
As stated earlier, factor graphs are the dominant PGM graph structure encountered in the literature. This seems like a compromise, since cluster graphs have traits that should enable superior performance. In this section we investigate the efficiency of cluster graphs compared to factor graphs by using Sudoku puzzles as test cases.
5.1. Databases used
For our experiments, we constructed test examples from two sources: (a) 50 Sudoku puzzles ranging in difficulty taken from Project Euler (Hughes, 2012), and (b) the “95 hardest Sudokus sorted by rating” taken from Sterten (Sterten, 2005). All these Sudoku problems are well-defined with a unique solution, and the results are available for verification.
5.2. Purpose of experiment
The goal of our experiments is to investigate both the accuracy as well as the efficiency of cluster graphs as compared to factor graphs. Our hypothesis is that properly connected cluster graphs, as constructed with the LTRIP algorithm, will perform better during loopy belief propagation than a factor graph constructed with the same factors.
Mateescu et al. (2010) show that inference behavior differs with factor complexities. A graph with large clusters is likely to be computationally more demanding than a graph with smaller clusters (when properly constructed from the same system), but the posterior distribution is likely to be more precise. We therefore also want to test the performance of cluster graphs compared to factor graphs over a range of cluster sizes.
5.3. Design and configuration of the experiment
Our approach is to set up Sudoku tests with both factor graphs and cluster graphs using the same initial clusters. With regard to setting up the PGMs, we follow the construction methodology outlined in Section 4.
In order to generate graphs with smaller cluster sizes, we strike a balance between clusters of size two, using every adjacent pair of nodes within the coloring graph as described in Section 2.2, and using the maximal cliques within the graph, also described in that section. We do so by generating smaller clusters of a chosen size from each clique (the chosen size being no larger than the clique itself). We split a clique by sampling all combinations of that many variables from the clique, and keeping only a subset of the samples, such that every pair of adjacent nodes from the clique is represented at least once within one of the samples.
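The splitting step can be sketched as a simple covering procedure. How the retained subset of samples is selected is not spelled out above, so the first-fit rule used here (keep a combination only if it covers a pair not yet covered) is an assumption, as is the function name `split_clique`.

```python
from itertools import combinations


def split_clique(clique, size):
    """Pick `size`-variable subsets of a clique until every pair of its nodes is covered."""
    clique = sorted(clique)
    uncovered = set(combinations(clique, 2))
    chosen = []
    for subset in combinations(clique, size):
        pairs = set(combinations(subset, 2))
        # First-fit rule (an assumption): keep a subset only if it covers something new.
        if pairs & uncovered:
            chosen.append(frozenset(subset))
            uncovered -= pairs
        if not uncovered:
            break
    return chosen


# Split a hypothetical nine-variable clique into clusters of three variables.
small_clusters = split_clique(range(9), 3)
```

Because every pair of nodes still appears together in at least one cluster, each pairwise "must differ" constraint of the original clique remains represented, although the joint constraint over the full clique is only captured approximately.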
For experiments using the Project Euler database we construct Sudoku PGMs with cluster sizes of three, five, seven, and nine variables in this manner. This results in graphs of , , and clusters respectively. We compare the runtime efficiency and solution accuracy for both factor and cluster graphs constructed from the same set of clusters.
On the much harder Sterten database, PGMs based on cluster sizes less than nine were very inaccurate. We therefore limit those experiments to only clusters with size nine.
5.4. Results and interpretation
Figure 7 shows the results we obtained.
Cluster graphs showed superior accuracy for all the available test cases. We stress the fact that from our results, whenever a cluster graph failed to obtain a valid solution the corresponding factor graph also failed. However, it happened regularly that a cluster graph succeeded where a factor graph failed, especially so in the more trying configurations.
In the case of small clusters factor graphs apparently are faster than cluster graphs. Since cluster graphs built from small clusters are getting closer to factor graphs in terms of sepset sizes, this is unexpected. We expected the execution speed to also get closer to each other in this case.
As the cluster sizes increase, especially so when the problem domain becomes more difficult, the cluster graphs clearly outperform the factor graphs in terms of execution speed. Two explanations come to mind. Firstly, with the larger sepset sizes the cluster graph needs to marginalize out fewer random variables when passing messages over that sepset. Since marginalization is one of the expensive components in message passing, this should result in computational savings. Secondly, the larger sepset sizes allow factors to pass richer information to their neighbors. This speeds up the convergence rate, once again resulting in computational savings.
6. Future work
The LTRIP algorithm is shown to produce well-constructed graphs. However, the criteria for building the maximal spanning trees in each layer can probably benefit from further refinement. In particular, we suspect that taking the mutual information between factors into account might prove useful.
Our graph coloring parametrization managed to solve certain Sudoku puzzles successfully, as well as assigning colors to the four color map problem. This is a good starting point for developing more advanced techniques for solving graph coloring problems.
In this paper we evaluated our cluster graph approach on a limited set of problems. We hope that the LTRIP algorithm will enhance the popularity of these problems, as well as other related problems. This should provide evaluations from a richer set of conditions, contributing to a better understanding of the merits of this approach.
7. Conclusion
The objective of this study was a) to illustrate how graph coloring problems can be formulated with PGMs, b) to provide a means for constructing proper cluster graphs, and c) to compare the performance of these graphs to the ones prevalent in the current literature.
The main contribution of this paper is certainly LTRIP, our proposed cluster graph construction algorithm. The cluster graphs produced by LTRIP show great promise in comparison to the standard factor graph approach, as demonstrated by our experimental results.
References
Achanta et al. (2012). SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (11), pp. 2274–2282.
Braunstein et al. (2005). Survey propagation: An algorithm for satisfiability. Random Structures & Algorithms 27 (2), pp. 201–226.
Briggs (1992). Register allocation via graph coloring. Ph.D. Thesis, Rice University.
Brink (2016). Using probabilistic graphical models to detect dynamic objects for mobile robots. Ph.D. Thesis, Stellenbosch: Stellenbosch University.
Burke et al. (1994). A university timetabling system based on graph colouring and constraint manipulation. Journal of Research on Computing in Education 27 (1), pp. 1–18.
Hughes (2012). Project Euler. https://projecteuler.net. Accessed: 2017-07-03.
Knuth (2015). The art of computer programming, volume 4, fascicle 6: satisfiability. 1st edition, Addison-Wesley Professional. ISBN 0134397606, 9780134397603.
Koller and Friedman (2009). Probabilistic graphical models: principles and techniques. 1st edition, MIT Press.
Kroc et al. (2009). Counting solution clusters in graph coloring problems using belief propagation. In Advances in Neural Information Processing Systems, pp. 873–880.
Kroc et al. (2012). Survey propagation revisited. CoRR abs/1206.5273.
Lauritzen and Spiegelhalter (1988). Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society. Series B (Methodological), pp. 157–224.
Lewis (2015). A guide to graph colouring: algorithms and applications. Springer.
Maneva et al. (2004). A new look at survey propagation and its generalizations. CoRR cs.CC/0409012.
Mateescu et al. (2010). Join-graph propagation algorithms. Journal of Artificial Intelligence Research 37, pp. 279–328.
Moon and Gunther (2006). Multiple constraint satisfaction by belief propagation: an example using sudoku. In Adaptive and Learning Systems, 2006 IEEE Mountain Workshop on, pp. 122–126.
Sterten (2005). Sudoku. http://magictour.free.fr/sudoku.htm. Accessed: 2017-07-03.
Streicher et al. (2016). A probabilistic graphical model approach to the structure-and-motion problem. In 2016 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech), pp. 1–6.