Gradient Descent based Weight Learning for Grouping Problems: Application on Graph Coloring and Equitable Graph Coloring

09/05/2019 · Olivier Goudet et al. · Université d'Angers

A grouping problem involves partitioning a set of items into mutually disjoint groups or clusters according to some guiding decision criteria and imperative constraints. Grouping problems have many relevant applications and are computationally difficult. In this work, we present a general weight learning based optimization framework for solving grouping problems. The central idea of our approach is to formulate the task of seeking a solution as a real-valued weight matrix learning problem that is solved by first order gradient descent. A practical implementation of this framework is proposed with tensor calculus in order to benefit from parallel computing on GPU devices. To show its potential for tackling difficult problems, we apply the approach to two typical and well-known grouping problems (graph coloring and equitable graph coloring). We present large computational experiments and comparisons on popular benchmarks and report improved best-known results (new upper bounds) for several large graphs.


1 Introduction

Grouping problems generally involve partitioning a set of items into mutually disjoint groups or clusters according to some guiding decision criteria and imperative constraints [falkenauer1998genetic]. For example, given a graph $G=(V,E)$ with node set $V$ and edge set $E$, the popular NP-hard graph $k$-coloring problem is to color the nodes of $G$ with $k$ available colors in such a way that two nodes linked by an edge in $E$ must receive two different colors. Equivalently, this problem can be stated as a grouping problem, i.e., to partition the nodes of $G$ into $k$ color groups such that each color group gathers the nodes receiving the same color. Additional examples of grouping problems include other coloring problems (e.g., equitable coloring and sum coloring) [Lewis16], and well-known NP-hard problems such as graph $k$-partitioning [BenlicH13, BulucMSS016], bin packing [Falkenauer96] and maximally diverse grouping [BrimbergMU15, LaiH16]. Grouping problems are also useful models to formulate a number of relevant applications related to, e.g., assembly line balancing [Falkenauer97], timetabling [LewisP07], single batch-machine scheduling [KashanKK13] and routing [Abbasi-PooyaK17]. More illustrations of grouping problems and their applications can be found in [falkenauer1998genetic, KashanKK13, zhou2016reinforcement].

From a computational complexity perspective, many grouping problems, including those mentioned above, are difficult search problems. As a result, finding satisfactory solutions to such problems represents a real challenge from an algorithmic perspective. Given the importance of grouping problems, various solution methods have been investigated in the literature. However, a majority of existing studies focus on particular problems (e.g., the above-mentioned problems) and it is difficult to generalize their results to other settings. Among the scarce general studies on grouping problems, the grouping genetic algorithm (GGA) introduced in [falkenauer1998genetic] is certainly the most representative and significant. As a specialization of the general framework of genetic algorithms, GGA defines a special group encoding scheme and associated genetic operators that manipulate groups instead of group members. Following the idea of GGA, a grouping version of the particle swarm optimization algorithm (PSO) was adapted to grouping problems in [KashanKK13]. In the adapted method (called GPSO), the particle position and velocity updating equations of PSO are modified to preserve the major characteristics of the original PSO equations and to take into account the structure of grouping problems. Contrary to GGA, which examines discrete spaces, GPSO works in both continuous and discrete spaces. Recently, an original reinforcement learning based local search approach (RLS) was proposed in [zhou2016reinforcement]. RLS introduces the idea of using a probability matrix to specify an item-to-group assignment and uses information learned from improved solutions given by local search to update the probability matrix. As such, the method learns a generative model of solutions that is used to iteratively seed a local search procedure. RLS provides one of the main inspirations of our work.

In this work we are interested in solving grouping problems from a general perspective by proposing a generic method that can be applied to different grouping problems. The contributions of this work can be summarized as follows.

First, we introduce a weight learning based solution framework, called GpGrad, for solving grouping problems. We adopt the idea of using a weight matrix to specify a relative confidence score for each item to belong to each group. This weight matrix is updated step by step by gradient descent based learning in order to reach a legal solution. This approach benefits from GPU accelerated training and is able to deal with multiple parallel solution evaluations. As the first study of using gradient descent based weight learning to solve grouping problems, this work enriches the still limited toolkit of general solution methods for this important class of problems.

Second, we demonstrate the usefulness of the proposed framework by applying it on two well-known NP-hard grouping problems: the conventional Graph Coloring Problem (GCP) and the Equitable graph Coloring Problem (ECP), which have been intensively studied in the literature (see e.g., [galinier1999hybrid, lu2010memetic, titiloye2011graph, moalic2018variations, zhou2018improving] for the GCP and [diaz2014tabu, lai2015backtracking, sun2017feasible, wang2018tabu] for the ECP). For both problems, we present large computational experiments on well-known benchmark graphs and show the competitiveness of our approach compared to the best performing state-of-the-art algorithms. In particular, we report improved best results for several large geometric graphs and large random graphs for the ECP.

Third, this work shows the viability of formulating a discrete grouping problem as a real-valued weight learning problem. The competitive results on two challenging and representative grouping problems invite further investigation of applying the proposed approach to other grouping problems and applications.

The remainder of the paper is organized as follows. Section 2 describes the proposed framework for solving grouping problems. Sections 3 and 4 are devoted to the application of the framework to the GCP and ECP respectively. Section 5 provides illustrative toy examples of the gradient descent learning. Section 6 reports computational results of our algorithms compared to the state of the art. Section 7 reviews related heuristics of the literature for the GCP and ECP. Section 8 discusses the contributions and presents perspectives for future work.

2 Gradient descent based learning for grouping problems

Given a set $V$ of $n$ distinct items, the task of a grouping problem is to partition the items of set $V$ into $k$ different groups $g_1, \ldots, g_k$, such that $\cup_{j=1}^{k} g_j = V$ and $g_i \cap g_j = \emptyset$ for $i \neq j$, while taking into account some specific constraints and optimization objectives. We denote by $\Omega$ the search space of all possible grouping assignments with $k$ groups (candidate solutions) for a given problem instance.

In this work, we assume that the number of groups $k$ is fixed. For grouping problems where the task is to find a minimum number of groups, such as the bin packing problem and the graph coloring problem, we can successively solve the problem with decreasing values of $k$. Each time a solution is found for a given $k$, it gives an upper bound of the optimal number of groups. For instance, for the general graph coloring problem, which aims to determine the chromatic number $\chi(G)$ of a graph $G$ (i.e., the smallest number of colors $k$ for which a legal $k$-coloring exists), we typically solve a series of $k$-coloring problems with decreasing $k$ [galinier2013recent], as sketched below.
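As an illustration, here is a minimal Python sketch of this standard decreasing-$k$ strategy; `solve_with_k_groups` is a hypothetical solver (not part of this paper's code) that returns a legal grouping with $k$ groups, or `None` on failure:

```python
# Minimal sketch of the decreasing-k strategy for minimization variants of
# grouping problems; solve_with_k_groups is a hypothetical fixed-k solver.
def minimize_groups(solve_with_k_groups, k_start):
    k, best = k_start, None
    while True:
        solution = solve_with_k_groups(k)   # e.g., try to find a legal k-coloring
        if solution is None:
            # k + 1 is the last value for which a solution was found (an upper
            # bound of the optimum); best is None if even k_start failed.
            return k + 1, best
        best, k = solution, k - 1           # success: record it, try one group fewer
```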

According to this definition of a grouping problem, a candidate solution in the search space $\Omega$ can be represented as a collection of $k$ groups: $S = \{g_1, \ldots, g_k\}$. In this work, we use an equivalent matrix formulation of a candidate solution: for a fixed number of groups $k$, a candidate solution is represented as a matrix $S$ in $\{0,1\}^{n \times k}$, with $s_{ij} \in \{0,1\}$ and $\sum_{j=1}^{k} s_{ij} = 1$ for $i = 1, \ldots, n$, where $s_{ij} = 1$ if the $i$-th item belongs to the group $g_j$. The $i$-th row of the candidate solution matrix $S$ defines the group assignment vector of the $i$-th item and is denoted by $S_i$.

We assume that the constraints and the optimization objective of the grouping problem can be formulated with a single evaluation function $f: \Omega \to \mathbb{R}$, where $f(S)$ is the evaluation (fitness) of the candidate solution $S$. The goal of the problem is to find a solution $S^*$ such that $f(S^*)$ is minimum.

2.1 Weight formulation of the problem

Inspired by the work of [zhou2016reinforcement], we express a (discrete) grouping (a candidate solution) as a continuous grouping by defining a weight matrix $W$ in $\mathbb{R}^{n \times k}$ composed of $n$ real-valued vectors of size $k$ (note that the weight matrix $W$ can contain any real number, while the probability matrix of [zhou2016reinforcement] is composed of non-negative numbers). This weight matrix corresponds to a learned and soft assignment of each item to each group. The weight matrix representation has the advantage of being more flexible and more informative than the binary group assignment. Indeed, during the search, for each item, the algorithm can learn to reject some group assignments by decreasing the associated weights, confirm some group assignments by increasing their weights, or even set almost equal weights for uncertain group assignments. The $i$-th row of the weight matrix defines the weight vector of the $i$-th item and is denoted by $W_i$.

Item-to-group assignment. Given a weight matrix $W$, the associated solution (grouping) $S$ can be derived using the following procedure: for $i = 1, \ldots, n$, $s_{ij} = 1$ if $j = \mathrm{argmax}_{l = 1, \ldots, k}\, w_{il}$ and $s_{ij} = 0$ otherwise. Each item selects the group with the maximum weight in its weight vector $W_i$. By abuse of notation and for simplicity, we rewrite this group selection for all items as a single function:

$$S = \mathrm{one\_hot}(\mathrm{argmax}(W)) \qquad (1)$$

where the argmax and one_hot operations are applied along the last axis of the matrix (group axis). Figure 1 shows an example of this group selection.

Figure 1: Example of group selection $S$ from a weight matrix $W$.
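As an illustration, equation (1) is a one-liner in PyTorch (the library used in our implementation, Section 6.2); the sizes below are arbitrary toy values:

```python
# Minimal sketch of the group selection step of equation (1) in PyTorch.
import torch

n, k = 6, 3                                   # toy numbers of items and groups
W = torch.randn(n, k) * 0.01                  # weight matrix, small random initialization
S = torch.nn.functional.one_hot(W.argmax(dim=-1), num_classes=k).float()
print(S)                                      # binary (n, k) matrix with one 1 per row
```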

Fitness evaluation. Using the current solution $S$, we can compute the fitness $f(S)$ of the problem. As $S$ derives from $W$, the given grouping problem then becomes that of finding $W$ such that $f(S)$ is minimum. As we explain below, we use first order gradient descent to solve this continuous optimization problem.

Gradient evaluation. To tackle this real-valued optimization problem by first order gradient descent over $W$, we need to compute the gradient of the evaluation function $f$ with respect to $W$. This is a real matrix of size $n \times k$ whose element $(i,j)$ is $\frac{\partial f(S)}{\partial w_{ij}}$. By applying the chain rule [rumelhart1985learning], it gives, for $i = 1, \ldots, n$ and $j = 1, \ldots, k$:

$$\frac{\partial f(S)}{\partial w_{ij}} = \sum_{l=1}^{k} \frac{\partial f(S)}{\partial s_{il}} \frac{\partial s_{il}}{\partial w_{ij}} \qquad (2)$$

2.2 Softmax approximation

The function $\mathrm{one\_hot} \circ \mathrm{argmax}$ is differentiable almost everywhere with respect to $W$, but each term $\frac{\partial s_{il}}{\partial w_{ij}}$ entering in equation (2) is always equal to zero (wherever it is defined). Therefore, we use the softmax function as a continuous, differentiable approximation of argmax, with an informative gradient.

$S$ can be approximated by a matrix $P$ of size $n \times k$, where each coefficient $p_{ij}$ is computed from the elements of $W_i$ using the softmax function [bridle1990training] as:

$$p_{ij} = \frac{e^{\tau w_{ij}}}{\sum_{l=1}^{k} e^{\tau w_{il}}} \qquad (3)$$

Each coefficient $p_{ij}$ of the matrix $P$ denotes the probability that the $i$-th item selects the $j$-th group as its group, and $\tau$ is a control parameter in $(0, +\infty)$. In the following we rewrite this probability evaluation for all items with a single matrix equation as:

$$P = \mathrm{softmax}(\tau W) \qquad (4)$$

where the softmax function is applied along the last axis (group axis) for each item.

For any fixed $W$, as $\tau$ goes toward infinity, $P$ converges toward $S$. For an item $i$, the group selected is the one with the maximum weight in $W_i$, and the corresponding row $S_i$ is the one-hot vector of that group. Figure 2 shows the smooth approximation $P_i$ of $S_i$ for values of the parameter $\tau$ ranging from 0.1 to 100. When the value of $\tau$ increases, the probability vector $P_i$ goes toward $S_i$.

Figure 2: Value of the probability vector $P_i$ for different values of the parameter $\tau$. When $\tau$ goes toward infinity, $P_i$ goes toward $S_i$.
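The following small PyTorch sketch illustrates this convergence on a hypothetical weight vector (the values are ours, chosen for illustration; the $\tau$ range matches Figure 2):

```python
# Softmax approximation of equations (3)-(4) for a single item, at several
# values of the control parameter tau.
import torch

W_i = torch.tensor([0.5, 0.1, 0.2])            # hypothetical weight vector of one item
for tau in [0.1, 1.0, 10.0, 100.0]:
    P_i = torch.softmax(tau * W_i, dim=-1)     # probability vector of equation (3)
    print(tau, P_i)                            # approaches the one-hot vector (1, 0, 0)
```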

This kind of softmax approximation has notably been used to solve assignment problems in statistical physics frameworks [peterson1989new, gold1996softmax], where at each iteration step of the optimization process the control parameter $\tau$ is increased in a deterministic annealing scheme, such that $P$ goes toward $S$. More recently, this kind of relaxation with the softmax function has been employed to learn probabilistic sampling of categorical variables in stochastic computational graphs [jang2016categorical] or to apply parameter regularization for automatic learning of neural network structures [louizos2017learning, kalainathan2018sam].

2.3 Gradient descent learning of the weights

For $i = 1, \ldots, n$ and $j, l = 1, \ldots, k$, we approximate $\frac{\partial s_{il}}{\partial w_{ij}}$ by $\frac{\partial p_{il}}{\partial w_{ij}} = \tau\, p_{il} (\delta_{jl} - p_{ij})$, with $\delta_{jl}$ the Kronecker symbol equal to 1 if $j = l$ and 0 otherwise (details given in Appendix A). Using this Straight-Through (ST) gradient estimator [bengio2013estimating] of $\frac{\partial s_{il}}{\partial w_{ij}}$ in equation (2), we obtain for $i = 1, \ldots, n$ and $j = 1, \ldots, k$:

$$\frac{\partial f(S)}{\partial w_{ij}} \approx \sum_{l=1}^{k} \frac{\partial f(S)}{\partial s_{il}}\, \tau\, p_{il} (\delta_{jl} - p_{ij}) \qquad (5)$$
$$= \tau \left( \frac{\partial f(S)}{\partial s_{ij}}\, p_{ij} - p_{ij} \sum_{l=1}^{k} \frac{\partial f(S)}{\partial s_{il}}\, p_{il} \right) \qquad (6)$$
$$= \tau\, p_{ij} \left( \frac{\partial f(S)}{\partial s_{ij}} - \sum_{l=1}^{k} \frac{\partial f(S)}{\partial s_{il}}\, p_{il} \right) \qquad (7)$$

In our framework, however, we do not use any sequential loop to compute the gradient of each parameter. Instead, we simply compute a tensor product to get the full gradient matrix. This operation can easily be parallelized on GPU devices. With tensor operations, the equations (7) for $i = 1, \ldots, n$ and $j = 1, \ldots, k$ then become a single equation:

$$\frac{\partial f(S)}{\partial W} = \tau \left[ \frac{\partial f(S)}{\partial S} \odot P - \left( \left( \frac{\partial f(S)}{\partial S} \odot P \right) \cdot J \right) \odot P \right] \qquad (8)$$

where $\frac{\partial f(S)}{\partial W}$ is the gradient matrix of size $n \times k$ of $f$ with respect to $W$, $J$ is a matrix of 1's of size $k \times k$, $\odot$ is the Hadamard product (element-wise product) and $\cdot$ is the dot product between two matrices. The gradient $\frac{\partial f(S)}{\partial S}$ depends on the grouping problem at hand. In general, as we will see with practical implementations (Section 3), this gradient is well defined and can easily be derived with tensor calculation.

Once the gradient with respect to the matrix of parameters $W$ is computed according to equation (8), we use first order gradient descent to update the parameters at each iteration:

$$W \leftarrow W - \eta\, \frac{\partial f(S)}{\partial W} \qquad (9)$$

where $\eta$ is a fixed learning rate. This kind of first order optimization is classically employed to train neural networks (in its stochastic version) and is known to be very efficient for learning models with a huge number of parameters. Alternative gradient updates could be applied, such as second order gradient descent with the Hessian matrix, adaptive learning rates or momentum [kingma2014adam]. However, for simplicity, we employ a simple and fast first order gradient descent procedure in this work.
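As a concrete illustration, equations (8) and (9) reduce to a few tensor operations; below is a minimal PyTorch sketch (the function name is ours, and the default values of $\tau$ and $\eta$ are taken from Table 1):

```python
# Hedged sketch of one weight update combining equations (8) and (9): the
# straight-through gradient w.r.t. W from a problem-specific gradient dS = df(S)/dS.
import torch

def gpgrad_step(W, dS, tau=1.0, eta=0.001):
    P = torch.softmax(tau * W, dim=-1)                  # equation (4)
    G = dS * P                                          # Hadamard product df/dS * P
    dW = tau * (G - G.sum(dim=-1, keepdim=True) * P)    # equation (8); row sums emulate the product with J
    return W - eta * dW                                 # equation (9)
```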

2.4 Weight smoothing

Inspired by the reinforcement learning search (RLS) framework [zhou2016reinforcement], we apply a weight smoothing procedure in order to give the method the possibility to forget old decisions made long ago that are no longer helpful: every $T$ iterations, each parameter of the weight matrix $W$ is divided by a constant number $\rho \geq 1$, which can be done with a single tensor computation according to:

$$W \leftarrow W / \rho \qquad (10)$$

One notices that if $\rho$ goes toward infinity, $W$ goes toward zero and the probability matrix $P$ goes toward a matrix whose elements are all equal to $1/k$ (corresponding to forgetting everything).

2.5 General GpGrad framework

At initialization, each weight of the matrix $W$ is sampled according to a normal distribution with zero mean and standard deviation $\sigma$. Then, at each iteration, the proposed GpGrad framework involves four steps, summarized in Algorithm 1: group selection, fitness evaluation, weight update with gradient descent, and weight smoothing.

We highlight in blue the steps of the framework that are problem dependent, i.e., the fitness and gradient evaluations. All other steps can be used as they stand for different grouping problems.

Input: problem instance, number of available groups $k$ and random seed $r$;
Output: a legal grouping if it is found;
Initialize the parameter matrix $W \sim \mathcal{N}(0, \sigma^2)$ with random seed $r$
while stopping condition not reached do
   1) $S \leftarrow \mathrm{one\_hot}(\mathrm{argmax}(W))$;  // Group selection
   2) Compute the fitness $f(S)$;  // Problem-specific fitness evaluation
   3) Weights update
      i) Compute $\frac{\partial f(S)}{\partial S}$  // Problem-specific gradient evaluation
      ii) $P \leftarrow \mathrm{softmax}(\tau W)$;  // Softmax approximation
      iii) $\frac{\partial f(S)}{\partial W} \leftarrow \tau [ \frac{\partial f(S)}{\partial S} \odot P - ((\frac{\partial f(S)}{\partial S} \odot P) \cdot J) \odot P ]$  // Gradient w.r.t. $W$
      iv) $W \leftarrow W - \eta\, \frac{\partial f(S)}{\partial W}$;  // First order gradient descent
   4) Every $T$ iterations do $W \leftarrow W / \rho$;  // Weight smoothing
end while
Algorithm 1: The GpGrad framework

The only stochastic part of GpGrad concerns the initialization of the tensor $W$, as the weights are drawn randomly according to a normal law. This random initialization has an impact on the search. Therefore, for practical performance evaluation, a GpGrad algorithm is run multiple times with different random seeds to initialize the pseudo-random generator of the normal law. A compact sketch of the whole loop is given below.
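Here is a compact, hedged PyTorch sketch of Algorithm 1; `fitness_fn` and `grad_fn` are the two problem-specific callbacks (hypothetical signatures of our own), and the default parameter values follow Table 1 where available ($\rho$ is chosen from the candidate set listed there):

```python
# Compact sketch of Algorithm 1, assuming PyTorch; a legal grouping is assumed
# to be any solution S with fitness_fn(S) == 0, as for the GCP of Section 3.
import torch

def gpgrad(fitness_fn, grad_fn, n, k, sigma=0.01, tau=1.0, eta=0.001,
           rho=10.0, T=5, max_iter=100_000, seed=0):
    torch.manual_seed(seed)                                   # the only stochastic part
    W = torch.randn(n, k) * sigma                             # random initialization
    for t in range(1, max_iter + 1):
        S = torch.nn.functional.one_hot(W.argmax(-1), k).float()  # 1) group selection
        if fitness_fn(S) == 0:                                # 2) fitness evaluation
            return S                                          # legal grouping found
        P = torch.softmax(tau * W, dim=-1)                    # 3.ii) softmax approximation
        G = grad_fn(S) * P                                    # 3.i) problem-specific df/dS
        W = W - eta * tau * (G - G.sum(-1, keepdim=True) * P) # 3.iii-iv) equations (8)-(9)
        if t % T == 0:
            W = W / rho                                       # 4) weight smoothing, eq. (10)
    return None                                               # no legal grouping found
```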

3 TensCol: an application of the GpGrad framework for graph coloring

In this section we present TensCol, an implementation of the GpGrad framework for the graph coloring problem (GCP), which is a well-known grouping problem. Specifically, we consider the graph $k$-coloring problem ($k$ is given), which aims to find a legal $k$-coloring of a given graph $G=(V,E)$, that is, a partition of the nodes (i.e., the items) of $V$ into $k$ color groups such that no pair of nodes of a same color group is linked by an edge of $E$ (such a color group is also called an independent set). In other words, a legal $k$-coloring implies that two nodes linked by an edge necessarily receive two different colors and thus belong to different color groups.

For the purpose of optimization, we define an evaluation (fitness) function that counts the number of color conflicts. Formally, for a given graph $G=(V,E)$ with $k$ available colors, a candidate solution is a partition of the vertices of $V$ into $k$ color groups $S = \{g_1, \ldots, g_k\}$ (in other words, $g_j$ is the group of vertices receiving color $j$). The evaluation function is used to count the conflicting edges induced by $S$:

$$f(S) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \delta_{ij} \qquad (11)$$

where $\delta_{ij} = 1$ if $\{v_i, v_j\} \in E$ and $v_i$ and $v_j$ belong to the same color group, and $\delta_{ij} = 0$ otherwise. A candidate solution $S$ is a legal $k$-coloring when $f(S) = 0$. The objective of the GCP is then to minimize $f(S)$, i.e., the number of conflicting edges, in order to find a legal $k$-coloring in the search space $\Omega$.

The GCP has the following property concerning symmetric solutions. Let $\pi$ be an arbitrary permutation of the set of colors $\{1, \ldots, k\}$ and let $S = \{g_1, \ldots, g_k\}$ be a legal $k$-coloring. Then $\pi(S) = \{g_{\pi(1)}, \ldots, g_{\pi(k)}\}$ is also a legal $k$-coloring which is strictly equivalent to $S$. In other words, $S$ and $\pi(S)$ are two symmetric solutions representing exactly the same $k$-coloring. This property implies that the names of the color groups in a coloring are irrelevant and interchangeable. Symmetric solutions are known to be a source of difficulties for many search algorithms applied to the GCP [ramani2006breaking]. In this work, we show how this problem can be easily avoided within the proposed GpGrad approach.

3.1 Matrix formulation of the graph coloring problem

For a graph $G$ with $n$ nodes, we denote by $A$ its (symmetric) adjacency matrix. There is an edge between two nodes $v_i$ and $v_j$ in $G$ if and only if $a_{ij} = 1$. For this problem, $S$ in $\{0,1\}^{n \times k}$ is the matrix corresponding to the color assignment (or grouping assignment) of each node $v_i$ to a color $j$ (or group $g_j$). For $i = 1, \ldots, n$, $\sum_{j=1}^{k} s_{ij} = 1$, as one and only one color has to be assigned to each node.

If we compute $M = S \cdot S^T$, we obtain a matrix in $\{0,1\}^{n \times n}$ with coefficients:

$$m_{ij} = \sum_{l=1}^{k} s_{il}\, s_{jl} \qquad (12)$$

One notices that $m_{ij} = 1$ if and only if the two nodes $v_i$ and $v_j$ have the same color (or belong to the same group). Interestingly enough, this association matrix $M$ is invariant under permutation of the colors (corresponding to permutations of the columns of $S$).

We also introduce the wrong grouping assignment matrix, or conflict matrix, $C = A \odot (S \cdot S^T)$ in $\{0,1\}^{n \times n}$, where $\odot$ refers to the element-wise (Hadamard) product. $C$ has coefficients $c_{ij} = a_{ij}\, m_{ij}$ such that $c_{ij} = 1$ if and only if the two nodes $v_i$ and $v_j$ are in conflict, meaning that they are in the same group ($m_{ij} = 1$) and there exists an edge between the two nodes ($a_{ij} = 1$). The total number of conflicts, or score, of a candidate solution $S$ is then:

$$f(S) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} \qquad (13)$$

For simplicity we rewrite this equation as:

$$f(S) = \frac{1}{2}\, \mathrm{sum}(C) \qquad (14)$$
$$= \frac{1}{2}\, \mathrm{sum}\left( A \odot (S \cdot S^T) \right) \qquad (15)$$

with $\mathrm{sum}$ corresponding to the summation of all matrix elements.
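As a sanity check, the whole fitness computation of equations (12)-(15) takes three tensor operations; below is a minimal PyTorch sketch on a hypothetical toy graph (a triangle) of our own choosing:

```python
# Matrix formulation of the GCP fitness (equations (12)-(15)) in PyTorch.
import torch

A = torch.tensor([[0., 1., 1.],
                  [1., 0., 1.],
                  [1., 1., 0.]])   # adjacency matrix of a toy triangle graph
S = torch.tensor([[1., 0., 0.],
                  [0., 1., 0.],
                  [0., 0., 1.]])   # candidate solution: one distinct color per node

M = S @ S.T                        # association matrix: m_ij = 1 iff same color
C = A * M                          # conflict matrix (Hadamard product)
f = C.sum() / 2                    # each conflicting edge is counted twice
print(f)                           # tensor(0.): this 3-coloring is legal
```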

3.2 Gradient evaluation

Using the matrix formulation of $f(S)$, we can now compute the gradient of the GCP evaluation function, which we will use in the GpGrad framework of Algorithm 1 in order to learn the group assignments.

As $f(S) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} m_{ij}$, for $i = 1, \ldots, n$ and $j = 1, \ldots, k$, the partial derivative of $f(S)$ with respect to an element $s_{ij}$ of $S$ is (details given in Appendix B):

$$\frac{\partial f(S)}{\partial s_{ij}} = \sum_{l=1}^{n} a_{il}\, s_{lj} \qquad (16)$$

The interpretation of equation (16) is straightforward: when $s_{ij}$ goes from the value 0 to 1, we expect the cost to increase by $\sum_{l=1}^{n} a_{il} s_{lj}$, and this term corresponds effectively to the number of nodes adjacent to $v_i$ having the color $j$, and thus to the number of new conflicts created when the color (or group) $j$ is assigned to the node $v_i$.

Using matrix computation, equation (16) for $i = 1, \ldots, n$ and $j = 1, \ldots, k$ then becomes:

$$\frac{\partial f(S)}{\partial S} = A \cdot S \qquad (17)$$

where the gradient $\frac{\partial f(S)}{\partial S}$ is a real matrix of size $n \times k$ whose elements are equal to $(A \cdot S)_{ij}$ (notice that this term corresponds to the $\gamma$-matrix used in the efficient implementation proposed in [fleurent1996genetic] of the popular TabuCol algorithm for the GCP [hertz1987using]).
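In PyTorch, equation (17) is a single matrix product; continuing the hypothetical toy graph of the previous sketch:

```python
# GCP gradient of equation (17): entry (i, j) counts the neighbors of node i
# that currently hold color j, i.e., the conflicts created by assigning j to i.
import torch

A = torch.tensor([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
S = torch.tensor([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
dS = A @ S
print(dS)   # here every off-diagonal entry is 1: recoloring any node creates a conflict
```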

3.3 Algorithm

By incorporating equations (15) and (17) into the GpGrad framework of Algorithm 1, we obtain a first, single-solution algorithm whose pseudo-code is displayed in Algorithm 2. The algorithm runs until a stopping condition is reached: either $f(S) = 0$ (i.e., a legal $k$-coloring is found) or the number of iterations exceeds $maxIter$ (a parameter).

Input: graph $G$ with adjacency matrix $A$, number of available colors $k$, a random seed $r$, and maximum allowed number of iterations $maxIter$;
Output: a legal $k$-coloring of $G$ if it is found;
Initialize the parameter matrix $W \sim \mathcal{N}(0, \sigma^2)$ with random seed $r$
$t \leftarrow 0$
repeat
   1) $S \leftarrow \mathrm{one\_hot}(\mathrm{argmax}(W))$;  // Group selection
   2) Compute the fitness $f(S) = \frac{1}{2}\, \mathrm{sum}(A \odot (S \cdot S^T))$;  // GCP fitness
   3) Weights update
      i) Compute $\frac{\partial f(S)}{\partial S} = A \cdot S$  // GCP gradient w.r.t. $S$
      ii) $P \leftarrow \mathrm{softmax}(\tau W)$;  // Softmax approximation
      iii) $\frac{\partial f(S)}{\partial W} \leftarrow \tau [ \frac{\partial f(S)}{\partial S} \odot P - ((\frac{\partial f(S)}{\partial S} \odot P) \cdot J) \odot P ]$  // Gradient w.r.t. $W$
      iv) $W \leftarrow W - \eta\, \frac{\partial f(S)}{\partial W}$;  // First order gradient descent
   4) Every $T$ iterations do $W \leftarrow W / \rho$;  // Weight smoothing
   $t \leftarrow t + 1$
until $f(S) = 0$ or $t > maxIter$
if $f(S) = 0$ then
   return $S$
end if
Algorithm 2: Single-solution implementation of the GpGrad framework for the GCP

One can notice that, contrary to local search based coloring algorithms where only one node changes its color at each iteration, Algorithm 2 may change the colors of many nodes at each iteration. We will highlight this property in Section 5 on a toy example. It proves to be quite useful for dealing with large graphs.

However, this single-solution update can easily get trapped in local optima and does not enable a large exploration of the search space for difficult instances. Therefore, we propose an extension of this algorithm, called TensCol, which runs multiple instances of Algorithm 2 in parallel on a CUDA device to significantly increase search diversification and reinforce search intensification as well.

3.4 TensCol, a population-based extension of Algorithm 2 for the GCP

To improve the search capacity of the algorithm presented in the last section, we introduce below a population-based extension of the algorithm.

3.4.1 Population-based implementation with multiple parallel candidate solutions

The first idea is to simultaneously minimize the fitness of a set of $D$ candidate solutions with CUDA parallel computation. Each solution corresponds to a candidate $k$-coloring that we denote by $S^{(d)}$ with $d = 1, \ldots, D$. In this work we set $D = 200$ for the graph coloring problem (cf. Table 1). We use the same GpGrad framework, but instead of matrices we use three-dimensional tensors, to take advantage of CUDA computation on GPU hardware.

We use the same notation as in Section 2, except that we write the different variables in bold in order to indicate that we use three-dimensional tensors instead of matrices. Each of these three-dimensional tensors has a first axis of size $D$; the other axes are of size $n$ and $k$.

The tensor of weights then becomes $\mathbf{W}$ in $\mathbb{R}^{D \times n \times k}$. The probability tensor $\mathbf{P}$ for the $D$ solutions can be computed in parallel from $\mathbf{W}$ with $\mathbf{P} = \mathrm{softmax}(\tau \mathbf{W})$, where softmax denotes the softmax tensor computation along the last axis of the three-dimensional tensor $\mathbf{W}$. The global tensor of the $D$ candidate solutions, $\mathbf{S}$ in $\{0,1\}^{D \times n \times k}$, is defined as $\mathbf{S} = \mathrm{one\_hot}(\mathrm{argmax}(\mathbf{W}))$, where the argmax and one_hot operators are applied along the last axis (group axis).

We denote by $\mathbf{S}^T$ the transposed three-dimensional tensor of solutions obtained from $\mathbf{S}$ by swapping its second and third axes. For the $D$ candidate solutions, the association tensor is then $\mathbf{M} = \mathbf{S} \cdot \mathbf{S}^T$, where $\cdot$ is the dot product between two tensors. This corresponds to the sum product over the last axis of $\mathbf{S}$ and the second-to-last axis of $\mathbf{S}^T$. For $d = 1, \ldots, D$, an element of $\mathbf{M}$ is given by:

$$m^{(d)}_{ij} = \sum_{l=1}^{k} s^{(d)}_{il}\, s^{(d)}_{jl} \qquad (18\text{-}19)$$

We denote by $\mathbf{A}$ the three-dimensional tensor of size $D \times n \times n$ such that, for $d = 1, \ldots, D$, each of its elements is $a^{(d)}_{ij} = a_{ij}$. It corresponds to a duplication of the same adjacency matrix $A$ for each candidate solution.

By performing the element-wise product between the tensors $\mathbf{A}$ and $\mathbf{M}$, we obtain the global conflict tensor for the $D$ solutions as $\mathbf{C} = \mathbf{A} \odot \mathbf{M}$, where each element of $\mathbf{C}$ is given by:

$$c^{(d)}_{ij} = a_{ij}\, m^{(d)}_{ij} \qquad (20\text{-}21)$$

The vector of fitness values for the $D$ solutions can be computed with a single tensor operation from $\mathbf{C}$ as $f = \frac{1}{2}\, \mathrm{sum}_{2,3}(\mathbf{C})$, with $\mathrm{sum}_{2,3}$ corresponding to the summation along the second and third axes. For $d = 1, \ldots, D$:

$$f(S^{(d)}) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} c^{(d)}_{ij} \qquad (22\text{-}23)$$
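A hedged PyTorch sketch of this batched computation follows (toy sizes of our own choosing); note that broadcasting the shared adjacency matrix $A$ plays the role of the duplicated tensor $\mathbf{A}$:

```python
# Batched fitness of equations (18)-(23) for D parallel candidate solutions.
import torch

D, n, k = 4, 3, 3
A = torch.tensor([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])   # shared adjacency matrix
W = torch.randn(D, n, k) * 0.01                                # weight tensor
S = torch.nn.functional.one_hot(W.argmax(-1), k).float()       # D solutions at once

M = S @ S.transpose(1, 2)        # association tensor (D, n, n), batched dot product
C = A * M                        # A is broadcast along the first (solution) axis
f = C.sum(dim=(1, 2)) / 2        # fitness vector: one conflict count per solution
print(f)
```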

3.4.2 Penalization for diversity and bonus for intensification

Minimizing the $D$ candidate solutions independently in parallel accelerates the algorithm, but each candidate solution can still easily get trapped in a poor local optimum. The idea we employ in the following is to introduce two dependency terms linking the different solutions, for two purposes: to prevent them from repeating the same mistakes by creating conflicts for the same pairs of nodes (diversity of errors), and to encourage them to make the same correct group assignments for each pair of nodes not connected by any edge (intensification of good parts of the solutions).

Group concentration matrix

To do this, we first perform a summation of the three-dimensional association tensor $\mathbf{M}$ along the first axis. We obtain the group concentration matrix $M^{\Sigma}$ of size $n \times n$:

$$M^{\Sigma} = \sum_{d=1}^{D} M^{(d)} \qquad (24)$$

with each element of $M^{\Sigma}$ given by:

$$m^{\Sigma}_{ij} = \sum_{d=1}^{D} m^{(d)}_{ij} \qquad (25)$$

$m^{\Sigma}_{ij}$ is equal to the number of candidate solutions assigning the nodes $v_i$ and $v_j$ to the same group. It corresponds to a measure of similarity between the $D$ candidate solutions. The construction of this group concentration matrix is invariant under permutation of the colors (or groups) of each candidate solution $S^{(d)}$.

Diversity penalization term for identical and incorrect group assignments

Using this group concentration matrix $M^{\Sigma}$, we compute the diversity penalization term $\kappa = \mathrm{sum}\left( A \odot (M^{\Sigma})^{\alpha} \right)$, where $\mathrm{sum}$ corresponds to the sum of all matrix elements and $(\cdot)^{\alpha}$ designates the element-wise power:

$$\kappa = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} \left( m^{\Sigma}_{ij} \right)^{\alpha} \qquad (26\text{-}27)$$

where $\alpha$ is a parameter greater than 1, chosen in order to penalize similar conflicts across different candidate solutions. By minimizing this term, the solutions are discouraged from making the same mistakes on the same pairs of connected nodes.

Bonus term for identical and correct group assignments

Following the same idea, we compute the bonus term $\xi = \mathrm{sum}\left( \bar{A} \odot (M^{\Sigma})^{\beta} \right)$, where $\bar{A}$ is the adjacency matrix of the complement graph of $G$:

$$\xi = \sum_{i=1}^{n} \sum_{j=1}^{n} \bar{a}_{ij} \left( m^{\Sigma}_{ij} \right)^{\beta} \qquad (28\text{-}29)$$

with $\beta$ greater than 1. By maximizing this bonus term, the candidate solutions are encouraged to make the same grouping assignments for any non-connected pair of nodes.

Global loss objective

From now on, we denote by $t$ the iteration step of the algorithm. Indeed, we empirically noticed that introducing the bonus and diversity penalization terms with factors depending linearly on the iteration step during the optimization process improved the results. The intuitive idea is that by modifying the fitness landscape step by step, the algorithm gets out of local minimum traps.

The global loss objective of the model consists of a weighted aggregation of the following three criteria: (i) the sum of the fitness values of the $D$ candidate solutions (equation (23)), (ii) the diversity penalization term (equation (27)) and (iii) the bonus term (equation (29)), which is subtracted in order to maximize it:

$$L(t) = \sum_{d=1}^{D} f(S^{(d)}) + \lambda_1 t\, \kappa - \lambda_2 t\, \xi \qquad (30)$$

with $\lambda_1$ and $\lambda_2$ two weighting parameters.
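The following hedged PyTorch sketch assembles the three criteria of equation (30); the function name is ours, $\alpha$ and $\beta$ follow Table 1, while the $\lambda$ defaults are illustrative placeholders (their tuned values are discussed in Section 6.2):

```python
# Sketch of the global loss of equation (30), with the diversity penalization
# (eq. (27)) and similarity bonus (eq. (29)) built from the concentration matrix.
import torch

def global_loss(A, S, t, alpha=2.5, beta=1.2, lam1=1e-6, lam2=1e-6):
    A_bar = 1.0 - A - torch.eye(A.shape[0])     # adjacency matrix of the complement graph
    M = S @ S.transpose(1, 2)                   # association tensor of the D solutions
    M_sum = M.sum(dim=0)                        # group concentration matrix, equation (24)
    fitness = (A * M).sum(dim=(1, 2)) / 2       # fitness vector, equation (23)
    kappa = (A * M_sum ** alpha).sum()          # diversity penalization, equation (27)
    xi = (A_bar * M_sum ** beta).sum()          # similarity bonus, equation (29)
    return fitness.sum() + lam1 * t * kappa - lam2 * t * xi   # equation (30)
```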

Gradient of the loss

At iteration $t$, the gradient of this loss with respect to the tensor of solutions $\mathbf{S}$ is:

$$\frac{\partial L(t)}{\partial \mathbf{S}} = \mathbf{A} \cdot \mathbf{S} + 2 \lambda_1 t\, \alpha \left( \mathbf{A} \odot (M^{\Sigma})^{\alpha - 1} \right) \cdot \mathbf{S} - 2 \lambda_2 t\, \beta \left( \bar{\mathbf{A}} \odot (M^{\Sigma})^{\beta - 1} \right) \cdot \mathbf{S} \qquad (31)$$

Details are given in Appendix C.

3.4.3 TensCol algorithm

By incorporating equations (30) and (31) in the GpGrad framework of Algorithm 1, we obtain the TensCol algorithm whose pseudo-code is displayed in Algorithm 3.

One can notice from Algorithm 3 that we do not need to evaluate the penalization and bonus terms at each iteration: computing only their gradients with respect to $\mathbf{S}$ saves computational and memory resources. The algorithm runs until a stopping condition is reached: when the fitness of one of the candidate solutions reaches 0 (i.e., a legal $k$-coloring is found) or when the number of iterations exceeds $maxIter$.

Input: graph $G$ with adjacency matrix $A$, number of available colors $k$, random seed $r$, and maximum allowed number of iterations $maxIter$;
Output: a legal $k$-coloring if it is found;
Initialize the parameter tensor $\mathbf{W} \sim \mathcal{N}(0, \sigma^2)$ with random seed $r$
$t \leftarrow 0$
repeat
   1) $\mathbf{S} \leftarrow \mathrm{one\_hot}(\mathrm{argmax}(\mathbf{W}))$
   2) Fitness evaluation (forward phase)
      i) $\mathbf{M} \leftarrow \mathbf{S} \cdot \mathbf{S}^T$
      ii) $\mathbf{C} \leftarrow \mathbf{A} \odot \mathbf{M}$
      iii) Compute the fitness vector $f \leftarrow \frac{1}{2}\, \mathrm{sum}_{2,3}(\mathbf{C})$
   3) Parameters update (backward phase)
      i) $M^{\Sigma} \leftarrow \sum_{d=1}^{D} M^{(d)}$
      ii) Compute $\frac{\partial L(t)}{\partial \mathbf{S}}$ according to equation (31)
      iii) $\mathbf{P} \leftarrow \mathrm{softmax}(\tau \mathbf{W})$;
      iv) $\frac{\partial L(t)}{\partial \mathbf{W}} \leftarrow \tau [ \frac{\partial L(t)}{\partial \mathbf{S}} \odot \mathbf{P} - ((\frac{\partial L(t)}{\partial \mathbf{S}} \odot \mathbf{P}) \cdot J) \odot \mathbf{P} ]$
      v) $\mathbf{W} \leftarrow \mathbf{W} - \eta\, \frac{\partial L(t)}{\partial \mathbf{W}}$;
   4) Every $T$ iterations do $\mathbf{W} \leftarrow \mathbf{W} / \rho$;
   $t \leftarrow t + 1$
until $\min_d f(S^{(d)}) = 0$ or $t > maxIter$
if $\min_d f(S^{(d)}) = 0$ then
   return $S^{(d^*)}$ with $d^* = \mathrm{argmin}_d\, f(S^{(d)})$
end if
Algorithm 3: TensCol for the GCP

4 TensCol for equitable graph coloring

We now present an application of the TensCol algorithm of the last section to the equitable graph coloring problem (ECP), a variant of the graph coloring problem (GCP). The ECP involves finding a minimum $k$ for which an equitable legal $k$-coloring exists.

An equitable legal $k$-coloring of $G$ is a partition of the vertex set of graph $G$ into $k$ disjoint independent sets satisfying the equity constraint, i.e., $\big| |g_i| - |g_j| \big| \leq 1$ for all $i, j$, where $|g_j|$ denotes the cardinality of the group $g_j$. In other words, in an equitable legal $k$-coloring, the color groups have the same or similar sizes (with a size difference of at most one).

In previous heuristic algorithms proposed for the ECP, the search is often performed by alternating feasible and infeasible searches, respectively in the space of equitable and non-equitable colorings, using very specific operators such as two- or three-swap operators preserving the equity constraint [sun2017feasible, wang2018tabu] (see Section 7).

With the GpGrad framework, we show that we can naturally handle this problem by only modifying the gradient formulation given by equation (31) to take this equity constraint into account.

4.1 TensCol with equitable coloring constraint

For a $k$-coloring to be equitable, the number of nodes assigned to each color group must be equal to $\lfloor n/k \rfloor$ or $\lceil n/k \rceil$, where $\lfloor x \rfloor$ denotes the integer part of a positive real number $x$ and $n$ is the number of vertices of $G$, with the particular case $\lfloor n/k \rfloor = \lceil n/k \rceil$ when $n$ is divisible by $k$.

We define the equity fitness of a candidate solution $S$ as:

$$f_e(S) = \sum_{j=1}^{k} \left( \max\left(0,\, n_j - \lceil n/k \rceil\right) + \max\left(0,\, \lfloor n/k \rfloor - n_j\right) \right) \qquad (32)$$

where $n_j = \sum_{i=1}^{n} s_{ij}$ is the number of nodes in group $g_j$. It corresponds to the total surplus or deficit, in terms of number of nodes in the groups, with respect to the two admissible group sizes $\lfloor n/k \rfloor$ and $\lceil n/k \rceil$. A legal solution of the ECP must satisfy $f_e(S) = 0$.
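A minimal PyTorch sketch of equation (32) (the function name is ours):

```python
# Equity fitness of equation (32): total surplus/deficit of the group sizes
# with respect to the two admissible sizes floor(n/k) and ceil(n/k).
import torch

def equity_fitness(S):
    n, k = S.shape
    counts = S.sum(dim=0)              # n_j: number of nodes per color group
    lo, hi = n // k, -(-n // k)        # floor(n/k) and ceil(n/k)
    surplus = (counts - hi).clamp(min=0)
    deficit = (lo - counts).clamp(min=0)
    return (surplus + deficit).sum()   # equals 0 iff the group sizes are equitable
```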

We denote by $\mathbf{f_e}$ the global equity fitness vector for the $D$ candidate solutions, with components $f_e(S^{(d)})$ for $d = 1, \ldots, D$, given by equation (32).

By convention for this problem, we set $\frac{\partial \max(0, x)}{\partial x} = 0$ for $x \leq 0$ and $1$ otherwise. Therefore the partial derivative of $f_e(S)$ with respect to an element $s_{ij}$ is:

$$\frac{\partial f_e(S)}{\partial s_{ij}} = \mathbb{1}\left[ n_j - \lceil n/k \rceil > 0 \right] - \mathbb{1}\left[ \lfloor n/k \rfloor - n_j > 0 \right] \qquad (33)$$

with $n_j$ the total number of vertices with color $j$ in the candidate solution $S$.

Knowing that $n_j$, $\lfloor n/k \rfloor$ and $\lceil n/k \rceil$ are integers, equation (33) can be rewritten in the simpler form:

$$\frac{\partial f_e(S)}{\partial s_{ij}} = \begin{cases} +1 & \text{if } n_j > \lceil n/k \rceil \\ -1 & \text{if } n_j < \lfloor n/k \rfloor \\ 0 & \text{otherwise} \end{cases} \qquad (34)$$

One can notice that this partial derivative does not depend on the index $i$ of the vertex. Indeed, the equity constraint is a group constraint; therefore the same gradient is applied to all nodes of the same group. According to equation (34), this gradient is equal to (see the sketch after this list):

  1. 0 if the group $g_j$ already satisfies the constraint, with $n_j = \lfloor n/k \rfloor$ or $n_j = \lceil n/k \rceil$ (no need to remove or add any node in this group);

  2. +1 if the group $g_j$ contains more nodes than required; this gradient puts an equivalent pressure on all nodes to leave the group (if already in it) or not to join it (if not already in it);

  3. -1 if the group $g_j$ does not contain enough nodes, which encourages the nodes to stay in this group (if already in it) or to join it (if not already in it).
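A minimal PyTorch sketch of this group-level gradient (equation (34)); the function name is ours:

```python
# ECP gradient of equation (34): +1 for oversized groups, -1 for undersized
# ones, 0 otherwise; the same value is broadcast to every node of the graph.
import torch

def equity_gradient(S):
    n, k = S.shape
    counts = S.sum(dim=0)
    lo, hi = n // k, -(-n // k)                         # admissible group sizes
    g = (counts > hi).float() - (counts < lo).float()   # one gradient value per group
    return g.expand(n, k)                               # identical for all n nodes
```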

For the TensCol algorithm with the ECP constraint, we employ the same GpGrad framework as presented in Algorithm 3, except that we add the ECP gradient of equation (34). This ECP constraint is introduced progressively, with a weight increasing at each iteration by a constant value $\epsilon$. We also remove the bonus term, as we empirically noticed that encouraging the different candidate solutions to make the same group assignments for the same pairs of nodes degrades the performance of the algorithm for the ECP (this is because it is better that the different candidate solutions propose different grouping assignments for each given pair of nodes, in order to explore a wider space of equitable colorings). However, we keep the diversity penalization term with its associated gradient. The TensCol algorithm for the ECP is displayed in Algorithm 4.

Input: graph $G$ with adjacency matrix $A$, number of available colors $k$, random seed $r$, and maximum allowed number of iterations $maxIter$;
Output: a legal equitable $k$-coloring of $G$ if it is found;
Initialize the parameter tensor $\mathbf{W} \sim \mathcal{N}(0, \sigma^2)$ with random seed $r$
$t \leftarrow 0$
repeat
   1) $\mathbf{S} \leftarrow \mathrm{one\_hot}(\mathrm{argmax}(\mathbf{W}))$
   2) Fitness evaluation (forward phase)
      i) $\mathbf{M} \leftarrow \mathbf{S} \cdot \mathbf{S}^T$
      ii) $\mathbf{C} \leftarrow \mathbf{A} \odot \mathbf{M}$
      iii) Compute the ECP fitness vector $f + f_e$ from $\mathbf{C}$ and the group counts
   3) Parameters update (backward phase)
      i) $M^{\Sigma} \leftarrow \sum_{d=1}^{D} M^{(d)}$
      ii) Compute $\frac{\partial L(t)}{\partial \mathbf{S}}$, including the ECP gradient of equation (34) weighted by $\epsilon t$
      iii) $\mathbf{P} \leftarrow \mathrm{softmax}(\tau \mathbf{W})$;
      iv) $\frac{\partial L(t)}{\partial \mathbf{W}} \leftarrow \tau [ \frac{\partial L(t)}{\partial \mathbf{S}} \odot \mathbf{P} - ((\frac{\partial L(t)}{\partial \mathbf{S}} \odot \mathbf{P}) \cdot J) \odot \mathbf{P} ]$
      v) $\mathbf{W} \leftarrow \mathbf{W} - \eta\, \frac{\partial L(t)}{\partial \mathbf{W}}$;
   4) Every $T$ iterations do $\mathbf{W} \leftarrow \mathbf{W} / \rho$;
   $t \leftarrow t + 1$
until $\min_d \left( f(S^{(d)}) + f_e(S^{(d)}) \right) = 0$ or $t > maxIter$
if $\min_d \left( f(S^{(d)}) + f_e(S^{(d)}) \right) = 0$ then
   return $S^{(d^*)}$ with $d^* = \mathrm{argmin}_d \left( f(S^{(d)}) + f_e(S^{(d)}) \right)$
end if
Algorithm 4: TensCol for the ECP

5 Toy examples of grouping problems solved with Algorithm 2

In this section, we present illustrative toy examples for the GCP and ECP to give intuitive insights into the gradient descent learning.

5.1 GCP toy example

In this section we use the simple implementation of the GpGrad framework for the GCP corresponding to Algorithm 2, where only one candidate solution is evaluated at each step. We apply it to an easy instance named myciel4.col of the COLOR02 benchmark, with 23 nodes and a known chromatic number equal to 5. Figure 3 displays the last 3 iterations (out of 8) taken by Algorithm 2 to color myciel4.col with 5 colors for a given random seed. The number written on each node indicates the gradient of the global score with respect to the selected color for this node. It corresponds to the total number of conflicts, i.e., of adjacent nodes with the same color (cf. equation (16)). Red edges correspond to conflicting edges. Blue circles highlight the nodes changing their color from one step to the next. As we can see in this figure, Algorithm 2 tends to change the color of the nodes with the highest gradient values. One can also notice that it can change the colors of more than one node at each iteration, as the update is done on the full candidate solution matrix at each step. A legal coloring with $f(S) = 0$ is shown in Figure 3-(d).

Figure 3: Last three steps of Algorithm 2 coloring the graph myciel4.col with 5 colors (optimal value); panels (a)-(d). The number on each node indicates the gradient of the global score with respect to the selected color of this node. Red edges correspond to conflicting edges. Blue circles highlight the nodes changing their color from one step to the next. Several vertices can change their colors during the same iteration. Better seen in color.

5.2 ECP toy example

The equitable coloring of this same graph (myciel4.col) is now considered, again with 5 colors. We use the same algorithm and the same parameters, but the equity constraint is taken into account in the gradient evaluation, for $i = 1, \ldots, n$ and $j = 1, \ldots, k$:

$$\frac{\partial \left( f(S) + f_e(S) \right)}{\partial s_{ij}} = \sum_{l=1}^{n} a_{il}\, s_{lj} + \mathbb{1}\left[ n_j > \lceil n/k \rceil \right] - \mathbb{1}\left[ n_j < \lfloor n/k \rfloor \right] \qquad (35)$$

where $n_j$ is the total number of nodes with color $j$.

Figure 4 displays the last three steps (out of 19) taken by Algorithm 2 to find an equitable coloring of myciel4.col with 5 colors. The number on each node indicates the gradient of the global ECP score with respect to the selected color for this node. This number is therefore equal, for each node, to the total number of conflicting edges related to this node (red edges), plus one if its group is in surplus and minus one if its group is in deficit. Blue circles highlight the nodes changing their color from one step to the next. First, we see that the algorithm alternates between legal colorings without any conflicting edges (Figures 4-(a) and (c)) and illegal colorings with red conflicting edges (Figure 4-(b)).

One can notice that many nodes (even more than required) change from pink to red from Figure 4-(a) to Figure 4-(b), and conversely from red to pink (and blue) from Figure 4-(b) to Figure 4-(c). This is due to the fact that the color updates are performed independently for each node when using first order gradient descent. It highlights that other update rules, able to learn dependencies between nodes, could be employed in order to coordinate the multiple node changes and avoid this kind of oscillation phenomenon. This is out of the scope of this work and constitutes an interesting research perspective.

Figure 4: Last three steps of Algorithm 2 finding an equitable coloring of myciel4.col with 5 colors (optimal value); panels (a)-(d). The number on each node indicates the gradient of the total number of conflicts plus the gradient of the ECP constraint with respect to the selected color of this node. Red edges correspond to conflicting edges. Blue circles highlight the nodes changing their color from one step to the next. Better seen in color.

6 Computational results on GCP and ECP benchmark instances

This section is dedicated to an assessment of the TensCol algorithm for solving the GCP and ECP. We first present the experimental setting of TensCol for both problems. Then, we show the computational results obtained on the GCP, followed by the results on the ECP.

6.1 Benchmark instances and comparison criterion

For the GCP, we test our algorithm on the 36 most widely used benchmark instances from the second DIMACS competition (publicly available at ftp://dimacs.rutgers.edu/pub/challenge/graph/benchmarks/color/), used for assessing the performance of graph coloring algorithms in recent studies [zhou2018improving, moalic2018variations]. For the ECP, we test our algorithm on the same set of 73 benchmark instances used in [sun2017feasible, wang2018tabu] from the second DIMACS and COLOR02 competitions (publicly available at https://mat.gsia.cmu.edu/COLOR02/).

Following the common practice for reporting comparative results in the coloring literature, we focus on the best solution found by each algorithm corresponding to the smallest number of colors needed to color a graph.

It is worth mentioning that for the GCP, no single algorithm, including the most recent ones, can attain the best-known results for all 36 tested DIMACS instances. Indeed, even the best performing algorithms miss at least two best-known results. This is understandable given that these instances have been studied for a long time (some for 30 years) and some best-known results have been achieved under specific and relaxed conditions (e.g., extended run time of up to one month) and only by very few algorithms. Moreover, one notices that for these benchmark graphs, when $k$ is close to the chromatic number $\chi(G)$ of the given graph or to the current best-known (upper) bound $k^*$, finding a legal $k$-coloring is a difficult task. As such, an algorithm able to attain the best-known results for a majority of the benchmark instances can be qualified as competitive with respect to the current state of the art on graph coloring.

These comments remain valid for the ECP. Meanwhile, given that the ECP has been studied less intensively than the GCP, one could still hope to find improved best-known results (i.e., equitable legal $k$-colorings with $k$ smaller than the current best bound $k^*$). As we show in Section 6.4, this is indeed possible with the proposed TensCol algorithm, which achieves 8 new record results.

6.2 Experimental setting

The TensCol algorithm was implemented in Python 3.5 with the PyTorch 1.1 library for tensor calculation with CUDA 10.0 (the source code of our algorithm will be made publicly available). It is specifically designed to run on GPU devices. In this work we used an Nvidia RTX 2080Ti graphics card with 12 GB of memory. An Nvidia Tesla P100 with 16 GB of memory was used only for the two very large graphs (C2000.5.col and C2000.9.col instances).

For a small graph such as r250.5.col colored with 65 colors for an equitable coloring, when we run TensCol with 200 candidate solutions in parallel, there are millions of weights in the model updated at each iteration. On an Nvidia RTX 2080Ti graphics card, it takes 0.002 seconds per iteration (for 200 candidate solutions evaluated). For a large graph such as C2000.9.col colored with 431 colors for an equitable coloring, there are again millions of weights updated at each iteration. On an Nvidia Tesla P100, it takes 0.37 seconds per iteration.

For our experiments, each instance was solved 10 times independently (with 10 different random seeds) following the common practice in the literature for graph coloring.

We used a baseline set of parameters, given in Table 1, for the GCP and ECP, except for the smoothing parameter $\rho$, which is very sensitive to the different structures of the GCP and ECP instances. Indeed, we noticed that $\rho$ is a key parameter depending on the fitness landscape of the coloring problem, and thus we chose it among a set of 5 different values. For random graphs such as DSJC500.5.col or DSJC1000.5.col with rough fitness landscapes, we had to set $\rho$ to 100 or 200 in order to frequently reset the learned tensor of weights according to equation (10). However, for geometric graphs such as R1000.1.col, or instances based on register allocation for variables in real codes such as fpsol2.i.1.col or mulsol.i.1.col, we set $\rho = 1$, meaning that we do not apply any reset during the optimization process.

Parameter | Description | Value
$D$ | number of parallel candidate solutions | 200
$\sigma$ | standard deviation of initial parameters | 0.01
$\tau$ | exponential base in softmax approximation | 1
$\eta$ | learning rate | 0.001
$T$ | number of iterations between two smoothing procedures | 5
$\rho$ | smoothing parameter | {1, 2, 10, 100, 200}
$\alpha$ | exponent for diversity penalization | 2.5
$\lambda_1$ | increase factor for diversity penalization |
$\beta$ | (GCP only) exponent for similarity bonus | 1.2
$\lambda_2$ | (GCP only) increase factor for similarity bonus |
$\epsilon$ | (ECP only) increase factor for equitable constraint |
Table 1: Parameter settings of TensCol for the GCP and ECP

For some particular cases we had to deviate from this baseline parameter setting. For random graphs with high density, we lowered the increase factor $\lambda_1$ in order to prevent the diversity penalization term from becoming too important (cf. equation (27)). On the contrary, for random graphs with low density, we lowered $\lambda_2$ for the GCP to prevent the bonus term (cf. equation (29)) from becoming too preponderant in the global loss evaluation. For families of instances with a huge number of nodes (more than 2000), such as wap02a.col and wap03a.col of the COLOR02 challenge, we had to decrease the number of candidate solutions evaluated in parallel from $D = 200$ in order to limit the GPU memory required. For the very difficult graph R1000.5.col on the GCP, we achieved a best coloring with 235 colors with the baseline parameter setting, while a better coloring with 234 colors can be obtained by TensCol with a specific fine tuning.

For all these instances, we used as the stopping criterion a maximum allowed number of iterations $maxIter$; since 200 solutions are evaluated at each iteration, this corresponds to a maximum total number of different candidate solutions evaluated.

6.3 Computational results on the GCP

This section is dedicated to an extensive experimental evaluation of TensCol on the GCP. We compare TensCol with 5 state-of-the-art coloring algorithms from the literature:

  • the probability learning based local search algorithm (PLSCOL) [zhou2018improving], which is an improved algorithm of reinforcement learning search (RLS) [zhou2016reinforcement];

  • the two-phase evolutionary algorithm (MMT) [malaguti2008metaheuristic];

  • the distributed evolutionary quantum annealing algorithm (QA) [titiloye2011graph];

  • the memetic algorithm (MA) [lu2010memetic];

  • the recent implementation of the memetic approach with two elite solutions in the population (HEAD) [moalic2018variations].

A more detailed presentation of each reference algorithm is given in Section 7.1.

The results of TensCol and the reference algorithms are reported in Table 2: column 2 indicates the number of vertices of each instance, and column 3 shows the chromatic number $\chi$ if known (for some instances such as flat1000_76_0, the chromatic number is known by construction of the graph, and equal to 76 in this case, but no heuristic has yet found a legal coloring with this chromatic number). Column 4 gives the best-known result (best upper bound $k^*$ of $\chi$) ever reported by an algorithm. Columns 5 to 9 show the best results of PLSCOL, MMT, MA, QA and HEAD respectively. The remaining columns report the results obtained by our TensCol algorithm: the best result, the number of successful runs over 10 runs during which TensCol attained its best result (SR), and, for indicative purposes, the average computation time in seconds of a successful run (t(s)).

As we can observe in Table 2, TensCol always finds the best-known coloring in a small amount of time for the instances with fewer than 200 nodes, such as DSJC125.1.col or R125.1. On medium graphs with up to 500 nodes, such as DSJC250.5.col or DSJC500.5.col, TensCol is very competitive compared to all the other algorithms.

For the large DSJC1000.5.col and DSJC1000.9.col instances, TensCol performs worse than MA and HEAD, the two best-performing memetic algorithms, which rely on recombining large independent sets between solutions, a technique that has proven to be very powerful for this kind of large random graphs. However, using this mechanism can be a drawback for large geometric graphs, as shown by the poor results obtained by MA and HEAD on the R1000.5.col instance. Indeed, as noticed in [wu2012coloring], for this R1000.5.col instance, some large independent sets are not part of any optimal 234-coloring, and extracting such independent sets cannot help to decrease the number of colors needed for the whole graph. TensCol, however, can recover the optimal coloring for the whole family of geometric graphs, and in particular for this difficult R1000.5.col instance with an optimal 234-coloring. It is notable that this optimal coloring was previously only found by the MMT algorithm [malaguti2008metaheuristic], which is a complex two-phase method mixing an effective tabu search, a specialized grouping crossover operator, and a post-optimization phase based on a set covering formulation of the GCP.

TensCol also obtains good results for the large latin square graph (latin_square_10), which is difficult in particular for the MMT algorithm, but also for the MA and HEAD memetic algorithms.

Finally, it is worth noting that given the particularity of the GCP benchmark instances as discussed in Section 6.1, it is not suitable to apply statistical tests.

Instance | $n$ | $\chi$ | $k^*$ | PLSCOL | MMT | MA | QA | HEAD | TensCol | SR | t(s)
DSJC125.1 | 125 | ? | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 10/10 | 40
DSJC125.5 | 125 | ? | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 10/10 | 68
DSJC125.9 | 125 | ? | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 10/10 | 22
DSJC250.1 | 250 | ? | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 10/10 | 95
DSJC250.5 | 250 | ? | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 10/10 | 199
DSJC250.9 | 250 | ? | 72 | 72 | 72 | 72 | 72 | 72 | 72 | 10/10 | 87
DSJC500.1 | 500 | ? | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 10/10 | 1098
DSJC500.5 | 500 | ? | 47 | 48 | 48 | 48 | 48 | 47 | 48 | 5/10 | 7807
DSJC500.9 | 500 | ? | 126 | 126 | 127 | 126 | 126 | 126 | 126 | 6/10 | 18433
DSJC1000.1 | 1000 | ? | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 10/10 | 10225
DSJC1000.5 | 1000 | ? | 82 | 87 | 84 | 83 | 83 | 82 | 84 | 9/10 | 32495
DSJC1000.9 | 1000 | ? | 222 | 223 | 225 | 223 | 222 | 222 | 224 | 6/10 | 58084
DSJR500.1 | 500 | ? | 12 | 12 | 12 | 12