A deep learning guided memetic framework for graph coloring problems

09/13/2021, by Olivier Goudet et al., Université d'Angers

Given an undirected graph G=(V,E) with a set of vertices V and a set of edges E, a graph coloring problem consists in finding a partition of the vertices into different independent sets. In this paper we present a new framework that combines a deep neural network with the best tools of "classical" metaheuristics for graph coloring. The proposed algorithm is evaluated on the weighted graph coloring problem, and computational results show that the proposed approach yields new upper bounds for medium and large graphs. A study of the contribution of deep learning in the algorithm shows that it is possible to learn relevant patterns that help to obtain better solutions to this problem.


1 Introduction

A graph coloring problem is to assign colors to the vertices of a graph subject to certain constraints. One of the most common graph coloring problems is the vertex coloring problem. Given an undirected graph G=(V,E) with a set of vertices V and a set of edges E, the problem is to color the vertices of the graph such that two adjacent vertices receive different colors. This problem can also be seen as finding a partition of the vertex set V into different color groups (also called independent sets or color classes) such that any two vertices linked by an edge belong to different color groups. In some variants of graph coloring, the problem is to find such a legal coloring of the graph while also minimizing an additional objective function.

The search space S of a graph coloring problem is composed of the partitions of the vertices into color groups:

S = { s = {V_1, ..., V_k} : k ∈ N, V_1 ∪ ... ∪ V_k = V and V_i ∩ V_j = ∅ for all i ≠ j },   (1)

This search space is in general huge, and finding an optimal solution is in general intractable unless P=NP, as most graph coloring problems are NP-hard.

Graph coloring problems have been studied very intensively in the past decades. A first category of methods proposed in the literature consists of local search procedures. Starting from a solution constructed with a greedy method, local search improves the current solution by considering the best moves in a given neighborhood. To be effective, local search heuristics usually incorporate mechanisms to escape local optima, based on tabu lists [blochliger2008reactive, hertz1987using] or perturbation strategies [jin2019solving, nogueira2021iterated]. However, for very difficult instances of graph coloring, this is often not enough to find the global optimum, as the search may be restricted to a single region of the search space. To overcome these difficulties, hybrid algorithms have been proposed, in particular based on the memetic framework that combines local search and crossovers [MABook2012]. The memetic framework has been very successful in solving several graph coloring problems [jin2014memetic, lu2010memetic, moalic2018variations, porumbel2010evolutionary]. These hybrid algorithms combine the benefits of local search for intensification with a population of high-quality solutions offering diversification possibilities.

The memetic algorithms proposed in the literature for graph coloring typically use a small population of no more than 100 individuals. At each generation, one offspring solution is usually created by randomly selecting two or more individuals in the population and applying a crossover operator. One of the most popular crossovers used for graph coloring problems is the Greedy Partition Crossover (GPX) introduced in the hybrid evolutionary algorithm (HEA) [galinier1999hybrid]. The GPX produces an offspring by alternately choosing the largest color class from each parent solution. The offspring generated by this crossover is then improved by a local search procedure.

The crossover operators of memetic algorithms produce new restarting points for the local search procedure, which are expected to be better than a purely random initialization. However, when using such mechanisms, there is usually no way of knowing in advance whether the new restarting point indicates a promising area that is really worth being explored by the local search procedure. Indeed, a crossover can sometimes bring the search back to an already visited region of the search space without any chance of further improvement, or to a new region far from the global optimum. Moreover, hybrid algorithms do not have a specific memory to store information about past trajectories in the search space that could be used to learn useful patterns for solving the problem (although the transmission of color groups by crossovers can be seen as some form of "learning" of good patterns).

On the other hand, numerous algorithms have been proposed over the past decades by the machine learning community to leverage statistical learning methods for solving difficult combinatorial search problems. We refer the reader to the recent survey [bengio2020machine] on this topic. These attempts have been given a new lease of life with the emergence of deep learning techniques for combinatorial optimization problems [bello2016neural, dai2017learning], inspired by the great success of the AlphaZero algorithm for combinatorial games [silver2018general]. In particular, some recent works using reinforcement learning and deep learning have been applied to graph coloring problems [huang2019coloring, lemos2019graph]. Nevertheless, these studies rarely exploit specific knowledge of the problem, which greatly limits the interest and performance of these approaches. Indeed, the results obtained by this type of approach are for the moment far from those obtained by state-of-the-art algorithms on graph coloring problems, such as hybrid algorithms [lu2010memetic, malaguti2008metaheuristic, moalic2018variations, porumbel2010evolutionary] and simulated annealing algorithms [titiloye2011quantum]. We can mention, however, new works that pair efficient local search algorithms with machine learning techniques [goudet2021population, zhou2016reinforcement, zhou2018improving], with promising results for graph coloring problems.

In this paper, we aim to push further the integration of machine learning and combinatorial optimization by proposing a new framework that combines deep neural networks with the best tools of "classical" metaheuristics for graph coloring (local search based on tabu search [hertz1987using] and recombination of solutions as used in memetic algorithms [galinier1999hybrid, lu2010memetic, moalic2018variations, porumbel2010evolutionary]), so as to solve very difficult graph coloring problems that still resist the best current methods. In order to achieve this integration, we propose to revisit an idea introduced by Boyan and Moore twenty years ago. In [boyan2000learning], the authors remarked that the performance of a local search procedure depends on the state from which the search starts, and therefore proposed to use a regression algorithm to predict the results of a local search algorithm. Once learned, this predictive model can help to select good new starting points for the local search, and the authors show on some examples that it can help to find the optimal solution in the search space more quickly. We propose to exploit this idea with modern deep learning techniques, in order to help select promising crossovers among the possible ones at each generation. We design a specific neural network architecture for graph coloring problems inspired by deep set networks [lucas2018mixed, zaheer2017deep, zhang2019deep], in order to make it invariant under permutation of the color classes. Furthermore, as training a deep learning algorithm requires a large amount of data, we implement a memetic algorithm with a very large population, whose individuals evolve in parallel in the search space, adapting the framework recently proposed in [goudet2021massively] for latin square completion. In order to train the neural network and to run the local searches of all the individuals of the population in parallel, we leverage GPU (Graphics Processing Unit) computation.

As a proof of concept, we apply this approach to the weighted vertex coloring problem (WVCP), which has recently attracted a lot of interest in the literature [nogueira2021iterated, wang2020reduction]. In the WVCP, a strictly positive weight w(v) is associated with each vertex v ∈ V. The goal of the weighted vertex coloring problem (WVCP) is to find a legal coloring s = {V_1, ..., V_k} minimizing the global score:

f(s) = Σ_{i=1}^{k} max_{v ∈ V_i} w(v).   (2)

This problem is well suited for our framework, since predicting this single global score (rather than handling more sophisticated constraints) is well adapted to regression-based machine learning techniques.
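For illustration, a minimal sketch of this score computation, assuming a list-based representation where coloring[v] holds the color index of vertex v (an illustrative helper, not the paper's GPU implementation):

from collections import defaultdict

def wvcp_score(coloring, weights):
    # WVCP score of equation (2): sum over color classes of the maximum vertex weight.
    # coloring[v] is the color index of vertex v, weights[v] its strictly positive weight.
    max_weight = defaultdict(int)
    for v, c in enumerate(coloring):
        max_weight[c] = max(max_weight[c], weights[v])
    return sum(max_weight.values())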

The WVCP has a number of practical applications in different fields such as matrix decomposition problems [prais2000reactive], batch scheduling [gavranovic2000graph] and manufacturing [hochbaum1997scheduling]. It has been addressed by exact methods [cornaz2017solving, furini2012exact, malaguti2009models] and heuristics [prais2000reactive, sun2018adaptive, wang2020reduction]. The current best-performing heuristic for this problem is an iterated local search algorithm exploring two different neighborhoods and based on tabu search [nogueira2021iterated].

2 General framework - revisiting the STAGE algorithm with deep learning and a memetic algorithm

Given a problem whose goal is to find an optimal solution minimizing an objective function f, the expected search result of a stochastic local search algorithm A starting from a state x can be defined as:

V^A(x) = E[f(y) | A starts from x] = Σ_y p_A(y | x) f(y),   (3)

where p_A(y | x) is the probability that a search starting from x will terminate in state y. V^A(x) evaluates x's promise as a starting state for the algorithm A.

The main idea of the STAGE algorithm [boyan2000learning] was to approximate this expectation by a regression model Ṽ^A(F(x)), taking as input an encoded real-valued feature vector F(x) (with D features) of a state x. This function Ṽ^A can be a simple regression model, such as a linear model, or a more complex non-linear model such as a neural network.

After initializing a first random candidate solution x_0 and the function approximator Ṽ^A, the STAGE algorithm iterates over three steps:

  1. Optimize f using A. From x_0, run the local search algorithm A, producing a search trajectory x_0, x_1, ..., x_T that ends at a local optimum x_T.

  2. Train Ṽ^A. For each point x_i on the search trajectory, use (F(x_i), f(x_T)) as a new training pair for the function approximator.

  3. Optimize Ṽ^A using hillclimbing. Continuing from x_T, perform a hillclimbing search on the learned objective function Ṽ^A ∘ F. This produces a new state which should be a good new starting point for A. A minimal code sketch of this loop is given after the list.
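To make the loop concrete, here is a minimal Python sketch of these three steps under the notations above; local_search, features, hillclimb and the regression model (with fit/predict methods) are assumed helpers for illustration, not the implementation of [boyan2000learning]:

def stage(x0, local_search, features, objective, hillclimb, model, n_rounds):
    # Minimal sketch of the STAGE loop: alternate a real local search, regression
    # training on (F(x_i), f(x_T)) pairs, and optimization of the learned predictor.
    x = x0
    X, y = [], []
    for _ in range(n_rounds):
        trajectory = local_search(x)              # step 1: run algorithm A from x
        final_score = objective(trajectory[-1])   # f(x_T), score of the reached local optimum
        for state in trajectory:                  # step 2: one training pair per visited state
            X.append(features(state))
            y.append(final_score)
        model.fit(X, y)
        # step 3: search for a promising new starting point under the learned objective
        x = hillclimb(lambda s: model.predict([features(s)])[0], trajectory[-1])
    return model, x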

We propose to revisit this idea with an adaptation for each of the three points of the STAGE algorithm:

  1. First, regarding the first step of the STAGE algorithm, we propose to run local searches with algorithm A in parallel, starting from many different states instead of only one. This produces different search trajectories and allows us to build a training dataset with a great diversity of examples.

  2. Secondly, regarding step 2, we do not use any prior mapping from states to features. Following the current trend in deep learning, the embedding of the state is directly learned in an end-to-end pipeline with a deep neural network, denoted Ṽ. We make this neural network invariant under permutation of the color groups of the coloring, which is a very important feature of all graph coloring problems, by adapting the deep set network architecture proposed in [zaheer2017deep] (see Section 2.3).

  3. Thirdly, in step 3 of the original STAGE algorithm, a hillclimbing algorithm is used to optimize the current solution guided by the learned objective function. However, as we address a very complex problem and use a complex non-convex, non-linear function (a deep neural network), it is difficult to optimize with a hillclimbing algorithm. We tried more powerful algorithms such as tabu search to optimize it, but there is a deeper problem, which is the question of generalization. Indeed, if a state is too different from the states already seen in the training dataset, and in particular if the color groups that compose it are too different from the color groups already seen by the neural network, we expect Ṽ to be very inaccurate as an estimate of V^A. Therefore, we propose to replace this hillclimbing procedure by a crossover operation between different members of a population of candidate solutions. By recombining color groups already seen by the learning algorithm, we expect the approximation of V^A given by Ṽ to be more precise.

The pseudo-code of the proposed new deep learning guided memetic framework for graph coloring (DLMCOL) is shown in Algorithm 1.

1:  Input: Graph G = (V, E), population size p.
2:  Output: The best legal coloring s* found so far.
3:  P = {S_1, ..., S_p} ← population_initialization()   (Section 2.1)
4:  Initialize the neural network Ṽ with random weights.
5:  s* ← ∅ and f* ← +∞.
6:  repeat
7:     for i = 1, ..., p, in parallel do
8:        S_i, b_i ← iterated_tabu_search(S_i)   (Section 2.2)
9:     end for
10:    b ← best legal solution among b_1, ..., b_p
11:    if f(b) < f* then
12:       s* ← b and f* ← f(b)
13:    end if
14:    Build the supervised learning training dataset {(S_i, f(b_i)) : i = 1, ..., p} and train the neural network Ṽ on it.   (Section 2.3)
15:    Compute the pairwise distances between current and improved individuals.   (Section 2.4)
16:    P ← population_update(P, {b_1, ..., b_p})   (Section 2.5)
17:    P ← crossover_selection(P, Ṽ)   (Section 2.6)
18: until stopping condition met
19: return s*
Algorithm 1 Deep learning guided memetic framework

The algorithm takes a graph G as input and tries to find a legal coloring with the minimum score f. At the beginning, all the individuals of the population are initialized in parallel using a greedy random algorithm (cf. Section 2.1) and the neural network is initialized with random weights. Then, the algorithm repeats a loop (one generation) until a stopping criterion (e.g., a cutoff time limit or a maximum number of generations) is met. Each generation involves the execution of five components:

  1. The individuals of the current population are simultaneously improved by running local searches in parallel on the GPU to find new legal solutions with a minimum score (cf. Section 2.2). For each improved individual S_i, we record b_i, the legal state with the lowest score obtained during its local search trajectory.

  2. From these local search trajectories, a supervised learning training dataset is built, with the starting colorings S_i as entries and the scores f(b_i) as targets, for i = 1, ..., p, and the neural network is trained on this dataset for a fixed number of epochs (cf. Section 2.3).

  3. The distances between all pairs of the existing individuals and new individuals are computed in parallel (cf. Section 2.4).

  4. Then the population updating procedure (cf. Section 2.5) merges the existing and new individuals to create a new population of p individuals, taking into account the fitness of each individual (its WVCP score) and the distances between individuals, in order to maintain some diversity in the population.

  5. Finally, each individual is matched with its nearest neighbors in the population. For each individual, several offspring solutions are generated and the one with the best expected score, as evaluated with the neural network, is selected (cf. Section 2.6). After this selection procedure, the p selected offspring become the new starting points, which are improved in parallel by the local search procedure during the next generation.

The algorithm stops when a predefined condition is reached and returns s*, the best legal solution found so far. The subsequent subsections present the five components of this deep learning guided memetic framework applied to the WVCP.

2.1 Initialization with a greedy random algorithm for the WVCP

In order to initialize the p individuals of the population, we use a greedy random procedure which has been shown to be very effective for the WVCP in [nogueira2021iterated, sun2018adaptive].

First, all the nodes are sorted in descending order of weight and then by degree; a color is then assigned to each node, without creating conflicts, by randomly choosing a color in the set of already used colors. If no existing color is available for a node with weight w, a new color is created (and the score of the current solution is increased by w).
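A minimal sketch of this greedy random construction, assuming an adjacency-set representation and descending degree as the tie-break (an illustrative version, not the parallel GPU one):

import random

def greedy_random_coloring(adjacency, weights):
    # adjacency[v]: set of neighbors of v; weights[v]: weight of v.
    n = len(weights)
    # sort vertices by decreasing weight, then by decreasing degree (assumed tie-break)
    order = sorted(range(n), key=lambda v: (-weights[v], -len(adjacency[v])))
    coloring = [-1] * n
    nb_colors = 0
    for v in order:
        forbidden = {coloring[u] for u in adjacency[v] if coloring[u] != -1}
        allowed = [c for c in range(nb_colors) if c not in forbidden]
        if allowed:
            coloring[v] = random.choice(allowed)   # random conflict-free existing color
        else:
            coloring[v] = nb_colors                # no legal color left: open a new class
            nb_colors += 1
    return coloring, nb_colors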

Notice that for this problem the number of colors needed to find a legal coloring minimizing the global score is not known in advance. It is at least equal to the chromatic number χ(G) of the graph.

However, for our algorithm, a predefined maximum allowed number of colors k needs to be set in order to specify the size of the layers of the neural network and to allocate memory for the local searches on the GPU. We set k to the maximum number of colors used by the greedy random procedure over the whole initial population.

The new search space S_k, restricted to a maximum of k available colors and composed of the partitions of the vertices into at most k color groups, is defined as:

S_k = { s = {V_1, ..., V_k} : V_1 ∪ ... ∪ V_k = V and V_i ∩ V_j = ∅ for all i ≠ j },   (4)

where some color groups may be empty. Even if limiting the search to S_k instead of S may seem quite restrictive, we empirically observe that the best solution found for each of the WVCP instances presented in the experimental section of this paper (cf. Section 3) still uses significantly fewer than k colors.

2.2 Parallel iterated Tabu Search with feasible and infeasible searches

For local optimization, we employ a parallel iterated tabu search algorithm to simultaneously improve the individuals of the current population. It relies on the adaptive feasible and infeasible tabu search procedure for weighted vertex coloring (AFISA) proposed in [sun2018adaptive] for the WVCP, with some slight modifications. This feasible and infeasible tabu search is a sequential procedure that improves a starting legal or illegal coloring by optimizing the extended fitness function given by:

f_φ(s) = f(s) + φ × nb_conflicts(s),   (5)

where f(s) is the WVCP score of equation (2) and nb_conflicts(s) is the number of conflicting edges of the coloring:

nb_conflicts(s) = |{ {u, v} ∈ E : u and v belong to the same color group of s }|,   (6)

and where φ is a penalty coefficient weighting the number of conflicts in the solution.
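As an illustration, a direct (unoptimized) sketch of this extended score, assuming the same list-based coloring representation as above and edges given as a list of vertex pairs:

def nb_conflicts(coloring, edges):
    # Number of edges whose two endpoints share the same color, as in equation (6).
    return sum(1 for u, v in edges if coloring[u] == coloring[v])

def extended_score(coloring, edges, weights, phi):
    # Extended fitness of equation (5): WVCP score f(s) plus phi times the conflict count.
    max_weight = {}
    for v, c in enumerate(coloring):
        max_weight[c] = max(max_weight.get(c, 0), weights[v])
    return sum(max_weight.values()) + phi * nb_conflicts(coloring, edges)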

The procedure improves the current coloring by successively changing the color of one vertex at a time in the search space S_k (with a maximum of k colors). Such a change is called a one-move. To prevent the search from revisiting already visited colorings, a node that has just changed color cannot change its color again for the next tt (tabu tenure) iterations. (In the original AFISA algorithm, the tabu tenure applies to past moves, as in the original TabuCol algorithm [hertz1987using], instead of completely freezing a node; however, we empirically observed that for the WVCP it is more effective to freeze a node which has just changed color, in order to avoid too many color changes of the same node without any improvement of the score (plateau).) The tabu tenure is set to tt = r + α × n, where r is a small random integer, α is a parameter set to 0.2, and n is the number of nodes in the graph. (In the original AFISA algorithm, the tabu tenure was related to the extended evaluation function given by equation (5); we found it more appropriate to relate the tabu tenure to the number of nodes in the graph, as this number is directly proportional to the number of available moves at each step. It seems preferable that the global extended score, which depends on the magnitude of the weights of each specific instance, does not have such a direct impact on the tabu tenure.)
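A sequential sketch of one such tabu iteration is given below (the GPU kernel of the paper is more involved); delta_fn is an assumed helper returning the change of the extended fitness (5) when recoloring a vertex, and the range of the random component of the tenure is chosen here purely for illustration:

import random

def tabu_one_move_step(coloring, k, n, tabu_until, it, delta_fn, alpha=0.2):
    # One tabu iteration on the one-move neighborhood: apply the best non-tabu
    # recoloring of a single vertex, then freeze that vertex for its tabu tenure.
    # delta_fn(coloring, v, c) is an assumed helper giving the change of the
    # extended fitness (5) if vertex v takes color c.
    best_move, best_delta = None, float("inf")
    for v in range(n):
        if tabu_until[v] > it:            # vertex v changed color recently: it is tabu
            continue
        for c in range(k):
            if c != coloring[v]:
                delta = delta_fn(coloring, v, c)
                if delta < best_delta:
                    best_move, best_delta = (v, c), delta
    if best_move is not None:
        v, c = best_move
        coloring[v] = c
        # freeze v: tabu tenure tt = r + alpha * n (random range chosen for illustration)
        tabu_until[v] = it + random.randint(0, 9) + int(alpha * n)
    return best_delta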

As in the AFISA algorithm, we perform successive searches and dynamically change the value of φ in order to navigate between the spaces of legal and illegal colorings. The number of successive searches is set to a fixed value (10 in our experiments, cf. Section 3.3). At the beginning, the parameter φ is set to an initial value φ_0 for each individual of the population; then, at the end of each successive local search, if the best solution found by the tabu search procedure is legal (no conflicting edge), the value of φ is divided by 2 (in order to increase the chance of visiting infeasible solutions); otherwise, if the solution is illegal (at least one conflicting edge), φ is multiplied by 2 (in order to guide the search back toward feasible regions). (In the original AFISA algorithm, φ is initially set to 1, cannot be lower than 1, and the adaptive mechanism only increases or decreases its value by 1. We found it empirically more efficient to initialize φ to a value depending on the density of the graph and on the amplitude of the weights, as we must balance a total weight score against a number of conflicts. Additionally, dividing or multiplying its value by 2 appears empirically more effective for fast adjustments, especially for instances with heavy weights, than simply changing its value by 1.)

During the last iteration of this iterated local search, φ is set to a large value in order to make sure that each local search is guided at least once toward a legal solution.

As shown in Algorithm 2, our iterated tabu search runs on the GPU to improve the quality of all the individuals of the current population in parallel.

All the data structures required during the search are stored in the local memory of each thread running a tabu search, except the graph information, which is stored in the global memory and shared.

1:  Input: Population P = {S_1, ..., S_p}, depth of tabu search d, maximum number of iterated searches nb_iter, weight w(v) of each vertex v.
2:  Output: Improved population P.
3:  for i = 1, ..., p, in parallel do
4:     b_i ← S_i and φ_i ← φ_0.   (records the best solution found so far on each local thread)
5:  end for
6:  iter ← 0
7:  while iter < nb_iter do
8:     for i = 1, ..., p, in parallel do
9:        S_i ← tabu_search(S_i, d, φ_i)   (improve the solution by running the tabu search procedure minimizing the extended fitness function (5) with search depth d and penalization coefficient φ_i)
10:       if S_i is legal and f(S_i) < f(b_i) then
11:          b_i ← S_i
12:       end if
13:       if iter < nb_iter - 1 then
14:          if S_i is legal then
15:             φ_i ← φ_i / 2
16:          else
17:             φ_i ← 2 × φ_i
18:          end if
19:       else
20:          φ_i ← a large value   (last iteration: guide the final search toward a legal solution)
21:       end if
22:    end for
23:    iter ← iter + 1
24: end while
25: return P
Algorithm 2 Parallel iterated tabu search with feasible and infeasible solutions

2.3 Deep neural network training

Once all the local searches are done in parallel, we collect the starting states S_i and the best score f(b_i) found on each local thread. This allows us to build a supervised training dataset with p examples, whose entries are the starting colorings S_i in S_k and whose targets are the real values f(b_i).

A neural network Ṽ, parametrized by a vector of parameters (initialized at random at the beginning), is successively trained on each new dataset produced at each generation (online training), so that it becomes more and more accurate at predicting the expected score obtained after the local search procedure for any new starting point.

This neural network takes as direct input a coloring s represented as a set of vectors v_1, ..., v_k, where each v_i is a binary vector of size n indicating which vertices belong to the color group V_i. For such an entry s, the neural network outputs a real value denoted Ṽ(s).

One important characteristic of our neural network is that it must be invariant under permutation of the color groups of any solution given as input. It should be a function from ({0,1}^n)^k to R such that, for any permutation σ of the input color groups,

Ṽ(v_σ(1), ..., v_σ(k)) = Ṽ(v_1, ..., v_k).   (7)

As indicated in [lucas2018mixed, zaheer2017deep], such permutation invariant functions can be obtained by combining the independent treatment of each color group vector with an additional "color-averaging" operation that averages the features across the different color groups. In this deep set architecture, this averaging is the only form of interaction between the different color groups. It has notably been shown in [lucas2018mixed] that such operations are sufficient to recover all invariant functions of this kind.

Using the notations proposed in [lucas2018mixed], for a coloring s, the color group invariant network is defined as a composition

Ṽ = ρ ∘ h_L ∘ ... ∘ h_1,   (8)

where each h_l is a permutation-equivariant layer from (R^{d_{l-1}})^k to (R^{d_l})^k (the d_l's are the layer sizes) and ρ is a final aggregation across the color groups that makes the overall output invariant.

Each permutation-equivariant layer operation h_l, with d_{l-1} input features and d_l output features, includes a weight matrix Λ that treats each color group independently, a color-mixing weight matrix Γ, and a bias vector β.

As in a classical multi-layer feed-forward neural network, Λ processes each color group vector x_i of the solution independently:

x_i Λ, for i = 1, ..., k.   (9)

Then, the weight matrix Γ operates on the average across the different color groups, computed for each feature as:

x̄ = (1/k) Σ_{j=1}^{k} x_j.   (10)

The output of the permutation-equivariant layer is a matrix in R^{k × d_l}, which is the concatenation of k output vectors of size d_l:

h_l(x_1, ..., x_k) = (y_1, ..., y_k),   (11)

where, for i = 1, ..., k,

y_i = σ( x_i Λ + x̄ Γ + β ),   (12)

where σ is a non-linear activation function.
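As an illustration, a possible PyTorch sketch of such an equivariant layer and of an invariant network built on top of it is given below; the hidden sizes and the final aggregation are illustrative choices, and the batch normalization mentioned in the next paragraph is omitted:

import torch
import torch.nn as nn

class ColorEquivariantLayer(nn.Module):
    # One permutation-equivariant layer in the spirit of equations (9)-(12): a shared
    # per-group linear map plus a map of the color-wise average, then a non-linearity.
    def __init__(self, in_features, out_features):
        super().__init__()
        self.lam = nn.Linear(in_features, out_features, bias=True)   # per-group weights + bias
        self.gam = nn.Linear(in_features, out_features, bias=False)  # color-mixing weights
        self.act = nn.LeakyReLU()

    def forward(self, x):
        # x: tensor of shape (batch, k, d), one row of d features per color group
        mean = x.mean(dim=1, keepdim=True)            # color-averaging across the k groups
        return self.act(self.lam(x) + self.gam(mean))

class InvariantValueNetwork(nn.Module):
    # Stack of equivariant layers followed by an average over color groups,
    # which makes the final scalar output invariant to color permutations.
    def __init__(self, n_vertices, hidden_sizes=(64, 64)):
        super().__init__()
        dims = (n_vertices,) + tuple(hidden_sizes)
        self.layers = nn.Sequential(*[ColorEquivariantLayer(a, b)
                                      for a, b in zip(dims[:-1], dims[1:])])
        self.out = nn.Linear(dims[-1], 1)

    def forward(self, x):
        # x: (batch, k, n) binary membership vectors of the k color groups
        h = self.layers(x)
        return self.out(h.mean(dim=1))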

After each parallel local search procedure, the neural network is trained for 20 epochs on the new dataset using the Adam optimizer [kingma2014adam], with an initial learning rate of 0.001 and batches of size 100, in order to minimize the mean squared error (MSE) loss between the outputs and the targets. In order to speed up the training, we apply, prior to the non-linearity, a batch normalization layer [ioffe2015batch], adapted to keep the invariance property of the network.
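A per-generation training loop along these lines could look as follows (a sketch, without the adapted batch normalization; states holds the binary color-group tensors of the starting colorings and targets the scores reached by the local searches):

import torch

def train_value_network(model, states, targets, epochs=20, lr=1e-3, batch_size=100):
    # MSE regression of the scores reached by the local search, as in Section 2.3.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    dataset = torch.utils.data.TensorDataset(states, targets)
    loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x).squeeze(-1), y)
            loss.backward()
            optimizer.step()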

Once learned, this neural network will be used to select new crossovers for the next generation (see Section 2.6 below). But before performing crossovers, we must decide whether the new legal colorings obtained after the parallel local search procedure can be inserted into the population. To do so, a distance-and-quality based pool update strategy is used to create a new population satisfying a minimum spacing among the individuals, in order to ensure population diversity [PorumbelHK11]. Maintaining this minimum spacing requires the computation of pairwise distances between the solutions, which is presented in the next subsection.

2.4 Distance computation

Following [goudet2021population, lu2010memetic, porumbel2010evolutionary], for population updating, we use a matrix to record all the distances between any two solutions of the population. This symmetric matrix is initialized with the pairwise distances computed for each pair of individuals in the initial population, and then updated each time a new individual is inserted in the population.

To merge the new solutions with the existing ones, we need to evaluate (i) the distances between each individual of the current population and each improved individual, and (ii) the distances between all pairs of improved individuals. All these distance computations are independent from one another and are performed in parallel on the GPU (one computation per thread).

Given two colorings s and s', we use the set-theoretic partition distance to measure their dissimilarity, which corresponds to the minimum number of vertices that need to be displaced between color classes of s to transform s into s' [porumbel2011efficient]. The exact partition distance between two solutions can be calculated with the Hungarian algorithm [kuhn1955hungarian], but given that we need to compute millions of distances at each generation with the large population, we instead adopt the efficient approximation algorithm presented in [porumbel2011efficient], which has a much lower computational cost.
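For reference, the exact Hungarian-based version (not the approximation used in the paper) can be written in a few lines with SciPy:

import numpy as np
from scipy.optimize import linear_sum_assignment

def partition_distance(coloring_a, coloring_b, k):
    # Exact set-theoretic partition distance: n minus the largest number of vertices
    # that can be kept in place under the best matching of color classes.
    n = len(coloring_a)
    overlap = np.zeros((k, k), dtype=np.int64)
    for v in range(n):
        overlap[coloring_a[v], coloring_b[v]] += 1   # |V_i ∩ V'_j| for classes i and j
    rows, cols = linear_sum_assignment(overlap, maximize=True)   # Hungarian algorithm
    return n - overlap[rows, cols].sum()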

2.5 Population update

According to [porumbel2010evolutionary, PorumbelHK11], the population update procedure aims to keep the best individuals, but also to ensure a minimum spacing distance between the individuals. The update procedure is sequential, as we need to compare, one by one, the existing individuals of the current population with the local-search-improved offspring solutions.

We use the population update procedure proposed in [goudet2021population]. This procedure greedily adds, one by one, the best individuals of the merged pool to the population of the next generation, until this population reaches p individuals, while enforcing that the distance between any two retained individuals stays above the minimum spacing threshold (defined relative to the number of vertices n of the graph). Each distance corresponds to the approximation of the set-theoretic partition distance precomputed in the previous step of the algorithm.
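The following lines sketch this greedy quality-and-distance update under the representation assumed above (candidates pools together the current population and the improved offspring; dist holds the precomputed approximate distances):

def update_population(candidates, scores, dist, p, d_min):
    # Greedily keep the best-scoring candidates whose distance to every already
    # kept individual is at least d_min, until the population holds p individuals.
    order = sorted(range(len(candidates)), key=lambda i: scores[i])   # lower WVCP score first
    kept = []
    for i in order:
        if all(dist[i][j] >= d_min for j in kept):
            kept.append(i)
        if len(kept) == p:
            break
    return [candidates[i] for i in kept]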

2.6 Parent matching and selection of crossovers with the neural network

At each generation, each individual of the population is matched with a fixed number of its nearest neighbors in the population (in the sense of the distance presented in Section 2.4), set to 32 in our experiments (cf. Section 3.3).

For each individual, one offspring solution per matched neighbor is generated using the well-known GPX crossover [galinier1999hybrid, moalic2018variations], where the individual is taken as the first parent and its neighbor as the second parent (the GPX crossover is not symmetric).

For each individual, among these candidate offspring, we select the one with the best (lowest) expected score as evaluated by the neural network:

c* = argmin_{c ∈ C} Ṽ(c),   (13)

where C denotes the set of offspring solutions generated for this individual.

After this selection procedure, the p selected offspring solutions serve as the new starting points that are further improved in parallel by the local search procedure during the next generation.
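For illustration, a simplified sketch of the GPX crossover and of this selection step, assuming a coloring is stored as a list of k vertex sets and that predict_score wraps a forward pass of the trained network (both assumptions made only for this sketch):

import random

def gpx(parent_a, parent_b, k, n):
    # Simplified Greedy Partition Crossover: alternately take the largest remaining
    # color class from each parent, then place leftover vertices at random.
    a = [set(c) for c in parent_a]
    b = [set(c) for c in parent_b]
    child = []
    assigned = set()
    for i in range(k):
        donor = a if i % 2 == 0 else b               # alternate between the two parents
        chosen = set(max(donor, key=len))            # copy of the largest remaining class
        child.append(chosen)
        assigned |= chosen
        for classes in (a, b):                       # remove chosen vertices from both parents
            for c in classes:
                c -= chosen
    for v in range(n):                               # unassigned vertices get a random class
        if v not in assigned:
            child[random.randrange(k)].add(v)
    return child

def select_offspring(individual, neighbors, k, n, predict_score):
    # Keep, among all generated offspring, the one with the lowest predicted score (equation (13)).
    offspring = [gpx(individual, nb, k, n) for nb in neighbors]
    return min(offspring, key=predict_score)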

3 Experimental results

This section is dedicated to a computational assessment of the proposed deep learning memetic framework for solving the weighted vertex coloring problem, by making comparisons with the state-of-the-art methods.

3.1 Implementation on graphic processing units

The DLMCOL algorithm was programmed in Python with the Numba 0.53.1 library for the CUDA kernel implementations (local search, distance computation, crossovers). The neural network is implemented in PyTorch 1.8.1. DLMCOL is specifically designed to run on GPUs. In this work we used an Nvidia V100 graphics card with 32 GB of memory.

3.2 Benchmark instances

We carried out extensive experiments on the benchmark used in the recent papers on the WVCP [nogueira2021iterated, sun2017feasible, wang2020reduction]: the pxx, rxx, DIMACS/COLOR small, and DIMACS/COLOR large instances. The pxx and rxx instances are based on matrix-decomposition problems [prais2000reactive], while DIMACS/COLOR small [cornaz2017solving, furini2012exact] and DIMACS/COLOR large [sun2017feasible] are based on DIMACS and COLOR competitions.

As indicated in [nogueira2021iterated, wang2020reduction], a preprocessing procedure can be applied to reduce a weighted graph for the WVCP. For each clique of size c, if we denote by w_min the smallest weight of this clique, all the nodes of the graph with a degree lower than c and a weight not greater than w_min can be removed from the graph without changing the optimal WVCP score that can be found for this instance. Enumerating all the cliques of a graph is a challenging problem. We used the igraph Python package (https://igraph.org/python/) with a ten-second timeout for all instances. For small instances, this is enough to enumerate all the cliques of the graph.

3.3 Parameter setting

The population size of DLMCOL is set to 20480, chosen as a multiple of 64, the number of threads per block. This large population size offers a good performance ratio on the Nvidia V100 graphics cards used in our experiments, while remaining reasonable for the pairwise distance calculations in the population, as well as for the memory occupation on the GPU for medium instances. However, for large instances, we set the population size to 8192 in order to limit the memory occupation on the device.

In addition to the population size, the tabu tenure parameter α of the tabu search is set to 0.2, and the number of tabu iterations depends on the size of the graph. The maximum number of iterated local searches launched at each generation is set to 10. The minimum spacing distance used for the pool update is defined relative to the number of vertices of the graph.

For the neural network, we implement an architecture with 4 hidden layers. The non-linear activation function is a Leaky ReLU, defined as σ(x) = x for x ≥ 0 and σ(x) = a·x for x < 0, with a small positive slope a. The neural network is trained for 20 epochs at each generation with the Adam optimizer and an initial learning rate of 0.001.

Table 1 summarizes the parameter setting, which can be considered as the default and is used for all our experiments.

Description                                               Value
Population size                                           20480 (8192 for large instances)
Number of iterations of the iterated local search         10
Number of iterations of tabu search                       depends on the graph size
Tabu tenure parameter                                     0.2
Minimum spacing between two individuals                   relative to the graph size
Learning rate of the neural network                       0.001
Number of epochs of the training                          20
Number of considered neighbors for crossover selection    32
Table 1: Parameter setting in DLMCOL

3.4 Comparative results on WVCP benchmarks

This section shows a comparative analysis on the pxx, rxx, DIMACS/COLOR small, and DIMACS/COLOR large instances with respect to the state-of-the-art methods [nogueira2021iterated, sun2017feasible, wang2020reduction]. The reference methods include the three best recent heuristics: AFISA [sun2017feasible], RedLS [wang2020reduction] and ILS-TS [nogueira2021iterated]. When they are available, we also include the optimal scores obtained with the MWSS exact algorithm [cornaz2017solving] and reported in [nogueira2021iterated].

Given the stochastic nature of the DLMCOL algorithm, each instance was independently solved 10 times. For the small instances presented in Tables 2 and 3, a time limit of 1 hour was used. However, for the medium and large instances in Tables 4 and 5, as training the neural network and performing all the local searches for the large population is time consuming, a cutoff limit of 48 hours is retained.

For a fair comparison, we also ran the local search methods RedLS and ILS-TS for 48 hours, or until no improvement was observed. This was done on a computer with an Intel Xeon E5-2670 processor (2.5 GHz and 2 GB RAM). As the available AFISA binary code does not allow setting a cutoff time, we only report its results as mentioned in the original article [sun2017feasible].

However, we acknowledge that the comparison remains difficult in terms of computational time between DLMCOL and the competitors, as DLMCOL uses a GPU while the other algorithms, AFISA, RedLS and ILS-TS, use a CPU. Therefore the computation times are given for indicative purposes only.

Columns 1, 2, and 3 of Tables 2 to 5 show the characteristics of each instance (i.e., name of the instance, number of vertices, and optimal result reported in the literature). Columns 4 to 9 present the best and average scores obtained by the reference algorithms for the different instances, as well as the average time in seconds required to obtain the best results. The results of the proposed DLMCOL algorithm are reported in columns 10 and 11. Bold numbers show the dominating values, while a star indicates a new upper bound (the certificates of the new best-known solutions from DLMCOL are available at https://github.com/GoudetOlivier/DLMCOL).

Due to the memory limit on each thread of the GPU for the local searches, the DLMCOL algorithm was not run on the biggest instances of the DIMACS/COLOR benchmarks: C2000.5, C2000.9, DSJC1000.9 and wap01-4a.

As can be seen from Tables 2 to 4, for all the small DIMACS/COLOR, pxx and rxx instances, DLMCOL attains the known optimum or the best result reported in the literature with an almost 100% success rate over 10 runs. However, the computational time required to achieve these results is in general higher, in particular when compared with ILS-TS. This may be explained by the fact that DLMCOL can only obtain a first result after 10 iterations of parallel local search over the whole population.

For the larger instances reported in Table 5, DLMCOL obtains excellent results by reaching the best-known score for 31 of the 49 instances. For 11 of them, DLMCOL even finds new upper bounds that had never been reported before. In particular, we notice important improvements in comparison with the best methods of the literature for 3 instances, significantly reducing their best-known scores: DSJC500.5 from 707 to 686, flat1000_50_0 from 1184 to 924 and latin_square_10 from 1542 to 1483.

However, DLMCOL does not perform well in general for large graphs with a low density of edges, such as DSJC1000.1, inithx.i.2, inithx.i.3 and the wapXXa instances. In fact, for these instances, it is very hard for the neural network to learn a common backbone of good solutions. Indeed, the good solutions for these instances are characterized by a low ratio of the number of color groups over the total number of vertices. As only the maximum weight of each color group has an impact on the WVCP score, many different groupings of the vertices are possible for these instances without impacting the score. This results in a very high average distance between the best solutions of the population. Therefore, the neural network fails to learn relevant patterns in such an over-diversified population.

For the largest graphs, we notice that the convergence of the algorithm is quite slow. Even after 48 hours, there is still room for further improvements. For the DSJC1000.5, flat1000_60_0 and flat1000_76_0 instances, it is possible to obtain still better new upper bounds of 1185, 1162 and 1165, after 138, 98 and 95 hours respectively.

Instance AFISA RedLS ILS-TS DLMCOL
Graph name Opti. Best (Avg.) t (s) Best (Avg.) t (s) Best (Avg.) t (s) Best (Avg.) t (s)
DSJC125.1g 124 - 23 (24) 3016 23 0.01 23 3 23 22
DSJC125.1gb 124 - 90 (92.5) 402 91 (91.7) 696 90 15 90 21
DSJC125.5g 125 - 71 (72.3) 216 72 32895 71 77 71 416
DSJC125.5gb 125 - 243 (250.2) 369 241 (241.3) 13528 240 219 240 38
DSJC125.9g 125 169 169 (169.9) 16 169 3493 169 1.27 169 120.8
DSJC125.9gb 125 604 604 (605.5) 444 604 17.62 604 59.8 604 81
GEOM100 100 65 65 (65.0) 0.81 65 (67.5) 0.01 65 0.03 65 14
GEOM100a 100 89 89 (89.5) 110 90 (93.3) 0.01 89 0.65 89 19
GEOM100b 100 32 32 (33.1) 59 32 0.6 32 0.02 32 17
GEOM110 110 68 68 (68.0) 34 68 (69.9) 0.03 68 0.05 68 18
GEOM110a 110 97 97 (97.8) 177 97 (99.4) 0.04 97 0.58 97 22
GEOM110b 110 37 37 (37.9) 131 37 (37.71) 22.1 37 0.25 37 56
GEOM120 120 72 72 33 72 (73.1) 9.0 72 0.03 72 17
GEOM120a 120 105 105 (106.3) 156 105 (105.9) 1.96 105 0.77 105 27
GEOM120b 120 35 35 (37.3) 67.7 35 (35.25) 14.38 35 0.81 35 36
GEOM30b 30 12 12 0.02 12 0.01 12 0.01 12 6
GEOM40b 40 16 16 0.03 16 (16.6) 0.01 16 0.01 16 6
GEOM50b 50 18 18 0.02 18 (18.2) 62.02 18 0.01 18 7
GEOM60b 60 23 23 0.22 23 0.01 23 0.01 23 8
GEOM70 70 47 47 5 47 (48.6) 0.01 47 0.02 47 9
GEOM70a 70 73 73 4 73 (73.6) 0.25 73 0.03 73 10
GEOM70b 70 24 24 12 24 15.1 24 0.01 24 11
GEOM80 80 66 66 2 67 (67.4) 0.01 66 0.01 66 9
GEOM80a 80 76 76 (76.1) 137 76 (78.2) 1.3 76 0.04 76 12
GEOM80b 80 27 27 (27.8) 67 27 24.9 27 0.06 27 12
GEOM90 90 61 61 (61.2) 89 61 (63.5) 1.74 61 0.15 61 12
GEOM90a 90 73 73 (74) 512 73 (74.1) 3.65 73 0.55 73 14
GEOM90b 90 30 30 (30.1) 67 30 (30.1) 0.17 30 0.02 30 16
R100_1g 100 21 21 (22) 114 21 (21.8) 508.0 21 7.14 21 300
R100_1gb 100 81 81 (83.8) 3 81 (81.4) 1279.7 81 1.98 81 15
R100_5g 100 - 59 (60.1) 7 59 193 59 0.2 59 23
R100_5gb 100 - 221 (224.1) 187 220 (222) 687 220 4 220 23
R100_9g 100 141 141 (141.3) 21 141 0.62 141 40.8 141 42
R100_9gb 100 518 518 (5449.3) 1152 518 8.17 518 (518.3) 1066.1 518 41
R50_1g 50 14 14 0.14 14 (14.1) 0.01 14 0.01 14 7
R50_1gb 50 53 53 (53.0) 0.24 53 (53.1) 0.07 53 0.01 53 7
R50_5g 50 37 37 (37.0) 0.95 37 0.01 37 0.02 37 9
R50_5gb 50 135 135 (135.3) 4 135 0.09 135 0.21 135 9
R50_9g 50 74 74 1 74 0.02 74 0.01 74 11
R50_9gb 50 262 262 13 262 0.01 262 1.4 262 11
R75_1g 75 18 18 (18.4) 11 18 2.62 18 0.28 18 9
R75_1gb 75 70 70 (70.1) 2 70 (72.4) 0.35 70 0.23 70 9
R75_5g 75 51 51 (51.4) 01 51 (51.2) 598.6 51 0.39 51 14
R75_5gb 75 186 186 (189) 19 186 51.1 186 2 186 14
R75_9g 75 110 110 3 110 0.08 110 0.1 110 22
R75_9gb 75 396 396 (396.4) 146 396 0.42 396 3.9 396 22
Best rate 95% 89% 100% 100%
Table 2: Comparative results of DLMCOL with the state-of-the-art methods (AFISA, RedLS, ILS-TS) for DIMACS/COLOR small instances. Dominating results are indicated in bold.
Instance AFISA RedLS ILS-TS DLMCOL
Graph name Opti. Best (Avg.) t (s) Best (Avg.) t (s) Best (Avg.) t (s) Best (Avg.) t (s)
P06 16 565 565 0.0 565 0.01 565 0.01 565 7
P07 24 3771 3771 0.0 3771 (3773.5) 0.01 3771 0.01 3771 6
P08 24 4049 4049 0.2 4049 0.03 4049 0.21 4049 6
P09 25 3388 3388 (3388.2) 1 3388 0.01 3388 0.01 3388 6
P10 16 3983 3983 0.7 3983 0.01 3983 0.01 3983 5
P11 18 3380 3380 0.0 3380 0.01 3380 0.01 3380 5
P12 26 657 657 0.0 657 0.01 657 0.01 657 6
P13 26 3220 3220 (3221.1) 0.7 3220 (3229) 0.01 3220 0.09 3220 6
P14 31 3157 3157 0.0 3157 0.01 3157 0.01 3157 6
P15 34 341 341 1.8 341 (343.1) 0.01 341 0.01 341 6
P16 34 2343 2343 0.7 2343 (2383.3) 0.01 2343 0.01 2343 6
P17 37 3281 3281 (3322.2) 2.7 3281 (3282.6) 0.05 3281 0.01 3281 7
P18 35 3228 3228 0.1 3228 0.01 3228 0.01 3228 6
P19 36 3710 3710 0.4 3710 0.01 3710 0.01 3710 6
P20 37 1830 1830 (1841) 4.9 1830 (1844) 0.05 1830 0.38 1830 6
P21 38 3660 3660 (3660.5) 0.8 3660 (3707.0) 0.01 3660 0.01 3660 7
P22 38 1912 1912 (1912.2) 0.3 1912 (1946) 0.21 1912 0.01 1912 7
P23 44 3770 3770 (3793.0) 0.3 3770 (3804) 0.04 3770 0.01 3770 7
P24 34 661 661 0.0 661 (667.1) 0.19 661 0.01 661 6
P25 36 504 504 0.3 504 0.01 504 0.01 504 6
P26 37 520 520 0.1 520 0.01 520 0.01 520 7
P27 44 216 216 0.1 216 (219) 0.01 216 0.07 216 7
P28 44 1729 1729 (1735.1) 2.6 1729 0.01 1729 0.01 1729 7
P29 53 3470 3470 0.1 3470 0.01 3470 0.01 3470 8
P30 60 4891 4891 54 4891 (4901) 0.01 4891 0.01 4891 10
P31 47 620 620 3.7 620 0.01 620 0.01 620 6
P32 51 2480 2480 0.4 2480 0.01 2480 0.01 2480 8
P33 56 3018 3018 (3029.7) 0.4 3018 (3096) 0.02 3018 0.01 3018 9
P34 74 1980 1980 (1980.5) 3.0 1980 (1994) 0.01 1980 0.03 1980 14
P35 86 2140 2140 (2145.0) 4.5 2140 (2161) 0.01 2140 0.02 2140 18
P36 101 7210 7210 (7385) 0.1 7210 0.01 7210 0.01 7210 24
P38 87 2130 2130 (2139.5) 9.5 2140 (2161) 0.01 2130 0.29 2130 18
P40 86 4984 4984 (5016.6) 5.1 5005 (5082.7) 0.01 4984 0.33 4984 18
P41 116 2688 2688 ( 2688.1) 0.1 2688 (2785.8) 0.04 2688 0.32 2688 453
P42 138 2466 2466 (2671.2) 931.0 2482 (2539.9) 0.02 2466 9.02 2466 27
Best rate 100% 91% 100 % 100%
Table 3: Comparative results of DLMCOL with the state-of-the-art methods (AFISA, RedLS, ILS-TS) for pxx instances. Dominating results are indicated in bold.
Instance AFISA RedLS ILS-TS DLMCOL
Graph name Opti. Best (Avg.) t (s) Best (Avg.) t (s) Best (Avg.) t (s) Best (Avg.) t (s)
r01 144 6724 6724 (6727.8) 49.5 6732 (6769.2) 0.01 6724 0.96 6724 31
r02 142 6771 6771 (6780.6) 85.3 6774 (6818.6) 0.01 6771 0.25 6771 30
r03 139 6473 6473 (6490.8) 190.2 6505 (6597.7) 233.5 6473 4.53 6473 51
r04 151 6342 6342 (6403.2) 467.3 6349 (6427.7) 0.42 6342 0.83 6342 81
r05 142 6408 6408 (6466.3) 71.7 6411 (65010.3) 0.42 6408 2.57 6408 30
r06 148 7550 7550 (7555.9) 29.2 7550 (7558.9) 0.01 7550 0.01 7550 31
r07 141 6889 6889 (7555.9) 34.8 6910 (6974.2) 954 6889 7.29 6889 29
r08 138 6057 6057 (6080.3) 311.7 6071 (6147.4) 0.04 6057 1.6 6057 234
r09 129 6358 6358 (6393.8) 395.2 6390 (6451.9) 66.13 6358 1.84 6358 26
r10 150 6508 6508 (6519.3) 461.1 6518 (65078.6) 0.05 6508 2.46 6508 33
r11 208 7654 7654 (7710.6) 9542.2 7691 (7739.5) 489.45 7654 5.27 7654 432
r12 199 7690 7691 (7710.4) 9542.2 7694 (7730.2) 2.61 7690 4.33 7690 58
r13 217 7500 7521 (7558.3) 619.5 7524 (7566.7) 0.04 7500 5.8 7500 71
r14 214 8254 8254 (8283.9) 8044.1 8288 (8371.4) 0.87 8254 4.78 8254 68
r15 198 8021 8021 (8126.8) 2559.1 8021 (8024.0) 0.01 8021 0.01 8021 172
r16 188 7755 7755 (7789.2) 195.5 7764 (7809.4) 0.01 7755 11.22 7755 51
r17 213 7979 7979 (8030.3) 855.4 8011 (8064.3) 0.86 7979 4.39 7979 242
r18 200 7232 7232 (7278.9) 868.2 7240 (7295.3) 11.53 7232 26.4 7232 4374
r19 185 6826 6840 (6868.1) 395.5 6826 (6850.5) 39.15 6826 2.09 6826 189
r20 217 8023 8023 (8102.0) 1028.5 8031 (8138.3) 1.68 8023 13.08 8023 3027
r21 281 9284 9284 (9384.5) 4588.7 9294 (9320.1) 0.01 9284 9.15 9284 6103
r22 285 8887 8887 (8959.3) 12911 8924 (9030.6) 0.01 8887 63.42 8887 1521
r23 288 9136 9136 (9267.9) 3252.0 9145 (9222.0) 0.05 9136 42.77 9136 (9137.7) 25716
r24 269 8464 8464 (8572.9) 13142.6 8468 (8534.5) 0.01 8464 0.51 8464 797
r25 266 8426 8468 (8560.8) 874.8 8579 (8649.6) 0.01 8426 36.03 8426 7088
r26 284 8819 8819 (8927.9) 14225.1 8937 (9035.3) 0.01 8819 82.2 8819 22861
r27 259 7975 7975 (8019.7) 14074.9 7975 (7997.3) 1.71 7975 9.86 7975 102
r28 288 9407 9407 (9599.4) 8691.0 9409 (9475.4) 0.01 9407 0.44 9407 1891
r29 281 8693 8693 (8743.7) 7613.1 8701 (8743.7) 0.03 8693 4.54 8693 3429
r30 301 9816 9816 (10003.2) 8838.6 9820 (9877.1) 0.01 9816 1.36 9816 147
Best rate 87% 13.3% 100 % 100%
Table 4: Comparative results of DLMCOL with the state-of-the-art methods (AFISA, RedLS, ILS-TS) for rxx instances. Dominating results are indicated in bold.
Instance AFISA RedLS ILS-TS DLMCOL
Graph name Opti. Best (Avg.) t (s) Best (Avg.) t (s) Best (Avg.) t (s) Best (Avg.) t (s)
C2000.5 2000 - 2400 (2425.1) 3134.0 2151 (2162.4) 11827 2250 (2266.2) 11030 - -
C2000.9 2000 - 6228 (6284.0) 2798.3 5486 (5507.9) 166246 5808 (5849.3) 161980 - -
DSJC1000.1 1000 - 359 (362.9) 430.5 300 (302.6) 98115 305 (305.9) 97025 342.0 (344.5) 1105
DSJC1000.5 1000 - 1357 (1430.9) 371.7 1220 (1228.4) 234 1242 (1270.4) 1929 1230.0 (1260.0) 167639
DSJC1000.9 1000 - 3166 (3231.0) 490.2 2864 (2875.7) 48298 2975 (2997.8) 101238 - -
DSJC250.1 250 - 140 (141.9) 48.9 130 (132) 1 127 (127) 2576 127 (127) 1353
DSJC250.5 250 - 415 (428.1) 269.2 404 (407.7) 43579 393 (393.3) 58615 392 (392) 9226
DSJC250.9 250 934 939 (943.2) 926 940 (943.11) 801.22 934 (936.4) 7670.05 934 4722
DSJC500.1 500 - 210 (215.6) 426 188 (189.8) 50 184 (185.4) 72845 184 (184.9) 23049
DSJC500.5 500 - 778 (845.1) 159.3 726 (729.0) 2466 707 (716.9) 51407 686* (688.4) 47064
DSJC500.9 500 - 1790 (1854.5) 831.1 1681 (1685.3) 148914 1692 (1702.5) 136637 1662* (1664) 121518
DSJR500.1 500 - 169 (175.4) 458.9 171 (173.4) 1 169 (169) 0.56 169 (171.7) 35389
flat1000_50_0 1000 - 1289 (1315.7) 981.8 1184 (1186.7) 93711 1218 (1226.4) 141135 924* 82068
flat1000_60_0 1000 - 1338 (1354) 201.9 1220 (1230.6) 264 1242 (1258.0) 60631 1224 (1231.6) 168937
flat1000_76_0 1000 - 1314 (1337.6 ) 2396.6 1200 (1210.5) 316 1240 (1246.7) 141292 1210.0 (1220.6) 164934
inithx.i.1 864 - 587 (587.9) 527.5 569 (571.4) 2 569 < 0.01 569 214
inithx.i.2 645 - 341 (341.6) 0.01 329 (331.5) 1328 329 184 336 (337.0) 127
inithx.i.3 621 - 352 (355.6 ) 0.01 337 (337.4) 5 337 1 350 (352.2) 142
latin_square_10 900 - 1690 (1900.0) 780.3 1548 (1561.8) 36451 1542 (1557.0) 142925 1483* (1486.2) 149656
le450_15a 450 - 241 (247.1) 288.4 217 (218.7) 1 213 (213.8) 83054 213 (216.5) 43854
le450_15b 450 - 239(245.1) 368.3 219 (225.3) 1 217 (218.2) 2505 217 (219.6) 71563
le450_15c 450 - 313 (320.8) 432.9 288 (292.2) 2 277 (280.3) 17392 275* (277.4) 46789
le450_15d 450 - 306 (314.1) 113.7 285 (288.9) 979 274 (275.8) 155735 272* (272.4) 22917
le450_25a 450 - 317 (329.9) 362.3 308 (310.4) 45732 306 (306.0) 715 307 (308.6) 26228
le450_25b 450 - 318 (325.8) 285.9 308 (310.4) 45732 307 (307.0) 18 307 (308.8) 43456
le450_25c 450 - 378 (387.9 ) 359.4 360 (364.1) 2 349 (352.1) 82371 342* (343) 57924
le450_25d 450 - 375 (385.3) 254.8 342 (350.2) 4 339 (342.4) 16881 329* (330.1) 45128
miles1000 128 431 432 (444.7 ) 480.0 431 6.06 431 18.81 431 581
miles1500 128 797 797 1802 797 (797.3) 0.03 797 1.68 797 76
miles250 128 102 102 (102.7) 56.6 102 (103.6) 0.7 102 0.18 102 21
miles500 128 260 260 (261.3) 48.4 260 (260.4) 1.39 260 0.18 260 30
queen10_10 100 - 166 (169.2 ) 68.4 162 (165.3) 10400 162 (162.0) 24 162 (162) 19
queen11_11 121 - 178 (182.3) 55.2 179 (180.1) 909 172 (172.7) 68208 172 (172) 1753
queen12_12 144 - 194 (198.6) 92.7 188 (189.4) 133726 185 (185.2) 23709 185 (185) 1826
queen13_13 169 - 204 (207.5) 199.8 197 (200.9) 8100 194 (194.8) 20802 194 (194) 1150
queen14_14 196 - 224 (227.4) 360.1 217 (219.7) 2 216 (217.1) 469 215* (215.2) 16621
queen15_15 225 - 237 (241.2) 183.4 230 (233.5) 1 224 (226.7) 2685 223* (224.1) 9276
queen16_16 256 - 253 (256.3) 300.9 242 (246.0) 579 238 (239.0) 60187 234* (234.8) 14751
wap01a 2368 - 638 (653.1) 1133.5 545 (558.6) 89184 549 (550.5) 62497 - -
wap02a 2464 - 637 (638.1) 3270.46 538 (547.1) 40143 541 (543.1) 46848 - -
wap03a 4730 - 687 (707.5) 2901.5 562 (566.9) 134923 577 (579.7) 55554 - -
wap04a 5231 - 698 (709.0 ) 4.79 563 (583.0) 52833 570 (573.0) 538455 - -
wap05a 905 - 598 (610.9) 1574.5 542 (548.1) 1348 542 (542.8) 62323 587 (588.0) 1139
wap06a 947 - 599 (607.6) 65.3 519 (529.4) 134 519 (520.8) 46077 574 (587.0) 1244
wap07a 1809 - 680 (692.5) 384.82 561 (563.8) 3433 567 (571.0) 11943 685 (686.0) 8294
wap08a 1870 - 663 (673.4) 2627.2 529 (540.3) 37505 546 (549.7) 67847 663 (665) 7397
zeroin.i.1 211 511 518 0.01 511 0.08 511 1.7 511 38
zeroin.i.2 211 336 336 (337.6) 440.8 336 0.22 336 0.01 336 33
zeroin.i.3 206 298 299 (301.7) 139.6 298 (298.7) 2.25 298 10.67 298 31
Best rate 10.2% 53.1% 48.9 % 63.3%
Table 5: Comparative results of DLMCOL with the state-of-the-art methods (MWSS, AFISA, RedLS, ILS-TS) for DIMACS/COLOR large instances. Dominating results are indicated in bold. New upper bounds are displayed with a star.

3.5 Impact of the crossover selection strategy driven by deep learning

In this section we aim to analyze the benefits of the neural network based crossover selection strategy (see Section 2.6), which is one of the key components of the proposed DLMCOL framework.

We launched 10 replications of DLMCOL on 4 instances (DSJC500.1, DSJC500.5, le450_25c and le450_25d) with a cutoff time of 20 hours, with and without the neural network crossover selection. In the version without the neural network, a second parent is randomly chosen among the nearest neighbors of each individual in order to perform a crossover.

The average best score obtained at each generation is displayed in Figure 1. The green curve corresponds to the standard DLMCOL algorithm while the red curve corresponds to the version without deep learning. One first notices that the version without deep learning can perform more generations in the same amount of time, because no time is spent on neural network training and offspring evaluations.

However we observe that the green curve is always below the red curve and that better results can be achieved in the same amount of time. This highlights that the neural network can really help in the selection of promising crossovers for the memetic algorithm.

Figure 1: Impact of the deep learning driven crossover selection strategy on the algorithm. The y-axis corresponds to the WVCP score (average best score over 10 runs at each generation) and the x-axis corresponds to the number of generations.

4 Conclusion

A deep learning guided memetic framework for graph coloring problems was presented in this paper, together with an implementation on GPU devices to solve the weighted graph coloring problem. This approach uses the deep set architecture to learn a regression model that is invariant under color permutation, and uses it to select the most promising crossovers at each generation. It takes advantage of GPU computation to perform massively parallel local searches over a large population.

The proposed approach was assessed on popular DIMACS and COLOR challenge benchmarks of the weighted graph coloring problem. The computational results show that the algorithm competes well overall with the best algorithms for this problem. It finds 11 new upper bounds for very difficult instances and significantly improves the previous best results for three graphs. The same framework, with the same type of neural network architecture, could be applied to solve other graph coloring problems.

The achieved results reveal, however, three main limitations of the proposed approach. First, due to the memory capacity of the GPU devices we used, the DLMCOL algorithm has trouble dealing with very large instances. In particular, for the parallel local searches, the memory available on each thread of the GPU can be a strong limitation. Secondly, the algorithm can be quite slow to converge in comparison with sequential local search algorithms, due to the large population and the time spent training the neural network at each generation. Thirdly, the algorithm fails for large instances with a low density (sparse graphs), as for these instances and for the studied coloring problem, the neural network has trouble learning good patterns to effectively guide the selection of promising crossovers.

Several directions for future work could be explored. In particular, it could be interesting to use deep learning techniques to learn a specific crossover for this weighted graph coloring problem, instead of the classical GPX crossover used in this algorithm. Other neural network structures could also be investigated to overcome the difficulty encountered on sparse graphs.

Acknowledgment

We would like to thank Dr. Wen Sun for sharing the binary code of their AFISA algorithm [sun2017feasible], Dr. Yiyuan Wang for sharing the source code of their RedLS algorithm [wang2020reduction] and Pr. Bruno Nogueira for sharing the source code of their ILS-TS algorithm [nogueira2021iterated].

This work was granted access to the HPC resources of IDRIS (Grant No. 2020-A0090611887) from GENCI.

References