 # A deep learning guided memetic framework for graph coloring problems

Given an undirected graph G=(V,E) with a set of vertices V and a set of edges E, a graph coloring problem involves finding a partition of the vertices into different independent sets. In this paper we present a new framework which combines a deep neural network with the best tools of "classical" metaheuristics for graph coloring. The proposed algorithm is evaluated on the weighted graph coloring problem, and computational results show that it obtains new upper bounds for medium and large graphs. A study of the contribution of deep learning in the algorithm highlights that it is possible to learn relevant patterns useful to obtain better solutions to this problem.


## 1 Introduction

A graph coloring problem is to assign colors to the vertices of a graph subject to certain constraints. One of the most common graph coloring problems is the vertex coloring problem. Given an undirected graph $G=(V,E)$ with a set of vertices $V$ and a set of edges $E$, the problem is to color the vertices of the graph such that two adjacent vertices receive different colors. This problem can also be seen as finding a partition of the vertex set into different color groups (also called independent sets or color classes) such that two vertices linked by an edge belong to different color groups. In some variants of graph coloring, the problem is to find such a legal coloring of the graph while minimizing an additional objective function.

The search space of a graph coloring problem is composed of the partitions of vertices into color groups:

$$\mathcal{S} = \Big\{ \{V_1, V_2, \dots, V_k\} : \bigcup_{i=1}^{k} V_i = V,\ V_i \cap V_j = \emptyset,\ 1 \le k \le |V| \Big\} \tag{1}$$

where $i \neq j$ and $1 \le i, j \le k$. This search space is in general huge, and finding an optimal solution is in general intractable unless P=NP, as most graph coloring problems are NP-hard.

Graph coloring problems have been studied very intensively in the past decades. A first category of methods proposed in the literature are local search procedures. Starting from a solution constructed using greedy methods, local search improves the current solution by considering the best moves in a given neighborhood. To be effective, local search heuristics usually incorporate mechanisms to escape local optima based on tabu lists [blochliger2008reactive, hertz1987using] or perturbation strategies [jin2019solving, nogueira2021iterated]. However, for very difficult instances of graph coloring, this is often not enough to find the global optimum, as the search may be restricted to a single region of the search space. To overcome these difficulties, hybrid algorithms have been proposed, in particular based on the memetic framework that combines local searches and crossovers [MABook2012]. The memetic framework has been very successful in solving several graph coloring problems [jin2014memetic, lu2010memetic, moalic2018variations, porumbel2010evolutionary]. These hybrid algorithms combine the benefits of local search for intensification with a population of high-quality solutions offering diversification possibilities. The memetic algorithms proposed in the literature for graph coloring typically use a small population with no more than 100 individuals. At each generation, one offspring solution is usually created by randomly selecting two or more individuals in the population and applying a crossover operator. One of the most popular crossovers used for graph coloring problems is the Greedy Partition Crossover (GPX) introduced in the hybrid evolutionary algorithm (HEA) [galinier1999hybrid]. The GPX produces offspring by alternately choosing the largest color class in the parent solutions. The offspring generated by this crossover is then improved by a local search procedure.
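The alternating selection of largest color classes can be illustrated with a minimal sketch. It assumes that parents are given as lists of vertex sets and that vertices left over after `k` classes have been built are placed in random classes; the function name `gpx` and this handling of leftovers are illustrative choices, not necessarily those of the cited implementations.

```python
import random


def gpx(parent_a, parent_b, k, rng=random):
    """Greedy partition crossover sketch: parents are lists of vertex sets."""
    # Work on copies so the parents are not modified.
    parents = [[set(g) for g in parent_a], [set(g) for g in parent_b]]
    all_vertices = set().union(*parents[0])
    child = []
    for step in range(k):
        current = parents[step % 2]        # alternate between the two parents
        chosen = set(max(current, key=len))  # largest remaining color class
        child.append(chosen)
        for parent in parents:             # remove these vertices from both parents
            for group in parent:
                group -= chosen
    # Assumption: leftover vertices go to random classes, which may create
    # conflicts that a subsequent local search is expected to repair.
    assigned = set().union(*child)
    for v in sorted(all_vertices - assigned):
        rng.choice(child).add(v)
    return child


# Hypothetical parents on six vertices.
a = [{0, 1, 2}, {3, 4}, {5}]
b = [{0, 3}, {1, 4}, {2, 5}]
print(gpx(a, b, k=3, rng=random.Random(0)))
```

Because the first parent contributes the first class, swapping the two arguments generally yields a different child.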

The crossover operators of a memetic algorithm produce new restarting points for the local search procedure, which are expected to be better than a purely random initialization. However, when using such mechanisms, there is usually no way of knowing in advance whether the new restarting point lies in a promising area that is really worth exploring with the local search procedure. Indeed, a crossover can sometimes lead back to an already visited region of the search space without any chance of further improvement, or to a new region far from the global optimum. Moreover, hybrid algorithms do not have a specific memory to store information about past trajectories in the search space that could be used to learn useful patterns to solve the problem (although sharing groups with crossovers can be seen as some sort of "learning" of good patterns).

On the other hand, numerous algorithms have been proposed for decades by the machine learning community to leverage statistical learning methods for solving difficult combinatorial search problems. We refer the reader to the recent survey of [bengio2020machine] on this topic. These attempts have been given a new lease of life with the emergence of deep learning techniques for combinatorial optimization problems [bello2016neural, dai2017learning], inspired by the great success of the AlphaZero algorithm for combinatorial games [silver2018general]. In particular, some recent works using reinforcement learning and deep learning have been applied to graph coloring problems [huang2019coloring, lemos2019graph]. Nevertheless, these studies rarely exploit specific knowledge of the problem, which greatly limits the interest and performance of these approaches. Indeed, the results obtained by this type of approach are for the moment far from the results obtained by state-of-the-art algorithms on graph coloring problems, such as hybrid algorithms [lu2010memetic, malaguti2008metaheuristic, moalic2018variations, porumbel2010evolutionary] and simulated annealing algorithms [titiloye2011quantum]. We can mention, however, new works which try to pair efficient local search algorithms with machine learning techniques [goudet2021population, zhou2016reinforcement, zhou2018improving], with promising results for graph coloring problems.

In this paper, we aim to push further the integration of machine learning and combinatorial optimization by proposing a new framework which combines deep neural networks with the best tools of "classical" metaheuristics for graph coloring (local search based on tabu search [hertz1987using] and recombination of solutions as used in memetic algorithms [galinier1999hybrid, lu2010memetic, moalic2018variations, porumbel2010evolutionary]), so as to solve very difficult graph coloring problems which still resist the best current methods. To achieve this integration, we propose to revisit an idea proposed by Boyan and Moore twenty years ago. In [boyan2000learning], the authors remarked that the performance of a local search procedure depends on the state from which the search starts, and therefore proposed to use a regression algorithm to predict the results of a local search algorithm. Once learned, this predictive model can help to select good new starting points for the local search, and the authors show on some examples that it can help to find the optimal solution more quickly. We propose to exploit this idea with modern deep learning techniques, in order to help select promising crossovers among the possible ones in each generation. We design a specific neural network architecture for graph coloring problems inspired by deep set networks [lucas2018mixed, zaheer2017deep, zhang2019deep], in order to make it invariant by permutation of the color classes. Furthermore, as training a deep learning algorithm requires a large amount of data, we implement a memetic algorithm with a very large population, whose individuals evolve in parallel in the search space, adapting the framework recently proposed in [goudet2021massively] for Latin square completion. In order to train the neural network and to compute all the local searches in parallel for all the individuals of the population, we leverage GPU (Graphics Processing Unit) computation.

As a proof of concept, we apply this approach to the weighted vertex coloring problem (WVCP), which has recently attracted a lot of interest in the literature [nogueira2021iterated, wang2020reduction]. In the WVCP, a strictly positive weight $w_v$ is associated to each vertex $v \in V$. The goal is to find a legal coloring minimizing the global score:

$$f(S) = \sum_{i=1}^{k} \max_{j \in V_i} w_j. \tag{2}$$

This problem is well suited for our framework as predicting this global score, instead of more sophisticated constraints, is well adapted when using regression machine learning techniques.
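As a small illustration of this score, the sketch below checks the legality of a coloring and sums the heaviest weight of each color group. The graph, the weights and the function names `is_legal` and `wvcp_score` are hypothetical and only serve the example.

```python
def is_legal(color_groups, edges):
    """Return True if no edge has both endpoints in the same color group."""
    group_of = {v: i for i, group in enumerate(color_groups) for v in group}
    return all(group_of[u] != group_of[v] for u, v in edges)


def wvcp_score(color_groups, weights):
    """Sum, over non-empty color groups, of the heaviest vertex weight in the group."""
    return sum(max(weights[v] for v in group) for group in color_groups if group)


# Hypothetical 4-vertex path 0-1-2-3 with illustrative weights.
edges = [(0, 1), (1, 2), (2, 3)]
weights = {0: 5, 1: 3, 2: 4, 3: 1}
coloring = [{0, 2}, {1, 3}]
print(is_legal(coloring, edges), wvcp_score(coloring, weights))
```

Only the heaviest vertex of each group contributes, so grouping heavy vertices together (when the edges allow it) lowers the score.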

The WVCP has a number of practical applications in different fields such as matrix decomposition problems [prais2000reactive], batch scheduling [gavranovic2000graph] and manufacturing [hochbaum1997scheduling]. It has been addressed by exact methods [cornaz2017solving, furini2012exact, malaguti2009models] and heuristics [prais2000reactive, sun2018adaptive, wang2020reduction]. The current best-performing heuristic for this problem is an iterated local search algorithm based on tabu search that explores two different neighborhoods [nogueira2021iterated].

## 2 General framework - revisiting the STAGE algorithm with deep learning and memetic algorithm

Given a problem whose goal is to find an optimal solution minimizing an objective function $f$, the expected search result of a stochastic local search algorithm A can be defined as:

$$E[f_A(S)] = \sum_{S' \in \mathcal{S}} P(S \xrightarrow{A} S')\, f(S') \tag{3}$$

where $P(S \xrightarrow{A} S')$ is the probability that a search starting from $S$ will terminate in state $S'$. $E[f_A(S)]$ evaluates the promise of $S$ as a starting state for the algorithm A.
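To make this quantity concrete, the following sketch estimates the expected search result by Monte Carlo, averaging the final objective value over independent runs of a toy stochastic descent on the integers. The objective `f`, the search `noisy_descent` and all parameter values are hypothetical and unrelated to graph coloring.

```python
import random


def noisy_descent(start, f, rng, steps=50):
    """Toy stochastic local search: try random +/-1 moves, keep strict improvements."""
    state = start
    for _ in range(steps):
        candidate = state + rng.choice((-1, 1))
        if f(candidate) < f(state):
            state = candidate
    return state


def expected_result(start, f, runs=200, seed=0):
    """Monte Carlo estimate of E[f_A(S)]: mean final objective over independent runs."""
    rng = random.Random(seed)
    return sum(f(noisy_descent(start, f, rng)) for _ in range(runs)) / runs


# Hypothetical objective with a poor local optimum at x = -3 and the global one at x = 4.
def f(x):
    return min((x + 3) ** 2 + 5, (x - 4) ** 2)


print(expected_result(-5, f), expected_result(2, f))
```

A start at 2 lies in the basin of the global optimum and gets a lower estimate than a start at -5, which is the kind of information a learned predictor is meant to capture.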

The main idea of the STAGE algorithm [boyan2000learning] was to approximate this expectation by a regression model that takes as input a real-valued feature vector encoding a state $S$. This model can be a linear regression or a more complex nonlinear model such as a neural network.

After initializing a first random candidate solution and the function approximator, the STAGE algorithm proceeds in three steps:

1. Optimize $f$ using A. From the current starting state, run the local search algorithm A, producing a search trajectory that ends at a local optimum.

2. Train the function approximator. For each point on the search trajectory, use the features of that point, paired with the objective value of the local optimum reached, as a new training pair for the function approximator.

3. Optimize the function approximator using hillclimbing. Continuing from the local optimum, perform a hillclimbing search on the learned objective function. This results in a new state which should be a good new starting point for A.

We propose to revisit this idea with an adaptation for each of the three points of the STAGE algorithm:

1. First, regarding the first step of the STAGE algorithm, we propose to run local searches with algorithm A in parallel, starting from many different states instead of only one. This produces many different search trajectories, from which a training dataset with a great diversity of examples can be built.

2. Secondly, regarding step 2, we do not use any prior mapping from states to features. Following the current trend in deep learning, the embedding of the state can be learned directly in an end-to-end pipeline with a deep neural network, denoted $f_\theta$. We make this neural network invariant by permutation of the color groups of the coloring, which is a very important feature of all graph coloring problems, by adapting the deep set network architecture proposed in [zaheer2017deep] (see Section 2.3).

3. Thirdly, in step 3 of the original STAGE algorithm, a hillclimbing algorithm is used to optimize the current solution guided by the learned objective function. However, as we address a very complex problem with a complex non-convex and nonlinear function (a deep neural network), it is difficult to optimize it using a hillclimbing algorithm. We tried more complex algorithms such as tabu search to optimize it, but there is a deeper problem, which is the question of generalization. Indeed, if a state $S$ is too different from the states already seen in the training dataset, and in particular if the color groups that compose it are too different from the color groups already seen by the neural network, we expect that $f_\theta(S)$ can be a very inaccurate estimate of $E[f_A(S)]$. Therefore, we propose to replace this hillclimbing procedure by a crossover operation between different members of a population of candidate solutions. By recombining color groups already seen by the learning algorithm, we expect the approximation of $E[f_A(S)]$ given by $f_\theta(S)$ to be more precise.

The pseudo-code of the proposed new deep learning guided memetic framework for graph coloring (DLMCOL) is shown in Algorithm 1.

The algorithm takes a graph as input and tries to find a legal coloring with the minimum score $f$. At the beginning, all the individuals of the population are initialized in parallel using a greedy random algorithm (cf. Section 2.1) and the neural network is initialized with random weights. Then, the algorithm repeats a loop (generation) until a stopping criterion (e.g., a cutoff time limit or a maximum number of generations) is met. Each generation involves the execution of five components:

1. The individuals of the current population are simultaneously improved by running local searches in parallel on the GPU to find new legal solutions with a minimum score (cf. Section 2.2). For each individual, we record the legal state with the lowest score obtained during its local search trajectory.

2. From these local search trajectories, a supervised training dataset is built, pairing each starting state with the best score reached from it, and the neural network is trained on this dataset for a preset number of epochs (cf. Section 2.3).

3. The distances between all pairs of existing individuals and new individuals are computed in parallel (cf. Section 2.4).

4. Then the population updating procedure (cf. Section 2.5) merges the existing and new individuals to create a new population, taking into account the fitness of each individual (its WVCP score) and the distances between individuals in order to maintain some diversity in the population.

5. Finally, each individual is matched with its $K$ nearest neighbors in the population. For each individual, $K$ offspring are generated and the one with the best expected score, as evaluated by the neural network, is selected (cf. Section 2.6). The selected offspring, one per individual, become the new starting points, which are improved in parallel by the local search procedure during the next generation.

The algorithm stops when a predefined condition is reached and returns the best legal solution found so far. The subsequent subsections present the five components of this deep learning guided memetic framework applied to the WVCP.

### 2.1 Initialization with a greedy random algorithm for the WVCP

In order to initialize the individuals of the population, we use a greedy random procedure which has already been seen as very effective for the WVCP in [nogueira2021iterated, sun2018adaptive].

First, all the nodes are sorted by descending order of weight and by degree. Then a color is assigned to each node in turn, without creating conflicts, by choosing at random among the colors already in use. If no such color is available for the current node, a new color is created (and the score of the current solution is increased by the weight of that node).
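A minimal sketch of this greedy random construction is given below. It assumes that ties in weight are broken by descending degree and that the graph is given as an adjacency dictionary; both choices, as well as the function name, are illustrative assumptions.

```python
import random


def greedy_random_coloring(adjacency, weights, rng=random):
    """Return (color_groups, score) built by the weight-ordered random greedy rule."""
    # Assumption: sort by descending weight, ties broken by descending degree.
    order = sorted(adjacency, key=lambda v: (-weights[v], -len(adjacency[v])))
    color_of, groups, score = {}, [], 0
    for v in order:
        forbidden = {color_of[u] for u in adjacency[v] if u in color_of}
        free = [c for c in range(len(groups)) if c not in forbidden]
        if free:
            c = rng.choice(free)        # random conflict-free color already in use
        else:
            c = len(groups)             # no usable color: open a new one
            groups.append(set())
            score += weights[v]         # v is the heaviest node of its new group
        color_of[v] = c
        groups[c].add(v)
    return groups, score


# Hypothetical graph: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
weights = {0: 4, 1: 3, 2: 2, 3: 1}
print(greedy_random_coloring(adjacency, weights, rng=random.Random(0)))
```

Since nodes are processed by descending weight, the node that opens a color group is always its heaviest member, which is why the score can be accumulated incrementally.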

Notice that for this problem the number of colors needed to find a legal coloring minimizing the global score is not known in advance. It is at least equal to the chromatic number of the graph, since no legal coloring can use fewer colors.

However, for our algorithm a predefined maximum allowed number of colors $k_{max}$ needs to be set in order to specify the size of the layers of the neural network and to allocate memory for the local searches on the GPU. We set this number to the maximum number of colors used by the greedy random procedures applied to initialize the whole population.

The new search space, restricted to at most $k_{max}$ available colors and composed of the partitions of vertices into $k_{max}$ color groups, is defined as:

$$\bar{\mathcal{S}} = \Big\{ \{V_1, V_2, \dots, V_{k_{max}}\} : \bigcup_{i=1}^{k_{max}} V_i = V,\ V_i \cap V_j = \emptyset,\ i \neq j,\ 1 \le i, j \le k_{max} \Big\} \tag{4}$$

Even if limiting the search to $\bar{\mathcal{S}}$ instead of $\mathcal{S}$ can be seen as quite restrictive, we empirically observe that the best solution found for each of the WVCP instances presented in the experimental section of this paper (cf. Section 3) still uses significantly fewer than $k_{max}$ colors.

### 2.2 Parallel iterated Tabu Search with feasible and infeasible searches

For local optimization, we employ a parallel iterated tabu search algorithm to simultaneously improve the individuals of the current population. It relies on the adaptive feasible and infeasible tabu search procedure for weighted vertex coloring (AFISA) proposed in [sun2018adaptive] for the WVCP, with some slight modifications. This feasible and infeasible tabu search uses a sequential procedure that improves a starting legal or illegal coloring by optimizing the fitness function given by:

$$g(S) = f(S) + \phi \times c(S) \tag{5}$$

where $c(S) = \sum_{\{u,v\} \in E} \delta_{uv}$ is the number of conflicts in the solution, with:

$$\delta_{uv} = \begin{cases} 1 & \text{if } u \in V_i,\ v \in V_j \text{ and } i = j \text{ and } i \neq 0 \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

and where $\phi$ is a penalty coefficient for the number of conflicts in the solution.
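The extended fitness can be sketched as follows, assuming a coloring is stored as a dictionary mapping each vertex to a color numbered from 1 (so that the condition on color 0 in Equation 6 is always satisfied here). The function names and the example values are hypothetical.

```python
def count_conflicts(color_of, edges):
    """c(S): number of edges whose two endpoints share the same color."""
    return sum(1 for u, v in edges if color_of[u] == color_of[v])


def extended_fitness(color_of, edges, weights, phi):
    """g(S) = f(S) + phi * c(S), where f sums the heaviest weight of each color."""
    heaviest = {}
    for v, c in color_of.items():
        heaviest[c] = max(heaviest.get(c, 0), weights[v])
    return sum(heaviest.values()) + phi * count_conflicts(color_of, edges)


# Hypothetical path 0-1-2-3 colored illegally with two colors.
edges = [(0, 1), (1, 2), (2, 3)]
weights = {0: 5, 1: 3, 2: 4, 3: 1}
color_of = {0: 1, 1: 1, 2: 2, 3: 2}
print(count_conflicts(color_of, edges), extended_fitness(color_of, edges, weights, phi=2))
```

A larger penalty coefficient makes conflicting colorings less attractive relative to their weight score, which is what the adaptive scheme of this section exploits.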

The procedure improves the current coloring by successively changing the color of a vertex, within the restricted search space with a bounded number of colors. Such a change is called a one-move. To prevent the search from revisiting already visited colorings, a node cannot change its color again for a given number of iterations, called the tabu tenure. In the original AFISA algorithm, the tabu tenure concerns past moves (as in the original TabuCol algorithm [hertz1987using]) instead of completely freezing a node, but we empirically observed that it seems more effective for the WVCP to freeze a node which has just changed color, in order to avoid too many color changes of the same node without any improvement of the score (plateau).

The tabu tenure is computed from a random integer, a parameter set to 0.2, and the number of nodes in the graph. In the original AFISA algorithm, the tabu tenure depended on the extended evaluation function $g$ given by Equation 5, but it seems more appropriate to relate the tabu tenure to the number of nodes in the graph, as it is directly proportional to the number of available moves at each step. It seems preferable that the global extended score $g$, which depends on the magnitude of the weights for each specific instance, does not have such a direct impact on the tabu tenure.

As in the AFISA algorithm, we perform successive searches while dynamically changing the value of $\phi$ in order to navigate in the space of legal and illegal colorings. The number of successive searches is a parameter (set to 10 in our experiments, cf. Section 3.3). At the beginning, $\phi$ is initialized for each individual of the population. Then, at the end of each successive local search, if the best current solution found by the tabu search procedure is legal ($c(S) = 0$), the value of $\phi$ is divided by 2 (in order to increase the chance of visiting infeasible solutions); otherwise, if the solution is illegal ($c(S) > 0$), $\phi$ is multiplied by 2 (in order to guide the search toward feasible regions). In the original AFISA algorithm, $\phi$ is initially set to 1, cannot be lower than 1, and the adaptive mechanism only increases or decreases its value by 1. We have found it empirically more efficient to initialize $\phi$ to a value depending on the density of the graph and the amplitude of the weights, as we must balance a total weight score against a number of conflicts. Additionally, dividing or multiplying its value by 2 appears to be empirically more effective for faster adjustments, especially for instances with heavy weights, than simply changing its value by 1.

During the last iteration of this iterated local search algorithm, the value of $\phi$ is set so as to guide each local search toward a legal solution at least once.
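The adaptive penalty scheme can be sketched as a small driver loop. Here `local_search` is a hypothetical callable standing in for the tabu search: it receives a solution and the current penalty and returns the best solution found with its number of conflicts. The special handling of the last iteration is left out of this sketch.

```python
def adapt_penalty(phi, conflicts):
    """Halve phi after a legal best solution, double it after an illegal one."""
    return phi / 2 if conflicts == 0 else phi * 2


def iterated_search(start, local_search, phi, n_iters=10):
    """Run successive local searches, adapting the conflict penalty between them."""
    current, history = start, []
    for _ in range(n_iters):
        current, conflicts = local_search(current, phi)
        history.append((phi, conflicts))
        phi = adapt_penalty(phi, conflicts)
    return current, history


# Hypothetical stand-in: the search ends on a legal solution only if phi >= 4.
def toy_local_search(solution, phi):
    return solution, 0 if phi >= 4 else 1


_, history = iterated_search("S0", toy_local_search, phi=8, n_iters=6)
print([phi for phi, _ in history])
```

With this toy search the penalty oscillates around the threshold where solutions switch between legal and illegal, which illustrates how the scheme keeps the search near the feasibility boundary.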

As shown in Algorithm 2, our iterated tabu search runs in parallel on the GPU to raise the quality of all the individuals of the current population at once.

All the data structures required during the search are stored in the local memory of the thread running each tabu search, except the graph information, which is stored in global memory and shared.

### 2.3 Deep neural network training

Once all the local searches are done in parallel, we collect the starting states and the best score found on each local thread. This yields a supervised training dataset with one example per individual, whose inputs are the starting states in $\bar{\mathcal{S}}$ and whose targets are the corresponding best scores (real values).

A neural network $f_\theta$, parametrized by a vector of parameters $\theta$ (initialized at random at the beginning), is successively trained on each new dataset produced at each generation (online training), in order to become more and more accurate at predicting the expected score obtained after the local search procedure for any new starting point.

This neural network takes directly as input a coloring $S$ as a set of $k_{max}$ vectors $V_i$, $1 \le i \le k_{max}$, where each $V_i$ is a binary vector of size $|V|$ indicating whether each vertex belongs to the color group $i$. For such an input $S$, the neural network outputs a real value denoted $f_\theta(S)$.

One important characteristic of our neural network is that it is invariant by permutation of the color groups of any solution given as input. It should be a function from $\bar{\mathcal{S}}$ to $\mathbb{R}$ such that for any permutation $\sigma$ of the input color groups,

$$f_\theta(V_{\sigma(1)}, \dots, V_{\sigma(k_{max})}) = f_\theta(V_1, \dots, V_{k_{max}}) \tag{7}$$

As indicated in [lucas2018mixed, zaheer2017deep], such permutation invariant functions can be obtained by combining the treatments of each color group vector with an additional "color-averaging" operation that performs an average of the features across the different color groups. In this deep set architecture, this averaging is the only form of interaction between the different color groups. It has notably been shown in [lucas2018mixed] that such operations are sufficient to recover all such invariant functions.

Using the notations proposed in [lucas2018mixed], for a coloring $S$, the color group invariant network is defined as:

$$f_\theta(S) = \frac{1}{k_{max}} \sum_{i=1}^{k_{max}} \left( \phi_{\theta_P} \circ \phi_{\theta_{P-1}} \circ \dots \circ \phi_{\theta_0}(S) \right)_i \tag{8}$$

where each $\phi_{\theta_j}$ is a permutation-equivariant function that maps a matrix with one row per color group to another matrix with one row per color group, the numbers of columns being the layer sizes.

Each equivariant layer operation $\phi_\theta$, with given numbers of input and output features, includes a weight matrix $\Lambda$ that treats each color group independently, a color-mixing weight matrix $\Gamma$ and a bias vector $\beta$.

As in a classical multi-layer feed-forward neural network, $\Lambda$ processes each color group vector of the solution independently. The weight matrix $\Gamma$ is then applied to the average of each feature across the different color groups, given by:

$$\rho(S) = \rho(V_1, \dots, V_{k_{max}}) = \frac{1}{k_{max}} \sum_{i=1}^{k_{max}} V_i \tag{9, 10}$$

The output of a permutation-equivariant layer is a matrix which is the concatenation of $k_{max}$ output vectors, one per color group:

$$\phi_\theta(S) = \left( \phi_\theta(S)_1, \dots, \phi_\theta(S)_{k_{max}} \right) \tag{11}$$

where for $1 \le i \le k_{max}$,

$$\phi_\theta(S)_i = \mu\left( \beta + V_i \Lambda + \rho(S) \Gamma \right) \tag{12}$$

where $\mu$ is a nonlinear activation function.
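The layer of Equation 12 and the averaging readout of Equation 8 can be illustrated with a small pure-Python sketch using a single layer with one output feature. The weight values, the leaky ReLU slope and the function names are illustrative assumptions; an actual implementation would rely on a tensor library.

```python
def leaky_relu(x, slope=0.2):
    """Leaky ReLU activation; the slope value here is only illustrative."""
    return x if x > 0 else slope * x


def vec_mat(vec, mat):
    """Row vector times matrix, with the matrix given as a list of rows."""
    return [sum(vec[r] * mat[r][c] for r in range(len(vec))) for c in range(len(mat[0]))]


def equivariant_layer(groups, lam, gamma, beta):
    """Apply mu(beta + V_i Lambda + rho(S) Gamma) to every color-group row V_i."""
    k = len(groups)
    rho = [sum(row[d] for row in groups) / k for d in range(len(groups[0]))]
    mixed = vec_mat(rho, gamma)                 # shared color-averaged term
    return [
        [leaky_relu(b + o + m) for b, o, m in zip(beta, vec_mat(row, lam), mixed)]
        for row in groups
    ]


def invariant_readout(groups, lam, gamma, beta):
    """Average the single output feature over color groups (one-layer network)."""
    layer = equivariant_layer(groups, lam, gamma, beta)
    return sum(row[0] for row in layer) / len(layer)


# Hypothetical coloring of 4 vertices into 3 groups, as binary membership rows.
S = [[1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
lam = [[0.5], [-1.0], [0.25], [2.0]]
gamma = [[1.0], [0.5], [-0.5], [0.1]]
beta = [0.1]
permuted = [S[2], S[0], S[1]]
print(invariant_readout(S, lam, gamma, beta), invariant_readout(permuted, lam, gamma, beta))
```

Permuting the rows of the input permutes the rows of the layer output in the same way, so the final average is unchanged up to floating-point rounding.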

After each local search procedure, the neural network is trained on the new dataset using the Adam optimizer [kingma2014adam], with batches of size 100, in order to minimize the mean squared error (MSE) loss between the outputs and the targets. In order to speed up the training, we apply a batch normalization layer [ioffe2015batch] prior to the nonlinearity, adapted to preserve the invariance property of the network.

Once trained, this neural network is used to select new crossovers for the next generation (see Section 2.6 below). Before performing crossovers, however, we must decide whether the new legal colorings obtained after the parallel local search procedure can be inserted into the population. To do this, a distance-and-quality based pool update strategy is used to create a new population satisfying a minimum spacing among the individuals, which ensures population diversity [PorumbelHK11]. Maintaining this minimum spacing requires the computation of pairwise distances between the solutions, which is presented in the next subsection.

### 2.4 Distance computation

Following [goudet2021population, lu2010memetic, porumbel2010evolutionary], for population updating, we use a matrix to record all the distances between any two solutions of the population. This symmetric matrix is initialized with the pairwise distances computed for each pair of individuals in the initial population, and then updated each time a new individual is inserted in the population.

To merge the new solutions and the existing solutions, we need to evaluate (i) the distances between each existing individual in the population and each improved individual, and (ii) the distances between all pairs of improved individuals. All the distance computations are independent from one another, and are performed in parallel on the GPU (one computation per thread).

Given two colorings, we use the set-theoretic partition distance to measure their dissimilarity, which corresponds to the minimum number of vertices that need to be displaced between color classes of the first coloring to transform it into the second [porumbel2011efficient]. The exact partition distance between two solutions can be calculated with the Hungarian algorithm [kuhn1955hungarian]. However, given that we need to compute millions of distances at each generation with the large population, we instead adopt the efficient approximation algorithm presented in [porumbel2011efficient], which is much cheaper to compute.
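For intuition, the exact partition distance can be computed on tiny examples by brute force over all matchings of color classes, as in the sketch below. This is neither the Hungarian algorithm nor the approximation of [porumbel2011efficient]; its cost grows factorially with the number of classes, so it is only meant as an illustration. The function name is hypothetical.

```python
from itertools import permutations


def partition_distance(coloring_a, coloring_b):
    """Exact set-theoretic partition distance by brute force over class matchings."""
    k = max(len(coloring_a), len(coloring_b))
    # Pad the shorter coloring with empty classes so both have k classes.
    a = [set(g) for g in coloring_a] + [set()] * (k - len(coloring_a))
    b = [set(g) for g in coloring_b] + [set()] * (k - len(coloring_b))
    n = sum(len(g) for g in a)
    # The best matching keeps as many vertices as possible in place.
    best_overlap = max(
        sum(len(a[i] & b[j]) for i, j in enumerate(perm))
        for perm in permutations(range(k))
    )
    return n - best_overlap


# Hypothetical colorings of five vertices.
print(partition_distance([{0, 1, 2}, {3, 4}], [{3, 4}, {0, 1}, {2}]))
```

The distance does not depend on how the color classes are numbered, which matches the permutation invariance required of the neural network.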

### 2.5 Population update

According to [porumbel2010evolutionary, PorumbelHK11], the population update procedure aims to keep the best individuals, but also to ensure a minimum spacing distance between the individuals. The update procedure is sequential, as we need to compare one by one the existing individuals of the population at the current generation and the offspring solutions improved by local search.

We use the population update procedure proposed by [goudet2021population]. This procedure greedily adds, one by one, the best individuals among the existing and improved solutions to the population of the next generation until it reaches its target size, such that the distance between any two retained individuals is at least the minimum spacing distance (which depends on the number of vertices). Each distance is the approximation of the set-theoretic partition distance precomputed in the previous step of the algorithm.
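The greedy update can be sketched as a best-first insertion with a spacing test. The sketch assumes that a candidate is accepted when its distance to every already retained individual is at least the minimum spacing, and that fewer individuals than requested may be returned when too many candidates are close to each other; the function name and the example values are hypothetical.

```python
def update_population(scores, distance, size, min_spacing):
    """Greedy pool update: best-first insertion subject to a minimum pairwise distance.

    scores[i] is the score of candidate i (lower is better) and distance[i][j]
    the precomputed distance between candidates i and j. Returns kept indices.
    """
    kept = []
    for idx in sorted(range(len(scores)), key=lambda i: scores[i]):
        if all(distance[idx][j] >= min_spacing for j in kept):
            kept.append(idx)
        if len(kept) == size:
            break
    return kept


# Hypothetical pool of four candidates; candidates 1 and 2 are identical colorings.
scores = [3, 1, 2, 5]
distance = [
    [0, 5, 5, 5],
    [5, 0, 0, 5],
    [5, 0, 0, 5],
    [5, 5, 5, 0],
]
print(update_population(scores, distance, size=2, min_spacing=2))
```

In this example the second-best candidate is skipped because it is too close to the best one, so a slightly worse but more distant candidate is kept instead.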

### 2.6 Parent matching and selection of crossovers with the neural network

At each generation, each individual of the population is matched with its $K$ nearest neighbors in the population (in the sense of the distance evaluated in Section 2.4).

For each individual $S_i$, $K$ offspring solutions $S_i^j$ ($1 \le j \le K$) are generated using the well-known GPX crossover [galinier1999hybrid, moalic2018variations], where the individual is taken as the first parent and its $j$-th neighbor is the second parent (the GPX crossover is not symmetric).

For each individual $S_i$, among these $K$ crossovers, we select the one with the best expected score evaluated with the neural network:

$$S_i^0 = \operatorname*{argmin}_{S_i^j,\ 1 \le j \le K} f_\theta(S_i^j) \tag{13}$$

After this selection procedure, the selected offspring solutions, one per individual, serve as the new starting points that are further improved in parallel by the local search procedure during the next generation.
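The matching and selection step can be sketched generically as follows. Here `crossover` and `predict_score` are hypothetical callables standing in for the GPX operator and the trained neural network, and the toy example uses integers in place of colorings.

```python
def nearest_neighbors(distance, i, k):
    """Indices of the k individuals closest to individual i (excluding i itself)."""
    others = [j for j in range(len(distance)) if j != i]
    return sorted(others, key=lambda j: distance[i][j])[:k]


def next_starting_points(population, distance, k, crossover, predict_score):
    """For each individual, build k offspring and keep the best predicted one."""
    starts = []
    for i, individual in enumerate(population):
        children = [
            crossover(individual, population[j])   # individual is the first parent
            for j in nearest_neighbors(distance, i, k)
        ]
        starts.append(min(children, key=predict_score))
    return starts


# Toy stand-ins: individuals are integers, crossover adds them, prediction is identity.
population = [10, 1, 5]
distance = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
print(next_starting_points(population, distance, 2, lambda a, b: a + b, lambda s: s))
```

Only the predictor is evaluated on all candidate offspring; the more expensive local search is then run once per individual, on the selected offspring.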

## 3 Experimental results

This section is dedicated to a computational assessment of the proposed deep learning memetic framework for solving the weighted vertex coloring problem, by making comparisons with the state-of-the-art methods.

### 3.1 Implementation on graphic processing units

The DLMCOL algorithm was programmed in Python with the Numba 0.53.1 library for the CUDA kernel implementation (local search, distance computation, crossovers). The neural network is implemented in PyTorch 1.8.1. DLMCOL is specifically designed to run on GPUs. In this work we used an Nvidia V100 graphics card with 32 GB of memory.

### 3.2 Benchmark instances

We carried out extensive experiments on the benchmark used in the recent papers on the WVCP [nogueira2021iterated, sun2017feasible, wang2020reduction]: the pxx, rxx, DIMACS/COLOR small, and DIMACS/COLOR large instances. The pxx and rxx instances are based on matrix-decomposition problems [prais2000reactive], while DIMACS/COLOR small [cornaz2017solving, furini2012exact] and DIMACS/COLOR large [sun2017feasible] are based on DIMACS and COLOR competitions.

As indicated in [nogueira2021iterated, wang2020reduction], a preprocessing procedure can be applied to reduce a weighted graph for the WVCP. For each clique of the graph, if we note the smallest weight among the nodes of this clique, all the nodes of the graph whose degree is low enough with respect to the clique size and whose weight does not exceed this smallest weight can be removed from the graph without changing the optimal WVCP score that can be found for this instance. Enumerating all the cliques of a graph is a challenging problem. We used the igraph Python package with a ten-second timeout for all instances. For small instances, this is enough to enumerate all the cliques of the graph.

### 3.3 Parameter setting

The population size of DLMCOL is chosen as a multiple of 64, the number of threads per block. This large population size offers a good performance ratio on the Nvidia V100 graphics cards that we used in our experiments, while remaining reasonable for pairwise distance calculations in the population, as well as for the memory occupation on the GPU for medium instances. However, for large instances, we use a smaller population in order to limit the memory occupation on the device.

In addition to the population size, the parameter of the tabu search is set to 0.2 and the number of tabu iterations depends on the size of the graph. The maximum number of iterated local searches launched at each generation is set to 10. The minimum spacing distance used for pool update depends on the number of vertices.

For the neural network we implement an architecture with 4 hidden layers. The nonlinear activation function is a leaky ReLU function. The neural network is trained at each generation with the Adam optimizer.

Table 1 summarizes the parameter setting, which can be considered as the default and is used for all our experiments.

### 3.4 Comparative results on WVCP benchmarks

This section shows a comparative analysis on the pxx, rxx, DIMACS/COLOR small, and DIMACS/COLOR large instances with respect to the state-of-the-art methods [nogueira2021iterated, sun2017feasible, wang2020reduction]. The reference methods include the three best recent heuristics: AFISA [sun2017feasible], RedLS [wang2020reduction] and ILS-TS [nogueira2021iterated]. When they are available, we also include the optimal scores obtained with the MWSS exact algorithm [cornaz2017solving] and reported in [nogueira2021iterated].

Given the stochastic nature of the DLMCOL algorithm, each instance was independently solved 10 times. For the small instances presented in Tables 2 and 3, a time limit of 1 hour was used. However, for the medium and large instances in Tables 4 and 5, as training the neural network and performing all the local searches for the large population is time consuming, a cutoff limit of 48 hours is retained.

For a fair comparison, we also ran the local search methods RedLS and ILS-TS for 48 hours, until no further improvement was observed. This was done on a computer with an Intel Xeon E5-2670 processor (2.5 GHz and 2 GB RAM). As the available AFISA binary code does not allow setting a cutoff time, we only report its results mentioned in the original article [sun2017feasible].

However, we acknowledge that the comparison remains difficult in terms of computational time between DLMCOL and its competitors, as DLMCOL uses a GPU while the other algorithms (AFISA, RedLS and ILS-TS) use a CPU. Therefore the computation times are given for indicative purposes only.

Columns 1, 2, and 3 of Tables 2–5 show the characteristics of each instance (i.e., name of the instance, number of vertices |V|, and optimal result reported in the literature). Columns 4–9 present the best and average scores obtained by the reference algorithms for the different instances, as well as the average time in seconds required to obtain the best results. The results of the proposed DLMCOL algorithm are reported in columns 10 and 11. Bold numbers show the dominating values, while a star indicates a new upper bound. The certificates of the new best-known solutions from DLMCOL are available at https://github.com/GoudetOlivier/DLMCOL.
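The per-instance statistics reported in the tables can be computed from the scores of the independent runs as sketched below. The function name `summarize_runs` and the sample scores are hypothetical and serve only to illustrate the best score, the average score, and the test for a new upper bound (the WVCP is a minimization problem, so a new upper bound is a score strictly below the best-known one).

```python
from statistics import mean


def summarize_runs(scores, best_known):
    """Summarize the final scores of several independent runs.

    scores: final WVCP score of each run (lower is better).
    best_known: best score previously reported for the instance.
    """
    best = min(scores)
    return {
        "best": best,
        "average": mean(scores),
        # A star in the tables marks a score strictly below the best-known one.
        "new_upper_bound": best < best_known,
    }


# Made-up scores for three runs on an imaginary instance.
summary = summarize_runs([52, 50, 51], best_known=51)
```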

Due to the memory limit on each GPU thread for the local searches, the DLMCOL algorithm was not run on the biggest instances of the DIMACS/COLOR benchmarks: C2000.5, C2000.9, DSJC1000.9 and wap01-4a.

As can be seen from Tables 2–4, for all the small DIMACS/COLOR, pxx and rxx instances, DLMCOL attains the known optimum or the best result reported in the literature with an almost 100% success rate over 10 runs. However, the computational time required to achieve these results is in general higher, in particular when compared with ILS-TS. This may be explained by the fact that DLMCOL can only obtain a first result after 10 iterations of parallel local search over the whole population.

For the larger instances reported in Table 5, DLMCOL obtains excellent results, reaching the best-known score for 31 out of 49 instances. For 11 of them, DLMCOL even finds new upper bounds that have never been reported before. In particular, we notice important improvements over the best methods of the literature for 3 instances, whose best-known scores are significantly reduced: DSJC500.5 from 707 to 686, flat1000_50_0 from 1184 to 924 and latin_square_10 from 1542 to 1483.

However, DLMCOL does not perform well in general on large graphs with a low density of edges, such as DSJC1000.1, inithx.i.2, inithx.i.3 and wapXXa. For these instances, it is very hard for the neural network to learn a common backbone of good solutions. Indeed, good solutions for these instances are characterized by a low ratio of the number of color groups to the total number of vertices. Since, for the WVCP, only the maximum weight of each color group has an impact on the score, many different groupings of vertices are possible for these instances without changing the score. This results in a very high average distance between the best solutions of the population. Therefore, the neural network fails to learn relevant patterns in this overly diverse population.
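The following small sketch illustrates why many groupings can share the same WVCP score: the score is the sum, over color groups, of the largest vertex weight in each group, so moving lighter vertices between groups leaves it unchanged. The toy graph, weights and function names are made up for this illustration.

```python
def is_legal(groups, edges):
    """Check that no edge has both endpoints in the same color group."""
    color = {v: c for c, group in enumerate(groups) for v in group}
    return all(color[u] != color[v] for u, v in edges)


def wvcp_score(groups, weights):
    """Sum, over non-empty color groups, of the heaviest vertex weight."""
    return sum(max(weights[v] for v in group) for group in groups if group)


# Toy instance: 4 vertices, a single edge between the two heaviest ones.
weights = {0: 5, 1: 3, 2: 2, 3: 1}
edges = [(0, 1)]

# Two different legal partitions that only swap the light vertices 2 and 3.
coloring_a = [[0, 2], [1, 3]]
coloring_b = [[0, 3], [1, 2]]

score_a = wvcp_score(coloring_a, weights)
score_b = wvcp_score(coloring_b, weights)
```

Both colorings are legal and have the same score, although they are different partitions. On a sparse graph with few color groups, the number of such equivalent regroupings grows quickly, which is consistent with the high average distance observed between good solutions.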

For the largest graphs, we notice that the convergence of the algorithm is quite slow. Even after 48 hours, there is still room for further improvement. For the DSJC1000.5, flat1000_60_0 and flat1000_76_0 instances, it is possible to obtain even better upper bounds of 1185, 1162 and 1165 after 138, 98 and 95 hours, respectively.

### 3.5 Impact of the crossover selection strategy driven by deep learning

In this section we aim to analyze the benefits of the neural network based crossover selection strategy (see Section 2.6), which is one of the key components of the proposed DLMCOL framework.

We launched 10 replications of DLMCOL on 4 instances (DSJC500.1, DSJC500.5, le450_25c and le450_25d) with a cutoff time of 20 hours, with and without the neural network crossover selection. In the version without neural network, a second parent is randomly chosen among the nearest neighbors of each individual in order to perform a crossover.
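The two variants compared here can be sketched as follows, assuming a precomputed matrix of pairwise distances between individuals. The function names, the neighborhood size `k`, and the `predicted_score` callable standing in for the trained network are all hypothetical; the guided variant is only one plausible reading of the strategy of Section 2.6, not the actual DLMCOL code.

```python
import random


def nearest_neighbors(distances, i, k):
    """Indices of the k individuals closest to individual i (excluding i)."""
    others = [j for j in range(len(distances)) if j != i]
    return sorted(others, key=lambda j: distances[i][j])[:k]


def random_second_parents(distances, k, rng):
    """Baseline: draw the second parent at random among the k nearest."""
    return [rng.choice(nearest_neighbors(distances, i, k))
            for i in range(len(distances))]


def guided_second_parents(distances, k, predicted_score):
    """Guided variant: keep the neighbor whose crossover is predicted to
    give the lowest score (the predictor is a stand-in for the network)."""
    return [min(nearest_neighbors(distances, i, k),
                key=lambda j: predicted_score(i, j))
            for i in range(len(distances))]


# Made-up symmetric distance matrix for a population of 4 individuals.
distances = [
    [0, 1, 4, 6],
    [1, 0, 2, 5],
    [4, 2, 0, 3],
    [6, 5, 3, 0],
]
baseline = random_second_parents(distances, k=2, rng=random.Random(0))
guided = guided_second_parents(distances, k=2,
                               predicted_score=lambda i, j: j)
```

The baseline needs no training or offspring evaluation, which is why it can complete more generations in the same amount of time.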

The average best score obtained at each generation is displayed in Figure 1. The green curve corresponds to the standard DLMCOL algorithm, while the red curve corresponds to the version without deep learning. One first notices that the version without deep learning can perform more generations in the same amount of time, because no time is spent on neural network training and offspring evaluations.

However, we observe that the green curve is always below the red curve and that better results can be achieved in the same amount of time. This highlights that the neural network helps in selecting promising crossovers for the memetic algorithm.

Figure 1: Impact of the deep learning driven crossover selection strategy on the algorithm. The y-axis corresponds to the WVCP score (average best score over 10 runs at each generation) and the x-axis corresponds to the number of generations.

## 4 Conclusion

A deep learning guided memetic framework for graph coloring problems was presented in this paper, as well as an implementation on GPU devices to solve the typical weighted graph coloring problem. This approach uses the deep set architecture to learn a regression model that is invariant to color permutations, which is used to select the most promising crossovers at each generation. It can take advantage of GPU computations to perform massively parallel local searches on a large population.

The proposed approach was assessed on popular DIMACS and COLOR challenge benchmarks of the weighted graph coloring problem. The computational results show that the algorithm is overall competitive with the best algorithms for this problem. It finds 11 new upper bounds for very difficult instances and significantly improves the previous best results for three graphs. The same framework with the same type of neural network architecture could be applied to solve other graph coloring problems.

The achieved results, however, reveal three main limitations of the proposed approach. First, due to the memory capacity of the GPU devices we used, the DLMCOL algorithm has trouble dealing with very large instances. In particular, for the parallel local searches, the memory available on each GPU thread can be a severe limitation. Secondly, the algorithm can be quite slow to converge in comparison with sequential local search algorithms, due to the large population and the time spent training the neural network at each generation. Thirdly, the algorithm fails on large instances with a low density (sparse graphs): for these instances and for the studied coloring problem, the neural network has trouble learning good patterns to effectively guide the selection of promising crossovers.

Several directions remain for future work. In particular, it could be interesting to use deep learning techniques to learn a specific crossover for this weighted graph coloring problem instead of the classical GPX crossover used in this algorithm. Other neural network structures could also be investigated to overcome the difficulty encountered on sparse graphs.

## Acknowledgment

We would like to thank Dr. Wen Sun for sharing the binary code of their AFISA algorithm [sun2017feasible], Dr. Yiyuan Wang for sharing the source code of their RedLS algorithm [wang2020reduction] and Pr. Bruno Nogueira for sharing the source code of their ILS-TS algorithm [nogueira2021iterated].

This work was granted access to the HPC resources of IDRIS (Grant No. 2020-A0090611887) from GENCI.