1 Introduction
Problems of graph partitioning arise in various areas of computer science, engineering, and related fields, for example in high performance computing [27], community detection in social networks [25], and route planning [4]. The graph partitioning problem is particularly valuable for parallel computing. In this area, graph partitioning is mostly used to partition the underlying graph model of computation and communication. Roughly speaking, vertices in this graph represent computation units and edges denote communication. This graph needs to be partitioned such that there are few edges between the blocks (pieces). In particular, if we want to use k processors we want to partition the graph into k blocks of about equal size.
In this paper we focus on a version of the problem that constrains the maximum block size to (1+ε) times the average block size and tries to minimize the total cut size, i.e., the number of edges that run between blocks. It is well known that this problem is NP-complete [7] and that there is no approximation algorithm with a constant ratio factor for general graphs [7]. Therefore mostly heuristic algorithms are used in practice.
A successful heuristic for partitioning large graphs is the multilevel graph partitioning (MGP) approach depicted in Figure 1, where the graph is recursively contracted to obtain smaller graphs that should reflect the same basic structure as the input graph. After applying an initial partitioning algorithm to the smallest graph, the contraction is undone and, at each level, a local refinement method is used to improve the partitioning induced by the coarser level.
The main focus of this paper is a technique which integrates an evolutionary search algorithm with our multilevel graph partitioner KaFFPa, and its scalable parallelization. We present novel mutation and combine operators which, in contrast to previous methods that use a graph partitioner [28, 11], do not need random perturbations of edge weights. We show in Section 6 that the usage of edge weight perturbations decreases the overall quality of the underlying graph partitioner. The new combine operators enable us to combine individuals of different kinds (see Section 4 for more details). Due to the parallelization, our system is able to compute partitions that have quality comparable to or better than previous entries in Walshaw's well known partitioning benchmark within a few minutes for graphs of moderate size. Previous methods of Soper et al. [28] required runtimes of up to one week for graphs of that size. We therefore believe that, in contrast to previous methods, our method is very valuable in the area of high performance computing.
The paper is organized as follows. We begin in Section 2 by introducing basic concepts. After briefly presenting related work in Section 3, we describe the main evolutionary components in Section 4 and their parallelization in Section 5. A summary of extensive experiments done to tune the algorithm and evaluate its performance is presented in Section 6. A brief outline of the techniques used in the multilevel graph partitioner KaFFPa is provided in Appendix A. We have implemented these techniques in the graph partitioner KaFFPaE (Karlsruhe Fast Flow Partitioner Evolutionary), which is written in C++. Experiments reported in Section 6 indicate that KaFFPaE is able to compute partitions of very high quality and scales well to large networks and machines.
2 Preliminaries
2.1 Basic concepts
Consider an undirected graph G = (V, E) with edge weights ω: E → R_{>0}, node weights c: V → R_{≥0}, n = |V|, and m = |E|. We extend c and ω to sets, i.e., c(V') := Σ_{v∈V'} c(v) and ω(E') := Σ_{e∈E'} ω(e). Γ(v) := {u : {v, u} ∈ E} denotes the neighbors of v. We are looking for blocks of nodes V_1, …, V_k that partition V, i.e., V_1 ∪ … ∪ V_k = V and V_i ∩ V_j = ∅ for i ≠ j. The balancing constraint demands that, for all i, c(V_i) ≤ L_max := (1+ε) c(V)/k + max_{v∈V} c(v) for some parameter ε. The last term in this equation arises because each node is atomic and therefore a deviation of the heaviest node has to be allowed. The objective is to minimize the total cut Σ_{i<j} ω(E_ij), where E_ij := {{u, v} ∈ E : u ∈ V_i, v ∈ V_j}. A clustering is also a partition of the nodes; however, the number of blocks is usually not given in advance and the balance constraint is removed. A vertex v ∈ V_i that has a neighbor w ∈ V_j with i ≠ j is a boundary vertex. An abstract view of the partitioned graph is the so called quotient graph, where vertices represent blocks and edges are induced by connectivity between blocks. Given two clusterings C_1 and C_2, the overlay clustering is the clustering where each block corresponds to a connected component of the graph (V, E \ E'), where E' is the union of the cut edges of C_1 and C_2, i.e. all edges that run between blocks in either C_1 or C_2. By default, our initial inputs will have unit edge and node weights. However, even those will be translated into weighted problems in the course of the algorithm.
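To make these definitions concrete, the following minimal sketch (our own illustration, not code from KaFFPaE; all names are ours) computes the cut value of a partition and the overlay clustering of two clusterings by labeling connected components of the graph with all cut edges removed:

```python
def edge_cut(edges, block):
    # edges: list of (u, v, weight); block: dict vertex -> block id.
    return sum(w for u, v, w in edges if block[u] != block[v])

def overlay(edges, block_a, block_b):
    # Keep only edges cut by neither clustering, then label the
    # connected components of the remaining graph (isolated vertices
    # that appear in no edge are ignored in this sketch).
    verts = set()
    for u, v, _ in edges:
        verts.add(u); verts.add(v)
    adj = {v: [] for v in verts}
    for u, v, _ in edges:
        if block_a[u] == block_a[v] and block_b[u] == block_b[v]:
            adj[u].append(v); adj[v].append(u)
    comp, label = {}, 0
    for s in verts:
        if s in comp:
            continue
        stack = [s]; comp[s] = label
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in comp:
                    comp[w] = label; stack.append(w)
        label += 1
    return comp
```

Each block of the overlay is one component label; two vertices share a label exactly when no cut edge of either clustering separates them.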
A matching M ⊆ E is a set of edges that do not share any common nodes, i.e., the graph (V, M) has maximum degree one. Contracting an edge {u, v} means replacing the nodes u and v by a new node x connected to the former neighbors of u and v. We set c(x) := c(u) + c(v), so the weight of a node at each level is the number of nodes it represents in the original graph. If replacing edges of the form {u, w}, {v, w} would generate two parallel edges {x, w}, we insert a single edge with ω({x, w}) := ω({u, w}) + ω({v, w}). Uncontracting an edge undoes its contraction. In order to avoid tedious notation, G will denote the current state of the graph before and after a (un)contraction unless we explicitly want to refer to different states of the graph. The multilevel approach to graph partitioning consists of three main phases. In the contraction (coarsening) phase, we iteratively identify matchings M ⊆ E and contract the edges in M. Contraction should quickly reduce the size of the input, and each computed level should reflect the global structure of the input network. Contraction is stopped when the graph is small enough to be directly partitioned using some expensive other algorithm. In the refinement (or uncoarsening) phase, the matchings are iteratively uncontracted. After uncontracting a matching, a refinement algorithm moves nodes between blocks in order to improve the cut size or balance.
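The contraction rules above, accumulating node weights and merging parallel edges, can be sketched as follows (a simplified illustration of ours, assuming unit node weights on the input level and disjoint matching pairs):

```python
def contract_matching(n, edges, matching):
    # n vertices 0..n-1 with unit weights; edges: dict (u, v) -> weight
    # with u < v; matching: list of disjoint vertex pairs (u, v), where
    # v is merged into u. Returns coarse node weights and coarse edges.
    rep = list(range(n))
    weight = [1] * n
    for u, v in matching:
        rep[v] = u                 # v is represented by u on the next level
        weight[u] += weight[v]     # c(x) := c(u) + c(v)
    coarse = {}
    for (u, v), w in edges.items():
        cu, cv = rep[u], rep[v]
        if cu == cv:
            continue               # the contracted edge itself disappears
        key = (min(cu, cv), max(cu, cv))
        coarse[key] = coarse.get(key, 0) + w   # merge parallel edges
    weights = {v: weight[v] for v in range(n) if rep[v] == v}
    return weights, coarse
```

One level of lookup suffices for `rep` because matching edges are disjoint; iterating this routine produces the hierarchy of coarser graphs.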
KaFFPa, which we use as a base case partitioner, extended the concept of iterated multilevel algorithms, which was introduced by [29]. The main idea is to iterate the coarsening and uncoarsening phase. Once the graph is partitioned, edges that run between two blocks are not contracted. An F-cycle works as follows: on each level we perform at most two recursive calls using different random seeds during contraction and local search. A second recursive call is only made the second time that the algorithm reaches a particular level. This ensures nondecreasing quality of the partition since our refinement algorithms guarantee no worsening and break ties randomly. These so called global search strategies are more effective than plain restarts of the algorithm. Extending this idea will yield the new combine and mutation operators described in Section 4.
Local search algorithms find good solutions in a very short amount of time but often get stuck in local optima. In contrast, genetic/evolutionary algorithms are good at searching the problem space globally. However, genetic algorithms lack the ability to fine-tune a solution, so that local search algorithms can help to improve the performance of a genetic algorithm. The combination of an evolutionary algorithm with a local search algorithm is called a
hybrid or memetic evolutionary algorithm [20].
3 Related Work
There has been a huge amount of research on graph partitioning, so we refer the reader to [15, 31] for more material on multilevel graph partitioning and to [20] for more material on genetic approaches to graph partitioning. All general purpose methods that are able to obtain good partitions for large real world graphs are based on the multilevel principle outlined in Section 2. Well known software packages based on this approach include Jostle [31], Metis [19], and Scotch [24]. KaFFPa [17] is an MGP algorithm using local improvement algorithms that are based on flows and more localized FM searches. It obtained the best results for many graphs in [28]. Since we use it as a base case partitioner, it is described in more detail in Appendix A. KaSPar [23] is a graph partitioner based on the central idea to (un)contract only a single edge between two levels. KaPPa [17] is a "classical" matching based MGP algorithm designed for scalable parallel execution.
Soper et al. [28] provided the first algorithm that combined an evolutionary search algorithm with a multilevel graph partitioner. Here crossover and mutation operators have been used to compute edge biases, which yield hints for the underlying multilevel graph partitioner. Benlic et al. [5] provided a multilevel memetic algorithm for balanced graph partitioning. This approach is able to compute many entries in Walshaw's Benchmark Archive [28] for the case of perfect balance (ε = 0). PROBE [8] is a metaheuristic which can be viewed as a genetic algorithm without selection. It outperforms other metaheuristics, but it is restricted to the case k = 2 and ε = 0.
Very recently an algorithm called PUNCH [11] has been introduced. This approach is not based on the multilevel principle. However, it creates a coarse version of the graph based on the notion of natural cuts. Natural cuts are relatively sparse cuts close to denser areas. They are discovered by finding minimum cuts between carefully chosen regions of the graph. They introduced an evolutionary algorithm which is similar to that of Soper et al. [28], i.e. using a combine operator that computes edge biases yielding hints for the underlying graph partitioner. Experiments indicate that the algorithm computes very good partitions for road networks. For instances that lack a natural structure of this kind, natural cuts are not very helpful.
4 Evolutionary Components
The general idea behind evolutionary algorithms (EA) is to use mechanisms which are highly inspired by biological evolution, such as selection, mutation, recombination, and survival of the fittest. An EA starts with a population of individuals (in our case partitions of the graph) and evolves the population over several rounds. In each round, the EA uses a selection rule based on the fitness of the individuals (in our case the edge cut) to select good individuals and combine them to obtain improved offspring [16]. Note that we can use the cut as a fitness function since our partitioner almost always generates partitions that are within the given balance constraint, i.e. there is no need to use a penalty function or something similar to ensure that the final partitions generated by our algorithm are feasible. When an offspring is generated, an eviction rule is used to select a member of the population and replace it with the new offspring. In general one has to take both into consideration: the fitness of an individual and the distance between individuals in the population [2]. Our algorithm generates only one offspring per generation. Such an evolutionary algorithm is called steady-state [9]. A typical structure of an evolutionary algorithm is depicted in Algorithm 1.
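The steady-state loop just described can be sketched generically as follows (an illustration of ours, not the paper's Algorithm 1; in KaFFPaE the `create`, `combine`, and `mutate` callbacks would be KaFFPa invocations, and the mutation probability is a tuning parameter):

```python
import random

def steady_state_ea(create, combine, mutate, fitness, evict,
                    pop_size, rounds, mutation_prob=0.1,
                    rng=random.Random(0)):
    # Steady-state EA skeleton: exactly one offspring per generation,
    # produced either by mutation or by combining two tournament winners,
    # then inserted via an eviction rule.
    pop = [create(rng) for _ in range(pop_size)]
    for _ in range(rounds):
        if rng.random() < mutation_prob:
            child = mutate(rng.choice(pop), rng)
        else:
            a = min(rng.sample(pop, 2), key=fitness)   # tournament select
            b = min(rng.sample(pop, 2), key=fitness)
            child = combine(a, b)
        pop[evict(pop, child)] = child                 # replace one member
    return min(pop, key=fitness)
```

The `evict` callback decides which member to replace, which is where the fitness/diversity trade-off mentioned above enters.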
For an evolutionary algorithm it is of major importance to keep the diversity in the population high [2], i.e. the individuals should not become too similar, in order to avoid a premature convergence of the algorithm. In other words, to avoid getting stuck in local optima, a procedure is needed that randomly perturbs the individuals. In classical evolutionary algorithms, this is done using a mutation operator. It is also important to have operators that introduce unexplored search space to the population. Through new kinds of crossover and mutation operators, introduced in Section 4.1, we obtain more elaborate diversification strategies which allow us to explore the search space more effectively.
Interestingly, Inayoshi et al. [18] noticed that good local solutions of the graph partitioning problem tend to be close to one another. Boese et al. [6] showed that the quality of the local optima overall decreases as the distance from the global optimum increases. We will see in the following that our combine operators can exchange good parts of solutions quite effectively especially if they have a small distance.
4.1 Combine Operators
We now describe the general combine operator framework, followed by three instantiations of it. In contrast to previous methods that use a multilevel framework, our combine operators do not need perturbations of edge weights since we integrate the operators into our partitioner and do not use it as a complete black box.
Furthermore, all of our combine operators assure that the offspring has a partition quality at least as good as the best of both parents. Roughly speaking, the combine operator framework combines an individual/partition P (which has to fulfill a balance constraint) with a clustering C. Note that
the clustering C does not necessarily have to fulfill a balance constraint and is not necessarily given in advance. All instantiations of this framework use a different kind of clustering or partition. The partition P and the clustering C are both used as input for our multilevel graph partitioner KaFFPa in the following sense. Let E' be the set of edges that are cut edges, i.e. edges that run between two blocks, in either P or C. All edges in E' are blocked during the coarsening phase, i.e. they are not contracted. In other words, these edges are not eligible for the matching algorithm used during the coarsening phase and therefore are not part of any matching computed. An illustration of this can be found in Figure 2.
The stopping criterion for the multilevel partitioner is modified such that it stops when no contractable edge is left. Note that the coarsest graph is now exactly the same as the quotient graph of the overlay clustering of P and C (see Figure 3). Hence vertices of the coarsest graph correspond to the connected components of the overlay, and the weight of the edges between vertices corresponds to the sum of the edge weights running between those connected components in the input graph.
As soon as the coarsening phase is stopped, we apply the partition P to the coarsest graph and use it as initial partitioning. This is possible since we did not contract any cut edge of P. Note that due to the specialized coarsening phase and this specialized initial partitioning, we obtain a high quality initial solution on a very coarse graph which is usually not discovered by conventional partitioning algorithms. Since our refinement algorithms guarantee no worsening of the input partition and use random tie breaking, we can assure nondecreasing partition quality. Note that the refinement algorithms can effectively exchange good parts of the solution on the coarse levels by moving only a few vertices. Figure 3 gives an example.
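The edge-blocking rule at the heart of the framework can be stated compactly (a sketch of ours, not KaFFPa code): an edge may be matched only if neither the partition nor the clustering cuts it.

```python
def blocked_edges(edges, part_p, clust_c):
    # edges: list of (u, v); part_p / clust_c: dict vertex -> block id.
    # Edges cut in either P or C are blocked during coarsening; only the
    # remaining edges are eligible for the matching algorithm.
    blocked, eligible = set(), set()
    for u, v in edges:
        if part_p[u] != part_p[v] or clust_c[u] != clust_c[v]:
            blocked.add((u, v))
        else:
            eligible.add((u, v))
    return blocked, eligible
```

Contracting only eligible edges until none remain yields exactly the quotient graph of the overlay clustering described above.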
Also note that this combine operator can be extended to a multi-point combine operator, i.e. the operator would use more than two parents. However, during the course of the algorithm a sequence of two-point combine steps is executed, which somewhat "emulates" a multi-point combine step. Therefore, we restrict ourselves to two parents. When the offspring is generated, we have to decide which solution should be evicted from the current population. We evict the solution that is most similar to the offspring among those individuals in the population that have a cut worse than or equal to that of the offspring itself. The difference of two individuals is defined as the size of the symmetric difference between their sets of cut edges. This ensures some diversity in the population and hence makes the evolutionary algorithm more effective.
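The eviction rule can be sketched directly from this definition (our own illustration; `cut` and `cut_edges` stand in for whatever accessors an implementation provides):

```python
def eviction_candidate(population, offspring, cut, cut_edges):
    # Among individuals whose cut is no better than the offspring's, pick
    # the one whose cut-edge set has the smallest symmetric difference to
    # the offspring's cut-edge set (i.e. the most similar individual).
    off_cut = cut(offspring)
    off_edges = cut_edges(offspring)
    candidates = [i for i, p in enumerate(population) if cut(p) >= off_cut]
    if not candidates:
        return None    # offspring is worse than every member: discard it
    return min(candidates,
               key=lambda i: len(cut_edges(population[i]) ^ off_edges))
```

Evicting the most similar no-better individual removes redundancy first, which is what keeps the population diverse.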
4.1.1 Classical Combine using Tournament Selection
This instantiation of the combine framework corresponds to a classical evolutionary combine operator. That means it takes two individuals of the population and performs the combine step described above. In this case the partition P corresponds to the parent having the smaller cut and the clustering C corresponds to the parent having the larger cut. Random tie breaking is used if both parents have the same cut. The selection process is based on the tournament selection rule [22], i.e. P is the fitter of two random individuals from the population. The same is done to select C. Note that in contrast to previous methods, the generated offspring will have a cut smaller than or equal to the cut of P. Due to the fact that our multilevel algorithms are randomized, a combine operation performed twice using the same parents can yield different offspring.
4.1.2 Cross Combine / (Transduction)
In this instantiation of the combine framework, the clustering C corresponds to a partition of the graph computed with different parameters. Instead of choosing an individual from the population, we create a new individual in the following way: we choose a number of blocks k' uniformly at random from a range around k and an imbalance ε' uniformly at random from a range above ε, and then use KaFFPa to create a k'-partition fulfilling the balance constraint given by ε'. In general, larger imbalances reduce the cut of a partition, which then yields good clusterings for our crossover. To the best of our knowledge there has been no genetic algorithm that performs combine operations combining individuals from different search spaces.
4.1.3 Natural Cuts
Delling et al. [11] introduced the notion of natural cuts as a preprocessing technique for the partitioning of road networks. The preprocessing technique is able to find relatively sparse cuts close to denser areas. We use the computation of natural cuts to provide another combine operator, i.e. combining a partition P with a clustering C generated by the computation of natural cuts. We closely follow their description: the computation of natural cuts works in rounds. Each round picks a center vertex v and grows a breadth-first search (BFS) tree T. The BFS is stopped as soon as the weight of the tree, i.e. the sum of the vertex weights of the tree, reaches αU, for some parameters α and U. The set of the neighbors of T in V \ T is called the ring of v. The core of v is the union of all vertices added to T before its size reached U/f, where f is another parameter.
The core is then temporarily contracted to a single vertex s and the ring into a single vertex t to compute the minimum s-t cut between them, using the given edge weights as capacities.
To assure that every vertex eventually belongs to at least one core, and therefore is inside at least one cut, the center vertices are picked uniformly at random among all vertices that have not yet been part of any core in any round. The process is stopped when no such vertices are left.
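The tree/core/ring construction of one round can be sketched as follows (our own illustration of the weight-limited BFS; the parameter names `alpha_u` and `core_limit` stand for the paper's αU and U/f):

```python
from collections import deque

def ring_and_core(adj, weight, center, alpha_u, core_limit):
    # Grow a BFS tree from `center` until its total vertex weight reaches
    # alpha_u; the core is what was added while the tree weight was still
    # below core_limit; the ring is the outside neighborhood of the tree.
    tree, core = [], []
    seen = {center}
    q = deque([center])
    total = 0
    while q and total < alpha_u:
        v = q.popleft()
        if total < core_limit:
            core.append(v)          # added before the core size limit
        tree.append(v)
        total += weight[v]
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                q.append(w)
    tree_set = set(tree)
    ring = {w for v in tree for w in adj[v] if w not in tree_set}
    return tree, core, ring
```

A minimum cut between the contracted core and the contracted ring then yields one natural cut, as described above.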
In the original work [11], each connected component of the graph (V, E \ C), where C is the union of all edges cut by the process above, is contracted to a single vertex. Since we do not use natural cuts as a preprocessing technique here, we do not contract these components. Instead we build a clustering of the graph such that each connected component of (V, E \ C) is a block.
This technique yields the third instantiation of the combine framework, which is divided into two stages, i.e. the clustering used for this combine step depends on the stage we are currently in. In both stages the partition P used for the combine step is selected from the population using tournament selection. During the first stage, we choose the parameters α, f, and U of the natural cut computation uniformly at random from fixed ranges. Using these parameters we obtain a clustering C of the graph which is then used in the combine framework described above. This kind of clustering is used until we reach an upper bound of ten calls to this combine step. When the upper bound is reached, we switch to the second stage. In this stage we reuse the clusterings computed during the first stage, i.e. we extract elementary natural cuts and use them to quickly compute new clusterings. An elementary natural cut (ENC) consists of a set of cut edges and the set of nodes in its core. Moreover, for each node v in the graph, we store the set of ENCs that contain v in their core. With these data structures it is easy to pick a new clustering (see Algorithm 2), which is then used in the combine framework described above.
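One plausible way to pick a new clustering from stored ENCs, in the spirit of the second stage (a rough sketch of ours, not the paper's Algorithm 2): repeatedly take an uncovered vertex, choose at random an ENC containing it in its core, and accept that ENC's cut edges.

```python
import random

def pick_clustering(n, encs, rng=random.Random(0)):
    # encs: list of (cut_edge_set, core_vertex_set) pairs. Returns the
    # union of cut-edge sets of the chosen ENCs; the blocks of the new
    # clustering are the components left after removing these edges.
    enc_of = [[] for _ in range(n)]          # v -> ENCs with v in core
    for idx, (_, core) in enumerate(encs):
        for v in core:
            enc_of[v].append(idx)
    covered = set()
    chosen_cut_edges = set()
    order = list(range(n))
    rng.shuffle(order)
    for v in order:
        if v in covered or not enc_of[v]:
            continue
        cut, core = encs[rng.choice(enc_of[v])]
        chosen_cut_edges |= cut
        covered |= core
    return chosen_cut_edges
```

Because the per-vertex ENC lists are precomputed, each new clustering costs only a pass over the vertices, which is what makes the second stage fast.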
4.2 Mutation Operators
We define two mutation operators, an ordinary and a modified F-cycle. Both mutation operators use a random individual from the current population. The main idea is to iterate coarsening and refinement several times using different seeds for random tie breaking. The first mutation operator can assure that the quality of the input partition does not decrease. It is basically an ordinary F-cycle, an algorithm used in KaFFPa. Edges between blocks are not contracted, and the given partition is used as initial partition of the coarsest graph. In contrast to KaFFPa, the partition is already available at the very beginning instead of being computed on the coarsest level. This ensures nondecreasing quality since our refinement algorithms guarantee no worsening. The second mutation operator works quite similarly, with the small difference that the input partition is not used as initial partition of the coarsest graph. That means we obtain very good coarse graphs, but we cannot assure that the final individual has a higher quality than the input individual. In both cases the resulting offspring is inserted into the population using the eviction strategy described in Section 4.1.
5 Putting Things Together and Parallelization
We now explain the parallelization and describe how everything is put together. Each processing element (PE) basically performs the same operations using different random seeds (see Algorithm 3). First we estimate the population size S: each PE performs a partitioning step and measures the time spent for partitioning. We then choose S such that the time for creating S partitions is approximately t/f, where the fraction f is a tuning parameter and t is the total running time that the algorithm is given to produce a partition of the graph. Each PE then builds its own population, i.e. KaFFPa is called several times to create individuals/partitions. Afterwards the algorithm proceeds in rounds as long as time is left. With corresponding probabilities, mutation or combine operations are performed and the new offspring is inserted into the population.
We choose a parallelization/communication protocol that is quite similar to randomized rumor spreading [12]. Let p denote the number of PEs used. A communication step is organized in rounds. In each round, a PE chooses a communication partner and sends it the currently best partition P_best of the local population. The communication partner is selected uniformly at random among those PEs to which P_best has not already been sent. Afterwards, a PE checks if there are incoming individuals and, if so, inserts them into the local population using the eviction strategy described above. If P_best is improved, all PEs become eligible again. This is repeated a fixed number of times. Note that the algorithm is implemented completely asynchronously, i.e. there is no need for a global synchronization. The process of creating S individuals is parallelized as follows: each PE makes roughly S/p calls to KaFFPa using different seeds to create individuals. Afterwards we repeat the following several times: the root PE computes a random cyclic permutation of all PEs and broadcasts it to all PEs. Each PE then sends a random individual to its successor in the cyclic permutation and receives an individual from its predecessor. We call this particular part of the algorithm quick start.
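One local communication step of this rumor-spreading-style protocol can be sketched as follows (our own simplified illustration; the real system sends asynchronously via MPI, and `sent_to` would be cleared whenever the local best partition improves):

```python
import random

def spread_best(pe_id, num_pes, best, sent_to, rng, send):
    # Send the locally best individual to one random PE that has not yet
    # received it; returns False when every other PE has been served.
    targets = [p for p in range(num_pes)
               if p != pe_id and p not in sent_to]
    if not targets:
        return False
    partner = rng.choice(targets)
    send(partner, best)          # transport layer is abstracted away here
    sent_to.add(partner)
    return True
```

Resetting `sent_to` on improvement is what makes every PE eligible again, so improved partitions keep propagating.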
The ratio of mutation to crossover operations yields a tuning parameter. As we will see in Section 6, a small mutation rate is a good choice. After some experiments we also fixed the ratio between the two mutation operators and the ratio between the combine operators.
Note that the communication step in the last line of the algorithm could also be performed only every x-th iteration (where x is a tuning parameter) to save communication time. Since the communication network of our test system is very fast (see Section 6), we perform the communication step in each iteration.
6 Experiments
Implementation.
We have implemented the algorithm described above using C++. Overall, our program (including KaFFPa) consists of about 22 500 lines of code. We use two base case partitioners, KaFFPaStrong and KaFFPaEco. KaFFPaEco is a good tradeoff between quality and speed, while KaFFPaStrong is focused on quality. For the following comparisons we used Scotch 5.1.9 and kMetis 5.0 (pre2).
System.
Experiments have been done on two machines. Machine A is a cluster with 200 nodes, where each node is equipped with two Quad-core Intel Xeon processors (X5355) running at a clock speed of 2.667 GHz. Each node has 2x4 MB of level 2 cache and runs Suse Linux Enterprise 10 SP 1. All nodes are attached to an InfiniBand 4X DDR interconnect, which is characterized by its very low latency of below 2 microseconds and a point-to-point bandwidth between two nodes of more than 1300 MB/s. Machine B has two Intel Xeon X5550 processors and 48 GB RAM, running Ubuntu 10.04. Each CPU has 4 cores (8 cores when hyperthreading is active) running at 2.67 GHz. Experiments in Sections 6.1, 6.2, 6.3 and 6.5 have been conducted on machine A, and experiments in Sections 6.4 and 6.6 have been conducted on machine B. All programs were compiled using GCC version 4.4.3 with optimization level 3 and OpenMPI 1.5.3. Henceforth, a PE is one core.
Instances.
We report experiments on three suites of instances (small, medium sized, and road networks) summarized in Appendix C. A random geometric graph rggX has 2^X nodes that represent random points in the unit square; edges connect nodes whose Euclidean distance is below a threshold chosen to ensure that the graph is almost connected. A graph delaunayX is the Delaunay triangulation of 2^X random points in the unit square. Several graphs come from Walshaw's benchmark archive [30]. Further graphs are undirected versions of the road networks used in [10], and one road network is taken from [3]. Our default numbers of partitions are 2, 4, 8, 16, 32, 64, since they are the default values in [30], and in some cases we additionally use 128 and 256. Our default value for the allowed imbalance is 3%, since this is one of the values used in [30] and the default value in Metis. Our default number of PEs is 16.
Methodology.
We mostly present two kinds of data: average values and plots that show the evolution of solution quality (convergence plots). In both cases we perform multiple repetitions; the number of repetitions depends on the test that we perform. Average values over multiple instances are obtained as follows: for each instance (graph, k), we compute the average edge cut over the repetitions, and then take the geometric mean of these values over all instances. We now explain how we compute the convergence plots, starting with a single instance I
: whenever a PE creates a partition, it reports a pair (t, cut), where the timestamp t is the currently elapsed time on the particular PE and cut refers to the cut of the partition that has been created. When performing multiple repetitions we report average values (t, avgcut) instead. After the completion of KaFFPaE we are left with p sequences of pairs (t, cut), which we merge into one sequence sorted by the timestamp t. The resulting sequence is called T_I. Since we are interested in the evolution of the solution quality, we compute another sequence G_min(I): for each entry (t, cut) (in sorted order) in T_I, we insert the entry (t, mincut(t)) into G_min(I), where mincut(t) is the minimum cut that occurred until time t. N_min(I) refers to the normalized sequence, i.e. each entry (t, cut) in G_min(I) is replaced by (t_n, cut), where t_n = t / t_I and t_I is the average time that KaFFPa needs to compute a partition for the instance I. To obtain average values over multiple instances, we do the following: for each instance I we label all entries in N_min(I), i.e. (t_n, cut) is replaced by (t_n, cut, I). We then merge all sequences N_min(I) and sort by t_n. The resulting sequence is called S. The final sequence G presents event based geometric average values. We start by computing the geometric mean cut value using the first value of each N_min(I) (over all I). To obtain G we basically sweep through S: for each entry (t_n, c, I) (in sorted order) in S, we update the current geometric mean, i.e. the cut value of instance I that took part in the computation of the mean is replaced by the new value c, and we insert the updated mean into G. Note that c can only be smaller than or equal to the old cut value of I.
6.1 Parameter Tuning
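The convergence plots used throughout the experiments rest on the running-minimum bookkeeping just described, which can be sketched in a few lines (our own illustration, not KaFFPaE code):

```python
def min_so_far(events):
    # events: (timestamp, cut) pairs merged from all PEs, in any order.
    # Produces the sequence G_min: at each timestamp, the best cut seen
    # up to and including that time.
    out, best = [], float('inf')
    for t, cut in sorted(events):
        best = min(best, cut)
        out.append((t, best))
    return out
```

Normalizing the timestamps and taking event-based geometric means across instances, as described above, is then a straightforward extension of this sequence.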
We now tune the fraction parameter f and the ratio between mutation and crossover operations. For the parameter tuning we choose our small testset because runtimes for a single graph partitioner call are not too large. To save runtime we focus on a single value of k for tuning the parameters. For each instance we gave KaFFPaE ten minutes of time and 16 PEs to compute a partition. During this test the quick start option is disabled.
For tuning f, the mutation-to-crossover ratio is held fixed. In Figure 5 we can see that the algorithm is not too sensitive to the exact choice of this parameter. However, larger values of f speed up the convergence rate and improve the result achieved in the end, so we choose one of the largest tested values of f as our default. For tuning the ratio of mutation and crossover operations, we set f to ten. We can see that for smaller mutation rates the algorithm is not too sensitive to the exact choice of the parameter. However, if the mutation rate becomes too large, the convergence speed slows down, which yields worse average results in the end. We choose the ratio that has a slight advantage in the end. The parameter tuning uses KaFFPaStrong as a partitioner. We also performed the parameter tuning using KaFFPaEco as a partitioner (see Appendix B.1).
6.2 Scalability
In this section we study the scalability of our algorithm. We do the following to obtain a fair comparison: each configuration gets the same overall amount of work, i.e. when doubling the number of PEs used, we halve the time that KaFFPaE has to compute a partition per instance. To be more precise, when we use one PE, KaFFPaE has time t to compute a partition of an instance; when KaFFPaE uses p PEs, it gets time t/p. For all the following tests the quick start option is enabled. To save runtime we use our small sized testset and fix k to 64. Here we perform five repetitions per instance. We can see in Figure 6 that using more processors speeds up convergence and, up to a certain number of PEs, also improves the quality in the end (in these cases the speedups are optimal in the end). This might be due to island effects [1]. For the largest number of PEs, results are worse than with fewer PEs. This is because the algorithm is barely able to perform combine and mutation steps due to the very small amount of time given to KaFFPaE (60 seconds). On the largest graph of the testset (delaunay16) we need about 20 seconds to create a partition into 64 blocks.
We now define the pseudo speedup, a measure for speedup at a particular normalized time t_n of the configuration using one PE. Let C_p(t) be the mean minimum cut that KaFFPaE has computed using p PEs until normalized time t. The pseudo speedup is then defined as S_p(t_n) := t_n / t'_n, where t'_n := min { t : C_p(t) ≤ C_1(t_n) }. If C_p(t) > C_1(t_n) for all t, we set S_p(t_n) := 0 (in this case the parallel algorithm is not able to compute the result achieved by the sequential algorithm at normalized time t_n; this only happens for the largest number of PEs). We can see in Figure 6 that after a short amount of time we reach super linear pseudo speedups in most cases.
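This definition translates directly into code over the convergence curves (a sketch of ours; the curves are sorted (time, min-cut-so-far) sequences as computed in Section 6's methodology):

```python
def pseudo_speedup(seq1, seqp, t):
    # seq1 / seqp: sorted (time, min-cut) curves for 1 PE and p PEs.
    # The pseudo speedup at time t is t / t', where t' is the earliest
    # time at which the p-PE run matches the 1-PE cut reached at time t.
    def cut_at(seq, tt):
        best = float('inf')
        for s, c in seq:
            if s <= tt:
                best = c          # curves are non-increasing in cut
            else:
                break
        return best
    target = cut_at(seq1, t)
    for s, c in seqp:
        if c <= target:
            return t / s
    return 0.0   # parallel run never reaches the sequential quality
```

A value above p at time t corresponds to the super linear pseudo speedups visible in Figure 6.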
6.3 Comparison with KaFFPa and other Systems
k / Algo. | Reps. (Avg. cut) | KaFFPaE (impr. %)
2         |    569           | 0.2%
4         |  1 229           | 1.0%
8         |  2 206           | 1.5%
16        |  3 568           | 2.7%
32        |  5 481           | 3.4%
64        |  8 141           | 3.3%
128       | 11 937           | 3.9%
256       | 17 262           | 3.7%
overall   |  3 872           | 2.5%
In this section we compare ourselves with repeated executions of KaFFPa and with other systems. We switch to our middle sized testset to avoid the effect of overtuning our algorithm parameters to the instances used for calibration. We use 16 PEs and two hours of time per instance when we use KaFFPaE. We parallelized repeated executions of KaFFPa (embarrassingly parallel, different seeds) and also gave 16 PEs and two hours to KaFFPa. We performed three repetitions per instance. Figure 7 shows convergence plots for selected values of k; all convergence plots can be found in Appendix B.2. As expected, the improvements of KaFFPaE relative to repeated executions of KaFFPa increase with increasing k. The largest improvement is obtained for k = 128: here KaFFPaE produces partitions that have a 3.9% smaller cut value than plain restarts of the algorithm. Note that using a weaker base case partitioner, e.g. KaFFPaEco, increases this value; on the small sized testset we obtained an improvement of 5.9% compared to plain restarts of KaFFPaEco. Tables comparing KaFFPaE with the best results out of ten repetitions of Scotch and Metis can be found in Appendix Table 4. Overall, Scotch and Metis produce 19% and 28% larger (best) cuts than KaFFPaE, respectively. However, these methods are much faster than ours (Appendix Table 4).
6.4 Combine Operator Experiments
k  S3R Avg.  K3R  KC  SC  (improvement %)
2  591  2.4  1.6  0.2 
4  1 304  3.4  4.0  0.2 
8  2 336  3.7  3.6  0.2 
16  3 723  2.9  2.0  0.2 
32  5 720  2.7  3.3  0.0 
64  8 463  2.8  3.0  0.6 
128  12 435  3.6  4.5  0.0 
256  17 915  3.4  4.2  0.1 
We now look into the effectiveness of our combine operator. We conduct the following experiment: we compare the best result of three repeated executions of KaFFPa (K3R) against a combine step (KC), i.e., after creating two partitions we report the result of the combine step combining both individuals. The same is done using the combine operator of Soper et al. [28] (SC), i.e., we create two individuals using perturbed edge weights as in [28] and report the cut produced by the combine step proposed there (the best out of the three individuals). We also present the best results out of three repetitions when using perturbed edge weights as in Soper et al. (S3R). Since our partitioner does not support edge weights of type double, we computed the perturbations and scaled them by a factor of 10 000 (for S3R and SC). We performed ten repetitions on the medium-sized testset. Results are reported in Table 2; a table presenting absolute average values and comparing the runtimes of these algorithms can be found in Appendix Table 5. We can see that for large k our new combine operator yields improved partition quality in comparable or less time (KC vs. K3R). Most importantly, we can see that edge biases decrease solution quality (K3R vs. S3R). This is due to the fact that edge biases make edge cuts optimal that are not close to optimal in the unbiased problem. For example, on 2D grid graphs straight edge cuts are optimal; random edge biases make bent edge cuts optimal, but these are not close to optimal cuts of the original graph partitioning problem. Moreover, our local search algorithms (flow-based, FM-based) work better if there are many equally sized cuts.
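The integer scaling of the perturbations can be illustrated with a small sketch. Note that the perturbation formula below is only a stand-in (plain uniform noise; Soper et al. bias weights using previously found cuts), and `perturbed_weights` is our hypothetical name:

```python
import random

SCALE = 10_000  # scaling factor used because the partitioner only
                # supports integral edge weights

def perturbed_weights(edges, max_bias=0.1, seed=0):
    """Return integer edge weights with a small random bias added.

    edges: dict mapping an edge to its original (unit) weight. The
    uniform noise here is a placeholder for the bias scheme of [28].
    """
    rng = random.Random(seed)
    return {e: int(SCALE * (w + rng.uniform(0.0, max_bias)))
            for e, w in edges.items()}
```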
6.5 Walshaw Benchmark
We now apply KaFFPaE to Walshaw's benchmark archive [28] using the rules used there, i.e., running time is not an issue, but we want to achieve minimal cut values for each number of blocks k and each balance parameter. We focus on the balance parameters 1%, 3% and 5% since KaFFPaE (more precisely KaFFPa) is not made for the perfectly balanced case. We run KaFFPaE with a time limit of two hours using 16 PEs (two nodes of the cluster) per graph and report the best results obtained in Appendix D. KaFFPaE computed 300 partitions which are better than the previous best partitions reported there: 91 for 1%, 103 for 3% and 106 for 5% allowed imbalance. Moreover, it reproduced equally sized cuts in 170 of the 312 remaining cases. When only considering the 15 largest graphs, we are able to reproduce or improve the current result in 224 out of 240 cases. Overall, our systems (including KaPPa, KaSPar, KaFFPa and KaFFPaE) have now improved or reproduced the entries in 550 out of 612 cases (for these three balance parameters).
6.6 Comparison with PUNCH
grp. / k, algorithm/runtime (P = PUNCH, B = Buffoon)
ger.  P  B  B
2  164  83  161  6  161 
4  400  96  394  6  393 
8  711  102  694  9  693 
16  1 144  83  1 148  16  1 137 
32  1 960  71  1 928  31  1 898 
64  3 165  83  3 164  62  3 143 
eur.  P  B  B  
2  129  423  149  39  129
4  309  358  313  39  310 
8  634  293  693  47  659 
16  1 293  252  1 261  73  1 238 
32  2 289  217  2 259  130  2 240 
64  3 828  241  3 856  248  3 825 
In this section we focus on finding partitions of road networks. We implemented a specialized algorithm, Buffoon, which is similar to PUNCH [11] in the sense that it also uses natural cuts as a preprocessing technique to obtain a coarser graph on which the graph partitioning problem is solved; for more information on natural cuts, we refer the reader to [11]. Using our (shared-memory) parallelized version of the natural cut preprocessing we obtain a coarse version of the graph. Note that our preprocessing uses slightly different parameters than PUNCH (in the notation of [11]). Since partitions of the coarse graph correspond to partitions of the original graph, we use KaFFPaE to partition the coarse version of the graph.
After preprocessing, KaFFPaE was given a fixed time budget to compute a partition of europe and of germany. In both cases we used all 16 cores (hyperthreading active) of machine B for preprocessing and for KaFFPaE. The experiments were repeated ten times. A summary of the results is shown in Table 3. Interestingly, on germany already our average values are smaller than or equal to the best results out of 100 repetitions obtained by PUNCH. Overall, in 9 out of 12 cases we compute a best cut that is better than or equal to the best cut obtained by PUNCH. Note that for obtaining the best cut values we invest significantly more time than PUNCH; however, their machine is about a factor of two faster (12 cores running at 3.33 GHz compared to 8 cores running at 2.67 GHz) and our algorithm is not tuned for road networks. A table comparing the results on road networks against KaFFPa, KaSPar, Scotch and Metis can be found in Appendix Table 6. These algorithms produce 9%, 12%, 93% and 288% larger cuts on average, respectively.
7 Conclusion and Future Work
KaFFPaE is a distributed evolutionary algorithm to tackle the graph partitioning problem. Due to new crossover and mutation operators as well as its scalable parallelization, it is able to compute the best known partitions for many standard benchmark instances in only a few minutes. We therefore believe that KaFFPaE is very valuable in the area of high performance computing.
Regarding future work, we want to integrate other partitioners, provided they implement the possibility to block edges during the coarsening phase and to use a given partition as the initial solution. It would also be interesting to try domain-specific combine operators; on social networks, for example, one could use a modularity clusterer to compute a clustering for the combine operation.
References

[1] Enrique Alba and Marco Tomassini. Parallelism and evolutionary algorithms. IEEE Trans. Evolutionary Computation, 6(5):443–462, 2002.
[2] Thomas Bäck. Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. PhD thesis, 1996.
[3] David Bader, Henning Meyerhenke, Peter Sanders, and Dorothea Wagner. 10th DIMACS Implementation Challenge – Graph Partitioning and Graph Clustering, http://www.cc.gatech.edu/dimacs10/.
[4] Reinhard Bauer, Daniel Delling, Peter Sanders, Dennis Schieferdecker, Dominik Schultes, and Dorothea Wagner. Combining hierarchical and goal-directed speed-up techniques for Dijkstra's algorithm. ACM Journal of Experimental Algorithmics, 15, 2010.

[5] Una Benlic and Jin-Kao Hao. A multilevel memetic approach for improving graph partitions. In 22nd Intl. Conf. Tools with Artificial Intelligence, pages 121–128, 2010.
[6] K.D. Boese, A.B. Kahng, and S. Muddu. A new adaptive multi-start technique for combinatorial global optimizations. Operations Research Letters, 16(2):101–113, 1994.
[7] Thang Nguyen Bui and Curt Jones. Finding good approximate vertex and edge partitions is NP-hard. Inf. Process. Lett., 42(3):153–159, 1992.
[8] Pierre Chardaire, Musbah Barake, and Geoff P. McKeown. A probe-based heuristic for graph partitioning. IEEE Trans. Computers, 56(12):1707–1720, 2007.
 [9] Kenneth Alan De Jong. Evolutionary computation : a unified approach. MIT Press, 2006.
[10] D. Delling, P. Sanders, D. Schultes, and D. Wagner. Engineering route planning algorithms. In Algorithmics of Large and Complex Networks, volume 5515 of LNCS State-of-the-Art Survey, pages 117–139. Springer, 2009.
 [11] Daniel Delling, Andrew V. Goldberg, Ilya Razenshteyn, and Renato F. Werneck. Graph Partitioning with Natural Cuts. In 25th International Parallel and Distributed Processing Symposium (IPDPS’11). IEEE Computer Society, 2011.
 [12] Benjamin Doerr and Mahmoud Fouz. Asymptotically optimal randomized rumor spreading. In ICALP (2), volume 6756 of Lecture Notes in Computer Science, pages 502–513. Springer, 2011.
 [13] D. Drake and S. Hougardy. A simple approximation algorithm for the weighted matching problem. Information Processing Letters, 85:211–213, 2003.
[14] C. M. Fiduccia and R. M. Mattheyses. A Linear-Time Heuristic for Improving Network Partitions. In 19th Conference on Design Automation, pages 175–181, 1982.
 [15] P.O. Fjallstrom. Algorithms for graph partitioning: A survey. Linkoping Electronic Articles in Computer and Information Science, 3(10), 1998.

[16] David E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.
[17] M. Holtgrewe, P. Sanders, and C. Schulz. Engineering a Scalable High Quality Graph Partitioner. In 24th IEEE International Parallel and Distributed Processing Symposium, 2010.
[18] Hiroaki Inayoshi and Bernard Manderick. The weighted graph bi-partitioning problem: A look at GA performance. In PPSN, volume 866 of Lecture Notes in Computer Science, pages 617–625. Springer, 1994.
[19] G. Karypis, V. Kumar, Army High Performance Computing Research Center, and University of Minnesota. Parallel multilevel k-way partitioning scheme for irregular graphs. SIAM Review, 41(2):278–300, 1999.
 [20] Jin Kim, Inwook Hwang, YongHyuk Kim, and Byung Ro Moon. Genetic approaches for graph partitioning: a survey. In GECCO, pages 473–480. ACM, 2011.
 [21] J. Maue and P. Sanders. Engineering algorithms for approximate weighted matching. In 6th Workshop on Exp. Algorithms (WEA), volume 4525 of LNCS, pages 242–255. Springer, 2007.
 [22] Brad L. Miller and David E. Goldberg. Genetic algorithms, tournament selection, and the effects of noise. Complex Systems, 9:193–212, 1995.
[23] V. Osipov and P. Sanders. n-Level Graph Partitioning. In 18th European Symposium on Algorithms (see also arXiv preprint arXiv:1004.4024), 2010.
 [24] F. Pellegrini. Scotch home page. http://www.labri.fr/pelegrin/scotch.
 [25] Josep M. Pujol, Vijay Erramilli, and Pablo Rodriguez. Divide and conquer: Partitioning online social networks. CoRR, abs/0905.4918, 2009.
[26] P. Sanders and C. Schulz. Engineering Multilevel Graph Partitioning Algorithms. In 19th European Symposium on Algorithms (see also arXiv preprint arXiv:1012.0006v3), 2011.
 [27] K. Schloegel, G. Karypis, and V. Kumar. Graph Partitioning for High Performance Scientific Simulations. UMSI research report/University of Minnesota (Minneapolis, Mn). Supercomputer institute, page 38, 2000.
[28] A.J. Soper, C. Walshaw, and M. Cross. A combined evolutionary search and multilevel optimisation approach to graph-partitioning. Journal of Global Optimization, 29(2):225–241, 2004.
 [29] C. Walshaw. Multilevel refinement for combinatorial optimisation problems. Annals of Operations Research, 131(1):325–372, 2004.
 [30] C. Walshaw and M. Cross. Mesh Partitioning: A Multilevel Balancing and Refinement Algorithm. SIAM Journal on Scientific Computing, 22(1):63–80, 2000.
[31] C. Walshaw and M. Cross. JOSTLE: Parallel Multilevel Graph-Partitioning Software – An Overview. In F. Magoules, editor, Mesh Partitioning Techniques and Domain Decomposition Techniques, pages 27–58. Civil-Comp Ltd., 2007. (Invited chapter).
Appendix A Karlsruhe Fast Flow Partitioner
We now provide a brief overview of the techniques used in KaFFPa, the graph partitioner underlying our evolutionary algorithm. KaFFPa [26] is a classical matching-based multilevel graph partitioner. Recall that a multilevel graph partitioner basically has three phases: coarsening, initial partitioning and uncoarsening.
KaFFPa makes contraction more systematic by separating two issues: a rating function indicates how much sense it makes to contract an edge based on local information, while a matching algorithm tries to maximize the sum of the ratings of the contracted edges looking at the global structure of the graph. While the rating function allows a flexible characterization of what a “good” contracted graph is, the simple, standard definition of the matching problem allows us to reuse previously developed algorithms for weighted matching. Matchings are contracted until the graph is “small enough”. In [17] we observed that one particular rating function works best among the edge rating functions considered, so this rating function is also used in KaFFPa.
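Once a matching is fixed, contracting it is straightforward: every matched pair becomes one coarse vertex, parallel edges are merged by summing their weights, and self-loops are dropped. A minimal sketch (unit vertex weights; the function name and representation are ours, not KaFFPa's):

```python
from collections import defaultdict

def contract_matching(n, edges, matching):
    """Contract a matching of an n-vertex graph.

    edges: dict {(u, v): weight} with u < v; matching: set of edges.
    Returns (coarse_n, coarse_edges, mapping). Parallel edges are
    merged by summing weights; self-loops of matched pairs are dropped.
    """
    mapping, next_id = {}, 0
    for u, v in sorted(matching):           # matched pair -> one vertex
        mapping[u] = mapping[v] = next_id
        next_id += 1
    for v in range(n):                      # unmatched vertices survive
        if v not in mapping:
            mapping[v] = next_id
            next_id += 1
    coarse = defaultdict(int)
    for (u, v), w in edges.items():
        cu, cv = mapping[u], mapping[v]
        if cu != cv:
            coarse[(min(cu, cv), max(cu, cv))] += w
    return next_id, dict(coarse), mapping
```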
We employ the Global Path Algorithm (GPA) as the matching algorithm. It was proposed in [21] as a synthesis of the Greedy algorithm and the Path Growing Algorithm [13]. GPA achieves a half-approximation in the worst case, but empirically it gives considerably better results than Sorted Heavy Edge Matching and Greedy (for more details see [17]). GPA scans the edges in order of decreasing weight, but rather than immediately building a matching, it first constructs a collection of paths and even cycles. Afterwards, optimal matchings are computed for each of these paths and cycles using dynamic programming.
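A simplified sketch of this scheme is given below. For brevity we never close cycles (full GPA also admits even cycles and handles them with a second dynamic programming pass), so the collection consists of paths only:

```python
def gpa_matching(n, edges):
    """Simplified Global Path Algorithm sketch (paths only, no cycles).

    n: number of vertices; edges: list of (weight, u, v) tuples.
    Returns a set of matched edges (u, v) in path order.
    """
    deg = [0] * n
    parent = list(range(n))            # union-find to avoid cycles

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    adj = [[] for _ in range(n)]       # the collection of paths
    for w, u, v in sorted(edges, reverse=True):
        if deg[u] <= 1 and deg[v] <= 1 and find(u) != find(v):
            adj[u].append((v, w))
            adj[v].append((u, w))
            deg[u] += 1
            deg[v] += 1
            parent[find(u)] = find(v)

    matched, seen = set(), [False] * n
    for s in range(n):
        if seen[s] or deg[s] != 1:     # walk each path from an endpoint
            continue
        path, prev, cur = [], None, s
        seen[s] = True
        while True:
            nxt = [(v, w) for v, w in adj[cur] if v != prev]
            if not nxt:
                break
            v, w = nxt[0]
            path.append((w, cur, v))
            seen[v] = True
            prev, cur = cur, v
        # DP: dp[i] = best matching weight using the first i path edges
        k = len(path)
        dp, take = [0] * (k + 1), [False] * (k + 1)
        for i in range(1, k + 1):
            skip = dp[i - 1]
            use = path[i - 1][0] + (dp[i - 2] if i >= 2 else 0)
            dp[i], take[i] = max((skip, False), (use, True))
        i = k
        while i > 0:                   # backtrack the optimal choice
            if take[i]:
                matched.add((path[i - 1][1], path[i - 1][2]))
                i -= 2
            else:
                i -= 1
    return matched
```

For example, on the path 0–1–2–3 with edge weights 5, 1, 5 the dynamic program selects the two weight-5 edges.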
The contraction is stopped when the number of remaining nodes falls below a threshold. The graph is then small enough to be partitioned by some initial partitioning algorithm. KaFFPa employs Scotch as the initial partitioner since it empirically performs better than Metis.
Recall that the refinement phase iteratively uncontracts the matchings contracted during the coarsening phase. After a matching is uncontracted, local search based refinement algorithms move nodes between block boundaries in order to reduce the cut while maintaining the balance constraint. Local improvement algorithms are usually variants of the FM algorithm [14]. The algorithm is organized in rounds. In each round, a priority queue is used which is initialized, in random order, with all vertices that are incident to more than one block. The priority of a vertex is its gain, i.e., the decrease in edge cut when moving it to another block; ties are broken randomly if more than one block yields the maximum gain. Local search then repeatedly moves the node with the highest gain; each node is moved at most once within a round. After a node is moved, its unmoved neighbors become eligible, i.e., they are inserted into the priority queue. When a stopping criterion is reached, all movements past the best cut found within the balance constraint are undone. This process is repeated several times until no further improvement is found.
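A compact sketch of one such round for the special case of unit weights and two blocks (KaFFPa's actual implementation is k-way and weighted; all names here are ours):

```python
import heapq
import random

def fm_round(adj, part, max_block, seed=0):
    """One FM-style refinement round on a bipartition (sketch).

    adj: adjacency lists; part: list of block ids (0 or 1), modified in
    place; max_block: maximum allowed block size. Every vertex is moved
    at most once, in order of gain, and at the end all moves past the
    best cut seen are undone. Returns the best cut value.
    """
    rng = random.Random(seed)
    n = len(adj)

    def gain(v):                      # decrease in cut if v is moved
        internal = sum(1 for u in adj[v] if part[u] == part[v])
        return (len(adj[v]) - internal) - internal

    def cut():
        return sum(1 for v in range(n) for u in adj[v]
                   if u > v and part[u] != part[v])

    heap = []
    for v in range(n):                # start from the boundary vertices
        if any(part[u] != part[v] for u in adj[v]):
            heapq.heappush(heap, (-gain(v), rng.random(), v))

    moved, undo = set(), []
    sizes = [part.count(0), part.count(1)]
    best_cut = cur_cut = cut()
    while heap:
        neg_g, _, v = heapq.heappop(heap)
        if v in moved:
            continue
        g = gain(v)
        if g != -neg_g:               # stale entry: reinsert fresh key
            heapq.heappush(heap, (-g, rng.random(), v))
            continue
        target = 1 - part[v]
        if sizes[target] + 1 > max_block:
            continue                  # move would violate the balance
        sizes[part[v]] -= 1
        sizes[target] += 1
        part[v] = target
        moved.add(v)
        undo.append(v)
        cur_cut -= g
        if cur_cut <= best_cut:       # best state so far: keep moves
            best_cut, undo = cur_cut, []
        for u in adj[v]:              # unmoved neighbours get eligible
            if u not in moved:
                heapq.heappush(heap, (-gain(u), rng.random(), u))
    for v in undo:                    # roll back moves past the best cut
        part[v] = 1 - part[v]
    return best_cut
```

Note the rollback: moves with negative gain are allowed (this is what lets FM escape local minima), and the round only keeps the prefix of moves up to the best cut observed.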
During the uncoarsening phase KaFFPa additionally uses more advanced refinement algorithms. The first method is based on max-flow min-cut computations between pairs of blocks, i.e., a method to improve a given bipartition. Roughly speaking, this improvement method is applied between all pairs of blocks that share a non-empty boundary. The algorithm constructs a flow problem by growing an area around the given boundary vertices of a pair of blocks such that each min cut in this area yields a feasible bipartition of the original graph within the balance constraint; this yields a locally improved partition. The second method for improving a given partition is called multi-try FM. Roughly speaking, a k-way local search initialized with a single boundary node is repeatedly started, whereas previous methods are initialized with all boundary nodes.
KaFFPa extended the concept of iterated multilevel algorithms, which was introduced in [29]. The main idea is to iterate the coarsening and uncoarsening phases. Once the graph is partitioned, edges between two blocks are not contracted; this ensures non-decreasing partition quality since our refinement algorithms guarantee no worsening and break ties randomly. An F-cycle works as follows: on each level we perform at most two recursive calls using different random seeds during contraction and local search, where a second recursive call is only made the second time that the algorithm reaches a particular level. These so-called global search strategies are more effective than plain restarts of the algorithm.
Appendix B Additional Experimental Data
B.1 Further Parameter Tuning
In this section we perform parameter tuning using KaFFPaEco (a faster but less powerful configuration than KaFFPaStrong) as the base case partitioner. We start by tuning the fraction parameter; as before, we set the flip coin parameter to one. In Figure 5 we can see that the algorithm is not too sensitive to the exact choice of this parameter. As before, larger values speed up the convergence rate and improve the result achieved in the end. We choose the value that performs best in the end as our default.
We now tune the ratio of mutation to crossover operations. For this test we fix the fraction parameter to the value chosen above. The results are similar to those achieved when using KaFFPaStrong as the base case partitioner. Again we can see that for smaller values the algorithm is not too sensitive to the exact choice of the parameter. When no crossover operations are performed, the convergence speed slows down, which yields worse average results in the end. The remaining settings are comparable in the end; we pick one of them for consistency.
B.2 Further Comparison Data
k  Reps. Avg.  KaFFPaE Avg.  Scotch Best  [s]  Metis Best  [s]
2  569  568  671  0.22  711  0.12 
4  1 229  1 217  1 486  0.41  1 574  0.13 
8  2 207  2 173  2 663  0.62  2 831  0.13 
16  3 568  3 474  4 192  0.86  4 500  0.14 
32  5 481  5 298  6 437  1.15  6 899  0.15 
64  8 141  7 879  9 335  1.46  10 306  0.18 
128  11 937  11 486  13 427  1.85  14 500  0.20 
256  17 262  16 634  18 972  2.28  20 341  0.25 
overall  3 872  3 779  4 507  0.87  4 835  0.16 
k  S3R avg.  [s]  K3R avg.  [s]  KC avg.  [s]  SC avg.  [s]
2  591  19  577  14  582  12  590  17 
4  1 304  30  1 261  28  1 254  22  1 302  27 
8  2 336  40  2 252  45  2 255  36  2 332  41 
16  3 723  54  3 617  67  3 649  57  3 714  61 
32  5 720  82  5 569  110  5 540  99  5 722  84 
64  8 463  116  8 236  164  8 213  146  8 512  113 
128  12 435  171  12 008  239  11 895  225  12 432  162 
256  17 915  217  17 335  327  17 199  329  17 935  232 

B.3 Larger Scalability Plots
B.4 Road Networks
PUNCH  Buffoon  KaFFPa Strong  KaSPar Strong  Scotch  Metis  
graph / k  Best  Avg.  [m]  Best  Avg.  [m]  Best  Avg.  [m]  Best  Avg.  [m]  Best  Avg.  [m]  Best  Avg.  [m]
deu  2  164  166  0.83  161  161  6.2  163  166  3.29  167  172  3.86  265  279  0.05  271  296  0.10 
deu  4  400  410  0.96  393  394  6.8  395  403  5.25  419  426  4.07  608  648  0.10  592  710  0.10 
deu  8  711  746  1.02  693  694  9.7  726  729  5.85  762  773  4.17  1 109  1 211  0.15  1 209  1 600  0.10 
deu  16  1 144  1 188  0.83  1 137  1 148  16.8  1 263  1 278  7.05  1 308  1 333  4.64  1 957  2 061  0.20  2 052  2 191  0.10 
deu  32  1 960  2 032  0.71  1 898  1 928  31.7  2 115  2 146  7.68  2 182  2 217  4.73  3 158  3 262  0.25  3 225  3 607  0.10 
deu  64  3 165  3 253  0.83  3 143  3 164  61.1  3 432  3 440  8.55  3 610  3 631  4.89  4 799  4 937  0.30  4 985  5 320  0.10 
eur  2  129  130  4.25  129  175  39.5  130  130  16.88  133  138  32.44  369  448  0.20  412  454  0.55 
eur  4  309  309  3.58  310  317  39.1  412  430  30.40  355  375  36.13  727  851  0.40  902  1 698  0.54 
eur  8  634  671  2.93  659  671  47.9  749  772  34.45  774  786  37.21  1 338  1 461  0.60  2 473  3 819  0.55 
eur  16  1 293  1 353  2.52  1 238  1 257  73.5  1 454  1 493  39.01  1 401  1 440  42.56  2 478  2 563  0.81  3 314  8 554  0.56 
eur  32  2 289  2 362  2.17  2 240  2 260  130.2  2 428  2 504  40.76  2 595  2 643  43.31  4 057  4 249  1.00  5 811  7 380  0.55 
eur  64  3 828  3 984  2.41  3 825  3 862  248.9  4 240  4 264  42.23  4 502  4 526  42.23  6 518  6 739  1.23  10 264  13 947  0.55 
overall  822  847  1.57  812  831  33.9  893.05  909  13.97  911  931  13.03  1 495  1 607  0.30  1 800  2 400  0.23 
Appendix C Instances
small sized instances

graph  n  m  (for rggX and delaunayX, n = 2^X; only m is listed)
rgg15  160 240  
rgg16  342 127  
delaunay15  98 274  
delaunay16  196 575  
uk  4 824  6 837 
luxemburg  114 599  119 666 
3elt  4 720  13 722 
4elt  15 606  45 878 
fe_sphere  16 386  49 152 
cti  16 840  48 232 
fe_body  45 087  163 734 
medium sized instances

graph  n  m  (for rggX and delaunayX, n = 2^X; only m is listed)
rgg17  728 753  
rgg18  1 547 283  
delaunay17  393 176  
delaunay18  786 396  
bel  463 514  591 882 
nld  893 041  1 139 540 
t60k  60 005  89 440 
wing  62 032  121 544 
fe_tooth  78 136  452 591 
fe_rotor  99 617  662 431 
memplus  17 758  54 196 
road networks

graph  n  m
germany  4 378 446  5 483 587 
europe  18 029 721  22 217 686 
Appendix D Detailed Walshaw Benchmark Results
Graph / k  2  4  8  16  32  64
add20  642  594  1 194  1 159  1 727  1 696  2 107  2 062  2 512  2 687  3 188  3 108
data  188  188  377  378  656  659  1 142  1 135  1 933  1 858  2 966  2 885 
3elt  89  89  199  199  340  341  568  569  967  968  1 553  1 553 
uk  19  19  40  40  80  82  144  146  251  256  417  419 
add32  10  10  33  33  66  66  117  117  212  212  486  493 
bcsstk33  10 096  10 097  21 390  21 508  34 174  34 178  55 327  54 763  78 199  77 964  109 811  108 467 
whitaker3  126  126  380  380  654  655  1 091  1 091  1 678  1 697  2 532  2 552 
crack  183  183  362  362  676  677  1 098  1 089  1 697  1 687  2 581  2 555 
wing_nodal  1 695  1 695  3 563  3 565  5 422  5 427  8 353  8 339  12 040  11 828  16 185  16 124 
fe_4elt2  130  130  349  349  603  604  1 002  1 005  1 620  1 628  2 530  2 519 
vibrobox  11 538  10 310  18 956  19 098  24 422  24 509  33 501  32 102  41 725  40 085  49 012  47 651 
bcsstk29  2 818  2 818  8 029  8 029  13 904  13 950  22 618  21 768  35 654  34 841  57 712  57 031 
4elt  138  138  320  320  532  533  932  934  1 551  1 547  2 574  2 579 
fe_sphere  386  386  766  766  1 152  1 152  1 709  1 709  2 494  2 488  3 599  3 584 
cti  318  318  944  944  1 749  1 752  2 804  2 837  4 117  4 129  5 820  5 818 
memplus  5 491  5 484  9 448  9 500  11 807  11 776  13 250  13 001  15 187  14 107  17 183  16 543 
cs4  366  366  925  934  1 436  1 448  2 087  2 105  2 910  2 938  4 032  4 051 
bcsstk30  6 335  6 335  16 596  16 622  34 577  34 604  70 945  70 604  116 128  113 788  176 099  172 929 
bcsstk31  2 699  2 699  7 282  7 287  13 201  13 230  23 761  23 807  37 995  37 652  59 318  58 076 
fe_pwt  340  340  704  704  1 433  1 437  2 797  2 798  5 523  5 549  8 222  8 276 
bcsstk32  4 667  4 667  9 195  9 208  20 204  20 323  35 936  36 399  61 533  60 776  94 523  91 863 
fe_body  262  262  598  598  1 026  1 048  1 714  1 779  2 796  2 935  4 825  4 879 
t60k  75  75  208  208  454  454  805  815  1 320  1 352  2 079  2 123 
wing  784  784  1 610  1 613  2 479  2 505  3 857  3 880  5 584  5 626  7 680  7 656 
brack2  708  708  3 013  3 013  7 040  7 099  11 636  11 649  17 508  17 398  26 226  25 913 
finan512  162  162  324  324  648  648  1 296  1 296  2 592  2 592  10 560  10 560 
fe_tooth  3 814  3 815  6 846  6 867  11 408  11 473  17 411  17 396  25 111  24 933  34 824  34 433 
fe_rotor  2 031  2 031  7 180  7 292  12 726  12 813  20 555  20 438  31 428  31 233  46 372  45 911 
598a  2 388  2 388  7 948  7 952  15 956  15 924  25 741  25 789  39 423  38 627  57 497  56 179 
fe_ocean  387  387  1 816  1 824  4 091  4 134  7 846  7 771  12 711  12 811  20 301  19 989 
144  6 478  6 478  15 152  15 140  25 273  25 279  37 896  38 212  56 550  56 868  79 198  80 406 
wave  8 658  8 665  16 780  16 875  28 979  29 115  42 516  42 929  61 104  62 551  85 589  86 086 
m14b  3 826  3 826  12 973  12 981  25 690  25 852  42 523  42 351  65 835  67 423  98 211  99 655 
auto  9 949  9 954  26 614  26 649  45 557  45 470  77 097  77 005  121 032  121 608  172 167  174 482 
Graph / k  2  4  8  16  32  64
add20  623  576  1 180  1 158  1 696  1 689  2 075  2 062  2 422  2 387  2 963  3 021
data  185  185  369  369  638  638  1 111  1 118  1 815  1 801  2 905  2 809 
3elt  87  87  198  198  334  335  561  562  950  950  1 537  1 532 
uk  18  18  39  39  78  78  140  141  240  245  406  411 
add32  10  10  33  33  66  66  117  117  212  212  486  490 
bcsstk33  10 064  10 064  20 767  20 854  34 068  34 078  54 772  54 455  77 549  77 353  108 645  107 011 
whitaker3  126  126  378  378  650  651  1 084  1 086  1 662  1 673  2 498  2 499 
crack  182  182  360  360  671  673  1 077  1 077  1 676  1 666  2 534  2 529 
wing_nodal  1 678  1 678  3 538  3 542  5 361  5 368  8 272  8 310  11 939  11 828  15 967  15 874 
fe_4elt2  130  130  342  342  595  596  991  994  1 599  1 613  2 485  2 503 
vibrobox  11 538  10 310  18 736  18 778  24 204  24 170  33 065  31 514  41 312  39 512  48 184  47 651 
bcsstk29  2 818  2 818  7 971  7 983  13 717  13 816  22 000  21 410  34 535  34 400  55 544  55 302 
4elt  137  137  319  319  522  523  906  908  1 523  1 524  2 543  2 565 
fe_sphere  384  384  764  764  1 152  1 152  1 698  1 704  2 474  2 471  3 552  3 530 
cti  318  318  916  916  1 714  1 714  2 746  2 758  3 994  4 011  5 579  5 675 
memplus  5 353  5 353  9 375  9 362  11 662  11 624  13 088  13 001  14 617  14 107  16 997  16 259 
cs4  360  360  917  926  1 424  1 434  2 055  2 087  2 892  2 925  4 016  4 051 
bcsstk30  6 251  6 251  16 399  16 497  34 137  34 275  69 592  69 763  113 888  113 788  173 290  171 727 
bcsstk31  2 676  2 676  7 150  7 150  12 985  13 003  23 299  23 232  37 109  37 228  58 143  57 953 
fe_pwt  340  340  700  700  1 410  1 411  2 773  2 776  5 460  5 488  8 124  8 205 
bcsstk32  4 667  4 667  8 725  8 733  19 956  19 962  35 140  35 486  59 716  58 966  91 544  91 715 
fe_body  262  262  598  598  1 018  1 016  1 708  1 734  2 738  2 810  4 643  4 799 
t60k  71  71  203  203  449  449  793  802  1 304  1 333  2 039  2 098 
wing  773  773  1 593  1 602  2 451  2 463  3 807  3 852  5 559  5 626  7 561  7 656 
brack2  684  684  2 834  2 834  6 800  6 861  11 402  11 444  17 167  17 194  25 658  25 913 
finan512  162  162  324  324  648  648  1 296  1 296  2 592  2 592  10 560  10 560 
fe_tooth  3 788  3 788  6 764  6 795  11 287  11 274  17 176  17 310  24 752  24 933  34 230  34 433 
fe_rotor  1 959  1 959  7 118  7 126  12 445  12 472  20 076  20 112  30 664  31 233  45 053  45 911 
598a  2 367  2 367  7 816  7 838  15 613  15 722  25 563  25 686  38 346  38 627  56 153  56 179 
fe_ocean  311  311  1 693  1 696  3 920  3 921  7 657  7 631  12 437  12 539  19 521  19 989 
144  6 434  6 438  15 203  15 078  25 092  25 109  37 730  37 762  55 941  56 356  78 636  78 559 
wave  8 591  8 594  16 665  16 668  28 506  28 495  42 259  42 295  60 731  61 722  84 533  85 185 
m14b  3 823  3 823  12 948  12 948  25 390  25 520  41 778  41 997  65 359  65 180  96 519  96 802 
auto  9 673  9 683  25 789  25 836  44 785  44 832  75 719  75 778  119 157  120 086  170 989  171 535 
Graph / k  2  4  8  16  32  64
add20  598  546  1 169  1 149  1 689  1 675  2 061  2 062  2 411  2 387  2 963  3 021
data  182  181  363  363  628  628  1 088  1 084  1 786  1 776  2 832  2 798 
3elt  87  87  197  197  329  330  557  558  944  942  1 509  1 519 
uk  18  18  39  39  75  76  137  139  237  242  395  400 
add32  10  10  33  33  63  63  117  117  212  212  483  486 
bcsstk33  9 914  9 914  20 167  20 179  33 919  33 922  54 333  54 296  77 457  77 101  106 903  106 827 
whitaker3  126  126  377  378  644  644  1 073  1 079  1 650  1 667  2 477  2 498 
crack  182  182  360  360  666  667  1 065  1 076  1 661  1 655  2 505  2 516 
wing_nodal  1 669  1 668  3 521  3 522  5 341  5 345  8 241  8 264  11 793  11 828  15 892  15 813 
fe_4elt2  130  130  335  335  578  580  983  984  1 575  1 592  2 461  2 482 
vibrobox  11 254  10 310  18 690  18 696  23 924  23 930  32 615  31 234  40 816  39 183  47 624  47 361 
bcsstk29  2 818  2 818  7 925  7 936  13 540  13 575  21 459  20 924  33 851  33 817  55 029  54 895 
4elt  137  137  315  315  515  515  888  895  1 504  1 516  2 514  2 546 
fe_sphere  384  384  762  762  1 152  1 152  1 681  1 683  2 434  2 465  3 528  3 522 
cti  318  318  889  889  1 684  1 684  2 719  2 721  3 927  3 920  5 512  5 594 
memplus  5 281  5 267  9 292  9 297  11 624  11 543  13 095  13 001  14 537  14 107  16 650  16 044 
cs4  353  353  909  912  1 420  1 431  2 043  2 079  2 866  2 919  3 973  4 012 
bcsstk30  6 251  6 251  16 189  16 186  34 071  34 146  69 337  69 288  112 159  113 321  170 321  170 591 
bcsstk31  2 669  2 670  7 086  7 088  12 853  12 865  22 871  23 104  36 502  37 228  57 502  56 674 
fe_pwt  340  340  700  700  1 405  1 405  2 743  2 745  5 399  5 423  7 985  8 119 
bcsstk32  4 622  4 622  8 441  8 441  19 411  19 601  34 481  35 014  58 395  58 966  90 586  89 897 
fe_body  262  262  588  588  1 013  1 014  1 684  1 697  2 696  2 787  4 512  4 642 
t60k  65  65  195  195  443  445  788  796  1 299  1 329  2 021  2 089 
wing  770  770  1 590  1 593  2 440  2 452  3 775  3 832  5 538  5 564  7 567  7 611 
brack2  660  660  2 731  2 731  6 592  6 611  11 193  11 232  16 919  17 112  25 598  25 805 
finan512  162  162  324  324  648  648  1 296  1 296  2 592  2 592  10 560  10 560 
fe_tooth  3 773  3 773  6 688  6 714  11 154  11 185  17 070  17 215  24 733  24 933  34 320  34 433 
fe_rotor  1 940  1 940  6 899  6 940  12 309  12 347  19 680  19 932  30 356  30 974  45 131  45 911 
598a  2 336  2 336  7 728  7 735  15 414  15 483  25 450  25 533  38 476  38 550  56 377  56 179 
fe_ocean  311  311  1 686  1 686  3 893  3 902  7 385  7 412  12 211  12 362  19 400  19 727 
144  6 357  6 359  15 004  14 982  25 030  24 767  37 419  37 122  55 460  55 984  77 430  78 069 
wave  8 524  8 533  16 558  16 533  28 489  28 492  42 084  42 134  60 537  61 280  83 413  84 236 
m14b  3 802  3 802  12 945  12 945  25 154  25 143  41 465  41 536  65 237  65 077  96 257  96 559 
auto  9 450  9 450  25 271  25 301  44 206  44 346  74 636  74 561  119 294  119 111  169 835  171 329 