1 Introduction
Balanced graph partitioning is an important problem in computer science and engineering with many application domains, such as VLSI circuit design, data mining and distributed systems [37]. It is well known that this problem is NP-complete [8] and that no approximation algorithm with a constant ratio factor exists for general graphs unless P=NP [8]. Still, there is a large body of literature on methods (with worst-case exponential time) that solve the graph partitioning problem to optimality. This includes methods dedicated to the bipartitioning case [3, 4, 12, 13, 14, 15, 23, 21, 29, 38] and some methods that solve the general graph partitioning problem [16, 39]. Most of these methods rely on the branch-and-bound framework [27]. However, these methods can typically solve only very small problems as their running time grows exponentially, or, if they can solve large bipartitioning instances using a moderate amount of time [12, 13], the running time highly depends on the bisection width of the graph. Methods that solve the general graph partitioning problem [16, 39] have huge running times for graphs with up to a few hundred vertices. Thus in practice mostly heuristic algorithms are used.
Typically the graph partitioning problem asks for a partition of a graph into blocks of about equal size such that there are few edges between them. Here, we focus on the case when the bounds on the size are very strict, including the case of perfect balance when the maximal block size has to equal the average block size.
Our focus in this paper is on solution quality, i.e. minimizing the number of edges that run between blocks. During the past two decades numerous researchers have tried to improve the best graph partitions in Walshaw’s well-known partitioning benchmark [40, 41]. Overall, more than forty different approaches have participated in this benchmark. Indeed, high solution quality is of major importance in applications such as VLSI design [1, 2] where even minor improvements in the objective can have a large impact on the production costs and quality of a chip. High-quality solutions are also favorable in applications where the graph needs to be partitioned only once and then the partition is used over and over again, implying that the running time of the graph partitioning algorithms is of minor concern [11, 18, 26, 28, 31, 30]. Thirdly, high-quality solutions are important even in areas in which the running time overhead is paramount [40], such as finite element computations [36] or the direct solution of sparse linear systems [20]. Here, high-quality graph partitions can be useful for benchmarking purposes, i.e. measuring how much more running time can be saved by higher quality solutions.
In order to compute high-quality solutions, state-of-the-art local search algorithms exchange vertices between blocks of the partition trying to decrease the cut size while also maintaining balance. This highly restricts the set of possible improvements. Recently, we introduced new techniques that relax the balance constraint for vertex movements but globally maintain balance by combining multiple local searches [35]. This was done by reducing this combination problem to finding negative cycles in a graph. In this paper, we extend the neighborhood of the combination problem by employing integer linear programming. This enables us to find even more complex combinations and hence to further improve solutions. More precisely, our approach is based on integer linear programs that solve the partitioning problem to optimality. However, out of the box those programs typically do not scale to large inputs, in particular because the graph partitioning problem has a very large amount of symmetry: given a partition of the graph, each permutation of the block IDs gives a solution having the same objective and balance. Hence, we adapt the integer linear program to improve a given input partition. We do so by defining a much smaller graph, called model, and solving the graph partitioning problem on the model to optimality by the integer linear program. More specifically, we select vertices close to the cut of the given input partition for potential movement and contract all remaining vertices of a block into a single vertex. A feasible partition of this model corresponds to a partition of the input graph having the same balance and objective. Moreover, this model enables us to use symmetry breaking, which allows us to scale to much larger inputs. To make the approach even faster, we combine it with initial bounds on the objective provided by the input partition, and we also provide the input partition to the integer linear program solver.
Overall, we arrive at a system that is able to improve more than half of all entries in Walshaw’s benchmark when the number of blocks is high.
The rest of the paper is organized as follows. We begin in Section 2 by introducing basic concepts. After presenting some related work in Section 3 we outline the integer linear program as well as our novel local search algorithm in Section 4. Here, we start by explaining the very basic idea that allows us to find combinations of simple vertex movements. We then explain our strategies to improve the running time of the solver and strategies to select vertices for movement. A summary of extensive experiments done to evaluate the performance of our algorithms is presented in Section 5. Finally, we conclude in Section 6.
2 Preliminaries
2.1 Basic concepts
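Let $G = (V, E)$ be an undirected graph with $n = |V|$ vertices and $m = |E|$ edges. We consider positive, real-valued edge and vertex weight functions $\omega$ resp. $c$ and extend them to sets, i.e., $\omega(E') := \sum_{e \in E'} \omega(e)$ and $c(V') := \sum_{v \in V'} c(v)$. Let $N(v) := \{u : \{v,u\} \in E\}$ denote the neighbors of $v$. The degree of a vertex $v$ is $d(v) := |N(v)|$. A vertex is a boundary vertex if it is adjacent to at least one vertex in a different block. We are looking for disjoint blocks of vertices $V_1, \dots, V_k$ that partition $V$; i.e., $V_1 \cup \dots \cup V_k = V$. The balancing constraint demands that each block has weight $c(V_i) \le (1+\epsilon) \lceil c(V)/k \rceil =: L_{\max}$ for some imbalance parameter $\epsilon \ge 0$. We call a block $V_i$ overloaded if its weight exceeds $L_{\max}$. The objective of the problem is to minimize the total cut $\omega(E \cap \bigcup_{i<j} V_i \times V_j)$ subject to the balancing constraints. We define the gain of a vertex as the maximum decrease in the cut value when moving it to a different block.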
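The definitions above can be made concrete with a minimal sketch. The representation is an assumption made only for illustration: vertices are numbered from 0, `edges` maps an unordered vertex pair to its weight, `c` lists the vertex weights, `part[v]` is the (0-based) block of vertex `v`, and the helper names are hypothetical.

```python
from math import ceil


def l_max(c, k, eps):
    """Upper bound on the block weight: (1 + eps) * ceil(c(V) / k)."""
    return (1 + eps) * ceil(sum(c) / k)


def block_weights(c, part, k):
    """Total vertex weight of each of the k blocks."""
    weights = [0] * k
    for v, block in enumerate(part):
        weights[block] += c[v]
    return weights


def cut_weight(edges, part):
    """Total weight of edges whose endpoints lie in different blocks."""
    return sum(w for (u, v), w in edges.items() if part[u] != part[v])


def gain(v, edges, part, k):
    """Maximum decrease of the cut when moving v to another block (k >= 2)."""
    conn = [0] * k  # conn[b]: weight of edges between v and block b
    for (a, b), w in edges.items():
        if a == v:
            conn[part[b]] += w
        elif b == v:
            conn[part[a]] += w
    internal = conn[part[v]]
    return max(conn[b] - internal for b in range(k) if b != part[v])
```

A positive gain means that moving the vertex alone would reduce the cut; whether the move is allowed still depends on the balancing constraint.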
3 Related Work
There has been a huge amount of research on graph partitioning and we refer the reader to the surveys given in [6, 9, 36, 42] for most of the material. Here, we focus on issues closely related to our main contributions. All general-purpose methods that are able to obtain good partitions for large real-world graphs are based on the multilevel principle. Well-known software packages based on this approach include Jostle [42], KaHIP [33], Metis [24] and Scotch [32].
Chris Walshaw’s well-known benchmark archive was established in 2001 [40, 41]. Overall it contains 816 instances (34 graphs, 4 values of imbalance, and 6 values of $k$). Since then, more than forty different approaches have participated in this benchmark. In this benchmark, the running time of the participating algorithms is not measured or reported. Submitted partitions are validated and added to the archive if they improve on a particular result. This can either be an improvement in the number of cut edges or, if they match the current best cut size, an improvement in the weight of the largest block. Most entries in the benchmark have as of Feb. been obtained by Galinier et al. [19] (more precisely an implementation of that approach by Frank Schneider), Hein and Seitzer [22] and the Karlsruhe High-Quality Graph Partitioning (KaHIP) framework [35]. More precisely, Galinier et al. [19] use a memetic algorithm that is combined with tabu search to compute solutions and Hein and Seitzer [22] solve the graph partitioning problem by providing tight relaxations of a semidefinite program into a continuous problem.
The Karlsruhe High-Quality Graph Partitioning (KaHIP) framework implements many different algorithms, for example flow-based methods and more-localized local searches, as well as several coarse-grained parallel and sequential metaheuristics. KaBaPE [35] is a coarse-grained parallel evolutionary algorithm, i.e. each processor has its own population (set of partitions) and a copy of the graph. After initially creating the local population, each processor performs multilevel combine and mutation operations on the local population. This is combined with a metaheuristic that combines local searches that individually violate the balance constraint into a more global feasible improvement. For more details, we refer the reader to [35].
4 Local Search based on Integer Linear Programming
We now explain our algorithm that combines integer linear programming and local search. We start by explaining the integer linear program that can solve the graph partitioning problem to optimality. However, out of the box this program does not scale to large inputs, in particular because the graph partitioning problem has a very large amount of symmetry. Thus, we reduce the size of the graph by first computing a partition using an existing heuristic and then collapsing parts of the graph based on it. Roughly speaking, we compute a small graph, called model, in which we only keep a small number of selected vertices for potential movement and perform graph contractions on the remaining ones. A partition of the model corresponds to a partition of the input network having the same objective and balance. The computed model is then solved to optimality using the integer linear program. As we will see, this process enables us to use symmetry breaking in the integer linear program, which in turn drastically speeds up computation times.
4.1 Integer Linear Program for the Graph Partitioning Problem
We now introduce a generalization of an integer linear program formulation for balanced bipartitioning [7] to the general graph partitioning problem. First, we introduce binary decision variables for all edges and vertices of the graph. More precisely, for each edge $e = \{u,v\} \in E$, we introduce the variable $e_{uv} \in \{0,1\}$ which is one if $e$ is a cut edge and zero otherwise. Moreover, for each $v \in V$ and block $i \in \{1,\dots,k\}$, we introduce the variable $x_{v,i} \in \{0,1\}$ which is one if $v$ is in block $i$ and zero otherwise. Hence, we have a total of $|E| + k|V|$ variables. We use the following constraints to ensure that the result is a valid partition:
(1)  $\forall \{u,v\} \in E,\ \forall i \in \{1,\dots,k\}: \quad e_{uv} \ge x_{u,i} - x_{v,i}$
(2)  $\forall \{u,v\} \in E,\ \forall i \in \{1,\dots,k\}: \quad e_{uv} \ge x_{v,i} - x_{u,i}$
(3)  $\forall i \in \{1,\dots,k\}: \quad \sum_{v \in V} x_{v,i}\, c(v) \le L_{\max}$
(4)  $\forall v \in V: \quad \sum_{i=1}^{k} x_{v,i} = 1$
The first two constraints ensure that $e_{uv}$ is set to one if the vertices $u$ and $v$ are in different blocks. For an edge $\{u,v\} \in E$ and a block $i$, the right-hand side of one of these two inequalities is one if one of the vertices $u$ and $v$ is in block $i$ and the other one is not. If both vertices are in the same block then the right-hand side is zero for all values of $i$. Hence, the variable $e_{uv}$ can either be zero or one in this case. However, since the variable participates in the objective function and the problem is a minimization problem, it will be zero in an optimum solution. The third constraint ensures that the balance constraint is satisfied for each block, i.e. that no block weight exceeds the bound $L_{\max}$ of the balancing constraint. And finally, the last constraint ensures that each vertex is assigned to exactly one block. To sum up, our program has $2k|E| + k + |V|$ constraints and $k \cdot (6|E| + 2|V|)$ nonzeros. Since we want to minimize the weight of cut edges, the objective function of our program is written as:
(5)  $\min \sum_{\{u,v\} \in E} e_{uv} \cdot \omega(\{u,v\})$
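To make the size of the program tangible, the following sketch materializes the four constraint families as sparse rows and checks a candidate assignment against them. It is an illustration only, not the authors' implementation: the variable keys `("e", u, v)` and `("x", v, i)`, the 0-based block indices and all function names are assumptions, and no ILP solver is involved.

```python
def build_ilp(n, edges, c, k, l_max):
    """Return (number of variables, list of (coefficients, sense, rhs) rows)."""
    rows = []
    for (u, v) in edges:
        for i in range(k):
            # (1) e_uv >= x_{u,i} - x_{v,i}, written as e_uv - x_{u,i} + x_{v,i} >= 0
            rows.append(({("e", u, v): 1, ("x", u, i): -1, ("x", v, i): 1}, ">=", 0))
            # (2) e_uv >= x_{v,i} - x_{u,i}
            rows.append(({("e", u, v): 1, ("x", v, i): -1, ("x", u, i): 1}, ">=", 0))
    for i in range(k):
        # (3) weight of block i is at most l_max
        rows.append(({("x", v, i): c[v] for v in range(n)}, "<=", l_max))
    for v in range(n):
        # (4) every vertex is in exactly one block
        rows.append(({("x", v, i): 1 for i in range(k)}, "==", 1))
    return len(edges) + k * n, rows


def assignment_from_partition(n, edges, part, k):
    """Variable values that encode the partition `part`."""
    values = {("x", v, i): int(part[v] == i) for v in range(n) for i in range(k)}
    for (u, v) in edges:
        values[("e", u, v)] = int(part[u] != part[v])
    return values


def is_feasible(rows, values):
    """Check every row against the given variable values."""
    for coeffs, sense, rhs in rows:
        lhs = sum(a * values[var] for var, a in coeffs.items())
        if sense == ">=" and lhs < rhs:
            return False
        if sense == "<=" and lhs > rhs:
            return False
        if sense == "==" and lhs != rhs:
            return False
    return True
```

Summing `len(coeffs)` over all rows gives the number of nonzeros, which for a triangle with $k = 2$ is $2 \cdot (6 \cdot 3 + 2 \cdot 3) = 48$.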
4.2 Local Search
The graph partitioning problem has a large amount of symmetry – each permutation of the block IDs gives a solution with equal objective and balance. Hence, the integer linear program described above will scan many branches that contain essentially the same solutions so that the program does not scale to large instances. Moreover, it is not immediately clear how to improve the scalability of the program by using symmetry breaking or other techniques.
Our goal in this section is to develop a local search algorithm using the integer linear program above. Given a partition as input to be improved, our main idea is to contract vertices “that are far away” from the cut of the partition. In other words, we want to keep vertices close to the cut and contract all remaining vertices into one vertex for each block of the input partition. This ensures that a partition of the contracted graph yields a partition of the input graph with the same objective and balance. Hence, we apply the integer linear program to the model and solve the partitioning problem on it to optimality. Note, however, that due to the performed contractions this does not imply an optimal solution on the input graph.
We now outline the details of the algorithm. Our local search algorithm has two inputs, a graph $G = (V, E)$ and a partition $V_1, \dots, V_k$ of its vertices. For now assume that we have a set of vertices $K \subset V$ which we want to keep in the coarse model, i.e. a set of vertices which we do not want to contract. We outline in Section 4.4 which strategies we have to select the vertices $K$. For the purpose of contraction we define $k$ sets $V_i \setminus K$, one for each block. We obtain our coarse model by contracting each of these vertex sets. The contraction of a vertex set $V_i \setminus K$ works as follows: the set of vertices is contracted into a single vertex $\mu_i$. The weight of $\mu_i$ is set to the sum of the weights of all vertices in the set that is contracted. There is an edge between two vertices $\mu_i$ and $v$ in the contracted graph if there is an edge between a vertex of the set $V_i \setminus K$ and $v$ in the original graph $G$. The weight of an edge $\{\mu_i, v\}$ is set to the sum of the weights of the edges that run between the vertices of the set $V_i \setminus K$ and $v$. After all contractions have been performed the coarse model contains $k + |K|$ vertices, and potentially far fewer edges than the input graph. Figure 1 gives an abstract example of our model.
There are two things that are important to see: first, due to the way we perform contraction, the given partition of the input network yields a partition of our coarse model that has the same objective and balance simply by putting $\mu_i$ into block $i$ and keeping the block of the input for the vertices in $K$. Moreover, if we compute a new partition of our coarse model, we can build a partition in the original graph with the same properties by putting the vertices $V_i \setminus K$ into the block of their coarse representative $\mu_i$ together with the vertices of $K$ that are in this block. Hence, we can solve the integer linear program on the coarse model to compute a partition for the input graph. After the solver terminates, i.e. has found an optimum solution of our model or has reached a predefined time limit, we transfer the best solution to the original graph. Note that the latter is possible since an integer linear program solver typically computes intermediate solutions that may not be optimal.
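The contraction and the transfer of a model partition back to the input graph can be sketched as follows. This is a minimal illustration under assumed conventions, not the authors' code: blocks are 0-based, model vertices `0..k-1` stand for the contracted vertices of blocks `0..k-1`, kept vertices receive the ids `k, k+1, ...`, and all names are hypothetical.

```python
def build_model(n, edges, c, part, k, keep):
    """Contract, per block, all vertices outside `keep` into one model vertex."""
    keep = sorted(keep)
    model_id = {v: k + j for j, v in enumerate(keep)}

    def rep(v):
        # Kept vertices map to their own model vertex, all others to their block.
        return model_id.get(v, part[v])

    m_c = [0] * (k + len(keep))
    for v in range(n):
        m_c[rep(v)] += c[v]

    m_edges = {}
    for (u, v), w in edges.items():
        a, b = rep(u), rep(v)
        if a == b:
            continue  # edge vanishes inside a contracted vertex
        key = (min(a, b), max(a, b))
        m_edges[key] = m_edges.get(key, 0) + w  # parallel edges are merged

    # Input partition expressed on the model: contracted vertex i is in block i.
    m_part = list(range(k)) + [part[v] for v in keep]
    return m_c, m_edges, m_part, model_id


def project(model_part, part, n, model_id):
    """Turn a partition of the model into a partition of the input graph."""
    return [model_part[model_id[v]] if v in model_id else model_part[part[v]]
            for v in range(n)]


def cut_weight(edges, part):
    return sum(w for (u, v), w in edges.items() if part[u] != part[v])
```

On a path with six vertices split in the middle and the two boundary vertices kept, the model has four vertices and three edges, and the cut of the projected partition equals the cut measured on the model.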
4.3 Optimizations
Independent of the vertices that are selected to be kept in the coarse model, the approach above allows us to define optimizations to solve our integer linear program faster. We apply four strategies: (i) symmetry breaking, (ii) providing a start solution to the solver, (iii) adding the objective of the input as a constraint, and (iv) using the parallel solving facilities of the underlying solver. We outline the first three strategies in greater detail:
Symmetry Breaking.
If the set $K$ of kept vertices is small, then the solver will find a solution much faster. Ideally, our algorithm selects the vertices $K$ such that $c(\mu_i) + c(\mu_j) > L_{\max}$ for all $i \neq j$, where $\mu_i$ denotes the vertex obtained by contracting the vertices of block $i$ that are not in $K$. In other words, no two contracted vertices can be clustered in one block. We can use this to break symmetry in our integer linear program by adding constraints that fix the block of $\mu_i$ to block $i$, i.e. we set $x_{\mu_i,i} = 1$ and $x_{\mu_i,j} = 0$ for $j \neq i$. Moreover, for those vertices we can remove the constraint which ensures that the vertex is assigned to a single unique block, since we assigned those vertices to a block using the new additional constraints.
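As a small illustration of this condition, the sketch below tests whether any two contracted vertices would fit into one block together and, if not, produces the values to which their assignment variables can be fixed. The function names, the 0-based block indices and the `("x", vertex, block)` keys are assumptions made for this example.

```python
def symmetry_can_be_broken(mu_weights, l_max):
    """True if no two contracted vertices fit into one block together."""
    k = len(mu_weights)
    return all(mu_weights[i] + mu_weights[j] > l_max
               for i in range(k) for j in range(i + 1, k))


def fixed_assignments(k):
    """Values for x_{mu_i, j}: contracted vertex i is pinned to block i."""
    return {("x", i, j): int(i == j) for i in range(k) for j in range(k)}
```

If the check fails, two contracted vertices could legally share a block, so pinning each of them to its own block might exclude feasible solutions.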
Providing a Start Solution to the Solver.
The integer linear program performs a significant amount of work in branches which correspond to solutions that are worse than the input partitioning. Only very few solutions, if any, are better than the given partition. However, we already know a fairly good partition (the given partition from the input) and give this partition to the solver by setting the corresponding initial values for all variables. This ensures that the integer linear program solver can omit many branches and hence reduces the time needed to solve the integer linear program.
Solution Quality as a Constraint.
Since we are only interested in improved partitions, we can add an additional constraint that disallows solutions which have a worse objective than the input partition. Indeed, the objective function of the linear program is linear, and hence the additional constraint is also linear. Depending on the objective value, this reduces the number of branches that the linear program solver needs to look at. However, note that this comes at the cost of an additional constraint that needs to be evaluated.
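Written out, the additional constraint takes the following form. The notation is an assumption chosen for this illustration: $E'$ is the edge set of the coarse model, $e_{uv}$ the binary cut indicator of an edge, $\omega$ the edge weight and $C_0$ the cut value of the input partition.

```latex
\sum_{\{u,v\} \in E'} e_{uv} \cdot \omega(\{u,v\}) \;\le\; C_0
```

A variant that only admits strictly better solutions would need a strict inequality; if all edge weights are integers, one way to express it is to use $C_0 - 1$ as the right-hand side.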
4.4 Vertex Selection Strategies
The algorithm above works for different vertex sets $K$ that should be kept in the coarse model. There is an obvious trade-off: on the one hand, the set $K$ should not be too large, otherwise the coarse model would be large and hence the integer linear programming solver needs a large amount of time to find a solution. On the other hand, the set $K$ should also not be too small, since this restricts the number of possible vertex movements, and hence the approach is unlikely to find an improved solution. We now explain different strategies to select the vertex set $K$. In any case, while we add vertices to the set $K$, we compute the number of nonzeros in the corresponding ILP. We stop adding vertices when the number of nonzeros in the corresponding ILP exceeds a given limit, which is a parameter of the algorithm.
Vertices Close to Input Cut.
The intuition of the first strategy, Boundary, is that changes or improvements of the partition will occur reasonably close to the cut of the input partition. In this simple strategy our algorithm tries to use all boundary vertices as the set of kept vertices. In order to adhere to the constraint on the number of nonzeros in the ILP, we add the vertices of the boundary uniformly at random and stop if the limit on the number of nonzeros is reached. If the algorithm managed to add all boundary vertices without exceeding the specified number of nonzeros, we do the following extension: we perform a breadth-first search that is initialized with a random permutation of the boundary vertices. All additional vertices that are reached by the BFS are added to the set. As soon as the limit on the number of nonzeros is reached, the algorithm stops.
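The Boundary strategy can be sketched as below. Several details are assumptions for the sake of a short example: the graph is given as adjacency lists `adj`, the number of nonzeros of the ILP induced by a candidate set is supplied by a caller-provided function `nonzeros` (hypothetical), and the sketch stops just before the limit would be exceeded.

```python
import random
from collections import deque


def boundary_vertices(adj, part):
    """Vertices with at least one neighbor in a different block."""
    return [v for v in range(len(adj))
            if any(part[u] != part[v] for u in adj[v])]


def select_boundary(adj, part, nonzeros, limit, rng):
    """Add boundary vertices in random order, then grow the set by BFS."""
    kept = set()
    seeds = boundary_vertices(adj, part)
    rng.shuffle(seeds)  # uniformly random order of the boundary vertices
    queue = deque()
    for v in seeds:
        if nonzeros(kept | {v}) > limit:
            return kept
        kept.add(v)
        queue.append(v)
    # All boundary vertices fit: extend by breadth-first search.
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in kept:
                if nonzeros(kept | {u}) > limit:
                    return kept
                kept.add(u)
                queue.append(u)
    return kept
```

For example, `select_boundary(adj, part, len, 4, random.Random(0))` uses the size of the set as a stand-in for the nonzero count; a real implementation would evaluate the ILP size of the induced model instead.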
Start at Promising Vertices.
Especially for high values of $k$ the boundary contains many vertices. The Boundary strategy quickly adds a lot of random vertices while ignoring vertices that have high gain. However, note that even in good partitions it is possible that vertices with positive gain exist but cannot be moved due to the balance constraint.
Hence, our second strategy, Gain, tries to fix this issue by starting a breadth-first search initialized with only high gain vertices. More precisely, we initialize the BFS with each vertex whose gain is at least a given threshold, where the threshold is a tuning parameter. Our last strategy, TopVertices, starts by sorting the boundary vertices by their gain. We break ties uniformly at random. Vertices are then traversed in decreasing order (highest gain vertices first) and for each start vertex our algorithm adds all vertices within a predefined distance of it to the model. The algorithm stops as soon as the number of nonzeros exceeds the limit.
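The seed selection of the two gain-based strategies can be sketched as follows. The data layout and names are assumptions for illustration: `adj[v]` maps each neighbor of `v` to the edge weight, the gain threshold is taken to be a lower bound, and `max_size` is a simplified stand-in for the limit on the number of nonzeros.

```python
import random
from collections import deque


def gain(v, adj, part, k):
    """Maximum decrease of the cut when moving v to another block (k >= 2)."""
    conn = [0] * k
    for u, w in adj[v].items():
        conn[part[u]] += w
    return max(conn[b] for b in range(k) if b != part[v]) - conn[part[v]]


def gain_seeds(adj, part, k, threshold):
    """Gain strategy: start vertices of the BFS are all vertices with high gain."""
    return [v for v in adj if gain(v, adj, part, k) >= threshold]


def ball(adj, start, depth):
    """All vertices within BFS distance `depth` of `start`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if dist[v] == depth:
            continue
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return set(dist)


def top_vertices(adj, part, k, depth, max_size, rng):
    """TopVertices strategy: grow balls around boundary vertices by gain."""
    boundary = [v for v in adj if any(part[u] != part[v] for u in adj[v])]
    rng.shuffle(boundary)  # random tie-breaking, the sort below is stable
    boundary.sort(key=lambda v: gain(v, adj, part, k), reverse=True)
    kept = set()
    for v in boundary:
        kept |= ball(adj, v, depth)
        if len(kept) > max_size:
            break
    return kept
```

The seeds returned by `gain_seeds` would then be handed to a breadth-first search that grows the set until the nonzero limit is reached.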
Early gain-based local search heuristics for the balanced graph partitioning problem searched for pairwise swaps with positive gain [17, 25]. More recent algorithms generalized this idea to also search for cycles or paths with positive total gain [35]. An important advantage of our new approach is that we solve the combination problem to optimality, i.e. our algorithm finds the best combination of vertex movements of the selected vertices w.r.t. the input partition of the original graph. Therefore we can also find more complex optimizations that cannot be reduced to positive gain cycles and paths.
5 Experiments
5.1 Experimental Setup and Methodology
We implemented the algorithms using C++17 and compiled all codes using g++ 7.2.0 with full optimization (-O3). We use Gurobi 7.5.2 as an ILP solver and always use its parallel version. We perform experiments on the Phase 2 Haswell nodes of the SuperMUC supercomputer. The Phase 2 of SuperMUC consists of 3072 nodes, each with two Haswell Xeon E5-2697 v3 processors. Each node has 28 cores at 2.6GHz, as well as 64GB of main memory and runs the SUSE Linux Enterprise Server (SLES) operating system. Unless otherwise mentioned, our approach uses the shared-memory parallel variant of Gurobi using all 28 cores of a single node of the machine. In general, we perform five repetitions per instance and report the average running time as well as cut. Unless otherwise mentioned, we use a time limit for the integer linear program. When the time limit is passed, the integer linear program solver outputs the best solution that has currently been discovered. This solution does not have to be optimal. Note that we do not perform experiments with Metis [24] and Scotch [32] here, since previous papers, e.g. [33, 34], have already shown that the solution quality obtained is much worse than results achieved in the Walshaw benchmark. When averaging over multiple instances, we use the geometric mean in order to give every instance the same influence on the final score.
Performance Plots.
These plots relate the fastest running time to the running time of each other ILP-based local search algorithm on a per-instance basis. For each algorithm, these ratios are sorted in increasing order. The plots show the ratio $t_{\text{best}} / t_{\text{algorithm}}$ on the y-axis to highlight the instances in which each algorithm performs badly. For plots in which we measure solution quality, the y-axis shows the ratio $\text{cut}_{\text{best}} / \text{cut}_{\text{algorithm}}$. A point close to zero indicates that the running time/quality of the algorithm was considerably worse than the fastest/best algorithm on the same instance. A value of one therefore indicates that the corresponding algorithm was one of the fastest/best algorithms to compute the solution. Thus an algorithm is considered to outperform another algorithm if its corresponding ratio values are above those of the other algorithm. In order to include instances that hit the time limit, we set the corresponding values below zero for ratio computations.
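The aggregation used in the evaluation can be sketched in a few lines. The data layout is an assumption: `times` maps an algorithm name to its running times per instance, `None` marks a run that hit the time limit, and the value `-1.0` is an arbitrary choice for "below zero".

```python
from math import exp, log


def geometric_mean(values):
    """Geometric mean of positive values."""
    return exp(sum(log(x) for x in values) / len(values))


def performance_ratios(times):
    """Per algorithm, the sorted ratios (best time / own time) over all instances."""
    algorithms = list(times)
    num_instances = len(times[algorithms[0]])
    ratios = {a: [] for a in algorithms}
    for i in range(num_instances):
        finished = [times[a][i] for a in algorithms if times[a][i] is not None]
        best = min(finished) if finished else None
        for a in algorithms:
            t = times[a][i]
            # Runs that hit the time limit are placed below zero.
            ratios[a].append(-1.0 if t is None else best / t)
    return {a: sorted(r) for a, r in ratios.items()}
```

The same function applies to solution quality when cut values are passed instead of running times.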
Instances.
We perform experiments on two sets of instances. Set A is used to determine the performance of the integer linear programming optimizations and to tune the algorithm. We obtained these instances from the Florida Sparse Matrix collection [10] and the 10th DIMACS Implementation Challenge [5]. Set B contains all graphs from Chris Walshaw’s graph partitioning benchmark archive [40, 41]. This archive is a collection of instances from finite-element applications and VLSI design and is one of the default benchmarking sets for graph partitioning.
Table 1 gives basic properties of the graphs from both benchmark sets. We ran the unoptimized integer linear program that solves the graph partitioning problem to optimality from Section 4.1 on the five smallest instances from the Walshaw benchmark set. With a time limit of minutes, the solver has only been able to compute a solution for two graphs with . For higher values of the solver was unable to find any solution in the time limit. Even applying feasible optimizations does not increase the amount of ILPs solved. Hence, we omit further experiments in which we run an ILP solver on the full graph.
Table 1: Basic properties of the graphs from both benchmark sets.
Graph  vertices  edges  Graph  vertices  edges  
Walshaw Graphs (Set B)  Walshaw Graphs (Set B)  
add20  2 395  7 462  wing  62 032  K 
data  2 851  15 093  brack2  62 631  K 
3elt  4 720  13 722  finan512  74 752  K 
uk  4 824  6 837  fe_tooth  78 136  K 
add32  4 960  9 462  fe_rotor  99 617  K 
bcsstk33  8 738  K  598a  110 971  K 
whitaker3  9 800  28 989  fe_ocean  143 437  K 
crack  10 240  30 380  144  144 649  M 
wing_nodal  10 937  75 488  wave  156 317  M 
fe_4elt2  11 143  32 818  m14b  214 765  M 
vibrobox  12 328  K  auto  448 695  M 
bcsstk29  13 992  K  
4elt  15 606  45 878  Parameter Tuning (Set A)  
fe_sphere  16 386  49 152  delaunay_n15  32 768  98 274 
cti  16 840  48 232  rgg_15  32 768  K 
memplus  17 758  54 196  2cubes_sphere  101 492  K 
cs4  22 499  43 858  cfd2  123 440  M 
bcsstk30  28 924  M  boneS01  127 224  M 
bcsstk31  35 588  K  Dubcova3  146 689  M 
fe_pwt  36 519  K  G2_circuit  150 102  K 
bcsstk32  44 609  K  thermal2  1 227 087  M 
fe_body  45 087  K  as365  3 799 275  M 
t60k  60 005  89 440  adaptive  6 815 744  M 
5.2 Impact of Optimizations
We now evaluate the impact of the optimization strategies for the ILP that we presented in Section 4.3. In this section, we use the variant of our local search algorithm in which the set of kept vertices is obtained by starting depth-one breadth-first search at the highest gain vertices, and set the limit on the nonzeros in the ILP to . However, we expect the results in terms of speedup to be similar for different vertex selection strategies. To evaluate the ILP performance, we run KaFFPa using the strong preconfiguration on each of the graphs from set A using and and then use the computed partition as input to each ILP (with the different optimizations). As the optimizations do not change the objective value achieved in the ILP, we only report running times of our different approaches. We set the time limit of the ILP solver to 30 minutes.
We use five variants of our algorithm: Basic does not contain any optimizations; BasicSym enables symmetry breaking; BasicSymSSol additionally gives the input partitioning to the ILP solver. The two variants BSSSConst= and BSSSConst< are the same as BasicSymSSol with additional constraints: BSSSConst= has the additional constraint that the objective has to be smaller than or equal to that of the start solution, BSSSConst< has the constraint that the objective value of a solution must be strictly better than the objective value of the start solution. Figure 3 summarises the results.
In our experiments, the Basic configuration reaches the time limit in 95 out of the 300 runs. Overall, enabling symmetry breaking drastically speeds up computations. On all of the instances which the Basic configuration could solve within the time limit, each other configuration is faster than the Basic configuration. Symmetry breaking speeds up computations by a factor of 41 in the geometric mean on those instances. The largest obtained speedup on those instances was a factor of 5663 on the graph adaptive for . The BasicSym configuration solves all but the two instances (boneS01, ) and (Dubcova3, ) within the time limit. Additionally providing the start solution (BasicSymSSol) gives an additional speedup of 22% on average. Over the Basic configuration, the average speedup is 50 with the largest speedup being 6495 and the smallest speedup being 47%. This configuration can solve all instances within the time limit except the instance boneS01 for . Providing the objective function as a constraint (or strictly smaller constraint) does not further reduce the running time of the solver. Instead, the additional constraints even increase the running time. We attribute this to the fact that the solver has to do additional work to evaluate the constraint. We conclude that BasicSymSSol is the fastest configuration of the ILP. Hence, we use this configuration in all the following experiments. Moreover, from Figure 2 we can see that this configuration can solve most of the instances within the time limit if the number of nonzeros in the ILP is below . Hence, we set the parameter to in the following section.
5.3 Vertex Selection Rules
We now evaluate the vertex selection strategies to find the set of vertices that model the ILP. We look at all strategies described in Section 4.4, i.e. Boundary, Gain with the parameter as well as TopVertices for . To evaluate the different selection strategies, we use the best of five runs of KaFFPa strong on each of the graphs from set using and and then use the computed partition as input to the ILP (with different sets ). Table 2 summarizes the results of the experiment, i.e. the number of cases in which our algorithm was able to improve the result, the average running time in seconds for these selection strategies as well as the number of cases in which the strategy computed the best result (the partition having the lowest cut). We set the time limit to days to be able to finish almost all runs without running into timeout. For the average running time we exclude all graphs in which at least one algorithm did not finish in days (rgg_15 , delaunay_n15 , G2_circuit ). If multiple runs share the best result, they are all counted. However, when no algorithm improves the input partition on a graph, we do not count them.
Gain  TopVertices  Boundary  
Relative Number of Improvements  
2  70%  70%  70%  50%  70%  70%  70% 
4  50%  60%  80%  70%  70%  70%  80% 
8  50%  60%  78%  60%  60%  60%  48% 
16  30%  50%  70%  40%  30%  30%  40% 
32  60%  60%  46%  50%  50%  20%  20% 
64  70%  70%  50%  30%  20%  20%  0% 
Average Running Time  
2  189.943s  292.573s  357.145s  34.045s  61.152s  92.452s  684.198s 
4  996.934s  628.950s  428.353s  87.357s  255.223s  558.578s  1467.595s 
8  552.183s  244.470s  244.046s  105.737s  167.164s  340.900s  96.763s 
16  118.532s  52.547s  90.363s  53.385s  141.814s  243.957s  34.790s 
32  40.300s  24.607s  94.146s  27.156s  80.252s  116.023s  7.596s 
64  15.866s  21.908s  24.253s  14.627s  30.558s  44.813s  4.187s 
Relative Number Best Algorithm  
2  20%  60%  50%  10%  10%  0%  60% 
4  10%  0%  50%  10%  0%  0%  30% 
8  0%  20%  30%  10%  10%  10%  26% 
16  0%  10%  54%  10%  0%  10%  20% 
32  0%  8%  38%  0%  0%  0%  4% 
64  0%  16%  36%  0%  0%  0%  0% 
Looking at the number of improvements, the Boundary strategy is able to improve the input for small values of $k$, but with increasing number of blocks improvements decrease to no improvement in all runs with $k = 64$. Because of the limit on the number of nonzeros, the ILP contains only random boundary vertices for large values of $k$ in this case. Hence, there are not sufficiently many high gain vertices in the model and fewer improvements for large values of $k$ are expected. For small values of $k$, the Boundary strategy can improve as many instances as the Gain strategy but the average running times are higher.
For , the strategy Gain has the highest number of improvements, for it is surpassed by the strategy Gain. However, the strategy Gain finds the best cuts in most cases among all tested strategies. Due to the way these strategies are designed, they are able to put a lot of high gain vertices into the model as well as vertices that can be used to balance vertex movements. The TopVertices strategies are overall also able to find a large number of improvements. However, the found improvements are typically smaller than the Gain strategies. This is due to the fact that the TopVertices strategies grow BFS balls with a predefined depth around high gain vertices first, and later on are not able to include vertices that could be used to balance their movement. Hence, there are less potential vertex movements that could yield an improvement.
For almost all strategies, we can see that the average running time decreases as the number of blocks increases. This happens because we limit the number of nonzeros in our ILP. As the number of nonzeros grows linearly with the underlying model size, the models are far smaller for higher values of $k$. Using symmetry breaking, we already fixed the block of the contracted vertices, i.e. the vertices which represent the vertices that are not kept in the model. Thus the ILP solver can quickly prune branches which would place vertices connected heavily to one of these vertices in a different block. Additionally, our data indicate that a large number of small areas in our model results in faster solve times than a model that contains few large areas. The performance plot in Figure 3 shows that the strategies Boundary, TopVertices and Gain have lower running times than other strategies. These strategies all select a large number of vertices to initialize the breadth-first search. Therefore they output a vertex set that is the union of many small areas around these vertices. Variants that initialize the breadth-first search with fewer vertices have fewer areas; however, each of the areas is larger.
5.4 Walshaw Benchmark
In this section, we present the results when running our best configuration on all graphs from Walshaw’s benchmark archive. Note that the rules of the benchmark imply that running time is not an issue, but algorithms should achieve the smallest possible cut value while satisfying the balance constraint. We run our algorithm in the following setting: We take existing partitions from the archive and use those as input to our algorithm. As indicated by the experiments in Section 5.3, the vertex selection strategies Gain perform best for different values of . Thus we use the variant Gain for and both Gain and Gain otherwise in this section. We repeat the experiment once for each instance (graph, ) and run our algorithm for and . For larger values of , we strengthen our strategy and use as a bound for the number of nonzeros. Table 3 summarizes the results and Table 7 in the Appendix gives detailed per-instance results.
[Table 3: one row per number of blocks (2, 4, 8, 16, 32, 64) and a final row "sum".]
When running our algorithm using the currently best partitions provided in the benchmark, we are able to improve 38% of the currently reported perfectly balanced results. We are able to improve a larger number of results for larger values of ; more specifically, out of the partitions with , we can improve of all perfectly balanced partitions. This is due to the fact that the graph partitioning problem becomes more difficult for larger values of . There is a wide range of improvements, with the smallest improvement being for graph auto with and and the largest improvement that we found being for fe_body for and . The largest absolute improvement we found is for bcsstk32 with and . In general, the total number of improvements decreases if more imbalance is allowed. This is expected, since traditional local search methods then have more freedom to move vertices. However, the number of improvements still shows that the method is able to improve a large number of partitions for large values of even if more imbalance is allowed.
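To make the notions of absolute and relative improvement concrete, the following sketch computes both for three rows of the first table in Appendix A. It reads each pair of values as (new cut, previously reported cut), which is an assumption, since the flattened column headers do not state it. The helper name `improvement` is hypothetical.

```python
def improvement(new_cut, old_cut):
    """Return (absolute, relative in percent) improvement of new_cut over old_cut."""
    absolute = old_cut - new_cut
    return absolute, 100.0 * absolute / old_cut

# (graph, k) -> (new cut, previously reported cut); values copied from the
# first table in Appendix A under the column reading assumed above.
rows = {
    ("auto", 2): (10101, 10103),
    ("fe_body", 32): (2797, 2846),
    ("bcsstk32", 64): (90778, 90895),
}

for (graph, k), (new, old) in rows.items():
    absolute, relative = improvement(new, old)
    print(f"{graph}, k={k}: {absolute} fewer cut edges ({relative:.2f}%)")
```

Under this reading, a small relative improvement on a large cut can still correspond to a large absolute number of saved edges, which is why both measures are reported.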
6 Conclusions and Future Work
We presented a novel metaheuristic for the balanced graph partitioning problem. Our approach is based on an integer linear program that solves a model to combine unconstrained vertex movements into a global feasible improvement. Through a given input partition, we were able to use symmetry breaking and other techniques that make the approach scale to large inputs. In Walshaw’s well-known benchmark tables, we were able to improve a large number of the partitions given in the benchmark.
In the future, we plan to further improve our implementation and integrate it into the KaHIP framework. We would like to look at other objective functions as long as they can be modelled linearly. Moreover, we want to investigate whether this kind of contraction can be useful for other ILPs. It may be interesting to find cores for contraction by using the information provided by an evolutionary algorithm like KaFFPaE [34], i.e. if many of the individuals of the population of the evolutionary algorithm agree that two vertices should be put together in a block, then those should be contracted in our model. Lastly, besides using other exact techniques like branch-and-bound to solve our combination model, it may also be worthwhile to use a heuristic algorithm instead.
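The idea of finding cores for contraction from a population of partitions could be sketched as follows. This is only an illustration of the future-work idea, not part of our method or of KaFFPaE: the function name `consensus_clusters`, the `agreement` threshold and the toy input are assumptions. Two adjacent vertices are merged whenever at least the given fraction of individuals places them in the same block.

```python
def consensus_clusters(num_vertices, edges, population, agreement=1.0):
    """Merge endpoints of edges on which enough individuals agree.

    population: list of partitions, each a list mapping vertex -> block id.
    Returns a list mapping each vertex to a cluster representative.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        # Block ids are only compared within one individual, so differing
        # block labels across individuals do not matter.
        same = sum(1 for part in population if part[u] == part[v])
        if same >= agreement * len(population):
            parent[find(u)] = find(v)

    return [find(v) for v in range(num_vertices)]

# Hypothetical toy input: a path 0-1-2-3 and two individuals.
edges = [(0, 1), (1, 2), (2, 3)]
population = [[0, 0, 1, 1], [0, 0, 0, 1]]
labels = consensus_clusters(4, edges, population)
print(labels)
```

Here only vertices 0 and 1 share a block in every individual, so only they would be contracted; lowering `agreement` would contract more aggressively.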
Acknowledgements
The research leading to these results has received funding from the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement No. 340506. Moreover, the authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gausscentre.eu) for funding this project by providing computing time on the GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (www.lrz.de).
References
 [1] C. J. Alpert and A. B. Kahng. Recent Directions in Netlist Partitioning: A Survey. Integration, the VLSI Journal, 19(1-2):1–81, 1995.
 [2] C. J. Alpert, A. B. Kahng, and S. Z. Yao. Spectral Partitioning with Multiple Eigenvectors. Discrete Applied Mathematics, 90(1):3–26, 1999.
 [3] M. Armbruster. Branch-and-Cut for a Semidefinite Relaxation of Large-Scale Minimum Bisection Problems. PhD thesis, 2007.
 [4] M. Armbruster, M. Fügenschuh, C. Helmberg, and A. Martin. A Comparative Study of Linear and Semidefinite Branch-and-Cut Methods for Solving the Minimum Graph Bisection Problem. In Proc. of the 13th International Conference on Integer Programming and Combinatorial Optimization, volume 5035 of LNCS, pages 112–124. Springer, 2008.
 [5] D. A. Bader, H. Meyerhenke, P. Sanders, C. Schulz, A. Kappes, and D. Wagner. Benchmarking for Graph Clustering and Partitioning. In Encyclopedia of Social Network Analysis and Mining, pages 73–82. Springer, 2014.
 [6] C. Bichot and P. Siarry, editors. Graph Partitioning. Wiley, 2011.
 [7] R. Brillout. A Multi-Level Framework for Bisection Heuristics. 2009.
 [8] T. N. Bui and C. Jones. Finding Good Approximate Vertex and Edge Partitions is NP-Hard. Information Processing Letters, 42(3):153–159, 1992.
 [9] A. Buluç, H. Meyerhenke, I. Safro, P. Sanders, and C. Schulz. Recent Advances in Graph Partitioning. In Algorithm Engineering, pages 117–158. Springer, 2016.
 [10] T. Davis. The University of Florida Sparse Matrix Collection.
 [11] D. Delling, A. V. Goldberg, T. Pajor, and R. F. Werneck. Customizable Route Planning. In Proc. of the 10th International Symposium on Experimental Algorithms, volume 6630 of LNCS, pages 376–387. Springer, 2011.
 [12] D. Delling, A. V. Goldberg, I. Razenshteyn, and R. F. Werneck. Exact Combinatorial Branch-and-Bound for Graph Bisection. In Proc. of the 12th Workshop on Algorithm Engineering and Experimentation (ALENEX’12), pages 30–44, 2012.
 [13] D. Delling and R. F. Werneck. Better Bounds for Graph Bisection. In Proc. of the 20th European Symposium on Algorithms, volume 7501 of LNCS, pages 407–418, 2012.
 [14] A. Feldmann and P. Widmayer. An Time Algorithm to Compute the Bisection Width of Solid Grid Graphs. In Proc. of the 19th European Conference on Algorithms, volume 6942 of LNCS, pages 143–154. Springer, 2011.

 [15] A. Felner. Finding Optimal Solutions to the Graph Partitioning Problem with Heuristic Search. Annals of Mathematics and Artificial Intelligence, 45:293–322, 2005.
 [16] C. E. Ferreira, A. Martin, C. C. De Souza, R. Weismantel, and L. A. Wolsey. The Node Capacitated Graph Partitioning Problem: A Computational Study. Mathematical Programming, 81(2):229–256, 1998.
 [17] C. M. Fiduccia and R. M. Mattheyses. A Linear-Time Heuristic for Improving Network Partitions. In Proc. of the 19th Conference on Design Automation, pages 175–181, 1982.
 [18] J. Fietz, M. Krause, C. Schulz, P. Sanders, and V. Heuveline. Optimized Hybrid Parallel Lattice Boltzmann Fluid Flow Simulations on Complex Geometries. In Proc. of Euro-Par 2012 Parallel Processing, volume 7484 of LNCS, pages 818–829. Springer, 2012.
 [19] P. Galinier, Z. Boujbel, and M. C. Fernandes. An Efficient Memetic Algorithm for the Graph Partitioning Problem. Annals of Operations Research, 191(1):1–22, 2011.
 [20] A. George. Nested Dissection of a Regular Finite Element Mesh. SIAM Journal on Numerical Analysis, 10(2):345–363, 1973.
 [21] W. W. Hager, D. T. Phan, and H. Zhang. An Exact Algorithm for Graph Partitioning. Mathematical Programming, 137(1-2):531–556, 2013.
 [22] M. Hein and S. Setzer. Beyond Spectral Clustering - Tight Relaxations of Balanced Graph Cuts. In Advances in Neural Information Processing Systems, pages 2366–2374, 2011.
 [23] S. E. Karisch, F. Rendl, and J. Clausen. Solving Graph Bisection Problems with Semidefinite Programming. INFORMS Journal on Computing, 12(3):177–191, 2000.
 [24] G. Karypis and V. Kumar. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM Journal on Scientific Computing, 20(1):359–392, 1998.
 [25] B. W. Kernighan and S. Lin. An Efficient Heuristic Procedure for Partitioning Graphs. The Bell System Technical Journal, 49(1):291–307, 1970.
 [26] T. Kieritz, D. Luxen, P. Sanders, and C. Vetter. Distributed Time-Dependent Contraction Hierarchies. In Proc. of the 9th International Symposium on Experimental Algorithms, volume 6049 of LNCS, pages 83–93. Springer, 2010.
 [27] A. H. Land and A. G. Doig. An Automatic Method of Solving Discrete Programming Problems. Econometrica, 28(3):497–520, 1960.
 [28] U. Lauther. An Extremely Fast, Exact Algorithm for Finding Shortest Paths in Static Networks with Geographical Background, 2004.
 [29] A. Lisser and F. Rendl. Graph Partitioning using Linear and Semidefinite Programming. Mathematical Programming, 95(1):91–101, 2003. doi:10.1007/s10107-002-0342-x.
 [30] D. Luxen and D. Schieferdecker. Candidate Sets for Alternative Routes in Road Networks. In Proc. of the 11th International Symposium on Experimental Algorithms (SEA’12), volume 7276 of LNCS, pages 260–270. Springer, 2012.
 [31] R. H. Möhring, H. Schilling, B. Schütz, D. Wagner, and T. Willhalm. Partitioning Graphs to Speedup Dijkstra’s Algorithm. Journal of Experimental Algorithmics (JEA), 11(2006), 2007.
 [32] F. Pellegrini. Scotch Home Page. http://www.labri.fr/pelegrin/scotch.
 [33] P. Sanders and C. Schulz. Engineering Multilevel Graph Partitioning Algorithms. In Proc. of the 19th European Symp. on Algorithms, volume 6942 of LNCS, pages 469–480. Springer, 2011.
 [34] P. Sanders and C. Schulz. Distributed Evolutionary Graph Partitioning. In Proc. of the 12th Workshop on Algorithm Engineering and Experimentation (ALENEX’12), pages 16–29, 2012.
 [35] P. Sanders and C. Schulz. Think Locally, Act Globally: Highly Balanced Graph Partitioning. In Proc. of the 12th Int. Symp. on Experimental Algorithms (SEA’13), LNCS. Springer, 2013.
 [36] K. Schloegel, G. Karypis, and V. Kumar. Graph Partitioning for High Performance Scientific Simulations. In The Sourcebook of Parallel Computing, pages 491–541, 2003.
 [37] C. Schulz and D. Strash. Graph Partitioning Formulations and Applications to Big Data. In Encyclopedia on Big Data Technologies, 2018, to appear.
 [38] M. Sellmann, N. Sensen, and L. Timajev. Multicommodity Flow Approximation used for Exact Graph Partitioning. In Proc. of the 11th European Symposium on Algorithms, volume 2832 of LNCS, pages 752–764. Springer, 2003.
 [39] N. Sensen. Lower Bounds and Exact Algorithms for the Graph Partitioning Problem Using Multicommodity Flows. In Proc. of the 9th European Symposium on Algorithms, volume 2161 of LNCS, pages 391–403. Springer, 2001.
 [40] A. J. Soper, C. Walshaw, and M. Cross. A Combined Evolutionary Search and Multilevel Optimisation Approach to Graph-Partitioning. Journal of Global Optimization, 29(2):225–241, 2004.
 [41] C. Walshaw. Walshaw Partitioning Benchmark. http://staffweb.cms.gre.ac.uk/~wc06/partition/.
 [42] C. Walshaw and M. Cross. JOSTLE: Parallel Multilevel Graph-Partitioning Software – An Overview. In Mesh Partitioning Techniques and Domain Decomposition Techniques, pages 27–58. 2007.
Appendix A Additional Tables
Graph / k  2  4  8  16  32  64  

add20  596  596  1151  1151  1681  1681  2040  2040  *2360  2361  ^2947  2949 
data  189  189  382  382  668  668  1127  1127  1799  1799  2839  2839 
3elt  90  90  201  201  345  345  573  573  960  960  1532  1532 
uk  19  19  41  41  83  83  145  145  *^246  247  408  408 
add32  11  11  34  34  67  67  118  118  213  213  485  485 
bcsstk33  10171  10171  21717  21717  34437  34437  54680  54680  77414  77414  107185  107185 
whitaker3  127  127  381  381  656  656  1085  1085  1668  1668  2491  2491 
crack  184  184  366  366  679  679  1088  1088  *1678  1679  2535  2535 
wing_nodal  1707  1707  3575  3575  5435  5435  *8333  8334  11768  11768  *^15774  15775 
fe_4elt2  130  130  349  349  607  607  1007  1007  1614  1614  2475  2478 
vibrobox  10343  10343  18976  18976  24484  24484  *^31848  31850  *39474  39477  *46568  46571 
bcsstk29  2843  2843  8035  8035  13975  13975  21905  21905  *34733  34737  55241  55241 
4elt  139  139  326  326  545  545  *^933  934  1551  1551  ^2564  2565 
fe_sphere  386  386  768  768  1156  1156  1714  1714  2488  2488  3543  3543 
cti  334  334  954  954  1788  1788  2793  2793  4046  4046  5629  5629 
memplus  *5499  5513  *9442  9448  *^11710  11712  ^12893  12895  *^13947  13953  ^16188  16223 
cs4  369  369  932  932  1440  1440  2075  2075  *2907  2928  ^4025  4027 
bcsstk30  6394  6394  16651  16651  34846  34846  *^70407  70408  113336  113336  *171148  171153 
bcsstk31  2762  2762  7351  7351  *13280  13283  *23857  23869  *37143  37158  *57354  57402 
fe_pwt  340  340  705  705  1447  1447  2830  2830  *^5574  5575  ^8177  8180 
bcsstk32  4667  4667  9311  9311  *^20008  20009  *^36249  36250  *60013  60038  *90778  90895 
fe_body  262  262  599  599  1033  1033  *1722  1736  ^2797  2846  *4728  4730 
t60k  79  79  209  209  456  456  ^812  813  1323  1323  *^2074  2077 
wing  789  789  1623  1623  2504  2504  ^3870  3876  ^5592  5594  ^7622  7625 
brack2  731  731  3084  3084  7140  7140  11570  11570  ^17382  17387  *25805  25808 
finan512  162  162  324  324  648  648  1296  1296  2592  2592  10560  10560 
fe_tooth  3816  3816  *6888  6889  *11414  11418  *^17352  17355  *24879  24885  *34234  34240 
fe_rotor  2098  2098  7222  7222  ^12838  12841  *20389  20391  *31132  31141  *45677  45687 
598a  2398  2398  8001  8001  *15921  15922  *25694  25702  *38576  38581  *^56094  56097 
fe_ocean  464  464  1882  1882  4188  4188  7713  7713  ^12667  12684  ^20061  20069 
144  6486  6486  ^15194  15196  25273  25273  *37566  37571  *55467  55475  *77391  77402 
wave  8677  8677  *17193  17198  *29188  29198  *42639  42646  *61100  61108  ^83987  83994 
m14b  3836  3836  *13061  13062  *25834  25838  *42161  42172  *65469  65529  ^96446  96452 
auto  *^10101  10103  *27092  27094  *45991  46014  ^77391  77418  *121911  121944  ^172966  172973 
Graph / k  2  4  8  16  32  64  

add20  585  585  1147  1147  *^1680  1681  2040  2040  2361  2361  2949  2949 
data  188  188  376  376  656  656  1121  1121  1799  1799  2839  2839 
3elt  89  89  199  199  340  340  568  568  953  953  1532  1532 
uk  19  19  40  40  80  80  142  142  246  246  408  408 
add32  10  10  33  33  66  66  117  117  212  212  485  485 
bcsstk33  10097  10097  21338  21338  34175  34175  54505  54505  77195  77195  106902  106902 
whitaker3  126  126  380  380  654  654  1083  1083  1664  1664  2480  2480 
crack  183  183  362  362  676  676  1081  1081  1669  1669  2523  2523 
wing_nodal  1695  1695  3559  3559  5401  5401  8302  8302  *11731  11733  *^15734  15736 
fe_4elt2  130  130  349  349  603  603  1000  1000  1608  1608  ^2470  2472 
vibrobox  10310  10310  18943  18943  24422  24422  *^31710  31712  *^39396  39400  *46529  46541 
bcsstk29  2818  2818  8029  8029  13891  13891  21694  21694  34606  34606  *^54950  54951 
4elt  138  138  320  320  532  532  927  927  1535  1535  2546  2546 
fe_sphere  386  386  766  766  1152  1152  1708  1708  2479  2479  3534  3534 
cti  318  318  944  944  1746  1746  2759  2759  3993  3993  5594  5594 
memplus  *5452  5457  9385  9385  11672  11672  12873  12873  ^13931  13933  ^16091  16110 
cs4  366  366  925  925  1434  1434  2061  2061  2903  2903  ^3981  3982 
bcsstk30  6335  6335  16583  16583  34565  34565  69912  69912  112365  112365  170059  170059 
bcsstk31  2699  2699  7272  7272  *^13134  13137  *23333  23339  *37057  37061  *57000  57025 
fe_pwt  340  340  704  704  1432  1432  2797  2797  5514  5514  ^8128  8130 
bcsstk32  4667  4667  9180  9180  *19612  19624  35617  35617  *59501  59504  *89893  89905 
fe_body  262  262  598  598  1023  1023  1714  1714  ^2748  2756  *^4664  4674 
t60k  75  75  208  208  454  454  805  805  1313  1313  2062  2062 
wing  784  784  1610  1610  2474  2474  3857  3857  ^5576  5577  ^7585  7586 
brack2  708  708  3013  3013  7029  7029  11492  11492  *17120  17128  ^25604  25607 
finan512  162  162  324  324  648  648  1296  1296  2592  2592  10560  10560 
fe_tooth  3814  3814  *6843  6844  11358  11358  *^17264  17265  *24799  24804  ^34159  34170 
fe_rotor  2031  2031  7158  7158  12616  12616  ^20146  20152  *30975  30982  *45304  45321 
598a  2388  2388  7948  7948  15831  15831  *25620  25624  ^38410  38422  *55867  55882 
fe_ocean  ^385  387  1813  1813  *4060  4063  7616  7616  ^12523  12524  *19851  19852 
144  *6476  6478  15140  15140  *25225  25232  *37341  37347  *55258  55277  *76964  76980 
wave  *^8656  8657  ^16745  16747  *28749  28758  *42349  42354  *60617  60625  ^83451  83466 
m14b  3826  3826  12973  12973  *^25626  25627  *42067  42080  *64684  64697  ^96145  96169 
auto  9949  9949  *26611  26614  *45424  45429  *76533  76539  *120470  120489  ^171866  171880 
Graph / k  2  4  8  16  32  64  

add20  560  560  1134  1134  1673  1673  2030  2030  2346  2346  2920  2920 
data  185  185  369  369  638  638  1088  1088  1768  1768  *2781  2783 
3elt  87  87  198  198  334  334  561  561  944  944  1512  1512 
uk  18  18  39  39  78  78  139  139  240  240  397  397 
add32  10  10  33  33  66  66  117  117  212  212  476  476 
bcsstk33  10064  10064  20762  20762  34065  34065  54354  54354  76749  76749  *105737  105742 
whitaker3  126  126  378  378  649  649  1073  1073  1647  1647  *2456  2459 
crack  182  182  360  360  671  671  1070  1070  1655  1655  *^2487  2489 
wing_nodal  1678  1678  3534  3534  5360  5360  8244  8244  *11630  11632  *^15612  15613 
fe_4elt2  130  130  341  341  595  595  990  990  1593  1593  ^2431  2435 
vibrobox  10310  10310  18736  18736  24153  24153  *^31440  31443  *39197  39201  *46231  46235 
bcsstk29  2818  2818  7971  7971  13710  13710  21258  21258  33807  33807  54382  54382 
4elt  137  137  319  319  522  522  901  901  1519  1519  2512  2512 
fe_sphere  384  384  764  764  1152  1152  1696  1696  2459  2459  *^3503  3505 
cti  318  318  916  916  1714  1714  2727  2727  3941  3941  *5522  5524 
memplus  *^5352  5353  9309  9309  *^11584  11586  12834  12834  *13887  13895  *15950  15953 
cs4  360  360  917  917  *^1423  1424  2043  2043  *2884  2885  ^3979  3980 
bcsstk30  6251  6251  16372  16372  34137  34137  69357  69357  110334  110334  *168271  168274 
bcsstk31  2676  2676  7148  7148  12962  12962  *22949  22956  *36567  36587  *56025  56038 
fe_pwt  340  340  700  700  1410  1410  2754  2754  5403  5403  8036  8036 
bcsstk32  4667  4667  8725  8725  19485  19485  *^34869  34875  ^58739  58740  *89478  89479 
fe_body  262  262  598  598  1016  1016  1693  1693  *^2708  2709  *^4522  4523 
t60k  71  71  203  203  449  449  792  792  1302  1302  *^2034  2036 
wing  773  773  1593  1593  2451  2451  ^3783  3784  5559  5559  7560  7560 
brack2  684  684  2834  2834  6778  6778  *11253  11256  *^16981  16982  *^25362  25363 
finan512  162  162  324  324  648  648  1296  1296  2592  2592  10560  10560 
fe_tooth  3788  3788  6756  6756  11241  11241  *17107  17108  *24623  24625  *33779  33795 
fe_rotor  1959  1959  *^7049  7050  12445  12445  *19863  19867  *30579  30587  *44811  44822 
598a  2367  2367  7816  7816  15613  15613  *^25379  25380  *38093  38105  *55358  55364 
fe_ocean  311  311  1693  1693  3920  3920  7405  7405  ^12283  12288  19518  19518 
144  *^6430  6432  15064  15064  *24901  24905  *^36999  37003  *54800  54806  *76548  76557 
wave  8591  8591  ^16633  16638  28494  28494  42139  42139  *60334  60356  *82809  82811 
m14b  3823  3823  12948  12948  25390  25390  41778  41778  ^64354  64364  *^95575  95587 
auto  9673  9673  25789  25789  *^44724  44732  *^75665  75679  ^119131  119132  ^170295  170314 
Graph / k  2  4  8  16  32  64  

add20  536  536  1120  1120  1657  1657  2027  2027  2341  2341  2920  2920 
data  181  181  363  363  628  628  1076  1076  1743  1743  2747  2747 
3elt  87  87  197  197  329  329  557  557  930  930  1498  1498 
uk  18  18  39  39  75  75  137  137  236  236  394  394 
add32  10  10  33  33  63  63  117  117  212  212  476  476 
bcsstk33  9914  9914  20158  20158  33908  33908  54119  54119  ^76070  76079  *105297  105309 
whitaker3  126  126  376  376  644  644  1068  1068  1632  1632  *^2425  2429 
crack  182  182  360  360  666  666  1063  1063  1655  1655  *^2479  2489 
wing_nodal  1668  1668  3520  3520  5339  5339  8160  8160  *11533  11536  *^15514  15515 
fe_4elt2  130  130  335  335  578  578  979  979  1571  1571  ^2406  2412 
vibrobox  10310  10310  18690  18690  23924  23924  ^31216  31218  *^38823  38826  *45987  45994 
bcsstk29  2818  2818  7925  7925  13540  13540  20924  20924  33450  33450  53703  53703 
4elt  137  137  315  315  515  515  887  887  1493  1493  ^2478  2482 
fe_sphere  384  384  762  762  1152  1152  1678  1678  2427  2427  3456  3456 
cti  318  318  889  889  1684  1684  2701  2701  3904  3904  ^5460  5462 
memplus  *^5253  5263  *9281  9292  *^11540  11543  12799  12799  *13857  13867  *15875  15877 
cs4  353  353  908  908  1420  1420  ^2042  2043  *2855  2859  *^3959  3962 
bcsstk30  6251  6251  16165  16165  34068  34068  68323  68323  109368  109368  *166787  166790 
bcsstk31  *^2660  2662  7065  7065  *^12823  12825  *22718  22724  *36354  36358  *55250  55258 
fe_pwt  340  340  700  700  1405  1405  2737  2737  ^5305  5306  ^7956  7959 
bcsstk32  4622  4622  8441  8441  18955  18955  34374  34374  58352  58352  *88595  88598 
fe_body  262  262  588  588  1012  1012  1683  1683  *^2677  2678  ^4500  4501 
t60k  65  65  195  195  441  441  787  787  *1289  1291  *^2013  2015 
wing  770  770  *1589  1590  2440  2440  3775  3775  *^5512  5513  ^7529  7534 
brack2  660  660  2731  2731  6592  6592  *11052  11055  16765  16765  *25100  25108 
finan512  162  162  324  324  648  648  1296  1296  2592  2592  10560  10560 
fe_tooth  3773  3773  6687  6687  *^11147  11151  *16983  16985  ^24270  24274  *33387  33403 
fe_rotor  1940  1940  6779  6779  *12308  12309  *19677  19680  *30355  30356  *44368  44381 
598a  2336  2336  *7722  7724  15413  15413  25198  25198  ^37632  37644  *54677  54684 
fe_ocean  311  311  1686  1686  3886  3886  7338  7338  ^12033  12034  *^19391  19394 
144  6345  6345  ^14978  14981  *24174  24179  *^36608  36608  *54160  54168  *75753  75777 
wave  8524  8524  *16528  16531  28489  28489  *^42024  42025  *^59608  59611  *81989  82006 
m14b  3802  3802  *^12858  12859  25126  25126  *41097  41098  *63397  63411  *94123  94140 
auto  9450  9450  25271  25271  44206  44206  *74266  74272  *118998  119004  ^169260  169290 