Faster exact solution of sparse MaxCut and QUBO problems

02/04/2022
by   Daniel Rehfeldt, et al.
0

The maximum-cut problem is one of the fundamental problems in combinatorial optimization. With the advent of quantum computers, both the maximum-cut and the equivalent quadratic unconstrained binary optimization problem have experienced much interest in recent years. This article aims to advance the state of the art in the exact solution of both problems – by using mathematical programming techniques on digital computers. The main focus lies on sparse problem instances, although also dense ones can be solved. We enhance several algorithmic components such as reduction techniques and cutting-plane separation algorithms, and combine them in an exact branch-and-cut solver. Furthermore, we provide a parallel implementation. The new solver is shown to significantly outperform existing state-of-the-art software for sparse MaxCut and QUBO instances. Furthermore, we improve the best known bounds for several instances from the 7th DIMACS Challenge and the QPLIB, and solve some of them (for the first time) to optimality.

READ FULL TEXT VIEW PDF

Authors

page 1

page 2

page 3

page 4

04/24/2020

Faster Parallel Multiterminal Cuts

We give an improved branch-and-bound solver for the multiterminal cut pr...
11/05/2020

A revisited branch-and-cut algorithm for large-scale orienteering problems

The orienteering problem is a route optimization problem which consists ...
09/10/2019

Faster quantum and classical SDP approximations for quadratic binary optimization

We give a quantum speedup for solving the canonical semidefinite program...
04/30/2020

On Solving Cycle Problems with Branch-and-Cut: Extending Shrinking and Exact Subcycle Elimination Separation Algorithms

In this paper, we extend techniques developed in the context of the Trav...
09/17/2019

Preprocessing and Cutting Planes with Conflict Graphs

This paper addresses the implementation of conflict graph-based routines...
07/11/2019

State-of-The-Art Sparse Direct Solvers

In this chapter we will give an insight into modern sparse elimination m...
03/04/2022

Improving Ant Colony Optimization Efficiency for Solving Large TSP Instances

Ant Colony Optimization (ACO) is a family of nature-inspired metaheurist...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Given an undirected graph , and edge weights , the maximum-cut (MaxCut) problem is to find a partition of such that the summed weight of the edges between and is maximized. MaxCut is one of the fundamental -hard optimization problems (28) and has applications for example in VLSI design (3) and the theory of spin glasses in physics (33)111As a side note, the 2021 Nobel prize in Physics was awarded for work on spin glasses.. The latter application is particularly interesting, because it requires an exact solution of the MaxCut problem.

A problem that is equivalent to MaxCut is the quadratic unconstrained binary optimization (QUBO) problem. Given a matrix , the corresponding QUBO problem can be formulated as

Any QUBO instance can be formulated as a MaxCut instance in a graph with vertices, and any MaxCut instance on a graph can be formulated as a QUBO instance with , see e.g. (4). The focus of this article is mostly on MaxCut algorithms, but due to the just mentioned equivalence, all results can be (and indeed are) applied to QUBO as well.

The huge recent interest in quantum computing has also put MaxCut and QUBO in the spotlight: Both of them can be heuristically solved by current quantum annealers. However, Jünger et al. 

(25) demonstrate on a wide range of test-sets that digital computing methods prevail against state-of-the-art quantum annealers.

For digital computers, many heuristics have been proposed both for MaxCut and QUBO. See Dunning et. al. (13) for a recent overview. There have also been various articles on exact solution. See Barahona et al. (4) for an early, Rendl et al. (37) for a more recent, and Jünger et al. (25) for an up-to-date overview. In the last years, more focus has been put on the development of methods that are best suited for dense instances, see for example (22; 23; 31) for state-of-the-art methods. However, the maximum number of nodes for MaxCut (or number of variables for QUBO) instances that can be handled by these methods is roughly 300. In contrast, this article aims to advance the state of the art in the practical exact solution of sparse MaxCut and QUBO instances. The largest (sparse) instance solved in this article has more than 10 000 nodes.

1.1 Contribution and structure

This article describes the design and implementation of a branch-and-cut based MaxCut and QUBO solver. In particular, we suggest several algorithmic improvements of key components of a branch-and-cut framework.

Section 2

shows how to efficiently solve a well-known linear programming (LP) relaxation for the MaxCut problem by using cutting planes. Among other things, we demonstrate how the separation of maximally violated constraints, which was described by many authors as being too slow for practical use, can be realized with quite moderate run times.

Section 3 is concerned with another vital component within branch-and-cut: reduction techniques. We review methods from the literature and propose new ones. The reduction methods can be applied for preprocessing and domain propagation.

Section 4 shows how to integrate the techniques from the previous to sections as well as several additional methods in a branch-and-cut algorithm. Parallelization is also discussed.

Section 5 provides computational results of the newly implemented MaxCut and QUBO solver on a large collection of test-sets from the literature. It is shown that the new solver outperforms the previous state of the art. Furthermore, the best known solutions of several benchmark instances can be improved and one is even solved (for the first time) to optimality.

1.2 Preliminaries and notation

In the remainder of this article, we assume that a MaxCut instance with graph and edge weights is given. Graphs are always assumed to be undirected and simple, i.e., without parallel edges or self-loops. Given a graph , we refer to the vertices and edges of any subgraph as and respectively, An edge between vertices is denoted by . An edge set is called a cycle. A cycle is called simple if all its vertices have degree in . An edge is called a chord of if both and are contained in (an edge of) . If no such exists, we say that is chordless. Given a graph and a , we define the induced edge cut as .

Finally, for any function with finite, and any , we define .

2 Solving the relaxation: efficient separation of odd-cycle cuts

This section is concerned with an integer programming (IP) formulation for MaxCut due to Barahona and Mahjoub (5), given below.

Formulation 1.

Odd-cycle cuts

(1)
(2)
(3)

The formulation is based on the observation that for any edge cut and any cycle the number of their common edges, namely , is even. This property is enforced by the constraints (2). These constraints are called cycle inequalities.

2.1 Cutting plane separation

Barahona and Mahjoub (5) show that the LP-relaxation of Formulation 1 can be solved in polynomial time. More precisely, they describe how to separate the constraints (3) in polynomial time, as demonstrated in the following. First, rewrite the constraints (3) as

(4)

Next, construct a new graph from the MaxCut graph . This graph consists of two copies and of , connected by the following additional edges. For each let and be the corresponding vertices in and , respectively. For each edge let and be in

. Finally, for any (LP-relaxation vector)

define the following edge weights on : For each , set and . The construction is exemplified in Figure 3. Consider, for example, the edge in Figure (a)a. The weight of the corresponding (dashed) edges and in Figure (b)b is . The weight of the corresponding (bold) edges and is .

(a) Original graph.

(b) Auxiliary graph, consisting of two copies of the original graph (bold edges), and additional connecting edges (dashed).
Figure 3: MaxCut graph and corresponding auxiliary graph for cycle cut separation.

Given an LP-relaxation vector , we can find violated inequalities (3) as follows. For each compute a shortest path between and in the weighted graph . By construction of , such a path contains an odd number of edges which are neither in nor . Let be the corresponding set of edges in ; i.e. for each edge or that is in the shortest path, let be in . Furthermore, the edges of the shortest path correspond to a closed walk in . The length of the shortest path in is equal to . Thus, if for each the corresponding shortest path between and in has length at least , the vector is an optimal solution to the LP-relaxation of Formulation 1. Otherwise, we have found at least one violated constraint.

Although shortest paths can be computed in polynomial time, the literature has so far considered the above separation procedure as too time-consuming to be directly used in practical exact MaxCut or QUBO solution. Instead, heuristics are employed and exact cycle separation is only used if no more cuts can be found otherwise, see, e.g., (3; 4; 9; 25; 33). However, as we will show in the following, the exact separation can actually be realized in a practically quite efficient way.

2.2 Fast computation of maximally-violated constraints

Initially, we observe that it is usually possible to considerably reduce the size of the auxiliary graph described above. First, all edges of with (or practically, with being sufficiently close to ) can be removed. Because all edge weights are non-negative, no such edges can be contained in a path of weight smaller than . Second, one can contract edges with . Both of these operations can be done implicitly while creating the auxiliary graph (e.g., edges with weight are never added). In this way, one can use cache-efficient, static data structures, such as the compressed-sparse-row format, see e.g. (29), for representing the auxiliary graph.

For computing a shortest path, we use a modified version of Dijkstra’s algorithm. For any vertex in the auxiliary graph let denote the distance of to the start vertex, as computed by the algorithm. We use the following modifications. First, we stop the execution of the algorithm as soon as we scan a vertex with . Second, as already observed in Jünger and Mallach (26), one does need to not proceed the shortest path computation from any vertex, say , in the auxiliary graph where the twin vertex, , has already been scanned and the following condition holds: .

Finally, we use an optimized implementation of Dijkstra’s algorithm together with a specialized binary heap. For the latter, we exploit the fact that the values (i.e. vertex indices) of the key, value pairs inserted into the heap are natural numbers bounded by the number of vertices of the auxiliary graph.

2.3 Post-processing

As already mentioned above, the edges of the shortest path computed in the auxiliary graph correspond to a closed walk —but not necessarily to a simple cycle. Thus, Jünger and Mallach (26) suggest to extract all simple cycles from such a closed walk and separate the corresponding inequalities. We follow this suggestion (although we note that this modification is performance neutral in our implementation).

Barahona and Mahjoub (5) observe that a cycle inequality is only facet-defining if the corresponding cycle is chordless. If a cycle has a chord , one readily obtains two smaller cycles and with and . One verifies that any cycle inequality defined on can be written as the sum of two cycle inequalities defined on and , where is in the odd edges set of exactly one of the two cycle inequalities. Jünger and Mallach (27) suggest a procedure to extract from any simple cycle with corresponding violated cycle-inequality a chordless cycle whose cycle-inequality is also violated. This procedure runs in . However, a disadvantage of this approach is that it finds only one such chordless cycle, which might not be the smallest or most violated one. Additionally, there can be several such chordless cycles. In the following, we suggest a procedure to find several non-overlapping chordless cycles with corresponding violated cycle inequality from a given cycle with violated cycle inequality.

Consider a simple cycle and let with odd. Assume there is a vector such that the cycle inequality corresponding to and is violated, that is:

For each define and store the following information.

  • ,

  • .

This information can be computed in total time : Traverse the nodes , of in this order and compute the above two values for from .

With the above information at hand, traverse for each the incident edges of . Whenever a chord with is found, check whether the cycle inequality of one or both of the corresponding cycles is violated. This check can be performed in constant time by using the precomputed information for the indices and . For example, if is even, one of the corresponding two cycle inequalities is

If a violated cycle inequality is found, add the corresponding chord together with a flag that indicates which of the two possible cycles is to be used to some (initially empty) queue . Once the incident edges of all nodes for have been traversed, sort the elements of according to the size of the corresponding cycles in non-decreasing order. Consider all indices of the original cycle as unmarked. Check the (implicit) cycles in in non-decreasing order. Let with be the corresponding chord. If both and are unmarked, mark the indices . Otherwise, discard the (implicit) cycle. Finally, add all cycle inequalities corresponding to non-discarded cycles to the cut pool. The overall procedure runs in . In practice, its run time is completely neglectable.

Finally, we suggest a procedure to obtain additional cycle cuts from the auxiliary graph. This approach is particularly useful for MaxCut instances with few vertices, because the number of generated cycle inequalities separated in each round is limited by the number of vertices of the MaxCut instance (if we ignore additional cycle-inequalities that are possibly found by the above post-processing). The procedure makes use of the symmetry of the auxiliary graph. Assume that we have computed a shortest path between a pair of vertices, say and , as described above. Recall that denotes the distance of any vertex to the start vertex . If there is a twin pair of vertices such that none of them are part of the shortest path between and , and , we can get another violated cycle inequality as follow: First, we take the - path computed by the algorithm. Second, we consider the - path, and transform it to an - path (of same length) by exploiting the symmetry of the auxiliary graph. By combining the two paths, we obtain an - path of length .

3 Simplifying the problem: reduction techniques

Reduction techniques are a key ingredient for the exact solution of many -hard optimization problems, such as Steiner tree (36) or vertex coloring (34). For QUBO, several reductions methods have been suggested in the literature. Basic techniques can already be found in Hammer et al. (21). The perhaps most extensive reduction framework is given in Tavares et. al. (10). Recently, Glover et al. (18) provided efficient realizations and extensions of the classic first and second order derivative and co-derivative techniques (20). We have implemented the methods from Glover et al. (18) for this article. However, we do not provide details, but rather concentrate on MaxCut reduction techniques in the following.

For MaxCut, there are several articles that discuss reduction techniques for unweighted MaxCut. Ferizovic et al. (14) provide the practically most powerful collection of such techniques. Lange et al. (32) provide techniques for general (weighted) MaxCut instances. In the following, we will describe some of their methods. Furthermore, we suggest new MaxCut reduction methods. Their practical strength will be demonstrated in Section 5.

Initially, we note that any edge with weight can be removed from . Any solution to this reduced version of can be extended to a solution of same weight to the original instance (in linear time). Thus, in the following we assume no edges have weight . We also note that for the incidence vector of any graph cut one obtains a corresponding (but not unique) vertex assignment that satisfies for all the relation . This correspondence will be used repeatedly in the following.

3.1 Cut-based reduction techniques

The first reduction technique from Lange et al. (32) is based on the following proposition.

Proposition 1 ((32)).

Let and such that . If

then there is an optimal solution to with , where if , and if .

Note that in the case of , one can simply contract . In the case of , one needs to multiply the weights of the incident edges of one of the endpoints of by before the contraction.

One way to check for all whether an exists such that the conditions of Proposition 1 are satisfied is by using Gomory-Hu trees. We have only implemented a simpler check that considers for an edge the sets and as , as already suggested in Lange et al. (32). A combined check for all edges can be made in . We note that this test corresponds to the first order derivative reduction method (mentioned above) for QUBO. This relation can be readily verified by means of the standard transformations between MaxCut and QUBO.

The next reduction technique from Lange et al. (32) is based on triangles, and is given below.

Proposition 2 ((32)).

Assume there is a triangle in with edges , , . Let such that , and such that . If

and

then there is an optimal solution to with .

Similarly to the previous proposition, we only implemented tests for the simple cases of , , , and for and , respectively.

In the following, we propose a new reduction test based on triangles, which complements the above one from Lange et al. (32).

Proposition 3.

Assume there is a triangle in with edges , , such that , , and . Let such that and let such that . If

(5)

and

(6)

then there is an optimal solution to such that .

Proof.

Let be a feasible solution to with . We will construct a feasible solution with such that . Thus, there exists at least one optimal solution with .

Because , it needs to holds that either

(7)

or

(8)

We just consider the case (7); the second one can be handled in an analogeous way. Let be a vertex assignment corresponding to ; i.e., for all it holds that . Define a new vertex assignment as follows

Let be the cut corresponding to ; i.e., for all it holds if , and otherwise. Note that for all it holds that . For all it holds that . In particular,

(9)

because of . Thus, we obtain

which concludes the proof. ∎

As for the previous triangle test, we only consider the simple cases of , , , and for and in our implementation.

Note that Lange et al. (32) furthermore propose a generalization of Proposition 2 to more general connected subgraphs. Also Proposition 3 could be generalized in a similar way. However, since we only implemented reductions tests for the triangle conditions, we do not provide details on this generalization here. We also note that exploiting this more general condition for effective practical reductions is not straight-forward and seems computationally considerably more expensive than the triangle tests.

3.2 Further reduction techniques

In the following, we propose two additional reduction methods, based on different techniques. One uses the reduced-costs of the LP-relaxation of Formulation 1, and one exploits simple symmetries in MaxCut instances.

We start with the latter. If successful, the test based on the following proposition allows one to contract two (possibly non-adjacent) vertices.

Proposition 4.

Assume there are two distinct vertices such that . If there exists a non-zero such that for all pairs with , and moreover

  • in case of

  • in case of ,

then there is an optimal vertex solution to such that if , and if .

Proof.

We consider only the case ; the case can be shown in a similar way. Let with . We will construct a with such that the weight of the induced cut of is not lower than the weight of the induced cut of . In this way, the proof is complete, because we can apply this construction also for any optimal vertex assignment

Let be the induced cut of . Assume that

(10)

Otherwise, switch the roles of and in the following.

Let such that for all . Note that is well-defined. Define a new cut as follows

Because of (10) and it holds that . ∎

The condition of Proposition 4 can be checked efficiently in practice by using hashing techniques, similar to the ones used for the parallel rows test for mixed-integer programs (2).

A well-known reduction method for binary integer programs, which was already used for MaxCut (4), is as follows. Consider a feasible solution to the LP-relaxation of Formulation 1, with reduced-costs , and with objective value . Further, let be the weight of a graph cut. If for an it holds that and , one can fix . If for a it holds that and , one can fix . This method can also be used for LP-solutions (obtained during separation) that satisfy only a subset of the cycle inequalities (2). In the following, we will only consider optimal LP-solutions (possibly for a subset of the cycle inequalities). Since we furthermore consider only LP-solutions obtained by the Simplex algorithm, all non-zero variables have reduced-cost .

From incident fixed edges one obtains a (non-unique) partial vertex assignment . This assignment can be used to obtain additional fixings, as detailed in the following proposition.

Proposition 5.

Let be an optimal solution to the LP-relaxation of Formulation 1, with reduced-costs , and objective value . Let be an upper bound on the weight of a maximum-cut. Let and such that for any optimal vertex assignment it holds that for all . Further, let and define

and

For any optimal vertex assignment the following conditions hold. If , then . If , then .

The proposition follows from standard linear programming results. If one of the conditions of the proposition is satisfied, one can fix all edges between and .

4 Solving to optimality: branch-and-cut

This section describes how to incorporate the methods introduced so far together with additional components in an exact branch-and-cut algorithm. This branch-and-cut algorithm has been implemented based on the academic MIP solver SCIP (7). Besides being a stand-alone MIP solver, SCIP provides a general branch-and-cut framework. Most importantly, we rely on SCIP for organizing the branch-and-bound search, and the cutting plane management. Most native, general-purpose algorithms of SCIP such as primal heuristics, conflict analysis, or generic cutting planes are deactivated by our solver for performance reasons.

4.1 Key components

In the following, we list the main components of the branch-and-cut framework that was implemented for this article.

Presolving

For presolving, the reduction methods described in this article are executed iteratively within a loop. This loop is reiterated as long as at least one edge has been contracted during the previous round, and the predefined maximum number of loop passes has not been reached yet.

Domain propagation

For domain propagation we use the reduced-cost criteria described in Section 3.2. The simple single-edge fixing is done by the generic reduced-costs propagator plug-in of SCIP. For the new implication based method we have implemented an additional propagator.

A classic propagation method, e.g. (4), is as follows: Consider the connected components induced by edges that have been fixed to 0 or 1. All additional edges in these connected components can be readily fixed. However, this technique brought no benefits in our experiments, since the variable values of such edges are implied by the cycle inequalities (2).

Decomposition

It is well-known that connected components of the graph underlying a MaxCut instance can be solved separately, see e.g. (25). More generally, one can solve biconnected components separately (this simple observation does not seem to have been mentioned in the MaxCut literature so far). Since several benchmark instances used in this article contain many very small biconnected components, we solve components with a limited number of vertices by enumeration. In this way, we avoid the overhead associated with creating and solving a new MaxCut instance for each subproblem.

Primal heuristics

Primal heuristics are an important component of practical branch-and-bound algorithms: First, to find an optimal solution (verified by the dual-bound), and second to find strong primal bounds that allow the algorithm to cut off many branch-and-bound nodes. For computing an initial primal solution, we have implemented the MaxCut heuristic by Burer et. al. (11). We further use the Kernighan-Lin algorithm (30) to improve any (intermediary) solution found by the algorithm of Burer et. al. Additionally, we use this combined algorithm as a local search heuristic whenever a new best primal solution has been found during the branch-and-bound search. In this case, we initiate the heuristic with this new best solution (which can be done by translating the solution into the two-dimensional angle vectors required by the heuristic).

We also implemented the spanning-tree heuristic from Barahona et al. (4), which uses a given (not necessarily optimal) LP-solution to find graph cuts. We execute this heuristic after each separation round.

Separation

In each separation round, we initially try to find violated cycle inequalities on triangles of the underlying graph (by enumerating some triangles). Next, we use shortest-path computations to find additional violated cuts, as described in Section 2.2 and Section 2.3. Among the speed-up techniques, we have not (yet) implemented the contraction of edges, since the separation routine is already quite fast and other implementations seemed more promising.

Branching

We simply branch on the edge variables and use the pseudo-costs branching strategy of SCIP, see Gamrath (16) for more details. Initial experiments showed that the default branching strategy of SCIP, reliable pseudo-costs branching, spends too much time on strong-branching to be competitive.

4.2 Parallelization

For parallelizing our solver, we use the Ubiquity Generator Framework (UG(38), a software package to parallelize branch-and-bound based solvers—for both shared- and distributed-memory environments. UG implements a Supervisor-Worker load coordination scheme (35). Importantly, Supervisor functions make decisions about the load balancing without actually storing the data associated with the branch-and-bound (B&B) search tree.

A major problem of parallelizing the B&B search lies in the simple fact that parallel resources can only be used efficiently once the number of open B&B nodes is sufficiently large. Thus, we employ so-called racing ramp-up (35): Initially, each thread (or process) starts the solving process of the given problem instance, but each with different (customized) parameters and random seeds. Additionally, we reserve some threads to exclusively run primal heuristics. During the racing, information such as improved primal solutions or global variable fixings is being exchanged among the threads. We terminate the racing once a predefined number of open B&B nodes has been created by one thread, or the problem has been solved. Once the racing has been terminated and the problem is still unsolved, the open nodes are distributed among the threads and the actual parallel solving phase starts.

5 Computational results

This section provides computational results on a large collection of MaxCut and QUBO instances from the literature. We look at the impact of individual components, and furthermore compare the new solver with the state of the art for the exact solution of MaxCut and QUBO instances. An overview of the test-sets used in the following is given in Table 1. The second column gives the number of instances per test-sets. The third and fourth columns give the range of nodes and edges in the case of MaxCut, or the range of variables and non-zero coefficients in the case of QUBO.

Only a few exact MaxCut or QUBO solvers are publicly available, and some, such as BiqMac (37) and BiqBin (22), only as web services. Still, the state-of-the-art solvers BiqCrunch (31) and MADAM (23) are freely available, even with source code. However, we have observed that both of these solvers are outperformed on most instances listed in Table 1 by the recent release 9.5 of the state-of-the-art commercial solver Gurobi (19). Gurobi solves mixed-integer quadratic programs, which are a superclass of QUBO. In fact, the standard benchmark library for quadratic programs, QPLIB (15), contains various QUBO instances. Compared to the previous release 9.1, Gurobi 9.5 has hugely improved on QUBO (and thereby also MaxCut) instances. For example, while Gurobi 9.1 could not solve any of the IsingChain instances from Table 1 in one hour (with one thread), Gurobi 9.5 solves all of them in less than one minute. Thus, in the following, we will use Gurobi 9.5 as a reference for our new solver. We will also provide results from the literature, but the comparison with Gurobi 9.5 allows us to obtain results in the same computational environment.

Very recently, an article describing a new solver, called McSparse, specialized to sparse MaxCut and QUBO instances was published (12). The computational experiments in (12) demonstrate that McSparse outperforms previous MaxCut and QUBO solvers on sparse instances. Like BiqMac (37) and BiqBin (22), this solver is only available via a web interface. However, we will still provide some comparison with our solver in the following by using the results published in (12).

The computational experiments were performed on a cluster of Intel Xeon Gold 5122 CPUs with 3.60 GHz, and 96 GB RAM per compute node. We ran only one job per compute node at a time, to avoid a distortion of the run time measures—originating for example from shared (L3) cache. For our solver, we use the commerical CPLEX 12.10 (24) as LP-solver, although our solver also allows for the use of the non-commercial (but slower) SoPlex (7) instead. For Gurobi we set the parameter MipGap to . Otherwise, we would obtain suboptimal solutions even for many instances with integer weights.

Name # V E Description
DIMACS 4 512 - 3375 1536-10125 Instances introduced at the
7th DIMACS Challenge.
Mannino 4 48 - 487 1128-8511 Instances from a radio frequency assignment problem,
introduced in (9).
PM1s 10 100 495 Instances generated with the rudy framework,
from the BiqMac Lib (40).
W01 10 100 495 Instances generated with the rudy framework,
from the BiqMac Lib (40).
K64-chimera 80 2049 8064 Instances on D-Wave Chimera graphs
introduced in (25).
Kernel 14 33 - 2888 91-2981 Instances from various sources
collected by (14).
IsingChain 30 100 - 300 4950-44850 Instances from an application in statistical physics,
introduced in (33)
Torus 18 100 - 343 200-1029 2D and 3D torus instances from an application in
statistical physics, introduced in (33)
Name # Description
QPLIB 22 120 - 1225 602-34876 QUBO instances from the standard benchmark software
for quadratic programs, see (15).
BQP100 10 100 471-528 Randomly generated instances
introduced in  (6).
BQP250 10 250 3039-3208 Randomly generated instances
introduced in (6).
BE120.3 10 120 2176-2253 Randomly generated instances
introduced in (8).
BE250 10 250 3268-3388 Randomly generated instances
introduced in (8).
GKA 35 20 - 125 204-7788 Randomly generated instances with different densities,
introduced in (17)
Table 1: Details of MaxCut (upper part) and QUBO (lower part) test-sets used in this article.

5.1 Individual components

This section takes a look at individual algorithmic components introduced in Section 2 and Section 3.

First, we show the run time required for our improved separation of cycle inequalities. Table 2 reports per test-set the average (arithmetic mean) percentual time required for the separation procedure (column four), as well as for solving the LP-relaxations (column five). Recall that the latter is done by CPLEX 12.10, one of the leading commercial LP-solvers. For more than half of the test-sets the average time required for the separation is less than 10 %. Also for the remaining test-sets it is always less than 20 %. Notably, this time also includes adding the cuts (including the triangle inequalities) to the cut pool, which requires additional computations. The time could be further reduced by contracting 0-weight edges in the auxiliary graph, as described in Section 2. Notably, both the separation time and LP-solution time are very small for the IsingChain and Kernel instances. This behavior is due to the fact that many of these instances are already solved during presolving, as detailed in the following,

Name # Sepa-time [%] LP-time [%]
BE120.3 10 2.5 79.3
BE250 10 2.8 84.3
BQP100 10 3.0 32.0
BQP250 10 2.6 86.2
GKA 35 6.7 49.1
IsingChain 30 0.0 0.0
K64-chimera 80 10.1 58.3
Kernel 14 1.3 4.8
PM1s 10 13.1 78.3
QPLIB 22 18.1 65.5
Torus 18 16.7 15.5
W01 10 12.0 59.3
Table 2: Average times spent in separation and (re-) optimization of the LP for MaxCut and QUBO test-sets.

Next, we demonstrate the strength of the reduction methods implemented for this article. Only results for the MaxCut test-sets are reported. We show the impact of the MaxCut reduction techniques from (32) described in Section 3 as well as the QUBO reduction techniques from (18)—by using the standard problem transformations between QUBO and MaxCut. We refer to the combination of these two as base preprocessing. Additionally, the methods described in Proposition 3 and Proposition 4 are referred to as new techniques. Note that Proposition 5 cannot be applied, because no reduced-costs are available.

Table 3 shows in the first column the name of the test-set, followed by its number of instances. The next columns show the percentual average number of nodes and edges of the instances after the preprocessing without (column three and four), and with (columns five and six) the new methods. The last two columns report the percentual relative change between the previous results. The run time is not reported, because it is on all instances below 0.05 seconds.

The new reduction techniques have an impact on five of the eight test-sets. The strongest reductions occur on Kernel and IsingChain. We remark that the symmetry-based reductions from Proposition 4 have a very small impact, and only allow for contracting a few dozen additional edges on Kernel. We also note that while the IsingChain instances are already drastically reduced by the base preprocessing, the new methods still have an important impact, as they reduce the number of edges of several instances from more than a thousand to less than 300. The IsingChain instances were already completely solved by reduction techniques in Tavares (39), by using maximum-flow based methods. However the run time was up to three orders of magnitudes larger than in our case. The machine used by Tavares had a Pentium 4 CPU at 3.60 GHz, thus being significantly slower than the machines used for this article. Still, also when taking the different computing environments into account, the run time difference is huge.

base preprocessing +new techniques relative change
Test-set # V [%] E [%] V [%] E [%] V [%] E [%]
IsingChain 30 6.1 0.8 1.1 0.05 -82.0 -93.8
K64-chimera 80 3.1 4.6 3.1 4.6 0.0 0.0
Kernel 14 24.1 30.1 16.4 20.6 -32.0 -31.6
Mannino 4 64.1 69.3 63.2 68.7 -1.4 -0.9
Torus 18 80.6 87.5 78.5 85.2 -2.6 -2.6
W01 10 99.1 94.8 99.1 94.8 0.0 0.0
DIMACS 4 97.0 98.9 96.9 98.9 -0.1 0.0
PM1s 10 99.7 99.9 99.7 99.9 0.0 0.0
Table 3: Average remaining size of MaxCut instances after preprocessing.

5.2 Exact solution

This section compares Gurobi

9.5 and our new solver with respect to the mean time, the maximum time, and the number of solved instances. For the mean time we use the shifted geometric mean 

(1) with a shift of second. In this section, we use only single-thread mode. Table 4 provides the results for a time-limit of one hour. The second column shows the number of instances in the test-set. Columns three gives the number of instances solved by Gurobi, column four the number of instances solved by our solver. Column five and six show the mean time taken by Gurobi and our solver. The next column gives the relative speedup of our solver. The last three columns provide the same information for the maximum run time, Speedups that signify an improved performance of the new solver are marked in bold.

It can be seen that our solver consistently outperforms Gurobi 9.5—both with respect to mean and maximum time. Also, it solves on each test-set at least as many instances as Gurobi. The only test-set where Gurobi performs better is BQP100, which, however, can be solved by both solvers in far less than a second.

On the other test-sets, the mean time of the new solver is better, often by large factors (up to ). On the instance sets that can both be fully solved, the maximum time taken by the new solver is in most cases also much smaller. On five of the test-sets, the new solver can solve more instances to optimality than Gurobi 9.5.

# solved mean time (sh. geo. mean) maximum time
Test-set # Grb new Grb [s] new [s] speedup     Grb [s] new [s] speedup
PM1s 10 10 10 192.3 21.0 9.16 303.3 48.6 6.24
W01 10 10 10 44.1 3.1 14.23 97.1 21.4 4.54
Kernel 14 14 14 0.7 0.1 7.00 14.3 1.1 13.00
IsingChain 30 30 30 1.3 0.05 26.00 41.0 0.05 820.00
Torus 18 18 18 3.8 0.4 9.50 628.0 7.6 82.63
K64-chimera 80 80 80 90.1 1.5 60.07 195.4 8.6 22.72
QPLIB 22 8 13 687.4 173.3 3.97 3600 3600 1.00
BQP100 10 10 10 0.1 0.1 1.00 0.2 0.4 0.50
BQP250 10 0 7 3600 654.1 5.50 3600 3600 1.00
BE120.3 10 9 10 265.6 60.2 4.41 3600 820.0 4.40
BE250 10 0 8 3600 609.3 5.91 3600 3600 1.00
GKA 35 29 31 6.5 6.0 1.08 3600 3600 1.00
Table 4: Comparison of Gurobi 9.5 (Grb) and new solver (new).

Finally, we compare our solver with the very recent QUBO and MaxCut solver McSparse, specialized on sparse instances. In Table 5 we provide an instance-wise comparison of our solver and McSparse. We provide the number of branch-and-bound nodes (columns three and four) and the run times (columns five and six) of McSparse and our solver per problem instance. We use the 14 instances that were selected in the article by Charfreitag et al. (12) as being representative of their test-bed. The first seven instances are MaxCut and the last seven QUBO problems. Charfreitag et al. (12) only use one thread per run. Their results were obtained on a system with AMD EPYC 7543P CPUs at 2.8 GHz, and with 256 GB RAM. CPU benchmarks222https://www.cpubenchmark.net/singleThread.html#server-thread consider this system to be faster than the one used in this article, already in single-thread mode. Furthermore, McSparse is embedded into Gurobi (version 9.1), which is widely regarded as the fastest commercial MIP-solver, whereas our solver is based on the non-commercial SCIP, although we also use a commercial LP-solver.

# B&B nodes run time
Name V E MS new     MS [s] new [s]
pm1s_100.3 100 495 341 737 48.2 48.0
pw01_100.0 100 495 171 179 20.0 8.5
mannino_k487b 487 5391 1 15 167.3 2.9
bio-diseasome 516 1188 1 1 9.5 0.6
ca-netscience 379 914 1 1 0.1 0.0
g000981 110 188 1 1 0.0 0.0
imgseg_138032 12736 23664 1 1 30.5 4.4
Name MS new     MS [s] new [s]
bqp250-3 250 3092 25 15 414.1 85.8
gka2c 50 813 1 1 0.5 0.3
gka4d 100 2010 129 71 219.6 61.9
gka5c 80 721 1 1 0.1 0.1
gka7a 30 241 1 1 0.0 0.0
be120.3.5 120 2248 111 63 257.7 62.4
be250.3 250 3277 107 81 841.0 212.0
Table 5: Comparison of McSparse (MS) and our solver (new) on seven MaxCut and seven QUBO instances (considered to be representative (12)).

As to the number of branch-and-bound nodes, the pictures is somewhat mixed—with McSparse requiring fewer nodes on three, and more nodes on four instances. Notably, McSparse also includes clique-cuts separation, which is not implemented in our solver, and a specialized branching strategy, while we use a simple generic one. These two features might explain the smaller number of nodes on some instances. As to the run time, five instances can be solved in less than a second by both solvers (with the new solver being slightly faster). On the remaining nine instances, the new solver is always faster—for all but one instances by a factor of more than . On one instance (mannino_k487b), it is even by a factor of more than faster.

5.3 Parallelization

Although parallelization is not the main topic of this article, we still provide some corresponding results in the following. To give insights into the strengths and weaknesses of our racing-based parallelization, we provide instance-wise results. We use the test-sets Mannino and DIMACS, which both contain instances that cannot be solved within one hour by Gurobi and our new solver in single-thread mode. The sizes of the instances are given in Table 6.

Name V E Name V E
torusg3-8 512 1536 mannino_k487a.mc 487 1435
toruspm3-8-50 512 1536 mannino_k487b.mc 487 5391
torusg3-15 3375 10125 mannino_k487c.mc 487 8511
toruspm3-15-50 3375 10125 mannino_k48.mc 48 1128
Table 6: Details on DIMACS (left) and Mannino (right) instances.

Table 7 provides results of Gurobi and the new solver on the DIMACS and Mannino instances. Both solvers are run once with one thread and once with eight threads. As before, a time-limit of one hour is used. The table provides the percentual primal-dual gap, as well as the run time. The results reveal for both solvers a performance degradation on easy instances with increased number of threads. Most notably on mannino_k487b, where Gurobi takes almost 10 times longer with eight threads. On the other hand, the new solver shows a strong speedup on two hard instances that cannot be solved in one hour singke-threaded, namely toruspm3-8-50 and mannino_k487c. On the latter, one even observes a super-linear speedup. This speedup can be at least partly attributed to the exclusive use of primal heuristics on one thread during racing, which finds an optimal solution quickly in both cases. On the other hand, in single-thread mode the best primal solution is sub-optimal even at the time-limit.

primal-dual gap [%] run time [s]
Name Grb-T1 Grb-T8 new-T1 new-T8 Grb-T1 Grb-T8 new-T1 new-T8
torusg3-8 0.0 0.0 0.0 0.0 1494.2 1178.5 8.5 9.3
toruspm3-8-50 1.8 1.8 0.5 0.0 3600 3600 3600 1415.8
torusg3-15 6.8 3.4 1.3 0.4 3600 3600 3600 3600
toruspm3-15-50 9.5 12.2 2.3 2.3 3600 3600 3600 3600
mannino_k487a 0.0 0.0 0.0 0.0 3.5 10.7 1.1 1.3
mannino_k487b 0.0 0.0 0.0 0.0 9.2 80.5 2.9 2.8
mannino_k487c 0.1 0.0 0.1 0.0 3600 3176.7 3600 398.2
mannino_k48 0.0 0.0 0.0 0.0 0.1 0.4 2.7 3.8
Table 7: Results of Gurobi 9.5 (Grb) and the new solver (new), with one (-T1) and eight (-T8) threads each.

Finally, Table 8 provides results for several previously unsolved MaxCut and QUBO benchmark instances from the QPLIB and the 7th DIMACS Challenge. We also report the previous best known solution values (previous primal) from the literature, which were taken from the QPLIB and the MQLib (13). For the QPLIB instances we report the results from the one hour single-thread run in Section 5.2. However, for the DIMACS instances, torusg3-15 and toruspm3-15-50, we performed additional runs. Note that the DIMACS instances were originally intended to be solved with negated weights. However, it seems that most publications, e.g., (13), do not perform this transformation. Thus, we also use the unmodified instances, to allow for better comparison. However, we additionally report the solution values of the transformed instances, these transformed instances are marked by a . We used a machine with 88 cores of Intel Xeon E7-8880 v4 CPUs @ 2.20GHz. We ran the two instances (non-exclusively) for at most 3 days while using 80 threads. Both torusg3-15 and torusg3-15 could be solved to optimality in this way, but toruspm3-15-50 and toruspm3-15-50 still remained with a primal-dual gap of percent each.

Name gap [%] new primal previous primal
torusg3-15 opt 286626481 282534518
torusg3-15 opt 292031950 -
toruspm3-15-50 1.8 3010 2968
toruspm3-15-50 1.8 3008 -
QPLIB_3693 1.3 -1152 -1148
QPLIB_3850 1.7 -1194 -1192
Table 8: Improved solutions for MaxCut (first four) and QUBO (last two) benchmark instances.

6 Conclusion and outlook

This article has demonstrated how to design a state-of-the-art solver for sparse QUBO and MaxCut instances, by enhancing and combining key algorithmic ingredients such as presolving and cutting-plane generation. The newly implemented solver outperforms both the leading commercial and non-commercial competitors on a wide range of test-sets from the literature. Moreover, the best known solutions to several instances could be improved.

Still, there are various promising routes for further improvement. Examples would be a new branching strategy, or the implementation of additional separation methods such as clique-cuts (9). In this way, a considerable further speedup of the new solver might be achieved.

References

  • [1] Tobias Achterberg. Constraint Integer Programming. PhD thesis, Technische Universität Berlin, 2007.
  • [2] Tobias Achterberg, Robert E. Bixby, Zonghao Gu, Edward Rothberg, and Dieter Weninger. Presolve reductions in mixed integer programming. INFORMS J. Comput., 32(2):473–506, 2020.
  • [3] Francisco Barahona, Martin Grötschel, Michael Jünger, and Gerhard Reinelt. An application of combinatorial optimization to statistical physics and circuit layout design. Oper. Res., 36(3):493–513, 1988.
  • [4] Francisco Barahona, Michael Jünger, and Gerhard Reinelt. Experiments in quadratic 0-1 programming. Math. Program., 44(1-3):127–137, 1989.
  • [5] Francisco Barahona and Ali Ridha Mahjoub. On the cut polytope. Math. Program., 36(2):157–173, 1986.
  • [6] JE Beasley. Heuristic algorithms for the unconstrained binary quadratic programming problem. Technical report, 1998.
  • [7] Ksenia Bestuzheva, Mathieu Besançon, Wei-Kun Chen, Antonia Chmiela, Tim Donkiewicz, Jasper van Doornmalen, Leon Eifler, Oliver Gaul, Gerald Gamrath, Ambros Gleixner, Leona Gottwald, Christoph Graczyk, Katrin Halbig, Alexander Hoen, Christopher Hojny, Rolf van der Hulst, Thorsten Koch, Marco Lübbecke, Stephen J. Maher, Frederic Matter, Erik Mühmer, Benjamin Müller, Marc E. Pfetsch, Daniel Rehfeldt, Steffan Schlein, Franziska Schlösser, Felipe Serrano, Yuji Shinano, Boro Sofranac, Mark Turner, Stefan Vigerske, Fabian Wegscheider, Philipp Wellner, Dieter Weninger, and Jakob Witzig. The SCIP Optimization Suite 8.0. ZIB-Report 21-41, Zuse Institute Berlin, December 2021.
  • [8] Alain Billionnet and Sourour Elloumi. Using a mixed integer quadratic programming solver for the unconstrained quadratic 0-1 problem. Math. Program., 109(1):55–68, 2007.
  • [9] Thorsten Bonato, Michael Jünger, Gerhard Reinelt, and Giovanni Rinaldi. Lifting and separation procedures for the cut polytope. Math. Program., 146(1-2):351–378, 2014.
  • [10] Endre Boros, Peter L Hammer, and Gabriel Tavares. Preprocessing of unconstrained quadratic binary optimization. Technical report, 2006.
  • [11] Samuel Burer, Renato D. C. Monteiro, and Yin Zhang. Rank-two relaxation heuristics for MAX-CUT and other binary quadratic programs. SIAM J. Optim., 12(2):503–521, 2002.
  • [12] Jonas Charfreitag, Michael Jünger, Sven Mallach, and Petra Mutzel. McSparse: Exact solutions of sparse maximum cut and sparse unconstrained binary quadratic optimization problems. In Cynthia A. Phillips and Bettina Speckmann, editors, Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX 2022), online first, 2022.
  • [13] Iain Dunning, Swati Gupta, and John Silberholz. What works best when? A systematic evaluation of heuristics for max-cut and QUBO. INFORMS J. Comput., 30(3):608–624, 2018.
  • [14] Damir Ferizovic, Demian Hespe, Sebastian Lamm, Matthias Mnich, Christian Schulz, and Darren Strash. Engineering Kernelization for Maximum Cut. In Guy E. Blelloch and Irene Finocchi, editors, Proceedings of the Symposium on Algorithm Engineering and Experiments, ALENEX 2020, Salt Lake City, UT, USA, January 5-6, 2020, pages 27–41. SIAM, 2020.
  • [15] Fabio Furini, Emiliano Traversi, Pietro Belotti, Antonio Frangioni, Ambros M. Gleixner, Nick Gould, Leo Liberti, Andrea Lodi, Ruth Misener, Hans D. Mittelmann, Nikolaos V. Sahinidis, Stefan Vigerske, and Angelika Wiegele. QPLIB: a library of quadratic programming instances. Math. Program. Comput., 11(2):237–265, 2019.
  • [16] Gerald Gamrath. Enhanced predictions and structure exploitation in branch-and-bound. Technische Universitaet Berlin (Germany), 2020.
  • [17] Fred Glover, Gary A Kochenberger, and Bahram Alidaee. Adaptive memory tabu search for binary quadratic programs. Management Science, 44(3):336–345, 1998.
  • [18] Fred W. Glover, Mark W. Lewis, and Gary A. Kochenberger. Logical and inequality implications for reducing the size and difficulty of quadratic unconstrained binary optimization problems. Eur. J. Oper. Res., 265(3):829–842, 2018.
  • [19] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2021.
  • [20] Peter L. Hammer, Pierre Hansen, and Bruno Simeone. Roof duality, complementation and persistency in quadratic 0-1 optimization. Math. Program., 28(2):121–155, 1984.
  • [21] Peter L Hammer and Sergiu Rudeanu. Boolean methods in operations research and related areas. Springer, 1968.
  • [22] Timotej Hrga, Borut Luzar, Janez Povh, and Angelika Wiegele. Biqbin: Moving boundaries for np-hard problems by HPC. In Ivan Dimov and Stefka Fidanova, editors, Advances in High Performance Computing - Results of the International Conference on ”High Performance Computing”, HPC 2019, Borovets, Bulgaria, September 2-6, 2019, volume 902 of Studies in Computational Intelligence, pages 327–339. Springer, 2019.
  • [23] Timotej Hrga and Janez Povh. MADAM: a parallel exact solver for max-cut based on semidefinite programming and ADMM. Comput. Optim. Appl., 80(2):347–375, 2021.
  • [24] IBM. CPLEX, 2020.
  • [25] Michael Jünger, Elisabeth Lobe, Petra Mutzel, Gerhard Reinelt, Franz Rendl, Giovanni Rinaldi, and Tobias Stollenwerk. Quantum annealing versus digital computing: An experimental comparison. ACM J. Exp. Algorithmics, 26, jul 2021.
  • [26] Michael Jünger and Sven Mallach. Odd-cycle separation for maximum cut and binary quadratic optimization. In Michael A. Bender, Ola Svensson, and Grzegorz Herman, editors, 27th Annual European Symposium on Algorithms, ESA 2019, September 9-11, 2019, Munich/Garching, Germany, volume 144 of LIPIcs, pages 63:1–63:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
  • [27] Michael Jünger and Sven Mallach. Exact facetial odd-cycle separation for maximum cut and binary quadratic optimization. INFORMS J. Comput., 33(4):1419–1430, 2021.
  • [28] R. Karp. Reducibility among combinatorial problems. In R. Miller and J. Thatcher, editors, Complexity of Computer Computations, pages 85–103. Plenum Press, 1972.
  • [29] Jeremy Kepner and John Gilbert. Graph algorithms in the language of linear algebra. SIAM, 2011.
  • [30] Brian W. Kernighan and Shen Lin. An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J., 49(2):291–307, 1970.
  • [31] Nathan Krislock, Jérôme Malick, and Frédéric Roupin. Biqcrunch: A semidefinite branch-and-bound method for solving binary quadratic problems. ACM Trans. Math. Softw., 43(4):32:1–32:23, 2017.
  • [32] Jan-Hendrik Lange, Bjoern Andres, and Paul Swoboda. Combinatorial Persistency Criteria for Multicut and Max-Cut. In

    IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019

    , pages 6093–6102. Computer Vision Foundation / IEEE, 2019.

  • [33] Frauke Liers. Contributions to Determining Exact Ground-States of Ising Spin-Glasses and to their Physics. PhD thesis, University of Cologne, 2004.
  • [34] Jinkun Lin, Shaowei Cai, Chuan Luo, and Kaile Su. A reduction based method for coloring very large graphs. In Carles Sierra, editor,

    Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017

    , pages 517–523. ijcai.org, 2017.
  • [35] Ted Ralphs, Yuji Shinano, Timo Berthold, and Thorsten Koch. Parallel Solvers for Mixed Integer Linear Optimization, pages 283–336. Springer International Publishing, Cham, 2018.
  • [36] Daniel Rehfeldt and Thorsten Koch. Implications, Conflicts, and Reductions for Steiner Trees. In Mohit Singh and David P. Williamson, editors, Integer Programming and Combinatorial Optimization - 22nd International Conference, IPCO 2021, Atlanta, GA, USA, May 19-21, 2021, Proceedings, volume 12707 of Lecture Notes in Computer Science, pages 473–487. Springer, 2021.
  • [37] Franz Rendl, Giovanni Rinaldi, and Angelika Wiegele. Solving max-cut to optimality by intersecting semidefinite and polyhedral relaxations. Math. Program., 121(2):307–335, 2010.
  • [38] Yuji Shinano, Tobias Achterberg, Timo Berthold, Stefan Heinz, Thorsten Koch, and Michael Winkler. Solving open MIP instances with parascip on supercomputers using up to 80, 000 cores. In 2016 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2016, Chicago, IL, USA, May 23-27, 2016, pages 770–779. IEEE Computer Society, 2016.
  • [39] Gabriel Tavares. New algorithms for Quadratic Unconstrained Binary Optimization (QUBO) with applications in engineering and social sciences. PhD thesis, Rutgers, the State University of New Jersey-New Brunswick, 2008.
  • [40] Angelika Wiegele. BiqMac Library: A collection of Max-Cut and quadratic 0-1 programming instances of medium size. Technical report, 2007.