In this paper, we consider the following mixed-integer optimization problem:
where is a closed, convex set in .
State-of-the-art algorithms for integer optimization are based on two ideas that are at the origin of mixed-integer programming and have been constantly refined: cutting planes and branch-and-bound. Decades of theoretical and experimental research into both these techniques are at the heart of the outstanding success of integer programming solvers. Nevertheless, we feel that there is a lot of scope for widening and deepening our understanding of these tools. We have recently started building foundations for a rigorous, quantitative theory for analyzing the strengths and weaknesses of cutting planes and branching. We continue this project in the current manuscript.
In particular, we provide a theoretical framework to explain an empirically observed phenomenon: algorithms that combine cutting planes and branching are more efficient (sometimes by orders of magnitude) than algorithms that use either technique alone. We hope that our insights can contribute to a better and more precise understanding of the interaction of cutting planes and branching: which cutting plane schemes and branching schemes complement each other, with concrete, provable gains from their combined use, and which do not? Not only is a theoretical understanding of this phenomenon lacking, but a deeper understanding of the interaction of these methods is also considered important by both practitioners and theoreticians in the mixed-integer optimization community. To quote an influential computational survey: “… it seems that a tighter coordination of the two most fundamental ingredients of the solvers, branching and cutting, can lead to strong improvements.”
The main computational burden in any cutting plane, branch-and-bound, or branch-and-cut algorithm is the solution of the intermediate convex relaxations. Thus, there are two important aspects to deciding how efficient such an algorithm is: 1) How many linear programs (LPs) or convex optimization problems are solved? 2) How computationally challenging are these convex problems? The first aspect has been widely studied using the concepts of proof size and rank; see [20, 21, 22, 11, 10, 9, 16, 5, 26, 47] for a small sample of previous work. Formalizing the second aspect is somewhat tricky, and we will focus on one specific feature: the sparsity of the constraints describing the linear program. The collective wisdom of the optimization community says that sparsity of constraints is a highly important aspect in the efficiency of linear programming [4, 27, 46, 50]. Additionally, most successful mixed-integer optimization solvers use sparsity as a criterion for cutting plane selection; see [24, 23, 25] for an innovative line of research. Compared to cutting planes, sparsity considerations have not been as prominent in the choice of branching schemes. This is primarily because sparsity is not an issue for variable disjunctions, and there is relatively less work on more general branching schemes; see [1, 42, 19, 38, 39, 41, 18, 37, 34]. In our analysis, we are careful about the sparsity of the disjunctions as well; see Definition 1.3 below.
1.1 Framework for mathematical analysis.
We now present the formal details of our approach. A cutting plane for the feasible region of (1.1) is a halfspace such that . The most useful cutting planes are those that are not valid for , i.e., . There are several procedures used in practice for generating cutting planes, all of which can be formalized by the general notion of a cutting plane paradigm. A cutting plane paradigm is a function that takes as input any closed, convex set and outputs a (possibly infinite) family of cutting planes valid for . Two well-studied examples of cutting plane paradigms are the Chvátal-Gomory cutting plane paradigm [48, Chapter 23] and the split cut paradigm [13, Chapter 5]. We will assume that all cutting planes are rational in this paper.
State-of-the-art solvers embed cutting planes into a systematic enumeration scheme called branch-and-bound. The central notion is that of a disjunction, which is a union of polyhedra such that , i.e., the polyhedra together cover all of . One typically uses a (possibly infinite) family of disjunctions for potential deployment in algorithms. A well-known example is the family of split disjunctions that are of the form , where and . When the first coordinates of correspond to a standard unit vector, we get variable disjunctions, i.e., disjunctions of the form , for .
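The covering property of a split disjunction can be checked numerically on small examples. The sketch below is only an illustration and assumes the standard form of a split, namely the two halfspaces ⟨π, x⟩ ≤ π0 and ⟨π, x⟩ ≥ π0 + 1 with an integer vector π and an integer π0, in a pure integer setting. The function names `split_side` and `variable_split` are hypothetical and do not come from the paper.

```python
from itertools import product

def split_side(pi, pi0, x):
    """Return 'left' if pi.x <= pi0, 'right' if pi.x >= pi0 + 1, else None."""
    value = sum(p * xi for p, xi in zip(pi, x))
    if value <= pi0:
        return "left"
    if value >= pi0 + 1:
        return "right"
    return None  # x lies strictly between the two halfspaces

def variable_split(n, i, pi0):
    """Split whose normal vector is the i-th standard unit vector (0-indexed)."""
    pi = [0] * n
    pi[i] = 1
    return pi, pi0

# Assumed example: a dense split in dimension 3.
pi, pi0 = [1, 1, 1], 1

# Every integer point in a small box falls on one side of the split ...
covered = all(split_side(pi, pi0, x) is not None
              for x in product(range(-2, 3), repeat=3))

# ... while a fractional point can be excluded by both sides.
fractional_side = split_side(pi, pi0, (0.5, 0.5, 0.5))
```

Because π and π0 are integers, ⟨π, x⟩ is an integer at every integer point, so it can never fall strictly between π0 and π0 + 1; this is why `covered` holds while the fractional point is excluded.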
A family of disjunctions can also form the basis of a cutting plane paradigm. Given any disjunction , any halfspace such that is a cutting plane, since by definition of a disjunction. The corresponding cutting plane paradigm , called disjunctive cuts based on , is the family of all such cutting planes derived from disjunctions in .
In the following we assume that all convex optimization problems that need to be solved have an optimal solution or are infeasible.
A branch-and-cut algorithm based on a family of disjunctions and a cutting plane paradigm maintains a list of convex subsets of the initial set which are guaranteed to contain the optimal point, and a lower bound that stores the objective value of the best feasible solution found so far (with if no feasible solution has been found). At every iteration, the algorithm selects one of these subsets and solves the convex optimization problem to obtain . If the objective value is less than or equal to , then this set is discarded from the list . Else, if satisfies the integrality constraints, is updated with the value of and is discarded from the list. Otherwise, the algorithm makes a decision whether to branch or to cut. In the former case, a disjunction is chosen such that and the list is updated . If the decision is to cut, then the algorithm selects a cutting plane such that , and updates the relaxation by adding the cut , i.e., updates .
Motivated by the above, we will refer to a family of disjunctions also as a branching scheme. In a branch-and-cut algorithm, if one always chooses to add a cutting plane and never uses a disjunction to branch, then it is said to be a (pure) cutting plane algorithm, and if one never uses any cutting planes, then it is called a (pure) branch-and-bound algorithm. We note here that in practice, when a decision to cut is made, several cutting planes are usually added, as opposed to the single cutting plane in Definition 1.1. In our mathematical framework, allowing only a single cut makes for a seamless generalization from pure cutting plane algorithms, and also makes quantitative analysis easier.
The execution of any branch-and-cut algorithm on a mixed-integer optimization instance can be represented by a tree. Every convex relaxation processed by the algorithm is denoted by a node in the tree. If the optimal value for is not better than the current lower bound, or is integral, is a leaf. Otherwise, in the case of a branching, its children are , and in the case of a cutting plane, there is a single child representing (we use the same notation as in Definition 1.1). This tree is called the branch-and-cut tree (branch-and-bound tree, if no cutting planes are used). If no branching is done, this tree (which is really a path) is called a cutting plane proof. The size of the tree or proof is the total number of nodes.
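The generic loop and the notion of tree size can be made concrete on a deliberately tiny case. The following sketch is not the paper's algorithm in full generality: it assumes a one-dimensional pure integer problem (maximize x over an interval, x integer), where the relaxation is solved in closed form, the only disjunction is the variable split around the fractional optimum, and the only cut is the rounding cut x ≤ ⌊x*⌋. The names `Interval`, `branch_and_cut` and `decide` are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

def branch_and_cut(root, decide):
    """Maximize x over the integers in `root`; return (best value, tree size).

    `decide(node, x_star)` returns "branch" or "cut" at each fractional node.
    """
    active = [root]          # list of relaxations still to be processed
    incumbent = -math.inf    # value of the best integer solution found so far
    nodes = 0                # every processed relaxation is one tree node
    while active:
        node = active.pop()
        nodes += 1
        if node.lo > node.hi:            # infeasible relaxation: leaf
            continue
        x_star = node.hi                 # optimum of "maximize x" on an interval
        if x_star <= incumbent:          # cannot improve the incumbent: leaf
            continue
        if x_star == math.floor(x_star): # integral optimum: update bound, leaf
            incumbent = x_star
            continue
        down = Interval(node.lo, math.floor(x_star))
        if decide(node, x_star) == "branch":
            up = Interval(math.ceil(x_star), node.hi)
            active.extend([down, up])    # two children, x_star excluded by both
        else:
            active.append(down)          # single child after adding the cut
    return incumbent, nodes

pure_cut = branch_and_cut(Interval(0.0, 2.5), lambda node, x: "cut")
pure_branch = branch_and_cut(Interval(0.0, 2.5), lambda node, x: "branch")
```

On the interval [0, 2.5], the pure cutting plane run processes two nodes (a path), while the pure branch-and-bound run processes three nodes, one of which is an infeasible leaf. Infeasible nodes are counted in the size here, which is a modelling choice of this sketch.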
Proof versus algorithm.
Although we use the word “algorithm” in Definition 1.1, it is technically a non-deterministic algorithm, or equivalently, a proof schema or proof system for optimality (leaving aside the question of finite termination for now). This is because no indication is given on how the important decisions are made: Which set to process from the list? Branch or cut? Which disjunction or cutting plane to use? If these are made concrete, one would obtain a standard deterministic algorithm (assuming, for the moment, finite termination on all instances). Nevertheless, the proof system is very useful for obtaining information theoretic lower bounds on the efficiency of any deterministic branch-and-cut algorithm. Moreover, one can prove the validity of any upper bound on the objective, i.e., the validity of the corresponding inequality, by exhibiting a branch-and-cut tree where this inequality is valid for all the leaves. If is the optimal value, this is a proof of optimality, but one may often be interested in the branch-and-cut/branch-and-bound/cutting plane proof complexity of other valid inequalities as well. The connections between integer programming and proof complexity have a long history; see [6, 45, 32, 7, 15, 28, 29, 12, 43, 44, 35, 30], to cite a few. Our results can be interpreted in the language of proof complexity as well.
Another subtlety to keep in mind is that one could add to the power of such a branch-and-cut proof system by relaxing the requirement that the current optimal solution should be eliminated by the chosen disjunction or cutting plane. This can make a difference: an instance may have a finite proof in the strengthened system while no finite proof exists in the original system. When required, we will use the phrase restricted proof to refer to a proof that imposes the restriction of eliminating the current optimal solution at every node of the proof tree.
Recall that we quantify the complexity of any branch-and-bound/cutting plane/branch-and-cut algorithm using two aspects: the number of LP relaxations processed and the sparsity of the constraints defining the LPs. The number of LP relaxations processed is given precisely by the number of nodes in the corresponding tree (Definition 1.2). Sparsity is formalized in the following definitions.
Let be a natural number that we call the sparsity parameter. Then the pair will denote the restriction of the paradigm that only reports the sub-family of cutting planes that can be represented by inequalities with at most non-zero coefficients; the notation will be used to denote this sub-family for any particular convex set . Similarly, will denote the sub-family of the family of disjunctions such that each polyhedron in the disjunction has an inequality description where every inequality has at most non-zero coefficients.
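The sparsity restriction amounts to a filter on the inequalities a paradigm may report. As a minimal sketch, assume each inequality is stored as a pair `(coeffs, rhs)` meaning ⟨coeffs, x⟩ ≤ rhs; this representation and the names `sparsity` and `restrict_to_sparsity` are illustrative assumptions, not notation from the paper.

```python
def sparsity(coeffs):
    """Number of non-zero coefficients of an inequality."""
    return sum(1 for a in coeffs if a != 0)

def restrict_to_sparsity(inequalities, s):
    """Keep only inequalities with at most s non-zero coefficients."""
    return [(coeffs, rhs) for coeffs, rhs in inequalities
            if sparsity(coeffs) <= s]

# Assumed toy family of candidate cuts in dimension 3.
candidate_cuts = [
    ([1, 0, 0], 0),   # sparsity 1, e.g. a bound on a single variable
    ([1, 1, 0], 1),   # sparsity 2
    ([1, 1, 1], 1),   # sparsity 3, fully dense
]
sparse_cuts = restrict_to_sparsity(candidate_cuts, 2)
```

For a disjunction, the same filter would be applied to every inequality describing every polyhedron in the disjunction, so that a variable disjunction has sparsity 1 and a fully dense split in dimension n has sparsity n.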
1.2 Our Results
1.2.1 Sparsity versus size.
Our first set of results considers the trade-off between the sparsity parameter and the number of LPs processed, i.e., the size of the tree. There are several avenues to explore in this direction. For example, one could compare pure branch-and-bound algorithms based on and , i.e., fix a particular disjunction family and consider the effect of sparsity on the branch-and-bound tree sizes. One could also look at two different families of disjunctions and and look at their relative tree sizes as one turns the knob on the sparsity parameter. Similar questions could be asked about cutting plane paradigms and for interesting paradigms . Even more interestingly, one could compare pure branch-and-bound and pure cutting plane algorithms against each other.
We first focus on pure branch-and-bound algorithms based on the family of split disjunctions. A very well-known example of pure integer instances (i.e., ) due to Jeroslow  shows that if the sparsity of the splits used is restricted to be 1, i.e., one uses only variable disjunctions, then the branch-and-bound algorithm will generate an exponential (in the dimension ) sized tree. On the other hand, if one allows fully dense splits, i.e., sparsity is , then there is a tree with just 3 nodes (one root, and two leaves) that solves the problem. We ask what happens in Jeroslow’s example if one uses split disjunctions with sparsity . Our first result shows that unless the sparsity parameter , one cannot get constant size trees, and if the sparsity parameter , then the tree is of exponential size.
The above instance is a modification of Jeroslow’s instance; Jeroslow’s instance uses an equality constraint instead of an inequality. However, the same argument applies for Jeroslow’s instance.
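The gap between sparsity 1 and full density can be observed by counting nodes directly. The sketch below assumes a concrete inequality variant in the spirit of Jeroslow's example: maximize the sum of n binary variables subject to 2·(x_1 + … + x_n) ≤ n with n odd. This exact formulation is an assumption made for illustration, and `variable_branching_tree_size` is a hypothetical helper. Under this assumption the LP value at a node depends only on how many variables are fixed to 1 and how many are still free, so the tree size obeys a simple recursion.

```python
from functools import lru_cache

def variable_branching_tree_size(n):
    """Nodes in the variable-branching tree for the assumed instance (n odd)."""
    if n % 2 != 1:
        raise ValueError("n must be odd")

    @lru_cache(maxsize=None)
    def size(ones, free):
        # `ones` variables are fixed to 1, `free` variables are not yet fixed.
        if 2 * ones > n:
            return 1                      # relaxation infeasible: leaf
        if 2 * (ones + free) <= n:
            return 1                      # LP optimum sets all free vars to 1: integral leaf
        # Otherwise the LP value is n/2, which is fractional and exceeds any
        # integer incumbent, so the node must branch on a free variable.
        return 1 + size(ones, free - 1) + size(ones + 1, free - 1)

    return size(0, n)

# A single dense split (sum <= floor(n/2) versus sum >= ceil(n/2)) yields
# one integral child and one infeasible child: root plus two leaves.
DENSE_SPLIT_TREE_SIZE = 3

sizes = {n: variable_branching_tree_size(n) for n in (3, 5, 7, 9, 11, 21)}
```

In this model no node at depth less than (n + 1)/2 can be a leaf, so the count grows at least like 2^(n/2), in contrast to the constant size obtained with one fully dense split.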
The bounds in Theorem 1.4 give a constant lower bound when . We establish another lower bound which does better in this regime.
Let be the halfspace defined by inequality , where is an odd number. Consider the instances of (1.1) with , the objective and . The optimum is , and any branch-and-bound proof with sparsity that certifies has size at least .
Next we consider the relative strength of cutting planes and branch-and-bound. Our previous work has studied conditions under which one method can dominate the other, depending on which cutting plane paradigm and branching scheme one chooses . For this paper, the following result from  is relevant: for every convex 0/1 pure integer instance, any branch-and-bound proof based on variable disjunctions can be “simulated” by a lift-and-project cutting plane proof without increasing the size of the proof (versions of this result for linear 0/1 programming were known earlier; see [20, 21]). Moreover, in  we constructed a family of stable set instances where lift-and-project cuts gave exponentially shorter proofs than branch-and-bound. This is interesting because lift-and-project cuts are disjunctive cuts based on the same family of variable disjunctions, so it is not a priori clear that they have an advantage. These results were obtained with no regard for sparsity. We now show that once we also track the sparsity parameter, this advantage can disappear.
Let be the halfspace defined by inequality , where is an odd number. Consider the instances of (1.1) with , the objective and . The optimum is , and there is a branch-and-bound algorithm based on variable disjunctions, i.e., the family of split disjunctions with sparsity , that certifies in steps. However, any cutting plane for with sparsity is trivial, i.e., valid for , no matter what cutting plane paradigm is used to derive it.
1.2.2 Superiority of branch-and-cut.
We next consider the question of when combining branching and cutting planes is provably advantageous. For this question, we leave aside the complications arising due to sparsity considerations and focus only on the size of proofs. The following discussion and results can be extended to handle the issue of sparsity as well, but we leave it out of this extended abstract.
Given a cutting plane paradigm , and a branching scheme , are there families of instances where branch-and-cut based on and does provably better than pure cutting planes based on alone and pure branch-and-bound based on alone? If a cutting plane paradigm and a branching scheme are such that either for every instance, gives cutting plane proofs of size at most a polynomial factor larger than the shortest branch-and-bound proofs with , or vice versa, for every instance gives proofs of size at most polynomially larger than the shortest cutting plane proofs based on , then combining them into branch-and-cut is likely to give no substantial improvement since one method can always do the job of the other, up to polynomial factors. As mentioned above, prior work  had shown that disjunctive cuts based on variable disjunctions (with no restriction on sparsity) dominate branch-and-bound based on variable disjunctions for pure 0/1 instances, and as a consequence branch-and-cut based on these paradigms is dominated by pure cutting planes. In the next theorem, we show that the situation completely reverses if one considers a broader family of disjunctions (still restricted to the pure integer case).
Let be a closed, convex set. Let be a fixed natural number and let be any family of disjunctions that contains all split disjunctions, such that all disjunctions in have at most terms in the disjunction. If a valid inequality for has a cutting plane proof of size using disjunctive cuts based on , then there exists a branch-and-bound proof of size at most based on . Moreover, there is a family of instances where branch-and-bound based on split disjunctions solves the problem in time whereas there is a polynomial lower bound on split cut proofs.
With an analysis similar to that of Theorem 1.8, we obtain the following theorem, which takes sparsity into account as well.
Let be a closed, convex set. Let be a valid inequality for . If there exists a cutting plane proof of size and sparsity certifying the validity of this inequality, which is derived using general split disjunctions of sparsity , then there exists a branch-and-bound proof of sparsity which proves the validity and takes at most iterations.
The above discussion and theorem motivate the following definition which formalizes the situation where no method dominates the other. To make things precise, we assume that there is a well-defined way to assign a concrete size to any instance of (1.1); see  for a discussion on how to make this precise. Additionally, when we speak of an instance, we allow the possibility of proving the validity of any inequality valid for , not necessarily related to an upper bound on the objective value. Thus, an instance is a pair such that for all .
A cutting plane paradigm and a branching scheme are complementary if there is a family of instances where gives polynomial (in the size of the instances) size proofs and the shortest branch-and-bound proof based on is exponential (in the size of the instances), and there is another family of instances where gives polynomial size proofs while gives exponential size proofs.
We wish to formalize the intuition that branch-and-cut is expected to be exponentially better than branch-and-bound or cutting planes alone for complementary pairs of branching schemes and cutting plane paradigms. But we need to make some mild assumptions about the branching schemes and cutting plane paradigms. All known branching schemes and cutting plane methods from the literature satisfy these conditions.
A branching scheme is said to be regular if no disjunction involves a continuous variable, i.e., each polyhedron in the disjunction is described using inequalities that involve only the integer constrained variables.
A branching scheme is said to be embedding closed if disjunctions from higher dimensions can be applied to lower dimensions. More formally, let , , , . If is a disjunction in with respect to , then the disjunction , interpreted as a set in , is also in for the space with respect to (note that , interpreted as a set in , is certainly a disjunction with respect to ; we want to be closed with respect to such restrictions).
A cutting plane paradigm is said to be regular if it has the following property, which says that adding “dummy variables” to the formulation of the instance should not change the power of the paradigm. Formally, let be any closed, convex set and let for some . Then if a cutting plane is derived by applied to , i.e., this inequality is in , then it should also be in , and conversely, if is in , then the equivalent inequality should be in .
A cutting plane paradigm is said to be embedding closed if cutting planes from higher dimensions can be applied to lower dimensions. More formally, let . Let be any closed, convex set. If the inequality is a cutting plane for with respect to that can be derived by applying to , then the cutting plane that is valid for should also belong to .
A cutting plane paradigm is said to be inclusion closed, if for any two closed convex sets , we have . In other words, any cutting plane derived for can also be derived for a subset .
Let be a regular, embedding closed branching scheme and let be a regular, embedding closed, and inclusion closed cutting plane paradigm such that includes all variable disjunctions and and form a complementary pair. Then there exists a family of instances of (1.1) which have polynomial size branch-and-cut proofs, whereas any branch-and-bound proof based on and any cutting plane proof based on is of exponential size.
As a concrete example of a complementary pair that satisfies the other conditions of Theorem 1.12, consider to be the Chvátal-Gomory paradigm and to be the family of variable disjunctions. From their definitions, they are both regular and is embedding closed. The Chvátal-Gomory paradigm is also embedding closed and inclusion closed. For the Jeroslow instances from Theorem 1.4, the single Chvátal-Gomory cut proves optimality, whereas variable disjunctions produce a tree of size . On the other hand, consider the set , where and the valid inequality for . Any proof based on the Chvátal-Gomory paradigm has size exponential in the size of the input, i.e., every proof has length at least . On the other hand, a single disjunction on the variable solves the problem.
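The single-cut half of this example can be illustrated with the usual Chvátal-Gomory rounding step: scale a valid inequality by a nonnegative multiplier so that the coefficients become integers, then round the right-hand side down. The sketch below assumes a pure integer setting and assumes the Jeroslow-type constraint has the form 2·(x_1 + … + x_n) ≤ n with n odd; both the instance and the helper name `chvatal_gomory_cut` are illustrative assumptions.

```python
from fractions import Fraction
from math import floor

def chvatal_gomory_cut(coeffs, rhs, multiplier):
    """Round `multiplier * (coeffs . x <= rhs)` into a Chvatal-Gomory cut.

    Requires a nonnegative multiplier that makes every scaled coefficient an
    integer; the returned pair (a, b) stands for the inequality a . x <= b.
    """
    u = Fraction(multiplier)
    if u < 0:
        raise ValueError("multiplier must be nonnegative")
    scaled = [u * a for a in coeffs]
    if any(c.denominator != 1 for c in scaled):
        raise ValueError("scaled coefficients must be integers")
    return [int(c) for c in scaled], floor(u * rhs)

# Assumed instance with n = 7: 2 * sum(x) <= 7 scaled by 1/2 and rounded.
n = 7
cut = chvatal_gomory_cut([2] * n, n, Fraction(1, 2))
```

With n = 7 the rounded inequality bounds the sum of the variables by 3, which is the integer optimum of the assumed instance, so under these assumptions one cut suffices where variable branching needs an exponentially large tree.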
In , we also studied examples of disjunction families such that disjunctive cuts based on are complementary to branching schemes based on .
Example 1.13 shows that the classical Chvátal-Gomory cuts and variable branching are complementary and thus give rise to a superior branch-and-cut routine when combined by Theorem 1.12. As discussed above, for 0/1 problems, lift-and-project cuts and variable branching do not form a complementary pair, and neither do split cuts and split disjunctions by Theorem 1.8. It would be nice to establish the converse of Theorem 1.12: if there is a family where branch-and-cut is exponentially superior, then the cutting plane paradigm and branching scheme are complementary. In Theorem 1.14 below, we prove a partial converse along these lines in the pure integer setting; it remains an open question if our definition of complementarity is an exact characterization of when branch-and-cut is superior.
Let be a branching scheme that includes all split disjunctions and let be any cutting plane paradigm. Suppose that for every pure integer instance and any cutting plane proof based on for this instance, there is a branch-and-bound proof based on of size at most a polynomial factor (in the size of the instance) larger. Then for any branch-and-cut proof based on and for a pure integer instance, there exists a pure branch-and-bound proof based on that has size at most polynomially larger than the branch-and-cut proof.
The high level message that we extract from our results is the formalization of the following simple intuition. For branch-and-cut to be superior to pure cutting planes or pure branch-and-bound, one needs the cutting planes and branching scheme to do “sufficiently different” things. For example, if they are both based on the same family of disjunctions (such as lift-and-project with variable branching, or the setting of Theorem 1.8), then we do not get any improvements with branch-and-cut. The definition of a complementary pair attempts to make the notion of “sufficiently different” formal and Theorem 1.12 derives the concrete superior performance of branch-and-cut from this formalization.
2.1 Proof of Theorem 1.4
We first give necessary definitions and prove a lemma.
Consider the instances in Theorem 1.4, and the branch-and-bound tree produced by split disjunctions to solve it. Assume node of contains at least one integer point in , and are the split disjunctions used to derive from the root of . For , is a true split disjunction of if both of the two halfspaces of have a nonempty intersection with the integer hull of the corresponding parent node, i.e. the parent node’s integer hull is split into two nonempty parts by . Otherwise, it is called a false split disjunction of . We define the generation variable set of as the index set such that it consists of all the indices of the variables involved in the true split disjunctions of . The generation set of the root node is empty.
Consider the instances in Theorem 1.4, and the branch-and-bound tree produced by split disjunctions with sparsity parameter to solve it. For any node of with at least one feasible integer point , let , and denote the relaxation, the integer hull and the generation variable set corresponding to . Define .
If , then we have:
the objective LP value of is .
We first give a proof of (i). Since is a feasible integer point, . Thus, there exists , where for and . So .
For each , we wish to show that . This will show that and . Consider any inequality describing ; if it is not the original defining inequality or a 0/1 bound on a variable, then this inequality was introduced on the path from the root to . A false split disjunction cannot remove since is integral. Consider an inequality coming from a true split disjunction. Let for some be such an inequality. Since and for , we observe that .
We will prove (ii) by contradiction, so we assume the objective LP value of is strictly less than . Let denote the relaxation corresponding to the root node. Assume .
Since , there exists , where . Define , where , and for . It is clear that , and since the LP value is assumed to be strictly less than . Since , there must be a halfspace coming from a false split disjunction of that excludes . The inequality describing this halfspace must involve variable , otherwise also violates , which leads to a contradiction since comes from a false split disjunction and therefore cannot cut off any integer point. Hence assume the inequality describing is for some , and (since the sparsity of the disjunctions is restricted to be at most ). Since , we have , and there exists such that . Let , where , , and for . By definition of , . Since are integral, and comes from a false split disjunction, must be valid for and . Thus, we have
which implies that is valid for . This is a contradiction. ∎
Proof of Theorem 1.4.
For a node of the branch-and-bound tree containing at least one integer point, if it is derived by exactly true split disjunctions, then we say it is a node of generation . By Lemma 2.2, if , then a node of generation has LP objective value , and in the subtree rooted at there must exist at least two descendants from generation , since the leaf nodes must have LP values less than or equal to . Therefore, there are at least nodes of generation when . This finishes the proof. ∎
2.2 Proof of Theorem 1.6
Let and . Then the number of 0/1 solutions to is at most .
Let and . By making the variable change for and for , it is seen that the number of 0/1 solutions to is the same as the number of 0/1 solutions to . Writing this a bit more cleanly, we want to upper bound the number of 0/1 solutions to , where for all and . The collection of subsets that are solutions to is an antichain in the lattice of subsets with set inclusion as the partial order because all the values are strictly positive. By Sperner’s Theorem , the size of this collection is at most . ∎
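The Sperner-type bound used in this lemma can be sanity-checked by brute force for small dimensions. The sketch below assumes the lemma takes the following form: for a vector a with s non-zero entries and any right-hand side b, the equation ⟨a, x⟩ = b has at most C(s, ⌊s/2⌋) solutions in {0,1}^s. Integer coefficients are used so that equality tests are exact; the helper names are hypothetical.

```python
from itertools import product
from math import comb

def max_solutions_over_rhs(a):
    """Largest number of 0/1 solutions of a . x = b over all right-hand sides b."""
    counts = {}
    for x in product((0, 1), repeat=len(a)):
        value = sum(ai * xi for ai, xi in zip(a, x))
        counts[value] = counts.get(value, 0) + 1
    return max(counts.values())

def sperner_bound(s):
    """Size of the largest antichain in the subset lattice of an s-element set."""
    return comb(s, s // 2)

# The all-ones vector attains the bound (take b = floor(s/2)).
tight = max_solutions_over_rhs([1] * 5)

# Mixed signs and magnitudes stay below the bound, as the variable change
# x_i -> 1 - x_i for negative coefficients suggests.
mixed = max_solutions_over_rhs([3, -2, 5, 1, -4, 2])
```

The check is exhaustive only for the vectors supplied, so it illustrates the statement rather than proving it; the proof remains the antichain argument in the text.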
Proof of Theorem 1.6.
We consider the instance from Theorem 1.6. For any split disjunction , we define to be the set of all the optimal LP vertices (of the original polytope) that lie strictly in the split set corresponding to . Let the support of be given by with . Since and , is precisely the subset of the optimal LP vertices such that . Fix some and consider those optimal LP vertices where . This means that . Let be the number of 0/1 solutions to with exactly coordinates set to 1. Then the number of vertices from with the -th coordinate equal to is
since for all . Using Lemma 2.3, and we obtain the upper bound on the number of vertices from with the -th coordinate equal to . Therefore, Since is odd, we have
A direct calculation then shows that
Let be the largest even number not exceeding . Since , we obtain, for every ,
Using the fact that, for every even positive integer ,
Thus this is an upper bound on . Since the total number of optimal LP vertices of the instance is , we obtain the following lower bound of on the size of a branch-and-bound proof: ∎
2.3 Proof of Theorem 1.7
Proof of Theorem 1.7.
We first show a branch-and-bound algorithm with size . Let the root node be . The objective LP value of is . Let and be the children of produced by branches and respectively. Then the LP values of and are and . Therefore is a leaf node. Recursively, let and be children of produced by and for . Note that this is well defined since the LP values of and are and for . It is clear that node is a leaf for . Node is an infeasible leaf since there are variables set to be . Therefore, the whole branch-and-bound tree has nodes.
Next, we show that any cutting plane for the problem with sparsity is valid for . We will use the fact that .
Let be the set of indices for the non-zero coefficients in an inequality defining the cutting plane, i.e., the inequality is given by . Since this is a cutting plane it must be valid for all points in . Let . Since , we have . Therefore is valid for all of . Since the inequality only involves , , it must also be a valid inequality for all of . ∎
2.4 Proof of Theorem 1.8
Proof of Theorem 1.8.
Let the cutting plane proof be , and the sequence of the corresponding disjunctions deriving it be . Moreover, assume is for . Since we assume all cutting planes are rational, we may assume and . Let be . Since is valid for , we must have that .
Let be the root node of the branch-and-bound tree. Recursively, we define and be the children of generated by applying the split disjunction for . Applying the disjunction on only generates infeasible nodes as noted above. Meanwhile, shows the validity of . Thus, we have replaced the cut with nodes of the branch-and-bound tree: of these are infeasible and one is feasible. Therefore, we get a branch-and-bound tree of size .
We first derive some straightforward consequences of Definition 1.11.
Let be two closed, convex sets. Let be any branching scheme and let be an inclusion closed cutting plane paradigm. If there is a branch-and-bound proof with respect to based on for the validity of an inequality , then there is a branch-and-bound proof with respect to based on for the validity of of the same size. The same holds for cutting plane proofs based on .
For the branch-and-bound proofs, apply the same set of disjunctions on instead of . Since , all the nodes in the branch-and-bound tree for are subsets of the corresponding nodes in the branch-and-bound tree for . Thus, is valid for the leaves of the new branch-and-bound tree.
For the cutting plane proofs, apply the same sequence of cuts and the result follows from the inclusion closed property of (Definition 1.11).∎
Let and be both embedding closed and let be a closed, convex set. Let be a valid inequality for . If there is a branch-and-bound proof with respect to based on for the validity of interpreted as a valid inequality in for , then there is a branch-and-bound proof with respect to based on for the validity of of the same size. The same holds for cutting plane proofs based on .
Since is embedding closed, for any disjunction used in the space , we use the restriction of to the space (Definition 1.11).
Similarly, the cutting plane claim follows from the fact that is embedding closed (Definition 1.11). ∎
Let be a polytope and let be a valid inequality for . Let . Then, for any regular branching scheme or a regular cutting plane paradigm , any proof of validity of with respect to can be changed into a proof of validity of with respect to with no change in length, and vice versa.
A proof of with respect to never involves , and so can be carried over verbatim to a proof for with respect to . In the other direction, since we assume is regular (Definition 1.11), no disjunction uses the variable and so it can be applied with the same effect on . Similarly, since is regular, by definition any cutting plane derived for can be converted into an equivalent cutting plane for .∎
Proof of Theorem 1.12.