 # Quantum speedup of branch-and-bound algorithms

Branch-and-bound is a widely used technique for solving combinatorial optimisation problems where one has access to two procedures: a branching procedure that splits a set of potential solutions into subsets, and a cost procedure that determines a lower bound on the cost of any solution in a given subset. Here we describe a quantum algorithm that can accelerate classical branch-and-bound algorithms near-quadratically in a very general setting. We show that the quantum algorithm can find exact ground states for most instances of the Sherrington-Kirkpatrick model in time $O(2^{0.226n})$, which is substantially more efficient than Grover's algorithm.


## Appendix A Example: Integer Linear Programming

To gain some intuition for how the results presented here could be applied, in this appendix we describe one simple and well-known application of branch-and-bound techniques: integer linear programming. An integer linear program (ILP) is a problem of the form:

$$\text{minimise } c^T x \quad \text{subject to } Ax \ge b,\ x \ge 0,\ x \in \mathbb{Z}^n$$

where $c$ and $b$ are vectors, $A$ is a matrix, and inequalities are interpreted componentwise. Integer linear programming problems have many applications, including production planning, scheduling, capital budgeting and depot location.

We can solve ILPs using branch-and-bound as follows. We begin by finding a lower bound on the optimal solution to the ILP, by relaxing it to a standard linear program (LP) and solving the LP; that is, removing the constraint $x \in \mathbb{Z}^n$. This corresponds to the cost function. If the solution is integer-valued, we are done, as it corresponds to a valid solution to the ILP. Otherwise, consider an index $i$ such that the found solution value $x_i^*$ is not an integer. To implement branching, we consider the two LPs formed by introducing the constraints $x_i \le \lfloor x_i^* \rfloor$ and $x_i \ge \lceil x_i^* \rceil$ respectively. At least one of these must have the same optimal solution as the original ILP. We then repeat with these new LPs. An appealing aspect of this method is that the solution to the relaxation simultaneously tells us a lower bound on the cost, and a good variable to branch on.

The sequences of additional constraints specify subsets of potential solutions to the overall ILP. The branch and cost functions take this sequence as input and solve the resulting LP, to make a decision about which variable to branch on next, and compute a lower bound on cost, respectively. The complexity of the LP-solving step is polynomial in the input size, so the primary contribution to the overall runtime will in general be the exponential scaling in terms of the number of branching steps. A standard classical method could be used (e.g. the simplex algorithm), or one of the recently developed quantum algorithms for linear programming [11, 4, 3, 26, 10].

A particularly simple and elegant special case of this approach is the knapsack problem. Here we are given a list of $n$ items, each with weights $w_i$ and values $v_i$, and an overall weight upper bound $W$. We seek to find a subset $S$ of the items that maximises $\sum_{i \in S} v_i$, given that $\sum_{i \in S} w_i \le W$. We can write this as an integer linear program as follows:

$$\text{maximise } \sum_{i=1}^{n} v_i x_i \quad \text{subject to } \sum_{i=1}^{n} w_i x_i \le W,\ x_i \in \{0,1\} \text{ for all } i.$$

Each variable $x_i$ corresponds to whether the $i$'th item is included in the knapsack. Then the LP relaxation is simply to replace the constraint $x_i \in \{0,1\}$ with the constraint $0 \le x_i \le 1$ for all $i$. This is equivalent to allowing fractional amounts of each item to be included.
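The knapsack case is small enough to sketch classically. The following is a minimal illustration, not the implementation used in this work: the function names are hypothetical, weights are assumed to be positive, and for simplicity the search branches on items in a fixed order of decreasing value density rather than on the fractional variable of the relaxation. Because knapsack is a maximisation problem, the relaxation supplies an upper bound on the value instead of a lower bound on the cost.

```python
def knapsack_branch_and_bound(weights, values, capacity):
    """Return the optimal 0/1 knapsack value (weights assumed positive)."""
    # Sort by value density so the greedy fractional fill solves the LP relaxation.
    order = sorted(range(len(weights)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    w = [weights[i] for i in order]
    v = [values[i] for i in order]
    n = len(w)
    best = 0

    def relaxation(k, cap, val):
        # Optimum of the LP relaxation over items k..n-1 with capacity cap left.
        for i in range(k, n):
            if w[i] <= cap:
                cap -= w[i]
                val += v[i]
            else:
                return val + v[i] * cap / w[i]  # take a fraction of item i
        return val

    def visit(k, cap, val):
        nonlocal best
        # Excluding all remaining items is feasible, so val is achievable.
        if val > best:
            best = val
        # Prune when even the relaxed problem cannot beat the incumbent.
        if k == n or relaxation(k, cap, val) <= best:
            return
        if w[k] <= cap:
            visit(k + 1, cap - w[k], val + v[k])  # branch: include item k
        visit(k + 1, cap, val)                    # branch: exclude item k

    visit(0, capacity, 0)
    return best


print(knapsack_branch_and_bound([10, 20, 30], [60, 100, 120], 50))
```

The pruning test is the only place the relaxation enters: a subtree is discarded as soon as its relaxed optimum cannot improve on the best integer solution found so far.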

The branch-and-bound approach to solving ILPs can immediately also be applied to the generalisation to Mixed Integer Linear Programming, where only certain variables are constrained. Now we only branch on those variables which are forced to be integers. One can also apply it to “branch and cut” algorithms. In this approach, when the LP relaxation returns a non-integer-valued solution, one may also add a new constraint (hyperplane) which separates that solution from all integer-valued feasible solutions.

## Appendix B Analysis of Algorithm 2

In this appendix we prove the correctness and the claimed runtime bound of Algorithm 2.

###### Theorem 1.

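Let $c_{\min}$ be the minimal cost of a valid solution, and let $T_{\min}$ be the size of the truncated tree with cost bound $c_{\min}$. (If there is no solution, $c_{\min} = c_{\max}$, and $T_{\min}$ is the size of the whole tree.) Algorithm 2 uses

$$O\left(\sqrt{T_{\min} d}\,\log c_{\max}\,\log\left(\frac{d \log c_{\max}}{\epsilon}\right) \times \left(\log\left(\frac{d \log c_{\max}}{\epsilon}\right) + d \log d\right)\right)$$

oracle calls, and except with failure probability at most $\epsilon$, returns a solution with minimal cost, if one exists, and otherwise “no solution”.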

###### Proof.

We first show that the algorithm succeeds with probability at least . The loop executes at most times, so each of Count and Search is used at most times. By a union bound, it is sufficient to pick to ensure that all the uses of Count and Search succeed, except with total probability at most . So we henceforth assume that Count and Search do always succeed.

If this is the case, we first observe that the algorithm always correctly outputs a minimal-cost solution, if one exists, or otherwise “no solution”. This is because at the final iteration (when ), if no solution has previously been found then Search will explore the entire tree and find a solution if one exists. To see that it outputs a minimal-cost solution, note that the binary search on using Search is over the range , and is no larger than the largest value of previously computed, so any solution with cost smaller than would have been found in a previous iteration.

It remains to prove the runtime bound. Let denote the size of the truncated tree with cost bound (so ). The first binary search (in part 2a) executes Count times, each iteration using queries; and the second binary search executes Search times, where each iteration uses queries. At each iteration of the loop, after the binary search using Count, by correctness of the quantum tree size estimation algorithm. Further, at the first iteration when (if such an iteration occurs), for all , Count does not return “contains more than nodes”. This implies that , because as the binary search terminated at cost , Count must have returned “contains more than nodes”. Note that this holds even though Count can return an arbitrary outcome when .

Therefore, at this iteration the tree truncated at cost contains a minimal-cost solution, which will be found by the binary search on using Search, and the algorithm will terminate. On the other hand, if there is no iteration such that , we must have . Combining these two claims, we have throughout the algorithm. The loop over exponentially increasing values of does not affect the overall complexity bound, so the overall complexity is

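$$O\left(\sqrt{T_{\min} d}\,\log c_{\max}\,\log\left(\frac{d \log c_{\max}}{\epsilon}\right) \times \left(\log\left(\frac{d \log c_{\max}}{\epsilon}\right) + d \log d\right)\right)$$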

queries, as claimed. ∎

We remark that it seems that, in general, Algorithm 2 could not be replaced with simply using the Search subroutine with exponentially increasing values of the cost parameter (an approach taken in  for the special case of accelerating backtracking algorithms for the travelling salesman problem). This is because increasing the cost at which the tree is truncated by a constant factor could increase the size of the truncated tree substantially beyond .

## Appendix C Truncated tree size bound for Sherrington-Kirkpatrick model

In this appendix, let $T_A$ denote the size of the tree corresponding to the classical branch-and-bound algorithm applied to find the ground-state energy of an Ising Hamiltonian corresponding to an $n \times n$ matrix $A$, using the bounding function described in the main text, where the tree is truncated at the optimal value $\min_z z^T A z$.

We will prove the following result:

###### Theorem 2.

Let $A$ be an $n \times n$ matrix corresponding to a Sherrington-Kirkpatrick model instance on $n$ spins. For all sufficiently large $n$,

$$\Pr_A\left[T_A \ge 2^{0.451n}\right] \le 0.01.$$

The dominant term in the quantum complexity is the square root of the classical complexity, which is determined by $T_A$, so Theorem 2 implies that the quantum branch-and-bound algorithm has an $O(2^{0.226n})$ running time on 99% of Sherrington-Kirkpatrick model instances.
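As a rough check of the exponent, assuming the quantum cost is the square root of the tree size up to factors polynomial in $n$ (which are absorbed by rounding the exponent up), the arithmetic is:

$$\sqrt{2^{0.451n}} = 2^{0.2255n} \le 2^{0.226n}, \qquad \text{whereas} \qquad \sqrt{2^{n}} = 2^{0.5n}$$

where the second expression is the cost of Grover search over all $2^n$ spin configurations.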

In order to prove Theorem 2, we will need two technical lemmas, proven in Appendix D.

###### Lemma 3.

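Let $x_1, \dots, x_N$ be independent standard normal random variables. Let $f : \mathbb{R}^N \to \mathbb{R}$ be continuous and 1-Lipschitz in each coordinate separately, i.e. $|f(x_1, \dots, x_i, \dots, x_N) - f(x_1, \dots, x_i', \dots, x_N)| \le |x_i - x_i'|$ for all $i$ and all $x_1, \dots, x_N, x_i'$. Then

$$\Pr\left[f(x) \ge \mathbb{E}_x[f(x)] + t\right] \le e^{-t^2/(2N)}, \qquad \Pr\left[f(x) \le \mathbb{E}_x[f(x)] - t\right] \le e^{-t^2/(2N)}.$$

Lemma 3 was shown in  but with an incorrect constant. We will also need a bound on the expectation $\mathbb{E}_A[\min_z z^T A z]$. A precise value for this is known as $n \to \infty$, but we will need a bound that holds for arbitrary $n$: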

###### Lemma 4.

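Let $A$ be an $n \times n$ matrix corresponding to a Sherrington-Kirkpatrick model instance on $n$ spins. For all $n$, $\mathbb{E}_A\left[\min_{z \in \{\pm 1\}^n} z^T A z\right] \ge -0.601\sqrt{n} - 0.833\, n^{3/2}$.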

We are now able to prove Theorem 2. The basic strategy is to upper-bound the expected value of $T_A$, using that (by linearity of expectation) this can be expressed as a sum over all bit-strings $x \in \{\pm 1\}^\ell$, $0 \le \ell \le n$, of the probability that the node corresponding to $x$ is contained within the truncated branch-and-bound tree. These bit-strings are precisely those such that $\mathrm{Bound}_A(x) \le \min_z z^T A z$, and a tail bound can be used to upper-bound the probability that this event occurs.

There are two technical difficulties which need to be handled. First, this approach does not give a good upper bound in the case where $\min_z z^T A z$ is high, which can occur with non-negligible probability, leading to $T_A$ becoming large. We therefore handle this case separately and show that it occurs with low probability. Next, to find a tail bound on $\mathrm{Bound}_A(x)$, we need to compute expressions of the form $\mathbb{E}_A[\min_z z^T A z]$; although a limiting form for this is known [37, 42, 39], we will additionally need relatively tight bounds in the case where the number of spins is not large. We therefore split into cases $\ell \le 0.9n$ (where $n - \ell \to \infty$ and we can use the precise limiting result) and $\ell > 0.9n$ (where we use Lemma 4).

###### Proof of Theorem 2.

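Write $\mu = \mathbb{E}_A[\min_z z^T A z]$, and let $\gamma > 0$ be an arbitrary value to be determined. We will upper-bound the probability that $T_A \ge B$ for some $B$ as follows, where we use the notation $[P]$ for the indicator random variable which evaluates to 1 if $P$ is true, and 0 if $P$ is false:

$$\begin{aligned}
\Pr_A[T_A \ge B] &= \Pr_A\left[T_A \ge B \wedge \min_z z^T A z \le \mu + \gamma n^{3/2}\right] + \Pr_A\left[T_A \ge B \wedge \min_z z^T A z > \mu + \gamma n^{3/2}\right] \\
&\le \Pr_A\left[T_A \left[\min_z z^T A z \le \mu + \gamma n^{3/2}\right] \ge B\right] + \Pr_A\left[\min_z z^T A z > \mu + \gamma n^{3/2}\right] \\
&\le \frac{1}{B}\,\mathbb{E}_A\left[T_A \left[\min_z z^T A z \le \mu + \gamma n^{3/2}\right]\right] + \Pr_A\left[\min_z z^T A z > \mu + \gamma n^{3/2}\right] \\
&= \frac{1}{B} \sum_{\ell=0}^{n} \sum_{x \in \{\pm 1\}^\ell} \mathbb{E}_A\left[\left[\mathrm{Bound}_A(x) \le \min_z z^T A z\right]\left[\min_z z^T A z \le \mu + \gamma n^{3/2}\right]\right] + \Pr_A\left[\min_z z^T A z > \mu + \gamma n^{3/2}\right] \\
&\le \frac{1}{B} \sum_{\ell=0}^{n} \sum_{x \in \{\pm 1\}^\ell} \Pr_A\left[\mathrm{Bound}_A(x) \le \mu + \gamma n^{3/2}\right] + \Pr_A\left[\min_z z^T A z > \mu + \gamma n^{3/2}\right] \\
&\le \frac{n+1}{B} \max_{\ell,\, x \in \{\pm 1\}^\ell} 2^\ell \Pr_A\left[\mathrm{Bound}_A(x) \le \mu + \gamma n^{3/2}\right] + \Pr_A\left[\min_z z^T A z > \mu + \gamma n^{3/2}\right],
\end{aligned}$$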

where the second inequality is Markov’s inequality and we use linearity of expectation in the second equality.

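To upper-bound the last term, we use Lemma 3. We first observe that $\min_z z^T A z$ is 1-Lipschitz in each variable $a_{ij}$, as if we modify $A$ to produce $A'$ by changing $a_{km}$ to $a'_{km}$ for some pair $k < m$,

$$\min_{z \in \{\pm 1\}^n} \sum_{i<j} a'_{ij} z_i z_j \le \min_{z \in \{\pm 1\}^n} \sum_{i<j} a_{ij} z_i z_j + |a'_{km} - a_{km}| \qquad (2)$$

and by a similar argument $\min_z \sum_{i<j} a_{ij} z_i z_j \le \min_z \sum_{i<j} a'_{ij} z_i z_j + |a'_{km} - a_{km}|$. So Lemma 3 implies that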

$$\Pr_A\left[\min_z z^T A z > \mu + \gamma n^{3/2}\right] \le e^{-(\gamma n^{3/2})^2 / \left(2\binom{n}{2}\right)} \le e^{-\gamma^2 n}.$$

For this to be upper-bounded by a small constant (e.g. 0.005) we can take $\gamma = c/\sqrt{n}$ for a sufficiently large constant $c$.
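For concreteness, the requirement on $\gamma$ can be made explicit by solving the inequality directly (a worked step added here for illustration, not taken from the original text):

$$e^{-\gamma^2 n} \le 0.005 \iff \gamma^2 n \ge \ln 200 \approx 5.298 \iff \gamma \ge \sqrt{\frac{\ln 200}{n}} \approx \frac{2.302}{\sqrt{n}}.$$

In particular, a suitable $\gamma$ tends to 0 as $n$ grows.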

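We next upper-bound the first term by bounding $\max_{\ell,\, x \in \{\pm 1\}^\ell} 2^\ell \Pr_A[\mathrm{Bound}_A(x) \le \mu + \gamma n^{3/2}]$. We only need to consider $\ell \ge 0.4n$ in the maximisation, because when $\ell < 0.4n$, trivially upper-bounding this probability by 1 already gives a sufficiently strong bound. Recall that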

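$$\mathrm{Bound}_A(x) = \sum_{1 \le i < j \le \ell} a_{ij} x_i x_j - \sum_{j=\ell+1}^{n} \left|\sum_{i=1}^{\ell} a_{ij} x_i\right| + \min_{z} \sum_{\ell+1 \le i < j \le n} a_{ij} z_i z_j.$$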

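The function $\mathrm{Bound}_A(x)$ is 1-Lipschitz in each variable $a_{ij}$ by a similar argument to (2). Thus, for any $\eta > 0$,

$$\Pr_A\left[\mathrm{Bound}_A(x) \le \mathbb{E}_A[\mathrm{Bound}_A(x)] - \eta n^{3/2}\right] \le e^{-\eta^2 n}. \qquad (3)$$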

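First assume that $\ell \le 0.9n$, so $n - \ell \to \infty$ as $n \to \infty$. For any $x \in \{\pm 1\}^\ell$, we have

$$\begin{aligned}
\mathbb{E}_A[\mathrm{Bound}_A(x)] &= \sum_{1 \le i < j \le \ell} \mathbb{E}_A[a_{ij}]\, x_i x_j - \sum_{j=\ell+1}^{n} \mathbb{E}_A\left[\left|\sum_{i=1}^{\ell} a_{ij} x_i\right|\right] + \mathbb{E}_A\left[\min_z \sum_{\ell+1 \le i < j \le n} a_{ij} z_i z_j\right] && (4) \\
&= -(n-\ell)\, \mathbb{E}_A\left[\left|\sum_{i=1}^{\ell} a_{ij}\right|\right] + \mathbb{E}_A\left[\min_z \sum_{\ell+1 \le i < j \le n} a_{ij} z_i z_j\right] && (5) \\
&= -(n-\ell)\sqrt{\frac{2\ell}{\pi}} - (0.763\dots + o(1))(n-\ell)^{3/2}, && (6)
\end{aligned}$$

where we use linearity of expectation to obtain the first expression, that $\sum_{i=1}^{\ell} a_{ij}$ is a normal random variable with variance $\ell$ (so its expected absolute value is $\sqrt{2\ell/\pi}$), and the known limiting result $\mathbb{E}_A[\min_z z^T A z] = -(0.763\dots + o(1))\, n^{3/2}$ [37, 42, 39].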

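Writing $\ell = \alpha n$, we have that

$$\mathbb{E}_A[\mathrm{Bound}_A(x)] = \left(-(1-\alpha)\sqrt{\alpha}\sqrt{\frac{2}{\pi}} - (0.763\dots + o(1))(1-\alpha)^{3/2}\right) n^{3/2} =: g_1(\alpha)\, n^{3/2}.$$

On the other hand, for $\ell > 0.9n$, we follow a similar argument but apply the nonasymptotic result of Lemma 4 to bound the expected minimum over the remaining $n - \ell$ spins, which implies that

$$\begin{aligned}
\mathbb{E}_A[\mathrm{Bound}_A(x)] &\ge -(n-\ell)\sqrt{\frac{2}{\pi}}\sqrt{\ell} - 0.601\sqrt{n-\ell} - 0.833(n-\ell)^{3/2} && (7) \\
&\ge -(n-\ell)\sqrt{\frac{2}{\pi}}\sqrt{\ell} - 1.434(n-\ell)^{3/2} && (8) \\
&= \left(-(1-\alpha)\sqrt{\alpha}\sqrt{\frac{2}{\pi}} - 1.434(1-\alpha)^{3/2}\right) n^{3/2} =: g_2(\alpha)\, n^{3/2}. && (9)
\end{aligned}$$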

In either case, we have

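$$\Pr_A\left[\mathrm{Bound}_A(x) \le \mu + \gamma n^{3/2}\right] = \Pr_A\left[\mathrm{Bound}_A(x) - \mathbb{E}_A[\mathrm{Bound}_A(x)] \le \mu + \gamma n^{3/2} - \mathbb{E}_A[\mathrm{Bound}_A(x)]\right].$$

By (3), using $\mu = -(0.763\dots + o(1))\, n^{3/2}$ and observing (see Figure 3) that $g_1(\alpha) + 0.763\dots - \gamma > 0$ and $g_2(\alpha) + 0.763\dots - \gamma > 0$ on the respective ranges of $\alpha$ for sufficiently large $n$, so the right-hand side is negative as required, we have

$$\Pr_A\left[\mathrm{Bound}_A(x) \le \mu + \gamma n^{3/2}\right] \le \begin{cases} e^{-(g_1(\alpha) + (0.763\dots + o(1)) - \gamma)^2 n} & \text{if } 0.4 \le \alpha \le 0.9 \\ e^{-(g_2(\alpha) + (0.763\dots + o(1)) - \gamma)^2 n} & \text{if } \alpha \ge 0.9. \end{cases}$$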

So

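$$\begin{aligned}
\max_{\ell \ge 0.4n,\, x \in \{\pm 1\}^\ell} 2^\ell \Pr_A\left[\mathrm{Bound}_A(x) \le \mu + \gamma n^{3/2}\right] &\le \max\left\{\max_{\alpha \in [0.4, 0.9]} 2^{\alpha n} e^{-(g_1(\alpha) + 0.763\dots + o(1))^2 n},\ \max_{\alpha \in [0.9, 1]} 2^{\alpha n} e^{-(g_2(\alpha) + 0.763\dots + o(1))^2 n}\right\} \\
&= \max\left\{\max_{\alpha \in [0.4, 0.9]} 2^{n\left(\alpha - (g_1(\alpha) + 0.763\dots + o(1))^2/\ln 2\right)},\ \max_{\alpha \in [0.9, 1]} 2^{n\left(\alpha - (g_2(\alpha) + 0.763\dots + o(1))^2/\ln 2\right)}\right\}
\end{aligned}$$

observing that $\gamma = o(1)$. It remains to determine upper bounds on the functions

$$\alpha - \frac{(g_1(\alpha) + 0.763\dots + o(1))^2}{\ln 2} = \alpha - \frac{\left(-(1-\alpha)\sqrt{\alpha}\sqrt{\frac{2}{\pi}} + 0.763\dots\left(1 - (1-\alpha)^{3/2}\right)\right)^2}{\ln 2} + o(1) =: h_1(\alpha) + o(1), \qquad (10)$$

$$\alpha - \frac{(g_2(\alpha) + 0.763\dots + o(1))^2}{\ln 2} = \alpha - \frac{\left(-(1-\alpha)\sqrt{\alpha}\sqrt{\frac{2}{\pi}} + 0.763\dots - 1.434(1-\alpha)^{3/2}\right)^2}{\ln 2} + o(1) =: h_2(\alpha) + o(1). \qquad (11)$$

This can easily be achieved numerically, giving (see Figure 4) the result that $h_1(\alpha) < 0.45003$ for all $\alpha$ and $h_2(\alpha) < 0.45003$ for $\alpha \ge 0.9$. Hence

$$\Pr_A[T_A \ge B] \le \frac{(n+1)\, 2^{(0.45003 + o(1)) n}}{B} + 0.005,$$

and to upper-bound the first term by 0.005, for sufficiently large $n$ one can take $B = 2^{0.451n}$. This completes the proof. ∎

Figure 3: The functions $g_1(\alpha)$, $g_2(\alpha)$ defined in (6), (9). $g_1(\alpha) \ge -0.763$ for all $\alpha \ge 0.4$, while $g_2(\alpha) \ge -0.763$ for all $\alpha \ge 0.9$.

Figure 4: The functions $h_1(\alpha)$, $h_2(\alpha)$ defined in (10), (11). $h_1(\alpha) < 0.45003$ for all $\alpha$, while $h_2(\alpha) < 0.45003$ for all $\alpha \ge 0.9$.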
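The numerical step can be reproduced with a simple grid search. The sketch below is illustrative only: the value `0.763166` is an assumed truncation of the constant written as $0.763\dots$ in the text, the $o(1)$ terms are dropped, and the function names and grid resolution are arbitrary choices made here.

```python
import math

PARISI = 0.763166                 # assumed value of the constant 0.763... in the text
ROOT_2_OVER_PI = math.sqrt(2 / math.pi)


def h1(alpha):
    # h_1 from (10), with the o(1) term dropped.
    inner = (-(1 - alpha) * math.sqrt(alpha) * ROOT_2_OVER_PI
             + PARISI * (1 - (1 - alpha) ** 1.5))
    return alpha - inner ** 2 / math.log(2)


def h2(alpha):
    # h_2 from (11), with the o(1) term dropped.
    inner = (-(1 - alpha) * math.sqrt(alpha) * ROOT_2_OVER_PI
             + PARISI - 1.434 * (1 - alpha) ** 1.5)
    return alpha - inner ** 2 / math.log(2)


def grid_max(f, lo, hi, steps=100000):
    # Evaluate f on an evenly spaced grid and return (argmax, max).
    best_alpha = max((lo + (hi - lo) * k / steps for k in range(steps + 1)), key=f)
    return best_alpha, f(best_alpha)


alpha1, max_h1 = grid_max(h1, 0.4, 0.9)
alpha2, max_h2 = grid_max(h2, 0.9, 1.0)
print(f"max h1 on [0.4, 0.9]: {max_h1:.5f} at alpha = {alpha1:.4f}")
print(f"max h2 on [0.9, 1.0]: {max_h2:.5f} at alpha = {alpha2:.4f}")
```

Under these assumptions the maximum of $h_1$ should land very close to the stated bound of 0.45003, which shows how little slack there is in the exponent, while $h_2$ stays well below it on $[0.9, 1]$.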

## Appendix D Proofs of technical lemmas

In this appendix we prove Lemmas 3 and 4. We say that $f$ satisfies the bounded differences condition with constants $d_i$, one for each coordinate $i$, if $|f(x) - f(x')| \le d_i$ whenever $x$ and $x'$ differ only in the $i$'th coordinate.

###### Lemma 5 (McDiarmid’s inequality or method of bounded differences [16, Corollary 5.2]).

If $f$ satisfies the bounded differences condition with constants $d_i$, and the coordinates of $x$ are independent random variables, then

$$\Pr\left[f(x) \ge \mathbb{E}_x[f(x)] + t\right] \le e^{-2t^2/d}, \qquad \Pr\left[f(x) \le \mathbb{E}_x[f(x)] - t\right] \le e^{-2t^2/d},$$

where $d = \sum_i d_i^2$.

###### Lemma 3 (restated).

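Let $x_1, \dots, x_N$ be independent standard normal random variables. Let $f : \mathbb{R}^N \to \mathbb{R}$ be continuous and 1-Lipschitz in each coordinate separately, i.e. $|f(x_1, \dots, x_i, \dots, x_N) - f(x_1, \dots, x_i', \dots, x_N)| \le |x_i - x_i'|$ for all $i$ and all $x_1, \dots, x_N, x_i'$. Then

$$\Pr\left[f(x) \ge \mathbb{E}_x[f(x)] + t\right] \le e^{-t^2/(2N)}, \qquad \Pr\left[f(x) \le \mathbb{E}_x[f(x)] - t\right] \le e^{-t^2/(2N)}.$$

###### Proof.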

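For $i = 1, \dots, N$, $j = 1, \dots, M$, let $y_{ij}$ be a Rademacher random variable, taking values $\pm 1$ with equal probability. Then define the sequence $x(y)$ by $x(y)_i = \frac{1}{\sqrt{M}} \sum_{j=1}^{M} y_{ij}$. Let $g$ be defined by setting $g(y) = f(x(y))$. Then changing one entry of $y$ can change $g$ by at most $2/\sqrt{M}$, so we can apply Lemma 5 with $d = 4N$ to obtain

$$\Pr\left[f(x(y)) \ge \mathbb{E}_y[f(x(y))] + t\right] \le e^{-t^2/(2N)}, \qquad \Pr\left[f(x(y)) \le \mathbb{E}_y[f(x(y))] - t\right] \le e^{-t^2/(2N)}.$$

As $M \to \infty$, the distribution of $x(y)_i$ approaches a standard normal distribution for all $i$. The lemma follows. ∎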
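To see where the constant $2N$ in the exponent comes from, suppose each of the $N$ coordinates is written as a normalised sum of $M$ Rademacher variables (the symbol $M$ is notation chosen here for illustration). Flipping one such variable moves a coordinate by $2/\sqrt{M}$, so each of the $NM$ variables has bounded difference $2/\sqrt{M}$, and the quantity $d$ in Lemma 5 is

$$d = \sum_{i=1}^{N} \sum_{j=1}^{M} \left(\frac{2}{\sqrt{M}}\right)^2 = 4N, \qquad \text{so} \qquad e^{-2t^2/d} = e^{-2t^2/(4N)} = e^{-t^2/(2N)}.$$

The bound is independent of $M$, which is what allows the limit to be taken.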

###### Lemma 4 (restated).

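Let $A$ be an $n \times n$ matrix corresponding to a Sherrington-Kirkpatrick model instance on $n$ spins. For all $n$, $\mathbb{E}_A\left[\min_{z \in \{\pm 1\}^n} z^T A z\right] \ge -0.601\sqrt{n} - 0.833\, n^{3/2}$.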

###### Proof of Lemma 4.

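We have

$$\mathbb{E}_A\left[\min_{z \in \{\pm 1\}^n} z^T A z\right] = -\int_{-\infty}^{0} \Pr\left[\min_{z \in \{\pm 1\}^n} z^T A z \le t\right] dt,$$

valid as $\min_z z^T A z$ is non-positive. Next, for any $t \le 0$ we have

$$\Pr\left[\min_{z \in \{\pm 1\}^n} z^T A z \le t\right] \le 2^n \Pr\left[\sum_{i<j} a_{ij} \le t\right]$$

using a union bound over $z$ and symmetry of the distribution of $a_{ij}$. By a tail bound on the normal distribution, we have

$$\Pr\left[\sum_{i<j} a_{ij} \le t\right] \le e^{-t^2/n^2}$$

for all $t \le 0$. So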

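$$\begin{aligned}
\mathbb{E}_A\left[\min_{z \in \{\pm 1\}^n} z^T A z\right] &\ge -\int_{-\infty}^{0} \min\left\{1,\, 2^n e^{-t^2/n^2}\right\} dt \\
&= -\int_{-\infty}^{-n^{3/2}\sqrt{\ln 2}} 2^n e^{-t^2/n^2}\, dt - \int_{-n^{3/2}\sqrt{\ln 2}}^{0} 1\, dt \\
&= -n\, 2^{n - 1/2} \int_{-\infty}^{-\sqrt{2n \ln 2}} e^{-t^2/2}\, dt - \sqrt{\ln 2}\; n^{3/2} \\
&\ge -\frac{\sqrt{n}}{2\sqrt{\ln 2}} - \sqrt{\ln 2}\; n^{3/2} \\
&= -0.600561\dots\sqrt{n} - 0.832555\dots n^{3/2},
\end{aligned}$$

where we use the bound $\int_{x}^{\infty} e^{-t^2/2}\, dt \le e^{-x^2/2}/x$ for any $x > 0$ in the second inequality. ∎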
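The two numerical constants and the Gaussian tail bound used in this proof can be checked in a few lines. This is a sanity check written for illustration; the helper names are hypothetical, and the tail integral is evaluated through the standard-library complementary error function.

```python
import math

# Constants appearing in the final line of the proof of Lemma 4.
coeff_sqrt_n = 1 / (2 * math.sqrt(math.log(2)))   # multiplies sqrt(n)
coeff_n_three_halves = math.sqrt(math.log(2))     # multiplies n^(3/2)


def gaussian_tail(x):
    # Integral of e^{-t^2/2} from x to infinity, written via erfc.
    return math.sqrt(math.pi / 2) * math.erfc(x / math.sqrt(2))


def tail_upper_bound(x):
    # The bound e^{-x^2/2} / x, valid for x > 0.
    return math.exp(-x * x / 2) / x


print(coeff_sqrt_n, coeff_n_three_halves)
for x in (0.5, 1.0, 2.0, 5.0):
    print(x, gaussian_tail(x), tail_upper_bound(x))
```

Rounding the two coefficients up gives the values 0.601 and 0.833 used in the proof of Theorem 2.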

## Appendix E Classical numerical branch-and-bound results

We implemented the classical branch-and-bound algorithm described in the main text, with cost function Bound, using a simple depth-first search procedure within the branch-and-bound tree which backtracks on nodes corresponding to partial solutions with an energy bound worse than the lowest energy seen thus far. For an S-K model instance described by a matrix $A$, this gives an upper bound on the size of an optimally truncated branch-and-bound tree (equivalently, on the runtime of the best-first search algorithm applied to find the ground state energy, with cost function Bound).
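A depth-first procedure of this kind can be sketched as follows. This is not the code used for the experiments: the function names are hypothetical, and the lower bound is a deliberately simplified one (it replaces the minimisation over undecided spins by the sum of $-|a_{ij}|$ over undecided pairs), so it explores more nodes than the Bound function of the main text would.

```python
import random


def lower_bound(a, x):
    """Simplified lower bound on the energy of any completion of partial assignment x."""
    n, l = len(a), len(x)
    # Energy of pairs with both spins already fixed.
    fixed = sum(a[i][j] * x[i] * x[j] for i in range(l) for j in range(i + 1, l))
    # Best case for each undecided spin interacting with the fixed spins.
    cross = sum(abs(sum(a[i][j] * x[i] for i in range(l))) for j in range(l, n))
    # Crude best case for pairs of undecided spins (simplification, see lead-in).
    free = sum(abs(a[i][j]) for i in range(l, n) for j in range(i + 1, n))
    return fixed - cross - free


def dfs_branch_and_bound(a):
    """Return (ground state energy, number of tree nodes visited)."""
    n = len(a)
    best = float("inf")
    nodes = 0

    def visit(x):
        nonlocal best, nodes
        nodes += 1
        bound = lower_bound(a, x)
        if bound >= best:
            return                # backtrack: cannot beat the incumbent
        if len(x) == n:
            best = bound          # at a leaf the bound equals the energy
            return
        for spin in (1, -1):
            visit(x + [spin])

    visit([])
    return best, nodes


# Random instance: a[i][j] for i < j drawn from a standard normal distribution.
rng = random.Random(0)
n = 10
a = [[rng.gauss(0, 1) if i < j else 0.0 for j in range(n)] for i in range(n)]
energy, visited = dfs_branch_and_bound(a)
print(energy, visited)
```

Counting `nodes` in this way is what yields the tree-size statistic that the fit below is applied to; a tighter bound function reduces the count without changing the returned energy.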

This algorithm enabled instances on more than 50 spins to be solved within minutes on a standard laptop computer. We then carried out a least-squares fit on the log of the number of nodes explored, omitting small $n$, to estimate the scaling of the algorithm with $n$. Note that, due to finite-size effects, this may not be accurate for large $n$; however, it gives an indication of tree size scaling. The median normalised ground state energy found for the larger values of $n$ (e.g.  for ) seems to approach the limiting value relatively slowly. These results are consistent with heuristic finite-size results reported in  and were validated using exhaustive search for small $n$.

Figure 5: Median tree size explored by classical branch-and-bound algorithm with depth-first strategy. 99 random instances generated for each $n$. Fit is line $y = 2^{0.371n + 5.380}$.

Figure 6: Normalised ground state energy $E_{\min} n^{-3/2}$ of instances of the S-K model. 99 random instances generated for each $n$.