# Fast quantum subroutines for the simplex method

We propose quantum subroutines for the simplex method that avoid classical computation of the basis inverse. For a well-conditioned m × n constraint matrix with at most d_c nonzero elements per column, at most d nonzero elements per column or row of the basis, and optimality tolerance ϵ, we show that pricing can be performed in time Õ(1/ϵ√(n)(d_c n + d^2 m)), where the Õ notation hides polylogarithmic factors. If the ratio n/m is larger than a certain threshold, the running time of the quantum subroutine can be reduced to Õ(1/ϵd √(d_c) n √(m)). Classically, pricing would require O(d_c^0.7 m^1.9 + m^2 + o(1) + d_c n) in the worst case using the fastest known algorithm for sparse matrix multiplication. We also show that the ratio test can be performed in time Õ(t/δ d^2 m^1.5), where t, δ determine a feasibility tolerance; classically, this requires O(m^2) in the worst case. For well-conditioned sparse problems the quantum subroutines scale better in m and n, and may therefore have a worst-case asymptotic advantage. An important feature of our paper is that this asymptotic speedup does not depend on the data being available in some "quantum form": the input of our quantum subroutines is the natural classical description of the problem, and the output is the index of the variables that should leave or enter the basis.

## Authors

• 8 publications
• ### A Quantum Interior Point Method for LPs and SDPs

We present a quantum interior point method with worst case running time ...
08/28/2018 ∙ by Iordanis Kerenidis, et al. ∙ 0

• ### Efficient algorithms for computing a minimal homology basis

Efficient computation of shortest cycles which form a homology basis und...
01/21/2018 ∙ by Tamal K. Dey, et al. ∙ 0

• ### Communication Complexity of One-Shot Remote State Preparation

Quantum teleportation uses prior shared entanglement and classical commu...
02/21/2018 ∙ by Shima Bab Hadiashar, et al. ∙ 0

• ### Quantum Algorithms for Variants of Average-Case Lattice Problems via Filtering

We show polynomial-time quantum algorithms for the following problems: ...
08/25/2021 ∙ by Yilei Chen, et al. ∙ 0

• ### Sparse Matrix to Matrix Multiplication: A Representation and Architecture for Acceleration (long version)

Accelerators for sparse matrix multiplication are important components i...
06/02/2019 ∙ by Pareesa Ameneh Golnari, et al. ∙ 0

• ### Optimal Provable Robustness of Quantum Classification via Quantum Hypothesis Testing

Quantum machine learning models have the potential to offer speedups and...
09/21/2020 ∙ by Maurice Weber, et al. ∙ 17

• ### Stabilizer Circuits, Quadratic Forms, and Computing Matrix Rank

We show that a form of strong simulation for n-qubit quantum stabilizer ...
03/29/2019 ∙ by Chaowen Guan, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

The simplex method is one of the most impactful algorithms of the past century; to this day, it is widely used in a variety of applications. This paper studies some opportunities for quantum computers to accelerate the simplex method.

The use of quantum computers for optimization is a central research question that has attracted significant attention in recent years. It is known that a quadratic speedup for unstructured search problems can be obtained using Grover’s algorithm [Grover, 1996]. Thanks to exponential speedups in the solution of linear systems [Harrow et al., 2009, Childs et al., 2017], it seems natural to try to translate those speedups into faster optimization algorithms, since linear systems appear as a building block in many optimization procedures. However, few results in this direction are known. This is likely due to the difficulties encountered when applying a quantum algorithm to a problem whose data is classically described, and a classical description of the solution is required. We provide a simple example to illustrate these difficulties.

Suppose we want to solve the system , where is an invertible matrix with at most nonzero elements per column or row. Using the fastest known quantum linear systems algorithm [Childs et al., 2017], the gate complexity of this operation is , where indicate the gate complexity necessary to describe in a certain input model, and is the condition number of . We remark that here and in the rest of this paper, we measure the running time for quantum subroutines as the number of basic gates, as is usual in the literature. Notice that does not appear in the running time, as the dependence is polylogarithmic. However, we need gates to implement for sparse in the gate model, as will be shown in the following; and gates are necessary to implement . This is natural for an exact representation of the input data, since has nonzero elements and has nonzero elements. If we also want to extract the solution with precision , using the fast tomography algorithm of [Kerenidis and Prakash, 2018] we end up with running time . This is slower than the time taken to classically compute an LU decomposition of , which is [Yuster and Zwick, 2005]. Thus, naive application of quantum linear system algorithms (QLSAs) does not give any advantage.

Despite these difficulties, a few examples of fast quantum optimization algorithms exist. [Brandao and Svore, 2017, Van Apeldoorn et al., 2017]

give polynomial speedups for the solution of semidefinite programs and therefore also linear programs (LPs). These two papers give a quantum version of

[Arora and Kale, 2016]: while the algorithm is essentially the same as its classical counterpart, the basic subroutines admit faster quantum algorithms. The running time for LPs is , where are bounds on the size of the optimal primal/dual solution, and an optimality tolerance; this is faster than any known classical algorithm when — although as [Van Apeldoorn et al., 2017] notes, interesting problems that fall into this regime are not known and many natural SDP formulations do not satisfy this requirement. To achieve this speedup, [Brandao and Svore, 2017, Van Apeldoorn et al., 2017] assume that there exists an efficient quantum subroutine to describe (i.e., in polylogarithmic time), and do not output the solution to the problem, but rather a quantum state that describes it. If we insist on classical input and output for the optimization problem, the overall running time increases significantly. The papers [Kerenidis and Prakash, 2018, Casares and Martin-Delgado, 2019] also give polynomial speedups for LPs, using different variants of an interior point algorithm. Specifically, [Kerenidis and Prakash, 2018] gives a running time of , where is a constraint satisfaction tolerance, and [Casares and Martin-Delgado, 2019] gives a running time of . Both papers follow the classical algorithm, but accelerate the basic subroutines performed at each iteration. To achieve this speedup, both papers rely on QRAM, a form of quantum storage. QRAM allows data preparation subroutines that are exponentially faster than what would be required under the standard gate model. Assuming QRAM, the algorithms of [Kerenidis and Prakash, 2018, Casares and Martin-Delgado, 2019] have classical input and output.

Summarizing, there are few known examples of faster quantum optimization algorithms, and all of them have strong assumptions on the availability of efficient data preparation or data readout subroutines. In particular, the quantum optimization algorithms of [Brandao and Svore, 2017, Van Apeldoorn et al., 2017, Kerenidis and Prakash, 2018, Casares and Martin-Delgado, 2019] have one of these two assumptions: (i) that having quantum input/output is acceptable, ignoring the cost of a classical translation, or (ii) that QRAM, a form of quantum storage whose physical realizability is unclear, is available. Both assumptions have the merit of leading to interesting algorithmic developments, but it is still an open question to find practical situations in which they are satisfied, particularly in the context of traditional optimization applications. In this paper we propose quantum subroutines that do not rely on any of these two assumptions.

#### Our results.

For brevity, from now on we assume that the reader is familiar with standard linear optimization terminology; we refer to [Bertsimas and Tsitsiklis, 1997] for a comprehensive treatment of LPs. The simplex method aims to solve , where with at most nonzero elements per column. It keeps a basis, i.e., a set of linearly independent columns of , and repeatedly moves to a different basis that defines a solution with better objective function value. As is common in the literature, we use the term “basis” to refer to both the set of columns, and the corresponding submatrix of , depending on context. The maximum number of nonzero elements in any column or row of the basis submatrix is denoted . The basis change (called a pivot) is performed by determining a new column that should enter the basis, and removing one column from the current basis. Choosing the new column is called pricing, and it is asymptotically the most expensive step: it requires computing the basis inverse and looping over all the columns in the worst case, for a total of operations using the matrix multiplication algorithm of [Yuster and Zwick, 2005]. If the basis inverse is known, i.e., updated from a previous iteration, the worst-case running time is .

We show that with a quantum computer we can apply Grover search for pricing, so that the running time scales as rather than for looping over all the columns. To apply Grover search we need a quantum oracle that determines if a column is eligible to enter the basis, i.e., if it has negative reduced cost. It is crucial that this is a quantum oracle, because the speedup of Grover search comes from being able to call the oracle on a superposition. We propose a construction for this oracle using the QLSA of [Childs et al., 2017]

, several gadgets to make state amplitudes interfere in a certain way, and amplitude estimation

[Brassard et al., 2002]. The construction avoids classical computation of the basis inverse. The overall running time of the oracle is , where is the precision of the reduced costs (i.e., the optimality tolerance). This gives a total running time of . If the ratio is large, we can find a better tradeoff between the Grover speedup and the data preparation subroutines, and improve the running time of the quantum pricing algorithm to . We summarize this below.

###### Theorem 1.

There exist quantum subroutines to identify if a basis is optimal, or determine a column with negative reduced cost, with running time , which can be reduced to if the ratio is larger than .

The regime with large is interesting because it includes many natural LP formulations; e.g., the LP relaxations of cutting stock problems, vehicle routing problems, or any other formulation that is generally solved by column generation [Lübbecke and Desrosiers, 2005]. The optimality tolerance for the quantum subroutines is slightly different from the classical algorithm, i.e., we need a tolerance relative to some norm for efficiency; these details will be discussed subsequently in the paper.

If pricing is performed via our quantum subroutine, we obtain the index of a column that has negative reduced cost with arbitrarily high probability. To determine which column should leave the basis, we have to perform the

ratio test. Using techniques similar to those used for the pricing step, we can identify the column that leaves the basis in time , where and are precision parameters of this step. In particular, determines how well we approximate the minimum of the ratio test performed by the classical algorithm (i.e., the relative error is ; there is also an absolute error controlled by because, as will be discussed subsequently in the paper, attaining a purely relative error bound would require infinite precision). Classically, the ratio test requires time in the worst case, because the basis inverse could be dense even if the basis is sparse (although it is unlikely in practice). We summarize this result below.

###### Theorem 2.

There exists a quantum subroutine to identify if a nonbasic column proves unboundedness of the LP in time . There also exists a quantum subroutine to perform the ratio test in time , returning an approximation of the true minimum of the ratio test with relative error and absolute error proportional to .

The exact statement of these theorems will be given subsequently in the paper.

It is known that for most practical LPs the maximum number of nonzeroes in a column is essentially constant; for example, on the entire benchmark set MIPLIB2010, less than 1% of the columns have more than 200 nonzeroes (and less than 5% have more than 50 nonzeroes). Similarly, the number of nonzeroes per row of the basis is small: on MIPLIB2010, looking at the optimal bases of the LP relaxations, less than of the rows have more than 50 nonzeroes. As increase, so typically does the sparsity. For example, the largest problem available in the benchmark set MIPLIB2017 has , but only nonzero elements per column on average; the second largest problem has , and only nonzero elements per column on average. It is therefore interesting to look at the scaling of the running time under the assumption that the sparsity parameters are constant, or at most polylogarithmic in and . In this case, the running time of the oracle for the reduced costs is , giving a total running time for pricing of . Hence, for a well-conditioned basis and under the assumption (often verified in practice) that , we obtain running time for the quantum pricing subroutine, which can be reduced to if the ratio is large; and running time for the quantum ratio test subroutine.

Summarizing, the quantum subroutines that we propose can be asymptotically faster than the best known classical method, under the assumption that the LPs are extremely sparse — an assumption that is generally verified in practice. Indeed, while the gate complexity of the quantum subroutines depends on some optimality and feasibility tolerances in addition to the classical input parameters (, , and the sparsity parameters), these tolerances do not scale with and . For well-conditioned problems, the quantum subroutines have better scaling in and , and this could turn into an asymptotic advantage. To achieve this potential advantage, we never explicitly compute the basis inverse, and rely on the quantum computer to indicate which columns should enter and leave the basis at each iteration. Similar to other papers in the quantum optimization literature, we use a classical algorithm (the simplex method) and accelerate the subroutines executed at each iteration of the simplex. Differently from the literature, we do not make assumptions on the availability of QRAM or of the data in “quantum form”, as we explicitly take into account the time necessary to prepare the data in the gate model; to the best of our knowledge, this is a crucial novelty of our paper in the quantum optimization literature. Of course, if QRAM were to become available in the future, our approach would also gain a significant speedup (the running time of the pricing routine drops to , and that of the ratio test drops to ).

Our algorithms require a polylogarithmic number of logical qubits, i.e., a fault-tolerant quantum computer, therefore they are not suitable for implementation on the noisy intermediate-scale quantum computers currently available. A further reason why our analysis is mostly of theoretical, rather than practical, interest is the fact that we only take into account the worst-case behavior: it is well established that in practice the simplex method performs much better than the worst case, in terms of the number of iterations

[Spielman and Teng, 2004, Dadush and Huiberts, 2018] as well as the complexity of a single iteration, see e.g. [Chvátal, 1983]. However, besides showing that quantum computers may accelerate the simplex method on large, sparse problems, our results may lead to further developments in quantum optimization algorithms. For example, we conjecture that our subroutines can be concatened to perform multiple pivots in superposition, which could have interesting applications even if the probability of success decreases exponentially in the number of consecutive pivots. These directions are left for future research.

The rest of this paper is organized as follows. We start with a brief summary of the simplex method to establish terminology, in Sect. 2. In Sect. 3 we define our notation, describe useful results from the literature, and give an overview of our quantum subroutines; Sect. 4 contains a detailed explanation of each step. Sect. 5 concludes the paper.

## 2 Overview of the simplex method

The simplex method solves the following linear optimization problem: , where , . A basis is a subset of linearly independent columns of . Given a basis , assume that it is an ordered set and let be the -th element of the set. The set is called the set of nonbasic variables. We denote by the square invertible submatrix of corresponding to columns in , and the remaining submatrix. The term “basis” may refer to or , depending on context. The simplex method can be described compactly as follows; see, e.g., [Bertsimas and Tsitsiklis, 1997] for a more detailed treatment.

• Start with any basic feasible solution. (This is w.l.o.g. because it is always possible to find one.) Let be the current basis, the nonbasic variables, the current solution.

• Repeat the following steps:

1. Compute the reduced costs for the nonbasic variables . This step is called pricing. If the basis is optimal: the algorithm terminates. Otherwise, choose .

2. Compute . If , the optimal cost is unbounded from below: the algorithm terminates.

3. If some component of is positive, compute (this step is called ratio test):

 r∗:=minj=1,…,m:uj>0xB(j)uj. (1)
4. Let be such that . Form a new basis replacing with . This step is called a pivot.

To perform the pricing step, we compute an LU factorization of the basis ; this requires time using fast sparse matrix multiplication techniques [Yuster and Zwick, 2005]. (In practice, the traditional

Gaussian elimination is used instead, but the factorization is not computed from scratch at every iteration.) Then, we can compute the vector

and finally perform the calculations for all ; this requires an additional time, bringing the total time to . To perform the ratio test, we need the vector , which takes time assuming the LU factorization of is available from pricing. As remarked earlier, may not be dense if is sparse, so this step could take significantly less time in practice. Finally, since the calculations are performed with finite precision, we use an optimality tolerance and the optimality criterion becomes .

## 3 Quantum implementation: overview

Before giving an overview of our methodology, we introduce some notation and useful results from the literature. We assume that the reader is familiar with quantum computing notation; an introduction for non-specialists is given in [Nannicini, 2017], and a comprehensive reference is [Nielsen and Chuang, 2002]. Given a vector , denotes its -norm; given a matrix , denotes the spectral norm, whereas is the Frobenius norm. Given two matrices (including vectors or scalars), denotes the matrix obtained stacking on top of , assuming that the dimensions are compatible. Given a classical vector , we denote its amplitude encoding. If we have binary digits or strings , we denote by their concatenation. Given a binary string with , we define . The symbol denotes the -digit all-zero binary string.

### 3.1 Preliminaries

This paper uses quantum algorithms for linear systems introduced in [Childs et al., 2017]. For the system with integer entries, the input to the algorithm is encoded by two unitaries and which are queried as oracles, more precisely:

• : specified by two maps; the map provides the index of the -th nonzero element of column , the map provides the value of the -th element of column .

• : maps .

We define the following symbols:

• : maximum number of nonzero entries in any column of .

• : maximum number of nonzero entries in any row of .

• : sparsity of .

• : ratio of largest to smallest nonzero singular value of

. Throughout this paper, we assume that , or an upper bound on it, is known.

• : maximum nonzero entry of , rounded up to a power of 2.

• : number of bits to index entries of .

###### Theorem 3 ([Childs et al., 2017]).

Let be symmetric and . Given , , and , there exists a quantum algorithm that produces the state with using queries to and queries to , with additional gate complexity .

Note that [Childs et al., 2017] also gives a faster algorithm that employs variable-time amplitude amplification; however, we cannot use it for our subroutines because it is not unitary, i.e., it requires measurements. The restriction on symmetric can be relaxed by considering the system , see [Harrow et al., 2009]. In the rest of this paper, we will refer to as a shorthand for the above symmetric matrix; the r.h.s. and the number of rows should always be adjusted accordingly. If the original (non-symmetric) is invertible, so is this symmetric matrix and the singular values are the same (but with different multiplicity). Our running time analysis takes into account this transformation, i.e., we use rather than where appropriate.

Throughout this paper, will be used to suppress polylogarithmic factors in the input parameters, i.e., . As stated in the introduction, we assess the complexity of quantum algorithms in terms of basic gates. We assume that the cost of a controlled unitary is the same as that of the unitary, because the number of additional gates required by the controlled unitary is generally the same in the notation. (All our algorithms require a polylogarithmic number of qubits.) Furthermore, we do not explicitly uncompute auxiliary registers, but it should always be assumed that auxiliary registers are reset to their original state at the end of a subroutine; this does not affect the running time analysis.

In addition to the QLSA, we will use the following well-known result.

###### Proposition 1 ([Nielsen and Chuang, 2002], Sect. 5.2).

When applying phase estimation, let be the output of the procedure when applied to an eigenstate with phase . If we use qubits of precision, the first bits of will be accurate with probability at least , i.e., .

### 3.2 High-level description of the quantum subroutines

Throughout this overview of the quantum algorithms, we assume that the LP data is properly normalized. The normalization can be carried out as a classical preprocessing step, whose details are given in Sect. 4, and the running time of this step is negligible compared to the remaining subroutines.

Our first objective is to implement a quantum oracle that determines if a column has negative reduced cost, so that we can apply Grover search to this oracle. To reach this goal we rely on the QLSA of Thm. 3. Using that algorithm, with relatively straightforward data preparation we can construct an oracle that, given a column index in a certain register, outputs in another register. We still need to get around three obstacles: (i) the output of the QLSA is a renormalization of the solution, rather than the (unscaled) vector ; (ii) we want to compute , while so far we only have access to ; (iii) we need the output to be a binary yes/no condition (i.e., not just an amplitude) so that Grover search can be applied to it, and we are not allowed to perform measurements. We overcome the first two obstacles by: extending and properly scaling the linear system so that is suitably encoded in the QLSA output; and using the inverse of the unitary that maps to to encode in the amplitude of one of the basis states. To determine the sign of such amplitude, we rely on interference to create a basis state with amplitude such that if and only if . At this point, we can apply amplitude estimation [Brassard et al., 2002] to determine the magnitude of up to precision . This requires iterations of the amplitude estimation algorithm. We therefore obtain a unitary operation that overcomes the three obstacles. A similar scheme can be used to determine if the basis is optimal, i.e., no column with negative reduced cost exists.

Some further complications merit discussion. The first one concerns the optimality tolerance: classically, this is typically for some given . Note that an absolute optimality tolerance is not invariant to rescaling of . In the quantum subroutines, checking for a column may be too expensive (i.e., ) if the norm of is too large; this is due to the normalization of the quantum state, which would force us to increase the precision of amplitude estimation. We therefore use the inequality as optimality criterion, and show that with this criterion it is sufficient to require precision for the amplitude estimation part. Notice that if is small, the ratio test (1) has a small denominator, hence we may move by a large amount if column enters the basis; it is therefore reasonable to require that is very close to zero to declare that the basis is optimal. The converse is true if is large. Since we never explicitly compute the basis inverse, we do not have classical access to . To alleviate this issue, we show in Sect. 4.5 that there exists an efficient quantum subroutine to compute the root mean square of over all ; thus, if desired we can also output this number to provide some characterization of the optimality tolerance used in the pricing step. A similar scheme can be used to output an estimate of for the specific entering the basis. It is important to remark that while the optimality tolerance is relative to a norm, which in turn depends on the problem data, the evidence provided in Sect. 1 indicates that due to sparsity, in practice we do not expect these norms to grow with and (or at least, no more than polylogarithmically).

The second complication concerns the condition number of the basis: the running time of the quantum routines explicitly depends on it, but this is not the case for the classical algorithms based on an LU decomposition of (although precision may be affected). We do not take any specific steps to improve the condition number of the basis (e.g., [Clader et al., 2013, Casares and Martin-Delgado, 2019]), but we note that one possibility is to simply ignore any singular values above a certain cutoff, and invert only the part of the basis that is “well-conditioned”, i.e., spanned by singular vectors with singular value below the cutoff. If we do so, the accuracy of the calculations could get arbitrarily bad in theory (i.e., if the r.h.s. of some linear system is orthogonal to the subspace considered [Harrow et al., 2009]), but in practice this would only happen if the problem is numerically very unstable.

With the above construction we have a quantum subroutine that determines the index of a column with negative reduced cost, if one exists. Such a column can enter the basis. To perform a basis update we still need to determine the column leaving the basis: this is our second objective. For this step we need knowledge of . Classically, this is straightforward because the basis inverse is available, since it is necessary to compute reduced costs anyway. With the above quantum subroutines the basis inverse is not known, and in fact part of the benefit of the quantum subroutines comes from always working with the original, sparse basis, rather than its possibly dense inverse. Thus, we describe another quantum algorithm that uses a QLSA as a subroutine, and identifies the element of the basis that is an approximate minimizer of the ratio test (1). Special care must be taken in this step, because attaining the minimum of the ratio test is necessary to ensure that the basic solution after the pivot is feasible (i.e., satisfies the nonnegativity constraints ). However, in the quantum setting we are working with continuous amplitudes, and determining if an amplitude is zero is impossible due to finite precision. Our approach to give rigorous guarantees for this step involves the use of two feasibility tolerances: a tolerance , that determines which components of will be involved in the ratio test (due to the condition in (1)); and a precision multiplier , that determines the approximation guarantee for the minimization in (1). In particular, our algorithm returns an approximate minimizer that attains a relative error of , plus an absolute error proportional to . We remark that since the minimum of the ratio test could be zero, giving a purely relative error bound seems impossible in finite precision. This is discussed in Sect. 4.6. A similar quantum algorithm can be used to determine if column proves unboundedness of the LP.

## 4 Details of quantum implementation

We now discuss data preparation and give the details of the quantum subroutines. The steps followed at every iteration are listed in Alg. 1; all the subroutines used in the algorithm are fully detailed in subsequent sections.

All data normalization is performed within the subroutine SimplexIter, on line 3: this prepares the input for all other subroutines. Normalizing has cost , while finding the leading singular value of has cost using, e.g., the power method, with precision [Kuczyński and Woźniakowski, 1992]

. The QLSA requires the eigenvalues of

to lie in : those outside this set will be discarded by the algorithm111While [Childs et al., 2017] states the requirement in their main theorem, the relaxed assumption and spectrum in is sufficient for all of their proofs.. We can choose to be some arbitrary constant, say, ; rescale by , where is the estimate obtained with the power method; and inflate by a factor . This ensures that the spectrum of the rescaled satisfies the requirements. The overall running time is not affected, because the time for the remaining subroutines dominates , as will be seen in the rest of this section (e.g., simply loading and for the QLSA requires time , see the proof of Thm. 4).

### 4.1 Implementation of the oracles Pab and Pb

Recall that the quantum linear system algorithm of Thm. 3 requires access to two quantum oracles describing the data of the linear system. We now discuss their implementation, so as to compute their gate complexity. Given computational basis states , in this section we denote for brevity. We start with the two maps necessary for .

Since the map is in-place, we implement it as the sequence of mappings .

To implement the first part , for given and we use controlled single-qubit operations. Let be the transformation that maps the basis state into : this can be computed classically in time, since it requires at most bit-flips. Then the desired mapping requires at most -controlled gates, which require basic gates each plus polylogarithmic extra space. The mapping is then easy to construct, as it requires at most CX gates to compute in the second register (by taking the XOR of the third register with the second). We then undo the computation in the third register. Thus, we obtain a total of basic gates for each and . There are at most nonzero elements per column (recall the transformation to make symmetric), yielding basic gates to construct the first map of for a given column .

The implementation of is similar. Since and the largest entry of is , the map requires basic gates for each and . Summarizing, the implementation of the two maps for requires basic gates per column, for a total of gates.

To implement we need to construct the state vector . It is known that this can be done using basic gates [Grover and Rudolph, 2002]; below, we formalize the fact that if is -sparse, gates are sufficient.

###### Proposition 2.

Let be a vector with nonzero elements. Then the state can be prepared with gates.

###### Proof.

We can rely on the construction of [Grover and Rudolph, 2002], that consists in a sequence of controlled rotations. The construction can be understood on a binary tree with leaves, representing the elements of , and where each inner node has value equal to the sum of the squares of the children nodes. Each inner node requires a controlled rotation with an angle determined by the square root of the ratio between the two children nodes. Assuming that is a power of 2 for simplicity, the tree has levels and the construction when is dense () requires controlled rotations. Notice that at each inner node, a controlled rotation is necessary only if both children have nonzero value. If only one child has nonzero value the operation amounts to a controlled , and if both children have value zero no operation takes place. If , there can be at most nodes at each level that require a controlled rotation, and in fact the deepest level of the binary tree such that all nodes contain nonzero value is . We need controlled rotations up to this level of the tree, and for each of these nodes we may need at most susequent operations, yielding the total gate complexity . ∎

### 4.2 Sign estimation

To determine the sign of a reduced cost we will use the subroutines SignEstNFN, given in Alg. 2, and SignEstNFP, given in Alg. 3

. The acronyms “NFN” an “NFP” stand for “no false negatives” and “no false positives”, respectively, based on an interpretation of these subroutines as classifiers that need to assign a 0-1 label to the input. We need two such subroutines because the quantum phase estimation, on which they are based, is a continuous transformation. Therefore, a routine that has “no false negatives”, i.e., with high probability it returns 1 if the input data’s true class is 1 (in our case, this means that a given amplitude is

), may have false positives: it may also return 1 with too large probability for some input that belongs to class 0 (i.e., the given amplitude is ). The probability of these undesirable events decreases as we get away from the threshold . We therefore adjust the precision and thresholds used in the sign estimation routines to construct one version that with high probability returns 1 if the true class is 1 (no false negatives), and one version that with high probability returns 0 if the true class is 0 (no false positives).

###### Proposition 3.

Let with real up to a global phase factor, and a basis state index such that . Then with probability at least 3/4, if Algorithm 2 (SignEstNFN) returns , and if Algorithm 2 returns then , making calls to or their controlled version, and requiring additional gates.

###### Proof.

We track the evolution of the state, isolating the coefficient of . After line 5 of the algorithm we are in the state . Applying on the first qubit gives:

 1√2(1√2|0⟩+1√2|1⟩)⊗|k⟩+1√2(1√2|0⟩−1√2|1⟩)⊗U|0q⟩)= 12(1+αk)|0⟩⊗|k⟩+12(1−αk)|1⟩⊗|k⟩+12(|0⟩−|1⟩)⊗2n−1∑j=0j≠kαj|j⟩.

We now apply amplitude estimation to the state to determine the magnitude of . The exact phase that must be estimated by the phase estimation portion of the algorithm is the number such that:

 sinπθ=12(1+αk).

Suppose . Then , implying, by monotonicity of over its domain:

 θ>sin−1(12(1−ϵ))π≥π6+2√3(12(1−ϵ)−12)π≥π6−ϵ√3π≥16−ϵ√3π,

where for the second inequality we have used the Taylor expansion of at :

 sin−1(x)≈π6+2√3(x−12)+2√39(x−12)2.

Rather than , we obtain an approximation up to a certain precision. By Prop. 1, using qubits of precision, then

 |θ−~θ|≤ϵ√3π (2)

with probability at least . Then we must have with probability at least 3/4. In this case, the algorithm returns 1 (recall that if the first bit of the expansion is 1, i.e., , we must take the complement because we do not know the sign of eigenvalue on which phase estimation is applied, see [Brassard et al., 2002] for details).

Now suppose the algorithm returns 0. This implies that we have , so that with probability at least because of (2). Thus, remembering that by assumption we are in the monotone part of the domain of , we have:

 αk=2sinπθ−1<2sin(π6−ϵ√3)−1<2(12+√32(π6−ϵ√3−π6))−1=−ϵ,

where for the second inequality we used the Taylor expansion of at :

 sin(x)=12+√32(x−π6)−14(x−π6)2.

Regarding the gate complexity, amplitude estimation with digits of precision requires calls to , controlled-, or controlled-, and the reflection circuits of the Grover iterator (which can be implemented with

basic gates and auxiliary qubits). The inverse quantum Fourier transform on

qubits requires basic gates; finally, the controlled unitary to construct , as well as the final bitwise comparison, require gates. ∎

###### Proposition 4.

Let with real up to a global phase factor, and a basis state index such that . Then with probability at least 3/4, if Algorithm 3 (SignEstNFP) returns , and if Algorithm 3 returns then , making calls to or their controlled version, and requiring additional gates.

###### Proof.

The proof is similar to Prop. 3, and we use the same symbols and terminology. We apply amplitude estimation to the state to determine the magnitude of . Suppose . Then , implying:

 θ ≤sin−1(12(1−ϵ))π≤π6+2√3(12(1−ϵ)−12)+89√3(12(1−ϵ)−12)2π ≤π6−ϵ√3+4ϵ29√3π≤16−7ϵ9√3π,

where we used , and for the second inequality we have used the Taylor expansion of at :

 sin−1(x)≈π6+2√3(x−12)+2√39(x−12)2+8√327(x−12)3,

coupled with the fact that the third-order term is negative at . Rather than , we obtain an approximation up to a certain precision. By Prop. 1, using qubits of precision, then

 |θ−~θ|≤ϵ9√3π (3)

with probability at least . Then we must have with probability at least 3/4. In this case, the algorithm returns 0.

Now suppose the algorithm returns 1. This implies that we have , so that with probability at least because of (3). Thus, for and , we have:

 αk =2sinπθ−1≥2sin(π6−3ϵ9√3)−1>2⎛⎝12−√323ϵ9√3−14(3ϵ9√3)2+√312(3ϵ9√3)3⎞⎠−1 ≥−ϵ3−ϵ108+√312(3ϵ9√3)3≥−ϵ,

where for the second inequality we used the Taylor expansion of at :

 sin(x)=12+√32(x−π6)−14(x−π6)2−√312(x−π6)3+148(x−π6)4.

The rest of the analysis is the same as in Prop. 3. ∎

### 4.3 Determining if the basis is optimal, or the variable entering the basis

To find a column that can enter the basis, we use the subroutine CanEnter described in Alg. 5, which then becomes a building block of FindColumn, detailed in Alg. 6.

###### Proposition 5.

With probability , if Algorithm 5 (CanEnter) returns 1 then column has reduced cost . The gate complexity of the algorithm is , where is the gate complexity of the quantum linear system algorithm on the input.

###### Proof.

CanEnter relies on Algorithms 4 (RedCost) and 2 (SignEstNFN). Prop. 3 ensures that the return value of the algorithm is as stated, provided that RedCost encodes the reduced cost as desired. We now analyze RedCost. The QLSA encodes the solution to:

 (AB001)(xy)=(Akck) (4)

in a state that is guaranteed to be -close to the exact normalized solution . Call this state , where