Quantum-Inspired Support Vector Machine

06/21/2019 ∙ by Chen Ding, et al.

Support vector machine (SVM) is a particularly powerful and flexible supervised learning model that analyzes data for both classification and regression, and whose usual complexity scales polynomially with the dimension and the number of data points. Inspired by the quantum SVM, we present a quantum-inspired classical algorithm for SVM using fast sampling techniques. In our approach, we develop a general method to approximately calculate the kernel function and perform classification by carefully sampling the data matrix, so our approach can be applied to various types of SVM, such as linear SVM, poly-kernel SVM and soft SVM. Theoretical analysis shows that one can find the supporting hyperplanes on a data set to which we have sampling access, and thus perform classification with arbitrary success probability in logarithmic runtime, matching the runtime of the quantum SVM.




I Introduction

Since the 1980s, quantum computing has attracted wide attention due to its enormous advantages in solving hard computational problems, such as integer factorization [1], database searching [2], machine learning [3], and so on. In 1997, Daniel R. Simon offered compelling evidence that the quantum model may have significantly more complexity-theoretic power than the probabilistic Turing machine [4]. However, it remains an interesting question where the border between classical computing and quantum computing lies. Although many proposed quantum algorithms have exponential speedups over the existing classical algorithms, is there any way to accelerate such classical algorithms to the same complexity as the quantum ones?

In 2018, inspired by the quantum recommendation system algorithm proposed by Iordanis Kerenidis and Anupam Prakash [5], Ewin Tang designed a classical recommendation algorithm that achieves an exponential improvement over previous algorithms [6], a breakthrough that shows how to apply the subsampling strategy based on Alan Frieze, Ravi Kannan, and Santosh Vempala’s 2004 algorithm [7] to find a low-rank approximation of a matrix. Subsequently, Tang used the same techniques to dequantize two quantum machine learning algorithms, quantum principal component analysis [8] and quantum supervised clustering [9], and showed that classical algorithms can also match the bounds and runtime of the corresponding quantum algorithms, with only polynomial slowdown [10].

Later, András Gilyén et al. [11] and Nai-Hui Chia et al. [12] independently and simultaneously proposed a quantum-inspired matrix inversion algorithm with complexity logarithmic in the matrix size, which eliminates the speedup advantage of the famous HHL algorithm [13] under certain conditions. Recently, Juan Miguel Arrazola et al. studied the actual performance of quantum-inspired algorithms and found that they can perform well in practice under the given conditions; however, the conditions need to be relaxed further if we want to apply the algorithms to practical datasets [14]. All of these works suggest a very promising future for applying quantum-inspired algorithms in the machine learning area, where matrix inversion is universally used.

In this paper, we want to bring the “magical power” of quantum-inspired methods to the support vector machine (SVM), a data classification algorithm commonly used in the machine learning area [15, 16]. However, SVM is still not powerful enough when dealing with large data sets and spaces: a phenomenon called the curse of dimensionality, which describes the resulting complexity and overfitting problems, is usually observed [17]. In 2014, Patrick Rebentrost, Masoud Mohseni and Seth Lloyd proposed a quantum SVM [18], which achieves an exponential speedup over the classical algorithms. Inspired by the quantum SVM algorithm, Tang’s methods [6] and András Gilyén et al.’s work [11], we propose a quantum-inspired classical SVM algorithm.

The main idea is first to transform the classification problem into the problem of solving the equation , where is the kernel matrix and is the data matrix. We note that the quantum-inspired matrix inversion algorithm [11] cannot be invoked directly to solve the equations here, since we only have sampling access to the data matrix instead of the kernel matrix. Thus, we find an approximate singular value decomposition of the kernel matrix by developing an indirect sampling technique that samples the data matrix instead of . Finally, we perform classification by approximately computing the classification expression, which consists of the solution of the former equation, the data matrix, and the query point . To avoid a polynomial complexity overhead, we employ sampled dot-product computation and rejection sampling. Throughout the whole process, we must avoid direct operations on vectors or matrices of the same size as the kernel, lest we lose the exponential speedup. Analysis shows that our algorithm can make accurate classifications with an appropriate success probability by controlling the computation error, within runtime only logarithmic in the dimension of the data space and the number of data points.

II Preliminaries

II-A SVM

We present the simplest case, in which the data points are linearly separable, and leave the other cases to the discussion in Section VI.

Suppose we have data points , where depending on the class to which belongs. An SVM finds a pair of parallel hyperplanes that strictly divides the points into two classes according to the given data. Then, for any new input point, it makes a classification based on the point's position relative to the hyperplanes. Here are the parameters of the hyperplanes, given by the following optimization problem

Since the data is linearly separable, as in [18], by taking the dual problem we have

in which , . Differentiating the objective function, we have


A solution to the optimization problem must be a solution of equation (1). Thus, once we find it, we can classify any given point by , in which for any .
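As a concrete but naive illustration of this classification step, the decision rule can be evaluated directly when full access to the data is allowed. The names `alphas`, `b` and the least-squares-style form sign(b + Σⱼ αⱼ⟨xⱼ, x⟩), following [18], are our assumptions here; the sampling machinery of the later sections exists precisely to avoid this entry-by-entry evaluation.

```python
def classify(alphas, b, train_points, x):
    """Evaluate the decision value b + sum_j alphas[j] * <x_j, x>
    and return the predicted class label (+1 or -1)."""
    decision = b + sum(
        a * sum(u * v for u, v in zip(xj, x))
        for a, xj in zip(alphas, train_points)
    )
    return 1 if decision >= 0 else -1
```

In the quantum-inspired setting this sum is never computed term by term; it is only estimated via the sampling routines of Section II-C.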

II-B Sampling on vectors

We present the idea of sampling indices, the key technique used in our algorithm as well as in [7, 6, 11].

Definition 1 (Sampling on vectors).

Suppose , define

as a probability distribution that:

A sampling on this probability distribution is here called a sampling on .
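A minimal sketch of this definition (the function name is ours): index i of a vector v is drawn with probability v_i²/‖v‖².

```python
import random

def length_square_sample(v, rng):
    """Draw index i with probability v[i]**2 / ||v||**2."""
    norm_sq = sum(x * x for x in v)
    r = rng.random() * norm_sq
    acc = 0.0
    for i, x in enumerate(v):
        acc += x * x
        if r < acc:
            return i
    return len(v) - 1  # guard against floating-point round-off
```

In the algorithms below this distribution is never materialized like this; sample access is assumed to be provided by a data structure in logarithmic time.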

II-C Basic sampling algorithms

We introduce two algorithms employing sampling techniques to save complexity. They are treated as oracles that output certain outcomes with controlled errors in the main algorithm.

II-C1 Trace inner product estimation

Here we invoke Alg. 1 from [11]. This algorithm estimates inner products within a given error bound and success probability.

1:Inputs: that we have full access to in complexity , and that we have query access to in complexity ; error bound and success probability bound .
2:Estimate to precision with probability at least .
3:Sample a row index from the row norms of , sample from , and let .
4:Repeat step 3 a number of times and compute the mean of ; denote it as .
5:Repeat step 4 a number of times and take the median of ; denote it as .
Algorithm 1 Trace Inner Product Estimation.
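The following sketch mirrors the mean-then-median structure of Alg. 1 for the special case of two vectors (names and the two-vector specialization are ours): with i drawn by length-square sampling of u, the single-sample estimator Z = ‖u‖² v_i/u_i is unbiased for ⟨u, v⟩, means reduce its variance, and the median of several means boosts the success probability.

```python
import random
import statistics

def inner_product_estimate(u, v, n_mean=200, n_median=9, rng=None):
    """Median-of-means estimate of <u, v>.
    Sample i ~ u[i]^2/||u||^2; the estimator Z = ||u||^2 * v[i]/u[i]
    satisfies E[Z] = sum_i u[i]*v[i]."""
    rng = rng or random.Random(0)
    norm_sq = sum(x * x for x in u)
    cumulative, acc = [], 0.0
    for x in u:
        acc += x * x
        cumulative.append(acc)

    def sample_index():
        r = rng.random() * norm_sq
        for i, c in enumerate(cumulative):
            if r < c:
                return i
        return len(u) - 1

    means = []
    for _ in range(n_median):
        total = 0.0
        for _ in range(n_mean):
            i = sample_index()  # i has u[i] != 0 almost surely
            total += norm_sq * v[i] / u[i]
        means.append(total / n_mean)
    return statistics.median(means)
```

Note that only sample access to u and query access to v are used, never a full pass over both vectors.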

II-C2 Rejection sampling

Alg. 2 samples from a vector to which we do not have full query access, in time logarithmic in its length.

1:Inputs: that we have length-square access to, that we have norm access to, and that we have query access to.
2:Sample from length-square distribution of .
3:Take .
4:Sample a row index by row norm square of .
5:Query and compute .
6:Sample a real number uniformly distributed in . If , output ; else, go to step 4.
7:Output: the row index .
Algorithm 2 Rejection sampling.
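A pure-Python sketch of the rejection idea (the function name and the concrete target are ours): to draw an index i with probability proportional to (Vw)ᵢ², propose from quantities that admit cheap length-square sampling and accept with a ratio that is at most 1 by the Cauchy-Schwarz inequality, so that only one full row of V is queried per trial.

```python
import random

def sample_from_Vw(V, w, rng):
    """Return index i with probability (Vw)_i^2 / ||Vw||^2.
    V: n rows of length k; w: length-k coefficient vector."""
    k = len(w)
    # Proposal, stage 1: column j ~ w_j^2 * ||V^(j)||^2
    col_weight = [w[j] ** 2 * sum(row[j] ** 2 for row in V) for j in range(k)]
    Z = sum(col_weight)
    while True:
        r, acc, j = rng.random() * Z, 0.0, k - 1
        for jj, cw in enumerate(col_weight):
            acc += cw
            if r < acc:
                j = jj
                break
        # Proposal, stage 2: row i ~ V_ij^2 / ||V^(j)||^2
        col_sq = [row[j] ** 2 for row in V]
        s = sum(col_sq)
        r2, acc2, i = rng.random() * s, 0.0, len(V) - 1
        for ii, c in enumerate(col_sq):
            acc2 += c
            if r2 < acc2:
                i = ii
                break
        # Accept with probability (V_i . w)^2 / (k * sum_t (w_t V_it)^2),
        # which is at most 1 by the Cauchy-Schwarz inequality.
        row = V[i]
        num = sum(row[t] * w[t] for t in range(k)) ** 2
        den = k * sum((w[t] * row[t]) ** 2 for t in range(k))
        if den > 0 and rng.random() * den < num:
            return i
```

The expected number of trials before acceptance governs the runtime, which is why the main text tracks the acceptance probability explicitly.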

III Quantum-Inspired SVM Algorithm

We present the main algorithm (Alg. 4), which makes classifications as classical SVMs do. Notice that actual computation happens only where the word "compute" is used in this algorithm; otherwise, the exponential-speedup advantage would be lost to operations on large vectors or matrices. Fig. 1 shows the algorithm's workflow.

1: training data points of form , where depending on the class to which belongs. Error bound and success probability bound .
2:Find that with success probability at least , in which .
3:For any given , find its class.
4:Init: Set as described in (3) and (4).
5:Sample columns: Sample column indices according to the column norm squares . Define to be the matrix whose -th column is . Define .
Algorithm 3 Quantum-Inspired SVM Algorithm (Part 1).
6:Sample rows: Sample uniformly, then sample a row index distributed as . Sample a total number of row indices this way. Define whose -th row is . Define .
7:Spectral decomposition: Compute the eigenvalues and eigenvectors of . Denote them as , s.t. is an orthogonal matrix while is a diagonal matrix with only the first diagonal elements non-zero.
8:Approximate eigenvectors: Let . Define for .
9:Estimate matrix elements: Compute to precision by algorithm 1, each with success probability . Let .
10:Find sign: Define . Compute to precision with success probability . Tell its sign.
11:The answer class depends on the sign: positive corresponds to 1, negative to .
Algorithm 4 Quantum-Inspired SVM Algorithm (Part 2).

Fig. 1: A flow chart of the main algorithm. It implements the pseudo-inverse by finding an approximate singular value decomposition of via subsampling (steps 5 and 6), then inverting the singular values (step 7). The theorems used in each step are marked. Notice that we sample on to achieve the sampling effect on , which is the indirect sampling technique we developed in this paper.
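The column-subsampling step (step 5) can be sketched as follows (function and variable names are ours): columns are drawn by their squared norms and rescaled so that the small sketch C satisfies E[CCᵀ] = AAᵀ, which is what makes the subsequent spectral decomposition of the small matrix meaningful.

```python
import math
import random

def subsample_columns(A, c, rng):
    """FKV-style sketch: draw c column indices j ~ ||A^(j)||^2 / ||A||_F^2
    and rescale each picked column by ||A||_F / (sqrt(c) * ||A^(j)||),
    so that E[C C^T] = A A^T.  A is a list of rows; the sketch is
    returned as a list of its c rescaled columns."""
    m, n = len(A), len(A[0])
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    fro_sq = sum(col_sq)
    sketch = []
    for _ in range(c):
        r, acc, j = rng.random() * fro_sq, 0.0, n - 1
        for jj, s in enumerate(col_sq):
            acc += s
            if r < acc:
                j = jj
                break
        # per-sample scaling keeps the Frobenius norm of the sketch fixed
        scale = math.sqrt(fro_sq / (c * col_sq[j]))
        sketch.append([A[i][j] * scale for i in range(m)])
    return sketch
```

Step 6 applies the same trick to the rows of the sketch, after which the spectral decomposition in step 7 acts on a matrix whose size no longer depends on the number of data points.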

The following theorem is to be proved in section IV and section V.

Theorem 1 (Correctness).

If the data matrix satisfies , and we have sample access to it in time logarithmic in and , then Algorithm 4 can classify any point in time logarithmic in and with probability at least .

IV Accuracy

Let , , in which . Then the total error of the classification expression is

Here with probability no less than is guaranteed by step 10 of algorithm 4.

with probability no less than is shown in subsection IV-A.

with probability no less than is shown in subsection IV-B.


with success probability no less than .

To achieve accurate classification, we only need a relative error less than . Thus, by decreasing , we can achieve this goal within any given probability range.

IV-A Proof of

Notice that

Here we present five theorems (Theorems 2 to 6) for , of which Theorems 2 and 5 are invoked from [11]. We offer proofs of the other theorems in Appendix B. Theorem 6 completes the proof based on the stated conditions and the former theorems.

Theorem 2.

Let be a matrix and let be the sample matrix such that ; then , we have

Hence, for , with probability at least we have

Theorem 3.

Suppose is a system of orthogonal vectors while

Suppose . Then

Theorem 4.

Suppose that is a system of orthogonal vectors such that

Suppose , , . Let , then


In which , .

Theorem 5.

If , has columns that span the row and column space of , then

Theorem 6.

Suppose that is a system of approximately orthogonal vectors such that


In which , , , . Then

To conclude, for , we need to pick , and such that

and decide the sampling parameter as


IV-B Proof of

Notice that

For and , we have .

For , let be the vector such that ; we have

in which is as shown in the proof of Theorem 6.

V Complexity

We analyze the steps of the main algorithm and show that each step's complexity is within logarithmic time.

V-A The spectral decomposition

For a symmetric matrix , the fastest classical spectral decomposition is via the symmetric QR method, whose complexity is .

V-B Computation of

Inspecting Algorithm 1, we can easily see that its complexity is

For computation of by algorithm 1, we have

Observe that , and we can query the matrix element of in cost . Thus the complexity in step 9 is

V-C Computation of

V-C1 Query of

For any , we have . To estimate it, we take and use Algorithm 1 to compute it to precision with success probability such that

in which

and are given in subsection IV-A.

Thus for a query of , the error ,

and the success probability is . The complexity is

V-C2 Computation of

We compute by sampling via Algorithm 2. We do not care about the error of or the success probability here, because if , it suffices to sample another index of . Since the sampling depends on the magnitudes of the elements in , the probability that we obtain a small is low.

Here we can take as

and control it within a logarithmic range of by reducing . Alternatively, we can simply compute using Algorithm 1 and take that as .

For we need to query for times on average. If we get in the first way, the total sampling complexity is

V-C3 Computation of

Once we have index and query access to , by Algorithm 1 we can compute to the precision assumed in step 10 of Algorithm 4,

which is within the logarithmic range of and . Considering the error and success probability of the query process, the total error is

while the success probability is greater than

VI Discussion

There are other ways of solving this problem, such as employing the algorithm in [11] directly and using its output in the estimation of , or solving twice by employing the algorithm in [11] twice. Though this works for the simplest case, it cannot deal with further problems like soft SVM, because it depends solely on the ability to solve linear equations with sample access to the coefficient matrix.

We now discuss some improvements to be made to our algorithm in the future:

VI-A Improving sampling for dot products

Recall that Algorithm 1 estimates the dot product of two vectors. However, it does not work well in all conditions, e.g., when the vectors are dominated by a single coordinate. For randomness, [19] implies that we can apply a spherically random rotation to all data points, which does not change the kernel matrix but makes all the coordinates evenly distributed random variables.

VI-B Non-linear SVM

When the training data is not linearly separable, there is no pair of parallel hyperplanes that strictly divides the points into two classes according to the given data. Hence the solution to the original optimization problem does not exist. There are two kinds of improvements here.

A non-linear SVM improves the fitting ability by changing the kernel function, thus making it possible to classify strictly.

Taking the polynomial kernel as an example, we have

Equation (1) becomes

If we take

Note that the size of is , , and the column norms of are

Thus we can sample on , and for , Algorithm 4 is still applicable here.
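The reason sampling still works is that a polynomial kernel is a plain inner product in the tensored feature space. A small check for degree 2 (the function name is ours):

```python
import itertools

def tensor_square(x):
    """Degree-2 feature map phi(x) = (x_i * x_j) over all pairs (i, j),
    so that <phi(x), phi(y)> = (<x, y>)**2."""
    return [a * b for a, b in itertools.product(x, x)]
```

Since phi(x) has entries x_i x_j, length-square sampling of phi(x) factorizes into two independent length-square samples of x, so the tensored data matrix never needs to be written down explicitly.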

VI-C Soft SVM

While a non-linear SVM may bring overfitting problems when achieving strict classification, another improved method is the soft SVM, which allows misclassification of training data while minimizing the offsets.

By introducing a slack variable, the equation to solve becomes

We only consider its sub-equation


For , we have

Thus, to find the solution of (5), we only need to add the shift to all the eigenvalues in step 7 of Algorithm 4 and continue.
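The eigenvalue shift can be sketched as follows (names are ours, and `eigvecs` is assumed orthonormal): with the spectral decomposition K = Σₖ λₖ qₖqₖᵀ in hand, solving the regularized system only changes the reciprocal applied to each eigenvalue.

```python
def solve_shifted(eigvals, eigvecs, shift, y):
    """Solve (K + shift*I) x = y given K = sum_k lam_k q_k q_k^T with
    orthonormal eigenvectors: x = sum_k <q_k, y> / (lam_k + shift) q_k."""
    x = [0.0] * len(y)
    for lam, q in zip(eigvals, eigvecs):
        coef = sum(qi * yi for qi, yi in zip(q, y)) / (lam + shift)
        for i, qi in enumerate(q):
            x[i] += coef * qi
    return x
```

No new decomposition is needed for the soft case; the shift is applied to the already-computed spectrum, which is why Algorithm 4 carries over unchanged.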

VII Conclusion

We have proposed a quantum-inspired SVM algorithm that achieves exponential speedup over the previous classical algorithms. We hope that the techniques developed in our work will lead to the emergence of more efficient classical algorithms, such as by applying our method to more complex support vector machines [16, 20] or other machine learning algorithms. The technique of indirect sampling can expand the application area of fast sampling techniques, and it will contribute to the further competition between classical and quantum algorithms.

Some improvements to our work remain for the future, such as relaxing the conditions on the data matrix and further reducing the complexity, which can be achieved through deeper investigation of the algorithm and the error propagation process.

Certain investigations into the applications of such an algorithm are also required to make the quantum-inspired SVM operable in solving problems like face recognition [15] and signal processing [21].

We note that our work, as well as the previous quantum-inspired algorithms, is not intended to demonstrate that quantum computing is uncompetitive. We want to find out where the boundaries between classical and quantum computing lie, and we expect new quantum algorithms to be developed that beat our algorithm.

Appendix A Notations

Symbol Meaning
vector or matrix with only one column
pseudo inverse of
adjoint of
-th row of
-th column of
2-norm of
Frobenius norm of
complexity for computing
complexity for querying

Appendix B Proofs of Theorems in IV

B-A Proof of Theorem 3


B-B Proof of Theorem 4