Since the 1980s, quantum computing has attracted wide attention for its enormous advantages in solving hard computational problems such as integer factorization, database searching, and machine learning.
In 1997, Daniel R. Simon offered compelling evidence that the quantum model may have significantly more complexity-theoretic power than the probabilistic Turing machine. However, where the border between classical and quantum computing lies remains an interesting question. Although many proposed quantum algorithms achieve exponential speedups over existing classical algorithms, can those classical algorithms be accelerated to match the complexity of their quantum counterparts?
In 2018, inspired by the quantum recommendation system algorithm proposed by Iordanis Kerenidis and Anupam Prakash, Ewin Tang designed a classical recommendation algorithm that achieves an exponential improvement over previous classical algorithms, a breakthrough that shows how to apply the subsampling strategy of Alan Frieze, Ravi Kannan, and Santosh Vempala's 2004 algorithm
for finding a low-rank approximation of a matrix. Subsequently, Tang used the same techniques to dequantize two quantum machine learning algorithms, quantum principal component analysis and quantum supervised clustering, showing that classical algorithms can match the bounds and runtime of the corresponding quantum algorithms with only polynomial slowdown.
Later, András Gilyén et al. and Nai-Hui Chia et al. independently and simultaneously proposed a quantum-inspired matrix inversion algorithm with complexity logarithmic in the matrix size, which eliminates the speedup advantage of the famous HHL algorithm under certain conditions. Recently, Juan Miguel Arrazola et al. studied the actual performance of quantum-inspired algorithms and found that they can perform well in practice under given conditions; however, those conditions must be relaxed further before the algorithms can be applied to practical datasets. All of these works suggest a promising future for quantum-inspired algorithms in machine learning, where matrix inversion is universally used.
In this paper, we bring the “magical power” of quantum-inspired methods to the support vector machine (SVM), a data classification algorithm commonly used in machine learning [15, 16]. However, SVMs are still not powerful enough for large data sets and high-dimensional spaces, where the curse of dimensionality, with its attendant complexity and overfitting problems, is usually observed. In 2014, Patrick Rebentrost, Masoud Mohseni and Seth Lloyd proposed a quantum SVM, which achieves an exponential speedup over the classical algorithms. Inspired by the quantum SVM algorithm, Tang’s methods, and András Gilyén et al.’s work, we propose a quantum-inspired classical SVM algorithm.
The main idea is first to transform the classification problem into the problem of solving a system of linear equations involving the kernel matrix, which is built from the data matrix. We note that the quantum-inspired matrix inversion algorithm cannot be invoked directly to solve the equations here, since we only have sampling access to the data matrix rather than the kernel matrix. Thus, we find an approximate singular value decomposition of the kernel matrix by developing an indirect sampling technique that samples the data matrix instead. Finally, we classify by approximately evaluating the classification expression, which involves the solution of the former equations, the data matrix, and the query point. To avoid a polynomial complexity overhead, we employ sampled dot-product estimation and rejection sampling. Throughout the whole process, we must avoid operating directly on vectors or matrices of the same size as the kernel matrix, lest we lose the exponential speedup. Our analysis shows that the algorithm makes accurate classifications with an appropriate success probability by controlling the computational error, in runtime only logarithmic in the dimension of the data space and the number of data points.
We treat the simplest case, in which the data points are linearly separable, and leave the other cases to further discussion (see Section VI).
Suppose we have data points , where depending on the class to which belongs. An SVM finds a pair of parallel hyperplanes that strictly divides the points into two classes based on the given data. Any new input point can then be classified by its position relative to the hyperplanes. Here are the parameters of the hyperplanes, given by the following optimization problem
since the data are linearly separable, as in . Taking the dual problem, we have
in which , . Taking the derivative of the objective function, we have
A solution to the optimization problem must be a solution of equation 1. Thus, once we find , we can classify any given point by , in which for any .
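As a concrete illustration of the pipeline above, the following sketch trains a least-squares-style SVM by solving the kernel linear system densely and classifies by the sign of the resulting expression. The names (`lssvm_train`, the regularization parameter `gamma`) are our own illustrative choices rather than the paper's notation, and the dense solve is only a reference baseline, not the quantum-inspired procedure itself.

```python
import numpy as np

def lssvm_train(X, y, gamma=10.0):
    """Solve the LS-SVM linear system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y].

    X: (m, n) data matrix (rows are data points), y: (m,) labels in {-1, +1}.
    gamma is an illustrative regularization parameter.
    """
    m = X.shape[0]
    K = X @ X.T                              # linear kernel matrix
    F = np.zeros((m + 1, m + 1))
    F[0, 1:] = 1.0
    F[1:, 0] = 1.0
    F[1:, 1:] = K + np.eye(m) / gamma        # regularized kernel block
    sol = np.linalg.solve(F, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                   # bias b, coefficients alpha

def lssvm_classify(X, b, alpha, x_new):
    """Classify x_new by the sign of sum_j alpha_j <x_j, x_new> + b."""
    return np.sign(alpha @ (X @ x_new) + b)
```

The dense solve costs polynomial time in the number of data points; the point of the paper's algorithm is to approximate this classification expression in logarithmic time via sampling.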
II-B Sampling on vectors
Definition 1 (Sampling on vectors).
Suppose , define as the probability distribution in which:
and as the probability distribution in which:
Sampling from the probability distribution is here called sampling on .
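For a vector stored in full, Definition 1 can be realized directly; a minimal sketch (function name ours), drawing an index with probability proportional to the squared magnitude of the entry:

```python
import numpy as np

def sample_index(v, rng):
    """Draw index i with probability |v_i|^2 / ||v||^2 (length-squared sampling)."""
    p = v**2 / np.dot(v, v)
    return rng.choice(len(v), p=p)
```

In the quantum-inspired setting this distribution is assumed to be available as an oracle rather than computed from the full vector, since reading the whole vector would already cost linear time.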
II-C Basic sampling algorithms
We introduce two algorithms that employ sampling techniques to save complexity. They are treated as oracles that output certain outcomes with controlled errors in the main algorithm.
II-C1 Trace inner product estimation
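The algorithm body is not reproduced here; the sketch below shows the standard sampled inner-product estimator used in quantum-inspired algorithms, which we assume is the technique intended: sample an index from the length-squared distribution of one vector and average an unbiased single-sample estimate (names ours).

```python
import numpy as np

def inner_product_estimate(u, v, num_samples, rng):
    """Estimate <u, v> using only samples from the length-squared distribution of v.

    For i ~ |v_i|^2 / ||v||^2, the single-sample estimate u_i * ||v||^2 / v_i
    is unbiased: E = sum_i (v_i^2/||v||^2) * (u_i ||v||^2 / v_i) = <u, v>.
    """
    sq_norm = np.dot(v, v)
    p = v**2 / sq_norm
    idx = rng.choice(len(v), size=num_samples, p=p)   # indices with v_i != 0
    return np.mean(u[idx] * sq_norm / v[idx])
```

The number of samples needed for a given relative error depends only on the ratio of norms, not on the vector length, which is what keeps the overall runtime logarithmic.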
II-C2 Rejection sampling
Alg. 2 achieves sampling of a vector to which we do not have full query access, in time logarithmic in its length.
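A generic rejection sampler of this kind can be sketched as follows. It assumes sample-and-query access to a proposal distribution p and pointwise query access to unnormalized target weights w satisfying w(i) ≤ M·p(i); all names are ours.

```python
import numpy as np

def rejection_sample(sample_proposal, proposal_prob, target_weight, bound_M, rng):
    """Sample an index i with probability proportional to target_weight(i), using:
      - sample_proposal(): draws i from the proposal distribution p,
      - proposal_prob(i):  evaluates p(i),
      - target_weight(i):  evaluates the (unnormalized) target weight w(i),
    assuming w(i) <= bound_M * p(i) for all i. The expected number of trials
    is bound_M / (total target weight), independent of the vector length.
    """
    while True:
        i = sample_proposal()
        if rng.random() < target_weight(i) / (bound_M * proposal_prob(i)):
            return i
```

Because each trial touches only one index, the sampler never needs to read the whole target vector, which is the property the main algorithm relies on.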
III Quantum-inspired SVM Algorithm
We show the main algorithm (Alg. 4), which makes classifications just as a classical SVM does. Note that actual computation happens only where the word "compute" appears in the algorithm; otherwise, the exponential-speedup advantage would be lost to operations on large vectors or matrices. Fig. 1 shows the algorithm's workflow.
Let , , in which . Then the total error of classification expression
with probability no less than , as shown in subsection IV-A,
with probability no less than , as shown in subsection IV-B,
with success probability no less than .
For accurate classification, we only need a relative error less than . Thus, by decreasing , we can achieve this goal within any given probability range.
IV-A Proof of
Here we present five theorems (Theorems 2 to 6) for , of which Theorems 2 and 5 are invoked from . We provide proofs of the remaining theorems in Appendix B. Theorem 6 completes the proof based on the stated conditions and the preceding theorems.
Let be a matrix and let be the sample matrix such that . Then, for , we have
Hence, for , with probability at least , we have
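The sample matrix in the theorem above is the Frieze-Kannan-Vempala length-squared row sample; a small numerical sketch (names ours) of how it approximates the Gram matrix:

```python
import numpy as np

def length_squared_row_sample(A, s, rng):
    """Form the s-row sample matrix S whose rows are A_i / sqrt(s * p_i),
    with row i drawn with probability p_i = ||A_i||^2 / ||A||_F^2.
    Then E[S^T S] = A^T A, so S^T S concentrates around A^T A as s grows."""
    row_sq = np.sum(A**2, axis=1)
    p = row_sq / row_sq.sum()
    idx = rng.choice(A.shape[0], size=s, p=p)
    return A[idx] / np.sqrt(s * p[idx])[:, None]
```

The key point is that the quality of the approximation depends on the number of sampled rows and the Frobenius norm, not on the full matrix dimensions.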
Suppose is a system of orthogonal vectors while
Suppose . Then
Suppose that is a system of orthogonal vectors that
Suppose , , . Let , then
In which , .
If has columns that span the row and column space of , then
Suppose that is a system of approximately orthogonal vectors such that
In which , , , . Then
To conclude, for , we need to pick , and such that
and decide the sampling parameter as
IV-B Proof of
For and , we have .
For , let be the vector such that ; then we have
in which is as shown in the proof of Theorem 6.
We analyze the steps of the main algorithm and show that each step's complexity is within logarithmic time.
V-A The spectral decomposition
For a symmetric matrix , the fastest classical spectral decomposition is via the symmetric QR method, whose complexity is .
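Since the symmetric matrices the algorithm actually decomposes have small (sampled) dimension, an off-the-shelf dense symmetric eigensolver suffices in practice; an illustrative check with numpy, whose `eigh` routine uses a symmetric QR-type solver with O(n^3) cost:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
W = (B + B.T) / 2                        # small symmetric matrix
evals, evecs = np.linalg.eigh(W)         # dense symmetric eigendecomposition
reconstruction = evecs @ np.diag(evals) @ evecs.T
err = np.linalg.norm(reconstruction - W)
```

Because the sampled matrix dimension depends only on error parameters, not on the original data size, this cubic cost does not break the logarithmic overall runtime.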
V-B Computation of
V-C Computation of
V-C1 Query of
For any , we have . To estimate , we take and, using Algorithm 1, compute it to within with success probability such that
and are given in subsection IV-A.
Thus for a query of , the error ,
and the success probability is . The complexity is
V-C2 Computation of
We compute from a sample of using the following Algorithm 2. We are not concerned with the error or the success probability here, because if , it suffices to sample another index of . Since the sampling depends on the elements of , the probability of obtaining a small value is low.
Here we can take as
and keep it within a logarithmic range of by reducing . Alternatively, we can simply compute using Algorithm 1 and take it as .
For , we need to query times on average. If is obtained in the first way, the total sampling complexity is
V-C3 Computation of
An alternative approach is to solve by directly employing the algorithm in and then using the result to estimate , or to solve twice by employing the algorithm in twice. Though this works for the simplest case, it cannot handle further problems such as soft SVM, because it depends entirely on the ability to solve linear equations with sample access to the coefficient matrix.
Here we discuss some improvements that could be made to our algorithm in the future:
VI-A Improving sampling for dot products
Recall that Algorithm 1 estimates the dot product of two vectors. However, it does not work well under all conditions, e.g., when the vectors are dominated by a single coordinate. For randomness, implies that we can apply a spherically random rotation to all data points, which does not change the kernel matrix but makes all the coordinates evenly distributed random variables.
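The invariance claim is easy to verify numerically: rotating every data point by the same orthogonal matrix leaves the linear kernel (Gram) matrix unchanged. Drawing Q from the QR factorization of a Gaussian matrix is one standard illustrative construction.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.standard_normal((6, 4))                     # data matrix, rows are points
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))    # random orthogonal rotation
V_rot = V @ Q                                       # rotate every data point
K = V @ V.T                                         # original kernel matrix
K_rot = V_rot @ V_rot.T                             # equals V Q Q^T V^T = V V^T
```

Since Q Q^T is the identity, inner products between data points, and hence the kernel matrix, are preserved exactly.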
VI-B Non-linear SVM
When the training data are not linearly separable, no pair of parallel hyperplanes strictly divides the points into two classes based on the given data. Hence the original optimization problem has no solution. There are two kinds of remedies.
A non-linear SVM improves fitting ability by changing the kernel function, thereby making strict classification possible.
Taking the polynomial kernel as an example, we have
The equation 1 becomes
If we take
Note that the size of is , , and the column norms of are
Thus we can sample on , and for , Algorithm 4 remains applicable here.
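For a degree-2 polynomial kernel, the explicit feature map is the tensor (Kronecker) product of a point with itself, which is why sampling access to the data rows extends to the feature rows; a small numerical check (all names ours):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 3))
# Degree-2 feature map: phi(x) = x (tensor) x, of length n^2.
Phi = np.array([np.kron(x, x) for x in X])
K_poly = (X @ X.T) ** 2              # polynomial kernel (x . y)^2
K_feat = Phi @ Phi.T                 # Gram matrix of the explicit features
# Row norms of the feature matrix satisfy ||phi(x)|| = ||x||^2.
norms_ok = np.allclose(np.linalg.norm(Phi, axis=1),
                       np.linalg.norm(X, axis=1) ** 2)
```

The feature dimension grows to n^2, but since the sampling-based routines scale logarithmically in the dimension, this blow-up is affordable.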
VI-C Soft SVM
While a non-linear SVM may introduce overfitting in the course of achieving strict classification, another remedy is the soft SVM, which allows misclassification of training data while minimizing the offsets.
We have proposed a quantum-inspired SVM algorithm that achieves exponential speedup over the previous classical algorithms. We hope that the techniques developed in our work lead to more efficient classical algorithms, such as by applying our method to more complex support vector machines [16, 20] or other machine learning algorithms. The technique of indirect sampling can broaden the application area of fast sampling techniques and will contribute to the ongoing competition between classical and quantum algorithms.
Some improvements could be made in the future, such as relaxing the conditions on the data matrix and further reducing the complexity, which may be achieved through deeper investigation of the algorithm and its error propagation.
Further investigation into applications of the algorithm is also required to make the quantum-inspired SVM practical for problems such as face recognition and signal processing.
We note that our work, like the previous quantum-inspired algorithms, is not intended to demonstrate that quantum computing is uncompetitive. We want to find out where the boundaries between classical and quantum computing lie, and we expect new quantum algorithms to be developed that beat our algorithm.
Appendix A Notations
|vector or matrix with only one column|
|pseudo inverse of|
|-th row of|
|-th column of|
|Frobenius norm of|
|complexity for computing|
|complexity for querying|
Appendix B Proof of Theorems in IV
B-A Proof of Theorem 3