# Quantum-Inspired Support Vector Machine

A support vector machine (SVM) is a particularly powerful and flexible supervised learning model that analyzes data for both classification and regression; its usual complexity scales polynomially with the dimension of the data space and the number of data points. Inspired by the quantum SVM, we present a quantum-inspired classical algorithm for SVM based on fast sampling techniques. In our approach, we develop a general method to approximately calculate the kernel function and to make classifications by carefully sampling the data matrix; our approach can therefore be applied to various types of SVM, such as linear SVM, poly-kernel SVM, and soft SVM. Theoretical analysis shows that one can find the supporting hyperplanes on a data set to which we have sampling access, and thus make classifications with arbitrary success probability in logarithmic runtime, matching the runtime of the quantum SVM.


## I Introduction

Since the 1980s, quantum computing has attracted wide attention due to its enormous advantages in solving hard computational problems, such as integer factorization [1], database searching [2], machine learning [3], and so on. In 1997, Daniel R. Simon offered compelling evidence that the quantum model may have significantly more complexity-theoretic power than the probabilistic Turing machine [4]. However, it remains an interesting question where the border between classical and quantum computing lies. Although many proposed quantum algorithms have exponential speedups over the existing classical algorithms, is there any way to accelerate such classical algorithms to the same complexity as their quantum counterparts?

In 2018, inspired by the quantum recommendation system algorithm proposed by Iordanis Kerenidis and Anupam Prakash [5], Ewin Tang designed a classical recommendation algorithm that achieves an exponential improvement over previous classical algorithms [6]. This breakthrough shows how to apply a subsampling strategy, based on the 2004 algorithm of Alan Frieze, Ravi Kannan, and Santosh Vempala [7], to find a low-rank approximation of a matrix. Subsequently, Tang used the same techniques to dequantize two more quantum machine learning algorithms, quantum principal component analysis [8] and quantum supervised clustering [9], and showed that classical algorithms can match the bounds and runtime of the corresponding quantum algorithms with only polynomial slowdown [10].

Later, András Gilyén et al. [11] and Nai-Hui Chia et al. [12] independently and simultaneously proposed a quantum-inspired matrix inversion algorithm with complexity logarithmic in the matrix size, which eliminates the speedup advantage of the famous HHL algorithm [13] under certain conditions. Recently, Juan Miguel Arrazola et al. studied the actual performance of quantum-inspired algorithms and found that they can perform well in practice under given conditions; however, these conditions should be further relaxed if the algorithms are to be applied to practical datasets [14]. All of these works promise a bright future for applying quantum-inspired algorithms in the machine learning area, where matrix inversion is universally used.

In this paper, we want to bring the “magical power” of quantum-inspired methods to the support vector machine (SVM), a data classification algorithm that is commonly used in the machine learning area [15, 16]. However, SVM is still not powerful enough when dealing with large data sets and spaces, owing to the phenomenon called the curse of dimensionality, under which complexity blows up and overfitting is usually observed [17]. In 2014, Patrick Rebentrost, Masoud Mohseni, and Seth Lloyd proposed a quantum SVM [18], which achieves an exponential speedup over the classical algorithms. Inspired by the quantum SVM algorithm [18], Tang's methods [6], and the work of András Gilyén et al. [11], we propose a quantum-inspired classical SVM algorithm.

The main idea is first to transform the classification problem into the problem of solving the equation $X^TX\vec{\alpha}=\vec{y}$, where $X^TX$ is the kernel matrix and $X$ is the data matrix. We note that the quantum-inspired matrix inversion algorithm [11] cannot be invoked directly to solve this equation, since we only have sampling access to the data matrix $X$ instead of the kernel matrix $X^TX$. Thus, we find an approximate singular value decomposition of the kernel matrix by developing an indirect sampling technique that samples the data matrix $X$ instead of $X^TX$. Finally, we make the classification by approximately computing the classification expression, which consists of the solution $\vec{\alpha}$ of the former equation, the data matrix $X$, and the query point $\vec{x}$. To avoid a polynomial complexity overhead, we employ sampled dot-product computation and rejection sampling. Throughout the whole process, we must avoid direct operations on vectors or matrices of the same size as the kernel matrix, lest we lose the exponential speedup. Analysis shows that our algorithm can make accurate classifications with an appropriate success probability by controlling the computation error, within runtime logarithmic in the dimension of the data space and the number of data points.

## II Preliminary

### II-A SVM

We show the simplest case, in which the data points are linearly separable, and leave the other cases to further discussion (see Section VI).

Suppose we have data points $\{(\vec{x}_i, y_i):i=1,\ldots,m\}$, where $y_i=\pm 1$ depending on the class to which $\vec{x}_i$ belongs. An SVM finds a pair of parallel hyperplanes that strictly divides the points into two classes according to the given data. Then, for any new input point, it makes a classification based on the point's position relative to the hyperplanes. Here $(\vec{w}, b)$ are the parameters of the hyperplanes, given by the following optimization problem

$$\min\ \frac{1}{2}\|\vec{w}\|^2 \quad \text{s.t.}\quad y_i(\vec{w}^T\vec{x}_i+b)\geq 1,\ i=1,\ldots,m$$

Since the data is linearly separable, as in [18], by taking the dual problem we have

$$\max\ \sum_{j=1}^{m}y_j\alpha_j-\frac{1}{2}\sum_{j,k=1}^{m}\alpha_jA_{jk}\alpha_k \quad \text{s.t.}\quad \sum_{j=1}^{m}\alpha_j=0,\ y_j\alpha_j\geq 0,\ j=1,\ldots,m$$

in which $A_{jk}=\vec{x}_j^T\vec{x}_k$, i.e., $A=X^TX$ with $X=(\vec{x}_1,\ldots,\vec{x}_m)$. Taking the derivative of the objective function, we have

$$X^TX\vec{\alpha}=\vec{y} \qquad (1)$$

A solution to the optimization problem must be a solution of Equation (1). Thus, once we find $\vec{\alpha}$, we can classify any given point $\vec{x}$ by the sign of the classification expression $y_j-(\vec{x}-\vec{x}_j)^TX\vec{\alpha}$, in which $(\vec{x}_j,y_j)$ is any support vector with $\alpha_j\neq 0$.
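
To make the linear case concrete, here is a plain dense NumPy sketch on a made-up toy dataset (straightforward linear algebra, not this paper's sampling algorithm): solve Equation (1) for $\vec{\alpha}$, then classify with the recovered hyperplane.

```python
import numpy as np

# Toy linearly separable data: column i of X is the point x_i with label y_i.
X = np.array([[1.0, 2.0, -1.0, -2.0],
              [1.0, 1.5, -1.0, -1.5]])       # shape (n, m) = (2, 4)
y = np.array([1.0, 1.0, -1.0, -1.0])

# Solve the kernel equation X^T X alpha = y (Eq. 1); the m x m kernel
# matrix X^T X is rank-deficient here, so use the pseudoinverse.
alpha = np.linalg.pinv(X.T @ X) @ y

# Recover the hyperplane normal w = X alpha and the offset b from a
# support vector (x_j, y_j), then classify by sign(w.x + b).
w = X @ alpha
b = y[0] - w @ X[:, 0]

def classify(x):
    return np.sign(w @ x + b)

print(classify(np.array([1.5, 1.2])), classify(np.array([-1.5, -1.2])))
```

This dense version costs polynomial time; the point of the paper is to reproduce the same decision while only sampling from $X$.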

### II-B Sampling on vectors

We show the idea of sampling to obtain indices, which is the key technique used in our algorithm, as well as in [7, 6, 11].

###### Definition 1 (Sampling on vectors).

Suppose $\vec{v}\in\mathbb{R}^n$. Define $q(\vec{v})$ as the probability distribution such that

$$x\sim q(\vec{v}):\quad P[x=i]=\frac{|v_i|^2}{\|\vec{v}\|^2}$$

A sampling from the probability distribution $q(\vec{v})$ is here called a sampling on $\vec{v}$.
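
Definition 1 can be sketched in a few lines (a plain dense helper for illustration; the paper assumes this distribution can be sampled in logarithmic time via a suitable data structure):

```python
import numpy as np

# Length-square sampling of Definition 1: index i is drawn with
# probability |v_i|^2 / ||v||^2.
def sample_index(v, rng):
    p = np.abs(v) ** 2
    return rng.choice(len(v), p=p / p.sum())

rng = np.random.default_rng(0)
v = np.array([3.0, 0.0, 4.0])            # probabilities 9/25, 0, 16/25
counts = np.bincount([sample_index(v, rng) for _ in range(10000)],
                     minlength=3)
print(counts / 10000)                    # roughly [0.36, 0.0, 0.64]
```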

### II-C Basic sampling algorithms

We introduce two algorithms that employ sampling techniques to save complexity. They are treated as oracles that output certain outcomes with controlled errors in the main algorithm.

#### II-C1 Trace inner product estimation

Here we invoke Alg. 1 from [11]. This algorithm achieves the computation of inner products within a certain success probability and error.
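
The estimator behind such inner-product computations can be sketched as follows (a minimal illustration with our own parameters, not the exact Alg. 1 of [11]): sampling $i\sim q(\vec{u})$ and averaging $v_i\|\vec{u}\|^2/u_i$ gives an unbiased estimate of $\vec{u}^T\vec{v}$, and a median of means boosts the success probability.

```python
import numpy as np

# Sampled inner-product estimation sketch: draw indices i with
# probability u_i^2/||u||^2 and average the unbiased estimator
# z = v_i * ||u||^2 / u_i, whose expectation is <u, v>.  A median of
# several means boosts the success probability.
def estimate_dot(u, v, n_means=6, n_samples=2000, seed=1):
    rng = np.random.default_rng(seed)
    p = u ** 2 / np.dot(u, u)
    means = []
    for _ in range(n_means):
        idx = rng.choice(len(u), size=n_samples, p=p)
        z = v[idx] * np.dot(u, u) / u[idx]
        means.append(z.mean())
    return float(np.median(means))

u = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([0.5, -1.0, 2.0, 1.0])
print(estimate_dot(u, v), u @ v)   # the estimate should be close to 8.5
```

Note that only a few entries of $\vec{v}$ are ever queried, which is what keeps the cost independent of the vector length.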

#### II-C2 Rejection sampling

Alg. 2 achieves sampling of a vector to which we do not have full query access, in time logarithmic in its length.
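
The rejection-sampling idea can be sketched as follows (a minimal dense illustration with our own target/proposal names, not the paper's Alg. 2): to sample from the length-square distribution of a target vector $\vec{t}$ that we can only query entrywise, draw from a samplable proposal $\vec{s}$ and accept index $i$ with probability $(t_i/s_i)^2/M$.

```python
import numpy as np

# Rejection sampling from q(t) using a proposal s we can already sample,
# where M bounds max_i (t_i/s_i)^2.  Accepted indices are distributed
# proportionally to t_i^2, i.e. according to q(t).
def rejection_sample(t, s, rng):
    p = s ** 2 / np.dot(s, s)
    M = np.max((t / s) ** 2)
    while True:
        i = rng.choice(len(s), p=p)
        if rng.random() < (t[i] / s[i]) ** 2 / M:
            return i

rng = np.random.default_rng(2)
s = np.array([1.0, 1.0, 1.0, 1.0])
t = np.array([1.0, 2.0, 1.0, 0.0])       # target: probs 1/6, 4/6, 1/6, 0
draws = np.bincount([rejection_sample(t, s, rng) for _ in range(6000)],
                    minlength=4)
print(draws / 6000)
```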

## III Quantum-inspired SVM Algorithm

We show the main algorithm (Alg. 4), which makes classifications like the classical SVMs. Notice that actual computation only happens where we use the expression "compute" in this algorithm; otherwise, the exponential-speedup advantage would be lost to operations on large vectors or matrices. Fig. 1 shows the algorithm's process.

The following theorem is proved in Sections IV and V.

###### Theorem 1 (Correctness).

If the data matrix $X$ satisfies the stated rank and condition-number assumptions, and we have sampling access to $X$ in time logarithmic in $m$ and $n$, then Algorithm 4 can classify any point $\vec{x}$ in time logarithmic in $m$ and $n$, with probability at least $1-\eta$.

## IV Accuracy

Let $\vec{\alpha}$ be the exact solution of Equation (1), $\vec{\alpha}'=V\Sigma^{-2}V^TA\vec{\alpha}$, and $\tilde{\vec{\alpha}}$ the computed approximation, in which $A=X^TX$. Then the total error of the classification expression is

$$\begin{aligned}E &=\Delta\left((\vec{x}-\vec{x}_j)^TX\vec{\alpha}\right) \\ &\leq\Delta\left((\vec{x}-\vec{x}_j)^TX\tilde{\vec{\alpha}}\right)+\left|(\vec{x}-\vec{x}_j)^TX(\tilde{\vec{\alpha}}-\vec{\alpha})\right| \\ &\leq\Delta\left((\vec{x}-\vec{x}_j)^TX\tilde{\vec{\alpha}}\right)+\|\vec{x}-\vec{x}_j\|\left(\|\vec{\alpha}'-\vec{\alpha}\|+\|\tilde{\vec{\alpha}}-\vec{\alpha}'\|\right) \\ &=E_1+E_2+E_3\end{aligned}$$

Here the bound on $E_1$, holding with the required probability, is guaranteed by step 10 of Algorithm 4. The bound $E_2\leq\frac{\epsilon}{2}\|\vec{\alpha}\|\,\|\vec{x}-\vec{x}_j\|$, holding with the required probability, is shown in Subsection IV-A. The bound $E_3\leq\frac{\epsilon}{2}\|\vec{\alpha}\|\,\|\vec{x}-\vec{x}_j\|$, holding with the required probability, is shown in Subsection IV-B.

Thus

$$E\leq E_1+E_2+E_3\leq 2\epsilon\kappa^2\sqrt{m}\,\|\vec{x}-\vec{x}_j\|$$

with the overall success probability obtained by combining the three bounds above.

To achieve an accurate classification, we only need the relative error to be small enough. Thus, by decreasing $\epsilon$, we can achieve this goal within any given probability range.

### IV-A Proof of $E_2\leq\frac{\epsilon}{2}\|\vec{\alpha}\|\,\|\vec{x}-\vec{x}_j\|$

Notice that

$$\begin{aligned}E_2 &=\|\vec{x}-\vec{x}_j\|\,\|\vec{\alpha}-\vec{\alpha}'\| \\ &=\|\vec{x}-\vec{x}_j\|\,\|\vec{\alpha}-V\Sigma^{-2}V^TA\vec{\alpha}\| \\ &\leq\|\vec{\alpha}\|\,\|\vec{x}-\vec{x}_j\|\,\|V\Sigma^{-2}V^TA-I_m\|\end{aligned}$$

Here we present five theorems (Theorems 2 to 6) for bounding $\|V\Sigma^{-2}V^TA-I_m\|$, in which Theorems 2 and 5 are invoked from [11]. We offer proofs of the other theorems in Appendix B. Theorem 6 proves $\|V\Sigma^{-2}V^TA-I_m\|\leq\epsilon$ based on the stated conditions and the former theorems.

###### Theorem 2.

Let $\hat{X}$ be a matrix and let $\tilde{X}$ be the matrix formed by $c$ sampled columns of $\hat{X}$. Then we have

$$P\left[\|\hat{X}^T\hat{X}-\tilde{X}^T\tilde{X}\|\geq\epsilon\|\hat{X}\|\,\|\hat{X}\|_F\right]\leq 2re^{-\frac{\epsilon^2c}{4}}$$

Hence, for $c\geq\frac{4\ln(2r/\eta_1)}{\epsilon^2}$, with probability at least $1-\eta_1$ we have

$$\|\hat{X}^T\hat{X}-\tilde{X}^T\tilde{X}\|\leq\epsilon\|\hat{X}\|\,\|\hat{X}\|_F$$
###### Theorem 3.

Suppose $\{V'_l\}_{l=1}^k$ is a system of orthonormal vectors while

$$\tilde{A}=\sum_{l=1}^k\sigma_l^2V'_lV_l'^T.$$

Suppose $\|\hat{A}-\tilde{A}\|\leq\beta$. Then

$$|V_i'^T\hat{A}V'_j-\delta_{ij}\sigma_i^2|\leq\beta$$
###### Theorem 4.

Suppose that $\{V'_l\}_{l=1}^k$ is a system of orthonormal vectors such that

$$|V_i'^T\hat{A}V'_j-\delta_{ij}\sigma_i^2|\leq\beta$$

Suppose $\|\hat{X}^T(XX^T-\hat{X}\hat{X}^T)\hat{X}\|\leq\epsilon'$, $\hat{A}=\hat{X}^T\hat{X}$, and $R=\hat{X}^TX$. Let $V_l=\frac{1}{\sigma_l^2}R^TV'_l$; then

$$|V_i^TV_j-\delta_{ij}|\leq\kappa^2((k+1)\beta+\epsilon'),$$

and

$$|V_i^TAV_j-\delta_{ij}\sigma_i^2|\leq 2\epsilon'\kappa^6+(k^2-2k+2)\beta^3\kappa^4+(3k-4)\beta^2\kappa^4+3\beta\kappa^4.$$

In which $A=X^TX$ and $\kappa$ is the condition number.

###### Theorem 5.

If $V=(V_1,\ldots,V_k)$ has columns that span the row and column space of $B$, then

$$\|B\|\leq\|(V^TV)^{-1}\|\,\|V^TBV\|.$$
###### Theorem 6.

Suppose that $\{V_l\}_{l=1}^k$ is a system of approximately orthonormal vectors such that

$$|V_i^TV_j-\delta_{ij}|\leq\gamma_1\leq\frac{1}{4k} \qquad (2)$$

$$|V_i^TAV_j-\delta_{ij}\sigma_i^2|\leq\gamma_2$$

in which $V=(V_1,\ldots,V_k)$, $\Sigma=\mathrm{diag}(\sigma_1,\ldots,\sigma_k)$, and $\gamma_2$ is suitably bounded in terms of $\epsilon$, $\kappa$, and $k$. Then

$$\|V\Sigma^{-2}V^TA-I_m\|\leq\epsilon$$

To conclude, for $\|V\Sigma^{-2}V^TA-I_m\|\leq\epsilon$, we need to pick $\beta$ and $\epsilon'$ such that

$$5k\kappa^5(2\epsilon'\kappa^2+4\beta)\leq\frac{3\epsilon}{2}$$

$$4k\kappa^2((k+1)\beta+\epsilon')\leq 1$$

$$(k^2-2k+2)\beta^2+(3k-4)\beta\leq 1$$

and decide the sampling parameters as

$$r=\left\lceil\frac{4\ln(2n/\eta_2)}{\epsilon'^2}\right\rceil \qquad (3)$$

$$c=\left\lceil\frac{4\kappa^2\ln(2r/\eta_1)}{\beta^2}\right\rceil \qquad (4)$$

### IV-B Proof of $E_3\leq\frac{\epsilon}{2}\|\vec{\alpha}\|\,\|\vec{x}-\vec{x}_j\|$

Notice that

$$E_3=\|\vec{x}-\vec{x}_j\|\,\|\tilde{\vec{\alpha}}-\vec{\alpha}'\|$$

For $\vec{\alpha}'=\sum_{l=1}^k\frac{\lambda_l}{\sigma_l^2}V_l$ and $\tilde{\vec{\alpha}}=\sum_{l=1}^k\frac{\tilde{\lambda}_l}{\sigma_l^2}V_l$, let $\vec{z}$ be the vector with $z_l=\frac{\lambda_l-\tilde{\lambda}_l}{\sigma_l^2}$; we have

$$\begin{aligned}\|\tilde{\vec{\alpha}}-\vec{\alpha}'\| &= \left\|\sum_{l=1}^k\frac{\lambda_l-\tilde{\lambda}_l}{\sigma_l^2}V_l\right\| = \|V\vec{z}\| \leq \sqrt{\|V^TV\|}\,\|\vec{z}\| \\ &\leq \frac{4}{3}\cdot\frac{3\epsilon\sigma_l^2}{8\sqrt{k}}\|\vec{y}\|\cdot\frac{1}{\sigma_l^2}\sqrt{k} \leq \frac{1}{2}\epsilon\|\vec{\alpha}\|\end{aligned}$$

in which $\sqrt{\|V^TV\|}\leq\frac{4}{3}$, as shown in the proof of Theorem 6.

## V Complexity

We analyze the steps of the main algorithm and show that each step's complexity is within logarithmic time.

### V-A The spectral decomposition

For a symmetric matrix, the fastest classical spectral decomposition is through the classical symmetric QR method, whose complexity is cubic in the matrix dimension.

### V-B Computation of $\tilde{\lambda}_l$

Inspecting Algorithm 1, we can easily find that its complexity is

$$O\left(\frac{1}{\xi^2}\ln\frac{1}{\eta}\left(L(A)+Q(B)\right)\right)$$

For the computation of $\tilde{\lambda}_l$ by Algorithm 1, we have

$$\lambda_l=\frac{1}{\sigma_l^2}V_l'^TR\vec{y}=\frac{1}{\sigma_l^2}\mathrm{Tr}\left[V_l'^T\hat{X}^TX\vec{y}\right]=\frac{1}{\sigma_l^2}\mathrm{Tr}\left[X\vec{y}V_l'^T\hat{X}^T\right]$$

Observe that we can query the matrix elements of $X\vec{y}V_l'^T\hat{X}^T$ at low cost. Thus the complexity of step 9 is

$$T_6=O\left(\left(\frac{\|X\|_F\|\vec{y}\|}{\epsilon\|\vec{y}\|/(\kappa^4\sqrt{k})}\right)^2\ln\frac{4k}{\eta}\left(L(X)+Q(\vec{y}V_l'^T\hat{X}^T)\right)\right)=O\left(\frac{\kappa^8k^2\|X\|_F^2\,r}{\epsilon^2}\ln\frac{4k}{\eta}\right)$$

### V-C Computation of $y_j-(\vec{x}-\vec{x}_j)^TX\tilde{\vec{\alpha}}$

#### V-C1 Query of $\tilde{\vec{\alpha}}$

For any index $i$, we have $\tilde{\alpha}_i=(X^T\hat{X}\vec{u})_i$, where $\vec{u}=\sum_{l=1}^k\frac{\tilde{\lambda}_l}{\sigma_l^4}V'_l$. To estimate $\tilde{\alpha}_i$, we use Algorithm 1 to compute it within error $\epsilon_1$ and success probability $1-\eta_1$, where

$$\epsilon_1=\frac{\epsilon^3\|\vec{\alpha}\|\,\|\vec{x}-\vec{x}_j\|}{4\|X\|_F\ln(8/\eta)\,C}$$

$$\eta_1=\frac{\eta\epsilon^2}{32r\|X\|_F\ln(8/\eta)}$$

in which

$$C=\sqrt{r}\left(\kappa^4\sqrt{k}+\kappa^2(k+1)\beta+\kappa\epsilon'+\frac{3}{8}\kappa^2\epsilon\sqrt{m}\right)$$

and $\beta$, $\epsilon'$ are given in Subsection IV-A.

Thus, for a query of $\tilde{\alpha}_i$, the error $\epsilon_2$ satisfies

$$\begin{aligned}\epsilon_2 &\leq\epsilon_1\left(\sum_{s=1}^ru_s\right)\leq\epsilon_1\sqrt{r}\,\|\vec{u}\| \\ &\leq\epsilon_1\sqrt{r}\left(\sqrt{\sum_{l=1}^k\frac{\lambda_l^2}{\sigma_l^8}}+\sqrt{\sum_{l=1}^k\frac{|\tilde{\lambda}_l-\lambda_l|^2}{\sigma_l^8}}\right) \\ &\leq\epsilon_1\sqrt{r}\left(\kappa^4\sqrt{\sum_{l=1}^k\lambda_l^2}+\sqrt{\sum_{l=1}^k\frac{9\epsilon^2\|\vec{y}\|^2}{64\sigma_l^4k}}\right) \\ &\leq\epsilon_1\sqrt{r}\left(\kappa^4\sqrt{\sum_{l=1}^k\vec{y}^TV_lV_l^T\vec{y}}+\frac{3}{8}\kappa^2\epsilon\|\vec{y}\|\right) \\ &\leq\epsilon_1\sqrt{r}\left(\kappa^4\sqrt{\mathrm{Tr}\left[V_lV_l^T\right]}+\frac{3}{8}\kappa^2\epsilon\|\vec{y}\|\right) \\ &\leq\epsilon_1\sqrt{r}\left(\kappa^4\sqrt{k}+\kappa^2(k+1)\beta+\kappa\epsilon'+\frac{3}{8}\kappa^2\epsilon\sqrt{m}\right) \\ &\leq\frac{\epsilon^3\|\vec{\alpha}\|\,\|\vec{x}-\vec{x}_j\|}{4\|X\|_F\ln(8/\eta)}\end{aligned}$$

and the success probability is at least $1-r\eta_1$. The complexity is

$$Q(\tilde{\alpha})=O\left(\frac{r\|X\|_F}{\epsilon_1^2\|\vec{\alpha}\|^2\|\vec{x}-\vec{x}_j\|^2}\ln\frac{1}{\eta_1}\right)=O\left(\frac{16r\|X\|_F^3\ln^3(8/\eta)\,C}{\epsilon^6\|\vec{\alpha}\|^2\|\vec{x}-\vec{x}_j\|^2}\ln\frac{32r\|X\|_F\ln(8/\eta)}{\eta\epsilon^2}\right)$$

#### V-C2 Computation of $j$

We compute $j$ by a sample of $\tilde{\vec{\alpha}}$ using Algorithm 2. We do not care about the error or the success probability here, because if the sampled $\tilde{\alpha}_j$ turns out too small, it suffices to sample another index of $\tilde{\vec{\alpha}}$. Since the sampling depends on the magnitudes of the elements of $\tilde{\vec{\alpha}}$, the probability that we obtain a small $\tilde{\alpha}_j$ is low.

Here we can take $D$ as an upper bound satisfying

$$\kappa^4\sqrt{k}+\kappa^2(k+1)\beta+\kappa\epsilon'+\frac{3}{8}\kappa^2\epsilon\sqrt{m}\geq\|\vec{u}\|\geq\|\hat{X}\vec{u}\|$$

and control it within a logarithmic range of $m$ and $n$ by reducing the parameters. Alternatively, we can simply compute $\|\hat{X}\vec{u}\|$ using Algorithm 1 and take it as $D$.

For the rejection sampling, we need to query $\tilde{\vec{\alpha}}$ an average of $\frac{\|X\|_FD}{\|X^T\hat{X}\vec{u}\|}$ times. If we get $D$ in the first way, the total sampling complexity is

$$\frac{\|X\|_FD}{\|X^T\hat{X}\vec{u}\|}Q(\tilde{\alpha})$$

#### V-C3 Computation of $y_j-(\vec{x}-\vec{x}_j)^TX\tilde{\vec{\alpha}}$

Once we have the index $j$ and query access to $\tilde{\vec{\alpha}}$, by Algorithm 1 we can compute $y_j-(\vec{x}-\vec{x}_j)^TX\tilde{\vec{\alpha}}$ to the accuracy assumed in step 10 of Algorithm 4:

$$\begin{aligned}T_7 &= \frac{4\|X\|_F}{\epsilon^2}\ln\frac{8}{\eta}\,Q(\tilde{\alpha})+\frac{\|X\|_FD}{\|X^T\hat{X}\vec{u}\|}Q(\tilde{\alpha}) \\ &= O\left(\frac{16r\|X\|_F^3\ln^3(8/\eta)\,C}{\epsilon^6\|\vec{\alpha}\|^2\|\vec{x}-\vec{x}_j\|^2}\ln\frac{32r\|X\|_F\ln(8/\eta)}{\eta\epsilon^2}\right)\cdot\|X\|_F\left(\frac{4}{\epsilon^2}\ln\frac{8}{\eta}+\frac{D}{\|X^T\hat{X}\vec{u}\|}\right)\end{aligned}$$

which is within the logarithmic range of $m$ and $n$. Considering the error and success probability in the query process, the total error is

$$E_7=\frac{4\|X\|_F}{\epsilon^2}\ln\frac{8}{\eta}\left(\sum_su_s\right)\epsilon_1\|\vec{\alpha}\|\,\|\vec{x}-\vec{x}_j\|+\epsilon_2\|\vec{\alpha}\|\,\|\vec{x}-\vec{x}_j\|\leq\epsilon\|\vec{\alpha}\|\,\|\vec{x}-\vec{x}_j\|$$

while the success probability is greater than

$$1-\frac{\eta}{8}-\frac{4\|X\|_F}{\epsilon^2}\ln\frac{8}{\eta}\,r\eta_1=1-\frac{\eta}{4}$$

## VI Discussion

There are other ways to solve this problem, such as solving Equation (1) by directly employing the algorithm in [11] and using the result in the estimation of the classification expression, or by employing the algorithm in [11] twice. Though this works for the simplest case, it cannot deal with further problems like the soft SVM, because it depends solely on the ability to solve linear equations with sampling access to the coefficient matrix.

We now discuss some improvements to be made to our algorithm in the future:

### VI-A Improving sampling for dot products

Recall that with Algorithm 1 we can estimate the dot product of two vectors. However, it does not work well under all conditions, e.g., when the vectors are dominated by a single coordinate. For randomness, [19] implies that we can apply a spherically random rotation to all data points, which does not change the kernel matrix but makes all the coordinates evenly distributed random variables.

### VI-B Non-linear SVM

When the training data is not linearly separable, there is no pair of parallel hyperplanes that strictly divides the points into two classes according to the given data. Hence the solution to the original optimization problem does not exist. There are two kinds of improvement methods.

A non-linear SVM improves the fitting ability by changing the kernel function, thus making it possible to classify strictly.

Taking the polynomial kernel as an example, we have

$$K(\vec{x}_i,\vec{x}_j)=(\vec{x}_j^T\vec{x}_i)^d$$

Equation (1) then becomes

$$K\vec{\alpha}=\vec{y},\qquad K=\left((\vec{x}_j^T\vec{x}_i)^d\right)_{i,j=1,\ldots,m}$$

If we take

$$X=(\underbrace{\vec{x}_1\otimes\cdots\otimes\vec{x}_1}_{d},\ \vec{x}_2\otimes\cdots\otimes\vec{x}_2,\ \ldots,\ \vec{x}_m\otimes\cdots\otimes\vec{x}_m)$$

Note that $X$'s size is $n^d\times m$ and $X^TX=K$. The column norms of $X$ are (shown here for $d=2$)

$$\|\vec{x}_k\otimes\vec{x}_k\|^2=\sum_{i=1}^n\sum_{j=1}^n(x_{ki}x_{kj})^2=\left(\sum_{l=1}^nx_{kl}^2\right)^2=\|\vec{x}_k\|^4$$

Thus we can sample on $X$, and since $X^TX=K$, Algorithm 4 is still suitable here.
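
The tensor-power identities above can be checked numerically (a NumPy sketch for $d=2$ with made-up vectors):

```python
import numpy as np

# For d = 2, the column x (x) x turns the polynomial kernel into an
# ordinary inner product, and the column norm is the squared norm of x,
# so length-square sampling access to the original data carries over.
x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])

xt = np.kron(x, x)                       # x (x) x, length n^2
zt = np.kron(z, z)

print(np.dot(xt, zt), np.dot(x, z) ** 2) # kernel value (x^T z)^2 both ways
print(np.dot(xt, xt), np.dot(x, x) ** 2) # ||x (x) x||^2 = ||x||^4
```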

### VI-C Soft SVM

While a non-linear SVM may bring overfitting problems while achieving strict classification, another improvement method is the soft SVM, which allows wrong classifications on the training data but minimizes the offsets.

By introducing a soft regularization parameter $\gamma$, the equation to solve becomes

$$\begin{pmatrix}0 & \vec{1}^T\\ \vec{1} & X^TX+\gamma^{-1}I_m\end{pmatrix}\begin{pmatrix}b\\ \vec{\alpha}\end{pmatrix}=\begin{pmatrix}0\\ \vec{y}\end{pmatrix}$$

We only consider its sub-equation

$$\left(X^TX+\frac{1}{\gamma}I_m\right)\vec{\alpha}=\vec{y} \qquad (5)$$

For the coefficient matrix of (5), we have

$$\left(X^TX+\frac{1}{\gamma}I_m\right)^{-1}\approx\sum_{l=1}^k\frac{1}{\sigma_l^2+\frac{1}{\gamma}}V'_lV_l'^T$$

Thus, to find the solution of (5), we only need to add $\frac{1}{\gamma}$ to all the eigenvalues of $\tilde{A}$ in step 7 of Algorithm 4 and continue.
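
The eigenvalue-shift trick can be checked with a dense toy computation (our own made-up matrices; the actual algorithm of course never forms the kernel matrix explicitly):

```python
import numpy as np

# Soft-SVM shift: given the spectral decomposition of A = X^T X,
# solving (A + I/gamma) alpha = y amounts to shifting every
# eigenvalue by 1/gamma.
rng = np.random.default_rng(3)
X = rng.standard_normal((5, 8))          # columns are 8 data points in R^5
y = rng.standard_normal(8)
gamma = 2.0

A = X.T @ X
sig2, V = np.linalg.eigh(A)              # eigenvalues sigma_l^2, vectors V_l

# alpha = sum_l V_l V_l^T y / (sigma_l^2 + 1/gamma)
alpha = V @ ((V.T @ y) / (sig2 + 1.0 / gamma))

# Agrees with solving the regularized system directly.
direct = np.linalg.solve(A + np.eye(8) / gamma, y)
print(np.allclose(alpha, direct))
```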

## VII Conclusion

We have proposed a quantum-inspired SVM algorithm that achieves exponential speedup over the previous classical algorithms. We hope that the techniques developed in this work lead to more efficient classical algorithms, for example by applying our method to more complex support vector machines [16, 20] or to other machine learning algorithms. The technique of indirect sampling can expand the application area of fast sampling techniques, and it will contribute to the further competition between classical and quantum algorithms.

Some improvements on our work remain for the future, such as relaxing the conditions on the data matrix and further reducing the complexity, which can be achieved through a deeper investigation of the algorithm and the error-propagation process.

Further investigation into the applications of such an algorithm is also required to make the quantum-inspired SVM operable in solving problems like face recognition [15] and signal processing [21].

We note that our work, as well as the previous quantum-inspired algorithms, is not intended to demonstrate that quantum computing is uncompetitive. We want to find out where the boundaries of classical and quantum computing are, and we expect new quantum algorithms to be developed that beat our algorithm.

## Appendix B Proof of Theorems in IV

### B-A Proof of Theorem 3

###### Proof:
$$\begin{aligned}|V_i'^T\hat{A}V'_j-\delta_{ij}\sigma_i^2| &\leq|V_i'^T(\hat{A}-\tilde{A})V'_j|+|V_i'^T\tilde{A}V'_j-\delta_{ij}\sigma_i^2| \\ &\leq\|V'_i\|\cdot\|(\hat{A}-\tilde{A})V'_j\| \\ &\leq\beta\end{aligned}$$

### B-B Proof of Theorem 4

###### Proof:
$$\begin{aligned}\Delta_1 &= |V_i^TV_j-\delta_{ij}| = \left|\frac{V_i'^TRR^TV'_j-\delta_{ij}\sigma_i^4}{\sigma_i^2\sigma_j^2}\right| \\ &\leq \frac{1}{\sigma_i^2\sigma_j^2}\left(|V_i'^T\hat{A}\hat{A}V'_j-\delta_{ij}\sigma_i^4|+|V_i'^T(RR^T-\hat{A}\hat{A})V'_j|\right) \\ &\leq \frac{1}{\sigma_i^2\sigma_j^2}\left((\sigma_i^2+\sigma_j^2+k-1)\beta+\|\hat{X}^T(XX^T-\hat{X}\hat{X}^T)\hat{X}\|\right) \\ &\leq \frac{1}{\sigma_i^2\sigma_j^2}\left((\sigma_i^2+\sigma_j^2+k-1)\beta+\epsilon'\right) \\ &\leq \kappa^2((k+1)\beta+\epsilon')\end{aligned}$$

$$\begin{aligned}\Delta_2 &= |V_i^TAV_j-\delta_{ij}\sigma_i^2| = \frac{1}{\sigma_i^2\sigma_j^2}\left|V_i'^TRAR^TV'_j-\delta_{ij}\sigma_i^6\right| \\ &\leq \frac{1}{\sigma_i^2\sigma_j^2}\left(\left|V_i'^T(RAR^T-\hat{A}\hat{A}\hat{A})V'_j\right|+\left|V_i'^T\hat{A}\hat{A}\hat{A}V'_j-\delta_{ij}\sigma_i^6\right|\right)\end{aligned}$$