References
- Freund and Schapire (1996) Y. Freund and R. E. Schapire, International Conference on Machine Learning 96 (1996).
- Mohri et al. (2012) M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning (The MIT Press, 2012).
- Kong and Hong (2015) K.-K. Kong and K.-S. Hong, Pattern Recognition Letters 68, 63 (2015).
- Markoski et al. (2015) B. Markoski, Z. Ivanković, L. Ratgeber, P. Pecev, and D. Glušac, Acta Polytechnica Hungarica (2015).
- Owusu et al. (2014) E. Owusu, Y. Zhan, and Q. R. Mao, Expert Systems with Applications 41, 3383 (2014).
- Jiang et al. (2015) Y. Jiang, Y. Shen, Y. Liu, and W. Liu, Mathematical Problems in Engineering 2015, 1 (2015).
- Li et al. (2008) X. Li, L. Wang, and E. Sung, Engineering Applications of Artificial Intelligence 21, 785 (2008).
- Roe et al. (2005) B. P. Roe, H. J. Yang, J. Zhu, Y. Liu, I. Stancu, and G. McGregor, Nuclear Instruments and Methods in Physics Research, Section A: Accelerators, Spectrometers, Detectors and Associated Equipment (2005), 10.1016/j.nima.2004.12.018, physics/0408124.
- Zhang and Zhang (2008) C. X. Zhang and J. S. Zhang, Pattern Recognition Letters (2008), 10.1016/j.patrec.2008.03.006.
- Grover (1996) L. K. Grover, in Proceedings of the twenty-eighth annual ACM symposium on Theory of computing - STOC ’96 (ACM Press, New York, New York, USA, 1996) pp. 212–219, quant-ph/9605043.
- Kitaev (1995) A. Y. Kitaev, (1995), quant-ph/9511026.
- Shor (1997) P. W. Shor, SIAM Journal on Computing 26, 1484 (1997), quant-ph/9508027.
- Harrow et al. (2009) A. W. Harrow, A. Hassidim, and S. Lloyd, Physical Review Letters 103, 150502 (2009), 0811.3171 .
- Childs et al. (2017) A. M. Childs, R. Kothari, and R. D. Somma, SIAM Journal on Computing 46, 1920 (2017), 1511.02306 .
- Coles et al. (2018) P. J. Coles, S. Eidenbenz, S. Pakin, A. Adedoyin, J. Ambrosiano, P. Anisimov, W. Casper, G. Chennupati, C. Coffrin, H. Djidjev, D. Gunter, S. Karra, N. Lemons, S. Lin, A. Lokhov, A. Malyzhenkov, D. Mascarenas, S. Mniszewski, B. Nadiga, D. O’Malley, D. Oyen, L. Prasad, R. Roberts, P. Romero, N. Santhi, N. Sinitsyn, P. Swart, M. Vuffray, J. Wendelberger, B. Yoon, R. Zamora, and W. Zhu, (2018), arXiv:1804.03719 .
- Buhrman and de Wolf (2002) H. Buhrman and R. de Wolf, Theoretical Computer Science 288, 21 (2002).
- Biamonte et al. (2017) J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, “Quantum machine learning,” (2017), 1611.09347 .
- Ciliberto et al. (2018) C. Ciliberto, M. Herbster, A. D. Ialongo, M. Pontil, A. Rocchetto, S. Severini, and L. Wossnig, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Science 474, 20170551 (2018), 1707.08561 .
- Cai et al. (2015) X.-D. Cai, D. Wu, Z.-E. Su, M.-C. Chen, X.-L. Wang, L. Li, N.-L. Liu, C.-Y. Lu, and J.-W. Pan, Physical Review Letters 114, 110504 (2015), 1409.7770 .
- Li et al. (2015) Z. Li, X. Liu, N. Xu, and J. Du, Physical Review Letters 114, 140504 (2015), 1410.1054 .
- Dunjko et al. (2016) V. Dunjko, J. M. Taylor, and H. J. Briegel, Physical Review Letters 117, 130501 (2016), 1610.08251 .
- Schuld et al. (2017) M. Schuld, M. Fingerhuth, and F. Petruccione, EPL (Europhysics Letters) 119, 60002 (2017), 1703.10793 .
- Rebentrost et al. (2014) P. Rebentrost, M. Mohseni, and S. Lloyd, Physical Review Letters 113, 130503 (2014), 1307.0471 .
- Cong and Duan (2016) I. Cong and L. Duan, New Journal of Physics 18, 073011 (2016), 1510.00113 .
- Lloyd et al. (2014) S. Lloyd, M. Mohseni, and P. Rebentrost, Nature Physics 10, 631 (2014), 1307.0401 .
- Sasaki et al. (2001) M. Sasaki, A. Carlini, and R. Jozsa, Physical Review A 64, 022317 (2001), quant-ph/0102020.
- Bae and Kwek (2015) J. Bae and L.-C. Kwek, Journal of Physics A: Mathematical and Theoretical 48, 083001 (2015), 1707.02571 .
- Kearns and Valiant (1994) M. Kearns and L. Valiant, Journal of the ACM 41, 67 (1994).
- Dervovic et al. (2018) D. Dervovic, M. Herbster, P. Mountney, S. Severini, N. Usher, and L. Wossnig, (2018), arXiv:1802.08227 .
- Zhu et al. (2016) H. Zhu, R. Kueng, M. Grassl, and D. Gross, (2016), arXiv:1609.08172 .
- Du et al. (2018) Y. Du, M.-H. Hsieh, T. Liu, and D. Tao, “Implementable quantum classifier for nonlinear data,” (2018), arXiv:1809.06056 .
Appendix A Proof of Theorem 1
Here we perform our analysis for the probabilistic case, which reduces to conventional AdaBoost when the outputs of the classifiers are deterministic. Moreover, we assume a definite target label exists for all , where is the sample space of all possible inputs. Let be the probability mass function defined on .
The goal of AdaBoost is to find the optimal coefficients of the linear model based on the basis classifiers with the minimum exponential error, which is the average of over the joint distribution of inputs and classifiers. Here
are random variables which yield the conditional probabilities
. Let , that is if , and otherwise. Then is fully determined by a conditional probability mass function . The exponential error as the cost function yields
(11)
because . In AdaBoost, the optimization is carried out greedily, adding the terms into one by one, each with the optimal weight at its iteration. Let be the exponential error of the first terms of
Let be a binary string . Let , and let . Then equation (11) gives
(12)
This is a convex function with respect to , and a unique solution to the problem exists at its extremum. Taking its derivative with respect to gives
(13)
and hence
(14)
That is
(15)
Let
(16)
Then the optimal weight of each iteration is
(17)
In the following, we demonstrate that the optimal weight can be adaptively obtained. When , initialize . Thus
(18)
which is exactly the generalization error of .
Let be the normalization factor. Then
(19)
By definition . Therefore
(20)
Let
(21)
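The update derived above is the standard AdaBoost step. As a concrete illustration, here is a minimal Python sketch of one classical iteration in conventional notation: the weighted error of the weak classifier, the optimal weight from equation (17) in its usual closed form, and the exponential reweighting normalized as in (19). The toy data and all variable names are illustrative.

```python
import numpy as np

def adaboost_step(D, y, h_pred):
    """One AdaBoost iteration: weighted error, optimal weight, reweighted distribution."""
    eps = np.sum(D * (h_pred != y))           # weighted error of the weak classifier
    alpha = 0.5 * np.log((1.0 - eps) / eps)   # optimal weight (eq. (17) in standard form)
    D_new = D * np.exp(-alpha * y * h_pred)   # exponential reweighting
    return eps, alpha, D_new / D_new.sum()    # normalize by the factor Z_t

# Toy data: labels in {-1, +1}; the weak classifier errs on the last point.
y = np.array([1, 1, -1, -1])
h = np.array([1, 1, -1, 1])
D = np.full(4, 0.25)                          # uniform initial distribution
eps, alpha, D = adaboost_step(D, y, h)
print(eps, alpha, D)                          # 0.25, 0.5*ln(3), [1/6, 1/6, 1/6, 1/2]
```

Note how the single misclassified point ends up carrying half the total weight, which is the mechanism by which later classifiers focus on hard examples.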
Appendix B Proof of Theorem 2
In the proof of Theorem 1, a theoretically optimal solution to the AdaBoost model was derived. In practice, however, the underlying distribution of inputs is unknown, so the values of cannot be evaluated. Moreover, the training algorithm usually cannot cover the whole sample space (otherwise the explicit relationship between inputs and outputs would be known, and machine learning would be unnecessary).
As in other machine learning tasks, this problem is solved by sampling. Clearly, with an underlying distribution on the sample space , each can be viewed as a random variable on that space. The approximation can be controlled with a standard result, Hoeffding’s inequality.
Theorem 4 (Hoeffding’s inequality).
If a sample of size is drawn from a distribution on a sample space , then given a random variable on and any positive number
(22)
where .
The key point here is that, though cannot be evaluated in practice, is computable, and it approximates well when is large.
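As a numeric illustration of this concentration, the sketch below draws many Bernoulli samples and checks the empirical failure rate against the bound 2·exp(−2Nδ²) for variables bounded in [0, 1]; the mean, precision, and sample size are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, delta, N = 0.3, 0.05, 3000     # true mean, target precision, sample size

# Repeat the experiment many times and compare the empirical failure rate
# with the Hoeffding bound 2*exp(-2*N*delta^2) for variables in [0, 1].
trials = 5000
sample_means = rng.binomial(N, p, size=trials) / N
fail_rate = np.mean(np.abs(sample_means - p) >= delta)
bound = 2 * np.exp(-2 * N * delta**2)
print(fail_rate <= bound)         # True: the bound holds
```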
According to equation (19), for a sample of pairs drawn from the distribution , let
(23)
Note that the value of is obtained iteratively according to equation (21). Since , is always positive, which means is always non-negative as well. Further, .
Therefore, for a target precision of , a sample of size is enough to achieve the goal with constant probability. Nevertheless, the sample size has to be determined beforehand, and hence we should choose
(25)
where .
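Inverting Hoeffding's bound gives a concrete sample-size rule: for failure probability p and precision δ, N ≥ ln(2/p)/(2δ²) suffices for a variable bounded in a range of width 1, matching the O(1/δ²) scaling in (25). The helper below is a hypothetical illustration with arbitrary example numbers.

```python
import math

def hoeffding_sample_size(delta, fail_prob, value_range=1.0):
    """Smallest N with 2*exp(-2*N*delta^2 / range^2) <= fail_prob."""
    return math.ceil(value_range**2 * math.log(2.0 / fail_prob)
                     / (2.0 * delta**2))

N = hoeffding_sample_size(delta=0.05, fail_prob=0.05)
print(N)   # 738 samples suffice for precision 0.05 at 95% confidence
```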
Remark.
However, might not be small when is large, which indicates that AdaBoost may perform poorly if the model does not converge quickly with the number of classifiers used. This might be improved by other boosting algorithms, e.g. LogitBoost, gradient boosting, or XGBoost.
Appendix C Quantum Simulation of Classical Process
This section reviews some results from Kitaev’s paper Kitaev (1995) on simulating classical Boolean circuits with quantum circuits. For convenience, and without loss of generality, the classical registers are denoted with Dirac notation here.
According to Lemmas 1 and 7 in Kitaev (1995), if a function can be computed with Boolean operations from a basis , which is a small set of Boolean operations, then it can be computed with operations in the basis . The basis is defined so that, for each operation in , it contains a corresponding reversible operation. Also, the operation that copies a state
(26) |
(which is in fact a CNOT gate) has to be included in .
Furthermore, we say a circuit computes a Boolean function if it converts . With the basis , this computation is performed as .
However, one may only need partial information about the output . Classically, one is free to read out part of the output bits and drop the rest. In quantum computation, however, dropping those “garbage” bits () would destroy the quantum state if they are in superposition. But as shown above, can be constructed with reversible gates. Divide the register into two parts , and then is
(it is always separable, as the initial states are all tensor product states), and the above process is then
By repeating this process on an extra register , the process
can be constructed.
If the input state on the quantum registers is in superposition
this process will give
Then the pairwise operation (26) is performed between the “garbage” states on and , which gives
Finally, the original process is performed again on which ends up at
Since the appended registers all end up at , they can be safely dropped after the computation.
In summary, the process can be achieved with quantum gates even for computation in superposition. This fact indicates that each arithmetic part of our quantum algorithm can be performed with the same complexity as the classical algorithm. Since all ancillary registers always start and end at , they are omitted from our notation for simplicity.
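Because the reversible circuit only permutes basis states, the compute–copy–uncompute trick can be sketched entirely on classical bit strings. The toy circuit below (the function, register layout, and gate list are all illustrative) computes an AND into an output bit using one garbage bit, copies the answer out with a CNOT, then runs the gates in reverse to clear the garbage.

```python
# Toy classical simulation of the compute-copy-uncompute trick.
# Register bits: x0, x1 (input), a (ancilla/garbage), y (output), c (copy).
def cnot(s, c, t):
    s[t] ^= s[c]                  # CNOT: flip target iff control is 1

def toffoli(s, c1, c2, t):
    s[t] ^= s[c1] & s[c2]         # Toffoli: flip target iff both controls are 1

# Circuit computing y = x0 AND x1 into bit 3, leaving garbage in bit 2.
forward = [lambda s: cnot(s, 0, 2),          # a = x0            (garbage)
           lambda s: toffoli(s, 1, 2, 3)]    # y = x1 AND a

state = [1, 1, 0, 0, 0]                      # x0=1, x1=1, a=0, y=0, c=0
for gate in forward:
    gate(state)                              # forward computation
cnot(state, 3, 4)                            # copy the answer to a fresh bit
for gate in reversed(forward):
    gate(state)                              # uncompute: CNOT/Toffoli are involutions
print(state)                                 # [1, 1, 0, 0, 1] -- garbage cleared
```

Reversing the gate order works here because CNOT and Toffoli are their own inverses; the ancilla returns to 0 and can be dropped, exactly as in the text above.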
Note that, although the above result is only valid for Boolean functions, such Boolean operations are universal, just as in classical digital computers. To handle real numbers on a computer, the values have to be encoded into binary strings up to some precision.
Example 1.
The updating rule (6) is purely arithmetic. It can be viewed as a repeated controlled operation on a register encoding a numerical value as a binary string up to some precision. Each application of is controlled by one qubit of the string . More precisely,
where ; . As a result, lines 10–14 of Algorithm 1 can be performed in quantum circuits with the same order of gate count as the classical circuit. Additionally, this can be done in superposition over all , and hence the “for” loop of the classical algorithm can be done in one shot.
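The binary-controlled structure described here is the usual decomposition of a power of an operation into controlled powers of the form R^(2^j), one per bit of the exponent. The sketch below verifies this numerically for a 2×2 rotation matrix; the angle and exponent are illustrative.

```python
import numpy as np

theta = 0.37                                   # illustrative rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

def apply_R_power(w, n_bits):
    """Apply R^w by conditioning R^(2^j) on the j-th bit of w."""
    U = np.eye(2)
    for j in range(n_bits):
        if (w >> j) & 1:                       # "controlled" on bit j of w
            U = np.linalg.matrix_power(R, 2**j) @ U
    return U

w = 11                                         # binary 1011
print(np.allclose(apply_R_power(w, 4), np.linalg.matrix_power(R, w)))   # True
```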
Similarly, another step for phase estimation in our algorithm can be done with this method.
Example 2.
There exists an operation such that for ,
(27)
The requirement on ensures that is a real number, so the state can be constructed on an ancillary register “anc”. Here , where is the binary representation of the real number up to some precision. The process of computing is arithmetic. By further appending an additional qubit to the system, an operation can be constructed, as in Lemma 4 of Dervovic et al. (2018), that converts to
Finally, the register can be cleared and dropped with the garbage-dropping technique above. This whole process is exactly the operation .
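The effect of this controlled rotation is to map |0⟩ to √(1−w)|0⟩ + √w|1⟩ on the ancilla, branch by branch. The statevector sketch below checks the key consequence used later: over a uniform superposition, the probability of measuring the ancilla in |1⟩ is the average of the weights. The weight values are illustrative, and the weights are stored in a plain array rather than a binary register.

```python
import numpy as np

weights = np.array([0.1, 0.4, 0.2, 0.3])      # w_i in [0, 1], illustrative values
n = len(weights)

# Uniform superposition over |i>|0>, then rotate the ancilla on each branch:
#   |i>|0>  ->  |i>( sqrt(1 - w_i)|0> + sqrt(w_i)|1> )
amp0 = np.sqrt(1.0 - weights) / np.sqrt(n)    # amplitude of |i>|0>
amp1 = np.sqrt(weights) / np.sqrt(n)          # amplitude of |i>|1>

# Measuring the ancilla in |1> occurs with probability mean(w_i),
# which is exactly the average the algorithm estimates.
p1 = np.sum(amp1**2)
print(p1)   # 0.25
```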
The operations in these examples will be useful in the next section.
Appendix D Proof of Theorem 3
In the quantum AdaBoost algorithm, all computation other than the average of can be performed in parallel over the whole sample. That is, for every initial state , where denotes the data points of a sample of size drawn from the sample space , the classical algorithm outputs to the register , which encodes the numerical value of . Note that here is the state corresponding to the binary value of , just as a classical computer stores numerical values. With this property, the AdaBoost algorithm can be performed by the following adaptive procedure:
At each iteration, given the classical information of (where can be obtained in the corresponding iteration), initialize the state of the three registers , , as
(28)
With access to the quantum oracle defined in (7), one can obtain
(29)
With the classical information of , one can update the register with the updating rule (6) (which is a classical arithmetic process shown in example 1) to the state
(30)
Compose the whole arithmetic process that converts (28) to (30) and denote it by :
(31)
With an extra working register, apply the operation in example 2 to the final state in (31)
(32)
where
(33)
Note that for each , .
According to the definition in equation (23), the result of (32) is indeed
(34)
This can be rewritten as
(35)
which performs a rotation by angle .
Let . After a Pauli- operation is performed on the last register of , it is transformed to
(36)
Let . Apply the inverse operation to (36), so that is mapped back to the initial state (28). Note that is orthogonal to , and our operation is unitary. Therefore, if an operation that only inverts the amplitude of every state perpendicular to the initial state (analogous to the diffusion operator in Grover’s algorithm Grover (1996)) is applied, and the operation is performed again, is left unchanged. This procedure gives
(37)
In conclusion, converts the initial state to
(38)
This operation makes it possible to estimate with the phase estimation algorithm.
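Phase estimation applies because each iterate acts as a rotation by 2θ in the two-dimensional span of the "good" and "bad" components, so after t iterations the good amplitude is sin((2t+1)θ), as in amplitude estimation. The 2×2 numeric check below uses an illustrative value of the quantity to estimate.

```python
import numpy as np

eps = 0.2                                     # illustrative quantity to estimate
theta = np.arcsin(np.sqrt(eps))

# Two-dimensional model: the state lives in span{"bad", "good"} and one
# Grover-type iterate Q is a rotation by 2*theta in that plane.
state = np.array([np.cos(theta), np.sin(theta)])
Q = np.array([[np.cos(2*theta), -np.sin(2*theta)],
              [np.sin(2*theta),  np.cos(2*theta)]])

t = 3
for _ in range(t):
    state = Q @ state
# After t iterations the "good" amplitude is sin((2t+1)*theta).
print(np.isclose(state[1], np.sin((2*t + 1) * theta)))   # True
```

Reading out θ with phase estimation then recovers eps = sin²(θ), which is why the rotation angle, rather than the amplitude itself, is the quantity estimated.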
To fairly compare the query complexities, we constrain the results of both the classical and quantum algorithms to the same precision. In order to approximate with the target precision , the phase estimation algorithm has to estimate with precision , and as shown in (25), a sample of size is enough to estimate each with precision .
In the step of our quantum algorithm, by choosing the number of iterations in (38) to be , the phase estimation process can read out a value such that .
In order to estimate with the same precision as the classical algorithm, we need to bound to make sure
This can be done by choosing a proper . The task of our analysis is then to bound the value of in terms of and , as in the classical case.
Let ; then gives . Since , is a small number, and hence
(39)
When , is almost constant, and . This is usually true since and . Note that and . To make sure , or equivalently , the optimal choice of is . This gives .
Moreover, for the iteration, step (29) requires queries. So the query complexity of each iteration is .
Nevertheless, in order to obtain the value of , each quantum iteration is followed by a measurement. The information of stored in superposition is disrupted, and thus it has to be evaluated from the very beginning each time. Therefore the overall complexity is , compared to the classical case, which is . As discussed in the remark at the end of the proof of Theorem 1, the AdaBoost algorithm may not work well if it does not converge within a small number of iterations; therefore may be considered a small constant here.
Also, since both the quantum and classical algorithms use , the query complexity of the classical algorithm can be rewritten as , and the quantum query complexity is then .
Thus this quantum algorithm gives the same result as the classical algorithm, to the same order of precision and with the same success probability.