Machine Learning Kernel Method from a Quantum Generative Model

07/11/2019, by Przemysław Sadowski et al., Institute of Theoretical and Applied Informatics

Recently the use of Noisy Intermediate Scale Quantum (NISQ) devices for machine learning tasks has been proposed. These proposals often perform poorly due to various restrictions. However, quantum devices should perform well in sampling tasks. Thus, we recall the theory of the sampling-based approach to machine learning and propose a quantum sampling-based classifier. Namely, we use the randomized feature map approach. We propose a method of quantum sampling based on random quantum circuits with a parameterized distribution of rotations. We obtain a simple-to-use method with intuitive hyper-parameters that performs at least as well as top out-of-the-box classical methods. In short, we obtain a competitive quantum classifier whose crucial component is quantum sampling, a promising task for quantum supremacy.


Introduction

The use of Noisy Intermediate Scale Quantum (NISQ) devices for machine learning tasks has been proposed in various forms [1]. However, there is still a lot to be understood about the potential sources of quantum advantage. While it is reasonable to postulate that quantum computers may outperform classical computers in machine learning tasks, the existing propositions often perform poorly due to various restrictions [2].

As some put it, NISQ devices are best at simulating themselves. In particular, this means running noisy random quantum circuits. This may not seem like a very useful task; however, noisy devices have been successfully used as a resource for so-called quantum generative models, leading to non-trivial probability distributions on vector spaces [3, 4, 5, 6]. Thus, we may be able to harness this specific resource for machine learning tasks.

At the same time, it has been shown that any translation-invariant kernel can be substituted by an explicit feature map based on a probability distribution on the vector space of features [7]. Put the other way around, any probability distribution on a vector space of features can be expected to give rise to a kernel for that feature space.
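As a concrete illustration of this correspondence (a standard consequence of Bochner's theorem, stated here for orientation rather than taken from the paper), the Gaussian RBF kernel pairs with a Gaussian sampling distribution:

```latex
% Kernel-distribution pair implied by Bochner's theorem: a shift-invariant
% kernel is the Fourier transform of a probability distribution over w.
\[
  k(x - y) = e^{-\frac{\|x - y\|^2}{2\sigma^2}}
  \quad\Longleftrightarrow\quad
  w \sim \mathcal{N}\!\left(0, \sigma^{-2} I_d\right),
\]
% so sampling w from this Gaussian and building cosine/sine features of
% w^T x reproduces the RBF kernel in expectation.
```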

We recall the theory of the sampling-based approach to machine learning and establish a link to quantum generative models. Our goal is to show that we can develop quantum machine learning methods that use quantum devices solely as a source of a probability distribution, in the hope of using the quantum resource efficiently.

We use the scheme designed for random features for large-scale kernel machines [7] as a sampling-based classification method. We propose a method of quantum sampling based on random quantum circuits with a parameterized distribution of rotations. In short, we obtain a competitive quantum classifier whose crucial component is quantum sampling, a promising task for quantum supremacy.

1 Preliminaries – Randomized Feature Maps

One method to tackle large-scale kernels has been proposed by Rahimi and Recht [7, 8]. The proposition is to perform pre-processing that maps the input data to a randomized low-dimensional feature space and then apply a linear classifier. The explicit pipeline is shown in Figure 1. The key idea is to replace designing a fancy kernel with developing a sophisticated method for sampling vectors. We introduce the original idea, split it into steps, and specify which ones will be considered further.

Let us recall one of the key theorems that provides a foundation for the proposed scheme. Assume we have an input feature space $\mathcal{X} \subseteq \mathbb{R}^d$ and a shift-invariant kernel $k(x, y) = k(x - y)$; let $p$ be the corresponding probability distribution on $\mathbb{R}^d$ (the Fourier transform of $k$), and let $z: \mathbb{R}^d \to \mathbb{R}^D$ be an explicit map into a higher-dimensional space built in a certain way from random variables $w_1, \dots, w_D$ sampled with distribution $p$.

Claim 1

(Uniform convergence of Fourier features) [7]. Let $\mathcal{M}$ be a compact subset of $\mathbb{R}^d$ with diameter $\mathrm{diam}(\mathcal{M})$. Then, for the mapping $z$ defined above, we have

$\Pr\left[\sup_{x, y \in \mathcal{M}} |z(x)^\top z(y) - k(x, y)| \geq \epsilon\right] \leq 2^8 \left(\frac{\sigma_p\, \mathrm{diam}(\mathcal{M})}{\epsilon}\right)^2 \exp\left(-\frac{D \epsilon^2}{4(d+2)}\right),$    (1)

where $\sigma_p^2 = \mathbb{E}_p[w^\top w]$ is the second moment of the Fourier transform of $k$. Further, $\sup_{x, y \in \mathcal{M}} |z(x)^\top z(y) - k(x, y)| \leq \epsilon$ with any constant probability when $D = \Omega\left(\frac{d}{\epsilon^2} \log \frac{\sigma_p\, \mathrm{diam}(\mathcal{M})}{\epsilon}\right)$.

The proposition from [7] for the construction of $z$ and $p$ was to sample a set of vectors $w_1, \dots, w_D$, of the same dimension as the data at hand, as a basis for the new features. The vectors in the training data are compared with the random set, and new features are generated as inner products $w_i^\top x$ with the corresponding vectors. Then all features are transformed separately via a one-dimensional non-linear (cosine, sine) function,

$z(x) = \frac{1}{\sqrt{D}} \left[\cos(w_1^\top x), \dots, \cos(w_D^\top x), \sin(w_1^\top x), \dots, \sin(w_D^\top x)\right].$    (2)

The new data set is passed on to a linear classifier for training. New test data is transformed in the same way: the inner products are computed using the same sample of random vectors and transformed with the same non-linear function. Then the previously trained linear classifier is applied.

The scheme can be seen as a typical classifier with a pre-processing phase. In this picture we have the following steps:

  • Initialization.

  • Pre-processing of the training data.

  • Classifier training with the processed data.

  • Pre-processing of the test data.

  • Applying the classifier on the processed test data.

To implement the scheme we need to specify the initialization and pre-processing steps.

Figure 1: Original random features generation scheme. The crucial part, which decides what kernel is effectively implemented, is the random vector sampling, i.e. the initialization.

We split the pre-processing into feature generation and a non-linear map, and later in the paper we will focus mostly on the former. We operate on data points of dimension $d$ and create $D$ new features, where $D$ does not have to be equal to $d$.

  • Initialization: Sample $w_1, \dots, w_D$ from a distribution given by hyper-parameters.

  • Pre-processing: Transform any given training or test vector $x$ into $z(x)$.

In the remaining part we will show how to implement these steps using quantum circuits for vector sampling.
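Before turning to the quantum implementation, here is a minimal classical sketch of the pipeline for reference. It is illustrative only: the Gaussian sampler and all parameter values below stand in for the quantum sampling developed next, and names such as init_features and preprocess are ours, not from [7].

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def init_features(d, D, sigma=1.0):
    """Initialization: sample D random vectors w_1, ..., w_D (here Gaussian)."""
    return rng.normal(0.0, sigma, size=(D, d))

def preprocess(X, W):
    """Pre-processing: map each row x of X to z(x) = [cos(Wx), sin(Wx)] / sqrt(D)."""
    P = X @ W.T                                    # inner products w_i^T x
    return np.hstack([np.cos(P), np.sin(P)]) / np.sqrt(W.shape[0])

# Toy data: two shifted Gaussian blobs with 8 input features.
X = np.vstack([rng.normal(-1, 1, (100, 8)), rng.normal(1, 1, (100, 8))])
y = np.array([0] * 100 + [1] * 100)

W = init_features(d=8, D=256)                      # initialization
clf = LinearSVC().fit(preprocess(X, W), y)         # linear classifier on z(x)
print(clf.score(preprocess(X, W), y))              # training accuracy
```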

2 Features Generation from Quantum Circuits

For our purposes it should be sufficient to know that any program that can be run on an $n$-qubit quantum computing device can usually be described by a unitary operator $U$ of dimension $2^n$. In this notation, for an input vector $|x\rangle$, the output is simply the result of the multiplication

$|y\rangle = U |x\rangle.$    (3)

In this context the considered vectors are normalized and called states.

Figure 2: Quantum part of the scheme in the most general picture. We link a vector $w$ with a quantum operation $U$, such that the result of the operation for any input vector $x$ provides information about the inner product $\langle w | x \rangle$.

We aim at linking random vectors to quantum circuits in a way that allows us to compute the inner product. Such a scheme is compatible with the scheme of the previous section. The quantum part in the most general picture is presented in Fig. 2.

In particular, we fix a quantum operation $U$ and denote its first row (Hermitian conjugated) as

$\langle w| = \langle 0 | U,$    (4)

using the basis vector $|0\rangle$. For a given $x$ we want to compute the inner product $\langle w | x \rangle$. Let us note that we have

$\langle w | x \rangle = \langle 0 | U | x \rangle$    (5)

when $|w\rangle = U^\dagger |0\rangle$. We can thus obtain the inner product of $x$ with some $w$ on a quantum computer by injecting $|x\rangle$ as the input state and reading the first amplitude of the output state,

$(U |x\rangle)_0 = \langle 0 | U | x \rangle = \langle w | x \rangle.$    (6)

Let us note that reading the exact value requires so-called state tomography [9] and can be done with arbitrary precision with arbitrarily high probability, but always approximately. In practice, estimating this value is a Bernoulli trial.
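To make the statistical nature of the readout concrete, here is a toy sketch (our own illustration, not the paper's procedure) of estimating an outcome probability from a finite number of measurement shots; any interferometric circuitry needed to access the real part of an amplitude is abstracted away.

```python
import numpy as np

rng = np.random.default_rng(1)

def estimate_probability(p_true, shots):
    """Estimate an outcome probability from `shots` Bernoulli trials.

    The standard error of the estimate scales as 1/sqrt(shots), so the
    value is obtained with arbitrary precision only in the limit."""
    return rng.binomial(shots, p_true) / shots

p = 0.3  # illustrative |<0|U|x>|^2
print(estimate_probability(p, shots=100))     # coarse estimate
print(estimate_probability(p, shots=10_000))  # sharper estimate
```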

Based on the above considerations, the randomized feature map is constructed as follows. Given some quantum operation construction procedure, we define a probability distribution on a set of unitary operators. This set will be described in detail later as circuit Ansätze. The parameters of this distribution are hyper-parameters of the whole classification scheme. From the fixed distribution we sample a set of unitary operators $U_1, \dots, U_D$, and in consequence vectors $w_1, \dots, w_D$. Each vector is a row of a unitary operator and thus normalized. To obtain vectors of variable length we sample lengths $c_1, \dots, c_D$. The resulting set of vectors is $\{v_i\}_{i=1}^D$, where $v_i = c_i w_i$ for $i = 1, \dots, D$. Let us note that the length can be correlated with the sampled vector. For example, we could design a larger circuit, with additional qubits, that after the measurement would indicate the length.

The mapping is performed as follows. In order to obtain the features for a data point $x$ we

  • map the data point into a normalized vector $|x\rangle$ and store the norm,

  • apply circuit $U_i$ to state $|x\rangle$, computing $U_i |x\rangle$,

  • estimate the real part of the first amplitude, obtaining $\mathrm{Re}\,\langle w_i | x \rangle$,

  • scale the values, obtaining $c_i\, \mathrm{Re}\,\langle w_i | x \rangle$,

  • apply a non-linear map cos/sin, obtaining $\cos(c_i\, \mathrm{Re}\,\langle w_i | x \rangle)$, $\sin(c_i\, \mathrm{Re}\,\langle w_i | x \rangle)$,

  • return the concatenation $z(x)$.

Effectively $Z$ is a concatenation of $\cos(WX)$ and $\sin(WX)$, where cos/sin are element-wise matrix operations, the rows of $W$ are the random vectors $v_i$, and the columns of $X$ are the data points. The steps are sketched in Fig. 3.

Figure 3: Random features calculation with quantum circuits. For an input feature vector $x$, a feature is constructed using a quantum circuit unitary operation $U_i$ with weight $c_i$. The procedure is repeated for all random circuits.
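A dense-matrix simulation of these steps might look as follows. This is a sketch under our own assumptions: the Haar-random unitary stands in for the circuit Ansatz of Section 3, and the amplitude is read off the state vector instead of being estimated from measurements.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_unitary(dim):
    """Sample a Haar-distributed random unitary via a QR decomposition."""
    A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    Q, R = np.linalg.qr(A)
    return Q * (np.diagonal(R) / np.abs(np.diagonal(R)))

def quantum_feature(x, U, c):
    """One pair of features: cos/sin of the scaled real part of <0|U|x>."""
    ket_x = x / np.linalg.norm(x)   # encode the data point as a normalized state
    amp = (U @ ket_x)[0]            # amplitude <0|U|x> of the output state
    s = c * amp.real                # scale by the sampled length c
    return np.cos(s), np.sin(s)

x = rng.normal(size=2 ** 3)         # 3-qubit example, dimension 8
print(quantum_feature(x, random_unitary(8), c=2.0))
```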

3 Example

3.1 Quantum Circuits

In the previous section we stated that we use quantum operations to generate the features. In Section 2 we only mentioned that any operation corresponds to a unitary operator. However, we will use a much more practical way of defining quantum operations: quantum circuits. This is a computational model inspired by classical logic circuits. The common idea is to describe a complex global operation with a sequence of simple and small basic operations. The most often recommended introduction can be found in [10].

We will use a set of basic operations represented by two parameterized unitary operations and build circuits by multiplying these matrices. The operations correspond to one-qubit rotations and the two-qubit entangling gate CNOT. The matrix representations of the two are

$R_y(\beta) = e^{-i \beta \sigma_y / 2} = \begin{pmatrix} \cos(\beta/2) & -\sin(\beta/2) \\ \sin(\beta/2) & \cos(\beta/2) \end{pmatrix}, \quad \mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix},$    (7)

where $\sigma_y$ is a Pauli matrix.

The end operation acts on a bigger space than $R_y$ or CNOT. We will assume that all of the gates are extended to the same space with the tensor product operation, so we only specify on which subspace an operator acts. In the case of $R_y$ the operator acts on a subspace corresponding to one qubit, and we will use

$R_y^{(i)}(\beta) = \mathbb{1}_{2^{i-1}} \otimes R_y(\beta) \otimes \mathbb{1}_{2^{n-i}}$    (8)

to mark that it acts on the $i$-th out of $n$ qubits, where $\mathbb{1}_m$ is an $m$-dimensional identity matrix and $\otimes$ is the tensor product operation (also tensordot in e.g. numpy). In the case of CNOT the operator acts on a product of two subspaces, corresponding to the so-called target and control qubits. We will mark it with $\mathrm{CNOT}^{(c,t)}$.
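A numpy sketch of these extensions (our illustration; qubit indices run from 1 to $n$ as in Eq. (8), and the CNOT is assembled from projectors on the control qubit, one standard construction):

```python
import numpy as np

I2 = np.eye(2)

def Ry(beta):
    """One-qubit rotation from Eq. (7)."""
    c, s = np.cos(beta / 2), np.sin(beta / 2)
    return np.array([[c, -s], [s, c]])

def embed_1q(gate, i, n):
    """Extend a one-qubit gate to n qubits, acting on qubit i (Eq. (8))."""
    op = np.eye(1)
    for q in range(1, n + 1):
        op = np.kron(op, gate if q == i else I2)
    return op

def cnot(c, t, n):
    """CNOT^(c,t) extended to n qubits: identity when the control is |0>,
    X on the target when the control is |1>."""
    P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
    X = np.array([[0.0, 1.0], [1.0, 0.0]])
    flip = np.eye(1)
    for q in range(1, n + 1):
        flip = np.kron(flip, P1 if q == c else (X if q == t else I2))
    return embed_1q(P0, c, n) + flip

# Quick check: a 3-qubit operator built this way is unitary (and real here).
U = cnot(1, 3, 3) @ embed_1q(Ry(0.7), 2, 3)
print(np.allclose(U @ U.T, np.eye(8)))
```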

3.2 Random Vectors Circuit Ansätze

For this example we chose an Ansatz that generates a broad family of quantum circuits with few hyper-parameters that have an intuitive interpretation. The parameters that need to be fixed are the number of layers $l$, the parameters of the normal distribution used for rotations, $\mu$ and $\sigma^2$, and the variance of the vector lengths, $\sigma_c^2$.

For $n$ qubits the circuit is created as follows. First a rotation gate is added on each of the qubits, $n$ gates in total,

$L_0 = \prod_{i=1}^{n} R_y^{(i)}(\beta_{0,i}).$    (9)

Then for each layer $j = 1, \dots, l$ we repeat: sample control and target qubits $c_j, t_j$ for a CNOT gate and then add a rotation on both target and control qubits,

$L_j = R_y^{(c_j)}(\beta_{j,c})\, R_y^{(t_j)}(\beta_{j,t})\, \mathrm{CNOT}^{(c_j, t_j)}.$    (10)

The resulting operator is composed as

$U = L_l \cdots L_1 L_0.$    (11)

An example is presented in Figure 4. The rotation angles are sampled from the distribution described by the hyper-parameters, $\beta \sim \mathcal{N}(\mu, \sigma^2)$. We use a Gaussian distribution with fixed mean and variance. An additional hyper-parameter is the variance $\sigma_c^2$, which affects the weights of the vectors. For each circuit $U_i$ we store a sampled length $c_i$ as the weight corresponding to that circuit, so that we effectively consider the vector $c_i w_i$.

[Circuit diagram: three qubits, each initialized with a rotation $R_y(\beta_{0,i})$, followed by three randomly placed CNOTs, each accompanied by rotations $R_y(\beta_{j,c})$, $R_y(\beta_{j,t})$ on its control and target qubits, and final measurements.]

Figure 4: Circuit Ansatz example. The circuit always begins with a rotation on each of the qubits. Then a number of CNOTs is placed at random qubits, each followed by rotations on both the target ($t$) and control ($c$) qubits. All of the rotation parameters are sampled from the same distribution described by the hyper-parameters. Here $n = 3$ and $l = 3$.
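Putting the pieces together, here is a sketch of sampling one circuit from this Ansatz following Eqs. (9)-(11). It repeats the helper definitions from the Section 3.1 sketch, and sampling the CNOT endpoints without replacement is our reading of "sample control and target qubits".

```python
import numpy as np

rng = np.random.default_rng(3)
I2 = np.eye(2)

def Ry(beta):
    c, s = np.cos(beta / 2), np.sin(beta / 2)
    return np.array([[c, -s], [s, c]])

def embed_1q(gate, i, n):
    op = np.eye(1)
    for q in range(1, n + 1):
        op = np.kron(op, gate if q == i else I2)
    return op

def cnot(c, t, n):
    P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
    X = np.array([[0.0, 1.0], [1.0, 0.0]])
    flip = np.eye(1)
    for q in range(1, n + 1):
        flip = np.kron(flip, P1 if q == c else (X if q == t else I2))
    return embed_1q(P0, c, n) + flip

def sample_ansatz(n, layers, mu, sigma):
    """Sample a circuit: Ry on every qubit (Eq. (9)), then `layers` blocks of a
    random CNOT followed by rotations on its control and target qubits
    (Eq. (10)), all composed into a single operator (Eq. (11))."""
    U = np.eye(2 ** n)
    for i in range(1, n + 1):
        U = embed_1q(Ry(rng.normal(mu, sigma)), i, n) @ U
    for _ in range(layers):
        c, t = rng.choice(np.arange(1, n + 1), size=2, replace=False)
        U = cnot(c, t, n) @ U
        U = embed_1q(Ry(rng.normal(mu, sigma)), c, n) @ U
        U = embed_1q(Ry(rng.normal(mu, sigma)), t, n) @ U
    return U

U = sample_ansatz(n=3, layers=6, mu=0.0, sigma=0.5)
print(np.allclose(U @ U.T, np.eye(8)))   # sanity check: the operator is unitary
```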

3.3 Setting

In this work we perform basic accuracy-measuring experiments. As the testing dataset we consider the MNIST dataset [11], as in [12]. We aim at beating the SVM with a radial basis function kernel. We explore the space of hyper-parameters and the relation between the score and the number of random quantum circuits used.

The whole experimental algorithm is based on the randomized feature maps scheme presented in Section 1. We first describe the details of the preprocessing and classification used, and then report the obtained results.

The MNIST dataset contains 70000 images corresponding to the digits 0-9. We extract only two of the digits: 3 and 5. There are 13454 data points of these two classes. For measuring the accuracy we use a single training-test pair with size proportion 6:1. The sizes of the sets are 11532 and 1922, respectively.

Before feeding the algorithm with data we perform simple feature selection. We plan to use 7-qubit circuits that operate on vectors of dimension $2^7 = 128$. Thus we select the 128 best features according to a statistical test, looking for multimodal distributions.

For feature selection we use the SelectKBest method and we perform classification with the LinearSVC method from sklearn [13].
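A sketch of this setup in sklearn (the score function is unspecified in the text, so the default f_classif below is our assumption, and the data here is a random stand-in for the selected MNIST digits):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import LinearSVC

rng = np.random.default_rng(4)

# Stand-in data shaped like flattened 28x28 MNIST digits with binary labels.
X = rng.normal(size=(600, 784))
y = rng.integers(0, 2, size=600)

# Select the 128 best pixels, matching the 2**7 dimension of 7-qubit circuits.
selector = SelectKBest(score_func=f_classif, k=128).fit(X, y)
X_sel = selector.transform(X)

# ... in the full scheme X_sel would now be mapped to random features z(x) ...

clf = LinearSVC().fit(X_sel, y)   # final linear classifier
print(clf.score(X_sel, y))
```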

3.4 Scores

In the presented example we analyse the accuracy of the resulting classification scheme. We compare the results to those obtainable with linear and non-linear methods. The results depend on the hyper-parameter selection, so we show the results obtained for a range of values.

The accuracy considered here is the fraction of correct answers in a binary classification scheme. For comparison we take permutation-invariant methods, without any optimisation towards image processing. The two main reference points are Linear SVC and SVM with a radial basis function kernel. The scores obtainable with these methods are roughly 96% and 99%, respectively.

One particularly important hyper-parameter is the selected circuit size. This is the one connected to the complexity of the quantum part of the scheme: a larger circuit size increases both the simulation cost and the quantum device running time. In the case of the selected Ansatz we can select the number of CNOT gates freely. In this example we consider numbers of CNOT gates that are multiples of the number of qubits.

Our hyper-parameter selection was done with grid search for a small number of random vectors. The best score was equal to 0.9909, although the average score was highest for the largest tested number of random vectors. This result is better than what we achieved using SVM with a radial basis function kernel. The best results were obtained with the number of circuit layers set to twice the number of qubits.

Method                             Accuracy
Logistic regression                .959
SVM+RBF
Quantum Generative Model Kernel    .9909

Table 1: The best scores obtained with the considered methods. As the reference methods we consider logistic regression and a support vector machine with a radial basis function kernel (SVM+RBF). The results of the reference methods come from [12].

The histograms of the scores are presented in Fig. 5 and Fig. 6. The relation between best score and the number of random vectors is presented in Fig. 7.

Figure 5: Score histograms for two values of the number of random vectors (top and bottom). The scores aggregate various hyper-parameters.
Figure 6: Score histograms for number of random vectors equal to 8000 (top), 16000 (bottom). The scores aggregate various hyper-parameters.
Figure 7: Scores obtained for numbers of random vectors equal to 500, 1000, 2000, 4000, 8000 and 16000. Whiskers reflect the min/max values. The highest obtained value is .9909.

4 Discussion

There are three key points that we want to discuss, concerning simulation complexity, applicability to quantum data and the obtained scores.

Firstly, the proposed method provides a link between quantum circuit Ansätze and machine learning kernels. Any family of quantum circuits gives a new kernel. For small quantum circuits this gives a quantum-inspired kernel creation method. For large circuits, which we can expect to yield distributions that are hard to simulate, we obtain kernels that can be considered non-classical. However, we want to stress that the fact that simulating the circuits is time consuming does not mean that sampling from the resulting distribution is as well. Many families of random circuits are known to converge with length to easily sampled distributions, in particular 2-designs [14].

Secondly, the method works with data encoded in quantum states. Thus, it is compatible with input data that is intrinsically of quantum nature. This may be an important feature if quantum simulation on quantum devices becomes common and methods that handle the output data directly become desirable. In particular, the kernel can be joined with methods that operate on states with a known preparation scheme but without the need to obtain the representation in the computational basis, as in VQE [15]. Also, these are the most probable scenarios to require circuit sizes that cannot be simulated classically in reasonable time.

Lastly, the exemplary Ansatz seems competitive with established classical methods. The comparison is far from a conclusive argument for supremacy over classical methods, but it supports an optimistic view. The selected problem is best handled by methods that harness spatial relations in the image [16]. For a fair comparison these relations should be included.

The generative model that we chose in this work is only an example. Apart from the circuit model there are other quantum computational models, and a natural direction for future work would be to look at other possibilities. Another computationally universal model that can generate probability distributions with specific features is the quantum walk [17]. One could also turn to a general description of a quantum system given by the Schrödinger/Lindblad equation [18]. These models could yield different probability distributions, and thus be the source of different kernels.

References

  • [1] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd. Quantum machine learning. Nature, 549(7671):195, 2017.
  • [2] Carlo Ciliberto, Mark Herbster, Alessandro Davide Ialongo, Massimiliano Pontil, Andrea Rocchetto, Simone Severini, and Leonard Wossnig. Quantum machine learning: a classical perspective. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 474(2209):20170551, 2018.
  • [3] John Preskill. Quantum computing in the NISQ era and beyond. Quantum, 2:79, 2018.
  • [4] Scott Aaronson and Lijie Chen. Complexity-theoretic foundations of quantum supremacy experiments. arXiv preprint arXiv:1612.05903, 2016.
  • [5] Sergio Boixo, Sergei V Isakov, Vadim N Smelyanskiy, Ryan Babbush, Nan Ding, Zhang Jiang, Michael J Bremner, John M Martinis, and Hartmut Neven. Characterizing quantum supremacy in near-term devices. Nature Physics, 14(6):595, 2018.
  • [6] Jonathan Romero and Alan Aspuru-Guzik. Variational quantum generators: Generative adversarial quantum machine learning for continuous distributions. arXiv preprint arXiv:1901.00848, 2019.
  • [7] Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In Advances in neural information processing systems, pages 1177–1184, 2008.
  • [8] Ali Rahimi and Benjamin Recht. Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning. In Advances in neural information processing systems, pages 1313–1320, 2009.
  • [9] RT Thew, Kae Nemoto, Andrew G White, and William J Munro. Qudit quantum-state tomography. Physical Review A, 66(1):012303, 2002.
  • [10] Michael A Nielsen and Isaac L Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2010.
  • [11] Mohd Razif Shamsuddin, Shuzlina Abdul-Rahman, and Azlinah Mohamed. Exploratory analysis of MNIST handwritten digit for machine learning modelling. In International Conference on Soft Computing in Data Science, pages 134–145. Springer, 2018.
  • [12] Christopher Wilson, Johannes Otterbach, Nikolas Tezak, Robert Smith, Peter Karalekas, Anthony Polloreno, Sohaib Alam, Gavin Crooks, and Marcus Da Silva. Quantum kitchen sinks: An algorithm for machine learning on near-term quantum computers. Bulletin of the American Physical Society, 2019.
  • [13] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
  • [14] Aram W Harrow and Richard A Low. Random quantum circuits are approximate 2-designs. Communications in Mathematical Physics, 291(1):257–302, 2009.
  • [15] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J Love, Alán Aspuru-Guzik, and Jeremy L O'Brien. A variational eigenvalue solver on a photonic quantum processor. Nature Communications, 5:4213, 2014.
  • [16] http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html#4d4e495354.
  • [17] Przemysław Sadowski, Jarosław Adam Miszczak, and Mateusz Ostaszewski. Lively quantum walks on cycles. Journal of Physics A: Mathematical and Theoretical, 49(37):375302, 2016.
  • [18] Łukasz Pawela and Przemysław Sadowski. Various methods of optimizing control pulses for quantum systems with decoherence. Quantum Information Processing, 15(5):1937–1953, 2016.