Large-scale quantum machine learning

08/02/2021
by   Tobias Haug, et al.
Imperial College London

Quantum computers promise to enhance machine learning for practical applications. Quantum machine learning for real-world data has to handle extensive amounts of high-dimensional data. However, conventional methods for measuring quantum kernels are impractical for large datasets as they scale with the square of the dataset size. Here, we measure quantum kernels using randomized measurements to gain a quadratic speedup in computation time and quickly process large datasets. Further, we efficiently encode high-dimensional data into quantum computers with the number of features scaling linearly with the circuit depth. The encoding is characterized by the quantum Fisher information metric and is related to the radial basis function kernel. We demonstrate the advantages of our methods by classifying images with the IBM quantum computer. To achieve further speedups we distribute the quantum computational tasks between different quantum computers. Our approach is exceptionally robust to noise via a complementary error mitigation scheme. Using currently available quantum computers, the MNIST database can be processed within 220 hours instead of 10 years, which opens up industrial applications of quantum machine learning.


I Support vector machine

Figure 1: a) Supervised learning to classify images of handwritten digits. By learning from a training set of labeled images, our goal is to correctly identify previously unseen test data. The support vector machine (SVM) learns using a kernel (Eq. (3)), which is a measure of distance between the data. b) The pixels of each image are converted to D-dimensional feature vectors x that are encoded as parameters of a parameterized quantum circuit (PQC). We use hardware-efficient PQCs of N qubits constructed from layers of parameterized single-qubit rotations and two-qubit entangling gates that can be efficiently implemented on noisy quantum computers. Each index of the feature vector is mapped to a parameter of the single-qubit rotations via Eq. (2), shifted by a fixed reference parameter. The number of encoded features scales linearly with the number of qubits and the circuit depth. The kernel (Eq. (3)) is characterized by the quantum Fisher information metric (QFIM) and can be approximately described by the radial basis function kernel (Eq. (5)). We calculate the quantum kernel by measuring the PQC in randomized local bases of Haar-random unitaries. The quantum computation time scales linearly with the size of the dataset. c) The SVM trained with the quantum kernel draws the decision boundaries (here shown for a two-dimensional feature vector space and three possible digits) that classify each feature vector to its corresponding label.

Our goal is to classify unlabeled test data by learning from labeled training data, as shown in Fig. 1a. The dataset for the supervised learning task contains N_D entries in total. The i-th data item is described by a D-dimensional feature vector x_i with label y_i, which belongs to one of several possible classes. To learn and classify data, we use a kernel K(x, x') that is a measure of distance between feature vectors x and x' [2]. The kernel corresponds to an embedding of the D-dimensional data into a higher-dimensional space, where analysis of the data becomes easier [28]. In quantum kernel learning, we embed the data into the high-dimensional Hilbert space of the quantum computer and use it to calculate the kernel (see Fig. 1b). With the kernels, we train a support vector machine (SVM) to find hyperplanes that separate two classes of data (see Fig. 1c). The SVM is optimized using the kernels of the training dataset with a semidefinite program that can be efficiently solved with classical [29] or quantum computers [30, 31].

$$\max_{\alpha}\; \sum_i \alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j K(x_i, x_j) \qquad (1)$$

subject to the conditions $\sum_i \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$. After finding the optimal weights α_i, the SVM predicts the class of a feature vector x as $y(x) = \mathrm{sign}\big(\sum_i \alpha_i y_i K(x_i, x) + b\big)$, where the bias b is calculated from the weights. One can extend this approach to distinguish multiple classes by solving one SVM per class that separates that class from all other classes.
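As a concrete illustration, the dual problem of Eq. (1) can be solved with Scikit-learn (which the paper also uses for the SVM) by passing a precomputed kernel matrix, exactly as one would pass a kernel measured on a quantum computer. The toy blob data, kernel width and regularization constant C below are illustrative choices, not values from the paper:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy dataset: two well-separated Gaussian blobs in a 2D feature space.
X = np.concatenate([rng.normal(-1.0, 0.3, (40, 2)), rng.normal(1.0, 0.3, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
perm = rng.permutation(80)
train, test = perm[:60], perm[60:]

def rbf_kernel(A, B, gamma=1.0):
    # Classical stand-in for a measured quantum kernel matrix K(x, x').
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# SVC with kernel="precomputed" optimizes the dual of Eq. (1) given the
# kernel matrix directly, e.g. one estimated on a quantum computer.
svm = SVC(kernel="precomputed", C=1.0)
svm.fit(rbf_kernel(X[train], X[train]), y[train])
acc = svm.score(rbf_kernel(X[test], X[train]), y[test])
print(acc)
```

The same interface accepts any symmetric positive-semidefinite matrix, so swapping the classical kernel for an experimentally measured one changes only the matrix that is passed in.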

The power of the SVM depends strongly on a good choice of kernel K(x, x') that captures the essential features of the dataset. In the following, we propose a powerful class of quantum kernels that can be implemented with currently available quantum computers. Then, we show how to compute kernels for large datasets and mitigate the noise inherent in real quantum devices.

II Encoding

A crucial question is how to efficiently encode a high-dimensional feature vector into a quantum computer while providing a useful kernel for machine learning. We encode the D-dimensional feature vector x as the D-dimensional parameter vector θ(x) of a PQC via

$$\theta(x) = c\,x + \theta_r, \qquad (2)$$

where c is a scaling constant and θ_r the reference parameter. As shown in Fig. 1b, we use hardware-efficient PQCs with N qubits and layers of unitaries for the encoding [32]. Each layer is composed of a product of parameterized single-qubit rotations and non-parameterized two-qubit entangling gates that generate the quantum state |ψ(θ)⟩.

Our choice of quantum kernel measures the distance between two encoding states ρ(x) and ρ(x') as given by the fidelity between them [25, 7],

$$K(x, x') = \mathrm{Tr}[\rho(x)\,\rho(x')], \qquad (3)$$

which for pure states reduces to $K(x, x') = |\langle\psi(\theta(x))|\psi(\theta(x'))\rangle|^2$.
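For pure states, Eq. (3) can be checked with a few lines of linear algebra. The product-state Ry encoding below is a simplified, hypothetical stand-in for the hardware-efficient PQCs of the paper:

```python
import numpy as np

def encode(x, c=1.0):
    # Hypothetical product-state encoding: qubit j prepared as Ry(c * x_j)|0>.
    state = np.array([1.0 + 0j])
    for xj in x:
        qubit = np.array([np.cos(c * xj / 2), np.sin(c * xj / 2)], dtype=complex)
        state = np.kron(state, qubit)
    return state

def kernel(x1, x2, c=1.0):
    # Eq. (3) for pure states: K = |<psi(x1)|psi(x2)>|^2.
    return abs(np.vdot(encode(x1, c), encode(x2, c))) ** 2

x = np.array([0.3, -0.7, 1.1])
print(kernel(x, x))        # identical inputs give kernel value 1
print(kernel(x, x + 0.5))  # displaced inputs give a value below 1
```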

We can formalize the expressive power of our encoding with the QFIM F, a D × D positive-semidefinite matrix that provides information about the kernel in the proximity of the reference parameter θ_r [33]. For a pure state it is given by $F_{ij} = 4\,\mathrm{Re}\big(\langle\partial_i\psi|\partial_j\psi\rangle - \langle\partial_i\psi|\psi\rangle\langle\psi|\partial_j\psi\rangle\big)$, where ∂_i denotes the gradient with respect to the i-th element of θ [34]. In the limit of a small scaling constant c in the encoding Eq. (2), the kernel of a pure quantum state can be written as

$$K(x, x') \approx \exp\Big(-\frac{c^2}{4}\sum_j \lambda_j \big[v_j^\top (x - x')\big]^2\Big), \qquad (4)$$

where λ_j is the j-th eigenvalue of the QFIM F and $v_j^\top(x - x')$ is the inner product of the feature vector difference and the j-th eigenvector v_j of F. The rank of F (the number of non-zero eigenvalues) is an important measure of the properties of the PQC and the encoding [33]. The eigenvectors v_j with eigenvalue λ_j = 0 have no effect on the kernel. Thus, feature vectors that differ only within the space of eigenvectors with eigenvalue zero cannot be distinguished using the kernel, as they yield the same kernel value K = 1. Further, the size of the eigenvalues determines how strongly the kernel changes in each direction of the feature space. By appropriately designing the QFIM as the weight matrix of the kernel, generalizing from data could be greatly enhanced [33, 25, 35]. For example, the feature subspace with eigenvalue 0 could be engineered such that it coincides with data that belongs to a particular class. Conversely, features that strongly differ between different classes could be tailored to have large eigenvalues such that they can be easily distinguished [35]. For a PQC with N qubits the rank is upper bounded by $2^{N+1} - 2$, which is the maximal number of features that can be reliably distinguished by the kernel [33].
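The QFIM definition above can be evaluated numerically by finite differences. The sketch below uses a hypothetical product-state Ry encoding, for which the QFIM comes out as the identity matrix:

```python
import numpy as np

def state(theta):
    # Product state: qubit j rotated by Ry(theta_j) from |0>.
    psi = np.array([1.0 + 0j])
    for t in theta:
        psi = np.kron(psi, np.array([np.cos(t / 2), np.sin(t / 2)], dtype=complex))
    return psi

def qfim(theta, eps=1e-5):
    # F_ij = 4 Re(<d_i psi|d_j psi> - <d_i psi|psi><psi|d_j psi>),
    # with derivatives taken by central finite differences.
    psi = state(theta)
    grads = []
    for i in range(len(theta)):
        tp, tm = theta.copy(), theta.copy()
        tp[i] += eps
        tm[i] -= eps
        grads.append((state(tp) - state(tm)) / (2 * eps))
    n = len(theta)
    F = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            F[i, j] = 4 * np.real(
                np.vdot(grads[i], grads[j])
                - np.vdot(grads[i], psi) * np.vdot(psi, grads[j])
            )
    return F

theta = np.array([0.4, 1.0, -0.3])
F = qfim(theta)
print(np.round(F, 6))  # for this encoding the QFIM is the identity matrix
```

For hardware-efficient PQCs with entangling gates the same routine applies, but the resulting QFIM is generally non-trivial.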


Figure 2: a) Simulated kernel as a function of the distance in feature space weighted with the QFIM F. The feature vectors x, x' are randomly sampled and encoded with Eq. (2) into the PQC. We show two types of hardware-efficient PQCs, namely the YZ-CX PQC and the NPQC (see Appendix A). The shaded area is the standard deviation of the kernel. The quantum kernels are well approximated by radial basis function kernels (rbf, dashed line, Eq. (5)) until reaching very small values (dash-dotted lines). We average over 50 random feature vectors. b) Experimental kernel as a function of the distance between the feature vectors. We encode randomly chosen feature vectors in the NPQC, whose QFIM is the identity matrix. The quantum kernel generated by theory (blue dots) and via experimental results with the IBM quantum computer (orange crosses) follows approximately the isotropic radial basis function kernel (black line). The shaded area is the standard deviation of the kernel. Experimental results from ibmq_guadalupe were obtained with randomized measurement settings, measurement samples and error mitigation with Eq. (7).

It has been recently shown that the kernel of pure quantum states of hardware-efficient PQCs can be approximated by Gaussian or radial basis function kernels [36], which are among the most popular non-linear kernels, with wide application in various machine learning methods [37]. Specifically, for a small enough scaling constant c in the encoding Eq. (2), we can approximately describe the quantum kernel as

$$K(x, x') \approx \exp\Big(-\frac{c^2}{4}\,(x - x')^\top F\,(x - x')\Big), \qquad (5)$$

which is the radial basis function kernel with the QFIM F as weight matrix [36]. While for general PQCs the QFIM is a priori not known, a type of PQC called the NPQC has the special property that the QFIM takes a simple form, F(θ_r) = I, where I is the identity matrix and θ_r a particular reference parameter (see [38] and Appendix A). The NPQC forms an approximate isotropic radial basis function kernel that can serve as a well-characterised basis for quantum machine learning. We also study another commonly used type of hardware-efficient circuit (YZ-CX PQC), composed of single-qubit rotations and CNOT gates arranged in a one-dimensional nearest-neighbor chain, with a non-trivial QFIM. Further details on the NPQC and the YZ-CX PQC are given in Appendix A.
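The quality of the radial basis function approximation in Eq. (5) is easy to verify numerically. For the product-state Ry encoding sketched here (a simplified stand-in whose QFIM is the identity), the exact kernel and the Gaussian approximation agree closely for small displacements:

```python
import numpy as np

rng = np.random.default_rng(1)

def fidelity_kernel(t1, t2):
    # Exact kernel of the product Ry encoding: prod_j cos^2((t1_j - t2_j)/2).
    return float(np.prod(np.cos((t1 - t2) / 2) ** 2))

t1 = rng.normal(0.0, 1.0, 8)
t2 = t1 + rng.normal(0.0, 0.05, 8)  # small displacement, where Eq. (5) holds

exact = fidelity_kernel(t1, t2)
# Eq. (5) with c = 1 and QFIM F equal to the identity for this encoding:
rbf = float(np.exp(-np.sum((t1 - t2) ** 2) / 4))
print(exact, rbf)
```

For large displacements the two curves separate, mirroring the breakdown of the approximation at very small kernel values seen in Fig. 2a.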

III Measurement

We calculate the quantum kernels using randomized measurements [39, 40, 41] by measuring quantum states in randomly chosen single-qubit bases. We prepare the quantum state ρ(x) and rotate it into a random basis with the unitary $U = u_1 \otimes \dots \otimes u_N$, where the single-qubit unitaries u_k are chosen according to the Haar measure on the single-qubit unitary group. Then, we measure N_M samples of the rotated state in the computational basis and estimate the probability $P_U^{(x)}(s)$ of measuring the computational basis state s for state ρ(x) and basis U. This procedure is repeated for N_U different measurement bases and all quantum states. The kernel via randomized measurements is then calculated as

$$K(x, x') = 2^N \sum_{s, s'} (-2)^{-D[s, s']}\, \big\langle P_U^{(x)}(s)\, P_U^{(x')}(s') \big\rangle_U, \qquad (6)$$

where D[s, s'] is the Hamming distance that counts the number of bits that differ between the computational basis states s and s', and ⟨·⟩_U denotes the average over the random measurement bases.

The statistical error of estimating the kernel in this way decreases with the number of measurement bases N_U. The number of measurement samples per basis needed for a fixed accuracy scales exponentially with the number of qubits [39, 40]. However, by using importance sampling, the number of measurements can be drastically reduced [42]. The number of measurements needed to determine all entries of the kernel matrix scales linearly with the dataset size, allowing us to quickly process kernels of large datasets. Other commonly used measurement strategies such as the swap test [43, 44] or the inversion test [18] have to explicitly prepare pairs of states ρ(x) and ρ(x') on the quantum computer and thus scale unfavorably with the square of the dataset size (see Appendix B).
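The estimator of Eq. (6) can be tested in a small numpy simulation. The sketch below works in the infinite-shot limit (exact probabilities per basis), so only the fluctuation over the random measurement settings remains; the qubit number and setting count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2  # number of qubits (kept small for this demonstration)

def haar_1q():
    # Haar-random single-qubit unitary from the QR decomposition of a
    # complex Gaussian matrix, with the standard phase correction.
    z = (rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def random_state(dim):
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

psi1, psi2 = random_state(2 ** N), random_state(2 ** N)
exact = abs(np.vdot(psi1, psi2)) ** 2  # the kernel we want to estimate

# Hamming distances D[s, s'] between all pairs of computational basis
# states, and the (-2)^(-D) weights of Eq. (6).
bits = np.array([[(s >> k) & 1 for k in range(N)] for s in range(2 ** N)])
D = (bits[:, None, :] != bits[None, :, :]).sum(-1)
M = (-0.5) ** D  # equals (-2)^(-D)

est, n_settings = 0.0, 2000
for _ in range(n_settings):
    u = haar_1q()
    for _ in range(N - 1):
        u = np.kron(u, haar_1q())
    p1 = np.abs(u @ psi1) ** 2  # infinite-shot basis probabilities
    p2 = np.abs(u @ psi2) ** 2
    est += 2 ** N * p1 @ M @ p2
est /= n_settings
print(round(exact, 3), round(est, 3))
```

Crucially, the same random basis U is applied to both states before averaging, which is what allows the kernel matrix to be assembled from measurements taken one state at a time.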

IV Error mitigation

In general, quantum computers are affected by noise, which turns the prepared pure quantum state into a mixed state and may negatively affect the capability to learn. For depolarizing noise, we can use the information gathered during the randomized measurements to mitigate its effect and infer the noiseless value of the kernel.

For global depolarizing noise, with probability p the pure quantum state is replaced with the completely mixed state $I/2^N$, where I is the identity matrix and N the number of qubits. The resulting quantum state is the density matrix $\rho = (1 - p)|\psi\rangle\langle\psi| + p\, I/2^N$. The purity $\mathrm{Tr}(\rho^2)$ can be determined from the randomized measurements by reusing the same data used to compute the kernel entries. Using these purities, the depolarization probabilities p_1, p_2 of the two encoded states can be calculated by solving a quadratic equation [45]. With the kernel $\tilde{K}$ affected by depolarizing noise, the mitigated kernel is given by

$$K_m = \frac{\tilde{K} - (p_1 + p_2 - p_1 p_2)\, 2^{-N}}{(1 - p_1)(1 - p_2)}, \qquad (7)$$

which simplifies for small p_1, p_2 and large N to $K_m \approx (1 + p_1 + p_2)\,\tilde{K}$.
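A minimal sketch of the mitigation procedure, assuming the global depolarizing model stated above: the purity fixes the depolarization probability through a quadratic equation, which then lets us invert Eq. (7). The kernel value and noise probabilities below are illustrative:

```python
import numpy as np

def purity(p, dim):
    # Purity of rho = (1 - p)|psi><psi| + p * I/dim for a pure |psi>.
    return (1 - p) ** 2 + (2 * p - p ** 2) / dim

def p_from_purity(P, dim):
    # Invert purity(p) = P for p in [0, 1]; quadratic in (1 - p).
    return 1 - np.sqrt((P - 1 / dim) / (1 - 1 / dim))

def mitigate(K_noisy, p1, p2, dim):
    # Eq. (7): undo the effect of global depolarizing noise on the kernel.
    return (K_noisy - (p1 + p2 - p1 * p2) / dim) / ((1 - p1) * (1 - p2))

dim = 2 ** 4        # Hilbert-space dimension for 4 qubits
K_true = 0.63       # noiseless kernel value (illustrative)
p1, p2 = 0.2, 0.35  # depolarization probabilities of the two states

# Kernel of the two depolarized states, Tr(rho1 rho2), computed analytically:
K_noisy = (1 - p1) * (1 - p2) * K_true + (p1 + p2 - p1 * p2) / dim

# Recover p1, p2 from the measured purities, then mitigate:
p1_est = p_from_purity(purity(p1, dim), dim)
p2_est = p_from_purity(purity(p2, dim), dim)
K_m = mitigate(K_noisy, p1_est, p2_est, dim)
print(round(K_m, 6))  # recovers the noiseless kernel value
```

In an experiment the purities and the noisy kernel come from the same randomized-measurement data, so mitigation requires no additional circuits.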

V Results

We now proceed to numerically and experimentally demonstrate our methods. First, we investigate the kernel of our encoding. In Fig. 2a we numerically simulate [46, 47] two types of hardware-efficient PQCs (YZ-CX PQC and NPQC) and show that the quantum kernel is well described by a radial basis function kernel (Eq. (5), dashed line). The kernel diverges from the radial basis function kernel for small values of the kernel and reaches a plateau at $2^{-N}$, which is the average fidelity of Haar-random states [48]. In Fig. 2b, we experimentally measure the kernel of the NPQC with an IBM quantum computer (ibmq_guadalupe [49]) using randomized measurements and error mitigation (Eq. (7)). We find that the mean value of the kernel matches well with the isotropic radial basis function kernel. See Appendix D for details on the experiment and Appendix C for results regarding the YZ-CX PQC.

Next we address the statistical error introduced by estimating the kernel using randomized measurements in the presence of depolarizing noise with probability p. In Fig. 3a we simulate the average error

$$\bar{\epsilon} = \big\langle\, |K_m(x, x') - K(x, x')|\, \big\rangle \qquad (8)$$

of the mitigated kernel K_m measured with randomized measurements with respect to its exact value K, as a function of the number of measurement samples N_M. We find that there is a threshold of samples beyond which the error becomes minimal. We are able to mitigate depolarizing noise to a noise-free level even for high p. In Fig. 3b, we show the minimal number of samples N_M required to measure the kernel with a fixed average error as a function of the depolarization probability p. The randomized measurement scheme works well even with substantial noise, where we find a power-law dependence of N_M on the noise strength.


Figure 3: a) Average error of measuring the kernel with randomized measurements as a function of the number of measurement samples and the global depolarizing probability. Simulation with the YZ-CX PQC. b) Minimal number of measurement samples needed to achieve a fixed average error for varying depolarizing noise. The dashed line is a power-law fit.

Figure 4: Accuracy of correctly classifying previously unseen handwritten digits as a function of the size of the training data. a) SVM trained with the experimental quantum kernel measured on a single quantum computer (ibmq_guadalupe) with randomized measurements, using error mitigation (red, Eq. (7)) and no error mitigation (yellow). The shaded area is the standard deviation of the accuracy. As reference, we show numerical simulations of the isotropic radial basis function kernel (blue), the exact quantum kernel (orange) and a noiseless simulation of randomized measurements (green). b) We distribute the measurements over two different quantum computers (ibmq_guadalupe and ibmq_toronto, purple curve) and post-process the combined measurement results with error mitigation. As reference, we show the accuracy of the quantum kernel measured on a single quantum computer for ibmq_guadalupe (red) and ibmq_toronto (light blue). We encode the data into the YZ-CX PQC and the NPQC. Test and training data are randomly drawn from the full dataset, which is repeated 20 times for each training data size.

Now we assess the overall performance of our approach on a practical task. We learn to classify handwritten 2D images of digits ranging from 0 to 9, where each pixel has an integer value between 0 and 16 [50]. We map each image to a feature vector of its pixel values. For the YZ-CX PQC, we use all pixel features, whereas for the NPQC we perform a principal component analysis to reduce the feature dimension. We calculate the kernel of the full dataset and use a randomly drawn part of it as training data for optimizing the SVM with Scikit-learn [51]. The accuracy of the SVM is defined as the percentage of correctly classified test data, i.e. images that have not been used for training. The dataset is rescaled using the training data such that each feature has mean value zero and fixed variance. We encode the feature vectors via Eq. (2), where for the YZ-CX PQC we choose the reference parameter θ_r randomly and for the NPQC we define θ_r such that the QFIM is the identity matrix (see Appendix A).

In Fig. 4a, we measure the quantum kernel with a single quantum computer. We plot the accuracy of classifying test data with the SVM against the size of the training data for the YZ-CX PQC and the NPQC. As kernels, we compare simulations with the radial basis function kernel (rbf), the exact simulated quantum kernel (exact) and a noiseless simulation of the randomized measurements (noiseless). For experimental data, we use an IBM quantum computer (ibmq_guadalupe [49], see Appendix D for more details) to perform randomized measurements with error mitigation (mitigated) and without error mitigation (unmitigated). The accuracy improves steadily with an increasing amount of training data for all kernels. Our error mitigation scheme (Eq. (7)) substantially improves the accuracy of the SVM trained with experimental data to nearly the level of the noiseless simulation of the randomized measurements. The randomized measurements have a lower accuracy compared to the exact quantum kernel, as we use only a relatively small number of randomized measurement settings. For the NPQC the exact quantum kernel shows nearly the same accuracy as the radial basis function kernel, whereas for the YZ-CX PQC the quantum kernel performs slightly worse, likely indicating that its QFIM does not optimally capture the structure of the data. The depolarizing probability of the IBM quantum computer is estimated separately for the NPQC and the YZ-CX PQC from the measured purities. Measuring the kernel of the full dataset with the inversion test instead of randomized measurements would have required a quantum computation time about a factor of 100 longer.
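A purely classical version of this pipeline can be reproduced with the small handwritten-digit dataset shipped with Scikit-learn (integer pixel values 0 to 16, as described above), using the radial basis function kernel as the baseline against which the quantum kernels are compared; the split size is an illustrative choice:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small handwritten-digit dataset with integer pixel values from 0 to 16.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=200, random_state=0)

# Rescale using only the training data, so each feature has zero mean and
# unit variance, mirroring the preprocessing described above.
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

# Radial basis function kernel SVM: the classical baseline against which
# the quantum kernels are compared in Fig. 4.
svm = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)
acc = svm.score(X_te, y_te)
print(acc)
```

Replacing the built-in rbf kernel with a precomputed, experimentally measured kernel matrix is the only change needed to run the quantum version of this pipeline.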

Finally, in Fig. 4b we distribute the measurements between two quantum computers, such that half of the dataset is measured with ibmq_guadalupe and the other half with ibmq_toronto [49] (see Appendix D for more details). The measurement results are then combined and we apply error mitigation during post-processing. As reference, we also plot the accuracy achieved with a single quantum computer. For the YZ-CX PQC, we find nearly equal accuracy with the distributed and single quantum computer approach. For the NPQC, the accuracy of the distributed approach is slightly lower. The performance depends strongly on the noise and calibration of the IBM quantum computers, which can fluctuate over time and with when an experiment is performed. We attribute the lower performance of the distributed NPQC approach to a higher noise level present on ibmq_toronto while the experiment was performed. As the randomized measurement method correlates measured samples of different states, differences in the respective noise models of the two quantum computers can have a negative effect on the resulting quantum kernel. In Appendix E and F, we show the accuracy on the training data and the confusion matrices.

VI Discussion

We demonstrated machine learning of large datasets with quantum computers using randomized measurements. The machine learning data is encoded into hardware efficient PQCs, where the number of features scales linearly with the depth of the circuit and number of qubits. The kernel is characterized by the QFIM and its eigenvalues and eigenvectors [33]. As the behavior of the kernel is crucial for effectively learning and generalizing data, future work could design the QFIM to improve the capability of quantum machine learning models. We demonstrated the NPQC with a simple and exactly known QFIM, which could be a useful basis to study quantum machine learning on large quantum computers. The relation of our PQCs and radial basis function kernels [36] gives us a strong indication that our encoding is at least as powerful as classical machine learning kernels and could be used to study the power of quantum machine learning [52]. However, we stress that the description as radial basis function kernels is only approximate and fails for very small kernel values, which may hide possible quantum advantages [7].

We mitigate the noise occurring in the quantum computer by using data sampled during the measurements of the kernel. We find that the number of measurement samples needed to mitigate depolarizing noise follows a power law in the noise strength, allowing us to extract kernels even from very noisy quantum computers. We successfully apply this model to mitigate the noise of the IBM quantum computer. While the noise model of quantum computers is known to be complex, the depolarizing noise model is sufficient to mitigate the noise of quantum kernels [45]. We note that noise-induced errors can actually be beneficial to machine learning, as the capability to generalize from data can improve with increasing noise [35].

In general, the number of measurements needed for the randomized measurement scheme scales exponentially with the number of qubits [39, 40]. However, various approaches can mitigate this. Importance sampling can drastically reduce the number of measurements needed [42]. In other settings, adaptive measurements have been proposed to improve the scaling of measurement costs [53], as well as other approaches such as shadow tomography [54]. The choice of an effective set of measurements could be included in the machine learning task as hyper-parameters to be optimised. To reduce the number of qubits, one could combine our approach with quantum autoencoders to transform the encoding quantum state into a subspace with fewer qubits that captures the essential information of the kernel [55]. Alternatively, one could trace out most of the qubits of a many-qubit quantum state such that a subsystem with a lower number of qubits remains. Then, randomized measurements can efficiently determine the kernel of the subsystem states. It would be worthwhile to investigate the learning power of kernels generated from subsystems of quantum states that possess quantum advantage [6, 7].

Our method scales linearly with the dataset size and provides a quadratic speedup compared to conventional measurement methods such as the inversion test, allowing us to compute large datasets in a reasonable time. Additionally, the measurements for the dataset can be distributed between different quantum computers to achieve a further speedup proportional to the number of available devices. We demonstrate parallelization between two IBM quantum computers and achieve comparable accuracy to training with a single quantum computer. Our encoding can load up to $2^{N+1} - 2$ features onto N qubits, such that the popular MNIST dataset for classifying 2D images of handwritten digits with 28 × 28 pixels could already fit on a small number of qubits [56]. Current state-of-the-art quantum computers measure about 5000 quantum states per second [11, 12]. With a moderate number of measurement samples and measurement settings, our method can process the full MNIST training dataset of 60,000 entries in about 220 hours of quantum processing time on a single quantum computer. In contrast, the inversion or swap test would require at least 10 years. With our scheme, currently available quantum computers can be benchmarked with large-scale datasets against classical machine learning algorithms and explore industrially relevant applications of quantum machine learning.

Code to reproduce the experimental results presented in this paper is available from [57] and the experimental data is available from [58].

Acknowledgements.
We acknowledge discussions with Kiran Khosla and Alistair Smith. This work is supported by a Samsung GRC project and the UK Hub in Quantum Computing and Simulation, part of the UK National Quantum Technologies Programme with funding from UKRI EPSRC grant EP/T001062/1. We acknowledge the use of IBM Quantum services for this work. The views expressed are those of the authors, and do not reflect the official policy or position of IBM or the IBM Quantum team.

References

Appendix A Parameterized quantum circuits

Figure 5: a) The NPQC, a hardware-efficient PQC composed of single-qubit rotations and CPHASE gates. For the reference parameter θ_r, the QFIM is the identity matrix. b) Example of the entangling layer for the NPQC, which is composed of non-overlapping CPHASE gates and single-qubit rotations. c) The YZ-CX PQC, a hardware-efficient circuit consisting of single-qubit rotations and CNOT gates arranged in an alternating fashion in even and odd layers.

We use two different types of PQCs. Both consist of layers of unitaries, generating the quantum state |ψ(θ)⟩ parameterized by the D-dimensional parameter vector θ.

In Fig. 5a, we show the first circuit we use, which we call the NPQC. The first layer consists of single-qubit rotations around the y and z axes for each qubit, generated by the Pauli matrices σ_y and σ_z. Each additional layer is a product of two-qubit entangling gates and parameterized single-qubit rotations, where the entangling gates are controlled-phase (CPHASE) gates acting on qubit pairs separated by a layer-dependent shift factor, with qubit indices taken modulo the number of qubits. An example of the entangling layer is shown in Fig. 5b. The shift factor for each layer is determined by a recursive rule detailed in [38], which also bounds the maximal number of layers and hence the total number of parameters. The NPQC has the QFIM F(θ_r) = I, with I the identity matrix, for the reference parameter θ_r given by

$$\theta_r = (\pi/2,\, 0,\, \pi/2,\, 0,\, \dots)^\top, \qquad (9)$$

i.e. every y rotation angle is set to π/2 and every z rotation angle to 0. Close to this reference parameter, the QFIM remains approximately equal to the identity matrix. When implementing the NPQC for the IBM quantum computer, we choose the shift factor such that only nearest-neighbor CPHASE gates arranged in a chain appear. To match the connectivity of the IBM quantum computer, we removed one entangling gate and its corresponding single-qubit rotations, which would require a connection between the first and the last qubit of the chain.

The second type of PQC used is shown in Fig. 5c, which we call the YZ-CX PQC. It consists of layers of parameterized single-qubit y and z rotations, followed by CNOT gates. The CNOT gates are arranged in a one-dimensional chain, acting on neighboring qubits. In every layer, the CNOT gates are shifted by one qubit. Redundant single-qubit rotations that are left over at the edges of the chain are removed.
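The YZ-CX circuit structure can be sketched as a small statevector simulation in numpy; the qubit count, layer count and exact gate ordering are illustrative simplifications of the circuit described above:

```python
import numpy as np

N, L = 3, 2  # qubits, layers (small illustration)

def apply_1q(psi, gate, q):
    # Apply a single-qubit gate to qubit q of an N-qubit statevector.
    psi = psi.reshape([2] * N)
    psi = np.tensordot(gate, psi, axes=([1], [q]))
    return np.moveaxis(psi, 0, q).reshape(-1)

def apply_cnot(psi, ctrl, tgt):
    # Flip the target qubit on the ctrl = 1 slice of the statevector.
    psi = psi.reshape([2] * N).copy()
    idx = [slice(None)] * N
    idx[ctrl] = 1
    sl = tuple(idx)
    psi[sl] = np.flip(psi[sl], axis=tgt if tgt < ctrl else tgt - 1)
    return psi.reshape(-1)

def ry(t):
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2), np.cos(t / 2)]], dtype=complex)

def rz(t):
    return np.array([[np.exp(-1j * t / 2), 0], [0, np.exp(1j * t / 2)]])

def yz_cx_state(theta):
    # theta has shape (L, N, 2): one Ry and one Rz angle per qubit per layer.
    psi = np.zeros(2 ** N, dtype=complex)
    psi[0] = 1.0
    for l in range(L):
        for q in range(N):
            psi = apply_1q(psi, ry(theta[l, q, 0]), q)
            psi = apply_1q(psi, rz(theta[l, q, 1]), q)
        # CNOT chain on neighboring qubits, shifted by one qubit each layer.
        for q in range(l % 2, N - 1, 2):
            psi = apply_cnot(psi, q, q + 1)
    return psi

rng = np.random.default_rng(3)
theta = rng.normal(0.0, 1.0, (L, N, 2))
psi_a = yz_cx_state(theta)
psi_b = yz_cx_state(theta + 0.1)
K = abs(np.vdot(psi_a, psi_b)) ** 2  # fidelity kernel between two encodings
print(np.linalg.norm(psi_a), K)      # states remain normalized
```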

Appendix B Methods to measure quantum kernels

In Fig. 6, we explain the different methods to measure kernels of quantum states. In this paper, we use the randomized measurement method shown in Fig. 6a. The number of measurements required to obtain all pairwise kernel entries scales linearly with the dataset size.

The inversion test is shown in Fig. 6b. To measure the kernel between two quantum states, it applies the unitary preparing the first state combined with the inverse of the unitary preparing the second state. The kernel is then given by the probability of measuring the zero state. Here, the number of measurements scales with the square of the dataset size.

The swap test is shown in Fig. 6c. It prepares both states of the kernel simultaneously, requiring twice the number of qubits compared with the other tests. A controlled SWAP gate is then applied, with the control on an ancilla qubit, and the kernel is obtained from the measurement statistics of the ancilla. As with the inversion test, the number of required measurements scales with the square of the dataset size. Further, the controlled SWAP gate can require substantial quantum resources.
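The scaling difference between the measurement strategies can be made concrete by counting circuit executions for a hypothetical dataset; both numbers below are illustrative assumptions, not values from the experiments:

```python
# Circuit-count comparison for the measurement strategies of Appendix B,
# for a hypothetical dataset of 1797 entries and 100 randomized
# measurement settings per state.
n_data = 1797
n_settings = 100

randomized = n_data * n_settings        # one batch of settings per state
inversion = n_data * (n_data - 1) // 2  # one circuit per pair of states

print(randomized, inversion, round(inversion / randomized, 1))
```

The randomized-measurement count grows linearly with the dataset size, while the pairwise count grows quadratically, so the gap widens rapidly for larger datasets.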

Figure 6: Quantum circuits to measure the kernel of quantum states. a) Randomized measurement scheme. Prepare the state, rotate it into a randomized basis given by single-qubit Haar-random unitaries and measure in the computational basis. By post-processing the sampled bit strings one can extract the kernel. The number of measurements scales linearly with the dataset size. b) Inversion test. Prepare the first state, apply the inverse unitary of the second state and measure the probability of the zero state. The number of measurements scales with the square of the dataset size. c) Swap test. Prepare both states on twice the number of qubits and perform a controlled SWAP gate with an ancilla. The kernel is determined by measuring the ancilla only. The number of measurements scales with the square of the dataset size.

Appendix C Experimental kernel of YZ-CX PQC

In Fig.7, we show experimental data of the kernel for the YZ-CX PQC using ibmq_guadalupe. We find that the experimental data and numerical simulations match well.

Figure 7: Experimental kernel as a function of the distance between the feature vectors x, x'. We encode randomly chosen feature vectors via Eq. (2) in the YZ-CX PQC, with a randomly chosen reference parameter. The QFIM is calculated numerically. The quantum kernel generated by theory (blue dots) and via experimental results with the IBM quantum computer (orange crosses) follows approximately a weighted radial basis function kernel (black line). The shaded area is the standard deviation of the kernel. Experimental results from ibmq_guadalupe were obtained with randomized measurement settings, measurement samples and error mitigation.

Appendix D IBM Quantum implementation details

Our PQC circuits are constructed as parameterised circuits with Qiskit [59]. These parameterised circuits are first transpiled then bound for each data point and randomised measurement unitary, ensuring that all circuits submitted have the same structure and use the same set of device qubits. Transpiling is handled by the pytket python package [60] using rebase, placement and routing passes with no additional optimisations (IBMQ default passes with optimisation level 0).

The ibmq_guadalupe [49] results presented in Fig. 2 and Fig. 4 were collected between 22nd July 2021 and 30th July 2021. The ibmq_toronto [49] results presented in Fig. 4 were collected between 23rd July 2021 and 9th August 2021. Each circuit was executed with 8192 measurement shots. For comparison, applying the inversion test to the same handwritten digit dataset used for Fig. 4 would have required roughly a hundred times more circuit executions. Circuits were executed on IBM quantum devices using the circuit queue API. Job submissions were batched in such a way that all measurement circuits for a data point were submitted and executed together.

Beyond the error mitigation procedure described in the main text we carry out no further error mitigation; specifically, we do not apply measurement error mitigation.


Figure 8: Accuracy of classifying the training data as a function of the size of the training data. The shaded area is the standard deviation of the accuracy. We use the same data as in the main text. We compare the radial basis function kernel (blue dots), the exact quantum kernel (orange), a noiseless simulation of randomized measurements (green), and the kernel measured on IBM quantum computers with randomized measurements using error mitigation (red) and no error mitigation (yellow). a,c,e) YZ-CX PQC and b,d,f) NPQC. We use a,b) ibmq_guadalupe, c,d) ibmq_toronto and e,f) measurements distributed equally between the aforementioned quantum computers. The training data is randomly drawn from the full dataset, which is repeated 20 times for each training data size.

Appendix E Training accuracy

In Fig.8, we plot the accuracy of classifying the training data with the SVM for the YZ-CX PQC and NPQC. We show the accuracy for processing on ibmq_guadalupe, ibmq_toronto and distributing the dataset between both quantum computers. The accuracy is defined as the percentage of training data that is correctly identified. We find that error mitigation substantially increases the accuracy in all cases.

Appendix F Confusion matrix

We now show the confusion matrices for the test data. The confusion matrix shows which label is predicted by the SVM with respect to the true label of the test data. The diagonal contains the correctly classified digits, whereas the off-diagonals show the number of times a digit was misclassified. In Fig. 9, we show the confusion matrix for the NPQC, and in Fig. 10 for the YZ-CX PQC. We find that the actual digit 8 is often predicted to be the digit 1. Other likely confusions are that digit 3 is assumed to be 8 and digit 9 is assumed to be 8. We find these confusions consistently in all kernels. While for the NPQC the radial basis function kernel and the quantum kernel give nearly the same confusion matrix, we find substantial differences for the YZ-CX PQC. The reason is that while the NPQC implements an approximately isotropic radial basis function kernel, the YZ-CX PQC implements an approximate radial basis function kernel with a weight matrix given by the QFIM. The weight matrix of the YZ-CX PQC seems to reduce the accuracy of the trained SVM.


Figure 9: Confusion matrix for the NPQC for a) radial basis function kernel b) exact quantum kernel c) mitigated IBM results with ibmq_guadalupe and d) unmitigated IBM results. We use 1300 training data and 200 test data, where we average the confusion matrix over 100 randomly sampled instances of the data.


Figure 10: Confusion matrix for the YZ-CX PQC for a) radial basis function kernel b) exact quantum kernel c) mitigated IBM results with ibmq_guadalupe and d) unmitigated IBM results. We use 1300 training data and 200 test data, where we average the confusion matrix over 100 randomly sampled instances of the data.

Appendix G Product state as analytic radial basis function kernel

As an analytic example, we show that product states form an exact radial basis function kernel in the limit of many qubits. We use the following N-qubit quantum state

$$|\psi(\theta)\rangle = \bigotimes_{j=1}^{N} \big( \cos(\theta_j/2)\,|0\rangle + \sin(\theta_j/2)\,|1\rangle \big). \qquad (10)$$

The QFIM is given by F = I, where I is the N-dimensional identity matrix. The kernel of two states parameterized by θ, θ' is given by

$$K(\theta, \theta') = \prod_{j=1}^{N} \cos^2(\Delta\theta_j/2), \qquad (11)$$

where we define Δθ = θ − θ' as the difference between the two parameter sets. We now assume that all the differences of the parameters are equal, $\Delta\theta_j = \delta/\sqrt{N}$, such that the weighted distance $\Delta\theta^\top F\, \Delta\theta = \delta^2$ is held fixed. We then find in the limit of many qubits

$$\lim_{N \to \infty} K = \lim_{N \to \infty} \cos^{2N}\!\big(\delta/(2\sqrt{N})\big) = e^{-\delta^2/4}, \qquad (12)$$

which gives us the radial basis function kernel.
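This limit can be checked numerically: the product kernel of Eq. (11) converges to the Gaussian of Eq. (12) as the qubit number grows (the distance δ below is an illustrative choice):

```python
import numpy as np

delta = 1.2  # total parameter distance (illustrative)

for n_qubits in [10, 100, 10000]:
    # Eq. (11) with equal per-qubit differences delta / sqrt(N):
    K = np.cos(delta / (2 * np.sqrt(n_qubits))) ** (2 * n_qubits)
    print(n_qubits, K)

print(np.exp(-delta ** 2 / 4))  # the Eq. (12) limit
```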