1 Introduction
Quantum machine learning (QML) is regarded as an early application of noisy intermediate-scale quantum computing that could leverage quantum advantage (bharti2022noisy; huang2021information; huang2021power; huang2021quantum). During the last few years, there have been several proposals to perform different supervised and unsupervised QML tasks (rebentrost2014quantum; schuld2015introduction; Dunjko2018machinelearning; biamonte2017quantum; bharti2022noisy). Most of the QML literature focuses on gradient-based algorithms that rely on hybrid approaches whereby a classical computer is used to update the variational parameters of quantum circuits to minimise a given cost function (Benedetti2019parameterized; caro2021generalization)
. However, gradient descent, applied to quantum circuits, is known to scale poorly with the number of qubits, as the probability of the gradients being non-zero becomes exponentially small as a function of the number of qubits (mcclean2018barren). This phenomenon is commonly referred to as the barren plateau problem and jeopardises the practical achievement of quantum advantage. For this reason, there has also been general interest in using gradient-free techniques to train variational quantum circuits (franken2020gradient; benedetti2019generative; peruzzo2014variational; khatri2019quantum; leyton2021robust) (despite some controversy (Arrasmith2021effectofbarren; marrero2020entanglement; cerezo2021cost)), as well as in devising quantum-inspired, gradient-free machine learning methods that can run on both classical and quantum computers (gonzalez2021classification; gonzalez2021learning; sergioli2019binary). In this work, we report the implementation of an optimisation-free framework (gonzalez2021classification; gonzalez2021learning), similar to quantum kernels (mengoni209kernel; schuld2019qml; havlivcek2019supervised; blank2020quantum), on real quantum devices for classification and density estimation (we also release a library that performs local and remote runs, on IBM quantum computers, of quantum circuits for classification and density estimation: https://gitlab.com/mlphysicsunal/qcm). This framework can be used for supervised and unsupervised machine learning tasks, which we exemplify through classification and density estimation. Its main feature is that a data set of arbitrarily many samples can be compressed into a quantum state of a fixed number of qubits. Once this quantum state is prepared, it is projected onto the quantum state of a sample that is to be classified or whose density is to be estimated. The latter quantum state is built using a quantum feature map encoding. Therefore, unlike many quantum kernel methods, classification or density estimation can be achieved with a single estimation of the overlap between a quantum state that encodes an arbitrarily large data set and the quantum state of the sample of interest.
This paper is organised as follows. Section 2 explains the gradient-free framework for classification and density estimation. Section 3 shows how to perform these tasks on a quantum computer. Then, section 4 presents results for a couple of experiments carried out on real quantum devices. After that, section 5 discusses the results in light of future challenges. Finally, section 6 concludes.
2 Gradient-free Classification and Density Estimation
In this section, we outline the algorithms for classification and density estimation based on quantum measurements performed on general physical systems, which can be efficiently simulated on classical computers (gonzalez2021classification; gonzalez2021learning).
The departure point for both algorithms is the availability of a quantum feature map (QFM) $\psi: \mathcal{X} \to \mathcal{H}$, where $\mathcal{X}$ is the space of classical data features and $\mathcal{H}$ is the Hilbert space of some physical system. Thus, the QFM maps a data sample to the quantum state of a physical system, i.e. $x_i \mapsto |\psi_i\rangle := |\psi(x_i)\rangle$, where $i$ indexes a set of data samples.
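As a concrete (and deliberately minimal) illustration of a QFM, the sketch below encodes a scalar feature into a single-qubit state with NumPy. The angle-encoding choice is an assumption for this example, not the paper's exact map; any map into unit-norm vectors would serve the same role.

```python
import numpy as np

def qfm(x):
    """Illustrative quantum feature map: encode a scalar feature as the
    single-qubit state cos(x)|0> + sin(x)|1>.
    (Assumed encoding for this sketch, not the paper's exact map.)"""
    return np.array([np.cos(x), np.sin(x)])

# Any input is mapped to a valid (unit-norm) quantum state.
state = qfm(0.3)
```

Because cos² + sin² = 1, the resulting vector is automatically a valid quantum state, which is the essential property any QFM must satisfy.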
A quantum state for a data set of $N$ samples $\{x_i\}_{i=1}^{N}$ can be built through

$|\Psi\rangle = \frac{1}{Z} \sum_{i=1}^{N} |\psi_i\rangle$ (1)

where $Z = \left\| \sum_{i=1}^{N} |\psi_i\rangle \right\|$ is a normalisation constant. Equation 1 shows that the data set state is a superposition of the states corresponding to each sample.
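The superposition of eq. 1 can be sketched in a few lines of NumPy; the single-qubit angle encoding used here is an illustrative assumption, not the paper's specific map.

```python
import numpy as np

def qfm(x):
    # illustrative single-qubit angle encoding (assumed, not the paper's map)
    return np.array([np.cos(x), np.sin(x)])

def dataset_state(samples):
    """Build the data set state of eq. 1: superpose the per-sample
    states and divide by the normalisation constant Z."""
    unnormalised = sum(qfm(x) for x in samples)
    return unnormalised / np.linalg.norm(unnormalised)

Psi = dataset_state([0.1, 0.5, 0.9])
```

Note that the dimension of `Psi` is fixed by the QFM, not by the number of samples: arbitrarily many data points are compressed into one state of fixed size.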
2.1 Classification
To incorporate a class $y_i$ for each sample $x_i$, we consider another QFM $\phi: \mathcal{Y} \to \mathcal{H}_{\mathcal{Y}}$, where $\mathcal{Y}$ is a discrete set of elements or classes, and $\mathcal{H}_{\mathcal{Y}}$ is the Hilbert space of some physical system. Therefore, a labelled data set $\{(x_i, y_i)\}_{i=1}^{N}$ can be mapped to a quantum state via

$|\Psi\rangle = \frac{1}{Z} \sum_{i=1}^{N} |\psi_i\rangle \otimes |\phi_{y_i}\rangle$ (2)

where $Z = \left\| \sum_{i=1}^{N} |\psi_i\rangle \otimes |\phi_{y_i}\rangle \right\|$.
As explained in Ref. (gonzalez2021classification), the classification of a new data point $x$ consists of projecting the $\mathcal{H}$ part of the data set quantum state $|\Psi\rangle$ onto the corresponding new data point quantum state $|\psi_x\rangle$. More formally, we can construct a reduced density matrix

$\rho_{\mathcal{Y}} = \dfrac{(\langle\psi_x| \otimes I)\, |\Psi\rangle\langle\Psi|\, (|\psi_x\rangle \otimes I)}{\langle\Psi|\, (|\psi_x\rangle\langle\psi_x| \otimes I)\, |\Psi\rangle}$ (3)

from which we can obtain the probability that $x$ is of the class $y$ as $P(y|x) = \langle\phi_y| \rho_{\mathcal{Y}} |\phi_y\rangle$. Note that we can directly calculate this probability, up to normalisation, through $\left| (\langle\psi_x| \otimes \langle\phi_y|)\, |\Psi\rangle \right|^2$.
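The classification rule above can be simulated classically with small vectors. The sketch below assumes a single-qubit angle encoding for the feature and one-hot encoding for two classes (both illustrative choices), and computes $P(y|x)$ from the squared joint overlaps.

```python
import numpy as np

def qfm_x(x):
    # illustrative single-qubit angle encoding for the feature (assumed)
    return np.array([np.cos(x), np.sin(x)])

def qfm_y(y):
    # one-hot class encoding: |0> for class 0, |1> for class 1
    return np.eye(2)[y]

def labelled_dataset_state(samples):
    # eq. 2: normalised superposition of joint feature-class states
    psi = sum(np.kron(qfm_x(x), qfm_y(y)) for x, y in samples)
    return psi / np.linalg.norm(psi)

def class_probabilities(Psi, x):
    # P(y|x) proportional to |(<psi_x| (x) <phi_y|) |Psi>|^2
    p = np.array([(np.kron(qfm_x(x), qfm_y(y)) @ Psi) ** 2 for y in (0, 1)])
    return p / p.sum()

train = [(0.1, 0), (0.2, 0), (1.3, 1), (1.4, 1)]
Psi = labelled_dataset_state(train)
probs = class_probabilities(Psi, 0.15)
```

A test point near the class-0 cluster receives a higher class-0 probability, as the projection mechanism dictates.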
2.2 Density Estimation
We consider a data set $\{x_i\}_{i=1}^{N}$ described by the quantum state $|\Psi\rangle$ in eq. 1. The estimated probability density at any point $x$ is, up to a normalisation constant, given by the Born rule

$\hat{f}(x) \propto \left| \langle\psi_x|\Psi\rangle \right|^2$ (4)
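The Born-rule density estimate of eq. 4 is a single overlap computation. The sketch below again assumes an illustrative single-qubit angle encoding; the density (up to normalisation) is simply the squared overlap with the data set state.

```python
import numpy as np

def qfm(x):
    # illustrative angle encoding (assumed, not the paper's map)
    return np.array([np.cos(x), np.sin(x)])

def dataset_state(samples):
    psi = sum(qfm(x) for x in samples)
    return psi / np.linalg.norm(psi)

def density(Psi, x):
    # Born rule, eq. 4: estimated density proportional to |<psi_x|Psi>|^2
    return float(qfm(x) @ Psi) ** 2

Psi = dataset_state([0.4, 0.5, 0.6])
```

Points near the data cluster get a larger estimate than points far from it, which is the behaviour a kernel density estimator would also exhibit.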
3 Circuit Implementation
In sections 3.1 and 3.2 we will show how classification and density estimation can be carried out when the QFMs map classical data onto the state of a multi-qubit system. Section 3.3 will discuss how the particular quantum circuit unitaries can be implemented to perform classification and density estimation.
3.1 Classification
A general quantum circuit for classification is depicted in fig. 1(a), where the probability that a new data point $x$ is of class $y$ is computed using a labelled training data set. The quantum circuit can be read as follows. From left to right, the unitary $U_{\mathcal{D}}$ prepares the quantum state of the data set: $U_{\mathcal{D}}|0\rangle = |\Psi\rangle$. From right to left, the unitary $U_{x,y}$ prepares the quantum state of the new data point along the $y$th class direction: $U_{x,y}|0\rangle = |\psi_x\rangle \otimes |\phi_y\rangle$. Therefore, the quantum circuit prepares the state $U_{x,y}^{\dagger} U_{\mathcal{D}} |0\rangle$, such that its projection onto $|0\cdots 0\rangle$ gives the probability of $x$ being classified in class $y$.
Thus, the classification probability can be estimated by sampling the quantum circuit $N_s$ times and counting the number of times $n_0$ that the all-zeros bit string is measured. The estimated probability is then $\hat{P}(y|x) = n_0 / N_s$.
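The shot-based estimator $n_0/N_s$ can be simulated classically by sampling computational-basis outcomes from a final state's probability distribution. The state and shot count below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def estimate_all_zeros_probability(final_state, n_shots=4096):
    """Simulate n_shots computational-basis measurements of a circuit's
    final state and estimate P(all-zeros) by its relative frequency."""
    probs = np.abs(final_state) ** 2
    outcomes = rng.choice(len(final_state), size=n_shots, p=probs)
    return np.count_nonzero(outcomes == 0) / n_shots

# Toy final state with known all-zeros probability 0.8^2 = 0.64.
state = np.array([0.8, 0.6])
p_hat = estimate_all_zeros_probability(state)
```

The statistical error of the estimate scales as $\sqrt{P(1-P)/N_s}$, so more shots are needed when the target overlap is small.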
3.2 Density Estimation
A general quantum circuit for probability density estimation is given in fig. 1(c), where the probability density at a point $x$ is computed using a training data set. As in classification, the quantum circuit can be read in two directions to grasp which states are being prepared: from left to right, the unitary $U_{\mathcal{D}}$ prepares the data set quantum state $|\Psi\rangle$; and from right to left, the unitary $U_{x}$ prepares the sample data point quantum state $|\psi_x\rangle$. Thus, the complete circuit prepares the state $U_{x}^{\dagger} U_{\mathcal{D}} |0\rangle$, whose projection onto $|0\cdots 0\rangle$ gives the probability density at $x$.
The latter procedure allows the direct estimation of the probability density in eq. 4 by making $N_s$ measurements of the quantum circuit and by computing $\hat{f}(x) \propto n_0 / N_s$, where $n_0$ is the number of times that the all-zeros bit string is measured.
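The two-directional reading of the circuit can be verified on a one-qubit toy model: the all-zeros amplitude of $U_x^{\dagger} U_{\mathcal{D}} |0\rangle$ equals the overlap $\langle\psi_x|\Psi\rangle$. The 2x2 rotation used as a state preparation unitary is a toy stand-in, not a general-purpose decomposition.

```python
import numpy as np

def prep_unitary(target):
    """2x2 real rotation sending |0> to a given real unit vector
    (a toy stand-in for an arbitrary state preparation unitary)."""
    c, s = target
    return np.array([[c, -s], [s, c]])

psi_x = np.array([np.cos(0.3), np.sin(0.3)])  # sample state
Psi = np.array([np.cos(0.7), np.sin(0.7)])    # data set state

U_D = prep_unitary(Psi)    # read left to right: prepares |Psi>
U_x = prep_unitary(psi_x)  # read right to left: prepares |psi_x>

# Full circuit U_x^dagger U_D acting on |0>; the all-zeros amplitude
# is exactly the overlap <psi_x|Psi>.
final = U_x.conj().T @ U_D @ np.array([1.0, 0.0])
p_zero = abs(final[0]) ** 2
overlap = abs(psi_x @ Psi) ** 2
```

This identity, $|\langle 0|U_x^{\dagger} U_{\mathcal{D}}|0\rangle|^2 = |\langle\psi_x|\Psi\rangle|^2$, is what lets a single circuit estimate the density (or the classification probability) in one overlap measurement.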
3.3 MultiQubit Quantum State Preparation
The method that we have so far explored depends on the ability to compile the state preparation unitaries into quantum circuits readable by current quantum computers. Most current quantum computers provide primitive one- and two-qubit gates that allow universal quantum computation. Therefore, even though the required unitary is known, we need to decompose it into the primitive quantum gates of a quantum computer.
Several algorithms for arbitrary unitary decomposition have been suggested (barenco1995qrdecomposition; mottonen2004sincosDecomposition; krol2022efficient; li2013decomposition). In this work, we use the algorithm proposed in Ref. (shende2006synthesis), which prepares an $n$-qubit state using a number of CNOT gates that scales as $O(2^n)$. This algorithm is implemented in the popular quantum computing library Qiskit (Qiskit), which we used to connect to publicly available quantum computers from IBM.
Remarkably, recent work has produced new ways to prepare arbitrary quantum states using shallow quantum circuits (bausch2020fast), by using additional ancillary qubits (araujo2021qsp; zhang2021lowdepth), by training parametrised quantum circuits in the so-called quantum machine learning setup (schuld2019qml; Haug2020classifying; rakyta2022efficient), or even by implementing tensor-network-inspired gradient-free optimisation techniques (shirakawa2021automatic). Regardless of the quantum state preparation algorithm, our classification and density estimation framework retains the advantage of condensing a complete, arbitrarily large data set into a single quantum state of fixed size.
4 Results
The quantum circuit shown in fig. 1(a) was used to classify data in an XOR disposition, as shown in fig. 1(c). Such a toy data set can tell linear and nonlinear classifiers apart. In our case, nonlinearity is induced by the QFM. As an example, we consider the following QFM
$|\psi(\mathbf{x})\rangle = \bigotimes_{j=1}^{2} \left[ \cos(x_j)\, |0\rangle + \sin(x_j)\, |1\rangle \right]$ (5)

which ensures that the induced kernel is a pairwise cosine-like similarity measure
. Regarding class labels, we selected the one-hot encoding as the QFM, such that red points are mapped to $|0\rangle$ and blue points are mapped to $|1\rangle$. Thus, a total of three qubits are used in the classification quantum circuit, with two qubits encoding the data features and the remaining one encoding the class label.

Figure 2 shows three panels that display the probability that a point placed in the feature plane is assigned to the red class or the blue class. The panels correspond to a classical simulation of the classification quantum circuit (left), a classical simulation of the corresponding noisy quantum circuit (middle), and the classification carried out on the IBM Bogotá quantum device (right). The noise model for the quantum circuit includes single-qubit readout errors, gate errors, and T1 and T2 relaxation errors. It is clear from the middle and right panels of fig. 2 that this noise model fails to reproduce the real noisy quantum circuit, most likely because such a simplified noise model does not account for the complex dynamics that the quantum circuit undergoes as an open quantum system (berg2022probabilistic).
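A classical simulation of the XOR experiment can be sketched with a per-feature angle encoding and a one-hot class qubit. The specific angle scaling and training points below are illustrative assumptions; the point is that the cosine-like kernel separates the XOR classes where a linear classifier could not.

```python
import numpy as np

def qfm_x(pt):
    # per-feature angle encoding on two qubits (assumed scaling);
    # induces a cosine-like pairwise similarity kernel
    def q(v):
        return np.array([np.cos(v), np.sin(v)])
    return np.kron(q(pt[0]), q(pt[1]))

def qfm_y(y):
    return np.eye(2)[y]  # one-hot: two classes on one qubit

# XOR-like training data: class 0 on the diagonal, class 1 off-diagonal
train = [((0.0, 0.0), 0), ((1.0, 1.0), 0), ((0.0, 1.0), 1), ((1.0, 0.0), 1)]

psi = sum(np.kron(qfm_x(x), qfm_y(y)) for x, y in train)
Psi = psi / np.linalg.norm(psi)

def class_probs(pt):
    # squared joint overlaps, normalised over the two classes
    p = np.array([(np.kron(qfm_x(pt), qfm_y(y)) @ Psi) ** 2 for y in (0, 1)])
    return p / p.sum()
```

Diagonal test points are assigned to class 0 and off-diagonal points to class 1, reproducing the XOR decision structure with a three-qubit state.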
A more general QFM, not as hand-tailored as the one introduced in eq. 5, is the random Fourier features (RFF) QFM that we proposed in Ref. (gonzalez2021classification). The RFF method maps data features to a finite-dimensional space in which the inner product approximates a given kernel (rahimi2007rff). Such a map can be written as $\phi: \mathcal{X} \to \mathbb{R}^{D}$, where $D$ is some number of dimensions, such that $\phi(\mathbf{x})^{\mathsf{T}} \phi(\mathbf{y}) \approx k(\mathbf{x} - \mathbf{y})$, for some given shift-invariant kernel function $k$. This result rests on Bochner's theorem (reed1975ii), which states that a shift-invariant kernel is related to a particular probability measure $p(\mathbf{w})$ through the Fourier transform. This allows us to write the $j$th component of $\phi$ as

$\phi_j(\mathbf{x}) = \sqrt{\frac{2}{D}} \cos\!\left( \mathbf{w}_j^{\mathsf{T}} \mathbf{x} + b_j \right)$ (6)

where $\mathbf{w}_j$ is sampled from $p(\mathbf{w})$, and $b_j$ is sampled uniformly from $[0, 2\pi]$. Finally, the RFFs obtained through $\phi$ can be used to define a QFM, for instance, through a binarised amplitude encoding.
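Equation 6 can be checked numerically: for the Gaussian kernel $e^{-\gamma(x-y)^2}$, Bochner's theorem gives a Gaussian spectral measure with variance $2\gamma$, and the RFF inner product converges to the kernel as $D$ grows. The parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_map(x, w, b, D):
    # eq. 6: phi_j(x) = sqrt(2/D) * cos(w_j * x + b_j)
    return np.sqrt(2.0 / D) * np.cos(w * x + b)

gamma = 1.0
D = 2000
# For the Gaussian kernel exp(-gamma (x-y)^2), w ~ Normal(0, 2*gamma)
w = rng.normal(0.0, np.sqrt(2.0 * gamma), size=D)
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

x, y = 0.3, 0.8
approx = rff_map(x, w, b, D) @ rff_map(y, w, b, D)
exact = np.exp(-gamma * (x - y) ** 2)
```

The Monte Carlo error of the approximation shrinks as $O(1/\sqrt{D})$, so a few thousand features already track the exact kernel closely in 1D.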
In the case of the 1D data shown in fig. 1(d), we can define the QFM through

$|\psi(x)\rangle = \frac{1}{\|\phi(x)\|} \sum_{j=0}^{D-1} \phi_j(x)\, |j\rangle$ (7)

where $j$ is the decimal representation of a bit string of length $n$ (in this work, $n = 3$; thus, $D = 2^n = 8$). Remarkably, as we proved in Ref. (gonzalez2021learning), this technique enables the approximation of any probability distribution using the finite-dimensional density matrices at the core of the algorithm.
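The amplitude-encoded RFF QFM of eq. 7 can be sketched end to end for density estimation. The data distribution, seed, and $\gamma$ value below are illustrative assumptions; the structure (8 RFFs amplitude-encoded on 3 qubits) mirrors the experiment described in the text.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 3       # qubits
D = 2 ** n  # eight random Fourier features
gamma = 4.0  # illustrative bandwidth
w = rng.normal(0.0, np.sqrt(2.0 * gamma), size=D)
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def qfm(x):
    # eq. 7: amplitude-encode the RFF vector on n qubits and normalise
    phi = np.sqrt(2.0 / D) * np.cos(w * x + b)
    return phi / np.linalg.norm(phi)

data = rng.normal(0.0, 0.3, size=200)
psi = sum(qfm(x) for x in data)
Psi = psi / np.linalg.norm(psi)

def density(x):
    # eq. 4 applied to the RFF QFM (density up to normalisation)
    return float(qfm(x) @ Psi) ** 2
```

With only eight features the kernel approximation is coarse, but the pipeline already shows how an arbitrarily large data set is condensed into a single 8-dimensional (3-qubit) state from which densities are read off as overlaps.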
We chose the map $\phi$ to approximate the Gaussian kernel with a given bandwidth parameter $\gamma$, such that $k(x, y) = e^{-\gamma (x - y)^2}$ (rahimi2008weighted). A total of eight RFFs were used, so that the circuit in fig. 1(c) consisted of three qubits.
In fig. 3 we show the density estimation carried out in three different ways. The three panels correspond to a classical simulation of the density estimation quantum circuit (left), a classical simulation of the noisy density estimation quantum circuit (middle), and the density estimation carried out on the IBM Lima quantum device (right). As in the classification case, we see that the noise model provided by IBM falls far short of simulating the actual behaviour of the quantum circuit.
5 Discussion
QFMs play a central role in this work, as they provide a solution to the problem of encoding classical data into quantum states of qubits. Nonetheless, a calculation of the complete state $|\Psi\rangle$ is required prior to physically encoding the classical data into the quantum computer. This fact alone endangers the algorithmic advantage of our proposal running on a quantum computer versus running on a classical computer, owing to the easy classical access to the wave-function entries of the data set state (cotler2021revisiting).
As we mentioned before, the preparation of $|\Psi\rangle$ on a quantum computer can be done using several arbitrary quantum state preparation methods, and it is only done once. If the data set grows, so that a new state $|\Psi'\rangle$ needs to be prepared, one can consider the simpler problem of preparing $|\Psi'\rangle$ with $|\Psi\rangle$ as the initial state, instead of the usual $|0\rangle$ initial state (haug2021optimal).
The preparation of $|\psi_x\rangle$ or $|\psi_x\rangle \otimes |\phi_y\rangle$ directly challenges the scalability of our proposal. In this work, we prepared $|\phi_y\rangle$ using one-hot encoding, which is a completely deterministic QFM requiring $O(1)$ gates. However, the preparation of $|\psi_x\rangle$ requires exponentially many quantum gates as a function of the number of qubits of the QFM's target physical system (shende2006synthesis). We used such arbitrary state preparation algorithms in the experiments of this work for illustration; however, this procedure is not scalable. Instead, we can consider a parameterised quantum circuit that encodes data $x$ and parameters $\theta$ in the angles of parameterised quantum gates. Then, by minimising $d\!\left( U(\theta, x)|0\rangle,\; |\psi_x\rangle \right)$, where $d$ is a distance (fidelity (rakyta2022efficient), the KL divergence of the probability distributions represented by the states (liu2018differentiable), classical shadows (li2021vsql; sack2022), among others), one is able to obtain a variational circuit that acts as a primitive circuit to approximately apply QFMs to new data points without investing exponential resources.
This proposed setup would use the primitives for preparing the data set state $|\Psi\rangle$ and for preparing the quantum state of a single data point to perform classification and density estimation, as shown in this paper. The numerical heavy lifting, which is exponential in the number of qubits of the system, would need to be done just once, when preparing the circuit primitives. Classifying or estimating the density of a new data sample would then involve only the evaluation of the primitive circuits. Of course, the feasibility of using this method for large-scale quantum machine learning is subject to progress in training parameterised quantum circuits, which amounts to overcoming the barren plateau problem (sack2022; haug2021optimal; Sim2021adaptive; zhu2019training; Grant2019initialization).
6 Conclusions
In this work, we implemented a method (gonzalez2021classification; gonzalez2021learning) to perform data classification and density estimation on quantum circuits. This was achieved through the deterministic preparation of a quantum state that represents the information contained in a classical training data set and a quantum state that represents the information of a single point to be classified or whose probability density is to be estimated. These quantum states are obtained by applying a quantum feature map to classical data points and are prepared using arbitrary quantum state preparation algorithms.
One of the outstanding advantages of this method is the ability to approximate the probability distribution of arbitrarily large training data sets with finite-dimensional quantum states. We demonstrated classification and density estimation with toy data sets using quantum circuits of three qubits. We confirmed that the method's performance on real quantum devices suffered from decoherence, as expected. However, the noise models provided by IBM's Qiskit are far from describing the actual behaviour of the quantum devices in the applications we explored. This shows that, even though the theory of open quantum systems is well established, its practical application to large quantum systems remains a challenge. Thus, our work adds to the experimental evidence that more effective noise models are needed to simulate decoherence in quantum circuits.
Regarding possible quantum advantages, we acknowledge that the preparation of arbitrary quantum states can degrade the performance of our method. Nonetheless, the exponential effort needed to prepare the quantum state of the training data set needs to be done only once. Furthermore, we argued that the effort to prepare the quantum state of a new data point (to be classified or whose probability density is to be estimated) could also be made just once, by training a variational quantum circuit that performs the desired quantum feature map on an arbitrary input. However, the feasibility of this alternative is subject to the advance of methods to train variational quantum circuits while avoiding the barren plateau problem.
Statements and Declarations

Competing Interests: The authors declare no competing interests.