
Optimisation-free Classification and Density Estimation with Quantum Circuits

We demonstrate the implementation of a novel machine learning framework for classification and probability density estimation using quantum circuits. The framework maps a training data set or a single data sample to the quantum state of a physical system through quantum feature maps. The quantum state of the arbitrarily large training data set summarises its probability distribution in a finite-dimensional quantum wave function. By projecting the quantum state of a new data sample onto the quantum state of the training data set, one can derive statistics to classify or estimate the density of the new data sample. Remarkably, the implementation of our framework on a real quantum device does not require any optimisation of quantum circuit parameters. Nonetheless, we discuss a variational quantum circuit approach that could leverage quantum advantage for our framework.



1 Introduction

Quantum machine learning (QML) is regarded as an early application of noisy intermediate-scale quantum computing that could leverage quantum advantage (bharti2022noisy; huang2021information; huang2021power; huang2021quantum). During the last few years, there have been several proposals to perform different supervised and unsupervised QML tasks (rebentrost2014quantum; schuld2015introduction; Dunjko2018machinelearning; biamonte2017quantum; bharti2022noisy). Most of the QML literature focuses on gradient-based algorithms that rely on hybrid approaches, whereby a classical computer updates the variational parameters of quantum circuits to minimise a given cost function (Benedetti2019parameterized; caro2021generalization). However, gradient descent applied to quantum circuits is known to scale poorly with the number of qubits, as the probability of the gradients being non-zero decreases exponentially with the number of qubits (mcclean2018barren). This phenomenon, commonly referred to as the barren plateau problem, jeopardises the practical achievement of quantum advantage. For this reason, there has also been general interest in gradient-free techniques to train variational quantum circuits (franken2020gradient; benedetti2019generative; peruzzo2014variational; khatri2019quantum; leyton2021robust) (despite some controversy (Arrasmith2021effectofbarren; marrero2020entanglement; cerezo2021cost)), as well as in quantum-inspired, gradient-free machine learning methods that can run both on classical and quantum computers (gonzalez2021classification; gonzalez2021learning; sergioli2019binary).

In this work, we report the implementation of an optimisation-free framework (gonzalez2021classification; gonzalez2021learning), similar to quantum kernels (mengoni209kernel; schuld2019qml; havlivcek2019supervised; blank2020quantum), on real quantum devices for classification and density estimation. (We also release a library that performs local and remote runs, on IBM quantum computers, of quantum circuits for classification and density estimation.) This framework can be used for supervised and unsupervised machine learning tasks, which we exemplify through classification and density estimation. Its main feature is that a data set of arbitrarily many samples can be compressed into a quantum state of a fixed number of qubits. Once this quantum state is prepared, it is projected onto the quantum state of a sample that is to be classified or whose density is to be estimated. The latter quantum state is built using a quantum feature map encoding. Therefore, unlike many quantum kernel methods, classification or density estimation can be achieved with a single estimation of the overlap between a quantum state that encodes an arbitrarily large data set and the corresponding quantum state of the sample of interest.

This paper is organised as follows. Section 2 explains the gradient-free framework for classification and density estimation. Section 3 shows how to perform these tasks on a quantum computer. Then, section 4 presents results from a couple of experiments carried out on real quantum devices. After that, section 5 discusses the results in light of future challenges. Finally, section 6 concludes.

2 Gradient-free Classification and Density Estimation

In this section, we outline the algorithms for classification and density estimation based on quantum measurements performed on general physical systems, which can be efficiently simulated in classical computers (gonzalez2021classification; gonzalez2021learning).

The departure point for both algorithms is the availability of a quantum feature map (QFM) $\psi: \mathcal{X} \to \mathcal{H}$, where $\mathcal{X}$ is the space of classical data features and $\mathcal{H}$ is the Hilbert space of some physical system. Thus, the QFM maps a data sample $x_j \in \mathcal{X}$ to the quantum state of a physical system, i.e. $x_j \mapsto |\psi_j\rangle = \psi(x_j)$, where $j$ indexes a set of data samples.

A quantum state for a data set of $M$ samples can be built through

$$|\psi_{\mathcal{D}}\rangle = \frac{1}{Z} \sum_{j=1}^{M} |\psi_j\rangle, \qquad (1)$$

where $Z$ is a normalisation constant. Equation 1 shows that the data set state is a superposition of the states corresponding to each sample.
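As a concrete sketch (our own illustration, not the paper's code), the superposition of eq. 1 can be assembled classically from NumPy amplitude vectors; the helper name `dataset_state` is ours:

```python
import numpy as np

def dataset_state(sample_states):
    """Superpose the sample states |psi_j> (eq. 1) and normalise by Z."""
    psi = np.sum(sample_states, axis=0)
    return psi / np.linalg.norm(psi)

# Two single-qubit sample states: |0> and |+>
s0 = np.array([1.0, 0.0])
s_plus = np.array([1.0, 1.0]) / np.sqrt(2)
psi_D = dataset_state([s0, s_plus])  # unit-norm superposition of both samples
```

Note that the normalisation constant $Z$ is not $\sqrt{M}$ in general, because the sample states need not be orthogonal.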

2.1 Classification

To incorporate a class $y_j$ for each sample $x_j$, we consider another QFM $\phi: \mathcal{Y} \to \mathcal{H}_{\mathcal{Y}}$, where $\mathcal{Y}$ is a discrete set of elements or classes, and $\mathcal{H}_{\mathcal{Y}}$ is the Hilbert space of some physical system. Therefore, a labelled data set $\{(x_j, y_j)\}_{j=1}^{M}$ can be mapped to a quantum state via

$$|\psi_{\mathcal{D}}\rangle = \frac{1}{Z} \sum_{j=1}^{M} |\psi_j\rangle \otimes |\phi_j\rangle, \qquad (2)$$

where $|\phi_j\rangle = \phi(y_j)$ and $Z$ is a normalisation constant.

As explained in Ref. (gonzalez2021classification), the classification of a new data point $x_*$ consists of projecting the $\mathcal{X}$ part of the data set quantum state $|\psi_{\mathcal{D}}\rangle$ onto the corresponding new data point quantum state $|\psi_*\rangle = \psi(x_*)$. More formally, we can construct a reduced density matrix

$$\rho_{\mathcal{Y}} \propto \big(\langle\psi_*| \otimes I_{\mathcal{Y}}\big)\, |\psi_{\mathcal{D}}\rangle\langle\psi_{\mathcal{D}}|\, \big(|\psi_*\rangle \otimes I_{\mathcal{Y}}\big), \qquad (3)$$

from which we can obtain the probability that $x_*$ is of the class $k$. Note that we can directly calculate this probability through $P(y_* = k \mid x_*) \propto \big|\big(\langle\psi_*| \otimes \langle\phi_k|\big)\, |\psi_{\mathcal{D}}\rangle\big|^2$.
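The projection step can be sketched numerically (our own illustration; function names are hypothetical). The joint state amplitudes are reshaped into a (features x labels) matrix, the feature register is contracted with the new point's state, and the squared entries of the remaining label vector give the class probabilities:

```python
import numpy as np

def labelled_dataset_state(feature_states, label_states):
    # |psi_D> proportional to sum_j |psi_j> (x) |phi_j>   (eq. 2)
    psi = sum(np.kron(f, l) for f, l in zip(feature_states, label_states))
    return psi / np.linalg.norm(psi)

def class_probabilities(psi_D, psi_new, label_dim):
    # Project the feature register onto |psi_new>; the remaining
    # (unnormalised) label vector encodes the class probabilities.
    feat_dim = len(psi_D) // label_dim
    M = psi_D.reshape(feat_dim, label_dim)   # amplitudes indexed by (x, y)
    label_vec = psi_new.conj() @ M           # <psi_new| acting on the X part
    probs = np.abs(label_vec) ** 2
    return probs / probs.sum()

# Toy set: one feature qubit, one label qubit; class 0 samples near |0>
f = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
l = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
psi_D = labelled_dataset_state(f, l)
p = class_probabilities(psi_D, np.array([1.0, 0.0]), label_dim=2)
```

Here a new point identical to the class-0 sample is assigned to class 0 with certainty, since the two training states are orthogonal.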

2.2 Density Estimation

We consider a data set described by the quantum state $|\psi_{\mathcal{D}}\rangle$ in eq. 1. The estimated probability density at any point $x_*$ is simply given by the Born rule,

$$\tilde{p}(x_*) \propto |\langle\psi_*|\psi_{\mathcal{D}}\rangle|^2. \qquad (4)$$
3 Circuit Implementation

In sections 3.1 and 3.2 we show how classification and density estimation can be carried out when the QFMs map classical data onto the state of a multi-qubit system. Section 3.3 discusses how the particular quantum circuit unitaries can be implemented to perform classification and density estimation.

Figure 1: Quantum circuits and data sets for classification and density estimation. (a) is the circuit for classification of a point $x_*$ given a training data set; the green part of the circuit corresponds to the data feature space $\mathcal{X}$ and the yellow one to the label space $\mathcal{Y}$. (b) shows a toy data set used for classification, with two features $x_1$ and $x_2$ and a label shown as the red or blue colour. (c) is the circuit for estimating the probability density of a point $x_*$ given a training data set. (d) shows a 1D toy data set used for density estimation and a kernel density estimation (KDE) fit.

3.1 Classification

A general quantum circuit for classification is depicted in fig. 1(a), where the probability that a new data point $x_*$ is of class $k$ is computed using a labelled training data set. The quantum circuit can be seen as follows. From left to right, the unitary $U_{\mathcal{D}}$ prepares the quantum state of the data set: $U_{\mathcal{D}}|0\cdots 0\rangle = |\psi_{\mathcal{D}}\rangle$. From right to left, the unitary $U_{*,k}$ prepares the quantum state of the new data point along the $k$-th class direction: $U_{*,k}|0\cdots 0\rangle = |\psi_*\rangle \otimes |\phi_k\rangle$. Therefore, the quantum circuit prepares the state $U_{*,k}^{\dagger} U_{\mathcal{D}} |0\cdots 0\rangle$, such that its projection onto $|0\cdots 0\rangle$ gives the probability of $x_*$ being classified in class $k$.

Thus, the classification probability can be estimated by sampling the quantum circuit $N$ times and counting the number of times $N_0$ that the all-zeros bit string is measured. Then, the estimated probability is $N_0/N$.
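The shot-based estimate can be emulated classically (our own sketch, not the paper's code): the compute-uncompute circuit measures the all-zeros string with probability equal to the squared overlap of the two prepared states, so the $N$-shot estimate is a binomial draw:

```python
import numpy as np

rng = np.random.default_rng(7)

def estimate_overlap(psi_a, psi_b, shots=1024):
    # The circuit U_b^dagger U_a |0...0> yields the all-zeros bit string
    # with probability |<psi_b|psi_a>|^2; emulate the N-shot estimate N_0/N.
    p = np.abs(np.vdot(psi_b, psi_a)) ** 2
    n_zeros = rng.binomial(shots, p)
    return n_zeros / shots

psi_a = np.array([1.0, 1.0]) / np.sqrt(2)   # |+>
psi_b = np.array([1.0, 0.0])                # |0>, true overlap 0.5
est = estimate_overlap(psi_a, psi_b, shots=4096)
```

The standard error of this estimate scales as $\sqrt{p(1-p)/N}$, which is why the number of shots controls the precision of the classification probability.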

3.2 Density Estimation

A general quantum circuit for probability density estimation is given in fig. 1(c), where the probability density at a point $x_*$ is computed using a training data set. As in classification, the quantum circuit can be read in two ways in order to grasp which states are being prepared: from left to right, the unitary $U_{\mathcal{D}}$ prepares the data set quantum state $|\psi_{\mathcal{D}}\rangle$; and from right to left, the unitary $U_{*}$ prepares the sample data point quantum state $|\psi_*\rangle$. Thus, the complete circuit prepares the state $U_{*}^{\dagger} U_{\mathcal{D}} |0\cdots 0\rangle$, whose projection onto $|0\cdots 0\rangle$ gives the probability density at $x_*$.

The latter procedure allows the direct estimation of the probability density as shown in eq. 4 by making $N$ measurements of the quantum circuit and computing $N_0/N$, where $N_0$ is the number of times that the all-zeros bit string is measured.
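An end-to-end toy version of the density estimation pipeline can be written in a few lines (our own illustration; `qfm_1d` is a hypothetical toy feature map, not the paper's eq. 6): build the data set state from the training samples, then evaluate the squared overlap of eq. 4 at query points:

```python
import numpy as np

def qfm_1d(x, omegas):
    # Toy 1D quantum feature map (our illustration): shifted cosine
    # features, normalised into a unit amplitude vector.
    v = np.cos(x * omegas) + 1.0   # keep amplitudes non-negative
    return v / np.linalg.norm(v)

def density(x, samples, omegas):
    # Unnormalised density |<psi(x)|psi_D>|^2 (eqs. 1 and 4), with
    # |psi_D> the normalised superposition of the sample states.
    psi_D = np.sum([qfm_1d(s, omegas) for s in samples], axis=0)
    psi_D /= np.linalg.norm(psi_D)
    return np.abs(qfm_1d(x, omegas) @ psi_D) ** 2

omegas = np.array([0.5, 1.0, 1.5, 2.0])
samples = [0.0, 0.1, -0.1]   # training data clustered around zero
d_near = density(0.0, samples, omegas)   # query inside the cluster
d_far = density(3.0, samples, omegas)    # query far from the data
```

As expected, the estimated density is larger near the data cluster (`d_near > d_far`); a shot-based run would replace the exact overlap with the $N_0/N$ estimate described above.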

3.3 Multi-Qubit Quantum State Preparation

The method that we have so far explored depends on the ability to compile the state-preparation unitaries $U_{\mathcal{D}}$ and $U_{*}$ into quantum circuits readable by current quantum computers. Most current quantum computers provide primitive one- and two-qubit gates that allow universal quantum computation. Therefore, even though the general unitary is known, we need to decompose it into the primitive quantum gates of a quantum computer.

Several algorithms for arbitrary unitary decomposition have been suggested (barenco1995qrdecomposition; mottonen2004sincosDecomposition; krol2022efficient; li2013decomposition). In this work, we use the algorithm proposed in Ref. (shende2006synthesis), which offers a preparation of an $n$-qubit state using a number of CNOT gates that grows as $O(2^n)$. This algorithm is implemented in Qiskit (Qiskit), a popular library for quantum computing, which we used to connect to publicly available quantum computers from IBM.

Figure 2: Predictions (background colour) of exact circuit simulation (left), noisy circuit simulation (middle) and the circuit on the IBM Bogotá quantum device (right) for an XOR data set (points, cf. fig. 1(b)). The colour indicates the probability that a point is classified in the blue class, as shown by the colour bar. The area under the receiver operating characteristic curve was 99.93%, 99.82% and 95.83% for the predictions of the exact simulation, noisy simulation, and real quantum device, respectively.

Remarkably, recent work has produced new ways to prepare arbitrary quantum states using shallow quantum circuits (bausch2020fast), by using additional ancillary qubits (araujo2021qsp; zhang2021lowdepth), by training parametrised quantum circuits in the so-called quantum machine learning setup (schuld2019qml; Haug2020classifying; rakyta2022efficient), or even by implementing tensor-network-inspired, gradient-free optimisation techniques.

Regardless of the quantum state preparation algorithm, our classification and density estimation framework retains the advantage of condensing the complete, arbitrarily large data set into a single quantum state of fixed size.

4 Results

The quantum circuit shown in fig. 1(a) was used to classify data in an XOR disposition, as shown in fig. 1(b). Such a toy data set can tell apart linear classifiers from non-linear classifiers. In our case, non-linearity is induced by the QFM. As an example, we consider the following QFM


$$|\psi(\mathbf{x})\rangle = \bigotimes_{i=1}^{2} \big( \cos(x_i)|0\rangle + \sin(x_i)|1\rangle \big), \qquad (5)$$

which ensures that the induced kernel, $\langle\psi(\mathbf{x})|\psi(\mathbf{x}')\rangle = \prod_i \cos(x_i - x_i')$, is a pairwise cosine-like similarity measure. Regarding class labels, we selected the one-hot encoding as the QFM, such that red points are mapped to $|0\rangle$ and blue points are mapped to $|1\rangle$. Thus, a total of three qubits are used to perform the classification quantum circuit, with two qubits encoding the data features and the remaining one encoding the class label.
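As a hedged sketch (our own reconstruction, not necessarily the paper's exact map), a two-qubit product feature map whose induced kernel is a pairwise cosine similarity can be checked numerically:

```python
import numpy as np

def qfm(x):
    # Tensor product of single-qubit states cos(x_i)|0> + sin(x_i)|1>;
    # the induced kernel is prod_i cos(x_i - x'_i).
    qubits = [np.array([np.cos(xi), np.sin(xi)]) for xi in x]
    state = qubits[0]
    for q in qubits[1:]:
        state = np.kron(state, q)
    return state

x, y = np.array([0.3, 1.1]), np.array([0.8, 0.2])
inner = qfm(x) @ qfm(y)   # should equal cos(0.3 - 0.8) * cos(1.1 - 0.2)
```

The kernel depends only on feature differences, which is what allows the map to separate the XOR classes non-linearly.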

Figure 2 shows three panels that display the probability that a point placed at $(x_1, x_2)$ is assigned to the red class or the blue class. The three panels correspond to a classical simulation of the classification quantum circuit (left), a classical simulation of the corresponding noisy quantum circuit (middle), and the classification carried out on the IBM Bogotá quantum device (right). The noise model for the quantum circuit includes single-qubit readout errors, gate errors, and T1 and T2 relaxation errors. It is clear from the middle and right panels of fig. 2 that the noise model used is not able to simulate the real noisy quantum circuit, most likely because such a simplified noise model does not account for the complex dynamics that the quantum circuit undergoes as an open quantum system (berg2022probabilistic).

Figure 3: Density estimation (blue points) of bi-Gaussian-distributed data (cf. fig. 1(d)) with exact circuit simulation (left), noisy circuit simulation (middle) and a run on the IBM Lima quantum computer (right). Orange lines are computed through regular Gaussian kernel density estimation. 1024 shots were used to estimate every point on a (simulated or real) quantum computer. Confidence intervals are computed with the asymptotic normal approximation of the Bernoulli distribution from which measurements are sampled.

A more general QFM that is not as hand-tailored as the one introduced in eq. 5 is the random Fourier features (RFFs) QFM that we proposed in Ref. (gonzalez2021classification). The RFF method consists of mapping data features to a finite-dimensional space where the inner product approximates a given kernel (rahimi2007rff). Such a map can be written as $\phi_{\mathrm{rff}}: \mathcal{X} \to \mathbb{R}^D$, where $D$ is some number of dimensions, such that $\phi_{\mathrm{rff}}(\mathbf{x})^{\mathsf{T}} \phi_{\mathrm{rff}}(\mathbf{y}) \approx k(\mathbf{x}, \mathbf{y})$, for some given shift-invariant kernel function $k$. This result is supported by Bochner's theorem (reed1975ii), which affirms that a shift-invariant kernel is related to a particular probability measure $p(\mathbf{w})$ through the Fourier transform. This allows us to write the $i$-th component of $\phi_{\mathrm{rff}}(\mathbf{x})$ as

$$\phi_{\mathrm{rff}}^{(i)}(\mathbf{x}) = \sqrt{2/D}\, \cos(\mathbf{w}_i^{\mathsf{T}} \mathbf{x} + b_i), \qquad (6)$$

where $\mathbf{w}_i$ is sampled from $p(\mathbf{w})$, and $b_i$ is sampled uniformly from $[0, 2\pi]$. Finally, the RFFs obtained through $\phi_{\mathrm{rff}}$ can be used to define a QFM, for instance, through a binarised amplitude encoding.
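A minimal numerical check of the RFF construction (our own sketch; `rff_map` is a hypothetical helper) confirms that the feature inner product approximates a Gaussian kernel $e^{-\gamma \|\mathbf{x} - \mathbf{y}\|^2}$, for which $p(\mathbf{w})$ is a zero-mean Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_map(X, D, gamma=1.0):
    # phi_i(x) = sqrt(2/D) cos(w_i . x + b_i), with w_i ~ N(0, 2*gamma*I)
    # and b_i ~ U[0, 2*pi]; then phi(x).phi(y) ~ exp(-gamma ||x - y||^2).
    d = X.shape[1]
    W = rng.normal(0.0, np.sqrt(2 * gamma), size=(D, d))
    b = rng.uniform(0.0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

X = np.array([[0.0], [0.5]])
Phi = rff_map(X, D=4000, gamma=1.0)
approx = Phi[0] @ Phi[1]
exact = np.exp(-1.0 * (0.0 - 0.5) ** 2)   # Gaussian kernel value
```

The Monte Carlo error of the approximation decays as $O(1/\sqrt{D})$, so a modest number of features already gives a usable kernel estimate.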

In the case of the 1D data shown in fig. 1(d), we can define the QFM through

$$|\psi(x)\rangle = \frac{1}{\|\phi_{\mathrm{rff}}(x)\|} \sum_{j=0}^{D-1} \phi_{\mathrm{rff}}^{(j)}(x)\, |j\rangle, \qquad (7)$$

where $j$ is the decimal representation of a bit string of length $n$. (In this work, $n = 3$; thus, $D = 2^n = 8$.) Remarkably, as we proved in (gonzalez2021learning), this technique enables the approximation of any probability distribution using the finite-dimensional density matrices at the core of the algorithm.
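The binarised amplitude encoding of eq. 7 amounts to normalising the $D = 2^n$ RFF components and reading them as the amplitudes of an $n$-qubit state; a minimal sketch (our illustration):

```python
import numpy as np

def rff_qfm_state(phi_x):
    """Amplitude-encode the D = 2^n RFF components: the j-th amplitude
    is phi_j(x), normalised so the state has unit norm (eq. 7)."""
    psi = np.asarray(phi_x, dtype=float)
    return psi / np.linalg.norm(psi)

phi = np.array([0.3, -0.1, 0.5, 0.2, 0.0, 0.4, -0.2, 0.1])  # D = 8, n = 3
psi = rff_qfm_state(phi)   # valid 3-qubit state vector
```

Because the amplitudes are normalised per point, the overlap of two such states reproduces (up to the norms) the RFF kernel approximation.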

We chose the map to approximate the Gaussian kernel with a given bandwidth parameter $\gamma$, such that $k(x, y) = e^{-\gamma (x - y)^2}$ (rahimi2008weighted). A total of eight RFFs were used, so that the circuit in fig. 1(c) consisted of three qubits.

In fig. 3 we show the density estimation carried out in three different ways. The three panels correspond to a classical simulation of the density estimation quantum circuit (left), a classical simulation of the noisy density estimation quantum circuit (middle), and the density estimation carried out on the IBM Lima quantum device (right). As in the classification case, we see that the noise model provided by IBM is far from simulating the actual behaviour of the quantum circuit.

5 Discussion

QFMs play a central role in this work, as they provide a solution to the problem of encoding classical data into quantum states of qubits. Nonetheless, a calculation of the complete state is required prior to physically encoding the classical data into the quantum computer. This fact alone endangers the algorithmic advantage of our proposal running on a quantum computer versus running on a classical computer, due to the easy classical access to the wave function entries of the data set state $|\psi_{\mathcal{D}}\rangle$ (cotler2021revisiting).

As we mentioned before, the preparation of $|\psi_{\mathcal{D}}\rangle$ on a quantum computer can be done using several arbitrary quantum state preparation methods. This is only done once. If the data set grows, so that a new state $|\psi_{\mathcal{D}'}\rangle$ needs to be prepared, one can consider the simpler problem of preparing $|\psi_{\mathcal{D}'}\rangle$ with $|\psi_{\mathcal{D}}\rangle$ as the initial state, instead of the usual $|0\cdots 0\rangle$ initial state (haug2021optimal).

The preparation of $|\psi_{\mathcal{D}}\rangle$ or $|\psi_*\rangle$ directly challenges the scalability of our proposal. In this work, we prepared the label states using one-hot encoding, which is a completely deterministic QFM requiring a constant number of gates. However, the preparation of $|\psi_{\mathcal{D}}\rangle$ requires exponentially many quantum gates as a function of the number of qubits of the QFM's target physical system (shende2006synthesis). We used such arbitrary state preparation algorithms in the experiments of this work for illustration; however, this procedure is not scalable. Instead, we can consider a parameterised quantum circuit $U(\mathbf{x}, \boldsymbol{\theta})$ that maps data and parameters into the angles of parameterised quantum gates. Then, by minimising $d\big(U(\mathbf{x}, \boldsymbol{\theta})|0\rangle, |\psi(\mathbf{x})\rangle\big)$, where $d$ is a distance (fidelity (rakyta2022efficient), the KL divergence of the probability distributions represented by the states (liu2018differentiable), classical shadows (li2021vsql; sack2022), among others), one is able to obtain a variational circuit that acts as a primitive to approximately apply QFMs to new data points without investing exponential resources.
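As a toy illustration of this variational idea (ours, not the paper's protocol), a single-qubit $R_y(\theta)$ circuit can be fitted to a target state by a gradient-free fidelity search over the parameter grid:

```python
import numpy as np

def ry_state(theta):
    # R_y(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])

def fit_state_prep(target, thetas=np.linspace(0, np.pi, 1001)):
    # Gradient-free search: maximise the fidelity |<target|R_y(theta)|0>|^2.
    fidelities = [np.abs(target @ ry_state(t)) ** 2 for t in thetas]
    return thetas[int(np.argmax(fidelities))]

target = np.array([np.cos(0.4), np.sin(0.4)])  # realisable with theta = 0.8
best = fit_state_prep(target)
```

In a realistic setting the grid search would be replaced by one of the gradient-free optimisers cited above, and the circuit by a multi-qubit ansatz, but the fitting objective is the same.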

This proposed setup would use the primitives for preparing the data set state $|\psi_{\mathcal{D}}\rangle$ and for preparing the quantum state $|\psi_*\rangle$ of a single data point to perform classification and density estimation, as shown in this paper. The numerical heavy lifting that is exponential in the number of qubits of the system would need to be done just once, when preparing the circuit primitives. Classifying or estimating the density of a new data sample would then involve just the evaluation of the primitive circuits. Of course, the feasibility of using this method for large-scale quantum machine learning is subject to the progress of training parameterised quantum circuits, which amounts to overcoming the barren plateau problem (sack2022; haug2021optimal; Sim2021adaptive; zhu2019training; Grant2019initialization).

6 Conclusions

In this work, we implemented a method (gonzalez2021classification; gonzalez2021learning) to perform data classification and density estimation on quantum circuits. This was achieved through the deterministic preparation of a quantum state that represents the information contained in a classical training data set, and of a quantum state that represents the information of a single point to be classified or whose probability density is to be estimated. These quantum states are obtained by applying a quantum feature map to classical data points and are prepared using arbitrary quantum state preparation algorithms.

One of the outstanding advantages of this method is the ability to compress the probability distribution of arbitrarily large training data sets into finite-dimensional quantum states. We demonstrated classification and density estimation with toy data sets using quantum circuits of three qubits. We confirmed that the method's performance on real quantum devices suffered from decoherence, as expected. However, the noise models provided by IBM's Qiskit are far from describing the actual behaviour of the quantum device for the applications we explored. This shows that, even though the theory of open quantum systems is well established, its practical application to large quantum systems remains a challenge. Thus, our work adds to the experimental evidence that more effective noise models are needed to simulate decoherence in quantum circuits.

Regarding possible quantum advantages, we acknowledge that the preparation of arbitrary quantum states can degrade the performance of our method. Nonetheless, the exponential effort needed to prepare the quantum state of the training data set needs to be spent only once. Furthermore, we argued that the effort to prepare the quantum state of a new data point (to be classified or whose probability density is to be estimated) could also be spent only once, by training a variational quantum circuit that performs the desired quantum feature map on an arbitrary input. However, the feasibility of this alternative is subject to the advance of methods to train variational quantum circuits while avoiding the barren plateau problem.

Statements and Declarations

  • Competing Interests: The authors declare no competing interests.