I Introduction
Machine learning has revolutionized the modern world. Today, machine learning models are leveraged for nearly every imaginable task, ranging from medical diagnoses [Myszczynska2020] to fraud detection [Awoyemi2017] to marketing [Sterne2017]. The emergence of machine learning across many disciplines is due in large part to the recent accessibility of relatively powerful computers. In accordance with Moore’s Law, computer hardware has improved exponentially in scale and speed over the past 60 years. Unfortunately, despite all of the recent success, modern hardware still greatly restricts the practicality of certain machine learning models. Machine learning, and deep learning in particular, can be very computationally expensive, sometimes requiring hours, days, or even months of training time on today’s computers [Thompson2020]. Moreover, conventional computers are beginning to approach physical limitations that will slow their improvements in years to come [Peper2017]. For these reasons, many are beginning to research alternative computing platforms for training machine learning models. Among these platforms, quantum computers have emerged as a particularly interesting candidate.
The appeal of a quantum computer is largely due to the properties of quantum entanglement and quantum superposition, which cannot be efficiently simulated on a classical computer. These properties can be extremely useful, as illustrated by Shor’s prime factorization algorithm [Shor1999] and Grover’s search algorithm [Grover1996], which offer an exponential and a polynomial (quadratic) speedup, respectively, over their best existing classical counterparts. These two algorithms give a sense of what large, high-fidelity quantum computers may offer to the field of computer science in years to come. Today we are still in the era of noisy intermediate-scale quantum (NISQ) computers, but already a number of quantum machine learning algorithms have been proposed [Biamonte2017, Ciliberto2018]. We have previously studied quantum approaches to linear regression [Date2020LinReg, Date2021QUBOFormulations] and balanced $k$-means clustering [Arthur2020balanced]. Additionally, we proposed a quantum learning model that can be used for binary classification on universal quantum computers [Date2020QuantumDiscriminator]. In this paper, we propose a hybrid quantum-classical neural network architecture and empirically analyze its performance on several binary classification data sets. To the best of our knowledge, our study is the first to test this hybrid neural network architecture on simulated or actual quantum hardware.

Neural networks have proven to be successful for learning on conventional computers, but they have some serious limitations. They are prone to overfitting [Hawkins2004], and training even a small neural network is an NP-complete problem [Blum1992]. These limitations have inspired many to propose quantum approaches to deep learning [Schuld2015, Wan2017, Killoran2019, Beer2020, Zoufal2019, Kamruzzaman2019, Garg2020]. Unfortunately, many of these proposals cannot be implemented on modern hardware, and those that can do not have a one-to-one correspondence with conventional artificial neural networks. For this reason, there is an increased interest in deep learning approaches that perform well on near-term quantum computers. To this end, variational quantum circuits (VQCs) have proven to be a promising quantum analogue to artificial neurons [Cerezo2021, Abbas2021, Benedetti2019, Broughton2020, Sim2019, Hubregtsen2021, Chen2020]. Variational quantum circuits can be trained using classical optimization techniques, and they are believed to have some expressibility advantages over conventional neural network architectures [Cerezo2021]. Liu et al. have implemented their own hybrid quantum-classical neural network using variational quantum circuits, and their study provides detailed insight into the training dynamics of models similar to ours [liu2021representation].
II Quantum Neural Networks
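A variational quantum circuit is composed of three key components. First, a feature map $F$ maps a real-valued classical data point $\mathbf{x}$ into an $n$-qubit quantum state $|\psi(\mathbf{x})\rangle$:
$$|\psi(\mathbf{x})\rangle = F(\mathbf{x})\,|0\rangle^{\otimes n} \qquad (1)$$
Next, an ansatz $A$ manipulates the prepared quantum state through a series of entanglements and rotation gates. The angles of the ansatz’s rotations are parameterized by a vector $\boldsymbol{\theta}$:
$$|\psi(\mathbf{x}, \boldsymbol{\theta})\rangle = A(\boldsymbol{\theta})\,|\psi(\mathbf{x})\rangle \qquad (2)$$
Finally, an observable $\hat{O}$ is measured, and the eigenvalue corresponding to the resultant quantum state is recorded. In most machine learning applications, a variational quantum circuit is run many times using a particular input $\mathbf{x}$ and parameter vector $\boldsymbol{\theta}$ so that the circuit’s expectation value, denoted by $f(\mathbf{x}, \boldsymbol{\theta})$, can be approximated:
$$f(\mathbf{x}, \boldsymbol{\theta}) = \langle\psi(\mathbf{x}, \boldsymbol{\theta})|\,\hat{O}\,|\psi(\mathbf{x}, \boldsymbol{\theta})\rangle \qquad (3)$$
When a variational quantum circuit is used for machine learning, this approximated expectation value is typically treated as the output of the model.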
The feature map of a variational quantum circuit is known to play some role in the expressiveness of the model [Schuld2021FeatureMap]. In general, data should be encoded in such a way that the value of a feature can be extracted from the prepared quantum state through some combination of qubit rotations and measurements. This ensures each possible input has a unique qubit encoding before being passed to the ansatz. On modern hardware, it is also important to use a feature map with limited depth, since each additional gate introduces noise to the quantum state. We satisfy both requirements by scaling each feature to fit within the interval and then encoding its value into the relative amplitude of a corresponding qubit:
(4) 
It is worth noting that more sophisticated feature maps exist [Goto2020, lloyd2020featureencoding, Yano2020]. However, for the data sets analyzed in this manuscript, this straightforward feature map achieves high accuracy while avoiding many of the complications introduced by more complex methods.
The Qiskit circuit library contains several ansatzes consisting of two-qubit entanglements and parameterized single-qubit rotations. We chose the RealAmplitudes ansatz (with one repetition and full entanglement) for each variational quantum circuit studied in Section IV. This is the default ansatz used by Qiskit’s variational quantum circuit implementation (TwoLayerQNN). It has also been used in a variational quantum circuit with proven advantages over traditional feedforward neural networks in terms of both capacity and trainability [Abbas2021]. We also chose the default observable used by Qiskit’s variational quantum circuit implementation. Mathematically, this observable can be described as the tensor product of $n$ Pauli-Z matrices ($\sigma_z$), where $n$ is the number of qubits in the quantum state:
$$\hat{O} = \sigma_z^{\otimes n} \qquad (5)$$
This observable has the interesting property that if the measured quantum state has odd parity, the recorded eigenvalue is $-1$, and if the measured quantum state has even parity, the recorded eigenvalue is $+1$. This means that the expectation value of the circuit will always be within the interval $[-1, 1]$.

A number of studies have used a variational quantum circuit for binary classification [Schuld2020BinaryClassification, Chen2021, Chen2020HybridClassifier, Farhi2018classification, Mitarai2018]. This can be done by relating the expectation value of the circuit to the probability that a point belongs to a given class. Consider a binary classification problem in which each data point
is labeled or . We use the following equation to relate the expectation value of the parity observable to the probability a point is labelled :(6) 
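As a rough illustration of how the parity observable turns measurement outcomes into a model output, the sketch below estimates the expectation value from a dictionary of bitstring counts and then maps it to a class probability. The counts are hypothetical, and the linear map from $[-1, 1]$ to $[0, 1]$ is an assumption made for illustration; the exact form of Equation 6 is not reproduced here.

```python
def parity_eigenvalue(bitstring):
    """Eigenvalue of the all-Pauli-Z observable: +1 for even parity, -1 for odd."""
    return -1 if bitstring.count("1") % 2 else 1


def expectation_from_counts(counts):
    """Average the parity eigenvalue over all recorded shots."""
    shots = sum(counts.values())
    return sum(parity_eigenvalue(b) * c for b, c in counts.items()) / shots


def probability_from_expectation(f):
    # ASSUMPTION: a linear rescaling of [-1, 1] onto [0, 1]; the paper's
    # Equation 6 may use a different (for example, sign-flipped) convention.
    return (1.0 + f) / 2.0


# Hypothetical outcome of running a 2-qubit circuit 1024 times.
counts = {"00": 512, "01": 128, "10": 128, "11": 256}
f = expectation_from_counts(counts)
p = probability_from_expectation(f)
print(f, p)
```

With more shots the estimate of the expectation value becomes less noisy, at the price of longer run time per input.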
Training the variational quantum circuit classifier amounts to determining a parameter vector that minimizes the negative log-likelihood of the probability distribution over the training data set. The exact cost function used by our binary classifier is given by the equation below:
(7) 
where is the number of points in the training data set, is the th data point in the training set, and is the label of the th point.
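A minimal sketch of a negative log-likelihood cost for a binary classifier follows. It assumes labels in {0, 1}, a probability of the form (1 + f) / 2, and averaging over the training points; these are illustrative assumptions, not necessarily the exact form of Equation 7.

```python
import numpy as np


def negative_log_likelihood(expectations, labels, eps=1e-12):
    """Mean negative log-likelihood of Bernoulli labels given circuit outputs."""
    f = np.asarray(expectations, dtype=float)
    y = np.asarray(labels, dtype=float)
    # ASSUMPTION: P(y = 1 | x) = (1 + f) / 2, clipped to avoid log(0).
    p = np.clip((1.0 + f) / 2.0, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


# An uninformative model (f = 0 everywhere) gives a cost of ln 2.
cost_uninformative = negative_log_likelihood([0.0, 0.0], [0, 1])
# Confident, correct outputs give a much lower cost.
cost_confident = negative_log_likelihood([-0.9, 0.9], [0, 1])
print(cost_uninformative, cost_confident)
```

Under these assumptions a model that outputs zero for every input has a cost of about 0.69, which gives a useful reference point when reading cost values.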
The cost can be minimized using a classical optimizer such as gradient descent. When computing the gradient of the cost function, the derivative of the expectation value of the variational quantum circuit with respect to each parameter of the ansatz is computed using the parameter shift rule [Crooks2019]:
(8) 
where is a macroscopic shift determined by the eigenvalues of the gate parameterized by . For all of the rotation and phase gates available in the Qiskit library, .
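The parameter shift rule can be checked numerically on a one-qubit example. The sketch below simulates a single RY rotation followed by a Pauli-Z measurement with NumPy, so the expectation value is cos(theta) and the exact derivative is -sin(theta). The shift of pi/2 is the standard choice for rotation gates and is assumed here; the function names are illustrative.

```python
import numpy as np

Z = np.diag([1.0, -1.0])


def ry(theta):
    """Matrix of a single-qubit RY rotation."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def expectation(theta):
    """<0| RY(theta)^T Z RY(theta) |0>, which equals cos(theta)."""
    state = ry(theta) @ np.array([1.0, 0.0])
    return float(state @ Z @ state)


def parameter_shift_gradient(fn, theta, shift=np.pi / 2):
    # Two evaluations at macroscopically shifted parameters give the exact
    # derivative for gates whose generator has eigenvalues +/- 1/2.
    return (fn(theta + shift) - fn(theta - shift)) / (2 * np.sin(shift))


theta = 0.7
grad = parameter_shift_gradient(expectation, theta)
print(grad, -np.sin(theta))
```

Unlike a finite-difference estimate, the two circuit evaluations are far apart in parameter space, which makes the estimate less sensitive to shot noise.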
In many ways, the aforementioned variational quantum circuit classifier resembles a logistic unit used in a conventional neural network. The circuit has an input vector and a set of classically optimizable parameters. Additionally, the output (expected value) of the variational quantum circuit is continuously differentiable and bound to a small range of real values. These similarities motivated us to construct a small hybrid quantum-classical feedforward neural network using variational quantum circuits as individual neurons. To achieve reasonable training times on modern quantum hardware, we restricted the neural network architecture to contain only a single hidden layer and a single output unit.
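Using the architecture shown in Figure 2, our hybrid neural network contains $m + 1$ variational quantum circuits. The first $m$ circuits comprise the hidden layer of the feedforward network. Each of these circuits has its own parameter vector $\boldsymbol{\theta}_j$, and they all share the same input $\mathbf{x}$. The output of each circuit in the hidden layer is stacked to create an $m$-dimensional vector $\mathbf{h}$. Collectively, we denote the circuits of the hidden layer as a function $H$, where $f(\mathbf{x}, \boldsymbol{\theta}_j)$ is the expectation value of the $j$th circuit:
$$\mathbf{h} = H(\mathbf{x}) = \left[f(\mathbf{x}, \boldsymbol{\theta}_1), \ldots, f(\mathbf{x}, \boldsymbol{\theta}_m)\right]^T \qquad (9)$$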
Before is passed to the feature map of the final variational quantum circuit, a transformation is applied so that each value is within the interval :
(10) 
The last variational quantum circuit is run with input and parameter vector . We denote this quantum circuit as a function . All together, the hybrid neural network is expressed by the following composite function:
(11) 
The output of the hybrid neural network can be used to learn the probability distribution that a point is labeled using the same method described by Equation 6:
(12) 
Training the hybrid neural network on binary classification problems is similar to training the individual variational quantum circuit classifier. We minimize the same cost function (given by Equation 7), and we still use the parameter shift rule to compute the gradient of each circuit. Now, however, we must compute the gradient of the cost function with respect to the output layer’s parameter vector as well as each parameter vector in the hidden layer. This is performed most efficiently using backpropagation.
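To make the composition of circuits concrete, the following NumPy sketch simulates the forward pass of a small hybrid network with an exact statevector. It is not the Qiskit implementation used in the experiments. The rotation angle pi * x in the feature map, the rescaling (h + 1) / 2 between layers, and the initialization range are assumptions chosen for illustration, and all function names are hypothetical.

```python
import numpy as np


def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def apply_1q(state, gate, qubit):
    """Apply a single-qubit gate to one axis of the state tensor."""
    state = np.tensordot(gate, state, axes=([1], [qubit]))
    return np.moveaxis(state, 0, qubit)


def apply_cx(state, control, target):
    """Flip the target axis on the slice where the control qubit is 1."""
    state = state.copy()
    idx = [slice(None)] * state.ndim
    idx[control] = 1
    axis = target if target < control else target - 1
    state[tuple(idx)] = np.flip(state[tuple(idx)], axis=axis).copy()
    return state


def vqc(x, theta):
    """Parity expectation of one circuit: feature map, ansatz, measurement."""
    n = len(x)
    state = np.zeros((2,) * n)
    state[(0,) * n] = 1.0
    for q in range(n):  # feature map; ASSUMED angle pi * x[q], x[q] in [0, 1]
        state = apply_1q(state, ry(np.pi * x[q]), q)
    for q in range(n):  # first layer of parameterized RY gates
        state = apply_1q(state, ry(theta[q]), q)
    for c in range(n):  # CX on every pair of qubits
        for t in range(c + 1, n):
            state = apply_cx(state, c, t)
    for q in range(n):  # second layer of parameterized RY gates
        state = apply_1q(state, ry(theta[n + q]), q)
    probs = state.ravel() ** 2  # amplitudes stay real for RY and CX gates
    signs = np.array([(-1) ** bin(i).count("1") for i in range(2 ** n)])
    return float(probs @ signs)


def hnn(x, hidden_thetas, output_theta):
    """Hidden circuits share the input; their outputs feed the output circuit."""
    h = np.array([vqc(x, th) for th in hidden_thetas])
    z = (h + 1.0) / 2.0  # ASSUMED rescaling of [-1, 1] onto [0, 1]
    return vqc(z, output_theta)


rng = np.random.default_rng(0)
x = np.array([0.1, 0.4, 0.7, 0.9])  # four features, so four qubits
hidden_thetas = rng.uniform(0, 2 * np.pi, size=(2, 8))  # hypothetical init range
output_theta = rng.uniform(0, 2 * np.pi, size=4)
y_hat = hnn(x, hidden_thetas, output_theta)
n_params = hidden_thetas.size + output_theta.size
print(y_hat, n_params)
```

With four input features and two hidden circuits this sketch has 2 * 8 + 4 = 20 trainable parameters. Gradients for training could be obtained by applying the parameter shift rule to each circuit and combining the results with the chain rule.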
III Methods
III.1 Amplitude Encoding Feature Map
The amplitude encoding feature map is implemented by first initializing a quantum register of qubits, each in the state. Next, a single qubit parameterized RY gate is applied to each qubit. The parameters of each gate are chosen such that the th qubit is rotated by an angle . The probability of measuring the th qubit in the state after the feature map is applied is given by the following equation:
(13) 
Note that if is restricted to the interval , is a one-to-one function with a range spanning all possible probabilities from 0 to 1. This feature map guarantees that each unique input will have a unique quantum encoding without requiring a large number of quantum gates.
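The one-to-one property can be checked numerically for a single qubit. In the sketch below, the feature is assumed to lie in [0, 1] and to be encoded with a rotation angle of pi * x; both are assumptions made for illustration, since other scalings with the same property exist.

```python
import numpy as np


def encode_feature(x):
    """Amplitudes of RY(angle)|0> for one feature."""
    angle = np.pi * x  # ASSUMPTION: feature in [0, 1], rotation angle pi * x
    return np.array([np.cos(angle / 2), np.sin(angle / 2)])


def prob_one(x):
    """Probability of measuring the qubit in the |1> state."""
    return float(encode_feature(x)[1] ** 2)


xs = np.linspace(0.0, 1.0, 11)
ps = np.array([prob_one(x) for x in xs])
print(ps)
```

Because the probability increases strictly with the feature value over the assumed interval, no two distinct feature values share an encoding.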
III.2 Real Amplitudes Ansatz
The RealAmplitudes ansatz from the Qiskit library consists entirely of single-qubit RY rotation gates and two-qubit CX entanglement gates. First, a parameterized RY gate is applied to each qubit. The parameters of these RY gates are the first $n$ parameters of the ansatz, where $n$ is the number of qubits. Next, two-qubit CX gates are applied to each possible pair of qubits in the quantum state. By convention, the least significant qubit is used as the control bit each time. Finally, each qubit is subject to another parameterized RY gate. The parameters of these gates are the second $n$ parameters of the ansatz. Additional rounds of CX entanglements and RY rotations can be added to the ansatz by adjusting the number of repetitions. However, to avoid long training times, we used one repetition for each variational quantum circuit. With this specification, the RealAmplitudes ansatz always has exactly $2n$ parameters.
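The gate sequence described above can be written down as a plain list, which makes the parameter count easy to verify. This is only a bookkeeping sketch with hypothetical helper names; it does not call Qiskit or build a real circuit.

```python
from itertools import combinations


def real_amplitudes_layout(n_qubits):
    """List the gates of a one-repetition, fully entangled RY/CX ansatz."""
    gates = [("ry", q, f"theta[{q}]") for q in range(n_qubits)]
    # One CX per pair of qubits, with the lower-index qubit as control.
    gates += [("cx", c, t) for c, t in combinations(range(n_qubits), 2)]
    gates += [("ry", q, f"theta[{n_qubits + q}]") for q in range(n_qubits)]
    return gates


def count_parameters(gates):
    """Each RY gate carries exactly one trainable angle."""
    return sum(1 for gate in gates if gate[0] == "ry")


layout = real_amplitudes_layout(4)
print(count_parameters(layout), len(layout))
```

For 2 and 4 qubits this gives 4 and 8 parameters, matching the parameter counts listed for the individual circuits in the results table.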
III.3 Preprocessing Real Valued Data
As mentioned in Section III.1, we require each feature to be within the interval before passing it to the feature map. For Bars and Stripes, this is not an issue since all features have a binary value. Alternatively, the two real valued data sets must be modified before training since both include many points with feature values outside of the desired range. We prepared these data sets for the variational quantum circuit using the following procedure:

Scale the data set so that each feature has a mean of 0 and a variance of 1 using scikit-learn’s StandardScaler class.

Divide the modified data set by its feature with the largest absolute value. Now all features in the data set have a value between −1 and 1.

Multiply the modified data set by .

Add to each feature of the modified data set.
After this procedure is performed, each feature in the modified data set will fall within the interval [], ensuring that every unique input will have a unique quantum encoding.
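The four steps above can be sketched with NumPy alone. The target interval is exposed as the hypothetical parameters `low` and `high`, defaulting to [0, 1] as an assumption; the multiplier and offset in the last two steps follow from that choice. The standardization step mirrors what scikit-learn's StandardScaler computes for features with nonzero variance.

```python
import numpy as np


def preprocess(X, low=0.0, high=1.0):
    """Standardize, squeeze into [-1, 1], then shift and scale into [low, high]."""
    X = np.asarray(X, dtype=float)
    # Step 1: zero mean and unit variance per feature (no constant features assumed).
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: divide by the single largest absolute value in the data set.
    X = X / np.abs(X).max()
    # Steps 3 and 4: ASSUMED multiplier and offset for the chosen interval.
    half_width = (high - low) / 2.0
    return X * half_width + (low + half_width)


rng = np.random.default_rng(0)
raw = rng.normal(size=(50, 4)) * [1.0, 5.0, 0.1, 3.0] + [10.0, -2.0, 0.0, 4.0]
scaled = preprocess(raw)
print(scaled.min(), scaled.max())
```

Because the division in step 2 uses one global maximum, only the most extreme value reaches an endpoint of the interval, and the relative spacing of all other values is preserved.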
III.4 Hardware and Job Specifications
During the simulated quantum trials, each quantum circuit was run using the IBM QASM simulator. During the actual quantum trials, each quantum circuit was run using the IBM Mumbai quantum computer or the IBM Montreal quantum computer. Both quantum computers have 27 qubits and a quantum volume of 128. They also both use CX, ID, RZ, SX, and X gates. When our results were compiled, the average CNOT error of IBM Mumbai was , and the average readout error was . The average CNOT error of IBM Montreal was , and the average readout error was . To determine the output (expectation value) of a variational quantum circuit on a particular input, we ran the circuit 1024 times and then averaged the result of each run. Each job was initialized and sent to the quantum computer using a personal laptop with a 2.7 GHz DualCore Intel i5 processor and 8 GB 1,867 MHz DDR3 memory. This laptop was also used to process the results of each job and optimize model parameters accordingly.
IV Results
Table 1: Results on the Bars and Stripes data set (BAS), a synthetic two dimensional data set (synth), and a subset of the iris data set (iris). Accuracy is given in percent. Each accuracy and cost entry lists median / avg. / std., where average values are denoted by avg. and the corresponding standard deviation is denoted by std.

hardware | data | model | parameters | in-sample accuracy | in-sample cost | out-of-sample accuracy | out-of-sample cost
simulated | BAS | VQC | 8 | 100.0 / 88.89 / 12.42 | 0.55 / 0.54 / 0.04 | N/A | N/A
simulated | BAS | HNN | 20 | 100.0 / 100.0 / 0.0 | 0.33 / 0.35 / 0.07 | N/A | N/A
simulated | synth | VQC | 4 | 97.5 / 85.5 / 18.34 | 0.37 / 0.46 / 0.14 | 100.0 / 86.5 / 20.13 | 0.35 / 0.43 / 0.15
simulated | synth | HNN | 12 | 97.5 / 93.88 / 9.0 | 0.29 / 0.33 / 0.13 | 97.5 / 94.5 / 8.79 | 0.25 / 0.29 / 0.14
simulated | iris | VQC | 8 | 88.12 / 81.5 / 14.37 | 0.45 / 0.48 / 0.12 | 87.5 / 82.5 / 17.92 | 0.44 / 0.48 / 0.12
simulated | iris | HNN | 20 | 91.25 / 89.88 / 4.24 | 0.37 / 0.39 / 0.09 | 95.0 / 91.5 / 9.23 | 0.38 / 0.39 / 0.10
quantum | BAS | VQC | 8 | 50.0 / 50.0 / 0.0 | 0.71 / 0.71 / 0.01 | N/A | N/A
quantum | BAS | HNN | 20 | 25.0 / 33.33 / 11.79 | 0.71 / 0.72 / 0.11 | N/A | N/A
quantum | synth | VQC | 4 | 96.25 / 82.92 / 20.65 | 0.38 / 0.46 / 0.13 | 95.0 / 90.0 / 10.8 | 0.35 / 0.4 / 0.1
quantum | synth | HNN | 12 | 96.25 / 95.0 / 3.68 | 0.26 / 0.31 / 0.07 | 100.0 / 95.0 / 7.07 | 0.23 / 0.27 / 0.06
quantum | iris | VQC | 8 | 45.0 / 45.0 / 5.0 | 0.79 / 0.79 / 0.07 | 47.5 / 47.5 / 7.5 | 0.75 / 0.75 / 0.03
quantum | iris | HNN | 20 | 28.12 / 28.12 / 20.62 | 0.92 / 0.92 / 0.21 | 37.5 / 37.5 / 17.5 | 0.95 / 0.95 / 0.24
We tested the hybrid neural network on three binary classification data sets. As a point of comparison, we also trained an individual variational quantum circuit classifier on each of these data sets. We trained both models on a simulated universal quantum computer and a state-of-the-art universal quantum computer. To achieve reasonable training times, we restricted the hybrid neural network to use only 2 hidden neurons. On all three data sets, 10 simulated quantum trials and 2 to 3 actual quantum trials were performed for both quantum models. In each trial, all ansatz parameters were randomly initialized using a uniform distribution with a range of .

On modern hardware, the VQC and HNN do not offer any training time advantage over classical machine learning models. In fact, it is always possible to construct a classical multilayer perceptron that requires substantially shorter training times while achieving equal or better accuracy. Since it is unclear how quantum training time will change as quantum hardware evolves, we did not report the training times from our experiments. In general, even on the smallest data set we tested, training the VQC model can take over 30 minutes. This training time is dominated by the time required to prepare and run each quantum circuit on the quantum computer. Since the HNN model is composed of multiple VQC units, the training time of the HNN is larger than the training time of the VQC. On our test data sets, we found that the training time of the HNN was 3 to 5 times larger than the training time of the VQC.
IV.1 Bars and Stripes
Bars and Stripes is a synthetic data set of binary black and white images. Each image in the data set is either a “bar” or a “stripe.” For $n \times n$ images, a “bar” has 1 to $n - 1$ horizontal rows highlighted in black, and a “stripe” has 1 to $n - 1$ vertical columns highlighted in black. In some variations of Bars and Stripes, an entirely white image and an entirely black image are also included. We do not include these two images since their classification is ambiguous. Overall, the data set we use contains 4 images. Of these images, 2 are stripes and 2 are bars. An image of the Bars and Stripes data set is depicted in Figure 4.
We trained the variational quantum circuit classifier and hybrid quantum-classical neural network on the $2 \times 2$ Bars and Stripes data set using 20 epochs of batch gradient descent with a learning rate of 0.5. All 4 points in the data set were used for training. The results of the simulated trials are reported in Table 1 and Figure 5. The results of the quantum trials are reported in Table 1.

On simulated hardware, the HNN correctly classified every point in each trial, while the VQC occasionally misclassified one or more points. Nevertheless, both models achieved high accuracy on average. The slight difference in the average accuracy of each model may indicate that the HNN architecture is more resilient to unfavorable parameter initialization. On quantum hardware, both models performed poorly. The number of required qubits and gates in each model is proportional to the dimension of the data set, so it may be the case that the quantum circuits used for this data set were too large to be accurately performed on modern hardware.
IV.2 Synthetic Data
We also trained the individual variational quantum circuit and hybrid quantum-classical neural network on a two dimensional, linearly separable data set generated using scikit-learn’s make_blobs() function. This synthetic data set consisted of 100 data points split evenly between each class. In each experiment, 80 of the 100 data points were chosen at random to be used for training. Training consisted of 10 epochs of minibatch gradient descent using a batch size of 16 points and a learning rate of 0.1. The results of the simulated trials are reported in Table 1 and Figure 6. The results of the quantum trials are reported in Table 1.
As with the Bars and Stripes data set, the HNN and VQC both achieved high accuracy on simulated hardware. The HNN had higher average accuracy than the VQC by roughly 10 percent. Additionally, the HNN had a lower average cost than the VQC by over 30 percent. This time, both models also achieved high accuracy on actual quantum hardware. This is unsurprising since the synthetic data set is two dimensional, meaning far fewer qubits and logic gates are required in each quantum circuit.
Since the synthetic data set is two dimensional, it is possible to visualize the classification line and probability distribution learned by each quantum model. In Figure 7, we have plotted this information for one of the variational quantum circuit trials and one of the hybrid neural network trials. The two examples chosen were selected because their final accuracy and final cost value were reflective of other trials of the same model type. Additionally, both examples had roughly 50% accuracy before training.
IV.3 Iris
Finally, we trained the individual variational quantum circuit and hybrid quantum-classical neural network on a subset of the iris benchmark data set. The iris data set consists of 150 samples split evenly among 3 species of iris. Each iris sample is represented by four real valued features (sepal length, sepal width, petal length, petal width). We tested both models on the 100 samples corresponding to the iris versicolor and iris virginica species, whose sample points are not linearly separable. In each experiment, 80 of the 100 data points were chosen at random to be used for training. Training consisted of 10 epochs of minibatch gradient descent using a batch size of 16 points and a learning rate of 0.1. The results of the simulated trials are reported in Table 1 and Figure 8. The results of the quantum trials are reported in Table 1.
On average, the HNN achieved roughly 10 percent better accuracy than the VQC when simulated quantum hardware was used. The HNN also achieved an average cost value approximately 20% less than the VQC. Unfortunately, like the Bars and Stripes data set, both models performed extremely poorly when quantum hardware was used. Again, we suspect the decline in performance is due to the fact that the iris data set is four dimensional.
V Conclusion
On simulated hardware, the hybrid quantum-classical neural network outperformed the individual variational quantum circuit on average on every data set in terms of both accuracy and cost. Specifically, the average accuracy was 8 to 11 percent higher, and the average cost was 20 to 40 percent lower. Notably, the advantages achieved by the hybrid neural network were observed on both the training data set and the test data set. This suggests that they were not a product of overfitting. The learned Bernoulli distributions illustrated in Figure 7 give some indication of why the hybrid quantum-classical neural network achieves better performance. The neural network is able to produce a probability distribution with a much steeper gradient near the classification line. This enables the neural network to classify points with greater certainty than the individual variational quantum circuit, which in turn helps minimize cost.

It is not overwhelmingly surprising that the hybrid neural network is more expressive than the variational quantum circuit classifier since it has more than twice as many parameters. Nevertheless, increasing the number of parameters of a machine learning model does not always guarantee better results, especially when data points outside of the training set are considered. At the very least, the proposed hybrid neural network architecture illustrates one effective way to add parameters to a quantum machine learning model. Some measures indicate that variational quantum circuits are more expressive than classical neural network architectures [Cerezo2021]. Our proposed hybrid quantum-classical neural network architecture illustrates one approach to capitalize on these advantages when tackling more challenging machine learning tasks.
Notably, when quantum hardware was used, the variational quantum circuit classifier and the hybrid neural network both performed extremely poorly on the iris data set and the Bars and Stripes data set. This is likely because the number of qubits and the number of required gates are proportional to the dimension of the data set. Increasing the number of qubits or increasing the number of gates adversely impacts the fidelity of modern quantum computation. Future research may investigate using a more sophisticated feature map or ansatz within the variational quantum circuits used by the neural network. Additionally, a more in-depth study into hyperparameter optimization of the learning rate and batch size may prove useful for improving results on modern quantum hardware. Finally, larger and more complex hybrid neural network architectures may be investigated on more challenging classification problems.
Acknowledgements
This manuscript has been authored in part by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. This work was funded in part by the DOE Office of Science, High-Energy Physics QuantISED program. This work was funded in part by the DOE Office of Science, Advanced Scientific Computing Research (ASCR) program.