1 Introduction
Quantum computers use the principles of quantum mechanics for computing and are more powerful than classical computers on many computational problems [24, 5]. Noisy intermediate-scale quantum (NISQ) [19] devices will be the only quantum devices available in the near term, in which only a limited number of qubits can be used without error correction. Developing NISQ algorithms is therefore a new challenge.
In this paper, we focus on quantum machine learning. Many quantum machine learning algorithms, such as qSVM, qPCA, and the quantum Boltzmann machine, have been developed [27, 22, 3, 20, 14, 1, 4], and these algorithms were shown to be more efficient than their classical counterparts. Recently, several NISQ quantum machine learning algorithms, such as QuGAN, QCBM, and quantum kernel methods, have been proposed [15, 12, 21, 6, 2]. However, these algorithms did not aim to build quantum deep neural networks. In recent years, the deep neural network (DNN) [10]
became the most important and powerful method in machine learning and has been widely applied in computer vision [26, 25] and many other fields. The basic unit of a DNN is the perceptron, an affine transform followed by an activation function. The nonlinearity of the activation function and the depth of the network give the DNN its representation power. Approaches have been proposed to build classical DNNs on quantum computers [8, 29, 7]. They achieve quantum speedups under certain assumptions, but the structure of classical DNNs is retained, since only certain operations are sped up by quantum subroutines, for instance, the inner product via the swap test [29]. In this paper, we introduce the first quantum analog of the classical DNN, which consists of fully quantum structured layers with more representation power than the classical DNN while keeping its advantages, such as nonlinear activations, the multilayer structure, and the efficient backpropagation training algorithm.
The main contribution of this paper is to introduce the concept of a quantum neural network layer (QNNL) as a quantum analog of the classical neural network layer in a DNN. As all quantum gates are unitary and hence linear, the main difficulty in building a QNNL is introducing nonlinearity. We solve this problem by encoding the input vector into a quantum state nonlinearly with a PQC. A QNNL is a quantum circuit and is thus totally different from a classical neural network layer. A quantum DNN (QDNN) can then easily be built from QNNLs, since the input and output of a QNNL are classical values.
The advantage of introducing QNNLs is that we can access vectors in exponential-dimensional Hilbert spaces with only polynomial resources on a quantum computer. We prove that this model cannot be simulated classically in polynomial time unless universal quantum computing can. So QDNNs have more representation power than classical DNNs. We also give training algorithms for QDNNs that are similar to the backpropagation (BP) algorithm. Moreover, QNNLs use the hybrid quantum-classical scheme, so a QDNN of reasonable size can be trained efficiently on NISQ processors. Finally, a numerical experiment on an image classification task is given using QDNNs, where high accuracy is achieved.
We finally remark that any task using a DNN can be turned into a quantum algorithm with more representation power by replacing the DNN with a QDNN.
2 Hybrid quantum-classical algorithms
The hybrid quantum-classical algorithm scheme [17] consists of a quantum part and a classical part. In the quantum part, parametric quantum circuits (PQCs) are used to prepare quantum states on quantum processors. In the classical part, classical computers are used to optimize the parameters of the PQCs.
2.1 Hybrid quantum-classical scheme based on PQCs
PQCs are quantum circuits with parametric gates. In general, a PQC is of the form

$$U(\vec{\theta}) = U_N(\theta_N) \cdots U_2(\theta_2)\, U_1(\theta_1),$$

where $\vec{\theta} = (\theta_1, \dots, \theta_N)$ are the parameters, each $U_j(\theta_j) = e^{-i\theta_j H_j/2}$ is a rotation gate, and $H_j$ is a 1-qubit or 2-qubit gate such that $H_j^2 = I$. For example, in this paper we will use the Pauli gates $X$, $Y$, $Z$ and the CNOT gate.
In practical tasks such as VQE [13] and quantum machine learning [12], we want to find a quantum state with certain desired properties. This can be done in the following three steps based on the hybrid quantum-classical scheme. First, choose an appropriate ansatz, that is, design the circuit structure of a PQC $U(\vec{\theta})$; all parameters are initialized randomly. Then apply this PQC to a fixed initial state $|\psi_0\rangle$, for instance $|0\cdots 0\rangle$. Second, by measuring the final state $U(\vec{\theta})|\psi_0\rangle$ repeatedly, we can estimate the expectation value

$$\langle H \rangle(\vec{\theta}) = \langle \psi_0 |\, U^\dagger(\vec{\theta})\, H\, U(\vec{\theta})\, | \psi_0 \rangle$$

for a Hamiltonian $H$. The Hamiltonian $H$ is designed differently in different tasks; in many tasks, the ground state of $H$ is our goal. To achieve this goal, in the final step we optimize the loss function

$$L(\vec{\theta}) = \langle \psi_0 |\, U^\dagger(\vec{\theta})\, H\, U(\vec{\theta})\, | \psi_0 \rangle$$

by updating the parameters $\vec{\theta}$ on classical computers. In summary, a hybrid quantum-classical scheme, as shown in Figure 1, consists of a PQC $U(\vec{\theta})$ and a loss function of the form $L(\vec{\theta}) = \langle H \rangle(\vec{\theta})$, where $H$ is a Hamiltonian, together with a classical algorithm for updating the parameters.
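The scheme above can be illustrated with a minimal exact-statevector sketch (not from the paper; the single-qubit circuit $U(\theta) = R_Y(\theta)$ and Hamiltonian $H = Z$ are illustrative choices):

```python
# Minimal sketch of <H>(theta) = <0|U(theta)^dagger H U(theta)|0> for a
# 1-qubit PQC U(theta) = RY(theta) and H = Pauli-Z (illustrative choices).
import numpy as np

Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def ry(theta):
    # Rotation gate e^{-i theta Y / 2}
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * Y

def expectation(theta, H=Z):
    # Exact statevector evaluation of <0|U^dagger H U|0>
    psi = ry(theta) @ np.array([1, 0], dtype=complex)
    return (psi.conj() @ H @ psi).real

# RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>, so <Z> = cos(theta).
print(expectation(0.0))    # 1.0
print(expectation(np.pi))  # -1.0
```

On hardware the expectation would be estimated from repeated measurements; the statevector simulation gives it exactly.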
2.2 Optimization in the hybrid quantum-classical scheme
There are many methods for optimizing the loss function in a hybrid quantum-classical scheme based on PQCs; some are gradient-based [11] and some are gradient-free [18]. We focus on gradient-based algorithms in this paper.
The gradient can be estimated by shifting the parameters of the PQC without changing the circuit structure. The details of this gradient estimation algorithm can be found in Appendix A. Once the gradient is obtained, we can use gradient descent to update the parameters.
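The full optimization loop can be sketched as follows (an illustrative toy, not the paper's experiment): parameter-shift gradients combined with plain gradient descent drive the single-qubit loss $L(\theta) = \langle Z \rangle = \cos\theta$ to its minimum.

```python
# Illustrative sketch: parameter-shift gradient + gradient descent for the
# 1-qubit loss L(theta) = <0|RY(theta)^dagger Z RY(theta)|0> = cos(theta).
import numpy as np

Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def loss(theta):
    u = np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * Y  # RY(theta)
    psi = u @ np.array([1, 0], dtype=complex)
    return (psi.conj() @ Z @ psi).real

def shift_grad(theta):
    # Parameter-shift rule: dL/dtheta = (L(theta + pi/2) - L(theta - pi/2)) / 2
    return 0.5 * (loss(theta + np.pi / 2) - loss(theta - np.pi / 2))

theta, lr = 0.5, 0.2
for _ in range(200):
    theta -= lr * shift_grad(theta)
print(loss(theta))  # close to -1, the ground-state energy of Z
```

Note that the gradient is obtained by running the *same* circuit twice with shifted parameters, which is exactly what makes the rule practical on quantum hardware.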
3 DNNs with quantum neural network layers
In this section, we introduce the concepts of the quantum neural network layer (QNNL) and the quantum DNN (QDNN), and give a training algorithm for QDNNs.
3.1 QNNL and QDNN
A DNN consists of a large number of neural network layers, each of which is a nonlinear parametric function $f_{W,b}$. In a classical DNN, $f_{W,b}$ takes the form $\sigma \circ L_{W,b}$, where $L_{W,b}(\vec{x}) = W\vec{x} + \vec{b}$ is an affine transform and $\sigma$ is a nonlinear activation function. The power of DNNs comes from the nonlinearity of the activation function; without activation functions, DNNs would be nothing more than affine transforms.

However, all quantum gates are unitary matrices and hence linear. So the key point in developing QNNLs is introducing nonlinearity.
Suppose that the input data $\vec{x}$ is classical. We introduce nonlinearity into our QNNL by encoding the input into a quantum state nonlinearly. Concretely, we use a PQC for this process: choose a PQC $V(\vec{x})$ on $n$ qubits and apply it to the initial state $|0\rangle^{\otimes n}$. We obtain a quantum state

$$|\psi(\vec{x})\rangle = V(\vec{x})\,|0\rangle^{\otimes n} \qquad (1)$$

encoded from $\vec{x}$. A PQC is naturally nonlinear in its parameters. For example, the single-qubit encoding from $x$ to $R_X(x)|0\rangle = \cos(x/2)|0\rangle - i\sin(x/2)|1\rangle$ is nonlinear in $x$. Moreover, we can compute the gradient of the output with respect to each component of $\vec{x}$ efficiently. This is very important, since we need the gradient with respect to the input of each layer when training the QDNN. The encoding step is the analog of the classical activation step.
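The nonlinearity of the encoding can be seen in a minimal single-qubit example (an assumed illustration, not the paper's encoder): loading $x$ as a rotation angle makes measured expectations trigonometric, hence nonlinear, in $x$.

```python
# Sketch of the nonlinear encoding idea (assumed single-qubit example):
# the input x is loaded as a rotation angle, RX(x)|0>, so any measured
# expectation depends nonlinearly (trigonometrically) on x.
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def encode(x):
    # RX(x)|0> = cos(x/2)|0> - i sin(x/2)|1>
    u = np.cos(x / 2) * np.eye(2) - 1j * np.sin(x / 2) * X
    return u @ np.array([1, 0], dtype=complex)

def z_expectation(x):
    psi = encode(x)
    return (psi.conj() @ Z @ psi).real  # equals cos(x): nonlinear in x

# A linear map would satisfy f(2x) = 2 f(x); the encoding violates this:
print(z_expectation(1.0), 2 * z_expectation(0.5))
```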
After encoding the input data, we apply a linear transform as the analog of the affine transform in classical DNNs. This part is natural on quantum computers, because all quantum gates are linear. We use another PQC $W(\vec{\theta})$ with parameters $\vec{\theta}$ for this step, and we assume that the number of parameters in $W(\vec{\theta})$ is polynomial in $n$. Finally, the output of the QNNL is computed as follows. We choose $p$ fixed Hamiltonians $H_1, \dots, H_p$ (fixed meaning that they are not changed during training) and output the vector $\vec{y} \in \mathbb{R}^p$ with components

$$y_j = \langle \psi(\vec{x}) |\, W^\dagger(\vec{\theta})\, H_j\, W(\vec{\theta})\, | \psi(\vec{x}) \rangle + b_j, \qquad j = 1, \dots, p. \qquad (2)$$

Here, the term $\vec{b} = (b_1, \dots, b_p)$ is the analog of the bias in classical DNNs. Also, each $y_j - b_j$ is the output of a hybrid quantum-classical scheme with PQC $W(\vec{\theta})V(\vec{x})$ and Hamiltonian $H_j$.
To compute the output efficiently, we assume that the expectation value of each of these Hamiltonians can be estimated to precision $\epsilon$ in time $O(\mathrm{poly}(n)/\epsilon^2)$. It is easy to show that every Hamiltonian of the form

$$H = \sum_{i} c_i P_i,$$

where the sum has polynomially many terms and each $P_i$ is a tensor product of Pauli matrices or a local Hamiltonian, satisfies this assumption.

In summary, a QNNL is a function $F: \mathbb{R}^m \to \mathbb{R}^p$ defined by (1) and (2), as shown in Figure 2. Note that a QNNL is a function with classical input and output, and it is determined by the tuple $(V, W, H_1, \dots, H_p)$ with trainable parameters $(\vec{\theta}, \vec{b})$. Notice that a QNNL activates before its affine transform, while a classical DNN layer activates after it; this difference can be ignored when multiple layers are composed.
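A toy two-qubit QNNL forward pass may clarify the definition; the circuit choices here (an $R_X$ encoder, an $R_Y$-plus-CNOT transform, and single-qubit $Z$ observables) are illustrative assumptions, not the paper's ansatz.

```python
# Toy 2-qubit QNNL forward pass (sketch under assumed circuit choices):
# encoder V(x) = RX(x1) (x) RX(x2), transform W(theta) = CNOT . (RY(t1) (x) RY(t2)),
# outputs y_k = <psi|H_k|psi> + b_k with H_1 = Z(x)I, H_2 = I(x)Z.
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

def rot(axis, a):
    # Rotation gate e^{-i a axis / 2}
    return np.cos(a / 2) * I2 - 1j * np.sin(a / 2) * axis

def qnnl(x, theta, b):
    # Nonlinear encoding |psi(x)> = V(x)|00>, cf. eq. (1)
    psi = np.kron(rot(X, x[0]), rot(X, x[1])) @ np.eye(4)[:, 0]
    # Linear transform W(theta)
    psi = CNOT @ np.kron(rot(Y, theta[0]), rot(Y, theta[1])) @ psi
    # Classical output via fixed Hamiltonians plus bias, cf. eq. (2)
    hams = [np.kron(Z, I2), np.kron(I2, Z)]
    return np.array([(psi.conj() @ H @ psi).real for H in hams]) + b

y = qnnl(x=[0.3, 1.2], theta=[0.7, -0.4], b=np.array([0.1, -0.1]))
print(y.shape)  # (2,)
```

Both input and output are classical vectors, which is what allows such layers to be stacked with each other and with classical layers.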
Since the input and output of QNNLs are classical, QNNLs can naturally be composed with classical DNN layers. A network consisting of the composition of multiple compatible QNNLs and classical DNN layers is called a quantum DNN (QDNN):

$$F = F_L \circ F_{L-1} \circ \cdots \circ F_1,$$

where $F_l$ is a classical or quantum layer from $\mathbb{R}^{m_{l-1}}$ to $\mathbb{R}^{m_l}$ for $l = 1, \dots, L$, and the parameters of all layers together are the parameters of the QDNN.
3.2 Training algorithms of QDNNs
We use gradient descent to update the parameters. In classical DNNs, the gradient of the parameters in each layer is computed by the backpropagation (BP) algorithm. Consider a QNNL in a QDNN with parameters $(\vec{\theta}, \vec{b})$, input $\vec{x}$, and output $\vec{y}$; refer to (1) and (2) for details. To use the BP algorithm, we need to compute $\partial \vec{y}/\partial \vec{x}$, $\partial \vec{y}/\partial \vec{\theta}$, and $\partial \vec{y}/\partial \vec{b}$.

The derivative $\partial \vec{y}/\partial \vec{b}$ is trivial. Because $V(\vec{x})$ and $W(\vec{\theta})$ are PQCs and each component of $\vec{y}$ is the output of a hybrid quantum-classical scheme, both $\partial \vec{y}/\partial \vec{x}$ and $\partial \vec{y}/\partial \vec{\theta}$ can be estimated by the algorithm in Section 2.2. Hence, QDNNs can be trained with the BP algorithm.
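The two gradients a QNNL must supply to BP can be sketched on a single qubit (an illustrative circuit, not the paper's): since $V(x)$ and $W(\theta)$ are both built from rotations, $\partial y/\partial \theta$ and $\partial y/\partial x$ follow from the same parameter-shift rule.

```python
# Sketch (illustrative 1-qubit QNNL): y(x, theta) = <0|V(x)^† W(theta)^† Z W(theta) V(x)|0>
# with encoder V(x) = RX(x) and transform W(theta) = RY(theta). Both the
# layer gradient dy/dtheta and the backpropagated dy/dx use the shift rule.
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rot(axis, a):
    return np.cos(a / 2) * np.eye(2) - 1j * np.sin(a / 2) * axis

def y(x, theta):
    psi = rot(Y, theta) @ rot(X, x) @ np.array([1, 0], dtype=complex)
    return (psi.conj() @ Z @ psi).real

def shift(f, a):
    # Parameter-shift rule: f'(a) = (f(a + pi/2) - f(a - pi/2)) / 2
    return 0.5 * (f(a + np.pi / 2) - f(a - np.pi / 2))

x0, t0 = 0.8, 0.3
dy_dtheta = shift(lambda t: y(x0, t), t0)  # used to update this layer
dy_dx = shift(lambda x: y(x, t0), x0)      # passed back to the previous layer
```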
4 Representation power of QDNNs
In this section, we show that QDNNs have more representation power than classical DNNs.
According to the definition of QNNLs in (2), each component of the output of a QNNL is of the form

$$y = \langle 0 |\, V^\dagger(\vec{x})\, W^\dagger(\vec{\theta})\, H\, W(\vec{\theta})\, V(\vec{x})\, | 0 \rangle + b. \qquad (3)$$

In general, estimating (3) on a classical computer is hard, by the following theorem.
Theorem 1.
Estimating (3) with constant precision is BQP-hard, where BQP is the bounded-error quantum polynomial time complexity class.
Proof.
Consider any language $\mathcal{L} \in \mathrm{BQP}$. There exists a polynomial-time Turing machine which takes $x \in \{0,1\}^n$ as input and outputs a polynomial-sized quantum circuit $C_x$. Moreover, $x \in \mathcal{L}$ if and only if measuring the first qubit of $C_x|0\rangle$ yields $1$ with probability at least $2/3$. Because rotation gates together with CNOT form a universal gate set, $C_x$ can be expressed as a polynomial-sized PQC $W(\vec{\theta})$ with proper parameters $\vec{\theta}_x$. Consider the Hamiltonian $H = |1\rangle\langle 1| \otimes I$ acting on the first qubit; then

$$\langle 0 |\, W^\dagger(\vec{\theta}_x)\, H\, W(\vec{\theta}_x)\, | 0 \rangle \ge \tfrac{2}{3} \qquad (4)$$

if and only if $x \in \mathcal{L}$, and

$$\langle 0 |\, W^\dagger(\vec{\theta}_x)\, H\, W(\vec{\theta}_x)\, | 0 \rangle \le \tfrac{1}{3} \qquad (5)$$

if and only if $x \notin \mathcal{L}$. Hence, estimating (3) with constant precision decides $\mathcal{L}$. ∎
As it is generally believed that quantum computers cannot be simulated efficiently by classical computers, Theorem 1 implies that there exist QNNLs which cannot be simulated classically in polynomial time. Hence, these QNNLs can represent functions which cannot be represented by classical DNNs with polynomially many units. Thus, adding QNNLs to DNNs enhances the representation power of DNNs.
5 Numerical experiments
In this section, we use a QDNN to conduct a numerical experiment on an image classification task. The data come from the MNIST data set. We built a QDNN with 3 QNNLs whose goal is to classify whether the digit in an image is 0 or 1.

We used the Julia package Yao.jl [16] as a quantum simulator in our experiments. All data were collected on a desktop PC with an Intel i7-4790 CPU and 4 GB RAM.
5.1 Details of the model
The MNIST images are 28 × 28 pixels, which is too large a dimension for the current quantum simulator. Hence, we first resize each image to 8 × 8 pixels, giving a 64-dimensional input. We use three QNNLs in our QDNN, called the input layer, the hidden layer, and the output layer, respectively. Each layer consists of 3 parts: the encoder, the transform, and the output.
5.1.1 Input layer
The input layer uses an 8-qubit circuit. The encoder accepts an input vector $\vec{x} \in \mathbb{R}^{64}$ and uses a PQC $V(\vec{x})$ to map it to a quantum state $|\psi(\vec{x})\rangle$. The structure of the PQC is of the form

(6)

where the entangling block is a CNOT circuit which introduces entanglement into the circuit. In our experiment, it is of the following form.
The transform $W(\vec{\theta})$ in the input layer is similar to the encoder. The structure of the transform is

(7)

and the transform parameters $\vec{\theta}$ are fed into $W$ in the same way as $\vec{x}$ is fed into $V$.
The output of the input layer is of the form

$$\vec{y} = \big( \langle \sigma_k \rangle \big)_{\sigma \in \{X, Y, Z\},\; k = 1, \dots, 8} + \vec{b}, \qquad (8)$$

where $\langle \sigma_k \rangle$ denotes the expectation value obtained by applying the Pauli operator $\sigma \in \{X, Y, Z\}$ on the $k$-th qubit, for $k = 1, \dots, 8$, giving a 24-dimensional output.
5.1.2 Hidden layer
The hidden layer uses 6 qubits. The structure of the hidden layer is almost the same as that of the input layer, but with fewer qubit gates.
The encoder is of the form
(9) 
The transform is
(10) 
The output of the hidden layer is
(11) 
5.1.3 Output layer
The output layer uses 4 qubits. The structure of the output layer is also similar to that of the input layer. The only difference is that its output is the classification result.
The encoder is
(12) 
The transform is
(13) 
The output of the output layer is
(14) 
We do not add a bias term here, so the layer outputs a vector in $\mathbb{R}^2$. After training, if the input is an image of the digit 0, the output should be close to $(1, 0)$; otherwise it should be close to $(0, 1)$.
In conclusion, the settings of the three layers are shown in Table 1.
Layer          # of qubits   Input dimension   Output dimension   # of parameters (transform + bias)
Input layer    8             64                24                 160 + 24
Hidden layer   6             24                12                 96 + 12
Output layer   4             12                2                  28 + 0
Finally, the loss function is defined as the squared error

$$L = \frac{1}{|\mathcal{D}|} \sum_{(\vec{x},\, \vec{y}) \in \mathcal{D}} \left\| \mathrm{QDNN}(\vec{x}) - \vec{y} \right\|^2, \qquad (15)$$

where $\mathcal{D}$ is the training set of image–label pairs.
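Assuming a mean squared-error form for the loss (a standard choice; the exact normalization in the source is our assumption), the classical side of the loss computation is straightforward:

```python
# Sketch of a mean squared-error training loss over a batch (the exact
# normalization is assumed, not taken from the source).
import numpy as np

def qdnn_loss(outputs, labels):
    # outputs: (batch, 2) network outputs; labels: (batch, 2) one-hot targets
    return np.mean(np.sum((outputs - labels) ** 2, axis=1))

out = np.array([[0.9, 0.1], [0.2, 0.8]])
lab = np.array([[1.0, 0.0], [0.0, 1.0]])
print(qdnn_loss(out, lab))
```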
5.2 Experiment results
All parameters were initialized randomly. We used the Adam optimizer [9] to update the parameters, training the QDNN for 400 iterations with a batch size of 240. The hyperparameters of Adam were set to one value for the first 200 iterations and adjusted for the last 200 iterations.
The values of the loss function on the training set and the test set during training are shown in Figure 3. The QDNN reached a high accuracy on the test set after training.
6 Discussion
We introduced the QNNL model and built QDNNs from QNNLs. We proved that QDNNs have more representation power than classical DNNs, and we presented a practical gradient-based training algorithm as the analog of the BP algorithm. Because the model is based on the hybrid quantum-classical scheme, it can be realized on NISQ processors. As a result, the QDNN has more representation power than classical DNNs while keeping most of their advantages.
Due to the limited power of classical simulators of quantum computation, only QDNNs with a small number of qubits can be used in practice. As a consequence, we only trained a model for a simple task in our experiment. With more quantum resources in the future, QDNNs will be able to access exponential-dimensional feature Hilbert spaces [21] with only a polynomial number of parameters. Hence, we believe that QDNNs will help extract features more efficiently in exponential-dimensional feature Hilbert spaces. This idea is similar to that of kernel methods [23, 28].
References
 [1] (2018) Quantum Boltzmann machine. Physical Review X 8 (2), pp. 021050.
 [2] (2019) Parameterized quantum circuits as machine learning models. Quantum Science and Technology 4 (4), pp. 043001.
 [3] (2017) Quantum machine learning. Nature 549 (7671), pp. 195–202.
 [4] (2018) A quantum machine learning algorithm based on generative models. Science Advances 4 (12), pp. eaat9004.
 [5] (1996) A fast quantum mechanical algorithm for database search. In Proceedings of the 28th Annual ACM Symposium on Theory of Computing, STOC '96, New York, NY, USA, pp. 212–219.
 [6] (2019) Supervised learning with quantum-enhanced feature spaces. Nature 567 (7747), pp. 209.
 [7] (2020) Quantum algorithms for deep convolutional neural networks. In International Conference on Learning Representations.
 [8] (2019) Continuous-variable quantum neural networks. Physical Review Research 1 (3), pp. 033063.
 [9] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
 [10] (2015) Deep learning. Nature 521 (7553), pp. 436–444.
 [11] (2017) Hybrid quantum-classical approach to quantum optimal control. Physical Review Letters 118 (15), pp. 150503.
 [12] (2018) Differentiable learning of quantum circuit Born machines. Physical Review A 98 (6), pp. 062324.
 [13] (2019) Variational quantum eigensolver with fewer qubits. Physical Review Research 1, pp. 023025.
 [14] (2014) Quantum principal component analysis. Nature Physics 10 (9), pp. 631.
 [15] (2018) Quantum generative adversarial learning. Physical Review Letters 121 (4), pp. 040502.
 [16] (2019) Yao.jl: extensible, efficient framework for quantum algorithm design. arXiv preprint arXiv:1912.10877.
 [17] (2016) The theory of variational hybrid quantum-classical algorithms. New Journal of Physics 18 (2), pp. 023023.
 [18] (2019) Sequential minimal optimization for quantum-classical hybrid algorithms. arXiv preprint arXiv:1903.12166.
 [19] (2018) Quantum computing in the NISQ era and beyond. Quantum 2, pp. 79.
 [20] (2014) Quantum support vector machine for big data classification. Physical Review Letters 113 (13), pp. 130503.
 [21] (2019) Quantum machine learning in feature Hilbert spaces. Physical Review Letters 122 (4), pp. 040504.
 [22] (2015) An introduction to quantum machine learning. Contemporary Physics 56 (2), pp. 172–185.
 [23] (2004) Kernel Methods for Pattern Analysis. Cambridge University Press.
 [24] (1994) Algorithms for quantum computation: discrete logarithms and factoring. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science, pp. 124–134.
 [25] (2012) Deep learning for NLP (without magic). In Tutorial Abstracts of ACL 2012, pp. 5–5.
 [26] (2018) Deep learning for computer vision: a brief review. Computational Intelligence and Neuroscience 2018.
 [27] (2012) Quantum algorithm for data fitting. Physical Review Letters 109 (5), pp. 050505.
 [28] (2003) Kernel methods for relation extraction. Journal of Machine Learning Research 3, pp. 1083–1106.
 [29] (2019) Building quantum neural networks based on a swap test. Physical Review A 100, pp. 012334.
Appendix A Gradient Estimation of PQCs
Without loss of generality, suppose that a PQC has the form

$$U(\vec{\theta}) = U_N(\theta_N) \cdots U_2(\theta_2)\, U_1(\theta_1).$$

Given a Hamiltonian $H$, the expectation value is defined by

$$\langle H \rangle(\vec{\theta}) = \langle 0 |\, U^\dagger(\vec{\theta})\, H\, U(\vec{\theta})\, | 0 \rangle.$$

The goal in hybrid quantum-classical computing is usually to optimize $\langle H \rangle(\vec{\theta})$. We can use gradient descent for this problem, so we need to estimate

$$\frac{\partial \langle H \rangle}{\partial \theta_j}, \qquad j = 1, \dots, N.$$

Notice that each $U_j(\theta_j)$ has the form $e^{-i\theta_j H_j/2}$, where $H_j^2 = I$. Hence, we have

$$U_j(\theta_j) = \cos\frac{\theta_j}{2}\, I - i \sin\frac{\theta_j}{2}\, H_j, \qquad \frac{\partial U_j}{\partial \theta_j} = -\frac{i}{2}\, H_j\, U_j(\theta_j).$$

We denote

$$\vec{\theta}_{+} = \vec{\theta} + \frac{\pi}{2}\, \vec{e}_j \quad \text{and} \quad \vec{\theta}_{-} = \vec{\theta} - \frac{\pi}{2}\, \vec{e}_j,$$

where $\vec{e}_j$ is the $j$-th unit vector. That is, we shift the $j$-th parameter by $\pm\pi/2$. By calculus, using $H_j^2 = I$,

$$U_j\!\left(\theta_j \pm \frac{\pi}{2}\right) = \frac{1}{\sqrt{2}}\left( I \mp i H_j \right) U_j(\theta_j).$$

By a similar computation for the conjugate factors, substituting these shifted gates into $\langle H \rangle$ produces exactly the commutator term appearing in $\partial \langle H \rangle / \partial \theta_j$. Thus, one can simply check that

$$\frac{\partial \langle H \rangle}{\partial \theta_j} = \frac{1}{2}\left( \langle H \rangle(\vec{\theta}_{+}) - \langle H \rangle(\vec{\theta}_{-}) \right).$$

In conclusion, we can estimate the gradient of the parameters of a PQC by shifting the parameters and running the same circuit.
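The shift rule can be checked numerically (an exact-statevector sketch with an illustrative two-parameter circuit $U(\vec{\theta}) = R_Y(\theta_2) R_X(\theta_1)$ and $H = Z$, not a circuit from the paper): the parameter-shift gradient matches finite differences.

```python
# Numeric check of the parameter-shift rule for an illustrative 2-parameter
# circuit U(theta) = RY(theta2) RX(theta1) with H = Z, via exact statevectors.
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rot(axis, a):
    return np.cos(a / 2) * np.eye(2) - 1j * np.sin(a / 2) * axis

def expval(t1, t2):
    psi = rot(Y, t2) @ rot(X, t1) @ np.array([1, 0], dtype=complex)
    return (psi.conj() @ Z @ psi).real

def shift_grad(t1, t2):
    # d<H>/dtheta_j = ( <H>(theta_j + pi/2) - <H>(theta_j - pi/2) ) / 2
    s = np.pi / 2
    return np.array([
        0.5 * (expval(t1 + s, t2) - expval(t1 - s, t2)),
        0.5 * (expval(t1, t2 + s) - expval(t1, t2 - s)),
    ])

def fd_grad(t1, t2, h=1e-6):
    # Central finite differences, for comparison only
    return np.array([
        (expval(t1 + h, t2) - expval(t1 - h, t2)) / (2 * h),
        (expval(t1, t2 + h) - expval(t1, t2 - h)) / (2 * h),
    ])

print(np.allclose(shift_grad(0.4, 1.1), fd_grad(0.4, 1.1), atol=1e-5))  # True
```

Unlike finite differences, the shift rule is exact for rotation gates and needs no small step size, which matters on noisy hardware.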