QDNN: DNN with Quantum Neural Network Layers

12/29/2019, by Chen Zhao et al.

The deep neural network (DNN) has become one of the most important and powerful machine learning methods in recent years. In this paper, we introduce a general quantum DNN, which consists of fully quantum structured layers with better representation power than the classical DNN while keeping its advantages, such as non-linear activation, the multi-layer structure, and the efficient backpropagation training algorithm. We prove that the quantum structured layer cannot be simulated efficiently by classical computers unless universal quantum computing can be classically simulated efficiently, and hence our quantum DNN has more representation power than the classical DNN. Moreover, our quantum DNN can be used on near-term noisy intermediate-scale quantum (NISQ) processors. A numerical experiment for image classification based on the quantum DNN is given, where a high accuracy rate is achieved.


1 Introduction

Quantum computers use the principles of quantum mechanics for computing and are more powerful than classical computers for many computational problems [24, 5]. Noisy intermediate-scale quantum (NISQ) [19] devices will be the only quantum devices available in the near term, where we can only use a limited number of qubits without error correction. Developing NISQ algorithms is therefore a new challenge.

In this paper, we focus on quantum machine learning. Many quantum machine learning algorithms, such as qSVM, qPCA, and the quantum Boltzmann machine, have been developed [27, 22, 3, 20, 14, 1, 4], and these algorithms were shown to be more efficient than their classical counterparts. Recently, several NISQ quantum machine learning algorithms, such as QuGAN, QCBM, and quantum kernel methods, have been proposed [15, 12, 21, 6, 2]. However, these algorithms did not aim to build quantum deep neural networks.

In recent years, the deep neural network [10] became the most important and powerful method in machine learning and has been widely applied in computer vision [26], natural language processing [25], and many other fields. The basic unit of a DNN is the perceptron, which is an affine transform together with an activation function. The non-linearity of the activation function and the depth give the DNN its representation power. Approaches have been proposed to build classical DNNs on quantum computers [8, 29, 7]. They achieve quantum speed-ups under certain assumptions, but the structure of the classical DNN is still used, since only some operations are sped up by quantum algorithms, for instance, computing inner products with the swap test [29].

In this paper, we introduce the first quantum analog of the classical DNN, which consists of fully quantum structured layers with better representation power than the classical DNN while keeping its advantages, such as non-linear activation, the multi-layer structure, and the efficient backpropagation training algorithm.

The main contribution of this paper is to introduce the concept of the quantum neural network layer (QNNL) as a quantum analog of the classical neural network layer in a DNN. As all quantum gates are unitary and hence linear, the main difficulty of building a QNNL is introducing non-linearity. We solve this problem by encoding the input vector into a quantum state non-linearly with a parametric quantum circuit (PQC). A QNNL is a quantum circuit and is thus totally different from a classical neural network layer. A quantum DNN (QDNN) can be easily built with QNNLs, since the input and output of a QNNL are classical values.

The advantage of introducing QNNLs is that we can access vectors in exponential-dimensional Hilbert spaces with only polynomial resources on a quantum computer. We prove that this model cannot be simulated efficiently by classical computers unless universal quantum computing can be classically simulated efficiently, so QDNNs have more representation power than classical DNNs. We also give a training algorithm for QDNNs which is similar to the backpropagation (BP) algorithm. Moreover, QNNLs use the hybrid quantum-classical scheme; hence, a QDNN of reasonable size can be trained efficiently on NISQ processors. Finally, a numerical experiment for image classification using a QDNN is given, where a high accuracy rate is achieved.

We finally remark that any task using a DNN can be turned into a quantum algorithm with more representation power by replacing the DNN with a QDNN.

2 Hybrid quantum-classical algorithm

The hybrid quantum-classical scheme [17] consists of a quantum part and a classical part. In the quantum part, one uses parametric quantum circuits (PQCs) to prepare quantum states on quantum processors. In the classical part, one uses classical computers to optimize the parameters of the PQCs used in the quantum part.

2.1 Hybrid quantum-classical scheme based on PQCs

PQCs are quantum circuits with parametric gates. In general, a PQC is of the form

$$U(\vec{\theta}) = U_N(\theta_N)\, U_{N-1}(\theta_{N-1}) \cdots U_1(\theta_1),$$

where $\vec{\theta} = (\theta_1, \dots, \theta_N)$ are the parameters, each $U_j(\theta_j)$ is a rotation gate $U_j(\theta_j) = e^{-i \theta_j H_j / 2}$, and $H_j$ is a 1-qubit or 2-qubit gate such that $H_j^2 = I$. For example, in this paper we will use the Pauli gates $X$, $Y$, $Z$ and the CNOT gate.
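As an illustration (our own sketch, not code from the paper), the following NumPy snippet builds a small PQC of this form from rotation gates $e^{-i\theta P/2}$ with $P \in \{X, Y, Z\}$ and a CNOT gate; the two-qubit layout is arbitrary and chosen only for demonstration.

```python
import numpy as np

# Pauli matrices and CNOT; each squares to the identity.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

def rotation(theta, pauli):
    """Rotation gate exp(-i * theta * P / 2) for any P with P^2 = I."""
    return np.cos(theta / 2) * np.eye(pauli.shape[0]) - 1j * np.sin(theta / 2) * pauli

def on_qubits(gate, qubits, n):
    """Embed a 1- or 2-qubit gate acting on consecutive `qubits` into an n-qubit unitary."""
    ops, q = [], 0
    while q < n:
        if q == qubits[0]:
            ops.append(gate)
            q += len(qubits)
        else:
            ops.append(I2)
            q += 1
    out = ops[0]
    for op in ops[1:]:
        out = np.kron(out, op)
    return out

def pqc(thetas, n=2):
    """U(theta) = U_N(theta_N) ... U_1(theta_1) for an arbitrary 2-qubit demo layout."""
    U = np.eye(2 ** n, dtype=complex)
    layout = [(X, [0]), (Y, [1]), (Z, [0])]        # which Pauli acts on which qubit
    for theta, (P, qs) in zip(thetas, layout):
        U = on_qubits(rotation(theta, P), qs, n) @ U
    U = on_qubits(CNOT, [0, 1], n) @ U             # fixed entangling gate
    return U

state = pqc(np.array([0.3, 1.2, -0.7])) @ np.array([1, 0, 0, 0], dtype=complex)
print(np.round(state, 3))
```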

In practical tasks such as VQE [13] and quantum machine learning [12], we want to find a quantum state with certain desired properties. This can be done with the following three steps based on the hybrid quantum-classical scheme. First, we choose an appropriate ansatz, that is, we design the circuit structure of a PQC $U(\vec{\theta})$ and initialize all parameters randomly. We then apply this PQC to a fixed initial state $|\psi_0\rangle$, for instance $|0\cdots 0\rangle$. Second, by measuring the final state $U(\vec{\theta})|\psi_0\rangle$ repeatedly, we estimate the expectation value

$$L(\vec{\theta}) = \langle \psi_0 |\, U^\dagger(\vec{\theta})\, H\, U(\vec{\theta})\, | \psi_0 \rangle$$

of a Hamiltonian $H$, which is designed differently for different tasks. In many tasks, the ground state of $H$ is our goal. To achieve this goal, in the final step, we optimize the loss function $L(\vec{\theta})$ by updating the parameters $\vec{\theta}$ on a classical computer.

Figure 1: Hybrid quantum-classical scheme.

In summary, a hybrid quantum-classical scheme, as shown in Figure 1, consists of a PQC $U(\vec{\theta})$, a loss function of the form $L(\vec{\theta}) = \langle \psi_0 | U^\dagger(\vec{\theta}) H U(\vec{\theta}) | \psi_0 \rangle$, where $H$ is a Hamiltonian, and a classical algorithm for updating the parameters $\vec{\theta}$.
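To make the scheme concrete, here is a minimal NumPy sketch (again our own illustration) that evaluates a loss of the form $L(\theta) = \langle 0 | U^\dagger(\theta) H U(\theta) | 0 \rangle$ for a one-qubit ansatz $U(\theta) = R_Y(\theta)$ and the Hamiltonian $H = Z$. On a simulator the expectation is computed exactly; on real hardware it would be estimated from repeated measurements.

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

def ry(theta):
    """Rotation exp(-i * theta * Y / 2)."""
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * Y

def loss(theta, H=Z):
    """L(theta) = <0| U(theta)^dag H U(theta) |0> for the toy ansatz U = Ry(theta)."""
    psi0 = np.array([1, 0], dtype=complex)
    psi = ry(theta) @ psi0
    return np.real(np.vdot(psi, H @ psi))

# Scanning the parameter shows the minimum (the ground state of Z) at theta = pi.
for theta in [0.0, np.pi / 2, np.pi]:
    print(theta, loss(theta))
```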

2.2 Optimization in hybrid quantum-classical scheme

There are many methods for optimizing the loss function for a hybrid quantum-classical scheme based on PQCs. Some are gradient-based [11] and some are gradient-free [18]. We will focus on gradient-based algorithms in this paper.

The gradient can be estimated by shifting the parameters of the PQC without changing the circuit structure. The details of the gradient estimation algorithm can be found in Appendix A. Once the gradient is obtained, we can use the gradient descent method to update the parameters.
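A toy sketch of this parameter-shift estimate for the one-qubit example above (illustrative only; the derivation is in Appendix A):

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

def ry(theta):
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * Y

def expectation(theta, H=Z):
    psi = ry(theta) @ np.array([1, 0], dtype=complex)
    return np.real(np.vdot(psi, H @ psi))       # equals cos(theta) for this ansatz

def parameter_shift_grad(theta):
    """d<H>/dtheta = ( <H>(theta + pi/2) - <H>(theta - pi/2) ) / 2."""
    return 0.5 * (expectation(theta + np.pi / 2) - expectation(theta - np.pi / 2))

theta = 0.8
print(parameter_shift_grad(theta), -np.sin(theta))   # the two values agree
```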

3 DNNs with quantum neural network layers

In this section, we will introduce the concepts of quantum neural network layer (QNNL) and quantum DNN, and give a training algorithm for the quantum DNN.

3.1 QNNL and QDNN

A DNN consists of a large number of neural network layers, and each neural network layer is a non-linear function $f_{W,\vec{b}}(\vec{x})$ of the input $\vec{x}$ with parameters $W, \vec{b}$. In a classical DNN, $f_{W,\vec{b}}$ takes the form $f_{W,\vec{b}}(\vec{x}) = \sigma(W\vec{x} + \vec{b})$, where $\vec{x} \mapsto W\vec{x} + \vec{b}$ is an affine transform and $\sigma$ is a non-linear activation function. The power of DNNs comes from the non-linearity of the activation function. Without activation functions, DNNs would be nothing more than affine transforms.

However, all quantum gates are unitary matrices and hence linear. So the key point of developing QNNLs is introducing non-linearity.

Suppose that the input data $\vec{x}$ is classical. We introduce non-linearity to our QNNL by encoding the input to a quantum state non-linearly. Concretely, we will use a PQC for this process. Choose a PQC $U(\vec{x})$ with a number of qubits polynomial in the dimension of $\vec{x}$ and apply it to an initial state $|\psi_0\rangle$. We obtain a quantum state

$$|\psi(\vec{x})\rangle = U(\vec{x})\,|\psi_0\rangle \tag{1}$$

encoded from $\vec{x}$. A PQC is naturally non-linear in its parameters: the amplitudes of $|\psi(\vec{x})\rangle$ are products of trigonometric functions of the components of $\vec{x}$, so the encoding from $\vec{x}$ to $|\psi(\vec{x})\rangle$ is non-linear. Moreover, we can compute the gradient of each component of $|\psi(\vec{x})\rangle$ efficiently. This is very important, since we need the gradient with respect to the input of each layer when training the QDNN. The encoding step is the analog of the classical activation step.
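As a one-qubit illustration of this non-linearity (our example, not one from the paper), encoding a single input value $x$ with a $Y$-rotation gives

$$R_Y(x)\,|0\rangle = e^{-i x Y / 2}\,|0\rangle = \cos\tfrac{x}{2}\,|0\rangle + \sin\tfrac{x}{2}\,|1\rangle,$$

so the amplitudes, and any expectation value computed from this state (for instance $\langle Z \rangle = \cos x$), depend non-linearly on the input $x$.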

After encoding the input data, we apply a linear transform as the analog of the affine transform in classical DNNs. This part is natural on quantum computers because all quantum gates are linear. We use another PQC $V(\vec{W})$ with parameters $\vec{W}$ for this step. We assume that the number of parameters in $V(\vec{W})$ is polynomial in the number of qubits.

Finally, the output of a QNNL is computed as follows. We choose a set of fixed Hamiltonians (which means we will not change them during training) $H_1, \dots, H_m$, where $m$ is the output dimension, and output the vector $\vec{y}$ with components

$$y_k = \langle \psi(\vec{x}) |\, V^\dagger(\vec{W})\, H_k\, V(\vec{W})\, | \psi(\vec{x}) \rangle + b_k. \tag{2}$$

Here, the bias term $\vec{b}$ is the analog of the bias in classical DNNs. Also, each $y_k - b_k$ is the output of a hybrid quantum-classical scheme with PQC $V(\vec{W})\,U(\vec{x})$ and Hamiltonian $H_k$.

Figure 2: The structure of a QNNL

To compute the output efficiently, we assume that the expectation value of each of these Hamiltonians can be estimated within precision $\epsilon$ in time $O(\mathrm{poly}(1/\epsilon))$. It is easy to show that all Hamiltonians of the form

$$H = \sum_{j} c_j H_j,$$

with polynomially many terms, satisfy this assumption, where the $H_j$ are tensor products of Pauli matrices or $k$-local Hamiltonians for constant $k$.

In summary, a QNNL is a function $\vec{y} = Q_{\vec{W},\vec{b}}(\vec{x})$ defined by (1) and (2), as shown in Figure 2. Note that a QNNL is a function with classical input and output, and it is determined by a tuple $(U, V, \{H_k\})$ with parameters $(\vec{W}, \vec{b})$. Notice that a QNNL activates before the linear transform, while a classical DNN layer activates after the affine transform; this difference can be ignored when multiple layers are composed.
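Putting (1) and (2) together, the following NumPy sketch implements a toy two-qubit QNNL with a $Y$-rotation encoder, a $Y$-rotation-plus-CNOT transform, and outputs given by $\langle Z \rangle$ on each qubit plus a bias. The particular gate layout and Hamiltonians are our own choices for illustration, not the circuits used in the paper.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

def ry(theta):
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * Y

def encoder(x):
    """U(x): non-linear encoding of the classical input x = (x1, x2) into a 2-qubit state."""
    return np.kron(ry(x[0]), ry(x[1]))

def transform(w):
    """V(W): parametric 'linear transform' part, one rotation per qubit plus a CNOT."""
    return CNOT @ np.kron(ry(w[0]), ry(w[1]))

def qnnl(x, w, b):
    """Output y_k = <psi(x)| V(W)^dag H_k V(W) |psi(x)> + b_k with H_k = Z on qubit k."""
    psi0 = np.zeros(4, dtype=complex); psi0[0] = 1.0
    psi = transform(w) @ encoder(x) @ psi0
    hams = [np.kron(Z, I2), np.kron(I2, Z)]
    return np.array([np.real(np.vdot(psi, H @ psi)) for H in hams]) + b

x = np.array([0.4, -1.1])                 # classical input
w = np.array([0.2, 0.9])                  # transform parameters
b = np.array([0.1, -0.3])                 # bias
print(qnnl(x, w, b))                      # classical 2-dimensional output
```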

Since the input and output of QNNLs are classical, QNNLs can be naturally embedded in classical DNNs. A DNN consisting of the composition of multiple compatible QNNLs and classical DNN layers is called a quantum DNN (QDNN):

$$\mathrm{QDNN}(\vec{x}) = F_k \circ F_{k-1} \circ \cdots \circ F_1(\vec{x}),$$

where each $F_j$ is a classical or quantum layer from $\mathbb{R}^{d_{j-1}}$ to $\mathbb{R}^{d_j}$ for $j = 1, \dots, k$, and the parameters of all layers together are the parameters of the QDNN.

3.2 Training algorithms of QDNNs

We will use gradient descent to update the parameters. In classical DNNs, the gradient of the parameters in each layer is computed by the backpropagation (BP) algorithm. Suppose that we have a QDNN and consider one of its QNNLs with parameters $(\vec{W}, \vec{b})$, input $\vec{x}$, and output $\vec{y}$; refer to (1) and (2) for details. To use the BP algorithm, we need to compute $\partial \vec{y} / \partial \vec{W}$, $\partial \vec{y} / \partial \vec{b}$, and $\partial \vec{y} / \partial \vec{x}$.

$\partial \vec{y} / \partial \vec{b}$ is trivial. Because $U(\vec{x})$ and $V(\vec{W})$ are PQCs and each component of $\vec{y}$ is the output of a hybrid quantum-classical scheme, both $\partial \vec{y} / \partial \vec{W}$ and $\partial \vec{y} / \partial \vec{x}$ can be estimated by the algorithm in Section 2.2. Hence, QDNNs can be trained with the BP algorithm.
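For concreteness, if $\ell$ denotes the loss of the whole QDNN, these quantities enter the standard chain rule exactly as in the classical BP algorithm:

$$\frac{\partial \ell}{\partial \vec{W}} = \frac{\partial \ell}{\partial \vec{y}}\,\frac{\partial \vec{y}}{\partial \vec{W}}, \qquad \frac{\partial \ell}{\partial \vec{b}} = \frac{\partial \ell}{\partial \vec{y}}\,\frac{\partial \vec{y}}{\partial \vec{b}}, \qquad \frac{\partial \ell}{\partial \vec{x}} = \frac{\partial \ell}{\partial \vec{y}}\,\frac{\partial \vec{y}}{\partial \vec{x}},$$

where $\partial \ell / \partial \vec{y}$ is supplied by the layer above and $\partial \ell / \partial \vec{x}$ is passed to the layer below; the Jacobians $\partial \vec{y} / \partial \vec{W}$ and $\partial \vec{y} / \partial \vec{x}$ are estimated on the quantum processor by parameter shifting.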

4 Representation power of QDNNs

In this section, we will show that QDNNs have more representation power than classical DNNs.

According to the definition of a QNNL in (2), each element of its output is of the form

$$y = \langle \psi_0 |\, U^\dagger(\vec{x})\, V^\dagger(\vec{W})\, H\, V(\vec{W})\, U(\vec{x})\, | \psi_0 \rangle + b. \tag{3}$$

In general, estimating (3) on a classical computer is difficult, as shown by the following theorem.

Theorem 1.

Estimating (3) within a fixed constant precision is BQP-hard, where BQP is the bounded-error quantum polynomial time complexity class.

Proof.

Consider any language $\mathcal{L} \in \mathrm{BQP}$. There exists a polynomial-time Turing machine which takes $x \in \{0,1\}^*$ as input and outputs a polynomial-sized quantum circuit $C_x$. Moreover, $x \in \mathcal{L}$ if and only if the measurement result of the first qubit of $C_x |0\cdots 0\rangle$ has probability at least $2/3$ of being $1$.

Because rotation gates together with CNOT form a universal gate set, $C_x$ can be expressed as a polynomial-sized PQC $U(\vec{\theta})$ with proper parameters. Consider the Hamiltonian $H = |1\rangle\langle 1| \otimes I$ acting on the first qubit; then

$$\langle 0\cdots 0 |\, U^\dagger(\vec{\theta})\, H\, U(\vec{\theta})\, | 0\cdots 0 \rangle \geq 2/3 \tag{4}$$

if and only if $x \in \mathcal{L}$, and

$$\langle 0\cdots 0 |\, U^\dagger(\vec{\theta})\, H\, U(\vec{\theta})\, | 0\cdots 0 \rangle \leq 1/3 \tag{5}$$

if and only if $x \notin \mathcal{L}$. Hence, estimating expressions of the form (3) within a constant precision decides $\mathcal{L}$. ∎

As it is generally believed that quantum computers cannot be simulated efficiently by classical computers, Theorem 1 implies that there exist QNNLs which cannot be classically simulated efficiently. Hence, these QNNLs can represent functions which cannot be represented by classical DNNs with a polynomial number of units. Thus, adding QNNLs to DNNs will enhance the representation power of DNNs.

5 Numerical experiments

In this section, we will use a QDNN to conduct a numerical experiment for an image classification task. The data come from the MNIST data set. We built a QDNN with 3 QNNLs. The goal of this QDNN is to classify whether the digit in an image is 0 or 1.

We use the Julia package Yao.jl [16] as a quantum simulator in our experiments. All data were collected on a desktop PC with an Intel i7-4790 CPU and 4 GB RAM.

5.1 Details of the model

The images in the MNIST data set are 28 × 28 pixels, i.e., 784-dimensional. This dimension is too large for the current quantum simulator. Hence, we first resize each image to 8 × 8 pixels, giving a 64-dimensional input vector. We use three QNNLs in our QDNN, which will be called the input layer, the hidden layer, and the output layer, respectively. Each layer is made of three parts: an encoder, a transform, and an output.

5.1.1 Input layer

The input layer uses an 8-qubit circuit. The encoder accepts an input vector $\vec{x} \in \mathbb{R}^{64}$ and uses a PQC $U_{\mathrm{in}}(\vec{x})$ to map it to a quantum state $|\psi(\vec{x})\rangle$. The structure of the PQC consists of layers of single-qubit rotation gates, whose rotation angles are taken from the components of $\vec{x}$, interleaved with a fixed entangling circuit

$$R_{\mathrm{ent}}, \tag{6}$$

a CNOT circuit which introduces entanglement. In our experiment, $R_{\mathrm{ent}}$ is a fixed arrangement of CNOT gates.
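Since the circuit diagram of $R_{\mathrm{ent}}$ is not reproduced here, the following NumPy sketch shows one plausible choice, a ring of CNOT gates between neighbouring qubits. This is an assumption for illustration, not necessarily the exact entangler used by the authors.

```python
import numpy as np

def cnot(control, target, n):
    """Full 2^n x 2^n matrix of a CNOT with the given control and target qubits."""
    dim = 2 ** n
    U = np.zeros((dim, dim))
    for basis in range(dim):
        bits = [(basis >> (n - 1 - q)) & 1 for q in range(n)]   # qubit 0 = leftmost bit
        if bits[control] == 1:
            bits[target] ^= 1
        out = sum(bit << (n - 1 - q) for q, bit in enumerate(bits))
        U[out, basis] = 1.0
    return U

def r_ent(n):
    """Hypothetical entangler: a ring of CNOTs, qubit q controlling qubit (q+1) mod n."""
    U = np.eye(2 ** n)
    for q in range(n):
        U = cnot(q, (q + 1) % n, n) @ U
    return U

print(r_ent(3).shape)   # an (8, 8) unitary built only from CNOT gates
```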

The transform in the input layer is similar to the encoder: it is a PQC $V_{\mathrm{in}}(\vec{W})$ with the same layered structure (7), and the transform parameters $\vec{W}$ are fed into $V_{\mathrm{in}}$ in the same way as the components of $\vec{x}$ are fed into the encoder.

The output of the input layer is of the form

$$y_{i,\sigma} = \langle \psi |\, \sigma^{(i)}\, | \psi \rangle + b_{i,\sigma}, \tag{8}$$

where $\sigma \in \{X, Y, Z\}$ and $\langle \psi | \sigma^{(i)} | \psi \rangle$ denotes the result obtained by applying the operator $\sigma$ on the $i$-th qubit, for $i = 1, \dots, 8$, giving the 24-dimensional output listed in Table 1.

5.1.2 Hidden layer

The hidden layer uses 6 qubits. The structure of the hidden layer is almost the same as that of the input layer, but with fewer gates. The encoder (9), the transform (10), and the output (11) follow the same pattern as the corresponding parts of the input layer, now with a 24-dimensional input and a 12-dimensional output (see Table 1).

5.1.3 Output layer

The output layer uses 4 qubits. Its structure is also similar to that of the input layer; the only difference is that its output is the classification result. The encoder (12) and the transform (13) again follow the same pattern, and the output (14) is two-dimensional. We do not add a bias term here, so the layer outputs a vector in $\mathbb{R}^2$. Moreover, after training, the output should be close to the label vector of the digit shown in the input image.

In conclusion, the settings of the three layers are summarized in Table 1.

Layer          # of qubits   Input dimension   Output dimension   # of parameters (transform + bias)
Input layer         8              64                 24                      160 + 24
Hidden layer        6              24                 12                       96 + 12
Output layer        4              12                  2                       28 + 0
Table 1: Settings of the three layers

Finally, the loss function is defined as

(15)

where $T$ is the training set.

5.2 Experiment results

All parameters were initialized randomly. We use the Adam optimizer [9] to update the parameters. We train this QDNN for 400 iterations with a batch size of 240. The Adam learning rate is set to one value for the first 200 iterations and changed to a different value for the last 200 iterations.
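As an illustration of this training procedure (a toy sketch with illustrative hyperparameters, not the paper's settings), the snippet below minimizes the one-parameter loss $\langle Z \rangle$ from Section 2 with parameter-shift gradients and a hand-rolled Adam update.

```python
import numpy as np

Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def loss(theta):
    """<0| Ry(theta)^dag Z Ry(theta) |0> = cos(theta); minimum -1 at theta = pi."""
    psi = (np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * Y) @ np.array([1, 0], dtype=complex)
    return np.real(np.vdot(psi, Z @ psi))

def grad(theta):
    """Parameter-shift estimate of d loss / d theta."""
    return 0.5 * (loss(theta + np.pi / 2) - loss(theta - np.pi / 2))

# Minimal Adam optimizer; the hyperparameters here are illustrative, not those of the paper.
theta, m, v = 0.5, 0.0, 0.0
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8
for t in range(1, 201):
    g = grad(theta)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(theta, loss(theta))   # theta approaches pi, loss approaches -1
```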

The values of the loss function on the training set and the test set during training are shown in Figure 3. After training, the QDNN reaches a high accuracy rate on the test set.

Figure 3: Loss function

6 Discussion

We introduced the model of QNNLs and built QDNNs with them. We proved that QDNNs have more representation power than classical DNNs and presented a practical gradient-based training algorithm as the analog of the BP algorithm. Because the model is based on the hybrid quantum-classical scheme, it can be realized on NISQ processors. As a result, the QDNN has more representation power than the classical DNN while keeping most of its advantages.

Due to the limited power of the classical simulator for quantum computation, only QDNNs with a small number of qubits can be used in practice. As a consequence, we only trained a model for a simple task in our experiment. If we have more quantum resources in the future, we can access exponential-dimensional feature Hilbert spaces [21] with QDNNs using only a polynomial number of parameters. Hence, we believe that QDNNs will help us extract features in exponential-dimensional feature Hilbert spaces more efficiently. This idea is similar to the ideas behind kernel methods [23, 28].

References

  • [1] M. H. Amin, E. Andriyash, J. Rolfe, B. Kulchytskyy, and R. Melko (2018) Quantum boltzmann machine. Physical Review X 8 (2), pp. 021050. Cited by: §1.
  • [2] M. Benedetti, E. Lloyd, S. Sack, and M. Fiorentini (2019) Parameterized quantum circuits as machine learning models. Quantum Science and Technology 4 (4), pp. 043001. Cited by: §1.
  • [3] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd (2017) Quantum machine learning. Nature 549 (7671), pp. 195–202. Cited by: §1.
  • [4] X. Gao, Z. Zhang, and L. Duan (2018) A quantum machine learning algorithm based on generative models. Science advances 4 (12), pp. eaat9004. Cited by: §1.
  • [5] L. K. Grover (1996) A fast quantum mechanical algorithm for database search. In Proceedings of the 28th Annual ACM Symposium on Theory of Computing, STOC '96, New York, NY, USA, pp. 212–219. Cited by: §1.
  • [6] V. Havlíček, A. D. Córcoles, K. Temme, A. W. Harrow, A. Kandala, J. M. Chow, and J. M. Gambetta (2019) Supervised learning with quantum-enhanced feature spaces. Nature 567 (7747), pp. 209. Cited by: §1.
  • [7] I. Kerenidis, J. Landman, and A. Prakash (2020) Quantum algorithms for deep convolutional neural networks. In International Conference on Learning Representations. Cited by: §1.
  • [8] N. Killoran, T. R. Bromley, J. M. Arrazola, M. Schuld, N. Quesada, and S. Lloyd (2019) Continuous-variable quantum neural networks. Physical Review Research 1 (3), pp. 033063. Cited by: §1.
  • [9] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §5.2.
  • [10] Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. Nature 521 (7553), pp. 436–444. Cited by: §1.
  • [11] J. Li, X. Yang, X. Peng, and C. Sun (2017) Hybrid quantum-classical approach to quantum optimal control. Physical review letters 118 (15), pp. 150503. Cited by: §2.2.
  • [12] J. Liu and L. Wang (2018) Differentiable learning of quantum circuit born machines. Physical Review A 98 (6), pp. 062324. Cited by: §1, §2.1.
  • [13] J. Liu, Y. Zhang, Y. Wan, and L. Wang (2019) Variational quantum eigensolver with fewer qubits. Phys. Rev. Research 1, pp. 023025. Cited by: §2.1.
  • [14] S. Lloyd, M. Mohseni, and P. Rebentrost (2014) Quantum principal component analysis. Nature Physics 10 (9), pp. 631. Cited by: §1.
  • [15] S. Lloyd and C. Weedbrook (2018) Quantum generative adversarial learning. Physical review letters 121 (4), pp. 040502. Cited by: §1.
  • [16] X. Luo, J. Liu, P. Zhang, and L. Wang (2019) Yao.jl: extensible, efficient framework for quantum algorithm design. arXiv preprint arXiv:1912.10877. Cited by: §5.
  • [17] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik (2016) The theory of variational hybrid quantum-classical algorithms. New Journal of Physics 18 (2), pp. 023023. Cited by: §2.
  • [18] K. M. Nakanishi, K. Fujii, and S. Todo (2019) Sequential minimal optimization for quantum-classical hybrid algorithms. arXiv preprint arXiv:1903.12166. Cited by: §2.2.
  • [19] J. Preskill (2018) Quantum computing in the nisq era and beyond. Quantum 2, pp. 79. Cited by: §1.
  • [20] P. Rebentrost, M. Mohseni, and S. Lloyd (2014) Quantum support vector machine for big data classification. Physical Review Letters 113 (13), pp. 130503. Cited by: §1.
  • [21] M. Schuld and N. Killoran (2019) Quantum machine learning in feature hilbert spaces. Physical review letters 122 (4), pp. 040504. Cited by: §1, §6.
  • [22] M. Schuld, I. Sinayskiy, and F. Petruccione (2015) An introduction to quantum machine learning. Contemporary Physics 56 (2), pp. 172–185. Cited by: §1.
  • [23] J. Shawe-Taylor, N. Cristianini, et al. (2004) Kernel methods for pattern analysis. Cambridge university press. Cited by: §6.
  • [24] P. W. Shor (1994) Algorithms for quantum computation: discrete logarithms and factoring. In Proceedings 35th Annual Symposium on Foundations of Computer Science, pp. 124–134. Cited by: §1.
  • [25] R. Socher, Y. Bengio, and C. D. Manning (2012) Deep learning for nlp (without magic). In Tutorial Abstracts of ACL 2012, pp. 5–5. Cited by: §1.
  • [26] A. Voulodimos, N. Doulamis, A. Doulamis, and E. Protopapadakis (2018) Deep learning for computer vision: a brief review. Computational intelligence and neuroscience 2018. Cited by: §1.
  • [27] N. Wiebe, D. Braun, and S. Lloyd (2012) Quantum algorithm for data fitting. Physical review letters 109 (5), pp. 050505. Cited by: §1.
  • [28] D. Zelenko, C. Aone, and A. Richardella (2003) Kernel methods for relation extraction. Journal of machine learning research 3 (Feb), pp. 1083–1106. Cited by: §6.
  • [29] J. Zhao, Y. Zhang, C. Shao, Y. Wu, G. Guo, and G. Guo (2019) Building quantum neural networks based on a swap test. Phys. Rev. A 100, pp. 012334. Cited by: §1.

Appendix A Gradient Estimation of PQCs

Without loss of generality, suppose that a PQC has the form

$$U(\vec{\theta}) = U_N(\theta_N)\, U_{N-1}(\theta_{N-1}) \cdots U_1(\theta_1).$$

Given a Hamiltonian $H$, the expectation value is defined by the following equation,

$$\langle H \rangle(\vec{\theta}) = \langle \psi_0 |\, U^\dagger(\vec{\theta})\, H\, U(\vec{\theta})\, | \psi_0 \rangle.$$

The goal in hybrid quantum-classical computing is usually to optimize $\langle H \rangle(\vec{\theta})$. We can use gradient descent for this problem. Thus, we need to estimate $\partial \langle H \rangle / \partial \theta_j$.

Notice that $U_j(\theta_j)$ has the form $e^{-i \theta_j H_j / 2}$, where $H_j^2 = I$. Hence, we have

$$\frac{\partial U_j(\theta_j)}{\partial \theta_j} = -\frac{i}{2} H_j\, U_j(\theta_j).$$

We denote

$$|\psi\rangle = U_{j-1}(\theta_{j-1}) \cdots U_1(\theta_1)\, |\psi_0\rangle$$

and

$$M = \big(U_N(\theta_N) \cdots U_{j+1}(\theta_{j+1})\big)^\dagger\, H\, \big(U_N(\theta_N) \cdots U_{j+1}(\theta_{j+1})\big).$$

By calculus,

$$\frac{\partial \langle H \rangle}{\partial \theta_j} = \frac{i}{2}\, \langle \psi |\, U_j^\dagger(\theta_j)\, [H_j, M]\, U_j(\theta_j)\, | \psi \rangle.$$

Denote

$$\vec{\theta}^{\,\pm} = \big(\theta_1, \dots, \theta_j \pm \tfrac{\pi}{2}, \dots, \theta_N\big).$$

That is, we shift the $j$-th parameter by $\pm \pi/2$. Then, using $U_j(\theta_j + \tfrac{\pi}{2}) = \tfrac{1}{\sqrt{2}}(I - i H_j)\, U_j(\theta_j)$,

$$\langle H \rangle(\vec{\theta}^{\,+}) = \frac{1}{2}\langle \psi | U_j^\dagger \big(M + H_j M H_j\big) U_j | \psi \rangle + \frac{i}{2}\langle \psi | U_j^\dagger [H_j, M] U_j | \psi \rangle.$$

By a similar computation,

$$\langle H \rangle(\vec{\theta}^{\,-}) = \frac{1}{2}\langle \psi | U_j^\dagger \big(M + H_j M H_j\big) U_j | \psi \rangle - \frac{i}{2}\langle \psi | U_j^\dagger [H_j, M] U_j | \psi \rangle.$$

Thus, one can simply check that

$$\frac{\partial \langle H \rangle}{\partial \theta_j} = \frac{1}{2}\Big(\langle H \rangle(\vec{\theta}^{\,+}) - \langle H \rangle(\vec{\theta}^{\,-})\Big).$$

In conclusion, we can estimate the gradient of the parameters of a PQC by shifting the parameters and running the same circuit.