Quantum Capsule Networks

by Zidu Liu, et al.
Tsinghua University

Capsule networks, which incorporate the paradigms of connectionism and symbolism, have brought fresh insights into artificial intelligence. The capsule, as the building block of capsule networks, is a group of neurons represented by a vector that encodes different features of an entity. The information is extracted hierarchically through capsule layers via routing algorithms. Here, we introduce a quantum capsule network (dubbed QCapsNet) together with a quantum dynamic routing algorithm. Our model enjoys an exponential speedup in the dynamic routing process and exhibits an enhanced representation power. To benchmark the performance of the QCapsNet, we carry out extensive numerical simulations on the classification of handwritten digits and symmetry-protected topological phases, and show that the QCapsNet achieves state-of-the-art accuracy and clearly outperforms conventional quantum classifiers. We further unpack the output capsule state and find that a particular subspace may correspond to a human-understandable feature of the input data, which indicates the potential explainability of such networks. Our work reveals an intriguing prospect of quantum capsule networks in quantum machine learning, which may provide a valuable guide towards explainable quantum artificial intelligence.




I Introduction

Connectionism and symbolism are two complementary approaches towards artificial intelligence (AI) [1]. Inspired by biological brains, connectionism aims to model intelligence as an emergent phenomenon arising from a large number of connected neurons. The most popular paradigm of this approach is the artificial neural network. With the recent rise of deep learning [2, 3, 4], connectionism has attracted great attention and become one of the most promising routes towards general AI. Noteworthy examples [4] include feed-forward neural networks, recursive neural networks, recurrent neural networks, and convolutional neural networks (CNNs). Connectionist AI requires less prior knowledge, which is beneficial for scaling up and for broader applications. Yet, connectionist AI typically treats the machine learning model as a black box, which makes it challenging for humans to explain the rationale behind the model’s decisions. In comparison, symbolic AI is based on logic, deduction, and higher-level symbolic representations [5, 6]. It enables one to trace back how a decision is made, and thus to seek interpretability of machine learning models. However, such models generally require more sophisticated prior knowledge, which hinders the widespread application of symbolic AI.

Figure 1: General framework of quantum capsule networks (QCapsNets). The input data is first preprocessed in the previous layers to extract some preliminary features. Within QCapsNets, there are multiple capsules in each layer. Each capsule contains a group of interacting qubits, which forms a sub-quantum neural network (sub-QNN). The output states of the sub-QNN are quantum states living in the Hilbert space, which encode features of the entities. Different capsules can represent different entities, such as the ears, eyes, and mouth of the panda. The information is processed layer by layer via a quantum dynamic routing algorithm. During the routing procedure, the probability that a lower-level capsule is assigned to a higher-level one is updated according to their geometric agreement. As such, higher-level capsules can not only recognize the active entities in the lower layer, but also preserve their geometric relationships. The magnitude of the output capsule indicates the classification probability.

Recently, a new paradigm that incorporates both symbolism and connectionism has become an encouraging trend for AI. Many attempts have been made [7, 8, 9, 10], among which the capsule network (CapsNet) [10, 11, 12, 13, 14], a variant of the traditional neural network, has become a cutting-edge research area. In analogy to a group of neurons constituting a specific functional area of the human brain, the basic unit of CapsNets, the capsule, is a set of neurons that is used to detect a specific entity. Such a capsule is represented by a vector, rather than by a scalar as in traditional neural networks. In this fashion, a capsule is able to represent different features of an entity, such as pose, hue, texture, and deformation. Furthermore, the norm of the capsule indicates the likelihood that the entity has been captured. During the feed-forward process, the information is transferred layer by layer via the routing-by-agreement mechanism [11]: the capsule in the higher-level layer is predicated upon its geometric relationship (e.g., dot product in the Euclidean space) with the lower-level one. Unlike the max-pooling method in CNNs, which throws away information about the precise position of entities, the routing mechanism used in CapsNets preserves the geometric relationships amongst entities. With such geometric information encoded inside the capsule, CapsNets generally have more intrinsic explainability than CNNs [15, 16].

With the development and flourishing of quantum computing architectures [17, 18, 19, 20, 21, 22, 23], the interplay between quantum physics and AI has attracted wide interest [24, 25, 26, 27]. Along this line, many heuristic quantum machine learning models have been proposed, including quantum decision tree classifiers [28, 29], quantum support vector machines, quantum Boltzmann machines [31], quantum generative models [32, 33, 34, 35], quantum convolutional neural networks [36, 37, 38, 39, 40], and perception-based quantum neural networks [41]. Some of these works show potential quantum advantages over their classical counterparts, which has boosted the development of quantum AI [25]. Although some recent efforts have been made to interpret the behavior of AI in quantum optical experiments [42], generally explainable quantum AI is still in its infancy.

Here, inspired by the innovative design of classical CapsNets and the exciting progress in quantum machine learning, we propose a quantum capsule network (QCapsNet), where interacting qubits are encapsulated into a capsule as the building block of the architecture. For the feed-forward process between the two adjacent layers, we propose a quantum dynamic routing algorithm, which may yield an exponential speedup compared with the classical counterpart. We benchmark our model through the classification tasks of both the quantum states and real-life images, and find that its accuracy surpasses conventional quantum classifiers without any capsule architecture. In addition, the combination of symbolism and connectionism prevents the QCapsNet from being fully a black-box model. Through tweaking the active output capsule, one can visualize the variation of a specific explainable feature and semantically interpret the model’s output. Our results not only demonstrate the exceptional performance of the QCapsNet in classification tasks, but also reveal its potential explainability, which may pave a way to explore explainable quantum AI.

II General framework

II.1 Classical capsule networks

To begin with, we briefly recap the essential idea of the CapsNet. In computer graphics, a set of instantiation parameters is an abstract representation of an entity that is fed into a rendering program so as to generate an image [43]. Here, the entity refers to a segment of an object. For example, the nose and the mouth are two entities on a face. The motivation behind introducing a capsule is to encapsulate the instantiation parameters as a vector, and thus predict the presence of a specific entity. In this respect, the norm of the capsule vector represents the likelihood that the entity has been detected, whereas the components of the capsule vector encode its corresponding features (instantiation parameters), such as pose, deformation, and hue. From this perspective, the classical CapsNet is intuitively a neural network that attempts to perform inverse graphics.

The preprocessing layers of the classical CapsNets (e.g., the convolutional layer) first extract some basic features of the input data, and then encapsulate these features into several capsules to represent different entities. Given the $i$-th capsule vector $\mathbf{v}_i^{l}$ in the $l$-th layer, the prediction vector $\mathbf{u}_{j|i}^{l+1}$ of the $j$-th capsule in the $(l+1)$-th layer is calculated by $\mathbf{u}_{j|i}^{l+1} = W_{ij}\mathbf{v}_i^{l}$, where $W_{ij}$ is a weight matrix to be trained by the gradient descent algorithm. After taking into account routing coefficients $c_{ij}$ and a nonlinear normalization function squash, the $j$-th capsule vector in the $(l+1)$-th layer is calculated by $\mathbf{v}_j^{l+1} = \mathrm{squash}(\sum_i c_{ij}\mathbf{u}_{j|i}^{l+1})$. The routing-by-agreement algorithm is executed by dynamically updating the routing coefficients $c_{ij}$ with the geometric agreement (such as dot product) $\mathbf{u}_{j|i}^{l+1}\cdot\mathbf{v}_j^{l+1}$. Here, $c_{ij}$ reflects the probability that the lower-level capsule $\mathbf{v}_i^{l}$ is assigned to the higher-level one $\mathbf{v}_j^{l+1}$. After a few iterations, lower-level capsules with strong agreements will dominate the contribution to the capsule $\mathbf{v}_j^{l+1}$ (see the Supplementary Information). The classification result can be deduced from the norm of the output capsule vector (activation probability). By virtue of this algorithm, CapsNets can not only extract information from lower-level capsules, but also preserve their geometric relationships [11]. As a result, CapsNets can address the so-called Picasso problem in image recognition, e.g., an image with an upside-down position of nose and mouth will not be recognized as a human face in CapsNets. In contrast, CNNs will still classify such an image as a human face, as the max-pooling method generally neglects the spatial relationship between entities [14].
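The classical routing-by-agreement loop described above can be sketched in a few lines of NumPy. This is a minimal illustration in the spirit of Ref. [11], not the code used in this work: the array names (`u_hat`, `b`, `c`), the softmax over routing logits, the random weights, and the layer sizes are all assumptions made for the example.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Shrink each vector's norm into [0, 1) while keeping its direction."""
    norm2 = np.sum(s * s, axis=-1, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def softmax(b, axis):
    e = np.exp(b - b.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(u_hat, n_iter=3):
    """u_hat[i, j, :] is the prediction of lower capsule i for upper capsule j."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                    # routing logits, uniform at start
    for _ in range(n_iter):
        c = softmax(b, axis=1)                     # routing coefficients c_ij
        s = np.einsum("ij,ijd->jd", c, u_hat)      # weighted sum of predictions
        v = squash(s)                              # upper-level capsule vectors
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # dot-product agreement
    return v, c

rng = np.random.default_rng(0)
v_lower = rng.normal(size=(6, 4))                  # 6 lower capsules, dimension 4
W = rng.normal(size=(6, 2, 8, 4)) * 0.5            # hypothetical weight matrices W_ij
u_hat = np.einsum("ijkd,id->ijk", W, v_lower)      # prediction vectors
v_upper, c = dynamic_routing(u_hat)
print(np.linalg.norm(v_upper, axis=1))             # activation of each upper capsule
```

The norms printed at the end play the role of activation probabilities: the squash function keeps them strictly below one, and the capsule with the larger norm would be read as the predicted class.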

II.2 Quantum capsule networks

Figure 2: Numerical results of QCapsNets for classifications of handwritten-digit images and symmetry-protected topological states. (a)-(b) show the inaccuracy on the MNIST training and test datasets after convergence, respectively, as a function of the number of parameters. For concrete comparisons, we equip the QCapsNets with three different sub-QNNs inside the capsule, namely the parameterized quantum circuit (PQC), the deep quantum feed-forward neural network (DQFNN), and the post-selection deep quantum feed-forward neural network (post-DQFNN). Their corresponding QCapsNets are dubbed PQC-Caps, DQFNN-Caps and post-DQFNN-Caps, respectively. The baseline for comparison is a parameterized quantum circuit without any capsule architecture. (c) shows the training process of QCapsNets for symmetry-protected topological states. The training data consist of states generated by uniformly sampling the model’s parameter. We find that the inaccuracy on the training dataset drops to a low level within 40 epochs. The activation probabilities of the two output capsules are plotted in the inset, with the model’s parameter varied from 0.8 to 1.2. Dots of one color indicate correct classification results, while dots of the other color indicate wrong ones. This result shows that our trained QCapsNet can locate the quantum phase transition point.

We now extend the classical CapsNets to the quantum domain and introduce a QCapsNet model, which holds the appealing promise of exponential advantages. The general framework of a QCapsNet is illustrated in Fig. 1. This network consists of three crucial ingredients, i.e., the preprocessing layers, the capsule layers, and the quantum dynamic routing process. The model’s input is first fed into the preprocessing layers to extract some preliminary features. These features are encapsulated into several quantum states and then sent to the capsule layers. Inside each capsule, there is a group of interacting qubits building up a sub-quantum neural network (sub-QNN). As such, given the $i$-th quantum capsule state $\rho_i^{l}$ in the $l$-th layer as the input, the parameterized sub-QNN $\mathcal{E}_{ij}$ can be regarded as a quantum channel that generates a prediction quantum state $\rho_{j|i}^{l+1} = \mathcal{E}_{ij}(\rho_i^{l})$ for the $j$-th capsule in the $(l+1)$-th layer. In order to obtain the $j$-th quantum capsule state in the $(l+1)$-th layer, $\rho_j^{l+1} = \sum_i q_{ij}\rho_{j|i}^{l+1}$, we propose a quantum dynamic routing algorithm. Here, $q_{ij}$ is the routing probability, which is dynamically updated with the geometric relationship (distance measure) between $\rho_{j|i}^{l+1}$ and $\rho_j^{l+1}$. In stark contrast to classical CapsNets, our quantum dynamic routing algorithm is able to evaluate the distance measure of quantum states in parallel, and thus achieve an exponential speedup. The classification results can be read out by measuring the activation probability of capsules in the last layer.

There are several ways to measure the distance between quantum states [44], including the trace distance, the fidelity, and the quantum Wasserstein distance [45, 46]. Here, we utilize the $k$-th moment overlap between two mixed quantum states, $\Omega(\rho, \sigma; k) \equiv \mathrm{Tr}(\rho^{k}\sigma^{k})$, as a geometric measurement tool, where $\rho^{k}$ denotes the $k$-th power of $\rho$ under matrix multiplication. The reason why we choose the $k$-th moment overlap is twofold. On the one hand, the moment order $k$ serves as a hyperparameter controlling the convergence of the iteration: the larger $k$ we choose, the quicker the convergence (yet this requires more experimental resources). On the other hand, such a quantity can be measured in different experiments with a low sample complexity [47, 48]. During the quantum dynamic routing process, the overlap between the prediction state and the capsule state serves as a quantum geometric agreement. Furthermore, the activation probability of each capsule can be accessed through the $k$-th purity of the state: $P(\rho; k) \equiv \Omega(\rho, \rho; k) = \mathrm{Tr}(\rho^{2k})$. As the number of qubits increases, the minimum of the purity will exponentially decay to zero. In addition, since the prediction quantum state heavily relies on the quantum process inside each capsule, the specific structure of sub-QNNs can have an enormous impact on the performance of QCapsNets. We will investigate several QCapsNets with different capsule architectures in the following paragraphs.
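To make the moment overlap and the purity concrete, the following NumPy sketch evaluates them on small density matrices. It assumes the overlap takes the form Tr(ρ^k σ^k) and the purity the form Tr(ρ^{2k}); the function names are hypothetical and chosen only for this example.

```python
import numpy as np

def moment_overlap(rho, sigma, k=1):
    """Assumed form of the k-th moment overlap: Tr(rho^k sigma^k)."""
    rho_k = np.linalg.matrix_power(rho, k)
    sigma_k = np.linalg.matrix_power(sigma, k)
    return float(np.real(np.trace(rho_k @ sigma_k)))

def purity(rho, k=1):
    """Assumed form of the k-th purity: the overlap of a state with itself."""
    return moment_overlap(rho, rho, k)

def maximally_mixed(n_qubits):
    d = 2 ** n_qubits
    return np.eye(d) / d

# The maximally mixed state minimizes the purity; the minimum shrinks
# exponentially with the number of qubits.
for n in range(1, 5):
    print(n, purity(maximally_mixed(n), k=1))
```

For a pure state the purity equals one for every moment order, whereas for the maximally mixed state on n qubits it equals 2^{-n(2k-1)}, which illustrates the exponential decay of the minimum mentioned above.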

II.3 Quantum dynamic routing

Figure 3: Reconstruction of handwritten-digit images with QCapsNets. (a) The architecture of the reconstruction networks. We use a classical CNN to extract some preliminary features of the input image, and then encode its output into nine primary capsules in the quantum layers. Through the quantum dynamic routing process, the essential features of the input images are encoded into two digit capsules. Each capsule in the quantum layers is composed of four qubits. We select the capsule with the largest activation probability for quantum state tomography, and feed its classical representation to the classical feed-forward network to reconstruct the input image. The loss function of this model includes two parts. The first part requires the measurement results of the digit capsules, and the second part evaluates the reconstruction error of the decoder networks. (b) The reconstruction results with QCapsNets. Here, we present images with two different labels, i.e., “3” and “6”. For comparison, we show the input (ground truth) data in the first row and the reconstruction result (prediction) in the second row.

We now introduce a quantum dynamic routing process among all the prediction states and the -th capsule state in -th layer, which contains the following two major subroutines.

(i) Measuring the geometric relationship in parallel. In order to leverage the quantum parallelism, we first encode the set of prediction states into the bus registers of two qRAM (quantum random access memory) states [49, 50, 51, 52], namely, the input qRAM state and the output qRAM state . The routing coefficients of and are uniformly initialized at and then dynamically refined by geometric relationships. Owing to the qRAM structure, we can compute all the overlaps in parallel by use of a SWAP test between and [53]. Through tracing out the bus register, these geometric relationships can be efficiently encoded into the address register as a diagonal quantum density matrix with


(ii) Assigning the routing coefficients. With such an overlap state at hand, we can utilize techniques of density matrix exponentiation [54] and Hamiltonian simulation [55] to generate the following unitary operator,


where is the identity operator in the bus register, and the Pauli- gate acts on the ancilla qubit. Next, we apply to the state for a relatively short time, and project the above ancilla qubit to the subspace via post-selection [56]. As a consequence, all these geometric relationships can be assigned into the routing coefficients of the output qRAM state as


Subsequently, the new capsule state can be obtained by tracing out the index qubits of the output qRAM state . Such a top-down feedback increases the routing coefficient for the prediction state that has a large overlap with the new capsule state . Repeating the above quantum routing procedure for a few iterations (usually three in our numerical experiments), the routing coefficients generally converge. The explicit implementation and technical details of the algorithm are presented in the Supplementary Information.
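The feedback loop of the routing procedure can be emulated classically on explicit density matrices. The sketch below is only an illustration of the idea that prediction states with a large overlap gain weight: the multiplicative update rule, the first-moment overlap Tr(ρσ), and all names are assumptions, and none of the qRAM, SWAP-test, or post-selection machinery of the quantum algorithm is reproduced.

```python
import numpy as np

def overlap(rho, sigma):
    """First-moment overlap Tr(rho sigma), used here as the agreement."""
    return float(np.real(np.trace(rho @ sigma)))

def mix(weights, states):
    """Capsule state as a convex mixture of prediction states."""
    return sum(w * rho for w, rho in zip(weights, states))

def emulate_routing(pred_states, n_iter=3):
    n = len(pred_states)
    q = np.full(n, 1.0 / n)                  # uniform routing coefficients
    for _ in range(n_iter):
        capsule = mix(q, pred_states)
        agree = np.array([overlap(rho, capsule) for rho in pred_states])
        q = q * agree                        # hypothetical multiplicative update
        q = q / q.sum()                      # keep q a probability distribution
    return mix(q, pred_states), q

# Three prediction states agree with each other, one is an outlier.
P0 = np.array([[1.0, 0.0], [0.0, 0.0]])
P1 = np.array([[0.0, 0.0], [0.0, 1.0]])
capsule, q = emulate_routing([P0, P0, P0, P1])
print(q)
```

In this toy example the weight of the outlier shrinks with each iteration, so the final capsule state is dominated by the mutually consistent predictions.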

By virtue of the qRAM structure, each capsule can be addressed in steps. Besides, for the geometric measurement between two -dimensional quantum states, the time complexity of the SWAP test is . Therefore, the overall computational complexity of our quantum dynamic routing algorithm is . Compared with the classical counterpart, which generally takes time , our quantum dynamic routing algorithm showcases an exponential speed-up. This is an appealing advantage of QCapsNets, especially when the size of the learning problem becomes large.
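For intuition about the SWAP test itself, the following NumPy state-vector simulation runs it on two pure states and recovers their overlap from the ancilla statistics. It is a sketch of the standard textbook SWAP test on two small registers, not of the parallel version acting on qRAM states; the function name is hypothetical.

```python
import numpy as np

def swap_test_p0(psi, phi):
    """Probability of measuring the ancilla in |0> after a SWAP test."""
    d = len(psi)
    # SWAP maps |a, b> to |b, a> on the two d-dimensional registers.
    swap = np.zeros((d * d, d * d))
    for a in range(d):
        for b in range(d):
            swap[b * d + a, a * d + b] = 1.0
    identity = np.eye(d * d)
    zeros = np.zeros_like(identity)
    # Controlled-SWAP: act with SWAP only when the ancilla is |1>.
    cswap = np.block([[identity, zeros], [zeros, swap]])
    hadamard = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    h_anc = np.kron(hadamard, identity)
    state = np.kron(np.array([1.0, 0.0]), np.kron(psi, phi))
    state = h_anc @ (cswap @ (h_anc @ state))
    return float(np.sum(np.abs(state[: d * d]) ** 2))

zero = np.array([1.0, 0.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2.0)
p0 = swap_test_p0(zero, plus)
print("estimated overlap:", 2.0 * p0 - 1.0)
```

The ancilla returns 0 with probability (1 + |⟨ψ|φ⟩|²)/2, so the overlap is read off as 2·P(0) − 1. On hardware this probability would be estimated from repeated measurements rather than computed exactly.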

III Numerical experiments

III.1 Performance benchmarking

Figure 4: Variations of handwritten-digit images with different perturbations on the active digit capsule. (a) We apply a -axis rotation on the first qubit in the active digit capsule, which shows a variation of the thickness in the reconstruction images. (b) We apply a -axis rotation on the last qubit in the active digit capsule, which indicates a stretch behavior in the reconstruction images. (c) We apply a global -axis rotation on the active digit capsule. The reconstruction images are rotated in different angles, along with a slight deformation. The perturbation parameter is tweaked from to by intervals of 0.06.

To benchmark the performance of QCapsNets, we carry out numerical experiments on the classification of both classical (e.g., handwritten-digit images) and quantum (e.g., topological states) data. Note that in QCapsNets, we can furnish the capsules with various sub-QNNs. Different families of sub-QNNs may bear distinct entangling capabilities and representation power. Therefore, in the following numerical experiments, we propose three kinds of QCapsNets with different sub-QNNs, and benchmark their performance by the classification accuracy. The first sub-QNN is the parameterized quantum circuit (PQC), which has been widely used as a standard ansatz for quantum classifiers [57, 58, 59, 60, 61, 62, 63]. The second one is the deep quantum feed-forward neural network (DQFNN), which has been proposed to solve supervised learning problems [41] and to model the dynamics of open quantum systems [64]. Inside a DQFNN, each node represents a qubit, and the connections are given by parameterized unitary gates. It benefits from an efficient quantum backpropagation algorithm, a relaxed requirement on the quantum coherence time, and a smaller number of qubits owing to a reuse mechanism. The third one is the post-selection deep quantum feed-forward neural network (post-DQFNN), which appends an additional post-selection procedure to each layer of the DQFNN. The explicit structures of these sub-QNNs are shown in the Supplementary Information. For brevity, we name QCapsNets equipped with the above three sub-QNNs as PQC-Caps, DQFNN-Caps and post-DQFNN-Caps, respectively. In addition, we use the conventional parameterized quantum circuit (without any capsule architecture) as a baseline for comparison. We focus on the two-category classification problem, and thus supply the last layer of QCapsNets with two capsules, whose explicit structures are discussed in the Supplementary Information.
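As a rough picture of what a parameterized quantum circuit looks like, the sketch below simulates a generic layered ansatz in NumPy. The gate choice (R_y rotations followed by a chain of CNOTs), the parameter layout, and the function names are assumptions for illustration; the actual sub-QNN structures are those given in the Supplementary Information.

```python
import numpy as np

def ry(theta):
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.array([[c, -s], [s, c]])

def apply_single(state, gate, qubit, n):
    """Apply a 2x2 gate to one qubit of an n-qubit state vector."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [qubit]))
    return np.moveaxis(psi, 0, qubit).reshape(-1)

def apply_cnot(state, control, target, n):
    """Flip the target qubit on the part of the state where control is 1."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[control] = 1
    axis = target if target < control else target - 1
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=axis).copy()
    return psi.reshape(-1)

def pqc_state(params, n):
    """params has shape (n_layers, n): one rotation angle per qubit per layer."""
    state = np.zeros(2 ** n)
    state[0] = 1.0
    for layer in params:
        for q in range(n):
            state = apply_single(state, ry(layer[q]), q, n)
        for q in range(n - 1):
            state = apply_cnot(state, q, q + 1, n)
    return state

rng = np.random.default_rng(1)
params = rng.uniform(0.0, 2.0 * np.pi, size=(3, 4))   # 3 layers on 4 qubits
probs = np.abs(pqc_state(params, 4)) ** 2
print(probs.sum())
```

In a variational classifier the angles in `params` would be the trainable parameters, and the number of such parameters is the quantity on the horizontal axis of Fig. 2a-b.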

We first apply QCapsNets to the classification of handwritten-digit images in the MNIST dataset [65], which has been widely considered to be a real-life test bed for various machine learning paradigms. In Fig. 2a-b, we plot the scaling of the inaccuracy as a function of the number of parameters in QCapsNets for the training and test datasets, respectively. As the number of parameters increases, the inaccuracy of all three QCapsNets decreases and falls considerably below that of the baseline. Therefore, given the same number of parameters, owing to the capsule architecture and the quantum dynamic routing mechanism, QCapsNets can extract the information more effectively and may possess an enhanced representation power compared with conventional parameterized quantum circuits.

In addition to classical data, QCapsNets also apply to quantum input data. To this end, we further use the QCapsNet as a quantum classifier for symmetry-protected topological (SPT) states. Specifically, we consider the following cluster-Ising model, whose Hamiltonian reads [66]:

$$H(\lambda) = -\sum_{j=1}^{N} \sigma^{x}_{j-1}\sigma^{z}_{j}\sigma^{x}_{j+1} + \lambda \sum_{j=1}^{N} \sigma^{y}_{j}\sigma^{y}_{j+1},$$

where the Pauli matrices $\sigma^{\alpha}_{j}$ ($\alpha = x, y, z$) act on the $j$-th spin and $N$ is the total number of spins. The parameter $\lambda$ indicates the strength of the nearest-neighbor coupling. This model is exactly solvable and exhibits a well-understood quantum phase transition point at $\lambda = 1$. There is an antiferromagnetic phase for $\lambda > 1$, and an SPT phase for $\lambda < 1$ (characterized by a non-local string order). The training data is a set of ground states of the Hamiltonian $H(\lambda)$, generated by uniformly sampling $\lambda$ under the periodic boundary condition. In this example, the capsule structure of our QCapsNet is fixed to be the DQFNN.
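Ground states of the cluster-Ising model can be generated for small chains by exact diagonalization. The NumPy sketch below assumes the standard form of the model, with a three-spin cluster term of unit strength and a nearest-neighbor σ^y σ^y coupling of strength `lam`, under periodic boundary conditions; the sign convention, function names, and chain length are assumptions made for illustration.

```python
import numpy as np
from functools import reduce

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def pauli_string(ops, n):
    """Tensor product with ops[j] on site j and the identity elsewhere."""
    return reduce(np.kron, [ops.get(j, I2) for j in range(n)])

def cluster_ising(n, lam):
    """Assumed form: -sum X_{j-1} Z_j X_{j+1} + lam * sum Y_j Y_{j+1}, n >= 3."""
    H = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for j in range(n):
        left, right = (j - 1) % n, (j + 1) % n     # periodic boundary
        H -= pauli_string({left: X, j: Z, right: X}, n)
        H += lam * pauli_string({j: Y, right: Y}, n)
    return H

def ground_state(n, lam):
    energies, vectors = np.linalg.eigh(cluster_ising(n, lam))
    return energies[0], vectors[:, 0]

for lam in (0.0, 0.8, 1.2):
    energy, _ = ground_state(6, lam)
    print(lam, energy)
```

A labeled dataset for the classifier could then be built by sampling `lam`, computing the ground state, and assigning the label according to which side of the critical point the sample falls on.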

As shown in Fig. 2c, the inaccuracy on the training dataset drops to a low level within 40 epochs. After training, we generate ground states as the test dataset, with the coupling parameter ranging from 0.8 to 1.2. In the inset, we feed the trained QCapsNets with the test dataset, and plot the activation probabilities of the two output capsules as a function of the coupling parameter. In the regime far from the critical point, the two activation probabilities differ significantly, so our QCapsNet can precisely distinguish the two phases. In addition, the phase transition point can be inferred from the intersection of the two activation-probability curves. Although there are some tiny fluctuations near the quantum phase transition point due to finite-size effects, the critical point estimated by the QCapsNet deviates only slightly from the exact value. The loss function and the explicit activation probability of capsules are given in the Supplementary Information.

III.2 Explainability of QCapsNets

We have demonstrated the exceptional performance of QCapsNets in the classification tasks. Yet, if one would like to further utilize QCapsNets for making critical decisions (e.g., in self-driving cars and medical diagnostics) [67], it is of crucial importance to understand, trust, and explain the rationale behind the model’s decisions. As such, here we examine the potential explainability of QCapsNets through the following reconstruction scheme.

In Fig. 3a, we attach the QCapsNet to a classical encoder (CNN) and a classical decoder (feed-forward network), and use the whole network to reconstruct the input image from the MNIST dataset. The first two procedures are similar to the ones in the classification task. Some basic features of the image are first extracted by the classical encoder, and then encapsulated into nine primary capsules in the quantum layers. Through quantum dynamic routing process, high-level features are encoded into two digit capsules. We pick up the capsule state with the largest activation probability, and feed it into the classical feedforward network to reconstruct the input image. To guide the active capsule to capture more intrinsic features, we use a composite loss function which takes into account both the classification and reconstruction loss (see the Supplementary Information for their explicit expressions). After training, the reconstruction results are plotted in Fig. 3b, where the first row shows the input images (ground truth) and the second row exhibits the reconstruction images (prediction). These reconstruction results are considerably robust while preserving their representative details.

In the above simulation, we find that the most active capsule contains sufficient information to reconstruct the original image. Hence, the entire Hilbert space of such a quantum capsule state may encode plenty of variants of the images. After the reconstruction process, we can feed a perturbed capsule state to the trained decoder network, and analyze how the perturbation affects the reconstruction result. As shown in Fig. 4, we test three different types of perturbations on the digit capsule, with the perturbation parameter swept in steps of 0.06. The first type of perturbation we consider is a single-axis rotation on the first qubit. In Fig. 4a, as the perturbation parameter gets larger, the strokes of both digits “3” and “6” become thicker. In Fig. 4b, we apply a single-axis rotation on the last qubit. Through tuning the perturbation parameter, both digits are squeezed to various degrees. In Fig. 4c, we apply a global single-axis rotation on the whole active capsule. By tweaking the perturbation parameter, both digits are rotated by different angles, together with a tiny deformation. These perturbation results indicate that a particular subspace of the digit capsule state could approximately represent a specific explainable feature of the handwritten images. As the Hilbert space grows exponentially with the number of qubits, each capsule may have the potential to encode an exponential amount of information, which suggests an enhanced representation power and hints at a quantum advantage over its classical counterpart.
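The perturbation experiment amounts to conjugating the capsule's density matrix with a rotation on selected qubits before handing it to the decoder. The following NumPy sketch shows this step for a four-qubit capsule; the rotation axis is left as a parameter, and the sweep endpoints, the random test state, and the function names are assumptions made for illustration (the decoder itself is not modeled).

```python
import numpy as np

PAULI = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def rotation(axis, theta):
    """Single-qubit rotation exp(-i * theta / 2 * sigma_axis)."""
    return np.cos(theta / 2.0) * np.eye(2) - 1j * np.sin(theta / 2.0) * PAULI[axis]

def perturb_capsule(rho, axis, theta, qubits, n):
    """Rotate the listed qubits of an n-qubit density matrix rho."""
    U = np.array([[1.0 + 0.0j]])
    for q in range(n):
        U = np.kron(U, rotation(axis, theta) if q in qubits else np.eye(2))
    return U @ rho @ U.conj().T

# Hypothetical four-qubit capsule state (a random pure state).
rng = np.random.default_rng(2)
psi = rng.normal(size=16) + 1j * rng.normal(size=16)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())

# Sweep a perturbation on the first qubit in steps of 0.06.
thetas = np.arange(-0.24, 0.24 + 1e-9, 0.06)   # endpoints are placeholders
perturbed = [perturb_capsule(rho, "x", t, qubits={0}, n=4) for t in thetas]
print(len(perturbed))
```

Passing `qubits={3}` perturbs the last qubit, and `qubits={0, 1, 2, 3}` applies the same rotation globally, mirroring the three perturbation types of Fig. 4. Each perturbed state would then be fed to the trained decoder to render an image.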

IV Discussion

We have introduced a QCapsNet model equipped with an efficient quantum dynamic routing algorithm. By exploiting quantum parallelism, the overlaps between capsule states can be measured in parallel, which yields an exponential speed-up over the classical counterpart. Through the classification tasks for both classical handwritten digits and SPT states, we found that QCapsNets achieve state-of-the-art accuracy among quantum classifiers. By virtue of the geometric relationships, QCapsNets can capture the essential features of the input data, and then encapsulate them into the output capsule. Accordingly, such a capsule contains sufficient instantiation parameters to reconstruct the original data. In particular, one specific subspace of the output capsule could correspond to a human-understandable feature of the input data.

Many interesting and important questions remain unexplored and deserve further investigation. First, it would be interesting to consider other distance measures to quantify the geometric relationships between capsule states [44, 45, 46]. Second, the performance of QCapsNets may be further improved by other quantum routing algorithms. For instance, a quantum extension of the expectation-maximization routing algorithm [12] might be a good candidate along this line. In addition, quantum adversarial machine learning has attracted considerable attention recently [61, 68]. It has been shown that, compared to CNNs, CapsNets are relatively resilient against adversarial attacks [12]. This raises the natural question of whether QCapsNets are also more robust to adversarial perturbations than other traditional quantum classifiers. Finally, such a grouped architecture may shed light on implementing QCapsNets in a collection of distributed quantum computers that are routed via the quantum internet [69].

V Acknowledgements

We thank S. R. Lu and J. G. Liu for helpful discussion. This work was supported by the Frontier Science Center for Quantum Information of the Ministry of Education of China, Tsinghua University Initiative Scientific Research Program, and the Beijing Academy of Quantum Information Sciences. D.-L. D. also acknowledges additional support from the Shanghai Qi Zhi Institute.


  • Russell and Norvig [2020] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 4th ed. (Pearson, Hoboken, 2020).
  • LeCun et al. [2015] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature 521, 436 (2015).
  • Jordan and Mitchell [2015] M. I. Jordan and T. M. Mitchell, Machine learning: Trends, perspectives, and prospects, Science 349, 255 (2015).
  • Goodfellow et al. [2016] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (The MIT Press, Cambridge, 2016).
  • Gilmore [1960] P. C. Gilmore, A Proof Method for Quantification Theory: Its Justification and Realization, IBM J. Res. Dev. 4, 28 (1960).
  • Eliasmith and Bechtel [2006] C. Eliasmith and W. Bechtel, Symbolic versus Subsymbolic, in Encyclopedia of Cognitive Science (American Cancer Society, 2006).
  • Hu et al. [2014] X. Hu, J. Zhang, J. Li, and B. Zhang, Sparsity-Regularized HMAX for Visual Recognition, PLOS ONE 9, e81813 (2014).
  • Shi et al. [2017] J. Shi, J. Chen, J. Zhu, S. Sun, Y. Luo, Y. Gu, and Y. Zhou, ZhuSuan: A Library for Bayesian Deep Learning, arXiv:1709.05870 (2017).
  • Dong [2020] T. Dong, A Geometric Approach to the Unification of Symbolic Structures and Neural Networks, 1st ed. (Springer, Cham, 2020).
  • Hinton et al. [2011] G. E. Hinton, A. Krizhevsky, and S. D. Wang, Transforming Auto-Encoders, in Artificial Neural Networks and Machine Learning – ICANN 2011, Lecture Notes in Computer Science, edited by T. Honkela, W. Duch, M. Girolami, and S. Kaski (Springer, Berlin, Heidelberg, 2011) pp. 44–51.
  • Sabour et al. [2017] S. Sabour, N. Frosst, and G. E. Hinton, Dynamic Routing Between Capsules, arXiv:1710.09829 (2017).
  • Hinton et al. [2018] G. E. Hinton, S. Sabour, and N. Frosst, Matrix capsules with EM routing, in International Conference on Learning Representations (Vancouver, BC, Canada, 2018).
  • Wang and Liu [2018] D. Wang and Q. Liu, An Optimization View on Dynamic Routing Between Capsules, in International Conference on Learning Representations Workshop (2018).
  • Patrick et al. [2019] M. K. Patrick, A. F. Adekoya, A. A. Mighty, and B. Y. Edward, Capsule Networks – A survey, in Journal of King Saud University - Computer and Information Sciences (2019).
  • Shahroudnejad et al. [2018] A. Shahroudnejad, P. Afshar, K. N. Plataniotis, and A. Mohammadi, Improved Explainability of Capsule Networks: Relevance Path by Agreement, in 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP) (2018) pp. 549–553.
  • Wang et al. [2020] L. Wang, R. Nie, Z. Yu, R. Xin, C. Zheng, Z. Zhang, J. Zhang, and J. Cai, An interpretable deep-learning architecture of capsule networks for identifying cell-type gene expression programs from single-cell RNA-sequencing data, Nat. Mach. Intell. 2, 693 (2020).
  • Arute et al. [2019] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. S. L. Brandao, D. A. Buell, B. Burkett, Y. Chen, Z. Chen, B. Chiaro, R. Collins, W. Courtney, A. Dunsworth, E. Farhi, B. Foxen, A. Fowler, C. Gidney, M. Giustina, R. Graff, K. Guerin, S. Habegger, M. P. Harrigan, M. J. Hartmann, A. Ho, M. Hoffmann, T. Huang, T. S. Humble, S. V. Isakov, E. Jeffrey, Z. Jiang, D. Kafri, K. Kechedzhi, J. Kelly, P. V. Klimov, S. Knysh, A. Korotkov, F. Kostritsa, D. Landhuis, M. Lindmark, E. Lucero, D. Lyakh, S. Mandrà, J. R. McClean, M. McEwen, A. Megrant, X. Mi, K. Michielsen, M. Mohseni, J. Mutus, O. Naaman, M. Neeley, C. Neill, M. Y. Niu, E. Ostby, A. Petukhov, J. C. Platt, C. Quintana, E. G. Rieffel, P. Roushan, N. C. Rubin, D. Sank, K. J. Satzinger, V. Smelyanskiy, K. J. Sung, M. D. Trevithick, A. Vainsencher, B. Villalonga, T. White, Z. J. Yao, P. Yeh, A. Zalcman, H. Neven, and J. M. Martinis, Quantum supremacy using a programmable superconducting processor, Nature 574, 505 (2019).
  • Song et al. [2019] C. Song, K. Xu, H. Li, Y.-R. Zhang, X. Zhang, W. Liu, Q. Guo, Z. Wang, W. Ren, J. Hao, H. Feng, H. Fan, D. Zheng, D.-W. Wang, H. Wang, and S.-Y. Zhu, Generation of multicomponent atomic Schrödinger cat states of up to 20 qubits, Science 365, 574 (2019).
  • Wright et al. [2019] K. Wright, K. M. Beck, S. Debnath, J. M. Amini, Y. Nam, N. Grzesiak, J.-S. Chen, N. C. Pisenti, M. Chmielewski, C. Collins, K. M. Hudek, J. Mizrahi, J. D. Wong-Campos, S. Allen, J. Apisdorf, P. Solomon, M. Williams, A. M. Ducore, A. Blinov, S. M. Kreikemeier, V. Chaplin, M. Keesan, C. Monroe, and J. Kim, Benchmarking an 11-qubit quantum computer, Nat. Commun. 10, 5464 (2019).
  • Gong et al. [2021] M. Gong, S. Wang, C. Zha, M.-C. Chen, H.-L. Huang, Y. Wu, Q. Zhu, Y. Zhao, S. Li, S. Guo, H. Qian, Y. Ye, F. Chen, C. Ying, J. Yu, D. Fan, D. Wu, H. Su, H. Deng, H. Rong, K. Zhang, S. Cao, J. Lin, Y. Xu, L. Sun, C. Guo, N. Li, F. Liang, V. M. Bastidas, K. Nemoto, W. J. Munro, Y.-H. Huo, C.-Y. Lu, C.-Z. Peng, X. Zhu, and J.-W. Pan, Quantum walks on a programmable two-dimensional 62-qubit superconducting processor, Science 372, 948 (2021).
  • Zhong et al. [2020] H.-S. Zhong, H. Wang, Y.-H. Deng, M.-C. Chen, L.-C. Peng, Y.-H. Luo, J. Qin, D. Wu, X. Ding, Y. Hu, P. Hu, X.-Y. Yang, W.-J. Zhang, H. Li, Y. Li, X. Jiang, L. Gan, G. Yang, L. You, Z. Wang, L. Li, N.-L. Liu, C.-Y. Lu, and J.-W. Pan, Quantum computational advantage using photons, Science 370, 1460 (2020).
  • Wu et al. [2021] Y. Wu, W.-S. Bao, S. Cao, F. Chen, M.-C. Chen, X. Chen, T.-H. Chung, H. Deng, Y. Du, D. Fan, M. Gong, C. Guo, C. Guo, S. Guo, L. Han, L. Hong, H.-L. Huang, Y.-H. Huo, L. Li, N. Li, S. Li, Y. Li, F. Liang, C. Lin, J. Lin, H. Qian, D. Qiao, H. Rong, H. Su, L. Sun, L. Wang, S. Wang, D. Wu, Y. Xu, K. Yan, W. Yang, Y. Yang, Y. Ye, J. Yin, C. Ying, J. Yu, C. Zha, C. Zhang, H. Zhang, K. Zhang, Y. Zhang, H. Zhao, Y. Zhao, L. Zhou, Q. Zhu, C.-Y. Lu, C.-Z. Peng, X. Zhu, and J.-W. Pan, Strong Quantum Computational Advantage Using a Superconducting Quantum Processor, Phys. Rev. Lett. 127, 180501 (2021).
  • Zhong et al. [2021] H.-S. Zhong, Y.-H. Deng, J. Qin, H. Wang, M.-C. Chen, L.-C. Peng, Y.-H. Luo, D. Wu, S.-Q. Gong, H. Su, Y. Hu, P. Hu, X.-Y. Yang, W.-J. Zhang, H. Li, Y. Li, X. Jiang, L. Gan, G. Yang, L. You, Z. Wang, L. Li, N.-L. Liu, J. J. Renema, C.-Y. Lu, and J.-W. Pan, Phase-Programmable Gaussian Boson Sampling Using Stimulated Squeezed Light, Phys. Rev. Lett. 127, 180502 (2021).
  • Biamonte et al. [2017] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Nature 549, 195 (2017).
  • Dunjko and Briegel [2018] V. Dunjko and H. J. Briegel, Machine learning & artificial intelligence in the quantum domain: A review of recent progress, Rep. Prog. Phys. 81, 074001 (2018).
  • Das Sarma et al. [2019] S. Das Sarma, D.-L. Deng, and L.-M. Duan, Machine learning meets quantum physics, Phys. Today 72, 48 (2019).
  • Li and Deng [2021] W. Li and D.-L. Deng, Recent advances for quantum classifiers, Sci. China-Phys. Mech. Astron. 65, 220301 (2021).
  • Lu and Braunstein [2014] S. Lu and S. L. Braunstein, Quantum decision tree classifier, Quantum Inf. Process. 13, 757 (2014).
  • Heese et al. [2021] R. Heese, P. Bickert, and A. E. Niederle, Representation of binary classification trees with binary features by quantum circuits, arXiv:2108.13207 (2021).
  • Rebentrost et al. [2014] P. Rebentrost, M. Mohseni, and S. Lloyd, Quantum Support Vector Machine for Big Data Classification, Phys. Rev. Lett. 113, 130503 (2014).
  • Amin et al. [2018] M. H. Amin, E. Andriyash, J. Rolfe, B. Kulchytskyy, and R. Melko, Quantum Boltzmann Machine, Phys. Rev. X 8, 021050 (2018).
  • Lloyd and Weedbrook [2018] S. Lloyd and C. Weedbrook, Quantum Generative Adversarial Learning, Phys. Rev. Lett. 121, 040502 (2018).
  • Dallaire-Demers and Killoran [2018] P.-L. Dallaire-Demers and N. Killoran, Quantum generative adversarial networks, Phys. Rev. A 98, 012324 (2018).
  • Gao et al. [2018] X. Gao, Z.-Y. Zhang, and L.-M. Duan, A quantum machine learning algorithm based on generative models, Sci. Adv. 4, eaat9004 (2018).
  • Hu et al. [2019] L. Hu, S.-H. Wu, W. Cai, Y. Ma, X. Mu, Y. Xu, H. Wang, Y. Song, D.-L. Deng, C.-L. Zou, and L. Sun, Quantum generative adversarial learning in a superconducting quantum circuit, Sci. Adv. 5, eaav2761 (2019).
  • Cong et al. [2019] I. Cong, S. Choi, and M. D. Lukin, Quantum convolutional neural networks, Nat. Phys. 15, 1273 (2019).
  • Li et al. [2020] Y. Li, R.-G. Zhou, R. Xu, J. Luo, and W. Hu, A quantum deep convolutional neural network for image recognition, Quantum Sci. Technol. 5, 044003 (2020).
  • Kerenidis et al. [2019] I. Kerenidis, J. Landman, and A. Prakash, Quantum Algorithms for Deep Convolutional Neural Networks, arXiv:1911.01117 (2019).
  • Liu et al. [2021] J. Liu, K. H. Lim, K. L. Wood, W. Huang, C. Guo, and H.-L. Huang, Hybrid quantum-classical convolutional neural networks, Sci. China Phys. Mech. Astron. 64, 290311 (2021).
  • Wei et al. [2021] S. Wei, Y. Chen, Z. Zhou, and G. Long, A Quantum Convolutional Neural Network on NISQ Devices, arXiv:2104.06918 (2021).
  • Beer et al. [2020] K. Beer, D. Bondarenko, T. Farrelly, T. J. Osborne, R. Salzmann, D. Scheiermann, and R. Wolf, Training deep quantum neural networks, Nat. Commun. 11, 808 (2020).
  • Krenn et al. [2021] M. Krenn, J. S. Kottmann, N. Tischler, and A. Aspuru-Guzik, Conceptual Understanding through Efficient Automated Design of Quantum Optical Experiments, Phys. Rev. X 11, 031044 (2021).
  • Hughes et al. [2013] J. Hughes, A. van Dam, M. McGuire, D. Sklar, J. Foley, S. Feiner, and K. Akeley, Computer Graphics: Principles and Practice, 3rd ed. (Addison-Wesley Professional, Upper Saddle River, New Jersey, 2013).
  • Nielsen and Chuang [2010] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, Cambridge, 2010).
  • Chakrabarti et al. [2019] S. Chakrabarti, Y. Huang, T. Li, S. Feizi, and X. Wu, Quantum Wasserstein Generative Adversarial Networks, in Proceedings of the 33rd International Conference on Neural Information Processing Systems (Curran Associates Inc., Red Hook, NY, USA, 2019) pp. 6781–6792.
  • Kiani et al. [2021] B. T. Kiani, G. De Palma, M. Marvian, Z.-W. Liu, and S. Lloyd, Quantum Earth Mover’s Distance: A New Approach to Learning Quantum Data, arXiv:2101.03037 (2021).
  • Elben et al. [2020] A. Elben, B. Vermersch, R. van Bijnen, C. Kokail, T. Brydges, C. Maier, M. K. Joshi, R. Blatt, C. F. Roos, and P. Zoller, Cross-Platform Verification of Intermediate Scale Quantum Devices, Phys. Rev. Lett. 124, 010504 (2020).
  • Anshu et al. [2021] A. Anshu, Z. Landau, and Y. Liu, Distributed quantum inner product estimation, arXiv:2111.03273 (2021).
  • Giovannetti et al. [2008a] V. Giovannetti, S. Lloyd, and L. Maccone, Quantum Random Access Memory, Phys. Rev. Lett. 100, 160501 (2008a).
  • Giovannetti et al. [2008b] V. Giovannetti, S. Lloyd, and L. Maccone, Architectures for a quantum random access memory, Phys. Rev. A 78, 052310 (2008b).
  • Park et al. [2019] D. K. Park, F. Petruccione, and J.-K. K. Rhee, Circuit-Based Quantum Random Access Memory for Classical Data, Sci. Rep. 9, 3949 (2019).
  • Hann et al. [2019] C. T. Hann, C.-L. Zou, Y. Zhang, Y. Chu, R. J. Schoelkopf, S. M. Girvin, and L. Jiang, Hardware-Efficient Quantum Random Access Memory with Hybrid Quantum Acoustic Systems, Phys. Rev. Lett. 123, 250501 (2019).
  • Buhrman et al. [2001] H. Buhrman, R. Cleve, J. Watrous, and R. de Wolf, Quantum Fingerprinting, Phys. Rev. Lett. 87, 167902 (2001).
  • Lloyd et al. [2014] S. Lloyd, M. Mohseni, and P. Rebentrost, Quantum principal component analysis, Nat. Phys. 10, 631 (2014).
  • Berry et al. [2007] D. W. Berry, G. Ahokas, R. Cleve, and B. C. Sanders, Efficient Quantum Algorithms for Simulating Sparse Hamiltonians, Commun. Math. Phys. 270, 359 (2007).
  • Lloyd et al. [2013] S. Lloyd, M. Mohseni, and P. Rebentrost, Quantum algorithms for supervised and unsupervised machine learning, arXiv:1307.0411 (2013).
  • Cerezo et al. [2021] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, and P. J. Coles, Variational quantum algorithms, Nat. Rev. Phys. 3, 625 (2021).
  • Benedetti et al. [2019] M. Benedetti, E. Lloyd, S. Sack, and M. Fiorentini, Parameterized quantum circuits as machine learning models, Quantum Sci. Technol. 4, 043001 (2019).
  • Farhi and Neven [2018] E. Farhi and H. Neven, Classification with Quantum Neural Networks on Near Term Processors, arXiv:1802.06002 (2018).
  • Grant et al. [2018] E. Grant, M. Benedetti, S. Cao, A. Hallam, J. Lockhart, V. Stojevic, A. G. Green, and S. Severini, Hierarchical quantum classifiers, Npj Quantum Inf. 4, 1 (2018).
  • Lu et al. [2020] S. Lu, L.-M. Duan, and D.-L. Deng, Quantum adversarial machine learning, Phys. Rev. Research 2, 033212 (2020).
  • Lu et al. [2021] Z. Lu, P.-X. Shen, and D.-L. Deng, Markovian Quantum Neuroevolution for Machine Learning, Phys. Rev. Applied 16, 044039 (2021).
  • Li et al. [2021] W. Li, S. Lu, and D.-L. Deng, Quantum federated learning through blind quantum computing, Sci. China-Phys. Mech. Astron. 64, 100312 (2021).
  • Liu et al. [2020] Z. Liu, L.-M. Duan, and D.-L. Deng, Solving Quantum Master Equations with Deep Quantum Neural Networks, arXiv:2008.05488 (2020).
  • LeCun et al. [1998] Y. LeCun, C. Cortes, and C. Burges, MNIST handwritten digit database (1998).
  • Smacchia et al. [2011] P. Smacchia, L. Amico, P. Facchi, R. Fazio, G. Florio, S. Pascazio, and V. Vedral, Statistical mechanics of the cluster Ising model, Phys. Rev. A 84, 022304 (2011).
  • Finlayson et al. [2019] S. G. Finlayson, J. D. Bowers, J. Ito, J. L. Zittrain, A. L. Beam, and I. S. Kohane, Adversarial attacks on medical machine learning, Science 363, 1287 (2019).
  • Liu and Wittek [2020] N. Liu and P. Wittek, Vulnerability of quantum classification to adversarial perturbations, Phys. Rev. A 101, 062331 (2020).
  • Kimble [2008] H. J. Kimble, The quantum internet, Nature 453, 1023 (2008).
  • Liao [2021] H. Liao, CapsNet-Tensorflow (2021).
  • Géron [2017] A. Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 1st ed. (O’Reilly Media, Beijing ; Boston, 2017).
  • Choi [1975] M.-D. Choi, Completely positive linear maps on complex matrices, Linear Algebra Its Appl. 10, 285 (1975).
  • Jamiołkowski [1972] A. Jamiołkowski, Linear transformations which preserve trace and positive semidefiniteness of operators, Rep. Math. Phys. 3, 275 (1972).
  • Wilde [2017] M. M. Wilde, Quantum Information Theory, 2nd ed. (Cambridge University Press, Cambridge, UK ; New York, 2017).
  • Abadi et al. [2016] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng, TensorFlow: A system for large-scale machine learning, in Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI’16 (USENIX Association, USA, 2016) pp. 265–283.

I Classical capsule networks

Figure S1: The overall structure of classical capsule networks with three essential layers. The input data are first processed through the convolutional layer. Then the corresponding feature maps are fed into the PrimaryCaps layer and reshaped into $1152$ capsules living in an $8$-dimensional Euclidean space. Owing to the dynamic routing algorithm, the DigitCaps layer ($10$ capsules in a $16$-dimensional space) is able to extract features of entities from lower-level capsules and capture their geometric relationships. The norm of the active capsule in the DigitCaps layer indicates the probability of a specific class being detected.

General architecture. The classical capsule network (CapsNet) was first introduced in Ref. [10], and then equipped with a dynamic routing algorithm in Ref. [11] and an expectation-maximization (EM) routing algorithm in Ref. [12]. CapsNets are multi-layered networks whose building block is a capsule that is represented by a vector, rather than by a scalar as in vanilla neural networks (see Table S1 for a detailed comparison) [70, 71]. The norm (length) of the vector reflects the probability of an entity being present, while the components (orientations) of the vector indicate the features of the entity. In computer graphics [43], features refer to different types of instantiation parameters of an entity, such as its rotation, pose, hue, texture, deformation, etc. The capsule can learn to detect an entity in an object, together with its corresponding features.

The structure of CapsNets generally consists of three parts (as shown in Fig. S1). The first part is the convolutional layer, as used in convolutional neural networks (CNNs). In this layer, the model detects the basic local features of the input data. For the specific task demonstrated in Ref. [11], the MNIST handwritten data are processed through two convolutional layers:

$$[28,28,1] \xrightarrow[\text{kernel: } 9\times 9,\ \text{stride: } 1]{\text{ReLU Conv2d}} [20,20,256] \xrightarrow[\text{kernel: } 9\times 9,\ \text{stride: } 2]{\text{ReLU Conv2d}} [6,6,256] = [6,6,32\times 8].$$

The second part is the PrimaryCaps layer, which not only captures the features of data, but also generates their combinations. In this layer, the above feature map is reshaped into $[1152, 8]$, corresponding to $1152 = 6\times 6\times 32$ capsule vectors living in the $8$-dimensional Euclidean space. These capsule vectors are then fed into the third part, the DigitCaps layer. Therein, via dynamic routing between the PrimaryCaps layer and the DigitCaps layer, the shape of the final output is $[10, 16]$, where $10$ is equal to the number of categories for classification. The probability can be read from the norm of each output vector, while its components represent the features of the entity, which can be further used to unbox the learning model and seek its explainability.
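The layer shapes quoted above follow from the standard output-size rule for an unpadded convolution. The short sketch below reproduces that arithmetic; the helper name `conv_out` and the assumption of a 28×28 single-channel input with no padding are ours, chosen to match the shapes stated in the text.

```python
def conv_out(size: int, kernel: int, stride: int) -> int:
    """Output length along one axis of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


# Assumed input: a 28x28 grayscale MNIST image.
h1 = conv_out(28, kernel=9, stride=1)   # after the first ReLU Conv2d
h2 = conv_out(h1, kernel=9, stride=2)   # after the second ReLU Conv2d

channels = 256   # feature maps produced by each convolutional layer
caps_dim = 8     # dimension of each primary capsule

# Reshape [h2, h2, 256] = [h2, h2, 32 x 8] into a list of 8-dimensional capsules.
n_primary_caps = h2 * h2 * channels // caps_dim

print(h1, h2, n_primary_caps)
```

The count of primary capsules is simply the total number of feature-map entries divided by the capsule dimension.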

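input : the index of layer $l$, the number of iterations $r$, the prediction vectors $\mathbf{u}_{j|i}^{l+1}$
output : the capsule vector $\mathbf{v}_j^{l+1}$ in the $(l+1)$-th layer
1 initialize: $b_{ij} \leftarrow 0$ // initialize a uniform routing path
2 for $r$ iterations do
3       $c_{ij} \leftarrow \mathrm{softmax}(b_{ij})$ // computes Eq.(S3)
4       $\mathbf{s}_j^{l+1} \leftarrow \sum_i c_{ij}\,\mathbf{u}_{j|i}^{l+1}$ // weight with the routing parameters
5       $\mathbf{v}_j^{l+1} \leftarrow \mathrm{squash}(\mathbf{s}_j^{l+1})$ // computes Eq.(S4)
6       $b_{ij} \leftarrow b_{ij} + \mathbf{u}_{j|i}^{l+1}\cdot\mathbf{v}_j^{l+1}$ // update with the geometric relation
Algorithm 1 Dynamic routing for classical capsule networks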
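Figure S2: Illustration of the classical dynamic routing. The prediction vector $\mathbf{u}_{j|i}^{l+1}$ of the $j$-th capsule in the $(l+1)$-th layer is calculated by $\mathbf{u}_{j|i}^{l+1} = W_{ij}\mathbf{v}_i^{l}$, where $W_{ij}$ is a weight matrix and $\mathbf{v}_i^{l}$ is the $i$-th capsule in the $l$-th layer. The unnormalized capsule is given by $\mathbf{s}_j^{l+1} = \sum_i c_{ij}\mathbf{u}_{j|i}^{l+1}$ with routing coefficients $c_{ij}$. The output capsule is obtained by $\mathbf{v}_j^{l+1} = \mathrm{squash}(\mathbf{s}_j^{l+1})$ with a nonlinear function squash.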

Dynamic routing. In the following, we elucidate the dynamic routing algorithm [11], whose pseudocode is presented in Algorithm 1. Consider the dynamic routing between the $l$-th and the $(l+1)$-th capsule layers in the classical CapsNets (as shown in Fig. S2). Given the $i$-th capsule $\mathbf{v}_i^{l}$ in the $l$-th layer, we first calculate the prediction vector $\mathbf{u}_{j|i}^{l+1}$ of the $j$-th capsule in the $(l+1)$-th layer by multiplying by a weight matrix $W_{ij}$ (trained by backpropagation),

$$\mathbf{u}_{j|i}^{l+1} = W_{ij}\,\mathbf{v}_i^{l}. \tag{S1}$$

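Then we sum over all the prediction vectors in the $(l+1)$-th layer with routing coefficients $c_{ij}$ to generate the pre-squashed vector $\mathbf{s}_j^{l+1}$ of the $j$-th capsule in the $(l+1)$-th layer,

$$\mathbf{s}_j^{l+1} = \sum_i c_{ij}\,\mathbf{u}_{j|i}^{l+1}, \tag{S2}$$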

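where $c_{ij}$ quantifies the probability that the capsule vector $\mathbf{v}_i^{l}$ may affect the capsule vector $\mathbf{v}_j^{l+1}$. Finding groups of similar $\mathbf{u}_{j|i}^{l+1}$ can be seen as a clustering problem; we can thus interpret $c_{ij}$ as the probability that capsule $i$ is grouped into cluster $j$, which requires $\sum_j c_{ij} = 1$. To this end, we introduce a set of digits (unnormalized routing coefficients) $b_{ij}$, which are related to the routing coefficients $c_{ij}$ by a softmax function:

$$c_{ij} = \mathrm{softmax}(b_{ij}) = \frac{\exp(b_{ij})}{\sum_{j'} \exp(b_{ij'})}. \tag{S3}$$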

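We remark that the nonlinearity of the classical CapsNets comes from the so-called “squashing” operation:

$$\mathbf{v}_j^{l+1} = \mathrm{squash}(\mathbf{s}_j^{l+1}) = \frac{\|\mathbf{s}_j^{l+1}\|^2}{1 + \|\mathbf{s}_j^{l+1}\|^2}\,\frac{\mathbf{s}_j^{l+1}}{\|\mathbf{s}_j^{l+1}\|}. \tag{S4}$$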

The squash function forces short vectors to shrink to almost zero length, and long vectors to shrink to a length slightly below one, which is analogous to a nonlinear activation function in classical neural networks. The essence of the dynamic routing algorithm is the so-called routing-by-agreement mechanism, which is able to extract the geometric relationship between the prediction vectors $\mathbf{u}_{j|i}^{l+1}$ and the possible output vector $\mathbf{v}_j^{l+1}$. Since capsule vectors live in Euclidean space, the geometric relationship can be characterized by the dot product between them:

$$a_{ij} = \mathbf{u}_{j|i}^{l+1}\cdot\mathbf{v}_j^{l+1}. \tag{S5}$$

At the beginning of the routing algorithm, all the routing digits $b_{ij}$ are initialized to zero, which leads to a uniform routing path between layers. During the iterations, the routing digits are then refined by the agreement between the current output capsule $\mathbf{v}_j^{l+1}$ and the prediction vector $\mathbf{u}_{j|i}^{l+1}$,

$$b_{ij} \leftarrow b_{ij} + \mathbf{u}_{j|i}^{l+1}\cdot\mathbf{v}_j^{l+1}. \tag{S6}$$

Such a routing-by-agreement mechanism has been demonstrated to be far more effective than the max-pooling method used in CNNs [11, 12]. In short, a CapsNet is a neural network that replaces scalar-output neurons with vector-output capsules and max-pooling with routing-by-agreement.
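The routing-by-agreement loop of Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions rather than the authors' implementation: the prediction vectors are stored in an array `u_hat` of shape (number of input capsules, number of output capsules, capsule dimension), random numbers stand in for the trained prediction vectors, and three iterations are used as in Ref. [11].

```python
import numpy as np


def squash(s, eps=1e-9):
    """Shrink each vector to a length in [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def softmax(b, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(b - b.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def dynamic_routing(u_hat, n_iter=3):
    """u_hat[i, j, :] is the prediction vector from input capsule i to output capsule j."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                     # routing digits, uniform start
    for _ in range(n_iter):
        c = softmax(b, axis=1)                      # routing coefficients, sum_j c_ij = 1
        s = np.einsum("ij,ijd->jd", c, u_hat)       # weighted sum over input capsules
        v = squash(s)                               # output capsules
        b = b + np.einsum("ijd,jd->ij", u_hat, v)   # agreement update (dot products)
    return v, c


rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 2, 4))   # hypothetical sizes: 6 input capsules, 2 output capsules
v, c = dynamic_routing(u_hat)
print(np.linalg.norm(v, axis=-1))    # lengths of the output capsules, each below one
```

Note that only the routing coefficients are updated inside the loop; the weight matrices that produce `u_hat` would be trained separately by backpropagation.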

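Training and reconstruction. In order to detect possibly multiple handwritten digits in an image, the classical CapsNets use a separate margin loss $L_k$ for each digit class $k$ [11]:

$$L_k = T_k \max\left(0,\, m^{+} - \|\mathbf{v}_k\|\right)^2 + \lambda\,(1 - T_k)\max\left(0,\, \|\mathbf{v}_k\| - m^{-}\right)^2, \tag{S7}$$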

where $\mathbf{v}_k$ is the $k$-th capsule in the DigitCaps layer, and $T_k$ is an indicator function for a specific class $k$, i.e., $T_k = 1$ if and only if an object of class $k$ is present. The margins are set as $m^{+} = 0.9$ and $m^{-} = 0.1$. Here, $\lambda$ is a down-weighting term that prevents the initial learning from shrinking the activity vectors of all classes. The total margin loss is just the sum of the losses of all classes. With such a loss function at hand, one can routinely utilize the backpropagation algorithm to train the parameters in the weight matrices $W_{ij}$. We remark that the routing coefficients $c_{ij}$ are not trainable parameters, but values determined by the dynamic routing algorithm.
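As a concrete illustration of the margin loss, the sketch below evaluates it for a batch of output capsules with NumPy. The default values `m_plus=0.9`, `m_minus=0.1` and `lam=0.5` are those reported in Ref. [11] and are assumptions here; the function name and the toy capsule vectors are hypothetical.

```python
import numpy as np


def margin_loss(v, labels, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Total margin loss summed over classes.

    v:      array of shape (n_classes, dim), the output capsules
    labels: array of shape (n_classes,), 1 if the class is present, else 0
    """
    norms = np.linalg.norm(v, axis=-1)
    # Penalize a present class whose capsule is shorter than m_plus.
    present = labels * np.maximum(0.0, m_plus - norms) ** 2
    # Penalize (down-weighted by lam) an absent class whose capsule is longer than m_minus.
    absent = lam * (1.0 - labels) * np.maximum(0.0, norms - m_minus) ** 2
    return float(np.sum(present + absent))


# Toy example with two classes: both capsules have length 0.5, class 0 is present.
v = np.array([[0.5, 0.0],
              [0.0, 0.5]])
labels = np.array([1.0, 0.0])
loss = margin_loss(v, labels)   # 0.4**2 + 0.5 * 0.4**2
print(loss)
```

The loss vanishes only when the capsule of the present class is longer than the upper margin and all other capsules are shorter than the lower margin.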

Apart from the aforementioned three layers for classification, one can add additional reconstruction layers to enable capsules to encode the corresponding features of the input data. Specifically, the most active capsule in the DigitCaps layer can be fed into a decoder that consists of three fully connected layers. After minimizing the mean square error (MSE) between the outputs of the logistic units and the original pixel intensities, one can use the instantiation parameters in the most active capsule to reconstruct the input image. In this vein, both the margin loss and the MSE loss should be optimized simultaneously, $L = L_{\mathrm{M}} + \epsilon L_{\mathrm{MSE}}$, where $\epsilon$ is a small regularization term that scales down $L_{\mathrm{MSE}}$ so that it does not dominate $L_{\mathrm{M}}$ during training.

II Implementation of the quantum dynamic routing algorithm

Networks Classical neural networks Classical capsule networks Quantum capsule networks
Input scalar vector quantum density matrix
Operations Transformation quantum sub-neural networks:
Normalization trace out the index register:
Output scalar vector quantum density matrix
Table S1: Comparisons of different networks

In this section, we will elucidate the quantum dynamic routing process among all the prediction states and the -th capsule state in the -th layer, whose pseudocode is presented in Algorithm 2. A concise comparison to the classical CapsNets is given in Table S1. For brevity, the superscripts of the density matrices are temporarily omitted in what follows. The overall quantum circuit implementation of our quantum dynamic routing algorithm is shown in Fig. S3.

Measure the geometric relationship in parallel. Recall that in the main text, we quantify the geometric relationship between the prediction state and the capsule state as the -th moment overlap:


where denotes the power of the matrix multiplication. To obtain all the overlaps in parallel, we introduce a qRAM oracle to generate two qRAM states, and , which correlate all the prediction states as


where refers to the index (address) state, denotes copies of the prediction states stored in the bus register, and is the routing probability from the prediction state to the capsule state . Note that the routing probability of the input qRAM state is uniformly fixed at during the whole quantum routing process. However, the routing probability of the output qRAM state is uniformly initialized to only in the first iteration, and then dynamically refined by the geometric relationship.

With such correlated qRAM states at hand, we can measure all the geometric relationships in parallel by means of a SWAP test [53]. As shown in Fig. S3a, in addition to the input qRAM state , we also need to prepare that is stored in the capsule register. This can be done by tracing out the index qubits of the output qRAM state . Then, we introduce an ancilla qubit initialized as and apply a Hadamard gate on it. After that, a controlled -SWAP gate (controlled-) is applied to the ancilla, capsule and bus registers. The -SWAP gate is defined as:


which permutes quantum states for one single cycle. After applying another Hadamard gate to the ancilla qubit and tracing out both the capsule and bus registers, we obtain the following quantum state:


Finally, by projecting the ancilla qubit into the subspace with probability , we can efficiently encode all the geometric relationships into the index register as a diagonal quantum density matrix:


The normalization factor can be estimated in measurements with error for an -dimensional state.
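To make the SWAP-test step more tangible, the following NumPy sketch simulates the simplest two-state case for a single pair of density matrices, without the index register or the qRAM. It checks the textbook identity that the ancilla is found in the 0 outcome with probability (1 + Tr(ρσ))/2. All function names are hypothetical, and restricting to an ordinary SWAP between two states is our simplifying assumption, not the general circuit of Fig. S3a.

```python
import numpy as np


def random_density_matrix(dim, rng):
    """Random full-rank density matrix (positive semidefinite, unit trace)."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real


def swap_operator(dim):
    """SWAP on two dim-level systems: |a>|b> -> |b>|a>."""
    s = np.zeros((dim * dim, dim * dim))
    for a in range(dim):
        for b in range(dim):
            s[b * dim + a, a * dim + b] = 1.0
    return s


def swap_test_p0(rho, sigma):
    """Probability of finding the ancilla in outcome 0 after H, controlled-SWAP, H."""
    dim = rho.shape[0]
    big = dim * dim
    had = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    h_full = np.kron(had, np.eye(big))             # Hadamard on the ancilla (first qubit)
    cswap = np.block([[np.eye(big), np.zeros((big, big))],
                      [np.zeros((big, big)), swap_operator(dim)]])
    u = h_full @ cswap @ h_full
    anc0 = np.array([[1.0, 0.0], [0.0, 0.0]])      # projector onto ancilla outcome 0
    state = np.kron(anc0, np.kron(rho, sigma))     # ancilla starts in outcome-0 state
    out = u @ state @ u.conj().T
    return np.trace(np.kron(anc0, np.eye(big)) @ out).real


rng = np.random.default_rng(1)
rho = random_density_matrix(2, rng)
sigma = random_density_matrix(2, rng)
overlap_from_test = 2.0 * swap_test_p0(rho, sigma) - 1.0
print(overlap_from_test, np.trace(rho @ sigma).real)
```

In an experiment the probability would be estimated from repeated measurements of the ancilla, whereas here it is read off directly from the simulated density matrix.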

Figure S3: Quantum circuits for the quantum dynamic routing algorithm. a Subroutine for measuring the geometric relationships in parallel. The quantum circuit contains four registers: an index register initialized to , a bus register initialized to copies of , a capsule register initialized to copies of the capsule state , and an ancilla register initialized to . The qRAM oracle generates an input qRAM state that correlates the prediction states in the bus register. Then we implement a SWAP test by applying a controlled -SWAP gate [controlled- in Eq. (S11)] to the ancilla, capsule and bus registers. At the output of the circuit, we measure the ancilla qubit and project it onto the subspace with a post-selection scheme. Finally, we trace out both the capsule and bus registers, and thus obtain the overlap density matrix , where contains the geometric information. b Subroutine for assigning the routing probability to each capsule. Here we introduce another ancilla qubit that is initialized to . The unitary operator is , where is the identity operator in the bus register and the Pauli- gate acts on the ancilla qubit. After a short evolution, we project the ancilla qubit onto the subsystem with a post-selection scheme. The geometric relationships are now encoded into the output qRAM state as the routing probability.
input : the index of layer , the order of moment , the number of iteration , the prediction quantum density matrix
output : the quantum capsule state in the -th layer
1 initialize: : fix and assign to
2 for  iterations and  do
3       // trace out the index register
4       // parallel update by in Fig.S3a
5       // density matrix exponentiation in Eq.(S14)
6       // apply the phase-shift unitary in Eq.(S15)
7       // update with new in Fig.S3b
Algorithm 2 Quantum dynamic routing for QCapsNets

Assign the routing coefficients. Note that the overlap state contains the geometric information about the -th capsule state and all its corresponding prediction states . Now we would like to utilize to update the routing coefficients in the output qRAM state in Eq. (S9). To begin with, we introduce two unitary operators, the geometric unitary and the phase-bias unitary , by preparing an extended overlap state and a Hamiltonian ,


where is the identity operator in the bus register, and the Pauli- gate acts on another ancilla qubit. The geometric unitary can be generated by the density matrix exponentiation technique [54]: given copies of the density matrix , the unitary operation can be implemented within error . Since is sparse, the phase-bias unitary can also be efficiently generated by the Hamiltonian simulation technique [55]. Combining these two unitary operators, we generate the following unitary operator,


which encodes all the geometric relationships . Accordingly, as shown in Fig. S3b, we are now able to assign the routing probability to each capsule by applying on the input qRAM state [Eq. (S9)] and the ancilla qubit:


Choosing so small that and measuring the ancilla qubit in the state [56], we obtain the post-selection state and assign all the geometric relationships to the output qRAM state:


Tracing out the index qubits of the output qRAM state, the capsule state is now refined to with new routing coefficients containing the geometric information. By repeating the above quantum routing procedure for a few iterations (normally 3 in our numerical experiments), the routing coefficients generally converge to constants.

We additionally remark that the aforementioned oracle in Eq. (S9) can be treated as a generalization of the conventional qRAM from pure quantum states to mixed quantum states. A conventional qRAM is a device that, given a superposition of addresses , returns a correlated set of data : [49]. In analogy with the Choi-Jamiołkowski isomorphism, which establishes a correspondence between quantum channels and density matrices [72, 73, 74], we adopt a bijection that maps a density matrix to a vector in the computational basis . Hence, the conventional qRAM is able to perform , which is equivalent to and thus achieves the oracle .
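The idea of storing a density matrix as a vector can be sketched as follows. The row-major flattening and the normalization by the Frobenius norm are our assumptions, since the exact form of the bijection is not recoverable from the text here; the function names are hypothetical.

```python
import numpy as np


def density_to_vector(rho):
    """Flatten rho entry-wise: element (a, b) goes to position a * d + b.

    Returns a unit-norm vector and the scale needed to invert the map.
    For a Hermitian rho the scale equals sqrt(Tr(rho^2)).
    """
    vec = np.asarray(rho, dtype=complex).reshape(-1)
    scale = np.linalg.norm(vec)
    return vec / scale, scale


def vector_to_density(vec, scale):
    """Inverse map: undo the normalization and restore the matrix shape."""
    d = int(round(np.sqrt(vec.size)))
    return (vec * scale).reshape(d, d)


# Example: a single-qubit mixed state.
rho = np.array([[0.75, 0.25j],
                [-0.25j, 0.25]])
vec, scale = density_to_vector(rho)
rho_back = vector_to_density(vec, scale)
print(np.allclose(rho, rho_back), scale ** 2, np.trace(rho @ rho).real)
```

Because the flattened vector has unit norm, it can in principle be loaded as the amplitudes of a pure state on twice as many qubits, which is what allows a conventional qRAM to handle it.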

III Numerical details

Here we present the technical details of our numerical simulation. Due to the limited computational power of classical computers, we choose only a subset of the MNIST dataset consisting of images labeled with “3” and “6”, and reduce the size of each image from $28\times 28$ pixels to $16\times 16$ pixels. We normalize and encode these $256 = 2^8$ pixels into an $8$-qubit quantum state using the amplitude encoding method, together with an ancilla qubit.
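Amplitude encoding can be illustrated with a short NumPy sketch. We assume a 16×16 image (256 pixel values, hence 8 qubits, which together with one ancilla is consistent with the 9-qubit system used in the simulation), an ancilla prepared in its zero state and appended as the last qubit, and random pixel values in place of a real MNIST image. The function names are hypothetical.

```python
import numpy as np


def amplitude_encode(image):
    """Map pixel intensities to the amplitudes of a normalized state vector."""
    amps = np.asarray(image, dtype=float).reshape(-1)
    norm = np.linalg.norm(amps)
    if norm == 0.0:
        raise ValueError("an all-zero image cannot be normalized")
    n_qubits = int(np.log2(amps.size))
    if 2 ** n_qubits != amps.size:
        raise ValueError("the number of pixels must be a power of two")
    return amps / norm, n_qubits


def add_ancilla(amps):
    """Append one ancilla qubit in its zero state as the last qubit (assumed convention)."""
    return np.kron(amps, np.array([1.0, 0.0]))


rng = np.random.default_rng(2)
image = rng.random((16, 16))            # stand-in for a downsampled MNIST digit
amps, n_qubits = amplitude_encode(image)
state = add_ancilla(amps)
print(n_qubits, state.size, np.linalg.norm(state))
```

Only the relative pixel intensities survive this encoding, since the overall brightness is removed by the normalization.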

The QCapsNet is composed of the preprocessing layer, the primary capsule layer, and the digit capsule layer. In the preprocessing layer, we apply a 5-depth PQC on the 9-qubit system. Then, we take partial traces over these nine qubits and rearrange them into 3 capsules in the primary capsule layer. Each capsule contains 3 qubits forming a sub-QNN . We apply to the capsule states and obtain their corresponding prediction states . In the quantum dynamic routing process, we fix the moment order in Eq. (1) as , as it already results in fast convergence. For two-category classification, we place two capsules in the last layer of the QCapsNet. Inside each capsule , we measure the average -directional magnetic moment of the 3 qubits, . Then, the activation probability of the -th output capsule state is computed by the normalized amplitude , which indicates the presence of label . Consequently, we use the cross entropy as the loss function for the stand-alone classification task,


As for the reconstruction task, the loss function has two parts. In addition to the classification loss, we need to consider an extra reconstruction loss. Although we have already achieved state-of-the-art accuracy in the classification task by use of the cross-entropy loss , such a loss function is unable to retain enough information in the output capsule state. Under this circumstance, we replace the above cross-entropy loss with a margin loss as the first part of the loss function for the reconstruction task,


where if and only if matches the label of the input image. We set and as constants, and define the activation probability as the purity of the -th output capsule state . Since the margin loss measures the purity of the capsule states, it can capture the essential quantum information that is required to reconstruct the original data. The second part of the loss function is the reconstruction loss, where we use a mean square loss,


to quantify the deviation between the reconstruction image and the original image . Here, refers to the -th pixel of image