Adaptive Routing Between Capsules

11/19/2019 · by Qiang Ren, et al.

Capsule networks are a recent and exciting advance in deep learning that represent positional information by stacking features into vectors. Capsule networks rely on the dynamic routing algorithm, which has notable disadvantages: the network cannot be stacked into multiple capsule layers, and the routing requires a large amount of computation. In this paper, we propose an adaptive routing algorithm that solves these problems. First, the low-layer capsules adaptively adjust their direction and length during routing, which removes the influence of the coupling coefficient on gradient propagation, so that the network still works when multiple capsule layers are stacked. Second, the iterative routing process is simplified to reduce the amount of computation, and we introduce the gradient coefficient λ. Finally, we tested our proposed adaptive routing algorithm on CIFAR10, Fashion-MNIST, SVHN, and MNIST, achieving better results than the dynamic routing algorithm.

1 Introduction

In the last few years, deep learning has made breakthroughs in many computer vision tasks, with convolutional neural networks in particular leading to state-of-the-art performance. In a convolutional neural network, neurons are scalars and cannot learn the complex relationships between neurons. In the human brain, however, neurons usually work together rather than alone. To overcome this shortcoming of convolutional neural networks, Hinton proposed the concept of the "capsule" [4]: a combination of neurons that stacks features (neurons) of the feature map into vectors (capsules). A capsule network not only considers the attributes of the features during training but also takes account of the relationships between the features. The dynamic routing algorithm later made the idea of the "capsule" practical [15]. After the neurons are stacked into vectors (capsules), the coupling coefficients between the low-layer capsules and the high-layer capsules are learned through dynamic routing, yielding the relationship between the partial features and the whole.

Improving the performance of neural networks is a major direction of deep learning research. A common method to improve the performance of deep neural networks is to increase the depth of the network. For example, VGG [16], GoogLeNet [17], and ResNet [3] each increased network depth with effective architectural solutions and continuously improved classification accuracy on ImageNet [1]. In capsule networks, performance can likewise be improved by increasing the number of capsule layers; Rajasegaran et al. [14] pursued this research direction and achieved impressive results. However, with the dynamic routing algorithm proposed by Sabour et al. [15], one cannot simply increase the number of capsule layers.

The dynamic routing algorithm is the method used to learn the relationship between partial features and the whole in a capsule network, but it has shortcomings. After several iterations of training, the coupling coefficients of the capsule network become very sparse, indicating that only a small number of low-layer capsules are useful to the high-layer capsules. Most of the coupling-coefficient computation is therefore futile and adds invalid computation during gradient back-propagation. Moreover, the sparsity of the coupling coefficients makes most of the gradient flow propagating between the capsule layers very small. If capsule layers are simply stacked, the gradients in the front layers of the model become so small that the model stops working. If the interference of the coupling coefficient is removed during routing, the stacked layers can continue to work.

To this end, we propose adaptive routing, a new routing algorithm for capsule networks. Unlike the dynamic routing algorithm, which updates the coupling coefficients at the end of each iteration, our algorithm updates only the low-layer capsules themselves at the end of each iteration, making the low-layer capsules more "similar" to the high-layer capsules. Since there is no coupling coefficient, the propagation of the gradient flow through the capsule layers is not suppressed during routing, so the gradient reaches the front layers of the model. More specifically, we make the following contributions:

  1. We present the motivation for the adaptive routing algorithm and explain why the dynamic routing algorithm causes the gradient to vanish, so that the capsule network stops working when multiple capsule layers are stacked.

  2. We propose the adaptive routing algorithm to overcome the shortcoming that the dynamic routing algorithm causes the gradient to vanish when stacking multiple layers. With adaptive routing, multiple capsule layers can be stacked, improving the performance of the capsule network.

  3. We show that the iterative process of the adaptive routing algorithm can be simplified, yielding an adaptive routing algorithm without a routing process. An introduced hyper-parameter λ replaces the iteration number, which reduces the amount of computation and amplifies the gradient.

The rest of the paper is organized as follows: Section 2 discusses related work on capsule networks, Section 3 describes the motivation and the adaptive routing algorithm, and Section 4 presents our experimental results. Finally, Section 5 concludes the paper.

2 Related Work

The capsule network is a new neural network architecture that stacks traditional scalar neurons into vector neurons called "capsules" [4], which can store the spatial location information of features and is thus closer to the mechanism of the human brain. Sabour et al. [15] proposed the dynamic routing algorithm, a method for learning the coupling relationship between low-layer capsules and high-layer capsules, which made the capsule network a practical model. Hinton et al. [5] then proposed the EM routing algorithm, which uses matrix capsules instead of vector capsules and iteratively learns the coupling coefficients between the low-layer and high-layer matrix capsules. Almost all research related to capsule networks is based on these two algorithms.

In this field, there are many notable extensions. Lenssen et al. [11] proposed a generic routing algorithm that defines provable equivariance and invariance properties for the capsule network, proving the equivariance of the output pose vectors and the invariance of the output activations. Rajasegaran et al. [14] proposed a deep capsule network architecture to address the shortcoming that dynamic routing cannot simply stack multiple layers. It uses 3D convolution to learn the spatial information between capsules and borrows the idea of skip connections from residual networks; the skip connections between capsule layers allow a good gradient flow in back-propagation. At the bottom of the network, where connections skip more than one layer, a larger number of routing iterations is used, and the 3D convolution generates votes from the capsule tensor for dynamic routing, which helps route a set of localized capsules to a higher-layer capsule. Jeong et al. [7] proposed a new way of defining entities that removes capsules unlikely to be needed while preserving the spatial relationships between low-layer and high-layer entities, and introduced the concepts of building layers and step layers. To capture the relationship between the parts and the entire space, another new layer called the ladder layer is introduced, whose outputs are low-layer capsule outputs regressed from the high-layer capsules.

These extensions are also significant. Zhang et al. [19] proposed learning a set of capsule subspaces, projecting an input feature vector onto each of these subspaces, and using the lengths of the resulting capsules as the scores for the different categories. Such a capsule projection network (CapProNet) is trained by learning the orthogonal projection matrix of each capsule subspace, and it is shown that each capsule subspace is updated until it contains the input feature vectors of the corresponding class. Since the dimension of each capsule subspace is low and an iterative method is used to estimate the matrix inverse, the network can be trained with only a small computational overhead. Ding et al. [2] divided all capsules into different groups and then performed a group reconstruction routing algorithm to obtain the corresponding high-level capsules; capsule max-pooling is used between the lower and upper layers to prevent overfitting. Li et al. [12] proposed approximating the routing process with two branches: a master branch that collects the main information from its direct contact in the lower layer, and an auxiliary branch that supplements the main information based on the pattern variables encoded in other lower-layer capsules. The two branches communicate in a fast, supervised, single pass, in contrast to previous iterative and unsupervised routing schemes. As a result, the complexity and runtime of the model are reduced dramatically.

3 Methodology

3.1 Motivation

Figure 1: Illustration of the forward data flow and backward gradient flow between the PrimaryCaps layer and the DigitCaps layer with the dynamic routing algorithm. Here $w$ is a parameter in the affine transformation matrix $W_{ij}$, and $f$ is the feature associated with $w$ in the capsule $u_i$ and on the feature maps $F_1, F_2, \ldots, F_8$. The purple solid arrows represent the forward data flow, and the purple dotted arrows represent the backward gradient flow.

In capsule networks that use the dynamic routing algorithm, the low-layer capsules acquire the ability of affine transformation through the affine transformation matrix $W_{ij}$. The affine transformation matrix is similar to the spatial transformer networks proposed by Jaderberg et al. [6], enabling the capsule to translate, scale, rotate, etc. The capsule network uses the back-propagation algorithm to train the parameters of the affine transformation matrices in the model. The coupling coefficient $c_{ij}$ between a low-layer capsule and a high-layer capsule is learned iteratively by the dynamic routing algorithm, which sends the affine-transformed low-layer capsules to the high-layer capsules. During back-propagation, the coupling coefficient weights the gradient flow.

Figure 1 illustrates the data flow and gradient flow between adjacent capsule layers under the dynamic routing algorithm. Following the capsule network architecture proposed by Sabour et al. [15], the feature maps in the PrimaryCaps layer are $F_1, F_2, \ldots, F_8$. The features on the feature maps are defined in Equation 1 below:

$F_k = \{f_k^1, f_k^2, \ldots, f_k^{36}\}, \quad k = 1, 2, \ldots, 8$ (1)

Features at the same position on the different feature maps are stacked (8 feature maps as a group) to form capsules. The capsules $u_i$ are in layer $l$ and the capsules $v_j$ are in layer $(l+1)$. Capsules in the lower layer are composed of features on the feature maps $F_1, F_2, \ldots, F_8$ (36 features on each feature map), defined according to Equation 2 below:

$u_i = (f_1^i, f_2^i, \ldots, f_8^i), \quad i = 1, 2, \ldots, 36$ (2)

The affine matrix $W_{ij}$, defined in Equation 3, transforms a capsule of dimension 8 into a capsule of dimension 16. The prediction vectors $\hat{u}_{j|i}$ are therefore obtained by affine transformation of $u_i$, as defined in Equation 4 below:

$W_{ij} \in \mathbb{R}^{16 \times 8}$ (3)
$\hat{u}_{j|i} = W_{ij} u_i$ (4)
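To make the shapes concrete, here is a minimal PyTorch sketch of this capsule formation and affine transformation (the code and variable names are our own illustration, not the authors' implementation; the text above follows one group of 8 feature maps for simplicity, while the full PrimaryCaps layer has 32 such groups, giving the 1152 capsules used in Section 4):

```python
import torch

# PrimaryCaps: 256 conv feature maps of size 6x6, grouped 8 at a time,
# give 32 * 6 * 6 = 1152 capsules of dimension 8 (Eqs. 1-2).
batch = 1
feature_maps = torch.randn(batch, 256, 6, 6)       # (batch, channels, H, W)
u = (feature_maps.view(batch, 32, 8, 6, 6)         # 32 groups of 8 maps
     .permute(0, 1, 3, 4, 2)                       # move the 8 features last
     .reshape(batch, 1152, 8))                     # (batch, n_low, 8)

# One affine matrix W_ij per (low capsule i, high capsule j) pair maps each
# 8-D capsule to a 16-D prediction u_hat_j|i (Eqs. 3-4).
W = torch.randn(1152, 10, 16, 8) * 0.01
u_hat = torch.einsum('ijdk,bik->bijd', W, u)       # (batch, 1152, 10, 16)
print(u_hat.shape)                                 # torch.Size([1, 1152, 10, 16])
```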

The weighted sum of the $\hat{u}_{j|i}$ with the coupling coefficients $c_{ij}$ gives $s_j$, as described in Equation 5 below:

$s_j = \sum_i c_{ij} \hat{u}_{j|i}$ (5)

The loss function for the correct category in the capsule network is given in Equation 6 below ($T_j = 1$ for the correct class, $m^+ = 0.9$):

$L_j = T_j \max(0, m^+ - \|v_j\|)^2$ (6)

It follows from Equation 6 that the loss of the capsule network is related to the length of the capsule $v_j$ and to the values of the capsule. Here $w$ is a parameter in the affine transformation matrix $W_{ij}$, learned by the back-propagation algorithm, and $c_{ij}$ is the coupling coefficient, learned by the iterative calculation of dynamic routing.
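For reference, the forward computation of Equations 4 and 5 together with the routing update can be sketched in PyTorch as follows (our own illustrative code with assumed tensor shapes, not the authors' implementation; the agreement update follows Sabour et al. [15]):

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # squash(s) = (|s|^2 / (1 + |s|^2)) * s / |s|
    sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq / (1.0 + sq)) * s / torch.sqrt(sq + eps)

def dynamic_routing(u_hat, num_iters=3):
    # u_hat: (batch, n_low, n_high, d_high), the predictions W_ij u_i (Eq. 4)
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)  # routing logits
    for _ in range(num_iters):
        c = F.softmax(b, dim=2)                       # coupling coefficients c_ij
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)      # s_j = sum_i c_ij u_hat_j|i (Eq. 5)
        v = squash(s)                                 # high-layer capsules v_j
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)  # agreement u_hat . v_j
    return v                                          # (batch, n_high, d_high)
```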

When the gradient flows through the adjacent capsule layers, the result is as below:

$\frac{\partial L_j}{\partial w} = \frac{\partial L_j}{\partial v_j} \cdot \frac{\partial v_j}{\partial s_j} \cdot c_{ij} \cdot f$ (7)

In Equation 7, $w$ is a parameter in the affine transformation matrix $W_{ij}$, and $f$ is the feature associated with $w$ in the capsule $u_i$ and on the feature maps $F_1, F_2, \ldots, F_8$. The value of the gradient in back-propagation is therefore scaled by the coupling coefficient $c_{ij}$.

Figure 2: The 50 largest values of the coupling coefficients $c_{ij}$ from the low-layer capsules to the correct digit capsule.

As Figure 2 shows, the coupling coefficients obtained by the dynamic routing algorithm are mostly close to 0.1 or even smaller [7]. When the capsule network stacks multiple capsule layers, the presence of $c_{ij}$ makes the gradient values smaller, which hampers the learning of the parameters in the front layers and stops the capsule network from working.

Figure 3: Range of gradients in the ReLU Conv1 layer using the dynamic routing algorithm.

In Figure 3, we compare the range of gradients in the ReLU Conv1 layer of the original capsule network (dynamic routing with only two capsule layers) and of a multi-layer capsule network. In the front layer of the multi-layer capsule network, the gradient values are too small for the network to work.

In summary, the loss is related to the length of the capsule $v_j$. During gradient back-propagation, the value of $c_{ij}$ is close to 0.1 or even smaller, causing the gradient to vanish and the capsule network to stop working. If the coupling coefficient no longer participates in the routing iterations, the capsule network continues to work with multiple capsule layers.
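As a back-of-the-envelope illustration of this argument, assume each traversed capsule layer scales the gradient by a coupling coefficient of roughly 0.1, as observed in Figure 2:

```python
# Each capsule layer crossed multiplies the gradient by roughly c_ij ~ 0.1,
# so the gradient reaching the front layers shrinks geometrically with depth.
for capsule_layers in (1, 2, 3, 4):
    print(capsule_layers, 0.1 ** capsule_layers)  # 0.1, 0.01, 0.001, 0.0001
```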

3.2 Adaptive Routing

In order to overcome the shortcomings of the coupling coefficient in the capsule network, we propose the adaptive routing algorithm, which involves no parameter training in the routing iteration process.

Figure 4: The architecture of the capsule network with the adaptive routing algorithm, showing the details of the iterative process. The light-blue capsule in the low layer is similar to the blue capsule in the high layer, so its length grows after iteration; the lavender capsule in the low layer is opposite to the blue capsule in the high layer, so its length shrinks after iteration. During the iterative process, the capsules in the low layer move adaptively towards the direction of the capsule in the high layer.

In capsule networks, the direction of a high-layer capsule is close to the direction of the longest low-layer capsules. If the coupling coefficient is removed, all the low-layer capsules are directly summed after the affine transformation, as described in Equation 8 below:

$s_j = \sum_i \hat{u}_{j|i}$ (8)

Squashing $s_j$ with the activation function squash, we obtain $v_j$ (with the same direction as $s_j$) as in Equation 9:

$v_j = \mathrm{squash}(s_j) = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|}$ (9)

As Figure 4 shows, the direction of the corresponding high-layer capsule follows the direction of the longer capsules in the lower layer. The purpose of the dynamic routing algorithm is that the higher the similarity between a low-layer capsule and the corresponding high-layer capsule, the bigger the coupling coefficient between them after iteration. We can achieve the same effect by instead moving the low-layer capsules towards the corresponding high-layer capsule. If a low-layer capsule has high similarity to the corresponding high-layer capsule, the new $\hat{u}_{j|i}$, moved towards the high-layer capsule, has enhanced directionality relative to the original $\hat{u}_{j|i}$. And if the similarity is low, the new $\hat{u}_{j|i}$, also moved towards the high-layer capsule, has reduced directionality relative to the original $\hat{u}_{j|i}$. The adaptive update process is defined in Equation 10 below:

$\hat{u}_{j|i} \leftarrow \hat{u}_{j|i} + v_j$ (10)

The adaptive routing algorithm can be described as Algorithm 1.

1: procedure Routing($\hat{u}_{j|i}$, $r$, $l$)
2:     for all capsules $i$ in layer $l$ and capsules $j$ in layer $(l+1)$
3:     for $r$ iterations do
4:         $s_j \leftarrow \sum_i \hat{u}_{j|i}$
5:         $v_j \leftarrow \mathrm{squash}(s_j)$
6:         $\hat{u}_{j|i} \leftarrow \hat{u}_{j|i} + v_j$
7:     end for
8:     return $v_j$
9: end procedure
Algorithm 1: Adaptive routing algorithm.
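A minimal PyTorch sketch of Algorithm 1, reusing the squash function from the dynamic routing sketch in Section 3.1 (again our own illustration, not the authors' code):

```python
def adaptive_routing(u_hat, num_iters=3):
    # u_hat: (batch, n_low, n_high, d_high); no coupling coefficients appear.
    for _ in range(num_iters):
        s = u_hat.sum(dim=1)            # s_j = sum_i u_hat_j|i      (Eq. 8)
        v = squash(s)                   # v_j = squash(s_j)          (Eq. 9)
        u_hat = u_hat + v.unsqueeze(1)  # move u_hat_j|i towards v_j (Eq. 10)
    return v                            # (batch, n_high, d_high)
```

Because no softmax over routing logits is needed, each iteration reduces to a sum, a squash, and an addition.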
Figure 5: Comparison of the iterative processes of the dynamic routing algorithm and the adaptive routing algorithm.

In capsule networks that use the dynamic routing algorithm, $\hat{u}_{j|i}$ is a low-layer capsule after the affine transformation. In the dynamic routing of Figure 5, when the algorithm starts iterating, the coupling coefficients of each low-layer capsule for the corresponding high-layer capsule are equal. $\hat{u}_{j|1}$, $\hat{u}_{j|2}$, and $\hat{u}_{j|3}$ are combined in a weighted sum to obtain $s_j$, with weights $c_{1j}$, $c_{2j}$, $c_{3j}$. After the first routing pass, the weighted sum is computed over all low-layer capsules; the longer a low-layer capsule, the more similar its direction is to the direction of the corresponding high-layer capsule. After each iteration, the coupling coefficients are updated according to the dot product (similarity and length) of the low-layer capsules and the corresponding high-layer capsule, yielding the new weights $c_{1j}$, $c_{2j}$, $c_{3j}$. If $\hat{u}_{j|1}$ and $v_j$ are more similar, then $c_{1j}$ becomes larger after updating, and likewise $s_j$ grows along the direction it had before the iteration. After the dynamic routing algorithm, the orientation of a high-layer capsule is close to the direction of the longer capsules among the low-layer capsules. As the number of iterations increases, the coupling coefficient (weight) $c_{ij}$ is larger for low-layer capsules that are more similar to the corresponding high-layer capsule, and smaller otherwise.

Similarly, in the adaptive routing of Figure 5, when the algorithm starts to iterate, the coupling coefficients between the low-layer capsules and the corresponding high-layer capsule are removed: $\hat{u}_{j|1}$, $\hat{u}_{j|2}$, and $\hat{u}_{j|3}$ are simply summed to obtain $s_j$. After the first routing pass, the sum is computed over all low-layer capsules; again, the longer a low-layer capsule, the more similar its direction is to the direction of the corresponding high-layer capsule. After each iteration, the low-layer capsules $\hat{u}_{j|1}$, $\hat{u}_{j|2}$, $\hat{u}_{j|3}$ are moved towards the direction of the high-layer capsule $v_j$, so with every iteration they align more closely with $v_j$. After the adaptive routing process, the orientation of the high-layer capsule is close to the direction of the longer capsules in the low layer. The low-layer capsules move adaptively towards the high-layer capsules more and more, and the high-layer capsules still represent the probability that an object is present. Without the influence of the coupling coefficient $c_{ij}$, the same effect as the dynamic routing algorithm is obtained.

3.3 Introducing the gradient coefficient $\lambda$

The adaptive routing we propose involves no coupling coefficient in the routing process, so the training process can be simplified further: no parameters are trained during the routing iteration, and only the capsules in the lower layer are summed. When the iteration $r = 1$, the training process of adaptive routing is as in Equations 11, 12, and 13 below:

$s_j = \sum_i \hat{u}_{j|i}$ (11)
$v_j = \mathrm{squash}(s_j)$ (12)
$\hat{u}'_{j|i} = \hat{u}_{j|i} + v_j$ (13)

So after the first iteration, the output of the adaptive routing algorithm is as in Equation 14 below:

$v_j = \mathrm{squash}\left(\sum_i \hat{u}_{j|i}\right)$ (14)

Combining Equations 13 and 14, the input of the second iteration is updated to:

$\hat{u}'_{j|i} = \hat{u}_{j|i} + \mathrm{squash}\left(\sum_i \hat{u}_{j|i}\right)$ (15)

When the iteration $r = 2$, the training process of adaptive routing is as in Equations 16, 17, and 18 below, where $N$ is the number of low-layer capsules and the last equality in Equation 16 holds because $v_j$ has the same direction as $s_j$:

$s'_j = \sum_i \hat{u}'_{j|i} = s_j + N v_j = \lambda s_j, \quad \lambda > 1$ (16)
$v'_j = \mathrm{squash}(s'_j) = \mathrm{squash}(\lambda s_j)$ (17)
$\hat{u}''_{j|i} = \hat{u}'_{j|i} + v'_j$ (18)

The introduction of $\lambda$ indicates that $s_j$ is amplified, and the length of $v'_j$ is close to 1 after the activation function.

So after the second iteration, the output of the adaptive routing algorithm is as in Equation 19 below:

$v'_j = \mathrm{squash}\left(\lambda \sum_i \hat{u}_{j|i}\right)$ (19)

In summary, as the number of iterations increases, $\lambda$ becomes larger, and we finally obtain $v_j$ as in Equation 20 below:

$v_j = \mathrm{squash}\left(\lambda \sum_i \hat{u}_{j|i}\right)$ (20)

The improved adaptive routing without iteration is described in Algorithm 2.

1: procedure Routing($\hat{u}_{j|i}$, $\lambda$, $l$)
2:     for all capsules $i$ in layer $l$ and capsules $j$ in layer $(l+1)$
3:     $s_j \leftarrow \lambda \sum_i \hat{u}_{j|i}$
4:     $v_j \leftarrow \mathrm{squash}(s_j)$
5:     return $v_j$
6: end procedure
Algorithm 2: Adaptive routing without iteration.
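In the same PyTorch sketch style, Algorithm 2 collapses to a single scaled sum (again reusing the squash function above; the default value of lam below is only an illustrative choice):

```python
def adaptive_routing_no_iter(u_hat, lam=3.0):
    # v_j = squash(lambda * sum_i u_hat_j|i)   (Eq. 20); lam replaces the
    # iteration count and is a hyper-parameter (the gradient coefficient).
    s = lam * u_hat.sum(dim=1)
    return squash(s)
```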

The adaptive routing algorithm is thus described by Equation 21 below:

$v_j = \mathrm{squash}\left(\lambda \sum_i \hat{u}_{j|i}\right)$ (21)

Combining Equations 21 and 6, the gradient flowing through the adjacent capsule layers under adaptive routing is as below (the meanings of $w$ and $f$ are the same as in Equation 7):

$\frac{\partial L_j}{\partial w} = \frac{\partial L_j}{\partial v_j} \cdot \frac{\partial v_j}{\partial s_j} \cdot \lambda \cdot f$ (22)

Comparing Equation 22 with Equation 7 shows the improvement of the gradients in back-propagation between the capsule layers. The gradient coefficient $c_{ij}$ of the dynamic routing algorithm is mostly close to 0.1 or even smaller, which causes the gradient to vanish. The gradient coefficient $\lambda$ of the adaptive routing algorithm is a hyper-parameter, usually a positive integer greater than 1, which amplifies the gradient.

Figure 6: Range of gradients in the ReLU Conv1 layer using the adaptive routing algorithm.

In Figure 6, we measure the range of gradients in the ReLU Conv1 layer of the multi-layer capsule network with adaptive routing. Compared with the dynamic routing results in Figure 3, the gradient values in the front layer of the multi-layer capsule network are larger, and the capsule network continues to work.

The hyper-parameter $\lambda$ not only inhibits the gradient vanishing to some extent; an appropriate $\lambda$ also magnifies the gradient and spreads it more smoothly to the front of the model.

4 Experiments

4.1 Implementation

We tested our proposed adaptive routing algorithm on classification with several common datasets: MNIST [10], Fashion-MNIST [18], SVHN [13], and CIFAR-10 [9]. For CIFAR-10 and SVHN, we resized the images and shifted them by up to a few pixels in each direction with zero padding; there is no other data augmentation/deformation. For the other datasets, the original image sizes are used throughout our experiments. In the experiment with two capsule layers, we set the number of capsules per layer to [1152, 10], the same as the dynamic routing algorithm [15]. For the experiments with three and four capsule layers, the numbers of capsules per layer are [1152, 256, 10] and [1152, 256, 32, 10], respectively.

We used the PyTorch library for our experiments. For training, we used the Adam optimizer [8] with an initial learning rate of 0.001, reduced after each epoch. We set the batch size to 128, i.e., 128 images per training step. The models were trained on a GTX-1080Ti for 150 epochs in every experiment. All experiments were run three times and the results were averaged.
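A minimal sketch of this training setup (the per-epoch learning-rate decay factor is not stated above, so the gamma value below is a hypothetical placeholder, and the model is a stand-in for the capsule network):

```python
import torch

model = torch.nn.Linear(10, 10)  # placeholder for the capsule network
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam, lr = 0.001
# Reduce the learning rate after each epoch; gamma = 0.96 is an assumed factor.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.96)

for epoch in range(150):  # 150 epochs, batch size 128
    # ... iterate over batches of 128 images, compute the margin loss,
    # ... then optimizer.zero_grad(); loss.backward(); optimizer.step()
    scheduler.step()
```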

4.2 Classification Results

We tested our proposed adaptive routing algorithm and the dynamic routing algorithm on several benchmark datasets: CIFAR10 [9], SVHN [13], Fashion-MNIST [18], and MNIST [10].

Model   CIFAR10   SVHN     F-MNIST   MNIST
DRA     76.05%    93.65%   93.02%    99.65%
ARA     78.41%    94.27%   93.07%    99.65%

Table 1: Classification accuracies of the dynamic routing algorithm (DRA) and our proposed adaptive routing algorithm (ARA) with the same two-capsule-layer configuration.

As Table 1 shows, with the same network configuration we achieved better performance than the dynamic routing algorithm. The routing algorithm between the capsule layers learns the affine transformation of the object and the combination of low-layer capsules into high-layer capsules. Therefore, stacking multiple capsule layers can improve model performance, as the model learns more powerful affine transformation capabilities and more complex combinations of capsules in adjacent layers.

          λ=1      λ=2      λ=3      λ=4
2-layers  92.78%   93.23%   93.07%   92.96%
3-layers  93.54%   93.63%   93.39%   93.38%
4-layers  93.61%   93.71%   93.57%   93.41%

Table 2: Classification accuracies of the adaptive routing algorithm with different numbers of capsule layers and different values of $\lambda$ on Fashion-MNIST [18].
          λ=1      λ=2      λ=3      λ=4
2-layers  78.24%   77.97%   78.41%   78.34%
3-layers  78.41%   78.01%   78.66%   78.44%
4-layers  78.42%   78.13%   78.68%   78.50%

Table 3: Classification accuracies of the adaptive routing algorithm with different numbers of capsule layers and different values of $\lambda$ on CIFAR10 [9].

From Tables 2 and 3, we obtained different performance with different numbers of capsule layers and different values of $\lambda$ on CIFAR10 [9] and Fashion-MNIST [18]. When the other configuration parameters are identical, the performance of the model improves as the number of capsule layers increases. Moreover, the performance varies with $\lambda$: when $\lambda$ is 2 or 3, performance is better on Fashion-MNIST, and when $\lambda$ equals 1 or 3, we obtain better performance on CIFAR10.

          λ=0.1    λ=0.001   λ=0.0001   λ=0.00001
2-layers  77.24%   69.25%    10.58%     10.42%
3-layers  10.23%   10.01%    10.22%     10.12%
4-layers  10.18%   10.15%    10.02%     10.06%

Table 4: Classification accuracies of the adaptive routing algorithm with small values of $\lambda$ on CIFAR10 [9].

From Table 4, we obtained different performance with small values of $\lambda$ on CIFAR10 [9]. Clearly, two situations stop the capsule network from working. First, the capsule network collapses when $\lambda$ is set to 0.0001 or less with two capsule layers, matching the original paper [15]. Second, when $\lambda$ is set to 0.1 or less with multiple capsule layers (3 and 4 layers), the capsule network also stops working; capsule networks using the dynamic routing algorithm show the same behavior when stacking multiple capsule layers. Finally, comparing the multi-layer results of Tables 3 and 4 proves that too-small gradient coefficients cause the gradient to vanish in the capsule network, which is consistent with the values of the coupling coefficients in Figure 2.

In our proposed algorithm, $\lambda$ is equivalent to the number of iterations in the routing algorithm. In the capsule network, although increasing the number of iterations brings noise, it enhances the activation probability of the high-layer capsules; the original capsule network performs best with three iterations. In the end, although the hyper-parameter $\lambda$ has the same meaning as the number of iterations, its scale is different.

5 Conclusion

In the original capsule network (with the dynamic routing algorithm), the gradient vanishes when the model stacks multiple capsule layers. We analyzed the forward data flow and backward gradient flow in the capsule network and found that the coupling coefficient leads to the gradient vanishing. We therefore proposed the adaptive routing algorithm, which involves no coupling coefficient in the routing process, to overcome the gradient vanishing when the network stacks multiple capsule layers. Since the routing iteration brings a large amount of computation, we first derived the iterative process of the adaptive routing algorithm and then simplified the iteration by replacing the number of iterations with a hyper-parameter $\lambda$. The hyper-parameter $\lambda$ not only inhibits the gradient vanishing; an appropriate $\lambda$ also magnifies the gradient so that it propagates more effectively to the front layers of the model. As a result, our proposed adaptive routing algorithm achieves better performance than Sabour et al. [15] on Fashion-MNIST [18], SVHN [13], and CIFAR-10 [9], and state-of-the-art performance on MNIST [10]. Further, we evaluated different numbers of capsule layers and different values of the hyper-parameter and analyzed the experimental results.

As future work, we will continue to research capsule networks, increasing the number of network layers while reducing the amount of computation.

References

  • [1] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and F. Li (2009) ImageNet: a large-scale hierarchical image database. In CVPR, pp. 248–255.
  • [2] X. Ding, N. Wang, X. Gao, J. Li, and X. Wang (2019) Group reconstruction and max-pooling residual capsule network. In IJCAI, pp. 2237–2243.
  • [3] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In CVPR, pp. 770–778.
  • [4] G. E. Hinton, A. Krizhevsky, and S. D. Wang (2011) Transforming auto-encoders. In ICANN, pp. 44–51.
  • [5] G. E. Hinton, S. Sabour, and N. Frosst (2018) Matrix capsules with EM routing. In ICLR.
  • [6] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu (2015) Spatial transformer networks. In NIPS, pp. 2017–2025.
  • [7] T. Jeong, Y. Lee, and H. Kim (2019) Ladder capsule network. In ICML, pp. 3071–3079.
  • [8] D. P. Kingma and J. Ba (2015) Adam: a method for stochastic optimization. In ICLR.
  • [9] A. Krizhevsky, G. Hinton, et al. (2009) Learning multiple layers of features from tiny images. Technical report.
  • [10] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al. (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324.
  • [11] J. E. Lenssen, M. Fey, and P. Libuschewski (2018) Group equivariant capsule networks. In NeurIPS, pp. 8858–8867.
  • [12] H. Li, X. Guo, B. Dai, W. Ouyang, and X. Wang (2018) Neural network encapsulation. In ECCV, pp. 266–282.
  • [13] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng (2011) Reading digits in natural images with unsupervised feature learning. In NeurIPS Workshop.
  • [14] J. Rajasegaran, V. Jayasundara, S. Jayasekara, H. Jayasekara, S. Seneviratne, and R. Rodrigo (2019) DeepCaps: going deeper with capsule networks. In CVPR, pp. 10725–10733.
  • [15] S. Sabour, N. Frosst, and G. E. Hinton (2017) Dynamic routing between capsules. In NIPS, pp. 3856–3866.
  • [16] K. Simonyan and A. Zisserman (2015) Very deep convolutional networks for large-scale image recognition. In ICLR.
  • [17] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich (2015) Going deeper with convolutions. In CVPR, pp. 1–9.
  • [18] H. Xiao, K. Rasul, and R. Vollgraf (2017) Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. CoRR abs/1708.07747.
  • [19] L. Zhang, M. Edraki, and G. Qi (2018) CapProNet: deep feature learning via orthogonal projections onto capsule subspaces. In NeurIPS, pp. 5819–5828.