1 Introduction
According to Sabour et al. (2017), a capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity, such as an object or an object part. Intuitively, a CapsNet can better model spatial relationships while using far fewer parameters. The experiments in Sabour et al. (2017) support this argument on small datasets such as MNIST and smallNORB. However, the routing procedure is computationally expensive, and the number of routing iterations has to be set manually by trial and error. To overcome this issue, we propose the Generalized CapsNet (GCapsNet). The key idea of GCapsNet is to incorporate the routing procedure into the overall optimization procedure. In other words, it makes the coupling coefficients trainable instead of computing them by dynamic routing (Sabour et al., 2017) or EM routing (Hinton et al., 2018).
Another interesting question yet to be answered is how to package a capsule from the activations of the previous convolutional layers. We can select the elements of each capsule at the same position across different feature maps (as Figure 2, left, shows). We can also select the elements of each capsule within each feature map (for example, each row or column can serve as a capsule, as Figure 2, right, shows). The two methods seem different, and the first appears more sensible, since capsules are supposed to capture different spatial features at the same position. However, according to our experiments, both ways yield similar performance. We do not know the exact reason for this, but we return to the question in the experiments section.
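The two packaging schemes can be sketched as reshapes of a stack of feature maps. This is an illustrative toy example (map sizes chosen so both schemes yield equal capsule counts, not the paper's actual 6 × 6 maps):

```python
import numpy as np

# Toy stack of 256 feature maps, each 8x8 (sizes chosen for illustration only).
feature_maps = np.random.rand(256, 8, 8)

# Scheme 1 (Figure 2, left): a capsule gathers the activations at the SAME
# spatial position across 8 consecutive feature maps.
across = feature_maps.reshape(32, 8, 8, 8)             # 32 groups of 8 maps
across = across.transpose(0, 2, 3, 1).reshape(-1, 8)   # (32*8*8, 8) capsules

# Scheme 2 (Figure 2, right): a capsule is a contiguous slice WITHIN one
# feature map; here, each row of each map becomes an 8-D capsule.
within = feature_maps.reshape(-1, 8)                   # (256*8, 8) capsules

# Both schemes produce 2048 capsules of dimension 8 from the same activations;
# only the grouping of the underlying values differs.
```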
CapsNet seems more general than standard neural networks (a neuron can be considered a capsule of unit length), so we also explore the scalability of CapsNet by building a two-layer GCapsNet. Unfortunately, the network tends to saturate even after a capsule version of the ReLU layer is added. The same thing happens when we try to 'capsulize' the whole network, namely, making capsules rather than neurons the atomic units. For example, a colored input image can be considered an array of 3-dimensional capsules (one per pixel), and so can the following layers.
2 Generalized CapsNet
To better illustrate the idea of GCapsNet, let us first recall what a standard neural network's loss function looks like. Assume we have a dataset $D = \{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$. If we use a neural network to fit it, the loss function can be defined as Equation 1:

\[
J(W, b) = \frac{1}{N}\sum_{i=1}^{N} \ell\big(f(x^{(i)}; W, b),\, y^{(i)}\big) + \frac{\lambda}{2}\sum_{l=1}^{L-1}\sum_{j=1}^{n_l}\sum_{k=1}^{n_{l+1}} \big(W^{(l)}_{kj}\big)^2 \tag{1}
\]
The loss function for GCapsNet is similar. The only difference is that we have extra coupling coefficients (introduced by the routing procedure) to train. Note that $f$ is the mathematical form of the neural network that we assume fits the dataset; $L$, $n_l$, and $n_{l+1}$ are the number of layers, the number of neurons in layer $l$, and the number of neurons in layer $l+1$, respectively; $W$ and $b$ are the parameters we obtain by training the network.
For the generalized CapsNet, $W^{(l)}_{kj}$ is the transformation matrix that maps one type of capsule in one layer to another type of capsule on top of it, and $c$ is the set of coupling coefficients between adjacent layers.
\[
J(W, b, c) = \frac{1}{N}\sum_{i=1}^{N} \ell\big(f(x^{(i)}; W, b, c),\, y^{(i)}\big) + \frac{\lambda}{2}\sum_{l=1}^{L-1}\sum_{j=1}^{n_l}\sum_{k=1}^{n_{l+1}} \big(W^{(l)}_{kj}\big)^2 \tag{2}
\]
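The difference between the two objectives can be made concrete in a few lines. The sketch below shows the key idea: the coupling coefficients become ordinary trainable parameters (logits passed through a softmax over parent capsules), optimized jointly with the other weights rather than produced by an iterative routing loop. Names and shapes are illustrative, not taken from the released code:

```python
import numpy as np

n_in, n_out = 12, 4   # lower/higher-layer capsule counts (toy sizes)

# Trainable logits; in a real framework these would receive gradients
# from the loss just like any weight matrix.
b_logits = np.zeros((n_in, n_out))

def coupling(logits):
    """Softmax over parent capsules keeps each row a valid distribution."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

c = coupling(b_logits)
# Each lower capsule's couplings sum to one, as in dynamic routing, but here
# no routing iterations (or their count) ever need to be hand-tuned.
```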
2.1 Structure of Generalized CapsNet
Like previous CapsNets, the generalized CapsNet includes two phases, capsule transformation and capsule routing, as Figure 1 shows. Capsule transformation converts one type of capsule into another. For example, Sabour et al. (2017) transforms 8-dimensional capsules into 16-dimensional capsules, while Hinton et al. (2018) transforms 4 × 4 matrix capsules into 4 × 4 matrix capsules. In theory, we can transform any type of capsule into any other type, and capsules can be of any shape (vector, matrix, cube, or even hypercube). Assume the shape of a capsule in the lower layer is $p \times q$ and the transformation matrix is $q \times r$; then the shape of the output capsule in the higher layer is $p \times r$. Capsule routing ensures capsules in lower layers are scaled and sent to their parent capsules in higher layers, as Equation 3 shows.
\[
s_j = \sum_i c_{ij}\, \hat{u}_{j|i}, \qquad \hat{u}_{j|i} = W_{ij}\, u_i \tag{3}
\]
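Transformation and routing can be written as a literal NumPy computation. In this sketch (toy sizes, illustrative names), each lower capsule is transformed by its matrix, and the resulting prediction vectors are scaled by the coupling coefficients and summed into the parent capsule's input:

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out, d_in, d_out = 6, 3, 4, 8

u = rng.standard_normal((n_in, d_in))                # lower-layer capsules
W = rng.standard_normal((n_in, n_out, d_out, d_in))  # transformation matrices
c = np.full((n_in, n_out), 1.0 / n_out)              # uniform couplings here

# Prediction vectors u_hat[i, j] = W[i, j] @ u[i], shape (n_in, n_out, d_out)
u_hat = np.einsum('ijab,ib->ija', W, u)
# Parent inputs s[j] = sum_i c[i, j] * u_hat[i, j], shape (n_out, d_out)
s = np.einsum('ij,ija->ja', c, u_hat)
```

With uniform couplings, each parent input is simply the mean of its prediction vectors; training (or routing) makes the couplings selective.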
2.2 Squash Function
Following Sabour et al. (2017), we use a squash function so that the length of a capsule's output vector lies in $[0, 1)$ and can be interpreted as the probability that the entity the capsule represents is present:

\[
v_j = \frac{\lVert s_j \rVert^2}{1 + \lVert s_j \rVert^2}\, \frac{s_j}{\lVert s_j \rVert} \tag{4}
\]
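A minimal NumPy implementation of the squash nonlinearity from Sabour et al. (2017) (the small `eps` is our addition for numerical stability at zero length):

```python
import numpy as np

def squash(s, eps=1e-9):
    """Shrink short vectors toward zero and long ones toward unit length."""
    sq_norm = np.sum(s * s, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

# A long input keeps its direction with length just under 1;
# a short input is crushed toward zero.
v = squash(np.array([[10.0, 0.0], [0.01, 0.0]]))
```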
2.3 Loss Function
Following Sabour et al. (2017), we use a margin loss on the length of each output capsule $v_k$:

\[
L_k = T_k \max(0, m^+ - \lVert v_k \rVert)^2 + \lambda (1 - T_k) \max(0, \lVert v_k \rVert - m^-)^2 \tag{5}
\]

where $T_k = 1$ if and only if class $k$ is present, $m^+ = 0.9$, $m^- = 0.1$, and $\lambda = 0.5$ down-weights the loss for absent classes.
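The margin loss of Sabour et al. (2017), which we take Equation 5 to denote, is straightforward to implement on the capsule output lengths (variable names here are ours):

```python
import numpy as np

def margin_loss(lengths, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Sum of per-class margin losses over one example."""
    pos = targets * np.maximum(0.0, m_pos - lengths) ** 2          # present classes
    neg = lam * (1 - targets) * np.maximum(0.0, lengths - m_neg) ** 2  # absent classes
    return np.sum(pos + neg)

lengths = np.array([0.95, 0.05, 0.2])   # output capsule lengths per class
targets = np.array([1.0, 0.0, 0.0])     # one-hot label
loss = margin_loss(lengths, targets)
# Only the third class is penalized: its length 0.2 exceeds m_neg = 0.1.
```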
3 Experiments
3.1 Fully Connected GCapsNet on MNIST
We adopt the same baseline as in Sabour et al. (2017). The first convolutional layer outputs 256 feature maps. The second convolutional layer outputs 256 feature maps, i.e., 32 × 6 × 6 8D capsules. The last two layers are fully connected. Please check our released code or the original paper by Sabour et al. (2017) for more details.
We call the capsule structure that Sabour et al. (2017) proposed the fully connected CapsNet, since each capsule in the higher layer connects to every capsule in the lower layer. As Table 1 shows, whether or not the reconstruction procedure is involved, GCapsNet achieves better performance with far fewer parameters. Note that the performance of both the baseline and GCapsNet reported here is slightly lower than in Sabour et al. (2017); we attribute the difference to the deep learning framework (we use Caffe rather than TensorFlow).
Another interesting finding is that the way of stacking the activations of neurons does not matter much (0.68 vs. 0.66). It seems more sensible to package capsules across different feature maps, since the capsules are then supposed to capture several different types of features at the same position, as Figure 2, left, shows. However, we found that packaging capsules within each feature map (as Figure 2, right, shows), as opposed to across feature maps, achieves similar performance. A potential reason is that no matter how we package capsules, once their organization is fixed, the network eventually learns the underlying spatial relationships.
3.2 Convolutional GCapsNet on MNIST
Similarly, we call the structure in Hinton et al. (2018) the convolutional CapsNet, since capsules of the same type at different positions share the same transformation matrices. The convolutional GCapsNet we use here is similar to the previously mentioned fully connected GCapsNet, except that we use a 6 × 6 convolution kernel, and 4 × 4 "matrix" capsules instead of 1 × 16 "vector" capsules for the capsule layer. Please check our project for more details. As Table 1 shows, convolutional GCapsNet achieves better performance than the baseline while using fewer parameters.
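The parameter saving from matrix capsules can be seen with a quick shape check. In this sketch (sizes from the text, code illustrative), a 4 × 4 pose is transformed by a 4 × 4 matrix product, needing only 16 parameters per input/output capsule pair, versus 128 for mapping an 8-D vector capsule to a 16-D one:

```python
import numpy as np

pose = np.arange(16.0).reshape(4, 4)   # one "matrix" capsule
W_mat = np.eye(4)                      # shared 4x4 transformation matrix
vote = pose @ W_mat                    # vote is still a 4x4 matrix

# For contrast: a vector-capsule transform mapping 8-D to 16-D.
W_vec = np.zeros((16, 8))
# W_mat has 16 parameters; W_vec has 128 for capsules of comparable size.
```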
The GCapsNets above all have a single capsule layer, so a natural question arises: can we build a multilayer GCapsNet? After all, neural networks can be considered a special type of CapsNet (one where the length of each capsule is 1). A straightforward way is to stack multiple capsule layers on top of each other. However, we found that this type of CapsNet saturates easily. To address this, we designed a capsule version of the "ReLU" layer, which improves performance but remains far from satisfying. Making CapsNet more scalable will be our future work.
Table 1: Error rate and number of parameters on MNIST.

Algorithm                     error rate (%)   number of parameters
baseline                      0.83             35.4M
fully connected GCapsNet      0.66             8.2M
fully connected GCapsNet*     0.66             6.8M
fully connected GCapsNet**    0.76             8.2M
convolutional GCapsNet        0.75             6.9M
convolutional GCapsNet*       0.70             5.5M
4 Conclusion
GCapsNet incorporates the routing procedure of capsules into the overall optimization process, which eliminates the need to manually set the number of routing iterations and guarantees convergence. We also tried two different ways of packaging capsules and conclude that how capsules are packaged does not matter much. Finally, we evaluated multilayer CapsNets and found that they saturate easily. How to make CapsNet scalable remains an open question.
References
 Xi et al. (2017) Edgar Xi, Selina Bing, and Yang Jin. Capsule network performance on complex data. 2017. URL https://arxiv.org/abs/1712.03480.
 Hinton et al. (2018) Geoffrey E Hinton, Sara Sabour, and Nicholas Frosst. Matrix capsules with EM routing. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=HJWLfGWRb.
 Sabour et al. (2017) Sara Sabour, Nicholas Frosst, and Geoffrey E. Hinton. Dynamic routing between capsules. CoRR, abs/1710.09829, 2017. URL http://arxiv.org/abs/1710.09829.