1 Introduction
Deep learning systems are powerful tools for recognition, prediction, and strategy in fields such as vision, speech, language, and games Lecun_2015 ; Silver_2017 . The mammalian visual system extracts features from objects in cluttered scenes and then combines them for robust recognition. Inversion of sensory processing, using the ubiquitous feedback pathways present throughout sensory systems, is a major component of object recognition Harth_1987 . Learning systems that exploit the compositionality of objects Lake_2015 , along with dynamic binding of parts (features) to wholes (objects), can become powerful architectures. Capsule Networks (CapsNets) have the potential to perform object recognition in such a natural and systematic fashion.
Capsule Networks Sabour_2017 ; Hinton_2018 use a dynamic routing algorithm to calculate a set of routing coefficients that link lower- and higher-level capsules between adjacent layers in the network. Each routing coefficient represents the probability that an individual lower-level capsule should be assigned to a higher-level capsule. Unlike the rest of the network parameters, these routing coefficients are not learned during training. For each input presented to the network, the routing coefficients are calculated at runtime (during both training and inference) from a set of initial values.
The Softmax function, given by Eq. 1, has been widely used for object recognition tasks due to its ability to reduce the impact of outlier values in the data while still allowing those values to influence the network's learning during training. In CapsNets, Softmax is used to convert the log priors b_ij between capsules i in layer l and capsules j in layer l + 1 into a set of assignment probabilities between the capsules. While dampening the effects of outliers can be beneficial when training typical network parameters (following the Maximum Likelihood Estimation principle), outliers in the routing coefficients can provide optimal separation between features in adjacent capsule layers. Since these coefficients are not learned in the conventional sense (i.e., gradients do not flow through the routing coefficients during backpropagation), other normalization functions and methods can be used for the task of dynamic routing. In addition, the function need not be differentiable (e.g., a lookup table could be used to assign lower-level capsules to higher-level capsules).
Here, we show that the use of the scale-invariant MaxMin function (Eq. 2) improves the performance of CapsNets. We focus on the CapsNet formalism of Sabour et al. Sabour_2017 . The lower bound of the normalization is set to 0, which gives higher-level capsules the ability to completely disregard nonessential features presented by one of the lower-level capsules. This serves as a kind of dynamic dropout for the routing coefficients and forces the network to generalize better. The upper bound, in principle, can be set to any value; we tested a range of upper bounds and found that the network performs well for all values within the range tested (a single fixed value is used as the upper bound in the rest of the paper). Bounding the routing coefficients in this manner allows each lower-level capsule to have an independent assignment probability to each of the higher-level capsules. That is, the sum of the probabilities for a single lower-level capsule across the higher-level capsules is no longer constrained to be 1. This can be beneficial for CapsNets since, oftentimes, a single feature has high probabilities of being assigned to multiple higher-level objects.
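For concreteness, a minimal NumPy sketch of the MaxMin normalization (the function name, default bounds, and epsilon guard are our illustrative choices, not from the paper):

```python
import numpy as np

def max_min(b, lower=0.0, upper=1.0):
    """Scale-invariant MaxMin normalization (Eq. 2): each row of
    logits b is mapped linearly so its minimum becomes `lower` and
    its maximum becomes `upper`. Unlike Softmax, the outputs for a
    single lower-level capsule need not sum to 1."""
    b_min = b.min(axis=-1, keepdims=True)
    b_max = b.max(axis=-1, keepdims=True)
    # Small epsilon guards against a zero range (all logits equal).
    return lower + (upper - lower) * (b - b_min) / (b_max - b_min + 1e-12)

logits = np.array([[0.0, 1.0, 3.0]])
c = max_min(logits)              # [[0, 1/3, 1]] up to rounding
c_scaled = max_min(10 * logits)  # identical: the map is scale-invariant
```

Note that multiplying the logits by any positive constant leaves the output unchanged, which is the scale invariance the paper relies on.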
The use of MaxMin over Softmax leads to an improvement in the test accuracy across five datasets and allows the use of more routing iterations between capsule layers without overfitting to the training data. In addition, we train a single CapsNet (with minimal data augmentation) on the MNIST dataset and achieve a lower test error than the Softmax baseline. With a 3-model majority voting system, we surpass the accuracy of the model ensemble used by Wan_2013 on MNIST.
Section 2 provides a summary of the three-layer CapsNet from Sabour_2017 and the differences in the routing procedure between the Softmax and MaxMin normalizations. In Section 3, we compare the evolution of the logits and routing coefficients for CapsNets trained using Softmax and MaxMin. Section 4 shows the tuning curves (i.e., outputs of the routing layer) for the network. Section 5 presents the main results of the sessions trained using Softmax vs. MaxMin. Performance on the MNIST dataset is detailed in Section 6, along with results using other normalization functions.
2 Capsule Network Architecture
The CapsNet architecture we used follows the network described in Sabour_2017 and is shown in Fig. 1. A 28 × 28 input image is fed into a convolutional layer (Conv1) that operates on the input with 256 9 × 9 kernels using a stride of 1 and the ReLU activation. The output of this operation is a 20 × 20 × 256 feature map tensor that is then fed into a second convolutional layer (PrimaryCaps) that uses 256 9 × 9 kernels, a stride of 2, and the ReLU activation. This results in a 6 × 6 × 256 feature map tensor, which represents the lower-level capsules for the network. Each set of eight scalar neurons in the 6 × 6 × 256 tensor is grouped channel-wise and forms a single lower-level capsule u_i, for a total of 1152 (= 6 × 6 × 32) lower-level capsules.

The outputs from PrimaryCaps are fed through a dynamic routing algorithm, resulting in the DigitCaps output matrix. The squashing function used to calculate v_j is as given in Sabour_2017 . Each row in the DigitCaps matrix represents the 16-D instantiation parameters of a single class, and the length of a 16-D vector represents the probability of the existence of a particular class. During training, the non-ground-truth rows are masked with zeros and the matrix is passed to a reconstruction subnetwork that consists of two fully-connected layers of dimensions 512 and 1024 with ReLU activations and a final fully-connected layer of dimension 784 with a sigmoid activation. During inference, the row in the DigitCaps matrix with the largest length (i.e., highest probability) is taken as the predicted object class.

The inputs to the routing algorithm are the prediction vectors, û_{j|i}. These prediction vectors are calculated using learned transformation weight matrices and the capsule outputs u_i from the PrimaryCaps layer. The prediction vectors remain fixed inside the algorithm while the routing procedure iteratively calculates the DigitCaps capsules, v_j, from the prediction vectors. Although no gradients flow through the routing layer itself, both its inputs and outputs are subjected to the usual gradient flows during training. In particular, the DigitCaps capsules are passed to a subnetwork that learns to reconstruct the original input image. As a result, the prediction vectors and parent-level capsules tend to evolve such that the scaled summation of the prediction vectors is similar to the parent-level capsules. In other words, during the forward pass, the network calculates a set of parent-level capsules that are used to recreate the original image. Any errors in the reconstruction backpropagate to the prediction vectors and the preceding layers. During the next forward pass, the prediction vectors evolve (via the transformation matrices) to align with the previously calculated parent-level capsules.
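For reference, the squashing function from Sabour_2017 can be written as a short NumPy sketch (the epsilon term is our addition for numerical stability):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    # v = (|s|^2 / (1 + |s|^2)) * (s / |s|): short vectors shrink toward
    # length 0 and long vectors saturate toward length 1, so a capsule's
    # length can be read as a probability.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

v = squash(np.array([3.0, 4.0]))  # |s| = 5, so |v| = 25/26
```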
The routing procedure from Sabour_2017 is given below for reference. The routing procedure using MaxMin normalization remains largely the same, except that the Softmax function is replaced with the MaxMin function given by Eq. 2, where lb/ub are the lower/upper bounds of the normalization. For the first iteration, the routing coefficients are initialized to 1.0 outside of the routing for-loop.
Training is conducted similarly to the approach taken in Sabour_2017 . Our implementation uses TensorFlow TensorFlow and the Adam optimizer Adam_Optimizer with TensorFlow's default parameters and an exponentially decaying learning rate. Unless otherwise noted, the same network hyperparameters as in Sabour_2017 were used for all training sessions. Original code is adapted from Sabour_Code .

Softmax Routing Procedure
1: Input to Routing Procedure: (û_{j|i}, r, l)
2: for all capsule i in layer l and capsule j in layer (l + 1): b_ij ← 0
3: for r iterations:
4:   for all capsule i in layer l: c_i ← Softmax(b_i)
5:   for all capsule j in layer (l + 1): s_j ← Σ_i c_ij û_{j|i}
6:   for all capsule j in layer (l + 1): v_j ← Squash(s_j)
7:   for all capsule i in layer l and capsule j in layer (l + 1): b_ij ← b_ij + û_{j|i} · v_j
return v_j
c_ij = exp(b_ij) / Σ_k exp(b_ik)    (1)
MaxMin Routing Procedure 
1: Input to Routing Procedure: (û_{j|i}, r, l)
2: for all capsule i in layer l and capsule j in layer (l + 1): c_ij ← 1.0
3: for r iterations:
4:   for all capsule j in layer (l + 1): s_j ← Σ_i c_ij û_{j|i}
5:   for all capsule j in layer (l + 1): v_j ← Squash(s_j)
6:   for all capsule i in layer l and capsule j in layer (l + 1): b_ij ← b_ij + û_{j|i} · v_j
7:   for all capsule i in layer l: c_i ← MaxMin(b_i) via Eq. 2
return v_j
c_ij = lb + (ub − lb) · (b_ij − min_k b_ik) / (max_k b_ik − min_k b_ik)    (2)
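Under our reading of the two procedures, both routing variants can be sketched compactly in NumPy (array shapes, variable names, and the epsilon guards are illustrative choices, not from the paper):

```python
import numpy as np

def squash(s):
    sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + 1e-9)

def route(u_hat, r=3, use_max_min=True, lb=0.0, ub=1.0):
    """Dynamic routing over prediction vectors u_hat of shape
    (num_lower, num_upper, dim). Returns the parent-level capsules
    v_j with shape (num_upper, dim)."""
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))  # logits, initialized to zero
    # MaxMin initializes the coefficients directly to 1.0;
    # Softmax of the zero logits gives a uniform 1/n_upper.
    c = np.ones((n_lower, n_upper)) if use_max_min \
        else np.full((n_lower, n_upper), 1.0 / n_upper)
    for _ in range(r):
        s = np.einsum('ij,ijd->jd', c, u_hat)      # weighted sum over lower capsules
        v = squash(s)                              # parent-level capsules
        b = b + np.einsum('ijd,jd->ij', u_hat, v)  # agreement update
        if use_max_min:                            # Eq. 2, row-wise over parents
            b_min = b.min(axis=1, keepdims=True)
            b_max = b.max(axis=1, keepdims=True)
            c = lb + (ub - lb) * (b - b_min) / (b_max - b_min + 1e-12)
        else:                                      # Eq. 1, row-wise Softmax
            e = np.exp(b - b.max(axis=1, keepdims=True))
            c = e / e.sum(axis=1, keepdims=True)
    return v
```

Because the squashing non-linearity bounds each parent capsule's length below 1, the returned vector lengths can be read directly as class probabilities.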
3 Evolution of Logits and Routing Coefficients
In the CapsNet architecture presented in Fig. 1, the capsules in the PrimaryCaps layer can represent features useful for recognizing objects (e.g., handwritten digits). The convolutional layer before PrimaryCaps allows efficient learning of these features. The dynamic binding of parts (i.e., features) to wholes (i.e., objects) is then carried out through the routing coefficients.
In such CapsNets, the ability to create optimal separation between competing features in adjacent capsule layers is crucial for efficient object recognition. The evolution of the logits and routing coefficients in the routing layer offers insight into how two adjacent capsule layers assign object features to their wholes. When using Softmax, a zero initialization for the logits sets all routing coefficients to 1/J for the first iteration (assuming J parent-level output capsules). With MaxMin normalization, the routing coefficients are initialized to 1.0 for the first iteration; thus, for the first iteration, the parent-level capsules v_j are simply the (non-scaled) squashed summations of the prediction vectors. The top rows of Figs. 2 and 3 show the initial values of the logits and routing coefficients for MNIST and CIFAR10, each for the same training image in their respective datasets. The middle and bottom rows show the evolution of the logits and coefficients throughout the routing procedure. For both the Softmax and MaxMin cases, the logits and coefficients are extracted from the network after their respective sessions have finished training under the same conditions.
As the routing progresses, the logits form a tight cluster around zero, with the majority of the values remaining at their initial value of 0 (y-axes are log scale in Figs. 2 and 3). Due to this tight clustering and the nonlinear behavior of Softmax, the routing coefficients from a Softmax-trained network evolve in a manner similar to their corresponding logits (cf. Figs. 2 (a) and 3 (a)); i.e., the majority of the routing coefficients remain near their initial value, with only a few evolving to significantly different values. As a result, the routing coefficients just barely separate each lower-level capsule among the higher-level capsules. With MaxMin normalization, the majority of the logits also remain near their initial value. However, due to the scale-invariant nature of the MaxMin normalization, the tight grouping of logits can be better separated to form the routing coefficients (cf. Figs. 2 (b) and 3 (b)).
MaxMin also allows a lower-level capsule to have high assignment probabilities with multiple higher-level capsules. With Softmax, competition between a lower-level capsule and each of the higher-level capsules reduces the likelihood of multiple high probabilities between features in adjacent capsule layers. Figure 4 shows examples of the routing coefficients for three lower-level capsules in PrimaryCaps across the ten higher-level capsules in DigitCaps for the MNIST and CIFAR10 datasets at the last routing iteration. Since the majority of the logits are tightly clustered around their initial values, Softmax computes nearly identical assignment probabilities between a lower-level capsule and each of the higher-level capsules. MaxMin normalization, in contrast, computes high assignment probabilities for multiple higher-level capsules. In addition, the differences between the probabilities among the higher-level capsules (for each lower-level capsule) are larger when MaxMin is used, leading to a more optimal separation between capsules in adjacent layers.
4 DigitCaps Outputs
It is also instructive to examine the outputs of the DigitCaps layer (i.e., the parent-level capsules, v_j). During inference, the output capsule with the largest vector length (i.e., highest probability) is used to classify the input image. For each input to the network, an ideal CapsNet would have a single output capsule with probability near 1 corresponding to the ground-truth (GT) class and probabilities near 0 for all other classes. The outputs can be examined on an input-by-input basis (for each image, there is a corresponding DigitCaps matrix from which the classification is made) or on a class-by-class basis (for each class, there is a corresponding matrix that is the average of the individual matrices for that class).

Figures 5 (a) and (b) show the output capsule probabilities for the same set of test images from the MNIST dataset, and Figs. 5 (c) and (d) show the output capsule probabilities for the same set of test images from the CIFAR10 dataset. For MNIST, the network is properly trained, and both normalizations provide digit class probabilities that are highly peaked for their corresponding GT classes and low for the other classes. For CIFAR10, the network does better at separating the classes when MaxMin normalization is used (i.e., higher GT probability and lower non-ground-truth probabilities). However, both normalizations produce output capsules that have multiple high peaks, signifying that the networks are not able to adequately differentiate between the object classes. This issue with CapsNets was addressed in Sabour_2017 by including a “none-of-the-above” category for the routing Softmax; our network does not have this category.
If we view the capsules in the DigitCaps layer as “grandmother cells” Gross_2002 , then how well-tuned they are to the objects they recognize provides a picture of the robustness of the system. Figures 6 (a) and (b) show the class-averaged output capsule probabilities for the test images of the MNIST dataset. These can be viewed as the tuning curves of the recognition units and demonstrate the ability of the network to adequately distinguish between each of the digit classes. The similarity between the tuning curves of the respective digits when trained with Softmax (Fig. 6 (a)) and MaxMin (Fig. 6 (b)) shows that MaxMin normalization does not degrade the network's ability to discriminate the ten digits. In contrast, the tuning curves for CIFAR10 (Figs. 6 (c) and (d)) show that, for certain object classes, the discriminability is not as good as for MNIST. This is also reflected in the accuracies given in Table 1.
5 Results
We compared the network's performance with MaxMin and Softmax normalizations on five datasets: MNIST MNIST , Background MNIST (bMNIST) and Rotated MNIST (rMNIST) R_and_BG_MNIST , Fashion MNIST (fMNIST) F_MNIST , and CIFAR10 CIFAR10 . In addition, we evaluated the performance of the networks as a function of the number of routing iterations. All sessions were trained using the same three-layer model shown in Fig. 1 with the hyperparameters of Sabour_2017 . No data augmentation was used except for CIFAR10, where random croppings were applied to the training images and a centered cropping to the test images. For the variations of the MNIST dataset, the PrimaryCaps layer has 1152 capsules; for CIFAR10, it has 2048 capsules. Three routing iterations were used for all sessions. Unlike Sabour_2017 , we did not introduce a “none-of-the-above” category for the network classifier.
Table 1 lists the mean of the maximum test accuracies and their standard deviations for the five datasets, and shows that MaxMin normalization provides a consistent improvement in test accuracy compared with Softmax.¹ In particular, the improvement is most significant for the datasets that have a nonzero background (i.e., bMNIST and CIFAR10). MaxMin also allows more routing iterations to be conducted without decreasing the test accuracy: as shown in Fig. 7, an improvement in test accuracy is obtained when the number of routing iterations is increased, whereas with Softmax the test accuracy decreases for all five datasets.

¹Experiments using Softmax normalization with the routing coefficients initialized to 1.0 (as in the MaxMin case) produced comparable test accuracies for the MNIST dataset.

Normalization  MNIST [%]  rMNIST [%]  fMNIST [%]  bMNIST [%]  CIFAR10 [%]

Softmax  99.28 ± 0.06  93.72 ± 0.08  90.52 ± 0.14  89.08 ± 0.19  73.65 ± 0.09
MaxMin  99.55 ± 0.02  95.42 ± 0.03  92.07 ± 0.12  93.09 ± 0.04  75.92 ± 0.27
MaxMin also prevents the network from overfitting to the training data, especially as the number of routing iterations is increased. As shown in Fig. 8, the differences between the mean of the maximum training and test accuracies are lower for CapsNets trained using MaxMin compared with Softmax. Thus, MaxMin normalization not only prevents the model from overfitting, but also allows the performance of the network to scale positively with the number of routing iterations.
6 Performance on MNIST
Sabour et al. Sabour_2017 demonstrate a low test error of 0.25% using a single three-layer CapsNet with Softmax and image translation by up to 2 pixels in each direction with zero padding. Section 5 shows that MaxMin gives a consistent improvement in test accuracy compared with Softmax on MNIST. Thus, it stands to reason that a single CapsNet trained using MaxMin and minimal augmentations can outperform the current state-of-the-art results on MNIST Wan_2013 . We train the same three-layer CapsNet of Fig. 1 on the full set of MNIST training images using random image translation with zero padding and random image rotation around the image center. In addition, we relax the margin loss constraints and reduce the number of routing iterations. All other parameters follow those from Sabour_2017 .

Table 2 compares the test errors on the 10,000 images in the MNIST test dataset for networks trained with MaxMin and Softmax using the parameters and image augmentations listed above. Each experiment was repeated several times. A single CapsNet using MaxMin achieves a mean test error of 0.24% (best 0.20%), while a 3-model majority vote achieves a still lower test error. The misclassified images from the model ensemble are shown in Fig. 9. Further discussion of the MNIST results is presented in Appendix A, along with the misclassifications from each of the three models used in the ensemble.
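The 3-model majority vote can be sketched as follows (the tie-breaking rule below is our assumption; the paper does not specify one):

```python
import numpy as np

def majority_vote(predictions):
    """predictions: array of shape (num_models, num_images) of class
    labels. Returns the per-image majority label; a 3-way tie falls
    back to the first model's prediction (our assumed tie-break)."""
    preds = np.asarray(predictions)
    out = []
    for col in preds.T:  # one column per test image
        vals, counts = np.unique(col, return_counts=True)
        out.append(vals[counts.argmax()] if counts.max() > 1 else col[0])
    return np.array(out)

p = [[7, 2, 1],   # model A
     [7, 3, 1],   # model B
     [9, 3, 4]]   # model C
print(majority_vote(p))  # [7 3 1]
```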
Normalization  Maximum  Minimum  Mean  Stdev. 

Softmax  0.35%  0.29%  0.32%  0.021% 
MaxMin  0.27%  0.20%  0.24%  0.025% 
6.1 Comparisons with Human Polling Results
Out of the 10,000 predictions the network makes on the MNIST test set, a small number differ from the GT labels. A poll of individuals on these images showed that in some cases the results agreed with the networks' predictions. Table 3 illustrates this, and details for one of the images are given in Fig. 10 (a). Figure 10 (b) shows an image that is consistently misclassified by the network but is almost always correctly classified by humans. This particular digit is misclassified in the same way by several other methods as well Belongie_2002 ; Stuhlsatz_2012 ; Ciresan_2012 , and points to the shortcomings of current machine learning methods compared to the human brain. Several images in the test set are of poor quality, and Fig. 10 (c) gives the networks' predictions and polling results for one such example.

Image Index  GT Label  Prob.  Network Pred. Label  Prob.
1260  7  0.38/0.53  1  0.54/0.64  
1901  9  0.29/0.18  4  0.61/0.79  
2597  5  0.47/0.23  3  0.51/0.79  
4823  9  0.42/0.52  4  0.55/0.71  
5937  5  0.38/0.55  3  0.60/0.76  
9729  5  0.15/0.48  6  0.82/0.80 
6.2 Other Normalizations
MaxMin is not unique in its ability to separate the logits well. Various other functions can be applied in the routing procedure for CapsNets. We also tested the following functions on the MNIST dataset: 1) Winner-Take-All (WTA), 2) sum, 3) centered MaxMin, 4) Z-score, and 5) adjusted log normalizations. An exhaustive study was not done for each of these methods; our primary goal was to probe the utility of other methods in creating the routing coefficients and whether or not a valid probability distribution was a strict requirement for the assignment of capsules. For WTA, each lower-level capsule in PrimaryCaps contributes to only a single higher-level capsule in DigitCaps; the higher-level capsule assignment is determined by the largest coefficient value for each lower-level capsule. The adjusted log normalization is given by c_ij = log(b_ij − min_k b_ik + 1) (subtracting the min value and adding one ensures that the minimum of the transformed logits is zero). Sum, centered MaxMin, and Z-score normalizations have their usual meanings. A good initialization is required for each of the five methods in order for the network to converge during training. For MaxMin, the initialization of the routing coefficients was robust across two orders of magnitude. To simplify matters, we used the same initialization for the routing coefficients of all five methods listed above.

Table 4 shows the test accuracies on the MNIST dataset for the six methods, including MaxMin. Sum normalization performed the worst, primarily due to difficulties in loss convergence during training; this issue might be alleviated with a more suitable initialization. Centered MaxMin and adjusted log normalizations performed approximately the same as one another, as did WTA and Z-score. It is worth noting that the WTA method results in only a modest decrease in test accuracy. This is somewhat surprising, since WTA assigns a value of 0 to all but one of the routing coefficients associated with each lower-level capsule. Both centered MaxMin and Z-score normalization allow routing coefficients to take on negative values. However, the range of values transformed by Z-score is unbounded and can be difficult to initialize properly. The range of values transformed by centered MaxMin is bounded, but large negative routing coefficients can counterbalance large positive ones, leading to lower network performance compared with MaxMin. Log transformations generally compress high values and spread low values by expressing the values as orders of magnitude and are useful when a high degree of variation exists within variables. This transformation gives decent performance on MNIST but is difficult to initialize properly since the range of the transformed values is not bounded.
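The excerpt spells out only the adjusted log normalization exactly, so the sketches below are our readings of the alternative normalizations (row-wise over the parent capsules; epsilon guards are ours):

```python
import numpy as np

def wta(b):
    """Winner-Take-All: each lower-level capsule routes entirely to
    the parent with the largest logit."""
    c = np.zeros_like(b)
    c[np.arange(b.shape[0]), b.argmax(axis=1)] = 1.0
    return c

def centered_max_min(b):
    """MaxMin shifted to zero mean; coefficients can be negative."""
    rng = b.max(axis=1, keepdims=True) - b.min(axis=1, keepdims=True)
    m = (b - b.min(axis=1, keepdims=True)) / (rng + 1e-12)
    return m - m.mean(axis=1, keepdims=True)

def z_score(b):
    """Unbounded standardization; sensitive to initialization."""
    return (b - b.mean(axis=1, keepdims=True)) / (b.std(axis=1, keepdims=True) + 1e-12)

def adjusted_log(b):
    """Shift so the minimum maps to log(1) = 0, then compress with log."""
    return np.log(b - b.min(axis=1, keepdims=True) + 1.0)
```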
Normalization  Test Accuracy [%] 

MaxMin  99.55 ± 0.02
Centered MaxMin  99.52 ± 0.03
Adjusted Log  99.50 ± 0.01
WTA  99.24 ± 0.36
Z-Score  99.21 ± 0.03
Sum  72.62 ± 30.05
7 Summary
In the formalism from Sabour_2017 , the logits are converted to the routing coefficients using the Softmax function. For optimal class separation, the routing coefficients should be widely separated in their values. That is, features that are useful for one class, represented by the output of PrimaryCaps, should be strongly coupled to the features in DigitCaps for that class.
We analyzed the distribution of routing coefficients generated by CapsNets using Softmax and find that they are tightly clustered around their initial value. One reason for this may be that the Softmax function is not scale invariant: for the range of logits being produced in the network, Softmax normalization reduces the dynamic range of the routing coefficients. With MaxMin normalization, the dynamic range of the routing coefficients is increased. We demonstrate improved recognition accuracies across five datasets and show that MaxMin allows more routing iterations between adjacent capsule layers without overfitting to the training data. Finally, a single CapsNet with minimal data augmentation is able to achieve a state-of-the-art result on the MNIST test set.
Acknowledgments
We would like to thank Peter Dolce for setting up and running the human polling. KPU acknowledges many useful conversations with PS Sastry.
References
 [1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Gregory S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian J. Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Józefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Gordon Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul A. Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda B. Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. CoRR, abs/1603.04467, 2016.
 [2] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell., 24(4):509–522, April 2002.
 [3] Dan C. Ciresan, Ueli Meier, and Jürgen Schmidhuber. Multi-column deep neural networks for image classification. CoRR, abs/1202.2745, 2012.
 [4] Charles G. Gross. Genealogy of the “grandmother cell”. The Neuroscientist, 8(5):512–518, 2002. PMID: 12374433.
 [5] E Harth, KP Unnikrishnan, and AS Pandya. The inversion of sensory processing by feedback pathways: a model of visual cognitive functions. Science, 237(4811):184–187, 1987.
 [6] Geoffrey E Hinton, Sara Sabour, and Nicholas Frosst. Matrix capsules with EM routing. In International Conference on Learning Representations, 2018.
 [7] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
 [8] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
 [9] Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015.
 [10] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521:436–444, May 2015.
 [11] Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
 [12] University of Montreal. Variations on the MNIST digits. http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/MnistVariations, 2018. Accessed: 2018-07-17.
 [13] Sara Sabour. Capsule network GitHub repo. https://github.com/Sarasra/models/tree/master/research/capsules, 2018. Accessed: 2018-08-07.
 [14] Sara Sabour, Nicholas Frosst, and Geoffrey E Hinton. Dynamic routing between capsules. In Advances in neural information processing systems, pages 3856–3866, 2017.
 [15] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. Mastering the game of Go without human knowledge. Nature, 550:354–359, Oct 2017.
 [16] A. Stuhlsatz, J. Lippel, and T. Zielke. Feature extraction with deep neural networks by a generalized discriminant analysis. IEEE Transactions on Neural Networks and Learning Systems, 23(4):596–608, April 2012.
 [17] Li Wan, Matthew Zeiler, Sixin Zhang, Yann Le Cun, and Rob Fergus. Regularization of neural networks using dropconnect. In International conference on machine learning, pages 1058–1066, 2013.
 [18] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
Appendix A Misclassified Images From the MNIST Dataset
The MNIST images misclassified by each of the three models used in the majority-voting scheme are shown below. Models A, B, and C each misclassified a small number of the 10,000 test images. Images with missing pieces of information present the greatest challenge to the network. Each of the three models was trained using the same set of network parameters and image augmentations mentioned in Section 6, the only difference being the weight initializations for the network layers. The image index, model prediction, and GT label are listed above each image.