The practical use of Deep Neural Networks (DNNs) has been hindered by the inability to fully understand the rationale behind their predictions (Carvalho et al., 2019), an understanding that is especially crucial in many real-world scenarios, such as healthcare, criminal justice and administrative regulation (Rudin, 2019). Thus, interpreting DNNs has attracted ever-increasing research attention in recent years. To this end, existing endeavors (e.g., Network Dissection (Bau et al., 2017) and Concept Activation Vector (Kim et al., 2018)) mainly aim to pin semantic concepts to a DNN’s units or layers.
However, the hierarchical inference procedure for a specific concept is not effectively captured by simply aligning concepts to units or layers. First, the learned networks entangle all the concepts together. For example, one unit can be responsible for multiple concepts (Morcos et al., 2018; Zhou et al., 2018). Second, only knowing which unit or layer represents which concept is not enough to understand the reasoning process for a given concept. For instance, the ‘conv5-3 unit 151’ of VGG16 pre-trained on ImageNet was shown to represent the concept of ‘Airplane’ (Bau et al., 2017), but it remains unknown how and why this unit infers such a concept from the input. In contrast, if the hierarchical network architecture could be disentangled in terms of concepts, the above concerns could be addressed correspondingly. The disentangled sub-architectures naturally form the inference procedures for the concepts.
Therefore, in this paper, we introduce a new concept of Neural Architecture Disentanglement (NAD), which aims to decompose a pre-trained network into sub-architectures consisting of paths that relate to specific concepts.
We term such paths concept feature paths, which are selected combinations of hidden units from a DNN’s bottom block to its top block. Here, ‘block’ refers to a combination of convolutional layers, activation functions and normalization layers. Fig. 1 shows some examples of disentangled concepts from the original network. Moreover, NAD provides a natural metric for evaluating the interpretability of DNNs. The metric is defined as the variance of similarities between paths, which serves as a quantification of how interpretable the network is. More details about the metric can be found in Sec. 4. To the best of our knowledge, such quantification of network interpretability is the first of its kind.
We conduct extensive experiments with four representative architectures, i.e., VGG (Simonyan & Zisserman, 2014), ResNet (He et al., 2016), DenseNet (Huang et al., 2017) and DARTS-Net (Liu et al., 2018), which range from hand-crafted to autoDL-based. We investigate the interpretability of the above architectures on an object-based dataset (i.e., ImageNet (Deng et al., 2009)) and a scene-based dataset (i.e., Place365 (Zhou et al., 2017)). NAD sheds important light on the information flow of semantic concepts in DNNs, and provides a fundamental metric that will facilitate the design of interpretable network architectures.
Several interesting observations are revealed by NAD with the proposed metric: (1) The well-organized hierarchical semantic structure of categories in ImageNet and Place365 can be learned by highly interpretable networks. (2) The order of interpretability, from highest to lowest, is roughly VGG16, DARTS-Net/DenseNet121, and ResNet50, as shown in Fig. 6 of our experiments. Interestingly, we note that the connection complexity is not a key factor affecting interpretability. (3) Further analysis reveals that basic operations, i.e., the residual operation and the concat operation, indeed have an important effect on interpretability. Specifically, the residual operation reduces the interpretability, while the concat operation encourages it.
The rest of this paper is organized as follows: Sec. 2 reviews related work. The concept of Neural Architecture Disentanglement (NAD) and the proposed metric for evaluating DNNs’ interpretability are introduced in Sec. 3 and Sec. 4, respectively. Experimental results are given in Sec. 5. Finally, we conclude this work in Sec. 6.
2 Related Work
Post-model Interpretability for DNNs. Post-model interpretability refers to improving the interpretability after building a model. Many studies have been done to understand DNNs by linking the semantic concepts to units or layers. For instance, Activation Maximization (Erhan et al., 2009; Nguyen et al., 2019) optimizes a random input image to maximize a specific unit, and then assigns a concept to this unit by observing what the optimized input image looks like. Network Dissection (Bau et al., 2017, 2019) directly aligns individual hidden units with a set of semantic concepts by using pre-defined pixel-level labels. Concept Activation Vector (Kim et al., 2018) interprets the internal layers of DNNs in terms of human-friendly concepts by learning a normal vector of a plane that differentiates the selected concept from other concepts. Different from the above works, we focus on linking the image-level labeled concepts to the hierarchically disentangled sub-architectures.
The key idea is to align the features to the standard Gaussian distribution and then attribute the concepts to the features (Tishby et al., 2000; Hu et al., 2019). Instead of disentangling the representation, we focus on disentangling the network itself, decomposing the whole architecture of a pre-trained DNN in terms of the semantic concepts, to interpret the working mechanism.
Dynamic Routings of DNNs. Dynamic routings focus on studying how to select a routing for each input during inference (Bengio et al., 2016, 2013; Bolukbasi et al., 2017; Li et al., 2019). For instance, Distillation Guided Routing (Wang et al., 2018) uses dynamic routings to interpret the neural networks, which finds paths for each individual input, and interprets DNNs by comparing intra-class samples. In contrast, our work disentangles the architecture with different semantic concepts, and interprets DNNs according to the semantic difference between inter-class concepts.
3 Neural Architecture Disentanglement
Neural Architecture Disentanglement (NAD) aims to decompose a pre-trained DNN into a set of sub-architectures consisting of feature paths with corresponding semantic concepts, which further provides a metric for quantifying the interpretability of DNNs. In this section, we first define the problem formulation of NAD. Then, we provide our solution to find the concept feature paths for NAD. In the next section, we define the metric to quantify the interpretability of different DNN models according to NAD.
3.1 Problem Formulation
A pre-trained neural network can generally be subdivided into two parts, i.e., the feature extractor and the classifier. We focus on the information flow in the feature extractor. (In this paper, we leave aside the decision-making process in the classifier, which focuses instead on classifying the features from the feature extractor.) As the labels are the most straightforward reflection of human cognition of images, we adopt the image labels as semantic concepts for network disentanglement. Let $c$ denote the concept to be studied; the image $x^c$ is sampled from the set $X^c$ with the identical concept $c$. We define the blocks in the feature extractor as functions $\{f_1, f_2, \dots, f_L\}$ from the bottom layer to the top layer, and $f_{cls}$ as the classifier. For the input $x^c$, we can obtain the original activations of concept $c$ as:
$$a_l^c = f_l(a_{l-1}^c), \quad l = 1, \dots, L, \quad a_0^c = x^c, \qquad y^c = f_{cls}(a_L^c), \tag{1}$$
where $\{a_1^c, \dots, a_L^c\}$ denote the activations of the blocks, and $y^c$ denotes the output of the classifier. The corresponding feature path for $c$ is defined as a set of binary vectors $P^c = \{p_1^c, p_2^c, \dots, p_L^c\}$. The activations produced by this path are:
$$\tilde{a}_l^c = p_l^c \odot f_l(\tilde{a}_{l-1}^c), \quad l = 1, \dots, L, \quad \tilde{a}_0^c = x^c, \qquad \tilde{y}^c = f_{cls}(\tilde{a}_L^c), \tag{2}$$
where $\odot$ is the unit-wise product, $\{\tilde{a}_1^c, \dots, \tilde{a}_L^c\}$ denote the selected activations, and $\tilde{y}^c$ denotes the classification output of the concept path $P^c$. Our target is to find the sparsest $P^c$ that keeps the inner activations and the classification results as close as possible to their original outputs. Note that the constraints on the inner activations are necessary, as combinations of the top layers’ units alone are enough to produce the desired classification results. In other words, the optimized $p_l^c$ in the bottom layers can all be $1$ without these inner constraints, as the classification results can be maintained using only partial units in the top layers. We form the objective for finding the concept path of $c$ as:
$$\min_{P^c} \; \sum_{l=1}^{L} D(a_l^c, \tilde{a}_l^c) + D(y^c, \tilde{y}^c) \quad \text{s.t.} \quad \|P^c\|_0 \le \epsilon \, |P^c|, \tag{3}$$
where $D(\cdot, \cdot)$ is the metric for evaluating the difference (e.g., L1 distance and L2 distance), $\|P^c\|_0$ is the $\ell_0$-norm of the set $P^c$, $\epsilon$ is the threshold for adjusting the sparsity of the path, and $|P^c|$ is the cardinality of the set $P^c$. Such selected activations form a feature path for concept $c$ with their corresponding units. Fig. 2 explains the generation mechanism of the concept feature path.
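The formulation above can be made concrete with a toy sketch. This is not the paper's implementation: the two blocks are hypothetical element-wise functions standing in for real convolutional blocks, the classifier is a plain sum, the L1 distance is chosen as the difference metric, and the names `forward` and `path_objective` are made up for illustration. The sketch runs the network once without a path and once with binary path vectors, then reports the two quantities that the objective trades off: the difference to the original outputs and the fraction of units kept.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]
Block = Callable[[Vector], Vector]


def forward(x: Vector, blocks: List[Block], classifier: Callable[[Vector], float],
            path: Optional[List[List[int]]] = None) -> Tuple[List[Vector], float]:
    """Run x through the blocks; if a path is given, mask each block's output."""
    activations = []
    a = list(x)
    for l, block in enumerate(blocks):
        a = block(a)
        if path is not None:
            # A 0 in the binary vector switches the corresponding unit off.
            a = [p * v for p, v in zip(path[l], a)]
        activations.append(a)
    return activations, classifier(a)


def l1(u: Vector, v: Vector) -> float:
    return sum(abs(p - q) for p, q in zip(u, v))


def path_objective(x, blocks, classifier, path) -> Tuple[float, float]:
    """Return (difference to the original outputs, fraction of units kept)."""
    acts, y = forward(x, blocks, classifier)
    sel_acts, sel_y = forward(x, blocks, classifier, path)
    difference = sum(l1(a, s) for a, s in zip(acts, sel_acts)) + abs(y - sel_y)
    kept = sum(sum(p) for p in path)
    total = sum(len(p) for p in path)
    return difference, kept / total


# Hypothetical toy network: two element-wise blocks and a sum "classifier".
toy_blocks = [
    lambda a: [2.0 * v for v in a],
    lambda a: [max(0.0, v - 1.0) for v in a],
]
toy_classifier = sum
x = [1.0, 0.0, 3.0]

full_path = [[1, 1, 1], [1, 1, 1]]
sparse_path = [[1, 0, 1], [1, 0, 1]]    # drops a unit that is inactive anyway
too_sparse_path = [[0, 0, 1], [0, 0, 1]]

print(path_objective(x, toy_blocks, toy_classifier, full_path))        # (0.0, 1.0)
print(path_objective(x, toy_blocks, toy_classifier, sparse_path))      # difference 0.0, 4 of 6 units
print(path_objective(x, toy_blocks, toy_classifier, too_sparse_path))  # difference 4.0, 2 of 6 units
```

In this toy case the second path is the preferable one: it reproduces all inner activations and the output exactly while using fewer units, whereas the third path is sparser but no longer matches the original activations.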
3.2 Finding Concept Feature Paths
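In this subsection, we provide our solution to find such paths by optimizing Eq. 3. Specifically, we first soften the binary values of the path $P^c = \{p_1^c, \dots, p_L^c\}$ to continuous values $\Lambda^c = \{\lambda_1^c, \lambda_2^c, \dots, \lambda_L^c\}$. By using the Sigmoid function, we then constrain these values to $(0, 1)$. The L2 distance is used to measure the difference between the original outputs ($a_l^c$, $y^c$) and the operated outputs ($\hat{a}_l^c$, $\hat{y}^c$). Therefore, the cost function constraining the difference is:
$$\mathcal{L}_{d} = \sum_{l=1}^{L} \| a_l^c - \hat{a}_l^c \|_2 + \| y^c - \hat{y}^c \|_2, \tag{4}$$
where the selected outputs are:
$$\hat{a}_l^c = \sigma(\lambda_l^c) \odot f_l(\hat{a}_{l-1}^c), \quad \hat{a}_0^c = x^c, \qquad \hat{y}^c = f_{cls}(\hat{a}_L^c), \tag{5}$$
and $\sigma$ is the Sigmoid function.
We regularize $\Lambda^c$ directly by their values activated after the Sigmoid function to enforce the sparsity:
$$\mathcal{L}_{s} = \sum_{l=1}^{L} \| \sigma(\lambda_l^c) \|_1. \tag{6}$$
We do not fix the threshold $\epsilon$, but instead use a hyper-parameter $\gamma$ to balance the sparsity and accuracy automatically. Therefore, the final cost function is:
$$\mathcal{L} = \mathcal{L}_{d} + \gamma \, \mathcal{L}_{s}. \tag{7}$$
To optimize Eq. 7 and find the appropriate path for concept $c$, we adopt stochastic gradient descent to update the continuous values. L2 normalization is applied to the gradient to accelerate convergence.
Therefore, we update every $\lambda_l^c$ by:
$$\lambda_l^c \leftarrow \lambda_l^c - \eta \, \frac{\nabla_{\lambda_l^c} \mathcal{L}}{\| \nabla_{\lambda_l^c} \mathcal{L} \|_2}, \tag{8}$$
where $\eta$ is the learning rate. The parameters of pre-trained networks are fixed in the training phase. After optimization, we discretize the continuous set $\Lambda^c$ back to the binary set $P^c$ by:
$$p_{l,i}^c = \begin{cases} 1, & \sigma(\lambda_{l,i}^c) > 0.5, \\ 0, & \text{otherwise}, \end{cases} \tag{9}$$
where $\lambda_l^c$ represents the $l$-th vector in the set, and $\lambda_{l,i}^c$ denotes the $i$-th value of this vector. After obtaining feature paths for all the concepts, we have naturally disentangled the network. Alg. 1 summarizes the overall procedure.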
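The optimization loop described above can be sketched on a deliberately small case. Everything specific here is an assumption made for illustration: there is a single block whose activations are a fixed vector, the difference term uses the squared L2 distance so that the gradient has a simple closed form, the soft values start at zero rather than from a Gaussian so the run is reproducible, and `find_path` with its default `gamma`, `lr` and `steps` is a hypothetical helper, not the authors' code. The sketch keeps the three ingredients named in the text: Sigmoid-softened masks, an L2-normalized gradient step, and a final discretization.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def find_path(activations: List[float], gamma: float = 0.5,
              lr: float = 0.1, steps: int = 200) -> Tuple[List[float], List[int]]:
    """Toy single-block case: the masked output of unit i is sigmoid(lam_i) * a_i.

    cost = sum_i (a_i - s_i * a_i)^2 + gamma * sum_i s_i,  with s_i = sigmoid(lam_i)
    """
    lam = [0.0] * len(activations)  # deterministic start (assumption)
    for _ in range(steps):
        grad = []
        for a, z in zip(activations, lam):
            s = sigmoid(z)
            d_cost_d_s = -2.0 * a * a * (1.0 - s) + gamma  # difference term + sparsity term
            grad.append(d_cost_d_s * s * (1.0 - s))        # chain rule through the Sigmoid
        # L2-normalize the gradient so every step has length lr.
        norm = math.sqrt(sum(g * g for g in grad)) or 1.0
        lam = [z - lr * g / norm for z, g in zip(lam, grad)]
    # Discretize the soft mask back to a binary path vector.
    mask = [1 if sigmoid(z) > 0.5 else 0 for z in lam]
    return lam, mask


# Units with strong activations are worth keeping; weak ones are pruned by the sparsity term.
soft_values, binary_path = find_path([3.0, 0.1, 2.0, 0.05])
print(binary_path)  # [1, 0, 1, 0]
```

With `gamma = 0.5`, dropping a unit whose activation is 3.0 or 2.0 costs far more in the difference term than it saves in the sparsity term, so those units stay on; for activations of 0.1 and 0.05 the balance is reversed and the units are switched off.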
4 Interpretability Evaluation for DNNs
Disentangling the network into concept feature paths enables us to measure the similarity between different semantic concepts. This similarity further provides a natural metric for evaluating the interpretability of DNNs. Given two concepts $c_i$ and $c_j$ and their feature paths $P^{c_i} = \{p_1^{c_i}, \dots, p_L^{c_i}\}$ and $P^{c_j} = \{p_1^{c_j}, \dots, p_L^{c_j}\}$, we calculate the average Jaccard similarity coefficient over the elements of $P^{c_i}$ and $P^{c_j}$ to measure the similarity between concepts $c_i$ and $c_j$. This coefficient is defined as the size of the intersection divided by the size of the union of the sets. For example, the coefficient between sets $A$ and $B$ is defined as:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \tag{10}$$
where $|\cdot|$ denotes the cardinality of a set. Then, we can define the similarity between the concepts $c_i$ and $c_j$ with paths $P^{c_i}$ and $P^{c_j}$ as:
$$S(c_i, c_j) = \frac{1}{L} \sum_{l=1}^{L} J(I_l^{c_i}, I_l^{c_j}), \tag{11}$$
where $I_l^{c_i}$ and $I_l^{c_j}$ denote the indices of elements that are equal to $1$ in $p_l^{c_i}$ and $p_l^{c_j}$. Then, we introduce the following hypothesis to define the metric for evaluating the interpretability of different DNNs.
Interpretability Hypothesis: If one network can better formulate the semantic diversity among different concepts, this network is more interpretable.
According to this hypothesis, we define the metric as the variance of the similarities between concepts, where a larger variance indicates higher interpretability. This metric effectively quantifies the interpretability in terms of semantic diversity in networks. For example, intuitively, the similarity between the concepts ‘hen’ and ‘cock’ should be high, while the similarity between the concepts ‘hen’ and ‘airplane’ should be low. Such semantic diversity can be well depicted by the variance of the similarity. Therefore, the metric for evaluating the interpretability of a given network is defined (Eq. 12) as the variance of the similarities over all pairs of concepts. The definition also involves the path classification accuracies of the two concepts in each pair, and the mean is taken over the similarities of all concepts.
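The similarity and the variance-based metric of this section can be sketched as follows. This is a simplified illustration under stated assumptions: a path is represented as a list of binary lists (one per block), two empty index sets are treated as identical (similarity 1.0), and the accuracy terms that the metric also involves are left out, so `similarity_variance` is only the plain population variance of the pairwise similarities. The function names and the three toy paths are hypothetical.

```python
from itertools import combinations
from statistics import pvariance
from typing import Dict, List, Set

Path = List[List[int]]  # one binary vector per block


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    if not union:
        return 1.0  # assumption: two empty selections count as identical
    return len(a & b) / len(union)


def path_similarity(path_a: Path, path_b: Path) -> float:
    """Average Jaccard coefficient over blocks, computed on the selected unit indices."""
    scores = []
    for mask_a, mask_b in zip(path_a, path_b):
        idx_a = {i for i, v in enumerate(mask_a) if v == 1}
        idx_b = {i for i, v in enumerate(mask_b) if v == 1}
        scores.append(jaccard(idx_a, idx_b))
    return sum(scores) / len(scores)


def similarity_variance(paths: Dict[str, Path]) -> float:
    """Unweighted variance of the similarities over all concept pairs."""
    sims = [path_similarity(paths[a], paths[b])
            for a, b in combinations(sorted(paths), 2)]
    return pvariance(sims)


toy_paths = {
    "hen":      [[1, 1, 0, 0], [1, 1, 0]],
    "cock":     [[1, 1, 0, 0], [1, 0, 0]],
    "airplane": [[0, 0, 1, 1], [0, 0, 1]],
}

print(path_similarity(toy_paths["hen"], toy_paths["cock"]))      # 0.75
print(path_similarity(toy_paths["hen"], toy_paths["airplane"]))  # 0.0
print(similarity_variance(toy_paths))                            # 0.125
```

A network whose paths gave every pair of concepts the same similarity would score a variance of zero here, which is the low-interpretability extreme under the hypothesis above.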
5 Experiments
In this section, we first introduce the experimental settings. Then, we check the validity of the optimized concept feature paths, and study the character of neural architecture disentanglement with the learned concept feature paths. After that, we show examples of the concept information flows according to the concept feature paths in DNNs. Finally, we study the interpretability versus discriminability of DNNs from the viewpoint of neural architecture disentanglement, and we investigate the relationship between neural concepts and human intuitions across different architectures.
5.1 Experimental Settings
Datasets. We conduct experiments on both the object-based dataset, i.e., ImageNet (Deng et al., 2009), and the scene-based dataset, i.e., Place365 (Zhou et al., 2017). ImageNet is an image dataset organized according to the WordNet hierarchy (Miller, 1998), in which each node of the hierarchy is depicted by hundreds of images. Place365 contains images comprising various unique scene categories, whose labels represent the entry-level of an environment. The hierarchical semantic categories of the above datasets make them very suitable for quantifying the interpretability of DNNs. We disentangle the networks into the concept feature paths using the training set, and use the validation set to study the properties of neural architecture disentanglement and the network interpretability.
Network Architectures. We select four representative architectures, i.e., VGG16 (Simonyan & Zisserman, 2014), ResNet50 (He et al., 2016), DenseNet121 (Huang et al., 2017) and DARTS-Net (Liu et al., 2018), for our experiments. In these architectures, the connection types include the direct connection of VGG, the skip connection of ResNet, the dense skip connection of DenseNet, and the automatically learned connection of DARTS-Net. The operations include the residual operation and the concat operation. Intuitively, the interpretability of these networks would be expected to get worse as their connections become more complex. However, in Sec. 5.4 we find some counter-intuitive results that do not support this argument.
Experimental Details. We use the models pre-trained on ImageNet and Place365 with PyTorch (Paszke et al., 2019), and fix their parameters when training the concept feature paths. The architecture of DARTS-Net is searched on ImageNet. The paths are initialized with random values from the standard Gaussian distribution. We set the learning rate for VGG, ResNet, DenseNet and DARTS-Net on ImageNet, and on Place365 empirically. Adjusting the hyper-parameter to balance the accuracy and sparsity is trivial, and we manually adjust it to on ImageNet, and on Place365, which effectively balance the accuracy and sparsity. The reported values are averaged results of paths with different initializations.
5.2 Validity of Concept Feature Paths in DNNs
We verify the validity of concept feature paths in neural architectures by checking the concept classification accuracy. Specifically, we gather the images from the validation set with the grouped labels, and calculate the concept precision for the same class by running through the corresponding concept feature paths of the disentangled networks. We discretize the Top@1 concept recognition precision (%) into the bins and count the number of classes whose precision is located in the above bins. From the results in Fig. 3, we find that the concepts are classified more precisely via the concept feature paths compared with the original network architectures, which suggests that the semantic concepts are well linked to the concept feature paths. For example, the concept precision distribution of the original VGG16 on Place365 is mainly found in range . Through the concept feature paths, the concept precision distribution of the disentangled VGG16 is squashed into the range .
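The binning step described above can be sketched as a small histogram helper. The bin boundaries are an assumption made for illustration (ten equal-width bins of 10 percentage points each, with 100% counted in the last bin), as are the function name `precision_histogram` and the example precisions; none of these values come from the experiments.

```python
from typing import Dict, List


def precision_histogram(precisions: Dict[str, float], bin_width: int = 10) -> List[int]:
    """Count how many classes fall into each precision bin.

    precisions maps a class name to its Top@1 precision in percent (0 to 100).
    bin_width must divide 100; bins are [0, w), [w, 2w), ..., with 100 in the last bin.
    """
    if 100 % bin_width != 0:
        raise ValueError("bin_width must divide 100")
    n_bins = 100 // bin_width
    counts = [0] * n_bins
    for p in precisions.values():
        index = min(int(p // bin_width), n_bins - 1)  # a precision of 100 goes to the last bin
        counts[index] += 1
    return counts


# Hypothetical per-class precisions, for illustration only.
example = {"hen": 95.0, "cock": 100.0, "airplane": 72.5, "pier": 8.0}
print(precision_histogram(example))  # [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
```

Computing this histogram once for the original network and once for the disentangled paths gives the two distributions that are compared per architecture.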
5.3 Concept Information Flows in DNNs
We study the information flows of concepts in DNNs, and investigate the properties of neural architecture disentanglement through a horizontal comparison within the same architecture and a vertical comparison across different architectures. We first show the unit usage rate of different architectures. The unit usage rate is defined as the average number of selected units for different concepts divided by the number of their corresponding channels. Fig. 4 shows the results of the unit usage rate. From the horizontal comparison within the same architecture, we see that the unit usage rate generally decreases, which intuitively verifies the common statement that the bottom layers of DNNs share low-level patterns and the top layers of DNNs form the high-level semantic information. However, from the vertical comparison across different architectures, we find that the overall unit usage rate roughly decreases in the order of VGG, DARTS-Net, DenseNet, and ResNet, which suggests that the semantic information is gradually compressed into fewer units. To explain this interesting insight, we think that the residual operation entangles information more severely than the concat operation through the skip connection over layers, thus learning features that highly entangle semantic information for classification. We discuss this in detail in Sec. 5.4.
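The unit usage rate defined above is straightforward to compute from the binary paths. The sketch below assumes each concept's path is stored as a list of binary lists, one per block, and that all concepts share the same block sizes; the name `unit_usage_rate` and the two toy paths are hypothetical.

```python
from typing import Dict, List


def unit_usage_rate(paths: Dict[str, List[List[int]]]) -> List[float]:
    """Per block: average number of selected units over concepts / number of channels."""
    concepts = list(paths)
    n_blocks = len(paths[concepts[0]])
    rates = []
    for l in range(n_blocks):
        n_channels = len(paths[concepts[0]][l])
        mean_selected = sum(sum(paths[c][l]) for c in concepts) / len(concepts)
        rates.append(mean_selected / n_channels)
    return rates


# Two toy concepts, two blocks with four channels each.
toy = {
    "hen":      [[1, 1, 1, 0], [1, 0, 0, 0]],
    "airplane": [[1, 1, 0, 0], [0, 0, 1, 0]],
}
print(unit_usage_rate(toy))  # [0.625, 0.25]
```

In this toy example the rate drops from the first block to the second, which is the within-architecture pattern that the horizontal comparison looks at.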
We further use VGG16 as an example to visualize the concept information flows by the activations. Specifically, we combine different animals or scenes into one input image, and visualize the activations from the bottom layers to the top layers. Fig. 1 shows examples of inputs and their activations from the last block of the disentangled paths. In Fig. 5, we show the activations from the bottom layers to the top layers of the original VGG and of the disentangled paths, from which we clearly see the information flow of semantic concepts.
Generally, VGG16 first detects edges and colors, and then the small pieces are grouped into concrete patterns. The concrete patterns related to the specified concept are selected last. For example, the feature path of ‘panda’ first detects the round and straight edges and the red, white and green colors in blocks 1 to 4; blocks 5 to 9 then gradually group the small pieces into concrete patterns such as ‘eyes of panda’ and ‘head of cock’; and the final blocks 10 to 13 select the patterns which are most related to the concept ‘panda’. After this, the patterns are sent to the classifier to finish the decision-making process.
5.4 Interpretability versus Discriminability of DNNs
We study the interpretability versus discriminability of DNNs from the viewpoint of neural architecture disentanglement in this sub-section. Specifically, we compute the interpretability by Eq. 10, Eq. 11 and Eq. 12 after disentanglement, and compute the classification accuracy by the original architectures of VGG16, ResNet50, DenseNet121 and DARTS-Net on ImageNet and Place365. From the quantitative results in Fig. 6, we find that VGG16 is the most interpretable architecture among the compared architectures, which is intuitive as VGG16 is the simplest architecture.
The interesting results come from the interpretability of ResNet50, DenseNet121 and DARTS-Net. Intuitively, the more complex connections between layers should make the architecture less interpretable, let alone the automatically-learned connections. However, we find ResNet50 has the worst interpretability among those architectures, and the automatically-learned DARTS-Net has comparable, or even better, results than DenseNet121.
By checking the architectures, we find the outputs of blocks are produced by the residual operation in ResNet, and concat operations in DenseNet121 and DARTS-Net. Therefore, we believe that the residual operation entangles semantic information more severely than the concat operation through the skip connections over layers. Specifically, the concat operation learns features that better depict the similarity and diversity with different concepts, while the residual operation somehow entangles such information. On one hand, we think that the model capacity is greatly enlarged by the residual operation, which might make the model more likely to overfit the training data. On the other hand, the residual operation adds the outputs of layers while the concat operation concatenates them, which could intensify information entanglement and impede the information flow in the network. Based on the above analysis, we conclude that the residual operation hinders the disentanglement of neural architectures, thus affecting their interpretability.
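The structural difference between the two operations can be shown in a few lines. This sketch illustrates only how a binary mask over output units interacts with addition versus concatenation; it does not reproduce or test the empirical finding above. The helper names and the feature values are hypothetical.

```python
from typing import List


def residual(skip: List[float], new: List[float]) -> List[float]:
    """Residual operation: element-wise sum, the output has as many units as each input."""
    return [s + n for s, n in zip(skip, new)]


def concat(skip: List[float], new: List[float]) -> List[float]:
    """Concat operation: the output keeps skip features and new features as separate units."""
    return list(skip) + list(new)


def apply_mask(units: List[float], mask: List[int]) -> List[float]:
    return [m * u for m, u in zip(mask, units)]


skip_features = [1.0, 2.0]
new_features = [0.5, 1.0]

summed = residual(skip_features, new_features)
joined = concat(skip_features, new_features)
print(summed)  # [1.5, 3.0]           two units, each mixing skip and new features
print(joined)  # [1.0, 2.0, 0.5, 1.0] four units, sources still separate

# With concat, a path can keep the skip features and drop the new ones.
print(apply_mask(joined, [1, 1, 0, 0]))  # [1.0, 2.0, 0.0, 0.0]
# With the residual sum, a mask can only keep or drop the mixture as a whole.
print(apply_mask(summed, [1, 0]))        # [1.5, 0.0]
```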
We further verify such findings by comparing the relationship between the concepts learned by neural networks and human perception. Concepts in human perception are often organized according to similar patterns caught by our visual systems, which is significant for us to understand the working mechanism of brains in cognitive science. For example, we may cluster the ‘white wolf’ with ‘white fox’ as the same species by their similar patterns such as white fur and pointed ears.
To study whether neural networks have this property, we define the neural concepts of DNNs by the learned concept feature paths. By Eq. 11, we can compute the similarity between any two neural concepts. Therefore, we select classes grouped by six primary human concepts manually, and visualize the cluster heatmaps of the path similarity for ImageNet and Place365, respectively, in Fig. 7. From the cluster results, we find that the clustering of similar concepts is gradually destroyed from VGG16, DARTS-Net, DenseNet121, to ResNet50. Additionally, from the heatmaps, we find that the well summarized relationship indicating the similarity of neural concepts (e.g., the concept ‘white wolf’ is similar to ‘white fox’ but is dissimilar to ‘grass snake’ on ImageNet, and the concept ‘hotel outdoor’ is similar to ‘inn outdoor’ but dissimilar to ‘waterfall’ on Place365) is gradually broken from VGG16, DARTS-Net, DenseNet121, to ResNet50, and the relationship between neural concepts becomes out-of-order with consistent similarity. These results support our findings on the architecture interpretability.
6 Conclusion
In this paper, we present a novel concept, termed Neural Architecture Disentanglement (NAD), for better understanding of DNNs’ interpretability. Differing from the current line of research, which links semantic concepts to a DNN’s single unit or single layer, we prefer to capture the hierarchical inference procedure throughout the network. Specifically, a pre-trained network is disentangled according to specific concepts, forming the concept feature paths which capture the concept flows from the bottom to top layers of a DNN. This procedure also enables the evaluation of the interpretability of neural architectures. In experiments, we study the interpretability of DNNs on object-based and scene-based datasets (i.e., ImageNet and Place365) with four types of representative architectures ranging from handcrafted (i.e., VGG, ResNet and DenseNet) to autoML-based (i.e., DARTS-Net). Experimental results suggest that the connection complexity is not crucial for the neural network interpretability, while the basic operations, such as the residual operation and the concat operation, should be considered instead. The architecture disentanglement not only sheds important light on the information flow of semantic concepts in DNNs, but also provides a fundamental metric that facilitates the design of explainable network architectures.
References
- Bau et al. (2017) Bau, D., Zhou, B., Khosla, A., Oliva, A., and Torralba, A. Network dissection: Quantifying interpretability of deep visual representations. In
- Bau et al. (2019) Bau, D., Zhu, J.-Y., Strobelt, H., Zhou, B., Tenenbaum, J. B., Freeman, W. T., and Torralba, A. Gan dissection: Visualizing and understanding generative adversarial networks. In International Conference on Learning Representations, 2019.
- Bengio et al. (2016) Bengio, E., Bacon, P.-L., Pineau, J., and Precup, D. Conditional computation in neural networks for faster models. In International Conference on Learning Representations, 2016.
- Bengio et al. (2013) Bengio, Y., Léonard, N., and Courville, A. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013.
- Bolukbasi et al. (2017) Bolukbasi, T., Wang, J., Dekel, O., and Saligrama, V. Adaptive neural networks for efficient inference. In International Conference on Machine Learning, 2017.
- Carvalho et al. (2019) Carvalho, D. V., Pereira, E. M., and Cardoso, J. S. Machine learning interpretability: A survey on methods and metrics. Electronics, 2019.
- Deng et al. (2009) Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009.
- Erhan et al. (2009) Erhan, D., Bengio, Y., Courville, A., and Vincent, P. Visualizing higher-layer features of a deep network. University of Montreal, 2009.
- He et al. (2016) He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
- Higgins et al. (2017) Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., Mohamed, S., and Lerchner, A. beta-vae: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations, 2017.
- Hu et al. (2019) Hu, J., Ji, R., Zhang, S., Sun, X., Ye, Q., Lin, C.-W., and Tian, Q. Information competing process for learning diversified representations. In Advances in Neural Information Processing Systems, 2019.
- Huang et al. (2017) Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K. Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
- Kim et al. (2018) Kim, B., Wattenberg, M., Gilmer, J., Cai, C., Wexler, J., Viegas, F., and Sayres, R. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International Conference on Machine Learning, 2018.
- Li et al. (2019) Li, Y., Ji, R., Lin, S., Zhang, B., Yan, C., Wu, Y., Huang, F., and Shao, L. Dynamic neural network decoupling. arXiv preprint arXiv:1906.01166, 2019.
- Liu et al. (2018) Liu, H., Simonyan, K., and Yang, Y. Darts: Differentiable architecture search. In International Conference on Learning Representations, 2018.
- Miller (1998) Miller, G. A. WordNet: An electronic lexical database. MIT press, 1998.
- Morcos et al. (2018) Morcos, A. S., Barrett, D. G., Rabinowitz, N. C., and Botvinick, M. On the importance of single directions for generalization. In International Conference on Learning Representations, 2018.
- Nguyen et al. (2019) Nguyen, A., Yosinski, J., and Clune, J. Understanding neural networks via feature visualization: A survey. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. 2019.
- Paszke et al. (2019) Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, 2019.
- Rudin (2019) Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 2019.
- Simonyan & Zisserman (2014) Simonyan, K. and Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Tishby et al. (2000) Tishby, N., Pereira, F. C., and Bialek, W. The information bottleneck method. arXiv preprint physics/0004057, 2000.
- Wang et al. (2018) Wang, Y., Su, H., Zhang, B., and Hu, X. Interpret neural networks by identifying critical data routing paths. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
- Zhang et al. (2018) Zhang, Q., Nian Wu, Y., and Zhu, S.-C. Interpretable convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
- Zhou et al. (2017) Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., and Torralba, A. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
- Zhou et al. (2018) Zhou, B., Sun, Y., Bau, D., and Torralba, A. Revisiting the importance of individual units in cnns via ablation. arXiv preprint arXiv:1806.02891, 2018.