Deep neural networks (DNNs) have shown promise in many tasks of artificial intelligence. However, mathematical tools are still lacking for diagnosing feature representations in intermediate layers of a DNN, e.g. discovering flaws in feature representations or identifying reliable and unreliable features. Traditional evaluation of DNNs based on testing accuracy cannot insightfully examine the correctness of a DNN's knowledge representations.
Thus, in this paper, we propose to diagnose feature representations of intermediate layers of a DNN from the perspective of knowledge isomorphism. Given two DNNs pre-trained for the same task, we aim to examine whether intermediate layers of the two DNNs encode isomorphic knowledge. Ideally, if the DNNs are well learned from the same training data, they should converge to similar knowledge representations, regardless of whether they share the same architecture.
Note that this research focuses on the isomorphism of intermediate-layer knowledge between different DNNs, rather than the similarity of features. Here, the "knowledge" is metaphysical: it refers to the concepts modeled/memorized by intermediate layers of a DNN. In comparison, the "feature" refers to the explicit output of a layer. Two DNNs may extract totally different feature maps yet represent similar knowledge. (As a toy example of knowledge isomorphism, a pre-trained DNN can be revised to generate different features but represent isomorphic knowledge: the revised DNN shuffles feature elements in a layer, f' = P f, and shuffles the feature back in the next convolutional layer by using weights W' = W P^T, where P is a permutation matrix.)
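The toy example above can be checked numerically. Below is a minimal numpy sketch; all sizes, weights, and the specific permutation are illustrative.

```python
import numpy as np

# Shuffling a layer's features with a permutation matrix P and compensating in
# the next layer leaves the network's function unchanged: the features differ,
# but the represented knowledge is isomorphic.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((6, 4))
W2 = rng.standard_normal((3, 6))
x = rng.standard_normal(4)
relu = lambda z: np.maximum(z, 0.0)

# Original network: y = W2 relu(W1 x).
f = relu(W1 @ x)
y = W2 @ f

# Revised network: f' = P f, next layer uses W2' = W2 P^T,
# so W2' f' = W2 P^T P f = W2 f (P is orthogonal).
P = np.eye(6)[np.array([2, 0, 5, 1, 3, 4])]   # a fixed permutation matrix
f_shuffled = P @ f
y_revised = (W2 @ P.T) @ f_shuffled

assert np.allclose(y, y_revised)  # different features, identical function
```

The compensation works because a permutation matrix satisfies P^T P = I, so the shuffle cancels exactly in the next linear layer.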
In general, knowledge isomorphism can be understood as follows. Let two DNNs, A and B, be learned for the same task, and let x_A and x_B denote intermediate-layer features of A and B, respectively. Then, x_A and x_B each contain both feature components corresponding to isomorphic knowledge and non-isomorphic feature components. We assume that the isomorphic components can reconstruct each other, i.e. the isomorphic components of x_B can be reconstructed from x_A, and vice versa.
More crucially, knowledge isomorphism between DNNs can be used to diagnose the feature reliability of DNNs. Usually, isomorphic components represent common and reliable knowledge, whereas non-isomorphic components mainly represent unreliable knowledge or noise.
Therefore, in this paper, we propose a generic definition of knowledge isomorphism between two pre-trained DNNs, and we develop a method to disentangle isomorphic feature components from intermediate-layer feature maps of the DNNs. Our method is both task-agnostic and network-agnostic: (1) it does not require any annotations w.r.t. the task for evaluation; (2) it can be broadly applied to different DNNs as a supplement to evaluation by testing accuracy. Experiments verified our assumption, i.e. the disentangled isomorphic feature components are usually more reliable for the task. Thus, disentangling isomorphic features can be used to boost performance.
Furthermore, to enable solid research on knowledge isomorphism, we consider the following issues.
Fuzzy isomorphism at different levels: As shown in Fig. 1, unlike traditional isomorphism problems, the knowledge isomorphism between DNNs needs to be defined at different fuzziness levels, because there is no strict knowledge isomorphism between two DNNs.
Disentanglement & quantification: We need to disentangle, from the chaotic feature map, the feature components that correspond to the isomorphic knowledge at different fuzziness levels, and quantify them. Similarly, we also disentangle and quantify feature components that are non-isomorphic.
To define the isomorphism of a given fuzziness level, we propose a model g for feature reconstruction, which is agnostic to the target DNN. We define x^(k) to represent the components of the feature x_B of DNN B that correspond to isomorphic knowledge w.r.t. the feature x_A at the k-th fuzziness level (or the k-th order), where the order k measures the amount of non-linear operations used in the signal processing of x^(k). x^(k) is also termed the k-order isomorphic feature of x_B w.r.t. x_A.
In this way, the most strict isomorphism is the 0-order isomorphism, i.e. x^(0) can be reconstructed from x_A via a linear transformation. In comparison, some neural activations in the 1-order isomorphic feature x^(1) are not directly represented by x_A and need to be estimated (or guessed) via a non-linear transformation. A smaller order k indicates less guessing involved in the reconstruction, and thus stricter isomorphism. Note that the number of non-linear operations is just a rough approximation of the difficulty of guessing, since there are no standard methods to quantify guessing effects.
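To make the 0-order case concrete, here is a small synthetic numpy example: when one network's features are exactly a linear transform of another's, a bias-free least-squares fit reconstructs them with negligible residual (all data and shapes are toy assumptions).

```python
import numpy as np

# 0-order isomorphism in the strictest sense: features of net B are exactly a
# linear transform of features of net A, so no "guessing" (non-linearity)
# is needed for the reconstruction.
rng = np.random.default_rng(1)
n, d = 200, 10
X_A = rng.standard_normal((n, d))      # n images, d-dim features of net A
T = rng.standard_normal((d, d))
X_B = X_A @ T.T                        # net B's features: a pure linear transform

# Fit a linear map from x_A to x_B and check the relative residual.
W, *_ = np.linalg.lstsq(X_A, X_B, rcond=None)
residual = np.linalg.norm(X_A @ W - X_B) / np.linalg.norm(X_B)
assert residual < 1e-8                 # fully reconstructable: 0-order isomorphic
```

With real DNN features the residual would be non-zero, and the remaining part would require higher-order (non-linear) reconstruction.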
More crucially, we implement the network-agnostic model g as a neural network, where the highest order K is set as the number of non-linear layers in g. As shown in Fig. 2, g is designed to disentangle and quantify isomorphic feature components of different orders between DNNs. Our method can be applied to different types of DNNs and explain the essence of various deep-learning techniques.
- Our method provides a new perspective for explaining the effectiveness of knowledge distillation, i.e. we explore the essential reason why the born-again network BornAgain exhibits superior performance.
- Our method gives an insightful analysis of network compression.
- Our method can be used to diagnose and refine knowledge representations of pre-trained DNNs and to boost performance without any additional annotations for supervision.
Contributions of this study can be summarized as follows. (1) In this study, we focus on a new problem, i.e. the knowledge isomorphism between DNNs. (2) We define the knowledge isomorphism and propose a task-agnostic and model-agnostic method to disentangle and quantify isomorphic features of different orders. (3) Our method can be considered as a mathematical tool to analyze feature reliability of different DNNs. (4) Our method provides a new perspective on explaining existing deep-learning techniques, such as knowledge distillation and network compression.
2 Related work
In spite of the significant discrimination power of DNNs, black-box feature representations of DNNs have been considered an Achilles’ heel for decades. In this section, we will limit our discussion to the literature of explaining or analyzing feature representations of DNNs. In general, previous studies can be roughly classified into the following three types.
Explaining DNNs visually or semantically: First, visualization of DNNs is the most direct way of explaining knowledge hidden inside a DNN, including gradient-based visualization CNNVisualization_1 ; CNNVisualization_2 and inversion-based visualization FeaVisual . Zhou et al. CNNSemanticDeep developed a method to compute the actual image-resolution receptive field of neural activations in a feature map of a convolutional neural network (CNN), which is smaller than the theoretical receptive field based on the filter size. Based on the receptive field, six types of semantics were defined to explain intermediate-layer features of CNNs: objects, parts, scenes, textures, materials, and colors Interpretability ; interpretableDecomposition .
Beyond visualization, some methods diagnose a pre-trained CNN to obtain an insightful understanding of CNN representations. Fong and Vedaldi net2vector analyzed how multiple filters jointly represented a specific semantic concept. Selvaraju et al. visualCNN_grad_2 , Fong et al. visualCNN_grad , and Kindermans et al. patternNet estimated image regions that directly contribute to the network output. LIME trust and SHAP shap assumed a linear relationship between the input and output of a DNN to extract important input units.
Unlike previous studies visualizing visual appearance encoded in a DNN or extracting important pixels, our method disentangles and quantifies the isomorphic components of features between two DNNs. Isomorphic feature components of different orders can be explicitly visualized.
Learning explainable deep models: Compared to post-hoc explanations of DNNs, some studies directly learn more meaningful CNN representations. Previous studies extracted scene semantics CNNSemanticDeep and mined objects ObjectDiscoveryCNN_2 from intermediate layers. In the capsule net capsule , each output dimension of a capsule may encode a specific meaning. Zhang et al. interpretableCNN proposed to learn CNNs with disentangled intermediate-layer representations. The infoGAN infoGAN and the β-VAE betaVAE learned interpretable input codes for generative models.
Mathematical evaluation of the representation capacity: Formulating and evaluating the representation capacity of DNNs is another emerging direction. Novak et al. NetSensitivity proposed generic metrics for the sensitivity of network outputs with respect to network parameters. Zhang et al. NetRethinking discussed the relationship between the parameter number and the generalization capacity of deep neural networks. Arpit et al. NetMemOrNot discussed the representation capacity of neural networks, considering real training data and noises. Yosinski et al. CNNAnalysis_2 evaluated the transferability of filters in intermediate layers. Network-attack methods CNNInfluence ; CNNAnalysis_1 can also be used to evaluate representation robustness by computing adversarial samples. Lakkaraju et al. banditUnknown discovered blind spots of the knowledge encoded by a DNN via human-computer interaction. The study of CNNBias discovered potentially biased representations of a CNN due to dataset bias. NetManifold learned the manifold of network parameters to diagnose DNNs. Recently, the stiffness Stiffness was proposed to evaluate the generalization of DNNs.
The information-bottleneck theory InformationBottleneck ; InformationBottleneck2 provides a generic metric to quantify the information contained in DNNs. The information-bottleneck theory can be extended to evaluate the representation capacity of DNNs IBGeneralization ; InformationPlane . Achille et al. InformationDropout further used the information-bottleneck theory to revise the dropout layer in a DNN.
In comparison, our method diagnoses feature representations from a new perspective of knowledge isomorphism. Our method can be used to refine network features and explain the success of existing deep-learning techniques.
3 Algorithm

In this section, we introduce the network architecture that disentangles feature components of isomorphic knowledge at a certain fuzziness level, when we use the intermediate-layer feature x_A of one DNN to reconstruct the intermediate-layer feature x_B of another DNN (without loss of generality, we assume x_A and x_B have been normalized to zero mean and unit variance). As shown in Fig. 2, the network g with parameters θ has a recursive architecture with K blocks. The function of the k-th block is given as follows.
The output feature x^(k) of the k-th block is computed using both the raw input x_A and the feature x^(k+1) of the higher order. W denotes a linear operation without a bias term, and the last (K-th) block takes only the raw input. This linear operation can be implemented as either a layer in an MLP or a convolutional layer in a CNN. Λ denotes a diagonal matrix for the element-wise variance of the linearly transformed input, which is used to normalize the magnitude of neural activations. Because of the normalization, the scalar value λ roughly controls the information ratio of x_A w.r.t. x^(k+1). x^(0) corresponds to the final output of the network, i.e. g(x_A) = x^(0).
In this way, the entire network g can be separated into K+1 branches (see Fig. 2), where the k-th branch (0 ≤ k ≤ K) contains a total of k non-linear layers. Note that the k-order isomorphic knowledge can also be represented by the k'-th branch of the network, if k' > k.
In order to disentangle isomorphic features of different orders, the k-th branch is supposed to exclusively encode the k-order isomorphic features without representing lower-order isomorphic features. Thus, we propose the following loss to guide the learning process:
Loss(θ) = ||x_B - g(x_A)||^2 + Σ_{k=1}^K w_k ||x^(k)||^2,

where x_A and x_B denote intermediate-layer features of the two pre-trained DNNs. The second term in this loss penalizes neural activations from high-order branches (e.g. with weights w_k increasing with the order k), thereby forcing as much low-order isomorphic knowledge as possible to be represented by low-order branches.
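As an illustration of the reconstruction network and its loss, here is a minimal numpy sketch. The block form (a variance-normalized, bias-free linear transform of the raw input, mixed with the ReLU of the higher-order block's output) and all sizes are assumptions for illustration, and the high-order penalty term is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)
relu = lambda z: np.maximum(z, 0.0)

d, K = 8, 2                                   # toy feature dimension, highest order
x_A = rng.standard_normal(d)                  # feature of DNN A (input of g)
x_B = rng.standard_normal(d)                  # feature of DNN B (target)
Ws = [0.1 * rng.standard_normal((d, d)) for _ in range(K + 1)]  # bias-free linear ops

def g(x_A, Ws, lam=1.0):
    # Blocks run recursively from order K down to 0 (assumed block form).
    h = None
    for W in reversed(Ws):
        z = W @ x_A
        z = lam * z / (np.std(z) + 1e-8)      # scalar stand-in for the diagonal normalization
        h = z if h is None else z + relu(h)   # last block uses the raw input only
    return h

# First loss term: reconstruction error of x_B from x_A.
recon_loss = float(np.sum((x_B - g(x_A, Ws)) ** 2))
assert np.isfinite(recon_loss) and recon_loss >= 0.0
```

In practice the weights of g would be trained by minimizing this loss plus the high-order penalty; here they are random, so only the wiring is shown.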
Furthermore, based on the branches of the network g, we can disentangle the feature representation x_B into additive components:

x_B = Σ_{k=0}^K x^(k) + ε,

where ε indicates feature components that cannot be represented by x_A, and x^(k) denotes the feature components that are exclusively represented by the k-order branch.
Based on the block function defined above, the signal-processing procedure for the k-th feature component passes through the k non-linear (ReLU) layers of the k-th branch (see Fig. 2). Therefore, by rewriting each ReLU as a gating operation, we can represent the exact transformation of the k-th branch as a product of linear operations and gating matrices, so as to disentangle the k-th feature component.
D is a diagonal matrix that represents the gating states of a ReLU layer, i.e. relu(z) = D z. Each element is computed as D_ii = 1 if z_i > 0, and D_ii = 0 otherwise.
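The gating trick makes the additive disentanglement exact. Below is a minimal numpy sketch under an assumed block form h_k = W_k x_A + relu(h_{k+1}) (normalization omitted so the decomposition is exact); the k-order component is the part of the output that passes through exactly k gates.

```python
import numpy as np

rng = np.random.default_rng(3)
d, K = 6, 2
x_A = rng.standard_normal(d)
Ws = [0.5 * rng.standard_normal((d, d)) for _ in range(K + 1)]

# Forward pass, recording each ReLU as a 0/1 diagonal gating matrix D
# (relu(z) = D z, with D_ii = 1 iff z_i > 0).
z = [W @ x_A for W in Ws]
h = z[K]
gates = []
for k in reversed(range(K)):                  # k = K-1, ..., 0
    D = np.diag((h > 0).astype(float))        # gating states of this ReLU layer
    gates.append(D)
    h = z[k] + D @ h                          # identical to z[k] + relu(h)
gates = gates[::-1]                           # gates[k] follows the k-th block

# Disentangled components: the k-order component passes through k gates.
comps = []
for k in range(K + 1):
    M = np.eye(d)
    for j in range(k):                        # product of gates D_0 ... D_{k-1}
        M = M @ gates[j]
    comps.append(M @ z[k])

assert np.allclose(sum(comps), h)             # components sum exactly to g's output
```

Because each D is fixed by the forward pass, every branch becomes a product of linear maps, so the components add up exactly to the network output.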
4 Comparative study
As a generic tool, the proposed network can be used for different applications. We designed various experiments to demonstrate the utility of knowledge isomorphism, including (1) diagnosing and debugging pre-trained DNNs, (2) evaluating the instability of learning DNNs, (3) feature refinement of DNNs, (4) analyzing information discarding during the compression of DNNs, and (5) explaining the effects of knowledge distillation on knowledge representations.
A total of five typical DNNs for image classification were used in our experiments, i.e. the AlexNet CNNImageNet , the VGG-16 VGG , and the ResNet-18, ResNet-34, and ResNet-50 ResNet . These DNNs were learned using three benchmark datasets: the CUB200-2011 dataset CUB200 , the Stanford Dogs dataset StandfordDog , and the Pascal VOC 2012 dataset VOC . Note that both training images and testing images were cropped using object bounding boxes. We set the order K to the same value for all experiments, except for the feature reconstruction of AlexNet features, for which we used a different K, because the shallow model of the AlexNet usually had significant noisy features, which caused considerable non-isomorphic components.
4.1 Network diagnosis based on knowledge isomorphism
The most direct application of knowledge isomorphism is to diagnose representation flaws hidden in DNNs. Suppose two DNNs are pre-trained for the same task, and one DNN significantly outperforms the other. Knowledge isomorphism between the two DNNs can help diagnose representation flaws of the weak DNN, if we assume that the strong DNN has encoded ideal feature representations of the target task. The weak DNN may have the following two types of representation flaws.
Unreliable features: Our method disentangles feature components in the weak DNN, which cannot be reconstructed by features of the strong DNN. These components usually correspond to unreliable features in the weak DNN.
Blind spots: Our method disentangles feature components in the strong DNN, which are non-isomorphic to features of the weak DNN. These components usually reflect blind spots of the knowledge of the weak DNN.
For implementation, we learned DNNs for fine-grained classification using the CUB200-2011 dataset CUB200 (without data augmentation). We considered the AlexNet CNNImageNet as the weak DNN (56.97% top-1 accuracy), and took the ResNet-34 ResNet as the strong DNN (73.09% top-1 accuracy).
Please see Fig. 3. We diagnosed the output feature of the last convolutional layer in the AlexNet, termed x_Alex. Accordingly, we selected the last feature map of the ResNet-34 (denoted by x_Res) for the computation of knowledge isomorphism, because x_Alex and x_Res had similar map sizes. We disentangled and visualized the unreliable components of x_Alex (i.e. components non-isomorphic to x_Res). We also visualized the components disentangled from x_Res (i.e. components non-isomorphic to x_Alex), which corresponded to blind spots of the weak DNN's knowledge.
4.2 Stability of learning
The stability of learning DNNs is of considerable value in deep learning: it examines whether all DNNs represent the same knowledge when people repeatedly learn multiple DNNs for the same task. High knowledge isomorphism between DNNs usually indicates high learning stability.
More specifically, we learned two DNNs, A and B, with the same architecture for the same task. Then, we disentangled non-isomorphic feature components ε_A and ε_B from their features x_A and x_B of a specific layer, respectively. The non-isomorphic feature was quantified by V = Var_{i,I}[ε_i(I)], the variance of the feature elements over different units i of ε and over different input images I, where ε_i(I) denotes the i-th element of ε given the image I. We can use V to measure the instability of learning DNNs A and B.
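The instability metric can be sketched as follows; the random arrays below are toy stand-ins for real disentangled non-isomorphic components.

```python
import numpy as np

# eps[I, i] holds the i-th element of the non-isomorphic component for image I.
rng = np.random.default_rng(4)
eps_A = rng.standard_normal((100, 64))        # 100 images, 64 units
eps_B = 2.0 * rng.standard_normal((100, 64))  # a less stable pair of DNNs

def instability(eps):
    # Variance taken jointly over all units and all input images.
    return float(np.var(eps))

# A larger variance of the non-isomorphic component indicates less stable learning.
assert instability(eps_B) > instability(eps_A)
```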
Case 1, learning DNNs from different initializations using the same training data: For each network architecture, we learned multiple networks using the CUB200-2011 dataset CUB200 . The instability of learning DNNs was reported as the average of the instability metric over all pairs of networks.
Case 2, learning DNNs using different sets of training data: We randomly divided all training samples in the CUB200-2011 dataset CUB200 into two subsets, each containing 50% of the samples. For each network architecture, we learned two DNNs for fine-grained classification (without pre-training). The instability of learning DNNs was reported as the instability metric defined above.
Table 1 compares the instability of learning different DNNs. Table 2 reports the variance of isomorphic components of different orders. We found that the learning of shallow layers in DNNs was usually more stable than the learning of deep layers. The reason may be as follows. A DNN with more layers usually can represent more complex visual patterns, thereby needing more training samples. Without a huge dataset (e.g. the ImageNet dataset CNNImageNet ), a deep network may be more likely to suffer from the over-fitting problem, i.e. DNNs with different initial parameters may learn different knowledge representations.
(Tables 1 and 2 report results for DNNs learned from different initializations and for DNNs learned using different training data, measured at conv4 and conv5 of the AlexNet, conv4-3 and conv5-3 of the VGG-16, and the last convolutional layer of the ResNet-34.)
4.3 Feature refinement based on knowledge isomorphism
Knowledge isomorphism can also be used to refine intermediate-layer features of pre-trained DNNs. Given multiple DNNs pre-trained for the same task, feature components that are isomorphic to various DNNs usually represent common knowledge and are reliable, whereas non-isomorphic feature components w.r.t. other DNNs usually correspond to unreliable knowledge or noise. In this way, intermediate-layer features can be refined by removing non-isomorphic components and exclusively using isomorphic components to accomplish the task.
More specifically, given two pre-trained DNNs, we use the feature x_A of a certain layer in the first DNN to reconstruct the corresponding feature x_B of the second DNN. The reconstructed feature is given as x̂_B = g(x_A). In this way, we can replace the feature x_B of the second DNN with the reconstructed feature x̂_B, and then use x̂_B to learn subsequent layers in the second DNN to boost performance.
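The refinement step can be sketched as follows; the stand-in reconstruction net, the linear "upper layers", and all shapes are illustrative assumptions.

```python
import numpy as np

# Replace the second DNN's feature x_B with its reconstruction g(x_A),
# keep the refined feature fixed, and feed it to the upper layers.
rng = np.random.default_rng(5)
n, d, n_classes = 32, 8, 3
X_A = rng.standard_normal((n, d))             # first DNN's features for n images
W_g = 0.1 * rng.standard_normal((d, d))       # stand-in for a trained reconstruction net g
W_head = rng.standard_normal((n_classes, d))  # second DNN's upper layers (toy linear head)

X_B_hat = X_A @ W_g.T                         # refined (isomorphic) features, kept fixed
logits = X_B_hat @ W_head.T                   # only the upper layers are finetuned

assert logits.shape == (n, n_classes)
```

Keeping X_B_hat fixed mirrors the fair-comparison setting described below: only the subsequent layers are updated, so the gain cannot come from extra trainable parameters in g.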
In experiments, we learned DNNs with various architectures for image classification, including the VGG-16 VGG , the ResNet-18, the ResNet-34, and the ResNet-50 ResNet . We conducted the following two experiments, in which we used knowledge isomorphism to refine DNN features.
Experiment 1, removing unreliable features (noise): For each network architecture, we learned two DNNs using the CUB200-2011 dataset CUB200 with different parameter initializations. Isomorphic components were disentangled from the original feature of a DNN and then used for image classification. As discussed in Section 4.1, isomorphic components can be considered as refined features without noise.
Table 3 reports the increase of classification accuracy obtained by using the refined features. For implementation, we used the refined features as input to finetune the pre-trained upper layers of the DNN for classification (theoretically, we could also further finetune the upper layers of both DNNs during evaluation, but due to the over-fitting problem, both DNNs had already minimized the training loss to almost 0). Note that the isomorphic feature components were fixed during this fine-tuning process to enable a fair comparison: if the classification loss were allowed to change the isomorphic features during fine-tuning, it would be equivalent to adding more layers to the DNN for classification, and we needed to eliminate such effects. Our method slightly boosted the performance.
(Table 3 reports results for VGG-16 conv4-3, VGG-16 conv5-2, ResNet-18, ResNet-34, and ResNet-50.)
Experiment 2, removing redundant features from pre-trained DNNs: A typical deep-learning methodology is to finetune a pre-trained DNN for a specific task. However, if the DNN is pre-trained for multiple tasks (including both the target and other tasks), then feature components pre-trained for other tasks are redundant for the target task and will affect the further finetuning process.
Therefore, we conducted three experiments, in which our method disentangled and removed features redundant w.r.t. the target task from the pre-trained DNN. In the first experiment (namely VOC-animal), we learned two DNNs to classify the 20 object classes in the Pascal VOC 2012 dataset VOC . The goal was to use object images of six animal categories (bird, cat, cow, dog, horse, sheep) to finetune the DNN to classify animals. Let x_A and x_B denote two corresponding intermediate-layer features of the two DNNs. Our method used x_A to reconstruct x_B. Then, the reconstructed result x̂_B = g(x_A) corresponded to reliable features for animals, while non-isomorphic components indicated features of other categories. We used x̂_B to learn the classifier for animals. As in the previous experiment, isomorphic feature components were fixed during the further learning of the animal classifier to enable a fair comparison, which prevented the learning process from benefiting from the additional parameters in g.
In comparison, the baseline method directly used either x_A or x_B to finetune the pre-trained DNN to classify the six animal categories.
In the second experiment (termed Mix-CUB), two original DNNs were learned using both the CUB200-2011 dataset CUB200 and the Stanford Dogs dataset StandfordDog to classify both 200 bird species and 120 dog species. Then, our method disentangled feature components for birds to learn a new fine-grained classifier for birds. The baseline method was implemented following the same setting as in VOC-animal. The third experiment (namely Mix-Dogs) was similar to Mix-CUB. In this experiment, our method disentangled dog features away from bird features to learn a new dog classifier. In all above experiments, original DNNs were learned from scratch without data augmentation.
Table 4 compares the classification accuracy of different methods. It shows that our method significantly alleviated the over-fitting problem and outperformed the baseline.
Table 4. Classification accuracy (%); the left three columns use VGG-16 conv4-3 features and the right three use VGG-16 conv5-2 features.

| Features | VOC-animal | Mix-CUB | Mix-Dogs | VOC-animal | Mix-CUB | Mix-Dogs |
|---|---|---|---|---|---|---|
| Features from the network | 51.55 | 44.44 | 15.15 | 51.55 | 44.44 | 15.15 |
| Features from the network | 50.80 | 45.93 | 15.19 | 50.80 | 45.93 | 15.19 |
| Features from the network | 37.65 | 31.93 | 14.20 | 39.42 | 30.91 | 12.96 |
| Features from the network | 37.22 | 32.02 | 14.28 | 35.95 | 27.74 | 12.46 |
4.4 Analyzing information discarding of network compression
Network compression is an emerging research direction in recent years. Knowledge isomorphism between the compressed network and the original network can evaluate the knowledge discarded during the compression process: people may visualize or analyze the feature components in the original network that are not isomorphic to features in the compressed network, which represent the knowledge discarded by compression.
In experiments, we learned the VGG-16 using the CUB200-2011 dataset CUB200 for fine-grained classification. Then, we compressed the VGG-16 using the method of NetCompress with different pruning thresholds. We used features of the compressed DNN to reconstruct features of the original DNN; the non-isomorphic components disentangled from the original DNN usually corresponded to the knowledge discarded during compression. Fig. 4(left) visualizes the discarded feature components. Furthermore, we used the instability metric (defined in Section 4.2) to quantify the information discarding. Fig. 4 compares the decrease of accuracy with the discarding of feature information.
4.5 Explaining knowledge distillation via knowledge isomorphism
As a generic tool, our method can also explain the success of knowledge distillation. In particular, Furlanello et al. BornAgain proposed a method to gradually refine a neural network via recursive knowledge distillation, i.e. it recursively distills the knowledge of the current net into a new net with the same architecture, and then distills the new net into an even newer net. Each new(er) net is termed a born-again neural network and is learned using both the task loss and the distillation loss. Surprisingly, such a recursive distillation process can substantially boost the performance of the neural network in various experiments.
In general, the net in a new generation both inherits knowledge from the old net and learns new knowledge from the data. The success of the born-again neural network can be explained as that knowledge representations of networks are gradually enriched during the recursive distillation process.
To verify this assertion, in experiments, we learned the VGG-16 using the CUB200-2011 dataset CUB200 for fine-grained classification and learned born-again neural networks for another four generations (because BornAgain did not clarify the distillation loss, we applied the distillation loss of distill , following the parameter settings in Apprentice ). We disentangled feature components in the newest DNN that were not isomorphic to an intermediate DNN. Such non-isomorphic components were considered blind spots of the knowledge representation of the intermediate DNN and were quantified by the instability metric defined in Section 4.2. Fig. 4(right) shows this metric for the DNNs of the 1st, 2nd, 3rd, and 4th generations.
5 Conclusion

In this paper, we have proposed a generic definition of knowledge isomorphism between intermediate layers of two DNNs. A task-agnostic and model-agnostic method is developed to disentangle and quantify isomorphic features of different orders from intermediate-layer features. Isomorphic feature components are usually more reliable than non-isomorphic components for the task, so our method can be used to further refine a pre-trained DNN without any additional supervision. As a mathematical tool, knowledge isomorphism can also help explain existing deep-learning techniques, and experiments have demonstrated the effectiveness of our method.
- (1) A. Achille and S. Soatto. Information dropout: learning optimal representations through noise. In Transactions on PAMI, 40(12):2897–2905, 2018.
- (2) D. Arpit, S. Jastrzebski, N. Ballas, D. Krueger, E. Bengio, M. S. Kanwal, T. Maharaj, A. Fischer, A. Courville, Y. Bengio, and S. Lacoste-Julien. A closer look at memorization in deep networks. In ICML, 2017.
- (3) D. Bau, B. Zhou, A. Khosla, A. Oliva, and A. Torralba. Network dissection: Quantifying interpretability of deep visual representations. In CVPR, 2017.
- (4) X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. In NIPS, 2016.
- (5) H. Cheng, D. Lian, S. Gao, and Y. Geng. Evaluating capability of deep neural networks for image classification via information plane. In ECCV, 2018.
- (6) L. Deutsch. Generating neural networks with neural networks. In arXiv:1801.01952, 2018.
- (7) A. Dosovitskiy and T. Brox. Inverting visual representations with convolutional networks. In CVPR, 2016.
- (8) M. Everingham, S. M. A. Eslami, L. V. Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The pascal visual object classes challenge: A retrospective. In International Journal of Computer Vision (IJCV), 111(1):98–136, 2015.
- (9) R. Fong and A. Vedaldi. Net2vec: Quantifying and explaining how concepts are encoded by filters in deep neural networks. In CVPR, 2018.
- (10) R. C. Fong and A. Vedaldi. Interpretable explanations of black boxes by meaningful perturbation. In ICCV, 2017.
- (11) S. Fort, P. K. Nowak, and S. Narayanan. Stiffness: A new perspective on generalization in neural networks. In arXiv:1901.09491, 2019.
- (12) T. Furlanello, Z. Lipton, M. Tschannen, L. Itti, and A. Anandkumar. Born again neural networks. In ICML, 2018.
- (13) S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. In ICLR, 2016.
- (14) K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
- (15) I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner. β-vae: learning basic visual concepts with a constrained variational framework. In ICLR, 2017.
- (16) G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. In NIPS Workshop, 2014.
- (17) A. Khosla, N. Jayadevaprakash, B. Yao, and L. Fei-Fei. Novel dataset for fine-grained image categorization. In First CVPR Workshop on Fine-Grained Visual Categorization (FGVC), 2011.
- (18) P.-J. Kindermans, K. T. Schütt, M. Alber, K.-R. Müller, D. Erhan, B. Kim, and S. Dähne. Learning how to explain neural networks: Patternnet and patternattribution. In ICLR, 2018.
- (19) P. Koh and P. Liang. Understanding black-box predictions via influence functions. In ICML, 2017.
- (20) A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
- (21) H. Lakkaraju, E. Kamar, R. Caruana, and E. Horvitz. Identifying unknown unknowns in the open world: Representations and policies for guided exploration. In AAAI, 2017.
- (22) S. M. Lundberg and S.-I. Lee. A unified approach to interpreting model predictions. In NIPS, 2017.
- (23) A. Mahendran and A. Vedaldi. Understanding deep image representations by inverting them. In CVPR, 2015.
- (24) A. K. Mishra and D. Marr. Apprentice: Using knowledge distillation techniques to improve low-precision network accuracy. In ICLR, 2018.
- (25) R. Novak, Y. Bahri, D. A. Abolafia, J. Pennington, and J. Sohl-Dickstein. Sensitivity and generalization in neural networks: An empirical study. In ICLR, 2018.
- (26) M. T. Ribeiro, S. Singh, and C. Guestrin. “why should i trust you?” explaining the predictions of any classifier. In KDD, 2016.
- (27) S. Sabour, N. Frosst, and G. E. Hinton. Dynamic routing between capsules. In NIPS, 2017.
- (28) R. Shwartz-Ziv and N. Tishby. Opening the black box of deep neural networks via information. In arXiv:1703.00810, 2017.
- (29) R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In ICCV, 2017.
- (30) M. Simon and E. Rodner. Neural activation constellations: Unsupervised part model discovery with convolutional networks. In ICCV, 2015.
- (31) K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
- (32) C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In arXiv:1312.6199v4, 2014.
- (33) C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The caltech-ucsd birds-200-2011 dataset. Technical report, California Institute of Technology, 2011.
- (34) N. Wolchover. New theory cracks open the black box of deep learning. In Quanta Magazine, 2017.
- (35) A. Xu and M. Raginsky. Information-theoretic analysis of generalization capability of learning algorithms. In NIPS, 2017.
- (36) J. Yosinski, J. Clune, Y. Bengio, and H. Lipson. How transferable are features in deep neural networks? In NIPS, 2014.
- (37) M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In ECCV, 2014.
- (38) C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals. Understanding deep learning requires rethinking generalization. In ICLR, 2017.
- (39) Q. Zhang, W. Wang, and S.-C. Zhu. Examining cnn representations with respect to dataset bias. In AAAI, 2018.
- (40) Q. Zhang, Y. N. Wu, and S.-C. Zhu. Interpretable convolutional neural networks. In CVPR, 2018.
- (41) B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Object detectors emerge in deep scene cnns. In ICLR, 2015.
- (42) B. Zhou, Y. Sun, D. Bau, and A. Torralba. Interpretable basis decomposition for visual explanation. In ECCV, 2018.