1 Introduction
Interpreting the decision-making logic hidden inside neural networks has become an emerging research direction in recent years. The visualization of neural networks and the extraction of pixel-level input-output correlations are two typical methodologies. However, previous studies usually interpret the knowledge inside a pretrained neural network from a global perspective. For example, [18, 15, 11] mined input units (dimensions or pixels) to which the network output is sensitive; [3] visualized receptive fields of filters in intermediate layers; [34, 16, 25, 6, 7, 21] illustrated image appearances that maximized the score of the network output, a filter’s response, or a certain activation unit in a feature map.
Instead of visualizing the entire appearance that is responsible for a network output or an activation unit, we are more interested in the following questions.

How does a local input unit contribute to the network output? Here, we can vectorize the input of the network into a high-dimensional vector and treat each dimension as a specific “unit” without ambiguity. A single input unit is usually not informative enough to make an independent contribution to the network output. Thus, we need to clarify which other input units the target input unit collaborates with to constitute inference patterns of the neural network, so as to pass information to high layers.

Can we quantitatively measure the significance of the above contextual collaborations between the target input unit and its neighboring units?
Method: Given a pretrained convolutional neural network (CNN), we propose to disentangle contextual effects w.r.t. certain input units. As shown in Fig. 1, we design two methods to interpret contextual collaborations at different scales; both are agnostic to the structure of CNNs. The first method estimates a rough region of contextual collaborations, i.e. it clarifies whether the target input unit mainly collaborates with a few neighboring units or with most units of the input. This method distills knowledge from the pretrained network into a mixture of local models (see Fig. 2), where each model encodes contextual collaborations within a specific input region to make predictions. We hope that the knowledge-distillation strategy can help people determine quantitative contributions from different regions. Then, given a model for local collaborations, the second method further analyzes the significance of detailed collaborations between each pair of input units when we use the local model to make predictions on an image.
Application, explaining the AlphaGo Zero model: The quantitative analysis of contextual collaborations w.r.t. a local input unit is of special value in some tasks. For example, explaining the AlphaGo model [23, 8] is a typical application.
The AlphaGo model contains a value network to evaluate the current state of the game: a high output score indicates a high probability of winning. The contribution of a single move (i.e. placing a new stone on the Go board) to the output score during the game depends on contextual shapes on the Go board. Thus, disentangling explicit contextual collaborations that contribute to the output of the value network is important for understanding the logic of each new move hidden in the AlphaGo model.
More crucially, in this study, we explain the AlphaGo Zero model [8], which extends the scope of this study from diagnosing feature representations of a neural network to a more appealing issue: letting self-improving AI teach people new knowledge. The AlphaGo Zero model is pretrained via self-play without receiving any prior knowledge from human experience as supervision. In this way, all extracted contextual collaborations represent automatically learned intelligence, rather than human knowledge.
As demonstrated in well-known Go competitions between AlphaGo and human players [2, 1], the automatically learned model sometimes made decisions that could not be explained by existing gaming principles. The visualization of contextual collaborations may provide new knowledge beyond people’s current understanding of the Go game.
Contributions of this paper can be summarized as follows.
(i) In this paper, we focus on a new problem, i.e. visualizing local contextual effects in the decision-making of a pretrained neural network w.r.t. a certain input unit.
(ii) We propose two new methods to extract contextual effects via diagnosing feature representations and knowledge distillation.
(iii) We have combined the two proposed methods to explain the AlphaGo Zero model, and experimental results have demonstrated the effectiveness of our methods.
2 Related work
Understanding feature representations inside neural networks has become an emerging research direction in recent years. Related studies include 1) the visualization and diagnosis of network features, 2) disentangling or distilling network feature representations into interpretable models, and 3) learning neural networks with disentangled and interpretable features in intermediate layers.
Network visualization: Instead of analyzing network features from a global view [31, 20, 17], [3] defined six types of semantics for middle-layer feature maps of a CNN, i.e. objects, parts, scenes, textures, materials, and colors. Usually, each filter encodes a mixture of different semantics and is thus difficult to explain.
Visualization of filters in intermediate layers is the most direct method to analyze the knowledge hidden inside a neural network. [34, 16, 25, 6, 33, 5, 35] showed the appearance that maximized the score of a given unit. [6] used up-convolutional nets to invert CNN feature maps to their corresponding images.
Pattern retrieval: Some studies retrieved certain units from intermediate layers of CNNs that were related to certain semantics, although the relationship between a certain semantics and each neural unit was usually not convincing enough. The retrieved units are usually treated as analogous to conventional mid-level features [26] of images. [38, 39] selected units from feature maps to describe “scenes”. [24] discovered objects from feature maps.
Model diagnosis and distillation: Model-diagnosis methods, such as LIME [18], SHAP [15], influence functions [12], gradient-based visualization methods [7, 21], and [13], extracted image regions that were responsible for network outputs. [30, 37] distilled knowledge from a pretrained neural network into explainable models to interpret the logic of the target network. Such distillation-based network explanation is related to the first method proposed in this paper. However, unlike previous studies that distilled knowledge into explicit visual concepts, the use of distillation to disentangle local contextual effects has not been explored before.
Learning interpretable representations: A new trend is to learn networks with meaningful feature representations in intermediate layers [10, 27, 14] in a weakly-supervised or unsupervised manner. For example, capsule nets [19] and interpretable R-CNN [32] learned interpretable middle-layer features. InfoGAN [4] and β-VAE [9] learned meaningful input codes of generative networks. [36] developed a loss to push each middle-layer filter towards the representation of a specific object part during the learning process without given part annotations.
All the above studies mainly focused on the semantic meaning of a filter, an activation unit, or a network output. In contrast, our work is the first to analyze quantitative contextual effects w.r.t. a specific input unit during the inference process. Clarifying the explicit mechanisms by which an input unit contributes to the network output is of special value in applications.
3 Algorithm
In the following two subsections, we introduce two methods that extract contextual collaborations w.r.t. a certain input unit from a CNN at different scales. Then, we introduce the application that uses the proposed methods to explain the AlphaGo Zero model.
3.1 Determining the region of contextual collaborations w.r.t. an input unit
Since the input feature usually has a huge number of dimensions (units), it is difficult to accurately discover the few input units that collaborate with a target input unit. Therefore, it is important to first approximate the rough region of contextual collaborations before the unit-level analysis, i.e. to clarify in which regions contextual collaborations are contained.
Given a pretrained neural network, an input sample, and a target unit of the sample, we propose a method that uses knowledge distillation to determine the region of contextual collaborations w.r.t. the target input unit. Let $I$ denote the input feature (e.g. an image or the state of a Go board). Note that input features of most CNNs can be represented as a tensor $I \in \mathbb{R}^{H \times W \times D}$, where $H$ and $W$ indicate the height and the width of the input, respectively, and $D$ is the channel number. We clip different lattices (regions) $\Lambda_1, \Lambda_2, \ldots, \Lambda_N$ from the input tensor, and input units within the $i$-th lattice are given as $I_{\Lambda_i} \in \mathbb{R}^{h \times w \times D}$, $h \le H$, $w \le W$. Different lattices overlap with each other.
The core idea is that we use a mixture of models to approximate the function of the given pretrained neural network (namely the teacher net), where each model is a student net and uses input information within a specific lattice to make predictions.
$$y \approx \sum_{i=1}^{N} \alpha_i \, \hat{y}_i, \qquad y = f(I), \quad \hat{y}_i = f_i(I_{\Lambda_i}) \tag{1}$$
where $y$ and $\hat{y}_i$ denote the output of the pretrained teacher net $f$ and the output of the $i$-th student net $f_i$, respectively. $\alpha_i$ is a scalar weight, which depends on the input $I$. Because different lattices within the input are not equally informative w.r.t. the target task, input units within different lattices make different contributions to the final network output.
More crucially, given different inputs, the importance of the same lattice may also change. For example, as shown in [21], the head appearance is the dominating feature in the classification of animal categories. Thus, if a lattice corresponds to the head, then this lattice will contribute more than other lattices, thereby having a large weight $\alpha_i$. Therefore, our method estimates a specific weight for each input $I$, i.e. $\alpha_i$ is formulated as a function of $I$ (which will be introduced later).
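Significance of contextual collaborations: Based on the above equation, the significance of contextual collaborations within each lattice $\Lambda_i$ w.r.t. an input unit can be measured as $\Delta_i$.
$$\Delta y \approx \sum_{i} \Delta_i, \qquad \Delta_i = \alpha_i \hat{y}_i - \alpha_i' \hat{y}_i' \tag{2}$$
where we revise the value of the target unit in the input and check the change of network outputs, $\Delta y = y - y'$ and $\Delta_i$; primed quantities are computed on the revised input. If contextual collaborations w.r.t. the target unit mainly localize within the $i$-th lattice $\Lambda_i$, then $\Delta_i$ can be expected to contribute the most to the change of $y$.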
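The significance measure can be illustrated with a minimal sketch. It assumes that the significance of a lattice is the change of its weighted student output when the target unit is revised; the function name and all numbers below are hypothetical and only serve as an illustration.

```python
from typing import List


def lattice_significance(alphas: List[float], outputs: List[float],
                         alphas_revised: List[float],
                         outputs_revised: List[float]) -> List[float]:
    """Change of each lattice's weighted student output after revising the target unit."""
    return [a * y - a_r * y_r
            for a, y, a_r, y_r in zip(alphas, outputs, alphas_revised, outputs_revised)]


# Hypothetical weights and student outputs for three lattices (not from the paper).
alphas, outputs = [0.5, 0.3, 0.2], [1.0, 2.0, -1.0]
alphas_revised, outputs_revised = [0.5, 0.3, 0.2], [0.2, 2.0, -1.0]

significance = lattice_significance(alphas, outputs, alphas_revised, outputs_revised)
# The lattice with the largest absolute change is taken as the most relevant one.
most_relevant = max(range(len(significance)), key=lambda i: abs(significance[i]))
```

In this toy case only the first lattice reacts to the revision, so it would be selected as the region that contains the contextual collaborations.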
We conduct two knowledge-distillation processes to learn the student nets and a model for determining the weights $\{\alpha_i\}$, respectively.
Student nets: The first process distills knowledge from the teacher net to each student net $f_i$ with parameters $\theta_i$ based on the distillation loss $\mathcal{L}(\theta_i) = \sum_I \| \hat{y}_{I,i} - y_I \|^2$, where the subscript $I$ indicates the output for the input $I$. Considering that $I_{\Lambda_i}$ only contains partial information of $I$, we do not expect $\hat{y}_{I,i}$ to reconstruct $y_I$ without any errors.
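Distilling knowledge to weights: Then, the second distillation process estimates a set of weights $\{\alpha_{I,i}\}$ for each specific input $I$. We use the following loss to learn another neural network $g$ with parameters $\theta_g$ to infer the weights.
$$\mathcal{L}(\theta_g) = \sum_I \Big\| y_I - \sum_{i} \alpha_{I,i} \, \hat{y}_{I,i} \Big\|^2, \qquad [\alpha_{I,1}, \ldots, \alpha_{I,N}] = g(I) \tag{3}$$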
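As a rough illustration of the second distillation process, the sketch below evaluates a squared-error loss between the teacher output and the weighted mixture of student outputs. The squared-error form and all names are assumptions made for this example; the weights are given directly instead of being predicted by a network.

```python
from typing import List, Tuple

# One sample: (teacher output, per-lattice weights, per-lattice student outputs).
Sample = Tuple[float, List[float], List[float]]


def mixture_output(alphas: List[float], student_outputs: List[float]) -> float:
    """Weighted sum of student outputs, approximating the teacher output."""
    return sum(a * y for a, y in zip(alphas, student_outputs))


def weight_distillation_loss(samples: List[Sample]) -> float:
    """Sum of squared errors between teacher outputs and mixture outputs."""
    return sum((teacher - mixture_output(alphas, students)) ** 2
               for teacher, alphas, students in samples)


# Hypothetical data: the first sample is matched exactly, the second is off by 1.
samples = [
    (1.0, [0.5, 0.5], [1.0, 1.0]),
    (2.0, [1.0, 0.0], [1.0, 5.0]),
]
loss = weight_distillation_loss(samples)
```

In practice the weights would be the output of a trainable network and the loss would be minimized by gradient descent; the sketch only shows the quantity being minimized.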
3.2 Fine-grained contextual collaborations w.r.t. an input unit
In the above subsection, we introduced a method to distill knowledge of contextual collaborations into student nets of different regions. Given a student net, in this subsection we develop an approach to disentangling from it the explicit contextual collaborations w.r.t. a specific input unit, i.e. identifying which input units collaborate with the target unit to compute the network output.
We can consider a student net as a cascade of functions of $L$ layers, i.e. $x^{(l)} = \phi_l(x^{(l-1)})$ (or $x^{(l)} = \phi_l(x^{(l-1)}) + x^{(l-m)}$ for skip connections), where $x^{(l)}$ denotes the output feature of the $l$-th layer. In particular, $x^{(0)}$ and $x^{(L)}$ indicate the input and output of the network, respectively. We only focus on a single scalar output of the network (we may handle different output dimensions separately if the network has a high-dimensional output). If a sigmoid/softmax layer is the last layer, we use the score before the sigmoid/softmax operation as $x^{(L)}$ to simplify the analysis.
3.2.1 Preliminaries, the estimation of quantitative contribution
As preliminaries of our algorithm, we extend the technique of [22] to estimate the quantitative contribution of each neural activation in a feature map to the final prediction. We use $C_{x^{(l)}}$ to denote the contribution distribution of neural activations on the $l$-th layer $x^{(l)}$. The score of its $i$-th element, $C_{x^{(l)}_i}$, denotes the ratio of the unit $x^{(l)}_i$’s score contribution to the entire network output score. Because $x^{(L)}$ is the scalar network output, it has a unit contribution $C_{x^{(L)}} = 1$. Then, we introduce how to backpropagate contributions to feature maps in low layers.
The method of contribution propagation is similar to network visualization based on gradient backpropagation [16, 33]. However, contribution propagation reflects a more objective distribution of numerical contributions over $x^{(l)}$, instead of biasedly boosting the impacts of the most important activations.
Without loss of generality, in this paragraph we use $h = \phi(x)$ to simplify the notation of the function of a certain layer. If the layer is a conv-layer or a fully-connected layer, then we can represent the operation for computing each elementary activation score $h_j$ of $h$ in a vectorized form, $h_j = \sum_i W_{ji} x_i + b_j$ (see the Appendix for details). We consider $W_{ji} x_i$ as the numerical contribution of $x_i$ to $h_j$. Thus, we can decompose the entire contribution of $h_j$, $C_{h_j}$, into elementary contributions of $x_i$, i.e. $C_{x_i \to h_j}$, which satisfies $C_{x_i \to h_j} \propto W_{ji} x_i$ (see the Appendix for details). Then, the entire contribution of $x_i$ is computed as the sum of its elementary contributions to all $h_j$ in the above layer, i.e. $C_{x_i} = \sum_j C_{x_i \to h_j}$.
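The propagation through a single fully-connected layer can be sketched as follows. This is only an illustration under an assumed propagation rule: each input receives a share of the output contribution proportional to its weighted score, normalized by the larger of the activation and the activation without its bias, and non-positive activations pass nothing. All names are hypothetical.

```python
from typing import List


def propagate_contribution(W: List[List[float]], x: List[float],
                           b: List[float], C_h: List[float]) -> List[float]:
    """Distribute the contributions C_h of outputs h = W x + b over the inputs x."""
    C_x = [0.0] * len(x)
    for j, row in enumerate(W):
        h_j = sum(w * x_i for w, x_i in zip(row, x)) + b[j]
        if h_j <= 0:
            continue  # a non-positive activation is blocked by the following ReLU
        denom = max(h_j, h_j - b[j])  # assumed handling of positive/negative bias
        for i, (w, x_i) in enumerate(zip(row, x)):
            C_x[i] += C_h[j] * w * x_i / denom
    return C_x


# Hypothetical layer with one output unit and no bias.
C_x = propagate_contribution(W=[[1.0, 2.0]], x=[1.0, 1.0], b=[0.0], C_h=[1.0])
# With a positive bias, part of the contribution is attributed to the bias itself.
C_x_bias = propagate_contribution(W=[[1.0, 2.0]], x=[1.0, 1.0], b=[1.0], C_h=[1.0])
```

Without a bias the input contributions sum to the contribution of the output unit; with a positive bias they sum to less, because the bias accounts for part of the activation.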
A cascade of a conv-layer and a batch-normalization layer can be rewritten in the form of a single conv-layer, where normalization parameters are absorbed into the conv-layer (see the Appendix). For skip connections, a neural unit may receive contributions from different layers. If the layer is a ReLU layer or a pooling layer, the contribution propagation has the same formulation as the gradient backpropagation of those layers (see the Appendix).
3.2.2 The extraction of contextual collaborations
As discussed in [3], each neural activation of a middle-layer feature can be considered as the detection of a mid-level inference pattern. All input units must collaborate with neighboring units to activate some middle-layer feature units, in order to pass their information to the network output.
Therefore, in this research, we develop a method to
1. determine which mid-level patterns (i.e. which neural activations) the target unit constitutes;
2. clarify which input units help the target unit constitute these mid-level patterns;
3. measure the strength of the collaboration between the target unit and each of these input units.
Let $x$ and $x'$ denote the feature map of a certain conv-layer when the network receives input features with the target unit being activated and the feature map generated without the target unit being activated, respectively. In this way, we can use $|\Delta x| = |x - x'|$ to represent the absolute effect of the target unit on the feature map $x$. The overall contribution of the $j$-th neural unit, $C_{x_j}$, depends on the activation score $x_j$, $C_{x_j} \propto \max(x_j, 0)$, where $\max(x_j, 0)$ measures the activation strength used for inference. The proportion of the contribution that is affected by the target unit can be roughly formulated as $\hat{C}_{x_j}$.
$$\hat{C}_{x_j} = \begin{cases} C_{x_j} \cdot \dfrac{|\Delta x_j|}{x_j}, & x_j > 0 \\ 0, & x_j \le 0 \end{cases} \tag{4}$$
where $C_{x_j} = 0$ and thus $\hat{C}_{x_j} = 0$ if $x_j \le 0$, because negative activation scores of a conv-layer cannot pass information through the following ReLU layer ($x$ is not the feature map of the last conv-layer before the network output).
In this way, $\hat{C}_x$ highlights a few mid-level patterns (neural activations) related to the target unit; $\hat{C}_{x_j}$ measures the contribution proportion that is affected by the target unit. We can use $\hat{C}_x$ to replace $C_x$ and use the techniques in Section 3.2.1 to propagate $\hat{C}_x$ back to the input units $x^{(0)}$. Thus, the propagated result $\hat{C}_{x^{(0)}}$ represents a map of fine-grained contextual collaborations w.r.t. the target unit. Each element in the map gives the corresponding input unit’s collaboration with the target unit.
We can understand the proposed method as follows. The relative activation change $|\Delta x_j| / x_j$ can be used as a weight to evaluate the correlation between the target unit and the $j$-th activation unit (inference pattern). In this way, we can extract input units that greatly influence the target unit’s inference patterns, rather than those that affect all inference patterns. Note that both the target unit and a contextual unit may either increase or decrease the value of $x_j$. This means that the contextual unit may either boost the target unit’s effects on the inference pattern or weaken them.
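The reweighting step can be illustrated with a short sketch. It assumes that the affected contribution of an activation is its original contribution multiplied by the relative activation change caused by the target unit, and that non-positive activations receive zero; the function name and numbers are hypothetical.

```python
from typing import List


def affected_contribution(C: List[float], x: List[float],
                          x_without_target: List[float]) -> List[float]:
    """Scale each contribution by the relative activation change |x - x'| / x."""
    result = []
    for c, a, a_without in zip(C, x, x_without_target):
        if a > 0:
            result.append(c * abs(a - a_without) / a)
        else:
            result.append(0.0)  # blocked by the following ReLU
    return result


# Hypothetical feature map with three activations.
C = [0.5, 0.5, 0.0]
x = [2.0, 4.0, -1.0]                 # target unit activated
x_without_target = [1.0, 4.0, -3.0]  # target unit not activated

C_hat = affected_contribution(C, x, x_without_target)
```

Only the first activation changes when the target unit is removed, so it is the only mid-level pattern that keeps a non-zero score before the scores are propagated back to the input.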
3.3 Application: explaining the AlphaGo Zero model
We use ELF OpenGo [29, 28] as the implementation of the AlphaGo Zero model. We combine the above two methods to jointly explain each move’s logic hidden in the value net of the AlphaGo Zero model during the game. The AlphaGo Zero model contains a value net, a policy net, and the module of Monte-Carlo Tree Search (MCTS). Generally speaking, the superior performance of the AlphaGo model greatly relies on the enumeration power of the policy net and the MCTS, but the value net provides the most direct information about how the model evaluates the current state of the game. Therefore, we explain the value net, rather than the policy net or the MCTS. In the ELF OpenGo implementation, the value net is a residual network with 20 residual blocks, each containing two conv-layers. We take the scalar output before the final (sigmoid) layer as the target value to evaluate the current state on the Go board. (The value net uses the current state, as well as the seven most recent states, to output eight values for the eight states. To simplify the algorithm, we take the value corresponding to the current state as the target value.)
Given the current move of the game, our goal is to estimate unit-level contextual collaborations w.r.t. the current move. That is, we aim to analyze which neighboring stones and/or what global shapes help the current move influence the game. We distill knowledge from the value net to student networks to approximate contextual collaborations within different regions. Then, we estimate unit-level contextual collaborations based on the student net.
Determining local contextual collaborations: We design two types of student networks, which receive lattices at two different scales. In this way, we can conduct two distillation processes to learn neural networks that encode contextual collaborations at different scales.
As shown in Fig. 2, we have four student nets oriented to lattices at the first scale. Except for the output, the four student nets have the same network structure as the value net. The four student nets share parameters in all layers. The input of a student net only has two channels, corresponding to maps of white stones and black stones, respectively, on the Go board. We crop four overlapping lattices at the four corners of the Go board for both training and testing. Note that we rotate the board state within each lattice so that the top-left position corresponds to the corner of the board before we input it to the student net. The weight network has the same settings as the value net. It receives a concatenation of the four lattices as the input and outputs four scalar weights for the four local student networks. We learn the weight network via knowledge distillation.
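The cropping and rotation of corner lattices can be sketched as below. The lattice size is left as a parameter, and the 5×5 board with a lattice size of 3 is a hypothetical toy setting chosen only to keep the example small; the helper names are also assumptions.

```python
from typing import List


def rotate_clockwise(grid: List[list]) -> List[list]:
    """Rotate a square grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def corner_lattices(board: List[list], size: int) -> List[List[list]]:
    """Crop four corner lattices and rotate each so the board corner is at [0][0]."""
    n = len(board)
    crops = [
        ([row[:size] for row in board[:size]], 0),          # top-left corner
        ([row[n - size:] for row in board[:size]], 3),      # top-right corner
        ([row[:size] for row in board[n - size:]], 1),      # bottom-left corner
        ([row[n - size:] for row in board[n - size:]], 2),  # bottom-right corner
    ]
    lattices = []
    for crop, turns in crops:
        for _ in range(turns):
            crop = rotate_clockwise(crop)
        lattices.append(crop)
    return lattices


# Toy board whose cells store their own (row, column) coordinates.
board = [[(r, c) for c in range(5)] for r in range(5)]
lattices = corner_lattices(board, size=3)
corners = [lattice[0][0] for lattice in lattices]
```

After rotation, the top-left cell of every lattice is a corner of the full board, which is what allows one set of shared parameters to serve all four student nets.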
Student nets for lattices at the second scale have settings similar to those at the first scale. We divide the entire Go board into nine overlapping lattices. Nine student nets encode local knowledge from the nine local lattices. We learn another weight network, which uses a concatenation of the nine lattices to compute weights for the nine local lattices.
Finally, we select the most relevant lattice at each of the two scales, according to the significance of its contextual collaborations, for explanation.
Estimating unit-level contextual collaborations: In order to obtain fine-grained collaborations, we apply the method in Section 3.2.2 to explain the two student nets corresponding to the two selected relevant lattices. We also use our method to explain the value net. We compute a map of contextual collaborations for each neural network and normalize the values in the map. We sum up the maps of the three networks to obtain the final map of contextual collaborations.
More specifically, given a neural network, we use the feature of each selected conv-layer to compute the initial contribution in Equation (4) and propagate it back to obtain a map of collaborations. We sum up the maps based on the 1st, 3rd, 5th, and 7th conv-layers to obtain the collaboration map of the network.
4 Experiments
In experiments, we distilled knowledge of the value network to student nets, and disentangled fine-grained contextual collaborations w.r.t. each new move. We compared the extracted contextual collaborations and human explanations for the new move to evaluate the proposed method.
4.1 Evaluation metric
In this section, we propose two metrics to evaluate the accuracy of the extracted contextual collaborations w.r.t. the new move. Note that, considering the high complexity of the Go game, there is no exact ground-truth explanation for contextual collaborations. Different Go players usually analyze the same board state differently. More crucially, as shown in competitions between AlphaGo and human players [2, 1], the knowledge encoded in AlphaGo was sometimes beyond humans’ current understanding of the Go game and could not be explained by existing gaming principles.
In this study, we measured the similarity between the extracted contextual collaborations and humans’ analysis of the new move. The extracted contextual collaborations were just rough explanations from the perspective of AlphaGo. We expected these collaborations to be close to, but not exactly the same as, human understanding. More specifically, we invited Go players who had obtained a four-dan grading rank to label contextual collaborations. To simplify the metric, Go players were asked to label a relative strength value of the collaboration between each stone and the target move (stone), no matter whether the relationship between the two stones was collaborative or adversarial. Considering the double-blind policy, the paper will introduce the Go players if the paper is accepted.
Let $\Omega$ be the set of existing stones except for the target stone on the Go board. $p_i$ denotes the labeled collaboration strength between each stone $i \in \Omega$ and the target stone. $q_i = |c_i|$ is referred to as the collaboration strength estimated by our method, where $c_i$ denotes the final estimated collaboration value on the stone $i$. We normalized the collaboration strengths so that each sums to one over $\Omega$, and computed the Jaccard similarity between the distribution of $p$ and the distribution of $q$ as the similarity metric.
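A minimal sketch of this metric is given below. The text does not spell out which form of the Jaccard similarity is used for distributions, so the sketch assumes the weighted (min/max) form and a normalization of absolute strengths to sum to one; both choices and all names are assumptions.

```python
from typing import List


def normalize(strengths: List[float]) -> List[float]:
    """Normalize absolute strengths so that they sum to one."""
    total = sum(abs(s) for s in strengths)
    if total == 0:
        return [0.0] * len(strengths)
    return [abs(s) / total for s in strengths]


def weighted_jaccard(p: List[float], q: List[float]) -> float:
    """Weighted Jaccard similarity: sum of element-wise minima over sum of maxima."""
    numerator = sum(min(a, b) for a, b in zip(p, q))
    denominator = sum(max(a, b) for a, b in zip(p, q))
    return numerator / denominator if denominator > 0 else 1.0


# Hypothetical strengths for three stones (estimated values may be signed).
estimated = normalize([2.0, -1.0, 1.0])  # [0.5, 0.25, 0.25]
labeled = normalize([1.0, 1.0, 2.0])     # [0.25, 0.25, 0.5]
similarity = weighted_jaccard(estimated, labeled)
```

The similarity equals one only when the two normalized distributions coincide, and it decreases as the estimated and labeled strengths concentrate on different stones.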
In addition, considering the great complexity of the Go game, different Go players may annotate different contextual collaborations. Therefore, we also required Go players to provide a subjective rating for the extracted contextual collaborations of each board state, i.e. selecting one of five ratings: 1-Unacceptable, 2-Problematic, 3-Acceptable, 4-Good, and 5-Perfect.
4.2 Experimental results and analysis
Fig. 3 shows the significance of the extracted contextual collaborations, as well as possible explanations for contextual collaborations, where the significance of each stone’s contextual collaboration was reported as the absolute collaboration strength instead of the original collaboration score in experiments. Without loss of generality, let us focus on the winning probability of Black. Considering the complexity of the Go game, there may be two cases of a positive (or negative) value of the collaboration score. The simplest case is that when a white stone has a negative collaboration score, the white stone decreases the winning probability of Black. However, sometimes a white stone has a positive score. This may be because the white stone did not sufficiently exhibit its power due to its context. Since White and Black usually have a very similar number of stones on the Go board, putting a relatively ineffective white stone in a local region also wastes the opportunity to gain advantages in other regions in this zero-sum game. Similarly, a black stone may also have either a positive or a negative collaboration score.
The Jaccard similarity between the extracted collaborations and the manually-annotated collaborations was 0.3633. Nevertheless, considering the great diversity of explanations of the same game state, the average rating given by Go players for the extracted collaborations was 3.7 (between 3-Acceptable and 4-Good). Please see the appendix for more results.
5 Conclusion and discussions
In this paper, we have proposed two methods for the quantitative analysis of contextual collaborations w.r.t. a certain input unit in the decision-making of a neural network. Extracting fine-grained contextual collaborations to clarify why and how an input unit passes its information to the network output is of significant value in specific applications, but to the best of our knowledge it has not been well explored before. In particular, we have applied our methods to the AlphaGo Zero model, in order to explain the potential logic hidden inside the model, which is automatically learned via self-play without human annotations. Experiments have demonstrated the effectiveness of the proposed methods.
Note that there is no exact ground truth for contextual collaborations of the Go game, and how to evaluate the quality of the extracted contextual collaborations is still an open problem. As a pioneering study, we do not require the explanation to exactly fit human logic, because human logic is usually not the only correct explanation. Instead, we just aim to visualize contextual collaborations without manually pushing visualization results towards human-interpretable concepts. This is different from some previous studies of network visualization [16, 33] that added losses as a natural image prior in order to obtain beautiful but biased visualization results. In the future, we will continue to cooperate with professional Go players to further refine the algorithm and visualize more accurate knowledge inside the AlphaGo Zero model.
References
[1] C. Metz. AlphaGo’s designers explore new AI after winning big in China. Business, 27 May 2017.
[2] Artificial intelligence: Google’s AlphaGo beats Go master Lee Sedol, Retrieved 17 March 2016.
[3] D. Bau, B. Zhou, A. Khosla, A. Oliva, and A. Torralba. Network dissection: Quantifying interpretability of deep visual representations. In CVPR, 2017.
[4] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In NIPS, 2016.
[5] Y. Dong, H. Su, J. Zhu, and F. Bao. Towards interpretable deep neural networks by leveraging adversarial examples. In arXiv:1708.05493, 2017.
[6] A. Dosovitskiy and T. Brox. Inverting visual representations with convolutional networks. In CVPR, 2016.
[7] R. C. Fong and A. Vedaldi. Interpretable explanations of black boxes by meaningful perturbation. In arXiv:1704.03296v1, 2017.
[8] D. Hassabis and D. Silver. AlphaGo Zero: Learning from scratch. In DeepMind official website, 18 October 2017.
[9] I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner. β-VAE: Learning basic visual concepts with a constrained variational framework. In ICLR, 2017.
[10] Z. Hu, X. Ma, Z. Liu, E. Hovy, and E. P. Xing. Harnessing deep neural networks with logic rules. In arXiv:1603.06318v2, 2016.
[11] P.-J. Kindermans, K. T. Schütt, M. Alber, K.-R. Müller, D. Erhan, B. Kim, and S. Dähne. Learning how to explain neural networks: PatternNet and PatternAttribution. In ICLR, 2018.
[12] P. Koh and P. Liang. Understanding black-box predictions via influence functions. In ICML, 2017.
[13] D. Kumar, A. Wong, and G. W. Taylor. Explaining the unexplained: A class-enhanced attentive response (CLEAR) approach to understanding deep neural networks. In CVPR Workshop on Explainable Computer Vision and Job Candidate Screening Competition, 2017.
[14] R. Liao, A. Schwing, R. Zemel, and R. Urtasun. Learning deep parsimonious representations. In NIPS, 2016.
[15] S. M. Lundberg and S.-I. Lee. A unified approach to interpreting model predictions. In NIPS, 2017.
[16] A. Mahendran and A. Vedaldi. Understanding deep image representations by inverting them. In CVPR, 2015.
[17] P. E. Rauber, S. G. Fadel, A. X. Falcão, and A. C. Telea. Visualizing the hidden activity of artificial neural networks. In Transactions on PAMI, 23(1):101–110, 2016.
[18] M. T. Ribeiro, S. Singh, and C. Guestrin. “Why should I trust you?” Explaining the predictions of any classifier. In KDD, 2016.
[19] S. Sabour, N. Frosst, and G. E. Hinton. Dynamic routing between capsules. In NIPS, 2017.
[20] R. Shwartz-Ziv and N. Tishby. Opening the black box of deep neural networks via information. In arXiv:1703.00810, 2017.
[21] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In ICCV, 2017.
[22] A. Shrikumar, P. Greenside, A. Y. Shcherbina, and A. Kundaje. Not just a black box: Interpretable deep learning by propagating activation differences. In arXiv:1605.01713, 2016.
[23] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of Go with deep neural networks and tree search. In Nature, 529(7587):484–489, 2016.
[24] M. Simon and E. Rodner. Neural activation constellations: Unsupervised part model discovery with convolutional networks. In ICCV, 2015.
[25] K. Simonyan, A. Vedaldi, and A. Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. In arXiv:1312.6034, 2013.
[26] S. Singh, A. Gupta, and A. A. Efros. Unsupervised discovery of mid-level discriminative patches. In ECCV, 2012.
[27] A. Stone, H. Wang, Y. Liu, D. S. Phoenix, and D. George. Teaching compositionality to CNNs. In CVPR, 2017.
[28] Y. Tian, Q. Gong, W. Shang, Y. Wu, and C. L. Zitnick. ELF: An extensive, lightweight and flexible research platform for real-time strategy games. In Advances in Neural Information Processing Systems, pages 2656–2666, 2017.
[29] Y. Tian, J. Ma, Q. Gong, S. Sengupta, Z. Chen, and C. L. Zitnick. ELF OpenGo. https://github.com/pytorch/ELF, 2018.
[30] J. Vaughan, A. Sudjianto, E. Brahimi, J. Chen, and V. N. Nair. Explainable neural networks based on additive index models. In arXiv:1806.01933, 2018.
[31] N. Wolchover. New theory cracks open the black box of deep learning. In Quanta Magazine, 2017.
[32] T. Wu, X. Li, X. Song, W. Sun, L. Dong, and B. Li. Interpretable R-CNN. In arXiv:1711.05226, 2017.
[33] J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson. Understanding neural networks through deep visualization. In ICML Deep Learning Workshop, 2015.
[34] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In ECCV, 2014.
[35] J. Zhang, Z. Lin, J. Brandt, X. Shen, and S. Sclaroff. Top-down neural attention by excitation backprop. In ECCV, 2016.
[36] Q. Zhang, Y. N. Wu, and S.-C. Zhu. Interpretable convolutional neural networks. In CVPR, 2018.
[37] Q. Zhang, Y. Yang, Y. Liu, Y. N. Wu, and S.-C. Zhu. Unsupervised learning of neural networks to explain neural networks. In arXiv:1805.07468, 2018.
[38] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Object detectors emerge in deep scene CNNs. In ICLR, 2015.
[39] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning deep features for discriminative localization. In CVPR, 2016.
Supplementary materials for the contribution propagation
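Let $h = W \otimes x + b$ denote the convolutional operation of a conv-layer. We can rewrite this equation in a vectorized form as $\mathbf{h} = \mathbf{W}\mathbf{x} + \mathbf{b}$, $\mathbf{x} \in \mathbb{R}^{n}$, $\mathbf{h}, \mathbf{b} \in \mathbb{R}^{m}$. For each output element $h_j$, $h_j = b_j + \sum_i W_{ji} x_i$. If the layer is a fully-connected layer, then each element $W_{ji}$ corresponds to an element in $W$. Otherwise, $\mathbf{W}$ is a sparse matrix, i.e. $W_{ji} = 0$ if $x_i$ and $h_j$ are too far away to be covered by the convolutional filter.
Thus, we can treat $h_j$ as the sum of the bias $b_j$ and the compositional scores $W_{ji} x_i$ to simplify the notation. Intuitively, we can propagate the contribution of $h_j$ to its compositional elements based on their numerical scores. Note that we only consider the case of $h_j > 0$, because if $h_j \le 0$, $h_j$ cannot pass information through the ReLU layer, and we obtain $C_{h_j} = 0$ and thus $C_{x_i \to h_j} = 0$. In particular, when $b_j \ge 0$, all compositional scores $\{W_{ji} x_i\}$ just contribute an activation score $h_j - b_j$, thereby receiving a total contribution of $C_{h_j} (h_j - b_j) / h_j$. When $b_j < 0$, we believe the contribution of $h_j$ all comes from elements of $\mathbf{x}$, and each element’s contribution is given as $C_{h_j} W_{ji} x_i / (h_j - b_j)$. Thus, we get
$$C_{x_i \to h_j} = \begin{cases} C_{h_j} \cdot \dfrac{W_{ji} x_i}{\max(h_j, \, h_j - b_j)}, & h_j > 0 \\ 0, & h_j \le 0 \end{cases}$$
When a batch-normalization layer follows a conv-layer, the function of the two cascaded layers can be written as
$$h = \gamma \cdot \frac{(W \otimes x + b) - \mu}{\sigma} + \beta$$
where $\mu$ and $\sigma$ denote the mean and standard deviation used for normalization, and $\gamma$ and $\beta$ denote the learned scale and shift. Thus, we can absorb the parameters for the batch normalization into the conv-layer, i.e. $W \leftarrow \frac{\gamma}{\sigma} W$ and $b \leftarrow \frac{\gamma (b - \mu)}{\sigma} + \beta$.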
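The absorption of batch-normalization parameters can be checked numerically with a small sketch. It uses a single linear unit in place of a conv-layer and the standard inference-time form of batch normalization with a small stabilizer `eps`; the names and numbers are hypothetical.

```python
import math
from typing import List, Tuple


def linear(weights: List[float], bias: float, x: List[float]) -> float:
    """A single linear unit, standing in for one output of a conv-layer."""
    return sum(w * x_i for w, x_i in zip(weights, x)) + bias


def fold_batchnorm(weights: List[float], bias: float, gamma: float, beta: float,
                   mean: float, var: float, eps: float = 1e-5) -> Tuple[List[float], float]:
    """Absorb batch-normalization parameters into the weights and bias."""
    scale = gamma / math.sqrt(var + eps)
    return [w * scale for w in weights], (bias - mean) * scale + beta


# Hypothetical parameters and input.
weights, bias = [1.0, 2.0], 0.5
gamma, beta, mean, var, eps = 2.0, 0.1, 1.0, 4.0, 1e-5
x = [3.0, 4.0]

# Reference: linear unit followed by batch normalization.
reference = gamma * (linear(weights, bias, x) - mean) / math.sqrt(var + eps) + beta
# Folded: a single linear unit with absorbed parameters.
folded_weights, folded_bias = fold_batchnorm(weights, bias, gamma, beta, mean, var, eps)
folded = linear(folded_weights, folded_bias, x)
```

Because the two computations agree, contribution propagation only needs to handle the folded layer and can ignore the normalization layer.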
For ReLU layers and Pooling layers, the formulation of the contribution propagation is identical to the formulation for the gradient backpropagation, because the gradient backpropagation and the contribution propagation both pass information to neural activations that are used during the forward propagation.
More results
Considering the great complexity of the Go game, there do not exist ground-truth annotations for the significance of contextual collaborations. Different Go players may have a different understanding of the same Go board state, thereby annotating different heat maps for the significance of contextual collaborations. More crucially, our results reflect the logic of the automatically-learned AlphaGo Zero model, rather than the logic of humans.
Therefore, in addition to manual annotations of collaboration significance, we also require Go players to provide a subjective evaluation for the extracted contextual collaborations.
We compared the extracted contextual collaborations at different scales (the second, third, fourth, and fifth columns) with annotations made by Go players.
Contextual collaborations of local regions
We show the significance of contextual collaborations within a local lattice. The score for the th lattice is reported as .