Graph Neural Networks (GNNs) (Kipf and Welling, ; Velickovic et al., ; Zhang and Chen, ) have achieved great practical successes in many real-world applications, such as chemistry (Pires et al., 2015), molecular biology (Huber et al., 2007), social networks (Cho et al., 2011) and epidemic modelling (Simon et al., 2011). For most of these applications, explaining predictions made by a GNN model is crucial for establishing trust with end-users, identifying the cause of a prediction, and even discovering potential deficiencies of a GNN model before massive deployment. Ideally, an explanation should be able to answer questions like “Would the prediction of the GNN model change if a certain part of an input molecule is removed?” in the context of predicting whether an artificial molecule is active for a certain type of proteins Jiang et al. (2020); Xiong et al. (2021), “Would an item recommended still be recommended if a customer had not purchased some other items in the past?” for a GNN built for recommendation systems Fan et al. (2019); Yin et al. (2019).
Counterfactual explanations Moraffah et al. (2020) in the form of “If X had not occurred, Y would not have occurred” Molnar (2019) are the principled way to answer such questions and thus are highly desirable for GNNs. In the context of GNNs, a counterfactual explanation identifies a small subset of edges of the input graph instance such that removing those edges significantly changes the prediction made by the GNN. Counterfactual explanations are usually concise and easy to understand Moraffah et al. (2020); Sokol and Flach (2019) because they align well with the human intuition to describe a causal situation Molnar (2019). To make explanations more trustworthy, the counterfactual explanation should be robust to noise, that is, some slight changes on an input graph do not change the explanation significantly.
How to produce robust counterfactual explanations on predictions made by general graph neural networks is a novel problem that has not been systematically studied before. As discussed in Section 2, most GNN explanation methods (Ying et al., 2019; Luo et al., ; Yuan et al., 2020; Velickovic et al., ; Pope et al., 2019) are neither counterfactual nor robust. These methods mostly focus on identifying a subgraph of an input graph that achieves a high correlation with the prediction result. Such explanations are usually not counterfactual because, due to the high non-convexity of GNNs, removing a subgraph that achieves a high correlation does not necessarily change the prediction result. Moreover, many existing methods Ying et al. (2019); Luo et al. ; Velickovic et al. ; Pope et al. (2019) are not robust to noise: their explanations may change significantly upon slight modifications of input graphs, because the explanation of every single input graph prediction is independently optimized to maximize the correlation with the prediction, so an explanation can easily overfit the noise in the data.
In this paper, we develop RCExplainer, a novel method to produce robust counterfactual explanations on GNNs. The key idea is to first model the common decision logic of a GNN by a set of decision regions, where each decision region governs the predictions on a large number of graphs, and then extract robust counterfactual explanations by a deep neural network that explores the decision logic carried by the linear decision boundaries of the decision regions. We make the following contributions.
First, we model the decision logic of a GNN by a set of decision regions, where each decision region is induced by a set of linear decision boundaries of the GNN. We propose an unsupervised method to find decision regions for each class such that each decision region governs the prediction of multiple graph samples predicted to be the same class. The linear decision boundaries of the decision region capture the common decision logic on all the graph instances inside the decision region, thus do not easily overfit the noise of an individual graph instance. By exploring the common decision logic encoded in the linear boundaries, we are able to produce counterfactual explanations that are inherently robust to noise.
Second, based on the linear boundaries of the decision region, we propose a novel loss function to train a neural network that produces a robust counterfactual explanation as a small subset of edges of an input graph. The loss function is designed to directly optimize the explainability and counterfactual property of the subset of edges, such that: 1) the subgraph induced by the edges lies within the decision region, thus has a prediction consistent with the input graph; and 2) deleting the subset of edges from the input graph produces a remainder subgraph that lies outside the decision region, thus the prediction on the remainder subgraph changes significantly.
Last, we conduct a comprehensive experimental study to compare our method with the state-of-the-art methods on fidelity, robustness, accuracy and efficiency. The results consistently demonstrate the superior performance of our approach.
2 Related work
The existing GNN explanation methods Yuan et al. (2020); Velickovic et al. ; Ying et al. (2019); Pope et al. (2019); Luo et al. generally fall into two categories: model level explanation Yuan et al. (2020) and instance level explanation Velickovic et al. ; Ying et al. (2019); Pope et al. (2019); Luo et al. .
A model level explanation method Yuan et al. (2020) produces a high-level explanation about the general behaviors of a GNN independent from input examples. This may be achieved by synthesizing a set of artificial graph instances such that each artificial graph instance maximizes the prediction score on a certain class. The weakness of model level explanation methods is that an input graph instance may not contain an artificial graph instance, and removing an artificial graph from an input graph does not necessarily change the prediction. As a result, model level explanations are substantially different from counterfactual explanations, because the synthesized artificial graphs do not provide insights into how the GNN makes its prediction on a specific input graph instance.
The instance level explanation methods Velickovic et al. ; Ying et al. (2019); Pope et al. (2019); Luo et al. explain the prediction(s) made by a GNN on a specific input graph instance or multiple instances by identifying a subgraph of an input graph instance that achieves a high correlation with the prediction on the input graph. GNNExplainer (Ying et al., 2019) removes redundant edges from an input graph instance to produce an explanation that maximizes the mutual information between the distribution of subgraphs of the input graph and the GNN’s prediction. Following the same idea by Ying et al. (2019), PGExplainer (Luo et al., ) parameterizes the generation process of explanations by a deep neural network, and trains it to maximize a similar mutual information based loss used by GNNExplainer (Ying et al., 2019). The trained deep neural network is then applied to generate explanations for a single input graph instance or a group of input graphs. MEG Numeroso and Bacciu (2021) incorporates strong domain knowledge in chemistry with a reinforcement learning framework to produce counterfactual explanations on GNNs specifically built for compound prediction, but the heavy reliance on domain knowledge largely limits its applicability on general GNNs.
Some studies Pope et al. (2019); Velickovic et al. also adapt the existing explanation methods of image-oriented deep neural networks to produce instance level explanations for GNNs. Pope et al. (2019) extend several gradient based methods Selvaraju et al. (2017); Simonyan et al. (2014); Zhang et al. (2018) to explain predictions made by GNNs. The explanations are prone to gradient saturation (Glorot and Bengio, 2010) and may also be misleading (Adebayo et al., 2018) due to the heavy reliance on noisy gradients. Velickovic et al. extend the attention mechanism Denil et al. (2017); Duan et al. (2017) to identify the nodes in an input graph that contribute the most to the prediction. This method has to retrain the GNN with the altered architecture and the inserted attention layers. Thus, the explanations may not be faithful to the original GNN.
Instance level explanations are usually not counterfactual because, due to the non-convexity of GNNs, removing an explanation subgraph from the input graph does not necessarily change the prediction result. Moreover, those methods Ying et al. (2019); Luo et al. ; Velickovic et al. ; Pope et al. (2019) are usually not robust to noise because the explanation of every single input graph prediction is independently optimized. Thus, an explanation can easily overfit the noise inside input graphs and may change significantly upon slight modifications on input graphs.
To tackle the weaknesses in the existing methods, in this paper, we directly optimize the counterfactual property of an explanation. Our explanations are also much more robust to modifications on input graphs, because they are produced from the common decision logic on a large group of similar input graphs, which do not easily overfit the noise of an individual graph sample.
Please note that our study is substantially different from adversarial attacks on GNNs. The adversarial attacking methods Zügner and Günnemann (2019a); Zügner et al. (2018); Xu et al. (2020, 2019); Jin and Zhang (2019) and the most recent CF-GNNExplainer Lucic et al. (2021) use adversarial examples as explanations and only focus on changing the predicted labels of GNNs, but totally ignore the explainability of the generated adversarial examples Freiesleben (2020). Thus, the adversarial examples generated by adversarial attacks do not align well with the human intuition. On the contrary, our method directly optimizes the explainability of an explanation and requires that the subgraph induced by the explanation lies within the decision region at a large distance from the decision boundaries. We also require that the explanation is generally valid for a large set of similar graph instances by extracting it from the common linear decision boundaries of a large decision region.
3 Problem Formulation
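Denote by $G = \{V, E\}$ a graph, where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of $n$ nodes and $E \subseteq V \times V$ is the set of edges. The edge structure of a graph $G$ is described by an adjacency matrix $\mathbf{A} \in \{0, 1\}^{n \times n}$, where $\mathbf{A}_{ij} = 1$ if there is an edge between nodes $v_i$ and $v_j$, and $\mathbf{A}_{ij} = 0$ otherwise.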
Denote by $\phi$ a GNN model that maps a graph to a probability distribution over a set of classes denoted by $C$. Let $D$ denote the set of graphs that are used to train the GNN model $\phi$. We focus on GNNs that adopt piecewise linear activation functions, such as MaxOut (Goodfellow et al., 2013) and the family of ReLU (Glorot et al., 2011; He et al., 2015; Nair and Hinton, 2010).
The robust counterfactual explanation problem is defined as follows.
Definition 1 (Robust Counterfactual Explanation Problem)
Given a GNN model $\phi$ trained on a set of graphs $D$, for an input graph $G = \{V, E\}$, our goal is to explain why $G$ is predicted by the GNN model as $\phi(G)$ by identifying a small subset of edges $S \subseteq E$, such that (1) removing the set of edges in $S$ from $G$ changes the prediction on the remainder $\{V, E - S\}$ of $G$ significantly; and (2) $S$ is stable with respect to slight changes on the edges of $G$ and the feature representations of the nodes of $G$.
In the definition, the first requirement ensures that the explanation $S$ is counterfactual, and the second requirement ensures that the explanation is robust to noisy changes on the edges and nodes of $G$.
4 Method
In this section, we first introduce how to extract the common decision logic of a GNN on a large set of graphs with the same predicted class. This is achieved by a decision region induced by a set of linear decision boundaries of the GNN. Then, based on the linear boundaries of the decision region, we propose a novel loss function to train a neural network that produces robust counterfactual explanations. Last, we discuss the time complexity of our method when generating explanations.
4.1 Modelling Decision Regions
Following the routines of many deep neural network explanation methods (Selvaraju et al., 2017; Zeiler and Fergus, 2014), we extract the decision region of a GNN in the $d$-dimensional output space $\mathbb{O}^d$ of the last convolution layer of the GNN, because the features generated by the last convolution layer are more conceptually meaningful and more robust to noise than the raw features of input graphs, such as vertices and edges Zügner and Günnemann (2019b); Bojchevski and Günnemann (2019). Denote by $\phi_{gc}$ the mapping function realized by the graph convolution layers that maps an input graph $G$ to its graph embedding $\phi_{gc}(G) \in \mathbb{O}^d$, and by $\phi_{fc}$ the mapping function realized by the fully connected layers that maps the graph embedding $\phi_{gc}(G)$ to a predicted distribution over the classes in $C$. The overall prediction $\phi(G)$ made by the GNN can be written as
$$\phi(G) = \phi_{fc}(\phi_{gc}(G)).$$
For the GNNs that adopt piecewise linear activation functions for the hidden neurons, such as MaxOut (Goodfellow et al., 2013) and the family of ReLU (Glorot et al., 2011; He et al., 2015; Nair and Hinton, 2010), the decision logic of $\phi_{fc}$ in the space $\mathbb{O}^d$ is characterized by a piecewise linear decision boundary formed by connected pieces of decision hyperplanes in $\mathbb{O}^d$ (Adebayo et al., 2018). We call these hyperplanes linear decision boundaries (LDBs), and denote by $\mathcal{H}$ the set of LDBs induced by $\phi_{fc}$. The set of LDBs in $\mathcal{H}$ partitions the space $\mathbb{O}^d$ into a large number of convex polytopes. A convex polytope is formed by a subset of LDBs in $\mathcal{H}$. All the graphs whose graph embeddings are contained in the same convex polytope are predicted as the same class Chu et al. (2018). Therefore, the LDBs of a convex polytope encode the common decision logic of $\phi_{fc}$ on all the graphs whose graph embeddings lie within the convex polytope Chu et al. (2018). Here, a graph $G$ is covered by a convex polytope if the graph embedding $\phi_{gc}(G)$ is contained in the convex polytope.
Based on the above insight, we model the decision region for a set of graph instances as a convex polytope that satisfies the following two properties. First, the decision region should be induced by a subset of the LDBs in $\mathcal{H}$. In this way, when we extract counterfactual explanations from the LDBs, the explanations are faithful to the real decision logic of the GNN. Second, the decision region should cover many graph instances in the training dataset $D$, and all the covered graphs should be predicted as the same class. In this way, the LDBs of the decision region capture the common decision logic on all the graphs covered by the decision region. Here, the requirement of covering a large number of graphs ensures that the common decision logic is general, and thus it is less likely to overfit the noise of an individual graph instance. As a result, the counterfactual explanations extracted from the LDBs of the decision region are insensitive to slight changes in the input graphs. Our method can be easily generalized to incorporate prediction confidence in the coverage measure, such as considering the count of graphs weighted by prediction confidence. To keep our discussion simple, we do not pursue this detail further in the paper.
Next, we illustrate how to extract a decision region satisfying the above two requirements. The key idea is to find a convex polytope covering a large set of graph instances in $D$ that are predicted as the same class $c \in C$.
Denote by $D_c \subseteq D$ the set of graphs in $D$ predicted as a class $c \in C$, by $\mathcal{P} \subseteq \mathcal{H}$ a set of LDBs that partition the space $\mathbb{O}^d$ into a set of convex polytopes, and by $r(\mathcal{P}, c)$ the convex polytope induced by $\mathcal{P}$ that covers the largest number of graphs in $D_c$. Denote by $g(\mathcal{P}, c)$ the number of graphs in $D_c$ covered by $r(\mathcal{P}, c)$, and by $h(\mathcal{P}, c)$ the number of graphs in $D$ that are covered by $r(\mathcal{P}, c)$ but are not predicted as class $c$. We extract a decision region covering a large set of graph instances in $D_c$ by solving the following constrained optimization problem.
$$\max_{\mathcal{P} \subseteq \mathcal{H}} g(\mathcal{P}, c), \quad \text{s.t. } h(\mathcal{P}, c) = 0. \tag{1}$$
This formulation realizes the two properties of decision regions because $\mathcal{P} \subseteq \mathcal{H}$ ensures that the decision region is induced by a subset of LDBs in $\mathcal{H}$, maximizing $g(\mathcal{P}, c)$ requires that $r(\mathcal{P}, c)$ covers a large number of graphs in $D_c$, and the constraint $h(\mathcal{P}, c) = 0$ ensures that all the graphs covered by $r(\mathcal{P}, c)$ are predicted as the same class $c$.
Once we find a solution $\mathcal{P}$ to the above problem, the decision region $r(\mathcal{P}, c)$ can be easily obtained by first counting the number of graphs in $D_c$ covered by each convex polytope induced by $\mathcal{P}$, and then selecting the convex polytope that covers the largest number of graphs in $D_c$.
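The counting step described above can be illustrated with a minimal sketch. It assumes each LDB is stored as a hypothetical `(w, b)` pair and each graph is represented only by its embedding vector; a convex polytope is then identified by the pattern of sides on which a point falls. The names `side_pattern` and `largest_polytope` are illustrative and do not come from the paper.

```python
from collections import Counter


def side_pattern(boundaries, x):
    """Tuple of booleans: which side of each hyperplane (w, b) the point x is on.

    Points exactly on a hyperplane are assigned to the non-negative side.
    """
    return tuple(
        sum(wi * xi for wi, xi in zip(w, x)) + b >= 0
        for w, b in boundaries
    )


def largest_polytope(boundaries, embeddings):
    """Return (pattern, count) for the polytope that covers the most embeddings."""
    counts = Counter(side_pattern(boundaries, x) for x in embeddings)
    return counts.most_common(1)[0]


# Hypothetical 2-D example with two boundaries: x0 = 0 and x1 = 0.
boundaries = [((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0)]
embeddings = [(1.0, 2.0), (0.5, 0.1), (3.0, 1.0), (-1.0, 2.0)]
pattern, count = largest_polytope(boundaries, embeddings)
print(pattern, count)
```

Two embeddings share a polytope exactly when their side patterns are equal, so grouping by pattern avoids enumerating polytopes that contain no graphs.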
4.2 Extracting Decision Regions
The optimization problem in Equation (1) is intractable for standard GNNs, mainly because it is impractical to compute $\mathcal{H}$, all the LDBs of a GNN. The number of LDBs in $\mathcal{H}$ of a GNN is exponential with respect to the number of neurons in the worst case Montúfar et al. (2014). To address this challenge, we substitute $\mathcal{H}$ by a sample of LDBs from $\mathcal{H}$.
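An LDB in the space $\mathbb{O}^d$ can be written as $\mathbf{w}^\top \mathbf{x} + b = 0$, where $\mathbf{x} \in \mathbb{O}^d$ is a variable, $\mathbf{w}$ is the basis term, and $b$ corresponds to the bias. Following (Chu et al., 2018), for any input graph $G$, a linear boundary can be sampled from $\mathcal{H}$ by computing
$$\mathbf{w} = \left.\frac{\partial \left( \max_1(\phi_{fc}(\boldsymbol{\alpha})) - \max_2(\phi_{fc}(\boldsymbol{\alpha})) \right)}{\partial \boldsymbol{\alpha}}\right|_{\boldsymbol{\alpha} = \phi_{gc}(G)} \tag{2}$$
and
$$b = \left.\left( \max_1(\phi_{fc}(\boldsymbol{\alpha})) - \max_2(\phi_{fc}(\boldsymbol{\alpha})) - \mathbf{w}^\top \boldsymbol{\alpha} \right)\right|_{\boldsymbol{\alpha} = \phi_{gc}(G)}, \tag{3}$$
where $\max_1(\phi_{fc}(\boldsymbol{\alpha}))$ and $\max_2(\phi_{fc}(\boldsymbol{\alpha}))$ are the largest and the second largest values in the vector $\phi_{fc}(\boldsymbol{\alpha})$, respectively. Given an input graph $G$, Equations (2) and (3) identify one LDB from $\mathcal{H}$. Thus, we can sample a subset of input graphs uniformly from $D$, and use Equations (2) and (3) to derive a sample of LDBs as $\tilde{\mathcal{H}} \subset \mathcal{H}$.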
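As an illustration of Equations (2) and (3), the sketch below samples one boundary from a toy one-hidden-layer ReLU classifier. It is a simplification under stated assumptions: the network works on unnormalized logits rather than a probability vector, and because a ReLU network is linear inside each activation region, the gradient is read off the locally effective weight matrix instead of being obtained by automatic differentiation. The weights `W1`, `b1`, `W2`, `b2` and both function names are hypothetical.

```python
def local_linear_map(W1, b1, W2, b2, x):
    """For logits = W2 @ relu(W1 @ x + b1) + b2, return (logits, W_eff, b_eff)
    such that logits = W_eff @ x + b_eff inside the activation region of x."""
    pre = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(W1, b1)]
    active = [1.0 if p > 0 else 0.0 for p in pre]
    hidden, dim, classes = len(W1), len(x), len(W2)
    W_eff = [[sum(W2[k][j] * active[j] * W1[j][i] for j in range(hidden))
              for i in range(dim)] for k in range(classes)]
    b_eff = [sum(W2[k][j] * active[j] * b1[j] for j in range(hidden)) + b2[k]
             for k in range(classes)]
    logits = [sum(w * v for w, v in zip(W_eff[k], x)) + b_eff[k]
              for k in range(classes)]
    return logits, W_eff, b_eff


def sample_ldb(W1, b1, W2, b2, x):
    """Return (w, b) of the hyperplane w.x + b = 0 separating the top two classes."""
    logits, W_eff, _ = local_linear_map(W1, b1, W2, b2, x)
    ranked = sorted(range(len(logits)), key=lambda k: logits[k], reverse=True)
    top1, top2 = ranked[0], ranked[1]
    # Eq. (2): gradient of (largest - second largest) output w.r.t. the embedding.
    w = [a - c for a, c in zip(W_eff[top1], W_eff[top2])]
    # Eq. (3): bias chosen so that w.x + b equals the margin at the embedding x.
    margin = logits[top1] - logits[top2]
    b = margin - sum(wi * xi for wi, xi in zip(w, x))
    return w, b


# Toy network whose logits equal relu(x); the sampled boundary is x0 - x1 = 0.
W1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.0]
W2 = [[1.0, 0.0], [0.0, 1.0]]
b2 = [0.0, 0.0]
w, b = sample_ldb(W1, b1, W2, b2, [2.0, 1.0])
print(w, b)
```

In a real GNN the same quantities would typically be obtained with an automatic differentiation framework applied to the fully connected layers.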
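Now, we substitute $\mathcal{H}$ in Equation (1) by $\tilde{\mathcal{H}}$ to produce the following problem.
$$\max_{\mathcal{P} \subseteq \tilde{\mathcal{H}}} g(\mathcal{P}, c), \quad \text{s.t. } h(\mathcal{P}, c) \le \delta, \tag{4}$$
where $\delta \ge 0$ is a tolerance parameter to keep this problem feasible. The parameter $\delta$ is required because substituting $\mathcal{H}$ by $\tilde{\mathcal{H}}$ ignores the LDBs in $\mathcal{H} \setminus \tilde{\mathcal{H}}$. Thus, the convex polytope $r(\mathcal{P}, c)$ induced by a subset of boundaries in $\tilde{\mathcal{H}}$ may contain instances that are not predicted as class $c$. We directly set $\delta = h(\tilde{\mathcal{H}}, c)$, which is the smallest value of $\delta$ that keeps the practical problem feasible.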
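The problem in Equation (4) can be proven to be a Submodular Cost Submodular Cover (SCSC) problem (Iyer and Bilmes, ) (see Appendix D for proof) that is well known to be NP-hard (Crawford et al., 2019). We adopt a greedy boundary selection method to find a good solution to this problem (Wolsey, 1982). Specifically, we initialize $\mathcal{P}$ as an empty set, and then iteratively select a new boundary $h$ from $\tilde{\mathcal{H}}$ by
$$h = \arg\min_{h \in \tilde{\mathcal{H}} \setminus \mathcal{P}} \frac{g(\mathcal{P}, c) - g(\mathcal{P} \cup \{h\}, c) + \epsilon}{h(\mathcal{P}, c) - h(\mathcal{P} \cup \{h\}, c)}, \tag{5}$$
where $g(\mathcal{P}, c) - g(\mathcal{P} \cup \{h\}, c)$ is the decrease of $g(\mathcal{P}, c)$ when adding $h$ into $\mathcal{P}$, and $h(\mathcal{P}, c) - h(\mathcal{P} \cup \{h\}, c)$ is the decrease of $h(\mathcal{P}, c)$ when adding $h$ into $\mathcal{P}$. Both $g(\mathcal{P}, c)$ and $h(\mathcal{P}, c)$ are non-increasing when adding $h$ into $\mathcal{P}$ because adding a new boundary $h$ may only exclude some graphs from the convex polytope $r(\mathcal{P}, c)$.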
Intuitively, in each iteration, Equation (5) selects a boundary $h \in \tilde{\mathcal{H}}$ such that adding $h$ into $\mathcal{P}$ reduces $g(\mathcal{P}, c)$ the least and reduces $h(\mathcal{P}, c)$ the most. In this way, we can quickly reduce $h(\mathcal{P}, c)$ to be smaller than $\delta$ without decreasing $g(\mathcal{P}, c)$ too much, which produces a good feasible solution to the practical problem. We add a small constant $\epsilon$ to the numerator such that, when there are multiple candidates of $h$ that do not decrease $g(\mathcal{P}, c)$, we can still select the $h$ that reduces $h(\mathcal{P}, c)$ the most.
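The greedy rule of Equation (5) can be sketched as follows. This is an illustrative reading, not the authors' implementation: graphs are given as hypothetical `(embedding, predicted_class)` pairs, boundaries as `(w, b)` pairs, candidates that do not reduce the number of wrongly covered graphs are skipped to avoid dividing by zero, and the loop stops early if no candidate helps. These last two choices are assumptions of the sketch.

```python
from collections import Counter


def pattern(P, x):
    """Side of x with respect to every boundary (w, b) in P."""
    return tuple(sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 for w, b in P)


def g_and_h(P, data, c):
    """g: class-c graphs inside the best polytope; h: other-class graphs inside it."""
    counts = Counter(pattern(P, x) for x, y in data if y == c)
    best, g = counts.most_common(1)[0]
    h = sum(1 for x, y in data if y != c and pattern(P, x) == best)
    return g, h


def greedy_select(candidates, data, c, eps=1e-3):
    """Greedily add boundaries until at most delta other-class graphs are covered."""
    P = []
    delta = g_and_h(candidates, data, c)[1]  # tolerance from using all candidates
    g, h = g_and_h(P, data, c)
    while h > delta:
        scored = []
        for cand in candidates:
            if cand in P:
                continue
            g_new, h_new = g_and_h(P + [cand], data, c)
            if h - h_new > 0:  # only boundaries that exclude wrongly covered graphs
                scored.append(((g - g_new + eps) / (h - h_new), cand))
        if not scored:
            break
        P.append(min(scored, key=lambda t: t[0])[1])
        g, h = g_and_h(P, data, c)
    return P, g, h


# Hypothetical data: class 0 lies at x0 > 0, class 1 at x0 < 0.
data = [((1.0, 1.0), 0), ((2.0, -1.0), 0), ((3.0, 2.0), 0),
        ((-1.0, 1.0), 1), ((-2.0, -1.0), 1)]
candidates = [((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0)]
P, g, h = greedy_select(candidates, data, c=0)
print(P, g, h)
```

In this toy case the boundary `x0 = 0` is chosen first because it removes both class-1 graphs while keeping all three class-0 graphs, which is the behaviour the ratio in Equation (5) is designed to reward.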
We apply a peeling-off strategy to iteratively extract multiple decision regions. For each class $c \in C$, we first solve the practical problem once to find a decision region $r(\mathcal{P}, c)$, then we remove the graphs covered by $r(\mathcal{P}, c)$ from $D_c$. If there are remaining graphs predicted as the class $c$, we continue finding the decision regions using the remaining graphs until all the graphs in $D_c$ are removed. When all the graphs in $D_c$ are removed for each class $c \in C$, we stop the iteration and return the set of decision regions we found.
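The peeling-off loop can be sketched independently of how a single region is found. In the sketch below, `find_region` is a hypothetical callback standing in for one run of the greedy boundary selection; it is assumed to return a description of the region together with the graphs it covers. The toy callback used in the example simply covers two graphs at a time and exists only to make the sketch runnable.

```python
def peel_regions(graphs_by_class, find_region):
    """Repeatedly extract regions per class until every graph is covered."""
    regions = []
    for c, graph_ids in graphs_by_class.items():
        remaining = set(graph_ids)
        while remaining:
            region, covered = find_region(remaining, c)
            covered = set(covered) & remaining
            if not covered:
                raise ValueError("find_region must cover at least one remaining graph")
            regions.append((c, region, covered))
            remaining -= covered  # peel off the graphs explained by this region
    return regions


def toy_find_region(remaining, c):
    """Hypothetical stand-in: 'covers' the two smallest remaining graph ids."""
    chosen = sorted(remaining)[:2]
    return "region-for-class-%s" % c, chosen


regions = peel_regions({0: [1, 2, 3], 1: [4]}, toy_find_region)
for c, region, covered in regions:
    print(c, region, sorted(covered))
```

The guard against an empty `covered` set keeps the loop from running forever if the region finder fails to make progress.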
4.3 Producing Explanations
In this section, we introduce how to use the LDBs of decision regions to train a neural network that produces a robust counterfactual explanation as a small subset of edges of an input graph. We form explanations as a subset of edges because GNNs make decisions by aggregating messages passed on edges. Using edges instead of vertices as explanations can provide better insights on the decision logic of GNNs.
4.3.1 The Neural Network Model
Denote by $f_\theta$ the neural network to generate a subset of edges of an input graph $G$ as the robust counterfactual explanation on the prediction $\phi(G)$. $\theta$ represents the set of parameters of the neural network. For experiments, our explanation network $f_\theta$ consists of 2 fully connected layers with a ReLU activation and the hidden dimension of 64.
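For any two connected vertices $v_i$ and $v_j$ of $G$, denote by $\mathbf{z}_i$ and $\mathbf{z}_j$ the embeddings produced by the last convolution layer of the GNN for the two vertices, respectively. The neural network $f_\theta$ takes $\mathbf{z}_i$ and $\mathbf{z}_j$ as the input and outputs the probability for the edge between $v_i$ and $v_j$ to be part of the explanation. This can be written as
$$\mathbf{M}_{ij} = f_\theta(\mathbf{z}_i, \mathbf{z}_j),$$
where $\mathbf{M}_{ij}$ denotes the probability that the edge between $v_i$ and $v_j$ is contained in the explanation. When there is no edge between $v_i$ and $v_j$, that is, $\mathbf{A}_{ij} = 0$, we set $\mathbf{M}_{ij} = 0$.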
For an input graph $G = \{V, E\}$ with $n$ vertices and a trained neural network $f_\theta$, $\mathbf{M}$ is an $n$-by-$n$ matrix that carries the complete information to generate a robust counterfactual explanation as a subset of edges, denoted by $S \subseteq E$. Concretely, we obtain $S$ by selecting all the edges in $E$ whose corresponding entries in $\mathbf{M}$ are larger than 0.5.
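A minimal sketch of the edge scoring and thresholding steps is given below. It assumes that the two vertex embeddings are concatenated before being fed to the network and that the weights are randomly initialized; both are assumptions of this sketch, and the helper names are hypothetical. A small hidden size is used for the demonstration instead of the 64 units mentioned above.

```python
import math
import random


def make_mlp(in_dim, hidden_dim, seed=0):
    """Random parameters for a 2-layer network: in_dim -> hidden_dim -> 1."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
    b1 = [0.0] * hidden_dim
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
    return W1, b1, W2, 0.0


def edge_probability(params, z_i, z_j):
    """Probability that the edge (i, j) belongs to the explanation."""
    W1, b1, W2, b2 = params
    x = list(z_i) + list(z_j)  # assumed input: concatenated embeddings
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    logit = sum(w * h for w, h in zip(W2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


def edge_mask(params, A, Z):
    """Matrix M with M[i][j] = f(z_i, z_j) on edges and 0 where A[i][j] == 0."""
    n = len(A)
    return [[edge_probability(params, Z[i], Z[j]) if A[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def select_edges(M, threshold=0.5):
    """Edges whose mask entry exceeds the threshold form the explanation."""
    return [(i, j) for i, row in enumerate(M) for j, m in enumerate(row)
            if m > threshold]


A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
Z = [[0.2, -0.1], [1.0, 0.5], [-0.3, 0.8]]
M = edge_mask(make_mlp(in_dim=4, hidden_dim=8), A, Z)
print(select_edges(M))
```

Because the network sees only the two endpoint embeddings, the cost of producing the mask grows with the number of edges rather than with the number of vertex pairs.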
4.3.2 Training Model
For an input graph $G = \{V, E\}$, denote by $S \subseteq E$ the subset of edges produced by $f_\theta$ to explain the prediction $\phi(G)$. Our goal is to train a good model $f_\theta$ such that the prediction on the subgraph $G_S$ induced by $S$ from $G$ is consistent with $\phi(G)$, and deleting the edges in $S$ from $G$ produces a remainder subgraph $G_{E \setminus S}$ such that the prediction on $G_{E \setminus S}$ changes significantly from $\phi(G)$.
Since producing $S$ by $f_\theta$ is a discrete operation that is hard to incorporate in an end-to-end training process, we define two proxy graphs to approximate $G_S$ and $G_{E \setminus S}$, respectively, such that the proxy graphs are determined by $\theta$ through continuous functions that can be smoothly incorporated into an end-to-end training process.
The proxy graph of $G_S$, denoted by $G_\theta$, is defined by regarding $\mathbf{M}$ instead of $\mathbf{A}$ as the adjacency matrix. That is, $G_\theta$ has exactly the same graph structure as $G$, but the edge weights of $G_\theta$ are given by the entries in $\mathbf{M}$ instead of $\mathbf{A}$. Here, the subscript $\theta$ means $G_\theta$ is determined by $\theta$.
The proxy graph of $G_{E \setminus S}$, denoted by $G'_\theta$, also has the same graph structure as $G$, but the edge weight between each pair of vertices $v_i$ and $v_j$ is defined as
$$\mathbf{M}'_{ij} = \begin{cases} 1 - \mathbf{M}_{ij} & \text{if } \mathbf{A}_{ij} = 1, \\ 0 & \text{if } \mathbf{A}_{ij} = 0. \end{cases}$$
The edge weights of both $G_\theta$ and $G'_\theta$ are determined by $\theta$ through continuous functions, thus we can smoothly incorporate $G_\theta$ and $G'_\theta$ into an end-to-end training framework.
As discussed later in this section, we use a regularization term to force the value of each entry in $\mathbf{M}$ to be close to either 0 or 1, such that $G_\theta$ and $G'_\theta$ better approximate $G_S$ and $G_{E \setminus S}$, respectively.
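The two proxy graphs differ only in their edge weights, which the following sketch makes concrete. It assumes dense list-of-lists matrices and the hypothetical helper name `proxy_weights`; the first returned matrix weights each existing edge by its mask value, and the second by one minus that value, with non-edges kept at zero in both.

```python
def proxy_weights(A, M):
    """Return edge-weight matrices of the 'kept' and 'removed' proxy graphs."""
    n = len(A)
    kept = [[M[i][j] if A[i][j] else 0.0 for j in range(n)] for i in range(n)]
    removed = [[1.0 - M[i][j] if A[i][j] else 0.0 for j in range(n)]
               for i in range(n)]
    return kept, removed


A = [[0, 1], [1, 0]]
M = [[0.0, 0.75], [0.75, 0.0]]
kept, removed = proxy_weights(A, M)
print(kept)
print(removed)
```

When every mask entry is exactly 0 or 1, the two matrices reduce to the adjacency matrices of the explanation subgraph and of the remainder graph.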
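We formulate our loss function as
$$\mathcal{L}(\theta) = \sum_{G \in D} \left\{ \lambda \mathcal{L}_{same}(\theta, G) + (1 - \lambda) \mathcal{L}_{opp}(\theta, G) + \beta \mathcal{R}_{sparse}(\theta, G) + \mu \mathcal{R}_{discrete}(\theta, G) \right\},$$
where $\lambda \in [0, 1]$, $\beta \ge 0$ and $\mu \ge 0$ are the hyperparameters controlling the importance of each term. The influence of these parameters is discussed in Appendix G. The first term of our loss function requires that the prediction of the GNN on $G_\theta$ is consistent with the prediction on $G$. Intuitively, this means that the edges with larger weights in $G_\theta$ dominate the prediction on $G$. We formulate this term by requiring $G_\theta$ to be covered by the same decision region covering $G$.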
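Denote by $\mathcal{H}_G$ the set of LDBs that induce the decision region covering $G$, and by $|\mathcal{H}_G|$ the number of LDBs in $\mathcal{H}_G$. For the $i$-th LDB $h_i \in \mathcal{H}_G$, denote by $\mathcal{B}(h_i, \mathbf{x}) = \mathbf{w}_i^\top \mathbf{x} + b_i$, where $\mathbf{w}_i$ and $b_i$ are the basis and bias of $h_i$, respectively, and $\mathbf{x}$ is a point in the space $\mathbb{O}^d$. The sign of $\mathcal{B}(h_i, \mathbf{x})$ indicates whether a point $\mathbf{x}$ lies on the positive side or the negative side of $h_i$, and the absolute value $|\mathcal{B}(h_i, \mathbf{x})|$ is proportional to the distance of a point $\mathbf{x}$ from $h_i$. Denoting by $\sigma(\cdot)$ the standard sigmoid function, we formulate the first term of our loss function as
$$\mathcal{L}_{same}(\theta, G) = \frac{1}{|\mathcal{H}_G|} \sum_{h_i \in \mathcal{H}_G} \sigma\left( -\mathcal{B}(h_i, \phi_{gc}(G)) \cdot \mathcal{B}(h_i, \phi_{gc}(G_\theta)) \right),$$
such that minimizing $\mathcal{L}_{same}(\theta, G)$ encourages the graph embeddings $\phi_{gc}(G)$ and $\phi_{gc}(G_\theta)$ to lie on the same side of every LDB in $\mathcal{H}_G$. Thus, $G_\theta$ is encouraged to be covered by the same decision region covering $G$.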
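The second term of our loss function optimizes the counterfactual property of the explanations by requiring the prediction on $G'_\theta$ to be significantly different from the prediction on $G$. Intuitively, this means that the set of edges with larger weights in $G_\theta$ are good counterfactual explanations because reducing the weights of these edges significantly changes the prediction. Following the above intuition, we formulate the second term as
$$\mathcal{L}_{opp}(\theta, G) = \min_{h_i \in \mathcal{H}_G} \sigma\left( \mathcal{B}(h_i, \phi_{gc}(G)) \cdot \mathcal{B}(h_i, \phi_{gc}(G'_\theta)) \right),$$
where $\mathcal{H}_G$ is the set of LDBs that induce the decision region covering $G$, $\mathcal{B}(h_i, \mathbf{x})$ is the signed score of a point $\mathbf{x}$ with respect to the LDB $h_i$, and $\sigma(\cdot)$ is the sigmoid function, such that minimizing $\mathcal{L}_{opp}(\theta, G)$ encourages the graph embeddings $\phi_{gc}(G)$ and $\phi_{gc}(G'_\theta)$ to lie on the opposite sides of at least one LDB in $\mathcal{H}_G$. This further means that $G'_\theta$ is encouraged not to be covered by the decision region covering $G$, thus the prediction on $G'_\theta$ can be changed significantly from the prediction on $G$.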
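The two boundary-based loss terms can be sketched as follows, under the reading that the first term averages a sigmoid penalty over all boundaries of the region and the second takes the minimum over them. Boundaries are hypothetical `(w, b)` pairs, and the three embeddings (original graph, kept proxy, removed proxy) are passed in directly instead of being computed by a GNN.

```python
import math


def signed_score(boundary, x):
    """w.x + b: the sign gives the side, the magnitude scales with the distance."""
    w, b = boundary
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def loss_same(boundaries, emb_graph, emb_kept):
    """Small when both embeddings lie on the same side of every boundary."""
    terms = [sigmoid(-signed_score(h, emb_graph) * signed_score(h, emb_kept))
             for h in boundaries]
    return sum(terms) / len(terms)


def loss_opp(boundaries, emb_graph, emb_removed):
    """Small when the embeddings are separated by at least one boundary."""
    return min(sigmoid(signed_score(h, emb_graph) * signed_score(h, emb_removed))
               for h in boundaries)


# Hypothetical region bounded by x0 = 0 and x1 = 0; the input graph sits at (2, 2).
boundaries = [((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0)]
emb_graph = (2.0, 2.0)
print(loss_same(boundaries, emb_graph, (1.0, 3.0)))   # kept proxy stays inside
print(loss_opp(boundaries, emb_graph, (-3.0, 2.0)))   # removed proxy crosses x0 = 0
```

Using the product of signed scores means the penalty depends both on whether the two points agree in side and on how far they are from the boundary, so embeddings deep inside the region are rewarded more than ones near its edge.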
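Similar to (Ying et al., 2019), we use an L1 regularization $\mathcal{R}_{sparse}(\theta, G) = \|\mathbf{M}\|_1$ on the matrix $\mathbf{M}$ produced by $f_\theta$ on an input graph $G$ to produce a sparse matrix $\mathbf{M}$, such that only a small number of edges in $G$ are selected as the counterfactual explanation. We also follow (Ying et al., 2019) to use an entropy regularization
$$\mathcal{R}_{discrete}(\theta, G) = -\frac{1}{|\mathbf{M}|} \sum_{i, j} \left( \mathbf{M}_{ij} \log \mathbf{M}_{ij} + (1 - \mathbf{M}_{ij}) \log (1 - \mathbf{M}_{ij}) \right)$$
to push the value of each entry in $\mathbf{M}$ to be close to either 0 or 1, such that the two proxy graphs approximate the explanation subgraph and the remainder subgraph well, respectively.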
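The two regularizers can be sketched on a dense mask matrix as shown below. Two details are assumptions of this sketch rather than statements from the text: the entropy is averaged over all entries of the matrix, and entries are clamped by a small `eps` so that the logarithm is defined at exactly 0 and 1.

```python
import math


def sparsity_reg(M):
    """L1 norm of the mask: sum of absolute values of all entries."""
    return sum(abs(m) for row in M for m in row)


def discreteness_reg(M, eps=1e-12):
    """Mean binary entropy of the mask entries; zero when all entries are 0 or 1."""
    total, count = 0.0, 0
    for row in M:
        for m in row:
            m = min(max(m, eps), 1.0 - eps)  # clamp to keep log() finite
            total += -(m * math.log(m) + (1.0 - m) * math.log(1.0 - m))
            count += 1
    return total / count


uncertain = [[0.5, 0.5], [0.5, 0.5]]
binary = [[0.0, 1.0], [1.0, 0.0]]
print(sparsity_reg(binary), discreteness_reg(uncertain), discreteness_reg(binary))
```

The entropy is largest at 0.5 and vanishes at 0 and 1, so minimizing it pushes the soft mask toward a hard edge selection, while the L1 term keeps the number of selected edges small.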
Now we can use the graphs in $D$ and the extracted decision regions to train the neural network $f_\theta$ in an end-to-end manner by minimizing $\mathcal{L}(\theta)$ over $\theta$ using back propagation. Once we finish training $f_\theta$, we can first apply $f_\theta$ to produce the matrix $\mathbf{M}$ for an input graph $G = \{V, E\}$, and then obtain the explanation $S$ by selecting all the edges in $E$ whose corresponding entries in $\mathbf{M}$ are larger than 0.5. We do not need the extracted boundaries for inference, as the decision logic of the GNN is already distilled into the explanation network $f_\theta$ during training.
As discussed in Appendix B, our method can be easily extended to generate robust counterfactual explanations for node classification tasks.
Our method is highly efficient with a time complexity $O(|E|)$ for explaining the prediction on an input graph $G$, where $|E|$ is the total number of edges in $G$. Additionally, the neural network $f_\theta$ can be directly used without retraining to predict explanations on unseen graphs. Thus our method is significantly faster than the other methods (Ying et al., 2019; Pope et al., 2019; Yuan et al., 2021; Vu and Thai, ) that require retraining each time when generating explanations on a new input graph.
5 Experiments
We conduct a series of experiments to compare our method with the state-of-the-art methods including GNNExplainer (Ying et al., 2019), PGExplainer (Luo et al., ), PGM-Explainer (Vu and Thai, ), SubgraphX (Yuan et al., 2021) and CF-GNNExplainer (Lucic et al., 2021). For the methods that identify a set of vertices as an explanation, we use the set of vertices to induce a subgraph from the input graph, and then use the set of edges of the induced subgraph as the explanation. For the methods that identify a subgraph as an explanation, we directly use the set of edges of the identified subgraph as the explanation.
To demonstrate the effectiveness of the decision regions, we derive another baseline method named RCExp-NoLDB that adopts the general framework of RCExplainer but does not use the LDBs of decision regions to generate explanations. Instead, RCExp-NoLDB directly maximizes the prediction confidence on class $c$ for $G_\theta$ and minimizes the prediction confidence of class $c$ for $G'_\theta$.
We evaluate the explanation performance on two typical tasks: the graph classification task that uses a GNN to predict the class of an input graph, and the node classification task that uses a GNN to predict the class of a graph node. For the graph classification task, we use one synthetic dataset, BA-2motifs (Luo et al., ), and two real-world datasets, Mutagenicity (Kazius et al., 2005) and NCI1 (Wale and Karypis, 2006). For the node classification task, we use the same four synthetic datasets as used by GNNExplainer Ying et al. (2019), namely, BA-shapes, BA-Community, tree-cycles and tree-grid.
Limited by space, we only report here the key results on the graph classification task for fidelity, robustness and efficiency. Please refer to Appendix E for details on datasets, baselines and the experiment setups. Detailed experimental comparison on the node classification task will be discussed in Appendix F where we show that our method produces extremely accurate explanations. CF-GNNExplainer Lucic et al. (2021) is only included in the results of node classification, because the source code of CF-GNNExplainer is not available and Lucic et al. (2021) reports performance on only node classification tasks.
5.1 Fidelity Performance
Fidelity is measured by the decrease of prediction confidence after removing the explanation (i.e., a set of edges) from the input graph (Pope et al., 2019). We use fidelity to evaluate how counterfactual the generated explanations are on the datasets Mutagenicity, NCI1 and BA-2motifs. A large fidelity score indicates stronger counterfactual characteristics. It is important to note that fidelity may be sensitive to the sparsity of explanations. The sparsity of an explanation $S$ with respect to an input graph $G = \{V, E\}$ is $1 - \frac{|S|}{|E|}$, that is, the percentage of edges remaining after the explanation is removed from $G$. We only compare explanations with the same level of sparsity.
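The two quantities used in this comparison can be sketched in a few lines. The sketch assumes a hypothetical callback `predict_confidence` that returns the model's confidence in the originally predicted class for a given edge list; the toy model in the example is invented purely so that the code runs and is not related to any dataset in the paper.

```python
def sparsity(explanation, edges):
    """Fraction of edges that remain after the explanation is removed."""
    return 1.0 - len(explanation) / len(edges)


def fidelity(predict_confidence, edges, explanation):
    """Drop in prediction confidence caused by removing the explanation edges."""
    removed = set(explanation)
    remaining = [e for e in edges if e not in removed]
    return predict_confidence(edges) - predict_confidence(remaining)


def toy_confidence(edge_list):
    """Hypothetical model: confident only while the edge (0, 1) is present."""
    return 0.9 if (0, 1) in edge_list else 0.4


edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
explanation = [(0, 1)]
print(sparsity(explanation, edges), fidelity(toy_confidence, edges, explanation))
```

Reporting fidelity at a fixed sparsity matters because removing more edges tends to change the prediction more, regardless of which edges are chosen.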
Figure 1 shows the results about fidelity. Our approach achieves the best fidelity performance at all levels of sparsity. The results validate the effectiveness of our method in producing highly counterfactual explanations. RCExplainer also significantly outperforms RCExp-NoLDB. This confirms that using LDBs of decision regions extracted from GNNs produces more faithful counterfactual explanations.
SubgraphX does not perform as well as reported by Yuan et al. (2021). The fidelity performance reported by Yuan et al. (2021) is obtained by setting the features of nodes that are part of the explanation to 0 but not removing the explanation edges from the input graph. This does not remove the message passing roles of the explanation nodes from the input graph because the edges connected to those nodes still can pass messages. In our experiments, we fix this problematic setting by directly blocking the messages that are passed on the edges in the explanation. Appendix E provides more details.
5.2 Robustness Performance
In this experiment, we evaluate the robustness of all methods by quantifying how much an explanation changes after adding noise to the input graph. For an input graph $G$ and the explanation $S$, we produce a perturbed graph $G'$ by adding random noise to the node features and randomly adding or deleting some edges of the input graph such that the prediction on $G'$ is consistent with the prediction on $G$. Using the same method we obtain the explanation $S'$ on $G'$. Considering the top-$k$ edges of $S$ as the ground-truth and comparing $S'$ against them, we compute a receiver operating characteristic (ROC) curve and evaluate the robustness by the area under curve (AUC) of the ROC curve. We report results for a fixed $k$ in Figure 2. Results for other values of $k$ are included in Appendix F, where we observe a similar trend.
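The robustness score can be sketched as a rank-based AUC computation. The sketch assumes that explanations are given as hypothetical dictionaries mapping edges to weights, that the comparison is made over the edges of the perturbed graph, and that ties in score count as half a correct ordering; the paper does not spell out these details, so they should be read as assumptions.

```python
def top_k_edges(weights, k):
    """The k edges with the largest explanation weights."""
    return set(sorted(weights, key=weights.get, reverse=True)[:k])


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative edge")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def robustness_auc(original_weights, perturbed_weights, k):
    """Score the perturbed explanation against the top-k original edges."""
    truth = top_k_edges(original_weights, k)
    edges = list(perturbed_weights)
    labels = [e in truth for e in edges]
    scores = [perturbed_weights[e] for e in edges]
    return roc_auc(labels, scores)


original = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 0): 0.2}
perturbed = {(0, 1): 0.7, (1, 2): 0.3, (2, 3): 0.4, (3, 0): 0.1}
print(robustness_auc(original, perturbed, k=2))
```

An AUC of 1 means the perturbed explanation ranks every ground-truth edge above every other edge, which is why methods that do not output edge weights cannot be scored this way.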
Figure 2 shows the AUC of GNNExplainer, PGExplainer, RCExp-NoLDB and RCExplainer at different levels of noise. A higher AUC indicates better robustness. The percentage of noise shows the proportion of nodes and edges that are modified. Baselines such as PGM-Explainer and SubgraphX are not included in this experiment as they do not output the edge weights that are required for computing AUC. We present additional robustness experiments in Appendix F where we extend all the baselines to report node and edge level accuracy.
GNNExplainer performs the worst on most of the datasets, since it optimizes each graph independently without considering other graphs in the training set. Even when no noise is added, the AUC of GNNExplainer is significantly lower than 1 because different runs produce different explanations for the same graph prediction. PGExplainer is more robust than GNNExplainer because the neural network they trained to produce explanations implicitly considers all the graphs used for training.
Our method achieves the best AUC on all the datasets, because the common decision logic carried by the decision regions of a GNN is highly robust to noise. PGExplainer achieves a comparable performance as our method on the Mutagenicity dataset, because the samples of this dataset share a lot of common structures such as carbon rings, which makes it easier for the neural network trained by PGExplainer to identify these structures in presence of noise. However, for BA-2motifs and NCI1, this is harder as samples share very few structures and thus the AUC of PGExplainer drops significantly. RCExplainer also significantly outperforms RCExp-NoLDB on these datasets which highlights the role of decision boundaries in making our method highly robust.
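Table 1: Average inference time on unseen graph samples (Mutagenicity dataset).
| Method | GNNExplainer | PGExplainer | PGM-Explainer | SubgraphX | RCExplainer |
|---|---|---|---|---|---|
| Time | 1.2s ± 0.2 | 0.01s ± 0.03 | 13.1s ± 3.9 | 77.8s ± 4.5 | 0.01s ± 0.02 |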
We evaluate efficiency by comparing the average computation time taken for inference on unseen graph samples. Table 1 shows the results on the Mutagenicity dataset. Since our method can also be used directly on unseen data without any retraining, it is as efficient as PGExplainer and significantly faster than GNNExplainer, PGM-Explainer and SubgraphX.
In this paper, we develop a novel method for producing counterfactual explanations on GNNs. We extract decision boundaries from the given GNN model to formulate an intuitive and effective counterfactual loss function. We optimize this loss to train a neural network to produce explanations with strong counterfactual characteristics. Since the decision boundaries are shared by multiple samples of the same predicted class, explanations produced by our method are robust and do not overfit the noise. Our experiments on synthetic and real-life benchmark datasets strongly validate the efficacy of our method. In this work, we focus on GNNs that belong to Piecewise Linear Neural Networks (PLNNs). Extending our method to other families of GNNs and to tasks such as link prediction remains an interesting future direction.
Our method will benefit multiple fields where GNNs are intensively used. By allowing the users to interpret the predictions of complex GNNs better, it will promote transparency, trust and fairness in the society. However, there also exist some inherent risks. A generated explanation may expose private information if our method is not coupled with an adequate privacy protection technique. Also, some of the ideas presented in this paper may be adopted and extended to improve adversarial attacks. Without appropriate defense mechanisms, the misuse of such attacks poses a risk of disruption in the functionality of GNNs deployed in the real world. That said, we firmly believe that these risks can be mitigated through increased awareness and proactive measures.
- Sanity checks for saliency maps. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31. Cited by: §2, §4.1.
- Certifiable robustness to graph perturbations. arXiv preprint arXiv:1910.14356. Cited by: §4.1.
- Friendship and mobility: user movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1082–1090. Cited by: §1.
- Exact and consistent interpretation for piecewise linear neural networks: a closed form solution. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1244–1253. Cited by: §4.1, §4.2.
- Submodular cost submodular cover with an approximate oracle. In International Conference on Machine Learning, pp. 1426–1435. Cited by: §4.2.
- Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. correlation with molecular orbital energies and hydrophobicity. Journal of medicinal chemistry 34 (2), pp. 786–797. Cited by: Appendix H.
- Programmable agents. arXiv preprint arXiv:1706.06383. Cited by: §2.
- One-shot imitation learning. arXiv preprint arXiv:1703.07326. Cited by: §2.
- Graph neural networks for social recommendation. In The World Wide Web Conference, pp. 417–426. Cited by: §1.
- Counterfactual explanations & adversarial examples–common grounds, essential differences, and potential transfers. arXiv preprint arXiv:2009.05487. Cited by: §2.
- Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp. 249–256. Cited by: §2.
- Deep sparse rectifier neural networks. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp. 315–323. Cited by: §3, §4.1.
- Maxout networks. In International conference on machine learning, pp. 1319–1327. Cited by: §3, §4.1.
- Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, pp. 1026–1034. Cited by: §3, §4.1.
- [Re] parameterized explainer for graph neural network. In ML Reproducibility Challenge 2020, Cited by: Appendix E.
- Graphs in molecular biology. BMC bioinformatics 8 (6), pp. 1–14. Cited by: §1.
-  Submodular optimization with submodular cover and submodular knapsack constraints. In Advances in Neural Information Processing Systems, Cited by: §4.2.
- Drug–target affinity prediction using graph neural network and contact maps. RSC Advances 10 (35), pp. 20701–20712. Cited by: §1.
- Latent adversarial training of graph convolution networks. In ICML Workshop on Learning and Reasoning with Graph-Structured Representations, Cited by: §2.
- Derivation and validation of toxicophores for mutagenicity prediction. Journal of medicinal chemistry 48 (1), pp. 312–320. Cited by: §5.
-  Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017, Conference Track Proceedings, Cited by: §1.
- DIG: a turnkey library for diving into graph deep learning research. arXiv preprint arXiv:2103.12608. Cited by: Appendix E, Appendix F.
- CF-gnnexplainer: counterfactual explanations for graph neural networks. arXiv preprint arXiv:2102.03322. Cited by: Appendix F, §2, §5, §5.
-  Parameterized explainer for graph neural network. In Advances in Neural Information Processing Systems, Cited by: Appendix E, Appendix E, Appendix F, Appendix F, §1, §2, §2, §2, §5, §5.
- Interpretable machine learning. Note: https://christophm.github.io/interpretable-ml-book/ Cited by: §1.
- On the number of linear regions of deep neural networks. arXiv preprint arXiv:1402.1869. Cited by: §4.2.
- Causal interpretability for machine learning-problems, methods and evaluation. ACM SIGKDD Explorations Newsletter 22 (1), pp. 18–33. Cited by: §1.
- Rectified linear units improve restricted boltzmann machines. In ICML, Cited by: §3, §4.1.
- MEG: generating molecular counterfactual explanations for deep graph networks. arXiv preprint arXiv:2104.08060. Cited by: §2.
- PkCSM: predicting small-molecule pharmacokinetic and toxicity properties using graph-based signatures. Journal of medicinal chemistry 58 (9), pp. 4066–4072. Cited by: §1.
- Explainability methods for graph convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10772–10781. Cited by: Appendix F, §1, §2, §2, §2, §2, §4.3.2, §5.1.
- Grad-cam: visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pp. 618–626. Cited by: §2, §4.1.
- Exact epidemic models on graphs using graph-automorphism driven lumping. Journal of mathematical biology 62 (4), pp. 479–508. Cited by: §1.
- Deep inside convolutional networks: visualising image classification models and saliency maps. ICLR. Cited by: §2.
- Counterfactual explanations of machine learning predictions: opportunities and challenges for ai safety. In SafeAI@ AAAI, Cited by: §1.
-  Graph attention networks. In 6th International Conference on Learning Representations, ICLR 2018, Conference Track Proceedings, Cited by: §1, §1, §2, §2, §2, §2.
-  PGM-explainer: probabilistic graphical model explanations for graph neural networks. In Advances in Neural Information Processing Systems, Cited by: Appendix E, Appendix E, §4.3.2, §5.
- Comparison of descriptor spaces for chemical compound retrieval and classification. In Sixth International Conference on Data Mining (ICDM’06), pp. 678–689. Cited by: §5.
- An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica 2 (4), pp. 385–393. Cited by: §4.2.
- Graph neural networks for automated de novo drug design. Drug Discovery Today. Cited by: §1.
- Adversarial attacks and defenses in images, graphs and text: a review. International Journal of Automation and Computing 17 (2), pp. 151–178. Cited by: §2.
- Topology attack and defense for graph neural networks: an optimization perspective. arXiv preprint arXiv:1906.04214. Cited by: §2.
- A deeper graph neural network for recommender systems. Knowledge-Based Systems 185, pp. 105020. Cited by: §1.
- GNNExplainer: generating explanations for graph neural networks. In Advances in Neural Information Processing Systems, Vol. 32. Cited by: Appendix E, Appendix E, Appendix E, Appendix F, Appendix F, §1, §2, §2, §2, §4.3.2, §4.3.2, §5, §5.
- Xgnn: towards model-level explanations of graph neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 430–438. Cited by: §1, §2, §2.
- On explainability of graph neural networks via subgraph explorations. arXiv preprint arXiv:2102.05152. Cited by: Appendix F, §4.3.2, §5.1, §5.
- Visualizing and understanding convolutional networks. In European conference on computer vision, pp. 818–833. Cited by: §4.1.
- Top-down neural attention by excitation backprop. International Journal of Computer Vision 126 (10), pp. 1084–1102. Cited by: §2.
-  Link prediction based on graph neural networks. In Advances in Neural Information Processing Systems, Cited by: §1.
- Adversarial attacks on neural networks for graph data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2847–2856. Cited by: §2.
- Adversarial attacks on graph neural networks via meta learning. arXiv preprint arXiv:1902.08412. Cited by: §2.
- Certifiable robustness and robust training for graph convolutional networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 246–256. Cited by: §4.1.
Appendix A Illustration of RCExplainer’s training
Appendix B Node classification
Our method is directly applicable to the task of node classification with a few simple modifications. Instead of extracting Linear Decision Boundaries (LDBs) in the feature space of graph embeddings, we operate on the feature space of node embeddings obtained after the last graph convolution layer. We use the greedy method described in Equation (5) to find the decision regions for each class, except that, for node classification, the coverage functions in Equation (5) count nodes rather than graphs.
The next step, training the explanation network to generate counterfactual explanations for node classification, is identical to the procedure described in Section 4 except for one difference. Since a node’s prediction is only influenced by its local neighborhood, we only need to consider the computation graph of the given node while generating the explanation. The computation graph of a node is its multi-hop neighborhood, where the number of hops equals the number of graph convolution layers in the given GNN model. In other words, the GNN performs one step of message passing per graph convolution layer during the forward pass, and thereby convolves exactly this neighborhood of the given node. Hence, the output of the explanation network is a mask over the adjacency matrix of the computation graph of the given node. The edges with mask values greater than 0.5 are chosen from the computation graph to form the explanation subset that explains the original node classification prediction.
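The two operations in this paragraph, restricting attention to the computation graph of a node and thresholding the predicted mask at 0.5, can be sketched as follows. The function names, the adjacency-list representation and the dictionary-based edge mask are illustrative assumptions, not the authors' code.

```python
from collections import deque

def k_hop_nodes(adj, node, num_layers):
    """Nodes within `num_layers` hops of `node`, i.e. its computation graph.

    adj: dict mapping each node to an iterable of its neighbours.
    num_layers: number of graph convolution layers of the GNN.
    """
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        if dist[u] == num_layers:
            continue  # do not expand beyond the receptive field
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def explanation_edges(mask, threshold=0.5):
    """Keep the edges whose predicted mask value exceeds the threshold."""
    return {edge for edge, weight in mask.items() if weight > threshold}
```

For a path graph 0-1-2-3-4 and a GNN with two layers, the computation graph of node 0 contains nodes 0, 1 and 2; nodes 3 and 4 cannot influence its prediction.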
Appendix C Interpreting individual boundaries
We present a case study to demonstrate that our method can be adapted to answer the question, “Which substructures make the samples of one class different from the samples of another specific class, and therefore can be masked to flip the prediction between the two given classes?”. This is useful in various fields. For instance, in drug discovery, where the classes correspond to the possible chemical properties of a drug compound, researchers are often interested in understanding the role of chemical structures that result in a prediction of a particular property instead of another specific one. This is also especially helpful for debugging in cases where one expects a particular output for the given input but the GNN’s prediction does not agree with the expectation.
This case corresponds to a more constrained setting of counterfactual explanations as the target prediction is also predetermined. Let the original predicted label and the target label on a given graph be denoted by and respectively. Since our method explicitly models the boundaries separating the samples of one class from the samples of other classes, our method can be easily adapted to answer such questions. If we are able to only interpret the boundary separating the samples of the given two classes, this would allow us to uncover the substructures that make the samples of first class different from the samples of the other class. To address this, we modify the loss terms to
where refers to the specific boundary in the set separating the samples with predicted label from the samples with the predicted label . Since we are only concerned about changing the outcome from to , we need to consider only the boundary separating these classes while formulating the loss for the network.
We verify this on a synthetic graph classification dataset with 3 classes, , and such that each graph sample contains exactly 2 motifs. Both the motifs jointly determine the class because each possible pair of classes share exactly one motif as shown in Figure 4(a). We show explanation results produced by RCExplainer on an instance of class in Figure 4(b). For a given graph sample of class , we separately find explanations with respect to each of the two boundaries and , separates from , while separates from . We can see in the Figure 4(b) that optimizing our method w.r.t correctly identifies the motif (ABCD) in the sample that is not associated with the class . The other motif (EFGH) which is also associated with the is not considered important by the method. When we find the explanation for the same graph sample but with respect to the boundary , the results are opposite and align with the expectations. In this case, the motif (EFGH) that is not associated with is highlighted instead of the motif (ABCD). We observe similar behavior on the instances of other classes where interpreting an instance with respect to a single boundary correctly identifies the motif that identifies the given class from the other class.
In conclusion, the above case study demonstrates that our method can highlight the motif unique to the originally predicted class by interpreting the single boundary separating it from the target class. Removing the highlighted motif from the given sample lowers the confidence of the original predicted label while increasing the confidence of the target class.
Appendix D Proof: Decision region extraction is an instance of SCSC optimization
Now we prove that the optimization problem in Equation (4) is an instance of Submodular Cover Submodular Cost (SCSC) problem.
The Equation (4) can be written as
Maximizing denotes maximizing the coverage of the set of boundaries for the samples of class denoted by . This can be seen as minimizing which denotes the number of graph samples of class that are not covered by and thus exclusive to . in the constraint is equal to that denotes the set of graph samples in the dataset that do not belong to the class .
Let us denote by function and by . To prove that the optimization problem in Equation (14) is an instance of SCSC problem, we prove the functions and are submodular with respect to .
For function to be submodular with respect to , we show that for any two arbitrary sets of LDBs denoted by and , if then
is always satisfied for a linear decision boundary .
As discussed in Section 4 the LDBs in induce a convex polytope that has the maximum coverage of samples of class . Adding a new boundary to may remove (separate) some samples of class from and lower its coverage. This reduction in coverage is denoted by the term on the left hand side of Equation (15). Similarly the term on the right hand side of Equation (15) denotes the reduction in coverage for the subset .
Now, since , the set of graph samples contained in the polytope is a subset of the graph samples contained in the polytope . Hence, adding a new LDB to cannot remove fewer samples from the polytope than it removes from the polytope . Therefore, the function is submodular with respect to .
Similarly, we can prove the function to be submodular with respect to . This concludes the proof.
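The diminishing-returns argument for the first function can be checked numerically on a toy example. The sketch below is an illustration under stated assumptions, not part of the proof: samples of one class are modelled as random 2-D points, each LDB as a half-space `w.x + b >= 0`, and the function under test counts the samples cut off by at least one boundary of a set. All names and the random data are hypothetical.

```python
import itertools
import random

def uncovered(points, boundaries):
    """Number of points cut off by at least one boundary (half-space w.x + b >= 0)."""
    return sum(
        any(w[0] * x + w[1] * y + b < 0 for (w, b) in boundaries)
        for (x, y) in points
    )

def diminishing_returns_holds(points, pool):
    """Check gain(P, h) >= gain(Q, h) for every P subset of Q and every h not in Q."""
    idx = range(len(pool))
    for r in range(len(pool) + 1):
        for Q in itertools.combinations(idx, r):
            for s in range(r + 1):
                for P in itertools.combinations(Q, s):
                    b_p = [pool[i] for i in P]
                    b_q = [pool[i] for i in Q]
                    for h in idx:
                        if h in Q:
                            continue
                        gain_p = uncovered(points, b_p + [pool[h]]) - uncovered(points, b_p)
                        gain_q = uncovered(points, b_q + [pool[h]]) - uncovered(points, b_q)
                        if gain_p < gain_q:
                            return False
    return True

rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(50)]
pool = [((rng.uniform(-1, 1), rng.uniform(-1, 1)), rng.uniform(0.0, 1.0))
        for _ in range(5)]
print(diminishing_returns_holds(points, pool))
```

The check succeeds for any choice of points and boundaries, because the set of samples cut off by a set of boundaries is the union of the sets cut off by each boundary, and the size of a union of sets is a standard example of a submodular function.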
Appendix E Implementation details
Table 2 shows the properties of all the datasets used in experiments. The last row corresponds to the test accuracy of the GCN model we train on the corresponding dataset.
|Dataset||BA-shapes||BA-Community||tree-cycles||tree-grid||BA-2motifs||Mutagenicity||NCI1|
|# of Nodes (avg)||700||1400||871||1020||25||30.32||29.87|
|# of Edges (avg)||2050||4460||970||2540||25.48||30.77||32.30|
|# of Graphs||1||1||1||1||700||4337||4110|
|# of Classes||4||8||2||2||2||2||2|
|Base||BA graph||BA graph||Tree||Tree||BA graph||—||—|
|License||Apache 2.0||Apache 2.0||Apache 2.0||Apache 2.0||—||—||—|
For the baselines, we use the publicly available implementations provided in (Ying et al., 2019; Holdijk et al., 2021; Vu and Thai, ; Liu et al., 2021) to obtain the results. The implementation of GNNExplainer provided by (Ying et al., 2019) is licensed under the Apache 2.0 license, while the implementation of SubgraphX provided by (Liu et al., 2021) is licensed under the GNU General Public License v3.0. We use the default parameters provided by the authors for the baselines.
For the local baseline RCExp-NoLDB, we use the same setup as RCExplainer except that we do not use LDBs for training the explanation network . The loss function denoted by for this baseline aligns with the loss functions of GNNExplainer and PGExplainer, except that we introduce a second term to enforce the counterfactual characteristics. We directly maximize the confidence of the original predicted class on the masked graph and minimize the confidence of the original predicted class on the remainder graph . can be expressed as:
corresponds to the conditional probability distribution learnt by the GNN model for as the input graph.
corresponds to the random variable representing the set of classes and is the random variable representing possible input graphs for the GNN . Here is a hyperparameter that represents the weight of the second term in the loss function. The loss is jointly minimized with the regularizers and specified in Section 4.
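Since the displayed loss is not reproduced above, the following sketch shows one form that matches the verbal description: a first term that rewards high confidence of the original class on the masked graph, and a weighted second term that rewards low confidence of that class on the remainder graph. The use of negative log-likelihood terms, the function name and the argument names are assumptions made for illustration, so this should not be read as the exact loss used for RCExp-NoLDB.

```python
import math

def noldb_confidence_loss(p_masked, p_remainder, weight, eps=1e-12):
    """Illustrative two-term loss for one graph.

    p_masked:    model confidence of the original class on the masked
                 (explanation) graph; higher is better.
    p_remainder: model confidence of the original class on the remainder
                 graph; lower is better.
    weight:      weight of the second (counterfactual) term.
    """
    keep_term = -math.log(max(p_masked, eps))
    drop_term = -math.log(max(1.0 - p_remainder, eps))
    return keep_term + weight * drop_term
```

With this form, the loss is zero only when the explanation alone fully preserves the prediction and removing it fully removes the confidence of the original class.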
We follow (Ying et al., 2019; Luo et al., ) and use the same architecture to train a GNN model with 3 graph convolution layers for generating explanations on each dataset. Consistent with prior works (Ying et al., 2019; Luo et al., ; Vu and Thai, ), we use (80/10/10)% random split for training/validation/test for each dataset.
We use the Adam optimizer to tune the parameters of and set learning rate to . We train our method for epochs. For node classification datasets, we set to , to and to . For graph classification datasets, we set to , to . is set to for BA-2motifs and NCI1, and to for Mutagenicity. We also scale the combined loss by a factor of for all the datasets. The number of LDBs to be sampled from the GNN for each class is set to . Empirically, we find that this is enough, as the subset of LDBs selected greedily from this set is able to cover all the samples of the given class. Our codebase is built on top of the implementations provided by (Ying et al., 2019; Luo et al., ).
All of the experiments are conducted on a Linux machine with an Intel i7-8700K processor and an NVIDIA GeForce GTX 1080 Ti GPU with 11GB memory. Our code is implemented using Python 3.8.5 with PyTorch 1.8.1 that uses CUDA version 10.0.
Appendix F Additional experiments
Fidelity. As described in Section 5, the counterfactual characteristic of an explanation is measured using fidelity as the evaluation metric. It is defined as the drop in confidence of the original predicted class after masking the produced explanation in the original graph (Pope et al., 2019). Since we produce explanations as edges, we mask the edges in the input graph to calculate the drop. Fidelity for the input graph and the produced explanation is formally written as
where denotes the class predicted by for . As discussed in Section 5, explanations are mostly useful, if they are sparse (concise). Sparsity is defined as the fraction of total edges that are present in but not in :
However, since approaches like SubgraphX and PGM-Explainer do not report the importance ranking of edges of , it is not feasible to completely control the edge sparsity of the desired explanation. Hence, we take samples with a similar sparsity level for comparison. Consistent with prior works (Ying et al., 2019; Luo et al., ), we compute fidelity for the samples that are labelled positive. For instance, in the Mutagenicity dataset, we compute fidelity for the compounds that are labelled as mutagenic. The results are presented in Figure 1.
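The fidelity and sparsity definitions above can be sketched as follows. The helper names are hypothetical, and the two confidence values are assumed to come from running the GNN on the original graph and on the graph with the explanation edges masked.

```python
def fidelity(conf_original, conf_after_masking):
    """Drop in confidence of the original predicted class after masking
    the explanation edges in the input graph."""
    return conf_original - conf_after_masking

def sparsity(all_edges, explanation_edges):
    """Fraction of the graph's edges that are NOT part of the explanation."""
    all_edges = set(all_edges)
    outside = all_edges - set(explanation_edges)
    return len(outside) / len(all_edges)
```

For example, an explanation that uses 2 of 10 edges has sparsity 0.8, and if masking those 2 edges lowers the confidence of the predicted class from 0.95 to 0.30, the fidelity is 0.65. A good counterfactual explanation scores high on both.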
As reported in Figure 1, the results obtained for SubgraphX are significantly lower than those reported by Yuan et al. (2021). We believe this is the result of the problematic setting adopted in (Yuan et al., 2021) and implemented in (Liu et al., 2021) for computing the fidelity. Specifically, while computing the drop in confidence, the features of the nodes present in the explanation are set to 0 without removing the edges incident on these nodes. Since message passing is still allowed on these edges, the first graph convolution updates the representations of these nodes to non-zero values. As the features of these nodes are then no longer zero, the subsequent graph convolutions also allow these nodes to participate in updating the representations of their neighboring nodes.
A simple fix to this problem is to mask the edges incident on these nodes while computing the fidelity. This ensures that these nodes do not participate in message passing irrespective of the number of graph convolutions. Adopting this setting gives the results reported in Figure 1. We also note that masking only the edges, without setting the node features to zero, yields similar performance for SubgraphX as reported in Figure 1.
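The leakage described above can be reproduced on a three-node path graph. The layer below is a deliberately simplified stand-in for a graph convolution (a plain average over a node and its neighbours, with no learned weights or bias), which is an assumption made to keep the example self-contained; with a bias term, even an isolated node could receive a non-zero representation.

```python
def gcn_layer(adj, feats):
    """One simplified graph convolution: average over self and neighbours.

    adj:   list of neighbour lists, one per node.
    feats: list of per-node feature lists.
    """
    out = []
    for i, neighbours in enumerate(adj):
        group = [i] + list(neighbours)
        dim = len(feats[i])
        out.append([sum(feats[j][d] for j in group) / len(group)
                    for d in range(dim)])
    return out

# Path graph 0 - 1 - 2; node 1 plays the role of the explanation to mask.
adj = [[1], [0, 2], [1]]
zeroed = [[1.0], [0.0], [1.0]]   # features of node 1 set to zero

# Setting A: features zeroed, incident edges kept.
leaky = gcn_layer(adj, zeroed)

# Setting B: features zeroed AND incident edges removed, two layers deep.
adj_cut = [[], [], []]
sealed = gcn_layer(adj_cut, gcn_layer(adj_cut, zeroed))

print(leaky[1], sealed[1])
```

In setting A, node 1 receives the average of its neighbours' features after a single layer, so it is no longer masked from the second layer onwards. In setting B, its representation stays at zero regardless of depth.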
Robustness to noise. As discussed in Section 5, we use AUC to compare the robustness of different methods. AUC is defined as the area under the receiver operating characteristic (ROC) curve of a binary classifier. We consider the top-k edges of the explanation produced for the input graph as the ground truth. After we obtain the explanation for the noisy sample, we formulate the comparison as a binary classification problem. Each edge in the noisy sample is labeled positive if it is present in the top-k edges of the explanation produced for the original graph, and negative otherwise. The mask weight that the explanation network predicts for an edge of the noisy sample is treated as the probability of that edge being classified as positive. Owing to space limits, Section 5 reports the results for only one value of k. Figure 5 reports the results for two further values of k, where we observe a trend similar to that in Figure 2. RCExplainer outperforms the rest of the methods by a large margin on BA-2motifs and NCI1.
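The AUC-based robustness score can be sketched as follows. The dictionary representation of edge weights and the function names are assumptions for illustration. The AUC is computed through its rank interpretation, namely the probability that a randomly chosen positive edge receives a higher score than a randomly chosen negative edge, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative edge")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def robustness_auc(orig_weights, noisy_weights, k):
    """Score the noisy explanation against the top-k edges of the original one.

    orig_weights:  dict edge -> mask weight for the original graph.
    noisy_weights: dict edge -> mask weight for the perturbed graph.
    """
    top_k = set(sorted(orig_weights, key=orig_weights.get, reverse=True)[:k])
    edges = list(noisy_weights)
    labels = [1 if e in top_k else 0 for e in edges]
    scores = [noisy_weights[e] for e in edges]
    return roc_auc(labels, scores)
```

Only edges that exist in the perturbed graph are scored, so edges deleted by the noise drop out of the evaluation and newly added edges are always labeled negative.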
Since AUC evaluation requires that the explanation method output importance weights for the edges of a noisy sample, we cannot use it to compare approaches like SubgraphX and PGM-Explainer that do not provide this information. Therefore, to provide a more comprehensive comparison, we use node accuracy as a measure to compare all the baselines. For calculating node accuracy, we consider the top-k important nodes in the explanation for the original graph as the ground truth and compare them with the top-k important nodes obtained through the explanation for the noisy graph. However, the challenge is that GNNExplainer, PGExplainer, RCExp-NoLDB and RCExplainer do not rank nodes based on their importance. To address this, we use edge weights to obtain the node weights. We approximate the node weights as:
where denotes the weight of the node and is the weighted adjacency mask predicted by . We believe this is a valid approximation because, for an important edge to exist in the explanation subgraph, the nodes connected by this edge must also be considered important and be present in the explanation subgraph. Using these node weights, we obtain the ground-truth set of nodes by picking the top-k important nodes of the explanation on the original graph. Comparing the top-k important nodes of the explanation on the noisy graph with the ground-truth set of nodes gives us the node accuracy.
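A possible reading of this approximation is sketched below. Because the displayed formula is not reproduced above, the aggregation rule (a node inherits the largest mask weight among its incident edges) and the accuracy definition (fraction of the top-k ground-truth nodes recovered) are both assumptions chosen to match the verbal justification. The function names are hypothetical.

```python
def node_weights(edge_mask):
    """Node weight = largest mask weight among the edges touching that node."""
    weights = {}
    for (u, v), w in edge_mask.items():
        weights[u] = max(weights.get(u, 0.0), w)
        weights[v] = max(weights.get(v, 0.0), w)
    return weights

def top_k_nodes(edge_mask, k):
    """The k highest-weighted nodes; ties are broken by node id."""
    w = node_weights(edge_mask)
    return set(sorted(w, key=lambda n: (-w[n], n))[:k])

def node_accuracy(orig_mask, noisy_mask, k):
    """Fraction of the top-k nodes of the original explanation that are
    also among the top-k nodes of the noisy explanation."""
    truth = top_k_nodes(orig_mask, k)
    found = top_k_nodes(noisy_mask, k)
    return len(truth & found) / k
```

Taking the maximum means that a single highly weighted edge is enough to mark both of its endpoints as important, which mirrors the argument that the endpoints of an important edge must themselves be present in the explanation subgraph.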
We present the node accuracy plots for in Figure 6. We also note that the comparison is not completely fair to GNNExplainer, PGExplainer and our method because of the approximation used to extend these methods for computing node level accuracy. Despite the approximation, our method significantly outperforms all the other methods. GNNExplainer, PGM-Explainer and SubgraphX perform consistently worse, as expected, because they optimize each sample independently to obtain an explanation.
Another way to compare the robustness of all the approaches would be to compute accuracy at the edge level instead of the node level. This would require extending SubgraphX and PGM-Explainer to obtain important edges from the explanation subgraph. However, this is more challenging, as SubgraphX only provides a subgraph as an output. To obtain the top-k important edges, we could randomly sample k edges from a returned explanation subgraph that contains slightly more than k edges. The random sampling would make this evaluation more approximate and would perhaps further degrade the performance of SubgraphX; therefore, we do not report these results.
We evaluate our method on four synthetic node classification datasets used by GNNExplainer (Ying et al., 2019), namely BA-shapes, BA-Community, tree-cycles and tree-grid. Following (Ying et al., 2019; Luo et al., ), we formalize the explanation problem as binary classification of edges and adopt the area under the ROC curve (AUC) as the evaluation metric. This evaluation is only possible for synthetic datasets, where we can consider the motifs as reasonable approximations of the explanation ground truth. The edges that are part of a motif are labelled positive and the rest of the edges are labelled negative during the evaluation. We show the results in Table 3, where our method achieves close to the optimal AUC on all of the datasets. This indicates that our method captures the behavior of the underlying GNN well and produces consistently accurate explanations that justify the original predictions. Please note that PGM-Explainer does not provide edge weights, so AUC is not applicable to it. Also, since the implementation of CF-GNNExplainer is not available, we only report those results that are available in (Lucic et al., 2021).
Appendix G Hyperparameter analysis
Number of LDBs.
As mentioned in Section 4, we sample LDBs from the decision logic of the GNN to form a candidate pool from which some boundaries are selected by the greedy method. In Figure 7, we show the effect of the number of sampled candidate boundaries on the performance on the BA-Community dataset. As we increase the number of sampled LDBs from 10 to 50, the fidelity improves, and saturation is achieved once 50 LDBs are sampled. This is consistent with expectations: as more boundaries are sampled, the quality of the decision region improves. Once there are enough boundaries to yield a good decision region after greedy selection, the performance saturates.
In Figure 8, we show the effect of the hyperparameter introduced in Equation (8). We show the fidelity of our method on the BA-2motifs dataset for different values of ranging from 0.01 to 0.90. The fidelity results are worst for as the second term of the loss, which enforces the counterfactual behavior of explanations, then receives very little weight in the combined loss. Setting gives the best results for fidelity.
Appendix H Qualitative results
Qualitative results. We present sample results produced by GNNExplainer, PGExplainer and RCExplainer in Table 4. Our method consistently identifies the right motifs with high precision and is also able to handle tricky cases. For instance, in Figure (q), our method is able to identify the right motif in the presence of another “house-structure”. The other structure contains the query node, but since it also contains nodes from the other community, it is not the right explanation for the prediction on the given query node. In Figure (t), our method correctly identifies both NO2 groups present in the compound and, as discussed before, NO2 groups attached to carbon rings are known to make compounds mutagenic (Debnath et al., 1991). The edges connecting nitrogen (N) atoms to carbon (C) atoms are given the highest weights in the explanation. This is intuitive in a counterfactual sense, as masking these edges would detach the NO2 groups from the carbon ring and push the prediction for the compound towards “Non-Mutagenic”.