1 Introduction
Graphs are widely used data structures involved in many real-world problems. Graph neural networks (GNNs) [scarselli_graph_2009] are artificial neural networks suited for such data structures. For graph classification, node classification or link prediction tasks, GNN models have shown impressive performances [defferrard_convolutional_2016, xu_spatiotemporal_2019, zhang_link_2018]. Artificial neural networks, including GNNs, are increasingly used in daily-life tasks. GNN models show impressive results in drug design [bapst_unveiling_2020], web recommendation [ying_graph_2018] or traffic forecasting [derrowpinion_eta_2021]. A major drawback of these deep models is their occluded internal decision process. For many daily usages of such models, in particular in critical applications, this raises confidence, trustworthiness, privacy and security concerns. Explainable AI (XAI) is a set of methods that aims to tackle these issues by providing human-level meaningful insights about deep model internals. Nonetheless, understanding and interpreting decisions remain human-relative and context-dependent notions. One of these social-related requirements is that an explainer must adapt its explanation formulation according to the relative background of the explainee regarding the phenomenon to explain. Several interesting XAI methods have been proposed for explaining graph neural network models, but they often fail to take this social dependency into account when providing their explanations. In this contribution, we provide a social-aware explaining method that leverages the background knowledge variability inherent in any social-related process while maintaining high scores on state-of-the-art objective assessment metrics. We first frame the social context that the explanation process depends on. We then introduce our approach and demonstrate its relevance against compared methods.
2 Related Work
The explaining process has been intensively investigated numerically despite the lack of common ground on what an explanation is and how explanations should be compared. Moreover, several methods have been designed according to different paradigms. XGNN [yuan_xgnn_2020] is a model-level approach that iteratively generates explaining graphs through a reinforcement learning procedure. During this sequential process, each graph is grown from the previous one according to the learned policy. Starting from an empty graph, nodes and related features are incorporated until a sufficiently long graph sequence has been generated. The state space of this Markov decision process can be assimilated to the Cartesian product of the node space and the finite feature space, which is a finite-dimensional space. Under these conditions, a sufficiently long graph sequence leads to an optimal solution. GNNExplainer [ying_gnnexplainer_2019] is a mask generator model based on mutual information optimization. It starts with randomly initialized node and node-feature masks that are jointly optimized, through mutual information, against the class label of the assessed graph. PGExplainer [luo_parameterized_2020] is also based on maximizing the mutual information between a class label and a graph that contributes highly to the GNN prediction. Explained graphs are sampled from a probability distribution whose parameters are learned using a multilayer perceptron. SubgraphX [yuan_explainability_2021] derives connected subgraph sequences from the input graph to overcome the breaks in relevant information flow that may arise in PGExplainer or GNNExplainer. To address this generation problem, SubgraphX uses a Monte Carlo Tree Search and assigns a Shapley value to each subgraph; the higher the Shapley value, the more pertinent the subgraph is for explainability purposes. Layer-wise Relevance Propagation for GNNs [schnake_higherorder_2021] is an adaptation of the original LRP implementation. For computing relevance, LRP-GNN adopts a walk-based approach, introducing node anteriority and applying the original LRP propagation rule. Assessing the quality of these methods also remains a core challenge for the XAI community, and some metrics have been proposed. They deal with explanation fidelity towards the explained deep model. Likewise, a sparsity measure is used to show the compactness of an explanation. Although the aforementioned methods show interesting results, they often miss social-dependent parameters, which are of central importance in any human-related explanation process. We tackle this issue by supplying an efficient explaining method for graph neural networks that takes such aspects into account.
3 Problem formulation
Explaining is a human knowledge transfer process involving an explainer and an explainee concerning a phenomenon. In order to have a profitable conversation (e.g. providing the explanation of the phenomenon from the explainer to the explainee), both individuals must share a common vocabulary set. It means that shared ideas must be expressed upon a set of concepts shared by both individuals. This allows the conversation to be profitable for them. For explanation purposes, the term profitable means increasing the knowledge quantity of the explainee thanks to the explanation. For explaining, these concepts are framed as atomic parts that, when carefully mixed, allow the explainer to provide an explanation of the phenomenon to the explainee. However, these elementary bricks are chosen conditionally to the knowledge quantity of both individuals, which also depends on the phenomenon. Indeed, if the explainee already has a solid background or culture relative to the phenomenon, the basic insights allowing a shallow understanding of it are already acquired; only finer details must be provided by the explainer for the explainee to reach a total understanding. On the contrary, an explainee who has freshly begun to be interested in the phenomenon must assimilate its coarse concepts before reaching the finest ones, with the explainer having to adapt its vocabulary complexity in order to remain understandable.
3.1 GNN explanation framework
Graph Neural Network
A graph G = (V, E) is a couple of two sets. The set V, of size n, is the set of nodes of G; we abusively denote by n the size of G, which is the cardinal of V. The set E ⊆ V × V is the set of edges describing the topology of G. This set can be fully encoded by its adjacency matrix A ∈ {0, 1}^{n×n}, defined such that A_{ij} = 1 if (i, j) ∈ E and A_{ij} = 0 otherwise. A graph is said to be undirected if its adjacency matrix is symmetric and directed otherwise. In the context of graph representation learning, G is rather seen as a domain whose structure is determined by the topology of E (i.e. described thanks to A), together with a signal X supported on V and valued in a finite-dimensional Hilbert space. Under this paradigm, graphs are couples (A, X). One singularity of this data structure is that node ordering does not matter. Graphs are very rich mathematical objects that are widely used for representing real-world problems. There is increasing interest from the community in integrating such data structures into deep learning frameworks. Powerful graph signal encoders have been proposed. They leverage, at the same time, deformation stability and scale separation, ubiquitous notions in modern deep learning approaches, and the aforementioned permutation insensitivity. Such models are named graph neural networks (GNNs). For instance, inspired by CNNs, GCN [kipf_semisupervised_2017] proposes a learning module which follows a non-parametric, local, permutation-invariant signal aggregation scheme. This module has been extended by the GAT [velickovic_graph_2018] module, which proposes an attentional node-pairwise interaction scheme for encoding local signals. These modules show strong results for node classification, graph classification and link prediction problems.
Supervised graph classification problems
For two measurable spaces X and Y, we define F(X, Y) as the set of measurable functions going from X to Y. We are given an i.i.d.-sampled finite dataset D = {(G_i, y_i)}_{i=1}^{N} where each element is a graph G_i and y_i is its label, representing the class it belongs to. A loss function is a mapping ℓ : Y × Y → ℝ₊ quantifying how well a learned mapping f_θ associates a graph to its true label, conditioned by a neural network architecture f and a learning parameter θ. For a given architecture f, we seek θ* such that:

θ* = argmin_θ E_{(G, y)} [ ℓ(f_θ(G), y) ]   (1)

where (G, y) is unseen data, G is a graph-valued random variable, y is a label-valued random variable, and the expectation is taken under their image probability measure. In the context of graph classification, f_θ is a GNN model and ℓ is the cross-entropy loss between the inferred label conditional probability law and its ground-truth conditional probability law.
Explaining graph classification models
Like modern deep learning models, GNN models leverage signals evolving on domain localities describing elementary low-level representations, which are recursively combined (producing higher-level representations) so as to encompass all those sub-representations and carefully encode them until the output is produced. At a given model depth, such sub-representations are, from a domain perspective, subgraphs of the input graph, and from a signal perspective, sub-parts of the signal evolving on those subgraph structures. Under the aforementioned graph learning paradigm, we define the subgraph G' of a graph G as follows. Considering a node subset V' ⊆ V, the subgraph G' of G is such that its node set is V' and its adjacency matrix A' has the same size as A but respects the new node adjacency distribution induced by V'. More generally, explaining methods that deal with signals learned by a deep model often focus on finding relevant subdomains and associated signals that preserve model abilities (e.g. model performance, model expressivity, etc.) without taking into account the social aspect underlying any explanation procedure. The design of EiXGNN has this social-aware feature as a key component which, as far as we know, state-of-the-art methods do not consider.
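As an illustration of the subgraph notion above, the following sketch extracts the adjacency structure induced by a node subset. It is our own minimal rendition in plain Python: note that the paper's A' keeps the size of A, whereas this helper returns the compact induced matrix.

```python
def induced_subgraph(adj, nodes):
    """Return the adjacency matrix of the subgraph induced by `nodes`.

    `adj` is a dense 0/1 adjacency matrix (list of lists); `nodes` is an
    ordered list of node indices kept from the original graph.
    """
    return [[adj[u][v] for v in nodes] for u in nodes]

# Toy 4-node path graph 0-1-2-3.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

# Keeping nodes {0, 1, 3} retains only edge 0-1: node 3 loses its
# neighbour 2, which was not selected.
sub = induced_subgraph(A, [0, 1, 3])
```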
4 EiXGNN
EiXGNN (EigenCentrality eXplainer for Graph Neural Networks) provides its explanations according to a set of atomic concepts. These concepts are to explanation processes what coins are to money exchanges: they are the elementary parts of the explanation process upon which explainers, when explaining, build their arguments. These concepts must be carefully chosen by the explainer in order to match the explainee's background on the explained phenomenon. Assuming that the explainer has an optimal knowledge of the phenomenon regardless of concept selection, the concept selection process depends on the background of the explainee relative to the phenomenon. EiXGNN has been designed to integrate this social dependency on the explainee's background given a phenomenon to explain. Formally, we frame the set of explainee-admissible concepts as a probability space where concepts are graph-valued random variables. The parameter p is the explainee concept assimilability constraint; it is bounded as 0 < p ≤ 1 and is proportional to the explainee's concept assimilability given the phenomenon. We define the phenomenon as a random variable.¹ In the following, except where otherwise mentioned, we consider the phenomenon defined by an optimized GNN classifier f trained on a dataset D and a graph G belonging to D. We consider that the graph G is composed of n nodes and m edges. We also assume that the explainee has an explainee concept assimilability constraint p. EiXGNN provides its explanation based on a conditioned local and global explainee-suited concept ordering. Firstly, we introduce the concept generation procedure; then the global concept ordering process, which is the common thread of the overall explaining procedure, is described. Finally, the local concept ordering procedure is presented; this second step is a refining procedure that sharpens the provided explanation at the node level.
¹ Note: since a concept is a graph-valued random variable, it follows that its node set and edge set are themselves random variables; the concept probability space can thus be seen as a product of probability spaces.
4.1 Concept generation
As mentioned above, concepts are the atomic elements that allow the explainer to provide its explanation. Given the explainee concept assimilability p, a concept C is a graph-valued random variable: a subgraph of G whose size is determined by p. From a signal point of view, it describes a sub-part of the signal evolving on a subdomain of G. To generate these concepts, we have selected sampling approaches which may or may not depend on a prior distribution. Sampling a concept is thus a subgraph sampling process, with the combinatorial aspect inherent to any subgraph sampling problem. Concepts are key components of our approach and have to be carefully selected, since they provide the raw material for conceiving explanations. Among all the subgraphs we can derive from G, some are better suited than others for providing an explanation. Assuming a uniform relevance distribution for explanation purposes among all those subgraphs is not adapted; thus a uniform sampling distribution is not adapted either. We rather consider a light importance sampling approach that quantifies a prior relevance distribution over nodes. To build this probability distribution, we apply a node ablation approach that assesses the importance of each node within its neighborhood. Formally, for a node v and a neighboring node u of v, we define a random variable D(v, u) that measures the relative disturbance effect between the two nodes (e.g. the relative performance alteration induced by removing u from the neighborhood of v). Assuming a uniform relevance distribution over the nodes composing each neighborhood, we define the prior relevance distribution of node v by:

p(v) = (1/Z) Σ_{u ∈ N(v)} D(v, u)   (2)

with Z a normalizing constant such that we obtain a prior node importance probability distribution, allowing a more efficient sampling process for determining pertinent concepts. Once this prior distribution is determined, we sample in an i.i.d. manner L realizations of C, denoted C_1, …, C_L, where each node composing each subgraph has been sampled according to the prior node sampling distribution. Next, we present the procedure for hierarchizing these concepts. The impact of the values taken by L and p will be further investigated in the supplementary material.
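A minimal sketch of this concept generation step, with hypothetical ablation scores standing in for the disturbance measure D(v, u): the scores are normalized into a prior distribution, and each concept is a fixed-size node subset drawn without replacement according to that prior.

```python
import random

def node_prior(scores):
    """Normalise per-node ablation scores (the Eq. 2 numerators) into a
    prior sampling distribution over nodes."""
    z = sum(scores)
    return [s / z for s in scores]

def sample_concepts(prior, n_nodes, n_concepts, size, rng):
    """Draw `n_concepts` i.i.d. node subsets of `size` nodes each,
    sampling nodes without replacement according to `prior`."""
    concepts = []
    for _ in range(n_concepts):
        pool = list(range(n_nodes))
        weights = list(prior)
        picked = []
        for _ in range(size):
            i = rng.choices(range(len(pool)), weights=weights, k=1)[0]
            picked.append(pool.pop(i))
            weights.pop(i)
        concepts.append(sorted(picked))
    return concepts

# Hypothetical ablation scores: node 1 disturbs the model most, so it
# is the most likely node to appear inside sampled concepts.
prior = node_prior([0.5, 2.0, 1.0, 0.5])
concepts = sample_concepts(prior, n_nodes=4, n_concepts=3, size=2,
                           rng=random.Random(0))
```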
4.2 Global concept ordering
Once the L concepts are sampled, we must find an ordering relationship in order to rank their relevance for explaining the phenomenon. Thanks to the prior node importance sampling approach, we have already established such a hierarchization, but only among all possible subgraphs of a given size, which considerably reduces the search perimeter of the optimal substructure that will explain the phenomenon. Here we present an ordering method that hierarchizes the L sampled concepts pairwise. Considering these concepts, we build an operational research tree with G as root and these concepts as leaves. Without any further work, we do not yet know whether one concept is more relevant than another for explaining the phenomenon. In order to provide such an ordering, we derive from the sampled concepts a complete graph G_C where each node represents a concept and each edge of G_C represents the relative similarity between two concepts. Since, in this context, graphs are seen as signals evolving on a structured domain, we take both aspects into account when quantifying concept similarity pairwise.
Relative concept domain similarity
We define the domain similarity between two concepts as their relative edge density. The graph edge density of a concept C, denoted d(C), is the ratio of the number of edges composing C over the total number of possible edges C could contain. For a graph with n nodes and m edges, it is defined as follows:

d(C) = 2m / (n(n − 1))   (3)
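Eq. (3) translates directly into code; for instance:

```python
def edge_density(n_nodes, n_edges):
    """Edge density d(C) = 2m / (n(n-1)) of an undirected graph with
    n nodes and m edges; it equals 1 for a complete graph."""
    if n_nodes < 2:
        return 0.0
    return 2.0 * n_edges / (n_nodes * (n_nodes - 1))

# The complete graph K4 has 6 edges, hence density 1; a 4-node path
# with 3 edges only reaches half of that.
```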
It measures how much C tends to be a complete graph. We choose this measure because of the local aggregation operation (e.g. sum) involved in many GNN models. It appears that complete substructures (i.e. subdomains) aggregate much more signal information than sparser substructures. Indeed, widely used GCN-based or GAT-based GNN models aggregate node signal representations according to node neighborhoods. The completeness of a substructure does not ensure that it is more relevant than a sparser one, but we empirically measure that node features have a significant variance (i.e. signal variance) across the node set of G. From a statistical point of view, averaging node information over many node signal representations produces a more faithful local signal representation than taking fewer node signal representations. Likewise, with the same information aggregation strategy, the error made in the relevance attribution of a node is smaller when many neighboring nodes are available than when there are far fewer. A prior favoring concepts with near-complete structure is thus better suited for low node importance attribution error than one favoring more degenerated concepts. With relative edge density, we favor concepts that are closer to complete substructures over those which are more degenerated.
Relative concept signal similarity
The concept signal similarity quantifies how similarly the model behaves when the signal is propagated over one concept subdomain and when it is propagated over the subdomain supplied by another concept. Assume we consider two concepts C_i and C_j: the case where C_i is similar to C_j given the model f means that f sees C_i and C_j equivalently. Considering C_j then provides no added value over considering C_i alone, with respect to f. As a similarity metric between two concepts C_i and C_j, we use the Kullback-Leibler divergence between the probability distributions inferred by f on C_i and C_j. Formally, we frame the behavior similarity metric concerning two concepts C_i and C_j as:

S(C_i, C_j) = D_KL( f(C_i) ‖ f(C_j) )   (4)

where D_KL denotes the Kullback-Leibler divergence.
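A sketch of this signal similarity, where `f` stands for any function mapping a concept to the model's predicted class distribution; the epsilon smoothing is our own addition to avoid log(0) and division by zero.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) between two discrete class distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def concept_signal_similarity(f, c_i, c_j):
    """Behavior similarity of two concepts: KL divergence between the
    class distributions the model `f` infers for each of them."""
    return kl_divergence(f(c_i), f(c_j))

# Two concepts the model reads identically have ~0 divergence;
# the lambda below is a stand-in for a real classifier head.
same = concept_signal_similarity(lambda c: [0.7, 0.3], "c1", "c2")
```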
Computing the relative concept domain and signal similarities of the L concepts in a pairwise manner, we obtain a pairwise concept ordering. Indeed, considering two concepts C_i and C_j with i ≠ j, the combined quantity measures how similar C_i and C_j are in both domain and signal aspects at the same time. This quantity provides a relational measurement of two given concepts. It becomes natural to consider these values as the entries of the adjacency matrix of G_C, which we now denote by W; note that W is not symmetric. In terms of both signal and domain dissimilarity, the concept that has to be considered the most is the one identified as the most dissimilar among the L concepts. In the terms of graph theory, such a concept is the one with the highest eigencentrality value among the concepts of G_C. Considering:

W̃ = W diag(W⊤e)⁻¹   (5)

as the normalized version of W, with e the unit vector of size L, the eigencentrality vector of G_C is the right eigenvector r of W̃ associated with the eigenvalue 1.² Formally, r satisfies the following equation:

W̃ r = r   (6)

² Since W̃ is a stochastic matrix, it always admits an eigenvalue equal to 1. Analogously, we can see W̃ as an irreducible and recurrent Markov chain over the L concepts, which admits r as its stationary law.

Thereby, the components of r provide a natural ordering between the L sampled concepts.
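One plausible realisation of this ordering step, computing the stationary law of the normalized similarity matrix by power iteration. The exact normalization used by EiXGNN may differ; here the matrix is made row-stochastic and iterated from the left, which yields the same stationary law the footnote describes.

```python
def eigencentrality(W, iters=1000, tol=1e-12):
    """Stationary distribution of the Markov chain obtained by
    row-normalising the non-negative similarity matrix W, found by
    power iteration; higher components mean more central concepts."""
    n = len(W)
    # Row-normalise so every row sums to 1 (stochastic matrix).
    P = [[W[i][j] / sum(W[i]) for j in range(n)] for i in range(n)]
    r = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(r[j] * P[j][i] for j in range(n)) for i in range(n)]
        z = sum(nxt)
        nxt = [x / z for x in nxt]
        if max(abs(a - b) for a, b in zip(nxt, r)) < tol:
            return nxt
        r = nxt
    return r

# Toy asymmetric-weight example with three concepts.
W = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 1.0],
     [2.0, 1.0, 0.0]]
r = eigencentrality(W)
```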
4.3 Local concept ordering
For each of the L concepts, the global concept ordering phase assigns a global score r_l to concept C_l. It means that the nodes composing C_l have a uniform contribution to explaining the phenomenon, proportional to r_l. Nonetheless, we know that in any graph learning problem each node provides a varying contribution towards the learning task, as mentioned above. Previously, we took a shallow approach to get some trends concerning the prior node relevance distribution; this computation has a complexity linear in the number of nodes composing G. Here we consider subgraphs of G which have a fixed size for all concepts. Since we deal with smaller graphs (i.e. concepts), we investigate a more precise strategy for quantifying the node relevance distribution. Given a concept C, it deals with computing the Shapley value of each node composing C. The Shapley value is a solution concept in cooperative game theory quantifying how important the marginal role of a player is in the game outcome. Consider a coalition of n players indexed within N = {1, …, n} playing a cooperative game with a measurable game payoff function v : 2^N → ℝ, where 2^N denotes the set of all subsets of N. Supplying 2^N with the counting measure, the Shapley value of a player i is defined by:

φ_i(v) = Σ_{S ⊆ N∖{i}} [ |S|! (n − |S| − 1)! / n! ] ( v(S ∪ {i}) − v(S) )   (7)

Computing such a value has a complexity of O(2^n), which is far more than the previous shallow node relevance computing procedure. In our context, we want to provide a precise concept relevance value at the node level, consistent with the approach described earlier (i.e. quantifying behavior change when removing some nodes). For a given node belonging to a concept C, computing its Shapley value requires considering all possible subgraphs of C and computing the perturbing effect of the node over each of them. As mentioned, this computation is intensive: if we assume that each payoff call takes a fixed amount of time, the total time grows exponentially with the concept size, which depends on the explainee concept assimilability constraint. We therefore adopt a Monte Carlo sampling strategy for computing the Shapley values, in order to obtain a reasonable computation time for any level of explainee concept assimilability constraint, with a quantifiable error on the approximation made.
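The Monte Carlo strategy can be sketched by averaging marginal contributions over random permutations of the players, a standard unbiased estimator of Eq. (7); the additive payoff below is only a toy game for illustration.

```python
import random

def shapley_mc(payoff, players, n_samples, rng):
    """Monte Carlo Shapley values: average each player's marginal
    contribution over random orderings of the coalition."""
    phi = {p: 0.0 for p in players}
    for _ in range(n_samples):
        order = list(players)
        rng.shuffle(order)
        coalition = []
        prev = payoff(frozenset())
        for p in order:
            coalition.append(p)
            cur = payoff(frozenset(coalition))
            phi[p] += cur - prev
            prev = cur
    return {p: v / n_samples for p, v in phi.items()}

# Toy additive game: the payoff of a coalition is the sum of its
# players' weights, so each exact Shapley value equals the weight.
weights = {0: 1.0, 1: 2.0, 2: 3.0}
game = lambda S: sum(weights[p] for p in S)
phi = shapley_mc(game, [0, 1, 2], 200, random.Random(0))
```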
Once the global and local concept orderings are achieved, we combine them into a node relevance distribution over the node set of G, which we denote by E. We then provide our explanation map³ of G through:

E(v) = Σ_{l=1}^{L} r_l · φ_v(C_l) · 1[v ∈ C_l]   (8)

³ We project this relevance map to [0, 1] by normalizing it.
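One plausible way to combine the two orderings, consistent with the description above but not necessarily identical to the paper's exact rule: each node accumulates its per-concept local score weighted by the concept's global score.

```python
def explanation_map(n_nodes, concepts, global_scores, local_scores):
    """Fold global (eigencentrality) and local (Shapley-style) scores
    into one relevance value per node of the input graph.

    `concepts[l]` lists the nodes of concept l, `global_scores[l]` is
    its global weight, and `local_scores[l]` holds the node scores
    inside concept l, aligned with `concepts[l]`.
    """
    relevance = [0.0] * n_nodes
    for nodes, g, local in zip(concepts, global_scores, local_scores):
        for node, s in zip(nodes, local):
            relevance[node] += g * s
    return relevance

# Node 1 belongs to both concepts, so it accumulates two weighted scores.
rel = explanation_map(4, [[0, 1], [1, 2]], [0.6, 0.4],
                      [[1.0, 0.5], [0.5, 1.0]])
```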
5 Experiments
We first introduce the datasets used to provide explanations through EiXGNN. We then provide the training conditions and setup of each of our GNN classifiers. Finally, we provide a quantitative objective assessment of EiXGNN according to objective metrics that are widely used in the literature.
5.1 Datasets
In order to provide meaningful results, we chose real-world datasets that incorporate human-intelligible features. Each of the following datasets is suited for graph classification problems. (a) MNISTSuperpixels [bronstein_geometric_2017] is a dataset composed of 60000 graphs, each of which is a graph representation of a superpixel version of an instance of the well-known handwritten digit dataset MNIST [lecun_gradientbased_1998]. Two vertices are linked according to their spatial proximity. (b) PROTEINS [borgwardt_protein_2005] is a dataset counting 1113 labeled graphs. Each graph represents a protein that is classified as enzyme or non-enzyme. Nodes represent the amino acids, and two nodes are connected if they share the same spatial locality. (c) The MSRC [shotton_textonboost_2009, winn_object_2005] datasets are used in image semantic segmentation problems. Each image is converted into a semantic superpixel version of itself. MSRC9, composed of 221 labeled graphs, distributes node semantics among 8 semantic labels. The MSRC21 version, composed of 563 labeled graphs, extends the number of possible semantic labels to 21.
5.2 Objective assessment metrics
Assessing explanation quality or relevance given a phenomenon often requires expert approval, and acquiring such expert validation is not always practically obtainable. Context-free and objective methods have been proposed for quantifying the relevance of an explaining method. Explaining method quality is driven by quantifying the infidelity of the explaining method, measuring to what extent inputs masked by their relevant features keep the deep model performance, and the sparsity, measuring the relative size of the relevant subdomain with respect to the whole domain size. These metrics focus on the explained signal itself rather than considering it together with its structural particularities. In the context of graph deep models, we denote the explanation map of the graph signal X by Φ.
Infidelity [adebayo_sanity_2018] quantifies how much the explanation map provided by the explainer deviates, on average, when the input signal is perturbed by a random variable I following a law μ. For an explained graph G with signal X, an explanation map Φ and a deep model f, the infidelity is defined as:

INFD(Φ, f, G) = E_{I∼μ} [ ( I⊤ Φ − ( f(X) − f(X − I) ) )² ]   (9)

where X − I denotes the perturbed input signal. In this study, we consider a unit-reduced Gaussian noise perturbation, which we further refer to as the Gaussian perturbation, and a unit-tensor perturbation, which we refer to as the unit perturbation.
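A Monte Carlo sketch of Eq. (9) on a flattened signal, with a Gaussian perturbation; `f` is any black-box scalar-output model. As a sanity check, a linear model explained by its own weight vector has vanishing infidelity.

```python
import random

def infidelity(f, x, phi, perturb, n_samples, rng):
    """Monte Carlo estimate of Eq. (9): the expected squared gap between
    the effect the explanation predicts for a perturbation I (its inner
    product with the relevance map phi) and the model's actual output
    change f(x) - f(x - I)."""
    fx = f(x)
    total = 0.0
    for _ in range(n_samples):
        I = perturb(len(x), rng)
        predicted = sum(i * p for i, p in zip(I, phi))
        actual = fx - f([xi - ii for xi, ii in zip(x, I)])
        total += (predicted - actual) ** 2
    return total / n_samples

def gaussian_perturb(n, rng):
    """Unit-reduced Gaussian perturbation of a length-n signal."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# For a linear model, the gradient (its weights) is a perfect
# explanation, so predicted and actual effects coincide.
w = [1.0, -2.0, 3.0]
linear = lambda x: sum(a * b for a, b in zip(w, x))
err = infidelity(linear, [1.0, 2.0, 3.0], w, gaussian_perturb, 50,
                 random.Random(0))
```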
Entropy is the Shannon entropy of the normalized node relevance map. The entropy of a probability distribution encodes the amount of uncertainty induced by this probability distribution. It can be seen as a sparsity metric, since the more certain (i.e. the lower the entropy) the distribution is, the more the relevance is spatially concentrated and the more clearly the explanation arguments are identified. On the contrary, the higher the entropy, the more unclear the relevance attribution process is. It is defined as:

H(p) = − Σ_{v ∈ V} p_v log p_v   (10)
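Eq. (10), applied to a non-negative node relevance map after normalization, can be sketched as:

```python
import math

def relevance_entropy(relevance, eps=1e-12):
    """Shannon entropy of a node relevance map, after normalising the
    non-negative scores into a probability distribution; lower entropy
    means a sparser, more spatially concentrated explanation."""
    z = sum(relevance)
    p = [r / z for r in relevance]
    return -sum(pi * math.log(pi + eps) for pi in p)

# A relevance map concentrated on one node is maximally "certain" ...
peaked = relevance_entropy([1.0, 0.0, 0.0, 0.0])
# ... while a uniform map carries no localisation information at all.
uniform = relevance_entropy([1.0, 1.0, 1.0, 1.0])
```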
5.3 Experimental setup
Training
For the PROTEINS, MSRC9 and MSRC21 datasets, we train a GCN-based classifier⁴ composed of two chained GCN modules. We choose the ReLU function as activation function. A global pooling layer is then added, combined with a dense layer, allowing graphs to be classified among the given classes of these datasets. For the MNISTSuperpixels dataset, we use four chained GCN modules with the same activation functions and a concatenation of the outputs of global mean pooling and global maximum pooling layers, linked to a dense layer, to achieve the classification task. Both model setups produce accurate classifiers on their datasets that we can rely on to apply our explaining method. The accuracies of these deep models are provided in the Supplementary. All implementations use the Adam [kingma_adam_2017] version of the stochastic gradient descent approach with the same learning rate. We use an Intel Xeon Silver 4208 CPU and an Nvidia Tesla A100 40 GB GPU for our trainings.
⁴ We discard the GAT usage since it achieves similar results on both classification accuracy and explanation-related metrics.
Explaining
Providing explanations for graph neural networks often involves combinatorial problems (e.g. finding relevant subgraphs). It thus requires a high amount of computation to obtain the outcome of an explanation method. Among state-of-the-art methods, and for a given data instance and computing machine, only GNNExplainer provides its explanation in a realistic amount of time (i.e. a few seconds), whereas PGExplainer and SubgraphX provide their explanations in respectively ten minutes and three hours in the same experimental setup. Consequently, we only consider GNNExplainer as a baseline method for this study. For benchmarking purposes, we have fixed the number of concepts and the explainee concept assimilability. Considering the average number of nodes contained in the graphs of the considered datasets, this parameter setup allows a low probability of concept redundancy during sampling and provides fine-grained explanations. † Code repository will be released.
5.4 Results
The objective metrics benchmark is summarized in Table 1. The impact of the number of sampled concepts on EiXGNN objective assessments is described in Figure 1. Regarding objective metrics, EiXGNN achieves stronger results than GNNExplainer in both infidelity settings. EiXGNN also provides explanation maps with lower entropy than GNNExplainer, meaning that the explanation maps produced by EiXGNN are more specific than those proposed by GNNExplainer. Regarding the impact of the number of concepts on EiXGNN's ability to provide low-infidelity and specific explanation maps, it appears that the greater the number of concepts is, the lower the infidelity level of EiXGNN is (in both settings). Regarding the entropy of the explanation maps provided by EiXGNN, it also appears that the greater the number of concepts is, the lower the explanation map entropy is, meaning that explanation map specificity increases with the number of concepts considered for building the explanation. These results show that the denser the argument basis is (i.e. considering a larger set of concepts that covers a larger span of explaining arguments), the more EiXGNN is able to supply specific and phenomenon-faithful explanation maps. Since the explainee concept assimilability has a higher impact on the amount of computation needed to provide explanation maps, its impact on objective metrics will be further discussed in the Supplementary.
Table 1: Entropy, Infidelity (Gaussian) and Infidelity (unit) of EiXGNN and GNNExplainer on the MNISTSuperpixels, PROTEINS, MSRC9 and MSRC21 datasets.
In this part, we supply additional work relative to the variation of the explainee concept assimilability. Although it is an impactful computational parameter, as is the number of concepts, it is the only parameter that truly depends on the explainee. Regardless of the explainee's background relative to a phenomenon, an explainer must always provide the same explanation quality for the considered phenomenon. We provide here additional results highlighting that the explanation quality supplied by EiXGNN does not depend on the explainee concept assimilability constraint.
Explainee concept assimilability quality impact study
The explainee concept assimilability is a key component of EiXGNN. This parameter depends on the explainee's background regarding the phenomenon. Varying the explainee concept assimilability should not alter the explanation quality provided by the explainer. We have isolated the impact of shifting this parameter with respect to the objective assessment metrics considered above. Our results are summarized in Figure 2.
We observe that the entropy, the infidelity with Gaussian noise and the infidelity with unit baseline are each approximately constant with respect to the explainee concept assimilability. It means that its value does not impact the explanation quality provided by EiXGNN, and that EiXGNN is thus suited for providing explanations regarding a phenomenon to a large public with high knowledge variability concerning this phenomenon.
6 Conclusion
Deep learning models, especially graph neural networks, are more and more considered for solving contemporary academic and industrial problems. When such models are used in a sensitive context (healthcare, law, etc.), their power for tackling the problem often comes at the expense of unintelligible internals, which raises important safety concerns for broad deployment. It is thus important to explain such internals (i.e. signal propagation). But providing an explanation of any phenomenon is also a social-dependent task, especially when we want highly profitable explanations. Explaining a phenomenon is a knowledge transfer between an explainer (higher knowledge) and an explainee (lower knowledge) regarding this phenomenon. State-of-the-art explaining methods suited for graph neural networks only focus on the signal aspect and do not include any social aspect in their explanations. In this contribution, we propose a modular concept-based approach that integrates the social aspect underlying any explanation process. This approach highlights the social dependency of the explanation process by showing that considering a richer argument basis supplies more specific and less infidel explanations regarding a phenomenon. Our method also provides a substantial improvement in explanation quality, achieving better results than state-of-the-art methods with respect to widely used objective assessment metrics.