
EiX-GNN: Concept-level eigencentrality explainer for graph neural networks

Explaining is a human knowledge transfer process regarding a phenomenon between an explainer and an explainee. Each word used to explain this phenomenon must be carefully selected by the explainer, in accordance with the explainee's current knowledge of the phenomenon and with the phenomenon itself, so that the explainee reaches a high understanding of it. Nowadays, deep models, and graph neural networks in particular, have a major place in daily life, even in critical applications. In such contexts, those models need to be highly interpretable by humans, also referred to as being explainable, in order to improve trust in their usage in sensitive cases. Explaining is also a human-dependent task, and methods that explain deep model behavior must include these social concerns in order to provide profitable, high-quality explanations. Current explaining methods often ignore this social aspect and focus only on the signal aspect of the question. In this contribution we propose a reliable, social-aware explaining method suited for graph neural networks that includes this social feature as a modular concept generator and that leverages both the signal and graph-domain aspects thanks to an eigencentrality concept-ordering approach. Besides taking into account the human-dependent aspect underlying any explanation process, our method also reaches high scores on state-of-the-art objective metrics assessing explanation methods for graph neural network models.


1 Introduction

Graphs are widely used data structures involved in many real-world problems. GNNs [scarselli_graph_2009] are artificial neural networks suited for this data structure. For graph classification, node classification and link prediction tasks, GNN models have shown impressive performance [defferrard_convolutional_2016, xu_spatio-temporal_2019, zhang_link_2018]. Artificial neural networks, including GNNs, are used more and more in daily-life tasks: GNN models show impressive results in drug design [bapst_unveiling_2020], web recommendation [ying_graph_2018] and traffic forecasting [derrow-pinion_eta_2021]. A major drawback of those deep models is their occluded internal decision process. For many daily usages of such models, in particular in critical applications, this raises confidence, trustworthiness, privacy and security concerns. XAI is a set of methods that aims to tackle these issues by providing human-level, meaningful insights about deep model internals. Nonetheless, understanding and interpreting decisions remain human-relative and context-dependent notions. One of those social-related requirements is that an explainer must adapt the formulation of its explanation according to the explainee's background regarding the phenomenon to explain. Several interesting XAI methods have been proposed for explaining graph neural network models, but they often fail to take this social dependency into account when providing their explanations. In this contribution we provide a social-aware explaining method that leverages the background knowledge variability inherent in any social-related process while maintaining high scores on state-of-the-art objective assessment metrics. We first frame the social context on which the explanation process depends; we then introduce our approach and demonstrate its relevance against compared methods.

2 Related Work

The explaining process has been intensively investigated numerically, despite the lack of common ground on what an explanation is and how to compare explanations. Moreover, several methods have been designed according to different paradigms. XGNN [yuan_xgnn_2020] is a model-level approach that iteratively generates explaining graphs through a reinforcement learning procedure. During this sequential process, each graph is upgraded from the previous one according to the learned policy: starting from an empty graph, nodes and related features are incorporated until a sufficiently long graph sequence has been generated. The state space of this Markov decision process can be assimilated to the Cartesian product of the node space and a finite feature space, which is a finite-dimensional space; under these conditions, a sufficiently long graph sequence leads to an optimal solution. GNNExplainer [ying_gnnexplainer_2019] is a mask generator model based on mutual information optimization. It starts with randomly initialized node and node feature masks that are jointly optimized, through mutual information, against the class label of the assessed graph. PGExplainer [luo_parameterized_2020] is also based on maximizing the mutual information between a class label and a graph that contributes highly to the GNN prediction. Explained graphs are sampled from a probability distribution whose parameters are learned using a multilayer perceptron. SubgraphX [yuan_explainability_2021] derives connected subgraph sequences from the input graph to overcome the breaks in relevant information flow that may arise with PGExplainer or GNNExplainer. To address this generation problem, SubgraphX uses a Monte Carlo tree search and assigns a Shapley value to each subgraph; the higher the Shapley value, the more pertinent the subgraph for explainability purposes. LRP-GNN [schnake_higher-order_2021] is an adaptation of the original Layer-wise Relevance Propagation (LRP) implementation: to compute relevance, it adopts a walk-based approach, introduces node anteriority, and applies the original LRP propagation rule. Assessing the quality of those methods also remains a core challenge for the XAI community, and some metrics have been proposed. They deal with the fidelity of the explanation towards the explained deep model; likewise, a sparsity measure is used to quantify the compactness of the explanation. Although the aforementioned methods show interesting results, they often miss social-dependent parameters, which are of central importance in any human-related explanation process. We tackle this issue by supplying an efficient explaining method for graph neural networks that takes such aspects into account.

3 Problem formulation

Explaining is a human knowledge transfer process involving an explainer and an explainee concerning a phenomenon. In order to have a profitable conversation (e.g. the explainer providing the explanation of the phenomenon to the explainee), both individuals must share a common vocabulary: shared ideas must be expressed upon a set of concepts shared by both individuals. This is what allows the conversation to be profitable for them. For explanation purposes, profitable means increasing the explainee's quantity of knowledge about the phenomenon thanks to the explanation. For explaining, those concepts are framed as atomic parts that, when carefully combined, allow the explainer to provide an explanation of the phenomenon to the explainee. However, those elementary bricks are chosen conditionally on the knowledge of both individuals, which itself depends on the phenomenon. Indeed, if the explainee already has a solid background or culture relative to the phenomenon, the basic insights allowing a shallow understanding of it are already acquired, and only finer details must be provided by the explainer for the explainee to reach a total understanding. On the contrary, an explainee who has only recently become interested in the phenomenon must assimilate its coarser concepts before reaching the finest ones, with the explainer having to adapt the complexity of his vocabulary in order to remain understandable.

3.1 GNN explanation framework

Graph Neural Network

A graph is a pair of two sets: the set of its nodes, whose cardinality we abusively call the size of the graph, and the set of its edges, which describes the topology of the graph. The edge set can be fully encoded by the adjacency matrix A, whose entry is 1 if the corresponding pair of nodes forms an edge and 0 otherwise. A graph is said to be undirected if its adjacency matrix is symmetric and directed otherwise. In the context of graph representation learning, a graph is rather seen as a domain whose structure is determined by its topology (i.e. described through A) together with a vector-valued signal supported on its node set and taking values in a finite-dimensional Hilbert space. Under this paradigm, graphs are couples of a structure and a signal. One singularity of this data structure is that node ordering does not matter. Graphs are very rich mathematical objects that are widely used for representing real-world problems, and there is an increasing interest from the community in integrating such data structures into deep learning frameworks. Powerful graph signal encoders have been proposed; they leverage, at the same time, deformation stability and scale separation, which are ubiquitous notions in modern deep learning approaches, and the aforementioned permutation insensitivity. Such models are named GNNs. For instance, inspired by CNNs, the GCN [kipf_semi-supervised_2017] module follows a nonparametric, local, permutation-invariant signal aggregation scheme. This module has been extended by the GAT [velickovic_graph_2018] module, which proposes an attentional node-pairwise interaction scheme for encoding local signals. Those modules show strong results for node classification, graph classification and link prediction problems.
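To make the aggregation scheme concrete, the following is a minimal NumPy sketch of the symmetrically normalized neighborhood aggregation popularized by GCN; the variable names (A, X, W) are ours and the example is illustrative rather than the exact module used later in this paper.

import numpy as np

def gcn_layer(A, X, W):
    """One GCN-style propagation step: aggregate neighbor signals with a
    symmetrically normalized adjacency, then apply a linear map and a ReLU.
    A: (n, n) adjacency matrix, X: (n, d_in) node signals, W: (d_in, d_out)."""
    A_tilde = A + np.eye(A.shape[0])            # add self-loops
    deg = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))    # D^{-1/2}
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # normalized, permutation-equivariant aggregation
    return np.maximum(A_hat @ X @ W, 0.0)       # linear transform + ReLU

# toy usage: a 3-node path graph with 2-dimensional node features
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.random.randn(3, 2)
W = np.random.randn(2, 4)
H = gcn_layer(A, X, W)                          # (3, 4) node representations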

Supervised graph classification problems

For two measurable spaces, we consider the set of measurable functions mapping one into the other. We are given an i.i.d. sampled finite dataset in which each element is a graph together with the label representing the class it belongs to. A loss function is a mapping quantifying how well a learned mapping, conditioned by a neural network architecture and a learning parameter, associates a graph with its true label. For a given architecture, we seek the learning parameter such that:

(1)

where the expectation is taken over data unseen during training, the graph and its label are treated as random variables, and the underlying distribution is the image probability measure of the data-generating process. In the context of graph classification, the learned mapping is a GNN model and the loss is the cross-entropy between the inferred label conditional probability law and the ground-truth conditional probability law.

Explaining graph classification models

Like modern deep learning models, GNN models leverage signals evolving on domain localities that describe elementary low-level representations, which are recursively combined (producing higher-level representations) so as to encompass all those sub-representations and carefully encode them until the output is produced. At a given model depth, such sub-representations are, from a domain point of view, subgraphs of the input graph and, from a signal point of view, sub-parts of the signal evolving on those subgraph structures. Under the aforementioned graph learning paradigm, we redefine the subgraph of a graph induced by a subset of its nodes: its node set is that subset and its adjacency matrix has the same size as A but respects the new node adjacency distribution induced by the selected subset (i.e. edges involving discarded nodes are removed). More generally, explaining methods that deal with signals learned by a deep model often focus on finding relevant subdomains and associated signals that preserve the model's abilities (e.g. model performance, model expressivity) without taking into account the social aspect underlying any explanation procedure. The design of EiX-GNN has this social-aware feature as a key component, which, to the best of our knowledge, state-of-the-art methods do not consider.
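As a small illustration of this subgraph notion (an adjacency matrix of the same size as A in which only edges between selected nodes are kept), here is a NumPy sketch; the helper name and the masking convention are our own.

import numpy as np

def masked_subgraph(A, X, nodes):
    """Adjacency matrix and signal of the subgraph induced by `nodes`, keeping
    the original (n, n) shape of A: edges touching discarded nodes are zeroed
    and the signal of discarded nodes is zeroed as well."""
    n = A.shape[0]
    mask = np.zeros(n, dtype=bool)
    mask[list(nodes)] = True
    A_sub = A * np.outer(mask, mask)   # keep only edges with both endpoints selected
    X_sub = X * mask[:, None]          # zero out the signal outside the subgraph
    return A_sub, X_sub

# usage: keep nodes {0, 2} of a 3-node graph
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
X = np.random.randn(3, 2)
A_sub, X_sub = masked_subgraph(A, X, {0, 2})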

4 EiX-GNN

EiX-GNN (eigencentrality explainer for graph neural networks) provides its explanations according to a set of atomic concepts. These concepts are to explanation processes what coins are to monetary exchanges: they are the elementary parts of the explanation process upon which the explainer, when explaining, builds its arguments. Those concepts must be carefully chosen by the explainer in order to match the explainee's background on the explained phenomenon. Assuming that the explainer has an optimal knowledge of the phenomenon regardless of concept selection, the concept selection process depends on the explainee's background relative to the phenomenon and on the phenomenon itself. EiX-GNN has been designed to integrate this social dependency on the explainee's background given a phenomenon to explain. Formally, we frame the set of explainee-admissible concepts as a probability space in which concepts are graph-valued random variables. The explainee concept assimilability constraint is a bounded parameter proportional to the explainee's ability to assimilate concepts related to the phenomenon. We define the phenomenon itself as a graph-valued random variable (note: since the phenomenon is graph-valued, its node set and edge set are themselves random variables, and the associated probability space can be seen as a product of probability spaces, which is a probability space as well). In the following, unless stated otherwise, we consider the phenomenon given by an optimized GNN classifier, trained on a dataset, together with the graph to be explained, which belongs to this dataset. We consider that this graph is composed of a certain number of nodes and edges, and we assume that the explainee is characterized by an explainee concept assimilability constraint. EiX-GNN provides its explanation based on a conditioned, local and global, explainee-suited concept ordering. Firstly, we introduce the concept generation procedure; then we describe the global concept ordering process, which is the common thread of the overall explaining procedure; finally, we present the local concept ordering procedure, a refinement step that makes the provided explanation precise at the node level.

4.1 Concept generation

As mentioned above, concepts are the atomic elements that allow the explainer to provide its explanation. Given the explainee concept assimilability constraint, a concept is a graph-valued random variable: it is a subgraph of the explained graph whose relative size is governed by this constraint. From a signal point of view, it describes a sub-part of the signal evolving on a subdomain of the explained graph. To generate those concepts, we rely on sampling approaches that may or may not depend on a prior distribution. Sampling a concept is thus a subgraph sampling process, with the combinatorial aspect inherent to any subgraph sampling problem. Concepts are key components of our approach and have to be carefully selected, since they provide the raw material from which explanations are built. Among all possible subgraphs that can be derived from the explained graph, some are more suited than others for explaining the phenomenon. Assuming a uniform relevance distribution among all those subgraphs is not appropriate, so a uniform sampling distribution is not appropriate either. We rather consider a light importance sampling approach that quantifies a prior relevance distribution over nodes conditionally to the phenomenon. To build such a probability distribution, we apply a node ablation approach that assesses the importance of each node within its neighborhood with respect to the phenomenon. Formally, for each node and each of its neighbors, we define a random variable that measures the relative disturbance effect between the two nodes with respect to the phenomenon (e.g. the relative performance alteration caused by removing the neighbor from the graph). Assuming a uniform relevance distribution over the nodes composing the graph, we define the prior relevance distribution of a node conditionally to the phenomenon by:

(2)

With a normalizing constant ensuring that we obtain a proper prior node importance probability distribution, this prior allows a more efficient sampling process for determining pertinent concepts with respect to the phenomenon. Once this prior distribution is determined, we sample, in an i.i.d. manner, a fixed number of concept realizations, where each node composing a sampled subgraph has been drawn according to the prior node sampling distribution. Next, we present the procedure for hierarchizing those concepts relative to the phenomenon. The impact of the values taken by the number of concepts and by the assimilability constraint is further investigated in the supplementary material.
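A minimal sketch of how such an ablation-based prior and the i.i.d. concept sampling could look is given below; `model_score` stands for any scalar read-out of the classifier on a (possibly masked) graph, and the names, the exact disturbance measure and the normalization are our assumptions, not the authors' code.

import numpy as np

def node_ablation_prior(A, X, model_score):
    """Prior node relevance: how much the model read-out changes when a node is
    removed from the graph (node ablation), normalized into a distribution."""
    n = A.shape[0]
    base = model_score(A, X)
    disturbance = np.zeros(n)
    for v in range(n):
        A_abl, X_abl = A.copy(), X.copy()
        A_abl[v, :] = 0.0            # remove node v: cut its edges ...
        A_abl[:, v] = 0.0
        X_abl[v, :] = 0.0            # ... and silence its signal
        disturbance[v] = abs(base - model_score(A_abl, X_abl))
    p = disturbance + 1e-12          # avoid a degenerate all-zero prior
    return p / p.sum()               # prior node importance distribution

def sample_concepts(A, X, p, n_concepts, assimilability, rng=np.random.default_rng(0)):
    """Sample `n_concepts` node subsets (concepts); each keeps a fraction
    `assimilability` of the nodes, drawn without replacement from the prior p."""
    n = A.shape[0]
    k = max(1, int(round(assimilability * n)))
    return [rng.choice(n, size=k, replace=False, p=p) for _ in range(n_concepts)]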

4.2 Global concept ordering

Once concepts are sampled, we must find an ordering relationship in order to classify their relevance with respect to the phenomenon. Thanks to the prior node importance sampling approach, we have already established such a hierarchy, but only among all possible subgraphs of a fixed size, which merely reduces the search perimeter of the optimal substructure that explains the phenomenon. Here we instead present an ordering method that ranks the sampled concepts pairwise. Considering these concepts, we build an operational research tree with the explained graph as root and the concepts as leaves. Without any further work, we do not yet know whether one concept is more relevant than another for explaining the phenomenon. In order to provide such an ordering, we derive from the sample a complete graph in which each node represents a concept and each edge carries the relative similarity between two concepts with respect to the phenomenon. Since, in this context, a graph is seen as a signal evolving on a particular structure, we take both aspects into account when quantifying concept similarity pairwise.

Relative concept domain similarity

We define the domain similarity between two concepts as their relative edge density. The graph edge density of a concept $c$, denoted $d(c)$, is the ratio between the number of edges actually composing $c$ and the total number of possible edges $c$ could be composed of. For a concept with $n_c$ nodes and $m_c$ edges, it is defined as follows:

$d(c) = \dfrac{2\,m_c}{n_c\,(n_c - 1)}$   (3)

It measures how close $c$ is to a complete graph. We choose this measure because of the local aggregation operation (e.g. a sum) involved in many GNN models: complete substructures (i.e. subdomains) aggregate much more signal information than sparser substructures. Indeed, widely used GCN-based or GAT-based GNN models aggregate node signal representations according to node neighborhoods. The completeness of a substructure does not ensure that it is more relevant than a sparser one, but we empirically measure that node features have a significant variance (i.e. signal variance) across the node set of the explained graph. From a statistical point of view, averaging node information over many node signal representations produces a more faithful local signal representation than averaging over fewer of them. Likewise, with the same information aggregation strategy, the error made in attributing relevance to a node is much larger when far fewer neighboring nodes are taken into account. A prior favoring complete concept structures is thus better suited for limiting node importance attribution errors than a prior favoring more degenerate concepts. With the relative edge density, we favor concepts that are closer to complete substructures over those that are more degenerate.
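For reference, a one-line computation of the edge density of Equation (3), assuming undirected concepts represented by NumPy adjacency matrices (our representation choice):

import numpy as np

def edge_density(A_c):
    """Edge density of an undirected concept: observed edges over possible edges."""
    n = A_c.shape[0]
    m = np.count_nonzero(np.triu(A_c, k=1))       # each undirected edge counted once
    return 2.0 * m / (n * (n - 1)) if n > 1 else 0.0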

Relative concept signal similarity

The concept signal similarity quantifies how similar the behaviors of the explained model are when the signal is propagated over the subdomain supplied by one concept and when it is propagated over the subdomain supplied by another. Assume we consider two concepts $c_i$ and $c_j$; the case where $c_i$ is similar to $c_j$ given the model means that the model sees $c_i$ and $c_j$ as equivalent. Considering $c_j$ then provides no added value over considering $c_i$ alone, with respect to the model. As a similarity metric between two concepts $c_i$ and $c_j$ we use the Kullback-Leibler divergence between the class probability distributions the model infers for each of them. Formally, we frame the behavior similarity metric concerning two concepts $c_i$ and $c_j$ as:

$s(c_i, c_j) = D_{\mathrm{KL}}\big( f(c_i) \,\|\, f(c_j) \big)$   (4)

where $D_{\mathrm{KL}}$ denotes the Kullback-Leibler divergence and $f(\cdot)$ the class probability distribution inferred by the explained GNN classifier.
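A sketch of this behavior term, assuming the classifier exposes class log-probabilities for a (masked) concept graph; the function name and tensor shapes are ours.

import torch
import torch.nn.functional as F

def concept_signal_similarity(log_probs_i, log_probs_j):
    """KL divergence between the class distributions the classifier infers for
    two concepts; inputs are log-probability tensors of shape (num_classes,)."""
    # F.kl_div expects log-probabilities as input and probabilities as target,
    # so this computes D_KL( p_i || p_j ) as in Equation (4)
    return F.kl_div(log_probs_j, log_probs_i.exp(), reduction="sum")

# usage with dummy 3-class predictions for two concepts
log_p_i = torch.log_softmax(torch.randn(3), dim=0)
log_p_j = torch.log_softmax(torch.randn(3), dim=0)
s_ij = concept_signal_similarity(log_p_i, log_p_j)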

Computing the relative concept domain and signal similarities pairwise over the sampled concepts, we obtain a pairwise concept ordering. Indeed, for two distinct concepts, the combination of their domain and signal similarities quantifies how similar they are in both the domain and signal aspects at the same time. This quantity provides a relational measurement between two given concepts, and it becomes natural to consider these pairwise values as the entries of the adjacency matrix of the concept graph, a matrix that is in general not symmetric. In terms of both signal and domain dissimilarity, the concept that has to be considered the most is the one identified as the most dissimilar to the other sampled concepts relative to the phenomenon. In graph-theoretic terms, such a concept is the one with the highest eigencentrality value among the concepts of the concept graph. Considering

(5)

the normalized version of this adjacency matrix, denoted $\hat{A}$ and obtained with the help of the unit vector $\mathbf{e}$ of appropriate size, the eigencentrality vector of $\hat{A}$ is its right eigenvector $\mathbf{r}$ associated with the eigenvalue 1 (since $\hat{A}$ is a stochastic matrix, it always admits an eigenvalue equal to 1; analogously, $\hat{A}$ can be seen as an irreducible and recurrent Markov chain over the concepts, which admits $\mathbf{r}$ as its stationary law). Formally, $\mathbf{r}$ satisfies the following equation:

$\hat{A}\,\mathbf{r} = \mathbf{r}$   (6)

Thereby, the components of $\mathbf{r}$ provide a natural ordering between the sampled concepts given the phenomenon.
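The global ordering step can be sketched as follows under our reading of the normalization: the pairwise similarity matrix is normalized into a stochastic matrix and its eigenvector for eigenvalue 1, computed here by power iteration, ranks the concepts. Names and the exact normalization are assumptions of ours.

import numpy as np

def global_concept_scores(S, n_iter=1000, tol=1e-10):
    """Rank concepts by eigencentrality of the (generally non-symmetric) pairwise
    similarity matrix S: S is normalized into a column-stochastic matrix and the
    eigenvector associated with eigenvalue 1 is found by power iteration."""
    K = S.shape[0]
    A_hat = S / S.sum(axis=0, keepdims=True)    # column-normalize -> stochastic
    r = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        r_next = A_hat @ r
        r_next /= r_next.sum()
        if np.linalg.norm(r_next - r, ord=1) < tol:
            break
        r = r_next
    return r_next                               # higher value = more central concept

# usage: 4 concepts with random positive pairwise (dis)similarities
rng = np.random.default_rng(0)
S = rng.random((4, 4)) + 1e-6
scores = global_concept_scores(S)
order = np.argsort(-scores)                     # concepts ordered by global relevance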

4.3 Local concept ordering

For each of the sampled concepts, the global concept ordering phase assigns a global score to the concept as a whole. It means that the nodes composing a concept all receive a uniform contribution to the explanation, proportional to that global score. Nonetheless, we know that in any graph learning problem each node contributes differently to the learning task, as mentioned earlier. Previously, we took a shallow node ablation approach to get a first estimate of the prior node relevance distribution, whose computational cost scales with the number of nodes of the explained graph. Here we consider subgraphs of fixed size (the concepts), and since we deal with smaller graphs we can afford a more precise strategy for quantifying the node relevance distribution within each concept. Given a concept, this amounts to computing the Shapley value of each node composing it. The Shapley value is a solution concept in cooperative game theory quantifying the importance of the marginal role a player has in the game outcome. Consider a coalition of players playing a cooperative game with a measurable payoff function defined over all subsets of players. Equipping the set of players with the counting measure, the Shapley value of a player is defined by

$\phi_i(v) = \displaystyle\sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,\big(|N| - |S| - 1\big)!}{|N|!}\,\Big( v\big(S \cup \{i\}\big) - v(S) \Big)$   (7)

where $N$ denotes the set of players, $v$ the payoff function defined over subsets of $N$, and $\phi_i(v)$ the Shapley value of player $i$. Computing this value exactly has a complexity that grows exponentially with the number of players, which is far more than the previous shallow node relevance computing procedure. In our context, we want to provide a precise, node-level concept relevance value with respect to the phenomenon, while remaining consistent with the approach described above (i.e. quantifying the behavior change when some nodes are removed). For a given node belonging to a concept, computing its Shapley value requires considering all possible subsets of the concept's nodes and computing the perturbing effect of the node on each of them. As mentioned, this computation is intensive: if a single model call takes a fixed amount of time, the total time grows exponentially with the concept size, which itself depends on the explainee concept assimilability constraint. We therefore adopt a Monte Carlo sampling strategy for approximating the Shapley values, which keeps the computation time reasonable for any level of the explainee concept assimilability constraint while providing a quantifiable error on the approximation made.
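A minimal sketch of such a Monte Carlo estimator follows; the permutation-sampling scheme is a standard unbiased approximation of Equation (7), and the payoff function shown is a placeholder of ours rather than the exact perturbation measure used by EiX-GNN.

import numpy as np

def mc_shapley(nodes, payoff, n_samples=200, rng=np.random.default_rng(0)):
    """Monte Carlo approximation of the Shapley value of every node of a concept.
    `payoff` maps a set of node indices to a scalar (e.g. the class score the
    classifier gives to the concept restricted to that set). Averaging marginal
    contributions over random permutations estimates Equation (7)."""
    nodes = list(nodes)
    phi = {v: 0.0 for v in nodes}
    for _ in range(n_samples):
        perm = list(rng.permutation(nodes))
        coalition = set()
        value = payoff(coalition)
        for v in perm:
            coalition.add(v)
            new_value = payoff(coalition)
            phi[v] += new_value - value     # marginal contribution of v in this order
            value = new_value
    return {v: phi[v] / n_samples for v in nodes}

# usage with a toy payoff: the payoff of a coalition is its size squared
shapley = mc_shapley(range(5), payoff=lambda s: float(len(s)) ** 2)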

Once the global and local concept orderings are obtained, we combine them into a node relevance distribution defined over the node set of the explained graph (this relevance map is projected onto a normalized range). We then provide our explanation map of the explained graph through

(8)
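One plausible way to fold the two orderings into a single node-level relevance map is sketched below; this aggregation is our own illustration, the exact combination used by EiX-GNN being the one given by Equation (8) above.

import numpy as np

def node_relevance_map(n_nodes, concepts, global_scores, local_scores):
    """Aggregate, for every node of the explained graph, the local (Shapley-based)
    relevance it receives inside each concept, weighted by the global
    (eigencentrality-based) score of that concept, then normalize."""
    relevance = np.zeros(n_nodes)
    for concept_nodes, g, local in zip(concepts, global_scores, local_scores):
        for v in concept_nodes:
            relevance[v] += g * local[v]
    relevance = np.clip(relevance, 0.0, None)
    return relevance / relevance.sum()        # normalized relevance map over the node set

# usage with two toy concepts over a 4-node graph
concepts = [np.array([0, 1]), np.array([1, 2, 3])]
global_scores = np.array([0.7, 0.3])
local_scores = [{0: 0.4, 1: 0.6}, {1: 0.2, 2: 0.5, 3: 0.3}]
phi = node_relevance_map(4, concepts, global_scores, local_scores)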

5 Experiments

We first introduce the datasets used to provide explanations through EiX-GNN. We then describe the training conditions and setup of each of our GNN classifiers. Finally, we provide a quantitative assessment of EiX-GNN according to objective metrics that are widely used in the literature.

5.1 Datasets

In order to provide meaningful results, we chose real-world datasets that incorporate human-intelligible features. Each of the following datasets is suited for graph classification problems. (a) MNISTSuperpixels [bronstein_geometric_2017] is a dataset composed of 60,000 graphs, each representing a superpixel version of an instance of the well-known handwritten digit dataset MNIST [lecun_gradient-based_1998]; two vertices are linked according to their spatial proximity. (b) PROTEINS [borgwardt_protein_2005] is a dataset counting 1,113 labeled graphs. Each graph represents a protein classified as enzyme or non-enzyme. Nodes represent amino acids, and two nodes are connected if they share the same spatial locality. (c) The MSRC [shotton_textonboost_2009, winn_object_2005] datasets are used in image semantic segmentation problems: each image is converted into a semantic superpixel version of itself. MSRC-9, composed of 221 labeled graphs, distributes node semantics among 8 semantic labels, while MSRC-21, composed of 563 labeled graphs, extends the number of possible semantic labels to 21.

5.2 Objective assessment metrics

Assessing explanation quality or relevance for a given phenomenon often requires the approval of a domain expert, and acquiring such expert validation is not always practically obtainable. Context-free, objective methods have therefore been proposed for quantifying the relevance of explanation methods. Explanation quality is assessed by quantifying the infidelity of the explaining method, measuring to what extent inputs masked by the relevance features preserve the deep model's performance, and by its sparsity, measuring the relative size of the relevant subdomain with respect to the whole domain. Those metrics focus on the explained signal itself rather than considering it jointly with its structural particularities. In the context of graph deep models, we denote by X the signal of the explained graph.

Infidelity [adebayo_sanity_2018] quantifies how much, on average, the explanation map provided by the explainer is affected when the input signal is perturbed by a random variable following a given law. For an explained graph and a deep model, the infidelity is defined as

(9)

with the perturbation variable drawn from the chosen law. In this study, we consider a unit-reduced Gaussian noise perturbation, which we further refer to as the Gaussian perturbation, and a unit-tensor perturbation, which we refer to as the unit perturbation.
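For illustration, a Monte Carlo infidelity estimator in the perturbation-difference form commonly used in the literature is sketched below, with both a Gaussian and a unit-tensor perturbation; the exact definition used in this paper is the one of Equation (9), and the function below is only our approximation of it.

import numpy as np

def infidelity(model_score, X, phi, perturbation="gaussian", n_samples=100,
               rng=np.random.default_rng(0)):
    """Monte Carlo estimate of infidelity: squared gap between the score change
    predicted by the relevance map under a perturbation I and the change actually
    observed in the model. model_score maps a node-signal matrix X to a scalar;
    phi is a node relevance map of shape (n,)."""
    vals = []
    for _ in range(n_samples):
        if perturbation == "gaussian":
            I = rng.standard_normal(X.shape)          # unit-reduced Gaussian perturbation
        else:
            I = np.ones_like(X)                       # unit-tensor perturbation
        predicted = np.sum(I * phi[:, None])          # change predicted by the explanation
        actual = model_score(X) - model_score(X - I)  # change observed in the model
        vals.append((predicted - actual) ** 2)
    return float(np.mean(vals))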



Entropy is the Shannon entropy of the normalized node relevance map. The entropy of a probability distribution encodes the amount of uncertainty contained in that distribution. It can be seen as a sparsity metric: the more certain (i.e. the lower the entropy of) the distribution, the more spatially concentrated the relevance is and the more clearly the explanation arguments are identified. On the contrary, the higher the entropy, the less clear the relevance attribution process is. It is defined as

$H(w) = -\displaystyle\sum_{i=1}^{n} w_i \log w_i$   (10)

where $w$ denotes the normalized node relevance map over the $n$ nodes of the explained graph.
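The corresponding computation, assuming the relevance map is already normalized to sum to one:

import numpy as np

def relevance_entropy(w, eps=1e-12):
    """Shannon entropy of a normalized node relevance map (Equation (10));
    lower entropy means a more concentrated, more specific explanation."""
    w = np.asarray(w, dtype=float)
    return float(-np.sum(w * np.log(w + eps)))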

5.3 Experimental setup

Training

For the PROTEINS, MSRC-9 and MSRC-21 datasets, we train a GCN-based classifier composed of two chained GCN modules (we discard GAT since it achieves similar results on both classification accuracy and explanation-related metrics). We choose the ReLU function as activation function. A global pooling layer is then added, combined with a dense layer, allowing graphs to be classified into the given classes of these datasets. For the MNISTSuperpixels dataset, we use four chained GCN modules, ReLU activation functions, and a concatenation of the outputs of a global mean pooling and a global max pooling layer linked to a dense layer to achieve the classification task. Both model setups yield accurate classifiers on their respective datasets that we can rely on to apply our explaining method. The accuracy of those deep models is provided in the supplementary material. All implementations use the Adam [kingma_adam_2017] variant of stochastic gradient descent with the same learning rate. We use an Intel Xeon Silver 4208 CPU and an Nvidia Tesla A100 40 GB GPU for our trainings.
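A sketch of the kind of two-layer GCN classifier described above, written with PyTorch Geometric; the hidden size, number of classes and other hyperparameters are placeholders of ours, not the paper's exact configuration.

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

class GCNClassifier(torch.nn.Module):
    """Two chained GCN modules with ReLU, global mean pooling, then a dense classifier."""
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.lin = torch.nn.Linear(hidden_dim, num_classes)

    def forward(self, x, edge_index, batch):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        x = global_mean_pool(x, batch)            # one embedding per graph
        return self.lin(x)                        # class logits

# training step sketch (Adam + cross-entropy), assuming a PyG DataLoader `loader`:
# model = GCNClassifier(in_dim=dataset.num_features, hidden_dim=64, num_classes=dataset.num_classes)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# for data in loader:
#     optimizer.zero_grad()
#     loss = F.cross_entropy(model(data.x, data.edge_index, data.batch), data.y)
#     loss.backward()
#     optimizer.step()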

Explaining

Providing explanations for graph neural networks often involves combinatorial problems (e.g. finding relevant subgraphs). It thus requires a high amount of computation to obtain the outcome of an explanation method. Among state-of-the-art methods, for a given data instance and computing machine, only GNNExplainer provides its explanation in a realistic amount of time (i.e. a few seconds), whereas PGExplainer and SubgraphX provide their explanations in roughly ten minutes and three hours respectively in the same experimental setup. Consequently, we only consider GNNExplainer as a baseline method in this study. For benchmarking purposes we have fixed the number of concepts and the explainee concept assimilability constraint. Considering the average number of nodes contained in the graphs of the considered datasets, this parameter setup yields a low probability of concept redundancy when concepts are sampled and provides fine-grained explanations. The code repository will be released.

5.4 Results

The objective metrics benchmark is summarized in Table 1. The impact of the number of concepts on EiX-GNN's objective assessment is described in Figure 1. Regarding objective metrics, EiX-GNN achieves stronger results than GNNExplainer on both infidelity settings. EiX-GNN also provides explanation maps with lower entropy than GNNExplainer, which means that the explanation maps produced by EiX-GNN are more specific than those proposed by GNNExplainer. Regarding the impact of the number of concepts on EiX-GNN's ability to provide low-infidelity and specific explanation maps, it appears that the greater the number of concepts, the lower the infidelity of EiX-GNN (in both settings). Regarding the entropy of the explanation maps provided by EiX-GNN, it also appears that the greater the number of concepts, the lower the entropy, meaning that explanation map specificity increases with the number of concepts considered for building the explanation. These results show that the denser the argument basis (i.e. the larger the set of concepts, covering a larger span of explaining arguments), the more EiX-GNN is able to supply specific explanation maps that are faithful to the phenomenon. Since the explainee concept assimilability constraint has a higher impact in terms of the amount of computation required for producing explanation maps, its impact on objective metrics is further discussed in the supplementary material.

Figure 1: Impact of the number of concepts on the infidelity and entropy quality measures for a selection of datasets
Dataset           Explainer     Entropy               Infidelity (Gaussian)   Infidelity (unit)
MNISTSuperpixels  EiX-GNN       9.41E-01 (± 3.6E-03)  5.69E+00 (± 1.8E-01)    5.69E+00 (± 1.8E-01)
                  GNNExplainer  1.30E+03 (± 1.8E+00)  2.43E+05 (± 2.6E+02)    2.44E+05 (± 1.5E+03)
PROTEINS          EiX-GNN       9.37E-01 (± 1.0E-03)  2.38E-01 (± 1.1E-03)    2.78E-01 (± 1.2E-03)
                  GNNExplainer  5.21E+01 (± 1.0E+04)  3.36E+02 (± 3.8E+01)    3.47E+02 (± 3.9E+02)
MSRC-9            EiX-GNN       9.02E-01 (± 4.0E-04)  2.29E-05 (± 6.6E-01)    9.68E-08 (± 6.1E-01)
                  GNNExplainer  7.84E+01 (± 1.3E+03)  1.14E+03 (± 6.9E+03)    1.12E+03 (± 1.1E+03)
MSRC-21           EiX-GNN       8.54E-01 (± 4.9E-04)  2.02E+00 (± 1.4E+00)    2.05E+00 (± 1.5E+00)
                  GNNExplainer  2.79E+02 (± 4.2E+03)  1.03E+04 (± 6.8E+03)    1.04E+04 (± 7.2E+03)

Table 1: Comparison between EiX-GNN and GNNExplainer over three objective quality assessment measures for a selection of datasets

In this part, we supply additional work relative to the variation of the explainee concept assimilability. Although it is a computationally impactful parameter, like the number of concepts, it is the only parameter that is truly dependent on the explainee. Regardless of the explainee's background relative to a phenomenon, an explainer must always provide the same explanation quality for the considered phenomenon. We provide here additional results highlighting that the explanation quality supplied by EiX-GNN does not depend on the explainee concept assimilability constraint.

Explainee concept assimilability quality impact study

The explainee concept assimilability is a key component of EiX-GNN. This parameter depends on the explainee's background regarding the phenomenon, and varying it should not alter the explanation quality provided by the explainer. We have isolated the impact of shifting this parameter (i.e. with the number of concepts held fixed) with regard to the objective assessment metrics considered before. We summarize our results in Figure 2.

Figure 2: Impact of the explainee concept assimilability constraint, with the number of concepts held fixed

We observe that the entropy, the infidelity with Gaussian noise, and the infidelity with the unit baseline are each approximately constant with respect to the explainee concept assimilability. It means that the value of this parameter does not impact the explanation quality provided by EiX-GNN, and that EiX-GNN is therefore suited for providing explanations of a phenomenon to a broad audience with high knowledge variability concerning this phenomenon.

6 Conclusion

Deep learning models, especially graph neural networks, are increasingly considered for solving contemporary academic and industrial problems. When such models are used in sensitive contexts (healthcare, law, etc.), their power for tackling the problem often comes at the expense of unintelligible internals, which raises important concerns for broad, safe deployment. It is thus important to explain such internals (i.e. the signal propagation). But providing an explanation of any phenomenon is also a social-dependent task, especially when we want highly profitable explanations. Explaining a phenomenon is a knowledge transfer between an explainer (with higher knowledge) and an explainee (with lower knowledge) regarding this phenomenon. State-of-the-art explaining methods suited for graph neural networks only focus on the signal aspect and do not include any social aspect in their explanations. In this contribution, we propose a modular concept-based approach that integrates the social aspect underlying any explanation process. This approach highlights the social dependency of the explanation process by showing that considering a richer argument basis supplies more specific and less infidel explanations of a phenomenon. Our method also provides a substantial improvement in explanation quality, achieving better results than state-of-the-art methods with respect to widely used objective assessment metrics.

References