GNNVis: A Visual Analytics Approach for Prediction Error Diagnosis of Graph Neural Networks

11/22/2020 · by Zhihua Jin et al. · Singapore Management University, IBM, The Hong Kong University of Science and Technology

Graph Neural Networks (GNNs) aim to extend deep learning techniques to graph data and have achieved significant progress in graph analysis tasks (e.g., node classification) in recent years. However, similar to other deep neural networks like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), GNNs behave like a black box with their details hidden from model developers and users. It is therefore difficult to diagnose possible errors of GNNs. Despite many visual analytics studies being done on CNNs and RNNs, little research has addressed the challenges for GNNs. This paper fills the research gap with an interactive visual analysis tool, GNNVis, to assist model developers and users in understanding and analyzing GNNs. Specifically, Parallel Sets View and Projection View enable users to quickly identify and validate error patterns in the set of wrong predictions; Graph View and Feature Matrix View offer a detailed analysis of individual nodes to assist users in forming hypotheses about the error patterns. Since GNNs jointly model the graph structure and the node features, we reveal the relative influences of the two types of information by comparing the predictions of three models: GNN, Multi-Layer Perceptron (MLP), and GNN Without Using Features (GNNWUF). Two case studies and interviews with domain experts demonstrate the effectiveness of GNNVis in facilitating the understanding of GNN models and their errors.


1 Introduction

Graphs are pervasive in various applications, such as citation networks, social media, and biology. Analyzing graph data helps us understand the hidden patterns in graphs and benefits many graph-related tasks, including node classification, link prediction, and graph classification. For example, an effective analysis of a paper citation graph can facilitate the prediction of a new paper [23, 17]. A careful exploration of social networks can benefit the creation of an adaptive friend recommendation system in social media [8]. By modeling molecules as graphs, where atoms and chemical bonds are treated as nodes and edges respectively, we can build machine learning techniques to predict the chemical properties (e.g., solubility) of chemical compounds [14].

In recent years, graph analytics has embraced a new breakthrough: Graph Neural Networks (GNNs). A fast-growing number of GNN models have been proposed to solve graph-based tasks. For example, Graph Convolutional Network (GCN) [23] adapts the convolution operation from natural images to graphs and conducts semi-supervised learning to perform node classification. Graph Attention Network (GAT) [45] further integrates the attention mechanism, which is widely used in Natural Language Processing (NLP), into the GNN architecture and dynamically assigns weights to different neighbors to enhance model performance. These advances bring new opportunities to the analysis of graph data, and GNNs have become increasingly popular in recent years. However, similar to other deep neural networks, GNN models suffer from the difficulty of interpreting their working mechanisms. When developing or using GNNs, developers and users often need to evaluate model performance and explore the causes of model errors and failures, which, unfortunately, is often hard to achieve. Therefore, enabling convenient error diagnosis of GNN models has become a challenging but significantly important task.

Visualization has been applied to help model developers devise new deep learning techniques and to debug and compare different types of deep neural networks [19]. For example, various visualization techniques have been proposed to facilitate the development of a variety of deep learning models, such as CNNs [32], RNNs [33], GANs [48], and DQNs [47]. These visualizations have achieved great success in understanding and analyzing those deep learning models. However, it is very challenging to directly apply them to GNNs, since most of those techniques are exclusively designed for Euclidean data like images and text, while GNNs mainly work on non-Euclidean data such as graphs.

Another challenge for the error diagnosis of GNNs comes from the fact that GNNs often involve both the complex topological structure and the high-dimensional features of graphs, as well as the interplay between them. To effectively analyze GNNs, it is crucial to properly link the topological data, high-dimensional features, and prediction results within a comprehensive workflow. Preliminary studies [38, 54, 27] have proposed techniques to explain GNN model prediction results. Most of them focus on instance analysis, i.e., explaining the prediction for a single node. However, research at a higher level, i.e., analyzing and understanding the common causes of classification errors for groups of nodes, is still lacking. Existing methods make it difficult to conveniently explore general error patterns in the prediction results of a GNN model, or to gain further insights for model improvement. In summary, it remains unclear how to develop new visualization techniques that facilitate the effective error diagnosis of GNNs.

In this paper, we propose a novel error-pattern-driven visual analytics system, GNNVis (https://gnnvis.github.io/), to provide model developers and users with deep insights into model performance and its dependency on data characteristics. Instead of analyzing the GNN prediction results of single instances, we investigate the patterns in the prediction results shared by a group of instances to obtain generalizable insights into the model architecture. We worked closely with two GNN experts for four months to derive the design requirements of GNNVis. GNNVis comprises five parts: Control Panel, Parallel Sets View, Projection View, Graph View, and Feature Matrix View. The Parallel Sets View enables users to see the distribution of node-level metrics. The Projection View presents a set of 2D projections of the selected nodes according to metrics summarized from different perspectives, enabling users to extract potential clusters of nodes. Novel node glyphs in the Projection View are proposed to help users conveniently learn about the multiple metrics of nodes and extract general error patterns. We conducted two case studies and expert interviews to demonstrate the effectiveness and usability of GNNVis in helping model developers understand and diagnose GNNs.

The contributions of our work can be summarized as follows:

  • A visual analytics system to assist model developers and users in understanding and diagnosing GNNs.

  • A set of novel node glyphs to help users conveniently learn about the metrics of nodes.

  • Case studies on analyzing error patterns in GNN prediction results and interviews with domain experts to demonstrate the effectiveness and usability of the proposed system.

The remainder of this paper is organized as follows. Section 2 discusses the related work of this paper, including GNNs, visual analytics in deep learning, and GNN explainability. Section 3 provides a brief introduction to the basic concepts of GNNs, such as the typical architectures GCN and GAT. By working closely with domain experts, we summarize the design requirements for understanding and diagnosing GNN models in Section 4, and Section 5 introduces the technical details of the proposed approach, GNNVis. We evaluate our approach through case studies and expert interviews in Section 6 and discuss the possible limitations and future work of our approach in Section 7. Section 8 concludes the paper with a brief summary of the proposed method.

2 Related Work

The related work of this paper can be categorized into three groups: GNNs, visual analytics in deep learning, and GNN explainability.

2.1 GNNs

GNNs have been developed in the past few years to analyze graph data by extending CNNs or RNNs to the graph domain [57]. These neural networks have achieved promising prediction results in analyzing graphs.

The GNNs derived from CNNs can be categorized into spectral approaches and spatial approaches [57]. Spectral approaches define convolution on the spectral representation of graphs [6, 9, 23]. The work by Bruna et al. [6] is the first attempt to generalize the convolution concept from natural images to the graph domain. Defferrard et al. [9] approximated the spectral convolution with Chebyshev polynomials of the diagonal matrix of eigenvalues, which substantially reduces the computational cost. Kipf and Welling [23] further simplified the Chebyshev polynomials by using first-order polynomials and a renormalization trick, resulting in GCNs, which have inspired many follow-up studies. Spatial approaches directly define convolution on spatially close neighbors [17, 11, 2, 58, 36, 15, 35]. Hamilton et al. [17] proposed GraphSAGE, which uses sampling methods and aggregators defined over the neighborhood to reduce the dependence on processing whole graphs; their approach greatly accelerates the application of GNNs to large-scale graphs. Another direction is to extend RNNs to the graph domain. Prior studies have attempted to utilize gate functions in GNNs to improve their ability to propagate information across the graph structure [28, 43, 55, 29, 50].

Researchers have also made significant progress in analyzing GNN models. For example, Li et al. [25] showed that the graph convolution of a GCN is merely a Laplacian smoothing operation, but that the risk of over-smoothing increases as the number of layers grows. They also showed that when few training labels are available, co-training and self-training methods can improve the performance of GCN models. Xu et al. [52] provided a theoretical framework to analyze the expressive power of GNNs and proved that their proposed model is as expressive as the Weisfeiler-Lehman graph isomorphism test. Different from these studies, our work aims at extracting general error patterns of GNN models to further help model developers understand and diagnose the models.

2.2 Visual Analytics in Deep Learning

Nowadays, there is a growing trend of using visualizations to understand, compare, and diagnose deep neural networks [19]. Prior studies on using visual analytics to enhance the interpretability of deep neural networks can generally be categorized into two types: model-agnostic visualizations and model-specific visualizations. Model-agnostic visualizations mainly focus on visualizing the model input and output to provide insights into the correlation between them [56, 1] or on using surrogate models to explain the deep neural networks [34, 49]. However, these model-agnostic visualizations do not show the hidden states of the deep neural networks and fail to reveal the inner working mechanisms of different models.

To support a dive into deep learning models, researchers have also proposed a series of model-specific visualizations for explaining deep learning models. Previous model-specific visualizations have covered a wide range of deep learning models, including CNNs, RNNs, and GANs. A variety of visualization techniques and interactions have been designed based on the data type, the model structures, and the working mechanisms of different deep learning models. Since CNNs and RNNs are the most widely-used deep learning models [24, 16], a majority of model-specific visual analytics are proposed for these two types of models. For example, CNNs are usually modeled using directed acyclic graph visualizations, and the output of each layer is usually displayed using matrix-based visualizations [32, 30, 37]. To open the black box of RNNs, clustering methods and correlation visualizations have been proposed to uncover the dynamic hidden states and learned patterns in RNNs [33, 42, 41]. Recently, visual analytics methods tailored for generative models [48, 31, 22] and reinforcement learning models [47] have also been proposed.

Although much work has been done on using visualization approaches to improve the explainability of deep learning models, little research has been conducted on enhancing the explainability of GNNs through visualization. To fill this research gap, this paper contributes a visualization tool to assist in the understanding and diagnosis of GNNs.

2.3 GNN Explainability

According to our research, only a few studies have attempted to explain GNN models. For instance, Baldassarre et al. [38] explored the possibilities of adapting explanation techniques from CNNs to GNNs. They empirically evaluated three widely-used CNN explanation methods, i.e., Sensitivity Analysis (SA), Guided Back Propagation (GBP), and Layer-wise Relevance Propagation (LRP), when explaining GNN decisions. They found that explanations produced by SA or GBP tend to be inconsistent with human interpretation, while LRP produces more natural explanation results. Meanwhile, Ying et al. [54] proposed GNNExplainer, which uses a subgraph to explain a GNN model prediction. Given a trained GNN model, they formulate an optimization task that maximizes the mutual information between the trained model's prediction and the distribution of possible graph structures, and they regard the resulting subgraph as the explanation. Li et al. [27] further extended GNNExplainer, which was designed for undirected, unweighted graphs, to directed, weighted graphs.

Previous studies have mainly focused on providing instance-based explanations rather than insights into the classification errors made by GNNs. Different from previous studies on GNN explainability, our work mainly focuses on analyzing the error patterns made by GNN models, giving model developers and users a different perspective from which to inspect a model and become familiar with the error patterns in its predictions.

Fig. 2: Given an input graph (a), GNN predicts the label of the target node (e.g., the blue node) by aggregating the information from neighboring nodes (b).

3 Background

GNNs are deep neural networks that directly operate on graphs (i.e., networks). A graph can be represented as $G = (V, E)$, where $V$ denotes the vertex set and $E$ denotes the edge set. $X \in \mathbb{R}^{N \times D}$ is the feature matrix of the graph, where $N$ denotes the number of nodes in the vertex set and $D$ is the dimension of each node feature. The labels of the nodes in the graph are often denoted as $Y$. In this paper, we do not consider edge features.

We adopt notations similar to those introduced in [13] to illustrate the concept of GNNs. GNNs can be generally expressed in a neighborhood aggregation or message passing scheme [18], as shown in Fig. 2. A general message passing function for GNNs is shown below:

$$h_v^{(l+1)} = \gamma^{(l)}\Big(h_v^{(l)},\ \square_{u \in \mathcal{N}(v)}\, \phi^{(l)}\big(h_v^{(l)}, h_u^{(l)}\big)\Big), \tag{1}$$

where $h_v^{(l)}$ denotes the features of node $v$ in layer $l$, $\mathcal{N}(v)$ denotes the neighborhood of node $v$, $\square$ denotes a differentiable, permutation-invariant function (e.g., sum), and $\gamma^{(l)}$ and $\phi^{(l)}$ denote differentiable functions such as MLPs.

GCN and GAT are two popular GNNs. According to Kipf and Welling [23], the message passing function of GCN can be defined as follows:

$$h_v^{(l+1)} = \sigma\Big(\sum_{u \in \mathcal{N}(v) \cup \{v\}} \frac{1}{\sqrt{\deg(u)}\sqrt{\deg(v)}}\, W^{(l)} h_u^{(l)}\Big), \tag{2}$$

where the features of the neighbors of node $v$ are first transformed by a weight matrix $W^{(l)}$, then normalized by their degrees, and finally summed up. $\sigma$ is a non-linearity function.
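To make this propagation rule concrete, below is a minimal NumPy sketch of one GCN layer with symmetric normalization and self-loops. This is our illustration rather than code from the paper; the function name and the choice of ReLU for $\sigma$ are assumptions.

```python
import numpy as np

def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One GCN layer: H' = sigma(D^{-1/2} (A + I) D^{-1/2} H W)."""
    N = A.shape[0]
    A_hat = A + np.eye(N)                     # add self-loops so a node keeps its own features
    deg = A_hat.sum(axis=1)                   # degrees including self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)    # ReLU as the non-linearity sigma
```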

GAT was first proposed by Veličković et al. [45], and its message passing function is defined as follows:

$$h_v^{(l+1)} = \sigma\Big(\sum_{u \in \mathcal{N}(v) \cup \{v\}} \alpha_{vu}\, W^{(l)} h_u^{(l)}\Big), \tag{3}$$

where $W^{(l)}$ and $\sigma$ are defined as above. Different from GCN, GAT assigns a different weight (attention coefficient) to each neighbor. The attention coefficients are computed as:

$$\alpha_{vu} = \frac{\exp\big(\mathrm{LeakyReLU}\big(a^\top [W h_v \,\|\, W h_u]\big)\big)}{\sum_{k \in \mathcal{N}(v) \cup \{v\}} \exp\big(\mathrm{LeakyReLU}\big(a^\top [W h_v \,\|\, W h_k]\big)\big)}, \tag{4}$$

where $a$ is a weight vector, $\|$ denotes concatenation, and LeakyReLU is an activation function defined as $\mathrm{LeakyReLU}(x) = \max(x, \beta x)$ with a small negative slope $\beta$ (0.2 in [45]).
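As an illustration of Eq. 4, the following sketch computes the attention coefficients of one node over its neighborhood (including the node itself). All names here are hypothetical; the negative slope of 0.2 follows [45].

```python
import numpy as np

def leaky_relu(x: np.ndarray, beta: float = 0.2) -> np.ndarray:
    return np.where(x > 0, x, beta * x)

def attention_coefficients(h_v, neighbor_feats, W, a):
    """Attention of node v over N(v) plus v itself, as in Eq. 4.

    h_v: feature vector of node v; neighbor_feats: list of neighbor vectors;
    W: weight matrix; a: attention weight vector.
    """
    candidates = [h_v] + list(neighbor_feats)
    logits = np.array([
        leaky_relu(a @ np.concatenate([W @ h_v, W @ h_u]))  # a^T [Wh_v || Wh_u]
        for h_u in candidates
    ])
    exp = np.exp(logits - logits.max())  # softmax with numerical stabilization
    return exp / exp.sum()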

GNN models are mainly applied to node classification and link prediction tasks on individual graphs. In this paper, we take node classification as an example to illustrate how our approach can improve the interpretation of GNN models and facilitate model diagnosis. Such node classification tasks are often performed in a semi-supervised way: given a set of labeled nodes (i.e., training nodes) in a graph, a GNN model is trained to predict the labels of the remaining nodes.
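A minimal PyTorch sketch of this semi-supervised setup follows, assuming a hypothetical `model` that maps node features and an adjacency representation to per-node logits, and a boolean `train_mask` marking the labeled nodes; the hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F

def train(model, X, A, y, train_mask, epochs=200, lr=0.01):
    """Semi-supervised node classification: loss is computed on labeled nodes only."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=5e-4)
    for _ in range(epochs):
        model.train()
        optimizer.zero_grad()
        logits = model(X, A)                        # predictions for every node in the graph
        loss = F.cross_entropy(logits[train_mask],  # supervise only the training nodes
                               y[train_mask])
        loss.backward()
        optimizer.step()
    return model
```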

4 Design Requirement Analysis

We worked closely with two GNN experts, who are also co-authors of this work, to collect their feedback on the GNN interpretation issues they face and their current practices for understanding and diagnosing GNN models. One expert (E1) is a senior researcher who specializes in developing new kinds of GNNs. The other expert (E2) is a deep learning developer with strong experience in applying GNNs to modeling and analyzing topology data from different application domains such as online education and visualization. The development of GNNVis was conducted in an iterative way: after we finished each version of the system, we asked the experts to use the pilot system, comment on its limitations, and suggest possible improvements. By combining the original requirements proposed by the experts with their subsequent comments on the limitations of the system, we compiled a list of major design requirements, which can be summarized as follows:

R1: Provide an overview of GNN results. Both experts commented that an overview of the GNN performance is crucial for GNN analysis. To gain an overview of the dataset and classification results, the system needs to summarize various types of information, such as the degree distribution and the ground truth label distribution. This information, covering various aspects of a GNN model, needs to be organized and presented in a clear manner. Meanwhile, the correlations among this information should be presented to help users develop initial hypotheses about possible error patterns in GNN results, i.e., sets of wrong predictions that share similar characteristics.

R2: Identify error patterns. After developing initial hypotheses about the error patterns, users need more detailed information to verify them. Specifically, users need to examine the characteristics shared by a set of wrong predictions and verify whether error patterns formed by these characteristics make sense in analyzing GNNs based on their domain knowledge. During the interview, experts agreed that they usually use several characteristics to group the wrong predictions and identify error patterns. For example, one expert stated that “misclassified nodes usually have a relatively large shortest path distance to the labeled nodes.” Therefore, the system should support users in examining these characteristics and identifying error patterns.

R3: Analyze the cause of error patterns. After identifying error patterns, finding the causes of these errors is important for users to understand, diagnose, and improve the GNNs. More detailed information is needed to understand the possible causes of error patterns. Specifically, users need to inspect the graph structures and node features to determine the causes of error patterns. According to the feedback from expert interviews, there are two main sources of wrong GNN predictions: noise in the training data and inaccurate feature aggregation in GNNs. To predict the label of a node, GNN aggregates the node’s own feature with the features of the neighboring nodes at each layer. Noise in the training data, e.g., the same nodes but different labels, can confuse the GNN and lead to wrong predictions. Inaccurate feature aggregation at any layer will also influence the GNN prediction of the node.

5 GNNVis

This section describes the details of the proposed approach, GNNVis. We first provide a system overview. Inspired by the fact that GNN prediction results are influenced by both the graph structure and the node features [57], we define two proxy models and various kinds of metrics to help users effectively comprehend the causes of errors in GNN prediction results. Similar to the ablation studies used when evaluating GNN models [52], we use two proxy models, GNNWUF and MLP, to reflect the respective impact of the graph structure and the node features on the GNN prediction results. To further help understand this impact, we also provide a number of metrics, including graph-structure-based metrics that take the graph structure into account but ignore the node features, and node-feature-based metrics that take the node features into account but ignore the graph structure. Detailed information on the proxy models and metrics is provided below. Finally, we introduce the detailed visualization design of each view.

5.1 System Overview

The GNNVis system consists of three major modules: storage, data processing, and visualization. The storage module stores and manages graph data and models. The data processing module implements the necessary procedures for analyzing the graph and model predictions, especially for calculating various kinds of metrics. The processed data is then passed to the visualization module, which supports the interactive visual analysis of the GNNs. The storage and data processing modules are developed using Python and integrated into a back-end web server built with Flask. The GNN models are implemented with PyTorch. We implement the visualization module as a front-end application using React, Typescript, and D3.

5.2 Proxy Models Training and Metrics Definition

We define two proxy models to analyze the influence of the graph structure and the node features on GNN prediction results. GNNs consider both the graph structure and the node features to make predictions. In the expert interviews, the experts were concerned about whether the graph structure or the node features have a greater impact on a GNN's predictions, i.e., which component matters more. Hence, we define two proxy models: GNNWUF and MLP. The two proxy models have the same model architecture as the GNN but are trained on different input data: GNNWUF is trained using only the graph structure, while MLP is trained using only the node features. When training GNNWUF, we use a one-hot encoding as the feature of each node, so that GNNWUF only considers the graph structure. When a GNN considers only the features of the node itself, it degenerates into an MLP model; hence, MLP is chosen as the other proxy model and is used to evaluate the influence of the node features. We train both proxy models with the same settings as the original GNN.
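A sketch of how the two proxy inputs could be constructed, assuming a GCN-style model that takes a feature matrix and an adjacency matrix; the helper name is ours and the paper does not prescribe this exact implementation.

```python
import torch

def make_proxy_inputs(X: torch.Tensor, A: torch.Tensor):
    """Build inputs for the two proxy models.

    GNNWUF: same architecture as the GNN, but the features are replaced by a
    one-hot identity matrix, so only the graph structure carries information.
    MLP: for GCN-style propagation, an adjacency of self-loops only makes the
    normalized adjacency the identity, so the model degenerates into an MLP
    that sees only the node's own features.
    """
    N = X.shape[0]
    X_structure_only = torch.eye(N)  # one-hot node encoding for GNNWUF
    A_features_only = torch.eye(N)   # self-loops only: feature-only "MLP" proxy
    return X_structure_only, A_features_only
```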

To help users understand the graph dataset and error patterns of the GNN prediction results, we further compute a number of node-level metrics. Those metrics are derived from expert interviews. Details are presented in the following paragraphs.

  • Ground truth label and prediction results of GNN, GNNWUF, and MLP. The ground truth label is a basic metric which enables us to inspect the GNN prediction results. The predictions of the three models help in the investigation of how the GNN prediction is influenced by the node’s own features and the features from neighboring nodes. The comparison among the three models helps users understand how well GNN makes use of the graph structure and the node features.

  • Confidence. We use the GNN's prediction probability for the predicted label of each node to delineate the confidence of the GNN model on that node. This makes model users aware of how confidently the model arrives at a specific prediction.

  • Node degree. GNNs mainly operate over the neighborhood of each node, so the node degree can affect the final performance of a GNN model. Therefore, the node degree is considered in this study.

  • Center-neighbor consistency rate. The center-neighbor consistency rate depicts how consistent the labels of the current node and its surrounding neighbors are. It can be divided into four major categories by considering both the ground truth labels and the predictions: (1) Label consistency shows the percentage of neighbors which have the same ground truth label as the current node; (2) Label-Prediction consistency describes the percentage of neighbors whose GNN prediction labels are the same as the current node's ground truth label; (3) Prediction-Label consistency delineates the percentage of neighbors whose ground truth labels are the same as the current node's GNN prediction label; and (4) Prediction consistency refers to the percentage of neighbors which have the same GNN prediction label as the current node. These values indirectly reflect how many neighbors satisfy each consistency condition. If the node degree is zero, the consistency rate is set to zero. These metrics help users check whether the one-hop neighborhood influences the GNN prediction result on the node of interest.

  • Shortest path distance to training nodes. We use the breadth-first search (BFS) algorithm to calculate the shortest path distance from the current node to the training nodes. The algorithm first traverses the current node and then the neighbors of the visited nodes. When it first encounters a node in the training set, it regards the distance from that node to the current node as the shortest path distance from the current node to the training nodes (see the code sketch after this list). The distribution of the training nodes, also called labeled nodes, can have a significant influence on GNN predictions [53].

  • Nearest training nodes label distribution. To investigate the influence of the training node distribution on model training, we calculate the nearest training nodes label distribution. We first find the training nodes closest to the current node in terms of shortest path distance. Then we count the frequency of the labels of these training nodes and normalize the frequencies into $[0, 1]$. The normalized frequencies are considered the nearest training nodes label distribution.

  • Nearest training nodes dominant label consistency. To help users quickly capture the dominant information in the nearest training nodes label distribution and further diagnose the causes of errors in GNN prediction results, we define the nearest label as the label that occurs most frequently among the training nodes closest to a specific node in terms of topological distance. We then consider whether the nearest label is consistent with the ground truth label of this node. If yes, we set the nearest training nodes dominant label consistency for this node to True; otherwise, it is set to False. Sometimes there are multiple nearest labels, in which case we set the value to Not Sure. This metric is derived from the nearest training nodes label distribution and the ground truth label of the current node. If it is True, the current node can obtain information from the structure and the training nodes and has a high chance of being correctly classified; otherwise, it has a high probability of being misclassified.

  • The label distribution of the top-k training nodes with the most similar features. The feature similarity between two nodes is defined as the cosine distance between their feature vectors. We first find the top-k training nodes with the most similar features to the node of interest. Then we count the frequency of the labels of those training nodes and normalize the frequencies into $[0, 1]$. They are then considered the label distribution of the top-k training nodes with the most similar features. With this metric, we can analyze the influence of node features on GNN predictions. Empirically, we set k to 5 in our implementation.

  • Top-k most similar training nodes dominant label consistency. Similar to the definition of the previous metric, we can also calculate the top-k most similar training nodes dominant label consistency in the same way. The major difference is that this metric reflects the influence of training node features on the model prediction results on the current node.
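As referenced in the shortest-path bullet above, here is a sketch of how the two structure-related metrics could be computed with a layered BFS. The data structures (an adjacency dict and a training-label dict) are our assumptions, not the paper's implementation.

```python
from collections import Counter

def nearest_training_label_distribution(adj, train_labels, node, num_classes):
    """BFS from `node` until the first layer containing training nodes.

    adj: dict mapping node -> list of neighbor nodes
    train_labels: dict mapping training node -> ground truth label
    Returns (shortest path distance, normalized label distribution).
    """
    visited, frontier, dist = {node}, [node], 0
    while frontier:
        hits = [n for n in frontier if n in train_labels]
        if hits:  # first BFS layer that contains training nodes
            counts = Counter(train_labels[n] for n in hits)
            total = sum(counts.values())
            return dist, [counts.get(c, 0) / total for c in range(num_classes)]
        nxt = []
        for n in frontier:
            for m in adj[n]:
                if m not in visited:
                    visited.add(m)
                    nxt.append(m)
        frontier, dist = nxt, dist + 1
    return float("inf"), [0.0] * num_classes  # no training node reachable
```

The dominant label consistency metric then follows by comparing the argmax of the returned distribution with the node's ground truth label.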

5.3 Visualization

As shown in Fig. 1, the GNNVis visualization module consists of a Control Panel (a), a Parallel Sets View (b), a Projection View (c), a Graph View (d), and a Feature Matrix View (e). The Control Panel allows users to select a graph dataset and inspect different subsets of the dataset (e.g., all, training, validation, and testing). The Parallel Sets View (Fig. 1(b)) visualizes the distribution of the node-level metrics defined in Section 5.2. Users can select a subset of metrics to inspect their distribution and correlation. Users can then select a subgroup of nodes and inspect them in the Projection View (Fig. 1(c)). The Projection View presents a set of 2D projections of the selected nodes according to metrics summarized from different perspectives. Users can lasso a potential cluster of nodes to see their location in the whole graph in the Graph View (Fig. 1(d)) and their feature distributions in the Feature Matrix View (Fig. 1(e)).

5.3.1 Parallel Sets View

In order to provide a high-level summary (R1) and help users understand the datasets and identify error patterns in GNN model predictions (R2), we design the Parallel Sets View to visualize node-level metrics using Parallel Sets [4]. Previous work [39, 51, 37, 10, 21] explored selecting a subset of sample properties to study machine learning models. Inspired by this research, we use the same strategy and propose to use Parallel Sets to investigate error patterns in GNN prediction results, following previous work [46, 7]. Users can select which metrics to display in the Parallel Sets through the Parallel Sets Settings Modal. In general, displaying fewer than five axes is good practice to reduce visual clutter and make efficient use of Parallel Sets. Since Parallel Sets display categorical variables, we convert continuous metrics into categorical ones by grouping ranges of values into categories, so that they can also be shown in the Parallel Sets View.

As shown in Fig. 1(b), each axis of the Parallel Sets shows a categorical variable. The axis is partitioned into multiple segments representing different categories of the variable, and the width of each segment represents the number of nodes falling into that category, so the distribution of the categories can be read directly from the axis. Between two consecutive axes, multiple ribbons connect the axes, each representing the set of nodes that simultaneously satisfies the categories specified on the two axes.

Users can easily select a subset of nodes in the dataset and further investigate their node metrics and the GNN model prediction results. When users click on a segment, the corresponding category of that axis will be selected. Also, when users click on the ribbon in the Parallel Sets, the corresponding set of nodes will be selected. Besides, the axes in the Parallel Sets can be easily reordered by users through drag-and-drop. By filtering the nodes according to node-level metrics such as correctness and ground truth label, users can easily select a node subset of their interest for further analysis.

A common alternative for visualizing multivariate data is the Parallel Coordinates Plot (PCP) [20]. Each data point is visualized as a single line across different attributes. However, when it comes to categorical data, it is challenging to identify the proportions of data that fall into specific categories. Compared with PCP, Parallel Sets intuitively show the distribution of the categories in each axis and the correlation between multiple axes. Thus, Parallel Sets are finally chosen to display the overall distribution of node attributes.

Fig. 3: (a) The links connecting different kinds of information of the same nodes shown in different planes will be displayed when users lasso a group of nodes in one plane. (b-e) Node glyphs design in planes of the Projection View. Color indicates the corresponding label. The color legend is shown in the bottom left of Fig. 1(d).

5.3.2 Projection View

With the overview of the dataset and GNN models provided by the Parallel Sets View, we further design the Projection View to give users more insights into the subset of nodes selected in the Parallel Sets View (R2, R3). We group subsets of node-level metrics, display them in glyphs, and project them onto 2D planes. The Projection View allows users to investigate the similarity of nodes from different perspectives, which is helpful for investigating whether nodes with similar node metrics share similar error patterns.

In the Projection View, we provide a set of linked projection planes of the nodes, each using different features. Different from similar designs in EmbeddingVis [26], we design different node glyphs to display different combinations of node-level metrics. To project node glyphs without overlap, we use t-SNE [44], a widely used projection technique, together with a force-directed collision-avoidance method. When users lasso-select a set of nodes in one projection plane, links between the same nodes in different planes are shown to help users identify the nodes and other aspects of their properties, as shown in Fig. 3. When users hover over a node glyph, the legend and detailed information for that glyph are displayed. However, due to limited screen space, the view cannot display hundreds, let alone thousands, of node glyphs. Therefore, we apply a hierarchical clustering algorithm with complete linkage to cluster the nodes based on a plane-specific distance function [12]. Cluster-level node glyphs are designed by aggregating the node-level metrics of the individual nodes in each cluster. To help users further inspect individual nodes, after selecting a subset of cluster-level node glyphs, users can switch to "Detail" mode, and the Projection View then displays individual node glyphs for the nodes in the selected clusters. This design greatly enhances the scalability of the Projection View.
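A sketch of this projection-and-clustering step using off-the-shelf implementations (scikit-learn's t-SNE with a precomputed distance matrix and SciPy's complete-linkage clustering); the paper does not specify these exact libraries or parameters.

```python
from sklearn.manifold import TSNE
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def project_and_cluster(D, n_clusters=50, seed=0):
    """Project nodes to 2D and group them into glyph clusters.

    D: symmetric pairwise distance matrix (zero diagonal) for one projection
       plane, e.g., computed with the plane's distance function (Eq. 5).
    """
    xy = TSNE(n_components=2, metric="precomputed",
              init="random", random_state=seed).fit_transform(D)
    Z = linkage(squareform(D, checks=False), method="complete")  # complete linkage
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")     # cut into clusters
    return xy, labels
```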

In our implementation, we categorize the metrics into four groups and provide four projections for each group of node metrics. The four projection planes are prediction results comparison, surrounding nodes label consistency, training nodes structure influence, and training nodes feature influence, respectively. Different glyph designs are also proposed for the nodes. We introduce them one by one in the following paragraphs.

A. Prediction results comparison. This plane aims to help users compare the prediction results of different models and reveal the relative influence of structure and features for each node or cluster. The metrics used in this plane include the ground truth label $y$, the prediction labels of the three models ($\hat{y}^{GNN}$, $\hat{y}^{GNNWUF}$, $\hat{y}^{MLP}$), and the confidence $c$ of the GNN prediction. As shown in Fig. 3(b), the three model prediction results are shown in the pie chart. The inner circle encodes the ground truth label. The outer circular ring encodes the confidence. The radius of the whole node glyph encodes the size of the cluster. Through such a node glyph, users can easily compare the ground truth label and the model prediction results and understand how confidently the GNN model makes its predictions. Through the projection, nodes with similar metrics are placed in close proximity. Users can see whether there are clusters of nodes with the same ground truth labels and predictions, which helps GNN model developers and users further analyze what causes the model to make such predictions. For projection and clustering, the distance between node $i$ and node $j$ in this plane is defined as:

$$D(i,j) = \mathbb{1}(y_i \neq y_j) + \mathbb{1}(\hat{y}^{GNN}_i \neq \hat{y}^{GNN}_j) + \mathbb{1}(\hat{y}^{GNNWUF}_i \neq \hat{y}^{GNNWUF}_j) + \mathbb{1}(\hat{y}^{MLP}_i \neq \hat{y}^{MLP}_j) + |c_i - c_j|, \tag{5}$$

where $\mathbb{1}(\cdot)$ is an indicator function that equals 1 when the expression is true and 0 otherwise. Such a distance function guarantees that the value of each term is between 0 and 1.
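A small sketch of this distance function as reconstructed in Eq. 5; the dictionary keys are illustrative.

```python
def prediction_comparison_distance(node_i: dict, node_j: dict) -> float:
    """Distance for the prediction results comparison plane (Eq. 5).

    Each node is a dict with ground truth label 'y', model predictions
    'gnn', 'gnnwuf', 'mlp', and GNN confidence 'conf' in [0, 1].
    """
    d = 0.0
    for key in ("y", "gnn", "gnnwuf", "mlp"):
        d += float(node_i[key] != node_j[key])  # indicator terms, each in {0, 1}
    d += abs(node_i["conf"] - node_j["conf"])   # confidence term, also in [0, 1]
    return d
```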

B. Surrounding nodes label consistency. To help users explore the label consistency between a node and its neighbors, this plane shows the ground truth label $y$, the degree $\delta$, and the center-neighbor consistency rates $\mathbf{c} = (c_1, c_2, c_3, c_4)$, where $c_1$ represents Label consistency, $c_2$ represents Label-Prediction consistency, $c_3$ represents Prediction-Label consistency, and $c_4$ represents Prediction consistency. The node glyph (Fig. 3(c)) is designed to show this group of metrics. The design is inherited from the Radar Chart, as it can display continuous variables. The color of the polygon encodes the ground truth label. The radius of the whole node glyph encodes the size of the cluster. Clusters can be easily spotted by users, since the corresponding node glyphs share a similar shape. For projection and clustering, the distance between node $i$ and node $j$ is defined as:

$$D(i,j) = |\hat{\delta}_i - \hat{\delta}_j| + \|\mathbf{c}_i - \mathbf{c}_j\|, \tag{6}$$

where $\hat{\delta}$ is the normalized degree, which bounds the value between 0 and 1, and $\|\cdot\|$ denotes the norm of a vector.

C. Training nodes structure influence. To help users capture the structural influence of training nodes on GNN model predictions, the metrics visualized in this plane include the GNN prediction label $\hat{y}$, the shortest path distance $s$ to the training nodes, and the normalized nearest training nodes label distribution $\mathbf{t} \in [0, 1]^K$, where $K$ is the number of classes. In order to encode $s$ in the node glyph and highlight the difference between smaller values (i.e., $s = 1, 2, 3, \ldots$), we define the closeness $\rho = 1/s$, which depicts how close the nearest training nodes are to the current node. The node glyph (Fig. 3(d)) is designed to show this group of metrics. The length of the line on top of the rectangle encodes the closeness. The rectangle on the right-hand side of the glyph shows the distribution of the ground truth labels of the training nodes with the shortest path distance to that node. The width and height of the whole node glyph encode the size of the cluster. The left-hand side rectangle encodes the GNN prediction label. This helps users analyze the correlation between those variables: a high correlation between $\hat{y}$ and the dominant component of $\mathbf{t}$ indicates that the closest training nodes have a strong influence on the GNN's prediction for the current node. For projection and clustering, the distance between node $i$ and node $j$ is defined as:

$$D(i,j) = \mathbb{1}(\hat{y}_i \neq \hat{y}_j) + |\rho_i - \rho_j| + \|\mathbf{t}_i - \mathbf{t}_j\|. \tag{7}$$

D. Training nodes feature influence. We use another plane to help users capture the feature influence of the training nodes. The metrics used in this plane include the GNN prediction label $\hat{y}$ and $\mathbf{f}$, the label distribution of the top-k training nodes with the most similar features. The node glyph (Fig. 3(e)) shares a similar visual design with Fig. 3(d). The difference is that the right-hand side rectangle encodes the ground truth label distribution of the top-k training nodes with the most similar features, and the node glyphs do not have a line at the top. This plane enables users to analyze GNN prediction results from the perspective of features: clusters on the projection plane indicate nodes that are similarly affected by their features. Combined with the Feature Matrix View, we can determine which features may play a positive or negative role in the GNN predictions. We use a distance function similar to that of the training nodes structure influence plane:

$$D(i,j) = \mathbb{1}(\hat{y}_i \neq \hat{y}_j) + \|\mathbf{f}_i - \mathbf{f}_j\|. \tag{8}$$

There are a few design alternatives for those node glyphs. For the node glyph in the prediction results comparison plane, we could use a grid to represent the ground truth label and the three model prediction results. However, such a design cannot effectively help users compare the metrics along the diagonal and would confuse users, so it was not adopted. For the node glyph in the surrounding nodes label consistency plane, an alternative design is to use a Parallel Coordinates Plot to display the five continuous metrics. However, it is generally hard for users to distinguish between two such node glyphs, so this design is not used in this plane either. For the node glyphs in the training nodes structure influence and feature influence planes, we could use a glyph similar to that of the prediction results comparison plane, encoding the GNN prediction result in the inner circle and the label distribution in the outer ring. However, to avoid confusion between the glyphs of those planes, we do not use this design in the training nodes structure influence and feature influence planes.

Fig. 4: Graph View enables users to inspect the graph structure. Node glyph in Graph View enables users to compare three model prediction results and ground truth label simultaneously. The color legend indicates which class the color represents. The legend for the node glyphs shows the position at which each metric is encoded. The color in the legend for node glyphs is only intended to show an example of node glyphs.

5.3.3 Graph View

We use the classic node-link diagram with the force-directed collision-avoidance layout to visualize the graph dataset. Users can get a sense of the distribution of the selected nodes in the graph, and inspect the neighborhood of the nodes (R2, R3).

To further facilitate the convenient exploration of the reasons for errors, we design a node glyph to encode a group of node-level metrics. The experts commented that they are interested in the ground truth label and the predictions of the GNN, GNNWUF, and MLP models. Combining the four metrics, they are able to investigate the potential error types of the nodes. As shown in Fig. 4, the glyph designed to present the node-level metrics is similar to the design used in the Projection View. A legend for the glyph is also displayed at the corner of the Graph View as an easy reference for users.

The set of nodes selected in the Parallel Sets View or the Projection View is highlighted in the Graph View. Users can hover over a node in the Graph View, which further highlights it with its radius doubled. The Graph View allows users to quickly check any interesting neighboring nodes. Users can also switch to the "Extended" mode, which further highlights the one-hop or two-hop neighbors of the selected nodes, enabling users to explore different hops of neighborhood nodes. An overview of the graph is displayed in the bottom right-hand corner to support users in navigating the graph; users can click a specific position in the overview to navigate the displayed area of the graph. Users can choose to filter out unfocused nodes to accelerate rendering and reduce visual clutter in the graph. To investigate the node features and the training nodes with the most similar features, users can click on nodes of interest in the Graph View and further explore the node-level features in the Feature Matrix View.

Fig. 5: Feature Matrix View includes a brushable bar chart (top) and a feature matrix (bottom).

5.3.4 Feature Matrix View

We design the Feature Matrix View to help users further explore the node features (R3), as shown in Fig. 5. The Feature Matrix View consists of two components, i.e., a brushable bar chart and a feature matrix.

We first assume that all features used in our dataset range from zero to one. The feature matrix shows all node features: the color encodes the prediction label of each node, and the opacity encodes the specific feature value. In the brushable bar chart, the height of each bar encodes the number of values larger than 0 in the corresponding feature dimension of the feature matrix. Users can brush a range of bars in the brushable bar chart so that the feature matrix displays the selected range of feature dimensions. This makes it convenient for users to inspect high-dimensional node features; without this design, the scalability of the view would not be guaranteed. Users can change the sorting method of the feature dimensions, which can be based on node ordering or on feature frequency. When users select a subset of nodes in the Parallel Sets View or the Projection View, the view displays the features of the selected nodes. A hierarchical clustering algorithm with optimal leaf ordering [3] is employed to generate the node ordering. After sorting the nodes, the similarity between two consecutive nodes is calculated; if they are very similar, we highlight them by adding a border to the rectangles in the corresponding rows. When a node is selected in the Graph View, the view displays the features of that node and of the top-k training nodes with the most similar features, sorted by feature similarity to that node. Sorting by feature frequency uses a heuristic to order the feature dimensions: for each feature dimension, we first count its frequency $n$, then calculate its support $n_s$, i.e., how many nodes have the same feature and the same prediction label as the first node, and finally calculate the support rate of the feature as $r = n_s / n$. A feature dimension with a higher support rate is ranked higher; when the support rates of two dimensions are equal, they are sorted by feature frequency. In this way, we can figure out which features support the predictions of the GNN model.
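A sketch of this frequency-based sorting heuristic as we read it; the function and variable names are ours, not the paper's.

```python
from typing import List, Tuple

def feature_sort_keys(features: List[List[float]],
                      pred_labels: List[int]) -> List[Tuple[float, int]]:
    """Per-dimension sort keys (support rate, frequency), to be ranked descending.

    For each feature dimension d we count its frequency n (selected nodes with a
    nonzero value in d) and its support n_s (those that also share the prediction
    label of the first selected node), then compute the support rate r = n_s / n.
    """
    ref_label = pred_labels[0]          # the first node's prediction label
    keys = []
    for d in range(len(features[0])):
        n = sum(1 for row in features if row[d] > 0)
        n_s = sum(1 for row, lab in zip(features, pred_labels)
                  if row[d] > 0 and lab == ref_label)
        keys.append((n_s / n if n else 0.0, n))
    return keys

# Usage: order dimensions by support rate, breaking ties by frequency.
# order = sorted(range(num_dims), key=lambda d: keys[d], reverse=True)
```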

6 Evaluation

In this section, we demonstrate the effectiveness and usability of the system through two case studies and structured interviews with GNN experts. We conducted the two case studies with the experts E1 and E2 introduced in Section 4.

Fig. 6: The correlation between GCN correctness and label: (a) All nodes in Amazon Photo dataset; (b) Training nodes in Amazon Photo dataset.

6.1 Case One: Error Pattern Analysis of GCN on Amazon Photo Dataset

This case study shows how our approach helps a model researcher explore the error patterns of the GCN model, one of the most representative GNN models, on the Amazon Photo dataset [40]. The Amazon Photo dataset is a co-purchasing network of 7,650 products. Each node represents a product and is classified into one of eight classes, including Film Photography, Digital Concepts, Binoculars & Scopes, Lenses, Tripods & Monopods, Video Surveillance, Lighting & Studio, and Flashes. Each edge represents a co-purchasing relationship, i.e., the two products were purchased by the same customer. Each node feature is a 0-1 vector indicating whether the corresponding word appears in the product reviews.

6.1.1 Developing Initial Hypotheses about the Possible Error Patterns in GNN Results

E1 started his analysis from the Parallel Sets View. E1 found that the GCN model achieves an accuracy of 91.15% on the whole dataset and 91.80% on the test set. The model performance is consistent with results reported in other papers [40]. E1 changed the first axis of the Parallel Sets View to GCN correctness by dragging the corresponding axis to the first position. The total number of wrongly predicted nodes is 677. Then E1 explored which variables the wrong predictions correlate with. E1 put the Label on the second axis and found that the GCN model makes the highest percentage of wrong predictions on Class 7 nodes. This is indicated by the ribbon flowing from the wrong category to the ground truth label "7", which occupies the largest portion of the ground truth label "7" in Fig. 6(a). E1 used the Control Panel (Fig. 1(a)) to see the training node information by ticking "Training" only. E1 found that the training nodes are sampled with even probability from the eight classes, which is shown by the similar distributions of ground truth labels in the training nodes and in all nodes (Fig. 6(b)). The number of nodes with the ground truth label "7" is small compared with the other labels, and the number of training nodes with the ground truth label "7" is also small. Perhaps this is the reason GCN is unable to correctly classify the nodes in Class 7. However, E1 also found that the number of training nodes with the ground truth label "0" is also small, yet the GCN model correctly classifies most of the nodes in Class 0. E1 therefore doubted his hypothesis and decided to further investigate the cause of the wrong predictions (R1).

6.1.2 Forming the Hypothesis about Possible Error Patterns

E1 selected four axes (Label, GCN correctness, nearest training nodes dominant label consistency, and top-k most similar training nodes dominant label consistency) to display in the Parallel Sets View using the Parallel Sets Settings Modal, because E1 thought those variables are important for analyzing the error patterns. After hovering over Label "0" and Label "7", E1 found that for nearest training nodes dominant label consistency, most nodes with Label "0" have the value True, while most nodes with Label "7" have the value False. This shows that, from the graph structure perspective, nodes with Label "0" can easily find training nodes with the same ground truth label among the training nodes with the shortest path distance to them, while nodes with Label "7" cannot. Therefore, E1 speculated that this is the reason for the higher classification error rate when the GCN model is applied to the nodes of Label "7" (R2).

6.1.3 Analyzing the Cause of Error Patterns

To further verify the cause of the error patterns identified above, E1 selected 150 wrongly-classified nodes of Label "7" in the Parallel Sets View (Fig. 1(b)) and further explored the other views. In the training nodes structure influence and feature influence planes of the Projection View (Fig. 1(c)), few nodes of Label "7" appear in the label distributions, and many prediction labels are consistent with the largest component on the right-hand side of the glyph. In the surrounding nodes label consistency plane, the label consistency of most nodes is relatively small, which means that the labels of the neighbors of these nodes are mostly inconsistent with the labels of the nodes themselves. E1 further explored three training nodes that are also misclassified, as shown in Fig. 1(c). E1 lasso-selected them and then selected one of the nodes in the Graph View. E1 found that it has a large number of neighbors with other ground truth labels. In the Feature Matrix View, there are also many training nodes with other labels. Therefore, E1 believed that the error on this training node is due to the existence of a large number of neighboring nodes with other ground truth labels around it. It is also observed from the node glyphs that the GCNWUF prediction result is consistent with the GCN prediction result, while the MLP prediction result is consistent with the current node's ground truth label. This supports the conclusion that, for this training node, the structural impact on the GCN prediction result may be larger than the impact of the features (R3).

Fig. 7: E2 selected a cluster (a1) to inspect in the Projection View (a). Then E2 selected a node in the Graph View (b) to further inspect its neighborhood. E2 found that in the Feature Matrix View (c), the first few words are "markov", "model", and "chain" (c1), which are common words in papers belonging to Probabilistic Methods. This may be a reason for the misclassification of that node.

6.2 Case Two: Error Pattern Analysis of GAT on Cora-ML Dataset

The model developer, E2, often needs to use GNNs to model network data in real applications. This case study shows how GNNVis assists him in analyzing another representative GNN model (i.e., the GAT model) on the Cora-ML dataset [5]. Specifically, the Cora-ML dataset is a citation network of 2,810 scientific publications. Each node in the Cora-ML dataset represents a paper and is classified into one of seven classes, including Case-Based, Theory, Genetic Algorithms, Probabilistic Methods, Neural Networks, Rule Learning, and Reinforcement Learning. Each edge is a citation relationship. Each node feature is a vector with each element ranging from 0 to 1; a feature element larger than 0 indicates that the paper abstract contains the corresponding word.

6.2.1 Forming the Hypothesis about Possible Error Patterns

In the Parallel Sets View, E2 found that GAT achieves an accuracy of 86.16% on the whole dataset and 84.70% on the testing set. After inspecting the overview of metrics in the Parallel Sets View, E2 selected three axes (nearest training nodes dominant label consistency, top-k most similar training nodes dominant label consistency, and GAT correctness), because E2 thought those metrics are important and can support a detailed analysis. E2 found an interesting set of nodes with the nearest training nodes dominant label consistency being True, the top-k most similar training nodes dominant label consistency being False, and the GAT correctness being wrong. E2 decided to explore these nodes further and selected them in the Parallel Sets View by clicking the ribbon satisfying the above conditions.

In the Projection View (Fig. 7(a)), E2 found that in the training nodes feature influence plane, the left side and the right side of the glyphs have the same color (Fig. 7(a1)). This consistency means that the GAT prediction labels are consistent with the dominant labels of the top-k training nodes with the most similar features, which implies that the node features may have a great impact on the GAT predictions. Then E2 selected one of the clusters and checked the other planes of the Projection View. In the training nodes structure influence plane, E2 could see that the left-hand side and the right-hand side of the highlighted node glyphs have different colors. In the surrounding nodes label consistency plane, the label consistency is generally large, indicating that the ground truth labels of the surrounding nodes are consistent with the current node's ground truth label. In the prediction results comparison plane, the prediction results of MLP are consistent with those of GAT. This also shows that the node features of this cluster of nodes may have a negative impact on the GAT model's performance on them (R1, R2).

6.2.2 Analyzing the Cause of Error Patterns

To verify his observation, E2 also explored the Graph View and selected a node for further checking. E2 found that the node has some neighbors with a different ground truth label, as shown in Fig. 7(b). In the Feature Matrix View (Fig. 7(c)), E2 could see that the ground truth labels of most of the listed training nodes are the same as the node's GAT prediction label. The first few words are "markov", "model", and "chain" (Fig. 7(c1)), which are common words in papers about Probabilistic Methods. This article contains these words, but its ground truth class is Neural Networks. Therefore, these features may be one of the reasons for the misclassification made by GAT, and they have a significantly negative impact on the performance of the GAT model (R3).

6.3 Expert Interviews

To evaluate whether GNNVis can help users find possible error patterns and is easy to understand and use, we conducted interviews with four experts (E3, E4, E5, E6). The four experts have diverse research interests in the field of GNNs. E3 has experience in research on Graph Pooling and Graph Agreement Models. E4 has experience in research on new GNN models and GNN model robustness. E5 has experience in applying GNNs in healthcare, such as predicting drug interactions. E6 has experience in utilizing GNNs to generate graph embeddings. None of the four experts is a co-author of this paper, and none knew the details of GNNVis before the interviews.

The expert interviews were conducted as follows. The experts first reviewed the introductory materials provided by us, including slides illustrating the problem being solved and the system overview, and a video demonstrating the case studies, the system design, and the workflow of using the system. After the experts reviewed those materials to learn how GNNVis works, we asked them to explore the GNNVis demo by following the demonstrated workflow, to independently find the causes of the prediction errors of individual nodes, and to extract general error patterns in GNN prediction results. Finally, we asked them to complete a post-interview questionnaire to collect their feedback on GNNVis. The questionnaire mainly comprises meta-information about the experts, an evaluation of the effectiveness of GNNVis, and an evaluation of the usability of each component and the overall design of GNNVis. The results and feedback are summarized as follows.

All experts stated that they had no prior experience in using visualization to diagnose GNNs. E3 said that he has used networkx (https://networkx.github.io/) for visualizing graph datasets, but not for diagnosing GNNs. Without visualization, the experts inspected their GNN models in the following ways. E3 commented that he would first inspect traditional evaluation metrics such as accuracy, recall, precision, and loss to monitor the training process, then analyze the error of each class, and further check network properties such as the average shortest-path length, the average clustering coefficient, and the average degree (see the sketch after this paragraph). E4 said that he would check whether the output of each GNN layer is correct. E5 commented that he also uses the training loss and accuracy to monitor the training process; moreover, he would further check the attention weights between misclassified nodes and their neighbors. E6 mentioned that he would first find out whether a GNN works on small test sets and, if it does, train the model on a large dataset. For evaluation, E3 pointed out that a GNN should be evaluated on different splits of the training dataset, since its performance is strongly influenced by the split. E5 commented that memory usage and inference speed should also be considered when evaluating a GNN.
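E3's non-visual workflow can be reproduced with a few networkx calls. The sketch below computes the three graph statistics he mentions; it assumes an undirected networkx graph and restricts the shortest-path average to the largest connected component, since that metric is undefined on disconnected graphs.

```python
import networkx as nx

def graph_level_diagnostics(G):
    """Compute the statistics E3 reports checking when a GNN underperforms:
    average shortest-path length, average clustering coefficient, and
    average degree. Assumes an undirected graph."""
    # Average shortest-path length is only defined on connected graphs,
    # so restrict it to the largest connected component.
    largest_cc = G.subgraph(max(nx.connected_components(G), key=len))
    return {
        "avg_shortest_path": nx.average_shortest_path_length(largest_cc),
        "avg_clustering": nx.average_clustering(G),
        "avg_degree": sum(d for _, d in G.degree()) / G.number_of_nodes(),
    }
```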

Effectiveness: After exploring GNNVis, all experts appreciated our efforts in building an effective system to help them understand and diagnose GNN models. E3 said that GNNVis can help him check properties of the dataset and inspect model behaviors. E5 commented that it can help him extract patterns among misclassified nodes. E6 mentioned that it makes analyzing GNNs more intuitive and helps users analyze the GNN prediction results from multiple perspectives simultaneously. Through the exploration of GNNVis, E3 found that most error cases of the GCN model on the Cora dataset come from nodes with ground truth label "3". He observed that most misclassifications are related to the label distribution of the nearest training nodes, which suggests that a few edges in the graph dataset might be noise for node classification. E6 found that nodes with small degrees are hard for models to classify correctly, since they carry little effective information. Based on these observations, the experts proposed solutions. E3 suggested adding operations to correct edge noise in the model. E6 suggested building a GNN model that applies different prediction strategies to different nodes: for high-degree nodes, a model that relies more on connection information; for low-degree nodes, a model that relies more on the node's feature information (a sketch of this idea follows below). Moreover, he observed that a few nodes misclassified by the GNN can be correctly classified by the MLP, and commented that utilizing multiple models to make predictions may be a good strategy to enhance model performance. The experts said that GNNVis inspires them to inspect GNN models differently than before. E3 would use the training nodes structure influence plane in the Projection View to check the misclassified nodes with ground truth label 3 before and after training, to see whether his model corrects the wrong edges. E6 would explore which patterns are shared by the misclassified nodes and find which kinds of nodes a specific model is suited to make predictions on.
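As a rough illustration of E6's degree-based strategy, the following sketch switches between GNN and MLP predictions per node. The degree threshold is a hypothetical hyperparameter, and the combination rule is our reading of his suggestion rather than an implemented feature of GNNVis.

```python
import numpy as np

def degree_aware_ensemble(gnn_pred, mlp_pred, degrees, threshold=2):
    """Trust the structure-aware GNN on high-degree nodes and fall back
    to the feature-only MLP on low-degree nodes, which carry little
    neighborhood information."""
    degrees = np.asarray(degrees)
    return np.where(degrees >= threshold, gnn_pred, mlp_pred)
```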

Usability: All experts agreed that GNNVis is easy to use and understand. For the Parallel Sets View, they commented that it offers important insights for analyzing GNN model predictions. E3 said that "Parallel Sets View helps me check what the major problem of the models is". E5 mentioned that he can clearly see the ratio of correctly and wrongly classified nodes and whether the structure or the features influence the performance of GNN models. E6 pointed out that the view is intuitive to use; after adjusting the order of the axes in the Parallel Sets View, he can obtain useful information to further support his analysis of the GNN model. E3 liked the Projection View, as it allows him to associate model behavior with node characteristics. Most of the experts also found the Graph View and Feature Matrix View easy to use and understand. Unlike the other experts, E5 preferred the Feature Matrix View over the Graph View, because he is more concerned with node features when using GNNs for drug interaction prediction.

Suggestions: The experts also gave helpful suggestions for improving GNNVis to further support their analysis of GNN models. E3, E5, and E6 mentioned that in the Projection View, they would like to see the node id and other metric information when hovering over a node; we have since implemented this. E3 pointed out that basic node properties such as connectivity, clustering coefficient, and centrality should be computed and displayed, which would further help him check the correlations between those metrics. He also suggested making it easy to download figures from the system, which he found powerful. E3, E5, and E6 commented that the system should support customized datasets, such as graphs with multiple relation types. E5 also suggested a "what-if" analysis that enables users to dynamically insert nodes, change node features, and observe the corresponding changes. Future work may include those functions.

7 Discussions and Future Work

Generalizability: GNNVis can be applied to analyze various kinds of GNN models and datasets. However, it currently supports only the node classification task; it does not yet support the analysis of link prediction or graph classification. Moreover, if the dataset contains multiple relation types or edge features, the system cannot currently be applied to such customized data directly. Users may also want to inspect self-defined metrics or metrics defined in other papers, but the system does not yet support customized metrics.

Scalability: One limitation of GNNVis is its scalability, which we have attempted to mitigate in several ways. For example, the Projection View displays an individual glyph for each node; due to the limited screen space, it cannot display more than about 300 nodes at once. We improved the Projection View with a hierarchical clustering algorithm that makes it scale to more nodes, as described in Section 5.3.2: node glyphs represent clusters, and details can be checked on demand (a sketch of this strategy follows below). This significantly improves the scalability of the Projection View. For the Graph View, to accelerate rendering, we enable users to display only the focused nodes and their neighbors, without rendering the other nodes.
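A minimal sketch of this aggregation strategy, assuming the 2D projected positions are available as a NumPy array; the use of SciPy's Ward linkage and the 300-glyph cap are illustrative choices, not necessarily the exact algorithm used in GNNVis.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def aggregate_glyphs(positions, max_glyphs=300):
    """Hierarchically cluster the 2D projected node positions and draw
    one glyph per cluster, so at most `max_glyphs` glyphs are rendered;
    cluster members can then be expanded on demand."""
    Z = linkage(positions, method="ward")
    cluster_ids = fcluster(Z, t=max_glyphs, criterion="maxclust")
    # One glyph per cluster, placed at the cluster centroid.
    centroids = np.array([positions[cluster_ids == c].mean(axis=0)
                          for c in np.unique(cluster_ids)])
    return cluster_ids, centroids
```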

Future Work: In the future, we plan to generalize GNNVis to other graph-related tasks, such as link prediction and graph classification. We plan to make the Parallel Sets View and Projection View more configurable, for example, by enabling users to define their own metrics (such as clustering coefficients) to show. We also want to further improve the running performance of GNNVis so that it supports graph datasets with more nodes and higher-dimensional features. Finally, we want to extend GNNVis to support dynamically inserting nodes and edges and observing the corresponding changes in the GNN prediction results, which would help users understand more deeply how the graph structure helps a model make correct predictions.
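As a hint of what such a what-if probe could look like, the sketch below adds an undirected edge and reports which nodes change their predicted label. It assumes a PyTorch Geometric style model whose forward pass takes (x, edge_index) and returns per-node logits; the function is hypothetical rather than part of GNNVis.

```python
import torch

@torch.no_grad()
def what_if_add_edge(model, x, edge_index, u, v):
    """Add an undirected edge (u, v) and compare the model's node
    predictions before and after; returns the indices of changed nodes."""
    before = model(x, edge_index).argmax(dim=1)
    # Append both directions, since PyG stores undirected edges as two columns.
    new_edges = torch.tensor([[u, v], [v, u]],
                             dtype=edge_index.dtype, device=edge_index.device)
    edited = torch.cat([edge_index, new_edges], dim=1)
    after = model(x, edited).argmax(dim=1)
    return (before != after).nonzero(as_tuple=True)[0]
```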

8 Conclusion

In this paper, we present GNNVis, a visual analytics system that helps model developers and users understand and diagnose GNN prediction results. GNNVis comprises four visualization components: the Parallel Sets View shows the distribution of metrics; the Projection View presents a set of 2D projections of the selected nodes according to metrics summarized from different perspectives, enabling users to extract potential clusters of nodes; the Graph View shows the whole graph; and the Feature Matrix View shows the feature information of the selected nodes. The system further enables users to check the detailed information of individual nodes. All four components are linked so that users can analyze GNN models from multiple angles simultaneously and extract general error patterns in the GNN prediction results. Two case studies and expert interviews demonstrate the effectiveness and usability of GNNVis.

Acknowledgments

We would like to thank the external experts for participating in our interviews and giving us invaluable feedback. We also thank the anonymous reviewers for their detailed reviews and constructive suggestions.

References

  • [1] B. Alsallakh, A. Hanbury, H. Hauser, S. Miksch, and A. Rauber (2014) Visual methods for analyzing probabilistic classification data. IEEE Transactions on Visualization and Computer Graphics 20 (12), pp. 1703–1712. Cited by: §2.2.
  • [2] J. Atwood and D. Towsley (2016) Diffusion-convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1993–2001. Cited by: §2.1.
  • [3] Z. Bar-Joseph, D. K. Gifford, and T. S. Jaakkola (2001) Fast optimal leaf ordering for hierarchical clustering. In Proceedings of the Ninth International Conference on Intelligent Systems for Molecular Biology, pp. 22–29. Cited by: §5.3.4.
  • [4] F. Bendix, R. Kosara, and H. Hauser (2005) Parallel sets: visual analysis of categorical data. In IEEE Symposium on Information Visualization, pp. 133–140. Cited by: §5.3.1.
  • [5] A. Bojchevski and S. Günnemann (2018) Deep gaussian embedding of graphs: unsupervised inductive learning via ranking. In Proceedings of the 6th International Conference on Learning Representations, Cited by: §6.2.
  • [6] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun (2014) Spectral networks and locally connected networks on graphs. In Proceedings of the 2nd International Conference on Learning Representations, Cited by: §2.1.
  • [7] A. Chaudhuri (2018) A visual technique to analyze flow of information in a machine learning system. In Visualization and Data Analysis, Cited by: §5.3.1.
  • [8] L. Chen, Y. Xie, Z. Zheng, H. Zheng, and J. Xie (2020) Friend recommendation based on multi-social graph convolutional network. IEEE Access 8, pp. 43618–43629. Cited by: §1.
  • [9] M. Defferrard, X. Bresson, and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pp. 3844–3852. Cited by: §2.1.
  • [10] D. Dingen, M. van ’t Veer, P. Houthuizen, E. H. J. Mestrom, H. H. M. Korsten, A. R. A. Bouwman, and J. J. van Wijk (2019) RegressionExplorer: interactive exploration of logistic regression models with subgroup analysis. IEEE Transactions on Visualization and Computer Graphics 25 (1), pp. 246–255. Cited by: §5.3.1.
  • [11] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams (2015) Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems, pp. 2224–2232. Cited by: §2.1.
  • [12] B. Everitt, S. Landau, M. Leese, and D. Stahl (2011) Cluster analysis. 5th edition, Wiley (English). External Links: ISBN 978-0-470-74991-3 Cited by: §5.3.2.
  • [13] M. Fey and J. E. Lenssen (2019) Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds, Cited by: §3.
  • [14] A. Fout, J. Byrd, B. Shariat, and A. Ben-Hur (2017) Protein interface prediction using graph convolutional networks. In Advances in Neural Information Processing Systems, pp. 6530–6539. Cited by: §1.
  • [15] H. Gao, Z. Wang, and S. Ji (2018) Large-scale learnable graph convolutional networks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1416–1424. Cited by: §2.1.
  • [16] I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep learning. MIT Press. Cited by: §2.2.
  • [17] W. Hamilton, Z. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034. Cited by: §1, §2.1.
  • [18] W. L. Hamilton, R. Ying, and J. Leskovec (2017) Representation learning on graphs: methods and applications. IEEE Data Engineering Bulletin 40 (3), pp. 52–74. Cited by: §3.
  • [19] F. Hohman, M. Kahng, R. Pienta, and D. H. Chau (2019) Visual analytics in deep learning: an interrogative survey for the next frontiers. IEEE Transactions on Visualization and Computer Graphics 25 (8), pp. 2674–2693. Cited by: §1, §2.2.
  • [20] A. Inselberg and B. Dimsdale (1990) Parallel coordinates: A tool for visualizing multi-dimensional geometry. In Proceedings of the First IEEE Conference on Visualization, pp. 361–378. Cited by: §5.3.1.
  • [21] M. Kahng, P. Y. Andrews, A. Kalro, and D. H. (. Chau (2018) ActiVis: visual exploration of industry-scale deep neural network models. IEEE Transactions on Visualization and Computer Graphics 24 (1), pp. 88–97. Cited by: §5.3.1.
  • [22] M. Kahng, N. Thorat, D. H. P. Chau, F. B. Viégas, and M. Wattenberg (2018) GAN Lab: understanding complex deep generative models using interactive visual experimentation. IEEE Transactions on Visualization and Computer Graphics 25 (1), pp. 1–11. Cited by: §2.2.
  • [23] T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations, Cited by: §1, §1, §2.1, §3.
  • [24] Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. Nature 521 (7553), pp. 436–444. Cited by: §2.2.
  • [25] Q. Li, Z. Han, and X. Wu (2018) Deeper insights into graph convolutional networks for semi-supervised learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, Cited by: §2.1.
  • [26] Q. Li, K. S. Njotoprawiro, H. Haleem, Q. Chen, C. Yi, and X. Ma (2018) EmbeddingVis: A visual analytics approach to comparative network embedding inspection. In IEEE Conference on Visual Analytics Science and Technology, pp. 48–59. Cited by: §5.3.2.
  • [27] X. Li and J. Saude (2020) Explain graph neural networks to understand weighted graph features in node classification. arXiv preprint arXiv:2002.00514. Cited by: §1, §2.3.
  • [28] Y. Li, D. Tarlow, M. Brockschmidt, and R. S. Zemel (2016) Gated graph sequence neural networks. In Proceedings of the 4th International Conference on Learning Representations, Cited by: §2.1.
  • [29] X. Liang, X. Shen, J. Feng, L. Lin, and S. Yan (2016) Semantic object parsing with graph LSTM. In European Conference on Computer Vision, pp. 125–143. Cited by: §2.1.
  • [30] D. Liu, W. Cui, K. Jin, Y. Guo, and H. Qu (2018) DeepTracker: visualizing the training process of convolutional neural networks. ACM Transactions on Intelligent Systems and Technology 10 (1), pp. 1–25. Cited by: §2.2.
  • [31] M. Liu, J. Shi, K. Cao, J. Zhu, and S. Liu (2017) Analyzing the training processes of deep generative models. IEEE Transactions on Visualization and Computer Graphics 24 (1), pp. 77–87. Cited by: §2.2.
  • [32] M. Liu, J. Shi, Z. Li, C. Li, J. Zhu, and S. Liu (2016) Towards better analysis of deep convolutional neural networks. IEEE Transactions on Visualization and Computer Graphics 23 (1), pp. 91–100. Cited by: §1, §2.2.
  • [33] Y. Ming, S. Cao, R. Zhang, Z. Li, Y. Chen, Y. Song, and H. Qu (2017) Understanding hidden memories of recurrent neural networks. In IEEE Conference on Visual Analytics Science and Technology, pp. 13–24. Cited by: §1, §2.2.
  • [34] Y. Ming, H. Qu, and E. Bertini (2019) RuleMatrix: visualizing and understanding classifiers with rules. IEEE Transactions on Visualization and Computer Graphics 25 (1), pp. 342–352. Cited by: §2.2.
  • [35] F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, and M. M. Bronstein (2017) Geometric deep learning on graphs and manifolds using mixture model CNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5115–5124. Cited by: §2.1.
  • [36] M. Niepert, M. Ahmed, and K. Kutzkov (2016) Learning convolutional neural networks for graphs. In Proceedings of the 33rd International Conference on Machine Learning, pp. 2014–2023. Cited by: §2.1.
  • [37] N. Pezzotti, T. Höllt, J. Van Gemert, B. P. Lelieveldt, E. Eisemann, and A. Vilanova (2017) DeepEyes: progressive visual analytics for designing deep neural networks. IEEE Transactions on Visualization and Computer Graphics 24 (1), pp. 98–108. Cited by: §2.2, §5.3.1.
  • [38] P. E. Pope, S. Kolouri, M. Rostami, C. E. Martin, and H. Hoffmann (2019) Explainability methods for graph convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10772–10781. Cited by: §1, §2.3.
  • [39] D. Ren, S. Amershi, B. Lee, J. Suh, and J. D. Williams (2016) Squares: supporting interactive performance analysis for multiclass classifiers. IEEE Transactions on Visualization and Computer Graphics 23 (1), pp. 61–70. Cited by: §5.3.1.
  • [40] O. Shchur, M. Mumme, A. Bojchevski, and S. Günnemann (2018) Pitfalls of graph neural network evaluation. Relational Representation Learning Workshop, NeurIPS. Cited by: §6.1.1, §6.1.
  • [41] H. Strobelt, S. Gehrmann, M. Behrisch, A. Perer, H. Pfister, and A. M. Rush (2019) Seq2seq-vis: A visual debugging tool for sequence-to-sequence models. IEEE Transactions on Visualization and Computer Graphics 25 (1), pp. 353–363. Cited by: §2.2.
  • [42] H. Strobelt, S. Gehrmann, H. Pfister, and A. M. Rush (2017) LSTMVis: A tool for visual analysis of hidden state dynamics in recurrent neural networks. IEEE Transactions on Visualization and Computer Graphics 24 (1), pp. 667–676. Cited by: §2.2.
  • [43] K. S. Tai, R. Socher, and C. D. Manning (2015) Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, pp. 1556–1566. Cited by: §2.1.
  • [44] L. van der Maaten and G. Hinton (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9, pp. 2579–2605. Cited by: §5.3.2.
  • [45] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio (2018) Graph attention networks. In Proceedings of the 6th International Conference on Learning Representations, Cited by: §1, §3.
  • [46] Z. Vosough and V. Vasyutynskyy (2018) Using parallel sets for visualizing results of machine learning based plausibility checks in product costing. In VisBIA@AVI, pp. 4–11. Cited by: §5.3.1.
  • [47] J. Wang, L. Gou, H. Shen, and H. Yang (2018) Dqnviz: a visual analytics approach to understand deep q-networks. IEEE Transactions on Visualization and Computer Graphics 25 (1), pp. 288–298. Cited by: §1, §2.2.
  • [48] J. Wang, L. Gou, H. Yang, and H. Shen (2018) Ganviz: a visual analytics approach to understand the adversarial game. IEEE Transactions on Visualization and Computer Graphics 24 (6), pp. 1905–1917. Cited by: §1, §2.2.
  • [49] J. Wang, L. Gou, W. Zhang, H. Yang, and H. Shen (2019) DeepVID: deep visual interpretation and diagnosis for image classifiers via knowledge distillation. IEEE Transactions on Visualization and Computer Graphics 25 (6), pp. 2168–2180. Cited by: §2.2.
  • [50] Y. Wang, Z. Jin, Q. Wang, W. Cui, T. Ma, and H. Qu (2020) DeepDrawing: A deep learning approach to graph drawing. IEEE Transactions on Visualization and Computer Graphics 26 (1), pp. 676–686. Cited by: §2.1.
  • [51] J. Wexler, M. Pushkarna, T. Bolukbasi, M. Wattenberg, F. B. Viégas, and J. Wilson (2020) The what-if tool: interactive probing of machine learning models. IEEE Transactions on Visualization and Computer Graphics 26 (1), pp. 56–65. Cited by: §5.3.1.
  • [52] K. Xu, W. Hu, J. Leskovec, and S. Jegelka (2019) How powerful are graph neural networks?. In Proceedings of the 7th International Conference on Learning Representations, Cited by: §2.1, §5.
  • [53] Y. Yang, X. Wang, M. Song, J. Yuan, and D. Tao (2019) SPAGAN: shortest path graph attention network. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, pp. 4099–4105. Cited by: 5th item.
  • [54] Z. Ying, D. Bourgeois, J. You, M. Zitnik, and J. Leskovec (2019) GNNExplainer: generating explanations for graph neural networks. In Advances in Neural Information Processing Systems, pp. 9240–9251. Cited by: §1, §2.3.
  • [55] V. Zayats and M. Ostendorf (2018) Conversation modeling on reddit using a graph-structured LSTM. Transactions of the Association for Computational Linguistics 6, pp. 121–132. Cited by: §2.1.
  • [56] J. Zhang, Y. Wang, P. Molino, L. Li, and D. S. Ebert (2018) Manifold: a model-agnostic framework for interpretation and diagnosis of machine learning models. IEEE Transactions on Visualization and Computer Graphics 25 (1), pp. 364–373. Cited by: §2.2.
  • [57] J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun (2018) Graph neural networks: a review of methods and applications. arXiv preprint arXiv:1812.08434. Cited by: §2.1, §2.1, §5.
  • [58] C. Zhuang and Q. Ma (2018) Dual graph convolutional networks for graph-based semi-supervised classification. In Proceedings of the World Wide Web Conference, pp. 499–508. Cited by: §2.1.