
SoftEdge: Regularizing Graph Classification with Random Soft Edges

by   Hongyu Guo, et al.

Graph data augmentation plays a vital role in regularizing Graph Neural Networks (GNNs), which leverage information exchange along edges in graphs, in the form of message passing, for learning. Due to their effectiveness, simple edge and node manipulations (e.g., addition and deletion) have been widely used in graph augmentation. In this paper, we identify a limitation of this common augmentation technique: simple edge and node manipulations can create graphs that have an identical structure, or structures indistinguishable to message passing GNNs, but conflicting labels. This leads to the sample collision issue and thus degrades model performance. To address this problem, we propose SoftEdge, which assigns random weights to a portion of the edges of a given graph to construct dynamic neighborhoods over the graph. We prove that SoftEdge creates collision-free augmented graphs. We also show that this simple method obtains superior accuracy to popular node and edge manipulation approaches and notable resilience to accuracy degradation as the GNN depth increases.



1 Introduction

Graph Neural Networks (GNNs) (kipf2017semi; velickovic2018graph), which iteratively propagate learned information in the form of message passing, have recently emerged as powerful approaches on a wide variety of tasks, including drug discovery (stokes2020deep), chip design (circuit-gnn), and catalyst invention (godwin2021deep). Recent studies on GNNs, nevertheless, also reveal a challenge in training them: similar to other successfully deployed deep neural networks, GNNs require strong model regularization techniques to rectify their over-parameterized learning paradigm.

To this end, various regularization techniques for GNNs have been actively investigated, combating issues such as over-fitting (10.1145/3446776), over-smoothing (LiHW18; abs-1901-00596), and over-squashing (alon2021on). Despite the complexity of arbitrary structure and topology in graph data, simple edge and node manipulations (e.g., addition and deletion) on graphs (rong2020dropedge; 3412086zhou; You2020GraphCL; zhao2021data) represent a very effective data augmentation strategy, which has been widely used to regularize the learning of GNNs.

Figure 1: 2D embeddings of the original training graphs generated by the trained DropEdge model on the NCI1 dataset with edge drop rates of 20%, 40%, and 60%, respectively. The trained DropEdge model induces more overlapping graph embeddings as the drop rate increases, indicating that many graphs are mapped, by the trained network, to very similar or identical embeddings but with conflicting labels, triggering the sample collision issue.

In this paper, we reveal an overlooked issue in the aforementioned graph data augmentation technique for supervised graph classification. That is, simple edge and node manipulations such as addition and deletion can create graphs that have the same structure, or structures indistinguishable to message passing GNNs, and are nevertheless associated with different labels. We refer to such a conflict as sample collision. When sample collision occurs, these graph samples are mapped to the same embedding but their labels contradict each other. Because of these contradictory samples, the learning model fails to classify them. This essentially induces a form of under-fitting, which inevitably degrades the performance of the model.

Take as an example the popular edge manipulation method DropEdge (rong2020dropedge), which randomly removes a portion of the edges in a given graph for augmentation. Suppose we apply DropEdge to an 8-layer GCN with skip connections (kipf2017semi; li2019deepgcns), with the edge drop rate set to 20%, 40%, and 60%, respectively. Figure 1 plots the graph embeddings of the original training samples, generated by the learned DropEdge model and projected to 2D using t-SNE (vandermaaten08a), where the overlapping embedding regions of the two classes indicate the sample collision issue. Figure 1 shows that many original graphs are mapped, by the trained network, to very similar or identical embeddings, but with conflicting labels. We can clearly see that the overlapping regions grow as the drop rate increases. This observation identifies the sample collision issue in DropEdge, especially with a larger edge drop rate (see Footnote 1). As expected, more overlapping embedding regions can lead to a significant drop in predictive accuracy, as shown in our experiments later.

Footnote 1: Sample collision is difficult to measure purely from the input data, owing to the continuing data augmentation process, the dynamics of the training model, the progressive layer-by-layer mappings of the network, and the implicit inductive biases of GNN models. The visualization of the learned embeddings of the original training set, on the other hand, reflects what the data look like from the perspective of the trained model.

Before addressing the sample collision issue, we identify three types of such sample collision in message passing GNNs, as follows.

The first type is that, a synthetic graph has the same structure as an original training graph but with a different label. That is, a synthetic graph collides with an original graph.

The second type of sample collision is that two synthetic graphs have an identical structure but different labels. Consider the popular benchmark NCI1 dataset, where each input graph represents a chemical compound, with nodes and edges respectively denoting atoms and bonds in a molecule. Figure 2 depicts three pairs of molecular graphs with 10 edges (i.e., bonds) from the NCI1 dataset (see Footnote 2), where each pair contains two graphs with opposite labels. Suppose we remove the red nodes and edges from the graphs, which from left to right respectively represents 10%, 20%, and 30% edge and node deletion. After that, each resultant pair has the same graph structure but opposite labels.

Footnote 2: For illustrative purposes, we omit the graph node features here.

Figure 2: Three pairs of molecular graphs with 10 edges (bonds) from the NCI1 dataset. Each pair (column) contains two graphs with opposite labels. If we remove the red nodes and edges from the graphs (which represents the 10%, 20%, and 30% edge and node deletion, from left to right, respectively), each resultant pair will have the same graph structure but with opposite labels.
Figure 3: By removing the red edges on the left graphs, we have the molecular graphs for Decalin and Bicyclopentyl (right), which form non-isomorphic molecular graphs (with identical node feature of “c”) that can not be distinguished by the WL test (sato2020survey). These two graphs will be mapped, by a standard message passing GNN, to the same graph embedding but with different class labels. SoftEdge solves such ambiguity by assigning random weights to some edges of the graphs.

The third type of sample collision refers to graphs that are indistinguishable by the Weisfeiler-Lehman (WL) test but have different labels, which is specific to message passing GNNs. As shown in (xu2018how; Morris2019), GNNs are in principle limited in their expressiveness by the WL test, a method to determine whether two graphs are isomorphic. In other words, GNNs cannot distinguish certain different graphs, e.g., those that are indistinguishable by the WL test. As an example, the right side of Figure 3 depicts the molecular graphs for Decalin and Bicyclopentyl, which are obtained by removing the red edges from the graphs on the left side. These two graphs (with an identical node feature of “c”) are indistinguishable by the WL test. In this case, the nodes in these two graphs receive the same embedding-based messages from their neighbors and thus produce the same node embeddings. As a result, these two graphs with different labels are mapped to the same graph embedding by GNNs (see Footnote 3).

Footnote 3: The third type of sample collision does not necessarily result from a graph data augmentation process. In fact, the original training dataset itself may contain such graph pairs.
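The indistinguishability of the two molecules can be checked with a few lines of 1-WL colour refinement. The following is a minimal illustrative sketch, not code from the paper: the node numbering, the helper name `wl_histogram`, and the choice of three refinement rounds are assumptions, and every node starts with the same colour to mirror the identical node feature “c”.

```python
from collections import Counter

def wl_histogram(num_nodes, edges, rounds=3, palette=None):
    """Run 1-WL colour refinement and return the final colour histogram."""
    palette = {} if palette is None else palette  # shared map: signature -> colour id
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colours = [0] * num_nodes  # identical initial node feature
    for r in range(rounds):
        # A node's new colour depends on its own colour and the multiset
        # of its neighbours' colours.
        signatures = [
            (r, colours[v], tuple(sorted(colours[u] for u in adj[v])))
            for v in range(num_nodes)
        ]
        colours = [palette.setdefault(s, len(palette)) for s in signatures]
    return Counter(colours)

# Decalin: two six-membered rings sharing the edge (0, 5).
decalin = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
           (0, 6), (6, 7), (7, 8), (8, 9), (9, 5)]
# Bicyclopentyl: two five-membered rings joined by the bond (0, 5).
bicyclopentyl = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0),
                 (5, 6), (6, 7), (7, 8), (8, 9), (9, 5), (0, 5)]

shared = {}  # the same palette must be used for both graphs
hist_a = wl_histogram(10, decalin, palette=shared)
hist_b = wl_histogram(10, bicyclopentyl, palette=shared)
print(hist_a == hist_b)  # equal histograms: the WL test cannot separate them
```

Both graphs have two degree-3 nodes that are adjacent to each other and eight degree-2 nodes, so the refinement stabilises on the same three colour classes in each graph even though the ring sizes differ.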

To address the aforementioned sample collision issue, we propose SoftEdge, which simply assigns random weights in (0, 1) to a portion of the edges of a given graph to generate synthetic graphs. We show that SoftEdge creates collision-free augmented graphs. We also empirically demonstrate that this simple approach can effectively regularize graph classification learning, resulting in superior accuracy to popular node and edge manipulation approaches and notable resilience to the accuracy degradation in deeper GNNs.

SoftEdge is easy to implement in practice. For example, it can be implemented with the following few lines of code in PyTorch Geometric:

    import torch

    row, _ = data.edge_index  # data holds the input graph(s)
    p = 0.2  # 20% of edges receive soft weights
    # random soft weights; 1 - U[0, 1) lies in (0, 1]
    softedge = 1 - torch.rand(row.size(0), device=row.device)
    # mask is 1 with probability 1 - p (edge keeps weight 1), else 0
    mask = torch.full((row.size(0),), 1 - p, device=row.device)
    mask = torch.bernoulli(mask)
    data.edge_weight = softedge * (1 - mask) + mask

2 Related Work

Graph data augmentation has been shown to be very effective in regularizing GNNs for generalizing to unseen graphs (kipf2017semi; YingY0RHL18; velickovic2018graph; klicpera_diffusion_2019; xu2018how; bianchi2020mincutpool). Nonetheless, graph data augmentation is rather under-explored due to the arbitrary structure and topology in graphs. Most such strategies focus heavily on perturbing nodes and edges in graphs (10.5555/3294771.3294869; zhang2018bayesian; rong2020dropedge; ChenLLLZS20; 3412086zhou; 10.1145/3394486.3403168; WangWLCLH20; Fu2020TSExtractorLG; abs-2009-10564; zhao2021data; abs-2104-02478). For example, DropEdge (rong2020dropedge) randomly removes a set of edges of a given graph. DropNode, representing node sampling based methods (10.5555/3294771.3294869; abs-1801-10247; 10.5555/3327345.3327367), samples a set of nodes from a given graph. Leveraging graph augmentation in contrastive learning has also attracted a surge of interest (you2021graph; You2020GraphCL; 2020arXiv201014945Z). Compared to the existing work, our study identifies an intrinsic problem in graph data augmentation, i.e., sample collision, which in turn inspires us to devise a novel method to address this collision issue in graph classification.

The creation of a synthetic image that has the same input as an image from the original training dataset but with a different label was first discussed in image augmentation in Mixup (GuoMZ19; hendrycksaugmix; baena2022preventing), where the term manifold intrusion is used. Our paper was inspired by their observations. In fact the first type of the sample collision we identified, as discussed in Section 1, is the same as their manifold intrusion problem. Differently, in this paper we discover two more intrusion scenarios, and one of them is specific to graph data augmentation.

3 Augmented Graph with Random Soft Edges

3.1 Graph Classification with Soft Edges

We consider an undirected graph $G = (V, E)$ with the node set $V$ and the edge set $E$. $N(v)$ represents the neighbours of node $v$, namely the set of nodes in the graph that are directly connected to $v$ in $E$ (i.e., $\{u \in V \mid (u, v) \in E\}$). We denote with $N$ the number of nodes and $A \in \mathbb{R}^{N \times N}$ the adjacency matrix. Each node in $V$ is also associated with a $d$-dimensional feature vector, forming the feature matrix $X \in \mathbb{R}^{N \times d}$ of the graph. Consider the graph classification task with $C$ categories. Based on a GNN, we aim to learn a mapping function $f: G \rightarrow y$, which assigns a graph $G$ to one of the $C$-class labels $y \in \{1, \ldots, C\}$.

Modern GNNs leverage both the graph structure and node features to construct a distributed vector to represent a graph. The formation of such a graph representation, or embedding, follows the “message passing” mechanism for neighborhood aggregation. In a nutshell, every node $v$ starts with the embedding given by its initial features $x_v$. A GNN then iteratively updates the embedding of a node by aggregating the representations (i.e., embeddings) of its neighbours layer by layer, repeatedly transforming the node embeddings. After the construction of the individual node embeddings, the entire graph representation can be obtained through a READOUT function, which aggregates the embeddings of all nodes in the graph.

Formally, the representation $h_v^{k}$ of node $v$ at the $k$-th layer of a GNN is constructed as follows:

$$h_v^{k} = \mathrm{AGGREGATE}\left(h_v^{k-1}, \{h_u^{k-1} \mid u \in N(v)\}, W^{k}\right) \qquad (1)$$

where $W^{k}$ is the trainable weights at the $k$-th layer; AGGREGATE denotes an aggregation function implemented by the specific GNN model (e.g., the permutation invariant pooling operations Max, Mean, Sum); and $h_v^{0}$ is typically initialized as the node input feature $x_v$.

To provide the graph-level representation $h_G$, a GNN typically aggregates node representations by implementing a READOUT graph pooling function (e.g., Max, Mean, Sum) to summarize information from its individual nodes:

$$h_G = \mathrm{READOUT}\left(\{h_v^{k} \mid v \in V\}\right) \qquad (2)$$

The final step of the graph classification is to map the graph representation $h_G$ to a graph label $y$, through, for example, a softmax layer.
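The message passing and READOUT steps can be traced on a toy graph. The sketch below is only illustrative and rests on simplifying assumptions that are not part of the paper: AGGREGATE is a plain sum over a node and its neighbours, there are no trainable weights or non-linearities, and READOUT is a sum. The helper names `message_passing_layer` and `readout` are hypothetical.

```python
def message_passing_layer(h, neighbours):
    """One layer: each node adds the sum of its neighbours' embeddings to its own."""
    dim = len(next(iter(h.values())))
    new_h = {}
    for v, h_v in h.items():
        agg = [0.0] * dim
        for u in neighbours[v]:
            agg = [a + x for a, x in zip(agg, h[u])]
        new_h[v] = [a + x for a, x in zip(agg, h_v)]
    return new_h

def readout(h):
    """Sum READOUT: add up the embeddings of all nodes."""
    dim = len(next(iter(h.values())))
    return [sum(h_v[i] for h_v in h.values()) for i in range(dim)]

# Path graph 0 - 1 - 2 with 2-dimensional input features x_v.
neighbours = {0: [1], 1: [0, 2], 2: [1]}
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
for _ in range(2):  # two message-passing layers
    h = message_passing_layer(h, neighbours)
h_graph = readout(h)
print(h_graph)  # [10.0, 12.0]
```

In a real GNN, the graph vector `h_graph` would then be passed to a classifier such as a softmax layer.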

Our proposed SoftEdge leverages graph data with soft edges, which requires the GNN networks be able to take the edge weights into account for message passing when implementing Equation 1 to generate node representations. Fortunately, the two popular GNN networks, namely GCNs (kipf2017semi) and GINs (xu2018how), can naturally take into account the soft edges through weighted summation of neighbor nodes, as follows.

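In a GCN, the adjacency matrix can naturally include edge weight values between zero and one (kipf2017semi), instead of binary edge weights. Consequently, Equation 1 in a GCN is implemented through a weighted sum operation:

$$h_v^{k} = \sigma\left(W^{k} \cdot \sum_{u \in N(v) \cup \{v\}} \frac{e_{u,v}}{\sqrt{\hat{d}_u \hat{d}_v}}\, h_u^{k-1}\right) \qquad (3)$$

where $\hat{d}_v = 1 + \sum_{u \in N(v)} e_{u,v}$; $e_{u,v}$ is the edge weight between nodes $u$ and $v$; $W^{k}$ stands for the trainable weights at layer $k$; and $\sigma$ is the ReLU non-linearity.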
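To see how a soft edge changes the GCN aggregation, the weighted, degree-normalised sum can be evaluated by hand on a small graph. This is a sketch under assumptions that are not from the paper: embeddings are scalars, the trainable weights and the ReLU are omitted, self-loops have weight 1, and the weighted degree is taken as one plus the sum of incident edge weights. The function name `gcn_weighted_sum` is hypothetical.

```python
import math

def gcn_weighted_sum(h, edge_weight):
    """Weighted, degree-normalised neighbourhood sum with unit self-loops."""
    nbrs = {v: {} for v in h}
    for (u, v), e_uv in edge_weight.items():  # undirected edges
        nbrs[u][v] = e_uv
        nbrs[v][u] = e_uv
    # Weighted degree: 1 for the self-loop plus the incident edge weights.
    deg = {v: 1.0 + sum(nbrs[v].values()) for v in h}
    out = {}
    for v in h:
        total = h[v] / deg[v]  # self-loop term
        for u, e_uv in nbrs[v].items():
            total += e_uv / math.sqrt(deg[u] * deg[v]) * h[u]
        out[v] = total
    return out

h = {0: 1.0, 1: 2.0, 2: 4.0}  # path graph 0 - 1 - 2
binary = gcn_weighted_sum(h, {(0, 1): 1.0, (1, 2): 1.0})
soft = gcn_weighted_sum(h, {(0, 1): 1.0, (1, 2): 0.5})
print(binary[1], soft[1])  # softening edge (1, 2) changes node 1's aggregate
```

Softening a single edge changes both the message along that edge and the normalisation of every message that touches its endpoints.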

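Similarly, to handle soft edge weights in a GIN, we can simply replace the sum operation of the isomorphism operator with a weighted sum calculation, and get the following implementation of Equation 1 for message passing:

$$h_v^{k} = \mathrm{MLP}^{k}\left((1 + \epsilon^{k}) \cdot h_v^{k-1} + \sum_{u \in N(v)} e_{u,v} \cdot h_u^{k-1}\right) \qquad (4)$$

Here, $\epsilon$ denotes a learnable parameter.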
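The weighted GIN aggregation is easy to verify on a three-node example. The sketch below computes only the quantity that is fed into the MLP; the MLP itself is left out, embeddings are scalars, and the name `gin_pre_mlp` and the default value of the learnable scalar are assumptions made for illustration.

```python
def gin_pre_mlp(h, nbrs, eps=0.0):
    """Input to the GIN MLP: (1 + eps) * h_v plus the weighted neighbour sum."""
    return {
        v: (1.0 + eps) * h[v] + sum(e_uv * h[u] for u, e_uv in nbrs[v].items())
        for v in h
    }

# Node 0 is connected to node 1 (weight 1.0) and node 2 (soft weight 0.25).
nbrs = {0: {1: 1.0, 2: 0.25}, 1: {0: 1.0}, 2: {0: 0.25}}
h = {0: 1.0, 1: 2.0, 2: 4.0}
out = gin_pre_mlp(h, nbrs)
print(out)  # {0: 4.0, 1: 3.0, 2: 4.25}
```

With all weights equal to 1, this reduces to the ordinary GIN sum aggregation.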

3.2 Graph Learning with Random Soft Edges

Our method SoftEdge takes as input a graph $G = (V, E)$ and a given $p \in (0, 1)$ representing the percentage of edges that will be associated with random soft weights. Specifically, at each training epoch, SoftEdge randomly selects $p$ percentage of edges in $G$, and assigns each of the selected edges (denoted as $\tilde{E}$, $\tilde{E} \subseteq E$) a random weight that is uniformly sampled from $(0, 1)$.

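Assume that all edge weights of the original graph are $1$. With the above operation, the newly created synthetic graph $\tilde{G}$ has the same nodes (i.e., $V$) and the same edge set (i.e., $E$) as the original graph $G$, except that the edge weights no longer all have the value $1$. Specifically, $p$% of the edges are associated with a random weight in $(0, 1)$, and the rest are unchanged. Formally, the edge weights in SoftEdge are as follows:

$$e_{u,v} = \begin{cases} \lambda_{u,v}, & \text{if } (u, v) \in \tilde{E} \\ 1, & \text{otherwise} \end{cases} \qquad (5)$$

where $(u, v) \in E$, $\tilde{E} \subseteq E$, and $\lambda_{u,v} \sim \mathrm{Uniform}(0, 1)$.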

Figure 4: Illustration of SoftEdge. The left is the original adjacency matrix (self-loop; fully connected) with binary weights, and the right is SoftEdge’s adjacency matrix, where $p$% of its elements are in $(0, 1)$.

This edge softening process is illustrated in Figure 4, where the left subfigure is the original adjacency matrix $A$ for a fully connected graph with self-loops, and the right is the new adjacency matrix $\tilde{A}$ generated by SoftEdge, which can be obtained by the element-wise product between $A$ and the edge weights. The yellow cells are edges with soft weights (i.e., edges in $\tilde{E}$). With such weighted edges, message passing in GNNs as formulated in Equations 3 and 4 can be directly applied for learning.

The pseudo-code of the synthetic graph generation in SoftEdge is described in Algorithm 1.

Algorithm 1 Synthetic graph generation in SoftEdge
  Input: a graph $G = (V, E)$ (all edges in $E$ have weight 1); percentage $p$ of soft weights
  Output: a synthetic graph $\tilde{G}$
  Sample $p$ percentage of edges from $E$, forming $\tilde{E}$.
  for $(u, v) \in E$ do
     if $(u, v) \in \tilde{E}$ then
         Sample $e_{u,v}$ from $(0, 1)$ uniformly
     end if
  end for
  Return synthetic graph $\tilde{G}$ with new edge weights $e$.
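Algorithm 1 translates almost line by line into plain Python. The following is a minimal sketch under stated assumptions, not the authors' implementation: edges are given as a list of node pairs, exactly `round(p * len(edges))` edges are softened (whereas the PyTorch Geometric snippet in Section 1 selects each edge independently with probability `p`), and the function name `softedge` and its `rng` parameter are hypothetical.

```python
import random

def softedge(edges, p, rng=None):
    """Return {edge: weight}; a fraction p of edges get a weight in (0, 1), the rest keep 1."""
    rng = rng or random.Random()
    num_soft = round(p * len(edges))
    soft = set(rng.sample(edges, num_soft))  # the selected soft edges
    weights = {}
    for e in edges:
        if e in soft:
            w = rng.random()
            while w == 0.0:  # random() samples [0, 1); redraw to stay in (0, 1)
                w = rng.random()
            weights[e] = w
        else:
            weights[e] = 1.0
    return weights

edges = [(i, i + 1) for i in range(10)]  # a path graph with 10 edges
weights = softedge(edges, p=0.2, rng=random.Random(0))
num_softened = sum(1 for w in weights.values() if w < 1.0)
print(num_softened)  # 2
```

Calling the function again at every training epoch yields a fresh synthetic graph, which matches the per-epoch resampling described in Section 3.2.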

3.3 Collision-Free Graph Data Augmentation

Recall from Section 1 that we refer to sample collision in graphs as two graphs during training having the same structure, or structures indistinguishable to message passing GNNs, but different labels, and that we identify three types of collision. Specifically, for sample collision to occur, one of the following three situations must hold: 1) the new adjacency matrix $\tilde{A}$ is the same as an original $A$; 2) two new adjacency matrices $\tilde{A}_1$ and $\tilde{A}_2$, with soft weights $\lambda^1$ and $\lambda^2$ uniformly sampled from $(0, 1)$, are exactly the same; or 3) two different graphs are indistinguishable by the Weisfeiler-Lehman test.

It is obvious that $\tilde{A} \neq A$, given that at least one edge weight in SoftEdge lies in $(0, 1)$, which automatically avoids the first type of collision. Next we show that SoftEdge also creates collision-free data augmentation for the last two situations.

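Lemma 1

For any $\tilde{A}_1$ and $\tilde{A}_2$, $P(\tilde{A}_1 = \tilde{A}_2) = 0$.

Proof: Each $\lambda$ is independently drawn from a continuous distribution over $(0, 1)$. Hence, the probability of two sets of soft weights $\lambda^1$ and $\lambda^2$ being identical is zero.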
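The argument can be spelled out for a single soft edge. The derivation below is a sketch that assumes, as in Section 3.2, that the two weights are independent and uniformly distributed on $(0, 1)$; the symbols $\lambda^1$ and $\lambda^2$ denote the weights of the same edge in the two synthetic graphs, and $f_{\lambda^1}$ is the density of $\lambda^1$, which equals 1 on $(0, 1)$.

```latex
P(\lambda^1 = \lambda^2)
  = \int_0^1 P(\lambda^2 = t \mid \lambda^1 = t)\, f_{\lambda^1}(t)\, dt
  = \int_0^1 P(\lambda^2 = t)\, dt
  = \int_0^1 0 \, dt
  = 0 .
```

Two weighted adjacency matrices can only be equal if every corresponding entry is equal, so the probability that they coincide is bounded above by the single-edge probability and is therefore also zero.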

Lemma 2

When two different graphs are indistinguishable by the Weisfeiler-Lehman test, assigning random weights to the graphs resolves the ambiguities leading to the failure of the Weisfeiler-Lehman test.

Proof: Due to the random soft edges, each message in the two graphs is transformed differently depending on the weight between the endpoint nodes. As such, the GNN produces the same embedding for the two graphs only if we have exactly the same subset of edges with the same random weights, which has probability zero following the same argument as in the proof of Lemma 1. Intuitively, the way message passing GNNs exploit graph structure can be understood as a breadth-first search tree (xu2018how). Such trees will not be identical when random weights are associated with different tree traversal paths (an illustrative example is provided in Figure 9 in the Appendix).
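The effect described in the proof can be observed with a weighted variant of colour refinement, in which every message carries the weight of the edge it travels along. This is an illustrative stand-in for weighted message passing rather than the procedure used in the paper; the helper names, the node numbering of Decalin and Bicyclopentyl, and the choice of two soft edges per graph are assumptions.

```python
import random
from collections import Counter

def weighted_wl_histogram(num_nodes, weights, rounds, palette):
    """Colour refinement where each neighbour colour is paired with its edge weight."""
    adj = [[] for _ in range(num_nodes)]
    for (u, v), w in weights.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    colours = [0] * num_nodes
    for r in range(rounds):
        signatures = [
            (r, colours[v], tuple(sorted((w, colours[u]) for u, w in adj[v])))
            for v in range(num_nodes)
        ]
        colours = [palette.setdefault(s, len(palette)) for s in signatures]
    return Counter(colours)

def soften(edges, num_soft, rng):
    """Give num_soft randomly chosen edges a weight in [0, 1); the rest keep weight 1."""
    soft = set(rng.sample(edges, num_soft))
    return {e: (rng.random() if e in soft else 1.0) for e in edges}

decalin = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
           (0, 6), (6, 7), (7, 8), (8, 9), (9, 5)]
bicyclopentyl = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0),
                 (5, 6), (6, 7), (7, 8), (8, 9), (9, 5), (0, 5)]

# Binary weights: the two graphs receive identical colour histograms.
palette = {}
hard_a = weighted_wl_histogram(10, {e: 1.0 for e in decalin}, 3, palette)
hard_b = weighted_wl_histogram(10, {e: 1.0 for e in bicyclopentyl}, 3, palette)

# Random soft weights on two edges per graph: the histograms now differ.
rng = random.Random(0)
palette = {}
soft_a = weighted_wl_histogram(10, soften(decalin, 2, rng), 3, palette)
soft_b = weighted_wl_histogram(10, soften(bicyclopentyl, 2, rng), 3, palette)
print(hard_a == hard_b, soft_a == soft_b)
```

The soft weights drawn for one graph do not occur in the other, so nodes incident to soft edges receive colours that have no counterpart in the other graph.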

With the above analysis, we can conclude that SoftEdge prevents a synthetic graph from 1) coinciding with any other graph in the training set or with another synthetic graph, or 2) being indistinguishable from another graph by the Weisfeiler-Lehman test. This eliminates the possibility of sample collision in SoftEdge.

3.4 Further Discussion on SoftEdge

3.4.1 Relation to DropEdge (rong2020dropedge)

The proposed SoftEdge method is inspired by and closely related to DropEdge. Unlike DropEdge, SoftEdge excludes edge deletion, and instead assigns random weights in $(0, 1)$ to a portion of the edges in a graph (SoftEdge would be equivalent to DropEdge if the soft weights were all zero). As a consequence, unlike DropEdge, SoftEdge can guarantee that the synthetic graph dataset is free of collisions after data augmentation, and it therefore eliminates the performance degradation caused by under-fitting due to sample collision.

3.4.2 Why does SoftEdge work?

Intuitively, GNNs essentially update node embeddings with a weighted sum of the neighborhood information through the graph structure. In SoftEdge, edges of the graph are associated with different weights that are randomly sampled. Consequently, SoftEdge enables a random subset aggregation instead of the fixed neighborhood aggregation during the learning of GNNs, which provides dynamic neighborhood over the graph for message passing. This can be considered as a form of data augmentation or denoising filter, which in turn helps the graph learning because edges in real graphs are often noisy and arbitrarily defined.

Furthermore, by assigning random edge weights to graphs, SoftEdge makes some graphs that are indistinguishable by the WL test distinguishable, and thus prevents them from being mapped to the same graph embedding. This mitigates the under-fitting problem in graphs mentioned before.

We further note that, unlike in DropEdge, the synthetic graph generated by SoftEdge has the same nodes and the same connectivity (i.e., the same $V$ and $E$) as the original graph. The only difference is that the synthetic graph has some soft edges. As such, the synthetic graph remains highly similar to its corresponding original graph, which we believe contributes to the superiority of SoftEdge over DropEdge.

4 Experiments

4.1 Settings

Datasets We conduct experiments using six graph classification tasks from the graph benchmark datasets collection TUDatasets (Morris+2020): PTC_MR, NCI109, NCI1, and MUTAG for small molecule classification, and ENZYMES and PROTEINS for protein categorization. These datasets have been widely used for benchmarking, such as in (xu2018how), and can be downloaded directly using a built-in function of PyTorch Geometric (Fey/Lenssen/2019). Table 3 in the Appendix summarizes the statistics of the datasets, including the number of graphs, the average number of nodes per graph, the average number of edges per graph, the number of node features, and the number of classes.

Comparison Baselines We compare our method with three baselines: DropEdge (rong2020dropedge), DropNode (10.5555/3294771.3294869; abs-1801-10247; 10.5555/3327345.3327367), and Baseline. For the Baseline model, we use two popular GNN architectures: GCNs (kipf2017semi) and GINs (xu2018how). GCNs use a spectral-based convolutional operation to learn spectral features of a graph through message aggregation, leveraging a normalized adjacency matrix. In the experiments, we use the GCN with Skip Connection (7780459), as in (li2019deepgcns). Such Skip Connection enables the GCN to benefit from deeper layers. We denote this GCN as ResGCN. The GIN represents the state-of-the-art GNN architecture. It leverages the nodes’ spatial relations to aggregate neighbor features. For both ResGCN and GIN, we use their implementations in the PyTorch Geometric platform.

We note that, in this paper, we aim to identify the issue of the sample collision and its impact, instead of establishing a new state-of-the-art accuracy. Therefore, we compare our method with commonly used data augmentation baselines DropEdge and DropNode. We believe that, our method is also useful for advanced graph data augmentation strategies, which we will leave for future studies.

Detailed Settings We follow the evaluation protocol and hyperparameter search of GIN (xu2018how) and DropEdge (rong2020dropedge). In detail, we evaluate the models using 10-fold cross validation, and compute the mean and standard deviation of three runs. Each fold is trained for 350 epochs with the AdamW optimizer (KingmaB14). The initial learning rate is halved every 50 epochs. The hyper-parameters searched for all models on each dataset are as follows: (1) initial learning rate {0.01, 0.0005}; (2) hidden unit of size 64; (3) batch size {32, 128}; (4) dropout ratio after the dense layer {0, 0.5}; (5) drop ratio in DropNode and DropEdge {20%, 40%}; (6) number of layers in GNNs {3, 5, 8, 16, 32, 64, 100}. For SoftEdge, $p$, the percentage of soft edges, is 20%, and the soft edge weights are uniformly sampled from (0, 1), unless otherwise specified. Following GIN (xu2018how) and DropEdge (rong2020dropedge), we report the case giving the best 10-fold average cross-validation accuracy. Our experiments use an NVIDIA V100/32GB GPU.
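The learning-rate schedule and the size of the search space follow directly from these settings. The sketch below is illustrative only: the function name `learning_rate` is hypothetical, epochs are assumed to be counted from zero so that the first halving happens at epoch 50, and the grid covers the options shared by all models (the drop ratio applies only to DropNode and DropEdge and is left out).

```python
from itertools import product

def learning_rate(initial_lr, epoch, step=50, factor=0.5):
    """Step decay: multiply the learning rate by `factor` every `step` epochs."""
    return initial_lr * factor ** (epoch // step)

# Hyper-parameter options listed in the text (hidden size is fixed at 64).
initial_lrs = [0.01, 0.0005]
batch_sizes = [32, 128]
dropouts = [0.0, 0.5]
num_layers = [3, 5, 8, 16, 32, 64, 100]

grid = list(product(initial_lrs, batch_sizes, dropouts, num_layers))
print(len(grid))                  # 56 configurations per model and dataset
print(learning_rate(0.01, 349))   # rate in the last of the 350 epochs
```

Over 350 epochs the rate is halved six times, so the final learning rate is 1/64 of the initial one.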

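| Dataset | Method | 3 layers | 5 layers | 8 layers | 16 layers | max |
|---|---|---|---|---|---|---|
| PTC_MR | ResGCN | 0.619 ± 0.006 | 0.642 ± 0.003 | 0.638 ± 0.003 | 0.652 ± 0.008 | 0.652 |
| PTC_MR | DropEdge | 0.633 ± 0.006 | 0.653 ± 0.007 | 0.646 ± 0.002 | 0.652 ± 0.005 | 0.653 |
| PTC_MR | DropNode | 0.620 ± 0.002 | 0.648 ± 0.018 | 0.642 ± 0.005 | 0.649 ± 0.007 | 0.649 |
| PTC_MR | SoftEdge | **0.649 ± 0.006** | **0.665 ± 0.004** | **0.664 ± 0.006** | **0.671 ± 0.006** | **0.671** |
| NCI109 | ResGCN | **0.791 ± 0.004** | 0.803 ± 0.003 | 0.807 ± 0.001 | 0.810 ± 0.003 | 0.810 |
| NCI109 | DropEdge | 0.760 ± 0.000 | 0.778 ± 0.008 | 0.800 ± 0.002 | 0.808 ± 0.003 | 0.808 |
| NCI109 | DropNode | 0.765 ± 0.002 | 0.793 ± 0.015 | 0.801 ± 0.002 | 0.802 ± 0.001 | 0.802 |
| NCI109 | SoftEdge | 0.790 ± 0.005 | **0.813 ± 0.003** | **0.821 ± 0.001** | **0.824 ± 0.001** | **0.824** |
| NCI1 | ResGCN | 0.796 ± 0.002 | 0.804 ± 0.003 | 0.810 ± 0.002 | 0.814 ± 0.003 | 0.814 |
| NCI1 | DropEdge | 0.776 ± 0.001 | 0.795 ± 0.010 | 0.814 ± 0.001 | 0.818 ± 0.001 | 0.818 |
| NCI1 | DropNode | 0.778 ± 0.001 | 0.805 ± 0.019 | 0.812 ± 0.001 | 0.813 ± 0.002 | 0.813 |
| NCI1 | SoftEdge | **0.799 ± 0.002** | **0.819 ± 0.001** | **0.822 ± 0.002** | **0.827 ± 0.001** | **0.827** |
| MUTAG | ResGCN | 0.827 ± 0.003 | **0.841 ± 0.009** | 0.846 ± 0.001 | 0.846 ± 0.000 | 0.846 |
| MUTAG | DropEdge | 0.816 ± 0.003 | 0.832 ± 0.003 | **0.850 ± 0.006** | 0.858 ± 0.003 | 0.858 |
| MUTAG | DropNode | 0.823 ± 0.006 | 0.829 ± 0.006 | 0.839 ± 0.003 | 0.858 ± 0.006 | 0.858 |
| MUTAG | SoftEdge | **0.859 ± 0.008** | **0.841 ± 0.001** | 0.846 ± 0.005 | **0.874 ± 0.003** | **0.874** |
| ENZYMES | ResGCN | 0.508 ± 0.015 | 0.537 ± 0.003 | 0.540 ± 0.010 | 0.541 ± 0.014 | 0.541 |
| ENZYMES | DropEdge | 0.489 ± 0.003 | 0.521 ± 0.003 | 0.564 ± 0.011 | 0.597 ± 0.002 | 0.597 |
| ENZYMES | DropNode | 0.510 ± 0.002 | 0.532 ± 0.006 | 0.573 ± 0.006 | 0.590 ± 0.004 | 0.590 |
| ENZYMES | SoftEdge | **0.518 ± 0.004** | **0.564 ± 0.005** | **0.586 ± 0.005** | **0.615 ± 0.004** | **0.615** |
| PROTEINS | ResGCN | 0.738 ± 0.005 | 0.738 ± 0.002 | 0.739 ± 0.003 | 0.747 ± 0.005 | 0.747 |
| PROTEINS | DropEdge | **0.747 ± 0.003** | **0.750 ± 0.003** | **0.749 ± 0.002** | 0.755 ± 0.002 | 0.755 |
| PROTEINS | DropNode | 0.745 ± 0.004 | 0.748 ± 0.001 | 0.744 ± 0.002 | 0.745 ± 0.005 | 0.748 |
| PROTEINS | SoftEdge | 0.745 ± 0.002 | 0.748 ± 0.000 | 0.741 ± 0.000 | **0.757 ± 0.003** | **0.757** |

Table 1: Accuracy of the tested methods with ResGCN networks as baseline. We report mean accuracy over 3 runs of 10-fold cross validation with standard deviations (denoted ±). Max depicts the max accuracy over different GNN layers. Best results are in bold.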

4.2 Results with ResGCN

4.2.1 Main Results

Table 1 presents the accuracy obtained by the ResGCN (kipf2017semi; li2019deepgcns) baseline, DropEdge, DropNode, and SoftEdge on the six datasets, where we evaluate GNNs with 3, 5, 8, and 16 layers. In the table, the best results are in Bold.

Results in the last column of Table 1 show that SoftEdge outperformed all three comparison models on all six datasets when considering the max accuracy obtained with 3, 5, 8, and 16 layers. For example, when compared with ResGCN, SoftEdge increased the accuracy from 65.2%, 84.6%, and 54.1% to 67.1%, 87.4%, and 61.5%, respectively, on the PTC_MR, MUTAG, and ENZYMES datasets. Similarly, when compared with DropEdge, SoftEdge improved the accuracy from 65.3%, 80.8%, and 85.8% to 67.1%, 82.4%, and 87.4%, respectively, on the PTC_MR, NCI109, and MUTAG datasets.

Promisingly, as highlighted in bold in the table, SoftEdge obtained superior accuracy to the other comparison models in most settings regardless of the network layers used. For example, on the PTC_MR, NCI1, and ENZYMES datasets, SoftEdge outperformed all the three baselines with all the network depths tested (i.e., 3, 5, 8, and 16 layers).

Another observation here is that SoftEdge improved or maintained the predictive accuracy in most cases as the networks increased their depth from 3 to 5, 8, and 16 layers. As can be seen in Table 1, the best accuracy of SoftEdge on all six datasets was obtained with 16 layers.

Figure 5: Accuracy obtained by ResGCN, DropEdge, and SoftEdge with 32, 64, and 100 layers on NCI109 and NCI1.

4.2.2 Ablation Studies

We conduct extensive ablation studies to evaluate SoftEdge. We particularly compare our strategy with DropEdge, since it is the most related algorithm to our approach.

Effect on GNN Depth We conduct experiments on further increasing the networks’ depth by adding more layers, including ResGCN with 32, 64, and 100 layers. Results on the NCI109 and NCI1 datasets are presented in Figure 5.

Results in Figure 5 show that DropEdge significantly degraded the predictive accuracy when ResGCN has 64 and 100 layers; and surprisingly, the baseline model ResGCN performed better than DropEdge as the networks went deeper. Notably, the SoftEdge method was able to slow down the degradation in terms of the accuracy obtained with deeper networks, outperforming both baselines ResGCN and DropEdge with layers 32, 64, and 100.

Note: an experiment on increasing the dimension of the node embeddings in SoftEdge is presented in Section C of the Appendix.

Under- and Over- Fitting We also evaluate the efficacy of our method on coping with over-fitting and under-fitting. We plot the training loss and validation accuracy of the SoftEdge, ResGCN, and DropEdge methods with 8 layers across the 350 training epochs on the NCI1 dataset in Figure 6.

Figure 6: Training loss (top) and validation accuracy (bottom).

We have the following observations from Figure 6. First, the training loss of ResGCN approached zero after 250 epochs, as shown by the orange curve in the top subfigure. This led to insufficient gradients for tuning the networks, which is undesirable. Second, the loss of both DropEdge and SoftEdge stayed above zero. This is due to the larger sample space induced by the synthetic training set, as the newly generated synthetic graphs are included in both methods. This prevents the model from over-fitting the limited original training set. Specifically, we can see that the training loss of DropEdge was high, around 0.4 after 100 epochs. This high loss allowed DropEdge to keep tuning the network, which, however, did not result in a test accuracy improvement, as shown in the bottom subfigure. We hypothesize that the high training loss and low test accuracy in DropEdge may be due to the sample collision issue, namely training with samples that have conflicting labels, which results in under-fitting. In contrast, the training loss of SoftEdge remained between 0.1 and 0.2 even after a long training time. As with DropEdge, since the loss is above zero, the model keeps tuning its parameters. Promisingly, unlike in DropEdge, such parameter tuning in SoftEdge did result in a test accuracy improvement, as shown by the green curve in the bottom subfigure of Figure 6, and we attribute it to the collision-free data augmentation induced by SoftEdge.

Figure 7: Accuracy obtained by DropEdge and SoftEdge as ratio of modified edges (x-axis) increasing on NCI109 and NCI1.
Figure 8: 2D embeddings of the original training graphs generated by the trained SoftEdge model on NCI1 with soft edge rate as 20%, 40%, and 60%, respectively. The trained SoftEdge model sees much less overlapped graphs than DropEdge as shown in Figure 1.

Ratio of Modified Edges vs. Sample Collision In Figure 7, we present the results when varying the percentage of modified edges (i.e., $p$) in both DropEdge and SoftEdge with 8 layers. We evaluate percentages of 20%, 40%, 60%, and 80% on NCI109 and NCI1. We chose these two datasets since they have the largest number of node features amongst the six tested datasets, which makes them more challenging settings for observing sample collision: intuitively, with a large number of node features, the sample collision issue could be mitigated, as the chance of creating identical or indistinguishable synthetic graphs is small.

Results in Figure 7 show that DropEdge was very sensitive to the percentage of modified edges, as the model’s predictive accuracy decreased significantly with the drop rate. This is expected: since DropEdge completely removes the selected edges, the graph structure changes substantially, especially when the drop rate is high. On the other hand, SoftEdge appeared robust to the percentage of edges being modified, as the accuracy obtained for the candidate ratios was quite similar. This is mainly because, unlike in DropEdge, the synthetic graphs generated by SoftEdge retain a large similarity to the original graphs, as the node structures are kept the same.

To better understand Figure 7, we show in Figure 8 the 2D embeddings of the original NCI1 training set generated by the SoftEdge models, using t-SNE (vandermaaten08a). The embeddings in Figure 8 show that as the ratio of soft edges increases in SoftEdge, the overlapping embedding areas of the two classes stay roughly the same. This differs markedly from the embeddings formed by DropEdge shown in Figure 1, where the overlapping areas of the different classes increase dramatically as the drop rate rises. This observation suggests that the sample collision issue becomes much more severe as DropEdge increases its drop rate, causing many training graphs to have very similar or identical embeddings but different labels. This is not the case for SoftEdge, owing to its collision-free property.

Dataset    Method     5 layers       8 layers       16 layers      max
PTC_MR     GIN        0.647±0.005    0.659±0.003    0.673±0.022    0.673
PTC_MR     DropEdge   0.680±0.003    0.681±0.009    0.679±0.007    0.681
PTC_MR     DropNode   0.688±0.006    0.689±0.003    0.680±0.006    0.689
PTC_MR     SoftEdge   0.687±0.011    0.691±0.009    0.696±0.007    0.696
NCI109     GIN        0.818±0.002    0.823±0.002    0.820±0.001    0.823
NCI109     DropEdge   0.813±0.005    0.814±0.003    0.805±0.003    0.814
NCI109     DropNode   0.819±0.002    0.821±0.002    0.816±0.004    0.821
NCI109     SoftEdge   0.835±0.002    0.836±0.001    0.838±0.003    0.838
NCI1       GIN        0.820±0.002    0.821±0.001    0.821±0.002    0.821
NCI1       DropEdge   0.819±0.004    0.823±0.001    0.818±0.003    0.823
NCI1       DropNode   0.821±0.003    0.821±0.003    0.824±0.000    0.824
NCI1       SoftEdge   0.839±0.001    0.839±0.002    0.837±0.005    0.839
MUTAG      GIN        0.876±0.003    0.871±0.003    0.874±0.000    0.876
MUTAG      DropEdge   0.857±0.005    0.878±0.009    0.871±0.003    0.878
MUTAG      DropNode   0.864±0.006    0.875±0.003    0.869±0.009    0.875
MUTAG      SoftEdge   0.876±0.006    0.878±0.005    0.889±0.005    0.889
ENZYMES    GIN        0.531±0.004    0.533±0.006    0.532±0.001    0.533
ENZYMES    DropEdge   0.480±0.003    0.497±0.013    0.487±0.004    0.497
ENZYMES    DropNode   0.521±0.004    0.563±0.005    0.552±0.004    0.563
ENZYMES    SoftEdge   0.571±0.002    0.590±0.003    0.588±0.009    0.590
PROTEINS   GIN        0.741±0.006    0.744±0.005    0.743±0.001    0.744
PROTEINS   DropEdge   0.742±0.002    0.738±0.004    0.736±0.003    0.742
PROTEINS   DropNode   0.736±0.002    0.745±0.002    0.747±0.002    0.747
PROTEINS   SoftEdge   0.740±0.002    0.742±0.004    0.747±0.002    0.747
Table 2: Accuracy of the tested methods with GIN networks as the baseline. We report mean accuracy over 3 runs of 10-fold cross validation with standard deviations (denoted ±). Max denotes the maximum accuracy over the different numbers of GNN layers. Best results are in bold.

4.3 Results with GIN

In this section, we also evaluate our method using the GIN (xu2018how) network architecture. Table 2 presents the accuracy obtained by GIN, DropEdge, DropNode, and SoftEdge on the six datasets, where we examine GNNs with 5, 8, and 16 layers; the best results are in bold.

Results in Table 2 show that, similar to the ResGCN case, SoftEdge with GIN as the baseline outperformed all the other compared models on five of the six datasets and tied with DropNode on PROTEINS (0.747), as highlighted by the last column of the table. For example, when compared with the GIN baseline, SoftEdge increased the accuracy from 67.3%, 82.1%, and 53.3% to 69.6%, 83.9%, and 59.0%, respectively, on the PTC_MR, NCI1, and ENZYMES datasets. Similarly, when compared with DropEdge, SoftEdge improved the accuracy from 81.4%, 82.3%, and 49.7% to 83.8%, 83.9%, and 59.0%, respectively, on the NCI109, NCI1, and ENZYMES datasets.
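The comparison in the last column of Table 2 can be re-derived from the per-depth means. The short script below is only a reading aid: it copies the mean accuracies from Table 2 (standard deviations omitted), recomputes the max over 5, 8, and 16 layers, and separates datasets where SoftEdge is strictly best from those where it ties another method. The variable names are illustrative.

```python
# Mean accuracies from Table 2 for 5, 8, and 16 layers (std. dev. omitted).
table2 = {
    "PTC_MR": {"GIN": (0.647, 0.659, 0.673), "DropEdge": (0.680, 0.681, 0.679),
               "DropNode": (0.688, 0.689, 0.680), "SoftEdge": (0.687, 0.691, 0.696)},
    "NCI109": {"GIN": (0.818, 0.823, 0.820), "DropEdge": (0.813, 0.814, 0.805),
               "DropNode": (0.819, 0.821, 0.816), "SoftEdge": (0.835, 0.836, 0.838)},
    "NCI1": {"GIN": (0.820, 0.821, 0.821), "DropEdge": (0.819, 0.823, 0.818),
             "DropNode": (0.821, 0.821, 0.824), "SoftEdge": (0.839, 0.839, 0.837)},
    "MUTAG": {"GIN": (0.876, 0.871, 0.874), "DropEdge": (0.857, 0.878, 0.871),
              "DropNode": (0.864, 0.875, 0.869), "SoftEdge": (0.876, 0.878, 0.889)},
    "ENZYMES": {"GIN": (0.531, 0.533, 0.532), "DropEdge": (0.480, 0.497, 0.487),
                "DropNode": (0.521, 0.563, 0.552), "SoftEdge": (0.571, 0.590, 0.588)},
    "PROTEINS": {"GIN": (0.741, 0.744, 0.743), "DropEdge": (0.742, 0.738, 0.736),
                 "DropNode": (0.736, 0.745, 0.747), "SoftEdge": (0.740, 0.742, 0.747)},
}

# "max" column: best mean accuracy over the three depths, per method.
best = {ds: {m: max(accs) for m, accs in methods.items()}
        for ds, methods in table2.items()}

strictly_best, tied = [], []
for dataset, row in best.items():
    others = max(acc for method, acc in row.items() if method != "SoftEdge")
    if row["SoftEdge"] > others:
        strictly_best.append(dataset)
    elif row["SoftEdge"] == others:
        tied.append(dataset)

# Gain of SoftEdge over the GIN baseline, in percentage points.
gain_over_gin = {ds: (row["SoftEdge"] - row["GIN"]) * 100 for ds, row in best.items()}

print("strictly best:", strictly_best)
print("tied:", tied)
for ds, gain in gain_over_gin.items():
    print(f"{ds}: {gain:+.1f} points over GIN")
```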

5 Discussions

This paper aims to identify the sample collision issue in graph classification. Two other popular tasks in graph learning are node classification and edge prediction. The sample collision issue also exists in node-level classification: we may have graph nodes with the same embedding but different node labels. In fact, the identical node embedding phenomenon has recently been well observed and is attributed to the over-smoothing problem (LiHW18; abs-1901-00596). When over-smoothing occurs, nodes are projected to embeddings that are indistinguishable due to the message passing mechanism in GNNs. This causes a collision at the node level if some of the indistinguishable nodes have different labels. Nonetheless, the role that the input graph structures play in such node collisions is difficult to quantify directly. This is because, with deeper network architectures, the representations of all nodes in a graph may converge to a subspace that makes them unrelated to the input (LiHW18; abs-1901-00596). In this sense, the tight coupling of the graph structures and the learning dynamics of the GNNs makes the node-level collision problem more challenging. We intend to investigate this further in the future.
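The node-level collision described above can be reproduced on a toy graph. The sketch below is an illustration under simplifying assumptions, not the setup of the cited over-smoothing papers: it uses scalar node features, parameter-free mean aggregation over each node and its neighbors, a five-node path graph, and hypothetical node labels.

```python
def mean_aggregate(h, neighbors):
    """One round of mean aggregation over each node and its neighbors."""
    return [
        (h[v] + sum(h[u] for u in neighbors[v])) / (1 + len(neighbors[v]))
        for v in range(len(h))
    ]

# Path graph 0 - 1 - 2 - 3 - 4 with distinct initial scalar features.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
h = [0.0, 1.0, 2.0, 3.0, 4.0]
labels = [0, 0, 1, 1, 1]  # hypothetical node labels, for illustration only

# Track how far apart the node embeddings are after each round.
spread_by_depth = {}
for depth in range(1, 201):
    h = mean_aggregate(h, neighbors)
    spread_by_depth[depth] = max(h) - min(h)

for depth in (1, 10, 50, 200):
    print(depth, spread_by_depth[depth])
```

The spread between the largest and smallest embedding shrinks toward zero as the number of rounds grows, so nodes 0 and 4 end up with essentially the same embedding even though their hypothetical labels differ.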

SoftEdge also assumes that the training graphs have binary edges, which is a commonly adopted setting in graph learning. For graphs with real-valued edge weights, the literature offers two main strategies, both of which can be adopted by our SoftEdge algorithm. In the first, edge weights are added to the features of the nodes (abs-2005-00687; DBLP:journals/corr/abs-2006-07739). In the second, edge weights are treated the same way as node features in the Aggregation function of the GNNs (DBLP:journals/corr/GilmerSRVD17; xu2018how; kipf2017semi; hu2020strategies).

6 Conclusions

We identified an overlooked limitation of simple edge and node manipulations for graph data augmentation, namely sample collision: such manipulations can create graphs that have an identical structure, or structures indistinguishable to message passing GNNs, but conflicting labels. We proposed SoftEdge, which assigns random weights to a portion of the edges of a given graph, enabling GNNs to be trained with dynamic neighborhoods. We proved that SoftEdge creates collision-free augmented graphs. We also showed that this simple approach achieves superior accuracy to popular node and edge manipulation methods and notable resilience to the accuracy degradation that comes with GNN depth.

We hope that our work here can facilitate future research that advances collision-free graph data augmentation.


Appendix A Illustrative Example of Distinguishable Graphs with Random Soft Edges

Figure 9 provides an illustrative example of how random soft edges help distinguish graphs. The input graphs are at the top, and we depict two hops of the GNN computation of the node embeddings for the nodes circled in red. A standard GNN cannot distinguish the two nodes if the edges have binary weights: the left and right trees are equivalent due to the permutation-invariant computations in GNNs. With random edge weights, on the other hand, GNNs can tell the left tree from the right one because of the soft weights along the tree traversal paths (highlighted in red).

Figure 9: Illustrative example: GNNs cannot distinguish nodes with binary weights. By assigning random edge weights (in red), GNNs can disambiguate them.
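The same effect can be checked numerically on a small pair of graphs. The example below is a hedged sketch and does not reproduce the graphs of Figure 9: it uses a hexagon and two disjoint triangles (a pair chosen for this illustration because every node has degree 2), a parameter-free sum aggregation with a self term, identical initial node features, and random weights drawn uniformly from (0, 1] on every edge. All names are hypothetical.

```python
import random

def message_passing(num_nodes, weighted_edges, rounds=2):
    """Sum-aggregation message passing on an undirected weighted graph.
    Every node starts with the same scalar feature 1.0."""
    h = [1.0] * num_nodes
    for _ in range(rounds):
        new_h = list(h)  # self term
        for u, v, w in weighted_edges:
            new_h[u] += w * h[v]
            new_h[v] += w * h[u]
        h = new_h
    return sorted(h)  # permutation-invariant readout: multiset of embeddings

hexagon = [(i, (i + 1) % 6) for i in range(6)]
two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]

def with_unit_weights(edges):
    return [(u, v, 1.0) for u, v in edges]

def with_random_weights(edges, rng):
    # Assumed weight distribution for this sketch: uniform in (0, 1].
    return [(u, v, 1.0 - rng.random()) for u, v in edges]

# Binary weights: two rounds of message passing give identical readouts.
binary_a = message_passing(6, with_unit_weights(hexagon))
binary_b = message_passing(6, with_unit_weights(two_triangles))

# Random soft weights: the readouts of the two graphs differ.
rng = random.Random(0)
soft_a = message_passing(6, with_random_weights(hexagon, rng))
soft_b = message_passing(6, with_random_weights(two_triangles, rng))

print(binary_a == binary_b)
print(soft_a == soft_b)
```

With unit weights every node in both graphs reaches the value 9.0 after two rounds, so the two non-isomorphic graphs are indistinguishable to this aggregation; with random weights the readouts no longer coincide.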

Appendix B Statistics of the Graph Classification Benchmark Datasets

Table 3 summarizes the data statistics of the six datasets, including the number of graphs, the average node number per graph, the average edge number per graph, the number of node features, and the number of classes.

Name       #graphs   avg. nodes   avg. edges   #features   #classes
PTC_MR     334       14.3         29.4         18          2
NCI109     4127      29.7         64.3         38          2
NCI1       4110      29.9         64.6         37          2
MUTAG      188       17.9         39.6         7           2
ENZYMES    600       32.6         124.3        3           6
PROTEINS   1113      39.1         145.6        3           2
Table 3: Statistics of the graph classification benchmark datasets.

Appendix C Increasing the Node Embedding Dimension in SoftEdge

Despite its resilience to performance degradation with depth, SoftEdge still lost accuracy at 64 and 100 layers, as shown in Section 4.2.2. Theoretically, without sample collision and with sufficient modeling power in the GNN, SoftEdge should not degrade the predictive performance of a model. Inspired by the over-squashing issue discussed in (alon2021on), we suspect that the degradation may be due to a bottleneck in the node embedding dimension. To verify this hypothesis, we conducted further experiments that increase the model's capacity by enlarging the dimension of the hidden embeddings. We increased the node embedding dimension from 64 to 128, 256, and 512 in ResGCN with 100 layers. Results are presented in Figure 10. Figure 10 shows that enlarging the hidden dimension increases the accuracy of SoftEdge, but the resulting accuracy is still slightly behind that of the shallower models. We conjecture that this remaining gap may be due to other issues in graph learning, such as over-smoothing (LiHW18; abs-1901-00596), over-squashing (alon2021on), or the networks' learning dynamics.

Figure 10: Accuracy obtained when increasing the dimension of the node embedding in SoftEdge with 100 layers on the NCI1 and NCI109 datasets.