A pipeline for fair comparison of graph neural networks in node classification tasks
Graph neural networks (GNNs) have been investigated for potential applicability in multiple fields that employ graph data. However, there are no standard training settings that ensure fair comparisons among new methods, including different model architectures and data augmentation techniques. We introduce a standard, reproducible benchmark for node classification in which the same training settings are applied to every method. The benchmark includes 9 datasets that we constructed, covering both small and medium scales and different fields, and 7 different models. We design a k-fold model assessment strategy for small datasets and a standard set of model training procedures for all datasets, giving a standard experimental pipeline for GNNs that helps ensure fair comparisons of model architectures. To investigate how input features affect model performance, we perform data augmentation with node2vec and Laplacian eigenvectors. We find that topological information is important for node classification tasks. Increasing the number of model layers does not improve performance except on the PATTERN and CLUSTER datasets, in which the graphs are not connected. Data augmentation is highly useful: adding node2vec features to the baseline in particular improves its performance substantially.
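The abstract does not spell out the k-fold model assessment strategy, so the following is only a minimal sketch of how such an assessment could look for a small node classification dataset. It assumes a stratified split of node indices and reports the mean and standard deviation of a per-fold score. The names `stratified_kfold`, `kfold_assess`, and `majority_baseline` are hypothetical and do not come from the paper; the majority-class baseline merely stands in for a real GNN training run.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k, seed=0):
    """Split node indices into k folds with roughly equal class proportions.

    Hypothetical helper: the paper's exact splitting rule is not given in the abstract.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # carry the round-robin position across classes to balance fold sizes
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[(offset + j) % k].append(idx)
        offset += len(idxs)
    return folds


def kfold_assess(labels, k, train_and_eval, seed=0):
    """Run train_and_eval(train_idx, test_idx) once per fold; return (mean, std)."""
    folds = stratified_kfold(labels, k, seed)
    scores = []
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(n for j, f in enumerate(folds) if j != i for n in f)
        scores.append(train_and_eval(train_idx, test_idx))
    mean = sum(scores) / k
    std = (sum((s - mean) ** 2 for s in scores) / k) ** 0.5
    return mean, std


def majority_baseline(labels):
    """Stand-in for a model: predict the most frequent training class."""
    def run(train_idx, test_idx):
        counts = defaultdict(int)
        for i in train_idx:
            counts[labels[i]] += 1
        pred = max(counts, key=counts.get)
        return sum(labels[i] == pred for i in test_idx) / len(test_idx)
    return run


if __name__ == "__main__":
    toy_labels = [0] * 6 + [1] * 4  # toy node labels, not from any benchmark dataset
    mean, std = kfold_assess(toy_labels, k=5, train_and_eval=majority_baseline(toy_labels))
    print(f"accuracy: {mean:.3f} +/- {std:.3f}")
```

In a real pipeline, `train_and_eval` would train each model with the same fixed training settings on `train_idx` and score it on `test_idx`, so that every architecture is compared on identical splits.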