FedGraphNN: A Federated Learning System and Benchmark for Graph Neural Networks

04/14/2021 · by Chaoyang He, et al. · University of Southern California

Graph Neural Network (GNN) research is rapidly growing thanks to the capacity of GNNs to learn representations from graph-structured data. However, centralizing a massive amount of real-world graph data for GNN training is prohibitive due to user-side privacy concerns, regulation restrictions, and commercial competition. Federated learning (FL), a trending distributed learning paradigm, aims to solve this challenge while preserving privacy. Despite recent advances in the vision and language domains, there is no suitable platform for the federated training of GNNs. To this end, we introduce FedGraphNN, an open research federated learning system and benchmark to facilitate GNN-based FL research. FedGraphNN is built on a unified formulation of federated GNNs and supports commonly used datasets, GNN models, FL algorithms, and flexible APIs. We also contribute a new molecular dataset, hERG, to promote research exploration. Our experimental results present significant challenges in federated GNN training: federated GNNs perform worse than centralized GNNs on most datasets with a non-I.I.D. split, and the GNN model that attains the best result in the centralized setting may not hold its advantage in the federated setting. These results imply that more research effort is needed to unravel the mystery behind federated GNN training. Moreover, our system performance analysis demonstrates that the FedGraphNN system is computationally affordable to most research labs with limited GPUs. We maintain the source code at https://github.com/FedML-AI/FedGraphNN.


1 Introduction

Graph Neural Networks (GNNs) are state-of-the-art models that learn representations from complex graph-structured data in various domains such as drug discovery (Rong et al., 2020b), social network recommendation (Wu et al., 2018a; Sun et al., 2019; He et al., 2019b), and traffic flow modeling (Wang et al., 2020b; Cui et al., 2019). However, for reasons such as user-side privacy, regulation restrictions, and commercial competition, real-world cases in which graph data is decentralized are surging, leaving each client with only a limited amount of data. For example, in the AI-based drug discovery industry, pharmaceutical research institutions would significantly benefit from other institutions' private data, but none can afford to disclose its own data for commercial reasons. Other possible use cases are included in Figure 1. Federated Learning (FL) is a distributed learning paradigm that addresses this data isolation problem. In FL, training is an act of collaboration between multiple clients that does not require centralizing local data while providing a certain degree of user-level privacy (McMahan et al., 2017; Kairouz et al., 2019).

Figure 1: Four Types of Federated Graph Neural Networks (FedGraphNN): (a) Graph-level FL: molecule property prediction (our work); (b) Subgraph-level FL: recommendation systems (Wu et al., 2021) and knowledge graphs (Zheng et al., 2020b); (c) Node-level FL: spatio-temporal forecasting (Meng et al., 2021); (d) Edge-level FL: social networks.

Despite FL being successfully applied in domains like computer vision (Liu et al., 2020; Hsu et al., 2020) and natural language processing (Hard et al., 2018; Ge et al., 2020), it has yet to be widely adopted in the domain of graph machine learning. There are multiple reasons for this:

  1. Most existing FL libraries, as summarized in (He et al., 2020c), do not support GNNs. Given the complexity of graph data, the dynamics of training GNNs in a federated setting may differ from those of training vision or language models. A fair and easy-to-use benchmark is essential to distinguish the advantages of different GNN models and FL algorithms;

  2. The definition of federated GNNs is vague in current literature. This vagueness makes it difficult for researchers who focus on SGD-based federated optimization algorithms to understand challenges in federated GNNs;

  3. Applying existing FL algorithms to GNNs is nontrivial and requires significant engineering effort to transplant and reproduce existing algorithms on GNN models and graph datasets. Recent works (Wang et al., 2020a; Meng et al., 2021; Wu et al., 2021) only use the naive FedAvg algorithm (McMahan et al., 2017), which we demonstrate is sub-optimal in many cases.

To address these issues, we present FedGraphNN, an open-source federated learning system for GNNs that enables effective and efficient training of a variety of GNN models in a federated setting, together with benchmarks on non-I.I.D. graph datasets (e.g., molecular graphs). We first formulate federated graph neural networks to provide a unified framework for federated GNNs (Section 3). Under this formulation, we design a federated learning system to support federated GNNs with a curated list of FL algorithms and provide low-level APIs for algorithmic research customization and deployment (Section 4). We then provide a benchmark on commonly used molecular datasets and GNNs. We also contribute a large-scale federated molecular dataset named hERG for further research exploration (Section 5). Our experiments show that the straightforward deployment of FL algorithms for GNNs is sub-optimal (Section 6). Finally, we highlight future directions for federated GNNs (Section 7).

2 Related Works

Federated Graph Neural Networks (FedGraphNN) lies at the intersection of graph neural networks (GNNs) and federated learning. We mainly discuss related works that train GNNs using decentralized datasets. Suzumura et al. (2019) and Mei et al. (2019) use computed graph statistics for information exchange and aggregation to avoid node information leakage. Sajadmanesh and Gatica-Perez (2021) introduce a privacy-preserving GNN model via local differential privacy (LDP). Zhou et al. (2020) incorporate Secure Multi-Party Computation (SMPC) and Homomorphic Encryption (HE) into GNN learning for node classification. Jiang et al. (2020) propose a secure aggregation method to learn dynamic representations from multi-user graph sequences. More recently, Wang et al. (2020a) use a hybrid of federated learning and meta-learning to solve the semi-supervised graph node classification problem on decentralized social network datasets. Meng et al. (2021) attempt to protect node-level privacy using an edge-cloud partitioned GNN model for spatio-temporal forecasting tasks on node-level traffic sensor datasets.

Our vision is that FedGraphNN should cover four types of GNN-based federated learning:

  1. Graph level (Figure 1(a)). We believe molecular machine learning is a paramount application in this setting, where many small graphs are distributed between multiple edge devices;

  2. Sub-graph level (Figure 1(b)). This scenario typically pertains to social networks or knowledge graphs that need to be partitioned into many small sub-graphs due to data barriers between different departments in a giant company, as demonstrated in (Zheng et al., 2020b; Wu et al., 2021).

  3. Node level (Figure 1(c)). When the privacy of a specific node in a graph is important, node-level GNN-based FL is useful in practice. The IoT setting is a good example (Zheng et al., 2020b);

  4. Link level (Figure 1(d)) is also a promising direction that is relevant when the privacy of edges (e.g., connections in a social network) is of importance.

Although the current version of FedGraphNN only contains graph-level GNN-based FL, support for the other scenarios is planned.

3 Formulation: Federated Graph Neural Networks

Figure 2: Formulation of FedGraphNN (Federated Graph Neural Network)

We consider a graph-level federated learning setting where graph datasets are dispersed over multiple edge servers that cannot be centralized for training due to privacy or regulation restrictions. For instance, compounds in molecular trials (Rong et al., 2020b) or knowledge graphs for recommendation systems (Chen et al., 2020) may not be shared across entities because of intellectual property concerns. Under this setting, we assume that there are $K$ clients in the FL network, and the $k$-th client has its own dataset $\mathcal{D}^{(k)} = \{(G_i^{(k)}, y_i^{(k)})\}_{i=1}^{N^{(k)}}$, where $G_i^{(k)} = (\mathcal{V}_i^{(k)}, \mathcal{E}_i^{(k)})$ is the $i$-th graph sample in $\mathcal{D}^{(k)}$ with node and edge feature sets $X_i^{(k)}$ and $Z_i^{(k)}$, $y_i^{(k)}$ is the corresponding multi-class label of $G_i^{(k)}$, $N^{(k)}$ is the sample number in dataset $\mathcal{D}^{(k)}$, and $N = \sum_{k=1}^{K} N^{(k)}$ is the total sample number. Each client owns a Graph Neural Network (GNN) model to learn graph- or node-level representations. Multiple clients are interested in collaborating through a server to improve their GNN models without necessarily revealing their graph datasets.

We illustrate the formulation of Federated Graph Neural Network (FedGraphNN) in Figure 2. Without loss of generality, we use the Message Passing Neural Network (MPNN) framework (Gilmer et al., 2017; Rong et al., 2020c). Most of the spatial-based GNN models (Kipf and Welling, 2016; Veličković et al., 2018; Hamilton et al., 2017) can be unified into this framework, where the forward pass has two phases: a message-passing phase and a readout phase. The message-passing phase contains two steps: first, the model gathers and transforms the neighbors' messages; then, the model uses the aggregated messages to update the node hidden states. Mathematically, for client $k$ and layer index $\ell = 0, \dots, L-1$, an $L$-layer MPNN is formalized as follows:

$m_i^{(k,\ell+1)} = \mathrm{AGG}\left(\left\{ M_{\theta}^{(k,\ell+1)}\big(h_i^{(k,\ell)}, h_j^{(k,\ell)}, e_{ij}^{(k)}\big) \,\middle|\, j \in \mathcal{N}_i \right\}\right) \qquad (1)$

$h_i^{(k,\ell+1)} = U_{\phi}^{(k,\ell+1)}\big(h_i^{(k,\ell)}, m_i^{(k,\ell+1)}\big) \qquad (2)$

where $h_i^{(k,0)} = x_i^{(k)}$ is the client's node feature, $\ell$ is the layer index, AGG is the aggregation function (e.g., in the GCN model, the aggregation function is a simple SUM operation), $\mathcal{N}_i$ is the neighborhood set of node $i$ (e.g., 1-hop neighbors), and $M_{\theta}^{(k,\ell+1)}$ is the message generation function which takes the hidden state of the current node $h_i^{(k,\ell)}$, the hidden state of the neighbor node $h_j^{(k,\ell)}$, and the edge features $e_{ij}^{(k)}$ as inputs. $U_{\phi}^{(k,\ell+1)}$ is the state update function receiving the aggregated feature $m_i^{(k,\ell+1)}$. After propagating through an $L$-layer MPNN, the readout phase computes a feature vector for downstream tasks (node-level or graph-level). For example, we can obtain the whole-graph representation using a readout function $R_{\psi}$ according to:

$\hat{y}_i^{(k)} = R_{\psi}\left(\left\{ h_j^{(k,L)} \,\middle|\, j \in \mathcal{V}_i^{(k)} \right\}\right) \qquad (3)$
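To make the two phases concrete, the following PyTorch sketch implements one message-passing layer in the spirit of Eqs. (1)-(2), with SUM aggregation and a GRU-style update, and a simple sum readout for Eq. (3). The names (SimpleMPNNLayer, sum_readout) and the specific choices of message and update functions are illustrative assumptions, not the exact FedGraphNN implementation.

```python
# Illustrative sketch of Eqs. (1)-(3): one message-passing layer with SUM
# aggregation and a sum readout. SimpleMPNNLayer / sum_readout are our own
# names for exposition, not classes from the FedGraphNN codebase.
import torch
import torch.nn as nn


class SimpleMPNNLayer(nn.Module):
    def __init__(self, node_dim, edge_dim, hidden_dim):
        super().__init__()
        # M_theta: message function over (h_i, h_j, e_ij)
        self.message_fn = nn.Linear(2 * node_dim + edge_dim, hidden_dim)
        # U_phi: state-update function over (h_i, aggregated message)
        self.update_fn = nn.GRUCell(hidden_dim, node_dim)

    def forward(self, h, edge_index, edge_attr):
        # h: (num_nodes, node_dim); edge_index: (2, num_edges); edge_attr: (num_edges, edge_dim)
        src, dst = edge_index
        msgs = torch.relu(
            self.message_fn(torch.cat([h[dst], h[src], edge_attr], dim=-1))
        )
        # Eq. (1): SUM-aggregate incoming messages per destination node
        agg = torch.zeros(h.size(0), msgs.size(-1), device=h.device)
        agg.index_add_(0, dst, msgs)
        # Eq. (2): update node hidden states
        return self.update_fn(agg, h)


def sum_readout(h):
    # Eq. (3): pool node states into a single graph-level representation
    return h.sum(dim=0)
```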
To formulate GNN-based FL, we define $W = \{\theta, \phi, \psi\}$ as the overall learnable weights in client $k$. In general, $W$ is independent of the graph structure (i.e., GNN models are normally inductive and generalize to unseen graphs). Consequently, we formulate GNN-based FL as the following distributed optimization problem:

$\min_{W} F(W) = \min_{W} \sum_{k=1}^{K} \frac{N^{(k)}}{N} \cdot f^{(k)}(W) \qquad (4)$

where $f^{(k)}(W) = \frac{1}{N^{(k)}} \sum_{i=1}^{N^{(k)}} \mathcal{L}\big(\hat{y}_i^{(k)}, y_i^{(k)}\big)$ is the $k$-th client's local objective function that measures the local empirical risk over the heterogeneous graph dataset $\mathcal{D}^{(k)}$, and $\mathcal{L}$ is the loss function of the global GNN model. To solve this problem, we utilize FedAvg (McMahan et al., 2017). It is important to note that in FedAvg, the aggregation function on the server merely averages model parameters. We use GNNs inductively, i.e., the model is independent of the structure of the graphs it is trained on. Thus, no topological information about the graphs on any client is required on the server during parameter aggregation. Other advanced algorithms such as FedOPT (Reddi et al., 2020), FedGKT (He et al., 2020b), and Decentralized FL (He et al., 2019a) can also be applied.
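To show how little the server needs, the sketch below gives the weighted coordinate-wise averaging step of FedAvg applied to GNN weights; the helper name fedavg_aggregate is ours, and it assumes each client uploads a standard PyTorch state_dict together with its sample count $N^{(k)}$.

```python
# Minimal sketch of the server-side FedAvg step for Eq. (4): weighted
# coordinate-wise averaging of client GNN weights with weights N^(k)/N.
# No graph topology from any client is needed here.
from collections import OrderedDict


def fedavg_aggregate(client_states, client_num_samples):
    """client_states: list of model state_dicts; client_num_samples: list of N^(k)."""
    total = float(sum(client_num_samples))
    global_state = OrderedDict()
    for key in client_states[0].keys():
        global_state[key] = sum(
            (n / total) * state[key].float()
            for state, n in zip(client_states, client_num_samples)
        )
    return global_state

# After each round, the server broadcasts global_state back to every client,
# e.g., via model.load_state_dict(global_state).
```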

4 FedGraphNN System Design

Deploying federated learning algorithms to existing internal systems in cross-silo institutes faces several challenges:

  1. Both different institutes and different subsystems in an institute have heterogeneous data schemes (different feature space, different labels for the same data point, different formats);

  2. Datasets or features are scattered in different subsystems in an institute;

  3. The FL client software should be compatible with the existing system (OS platform, system architecture, API design pattern).

In general, frequent and large-scale deployment of updates, monitoring, and debugging is challenging; running ML workloads on an edge server is hampered by the lack of a portable, fast, small footprint, and flexible runtime engine for on-device training (Kairouz et al., 2019, Section 7).

Figure 3: Overview of FedGraphNN System Architecture Design

We develop an open-source federated learning system for GNNs, named FedGraphNN, which includes implementations of standard baseline datasets, models, and federated learning algorithms for GNN-based FL research. FedGraphNN aims to enable efficient and flexible customization for future exploration.

Figure 4: Example code for benchmark evaluation with FedGraphNN

As shown in Figure 3, FedGraphNN is built on the FedML research library (He et al., 2020c), a widely used FL library that does not yet include any GNN support. To distinguish FedGraphNN from FedML, we color-code the modules specific to FedGraphNN. In the lowest layer, FedGraphNN reuses the FedML-core APIs but further supports tensor-aware RPC (remote procedure call), which enables communication between servers located in different data centers (e.g., different pharmaceutical vendors). Enhanced security and privacy primitive modules are added to support techniques such as secure aggregation in upper layers. The layer above supports plug-and-play operation of common GNN models. To combat the non-I.I.D. behavior of graph datasets, we provide dedicated split algorithms and data loaders. Users can either reuse our data distribution or manipulate the degree of non-I.I.D.-ness by setting hyper-parameters. To address the deployment challenges listed above, we plan to develop the FedML Client SDK, which has three key modules shown in Figure 3: Data Collector and Manager, Training Manager, and Model Serving. We introduce details of the FedML Client SDK in Appendix A. Example code is shown in Figure 4.

5 FedGraphNN Benchmark: Datasets, Models, and Algorithms

Category Dataset # Tasks Task Type # Compounds Average # of Nodes Average # of Edges Recommended Metric
Quantum Mechanics QM9 (Gaulton et al., 2012) 12 Regression 133885 8.80 27.60 MAE
Physical Chemistry ESOL (Delaney, 2004) 1 Regression 1128 13.29 40.65 RMSE
FreeSolv(Mobley and Guthrie, 2014) 1 Regression 642 8.72 25.60 RMSE
Lipophilicity (Gaulton et al., 2012) 1 Regression 4200 27.04 86.04 RMSE
Biophysics hERG(Gaulton et al., 2016; Kim et al., 2021) 1 Regression 10572 29.39 94.09 RMSE
BACE (Subramanian et al., 2016) 1 Classification 1513 34.09 36.89 ROC-AUC
Physiology BBBP (Martins et al., 2012) 1 Classification 2039 24.03 25.94 ROC-AUC
SIDER (Kuhn et al., 2016) 27 Classification 1427 33.64 35.36 ROC-AUC
ClinTox (Gayvert et al., 2016) 2 Classification 1478 26.13 27.86 ROC-AUC
Tox21 (Tox21 Challenge, 2017) 12 Classification 7831 18.51 25.94 ROC-AUC
Table 1: Summary of Molecular Machine Learning Datasets

Non-I.I.D. Datasets.

To facilitate GNN-based FL research, we plan to support various graph datasets with non-I.I.D.-ness in different domains such as molecular machine learning, knowledge graphs, and recommendation systems. In the latest release, we use MoleculeNet (Wu et al., 2018b), a molecular machine learning benchmark, as the data source to generate our non-I.I.D. benchmark datasets. Specifically, we use the unbalanced Latent Dirichlet Allocation (LDA) partition algorithm (He et al., 2020c) to partition datasets in the MoleculeNet benchmark. In addition, we provide a new dataset, named hERG, related to cardiac toxicity and collected from (Kim et al., 2021; Gaulton et al., 2017) with data cleaning. Table 1 summarizes all datasets used in our experiments. Figure 5 shows the non-I.I.D. distributions of some of the datasets. The alpha value for latent Dirichlet allocation (LDA) in each non-I.I.D. graph dataset can be found in Tables 2 and 3. The data distribution for each dataset is illustrated in Figure 8 in Appendix B.1. More details and the specific preprocessing steps can be found in Appendices B.1 & B.2.

(a) hERG (#clients: 4, alpha: 3)
(b) QM9 (#clients: 8, alpha: 3)
Figure 5: Example Unbalanced Sample Distributions
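For illustration, a minimal sketch of this LDA-style (Dirichlet) unbalanced partition over label classes is given below; dirichlet_partition is an assumed helper name rather than the library's exact API, and a smaller alpha yields a more skewed, more non-I.I.D. split.

```python
# Sketch of Dirichlet (LDA-style) unbalanced partitioning over label classes.
# dirichlet_partition is an illustrative helper, not FedGraphNN's exact API;
# smaller alpha gives a more skewed (more non-I.I.D.) split across clients.
import numpy as np


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        cls_idx = np.where(labels == cls)[0]
        rng.shuffle(cls_idx)
        # Per-class client proportions sampled from Dirichlet(alpha)
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(cls_idx)).astype(int)
        for client_id, chunk in enumerate(np.split(cls_idx, cuts)):
            client_indices[client_id].extend(chunk.tolist())
    return client_indices


# Example: 4 clients with alpha = 0.5 (as in the BACE/ClinTox rows of Table 2)
toy_labels = np.random.randint(0, 2, size=1513)  # toy binary labels
parts = dirichlet_partition(toy_labels, num_clients=4, alpha=0.5)
```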

Dataset Splitting.

We apply random splitting as advised in (Wu et al., 2018b). The dataset partition is 80% training, 10% validation, and 10% test. We plan to support scaffold splitting (Bemis and Murcko, 1996), designed specifically for molecular machine learning datasets, as future work.
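A minimal sketch of such a random 80/10/10 index split, assuming a simple shuffle over sample indices, is shown below.

```python
# Sketch of the random 80/10/10 split (not the planned scaffold split).
import numpy as np


def random_split(num_samples, frac_train=0.8, frac_valid=0.1, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_samples)
    n_train = int(frac_train * num_samples)
    n_valid = int(frac_valid * num_samples)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]


train_idx, valid_idx, test_idx = random_split(1128)  # e.g., ESOL's 1128 compounds
```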

GNN Models & Federated Learning Algorithms.

FedGraphNN's latest release supports GCN (Kipf and Welling, 2016), GAT (Veličković et al., 2018), and GraphSAGE (Hamilton et al., 2017) as the GNN models. The readout function currently supported is a simple Multilayer Perceptron (MLP). Users can easily plug their customized GNN models and readout functions into our framework. For federated learning algorithms, besides FedAvg (McMahan et al., 2017), other advanced algorithms such as FedOPT (Reddi et al., 2020) and FedGKT (He et al., 2020b) are also supported. We refer to Appendix B.4 for details on the FL algorithms and GNN models.
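As an illustration of the readout head, the sketch below pools node embeddings produced by any of the supported GNN encoders and feeds them to an MLP; MLPReadout and the mean-pooling choice are assumptions for exposition, not the exact classes in the FedGraphNN codebase.

```python
# Illustrative MLP readout: pool node embeddings of one molecule graph, then
# predict the (multi-)task outputs. MLPReadout is a placeholder name.
import torch.nn as nn


class MLPReadout(nn.Module):
    def __init__(self, node_emb_dim, graph_emb_dim, num_tasks):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(node_emb_dim, graph_emb_dim),
            nn.ReLU(),
            nn.Linear(graph_emb_dim, num_tasks),
        )

    def forward(self, node_embeddings):
        # node_embeddings: (num_nodes, node_emb_dim) for a single graph
        graph_embedding = node_embeddings.mean(dim=0)  # simple mean pooling
        return self.mlp(graph_embedding)
```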

6 Experiments

6.1 Experimental Setup

Implementation and Hyper-parameters.

Experiments were conducted on a GPU server equipped with 8 NVIDIA Quadro RTX 5000 GPUs (16 GB GPU memory each). We built the benchmark with the FedAvg algorithm for three GNN models on various MoleculeNet datasets with different sample sizes. The hyper-parameters used for the experiments are listed in Appendix C.

6.2 Result of Model Accuracy on Non-I.I.D. Partitioning

Figure 6: Tox21: test score during sweeping
Figure 7: hERG: test score during sweeping

We run experiments on both classification and regression tasks. Hyper-parameters are tuned by sweeping with grid search (see Appendix D for the search space). Figures 6 and 7 use GraphSAGE on Tox21 and hERG as examples to show the test score curves during sweeping. After hyper-parameter tuning, we report all results in Table 2 and Table 3. For each result, the optimal hyper-parameters can be found in Appendix C.

Dataset (samples) Non-I.I.D. Partition Method GNN Model Federated Optimizer Performance Metric MoleculeNet Results Score on Centralized Training Score on Federated Training
SIDER LDA GCN FedAvg ROC-AUC 0.638 0.6476 0.6266 ()
with α = 0.2 GAT 0.6639 0.6591 ()
(1427) 4 clients GraphSAGE 0.6669 0.6700 ()
BACE LDA GCN FedAvg ROC-AUC 0.806 0.7657 0.6594 ()
with α = 0.5 GAT 0.9221 0.7714 ()
(1513) 4 clients GraphSAGE 0.9266 0.8604 ()
Clintox LDA GCN FedAvg ROC-AUC 0.832 0.8914 0.8784 ()
with α = 0.5 GAT 0.9573 0.9129 ()
(1478) 4 clients GraphSAGE 0.9716 0.9246 ()
BBBP LDA GCN FedAvg ROC-AUC 0.690 0.8705 0.7629 ()
with α = 2 GAT 0.8824 0.8746 ()
(2039) 4 clients GraphSAGE 0.8930 0.8935 ()
Tox21 LDA GCN FedAvg ROC-AUC 0.829 0.7800 0.7128 ()
with α = 3 GAT 0.8144 0.7186 ()
(7831) 8 clients GraphSAGE 0.8317 0.7801 ()
  • *Note: to reproduce the result, please use the same random seeds we set in the library.

Table 2: Classification results (higher is better)
Dataset (samples) Non-I.I.D. Partition Method GNN Model Federated Optimizer Performance Metric MoleculeNet Result Score for Centralized Training Score for Federated Training
FreeSolv LDA GCN FedAvg RMSE 1.40 0.16 1.5787 2.7470 ()
with α = 0.5 GAT 1.2175 1.3130 ()
(642) 4 clients GraphSAGE 1.3630 1.6410 ()
ESOL LDA GCN FedAvg RMSE 0.97 0.01 1.0190 1.4350 ()
with α = 2 GAT 0.9358 0.9643 ()
(1128) 4 clients GraphSAGE 0.8890 1.1860 ()
Lipo LDA GCN FedAvg RMSE 0.655 0.036 0.8518 1.1460 ()
with α = 2 GAT 0.7465 0.8537 ()
(4200) 8 clients GraphSAGE 0.7078 0.7788 ()
hERG LDA GCN FedAvg RMSE - 0.7257 0.7944 ()
with α = 3 GAT 0.6271 0.7322 ()
(10572) 8 clients GraphSAGE 0.7132 0.7265 ()
QM9 LDA GCN FedAvg MAE 2.35 14.78 21.075 ()
with α = 3 GAT 12.44 23.173 ()
(133885) 8 clients GraphSAGE 13.06 19.167 ()
  • *Note: to reproduce the result, please use the same random seeds we set in the library.

Table 3: Regression results (lower is better)

There are multiple takeaways from these results:

  1. When graph datasets are small, FL accuracy is on par with (or even better than) centralized learning.

  2. But when dataset sizes grow, FL accuracy becomes worse than that of the centralized approach: on larger datasets, the non-I.I.D. nature of the graph split leads to an accuracy drop.

  3. The dynamics of training GNNs in a federated setting are different from training federated vision or language models. Our findings show that the best model in the centralized setting may not necessarily be the best model in the non-I.I.D. federated setting.

  4. Interestingly, we find that GAT suffers the largest performance degradation on 5 out of 9 datasets. This may be due to the sensitivity of the attention calculation to the non-I.I.D. setting.

Hence, additional research is needed to understand the nuances of training GNNs in a federated setting and bridge this gap.

6.3 System Performance Analysis

We also present a system performance analysis when using the Message Passing Interface (MPI) as the communication backend. The results are summarized in Table 4. Even on large datasets, federated training can be completed in under 1 hour using only 4 GPUs, except on the QM9 dataset, which requires several hours to finish training (see Table 4). FedGraphNN thus provides an efficient mapping of algorithms to the underlying resources, thereby making it attractive for deployment.

The training time using RPC was also evaluated, and the results are similar to those obtained with MPI. Note that RPC is useful for realistic deployment when GPU/CPU-based edge devices, located in different data centers, can only be accessed via public IP addresses. We will provide detailed test results for such a scenario in future work.

SIDER BACE Clintox BBBP Tox21 FreeSolv ESOL Lipo hERG QM9
Wall-clock Time GCN 5m 58s 4m 57s 4m 40s 4m 13s 15m 3s 4m 12s 5m 25s 16m 14s 35m 30s 6h 48m
GAT 8m 48s 5m 27s 7m 37s 5m 28s 25m 49s 6m 24s 8m 36s 25m 28s 58m 14s 9h 21m
GraphSAGE 2m 7s 3m 58s 4m 42s 3m 26s 14m 31s 5m 53s 6m 54s 15m 28s 32m 57s 5h 33m
Average FLOP GCN 697.3K 605.1K 466.2K 427.2K 345.8K 142.6K 231.6K 480.6K 516.6K 153.9K
GAT 703.4K 612.1K 470.2K 431K 347.8K 142.5K 232.6K 485K 521.3K 154.3K
GraphSAGE 846K 758.6K 1.1M 980K 760.6K 326.9K 531.1K 1.5M 1.184M 338.2K
Parameters GCN 15.1K 13.5K 13.6K 13.5K 14.2K 13.5K 13.5K 13.5K 13.5K 14.2K
GAT 20.2K 18.5K 18.6K 18.5K 19.2K 18.5K 18.5K 18.5K 18.5K 19.2K
GraphSAGE 10.6K 8.9K 18.2K 18.1K 18.8K 18.1K 18.1K 269K 18.1K 18.8K
  • *Note that we use the distributed training paradigm where each client’s local training uses one GPU. Please refer to our code for details.

Table 4: Training time with FedAvg on GNNs (Hardware: 8 x NVIDIA Quadro RTX 5000 GPU (16GB/GPU); RAM: 512G; CPU: Intel Xeon Gold 5220R 2.20GHz).

7 Future Works and Conclusion

Here we highlight some future system-level improvements and research directions for consideration:

  1. Supporting more graph datasets and GNN models for diverse applications, including but not limited to knowledge graphs, recommendation systems, and spatio-temporal forecasting (Liu et al., 2021);

  2. Optimizing the system to accelerate the training speed for large-scale graph datasets (Zheng et al., 2020a; Lee et al., 2020);

  3. Proposing advanced FL algorithms or GNN models to mitigate the accuracy gap on datasets with non-I.I.D.-ness. One promising direction is to follow the idea of Federated Neural Architecture Search (FedNAS) and search for a personalized GNN model for each FL client (He et al., 2020a, d);

  4. Exploring semi-supervised or self-supervised learning methods, which is essential for realistic GNN-based FL applications given that real-world graph data often has limited labels (Xie et al., 2021);

  5. Addressing specific challenges in security and privacy under the setting of Federated GNN (Elkordy and Avestimehr, 2020; Prakash and Avestimehr, 2020);

  6. Developing coded computing ideas for mitigating stragglers in the setting of Federated GNN (Prakash et al., 2020a, b).

  7. Proposing efficient compression algorithms that adapt the level of compression to the available bandwidth of the users while preserving the privacy of users' local models.

In this paper, we design a federated learning (FL) system and benchmark for federated graph neural networks (GNNs), named FedGraphNN. FedGraphNN includes implementations of baseline datasets, models, and federated learning algorithms. Our system performance analysis shows that GNN-based FL research is affordable to most research labs. We hope FedGraphNN can serve as an easy-to-follow research platform for researchers to explore vital problems at the intersection of federated learning and graph neural networks.

References

  • Tox21 Challenge (2017) Tox21 challenge. Note: https://tripod.nih.gov/tox21/challenge/ Cited by: 5th item, Table 1.
  • G. W. Bemis and M. A. Murcko (1996) The properties of known drugs. 1. molecular frameworks. Journal of medicinal chemistry 39 (15), pp. 2887–2893. Cited by: §5.
  • C. Chen, J. Cui, G. Liu, J. Wu, and L. Wang (2020) Survey and open problems in privacy preserving knowledge graph: merging, query, representation, completion and applications. arXiv preprint arXiv:2011.10180. Cited by: §3.
  • Z. Cui, K. Henrickson, R. Ke, Z. Pu, and Y. Wang (2019)

    Traffic graph convolutional recurrent neural network: a deep learning framework for network-scale traffic learning and forecasting

    .
    External Links: 1802.07007 Cited by: §1.
  • J. S. Delaney (2004)

    ESOL: estimating aqueous solubility directly from molecular structure

    .
    Journal of chemical information and computer sciences 44 (3), pp. 1000–1005. Cited by: 3rd item, Table 1.
  • A. R. Elkordy and A. S. Avestimehr (2020) Secure aggregation with heterogeneous quantization in federated learning. External Links: 2009.14388 Cited by: item 5.
  • A. Gaulton, L. J. Bellis, A. P. Bento, J. Chambers, M. Davies, A. Hersey, Y. Light, S. McGlinchey, D. Michalovich, B. Al-Lazikani, et al. (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic acids research 40 (D1), pp. D1100–D1107. Cited by: 4th item, Table 1.
  • A. Gaulton, A. Hersey, M. Nowotka, A. P. Bento, J. Chambers, D. Mendez, P. Mutowo, F. Atkinson, L. J. Bellis, E. Cibrián-Uhalte, et al. (2017) The chembl database in 2017. Nucleic acids research 45 (D1), pp. D945–D954. Cited by: 6th item, 2nd item, §5.
  • A. Gaulton, A. Hersey, M. Nowotka, A. P. Bento, J. Chambers, D. Mendez, P. Mutowo, F. Atkinson, L. J. Bellis, E. Cibrián-Uhalte, M. Davies, N. Dedman, A. Karlsson, M. P. Magariños, J. P. Overington, G. Papadatos, I. Smit, and A. R. Leach (2016) The ChEMBL database in 2017. Nucleic Acids Research 45 (D1), pp. D945–D954. External Links: ISSN 0305-1048, Document, Link, https://academic.oup.com/nar/article-pdf/45/D1/D945/8846762/gkw1074.pdf Cited by: Table 1.
  • K. M. Gayvert, N. S. Madhukar, and O. Elemento (2016) A data-driven approach to predicting successes and failures of clinical trials. Cell chemical biology 23 (10), pp. 1294–1301. Cited by: 3rd item, Table 1.
  • S. Ge, F. Wu, C. Wu, T. Qi, Y. Huang, and X. Xie (2020)

    FedNER: privacy-preserving medical named entity recognition with federated learning

    .
    arXiv e-prints, pp. arXiv–2003. Cited by: §1.
  • J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl (2017) Neural message passing for quantum chemistry. In International Conference on Machine Learning, pp. 1263–1272. Cited by: §3.
  • O. Gupta and R. Raskar (2018) Distributed learning of deep neural network over multiple agents. Journal of Network and Computer Applications 116, pp. 1–8. Cited by: 1st item.
  • A. A. Hagberg, D. A. Schult, and P. J. Swart (2008) Exploring network structure, dynamics, and function using networkx. In Proceedings of the 7th Python in Science Conference, G. Varoquaux, T. Vaught, and J. Millman (Eds.), Pasadena, CA USA, pp. 11 – 15. Cited by: item 2.
  • W. L. Hamilton, R. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. CoRR abs/1706.02216. External Links: Link, 1706.02216 Cited by: 2nd item, §3, §5.
  • A. Hard, K. Rao, R. Mathews, S. Ramaswamy, F. Beaufays, S. Augenstein, H. Eichner, C. Kiddon, and D. Ramage (2018) Federated learning for mobile keyboard prediction. arXiv preprint arXiv:1811.03604. Cited by: §1.
  • C. He, M. Annavaram, and S. Avestimehr (2020a) Fednas: federated deep learning via neural architecture search. arXiv preprint arXiv:2004.08546. Cited by: item 3.
  • C. He, M. Annavaram, and S. Avestimehr (2020b) Group knowledge transfer: federated learning of large cnns at the edge. Advances in Neural Information Processing Systems 33. Cited by: §3, §5.
  • C. He, S. Li, J. So, M. Zhang, H. Wang, X. Wang, P. Vepakomma, A. Singh, H. Qiu, L. Shen, P. Zhao, Y. Kang, Y. Liu, R. Raskar, Q. Yang, M. Annavaram, and S. Avestimehr (2020c) FedML: a research library and benchmark for federated machine learning. arXiv preprint arXiv:2007.13518. Cited by: item 1, §4, §5.
  • C. He, C. Tan, H. Tang, S. Qiu, and J. Liu (2019a) Central server free federated learning over single-sided trust social networks. arXiv preprint arXiv:1910.04956. Cited by: §3.
  • C. He, T. Xie, Y. Rong, W. Huang, J. Huang, X. Ren, and C. Shahabi (2019b) Cascade-bgnn: toward efficient self-supervised representation learning on large-scale bipartite graphs. arXiv preprint arXiv:1906.11994. Cited by: §1.
  • C. He, H. Ye, L. Shen, and T. Zhang (2020d) Milenas: efficient neural architecture search via mixed-level reformulation. In

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    ,
    pp. 11993–12002. Cited by: item 3.
  • T. H. Hsu, H. Qi, and M. Brown (2020) Federated visual classification with real-world data distribution. arXiv preprint arXiv:2003.08082. Cited by: §1.
  • M. Jiang, T. Jung, R. Karl, and T. Zhao (2020) Federated dynamic gnn with secure aggregation. arXiv preprint arXiv:2009.07351. Cited by: §2.
  • P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, et al. (2019) Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977. Cited by: §1, §4.
  • S. Kim, J. Chen, T. Cheng, A. Gindulyte, J. He, S. He, Q. Li, B. A. Shoemaker, P. A. Thiessen, B. Yu, et al. (2021) PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Research 49 (D1), pp. D1388–D1395. Cited by: 6th item, 2nd item, §5, Table 1.
  • T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. CoRR abs/1609.02907. External Links: Link, 1609.02907 Cited by: 1st item, §3, §5.
  • M. Kuhn, I. Letunic, L. J. Jensen, and P. Bork (2016) The sider database of drugs and side effects. Nucleic acids research 44 (D1), pp. D1075–D1079. Cited by: 2nd item, Table 1.
  • G. Landrum (2006) RDKit: open-source cheminformatics. External Links: Link Cited by: item 1.
  • S. Lee, Q. Kang, A. Agrawal, A. Choudhary, and W. -k. Liao (2020)

    Communication-efficient local stochastic gradient descent for scalable deep learning

    .
    In 2020 IEEE International Conference on Big Data (Big Data), Vol. , pp. 718–727. External Links: Document Cited by: item 2.
  • Q. Li, Z. Han, and X. Wu (2018)

    Deeper insights into graph convolutional networks for semi-supervised learning

    .
    In

    Proceedings of the AAAI Conference on Artificial Intelligence

    ,
    Vol. 32. Cited by: Appendix C.
  • M. Liu, Y. Luo, L. Wang, Y. Xie, H. Yuan, S. Gui, Z. Xu, H. Yu, J. Zhang, Y. Liu, et al. (2021) DIG: a turnkey library for diving into graph deep learning research. arXiv preprint arXiv:2103.12608. Cited by: item 1.
  • Y. Liu, A. Huang, Y. Luo, H. Huang, Y. Liu, Y. Chen, L. Feng, T. Chen, H. Yu, and Q. Yang (2020) Fedvision: an online visual object detection platform powered by federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, pp. 13172–13179. Cited by: §1.
  • E. Markowitz, K. Balasubramanian, M. Mirtaheri, S. Abu-El-Haija, B. Perozzi, G. V. Steeg, and A. Galstyan (2021) Graph traversal with tensor functionals: a meta-algorithm for scalable learning. External Links: 2102.04350 Cited by: 1st item.
  • I. F. Martins, A. L. Teixeira, L. Pinheiro, and A. O. Falcao (2012) A bayesian approach to in silico blood-brain barrier penetration modeling. Journal of chemical information and modeling 52 (6), pp. 1686–1697. Cited by: 1st item, Table 1.
  • B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas (2017) Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pp. 1273–1282. Cited by: 1st item, item 3, §1, §3, §5.
  • G. Mei, Z. Guo, S. Liu, and L. Pan (2019) Sgnn: a graph neural network based federated learning approach by hiding structure. In 2019 IEEE International Conference on Big Data (Big Data), pp. 2560–2568. Cited by: §2.
  • C. Meng, S. Rambhatla, and Y. Liu (2021) Cross-node federated graph neural network for spatio-temporal data modeling. External Links: Link Cited by: Figure 1, item 3, §2.
  • D. L. Mobley and J. P. Guthrie (2014) FreeSolv: a database of experimental and calculated hydration free energies, with input files. Journal of computer-aided molecular design 28 (7), pp. 711–720. Cited by: 5th item, Table 1.
  • S. Prakash and A. S. Avestimehr (2020) Mitigating byzantine attacks in federated learning. arXiv preprint arXiv:2010.07541. Cited by: item 5.
  • S. Prakash, S. Dhakal, M. R. Akdeniz, Y. Yona, S. Talwar, S. Avestimehr, and N. Himayat (2020a) Coded computing for low-latency federated learning over wireless edge networks. IEEE Journal on Selected Areas in Communications 39 (1), pp. 233–250. Cited by: item 6.
  • S. Prakash, A. Reisizadeh, R. Pedarsani, and A. S. Avestimehr (2020b) Coded computing for distributed graph analytics. IEEE Transactions on Information Theory 66 (10), pp. 6534–6554. Cited by: item 6.
  • R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. von Lilienfeld (2014) Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data 1. Cited by: 1st item.
  • S. Reddi, Z. Charles, M. Zaheer, Z. Garrett, K. Rush, J. Konečnỳ, S. Kumar, and H. B. McMahan (2020) Adaptive federated optimization. arXiv preprint arXiv:2003.00295. Cited by: §3, §5.
  • Y. Rong, Y. Bian, T. Xu, W. Xie, Y. Wei, W. Huang, and J. Huang (2020a) Self-supervised graph transformer on large-scale molecular data. External Links: 2007.02835 Cited by: §B.2.
  • Y. Rong, Y. Bian, T. Xu, W. Xie, Y. Wei, W. Huang, and J. Huang (2020b) Self-supervised graph transformer on large-scale molecular data. Advances in Neural Information Processing Systems 33. Cited by: §1, §3.
  • Y. Rong, T. Xu, J. Huang, W. Huang, H. Cheng, Y. Ma, Y. Wang, T. Derr, L. Wu, and T. Ma (2020c) Deep graph learning: foundations, advances and applications. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge DiscoverY; Data Mining, KDD ’20, New York, NY, USA, pp. 3555–3556. External Links: ISBN 9781450379984, Link, Document Cited by: §3.
  • S. Sajadmanesh and D. Gatica-Perez (2021) Locally private graph neural networks. Cited by: §2.
  • G. Subramanian, B. Ramsundar, V. Pande, and R. A. Denny (2016) Computational modeling of β-secretase 1 (BACE-1) inhibitors using ligand based approaches. Journal of chemical information and modeling 56 (10), pp. 1936–1949. Cited by: 4th item, Table 1.
  • M. Sun, S. Zhao, C. Gilvary, O. Elemento, J. Zhou, and F. Wang (2019) Graph convolutional networks for computational drug development and discovery. Briefings in Bioinformatics 21 (3), pp. 919–935. External Links: ISSN 1477-4054, Document, Link, https://academic.oup.com/bib/article-pdf/21/3/919/33227266/bbz042.pdf Cited by: §1.
  • T. Suzumura, Y. Zhou, N. Baracaldo, G. Ye, K. Houck, R. Kawahara, A. Anwar, L. L. Stavarache, Y. Watanabe, P. Loyola, et al. (2019) Towards federated graph learning for collaborative financial crimes detection. arXiv preprint arXiv:1909.12946. Cited by: §2.
  • P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio (2018) Graph attention networks. External Links: 1710.10903 Cited by: 3rd item, §3, §5.
  • P. Vepakomma, O. Gupta, T. Swedish, and R. Raskar (2018) Split learning for health: distributed deep learning without sharing raw patient data. arXiv preprint arXiv:1812.00564. Cited by: 1st item.
  • B. Wang, A. Li, H. Li, and Y. Chen (2020a) GraphFL: a federated learning framework for semi-supervised node classification on graphs. arXiv preprint arXiv:2012.04187. Cited by: item 3, §2.
  • X. Wang, Y. Ma, Y. Wang, W. Jin, X. Wang, J. Tang, C. Jia, and J. Yu (2020b) Traffic flow prediction via spatial temporal graph neural network. In Proceedings of The Web Conference 2020, pp. 1082–1092. External Links: ISBN 9781450370233, Link Cited by: §1.
  • C. Wu, F. Wu, Y. Cao, Y. Huang, and X. Xie (2021) FedGNN: federated graph neural network for privacy-preserving recommendation. arXiv preprint arXiv:2102.04925. Cited by: Figure 1, item 3, item 2.
  • L. Wu, P. Sun, R. Hong, Y. Fu, X. Wang, and M. Wang (2018a) SocialGCN: an efficient graph convolutional network based model for social recommendation. CoRR abs/1811.02815. External Links: Link, 1811.02815 Cited by: §1.
  • Z. Wu, B. Ramsundar, E. N. Feinberg, J. Gomes, C. Geniesse, A. S. Pappu, K. Leswing, and V. Pande (2018b) MoleculeNet: a benchmark for molecular machine learning. Chemical science 9 (2), pp. 513–530. Cited by: §B.1, §5, §5.
  • Y. Xie, Z. Xu, Z. Wang, and S. Ji (2021) Self-supervised learning of graph neural networks: a unified review. arXiv preprint arXiv:2102.10757. Cited by: item 4.
  • Q. Yang, Y. Liu, T. Chen, and Y. Tong (2019) Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. 10 (2). External Links: ISSN 2157-6904, Link, Document Cited by: 1st item.
  • D. Zheng, C. Ma, M. Wang, J. Zhou, Q. Su, X. Song, Q. Gan, Z. Zhang, and G. Karypis (2020a) DistDGL: distributed graph neural network training for billion-scale graphs. arXiv preprint arXiv:2010.05337. Cited by: item 2.
  • L. Zheng, J. Zhou, C. Chen, B. Wu, L. Wang, and B. Zhang (2020b) ASFGNN: automated separated-federated graph neural network. arXiv preprint arXiv:2011.03248. Cited by: Figure 1, item 2, item 3.
  • J. Zhou, C. Chen, L. Zheng, X. Zheng, B. Wu, Z. Liu, and L. Wang (2020) Privacy-preserving graph neural network for node classification. arXiv preprint arXiv:2005.11903. Cited by: §2.

Appendix A More Details of System Design

Data Collector and Manager is a distributed computing system that collects scattered datasets or features from multiple servers for the Training Manager. Such collection can also keep the raw data on the original server using RPCs, which access the data only during training. After obtaining all necessary datasets for federated training, the Training Manager starts federated training using algorithms supported by FedML-API. Once training has been completed, Model Serving can request the trained model and deploy it for inference. Under this SDK abstraction, we plan to address challenges (1) and (2) mentioned above within the Data Collector and Manager. As for challenge (3), we plan to make the FedML Client SDK compatible with any operating system (Linux, Android, iOS) through a cross-platform abstraction interface design. In essence, the three modules inside the FedML Client SDK build up a pipeline that manages a model's life cycle, from federated training to personalized model serving (inference). Unifying the three modules of a pipeline into a single SDK simplifies the system design. Any subsystem in an institute can integrate the FedML Client SDK with a host process, which can be a backend service or a desktop application. Overall, we hope the FedML Client SDK can be a lightweight and easy-to-use SDK for federated learning among diverse cross-silo institutes.

Appendix B Benchmark Details

b.1 Molecular Dataset Details

Table 1 summarizes the necessary information of benchmark datasets (Wu et al., 2018b). The details of each dataset are listed below:

Molecular Classification Datasets

  • BBBP (Martins et al., 2012) involves records of whether a compound carries the permeability property of penetrating the blood-brain barrier.

  • SIDER (Kuhn et al., 2016), or Side Effect Resource, consists of marketed drugs together with their adverse drug reactions.

  • ClinTox (Gayvert et al., 2016) includes qualitative data of drugs both approved by the FDA and rejected due to the toxicity shown during clinical trials.

  • BACE (Subramanian et al., 2016) is collected for recording compounds that could act as inhibitors of human β-secretase 1 (BACE-1) in the past few years.

  • Tox21 (Tox21 Challenge, 2017) is a dataset that records the toxicity of compounds.

  • hERG (Kim et al., 2021; Gaulton et al., 2017) is a dataset related to the hERG gene (KCNH2), which codes for a protein known as Kv11.1 that contributes to the electrical activity coordinating the heart's beating.

Molecular Regression Datasets

  • QM9 (Ramakrishnan et al., 2014) is a subset of GDB-17, which records computed quantum-mechanical properties, such as HOMO/LUMO energies and atomization energy, of stable and synthetically accessible organic molecules. It contains various molecular structures such as triple bonds, cycles, amide, and epoxy.

  • hERG (Gaulton et al., 2017; Kim et al., 2021) is a dataset related to the hERG gene (KCNH2), which codes for a protein known as Kv11.1 that contributes to the electrical activity coordinating the heart's beating.

  • ESOL (Delaney, 2004) is a small dataset documenting the water solubility (log solubility in mols per litre) of common organic small molecules.

  • Lipophilicity (Gaulton et al., 2012) records the experimental results of the octanol/water distribution coefficient for compounds.

  • FreeSolv (Mobley and Guthrie, 2014) contains the experimental results of hydration-free energy of small molecules in water.

b.2 Feature Extraction Procedure for Molecules

The feature extraction is in two steps:

  1. Atom-level feature extraction and Molecule object construction using RDKit (Landrum, 2006).

  2. Constructing graphs from molecule objects using NetworkX (Hagberg et al., 2008).

The atom features, shown in Table 5, are exactly the same as those used in (Rong et al., 2020a).

Features Size Description

atom type 100 Representation of atom (e.g., C, N, O), by its atomic number
formal charge 5 An integer electronic charge assigned to atom
number of bonds 6 Number of bonds the atom is involved in
chirality 5 Chirality tag of the atom
number of H 5 Number of bonded hydrogen atoms
atomic mass 1 Mass of the atom, divided by 100
aromaticity 1 Whether this atom is part of an aromatic system
hybridization 5 SP, SP2, SP3, SP3D, or SP3D2
Table 5: Atom features
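As a rough sketch of these two steps, the snippet below builds a NetworkX graph from a SMILES string via RDKit, attaching a subset of the Table 5 atom features; it is illustrative only, not the exact preprocessing script in the repository.

```python
# Sketch of the two-step feature extraction: RDKit for atom-level features
# (a subset of Table 5), then NetworkX for graph construction from bonds.
import networkx as nx
from rdkit import Chem


def smiles_to_graph(smiles):
    mol = Chem.MolFromSmiles(smiles)
    graph = nx.Graph()
    for atom in mol.GetAtoms():
        graph.add_node(
            atom.GetIdx(),
            atomic_num=atom.GetAtomicNum(),       # "atom type"
            formal_charge=atom.GetFormalCharge(),
            degree=atom.GetDegree(),              # "number of bonds"
            num_h=atom.GetTotalNumHs(),           # "number of H"
            mass=atom.GetMass() / 100.0,          # "atomic mass"
            aromatic=atom.GetIsAromatic(),
            hybridization=str(atom.GetHybridization()),
        )
    for bond in mol.GetBonds():
        graph.add_edge(
            bond.GetBeginAtomIdx(),
            bond.GetEndAtomIdx(),
            bond_type=str(bond.GetBondType()),
        )
    return graph


# Example: aspirin
g = smiles_to_graph("CC(=O)OC1=CC=CC=C1C(=O)O")
```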

b.3 Non-I.I.D. Partitioning

(a) hERG (#clients: 4, alpha: 3)
(b) ESOL (#clients: 4, alpha: 2)
(c) FreeSolv (#clients: 4, alpha: 0.5)
(d) BACE (#clients: 4, alpha: 0.5)
(e) QM9 (#clients: 8, alpha: 3)
(f) Clintox (#clients: 4, alpha: 0.5)
(g) PCBA (#clients: 8, alpha: 3)
(h) Tox21 (#clients: 8, alpha: 3)
(i) BBBP (#clients: 4, alpha: 2)
(j) SIDER (#clients: 4, alpha: 0.2)
(k) LIPO (#clients: 8, alpha: 2)
Figure 8: Unbalanced Sample Distribution (Non-I.I.D.) for Molecular Datasets

b.4 Details of Supported Models and Algorithms

Graph Neural Network Architectures

  • Graph Convolutional Networks (GCN) (Kipf and Welling, 2016) is a GNN model that can be viewed as a first-order approximation to spectral GNN models (Markowitz et al., 2021).

  • GraphSAGE (Hamilton et al., 2017) is a general inductive GNN framework capable of generating node-level representations for unseen data.

  • Graph Attention Networks (Veličković et al., 2018) is the first attention-based GNN model. Attention is computed in a message-passing fashion.

Federated Learning Algorithms

  • Federated Averaging (FedAvg). FedAvg (McMahan et al., 2017) is a standard federated learning algorithm that is normally used as a baseline for comparison with more advanced algorithms. Each worker trains its local model for several epochs, then uploads its local model to the server. The server aggregates the uploaded client models into a global model by weighted coordinate-wise averaging (the weights are determined by the number of data points on each worker), and then synchronizes the global model back to all workers.

  • Vertical Federated Learning (VFL). VFL, or feature-partitioned FL (Yang et al., 2019), is applicable to cases where all participating parties share the same sample space but differ in the feature space. VFL is the process of aggregating different features and computing the training loss and gradients in a privacy-preserving manner to build a model with data from all parties collaboratively.

  • Split Learning. Split learning is a compute- and memory-efficient variant of FL introduced in Gupta and Raskar (2018); Vepakomma et al. (2018), where the model is split at a layer and the parts of the model preceding and succeeding this layer are kept on the worker and server, respectively. Only the activations and gradients from a single layer are communicated in split learning, whereas the weights of the entire model are communicated in federated learning (a toy sketch of the split point follows this list).
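The sketch below only illustrates how a model is cut at a layer into worker-side and server-side parts; the model, cut point, and data are toy assumptions, and the network communication of activations and gradients is omitted.

```python
# Conceptual sketch of split learning: cut a model at one layer; the worker
# runs the first part and the server runs the rest. Only the cut-layer
# activations (and their gradients) would cross the network; communication
# itself is omitted in this toy example.
import torch
import torch.nn as nn

full_model = nn.Sequential(
    nn.Linear(64, 64), nn.ReLU(),   # worker-side part
    nn.Linear(64, 64), nn.ReLU(),   # server-side part
    nn.Linear(64, 1),
)
cut = 2
worker_part = full_model[:cut]      # stays on the worker
server_part = full_model[cut:]      # stays on the server

x = torch.randn(8, 64)              # toy batch of worker-local features
activations = worker_part(x)        # only these activations leave the worker
output = server_part(activations)
```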

Appendix C Hyper-parameters

For each task, we utilize grid search to find the best results. Tables 6 and 7 list all the hyper-parameter ranges used in our experiments. All hyper-parameter tuning is run on a single GPU. The best hyper-parameters for each dataset and model are listed in Tables 8, 9, 10, and 11. For molecule tasks, the batch size is kept fixed because the graph-level task requires a mini-batch size equal to 1. Also, the number of GNN layers was fixed to 2 because too many GNN layers result in the over-smoothing phenomenon, as shown in (Li et al., 2018). For all experiments, we used the Adam optimizer.
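A minimal sketch of the sweep loop is shown below; the grid values are placeholders drawn from Tables 6-11, and run_one_trial is a hypothetical stand-in for launching one federated training run and returning its validation metric.

```python
# Sketch of the hyper-parameter sweep (grid search). Grid values are
# placeholders taken from the ranges reported in Tables 6-11; run_one_trial
# is a hypothetical stand-in, not an actual FedGraphNN/FedML function.
import itertools


def run_one_trial(config):
    # Placeholder: launch one federated training run with `config` and
    # return the validation score (e.g., ROC-AUC). Dummy value here.
    return 0.0


grid = {
    "learning_rate": [0.00015, 0.0015, 0.015],
    "dropout_rate": [0.2, 0.3, 0.5, 0.6],
    "rounds": [10, 50, 100],
}

best_score, best_config = None, None
for values in itertools.product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    score = run_one_trial(config)
    if best_score is None or score > best_score:
        best_score, best_config = score, config
```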

hyper-parameter Description Range
learning rate Rate of speed at which the model learns.
dropout rate Dropout ratio
node embedding dimension Dimensionality of the node embedding
hidden layer dimension Hidden layer dimensionality
readout embedding dimension Dimensionality of the readout embedding
graph embedding dimension Dimensionality of the graph embedding
attention heads Number of attention heads required for GAT 1-7
alpha LeakyRELU parameter used in GAT model 0.2
Table 6: Hyper-parameter Range for Centralized Training
hyper-parameter Description Range
learning rate Rate of speed at which the model learns.
dropout rate Dropout ratio
node embedding dimension Dimensionality of the node embedding 64
hidden layer dimension Hidden layer dimensionality 64
readout embedding dimension Dimensionality of the readout embedding 64
graph embedding dimension Dimensionality of the graph embedding 64
attention heads Number of attention heads required for GAT 1-7
alpha LeakyRELU parameter used in GAT model 0.2
rounds Number of federating learning rounds [10, 50, 100]
epoch Epoch of clients 1
number of clients Number of users in a federated learning round 4-10
Table 7: Hyper-parameter Range for Federated Learning
Dataset Score & Parameters GCN GAT GraphSAGE
BBBP ROC-AUC Score 0.8705 0.8824 0.8930
learning rate 0.0015 0.015 0.01
dropout rate 0.2 0.5 0.2
node embedding dimension 64 64 64
hidden layer dimension 64 64 64
readout embedding dimension 64 64 64
graph embedding dimension 64 64 64
attention heads None 2 None
alpha None 0.2 None
BACE ROC-AUC Score 0.9221 0.7657 0.9266
learning rate 0.0015 0.001 0.0015
dropout rate 0.3 0.3 0.3
node embedding dimension 64 64 16
hidden layer dimension 64 64 64
readout embedding dimension 64 64 64
graph embedding dimension 64 64 64
attention heads None 2 None
alpha None 0.2 None
Tox21 ROC-AUC Score 0.7800 0.8144 0.8317
learning rate 0.0015 0.00015 0.00015
dropout rate 0.4 0.3 0.3
node embedding dimension 64 128 256
hidden layer dimension 64 64 128
readout embedding dimension 64 128 256
graph embedding dimension 64 64 128
attention heads None 2 None
alpha None 0.2 None
SIDER ROC-AUC Score 0.6476 0.6639 0.6669
learning rate 0.0015 0.0015 0.0015
dropout rate 0.3 0.3 0.6
node embedding dimension 64 64 16
hidden layer dimension 64 64 64
readout embedding dimension 64 64 64
graph embedding dimension 64 64 64
attention heads None 2 None
alpha None 0.2 None
ClinTox ROC-AUC Score 0.8914 0.9573 0.9716
learning rate 0.0015 0.0015 0.0015
dropout rate 0.3 0.3 0.3
node embedding dimension 64 64 64
hidden layer dimension 64 64 64
readout embedding dimension 64 64 64
graph embedding dimension 64 64 64
attention heads None 2 None
alpha None 0.2 None
Table 8: Hyperparameters for Molecular Classification Task
Dataset Score & Parameters GCN + FedAvg GAT + FedAvg GraphSAGE + FedAvg
BBBP ROC-AUC Score 0.7629 0.8746 0.8935
number of clients 4 4 4
learning rate 0.0015 0.0015 0.015
dropout rate 0.3 0.3 0.6
node embedding dimension 64 64 64
hidden layer dimension 64 64 64
readout embedding dimension 64 64 64
graph embedding dimension 64 64 64
attention heads None 2 None
alpha None 0.2 None
BACE ROC-AUC Score 0.6594 0.7714 0.8604
number of clients 4 4 4
learning rate 0.0015 0.0015 0.0015
dropout rate 0.5 0.3 0.5
node embedding dimension 64 64 16
hidden layer dimension 64 64 64
readout embedding dimension 64 64 64
graph embedding dimension 64 64 64
attention heads None 2 None
alpha None 0.2 None
Tox21 ROC-AUC Score 0.7128 0.7171 0.7801
number of clients 4 4 4
learning rate 0.0015 0.0015 0.00015
dropout rate 0.6 0.3 0.3
node embedding dimension 64 64 64
hidden layer dimension 64 64 64
readout embedding dimension 64 64 64
graph embedding dimension 64 64 64
attention heads None 2 None
alpha None 0.2 None
SIDER ROC-AUC Score 0.6266 0.6591 0.67
number of clients 4 4 4
learning rate 0.0015 0.0015 0.0015
dropout rate 0.6 0.3 0.6
node embedding dimension 64 64 16
hidden layer dimension 64 64 64
readout embedding dimension 64 64 64
graph embedding dimension 64 64 64
attention heads None 2 None
alpha None 0.2 None
ClinTox ROC-AUC Score 0.8784 0.9160 0.9246
number of clients 4 4 4
learning rate 0.0015 0.0015 0.015
dropout rate 0.5 0.6 0.3
node embedding dimension 64 64 64
hidden layer dimension 64 64 64
readout embedding dimension 64 64 64
graph embedding dimension 64 64 64
attention heads None 2 None
alpha None 0.2 None
Table 9: Hyperparameters for Federated Molecular Classification Task
Dataset Score &Parameters GCN GAT GraphSAGE
Freesolv RMSE Score 0.8705 0.8824 0.8930
learning rate 0.0015 0.015 0.01
dropout rate 0.2 0.5 0.2
node embedding dimension 64 64 64
hidden layer dimension 64 64 64
readout embedding dimension 64 64 64
graph embedding dimension 64 64 64
attention heads None 2 None
alpha None 0.2 None
ESOL RMSE Score 0.8705 0.8824 0.8930
learning rate 0.0015 0.015 0.01
dropout rate 0.2 0.5 0.2
node embedding dimension 64 64 64
hidden layer dimension 64 64 64
readout embedding dimension 64 64 64
graph embedding dimension 64 64 64
attention heads None 2 None
alpha None 0.2 None
Lipophilicity RMSE Score 0.8521 0.7415 0.7078
learning rate 0.0015 0.001 0.001
dropout rate 0.3 0.3 0.3
node embedding dimension 128 128 128
hidden layer dimension 64 64 64
readout embedding dimension 128 128 128
graph embedding dimension 64 64 64
attention heads None 2 None
alpha None 0.2 None
hERG RMSE Score 0.7257 0.6271 0.7132
learning rate 0.001 0.001 0.005
dropout rate 0.3 0.5 0.3
node embedding dimension 64 64 64
hidden layer dimension 64 64 64
readout embedding dimension 64 64 64
graph embedding dimension 64 64 64
attention heads None 2 None
alpha None 0.2 None
QM9 RMSE Score 14.78 12.44 13.06
learning rate 0.0015 0.015 0.01
dropout rate 0.2 0.5 0.2
node embedding dimension 64 64 64
hidden layer dimension 64 64 64
readout embedding dimension 64 64 64
graph embedding dimension 64 64 64
attention heads None 2 None
alpha None 0.2 None
Table 10: Hyperparameters for Molecular Regression Task
Dataset Parameters GCN + FedAvg GAT + FedAvg GraphSAGE + FedAvg
FreeSolv RMSE Score 2.747 3.108 1.641
number of clients 4 8 4
learning rate 0.0015 0.00015 0.015
dropout rate 0.6 0.5 0.6
node embedding dimension 64 64 64
hidden layer dimension 64 64 64
readout embedding dimension 64 64 64
graph embedding dimension 64 64 64
attention heads None 2 None
alpha None 0.2 None
ESOL RMSE Score 1.435 1.028 1.185
number of clients 4 4 4
learning rate 0.0015 0.0015 0.0015
dropout rate 0.5 0.3 0.3
node embedding dimension 64 256 64
hidden layer dimension 64 64 64
readout embedding dimension 64 64 64
graph embedding dimension 64 64 64
attention heads None 2 None
alpha None 0.2 None
Lipophilicity RMSE Score 1.146 1.004 0.7788
number of clients 4 4 4
learning rate 0.0015 0.0015 0.0015
dropout rate 0.3 0.3 0.3
node embedding dimension 64 64 256
hidden layer dimension 64 64 256
readout embedding dimension 64 64 256
graph embedding dimension 64 64 256
attention heads None 2 None
alpha None 0.2 None
hERG RMSE Score 0.7944 0.7322 0.7265
number of clients 8 8 8
learning rate 0.0015 0.0015 0.0015
dropout rate 0.3 0.3 0.6
node embedding dimension 64 64 64
hidden layer dimension 64 64 64
readout embedding dimension 64 64 64
graph embedding dimension 64 64 64
attention heads None 2 None
alpha None 0.2 None
QM9 MAE Score 21.075 23.173 19.167
number of clients 8 8 8
learning rate 0.0015 0.00015 0.15
dropout rate 0.2 0.5 0.3
node embedding dimension 64 256 64
hidden layer dimension 64 128 64
readout embedding dimension 64 256 64
graph embedding dimension 64 64 64
attention heads None 2 None
alpha None 0.2 None
Table 11: Hyperparameters for Federated Molecular Regression Task

Appendix D More Experimental Details

The hyper-parameters reported in Section C are based on the hyper-parameter sweeping (grid search). We further provide the curve of test score (accuracy) during training for each dataset with a specific model. We hope these visualized training results can be a useful reference for future research exploration.