FL-AGCNS: Federated Learning Framework for Automatic Graph Convolutional Network Search

04/09/2021 ∙ by Chunnan Wang, et al.

Recently, some Neural Architecture Search (NAS) techniques have been proposed for the automatic design of Graph Convolutional Network (GCN) architectures. They bring great convenience to the use of GCNs, but can hardly be applied to Federated Learning (FL) scenarios with distributed and private datasets, which limits their applications. Moreover, they need to train many candidate GCN models from scratch, which is inefficient for FL. To address these challenges, we propose FL-AGCNS, an efficient GCN NAS algorithm suitable for FL scenarios. FL-AGCNS designs a federated evolutionary optimization strategy to enable distributed agents to cooperatively design powerful GCN models while keeping personal information on local devices. Besides, it applies the GCN SuperNet and a weight sharing strategy to speed up the evaluation of GCN models. Experimental results show that FL-AGCNS can find better GCN models in a short time under the FL framework, surpassing state-of-the-art NAS methods and GCN models.


1 Introduction

Graph Convolutional Network (GCN) is a powerful deep learning approach for graph-structured data. It can learn high-level node representations from node features and linkage patterns, and thus effectively deal with graph-based tasks such as node classification (Hu et al., 2019), traffic forecasting (Yu et al., 2018) and online recommendation (Wu et al., 2019b). Despite the great success of GCNs, the design of a GCN architecture requires both heavy manual work and domain knowledge (Gao et al., 2020), which is very laborious.

In order to reduce the development cost of GCNs, researchers have recently developed several GCN Neural Architecture Search (GCN NAS) algorithms, including GraphNAS (Gao et al., 2020) and SNAG (Zhao et al., 2020), to automatically discover good GCN architectures. GraphNAS and SNAG apply a recurrent neural network controller to sample candidate architectures from the search space, and train the controller with policy gradient to maximize the expected validation accuracy of the generated architectures. These methods greatly reduce the labour of human experts and can find better GCN architectures than the human-invented ones. However, existing approaches are hardly applicable to Federated Learning (FL) scenarios (Yang et al., 2019) with distributed and private graph datasets, due to the lack of an information exchange scheme and an efficient evaluation method, which reduces their practicality.

Specifically, the existing GCN NAS techniques are centralized learning approaches, i.e., they require the graph dataset to be aggregated on a single machine or in a datacenter. However, in many practical scenarios, the graph data is distributed across multiple clients without data sharing. For example, the traffic graph with flow data of a country is distributed across different cities, and the social graph with users' purchase histories stored by a multinational corporation is scattered across different datacenters. Such graph data cannot be aggregated due to huge transmission costs or information privacy, but a large amount of graph information and labels is still needed to obtain a robust GCN model. This requires GCN NAS techniques to cope with FL scenarios, i.e., to design an effective information exchange strategy and thus learn the optimal GCN architecture in a distributed and privacy-preserving manner. Existing solutions fail to do so, which limits their practicality.

In addition, we notice that the GCN evaluation methods applied in the existing GCN NAS solutions are inefficient for the FL scenario. They need to train numerous architecture candidates from scratch and for a large number of epochs. In the FL framework, such methods incur a huge amount of communication, since each architecture must be trained jointly by multiple clients through information exchange in every epoch, which is very inefficient.

In this paper, we address the above challenges and propose FL-AGCNS, an efficient GCN NAS algorithm that enables distributed agents to cooperatively design powerful GCN models while keeping personal information on local devices.

Specifically, FL-AGCNS designs a federated evolutionary optimization strategy to fully consider the preferences of each client, and thus recommends GCN architectures that perform well on multiple datasets. In the early evolution, we find that population individuals may not be applicable to some clients and thus show poor performance in the FL framework. To efficiently improve the overall performance of the individuals, we execute evolutionary algorithms in the clients to explore the characteristics of the multiple datasets, and utilize the GCN architectures favored by each client to guide the individuals to efficiently overcome their shortcomings.

Besides, FL-AGCNS applies the GCN SuperNet, a weight sharing strategy, to speed up model evaluation. It does not train different GCN architectures from scratch separately in the search stage; instead, it optimizes the parameters of the GCN SuperNet and efficiently evaluates various GCN architectures by sharing the corresponding parameters in the SuperNet. This weight sharing strategy dramatically reduces the computational complexity of FL-AGCNS, making its search stage efficient. Experimental results show that our optimization strategy performs well in the FL framework, outperforming the traditional evolutionary strategy that only considers the overall performance scores during the evolution. The code will be available on GitHub (the link will be provided in the published version).

Our major contributions are summarized as follows:

  • Innovation: We are the first to enable the GCN NAS technique to run on the FL framework. The combination of GCN NAS and FL strengthens the practicality of the GCN NAS method and increases the flexibility of the FL framework.

  • Effectiveness: We design a federated evolutionary optimization strategy to effectively search for high-quality GCN architectures under FL framework.

  • Efficiency: We propose to use the GCN SuperNet to reduce the search cost of the GCN NAS method, improving search efficiency.

2 Preliminaries

We first introduce existing GCN and FL techniques, and then give related concepts of FL based GCN NAS.

2.1 Graph Convolutional Network

Graph Convolutional Networks (GCNs) are a kind of neural network that generalizes the operation of convolution from grid data to graph data (Wu et al., 2019c). The existing GCNs fall into two categories: spectral-based methods (Kipf & Welling, 2017; Klicpera et al., 2019; Wu et al., 2019a), which define graph convolutions by introducing filters from the perspective of graph signal processing (Shuman et al., 2013), and spatial-based methods (Velickovic et al., 2018; Xu et al., 2019; Hamilton et al., 2017; Verma et al., 2018), which define graph convolutions by information propagation. The two kinds of methods interpret graph convolutions from different angles, and have produced many effective graph convolutional layers that extract node features from the one-hop neighborhood. In this paper, we flexibly use these existing graph convolutional layers to achieve the automatic design of appropriate GCN architectures.

2.2 Federated Learning

Federated Learning (FL) is a decentralized approach that aims to train a robust centralized model on datasets distributed across multiple clients without sharing their data (McMahan et al., 2017; Yang et al., 2019). FL makes it possible to vigorously develop AI techniques in the privacy-preserving era, and has attracted great attention from scholars. Many effective FL schemes have been proposed to train well-known machine learning models, such as deep neural networks (Liu et al., 2020) and random forests (Wu et al., 2020), and achieve great effect in real applications. However, there is still no FL work on automatic model design. In this paper, we fill this gap, enabling the FL framework to automatically learn good GCN architectures.

Based on the distribution characteristics of the data, FL algorithms can be divided into three categories: horizontal FL, vertical FL and federated transfer learning (Yang et al., 2019). Horizontal FL applies to scenarios where the datasets share the same feature space but differ in samples (McMahan et al., 2016), whereas the other two categories consider datasets with different feature spaces. In this paper, we focus on the case where a graph dataset is distributed to different regions in the form of densely connected subgraphs: the nodes differ among clients, but the feature space of each node is the same. Therefore, our study belongs to horizontal FL. It can be extended to the other FL scenarios with minor adjustments, which we leave to future work.

Figure 1: Overall framework of FL-AGCNS. Subgraphs are scattered in different clients without sharing. Each client is responsible for providing its preferred architectures and local evaluation/gradient information, and the controller aims to guide clients to search for the optimal GCN architecture by analyzing or aggregating these messages.

2.3 Related Definitions

We introduce basic graph concepts, and describe the horizontal FL based GCN NAS problem studied in this paper.

Definition 1: Graph and Graph Signal. We use $G = (V, A, X)$ to denote a graph, where $V$ is the set of nodes, $A$ is the adjacency matrix of $G$, $X$ is the graph signal (feature matrix) of the graph $G$, $n$ denotes the number of vertices, and $d$ denotes the number of attribute features of each node. We use $G_{train}$, $G_{val}$ and $G_{test}$ to represent the training set, validation set and test set of graph $G$, respectively.

Definition 2: HFL based GCN NAS. Given a search space $\mathcal{A}$ of GCN architectures and a set of subgraphs $\{G_1, \dots, G_N\}$ that are distributed and non-shared across $N$ clients, the Horizontal FL based GCN NAS (HFL based GCN NAS) problem aims to find the architecture $a^{*} \in \mathcal{A}$ that minimizes the overall validation loss of all subgraphs:

$$a^{*} = \arg\min_{a \in \mathcal{A}} \sum_{i=1}^{N} \mathcal{L}_{val}^{i}\big(a, W^{*}(a)\big), \quad \text{s.t. } W^{*}(a) = \arg\min_{W} \sum_{i=1}^{N} \mathcal{L}_{train}^{i}(a, W),$$

where $\mathcal{L}_{val}^{i}(a, W)$ and $\mathcal{L}_{train}^{i}(a, W)$ represent the validation and training loss of $a$ on subgraph $G_i$ under weights $W$, respectively. Different from the traditional GCN NAS, which only copes with a complete graph $G$, HFL based GCN NAS breaks the limitation of "isolated data islands", enabling the scattered graph data to jointly search for good GCN architectures without leaking private information.

3 Our Method

In this paper, we design the FL-AGCNS algorithm to deal with the HFL based GCN NAS problem. We apply the GCN SuperNet to achieve fast evaluation of GCN architectures (Section 3.2), and utilize the federated evolutionary optimization strategy (Section 3.3) to effectively improve the federated performance of the population. Following previous work (Dong & Yang, 2019; Yang et al., 2020), we optimize the SuperNet weights and the population successively in each training step, and thus avoid the expensive inner optimization of our bi-level optimization problem. Figure 1 gives the overall framework of FL-AGCNS, and the following subsections introduce it in detail.

Stage 1: Input Transform Stage.
  Function: Produce low-dimensional representations of nodes.
  Detail Contents: Input Structure (IS).
  Operation Options: FullyConnection + Sigmoid, FullyConnection + Tanh, FullyConnection + Relu, FullyConnection + Softmax, or FullyConnection + Identity (out size: 64).

Stage 2: Feature Embedding Stage.
  Function: Get high-level node features using GCN layers.
  Detail Contents: the GCN type of each GCN layer and the index of its preceding GCN layer ($m$ is the number of GCN layers).
  Operation Options (GCN type): GATConv, GINConv, SAGEConv, GCNConv, SGConv, APPNP, AGNNConv, ARMAConv, FeaStConv, GENConv, GMMConv, GatedGraphConv.
  Operation Options (preceding-layer index of layer $i$): None, 0, …, i-1. A value of None denotes that this layer and the following GCN layers are invalid, and the output of Stage 2 is computed from the remaining valid layers.

Stage 3: Output Transform Stage.
  Function: Transform the output of Stage 2 into the expected prediction.
  Detail Contents: Output Structure (OS).
  Operation Options: FullyConnection + Sigmoid, FullyConnection + Tanh, FullyConnection + Relu, FullyConnection + Softmax, or FullyConnection + Identity (out size: 64).

Table 1: The three stages that are gone through to construct a GCN architecture.

3.1 Code Representation of GCN in FL-AGCNS

The design of a GCN architecture contains the following three stages: (1) the input transform stage, which produces low-dimensional representations of nodes; (2) the feature embedding stage, which extracts high-level node features using multiple GCN layers; and (3) the output transform stage, which transforms the output of the feature embedding stage into the final prediction. In FL-AGCNS we use $2m+2$ parameters ($m$ is the number of GCN layers), as shown in Table 1, to describe the detailed structures of these stages, and thus obtain the code representation of a GCN architecture.

Specifically, in Stage 1 and Stage 3, we use two parameters, the Input Structure (IS) and the Output Structure (OS), whose options are fully connected layers with different activation functions, to specify the applied processing structure. For Stage 2, we utilize the GCN type and the preceding-layer index of each GCN layer to describe its operation type and connection method, and apply the mean operation to fuse the outputs of all valid GCN layers. We take several state-of-the-art GCN layers implemented by the torch_geometric library (https://pytorch-geometric.readthedocs.io/en/latest/) as the GCN type options, and allow each layer to connect with any one of its preceding layers, so as to build a flexible and diversified feature extraction architecture. Figure 2(a) is an example of a GCN architecture code.

Considering the value space of each parameter, the code representation defined above covers a very large number of GCN architectures. These architectures constitute the search space of FL-AGCNS, denoted as $\mathcal{A}$, and the code representation of a GCN architecture is denoted as $a$.
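To make the code representation concrete, the following is a minimal Python sketch of how such a code could be stored and sampled at random. The class and function names (GCNCode, sample_code) are hypothetical and not taken from the paper's implementation.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical option lists mirroring Table 1; the layer names follow the
# torch_geometric classes mentioned in the text.
ACTIVATION_OPTIONS = ["Sigmoid", "Tanh", "Relu", "Softmax", "Identity"]
GCN_OPTIONS = ["GATConv", "GINConv", "SAGEConv", "GCNConv", "SGConv", "APPNP",
               "AGNNConv", "ARMAConv", "FeaStConv", "GENConv", "GMMConv",
               "GatedGraphConv"]

@dataclass
class GCNCode:
    """Code representation of one GCN architecture (sketch of Section 3.1)."""
    input_structure: str            # IS: activation of the input FullyConnection layer
    gcn_types: List[str]            # operation type of each GCN layer (Stage 2)
    preceding: List[Optional[int]]  # index of each layer's preceding layer; None marks
                                    # this and the following layers as invalid
    output_structure: str           # OS: activation of the output FullyConnection layer

def sample_code(num_layers: int = 6) -> GCNCode:
    """Randomly sample one architecture code from the search space."""
    preceding = [random.choice([None] + list(range(i))) for i in range(1, num_layers + 1)]
    return GCNCode(
        input_structure=random.choice(ACTIVATION_OPTIONS),
        gcn_types=[random.choice(GCN_OPTIONS) for _ in range(num_layers)],
        preceding=preceding,
        output_structure=random.choice(ACTIVATION_OPTIONS),
    )

if __name__ == "__main__":
    print(sample_code())
```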

3.2 GCN SuperNet in FL-AGCNS

In FL-AGCNS, the main role of the GCN SuperNet is to evaluate the performance of GCN architectures by sharing parameters across different architectures, and thus efficiently complete the fitness evaluation in the evolution process. Specifically, given a code $a$, FL-AGCNS extracts the corresponding weights $W_a$ from $W$, the collection of all parameters in the SuperNet, and thus efficiently transforms the data flow to evaluate its performance. This requires the SuperNet to cover all possible GCN architectures in $\mathcal{A}$. Based on this idea, we design the structure of the GCN SuperNet; Figure 2(b) gives an example.
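As an illustration of this weight-sharing idea, the sketch below builds a toy SuperNet in PyTorch in which every layer owns the weights of all candidate operations and a sampled code selects the active one. It is a simplification rather than the paper's implementation: plain linear layers stand in for the GCN operations of Table 1, and the layer-connection parameters are ignored.

```python
from typing import List
import torch
import torch.nn as nn

class SuperLayer(nn.Module):
    """One SuperNet layer: it owns the weights of every candidate operation,
    and the sampled code decides which single operation is active."""
    def __init__(self, num_options: int = 12, dim: int = 64):
        super().__init__()
        # Plain linear ops stand in for the 12 GCN operation options of Table 1.
        self.ops = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_options)])

    def forward(self, x: torch.Tensor, op_index: int) -> torch.Tensor:
        # Only the weights of the chosen operation are touched for this code.
        return torch.relu(self.ops[op_index](x))

class GCNSuperNet(nn.Module):
    """Toy SuperNet covering all codes: stacked SuperLayers whose outputs
    are fused by a mean, as described for Stage 2."""
    def __init__(self, num_layers: int = 6, num_options: int = 12, dim: int = 64):
        super().__init__()
        self.layers = nn.ModuleList([SuperLayer(num_options, dim) for _ in range(num_layers)])

    def forward(self, x: torch.Tensor, code: List[int]) -> torch.Tensor:
        outputs, h = [], x
        for layer, op_index in zip(self.layers, code):
            h = layer(h, op_index)
            outputs.append(h)
        return torch.stack(outputs).mean(dim=0)  # fuse the outputs of all layers

# Two different codes are evaluated with the same shared parameters.
net = GCNSuperNet()
x = torch.randn(8, 64)
print(net(x, code=[0, 3, 3, 1, 5, 2]).shape)
print(net(x, code=[4, 4, 0, 0, 1, 1]).shape)
```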

Federated SuperNet Evaluation. In FL-AGCNS, the subgraphs are scattered across different clients without sharing. To obtain the performance of a code $a$ on such a federated graph dataset, the local evaluation information of each client must be calculated separately, and therefore the GCN SuperNet is stored in each client. Let $W_a = M(a) \odot W$ denote the corresponding parameters of $a$, where $M(a)$ is the mask operation that keeps the parameters of the complete SuperNet only at positions corresponding to the operations applied in code $a$. Then, the performance of $a$ under the FL framework (which we call the federated performance) can be expressed as follows:

$$FP(a) = DE\Big(\sum_{i=1}^{N} EN\big(\mathcal{L}_{val}^{i}(a, W_a)\big)\Big) \qquad (1)$$

where $\mathcal{L}_{val}^{i}(a, W_a)$ denotes the validation loss of architecture $a$ on subgraph $G_i$ under weights $W_a$, and EN and DE are the homomorphic encryption and decryption methods (Cheon et al., 2017), respectively. In FL-AGCNS, each client calculates $EN(\mathcal{L}_{val}^{i}(a, W_a))$ separately and then sends this encrypted evaluation information to the controller to ensure the security of the information transfer. The controller aggregates the local information and obtains the federated performance after decryption. The detailed process is shown in the first round of information exchange in the right dotted box of Figure 1.

Figure 2: Code Representation and GCN SuperNet Structure.
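The round above can be mocked in a few lines. In this sketch, EN and DE are trivial placeholders for the CKKS encryption and decryption calls (no real cryptography is performed), and the client loss functions are hypothetical.

```python
from typing import Callable, List

def EN(x: float) -> float:
    return x  # placeholder: a real scheme (e.g., CKKS) would return a ciphertext

def DE(x: float) -> float:
    return x  # placeholder: a real scheme would decrypt the aggregated ciphertext

def federated_performance(code, client_val_losses: List[Callable]) -> float:
    """Equation-(1)-style aggregation: every client reports an encrypted local
    validation loss; the controller adds ciphertexts and decrypts only the sum."""
    encrypted = [EN(val_loss(code)) for val_loss in client_val_losses]
    return DE(sum(encrypted))

# Usage with three hypothetical clients whose local losses differ.
clients = [lambda a: 0.42, lambda a: 0.61, lambda a: 0.55]
print(federated_performance("some-architecture-code", clients))
```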

Federated SuperNet Optimization. In addition to the evaluation task, the SuperNet also needs to be optimized according to the population $P$ of GCN architectures provided by the optimizer, so as to achieve a higher evaluation capability. Given $a \in P$, the gradient of its architecture parameters can be calculated as follows:

$$\nabla_{W_a} = \sum_{i=1}^{N} \nabla_{W_a}\,\mathcal{L}_{train}^{i}(a, W_a)$$

The SuperNet parameters $W$ should fit all the individuals in the population $P$, and thus the gradients of all architectures should be accumulated to calculate the gradient of $W$:

$$\nabla_{W} = \sum_{a \in P} \sum_{i=1}^{N} \nabla_{W_a}\,\mathcal{L}_{train}^{i}(a, W_a)$$

Denote $\sum_{a \in P} \nabla_{W_a}\,\mathcal{L}_{train}^{i}(a, W_a)$, i.e., the gradient information of $P$ obtained by client $i$, as $g_i$, and add homomorphic encryption to ensure the security of the information transfer; then the above equation can be written as:

$$\nabla_{W} = DE\Big(\sum_{i=1}^{N} EN(g_i)\Big) \qquad (2)$$

In the federated SuperNet optimization stage of FL-AGCNS, each client calculates $EN(g_i)$ separately and then sends this encrypted gradient information to the controller. The controller aggregates the local information to obtain the encrypted $\nabla_{W}$, i.e., $\sum_{i=1}^{N} EN(g_i)$, and sends it to all clients. After decryption, each client can use $\nabla_{W}$ to optimize the weights of its local SuperNet with gradient descent, and thus improve the evaluation ability of the GCN SuperNet. The operation flow of this part is shown in the left dotted box of Figure 1.
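Analogously, the following is a sketch of one Equation-(2)-style update round under simplifying assumptions: NumPy vectors stand in for the SuperNet parameters, a fake quadratic loss produces the local gradients, and EN/DE are again placeholders for the homomorphic scheme.

```python
import numpy as np

def EN(v):
    return v  # placeholder for homomorphic encryption of a gradient vector

def DE(v):
    return v  # placeholder for decryption of the aggregated ciphertext

def client_gradient(client_data, population, supernet_params):
    """g_i: the local training-loss gradient, accumulated over every individual
    in the population (a fake quadratic loss stands in for backpropagation)."""
    g = np.zeros_like(supernet_params)
    for code in population:
        g += 2.0 * (supernet_params - client_data[code])
    return g

def federated_supernet_step(clients_data, population, supernet_params, lr=0.01):
    """Equation-(2)-style round: clients send EN(g_i), the controller adds the
    ciphertexts, and every client applies the decrypted aggregate gradient."""
    encrypted = [EN(client_gradient(d, population, supernet_params)) for d in clients_data]
    aggregated = sum(encrypted)                   # controller-side aggregation
    return supernet_params - lr * DE(aggregated)  # each client's local update

# Toy usage: two clients, two individuals ("a1", "a2"), a 4-dimensional "SuperNet".
params = np.zeros(4)
clients_data = [{"a1": np.ones(4), "a2": 2 * np.ones(4)},
                {"a1": 3 * np.ones(4), "a2": np.zeros(4)}]
params = federated_supernet_step(clients_data, ["a1", "a2"], params)
print(params)
```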

3.3 Federated Evolutionary Optimization

Architecture optimization is the other core component of HFL based GCN NAS. In FL-AGCNS, we introduce an Evolutionary Algorithm (EA) (Holland, 1992) to optimize the GCN architectures. We aim to recommend a GCN architecture that performs well on multiple subgraphs, so our optimization goal is to minimize the federated performance score defined in Section 3.2. A simple EA implementation scheme is to take the negative of the federated performance as the individual fitness and execute the evolution steps in the controller to optimize the population.

Two Difficulties. However, this simple EA scheme is not effective enough for GCN NAS tasks under the FL framework, and has the following two defects. First, it ignores the preference of each client during evolution and is thus less efficient. Specifically, we find that individuals in the population may not be applicable to some clients in the early evolution, and thus show poor federated performance in the FL framework. If the GCN architectures preferred by each client were involved in the population evolution, they could guide individuals to efficiently overcome their shortcomings and thus speed up the optimization process. However, the simple scheme fails to do so and is therefore inefficient.

Second, due to the lack of the clients' dominant architectures, the population in the simple scheme may lead to unfairness in the SuperNet evaluation, and thus fall into a local optimum. Specifically, in each evolution step, our algorithm first uses the population to optimize the SuperNet parameters, and then utilizes the optimized SuperNet to evaluate individual fitness and evolve a new population. In such a scenario, a skew in the population's performance would greatly affect the fairness of the SuperNet evaluation. That is to say, if the federated performance of the population is heavily skewed towards a subset of the clients, the SuperNet weights will show an evaluation bias after optimization, i.e., score highly on architectures suitable for that subset while ignoring the other subgraphs. Over time, the population would fall into local optima under the wrong guidance of the SuperNet. A solution to this performance skew problem is to add the dominant architectures of each client to the population used to optimize the SuperNet parameters, and thus enhance the fairness of the evaluation. However, the simple scheme fails to do so: the federated performance of its population is generally low at the early stage, which may lead to performance skew, resulting in a poor optimization effect.

Algorithm 1 Federated Evolutionary Optimization

FEO strategy. To better solve the HFL based GCN NAS problem, we propose a Federated Evolutionary Optimization strategy (FEO strategy) in this paper, which fully considers the preference of each client during the evolution in order to accelerate the optimization and improve its effect. In the FEO strategy, the controller and each client each maintain an EA optimizer: the controller searches for a population $P$ with good federated performance, and client $i$ searches for a population $P_i$ that performs well on its specific subgraph $G_i$. (The controller's optimizer takes the negative of the federated performance as the individual fitness, aiming to improve the federated performance of the population. The EA optimizer in each client instead uses the local validation loss to measure individual fitness, aiming to explore the characteristics of subgraph $G_i$ and obtain GCN architectures suitable for $G_i$.) In each round of evolution, the FEO strategy first evolves new $P$ and $P_i$ (i=1,…,N), and then optimizes the SuperNet parameters using a new population composed of the top individuals with the lowest scores in each $P_i$ and the top individuals with the lowest scores in $P$, mixed according to the proportion of the clients' evolutionary results in the new population. By considering the GCN architectures favored by each client, the new population can effectively avoid the performance skew problem, maintaining the fairness of the GCN SuperNet evaluation.

After the SuperNet optimization step, the FEO strategy replaces $P$ with the new population to perform the next round of evolution. Since the individuals in each $P_i$ are involved in the controller's population evolution, the shortcomings of the individuals can be efficiently corrected in the next round of evolution, achieving higher federated performance. By adding the local EA operations, the FEO strategy successfully overcomes the two difficulties of the simple scheme, performing well on GCN NAS problems under the FL framework. The right dotted box of Figure 1 shows the detailed procedure of FEO, and Algorithm 1 gives its pseudo-code.

Note that in the FEO strategy, the proportion of the clients' evolutionary results in the new population decreases as the number of evolution generations increases, for the following reasons. As the generations increase, the shortcomings of the individuals in $P$ are gradually resolved and their federated performance improves, the performance skew phenomenon progressively disappears, and the SuperNet parameters gradually stabilize. In this case, the advantage of introducing the preference of each client gradually decreases. Raising the proportion of $P$ in the new population lets the optimizer focus on more individuals with high federated performance, and turn to the advantage fusion strategy to evolve individuals with better federated performance. A minimal sketch of one FEO round is given below.
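The sketch below illustrates one FEO round under stated assumptions; evolve, mutate and the fitness callables are hypothetical stand-ins for the real EA operators and for the federated and local fitness functions described above.

```python
import random
from typing import Callable, List

def mutate(code):
    """Hypothetical mutation operator on an architecture code (identity here)."""
    return code

def evolve(population: List, fitness: Callable, size: int) -> List:
    """Placeholder EA step: keep the fittest half and refill with mutated copies,
    standing in for the real selection/crossover/mutation operators."""
    survivors = sorted(population, key=fitness)[: max(1, size // 2)]
    children = [mutate(random.choice(survivors)) for _ in range(size - len(survivors))]
    return survivors + children

def feo_round(controller_pop: List, client_pops: List[List],
              federated_fitness: Callable, local_fitnesses: List[Callable],
              client_share: float, pop_size: int):
    """One FEO round (sketch): evolve P in the controller and P_i in every client,
    then mix their best individuals into the population used to train the SuperNet."""
    controller_pop = evolve(controller_pop, federated_fitness, pop_size)
    client_pops = [evolve(p, f, pop_size) for p, f in zip(client_pops, local_fitnesses)]

    per_client = int(client_share * pop_size / max(1, len(client_pops)))
    mixed = []
    for p, f in zip(client_pops, local_fitnesses):
        mixed += sorted(p, key=f)[:per_client]                 # clients' favourites
    mixed += sorted(controller_pop, key=federated_fitness)[: pop_size - len(mixed)]
    return mixed, controller_pop, client_pops

# Toy usage: integer "codes", two clients, fitness = distance to a target value.
fed_fit = lambda c: abs(c - 10)
local_fits = [lambda c: abs(c - 3), lambda c: abs(c - 7)]
pop, P, Ps = feo_round(list(range(20)), [list(range(20)), list(range(20))],
                       fed_fit, local_fits, client_share=0.4, pop_size=10)
print(pop)
```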

1:  # Initialization
2:  Controller: Initialize the GCN SuperNet weights $W$ and the population $P$. Send them to the N clients.
3:  Client i (i=1,…,N): Receive $P$ and the SuperNet weights $W$ from the Controller. Initialize the local SuperNet and the local population $P_i$.
4:  # Searching for the Optimal GCN Architecture
5:  for each evolution generation do
6:     # Federated SuperNet Weights Optimization
7:     Controller: Send the current population $P$ to the N clients.
8:     Client i (i=1,…,N): Receive $P$ from the Controller.
9:     for each SuperNet training step do
10:         Client i (i=1,…,N): Send the encrypted local gradient $EN(g_i)$ to the Controller.
11:         Controller: Receive the gradient information from the N clients. Get the aggregated gradient $\nabla_W$ using Equation 2 and send it to the N clients.
12:         Client i (i=1,…,N): Receive $\nabla_W$ from the Controller. Update the weights of the local SuperNet using gradient descent.
13:     end for
14:     # Execute FEO Strategy
15:     Controller: Update the mixing proportion and send it to the N clients.
16:     Controller & Clients: Execute Algorithm 1.
17:  end for
18:  # Output the Optimal GCN Architecture
19:  Controller: Send the final population $P$ to the N clients.
20:  Client i (i=1,…,N): Receive $P$ from the Controller. Calculate $EN(\mathcal{L}_{val}^{i}(a, W_a))$ for each $a \in P$ and send it to the Controller.
21:  Controller: Receive the encrypted evaluation information from the N clients. Get the federated performance $FP(a)$ of each $a \in P$ according to Equation 1.
22:  Controller: $a^{*} \leftarrow \arg\min_{a \in P} FP(a)$.
23:  Output: $a^{*}$.
Algorithm 2: The FL-AGCNS Algorithm

3.4 FL-AGCNS: Federated Learning Framework for Automatic Graph Convolutional Network Search

Combining the GCN SuperNet with the FEO strategy, we obtain the FL-AGCNS algorithm. Algorithm 2 summarizes its detailed procedure. Given an HFL based GCN NAS task, FL-AGCNS first performs the initialization steps (Lines 1-3). Then it iteratively performs evolution steps to search for the optimal GCN architecture (Lines 4-17). In each evolution step, FL-AGCNS synchronously updates the SuperNet parameters in each client (Lines 6-13) and executes the FEO strategy to optimize the population (Lines 14-16). Finally, FL-AGCNS selects the individual with the best federated performance as the final output (Lines 18-23). A search time analysis of FL-AGCNS is given in the supplementary material.

Dataset #Nodes #Edges #Classes #Features Train/Dev/Test Nodes
Cora 2,708 5,429 7 1,433 140/500/1,000
CiteSeer 3,327 4,732 6 3,703 120/500/1,000
PubMed 19,717 44,338 3 500 60/500/1,000
CoraFull 19,793 126,842 70 8,710 1,395/500/1,000
Physics 34,493 495,924 5 8,415 100/500/1,000
Table 2: Datasets used in our experiment.
Cora CiteSeer PubMed
FA(%) Params(M) Time(s) FA(%) Params(M) Time(s) FA(%) Params(M) Time(s)
FL-AGCNS 82.9 0.105 0.0339 70.6 0.250 0.0553 78.4 0.036 0.0601
FL-Gradient 65.4 0.045 0.0415 37.0 0.300 0.6061 77.9 0.099 0.0720
FL-RL 75.6 0.101 0.0297 70.2 0.242 0.0631 68.6 0.032 0.0373
FL-Random 79.8 0.113 0.0335 47.4 0.296 0.0916 64.3 0.066 0.0492
GAT 79.0 0.092 0.0262 68.0 0.238 0.0612 77.5 0.032 0.0502
SAGE 78.8 0.184 0.0294 66.6 0.475 0.0920 75.2 0.064 0.0426
GCN 81.3 0.102 0.0200 68.6 0.260 0.0467 77.3 0.034 0.0395
SGC 78.1 0.010 0.0152 67.0 0.022 0.0381 76.3 0.002 0.0417
APPNP 82.2 0.092 0.0216 69.5 0.237 0.0572 77.5 0.032 0.0401
AGNN 81.2 0.023 0.0820 68.7 0.059 0.0969 78.1 0.008 0.0960
ARMA 75.2 0.092 0.0246 63.8 0.237 0.0462 76.8 0.032 0.0360
GatedGraph 76.6 0.130 0.0270 63.5 0.275 0.0494 77.3 0.070 0.0361
Table 3: Experimental results on three datasets: Cora, CiteSeer and PubMed.
CoraFull Physics
FA(%) Params(M) Time(s) FA(%) Params(M) Time(s)
FL-AGCNS 59.7 0.562 0.4859 77.1 0.547 0.4012
FL-Gradient 58.0 0.450 0.6381 67.0 0.539 0.4162
FL-RL 44.2 0.562 0.4813 61.8 0.539 0.3722
FL-Random 50.0 0.291 0.5815 72.8 0.576 0.3778
GAT 57.5 0.562 0.6381 64.9 0.539 0.5399
SAGE 57.3 1.124 0.6385 70.1 1.078 0.9268
GCN 51.5 1.172 0.7674 76.5 0.581 0.5130
SGC 56.1 0.610 0.5420 70.2 0.042 0.3460
APPNP 60.3 0.562 0.4773 73.2 0.539 0.6709
AGNN 47.2 0.141 0.5188 69.9 0.135 0.3399
ARMA 47.0 0.562 0.5506 58.8 0.539 0.5171
GatedGraph 31.7 0.599 0.4726 62.8 0.576 0.4052
Table 4: Results on the other two datasets: CoraFull and Physics.

4 Experiment

In this section, we test the performance of FL-AGCNS. First, we compare FL-AGCNS with existing NAS algorithms. Second, we compare the optimal GCN architectures discovered by FL-AGCNS with state-of-the-art GCN architectures. Finally, an ablation experiment is conducted to analyze the FEO strategy designed in our algorithm. All experiments are run on an RTX 2080 Ti GPU.

4.1 Experimental settings

Datasets. We evaluate the proposed algorithm on 5 popular network datasets: Cora, CiteSeer, PubMed (Yang et al., 2016), CoraFull (Bojchevski & Günnemann, 2017) and Physics (Shchur et al., 2018). The first four datasets are citation networks, and the last dataset is a co-author network. The dataset statistics are given in Table 2.

Evaluation Metrics. We use the federated accuracy (FA), defined as follows, to examine the performance of a GCN architecture $a$ under the FL framework:

$$FA(a) = \frac{1}{N}\sum_{i=1}^{N} Acc_{test}^{i}(a, W_a)$$

where $Acc_{test}^{i}(a, W_a)$ denotes the test accuracy of $a$ on subgraph $G_i$ under weights $W_a$. For each GCN NAS algorithm, we report the FA score of the optimal GCN architecture discovered by it, so as to examine its ability to deal with HFL based GCN NAS problems. We also report each GCN architecture's parameter count (denoted by Params) and inference time on the test set (denoted by Time) to show its complexity.

Baselines.

We compare FL-AGCNS with two popular NAS algorithms, the reinforcement learning based BlockQNN (Zhong et al., 2018) and the gradient based DARTS (Liu et al., 2019), as well as Random Search, a commonly used baseline in NAS. To enable these non-federated NAS algorithms to cope with our HFL based GCN NAS problems, we set their search space to $\mathcal{A}$ and replace their evaluation or gradient information with the federated versions. We denote these federated versions as FL-RL, FL-Gradient and FL-Random, respectively. In FL-RL and FL-Random, we evaluate each GCN architecture for 50 epochs. In addition, we take 8 state-of-the-art GNN architectures, GAT (Velickovic et al., 2018), SAGE (Hamilton et al., 2017), GCN (Kipf & Welling, 2017), SGC (Wu et al., 2019a), APPNP (Klicpera et al., 2019), AGNN (Thekumparampil et al., 2018), ARMA (Bianchi et al., 2019) and GatedGraph (Li et al., 2016), as baselines to show the importance of GCN NAS under the FL framework.

Implementation Details. In the experiments, we divide each network into subgraphs using the Metis clustering method (Karypis & Kumar, 1998), and use these subgraphs to simulate the clients in the FL framework (for the Physics dataset, we simulate fewer clients due to GPU memory constraints). We apply the CKKS scheme (Cheon et al., 2017) (https://github.com/muhanzhang/SEAL/tree/master/Python) to implement encryption and decryption. In FL-AGCNS, we set the population size to 60, the number of GCN layers to 6, and the number of evolution generations to 250. As for the compared NAS algorithms and the existing GCN architectures, we follow the implementation details reported in their papers, and control the running time of each NAS algorithm to be the same.
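As a rough illustration of this client simulation, the sketch below uses PyTorch Geometric's ClusterData (which wraps METIS) to carve a citation graph into densely connected parts, each playing the role of one client's subgraph. The client count and import paths are placeholders and may need adjusting to the installed PyG version.

```python
import torch
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import ClusterData  # older versions: torch_geometric.data

NUM_CLIENTS = 4  # placeholder client count

dataset = Planetoid(root="/tmp/Cora", name="Cora")
data = dataset[0]

# METIS produces densely connected parts with few external edges, matching the
# FL scenario targeted here; each part plays the role of one client's graph.
clients = ClusterData(data, num_parts=NUM_CLIENTS)

for i in range(NUM_CLIENTS):
    sub = clients[i]
    print(f"client {i}: {sub.num_nodes} nodes, {sub.num_edges} edges")
```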

Figure 3: Optimal GCN architecture searched by FL-AGCNS.
Figure 4: Visualization of the experiment results, showing the performance and complexity of each of the 12 GCN architectures determined by the algorithms or models in Table 3.

4.2 Performance Evaluation

The performance of the 4 NAS algorithms and 8 GCN architectures under the FL framework is shown in Table 3 and Table 4, and Figure 3 visualizes the GCN architectures searched by FL-AGCNS. We can observe that FL-AGCNS exceeds the other NAS algorithms on all five datasets, achieving higher performance in the FL framework. This result shows the importance of local optimizers under FL scenarios. With the help of local optimizers, FL-AGCNS can utilize the preference of each client to quickly enhance the population's federated performance, and thus recommends more powerful GCN architectures within the same search time than the other NAS methods without local optimizers.

In addition, we observe that the GCN architectures discovered by FL-AGCNS generally outperform the existing GCN architectures in the FL framework. This result shows the importance of GCN NAS: GCN NAS algorithms make FL more flexible and powerful, providing clients with more and better GCN models in the FL framework and achieving a stronger federated effect than the existing models.

Figure 4 visualizes the performance and complexity of the GCN architectures in Table 3. As we can see, the GCN architectures discovered by FL-AGCNS achieve the highest performance scores, while their complexity exceeds that of many of the existing classic GCN architectures and NAS solutions. Overall, however, the solution provided by FL-AGCNS is superior when considering both performance and complexity.

4.3 Ablation Experiments

We further investigate the effects of the FEO strategy and of the connections among clients on the performance of FL-AGCNS using the following three variants, thereby verifying the innovations presented in this paper and clarifying the application conditions of FL-AGCNS.

  • ControllerEO: This variant of FL-AGCNS only preserves the EA optimizer in the controller; the clients' evolutionary results are excluded from the population used to optimize the SuperNet.

  • ClientEO: This variant of FL-AGCNS only preserves the local EA optimizer in each client; the controller's evolutionary results are excluded from the population used to optimize the SuperNet.

  • RandomPartition: This algorithm uses randomly-divided subgraphs to simulate clients in the FL framework, and executes FL-AGCNS on those clients.

The corresponding results are shown in Table 5. We can see that FL-AGCNS performs much better than ControllerEO and ClientEO. This result shows the significance and necessity of maintaining EA optimizers in both the controller and the clients under FL scenarios. As discussed in Section 3.3, the EA optimizers in the clients contribute much to avoiding local optima and improving federated performance in the early generations, whereas the controller's EA optimizer performs better on population optimization in the later generations. The two kinds of optimizers optimize the population from different aspects in FL-AGCNS and show good performance in different stages; therefore, their combination is meaningful and necessary.

Besides, we observe that RandomPartition performs much worse than FL-AGCNS. The random partition method generates many more external connections among subgraphs than the Metis method, which means that much useful edge information is ignored in the FL framework. This result shows that FL-AGCNS is best suited to FL scenarios where the subgraph of each client is densely connected and its external connections are rare and of little importance in practice, for example, clients that contain graph information of different districts or different user groups.

Algorithm FA(%) Params(M) Time(s)
FL-AGCNS 82.9 0.105 0.0339
ControllerEO 71.6 0.117 0.0434
ClientEO 78.6 0.105 0.0489
RandomPartition 52.6 0.143 0.0526
Table 5: Results of ablation experiments on Cora dataset.

5 Conclusion and Future Work

In this paper, we address the shortcomings of the existing GCN NAS techniques and propose FL-AGCNS, a more efficient and practical GCN NAS algorithm that considers FL scenarios. Our algorithm designs the FEO strategy to fully consider the preferences of different clients, enabling distributed agents to collaboratively design high-performance GCN models while protecting data privacy. To the best of our knowledge, this is the first work that combines GCN NAS with FL. In addition, we apply the GCN SuperNet to reduce the evaluation cost of GCN models and thus improve the search efficiency of FL-AGCNS. Extensive experiments show that FL-AGCNS can recommend better GCN models in a short time under the FL framework, surpassing state-of-the-art NAS methods and GCN models, which demonstrates its effectiveness. In this paper, we only consider one optimization objective and the horizontal FL scenario. In future work, we will consider more optimization objectives and more FL scenarios, so as to recommend more practical GCN models under more realistic settings.

References

  • Bianchi et al. (2019) Bianchi, F. M., Grattarola, D., Livi, L., and Alippi, C. Graph neural networks with convolutional ARMA filters. CoRR, abs/1901.01343, 2019. URL http://arxiv.org/abs/1901.01343.
  • Bojchevski & Günnemann (2017) Bojchevski, A. and Günnemann, S. Deep gaussian embedding of attributed graphs: Unsupervised inductive learning via ranking. CoRR, abs/1707.03815, 2017.
  • Cheon et al. (2017) Cheon, J. H., Kim, A., Kim, M., and Song, Y. S. Homomorphic encryption for arithmetic of approximate numbers. In Takagi, T. and Peyrin, T. (eds.), ASIACRYPT, volume 10624 of Lecture Notes in Computer Science, pp. 409–437. Springer, 2017.
  • Dong & Yang (2019) Dong, X. and Yang, Y. Searching for a robust neural architecture in four GPU hours. In CVPR, pp. 1761–1770. Computer Vision Foundation / IEEE, 2019.

  • Gao et al. (2020) Gao, Y., Yang, H., Zhang, P., Zhou, C., and Hu, Y. Graph neural architecture search. In Bessiere, C. (ed.), IJCAI, pp. 1403–1409. ijcai.org, 2020.
  • Hamilton et al. (2017) Hamilton, W. L., Ying, Z., and Leskovec, J. Inductive representation learning on large graphs. In Guyon, I., von Luxburg, U., Bengio, S., Wallach, H. M., Fergus, R., Vishwanathan, S. V. N., and Garnett, R. (eds.), Annual Conference on Neural Information Processing Systems, pp. 1024–1034, 2017.
  • Holland (1992) Holland, J. H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. MIT Press, 1992. ISBN 9780262275552. doi: 10.7551/mitpress/1090.001.0001. URL https://doi.org/10.7551/mitpress/1090.001.0001.
  • Hu et al. (2019) Hu, F., Zhu, Y., Wu, S., Wang, L., and Tan, T. Hierarchical graph convolutional networks for semi-supervised node classification. In Kraus, S. (ed.), IJCAI, pp. 4532–4539. ijcai.org, 2019.
  • Karypis & Kumar (1998) Karypis, G. and Kumar, V. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput., 20(1):359–392, 1998.
  • Kipf & Welling (2017) Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. In ICLR. OpenReview.net, 2017.
  • Klicpera et al. (2019) Klicpera, J., Bojchevski, A., and Günnemann, S. Predict then propagate: Graph neural networks meet personalized pagerank. In ICLR. OpenReview.net, 2019.
  • Li et al. (2016) Li, Y., Tarlow, D., Brockschmidt, M., and Zemel, R. S. Gated graph sequence neural networks. In Bengio, Y. and LeCun, Y. (eds.), ICLR, 2016.
  • Liu et al. (2019) Liu, H., Simonyan, K., and Yang, Y. DARTS: differentiable architecture search. In ICLR. OpenReview.net, 2019.
  • Liu et al. (2020) Liu, Y., Huang, A., Luo, Y., Huang, H., Liu, Y., Chen, Y., Feng, L., Chen, T., Yu, H., and Yang, Q. Fedvision: An online visual object detection platform powered by federated learning. In AAAI, pp. 13172–13179. AAAI Press, 2020.
  • McMahan et al. (2017) McMahan, B., Moore, E., Ramage, D., Hampson, S., and y Arcas, B. A. Communication-efficient learning of deep networks from decentralized data. In Singh, A. and Zhu, X. J. (eds.), AISTATS, volume 54 of Proceedings of Machine Learning Research, pp. 1273–1282. PMLR, 2017.
  • McMahan et al. (2016) McMahan, H. B., Moore, E., Ramage, D., and y Arcas, B. A. Federated learning of deep networks using model averaging. CoRR, abs/1602.05629, 2016.
  • Shchur et al. (2018) Shchur, O., Mumme, M., Bojchevski, A., and Günnemann, S. Pitfalls of graph neural network evaluation. CoRR, abs/1811.05868, 2018. URL http://arxiv.org/abs/1811.05868.
  • Shuman et al. (2013) Shuman, D. I., Narang, S. K., Frossard, P., Ortega, A., and Vandergheynst, P. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag., 30(3):83–98, 2013.
  • Thekumparampil et al. (2018) Thekumparampil, K., Wang, C., Oh, S., and Li, L.-J. Attention-based graph neural network for semi-supervised learning. 2018.
  • Velickovic et al. (2018) Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. Graph attention networks. In ICLR. OpenReview.net, 2018.
  • Verma et al. (2018) Verma, N., Boyer, E., and Verbeek, J. Feastnet: Feature-steered graph convolutions for 3d shape analysis. In CVPR, pp. 2598–2606. IEEE Computer Society, 2018.
  • Wu et al. (2019a) Wu, F., Jr., A. H. S., Zhang, T., Fifty, C., Yu, T., and Weinberger, K. Q. Simplifying graph convolutional networks. In Chaudhuri, K. and Salakhutdinov, R. (eds.), ICML, volume 97 of Proceedings of Machine Learning Research, pp. 6861–6871. PMLR, 2019a.
  • Wu et al. (2019b) Wu, S., Tang, Y., Zhu, Y., Wang, L., Xie, X., and Tan, T. Session-based recommendation with graph neural networks. In AAAI, pp. 346–353. AAAI Press, 2019b.
  • Wu et al. (2020) Wu, Y., Cai, S., Xiao, X., Chen, G., and Ooi, B. C. Privacy preserving vertical federated learning for tree-based models. Proc. VLDB Endow., 13(11):2090–2103, 2020.
  • Wu et al. (2019c) Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., and Yu, P. S. A comprehensive survey on graph neural networks. CoRR, abs/1901.00596, 2019c.
  • Xu et al. (2019) Xu, K., Hu, W., Leskovec, J., and Jegelka, S. How powerful are graph neural networks? In ICLR. OpenReview.net, 2019.
  • Yang et al. (2019) Yang, Q., Liu, Y., Chen, T., and Tong, Y. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol., 10(2):12:1–12:19, 2019.
  • Yang et al. (2016) Yang, Z., Cohen, W. W., and Salakhutdinov, R. Revisiting semi-supervised learning with graph embeddings. CoRR, abs/1603.08861, 2016. URL http://arxiv.org/abs/1603.08861.
  • Yang et al. (2020) Yang, Z., Wang, Y., Chen, X., Shi, B., Xu, C., Xu, C., Tian, Q., and Xu, C. CARS: continuous evolution for efficient neural architecture search. In CVPR, pp. 1826–1835. IEEE, 2020.
  • Yu et al. (2018) Yu, B., Yin, H., and Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. In Lang, J. (ed.), IJCAI, pp. 3634–3640. ijcai.org, 2018.
  • Zhao et al. (2020) Zhao, H., Wei, L., and Yao, Q. Simplifying architecture search for graph neural network. In Conrad, S. and Tiddi, I. (eds.), CIKM, volume 2699 of CEUR Workshop Proceedings. CEUR-WS.org, 2020.
  • Zhong et al. (2018) Zhong, Z., Yan, J., Wu, W., Shao, J., and Liu, C. Practical block-wise neural network architecture generation. In CVPR, pp. 2423–2432. IEEE Computer Society, 2018.