
p2pGNN: A Decentralized Graph Neural Network for Node Classification in Peer-to-Peer Networks

In this work, we aim to classify nodes of unstructured peer-to-peer networks with communication uncertainty, such as users of decentralized social networks. Graph Neural Networks (GNNs) are known to improve the accuracy of simpler classifiers in centralized settings by leveraging naturally occurring network links, but graph convolutional layers are challenging to implement in decentralized settings when node neighbors are not constantly available. We address this problem by employing decoupled GNNs, where base classifier predictions and errors are diffused through graphs after training. For these, we deploy pre-trained and gossip-trained base classifiers and implement peer-to-peer graph diffusion under communication uncertainty. In particular, we develop an asynchronous decentralized formulation of diffusion that converges to the same predictions linearly with respect to communication rate. We experiment on three real-world graphs with node features and labels and simulate peer-to-peer networks with uniformly random communication frequencies; given a portion of known labels, our decentralized graph diffusion achieves comparable accuracy to centralized GNNs.


1 Introduction

The pervasive integration of mobile devices and the Internet-of-Things in everyday life has created an expanding interest in processing their collected data Wang et al. (2018); Li et al. (2018); Lim et al. (2020). However, traditional data mining techniques require communication, storage and processing resources proportional to the number of devices and have raised data control and privacy concerns. An emerging alternative is to mine data at the devices gathering them with protocols that do not require costly or untrustworthy central infrastructure. One such protocol is gossip averaging, which averages local model parameters across pairs of devices during training (Subsection 2.2).

In this paper we tackle the problem of classifying points of a shared feature space when each one is stored at the device generating it, i.e. each device accesses only its own point but all devices collect the same features. For example, mobile devices of decentralized social media users could classify user interests based on locally stored content features, such as the bag-of-words of sent messages, and given that some device users have disclosed their interests. We further consider devices that are nodes of peer-to-peer networks and communicate with each other based on underlying relations, such as friendship or proximity. In this setting, social network overlays coincide with communication networks. However, social behavior dynamics (e.g. users going online or offline) could prevent devices from communicating on-demand or at regular intervals. Ultimately, learning algorithms cannot control with whom and when communication takes place.

When network nodes corresponding to data points are linked based on social relations, a lot of information is encapsulated in their link structure in addition to data features. For instance, social network nodes one hop away often assume similar classification labels, a concept known as homophily McPherson et al. (2001); Berry et al. (2020). Yet, existing decentralized learning algorithms do not naturally account for information encapsulated in links, as they are designed for structured networks of artificially generated topologies. In particular, decentralized learning often focuses on creating custom communication topologies that optimize some aspect of learning Koloskova et al. (2019) and are thus independent from data. Our investigation differs from this assumption in that we classify data points stored in decentralized devices that form unstructured communication networks, where links capture real-world activity (e.g. social interactions) unknown at algorithm design time, as demonstrated in Figure 1.

Figure 1: A directed ring-structured communication network (left) and an undirected unstructured example (right) of devices.

If a centralized service performed classification, Graph Neural Networks (GNNs) could be used to improve the accuracy of base classifiers, such as ones trained with gossip averaging, by accounting for link structure (Subsection 2.1). But, if we tried to implement GNNs with the same decentralized protocols, connectivity constraints would prevent nodes from timely collecting the neighbor latent representations needed to compute graph convolutions. To tackle this problem, we propose working with decoupled GNNs, where network convolutions are separated from base classifier training and organized into graph diffusion components. Given this architecture, we either use pre-trained base classifiers or train them with gossip protocols, and we realize graph diffusion in peer-to-peer networks by developing an algorithm whose fragments run on each node and converge to the same predictions as centralized graph diffusion while working under uncontrolled, irregular communication initiated by device users. Critically, this algorithm allows online modification of the base predictions it diffuses, and hence base classifiers can be trained while their outcomes are being diffused. Thus, all components of implemented decoupled GNNs—both base classifier training and graph diffusion processes—run at the same time and eventually converge to the desired results.

Our contribution lies in introducing a decentralized setting for classifying peer-to-peer network devices and in porting to it centralized GNN components that induce accuracy improvements. Given existing methods of training or deploying base classifiers in peer-to-peer networks, our decentralized algorithm, called p2pGNN, classifies nodes of simulated networks under uncertain availability with higher accuracy than base classifiers. To our knowledge, our approach is the first that considers the communication links themselves to be useful for the learning task, i.e. it targets networks where the communication topology is retrieved from the real world instead of being imposed on it. To support link mining operations, we introduce a novel organization of prediction primitives that facilitates decentralized graph diffusion and theoretically show fast convergence to similar prediction quality as centralized architectures. Furthermore, we experimentally verify that our approach successfully takes advantage of graph diffusion components and closely matches the classification accuracy of fully centralized computations.

2 Background

2.1 Graph Neural Networks

Graph Neural Networks (GNNs) are a machine learning paradigm in which links between data samples (in our setting, there is a 1-1 correspondence between samples and graph nodes) are used to improve the predictions of base neural network models Wu et al. (2020). In detail, samples are linked to form graphs based on real-world relations, and information diffusion schemes smooth —e.g. average— latent attributes across graph neighbors before transforming them with dense layers and non-linear activations to new representations to be smoothed again. This is repeated either ad infinitum or for a fixed number of steps to combine original representations with structural information. Although propagation is similar to decentralized learning in that nodes work independently, transformation parameters are shared and learned across all nodes.
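For illustration, a minimal NumPy sketch of one such smoothing step follows; the symmetric degree normalization and the function name are our own illustrative choices rather than a specific architecture from the literature.

import numpy as np

def smooth(features: np.ndarray, adjacency: np.ndarray) -> np.ndarray:
    """One smoothing step: mix each node's representation with its neighbors'
    through the symmetrically normalized adjacency matrix."""
    degrees = adjacency.sum(axis=1)
    d_inv_sqrt = np.zeros_like(degrees, dtype=float)
    nonzero = degrees > 0
    d_inv_sqrt[nonzero] = degrees[nonzero] ** -0.5
    normalized = d_inv_sqrt[:, None] * adjacency * d_inv_sqrt[None, :]
    return normalized @ features  # one hop of neighborhood averaging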

GNN architectures tend to suffer from over-smoothing if too many (e.g. more than two) smoothing layers are employed. However, using few layers limits architectures to propagating information only a few hops away from its original nodes. Mitigating this issue often involves recurrent links to the first latent representations, which lets GNNs achieve at least the same theoretical expressiveness as graph filters Klicpera et al. (2018); Chen et al. (2020). In fact, it has been argued that the success of GNNs can in large part be attributed to this use of recurrence rather than to end-to-end training of seamless architectures Huang et al. (2020). As a result, recent works have introduced decoupled architectures that achieve the same theoretical expressive power as end-to-end training by training base statistical models (such as two-layer perceptrons) to make predictions and smoothing the latter through graph edges.

In this work, we build on the FDiff-scale prediction smoothing proposed by Huang et al. (2020), which diffuses the predictions and respective errors of base classifiers to all graph nodes using a constrained personalized PageRank that retains training node labels. Then, a linear trade-off between errors and predictions is calculated for each node and the outcome is again diffused with personalized PageRank to make final predictions. This process generalizes to multiclass predictions by replacing the node values propagated by personalized PageRank with vectors holding prediction scores, where initial predictions are trained by the base classifier to minimize a cross-entropy loss. This architecture is discussed in more detail in Subsection 3.2.
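To make the pipeline concrete, the following Python sketch outlines the two-stage process under stated assumptions: the sign of the error correction, the placement of label retention, and all names (decoupled_prediction, diffuse, scale) are ours, with diffuse standing for any personalized-PageRank-like graph diffusion.

import numpy as np

def decoupled_prediction(base_pred, labels, train_mask, diffuse, scale=1.0):
    """Two-stage decoupled pipeline: diffuse training errors through the graph,
    trade them off against base predictions, then diffuse the corrected scores."""
    errors = np.zeros_like(base_pred)
    errors[train_mask] = base_pred[train_mask] - labels[train_mask]   # training-node errors
    corrected = base_pred - scale * diffuse(errors)                   # linear trade-off (assumed sign)
    corrected[train_mask] = labels[train_mask]                        # retain training labels
    return diffuse(corrected)                                         # final diffusion pass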

2.2 Decentralized Learning

Decentralized learning refers to protocols that help pools of devices learn statistical models by accounting for each other’s data. Conceptually, each device holds its own autonomous version of the model, and training aims to collectively make those converge to being similar to each other and to a centralized training equivalent, i.e. to be able to replicate would-be centralized predictions locally.

Many decentralized learning practices have evolved from distributed learning, which aims to speed up the time needed to train statistical models by splitting calculations among many available devices, called workers. Typically, workers perform computationally heavy operations, such as gradient estimation for subsets of training data, and send the results to a central infrastructure that orchestrates the learning process. A well-known variation of distributed learning occurs when data batches are split across workers a-priori, for example because they are gathered by these workers, and are sensitive in the sense that they cannot be directly presented to the orchestrating service. This paradigm is called federated learning and is often realized with the popular federated averaging (FedAvg) algorithm McMahan et al. (2017). FedAvg performs several learning epochs in each worker before sending parameter gradients to a server that uses the average across workers to update a model and send it back to all of them.
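As a rough illustration (parameter averaging is shown in place of gradient averaging, an equivalent common formulation; all names are hypothetical), one FedAvg round could look as follows.

from typing import Callable, Dict, List
import numpy as np

Params = Dict[str, np.ndarray]

def fedavg_round(global_params: Params,
                 train_locally: Callable[[Params], Params],
                 num_workers: int) -> Params:
    """One round: each worker trains for a few local epochs starting from the
    global model, and the server averages the returned parameters."""
    updates: List[Params] = [
        train_locally({name: value.copy() for name, value in global_params.items()})
        for _ in range(num_workers)
    ]
    return {name: np.mean([update[name] for update in updates], axis=0)
            for name in global_params}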

By definition, distributed and federated learning train one central model that is fed back to workers to make predictions with. However, gathering gradients and sending back the model requires a central service with significantly higher throughput than individual workers to simultaneously communicate with all of them and orchestrate learning. To reduce the related infrastructure costs and remove the need for a central authority, decentralized protocols have been introduced to let workers directly communicate with each other. These require either constant communication between workers or a rigid (e.g. ring-like) topology and many communication rounds to efficiently learn Lian et al. (2018); Luo et al. (2020); Zhou et al. (2020). Most decentralized learning practices have evolved to or are variations of gossip learning, where devices exchange and average (parts of) their learned parameters with random others Hu et al. (2019); Savazzi et al. (2020); Hegedűs et al. (2021); Danner and Jelasity (2018); Koloskova et al. (2019).
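A minimal sketch of the core gossip averaging step, with illustrative names, is given below.

import random
from typing import Dict, List
import numpy as np

Params = Dict[str, np.ndarray]

def gossip_exchange(a: Params, b: Params) -> None:
    """Set both devices' parameters to their element-wise average (in place)."""
    for name in a:
        average = (a[name] + b[name]) / 2
        a[name] = average
        b[name] = average.copy()

def gossip_round(devices: List[Params]) -> None:
    """A random pair of devices averages their models, as in gossip learning."""
    u, v = random.sample(range(len(devices)), 2)
    gossip_exchange(devices[u], devices[v])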

3 A Peer-to-Peer Graph Neural Network

3.1 Communication Protocol

In Section 1 we argued that, if communication links between peer-to-peer nodes correspond to real-world relations, GNNs can improve node classification accuracy. At the same time, peer-to-peer networks often suffer from node churn, power usage constraints, and virtual or physical device mobility that cumulatively make communication channels between nodes irregularly available. In this work, we assume that linked nodes keep communicating over time without removing links or introducing new ones, though links can become temporarily inactive. We expect this assumption to hold true in social networks that evolve slowly, i.e. in which user interactions are many times more frequent than link changes; from the perspective of link mining, such networks can be viewed as static relational graphs.

To provide a framework in which peer-to-peer nodes learn to classify themselves, we first specify a communication protocol for information exchanges. To do this, we consider static adjacency matrices whose non-zero elements indicate communication channels of uncertain availability between the corresponding nodes. These matrices are not fully observable by decentralized nodes (the latter are at most aware of their corresponding rows and columns), but would be the input to centralized GNNs. Uncertainty is encoded with time-evolving communication matrices whose non-zero elements at each time step indicate exchanges through the corresponding links.

To simplify the rest of our analysis, and without loss of generality, we adopt a discrete notion of time that orders the sequence of communication events. We stress that real-world time intervals between consecutive timestamps could vary.

To exchange information through channels represented by time-evolving communication matrices, we use the broadly popular send-receive-acknowledge communication protocol: devices (in our case, these are nodes of the peer-to-peer network) are equipped with identifiers and with send, receive and acknowledge operations that respectively implement message generation, receiving-message callbacks that generate a reply message to send back, and acknowledging that sent messages have been received. Expected usage of these operations is demonstrated in Algorithm 1 in the form of a simulation.

Inputs: devices u with identifiers u.id, time-evolving communication matrices
for each time step do
     for all device pairs (u, v) that communicate at this time step do
          message ← u.send(v.id)
          reply ← v.receive(u.id, message)
          u.acknowledge(v.id, reply)
Algorithm 1 Send-Receive-Acknowledge protocol
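A Python rendition of this simulation could look as follows; it assumes a device object exposing id, send, receive and acknowledge as in the protocol, and per-link activation probabilities, both of which are illustrative assumptions.

import random

def simulate(devices, edges, link_probability, steps=1000):
    """Simulate Algorithm 1: at every time step each link fires with its own
    probability; when it does, u sends to v, v replies, and u acknowledges."""
    for _ in range(steps):
        for u, v in edges:
            if random.random() < link_probability[(u, v)]:  # link active this step
                message = devices[u].send(devices[v].id)
                reply = devices[v].receive(devices[u].id, message)
                devices[u].acknowledge(devices[v].id, reply)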

3.2 GNN Architecture

Let base classifiers with parameters trained on a set of labeled nodes produce estimates of one-hot label encodings from node feature vectors; the predicted label of each node is the class with the largest estimated score for its feature vector. Regardless of how classifiers are trained, peer-to-peer network links can be leveraged to improve predictions. For neural networks, this is achieved by transforming them into GNNs that incorporate graph convolutions in multilayer parameter-based transformations, both during training and during prediction.

Unfortunately, graph convolutions smooth latent representations across neighbor nodes but uncertain availability means that there are only two realistic options to implement smoothing in peer-to-peer networks: either a) the last retrieved representations are used, or b) node features and links from many hops away are stored locally for in-device computation of graph convolutions. In the first case, convergence to equivalent centralized model parameters could be slow, since learning would impact neighbor representations only during communication. In the second case, multilayer architectures aiming to broaden node receptive fields from many hops away would end up storing most network links and node features in each node.

To avoid these shortcomings, we build on the decoupled GNNs outlined in Subsection 2.1, which separate the challenge of training base classifiers from that of leveraging network links to improve predictions. To mathematically manipulate decoupled GNN primitives, we organize base predictions into matrices whose rows hold the predictions for the feature rows of the corresponding nodes; predicted classes are obtained as the position of the maximum value in each row. If classifiers are trained on the features and labels of a set of training nodes, we build on the FDiff-scale decoupled GNN description of Huang et al. (2020), which we transcribe as:

(1)

where the unit matrix appears alongside masked adjacency matrices (these are not symmetric) that prevent diffusion from affecting training nodes, a diagonal matrix of node degrees used for normalization, and scalar hyperparameters. The above formula comprises two graph diffusion operations of the following form:

(2)

The operation performed first (the one inside the parenthesis) sets its parameters so that only the personalization of training nodes is diffused through the graph. The second operation becomes equivalent to constraining the personalized PageRank scheme Page et al. (1999); Tong et al. (2006) with the normalized communication matrix so that it preserves the original node predictions assigned to training nodes. Effectively, it is equivalent to restoring training node scores after each power method iteration, where each iteration step is a specific type of graph convolution. The representations diffused by the two operations are, respectively, training node errors and a trade-off between diffused errors and node predictions.
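For reference, a centralized sketch of this constrained personalized PageRank is given below; the restart probability, iteration count and normalization choice are placeholder assumptions rather than values prescribed by the architecture.

import numpy as np

def constrained_personalized_pagerank(personalization, norm_adj, keep_mask,
                                      restart=0.1, iterations=50):
    """Power-method personalized PageRank that restores the scores of `keep_mask`
    nodes (e.g. training nodes) after every iteration."""
    scores = personalization.copy()
    for _ in range(iterations):
        scores = (1 - restart) * (norm_adj @ scores) + restart * personalization
        scores[keep_mask] = personalization[keep_mask]  # preserve training-node scores
    return scores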

We stress that, although the above-described decoupled GNN architecture already exists in the literature, supporting its diffusion operation in peer-to-peer networks requires the analysis we present in the rest of this section.

3.3 Peer-to-Peer Personalized PageRank

If matrix row additions are atomic node operations, implementing the graph diffusion of Equation 1 in peer-to-peer networks with uncertain availability reduces to implementing two versions of Equation 2's constrained personalized PageRank with different parameters. Thus, we focus on implementing personalized PageRank variations.

Previous works have computed personalized or non-personalized PageRank (where the latter corresponds to personalization columns that are normalized vectors of ones) in peer-to-peer networks by letting peers hold fragments of the network spanning multiple nodes and merging these when peers communicate Parreira et al. (2006); Zhang et al. (2020); Bahmani et al. (2010); Hu and Lau (2012). Unlike these, in our setting peers coincide with nodes, and merging network fragments could require untenable bandwidth that grows proportionally to network size as merged sub-networks are continuously exchanged between peers. Instead, we devise a new computational scheme that is lightweight in terms of communication.

Iterative synchronized convolutions require node neighbor representations at intermediate steps. However, an early work by Lubachevsky and Mitra (1986) showed that, for non-personalized PageRank, decentralized schemes holding local estimations of earlier-computed node scores (or, in the case of graph diffusion, vectors) converge to the same point as centralized ones as long as communication intervals are bounded. This motivates us to similarly iterate personalized PageRank by using the last communicated neighbor representations to update local nodes. In this subsection we mathematically describe this scheme and show that it converges to the same point as its centralized equivalent with a linear rate (which corresponds to an exponentially degrading error), even when personalization evolves over time, in which case it still converges at a near-linear rate. Notably, keeping older representations is not a viable solution for calculating graph convolutions when these are entangled with representation transformations, but employing decoupled GNNs lets us separate learning from diffusion.

To set up a decentralized implementation of personalized PageRank, we introduce a theoretical construct we dub decentralized graph signals that describes decentralized operations in peer-to-peer networks while accounting for personalization updates over time, in case personalization is trained while being diffused. Our structure is defined as a matrix whose elements are multidimensional vectors (in our case, with dimension equal to the number of classes) such that the element in the row of one device and the column of another holds the former's estimate of the latter's representation. Rows are stored on their respective devices, and only cross-column operations are impacted by communication constraints.

We now consider a scheme that updates decentralized graph signals at each time step per the rules:

(3)

where time-evolving node representations serve as personalization. The first of the above equations describes node representation exchanges between devices based on the communication matrix, whereas the second one performs a local update of the personalized PageRank estimate given the last received neighbor estimates, involving only data stored on the device. Then, Theorem 1 shows that the main diagonal of the decentralized graph signal deviates from the desired node representations by an error that converges to zero mean with linear rate. This weak convergence may not perfectly match centralized diffusion. However, it still guarantees that the outcomes of the two in large part correlate.
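The following sketch illustrates one possible local realization of these update rules, assuming uniform neighbor weighting in place of the exact normalization; class and attribute names are ours.

import numpy as np

class DiffusionDevice:
    """Holds one row of a decentralized graph signal: the device's own score and
    the last-communicated estimates of its neighbors' scores."""
    def __init__(self, personalization: np.ndarray, restart: float = 0.1):
        self.personalization = personalization      # may keep changing while diffusing
        self.score = personalization.copy()
        self.neighbor_scores = {}                   # last value received per neighbor id
        self.restart = restart

    def exchange(self, neighbor_id, neighbor_score: np.ndarray) -> np.ndarray:
        """Store the neighbor's latest estimate, update locally, return our own."""
        self.neighbor_scores[neighbor_id] = neighbor_score
        # Local personalized PageRank step over stale neighbor estimates only;
        # uniform averaging stands in for the exact degree normalization.
        neighborhood = np.mean(list(self.neighbor_scores.values()), axis=0)
        self.score = (1 - self.restart) * neighborhood + self.restart * self.personalization
        return self.score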

Theorem 1.

If the personalization is bounded and converges in distribution with linear rate, and the elements of the communication matrix are independent discrete random variables with fixed means, then the decentralized estimate converges in distribution to the solution of Equation 2 with linear rate.

Proof.

Without loss of generality, we show this result in the absence of training node constraints, as more training nodes only add constraints to the diffusion scheme that force it to converge faster.

Let us denote the vectors whose elements are the expected values of the decentralized estimates and of the personalization. Then, since the communication rate is fixed and an independent random variable for each edge, the expected values satisfy a linear fixed-point equation which, for the given communication matrix and personalization, yields a unique solution: the eigenvalues of the normalized communication matrix have absolute value at most one (given the properties of doubly stochastic and Markovian matrices), hence the corresponding eigenvalues of the fixed-point operator are non-zero, which makes it invertible. The unique solution therefore necessarily coincides with the solution of Equation 2.

For the same quantities, the convergence rate is equal to or faster than the rate that would be obtained if all communications took place with the smallest link probability. Thus, we consider a communication matrix whose non-zero elements appear with that probability and analyse the latter to find the slowest possible convergence rate. In this setting, we obtain a recursive formula for the expected error whose spectral radius bounds the convergence rate; examining its behavior over time yields the claimed linear convergence rate. ∎

Algorithm 2 realizes Equation 1 for peer-to-peer network nodes that communicate with social neighbors under the Send-Receive-Acknowledge protocol. We implement the protocol's operations, node initialization given prediction vectors and target labels, and the ability to update predictions. Nodes are initialized per initialize(prediction, target), where the second argument is a vector of zeroes for non-training nodes. We implement graph diffusion with two decentralized graph signals, predictions and errors, where the former uses the outcome of the latter. Predictions (that is, the main diagonal of the decentralized graph signal) are stored in each node's prediction attribute. There are two hyperparameters: one that determines the diffusion rate and one that trades off errors and predictions. Importantly, given linear or faster convergence rates for base classifier predictions, Theorem 1 yields linear convergence in distribution for errors and hence for the in-code variable combined of each node. Therefore, from the same theorem, the predictions signal also converges linearly in distribution.

procedure u.initialize(prediction, target)
     u.predictions ← Map()
     u.errors ← Map()
     u.target ← target
     u.update(prediction)
procedure u.update(prediction)
     u.base_prediction ← prediction
     u.prediction ← prediction
     if u.target is non-zero (training node) then
          u.error ← (prediction − u.target)
procedure u.receive(v.id, message)
     message ← u.send(v.id)
     u.acknowledge(v.id, message)
     return message
procedure u.send(v.id)
     return u.prediction, u.error
procedure u.acknowledge(v.id, message)
     prediction, error ← message
     u.predictions[u.id] ← u.prediction
     u.errors[u.id] ← u.error
     u.predictions[v.id] ← prediction
     u.errors[v.id] ← error
     if u.target is zero (non-training node) then
          u.error ← diffusion over u.errors.values()
          combined ← trade-off between u.base_prediction and u.error
     else
          combined ← u.base_prediction
     u.prediction ← diffusion of combined with u.predictions.values()
Algorithm 2 p2pGNN operations at device u
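To make the above operations concrete, we include a Python sketch of a p2pGNN device; it assumes predictions and targets are NumPy one-hot vectors (targets all-zero for non-training nodes), and the lines marked ASSUMPTION reflect our reading of the elided update formulas rather than the exact implementation.

import numpy as np

class P2PGNNDevice:
    """Illustrative rendering of Algorithm 2; lines marked ASSUMPTION reflect our
    reading of the elided formulas, not the authors' exact implementation."""
    def __init__(self, device_id, prediction, target, restart=0.1, scale=1.0):
        self.id = device_id
        self.predictions, self.errors = {}, {}          # decentralized graph signal rows
        self.target = target                            # zero vector for non-training nodes
        self.restart, self.scale = restart, scale
        self.update(prediction)

    def update(self, prediction):
        """Swap in a new base prediction, e.g. while gossip training keeps running."""
        self.base_prediction = prediction
        self.prediction = prediction
        self.error = prediction - self.target if self.target.any() else np.zeros_like(prediction)

    def send(self, _peer_id):
        return self.prediction, self.error

    def receive(self, peer_id, message):
        reply = self.send(peer_id)
        self.acknowledge(peer_id, message)
        return reply

    def acknowledge(self, peer_id, message):
        prediction, error = message
        self.predictions[self.id], self.errors[self.id] = self.prediction, self.error
        self.predictions[peer_id], self.errors[peer_id] = prediction, error
        if self.target.any():                            # training node keeps its base scores
            combined = self.base_prediction
        else:                                            # non-training node: error correction
            self.error = np.mean(list(self.errors.values()), axis=0)             # ASSUMPTION
            combined = self.base_prediction - self.scale * self.error            # ASSUMPTION
        neighborhood = np.mean(list(self.predictions.values()), axis=0)
        self.prediction = (1 - self.restart) * neighborhood + self.restart * combined  # ASSUMPTION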

4 Experiments

4.1 Datasets and Simulation

To compare the ability of peer-to-peer learning algorithms to make accurate predictions, we experiment on three datasets that are often used to assess the quality of GNNs Shchur et al. (2018): the Citeseer Namata et al. (2012), Cora Sen et al. (2008) and Pubmed Namata et al. (2012) graphs. Pre-processed versions of these are retrieved from the programming interface of the publicly available Deep Graph Library Wang et al. (2019) and comprise the node features, class labels and training-validation-test splits summarized in Table 1. In practice, the class labels of training and validation nodes would have been manually provided by the respective devices (e.g. submitted by their users) and would form the ground truth to train base models.

Dataset Nodes Links Node Features Class Labels Training Validation Test
Citeseer 3,327 9,228 3,703 6 120 500 1,000
Cora 2,708 10,556 1,433 7 140 500 1,000
Pubmed 19,717 88,651 500 3 60 500 1,000
Table 1: Dataset details

We use these datasets to simulate peer-to-peer networks with the same nodes and links as in the dataset graphs and fixed probabilities for communication through links at each time step, sampled uniformly at random per link. To speed up experiments, we further force nodes to participate in only one communication at each time step by randomly determining which edges to ignore when conflicts arise; this way, we are able to use threading to parallelize experiments by distributing time step computations between available CPUs (this is independent of our decentralized setting and its only purpose is to speed up simulations). We obtain the classification accuracy of test labels after 1000 time steps (all algorithms converge well within that number) and report its average across five experiment repetitions. Similar results are obtained for communication rates sampled from different range intervals. Experiments are available online (https://github.com/MKLab-ITI/decentralized-gnn, Apache License, Version 2.0) and were conducted on a machine running Python 3.6 with 64GB RAM (they require at least 12GB available to run) and 32x1.80GHz CPUs.
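A sketch of how such a simulation step could be generated follows; the sampling interval and the greedy conflict resolution are illustrative assumptions, since the exact values and matching rule are not reproduced here.

import random

def sample_link_probabilities(edges, low=0.05, high=1.0):
    """Assign each link a fixed communication probability, sampled uniformly at random."""
    return {edge: random.uniform(low, high) for edge in edges}

def active_edges(edges, probabilities):
    """Edges that fire at this time step, greedily dropping conflicts so that each
    node participates in at most one communication (a simulation speed-up only)."""
    fired = [edge for edge in edges if random.random() < probabilities[edge]]
    random.shuffle(fired)
    busy, selected = set(), []
    for u, v in fired:
        if u not in busy and v not in busy:
            selected.append((u, v))
            busy.update((u, v))
    return selected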

4.2 Base Classifiers

Experiments are conducted on three base classifiers:

MLP – A multilayer perceptron often employed by GNNs Klicpera et al. (2018); Huang et al. (2020). This consists of a dense two-layer architecture: a transformation of node features into hidden representations with ReLU activations, followed by an additional dense transformation of the latter whose softmax aims to predict one-hot encodings of classification labels.

LR – A simple multilabel logistic regression classifier whose softmax aims to predict one-hot encodings of classification labels.

Label – Classification that repeats training node labels. If no diffusion is performed, this provides random predictions for test nodes.

MLP and LR are trained towards minimizing the cross-entropy loss of known node labels with Adam optimizers Kingma and Ba (2014); Bock et al. (2018). We use learning rates that are typical for training on similarly-sized datasets and maintain the default momentum parameters proposed by the optimizer's original publication. For MLP, we use dropout for the dense layer to improve robustness, and for all classifiers we L2-regularize dense layer weights. We do not perform hyperparameter tuning, as further protocols would be needed to make peer-to-peer nodes learn a common architecture optimal for a set of validation nodes. For FDiff-scale hyperparameters, we select a personalized PageRank restart probability often used for graphs of several thousand nodes and an error scale parameter selected so that it theoretically satisfies a heuristic of perfectly reconstructing the class labels of training nodes.
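For concreteness, the two trainable base classifiers could be sketched as follows (PyTorch is used for brevity; the hidden size, dropout rate, learning rate and L2 penalty shown are placeholder assumptions, and the repository may use a different framework).

import torch
from torch import nn

class MLP(nn.Module):
    """Two dense layers: features -> hidden ReLU representation (with dropout) -> class scores."""
    def __init__(self, num_features, num_classes, hidden=64, dropout=0.5):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(num_features, hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden, num_classes))

    def forward(self, x):
        return self.layers(x)          # softmax is folded into the cross-entropy loss

class LR(nn.Module):
    """Logistic regression: a single dense layer with an (implicit) softmax."""
    def __init__(self, num_features, num_classes):
        super().__init__()
        self.linear = nn.Linear(num_features, num_classes)

    def forward(self, x):
        return self.linear(x)

def make_optimizer(model, lr=0.01, weight_decay=5e-4):
    """Adam with L2 regularization; the paper's exact learning rate and penalty are not reproduced here."""
    return torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)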

4.3 Compared Approaches

We experiment with the following two versions of the MLP and LR classifiers, which differ with respect to whether they are pre-trained and deployed to nodes or learned via gossip averaging. In total, experiments span two versions each of MLP and LR, plus the Label classifier.

Pre-trained – Classifiers trained on the training node set in a centralized architecture over multiple epochs. For faster training, we perform early stopping if the validation set loss has not decreased for several epochs. In practice, this version of classifiers could take the form of a service (e.g. a web service) that trains classifiers from submitted device labels and hosts the result for all devices to retrieve.

Gossip – Fully decentralized gossip averaging, where each node holds a copy of the base classifier and parameters are set to the average between pairs of communicating nodes. Since no stopping criterion can be enforced, both training and validation nodes contribute to training the fragments of the base classifier. The simulated devices corresponding to those nodes perform a gradient step on a local instance of the Adam optimizer every time they are involved in a communication. If training data were independent and identically distributed and a structured topology were forced upon devices, this approach could be considered a state-of-the-art baseline in terms of accuracy, as indicated by the theoretical analysis of Koloskova et al. (2019) and the experimental results of Niwa et al. (2020). However, our setting involves an unstructured peer-to-peer topology where nodes are connected based on homophily, and the efficacy of this practice is uncertain. We also consider the Label classifier as natively Gossip, as it does not require any centralized infrastructure.
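A sketch of what one such communication event could look like follows; the device attributes (model, optimizer, features, label) are assumptions about how a simulated device might be organized rather than the repository's interface.

import torch
import torch.nn.functional as F

def gossip_communication(device_u, device_v):
    """When two devices communicate: average their model parameters, then let each
    labeled device take one Adam step on its own (features, label) pair."""
    with torch.no_grad():
        for p_u, p_v in zip(device_u.model.parameters(), device_v.model.parameters()):
            average = (p_u + p_v) / 2
            p_u.copy_(average)
            p_v.copy_(average)
    for device in (device_u, device_v):
        if device.label is not None:                     # training/validation nodes only
            device.optimizer.zero_grad()
            logits = device.model(device.features.unsqueeze(0))
            loss = F.cross_entropy(logits, device.label.view(1))
            loss.backward()
            device.optimizer.step()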

For all base classifiers, we report: a) their vanilla accuracy, b) the accuracy of passing base predictions through the FDiff-scale scheme of Equation 1, as implemented via the p2pGNN operations presented in Algorithm 2, and c) the accuracy of passing base predictions through a centralized implementation of FDiff-scale with the same hyperparameters. For the last scheme, we report diffusion improvements only for pre-trained models, as it makes little sense to combine decentralized training with centralized diffusion.

Finally, given that training does not depend on diffusion, we perform the latter by considering both training and validation node labels as known information. That is, both types of nodes form the set of known-label nodes in our analysis. Ideally, p2pGNN would leverage the homophilous node communications to improve base accuracy and tightly approximate fully-centralized predictions. In this case, it would become a decentralized equivalent to centralized diffusion that works under uncertain communication availability and does not expose predictive information of devices to devices other than communicating graph neighbors.

4.4 Results

In Table 2 we compare the accuracy of base algorithms vs. their augmented predictions with the decentralized p2pGNN and a fully centralized implementation of FDiff-scale. We remind that the last two schemes implement the same architecture and differ only in whether they run on peer-to-peer networks or not. We can see that, in the case of pre-trained base classifiers, p2pGNN diffusion successfully improves their accuracy scores by wide margins, i.e. a 7%-47% relative increase. In fact, the improved scores closely resemble those of centralized diffusion, i.e. with less than 3% relative decrease, for the Citeseer and Cora datasets. In these cases, we consider our peer-to-peer diffusion algorithm to have successfully decentralized the diffusion components. On the Pubmed dataset, centralized schemes are replicated less tightly (this also holds true for simple Label propagation), but there is still substantial improvement compared to pre-trained base classifiers.

On the other hand, results are mixed when we consider base classifiers trained via gossip averaging. Before further exploration, we remark that gossip-trained MLP and LR outperform their pre-trained counterparts in large part due to a combination of training with larger sets of node labels (both training and validation nodes) and “leaking” the graph structure into local classifier fragment parameters due to non-identically distributed node class labels. However, after diffusion is performed, accuracy does not reach the same levels as for pre-trained base classifiers—in fact, in the Citeseer and Cora datasets homophilous parameter training reduces the diffusion of classifier fragment parameters to the diffusion of class labels. This indicates that classifier fragments tend to correlate node features with graph structure and hence additional diffusion operations provide little new information. Characteristically, the linear nature of LR makes its base gossip-trained and p2pGNN versions near-identical. Since this issue arises systemically from gossip training shortcomings, we leave its mitigation to future research.

Base p2pGNN Fully Centralized GNN
Citeseer Cora Pubmed Citeseer Cora Pubmed Citeseer Cora Pubmed
Pre-trained
MLP 52.3% 54.9% 70.9% 67.8% 81.5% 76.0% 69.0% 84.0% 81.2%
LR 59.4% 58.7% 72.2% 70.5% 82.0% 77.3% 70.3% 85.7% 81.5%
Gossip
MLP 63.1% 66.3% 74.9% 61.3% 80.8% 78.0% - - -
LR 61.8% 79.9% 78.7% 61.4% 80.8% 78.7% - - -
Labels 15.9% 11.6% 22.0% 61.1% 80.8% 71.5% 61.5% 78.9% 78.6%
Table 2: Comparing the accuracy of different types and training schemes of base algorithms and their combination with the diffusion of p2pGNN. Accuracy is computed after 1000 time steps and averaged across five peer-to-peer simulation runs.

Overall, experiment results indicate that, in most cases, p2pGNN successfully applies GNN principles to improve base classifier accuracy. Importantly, although neighbor-based gossip training of base classifiers on both training and validation nodes outperforms models pre-trained on only training nodes (in which case validation nodes are used for early stopping), decentralized graph diffusion of the latter exhibits the highest accuracy across most combinations of datasets and base classifiers.

5 Conclusions and Future Work

In this work, we investigated the problem of classifying the nodes of unstructured peer-to-peer networks under communication uncertainty and proposed that homophilous communication links can be mined with decoupled GNN diffusion to improve base classifier accuracy. We thus introduced a decentralized implementation of diffusion, called p2pGNN, whose fragments run on decentralized devices and mine network links as irregular peer-to-peer communication takes place. Theoretical analysis and experiments on three simulated peer-to-peer networks built from labeled graph data showed that combining pre-trained (and often gossip-trained) base classifiers with our approach successfully improves their accuracy to degrees similar to fully centralized decoupled graph neural networks.

For future work, we aim to improve gossip training to let it account for the non-identically distributed spread of data across graph nodes. We are also interested in addressing privacy concerns and societal biases in our approach and explore automated hyperparameter selection.

Acknowledgements

This work was partially funded by the European Commission under contract numbers H2020-951911 AI4Media and H2020-825585 HELIOS.

References

  • B. Bahmani, A. Chowdhury, and A. Goel (2010) Fast incremental and personalized pagerank. arXiv preprint arXiv:1006.2880. Cited by: §3.3.
  • G. Berry, A. Sirianni, I. Weber, J. An, and M. Macy (2020) Going beyond accuracy: estimating homophily in social networks using predictions. arXiv preprint arXiv:2001.11171. Cited by: §1.
  • S. Bock, J. Goppold, and M. Weiß (2018) An improvement of the convergence proof of the adam-optimizer. arXiv preprint arXiv:1804.10587. Cited by: §4.2.
  • M. Chen, Z. Wei, Z. Huang, B. Ding, and Y. Li (2020) Simple and deep graph convolutional networks. In International Conference on Machine Learning, pp. 1725–1735. Cited by: §2.1.
  • G. Danner and M. Jelasity (2018) Token account algorithms: the best of the proactive and reactive worlds. In 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), pp. 885–895. Cited by: §2.2.
  • I. Hegedűs, G. Danner, and M. Jelasity (2021) Decentralized learning works: an empirical comparison of gossip learning and federated learning. Journal of Parallel and Distributed Computing 148, pp. 109–124. Cited by: §2.2.
  • C. Hu, J. Jiang, and Z. Wang (2019) Decentralized federated learning: a segmented gossip approach. arXiv preprint arXiv:1908.07782. Cited by: §2.2.
  • P. Hu and W. C. Lau (2012) Localized algorithm of community detection on large-scale decentralized social networks. arXiv preprint arXiv:1212.6323. Cited by: §3.3.
  • Q. Huang, H. He, A. Singh, S. Lim, and A. R. Benson (2020) Combining label propagation and simple models out-performs graph neural networks. arXiv preprint arXiv:2010.13993. Cited by: §2.1, §2.1, §3.2, §4.2.
  • D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.2.
  • J. Klicpera, A. Bojchevski, and S. Günnemann (2018) Predict then propagate: graph neural networks meet personalized pagerank. arXiv preprint arXiv:1810.05997. Cited by: §2.1, §4.2.
  • A. Koloskova, S. Stich, and M. Jaggi (2019) Decentralized stochastic optimization and gossip algorithms with compressed communication. In International Conference on Machine Learning, pp. 3478–3487. Cited by: §1, §2.2, §4.3.
  • H. Li, K. Ota, and M. Dong (2018) Learning iot in edge: deep learning for the internet of things with edge computing. IEEE network 32 (1), pp. 96–101. Cited by: §1.
  • X. Lian, W. Zhang, C. Zhang, and J. Liu (2018) Asynchronous decentralized parallel stochastic gradient descent. In International Conference on Machine Learning, pp. 3043–3052. Cited by: §2.2.
  • W. Y. B. Lim, N. C. Luong, D. T. Hoang, Y. Jiao, Y. Liang, Q. Yang, D. Niyato, and C. Miao (2020) Federated learning in mobile edge networks: a comprehensive survey. IEEE Communications Surveys & Tutorials 22 (3), pp. 2031–2063. Cited by: §1.
  • B. Lubachevsky and D. Mitra (1986) A chaotic asynchronous algorithm for computing the fixed point of a nonnegative matrix of unit spectral radius. Journal of the ACM (JACM) 33 (1), pp. 130–150. Cited by: §3.3.
  • Q. Luo, J. He, Y. Zhuo, and X. Qian (2020) Prague: high-performance heterogeneity-aware asynchronous decentralized training. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 401–416. Cited by: §2.2.
  • B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas (2017) Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pp. 1273–1282. Cited by: §2.2.
  • M. McPherson, L. Smith-Lovin, and J. M. Cook (2001) Birds of a feather: homophily in social networks. Annual review of sociology 27 (1), pp. 415–444. Cited by: §1.
  • G. Namata, B. London, L. Getoor, B. Huang, and U. EDU (2012) Query-driven active surveying for collective classification. In 10th International Workshop on Mining and Learning with Graphs, Vol. 8. Cited by: §4.1.
  • K. Niwa, N. Harada, G. Zhang, and W. B. Kleijn (2020) Edge-consensus learning: deep learning on p2p networks with nonhomogeneous data. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 668–678. Cited by: §4.3.
  • L. Page, S. Brin, R. Motwani, and T. Winograd (1999) The pagerank citation ranking: bringing order to the web.. Technical report Stanford InfoLab. Cited by: §3.2.
  • J. X. Parreira, D. Donato, S. Michel, and G. Weikum (2006) Efficient and decentralized pagerank approximation in a peer-to-peer web search network. In Proceedings of the 32nd international conference on Very large data bases, pp. 415–426. Cited by: §3.3.
  • S. Savazzi, M. Nicoli, and V. Rampa (2020) Federated learning with cooperating devices: a consensus approach for massive iot networks. IEEE Internet of Things Journal 7 (5), pp. 4641–4654. Cited by: §2.2.
  • P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad (2008) Collective classification in network data. AI magazine 29 (3), pp. 93–93. Cited by: §4.1.
  • O. Shchur, M. Mumme, A. Bojchevski, and S. Günnemann (2018) Pitfalls of graph neural network evaluation. arXiv preprint arXiv:1811.05868. Cited by: §4.1.
  • H. Tong, C. Faloutsos, and J. Pan (2006) Fast random walk with restart and its applications. In Sixth international conference on data mining (ICDM’06), pp. 613–622. Cited by: §3.2.
  • J. Wang, B. Cao, P. Yu, L. Sun, W. Bao, and X. Zhu (2018) Deep learning towards mobile applications. In 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), pp. 1385–1393. Cited by: §1.
  • M. Wang, D. Zheng, Z. Ye, Q. Gan, M. Li, X. Song, J. Zhou, C. Ma, L. Yu, Y. Gai, et al. (2019) Deep graph library: a graph-centric, highly-performant package for graph neural networks. arXiv preprint arXiv:1909.01315. Cited by: §4.1.
  • Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip (2020) A comprehensive survey on graph neural networks. IEEE transactions on neural networks and learning systems. Cited by: §2.1.
  • X. Zhang, J. You, H. Xue, and J. Wang (2020) A decentralized pagerank based content dissemination model at the edge of network. International Journal of Web Services Research (IJWSR) 17 (1), pp. 1–16. Cited by: §3.3.
  • P. Zhou, Q. Lin, D. Loghin, B. C. Ooi, Y. Wu, and H. Yu (2020) Communication-efficient decentralized machine learning over heterogeneous networks. arXiv preprint arXiv:2009.05766. Cited by: §2.2.