1 Introduction
Graph Neural Networks (GNNs), especially those using convolutional methods, have become a popular computational model for graph data analysis as high-performance computing systems have blossomed over the last decade. One well-known GNN method is the Graph Convolutional Network (GCN) [22], which learns a high-order approximation of a spectral graph by using convolutional layers followed by a nonlinear activation function to make the final prediction. Like most deep learning models, GCNs incorporate complex structures with costly training and testing processes, leading to significant power consumption. It has been reported that the computation resources consumed for deep learning grew 300,000-fold from 2012 to 2018 [1]. The high energy consumption, when further coupled with sophisticated theoretical analysis and blurred biological interpretability of the network, has resulted in a revival of effort in developing novel energy-efficient neural architectures and physical hardware.
Inspired by the brain-like computing process, Spiking Neural Networks (SNNs) formalize event-driven or clock-driven signals as inference for a set of parameters to update the neuron nodes [2]. Different from conventional deep learning models that communicate information using continuous decimal values, SNNs perform inexpensive computation by converting the input into discrete spike trains. Such a bio-fidelity method can perform more intuitive and simpler inference and model training than traditional networks [29]. Another distinctive merit of SNNs is their intrinsic power efficiency on neuromorphic hardware, which is capable of running 1 million neurons and 256 million synapses while drawing only 70 mW of power [30]. Nevertheless, employing SNNs as an energy-efficient architecture to process graph data as effectively as GCNs still faces fundamental challenges.
Challenges: (i) Spike representation. Despite the promising results achieved on common tasks (e.g., image classification), SNN models are not trivially portable to non-Euclidean domains, such as graphs. Given the graph datasets widely used in many applications (e.g., citation networks and social networks), how to extract the graph structure and transfer the graph data into spike trains poses a challenge. (ii) Model generalization. GCNs can be extended to diverse circumstances by using deeper layers. Thus, it is essential to further extend SNNs to a wider scope of applications where graphs are applicable. (iii) Energy efficiency. Apart from common metrics like accuracy or prediction loss in artificial neural networks (ANNs), the energy efficiency of SNNs on neuromorphic chips is an important characteristic to be considered. However, neuromorphic chips are not as advanced as contemporary GPUs, and the lack of uniform standards also affects energy estimation on different platforms.
To tackle these fundamental challenges, we introduce Spiking Graph Neural Network (SpikingGCN): an end-to-end framework that can properly encode graphs and make predictions for non-trivial graph datasets that arise in diverse domains. To the best of our knowledge, SpikingGCN is the first-ever SNN designed for node classification in graph data, and it can also be extended into more complex neural network structures. Overall, our main contribution is threefold: (i) We propose SpikingGCN, the first end-to-end model for node classification in SNNs, without any pre-training and conversion. The graph data is transformed into spike trains by a spike encoder. These generated spikes are used to predict the classification results. (ii) We show that the basic model inspired by GCNs can effectively merge the convolutional features into spikes and achieve competitive predictive performance. In addition, we further evaluate the performance of our model in active learning and energy-efficient settings. (iii) We extend our framework to enable more complex network structures for different tasks, including image graph classification and rating prediction in recommender systems. The extensibility of the proposed model also opens the gate to performing SNN-based inference and training on various kinds of graph-based data. The code and Appendix are available on GitHub (https://github.com/ZulunZhu/SpikingGCN.git).
2 Spiking Graph Neural Networks
Graphs are usually represented by a non-Euclidean data structure consisting of a set of nodes (vertices) and their relationships (edges). The reasoning process in the human brain depends heavily on the graph extracted from daily experience [48]. However, how to perform biologically interpretable reasoning for standard graph neural networks has not been adequately investigated. Thus, the proposed SpikingGCN aims to address the challenges of semi-supervised node classification in a biological and energy-efficient fashion. As this work draws on methods from both GNNs and SNNs, we list the frequently used notations in Table 8 in the Appendix.
Graph neural networks (GNNs) conduct propagation guided by the graph structure, which is fundamentally different from existing SNN models that can only handle relatively simple image data. Instead of treating a single node as the input of an SNN model, the states of its neighborhood should also be considered. Let $\mathcal{G} = (\mathcal{V}, \mathbf{A})$ formally denote a graph, where $\mathcal{V} = \{v_1, \dots, v_N\}$ is the node set and $\mathbf{A} \in \mathbb{R}^{N \times N}$ represents the adjacency matrix. Here $N$ is the number of nodes. The entire attribute matrix $\mathbf{X} \in \mathbb{R}^{N \times d}$ includes the vectors of all nodes $[x_1, \dots, x_N]^{\top}$. The degree matrix $\mathbf{D} = \mathrm{diag}(d_1, \dots, d_N)$ consists of the row sums of the adjacency matrix, $d_i = \sum_j a_{ij}$, where $a_{ij}$ denotes the edge weight between nodes $v_i$ and $v_j$. Each node has $d$ dimensions. Our goal is to conduct SNN inference without neglecting the relationships between nodes.
Inference in SNN models is commonly conducted through the classic Leaky Integrate-and-Fire (LIF) mechanism [13]. Given the membrane potential $V^{t-1}$ at time step $t-1$, the time constant $\tau_m$, and the new pre-synaptic input $\Delta V$, the membrane potential activity is governed by:
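$$\tau_m \frac{dV}{dt} = -\left(V^{t-1} - V_{reset}\right) + \Delta V, \tag{1}$$
where $V_{reset}$ is the signed reset voltage. The differential term on the left is widely used in the continuous domain, but the biological simulation in SNNs requires the implementation to be executed in a discrete and sequential way. Thus, we approximate the differential expression using an iterative version to guarantee computational availability. Updating $\Delta V$ using the input of our network, we can formalize (1) as:
$$V^{t} = V^{t-1} + \frac{1}{\tau_m}\left(-\left(V^{t-1} - V_{reset}\right) + \Delta V\right). \tag{2}$$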
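The iterative update in (2) can be illustrated with a short sketch. It assumes the standard forward-Euler LIF form, in which the potential moves toward the reset voltage plus the input at a rate of one over the time constant. The function names `lif_step` and `simulate`, and the values of `tau_m` and the inputs, are hypothetical and chosen only for illustration.

```python
def lif_step(v_prev, delta_v, tau_m=2.0, v_reset=0.0):
    """One discrete charge step (assumed form of the iterative LIF update):
    V_t = V_{t-1} + (-(V_{t-1} - V_reset) + delta_v) / tau_m
    """
    return v_prev + (-(v_prev - v_reset) + delta_v) / tau_m


def simulate(inputs, tau_m=2.0, v_reset=0.0, v0=0.0):
    """Apply lif_step sequentially and record the membrane potential."""
    v = v0
    trace = []
    for delta_v in inputs:
        v = lif_step(v, delta_v, tau_m, v_reset)
        trace.append(v)
    return trace


# Two input pulses followed by silence: the potential rises, then leaks.
trace = simulate([1.0, 1.0, 0.0, 0.0])
print(trace)
```

No firing or reset is modeled here; the sketch only shows how the potential integrates input and decays when the input stops.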
To tackle the issue of feature propagation in an SNN model, we consider a spike encoder to extract the information in the graph and output the hidden state of each node in the format of spike trains. As shown in Fig. 1, the original input graph is transformed into the spikes from a convolution perspective. To predict the labels for each node, we consider a spike decoder and treat the final spike rate as a classification result.
Graph convolution.
The pattern of graph data consists of two parts: the topological structure and the nodes' own features, which are stored in the adjacency and attribute matrices, respectively. Unlike general image processing with single-channel pixel features, the topological structure is lost if only the node attributes are considered. To avoid the performance degradation of attributes-only encoding, SpikingGCN utilizes the graph convolution method inspired by GCNs to incorporate the topological information. The idea is to use the adjacency relationship to normalize the weights, so that nodes can selectively aggregate neighbor attributes. The convolution result, i.e., node representations, will serve as input to the subsequent spike encoder. Following the propagation mechanism of GCN [22] and SGC [44], we form the new node representation $h_i$ utilizing the attributes $x_i$ of each node $v_i$ and its local neighborhood:
$$h_i \leftarrow \frac{1}{d_i + 1}\, x_i + \sum_{j=1}^{N} \frac{a_{ij}}{\sqrt{(d_i + 1)(d_j + 1)}}\, x_j. \tag{3}$$
Here, we can express the attribute transformation over the entire graph by:
$$\mathbf{H} = \mathbf{S}^{K}\mathbf{X}, \qquad \mathbf{S} = \tilde{\mathbf{D}}^{-\frac{1}{2}}\, \tilde{\mathbf{A}}\, \tilde{\mathbf{D}}^{-\frac{1}{2}}, \tag{4}$$
where $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}$ is the adjacency matrix with added self-connections, $K$ is the number of graph convolution layers and $\tilde{\mathbf{D}}$ is the degree matrix of $\tilde{\mathbf{A}}$. Similar to the simplified framework of SGC, we drop the nonlinear operation and focus on the convolutional process on the entire graph. As a result, (4) acts as the only convolution operation in the spike encoder.
While we incorporate the feature propagation explored by GCN and SGC, we would like to further highlight our novel contributions. First, our original motivation is to leverage an SNN-based framework to reduce the inference energy consumption of graph analysis tasks without performance degradation. GCN's effective graph Laplacian regularization approach allows us to minimize the number of trainable parameters and perform efficient inference in SNNs. Second, convolutional techniques only serve as the initial building block of SpikingGCN. More significantly, SpikingGCN is designed to accept the convolutional results in a binary form (spikes), and to further detect the specific patterns among these spikes. This biological mechanism makes it suitable for deployment on a neuromorphic chip to improve energy efficiency.
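As an illustration of the propagation step, the following sketch applies a symmetrically normalized adjacency matrix with self-loops to a small attribute matrix for a chosen number of layers, in the style of SGC. It is a minimal pure-Python sketch under that assumption, not the authors' implementation; the helper names (`normalized_adjacency`, `matmul`, `propagate`) and the toy graph are hypothetical.

```python
import math


def normalized_adjacency(adj):
    """Return S = D~^{-1/2} (A + I) D~^{-1/2} for a dense adjacency list-of-lists."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]  # degrees including the self-loop
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain dense matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def propagate(adj, x, k):
    """Apply the normalized adjacency k times: H = S^k X (no nonlinearity)."""
    s = normalized_adjacency(adj)
    h = x
    for _ in range(k):
        h = matmul(s, h)
    return h


# Toy path graph 0 - 1 - 2 with two attributes per node (illustrative values).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h = propagate(adj, x, k=2)
print(h)
```

Because the propagation has no trainable parameters, it can be computed once before any spike encoding takes place.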
Representation encoding.
The representation $\mathbf{H}$ consists of continuous floating-point values, but SNNs accept discrete spike signals. A spike encoder is therefore essential to take node representations as input and output spikes for the subsequent procedures. We propose to use a probability-based Bernoulli encoding scheme as the basic method to transform the node representations into spike signals. Let $O_i^{(t)} = (o_{i1}^{(t)}, \dots, o_{id}^{(t)})$ and $\lambda_{ij}$ denote the spikes before the fully connected layer's neurons at the $t$-th time step and the $j$-th feature in the new representation for node $i$, respectively. Our hypothesis is that the spiking rate should keep a positive relationship with the importance of patterns in the representations. In the probability-based Bernoulli encoder, the probability of each feature firing a spike is related to the value of $\lambda_{ij}$ in the node representation as follows:
$$p\left(o_{ij}^{(t)}\right) = \lambda_{ij}^{\,o_{ij}^{(t)}} \left(1 - \lambda_{ij}\right)^{1 - o_{ij}^{(t)}}. \tag{5}$$
Here, $o_{ij}^{(t)}$ with $j \in \{1, \dots, d\}$ denotes a pre-synaptic spike, which takes a binary value (0 or 1). Note that $\lambda_{ij}$, derived from the convolution of neighbors, is positively correlated with the feature significance. The larger the value, the greater the chance of a spike being fired by the encoder. Since the encoder generates the spike for each node on a tiny scale, we interpret the encoding module as a sampling process of the entire graph. In order to fully describe the information in the graph, we use $T$ time steps to repeat the sampling process. It is noteworthy that the number of time steps can be regarded as the resolution of the encoded message.
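The sampling view of the encoder can be sketched as follows: each feature value is treated as the probability of emitting a spike at every time step, and the empirical spike rate over many steps approaches that value. The clipping of features to the interval [0, 1] is an assumption made here so that each value is a valid probability; the function name `bernoulli_encode` and the example features are hypothetical.

```python
import random


def bernoulli_encode(features, num_steps, rng):
    """Sample a binary spike train of shape [num_steps][len(features)].

    Assumption: features are clipped to [0, 1] so they can act as
    Bernoulli firing probabilities.
    """
    probs = [min(max(f, 0.0), 1.0) for f in features]
    return [[1 if rng.random() < p else 0 for p in probs]
            for _ in range(num_steps)]


rng = random.Random(0)                      # fixed seed for reproducibility
features = [0.9, 0.1, 0.5, 0.0]             # illustrative node representation
spikes = bernoulli_encode(features, num_steps=1000, rng=rng)

# The per-feature spike rate approximates the encoded value.
rates = [sum(step[j] for step in spikes) / len(spikes)
         for j in range(len(features))]
print(rates)
```

Using more time steps tightens the match between the spike rate and the encoded value, which is one way to read the remark that the number of time steps sets the resolution of the message.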
Charge, fire and reset in SpikingGCN.
The following module includes the fully connected layer and the LIF neuron layer. The fully connected layer takes spikes as input and outputs voltages according to trainable weights. The voltages charge the LIF neurons, which then conduct a series of actions, including firing spikes and resetting the membrane potential.
Potential charge. General deep SNN models adopt a multi-layer network structure including linear and nonlinear counterparts to process the input [5]. Following SGC's assumption, the depth in deep SNNs is not critical to predict unknown labels on the graph [44]. Thus, we drop redundant modules except for the final linear layer (fully connected layer) to simplify our framework and increase the inference speed. We obtain the linear summation $\sum_j \psi_j o_{ij}^{(t)}$ as the input $\Delta V$ of the SNN structure in (2). The LIF model includes the floating-point multiplication of the constant $(\tau_m)^{-1}$, which is not biologically plausible. To address this challenge and avoid the additional hardware requirement when deployed on neuromorphic chips, we calculate the factor $k = 1 - \frac{1}{\tau_m}$ and incorporate the constant $(\tau_m)^{-1}$ into the synapse parameters $\psi$, then simplify the equation as:
$$V^{t} = k\, V^{t-1} + \frac{V_{reset}}{\tau_m} + \sum_j \psi_j\, o_{ij}^{(t)}. \tag{6}$$
Fire and reset. In a biological neuron, a spike is fired when the accumulated membrane potential passes a spiking threshold $V_{th}$. In essence, the spikes after the LIF neurons are generated and increase the spike rate in the output layer. We adopt the Heaviside function:
$$H\left(V^{t}\right) = \begin{cases} 1, & V^{t} \ge V_{th}, \\ 0, & \text{otherwise}, \end{cases} \tag{7}$$
to simulate the fundamental firing process. As shown in Fig. 2, which demonstrates our framework, spike trains over $T$ time steps are generated for each node from the three LIF neurons. Each neuron sums its spikes and divides the count by $T$ to obtain its firing rate. For instance, for the example nodes from the ACM dataset, we obtain the neurons' firing rates $fr$ and the true label $lb$; the loss in the training process is then $\mathrm{MSE}(fr, lb)$. In the testing phase, the predicted label is the neuron with the maximum firing rate, i.e., 2 in this example.
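The decoding step described above can be made concrete with a small sketch: spike counts are divided by the number of time steps to obtain firing rates, the training loss is the mean squared error against a one-hot label, and the prediction is the index of the most active neuron. The spike trains and label below are invented for illustration and are not the values from the ACM example; the function names are hypothetical.

```python
def firing_rates(spike_trains):
    """spike_trains[t][c] is the output (0 or 1) of class neuron c at step t."""
    num_steps = len(spike_trains)
    num_classes = len(spike_trains[0])
    return [sum(step[c] for step in spike_trains) / num_steps
            for c in range(num_classes)]


def mse(pred, target):
    """Mean squared error between firing rates and a one-hot label."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


def predict(rates):
    """Index of the neuron with the highest firing rate."""
    return max(range(len(rates)), key=lambda c: rates[c])


# Hypothetical output of three LIF neurons over T = 4 time steps.
trains = [[1, 0, 1], [0, 0, 1], [1, 0, 1], [0, 0, 1]]
rates = firing_rates(trains)
label = [0.0, 0.0, 1.0]          # hypothetical one-hot true label
loss = mse(rates, label)
pred = predict(rates)
print(rates, loss, pred)
```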
Negative voltages would not trigger spikes, but these voltages contain information that (7) ignores. To compensate for the negative term, we propose to use a negative threshold to distinguish the negative characteristics of the membrane potential. Inspired by [21], we adjust the Heaviside activation function after the neuron nodes as follows:
$$H\left(V^{t}\right) = \begin{cases} 1, & V^{t} \ge V_{th}, \\ -1, & V^{t} \le -\frac{1}{\theta} V_{th}, \\ 0, & \text{otherwise}, \end{cases} \tag{8}$$
where $\theta$ is the hyperparameter that determines the negative range. In the context of biological mechanisms, we interpret the modified activation function as modeling excitatory and inhibitory processes in the neurons. By capturing more information and firing spikes in response to various features, this more biologically reasonable modification also improves the performance of our model on classification tasks. The whole process is detailed in Algorithm 1 in Appendix B.
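A signed firing rule of this kind can be sketched as a three-way comparison. The sketch assumes the negative threshold is the positive threshold scaled by one over a hyperparameter, which is one plausible reading of the text and of the signed neurons in [21]; the parameter names `v_th` and `theta` and their default values are hypothetical.

```python
def fire(v, v_th=1.0, theta=2.0):
    """Signed Heaviside-style activation (assumed form).

    Returns +1 above the positive threshold, -1 below the negative
    threshold -v_th / theta, and 0 in between.
    """
    if v >= v_th:
        return 1
    if v <= -v_th / theta:
        return -1
    return 0


# Potentials between the two thresholds stay silent.
outputs = [fire(v) for v in (1.2, 0.3, -0.4, -0.5, -0.6)]
print(outputs)
```

With a larger `theta`, the negative threshold moves closer to zero, so inhibitory (negative) spikes are emitted more readily.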
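In biological neural systems, after firing a spike, the neurons tend to reset their potential and start to accumulate voltage again. We reset the membrane potential:
$$V^{t} \leftarrow H\left(V^{t}\right) V_{reset} + \left(1 - H\left(V^{t}\right)\right) V^{t}. \tag{9}$$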
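Putting the three actions together, a single neuron can be simulated as a loop of charge, fire and reset. The sketch below assumes a leaky charge with a decay factor, the unsigned threshold rule, and a hard reset to a reset voltage of zero after a spike; all names and constants (`k`, `v_th`, `v_reset`, the input sequence) are hypothetical choices for illustration.

```python
def lif_neuron(inputs, k=0.5, v_th=1.0, v_reset=0.0):
    """Simulate one LIF neuron over a sequence of input voltages.

    Assumed dynamics per step:
      charge: v = k * v + x        (leak factor k, reset voltage taken as 0)
      fire:   s = 1 if v >= v_th else 0
      reset:  v = v_reset if s == 1, otherwise v is kept
    """
    v = v_reset
    spikes = []
    for x in inputs:
        v = k * v + x                       # charge
        s = 1 if v >= v_th else 0           # fire
        v = s * v_reset + (1 - s) * v       # hard reset after a spike
        spikes.append(s)
    return spikes


# Constant sub-threshold input accumulates until the neuron fires once.
spikes = lif_neuron([0.6, 0.6, 0.6, 0.6, 0.0])
print(spikes)
```

The neuron stays silent for two steps, fires on the third when the accumulated potential crosses the threshold, and then starts accumulating again from the reset voltage.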
Gradient surrogate.
One of the most significant obstacles to training SNNs is the non-differentiable nature of the activation function (7). As a result, the backpropagation algorithm cannot be employed directly during the training phase. Inspired by [34], the sigmoid function is adopted to approximate the firing phase when executing backpropagation and stochastic gradient descent. Thus, we formalize the surrogate function as follows:
$$H(x) \approx \sigma(\alpha x) = \frac{1}{1 + e^{-\alpha x}}, \qquad x = V^{t} - V_{th}, \tag{10}$$
where $\alpha$ measures the degree of approximation. When we train the model, backpropagation becomes feasible and the gradient of the Heaviside function can be replaced as follows:
$$\frac{\partial H(x)}{\partial x} \approx \alpha\, \sigma(\alpha x)\left(1 - \sigma(\alpha x)\right) = \frac{\alpha\, e^{-\alpha x}}{\left(1 + e^{-\alpha x}\right)^{2}}. \tag{11}$$
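The surrogate idea can be checked numerically. The sketch below assumes the surrogate is a sigmoid with a steepness parameter applied to the distance between the potential and the threshold, and compares its analytic derivative with a finite-difference estimate. The names `surrogate`, `surrogate_grad` and the value of `alpha` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def surrogate(x, alpha=4.0):
    """Smooth stand-in for the Heaviside step; x is potential minus threshold."""
    return sigmoid(alpha * x)


def surrogate_grad(x, alpha=4.0):
    """Analytic derivative of the sigmoid surrogate with respect to x."""
    s = sigmoid(alpha * x)
    return alpha * s * (1.0 - s)


# Compare the analytic gradient with a central finite difference at x = 0.3.
x0, step = 0.3, 1e-5
numeric = (surrogate(x0 + step) - surrogate(x0 - step)) / (2.0 * step)
analytic = surrogate_grad(x0)
print(numeric, analytic)
```

A larger `alpha` makes the surrogate closer to the hard step but concentrates the gradient in a narrower band around the threshold, so the parameter trades approximation quality against gradient flow.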
Model feasibility analysis
Since the input spikes can be viewed as a rough approximation of the original convolutional results in the initial graph, two key questions remain: (i) does the proposed method really work for the prediction task? (ii) how can we control the information reduction of the sampled spikes compared with other GNN models? It turns out that although equation (6) gives a rough intuitive approximation of the graph input using a trainable linear combination, it cannot fully explain why SpikingGCN can achieve comparable performance with other real-value GNN models. Next, we will show that our spike representations are very close to the real-value output with high probability.
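To explain why SpikingGCN can provide an accurate output using spike trains, we take SGC [44] as the model for comparison. SGC adopts a similar framework to our model, which transfers the convolution result into a fully connected layer in a real-value format. Given the parameters $\psi = (\psi_1, \dots, \psi_d)$, the real-value convolution result $h = (\lambda_1, \dots, \lambda_d)$ and the spike representations $O^{(t)} = (o_1, \dots, o_d)$ for one node, we have:
$$\Pr(o_j = 1) = \lambda_j, \qquad \Pr(o_j = 0) = 1 - \lambda_j. \tag{12}$$
Note that firing a spike in each dimension of a node is independent. When merging our case into a generalization of the Chernoff inequalities for binomial distribution in [8], we derive the following estimation error bound.
$$\Pr\Big(\sum_j \psi_j o_j \le \sum_j \psi_j \lambda_j - \epsilon\Big) \le e^{-\epsilon^{2} / (2\mu)}, \tag{13}$$
$$\Pr\Big(\sum_j \psi_j o_j \ge \sum_j \psi_j \lambda_j + \epsilon\Big) \le e^{-\epsilon^{2} / \left(2\left(\mu + \psi_{max}\,\epsilon / 3\right)\right)}, \tag{14}$$
where $\mu = \sum_j \psi_j^{2} \lambda_j$ and $\psi_{max} = \max_j \psi_j$. Note that $\sum_j \psi_j \lambda_j$ is exactly the output of SGC's trainable layer, and $\sum_j \psi_j o_j$ is the output of SpikingGCN after a linear layer. By applying the upper and lower bounds, the failure probability will be at most $p_f = e^{-\epsilon^{2} / \left(2\left(\mu + \psi_{max}\,\epsilon / 3\right)\right)}$. For question (i) mentioned earlier, we guarantee that our spike representations have an $\epsilon$-approximation to SGC's output with at least $1 - p_f$ probability. For question (ii), we also employ $L_2$ regularization and parameter clip operations during our experiments to control $\mu$ and $\psi_{max}$ respectively, which can further help us optimize the upper and lower bounds.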
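The concentration argument can be illustrated with a small Monte Carlo experiment. The sketch assumes the bound from [8] takes the usual weighted-sum Chernoff form for independent Bernoulli variables with positive weights, where the variance proxy is the sum of squared weights times firing probabilities. The weights, probabilities and deviation below are invented for illustration, and the function names are hypothetical.

```python
import math
import random


def tail_bounds(weights, probs, eps):
    """Assumed Chernoff-style bounds for X = sum_j w_j * o_j, o_j ~ Bernoulli(p_j)."""
    nu = sum(w * w * p for w, p in zip(weights, probs))
    w_max = max(weights)
    lower = math.exp(-eps ** 2 / (2.0 * nu))                        # P(X <= E[X] - eps)
    upper = math.exp(-eps ** 2 / (2.0 * (nu + w_max * eps / 3.0)))  # P(X >= E[X] + eps)
    return lower, upper


def empirical_tails(weights, probs, eps, trials, rng):
    """Estimate both tail probabilities by direct sampling."""
    mean = sum(w * p for w, p in zip(weights, probs))
    below = above = 0
    for _ in range(trials):
        x = sum(w for w, p in zip(weights, probs) if rng.random() < p)
        below += x <= mean - eps
        above += x >= mean + eps
    return below / trials, above / trials


weights = [0.2, 0.5, 0.1, 0.4, 0.3]   # positive linear-layer weights (illustrative)
probs = [0.9, 0.2, 0.5, 0.7, 0.1]     # real-valued features used as firing probabilities
eps = 0.6
lower_bound, upper_bound = tail_bounds(weights, probs, eps)
emp_lower, emp_upper = empirical_tails(weights, probs, eps, 20000, random.Random(1))
print(lower_bound, upper_bound, emp_lower, emp_upper)
```

Shrinking the weights (for example through regularization or clipping) reduces both the variance proxy and the largest weight, which tightens both bounds for a fixed deviation.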
3 Experiments
To evaluate the effectiveness of the proposed SpikingGCN, we conduct extensive experiments that focus on four major objectives: (i) semi-supervised node classification on citation graphs, (ii) performance evaluation under limited training data in active learning, (iii) energy efficiency evaluation on neuromorphic chips, and (iv) extensions to other application domains. Due to space limitations, we leave the active learning experiments to Appendix C.2.
3.1 Semi-supervised Node Classification
Datasets.
For node classification, we test our model on four commonly used citation network datasets: Cora, citeseer, ACM, and Pubmed [42], where nodes and edges represent the papers and citation links. The statistics of the four datasets are summarized in Table 1. Sparsity refers to the number of edges divided by the square of the number of nodes.
Table 1: Statistics of the four datasets.
Datasets | Nodes | Edges | Attributes | Classes | Sparsity
Cora
ACM  %
citeseer
Pubmed
Table 2: Node classification accuracy.
Models | Cora (Split I, Split II) | ACM (Split I, Split II) | citeseer (Split I, Split II) | Pubmed (Split I, Split II)
GCN
SGC
FastGCN
GAT
DAGNN
SpikingGCN
SpikingGCNN
Baselines. We implement our proposed SpikingGCN and the following competitive baselines: GCN [22], SGC [44], FastGCN [6], GAT [40], and DAGNN [26]. We also conduct experiments on SpikingGCNN, a variant of SpikingGCN that uses the refined Heaviside activation function (8) instead. For a fair comparison, we partition the data in two different ways. The first is the same as [46], which is adopted by many existing baselines in the literature. In this split method (i.e., Split I), 20 instances from each class are sampled as the training set. In addition, 500 and 1000 instances are sampled as the validation and testing sets, respectively. For the second data split (i.e., Split II), the ratio of training to testing is 8:2, and a fraction of the training samples is further used for validation.
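For readers who want to reproduce a split of the first kind, the following sketch samples a fixed number of training nodes per class and then draws validation and test nodes from the remainder; it also includes the sparsity ratio used in Table 1. Random sampling of the validation and test nodes is an assumption made here (the split in [46] may use fixed indices), and the function names, seed and synthetic labels are hypothetical.

```python
import random
from collections import defaultdict


def split_per_class(labels, per_class=20, n_val=500, n_test=1000, seed=0):
    """Split node indices: per_class training nodes per label, then val and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train = []
    for y in sorted(by_class):
        train.extend(rng.sample(by_class[y], per_class))
    chosen = set(train)
    rest = [i for i in range(len(labels)) if i not in chosen]
    rng.shuffle(rest)                      # assumption: random val/test nodes
    return train, rest[:n_val], rest[n_val:n_val + n_test]


def sparsity(num_edges, num_nodes):
    """Number of edges divided by the square of the number of nodes."""
    return num_edges / num_nodes ** 2


# Synthetic labels for 3000 nodes in 3 balanced classes (illustrative only).
labels = [i % 3 for i in range(3000)]
train, val, test = split_per_class(labels)
print(len(train), len(val), len(test), sparsity(10, 10))
```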
Table 2 summarizes the node classification accuracy of SpikingGCN and the competing methods over the four datasets. We show the best results we can achieve for each dataset and make the following key observations: SpikingGCN achieves or matches SOTA results across the four benchmarks under both dataset split methods. It is worth noting that, when the dataset is randomly divided proportionally and SpikingGCN obtains enough data, it can even outperform the state-of-the-art approaches. For example, SpikingGCNN outperforms DAGNN on the citeseer dataset. A detailed discussion of the performance can be found in Appendix C.1.
3.2 Energy Efficiency on Neuromorphic Chips
To examine the energy efficiency of SpikingGCN, we propose two metrics: (i) the number of operations required to predict a node with each model, and (ii) the energy consumed by SpikingGCN on neuromorphic hardware versus other models on GPUs. In this experiment, only the basic SpikingGCN is evaluated. We omit SpikingGCNN because negative spikes cannot be implemented on neuromorphic hardware.
We note that training SNN models directly on neuromorphic chips is rarely explored [38]. Therefore, we run the training phase on GPUs and estimate the energy consumption of the test phase on neuromorphic hardware. More importantly, a specific feature of semi-supervised learning on GNNs is that test data is also visible during the training process. Therefore, the convolutional part during training covers the global graph. Then, during the test phase, no MAC operation is required by our SNN model because all of the data has been processed on GPUs.
Estimating the computation overhead relies on operations in the hardware [30]. The operation unit of ANNs on contemporary GPUs is usually the multiply-accumulate (MAC), and for SNNs on neuromorphic chips it is the synaptic operation (SOP). Furthermore, an SOP is defined as a change of membrane potential (i.e., voltage) in the LIF nodes, and the specific statistics in the experiment refer to voltage changes during the charge and fire processes. Following the quantification methods introduced in [17] and ensuring consistency between different network constraints, we compute the operations of the baselines and SpikingGCN to classify one node. Table 3 shows that SpikingGCN has a significant reduction in operations. According to the literature [16, 21], SOPs consume far less energy than MACs, which further highlights the energy efficiency of SpikingGCN.
Table 3: Operations required to classify one node.
models | Cora | ACM | citeseer | Pubmed
GCN | K | K | K | K
SGC | K | K | K | K
FastGCN | K | K | K | K
GAT | K | K | K | M
DAGNN | K | K | K | K
SpikingGCN | K | K | K | K
Table 4: Energy consumption when inferring 10,000 nodes in the Pubmed dataset.
GCN on TITAN
Power (W) | GFLOPS | Nodes | FLOPS | Energy (J)
280 | 16,310 | 10,000 | 4.14E+09 | 0.07
SpikingGCN on ROLLS
Voltage (V) | Energy/spike (pJ) | Nodes | Spikes | Energy (J)
1.8 | 3.7 | 10,000 | 2.73E+07 | 1.01E-04
However, the energy consumption measured by SOPs may be biased; e.g., zero spikes would also result in descending voltage changes, which do not require new energy consumption in neuromorphic chips [18]. Hence, calculating energy cost based only on operations may lead to an incorrect conclusion. To address this issue, we further provide an alternative estimation approach as follows. Neuromorphic designs can provide event-based computation by transmitting one-bit spikes between neurons. This characteristic contributes to the energy efficiency of SNNs because they consume energy only when needed [11]. For example, during the inference phase, the encoded sparse spike trains act as low-precision synapse events, which cost computation memory only once spikes are sent from a source neuron. Considering the above hardware characteristics and the deviation of SOPs in consumption calculation, we follow the spike-based approach utilized in [5] and count the overall spikes during inference for the 4 datasets to estimate the SNN energy consumption. We list an example of energy consumption when inferring 10,000 nodes in the Pubmed dataset, as shown in Table 4.
Applying the energy consumed by each spike or operation, in Appendix C.3, we visualize the energy consumption of SpikingGCN and GNNs when deployed on a recent neuromorphic chip (ROLLS [18]) and a GPU (TITAN RTX, 24G; https://www.nvidia.com/enus/deeplearningai/products/), respectively. Fig. 6 shows that SpikingGCN could use remarkably less energy than GNNs when deployed on ROLLS. For example, SpikingGCN could consume about 100 times less energy than GCN on all datasets. Note that, unlike GPUs, ROLLS was first introduced in 2015, and higher energy efficiency of SpikingGCN can be expected in the future.
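The Pubmed energy figures in Table 4 can be reproduced with simple arithmetic. The sketch below assumes that GPU energy is board power multiplied by the time needed to execute the listed floating-point operations at the listed throughput, and that neuromorphic energy is the spike count multiplied by the energy per spike. These formulas are an assumption that happens to match the tabulated values, and the function names are hypothetical.

```python
def gpu_energy_joules(power_w, gflops, flops):
    """Energy = power * time, with time = operations / throughput (assumed model)."""
    seconds = flops / (gflops * 1e9)
    return power_w * seconds


def snn_energy_joules(num_spikes, energy_per_spike_pj):
    """Energy = number of spikes * energy per spike (picojoules to joules)."""
    return num_spikes * energy_per_spike_pj * 1e-12


# Values taken from Table 4 (inference on 10,000 Pubmed nodes).
gcn_energy = gpu_energy_joules(power_w=280, gflops=16310, flops=4.14e9)
snn_energy = snn_energy_joules(num_spikes=2.73e7, energy_per_spike_pj=3.7)
print(f"GCN on GPU: {gcn_energy:.3f} J, SpikingGCN on ROLLS: {snn_energy:.2e} J")
```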
3.3 Extension to other application domains
In the above experiments, we adopt a basic encoding and decoding process, which can achieve competitive performance on the citation datasets. However, some other graph structures like image graphs and social networks cannot be directly processed using graph Laplacian regularization (i.e., [22, 44]). To tackle this compatibility issue, we extend our model and adapt it to graph embedding methods (i.e., [46]). Unlike graph Laplacian regularization methods such as GCNs, graph embedding methods always contain specific trainable parameters to incorporate the attributes in the graph structure. In this case, the Bernoulli encoder is unable to generate spike trains that perfectly represent the graph information. Taking the image graph as an example, the Bernoulli encoder cannot fully represent the pixels. Hence, the characteristics of the pixels' local Euclidean neighborhoods must be aggregated. We propose a trainable spike encoder to allow deeper SNNs for different tasks, including classification on grid images and superpixel images, and rating prediction in recommender systems. Due to space limitations, we leave the implementation details to Appendix C.4.
Result on grid images.
To validate the performance of SpikingGCN on image graphs, we first apply our model to the MNIST dataset [23]. The classification results of grid images on MNIST are summarized in Table 5. We choose several SOTA algorithms, including ANN and SNN models, that work on the MNIST dataset. The depth is calculated from the layers that include trainable parameters. Since we use a network structure similar to the Spiking CNN [25], the better result suggests that our clock-driven architecture is able to capture more significant patterns in the data flow. The competitive performance of our model on image classification also demonstrates SpikingGCN's compatibility with different graph scenarios.
Table 5: Classification results of grid images on MNIST.
Models | Type | Depth | Accuracy
SplineCNN [12] | ANN
LeNet5 [23] | ANN
LISNN [7] | SNN
Spiking CNN [25] | SNN
SResNet [16] | SNN
SpikingGCN (Ours) | SNN
Results on superpixel images. We select the MNIST superpixel dataset [31] for comparison with the grid experiment mentioned above. The results of the MNIST superpixel experiments are presented in Table 6. Since our goal is to demonstrate the generalization of our model to different scenarios, we only use 20 time steps to conduct this subgraph classification task and report the mean accuracy over 10 runs. It can be seen that SpikingGCN is readily compatible with different graph convolutional methods and obtains competitive performance through a biological mechanism.
Table 6: Results of the MNIST superpixel experiments.
Models | Accuracy
ChebNet [9]
MoNet [31]
SplineCNN [12]
SpikingGCN (Ours)
Table 7: Test RMSE scores with the MovieLens 100K dataset. Baseline numbers are taken from [39].
Models | RMSE Score
MC [3]
GMC [20]
GRALS [33]
sRGCNN [32]
GCMC [39]
SpikingGCN (Ours)
Results on recommender systems. We also evaluate our model with a rating matrix extracted from MovieLens 100K (https://grouplens.org/datasets/movielens/) and report the RMSE scores compared with other matrix completion baselines in Table 7. The comparable loss indicates that our proposed framework can also be employed in recommender systems. Because the purpose of this experiment is to demonstrate the applicability of SpikingGCN in recommender systems, we have not gone into depth on the design of a specific spike encoder. We leave this design to future work since it is not the focus of the current paper.
4 Conclusions
In this paper, we present SpikingGCN, a first-ever bio-fidelity and energy-efficient framework focusing on graph-structured data, which encodes the node representation and makes predictions with less energy consumption. In our basic model for citation networks, the encoded spike trains are processed by a simple linear layer combined with a neuron layer. We conduct extensive experiments on node classification with four public datasets. Compared with other SOTA approaches, we demonstrate that SpikingGCN achieves the best accuracy with the lowest computation cost and much-reduced energy consumption. Furthermore, SpikingGCN also exhibits great generalization when confronted with limited data. In our extended model for more graph scenarios, SpikingGCN also has the potential to compete with SOTA models on tasks from computer vision or recommender systems. Relevant results and discussions are presented to offer key insights into the working principle, which may stimulate future research on environmentally friendly and biological algorithms.
Acknowledgments
The research is supported by the Key-Area Research and Development Program of Guangdong Province (2020B010165003), the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515010831), the Guangzhou Basic and Applied Basic Research Foundation (No. 202102020881), the Tencent AI Lab RBFR2022017, and the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (No. 2017ZT07X355). Qi Yu is supported in part by an NSF IIS award IIS-1814450 and an ONR award N00014-18-1-2875. The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing any funding agency.
References
[1] (2020) Carbontracker: tracking and predicting the carbon footprint of training deep learning models. CoRR abs/2007.03051.
[2] (2007) Simulation of networks of spiking neurons: a review of tools and strategies. J. Comput. Neurosci. 23 (3), pp. 349–398.
[3] (2012) Exact matrix completion via convex optimization. Commun. ACM 55 (6), pp. 111–119.
[4] (2016) An analysis of deep neural network models for practical applications. CoRR abs/1605.07678.
[5] (2015) Spiking deep convolutional neural networks for energy-efficient object recognition. IJCV 113 (1), pp. 54–66.
[6] (2018) FastGCN: fast learning with graph convolutional networks via importance sampling. arXiv:1801.10247.
[7] (2020) LISNN: improving spiking neural networks with lateral interactions for robust object recognition. In IJCAI, pp. 1519–1525.
[8] (2002) Connected components in random graphs with given expected degree sequences. Annals of Combinatorics 6 (2), pp. 125–145.
[9] (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS, pp. 3837–3845.
[10] (2015) Unsupervised learning of digit recognition using spike-timing-dependent plasticity. Frontiers Comput. Neurosci. 9, pp. 99.
[11] (2016) Convolutional networks for fast, energy-efficient neuromorphic computing. Proceedings of the National Academy of Sciences 113 (41), pp. 11441–11446.
[12] (2018) SplineCNN: fast geometric deep learning with continuous B-spline kernels. In CVPR, pp. 869–877.
[13] (2002) Spiking neuron models: single neurons, populations, plasticity. Cambridge University Press.
[14] (2020) LightGCN: simplifying and powering graph convolution network for recommendation. In SIGIR, pp. 639–648.
[15] (2014) Predictive entropy search for efficient global optimization of black-box functions. In NIPS, pp. 918–926.
[16] (2018) Spiking deep residual network. arXiv:1805.01352.
[17] (2005) Floating point operations in matrix-vector calculus.
[18] (2015) Neuromorphic architectures for spiking deep neural networks. In IEDM, pp. 4–2.
[19]
(2018)
Hybrid macro/micro level backpropagation for training deep spiking neural networks
. In NIPS, Cited by: Appendix A.  [20] (2014) Matrix completion on graphs. CoRR abs/1408.1717. Cited by: Table 7.
 [21] (2020) Spikingyolo: spiking neural network for energyefficient object detection. In AAAI, pp. 11270–11277. Cited by: Appendix A, Appendix A, §2, §3.2.
 [22] (2016) Semisupervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §1, §2, §3.1, §3.3.
 [23] (1998) Gradientbased learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §3.3, Table 5.
 [24] (2018) Training deep spiking convolutional neural networks with stdpbased unsupervised pretraining followed by supervised finetuning. Frontiers in neuroscience 12, pp. 435. Cited by: Appendix A.
 [25] (2016) Training deep spiking neural networks using backpropagation. CoRR abs/1608.08782. Cited by: §3.3, Table 5.
 [26] (2020) Towards deeper graph neural networks. In SIGKDD, pp. 338–348. Cited by: §3.1.
 [27] (2020) Abstract interpretation based robustness certification for graph convolutional networks. In 24th ECAI, Cited by: Appendix A.
 [28] (2013) optimality for active learning on gaussian random fields. In NIPS, pp. 2751–2759. Cited by: §C.2.
 [29] (1997) Networks of spiking neurons: the third generation of neural network models. Neural networks 10 (9), pp. 1659–1671. Cited by: §1.
 [30] (2014) A million spikingneuron integrated circuit with a scalable communication network and interface. Science 345 (6197), pp. 668–673. Cited by: Appendix A, §1, §3.2.
 [31] (2017) Geometric deep learning on graphs and manifolds using mixture model cnns. In CVPR, pp. 5425–5434. Cited by: §3.3, Table 6.
 [32] (2017) Geometric matrix completion with recurrent multigraph neural networks. In NIPS, pp. 3697–3707. Cited by: Table 7.
 [33] (2015) Collaborative filtering with graph information: consistency and scalable methods. In NIPS, pp. 2107–2115. Cited by: Table 7.
 [34] (2019) Scaling deep spiking neural networks with binary stochastic activations. In ICCC, pp. 50–58. Cited by: §2.
 [35] (2017) Conversion of continuous-valued deep networks to efficient event-driven networks for image classification. Front. Neurosci. 11, pp. 682. Cited by: Appendix A.
 [36] (2019) Energy and policy considerations for deep learning in NLP. In ACL (1), pp. 3645–3650. Cited by: Appendix A.
 [37] (2019) Deep learning in spiking neural networks. Neural Networks 111, pp. 47–63. Cited by: Appendix A.
 [38] (2019) SpikeGrad: an ANN-equivalent computation model for implementing backpropagation with spikes. arXiv preprint arXiv:1906.00851. Cited by: §3.2.
 [39] (2017) Graph convolutional matrix completion. CoRR abs/1706.02263. Cited by: §C.4, Table 7.
 [40] (2017) Graph attention networks. arXiv:1710.10903. Cited by: Appendix A, §C.1, §3.1.
 [41] (2019) Neural graph collaborative filtering. In SIGIR, pp. 165–174. Cited by: §C.4.
 [42] (2019) Heterogeneous graph attention network. In WWW, pp. 2022–2032. Cited by: §3.1.
 [43] (2020) AM-GCN: adaptive multi-channel graph convolutional networks. In KDD, pp. 1243–1253. Cited by: Table 2.
 [44] (2019) Simplifying graph convolutional networks. In ICML, Cited by: Appendix A, §2, §2, §2, §3.1, §3.3.
 [45] (2020) Graph neural network and multi-view learning based mobile application recommendation in heterogeneous graphs. In 2020 IEEE (SCC), Cited by: Appendix A.

 [46] (2016) Revisiting semi-supervised learning with graph embeddings. In ICML, JMLR Workshop and Conference Proceedings, Vol. 48, pp. 40–48. Cited by: §3.1, §3.3.
 [47] (2020) Spike-timing-dependent back propagation in deep spiking neural networks. arXiv preprint arXiv:2003.11837. Cited by: Appendix A.
 [48] (2020) Graph neural networks: A review of methods and applications. AI Open 1, pp. 57–81. Cited by: §2.
Appendix A Related Work
Spiking Neural Networks
The fundamental SNN architecture consists of an encoder, spiking neurons, and interconnecting synapses with trainable parameters [37]. Together these components implement the integrate-and-fire (IF) process at the core of SNNs: each incoming spike changes the membrane potential of the receiving neuron; once the membrane potential reaches the threshold voltage, the neuron fires a spike and transmits it to the next neurons.
Some studies approximate the non-differentiable IF process with a differentiable function [19, 47]. Although this makes gradient descent and error backpropagation directly applicable to SNNs, the resulting learning phase closely resembles that of ANNs and remains computationally heavy. Another way to ease SNN training is ANN-to-SNN conversion, which reuses pre-trained weights. [21] take advantage of the weights of pre-trained ANNs to construct a spiking architecture for object recognition or detection. Although such conversions can be performed successfully, several operators of already trained ANNs are not fully compatible with SNNs [35]. As a result, SNNs constructed by a fully automatic conversion of arbitrary pre-trained ANNs cannot achieve comparable prediction performance.
Another popular way to build SNN models is the spike-timing-dependent plasticity (STDP) learning rule, where the synaptic weight is adjusted according to the interval between the pre- and postsynaptic spikes. [10] propose an unsupervised learning model that uses more biologically plausible components, such as conductance-based synapses and different STDP rules, to achieve competitive performance on the MNIST dataset. [24] introduce a pre-training scheme using biologically plausible unsupervised learning to better initialize the parameters in multi-layer systems. Although STDP models provide a closer match to biology for the learning process, how to achieve a higher-level function such as classification using supervised learning is still unsolved [5]. Besides, they can easily suffer from degraded prediction performance compared with supervised learning models.
Graph Neural Networks
Unlike a standard neural network, GNNs need to form a state that can extract the representation of a node from its neighborhood in an arbitrary graph [27]. In particular, GNNs use extracted node attributes and labels in graph networks to train model parameters in a specific scenario, such as citation networks, social networks, and protein-protein interactions (PPIs). GAT [40] has shown that learning the weights with an end-to-end neural network lets more important nodes receive larger weights. To improve the accuracy and reduce the complexity of GCNs, the derivative SGC [44] eliminates the nonlinearities and collapses the weight matrices between consecutive layers. FastGCN [6] reduces the variance and improves the performance by sampling a designated number of nodes for each convolutional layer. Nonetheless, because of their heavy computational cost, these convolutional GNN algorithms rely on high-performance computing systems to achieve fast inference on high-dimensional graph data. Since GCNs bridge the gap between spectral-based and spatial-based approaches [45], they offer desirable flexibility, extensibility, and architecture complexity. Thus, we adopt GCN-based feature processing to construct our basic SNN model.
Energy Consumption Estimation
An intuitive way to measure a model's energy consumption is to examine its actual electrical consumption. [36] propose to repeatedly query the NVIDIA System Management Interface (nvidia-smi: https://bit.ly/30sGEbi) to obtain the average energy consumption for training deep neural networks for natural language processing (NLP) tasks. [4] measure the average power draw required during inference on GPUs by using the Keysight 1146B Hall effect current probe. However, measuring the actual energy consumption requires very strict environment control (e.g., platform version and temperature) and might include the consumption of background programs, which makes the measurement inaccurate. Another promising approach is to estimate the model's energy consumption from the operations performed during training or inference. [1] develop a tool for calculating the operations of different neural network layers, which helps to track and predict the energy and carbon footprint of ANN models. Some SNN approaches [16, 21] assess the energy consumed by ANN and SNN models by multiplying the number of corresponding operations by the theoretical unit power consumption. This kind of method estimates the ideal energy consumption, excluding environmental disturbance. In addition, contemporary GPU platforms are much more mature than SNN platforms or neuromorphic chips [30, 18]. As a result, due to the technical restrictions of deploying SpikingGCN on neuromorphic chips, we theoretically estimate the energy consumption in the experimental section.
Appendix B Notation, algorithm and source code
We list the frequently used notation in Table 8. Algorithm 1 shows the detailed training process of the proposed SpikingGCN model. The source code can be accessed via https://anonymous.4open.science/r/SpikingGCN1527.
Notations  Descriptions
 Graph-structured data
 Single node in the graph
 Node set in the graph
 Number of nodes, number of classes, and feature dimension
 Edge weight between two nodes, scalar
 Adjacency matrix of the graph
 Feature vector of a single node
 Entire attribute matrix of the graph
 One-hot labels for each node
 Degree of a single node, scalar
 Diagonal matrix of the degree of each node
 New feature of a single node after convolution
 Entire attribute matrix after convolution
 Time step in the clock-driven SNNs
 Spike of one node generated by the encoder
 Spike of one node generated by the decoder
 Single feature value of a node, scalar
 Basic spike unit generated by a single feature value, equal to 0 or 1
 Membrane potential at a given time step in the decoder
 Trainable weight matrix
 Time constant, hyperparameter, scalar
 Signed reset voltage, hyperparameter, scalar
 Spiking threshold, hyperparameter, scalar
Appendix C Additional Experimental Results
We report additional experimental results that complement the ones reported in the main paper.
C.1 Discussion of Node Classification Experiments
The remarkable performance of the bio-fidelity SpikingGCN is attributed to three main reasons. First, as shown in Fig. 3, an appropriate setting of the parameter studied there enables our network to focus on the most relevant parts of the input representation when making a decision, similar to the attention mechanism [40]. Note that the optimal setting depends on the statistical patterns of each dataset. In other words, we can also view the Bernoulli encoder as a moderate max-pooling process on the graph features, where the salient representation of each node has a higher probability of becoming the input of the network. As a result, assigning varying importance to nodes enables SpikingGCN to make more effective predictions on the overall graph structure.
Second, based on our assumption, the majority of accurate predictions benefit from attribute integration. We simplify the network and make predictions using fewer parameters, which effectively reduces the chance of overfitting. The significant performance gain indicates the better generalization ability of neural inference trained with the simplified network, which validates the effectiveness of the bio-fidelity SpikingGCN. Last, the variant SpikingGCNN achieves better results than the original model on the Cora, ACM, and Citeseer datasets. As shown in Fig. 4, part of the negative voltages are converted into negative spikes by the Heaviside activation function. The negative spikes can play a suppressive role, since the spikes over all time steps are summed to calculate the fire ratio, which is more biologically plausible. However, this improvement has no apparent effect on Pubmed, which has the highest sparsity and the lowest number of attributes. Sparse input leads to sparse spikes and voltages, and negative spikes tend to provide overly dilute information because the hyperparameters (e.g., those of the Heaviside activation function) are more elusive.
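The view of the Bernoulli encoder as a sampling process can be illustrated with a minimal sketch. It assumes that node features are already normalized to [0, 1] and that each value is used directly as a per-step firing probability; the function name `bernoulli_encode` and the toy feature matrix are hypothetical and are not taken from the released source code.

```python
import numpy as np

def bernoulli_encode(features, num_steps, rng):
    """Sample a binary spike train of shape (num_steps, *features.shape).

    Assumption: every feature value lies in [0, 1] and is interpreted as
    the probability of emitting a spike at each time step.
    """
    probs = np.clip(features, 0.0, 1.0)
    # One independent Bernoulli draw per feature and per time step.
    draws = rng.random((num_steps,) + probs.shape)
    return (draws < probs).astype(np.float32)

rng = np.random.default_rng(0)
# Toy feature matrix: 2 nodes with 3 features each (illustrative values).
features = np.array([[0.0, 0.1, 0.9],
                     [0.5, 0.0, 1.0]])
spikes = bernoulli_encode(features, num_steps=2000, rng=rng)

# Averaging over time recovers an estimate of the original features,
# and salient (large) features spike far more often than small ones.
firing_rate = spikes.mean(axis=0)
```

Under these assumptions, zero-valued features never spike, while larger values dominate the spike train, which is one way to read the max-pooling analogy above.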
C.2 SpikingGCN for Active Learning
Table 9: ALC results on the Cora and ACM datasets.
Method  Cora  ACM
SOPT-SpikingGCN
SOPT-GCN
PE-SpikingGCN
PE-GCN
Random-SpikingGCN
Random-GCN
Based on the prediction results above, we are interested in SpikingGCN's performance when the number of training samples varies, especially when data is limited. Active learning shares a problem with semi-supervised learning: labels are rare and costly to obtain. The objective of active learning is to find an acquisition function that successively picks unlabeled data to label so as to optimize the prediction performance of the model. Thus, compared with selecting unlabeled data at random, active learning may substantially increase data efficiency and reduce cost. Meanwhile, active learning also provides a way to evaluate the generalization capability of models when data is scarce. Since SpikingGCN achieves a performance improvement with sufficient data, we are interested in how the prediction performance changes as the number of training samples increases.
Experiment Setup.
We apply SpikingGCN and GCN as the active learners and observe their performance. Three acquisition methods are considered. First, according to [28], the Σ-optimal (SOPT) acquisition function is model-agnostic because it depends only on the graph Laplacian to determine the order of unlabeled nodes. The second is the standard predictive entropy (PE) [15]. Last, we consider random sampling as the baseline. Starting with only one initial sample, the accuracy is reported periodically until 50 nodes are selected. Results are reported on both the Cora and ACM datasets.
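As a rough illustration of the predictive entropy (PE) strategy, the sketch below scores each node by the entropy of its predicted class distribution and picks the most uncertain unlabeled node. The function names and the toy probabilities are hypothetical; the actual learner would supply the probabilities (for SpikingGCN, for instance, normalized firing rates could play that role, which is an assumption here).

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Entropy of each row of class probabilities, shape (num_nodes,)."""
    return -np.sum(probs * np.log(probs + eps), axis=1)

def select_next_node(probs, labeled_mask):
    """Index of the unlabeled node with the highest predictive entropy."""
    scores = predictive_entropy(probs)
    # Already labeled nodes must never be selected again.
    scores = np.where(labeled_mask, -np.inf, scores)
    return int(np.argmax(scores))

# Toy predictions for 3 nodes and 3 classes (illustrative values only).
probs = np.array([[0.98, 0.01, 0.01],   # confident prediction
                  [0.34, 0.33, 0.33],   # almost uniform, most uncertain
                  [0.60, 0.30, 0.10]])
labeled = np.array([False, False, False])

first = select_next_node(probs, labeled)
labeled[first] = True
second = select_next_node(probs, labeled)
```

In a full active learning loop the model would be retrained after each selection, so the probabilities would change between the two calls; they are kept fixed here only to keep the example short.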
The Area under the Learning Curve (ALC) results are shown in Table 9 (ALC corresponds to the area under the learning curve and is constrained to have a maximum value of 1). We provide the active learning curves of SpikingGCN and GCN in Fig. 5, which are consistent with the statistics reported in Table 9. It can be seen that SOPT chooses the most informative nodes for both SpikingGCN and GCN. The PE acquisition function, in turn, is a moderate strategy for performance improvement. Finally, under the random strategy both models suffer from high variation in prediction as well as unstable behavior throughout the active learning process. However, no matter which strategy is adopted, SpikingGCN generalizes better than GCN when the training data is scarce.
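One plausible way to compute an ALC with a maximum value of 1 is to integrate the accuracy curve with the trapezoidal rule and divide by the width of the sampled range. This normalization is an assumption made for illustration, and the function name and example values are hypothetical.

```python
import numpy as np

def area_under_learning_curve(num_labeled, accuracy):
    """Normalized area under an accuracy-vs-labels curve.

    Assumption: accuracies lie in [0, 1], so dividing the trapezoidal area
    by the width of the x-range bounds the result by 1.
    """
    x = np.asarray(num_labeled, dtype=float)
    y = np.asarray(accuracy, dtype=float)
    # Trapezoidal rule written out explicitly.
    area = np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0)
    return float(area / (x[-1] - x[0]))

# A learner that is always correct reaches the maximum value.
alc_perfect = area_under_learning_curve([1, 10, 50], [1.0, 1.0, 1.0])

# Illustrative curve: accuracy after 1, 10, 20, and 50 labeled nodes.
alc_example = area_under_learning_curve([1, 10, 20, 50], [0.2, 0.5, 0.6, 0.7])
```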
C.3 Energy Efficiency Experiments
Fig. 6 shows the remarkable energy difference between SpikingGCN and GNN-based models. First, the sparse nature of graph datasets fits the spike-based encoding method: zero values in the node representations never trigger a synaptic event (spike) on a neuromorphic chip and therefore consume no energy. Second, our simplified network architecture contains only two main neuron layers: a single fully connected layer and an LIF layer. Consider Pubmed as an example. Few attributes and a sparse adjacency matrix result in sparse spikes, and the small number of classes (i.e., 3) also requires fewer neurons. These promising results imply that SpikingGCN has the potential to achieve even larger advantages in energy consumption over general GNNs.
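The operation-count estimate described in Appendix A (number of operations multiplied by a theoretical unit energy) can be sketched for a single fully connected layer. All names and numbers below are hypothetical: the unit energies are placeholders in arbitrary units, not measured hardware constants, and the layer size and spike count are illustrative.

```python
def ann_dense_ops(num_inputs, num_outputs):
    """Multiply-accumulate count of one dense layer for a single sample."""
    return num_inputs * num_outputs

def snn_dense_ops(spikes_per_step, num_outputs, num_steps):
    """Synaptic operations of a spiking dense layer.

    Each input spike triggers one accumulate per output neuron, so inputs
    that never spike contribute nothing.
    """
    return spikes_per_step * num_outputs * num_steps

def estimate_energy(num_ops, energy_per_op):
    """Ideal energy: operation count times a theoretical unit energy."""
    return num_ops * energy_per_op

# Placeholder unit energies in arbitrary units (assumptions, not data).
E_MAC = 1.0     # energy of one multiply-accumulate in an ANN
E_SYNOP = 0.1   # energy of one synaptic operation in an SNN

# Illustrative layer: 500 input features, 3 classes, 10 time steps,
# with an average of 5 input spikes per time step.
ann_ops = ann_dense_ops(num_inputs=500, num_outputs=3)
snn_ops = snn_dense_ops(spikes_per_step=5, num_outputs=3, num_steps=10)

ann_energy = estimate_energy(ann_ops, E_MAC)
snn_energy = estimate_energy(snn_ops, E_SYNOP)

# A node whose features are all zero emits no spikes and costs nothing.
silent_energy = estimate_energy(snn_dense_ops(0, 3, 10), E_SYNOP)
```

The sketch only shows why sparse inputs and a small output layer reduce the estimate; real comparisons depend on the unit energies of the target hardware.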
C.4 SpikingGCN on Other Application Domains
Results on image grids. The MNIST dataset contains 60,000 training samples and 10,000 testing samples of handwritten digits from 10 classes. Each image consists of a 28 × 28 grid of pixels, hence we treat each image as a node with 784 features. It is worth noting that grid image classification is analogous to node classification in citation networks, except that there is no adjacency matrix. To extend our model, we adopt traditional convolutional layers and provide a trainable spike encoder for graph embedding models; the extended framework is shown in Fig. 7. Since LIF neuron models contain a leak parameter that decays the membrane potential and triggers spikes only on a small scale, we adopt the Integrate-and-Fire (IF) process to maintain a suitable firing rate for the encoder. The membrane activity in the spike encoder can be formalized as:
(15) 
where the input term is the convolutional output at the current time step, and the remaining quantity is given in (8). As shown in Fig. 7, the convolutional layers combined with the IF neurons act as an autoencoder for the input graph data. After processing the spike trains, the fully connected layers combined with the LIF neurons generate the spike rates for each class, and we obtain the prediction result from the most active neurons.
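For intuition, a generic textbook IF neuron can be sketched as follows. This is not a reconstruction of Eq. (15): the hard reset to zero, the threshold of 1.0, and the constant toy input are assumptions, and the function name is hypothetical. The last two lines illustrate the "most active neuron" readout on the same toy output, whereas in the framework above that readout is applied to the LIF output layer.

```python
import numpy as np

def integrate_and_fire(inputs, threshold=1.0, v_reset=0.0):
    """Simulate IF neurons over inputs of shape (num_steps, num_neurons).

    Assumptions: no leak, spike when the potential reaches the threshold,
    and a hard reset to v_reset after each spike.
    """
    v = np.zeros(inputs.shape[1])
    spikes = np.zeros(inputs.shape, dtype=np.float64)
    for t, current in enumerate(inputs):
        v = v + current                   # integrate the input without decay
        fired = v >= threshold            # neurons that reach the threshold
        spikes[t] = fired
        v = np.where(fired, v_reset, v)   # reset only the neurons that fired
    return spikes

# Constant toy input for 3 neurons over 10 time steps (illustrative).
inputs = np.tile(np.array([0.5, 0.25, 0.0]), (10, 1))
spikes = integrate_and_fire(inputs)

# Firing rate per neuron and the index of the most active neuron.
rates = spikes.mean(axis=0)
predicted = int(np.argmax(rates))
```

Because the IF neuron has no leak, a stronger input translates directly into a proportionally higher firing rate, which is consistent with the motivation for using IF rather than LIF neurons in the encoder.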
Results on superpixel images.
Superpixel images are another, more complex graph structure. Compared with general grid images, superpixel images represent each picture as a graph of connected nodes. Hence the classification task is defined as prediction on subgraphs. Another important distinction is that superpixel images require constructing the connectivity between the chosen nodes. A comparison between grid and superpixel images is shown in Fig. 8, where 75 superpixels are used to represent the image.
One of the important steps when processing superpixel data is learning an effective graph embedding. To demonstrate the ability of our model to make predictions on superpixel images, we follow the convolutional approach used in SplineCNN [12] to further aggregate the connectivity of superpixels. The trainable kernel function based on B-splines makes the most of the local information in the graph and filters the input into a representative embedding. Similar to the framework in Fig. 7, the experiments on superpixel images follow the same structure as the grid image experiments: the convolutional layers and IF neurons produce the spike representations, and the fully connected layers and LIF neurons produce the classification results.
In addition, we provide a unique perspective for understanding the mechanism of our model. In particular, our spike encoder can be regarded as a sampling process using a spike train representation. The image graph scenario gives us an ideal opportunity to visualize the data processing in our model. For the experiments on grid and superpixel images, we extract the outputs of our spike encoder and visualize them in Fig. 9 (c), along with other observations. First, the Bernoulli encoder mentioned above can be viewed as a sampling process with respect to the pixel values. As the time step increases, the encoder almost rebuilds the original input. However, the static spike encoder cannot capture more useful features from the input data. Thus, our trainable encoder performs the convolution procedure and stimulates the IF neurons to fire spikes. As shown in Fig. 9 (b) and (c), by learning the convolutional parameters in the encoder, the spike encoder successfully detects the structural patterns and represents them in a discrete format.
Spike encoder for recommender systems.
Much research has tried to leverage graph-based methods for analyzing social networks [39, 41, 14]. To this end, we extend our framework to recommender systems, where users and items form a bipartite interaction graph for message passing. We treat rating prediction in recommender systems as a link classification problem. Starting with the MovieLens 100K dataset, we take the rating pairs between users and items as the input, transform them into suitable spike representations, and finally output the predicted class via the firing rate. To effectively model this graph-structured data, we build our trainable spike encoder on the convolutional method used in GCMC [39]. In particular, GCMC applies a simple but effective convolutional approach based on differentiable message passing on the bipartite interaction graph, and reconstructs the links using a bilinear decoder.