
Spiking Graph Convolutional Networks

Graph Convolutional Networks (GCNs) achieve impressive performance thanks to their remarkable ability to learn graph representations. However, deep GCNs require expensive computation, making them difficult to deploy on battery-powered devices. In contrast, Spiking Neural Networks (SNNs), which perform a bio-fidelity inference process, offer an energy-efficient neural architecture. In this work, we propose SpikingGCN, an end-to-end framework that integrates the embedding of GCNs with the bio-fidelity characteristics of SNNs. The original graph data are encoded into spike trains through the incorporation of graph convolution. We further model biological information processing with a fully connected layer combined with spiking neuron nodes. In a wide range of scenarios (e.g., citation networks, image graph classification, and recommender systems), our experimental results show that the proposed method achieves competitive performance against state-of-the-art approaches. Furthermore, we show that SpikingGCN on a neuromorphic chip brings a clear energy-efficiency advantage to graph data analysis, which demonstrates its great potential for building environmentally friendly machine learning models.


1 Introduction

Graph Neural Networks (GNNs), especially those using convolutional methods, have become a popular computational model for graph data analysis as high-performance computing systems have blossomed during the last decade. One of the best-known methods in GNNs is Graph Convolutional Networks (GCNs) [22], which learn a high-order approximation of spectral graph convolution by using convolutional layers followed by a nonlinear activation function to make the final prediction. Like most deep learning models, GCNs incorporate complex structures with a costly training and testing process, leading to significant power consumption. It has been reported that the computation resources consumed for deep learning grew roughly 300,000-fold from 2012 to 2018 [1]. The high energy consumption, coupled with sophisticated theoretical analysis and the blurred biological interpretability of the network, has resulted in a revival of effort in developing novel energy-efficient neural architectures and physical hardware.

Inspired by the brain-like computing process, Spiking Neural Networks (SNNs) formalize event- or clock-driven signals as the basis of inference, with a small set of parameters updating the neuron nodes [2]. Different from conventional deep learning models that communicate information using continuous decimal values, SNNs perform inexpensive computation by transmitting the input as discrete spike trains. Such a bio-fidelity method allows more intuitive and simpler inference and model training than traditional networks [29]. Another distinctive merit of SNNs is their intrinsic power efficiency on neuromorphic hardware, which is capable of running 1 million neurons and 256 million synapses with an energy cost of only 70 mW [30]. Nevertheless, employing SNNs as an energy-efficient architecture to process graph data as effectively as GCNs still faces fundamental challenges.

Challenges: (i) Spike representation. Despite promising results on common tasks (e.g., image classification), SNN models are not trivially portable to non-Euclidean domains such as graphs. Given the graph datasets widely used in many applications (e.g., citation networks and social networks), how to extract the graph structure and transform the graph data into spike trains poses a challenge. (ii) Model generalization. GCNs can be extended to diverse circumstances by using deeper layers. Thus, it is essential to further extend SNNs to the wider scope of applications where graphs are applicable. (iii) Energy efficiency. Beyond common metrics such as accuracy or prediction loss in artificial neural networks (ANNs), the energy efficiency of SNNs on neuromorphic chips is an important characteristic to consider. However, neuromorphic chips are not as advanced as contemporary GPUs, and the lack of uniform standards also hampers energy estimation across different platforms.

To tackle these fundamental challenges, we introduce the Spiking Graph Convolutional Network (SpikingGCN): an end-to-end framework that can properly encode graphs and make predictions for non-trivial graph datasets arising in diverse domains. To the best of our knowledge, SpikingGCN is the first SNN designed for node classification on graph data, and it can also be extended to more complex neural network structures. Overall, our main contribution is threefold: (i) We propose SpikingGCN, the first end-to-end model for node classification in SNNs, without any pre-training or conversion. The graph data are transformed into spike trains by a spike encoder, and the generated spikes are used to predict the classification results. (ii) We show that the basic model inspired by GCNs can effectively merge the convolutional features into spikes and achieve competitive predictive performance. In addition, we further evaluate the performance of our model in active learning and energy-efficiency settings. (iii) We extend our framework to enable more complex network structures for different tasks, including image graph classification and rating prediction in recommender systems. The extensibility of the proposed model also opens the gate to performing SNN-based inference and training on various kinds of graph-based data. The code and Appendix are available on GitHub: https://github.com/ZulunZhu/SpikingGCN.git.

2 Spiking Graph Neural Networks

Graphs are usually represented by a non-Euclidean data structure consisting of a set of nodes (vertices) and their relationships (edges). The reasoning process in the human brain depends heavily on the graph extracted from daily experience [48]. However, how to perform biologically interpretable reasoning for the standard graph neural networks has not been adequately investigated. Thus, the proposed SpikingGCN aims to address challenges of semi-supervised node classification in a biological and energy-efficient fashion. As this work refers to the methods in GNNs and SNNs, we list the frequently used notations in Table 8 in Appendix.

Graph neural networks (GNNs) conduct propagation guided by the graph structure, which is fundamentally different from existing SNN models that can only handle relatively simple image data. Instead of treating a single node as the input of an SNN model, the states of its neighborhood should also be considered. Let $\mathcal{G} = (\mathcal{V}, \mathbf{A})$ formally denote a graph, where $\mathcal{V}$ is the node set and $\mathbf{A} \in \mathbb{R}^{n \times n}$ represents the adjacency matrix. Here $n = |\mathcal{V}|$ is the number of nodes. The entire attribute matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$ includes the attribute vectors $\mathbf{x}_v$ of all nodes $v \in \mathcal{V}$. The degree matrix $\mathbf{D} = \mathrm{diag}(d_1, \dots, d_n)$ consists of the row sums of the adjacency matrix, $d_i = \sum_{j} a_{ij}$, where $a_{ij}$ denotes the edge weight between nodes $i$ and $j$. Each node has $d$ attribute dimensions. Our goal is to conduct SNN inference without neglecting the relationships between nodes.

Inference in SNN models is commonly conducted through the classic Leaky Integrate-and-Fire (LIF) mechanism [13]. Given the membrane potential $V_{t-1}$ at time step $t-1$, the time constant $\tau$, and the new pre-synaptic input $I_t$, the membrane potential activity is governed by:

$$\tau \frac{dV(t)}{dt} = -\bigl(V(t) - V_{\text{reset}}\bigr) + I(t), \qquad (1)$$

where $V_{\text{reset}}$ is the signed reset voltage. The differential form is widely used in the continuous domain, but the biological simulation in SNNs requires the implementation to be executed in a discrete and sequential way. Thus, we approximate the differential expression with an iterative version to guarantee computational availability. Updating $I_t$ with the input of our network, we can formalize (1) as:

$$V_t = V_{t-1} + \frac{1}{\tau}\Bigl(-\bigl(V_{t-1} - V_{\text{reset}}\bigr) + I_t\Bigr). \qquad (2)$$

To tackle the issue of feature propagation in an SNN model, we consider a spike encoder to extract the information in the graph and output the hidden state of each node in the format of spike trains. As shown in Fig. 1, the original input graph is transformed into the spikes from a convolution perspective. To predict the labels for each node, we consider a spike decoder and treat the final spike rate as a classification result.

Figure 1: Schematic view of the proposed SpikingGCN.

Graph convolution.

Graph data consist of two parts, the topological structure and the nodes' own features, which are stored in the adjacency and attribute matrices, respectively. Different from the processing of images with single-channel pixel features, the topological structure would be lost if only the node attributes were considered. To avoid the performance degradation of attributes-only encoding, SpikingGCN utilizes a graph convolution method inspired by GCNs to incorporate the topological information. The idea is to use the adjacency relationship to normalize the weights, so that nodes can selectively aggregate neighbor attributes. The convolution result, i.e., the node representations, serves as input to the subsequent spike encoder. Following the propagation mechanism of GCN [22] and SGC [44], we form the new representation of each node from its own attributes and those of its local neighborhood:

$$\mathbf{h}_v = \frac{1}{d_v + 1}\,\mathbf{x}_v + \sum_{u \in \mathcal{N}(v)} \frac{a_{uv}}{\sqrt{(d_v + 1)(d_u + 1)}}\,\mathbf{x}_u. \qquad (3)$$

Here, we can express the attribute transformation over the entire graph by:

$$\mathbf{H} = \Bigl(\tilde{\mathbf{D}}^{-\frac{1}{2}}\,\tilde{\mathbf{A}}\,\tilde{\mathbf{D}}^{-\frac{1}{2}}\Bigr)^{K}\mathbf{X}, \qquad (4)$$

where $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}$ is the adjacency matrix with added self-connections, $K$ is the number of graph convolution layers, and $\tilde{\mathbf{D}}$ is the degree matrix of $\tilde{\mathbf{A}}$. Similar to the simplified framework of SGC, we drop the non-linear operations and focus on the convolutional process over the entire graph. As a result, (4) acts as the only convolution operation in the spike encoder. While we incorporate the feature propagation explored by GCN and SGC, we would like to further highlight our novel contributions. First, our original motivation is to leverage an SNN-based framework to reduce the inference energy consumption of graph analysis tasks without performance degradation. GCN's effective graph Laplacian regularization approach allows us to minimize the number of trainable parameters and perform efficient inference in SNNs. Second, convolutional techniques only serve as the initial building block of SpikingGCN. More significantly, SpikingGCN is designed to accept the convolutional results in a binary form (spikes) and to further detect specific patterns among these spikes. This biological mechanism makes it suitable for deployment on a neuromorphic chip to improve energy efficiency.
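To make the feature propagation concrete, the following sketch precomputes $\mathbf{H} = (\tilde{\mathbf{D}}^{-1/2}\tilde{\mathbf{A}}\tilde{\mathbf{D}}^{-1/2})^{K}\mathbf{X}$ from (4); the function name and the use of SciPy sparse matrices are illustrative choices, not the released implementation.

```python
import numpy as np
import scipy.sparse as sp

def graph_convolution(adj: sp.spmatrix, features: np.ndarray, k: int = 2) -> np.ndarray:
    """Precompute H = (D^-1/2 (A + I) D^-1/2)^k X, as in Eq. (4)."""
    n = adj.shape[0]
    adj_tilde = adj + sp.eye(n)                        # add self-connections
    deg = np.asarray(adj_tilde.sum(axis=1)).flatten()  # degrees of A + I
    d_inv_sqrt = sp.diags(np.power(deg, -0.5))         # D^{-1/2}
    s = d_inv_sqrt @ adj_tilde @ d_inv_sqrt            # normalized propagation matrix
    h = features
    for _ in range(k):                                 # K propagation steps, no nonlinearity
        h = s @ h
    return np.asarray(h)
```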

Representation encoding.

The representation $\mathbf{h}_v$ consists of continuous floating-point values, but SNNs accept discrete spike signals. A spike encoder is therefore essential to take node representations as input and output spikes for the subsequent procedures. We propose a probability-based Bernoulli encoding scheme as the basic method to transform the node representations into spike signals. Let $o_{v,j}^{(t)}$ and $h_{v,j}$ denote the spike fed to the fully connected layer's neurons at the $t$-th time step and the $j$-th feature of the new representation of node $v$, respectively. Our hypothesis is that the spiking rate should keep a positive relationship with the importance of the patterns in the representations. In the probability-based Bernoulli encoder, the probability of firing a spike for each feature is related to the value of $h_{v,j}$ in the node representation as follows:

$$o_{v,j}^{(t)} \sim \mathrm{Bernoulli}\bigl(h_{v,j}\bigr), \qquad \Pr\bigl(o_{v,j}^{(t)} = 1\bigr) = h_{v,j}. \qquad (5)$$

Here, $o_{v,j}^{(t)} \in \{0, 1\}$ denotes a pre-synaptic spike, which takes a binary value (0 or 1). Note that $h_{v,j}$, derived from the convolution over neighbors, is positively correlated with the feature significance: the larger the value, the greater the chance of a spike being fired by the encoder. Since the encoder generates the spikes for each node on a tiny scale, we interpret the encoding module as a sampling process over the entire graph. To fully describe the information in the graph, we use $T$ time steps to repeat the sampling process. It is noteworthy that the number of time steps $T$ can be viewed as the resolution of the encoded message.
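As a minimal sketch of the Bernoulli encoding in (5), the snippet below samples $T$ binary spike tensors from the convolved features; clipping the values into $[0, 1]$ is our assumption to keep the Bernoulli probabilities valid.

```python
import torch

def bernoulli_encode(h: torch.Tensor, num_steps: int) -> torch.Tensor:
    """Sample spike trains o^(t) ~ Bernoulli(h) for t = 1..T, as in Eq. (5).

    h: node representations after graph convolution, shape (num_nodes, d).
    Returns a binary tensor of shape (num_steps, num_nodes, d).
    """
    p = h.clamp(0.0, 1.0)  # assumed normalization so the values are valid probabilities
    return (torch.rand(num_steps, *h.shape, device=h.device) < p).float()
```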

Charge, fire and reset in SpikingGCN.

The following module includes the fully connected layer and the LIF neuron layer. The fully connected layer takes spikes as input and outputs voltages according to trainable weights. The voltages charge the LIF neurons, which then carry out a series of actions, including firing spikes and resetting the membrane potential.

Potential charge. General deep SNN models adopt a multi-layer network structure including linear and nonlinear components to process the input [5]. Following SGC's assumption, depth is not critical for predicting unknown labels on the graph [44]. Thus, we drop redundant modules except for the final linear (fully connected) layer to simplify our framework and increase the inference speed, and we use the linear summation $I_t = \boldsymbol{\theta}^{\top}\mathbf{o}^{(t)}$ as the input of the SNN structure in (2). The LIF model includes a floating-point multiplication by the constant $\frac{1}{\tau}$, which is not biologically plausible. To address this challenge and avoid additional hardware requirements when deployed on neuromorphic chips, we write the decay factor as $k_{\tau} = 1 - \frac{1}{\tau}$, incorporate the constant $\frac{1}{\tau}$ into the synapse parameters $\boldsymbol{\theta}$, and simplify the equation (taking $V_{\text{reset}} = 0$ for simplicity) as:

$$V_t = k_{\tau} V_{t-1} + \boldsymbol{\theta}^{\top}\mathbf{o}^{(t)}. \qquad (6)$$

Fire and reset. In a biological neuron, a spike is fired when the accumulated membrane potential passes a spiking threshold $V_{\text{th}}$. In essence, the spikes after the LIF neurons are generated and increase the spike rate in the output layer. We adopt the Heaviside function

$$o_{\text{out}}^{(t)} = \Theta\bigl(V_t - V_{\text{th}}\bigr) = \begin{cases} 1, & V_t \ge V_{\text{th}}, \\ 0, & \text{otherwise}, \end{cases} \qquad (7)$$

to simulate the fundamental firing process. As shown in Fig. 2, which illustrates our framework, $T$ spike trains are generated for each node by the LIF neurons of the output layer (three in the illustrated example). Each neuron sums its spike count and divides it by $T$ to obtain its firing rate. For an example node from the ACM dataset, the training loss is the MSE between the vector of firing rates and the one-hot true label; in the testing phase, the predicted label is the class of the neuron with the maximum firing rate.

Figure 2: An illustration of SpikingGCN’s detailed framework
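To illustrate the rate-based decoding just described, the toy example below turns spike trains of one node into firing rates, an MSE training loss, and an argmax prediction; the spike values and the label are made up purely for illustration.

```python
import torch

T = 10                                              # number of time steps
# spike trains of one node over T steps for 3 output neurons (illustrative values)
spikes = torch.tensor([[1, 0, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0],
                       [1, 0, 0], [0, 0, 1], [1, 0, 0], [1, 0, 0], [1, 0, 0]]).float()
rates = spikes.sum(dim=0) / T                       # firing rate of each output neuron
target = torch.tensor([1.0, 0.0, 0.0])              # one-hot true label
loss = torch.nn.functional.mse_loss(rates, target)  # training objective
pred = rates.argmax().item()                        # predicted class at test time
```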

Negative voltages do not trigger spikes, yet they contain information that (7) ignores. To compensate for the negative term, we propose to use a negative threshold to distinguish the negative characteristics of the membrane potential. Inspired by [21], we adjust the Heaviside activation function after the neuron nodes as follows:

$$o_{\text{out}}^{(t)} = \begin{cases} 1, & V_t \ge V_{\text{th}}, \\ -1, & V_t \le -\gamma\, V_{\text{th}}, \\ 0, & \text{otherwise}, \end{cases} \qquad (8)$$

where $\gamma$ is the hyperparameter that determines the negative range. In the context of biological mechanisms, we interpret this modified activation function as excitatory and inhibitory processes in the neurons. By capturing more information and firing spikes in response to a wider variety of features, this more biologically reasonable modification also improves the performance of our model on classification tasks. The whole process is detailed in Algorithm 1 in Appendix B.

In biological neural systems, after firing a spike, a neuron rests its potential and starts to accumulate voltage again. We therefore reset the membrane potential after each emitted spike:

$$V_t \leftarrow V_{\text{reset}} \quad \text{if } \bigl|o_{\text{out}}^{(t)}\bigr| = 1. \qquad (9)$$
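The charge-fire-reset cycle of (6)-(9) can be sketched as a small PyTorch module; the class and parameter names are ours, and the signed threshold follows the form assumed for (8). During training, the hard comparisons would be replaced by the surrogate gradient introduced next.

```python
import torch
import torch.nn as nn

class LIFLayer(nn.Module):
    """Fully connected synapses followed by LIF neurons: charge (6), fire (7)/(8), reset (9)."""

    def __init__(self, in_dim, num_classes, tau=2.0, v_th=1.0, v_reset=0.0, gamma=None):
        super().__init__()
        self.fc = nn.Linear(in_dim, num_classes, bias=False)  # synapse parameters theta
        self.k_tau = 1.0 - 1.0 / tau                           # decay factor k_tau
        self.v_th, self.v_reset, self.gamma = v_th, v_reset, gamma

    def forward(self, spike_trains):                 # spike_trains: (T, N, in_dim), binary
        v = torch.zeros(spike_trains.shape[1], self.fc.out_features,
                        device=spike_trains.device)
        out = []
        for o_t in spike_trains:                     # iterate over the T time steps
            v = self.k_tau * v + self.fc(o_t)        # charge, Eq. (6)
            s = (v >= self.v_th).float()             # fire, Eq. (7)
            if self.gamma is not None:               # optional negative spikes, Eq. (8)
                s = s - (v <= -self.gamma * self.v_th).float()
            v = torch.where(s.abs() > 0,             # reset fired neurons, Eq. (9)
                            torch.full_like(v, self.v_reset), v)
            out.append(s)
        return torch.stack(out).mean(dim=0)          # firing rates used for prediction
```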

Gradient surrogate.

One of the most significant obstacles to training SNNs is the non-differentiable nature of the activation function (7), so the back-propagation algorithm cannot be employed directly during the training phase. Inspired by [34], the sigmoid function is adopted to approximate the firing phase when executing back-propagation and stochastic gradient descent. Thus, we formalize the surrogate function as follows:

$$\sigma(V_t) = \frac{1}{1 + e^{-\alpha\,(V_t - V_{\text{th}})}}, \qquad (10)$$

where $\alpha$ measures the approximation degree. When we train the model, back-propagation becomes feasible and the gradient of the Heaviside function can be replaced as follows:

$$\frac{\partial o_{\text{out}}^{(t)}}{\partial V_t} \approx \frac{\partial \sigma(V_t)}{\partial V_t} = \alpha\,\sigma(V_t)\bigl(1 - \sigma(V_t)\bigr). \qquad (11)$$
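A minimal PyTorch sketch of this surrogate-gradient trick is shown below: the forward pass keeps the hard Heaviside spike of (7), while the backward pass substitutes the sigmoid derivative of (11). The class name and the value of $\alpha$ are illustrative choices.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass, sigmoid surrogate gradient in the backward pass."""
    alpha = 5.0  # illustrative smoothing factor

    @staticmethod
    def forward(ctx, v_minus_th):
        ctx.save_for_backward(v_minus_th)
        return (v_minus_th >= 0).float()             # Eq. (7)

    @staticmethod
    def backward(ctx, grad_output):
        (v_minus_th,) = ctx.saved_tensors
        sig = torch.sigmoid(SurrogateSpike.alpha * v_minus_th)
        return grad_output * SurrogateSpike.alpha * sig * (1.0 - sig)  # Eq. (11)

# usage: spikes = SurrogateSpike.apply(v - v_th)
```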

Model feasibility analysis.

Since the input spikes can be viewed as a rough approximation of the original convolutional results of the graph, two key questions remain: (i) does the proposed method really work for the prediction task? (ii) how can we control the information loss of the sampled spikes compared with other GNN models? Although equation (6) shows a rough intuitive approximation of the graph input using a trainable linear combination, it alone cannot explain why SpikingGCN achieves comparable performance with real-valued GNN models. Next, we show that our spike representations are very close to the real-valued output with high probability.

To explain why SpikingGCN can provide an accurate output using spike trains, we take SGC [44] as the model for comparison. SGC adopts a framework similar to ours, feeding the convolution result into a fully connected layer in real-valued format. Given the parameters $\boldsymbol{\theta}$, the real-valued convolution result $\mathbf{h}$, and the spike representation $\mathbf{o}^{(t)}$ of one node, we have:

$$\mathbb{E}\bigl[\boldsymbol{\theta}^{\top}\mathbf{o}^{(t)}\bigr] = \sum_{j} \theta_j \Pr\bigl(o_j^{(t)} = 1\bigr) = \boldsymbol{\theta}^{\top}\mathbf{h}. \qquad (12)$$

Note that firing a spike in each dimension of a node is independent. Casting our case into a generalization of the Chernoff inequalities for the binomial distribution in [8], we derive the following estimation error bound.

Let $X_1, \dots, X_d$ be independent random variables with $\Pr(X_j = 1) = h_j$ and $\Pr(X_j = 0) = 1 - h_j$, consistent with equation (12). For $X = \sum_{j} \theta_j X_j$, we have $\mathbb{E}[X] = \sum_{j} \theta_j h_j$, and we define $\nu = \sum_{j} \theta_j^2 h_j$. Then we have

$$\Pr\bigl(X \le \mathbb{E}[X] - \lambda\bigr) \le e^{-\lambda^2 / (2\nu)}, \qquad (13)$$
$$\Pr\bigl(X \ge \mathbb{E}[X] + \lambda\bigr) \le e^{-\lambda^2 / \bigl(2(\nu + a\lambda/3)\bigr)}, \qquad (14)$$

where $a = \max\{\theta_1, \dots, \theta_d\}$. Note that $\mathbb{E}[X] = \boldsymbol{\theta}^{\top}\mathbf{h}$ is exactly the output of SGC's trainable layer, and $X = \boldsymbol{\theta}^{\top}\mathbf{o}^{(t)}$ is the output of SpikingGCN after the linear layer. By applying the upper and lower bounds, the probability that the two outputs deviate by more than $\lambda$ is at most the sum of the two exponential terms. For question (i) mentioned earlier, this guarantees that our spike representations approximate SGC's output within $\lambda$ with high probability. For question (ii), we employ regularization and parameter clipping during our experiments to control $\nu$ and $a$ respectively, which further tightens the upper and lower bounds.
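As a quick empirical sanity check of (12), the standalone snippet below compares the real-valued projection $\boldsymbol{\theta}^{\top}\mathbf{h}$ with the average projection of Bernoulli spike samples; all tensors are randomly generated for illustration only.

```python
import torch

torch.manual_seed(0)
d, T = 1433, 100                          # feature dimension and number of time steps (illustrative)
h = torch.rand(d)                         # stand-in convolved features in [0, 1]
theta = torch.randn(d) * 0.01             # stand-in trained weights of one output neuron

real_output = torch.dot(theta, h)                     # SGC-style real-valued output
spikes = (torch.rand(T, d) < h).float()               # T Bernoulli spike samples, Eq. (5)
spike_output = (spikes @ theta).mean()                # averaged SpikingGCN-style output

print(f"real: {real_output:.4f}, spike average over T steps: {spike_output:.4f}")
```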

3 Experiments

To evaluate the effectiveness of the proposed SpikingGCN, we conduct extensive experiments that focus on four major objectives: (i) semi-supervised node classification on citation graphs, (ii) performance evaluation under limited training data in active learning, (iii) energy efficiency evaluation on neuromorphic chips, and (iv) extensions to other application domains. Due to space limitations, we defer the active learning experiments to Appendix C.2.

3.1 Semi-supervised Node Classification

Datasets.

For node classification, we test our model on four commonly used citation network datasets: Cora, citeseer, ACM, and Pubmed [42], where nodes represent papers and edges represent citation links. The statistics of the four datasets are summarized in Table 1. Sparsity refers to the number of edges divided by the square of the number of nodes.

Table 1: Statistics of the citation network datasets (number of nodes, edges, attributes, classes, and sparsity for Cora, ACM, citeseer, and Pubmed).
Table 2: Test accuracy (%) comparison of different methods (GCN, SGC, FastGCN, GAT, DAGNN, SpikingGCN, and SpikingGCN-N) under Split I and Split II on Cora, ACM, citeseer, and Pubmed. Results from the literature and from our experiments are provided. The literature statistics for the ACM dataset are taken from [43]. All results are averaged over 10 runs. The top 2 results are boldfaced.

Baselines. We implement our proposed SpikingGCN and the following competitive baselines: GCN [22], SGC [44], FastGCN [6], GAT [40], and DAGNN [26]. We also evaluate SpikingGCN-N, a variant of SpikingGCN that uses the refined Heaviside activation function (8) instead of (7). For a fair comparison, we partition the data in two different ways. The first follows [46] and is adopted by many existing baselines in the literature. In this split (Split I), 20 instances from each class are sampled as the training set, and 500 and 1000 instances are sampled as the validation and testing sets, respectively. For the second data split (Split II), the ratio of training to testing is 8:2, and a portion of the training samples is further used for validation.

Table 2 summarizes the node classification accuracy of the competing methods over the four datasets. We report the best results we can achieve for each dataset and make the following key observations: SpikingGCN achieves or matches state-of-the-art results across the four benchmarks under both dataset split methods. It is worth noting that, when the dataset is randomly divided proportionally and SpikingGCN obtains enough data, it can even outperform the state-of-the-art approaches; for example, SpikingGCN-N outperforms DAGNN on the citeseer dataset. A detailed discussion of the performance can be found in Appendix C.1.

3.2 Energy Efficiency on Neuromorphic Chips

To examine the energy efficiency of SpikingGCN, we use two metrics: (i) the number of operations each model requires to predict a node, and (ii) the energy consumed by SpikingGCN on neuromorphic hardware versus the other models on GPUs. In this experiment, only the basic SpikingGCN is evaluated; we omit SpikingGCN-N because its negative spikes cannot be implemented on neuromorphic hardware.

We note that training SNN models directly on neuromorphic chips is rarely explored [38]. We therefore run the training phase on GPUs and estimate the energy consumption of the test phase on neuromorphic hardware. More importantly, a specific feature of semi-supervised learning on GNNs is that the test data are also visible during the training process, so the convolutional part computed during training covers the global graph. During the test phase, our SNN model then requires no MAC operations, because the convolution has already been performed on GPUs.

Estimating the computation overhead relies on counting operations in hardware [30]. The operation unit of ANNs on contemporary GPUs is usually the multiply-accumulate (MAC), while for SNNs on a neuromorphic chip it is the synaptic operation (SOP). A SOP is defined as a change of membrane potential (i.e., voltage) in the LIF nodes, and the statistics in this experiment count voltage changes during the charge and fire processes. Following the quantification method introduced in [17] and ensuring consistency across the different network settings, we compute the operations required by the baselines and SpikingGCN to classify one node. Table 3 shows that SpikingGCN yields a significant reduction in operations. According to the literature [16, 21], SOPs consume far less energy than MACs, which further highlights the energy efficiency of SpikingGCN.

Table 3: Operations comparison — the number of operations (in thousands; millions for GAT on Pubmed) required by GCN, SGC, FastGCN, GAT, DAGNN, and SpikingGCN to classify one node on Cora, ACM, citeseer, and Pubmed.

Table 4: Energy consumption comparison for inferring 10,000 Pubmed nodes.

GCN on TITAN RTX
Power (W)   GFLOPS   Nodes    FLOPs      Energy (J)
280         16,310   10,000   4.14E+09   0.07

SpikingGCN on ROLLS
Voltage (V)   Energy/spike (pJ)   Nodes    Spikes     Energy (J)
1.8           3.7                 10,000   2.73E+07   1.01E-04

However, energy consumption measured by SOPs may be biased; for example, zero spikes still produce descending voltage changes, which do not require additional energy on neuromorphic chips [18]. Hence, calculating the energy cost only from operation counts may lead to an incorrect conclusion. To address this issue, we provide an alternative estimation approach as follows. Neuromorphic designs provide event-based computation by transmitting one-bit spikes between neurons. This characteristic contributes to the energy efficiency of SNNs because energy is consumed only when needed [11]. For example, during the inference phase, the encoded sparse spike trains act as low-precision synaptic events, which cost computation and memory only when a spike is sent from a source neuron. Considering these hardware characteristics and the bias of SOP-based estimation, we follow the spike-based approach of [5] and count the overall spikes during inference on the four datasets to estimate the SNN energy consumption. Table 4 lists an example of the energy consumed when inferring 10,000 nodes of the Pubmed dataset.
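The estimates in Table 4 reduce to two simple products, reproduced below so the arithmetic is explicit; the formulas follow the counting approach described above, with the table's reported figures plugged in.

```python
# GPU side: energy ~= (total FLOPs / peak throughput) * power draw
flops, gflops_peak, power_w = 4.14e9, 16_310, 280
gpu_energy_j = flops / (gflops_peak * 1e9) * power_w      # ~= 0.07 J for 10,000 Pubmed nodes

# Neuromorphic side: energy ~= number of spikes * energy per spike
num_spikes, energy_per_spike_pj = 2.73e7, 3.7
snn_energy_j = num_spikes * energy_per_spike_pj * 1e-12   # ~= 1.0e-4 J on ROLLS

print(f"GPU: {gpu_energy_j:.3f} J, neuromorphic: {snn_energy_j:.2e} J")
```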

Applying the energy consumed by each spike or operation, in Appendix C.3 we visualize the energy consumption of SpikingGCN and the GNN baselines when deployed on a recent neuromorphic chip (ROLLS [18]) and a GPU (TITAN RTX, 24 GB; https://www.nvidia.com/en-us/deep-learning-ai/products/), respectively. Fig. 6 shows that SpikingGCN uses remarkably less energy than the GNNs when deployed on ROLLS; for example, it saves roughly 100 times the energy of GCN on all datasets. Note that, unlike modern GPUs, ROLLS was first introduced in 2015, so even higher energy efficiency of SpikingGCN can be expected in the future.

3.3 Extension to other application domains

In the above experiments, we adopt a basic encoding and decoding process, which achieves competitive performance on the citation datasets. However, some other graph structures, such as image graphs and social networks, cannot be directly processed with graph Laplacian regularization (i.e., [22, 44]). To tackle this compatibility issue, we extend our model to accommodate graph embedding methods (i.e., [46]). Different from graph Laplacian regularization methods such as GCNs, graph embedding methods contain specific trainable parameters to incorporate the attributes of the graph structure. In this case, the Bernoulli encoder is unable to generate spike trains that adequately represent the graph information. Taking image graphs as an example, the Bernoulli encoder cannot fully represent the pixels; the characteristics of the pixels' local Euclidean neighborhoods must also be aggregated. We therefore propose a trainable spike encoder, which allows deeper SNNs for different tasks, including classification of grid images and superpixel images and rating prediction in recommender systems. Limited by space, we leave the implementation details to Appendix C.4.

Result on grid images.

To validate the performance of SpikingGCN on image graphs, we first apply our model to the MNIST dataset [23]. The classification results on MNIST grid images are summarized in Table 5. We choose several state-of-the-art ANN and SNN models that work on MNIST. The depth is counted by the number of layers containing trainable parameters. Since we use a network structure similar to the Spiking CNN [25], the better result indicates that our clock-driven architecture is able to capture more significant patterns in the data flow. The competitive performance of our model on image classification also demonstrates SpikingGCN's compatibility with different graph scenarios.

Models Type Depth Accuracy
SplineCNN [12] ANN
LeNet5 [23] ANN
LISNN [7] SNN
Spiking CNN [25] SNN
S-ResNet [16] SNN
SpikingGCN (Ours) SNN
Table 5: Test accuracy (%) comparison on MNIST. The best results are boldfaced.

Results on superpixel images. We select the MNIST superpixel dataset [31] for comparison with the grid-image experiment above. The results of the MNIST superpixel experiments are presented in Table 6. Since our goal is to demonstrate the generalization of our model to different scenarios, we use only 20 time steps for this subgraph classification task and report the mean accuracy over 10 runs. SpikingGCN is readily compatible with different graph convolutional methods and obtains competitive performance through a biological mechanism.

Models Accuracy
ChebNet [9]
MoNet [31]
SplineCNN [12]
SpikingGCN (Ours)
Table 6: Test accuracy comparison on MNIST superpixels. The best results are boldfaced. Baseline numbers are taken from [12].
Models RMSE Score
MC [3]
GMC [20]
GRALS [33]
sRGCNN [32]
GC-MC [39]
SpikingGCN (Ours)
Table 7: Test RMSE scores on the MovieLens 100K dataset. Baseline numbers are taken from [39].

Results on recommender systems. We also evaluate our model on a rating matrix extracted from MovieLens 100K (https://grouplens.org/datasets/movielens/) and report the RMSE scores compared with other matrix completion baselines in Table 7. The comparable loss indicates that our proposed framework can also be employed in recommender systems. Because the purpose of this experiment is to demonstrate the applicability of SpikingGCN to recommender systems, we have not gone into depth on the design of a specific spike encoder; we leave this design to future work, since it is not the focus of the current paper.

4 Conclusions

In this paper, we present SpikingGCN, the first bio-fidelity and energy-efficient SNN framework for graph-structured data, which encodes node representations and makes predictions with low energy consumption. In our basic model for citation networks, the encoded spike trains are processed by a simple linear layer combined with a neuron layer. We conduct extensive experiments on node classification with four public datasets. Compared with other state-of-the-art approaches, SpikingGCN achieves the best accuracy with the lowest computation cost and much-reduced energy consumption. Furthermore, SpikingGCN exhibits strong generalization when confronted with limited data. In our extended model for more graph scenarios, SpikingGCN also competes with state-of-the-art models on tasks from computer vision and recommender systems. The relevant results and discussions offer key insights into the working principle, which may stimulate future research on environmentally friendly and biologically inspired algorithms.

Acknowledgments

The research is supported by the Key-Area Research and Development Program of Guangdong Province (2020B010165003), the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515010831), the Guangzhou Basic and Applied Basic Research Foundation (No. 202102020881), the Tencent AI Lab RBFR2022017, and the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (No. 2017ZT07X355). Qi Yu is supported in part by an NSF IIS award IIS-1814450 and an ONR award N00014-18-1-2875. The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing any funding agency.

References

  • [1] L. F. W. Anthony, B. Kanding, and R. Selvan (2020) Carbontracker: tracking and predicting the carbon footprint of training deep learning models. CoRR abs/2007.03051. Cited by: Appendix A, §1.
  • [2] R. Brette, M. Rudolph, T. Carnevale, et al. (2007) Simulation of networks of spiking neurons: a review of tools and strategies. J. Comput. Neurosci. 23 (3), pp. 349–398. Cited by: §1.
  • [3] E. J. Candès and B. Recht (2012) Exact matrix completion via convex optimization. Commun. ACM 55 (6), pp. 111–119. Cited by: Table 7.
  • [4] A. Canziani, A. Paszke, and E. Culurciello (2016) An analysis of deep neural network models for practical applications. CoRR abs/1605.07678. Cited by: Appendix A.
  • [5] Y. Cao, Y. Chen, and D. Khosla (2015) Spiking deep convolutional neural networks for energy-efficient object recognition. IJCV 113 (1), pp. 54–66. Cited by: Appendix A, §2, §3.2.
  • [6] J. Chen, T. Ma, and C. Xiao (2018) Fastgcn: fast learning with graph convolutional networks via importance sampling. arXiv:1801.10247. Cited by: Appendix A, §3.1.
  • [7] X. Cheng, Y. Hao, J. Xu, and B. Xu (2020) LISNN: improving spiking neural networks with lateral interactions for robust object recognition. In IJCAI, pp. 1519–1525. Cited by: Table 5.
  • [8] F. Chung and L. Lu (2002) Connected components in random graphs with given expected degree sequences. Annals of combinatorics 6 (2), pp. 125–145. Cited by: §2.
  • [9] M. Defferrard, X. Bresson, and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS, pp. 3837–3845. Cited by: Table 6.
  • [10] P. U. Diehl and M. Cook (2015) Unsupervised learning of digit recognition using spike-timing-dependent plasticity. Frontiers Comput. Neurosci. 9, pp. 99. Cited by: Appendix A.
  • [11] S. K. Esser, P. A. Merolla, J. V. Arthur, et al. (2016) Convolutional networks for fast, energy-efficient neuromorphic computing. Proceedings of the national academy of sciences 113 (41), pp. 11441–11446. Cited by: §3.2.
  • [12] M. Fey, J. E. Lenssen, F. Weichert, and H. Müller (2018) SplineCNN: fast geometric deep learning with continuous b-spline kernels. In CVPR, pp. 869–877. Cited by: §C.4, Table 5, Table 6.
  • [13] W. Gerstner and W. M. Kistler (2002) Spiking neuron models: single neurons, populations, plasticity. Cambridge university press. Cited by: §2.
  • [14] X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, and M. Wang (2020) LightGCN: simplifying and powering graph convolution network for recommendation. In SIGIR, pp. 639–648. Cited by: §C.4.
  • [15] J. M. Hernández-Lobato, M. W. Hoffman, and Z. Ghahramani (2014) Predictive entropy search for efficient global optimization of black-box functions. In NIPS, pp. 918–926. Cited by: §C.2.
  • [16] Y. Hu, H. Tang, Y. Wang, and G. Pan (2018) Spiking deep residual network. arXiv:1805.01352. Cited by: Appendix A, §3.2, Table 5.
  • [17] R. Hunger (2005) Floating point operations in matrix-vector calculus. Cited by: §3.2.
  • [18] G. Indiveri, F. Corradi, and N. Qiao (2015) Neuromorphic architectures for spiking deep neural networks. In IEDM, pp. 4–2. Cited by: Appendix A, §3.2, §3.2.
  • [19] Y. Jin, W. Zhang, and P. Li (2018) Hybrid macro/micro level backpropagation for training deep spiking neural networks. In NIPS, Cited by: Appendix A.
  • [20] V. Kalofolias, X. Bresson, M. M. Bronstein, and P. Vandergheynst (2014) Matrix completion on graphs. CoRR abs/1408.1717. Cited by: Table 7.
  • [21] S. Kim, S. Park, B. Na, and S. Yoon (2020) Spiking-yolo: spiking neural network for energy-efficient object detection. In AAAI, pp. 11270–11277. Cited by: Appendix A, Appendix A, §2, §3.2.
  • [22] T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §1, §2, §3.1, §3.3.
  • [23] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §3.3, Table 5.
  • [24] C. Lee, P. Panda, G. Srinivasan, and K. Roy (2018) Training deep spiking convolutional neural networks with stdp-based unsupervised pre-training followed by supervised fine-tuning. Frontiers in neuroscience 12, pp. 435. Cited by: Appendix A.
  • [25] J. Lee, T. Delbrück, and M. Pfeiffer (2016) Training deep spiking neural networks using backpropagation. CoRR abs/1608.08782. Cited by: §3.3, Table 5.
  • [26] M. Liu, H. Gao, and S. Ji (2020) Towards deeper graph neural networks. In SIGKDD, pp. 338–348. Cited by: §3.1.
  • [27] Y. Liu, J. Peng, L. Chen, and Z. Zheng (2020) Abstract interpretation based robustness certification for graph convolutional networks. In 24th ECAI, Cited by: Appendix A.
  • [28] Y. Ma, R. Garnett, and J. G. Schneider (2013) Σ-optimality for active learning on gaussian random fields. In NIPS, pp. 2751–2759. Cited by: §C.2.
  • [29] W. Maass (1997) Networks of spiking neurons: the third generation of neural network models. Neural networks 10 (9), pp. 1659–1671. Cited by: §1.
  • [30] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, et al. (2014) A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345 (6197), pp. 668–673. Cited by: Appendix A, §1, §3.2.
  • [31] F. Monti, D. Boscaini, J. Masci, E. Rodolà, J. Svoboda, and M. M. Bronstein (2017) Geometric deep learning on graphs and manifolds using mixture model cnns. In CVPR, pp. 5425–5434. Cited by: §3.3, Table 6.
  • [32] F. Monti, M. M. Bronstein, and X. Bresson (2017) Geometric matrix completion with recurrent multi-graph neural networks. In NIPS, pp. 3697–3707. Cited by: Table 7.
  • [33] N. Rao, H. Yu, P. Ravikumar, and I. S. Dhillon (2015) Collaborative filtering with graph information: consistency and scalable methods. In NIPS, pp. 2107–2115. Cited by: Table 7.
  • [34] D. Roy, I. Chakraborty, and K. Roy (2019) Scaling deep spiking neural networks with binary stochastic activations. In ICCC, pp. 50–58. Cited by: §2.
  • [35] B. Rueckauer, I. Lungu, Y. Hu, M. Pfeiffer, and S. Liu (2017) Conversion of continuous-valued deep networks to efficient event-driven networks for image classification. Front. Neurosci. 11, pp. 682. Cited by: Appendix A.
  • [36] E. Strubell, A. Ganesh, and A. McCallum (2019) Energy and policy considerations for deep learning in NLP. In ACL (1), pp. 3645–3650. Cited by: Appendix A.
  • [37] A. Tavanaei, M. Ghodrati, S. R. Kheradpisheh, T. Masquelier, and A. Maida (2019) Deep learning in spiking neural networks. Neural Networks 111, pp. 47–63. Cited by: Appendix A.
  • [38] J. C. Thiele, O. Bichler, and A. Dupret (2019) Spikegrad: an ann-equivalent computation model for implementing backpropagation with spikes. arXiv preprint arXiv:1906.00851. Cited by: §3.2.
  • [39] R. van den Berg, T. N. Kipf, and M. Welling (2017) Graph convolutional matrix completion. CoRR abs/1706.02263. Cited by: §C.4, Table 7.
  • [40] P. Veličković, G. Cucurull, A. Casanova, et al. (2017) Graph attention networks. arXiv:1710.10903. Cited by: Appendix A, §C.1, §3.1.
  • [41] X. Wang, X. He, M. Wang, F. Feng, and T. Chua (2019) Neural graph collaborative filtering. In SIGIR, pp. 165–174. Cited by: §C.4.
  • [42] X. Wang, H. Ji, C. Shi, B. Wang, Y. Ye, P. Cui, and P. S. Yu (2019) Heterogeneous graph attention network. In WWW, pp. 2022–2032. Cited by: §3.1.
  • [43] X. Wang, M. Zhu, D. Bo, P. Cui, C. Shi, and J. Pei (2020) AM-GCN: adaptive multi-channel graph convolutional networks. In KDD, pp. 1243–1253. Cited by: Table 2.
  • [44] F. Wu, A. H. Souza Jr, T. Zhang, C. Fifty, T. Yu, and K. Q. Weinberger (2019) Simplifying graph convolutional networks. In ICML, Cited by: Appendix A, §2, §2, §2, §3.1, §3.3.
  • [45] F. Xie, Z. Cao, Y. Xu, L. Chen, and Z. Zheng (2020) Graph neural network and multi-view learning based mobile application recommendation in heterogeneous graphs. In 2020 IEEE (SCC), Cited by: Appendix A.
  • [46] Z. Yang, W. W. Cohen, and R. Salakhutdinov (2016) Revisiting semi-supervised learning with graph embeddings. In ICML, JMLR Workshop and Conference Proceedings, Vol. 48, pp. 40–48. Cited by: §3.1, §3.3.
  • [47] M. Zhang, J. Wang, Z. Zhang, et al. (2020) Spike-timing-dependent back propagation in deep spiking neural networks. arXiv preprint arXiv:2003.11837. Cited by: Appendix A.
  • [48] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun (2020) Graph neural networks: A review of methods and applications. AI Open 1, pp. 57–81. Cited by: §2.

Appendix A Related Work

Spiking Neural Networks

The fundamental SNN architecture includes the encoder, spiking neurons, and interconnecting synapses with trainable parameters [37]. These procedures contribute to the substantial integrate-and-fire (IF) process in SNNs: any coming spikes lead to the change of the membrane potential in the neuron nodes; once membrane potentials reach the threshold voltage, the neuron nodes fire spikes and transmit the messages into their next nodes.

Some studies have developed training methods along with a surrogate function to approximate the non-differentiable IF process [19, 47]. Although gradient descent and error back-propagation become directly applicable to SNNs in that way, a learning phase strongly tied to ANNs still imposes a heavy computational burden. Another approach to alleviate the difficulty of training SNNs is ANN-to-SNN conversion using pre-trained neuron weights. [21] take advantage of the weights of pre-trained ANNs to construct a spiking architecture for object recognition and detection. Although such conversions can be performed successfully, several operators of already trained ANNs are not fully compatible with SNNs [35]. As a result, SNNs constructed from a fully automatic conversion of arbitrary pre-trained ANNs are often unable to achieve comparable prediction performance.

Another popular way to build SNN models is the spike-timing-dependent plasticity (STDP) learning rule, where a synaptic weight is adjusted according to the interval between the pre- and postsynaptic spikes. [10] propose an unsupervised learning model that utilizes more biologically plausible components, such as conductance-based synapses and different STDP rules, to achieve competitive performance on the MNIST dataset. [24] introduce a pre-training scheme using biologically plausible unsupervised learning to better initialize the parameters in multi-layer systems. Although STDP models provide a closer match to biology for the learning process, how to achieve higher-level functions such as classification with supervised learning remains unsolved [5]. Besides, STDP can easily suffer from prediction performance degradation compared with supervised learning models.

Graph neural networks.

Unlike a standard neural network, GNNs need to form a state that can extract the representation of a node from its neighborhood in an arbitrary graph [27]. In particular, GNNs utilize extracted node attributes and labels to train model parameters for a specific scenario, such as citation networks, social networks, or protein-protein interactions (PPIs). GAT [40] has shown that learning attention weights via an end-to-end neural network lets more important nodes receive larger weights. To improve accuracy while reducing the complexity of GCNs, the derived SGC [44] eliminates the nonlinearities and collapses the weight matrices between consecutive layers. FastGCN [6] reduces the variance and improves performance by sampling a designated number of nodes for each convolutional layer. Nonetheless, these convolutional GNN algorithms rely on high-performance computing systems to achieve fast inference on high-dimensional graph data due to their heavy computational cost. Since GCNs bridge the gap between spectral-based and spatial-based approaches [45], they offer a desirable balance of flexibility, extensibility, and architectural complexity. Thus, we adopt GCN-based feature processing to construct our basic SNN model.

Energy consumption estimation.

An intuitive measurement of a model's energy consumption is to measure the actual electrical consumption. [36] propose to repeatedly query the NVIDIA System Management Interface (nvidia-smi: https://bit.ly/30sGEbi) to obtain the average energy consumption of training deep neural networks for natural language processing (NLP) tasks. [4] measure the average power draw required during inference on GPUs using a Keysight 1146B Hall effect current probe. However, querying the actual energy consumption requires very strict environment control (e.g., platform version and temperature) and might include the consumption of background programs, which results in inaccurate measurements. Another promising approach is to estimate a model's energy consumption from the operations performed during training or inference. [1] develop a tool for counting the operations of different neural network layers, which helps track and predict the energy and carbon footprint of ANN models. Some SNN approaches [16, 21] estimate the energy consumed by ANN and SNN models by multiplying the corresponding operation counts by a theoretical per-operation energy cost. This kind of method estimates the ideal energy consumption, excluding environmental disturbance. In addition, contemporary GPU platforms are much more mature than SNN platforms or neuromorphic chips [30, 18]. As a result, due to the technical restrictions of deploying SpikingGCN on neuromorphic chips, we theoretically estimate the energy consumption in the experimental section.

Appendix B Notation, algorithm and source code

We list the frequently used notation in Table 8. Algorithm 1 shows the detailed training process of the proposed SpikingGCN model. The source code can be accessed via https://anonymous.4open.science/r/SpikingGCN-1527.

Notations   Descriptions
$\mathcal{G}$   Graph structure data
$v$   Single node in the graph
$\mathcal{V}$   Node set in the graph
$n$, $C$, $d$   Number of nodes, class number and feature dimensions
$a_{ij}$   Edge weight between nodes $i$ and $j$, scalar
$\mathbf{A}$   Adjacency matrix of the graph, $\mathbf{A} \in \mathbb{R}^{n \times n}$
$\mathbf{x}_v$   Feature vector of the $v$-th node, $\mathbf{x}_v \in \mathbb{R}^{d}$
$\mathbf{X}$   Entire attribute matrix in the graph, $\mathbf{X} \in \mathbb{R}^{n \times d}$
$\mathbf{Y}$   One-hot labels for each node, $\mathbf{Y} \in \mathbb{R}^{n \times C}$
$d_v$   Degree of a single node, scalar
$\mathbf{D}$   Diagonal matrix of the degree of each node, $\mathbf{D} \in \mathbb{R}^{n \times n}$
$\mathbf{h}_v$   New feature of the $v$-th node after convolution, $\mathbf{h}_v \in \mathbb{R}^{d}$
$\mathbf{H}$   Entire attribute matrix after convolution, $\mathbf{H} \in \mathbb{R}^{n \times d}$
$T$, $t$   Time step in the clock-driven SNNs
$\mathbf{o}^{(t)}$   Spike of one node generated by the encoder, $\mathbf{o}^{(t)} \in \{0,1\}^{d}$
$\mathbf{s}^{(t)}$   Spike of one node generated by the decoder, $\mathbf{s}^{(t)} \in \{0,1\}^{C}$
$h_{v,j}$   $j$-th feature value of a single node, scalar
$o_{v,j}^{(t)}$   Basic spike unit generated by the $j$-th feature value, equal to 0 or 1
$V_t$   Membrane potential at the $t$-th time step in the decoder, $V_t \in \mathbb{R}^{C}$
$\boldsymbol{\theta}$   Trainable weight matrix, $\boldsymbol{\theta} \in \mathbb{R}^{d \times C}$
$\tau$   Time constant, hyperparameter, scalar
$V_{\text{reset}}$   Signed reset voltage, hyperparameter, scalar
$V_{\text{th}}$   Spiking threshold, hyperparameter, scalar
Table 8: Frequently used notations in this paper

Input: Graph $\mathcal{G}$; input attributes $\mathbf{X}$; one-hot label matrix $\mathbf{Y}$
Parameter: Learning rate $\eta$; weight matrix $\boldsymbol{\theta}$; embedding function embedding(); encoding function encoding();
charge, fire, reset functions charge(), fire(), reset()
Output: Firing rate vector for the training subset, which is the prediction

1:  while not converged do
2:      Sample a mini-batch of nodes from the training nodes
3:      for each node $v$ do
4:          $\mathbf{h}_v \leftarrow$ embedding($\mathbf{X}$, $\mathbf{A}$, $v$) // Eq. (3)(4)
5:          for $t = 1, \dots, T$ do
6:              $\mathbf{o}^{(t)} \leftarrow$ encoding($\mathbf{h}_v$) // Eq. (5)
7:              $V_t \leftarrow$ charge($V_{t-1}$, $\boldsymbol{\theta}^{\top}\mathbf{o}^{(t)}$) // Eq. (2)(6)
8:              $\mathbf{s}^{(t)} \leftarrow$ fire($V_t$) // Eq. (7)(8)
9:              $V_t \leftarrow$ reset($V_t$, $\mathbf{s}^{(t)}$) // Eq. (9)
10:          end for
11:          Firing rate $\leftarrow \frac{1}{T}\sum_{t=1}^{T}\mathbf{s}^{(t)}$
12:      end for
13:      Perform the parameter update, $\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \eta\,\nabla_{\boldsymbol{\theta}}\,\mathrm{MSE}(\text{firing rates}, \mathbf{Y})$ // Eq. (10)(11)
14:  end while
Algorithm 1 Model Training of SpikingGCN
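For readers who prefer code to pseudocode, a compact end-to-end sketch of this training loop is given below, combining the Bernoulli encoder, the simplified LIF dynamics, and a straight-through surrogate gradient from Section 2. All names are ours, and the convolved features are assumed to be precomputed as in (4); this is a simplified sketch, not the released implementation.

```python
import torch
import torch.nn as nn

class SpikingClassifier(nn.Module):
    """Fully connected synapses + LIF neurons trained on firing rates (simplified sketch)."""
    def __init__(self, in_dim, num_classes, num_steps=20, tau=2.0, v_th=1.0):
        super().__init__()
        self.fc = nn.Linear(in_dim, num_classes, bias=False)
        self.T, self.k_tau, self.v_th = num_steps, 1.0 - 1.0 / tau, v_th

    def forward(self, h):                           # h: precomputed convolved features, (N, in_dim)
        p = h.clamp(0.0, 1.0)                       # Bernoulli probabilities, Eq. (5)
        v = torch.zeros(h.shape[0], self.fc.out_features, device=h.device)
        rate = 0.0
        for _ in range(self.T):
            o_t = torch.bernoulli(p)                # spike encoding
            v = self.k_tau * v + self.fc(o_t)       # charge, Eq. (6)
            surrogate = torch.sigmoid(5.0 * (v - self.v_th))   # Eq. (10)
            hard = (v >= self.v_th).float()                    # Eq. (7)
            spike = hard + (surrogate - surrogate.detach())    # hard value, sigmoid gradient
            v = torch.where(hard > 0, torch.zeros_like(v), v)  # reset, Eq. (9), V_reset = 0
            rate = rate + spike / self.T
        return rate                                 # firing rate per class

def train(model, h, y_onehot, train_mask, epochs=100, lr=1e-2):
    """Minimize the MSE between firing rates and one-hot labels on the training nodes."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        rates = model(h)
        loss = nn.functional.mse_loss(rates[train_mask], y_onehot[train_mask])
        loss.backward()
        opt.step()
    return model
```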

Appendix C Additional Experimental Results

We report additional experimental results that complement the ones reported in the main paper.

C.1 Discussion of Node Classification Experiments

Figure 3: Impact of T

The remarkable performance of bio-fidelity SpikingGCN is attributed to three main reasons. First, as shown in Fig. 3, an appropriate $T$ enables our network to focus on the most relevant parts of the input representation when making a decision, similar to an attention mechanism [40]. Note that the optimal $T$ depends on the statistical patterns of each dataset. In other words, we can also view the Bernoulli encoder as a moderate max-pooling process over the graph features, in which the salient representation of each node has a higher probability of becoming the input of the network. As a result, assigning varying importance to nodes enables SpikingGCN to make more effective predictions over the whole graph structure.

Figure 4: Membrane potential activity

Second, based on our assumption, the majority of accurate predictions benefit from attribute integration. We simplify the network and make predictions using fewer parameters, which effectively reduces the chance of overfitting. The significant performance gain indicates the better generalization ability of neural inference trained with the simplified network, which validates the effectiveness of bio-fidelity SpikingGCN. Last, the variant SpikingGCN-N achieves better results than the original model on the Cora, ACM, and citeseer datasets. As shown in Fig. 4, part of the negative voltages is converted into negative spikes by the modified Heaviside activation function. The negative spikes can play a suppressive role, since the spikes over $T$ time steps are summed to calculate the firing rate, which is more biologically plausible. However, the improvement has no effect on Pubmed, which has the highest sparsity and the lowest number of attributes. Sparse input leads to sparse spikes and voltages, and negative spikes tend to provide overly diluted information because the hyperparameters (e.g., $\gamma$ of the modified Heaviside activation function) are more elusive.

C.2 SpikingGCN for Active Learning

Table 9: The Area under the Learning Curve (ALC) on the Cora and ACM datasets for SOPT-SpikingGCN, SOPT-GCN, PE-SpikingGCN, PE-GCN, Random-SpikingGCN, and Random-GCN.
(a) Active learning on the Cora dataset
(b) Active learning on the ACM dataset
Figure 5: Active learning curves for both Cora and ACM datasets.

Based on the prediction results above, we are interested in SpikingGCN's performance when the number of training samples varies, especially when data is limited. Active learning faces the same problem as semi-supervised learning: labels are rare and costly to obtain. The objective of active learning is to design an acquisition function that successively picks unlabeled data so as to optimize the prediction performance of the model. Thus, instead of acquiring unlabeled data at random, active learning can substantially increase data efficiency and reduce cost. Meanwhile, active learning also provides a way to evaluate the generalization capability of models when data is scarce. Since SpikingGCN achieves a clear performance improvement with sufficient data, we are interested in how its prediction performance changes as the number of training samples increases.

Experiment Setup.

We apply SpikingGCN and GCN as the active learners and observe their performance. Three kinds of acquisition methods are considered. First, according to [28], the Σ-optimal (SOPT) acquisition function is model agnostic because it only depends on the graph Laplacian to determine the order of unlabeled nodes. The second is the standard predictive entropy (PE) [15]. Last, we consider random sampling as the baseline. Starting with only one initial sample, the accuracy is reported periodically until 50 nodes have been selected. Results are reported on both the Cora and ACM datasets.
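A schematic of the active-learning loop with the predictive-entropy (PE) acquisition is sketched below; the model interface (a routine that retrains on the current labels and returns class probabilities plus test accuracy) is an assumption for illustration, not the exact experimental code.

```python
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Entropy of the predicted class distribution of each node."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def active_learning_loop(train_and_predict, all_nodes, initial_labeled, budget=50):
    """Greedily query the most uncertain node at each step (PE acquisition)."""
    labeled = list(initial_labeled)
    accuracies = []
    while len(labeled) < budget:
        probs, acc = train_and_predict(labeled)      # retrain on current labels, then evaluate
        accuracies.append(acc)
        unlabeled = [v for v in all_nodes if v not in set(labeled)]
        scores = predictive_entropy(probs[unlabeled])
        labeled.append(unlabeled[int(np.argmax(scores))])  # query the most uncertain node
    return accuracies
```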

Figure 6: The energy consumption of SpikingGCN and baselines on their respective hardware.

The Area under the Learning Curve (ALC) results are shown in Table 9 (ALC corresponds to the area under the learning curve and is constrained to have a maximum value of 1). We provide the active learning curves of SpikingGCN and GCN in Fig. 5, which are consistent with the statistics reported in Table 9. SOPT selects the most informative nodes for both SpikingGCN and GCN, while the PE acquisition function is a moderate strategy for performance improvement. Under the random strategy, both models suffer from high variance in their predictions and unstable behavior throughout the active learning process. However, no matter which strategy is adopted, SpikingGCN generalizes better than GCN when the training data is scarce.

C.3 Energy Efficiency Experiments

Fig. 6 shows the remarkable energy difference between SpikingGCN and the GNN-based models. First, the sparse characteristic of the graph datasets fits the spike-based encoding method: zero values in the node representations never trigger a synaptic event (spike) on a neuromorphic chip and hence consume no energy. Second, our simplified network architecture contains only two main layers: a single fully connected layer and an LIF layer. Consider Pubmed as an example: few attributes and a sparse adjacency matrix result in sparse spikes, and the small number of classes (i.e., 3) requires fewer neurons. These promising results imply that SpikingGCN has the potential to achieve even more significant advantages in energy consumption over general GNNs.

C.4 SpikingGCN on Other Application Domains

Results on image grids. The MNIST dataset contains 60,000 training samples and 10,000 testing samples of handwritten digits from 10 classes. Each image has 28 x 28 pixels, hence we treat each image as a node with 784 features. It is worth noting that grid image classification is identical to the citation-network setting in which node classes are identified, except for the absence of an adjacency matrix. To extend our model, we adopt traditional convolutional layers to provide a trainable spike encoder for graph embedding models; the extended framework is shown in Fig. 7. Since the LIF neuron model contains the leaky parameter $\tau$, which decays the membrane potential and activates spikes only on a small scale, we adopt the Integrate-and-Fire (IF) process to maintain a suitable firing rate in the encoder. The membrane activity in the spike encoder can be formalized as:

$$V_t = V_{t-1} + c_t, \qquad (15)$$

where $c_t$ is the convolutional output at time step $t$, and the firing function is given in (8). As shown in Fig. 7, the convolutional layers combined with the IF neurons perform an auto-encoder function for the input graph data. After processing the spike trains, the fully connected layers combined with the LIF neurons generate the spike rates for each class, and we obtain the prediction from the most active neurons.
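A minimal sketch of this trainable encoder, i.e., a convolution followed by IF neurons charging as in (15) and firing with the signed threshold of (8), is given below; the layer sizes are illustrative assumptions, and training would again rely on the surrogate gradient of (10)-(11).

```python
import torch
import torch.nn as nn

class ConvIFEncoder(nn.Module):
    """Trainable spike encoder: convolution + Integrate-and-Fire neurons (no leak)."""
    def __init__(self, v_th=1.0, gamma=1.0):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)  # illustrative channel count
        self.v_th, self.gamma = v_th, gamma

    def forward(self, image, num_steps=20):         # image: (batch, 1, 28, 28)
        v, spikes = 0.0, []
        for _ in range(num_steps):
            v = v + self.conv(image)                # IF charge, Eq. (15): no decay term
            pos = (v >= self.v_th).float()
            neg = (v <= -self.gamma * self.v_th).float()
            s = pos - neg                           # signed firing, Eq. (8)
            v = v * (1.0 - s.abs())                 # reset fired units
            spikes.append(s)
        return torch.stack(spikes)                  # (num_steps, batch, 16, 28, 28)
```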

Figure 7: An extended model for deep SNNs
Figure 8: Comparison between grid images and superpixel images

Results on superpixel images.

Another more complex graph structure is superpixel images. Compared with general grid images, a superpixel image represents each picture as a graph consisting of connected nodes; hence the classification task is defined as prediction on subgraphs. Another important distinction is that superpixel images require constructing the connectivity between the chosen nodes. A comparison between grid and superpixel images is shown in Fig. 8, where 75 superpixels are processed as the representation of each image.

One of the important steps when processing superpixel data is learning an effective graph embedding extracted from the graph. To demonstrate the ability of our model to predict from superpixel images, we empirically follow the convolutional approach of SplineCNN [12] to aggregate the connectivity of superpixels. The trainable kernel function based on B-splines makes the most of the local information in the graph and filters the input into a representative embedding. Similar to the framework proposed in Fig. 7, the superpixel experiments follow the same structure as the grid-image experiments: the convolutional layers and IF neurons produce the spike representations, and the fully connected layers and LIF neurons are responsible for the classification results.

In addition, we also provide a unique perspective on the mechanism of our model. In particular, our spike encoder can be regarded as a sampling process that uses a spike-train representation. The image-graph scenario offers an ideal opportunity to visualize the data processing in our model. For the grid and superpixel experiments, we extract the outputs of our spike encoder and visualize them in Fig. 9, along with other observations. First, the Bernoulli encoder mentioned above can be viewed as a sampling process with respect to the pixel values: as the number of time steps increases, the encoder almost rebuilds the original input. However, this static spike encoder cannot capture more useful features from the input data. Our trainable encoder instead performs the convolution procedure and stimulates the IF neurons to fire spikes. As shown in Fig. 9 (b) and (c), by learning the convolutional parameters in the encoder, the spike encoder successfully detects the structural patterns and represents them in a discrete format.

(a) Outputs of Bernoulli encoder in grid images
(b) Outputs of trainable encoder in grid images
(c) Outputs of trainable encoder in superpixel images
Figure 9: Visualization of the spike trains generated by the spike encoder. We extract these features from the MNIST dataset for demonstration. Grid images: (a) shows the spike trains from a simple Bernoulli encoder, and we list the different time steps which indicate different precision. (b) depicts the spikes from the trainable spike encoder, in which the overall shape patterns are learned. Superpixel images: (c) demonstrates the spikes from the trainable encoder, and the encoding results indicate the successful detection of local aggregation.

Spike encoder for recommender systems.

Much research has leveraged graph-based methods to analyze social networks [39, 41, 14]. To this end, we extend our framework to recommender systems, where users and items form a bipartite interaction graph for message passing. We treat rating prediction in recommender systems as a link classification problem. Starting with the MovieLens 100K dataset, we take the rating pairs between users and items as input, transform them into suitable spike representations, and finally output the predicted class via firing rates. To effectively model this graph-structured data, we build our trainable spike encoder on the convolutional method used in GC-MC [39]. In particular, GC-MC applies a simple but effective convolutional approach based on differentiable message passing on the bipartite interaction graph and reconstructs the links using a bilinear decoder.