1 Introduction
Software vulnerabilities have grown rapidly in recent years, whether reported through publicly disclosed Common Vulnerabilities and Exposures (CVE) entries or lurking inside privately owned source code and open-source libraries. These vulnerabilities are a main cause of cyber security attacks on software systems, inflicting substantial economic and social damage (Neuhaus et al., 2007; Zhou et al., 2019). Therefore, vulnerability detection is an essential yet challenging step in identifying vulnerabilities in source code and providing security solutions for software systems.

Early approaches (Neuhaus et al., 2007; Nguyen & Tran, 2010; Shin et al., 2010) carefully design hand-engineered features for machine learning algorithms to detect vulnerabilities. These early approaches, however, suffer from two major drawbacks. First, designing good features requires prior knowledge, hence domain experts, and is usually time-consuming. Second, hand-engineered features are impractical to adapt to the numerous libraries and vulnerability types that evolve over time.
To reduce the human effort spent on feature engineering, recent approaches (Li et al., 2018; Russell et al., 2018) treat each raw source code as a flat natural language sequence and explore deep learning architectures from natural language processing (NLP), such as LSTMs (Hochreiter & Schmidhuber, 1997) and CNNs (Kim, 2014), for vulnerability detection. It is also worth noting that pre-trained language models such as BERT (Devlin et al., 2018) have recently emerged as a dominant learning paradigm with numerous successful applications in NLP. Inspired by the successes of BERT-style models, pre-trained programming language (PL) models such as CodeBERT (Feng et al., 2020) have brought significant improvements to PL downstream tasks, including vulnerability detection. However, as mentioned in (Nguyen et al., 2019), the interactions among all positions in the input sequence inside the self-attention layer of a BERT-style model build up a complete graph, i.e., every position has an edge to all other positions. This limits the model's ability to learn the local structures within the source code that differentiate vulnerabilities.

Graph neural networks (GNNs) have recently become a central method to embed nodes and graphs into low-dimensional continuous vector spaces
(Hamilton et al., 2017; Wu et al., 2019). GNNs provide fast and practical training, high accuracy, and state-of-the-art results for downstream tasks such as text classification (Yao et al., 2019). Building on this architecture, Devign (Zhou et al., 2019) utilizes GNNs for vulnerability detection, relying on a complex preprocessing step to extract multi-edged graph information such as the Abstract Syntax Tree (AST), data flow, and control flow from the source code. This complex preprocessing, however, is difficult to apply across many programming languages and numerous open-source codes and libraries.

In this paper, we propose a general and novel graph neural network-based model, named ReGVD, for vulnerability detection. In particular, we also consider programming language as natural language: ReGVD treats a given source code as a flat sequence of tokens and leverages two effective graph construction methods to build a single graph. The first method considers unique tokens as nodes and co-occurrences between tokens (within a fixed-size sliding window) as edges. The second considers token indexes as nodes and co-occurrences between indexes as edges. To make a fair comparison with pre-trained PL models such as CodeBERT, ReGVD employs only the embedding layer of the pre-trained PL model to initialize node feature vectors. ReGVD then examines GNNs, with a novel use of residual connections among GNN layers. Next, ReGVD exploits the sum and max poolings and utilizes a beneficial mixture of these poolings to produce a graph embedding for the given source code. This graph embedding is finally fed to a single fully-connected layer followed by a softmax layer to predict whether the code is vulnerable. To sum up, our main contributions are as follows:


We are inspired by pre-trained programming language models and graph neural networks to introduce ReGVD – a novel GNN-based model for vulnerability detection.

ReGVD makes use of effective code representation through two graph construction methods to build a graph for each given source code, wherein node features are initialized only by the embedding layer of a pre-trained PL model. ReGVD then introduces a novel adaptation of residual connections among GNN layers and an advantageous mixture of the sum and max poolings to learn a better code graph representation.

Extensive experiments show that ReGVD significantly outperforms existing state-of-the-art models and produces the highest accuracy of 63.69%, gaining absolute improvements of 1.61% and 1.39% over CodeBERT and GraphCodeBERT respectively, on the benchmark vulnerability detection dataset from CodeXGLUE (Lu et al., 2021).
2 The proposed ReGVD
2.1 Problem definition
We formalize vulnerability detection as an inductive binary classification problem for source code at the function level, i.e., we aim to identify whether a given function in raw source code is vulnerable or not (Zhou et al., 2019). We define a data sample as $\{(c_i, y_i) \mid c_i \in \mathcal{C}, y_i \in \mathcal{Y}\}_{i=1}^{n}$, where $\mathcal{C}$ represents the set of raw source codes, $\mathcal{Y} = \{0, 1\}$ denotes the label set with $1$ for vulnerable and $0$ otherwise, and $n$ is the number of instances. As graph neural networks (GNNs) provide fast and practical training, high accuracy, and state-of-the-art results for many downstream tasks (Kipf & Welling, 2017), we leverage GNNs for vulnerability detection. Therefore, we construct a graph $\mathcal{G}_i(\mathcal{V}, X, A)$ for each given source code $c_i$, wherein $\mathcal{V}$ is the set of $m$ nodes in the graph; $X \in \mathbb{R}^{m \times d}$ is the node feature matrix, wherein each node $v \in \mathcal{V}$ is represented by a $d$-dimensional real-valued vector $\mathsf{x}_v \in \mathbb{R}^{d}$; and $A \in \{0, 1\}^{m \times m}$ is the adjacency matrix, where $A_{v,u} = 1$ means there is an edge between node $v$ and node $u$, and $A_{v,u} = 0$ otherwise. We aim to learn a mapping function $f: \mathcal{G} \rightarrow \mathcal{Y}$ to determine whether a given source code is vulnerable or not. The mapping function $f$ can be learned by minimizing the loss function with $L_2$ regularization on the model parameters $\theta$ as:

$$\min_{\theta} \sum_{i=1}^{n} \mathcal{L}\left(f\left(\mathcal{G}_i(\mathcal{V}, X, A)\right), y_i\right) + \lambda \left\|\theta\right\|_2^2 \quad (1)$$

where $\mathcal{L}$ is the cross-entropy loss function and $\lambda$ is an adjustable regularization weight.
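As a minimal illustration (a numpy sketch with our own function names, not the authors' implementation), the objective in Equation 1 combines the cross-entropy loss over predicted class probabilities with an $L_2$ penalty on the parameters:

```python
import numpy as np

def cross_entropy(probs, labels):
    # probs: (n, 2) predicted class probabilities; labels: (n,) in {0, 1}
    n = labels.shape[0]
    return -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))

def objective(probs, labels, params, lam):
    # Cross-entropy loss plus an L2 penalty weighted by lambda (Equation 1).
    l2 = sum(np.sum(p ** 2) for p in params)
    return cross_entropy(probs, labels) + lam * l2
```

In practice the loss is minimized with a gradient-based optimizer such as Adam, as described in Section 3.1.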
Note that Devign (Zhou et al., 2019) uses a complex preprocessing step to build a multi-edged graph for each given source code; hence it is impractical for many programming languages (PLs) and numerous open-source codes and libraries. It is also worth noting that pre-trained PL models such as CodeBERT (Feng et al., 2020) have recently and significantly improved the performance of PL downstream tasks such as vulnerability detection. However, these BERT-style PL models are limited in learning the local and logical structures inside the source code that differentiate vulnerabilities. To this end, we propose ReGVD – a novel and general GNN-based model using effective code representation for vulnerability detection, as follows: (i) ReGVD views a given raw source code as a flat sequence of tokens and transforms this sequence into a single graph. (ii) ReGVD examines GCNs (Kipf & Welling, 2017) and Gated GNNs (Li et al., 2016), with a novel use of residual connections among GNN layers. (iii) ReGVD utilizes a new and beneficial mixture of the sum and max poolings to produce a graph embedding for the given source code.
In what follows, we first introduce two effective methods to construct a graph for each raw source code in Section 2.2, then describe how ReGVD utilizes graph neural networks with residual connections in Section 2.3, and finally present a graph-level readout layer in Section 2.4 that obtains the graph embedding used for the classification task.
2.2 Graph construction
We consider a given source code as a flat sequence of tokens and illustrate two graph construction methods in Figure 1 that keep the local programming logic of the source code. Note that we omit self-loops in both methods since they did not improve performance in our pilot experiments. A possible reason is that source code is more structural than natural language, where self-loops can contribute useful graph information (Yao et al., 2019; Huang et al., 2019; Zhang et al., 2020).
Unique tokenfocused construction
We represent unique tokens as nodes and co-occurrences between tokens (within a fixed-size sliding window) as edges, and the obtained graph has an adjacency matrix $A$ as:

$$A_{v,u} = \begin{cases} 1 & \text{if tokens } v \text{ and } u \text{ co-occur within a sliding window, } v \neq u \\ 0 & \text{otherwise} \end{cases}$$
As the size of the graph is much smaller than the actual length of the source code, this method can consume less GPU memory.
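To make the construction concrete, the following numpy sketch (an illustration under our own naming, not the authors' code) builds the unique token-focused adjacency matrix from a token sequence and a sliding window size:

```python
import numpy as np

def unique_token_graph(tokens, window=3):
    # Nodes are the distinct tokens; an edge links two tokens that co-occur
    # within any sliding window of the given size. Self-loops are omitted.
    vocab = sorted(set(tokens))
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(vocab)), dtype=int)
    for start in range(max(len(tokens) - window + 1, 1)):
        win = tokens[start:start + window]
        for i in range(len(win)):
            for j in range(i + 1, len(win)):
                u, v = index[win[i]], index[win[j]]
                if u != v:  # omit self-loops
                    A[u, v] = A[v, u] = 1
    return vocab, A
```

Because repeated tokens share a node, the resulting graph is typically much smaller than the token sequence itself.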
Indexfocused construction
Given a flat sequence of $m$ tokens, we represent all tokens as nodes, i.e., we treat each index $j \in \{1, \ldots, m\}$ as a node representing the $j$-th token. The number of nodes thus equals the sequence length. We again consider co-occurrences between indexes (within a fixed-size sliding window) as edges, and the obtained graph has an adjacency matrix $A$ as:

$$A_{j,k} = \begin{cases} 1 & \text{if indexes } j \text{ and } k \text{ co-occur within a sliding window, } j \neq k \\ 0 & \text{otherwise} \end{cases}$$
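Analogously, the index-focused adjacency matrix depends only on the sequence length and the window size; a sketch (illustrative naming, as above):

```python
import numpy as np

def index_graph(num_tokens, window=3):
    # Nodes are token positions 0..num_tokens-1; an edge links two indexes
    # that fall inside the same sliding window. Self-loops are omitted.
    A = np.zeros((num_tokens, num_tokens), dtype=int)
    for i in range(num_tokens):
        for j in range(i + 1, min(i + window, num_tokens)):
            A[i, j] = A[j, i] = 1
    return A
```

With window size 2 the result is a simple chain over the token positions; repeated tokens receive distinct nodes, so the graph always has exactly as many nodes as the sequence has tokens.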
Node feature initialization
To attain the advantage of pre-trained PL models such as CodeBERT and to make a fair comparison, we use only the embedding layer of the pre-trained PL model to initialize the node feature vectors.
2.3 Graph neural networks with residual connection
GNNs aim to update vector representations of nodes by recursively aggregating vector representations from their neighbours (Scarselli et al., 2009; Kipf & Welling, 2017). Mathematically, given a graph $\mathcal{G}(\mathcal{V}, X, A)$, we formulate GNNs as follows:

$$\mathsf{h}_v^{(k+1)} = \textsc{Aggregation}\left(\left\{\mathsf{h}_u^{(k)}\right\}_{u \in \mathcal{N}_v \cup \{v\}}\right) \quad (2)$$

where $\mathsf{h}_v^{(k)}$ is the vector representation of node $v$ at the $k$-th iteration/layer; $\mathcal{N}_v$ is the set of neighbours of node $v$; and $\mathsf{h}_v^{(0)} = \mathsf{x}_v$ is the node feature vector of $v$.
Among the many GNNs proposed in recent literature (Wu et al., 2019), Graph Convolutional Networks (GCNs) (Kipf & Welling, 2017) are the most widely used, and Gated graph neural networks ("Gated GNNs" or "GGNNs" for short) (Li et al., 2016) are also well suited to our data structure. Our ReGVD therefore leverages GCNs and GGNNs as its base models.
Formally, a GCN layer is computed as follows:
$$\mathsf{h}_v^{(k+1)} = \phi\left(\sum_{u \in \mathcal{N}_v} a_{v,u} W^{(k)} \mathsf{h}_u^{(k)}\right), \forall v \in \mathcal{V} \quad (3)$$

where $a_{v,u}$ is an edge constant between nodes $v$ and $u$ in the Laplacian re-normalized adjacency matrix $D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ (as we omit self-loops), wherein $D$ is the diagonal node degree matrix of $A$; $W^{(k)}$ is a weight matrix; and $\phi$ is a non-linear activation function such as $\mathsf{ReLU}$.
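A single GCN layer following Equation 3 can be sketched in numpy as below (a hedged illustration: the normalization omits self-loops as in the paper, ReLU stands in for the activation, and the function name is ours):

```python
import numpy as np

def gcn_layer(A, H, W):
    # One GCN layer: H' = ReLU(D^{-1/2} A D^{-1/2} H W), self-loops omitted.
    deg = A.sum(axis=1).astype(float)
    d_inv_sqrt = np.divide(1.0, np.sqrt(deg), out=np.zeros_like(deg), where=deg > 0)
    A_norm = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # Laplacian re-normalized adjacency
    return np.maximum(A_norm @ H @ W, 0.0)  # ReLU activation
```

The `np.divide(..., where=deg > 0)` guard simply leaves isolated nodes (degree zero) with a zero normalization factor instead of dividing by zero.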
GGNNs adopt GRUs (Cho et al., 2014), unroll the recurrence for a fixed number of timesteps, and remove the need to constrain parameters to ensure convergence:

$$\mathsf{a}_v^{(k+1)} = \sum_{u \in \mathcal{N}_v} a_{v,u} \mathsf{h}_u^{(k)}; \quad \mathsf{z}_v^{(k+1)} = \sigma\left(W^z \mathsf{a}_v^{(k+1)} + U^z \mathsf{h}_v^{(k)}\right); \quad \mathsf{r}_v^{(k+1)} = \sigma\left(W^r \mathsf{a}_v^{(k+1)} + U^r \mathsf{h}_v^{(k)}\right)$$
$$\widetilde{\mathsf{h}}_v^{(k+1)} = \tanh\left(W^h \mathsf{a}_v^{(k+1)} + U^h\left(\mathsf{r}_v^{(k+1)} \odot \mathsf{h}_v^{(k)}\right)\right); \quad \mathsf{h}_v^{(k+1)} = \left(1 - \mathsf{z}_v^{(k+1)}\right) \odot \mathsf{h}_v^{(k)} + \mathsf{z}_v^{(k+1)} \odot \widetilde{\mathsf{h}}_v^{(k+1)} \quad (4)$$
where $\mathsf{z}$ and $\mathsf{r}$ are the update and reset gates; $\sigma$ is the sigmoid function; and $\odot$ is the element-wise multiplication.

The residual connection (He et al., 2016) is used to incorporate information learned in the lower layers into the higher layers and, more importantly, to allow gradients to pass directly through the layers so as to avoid the vanishing or exploding gradient problems. Residual connections are employed in many architectures in computer vision and NLP. Motivated by this, ReGVD presents a novel adaptation of residual connections among the GNN layers, fixing the same hidden size across the different layers. In particular, ReGVD redefines Equation
3 as:

$$\mathsf{h}_v^{(k+1)} = \phi\left(\sum_{u \in \mathcal{N}_v} a_{v,u} W^{(k)} \mathsf{h}_u^{(k)}\right) + \mathsf{h}_v^{(k)} \quad (5)$$
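Equation 5 amounts to adding the layer input back onto the layer output, which is only possible because all hidden layers share the same size. A minimal sketch (taking a pre-normalized adjacency matrix as input; names are ours):

```python
import numpy as np

def residual_gcn_layer(A_norm, H, W):
    # Equation 5: a GCN layer whose input H is added back onto its output.
    # Requires the hidden size to be the same across layers.
    return np.maximum(A_norm @ H @ W, 0.0) + H
```

When the convolution term vanishes, the residual path simply passes the input through unchanged, which is what lets gradients flow directly across layers.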
Similarly, ReGVD also redefines Equation 4 as follows:

$$\mathsf{h}_v^{(k+1)} = \left(1 - \mathsf{z}_v^{(k+1)}\right) \odot \mathsf{h}_v^{(k)} + \mathsf{z}_v^{(k+1)} \odot \widetilde{\mathsf{h}}_v^{(k+1)} + \mathsf{h}_v^{(k)} \quad (6)$$
2.4 Graph-level readout pooling layer
The graph-level readout layer produces a graph embedding for each input graph. This layer can be built with more complex poolings such as hierarchical pooling (Cangea et al., 2018), differentiable pooling (Ying et al., 2018), and Conv pooling (Zhou et al., 2019). As the simple sum pooling produces better results for graph classification (Xu et al., 2019), ReGVD leverages the sum pooling to obtain the graph embedding. Besides, ReGVD utilizes the max pooling to exploit more information from the key nodes. ReGVD defines a beneficial mixture of the sum and max poolings to produce the graph embedding $\mathsf{e}_{\mathcal{G}}$ as follows:
$$\mathsf{e}_v = \sigma\left(\mathsf{w}^{\top} \mathsf{h}_v^{(K)} + b\right) \odot \tanh\left(W \mathsf{h}_v^{(K)} + \mathsf{b}\right) \quad (7)$$
$$\mathsf{e}_{\mathcal{G}} = \textsc{Mix}\left(\sum_{v \in \mathcal{V}} \mathsf{e}_v \, , \, \textsc{MaxPool}\left\{\mathsf{e}_v\right\}_{v \in \mathcal{V}}\right) \quad (8)$$

where $\mathsf{e}_v$ is the final vector representation of node $v$, wherein $\sigma\left(\mathsf{w}^{\top} \mathsf{h}_v^{(K)} + b\right)$ acts as a soft attention mechanism over nodes (Li et al., 2016), and $\mathsf{h}_v^{(K)}$ is the vector representation of node $v$ at the last $K$-th layer; and $\textsc{Mix}(\cdot)$ denotes an arbitrary mixture function. ReGVD examines three $\textsc{Mix}$ functions, consisting of $\textsc{Sum}$, $\textsc{Mul}$, and $\textsc{Concat}$, as:
$$\mathsf{e}_{\mathcal{G}} = \sum_{v \in \mathcal{V}} \mathsf{e}_v + \textsc{MaxPool}\left\{\mathsf{e}_v\right\}_{v \in \mathcal{V}} \quad (9)$$
$$\mathsf{e}_{\mathcal{G}} = \sum_{v \in \mathcal{V}} \mathsf{e}_v \odot \textsc{MaxPool}\left\{\mathsf{e}_v\right\}_{v \in \mathcal{V}} \quad (10)$$
$$\mathsf{e}_{\mathcal{G}} = \left[\sum_{v \in \mathcal{V}} \mathsf{e}_v \, ; \, \textsc{MaxPool}\left\{\mathsf{e}_v\right\}_{v \in \mathcal{V}}\right] \quad (11)$$
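The readout of Equations 7–11 can be sketched as follows (illustrative numpy with our own parameter names; the sigmoid gate plays the soft-attention role of Equation 7):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def readout(H, w, b, W2, b2, mix="MUL"):
    # H: (m, d) node representations from the last GNN layer.
    gate = sigmoid(H @ w + b)[:, None]      # soft attention over nodes (Eq. 7)
    E = gate * np.tanh(H @ W2 + b2)         # gated node embeddings e_v
    s, mx = E.sum(axis=0), E.max(axis=0)    # sum pooling and max pooling
    if mix == "SUM":                        # Equation 9
        return s + mx
    if mix == "MUL":                        # Equation 10
        return s * mx
    return np.concatenate([s, mx])          # Equation 11: CONCAT
```

Note that SUM and MUL keep the hidden dimensionality, while CONCAT doubles it; only the size of the following classification layer changes.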
After that, ReGVD feeds $\mathsf{e}_{\mathcal{G}}$ to a single fully-connected layer followed by a softmax layer to predict whether the given source code is vulnerable or not:

$$\hat{\mathsf{y}} = \mathsf{softmax}\left(W_1 \mathsf{e}_{\mathcal{G}} + \mathsf{b}_1\right) \quad (12)$$
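The final prediction step of Equation 12 is a single linear map followed by a softmax; sketched (with hypothetical parameter names) as:

```python
import numpy as np

def predict(e_g, W1, b1):
    # Single fully-connected layer followed by softmax over the two classes
    # {non-vulnerable, vulnerable} (Equation 12).
    logits = W1 @ e_g + b1
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()
```

The code is predicted vulnerable when the second class probability exceeds the first.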
Finally, ReGVD is trained by minimizing the cross-entropy loss function. We illustrate our proposed ReGVD in Figure 2 and briefly present its learning process in Algorithm 1.
3 Experimental setup and results
In this section, we evaluate the benefits of our proposed ReGVD and address the following questions:
Q1
How does ReGVD compare to other stateoftheart vulnerability detection methods?
Q2
Can the graph-level readout pooling layer proposed in our ReGVD work better than the more complex Conv pooling layer employed in Devign (Zhou et al., 2019)?
Q3
What is the influence of the residual connection, the mixture function, and the sliding window size on GNN performance?
Q4
Can ReGVD obtain satisfactory accuracy results even with limited training data?
3.1 Experimental setup
Dataset | #Instances
Training set | 21,854
Validation set | 2,732
Test set | 2,732
Dataset
We use the real-world benchmark dataset from CodeXGLUE (Lu et al., 2021) for vulnerability detection at the function level (https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/Defect-detection). The dataset was first created by Zhou et al. (2019) and includes 27,318 manually labeled vulnerable or non-vulnerable functions extracted from security-related commits in two large, popular, and functionally diverse C-language open-source projects (i.e., QEMU and FFmpeg). Since Zhou et al. (2019) did not provide official training/validation/test sets, Lu et al. (2021) combined the two projects and split the data into training/validation/test sets. Table 1 reports the statistics of this benchmark dataset.
Training protocol
We construct a 2-layer model, set the batch size to 128, and employ the Adam optimizer (Kingma & Ba, 2014) to train our model for up to 100 epochs. As mentioned in Section 2.3, we set the same hidden size ("hs") for the hidden GNN layers, varying the size in {128, 256, 384}. We also vary the sliding window size ("ws") in {2, 3, 4, 5} and the Adam initial learning rate ("lr"). The final accuracy on the test set is reported for the best model checkpoint, i.e., the one obtaining the highest accuracy on the validation set. Table 2 shows the optimal hyper-parameters for each setting in our ReGVD.

Const | Init | Base | lr | ws | hs

Idx | CB | GGNN | | 2 |
Idx | CB | GCN | | 2 |
Idx | GCB | GGNN | | 2 |
Idx | GCB | GCN | | 2 |
UniT | CB | GGNN | | 2 |
UniT | CB | GCN | | 5 |
UniT | GCB | GGNN | | 3 |
UniT | GCB | GCN | | 5 |
Baselines
We compare our ReGVD with strong and up-to-date baselines as follows:


GraphCodeBERT (Guo et al., 2021) is a recent pre-trained PL model that extends CodeBERT by incorporating the inherent structure of code, i.e., data flow, into the training objective.
We note that Zhou et al. (2019) did not release the official implementation of Devign. Thus, we re-implement Devign using our two graph construction methods and the same training protocol w.r.t. the optimizer, the window sizes, the initial learning rate values, and the number of training epochs.
Model  Accuracy (%) 

BiLSTM  59.37 
TextCNN  60.69 
RoBERTa  61.05 
CodeBERT  62.08 
GraphCodeBERT  62.30 
Devign (Idx + CB)  60.43 
Devign (Idx + GCB)  61.31 
Devign (UniT + CB)  60.40 
Devign (UniT + GCB)  59.77 
ReGVD (GGNN + Idx + CB)  63.54 
ReGVD (GGNN + Idx + GCB)  63.29 
ReGVD (GGNN + UniT + CB)  63.62 
ReGVD (GGNN + UniT + GCB)  62.41 
ReGVD (GCN + Idx + CB)  62.63 
ReGVD (GCN + Idx + GCB)  62.70 
ReGVD (GCN + UniT + CB)  63.14 
ReGVD (GCN + UniT + GCB)  63.69 
3.2 Main results
Table 3 presents the accuracy results of the proposed ReGVD and the strong, up-to-date baselines on the real-world benchmark dataset from CodeXGLUE for vulnerability detection, regarding Q1. We note that TextCNN and RoBERTa outperform Devign in all but the Devign setting (Idx+GCB). The recent models CodeBERT and GraphCodeBERT both obtain competitive performance and outperform Devign, indicating the effectiveness of pre-trained PL models. More importantly, ReGVD gains absolute improvements of 1.61% and 1.39% over CodeBERT and GraphCodeBERT, respectively. This shows the benefit of ReGVD in learning the local structures inside the source code to differentiate vulnerabilities (while using only the embedding layer of the pre-trained PL model). Hence, our ReGVD significantly outperforms the up-to-date baseline models. In particular, ReGVD produces the highest accuracy of 63.69% – a new state-of-the-art result on the CodeXGLUE vulnerability detection dataset.
In our pilot studies, we achieved higher results for ReGVD by feeding the flat sequence of tokens as input to the pre-trained PL model to obtain contextualized embeddings, which were then used to initialize the node feature vectors. For a fair comparison, however, we use only the embedding layer of the pre-trained PL model to initialize the feature vectors.


To address Q2, we look at Figure 2(a) to investigate whether the graph-level readout layer proposed in ReGVD performs better than the more complex Conv pooling layer utilized in Devign. Since Devign also uses Gated GNNs to update the node representations and gains its best accuracy of 61.31% in the setting (Idx+GCB), we consider the ReGVD setting (GGNN+Idx+GCB) without the residual connection for a fair comparison; ReGVD achieves an accuracy of 63.51%, which is 2.20% higher than that of Devign. More generally, the results of the three remaining ReGVD settings (without the residual connection) lead to a similar conclusion: the graph-level readout layer utilized in ReGVD outperforms the one used in Devign.


We analyze the influence of the residual connection, the mixture function, and the sliding window size to answer Q3. We first look back at Figure 2(a) for the ReGVD accuracies with and without the residual connection among the GNN layers. It demonstrates that the residual connection helps to boost GNN performance on seven settings, with a maximum accuracy gain of 2.05% for the ReGVD setting (GCN+Idx+GCB). Next, we look at Figure 2(b) for the ReGVD results w.r.t. the mixture functions. We find that ReGVD generally gains the highest accuracies on six settings using one mixture operator and on the two remaining settings using another. It is worth noting that the ReGVD setting (GGNN+Idx+CB) obtains an accuracy of 62.59% even with its less favorable mixture operator, which is still higher than that of Devign, CodeBERT, and GraphCodeBERT. Then, we check the results shown in Figure 4 to explore the influence of the sliding window size on accuracy w.r.t. the graph construction method. Smaller window sizes produce better accuracies for the index-focused construction, while the opposite holds for the unique token-focused construction. We also find that window size 3 gives stable results and the highest average accuracy of 62.67% over all eight settings.
We test the best-performing settings with different percentages of the training data, regarding Q4. Figure 5 shows the test accuracies when training ReGVD with 20%, 40%, 60%, 80%, and 100% of the training set. Our model achieves satisfactory performance with limited training data, compared to the baselines using the full training data. For example, ReGVD obtains an accuracy of 61.68% with 60% of the training set, which is higher than that of BiLSTM, TextCNN, RoBERTa, and Devign. It also achieves an accuracy of 62.55% with 80% of the training set, which is better than that of CodeBERT and GraphCodeBERT.
4 Conclusion
We introduce a novel graph neural network-based model, ReGVD, to detect vulnerabilities in source code. ReGVD transforms each raw source code into a single graph through two effective graph construction methods to capture the local structures inside the source code, wherein ReGVD utilizes only the embedding layer of the pre-trained programming language model to initialize node feature vectors. ReGVD then makes novel use of residual connections among GNN layers and a valuable mixture of the sum and max poolings to learn a better code graph representation. To demonstrate the effectiveness of ReGVD, we conduct extensive experiments comparing it with strong, up-to-date baselines on the benchmark vulnerability detection dataset from CodeXGLUE. Experimental results show that the proposed ReGVD is significantly better than the baseline models and obtains the highest accuracy of 63.69% on the benchmark dataset.
ReGVD can be seen as a general, unified, and practical framework. Future work can adapt ReGVD to similar classification tasks such as clone detection. Furthermore, we plan to extend and combine ReGVD, using the index-focused graph construction, with a BERT-style model to build an encoder for other programming language tasks such as code completion and code translation.
Acknowledgements
We would like to thank Anh Bui (tuananh.bui@monash.edu) for his kind help and support.
References
 Cangea et al. (2018) Cătălina Cangea, Petar Veličković, Nikola Jovanović, Thomas Kipf, and Pietro Liò. Towards sparse hierarchical graph classifiers. arXiv preprint arXiv:1811.01287, 2018.
 Cho et al. (2014) Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using rnn encoder–decoder for statistical machine translation. In EMNLP, 2014.
 Clark et al. (2020) Kevin Clark, MinhThang Luong, Quoc V Le, and Christopher D Manning. Electra: Pretraining text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555, 2020.
 Devlin et al. (2018) Jacob Devlin, MingWei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pretraining of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
 Feng et al. (2020) Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, et al. Codebert: A pretrained model for programming and natural languages. arXiv preprint arXiv:2002.08155, 2020.
 Guo et al. (2021) Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin B. Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, and Ming Zhou. Graphcodebert: Pretraining code representations with data flow. In ICLR, 2021.
 Hamilton et al. (2017) William L. Hamilton, Rex Ying, and Jure Leskovec. Representation learning on graphs: Methods and applications. arXiv preprint arXiv:1709.05584, 2017.
 He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, pp. 770–778, 2016.
 Hochreiter & Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. Long shortterm memory. Neural Computation, 9:1735–1780, 1997.
 Huang et al. (2019) Lianzhe Huang, Dehong Ma, Sujian Li, Xiaodong Zhang, and Houfeng Wang. Text level graph neural network for text classification. In EMNLPIJCNLP, pp. 3444–3450, 2019.
 Kim (2014) Yoon Kim. Convolutional Neural Networks for Sentence Classification. In EMNLP, pp. 1746–1751, 2014.
 Kingma & Ba (2014) Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 Kipf & Welling (2017) Thomas N. Kipf and Max Welling. Semisupervised classification with graph convolutional networks. In ICLR, 2017.
 Li et al. (2016) Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated Graph Sequence Neural Networks. ICLR, 2016.
 Li et al. (2018) Zhen Li, Deqing Zou, Shouhuai Xu, Xinyu Ou, Hai Jin, Sujuan Wang, Zhijun Deng, and Yuyi Zhong. Vuldeepecker: A deep learningbased system for vulnerability detection. arXiv preprint arXiv:1801.01681, 2018.
 Liu et al. (2019) Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
 Lu et al. (2021) Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin B. Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, and Shujie Liu. Codexglue: A machine learning benchmark dataset for code understanding and generation. arXiv preprint arXiv:2102.04664, 2021.
 Neuhaus et al. (2007) Stephan Neuhaus, Thomas Zimmermann, Christian Holler, and Andreas Zeller. Predicting vulnerable software components. In Proceedings of the 14th ACM conference on Computer and communications security, pp. 529–540, 2007.
 Nguyen et al. (2019) Dai Quoc Nguyen, Tu Dinh Nguyen, and Dinh Phung. Universal graph transformer selfattention networks. arXiv preprint arXiv:1909.11855, 2019.
 Nguyen & Tran (2010) Viet Hung Nguyen and Le Minh Sang Tran. Predicting vulnerable software components with dependency graphs. In Proceedings of the 6th International Workshop on Security Measurements and Metrics, pp. 1–8, 2010.
 Russell et al. (2018) Rebecca Russell, Louis Kim, Lei Hamilton, Tomo Lazovich, Jacob Harer, Onur Ozdemir, Paul Ellingwood, and Marc McConley. Automated vulnerability detection in source code using deep representation learning. In Proceedings of the 17th IEEE international conference on machine learning and applications, pp. 757–762, 2018.
 Scarselli et al. (2009) Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20:61–80, 2009.
 Shin et al. (2010) Yonghee Shin, Andrew Meneely, Laurie Williams, and Jason A Osborne. Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities. IEEE transactions on software engineering, 37:772–787, 2010.
 Wu et al. (2019) Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S Yu. A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596, 2019.
 Xu et al. (2019) Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How Powerful Are Graph Neural Networks? ICLR, 2019.
 Yao et al. (2019) Liang Yao, Chengsheng Mao, and Yuan Luo. Graph convolutional networks for text classification. In AAAI, pp. 7370–7377, 2019.
 Ying et al. (2018) Rex Ying, Jiaxuan You, Christopher Morris, Xiang Ren, William L. Hamilton, and Jure Leskovec. Hierarchical graph representation learning with differentiable pooling. In NeurIPS, pp. 4805–4815, 2018.
 Zhang et al. (2020) Yufeng Zhang, Xueli Yu, Zeyu Cui, Shu Wu, Zhongzhen Wen, and Liang Wang. Every document owns its structure: Inductive text classification via graph neural networks. In ACL, pp. 334–339, 2020.
 Zhou et al. (2019) Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, and Yang Liu. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. In NeurIPS, 2019.