1 Introduction
The body of scientific research has been growing rapidly in recent years. For example, the number of records in DBLP (https://dblp.uni-trier.de/statistics/recordsindblp) increased from 2,486,800 in 2013 to 4,893,893 in 2019; according to the AI Index Report 2019 (https://hai.stanford.edu/aiindex/2019), the number of peer-reviewed AI publications increased by 300% between 1998 and 2018. Quantifying the impact of publications, as well as of individual scholars/authors, is an important task in many domains of societal and scientific relevance. For example, funding agencies and research institutes need to deeply understand current research development – e.g., discovering frontier ideas, identifying breakthrough topics and productive scholars, seeking well-fitted scientists for defined projects, hiring high-quality faculty [Fortunato et al.2018] – for improved policy and decision making. The availability of various scientific databases, such as Web of Science, Google Scholar, DBLP and U.S. Patent, provides an unprecedented opportunity to explore the careers of scientists and the dynamically evolving process of paper dissemination. However, the scientific impact of scholars and papers can be affected by a variety of factors. For example, a productive researcher may publish a number of papers every year, but the impact of her/his publications may vary significantly [Sinatra et al.2016]. Also, some scientific findings may receive a burst of attention immediately, while others may take decades after their original publication date [Ke et al.2015].
Existing works: Quantifying and forecasting the impact of scientific diffusion have been scrutinized by generations of researchers since [Price1965]. Earlier efforts [Yan et al.2011, Acuna et al.2012, Wang et al.2013] primarily focused on extracting indicative features and discovering latent mechanisms that drive the accumulation of citations. Features of scholars such as the number of publications, the h-index, and years since the first publication were used to forecast the future h-index in [Acuna et al.2012]. Factors such as topical authority and publication venue that may increase citations were used to predict the scientific impact in [Dong et al.2016]. Despite certain merits, these works are limited in impact predictability due to the confluence of different and sometimes controversial factors [Clauset et al.2017, Ke et al.2015], and the difficulty of generalizing the knowledge from one discipline to another. Meanwhile, some implicit but important factors are not fully leveraged, such as academic authority that amplifies author/paper exposure and facilitates grant funding. Another line of work predicts the propagation of scientific impact from the perspective of stochastic information dynamics, relying on various pattern-recognition based models (e.g., Hawkes process and Poisson process) [Shen et al.2014, Cao et al.2017]. These methods are theoretically solid and have demonstrated their advantages, particularly in interpretability, but they require longer sequences of observations and are unable to fully leverage the complex interactions among authors and papers for impact prediction.
Recent applications of deep neural networks on graph-based data have inspired numerous models for capturing the temporal and sequential process of information diffusion, including scientific impact. DeepCas [Li et al.2017] is a graph-embedding based popularity prediction model, learning the representation of cascade graphs with DeepWalk [Perozzi et al.2014] and the diffusion process via recurrent neural networks [Chung et al.2014]. CasCN [Chen et al.2019] exploits the structure of each information cascade by a dynamic graph convolutional network (GCN) [Kipf and Welling2017], and predicts the size of cascades while taking the directionality of cascades and time decay effects into consideration. As a multi-task learning framework, it simultaneously predicts the information popularity at the macro-level and the user participation in reposting at the micro-level. However, these approaches deal with representation learning of homogeneous graphs, which limits their capability of exploiting the information associated with node attributes and complex relations among heterogeneous nodes. Thus, incorporating meaningful relations among nodes into the information diffusion remains one of the unaddressed issues in existing methods, which also motivates our present work.
Present work: In this paper, we propose a Heterogeneous Dynamical Graph Neural Network (HDGNN) based approach to study the dynamically evolving process of scientific impact while capturing rich semantics embedded in bibliographic graphs. HDGNN bridges the gap between dynamical GNNs [Trivedi et al.2019, Manessi et al.2020] and heterogeneous information network (HIN) embedding [Zhang et al.2019a, Lu et al.2019, Shi et al.2018], which have largely been studied independently in prior works. HDGNN learns academic graph representation with a heterogeneous GNN that aggregates neighboring features of nodes with a newly designed weighted contextualized node selection strategy and a temporal-attentive representation network, while preserving the unevenly distributed scientific impact of nodes. It also captures the dynamic evolution of nodes and the temporal dependencies among authors/papers by encoding temporal cascading information into node representations, which, in turn, sheds light on the underlying mechanism that accumulates the impact for both authors and papers.
Our main contribution is twofold: (1) We study the scientific impact quantification problem from the view of heterogeneous graph learning, in contrast to prevalent homogeneous graph/cascade learning models [Li et al.2017, Chen et al.2019], which allows us to capture complex and rich interactions among different types of nodes. (2) We extend heterogeneous graph learning with a temporal horizon, enabling us to address the dynamic prediction problem, whereas existing HIN embedding works mainly focus on graph representation learning and/or several relevant static tasks such as link prediction and node classification [Zhang et al.2019a, Lu et al.2019].
2 Preliminaries
We now introduce the necessary background and formally define our problem settings.
Consider papers and authors as two independent sets of entities. For each paper, we record the time since its first publication; for each author, we record how many years this author has been publishing papers. We fix a reference time (the end of the observation window) and a later prediction time, and track the number of citations that each paper and each author has received at a given time. The scientific impact predictions for papers and authors can be defined as regression problems as follows:
Problem 1 (Scientific impact prediction for papers).
Given a set of papers, for each paper and its associated observations at the reference time, we aim to predict its total number of citations at the prediction time, i.e., how many times this paper has been cited since publication.
Problem 2 (Scientific impact prediction for authors).
Given a set of authors, for each author and her associated observations at the reference time, we aim to predict her total number of citations at the prediction time, i.e., how many times this author has been cited since her first publication.
Given the above, we build an academic heterogeneous graph consisting of a set of nodes and a set of weighted and directed edges indicating node relations. Each node is associated with a node type (we consider three types of nodes: paper, author, and venue). Edges are described by seven types: author writes paper, author collaborates with author, author publishes in venue, author cites paper, paper is published in venue, paper cites paper, and paper cites author(s). Additionally, nodes carry features, including the content of papers, profiles of authors, etc.
During an observation window, each paper or author can be cited by other papers/authors. The resulting time-ordered sequence of citations is referred to as a cascade. Given the exact number of citations at the prediction time for a particular paper/author, the scientific impact prediction problem can be solved by minimizing the mean square error loss between the predicted number of citations and the true number.
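To make the problem setting concrete, the following minimal sketch derives the observed and target citation counts of one entity from its citation timestamps and evaluates a mean square error. The function names, the list-of-years input format, and the toy timestamps are illustrative assumptions, not part of the original formulation.

```python
from bisect import bisect_right


def citation_counts(citation_times, start_time, observation_window, prediction_horizon):
    """Return (observed, target) citation counts for one paper or author.

    citation_times: years at which citations arrived (hypothetical input format).
    observed: citations received up to start_time + observation_window.
    target: citations received up to start_time + prediction_horizon.
    """
    times = sorted(citation_times)
    observed = bisect_right(times, start_time + observation_window)
    target = bisect_right(times, start_time + prediction_horizon)
    return observed, target


def mean_square_error(predicted, actual):
    """Plain mean square error between predicted and true citation counts."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Toy cascade: a paper published in 1990, observed for 2 years, predicted at 20 years.
example_times = [1990.5, 1991.0, 1991.8, 1995.2, 2003.7, 2009.9, 2012.0]
observed, target = citation_counts(example_times, 1990.0, 2, 20)
```

Here the first three citations fall inside the two-year observation window, and the citation arriving in 2012 lies beyond the 20-year prediction time, so it is excluded from the target.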
3 Methodology: HDGNN
We now present our proposed model HDGNN, which consists of two main building blocks: (i) heterogeneous representation learning via Graph Neural Networks (GNNs); and (ii) temporal paper sequence modeling and author aggregating via Recurrent Neural Networks (RNNs). For simplicity, here we use scientific impact prediction for papers as an illustrative scenario, with a note that the results can be easily generalized to the scenario of scientific impact prediction for authors (we show the prediction results for both settings in Section 4).
3.1 Heterogeneous Graph Representation
The first part of HDGNN learns the representation of nodes in the academic graph. Specifically, for each paper node, author node, and venue node, given its heterogeneous neighbors in a non-Euclidean graph structure, we learn a low-dimensional node embedding via a mapping function such that the embedding preserves neighboring proximity. To this end, we borrow the idea of random walk with restart [Tong et al.2006] and a deep neural network architecture [LeCun et al.2015] from [Zhang et al.2019a] to model the heterogeneous graph representation learning.
Heterogeneous neighboring node sampling.
Given a node (paper/author/venue) in the graph, the distribution of its neighboring nodes may be highly skewed, i.e., some nodes connect to a large number of other nodes (e.g., highly cited papers/authors) while most of them only have a few neighbors, closely following the heavy-tailed distribution of citations [Fortunato et al.2018]. High-impact journals, productive authors, and influential papers often have much higher degrees than the majority of nodes. To accommodate this factor in our model, we design a weighted contextualized node selection strategy based on random walk, which is more suitable for capturing scientific impact and the imbalanced distribution of nodes in the heterogeneous academic graph. Specifically, at each step, the walker either returns to the previous node with a given probability, or otherwise jumps to a neighbor of the current node. When jumping, the walker selects one of the current node's neighbors based on node types, edge types, and node/edge characteristics. The transition probability to each neighbor (Eq. (1)) is determined by influence functions that measure node influence from various factors, e.g., arrival time, node degrees, PageRank scores, and similarities, according to the node type or edge type.
By running random walks iteratively, we can sample a fixed number of nodes for each node type, resulting in three sets of sampled neighbors (papers, authors, and venues). Note that we consider edge directions, weights, and node degrees when sampling representative heterogeneous neighbors.
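The sampling strategy can be illustrated with a minimal sketch, written under several assumptions that are not confirmed by the text: the walker restarts at the start node (the standard random-walk-with-restart convention), the influence of a candidate neighbor is the edge weight multiplied by one plus its in-degree, and the most frequently visited nodes of each type are kept. The function name, graph layout, and toy graph are hypothetical.

```python
import random
from collections import Counter, defaultdict


def sample_typed_neighbors(graph, node_type, start, restart_prob=0.5,
                           walk_length=30, num_walks=5, per_type=2, seed=0):
    """Sample up to `per_type` neighbors of each node type for `start`.

    graph: dict mapping node -> {neighbor: edge_weight} (directed, weighted).
    node_type: dict mapping node -> "paper" | "author" | "venue".
    """
    # In-degree of every node, combined with edge weight as an influence proxy.
    in_degree = Counter(dst for nbrs in graph.values() for dst in nbrs)
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(num_walks):
        current = start
        for _ in range(walk_length):
            neighbors = list(graph.get(current, {}))
            # Restart with probability `restart_prob`, or when stuck at a dead end.
            if not neighbors or rng.random() < restart_prob:
                current = start
                continue
            weights = [graph[current][n] * (1 + in_degree[n]) for n in neighbors]
            current = rng.choices(neighbors, weights=weights, k=1)[0]
            if current != start:
                visits[current] += 1
    # Keep the most frequently visited nodes, grouped by node type.
    sampled = defaultdict(list)
    for node, _ in visits.most_common():
        if len(sampled[node_type[node]]) < per_type:
            sampled[node_type[node]].append(node)
    return dict(sampled)


toy_graph = {
    "p1": {"a1": 1.0, "v1": 1.0, "p2": 2.0},
    "p2": {"a2": 1.0, "v1": 1.0},
    "a1": {"p1": 1.0},
    "a2": {"p2": 1.0, "a1": 0.5},
    "v1": {"p1": 1.0},
}
toy_types = {"p1": "paper", "p2": "paper", "a1": "author", "a2": "author", "v1": "venue"}
neighbors = sample_typed_neighbors(toy_graph, toy_types, "p1")
```

Weighting transitions by in-degree biases the walk toward well-connected nodes, which is one simple way to reflect the skewed degree distribution described above; other influence functions (arrival time, PageRank, similarity) could be substituted in the `weights` line.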
Aggregating node features.
After sampling the neighbors of each node, we utilize Bidirectional Gated Recurrent Units (BiGRUs) [Chung et al.2014] to model the dependencies among the nodes' content features. Assuming that there are several content features for one specific type of node, the feature aggregation is formalized in Eqs. (2) and (3): the aggregated embedding of a node is computed by mean pooling, a concatenation operation joins the intermediate representations, and the heterogeneous node content features enter through the output of a Multi-Layer Perceptron (MLP). In practical applications, various content features can be used here to enhance the model's learning ability, e.g., metadata and the text of papers (title, abstract, main body), illustrations/figures, historical publications of authors/venues, and profiles and honors of authors.
Aggregating heterogeneous neighbors.
After aggregating node content features, each node in the graph has its corresponding aggregated features. We are then ready to use a type-based RNN to aggregate the embeddings of its sampled neighbors. For each node type (in our case paper/author/venue), the node has a homogeneous type-specific neighboring set and a type-specific aggregator. More specifically, HDGNN utilizes another BiGRU to model the node's neighbors (Eq. (4)), whose output is the embedding of the homogeneous neighboring set, with a fixed dimension for the aggregated neighboring embeddings.
In HDGNN we use deterministic neural networks, bidirectional RNNs, and mean pooling as aggregators of a node's content along with its neighbors. Alternatively, other types of aggregators, e.g., the last hidden state of RNNs, CNNs, or max pooling, can be used (cf. [Zhang et al.2019a, Hamilton et al.2017]).
Multi-head attention for type-based neighbors.
With each of the type-based neighboring aggregators in hand, we combine them using a multi-head attention mechanism [Veličković et al.2018] (Eq. (5)), which yields the learned embedding of the node. The attention uses LeakyReLU as the activation function, a concatenation operation, a learned attention parameter, and multiple attention heads. Its inputs are computed by Eq. (2) and Eq. (4), respectively.
3.2 Citation Cascading and Author Aggregation
The second part of HDGNN models the cascading behavior of the papers/authors. Here we consider each paper as an independent entity. Recall that each paper has a publication time and a reference time, and that its citing papers are grouped by the time at which they were published within the observation window. Since we have already obtained the embeddings of papers, authors, and venues (cf. Eq. (5)), we now separately model the authors of a paper and the paper itself by feeding them into RNNs.
Multi-author aggregation layer.
Note that each citing paper of a given (original) paper may have multiple authors (in our dataset the mean number of authors per paper is 3.438 and the maximum is 25). We sequentially feed the author embeddings of a citing paper into a GRU and then use the last hidden state as the representation of that paper's authors.
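The author aggregation step can be sketched in plain Python as follows. This is an illustrative GRU cell with randomly initialized weights and no bias terms, not the trained model; the class name, the weight initialization, and the toy embeddings are assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class GRUCell:
    """Minimal GRU cell (biases omitted) with random, untrained weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # One input and one recurrent matrix per gate:
        # update (z), reset (r), and candidate state (n).
        self.W = {gate: mat(hidden_size, input_size) for gate in "zrn"}
        self.U = {gate: mat(hidden_size, hidden_size) for gate in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b)
             for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], reset_h))]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]


def aggregate_authors(cell, author_embeddings):
    """Feed author embeddings in order and return the last hidden state."""
    h = [0.0] * cell.hidden_size
    for embedding in author_embeddings:
        h = cell.step(embedding, h)
    return h


cell = GRUCell(input_size=3, hidden_size=4)
authors = [[0.1, 0.2, 0.3], [0.5, -0.4, 0.0], [0.9, 0.9, -0.9]]
author_representation = aggregate_authors(cell, authors)
```

Because the last hidden state depends on the order of the inputs, the same set of authors fed in a different order generally yields a different representation, which distinguishes this aggregator from order-invariant pooling.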
Sequential citation aggregation layer.
After author aggregation, each citing paper has its own embedding, the corresponding venue embedding, and the aggregated author embedding. We then use a two-layer BiGRU to sequentially aggregate the citing papers ordered by their publication time, where each citing paper is modeled as the combination of paper, authors, and venue. The rationale is that we expect to capture temporal dependencies among citing papers, which, as we will show in the experiments, is superior to other aggregators such as sum or max pooling [Hamilton et al.2017]. The overall architecture of the citation aggregation is formalized in Eq. (6), which is expressed in terms of the hidden states of the second BiGRU layer. We concatenate the last hidden states of the BiGRU as the final output representation of the paper, and then use it to predict the paper's scientific impact.
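As a rough illustration of how the input to the sequential aggregator could be prepared, the sketch below orders citing papers by time, keeps only the earliest ones up to a maximum length, and joins the three embeddings of each citing paper. Treating the "combination" as plain concatenation, as well as the dictionary keys and the function name, are assumptions made for the example.

```python
def build_citation_sequence(citing_papers, max_length):
    """Return a time-ordered list of per-citation input vectors.

    citing_papers: list of dicts with hypothetical keys "time", "paper",
    "authors", and "venue", where the last three hold embedding lists.
    Only the first `max_length` citations (by time) are kept.
    """
    ordered = sorted(citing_papers, key=lambda c: c["time"])
    return [c["paper"] + c["authors"] + c["venue"] for c in ordered[:max_length]]


toy_citations = [
    {"time": 1992.4, "paper": [0.3], "authors": [0.1, 0.2], "venue": [0.9]},
    {"time": 1991.1, "paper": [0.7], "authors": [0.4, 0.5], "venue": [0.8]},
    {"time": 1993.0, "paper": [0.2], "authors": [0.6, 0.6], "venue": [0.9]},
]
sequence = build_citation_sequence(toy_citations, max_length=2)
```

Each element of `sequence` would then be one time step of the recurrent aggregator.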
Output and model training.
The output of HDGNN is the predicted citation number of a paper, produced by a two-layer MLP with GELU activation [Hendrycks and Gimpel2016]. Two training losses are defined: Eq. (7) for graph representation and Eq. (8) for impact prediction. The graph representation loss is built on the conditional probability of observing the type-based neighboring set of each node, while the impact prediction loss is computed over the training samples from the predicted number of citations of each paper at the prediction time.
As for scientific impact prediction for scholars, the general training process is similar, except that Eq. (8) is defined on the predicted number of citations for each author.
4 Experiments
We evaluate HDGNN and several baselines on two scientific impact predictions – paper and author citations, respectively.
Dataset.
The evaluations were performed on the American Physical Society (APS) dataset (https://journals.aps.org/datasets). The APS dataset contains over 422K academic papers on 17 venues and 54M citations among papers between 1893 and 2017. The constructed heterogeneous graph of APS contains 616,316 papers, 430,950 authors, and 17 venues. For edges we have: author writes paper (2.9M), author collaborates with author (2.8M), author publishes in venue (0.6M), author cites paper (20.5M), paper cites paper (7.3M), paper cites author (4.8M), and paper is published in venue (0.6M). For papers (authors) in the dataset, we select 20 years as the prediction time. Thus, we only consider the papers published before 1997, to ensure that each paper has at least 20 years to grow its citation count. In the same way, selected authors are required to have started their research career no later than 1997. We set the reference time to 2 years, and papers/authors with fewer than 10 citations during the observation window are filtered out. The settings of prediction for authors are the same as those for papers. After the preprocessing, we have a total of 11,475 papers and 14,318 authors. We use 50% of them for training, 25% for validation, and the remaining 25% for testing. Figure 2 shows the statistics of the APS dataset.
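The selection and splitting rules described above can be sketched as follows. The record layout, the function name, and the use of a seeded random shuffle are assumptions; the text does not state how samples are assigned to the three splits.

```python
import random


def select_and_split(records, cutoff_year=1997, min_observed=10, seed=0):
    """Filter records and split them 50% / 25% / 25%.

    records: list of dicts with hypothetical keys "id", "year" (publication
    year) and "observed_citations" (citations within the observation window).
    """
    kept = [r for r in records
            if r["year"] < cutoff_year and r["observed_citations"] >= min_observed]
    rng = random.Random(seed)
    rng.shuffle(kept)  # assumption: random assignment to splits
    n_train = len(kept) // 2
    n_val = len(kept) // 4
    train = kept[:n_train]
    val = kept[n_train:n_train + n_val]
    test = kept[n_train + n_val:]
    return train, val, test


toy_records = [{"id": i, "year": 1990 + i % 10, "observed_citations": i}
               for i in range(100)]
train, val, test = select_and_split(toy_records)
```

For authors, the text uses "no later than 1997" for the career start, so the year comparison would need to be inclusive in that case.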
Baselines.
The baselines used for comparison include feature-oriented models, graph embedding models, graph neural networks, as well as the state-of-the-art information cascade popularity prediction models.
Uniform – for all papers/authors, we always predict their impact as a fixed number, uniformly searched from the minimum to the maximum with a fixed step.
Feature – we feed extracted features into a linear regression model: observed citations, mean arrival time, and degrees of nodes. We use observed citation (i.e., Feature) as a simple baseline.
DeepCas [Li et al.2017] – is a deep learning based prediction model utilizing DeepWalk for graph embedding and RNNs for cascade modeling and prediction.
DeepHawkes [Cao et al.2017] – makes use of Hawkes point processes and neural networks for cascade prediction.
CasCN [Chen et al.2019] – utilizes GCN [Kipf and Welling2017] and LSTM to model the structural and temporal information of cascades.
Variants of HDGNN.
In order to compare other graph representation frameworks with our proposed HDGNN, we build 10 variants. Six of them replace the first part of HDGNN with another representation method, covering homogeneous and heterogeneous methods as well as skip-gram based and matrix factorization based methods: DeepWalk [Perozzi et al.2014], LINE [Tang et al.2015], Metapath2Vec [Dong et al.2017], ProNE [Zhang et al.2019b], together with the graph neural networks GraphSAGE [Hamilton et al.2017] and HetGNN [Zhang et al.2019a]. Besides, we substitute the RNN aggregator with max pooling or sum pooling as two additional variants, denoted as HDGNN-MaxP and HDGNN-SumP. To evaluate the impact of author/venue embeddings, we separately remove the author part or the venue part in the citation aggregation of Section 3.2, denoted as HDGNN-NoAuthor and HDGNN-NoVenue.
Metrics.
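We use two widely used evaluation metrics [Zhao et al.2015, Shen et al.2014, Cao et al.2017], i.e., mean square logarithmic error (MSLE) and accuracy (ACC). MSLE averages, over the test samples, the squared difference between the logarithms of the predicted and the true citation counts. ACC is the fraction of test samples for which an indicator function marks the prediction as correct.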
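The two metrics can be sketched as below. Several details are assumptions rather than the definitions used in the experiments: the base-2 logarithm with a +1 offset in MSLE, and an accuracy criterion that counts a prediction as correct when its relative error is within a tolerance (the `tolerance` value of 0.3 is a placeholder).

```python
import math


def msle(predicted, actual):
    """Mean square logarithmic error (assumed form: log2 with a +1 offset)."""
    return sum((math.log2(p + 1) - math.log2(a + 1)) ** 2
               for p, a in zip(predicted, actual)) / len(actual)


def accuracy(predicted, actual, tolerance=0.3):
    """Fraction of samples whose relative error is within `tolerance` (assumed criterion)."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance * a)
    return hits / len(actual)


msle_value = msle([3, 7], [3, 1])
acc_value = accuracy([10, 20, 50], [10, 25, 100])
```

In the toy call, the first MSLE term is zero and the second is (log2 8 - log2 2)^2 = 4, giving a mean of 2; two of the three accuracy samples fall within the 30% tolerance.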
Model            Papers             Authors
                 MSLE    ACC        MSLE    ACC
Uniform          0.588   49.70%     1.102   36.37%
Feature          0.401   58.35%     0.939   39.16%
Feature          0.361   58.77%     0.832   40.19%
DeepCas          0.349   58.22%     0.787   41.45%
DeepHawkes       0.328   59.91%     0.725   42.38%
CasCN            0.310   61.71%     0.692   44.03%
DeepWalk         0.288   68.89%     0.627   49.09%
LINE             0.281   68.70%     0.614   47.88%
Metapath2Vec     0.294   66.38%     0.642   49.72%
GraphSAGE        0.309   64.97%     0.675   48.46%
HetGNN           0.292   65.79%     0.607   51.06%
ProNE            0.297   66.60%     0.635   47.65%
HDGNN-MaxP       0.358   64.22%     0.810   42.71%
HDGNN-SumP       0.280   69.90%     0.749   44.10%
HDGNN-NoAuthor   0.279   69.05%     0.605   50.00%
HDGNN-NoVenue    0.290   67.10%     0.651   48.26%
HDGNN            0.268   69.77%     0.590   51.62%
Experimental settings.
For all graph representation baselines, we set the embedding dimension to 128. The random walk restart probability is 0.5, the walk length is 30, and the number of walks for each node is 5. For the type-specific influence parameters, we use node in-degree and edge weights as a proxy of node influence. For HDGNN and its variants, the learning rate is selected from a set of candidate values, and the node embedding size is set to a single fixed value. The length of the citation sequence for all methods (whether RNN, LSTM or GRU) is capped at a maximum number of citations; for papers/authors with longer sequences, we only keep their earliest citations up to that maximum (a separate RNN length is set for author sequences). The units are set to 128 and 64 in the two-layer BiRNNs, and to 64 and 32 in the two-layer MLPs. For the feature aggregation RNNs, we use paper title embeddings pre-trained via BERT [Devlin et al.2019], and node embeddings pre-trained via DeepWalk. All the other hyper-parameters of the baselines are set to their default values. Performance results are reported with early stopping on the validation loss with a patience of 10 epochs. The source code of HDGNN is released at https://github.com/Xovee/hdgnn.
Prediction performance.
We show the performance of all the models in Table 1, and we observe that:
(1) HDGNN achieves the lowest MSLE in both paper and author impact prediction and the highest accuracy for authors; on paper accuracy it is within 0.13 percentage points of the best score (69.77% vs. 69.90% for the HDGNN-SumP variant). This result demonstrates the effectiveness of learning interactions among heterogeneous nodes with the proposed heterogeneous information aggregation, which can be further verified by the fact that neither feature-based models nor homogeneous cascade prediction methods show comparable performance. Previous popularity prediction methods, e.g., DeepCas, DeepHawkes and CasCN, do not distinguish the types of nodes and therefore fail to model their complex and meaningful interactions.
(2) Author impact prediction is much harder than that of papers. As shown in (B)(E) in Figure 2, the citation number of authors is higher than that of papers by orders of magnitude, as well as the coefficients of correlation between observed and future citations. In fact, with a two-year observation window, the proportion of average observed citations to final citations is about 9.1% for authors. In contrast, the proportion for papers is 34.6% (cf. (A) in Figure 2), which explains why predicting authors' impact is more difficult, i.e., largely due to insufficient observations and enormous variability in scholars' productivity [Clauset et al.2017] (cf. (F) in Figure 2). In addition, paper citation is strongly correlated with factors such as the citations a paper has already gained and the importance of the publication venue (e.g., journal impact factor), which can be easily modeled in the graph with node attributes. In contrast, scholars' impact is far more unstable due to implicit factors such as funding scheme, tenure, and gender issues, all of which need to be quantified with external high-resolution data repositories.
Ablation study.
We now investigate the effect of important modules in HDGNN. Firstly, the information aggregation mechanism used in HDGNN is better than other graph embedding techniques, including two heterogeneous network embedding methods, i.e., Metapath2Vec and HetGNN, because of the more complex relations considered in our model and the benefit of considering temporal dependencies between citation sequences and/or author sequences. For example, HDGNN models 7 types of relations among nodes, whereas HetGNN only considers 3 edge types.
Additionally, the publication venue plays a vital role in predicting the impact of an author or a paper. This is demonstrated by the performance degradation after removing the venue embedding from the citation aggregation of Section 3.2 (MSLE rises from 0.268 to 0.290 for papers and from 0.590 to 0.651 for authors). Authorship, surprisingly, is less important than the journal in which a paper is published, though masking the authorship information still slightly degrades the prediction performance. As for aggregation choices, both max pooling and sum pooling yield a higher MSLE than the RNN aggregator used in HDGNN (sum pooling is marginally ahead only in paper accuracy), which we attribute to their lack of sequential dependencies, which are important for predicting evolving trends.
Qualitative analysis.
Figure 4 shows the prediction results on 8 representative journals: the lower the MSLE and/or the higher the accuracy, the better the performance. The performance of HDGNN varies significantly across publication venues, which is natural since the venue is a strong indicator of future impact accumulation. In addition, we found that the prediction accuracy is affected by the citation distribution of papers in a journal. For example, the standard deviation of the 20-year citations of papers in Rev. Mod. Phys. is very high (255.02), whereas the value for Phys. Rev. is significantly lower (43.24). This discrepancy also reveals why prediction for papers in Rev. Mod. Phys. is more difficult.
Figure 4 plots the latent space learned in HDGNN, where we can observe clear clustering phenomena of author/paper embeddings from (A) and (C). It appears that papers published in the same journal tend to cluster together, which also indicates that the publication venue is an important indicator for scientific impact prediction. In addition, we also visualize a "crowd effect" of high-impact papers/authors, as shown in (B) and (D). This implies strong correlations among high-impact scholars and papers, e.g., high-impact scholars prefer to cite papers from other high-impact authors/papers. In other words, there indeed exists a positive feedback loop between high-impact papers/scholars. Another interesting result that can be visualized is the gradually decaying color of the paper/author citations, implying that the heavy-tailed distribution of scientific impact is successfully (to some extent at least) encoded in our model. It could also explain why our dynamic heterogeneous neighboring aggregation with the weighted contextualized node selection strategy outperforms other homogeneous and heterogeneous graph embeddings.
5 Conclusion
We introduced the HDGNN approach for effectively quantifying and predicting the scientific impact of scholars and research publications, by bridging the dynamic processes of impact evolution and complex node interactions. We presented an efficient network sampling method that considers rich node relations, and a temporally attentive neighbor aggregation network to model the complex and accumulating dynamic processes of scientific impact. Evaluations on a real-world scientific dataset demonstrated the superior performance of HDGNN in comparison to several state-of-the-art baselines. Future work will investigate the impact of cross-institutional collaboration on citations.
Acknowledgments
This work was supported by National Natural Science Foundation of China (Grant No.61602097, 61472064, U19B2028 and 61772117), NSF grants III 1213038 and CNS 1646107.
References
[Acuna et al.2012] Daniel E Acuna, Stefano Allesina, and Konrad P Kording. Future impact: Predicting scientific success. Nature, 489(7415):201, 2012.
[Cao et al.2017] Qi Cao, Huawei Shen, Keting Cen, Wentao Ouyang, and Xueqi Cheng. Deephawkes: Bridging the gap between prediction and understanding of information cascades. In CIKM, 2017.
[Chen et al.2019] Xueqin Chen, Fan Zhou, Kunpeng Zhang, Goce Trajcevski, Ting Zhong, and Fengli Zhang. Information diffusion prediction via recurrent cascades convolution. In ICDE, 2019.
[Chung et al.2014] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
[Clauset et al.2017] Aaron Clauset, Daniel B Larremore, and Roberta Sinatra. Data-driven predictions in the science of science. Science, 355(6324):477–480, 2017.
[Devlin et al.2019] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT, pages 4171–4186, 2019.
[Dong et al.2016] Yuxiao Dong, Reid A Johnson, and Nitesh V Chawla. Can scientific impact be predicted? IEEE Transactions on Big Data, 2(1):18–30, 2016.
[Dong et al.2017] Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. metapath2vec: Scalable representation learning for heterogeneous networks. In KDD, pages 135–144, 2017.
[Fortunato et al.2018] Santo Fortunato, Carl T Bergstrom, Katy Börner, James A Evans, Dirk Helbing, Staša Milojević, Alexander M Petersen, Filippo Radicchi, Roberta Sinatra, Brian Uzzi, et al. Science of science. Science, 359(6379), 2018.
[Hamilton et al.2017] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NIPS, pages 1024–1034, 2017.
[Hendrycks and Gimpel2016] Dan Hendrycks and Kevin Gimpel. Bridging nonlinearities and stochastic regularizers with gaussian error linear units. arXiv preprint arXiv:1606.08415, 2016.
[Ke et al.2015] Qing Ke, Emilio Ferrara, Filippo Radicchi, and Alessandro Flammini. Defining and identifying sleeping beauties in science. PNAS, 112:7426–7431, 2015.
[Kipf and Welling2017] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
[LeCun et al.2015] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.
[Li et al.2017] Cheng Li, Jiaqi Ma, Xiaoxiao Guo, and Qiaozhu Mei. Deepcas: An end-to-end predictor of information cascades. In WWW, 2017.
[Lu et al.2019] Yuanfu Lu, Chuan Shi, Linmei Hu, and Zhiyuan Liu. Relation structure-aware heterogeneous information network embedding. In AAAI, pages 4456–4463, 2019.
[Manessi et al.2020] Franco Manessi, Alessandro Rozza, and Mario Manzo. Dynamic graph convolutional networks. Pattern Recognition, 97, 2020.
[Perozzi et al.2014] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In KDD, pages 701–710, 2014.
[Price1965] Derek J De Solla Price. Networks of scientific papers. Science, pages 510–515, 1965.
[Shen et al.2014] Huawei Shen, Dashun Wang, Chaoming Song, and Albert-László Barabási. Modeling and predicting popularity dynamics via reinforced poisson processes. In AAAI, 2014.
[Shi et al.2018] Yu Shi, Qi Zhu, Fang Guo, Chao Zhang, and Jiawei Han. Easing embedding learning by comprehensive transcription of heterogeneous information networks. In KDD, pages 2190–2199, 2018.
[Sinatra et al.2016] Roberta Sinatra, Dashun Wang, Pierre Deville, Chaoming Song, and Albert-László Barabási. Quantifying the evolution of individual scientific impact. Science, 354(6312):aaf5239, 2016.
[Tang et al.2015] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. Line: Large-scale information network embedding. In WWW, pages 1067–1077, 2015.
[Tong et al.2006] Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast random walk with restart and its applications. In ICDM, pages 613–622. IEEE, 2006.
[Trivedi et al.2019] Rakshit Trivedi, Mehrdad Farajtabar, Prasenjeet Biswal, and Hongyuan Zha. Dyrep: Learning representations over dynamic graphs. In ICLR, 2019.
[Veličković et al.2018] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In ICLR, 2018.
[Wang et al.2013] Dashun Wang, Chaoming Song, and Albert-László Barabási. Quantifying long-term scientific impact. Science, 342(6154):127–132, 2013.
[Yan et al.2011] Rui Yan, Jie Tang, Xiaobing Liu, Dongdong Shan, and Xiaoming Li. Citation count prediction: learning to estimate future citations for literature. In CIKM, pages 1247–1252, 2011.
[Zhang et al.2019a] Chuxu Zhang, Dongjin Song, Chao Huang, Ananthram Swami, and Nitesh V Chawla. Heterogeneous graph neural network. In KDD, pages 793–803, 2019.
[Zhang et al.2019b] Jie Zhang, Yuxiao Dong, Yan Wang, Jie Tang, and Ming Ding. Prone: fast and scalable network representation learning. In IJCAI, pages 4278–4284, 2019.
[Zhao et al.2015] Qingyuan Zhao, Murat A Erdogdu, Hera Y He, Anand Rajaraman, and Jure Leskovec. Seismic: A self-exciting point process model for predicting tweet popularity. In KDD, 2015.