Many application settings involve analysis of timestamped relational event data in the form of triplets (sender, receiver, timestamp), as shown in Figure 1(a). Examples include analysis of messages between users of an online social network, emails between employees of a company, and transactions between buyers and sellers on e-commerce websites. These types of data can be represented as dynamic networks evolving in continuous time due to the fine granularity of the event timestamps and the irregular time intervals at which events occur.
Statistically modeling these types of relations and their dynamics over time has been of great interest, especially given the ubiquity of such data in recent years. The majority of work has involved modeling these relations using network representations, with nodes representing senders and receivers, and edges representing events. Such network representations often either discard the timestamps altogether, which transforms the dynamic network into a static network, or aggregate events over time windows to form network snapshots evolving in discrete time. There have been numerous statistical models proposed for static networks dating back to the 1960s, and more recently, for discrete-time networks [33, 32, 31, 22], but comparatively less attention has been devoted to continuous-time networks.
Existing continuous-time network models are quite complex and utilize Markov chain Monte Carlo (MCMC) techniques for inference. Fitting these models to large networks remains a major challenge due to the poor scalability of the MCMC-based algorithms. The development of these continuous-time network models appears to have progressed separately from recent advances in static and discrete-time network models. Although some continuous-time models have drawn their inspiration from static network models, they have not been able to leverage the efficient inference techniques available for static network models.
In this paper, we introduce the block point process model (BPPM) for continuous-time event-based dynamic networks, inspired by the well-known stochastic block model (SBM) for static networks. The BPPM is a simpler version of the recently proposed Hawkes infinite relational model (IRM), which is a non-parametric Bayesian model for which inference scales only to very small networks. Due to its simplicity and relationship to the SBM, we show that the BPPM can be fit in an efficient manner that scales to large networks with thousands of nodes and hundreds of thousands of events. Our main contributions are as follows:
We prove that static networks aggregated from a BPPM-generated continuous-time network follow an SBM in the limit of growing number of nodes.
We leverage this relationship with the SBM to develop a principled and efficient inference algorithm for the BPPM using a local search initialized by regularized spectral clustering.
We fit the BPPM to several real network data sets, including a Facebook network with thousands of nodes and hundreds of thousands of events, several orders of magnitude larger than the networks handled by the Hawkes IRM and other existing point process network models.
We consider a dynamic network evolving in continuous time through the observation of events between two nodes at recorded timestamps, as shown in Figure 1(a). We assume that events are directed, so we refer to the two nodes involved in an event as the sender and receiver to distinguish directions (although the model we propose can be trivially modified to handle undirected events by reducing the number of parameters). Such event data can be represented in the form of a matrix $E$ where each row is a triplet $(i, j, t)$ denoting an event from node $i$ to node $j$ at timestamp $t$. Let $n$ denote the total number of nodes in the network, and let $T$ denote the time of the last interaction, so that the interaction times are all in $[0, T]$.
From an event matrix $E$, one can obtain an adjacency matrix $A^{[t_1, t_2)}$ over any given time interval $[t_1, t_2)$ such that $0 \le t_1 < t_2 \le T$. To simplify notation, we drop the time interval from the adjacency matrix, i.e. $A = A^{[t_1, t_2)}$. In this adjacency matrix, $a_{ij} = 1$ if there is at least one event between $i$ and $j$ in $[t_1, t_2)$, and $a_{ij} = 0$ otherwise. For example, Figure 1(b) shows two adjacency matrices constructed by aggregating events from the event table shown in Figure 1(a) over two time intervals.
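The aggregation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `adjacency_from_events`, the half-open window convention, the directed reading of "between", and the toy event table are all assumptions made for the example.

```python
def adjacency_from_events(events, n, t1, t2):
    """Aggregate (sender, receiver, timestamp) triplets into a binary adjacency matrix.

    Entry [i][j] is 1 if at least one event from i to j has t1 <= timestamp < t2
    (half-open window and directed entries are assumptions), and 0 otherwise.
    """
    adj = [[0] * n for _ in range(n)]
    for sender, receiver, timestamp in events:
        if t1 <= timestamp < t2:
            adj[sender][receiver] = 1
    return adj


# Hypothetical toy event table (not the one shown in the paper's figure).
events = [(0, 1, 0.4), (1, 2, 1.3), (0, 1, 2.7), (2, 0, 3.1)]
first_window = adjacency_from_events(events, n=3, t1=0.0, t2=2.0)
second_window = adjacency_from_events(events, n=3, t1=2.0, t2=4.0)
```

Repeated events between the same pair inside a window collapse to a single entry, which is exactly the information that the adjacency representation discards.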
2.1 The Stochastic Block Model
Most statistical models for networks consider an adjacency matrix rather than an event-based representation; many commonly used models of this type are discussed in the survey by Goldenberg et al. One model that has received significant attention is the stochastic block model (SBM), which is defined as follows (adapted from Definition 3 in Holland et al.):
Definition 1 (Stochastic block model).
Let $A$ denote a random adjacency matrix for a static network, and let $c$ denote a class membership vector. $A$ is generated according to a stochastic block model with respect to the membership vector $c$ if and only if:
1. For any nodes $i \neq j$, the random variables $a_{ij}$ are statistically independent.
2. For any nodes $i \neq j$ and $i' \neq j'$, if $i$ and $i'$ are in the same class, i.e. $c_i = c_{i'}$, and $j$ and $j'$ are in the same class, i.e. $c_j = c_{j'}$, then $a_{ij}$ and $a_{i'j'}$ are identically distributed.
The classes in the SBM are also commonly referred to in the literature as blocks. The class membership vector $c$ has $n$ entries $c_i \in \{1, \ldots, k\}$, where each entry $c_i$ denotes the class membership of node $i$, and $k$ denotes the total number of classes. Holland et al. originally considered the setting where the class membership vector $c$ is specified a priori, e.g. by using some other available covariate. More recent work has considered estimation in the a posteriori setting where the class memberships are also estimated, and there exist a variety of estimation techniques that have theoretical accuracy guarantees and scale to large networks with thousands of nodes, including spectral clustering [26, 28, 19, 27] and variational expectation-maximization [4, 1].
2.2 Related Work
Most existing work on modeling dynamic networks has considered a discrete-time representation, where the observations consist of a sequence of adjacency matrices. This observation model is ideally suited for network data collected at regular time intervals, e.g. weekly surveys. In practice, however, dynamic network data is often collected at much finer levels of temporal resolution (e.g. at the level of a second or millisecond), in which case it likely makes more sense to treat time as continuous rather than discrete. In order to apply discrete-time dynamic network models to such data, it must first be pre-processed by aggregating events over time windows to form network snapshots, and this technique is used in many real data experiments [33, 32, 14, 31, 23]. For example, an aggregated representation of the network in Figure 1(a) with time window of is shown in Figure 1(b).
Aggregating continuous-time network data into discrete-time snapshots presents several challenges. One would ideally choose the time window to be as short as possible for the maximum temporal resolution. However, this increases the number of snapshots, and accordingly, the computation time (typically linear in the number of snapshots). More importantly, models fit using shorter time windows can lead to worse predictors than models fit using longer time windows because the models often assume short-term memory, such as the Markovian dynamics in several discrete-time SBMs [33, 32, 31, 22]. We demonstrate some of these practical challenges in an experiment in Section 5.2.2.
Another line of research that has evolved independently of discrete-time network models involves the use of point processes to estimate the structure of an implicit or latent network from observations at the nodes [12, 20, 9]. These models are often used to estimate networks of diffusion from information cascades. Such work differs from the setting we consider in this paper, where we directly observe events between pairs of nodes and seek to model the dynamics of such event sequences.
There have been several other models proposed using point processes to model continuous-time event-based networks [7, 2, 8, 30, 23, 10], which is the setting we consider in this paper. The BPPM that we consider is a simpler version of the Hawkes IRM. The relational event model (REM) is related to the BPPM in that it is also inspired by the SBM and shares parameters across nodes in the network in a similar manner. We discuss the Hawkes IRM and REM in greater detail and compare them to our proposed model in Section 3.1.
3 The Block Point Process Model
3.1 Model Specification
We propose to model continuous-time dynamic networks using a generative point process network model. Motivated by the SBM for static networks, we propose to divide nodes into $k$ classes or blocks and to associate a univariate point process with each pair of node blocks $(a, a')$, which we refer to as a block pair $b$. Let $p = k^2$ denote the total number of block pairs. Let $\pi$ denote the class membership probability vector, where $\pi_a$ denotes the probability that a node belongs to class $a$. (It follows that all entries of $\pi$ must sum to 1.) We call our model the block point process model (BPPM). The generative process for the BPPM for a network of duration $T$ is shown in Algorithm 1.
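Algorithm 1 itself is not reproduced in this text, so the following Python sketch reconstructs the generative process only from its description here: sample class memberships, generate event times for each block pair from a univariate point process, and assign each event to a node pair chosen uniformly at random within the block pair. The function names, the pluggable `sample_times` callback, and the use of a homogeneous Poisson process as the example point process are assumptions for illustration.

```python
import random


def sample_poisson_times(rate, duration, rng):
    """Event times of a homogeneous Poisson process on [0, duration)."""
    times = []
    t = rng.expovariate(rate)
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def generate_bppm(n, class_probs, duration, sample_times, rng):
    """Sketch of the BPPM generative process (assumed structure, see lead-in).

    sample_times(a, b, duration, rng) returns the event times for block pair (a, b).
    """
    k = len(class_probs)
    # Draw a class for every node from the class membership probabilities.
    classes = rng.choices(range(k), weights=class_probs, k=n)
    members = [[i for i in range(n) if classes[i] == a] for a in range(k)]
    events = []
    for a in range(k):
        for b in range(k):
            # All ordered node pairs in this block pair (no self-loops).
            pairs = [(i, j) for i in members[a] for j in members[b] if i != j]
            if not pairs:
                continue
            for t in sample_times(a, b, duration, rng):
                i, j = rng.choice(pairs)  # uniform random node pair per event
                events.append((i, j, t))
    events.sort(key=lambda e: e[2])
    return classes, events


rng = random.Random(0)
rates = [[2.0, 0.5], [0.5, 2.0]]  # hypothetical per-block-pair event rates
classes, events = generate_bppm(
    n=20,
    class_probs=[0.5, 0.5],
    duration=10.0,
    sample_times=lambda a, b, T, r: sample_poisson_times(rates[a][b], T, r),
    rng=rng,
)
```

Keeping the point process behind a callback mirrors the fact that the model does not fix the type of point process; a Hawkes sampler could be passed in the same way.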
The BPPM is a very general model; notice that we have not specified what type of point process to use in the model (we discuss this in Section 3.3). If we choose a Hawkes process as the point process, then it is a simpler version of the Hawkes infinite relational model (IRM), which couples an IRM with mutually-exciting Hawkes processes. The IRM is a non-parametric Bayesian version of the SBM that enables the model to automatically discover the number of classes $k$. By utilizing mutually-exciting Hawkes processes, the Hawkes IRM allows for reciprocity between block pairs. Similar to the BPPM, node pairs in a block pair are selected at random to form an edge. The authors propose a Markov chain Monte Carlo (MCMC) approach for inference that scales only to very small networks. The BPPM (when paired with a Hawkes process) simplifies the Hawkes IRM by using a fixed number of classes and univariate rather than multivariate Hawkes processes. We demonstrate in Section 4 that these simplifications allow inference to scale to networks with thousands of nodes.
The BPPM is also related to the relational event model (REM) proposed by DuBois et al., which associates a non-homogeneous Poisson process with each pair of nodes, where the intensity function is piecewise constant with knots (change points) at the event times. Different node pairs belonging to the same block pair are governed by the same set of parameters. The REM also incorporates other edge formation mechanisms within block pairs such as reciprocity and transitivity. However, if one uses only the intercept term in the REM, then it is similar to a BPPM with a piecewise constant intensity Poisson process. The authors also use MCMC for inference, but their approach scales to networks with hundreds of nodes and thousands of events, larger than the Hawkes IRM.
The BPPM we propose is a less flexible model than the Hawkes IRM and the REM, but its simplicity enables us to develop an efficient inference procedure that scales to much larger networks with thousands of nodes and hundreds of thousands of events. The proposed inference procedure, which we discuss in Section 4, takes advantage of the close relationship between the BPPM and the SBM, which we discuss next.
3.2 Relation to the Stochastic Block Model
The BPPM is motivated by the SBM, where the probability of forming an edge between two nodes depends only on the classes of the two nodes. Given the relation between the point process and adjacency matrix representations discussed in Section 2, a natural question is whether there is any equivalence between the BPPM and the SBM. Specifically, does an adjacency matrix constructed from an event matrix generated by the BPPM follow an SBM? As far as we know, this connection between point process and static network models has not been previously explored.
We first note that $A$ meets criterion 2 (identical distribution within a block pair) in Definition 1 due to the random selection of node pair for each event in step 8 of Algorithm 1. To check criterion 1 (independence of all entries of $A$), we first note that entries $a_{ij}$ and $a_{i'j'}$ in different block pairs, i.e. $(c_i, c_j) \neq (c_{i'}, c_{j'})$, depend on different point processes, so they are independent.
Next, consider entries $a_{ij}$ and $a_{i'j'}$ in the same block pair $b$. In general, these entries are dependent, so criterion 1 is not satisfied. (One notable exception is the case of a homogeneous Poisson process, for which the entries are independent by the splitting property.) For example, if a Hawkes process is used in step 5 of Algorithm 1, then $a_{i'j'} = 1$ indicates that at least one event was generated in block pair $b$, i.e. there was at least one jump in the intensity of the process. This indicates that the probability of another event is now higher, so the conditional probability $\Pr(a_{ij} = 1 \mid a_{i'j'} = 1)$ should be higher than the marginal probability $\Pr(a_{ij} = 1)$, and thus $a_{ij}$ and $a_{i'j'}$ are dependent.
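We denote the deviation from independence using the terms $\delta_0$ and $\delta_1$ defined by
$$\delta_0 = \Pr(a_{ij} = 0 \mid a_{i'j'} = 0) - \Pr(a_{ij} = 0), \qquad \delta_1 = \Pr(a_{ij} = 0 \mid a_{i'j'} = 1) - \Pr(a_{ij} = 0). \tag{1}$$
If $\delta_0 = \delta_1 = 0$, then the two adjacency matrix entries are independent. If $\delta_0 \neq 0$ or $\delta_1 \neq 0$, then the two entries are dependent, with smaller values of $|\delta_0|$ and $|\delta_1|$ indicating less dependence. The following theorem bounds these values.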
Theorem (Asymptotic Independence Theorem).
Consider an adjacency matrix $A$ constructed from the BPPM over some time interval $[t_1, t_2)$. Then, for any two entries $a_{ij}$ and $a_{i'j'}$ both in block pair $b$, the deviation from independence given by $\delta_0, \delta_1$ defined in (1) is bounded in the following manner:
where $\mu_b$ denotes the expected number of events in block pair $b$ in $[t_1, t_2)$, and $n_b$ denotes the size of block pair $b$. In the limit as $n_b \to \infty$, $\delta_0, \delta_1 \to 0$ provided $\mu_b$ grows at a slower rate than $n_b$. Thus $a_{ij}$ and $a_{i'j'}$ are asymptotically independent for growing $n_b$.
The proof of the Asymptotic Independence Theorem is provided in Appendix A in the supplementary material. We evaluate the tightness of the bound in (2) via simulation in Section 5.1.1. Since the bound depends only on the expected number of events and not on their distribution, it is likely to be loose in general.
The Asymptotic Independence Theorem states that the deviation from independence is non-zero in general for a fixed block pair size, so two entries in the same block pair are dependent, but the dependence decreases as the size of a block (and thus, a block pair) grows. This can be achieved by letting the number of nodes $n$ in the network grow while holding the number of classes $k$ fixed. In this case, the sizes of block pairs would be growing at rate $O(n^2)$, so the asymptotic behavior should be visible for networks with thousands of nodes. Thus, an adjacency matrix constructed from the BPPM follows the SBM in the limit of a growing network. To the best of our knowledge, this is the first such result linking networks constructed from point process models and static network models. It is practically useful in that it allows us to leverage recent work on efficient inference on the SBM for the BPPM.
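The dependence discussed in this section can be checked numerically. The sketch below assumes the structure used in the proof: conditioned on $m$ events in a block pair of size $n_b$, each event picks a node pair uniformly at random, so a given entry is empty with probability $(1 - 1/n_b)^m$ and two given entries are both empty with probability $(1 - 2/n_b)^m$. It then computes the deviation $\Pr(a_{ij} = 0 \mid a_{i'j'} = 0) - \Pr(a_{ij} = 0)$ for a given distribution of the event count. The function names, the truncation at `max_m`, and the two-component Poisson mixture used as a crude stand-in for a bursty (overdispersed) process are assumptions for illustration.

```python
import math


def poisson_pmf(m, mu):
    """Poisson probability mass function, computed in log space for stability."""
    return math.exp(-mu + m * math.log(mu) - math.lgamma(m + 1))


def deviation_given_empty(count_pmf, n_b, max_m=2000):
    """Pr(a_ij = 0 | a_i'j' = 0) - Pr(a_ij = 0) for two entries in one block pair.

    count_pmf(m) is the probability of m events in the block pair; the infinite
    sums from the law of total probability are truncated at max_m (assumption).
    """
    marginal = sum((1 - 1 / n_b) ** m * count_pmf(m) for m in range(max_m))
    joint = sum((1 - 2 / n_b) ** m * count_pmf(m) for m in range(max_m))
    return joint / marginal - marginal


# Homogeneous Poisson counts: the deviation vanishes (splitting property).
poisson_dev = deviation_given_empty(lambda m: poisson_pmf(m, 50.0), n_b=100)

# Hypothetical overdispersed counts (mixture of a quiet and a bursty regime).
def bursty_pmf(m):
    return 0.5 * poisson_pmf(m, 5.0) + 0.5 * poisson_pmf(m, 200.0)

bursty_dev = deviation_given_empty(bursty_pmf, n_b=100)
```

For Poisson counts the two sums have the closed forms $e^{-\mu/n_b}$ and $e^{-2\mu/n_b}$, so the joint probability factorizes exactly; any other count distribution generally leaves a non-zero deviation.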
3.3 Choice of Point Process Model
Any temporal point process can be used to generate the event times in the BPPM. We turn our attention to a specific point process: the Hawkes process, which is a self-exciting process where the occurrence of events increases the probability of additional events in the future. The self-exciting property tends to create clusters of events in time, which are empirically observed in many settings. Prior work has suggested that Hawkes processes with exponential kernels provide a good fit to many real social network data sets, including email and conversation sequences [13, 21] and re-shares of posts on Twitter. Hence, we also adopt the exponential kernel, which has intensity function
$$\lambda(t) = \lambda^{\infty} + \sum_{t_s < t} \alpha e^{-\beta (t - t_s)},$$
where $\lambda^{\infty}$ denotes the background rate that the intensity reverts to over time, $\alpha$ denotes the jump size for the intensity function, $\beta$ denotes the exponential decay rate, and the $t_s$'s denote times of events that occurred prior to time $t$. We refer to this model as the block Hawkes model (BHM).
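To make the self-exciting behavior concrete, the following Python sketch evaluates an exponential-kernel Hawkes intensity and simulates event times with Ogata's thinning algorithm. The paper does not state which simulation method it uses, so thinning is an assumption here, as are the function and parameter names (`background`, `jump`, `decay` for the background rate, jump size, and decay rate).

```python
import math
import random


def hawkes_intensity(t, event_times, background, jump, decay):
    """Intensity at time t given the events that occurred strictly before t."""
    return background + sum(
        jump * math.exp(-decay * (t - s)) for s in event_times if s < t
    )


def simulate_hawkes(background, jump, decay, duration, rng):
    """Simulate event times on [0, duration) by thinning.

    Assumes jump < decay so that the process does not explode.
    """
    times = []
    t = 0.0
    while True:
        # Between events the intensity only decays, so its value just after
        # time t (including any event at t) bounds it until the next event.
        upper = background + sum(jump * math.exp(-decay * (t - s)) for s in times)
        t += rng.expovariate(upper)
        if t >= duration:
            return times
        # Accept the candidate with probability intensity(t) / upper.
        if rng.random() * upper <= hawkes_intensity(t, times, background, jump, decay):
            times.append(t)


rng = random.Random(1)
times = simulate_hawkes(background=0.5, jump=0.8, decay=1.2, duration=50.0, rng=rng)
```

Each accepted event raises the intensity by `jump`, which then decays back toward `background` at rate `decay`; this is what produces the clusters of events in time.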
4 Inference Procedure
4.1 Likelihood Function
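The observed data is in the form of triplets $(i, j, t)$ for each event, denoting the nodes involved and the timestamp $t$. Consider an event matrix $E$ where each row corresponds to an event in the form of a triplet $(i, j, t)$. Let $E_b$ denote the rows of $E$ corresponding to events involving block pair $b = (a, a')$; that is, rows where $c_i = a$ and $c_j = a'$. The row blocks $E_b$ form a partition of the rows of matrix $E$. Let $m_b$ denote the number of events observed in block pair $b$. Let $n_b$ denote the size of block pair $b$, i.e. the number of possible edges in block pair $b$, which is given by $n_b = |a|(|a| - 1)$ if $a = a'$ and $n_b = |a||a'|$ otherwise, where $|a|$ denotes the number of nodes in class $a$. The likelihood function is given by
$$f(E \mid c, \theta) = \prod_{b=1}^{p} f(E_b \mid c, \theta_b) = \prod_{b=1}^{p} f\big(t^{(b)} \mid \theta_b\big) \left(\frac{1}{n_b}\right)^{m_b},$$
where $t^{(b)}$ denotes the timestamps of the events in block pair $b$, $\theta_b$ denotes the point process parameters for block pair $b$, and the last equality follows from the random selection of nodes in step 8 of the BPPM generative process. Taking the log of the likelihood function results in the log-likelihood
$$\log f(E \mid c, \theta) = \sum_{b=1}^{p} \Big[ \log f\big(t^{(b)} \mid \theta_b\big) - m_b \log n_b \Big]. \tag{3}$$
The term $\log f(t^{(b)} \mid \theta_b)$ is simply the log-likelihood of the point process model parameters given the timestamps of events in block pair $b$. The full expression for this term for the block Hawkes model is provided in Appendix B of the supplementary material.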
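As a rough illustration of how this log-likelihood decomposes, the Python sketch below sums, over block pairs, a point process log-likelihood of the timestamps minus the cost of choosing a node pair uniformly at random for each event (the number of events times the log of the block pair size). The function names are hypothetical, and a homogeneous Poisson process evaluated at its maximum-likelihood rate is used as a simple stand-in for the Hawkes term.

```python
import math
from collections import defaultdict


def block_pair_size(a, b, class_sizes):
    """Number of possible directed edges in block pair (a, b), excluding self-loops."""
    if a == b:
        return class_sizes[a] * (class_sizes[a] - 1)
    return class_sizes[a] * class_sizes[b]


def poisson_mle_loglik(times, duration):
    """Homogeneous Poisson log-likelihood at the MLE rate m / T (stand-in for Hawkes)."""
    m = len(times)
    if m == 0:
        return 0.0
    rate = m / duration
    return m * math.log(rate) - rate * duration


def bppm_log_likelihood(events, classes, k, duration, timestamps_loglik):
    """Sum over block pairs of [point process log-likelihood - m_b * log(n_b)].

    Assumes events contain no self-loops, so every block pair with events
    has size at least 1.
    """
    class_sizes = [classes.count(a) for a in range(k)]
    times_by_pair = defaultdict(list)
    for i, j, t in events:
        times_by_pair[(classes[i], classes[j])].append(t)
    total = 0.0
    for a in range(k):
        for b in range(k):
            times = sorted(times_by_pair[(a, b)])
            total += timestamps_loglik(times, duration)
            if times:
                total -= len(times) * math.log(block_pair_size(a, b, class_sizes))
    return total


# Hypothetical toy data: three nodes, two classes, three events.
toy_events = [(0, 1, 1.0), (1, 0, 2.0), (0, 2, 3.0)]
toy_value = bppm_log_likelihood(toy_events, [0, 0, 1], 2, 4.0, poisson_mle_loglik)
```

Passing the timestamp log-likelihood as a function keeps the node-pair term, which depends only on the class assignments, separate from the choice of point process.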
4.2 Local Search
The log-likelihood (3) given the observed event matrix $E$ is a function of both the class assignments $c$ and the point process parameters $\theta$. The class assignments are used to partition the event matrix into row blocks and thus affect both terms in (3) through the timestamps, event counts, and sizes of the block pairs. Class assignments are discrete, and directly maximizing the log-likelihood over $c$, e.g. by exhaustive search, is feasible only for extremely small networks.
We use a local search (hill climbing) procedure, which is also often referred to as label switching or node swapping in the networks literature [16, 35], to iteratively update the class assignments to reach a local maximum in a greedy fashion. Recent work has found that such greedy algorithms are competitive with more computationally demanding estimation algorithms in both the static SBM and discrete-time dynamic SBM [32, 6] while scaling to much larger networks. At each iteration, we swap a single node to a different class by choosing the swap that increases the log-likelihood the most. For each possible swap, we evaluate the log-likelihood by partitioning events into blocks according to the new class assignments, obtaining the maximum-likelihood estimates of the point process model parameters, and substituting these estimates along with the new class assignments into (3).
Each iteration of the local search considers $O(kn)$ possible swaps for a network with $n$ nodes and $k$ classes, and computation of the likelihood for each swap involves iterating over the timestamps of all $m$ events. Thus, each iteration of the local search has time complexity $O(knm)$, which is linear in both the number of events and number of nodes, allowing it to scale to large networks. We verify this time complexity experimentally in Section 5.1.3. The local search is easily parallelized by evaluating each possible swap on a separate CPU core. We terminate the local search procedure when no swap is able to increase the log-likelihood, indicating that we have reached a local maximum.
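The greedy procedure described above can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it scores each candidate class assignment with a profile log-likelihood in which every block pair uses a homogeneous Poisson process at its maximum-likelihood rate (a stand-in for fitting Hawkes parameters), and the function names and the `max_iters` safeguard are assumptions.

```python
import math
from collections import Counter


def profile_log_likelihood(events, classes, duration):
    """BPPM log-likelihood with a Poisson process per block pair at its MLE rate.

    The Poisson term is a stand-in (assumption) for the Hawkes fit; events are
    assumed to contain no self-loops.
    """
    sizes = Counter(classes)
    counts = Counter((classes[i], classes[j]) for i, j, _ in events)
    total = 0.0
    for (a, b), m in counts.items():
        n_b = sizes[a] * (sizes[a] - 1) if a == b else sizes[a] * sizes[b]
        total += m * math.log(m / duration) - m - m * math.log(n_b)
    return total


def local_search(events, classes, k, duration, max_iters=100):
    """Repeatedly apply the single-node class change that most increases the score."""
    classes = list(classes)
    best = profile_log_likelihood(events, classes, duration)
    for _ in range(max_iters):
        best_move = None
        for node in range(len(classes)):
            original = classes[node]
            for new_class in range(k):
                if new_class == original:
                    continue
                classes[node] = new_class  # try the swap
                value = profile_log_likelihood(events, classes, duration)
                if value > best + 1e-12:
                    best, best_move = value, (node, new_class)
            classes[node] = original  # undo before trying the next node
        if best_move is None:
            break  # no swap improves the log-likelihood: local maximum
        classes[best_move[0]] = best_move[1]
    return classes, best
```

A call such as `local_search(events, initial_classes, k=2, duration=T)` returns the refined class assignments and their score. Each pass evaluates every (node, class) candidate and touches every event per evaluation, which is where the per-iteration cost linear in both nodes and events comes from.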
4.3 Spectral Clustering Initialization
In order to ensure that the local search procedure does not get stuck in poor local maxima, it is important to provide a good initialization. Given the close relationship between the proposed BPPM and the SBM discussed in Section 3.2, we propose to leverage estimators for class assignments in the SBM with known desirable qualities in order to initialize the local search, which is a much faster and more principled approach than the typical black-box approach of using multiple random initializations. Methods used to initialize class estimates in static and discrete-time SBMs include k-means clustering [5, 22] and spectral clustering [32, 31]. Spectral clustering is an attractive choice because it scales to large networks containing thousands of nodes and has theoretical performance guarantees applicable to the BPPM, as we discuss next.
Recent work has demonstrated that applying spectral clustering (or a regularized variant) to a network generated from an SBM results in consistent estimates of class assignments as the number of nodes $n \to \infty$ [26, 28, 19, 27]. These theoretical guarantees typically require the expected degrees of nodes to grow polylogarithmically with the number of nodes so that the network is not too sparse. On the other hand, the Asymptotic Independence Theorem shows an asymptotic equivalence between the BPPM and SBM provided that there are not too many events, i.e. the network is not too dense. In the polylog degree regime, the ratio of the expected number of events in a block pair to the size of the block pair, $\mu_b / n_b$, tends to 0, so the theorem holds. Thus, spectral clustering should provide an accurate estimate of the class assignments in the polylog degree regime, which is commonly observed in real networks such as social networks. Since we consider directed relations, we use a regularized spectral clustering algorithm for directed networks (pseudocode provided in Algorithm 2 in Appendix B in the supplementary material) to initialize the local search.
5.1 Simulated Networks
We evaluate the proposed BPPM and local search inference procedure in three simulation experiments examining the deviation from independence, class estimation accuracy, and scalability.
5.1.1 Deviation from Independence
The Asymptotic Independence Theorem demonstrates that pairs of adjacency matrix entries in the same block pair are dependent, but that the dependence is upper bounded by (2), and that the dependence goes to 0 for growing blocks. To evaluate the slackness of the bounds, we simulate networks from the block Hawkes model (BHM). Since the deviation terms depend only on the size of the blocks, we simulate networks with a single block and let the number of nodes grow from to . For each number of nodes, we simulate networks from the block Hawkes model for a duration of . We choose the Hawkes process parameters to be , , and . For both cases, the expected number of events , which grows with and is slower than the growth of the size of the block pair, so the Asymptotic Independence Theorem applies.
We evaluate the absolute difference between the empirical marginal probability and each of the two empirical conditional probabilities. The empirical deviation from independence is plotted against the upper bound in Figure 2. Since the bound (2) in the Asymptotic Independence Theorem does not depend on any property of the point process aside from the mean number of events, it is somewhat loose when applied to the block Hawkes model. Note that the upper bound decays with rate here due to growing with .
5.1.2 Class Estimation
This simulation experiment is based on the synthetic network generator from Newman and Girvan , where all the diagonal blocks have the same parameters, and the off-diagonal blocks have the same parameters, but different from the diagonal blocks. We generate networks with nodes and classes from the block Hawkes model using Algorithm 1 with varying durations from to time units. networks were generated for each duration, with Hawkes process parameters and being and , respectively, for all blocks. The baseline rates are for diagonal blocks and for off-diagonal blocks. Class estimates were extracted using local search initialized with spectral clustering, with spectral clustering alone as a baseline for comparison. The results shown in Figure 3 demonstrate the necessity of the local search, which achieves much higher accuracy than spectral clustering alone. Additionally, as the number of events increases, the class estimates of both estimators begin to converge to the true classes as one would expect.
5.1.3 Scalability of Inference Procedure
We evaluate the scalability of the proposed inference procedure by generating networks of varying sizes with classes from the block Hawkes model. For each value of $n$ (number of nodes), we simulate networks of equal duration and record the CPU time per iteration of the local search inference procedure on each network. The CPU time per iteration is shown in Figure 4 along with a best-fit line for a power law relationship (beginning with nodes). The slope of the best-fit line confirms the linear time complexity in terms of the number of nodes $n$. Additional details on the experiment set-up and additional results on scalability, including for a growing number of events rather than nodes, are presented in Appendix C.1 in the supplementary material.
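A power-law fit of this kind is typically obtained by ordinary least squares on log-transformed values; a slope near 1 then indicates linear scaling. The sketch below shows that computation under this assumption (the paper does not spell out its fitting method), using a hypothetical function name and synthetic timings rather than the measurements reported in the paper.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x), i.e. the power-law exponent."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    return sxy / sxx


# Synthetic example: timings exactly proportional to the number of nodes.
node_counts = [100, 200, 400, 800]
cpu_seconds = [0.003 * n for n in node_counts]
slope = loglog_slope(node_counts, cpu_seconds)
```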
5.2 Real Networks
We fit the block Hawkes model (BHM) to three real social network data sets: Enron emails ( nodes, events), Reality Mining phone calls ( nodes, events), and Facebook wall posts ( nodes, events). For the Enron and Reality Mining data sets, we compare the fits of the BHM to the relational event model (REM) . We are unable to compare the BHM to the Hawkes IRM  because it scales only to extremely small networks.
5.2.1 Enron and Reality Mining
Table 1: Mean test log-likelihood per event for each model on the Enron emails and Reality Mining data sets.
We use the preprocessed versions of the Enron and Reality Mining data sets from DuBois et al.  and apply the same train/test splits: / train/test events for the Enron email data and / for the Reality Mining data. We fit the BHM on the training data and evaluate the log-likelihood on the test data, which we compare to the values reported in . For the REM, we consider both the full model (REM-Full) and only the intercept term (REM-BM). REM-BM is more similar to our proposed BPPM as discussed in Section 3.1. The comparisons of mean test log-likelihood per event are shown in Table 1. For both data sets, the conclusions are similar. Our BHM achieves higher test log-likelihood than the REM-BM; we believe this is due to the superiority of the exponential Hawkes process compared to the piecewise constant Poisson process used in . However, REM-Full outperforms the BHM due to its inclusion of the additional model terms, which allows it greater flexibility but limits its scalability compared to the BHM.
5.2.2 Facebook Wall Posts
Model-Based Exploratory Analysis
We analyze the Facebook wall post data collected by Viswanath et al. by fitting a BHM with classes (chosen by examining the singular values of the regularized graph Laplacian as described in Rohe et al.). We consider events between January 1, 2007 and January 1, 2009. We remove nodes with degree less than , resulting in a network with events among nodes.
The parameters inferred from the BHM fit on the Facebook data are shown in Figure 5. The diagonal block pairs have larger values of the background intensity, indicating that the blocks form communities. This finding could also have been obtained from static and discrete-time SBMs. The diagonal block pairs also have higher jump sizes, indicating that wall posts between members of a community are more bursty. A portion of the Hawkes process intensity function for block pair is shown in Figure 6. In addition to diurnal patterns, one can observe bursty periods of wall posts throughout the day. This finding could not have been obtained from static and discrete-time SBMs. By observing the values of on off-diagonal block pairs, we notice that there is not much asymmetry, but the decay rate exhibits asymmetry. Specifically, notice that events from class 1 to class 3 have longer sustained bursts than events from class 3 to class 1. The same is true for events from class 1 to class 2 compared to class 2 to class 1.
Comparison with Discrete-Time SBM
To compare our continuous-time block Hawkes model with a discrete-time SBM , we split the set of events into folds. At fold , we train both models on all folds up to and attempt to predict the time to the next event in each block. For the discrete-time models, we extract time snapshots of hour, hours, hours, hours, day, days, and days. From Figure 7, notice that the block Hawkes model has lower mean-squared error (MSE) in its prediction than any of the discrete-time SBMs, confirming the benefits of continuous-time rather than discrete-time network modeling. Additional details on the experiment set-up and additional results are presented in Appendix C.2 in the supplementary material.
In this paper, we introduced the block point process model (BPPM) for dynamic networks evolving in continuous time in the form of timestamped events between nodes. Our model was inspired by the well-known stochastic block model (SBM) for static networks and is a simpler version of the Hawkes IRM. We demonstrated that adjacency matrices constructed from the BPPM follow an SBM in the limit of a growing number of nodes. To the best of our knowledge, this is the first result of this type connecting point process network models with adjacency matrix network models. Additionally, we proposed an efficient algorithm to fit the BPPM that allows it to scale to large networks with thousands of nodes and hundreds of thousands of events, including a network of Facebook wall posts.
Appendix A Proof of the Asymptotic Independence Theorem
We begin with a well-known lemma on the difference of powers that will be used to both upper and lower bound the deviation from independence.
Lemma 1 (Difference of powers).
For a real number $0 \le x \le 1$ and integer $m \ge 1$, we have the following bounds:
$$m x^{m-1} (1 - x) \le 1 - x^m \le m (1 - x). \tag{4}$$
The proof follows straightforwardly from factorizing a difference of powers. Specifically, for real numbers $y, z$ and integer $m \ge 1$,
$$y^m - z^m = (y - z) \sum_{i=0}^{m-1} y^{m-1-i} z^{i}. \tag{5}$$
If $y = 1$ and $z = x$, then (5) becomes
$$1 - x^m = (1 - x) \sum_{i=0}^{m-1} x^{i}.$$
There are $m$ terms in the summation. The largest term is $1$, and the smallest term is $x^{m-1}$. Thus, we can upper and lower bound the sum by $m$ and $m x^{m-1}$, respectively, to arrive at (4). ∎
The next lemma will be used in the upper bound.
For a real number and integer ,
We now present the proof of the Asymptotic Independence Theorem.
First compute the marginal probability $\Pr(a_{ij} = 0)$. $a_{ij} = 0$ implies that no events between nodes $i$ and $j$ occurred. To compute this probability, we first compute the conditional probability given that the number of events $M$ in block pair $b$ is $m$. To simplify notation, we drop the subscript $b$ in the remainder of the proof, writing $n$ for the size of the block pair.
$$\Pr(a_{ij} = 0 \mid M = m) = \left(1 - \frac{1}{n}\right)^{m},$$
where the equality follows by noting that, conditioned on $m$ total events in block pair $b$, the number of events between nodes $i$ and $j$ follows a binomial distribution with $m$ trials and success probability $1/n$. The success probability is due to step 8 of the generative process of the BPPM, which involves selecting node pairs randomly to receive an event. By the Law of Total Probability, the marginal probability is
$$\Pr(a_{ij} = 0) = \sum_{m=0}^{\infty} \left(1 - \frac{1}{n}\right)^{m} \Pr(M = m),$$
where the probability mass function $\Pr(M = m)$ denotes the probability that $m$ events in block pair $b$ occurred.
Next consider the joint probability $\Pr(a_{ij} = 0, a_{i'j'} = 0)$. As before, condition on the number of events $m$. The conditional joint probability is
$$\Pr(a_{ij} = 0, a_{i'j'} = 0 \mid M = m) = \left(1 - \frac{2}{n}\right)^{m}$$
because the numbers of events for the node pairs in block pair $b$ follow a multinomial distribution with $m$ trials and all event probabilities equal to $1/n$. By the Law of Total Probability,
$$\Pr(a_{ij} = 0, a_{i'j'} = 0) = \sum_{m=0}^{\infty} \left(1 - \frac{2}{n}\right)^{m} \Pr(M = m).$$
We first lower bound by noting that
Finally we upper bound using the same approach as for :
where the final inequality is obtained from (17).
Appendix B Inference Details for Block Hawkes Model
In the log-likelihood (20) of the block Hawkes model, $t_s$ denotes the time of the $s$th event corresponding to block pair $b$. For each swap in the local search procedure described in Section 4.2, we maximize (20) with respect to the Hawkes process parameters for each block pair using a standard interior point optimization routine. The local search is initialized using the regularized spectral clustering algorithm listed in Algorithm 2. It is a variant of the DI-SIM co-clustering algorithm modified to produce a single set of clusters for directed graphs in a manner similar to Sussman et al.
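For reference, the standard log-likelihood of an exponential-kernel Hawkes process on an observation window can be evaluated in time linear in the number of events using a well-known recursion. The sketch below implements that standard form; it is assumed, not confirmed by this text, to correspond to the per-block-pair expression referenced as (20). The function and parameter names (`background`, `jump`, `decay`) are illustrative, and no optimizer is included.

```python
import math


def hawkes_exp_log_likelihood(times, duration, background, jump, decay):
    """Log-likelihood of sorted event times on [0, duration] under the intensity
    background + sum over earlier events of jump * exp(-decay * (t - t_s))."""
    log_intensity_sum = 0.0
    recursion = 0.0  # sum over earlier events r of exp(-decay * (t_s - t_r))
    previous = None
    for t in times:
        if previous is not None:
            recursion = math.exp(-decay * (t - previous)) * (1.0 + recursion)
        log_intensity_sum += math.log(background + jump * recursion)
        previous = t
    # Integral of the intensity over the observation window (the compensator).
    compensator = background * duration + (jump / decay) * sum(
        1.0 - math.exp(-decay * (duration - t)) for t in times
    )
    return log_intensity_sum - compensator


# Hypothetical example values.
value = hawkes_exp_log_likelihood([1.0, 2.0], 3.0, background=0.5, jump=0.8, decay=1.2)
```

With `jump` set to zero the expression reduces to the homogeneous Poisson log-likelihood, which is a convenient sanity check when plugging this function into a numerical optimizer.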
Appendix C Additional Experiment Details and Results
C.1 Scalability of Inference Procedure
In the simulation experiment to test the scalability of our inference procedure (Section 5.1.3), we set the Hawkes process parameters , , and to , , and , respectively, for off diagonal blocks and , , and , respectively, for diagonal blocks. For each value of (number of nodes), we simulate five networks with duration of time units and record the CPU time for each local search iteration as well as the total CPU time for the inference procedure on each network. The CPU time per iteration scales linearly in , as shown in Figure 4. In Figure 8, we show the total CPU time over all iterations for the local search inference procedure. The best-fit line has slope (standard error ), so that the entire inference procedure scales between quadratically and cubically in the number of nodes, suggesting that it is indeed scalable to networks with thousands of nodes.
To test our inference procedure with respect to a growing number of events, we set the Hawkes process parameters , , and to , , and , respectively, for off-diagonal blocks and , , and , respectively, for diagonal blocks. We generate networks with the number of events varying from to million. For each event count, we simulate five networks with the number of nodes and classes fixed at and , respectively. The CPU time per iteration scales linearly in , as shown in Figure 9, as expected according to the computed time complexity in Section 4.2. All CPU times are recorded on a Linux workstation using Intel Xeon processor cores operating at GHz.
C.2 Facebook Wall Posts
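A scaling exponent of the kind reported in this subsection can be estimated as the slope of a least-squares line fit to CPU time against problem size on log-log axes. The following sketch shows one way to compute the slope and its standard error; the function name is hypothetical and the data in the example are synthetic, not measurements from these experiments.

```python
import math


def loglog_slope(sizes, cpu_times):
    """Least-squares slope and its standard error on log-log axes.

    Fits log(cpu_time) = intercept + slope * log(size); the slope is
    the estimated scaling exponent. Requires at least three points.
    """
    lx = [math.log(x) for x in sizes]
    ly = [math.log(y) for y in cpu_times]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(lx, ly))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, std_err


# Synthetic example: times that grow exactly as size ** 2.5.
sizes = [100, 200, 400, 800]
times = [3.0 * s ** 2.5 for s in sizes]
slope, std_err = loglog_slope(sizes, times)
```

A slope close to 1 indicates linear scaling, while a slope between 2 and 3 indicates scaling between quadratic and cubic.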
In the prediction experiment where we compare the block Hawkes model with a discrete-time SBM on the Facebook wall posts network (Section 5.2.2), we estimate class assignments for both models using spectral clustering with classes (with no local search iterations) so that the class estimates do not play a role in the accuracy. We believe this is a valid comparison because spectral clustering is used as the initialization to local search in the inference procedure for both models, as discussed in Section 4.2 for the block Hawkes model and in  for the discrete-time SBM.
We split events between January 1, 2007 and January 1, 2009 into folds, and for each fold , we train both models on all folds up to and attempt to predict the time to the next event in each block. The block Hawkes model directly models event times, so we use the expected next event time for each block as the prediction. The discrete-time SBM does not directly model event times, so we multiply the expected number of time snapshots that will elapse before the next edge formation by the snapshot length to get a prediction for the next event time. Since the prediction for the discrete-time SBM is dependent on the snapshot length, we test time snapshots of hour, hours, hours, hours, day, days and days.
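To make the conversion from snapshots to event times concrete, consider a deliberately simplified case in which an edge forms in each snapshot independently with a constant probability. The number of snapshots until the first edge is then geometric with mean equal to the reciprocal of that probability. The sketch below encodes this simplification; the constant-probability assumption and all names are illustrative only, and the Markovian discrete-time SBM used in the experiment would generally yield a different expectation.

```python
def expected_snapshots_until_edge(p_form):
    """Mean of a geometric waiting time with success probability p_form.

    Simplifying assumption: an edge forms in each snapshot independently
    with the same probability p_form.
    """
    if not 0.0 < p_form <= 1.0:
        raise ValueError("p_form must be in (0, 1]")
    return 1.0 / p_form


def predicted_next_event_time(current_time, p_form, snapshot_length):
    """Expected number of snapshots times snapshot length, added to now."""
    return current_time + expected_snapshots_until_edge(p_form) * snapshot_length


# With p_form = 0.25 and snapshots of length 2.0, the expected wait is
# 4 snapshots, that is, 8.0 time units.
prediction = predicted_next_event_time(10.0, 0.25, 2.0)
```

The snapshot length enters the prediction multiplicatively, which is one way to see why the prediction depends on how the snapshots are chosen.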
We evaluate the accuracy of the predictions by computing the total mean-squared error (MSE) between all predicted event times and actual event times for the first event in each block in the next fold. As shown in Figure 7, the accuracy of the discrete-time SBM is highly dependent on the snapshot length. For snapshots longer than days, the loss in temporal resolution is the main contributor to the high MSE. Shorter snapshots, such as hour and hours, produce extremely sparse networks, so the Markovian assumptions in the discrete-time SBM no longer accurately model edge generation. Using the block Hawkes model, we avoid this complex model selection problem of choosing the snapshot length and produce more accurate predictions than the discrete-time model for any snapshot length, as shown in Figure 7.
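The evaluation protocol described above can be sketched as a forward-chaining loop: train on all folds up to the current one, predict the first event time of each block in the next fold, and average the squared errors. In the sketch below, the representation of events as `(block, time)` pairs, the `predict` callable, and the function names are assumptions for illustration; averaging over all block and fold combinations is one reading of the total MSE.

```python
def first_event_per_block(events):
    """Map each block to the time of its earliest event."""
    first = {}
    for block, t in events:
        if block not in first or t < first[block]:
            first[block] = t
    return first


def forward_chaining_mse(folds, predict):
    """Mean-squared error of next-event-time predictions.

    folds   -- chronologically ordered list of folds, each a list of
               (block, time) pairs
    predict -- hypothetical callable: predict(train_events, block)
               returns the predicted time of the block's next event
    """
    squared_errors = []
    for k in range(1, len(folds)):
        # Train on every fold strictly before fold k.
        train = [event for fold in folds[:k] for event in fold]
        actual = first_event_per_block(folds[k])
        for block, t_true in actual.items():
            squared_errors.append((predict(train, block) - t_true) ** 2)
    return sum(squared_errors) / len(squared_errors)
```

Blocks with no event in the next fold contribute no error term in this sketch, since there is no actual event time to compare against.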
- Bickel et al.  P. Bickel, D. Choi, X. Chang, and H. Zhang. Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. The Annals of Statistics, 41(4):1922–1943, 2013.
- Blundell et al.  C. Blundell, J. Beck, and K. A. Heller. Modelling reciprocating relationships with Hawkes processes. In Advances in Neural Information Processing Systems 25, pages 2600–2608. 2012.
- Byrd et al.  R. H. Byrd, J. C. Gilbert, and J. Nocedal. A trust region method based on interior point techniques for nonlinear programming. Mathematical Programming, 89(1):149–185, 2000.
- Celisse et al.  A. Celisse, J.-J. Daudin, and L. Pierre. Consistency of maximum-likelihood and variational estimators in the stochastic block model. Electronic Journal of Statistics, 6:1847–1899, 2012.
- Côme and Latouche  E. Côme and P. Latouche. Model selection and clustering in stochastic block models based on the exact integrated complete data likelihood. Statistical Modelling, 15(6):564–589, 2015.
- Corneli et al.  M. Corneli, P. Latouche, and F. Rossi. Exact ICL maximization in a non-stationary temporal extension of the stochastic block model for dynamic networks. Neurocomputing, 192:81–91, 2016.
- DuBois and Smyth  C. DuBois and P. Smyth. Modeling relational events via latent classes. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 803–812, 2010.
- DuBois et al.  C. DuBois, C. T. Butts, and P. Smyth. Stochastic blockmodeling of relational event dynamics. In Proceedings of the 16th International Conference on Artificial Intelligence and Statistics, pages 238–246, 2013.
- Farajtabar et al.  M. Farajtabar, Y. Wang, M. G. Rodriguez, S. Li, H. Zha, and L. Song. COEVOLVE: A joint point process model for information diffusion and network co-evolution. In Advances in Neural Information Processing Systems 28, pages 1945–1953, 2015.
- Fox et al.  E. W. Fox, M. B. Short, F. P. Schoenberg, K. D. Coronges, and A. L. Bertozzi. Modeling e-mail networks and inferring leadership using self-exciting point processes. Journal of the American Statistical Association, 111(514):564–584, 2016.
- Goldenberg et al.  A. Goldenberg, A. X. Zheng, S. E. Fienberg, and E. M. Airoldi. A survey of statistical network models. Foundations and Trends® in Machine Learning, 2(2):129–233, 2010.
- Hall and Willett  E. C. Hall and R. M. Willett. Tracking dynamic point processes on networks. IEEE Transactions on Information Theory, 62(7):4327–4346, 2016.
- Halpin and De Boeck  P. F. Halpin and P. De Boeck. Modelling dyadic interaction with Hawkes processes. Psychometrika, 78(4):793–814, 2013.
- Han et al.  Q. Han, K. S. Xu, and E. M. Airoldi. Consistent estimation of dynamic and multi-layer block models. In Proceedings of the 32nd International Conference on Machine Learning, pages 1511–1520, 2015.
- Holland et al.  P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.
- Karrer and Newman  B. Karrer and M. E. J. Newman. Stochastic blockmodels and community structure in networks. Physical Review E, 83(1):016107, 2011.
- Kemp et al.  C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda. Learning systems of concepts with an infinite relational model. In Proceedings of the 21st National Conference on Artificial Intelligence, pages 381–388, 2006.
- Laub et al.  P. J. Laub, T. Taimre, and P. K. Pollett. Hawkes processes. arXiv preprint arXiv:1507.02822, 2015.
- Lei and Rinaldo  J. Lei and A. Rinaldo. Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43(1):215–237, 2015.
- Linderman and Adams  S. W. Linderman and R. P. Adams. Scalable Bayesian inference for excitatory point process networks. arXiv preprint arXiv:1507.03228, 2015.
- Masuda et al.  N. Masuda, T. Takaguchi, N. Sato, and K. Yano. Self-exciting point process modeling of conversation event sequences. In Temporal Networks, pages 245–264. Springer, 2013.
- Matias and Miele  C. Matias and V. Miele. Statistical clustering of temporal networks through a dynamic stochastic block model. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(4):1119–1141, 2017.
- Matias et al.  C. Matias, T. Rebafka, and F. Villers. A semiparametric extension of the stochastic block model for longitudinal networks. arXiv preprint arXiv:1512.07075, 2015.
- Newman and Girvan  M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, aug 2004.
- Ogata  Y. Ogata. On Lewis’ simulation method for point processes. IEEE Transactions on Information Theory, 27(1):23–31, 1981.
- Rohe et al.  K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, 39(4):1878–1915, 2011.
- Rohe et al.  K. Rohe, T. Qin, and B. Yu. Co-clustering directed graphs to discover asymmetries and directional communities. Proceedings of the National Academy of Sciences, 113(45):12679–12684, 2016.
- Sussman et al.  D. L. Sussman, M. Tang, D. E. Fishkind, and C. E. Priebe. A consistent adjacency spectral embedding for stochastic blockmodel graphs. Journal of the American Statistical Association, 107(499):1119–1128, 2012.
- Viswanath et al.  B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi. On the evolution of user interaction in Facebook. In Proceedings of the 2nd ACM Workshop on Online Social Networks, pages 37–42. ACM, 2009.
- Xin et al.  L. Xin, M. Zhu, and H. Chipman. A continuous-time stochastic block model for basketball networks. arXiv preprint arXiv:1507.01816, 2015.
- Xu  K. S. Xu. Stochastic block transition models for dynamic networks. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics, pages 1079–1087, 2015.
- Xu and Hero III  K. S. Xu and A. O. Hero III. Dynamic stochastic blockmodels for time-evolving social networks. IEEE Journal of Selected Topics in Signal Processing, 8(4):552–562, 2014.
- Yang et al.  T. Yang, Y. Chi, S. Zhu, Y. Gong, and R. Jin. Detecting communities and their evolutions in dynamic social networks—a Bayesian approach. Machine Learning, 82(2):157–189, sep 2011.
- Zhao et al.  Q. Zhao, M. A. Erdogdu, H. Y. He, A. Rajaraman, and J. Leskovec. SEISMIC: A self-exciting point process model for predicting tweet popularity. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1513–1522, 2015.
- Zhao et al.  Y. Zhao, E. Levina, and J. Zhu. Consistency of community detection in networks under degree-corrected stochastic block models. The Annals of Statistics, 40(4):2266–2292, aug 2012.