Online social networks, such as Twitter or Weibo, have become large information networks where people share, discuss and search for information of personal interest as well as breaking news. In this context, users often forward to their followers information they are exposed to via their followees, triggering the emergence of information cascades that travel through the network, and constantly create new links to information sources, triggering changes in the network itself over time. Importantly, recent empirical studies with Twitter data have shown that information diffusion and network evolution are coupled, and that network changes are often triggered by information diffusion [3, 4, 5].
While there have been many recent works on modeling information diffusion [6, 7, 8, 2, 9] and network evolution [10, 11, 12], most of them treat these two stochastic processes independently and separately, ignoring the influence one may have on the other over time. Thus, to better understand information diffusion and network evolution, there is an urgent need for joint probabilistic models of the two processes, which are largely nonexistent to date.
In this paper, we propose a probabilistic generative model, Coevolve, for the joint dynamics of information diffusion and network evolution. Our model is based on the framework of temporal point processes, which explicitly characterizes the continuous time interval between events, and it consists of two interwoven and interdependent components, as shown in Figure 1:
Information diffusion process. We design an “identity revealing” multivariate Hawkes process to capture the mutual excitation behavior of retweeting events, where the intensity of such events in a user is boosted by previous events from her time-varying set of followees. Although Hawkes processes have been used for information diffusion before [14, 15, 16, 17, 18, 19, 20, 21], the key innovation of our approach is to explicitly model the excitation due to a particular source node, hence revealing the identity of the source. This design reflects the reality that information sources are explicitly acknowledged, and it also allows a particular information source to acquire new links at a rate according to her “informativeness”.
Network evolution process. We model link creation as an “information driven” survival process, and couple the intensity of this process with retweeting events. Although survival processes have been used for link creation before [22, 23], the key innovation in our model is to incorporate retweeting events as the driving force for such processes. Since our model has captured the source identity of each retweeting event, new links will be targeted toward information sources, with an intensity proportional to their degree of excitation and each source’s influence.
Our model is designed in such a way that it allows the two processes, information diffusion and network evolution, to unfold simultaneously on the same time scale and exercise bidirectional influence on each other, allowing sophisticated coevolutionary dynamics to be generated, as illustrated in Figure 2.
Importantly, the flexibility of our model does not prevent us from efficiently simulating diffusion and link events from the model and learning its parameters from real world data:
Efficient simulation. We design a scalable sampling procedure that exploits the sparsity of the generated networks. Its complexity is $O(nd \log m)$, where $n$ is the number of events, $m$ is the number of users and $d$ is the maximum number of followees per user.
Convex parameter learning. We show that the model parameters that maximize the joint likelihood of observed diffusion and link creation events can be efficiently found via convex optimization.
Then, we experiment with our model and show that it can produce coevolutionary dynamics of information diffusion and network evolution, and generate retweet and link events that obey common information diffusion patterns (e.g., cascade structure, size and depth), static network patterns (e.g., node degree) and temporal network patterns (e.g., shrinking diameter) described in related literature [24, 12, 25]. Finally, we show that, by modeling the coevolutionary dynamics, our model provides significantly more accurate link and diffusion event predictions than alternatives on a large-scale Twitter dataset.
The remainder of this article is organized as follows. We first build sufficient background on the temporal point processes framework in Section 2. Then, we introduce our joint model of information diffusion and network structure co-evolution in Section 3. Sections 4 and 5 are devoted to answering two essential questions that any generative model should be able to address: how can we generate data from the model, and how can we efficiently learn the model parameters from historical event data? In Sections 6, 7, and 8 we perform an empirical investigation of the properties of the model, evaluate the accuracy of the parameter estimation on synthetic data, and evaluate the performance of the proposed model on a real-world dataset, respectively. Section 9 reviews the related work and Section 10 discusses some extensions to the proposed model. Finally, the paper is concluded in Section 11.
2 Background on Temporal Point Processes
A temporal point process is a random process whose realization consists of a list of discrete events localized in time, $\{t_i\}$, with $t_i \in \mathbb{R}^+$ and $i \in \mathbb{Z}^+$. Many different types of data produced in online social networks can be represented as temporal point processes, such as the times of retweets and link creations. A temporal point process can be equivalently represented as a counting process, $N(t)$, which records the number of events before time $t$. Let the history $\mathcal{H}(t)$ be the list of times of events $\{t_1, t_2, \ldots, t_n\}$ up to but not including time $t$. Then, the number of observed events in a small time window $[t, t+dt)$ of length $dt$ is
$$dN(t) = \sum_{t_i \in \mathcal{H}(t)} \delta(t - t_i)\, dt,$$
and hence $N(t) = \int_0^t dN(s)$, where $\delta(t)$ is a Dirac delta function. More generally, given a function $f(t)$, we can define the convolution with respect to $dN(t)$ as
$$f(t) \star dN(t) := \int_0^t f(t - \tau)\, dN(\tau) = \sum_{t_i \in \mathcal{H}(t)} f(t - t_i).$$
The point process representation of temporal data is fundamentally different from the discrete time representation typically used in social network analysis. It directly models the time interval between events as random variables, avoids the need to pick a time window to aggregate events, and allows temporal events to be modeled in a fine grained fashion. Moreover, it has a remarkably rich theoretical support.
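An important way to characterize temporal point processes is via the conditional intensity function, a stochastic model for the time of the next event given all the times of previous events. Formally, the conditional intensity function $\lambda^*(t)$ (intensity, for short) is the conditional probability of observing an event in a small window $[t, t+dt)$ given the history $\mathcal{H}(t)$, i.e.,
$$\lambda^*(t)\, dt := \mathbb{P}\{\text{event in } [t, t+dt) \mid \mathcal{H}(t)\} = \mathbb{E}[dN(t) \mid \mathcal{H}(t)],$$
where one typically assumes that only one event can happen in a small window of size $dt$ and thus $dN(t) \in \{0, 1\}$. Then, given the observation until time $t$ and a time $t' > t$, we can also characterize the conditional probability that no event happens until $t'$ as
$$S^*(t') = \exp\left(-\int_t^{t'} \lambda^*(\tau)\, d\tau\right),$$
the (conditional) probability density function that an event occurs at time $t'$ as
$$f^*(t') = \lambda^*(t')\, S^*(t'),$$
and the (conditional) cumulative distribution function, which accounts for the probability that an event happens before time $t'$:
$$F^*(t') = 1 - S^*(t') = \int_t^{t'} f^*(\tau)\, d\tau.$$
Figure 3 illustrates these quantities. Moreover, we can express the log-likelihood of a list of events $\{t_1, t_2, \ldots, t_n\}$ in an observation window $[0, T)$ as
$$\mathcal{L} = \sum_{i=1}^{n} \log \lambda^*(t_i) - \int_0^T \lambda^*(\tau)\, d\tau, \qquad T \geq t_n.$$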
This simple log-likelihood will later enable us to learn the parameters of our model from observed data.
[Figure: (a) Poisson process; (b) Hawkes process; (c) Survival process]
Finally, the functional form of the intensity is often designed to capture the phenomena of interest. Some useful functional forms we will use are:
Poisson process. The intensity is assumed to be independent of the history $\mathcal{H}(t)$, but it can be a nonnegative time-varying function, i.e.,
$$\lambda^*(t) = g(t) \geq 0.$$
Hawkes process. The intensity is history dependent and models a mutual excitation between events, i.e.,
$$\lambda^*(t) = \mu + \alpha\, \kappa_\omega(t) \star dN(t) = \mu + \alpha \sum_{t_i \in \mathcal{H}(t)} \kappa_\omega(t - t_i),$$
where $\kappa_\omega(t) := \exp(-\omega t)\, \mathbb{I}[t \geq 0]$ is an exponential triggering kernel and $\mu \geq 0$ is a baseline intensity independent of the history. Here, the occurrence of each historical event increases the intensity by a certain amount determined by the kernel and the weight $\alpha \geq 0$, making the intensity history dependent and a stochastic process by itself. In our work, we focus on the exponential kernel; however, other functional forms, such as the log-logistic function, are possible, and the general properties of our model do not depend on this particular choice.
Survival process. There is only one event for an instantiation of the process, i.e.,
$$\lambda^*(t) = (1 - N(t))\, g(t),$$
where $g(t) \geq 0$ and the term $(1 - N(t))$ makes sure $\lambda^*(t)$ is $0$ if an event already happened before $t$.
3 Generative Model of Information Diffusion and Network Evolution
In this section, we use the above background on temporal point processes to formulate Coevolve, our probabilistic model for the joint dynamics of information diffusion and network evolution.
3.1 Event Representation
[Figure 5: (a) Event representation; (b) Point and counting processes]
We model the generation of two types of events: tweet/retweet events, $e^r$, and link creation events, $e^l$. Instead of just the time $t$, we record each event as a triplet, as illustrated in Figure 5(a):
$$e^r \text{ or } e^l := (u, s, t),$$
where $u$ is the destination node, $s$ is the source node and $t$ is the time of the event.
For a retweet event, the triplet means that the destination node $u$ retweets at time $t$ a tweet originally posted by source node $s$. Recording the source node reflects the real-world scenario that information sources are explicitly acknowledged. Note that the occurrence of event $e^r$ does not mean that $u$ is directly retweeting from or is connected to $s$. This event can happen when $u$ is retweeting a message by another node $u'$ where the original information source $s$ is acknowledged. Node $u$ will pass on the same source acknowledgement to its followers (e.g., “I agree @a @b @c @s”). Original tweets posted by node $u$ are allowed in this notation. In this case, the event will simply be $e^r = (u, u, t)$. Given a list of retweet events up to but not including time $t$, the history $\mathcal{H}^r_{us}(t)$ of retweets by $u$ due to source $s$ is
$$\mathcal{H}^r_{us}(t) = \{ e^r_i = (u_i, s_i, t_i) \mid u_i = u \text{ and } s_i = s \}.$$
The entire history of retweet events is denoted as
$$\mathcal{H}^r(t) := \bigcup_{u, s \in [m]} \mathcal{H}^r_{us}(t).$$
For a link creation event, the triplet means that destination node $u$ creates at time $t$ a link to source node $s$, i.e., from time $t$ on, node $u$ starts following node $s$. To ease the exposition, we restrict ourselves to the case where links cannot be deleted and thus each (directed) link is created only once. However, our model can be easily augmented to consider multiple link creations and deletions per node pair, as discussed in Section 10. We denote the link creation history as $\mathcal{H}^l(t)$.
3.2 Joint Model with Two Interwoven Components
Given $m$ users, we use two sets of counting processes to record the generated events, one for information diffusion and another for network evolution. More specifically,
Retweet events are recorded using a matrix $N(t)$ of size $m \times m$ for each fixed time point $t$. The $(u, s)$-th entry in the matrix, $N_{us}(t) \in \{0\} \cup \mathbb{Z}^+$, counts the number of retweets of $u$ due to source $s$ up to time $t$. These counting processes are “identity revealing”, since they keep track of the source node that triggers each retweet. The matrix $N(t)$ is typically less sparse than $A(t)$, since $N_{us}(t)$ can be nonzero even when node $u$ does not directly follow $s$. We also let $dN(t) := (dN_{us}(t))_{u, s \in [m]}$.
Link events are recorded using an adjacency matrix $A(t)$ of size $m \times m$ for each fixed time point $t$. The $(u, s)$-th entry in the matrix, $A_{us}(t) \in \{0, 1\}$, indicates whether $u$ is directly following $s$. Therefore, $A_{us}(t) = 1$ means the directed link has been created before $t$. For simplicity of exposition, we do not allow self-links. The matrix $A(t)$ is typically sparse, but the number of nonzero entries can change over time. We also define $dA(t) := (dA_{us}(t))_{u, s \in [m]}$.
Then, the interwoven information diffusion and network evolution processes can be characterized using their respective intensities
$$\mathbb{E}[dN(t) \mid \mathcal{H}^r(t) \cup \mathcal{H}^l(t)] = \Gamma^*(t)\, dt, \qquad \mathbb{E}[dA(t) \mid \mathcal{H}^r(t) \cup \mathcal{H}^l(t)] = \Lambda^*(t)\, dt,$$
where $\Gamma^*(t) = (\gamma^*_{us}(t))_{u, s \in [m]}$ and $\Lambda^*(t) = (\lambda^*_{us}(t))_{u, s \in [m]}$.
The sign $*$ means that the intensity matrices depend on the joint history, $\mathcal{H}^r(t) \cup \mathcal{H}^l(t)$, and hence their evolution is coupled. Through this coupling, (i) the counting processes for link creation become “information driven” and (ii) the evolution of the linking structure changes the information diffusion process. In the next two sections, we will specify the details of these two intensity matrices.
3.3 Information Diffusion Process
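We model the intensity, $\Gamma^*(t)$, for retweeting events using a multivariate Hawkes process:
$$\gamma^*_{us}(t) = \mathbb{I}[u = s]\, \eta_u + \mathbb{I}[u \neq s]\, \beta_s \sum_{v \in \mathcal{F}_u(t)} \kappa_{\omega_1}(t) \star \left( A_{uv}(t)\, dN_{vs}(t) \right),$$
where $\mathbb{I}[\cdot]$ is the indicator function and $\mathcal{F}_u(t) := \{ v \in [m] : A_{uv}(t) = 1 \}$ is the current set of followees of $u$. The term $\eta_u \geq 0$ is the intensity of original tweets by a user $u$ on his own initiative, becoming the source of a cascade, and the term $\beta_s \sum_{v \in \mathcal{F}_u(t)} \kappa_{\omega_1}(t) \star (A_{uv}(t)\, dN_{vs}(t))$ models the propagation of peer influence over the network, where the triggering kernel $\kappa_{\omega_1}(t)$ models the decay of peer influence over time.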
Note that the retweeting intensity matrix $\Gamma^*(t)$ is by itself a stochastic process that depends on the time-varying network topology, the non-zero entries in $A(t)$, whose growth is controlled by the network evolution process in Section 3.4. Hence the model design captures the influence of the network topology and each source’s influence, $\beta_s$, on the information diffusion process. More specifically, to compute $\gamma^*_{us}(t)$, one first finds the current set $\mathcal{F}_u(t)$ of followees of $u$, and then aggregates the retweets of these followees that are due to source $s$. Note that these followees may or may not directly follow source $s$. Then, the more frequently node $u$ is exposed to retweets of tweets originated from source $s$ via her followees, the more likely she will also retweet a tweet originated from source $s$. Once node $u$ retweets due to source $s$, the corresponding $N_{us}(t)$ will be incremented, and this in turn will increase the likelihood of triggering retweets due to source $s$ among the followers of $u$. Thus, the source does not simply broadcast the message to nodes directly following her; her influence propagates through the network even to those nodes that do not directly follow her. Finally, this information diffusion model allows a node to repeatedly generate events in a cascade, and is very different from the independent cascade or linear threshold models, which allow at most one event per node per cascade.
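The computation described above (find the followees of a node, then aggregate their retweets due to a given source) can be sketched as follows. This is only an illustrative reading of the retweet intensity: the data layout (`follow_time`, a list of `(node, source, time)` triplets) and all names are assumptions made here, and an event by a followee is counted only if the link already existed when the event happened.

```python
import math

def retweet_intensity(u, s, t, follow_time, retweets, eta, beta, omega1):
    """Retweet intensity of destination u due to source s at time t.

    follow_time[u][v] is the (hypothetical) time at which u started following v;
    retweets is a list of (node, source, time) triplets.
    """
    if u == s:
        return eta[u]  # original tweets: constant baseline rate
    excitation = 0.0
    for v, src, ti in retweets:
        link_t = follow_time.get(u, {}).get(v)
        # Count only events due to source s, made by followees of u,
        # that happened before t and after u started following v.
        if src == s and link_t is not None and link_t <= ti < t:
            excitation += math.exp(-omega1 * (t - ti))
    return beta[s] * excitation
```

A node that follows nobody has zero intensity for retweeting any other source, which matches the intuition that exposure only happens through followees.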
[Figure 6: (a) Link creation process; (b) Social network; (c) Information diffusion process]
3.4 Network Evolution Process
In our model, each user is exposed to information through a time-varying set of neighbors. By doing so, information diffusion affects network evolution, increasing the practical applicability of our model to real-world network datasets. The particular definition of exposure (e.g., a neighbor’s retweet) depends on the type of historical information that is available. Remarkably, the flexibility of our model allows for different types of diffusion events, which we can broadly classify into two categories.
In the first category, events correspond to the times when an information cascade hits a person, for example, through a retweet from one of her neighbors, but she does not explicitly like or forward the associated post. Here, we model the intensity, $\Lambda^*(t)$, for link creation using a combination of survival and Hawkes processes:
$$\lambda^*_{us}(t) = (1 - A_{us}(t)) \Big( \mu_u + \alpha_u \sum_{v \in \mathcal{F}_u(t)} \kappa_{\omega_2}(t) \star dN_{vs}(t) \Big),$$
where the term $1 - A_{us}(t)$ effectively ensures a link is created only once, and after that, the corresponding intensity is set to zero. The term $\mu_u \geq 0$ denotes a baseline intensity, which models when a node $u$ decides to follow a source $s$ spontaneously at her own initiative. The term $\alpha_u\, \kappa_{\omega_2}(t) \star dN_{vs}(t)$ corresponds to the retweets by node $v$ (a followee of node $u$) which are originated from source $s$. The triggering kernel $\kappa_{\omega_2}(t)$ models the decay of interests over time.
In the second category, the person decides to explicitly like or forward the associated post and influencing events correspond to the times when she does so. In this case, we model the intensity, $\Lambda^*(t)$, for link creation as:
$$\lambda^*_{us}(t) = (1 - A_{us}(t)) \big( \mu_u + \alpha_u\, \kappa_{\omega_2}(t) \star dN_{us}(t) \big),$$
where the terms $1 - A_{us}(t)$, $\mu_u$, and the decaying kernel $\kappa_{\omega_2}(t)$ play the same role as the corresponding ones in Equation (20). The term $\alpha_u\, \kappa_{\omega_2}(t) \star dN_{us}(t)$ corresponds to the retweets of node $u$ due to tweets originally published by source $s$. The higher the corresponding retweet intensity, the more likely $u$ will find information by source $s$ useful and will create a direct link to $s$.
In both cases, the link creation intensity is also a stochastic process by itself, which depends on the retweet events, be it the retweets by the neighbors of node or the retweets by node herself, respectively. Therefore, it captures the influence of retweets on the link creation, and closes the loop of mutual influence between information diffusion and network topology. Figure 6 illustrates these two interdependent intensities.
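The two variants of the link creation intensity differ only in whose retweets drive the excitation. The sketch below makes that explicit; it is an illustration under assumed data structures (a `followees` dictionary of sets and a list of `(node, source, time)` retweet triplets), and the flag `own_retweets` is a name introduced here to switch between the first and the second category.

```python
import math

def link_intensity(u, s, t, followees, retweets, mu, alpha, omega2, own_retweets=False):
    """Intensity with which node u creates a link to source s at time t."""
    current = followees.get(u, set())
    if u == s or s in current:
        return 0.0  # survival term: no self-links, and a link is created only once
    # First category: retweets by followees of u. Second category: retweets by u herself.
    actors = {u} if own_retweets else current
    excitation = sum(
        math.exp(-omega2 * (t - ti))
        for v, src, ti in retweets
        if src == s and v in actors and ti < t
    )
    return mu[u] + alpha[u] * excitation
```

Once `s` is added to `followees[u]`, the intensity drops to zero, while the retweet intensity of `u` starts to receive excitation from events by `s`, which is the coupling described above.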
Intuitively, in the latter category, information diffusion events are more prone to trigger new connections because they involve the target and source nodes in an explicit interaction; however, they are also less frequent. Therefore, this category is most suitable for large event datasets, such as the ones we generate in our synthetic experiments. In contrast, in the former category, information diffusion events are less likely to inspire new links but are found in abundance. Therefore, it is more suitable for smaller datasets, such as the ones we use in our real-world experiments. Consequently, in our synthetic experiments we used the latter and in our real-world experiments we used the former. More generally, the choice of exposure event should be made based on the type and amount of available historical information.
Finally, note that creating a link is more than just adding a path or allowing information sources to take shortcuts during diffusion. The network evolution makes fundamental changes to the diffusion dynamics and stationary distribution of the diffusion process in Section 3.3. As shown in previous work, given a fixed network structure $A$, the expected retweet intensity $\mu_s(t)$ at time $t$ due to source $s$ will depend on the network structure in a nonlinear fashion, i.e.,
$$\mu_s(t) := \mathbb{E}[\Gamma^*_{\cdot s}(t)] = \left( e^{(A - \omega_1 I) t} + \omega_1 (A - \omega_1 I)^{-1} \left( e^{(A - \omega_1 I) t} - I \right) \right) \eta_s,$$
where $\eta_s \in \mathbb{R}^m$ has a single nonzero entry with value $\eta_s$ and $e^{(A - \omega_1 I) t}$ is the matrix exponential. When $t \to \infty$, the stationary intensity $\bar{\mu}_s = (I - A / \omega_1)^{-1} \eta_s$ is also nonlinearly related to the network structure. Thus, given two network structures $A(t)$ and $A(t')$ at two points in time, which are different by a few edges, the effect of these edges on the information diffusion is not just an additive relation. Depending on how these newly created edges modify the eigen-structure of the sparse matrix $A(t)$, their effect on the information diffusion dynamics can be very significant.
[Figure 7: (a) Ogata’s algorithm; (b) Proposed algorithm]
4 Efficient Simulation of Coevolutionary Dynamics
We could simulate samples (link creations, tweets and retweets) from our model by adapting Ogata’s thinning algorithm, originally designed for multidimensional Hawkes processes. However, a naive implementation of Ogata’s algorithm would scale poorly, i.e., for each sample, we would need to re-evaluate $\Gamma^*(t)$ and $\Lambda^*(t)$. Thus, to draw $n$ sample events, we would need to perform $O(m^2 n^2)$ operations, where $m$ is the number of nodes. Figure 7(a) schematically demonstrates the main steps of Ogata’s algorithm. Please refer to Appendix A for further details.
Here, we design a sampling procedure that is especially well-fitted for the structure of our model. The algorithm is based on the following key idea: if we consider each intensity function in $\Gamma^*(t)$ and $\Lambda^*(t)$ as a separate point process and draw a sample from each, the minimum among all these samples is a valid sample for the multidimensional point process.
As the results of this section are general and can be applied to simulate any multi-dimensional point process model, we abuse the notation a little bit and represent $U$ (possibly inter-dependent) point processes by $U$ intensity functions $\lambda^*_1, \ldots, \lambda^*_U$. In the specific case of simulating coevolutionary dynamics we have $U = m^2 + m(m-1)$, where the first and second terms are the number of information diffusion and link creation processes, respectively. Figure 7 illustrates the way in which the two algorithms differ. The new algorithm has the following steps:
1. Initialization: Simulate each dimension separately and find their next sampled event time.
2. Minimization: Take the minimum among all the sampled times and declare it as the next event of the multidimensional process.
3. Update: Recalculate the intensities of the dimensions that are affected by this approved sample and re-sample only their next event. Then go to step 2.
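The three steps above can be sketched in a simplified setting. The code below is not the paper's Algorithm 1: it assumes a plain multivariate Hawkes process in which every dimension has a constant baseline `mu[i]`, a shared jump size `alpha` and decay `omega`, and a hypothetical map `affects[i]` listing the dimensions excited by an event in dimension `i`. Because the excitation decays exponentially between events, the next event of each dimension is drawn here by inversion rather than by thinning.

```python
import heapq
import math
import random

def next_event_time(t, mu, x, omega, rng):
    """First event after t for intensity mu + x * exp(-omega * (tau - t))."""
    t_base = t + rng.expovariate(mu) if mu > 0 else math.inf
    t_exc = math.inf
    if x > 0:
        e = rng.expovariate(1.0)
        if e < x / omega:  # otherwise the decaying part never fires
            t_exc = t - math.log(1.0 - e * omega / x) / omega
    return min(t_base, t_exc)

def simulate(mu, alpha, omega, affects, T, seed=0):
    rng = random.Random(seed)
    U = len(mu)
    x = [0.0] * U      # excitation part of each intensity
    last = [0.0] * U   # time at which x[i] was last brought up to date
    version = [0] * U  # used to discard stale heap entries lazily
    # Step 1: sample the next event of every dimension separately.
    heap = [(next_event_time(0.0, mu[i], 0.0, omega, rng), i, 0) for i in range(U)]
    heapq.heapify(heap)
    events = []
    while heap:
        # Step 2: the minimum over all dimensions is the next event.
        t, i, ver = heapq.heappop(heap)
        if ver != version[i]:
            continue  # this sample was invalidated by an earlier update
        if t >= T:
            break
        events.append((t, i))
        # Step 3: update and re-sample only the affected dimensions (and i itself).
        targets = set(affects[i])
        for j in targets | {i}:
            x[j] *= math.exp(-omega * (t - last[j]))
            if j in targets:
                x[j] += alpha
            last[j] = t
            version[j] += 1
            heapq.heappush(heap, (next_event_time(t, mu[j], x[j], omega, rng), j, version[j]))
    return events
```

For example, `simulate([0.5, 0.5, 0.5], 0.3, 1.0, [{1}, {2}, {0}], T=20.0)` returns a time-ordered list of `(time, dimension)` pairs. Stale samples are left in the heap and skipped when popped, which keeps each update at logarithmic cost.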
To prove that the new algorithm generates samples from the same distribution as Ogata’s algorithm does we need the following Lemma. It justifies step 2 of the above outline.
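Lemma. Assume we have $U$ independent non-homogeneous Poisson processes with intensities $\lambda^*_1(\tau), \ldots, \lambda^*_U(\tau)$. Take random variable $\tau_u$ equal to the time of process $u$’s first event after time $t$. Define $\tau_{\min} = \min_{1 \leq u \leq U} \{\tau_u\}$ and $u_{\min} = \arg\min_{1 \leq u \leq U} \{\tau_u\}$. Then,
(a) $\tau_{\min}$ is the first event after time $t$ of the Poisson process with intensity $\lambda^*_{\mathrm{sum}}(\tau) = \sum_{u=1}^{U} \lambda^*_u(\tau)$. In other words, $\tau_{\min}$ has the same distribution as the next event ($t'$) in Ogata’s algorithm.
(b) $u_{\min}$ follows the conditional distribution $\mathbb{P}(u_{\min} = u \mid \tau_{\min} = x) = \lambda^*_u(x) / \lambda^*_{\mathrm{sum}}(x)$, i.e., the dimension firing the event comes from the same distribution as the one in Ogata’s algorithm.
Proof. (a) The waiting time of the first event of a dimension $u$ is an exponentially distributed random variable in the following sense: $\mathbb{P}(\tau_u > x) = \exp\left(-\int_t^x \lambda^*_u(\tau)\, d\tau\right)$ for $x > t$. (Recall that if a random variable $X$ is exponentially distributed with parameter $r$, then $f_X(x) = r \exp(-r x)$ is its probability density function and $F_X(x) = 1 - \exp(-r x)$ is the cumulative distribution function.) We have:
$$\mathbb{P}(\tau_{\min} \leq x) = 1 - \mathbb{P}(\tau_{\min} > x) = 1 - \prod_{u=1}^{U} \mathbb{P}(\tau_u > x) = 1 - \prod_{u=1}^{U} \exp\left(-\int_t^x \lambda^*_u(\tau)\, d\tau\right) = 1 - \exp\left(-\int_t^x \lambda^*_{\mathrm{sum}}(\tau)\, d\tau\right).$$
Therefore, $\tau_{\min}$ has the distribution of the first event after time $t$ of a non-homogeneous Poisson process with intensity $\lambda^*_{\mathrm{sum}}(\tau)$.
(b) To find the distribution of $u_{\min}$ we have
$$\mathbb{P}(u_{\min} = u \mid \tau_{\min} = x) \propto \lambda^*_u(x) \exp\left(-\int_t^x \lambda^*_u(\tau)\, d\tau\right) \prod_{v \neq u} \exp\left(-\int_t^x \lambda^*_v(\tau)\, d\tau\right) = \lambda^*_u(x) \exp\left(-\int_t^x \lambda^*_{\mathrm{sum}}(\tau)\, d\tau\right).$$
After normalization we get $\mathbb{P}(u_{\min} = u \mid \tau_{\min} = x) = \lambda^*_u(x) / \lambda^*_{\mathrm{sum}}(x)$.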
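The lemma can be checked numerically in the simplest special case of constant intensities, where it reduces to two familiar facts about exponential random variables: the minimum of independent exponentials is exponential with the summed rate, and the index of the minimum is distributed proportionally to the rates. The following Monte Carlo sketch is only a sanity check of that special case, not a proof; the function name and the default sample size are arbitrary choices made here.

```python
import random

def check_min_of_exponentials(rates, n_samples=20000, seed=0):
    """Estimate the mean of the minimum and how often each dimension fires first."""
    rng = random.Random(seed)
    total_time = 0.0
    wins = [0] * len(rates)
    for _ in range(n_samples):
        samples = [rng.expovariate(r) for r in rates]
        u_min = min(range(len(rates)), key=samples.__getitem__)
        total_time += samples[u_min]
        wins[u_min] += 1
    mean_min = total_time / n_samples
    win_freq = [w / n_samples for w in wins]
    return mean_min, win_freq
```

With `rates = [1.0, 2.0, 3.0]`, the estimated mean should be close to 1/6 and the firing frequencies close to 1/6, 1/3 and 1/2, in line with parts (a) and (b).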
Given the above Lemma, we can now prove that the distribution of the samples generated by the proposed algorithm is identical to the one generated by Ogata’s method.
Theorem. The sequence of samples from Ogata’s algorithm and our proposed algorithm follow the same distribution.
Proof. Using the chain rule, the probability of observing $\mathcal{H}_T = \{(t_1, u_1), \ldots, (t_n, u_n)\}$ is written as:
$$\mathbb{P}\{(t_1, u_1), \ldots, (t_n, u_n)\} = \prod_{i=1}^{n} \mathbb{P}\{(t_i, u_i) \mid (t_{i-1}, u_{i-1}), \ldots, (t_1, u_1)\} = \prod_{i=1}^{n} \mathbb{P}\{(t_i, u_i) \mid \mathcal{H}_{t_i}\}.$$
By fixing the history up to some time, say $t_i$, all dimensions of the multivariate Hawkes process become independent of each other (until the next event happens). Therefore, the above lemma can be applied to show that the next sample time from Ogata’s algorithm and the proposed one come from the same distribution, i.e., $\mathbb{P}\{(t_i, u_i) \mid \mathcal{H}_{t_i}\}$ is the same for both algorithms for every $i$. Thus, the product of the individual terms is also equal for both, which proves the theorem.
This new algorithm is especially suitable for the structure of our inter-coupled processes. Since social and information networks are typically sparse, every time we sample a new node (or link) event from the model, only a small number of intensity functions in the local neighborhood of the node (or the link) will change. This number is of $O(d)$, where $d$ is the maximum number of followers/followees per node. As a consequence, we can reuse most of the individual samples for the next overall sample. Moreover, we can find which intensity function has the minimum sample time in $O(\log m)$ operations using a heap priority queue. The heap data structure helps maintain the minimum and find it in logarithmic time with respect to the number of elements therein. Therefore, we have reduced an $O(m^2)$ factor in the original algorithm to $O(d \log m)$.
Finally, we exploit the properties of the exponential function to update individual intensities for each new sample in $O(1)$. For simplicity consider a Hawkes process with intensity $\lambda(t) = \sum_{t_i \in \mathcal{H}(t)} \alpha \exp(-\omega (t - t_i))$. Note that both link creation and information diffusion processes have this structure. Now, let $t < t'$ be two arbitrary times; we have
$$\lambda(t') = \lambda(t) \exp(-\omega (t' - t)) + \sum_{t \leq t_i < t'} \alpha \exp(-\omega (t' - t_i)).$$
This can be readily generalized to the multivariate case too. Therefore, we can compute the current intensity without explicitly iterating over all previous events. As a result we can change an $O(n)$ factor in the original algorithm to $O(1)$. Furthermore, the exponential kernel also facilitates finding the upper bound of the intensity: since the intensity only decays between events, its maximum over an inter-event interval always lies at the beginning of that interval. Algorithm 2 summarizes the procedure to compute intensities with exponential kernels, and Algorithm 3 shows the procedure to sample the next event in each dimension making use of the special property of exponential kernel functions.
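The constant-time update can be illustrated with a small helper that stores only the current value of the excitation and the time it was last updated. This is a sketch of the idea rather than the paper's Algorithm 2; the class name and its methods are introduced here for illustration, and the intensity is assumed to have the form of a sum of `alpha * exp(-omega * (t - t_i))` terms over past events.

```python
import math

class ExpKernelIntensity:
    """Tracks sum_i alpha * exp(-omega * (t - t_i)) without storing the events."""

    def __init__(self, alpha, omega):
        self.alpha = alpha
        self.omega = omega
        self.value = 0.0   # intensity right after the last recorded event
        self.last_t = 0.0  # time of the last recorded event

    def at(self, t):
        """Intensity at time t >= last_t, assuming no events in between."""
        return self.value * math.exp(-self.omega * (t - self.last_t))

    def add_event(self, t):
        """Decay the stored value to time t, then add the jump of the new event."""
        self.value = self.at(t) + self.alpha
        self.last_t = t
```

Each call to `at` or `add_event` costs a single exponential, independent of how many events have occurred, and the result agrees with the naive sum over the full history.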
The simulation algorithm is shown in Algorithm 1. By using this algorithm we reduce the complexity from $O(n^2 m^2)$ to $O(nd \log m)$, where $d$ is the maximum number of followees per node. That is, our algorithm scales logarithmically with the number of nodes and linearly with the number of edges at any point in time during the simulation. Moreover, events for new links, tweets and retweets are generated in a temporally intertwined and interleaving fashion, since every new retweet event will modify the intensity for link creation and vice versa.
5 Efficient Parameter Estimation from Coevolutionary Events
In this section, we first show that learning the parameters of our proposed model reduces to solving a convex optimization problem and then develop an efficient, parameter-free Minorization-Maximization algorithm to solve such problem.
5.1 Concave Parameter Learning Problem
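Given a collection of retweet events $\mathcal{E} = \{e^r_i\}$ and link creation events $\mathcal{A} = \{e^l_i\}$ recorded within a time window $[0, T)$, we can easily estimate the parameters needed in our model using maximum likelihood estimation. To this aim, we compute the joint log-likelihood of these events using the log-likelihood expression from Section 2, i.e.,
$$\mathcal{L}(\{\mu_u\}, \{\alpha_u\}, \{\eta_u\}, \{\beta_s\}) = \sum_{e^r_i \in \mathcal{E}} \log \gamma^*_{u_i s_i}(t_i) - \sum_{u, s \in [m]} \int_0^T \gamma^*_{us}(\tau)\, d\tau + \sum_{e^l_i \in \mathcal{A}} \log \lambda^*_{u_i s_i}(t_i) - \sum_{u, s \in [m]} \int_0^T \lambda^*_{us}(\tau)\, d\tau.$$
For the terms corresponding to retweets, the log term sums only over the actual observed events while the integral term sums over all possible combinations of destination and source pairs, even if there is no event between a particular pair of destination and source. For such pairs with no observed events, the corresponding counting processes have essentially survived the observation window $[0, T)$, and the term $-\int_0^T \gamma^*_{us}(\tau)\, d\tau$ simply corresponds to the log survival probability. The terms corresponding to links have a similar structure.
Once we have an expression for the joint log-likelihood of the retweet and link creation events, the parameter learning problem can then be formulated as follows:
$$\underset{\{\mu_u\}, \{\alpha_u\}, \{\eta_u\}, \{\beta_s\}}{\text{minimize}} \; -\mathcal{L}(\{\mu_u\}, \{\alpha_u\}, \{\eta_u\}, \{\beta_s\}) \quad \text{subject to} \quad \mu_u \geq 0, \; \alpha_u \geq 0, \; \eta_u \geq 0, \; \beta_s \geq 0 \quad \forall u, s \in [m].$$
If we stack all parameters in a vector $x$, one can easily notice that the log-likelihood can be written as $\sum_j \log(a_j^\top x) - \sum_k b_k^\top x$, which is clearly a concave function with respect to $x$, and thus $-\mathcal{L}$ is convex. Moreover, the constraints are linear inequalities and thus the domain is a convex set. This completes the proof of convexity of the optimization problem.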
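The concavity claim can be verified term by term. The short derivation below assumes the log-likelihood has the form of a sum of logarithms of linear functions of the stacked parameter vector minus a linear function; the symbols $x$, $a_j$ and $b_k$ are generic notation used here for that form.

```latex
% Each log term is concave on the region where a^T x > 0,
% because its Hessian is negative semidefinite:
\nabla_x \log(a^\top x) = \frac{a}{a^\top x},
\qquad
\nabla_x^2 \log(a^\top x) = -\frac{a\, a^\top}{(a^\top x)^2} \preceq 0 .
% The linear terms -b_k^T x have zero Hessian, and a sum of concave
% functions is concave, so
\nabla_x^2 \Big( \sum_j \log(a_j^\top x) - \sum_k b_k^\top x \Big)
  = -\sum_j \frac{a_j a_j^\top}{(a_j^\top x)^2} \preceq 0 .
```

Since the negative of a concave function is convex and the nonnegativity constraints define a convex set, any local minimum of the resulting problem is global.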
Notably, the optimization problem decomposes into $m$ independent problems, one per node $u$, and can be readily parallelized.
5.2 Efficient Minorization-Maximization Algorithm
Since the optimization problem is jointly convex with respect to all the parameters, one can simply take any convex optimization method to learn the parameters. However, these methods usually require hyperparameters such as step size or initialization, which may significantly influence the convergence. Instead, the structure of our problem allows us to develop an efficient algorithm inspired by previous work [16, 17], which leverages Minorization-Maximization (MM) and is parameter free and insensitive to initialization.
Our algorithm utilizes Jensen’s inequality to provide a lower bound for the second log-sum term in the log-likelihood given by Equation (27). More specifically, consider a set of arbitrary auxiliary variable , where , and is the number of link events, , . Further, assume these variables satisfy
Then, we can lower bound the logarithm in Equation (29) using Jensen’s inequality as follows:
Now, we can lower bound the log-likelihood given by Equation (29) as:
By taking the gradient of the lower-bound with respect to the parameters, we can find the closed form updates to optimize the lower-bound:
Finally, although the lower bound is valid for every choice of satisfying Equation (30), by maximizing the lower bound with respect to the auxiliary variables we can make sure that the lower bound is tight:
Fortunately, the above constrained optimization problem can be solved easily via Lagrange multipliers, which leads to closed form updates:
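The core of the minorization step is the generic Jensen bound for the logarithm of a sum: for positive terms and nonnegative weights that sum to one, the logarithm of the sum is bounded below by the weighted sum of the logarithms of each term divided by its weight, with equality when each weight is proportional to its term. The sketch below checks this numerically. It illustrates only the generic inequality, not the paper's exact update equations, and the function names are introduced here.

```python
import math

def jensen_lower_bound(terms, weights):
    """Lower bound on log(sum(terms)) for weights >= 0 that sum to one."""
    return sum(w * math.log(x / w) for x, w in zip(terms, weights) if w > 0)

def tight_weights(terms):
    """Weights that make the bound tight: each weight proportional to its term."""
    total = sum(terms)
    return [x / total for x in terms]
```

For example, with `terms = [0.2, 0.5, 1.3]`, uniform weights give a value strictly below `math.log(2.0)`, while `tight_weights(terms)` attains it. This is why alternating between updating the weights and maximizing the bound never decreases the log-likelihood.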
6 Properties of Simulated Co-evolution, Networks and Cascades
In this section, we perform an empirical investigation of the properties of the networks and information cascades generated by our model. In particular, we show that our model can generate co-evolutionary retweet and link dynamics and a wide spectrum of static and temporal network patterns and information cascades.
6.1 Simulation Settings
Throughout this section, unless stated otherwise, we simulate the evolution of an 8,000-node network as well as the propagation of information over the network by sampling from our model using Algorithm 1. We set the exogenous intensities of the link and diffusion events to and respectively, and the triggering kernel parameter to . The parameter determines the independent growth of the network; roughly speaking, the expected number of links each user establishes spontaneously before time is . Whenever we investigate a static property, we choose the same sparsity level of .
6.2 Retweet and Link Coevolution
Figures 8(a,b) visualize the retweet and link events, aggregated across different sources, and the corresponding intensities for one node and one realization, picked at random. Here, it is already apparent that retweets and link creations are clustered in time and often follow each other. Further, Figure 8(c) shows the cross-covariance of the retweet and link creation intensity, computed across multiple realizations, for the same node, i.e., if $f(t)$ and $g(t)$ are two intensities, the cross-covariance is a function $h(\tau) = \int f(t + \tau)\, g(t)\, dt$. It can be seen that the cross-covariance has its peak around 0, i.e., retweets and link creations are highly correlated and co-evolve over time. For ease of exposition, we illustrated co-evolution using one node; however, we found consistent results across nodes.
6.3 Degree Distribution
Empirical studies have shown that the degree distribution of online social networks and microblogging sites follows a power law [10, 1], and argued that it is a consequence of the rich-get-richer phenomenon. The degree distribution of a network is a power law if the expected number of nodes $m_d$ with degree $d$ is given by $m_d \propto d^{-\gamma}$, where $\gamma > 0$. Intuitively, the higher the values of the parameters $\alpha$ and $\beta$, the more closely the resulting degree distribution follows a power law. This is because the network grows more locally. Interestingly, the lower their values, the closer the distribution is to that of an Erdos-Renyi random graph, because the edges are added almost uniformly and independently without influence from the local structure. Figure 9 confirms this intuition by showing the degree distribution for different values of $\beta$ and $\alpha$.
6.4 Small (shrinking) Diameter
There is empirical evidence that online social networks and microblogging sites exhibit a relatively small diameter, which shrinks (or flattens) as the network grows [33, 10, 24]. Figures 10(a-b) show the diameter of the largest connected component (LCC) against the sparsity of the network over time for different values of $\alpha$ and $\beta$. Although at the beginning there is a short increase in the diameter due to the merging of small connected components, the diameter decreases as the network evolves. Moreover, larger values of $\alpha$ or $\beta$ lead to higher levels of local growth in the network and, as a consequence, slower shrinkage. Here, nodes arrive to the network when they follow (or are followed by) a node in the largest connected component.
6.5 Clustering Coefficient
Triadic closure [34, 11, 35] has often been presented as a plausible link creation mechanism. However, different social networks and microblogging sites present different levels of triadic closure. Importantly, our method is able to generate networks with different levels of triadic closure, as shown by Figure 10(c-d), where we plot the clustering coefficient, which is proportional to the frequency of triadic closure, for different values of $\alpha$ and $\beta$.
[Figure 10: (a, b) Diameter; (c, d) Clustering coefficient (CC)]
6.6 Network Visualization
Figure 11 visualizes several snapshots of the largest connected component (LCC) of two 300-node networks for two particular realizations of our model, under two different values of . In both cases, we used , , and . The top two rows correspond to and represent one end of the spectrum, i.e., an Erdos-Renyi random network. Here, the network evolves uniformly. The bottom two rows correspond to and represent the other end, i.e., scale-free networks. Here, the network evolves locally, and clusters emerge naturally as a consequence of the local growth. They are depicted using a combination of force-directed and Fruchterman-Reingold layouts with Gephi (http://gephi.github.io/). Moreover, the figure also shows the retweet events (from others as source) for two nodes, and , on the bottom row. These two nodes arrive almost at the same time and establish links to two other nodes. However, node ’s followees are more central; therefore, is exposed to more retweets. Thus, node performs more retweets than does. This again shows how information diffusion is affected by network structure. Overall, this figure clearly illustrates that by careful choice of parameters we can generate networks with very different structures.
[Figure 11: network snapshots at t = 5, t = 20 and t = 35 for each of the two settings]
Information Diffusion → Network Evolution: When node 6 joins the network, a few nodes follow her and retweet her posts. Her tweets are propagated (shown in red), turning her into a valuable source of information. Therefore, those retweets are followed by links created to her (shown in magenta).
Network Evolution → Information Diffusion: Nodes 46 and 68 both have almost the same number of followees. However, as soon as node 46 connects to node 130 (which is a central node and retweets frequently), her activity dramatically increases compared to node 68.
Figure 12 illustrates the spike trains (tweet, retweet, and link events) for the first 140 nodes of a network simulated with a similar set of parameters as above, and Figure 13 shows three snapshots of the network at different times. First, consider node 6 in the network. After she joins the network, a few nodes begin to follow her. Then, when she starts to tweet, her tweets are retweeted many times by others (red spikes in the figure) and these retweets subsequently boost the number of nodes that link to her (magenta spikes). This clearly illustrates the scenario in which information diffusion triggers changes in the network structure. Second, consider nodes 46 and 68 and compare their associated events over time. After some time, node 46 becomes much more active than node 68. To understand why, note that soon after time 137, node 46 followed node 130, which is a very central node (i.e., following a lot of people), while node 68 did not. This clearly illustrates the scenario in which network evolution triggers changes in the dynamics of information diffusion.