Deep Collaborative Embedding for Information Cascade Prediction

Recently, information cascade prediction has attracted increasing interest from researchers, but it is far from being well solved, partly due to three defects of the existing works. First, the existing works often assume an underlying information diffusion model, which is impractical in the real world due to the complexity of information diffusion. Second, the existing works often ignore the prediction of the infection order, which also plays an important role in social network analysis. Third, the existing works often depend on the requirement of underlying diffusion networks, which are likely unobservable in practice. In this paper, we aim at the prediction of both node infection and infection order without any requirement of knowledge about the underlying diffusion mechanism and the diffusion network, where the challenges are two-fold. The first is what cascading characteristics of nodes should be captured and how to capture them, and the second is how to model the non-linear features of nodes in information cascades. To address these challenges, we propose a novel model called Deep Collaborative Embedding (DCE) for information cascade prediction, which can capture not only the node structural property but also two kinds of node cascading characteristics. We propose an auto-encoder based collaborative embedding framework to learn the node embeddings with cascade collaboration and node collaboration, in which way the non-linearity of information cascades can be effectively captured. The results of extensive experiments conducted on real-world datasets verify the effectiveness of our approach.

1 Introduction

In recent years, as more and more people enjoy the services provided by Facebook, Twitter, Weibo, etc., information cascades have become ubiquitous in online social networks, which has motivated a large amount of research Cheng et al. (2014); Li et al. (2017); Sun et al. (2017); Li et al. (2018); Liu et al. (2018). An important research topic is information cascade prediction, whose purpose is to predict who will be infected by a piece of information in the future Saito et al. (2008); Guille and Hacid (2012); Wang et al. (2017); Gao et al. (2017), where infection refers to actions such as users resharing (retweeting) or commenting on a tweet, a photo, or another piece of information Bourigault et al. (2014).

While lots of methods have been proposed for information cascade prediction Saito et al. (2008); Gomez-Rodriguez et al. (2011); Bourigault et al. (2016); Zhang et al. (2019); Varshney et al. (2017), the existing works often suffer from three defects. First, the existing works often focus on predicting whether a node will be infected in the future given the nodes infected in the past, but ignore the prediction of the infection order, i.e., which nodes will be infected earlier or later than others. However, predicting the infection order is important in many scenarios. For example, knowing who will be the next infected node is helpful for blocking rumor spread Guille et al. (2013); Yang et al. (2020). Second, the existing methods often assume that information diffusion follows a parametric model such as the Independent Cascade (IC) model Goldenberg et al. (2001) and the Susceptible-Infected (SI) model Radcliffe (1977). In the real world, however, information diffusion processes are so complicated that we seldom exactly know the underlying mechanisms of how information diffuses Steeg and Galstyan (2013). Third, the existing works often assume that the explicit paths along which information propagates between nodes are observable. Yet in many scenarios we can only observe that nodes get infected but cannot know who infects them Bourigault et al. (2016). For example, in viral marketing, one can track whether a customer buys a product, but it is difficult to exactly determine who influences her/him.

In this paper, we aim at the problem of information cascade prediction without requirement of the knowledge about the underlying diffusion mechanism and the diffusion network. This is not easy due to the following two major challenges:

  • Cascading Characteristics. The probability that a node is infected by a cascade and its relative infection order mainly depend on its cascading characteristics, which reveal its relation to other nodes in that cascade. The existing methods often take into consideration only the static structural properties of nodes, for example, the node neighborship in a static social network. However, the cascading characteristics of a node intuitively vary across cascades, and different cascades can contain totally different infection ranges or orders of nodes. For example, in some cascades, one node may often get infected by certain nodes, but in other cascades, it may be more susceptible to different nodes, even though the node structural properties remain the same. Intuitively, different contents often lead to different cascading characteristics of a node and result in different underlying mechanisms in different cascades. However, in many situations it is not easy to recognize the content (i.e., what is diffused) and its underlying diffusion mechanism (i.e., why and how it is diffused). For example, we often do not know what virus is being propagated in a plague, but when and which nodes are infected can be observed. To make predictions for cascades in such situations, we have to explicitly model the observable cascading characteristics, which arguably also capture the effect of the unobservable content and underlying mechanism implicitly. Therefore, what cascading characteristics of nodes should be captured and how to capture them are crucial to our purpose.

  • Cascading Non-linearity. Information cascades are often non-linear. The non-linearity comes from two perspectives: one is the non-linearity of the dynamics of information cascades, and the other is the non-linearity of the structure of the social networks on which cascades exist. As a consequence, when nodes spread the content of a cascade, they exhibit non-linear cascading patterns (e.g., emergence patterns) that the existing shallow models cannot effectively recognize. How to capture the non-linear features of nodes in information cascades is also a critical challenge for our problem.

Inspired by the impressive network representation learning ability of deep learning that has been demonstrated by recent works Wang et al. (2016); Liao et al. (2018); Chang et al. (2015), we propose a novel model called Deep Collaborative Embedding (DCE) for the prediction of infection and infection order in cascades, which can learn the embeddings without assumptions about the underlying diffusion model and diffusion networks. The main idea of DCE is to collaboratively embed the nodes with a deep architecture into a latent space where the closer the embeddings of two nodes are, the more likely the two nodes will be infected in the same cascade and the closer their infection times will be.

Different from the traditional network embedding methods Wang et al. (2016); Tang et al. (2015); Xie et al. (2019); Perozzi et al. (2014), which mainly focus on preserving the static structural properties of nodes in a network, DCE can capture not only the node structural property but also two kinds of node cascading characteristics that are important for the prediction of node infection and infection order. One is the cascading context, which reveals the temporal relation of nodes in a cascade. The cascading context of one node consists of two aspects: the potential influence it receives from earlier infected nodes, and their temporal relative positions in a cascade. The other kind of cascading characteristic captured by DCE is the cascading affinity, which reveals the co-occurrence relation of nodes in cascades. Cascading affinity essentially reflects the probability that two nodes will be infected by the same cascade; higher cascading affinity between two nodes indicates that it is more likely for them to co-occur in a cascade. Intuitively, the cascading characteristics of nodes reflect the effect of the unobservable underlying diffusion mechanisms and diffusion networks. Therefore, by explicitly preserving the node cascading characteristics, the learned embeddings also implicitly capture the effect of the unobservable underlying diffusion mechanisms and diffusion networks, which makes it feasible to make cascade predictions in terms of the similarity between embeddings in the latent space. As we will see later in the experiments, due to this ability to capture the cascading characteristics, the embeddings learned by DCE show better performance in the task of infection prediction.

To effectively capture the non-linearity of information cascades, we introduce an auto-encoder based collaborative embedding architecture for DCE. DCE consists of multi-layer non-linear transformations by which the non-linear cascading patterns of nodes can be effectively encoded into the embeddings. DCE learns embeddings for nodes in a collaborative way, where there are two kinds of collaborations, i.e., cascade collaboration and node collaboration. At first, in light of the observation that a node often participates in more than one cascade of different contents, DCE collaboratively encodes a node's cascading context features in each cascade into its embedding. In other words, the embedding of a node is learned with the collaboration of the cascades the node participates in, which we call the cascade collaboration. At the same time, DCE concurrently embeds the nodes, during which the embedding of a node is generated under the constraints of its relation to other nodes, i.e., its cascading affinity to other nodes and its neighborship in social networks. In other words, the embeddings of nodes are learned with the collaboration of each other, which we call the node collaboration.

The major contributions of this paper can be summarized as follows:

  1. We propose a novel model called Deep Collaborative Embedding (DCE) for information cascade prediction without requirement of the knowledge about the underlying diffusion mechanism and the diffusion network. The node embeddings learned by DCE are beneficial to not only the infection prediction but also the prediction of infection order of nodes in a cascade.

  2. We propose an auto-encoder based collaborative embedding framework for DCE, which can collaboratively learn the node embeddings, preserving the node cascading characteristics including cascading context and cascading affinity, as well as the structural property.

  3. The extensive experiments conducted on real datasets verify the effectiveness of our proposed model.

The rest of this paper is organized as follows. We give the preliminaries in Section 2. The cascading characteristics are defined and modeled in Section 3. In Section 4 we illustrate our proposed model, and in Section 5 we analyze the experimental results. Finally, we briefly review the related work in Section 6 and conclude in Section 7.

2 Preliminaries and Problem Definition

Symbol  Description
$N$  the number of nodes
$M$  the number of cascades
$G$  the social network, $G = (V, E)$
$V$  the set of nodes
$E$  the set of edges
$C$  the set of cascades
$S^c$  the cascading context matrix of cascade $c$, $S^c \in \mathbb{R}^{N \times N}$
$A$  the cascading affinity matrix, $A \in \mathbb{R}^{N \times N}$
$P$  the structural proximity matrix, $P \in \mathbb{R}^{N \times N}$
$t_i^c$  the infection time of node $v_i$ in cascade $c$
$s_i^c$  the row vector of node $v_i$ in $S^c$
$u_i$  the learned embedding vector of node $v_i$
Table 1: Notations

2.1 Basic Definitions

We denote a social network as $G = (V, E)$, where $V$ is the node set comprising $N$ nodes and $E$ is the edge set. Let $C = \{c_1, c_2, \dots, c_M\}$ be the set of information cascades. An information cascade $c \in C$ observed on a social network is defined as a set of timestamped infections, i.e., $c = \{(v_i, t_i^c)\}$, where $(v_i, t_i^c)$ represents that node $v_i$ is infected by cascade $c$ at time $t_i^c$. We also write $v_i \in c$ if node $v_i$ participates in cascade $c$. Additionally, we use $D_t^c$ to denote the set of nodes infected by cascade $c$ before time $t$, and $\bar{D}_t^c = V \setminus D_t^c$ the set of nodes which have not been infected before $t$. Note that the nodes in $\bar{D}_t^c$ might or might not be infected by $c$ after $t$.

2.2 Problem Definition

The target problem of this paper can be formulated as follows: given a set of information cascades $C$ observed on a given social network $G$, we want to learn an embedding $u_i \in \mathbb{R}^d$ for each node $v_i \in V$, where the learned embeddings preserve the cascading characteristics and structural properties of the nodes, so that closer embeddings indicate that the corresponding nodes are more likely to be infected by the same cascade with closer infection times.

3 Modeling Cascading Characteristics

Cascading characteristics of a node reveal its relation to other nodes in information cascades, which is crucial to the prediction of node infection and infection order. In this section, we define two kinds of cascading characteristics, the cascading context and the cascading affinity, which will be encoded into the learned embeddings.

3.1 Cascading Context

As mentioned before, the cascading context of a node in a cascade is supposed to capture its temporal relation to other nodes in that cascade, which includes the potential influence imposed by other nodes and their temporal infection order. There are three factors we have to consider for the definition of the cascading context. First, the infection of a node is intuitively caused by the potential influence of all the nodes infected before it, and this influence declines over time. Second, the cascading context should be specific to a cascade, as one node might have different cascading contexts in different cascades. Finally, in the same cascade, the infection of one node can be influenced neither by the nodes that are infected after it nor by the nodes that are never infected. Based on these ideas, we can define the cascading context as follows:

Definition 1

(Cascading Context): Given the set of cascades $C$ on a social network of $N$ nodes, the cascading context of the nodes involved in cascade $c \in C$ is defined as a matrix $S^c \in \mathbb{R}^{N \times N}$. The entry $s_{ij}^c$ at the $i$-th row and the $j$-th column of $S^c$ represents the potential influence from node $v_j$ to $v_i$, which is defined as

$$ s_{ij}^c = \begin{cases} e^{-(t_i^c - t_j^c)/\tau}, & \text{if } v_i, v_j \in c \text{ and } t_j^c < t_i^c \\ 0, & \text{otherwise} \end{cases} $$ (1)

where $t_i^c$ is the infection time of $v_i$ in cascade $c$ and $\tau$ is the decaying factor. The cascading context of node $v_i$ in cascade $c$ is defined as the row vector $s_i^c$ of $S^c$.

As we will see later, $s_i^c$ will be fed into our model as it quantitatively captures $v_i$'s temporal relation (including the influence and the relative infection position) to the other nodes in cascade $c$.
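For concreteness, here is a minimal sketch of how the cascading context matrix of Definition 1 could be computed, assuming the exponential-decay form of Equation (1) reconstructed above (the decay function, the function name, and the input format are illustrative assumptions):

```python
import numpy as np

def cascading_context(cascade, num_nodes, tau=1.0):
    """Build the cascading context matrix S^c of Definition 1.

    cascade: list of (node_index, infection_time) pairs observed in one cascade.
    tau: the decaying factor (an assumed exponential-decay parameterization).
    Entry S[i, j] holds the potential influence from v_j to v_i, which is
    non-zero only when v_j was infected before v_i in this cascade.
    """
    S = np.zeros((num_nodes, num_nodes))
    for i, t_i in cascade:
        for j, t_j in cascade:
            if t_j < t_i:  # only earlier-infected nodes exert influence
                S[i, j] = np.exp(-(t_i - t_j) / tau)
    return S

# Toy cascade: node 0 infected at t=0, node 2 at t=1, node 1 at t=3.
S = cascading_context([(0, 0.0), (2, 1.0), (1, 3.0)], num_nodes=4)
print(S[1])  # cascading context s_1^c of node 1
```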

3.2 Cascading Affinity

As mentioned before, the cascading affinity of two nodes measures their similarity with respect to cascades, which can be defined in terms of their co-occurrences in historical cascades as follows:

Definition 2

(Cascading Affinity): Given the set of cascades $C$ on a social network of $N$ nodes, the cascading affinity of two nodes $v_i$ and $v_j$ is represented by the entry $a_{ij}$ at the $i$-th row and the $j$-th column of the cascading affinity matrix $A$, which is defined as the ratio of the cascades involving both $v_i$ and $v_j$, i.e.,

$$ a_{ij} = \frac{\big|\{c \in C \mid v_i \in c \ \wedge\ v_j \in c\}\big|}{|C|} $$ (2)

Definition 2 tells us that for two given nodes, the more cascades involve both of them, the higher their cascading affinity is, and intuitively the more similar their preferences for the contents of cascades are. In this sense, the cascading affinity of two nodes implies how close their embeddings should be in the latent space.
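A small sketch computing the cascading affinity matrix of Definition 2 from cascades given as sets of infected node indices (leaving self-affinity at zero is our illustrative choice):

```python
import numpy as np

def cascading_affinity(cascades, num_nodes):
    """Build the cascading affinity matrix A of Definition 2.

    cascades: list of cascades, each a set of infected node indices.
    A[i, j] is the fraction of cascades in which v_i and v_j co-occur.
    """
    A = np.zeros((num_nodes, num_nodes))
    for nodes in cascades:
        for i in nodes:
            for j in nodes:
                if i != j:
                    A[i, j] += 1.0
    return A / max(len(cascades), 1)

A = cascading_affinity([{0, 1, 2}, {1, 2}, {0, 3}], num_nodes=4)
print(A[1, 2])  # v_1 and v_2 co-occur in 2 of 3 cascades -> ~0.667
```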

4 Deep Collaborative Embedding

Figure 1: Architecture of DCE

In this paper, we propose an auto-encoder based Deep Collaborative Embedding (DCE) model, which can learn embeddings for nodes in a given social network, based on the cascades observed on the network, so that the learned embeddings can be used for cascade prediction without knowing the underlying diffusion mechanisms and the explicit diffusion networks. In this section, we first present the architecture of the Deep Collaborative Embedding (DCE) model in detail, and then we describe the objective function and the learning of DCE.

4.1 Architecture of DCE

The architecture of DCE is shown in Fig. 1. As we can see from Fig. 1, DCE learns the embeddings through two collaborations, the cascade collaboration and the node collaboration. With the cascade collaboration, DCE generates the resulting $d$-dimensional embedding $u_i$ for a node $v_i$ by collaboratively encoding its cascading contexts $s_i^c$ ($c \in C$). At first, DCE learns intermediate embeddings for $v_i$ with $M$ auto-encoders, one per cascade. The auto-encoder for cascade $c$ ($1 \le c \le M$) takes $v_i$'s cascading context $s_i^c$ in that cascade as input, and then generates the intermediate embedding of $v_i$ in cascade $c$, $\tilde{u}_i^c$, through its encoder part consisting of $K$ non-linear hidden layers defined by the following equations:

$$ h_i^{c,(k)} = \sigma\big(W_c^{(k)} h_i^{c,(k-1)} + b_c^{(k)}\big), \quad k = 1, \dots, K, \qquad h_i^{c,(0)} = s_i^c, \quad \tilde{u}_i^c = h_i^{c,(K)} $$ (3)

where $h_i^{c,(k)}$ is the output vector of the $k$-th hidden layer of the $c$-th auto-encoder taking $s_i^c$ as input, $W_c^{(k)}$ is the parameter matrix of that layer, and $b_c^{(k)}$ is the corresponding bias.

At last, the resulting embedding $u_i$ is generated by fusing the intermediate embeddings $\tilde{u}_i^c$ ($1 \le c \le M$) through the following non-linear mapping:

$$ u_i = \sigma\big(W \,[\tilde{u}_i^1; \tilde{u}_i^2; \dots; \tilde{u}_i^M] + b\big) $$ (4)

where $[\,\cdot\,; \cdot\,]$ denotes vector concatenation.

Symmetrically, the decoder part of the auto-encoder for cascade $c$ reconstructs the cascading context $\hat{s}_i^c$ from $u_i$ through the following equations:

$$ \hat{h}_i^{c,(k)} = \sigma\big(\hat{W}_c^{(k)} \hat{h}_i^{c,(k-1)} + \hat{b}_c^{(k)}\big), \quad k = 1, \dots, K, \qquad \hat{h}_i^{c,(0)} = u_i, \quad \hat{s}_i^c = \hat{h}_i^{c,(K)} $$ (5)

In the above Equations (3), (4), and (5), the parameter matrices $W_c^{(k)}$, $W$, and $\hat{W}_c^{(k)}$, and the bias vectors $b_c^{(k)}$, $b$, and $\hat{b}_c^{(k)}$ are the parameters that will be learned from the training data.
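The following is a minimal PyTorch sketch of Equations (3)-(5), assuming a single hidden layer per auto-encoder, sigmoid activations, and concatenation-based fusion; the class name DCESketch and the layer sizes are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class DCESketch(nn.Module):
    """Sketch of the DCE architecture: one encoder/decoder per cascade
    (Equations (3) and (5)) plus a fusion layer producing the final
    embedding u_i (Equation (4))."""

    def __init__(self, num_nodes, num_cascades, hidden_dim, embed_dim):
        super().__init__()
        # Equation (3): per-cascade encoders taking s_i^c as input.
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Linear(num_nodes, hidden_dim), nn.Sigmoid())
             for _ in range(num_cascades)])
        # Equation (4): fuse the M intermediate embeddings into u_i.
        self.fusion = nn.Sequential(
            nn.Linear(num_cascades * hidden_dim, embed_dim), nn.Sigmoid())
        # Equation (5): per-cascade decoders reconstructing s_i^c from u_i.
        self.decoders = nn.ModuleList(
            [nn.Sequential(nn.Linear(embed_dim, num_nodes), nn.Sigmoid())
             for _ in range(num_cascades)])

    def forward(self, contexts):
        # contexts: list of M tensors, each of shape (batch, num_nodes).
        inter = [enc(s) for enc, s in zip(self.encoders, contexts)]
        u = self.fusion(torch.cat(inter, dim=1))    # embeddings u_i
        recons = [dec(u) for dec in self.decoders]  # reconstructed s_i^c
        return u, recons
```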

At the same time, with the node collaboration, DCE concurrently embeds the nodes into the latent space, by which the similarity between nodes in the social network is captured in the learned embeddings. Particularly, to regulate the closeness between any two embeddings $u_i$ and $u_j$, DCE imposes the constraints of the cascading affinity and structural proximity between $v_i$ and $v_j$ via Laplacian Eigenmaps, which will be described in detail in the next subsection.

4.2 Optimization Objective of DCE

4.2.1 Loss Function for Cascade Collaboration

At first, as described in the last subsection, the auto-encoders defined by Equations (3), (4), and (5) fulfill the cascade collaboration for embedding $v_i$ by reconstructing its cascading contexts $s_i^c$. The optimization objective for this part is to minimize the reconstruction error between $S^c$ and $\hat{S}^c$, of which the loss function is defined as follows:

$$ \mathcal{L}_{rec} = \sum_{c \in C} \lVert \hat{S}^c - S^c \rVert_F^2 $$ (6)

where $S^c$ is the original cascading context matrix of cascade $c$ defined in Definition 1 and $\hat{S}^c$ is its reconstruction.

The cascading context vectors are often sparse, which may lead to undesired vectors in the embeddings and the reconstructions if the sparse vectors are straightforwardly fed into DCE. To overcome this issue, inspired by the idea used in the existing works Wang et al. (2016); Zhang et al. (2017), which assign more penalty (corresponding to a larger weight) to the loss incurred by non-zero elements than to that incurred by zero elements, $\mathcal{L}_{rec}$ can be redefined as

$$ \mathcal{L}_{rec} = \sum_{c \in C} \big\lVert (\hat{S}^c - S^c) \odot Z^c \big\rVert_F^2 $$ (7)

where $\odot$ denotes the Hadamard product, and the $i$-th row vector $z_i^c$ of the matrix $Z^c$ is the weight vector assigned to cascading context $s_i^c$. An entry $z_{ij}^c = \gamma > 1$ if $s_{ij}^c \neq 0$, and $z_{ij}^c = 1$ otherwise.

4.2.2 Loss Functions for Node Collaboration

Next we introduce the loss functions for node collaboration. As mentioned in the last subsection, through the node collaboration the embeddings will preserve the cascading affinity of nodes in cascades and the structural proximity of nodes in the social network. Following the idea of Laplacian Eigenmaps, we weight the distance between two embeddings with the cascading affinity of their corresponding nodes, which leads to the following loss function:

$$ \mathcal{L}_{aff} = \sum_{i,j=1}^{N} a_{ij} \lVert u_i - u_j \rVert_2^2 $$ (8)

where $a_{ij}$ is the cascading affinity between $v_i$ and $v_j$ defined in Equation (2). The insight of Equation (8) is that a penalty is imposed when two nodes with high cascading affinity are located far away from each other in the latent space.

Similarly, we also weight the distance between two embeddings with the structural proximity of their corresponding nodes, which leads to the following loss function:

$$ \mathcal{L}_{str} = \sum_{i,j=1}^{N} p_{ij} \lVert u_i - u_j \rVert_2^2 $$ (9)

where $p_{ij}$ is the structural proximity between $v_i$ and $v_j$ in the social network. Note that it does not matter how $p_{ij}$ is defined, and theoretically the node structural proximity of any order can be used. In this paper, we employ the first-order proximity Tang et al. (2015) to define $p_{ij}$. To be more specific, $p_{ij} = 1$ if $v_i$ and $v_j$ are connected by a link in the network, and $p_{ij} = 0$ otherwise.

Let $L_A$ be the Laplacian matrix of the cascading affinity matrix $A$, i.e., $L_A = D_A - A$, where $D_A$ is diagonal and $(D_A)_{ii} = \sum_j a_{ij}$. Let $P$ be the structural proximity matrix whose entry at the $i$-th row and $j$-th column is $p_{ij}$, and similarly, let $L_P$ be its Laplacian matrix, i.e., $L_P = D_P - P$, where $D_P$ is also diagonal and $(D_P)_{ii} = \sum_j p_{ij}$. Then we can rewrite Equations (8) and (9) in their matrix forms:

$$ \mathcal{L}_{aff} = 2\,\mathrm{tr}\big(U L_A U^\top\big) $$ (10)

and

$$ \mathcal{L}_{str} = 2\,\mathrm{tr}\big(U L_P U^\top\big) $$ (11)

where $U \in \mathbb{R}^{d \times N}$ is the embedding matrix whose $i$-th column is $u_i$.
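As a quick numerical sanity check of the matrix forms, the following snippet verifies that the pairwise sum of Equation (8) equals the trace form of Equation (10); the factor of 2 follows from the standard Laplacian identity for a symmetric weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 5, 3
A = rng.random((N, N)); A = (A + A.T) / 2   # symmetric affinity matrix
U = rng.random((d, N))                      # columns are the embeddings u_i
L_A = np.diag(A.sum(axis=1)) - A            # Laplacian L_A = D_A - A

pairwise = sum(A[i, j] * np.sum((U[:, i] - U[:, j]) ** 2)
               for i in range(N) for j in range(N))
matrix_form = 2.0 * np.trace(U @ L_A @ U.T)
assert np.isclose(pairwise, matrix_form)
```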

4.2.3 The Complete Loss Function

By combining $\mathcal{L}_{rec}$, $\mathcal{L}_{aff}$, and $\mathcal{L}_{str}$, we can define the complete loss function of DCE as follows:

$$ \mathcal{L} = \mathcal{L}_{rec} + \alpha \mathcal{L}_{aff} + \beta \mathcal{L}_{str} + \eta \mathcal{L}_{reg} $$ (12)

where $\mathcal{L}_{reg}$ is a 2-norm regularizer term on the parameters to avoid overfitting, and $\alpha$, $\beta$, and $\eta$ are nonnegative parameters used to control the contributions of the terms.
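A minimal sketch of the complete loss of Equation (12), combining the weighted reconstruction error of Equation (7) with the matrix-form Laplacian terms of Equations (10) and (11); the function name and the default weights are assumptions for illustration:

```python
import torch

def dce_loss(contexts, recons, U, L_A, L_P, params,
             alpha=0.5, beta=0.5, eta=1e-4, gamma=5.0):
    """contexts/recons: lists of S^c and reconstructed S^c matrices;
    U: (d, N) embedding matrix whose i-th column is u_i;
    gamma > 1 is the extra penalty weight on non-zero context entries."""
    l_rec = 0.0
    for S, S_hat in zip(contexts, recons):
        Z = torch.where(S != 0, torch.full_like(S, gamma), torch.ones_like(S))
        l_rec = l_rec + (((S_hat - S) * Z) ** 2).sum()   # Equation (7)
    l_aff = 2.0 * torch.trace(U @ L_A @ U.t())           # Equation (10)
    l_str = 2.0 * torch.trace(U @ L_P @ U.t())           # Equation (11)
    l_reg = sum((p ** 2).sum() for p in params)          # 2-norm regularizer
    return l_rec + alpha * l_aff + beta * l_str + eta * l_reg
```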

0:     Input: the set of cascading context matrices $\{S^c\}_{c \in C}$, the cascading affinity matrix $A$, the structural proximity matrix $P$, and the parameters $\alpha$, $\beta$, and $\eta$.
0:     Output: node embeddings $U$.
1:  Initialize the parameters $W$, $\hat{W}$, $b$, and $\hat{b}$.
2:  repeat
3:     Compute $U$ according to Equations (3), (4), and (5).
4:     Compute the total loss $\mathcal{L}$ according to Equation (12).
5:     Update $W$, $\hat{W}$, $b$, and $\hat{b}$ according to Equations (13) to (16) using SGD.
6:  until  $\mathcal{L}$ converges.
Algorithm 1: Learning algorithm of DCE

4.3 Learning of DCE

The DCE model can be learned using Stochastic Gradient Descent (SGD), where the required gradients are given by the following equations:

$$ \frac{\partial \mathcal{L}}{\partial W_c^{(k)}} = \frac{\partial \mathcal{L}_{rec}}{\partial W_c^{(k)}} + \alpha \frac{\partial \mathcal{L}_{aff}}{\partial W_c^{(k)}} + \beta \frac{\partial \mathcal{L}_{str}}{\partial W_c^{(k)}} + \eta \frac{\partial \mathcal{L}_{reg}}{\partial W_c^{(k)}} $$ (13)
$$ \frac{\partial \mathcal{L}}{\partial \hat{W}_c^{(k)}} = \frac{\partial \mathcal{L}_{rec}}{\partial \hat{W}_c^{(k)}} + \alpha \frac{\partial \mathcal{L}_{aff}}{\partial \hat{W}_c^{(k)}} + \beta \frac{\partial \mathcal{L}_{str}}{\partial \hat{W}_c^{(k)}} + \eta \frac{\partial \mathcal{L}_{reg}}{\partial \hat{W}_c^{(k)}} $$ (14)
$$ \frac{\partial \mathcal{L}}{\partial b_c^{(k)}} = \frac{\partial \mathcal{L}_{rec}}{\partial b_c^{(k)}} + \alpha \frac{\partial \mathcal{L}_{aff}}{\partial b_c^{(k)}} + \beta \frac{\partial \mathcal{L}_{str}}{\partial b_c^{(k)}} + \eta \frac{\partial \mathcal{L}_{reg}}{\partial b_c^{(k)}} $$ (15)
$$ \frac{\partial \mathcal{L}}{\partial \hat{b}_c^{(k)}} = \frac{\partial \mathcal{L}_{rec}}{\partial \hat{b}_c^{(k)}} + \alpha \frac{\partial \mathcal{L}_{aff}}{\partial \hat{b}_c^{(k)}} + \beta \frac{\partial \mathcal{L}_{str}}{\partial \hat{b}_c^{(k)}} + \eta \frac{\partial \mathcal{L}_{reg}}{\partial \hat{b}_c^{(k)}} $$ (16)

where the partial derivatives on the right side of the equations can be computed using back-propagation.

The learning process is given in Algorithm 1. Note that in each iteration, the parameters are updated (Line 5) once the embeddings $u_i$ ($1 \le i \le N$) are concurrently generated (Line 3). Such a concurrent embedding scheme ensures that the cascading context is encoded into the embeddings while the cascading affinity and the structural proximity of nodes are preserved at the same time.
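Putting the pieces together, a minimal full-batch training loop for Algorithm 1, assuming the DCESketch model and dce_loss function from the sketches above and random stand-in inputs:

```python
import torch

model = DCESketch(num_nodes=4, num_cascades=3, hidden_dim=16, embed_dim=8)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

contexts = [torch.rand(4, 4) for _ in range(3)]  # stand-in S^c matrices
L_A, L_P = torch.eye(4), torch.eye(4)            # stand-in Laplacians

for epoch in range(100):                         # until the loss converges
    optimizer.zero_grad()
    U, recons = model(contexts)                  # Line 3: Equations (3)-(5)
    loss = dce_loss(contexts, recons, U.t(),     # Line 4: Equation (12)
                    L_A, L_P, list(model.parameters()))
    loss.backward()                              # gradients, Eqs. (13)-(16)
    optimizer.step()                             # Line 5: SGD update
```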

5 Experiments

In this section, we present the details of the experiments conducted on real-world datasets. The experiments include two parts: the tuning of the hyper-parameters and the verification of DCE. Particularly, to verify the effectiveness of DCE, we check whether the embeddings learned by DCE improve the performance of information cascade prediction on the real-world datasets.

5.1 Settings

5.1.1 Datasets

We verify the effectiveness of our method through experiments conducted on three real datasets, Digg, Twitter, and Weibo, which are described as follows:

Digg is a website where users can submit stories and vote for the stories they like Hogg and Lerman (2012). The dataset extracted from Digg contains 3,553 stories, 139,409 users, and 3,018,197 votes with timestamps. A vote for a story is treated as an infection of that story, and the votes for the same story constitute a cascade. In addition, a social link exists between two users if one of them is watching or is a fan of the other one.

Twitter is a social media network which offers a microblogging service Weng et al. (2013). The dataset extracted from Twitter comprises 510,795 users and 12,054,205 tweets with timestamps, where each tweet is associated with a hashtag. If a hashtag is adopted in one user's tweet, we consider it to have infected that user. The tweets sharing the same hashtag are treated as a cascade, and 1,345,913 cascades are contained in the dataset. In addition, the users are linked by their following relationships.

Weibo is a Twitter-like social network Zhang et al. (2013). The dataset extracted from Weibo contains 1,340,816 users and their 31,444,325 tweets with timestamps. A user's retweet is viewed as an infection of that user by the retweeted tweet. The retweets of the same tweet constitute a cascade, and the dataset contains 232,978 cascades of different tweets. The users in the Weibo network are also connected by following relationships.

The statistics of the datasets are summarized in Table 2. On each dataset, we randomly select 60% of the total cascades as the training set, 20% as the validation set, and the remaining 20% as the testing set.

Dataset #Nodes #Links  Avg. Degree #Cascades #Infections  Avg. Cascade Length
Digg 139,409 1,731,658 12.4 3,553 3,018,197 849.5
Twitter 510,795 14,273,311 27.9 1,345,913 12,054,205 9.0
Weibo 1,340,816 308,489,739 230.1 232,978 31,444,325 135.0
Table 2: The statistics of datasets

5.1.2 Baselines

In order to demonstrate the effectiveness of DCE, we compare it with the following baseline methods:

NetRate NetRate is a generative cascade model which exploits the infection times of nodes without assumptions on the network structure Rodriguez et al. (2011). It models the information diffusion process as discrete networks of continuous temporal processes occurring at different rates, and then infers the edges of the global diffusion network and estimates the transmission rate of each edge that best explains the observed data.

CDK CDK maps nodes participating in information cascades to a latent representation space using a heat diffusion process Bourigault et al. (2014). It treats learning diffusion as a ranking problem and learns heat diffusion kernels that define, for each node of the network, its likelihood of being reached by the diffusing content, given the initial source of diffusion. Here we adopt the without-content version of CDK, considering that the other baselines and our approach are not designed to deal with diffusion content.

Topo-LSTM Topo-LSTM uses directed acyclic graphs (DAGs) as the diffusion topology to explore the diffusion structure of cascades, rather than regarding a cascade as merely a sequence of nodes ordered by their infection timestamps Wang et al. (2017). It then feeds the dynamic DAGs into an LSTM-based model to generate topology-aware embeddings for nodes as outputs. The infection probability at each time step is computed according to the embeddings.

Embedded-IC Embedded-IC is a representation learning technique for inference of Independent Cascade (IC) model Bourigault et al. (2016). Embedded-IC can embed users in cascades into a latent space and infer the diffusion probability between users based on the relative positions of the users in the latent space.

DCE-C DCE-C is a special version of the proposed DCE, where the node collaborations of cascading affinity and structural proximity are removed while only the cascade collaboration of cascading contexts is kept.

5.2 Cascade Prediction

In this paper, we evaluate the learned embeddings by applying them to the task of information cascade prediction, the details of which are described as follows.

For a testing cascade $c$, given a set of seed nodes that are infected at the beginning, we predict the infection probabilities of the remaining nodes and their infection order. To be more specific, the size of the seed set is a fixed proportion of the total number of the nodes infected by $c$. Let $D_t$ be the set of nodes that are predicted before time step $t$; then the probability that a node $v_i$ will be infected at $t$ is

$$ P_t(v_i) = 1 - \prod_{v_j \in D_t} \big(1 - P(v_i \mid v_j)\big) $$ (17)

where $P(v_i \mid v_j)$ is the probability that $v_i$ is infected by $v_j$. Our idea of computing $P(v_i \mid v_j)$ is based on the similarity between the embeddings, which is defined as

$$ P(v_i \mid v_j) = \exp\big(-\lVert u_i - u_j \rVert_2^2\big) $$ (18)

where $u_i$ and $u_j$ are the embedding vectors of nodes $v_i$ and $v_j$, respectively, and the similarity is measured by the Euclidean distance. For each uninfected node, its infection probability can be computed according to Equation (17), and we can obtain a list $L^c$ of the nodes in descending order of their infection probabilities. Comparing $L^c$ with the ground truth $G^c$, we can evaluate the performance of the prediction with two metrics, Mean Average Precision (MAP) and order-Precision.
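A one-step sketch of this prediction procedure, using the complement-product aggregation of Equation (17) and the similarity-based probability of Equation (18) as reconstructed above (both functional forms are our assumptions), ranking all uninfected nodes against the seed set:

```python
import numpy as np

def predict_ranking(U, seeds):
    """U: (N, d) array of node embeddings; seeds: iterable of seed node
    indices. Returns uninfected node indices, most probable first."""
    seeds = set(seeds)
    probs = {}
    for i in range(U.shape[0]):
        if i in seeds:
            continue
        not_infected = 1.0
        for j in seeds:
            p_ij = np.exp(-np.sum((U[i] - U[j]) ** 2))  # Equation (18)
            not_infected *= 1.0 - p_ij
        probs[i] = 1.0 - not_infected                   # Equation (17)
    return sorted(probs, key=probs.get, reverse=True)
```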

As a metric originating from information retrieval, MAP evaluates the prediction of information cascades by taking the positions of nodes in the predicted list $L^c$ into consideration. We first define the top-$k$ precision of $L^c$ as the hit rate of the first $k$ nodes of $L^c$ over the ground truth $G^c$, i.e.,

$$ \mathrm{Prec@}k(c) = \frac{|L_k^c \cap G^c|}{k} $$ (19)

where $L_k^c$ is the set of the first $k$ nodes of $L^c$. Then, based on $\mathrm{Prec@}k$, we can define the average precision of $L^c$ as

$$ \mathrm{AP@}N(c) = \frac{1}{\min(|G^c|, N)} \sum_{v_i \in G^c \cap L_N^c} \mathrm{Prec@}r_i(c) $$ (20)

where $r_i$ denotes the rank of node $v_i$ in $L^c$ and $\mathrm{Prec@}r_i(c)$ is the top-$r_i$ precision of $L^c$. From Equations (19) and (20) we can see that too many nodes which occur in $G^c$ but rank low in $L^c$ will lead to a low $\mathrm{AP@}N$. Furthermore, we set the size $N$ of the predicted list in {100, 300, 500, 700, 900} to compute $\mathrm{AP@}N$ among the first $N$ nodes. Finally, $\mathrm{MAP@}N$ is defined as the average of $\mathrm{AP@}N$ over the testing set $C_{\text{test}}$, i.e.,

$$ \mathrm{MAP@}N = \frac{1}{|C_{\text{test}}|} \sum_{c \in C_{\text{test}}} \mathrm{AP@}N(c) $$ (21)
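A sketch of the MAP@N computation of Equations (19)-(21), assuming the min(|G^c|, N) normalization used in the reconstruction above:

```python
import numpy as np

def map_at_n(pred_lists, truth_sets, n=100):
    """pred_lists: ranked node lists, one per testing cascade;
    truth_sets: matching ground-truth infected node sets."""
    aps = []
    for pred, truth in zip(pred_lists, truth_sets):
        hits, precisions = 0, []
        for rank, node in enumerate(pred[:n], start=1):
            if node in truth:
                hits += 1
                precisions.append(hits / rank)  # Prec@rank, Equation (19)
        denom = min(len(truth), n)              # normalizer of Equation (20)
        aps.append(sum(precisions) / denom if denom else 0.0)
    return float(np.mean(aps))                  # Equation (21)
```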

To evaluate the prediction of infection order, we propose a new metric, order-Precision, which is defined as

$$ \text{order-Precision} = \frac{1}{|C_{\text{test}}|} \sum_{c \in C_{\text{test}}} \frac{1}{|G^c \cap L^c|} \sum_{v_i \in G^c \cap L^c} \frac{|B_i \cap \hat{B}_i|}{|B_i \cup \hat{B}_i|} $$ (22)

where $t_i$ is the true infection time of $v_i$ and $\hat{t}_i$ is the predicted one, and $B_i = \{v_j \mid t_j < t_i\}$ and $\hat{B}_i = \{v_j \mid \hat{t}_j < \hat{t}_i\}$ denote the sets of nodes infected before node $v_i$ in the ground-truth list and the predicted list, respectively. The idea of Equation (22) is that the more nodes whose relative orders are similar in $G^c$ and $L^c$, the higher the order-Precision of $L^c$. First, to evaluate the similarity of node $v_i$'s relative orders in $G^c$ and $L^c$, we consider a heuristic indicator, the number of nodes that are infected before node $v_i$ in both lists, i.e., $|B_i \cap \hat{B}_i|$; the larger this number is, the more similar the relative orders are. Then we obtain the relative order similarity for a single testing cascade by taking the average over all nodes shared by $G^c$ and $L^c$. Finally, the overall order-Precision is the average of the relative order similarities over all testing cascades in $C_{\text{test}}$.
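A sketch of the order-Precision of Equation (22), assuming the Jaccard-style normalization of $|B_i \cap \hat{B}_i|$ used in the reconstruction above:

```python
def order_precision(pred_lists, truth_lists):
    """pred_lists/truth_lists: node lists ordered by predicted / true
    infection time, one pair per testing cascade."""
    scores = []
    for pred, truth in zip(pred_lists, truth_lists):
        shared = [v for v in truth if v in set(pred)]
        sims = []
        for v in shared:
            B = set(truth[:truth.index(v)])    # infected before v (truth)
            B_hat = set(pred[:pred.index(v)])  # infected before v (predicted)
            union = B | B_hat
            sims.append(len(B & B_hat) / len(union) if union else 1.0)
        if sims:
            scores.append(sum(sims) / len(sims))
    return sum(scores) / len(scores) if scores else 0.0

print(order_precision([[0, 2, 1, 3]], [[0, 1, 2, 3]]))  # partial order match
```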

Figure 2: Tuning the parameters $\alpha$ and $\beta$ on Digg ((a) MAP, (b) order-Precision).

Figure 3: Tuning the parameters $\alpha$ and $\beta$ on Twitter ((a) MAP, (b) order-Precision).

Figure 4: Tuning the parameters $\alpha$ and $\beta$ on Weibo ((a) MAP, (b) order-Precision).

5.3 Hyper-parameter Tuning

In this subsection, we investigate the hyper-parameters $\alpha$ and $\beta$ in Equation (12) on the validation set, which control the influence of the cascading affinity and the structural proximity on the embedding learning, respectively.

For simplicity, we fix $\eta$ and adopt a grid search in the range of $[0, 1]$ with a step size of 0.2 to determine the optimal values of $\alpha$ and $\beta$. Fig. 2, Fig. 3, and Fig. 4 show the results of MAP and order-Precision over different combinations of $\alpha$ and $\beta$ on the three datasets. Through a comprehensive comparison, we can find that in most cases the MAPs and order-Precisions at non-zero $\alpha$ and $\beta$ are better than those at zero $\alpha$ and $\beta$. Taking Fig. 3(a) as an instance, the MAP value at (0.6, 0.8) is 0.8835, which is higher than 0.8703 at (0.0, 0.0). This verifies that appropriately applying cascading affinity and structural proximity as constraints can improve the learned embeddings for information cascade prediction. The combination of $\alpha$ and $\beta$ at which the sum of MAP and order-Precision is highest is chosen for the remaining experiments. Based on this criterion, we set ($\alpha$, $\beta$) to (0.1, 0.9) for Digg, (0.6, 0.8) for Twitter, and (0.8, 0.2) for Weibo.

5.4 Effectiveness

In this section, we analyze the experimental results for the tasks of infection prediction and infection order prediction, which are presented in Table 3 and Figure 5, respectively.

5.4.1 Infection Prediction

Dataset Method MAP@N (%)
@100 @300 @500 @700 @900
Digg NetRate 1.108 5.749 10.933 16.618 24.043
CDK 27.951 39.766 52.032 65.220 80.408
Embedded-IC 2.084 9.073 23.314 47.249 78.066
Topo-LSTM 2.444 17.535 25.812 42.779 69.534
DCE-C 32.356 55.308 63.546 66.823 86.879
DCE 47.497 72.952 76.694 84.250 91.362
Twitter NetRate 0.140 2.550 6.724 15.058 30.572
CDK 9.512 22.724 34.701 48.162 63.315
Embedded-IC 0.751 4.740 12.568 24.985 43.347
Topo-LSTM 0.665 5.084 13.681 26.083 42.050
DCE-C 15.983 27.846 37.427 53.617 65.858
DCE 16.376 29.773 40.690 56.301 69.863
Weibo NetRate 0.469 2.696 7.724 15.280 25.583
CDK 1.124 11.510 25.348 41.810 54.429
Embedded-IC 0.185 3.988 9.706 18.965 30.738
Topo-LSTM 0.005 0.268 2.204 7.084 19.774
DCE-C 3.466 28.526 52.084 62.684 71.339
DCE 10.506 30.986 53.555 64.533 72.746
Table 3: MAP@ on Digg, Twitter and Weibo datasets

Table 3 gives the MAPs of different methods for the infection prediction task, with the best one in each case boldfaced. From Table 3 we can make the following analyses:

  1. The proposed DCE-C and DCE always outperform all baselines, giving relative improvements over the best baseline ranging from 17.3% (Twitter, MAP@500) to 83.5% (Digg, MAP@300) across all datasets. We can also find that DCE achieves better results than DCE-C in every case, which proves that by using node collaborations as constraints, DCE can better characterize the relations between nodes, which are important in information cascades.

  2. The results show that, through collaboratively mapping the nodes into a latent space with a deep architecture, DCE can better capture the deep, non-linear features of nodes in information cascades than NetRate, which estimates infection probabilities directly with a shallow probabilistic model.

  3. In contrast with the embedding baselines CDK, Embedded-IC, and Topo-LSTM, DCE's deep collaborative embedding architecture can better preserve the cascading characteristics and structural properties of nodes, which are crucial for infection prediction. Unlike CDK, which unrealistically assumes that information diffusion is driven by the relations between the source node and the other nodes, in DCE all infected nodes are considered to have potential influence on the not-yet-infected ones, and the cascading context is employed to model their temporal relations. And as DCE makes no assumption about the underlying diffusion mechanism, it can better utilize the cascading contexts of nodes than Embedded-IC, which is based on the IC model. Compared with Topo-LSTM, which also adopts a deep model, DCE does not rely on knowledge of the underlying diffusion network, which is usually difficult to obtain.

5.4.2 Infection Order Prediction

Figure 5: order-Precision on the Digg, Twitter, and Weibo datasets ((a) Digg, (b) Twitter, (c) Weibo).

In Figure 5 the order-Precisions of different methods for infection order prediction are presented, based on which several analyses can be made as follows:

  1. We can see that the proposed DCE-C and DCE achieve the best performance on all three datasets. The reason is that with the proposed cascading context, DCE is able not only to better preserve the temporal relations, but also to better capture the infection order characteristics in information cascades than the baselines. DCE's superior results over DCE-C reveal that, even though cascading affinity and structural property do not indicate node infection orders explicitly, they can lead to further improvements when used as constraints in DCE.

  2. To be more specific, NetRate is incapable of capturing the infection order features with its shallow probabilistic model. While CDK exploits a heat diffusion kernel to formulate a ranking problem, in which infection orders are partially modeled, it cannot fully characterize node infection order features like the proposed cascading context in DCE. For Embedded-IC, node infection orders receive no attention in this IC-based model and certainly cannot be captured, which results in its poor performance. Although Topo-LSTM's adoption of diffusion topology can encode node infection orders to some extent, it still cannot get rid of its dependence on the underlying diffusion network, which cannot always be satisfied.

6 Related Work

In this section, we briefly review two lines of work related to our research: network embedding and information cascade prediction.

6.1 Network Embedding

With the wide employment of embedding methods in various machine learning tasks Mikolov et al. (2013a, b); Pota et al. (2019); Esposito et al. (2020); Deng et al. (2020), network embedding has gained more and more attention and applications Cui et al. (2017); Cai et al. (2018). Network embedding refers to assigning nodes in a network to low-dimensional representations while effectively preserving the network structure Cui et al. (2017). Intuitively, nodes can be represented by their corresponding row or column feature vectors in the adjacency matrix of a network. However, these vectors are often sparse and high-dimensional, which brings challenges to machine learning tasks. As a result, a set of traditional network embedding methods Tenenbaum et al. (2000); Roweis and Saul (2000); Belkin and Niyogi (2001); Cox and Cox (2001) were proposed mainly for dimension reduction. Nevertheless, these methods only work well on networks of relatively small sizes and suffer from high computation cost when coping with online social networks with huge numbers of nodes.

Recent works like DeepWalk Perozzi et al. (2014) and LINE Tang et al. (2015) learn low-dimensional representations for nodes through an optimization process instead of directly transforming the original feature vectors, whereby the scaling problem can also be well handled. Inspired by word2vec Mikolov et al. (2013a, b), DeepWalk treats the nodes in a network as the words in natural language and utilizes random walks to generate node sequences, based on which the node representations are learned following the procedure of word2vec. As a more generalized version of DeepWalk, node2vec Grover and Leskovec (2016) introduces biased random walks to control the generation of nodes' contexts more flexibly. LINE produces embeddings for nodes with the expectation of preserving both the first-order and second-order proximities of the network neighborhood structure. Under the influence of these researches, a collection of network embedding methods have been proposed for different scenarios. For instance, Dong et al. (2017) modifies DeepWalk for heterogeneous networks by introducing meta-path based random walks, and Xu et al. (2017) incorporates a harmonious embedding matrix to further embed the embeddings that only encode intra-network edges. As deep neural networks have shown remarkable effectiveness in many machine learning tasks, a series of works have also emerged which perform network embedding with a deep model. For example, Wang et al. (2016) adopts a semi-supervised deep autoencoder model to exploit the first-order and second-order proximities jointly to preserve the network structure. Liao et al. (2018) learns node representations by keeping both the structural proximity and attribute proximity with a designed multilayered perceptron framework. And in Chang et al. (2015), the researchers use a highly nonlinear multilayered embedding function to capture the complex interactions between the heterogeneous data in a network.

However, most of these network embedding methods Xie et al. (2019); Goyal et al. (2020); Huang et al. (2019) are not applicable to information cascade prediction. In our work, we employ an auto-encoder based collaborative embedding architecture to learn embeddings from nodes' cascading contexts with constraints.

6.2 Information Cascade Prediction

Information cascade phenomena have been widely investigated in the contexts of epidemiology, physical science, and social science, and the development of online social networks has greatly promoted related research Chou and Chen (2018); Li et al. (2018); Varshney et al. (2017). Most of the early works Kempe and Kleinberg (2003) analyze information cascades based on fixed models, the representatives of which are the Independent Cascade (IC) model Goldenberg et al. (2001) and the Linear Threshold (LT) model Granovetter (1978). The classic IC model treats the diffusion activity of information as cascades, while the LT model determines the infections of users according to thresholds on the influence pressure incoming from the neighborhood. Both of them can be unified into the same framework Kempe and Kleinberg (2003), and a series of extension works have been proposed Saito et al. (2008, 2010, 2009); Guille and Hacid (2012); Wang et al. (2012); Ding et al. (2019); Gursoy and Gunnec (2018). For example, Saito et al. (2010) extends the IC model to formulate a generative model that can take time delay into consideration. However, information diffusion processes are so complicated that we seldom exactly know the underlying mechanisms of how information diffuses. What's more, these works are often based on the assumption that the explicit paths along which information propagates between nodes are observable, which is difficult to satisfy.

A collection of methods has been proposed to infer the most probable links that can best explain the observed diffusion cascades without knowing the explicit paths. For instance, NetInf Gomez-Rodriguez et al. (2011) and ConNIe Myers and Leskovec (2010) use greedy algorithms to find a fixed number of links between users that maximize the likelihood of a set of observed diffusions under an IC-like diffusion hypothesis. A more general framework called NetRate Rodriguez et al. (2011) has also been proposed, which appears in our experiments as a baseline. NetRate models the information diffusion process as discrete networks of continuous temporal processes occurring at different rates, and then infers the edges of the global diffusion network and estimates the transmission rate of each edge that best explains the observed data Rodriguez et al. (2011). Further variants of this framework have also been proposed Gomez-Rodriguez and Leskovec (2013); Wang et al. (2014). However, most of these works still rely on the assumption that information diffusion follows a parametric model.

In recent years, a set of studies Bourigault et al. (2014); Gao et al. (2017); Bourigault et al. (2016); Wang et al. (2017); Qiu et al. (2018) which adopt network embedding techniques to handle information cascade prediction have been proposed. These methods usually embed nodes in a latent feature space, and the diffusion probabilities between nodes are then computed based on their positions in the space. CDK, proposed in Bourigault et al. (2014), treats information diffusion as a ranking problem and maps nodes to a latent space using a heat diffusion process. However, it assumes that the infection order of a cascade is driven by the relations between the source node and the other nodes, which is not realistic. Bourigault et al. (2016) follows the mechanism of the IC model to embed users in cascades into a latent space. Wang et al. (2017) feeds dynamic directed acyclic graphs into an LSTM-based model to generate topology-aware embeddings for nodes, which depends heavily on the network structure information. In contrast, our proposed method DCE collaboratively embeds the nodes with a deep architecture into a latent space, without requirement of the knowledge about the underlying diffusion mechanisms and the explicit paths of diffusions on the network structure.

7 Conclusions

In this paper, we address the problem of information cascade prediction in online social networks with network embedding techniques. We propose a novel model called Deep Collaborative Embedding (DCE) for information cascade prediction, which can learn embeddings for not only infection prediction but also infection order prediction in a cascade, without the requirement to know the underlying diffusion mechanisms and the diffusion network. We propose an auto-encoder based collaborative embedding architecture to generate embeddings that simultaneously preserve the node structural property and the node cascading characteristics. The results of extensive experiments conducted on real datasets verify the effectiveness of the proposed method.

Acknowledgment

This work is supported by National Natural Science Foundation of China under grant 61972270, and in part by NSF under grants III-1526499, III-1763325, III-1909323, CNS-1930941, and CNS-1626432.

References

  • M. Belkin and P. Niyogi (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, pp. 585–591.
  • S. Bourigault, C. Lagnier, S. Lamprier, L. Denoyer, and P. Gallinari (2014) Learning social network embeddings for predicting information diffusion. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining, pp. 393–402.
  • S. Bourigault, S. Lamprier, and P. Gallinari (2016) Representation learning for information diffusion through social networks: an embedded cascade model. In Proceedings of the 9th ACM International Conference on Web Search and Data Mining, pp. 573–582.
  • H. Cai, V. W. Zheng, and K. C. Chang (2018) A comprehensive survey of graph embedding: problems, techniques, and applications. IEEE Transactions on Knowledge and Data Engineering 30 (9), pp. 1616–1637.
  • S. Chang, W. Han, J. Tang, G. J. Qi, C. C. Aggarwal, and T. S. Huang (2015) Heterogeneous network embedding via deep architectures. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 119–128.
  • J. Cheng, L. Adamic, P. A. Dow, J. M. Kleinberg, and J. Leskovec (2014) Can cascades be predicted?. In Proceedings of the 23rd International Conference on World Wide Web, pp. 925–936.
  • C. Chou and M. Chen (2018) Learning multiple factors-aware diffusion models in social networks. IEEE Transactions on Knowledge and Data Engineering 30 (7), pp. 1268–1281.
  • M. A. A. Cox and T. F. Cox (2001) Multidimensional scaling. Journal of the Royal Statistical Society 46 (2), pp. 1050–1057.
  • P. Cui, X. Wang, J. Pei, and W. Zhu (2017) A survey on network embedding. IEEE Transactions on Knowledge and Data Engineering 31 (5), pp. 833–852.
  • T. Deng, D. Ye, R. Ma, H. Fujita, and L. Xiong (2020) Low-rank local tangent space embedding for subspace clustering. Information Sciences 508, pp. 1–21.
  • J. Ding, W. Sun, J. Wu, and Y. Guo (2019) Influence maximization based on the realistic independent cascade model. Knowledge-Based Systems, pp. 105265.
  • Y. Dong, N. V. Chawla, and A. Swami (2017) Metapath2vec: scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 135–144.
  • M. Esposito, E. Damiano, A. Minutolo, G. D. Pietro, and H. Fujita (2020) Hybrid query expansion using lexical resources and word embeddings for sentence retrieval in question answering. Information Sciences 514, pp. 88–105.
  • S. Gao, H. Pang, P. Gallinari, J. Guo, and N. Kato (2017) A novel embedding method for information diffusion prediction in social network big data. IEEE Transactions on Industrial Informatics 13 (4), pp. 2097–2105.
  • J. Goldenberg, B. Libai, and E. Muller (2001) Talk of the network: a complex systems look at the underlying process of word-of-mouth. Marketing Letters 12 (3), pp. 211–223.
  • M. Gomez-Rodriguez, J. Leskovec, and A. Krause (2011) Inferring networks of diffusion and influence. ACM Transactions on Knowledge Discovery from Data 5 (4), pp. 1019–1028.
  • M. Gomez-Rodriguez and J. Leskovec (2013) Modeling information propagation with survival theory. In Proceedings of the 30th International Conference on Machine Learning, pp. 666–674.
  • P. Goyal, S. R. Chhetri, and A. Canedo (2020) Dyngraph2vec: capturing network dynamics using dynamic graph representation learning. Knowledge-Based Systems 187, pp. 104816.
  • M. Granovetter (1978) Threshold models of collective behavior. American Journal of Sociology 83 (6), pp. 1420–1443.
  • A. Grover and J. Leskovec (2016) Node2vec: scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864.
  • A. Guille, H. Hacid, C. Favre, and D. A. Zighed (2013) Information diffusion in online social networks: a survey. ACM SIGMOD Record 42 (2), pp. 17–28.
  • A. Guille and H. Hacid (2012) A predictive model for the temporal dynamics of information diffusion in online social networks. In Proceedings of the 21st International Conference on World Wide Web, pp. 1145–1152.
  • F. Gursoy and D. Gunnec (2018) Influence maximization in social networks under deterministic linear threshold model. Knowledge-Based Systems 161, pp. 111–123.
  • T. Hogg and K. Lerman (2012) Social dynamics of Digg. EPJ Data Science 1 (1), pp. 1–26.
  • F. Huang, X. Zhang, J. Xu, C. Li, and Z. Li (2019) Network embedding by fusing multimodal contents and links. Knowledge-Based Systems 171, pp. 44–55.
  • D. Kempe and J. Kleinberg (2003) Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 137–146.
  • C. Li, J. Ma, X. Guo, and Q. Mei (2017) DeepCas: an end-to-end predictor of information cascades. In Proceedings of the 26th International Conference on World Wide Web, pp. 577–586.
  • Y. Li, J. Fan, Y. Wang, and K. Tan (2018) Influence maximization on social graphs: a survey. IEEE Transactions on Knowledge and Data Engineering 30 (10), pp. 1852–1871.
  • L. Liao, X. He, H. Zhang, and T. S. Chua (2018) Attributed social network embedding. IEEE Transactions on Knowledge and Data Engineering 30 (12), pp. 2257–2270.
  • S. Liu, Q. Qu, and S. Wang (2018) Heterogeneous anomaly detection in social diffusion with discriminative feature discovery. Information Sciences 439–440, pp. 1–18.
  • T. Mikolov, K. Chen, G. Corrado, and J. Dean (2013a) Efficient estimation of word representations in vector space. Computer Science.
  • T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean (2013b) Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems, pp. 3111–3119.
  • S. A. Myers and J. Leskovec (2010) On the convexity of latent social network inference. In Proceedings of the 23rd International Conference on Neural Information Processing Systems, pp. 1741–1749.
  • B. Perozzi, R. Alrfou, and S. Skiena (2014) DeepWalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710.
  • M. Pota, F. Marulli, M. Esposito, G. D. Pietro, and H. Fujita (2019) Multilingual POS tagging by a composite deep architecture based on character-level features and on-the-fly enriched word embeddings. Knowledge-Based Systems 164, pp. 309–323.
  • J. Qiu, J. Tang, H. Ma, Y. Dong, K. Wang, and J. Tang (2018) DeepInf: social influence prediction with deep learning. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2110–2119.
  • J. Radcliffe (1977) The mathematical theory of infectious diseases and its applications. Journal of the Royal Statistical Society Series C 26 (1), pp. 85–87.
  • M. G. Rodriguez, D. Balduzzi, and B. Scholkopf (2011) Uncovering the temporal dynamics of diffusion networks. In Proceedings of the 28th International Conference on Machine Learning, pp. 561–568.
  • S. T. Roweis and L. K. Saul (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290 (5500), pp. 2323–2326.
  • K. Saito, M. Kimura, K. Ohara, and H. Motoda (2010) Generative models of information diffusion with asynchronous time delay. In Proceedings of the 2nd Asian Conference on Machine Learning, Vol. 13, pp. 193–208.
  • K. Saito, R. Nakano, and M. Kimura (2008) Prediction of information diffusion probabilities for independent cascade model. In Proceedings of the 12th International Conference on Knowledge-Based Intelligent Information and Engineering Systems, pp. 67–75.
  • K. Saito, M. Kimura, K. Ohara, and H. Motoda (2009) Learning continuous-time information diffusion model for social behavioral data analysis. In Asian Conference on Machine Learning: Advances in Machine Learning, pp. 322–337.
  • G. V. Steeg and A. Galstyan (2013) Information-theoretic measures of influence based on content dynamics. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining, pp. 3–12.
  • Y. Sun, C. Qian, N. Yang, and P. S. Yu (2017) Collaborative inference of coexisting information diffusions. In 2017 IEEE International Conference on Data Mining (ICDM), pp. 1093–1098.
  • J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei (2015) LINE: large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077.
  • J. B. Tenenbaum, V. D. Silva, and J. C. Langford (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290 (5500), pp. 2319–2323.
  • D. Varshney, S. Kumar, and V. Gupta (2017) Predicting information diffusion probabilities in social networks: a Bayesian networks based approach. Knowledge-Based Systems 133, pp. 66–76.
  • D. Wang, P. Cui, and W. Zhu (2016) Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1225–1234.
  • J. Wang, V. W. Zheng, Z. Liu, and C. C. Chang (2017) Topological recurrent neural network for diffusion prediction. In 2017 IEEE International Conference on Data Mining (ICDM), pp. 475–484.
  • L. Wang, S. Ermon, and J. E. Hopcroft (2012) Feature-enhanced probabilistic models for diffusion network inference. In European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD '12, pp. 499–514.
  • S. Wang, X. Hu, P. S. Yu, and Z. Li (2014) MMRate: inferring multi-aspect diffusion networks with multi-pattern cascades. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1246–1255.
  • L. Weng, F. Menczer, and Y. Y. Ahn (2013) Virality prediction and community structure in social networks. Scientific Reports 3 (8), pp. 2522.
  • Y. Xie, M. Gong, A. K. Qin, Z. Tang, and X. Fan (2019) TPNE: topology preserving network embedding. Information Sciences 504, pp. 20–31.
  • L. Xu, X. Wei, J. Cao, and P. S. Yu (2017) Embedding of embedding (EOE): joint embedding for coupled heterogeneous networks. In Proceedings of the 10th ACM International Conference on Web Search and Data Mining, pp. 741–749.
  • L. Yang, Z. Li, and A. Giua (2020) Containment of rumor spread in complex social networks. Information Sciences 506, pp. 113–130.
  • J. Zhang, C. Xia, C. Zhang, L. Cui, Y. Fu, and P. S. Yu (2017) BL-MNE: emerging heterogeneous social network embedding through broad learning with aligned autoencoder. In 2017 IEEE International Conference on Data Mining (ICDM), pp. 605–614.
  • J. Zhang, B. Liu, J. Tang, T. Chen, and J. Li (2013) Social influence locality for modeling retweeting behaviors. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, pp. 2761–2767.
  • X. Zhang, Y. Su, S. Qu, S. Xie, B. Fang, and P. S. Yu (2019) IAD: interaction-aware diffusion framework in social networks. IEEE Transactions on Knowledge and Data Engineering 31 (7), pp. 1341–1354.