Deep Learning Approach on Information Diffusion in Heterogeneous Networks

02/23/2019 ∙ by Soheila Molaei, et al. ∙ 0

Many real-world knowledge-based networked systems involve multiple types of interacting entities and can be regarded as heterogeneous networks; examples include human connections and biological evolution. One of the main problems in such networks is predicting information diffusion, such as the shape, growth, and size of future social events and evolutions. A variety of works address this topic, mainly with threshold-based approaches, but they suffer from a local view of the network and from sensitivity to the threshold parameters. In this paper, information diffusion is modeled by learning a latent representation of the heterogeneous network and encoding it in a deep learning model. To this end, we propose a novel meta-path representation learning approach, Heterogeneous Deep Diffusion (HDD), which exploits meta-paths as the main entities in networks. First, the functional heterogeneous structures of the network are learned as a continuous latent representation by traversing meta-paths, with the aim of a global end-to-end view. Then, well-known deep learning architectures are applied to the generated features to predict diffusion processes in the network. The proposed approach can be applied to different information diffusion tasks, such as topic diffusion and cascade prediction. We demonstrate it on benchmark network datasets using well-known evaluation measures. The experimental results show that our approach outperforms earlier state-of-the-art methods.




1 Introduction

Information diffusion is one of the most widely studied dynamical processes on networks. Information such as news, innovations, and viruses starts from a set of seed nodes and propagates throughout the network bakshy2012role ; zhang2016dynamics . Information diffusion has been investigated in a wide range of fields including health care Hu2017 ; Green2002 , complex networks Kirst2016 ; Zhang2016 and social networks Dhamal2016 ; Margaris2016 . One of the most important tasks on networked systems is understanding, modeling, and predicting rapid events and evolutions within the network. This is mainly motivated by the well-known fact that discovering the structure of a network helps to predict the patterns of social events, such as their shape, size, and growth, which is known as information diffusion li2017deepcas . Many researchers have investigated techniques for modeling information diffusion on homogeneous and heterogeneous networks, such as Gui2014 ; Yang2015 .

Formally, an information network, denoted as a graph with a set of nodes and a set of edges, is homogeneous if and only if all nodes and edges are of the same type, and heterogeneous if different types of nodes and relations are involved Sun2013a . Various studies have been conducted on homogeneous networks, including semantic parsing bordes2012joint and link prediction Bordes2013 ; Socher2013 ; Wang2014 . Recently, heterogeneous networks have become an attractive research field because they are a more natural assumption for many real-world phenomena. Watts Watts2002 studied the role of threshold values and network structure in information diffusion. Information diffusion, as a particular topic of interest in this regard, has been studied in Gui2014 using different meta-paths in heterogeneous networks. In Gui2014 , MLTM-R is proposed, which distinguishes how strongly different types of meta-paths pass information around. In this method, PathSim was used as the weight between each pair of nodes, and predictions were made through these weights Kuck2015 ; shang2016meta .
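
To make the PathSim weight concrete, the following is a minimal sketch for the symmetric APA meta-path. It assumes the commonly used PathSim definition, 2·|paths x→y| / (|paths x→x| + |paths y→y|); the toy `author_papers` data and the helper names `apa_path_counts` and `pathsim` are hypothetical and are not taken from MLTM-R.

```python
from collections import defaultdict

def apa_path_counts(author_papers):
    """Count APA path instances between every ordered pair of authors.

    author_papers: dict mapping author -> set of paper ids (toy input).
    The count for (a, b) is the number of papers a and b share; the count
    for (a, a) is the number of papers a wrote.
    """
    counts = defaultdict(int)
    for a, papers_a in author_papers.items():
        for b, papers_b in author_papers.items():
            counts[(a, b)] = len(papers_a & papers_b)
    return counts

def pathsim(counts, x, y):
    """PathSim for a symmetric meta-path: 2*|x->y| / (|x->x| + |y->y|)."""
    denom = counts[(x, x)] + counts[(y, y)]
    return 2.0 * counts[(x, y)] / denom if denom else 0.0

# Hypothetical toy data: three authors and four papers.
author_papers = {
    "alice": {"p1", "p2"},
    "bob": {"p2", "p3"},
    "carol": {"p4"},
}
counts = apa_path_counts(author_papers)
sim_ab = pathsim(counts, "alice", "bob")    # 2*1 / (2+2) = 0.5
sim_ac = pathsim(counts, "alice", "carol")  # no shared paper -> 0.0
```

The score is normalized by each author's own path count, so a prolific author is not automatically similar to everyone.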

Recently, much attention has been focused on deep learning in heterogeneous networks audebert2017fusion ; liu2016deep . In wu2017coupled , a deep learning framework called coupled deep learning (CDL) is investigated to address the VIS-NIR heterogeneous matching problem via the topic diffusion on networks. Tang et al. tang2015line proposed the LINE algorithm to learn embeddings; it traverses all edge types and samples one edge at a time for each edge type. Chang et al. chang2015heterogeneous proposed a deep architecture for embedding a Heterogeneous Information Network (HIN). More recently, dong2017metapath2vec proposed a new HIN algorithm that captures the semantics in HINs through meta-paths. With this in mind, we point out the strengths and weaknesses of the existing approaches to topic diffusion in heterogeneous networks. Our motivation is to apply deep learning to information diffusion tasks, such as topic diffusion and cascade prediction, to alleviate the problems of earlier methods. Most works on topic diffusion suffer from local similarity considerations and single node-based embedding Gui2014 ; in contrast, our latent representation learning approach takes a global view of activation prediction.

Here, we propose a novel latent representation learning approach on heterogeneous networks based on different types of meta-paths. Furthermore, the proposed approach is used with different deep neural network architectures to predict information diffusion in an end-to-end framework. The experimental results demonstrate the strength of the proposed approach compared with earlier works. We evaluate the performance of the proposed method on real-world information graphs, namely DBLP, PubMed and ACM. HDD is compared with multiple strong baselines, including feature-based methods node2vec grover2016node2vec , DeepCas li2017deepcas and MLTM-R Gui2014 . HDD significantly improves the results over these baselines. Our representations are general and can be used on any heterogeneous network for any diffusion process. To summarize, our work makes the following contributions:

  • Propose a novel representation on heterogeneous networks to create the vector of features from the meta-paths.

  • Investigate the proposed representation aligned with different types of meta-paths.

  • Employ the deep learning based architectures on predicting the topic diffusion along the heterogeneous networks.

  • Apply the proposed algorithm on different diffusion processes including topic diffusion and information cascade.

  • Evaluate the performance of the proposed approach versus the earlier methods through the well-known network datasets.

The remainder of this paper is organized as follows. In Section 2 related works are reviewed on the heterogeneous networks along with a brief motivation for the idea. Section 3 describes the proposed deep learning framework of information diffusion for heterogeneous networks. Section 4 is devoted to the experimental settings and results on the benchmark real networks. Finally, we conclude the paper in Section 5 with some suggestions for further works.

2 Literature Review

Recently, a multitude of works have exploited heterogeneous networks to uncover structural patterns by considering the rich side information on the different node objects and edge attributes in these networks shi2017survey . Sentiment classification of product reviews using heterogeneous networks of users, products, and words was addressed by Zhou et al. Zhou2007 . In this regard, Zhou et al. Zhou2007 proposed a co-ranking method which classifies the authors and documents separately based on random walks. Angelova et al. Angelova2012 presented a new classification method for mining heterogeneous information networks through their decomposition into multiple homogeneous ones. The idea of citation recommendation using heterogeneous networks was proposed by Liu et al. Liu2014 . Information diffusion has recently been studied in such networks. It falls mainly into two categories, topic diffusion and cascade prediction, which are described as follows:

2.1 Topic Diffusion

Recently, a variety of works on topic diffusion, a primary task in analyzing heterogeneous networks, have been applied in domains such as medicine and public health. Several works considered epidemic modeling in heterogeneous networks, such as infections spreading on population systems Moreno2002 , a modified SIR model Yang2007 , and a survey salehi2015spreading . Wang and Dai Wang2008 addressed virus spreading in heterogeneous networks by applying an epidemic threshold to the well-known SIS model. Yang2015 showed that leveraging a heterogeneous network among people yields more resistance against the epidemic spread of a virus. Epidemic spreading is an important issue that has also been considered in other kinds of networks, such as time-varying networks nadini2017epidemic and adaptive ones demirel2017dynamics . Nadini et al. nadini2017epidemic used SIR and SIS models and investigated the effects of modular and temporal connectivity patterns on epidemic spreading.

Various techniques have been proposed for topic diffusion in heterogeneous networks. The degree distribution is used in Sermpezis2013 to model information diffusion under the assumption that diffusion between two nodes occurs at random times. Zhou and Liu zhou2013social presented a social influence based clustering framework. Molaei et al. molaei2018information predicted the topic diffusion process in heterogeneous networks by considering the interactions of different meta-paths. A heterogeneous network based model was proposed for new product diffusion in a two-stage framework Li2014 . In Boccaletti2014 , the concept of heterogeneous networks was used as an alternative definition of infrastructure networks to explain the diffusion process.

2.2 Cascade Prediction

There are many cascade prediction methods, originating from different research areas, that treat the task as either a classification or a regression problem. Recently, methods based on representation learning have emerged with impressive predictive power. Graph representation learning methods have largely been based on the popular skip-gram model mikolov2013efficient ; cheng2006n , originally introduced for learning vector representations of words in text. In particular, DeepWalk perozzi2014deepwalk used this approach to embed nodes such that the co-occurrence frequencies of node pairs in short random walks are preserved. Node2vec grover2016node2vec introduced hyperparameters to DeepWalk that tune the depth and breadth of the random walks.
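
As a rough illustration of the random-walk sampling these methods rely on, the sketch below generates uniform, fixed-length walks in the style of DeepWalk. It is not node2vec's biased walk (the return and in-out hyperparameters are omitted), and the toy `adjacency` graph and the `random_walks` helper are hypothetical.

```python
import random

def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Sample uniform random walks starting from every node.

    adjacency: dict mapping node -> list of neighbor nodes.
    Each walk stops early only if it reaches a node without neighbors.
    """
    rng = random.Random(seed)
    walks = []
    for start in sorted(adjacency):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Hypothetical toy graph with four nodes.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
walks = random_walks(adjacency, walk_length=5, walks_per_node=2)
```

In a skip-gram pipeline, each walk would then be treated like a sentence whose "words" are node identifiers.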

Lately, some deep learning based models have shown good performance. These models learn to predict information cascades in an end-to-end manner. DeepCas li2017deepcas uses random walks to sample paths from different snapshots of the graph, then uses Gated Recurrent Units (GRU) and an attention mechanism to extract features from the random walk paths to predict information cascades. DeepHawkes cao2017deephawkes uses a GRU to encode each cascade path and employs weighted average pooling based on the time decay effect to combine features from all cascade paths.

While there are some recent works on cascade prediction with a deep learning approach li2017deepcas ; cao2017deephawkes , our proposed framework differs from them in the type of networks used, the applied methods, the input parameters, and its ability to generalize to different information diffusion processes. We focus on heterogeneous networks, and we add weights to the input by considering different meta-paths. The deep learning approach exploits representation learning on meta-paths to generate the required features.

3 Proposed Method

We present a general framework, HDD (Heterogeneous Deep Diffusion), for topic diffusion in heterogeneous networks through a deep learning approach. The flow graph of the overall structure of the proposed approach is shown in Figure 1.

Figure 1: The overall procedure of the proposed method

Initially, a graph whose nodes are labeled as active or inactive is given. In the representation stage, continuous latent features are learned based on different meta-path definitions, so each node is represented as a continuous vector in a sequence over different meta-paths. We then apply a variety of deep neural network architectures to these sequences of node representations to predict subsequent activation or inactivation in the network. In the following, we describe the main information diffusion tasks in our formulation and then present our approach.

3.1 Problem Statement

The two main tasks of information diffusion, topic diffusion and cascade prediction, are described in the heterogeneous network setting.

3.1.1 Topic Diffusion

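In the network schema $T_G = (\mathcal{A}, \mathcal{R})$ of a network $G$, a meta-path $\mathcal{P}$ is defined, where $\mathcal{A}$ and $\mathcal{R}$ represent the types of nodes and edges. It is written as $A_1 \xrightarrow{R_1} A_2 \xrightarrow{R_2} \cdots \xrightarrow{R_l} A_{l+1}$.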

Figure 2: DBLP network

A scenario of a heterogeneous collaboration network is shown in Figure 2, where an edge between two authors denotes a common publication. There may be multiple edges between two nodes for authors with multiple joint publications. The meta-path “APA” denotes two authors (A) who coauthored a paper (P), and in the same way, “AVA” denotes authors (A) who published papers in the same conference (V).
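
The two meta-paths can be extracted from raw bibliographic records roughly as follows. This is only a sketch: the record schema (`authors`, `venue`) and the helper `metapath_edges` are assumptions, and edge multiplicity (several joint papers between the same two authors) is collapsed into a single pair for brevity.

```python
from collections import defaultdict
from itertools import combinations

def metapath_edges(papers):
    """Return the author pairs connected by the APA and AVA meta-paths.

    papers: list of dicts with 'authors' (list of names) and 'venue' (str).
    Pairs are stored as sorted tuples, so each undirected pair appears once.
    """
    apa = set()
    authors_by_venue = defaultdict(set)
    for paper in papers:
        authors = sorted(set(paper["authors"]))
        apa.update(combinations(authors, 2))           # coauthors of one paper
        authors_by_venue[paper["venue"]].update(authors)
    ava = set()
    for authors in authors_by_venue.values():
        ava.update(combinations(sorted(authors), 2))   # same venue
    return apa, ava

# Hypothetical toy records.
papers = [
    {"authors": ["alice", "bob"], "venue": "KDD"},
    {"authors": ["carol"], "venue": "KDD"},
    {"authors": ["dave", "alice"], "venue": "ICML"},
]
apa, ava = metapath_edges(papers)
```

Here bob and carol never coauthored a paper, so they are linked only through AVA, which shows how different meta-paths expose different relations between the same nodes.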

Topic diffusion aims to solve the problem of who will write a paper on a particular topic at time $t+1$, given that a node wrote a paper on the same topic at time $t$.

Our aim is to construct a universal, deep learning based framework that uses the meta-path objects (instances) and reduces the computational burden of learning in prior works.

3.1.2 Cascade Prediction

For constructing cascades in the graph , citation relationship between nodes and is required. The cascade path is defined according to the reference of its citing papers. Suppose we have topics denoted by . For each topic , we use a cascade to record the topic diffusion process of meaning that the author cites paper of the author and the time elapsed between the citation process is . Figure 3 represents the cascades which are denoted by , , , , . In our framework, the aim is to predict subsequent cascades of the specific topic at time from the current -th snapshot graph.

Figure 3: Cascade definition

3.2 Heterogeneous Deep Diffusion

To model topic diffusion in the heterogeneous networks, we introduce Heterogeneous Deep Diffusion(HDD) method by exploiting the meta-paths. In this part, meta-path graphs are extracted from the raw datasets to represent nodes in a heterogeneous network . A snapshot of graph at time is characterized by a meta-path graph , where is a subset of nodes in that have adopted meta-path at time . In our setting, graph snapshots are considered on different timestamps along with a variety of meta-path definitions.

3.2.1 Author Embedding

Each node is represented as a vector, where

is the total number of authors. All users share an embedding tensor

, which is the number of timestamps as depicted in Figure 1.

3.2.2 Meta-path Encoding

To represent the information flow, we use a variety of neural network architectures such as LSTM, CNN, and CNN-LSTM. The key factor here is that all meta-paths are considered. In our framework, the author embedding tensors of all meta-paths are merged into a single tensor at each timestamp, rather than relying on a single ad hoc relationship. Embedding the meta-paths in a single tensor strengthens the proposed approach from the following perspectives:

  1. Loss of information: In the real world, two authors may be connected by several different relationships; considering all of them rather than a single relation prevents the loss of significant information.

  2. Time sequences: Different snapshots of the meta-path graphs yield distinct author embedding cascades over time.
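
One possible way to merge several meta-paths into a single per-author sequence is sketched below. It assumes that each snapshot stores, for every meta-path, the set of connected author pairs, and that an author's feature vector at a timestamp is the concatenation of their 0/1 adjacency rows under each meta-path. This layout and the helper `author_sequence` are illustrative assumptions, not the exact tensor construction used in HDD.

```python
def author_sequence(snapshots, metapaths, author, authors):
    """Build one feature vector per timestamp for a single author.

    snapshots: list (one entry per timestamp) of dicts mapping a meta-path
               name to a set of (u, v) author pairs.
    The vector at each timestamp concatenates the author's binary adjacency
    row under every meta-path, in the order given by `metapaths`.
    """
    sequence = []
    for snapshot in snapshots:
        row = []
        for metapath in metapaths:
            edges = snapshot.get(metapath, set())
            for other in authors:
                linked = (author, other) in edges or (other, author) in edges
                row.append(1 if linked else 0)
        sequence.append(row)
    return sequence

# Hypothetical toy input: two timestamps, two meta-paths, three authors.
authors = ["alice", "bob", "carol"]
metapaths = ["APA", "AVA"]
snapshots = [
    {"APA": {("alice", "bob")}, "AVA": set()},
    {"APA": {("alice", "bob")}, "AVA": {("alice", "carol")}},
]
seq = author_sequence(snapshots, metapaths, "alice", authors)
# seq[0] = [0, 1, 0, 0, 0, 0]; seq[1] = [0, 1, 0, 0, 0, 1]
```

A sequence model can then consume `seq` one timestamp at a time, so both the relation types and their evolution are visible to it.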

We encode all meta-paths for each author by feeding the embedding tensor as input to an LSTM hochreiter1997long . At each time step, the LSTM can choose to read from, write to, or reset the cell using gating mechanisms. It includes four gates, generally denoted as $i$, $o$, $f$, and $\tilde{c}$, corresponding to the input, output, forget, and new memory gate. $W_{*}$ and $U_{*}$ correspond to the weights of the input, $x_t$, and the hidden state, $h_{t-1}$, where $*$ can be the input gate, output gate, forget gate, or memory gate, depending on the activation being calculated. Below, $\sigma$ denotes the sigmoid function and $\odot$ the element-wise product.

Figure 4: Long Short-term Memory Cell

Intuitively, the gates control the flow of information into and out of the cell at each time step. In the first step, the forget gate, $f_t$, decides how much of the previous state to take into account.

$$f_t = \sigma(W_f x_t + U_f h_{t-1})$$

Next, the input gate, $i_t$, decides which values to update, and a new memory generation stage, $\tilde{c}_t$, creates a vector of new candidate values that could be added to the state.

$$i_t = \sigma(W_i x_t + U_i h_{t-1}), \qquad \tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1})$$

Now the cell state is updated: the forget gate scales the old cell state, $c_{t-1}$, and the input gate scales the new memory cell, $\tilde{c}_t$.

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

Finally, the output gate, $o_t$, decides what parts of the cell state to output; the cell state is passed through a tanh layer and filtered by the output gate to produce the desired parts of it.

$$o_t = \sigma(W_o x_t + U_o h_{t-1}), \qquad h_t = o_t \odot \tanh(c_t)$$

We built the LSTM model from an embedding layer (of dimensionality 512), an LSTM layer (with 512 network units for each gate) with dropout regularization, and finally, a sigmoid activation function applied to the output of the LSTM.
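
For readers who want to trace the gating arithmetic, here is a minimal single-step LSTM in plain Python. It assumes the standard formulation without bias terms (forget, input, and output gates with sigmoid activations, a tanh candidate memory, and an element-wise cell update); the helper names and the tiny dimensions are hypothetical, and this is not the 512-unit model described above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, x, U, h):
    """Compute W x + U h for list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        for W_row, U_row in zip(W, U)
    ]

def gate(params, name, x, h, activation):
    """Apply one gate: activation(W_name x + U_name h)."""
    W, U = params[name]
    return [activation(z) for z in affine(W, x, U, h)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params maps 'f', 'i', 'o', 'c' to (W, U) pairs."""
    f = gate(params, "f", x, h_prev, sigmoid)        # forget gate
    i = gate(params, "i", x, h_prev, sigmoid)        # input gate
    o = gate(params, "o", x, h_prev, sigmoid)        # output gate
    c_new = gate(params, "c", x, h_prev, math.tanh)  # candidate memory
    c = [ft * cp + it * cn for ft, cp, it, cn in zip(f, c_prev, i, c_new)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

# With all-zero weights every gate equals 0.5 and the candidate memory is 0,
# so the new cell state is exactly half of the previous one.
hidden, inputs = 2, 3
params = {name: (zeros(hidden, inputs), zeros(hidden, hidden)) for name in "fioc"}
h, c = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -1.0], params)
# c == [0.5, -0.5]
```

In practice a deep learning library would provide this step, together with biases, batching, and dropout; the sketch only exposes the order of the gate computations.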

Convolutional neural networks (CNNs) lecun1995convolutional are composed of an input layer, one or more hidden layers, and an output layer. The hidden layers are a combination of convolutional layers, ReLU layers, pooling layers, and fully-connected layers. Convolutional layers compute the output of neurons that are connected to only a small region of the input, each computing a dot product between its weights and the small region it is connected to in the input volume. ReLU layers apply an element-wise activation function that zeros out negative inputs and is represented as $\max(0, x)$. Pooling layers perform a down-sampling operation along the spatial dimensions (width and height) of the input. Fully-connected layers have neurons that are functionally similar to those of convolutional layers (they compute dot products) but differ in that they are connected to all activations in the previous layer. The last fully-connected layer is called the output layer, and it computes the class scores. Stacking these layers forms a full CNN architecture, as shown in Figure 1.
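
The three basic operations can be traced on a one-dimensional toy signal as follows. This is a sketch under simplifying assumptions: a single filter, no padding, stride 1 for the convolution (implemented as cross-correlation, as is usual in deep learning), and non-overlapping max pooling; the function names are hypothetical.

```python
def conv1d(signal, kernel):
    """Slide the kernel over the signal and take a dot product at each position."""
    k = len(kernel)
    return [
        sum(signal[start + j] * kernel[j] for j in range(k))
        for start in range(len(signal) - k + 1)
    ]

def relu(values):
    """Element-wise max(0, x): negative inputs are zeroed out."""
    return [max(0.0, v) for v in values]

def max_pool1d(values, size):
    """Down-sample by taking the maximum of each non-overlapping window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

signal = [1.0, 3.0, 2.0, 0.0, 4.0, 1.0]
kernel = [1.0, -1.0]
convolved = conv1d(signal, kernel)   # [-2.0, 1.0, 2.0, -4.0, 3.0]
activated = relu(convolved)          # [0.0, 1.0, 2.0, 0.0, 3.0]
features = max_pool1d(activated, 2)  # [1.0, 2.0]
```

The last value of `activated` is dropped by the pooling step because it does not fill a complete window of size 2.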

CNN-LSTM combines a CNN for feature extraction with an LSTM for summarizing the extracted features. Using an LSTM to aggregate the features enables the network to take the global structure into account, while local features are extracted by the CNN layers, as shown in Figure 5. These features can be used in various heterogeneous network mining tasks, such as clustering and classification; we use them for prediction. Our CNN-LSTM model uses an embedding layer (of dimensionality 1000), a one-dimensional CNN layer of convolutions interspersed with max pooling, an LSTM layer (with 1000 network units for each gate), and finally, a sigmoid activation function applied to the output of the LSTM.

Figure 5: CNN-LSTM Architecture

The multilayer perceptron (MLP) is one of the basic deep neural network methods. It is mentioned here only to allow a more detailed comparison with the other methods introduced above ordonez2016deep .

4 Experiments

We evaluate the proposed approach on benchmark datasets with standard evaluation measures, which are introduced in the following subsections. Then, the performance of the proposed approach is compared with earlier well-known methods. For both topic diffusion and cascade prediction, we first select a topic such as “data mining”. The authors who have published a paper on that topic are considered active. We then predict the activation of the remaining inactive nodes at the next timestamps with the proposed methods.
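
The labeling step described above can be sketched as follows. The record schema (`authors`, `year`, `topics`) and the helpers `active_authors` and `newly_active` are hypothetical; the sketch only illustrates how seed nodes and prediction targets could be derived from timestamped publications.

```python
def active_authors(papers, topic, up_to_year):
    """Authors who published on `topic` in any year up to and including `up_to_year`."""
    active = set()
    for paper in papers:
        if topic in paper["topics"] and paper["year"] <= up_to_year:
            active.update(paper["authors"])
    return active

def newly_active(papers, topic, year):
    """Authors who become active exactly in `year` (the prediction targets)."""
    return active_authors(papers, topic, year) - active_authors(papers, topic, year - 1)

# Hypothetical toy records.
papers = [
    {"authors": ["alice", "bob"], "year": 2001, "topics": {"data mining"}},
    {"authors": ["carol"], "year": 2002, "topics": {"databases"}},
    {"authors": ["bob", "dave"], "year": 2002, "topics": {"data mining"}},
]
seeds = active_authors(papers, "data mining", 2001)  # {"alice", "bob"}
targets = newly_active(papers, "data mining", 2002)  # {"dave"}
```

Carol stays inactive because her 2002 paper is on a different topic, and bob is not a target because he was already active in 2001.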

4.1 Datasets

DBLP, PubMed, and ACM are benchmark real-world network datasets used in our experimental studies.

  • DBLP: This dataset is a computer science bibliography covering authors in the main conferences and publications Dataset . Objects in this network are authors. Different meta-paths such as APA (Author-Paper-Author), ACA (Author-Conference-Author), APAPA (Author-Paper-Author-Paper-Author), and ACACA (Author-Conference-Author-Conference-Author) are considered. Different topics are extracted from this dataset, and information diffusion about a specific topic is investigated. This dataset contains information from 1954 to 2016.

  • PubMed: This dataset is a medical science bibliography covering authors in the main conferences and publications in this domain DataRep . In this network, objects are authors, and the meta-paths APA and APAPA are used. The dataset contains bibliographic information from 1994 to 2003.

  • ACM: This dataset consists of the bibliographic information of publications of the Association for Computing Machinery (ACM) Dataset . In this network, we used APA and APAPA meta-paths. The dataset contains information from 1959 to 2009.

A summary of the datasets is given in Table 1.

Dataset   Authors   Papers
DBLP      215222    104940
PubMed    7140093   4271136
ACM       468114    649526
Table 1: The main properties of the datasets used

4.2 Evaluation Measures

For topic diffusion, we use the Precision and Recall criteria to assess performance. These measures are defined as

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

where True Positive (TP) is the number of active nodes that are correctly tagged as active by the algorithm, True Negative (TN) is the number of inactive nodes that are correctly tagged as inactive, False Positive (FP) is the number of inactive nodes that are falsely tagged as active, and False Negative (FN) is the number of active nodes that are falsely tagged as inactive.
Furthermore, the AUPR (Area Under the Precision-Recall) curve, obtained by plotting Precision against Recall, is used. The higher the AUPR, the better the model.
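
A small sketch of these two measures on sets of node identifiers follows. It uses the standard definitions, in which a false positive is an inactive node predicted as active and a false negative is an active node predicted as inactive; the helper `precision_recall` and the toy sets are hypothetical.

```python
def precision_recall(true_active, predicted_active):
    """Return (precision, recall) for predicted vs. truly active node sets."""
    tp = len(true_active & predicted_active)   # active and predicted active
    fp = len(predicted_active - true_active)   # inactive but predicted active
    fn = len(true_active - predicted_active)   # active but predicted inactive
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

true_active = {"alice", "bob", "carol"}
predicted_active = {"bob", "carol", "dave", "erin"}
precision, recall = precision_recall(true_active, predicted_active)
# precision = 2 / 4 = 0.5, recall = 2 / 3
```

Sweeping the decision threshold of a scoring model and recomputing this pair at each threshold yields the points of the precision-recall curve whose area is the AUPR.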

MSE (Mean Squared Error) and AP (Average Precision) are used to measure the accuracy of the cascade prediction task. AP summarizes a precision-recall curve as the weighted mean of the precisions achieved at each threshold zhu2004recall .
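
Both measures can be computed in a few lines, as in the sketch below. It assumes binary labels and distinct scores (no ties), in which case the weighted mean of precisions equals the mean of the precision values taken at the rank of each positive item; the function names and the toy inputs are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive item appears.

    labels: 0/1 ground truth; scores: predicted scores (assumed distinct).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

mse = mean_squared_error([3.0, 5.0], [2.0, 7.0])            # (1 + 4) / 2 = 2.5
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])  # (1/1 + 2/3) / 2
```

In the example, the positives sit at ranks 1 and 3, so the precisions 1 and 2/3 are averaged to give about 0.83.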

On training and test data selection, we first consider all nodes with published papers as our particular topic of interest as active ones and the rest nodes as inactive. At time , the training and test sets are selected as follows:
Training set: Those within the time period from to and from to are considered as the training set for topic diffusion and cascade prediction separately.
Test set: Those within the time period from to are considered as the test set in topic diffusion. Additionally, the nodes tagged as active up to the time are considered as the seed nodes that are activated initially in the start of the diffusion process. In the case of cascade prediction, the sequence of nodes at time are employed as the test set.

4.3 Performance Evaluation

The proposed approach HDD is applied to topic diffusion and cascade prediction and compared with earlier well-known methods. For topic diffusion on heterogeneous networks, the performance of HDD is compared to MLTM-R, which is the only related work in this category. For cascade prediction, we use Node2vec and DeepCas, well-known feature learning techniques, which are described below:

  1. Node2vec grover2016node2vec : It learns a mapping of nodes to a low-dimensional feature space that maximizes the likelihood of preserving the network neighborhoods of nodes. This method is selected as a representative node embedding method.

  2. DeepCas li2017deepcas : It starts from path samples drawn from different snapshots of a graph and uses a GRU network to transform the path samples into a single vector.

4.3.1 ACM dataset

In this dataset, the topics “Data Mining”, “Machine Learning” and “Decision Tree” are selected because ground truth is available for them. Figures 6, 7 and 8 show the topic diffusion results for the “Data Mining”, “Machine Learning” and “Decision Tree” topics on ACM. It can be observed that LSTM and CNN-LSTM improve significantly over the other methods. On this dataset, LSTM increased the AUPR measure by around 35% and CNN-LSTM by up to 50%. After these methods, CNN, which raised the AUPR by 30%, is better than MLTM-R and MLP.

Figure 6: AUPR measure in ACM dataset for Data Mining topic (1994-2003)
Figure 7: AUPR measure in ACM dataset for Machine Learning topic (1994-2003)
Figure 8: AUPR measure in ACM dataset for Decision Tree topic (1994-2003)

4.3.2 DBLP dataset

The topics “Data Mining”, “Social Network” and “Regression” were selected in DBLP. Figures 9, 10 and 11 show that LSTM and CNN-LSTM still give the best results; here, however, in contrast to the ACM dataset, MLTM-R on average performs slightly better than MLP. The comparison shows that CNN-LSTM achieved the highest AUPR (overall AUPR of about 55%), followed by LSTM and CNN.

Figure 9: AUPR measure in DBLP dataset for Data Mining topic (2003-2015)
Figure 10: AUPR measure in DBLP dataset for Social Network topic (2003-2015)
Figure 11: AUPR measure in DBLP dataset for Regression topic (2003-2015)

4.3.3 PubMed dataset

In this dataset, the “Health care” topic was examined. As shown in Figure 12, LSTM and CNN-LSTM improved the AUPR measure in comparison with the other methods. Here, CNN gives slightly better results than MLP because of the volume of data. In terms of AUPR, CNN-LSTM and LSTM outperformed MLTM-R by 25% and 18%, respectively.

Figure 12: AUPR measure in PubMed dataset for Health care topic (1997-2003)

4.4 Cascade Prediction

We evaluate cascade prediction on the DBLP dataset because it contains citation relations. We created the output cascades according to PCP (paper-citation-paper) for the data mining and machine learning topics, and constructed the input meta-path graphs based on PCP (paper-citation-paper) and PVP (paper-conference-paper). As shown in Figures 13 and 14, LSTM and CNN-LSTM outperform the other methods by a significant margin.

(a) MSE error for Data Mining topic
(b) AP measure for Data Mining topic
Figure 13: AP Measure and MSE error for Data Mining topic
(a) MSE error for Machine Learning topic
(b) AP measure for Machine Learning topic
Figure 14: AP Measure and MSE error for Machine Learning topic

5 Conclusion

This paper studied topic diffusion and cascade prediction in heterogeneous networks. To this end, we introduced a new end-to-end graph representation learning method for heterogeneous networks. The end-to-end predictor, HDD, outperformed feature-based machine learning methods as well as alternative author embedding and meta-path graph embedding methods. The HDD model captures the influence of authors and cascades across different timestamps and meta-paths. Moreover, using all meta-paths through deep structures, as an alternative to ad hoc relations, can significantly improve prediction performance. In addition, the HDD model is flexible enough to be used in other multilayer networks. We demonstrated the advantages of deep architectures on real-world networks, namely DBLP, PubMed and ACM. Our experimental results on these three real datasets verified the effectiveness and efficiency of our methods, LSTM and CNN-LSTM. The performance of the proposed method was compared with state-of-the-art techniques, and the results showed that it outperforms them. As future work, we are interested in combining graph summarization with deep learning to improve the results.