I Introduction
Online social networks have become an important platform for people to communicate, share knowledge and disseminate information. Given the widespread usage of social media, individuals’ ideas, preferences and behavior are often influenced by their peers or friends in the social networks in which they participate. Over the last decade, the influence maximization (IM) problem has been extensively adopted to model the diffusion of innovations and ideas. The purpose of IM is to select a set of seed nodes that can influence the most individuals in the network [1]. For instance, an advertiser may wish to send promotional material about a product to the users of a social network who are likely to sway the largest number of users to buy the product.
A large number of greedy and heuristic-based IM solutions have been proposed in the literature to improve efficiency, scalability, or influence quality. State-of-the-art IM techniques attempt to generate approximate solutions with a smaller number of RIS (Reverse Influence Sampling) samples, which are mainly used to estimate the expected maximum influence (denoted as $\sigma(v, S)$) for an arbitrary node $v$ given the currently selected seeds $S$. They use sophisticated estimation methods to bring the number of RIS samples closer to a theoretical threshold $\theta$ [2]. As $\theta$ provides a lower bound for the number of required RIS samples, these methods have to undertake a diffusion sampling phase and generate enough propagation samples to estimate the expected influence before selecting a seed. Despite the improvements to the sampling model and the reduction in the number of required samples brought by recent studies, the cost of generating these samples remains large, especially in huge networks. Consequently, is it possible to avoid the diffusion sampling phase in IM solutions? In this paper, we answer this question affirmatively by proposing a novel framework that, for the first time, utilizes network embedding and deep learning to tackle the IM problem.
The core challenge in IM lies in estimating $\sigma(v, S)$ given $v$ and the partial solution $S$, which is known to be #P-hard [1]. Traditional IM solutions address it by sampling the diffusion paths to generate an unbiased estimate for $\sigma(v, S)$. In essence, $\sigma$ can be viewed as a mapping whose inputs include the network $G$ and a diffusion model drawn from the set of diffusion models, each of which defines the criteria that determine whether a node is influenced or not. That is, given a network $G$, an arbitrary node $v$, and a particular diffusion model (e.g., the Independent Cascade or Linear Threshold model [1]), $\sigma$ outputs the expected maximum number of nodes influenced by $v$. In this paper, we advocate that it is possible to predict the expected influence of $v$ if we can approximate the mapping $\sigma$ as $\hat{\sigma}(v, S; \Theta)$ and learn the values of the parameters $\Theta$ by utilizing machine learning methods. Fortunately, a series of complex mappings have recently been approximated using deep learning techniques [3, 4, 5]. We leverage these recent results and present a learning-based framework for the IM problem called DISCO (Deep learnIng-baSed influenCe maximizatiOn).
Learning the mapping $\hat{\sigma}$ is a nontrivial and challenging problem. First, we need to transform the topology information of the target network into features. Second, there is no labeled expected maximum influence with which to train $\hat{\sigma}$. Hence, supervised learning approaches cannot be adopted in this scenario. To address these challenges, we integrate deep reinforcement learning [6, 7] and network embedding techniques [8] in DISCO. To learn the mapping, we adopt a network embedding method to represent the network topology as vector-based features, which then act as input to deep reinforcement learning. For the learning phase, inspired by recent progress in combinatorial optimization problems, we propose a model to approximate $\sigma$ as $\hat{\sigma}$ using a deep reinforcement learning technique. For each potential seed, instead of estimating its maximum influence by sampling the diffusion paths, we directly predict its influence via the learned mapping function. Moreover, we show that under our model the difference in the predicted influences between any pair of nodes hardly changes whenever a new seed is selected. Therefore, instead of selecting seeds iteratively, our model is able to select all seeds at the same time. Notably, once a mapping function is learned, it can be applied to multiple homogeneous networks.
The main contributions of the paper can be summarized as follows.
We present a novel framework that exploits learning methods to solve the classical IM problem. To the best of our knowledge, this is the first deep learning-based solution to the IM problem.

We present a novel learning method to approximate $\sigma$ as $\hat{\sigma}$, by exploiting deep reinforcement learning and network embedding.

Our proposed framework generates seed sets with a running time superior to that of state-of-the-art machine learning-oblivious IM techniques, without compromising on result quality. Specifically, it is up to 36 times faster than SSA [9]. Furthermore, the influence quality of our method is slightly better than that of the traditional methods.

We show how DISCO can be utilized to address the IM problem in large-scale networks or in the presence of evolutionary networks.
The rest of this paper is organized as follows. Section II reviews related work. We formally present the learning-based IM problem in Section III. We introduce the training and testing procedures in our framework in Sections IV and V, respectively. Experimental results are reported in Section VI. Finally, we conclude this work in Section VII.
II Related Work
In this section, we review research on influence maximization and network embedding.
II-A Influence Maximization
Since the elegant work in [1], the IM problem has been studied extensively. Kempe et al. [1] proved that the IM problem is NP-hard and provided a greedy algorithm that produces a $(1 - 1/e - \epsilon)$-approximate solution. Later, the CELF algorithm [10] was proposed, which reduces running time through a lazy update strategy. It is nearly 700 times faster than the hill-climbing algorithm proposed in [1]. CELF++ further optimizes CELF by exploiting the submodularity property [11]. Although CELF++ maintains more information than CELF, it achieves an efficiency improvement of nearly 7% over CELF. However, these approximate algorithms remain time-consuming.
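The lazy update strategy mentioned above can be illustrated with a minimal sketch. It is not the implementation of [10]: the function name `celf`, the `spread` callback, and the toy coverage function in the usage note are assumptions made for this example. The sketch relies only on the fact that, for a monotone submodular spread function, a node's marginal gain can never increase as the seed set grows, so a stale gain is a valid upper bound.

```python
import heapq

def celf(candidates, k, spread):
    """Lazy-greedy seed selection.

    `spread(seed_list)` is assumed to be monotone and submodular
    (hypothetical callback; in IM it would be an influence estimate).
    """
    seeds, current = [], 0.0
    # Max-heap via negated gains: (-gain, node, round in which gain was computed).
    heap = [(-spread([v]), v, 0) for v in candidates]
    heapq.heapify(heap)
    while heap and len(seeds) < k:
        neg_gain, v, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # Gain is up to date for the current seed set: v is the best node.
            seeds.append(v)
            current += -neg_gain
        else:
            # Stale gain: recompute only for this node and push it back.
            gain = spread(seeds + [v]) - current
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds, current
```

For instance, with a toy coverage function where node `a` reaches {1, 2, 3}, `b` reaches {3, 4}, `c` reaches {4, 5, 6, 7} and `d` reaches {1}, calling `celf` with `k = 2` selects `c` and then `a`, and only `a` needs its gain re-evaluated in the second round.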
Besides these approximate algorithms, many excellent heuristic algorithms [12, 13, 14], which do not provide an approximation ratio, have been proposed to reduce running time further. Compared to CELF++/CELF, these heuristic algorithms have improved the efficiency by at least two orders of magnitude. Although these heuristic algorithms greatly reduce the execution time, they significantly sacrifice the accuracy of the results [12, 13, 14].
Recently, Borgs et al. [15] proposed a new IM sampling method called reverse influence sampling (RIS). It sets a threshold to determine how many reverse reachable sets are generated from the network and then selects the node that covers the largest number of these sets. Based on this idea, a series of advanced approximate algorithms have been proposed. These algorithms not only provide an approximation ratio, but also exhibit competitive running time compared to heuristic algorithms. The TIM/TIM+ [16] algorithm significantly improved the efficiency of [15] and is the first RIS-based approximate algorithm to achieve efficiency competitive with heuristic algorithms. Subsequently, IMM [2] utilized the notion of martingales to significantly reduce computation time in practice while retaining TIM’s approximation guarantee. Specifically, it can return a $(1 - 1/e - \epsilon)$-approximate solution with at least $1 - n^{-\ell}$ probability in $O((k + \ell)(n + m)\log n / \epsilon^2)$ expected time.
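As a rough illustration of the RIS idea described above, the following sketch samples reverse reachable (RR) sets under the Independent Cascade model and then greedily picks the nodes that cover the most RR sets. It is not the algorithm of [15], TIM or IMM: the number of samples is simply passed in rather than derived from a threshold, and the names `random_rr_set`, `ris_select` and the `in_edges` layout are assumptions made for this example.

```python
import random
from collections import defaultdict

def random_rr_set(nodes, in_edges, rng):
    """Sample one RR set: pick a random root and walk incoming edges,
    keeping each edge (u -> v) independently with its probability p."""
    root = rng.choice(nodes)
    rr, frontier = {root}, [root]
    while frontier:
        v = frontier.pop()
        for u, p in in_edges.get(v, []):
            if u not in rr and rng.random() < p:
                rr.add(u)
                frontier.append(u)
    return rr

def ris_select(nodes, in_edges, k, num_samples, seed=0):
    """Greedy maximum coverage over sampled RR sets.

    Returns the seeds and the estimated spread n * (covered fraction)."""
    rng = random.Random(seed)
    rr_sets = [random_rr_set(nodes, in_edges, rng) for _ in range(num_samples)]
    seeds, covered = [], set()
    for _ in range(k):
        counts = defaultdict(int)
        for i, rr in enumerate(rr_sets):
            if i not in covered:
                for u in rr:
                    counts[u] += 1
        if not counts:
            break  # every RR set is already covered
        best = max(counts, key=counts.get)
        seeds.append(best)
        covered |= {i for i, rr in enumerate(rr_sets) if best in rr}
    return seeds, len(nodes) * len(covered) / num_samples
```

The cost of this approach is dominated by generating `num_samples` RR sets, which is the diffusion sampling phase that the framework in this paper aims to avoid.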
More recently, SSA and D-SSA [9] claimed to be superior to IMM w.r.t. running time while sharing the same accuracy. SSA is based on the Stop-and-Stare strategy, where the number of required RIS samples can be closer to the theoretical lower bound. However, Huang et al. [17] performed a rigorous theoretical and experimental analysis of these algorithms and found that the results reported in [9] are somewhat unfair.
In recent years, there have been increasing efforts to address IM problems with the help of learning methods. The works in [18, 19] both employ a reinforcement learning model to find the best strategy in the competitive IM problem, which was thoroughly studied in [20, 21]. Different from [18], the authors of [19] addressed competitive IM in a multi-round and adaptive setting [22, 23]. Both of these works treat the competition between multiple parties as a game and employ reinforcement learning to find the best policy (i.e., strategy) that maximizes the profit given the opponents’ choices. They do not natively model influence estimation as a reinforcement learning problem or node influence as a learning task. Hence, these works are only applicable to the competitive setting and are orthogonal to our problem. Besides, [24, 25] adopt learning methods to study the diffusion model and to optimize the linear threshold model parameters, respectively. They do not consider IM as a machine learning problem and are also orthogonal to our problem.
In summary, IMM and SSA are currently recognized as the state-of-the-art methods for solving the IM problem according to a benchmarking study [26]. Notably, none of these efforts integrates machine learning with the IM problem to further enhance performance.
II-B Network Embedding
Network embedding aims to find a mapping function that converts each node in a network into a vector-based representation. The learned vector-based representation can be used as a feature for various graph-based tasks, such as classification, clustering, link prediction and visualization. One of the major benefits of network embedding is that the resulting vector representation can be directly fed into most machine learning models to solve specific problems.
Early methods for learning node representations focused primarily on matrix decomposition, which was directly inspired by classical techniques for dimensionality reduction [27]. However, these methods incur a high computational cost. Recent approaches aim to learn the embedding of nodes based on random walks. DeepWalk [28] was proposed as the first network embedding method using deep learning technology; it bridges the gap between language modeling and network modeling by treating nodes as words and generating short random walks. LINE [29] uses the Breadth First Search strategy to generate context nodes, in which only nodes that are up to two hops from a given node are considered to be neighbors. In addition, it uses negative sampling to optimize the Skip-gram model, in contrast to the hierarchical softmax used in DeepWalk. node2vec [30] optimizes the random-walk sequence extraction strategy of the DeepWalk framework. It introduces a biased random walk procedure that combines Breadth First Search and Depth First Search during neighborhood exploration. SDNE [31] captures the nonlinear dependency between nodes by maintaining the proximity between 2-hop neighbors through a deep autoencoder. It designs an objective function that describes both local and global network information, using a semi-supervised approach to fit the optimization. It further maintains the proximity between adjacent nodes by minimizing the Euclidean distance between their representations. There is also a kernel-based approach, where the feature vectors of the graph come from various graph kernels [32]. Structure2vec [8] models each structured data point as a latent variable model, then embeds the graphical model into the feature space, and uses the inner product in the embedded space to define the kernel, which can monitor the learning of the graphical structure. DeepInf [33] presents an end-to-end model to learn the probability of a user’s action status conditioned on her local neighborhood.
III Problem Statement and Diffusion Model
In this section, we first formally present the learning-based IM problem. Next, we briefly describe the information diffusion models discussed in these definitions.
III-A Problem Statement
Let $G = (V, E)$ be a social network, where $V$ is a set of nodes, $E$ is a set of edges, $|V| = n$, and $|E| = m$. $(u, v) \in E$ represents an edge from node $u$ to node $v$. Let $w(u, v)$ denote the weight of edge $(u, v)$, indicating the strength of the influence. Accordingly, the IM problem can be formally defined as follows.
Definition 1
(Influence Maximization) Given a social network $G$, an information diffusion model, and an integer $k$, the influence maximization problem aims to select $k$ nodes as the seed set $S$ ($S \subseteq V$), such that, under the diffusion model, the expected number of nodes influenced by $S$, namely $\sigma(S)$, is maximized. The problem can be modeled as follows:
$$\max_{S \subseteq V,\ |S| = k} \sigma(S).$$
As IM is proved to be NP-hard [1], existing approximate solutions greedily select the next seed with the maximum marginal improvement in expected influence. In particular, let $S_i$ be a partial solution with $i$ seeds (i.e., $|S_i| = i$); then in the $(i+1)$-th iteration, an approximate algorithm shall choose a node $v$ such that $\sigma(v, S_i)$ is maximized, where $\sigma(v, S_i) = \sigma(S_i \cup \{v\}) - \sigma(S_i)$. To facilitate the following discussions, we refer to $\sigma(v, S)$ as the maximum marginal expected influence of $v$ given a partial solution $S$. As $\sigma(v, S)$ is #P-hard to calculate based on $v$ and $S$, traditional efforts in IM generate unbiased estimates for $\sigma(v, S)$ using a set of RIS samples. In this paper, we solve the IM problem by adopting a completely new strategy, i.e., a learning method. In our solution, $\sigma(v, S)$ is not estimated using RIS-based sampling. Instead, it is approximated using deep learning models. In this regard, we present the definition of the learning-based IM problem as follows.
Definition 2
(Learning-based Influence Maximization) [Learning phase.] Given a series of homogeneous networks and an information diffusion model, train a group of parameters $\Theta$ such that the function $\hat{\sigma}(v, S; \Theta)$ can be used to approximate $\sigma(v, S)$ as accurately as possible. [Seeds selection phase.] Given a target network $G$, an integer $k$ and a function $\hat{\sigma}(v, S; \Theta)$ that approximately calculates the marginal influence of $v$ with respect to the partial solution $S$, solve the IM problem in $G$ with respect to budget $k$ and the diffusion model.
III-B Diffusion Models
Based on the definition of IM, one can observe that a diffusion model is vital for the selection of seeds. Currently, there exist two popular diffusion models, namely Linear Threshold (LT) and Independent Cascade (IC). Throughout a diffusion process, a node has only two states, activated and inactivated. Both models assume that, once a node is activated, its state will not change further.
Linear Threshold (LT) model. The LT model is a special case of the triggering model [34]. To understand the concept, we introduce two related notions. $N(v)$: set of neighbors of node $v$; $N_a(v)$: set of activated neighbors of node $v$.
In the LT model, each node $v$ in a graph $G$ has a threshold $\theta_v$. For each node $u \in N(v)$, the edge $(u, v)$ has a non-negative weight $w(u, v)$. Given a graph $G$, a seed set $S$, and the threshold of each node, this model first activates the nodes in $S$. Then it starts spreading in discrete timestamps according to the following rule. In each step, an inactive node $v$ will be activated if $\sum_{u \in N_a(v)} w(u, v) \geq \theta_v$. The newly activated node will then attempt to activate its neighbors. The process stops when no more nodes are activated.
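The LT spreading rule can be sketched as a simple fixed-point computation. This is a minimal illustration only; the function name `simulate_lt` and the dictionary layout `in_weights[v] = {u: w(u, v)}` are assumptions for this example, and the thresholds are passed in explicitly rather than drawn at random.

```python
def simulate_lt(in_weights, seeds, thresholds):
    """Run the Linear Threshold process until no node changes state.

    in_weights[v] maps each in-neighbor u of v to the edge weight w(u, v);
    thresholds[v] is the activation threshold of v.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, nbrs in in_weights.items():
            if v in active:
                continue
            # Total weight coming from already activated neighbors.
            total = sum(w for u, w in nbrs.items() if u in active)
            if total >= thresholds[v]:
                active.add(v)
                changed = True
    return active
```

Because activation is monotone (an activated node never deactivates), the final activated set does not depend on the order in which inactive nodes are examined within a step.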
Independent Cascade (IC) model. Given a graph $G$ and a seed set $S$, this model first activates the nodes in $S$, and then starts spreading in discrete timestamps according to the following rule. When a node $u$ is first activated at timestamp $t$, it gets a single chance to activate each node $v$ in its neighborhood that is not yet activated. The success probability of activation is $w(u, v)$, and the failure probability is $1 - w(u, v)$. If $v$ is activated successfully, $v$ becomes an active node in step $t + 1$, and $u$ can no longer activate other nodes in subsequent steps. This process continues until no new nodes can be activated. In other words, whether $u$ can activate $v$ is not affected by previous propagation.
IV Learning the Mapping Function
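A minimal sketch of the IC process, together with a Monte Carlo estimate of the expected spread, is given below. The function names `simulate_ic` and `estimate_spread` and the `out_edges` layout are assumptions for this example; the repeated simulation is the kind of sampling cost that sampling-based IM solutions incur.

```python
import random

def simulate_ic(out_edges, seeds, rng):
    """One run of the Independent Cascade process.

    out_edges[u] is a list of (v, p) pairs: u activates v with probability p.
    Each newly activated node gets exactly one chance per inactive neighbor.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v, p in out_edges.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active  # only nodes activated in this step spread next
    return active

def estimate_spread(out_edges, seeds, runs=1000, seed=0):
    """Average number of activated nodes over `runs` simulations."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(out_edges, seeds, rng)) for _ in range(runs))
    return total / runs
```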
Notably, in traditional approximate IM solutions, it is inevitable to undertake a diffusion sampling phase to generate a group of RIS samples, and the number of required RIS samples is at least as large as the threshold $\theta$. In this paper, we turn to deep learning models to avoid the traditional diffusion sampling phase in seeds selection. In this section, we present our DISCO framework, which addresses the IM problem using deep learning techniques. As remarked earlier, the key challenge in addressing IM lies in the estimation of the expected maximum influence function $\sigma$. A learning-based IM solution must therefore train a model, namely $\hat{\sigma}$, that approximates the mapping $\sigma$ as accurately as possible. Unfortunately, as the influence maximization problem is NP-hard, the ground-truth label for $\sigma$ is hard to acquire. Consequently, we adopt the Deep Q-Network (DQN) [6, 7], a popular deep reinforcement learning model, to learn the parameters $\Theta$. In this way, an optimal approximation of $\sigma$ can be acquired.
Next, we introduce the learning phase of our framework, which consists of network embedding, training of the parameters $\Theta$, and approximating $\sigma$ with $\hat{\sigma}$ for use in DQN. The test phase for selecting nodes according to the learned parameters and predicted influences will be detailed in the next section. An overview of the DISCO framework is depicted in Figure 1, where the top half depicts the learning phase while the bottom half illustrates the test phase (i.e., seeds selection).
IV-A Embedding the Nodes
Before the DQN model can be applied, we first extract features of nodes based on the topological information. To this end, we need the embedding of each node $v$ as a vector $x_v$. Among the many embedding methods, e.g., DeepWalk [28], node2vec [30], DeepInf [33], etc., we select Structure2vec [8] to accomplish this step for the following reasons. Firstly, the other alternatives are ‘transductive’ embedding methods; that is, embeddings extracted across graphs are not consistent, since they only care about intra-graph proximity. In our framework, we aim to use a method that supports cross-graph extraction of embeddings. This means that the parameters trained on the subgraphs can be applied to the target graph. Secondly, since they are unsupervised network embedding methods (i.e., DeepWalk and node2vec) or supervised for a particular task (i.e., DeepInf), they may not capture the desired information (i.e., expected influence spread) within the IM problem. In our case, the network embedding is trained end-to-end for the optimization target, and thus it can be more discriminative.
In order to fully encode the topological context of the nodes in a network and meet the needs of our framework, we present a variation of Structure2vec [8] to achieve this task. Structure2vec learns a feature space with discriminative information and embeds latent variable models into feature spaces. It combines the characteristics of nodes, edges and network structure in the current network. These characteristics are recursively aggregated according to the state of the current network. Structure2vec can achieve end-to-end learning through its combination with DQN, and the parameters learned from training graphs apply well to the test graph. However, the original Structure2vec model cannot be directly applied in DQN. Hence, we present a variant of it as follows.
First, we initialize the vectors of all nodes and set each of them as a $p$-dimensional zero vector (in line with [8], $p$ is generally set to 64 and can be adjusted according to the size of the network). After $I$ iterations ($I$ is usually small, set to 4 or less), each node reaches its final state. In the IM scenario, we adopt $a_v$ to indicate whether node $v$ is in the partial solution $S$. That is, $a_v = 1$ if node $v$ appears in the seed set $S$, else $a_v = 0$. The formula for the update of vectors is as follows:
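$$x_v^{(i)} := \mathrm{ReLU}\Big(\alpha_1 \sum_{u \in N(v)} x_u^{(i-1)} + \alpha_2 \sum_{u \in N(v)} \mathrm{ReLU}\big(\alpha_3\, w(v, u)\big) + \alpha_4\, a_v\Big) \tag{1}$$
In Eq. 1, $\mathrm{ReLU}$ refers to the Rectified Linear Unit of a neural network, $N(v)$ is the neighbor set of node $v$, $x_v^{(i)}$ represents the vector of node $v$ during the $i$-th iteration, $w(v, u)$ is the weight of edge $(v, u)$, and $\alpha_1, \alpha_2, \alpha_3, \alpha_4$ are the parameters that need to be trained. Although all the neighbors of each node remain unchanged during the update process, in order to better model the nonlinear mapping of the input space, we add two parameters $\alpha_2$ and $\alpha_3$ to construct two independent layers of Multi-Layer Perceptron in the above formula.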
It can be seen that the first two terms in the equation aggregate the surrounding information by summing over the neighbors of $v$. Besides, the update formula also passes the information and network characteristics of a node across iterations. When the embeddings of all nodes have been updated, the next iteration begins. After $I$ iterations, the vector of $v$ will contain information about all neighbors within its $I$-hop neighborhood.
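The iterative embedding update described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes a Structure2vec-style update with a neighbor-sum term, an edge-weight term passed through two ReLU layers, and a seed-indicator term, with parameter shapes $\alpha_1, \alpha_2 \in \mathbb{R}^{p \times p}$ and $\alpha_3, \alpha_4 \in \mathbb{R}^{p}$ chosen for this example. The function name `embed_nodes` and the dense weight matrix `W` are likewise hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def embed_nodes(W, a, alpha1, alpha2, alpha3, alpha4, iters=4):
    """Assumed Structure2vec-style node embedding.

    W: (n, n) array with W[v, u] = w(v, u), and 0 where there is no edge.
    a: (n,) 0/1 indicator of membership in the partial solution.
    Returns an (n, p) array of node vectors after `iters` iterations.
    """
    A = (W > 0).astype(float)                  # A[v, u] = 1 iff u is a neighbor of v
    X = np.zeros((W.shape[0], alpha4.shape[0]))  # p-dimensional zero vectors
    # Edge-weight term: alpha2 * sum_u ReLU(alpha3 * w(v, u)); constant over iterations.
    edge_term = relu(W[:, :, None] * alpha3).sum(axis=1) @ alpha2.T
    seed_term = np.outer(a, alpha4)            # alpha4 * a_v for every node
    for _ in range(iters):
        # Neighbor-sum term uses the vectors from the previous iteration.
        X = relu(A @ X @ alpha1.T + edge_term + seed_term)
    return X
```

With this form, a non-seed node without neighbors keeps a zero vector, while each additional iteration lets information travel one hop further. The dense `(n, n, p)` intermediate is only suitable for small graphs; a sparse representation would be needed at scale.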
After generating the vector of each node in the network, we define the evaluation function $Q$ that will be used in DQN. As DQN employs the $Q$ function to evaluate the quality (i.e., influence) of each candidate solution (i.e., node) at a particular state (i.e., partial seeds), it naturally corresponds to $\hat{\sigma}$. In this regard, we shall refer to $\hat{\sigma}(v, S; \Theta)$ as the $Q$ function. Traditionally, the $Q$ function is manually set based on experience, which is challenging. Hence, we use neural networks to find the best-performing evaluation function. In state $S$, the evaluation function of node $v$ is defined as follows:
$$Q(v, S; \Theta) := \beta_1^{\top}\, \mathrm{ReLU}\Big(\Big[\beta_2 \sum_{u \in N(v)} x_u^{(I)},\ \beta_3\, x_v^{(I)}\Big]\Big) \tag{2}$$
where $x_v^{(I)}$ is the vector generated after $I$ iterations and $[\cdot, \cdot]$ is the concatenation operator. Because $Q$ is mainly determined by the embedding of the current node and its surrounding neighbors, the $Q$ function is related to the parameters $\alpha_1, \ldots, \alpha_4$ and $\beta_1, \beta_2, \beta_3$, all of which need to be learned. We denote all these parameters by $\Theta$ for simplicity. By using matrix operations, we can reduce the input dimension and output a single-valued $Q$ using the idea of value function approximation. After adding a new node $v$ to the seed set, $a_v$ is changed from 0 to 1. In the new state, the embedding needs to be performed again, and the $Q$ values of the remaining nodes need to be recalculated.
Once the quality of each node has been evaluated, the node with the best marginal expected influence is added to the seed set by the greedy algorithm, until the number of seeds reaches $k$. In general, given a network $G$, a positive integer $k$, a seed set $S$, and $V \setminus S$ as the set of candidate nodes, we calculate the $Q$ value of each node in the candidate set. If $v = \arg\max_{u \in V \setminus S} Q(u, S; \Theta)$, then we add node $v$ to $S$.
IV-B Model Training via DQN
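The evaluation and greedy choice described above can be sketched as follows. The form of the $Q$ function used here (a linear read-out over the ReLU of the concatenated neighbor-sum and node embeddings) and the shapes $\beta_1 \in \mathbb{R}^{2p}$, $\beta_2, \beta_3 \in \mathbb{R}^{p \times p}$ are assumptions for illustration, as are the names `q_values` and `pick_next`.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def q_values(X, A, beta1, beta2, beta3):
    """Assumed Q read-out: beta1^T ReLU([beta2 * sum of neighbor vectors, beta3 * x_v]).

    X: (n, p) node embeddings; A: (n, n) 0/1 adjacency matrix.
    Returns one scalar Q value per node, shape (n,).
    """
    neighbor_sum = A @ X
    hidden = np.concatenate([neighbor_sum @ beta2.T, X @ beta3.T], axis=1)
    return relu(hidden) @ beta1

def pick_next(q, seeds):
    """Return the non-seed node with the largest Q value."""
    is_seed = np.isin(np.arange(len(q)), list(seeds))
    return int(np.argmax(np.where(is_seed, -np.inf, q)))
```

In the iterative variant, `pick_next` would be called after every re-embedding, since adding a seed flips its indicator and changes the embeddings.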
In the IM problem, we are unable to acquire sufficient labeled samples for training the parameters $\Theta$, as the exact evaluation of $\sigma$ is #P-hard. Hence, we turn to deep reinforcement learning. Intuitively, deep reinforcement learning enables end-to-end learning from perception to action and automatically extracts complex features. The most representative of these approaches is the DQN (Deep Q-Network) algorithm [6, 7]. DQN exhibits several advantages over existing algorithms. First, it uses rewards to construct labels through Q-learning. Second, the correlation between data samples is eliminated by experience replay. Finally, the input to DQN is a graph, and the output is the action and the corresponding $Q$ function. We refer to an evaluation function with weights $\Theta$ as a Q-network. We now show how to train the Q-network in our scenario.
Reinforcement learning needs to consider the interaction between the Agent and the Environment. For our IM scenario, we refer to the Agent as an object with behavior. Regardless of the scenario in which it is applied, an Agent involves a series of Actions, States, and Rewards. For the IM problem, we define these reinforcement learning components as follows:

Action: Each action of the agent adds a node $v$ ($v \notin S$) to the current seed set $S$. We use the network embedding of $v$ to denote an action.

State: The state is the part of the world that an agent can perceive. In the IM problem, we use the current seed set $S$ to represent the state. The final state is the set of $k$ nodes that have been selected. The state is denoted by the embeddings of the currently selected nodes.

Reward: The reward is a real value. Each time the Agent interacts with the environment, the environment returns a reward for evaluating the action, which represents a reward or a punishment. The state changes from $S$ to $S' = S \cup \{v\}$ after adding node $v$ to the current seed set. The increment of the influence range is the reward of node $v$ in state $S$, i.e., $r(S, v) = \sigma(S') - \sigma(S)$. It should be noted that the ultimate goal of learning is to maximize the cumulative reward.

Transition: When a node $v$ is selected to join the seed set, $a_v$ changes from 0 to 1. We define this process as the state transition.
Based on these definitions, the policy of reinforcement learning can be constructed. The policy is the behavior function of the agent, i.e., a mapping from state to action. It tells the agent how to pick the next action. A policy that picks an optimal action based on the current $Q$ value is referred to as a greedy policy. That is, $\pi(S) = \arg\max_{v \notin S} Q(v, S; \Theta)$. In this paper, we use the $\epsilon$-greedy policy, which is a strategy that combines a random policy and a greedy policy. The advantage of the $\epsilon$-greedy policy is as follows. The random policy expands the search range, while the greedy policy facilitates refinement of the $Q$ value. Generally, $\epsilon$, the probability of selecting a random action, is a small value. The policy balances exploration and exploitation by adapting $\epsilon$.
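A minimal sketch of such an $\epsilon$-greedy choice over candidate nodes is shown below. The function name `epsilon_greedy` and the representation of $Q$ values as a plain list indexed by node id are assumptions for this example.

```python
import random

def epsilon_greedy(q, seeds, epsilon, rng):
    """Pick the next seed among non-seed nodes.

    With probability `epsilon` a random candidate is explored;
    otherwise the candidate with the highest Q value is exploited.
    """
    candidates = [v for v in range(len(q)) if v not in seeds]
    if rng.random() < epsilon:
        return rng.choice(candidates)
    return max(candidates, key=lambda v: q[v])
```

Setting `epsilon = 0` recovers the purely greedy policy, while a larger `epsilon` during training exposes the model to more diverse partial solutions.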
To train the model, we also need to define a loss function. In order to provide a labeled sample to the $Q$ network, DQN, compared to Q-learning, adds a target function: $y = r + \gamma \max_{v'} Q(v', S'; \Theta)$. Here, $\gamma$ is the decaying factor, which controls the importance of future rewards in learning. If $\gamma = 0$, the model will not learn any future reward information, becomes short-sighted, and only pays attention to the current interests; if $\gamma = 1$, the expected value is continuously accumulated without attenuation, so the expected value may diverge. Therefore, $\gamma$ is generally set to a number slightly smaller than 1.
In the IM problem, the reward of an action can only be calculated accurately after a series of actions. Therefore, we use $n$-step Q-learning, which can effectively handle delayed rewards. Specifically, we wait $n$ steps before updating the parameters, in order to collect future rewards more accurately. We set the target as $y = \sum_{i=0}^{n-1} r(S_{t+i}, v_{t+i}) + \gamma \max_{v'} Q(v', S_{t+n}; \Theta)$. Hence, the loss function in the neural network is:
$$\big(y - Q(v_t, S_t; \Theta)\big)^2 \tag{3}$$
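The $n$-step target and the squared loss can be illustrated with a small numeric sketch. It assumes the common $n$-step Q-learning form in which the $n$ collected rewards are summed and a discounted bootstrap from the best $Q$ value of the state reached after $n$ steps is added; the function names are hypothetical.

```python
def n_step_target(rewards, next_q_values, gamma):
    """Assumed n-step target: sum of the n rewards plus the discounted
    best Q value of the state reached after n steps.

    `next_q_values` is empty when that state is terminal (budget exhausted).
    """
    bootstrap = max(next_q_values) if len(next_q_values) else 0.0
    return sum(rewards) + gamma * bootstrap

def squared_loss(target, q_pred):
    """Squared difference between the target and the current Q estimate."""
    return (target - q_pred) ** 2
```

For example, with rewards 1, 2 and 3 over three steps, a decaying factor of 0.9 and a best next $Q$ value of 10, the target is 6 + 9 = 15; if the network currently predicts 14, the loss is 1.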
The training process is outlined in Algorithm 1. Because consecutive samples are correlated, the training is easily affected by the sample distribution and has difficulty converging. In this regard, we use experience replay to update the parameters. In this way, the training samples stored in the dataset can be randomly extracted, so that the training process becomes smooth. We use a batch of homogeneous networks during the training. An episode represents the process of obtaining a complete seed set for a network (Lines 1-11). For each network that is trained, the seed set is initialized (Line 2). Next, according to the embedding process discussed earlier in this section, we update the nodes’ embeddings and calculate the $Q$ value of each node (Lines 3-4). After obtaining the influence quality of each node, we apply the aforementioned $\epsilon$-greedy policy. In particular, we randomly select a node with probability $\epsilon$; otherwise, we select the node with the highest $Q$ value with respect to the current partial solution. The selected node is then added to the seed set (Lines 5-7). Finally, in order to accurately collect future rewards, we wait for $n$ steps before updating the parameters. After $n$ steps, the empirical samples randomly extracted from the replay memory are used to update the parameters using SGD (Stochastic Gradient Descent) (Lines 9-12).
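The experience replay component can be sketched as a bounded buffer with uniform random sampling. This is a generic illustration rather than the data structure of Algorithm 1; the class name `ReplayMemory` and the tuple layout `(state, action, reward, next_state)` are assumptions.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity store of past transitions with uniform sampling."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        size = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), size)
```

In a training loop, each selected seed would push one transition (with the accumulated $n$-step reward), and each parameter update would draw a mini-batch from `sample`.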
V Seeds Selection
V-A Generating Result Set via Learned Function
The training process results in the learned parameters $\Theta$. Once the parameters are learned, they can be used to address the IM problem in multiple networks. In this part, we describe the test phase of the DISCO model, i.e., the online selection of seeds. First of all, as we have learned all the parameters in $\Theta$, we are able to directly calculate the predicted expected marginal influence for each node. Intuitively, it is then natural to follow the existing hill-climbing strategy and iteratively select the node that exhibits the highest predicted marginal influence. However, our empirical (shown in Section VI-B) and theoretical (shown below) studies reveal that the order of the nodes with respect to their $Q$ values remains almost unchanged whenever we select a seed and recompute the network embeddings as well as the $Q$ values. Therefore, for the seeds selection phase, we present a much better solution that is specifically suited to our DISCO framework.
To begin with, we study the intrinsic features of our learning model. In particular, with the help of the Certainty-Factor (CF) model [35], we theoretically examine the difference between the predicted influences of any pair of nodes before and after some seed is added to the result set. The CF model is a method for managing uncertainty in rule-based systems. A CF represents a person’s change in belief in the hypothesis given the evidence. In particular, a CF from 0 to 1 means that the person’s belief increases.
For an arbitrary pair of non-seed nodes $u$ and $v$, and the current seed set $S$, their corresponding predicted influences are $Q(u, S; \Theta)$ and $Q(v, S; \Theta)$, respectively. Suppose a new seed, other than $u$ or $v$, is then selected into the result set, leading to a new partial solution $S'$. For brevity, we use shorthand notation for the $Q$ values of $u$ before and after this selection, and likewise for the $Q$ values of $v$. Then the following holds.
Theorem 1
, where are not seeds, and a very small positive number ,
(4) 
That is, the order of any two nodes before and after recomputing the embeddings and $Q$ values does not change (the proof is given in Appendix A).
Further, we justify that the small positive number in Theorem 1 is indeed expected to be very small.
Claim V.1
and are not seeds, suppose the number of nodes and average degree of graph is and , respectively. Then, according to disco model (where is fixed less than 5 and values are Maxmin Normalized):
(The proof is given in Appendix B).
In practice, the average degree of nodes in real-world social networks is typically very small (i.e., within 37) [36] compared to the size of the network. Therefore, the small positive number in Theorem 1 is expected to be very small in practice.
According to the above study, the small positive number in Theorem 1 should be a very small non-negative value. That is to say, whether or not we recompute the embeddings and $Q$ values after each seed insertion hardly affects the order of non-seed nodes. Therefore, during the seeds selection phase, instead of iteratively selecting a seed and recomputing the embeddings and $Q$ values after each insertion, we simplify the procedure into only one iteration, by embedding only once and selecting the top-$k$ nodes with the maximum $Q$ values without updating the predicted values.
The seeds selection process is outlined in Algorithm 2. Given a network $G$ and a budget $k$, we first initialize the seed set to be empty and initialize each node in the network as a $p$-dimensional zero vector (Line 1). Next, we embed each node in the network (Lines 3-5). The update of the embeddings is performed according to Equation 1, whose parameters have already been learned. The formula aggregates the neighborhood information of node $v$ and encodes the node information and network characteristics (Line 5). After obtaining the embeddings of the nodes, we calculate the influence quality of each node according to Equation 2 using the learned parameters (Line 6). Finally, the $k$ nodes with the top influence quality are added to the seed set.
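Once the $Q$ values have been computed in a single pass, the final selection step reduces to a top-$k$ query. A minimal sketch using the standard library is given below; the function name `select_seeds` is hypothetical and the $Q$ values are assumed to be available as a list indexed by node id.

```python
import heapq

def select_seeds(q, k):
    """Return the ids of the k nodes with the largest predicted Q values,
    ordered from highest to lowest, without any re-embedding."""
    return heapq.nlargest(k, range(len(q)), key=lambda v: q[v])
```

Unlike the iterative select-and-recompute procedure, this step touches each $Q$ value once and never re-runs the embedding.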
The time complexity of the selection process is determined by three parts. First, in the embedding phase, the complexity is influenced by the number of nodes and the number of neighbors . As is usually a small constant (e.g., ) in Algorithm 2, network embedding takes . Second, after the node is embedded, the quality of nodes in the graph is evaluated using the formula . Since the values of have been learned, this step is influenced by the time taken to find the neighbors of node . Hence, it takes time. Finally, selecting the optimal node according to function and adding the node to the set takes . Consequently, the total time complexity for seeds selection is .
VB Pretraining the Model in Largescale or Evolutionary Networks
Since disco is a learning-based framework (see Definition 2), we need to pre-train the embedding and function parameters offline using a group of training networks. Intuitively, the training and testing (i.e., target) networks should be homogeneous in terms of topology so that the quality of the learned parameters can be guaranteed. In general, topological homogeneity is reflected in many topological properties, e.g., degree distribution and spectrum. Therefore, in order to select seeds within a target network, we need to train the required parameters offline on a group of homogeneous networks. Afterwards, the trained model can be used to select seeds in the target network. In the following, depending on the specific characteristics of the target networks, we present two different pre-training strategies for disco.
Applying disco in large-scale stationary networks. Consider a large-scale stationary network without any evolution log. We can turn to subgraph sampling to generate sufficient homogeneous training networks. Afterwards, with the parameters learned from these small networks, we can address im over the large-scale target network. To ensure that the topological features of the sampled training subgraphs are as consistent as possible with those of the original large-scale target network, we evaluate different sampling algorithms following the framework adopted in [37, 38]. In particular, we apply different sampling methods to real large-scale networks and sample subgraphs over a series of sample fractions = . Following the methods introduced in [37], the Kolmogorov-Smirnov D-statistic is adopted to measure the differences between the network statistic distributions of the sampled subgraphs and the original graph; the D-statistic is typically applied as part of the Kolmogorov-Smirnov test to reject the null hypothesis. It is defined as $D = \max_{x} |F'(x) - F(x)|$, where $x$ ranges over the values of the random variable, and $F'$ and $F$ are the two empirical distribution functions of the data. Our experimental results show a phenomenon similar to that reported in the benchmarking papers [37, 38]: Topology-Based Sampling is superior to Node Sampling and Edge Sampling in preserving the distributions of graph features such as the degree distribution and the clustering coefficient distribution. Therefore, in the next section, we adopt Topology-Based sampling methods in our model and compare several existing Topology-Based subgraph sampling methods, including Breadth First Sampling (BFS) [39], Simple Random Walk (SRW) and its variants, namely Random Walk with Flyback (RWF) and Induced Subgraph Random Walk Sampling (ISRW) [40, 41], as well as Snowball Sampling (SB) [42], a variant of Breadth First Search that limits the number of neighbors added to the sample. Notably, as subgraph sampling is beyond the focus of this work, we do not discuss the details of these methods.

Applying disco in evolutionary networks. Almost all existing social networks keep evolving, with the insertion of new nodes/edges and the deletion of old nodes/edges. Although social networks evolve rapidly over time, their structural properties remain relatively stable [36]. During the evolution of a particular real-world network, we argue that any two historical snapshots of the same network are homogeneous to each other. By leveraging the aforementioned technique, we now briefly describe how disco can address the im problem in dynamic networks via time-based sampling. In particular, given a dynamic network , whose snapshot at time is referred to as , we can apply disco in the following way. During the Learning phase of Definition 2, we train the model using a series of temporal (sampled) network snapshots (i.e., ). In the Seeds selection phase, the trained model can be used to select the best seed set in an arbitrary snapshot of .
Besides, if any snapshot of the network is very large, we can also adopt the subgraph sampling method mentioned above.
Based on this strategy, we can accomplish the seed selection task over a target network in real time, even though the network keeps evolving. This provides a practical solution for the im problem in evolutionary networks.
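The D-statistic used in this section to compare a sampled subgraph with the original graph can be sketched in a few lines of Python. This is a minimal illustration rather than the evaluation code behind the reported results: the function names and the two degree sequences are hypothetical, and the statistic is computed directly from its definition as the largest absolute gap between two empirical distribution functions.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical distribution function of a sample."""
    data = sorted(sample)
    n = len(data)
    return lambda x: bisect_right(data, x) / n


def ks_d_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical distribution functions.

    Both functions are step functions that only change at observed values,
    so it is enough to evaluate the gap at every observed value.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    f_a, f_b = ecdf(sample_a), ecdf(sample_b)
    points = set(sample_a) | set(sample_b)
    return max(abs(f_a(x) - f_b(x)) for x in points)


# Hypothetical degree sequences of an original graph and a sampled subgraph.
original_degrees = [1, 1, 2, 2, 3, 3, 4, 8]
sampled_degrees = [1, 2, 2, 3]
d = ks_d_statistic(original_degrees, sampled_degrees)
print(d)
```

A value close to 0 indicates that the sampled subgraph preserves the chosen statistic (here, the degree distribution) well, while a value close to 1 indicates a poor match.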
VI Experiments
In this section, we evaluate the performance of the disco framework. We compare our model with two state-of-the-art traditional im solutions, namely imm [2] and ssa [9], as suggested by a recent benchmarking study [26]. Recall that the efforts in [18, 19, 22, 23] are designed for competitive im and hence are orthogonal to our problem. In line with existing im solutions, we evaluate performance from two aspects, namely computational efficiency and influence quality. Besides, as disco is a learning-based solution, we also validate the generality of our model in evolutionary scenarios. All the experiments were performed on a machine with an Intel Xeon CPU (2.2GHz, 20 cores), 512GB of DDR4 RAM, and an Nvidia Tesla K80 with 24GB GDDR5, running Linux. (Footnote 2: We shall provide the URL of the complete source code for disco here as soon as this paper is accepted.)
Dataset      n      m       Type        Avg. Degree
HepPh        12K    118K    Undirected  9.83
DBLP         317K   1.05M   Undirected  3.31
LiveJournal  4.85M  69M     Directed    6.5
Orkut        3.07M  117.1M  Undirected  4.8
VI-A Experimental Setup
Datasets. In the experiments, we present results on four real-world social networks taken from the SNAP repository (https://snap.stanford.edu/data/), as shown in Table I. Among these four datasets, HepPh and DBLP are citation networks, while LiveJournal and Orkut are the largest online social networks ever used in influence maximization. For each real-world network, we use the following sampling methods: Breadth First Sampling (BFS) [39], Simple Random Walk (SRW), Induced Subgraph Random Walk Sampling (ISRW) [40, 41], and Snowball Sampling (SB) [42].
Diffusion Models. disco can be easily adapted to different diffusion models. For instance, we can simply revise the Reward definition to switch from the ic model to the lt model. As our experimental results under the lt and ic models are qualitatively similar, we mainly report the results under the ic model here. In the ic model, each edge has a constant probability . In the vast majority of im techniques, takes the value of assigned to all the edges of the network. To calculate the expected range of influence fairly for the three approaches, we first record the seed set of each algorithm independently, and then perform simulated propagations based on the selected seeds. Finally, we take the average result of simulations as the number of influenced nodes for each tested approach.
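The evaluation procedure above (fix a seed set, run repeated simulated propagations, and average the outcomes) can be sketched as follows. This is a minimal Monte Carlo illustration of the independent cascade model with a uniform edge probability, not the evaluation code used for the reported figures: the function names, the toy graph, the probability `p` and the number of simulations are all illustrative assumptions.

```python
import random


def simulate_ic(adj, seeds, p, rng):
    """Run one independent cascade and return the number of activated nodes.

    Every newly activated node gets a single chance to activate each
    inactive out-neighbour, succeeding with probability p.
    """
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(adj, seeds, p, num_simulations, seed=0):
    """Average the cascade size over repeated simulations."""
    rng = random.Random(seed)
    total = sum(simulate_ic(adj, seeds, p, rng) for _ in range(num_simulations))
    return total / num_simulations


# Toy directed graph as adjacency lists (hypothetical data).
adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
spread_certain = estimate_spread(adj, [0], p=1.0, num_simulations=10)
spread_none = estimate_spread(adj, [0], p=0.0, num_simulations=10)
spread_mid = estimate_spread(adj, [0], p=0.5, num_simulations=1000)
print(spread_certain, spread_none, spread_mid)
```

Evaluating every algorithm's seed set with the same simulator keeps the comparison independent of each algorithm's internal spread estimate.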
Parameter Settings. For our method, we set the batch size, i.e., the number of samples extracted each time, as . Besides, we set as 5 and the learning rate of sgd to . We set the dimension of the node embeddings to . As suggested in imm [2], we set throughout the experiments in both imm and ssa. It should be noted that the seed set produced by ssa is not constant. Therefore, for ssa and imm, we report the average results over 100 independent runs (i.e., 100 independent result seed sets, each of which is averaged over simulations).
VI-B Experimental Results
Training issues. As remarked earlier, during dqn training, we need to obtain a batch of training graphs by sampling the real-world network. In our experiments, we mainly compare the aforementioned four sampling methods (BFS, ISRW, Snowball, and SRW). The results are shown in Figure 2. Observe that, for the disco training process, the models trained on subgraphs obtained by different Topology-Based sampling methods show negligible differences, indicating that the choice of Topology-Based sampling method has little effect on the quality of the resulting seeds. In other words, the Topology-Based sampling method and the disco framework are loosely coupled. Therefore, we use the simple and mature BFS (Breadth First Sampling) as the default method in the rest of the experiments.
Given the sampled subnetworks, we train the dqn parameters using the $\epsilon$-greedy exploration described in Section IV-B. That is, the next action is randomly selected with probability $\epsilon$, and the action maximizing the function is selected with probability $1-\epsilon$. During training, $\epsilon$ is linearly annealed from 1 to 0.05 in ten thousand steps. The training procedure terminates when the value of is less than or the training time exceeds 3 days. For HepPh and DBLP, the training phases terminate at 2.5 and 23.5 hours, respectively; for LiveJournal and Orkut, we limit the training time to 3 days and apply the learned parameters to node selection.
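The exploration schedule described above can be sketched in Python as follows. The start value (1), end value (0.05) and annealing length (ten thousand steps) are taken from the text; the function names, the name `epsilon` for the exploration probability, and the dictionary of action values are illustrative assumptions rather than part of the disco implementation.

```python
import random


def annealed_epsilon(step, start=1.0, end=0.05, anneal_steps=10_000):
    """Linearly anneal the exploration probability, then hold it at `end`."""
    frac = min(max(step, 0), anneal_steps) / anneal_steps
    return start + frac * (end - start)


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the best one.

    q_values maps each candidate action to its current value estimate.
    """
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=q_values.get)


rng = random.Random(0)
q_values = {"node_a": 1.0, "node_b": 3.0, "node_c": 2.0}  # hypothetical estimates
for step in (0, 5_000, 10_000, 20_000):
    eps = annealed_epsilon(step)
    print(step, round(eps, 3), epsilon_greedy(q_values, eps, rng))
```

Early in training almost every action is random, which spreads the collected experience over many candidate seeds; once annealing finishes, the policy mostly exploits the learned estimates while still exploring occasionally.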
Notably, the training phase is performed only once, and no training is needed while running an im query. The study on evolutionary networks at the end of this section further shows that, once trained, the model can be applied many times to address the im problem.
Computational efficiency. We compare the running times of the three algorithms by varying from to . The results on the HepPh, DBLP, LiveJournal and Orkut datasets are reported in Figure 3. We can make the following observations. In terms of computational efficiency, disco outperforms imm and ssa by a large margin. Regardless of the value of , the cost of disco is significantly lower than that of imm and ssa. The performance gain is more prominent when the number of selected seed nodes is small. For example, under the ic model, when and , disco is 23 and 1.57 times faster than ssa on the LiveJournal network, respectively.
Influence quality. Next, we compare the quality of the seed sets generated by imm, ssa and disco under the ic model. For each algorithm, we set the optimal parameter values according to its original paper and then evaluate its quality. As can be seen from Figure 4, our proposed method is as good as imm and ssa on real networks of different sizes, and the influence spread grows with with only minor fluctuations. Note that the scales of the vertical axes in the four figures are different. In our experiments, we also observe that the results of ssa are unstable. In comparison, our method selects a superior result set, and its results are stable across runs. Importantly, disco produces results of the same or even better quality than the state of the art.
HepPh  1  1  0.997  0.996  0.970  0.996  0.95  0.995 
DBLP  1  1  0.942  0.998  0.962  0.998  0.968  0.996 
LiveJournal  1  1  1  0.999  0.985  0.998  0.979  0.998 
Orkut  1  1  0.983  0.995  0.965  0.997  0.972  0.993 
Justification of the seed selection strategy. When we select the seed set, we execute the network embedding process only once and then select nodes with the largest values, instead of updating the embedding every time a seed node is selected. The purpose of this experiment is to show how the results of the strategy in Section V-A differ from those of the traditional hill-climbing strategy, i.e., iteratively selecting a seed and updating the marginal influence of the remaining nodes. To this end, we compare the two strategies in terms of node order and influence quality, and report the results in Table II. The column records the difference in result quality between our seed selection strategy of Section V-A and the traditional hill-climbing method. The column titled records the probability that the node order changes between the two strategies. As can be seen from Table II, the order of nodes in a seed set rarely changes whether we select one node at a time iteratively or select nodes at once ( on Orkut). Once again, this supports our theoretical discussion in Theorem 1 and Claim V.1.
Performance in evolutionary networks. Finally, we evaluate the performance of disco on evolutionary networks following the strategy discussed in Section V-B. Among all the tested datasets, HepPh is associated with evolution logs, i.e., the timestamps at which each node/edge is inserted into the network. Therefore, following the proposed implementation model of disco in evolutionary networks, we train the model using a series of temporal snapshots of the HepPh network taken when it has , and nodes, respectively. The three trained models are then tested on a future snapshot of the network, which contains nodes, during the evolution. As can be seen in Figure 5, the model trained on the nodes snapshot and that trained on the nodes snapshot differ little in terms of influence quality. For instance, when the number of seeds is , the difference between the nodes model and the nodes model in the influence spread is , which is about of the total influenced nodes. Therefore, in real-world dynamic networks, disco can be easily trained using an earlier snapshot or even subnetworks of some snapshots. When the network evolves and expands, we can continue to use the pre-trained model to select seeds in a larger network snapshot.
VII Conclusions
In this work, we have presented a novel framework, disco, to address the im problem by exploiting machine learning techniques. To the best of our knowledge, we are the first to employ deep learning methods to address the classical im problem. Our framework incorporates both dqn and network embedding methods to train a superior approximation of the function in im. Compared to state-of-the-art sampling-based im solutions, disco avoids the costly diffusion sampling phase. After a single round of training on sampled subnetworks, the learned model can be applied many times. Therefore, we are able to achieve superior efficiency compared to state-of-the-art classical solutions. Besides, the result quality of disco in terms of influence spread is the same as or better than that of its competitors. As the seminal effort towards the paradigm of learning-based im solutions, disco demonstrates exciting performance in terms of result quality and running time. We believe that our effort paves the way for a new direction to address the challenging classical im problem.
Acknowledgment
This work is supported by the National Natural Science Foundation of China (No. 61672408), the Director Foundation Project of National Engineering Laboratory for Public Safety Risk Perception and Control by Big Data (PSRPC), the Fundamental Research Funds for the Central Universities (No. JB181505), the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2018JM6073) and the China 111 Project (No. B16037).
Appendix A Proof of Theorem 1
Given , there are two different cases as follows.
Case 1: , then it is ;
Case 2: , then it is .
We shall prove the theorem in Case 1 first.
First, we assume that at state the value of every node is given, and the nodes are ranked in the following order:
For any nodes and , , where is a small positive value. We find that after selecting a seed node , the values recalculated after embedding are updated to the following values, respectively:
As , we calculate the probability for and to keep their order, that is to calculate . Besides, is equivalent to . To get , the above probability can be inferred by scaling to
As , , by the condition , can be obtained, then .
For Case 2, the proof is the same as Case 1.
Appendix B Proof of Claim V.1
The proof can be separated into two stages.
First stage. Firstly, we can prove that when ,
(5) 
where refers to the sum of the embeddings for nodes that are th hop away from node when all nodes in have been removed from the graph.
According to the update process of at state
(6) 
, and ,. We divide into two parts, the former column is , and the last column is . Thus the update formula for at state is:
(7) 
We can get it from Eq. 7 as follows.
Notably, in the initial state, all parameters are random numbers in the interval . Besides, all the parameters in are vectors, whose entries are nonnegative (in implementation they are all normalized into [0,1]). Without loss of generality, we assume all these parameters are dimensional. Then, ReLU can be resolved to get
(8) 
Hereby, represents the final vector after iterations for state . During the current state, and are not seed nodes, so the corresponding is . Next, we will use as an example to expand the equation. are the same as , and are not described here. Then the update formula for each iteration is
(9) 
Assume that have neighbors, which are , then Eq. 9 can be expanded to:
(10) 
Similarly, we assume that have neighbors, namely . For , we assume that there are neighbors, namely . For , we assume that there are neighbors, namely . Because in our experiment, of all nodes, so that can be expressed as follows:
As the weight is fixed, is mainly related to the number of neighbors, and the formula can be solved as . The representation in the network is approximately the sum of the numbers of neighbors of all nodes within four hops of node A.
Finally, we can get Eq. 5.
Second stage. Secondly, we study the right side of Eq. 5, which can be further derived as:
For ease of discussion, we denote the right part for to as to , respectively. For the case , it is not hard to see that is not zero only when the new seed, say , appears as an immediate neighbor of either or (but not of both). Notably, if is a neighbor of both and at the same time, the value within the square brackets will also be zero. Therefore,
where denotes the set of nodes that are hop away from , e.g., is in fact .
Then we carry on with when . If is a 2-hop neighbor of either or , but not of both, appears to be nonzero. Similar to , the probability for this case is . Besides, if appears to be an immediate neighbor of or , but not of both, also appears positive. The probability for this case is the same as for , namely . Then, the expectation of is as follows.
Following the same way, we can derive and .
In fact, it is obvious that