Earned Benefit Maximization in Social Networks Under Budget Constraint

04/08/2020 ∙ by Suman Banerjee, et al. ∙ IIT Gandhinagar

Given a social network with nonuniform selection costs of the users, the problem of Budgeted Influence Maximization (BIM in short) asks for selecting a subset of the nodes within an allocated budget for initial activation, such that, due to the cascading effect, influence in the network is maximized. In this paper, we study a variation of this problem, where a set of nodes is designated as target nodes, each of them is assigned a benefit value that can be earned by influencing it, and our goal is to maximize the earned benefit by initially activating a set of nodes within the budget. We call this the Earned Benefit Maximization Problem. First, we show that this problem is NP-Hard and that the benefit function is monotone and submodular under the Independent Cascade Model of diffusion. We propose an incremental greedy strategy for this problem and show that, with a minor modification, it gives a (1 - 1/√e)-factor approximation guarantee on the earned benefit. Next, by exploiting the submodularity property of the benefit function, we improve the efficiency of the proposed greedy algorithm. Then, we propose a hop-based heuristic method, which works based on the computation of the 'expected earned benefit' of the effective neighbors corresponding to the target nodes. Finally, we perform a series of extensive experiments with four real-life, publicly available social network datasets. From the experiments, we observe that the seed sets selected by the proposed algorithms achieve more benefit than many existing methods. In particular, the hop-based approach is found to be more efficient than the other ones for solving this problem.


I Introduction

Social networks are interconnected structures among a group of agents [1]. They are effective platforms where the word-of-mouth effect happens at a large scale and information, ideas, rumors, etc. disseminate widely and rapidly [13], [25]. This phenomenon has been exploited by e-commerce houses for promoting their brands among people [14], [2]. The key problem is which users to choose for initiating the diffusion process such that the influence in the network is maximized. Formally, this problem is called the Social Influence Maximization (SIM) Problem [23]. Due to its wide applications in different domains, such as viral marketing [11], social recommendation [51], market basket analysis [35], and prediction of hot topics [20], this problem has been studied in different variations; see [31] [7] for recent surveys. Social influence happens due to the cascading process in the underlying network [33, 18], and this has a huge impact, because human decisions, from personal (which place to visit, which restaurant to explore?) to political (which political party to vote for in the coming election?), are influenced by their neighbors, at least to some extent. To study the diffusion process in a social network, several diffusion models have been proposed; see [29] for a recent survey.

One of the recently introduced variants of the SIM Problem is the problem of Budgeted Influence Maximization [39]. This problem assumes that the users of the network have nonuniform selection costs, which signify the amount of incentive that needs to be paid if a user is selected as a seed node. A fixed budget is allocated for the seed set selection process, and the job is to choose highly influential seed nodes within the budget to maximize the influence. There are a few solution methodologies available in the literature for this problem, such as the directed acyclic graph-based heuristic by Nguyen et al. [39], the Sample Average Aggregate Scheme by Guney et al. [19], and ComBIM by Banerjee et al. [4]. In all these studies, it is implicitly assumed that influencing each user is equally important. However, commercial campaigns are targeted in nature, which means a specific brand is to be advertised to a specific set of users, because advertising a brand to people who do not have any interest in it does not make any sense. On the other hand, there are some studies in the literature that consider target users in the influence maximization process [32] [34] [37]. To the best of our knowledge, none of the targeted influence maximization studies considers non-uniform selection costs of the users.

In targeted advertisement scenarios, influencing different target users leads to different amounts of benefit, and from the advertiser's perspective, the main goal is to maximize the total earned benefit. Motivated by this practical scenario, we study the Earned Benefit Maximization Problem (EBM Problem), where the target users are associated with a benefit value, all users are associated with a selection cost, and a fixed budget is given. The goal is to select a seed set within the budget to maximize the earned benefit. We have studied this problem previously. In [6], we came up with an integer programming formulation for this problem. In [3, 5], we proposed a ranking approach that exploits the community structure. However, in this study our approach to this problem is very different. We start by studying the properties of the earned benefit function and propose a number of solutions followed by experiments. In particular, we make the following contributions in this paper.

  • We extend the BIM Problem by considering the notion of target users with non-uniform benefit values and propose the ‘Earned Benefit Maximization Problem’.

  • For the EBM Problem, we propose an ‘incremental greedy strategy’ and show that, with a minor modification, this methodology leads to a (1 - 1/√e)-factor approximation guarantee on the earned benefit.

  • We show that the benefit function is monotone and sub-modular under IC Model of diffusion and exploit this property to improve the efficiency of the incremental greedy algorithm.

  • Using the concept of ‘expected earned benefit’ of a node, we propose an efficient hop-based heuristic for solving this problem.

  • We conduct a set of extensive experiments with four real-world publicly available social network datasets for showing the effectiveness and efficiency of the proposed methodologies.

The rest of the paper is organized as follows. Section II describes some of the recent studies from the literature. Section III presents preliminary definitions, formally defines the EBM Problem, and states its hardness result. The proposed methodologies for solving this problem are described in Section IV. Section V contains the experimental evaluations of the proposed methodologies, and finally, in Section VI, we conclude this study and give future directions.

II Related Work

In this section, we present some closely related studies from the literature. This study is closely related to the SIM Problem and its variants, particularly those that consider targeted users.

Social Influence Maximization and Its Variants

Given a social network of users, which nodes should be chosen for initially injecting the information so that it causes the maximum influence in the network? This problem is known as social influence maximization. Initially, this problem was identified in the context of viral marketing by Domingos and Richardson [41]. However, Kempe et al. [23] were the first to investigate the computational issues of this problem; they proved that it is NP-Hard and proposed an incremental greedy algorithm, which admits a (1 - 1/e)-factor approximation ratio. Their study triggered a vast amount of research on the SIM Problem and hence plenty of solution methodologies are available in the literature, such as Cost-Effective Lazy Forward (CELF) [27], CELF++ [16], SIMPATH [17], Two-Phase Influence Maximization (TIM) [46], Influence Maximization Via Martingales (IMM) [45], Influence Ranking and Influence Estimation (IRIE) [22], different community-based solution methodologies [42] [30], different non-traditional optimization algorithms such as genetic algorithm [53], discrete particle swarm optimization [48], and many more. Also, there are several variants of this problem studied in the literature, such as the -coverage problem [36], the budgeted influence maximization problem [39, 4], and many more.

Social Influence Maximization for the Targeted Users

Recently, the problem of influence maximization for targeted users in a social network has been addressed by researchers. Li et al. [32] studied this problem and considered target users who are relevant to a particular keyword. Their solution methodology was based on the construction of reverse influence sets and their indexing. Song et al. [43] addressed the targeted influence maximization problem by considering the geographical location of the users and the time deadline within which the users should be influenced. Recently, Wen et al. [50] studied this problem focusing mainly on two issues: how to capture the social influence among the target users, and how to develop an efficient scheme that offers a wider influence spread among the target users. Wang et al. [49] also solved the same problem by considering the impact of budget on the influence spread and incorporating efficient sampling techniques.

However, to the best of the authors’ knowledge, none of the existing studies on the targeted influence maximization problem considers both the nonuniform benefit associated with each target user and the nonuniform selection cost of the users. In this paper, we study the Earned Benefit Maximization Problem, where the target users are associated with non-uniform benefit values and all users are associated with non-uniform selection costs.

III Background and Problem Definition

We consider that the social network is represented as a weighted graph $G(V, E, \mathcal{P})$, where the vertex set $V(G)$ is the set of users of the network and the edge set $E(G)$ represents the set of social ties among the users. $\mathcal{P}$ is the edge weight function that assigns each edge to its influence probability, i.e., $\mathcal{P}: E(G) \rightarrow (0, 1]$. For any edge $(uv) \in E(G)$, we denote its influence probability as $\mathcal{P}_{uv}$. This signifies the probability that the user $u$ will be able to influence $v$. We denote the number of nodes and edges of $G$ by $n$ and $m$, respectively. Next, we briefly describe the Independent Cascade Model, which we consider as the underlying diffusion model in our study.

III-A The Independent Cascade Model

The Independent Cascade Model is one of the models that has been predominantly used in the influence maximization literature [23, 46, 45]. Here, the diffusion of information starts from a set of initially selected nodes, known as the seed nodes. Initially, all the nodes of the network are ignorant of the information except the seed nodes, which are informed at time $t = 0$. From these seed nodes, information is diffused by the following rules:

  • information is diffused in discrete time steps,

  • a node can be in one of two states: ‘active’ (‘influenced’) or ‘inactive’ (‘uninfluenced’),

  • a node can change its state from inactive to active, but not vice versa,

  • once a node is influenced, it remains in this state.

Each node (say, $u$) that became active at the current time stamp (say, $t$) gets a single chance to activate each of its currently inactive neighbors (say, $v$, with $(uv) \in E(G)$ and $v$ inactive), and it succeeds with probability equal to the edge weight $\mathcal{P}_{uv}$. If any one of these attempts on $v$ succeeds, then $v$ becomes an active node at time $t+1$. Only the most recently activated nodes take part in the triggering process. This process stops when no more node activation is possible. Next, we introduce the Earned Benefit Maximization Problem.
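
The diffusion rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper; the adjacency-list layout and the function name `simulate_ic` are assumptions made for the example.

```python
import random
from collections import deque


def simulate_ic(graph, seeds, rng):
    """Run one Independent Cascade diffusion and return the set of active nodes.

    graph: dict mapping node -> list of (neighbor, influence_probability)
           (hypothetical layout chosen for this sketch)
    seeds: iterable of initially active nodes
    rng:   a random.Random instance, so that runs are reproducible
    """
    active = set(seeds)
    frontier = deque(active)  # nodes that were activated but have not yet tried
    while frontier:
        u = frontier.popleft()
        for v, p in graph.get(u, []):
            # u gets exactly one chance to activate each inactive neighbor v
            if v not in active and rng.random() < p:
                active.add(v)          # once active, a node never reverts
                frontier.append(v)
    return active
```

Processing the frontier in first-in, first-out order visits nodes in the same order as the discrete time steps, so the final active set is distributed as in the step-by-step description.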

III-B The Earned Benefit Maximization Problem

In this problem, along with the social network $G(V, E, \mathcal{P})$, a subset of the users $T \subseteq V(G)$ is given as the target users. Each of them is associated with a benefit, which can be earned by influencing the corresponding target user. This can be characterized by the benefit function $b: V(G) \rightarrow \mathbb{R}_{0}^{+}$. For any $u \in T$, the benefit is denoted as $b(u)$, and for any $u \notin T$, $b(u) = 0$. For a seed set $S$, the set of nodes influenced by it is denoted by $I(S)$. As the diffusion of information under the IC Model is a probabilistic process, the influence of a seed set is measured in terms of expectation. Hence, the number of influenced nodes due to the seed set $S$ is $\sigma(S) = \mathbb{E}[|I(S)|]$, where $\sigma$ is the social influence function [23]. Now, the earned benefit by the seed set $S$ is defined as $\beta(S) = \mathbb{E}\big[\sum_{u \in I(S) \cap T} b(u)\big]$. Here, $\beta$ is the earned benefit function, which maps each subset of the nodes to its expected earned benefit value, i.e., $\beta: 2^{V(G)} \rightarrow \mathbb{R}_{0}^{+}$.

In real-world campaigns, earned benefit maximization is done by conducting an information diffusion process. As real-life social networks are formed by rational human agents, incentivization is required if a user is selected as a seed. This can be characterized by the cost function $C: V(G) \rightarrow \mathbb{R}^{+}$. The selection cost associated with the node $u$ is denoted as $C(u)$. For a subset of nodes $S \subseteq V(G)$, the selection cost is denoted as $C(S) = \sum_{u \in S} C(u)$, and a fixed budget $B$ is allocated for seed set selection. Hence, the problem here is to choose a subset $S$ from $V(G)$ to maximize the function $\beta(S)$ subject to the constraint $C(S) \leq B$. Formally, the problem can be expressed as follows:

Earned Benefit Maximization Problem
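Input: Social Network $G(V, E, \mathcal{P})$, Target Nodes $T \subseteq V(G)$, Cost Function $C$, Benefit Function $b$, and Budget $B$.

Problem: Find the seed set $S^{*}$ ($S^{*} \subseteq V(G)$) such that $C(S^{*}) \leq B$ and, for any other seed set $S$ with $C(S) \leq B$, $\beta(S^{*}) \geq \beta(S)$.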

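The EBM Problem is a generalization of the BIM Problem: when every node is a target node with unit benefit, the earned benefit equals the expected number of influenced nodes. The BIM Problem is NP-Hard under the IC Model of diffusion [39]. Hence, Theorem 1 holds.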

Theorem 1.

The EBM Problem is NP-Hard under Independent Cascade Model of diffusion.

This result motivates us to design suitable approximation algorithms and heuristic solutions for this problem. We discuss them in the next section.

IV Proposed Methodology

In this section, we present our proposed methodologies for the EBM problem. Prior to that, we establish two properties of the benefit function, which will be used subsequently.

IV-A Properties of the Benefit Function

As mentioned previously, the benefit earned by a given seed set $S$ is defined as $\beta(S) = \mathbb{E}\big[\sum_{u \in I(S) \cap T} b(u)\big]$. So, the benefit function can be thought of as a set function, which is defined on the ground set $V(G)$, i.e., $\beta: 2^{V(G)} \rightarrow \mathbb{R}_{0}^{+}$. Now, we prove two important properties of the benefit function, namely monotonicity and sub-modularity. These two properties are exploited for proving the approximation guarantee of Algorithm 2.

Definition 1 (Non-negativity and Monotonicity of Set Function).

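A set function $f: 2^{X} \rightarrow \mathbb{R}$ defined over the ground set $X$ is said to be non-negative if $\forall S \subseteq X$, $f(S) \geq 0$, and monotone if $\forall S \subseteq X$, $\forall u \in X \setminus S$, $f(S \cup \{u\}) \geq f(S)$.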

Lemma 1.

The benefit function $\beta$ is non-negative and monotone under the IC Model of diffusion.

Proof.

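It is reported in the literature that the social influence function $\sigma$ is non-negative and monotone [23]. As $\forall u \in T$, $b(u) \geq 0$, it is trivial to observe that $\forall S \subseteq V(G)$, $\beta(S) \geq 0$. By the monotonicity property of $\sigma$, adding a node to the seed set never removes a node from the influenced set, so $\forall S \subseteq V(G)$, $\forall u \in V(G) \setminus S$,

$$\beta(S \cup \{u\}) \geq \beta(S),$$

which means $\beta$ is monotone. This completes the proof. ∎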

Definition 2 (Sub-modularity of Set Function).

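A set function $f: 2^{X} \rightarrow \mathbb{R}$ defined over the ground set $X$ is said to be sub-modular if $\forall S_1 \subseteq S_2 \subseteq X$ and $\forall u \in X \setminus S_2$, the following condition is met:

$$f(S_1 \cup \{u\}) - f(S_1) \geq f(S_2 \cup \{u\}) - f(S_2) \tag{1}$$

Lemma 2.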

The benefit function $\beta$ is sub-modular under the IC Model of diffusion.

Proof.

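In the literature, it is mentioned that the social influence function $\sigma$ is sub-modular under the IC Model of diffusion [23]. Let us assume that $S_1 \subseteq S_2 \subseteq V(G)$ and $u \in V(G) \setminus S_2$. Now, from the definition of $\beta$, for any fixed realization of the diffusion process we have

$$\beta(S_1 \cup \{u\}) - \beta(S_1) = \sum_{v \in (I(S_1 \cup \{u\}) \setminus I(S_1)) \cap T} b(v).$$

By simple set theoretic interpretation, a node that is newly influenced when $u$ is added to $S_2$ is also newly influenced when $u$ is added to $S_1$, so we can write

$$I(S_2 \cup \{u\}) \setminus I(S_2) \subseteq I(S_1 \cup \{u\}) \setminus I(S_1).$$

Hence, taking expectation over the realizations,

$$\beta(S_1 \cup \{u\}) - \beta(S_1) \geq \beta(S_2 \cup \{u\}) - \beta(S_2).$$

[This is due to the non-negativity of $b$ and the monotonicity property of $\sigma$]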

We obtain the inequality required to show sub-modularity property of . This completes the proof. ∎
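
As a sanity check on Definitions 1 and 2, the two properties can be verified by brute force on a small ground set. The sketch below is only an illustration under stated assumptions: it uses a deterministic coverage-style function as a stand-in for the benefit function (hypothetical toy data), not an actual diffusion process, and the helper name `is_monotone_submodular` is invented for the example.

```python
from itertools import combinations


def is_monotone_submodular(ground, f):
    """Exhaustively check monotonicity and the diminishing-returns inequality."""
    subsets = [frozenset(c)
               for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for s1 in subsets:
        for s2 in subsets:
            if not s1 <= s2:          # only pairs with s1 a subset of s2
                continue
            for u in ground:
                if u in s2:
                    continue
                gain1 = f(s1 | {u}) - f(s1)
                gain2 = f(s2 | {u}) - f(s2)
                # gain2 < 0 violates monotonicity;
                # gain1 < gain2 violates sub-modularity
                if gain2 < 0 or gain1 < gain2:
                    return False
    return True
```

For instance, a function that counts the distinct items covered by the chosen nodes passes the check, whereas `lambda S: len(S) ** 2` fails it because its marginal gains grow with the set.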

IV-B Incremental Greedy Algorithm

Let $S$ be the seed set and $u \in V(G) \setminus S$. We define the marginal gain in benefit for the node $u$ with respect to the seed set $S$ as the amount of increased benefit when the node $u$ is included in the seed set $S$. Formally, it is stated in Definition 3.

Definition 3 (Marginal Gain in Earned Benefit).

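Given a seed set $S$ and a node $u$ that is currently not in the seed set, i.e., $u \in V(G) \setminus S$, its marginal gain in the earned benefit with respect to the seed set $S$ is denoted as $\delta_{u}(S)$ and defined as

$$\delta_{u}(S) = \beta(S \cup \{u\}) - \beta(S) \tag{2}$$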

The working principle of the proposed incremental greedy strategy is as follows. Starting with an empty seed set, this procedure incrementally selects a node within the budget that causes the maximum marginal gain per unit cost. Let $S_i$ and $B_i$ denote the seed set and the remaining budget at the end of the $i$-th iteration. In the $(i+1)$-th iteration, the node $u_{i+1}$ is added to the seed set $S_i$, i.e., $S_{i+1} = S_i \cup \{u_{i+1}\}$, if the following condition is met.

$$u_{i+1} = \underset{u \in V(G) \setminus S_i,\; C(u) \leq B_i}{\arg\max} \; \frac{\delta_{u}(S_i)}{C(u)} \tag{3}$$

In an iteration, if no seed node can be selected within the remaining budget, then $u_{i+1}$ is null, and in that case the procedure exits. Algorithm 1 states the procedure.

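Input: Social Network $G(V, E, \mathcal{P})$, Target Nodes $T \subseteq V(G)$, Cost Function $C$, Benefit Function $b$, and Budget $B$.
Output: The seed set $S \subseteq V(G)$ such that $C(S) \leq B$.
1: $S \leftarrow \emptyset$
2: while $B > 0$ do
3:    $u^{*} \leftarrow \arg\max_{u \in V(G) \setminus S,\; C(u) \leq B} \; \delta_{u}(S) / C(u)$
4:    if $u^{*} = \text{NULL}$ then
5:       break
6:    end if
7:    $S \leftarrow S \cup \{u^{*}\}$
8:    $B \leftarrow B - C(u^{*})$
9: end while
10: return $S$
Algorithm 1 Incremental Greedy Algorithm for the EBM Problem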
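
The incremental greedy strategy can be sketched as follows. This is a hedged illustration, not the authors' code: the earned benefit is abstracted as a callable `benefit` that takes a set of nodes (in practice it would be an estimate under the IC Model), and all names are assumptions made for the example.

```python
def incremental_greedy(nodes, cost, benefit, budget):
    """Repeatedly add the affordable node with the largest marginal gain
    per unit cost; stop when no affordable node remains.

    nodes:   iterable of candidate nodes
    cost:    dict node -> positive selection cost
    benefit: callable mapping a set of nodes to its earned benefit
             (assumed oracle, e.g. a Monte Carlo estimate)
    budget:  total budget available for seed selection
    """
    nodes = list(nodes)
    seeds, remaining = set(), budget
    while True:
        base = benefit(seeds)
        best, best_ratio = None, float("-inf")
        for u in nodes:
            if u in seeds or cost[u] > remaining:
                continue  # already chosen, or not affordable any more
            ratio = (benefit(seeds | {u}) - base) / cost[u]
            if ratio > best_ratio:
                best, best_ratio = u, ratio
        if best is None:  # no node fits in the remaining budget
            break
        seeds.add(best)
        remaining -= cost[best]
    return seeds, remaining
```

Each iteration calls the benefit oracle once per remaining candidate, which is why the cost of a single benefit estimation dominates the running time.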

Though Algorithm 1 is simple to understand, it does not give any bounded approximation guarantee on the earned benefit, and we demonstrate this claim with an example.

Example 1.

Let us assume a network with $(n+1)$ nodes $\{u_0, u_1, \ldots, u_n\}$, where $u_0$ is an isolated node and the remaining nodes are connected among themselves, with each edge having the diffusion probability $1$. The entire vertex set of the network is the target node set, i.e., $T = V(G)$. The benefit associated with each target node is $1$. For each $u_i$ with $i \geq 1$, its associated selection cost is $B$, and the selection cost of $u_0$ is $1$, where $B > n$. The allocated budget for the seed set selection process is $B$. The optimal algorithm for this problem should select any node $u_i$ with $i \geq 1$ and achieve an earned benefit of $n$ by influencing all the remaining connected nodes. However, as Algorithm 1 selects the seed node based on the marginal gain in the earned benefit per unit cost, it will select the node $u_0$ and not any $u_i$. For the node $u_0$, the value of $\delta_{u_0}(S)/C(u_0)$, when $S = \emptyset$, is $1$. On the other hand, for any $u_i$ with $i \geq 1$, the value of $\delta_{u_i}(S)/C(u_i)$ is $n/B$. As $1 > n/B$, Algorithm 1 selects the node $u_0$. After selecting the node $u_0$, the remaining budget will be $B - 1$, which is less than $B$. Within this budget none of the remaining nodes can be selected, as each of them has the selection cost $B$. Hence, Algorithm 1 terminates by earning the benefit $1$ and returning an unutilized budget of amount $B - 1$. The approximation ratio of Algorithm 1 is defined as

$$\rho = \frac{\beta(S)}{\beta(S^{*})}.$$

In this example the value of $\rho$ is $1/n$. If the value of $n$ is arbitrarily large, then the approximation ratio of Algorithm 1 becomes arbitrarily small. Hence, Theorem 2 holds.

Theorem 2.

Algorithm 1 does not provide any constant approximation guarantee.

Now, we present two important inequalities on the iterative performance of Algorithm 1; these results will be used subsequently.

Lemma 3.

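After each iteration $i$ of the ‘while’ loop, the following inequality always holds, where $u_i$ is the node selected in the $i$-th iteration, $S_i = S_{i-1} \cup \{u_i\}$, and $S^{*}$ is an optimal seed set:

$$\beta(S_i) - \beta(S_{i-1}) \geq \frac{C(u_i)}{B} \big( \beta(S^{*}) - \beta(S_{i-1}) \big) \tag{4}$$

Proof.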

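The value of $\beta(S^{*}) - \beta(S_{i-1})$ is no more than the sum of the benefit values of the target nodes that are influenced by the seed nodes in $S^{*}$, however, not by the nodes in $S_{i-1}$. For each node in $S^{*} \setminus S_{i-1}$, the earned benefit to cost ratio could be at most $\delta_{u_i}(S_{i-1}) / C(u_i)$, where $\delta_{u_i}(S_{i-1})$ is the earned benefit by the nodes in $S_i$ but not in $S_{i-1}$. This is because $u_i$ maximizes this ratio in Algorithm 1. Since the total selection cost of the nodes in $S^{*} \setminus S_{i-1}$ is bounded by the budget $B$, the total earned benefit due to the target nodes in $S^{*} \setminus S_{i-1}$ can be at most $B \cdot \delta_{u_i}(S_{i-1}) / C(u_i)$. Hence, we have

$$\beta(S^{*}) - \beta(S_{i-1}) \leq B \cdot \frac{\delta_{u_i}(S_{i-1})}{C(u_i)} \tag{5}$$

By definition, we have

$$\delta_{u_i}(S_{i-1}) = \beta(S_{i-1} \cup \{u_i\}) - \beta(S_{i-1}) = \beta(S_i) - \beta(S_{i-1}) \tag{6}$$

From Equations (5) and (6), we have

$$\beta(S_i) - \beta(S_{i-1}) \geq \frac{C(u_i)}{B} \big( \beta(S^{*}) - \beta(S_{i-1}) \big).$$

This completes the proof. ∎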

Lemma 4.

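In any arbitrary iteration $i$ of the while loop from Line 2 to 9 of Algorithm 1, the following condition will be true:

$$\beta(S_i) \geq \Big[ 1 - \prod_{k=1}^{i} \Big( 1 - \frac{C(u_k)}{B} \Big) \Big] \beta(S^{*})$$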

Proof.

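We prove this statement by induction on the iterations of the ‘while’ loop. For the first iteration, i.e., at $i = 1$, we need to show

$$\beta(S_1) \geq \frac{C(u_1)}{B} \beta(S^{*}).$$

From Lemma 3, by putting $i = 1$ in Equation (4), we have

$$\beta(S_1) - \beta(S_0) \geq \frac{C(u_1)}{B} \big( \beta(S^{*}) - \beta(S_0) \big).$$

As we are starting with an empty seed set, $S_0 = \emptyset$ and $\beta(S_0) = 0$. This clearly implies that $\beta(S_1) \geq \frac{C(u_1)}{B} \beta(S^{*})$.

Now, suppose the statement holds till the $i$-th iteration. We show that the statement holds in the $(i+1)$-th iteration as well. Now,

$$\begin{aligned}
\beta(S_{i+1}) &= \beta(S_i) + \big( \beta(S_{i+1}) - \beta(S_i) \big) \\
&\geq \beta(S_i) + \frac{C(u_{i+1})}{B} \big( \beta(S^{*}) - \beta(S_i) \big) \\
&= \Big( 1 - \frac{C(u_{i+1})}{B} \Big) \beta(S_i) + \frac{C(u_{i+1})}{B} \beta(S^{*}) \\
&\geq \Big( 1 - \frac{C(u_{i+1})}{B} \Big) \Big[ 1 - \prod_{k=1}^{i} \Big( 1 - \frac{C(u_k)}{B} \Big) \Big] \beta(S^{*}) + \frac{C(u_{i+1})}{B} \beta(S^{*}) \\
&= \Big[ 1 - \prod_{k=1}^{i+1} \Big( 1 - \frac{C(u_k)}{B} \Big) \Big] \beta(S^{*}).
\end{aligned}$$

Here, the first inequality is due to Lemma 3 and the second one is due to the inductive hypothesis. ∎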

Algorithm 1 can be modified to yield a constant approximation ratio on the earned benefit. Let $S_1$ be the seed set generated by Algorithm 1, and let $u_{max}$ be the node that has the highest individual benefit gain. We compare the earned benefit when the seed set is $S_1$ and when it is the single node $u_{max}$, and we return the seed set that maximizes the earned benefit. Algorithm 2 formally states the procedure.

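Input: Social Network $G(V, E, \mathcal{P})$, Target Nodes $T \subseteq V(G)$, Cost Function $C$, Benefit Function $b$, and Budget $B$.
Output: The seed set $S \subseteq V(G)$ such that $C(S) \leq B$.
1: $S \leftarrow \emptyset$
2: $S_1 \leftarrow$ seed set returned by Algorithm 1
3: $u_{max} \leftarrow \arg\max_{u \in V(G),\; C(u) \leq B} \; \beta(\{u\})$
4: $S \leftarrow \arg\max_{S' \in \{S_1, \{u_{max}\}\}} \; \beta(S')$
5: return $S$
Algorithm 2 Modified Incremental Greedy Algorithm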
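
The effect of the modification can be seen on a small instance in the spirit of Example 1. The sketch below is an illustration under assumptions: `benefit` is an abstract oracle over sets of nodes, and the function names are invented for this example rather than taken from the paper.

```python
def greedy_by_ratio(nodes, cost, benefit, budget):
    """Plain incremental greedy on marginal gain per unit cost."""
    seeds, remaining = set(), budget
    while True:
        base = benefit(seeds)
        affordable = [u for u in nodes
                      if u not in seeds and cost[u] <= remaining]
        if not affordable:
            return seeds
        best = max(affordable,
                   key=lambda u: (benefit(seeds | {u}) - base) / cost[u])
        seeds.add(best)
        remaining -= cost[best]


def modified_greedy(nodes, cost, benefit, budget):
    """Return the better of the greedy seed set and the best single node."""
    s1 = greedy_by_ratio(nodes, cost, benefit, budget)
    singles = [u for u in nodes if cost[u] <= budget]
    if not singles:
        return s1
    u_max = max(singles, key=lambda u: benefit({u}))
    return s1 if benefit(s1) >= benefit({u_max}) else {u_max}
```

With one cheap isolated node of benefit 1 and five mutually connected nodes whose selection cost equals the whole budget, the plain greedy rule keeps only the cheap node, while the modified rule returns one of the connected nodes and earns benefit 5.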

Algorithm 2 provides a bounded approximation guarantee, which is stated in Theorem 3.

Theorem 3.

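If $S$ is the seed set selected by Algorithm 2 and $S^{*}$ is the optimal seed set, then $\beta(S) \geq (1 - \frac{1}{\sqrt{e}}) \beta(S^{*})$, where $e$ is the base of the natural logarithm. In other words, Algorithm 2 provides an approximation guarantee of $(1 - \frac{1}{\sqrt{e}})$.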

Proof.

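The strategy of this proof has been used previously for proving the approximation bound of the Budgeted Maximum Coverage Problem by Khuller et al. [24]. Here, we prove the statement by case-wise analysis of Algorithm 2, where $S_1$ denotes the seed set generated by Algorithm 1 and $u_{max}$ the node with the highest individual earned benefit.
Case I
If there exists one node $u \in V(G)$, which has the earned benefit $\beta(\{u\})$, and $\beta(\{u\})$ is found to be greater than or equal to $\frac{1}{2} \beta(S^{*})$, then $\beta(\{u_{max}\}) \geq \frac{1}{2} \beta(S^{*})$ for the node $u_{max}$ selected in Algorithm 2. In this case, the approximation ratio of Algorithm 2 will be as follows:

$$\frac{\beta(S)}{\beta(S^{*})} \geq \frac{\beta(\{u_{max}\})}{\beta(S^{*})} \geq \frac{1}{2}.$$

Case II
If Case I does not happen, then there does not exist any $u \in V(G)$ for which $\beta(\{u\})$ is greater than or equal to $\frac{1}{2} \beta(S^{*})$. This can be divided into two sub-cases.
Case IIa
Now, if we have $C(S_1) \leq \frac{B}{2}$, then $\forall u \in V(G) \setminus S_1$, $C(u) > \frac{B}{2}$. Hence, no more node can be added to $S_1$. Otherwise, the budget constraint will be violated. Without loss of generality, let us assume that $S^{*} \setminus S_1 \neq \emptyset$. In this case, $S^{*} \setminus S_1$ can contain at most one node without violating the budget constraint. Now, as the function $\beta$ is sub-modular, hence,

$$\beta(S^{*}) \leq \beta(S^{*} \cap S_1) + \beta(S^{*} \setminus S_1).$$

As $S^{*} \cap S_1 \subseteq S_1$, $\beta(S^{*} \cap S_1) \leq \beta(S_1)$. This clarifies that $\beta(S^{*}) \leq \beta(S_1) + \beta(S^{*} \setminus S_1)$. As $|S^{*} \setminus S_1| \leq 1$ and no single node earns $\frac{1}{2} \beta(S^{*})$, $\beta(S^{*} \setminus S_1) < \frac{1}{2} \beta(S^{*})$. This essentially means $\beta(S_1) > \frac{1}{2} \beta(S^{*})$. In this case, the approximation ratio of Algorithm 2 will be as follows:

$$\frac{\beta(S)}{\beta(S^{*})} \geq \frac{\beta(S_1)}{\beta(S^{*})} \geq \frac{1}{2}.$$

Case IIb
If $C(S_1) > \frac{B}{2}$, we first observe that for positive real numbers $a_1, a_2, \ldots, a_l$ and $A = \sum_{k=1}^{l} a_k$, the function $\prod_{k=1}^{l} (1 - \frac{a_k}{B})$ attains its maximum value when $a_1 = a_2 = \cdots = a_l = \frac{A}{l}$. Hence, by Lemma 4 with $l = |S_1|$, we have

$$\beta(S_1) \geq \Big[ 1 - \prod_{k=1}^{l} \Big( 1 - \frac{C(u_k)}{B} \Big) \Big] \beta(S^{*}) \geq \Big[ 1 - \Big( 1 - \frac{1}{2l} \Big)^{l} \Big] \beta(S^{*}) \geq \Big( 1 - \frac{1}{\sqrt{e}} \Big) \beta(S^{*}).$$

Hence, the worst case performance guarantee of Algorithm 2 is $(1 - \frac{1}{\sqrt{e}})$. This proves the statement. ∎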

Now, we investigate the time requirement of Algorithms 1 and 2. For both of them, it is easy to observe that the time requirement is heavily dependent on the earned benefit calculation for a given seed set. It is reported in the literature that counting the number of influenced nodes for a given seed set is a #P-Hard problem [23]. With this argument, we can say that for a given seed set $S$, computing the exact value of the earned benefit $\beta(S)$ is also #P-Hard. Hence, we estimate this value the way the influence of a seed set is estimated [23]. First, a number (say $R$) of sampled graphs of $G$, i.e., $G_1, G_2, \ldots, G_R$, are generated, such that for all $i \in \{1, 2, \ldots, R\}$ and for all $(uv) \in E(G)$, $(uv) \in E(G_i)$ with probability $\mathcal{P}_{uv}$ and $(uv) \notin E(G_i)$ with probability $1 - \mathcal{P}_{uv}$. Now, the earned benefit is computed in all of the sampled graphs and the average value is returned as its approximate value, which is given in Equation (7), where $\beta_{G_i}(S)$ is the benefit earned by $S$ in the sampled graph $G_i$:

$$\beta(S) \approx \frac{1}{R} \sum_{i=1}^{R} \beta_{G_i}(S) \tag{7}$$
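
The sampling-based estimate can be sketched as follows. This is a minimal illustration, assuming the network is given as a dictionary from directed edges to influence probabilities and the benefits as a dictionary over target nodes; the names `estimate_benefit`, `edges`, and `num_samples` are assumptions for this example.

```python
import random


def estimate_benefit(edges, seeds, benefit, num_samples, rng):
    """Average the earned benefit over `num_samples` sampled graphs.

    edges:   dict (u, v) -> influence probability of the edge u -> v
    seeds:   iterable of seed nodes
    benefit: dict target node -> benefit (non-target nodes are absent)
    rng:     a random.Random instance
    """
    total = 0.0
    for _ in range(num_samples):
        # sample one graph: keep each edge independently with its probability
        live = {}
        for (u, v), p in edges.items():
            if rng.random() < p:
                live.setdefault(u, []).append(v)
        # nodes reachable from the seeds in the sampled graph are influenced
        reached, stack = set(seeds), list(seeds)
        while stack:
            u = stack.pop()
            for v in live.get(u, []):
                if v not in reached:
                    reached.add(v)
                    stack.append(v)
        total += sum(benefit.get(v, 0.0) for v in reached)
    return total / num_samples
```

Each sample costs one pass over the edges plus one graph traversal, which matches the per-sample traversal cost used in the running time analysis.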

If $|V(G)| = n$ and $|E(G)| = m$, then traversing $R$ subgraphs will require $\mathcal{O}(R(m+n))$ time. Let $C_{min}$ be the minimum selection cost among all the users. The maximum number of possible iterations of the while loop (Line 2 to 9) in Algorithm 1 is $\frac{B}{C_{min}}$. Hence, $|S| \leq \frac{B}{C_{min}}$. In Algorithm 1, in each iteration, the maximum number of earned benefit estimations is of $\mathcal{O}(n)$. Hence, the total number of earned benefit estimations is of $\mathcal{O}(\frac{B}{C_{min}} n)$. The required computational time for Algorithm 1 is $\mathcal{O}(\frac{B}{C_{min}} n R (m+n))$.

In Algorithm 2, along with the incremental greedy strategy, the node that can grab the maximum earned benefit has to be found (Line 3 of Algorithm 2). This can be done with $n$ earned benefit estimations with a single seed node, and this will take $\mathcal{O}(n R (m+n))$ time. Hence, the running time of Algorithm 2 is of $\mathcal{O}(\frac{B}{C_{min}} n R (m+n) + n R (m+n)) = \mathcal{O}(\frac{B}{C_{min}} n R (m+n))$. If we do on-line sampling of the input social network for sampled graph generation, then only one subgraph is required per iteration. Storing this network will take $\mathcal{O}(m+n)$ space. Storing the seed set requires $\mathcal{O}(|S|)$ space. Hence, the total amount of space required by both Algorithms 1 and 2 is $\mathcal{O}(m+n+|S|)$, and the number of seed nodes is generally found to be much less than the number of nodes, i.e., $|S| \ll n$. Hence, $\mathcal{O}(m+n+|S|) = \mathcal{O}(m+n)$. Hence, Theorem 4 holds.

Theorem 4.

Algorithms 1 and 2 have the running time of $\mathcal{O}(\frac{B}{C_{min}} n R (m+n))$ and space requirement of $\mathcal{O}(m+n)$.

IV-C Improving the Efficiency of Algorithm 2

Though Algorithm 2 provides a provable approximation bound on the earned benefit, it is highly inefficient, as it estimates the earned benefit many times. Here, we present an improved version of Algorithm 2 in Algorithm 3, which removes redundant earned benefit estimations by exploiting the sub-modularity property of the earned benefit function.

Input: Social Network $G(V, E, \mathcal{P})$, Target Nodes $T \subseteq V(G)$, Cost Function $C$, Benefit Function $b$, and Budget $B$.
Output: The seed set $S \subseteq V(G)$ such that $C(S) \leq B$.
3:
4:while  do
5:   for All  do
6:      
7:   end for
8:   while True do
9:      
10:      if  then
11:         
12:         
13:      else
14:         
15:         
16:      end if
17:   end while
18:end while
19:
Algorithm 3 Incremental Greedy Algorithm with Improved Performance in terms of Efficiency (IGAIP).

In Lemma 2, it has been shown that the earned benefit function is sub-modular. This implies that the marginal gain in earned benefit for a non-seed node (say $u$) with respect to the seed set in the $i$-th iteration will always be at least that of $u$ with respect to the seed set in any later iteration $j$ ($j > i$). In Algorithm 3, in the first iteration of the while loop, the earned benefit of each node in $V(G)$ individually is computed, the nodes are sorted in descending order, and the node with the highest individual earned benefit is put in the seed set. From the next iteration onwards, the marginal gains of the non-seed nodes are recomputed in descending order of their previously computed marginal earned benefit. As soon as we get a node whose marginal gain in the current iteration is more than the previous-iteration value of the next node in the sorted list, we include the first node and move to the next iteration. This is because, as the benefit function is sub-modular, even if the marginal gain were recomputed for the second and the subsequent nodes, it could not be more than their values in the previous iteration. This process is iterated until the budget is exhausted. One important point to observe here is that skipping the unnecessary benefit function evaluations does not result in losing the approximation guarantee on the quality of the selected seed set. This exploitation of the sub-modularity property results in a significant improvement in the efficiency of our proposed methodology, as we observe in our experiments.
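
The lazy re-evaluation idea described above can be sketched with a priority queue, in the style of the CELF technique [27]. This is not a transcription of Algorithm 3; it is a hedged illustration that ranks nodes by marginal gain per unit cost, abstracts the earned benefit as a callable `benefit`, and uses names invented for this example.

```python
import heapq


def lazy_greedy(nodes, cost, benefit, budget):
    """Greedy selection that re-evaluates a node only when it reaches the top
    of the queue with a stale marginal gain."""
    seeds, remaining = set(), budget
    current = benefit(seeds)
    # Heap entries: (-gain/cost, tie-breaker, node, |seeds| when gain was computed)
    heap = [(-(benefit({u}) - current) / cost[u], i, u, 0)
            for i, u in enumerate(nodes)]
    heapq.heapify(heap)
    evaluations = len(heap)  # number of marginal gain evaluations so far
    while heap:
        _, i, u, stamp = heapq.heappop(heap)
        if cost[u] > remaining:
            continue  # the budget only shrinks, so u can be discarded
        if stamp == len(seeds):
            # The gain is up to date; by sub-modularity every other entry is
            # an upper bound, so u is the best affordable node right now.
            seeds.add(u)
            remaining -= cost[u]
            current = benefit(seeds)
        else:
            gain = benefit(seeds | {u}) - current
            evaluations += 1
            heapq.heappush(heap, (-gain / cost[u], i, u, len(seeds)))
    return seeds, evaluations
```

The returned counter makes the saving visible: it can be compared with the number of evaluations a non-lazy greedy loop would perform on the same input.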

IV-D Efficient Heuristic Solution

Though Algorithm 3 is quite efficient, it is not enough to deal with large real-life social networks. Here, we propose an efficient heuristic solution for the EBM Problem. Before stating the procedure, we first state one important aspect of social influence. In social networks, the influence of a node is bounded within a few hops, which is called the influence zone of a node [44] [10]. According to Goel et al. [15], in a diffusion cascade, only a small fraction of the influenced nodes reside more than a few hops away from any seed node. These existing results reported in the literature motivate us to design an algorithm that considers the locality of the influence effect. Based on this principle, to influence a target node, there should be at least one seed node within a few hops. In this context, we define the $h$-hop neighbor of a node as follows:

Definition 4 ($h$-hop Neighbor).

For a node $u$, its $h$-hop neighbor is defined as the set of nodes that are at most at a distance of $h$ from $u$ and denoted as $N^{h}(u)$, i.e., $N^{h}(u) = \{v \in V(G) : dist(u, v) \leq h\}$.
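
The $h$-hop neighbor set can be computed with a depth-limited breadth first search. The sketch below is an illustration under assumptions: distance is counted in hops, the node itself is excluded from its own neighbor set, and for a directed network `adj` should list the neighbors in the direction of interest (for a target node, the nodes that have an edge into it).

```python
from collections import deque


def h_hop_neighbors(adj, source, h):
    """Return the nodes within 1..h hops of `source`.

    adj: dict node -> list of adjacent nodes (hypothetical layout)
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == h:
            continue  # do not expand beyond depth h
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {v for v in dist if v != source}
```

For the undirected path a, b, c, d, the call `h_hop_neighbors(adj, "a", 2)` yields the two nodes b and c.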

Among the nodes present in the $h$-hop neighbor set of a target node, there can be many nodes whose influence probability to the target node is extremely low. Hence, those nodes are unlikely to influence the target node. To identify such nodes, it is important to compute the influence probability. For a target node $t$, here we describe the procedure for computing $\mathcal{P}(u \rightarrow t)$, where $u \in N^{h}(t)$. We construct the breadth first search tree up to depth $h$ rooted at the node $t$. Now, for any node $u$ other than the root of the tree, the value of $\mathcal{P}(u \rightarrow t)$ can be computed by the following equation:

(8)

In Equation (8), the required probability can be recursively computed, until the child of the currently processing node is $t$. For details, please look into [44]. Now, it is easy to identify which of the nodes in $N^{h}(t)$ are effective for influencing the target user $t$. Here, we define the effective $h$-hop neighbors as follows.

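Definition 5 (Effective $h$-hop Neighbors).

Given a target node $t$ and a cut off probability $\theta \in (0, 1)$, the effective $h$-hop neighbor(s) of $t$ is a subset of its $h$-hop neighbors and denoted as $N^{h}_{eff}(t)$. For $u \in N^{h}(t)$, the node $u$ is an effective $h$-hop neighbor of the node $t$ if $\mathcal{P}(u \rightarrow t) \geq \theta$, i.e., $N^{h}_{eff}(t) = \{u \in N^{h}(t) : \mathcal{P}(u \rightarrow t) \geq \theta\}$.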

For any node, say $u$, the main criterion for being included in the seed set is how much benefit it can earn. If the node is one of the target nodes, then the benefit associated with this node is surely earned. Along with this, if there are some target nodes (say $t$) within a few hops, the benefit corresponding to such a node may be earned; however, this depends upon the influence probability $\mathcal{P}(u \rightarrow t)$. Now, we define the earned benefit of a node as follows.

Definition 6 (Earned Benefit of a Node).

For a node $u$, its earned benefit $EB(u)$ is defined as the amount of benefit that can be earned by including this node in the seed set. It has two components. One is the direct benefit due to this node. The other one is the expected benefit due to influencing nearby target nodes. Mathematically, it can be expressed as follows:

$$EB(u) = b(u) + \sum_{t \in T \,:\, u \in N^{h}_{eff}(t)} \mathcal{P}(u \rightarrow t) \cdot b(t) \tag{9}$$

There are two components in the right hand side of Equation (9). The first part is due to the benefit associated with this particular node, and the second part signifies the ‘expected earned benefit’, i.e., the expected benefit due to the influence on the target nodes within a few hops of the node under consideration.

Now, we describe the hop-based heuristic for solving the EBM Problem. First, we create an array for storing the expected earned benefit of each individual node, initialized with $0$ for non-target nodes and with the associated benefit value for the target nodes (the first for loop of Algorithm 4). Then, for a target node, we compute the effective $h$-hop neighbors. Next, for each of these nodes, we compute the expected benefit that can be earned by influencing the target node and add it to the node's entry. This process is repeated for each of the target nodes. Next, we divide the earned benefit of each node by its selection cost and sort the nodes in descending order based on this value. Finally, we choose the seed nodes from this sorted list until the budget is exhausted. Algorithm 4 describes this procedure.

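Input: Social Network $G(V, E, \mathcal{P})$, Target Nodes $T \subseteq V(G)$, Cost Function $C$, Benefit Function $b$, Hop Count $h$, Cut off Probability $\theta$, and Budget $B$.
Output: The seed set $S \subseteq V(G)$ such that $C(S) \leq B$.
1: Create Vector $EB(1 \ldots n)$ initialized with $0$
2: for each $u \in V(G)$ do
3:    if $u \in T$ then
4:       $EB[u] \leftarrow b(u)$
5:    end if
6: end for
7: for each $t \in T$ do
8:    $N^{h}_{eff}(t) \leftarrow \emptyset$
9:    for each $u \in N^{h}(t)$ do
10:      compute $\mathcal{P}(u \rightarrow t)$ using Equation (8)
11:      if $\mathcal{P}(u \rightarrow t) \geq \theta$ then
12:         $N^{h}_{eff}(t) \leftarrow N^{h}_{eff}(t) \cup \{u\}$
13:      end if
14:   end for
15:   for each $u \in N^{h}_{eff}(t)$ do
16:      $EB[u] \leftarrow EB[u] + \mathcal{P}(u \rightarrow t) \cdot b(t)$
17:   end for
18: end for
19: for each $u \in V(G)$ do
20:   $EB[u] \leftarrow EB[u] / C(u)$
21: end for
22: $L \leftarrow$ nodes of $V(G)$ sorted in descending order of $EB$
23: $S \leftarrow \emptyset$
24: $i \leftarrow 1$
25: while $B > 0$ and $i \leq n$ do
26:   if $C(L[i]) \leq B$ then
27:      $S \leftarrow S \cup \{L[i]\}$
28:      $B \leftarrow B - C(L[i])$
29:   end if
30:   $i \leftarrow i + 1$
31: end while
32: return $S$
Algorithm 4 A Hop-Based Heuristic for the EBM Problem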
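
A compact sketch of the hop-based procedure is given below. It is an illustration under loudly stated assumptions, not the authors' implementation: the influence probability of a node on a target is taken to be the product of the edge probabilities along the breadth first search tree path to the target (the paper computes it with Equation (8), following [44]), and the data layout and names are invented for this example.

```python
from collections import deque


def hop_based_seeds(in_edges, targets, cost, budget, h, theta):
    """Rank nodes by (direct + expected earned benefit) / cost and pick greedily.

    in_edges: dict v -> list of (u, p) for each edge u -> v with probability p
    targets:  dict target node -> benefit
    cost:     dict node -> selection cost (defines the node set)
    """
    score = {u: 0.0 for u in cost}
    for t, b_t in targets.items():
        score[t] += b_t  # direct benefit of a target node
    for t, b_t in targets.items():
        # BFS from t over reversed edges, up to depth h.
        # ASSUMPTION: prob[u] is the product of edge probabilities on the
        # BFS-tree path u -> ... -> t.
        prob, depth = {t: 1.0}, {t: 0}
        queue = deque([t])
        while queue:
            v = queue.popleft()
            if depth[v] == h:
                continue
            for u, p in in_edges.get(v, []):
                if u not in prob:
                    prob[u], depth[u] = p * prob[v], depth[v] + 1
                    queue.append(u)
        for u, p_ut in prob.items():
            if u != t and p_ut >= theta:  # effective h-hop neighbor of t
                score[u] += p_ut * b_t
    ranked = sorted(cost, key=lambda u: score[u] / cost[u], reverse=True)
    seeds, remaining = [], budget
    for u in ranked:  # scan the sorted list once, skipping unaffordable nodes
        if cost[u] <= remaining:
            seeds.append(u)
            remaining -= cost[u]
    return seeds
```

No diffusion simulation is needed here: every target node triggers one depth-limited search, and the seed set is read off a single sorted list.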

Now, we analyze the time and space requirement of Algorithm 4 by assuming the input network to be a sparse and $d$-regular graph. Initializing the array requires $\mathcal{O}(n)$ time. Now, for a target node $t$ in a $d$-regular graph, the number of nodes and edges within $h$ hops is $\mathcal{O}(d^{h})$. Hence, performing breadth first search from $t$ up to depth $h$ requires $\mathcal{O}(d^{h})$ time. Computing the influence probability from each node $u$ to $t$, i.e., $\mathcal{P}(u \rightarrow t)$, and comparing it with $\theta$ requires $\mathcal{O}(d^{h})$ time. In the worst case, all the $h$-hop neighbor nodes may be effective $h$-hop neighbor nodes. Then, computing the earned benefit by influencing the target node $t$ requires $\mathcal{O}(d^{h})$ time. The same process is iterated over all the target nodes. Hence, the execution of the loop over the target nodes in Algorithm 4 requires $\mathcal{O}(|T| d^{h})$ time. Dividing the earned benefit by the corresponding selection cost requires $\mathcal{O}(n)$ time. Sorting the nodes based on this value requires $\mathcal{O}(n \log n)$ time. Now, scanning the sorted list for selecting the seed nodes requires $\mathcal{O}(n)$ time. Hence, the total computational time of Algorithm 4 is $\mathcal{O}(n + |T| d^{h} + n + n \log n + n)$, which is equivalent to $\mathcal{O}(|T| d^{h} + n \log n)$. Other than the input social network, additional space is required for storing the earned benefits, the influence probabilities, and the seed set, which is of $\mathcal{O}(n)$, $\mathcal{O}(n)$, and $\mathcal{O}(n)$, respectively. Hence, the total space requirement of Algorithm 4 is of $\mathcal{O}(n)$. The formal statement is given in Theorem 5.

Theorem 5.

Algorithm 4 has the running time of and space requirement of .

V Experimental Evaluation

In this section, we report the experimental evaluation of our proposed methodologies. We start with a brief description of the datasets.

V-a Dataset Description

In our experiments, we use the following four publicly available social network datasets.

  • Email-Eu-core network Dataset (http://snap.stanford.edu/data/email-Eu-core.html) [52], [26]: The network is generated from the e-mail exchanges among different departments of a large European research institution. There is an edge between the users and , if there is an e-mail exchange between them.

  • Facebook Network Dataset (http://snap.stanford.edu/data/ego-Facebook.html) [28]: This dataset was collected from survey participants using a Facebook app. Each user of the network is represented by a node, and two nodes are connected by an edge if the corresponding users are friends on Facebook.

  • Physics Network Dataset (https://arxiv.org/): This is an academic collaboration network among the researchers of the physics section of arxiv.org. Two users are connected by an undirected edge if they are co-authors of at least one paper.

  • Epinions (http://www.epinions.com/?sb=1) [40]: This is a who-trusts-whom online social network of a general consumer review site, Epinions.com. There is a directed edge from the user to , if the user trusts .

Among them, the first, second, and fourth datasets are downloaded from the Stanford Network Analysis Project (SNAP) collection (http://snap.stanford.edu/data/index.html), and the third one is from https://www.microsoft.com/en-us/research/people/weic/#!selected-projects. These datasets have been extensively used in social influence maximization research [11]. Table I gives the basic statistics of the datasets.


Dataset Name | Nodes | Edges | Avg Deg | Avg Clus Coeff
Email-Eu-core network | 1005 | 25571 | 25.443 | 0.3994
Facebook Dataset | 4039 | 88234 | 43.6910 | 0.6055
PHY Network | 37154 | 231584 | 12.466 | 0.2371
Epinions | 75879 | 508837 | 15.6345 | 0.1378
TABLE I: Basic statistics of the datasets.

V-B Parameter Settings

V-B1 Diffusion Probability

In this paper, we consider the following two diffusion probability settings.

  • Uniform Setting: In this setting, , and . We set the value of as . This value has also been used in many previous studies [16].

  • Trivalency Setting: In this setting, each edge is assigned diffusion probability uniformly at random from the set .
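
The two settings can be illustrated with a short Python sketch. The concrete probability values are not reproduced in this section, so the function below takes them as a parameter; the values used in the example call are placeholders chosen only for illustration, and the function name `assign_probabilities` is hypothetical.

```python
import random


def assign_probabilities(edges, setting, values, seed=None):
    """Return a dict mapping each edge to a diffusion probability.

    setting == "uniform":    every edge gets the single value in `values`.
    setting == "trivalency": every edge gets one of `values`, chosen uniformly at random.
    """
    rng = random.Random(seed)
    if setting == "uniform":
        (p,) = values  # exactly one value is expected here
        return {e: p for e in edges}
    if setting == "trivalency":
        return {e: rng.choice(values) for e in edges}
    raise ValueError("unknown setting: " + setting)


edges = [("a", "b"), ("b", "c"), ("c", "a")]
# Placeholder values, NOT taken from the paper.
uniform = assign_probabilities(edges, "uniform", [0.05])
trivalent = assign_probabilities(edges, "trivalency", [0.3, 0.2, 0.1], seed=7)
```

Passing a `seed` makes the random assignment repeatable across runs, which helps when comparing algorithms on the same probability assignment.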

V-B2 Target Nodes

In this study, we select of the nodes as target nodes, chosen uniformly at random. We adopt this setting from [38].

V-B3 Cost and Benefit

In this study, we follow two different settings.

  • The first one is the random setting, where the selection cost of the nodes and the earned benefit of the target nodes are selected from the intervals and , respectively, uniformly at random. We adopt this setting from [39].

  • The second one is based on the observation that the influence ability of a node is directly proportional to its degree. Naturally, the selection cost of a node should also be proportional to its degree. We adopt this setting from [38]. Under this setting, the selection cost of a node is computed as

    (10)

    and in this case, the benefit of each target node is considered as . We call this the ‘degree proportional’ setting.

V-B4 Budget

In our study, based on the two different cost assignment settings, we adopt two different budget settings. In the random setting, we consider budget values starting from and continuing till , incremented each time by . In the ‘degree proportional’ setting, we start with a budget value of and continue until with a gap of .

V-B5 Hop Count and Cut Off Probability

In Algorithm 4, we use a hop count and cut off probability for computing the effective nodes. In our experiments, we choose the value of as and the value of as . We adopt these settings from [44].

V-C Algorithms in the Experiment

Here, we list the algorithms used in the experiments.

V-C1 Algorithms proposed in this paper

  • Incremental Greedy Approach with Approximation Guarantee (IGAAG): This is Algorithm 2 of this paper, which returns either the set of nodes chosen incrementally by Algorithm 1 or the node that causes the maximum individual benefit gain.

  • Incremental Greedy Approach with Improved Performance (IGAIP): This is Algorithm 3 of this paper, which improves the efficiency of Algorithm 2 by exploiting the submodularity property of the benefit function.

  • Hop-Based Heuristic (HBH): This is Algorithm 4 of this paper, which works based on the computation of the expected earned benefit of the nodes that are within the -hop (for a given value of ) of the target nodes.

V-C2 Baseline Algorithms

  • Maximum Degree Heuristic (MAX_DEG): In this method, the maximum degree nodes within the budget are returned as the seed set. This method has been used in previous studies as well [23].

  • Degree Discount Heuristic (DEG_DIS): This is a popular heuristic for the SIM Problem proposed by Chen et al. [12]. In this heuristic, if is a seed node and , then the degree of will be discounted by , where is the number of neighbors of currently in the seed set, and is the degree of . This method has been used in many previous studies [21].

  • Single Discount Heuristic (SIN_DIS): This is a variant of the degree discount heuristic proposed by Chen et al. [12]. In this heuristic, if is a seed node and , then the degree of will be discounted by . This method has been used as a baseline in previous studies [8], [9].

  • Prefix excluded Maximum Influence Arborescence (PMIA): This is one of the state-of-the-art and popular heuristics for the influence maximization problem, proposed by Chen et al. [11], [47].

  • ComPBRA: This is a recently developed community-based solution framework for the EBM Problem by Banerjee et al. [5].
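
As a rough illustration of the two simplest baselines, the following Python sketch adapts the maximum degree and single discount heuristics to a budget. It is a sketch under assumptions rather than the code used in the experiments: the budget handling (skip a node whose cost exceeds the remaining budget), the tie-breaking by node name, and the function names are all choices made here for illustration. The single discount step follows the usual formulation in which each neighbor of a newly chosen seed has its degree reduced by one.

```python
def max_degree_seeds(adj, cost, budget):
    """Pick affordable nodes in descending order of degree (budget rule assumed)."""
    ranked = sorted(adj, key=lambda u: (-len(adj[u]), str(u)))
    seeds, remaining = [], budget
    for u in ranked:
        if cost[u] <= remaining:
            seeds.append(u)
            remaining -= cost[u]
    return seeds


def single_discount_seeds(adj, cost, budget):
    """Like max degree, but each chosen seed lowers its neighbors' scores by one."""
    score = {u: len(adj[u]) for u in adj}
    candidates = set(adj)
    seeds, remaining = [], budget
    while candidates:
        u = min(candidates, key=lambda x: (-score[x], str(x)))
        candidates.remove(u)
        if cost[u] > remaining:
            continue  # unaffordable: skip it without discounting its neighbors
        seeds.append(u)
        remaining -= cost[u]
        for v in adj[u]:
            if v in candidates:
                score[v] -= 1
    return seeds


# Toy undirected graph given as adjacency sets.
adj = {
    "a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"},
    "e": {"f", "g"}, "f": {"e"}, "g": {"e"},
}
cost = {u: 1 for u in adj}
by_degree = max_degree_seeds(adj, cost, budget=2)
by_discount = single_discount_seeds(adj, cost, budget=2)
```

On this toy graph the two heuristics disagree on the second seed: after `a` is chosen, the single discount heuristic lowers the scores of `b` and `c`, so it moves on to `e`, which lies in a part of the graph that `a` does not touch.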

All the algorithms have been implemented in ‘Python 3.4’ along with ‘NetworkX 1.9.1’. We have carried out all the experiments on a high performance computing cluster with nodes, each of them having cores and of RAM, running in a CentOS environment. As Algorithm 2 (IGAAG) is quite inefficient, we do not execute it on the larger datasets (e.g., Physics Network Dataset, Epinions).

V-D Experimental Results and Discussion

The main goal of our experimentation is to make a comparative study of the proposed as well as baseline methods in terms of performance. Performance is measured as the amount of earned benefit obtained by influencing the target users due to the initial activation of the seed nodes selected by the different algorithms. We also report the computational time required by the different algorithms for selecting the seed sets.
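
One common way to estimate the earned benefit of a given seed set under the Independent Cascade Model is Monte Carlo simulation. The sketch below illustrates that idea; this section does not state how the earned benefit was actually computed, so the simulation approach, the number of runs, the function names, and the convention that a target chosen as a seed earns its own benefit are all assumptions.

```python
import random


def simulate_ic(out_edges, seeds, rng):
    """Run one Independent Cascade simulation and return the set of active nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v, p in out_edges.get(u, {}).items():
                # Each newly active node gets one chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active


def estimate_earned_benefit(out_edges, seeds, benefit, runs=1000, seed=0):
    """Average, over `runs` simulations, the total benefit of the activated targets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        active = simulate_ic(out_edges, seeds, rng)
        total += sum(b for t, b in benefit.items() if t in active)
    return total / runs


# Toy example with deterministic edges so that the result is easy to verify.
out_edges = {"s": {"t1": 1.0, "t2": 0.0}}
benefit = {"t1": 4, "t2": 6}
from_s = estimate_earned_benefit(out_edges, ["s"], benefit, runs=50)
from_both = estimate_earned_benefit(out_edges, ["s", "t2"], benefit, runs=50)
```

With probabilistic edges the estimate varies with the random seed and the number of runs, so a larger `runs` value gives a more stable estimate at the price of more computation.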

V-D1 Performance on Earned Benefit

The first row of Figure 1 shows the budget vs. earned benefit plots for the ‘email-Eu-core’ dataset. Under the random and degree proportional settings, the maximum benefit that can be earned is and , respectively. From the results, it is observed that there is a gap in the earned benefit between the existing methods and the methods proposed in this paper. The gap is even more significant in the tri-valency setting than in the uniform setting. As an example, in the uniform setting () with random cost and benefit assignment for , among the existing methods from the literature, the seed set selected by ComPBRA leads to the highest earned benefit and the amount is , which is of the maximum possible. On the other hand, among the proposed methodologies, the seed set selected by IGAAG leads to the highest earned benefit , which is of the maximum possible. In the degree proportional setting, for , in the tri-valency setting among the existing methods, the seed sets selected by both PMIA and ComPBRA lead to an earned benefit of ( of the maximum possible), whereas the same for both IGAAG and IGAIP is ( of the maximum possible).

Next, we report the results for the ‘Facebook’ dataset in the second row of Figure 1. In this dataset also, we observe that the seed sets selected by the proposed methodologies lead to more earned benefit than those selected by the existing methods. As an example, when the budget value is , under random cost and benefit with the tri-valency setting, among the existing methods the seed set selected by ComPBRA leads to an earned benefit of . However, the earned benefit due to the seed set selected by the proposed hop-based heuristic is , which is almost more. Now, under the degree proportional cost and tri-valency setting, when the budget value is , among the existing methods the earned benefit due to the seed set selected by ComPBRA is , and that of the hop-based heuristic is .

Fig. 1: Budget vs. Earned Benefit plots for the different datasets, in panels (1UR) through (4TD). In the individual panel labels, 1, 2, 3, and 4 denote the datasets in the order in which they are described in Section V-A. U and T denote the uniform and trivalency probability settings. R and D denote the random and degree proportional cost settings.

Next, we show the results for the ‘Physics Collaboration Network’ dataset in the third row of Figure 1. In this dataset also, we observe a significant gap in the earned benefit between the existing methods and the methods proposed in this paper. The gap is larger in the tri-valency setting. As an example, for , under random cost with the uniform influence probability setting, among the existing methods, the seed set selected by PMIA leads to the maximum earned benefit, which is , and the same by the hop-based heuristic is . In the tri-valency setting, for , the seed set selected by SIN_DIS leads to an earned benefit equal to , and the same obtained by the hop-based heuristic is . This is approximately more compared to the SIN_DIS method.

Next, we report the results for the ‘Epinions’ dataset in the last row of Figure 1. In this dataset also, we observe a significant difference between the earned benefit due to the seed sets selected by the baseline methods and by the methods proposed in this paper. As an example, for the highest budget (), in the uniform setting under random cost and benefit assignment, the seed set selected by ComPBRA leads to an earned benefit of , and the same in case of the ‘hop-based heuristic’ is , which is almost more compared to ComPBRA. Similarly, in the degree proportional setting under the tri-valency diffusion model, the seed set selected by ComPBRA leads to an earned benefit of and the same for the ‘hop-based heuristic’ is , which is almost more compared to ComPBRA.

From the results, it is observed that the seed sets selected by the proposed methodologies lead to more earned benefit than those selected by the existing methods considered in this paper. Next, we report the computational time of the proposed and baseline methods.

V-D2 Computational Time

Table II reports the computational time required for selecting the seed sets by the different methodologies. From the reported results, it is observed that though the IGAAG method achieves an approximation guarantee, its computational time requirement is quite impractical. The IGAIP method mitigates this issue by running up to times faster than IGAAG. The hop-based heuristic, however, is much more efficient and scalable than both IGAAG and IGAIP, while achieving an almost similar amount of earned benefit and, in some instances, even more.

Among the baseline methods, the MAX_DEG is the fastest one, as it returns the high degree nodes within the budget. The DEG_DIS and SIN_DIS methods take more time compared to the MAX_DEG method. Among the existing methods, PMIA is seen to be the fastest.

Now, in real-life applications of this problem, such as ‘computational advertisement’ and ‘viral marketing’, the main priority from the advertiser's point of view will be the earned benefit. However, the methodology used for seed set selection should be able to perform this task within a reasonable computational time. From the experimental evaluation, it is established that, among the proposed methodologies, the hop-based heuristic is far ahead of the existing methods.

Dataset Budget Computational Time (in Seconds)
IGAAG IGAIP HBH MAX_DEG DEG_DIS SIN_DIS PMIA ComPBRA
Email 2000 6.2351 35.5362 0.0614 0.0253 0.0293 0.2825 0.2671 0.1667
4000 6.4995 61.2136 0.2222 0.0269 0.0358 0.4567 0.4988 0.3911
6000 6.4995 60.8463 0.2327 0.0294 0.0459 0.2289 1.0202 0.6006
8000 6.6856 63.1582 0.2826 0.0328 0.0830 0.5065 1.1021 0.7407
10000 6.8265 66.7364 0.4451 0.0365 0.1216 0.4219 1.2865 0.9168
12000 7.0004 79.1924 0.7923 0.0416 0.1461 0.6986 1.3761 1.0451
14000 7.3358 94.2375 1.2280 0.0474 0.1669 0.5504 1.4902 1.1184
16000 7.5138 110.7938 1.5740 0.0548 0.1872 0.5643 1.9567 1.2251
Facebook 2000 9.3518 57.5381 0.5593 0.1270 0.1371 0.1779 0.5124 0.3252
4000 1.0031 72.1473 2.7052 0.1291 0.