I Introduction
A social network is an interconnected structure among a group of agents [1]. Such networks are effective platforms on which the word-of-mouth effect operates at a large scale, so that information, ideas, rumors, etc. disseminate widely and rapidly [13], [25]. E-commerce houses have exploited this phenomenon to promote their brands [14], [2]. The key question is which users to choose for initiating the diffusion process so that the influence in the network is maximized. Formally, this is called the Social Influence Maximization (SIM) Problem [23]. Owing to its applications in domains such as viral marketing [11], social recommendation [51], market basket analysis [35], and prediction of hot topics [20], this problem has been studied in several variations; see [31], [7] for recent surveys. Social influence arises from the cascading process in the underlying network [33, 18], and it has a huge impact, because human decisions, from personal (which place to visit, which restaurant to explore?) to political (which party to vote for in the coming election?), are influenced by neighbors, at least to some extent. Several diffusion models have been proposed to study the diffusion process in a social network; see [29] for a recent survey.
One of the recently introduced variants of the SIM Problem is the Budgeted Influence Maximization (BIM) Problem [39]. This problem assumes that the users of the network have nonuniform selection costs, where the selection cost of a user is the incentive that must be paid if that user is selected as a seed node. A fixed budget is allocated for the seed set selection process, and the task is to choose highly influential seed nodes within the budget to maximize the influence. A few solution methodologies are available in the literature for this problem, such as the directed acyclic graph-based heuristic by Nguyen et al. [39], the Sample Average Aggregate Scheme by Guney et al. [19], and ComBIM by Banerjee et al. [4]. All these studies implicitly assume that influencing each user is equally important. Commercial campaigns, however, are targeted in nature: a specific brand is advertised to a specific set of users, because advertising a brand to people who have no interest in it does not make sense. On the other hand, some studies in the literature do consider target users in the influence maximization process [32], [34], [37]. To the best of our knowledge, none of the targeted influence maximization studies considers nonuniform selection costs of the users.
In targeted advertisement scenarios, influencing different target users yields different amounts of benefit, and from the advertiser's perspective the main goal is to maximize the total earned benefit. Motivated by this practical scenario, we study the Earned Benefit Maximization Problem (EBM Problem), where each target user is associated with a benefit value, each user is associated with a selection cost, and a fixed budget is given. The goal is to select a seed set within the budget that maximizes the earned benefit. We have studied this problem previously: in [6], we gave an integer programming formulation, and in [3, 5], we proposed a ranking approach that exploits the community structure of the network. The approach in this study is very different. We start by studying the properties of the earned benefit function and then propose a number of solutions, followed by experiments. In particular, we make the following contributions in this paper.

We extend the BIM Problem by considering the notion of target users with nonuniform benefit values and propose the ‘Earned Benefit Maximization Problem’.

For the EBM Problem, we propose an ‘incremental greedy strategy’ and show that, with a minor modification, this methodology leads to a constant-factor approximation guarantee on the earned benefit.

We show that the benefit function is monotone and submodular under the Independent Cascade (IC) Model of diffusion and exploit this property to improve the efficiency of the incremental greedy algorithm.

Using the concept of the ‘expected earned benefit’ of a node, we propose an efficient hop-based heuristic for solving this problem.

We conduct a set of extensive experiments with four real-world, publicly available social network datasets to show the effectiveness and efficiency of the proposed methodologies.
The rest of the paper is organized as follows. Section II describes some recent studies from the literature. Section III presents preliminary definitions, formally defines the EBM Problem, and states its hardness result. The proposed methodologies for solving this problem are described in Section IV. Section V contains the experimental evaluation of the proposed methodologies, and finally, Section VI concludes this study and gives future directions.
II Related Work
In this section, we present some closely related studies from the literature. This study is closely related to the SIM Problem and its variants, particularly those concerned with targeted users.
Social Influence Maximization and Its Variants
Given a social network of users, which nodes should be chosen for initially injecting the information so as to cause the maximum influence in the network? This problem is known as social influence maximization. It was first identified in the context of viral marketing by Domingos and Richardson [41]. However, Kempe et al. [23] were the first to investigate the computational issues of this problem; they proved that it is NP-hard and proposed an incremental greedy algorithm that admits a constant-factor approximation ratio. Their study triggered a vast amount of research on the SIM Problem, and hence plenty of solution methodologies are available in the literature, such as Cost-Effective Lazy Forward (CELF) [27], CELF++ [16], SIMPATH [17], Two-Phase Influence Maximization (TIM) [46], Influence Maximization via Martingales (IMM) [45], Influence Ranking and Influence Estimation (IRIE) [22], different community-based solution methodologies [42], [30], and different nontraditional optimization algorithms such as genetic algorithm [53] and discrete particle swarm optimization [48], among many others. Several variants of this problem have also been studied in the literature, such as the coverage problem [36] and the budgeted influence maximization problem [39, 4].
Social Influence Maximization for the Targeted Users
Recently, researchers have addressed the problem of influence maximization for targeted users in a social network. Li et al. [32] studied this problem considering target users who are relevant to a particular keyword. Their solution methodology was based on the construction of reverse influence sets and their indexing. Song et al. [43] addressed the targeted influence maximization problem by considering the geographical location of the users and the deadline within which the users should be influenced. Wen et al. [50] studied this problem focusing on two main issues: how to capture the social influence among the target users, and how to develop an efficient scheme that offers a wider influence spread among the target users. Wang et al. [49] also solved the same problem by considering the impact of the budget on the influence spread and incorporating efficient sampling techniques.
However, to the best of the authors’ knowledge, none of the existing studies on the targeted influence maximization problem considers both a nonuniform benefit associated with each target user and nonuniform selection costs of the users. In this paper, we study the Earned Benefit Maximization Problem, where the target users are associated with nonuniform benefit values and the users have nonuniform selection costs.
III Background and Problem Definition
We consider that the social network is represented as a weighted graph $G(V, E, \mathcal{P})$, where the vertex set $V(G)$ is the set of users of the network and the edge set $E(G)$ represents the set of social ties among the users. $\mathcal{P}$ is the edge weight function that assigns each edge its influence probability, i.e., $\mathcal{P}: E(G) \rightarrow (0, 1]$. For any edge $(u_i u_j) \in E(G)$, we denote its influence probability as $\mathcal{P}_{u_i \rightarrow u_j}$. This signifies the probability that the user $u_i$ will be able to influence $u_j$. We denote the number of nodes and edges of $G$ by $n$ and $m$, respectively. Next, we briefly describe the Independent Cascade Model, which we consider as the underlying diffusion model in our study.
III-A The Independent Cascade Model
The Independent Cascade (IC) Model is one of the diffusion models predominantly used in the influence maximization literature [23, 46, 45]. Here, the diffusion of information starts from a set of initially selected nodes, known as the seed nodes. At time $t = 0$, only the seed nodes are informed, and all the other nodes of the network are ignorant of the information. From these seed nodes, information is diffused by the following rules:

information is diffused in discrete time steps,

a node can be in either of two states: ‘active’ (‘influenced’) or ‘inactive’ (‘uninfluenced’),

a node can change its state from inactive to active, but not vice versa,

once a node is influenced, it remains in this state.
Each active node (say, $u_i$) at the current time stamp (say, $t$) gets a single chance to activate each of its currently inactive neighbors $u_j$ (i.e., $(u_i u_j) \in E(G)$ and $u_j$ is inactive), and it succeeds with probability equal to the weight of the edge. If any one of the active neighbors of $u_j$ succeeds, then $u_j$ becomes an active node at time $t+1$. Only the most recently activated nodes take part in the triggering process. This process stops when no more node activation is possible. Next, we introduce the Earned Benefit Maximization Problem.
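The diffusion rules above can be illustrated with a minimal simulation sketch. The function name `simulate_ic`, the adjacency format, and the toy network are assumptions made for illustration only; nodes are processed in first-in-first-out order instead of explicit time steps, which yields the same final set of active nodes because every edge is tried at most once.

```python
import random
from collections import deque

def simulate_ic(out_edges, seeds, rng):
    """Run one Independent Cascade diffusion and return the set of active nodes.

    out_edges: dict mapping node -> list of (neighbor, influence_probability).
    seeds: iterable of nodes that are active at time step 0.
    rng: a random.Random instance, passed in so that runs are reproducible.
    """
    active = set(seeds)
    frontier = deque(active)          # nodes activated most recently
    while frontier:
        u = frontier.popleft()
        for v, p in out_edges.get(u, []):
            # A newly active node gets one chance per inactive neighbor.
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active

# Hypothetical toy network: a -> b always succeeds, b -> c never succeeds.
toy = {"a": [("b", 1.0)], "b": [("c", 0.0)]}
result = simulate_ic(toy, ["a"], random.Random(0))
```

In this toy run the cascade reaches `b` but never `c`, so the process stops after two activations.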
III-B The Earned Benefit Maximization Problem
In this problem, along with the social network $G(V, E, \mathcal{P})$, a subset of the users $T \subseteq V(G)$ is given as the target users. Each of them is associated with a benefit, which can be earned by influencing the corresponding target user. This is characterized by the benefit function $b: V(G) \rightarrow \mathbb{R}_{0}^{+}$. For any $u \in T$, the benefit is denoted as $b(u)$, and for any $u \in V(G) \setminus T$, $b(u) = 0$. For a seed set $S$, the set of nodes influenced by it is denoted by $I(S)$. As the diffusion of information under the IC Model is a probabilistic process, the influence of a seed set is measured in terms of expectation. Hence, the number of influenced nodes due to the seed set $S$ is $\sigma(S) = \mathbb{E}[|I(S)|]$, where $\sigma$ is the social influence function [23]. The earned benefit of the seed set $S$ is defined as $\beta(S) = \mathbb{E}\big[\sum_{u \in I(S) \cap T} b(u)\big]$. Here, $\beta$ is the earned benefit function, which maps each subset of the nodes to its expected earned benefit, i.e., $\beta: 2^{V(G)} \rightarrow \mathbb{R}_{0}^{+}$.
In real-world campaigns, earned benefit maximization is carried out by conducting an information diffusion process. As real-life social networks are formed by rational human agents, a user must be incentivized if selected as a seed. This is characterized by the cost function $C: V(G) \rightarrow \mathbb{R}^{+}$. The selection cost associated with the node $u$ is denoted as $C(u)$. For a subset of nodes $S \subseteq V(G)$, the selection cost is $C(S) = \sum_{u \in S} C(u)$, and a fixed budget $B$ is allocated for seed set selection. Hence, the problem is to choose a subset $S$ from $V(G)$ that maximizes the function $\beta(S)$ subject to the constraint $C(S) \leq B$. Formally, the problem can be expressed as follows:
Earned Benefit Maximization Problem
Input: Social network $G(V, E, \mathcal{P})$, target nodes $T \subseteq V(G)$, cost function $C$, benefit function $b$, and budget $B$.
Problem: Find the seed set $S^{*}$ ($S^{*} \subseteq V(G)$) such that $C(S^{*}) \leq B$ and, for any other seed set $S$ with $C(S) \leq B$, $\beta(S^{*}) \geq \beta(S)$.
The EBM Problem is a generalization of the BIM Problem: when every node is a target with unit benefit, the earned benefit equals the influence spread. The BIM Problem is NP-hard under the IC Model of diffusion [39]. Hence, Theorem 1 holds.
Theorem 1.
The EBM Problem is NP-hard under the Independent Cascade Model of diffusion.
This result motivates us to design a suitable approximation algorithm and a heuristic solution for this problem. We discuss them in the next section.
IV Proposed Methodology
In this section, we present our proposed methodologies for the EBM problem. Prior to that, we establish two properties of the benefit function, which will be used subsequently.
IV-A Properties of the Benefit Function
As mentioned previously, the benefit earned by a given seed set $S$, denoted as $\beta(S)$, is the expected total benefit of the target nodes influenced by $S$. So the benefit function can be thought of as a set function defined on the ground set $V(G)$, i.e., $\beta: 2^{V(G)} \rightarrow \mathbb{R}_{0}^{+}$. Now, we prove two important properties of the benefit function, namely monotonicity and submodularity. These two properties are exploited for proving the approximation guarantee of Algorithm 2.
Definition 1 (Nonnegativity and Monotonicity of Set Function).
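A set function $f: 2^{U} \rightarrow \mathbb{R}$ defined over the ground set $U$ is said to be nonnegative if $f(S) \geq 0$ for all $S \subseteq U$, and monotone if $f(A) \leq f(B)$ for all $A \subseteq B \subseteq U$.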
Lemma 1.
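The benefit function $\beta$ is nonnegative and monotone under the IC Model of diffusion.
Proof.
It is reported in the literature that the social influence function $\sigma$ is nonnegative and monotone [23]. As the benefit $b(u) \geq 0$ for every node $u \in V(G)$, it is trivial to observe that $\beta(S) \geq 0$ for all $S \subseteq V(G)$. By the monotonicity property of $\sigma$, for all $A \subseteq B \subseteq V(G)$ we have $\beta(A) \leq \beta(B)$,
which means $\beta$ is monotone. This completes the proof. ∎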
Definition 2 (Submodularity of Set Function).
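A set function $f: 2^{U} \rightarrow \mathbb{R}$ defined over the ground set $U$ is said to be submodular if, for all $A \subseteq B \subseteq U$ and all $u \in U \setminus B$, the following condition is met:
$$f(A \cup \{u\}) - f(A) \geq f(B \cup \{u\}) - f(B) \tag{1}$$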
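To make Definitions 1 and 2 concrete, the following sketch checks both properties by brute force on a very small ground set. The weighted-coverage function used here is a hypothetical stand-in for a deterministic benefit function, not the earned benefit function of this paper, and all names are illustrative assumptions.

```python
from itertools import combinations

def powerset(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_monotone_submodular(f, ground):
    """Brute-force check of monotonicity and submodularity on a small ground set."""
    subsets = list(powerset(ground))
    for a in subsets:
        for b in subsets:
            if not a <= b:
                continue
            if f(a) > f(b):                      # monotonicity violated
                return False
            for u in ground - b:
                gain_a = f(a | {u}) - f(a)
                gain_b = f(b | {u}) - f(b)
                if gain_a < gain_b:              # diminishing returns violated
                    return False
    return True

# Hypothetical weighted coverage: each node covers some targets, and the
# value of a set is the total benefit of the targets it covers.
covers = {"x": {1, 2}, "y": {2, 3}, "z": {3}}
target_benefit = {1: 5, 2: 1, 3: 2}

def coverage_benefit(nodes):
    covered = set()
    for u in nodes:
        covered |= covers[u]
    return sum(target_benefit[t] for t in covered)

ground = frozenset(covers)
```

The check is exponential in the size of the ground set, so it is only meant for sanity-testing toy instances.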
Lemma 2.
The benefit function $\beta$ is submodular under the IC Model of diffusion.
Proof.
In the literature, it is mentioned that the social influence function, is submodular under the IC Model of diffusion [23]. Let us assume that and . Now, from the definition of , we have
By simple set theoretic interpretation, we can write
Hence,
[This is due to the monotonicity property of ]
We obtain the inequality required to show submodularity property of . This completes the proof. ∎
IV-B Incremental Greedy Algorithm
Let $S$ be the seed set and $u \in V(G) \setminus S$. We define the marginal gain in benefit of the node $u$ with respect to the seed set $S$ as the increase in benefit when $u$ is included in $S$. Formally, it is stated in Definition 3.
Definition 3 (Marginal Gain in Earned Benefit).
Given a seed set $S$ and a node $u$ that is currently not in the seed set, i.e., $u \in V(G) \setminus S$, its marginal gain in the earned benefit with respect to $S$ is denoted as $\beta_{u}(S)$ and defined as
$$\beta_{u}(S) = \beta(S \cup \{u\}) - \beta(S), \tag{2}$$
where $\beta$ denotes the earned benefit function.
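The working principle of the proposed incremental greedy strategy is as follows. Starting with an empty seed set, the procedure incrementally selects, among the nodes whose cost fits within the remaining budget, the node that yields the maximum marginal gain per unit cost. Let $S_{i}$ and $B_{i}$ denote the seed set and the remaining budget at the end of the $i$-th iteration. In the $(i+1)$-th iteration, the node $u^{*}$ is added to the seed set, i.e., $S_{i+1} = S_{i} \cup \{u^{*}\}$, if the following condition is met, where $C(u)$ is the selection cost of $u$ and $\beta$ is the earned benefit function.
$$u^{*} = \underset{u \in V(G) \setminus S_{i},\; C(u) \leq B_{i}}{\arg\max} \; \frac{\beta(S_{i} \cup \{u\}) - \beta(S_{i})}{C(u)} \tag{3}$$
If no node can be selected within the remaining budget in an iteration, then $u^{*}$ is null and the procedure exits. Algorithm 1 states the procedure.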
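The selection rule described above can be sketched as follows. This is an illustrative sketch rather than a transcription of Algorithm 1: the benefit oracle `benefit_of` is an assumed callable (in practice it would be an estimate of the earned benefit), costs are assumed to be strictly positive, and the toy coverage instance is hypothetical.

```python
def incremental_greedy(nodes, cost, budget, benefit_of):
    """Pick seeds by maximum marginal benefit per unit cost within the budget.

    benefit_of: callable mapping a frozenset of seeds to its earned benefit.
    cost: dict of strictly positive selection costs.
    """
    seeds = frozenset()
    remaining = budget
    current = benefit_of(seeds)
    while True:
        best_node, best_ratio, best_value = None, float("-inf"), current
        for u in nodes:
            if u in seeds or cost[u] > remaining:
                continue                      # already chosen or not affordable
            value = benefit_of(seeds | {u})
            ratio = (value - current) / cost[u]
            if ratio > best_ratio:
                best_node, best_ratio, best_value = u, ratio, value
        if best_node is None:                 # nothing affordable is left
            return seeds
        seeds = seeds | {best_node}
        remaining -= cost[best_node]
        current = best_value

# Hypothetical toy instance: each node deterministically covers some targets.
covers = {"x": {1, 2}, "y": {2, 3}, "z": {3}}
target_benefit = {1: 5, 2: 1, 3: 2}

def toy_benefit(seeds):
    covered = set()
    for u in seeds:
        covered |= covers[u]
    return sum(target_benefit[t] for t in covered)

chosen = incremental_greedy(["x", "y", "z"], {"x": 1, "y": 2, "z": 1}, 2, toy_benefit)
```

With a budget of 2, the sketch first takes `x` (ratio 6) and then `z` (ratio 2), since `y` no longer fits within the remaining budget.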
Although Algorithm 1 is simple to understand, it does not give any bounded approximation guarantee on the earned benefit. We demonstrate this claim with an example.
Example 1.
Let us assume, a network with nodes , where is an isolated node and the remaining nodes connected within themselves with each edge having the diffusion probability . The entire vertex set of the network is the target node set, i.e., . Benefit associated with each target node is . For each , its associated selection cost is and the selection cost of is , where . The allocated budget for the seed set selection process is . The optimal algorithm for this problem should select any node and achieve the earned benefit of amount by influencing all the remaining nodes. However, as Algorithm 1 selects the seed node based on the marginal gain in the earned benefit per unit cost, it will select the node and not any . For the node , the value of , when , is . On the other hand, for any , the value of is . As , Algorithm 1 selects the node . After selecting the node , the remaining budget will be , which is less than . Within this budget none of the nodes can be selected as each of them has the selection cost . Hence, Algorithm 1 terminates by earning the benefit and returning an unutilized budget of amount . The approximation ratio of the Algorithm 1 is define as
In this example the value of is . If the value of is arbitrarily large, then the approximation ratio of Algorithm 1 becomes arbitrarily small. Hence, Theorem 2 holds.
Theorem 2.
Algorithm 1 does not provide any constant approximation guarantee.
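The failure mode behind Theorem 2 can be reproduced on a small instance in the spirit of Example 1. The concrete numbers below are assumptions chosen for illustration, not the values of the example: a clique of `n` target nodes with probability-1 edges, each costing the full budget `n + 1`, plus one isolated target node `w` of cost 1, with unit benefit everywhere.

```python
def greedy_vs_optimal(n):
    """Return (greedy_benefit, optimal_benefit) for the toy instance with n clique nodes."""
    budget = n + 1
    clique = [f"u{i}" for i in range(n)]
    cost = {u: budget for u in clique}
    cost["w"] = 1                                  # the isolated node is cheap

    def benefit_of(seeds):
        # Probability-1 edges: a single clique seed activates the whole clique.
        total = n if any(u in seeds for u in clique) else 0
        return total + (1 if "w" in seeds else 0)

    seeds, remaining = set(), budget
    while True:
        candidates = [u for u in cost if u not in seeds and cost[u] <= remaining]
        if not candidates:
            break
        base = benefit_of(seeds)
        best = max(candidates, key=lambda u: (benefit_of(seeds | {u}) - base) / cost[u])
        seeds.add(best)
        remaining -= cost[best]
    # One clique node exhausts the budget and earns n, which is optimal here.
    return benefit_of(seeds), n
```

The ratio-based rule prefers `w` (ratio 1) over any clique node (ratio `n / (n + 1)`), after which no clique node is affordable, so the greedy benefit stays at 1 while the optimum grows with `n`.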
Now, we present two important inequalities on the iterative performance of Algorithm 1. These results will be used subsequently.
Lemma 3.
After each iteration of the ‘while’ loop , the following inequality always holds
(4) 
Proof.
The value of is no more than the sum of the benefit values of the target nodes that are influenced by the seed nodes in , however, not by the nodes in . For each node in , the earned benefit to cost ratio could be at most , where is the earned benefit by the nodes in but not in . This is because maximizes this ratio in Algorithm 1. Since the total selection cost of the nodes in is bounded by the budget , the total earned benefit due to the target nodes in can be at most . Hence, we have
(5) 
By definition, we have
(6) 
From the Equations (5) and (6), we have
This completes the proof. ∎
Lemma 4.
In any arbitrary iteration of the While loop from Line to of Algorithm 1, the following condition will be true
Proof.
We prove this statement by the method of induction on the iteration of the ‘while’ loop. For the first iteration, i.e., at , we need to show,
From Lemma 3, by putting in Equation (4) we have,
As we are starting with an empty seed set, hence and . This clearly implies that .
Now, suppose the statement holds till iteration. We show that the statement holds in the iteration as well. Now,
Here, the first inequality is due to Lemma 3 and the second one is due to inductive hypothesis. ∎
Algorithm 1 can be modified to yield a constant approximation ratio on the earned benefit. Let $S$ be the seed set generated by Algorithm 1, and let $u_{max}$ be the node that has the highest individual earned benefit. We compare the earned benefit when the seed set is $S$ with that when the seed set is $\{u_{max}\}$, and return the seed set that yields the larger earned benefit. Algorithm 2 formally states the procedure.
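The comparison step can be sketched in a few lines. The function below is an illustrative sketch, not a transcription of Algorithm 2; restricting the single-node candidates to those whose cost fits the budget is an assumption, as is the hypothetical toy instance.

```python
def best_of_greedy_and_single(nodes, cost, budget, benefit_of, greedy_seeds):
    """Return the better of the greedy seed set and the best affordable single node."""
    greedy_seeds = frozenset(greedy_seeds)
    affordable = [u for u in nodes if cost[u] <= budget]
    if not affordable:
        return greedy_seeds
    top = max(affordable, key=lambda u: benefit_of(frozenset({u})))
    if benefit_of(frozenset({top})) >= benefit_of(greedy_seeds):
        return frozenset({top})
    return greedy_seeds

# Hypothetical toy instance: two clique nodes (benefit 2 together) and an
# isolated node "w" (benefit 1) that the ratio-based greedy would pick first.
clique = {"u0", "u1"}

def toy_benefit(seeds):
    return (2 if clique & set(seeds) else 0) + (1 if "w" in seeds else 0)

final = best_of_greedy_and_single(
    ["w", "u0", "u1"], {"w": 1, "u0": 3, "u1": 3}, 3, toy_benefit, {"w"}
)
```

Here the greedy set `{"w"}` earns 1, whereas a single clique node earns 2, so the single node is returned.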
Theorem 3.
Proof.
The strategy of this proof was used previously for proving the approximation bound of the Budgeted Maximum Coverage Problem by Khuller et al. [24]. Here, we prove the statement by case-wise analysis of Algorithm 2.
Case I
If there exists one node , which has the earned benefit , and is found to be greater than equal to , then will be selected as in Algorithm 2. In this case, the approximation ratio of Algorithm 2 will be as follows:
Case II
If Case I does not happen, then there does not exist any , for which is greater than . This can be divided into two subcases.
Case IIa
Now, if we have , then , . Hence, no more node can be added to . Otherwise, the budget constraint will be violated. Without the loss of generality, let us assume that . In this case, can contain one extra node without violating the budget constraint. Now, as the function is submodular, hence,
As , . This clarifies that . As and , . This essentially means . In this case, the approximation ratio of the Algorithm 2 will be as follows:
Case IIb
If , we first observe that for real numbers and , the function attains its maximum value, when . Hence, by Lemma 4, we have
Hence, the worst case performance guarantee of Algorithm 2 is . This proves the statement. ∎
Now, we investigate the time requirement of Algorithms 1 and 2. For both of them, it is easy to observe that the time requirement depends heavily on the earned benefit calculation for a given seed set. It is reported in the literature that counting the number of influenced nodes for a given seed set is a #P-hard problem [23]. By the same argument, computing the exact value of the earned benefit for a given seed set $S$ is also #P-hard. Hence, we estimate this value in the same way the influence of a seed set is estimated [23]. First, a number (say, $R$) of sampled graphs of $G$, i.e., $G_{1}, G_{2}, \ldots, G_{R}$, are generated, where each edge of $G$ is independently retained in a sampled graph with probability equal to its influence probability and dropped otherwise. The earned benefit is then computed in each of the sampled graphs, and the average is returned as its approximate value, as given in Equation (7), where $\beta_{G_{i}}(S)$ denotes the earned benefit of $S$ in the sampled graph $G_{i}$.
$$\beta(S) \approx \frac{1}{R} \sum_{i=1}^{R} \beta_{G_{i}}(S) \tag{7}$$
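The sampling-based estimation described above can be sketched as follows. The function name, the edge-list format, and the toy edges are assumptions for illustration; each sample keeps every edge independently with its influence probability and sums the benefits of the target nodes reachable from the seeds.

```python
import random

def estimate_earned_benefit(edges, seeds, benefit, num_samples, rng):
    """Average the benefit of reachable target nodes over sampled graphs.

    edges: list of (u, v, p) directed edges with influence probability p.
    benefit: dict mapping each target node to its benefit (non-targets absent).
    """
    total = 0.0
    for _ in range(num_samples):
        # Keep each edge independently with its influence probability.
        live = {}
        for u, v, p in edges:
            if rng.random() < p:
                live.setdefault(u, []).append(v)
        # Nodes reachable from the seeds in the sampled graph are influenced.
        reached, stack = set(seeds), list(seeds)
        while stack:
            u = stack.pop()
            for v in live.get(u, []):
                if v not in reached:
                    reached.add(v)
                    stack.append(v)
        total += sum(benefit.get(v, 0.0) for v in reached)
    return total / num_samples

# Hypothetical toy input: a -> b always succeeds, b -> c never succeeds.
toy_edges = [("a", "b", 1.0), ("b", "c", 0.0)]
toy_benefit = {"b": 4.0, "c": 7.0}
estimate = estimate_earned_benefit(toy_edges, ["a"], toy_benefit, 50, random.Random(0))
```

Because the toy probabilities are 0 and 1, every sample earns exactly the benefit of `b`; with fractional probabilities the estimate fluctuates around the expectation and tightens as the number of samples grows.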
If , then traversing subgraphs will require time. Let, be the minimum selection cost among all the users. Maximum number of possible iterations of the While loop (Line 2 to 9) in Algorithm 1 is . Hence, . In Algorithm 1, in each iteration, maximum number of times earned benefit estimations are done is of . Hence, the total number of times earned benefit estimations are of . The required computational time for Algorithm 1 is .
In Algorithm 2, along with the incremental greedy strategy, the node, which can grab the maximum earned benefit has to be found out (Line 3 of Algorithm 2). This can be done earned benefit estimations with a single seed node, and this will take time. Hence, running time of Algorithm 2 is of . If we do the online sampling of the input social network for sampled graph generation, then only one subgraph is required per iteration. For storing, this network will take space. Storing the seed set requires space. Hence, the total amount of space required by both Algorithms 1 and 2 is and the number of seed nodes is generally found to be much much less than the number of nodes, i.e., . Hence, . Hence, Theorem 4 holds.
IV-C Improving the Efficiency of Algorithm 2
Though Algorithm 2 provides a provable approximation bound on the earned benefit, it is highly inefficient, as it estimates the earned benefit many times. Here, we present an improved version of Algorithm 2, given as Algorithm 3, which removes redundant earned benefit estimations by exploiting the submodularity of the earned benefit function.
Lemma 2 shows that the earned benefit function is submodular. This implies that the marginal gain in earned benefit of a nonseed node with respect to the seed set of an earlier iteration is never less than its marginal gain with respect to the seed set of any later iteration. In Algorithm 3, in the first iteration of the while loop, the individual earned benefit of every node is computed, the nodes are sorted in descending order of this value, and the node with the highest individual earned benefit is put in the seed set. From the next iteration onwards, the marginal gains of the nonseed nodes are recomputed in descending order of their previously computed marginal gains. As soon as we find a node whose marginal gain in the current iteration is at least the previous-iteration marginal gain of the next node in the sorted list, we include that node in the seed set and move to the next iteration. This is valid because, as the benefit function is submodular, the marginal gains of the second and subsequent nodes in the list, even if recomputed, cannot exceed their values from the previous iteration. This process is iterated until the budget is exhausted. An important point to observe is that skipping the unnecessary benefit function evaluations does not result in losing the approximation guarantee on the quality of the selected seed set. This exploitation of submodularity results in a significant improvement in the efficiency of our proposed methodology, as we observe in our experiments.
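The lazy re-evaluation idea can be sketched with a priority queue, in the spirit of CELF-style lazy evaluation [27]. This is an illustrative sketch and not a transcription of Algorithm 3: it ranks nodes by marginal gain per unit cost, assumes strictly positive costs and a submodular benefit oracle `benefit_of`, and uses a hypothetical toy coverage instance.

```python
import heapq

def lazy_greedy(nodes, cost, budget, benefit_of):
    """Cost-sensitive greedy that recomputes a gain only when its stale value is on top."""
    seeds = frozenset()
    current = benefit_of(seeds)
    remaining = budget
    # Heap entries: (-gain_per_cost, tie_breaker, node, round_when_gain_was_computed).
    heap = []
    for idx, u in enumerate(nodes):
        gain = benefit_of(frozenset({u})) - current
        heapq.heappush(heap, (-gain / cost[u], idx, u, 0))
    round_no = 0
    while heap:
        neg_ratio, idx, u, stamp = heapq.heappop(heap)
        if cost[u] > remaining:
            continue                      # the budget only shrinks, so drop it
        if stamp == round_no:
            # Fresh gain on top: by submodularity no stale entry can beat it.
            seeds = seeds | {u}
            remaining -= cost[u]
            current = benefit_of(seeds)
            round_no += 1
        else:
            gain = benefit_of(seeds | {u}) - current
            heapq.heappush(heap, (-gain / cost[u], idx, u, round_no))
    return seeds

# Hypothetical toy instance: each node deterministically covers some targets.
covers = {"x": {1, 2}, "y": {2, 3}, "z": {3}}
target_benefit = {1: 5, 2: 1, 3: 2}

def toy_benefit(seeds):
    covered = set()
    for u in seeds:
        covered |= covers[u]
    return sum(target_benefit[t] for t in covered)

lazy_seeds = lazy_greedy(["x", "y", "z"], {"x": 1, "y": 2, "z": 1}, 2, toy_benefit)
```

On this instance the lazy variant selects the same seeds as the plain ratio-based greedy, while re-evaluating only the node that reaches the top of the queue in each round.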
IV-D Efficient Heuristic Solution
Though Algorithm 3 is quite efficient, it is not enough to deal with large real-life social networks. Here, we propose an efficient heuristic solution for the EBM Problem. Before stating the procedure, we first state one important aspect of social influence. In social networks, the influence of a node is confined to a small number of hops, which is called the influence zone of the node [44], [10]. According to Goel et al. [15], in a diffusion cascade only a small fraction of the influenced nodes reside more than a few hops away from any seed node. These results from the literature motivate us to design an algorithm that considers the locality of the influence effect. Based on this principle, to influence a target node, there should be at least one seed node within a few hops of it. In this context, we define the $h$-hop neighbor of a node as follows:
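Definition 4 ($h$-hop Neighbor).
For a node $u$, its $h$-hop neighbor set is defined as the set of nodes that are at a distance of at most $h$ from $u$ and is denoted as $N^{h}(u)$, i.e., $N^{h}(u) = \{v \in V(G) : dist(u, v) \leq h\}$, where $dist(u, v)$ is the shortest-path distance between $u$ and $v$.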
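The $h$-hop neighbor set of Definition 4 can be collected with a depth-limited breadth-first search. The sketch below makes two assumptions for illustration: the graph is given as an adjacency dictionary, and the node itself is excluded from its own neighbor set. For a directed network, passing the reversed adjacency would return the nodes that can reach a target within $h$ hops.

```python
from collections import deque

def h_hop_neighbors(adj, source, h):
    """Return all nodes whose shortest-path distance from `source` is between 1 and h."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == h:
            continue                      # do not expand beyond depth h
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {v for v in dist if v != source}

# Hypothetical toy graph: an undirected path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
two_hop = h_hop_neighbors(path, "a", 2)
```

Stopping the expansion at depth `h` keeps the cost proportional to the size of the local neighborhood instead of the whole network.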
Among the nodes present in the $h$-hop neighbor set of a target node, there can be many nodes whose influence probability on the target node is extremely low. Hence, those nodes are unlikely to influence the target node. To identify such nodes, it is important to compute the influence probability. For a target node $v$, here we describe the procedure for computing the influence probability $\mathcal{P}_{u \rightarrow v}$ of a node $u$ in the $h$-hop neighbor set of $v$. We construct the breadth-first search tree up to depth $h$ rooted at the node $v$. Now, for any node $u$ other than the root of the tree, the value of $\mathcal{P}_{u \rightarrow v}$ can be computed by the following equation:
(8) 
In Equation (8), the probability is computed recursively along the breadth-first search tree until the target node is reached. For details, please see [44]. It is then easy to identify which of the $h$-hop neighbors of a target node are effective for influencing it. Here, we define the effective $h$-hop neighbors as follows.
Definition 5 (Effective $h$-hop Neighbors).
Given a target node and an , the effective hop neighbor(s) of is a subset of its hop neighbors and denoted as . For , the node is an effective hop neighbor of the node , if , i.e., .
For any node, say $u$, the main criterion for inclusion in the seed set is how much benefit it can earn. If the node is one of the target nodes, then the benefit associated with it is surely earned. In addition, if there are some target nodes (say, $v$) within a few hops, the benefit corresponding to those nodes may also be earned, depending on the influence probability of $u$ on $v$. Now, we define the earned benefit of a node as follows.
Definition 6 (Earned Benefit of a Node).
For a node , its earned benefit is defined as the amount of benefit that can be earned by including this node in the seed set. It has two components. One is the direct benefit due to this node. The other one is the expected benefit due to influencing nearby target nodes. Mathematically, it can be expressed as follows:
(9) 
There are two components on the right-hand side of Equation (9). The first is the benefit associated with this particular node, and the second is the ‘expected earned benefit’, i.e., the expected benefit due to influencing the target nodes within a few hops of the node under consideration.
Now, we describe the hop-based heuristic for solving the EBM Problem. First, we create an array for storing the expected earned benefit of each individual node, initialized with $0$ for nontarget nodes and with the associated benefit value for the target nodes. Then, for a target node, we compute its effective $h$-hop neighbors. For each of these neighbors, we compute the expected benefit that it can earn by influencing the target node and add it to the neighbor's entry in the array. This process is repeated for each of the target nodes. Next, we divide the earned benefit of each node by its selection cost and sort the nodes in descending order of this ratio. Finally, we choose seed nodes from this sorted list until the budget is exhausted. Algorithm 4 describes this procedure.
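The ranking-and-filling procedure described above can be sketched as follows. This is an illustrative sketch, not a transcription of Algorithm 4. It assumes that the influence probabilities of nodes on nearby targets have already been computed and are supplied as a dictionary `reach_prob`, that `cutoff` is the threshold below which a neighbor is not considered effective, and that nodes too expensive for the remaining budget are skipped while scanning continues; all names and toy values are hypothetical.

```python
def hop_based_heuristic(nodes, targets_benefit, cost, budget, reach_prob, cutoff):
    """Rank nodes by expected earned benefit per unit cost and fill the budget.

    targets_benefit: dict target -> benefit.
    reach_prob: dict (node, target) -> influence probability of node on target,
                assumed precomputed for nodes within h hops of each target.
    cutoff: pairs with influence probability below this value are ignored.
    """
    # Direct benefit: a target chosen as a seed earns its own benefit.
    earned = {u: targets_benefit.get(u, 0.0) for u in nodes}
    # Expected benefit from nearby targets that the node is likely to influence.
    for (u, t), p in reach_prob.items():
        if u != t and p >= cutoff:
            earned[u] += p * targets_benefit[t]
    ranked = sorted(nodes, key=lambda u: earned[u] / cost[u], reverse=True)
    seeds, remaining = [], budget
    for u in ranked:
        if cost[u] <= remaining:
            seeds.append(u)
            remaining -= cost[u]
    return seeds

# Hypothetical toy input with two targets, b and c.
picked = hop_based_heuristic(
    nodes=["a", "b", "c"],
    targets_benefit={"b": 10, "c": 4},
    cost={"a": 1, "b": 4, "c": 1},
    budget=2,
    reach_prob={("a", "b"): 0.5, ("a", "c"): 0.01, ("c", "b"): 0.2},
    cutoff=0.1,
)
```

In the toy input, `c` scores 6 per unit cost (its own benefit plus 0.2 of the benefit of `b`), `a` scores 5, and `b` scores 2.5 but does not fit within the budget.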
Now, we analyze the time and space requirement of Algorithm 4 by assuming it as a sparse and regular graph. For initializing the array requires time (Line to ). Now, for a target node in a regular graph, number of nodes and edges within the hop is . Hence, performing breadth first search from upto depth requires time. For computing the influence probability from each node to , i.e., and comparing with requires time. In the worst case, all the hop neighbor nodes may be the effective hop neighbor nodes. Then, for computing the earned benefit by influencing the target node requires time. The same process is iterated over all the target nodes. Hence, the execution from Lines to of Algorithm 4 requires . Dividing the earned benefit by the corresponding selection cost requires time (Line to ). Sorting the nodes based on this value requires time. Now, scanning the sorted list for selecting the seed nodes requires time. Hence, total computational time of Algorithm 4 is , which is equivalent to . Other than the input social network, additional space requirements due to storing the earned benefits, influence probability and seed set which is of , , and , respectively. Hence, the total space requirement of Algorithm 4 is of . The formal statement is stated in Theorem 5.
Theorem 5.
Algorithm 4 has the running time of and space requirement of .
V Experimental Evaluation
In this section, we report the experimental evaluation of our proposed methodologies. Initially, we start with a brief description of the datasets.
V-A Dataset Description
In our experiments, we use the following four publicly available social network datasets.

Email-Eu-core Network Dataset (http://snap.stanford.edu/data/emailEucore.html) [52], [26]: This network is generated from the email exchanges among members of different departments of a large European research institution. There is an edge between two users if there is an email exchange between them.

Facebook Network Dataset (http://snap.stanford.edu/data/egoFacebook.html) [28]: This dataset was collected from survey participants using a Facebook app. Each user of the network is represented by a node, and two nodes are connected by an edge if the corresponding users are friends of each other on Facebook.

Physics Network Dataset (https://arxiv.org/): This is an academic collaboration network among the researchers of the physics section of arxiv.org. Two users are connected by an undirected edge if they are coauthors of at least one paper.

Epinions (http://www.epinions.com/?sb=1) [40]: This is a who-trusts-whom online social network of a general consumer review site, Epinions.com. There is a directed edge from one user to another if the former trusts the latter.
Among them, the first, second, and fourth are downloaded from the Stanford Network Analysis Project (SNAP) data collection (http://snap.stanford.edu/data/index.html), and the third is from https://www.microsoft.com/enus/research/people/weic/#!selectedprojects. These datasets have been extensively used in social influence maximization research [11]. Table I gives basic statistics of the described datasets.
Dataset Name            Nodes   Edges    Avg Deg   Avg Clus Coeff

Email-Eu-core Network   1005    25571    25.443    0.3994
Facebook Dataset        4039    88234    43.6910   0.6055
PHY Network             37154   231584   12.466    0.2371
Epinions                75879   508837   15.6345   0.1378
V-B Parameter Settings
V-B1 Diffusion Probability
In this paper, we consider the following two diffusion probability settings.

Uniform Setting: In this setting, , and . We set the value of as . This value has also been used in the literature, in many studies [16].

Trivalency Setting: In this setting, each edge is assigned diffusion probability uniformly at random from the set .
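The two settings can be sketched as simple edge-weight assignments. The function names are assumptions, and the numeric values in the usage lines are placeholders chosen for illustration, not the values used in the experiments.

```python
import random

def assign_uniform(edges, p):
    """Give every edge the same diffusion probability p."""
    return {e: p for e in edges}

def assign_trivalency(edges, choices, rng):
    """Give every edge a probability drawn uniformly at random from `choices`."""
    return {e: rng.choice(choices) for e in edges}

# Placeholder inputs for illustration only.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a")]
uniform_weights = assign_uniform(toy_edges, 0.1)
trivalency_weights = assign_trivalency(toy_edges, (0.1, 0.01, 0.001), random.Random(7))
```

Passing the random generator explicitly keeps the trivalency assignment reproducible across runs.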
V-B2 Target Nodes
In this study, we select of the nodes as target nodes and they are chosen uniformly at random. We adopt this settings from [38].
V-B3 Cost and Benefit
In this study, we follow two different settings.

First one is the random setting, where the selection cost of the nodes and the earned benefit of the target nodes are selected from the intervals and , respectively, uniformly at random. We adopt this setting from [39] and call it the random setting.

The second setting rests on the premise that the influence ability of a node is proportional to its degree; naturally, its selection cost should then also be proportional to its degree. We adopt this setting from [38] and compute the selection cost of a node as
(10) and in this case, the benefit of each target node is considered as . We call this setting as the ‘degree proportional’ setting.
V-B4 Budget
In our study, based on the two different cost assignment settings, we adopt two different budget settings. In case of random setting, we consider the budget values starting from continued till , and each time is incremented by . In the ‘degree proportional setting’, we start with the budget value of and continued until with a gap of .
VB5 Hop Count and Cut Off Probability
VC Algorithms in the Experiment
Here, we list the algorithms that we have considered in the experimentation.
VC1 Algorithms proposed in this paper

Hop-Based Heuristic (HBH): This is Algorithm 4 of this paper, which works by computing the expected earned benefit of the nodes that are within the hop (for a given value of ) of the target nodes.
VC2 Baseline Algorithms

Maximum Degree Heuristic (MAX_DEG): In this method, the maximum degree nodes within the budget are returned as the seed set. This method has been used in previous studies as well [23].

Degree Discount Heuristic (DEG_DIS): This is a popular heuristic for the SIM Problem proposed by Chen et al. [12]. In this heuristic, if is a seed node and , then the degree of will be discounted by , where is the number of neighbors of currently in the seed set, and is the degree of . This method has been used in many previous studies [21].

ComPBRA: This is a recently developed community-based solution framework for the EBM Problem by Banerjee et al. [5].
All the algorithms have been implemented in ‘Python 3.4’ along with ‘NetworkX 1.9.1’. We have carried out all the experiments on a high performance computing cluster with nodes, each of them having cores and of RAM, running in a CentOS environment. As Algorithm 2 (IGAAG) is quite inefficient, we do not execute it on the larger datasets (e.g., Physics Network Dataset, Epinions).
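The two degree-based baselines above can be sketched as follows for an undirected graph stored as a dictionary of neighbor sets. How each baseline handles the budget is an assumption of this sketch: a node that no longer fits the remaining budget is skipped and the scan continues. The discount rule is the standard one from the degree discount heuristic, dd_v = d_v - 2 t_v - (d_v - t_v) t_v p. The function names and the toy graph are hypothetical.

```python
def max_degree_seeds(adj, cost, budget):
    """Scan nodes by decreasing degree; keep each node that still fits the budget."""
    seeds, spent = [], 0
    for u in sorted(adj, key=lambda n: len(adj[n]), reverse=True):
        if spent + cost[u] <= budget:
            seeds.append(u)
            spent += cost[u]
    return seeds


def degree_discount_seeds(adj, cost, budget, p):
    """Budgeted variant (an assumption) of the degree discount heuristic."""
    degree = {v: len(adj[v]) for v in adj}
    t = {v: 0 for v in adj}       # number of seed neighbors of v so far
    dd = dict(degree)             # discounted degrees
    seeds, spent = [], 0
    candidates = set(adj)
    while candidates:
        u = max(candidates, key=lambda v: dd[v])
        candidates.remove(u)
        if spent + cost[u] > budget:
            continue              # unaffordable: skip without discounting neighbors
        seeds.append(u)
        spent += cost[u]
        for v in adj[u]:
            if v in candidates:
                t[v] += 1
                dd[v] = degree[v] - 2 * t[v] - (degree[v] - t[v]) * t[v] * p
    return seeds


# Hypothetical toy graph: node 1 has high degree but overlaps heavily with node 0.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 9), (1, 2), (1, 3), (1, 4),
         (5, 6), (5, 7), (5, 8)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
unit_cost = {v: 1 for v in adj}
```

On this toy graph with unit costs and a budget of 2, the maximum degree scan picks the two highest-degree nodes even though they share most of their neighbors, whereas the discounted version moves on to a node in the other component after the first pick.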
VD Experimental Results and Discussion
The main goal of our experimentation is to make a comparative study of the proposed as well as the baseline methods in terms of performance. Performance is measured as the amount of earned benefit obtained by influencing the target users due to the initial activation of the seed nodes selected by different algorithms. We also report the computational time required by different algorithms for selecting the seed sets.
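One way to estimate this performance measure is by Monte Carlo simulation. The sketch below assumes an independent-cascade style diffusion, which this section does not state explicitly, and counts the benefit of every activated target node, including targets that are themselves seeds. The function names, the number of runs, and the graph representation (successor lists plus a per-edge probability dictionary) are illustrative assumptions.

```python
import random


def simulate_cascade(succ, prob, seeds, rng):
    """Run one cascade and return the set of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in succ.get(u, ()):
                # A newly active node gets one chance to activate each inactive neighbor.
                if v not in active and rng.random() < prob[(u, v)]:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def expected_earned_benefit(succ, prob, seeds, benefit, runs=1000, seed=None):
    """Average, over `runs` simulations, the total benefit of activated target nodes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        active = simulate_cascade(succ, prob, seeds, rng)
        total += sum(b for node, b in benefit.items() if node in active)
    return total / runs
```

The `benefit` dictionary holds entries for target nodes only, so non-target activations contribute nothing to the estimate.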
VD1 Performance on Earned Benefit
The first row of Figure 1 shows the budget vs. earned benefit plots for the ‘email-Eu-core’ dataset. Under the random and degree proportional settings, the maximum benefit that can be earned is and , respectively. The results show a gap in the earned benefit between the existing methods and the methods proposed in this paper, and the gap is even more significant in the trivalency setting than in the uniform setting. As an example, in the uniform setting () with random cost and benefit assignment for , among the existing methods from the literature, the seed set selected by ComPBRA leads to the highest earned benefit, which is , i.e., of the maximum possible. On the other hand, among the proposed methodologies, the seed set selected by IGAAG leads to the highest earned benefit of , which is of the maximum possible. In the degree proportional setting with trivalency diffusion probabilities, for , among the existing methods, the seed sets selected by both PMIA and ComPBRA lead to an earned benefit of ( of the maximum possible), whereas the same for both IGAAG and IGAIP is ( of the maximum possible).
Next, we report the results for the ‘Facebook’ dataset in the second row of Figure 1. In this dataset also, we observe that the seed sets selected by the proposed methodologies lead to more earned benefit compared to the existing methods. As an example, when the budget value is , under random cost and benefit with the trivalency setting, among the existing methods, the seed set selected by ComPBRA leads to an earned benefit of . However, the same for the seed set selected by the proposed hop-based heuristic is , which is almost more. Now, under the degree proportional cost and trivalency setting, when the budget value is , among the existing methods, the earned benefit due to the seed set selected by ComPBRA is , and the same for the hop-based heuristic is .
Figure 1: Budget vs. earned benefit plots, one row per dataset, with panels (1UR), (1TR), (1UD), (1TD); (2UR), (2TR), (2UD), (2TD); (3UR), (3TR), (3UD), (3TD); (4UR), (4TR), (4UD), (4TD).
Next, we show the results for the ‘Physics Collaboration Network’ dataset in the third row of Figure 1. In this dataset also, we observe a significant gap in the earned benefit between the existing methods and the methods proposed in this paper. The gap is larger in the trivalency setting. As an example, for , under random cost with the uniform influence probability setting, among the existing methods, the seed set selected by PMIA leads to the maximum amount of earned benefit, which is , and the same for the hop-based heuristic is . In the trivalency setting, for , the seed set selected by SIN_DIS leads to an earned benefit equal to , and the same obtained by the hop-based heuristic is . This is approximately more compared to the SIN_DIS method.
Next, we report the results for the ‘Epinions’ dataset in the last row of Figure 1. In this dataset also, we observe a significant difference between the earned benefit due to the seed sets selected by the baseline methods and the methods proposed in this paper. As an example, for the highest budget (), in the uniform setting under random cost and benefit assignment, the seed set selected by ComPBRA leads to an earned benefit of , and the same for the hop-based heuristic is , which is almost more compared to ComPBRA. Similarly, in the degree proportional setting under the trivalency diffusion model, the seed set selected by ComPBRA leads to an earned benefit of , and the same for the hop-based heuristic is , which is almost more compared to ComPBRA.
From the results, it is observed that the seed sets selected by the proposed methodologies lead to more earned benefit compared to the existing methods considered in this paper. Next, we report the computational time of the proposed and baseline methods.
VD2 Computational Time
Table II reports the computational time required for selecting the seed sets by different methodologies. Although the IGAAG method achieves an approximation guarantee, its computational time requirement is quite impractical. The IGAIP method mitigates this issue, running up to times faster than IGAAG. The hop-based heuristic, in turn, is much more efficient and scalable than both IGAAG and IGAIP, while achieving almost the same amount of earned benefit, and in some instances even more.
Among the baseline methods, MAX_DEG is the fastest, as it simply returns the high degree nodes within the budget. The DEG_DIS and SIN_DIS methods take more time than MAX_DEG. Among the existing methods, PMIA is seen to be the slowest.
Now, in real-life applications of this problem, such as ‘computational advertisement’ and ‘viral marketing’, the main priority from the advertiser's point of view will be the earned benefit. However, the methodology used for seed set selection should also be able to perform this task within a reasonable computational time. From the experimental evaluation, we observe that, among the proposed methodologies, the hop-based heuristic is far ahead of the existing methods.
Computational Time (in Seconds)
Budget  IGAAG  IGAIP  HBH  MAX_DEG  DEG_DIS  SIN_DIS  PMIA  ComPBRA
2000  6.2351  35.5362  0.0614  0.0253  0.0293  0.2825  0.2671  0.1667
4000  6.4995  61.2136  0.2222  0.0269  0.0358  0.4567  0.4988  0.3911
6000  6.4995  60.8463  0.2327  0.0294  0.0459  0.2289  1.0202  0.6006
8000  6.6856  63.1582  0.2826  0.0328  0.0830  0.5065  1.1021  0.7407
10000  6.8265  66.7364  0.4451  0.0365  0.1216  0.4219  1.2865  0.9168
12000  7.0004  79.1924  0.7923  0.0416  0.1461  0.6986  1.3761  1.0451
14000  7.3358  94.2375  1.2280  0.0474  0.1669  0.5504  1.4902  1.1184
16000  7.5138  110.7938  1.5740  0.0548  0.1872  0.5643  1.9567  1.2251

2000  9.3518  57.5381  0.5593  0.1270  0.1371  0.1779  0.5124  0.3252
4000  1.0031  72.1473  2.7052  0.1291  0.