1 Introduction
The Influence Maximization Problem aims at identifying a small set of highly influential users such that their initial activation leads to the maximum number of influenced nodes in the network Kempe et al. (2003). This is perhaps one of the most well-studied problems in computational social network analysis due to its practical applications in different domains, including crowdsourcing Hossain (2012), viral marketing Domingos (2005), computational advertisement Aslay et al. (2015), recommender systems Ye et al. (2012), etc. The problem was initially studied by Kempe et al. (2003), and since then there has been an extensive effort to study this problem and its variants. See Banerjee et al. (2020); Li et al. (2018) for recent surveys on this topic. The main cause of influence is the diffusion of information, a cascading effect by which information propagates from one part of the network to another. To study this process, several diffusion models have been introduced, such as the Independent Cascade Model (IC Model), the Linear Threshold Model (LT Model), the Maximum Influence Arborescence Model (MIA Model), and many more Li et al. (2017).
In reality, social networks are formed by rational human agents. This means that if a user is selected as a seed, then incentivization is required. However, the basic influence maximization problem assumes that every user of the network has an equal selection cost, though this may not be so in reality. To bridge this gap, Nguyen and Zheng (2013) introduced the ‘Budgeted Influence Maximization Problem’ (BIM Problem), where the users of the social network are associated with a nonuniform selection cost and a fixed budget is allocated. The goal is to select a subset of nodes within the allocated budget such that their initial activation leads to the maximum number of influenced nodes. Compared to the basic influence maximization problem, the existing literature on this problem is very limited. In this paper, we study this problem under the cooperative game theoretic framework.
Game Theory, which is the mathematical study of strategic interaction among a group of rational agents, has been used to solve many problems in the domain of social network analysis, such as community detection Chen et al. (2010a), opinion dynamics Ding et al. (2010), leader selection Zimmermann and Eguíluz (2005), rumor dissemination Kostka et al. (2008), influence maximization Clark and Poovendran (2011), and many more. However, to the best of our knowledge, the BIM Problem has not yet been studied under the game theoretic framework. In this paper, we study the BIM Problem under the cooperative game theoretic framework, and our study is motivated by the work of Narayanam and Narahari (2010). However, there are the following fundamental differences:

Narayanam and Narahari (2010)’s study is concentrated on the basic influence maximization problem and the nonuniform nature of the selection cost has not been taken into account, which is a practical concern.

In their study they have used IC and LT as the underlying diffusion models. However, several recent studies on influence maximization consider the MIA Model as the underlying model of diffusion Ke et al. (2018). In this study, we also use the MIA Model as the underlying diffusion model.

It is well known that the complexity of computing the Shapley value of an n-player cooperative game is of O(n!) Narahari (2014). Even in the case of a small social network (e.g., number of nodes are ) this is a huge computational burden. To avoid this, they randomly chose a linear number of permutations of the players for computing the Shapley value Narayanam and Narahari (2010). In contrast, with the help of an existing result by Maleki et al. (2013), we use an appropriate number of samples such that the Shapley value can be computed with bounded error with high probability.
In particular, we make the following contributions in this paper:

We formulate a cooperative diffusion game for the Budgeted Influence Maximization Problem, and design an iterative algorithm based on the solution concept of a cooperative game known as Shapley value for identifying the influential nodes.

We study the important properties of the formulated game, and a detailed complexity analysis of the proposed methodologies is also provided.

We also show that if we consider the community structure of the network, then the proposed methodology can lead to an even larger number of influenced nodes.

The proposed methodologies have been evaluated on four publicly available social network datasets, and an extensive set of experiments has been carried out to demonstrate the effectiveness of the proposed methodology.
The rest of the paper is organized as follows: Section 2 describes relevant studies from the literature. Section 3 describes the required preliminary concepts and defines the diffusion game formally. The proposed solution methodologies are described in Section 4. Section 5 contains the experimental evaluation, and finally, Section 6 concludes our study and gives future directions.
2 Related Work
Our study is closely related to the ‘Influence Maximization in Social Networks’, more particularly the Budgeted Influence Maximization Problem, and also game theoretic solution methodologies for the influence maximization problem. Here, we present relevant studies from the literature.
Influence Maximization in Social Networks
The problem of influence maximization aims at choosing a small number of highly influential users in a social network for initial activation such that, due to the cascading effect of diffusion, the number of influenced nodes is maximized Kempe et al. (2003). Domingos and Richardson (2001); Richardson and Domingos (2002) first introduced this problem for ‘viral marketing’ in social networks. Later on, Kempe et al. (2003, 2005, 2015) studied the computational issues of this problem and showed that the problem is NP-Hard under the Independent Cascade and Linear Threshold Models of diffusion. However, they gave a constant-factor approximation algorithm, which works based on maximum marginal influence gain computation, together with an inapproximability result. Their study remains influential and has triggered a huge amount of work over the last decade and a half. Solution methodologies can be grouped into different categories: ‘approximation algorithms’ such as CELF Leskovec et al. (2007b), CELF++ Goyal et al. (2011a), MIA Wang et al. (2012), PMIA Wang et al. (2012), TIM Tang et al. (2014), IMM Tang et al. (2015); heuristic solutions such as SIMPATH Goyal et al. (2011b), IRIE Jung et al. (2012), LDGA Chen et al. (2010b); community-based solution methodologies such as CIM Chen et al. (2012, 2014), ComPath Rahimkhani et al. (2015), INCIM Bozorgi et al. (2016), CoFIM Shang et al. (2017); and many more. Please refer to Banerjee et al. (2020); Hafiene et al. (2020); Li et al. (2018) (and references therein) for recent surveys.
Budgeted Influence Maximization Problem
In the case of the BIM Problem, along with the input social network, we are also given a cost function that assigns a selection cost to each node, and a fixed budget is allocated for the seed set selection. Nguyen and Zheng (2013) first introduced the BIM Problem and proposed a factor approximation algorithm and two efficient heuristics to solve this problem. Recently, Wang and Yu (2020) studied the BIM Problem and proposed a solution methodology that gives an approximation ratio of . They further showed that this can be improved up to . Güney (2019) proposed an integer programming-based approach to solve this problem under the IC Model of diffusion. Han et al. (2014) proposed a couple of heuristics for this problem that carefully consider both cost-effective nodes and influential nodes. Recently, Banerjee et al. (2019) proposed a community-based solution methodology for the BIM Problem, which is broadly divided into four steps, namely, community detection, budget distribution, seed node selection, and budget transfer. Shi et al. (2019) proposed two different solution methodologies with data-dependent approximation ratios. Yu et al. (2018) studied this problem under the credit distribution model and came up with a streaming algorithm with approximation of the optimum.
Game Theoretic Solution Methodologies for SIM and Related Problems
Game theoretic techniques have been used to solve influence maximization problems for the last decade. To the best of our knowledge, the first study in this direction was by Narayanam and Narahari (2010). They proposed a ‘Shapley Value’-based approach for selecting the seed nodes for the influence maximization problem. Clark and Poovendran (2011) studied the influence maximization problem in a competitive situation; they formulated a Stackelberg Game and proposed a methodology to solve the game. Borodin et al. (2010) studied the influence maximization problem in a competitive situation with several natural extensions of the LT Model. Angell and Schoenebeck (2017) studied the influence maximization problem and proposed a greedy heuristic that shows an improvement of and for submodular and non-submodular influence functions, respectively. Li et al. (2015) studied the influence maximization problem in a competitive situation and proposed a game theoretic framework for this problem. They designed a non-cooperative game and proposed an algorithm that computes the ‘Nash Equilibrium’ of the game, which guarantees optimal strategies. Recently, Wang et al. (2020) proposed a cumulative oversampling technique for ‘Thompson Sampling’ to construct optimistic parameter estimates with fewer samples. They showed that their learning algorithms can be used to solve the BIM Problem using fewer samples.
To the best of our knowledge, there does not exist any literature that studies the BIM Problem under a game theoretic setting. In this paper, we study this problem under the cooperative game theoretic framework. In particular, we propose an iterative approach based on the Shapley Value for selecting the influential nodes for the BIM Problem. We also show that exploiting the community structure helps the proposed methodology achieve even better influence spread.
3 Preliminary Concepts, Background, and Problem Definition
Here, we present some preliminary definitions. Initially, we start with the social network specific ones.
3.1 Social Network Specific
Definition 1 (Social Network).
A social network is an interconnected structure among a group of human agents formed for social interactions, which is often represented as a graph . Here, the vertex set are the set of users, the edge set are the social ties among the users, and is the edge weight function that assigns each edge to its influence probability.
We denote the number of nodes and edges of the network by and , respectively. For any positive integer , denotes the set . For any , we denote the influence probability assigned to it as or . If, then . To start a diffusion process in the network, there must be some initial adopters, which we call seed nodes. To study the diffusion process in a network, several models have been introduced, such as the IC Model, the LT Model, the MIA Model, and so on Guille et al. (2013). Once a diffusion process is initiated from a set of seed nodes, it ends by influencing a subset of the nodes; the size of this subset is called the influence of the seed set, which is defined next.
Definition 2 (Influence of a Seed Set).
Given a social network and a diffusion model for a given seed set , its influence is defined as the number of nodes that are influenced due to the initial activation of the nodes in if the information is diffused by the rule of the model . denotes the set of influenced nodes by the seed set under the diffusion model . This quantity is measured in terms of expectation. Hence, the influence of the seed set under the diffusion model is , where is the social influence function, i.e., with . Here, denotes the empty set.
Real-world diffusion processes (such as political campaigns, viral marketing of products, etc.) are carried out to maximize the influence. In this direction, we next state the well-studied Social Influence Maximization Problem.
Definition 3 (Social Influence Maximization Problem (SIM Problem)).
Kempe et al. (2003) Given a social network , a diffusion model , and a positive integer (), the social influence maximization problem asks for selecting a seed set with , whose initial activation leads to the maximum number of influenced nodes, if the diffusion process happens by the rule of the model . Mathematically,
(1) 
As mentioned previously, real-world social networks are formed by rational and self-interested agents. If a node is selected as a seed, then some kind of incentivization is required. In reality, different users of the network have different selection costs, i.e., users have nonuniform selection costs. However, the SIM Problem assumes that the selection costs are uniform. To bridge this gap, the ‘Budgeted Influence Maximization Problem’ was introduced by Nguyen and Zheng (2013), which is defined next.
Definition 4 (Budgeted Influence Maximization Problem).
Given a social network along with a cost function that assigns each node to its selection cost, a diffusion model , and a fixed budget , the problem of budgeted influence maximization asks for selecting a seed set within the allocated budget, whose initial activation leads to the maximum number of influenced nodes, if the diffusion process happens by the rule of the model . Mathematically,
(2) 
where, denotes the total selection cost of the users in , i.e., . We denote an arbitrary instance of BIM Problem as .
From the algorithmic point of view the problem can be posed as follows:
Budgeted Influence Maximization Problem
Input: Social Network , Cost Function , Budget , and the Diffusion Model .
Problem: Find out a seed set , such that and is maximized.
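The problem statement above can be made concrete with a minimal sketch. It assumes a toy instance in which the influence function is a simple coverage count; the names `best_seed_set`, `coverage`, and `toy_influence` are hypothetical and are not part of the proposed methodology. Exhaustive search is only feasible for very small instances and is shown purely to illustrate the budget constraint and the objective.

```python
from itertools import combinations

def best_seed_set(nodes, cost, budget, influence):
    """Exhaustively search all budget-feasible seed sets (toy instances only)."""
    best, best_value = frozenset(), influence(frozenset())
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            # Budget constraint: total selection cost must not exceed the budget.
            if sum(cost[u] for u in subset) <= budget:
                value = influence(frozenset(subset))
                if value > best_value:
                    best, best_value = frozenset(subset), value
    return best, best_value

# Hypothetical toy instance: each node "influences" a fixed set of nodes.
coverage = {"a": {"a", "b"}, "b": {"b"}, "c": {"c", "d", "e"}, "d": {"d"}}
cost = {"a": 2, "b": 1, "c": 3, "d": 1}

def toy_influence(seeds):
    """Stand-in for the influence function: number of distinct covered nodes."""
    covered = set()
    for u in seeds:
        covered |= coverage[u]
    return len(covered)

seeds, value = best_seed_set(list(coverage), cost, 5, toy_influence)
```

In this instance the only feasible set that covers all five nodes is the pair of nodes "a" and "c", whose total cost equals the budget.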
Next, we state the Maximum Influence Arborescence (MIA) diffusion model, which we have considered as the underlying diffusion model. Symbols and notations used in this paper are given in Table 1.
Symbol  Meaning 

The input social network  
,  Vertex and Edge set of 
The number vertices and edges of  
The edge probability function  
The set of communities of  
The number of communities of  
The edge probability of the edge  
Propagation probability of the path  
The cost function  
Cost of the user  
Budget for seed set selection  
The social influence function  
The seed set  
Total selection cost of the users in  
The influence of the seed set  
Maximum influence path from to  
Set of all paths between and  
Cut Off Probability  
Maximum degree of  
The set of players  
The utility/ payoff function  
The BIM Game  
Shapley value of the th player  
The set of MIA Paths to the node  
The set of positive integers  
The set of positive real numbers including zero 
3.2 The MIA Diffusion Model
This is the diffusion model introduced by Wang et al. (2012). Before stating the diffusion model first we state two preliminary definitions.
Definition 5 (Propagation Probability of a Path).
Given two vertices , let denote the set of paths from the vertex to . For any arbitrary path , the propagation probability is defined as the product of the influence probabilities of the edges that constitute the path.
(3) 
Definition 6 (Maximum Probabilistic Path).
Given two vertices , the maximum probabilistic path is the path with the maximum propagation probability and denoted as . Hence,
(4) 
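As a minimal sketch of Definitions 5 and 6, the maximum probabilistic path can be found with Dijkstra's algorithm, since maximizing a product of probabilities is equivalent to minimizing the sum of their negative logarithms. The nested-dictionary graph representation and the function name `max_probability_path` are assumptions made for illustration.

```python
import heapq
from math import exp, log

def max_probability_path(graph, source, target):
    """Return (path, probability) of the most probable source-to-target path.

    `graph` maps each node to {neighbour: influence probability in (0, 1]}.
    Returns (None, 0.0) if the target is unreachable.
    """
    dist = {source: 0.0}          # accumulated -log(probability)
    parent = {source: None}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, p in graph.get(u, {}).items():
            nd = d - log(p)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in done:
        return None, 0.0
    path, node = [], target
    while node is not None:       # walk back from target to source
        path.append(node)
        node = parent[node]
    return path[::-1], exp(-dist[target])

# Hypothetical example: the two-hop path (0.5 * 0.5) beats the direct edge (0.2).
toy_graph = {"a": {"b": 0.5, "c": 0.2}, "b": {"c": 0.5}}
path, probability = max_probability_path(toy_graph, "a", "c")
```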
Based on the path propagation probability and the maximum probabilistic path, maximum influence in-arborescences are defined as follows.
Definition 7 (Maximum Influence In-Arborescences (MIIA)).
For a node , and a probability threshold , the maximum influence in-arborescence is the union of the maximum influence paths to with more than the cut-off probability. Mathematically,
(5) 
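One possible way to build such an in-arborescence is a Dijkstra search that runs backwards from the target node and is pruned at the cut-off probability. The sketch below is an illustration under assumptions: incoming edges are stored in a hypothetical dictionary `in_edges`, and paths whose probability is at least the threshold are kept (the boundary case may be treated differently in the original definition).

```python
import heapq
from math import exp, log

def miia(in_edges, v, theta):
    """Return {u: (next_hop_towards_v, probability)} for every node u whose
    most probable path to v has probability at least theta.

    `in_edges` maps each node x to {u: probability of edge (u, x)}.
    """
    limit = -log(theta)
    best = {v: 0.0}               # accumulated -log(probability) towards v
    next_hop = {v: None}
    heap = [(0.0, v)]
    done = set()
    while heap:
        d, x = heapq.heappop(heap)
        if x in done:
            continue
        done.add(x)
        for u, p in in_edges.get(x, {}).items():
            nd = d - log(p)
            # Prune paths whose probability falls below the cut-off.
            if nd <= limit and nd < best.get(u, float("inf")):
                best[u] = nd
                next_hop[u] = x
                heapq.heappush(heap, (nd, u))
    return {u: (next_hop[u], exp(-best[u])) for u in done}

# Hypothetical example: b is pruned because 0.1 is below the cut-off 0.2.
toy_in_edges = {"v": {"a": 0.5, "b": 0.1}, "a": {"c": 0.5}}
tree = miia(toy_in_edges, "v", 0.2)
```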
Now, given a seed set , a node and its , by the rule of model it is assumed that the influence from to is propagated through the paths in . Let be the influence probability of the node by the seed set and this can be recursively computed as mentioned in Wang et al. (2012).
Definition 8 (MIA Diffusion Model).
For any seed set and any arbitrary node , in the MIA Model, it is assumed that the nodes in will influence through the paths in , and the expected influence of the seed set is given by Equation 6.
(6) 
The following two lemmas will be useful to prove some properties of the diffusion game, which will be defined later.
Lemma 1.
3.3 Cooperative Game Theory Specific
Cooperative game theory studies the strategic formation of coalitions and their mathematical analysis. A cooperative game is defined next.
Definition 9 (Cooperative Game or Coalition Game or Transferable Utility Game).
A Cooperative Game is defined by the tuple , where is the finite set of players and is called the payoff function (also known as the characteristic function) that assigns each possible coalition to a real number, i.e., . It is assumed that .
In cooperative game theory, one of the main concerns is how to distribute the total utility among the players. This is called the payoff allocation.
Definition 10 (Payoff Allocation).
A payoff allocation is a vector in , where each entry represents the utility share of the corresponding player.
The Shapley Value is an important solution concept in cooperative game theory, which performs the payoff allocation satisfying the following three properties, namely, symmetry, linearity, and carrier Narahari (2014).
Definition 11 (Shapley Value).
In a cooperative game , the Shapley Value of the player towards a coalition is defined as
(7) 
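To illustrate the definition, the Shapley value of a small game can be computed exactly by averaging each player's marginal contribution over all orderings of the players. The sketch below assumes a hypothetical coverage game (`toy_utility`) as the payoff function; it is not the BIM Game itself and is only practical for a handful of players.

```python
from itertools import permutations
from math import factorial

def exact_shapley(players, utility):
    """Average marginal contribution of each player over all n! orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += utility(with_p) - utility(coalition)
            coalition = with_p
    n_fact = factorial(len(players))
    return {p: total / n_fact for p, total in totals.items()}

# Hypothetical coverage game: the payoff of a coalition is the number of
# distinct items its members cover; the empty coalition has payoff 0.
covers = {"a": {1, 2}, "b": {2, 3}, "c": {3}}

def toy_utility(coalition):
    covered = set()
    for p in coalition:
        covered |= covers[p]
    return len(covered)

phi = exact_shapley(list(covers), toy_utility)
```

In this game each item's unit of payoff is shared equally among the players that cover it, so the values are 1.5, 1.0, and 0.5 for players a, b, and c, and they sum to the payoff of the grand coalition.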
4 Proposed Methodology
In this section, we describe the game theoretic approach for solving the BIM Problem. This section is broadly divided into the following subsections. In Subsection 4.1, we define the BIM Game and establish its properties. Subsection 4.2 contains the overview of the proposed methodology. Subsection 4.3 contains the algorithms present in the proposed methodology, and finally, Subsection 4.4 contains time and space complexity analysis of our proposed approaches.
4.1 The BIM Game and Its Properties
Definition 12 (The BIM Game).
We define our diffusion game as a cooperative game, where the nodes of the network are the players, and for any subset of players, their utility is the expected influence in the network under the MIA Model of diffusion. We denote this diffusion game as , where , , and for any , . (Footnote 2: As mentioned previously, in this study we assume that the diffusion happens by the rule of the MIA Model. Hence, from now onwards we omit the subscript from .)
Now, we show some of the important properties of the utility function and the proposed diffusion game.
Proposition 1 (Nonnegativity and Monotonicity of ).
The utility function of the diffusion game in Definition 12 is nonnegative and monotone.
Proof.
As mentioned in Lemma 1, under the MIA diffusion model, the influence function is nonnegative and monotone; hence, the same holds for the utility function as well. ∎
Proposition 2 (NonConvexity of the Diffusion Game).
The BIM Game defined in Definition 12 is not convex.
Proof.
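By definition, a cooperative game is said to be convex if its utility function has the following property: for all and for all , . That is, convexity requires the marginal contribution of a player to be nondecreasing as the coalition grows. As mentioned in Lemma 2, the influence function is submodular, so marginal contributions are nonincreasing instead; this implies that the diffusion game cannot be convex. ∎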
Proposition 3 (SubAdditivity of the BIM Game).
The BIM Game mentioned in Definition 12 is subadditive.
Proof.
Let and be two coalitions of the BIM Game . We need to show that the utility of the larger coalition, i.e., , is at most the total utility of the individual coalitions; i.e., .
Hence, it is proved that the BIM Game is subadditive. ∎
4.2 Overview of the Proposed Methodologies
Here, we describe the Shapley value-based iterative approach for identifying the influential users from a social network for the BIM Problem. The proposed methodology is broadly divided into two steps: (i) Shapley Value Computation, and (ii) Seed Set Selection. We now explain both steps in detail.
Step 1 (Shapley Value Computation):
In this step, the Shapley values of all the nodes of the network are computed. As mentioned previously, the Shapley value of a node is the average of its marginal contributions over all possible ways of forming the grand coalition, since the players may arrive in any order. There are possible ways to form the grand coalition. By Stirling's approximation, the growth of is of ; i.e., exponential Cormen et al. (2009). Hence, even for a small network (say, comprising nodes, though real-world networks are much larger), exact computation of the Shapley value is not feasible. To get around this problem, we use a result from an existing study by Maleki et al. (2013). Their study shows that if the range of the marginal contributions of the players is known, it is possible to compute the Shapley value of the players with bounded error and high probability. Particularly, if the number of permutations (denoted by ) considered is greater than or equal to , then the error incurred in the Shapley value computation for any of the players is bounded by with probability . By this principle, for given , , and values, we can easily calculate the number of permutations required for Shapley value computation such that all the conditions are met. Algorithm 1 describes this procedure.
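The sample-size calculation can be sketched as follows. The exact expression is not reproduced in the text above, so this sketch assumes a Hoeffding-style bound of the form m >= r^2 ln(2/delta) / (2 epsilon^2), where r is the range of the marginal contributions, epsilon the error tolerance, and delta the failure probability; the function name and parameter names are hypothetical.

```python
from math import ceil, log

def required_permutations(r, epsilon, delta):
    """Number of sampled permutations under an assumed Hoeffding-style bound.

    r       : range of the marginal contributions (max minus min)
    epsilon : tolerated absolute error in the estimated Shapley value
    delta   : probability with which the error may exceed epsilon
    """
    if r < 0 or epsilon <= 0 or not 0 < delta < 1:
        raise ValueError("need r >= 0, epsilon > 0 and 0 < delta < 1")
    return ceil(r * r * log(2.0 / delta) / (2.0 * epsilon * epsilon))

m_loose = required_permutations(1.0, 0.1, 0.05)
m_tight = required_permutations(1.0, 0.05, 0.05)
```

Under this assumed form, halving the error tolerance roughly quadruples the number of permutations, while tightening the failure probability only has a logarithmic effect.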
Step 2 (Seed Set Selection):
Based on the computed shapley value, we select the seed nodes using the following ways:

Method 1: Once the Shapley values of the players are computed, we sort the players based on these values and iteratively pick users as seed nodes until the budget is exhausted. In each iteration, once a node has been picked as a seed node, we mark its neighbors as ineligible so that they cannot be selected as seeds. This helps the proposed methodology spread the seed nodes uniformly across the network. Algorithm 2 describes this procedure.

Method 2: In this method, we detect the inherent community structure of the network. For this purpose, the Louvain Algorithm Blondel et al. (2008) has been used in our study. After that, the total allocated budget is divided among the communities in proportion to the Shapley values of their nodes. Based on this shared budget, high Shapley value nodes are chosen as seed nodes from each of the communities until the budget is exhausted. If any budget remains unused after seed selection in a community, it is transferred to the largest community. Algorithm 3 describes this procedure.
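Method 1 can be sketched as below. This is an illustration under assumptions rather than a transcription of Algorithm 2: in particular, the sketch skips a node that is too expensive for the remaining budget and continues with cheaper ones, and the names `select_seeds`, `shapley`, and `neighbors` are hypothetical.

```python
def select_seeds(shapley, cost, neighbors, budget):
    """Pick seeds in decreasing Shapley order, blocking neighbors of chosen seeds."""
    seeds, remaining = [], budget
    blocked = set()
    for u in sorted(shapley, key=shapley.get, reverse=True):
        # Skip nodes adjacent to an earlier seed or too costly for what is left.
        if u in blocked or cost[u] > remaining:
            continue
        seeds.append(u)
        remaining -= cost[u]
        blocked.update(neighbors.get(u, ()))
    return seeds, remaining

# Hypothetical toy input.
toy_shapley = {"a": 3.0, "b": 2.5, "c": 2.0, "d": 1.0}
toy_cost = {"a": 2, "b": 1, "c": 2, "d": 1}
toy_neighbors = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
chosen, left = select_seeds(toy_shapley, toy_cost, toy_neighbors, 4)
```

Here node b is passed over despite its high value because it neighbors the already selected node a, which is what spreads the seeds across the network.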
We describe both methodologies in detail in the next subsection.
4.3 Algorithms in the Proposed Methodology
Now, we describe the proposed approaches in the form of algorithms. Algorithm 1 implements Step 1 of our proposed methodology. The working principle of this algorithm is as follows: Given , , and , we first compute the number of permutations required for Shapley value computation. Next, for each permutation we compute the marginal gain of the nodes. In Algorithm 1, denotes the set of nodes that appeared before in the permutation . First, we activate the first node in the permutation and compute the influence. Next, we consider the second node in the permutation. If it is already activated by the first node, then its marginal contribution is . If it is not activated, then we compute the marginal gain by subtracting the individual influence from the combined influence. In this way, we compute the marginal gain of all the nodes of the network. We repeat this process for (part of input) given number of times. Now, we compute the marginal gain by dividing . Finally, the Shapley value is computed by dividing by the number of permutations considered in the Shapley value computation.
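The permutation-sampling idea described above can be sketched as follows. This is a simplified illustration, not Algorithm 1 itself: it omits the repetition step and the range computation, uses a hypothetical coverage game in place of the MIA influence function, and all names are assumptions.

```python
import random

def sampled_shapley(players, utility, num_permutations, seed=0):
    """Estimate Shapley values by averaging marginal gains over random orderings."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition = frozenset()
        value = utility(coalition)
        for p in order:
            coalition = coalition | {p}
            new_value = utility(coalition)
            totals[p] += new_value - value   # marginal gain of p in this ordering
            value = new_value
    return {p: t / num_permutations for p, t in totals.items()}

# Hypothetical coverage game used as a stand-in for the influence function.
covers = {"a": {1, 2}, "b": {2, 3}, "c": {3}}

def toy_utility(coalition):
    covered = set()
    for p in coalition:
        covered |= covers[p]
    return len(covered)

estimate = sampled_shapley(list(covers), toy_utility, 200)
```

Because the marginal gains in every ordering add up to the payoff of the grand coalition, the estimates always sum to that payoff regardless of how many permutations are sampled.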
It is important to observe that Algorithm 1 takes the range of the marginal gain of the players as one of its inputs. We now describe how this quantity is computed. For any node , we compute its range as follows: Consider the neighbors of the node, i.e., . Consider a node . Now, can be influenced by any of its neighbors. The upper value of this range is . In the worst case, none of the neighbors may be influenced, and hence, the lower value of the range is . So, the range of the marginal gain of the player is . After computing the range of all the players, we aggregate them by taking the average over all the players; i.e., .
As mentioned, after computing the Shapley values of the players, the next step is to choose the seed nodes. Algorithm 2 describes Method 1. Though Algorithm 2 is easy to understand and simple to implement, its performance can be improved by exploiting the community structure of the network. Algorithm 3 describes that procedure.
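The community-based selection can be sketched as below. The sketch assumes that community labels are already available (for example, from a community detection step), that each community's share of the budget is proportional to the sum of its nodes' Shapley values, and that the largest community is processed last so that it receives all unused budget. These details and the names `split_budget` and `select_by_community` are assumptions for illustration, not a transcription of Algorithm 3.

```python
def split_budget(shapley, community, budget):
    """Share the budget among communities in proportion to total Shapley value."""
    totals = {}
    for u, phi in shapley.items():
        totals[community[u]] = totals.get(community[u], 0.0) + phi
    grand_total = sum(totals.values())  # assumed to be positive
    return {c: budget * t / grand_total for c, t in totals.items()}

def select_by_community(shapley, cost, community, budget):
    """Select high Shapley value nodes per community; leftovers go to the largest."""
    shares = split_budget(shapley, community, budget)
    members = {}
    for u, c in community.items():
        members.setdefault(c, []).append(u)
    largest = max(members, key=lambda c: len(members[c]))
    # Process the largest community last so it can use all transferred budget.
    order = sorted(members, key=lambda c: c == largest)
    seeds, leftover = [], 0.0
    for c in order:
        available = shares[c] + (leftover if c == largest else 0.0)
        for u in sorted(members[c], key=shapley.get, reverse=True):
            if cost[u] <= available:
                seeds.append(u)
                available -= cost[u]
        if c != largest:
            leftover += available
    return seeds

# Hypothetical toy input with two communities labelled 0 and 1.
toy_community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
toy_shapley = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 3.0, "e": 1.0}
toy_cost = {"a": 4, "b": 3, "c": 2, "d": 3, "e": 2}
shares = split_budget(toy_shapley, toy_community, 10)
picked = select_by_community(toy_shapley, toy_cost, toy_community, 10)
```

In this toy run, community 1 leaves one unit of budget unused, and that unit is what allows the larger community 0 to afford both of its top two nodes.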
4.4 Complexity Analysis of the Proposed Methodologies
Now, we analyze the algorithms to understand the running time and space requirements of the proposed methodologies. From Line to , we compute the range of marginal gain of the players. This can be implemented in time and space. The degrees of all the nodes can be computed in time. Now, assume that denotes the maximum degree over all the nodes; i.e., . Hence, for a particular player , the instruction in Line 3 can be computed in time. The execution time of Line No is of . Hence, the computational time from Line to is of . As is upper bounded by , the quantity reduces to . It is important to observe that the marginal contribution of any player can be computed by traversing the graph, which takes . In each permutation, there are players. Hence, considering number of permutations and repeating the same experiment for times, the requirement for marginal gain computation is of time. After that, computing the Shapley value requires additional time. Hence, the total time requirement is of , and this reduces to . The extra space consumed by Algorithm 1 is for storing the arrays , , , and the degrees of the nodes, each of which requires space. Hence, Lemma 3 holds.
Lemma 3.
Running time and space requirement of Algorithm 1 is of and , respectively.
As mentioned previously, once the Shapley values of the players are computed, the seed set can be selected in either of two ways. In Algorithm 2, the players are first sorted based on the Shapley values computed in Algorithm 1. This step requires time. Let denote the maximum degree of the network. Now, Line to requires time. Hence, the running time of Algorithm 2 is of time. The extra space required by Algorithm 2 is for storing the array and the seed set , which is of . Hence, Lemma 4 holds.
Lemma 4.
The running time and space requirement of Algorithm 2 is of , and , respectively.
Algorithm 3 describes another way of selecting seed nodes. In this method, the community structure is detected first. This can be done by the Louvain Algorithm, which takes time (Footnote 3: https://en.wikipedia.org/wiki/Louvain_modularity). The array stores the community number corresponding to each user, i.e., means the user belongs to the th community of the network. Assume that there are number of communities in the network. Sorting the communities requires time. Computing the total Shapley value of the network requires time. Line to shows the budget distribution among the communities, which takes time. Line to describes the seed set selection process. It is important to observe that the running time of this phase depends upon the number of nodes that the community contains. To give a weak upper bound, we first calculate the running time of seed selection for the largest community, and then multiply it by the number of communities. Let denote the number of nodes in the largest community. time is required to identify the nodes belonging to that community, and sorting them based on the values computed in the array requires time. Selecting the nodes from the sorted list requires time. So, the time requirement for processing the largest community is of . Additionally, during the processing of communities other than the largest one, transferring the unutilized budget of the community to the largest community requires time. Hence, the time requirement for the execution from Line to is . Hence, the total time requirement of Algorithm 3 is . The extra space consumed by Algorithm 3 is for storing the array , which requires space; the array , which requires space; the vertices of the communities during seed set selection, which require space; and the seed set, which requires space. Hence, the total space requirement of Algorithm 3 is of . Hence, Lemma 5 holds.
Lemma 5.
Running time and space requirement of Algorithm 3 is of and , respectively.
Now, after computing the Shapley value by Algorithm 1, if we use Algorithm 2 for selecting the seed nodes, then the running time and space requirement of the proposed methodology become , and , respectively. Otherwise, if we use Algorithm 3 for seed set selection, then the running time and space requirement of the proposed methodology become , and , respectively. Hence, from Lemmas 3, 4, and 5, the following two theorems are implied.
Theorem 1.
If Algorithm 2 is used for seed set selection then the running time and space requirement of our proposed methodology becomes , and , respectively.
Theorem 2.
If Algorithm 3 is used for seed set selection then the running time and space requirement of our proposed methodology becomes , and , respectively.
5 Experimental Evaluation
In this section, we describe the experimental evaluation of the proposed methodologies. The obtained results are also compared with those obtained from existing methods in the literature. We start with a brief description of the datasets used in our experiments.
5.1 Datasets
We use the following three datasets, which come from two different settings. The first two datasets are collaboration networks among researchers in two different areas. The third one is the trust network among the users of the e-commerce site epinions.com.

HEP Theory Collaboration Network (HEPT) Leskovec et al. (2007a) (Footnote 4: https://arxiv.org/archive/hepth): This dataset contains the collaboration information among the high energy physics researchers obtained from the papers submitted to the high energy physics section of arxiv.org. Here, individual researchers are the nodes, and two nodes are linked by an edge if the corresponding researchers coauthored at least one paper.

Condensed Matter Physics Collaboration Network (CMP) Leskovec et al. (2007a) (Footnote 5: https://arxiv.org/archive/condmat): This is also a collaboration network, obtained by connecting the researchers of the condensed matter physics section of arxiv.org.

Epinions Dataset Richardson et al. (2003) (Footnote 6: http://www.epinions.com/): This dataset contains ‘who trusts whom’ information among the users of a review site named ‘Epinions’. Here, the users form the vertex set of the network and there is a directed edge between and if and only if trusts .
All these datasets have been previously used in the experimentation in the domain of influence maximization Jung et al. (2012); Tong et al. (2016); Wen and Deng (2020). Table 2 contains basic statistics of the mentioned datasets.
Dataset Name  Type  Nodes  Edges  Avg. Degree  (unlabeled)  Modularity

HEPT  Undirected  9877  25998  5.26  481  0.76382 
CMP  Undirected  23133  93497  8.08  20  0.68561 
Epinions  Directed  75879  508837  13.41  296  0.79866 
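As a quick sanity check on Table 2, the third numeric column appears consistent with an average degree computed as twice the number of edges divided by the number of nodes. The sketch below assumes that the first two numeric columns are the node and edge counts; this reading of the columns is an assumption, since the column headers are not all legible.

```python
# (nodes, edges, reported third numeric column) per dataset, copied from Table 2.
table = {
    "HEPT": (9877, 25998, 5.26),
    "CMP": (23133, 93497, 8.08),
    "Epinions": (75879, 508837, 13.41),
}

def average_degree(num_nodes, num_edges):
    """Average degree under the assumption that each edge contributes to two endpoints."""
    return 2.0 * num_edges / num_nodes

computed = {name: round(average_degree(n, m), 2) for name, (n, m, _) in table.items()}
```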
5.2 Experimental Setup
Now, we state the experimental setup that has been used in this paper. The following parameters need to be set in our study:

Selection Cost: We assign each user an integer selection cost drawn uniformly at random from the interval . In the study by Nguyen and Zheng (2013), the costs of the users are also chosen randomly from a fixed interval.

Budget: In our experiments, we start with a budget value of 2000 and increase it by 4000 each time up to 26000. Hence, we experiment with the budget values {2000, 6000, 10000, 14000, 18000, 22000, 26000}, which are the values listed in Table 3. Nguyen and Zheng (2013) also experiment with a set of fixed budget values.

Influence Probability: Following the existing studies in the literature, we consider the following three influence probability settings:

Uniform: In this setting, every edge of the network is associated with the same probability value. In this study, we consider two such fixed values, both of which have been used in many existing studies.

Trivalency: In this setting, each edge of the network is assigned a probability chosen uniformly at random from a fixed set of values. This setting has also been considered in many existing studies in the literature.

Weighted Cascade: In this setting, every edge $(u, v)$ has an influence probability equal to the reciprocal of the degree of $v$, i.e., $p_{uv} = 1/\deg(v)$. In the case of a directed network, $\deg(v)$ is replaced by the in-degree $\deg^{in}(v)$. This setting is also common in many existing studies.
All these influence probability settings have been used in existing studies on the influence maximization problem and its variants Hong and Liu (2019); Chen et al. (2010b); Logins and Karras (2019); Cohen et al. (2014).


Range of the marginal gain of each player: As mentioned previously, computing the Shapley value of a node requires the range of the marginal gain of the corresponding player. For any node, this range is derived from its neighborhood: a node can be influenced by any of its neighbors, which gives the upper value of the range, whereas in the worst case none of the neighbors is influenced, which gives the lower value of the range.
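The parameter choices above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cost interval [50, 100], the uniform probability 0.1, and the trivalency values (0.1, 0.01, 0.001) are placeholders chosen for the example, not values taken from this study; the budget schedule reproduces the Budget column of Table 3. All function names are hypothetical.

```python
import random


def assign_costs(nodes, low, high, rng):
    """Integer selection cost per node, drawn uniformly from [low, high]."""
    return {v: rng.randint(low, high) for v in nodes}


def budget_schedule(start, step, stop):
    """Budget values start, start + step, ..., up to and including stop."""
    return list(range(start, stop + 1, step))


def uniform_probs(edges, p):
    """Uniform setting: the same probability p on every edge."""
    return {e: p for e in edges}


def trivalency_probs(edges, values, rng):
    """Trivalency setting: each edge gets one of `values` uniformly at random."""
    return {e: rng.choice(values) for e in edges}


def weighted_cascade_probs(edges):
    """Weighted cascade: edge (u, v) gets 1 / in-degree(v).

    For an undirected network, list each edge in both directions so that
    the in-degree equals the degree.
    """
    in_deg = {}
    for _, v in edges:
        in_deg[v] = in_deg.get(v, 0) + 1
    return {(u, v): 1.0 / in_deg[v] for (u, v) in edges}


rng = random.Random(0)
edges = [(0, 1), (2, 1), (1, 3), (0, 3)]  # toy directed edge list
nodes = sorted({x for e in edges for x in e})

costs = assign_costs(nodes, 50, 100, rng)          # placeholder interval
budgets = budget_schedule(2000, 4000, 26000)       # Budget column of Table 3
p_uniform = uniform_probs(edges, 0.1)              # placeholder probability
p_tri = trivalency_probs(edges, (0.1, 0.01, 0.001), rng)  # placeholder set
p_wc = weighted_cascade_probs(edges)
```

In the toy graph, nodes 1 and 3 each have two incoming edges, so every incoming edge receives probability 0.5 under the weighted cascade setting and the incoming probabilities of each node sum to one.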
5.3 Goals of the Experiment
The goals of the experiments are as follows:

The primary goal of our experimentation is a comparative study of the proposed and existing methodologies with respect to the quality of the seed sets they select.

Our second concern is computational time. We therefore also compare the proposed and baseline methods in terms of their computational time requirements.
5.4 Algorithms in the Experiment
The following algorithms are included in our experiments:
5.4.1 Algorithms Proposed in this Paper

BIM with Cooperative Game Theory (BIMGT): In this method, the marginal contribution in terms of the ‘Shapley value’ is computed for every user of the network, and the users are sorted based on this value. Users are chosen from this sorted list until the budget is exhausted. This is ‘Method 1’ of this paper.

BIM with Cooperative Game Theory and Community Structure (BIMGTC): In this method, the community structure of the network is exploited first: the total budget is divided among the communities based on the total Shapley value of the nodes of each community. Subsequently, the nodes with high Shapley values are chosen as seed nodes from each community until the budget allocated to that community is exhausted.
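The selection step of the two proposed methods can be illustrated with a minimal sketch that takes the Shapley values as given. The scores, costs, and community assignment below are made-up toy inputs, and two details are assumptions of this example rather than statements about the methods: an unaffordable node is skipped instead of ending the scan, and budget left unused in one community is not passed on to another. Function names are hypothetical.

```python
def select_within_budget(scores, costs, budget):
    """Scan nodes in decreasing score order and add each affordable one.

    Assumption: a node whose cost exceeds the remaining budget is skipped,
    and the scan continues with the next node.
    """
    seeds, spent = [], 0
    for v in sorted(scores, key=scores.get, reverse=True):
        if spent + costs[v] <= budget:
            seeds.append(v)
            spent += costs[v]
    return seeds, spent


def select_by_community(scores, costs, communities, budget):
    """Split the budget across communities in proportion to their total
    score, then run the ranked selection inside each community.

    Assumption: leftover budget of a community is not redistributed.
    """
    total = sum(scores.values())
    seeds = []
    for members in communities:
        share = budget * sum(scores[v] for v in members) / total
        local_scores = {v: scores[v] for v in members}
        chosen, _ = select_within_budget(local_scores, costs, share)
        seeds.extend(chosen)
    return seeds


# Toy inputs: precomputed (hypothetical) Shapley values and selection costs.
shapley = {"a": 9.0, "b": 7.0, "c": 4.0, "d": 3.0, "e": 2.0}
costs = {"a": 5, "b": 4, "c": 2, "d": 3, "e": 1}
communities = [["a", "c"], ["b", "d", "e"]]

bimgt_seeds, bimgt_spent = select_within_budget(shapley, costs, 10)
bimgtc_seeds = select_by_community(shapley, costs, communities, 10)
print(bimgt_seeds, bimgt_spent, bimgtc_seeds)
```

With a budget of 10, the two communities receive shares of 5.2 and 4.8, so each can afford only its top-ranked node, whereas the global ranking also fits the cheap node `e`. Which variant yields the larger influence depends on the network, which is what the experiments below examine.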
5.4.2 Algorithms from the Literature
We compare the performance of the proposed solution approaches with the following existing methods from the literature:

Random (RAND): Starting with an empty seed set, in each iteration a node is chosen uniformly at random from the nonseed nodes and added to the seed set. This process is repeated until the allocated budget is exhausted. Many existing studies consider this method as a baseline Kempe et al. (2003); Wu et al. (2014).

Maximum Degree Heuristic (MDH): In this method, the nodes are sorted based on their degree. The high degree nodes are then chosen as seed nodes until the budget is exhausted. This method has been used in many existing studies on influence maximization Narayanam and Narahari (2010); Wu et al. (2014); Shang et al. (2017).

Maximum Clustering Coefficient Heuristic (MCCH): The working principle of this method is the same as that of MDH, except that the clustering coefficient of the nodes is used instead of the degree. The nodes are sorted based on this value, and the nodes with high clustering coefficient values are chosen as seed nodes until the allocated budget is exhausted. Existing studies on influence maximization have used this method as a baseline Narayanam and Narahari (2010).

DAG Based Heuristic for BIM Problem (DAGHEU) Nguyen and Zheng (2013): This is the first study on the BIM Problem, and it contains a number of solution methodologies. Among them, we compare the performance of our proposed methodologies with DAG1SPBP, which the authors claim to be the most efficient and effective.

Balanced Seed Selection Heuristic (BSSH) Han et al. (2014): In this method, the nodes are first divided into two groups: ‘cost effective’ nodes and ‘influential’ nodes. The seed nodes are then chosen from both groups.

Community-Based Approach for the BIM Problem (ComBIM) Banerjee et al. (2019): This is the first community-based solution approach for the BIM Problem. In this method, the total allocated budget is divided among the communities based on ‘node fraction’ and ‘cost fraction’. Subsequently, the high degree nodes of each community are chosen as seed nodes until the community specific budget is exhausted.
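The three simple baselines (RAND, MDH, MCCH) share one pattern: order the nodes, then fill the budget. The sketch below illustrates this on a toy undirected graph stored as an adjacency dictionary. It is an illustration only; the helper names are hypothetical, the toy costs are made up, and skipping unaffordable nodes rather than stopping at the first one is an assumption of this example.

```python
import random


def degree_scores(adj):
    """Degree of every node in an undirected adjacency dictionary."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def clustering_scores(adj):
    """Local clustering coefficient: fraction of neighbor pairs that are linked."""
    scores = {}
    for v, nbrs in adj.items():
        nbr_list = sorted(nbrs)
        k = len(nbr_list)
        if k < 2:
            scores[v] = 0.0
            continue
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbr_list[j] in adj[nbr_list[i]]
        )
        scores[v] = 2.0 * links / (k * (k - 1))
    return scores


def ranked(scores):
    """Nodes in decreasing order of score."""
    return sorted(scores, key=scores.get, reverse=True)


def fill_budget(order, costs, budget):
    """Add nodes in the given order while they fit in the budget
    (assumption: unaffordable nodes are skipped)."""
    seeds, spent = [], 0
    for v in order:
        if spent + costs[v] <= budget:
            seeds.append(v)
            spent += costs[v]
    return seeds


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
costs = {0: 2, 1: 2, 2: 3, 3: 1}
budget = 4

deg = degree_scores(adj)
cc = clustering_scores(adj)
mdh_seeds = fill_budget(ranked(deg), costs, budget)
mcch_seeds = fill_budget(ranked(cc), costs, budget)

rng = random.Random(1)
order = list(adj)
rng.shuffle(order)  # a uniformly random order, i.e. drawing without replacement
rand_seeds = fill_budget(order, costs, budget)
```

On this graph the two heuristics disagree: the degree ranking starts with the hub node 2, while the clustering coefficient ranking prefers the triangle nodes 0 and 1, whose neighbors are all connected to each other.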
All these methodologies have been implemented in a Python 3.4 + NetworkX 2.0.1 environment. The experiments of this study were performed on a workstation with an Intel® Xeon(R) W1290 CPU 3.20GHz × 20 and 16 GB of RAM.
5.5 Experimental Results with Discussions
We now describe the experimental results with a detailed discussion. First, we discuss the influence spread achieved by the different algorithms for different budget values.
5.5.1 Impact of Influenced Spread
Figure 1 shows the budget vs. number of influenced nodes plots for the HEPT dataset under different influence probability settings. From this figure, it can be observed that the seed sets selected by the proposed methodologies lead to more influenced nodes than those selected by the baseline methods. For example, under the uniform probability setting, among the proposed methodologies the seed set selected by BIMGTC leads to the maximum number of influenced nodes, whereas among the existing methods the seed set chosen by ComBIM leads to the maximum number of influenced nodes, and the former is larger. This observation is consistent over the different influence probability settings. From Table 2, the number of nodes of the HEPT dataset is 9877, and relative to this size there is a gap between the influence coverage of the seed sets selected by BIMGTC and ComBIM. It is also observed that the number of influenced nodes is the least under the trivalency model, irrespective of the method.
Figure 1: Budget vs. number of influenced nodes for the HEPT dataset. Panels: (a), (b) Uniform; (c) Trivalency; (d) Weighted Cascade.
Figure 2: Budget vs. number of influenced nodes for the CMP dataset. Panels: (a), (b) Uniform; (c) Trivalency; (d) Weighted Cascade.
Next, we describe the results obtained for the CMP dataset. Figure 2 shows the budget vs. number of influenced nodes plots for the CMP dataset. As with the HEPT dataset, the seed sets selected by the proposed methodologies lead to more influenced nodes than those selected by the existing methods. For example, under the uniform probability setting, the seed set selected by the BIMGTC method leads to the largest number of influenced nodes, while among the existing methods ComBIM yields the seed set with the largest number of influenced nodes. Hence, there is a gap between the two in terms of influence coverage. It is also important to observe that, in this dataset as well, exploiting the community structure leads to more influenced nodes for the proposed methodology: under the uniform probability setting, the seed set selected by BIMGTC influences more nodes than the one selected by BIMGT.
Figure 3: Budget vs. number of influenced nodes for the Epinions dataset. Panels: (a), (b) Uniform; (c) Trivalency; (d) Weighted Cascade.
Figure 3 shows the budget vs. number of influenced nodes plots for the Epinions dataset. As with the previous two datasets, we observe that the seed sets selected by the proposed methodologies lead to more influenced nodes than those selected by the baseline methods. For example, under the uniform probability setting, among the proposed methodologies the seed set selected by BIMGTC leads to the maximum number of influenced nodes, whereas among the existing methods the seed set selected by ComBIM leads to the maximum number of influenced nodes. From Table 2, the number of nodes of the Epinions dataset is 75879, and relative to this size there is a gap between the two methods in terms of expected influence. It is also important to observe that exploiting the community structure of the input network increases the number of influenced nodes: the seed set selected by BIMGTC influences more nodes than the one selected by BIMGT. Next, we discuss the computational time requirement.
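As a rough illustration of how an influenced-node count and the corresponding coverage percentage can be obtained for a given seed set, the following sketch uses Monte Carlo simulation of the Independent Cascade model. This choice is an assumption of the example and is not claimed to be the evaluation procedure behind the figures above; the function names and the toy graph are hypothetical.

```python
import random


def simulate_ic(out_edges, probs, seeds, rng):
    """One Independent Cascade run; returns the number of active nodes.

    Each newly activated node u gets a single chance to activate each
    inactive out-neighbor v, succeeding with probability probs[(u, v)].
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in out_edges.get(u, ()):
                if v not in active and rng.random() < probs[(u, v)]:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(out_edges, probs, seeds, runs, rng):
    """Average number of active nodes over `runs` independent simulations."""
    total = sum(simulate_ic(out_edges, probs, seeds, rng) for _ in range(runs))
    return total / runs


def coverage_percent(spread, num_nodes):
    """Influence coverage: influenced nodes as a percentage of all nodes."""
    return 100.0 * spread / num_nodes


# Toy network: a directed path 0 -> 1 -> 2 plus an isolated node 3.
out_edges = {0: [1], 1: [2]}
num_nodes = 4
rng = random.Random(7)

# With probability 1.0 on every edge the cascade is deterministic.
sure = {(0, 1): 1.0, (1, 2): 1.0}
spread_sure = expected_spread(out_edges, sure, [0], 50, rng)

# With probability 0.0 only the seed itself is active.
never = {(0, 1): 0.0, (1, 2): 0.0}
spread_never = expected_spread(out_edges, never, [0], 50, rng)

print(spread_sure, coverage_percent(spread_sure, num_nodes))
print(spread_never, coverage_percent(spread_never, num_nodes))
```

The two extreme probability settings make the outcome independent of the random draws, which is convenient for checking the simulator before using it with the probability settings of Section 5.2.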
5.5.2 Computational Time
Table 3 shows the execution time of the different algorithms for seed set selection. From this table, it can be observed that RANDOM (RAND) takes the least amount of time across all the datasets. MAX_DEG (MDH) takes more time than RANDOM, as it needs to compute the degree of the nodes. MAX_CLUS (MCCH) takes even more time than MAX_DEG, because computing the clustering coefficient is computationally much more expensive than computing the degree. The remaining existing methods take much more time than these baseline methods. It is important to observe that the computational time requirement of both proposed methodologies is much less than that of the DAG-based heuristic (DAGHEU), although they take more time than the other methods in the table. However, in many practical applications, including viral marketing and computational advertisement, what matters is a seed set selection algorithm that has a reasonable computational time together with significant influence coverage. In this respect, the proposed methodologies compare favorably with many existing methods.
Table 3: Execution time of the different algorithms for seed set selection.
Dataset  Budget  BIMGTC  BIMGT  ComBIM  MAX_DEG  MAX_CLUS  RANDOM  BSSA  DAGHEU
HEPT  2000  344  326  102  0.0464  0.3535  0.0032  22  1225
HEPT  6000  354  342  104  0.0488  0.3674  0.0085  26  1242
HEPT  10000  362  349  104  0.0483  0.3666  0.0092  24  1251
HEPT  14000  364  348  104  0.0497  0.3556  0.0192  23  1258
HEPT  18000  372  352  104  0.0492  0.3608  0.0269  25  1278
HEPT  22000  389  364  104  0.0503  0.3543  0.0333  28  1283
HEPT  26000  395  368  105  0.0496  0.3557  0.039  33  1299
CMP  2000  810  785  345  0.0749  0.8999  0.0031  35  3640
CMP  6000  812  796  344  0.0758  0.8964  0.0119  36  3676
CMP  10000  824  810  344  0.0764  0.8968  0.0185  39  3697
CMP  14000  835  826  344  0.0772  0.8979  0.0311  38  3723
CMP  18000  846  832  349  0.0776  0.8981  0.0401  46  3761
CMP  22000  858  844  349  0.0784  0.9004  0.0450  42  3800
CMP  26000  866  852  350  0.0778  0.9034  0.0541  44  3866
Epinions  2000  1482  1456  1052  0.2649  199  0.0081  66  14620
Epinions  6000  1521  1490  1051  0.2758  201  0.0219  72  14751
Epinions  10000  1532  1506  1082  0.3821  226  0.0234  75  14762
Epinions  14000  1546  1523  1079  0.3422  225  0.0312  78  14463
Epinions  18000  1562  1540  1116  0.3776  238  0.0427  81  14561
Epinions  22000  1575  1556  1131  0.5261  231  0.0392  79  14600
Epinions  26000  1596  1574  1139  0.5116  229  0.0798  92  14786
6 Conclusion and Future Direction
In this paper, we have proposed a cooperative game theoretic framework for the budgeted influence maximization problem. In particular, we formulate a cooperative game in which the users of the network are the players and, for any subset of the players, the utility is defined as the expected influence of the users of the subset under the MIA Model of diffusion. We have used the solution concept called the ‘Shapley value’ and proposed an iterative algorithm for selecting influential users in the network. We have also shown that the proposed methodology can select a better quality seed set if the community structure of the network is exploited. Experiments with real-world social network datasets show that the seed sets selected by the proposed methodologies influence more nodes than those selected by the baseline methods. This study can be extended in several directions. First, we have not given any approximation guarantee for our proposed methodologies with respect to an optimal seed set, and it would be interesting to derive a worst-case performance guarantee for them. Second, we have not considered the time-varying nature of social networks. Third, there are many other solution concepts for cooperative games, such as the Banzhaf index Dubey and Shapley (1979). These solution concepts can be used instead of the ‘Shapley value’, and their performance can be compared with that of the proposed methodologies.
References
 Angell and Schoenebeck (2017) Angell, R., Schoenebeck, G., 2017. Don’t be greedy: Leveraging community structure to find high quality seed sets for influence maximization, in: International Conference on Web and Internet Economics, Springer. pp. 16–29.
 Aslay et al. (2015) Aslay, C., Lu, W., Bonchi, F., Goyal, A., Lakshmanan, L.V., 2015. Viral marketing meets social advertising: Ad allocation with minimum regret. Proceedings of the VLDB Endowment 8, 814–825.
 Banerjee et al. (2019) Banerjee, S., Jenamani, M., Pratihar, D.K., 2019. Combim: A community-based solution approach for the budgeted influence maximization problem. Expert Systems with Applications 125, 1–13.
 Banerjee et al. (2020) Banerjee, S., Jenamani, M., Pratihar, D.K., 2020. A survey on influence maximization in a social network. Knowledge and Information Systems , 1–39.
 Blondel et al. (2008) Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E., 2008. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment 2008, P10008.
 Borodin et al. (2010) Borodin, A., Filmus, Y., Oren, J., 2010. Threshold models for competitive influence in social networks, in: International workshop on internet and network economics, Springer. pp. 539–550.
 Bozorgi et al. (2016) Bozorgi, A., Haghighi, H., Zahedi, M.S., Rezvani, M., 2016. Incim: A community-based algorithm for influence maximization problem under the linear threshold model. Information Processing & Management 52, 1188–1199.
 Chen et al. (2010a) Chen, W., Liu, Z., Sun, X., Wang, Y., 2010a. A game-theoretic framework to identify overlapping communities in social networks. Data Mining and Knowledge Discovery 21, 224–240.
 Chen et al. (2010b) Chen, W., Wang, C., Wang, Y., 2010b. Scalable influence maximization for prevalent viral marketing in large-scale social networks, in: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM. pp. 1029–1038.
 Chen et al. (2012) Chen, Y., Chang, S., Chou, C., Peng, W., Lee, S., 2012. Exploring community structures for influence maximization in social networks, in: The 6th SNA-KDD Workshop on Social Network Mining and Analysis Held in Conjunction with KDD, pp. 1–6.
 Chen et al. (2014) Chen, Y.C., Zhu, W.Y., Peng, W.C., Lee, W.C., Lee, S.Y., 2014. Cim: Community-based influence maximization in social networks. ACM Transactions on Intelligent Systems and Technology (TIST) 5, 25.
 Clark and Poovendran (2011) Clark, A., Poovendran, R., 2011. Maximizing influence in competitive environments: A game-theoretic approach, in: International Conference on Decision and Game Theory for Security, Springer. pp. 151–162.
 Cohen et al. (2014) Cohen, E., Delling, D., Pajor, T., Werneck, R.F., 2014. Sketch-based influence maximization and computation: Scaling up with guarantees, in: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, pp. 629–638.
 Cormen et al. (2009) Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C., 2009. Introduction to algorithms. MIT press.
 Ding et al. (2010) Ding, F., Liu, Y., Shen, B., Si, X.M., 2010. An evolutionary game theory model of binary opinion formation. Physica A: Statistical Mechanics and its Applications 389, 1745–1752.
 Domingos (2005) Domingos, P., 2005. Mining social networks for viral marketing. IEEE Intelligent Systems 20, 80–82.
 Domingos and Richardson (2001) Domingos, P., Richardson, M., 2001. Mining the network value of customers, in: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, ACM. pp. 57–66.
 Dubey and Shapley (1979) Dubey, P., Shapley, L.S., 1979. Mathematical properties of the banzhaf power index. Mathematics of Operations Research 4, 99–131.
 Goyal et al. (2011a) Goyal, A., Lu, W., Lakshmanan, L.V., 2011a. Celf++: optimizing the greedy algorithm for influence maximization in social networks, in: Proceedings of the 20th international conference companion on World wide web, ACM. pp. 47–48.
 Goyal et al. (2011b) Goyal, A., Lu, W., Lakshmanan, L.V., 2011b. Simpath: An efficient algorithm for influence maximization under the linear threshold model, in: 2011 IEEE 11th international conference on data mining, IEEE. pp. 211–220.
 Guille et al. (2013) Guille, A., Hacid, H., Favre, C., Zighed, D.A., 2013. Information diffusion in online social networks: A survey. ACM Sigmod Record 42, 17–28.
 Güney (2019) Güney, E., 2019. On the optimal solution of budgeted influence maximization problem in social networks. Operational Research 19, 817–831.
 Hafiene et al. (2020) Hafiene, N., Karoui, W., Romdhane, L.B., 2020. Influential nodes detection in dynamic social networks: A survey. Expert Systems with Applications , 113642.
 Han et al. (2014) Han, S., Zhuang, F., He, Q., Shi, Z., 2014. Balanced seed selection for budgeted influence maximization in social networks, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer. pp. 65–77.
 Hong and Liu (2019) Hong, T., Liu, Q., 2019. Seeds selection for spreading in a weighted cascade model. Physica A: Statistical Mechanics and its Applications 526, 120943.
 Hossain (2012) Hossain, M., 2012. Users’ motivation to participate in online crowdsourcing platforms, in: 2012 International Conference on Innovation Management and Technology Research, IEEE. pp. 310–315.
 Jung et al. (2012) Jung, K., Heo, W., Chen, W., 2012. Irie: Scalable and robust influence maximization in social networks, in: 2012 IEEE 12th International Conference on Data Mining, IEEE. pp. 918–923.
 Ke et al. (2018) Ke, X., Khan, A., Cong, G., 2018. Finding seeds and relevant tags jointly: For targeted influence maximization in social networks, in: Proceedings of the 2018 International Conference on Management of Data, pp. 1097–1111.
 Kempe et al. (2003) Kempe, D., Kleinberg, J., Tardos, É., 2003. Maximizing the spread of influence through a social network, in: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM. pp. 137–146.
 Kempe et al. (2005) Kempe, D., Kleinberg, J., Tardos, É., 2005. Influential nodes in a diffusion model for social networks, in: International Colloquium on Automata, Languages, and Programming, Springer. pp. 1127–1138.
 Kempe et al. (2015) Kempe, D., Kleinberg, J.M., Tardos, É., 2015. Maximizing the spread of influence through a social network. Theory of Computing 11, 105–147. URL: https://doi.org/10.4086/toc.2015.v011a004, doi:10.4086/toc.2015.v011a004.
 Kostka et al. (2008) Kostka, J., Oswald, Y.A., Wattenhofer, R., 2008. Word of mouth: Rumor dissemination in social networks, in: International colloquium on structural information and communication complexity, Springer. pp. 185–196.
 Leskovec et al. (2007a) Leskovec, J., Kleinberg, J., Faloutsos, C., 2007a. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD) 1, 2.
 Leskovec et al. (2007b) Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., Glance, N., 2007b. Cost-effective outbreak detection in networks, in: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM. pp. 420–429.
 Li et al. (2015) Li, H., Bhowmick, S.S., Cui, J., Gao, Y., Ma, J., 2015. Getreal: Towards realistic selection of influence maximization strategies in competitive networks, in: Proceedings of the 2015 ACM SIGMOD international conference on management of data, pp. 1525–1537.
 Li et al. (2017) Li, M., Wang, X., Gao, K., Zhang, S., 2017. A survey on information diffusion in online social networks: Models and methods. Information 8, 118.
 Li et al. (2018) Li, Y., Fan, J., Wang, Y., Tan, K.L., 2018. Influence maximization on social graphs: A survey. IEEE Transactions on Knowledge and Data Engineering 30, 1852–1872.
 Logins and Karras (2019) Logins, A., Karras, P., 2019. Content-based network influence probabilities: Extraction and application, in: 2019 International Conference on Data Mining Workshops (ICDMW), IEEE. pp. 69–72.
 Maleki et al. (2013) Maleki, S., Tran-Thanh, L., Hines, G., Rahwan, T., Rogers, A., 2013. Bounding the estimation error of sampling-based shapley value approximation. arXiv preprint arXiv:1306.4265 .
 Narahari (2014) Narahari, Y., 2014. Game theory and mechanism design. volume 4. World Scientific.
 Narayanam and Narahari (2010) Narayanam, R., Narahari, Y., 2010. A shapley value-based approach to discover influential nodes in social networks. IEEE Transactions on Automation Science and Engineering 8, 130–147.
 Nguyen and Zheng (2013) Nguyen, H., Zheng, R., 2013. On budgeted influence maximization in social networks. IEEE Journal on Selected Areas in Communications 31, 1084–1094.
 Rahimkhani et al. (2015) Rahimkhani, K., Aleahmad, A., Rahgozar, M., Moeini, A., 2015. A fast algorithm for finding most influential people based on the linear threshold model. Expert Systems with Applications 42, 1353–1361.
 Richardson et al. (2003) Richardson, M., Agrawal, R., Domingos, P., 2003. Trust management for the semantic web, in: International semantic Web conference, Springer. pp. 351–368.
 Richardson and Domingos (2002) Richardson, M., Domingos, P., 2002. Mining knowledge-sharing sites for viral marketing, in: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM. pp. 61–70.
 Shang et al. (2017) Shang, J., Zhou, S., Li, X., Liu, L., Wu, H., 2017. Cofim: A community-based framework for influence maximization on large-scale networks. Knowledge-Based Systems 117, 88–100.
 Shi et al. (2019) Shi, Q., Wang, C., Chen, J., Feng, Y., Chen, C., 2019. Post and repost: A holistic view of budgeted influence maximization. Neurocomputing 338, 92–100.
 Tang et al. (2015) Tang, Y., Shi, Y., Xiao, X., 2015. Influence maximization in near-linear time: A martingale approach, in: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, ACM. pp. 1539–1554.
 Tang et al. (2014) Tang, Y., Xiao, X., Shi, Y., 2014. Influence maximization: Near-optimal time complexity meets practical efficiency, in: Proceedings of the 2014 ACM SIGMOD international conference on Management of data, ACM. pp. 75–86.
 Tong et al. (2016) Tong, G., Wu, W., Tang, S., Du, D.Z., 2016. Adaptive influence maximization in dynamic social networks. IEEE/ACM Transactions on Networking 25, 112–125.
 Wang et al. (2012) Wang, C., Chen, W., Wang, Y., 2012. Scalable influence maximization for independent cascade model in large-scale social networks. Data Mining and Knowledge Discovery 25, 545–576.
 Wang et al. (2020) Wang, S., Yang, S., Xu, Z., Truong, V.A., 2020. Fast thompson sampling algorithm with cumulative oversampling: Application to budgeted influence maximization. arXiv preprint arXiv:2004.11963 .
 Wang and Yu (2020) Wang, S.B.Q.G.S., Yu, J.X., 2020. Efficient algorithms for budgeted influence maximization on massive social networks. Proceedings of the VLDB Endowment 13.
 Wen and Deng (2020) Wen, T., Deng, Y., 2020. Identification of influencers in complex networks by local information dimensionality. Information Sciences 512, 549–562.
 Wu et al. (2014) Wu, Y., Yang, Y., Jiang, F., Jin, S., Xu, J., 2014. Coritivity-based influence maximization in social networks. Physica A: Statistical Mechanics and its Applications 416, 467–480.
 Ye et al. (2012) Ye, M., Liu, X., Lee, W.C., 2012. Exploring social influence for recommendation: a generative model approach, in: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, ACM. pp. 671–680.
 Yu et al. (2018) Yu, Q., Li, H., Liao, Y., Cui, S., 2018. Fast budgeted influence maximization over multi-action event logs. IEEE Access 6, 14367–14378.
 Zimmermann and Eguíluz (2005) Zimmermann, M.G., Eguíluz, V.M., 2005. Cooperation, social networks, and the emergence of leadership in a prisoner’s dilemma with adaptive local interactions. Physical Review E 72, 056118.