The Influence Maximization Problem aims at identifying a small set of highly influential users such that their initial activation leads to the maximum number of influenced nodes in the network Kempe et al. (2003). This is perhaps one of the most well studied problems in computational social network analysis due to its practical applications in different domains, including crowdsourcing Hossain (2012), viral marketing Domingos (2005), computational advertisement Aslay et al. (2015), recommender systems Ye et al. (2012), etc. This problem was initially studied by Kempe et al. (2003), and since then there has been an extensive effort to study this problem and its variants. See Banerjee et al. (2020); Li et al. (2018) for recent surveys on this topic. The main cause of influence is the diffusion of information, which is a cascading effect by which information propagates from one part of the network to another. To study this process, several diffusion models have been introduced, such as the Independent Cascade Model (IC Model), the Linear Threshold Model (LT Model), the Maximum Influence Arborescence Model (MIA Model), and many more Li et al. (2017).
In reality, social networks are formed by rational human agents. This means that if a user is selected as a seed, then incentivization is required. However, the basic influence maximization problem assumes that every user of the network has an equal selection cost, though this may not be so in reality. To bridge this gap, Nguyen and Zheng (2013) introduced the ‘Budgeted Influence Maximization Problem’ (BIM Problem), where the users of the social network are associated with nonuniform selection costs and a fixed budget is allocated. The goal here is to select a subset of nodes within the allocated budget such that their initial activation leads to the maximum number of influenced nodes. Compared to the basic influence maximization problem, the existing literature on this problem is very limited. In this paper, we study this problem under the co-operative game theoretic framework.
Game Theory, which is the mathematical study of strategic interaction among a group of rational agents, has been used to solve many problems in the domain of social network analysis, such as community detection Chen et al. (2010a), opinion dynamics Ding et al. (2010), leader selection Zimmermann and Eguíluz (2005), rumor dissemination Kostka et al. (2008), influence maximization Clark and Poovendran (2011), and many more. However, to the best of our knowledge, the BIM Problem has not yet been studied under the game theoretic framework. In this paper, we study the BIM Problem under the co-operative game theoretic framework, and our study is motivated by the work of Narayanam and Narahari (2010). However, there are the following fundamental differences:
Narayanam and Narahari (2010)’s study is concentrated on the basic influence maximization problem and the non-uniform nature of the selection cost has not been taken into account, which is a practical concern.
In their study, they have used IC and LT as the underlying diffusion models. However, there are several recent studies on influence maximization that consider the MIA Model as the underlying model of diffusion Ke et al. (2018). In this study also, we use MIA as the underlying diffusion model.
It is well known that the complexity of computing the Shapley value of an $n$-player co-operative game is of $\mathcal{O}(n!)$ Narahari (2014). Even for a small social network this is a huge computational burden. To get around this, they randomly chose a linear number of permutations of the players for computing the Shapley value Narayanam and Narahari (2010). In contrast, with the help of an existing result by Maleki et al. (2013), we use an appropriate number of samples such that the Shapley value can be computed with bounded error with high probability.
In particular, we make the following contributions in this paper:
We formulate a co-operative diffusion game for the Budgeted Influence Maximization Problem, and design an iterative algorithm based on the solution concept of a co-operative game known as Shapley value for identifying the influential nodes.
We study the important properties of the formulated game and present a detailed complexity analysis of the proposed methodologies.
We also show that if we consider the community structure of the network, then the proposed methodology can lead to an even larger number of influenced nodes.
The proposed methodologies have been implemented with four publicly available social network datasets, and an extensive set of experiments has been carried out to demonstrate the effectiveness of the proposed methodology.
The rest of the paper is organized as follows: Section 2 describes relevant studies from the literature. Section 3 describes the required preliminary concepts and defines the diffusion game formally. The proposed solution methodologies are described in Section 4. Section 5 contains the experimental evaluation, and finally, Section 6 concludes our study and gives future directions.
2 Related Work
Our study is closely related to the ‘Influence Maximization in Social Networks’, more particularly the Budgeted Influence Maximization Problem, and also game theoretic solution methodologies for the influence maximization problem. Here, we present relevant studies from the literature.
Influence Maximization in Social Networks
The problem of influence maximization aims at choosing a small number of highly influential users in a social network for initial activation such that, due to the cascading effect of diffusion, the number of influenced nodes is maximized Kempe et al. (2003). Domingos and Richardson (2001); Richardson and Domingos (2002) first introduced this problem in the context of ‘viral marketing’ in social networks. Later on, Kempe et al. (2003, 2005, 2015) studied the computational issues of this problem and showed that the problem is NP-Hard under the Independent Cascade and Linear Threshold Models of diffusion. However, they gave a $(1 - \frac{1}{e} - \epsilon)$-factor approximation algorithm, which works based on maximum marginal influence gain computation, together with an inapproximability result. Their study remains an influential one and has triggered a huge amount of research over the last one and a half decades. Solution methodologies can be grouped into different categories: approximation algorithms such as CELF Leskovec et al. (2007b), CELF++ Goyal et al. (2011a), MIA Wang et al. (2012), PMIA Wang et al. (2012), TIM Tang et al. (2014), IMM Tang et al. (2015); heuristic solutions such as SIMPATH Goyal et al. (2011b), IRIE Jung et al. (2012), LDGA Chen et al. (2010b); community-based solution methodologies such as CIM Chen et al. (2012, 2014), ComPath Rahimkhani et al. (2015), INCIM Bozorgi et al. (2016), CoFIM Shang et al. (2017); and many more. Please refer to Banerjee et al. (2020); HAFIENE et al. (2020); Li et al. (2018) (and references therein) for recent surveys.
Budgeted Influence Maximization Problem
In the BIM Problem, along with the input social network, we are also given a cost function that assigns a selection cost to each node, and a fixed budget is allocated for the seed set selection. Nguyen and Zheng (2013) first introduced the BIM Problem and proposed an approximation algorithm and two efficient heuristics to solve it. Recently, Wang and Yu (2020) studied the BIM Problem and proposed a solution methodology with a provable approximation ratio, and further showed that this ratio can be improved. Güney (2019) proposed an integer programming-based approach to solve this problem under the IC Model of diffusion. Han et al. (2014) proposed a couple of heuristics for this problem that carefully consider both cost-effective nodes and influential nodes. Recently, Banerjee et al. (2019) proposed a community-based solution methodology for the BIM Problem, which is broadly divided into four steps, namely, community detection, budget distribution, seed node selection, and budget transfer. Shi et al. (2019) proposed two different solution methodologies with data-dependent approximation ratios. Yu et al. (2018) studied this problem under the credit distribution model and came up with a streaming algorithm with a provable approximation guarantee with respect to the optimum.
Game Theoretic Solution Methodologies for SIM and Related Problems
Game theoretic techniques have been used to solve influence maximization problems for the last decade. To the best of our knowledge, the first study in this direction was by Narayanam and Narahari (2010). They proposed a ‘Shapley Value’-based approach for selecting the seed nodes for the influence maximization problem. Clark and Poovendran (2011) studied the influence maximization problem in a competitive situation; they formulated a Stackelberg Game and proposed a methodology to solve it. Borodin et al. (2010) studied the influence maximization problem in a competitive situation with several natural extensions of the LT Model. Angell and Schoenebeck (2017) studied the influence maximization problem and proposed a greedy heuristic that shows an improvement for both submodular and non-submodular influence functions. Li et al. (2015) studied the influence maximization problem in a competitive situation and proposed a game theoretic framework for this problem. They designed a non-cooperative game and proposed an algorithm that computes the ‘Nash Equilibrium’ of the game, which guarantees optimal strategies. Recently, Wang et al. (2020) proposed a cumulative oversampling technique for ‘Thompson Sampling’ to construct optimistic parameter estimates with fewer samples. They showed that their learning algorithms can be used to solve the BIM Problem using fewer samples.
To the best of our knowledge, there does not exist any literature that studies the BIM Problem under a game theoretic setting. In this paper, we study this problem under the co-operative game theoretic framework. In particular, we propose an iterative approach based on the Shapley Value for selecting the influential nodes for the BIM Problem. We also show that exploiting the community structure helps the proposed methodology to achieve even better influence spread.
3 Preliminary Concepts, Background, and Problem Definition
Here, we present some preliminary definitions. Initially, we start with the social network specific ones.
3.1 Social Network Specific
Definition 1 (Social Network).
A social network is an interconnected structure among a group of human agents formed for social interactions, which is often represented as a graph $G(V, E, \mathcal{P})$. Here, the vertex set $V(G)$ is the set of users, the edge set $E(G)$ is the set of social ties among the users, and $\mathcal{P}: E(G) \rightarrow (0, 1]$ is the edge weight function that assigns each edge its influence probability.
We denote the number of nodes and edges of the network by $n$ and $m$, respectively. For any positive integer $n$, $[n]$ denotes the set $\{1, 2, \ldots, n\}$. For any edge $(u_i u_j) \in E(G)$, we denote the influence probability assigned to it as $\mathcal{P}(u_i u_j)$ or $\mathcal{P}_{u_i u_j}$. If $(u_i u_j) \notin E(G)$, then $\mathcal{P}_{u_i u_j} = 0$. Now, to start a diffusion process in the network, there should be some initial adopters, which we call seed nodes. To study the diffusion process in a network, several models have been introduced, such as the IC Model, the LT Model, the MIA Model, and so on Guille et al. (2013). Once a diffusion process is initiated from a number of seed nodes, it ends by influencing a subset of the nodes. The size of this subset is called the influence of the seed set, which is defined next.
Definition 2 (Influence of a Seed Set).
Given a social network $G(V, E, \mathcal{P})$ and a diffusion model $\mathcal{M}$, for a given seed set $S \subseteq V(G)$, its influence is defined as the number of nodes that are influenced due to the initial activation of the nodes in $S$ if the information is diffused by the rule of the model $\mathcal{M}$. $I_{\mathcal{M}}(S)$ denotes the set of nodes influenced by the seed set $S$ under the diffusion model $\mathcal{M}$. This quantity is measured in terms of expectation. Hence, the influence of the seed set $S$ under the diffusion model $\mathcal{M}$ is $\sigma_{\mathcal{M}}(S) = \mathbb{E}[|I_{\mathcal{M}}(S)|]$, where $\sigma_{\mathcal{M}}$ is the social influence function, i.e., $\sigma_{\mathcal{M}}: 2^{V(G)} \rightarrow \mathbb{R}_{0}^{+}$ with $\sigma_{\mathcal{M}}(\emptyset) = 0$. Here, $\emptyset$ denotes the empty set.
Real-world diffusion processes (such as political campaigns, viral marketing of products, etc.) are typically carried out to maximize the influence. In this direction, we next state the well studied Social Influence Maximization Problem.
Definition 3 (Social Influence Maximization Problem (SIM Problem)).
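Kempe et al. (2003) Given a social network $G(V, E, \mathcal{P})$, a diffusion model $\mathcal{M}$, and a positive integer $k$ ($k \in \mathbb{Z}^{+}$), the social influence maximization problem asks for selecting a seed set $S \subseteq V(G)$ with $|S| \leq k$, whose initial activation leads to the maximum number of influenced nodes, if the diffusion process happens by the rule of the model $\mathcal{M}$. Mathematically,
$$S^{*} = \underset{S \subseteq V(G),\ |S| \leq k}{\arg\max}\ \sigma_{\mathcal{M}}(S).$$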
As mentioned previously, real-world social networks are formed by rational and self interested agents. If a node is selected as seed then some kind of incentivization is required. In reality, different users of the network have different selection cost, i.e., users have non uniform selection cost. However, the SIM Problem assumes that the selection costs are uniform. To bridge this gap, the ‘Budgeted Influence Maximization Problem’ has been introduced by Nguyen and Zheng (2013) which is defined next.
Definition 4 (Budgeted Influence Maximization Problem).
Given a social network $G(V, E, \mathcal{P})$ along with a cost function $C$ that assigns each node its selection cost, a diffusion model $\mathcal{M}$, and a fixed budget $B$, the problem of budgeted influence maximization asks for selecting a seed set $S$ within the allocated budget, whose initial activation leads to the maximum number of influenced nodes, if the diffusion process happens by the rule of the model $\mathcal{M}$. Mathematically,
$$S^{*} = \underset{S \subseteq V(G),\ C(S) \leq B}{\arg\max}\ \sigma_{\mathcal{M}}(S),$$
where $C(S)$ denotes the total selection cost of the users in $S$, i.e., $C(S) = \sum_{u \in S} C(u)$. We denote an arbitrary instance of the BIM Problem as $(G, C, B, \mathcal{M})$.
From the algorithmic point of view the problem can be posed as follows:
Budgeted Influence Maximization Problem
Input: Social Network $G(V, E, \mathcal{P})$, Cost Function $C$, Budget $B$, and the Diffusion Model $\mathcal{M}$.
Problem: Find a seed set $S \subseteq V(G)$ such that $C(S) \leq B$ and $\sigma_{\mathcal{M}}(S)$ is maximized.
Next, we state the Maximum Influence Arborescence (MIA) diffusion model, which we consider as the underlying diffusion model. Symbols and notations used in this paper are given in Table 1.
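|Symbol|Meaning|
|$G(V, E, \mathcal{P})$|The input social network|
|$V(G)$, $E(G)$|Vertex and edge set of $G$|
|$n$, $m$|The number of vertices and edges of $G$|
|$\mathcal{P}$|The edge probability function|
|$\mathcal{K}$|The set of communities of $G$|
|$\ell$|The number of communities of $G$|
|$\mathcal{P}(u_i u_j)$|The edge probability of the edge $(u_i u_j)$|
|$pp(p)$|Propagation probability of the path $p$|
|$C$|The cost function|
|$C(u)$|Cost of the user $u$|
|$B$|Budget for seed set selection|
|$\sigma$|The social influence function|
|$S$|The seed set|
|$C(S)$|Total selection cost of the users in $S$|
|$\sigma(S)$|The influence of the seed set $S$|
|$MIP(u, v)$|Maximum influence path from $u$ to $v$|
|$\mathrm{Path}(u, v)$|Set of all paths between $u$ and $v$|
|$\theta$|Cut-off probability|
|$d_{max}$|Maximum degree of $G$|
|$N$|The set of players|
|$\nu$|The utility/payoff function|
|$(N, \nu)$|The BIM Game|
|$\phi_i$|Shapley value of the $i$-th player|
|$MIIA(v, \theta)$|The set of MIA paths to the node $v$|
|$\mathbb{Z}^{+}$|The set of positive integers|
|$\mathbb{R}_{0}^{+}$|The set of positive real numbers including zero|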
3.2 The MIA Diffusion Model
This is the diffusion model introduced by Wang et al. (2012). Before stating the diffusion model first we state two preliminary definitions.
Definition 5 (Propagation Probability of a Path).
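Given two vertices $u_i, u_j \in V(G)$, let $\mathrm{Path}(u_i, u_j)$ denote the set of paths from the vertex $u_i$ to $u_j$. For any arbitrary path $p \in \mathrm{Path}(u_i, u_j)$, the propagation probability is defined as the product of the influence probabilities of the edges that constitute the path, i.e., $pp(p) = \prod_{(u u') \in E(p)} \mathcal{P}(u u')$, where $E(p)$ is the set of edges of $p$.
Definition 6 (Maximum Probabilistic Path).
Given two vertices $u_i, u_j \in V(G)$, the maximum probabilistic path is the path with the maximum propagation probability and is denoted as $MIP(u_i, u_j)$. Hence, $MIP(u_i, u_j) = \underset{p \in \mathrm{Path}(u_i, u_j)}{\arg\max}\ pp(p)$.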
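As an illustration of Definitions 5 and 6, a maximum probabilistic path can be found with a shortest-path search, because maximizing a product of probabilities is the same as minimizing the sum of their negative logarithms. The sketch below is a minimal example under that observation; the function name `max_probability_path` and the dictionary-based edge representation are assumptions made for illustration and are not taken from the paper.

```python
import heapq
import math


def max_probability_path(edges, source, target):
    """Return (path, propagation probability) of a maximum probabilistic path.

    `edges` maps a directed edge (u, v) to its influence probability in (0, 1].
    This representation is an assumption made for this sketch.
    """
    # Maximising a product of probabilities == minimising the sum of -log(p).
    adjacency = {}
    for (u, v), prob in edges.items():
        adjacency.setdefault(u, []).append((v, -math.log(prob)))

    distance = {source: 0.0}
    parent = {}
    visited = set()
    heap = [(0.0, source)]
    while heap:
        dist_u, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for w, weight in adjacency.get(u, []):
            candidate = dist_u + weight
            if candidate < distance.get(w, math.inf):
                distance[w] = candidate
                parent[w] = u
                heapq.heappush(heap, (candidate, w))

    if target not in distance:
        return None, 0.0  # no path, so the propagation probability is zero

    # Walk back from the target to recover the path.
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    path.reverse()
    return path, math.exp(-distance[target])
```

For example, with edges `a→b` and `b→c` of probability 0.5 each and a direct edge `a→c` of probability 0.2, the two-hop path is preferred because its propagation probability (0.25) exceeds that of the direct edge.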
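Based on the path propagation probability and the maximum probabilistic path, maximum influence in-arborescences are defined as follows.
Definition 7 (Maximum Influence In-Arborescences (MIIA)).
For a node $v \in V(G)$ and a probability threshold $\theta$, the maximum influence in-arborescence is the union of the maximum influence paths to $v$ whose propagation probability is at least the cut-off probability. Mathematically, $MIIA(v, \theta) = \bigcup_{u \in V(G),\ pp(MIP(u, v)) \geq \theta} MIP(u, v)$.
Now, given a seed set $S$, a node $v \in V(G) \setminus S$, and its $MIIA(v, \theta)$, by the rule of the MIA Model it is assumed that the influence from $S$ to $v$ is propagated through the paths in $MIIA(v, \theta)$. Let $ap(v, S, MIIA(v, \theta))$ be the influence probability of the node $v$ by the seed set $S$; this can be computed recursively as described in Wang et al. (2012).
Definition 8 (MIA Diffusion Model).
For any seed set $S$ and any arbitrary node $v \in V(G) \setminus S$, in the MIA Model, it is assumed that the nodes in $S$ will influence $v$ through the paths in $MIIA(v, \theta)$, and the expected influence of the seed set $S$ is given by Equation 6: $\sigma(S) = \sum_{v \in V(G)} ap(v, S, MIIA(v, \theta))$.
The following two lemmas will be useful to prove some properties of the diffusion game, which will be defined later. Lemma 1: under the MIA diffusion model, the influence function is non-negative and monotone. Lemma 2: under the MIA diffusion model, the influence function is submodular.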
3.3 Co-Operative Game Theory Specific
Co-operative game theory is the study of the strategic formation of coalitions and their mathematical analysis. A co-operative game is defined next.
Definition 9 (Co-Operative Game or Coalition Game or Transferable Utility Game).
A co-operative game is defined by the tuple $(N, \nu)$, where $N = \{1, 2, \ldots, n\}$ is the finite set of players and $\nu$ is called the payoff function (also known as the characteristic function) that assigns each possible coalition a real number, i.e., $\nu: 2^{N} \rightarrow \mathbb{R}$. It is assumed that $\nu(\emptyset) = 0$.
In co-operative game theory, one of the main concerns is how to distribute the total utility among the players. This is called the payoff allocation.
Definition 10 (Payoff Allocation).
A payoff allocation is a vector $x = (x_1, x_2, \ldots, x_n)$ in $\mathbb{R}^{n}$, where each entry represents the utility share of the corresponding player.
The Shapley Value is an important solution concept in co-operative game theory, which performs the payoff allocation satisfying the following three properties, namely, symmetry, linearity, and carrier Narahari (2014).
Definition 11 (Shapley Value).
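In a co-operative game $(N, \nu)$, the Shapley Value of the player $i \in N$ is defined as
$$\phi_i(\nu) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!}\,\big[\nu(S \cup \{i\}) - \nu(S)\big].$$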
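To make Definition 11 concrete, the Shapley value can equivalently be read as the average marginal contribution of a player over all orders in which the grand coalition can be formed. The sketch below computes it exactly by enumerating all permutations, which is only feasible for a handful of players. The function name `exact_shapley` and the toy coverage-style utility used in the usage note are assumptions for illustration and do not come from the paper.

```python
from fractions import Fraction
from itertools import permutations


def exact_shapley(players, utility):
    """Exact Shapley values by enumerating every arrival order of the players.

    `utility` is any callable mapping a frozenset of players to a number,
    with utility(frozenset()) == 0. The cost is O(n!) utility sweeps.
    """
    totals = {p: Fraction(0) for p in players}
    num_orders = 0
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            extended = coalition | {p}
            # Marginal contribution of p when joining the players before it.
            totals[p] += Fraction(utility(extended)) - Fraction(utility(coalition))
            coalition = extended
        num_orders += 1
    return {p: totals[p] / num_orders for p in players}
```

As a usage example, let player 1 reach items {1, 2}, player 2 reach {2, 3}, and player 3 reach {3}, and let the utility of a coalition be the number of distinct items reached. Each item's unit of value is shared equally among the players that reach it, so the Shapley values are 3/2, 1, and 1/2, which sum to the utility of the grand coalition.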
4 Proposed Methodology
In this section, we describe the game theoretic approach for solving the BIM Problem. This section is broadly divided into the following subsections. In Sub-section 4.1, we define the BIM Game and establish its properties. Sub-section 4.2 contains the overview of the proposed methodology. Sub-section 4.3 contains the algorithms present in the proposed methodology, and finally, Sub-section 4.4 contains time and space complexity analysis of our proposed approaches.
4.1 The BIM Game and Its Properties
Definition 12 (The BIM Game).
We define our diffusion game as a co-operative game, where the nodes of the network are the players, and for any subset of players, their utility is the expected influence in the network under the MIA Model of diffusion. We denote this diffusion game as $(N, \nu)$, where $N = V(G)$, $\nu: 2^{N} \rightarrow \mathbb{R}_{0}^{+}$, and for any $S \subseteq N$, $\nu(S) = \sigma(S)$. (As mentioned previously, in this study we assume that the diffusion happens by the rule of the MIA Model. Hence, from now onwards we omit the subscript $\mathcal{M}$ from $\sigma_{\mathcal{M}}$.)
Now, we show some of the important properties of the utility function and the proposed diffusion game.
Proposition 1 (Non-negativity and Monotonicity of $\nu$).
The utility function of the diffusion game in Definition 12 is non-negative and monotone.
Proof. As mentioned in Lemma 1, under the MIA diffusion model, the influence function is non-negative and monotone. Since the utility function is defined as the influence function, the same holds for the utility function as well. ∎
Proposition 2 (Non-Convexity of the Diffusion Game).
The BIM Game defined in Definition 12 is not convex.
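Proof. By definition, a co-operative game is said to be convex if its utility function has the following property: for all $S \subseteq T \subseteq N$ and for all $i \in N \setminus T$, $\nu(S \cup \{i\}) - \nu(S) \leq \nu(T \cup \{i\}) - \nu(T)$. As mentioned in Lemma 2, the influence function is submodular, i.e., its marginal gains satisfy the reverse inequality. This implies that the diffusion game is, in general, not convex. ∎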
Proposition 3 (Sub-Additivity of the BIM Game).
The BIM Game mentioned in Definition 12 is sub-additive.
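Proof. Let $S$ and $T$ be two coalitions of the BIM Game $(N, \nu)$. We need to show that the utility of the larger coalition $S \cup T$ is at most the total utility of the individual coalitions, i.e., $\nu(S \cup T) \leq \nu(S) + \nu(T)$. By the submodularity of the influence function (Lemma 2), $\nu(S \cup T) + \nu(S \cap T) \leq \nu(S) + \nu(T)$, and by its non-negativity (Lemma 1), $\nu(S \cap T) \geq 0$. Combining the two gives $\nu(S \cup T) \leq \nu(S) + \nu(T)$.
Hence, it is proved that the BIM Game is sub-additive. ∎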
4.2 Overview of the Proposed Methodologies
Here, we describe the shapley value-based iterative approach for identifying the influential users from a social network for the BIM Problem. The proposed methodology is broadly divided into two steps: (i) Shapley Value Computation, and (ii) Seed Set Selection. Now, we explain both these steps in detail.
Step 1 (Shapley Value Computation):
In this step, the Shapley value of every node of the network is computed. As mentioned previously, the Shapley value of a node is the average of its marginal contributions over all possible orders in which the grand coalition can be formed. While forming the grand coalition, players may arrive in any order, so there are $n!$ possible ways to form it. By Stirling's approximation, $n! \approx \sqrt{2 \pi n}\,(n/e)^{n}$, i.e., it grows super-exponentially in $n$ Cormen et al. (2009). Hence, even for a small network (real-world networks are much larger), exact computation of the Shapley value is not feasible. To get around this problem, we use a result from an existing study by Maleki et al. (2013). Their study shows that if the range of the marginal contributions of the players is known, it is possible to compute the Shapley value of the players with bounded error and high probability. In particular, if the number of permutations considered (denoted by $\tau$) is greater than or equal to $\left\lceil \frac{\ln(2/\delta) \cdot r^{2}}{2 \epsilon^{2}} \right\rceil$, where $r$ is the range of the marginal contributions, then the error incurred in the Shapley value computation for any given player is bounded by $\epsilon$ with probability at least $1 - \delta$. By this principle, for given values of $\epsilon$, $\delta$, and $r$, we can easily calculate the number of permutations that must be considered for the Shapley value computation such that all the conditions are met. Algorithm 1 describes this procedure.
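The sampling idea in Step 1 can be sketched as follows. The sample size below is the one obtained from Hoeffding's inequality for a quantity whose range is `value_range`, which is how we read the bound of Maleki et al. (2013); treat the exact constant as an assumption of this sketch. The function names, the `utility` callable, and the fixed random seed are hypothetical and only serve the illustration; this is not the paper's Algorithm 1.

```python
import math
import random


def required_permutations(epsilon, delta, value_range):
    """Hoeffding-style sample size: error <= epsilon with probability >= 1 - delta."""
    return math.ceil(math.log(2.0 / delta) * value_range ** 2 / (2.0 * epsilon ** 2))


def sampled_shapley(players, utility, num_permutations, seed=0):
    """Estimate Shapley values by averaging marginal gains over random permutations.

    `utility` maps a set of players to a number (for the BIM Game it would be
    the expected influence of that set; here it is left abstract).
    """
    rng = random.Random(seed)
    order = list(players)
    totals = {p: 0.0 for p in order}
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition = set()
        previous = utility(coalition)
        for p in order:
            coalition.add(p)
            current = utility(coalition)
            totals[p] += current - previous  # marginal gain of p in this order
            previous = current
    return {p: total / num_permutations for p, total in totals.items()}
```

Note that within every permutation the marginal gains add up to the utility of the grand coalition, so the estimates always sum to that value regardless of how many permutations are sampled.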
Step 2 (Seed Set Selection):
Based on the computed shapley value, we select the seed nodes using the following ways:
Method 1: Once the Shapley values of the players are computed, we sort the players based on these values and iteratively pick users as seed nodes until the budget is exhausted. In each iteration, once a node has been picked as a seed node, we flag its neighbors so that they cannot be selected as seeds. This helps the proposed methodology to spread the seed nodes uniformly across the network. Algorithm 2 describes this procedure.
Method 2: In this method, we detect the inherent community structure of the network. For this purpose, the Louvain Algorithm Blondel et al. (2008) has been used in our study. After that, the total allocated budget is divided among the communities in proportion to the Shapley values of their nodes. Based on this shared budget, high Shapley value nodes are chosen as seed nodes from each of the communities until the budget is exhausted. If any budget is left unused after seed selection from a community, it is transferred to the largest community. Algorithm 3 describes this procedure.
We describe both methodologies in detail in the next subsection.
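Method 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's Algorithm 2: the function name, the dictionary inputs, and the choice to skip (rather than stop at) a node whose cost exceeds the remaining budget are all assumptions of this sketch.

```python
def select_seeds_by_shapley(shapley, cost, neighbors, budget):
    """Pick seeds in decreasing Shapley order, blocking neighbours of chosen seeds.

    shapley:   dict node -> Shapley value (assumed to be precomputed)
    cost:      dict node -> selection cost
    neighbors: dict node -> iterable of adjacent nodes
    """
    ranked = sorted(shapley, key=lambda u: shapley[u], reverse=True)
    blocked = set()
    seeds = []
    remaining = budget
    for u in ranked:
        # Assumption: unaffordable nodes are skipped so cheaper ones can still fit.
        if u in blocked or cost[u] > remaining:
            continue
        seeds.append(u)
        remaining -= cost[u]
        blocked.add(u)
        blocked.update(neighbors.get(u, ()))  # neighbours can no longer be seeds
    return seeds, remaining
```

Blocking neighbours is what spreads the seeds across the network: a high-ranked node adjacent to an already chosen seed is passed over in favour of a node elsewhere.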
4.3 Algorithms in the Proposed Methodology
Now, we describe the proposed approaches in the form of algorithms. Algorithm 1 reports Step 1 of our proposed methodology. The working principle of this algorithm is as follows: Given $\epsilon$, $\delta$, and $r$, first we generate the number of permutations required for the Shapley value computation. Next, for each permutation we compute the marginal gain of the nodes. For each node, Algorithm 1 maintains the set of nodes that appear before it in the current permutation. First, we activate the first node in the permutation and compute the influence. Next, we consider the second node in the permutation. If it is already activated by the first node, then its marginal contribution is $0$. If it is not activated, then we compute the marginal gain by subtracting the individual influence from the combined influence. In this way, we compute the marginal gain of all the nodes of the network. We repeat this process a given number of times (part of the input) and average the marginal gains over these repetitions. Finally, the Shapley value is computed by dividing the accumulated marginal gain by the number of permutations considered in the Shapley value computation.
It is important to observe that Algorithm 1 takes the range of marginal gain of the players as one of the input among many. Now, we mention the way we compute this quantity. For any node , we compute its range as follows: Consider the neighbors of the node, i.e., . Consider a node . Now, can be influenced by any of its neighbors. The upper value of this range is . In the worst case may be none of the neighbors will be influenced, and hence, the lower value of the range is . So, the range of the marginal gain of the player is . After computing the range of all the players, we aggregate them by taking average over all the players; i.e., .
As mentioned, after computing the Shapley values of the players, the next step is to choose the seed nodes. Algorithm 2 describes Method 1. Though Algorithm 2 is easy to understand and simple to implement, we can still improve its performance by exploiting the community structure of the network. Algorithm 3 describes that procedure.
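The community-based variant (Method 2) can be sketched as below. The sketch assumes that the community assignment has already been produced by some community detection routine such as Louvain and is passed in as a dictionary; it is not the paper's Algorithm 3. The function name, the input format, and the processing order (all other communities first, the largest one last so that it can use the transferred budget) are assumptions made for illustration.

```python
def select_seeds_with_communities(shapley, cost, community, budget):
    """Split the budget across communities by Shapley mass, then pick seeds.

    shapley:   dict node -> Shapley value (total assumed to be positive)
    cost:      dict node -> selection cost
    community: dict node -> community label (assumed precomputed)
    """
    members = {}
    for node, label in community.items():
        members.setdefault(label, []).append(node)

    total = sum(shapley.values())
    # Budget share proportional to the community's total Shapley value.
    share = {
        label: budget * sum(shapley[u] for u in nodes) / total
        for label, nodes in members.items()
    }

    largest = max(members, key=lambda label: len(members[label]))
    # Process the largest community last so it receives all leftovers.
    order = [label for label in members if label != largest] + [largest]

    seeds = []
    for label in order:
        remaining = share[label]
        for u in sorted(members[label], key=lambda x: shapley[x], reverse=True):
            if cost[u] <= remaining:
                seeds.append(u)
                remaining -= cost[u]
        if label != largest:
            share[largest] += remaining  # transfer the unused budget
    return seeds
```

Because leftovers only ever move into the largest community, the total cost of the returned seeds never exceeds the overall budget.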
4.4 Complexity Analysis of the Proposed Methodologies
Now, we analyze the algorithms to understand the running time and space requirement for the proposed methodologies. From Line to , we compute the range of marginal gain of the players. This can be implemented in time and space. Degrees of all the nodes can be computed in time. Now, assume that denotes the maximum degree of all the nodes; i.e.; . Hence, for a particular player , the instruction mentioned in Line 3 can be computed in time. The execution time of Line No is of . Hence, the computational time from Line to is of . As, is upper bounded by , hence the quantity reduces to . It is important to observe that marginal contribution of any player can be computed by traversing the graph, which takes . In each permutation, there are players. Hence, considering number of permutations and repeating the same experiment for times requirement for marginal gain computation is of time. After that, computing the shapley value requires additional time. Hence, total time requirement is of and this reduces to . The extra space consumed by the Algorithm 1 is to store the arrays , , , also to store the degree of the nodes and all of them requires space each. Hence the Lemma 3 holds.
Running time and space requirement of Algorithm 1 is of and , respectively.
As mentioned previously, once the shapley value of the players are computed, the seed set can be selected either of the two ways. In Algorithm 2, first the nodes of the players are sorted based on the shapley value computed in Algorithm 1. This step requires time. Let, denotes the maximum degree of the network. Now, from Line to requires time. Hence, the running time of Algorithm 2 is of time. Extra space required by Algorithm 2 is to store the array and the seed set , which is of . Hence, Lemma 4 holds.
The running time and space requirement of Algorithm 2 is of , and , respectively.
Algorithm 3 describes another way of selecting seed nodes. In this method, first the community structure is detected. This can be done by the Louvain Algorithm (see https://en.wikipedia.org/wiki/Louvain_modularity). An array stores the community number corresponding to each user, i.e., the index of the community of the network to which the user belongs. The algorithm then sorts the communities, computes the total Shapley value of the network, and distributes the budget among the communities. After that, the seed set selection process is carried out. It is important to observe that the running time of this phase depends upon the number of nodes that a community contains. To give a weak upper bound, we first calculate the running time of seed selection for the largest community, and then multiply it by the number of communities. For the largest community, time is required to identify the nodes that belong to it, to sort them based on their Shapley values, and to select the nodes from the sorted list. Additionally, while processing the communities other than the largest one, time is required to transfer the unutilized budget of each community to the largest community. The total time requirement of Algorithm 3 is the sum of these quantities. The extra space consumed by Algorithm 3 is used to store the community array, the Shapley value array, the vertices of the communities during seed set selection, and the seed set. Hence, Lemma 5 holds.
Running time and space requirement of Algorithm 3 is of and , respectively.
Now, after computing the shapley value by Algorithm 1, if we use Algorithm 2 for selecting the seed nodes then the running time and space requirement of the proposed methodology becomes is of , and , respectively. Otherwise, if we use Algorithm 3 for seed set selection then the running time and space requirement of the proposed methodology becomes , and , respectively. Hence, from Lemma 3, 4, and 5, following two theorems are implied.
If Algorithm 2 is used for seed set selection then the running time and space requirement of our proposed methodology becomes , and , respectively.
If Algorithm 3 is used for seed set selection then the running time and space requirement of our proposed methodology becomes , and , respectively.
5 Experimental Evaluation
In this section, we describe the experimental evaluation of the proposed methodologies. Also, the obtained results have been compared with that of results obtained from the existing methods from the literature. Initially, we start with a brief description of the datasets used in our experiments.
We use the following three datasets, which come from two different contexts. The first two datasets are collaboration networks among the researchers of two different areas. The third one is the trust network among the users of the e-commerce site epinions.com.
HEP Theory Collaboration Network (HEPT) Leskovec et al. (2007a) (https://arxiv.org/archive/hep-th): This dataset contains the collaboration information among high energy physics researchers, obtained from the papers submitted to the high energy physics section of arxiv.org. Here, individual researchers are the nodes, and two nodes are linked by an edge if the corresponding researchers co-authored at least one paper.
All these datasets have been previously used in the experimentation in the domain of influence maximization Jung et al. (2012); Tong et al. (2016); Wen and Deng (2020). Table 2 contains basic statistics of the mentioned datasets.
5.2 Experimental Setup
Now, we state the experimental set up that has been used in this paper. Following parameters are there in our study whose value need to be set:
Selection Cost: We assign an integer values as the selection cost of the users from the interval uniformly at random. In Nguyen and Zheng (2013)’s study also cost of the users are chosen randomly from a fixed interval.
Budget: In our experiments, we start with a budget value of and each time incremented by and continued till . Hence, we experiment with following budget values . In Nguyen and Zheng (2013)’s study also experimentation is carried out with some fixed budget values.
Influence Probability: As per the existing studies in the literature, in this study also we consider the following three influence probability settings:
Uniform: In this setting, every edge of the network is associated with the same probability value. In this study, we consider this fixed value as and . These two value has been considered by many existing studies.
Tri-Valency: In this setting all the edges of the network are assigned with a fixed probability from the set uniformly at random. This setting has also been considered in many existing studies in the literature.
Weighted Cascade: In this setting, every edge (u, v) has an influence probability equal to the reciprocal of the degree of v; i.e., 1/deg(v). In a directed network, deg(v) is replaced by the in-degree of v. This setting is also common in many existing studies.
All these influence probability settings have been used in existing studies on the influence maximization problem and related problems Hong and Liu (2019); Chen et al. (2010b); Logins and Karras (2019); Cohen et al. (2014).
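The three settings above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used in the experiments: the function names, the toy edge list, the uniform value 0.1, and the tri-valency set (0.1, 0.01, 0.001) are assumptions chosen for the example, and the weighted cascade function assumes the common convention that the probability of a directed edge (u, v) is the reciprocal of the in-degree of v. An undirected network can be handled by listing each edge in both directions.

```python
import random


def uniform_probabilities(edges, p):
    """Assign the same probability p to every edge."""
    return {edge: p for edge in edges}


def trivalency_probabilities(edges, values, rng):
    """Assign each edge a probability drawn uniformly at random from `values`."""
    values = list(values)
    return {edge: rng.choice(values) for edge in edges}


def weighted_cascade_probabilities(edges):
    """Assign each directed edge (u, v) the probability 1 / in-degree(v)."""
    in_degree = {}
    for _, v in edges:
        in_degree[v] = in_degree.get(v, 0) + 1
    return {(u, v): 1.0 / in_degree[v] for u, v in edges}


# Toy directed edge list (hypothetical, not one of the datasets of this study).
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d"), ("b", "d"), ("d", "a")]
rng = random.Random(0)  # fixed seed so that the random assignment is repeatable

uniform = uniform_probabilities(edges, 0.1)            # assumed value
trivalency = trivalency_probabilities(edges, (0.1, 0.01, 0.001), rng)  # assumed set
weighted = weighted_cascade_probabilities(edges)
```

Under the weighted cascade convention assumed here, the probabilities on the incoming edges of any node sum to one, which is a quick sanity check on an assignment.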
Range of the marginal gain of each player: As mentioned previously, to compute the Shapley value of a node, we need to know the range of the marginal gain of the corresponding player. For any node , we compute its range as follows: Consider the neighbors of the node, i.e., . Consider a node . Now, can be influenced by any of its neighbors. The upper value of this range is . In the worst case, none of the neighbors will be influenced, and hence, the lower value of the range is . So, the range of the marginal gain of the player is .
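To illustrate why the range of the marginal gain matters, the sketch below uses Hoeffding's inequality, a standard way to bound the error of a sampling-based estimate from the range of the sampled quantity. Whether this is exactly how the range enters the algorithm of this paper is an assumption made for illustration. The function names, the accuracy parameters, and the toy coverage utility (a stand-in for the expected influence, not the MIA-based utility) are likewise hypothetical.

```python
import math
import random


def hoeffding_sample_size(value_range, epsilon, delta):
    """Number of i.i.d. samples m so that P(|mean estimate - true mean| >= epsilon)
    is at most delta, for samples lying in an interval of width value_range."""
    if value_range < 0 or epsilon <= 0 or not 0 < delta < 1:
        raise ValueError("need value_range >= 0, epsilon > 0 and 0 < delta < 1")
    return math.ceil(value_range ** 2 * math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_shapley(players, utility, num_samples, rng):
    """Estimate Shapley values by averaging marginal gains over random permutations."""
    players = list(players)
    totals = {p: 0.0 for p in players}
    for _ in range(num_samples):
        order = players[:]
        rng.shuffle(order)
        coalition = set()
        previous = utility(coalition)
        for p in order:
            coalition.add(p)
            current = utility(coalition)
            totals[p] += current - previous  # marginal gain of p in this permutation
            previous = current
    return {p: totals[p] / num_samples for p in players}


# Toy utility: each player "reaches" a fixed set of nodes (hypothetical data).
reach = {"a": {"a", "b", "c"}, "b": {"b", "c"}, "c": {"c"}, "d": {"d", "a"}}


def coverage(coalition):
    """Number of distinct nodes reached by the coalition."""
    covered = set()
    for p in coalition:
        covered |= reach[p]
    return len(covered)


# A marginal gain here lies between 0 and 3, so the range width is at most 3.
num_samples = hoeffding_sample_size(3, 0.25, 0.05)
shapley = estimate_shapley(reach, coverage, num_samples, random.Random(1))
```

The required number of samples grows with the square of the range, so a tighter per-node range translates directly into fewer sampled permutations for the same accuracy.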
5.3 Goals of the Experiment
The goals of our experiments are as follows:
The primary goal of our experiments is a comparative study of the proposed and existing methodologies with respect to the quality of the seed sets they select.
Our second concern is computational time. We also compare the proposed and baseline methods in terms of their computational time requirements.
5.4 Algorithms in the Experiment
We now list the algorithms used in our experiments:
5.4.1 Algorithms Proposed in this Paper
BIM with Co-Operative Game Theory (BIMGT): In this method, the ‘Shapley value’ of every user of the network, which is based on the user's marginal contribution, is computed, and the users are sorted by this value. Users are chosen from this sorted list until the budget is exhausted. This is ‘Method 1’ of this paper.
BIM with Co-Operative Game Theory and Community Structure (BIMGTC): In this method, the community structure of the network is exploited first, and the total budget is divided among the communities based on the total Shapley value of the nodes in each community. Subsequently, nodes with high Shapley value are chosen as seed nodes from each community until the budget allocated to that community is exhausted.
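The selection step of the two methods can be sketched as follows, assuming the Shapley values have already been computed. This is a minimal sketch under stated assumptions rather than the implementation evaluated here: the function names and toy data are hypothetical, a node whose cost exceeds the remaining budget is skipped rather than ending the scan, the budget is split in direct proportion to each community's total Shapley value, and leftover community budget is not redistributed.

```python
def select_by_score(scores, costs, budget):
    """Scan nodes in decreasing score order and add every node whose cost still fits.
    Assumption: unaffordable nodes are skipped, and the scan continues."""
    seeds, remaining = [], budget
    for node in sorted(scores, key=lambda n: (-scores[n], str(n))):
        if costs[node] <= remaining:
            seeds.append(node)
            remaining -= costs[node]
    return seeds, remaining


def select_with_communities(scores, costs, budget, communities):
    """Split the budget among communities in proportion to their total score,
    then run the score-ordered selection inside each community.
    Assumption: the total score is positive and leftovers are not reallocated."""
    total = sum(scores.values())
    seeds = []
    for members in communities:
        share = budget * sum(scores[n] for n in members) / total
        community_scores = {n: scores[n] for n in members}
        chosen, _ = select_by_score(community_scores, costs, share)
        seeds.extend(chosen)
    return seeds


# Hypothetical Shapley values, selection costs, and communities.
shapley = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}
costs = {"a": 5, "b": 3, "c": 2, "d": 1}
communities = [["a", "c"], ["b", "d"]]

plain_seeds, leftover = select_by_score(shapley, costs, 10)
community_seeds = select_with_communities(shapley, costs, 10, communities)
```

On this toy input the two strategies pick different seed sets for the same budget, because the community version caps what can be spent inside each community.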
5.4.2 Algorithms from the Literature
We compare the performance of the proposed solution approaches with the following existing methods from the literature:
Random (RAND): In this method, starting with an empty seed set, in each iteration a node is chosen uniformly at random from the non-seed nodes and added to the seed set. This process is repeated until the allocated budget is exhausted. Many existing studies consider this method as a baseline Kempe et al. (2003); Wu et al. (2014).
Maximum Degree Heuristic (MDH): In this method, the nodes are sorted by degree, and high-degree nodes are chosen as seeds until the budget is exhausted. This method has been used in many existing studies on influence maximization Narayanam and Narahari (2010); Wu et al. (2014); Shang et al. (2017).
Maximum Clustering Co-efficient Heuristic (MCCH): The working principle of this method is the same as that of MDH, except that the clustering co-efficient of the nodes is computed instead of the degree. Nodes are sorted by this value, and the nodes with high clustering co-efficient values are chosen as seed nodes until the allocated budget is exhausted. Existing studies on influence maximization have used this method as a baseline Narayanam and Narahari (2010).
DAG Based Heuristic for BIM Problem (DAGHEU) Nguyen and Zheng (2013): This is the first study on the BIM Problem, and it contains a number of solution methodologies. Among them, we compare the performance of our proposed methodologies with DAG1-SPBP, which the authors claim is the most efficient and effective.
Balanced Seed Selection Heuristic (BSSH) Han et al. (2014): In this method, the nodes are first divided into two groups: ‘cost effective’ nodes and ‘influential’ nodes. The seed nodes are then chosen efficiently from both groups.
Community-Based Approach for the BIM Problem (ComBIM) Banerjee et al. (2019): This is the first community-based solution approach for the BIM Problem. In this method, the total allocated budget is divided among the communities based on ‘node fraction’ and ‘cost fraction’. Subsequently, high-degree nodes from each community are chosen as seed nodes until the community-specific budget is exhausted.
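The two centrality-based baselines (MDH and MCCH) differ only in the score used to rank the nodes, which the following sketch makes explicit. It is written in plain Python on an adjacency dictionary for an undirected graph. The function names, the toy graph, and the costs are hypothetical, ties are broken by node name, and nodes that do not fit in the remaining budget are skipped, which is an assumption about how the budget is exhausted.

```python
from itertools import combinations


def degree_scores(adj):
    """Degree of every node in an undirected adjacency dictionary."""
    return {u: len(neighbours) for u, neighbours in adj.items()}


def clustering_scores(adj):
    """Local clustering coefficient: the fraction of neighbour pairs that are linked."""
    scores = {}
    for u, neighbours in adj.items():
        k = len(neighbours)
        if k < 2:
            scores[u] = 0.0  # fewer than two neighbours: no pair to close a triangle
            continue
        links = sum(1 for v, w in combinations(sorted(neighbours), 2) if w in adj[v])
        scores[u] = 2.0 * links / (k * (k - 1))
    return scores


def pick_within_budget(scores, costs, budget):
    """Take nodes in decreasing score order, skipping those that no longer fit."""
    seeds, remaining = [], budget
    for node in sorted(scores, key=lambda n: (-scores[n], n)):
        if costs[node] <= remaining:
            seeds.append(node)
            remaining -= costs[node]
    return seeds


# Toy graph: a triangle a-b-c with a tail c-d-e (hypothetical data).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
costs = {"a": 1, "b": 2, "c": 3, "d": 1, "e": 1}

mdh_seeds = pick_within_budget(degree_scores(adj), costs, 4)
mcch_seeds = pick_within_budget(clustering_scores(adj), costs, 4)
```

Computing the clustering coefficient inspects every pair of neighbours of a node, whereas the degree is a single length lookup, so the clustering-based ranking is the more expensive of the two.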
All these methodologies have been implemented in a Python 3.4 + NetworkX 2.0.1 environment. The experiments of this study are performed on a workstation with an Intel® Xeon(R) W-1290 CPU (3.20GHz × 20) and 16 GB of RAM.
5.5 Experimental Results with Discussions
We now describe the experimental results with a detailed discussion. First, we discuss the influence spread achieved by the different algorithms for different budget values.
5.5.1 Impact on Influence Spread
Figure 1 shows the budget vs. number of influenced nodes plots for the HEPT dataset under different influence probability settings. From this figure, it can be observed that the seed sets selected by the proposed methodologies lead to more influenced nodes than those selected by the baseline methods. We give one example. For the uniform probability setting with and , among the proposed methodologies the seed set selected by BIMGTC leads to the maximum number of influenced nodes, which is . On the other hand, among the existing methods the seed set chosen by ComBIM leads to the maximum number of influenced nodes, which is . The number for BIMGTC is almost more than that for ComBIM. This observation is consistent across the different influence probability settings. From Table 2, it can be observed that the number of nodes of the HEPT dataset is . Hence, the influence coverage of the seed sets selected by BIMGTC and ComBIM is approximately and , respectively. So, there is a gap of almost . It is also observed that the number of influenced nodes is lowest under the trivalency model, irrespective of the method. As an example, for the seed set selected by the BIMGTC method leads to , , , and influenced nodes for the uniform (with and ), trivalency, and weighted cascade models, respectively.
[Figure 1. Budget vs. number of influenced nodes. Panels: (a) and (b) uniform probability settings; (c) Trivalency; (d) Weighted Cascade.]
[Figure 2. Budget vs. number of influenced nodes for the CMP Dataset. Panels: (a) and (b) uniform probability settings; (c) Trivalency; (d) Weighted Cascade.]
Next, we describe the results obtained for the CMP Dataset. Figure 2 shows the budget vs. number of influenced nodes plot for the CMP Dataset. As with the HEPT Dataset, the seed sets selected by the proposed methodologies lead to more influenced nodes than those selected by the existing methods. Here, we give an example. Under the uniform probability setting, when all the edges of the network have the influence probability of , the seed set selected by the BIMGTC method leads to the largest number of influenced nodes, which is , almost of the network. Among the existing methods, ComBIM leads to a seed set that results in influenced nodes, which is . Hence, there is a gap of almost in terms of influence coverage. It is also important to observe that, in this dataset as well, exploiting the community structure leads to more influenced nodes for the proposed methodology. As an example, for under the uniform probability setting with , the seed sets selected by BIMGT and BIMGTC lead to and influenced nodes, respectively; the latter is almost more.
[Figure 3. Budget vs. number of influenced nodes for the Epinions dataset. Panels: (a) and (b) uniform probability settings; (d) Trivalency; (e) Weighted Cascade.]
Figure 3 shows the budget vs. number of influenced nodes plot for the Epinions dataset. As with the previous two datasets, we observe that the seed sets selected by the proposed methodologies lead to more influenced nodes than those selected by the baseline methods. As an example, in the uniform probability setting when and , among the proposed methodologies the seed set selected by BIMGTC leads to the maximum number of influenced nodes, which is . On the other hand, among the existing methods the seed set selected by ComBIM leads to the maximum number of influenced nodes, which is . From Table 2, it can be observed that the number of nodes of the Epinions dataset is . Hence, the percentages of nodes influenced by these seed sets are and , respectively. So, there is an approximate gap of in terms of expected influence. It is also important to observe that exploiting the community structure of the input network increases the number of influenced nodes. As an example, when and , the numbers of nodes influenced by the seed sets selected by BIMGTC and BIMGT are and , respectively. Next, we discuss the computational time requirement.
5.5.2 Computational Time
Table 3 shows the execution time of the different algorithms for seed set selection. From this table, it can be observed that RANDOM (RAND) takes the least amount of time across all the datasets. MAXDEG (MDH) takes more time than RANDOM as it needs to compute the degree of the nodes. MAXCLUS (MCCH) takes even more time than MAXDEG because computing the clustering coefficient is a much more computationally expensive operation than computing the degree. The remaining existing methods take much more time than these baseline methods. It is important to observe that the computational time requirement of both proposed methodologies is much less than that of the DAG-based heuristic. The proposed methodologies do take more computational time than some of the baseline methods. However, in many practical applications, including viral marketing and computational advertisement, it is important to have a seed set selection algorithm that combines reasonable computational time with significant influence coverage. In this respect, the proposed methodologies are far ahead of many existing methods.
6 Conclusion and Future Direction
In this paper, we have proposed a co-operative game theoretic framework for the budgeted influence maximization problem. In particular, we formulate a co-operative game where the users of the network are the players, and the utility of any subset of the players is defined as the expected influence of the users of that subset under the MIA Model of diffusion. We have used the solution concept called the ‘Shapley value’ and proposed an iterative algorithm for selecting influential users in the network. We have also shown that the proposed methodology can select a better quality seed set if the community structure of the network is exploited. Experiments with real-world social network datasets demonstrate the superiority of the proposed methodologies. This study can be extended in different directions. First, we have not given any approximation guarantee for our proposed methodologies with respect to an optimal seed set. It would be interesting to derive a worst-case performance guarantee for them. Second, we have not considered the time-varying nature of social networks. Finally, there are many other solution concepts for co-operative games, such as the Banzhaf index Dubey and Shapley (1979). These solution concepts can be used instead of the ‘Shapley value’, and their performance can be compared with that of the proposed methodologies.
References
- Angell and Schoenebeck (2017) Angell, R., Schoenebeck, G., 2017. Don’t be greedy: Leveraging community structure to find high quality seed sets for influence maximization, in: International Conference on Web and Internet Economics, Springer. pp. 16–29.
- Aslay et al. (2015) Aslay, C., Lu, W., Bonchi, F., Goyal, A., Lakshmanan, L.V., 2015. Viral marketing meets social advertising: Ad allocation with minimum regret. Proceedings of the VLDB Endowment 8, 814–825.
- Banerjee et al. (2019) Banerjee, S., Jenamani, M., Pratihar, D.K., 2019. ComBIM: A community-based solution approach for the budgeted influence maximization problem. Expert Systems with Applications 125, 1–13.
- Banerjee et al. (2020) Banerjee, S., Jenamani, M., Pratihar, D.K., 2020. A survey on influence maximization in a social network. Knowledge and Information Systems, 1–39.
- Blondel et al. (2008) Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E., 2008. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment 2008, P10008.
- Borodin et al. (2010) Borodin, A., Filmus, Y., Oren, J., 2010. Threshold models for competitive influence in social networks, in: International workshop on internet and network economics, Springer. pp. 539–550.
- Bozorgi et al. (2016) Bozorgi, A., Haghighi, H., Zahedi, M.S., Rezvani, M., 2016. Incim: A community-based algorithm for influence maximization problem under the linear threshold model. Information Processing & Management 52, 1188–1199.
- Chen et al. (2010a) Chen, W., Liu, Z., Sun, X., Wang, Y., 2010a. A game-theoretic framework to identify overlapping communities in social networks. Data Mining and Knowledge Discovery 21, 224–240.
- Chen et al. (2010b) Chen, W., Wang, C., Wang, Y., 2010b. Scalable influence maximization for prevalent viral marketing in large-scale social networks, in: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM. pp. 1029–1038.
- Chen et al. (2012) Chen, Y., Chang, S., Chou, C., Peng, W., Lee, S., 2012. Exploring community structures for influence maximization in social networks, in: The 6th SNA-KDD Workshop on Social Network Mining and Analysis Held in Conjunction with KDD, pp. 1–6.
- Chen et al. (2014) Chen, Y.C., Zhu, W.Y., Peng, W.C., Lee, W.C., Lee, S.Y., 2014. Cim: Community-based influence maximization in social networks. ACM Transactions on Intelligent Systems and Technology (TIST) 5, 25.
- Clark and Poovendran (2011) Clark, A., Poovendran, R., 2011. Maximizing influence in competitive environments: A game-theoretic approach, in: International Conference on Decision and Game Theory for Security, Springer. pp. 151–162.
- Cohen et al. (2014) Cohen, E., Delling, D., Pajor, T., Werneck, R.F., 2014. Sketch-based influence maximization and computation: Scaling up with guarantees, in: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, pp. 629–638.
- Cormen et al. (2009) Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C., 2009. Introduction to algorithms. MIT press.
- Ding et al. (2010) Ding, F., Liu, Y., Shen, B., Si, X.M., 2010. An evolutionary game theory model of binary opinion formation. Physica A: Statistical Mechanics and its Applications 389, 1745–1752.
- Domingos (2005) Domingos, P., 2005. Mining social networks for viral marketing. IEEE Intelligent Systems 20, 80–82.
- Domingos and Richardson (2001) Domingos, P., Richardson, M., 2001. Mining the network value of customers, in: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, ACM. pp. 57–66.
- Dubey and Shapley (1979) Dubey, P., Shapley, L.S., 1979. Mathematical properties of the Banzhaf power index. Mathematics of Operations Research 4, 99–131.
- Goyal et al. (2011a) Goyal, A., Lu, W., Lakshmanan, L.V., 2011a. CELF++: optimizing the greedy algorithm for influence maximization in social networks, in: Proceedings of the 20th international conference companion on World wide web, ACM. pp. 47–48.
- Goyal et al. (2011b) Goyal, A., Lu, W., Lakshmanan, L.V., 2011b. Simpath: An efficient algorithm for influence maximization under the linear threshold model, in: 2011 IEEE 11th international conference on data mining, IEEE. pp. 211–220.
- Guille et al. (2013) Guille, A., Hacid, H., Favre, C., Zighed, D.A., 2013. Information diffusion in online social networks: A survey. ACM Sigmod Record 42, 17–28.
- Güney (2019) Güney, E., 2019. On the optimal solution of budgeted influence maximization problem in social networks. Operational Research 19, 817–831.
- HAFIENE et al. (2020) HAFIENE, N., KAROUI, W., ROMDHANE, L.B., 2020. Influential nodes detection in dynamic social networks: A survey. Expert Systems with Applications, 113642.
- Han et al. (2014) Han, S., Zhuang, F., He, Q., Shi, Z., 2014. Balanced seed selection for budgeted influence maximization in social networks, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer. pp. 65–77.
- Hong and Liu (2019) Hong, T., Liu, Q., 2019. Seeds selection for spreading in a weighted cascade model. Physica A: Statistical Mechanics and its Applications 526, 120943.
- Hossain (2012) Hossain, M., 2012. Users’ motivation to participate in online crowdsourcing platforms, in: 2012 International Conference on Innovation Management and Technology Research, IEEE. pp. 310–315.
- Jung et al. (2012) Jung, K., Heo, W., Chen, W., 2012. Irie: Scalable and robust influence maximization in social networks, in: 2012 IEEE 12th International Conference on Data Mining, IEEE. pp. 918–923.
- Ke et al. (2018) Ke, X., Khan, A., Cong, G., 2018. Finding seeds and relevant tags jointly: For targeted influence maximization in social networks, in: Proceedings of the 2018 International Conference on Management of Data, pp. 1097–1111.
- Kempe et al. (2003) Kempe, D., Kleinberg, J., Tardos, É., 2003. Maximizing the spread of influence through a social network, in: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM. pp. 137–146.
- Kempe et al. (2005) Kempe, D., Kleinberg, J., Tardos, É., 2005. Influential nodes in a diffusion model for social networks, in: International Colloquium on Automata, Languages, and Programming, Springer. pp. 1127–1138.
- Kempe et al. (2015) Kempe, D., Kleinberg, J.M., Tardos, É., 2015. Maximizing the spread of influence through a social network. Theory of Computing 11, 105–147. doi:10.4086/toc.2015.v011a004.
- Kostka et al. (2008) Kostka, J., Oswald, Y.A., Wattenhofer, R., 2008. Word of mouth: Rumor dissemination in social networks, in: International colloquium on structural information and communication complexity, Springer. pp. 185–196.
- Leskovec et al. (2007a) Leskovec, J., Kleinberg, J., Faloutsos, C., 2007a. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD) 1, 2.
- Leskovec et al. (2007b) Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., Glance, N., 2007b. Cost-effective outbreak detection in networks, in: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM. pp. 420–429.
- Li et al. (2015) Li, H., Bhowmick, S.S., Cui, J., Gao, Y., Ma, J., 2015. Getreal: Towards realistic selection of influence maximization strategies in competitive networks, in: Proceedings of the 2015 ACM SIGMOD international conference on management of data, pp. 1525–1537.
- Li et al. (2017) Li, M., Wang, X., Gao, K., Zhang, S., 2017. A survey on information diffusion in online social networks: Models and methods. Information 8, 118.
- Li et al. (2018) Li, Y., Fan, J., Wang, Y., Tan, K.L., 2018. Influence maximization on social graphs: A survey. IEEE Transactions on Knowledge and Data Engineering 30, 1852–1872.
- Logins and Karras (2019) Logins, A., Karras, P., 2019. Content-based network influence probabilities: Extraction and application, in: 2019 International Conference on Data Mining Workshops (ICDMW), IEEE. pp. 69–72.
- Maleki et al. (2013) Maleki, S., Tran-Thanh, L., Hines, G., Rahwan, T., Rogers, A., 2013. Bounding the estimation error of sampling-based Shapley value approximation. arXiv preprint arXiv:1306.4265.
- Narahari (2014) Narahari, Y., 2014. Game theory and mechanism design. volume 4. World Scientific.
- Narayanam and Narahari (2010) Narayanam, R., Narahari, Y., 2010. A Shapley value-based approach to discover influential nodes in social networks. IEEE Transactions on Automation Science and Engineering 8, 130–147.
- Nguyen and Zheng (2013) Nguyen, H., Zheng, R., 2013. On budgeted influence maximization in social networks. IEEE Journal on Selected Areas in Communications 31, 1084–1094.
- Rahimkhani et al. (2015) Rahimkhani, K., Aleahmad, A., Rahgozar, M., Moeini, A., 2015. A fast algorithm for finding most influential people based on the linear threshold model. Expert Systems with Applications 42, 1353–1361.
- Richardson et al. (2003) Richardson, M., Agrawal, R., Domingos, P., 2003. Trust management for the semantic web, in: International semantic Web conference, Springer. pp. 351–368.
- Richardson and Domingos (2002) Richardson, M., Domingos, P., 2002. Mining knowledge-sharing sites for viral marketing, in: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM. pp. 61–70.
- Shang et al. (2017) Shang, J., Zhou, S., Li, X., Liu, L., Wu, H., 2017. Cofim: A community-based framework for influence maximization on large-scale networks. Knowledge-Based Systems 117, 88–100.
- Shi et al. (2019) Shi, Q., Wang, C., Chen, J., Feng, Y., Chen, C., 2019. Post and repost: A holistic view of budgeted influence maximization. Neurocomputing 338, 92–100.
- Tang et al. (2015) Tang, Y., Shi, Y., Xiao, X., 2015. Influence maximization in near-linear time: A martingale approach, in: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, ACM. pp. 1539–1554.
- Tang et al. (2014) Tang, Y., Xiao, X., Shi, Y., 2014. Influence maximization: Near-optimal time complexity meets practical efficiency, in: Proceedings of the 2014 ACM SIGMOD international conference on Management of data, ACM. pp. 75–86.
- Tong et al. (2016) Tong, G., Wu, W., Tang, S., Du, D.Z., 2016. Adaptive influence maximization in dynamic social networks. IEEE/ACM Transactions on Networking 25, 112–125.
- Wang et al. (2012) Wang, C., Chen, W., Wang, Y., 2012. Scalable influence maximization for independent cascade model in large-scale social networks. Data Mining and Knowledge Discovery 25, 545–576.
- Wang et al. (2020) Wang, S., Yang, S., Xu, Z., Truong, V.A., 2020. Fast Thompson sampling algorithm with cumulative oversampling: Application to budgeted influence maximization. arXiv preprint arXiv:2004.11963.
- Wang and Yu (2020) Wang, S.B.Q.G.S., Yu, J.X., 2020. Efficient algorithms for budgeted influence maximization on massive social networks. Proceedings of the VLDB Endowment 13.
- Wen and Deng (2020) Wen, T., Deng, Y., 2020. Identification of influencers in complex networks by local information dimensionality. Information Sciences 512, 549–562.
- Wu et al. (2014) Wu, Y., Yang, Y., Jiang, F., Jin, S., Xu, J., 2014. Coritivity-based influence maximization in social networks. Physica A: Statistical Mechanics and its Applications 416, 467–480.
- Ye et al. (2012) Ye, M., Liu, X., Lee, W.C., 2012. Exploring social influence for recommendation: a generative model approach, in: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, ACM. pp. 671–680.
- Yu et al. (2018) Yu, Q., Li, H., Liao, Y., Cui, S., 2018. Fast budgeted influence maximization over multi-action event logs. IEEE Access 6, 14367–14378.
- Zimmermann and Eguíluz (2005) Zimmermann, M.G., Eguíluz, V.M., 2005. Cooperation, social networks, and the emergence of leadership in a prisoner’s dilemma with adaptive local interactions. Physical Review E 72, 056118.