A k-hop Collaborate Game Model: Extended to Community Budgets and Adaptive Non-Submodularity

04/02/2020 ∙ by Jianxiong Guo, et al. ∙ The University of Texas at Dallas

Revenue maximization (RM) is one of the most important problems on online social networks (OSNs): it attempts to find a small subset of users in an OSN that maximizes the expected revenue. It has been researched intensively. However, most of the existing literature is based on a non-adaptive seeding strategy and on simple information diffusion models, such as the IC/LT-model, which take a single influenced user as the measurement unit to quantify the revenue. The Collaborate Game model, in contrast, takes an activity as the basic object from which to compute the revenue. An activity initiated by a user can only influence those users whose distance from the initiator is within k hops. Based on that, we adopt an adaptive seeding strategy and formulate the Revenue Maximization under the Size Budget (RMSB) problem. Taking the product's promotion into account, we extend RMSB to the Revenue Maximization under the Community Budget (RMCB) problem, where the influence can be distributed over the whole network. The objective function of RMSB and RMCB is adaptive monotone and not adaptive submodular in general, but in some special cases it is adaptive submodular. We study the RMSB and RMCB problems under both the special submodular cases and the general non-submodular cases, and propose RMSBSolver and RMCBSolver to solve them with strong theoretical guarantees, respectively. In particular, we give a data-dependent approximation ratio for the RMSB problem in the general non-submodular cases. Finally, we evaluate our proposed algorithms by conducting experiments on real datasets, which show the effectiveness and accuracy of our solutions.


1 Introduction

The prosperous development of online social networks (OSNs) has given rise to a number of famous social platforms, such as Facebook, LinkedIn, Twitter, and WeChat, which have become primary means of communication. There are more than 1.52 billion daily active users on Facebook and 321 million monthly active users on Twitter. Many companies have adopted these social platforms as an effective advertising channel to promote their products through the "word-of-mouth" effect, which motivates research on viral marketing. An OSN can be represented as an undirected graph, where nodes are users and edges denote friendships between users. Viral marketing, proposed by Domingos and Richardson [2] [3], aims to maximize the number of follow-up users by giving rewards, coupons, or discounts to a subset of the most influential users. Kempe et al. [4] then formulated the Influence Maximization (IM) problem as a combinatorial problem: select a subset of users as the seed set under a cardinality budget, such that the expected number of users influenced by this seed set is maximized. They proposed two information diffusion models that were accepted by most researchers in subsequent work, the Independent Cascade model (IC-model) and the Linear Threshold model (LT-model), and proved that IM is NP-hard and that its objective function is monotone submodular under these two models; thus, a good approximation can be obtained by the natural greedy algorithm [5]. After this milestone work, a series of variant problems based on the IM model and adapted to different application scenarios emerged. Revenue Maximization (RM), also called Profit Maximization, is a representative among them and is commonly studied in the literature [6] [7] [8] [9] [10] [11]. More factors, such as product price, discount, cost [12] [13] [14] [15], and their impact on propagation need to be considered when maximizing revenue.
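For context, the greedy hill-climbing framework of [4] [5] can be sketched as follows. This is an illustrative sketch only: the graph encoding, the function names, and the single uniform propagation probability `p` are our simplifying choices, not details from the paper.

```python
import random

def ic_spread(graph, seeds, p, trials=200, rng=random):
    """Estimate the expected IC-model spread of a seed set by Monte-Carlo simulation."""
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in graph.get(u, []):
                    # each newly active node gets one chance to activate each neighbor
                    if v not in active and rng.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / trials

def greedy_im(graph, k, p):
    """Natural greedy: repeatedly add the node with the largest estimated marginal spread."""
    seeds = []
    for _ in range(k):
        candidates = [v for v in graph if v not in seeds]
        seeds.append(max(candidates, key=lambda v: ic_spread(graph, seeds + [v], p)))
    return seeds
```

Because the spread function is monotone submodular under the IC-model, this simple loop attains the classical constant-factor guarantee up to Monte-Carlo estimation error.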

From the perspective of companies, the objective is to maximize the expected total revenue by selecting a subset of users from the whole network. However, most existing research on the RM problem is based on a non-adaptive seeding strategy, where we select all seed nodes in one batch without observing the influence diffusion process. Thus, non-adaptive seeding is not the best choice for the RM problem. Compared to that, an adaptive strategy can respond with a better seeding decision because it acts according to real-time feedback from the users. Not only can it use the limited budget more wisely, but it can also adapt to the dynamic features of a social network; for example, the network topology changes constantly as users join or leave and friendships form or terminate. In addition, most previous research was based on simple information diffusion models, such as the IC-model or LT-model, which count the number of follow-up users that accept the information cascade. These models use the user as the measurement unit to quantify revenue, where each user corresponds to a fixed revenue value. Sometimes this is not enough, and we need to use the activity as the measurement unit instead. Suppose some user on Facebook is invited by a company to initiate an activity; after accepting and initiating it, the neighbors of this initiator may be infected, and the initiator's friends, or friends of friends, may participate in this activity. Thus, Guo et al. [1] proposed the Collaborate Game model to characterize this scenario, where the total revenue gained by the company is correlated with the number of successfully initiated activities. The number of participants differs among activities. For a single activity, the revenue we can obtain is related to the number of its participants, but not by a simple linear relationship. In general, the closer a participant is to the initiator, the more business benefit she provides.

Based on the Collaborate Game model, we propose the Revenue Maximization under the Size Budget (RMSB) problem. RMSB aims to maximize the total revenue by inviting a small number of users to initiate an activity under an adaptive strategy. We adopt an adaptive seeding strategy because, before sending an invitation to a potential initiator, we do not know whether she will accept it or how many users around her will follow and join. With an adaptive strategy, we can observe the actual states of users and edges after sending each invitation. Besides, a company should consider not only the largest revenue but also the promotion effect of the product. Accordingly, we propose the Revenue Maximization under the Community Budget (RMCB) problem, where the number of invited users in each community of the targeted network is constrained. In this way, we can make sure that several activities happen in every area of the entire network, so the balance of the influence distribution is guaranteed. Relying on the adaptive greedy policy, we design the RMSBSolver and RMCBSolver algorithms to solve RMSB and RMCB, respectively. The objective function of the RMSB and RMCB problems is adaptive monotone and not adaptive submodular in general, but in some special cases it is adaptive submodular. In this paper, the community budget of RMCB is represented by means of a partition matroid, and we obtain a constant approximation ratio for the RMCB problem under the special submodular cases. For the RMSB problem, how to get a theoretical bound in the general non-submodular cases is an open question left by [1], which we attempt to solve. Smith et al. [16] proposed the concepts of adaptive primal curvature and adaptive total primal curvature. We give a bound on the adaptive total primal curvature and, with its help, obtain a data-dependent approximation ratio for the RMSB problem. Our contributions are summarized as follows:

  1. Based on the Collaborate Game model, we propose the RMSB and RMCB problems, which are adaptive monotone but not adaptive submodular in general.

  2. We design the RMSBSolver and RMCBSolver algorithms for the RMSB and RMCB problems, generalize the community budget of RMCB to a partition matroid, and obtain constant approximation ratios under the special submodular cases.

  3. For the RMSB problem, we prove that the solution returned by RMSBSolver satisfies a data-dependent approximation ratio under the general non-submodular cases.

  4. Our proposed algorithms are evaluated on real-world datasets, and the results show that they are effective and outperform the baseline algorithms.

Organization: In Section 2, we survey the related work on RM and adaptive submodular optimization. We then present the RMSB and RMCB problems in Section 3, discuss the algorithms in Section 4, and give a comprehensive theoretical analysis in Section 5. Finally, we conduct experiments and conclude in Section 6 and Section 7.

2 Related Work

Domingos and Richardson [2] [3] were the first to study viral marketing and the value of customers in social networks. Kempe et al. [4] studied IM as a discrete optimization problem and generalized the IC-model and LT-model to the triggering model, providing a greedy algorithm with a constant approximation ratio. RM is an important variant of IM, and existing research has focused on the non-adaptive setting. [7] [8] studied the problem of selecting quality seed users so as to maximize revenue with the help of influence diffusion. Lu et al. [7] extended the LT-model to include prices and valuations, and used a heuristic unbudgeted greedy framework to solve this problem. Tang et al. [8] applied the deterministic and randomized double greedy algorithms [17] to solve the RM problem; if the objective function is non-negative and submodular, they obtained 1/3- and 1/2-approximation ratios, respectively. Zhang et al. [9] investigated the RM problem with multiple adoptions, which aims to maximize the overall profit across all products. Liu et al. [18] considered RM with coupons in a new model, the independent cascade model with coupons and valuations (IC-CV), and solved it based on a local search algorithm [19]. Recently, Tong et al. [20] designed randomized algorithms, called simulation-based and realization-based RM, to address RM with coupons, achieving an approximation guarantee with high probability. Guo et al. [21] proposed a composed influence model with complementary products and gave a solution by use of the sandwich approximation framework [22].

In the adaptive setting, Golovin et al. [23] were the first to study the adaptive submodular optimization problem. Analogous to the monotonicity and submodularity of set functions, they extended these two concepts to adaptive versions, adaptive monotonicity and adaptive submodularity, and proved that the solution returned by the adaptive greedy policy is a (1 - 1/e)-approximation if the objective function is adaptive monotone and adaptive submodular. Applying these ideas to social networks, Tong et al. [24] provided a systematic study of the adaptive influence maximization problem under different (partial or full) feedback models, especially the algorithmic analysis of scenarios where the objective is not adaptive submodular. Smith et al. [16] introduced two important concepts, adaptive primal curvature and adaptive total primal curvature, to obtain a valid approximation ratio for the adaptive greedy policy when the objective function is adaptive monotone but not adaptive submodular. Further, when the objective function is not adaptive monotone but is adaptive submodular, Gotovos et al. [25] extended the random greedy algorithm to an adaptive policy and obtained a 1/e-approximation. Other research on the application of adaptive strategies can be found in [26] [27] [28] [29].

3 Problem and Preliminaries

In this section, we review some preliminary knowledge and concepts used in the rest of the paper, mainly including the model review and the problem definitions.

3.1 Model Recapitulation

The problem in this paper is an extension of the Collaborate Game model [1], so we review it briefly here; for more details, please refer to [1]. The Collaborate Game model is defined on the targeted network, which is an undirected graph whose nodes are the users and whose edges are the friendships. If an edge exists between two users, it means they are friends, and there is an associated probability representing their intimacy. For each user, we denote accordingly the set of users who are her neighbors (friends).

Definition 1 (Collaborate Game [1]).

When a game company invites a user to play their game, she accepts the invitation to launch the game with her acceptance probability. If she accepts, we call her an "initiator", and in the first round the initiator invites each of her friends to participate in the game, each invitation succeeding with the probability of the corresponding edge. We call the initiator a 0-hop participant and a user who accepts the initiator's invitation a 1-hop participant. In round i, each (i-1)-hop participant invites each of her friends to participate, again with the success probability of the corresponding edge. The process terminates after round k.

Then, we have a "Tendency Assumption": if a user could be an i-hop participant to one initiator and a j-hop participant to another initiator, where i < j, then we consider that she chooses to be an i-hop participant to the closer initiator. There are two additional parameters associated with our model:

  1. Acceptance vector, where each element is the acceptance probability of a user. It quantifies the likelihood that the user agrees to be an initiator when she receives the invitation from the company. In this paper, we assume each acceptance probability is drawn uniformly from a fixed interval.

  2. Revenue vector, where each element is the revenue the company can gain from an i-hop participant. In this paper, we assume the revenue does not increase with the hop distance, according to the "Benefit Diminishing" assumption [1].

Remark 1.

The Collaborate Game model is based on the following scenario: a game company wants to promote its new multiplayer game over the targeted network by inviting some users to play. Once a user accepts, she recursively attracts her friends to participate, and the company obtains revenue from the resulting activity. Since we assume the initiator's influence range is at most k hops from her, this model is also called the k-hop Collaborate Game model. The "Benefit Diminishing" assumption therefore means that participants who are far from the initiator provide less benefit to the company.
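As a minimal sketch of the diffusion in Definition 1 (the data layout and helper names are our own, not from the paper), the k-round invitation process from a single invited user can be simulated as follows; breadth-first expansion automatically assigns each participant her smallest hop, which matches the Tendency Assumption within a single activity.

```python
import random

def simulate_activity(graph, initiator, k, edge_prob, revenue, accept_prob, rng=random):
    """Simulate one k-hop Collaborate Game activity; return (revenue, hop map)."""
    # Round 0: the invited user becomes an initiator with her acceptance probability.
    if rng.random() >= accept_prob:
        return 0, {}
    hop = {initiator: 0}          # the initiator is the 0-hop participant
    frontier = [initiator]
    for i in range(1, k + 1):     # rounds 1..k
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                # each (i-1)-hop participant invites her friends once
                if v not in hop and rng.random() < edge_prob.get((u, v), 0.0):
                    hop[v] = i
                    nxt.append(v)
        frontier = nxt
    # revenue[i] is the revenue gained from an i-hop participant
    return sum(revenue[h] for h in hop.values()), hop
```

With all probabilities set to one, the simulation degenerates to a plain k-hop breadth-first search, which is the deterministic skeleton used later for the data-dependent bound.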

From the perspective of the company, before sending an invitation to a potential initiator, it is not sure whether she will accept, nor how many users around her will follow and join. Therefore, the company needs to adopt an adaptive strategy: before determining the next potential initiator, it observes the changes in the states of users and edges since the last invitation. Given the targeted network, each user is in one of three states: an accepting state, meaning she agreed to be an initiator upon the company's invitation; a rejecting state, meaning she declined; or an unknown state, meaning she has not yet received an invitation from the company. All users start in the unknown state. Similarly, each edge is in one of three states: a failed state, indicating that the invitation between its endpoints was not accepted; a live state, indicating that one endpoint is willing to play the game with the other; or an unknown state, indicating that no invitation has happened between the two endpoints. Once determined, an edge's state cannot be changed, and all edges start in the unknown state.

Fig. 1: An example showing the adaptive process. Here we assume k = 2; the yellow nodes are accepting initiators and the other nodes are in their observed states; green and purple nodes are 1-hop and 2-hop participants; the red, blue, and dotted lines correspond to the three edge states. First, we invite one node, with the states shown in the left part; then, we invite a second node, with the states shown in the right part. We can see that one node changes from a 2-hop to a 1-hop participant because of the second initiator. [1]

After defining the states of users and edges, we have a function that maps every object to a possible state, which is called a realization. Thus, under a realization, every user and every edge has a definite state. We consider a random realization drawn from a probability distribution over all realizations. Besides, each realization should be consistent: each user takes exactly one of the three user states, and each edge takes exactly one of the three edge states. After each pick, our observations so far can be represented as a partial realization, a function from the observed objects to their states; its domain is the set of observed objects. A partial realization is consistent with a realization if they are equal everywhere in the domain of the partial realization. If two partial realizations are both consistent with some realization and the domain of the first is contained in that of the second, we say the first is a subrealization of the second. Besides, we distinguish the observed users and the observed edges in the domain of a partial realization. [1]

3.2 Problem Definition

At each pick, the game company sends an invitation to a user chosen according to the current partial realization and observes that user's state. If she accepts, we need to update the partial realization, in other words, update the states of the edges within k hops of her; in this process, she attracts her nearby users to play together. If she rejects, we do nothing and go to the next pick. An example is shown in Fig. 1. Consider an adaptive policy and the set of users who are invited by the game company according to this policy under a given realization. The total revenue gained by the policy under the realization can be defined as follows: [1]

(1)

where the i-th set contains all i-hop participants gained by the company according to the policy under the realization; thus, we have

(2)
(3)

Finally, we can evaluate the performance of a policy by its expected revenue, and we have

(4)

where the expectation is taken with respect to the distribution over realizations. Then, under the k-hop Collaborate Game model, the Revenue Maximization under the Size Budget (RMSB) problem [1] is formulated as follows:

Problem 1.

Given a targeted network, an acceptance vector, a size budget, and a revenue vector, we aim to find a policy that maximizes the expected revenue under the k-hop Collaborate Game model, subject to the constraint that the number of invited users does not exceed the size budget for every realization.

However, a size budget is not enough, because we not only want to maximize the revenue but also need to consider the influence distribution. Generally, a social network has a community structure, which is a partition of the network. Given a targeted network, we assume there exists a unique community structure associated with it, that is, a partition of the node set into disjoint communities whose union is the whole node set. The community budget is a vector with one entry per community, meaning that the number of users invited to be initiators by the company in each community cannot be larger than that community's budget. Then, under the k-hop Collaborate Game model, the Revenue Maximization under the Community Budget (RMCB) problem is formulated as follows:

Problem 2.

Given a targeted network, a community structure associated with it, an acceptance vector, a community budget, and a revenue vector, we aim to find a policy that maximizes the expected revenue under the k-hop Collaborate Game model, subject to the constraint that, for every community and every realization, the number of invited users in that community does not exceed its budget.

The community budget in the RMCB problem can be generalized to a matroid constraint, so we introduce some basic concepts about matroids here. A matroid is an ordered pair consisting of a ground set (the node set in this paper) and a collection of independent sets, which satisfies: (1) downward closure, i.e., every subset of an independent set is independent; and (2) augmentation, i.e., for any two independent sets of different sizes, the smaller can be extended by some element of the larger and remain independent. Given a matroid, the matroid constraint means that a feasible solution must be an independent set. The bases of a matroid are its maximal independent sets, and all bases of a matroid have the same size. There is a special kind of matroid, the Partition Matroid, that is related to our problem: the ground set is partitioned into disjoint sets (the community structure in this paper), and each part is assigned an integer capacity (the community budget in this paper). A set is independent if and only if its intersection with every part contains at most that part's capacity of elements. A matroid whose independent sets are defined in this form is called a partition matroid. Therefore, we want to find a policy whose set of invited users is independent for every realization, and

(5)

where the collection of independent sets is defined by the partition matroid described above.
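A partition matroid's independence test reduces to a per-community count. The sketch below uses our own helper names to check the feasibility of a set of invited initiators against the community budgets.

```python
def is_independent(selected, communities, budgets):
    """Partition matroid: a set is independent iff, for every community,
    its intersection with that community respects the community's budget."""
    return all(len(selected & community) <= budget
               for community, budget in zip(communities, budgets))
```

Downward closure holds trivially here, since removing a node can only shrink each intersection; the augmentation property follows because a smaller independent set must leave slack in some community that a larger one fills.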

4 Algorithm

In this section, we propose our algorithms to solve the RMSB and RMCB problems. The adaptive greedy policy, proposed by Golovin et al. [23], is an effective method for such problems. It can be divided into two steps at each iteration:

  1. Send an invitation to the user who has the largest expected increment of revenue based on the current partial realization.

  2. Observe the state of this user; if she accepts the invitation, update the states of the edges within k hops of her. Otherwise, go back to (1) to invite the next user.

The Conditional Expected Marginal Benefit in the adaptive setting is defined as follows:

Definition 2 (Conditional Expected Marginal Benefit).

Given a partial realization and a user, the conditional expected marginal benefit of that user conditioned on the observations so far is

(6)

In our RMSB and RMCB problems, this quantity is the expected revenue increment obtained by sending an invitation to the given user. It is conditioned on the previous observations of users and edges, i.e., the partial realization, and the expectation is taken over all realizations that are consistent with it.
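The conditional expected marginal benefit can be estimated by Monte-Carlo simulation over realizations consistent with the current observations. The sketch below is our illustrative choice of interface (names and data layout are not from the paper): it simulates the candidate's k-hop activity and, following the Tendency Assumption, counts only revenue that is new relative to the hops already observed.

```python
import random

def estimate_marginal(v, graph, k, edge_prob, revenue, accept_prob,
                      current_hop, trials=1000, rng=random):
    """Monte-Carlo estimate of the conditional expected marginal benefit of
    inviting user v, given the hops of already-observed participants."""
    gain = 0.0
    for _ in range(trials):
        if rng.random() >= accept_prob:
            continue                     # v rejects: zero increment this trial
        hop = {v: 0}
        frontier = [v]
        for i in range(1, k + 1):
            nxt = []
            for u in frontier:
                for w in graph.get(u, []):
                    if w not in hop and rng.random() < edge_prob.get((u, w), 0.0):
                        hop[w] = i
                        nxt.append(w)
            frontier = nxt
        # Tendency Assumption: a user sticks with her closest initiator, so she
        # contributes extra revenue only if v places her at a strictly smaller hop.
        for u, h in hop.items():
            old = current_hop.get(u)
            if old is None:
                gain += revenue[h]
            elif h < old:
                gain += revenue[h] - revenue[old]
    return gain / trials
```

This is the estimator a simulation-based solver would call; the simplified breadth-first method of [1] replaces the inner sampling loop with a direct probability computation.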

Based on the adaptive greedy policy [23], RMSBSolver and RMCBSolver are proposed, shown as Algorithm 1 and Algorithm 2. In Algorithm 1, we invite the user with the largest expected revenue increment at each iteration until the number of invitations reaches the size budget. Algorithm 2 is more complicated: at each iteration, we need to check whether the candidate set is empty. This candidate set is determined by the collection of independent sets, shown as Equation (5), given the community budget. If the candidate set is empty, the number of invitations in each community has reached its budget constraint, and we stop inviting at that point.

1:  Initialize: the partial realization as empty, the invited set S as empty
2:  for i = 1 to the size budget do
3:     for each user u not yet invited do
4:        Marginal benefit of u: computed by Monte-Carlo simulation or the simplified method of Section 4.2 in [1]
5:     end for
6:     Select the user u* with the largest marginal benefit
7:     S := S with u* added
8:     if u* accepts the invitation then
9:        Update the partial realization, i.e., the states of the edges within k hops of u*
10:     end if
11:  end for
12:  return S
Algorithm 1 RMSBSolver

At each iteration, we need to compute the conditional expected marginal benefit of each candidate given the current partial realization, which can be done by Monte-Carlo simulation: run the diffusion process many times and take the average. To achieve satisfactory accuracy, the number of simulations cannot be too small, but a large number increases the running time greatly. To improve scalability, a simplified computing method was proposed in Algorithm 2, Section 4.2 of [1]. Its core idea is to compute the marginal benefit by a technique similar to breadth-first search instead of Monte Carlo simulation: it fixes all potential i-hop participants in advance, ignores cases whose probability of occurrence is low, and computes the probability that each potential participant joins the game from the top down. The time complexity of computing each marginal benefit is thereby reduced. The effectiveness of this method was validated by experiments on real-world datasets, and it reduces the running time significantly while preserving accuracy. Besides, similar to [1], we do not need to recompute the marginal benefit of every user at each iteration. If the user selected at the last iteration rejects the invitation, no marginal benefit needs to be updated; otherwise, we only need to update the marginal benefits of those users within a bounded number of hops of the newly accepted initiator. This further improves the efficiency of Algorithm 1 and Algorithm 2.

1:  Initialize: the partial realization as empty, the invited set S as empty
2:  while true do
3:     C := the set of users u not yet invited such that S with u added is independent
4:     if C is empty then
5:        break
6:     end if
7:     for each user u in C do
8:        Marginal benefit of u: computed by Monte Carlo simulation or the simplified method of Section 4.2 in [1]
9:     end for
10:     Select the user u* in C with the largest marginal benefit
11:     S := S with u* added
12:     if u* accepts the invitation then
13:        Update the partial realization, i.e., the states of the edges within k hops of u*
14:     end if
15:  end while
16:  return S
16:  return  
Algorithm 2 RMCBSolver
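Both solvers share the same adaptive skeleton. The sketch below is our illustrative interface (the gain estimator and the invitation oracle are passed in as functions, and the feasibility predicate is our own abstraction of Algorithm 2's candidate-set check).

```python
def adaptive_greedy(users, budget, estimate_gain, invite, feasible=None):
    """Adaptive greedy skeleton shared by RMSBSolver and RMCBSolver.

    estimate_gain(u, observed): conditional expected marginal benefit of u.
    invite(u): reveals u's response; returns a dict of observed object states.
    feasible(invited, u): optional check, e.g. the partition matroid of Eq. (5).
    """
    observed = {}      # partial realization: observed objects -> states
    invited = []
    for _ in range(budget):
        candidates = [u for u in users
                      if u not in observed
                      and (feasible is None or feasible(invited, u))]
        if not candidates:
            break      # every community has reached its budget (RMCB case)
        best = max(candidates, key=lambda u: estimate_gain(u, observed))
        invited.append(best)
        observed.update(invite(best))   # observe her state and nearby edge states
    return invited
```

For RMSB, `feasible` is omitted and the loop cap is the size budget; for RMCB, the cap can be set to the number of users and the predicate encodes the partition matroid.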

5 Theoretical Analysis

In this section, we discuss some related conclusions about the RMSB and RMCB problems, and then extend them to the general non-submodular cases.

5.1 Related Conclusions

Before starting our discussion, we introduce two important concepts, defined in [23], as follows:

Definition 3 (Adaptive Monotone).

A function is adaptive monotone with respect to the realization distribution if the conditional expected marginal benefit of any user is nonnegative, i.e., for every partial realization with positive probability and every user, we have

(7)
Definition 4 (Adaptive Submodular).

A function is adaptive submodular with respect to the realization distribution if the conditional expected marginal benefit of any user does not increase as more states of users and edges are observed, i.e., for every pair of partial realizations such that the first is a subrealization of the second, and every user, we have

(8)

Our RMSB and RMCB problems are two instances of the adaptive optimization problem; thus, provable theoretical bounds can be obtained for both by the adaptive greedy policy if our objective function, Equation (4), is adaptive monotone and adaptive submodular. Based on [1], we have several conclusions as follows:

Theorem 1.

The RMSB and RMCB problems are NP-hard, and there is no polynomial-time exact algorithm unless NP = P.

Proof.

The RMSB problem is NP-hard [1], because the Maximum Coverage problem can be reduced to it under a suitable setting of the revenue vector and the acceptance vector. Then, when there is only one community in the targeted network and the community budget is a scalar, RMCB degenerates to the RMSB problem; thus, the RMCB problem is NP-hard as well. ∎

Lemma 1.

The objective function of the RMSB and RMCB problem is adaptive monotone.

Lemma 2.

The objective function of the RMSB and RMCB problem is not adaptive submodular.

Lemma 3 ([1]).

If the RMSB and RMCB problems conform to one of the two special cases identified in [1], the objective function is adaptive submodular.

Properties that hold for the RMSB problem also apply to the RMCB problem, because their objective functions are identical and only their constraints differ. Therefore, for the RMSB problem, we have

Theorem 2.

The adaptive greedy policy given by Algorithm 1 for the RMSB problem yields a (1 - 1/e)-approximate solution in the special submodular cases. Hence, we have

(9)

where the optimum is taken over all policies satisfying the size budget for every realization.

Proof.

From Lemma 1, the objective function is adaptive monotone, and from Lemma 3, it is adaptive submodular in the special cases. The adaptive greedy policy is then a (1 - 1/e)-approximation under the size constraint according to the conclusion of [23]. ∎

In [30], Golovin et al. analyzed the theoretical performance of adaptive submodular optimization under p-independence system. The main conclusions are summarized as follows:

Lemma 4 ([30]).

Given an adaptive monotone and adaptive submodular function and a p-independence system, the adaptive approximate greedy policy under this constraint yields an approximation ratio depending on p and the per-step approximation factor. Hence, we have

(10)

where the optimum is taken over all policies that are feasible under the constraint for every realization.

Therefore, for RMCB problem, we have

Theorem 3.

The adaptive greedy policy given by Algorithm 2 for the RMCB problem yields a 1/2-approximate solution in the special submodular cases. Hence, we have

(11)

where the optimum is taken over all policies satisfying the budget of every community for every realization.

Proof.

From Lemma 1, the objective function is adaptive monotone, and from Lemma 3, it is adaptive submodular in the special cases. As we said before, the community budget constraint can be expressed as a partition matroid. The independent sets in Lemma 4 can be defined by Equation (5). A partition matroid is a 1-independence system, so p = 1, and the adaptive greedy policy is exact, so its per-step approximation factor is 1. The conclusion then follows from Lemma 4, Equation (10). ∎

5.2 Non-Submodularity

In addition to the special cases discussed in the last subsection, the objective function of the RMSB problem is not adaptive submodular in general, so the approximation ratio in Theorem 2 does not hold in the general cases. To deal with the general cases without adaptive submodularity, we draw on the concept of adaptive primal curvature, proposed by [16], to obtain an approximation bound for the adaptive greedy policy. Wang et al. [31] were the first to propose the concept of elemental curvature, which is the maximum ratio between the marginal gains of an element at two nested subsets. Extended to the adaptive case, the adaptive primal curvature [16] is the ratio between the marginal gains of an element under two partial realizations that differ by the observed state of another element. Thus, we have

Definition 5 (Adaptive Primal Curvature [16]).

Given an adaptive monotone function, the adaptive primal curvature is

(12)

where the maximum is over the state set of the added element, and the conditional expected marginal benefit is defined as in Definition 2.

From Equation (12), the adaptive primal curvature measures the change in the conditional expected marginal benefit of one element caused by another element having been added to the solution. In the setting of the RMSB and RMCB problems, its meaning is slightly different: here, both elements represent users invited to be initiators, and the state of the added user can be acceptance or rejection. When she rejects, we only add her rejection state to the partial realization; but when she accepts, the situation is more complicated, because we must add not only her acceptance state but also all the state changes of edges caused by her becoming an initiator. Thus, the expectation in the primal curvature is taken with respect to the acceptance probabilities of the two users and the edge probabilities. In the non-adaptive case, the realization is determined, and the elemental curvature is the maximum primal curvature. To measure the total change from one partial realization to another, the total primal curvature [16] is defined as follows:

Definition 6 (Adaptive Total Primal Curvature [16]).

Consider the set of possible state sequences changing from one partial realization to a larger one. Then the adaptive total primal curvature is the product of the adaptive primal curvatures along such a sequence, where each factor is a simplified notation of the adaptive primal curvature corresponding to a single state.

In the following, we find a constant upper bound on the adaptive total primal curvature for the RMSB problem and a relationship between any policy with a given budget and the adaptive greedy policy with any budget. From these, the approximation ratio of the general RMSB problem can then be derived.

Lemma 5.

In the general cases of the RMSB problem, the adaptive total primal curvature, for any pair of partial realizations and any user, is upper bounded by a constant, where

(13)
Proof.

The adaptive total primal curvature can be written as a product over a state sequence. Considering such a sequence, we have

From the above, the product over any sequence from one partial realization to the other reduces trivially to a ratio of marginal gains. Then, we have

(14)

where each marginal gain assumes that the corresponding user accepts to be the initiator. For the numerator, the revenue from the user herself is at most the 0-hop revenue, and from each of her reachable users at most the 1-hop revenue; considering an extreme example in which there is an edge from the user to every other user and every edge probability equals one, the total revenue from selecting her as initiator attains this maximum. For the denominator, we need to know the current participation role of the user, who may already be an x-hop participant, before selecting her as initiator; the revenue gained from her equals the increase over her current role, and by the Benefit Diminishing assumption this increase is bounded from below. Considering an extreme example in which all her neighbors were already 1-hop participants before she is selected, there is no additional revenue from other users. Combining the above analysis, the lemma holds. ∎

Let us consider any policy that invites an exact number of users, and the adaptive greedy policy that invites some (possibly different) number of users. Besides, we refer to the user who is invited to be an initiator at each step of the adaptive greedy policy. Then, we can bound the expected revenue of the adaptive greedy policy against that of any such policy.

Lemma 6.

The difference in expected revenue between any policy and the adaptive greedy policy in the general case of the RMSB problem can be bounded by

(15)

where we assume and is defined as .

Proof.

We know that because is adaptive monotone [23]. From this, we have

The difference in expected revenue between and can be bounded by running after running , which means that we need to send the invitations again. Supposing that is the partial realization generated by , the initiators selected by are based on a partial realization such that . Then, we have if and , because from Lemma 5. Therefore,

where the first inequality follows from sending the invitations again, and the remaining equalities follow from Definition 2 and Equation (4). ∎

We can now state the main theorem of this paper, which gives the approximation ratio of the RMSB problem in the general case, where the objective is not adaptive submodular.

Theorem 4.

The approximation performance of the adaptive greedy policy , given by Algorithm 1, in the general case of the RMSB problem satisfies the following:

(16)

under the assumption that .

Proof.

According to Lemma 6, can be bounded by ; thus, we have

(17)
(18)

where and . Note that . Multiplying both sides of Inequality (17) by and summing from to , the left-hand side of Inequality (17) reduces to

Similarly, the right-hand side of Inequality (17) can be written as follows:

(19)

Then, (19) can be reduced to

Rearranging and separating the term , then combining its coefficients, we have

Here, if we assume

(20)

Then, we have

Therefore, combining the above, we have

(21)

Then, from Inequality (21), we obtain the conclusion, Inequality (16), when . ∎

5.3 Further Estimation

Lemma 5 in the previous subsection gives an upper bound of for any , and . However, this estimate is too rough to yield a useful approximation ratio. A natural question is whether we can obtain a tighter upper bound. For , there exists a more precise estimate that depends on the structure of the network dataset. Given a user , all potential following participants can be found by breadth-first search; here, we assume the edge probability in the network is equal to . Then, we can determine the hop distance of each of these followers from the initiator , and thus the maximum revenue obtainable from user and her following participants. Finally, we choose the largest value among all users as the upper bound of . Given a targeted network and revenue vector , the calculation steps are as follows:

  1. For each user , we find all potential followers by breadth-first search, defined as , where and , , is the set of potential j-hop participants to user . That is, the length of the shortest path from a user in to user is equal to .

  2. For each user , we compute the maximum possible revenue from and her followers, which can be expressed as , where

    (22)
  3. Finally, we select the largest as the upper bound of .
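The three steps above can be sketched in Python. The adjacency-list graph, hop cap `k`, and hop-indexed revenue list `r` are illustrative assumptions (the paper's notation is symbolic), and the sketch takes the edge probability in this estimate to be 1, so every reachable follower counts toward the maximum.

```python
from collections import deque

def hop_sets(graph, u, k):
    """Step 1: BFS from user u; return {j: set of potential j-hop
    participants}, where j is the shortest-path distance from u."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        if dist[v] == k:          # followers beyond k hops earn no revenue
            continue
        for w in graph.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    sets = {j: set() for j in range(1, k + 1)}
    for v, d in dist.items():
        if v != u:
            sets[d].add(v)
    return sets

def upper_bound(graph, r, k):
    """Steps 2-3: maximum over all users of the maximum possible
    revenue from the user (hop 0) and all her potential followers."""
    best = 0
    for u in graph:
        sets = hop_sets(graph, u, k)
        rev = r[0] + sum(r[j] * len(sets[j]) for j in range(1, k + 1))
        best = max(best, rev)
    return best

# Toy example: path a-b-c-d, k = 3, revenues 8/6/4/2 as in Section 6.
g = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(upper_bound(g, [8, 6, 4, 2], 3))  # prints 24 (best initiator: "b" or "c")
```

On a real dataset, `best` would instead be tracked per user to build the estimated vector described below.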

Thus, from the above process, we obtain an estimated vector , which can be computed directly from the graph dataset. Then, we have

Lemma 7.

The adaptive total primal curvature for any , and is upper bounded by in the general case of the RMSB problem, where

(23)

The value of is data-dependent. When is smaller and the degree distribution is more uniform, the upper bound is tighter, which is in line with our intuition.

6 Experiment

In this section, we conduct several experiments on real datasets to validate the correctness and efficiency of our proposed algorithms. We evaluate Algorithms 1 and 2 against several commonly used baseline algorithms.

6.1 Dataset Description and Statistics

Our experiments rely on datasets from networkrepository.com [32], an online network repository. Three datasets are used: (1) Dataset-1: a co-authorship network, where each edge represents a co-authorship between scientists publishing papers in the area of network theory. (2) Dataset-2: a Wiki network, a who-votes-on-whom network collected from Wikipedia. (3) Dataset-3: a collaboration network extracted from the arXiv General Relativity category. The statistics of the three datasets are presented in Table I.

Dataset n m Type Average degree
Dataset-1 0.4K 1.01K undirected 4
Dataset-2 1.0K 3.15K undirected 6
Dataset-3 5.2K 14.5K undirected 5
TABLE I: The statistics of three datasets

6.2 Experimental Settings

As mentioned earlier, our proposed algorithms depend on the following parameters: the hop number (at most k-hop participants follow an initiator), the acceptance vector , the revenue vector , the budget , and the edge probability. We perform two sets of experiments to test Algorithms 1 and 2, which solve the RMSB and RMCB problems, referred to as submodular performance and non-submodular performance. As shown in Lemma 3, the objective function of RMSB and RMCB is adaptive submodular when or for all . Thus, for submodular performance, we set (1) edge probability for all and ; (2) edge probability for all and . In Algorithms 1 and 2, we estimate the conditional expected marginal benefit for each by the simplified method of Section 4.2 in [1]. For non-submodular performance, we set (1) edge probability for all and ; (2) edge probability for all and . We conduct these two experiments on the three datasets described above and analyze the experimental results.


(a) Dataset-1 (b) Dataset-2 (c) Dataset-3

Fig. 2: Submodular performance: performance comparison of the algorithms with different budgets on the three datasets. Parameter settings are for each and . The left column solves the RMSB problem, and the right column solves the RMCB problem.

(a) Dataset-1 (b) Dataset-2 (c) Dataset-3

Fig. 3: Submodular performance: performance comparison of the algorithms with different budgets on the three datasets. Parameter settings are for each and . The left column solves the RMSB problem, and the right column solves the RMCB problem.

(a) Dataset-1 (b) Dataset-2 (c) Dataset-3

Fig. 4: Non-submodular performance: performance comparison of the algorithms solving the RMSB problem with different budgets on the three datasets. Left column: parameter settings are for each and . Right column: parameter settings are for each and .

For both submodular and non-submodular performance, we observe the outcomes of Algorithm 1, Algorithm 2, and several commonly used heuristic algorithms, and compare their performance. The goal is to evaluate the effectiveness of the adaptive greedy policy in the adaptive (non-)submodular cases under the size and community budgets. The revenue vector is set as: (1) when ; (2) when ; (3) when . That is, the revenue of a 0-hop participant is 8 units, a 1-hop participant 6 units, a 2-hop participant 4 units, and a 3-hop participant 2 units. The baseline algorithms are as follows:

  1. MaxDegree: Invite the user with maximum degree at each step within budget .

  2. Random: Invite a user randomly from at each step within budget .

  3. MaxProb: Invite the user with maximum acceptance probability at each step within budget .
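A minimal sketch of the three baselines under the size budget; the adjacency-list graph, the dictionary of acceptance probabilities, and the `run_policy` driver are illustrative assumptions rather than the paper's implementation.

```python
import random

def max_degree(graph, candidates):
    """MaxDegree: pick the candidate with the most neighbors."""
    return max(candidates, key=lambda u: len(graph.get(u, ())))

def random_pick(candidates):
    """Random: pick a candidate uniformly at random."""
    return random.choice(list(candidates))

def max_prob(accept_prob, candidates):
    """MaxProb: pick the candidate with the highest acceptance probability."""
    return max(candidates, key=lambda u: accept_prob[u])

def run_policy(step, users, b):
    """Invite one user per step until the size budget b is exhausted."""
    invited, remaining = [], set(users)
    for _ in range(min(b, len(remaining))):
        u = step(remaining)
        invited.append(u)
        remaining.discard(u)
    return invited
```

For example, `run_policy(lambda c: max_degree(g, c), g, 2)` invites the two highest-degree users of a graph `g`.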

All algorithms are implemented in Python. The simulations are run on a Windows machine with a 3.40GHz, 4-core Intel CPU and 16GB of RAM.

6.3 Experimental Results

In our experiments, the whole datasets are used as the targeted networks. The adaptive algorithms, including our proposed algorithms and the baseline algorithms, are run 50 times on each network, and we take the average as the final result. For the size budget, the number of invitations is simply limited to . However, for the RMCB problem, we need to define the community budget . Here, we adopt the following strategy: supposing the total number of invitations is predefined, the budget for each community is determined as follows. Given and budget vector , we have

(24)

It is possible that does not equal the total number of invitations . In that case, we adjust so that it equals the total number of invitations : add one or subtract one from each community's budget, from the largest community to the smallest, until equality is satisfied. Let us look at a specific example:

Example 1.

Consider the right columns of Fig. 2 and Fig. 3, where "Budget " means the total number of invitations is . Suppose the community structure is , where , , , and ; then we have , , , and . Now , so we let and keep the others unchanged so that . This is because and are the largest communities.
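The allocation-and-adjustment strategy above can be sketched as follows, under the assumption that Equation (24) allocates the budget in proportion to community size, rounded to the nearest integer; the adjustment loop follows the large-to-small rule described in the text.

```python
def community_budgets(community_sizes, total_budget):
    """Allocate the total invitation budget across communities in
    proportion to community size (an assumed reading of Eq. (24)),
    then adjust by +/-1 per community, from the largest community
    to the smallest, until the budgets sum to the total."""
    n = sum(community_sizes)
    budgets = [round(total_budget * s / n) for s in community_sizes]
    # Community indices ordered from largest to smallest.
    order = sorted(range(len(community_sizes)),
                   key=lambda i: community_sizes[i], reverse=True)
    diff = total_budget - sum(budgets)
    i = 0
    while diff != 0:
        j = order[i % len(order)]
        step = 1 if diff > 0 else -1
        if budgets[j] + step >= 0:   # never drive a budget negative
            budgets[j] += step
            diff -= step
        i += 1
    return budgets
```

For instance, `community_budgets([50, 30, 20], 10)` yields a 5/3/2 split with no adjustment needed, while unequal rounding remainders trigger the large-to-small correction.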

For submodular performance, the experimental results achieved by the different algorithms on the three datasets are shown in Fig. 2 and Fig. 3. The left columns of Fig. 2 and Fig. 3 are under the size budget, where the expected total revenues returned by RMSBSolver are larger than those of the other policies. The right columns of Fig. 2 and Fig. 3 are under the community budget, where the performance of RMCBSolver is likewise better than that of the other policies. Especially on Dataset-3, the gap between RMSBSolver and the baseline algorithms is very significant, which may be related to the size of the dataset and the network structure. The performance of the baseline algorithms, such as MaxDegree, is inconsistent: good on Dataset-2 and poor on Dataset-3. Thus, we cannot predict in advance whether a baseline will perform well. An interesting finding is that, intuitively, the total revenue of RMCB should be smaller than that of RMSB, since the constraint of RMCB is stricter and the approximation guarantee of RMSBSolver is better than that of RMCBSolver. However, the results show that their expected total revenues are very close, whether under the adaptive greedy strategy or the heuristic policies; in some cases, the revenue obtained under the community budget is even higher. For example, on Dataset-3, the performance of the MaxDegree algorithm under the community budget is clearly better than that under the size budget. Therefore, considering both the distribution of influence and the average revenue, the community budget is the more sensible choice.

For non-submodular performance, the experimental results achieved by the different algorithms on the three datasets are shown in Fig. 4. The expected total revenues returned by RMSBSolver are larger than those of the other policies. As before, on Dataset-3, the gap between RMSBSolver and the baseline algorithms is very significant. Even though the objective function of RMSB is not adaptive submodular, this has little visible effect on actual performance; in other words, the objective function of RMSB is close to adaptive submodular. According to Lemma 7, the bounds for the three datasets are as follows:

Dataset-1 Dataset-2 Dataset-3
106 310 247
199 1060 820
TABLE II: The bounds of the three datasets

Unfortunately, these results are discouraging. Even though the bound given by Lemma 7 is much smaller than that given by Lemma 5, it is still too large to yield a satisfactory approximation ratio. Thus, a tighter estimation method for the bound is needed in order to obtain a practical and meaningful approximation ratio.

7 Conclusion

In this paper, we propose the RMSB problem based on the Collaborate Game model. To account for the distribution of influence as well as total revenue, we extend RMSB to the RMCB problem by means of a community budget. We propose RMSBSolver and RMCBSolver, based on the adaptive greedy policy, to address the two problems. The objective function of RMSB and RMCB is adaptive monotone but not adaptive submodular in general, although it is adaptive submodular in some special cases. We reduce the community budget of RMCB to a partition matroid, which can be solved with a -approximation in the special submodular cases. For the RMSB problem in the general non-submodular cases, we give a data-dependent -approximation by bounding the adaptive total primal curvature by . The good performance of our algorithms is verified by experiments on three real network datasets. The performance under the community budget is no worse than that under the size budget; thus, the community budget is the better choice. However, the bound is not yet satisfactory, and in future work we aim to tighten it further.

Acknowledgments

This work is partly supported by National Science Foundation under grant 1747818.

References

  • [1] J. Guo and W. Wu, “A k-hop collaborate game model: Adaptive strategy to maximize total revenue,” arXiv preprint arXiv:1910.04125, 2019.
  • [2] P. Domingos and M. Richardson, “Mining the network value of customers,” in Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM, 2001, pp. 57–66.
  • [3] M. Richardson and P. Domingos, “Mining knowledge-sharing sites for viral marketing,” in Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM, 2002, pp. 61–70.
  • [4] D. Kempe, J. Kleinberg, and É. Tardos, “Maximizing the spread of influence through a social network,” in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM, 2003, pp. 137–146.
  • [5] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher, “An analysis of approximations for maximizing submodular set functions—i,” Mathematical programming, vol. 14, no. 1, pp. 265–294, 1978.
  • [6] D. Arthur, R. Motwani, A. Sharma, and Y. Xu, “Pricing strategies for viral marketing on social networks,” in International workshop on internet and network economics.   Springer, 2009, pp. 101–112.
  • [7] W. Lu and L. V. Lakshmanan, “Profit maximization over social networks,” in 2012 IEEE 12th International Conference on Data Mining.   IEEE, 2012, pp. 479–488.
  • [8] J. Tang, X. Tang, and J. Yuan, “Profit maximization for viral marketing in online social networks,” in 2016 IEEE 24th International Conference on Network Protocols (ICNP).   IEEE, 2016, pp. 1–10.
  • [9] H. Zhang, H. Zhang, A. Kuhnle, and M. T. Thai, “Profit maximization for multiple products in online social networks,” in IEEE INFOCOM 2016-The 35th Annual IEEE International Conference on Computer Communications.   IEEE, 2016, pp. 1–9.
  • [10] J. Guo, T. Chen, and W. Wu, “A multi-feature diffusion model: Rumor blocking in social networks,” arXiv preprint arXiv:1912.03481, 2019.
  • [11] ——, “Budgeted coupon advertisement problem: Algorithm and robust analysis,” IEEE Transactions on Network Science and Engineering, pp. 1–1, 2020.
  • [12] F. Zhou, R. J. Jiao, and B. Lei, “Bilevel game-theoretic optimization for product adoption maximization incorporating social network effects,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 46, no. 8, pp. 1047–1060, 2015.
  • [13] Z. Lu, H. Zhou, V. O. Li, and Y. Long, “Pricing game of celebrities in sponsored viral marketing in online social networks with a greedy advertising platform,” in 2016 IEEE International Conference on Communications (ICC).   IEEE, 2016, pp. 1–6.
  • [14] Y. Yang, X. Mao, J. Pei, and X. He, “Continuous influence maximization: What discounts should we offer to social network users?” in Proceedings of the 2016 international conference on management of data.   ACM, 2016, pp. 727–741.
  • [15] A. Ajorlou, A. Jadbabaie, and A. Kakhbod, “Dynamic pricing in social networks: The word-of-mouth effect,” Management Science, vol. 64, no. 2, pp. 971–979, 2016.
  • [16] J. D. Smith, A. Kuhnle, and M. T. Thai, “An approximately optimal bot for non-submodular social reconnaissance,” in Proceedings of the 29th on Hypertext and Social Media.   ACM, 2018, pp. 192–200.
  • [17] N. Buchbinder, M. Feldman, J. Seffi, and R. Schwartz, “A tight linear time (1/2)-approximation for unconstrained submodular maximization,” SIAM Journal on Computing, vol. 44, no. 5, pp. 1384–1402, 2015.
  • [18] B. Liu, X. Li, H. Wang, Q. Fang, J. Dong, and W. Wu, “Profit maximization problem with coupons in social networks,” in International Conference on Algorithmic Applications in Management.   Springer, 2018, pp. 49–61.
  • [19] U. Feige, V. S. Mirrokni, and J. Vondrák, “Maximizing non-monotone submodular functions,” SIAM Journal on Computing, vol. 40, no. 4, pp. 1133–1153, 2011.
  • [20] G. Tong, W. Wu, and D.-Z. Du, “Coupon advertising in online social systems: Algorithms and sampling techniques,” arXiv preprint arXiv:1802.06946, 2018.
  • [21] J. Guo and W. Wu, “A novel scene of viral marketing for complementary products,” IEEE Transactions on Computational Social Systems, vol. 6, no. 4, pp. 797–808, 2019.
  • [22] W. Lu, W. Chen, and L. V. Lakshmanan, “From competition to complementarity: comparative influence diffusion and maximization,” Proceedings of the VLDB Endowment, vol. 9, no. 2, pp. 60–71, 2015.
  • [23] D. Golovin and A. Krause, “Adaptive submodularity: Theory and applications in active learning and stochastic optimization,” Journal of Artificial Intelligence Research, vol. 42, pp. 427–486, 2011.
  • [24] G. Tong, “Adaptive influence maximization under general feedback models,” arXiv preprint arXiv:1902.00192, 2019.
  • [25] A. Gotovos, A. Karbasi, and A. Krause, “Non-monotone adaptive submodular maximization,” in Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
  • [26] V. Gabillon, B. Kveton, Z. Wen, B. Eriksson, and S. Muthukrishnan, “Adaptive submodular maximization in bandit setting,” in Advances in Neural Information Processing Systems, 2013, pp. 2697–2705.
  • [27] A. Fern, R. Goetschalckx, M. Hamidi-Haines, and P. Tadepalli, “Adaptive submodularity with varying query sets: An application to active multi-label learning,” in International Conference on Algorithmic Learning Theory, 2017, pp. 577–592.
  • [28] J. Yuan and S.-J. Tang, “Adaptive discount allocation in social networks,” in Proceedings of the 18th ACM International Symposium on Mobile Ad Hoc Networking and Computing.   ACM, 2017, p. 22.
  • [29] K. Han, K. Huang, X. Xiao, J. Tang, A. Sun, and X. Tang, “Efficient algorithms for adaptive influence maximization,” Proceedings of the VLDB Endowment, vol. 11, no. 9, pp. 1029–1040, 2018.
  • [30] D. Golovin and A. Krause, “Adaptive submodular optimization under matroid constraints,” arXiv preprint arXiv:1101.4450, 2011.
  • [31] Z. Wang, B. Moran, X. Wang, and Q. Pan, “Approximation for maximizing monotone non-decreasing set functions with a greedy method,” Journal of Combinatorial Optimization, vol. 31, no. 1, pp. 29–43, 2016.
  • [32] R. A. Rossi and N. K. Ahmed, “The network data repository with interactive graph analytics and visualization,” in AAAI, 2015. [Online]. Available: http://networkrepository.com