Fast Evolutionary Algorithms for Maximization of Cardinality-Constrained Weakly Submodular Functions

08/03/2019, by Victoria G. Crawford, et al.

We study the monotone, weakly submodular maximization problem (WSM), which is to find a subset of size k from a universe of size n that maximizes a monotone, weakly submodular objective function f. For objectives with submodularity ratio γ, we provide two novel evolutionary algorithms that have an expected approximation guarantee of (1-n^-1)(1-e^-γ-ϵ) for WSM in linear time, improving upon the cubic time complexity of previous evolutionary algorithms for this problem. This improvement is a result of restricting mutations to local changes, a biased random selection of which set to mutate, and an improved theoretical analysis. In the context of several applications of WSM, we demonstrate the ability of our algorithms to quickly exceed the solution of the greedy algorithm and converge faster than existing evolutionary algorithms for WSM.




1 Introduction

Monotone weakly submodular maximization (WSM) is the combinatorial problem of selecting a set of size k from a universe of n elements to maximize a monotone, weakly submodular objective function f defined on subsets of the universe (see Notation and Definitions at the end of Section 1). This NP-hard combinatorial optimization problem arises within many applications, including dictionary selection (Das and Kempe, 2011), data summarization (Mirzasoleiman et al., 2013), viral marketing (Kempe et al., 2003), sensor placement (Krause et al., 2008), and statistical learning theory (Elenberg et al., 2018). Much work on WSM has focused on greedy algorithms, which are particularly effective if the objective is submodular (Nemhauser et al., 1978). Bian et al. (2017) showed that the standard greedy algorithm provides a tight ratio of 1 - e^-γ, where γ is the submodularity ratio of f. Algorithms that can find better-quality solutions in practice than the greedy algorithm are of interest, especially when f is further away from being submodular (i.e. γ is closer to 0) and the efficacy of the greedy algorithm degrades.

Pareto optimization (PO) strategies have been introduced by Friedrich and Neumann (2014) and Qian et al. (2015) for WSM; these evolutionary algorithms work by maintaining a population of sets, composed of the best set encountered (i.e. the one of highest value) of each cardinality from 0 to k. Iteratively, a random set is selected for mutation, in which the membership of every element in the universe is flipped with probability 1/n. If the new set is better than the set in the population with the same cardinality, it is kept and the existing one removed. A more detailed description of PO can be found in Appendix A.1. Both Friedrich and Neumann (2014) and Qian et al. (2015) analyzed slight variants of PO (the former with a submodular objective) and showed that after an expected time (in this work, time is measured in terms of the number of objective function evaluations) of O(n²(log n + k)) and 2ek²n, respectively, there exists a set in the population of PO that has the same worst-case ratio as that of the greedy algorithm. Further, Qian et al. (2015) demonstrated the ability of PO to find significantly better solutions in practice compared to the greedy algorithm on the sparse regression problem. One difficulty with PO as an alternative to the greedy algorithm for WSM is the expected runtime until the approximation guarantee of 1 - e^-γ is achieved: if k = Ω(n), then the expected time is cubic in n, which may not be a practical timeframe on large datasets.


We modify the PO framework to obtain faster convergence, and we develop Biased Localized Pareto Optimization (BLPO, Alg. 1), which yields a solution with expected approximation guarantee (1-n^-1)(1-e^-γ-ϵ) (Theorem 1) for WSM in time linear in n, where ϵ and the selection bias p are input parameters and γ is the submodularity ratio of f. The design of BLPO departs from the classical PO framework by (i) biasing the random selection of which set in the population to mutate in order to more quickly find a good solution, (ii) restricting mutations of sets to the addition or removal of a single element, i.e. a local change, and (iii) a new theoretical analysis that draws inspiration from the analysis of the algorithm Stochastic-Greedy of Mirzasoleiman et al. (2015). In addition, we propose a second algorithm, Targeted Localized Pareto Optimization (TLPO, Alg. 2 in Appendix A), with the same theoretical guarantees as BLPO. TLPO is similar in design to BLPO, but the selection of sets in the population to mutate is further concentrated: exploiting the fact that mutations are local changes, TLPO identifies sets in the population that are no longer profitable to mutate and focuses mutations on sets that may be profitable. In addition, once there are no sets in the population that are profitable to mutate, TLPO restarts the entire population. As demonstrated in Section 3, TLPO often finds better solutions faster than BLPO and PO. We validate BLPO and TLPO in the context of submodular and non-submodular variants of influence maximization and data summarization. Empirically, we find that both BLPO and TLPO exceed the greedy value before a single greedy iteration has elapsed, which is often five to ten times faster than PO. Further, both BLPO and TLPO continue to improve their solutions beyond greedy at a rapid pace. The advantages of BLPO and TLPO over the greedy algorithm are most pronounced on the non-submodular applications.


The rest of the paper is organized as follows. We first introduce some necessary notation and definitions below. Then, in Section 1.1, we describe additional related work beyond that described in the introduction. Next, we describe our algorithms BLPO and TLPO in Section 2 and present our results on their approximation guarantees. Finally, our experimental results are in Section 3.

Notation and Definitions

We introduce some notation and definitions that will be used throughout the paper. Let U be a universe of size n, and let f be a non-negative set function defined on subsets of U. For A, B ⊆ U, we shorten the notation for the marginal gain of adding B to A, f(A ∪ B) - f(A), to Δf(B | A). If B is a singleton set {x}, we will simply write Δf(x | A) to mean Δf({x} | A). The set function f is monotone if for all A ⊆ B ⊆ U, f(A) ≤ f(B). The set function f is γ-weakly submodular for γ ∈ (0, 1] if for all A ⊆ B ⊆ U, Σ_{x ∈ B∖A} Δf(x | A) ≥ γ Δf(B∖A | A). The submodularity ratio γ measures how submodular f is: if γ = 1, then f is submodular, and the closer γ is to 0, the less submodular f is.
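To make the definitions concrete, the following is a small Python sketch (the toy coverage objective below is our illustration, not from the paper) that computes marginal gains Δf(x | A) and an empirical submodularity ratio by brute force over all nested pairs of subsets; since coverage functions are submodular, the computed ratio equals 1.

```python
from itertools import combinations

# Hypothetical toy objective: coverage function f(A) = |union of sets indexed by A|.
SETS = {0: {1, 2}, 1: {2, 3}, 2: {3, 4, 5}}
U = set(SETS)

def f(A):
    covered = set()
    for a in A:
        covered |= SETS[a]
    return len(covered)

def marginal(x, A):
    """Marginal gain Δf(x | A) = f(A ∪ {x}) - f(A)."""
    return f(A | {x}) - f(A)

def empirical_submodularity_ratio(f, U):
    """Smallest ratio Σ_{x∈B∖A} Δf(x|A) / Δf(B∖A|A) over nested pairs A ⊂ B."""
    gamma = 1.0
    subsets = [set(c) for r in range(len(U) + 1) for c in combinations(sorted(U), r)]
    for A in subsets:
        for B in subsets:
            if A < B:  # proper nested pair
                denom = f(A | B) - f(A)
                if denom > 0:
                    num = sum(marginal(x, A) for x in B - A)
                    gamma = min(gamma, num / denom)
    return gamma
```

Brute-force enumeration is exponential in n, so this kind of check is only feasible on tiny universes; it is meant to illustrate the definition rather than to be used at scale.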

1.1 Related Work

In this section, we discuss related work on the monotone weakly submodular maximization (WSM) problem and evolutionary algorithms (EAs). When f is submodular (i.e. γ = 1), Nemhauser and Wolsey (1978) showed the standard greedy algorithm achieves the best possible ratio of 1 - 1/e for WSM in time O(kn) under the value query model. Methods of speeding up the standard greedy algorithm to nearly linear time have since been proposed (Badanidiyuru and Vondrák, 2014; Mirzasoleiman et al., 2015). Badanidiyuru and Vondrák (2014) provided a deterministic (1 - 1/e - ϵ)-approximation in time O((n/ϵ) log(n/ϵ)). A faster, randomized greedy algorithm, Stochastic-Greedy (SG), was developed by Mirzasoleiman et al. (2015) with an expected (1 - 1/e - ϵ)-approximation in time O(n log(1/ϵ)). Although BLPO and TLPO have a very different design than SG, the proof for the ratio of SG inspired our analysis; this will be discussed in more detail in Section 2.1. The advantage of BLPO and TLPO relative to SG is that the solutions of BLPO and TLPO continue to improve the longer they are run, quickly outperforming SG and the standard greedy algorithm provided that improvement is possible. We compare SG to BLPO and TLPO experimentally in Section 3. Recently, attention has been paid to the more general class of weakly submodular functions. Das and Kempe (2011) introduced the submodularity ratio γ and proved that the standard greedy algorithm has a ratio of 1 - e^-γ. To quantify weak submodularity, Das and Kempe (2011) bounded the submodularity ratio for the special case of the subset selection objective; their results were recently extended to bound the submodularity ratio of restricted strongly convex functions (Elenberg et al., 2018). Bian et al. (2017) showed that the standard greedy algorithm has a tight ratio of (1/α)(1 - e^-αγ), where γ is the submodularity ratio and α is a notion of generalized curvature. Subject to a matroid constraint instead of a cardinality constraint, Chen et al. (2018) analyzed the performance of a randomized greedy algorithm for weakly submodular maximization.
The submodularity ratio has also been employed for robust maximization by Bogunovic et al. (2018). Other notions of non-submodularity have been introduced for cardinality-constrained maximization (Feige and Izsak, 2013; Borodin et al., 2014; Horel and Singer, 2016), as well as in the context of multiset selection (Kuhnle et al., 2018; Qian et al., 2018). Evolutionary algorithms (EAs) have been studied for many combinatorial optimization problems (Laumanns et al., 2002; Neumann and Wegener, 2007; Friedrich et al., 2010; Yu et al., 2012), and recently for submodular optimization problems (Friedrich and Neumann, 2014; Qian et al., 2015, 2017; Friedrich et al., 2018). To the best of our knowledge, BLPO and TLPO are the first EAs that have an approximation guarantee in linear time close to that of the greedy algorithm for WSM with monotone, weakly submodular f.
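For reference, here is a minimal Python sketch of the Stochastic-Greedy routine discussed above (the oracle, sample-size choice, and variable names follow the standard published description; this is an illustration, not the paper's implementation):

```python
import math
import random

def stochastic_greedy(f, universe, k, eps=0.1, rng=random):
    """Stochastic-Greedy (Mirzasoleiman et al., 2015), sketched: at each of k
    steps, sample ceil((n/k) * ln(1/eps)) candidates uniformly at random and
    add the one with the largest marginal gain."""
    n = len(universe)
    sample_size = min(n, math.ceil((n / k) * math.log(1 / eps)))
    S = set()
    for _ in range(k):
        candidates = [u for u in universe if u not in S]
        if not candidates:
            break
        sample = rng.sample(candidates, min(sample_size, len(candidates)))
        best = max(sample, key=lambda u: f(S | {u}) - f(S))
        S.add(best)
    return S
```

Each step makes O((n/k) log(1/ϵ)) marginal-gain evaluations, for O(n log(1/ϵ)) evaluations in total, which is where the speedup over standard greedy comes from.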

2 Algorithms

In this section, we introduce our algorithms Biased Localized Pareto Optimization (BLPO, Alg. 1) and Targeted Localized Pareto Optimization (TLPO, Alg. 2 in Appendix A) for the cardinality-constrained monotone weakly submodular maximization problem (WSM). We first introduce BLPO in Section 2.1, and prove in Theorem 1 that BLPO finds a feasible (1-n^-1)(1-e^-γ-ϵ)-approximate solution to the instance of WSM in time linear in n, where ϵ and the selection bias p are input parameters, and γ is the submodularity ratio of f. Next, we introduce TLPO in Section 2.2, which has the same theoretical guarantee as BLPO.

2.1 Biased Localized Pareto Optimization (BLPO)

We now introduce the first of our two algorithms, Biased Localized Pareto Optimization (BLPO, Alg. 1). The algorithm BLPO takes as input a value oracle to f, a cardinality constraint k, a number of iterations T, an approximation accuracy parameter ϵ, and a selection bias parameter p. The basic strategy of BLPO is like that of PO (see Appendix A.1 for a detailed description of PO): BLPO maintains a population P of good sets, one for each distinct cardinality from 0 to k. One can imagine P as being organized into k + 1 bins, with at most one set per bin; therefore we describe a set in P with cardinality i as being contained in bin i. For each possible cardinality i, bin i contains the best set (i.e. the one of highest value) of cardinality i encountered during the execution of the algorithm. However, the design of BLPO has two major differences compared to PO. (i) Localized mutations: in BLPO, a mutation entails only the addition or removal of a single element. In particular, once a bin containing set X is selected for mutation, an unbiased coin is flipped to decide whether the mutation is an addition, in which case we term it a forward mutation, or a removal, termed a backward mutation. The element to be added (removed) is then chosen uniformly from U ∖ X (X). (ii) Biased random selection: when choosing a bin for mutation, BLPO does not choose uniformly randomly. Instead, with probability p BLPO chooses bin i*, and otherwise chooses a uniformly random bin; initially i* = 0, and once a forward mutation has occurred at bin i* a total of ⌈(n/k) ln(1/ϵ)⌉ times, i* increments. Somewhat surprisingly, these two ideas, along with an improved analysis, are enough to reduce the theoretical convergence time of PO from cubic to linear, as we show below. Further, we experimentally observe a large practical speedup in convergence as well in Section 3. We remark that the choice of i* and the analysis of the performance ratio of BLPO are inspired by the algorithm Stochastic-Greedy (SG) of Mirzasoleiman et al. (2015) and its analysis, by the following intuition: suppose i* is fixed, and bin i* is selected for forward mutation ⌈(n/k) ln(1/ϵ)⌉ times. Since the best set of cardinality i* + 1 is maintained in bin i* + 1, this roughly corresponds to the addition of one element in SG. The analysis for BLPO is more complicated than that of SG since the set in bin i* may change during the forward mutations. The full pseudocode for BLPO is given in Alg. 1.
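The loop just described can be sketched in Python as follows. This is our illustrative reading of the text, not the authors' implementation; in particular, the increment threshold ⌈(n/k) ln(1/ϵ)⌉ and the tie-breaking choices are our assumptions.

```python
import math
import random

def blpo(f, universe, k, p=0.5, eps=0.1, iters=10_000, rng=random):
    """Sketch of Biased Localized Pareto Optimization: bins[i] holds the best
    set of cardinality i seen so far; with probability p the biased bin i_star
    is mutated, otherwise a uniformly random bin."""
    universe = list(universe)
    n = len(universe)
    # assumed increment threshold, in analogy to Stochastic-Greedy
    threshold = math.ceil((n / k) * math.log(1 / eps))
    bins = {0: frozenset()}
    i_star, forward_count = 0, 0
    for _ in range(iters):
        # biased bin selection
        if i_star < k and i_star in bins and rng.random() < p:
            X = bins[i_star]
        else:
            X = bins[rng.choice(sorted(bins))]
        if rng.random() < 0.5 and len(X) < n:   # forward mutation (add)
            Y = X | {rng.choice([u for u in universe if u not in X])}
            if len(X) == i_star:
                forward_count += 1
                if forward_count >= threshold and i_star < k:
                    i_star, forward_count = i_star + 1, 0
        elif X:                                  # backward mutation (remove)
            Y = X - {rng.choice(sorted(X))}
        else:
            continue
        # keep Y only if it beats the incumbent of the same cardinality
        incumbent = bins.get(len(Y))
        if len(Y) <= k and (incumbent is None or f(Y) > f(incumbent)):
            bins[len(Y)] = frozenset(Y)
    return max(bins.values(), key=f)
```

Note that exactly one new set is evaluated per iteration, which is why time measured in function evaluations equals the iteration count.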

1:  Input: a value oracle to objective function f; cardinality constraint k; selection bias parameter p; approximation accuracy parameter ϵ; number of iterations T.
2:  Output: X ⊆ U, such that |X| ≤ k.
3:  P ← {∅}
4:  i* ← 0
5:  c ← 0
6:  for t = 1, 2, …, T do
7:     if i* < k then
8:        With probability p, let X be the set in P such that |X| = i*; otherwise, with probability 1 - p, select a set X ∈ P uniformly randomly.
9:     else
10:        Select a set X ∈ P uniformly randomly.
11:     Flip an unbiased coin.
12:     if heads then
13:        X′ ← X ∪ {u}, where u is uniformly randomly selected from U ∖ X.
14:        if |X| = i* then
15:           c ← c + 1
16:           if c ≥ ⌈(n/k) ln(1/ϵ)⌉ then
17:              i* ← i* + 1, c ← 0
18:     else
19:        X′ ← X ∖ {u}, where u is uniformly randomly selected from X.
20:     if ∃ Y ∈ P such that |Y| = |X′| and f(Y) < f(X′) then
21:        P ← (P ∖ {Y}) ∪ {X′}
22:     else if |X′| ≤ k and there is no Y ∈ P such that |Y| = |X′| then
23:        P ← P ∪ {X′}
24:  return argmax{f(X) : X ∈ P, |X| ≤ k}
Algorithm 1 BLPO : Biased Localized Pareto Optimization

We now state the main result for BLPO.

Theorem 1.

Suppose we have an instance of WSM with optimal solution O. Suppose we run BLPO (Algorithm 1) with selection bias p ∈ (0, 1], accuracy parameter ϵ ∈ (0, 1), and number of iterations T = Ω((n/p) ln(1/ϵ)). Then E[f(X)] ≥ (1-n^-1)(1-e^-γ-ϵ) f(O), where X is the set returned at the completion of BLPO, and γ is the submodularity ratio of f.

Because there is exactly one evaluation of f at each iteration, if we measure time in the number of evaluations as is commonly done (e.g. Badanidiyuru and Vondrák (2014)), then BLPO finds a (1-n^-1)(1-e^-γ-ϵ)-approximate solution in time linear in n. Recall that PO has a ratio of 1 - e^-γ in an expected number of iterations that is cubic in n when k = Ω(n). If the selection of sets from P to mutate is unbiased (that is, p = 0), then the number of evaluations required by Theorem 1 becomes O(kn ln(1/ϵ)). Hence, we see that our improved analysis inspired by SG removes a factor of k from the time complexity of PO. In addition, biasing the selection of sets from P to mutate removes another factor of k from the time complexity. The localized mutations result in a constant-factor speedup. We now provide a proof for Theorem 1.

Proof of Theorem 1

First suppose that BLPO is run until i* is incremented to k. Notice that with each iteration of the outer loop on Line 6 in BLPO, a single random set X is generated on Lines 7 to 10, and then a single random element u is generated on Lines 11 to 19. Consider only iterations where an element u was added to X at Line 13. Each such iteration of BLPO can then be identified with a pair (X, u), and an entire run of BLPO can be identified with a sequence of such pairs.

Assuming BLPO is run until i* is incremented to k, consider the probability space of all possible sequences of pairs (i.e. corresponding to runs of BLPO) as described in the previous paragraph. Let 0 ≤ i < k. At the end of the iteration of the outer loop on Line 6 where i* is incremented from i to i + 1, there must exist exactly one set in P of cardinality i + 1. This is because i* could not have incremented from i to i + 1 on Line 17 without a set of size i + 1 having been generated on Line 13 for the first time and added to P at Line 23, and from that point on such a set can only be replaced by another set of the same cardinality on Line 21. Define S_{i+1} to be the random variable taking on the value of this set.

Next, with every possible sequence in the probability space and every 0 ≤ i < k, we will associate a single pair in the sequence, defined as follows. Consider some sequence, and the subsequence of pairs (X, u) with |X| = i that correspond to iterations where i* = i (there must be exactly ⌈(n/k) ln(1/ϵ)⌉ of these, or else i* would not have incremented to i + 1 on Line 17). If there exists a pair (X, u) in the subsequence such that u ∈ O ∖ X, associate with the sequence the last such pair. Otherwise, associate with the sequence the last pair in the subsequence. Define X_i to be the random variable taking on the value X of the associated pair, and U_i the random variable taking on the value u. For every X ⊆ U, define the event A_X to be the event that X was the set in the associated pair. Similarly, for every u ∈ U, define the event B_u to be the event that u was the element in the associated pair. Notice that for every X and u, A_X and B_u are independent events because u was chosen uniformly randomly from U ∖ X on Line 13. We will now prove the following lemma, which will be used to prove Theorem 1.

Lemma 1.

For any 0 ≤ i < k,

E[f(S_{i+1})] ≥ (1 - (1-ϵ)γ/k) E[f(S_i)] + ((1-ϵ)γ/k) f(O).


Consider the pair (X_i, U_i) associated with any possible sequence in the probability space and this i. By definition of this pair, the element U_i was generated on Lines 7 to 19 when i* = i. From this point on, there exists a set in P of cardinality i + 1 with value at least f(X_i ∪ {U_i}). Since i* is incremented to i + 1 after this point, we have that

E[f(S_{i+1})] ≥ E[f(X_i ∪ {U_i})].   (1)

In addition, it can be seen that

E[f(X_i ∪ {U_i})] ≥ E[f(X_i)] + Σ_X Pr(A_X) Σ_{u ∈ O∖X} Pr(B_u | A_X) Δf(u | X) = E[f(X_i)] + Σ_X Pr(A_X) Σ_{u ∈ O∖X} Pr(B_u) Δf(u | X),   (2)

where the last line follows because of the independence of A_X and B_u. Consider Pr(⋃_{u ∈ O∖X} B_u): the complement of this event is the event that, over the ⌈(n/k) ln(1/ϵ)⌉ times we select an element to add while the iterator i* is at i, we never randomly choose an element of O ∖ X, which has probability bounded by

(1 - |O∖X|/n)^⌈(n/k) ln(1/ϵ)⌉ ≤ e^{-(|O∖X|/k) ln(1/ϵ)}.

Hence Pr(⋃_{u ∈ O∖X} B_u) ≥ 1 - e^{-(|O∖X|/k) ln(1/ϵ)} ≥ (1-ϵ)|O∖X|/k, using that 1 - e^{-cx} ≥ (1 - e^{-c})x for x ∈ [0, 1]. In addition, Pr(B_u) = Pr(B_{u′}) for all u, u′ ∈ O ∖ X by symmetry. Therefore, using Inequality 2,

E[f(X_i ∪ {U_i})] ≥ E[f(X_i)] + Σ_X Pr(A_X) ((1-ϵ)/k) Σ_{u ∈ O∖X} Δf(u | X)
 ≥ E[f(X_i)] + Σ_X Pr(A_X) ((1-ϵ)γ/k) Δf(O∖X | X)
 ≥ E[f(X_i)] + ((1-ϵ)γ/k) E[f(O) - f(X_i)],   (3)

where the last inequality follows from the monotonicity and the submodularity ratio of f. Combining Inequalities 1 and 3 gives us that

E[f(S_{i+1})] ≥ E[f(X_i)] + ((1-ϵ)γ/k) E[f(O) - f(X_i)].

Next, notice that for any i, E[f(X_i)] ≥ E[f(S_i)]: this is because any set of size i we have added to P while i* = i must have value at least as high as the set in P of size i when i* is set to i. This completes the proof of Lemma 1. ∎

One may apply Lemma 1 inductively to see that

E[f(S_k)] ≥ (1 - (1 - (1-ϵ)γ/k)^k) f(O) ≥ (1 - e^{-(1-ϵ)γ}) f(O) ≥ (1 - e^-γ - ϵ) f(O).   (4)

So far, we have assumed that BLPO has run until i* = k. The following lemma gives the probability of this occurring given the lower bound on T.

Lemma 2.

At the completion of BLPO, i* = k with probability at least 1 - n^-1.


Consider the random variable Z arising from the following binomial experiment with T trials: every iteration of the outer loop on Line 6 of BLPO is a trial, and “success” in an iteration is defined to occur when bin i* is selected on Line 8 and the coin toss at Line 11 yields heads, i.e. when the counter c is incremented on Line 15. The probability of success at each iteration is at least p/2. Therefore E[Z] ≥ pT/2. We will use Chernoff’s bound, stated as follows.

Lemma 3 (Chernoff’s bound).

Suppose Z_1, …, Z_T are independent random variables taking values in {0, 1}. Let Z denote their sum and let μ = E[Z] denote the sum’s expected value. Then for any 0 < δ < 1,

Pr(Z ≤ (1 - δ)μ) ≤ e^{-δ²μ/2}.

Then by Chernoff’s bound,

Pr(Z ≤ (1 - δ)μ) ≤ e^{-δ²μ/2} ≤ n^-1,

since μ ≥ pT/2 and T = Ω((n/p) ln(1/ϵ)) is chosen large enough that δ²μ/2 ≥ ln n. Furthermore, (1 - δ)μ ≥ k⌈(n/k) ln(1/ϵ)⌉, which is a sufficient number of forward selections from bin i* to ensure that i* = k at the completion of BLPO. This completes the proof of Lemma 2. ∎

Finally, the statement of Theorem 1 follows from Equation 4, Lemma 2, and the law of total probability. ∎

2.2 Targeted Localized Pareto Optimization (TLPO)

We now introduce the second of our two algorithms, Targeted Localized Pareto Optimization (TLPO, Alg. 2 in Appendix A). TLPO has the same basic design as BLPO, with the major difference being the random bin selection process in each iteration. We include the full pseudocode for TLPO in Appendix A. Recall that BLPO chooses a specific bin i* (which increases from 0 to k over the duration of BLPO) with probability p, and otherwise chooses a bin uniformly randomly to mutate. Consider the following situation: suppose that bin i has been selected enough times so that all elements in the universe have been tried for addition to and removal from the set in bin i, and suppose that the set within bin i has not changed during this time. Then, no improvement can be gained by further selection of bin i for mutation until the set within bin i changes. Motivated by this intuition, TLPO follows an alternative procedure compared to BLPO designed to avoid selecting bins for which mutation would not be as fruitful. For each bin i, TLPO keeps track of how many forward and backward mutations, a_i and b_i respectively, have been attempted from bin i; these are reset to 0 whenever the set in bin i changes. If both numbers exceed ⌈(n/k) ln(1/ϵ)⌉, this bin is labeled converged; otherwise, it is unconverged. We use ⌈(n/k) ln(1/ϵ)⌉ as the threshold in rough analogy to SG (Mirzasoleiman et al., 2015): after ⌈(n/k) ln(1/ϵ)⌉ add (remove) selections from bin i, one has conducted a process analogous to an SG add (remove) step from bin i. Then the bin selection procedure of TLPO goes as follows: select bin i* with probability p as BLPO does, but if bin i* is not selected then TLPO selects the unconverged bin of lowest index. If no unconverged bin exists, clear all bins of their sets except for bin 0 (which always contains ∅) and remember only the best set within budget seen so far. TLPO has the same theoretical guarantees as BLPO. The proof of this is nearly identical to the proof of Theorem 1 in Section 2.1, and so we do not include it here.
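The targeted bin-selection and restart logic just described can be sketched as follows. This is our illustrative reading of the text, not the authors' implementation; the convergence threshold, the handling of the biased bin, and the bookkeeping details are our assumptions.

```python
import math
import random

def tlpo(f, universe, k, p=0.5, eps=0.1, evals=10_000, rng=random):
    """Sketch of Targeted Localized Pareto Optimization: non-biased picks go
    to the lowest-index unconverged bin, and the whole population restarts
    once every bin has converged."""
    universe = list(universe)
    n = len(universe)
    threshold = math.ceil((n / k) * math.log(1 / eps))
    bins, fwd, bwd = {0: frozenset()}, {0: 0}, {0: 0}
    best = frozenset()
    i_star, i_count = 0, 0
    for _ in range(evals):
        converged = {i for i in bins if fwd[i] >= threshold and bwd[i] >= threshold}
        if set(bins) <= converged:              # everything converged: restart
            bins, fwd, bwd = {0: frozenset()}, {0: 0}, {0: 0}
            i_star, i_count = 0, 0
            continue
        if i_star < k and i_star in bins and rng.random() < p:
            i = i_star                           # biased pick, as in BLPO
        else:
            i = min(j for j in bins if j not in converged)
        X = bins[i]
        forward = rng.random() < 0.5 if 0 < len(X) < n else len(X) == 0
        if forward:
            fwd[i] += 1
            Y = X | {rng.choice([u for u in universe if u not in X])}
            if i == i_star:
                i_count += 1
                if i_count >= threshold and i_star < k:
                    i_star, i_count = i_star + 1, 0
        else:
            bwd[i] += 1
            Y = X - {rng.choice(sorted(X))}
        j = len(Y)
        if j <= k and (j not in bins or f(Y) > f(bins[j])):
            bins[j] = frozenset(Y)
            fwd[j], bwd[j] = 0, 0               # counters reset when a bin changes
            if f(Y) > f(best):                  # remember best feasible set
                best = frozenset(Y)
    return best
```

Because the best feasible set is remembered across restarts, clearing the population costs nothing in solution quality while allowing the search to escape stagnation.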

3 Applications and Experiments

In this section, we validate BLPO and TLPO on instances of influence maximization (IM) (Kempe et al., 2003) and data summarization (DS) (Mirzasoleiman et al., 2015), with both submodular and non-submodular objectives f. Our results show that BLPO and TLPO find solutions better than that of the standard greedy algorithm much faster than PO does, frequently in a small fraction of the time. In addition, TLPO typically finds better solutions than those that are ever found by PO, even after 50 greedy iterations (a greedy iteration is n function evaluations). Below, we provide a description of the IM and DS applications, and then our experimental results on those applications.

Algorithms Evaluated

Our algorithms BLPO and TLPO are implemented as specified in the pseudocode in Algorithms 1 and 2, respectively. We compare with PO as described in Appendix A.1. In addition, we compare with Stochastic-Greedy (SG) of Mirzasoleiman et al. (2015). To allow SG to improve its solution over time, we run SG repeatedly and keep track of the best solution found so far. The same accuracy parameter ϵ is used for BLPO, TLPO, and SG, and the same selection bias p is used for BLPO and TLPO.

Figure 1: In all plots the y-axis is normalized by the standard greedy value on the instance, and the x-axis is normalized by the number of evaluations required by the standard greedy algorithm. The dataset and objective are indicated for each subfigure: (a)–(b) Facebook and (c) Orkut, IM with the submodular IC model; (d)–(e) GrQc, IM with the non-submodular FT model; (f)–(g) DS with the submodular k-medoid objective; (h)–(i) DS with the weakly submodular DPP objective. SG is dropped from plots for which it did not exceed the minimum value on the y-axis.
Influence Maximization (IM)

Given a social network, represented as a weighted graph G = (V, E), it is desired to select a seed set of maximum influence; for example, in the context of viral marketing. Content cascades through the network via sharing between users; let f(S) denote the expected size of a cascade if the content is originally shared by the users in set S. The influence maximization (IM) problem is to choose the seed set of users of size at most k that maximizes f. For the function f to be well-defined, a diffusion model must be specified. We employ two popular diffusion models in our evaluation: 1) the Independent Cascade (IC) model, which yields a submodular objective f (Kempe et al., 2003), and 2) the Fixed Threshold (FT) model, which yields an objective f that is not submodular; in fact, IM with f is hard to approximate in this case (Lu et al., 2012). A description of these diffusion models and our experimental setup is provided in Appendix B. Topologies of social networks from SNAP (Leskovec and Krevl, 2014) were used. Results on the submodular objective are shown in Figs. 1(a)–1(c), on instances of Facebook (n = 4039) and Orkut (n = 3072441). Results for the non-submodular objective are shown in Figs. 1(d)–1(e), on an instance of GrQc (n = 5242).

Data Summarization (DS)

In data summarization (DS), we have a set U of data points and we wish to find a subset of U of cardinality at most k that best summarizes the entire dataset U. The objective f takes a subset A ⊆ U to a measure of how effectively A summarizes U. In our experiments, U is a set of color images from the CIFAR-100 dataset (Krizhevsky and Hinton, 2009), each represented by a 3072-dimensional vector of pixels. For the objective f, we use: (i) a monotone, submodular objective based on k-medoid (Kaufman and Rousseeuw, 2009) that emphasizes minimizing the average Euclidean distance between elements of U and their closest representative in the summary, and (ii) a monotone, weakly submodular objective based on a Determinantal Point Process (DPP) (Kulesza et al., 2012), which emphasizes diversity within the summary. The experimental results for the submodular k-medoid objective are shown in Figs. 1(f) and 1(g), and the results for the weakly submodular DPP objective are shown in Figs. 1(h) and 1(i). More details about both the k-medoid and DPP objectives, as well as our experimental setup, can be found in Appendix B.2.

Summary of Experimental Results

Our experimental results can be found in Figures 1(a) to 1(i). All results shown are the mean of repeated runs of each algorithm; shaded regions represent one standard deviation from the mean. In overview, we observed the following results: for the submodular objectives, standard greedy performs near optimally in practice, although our algorithms were able to outperform it by a small margin (Figs. 1(b), 1(f)); however, on the non-submodular objectives, we were able to find substantial improvements on the greedy solution on nearly every instance (see Figs. 1(e), 1(i)). Both BLPO and TLPO demonstrated much faster convergence than PO; see Fig. 1(e) and the same experiment depicted during its early iterations in Fig. 1(d), where both BLPO and TLPO exceed the greedy value within a few greedy iterations, whereas PO requires many more greedy iterations to make the same improvement. On some of the instances, TLPO finds a clearly better solution in the long run than BLPO or PO; see Figs. 1(b), 1(e), 1(f). In Figure 1(b) we were able to verify that TLPO was converging to the optimal solution value by using an IP solver, while both BLPO and PO appear to get stuck at a smaller improvement over greedy.


Appendix A Supplementary Information: Algorithms

In this section, we include supplementary material to Section 2. In particular, pseudocode for Targeted Localized Pareto Optimization (TLPO) is given in Algorithm 2, and a description of the Pareto Optimization algorithm (PO) of Friedrich and Neumann [2014], Qian et al. [2015] is provided in Section A.1.

1:  Input: f, oracle to objective function; k, cardinality constraint; p, the selection bias; ϵ, accuracy parameter; T, budget of evaluations. Let θ = ⌈(n/k) ln(1/ϵ)⌉.
2:  Output: X ⊆ U, such that |X| ≤ k.
3:  P ← {∅}, X* ← ∅
4:  i* ← 0, c ← 0
5:  a_i ← 0, and b_i ← 0 for all 0 ≤ i ≤ k
6:  while number of evaluations is less than T do
7:     if i* < k then
8:        With probability p, select bin i*; otherwise, with probability 1 - p, select the smallest bin i such that a_i < θ or b_i < θ
9:     else
10:        Select the smallest bin i such that a_i < θ or b_i < θ
11:     X ← element of P with cardinality i
12:     Flip an unbiased coin, unless a_i ≥ θ or b_i ≥ θ. If a_i ≥ θ, always tails; if b_i ≥ θ, always heads
13:     if heads then
14:        a_i ← a_i + 1
15:        X′ ← X ∪ {u}, where u is uniformly randomly selected from U ∖ X
16:        if |X| = i* then
17:           c ← c + 1
18:           if c ≥ θ then
19:              i* ← i* + 1, c ← 0
20:     else
21:        b_i ← b_i + 1
22:        X′ ← X ∖ {u}, where u is uniformly randomly selected from X
23:     if a_j ≥ θ and b_j ≥ θ for every bin j with a set in P then
24:        X* ← argmax{f(Y) : Y ∈ {X*} ∪ P, |Y| ≤ k}
25:        P ← {∅}, i* ← 0, c ← 0, and a_j ← 0, b_j ← 0 for all j
26:        continue with next iteration of the while loop
27:     if there is no set of size |X′| in P and |X′| ≤ k then
28:        P ← P ∪ {X′}, and a_{|X′|} ← 0, b_{|X′|} ← 0
29:     else if ∃ Y ∈ P such that |Y| = |X′| and f(Y) < f(X′) then
30:        P ← (P ∖ {Y}) ∪ {X′}, and a_{|X′|} ← 0, b_{|X′|} ← 0
31:  return argmax{f(Y) : Y ∈ {X*} ∪ P, |Y| ≤ k}
Algorithm 2 TLPO : Targeted Localized Pareto Optimization

A.1 Description of Pareto Optimization (PO)

1:  Input: f, oracle to objective function; k, cardinality constraint
2:  Output: X ⊆ U, such that |X| ≤ k.
3:  P ← {∅}
4:  while number of evaluations is less than T do
5:     Select X ∈ P uniformly at random.
6:     X′ ← X
7:     for u ∈ U do
8:        if u ∈ X′ then
9:           With probability 1/n, X′ ← X′ ∖ {u}.
10:        else
11:           With probability 1/n, X′ ← X′ ∪ {u}.
12:     if ∃ Y ∈ P such that |Y| = |X′| then
13:        if f(Y) < f(X′) then
14:           P ← (P ∖ {Y}) ∪ {X′}
15:     else
16:        if |X′| ≤ k then
17:           P ← P ∪ {X′}
18:  return argmax{f(X) : X ∈ P, |X| ≤ k}
Algorithm 3 PO : Pareto Optimization

In this section, we give a description of Pareto Optimization (PO), the algorithm analyzed for submodular maximization in [Qian et al., 2015]. The algorithm analyzed by [Friedrich and Neumann, 2014] is similar, except that it starts with a random solution in its population instead of ∅. The full pseudocode is given in Alg. 3. We summarize the differences compared to BLPO: (i) non-local mutations: to mutate a set X, the membership of each u ∈ U is randomly flipped independently with probability 1/n (for loop on line 7). Therefore, multiple elements can be added to or removed from X in a single mutation. In contrast, in BLPO and TLPO a mutation always entails a single element added or a single element removed. (ii) uniformly random solution selection: when PO chooses an element of P to mutate, it chooses from P uniformly randomly (line 5), i.e. there is no biased index i* in PO as in BLPO and TLPO.
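The mutation and population-update steps above can be sketched as follows (an illustrative sketch of the described procedure, not the original implementation; the evaluation budget `evals` is a parameter we introduce):

```python
import random

def pareto_optimization(f, universe, k, evals=10_000, rng=random):
    """Sketch of Pareto Optimization (PO): uniform selection from the
    population and a global bit-flip mutation, as described above."""
    universe = list(universe)
    n = len(universe)
    pop = [frozenset()]               # at most one set per cardinality
    for _ in range(evals):            # one objective evaluation per iteration
        X = rng.choice(pop)
        # flip membership of every element independently with probability 1/n
        Y = set(X)
        for u in universe:
            if rng.random() < 1.0 / n:
                Y ^= {u}
        Y = frozenset(Y)
        if len(Y) > k:
            continue
        same = [Z for Z in pop if len(Z) == len(Y)]
        if not same:
            pop.append(Y)
        elif f(Y) > f(same[0]):
            pop[pop.index(same[0])] = Y
    return max(pop, key=f)
```

Since each element flips with probability 1/n, the expected number of changed elements per mutation is 1, but multi-element jumps occur regularly, which is the key structural difference from the localized mutations of BLPO and TLPO.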

Appendix B Supplementary Information: Experiments

In this section, we include supplementary discussion to Section 3 in the paper. This includes application and experimental setup details for the Influence Maximization (IM) experiments (Section B.1) and for the Data Summarization (DS) experiments (Section B.2).

B.1 Influence Maximization (IM)

We present additional information about cascade models for IM in Section B.1.1 and then additional experimental setup details for the IM experiments in Section B.1.2.

B.1.1 IM Cascade Models

The first diffusion model we consider is the Independent Cascade model (IC) [Kempe et al., 2003]. IC lets each edge e of the social network be present in a realization graph with independent probability proportional to the weight of e; then, the size of the cascade of a set S is the size of the reachable set of S in the realization. Under this model, we denote the expected cascade size as f(S); notably, f is submodular [Kempe et al., 2003]. For a given seed set S, it is #P-hard to compute f(S) [Chen et al., 2010]; however, an efficient sampling method to approximate f has been developed by Borgs et al. [2014], namely Reverse Influence Sampling (RIS). RIS results in a submodular sampled function regardless of how many samples are used; we employ this submodular sampled function in our evaluation. For each dataset, one function is sampled; this function is used for all experiments on this dataset. The second diffusion model we consider is the Fixed Threshold model (FT). Under this model, each user v has a fixed threshold t_v and is activated if the sum of the edge weights of its activated neighbors exceeds t_v. We choose the thresholds and edge weights uniformly at random, but they are fixed for the remainder of the experiment; that is, all algorithms use the same oracle. This yields a deterministic activation function f, which is not submodular; in fact, IM with f is hard to approximate in this case [Lu et al., 2012].
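The two diffusion models can be illustrated with the following toy Python sketch (the graph, edge weights, and thresholds below are illustrative choices of ours, not the paper's datasets):

```python
import random

def ic_cascade_size(edges, seed_set, prob, rng):
    """One Independent Cascade realization: keep each directed edge with
    probability prob, then count nodes reachable from the seed set."""
    live = {(u, v) for (u, v) in edges if rng.random() < prob}
    reached, frontier = set(seed_set), list(seed_set)
    while frontier:
        u = frontier.pop()
        for (a, b) in live:
            if a == u and b not in reached:
                reached.add(b)
                frontier.append(b)
    return len(reached)

def ft_cascade_size(weights, thresholds, seed_set):
    """Fixed Threshold model: a node activates once the summed weights of its
    activated in-neighbors exceed its threshold; deterministic given inputs."""
    active = set(seed_set)
    changed = True
    while changed:
        changed = False
        for v, t in thresholds.items():
            if v not in active:
                incoming = sum(w for (u, x), w in weights.items()
                               if x == v and u in active)
                if incoming > t:
                    active.add(v)
                    changed = True
    return len(active)
```

In practice the IC objective is estimated by averaging many such realizations (or via RIS, as described above), while the FT objective needs only a single deterministic propagation.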

B.1.2 IM Datasets

Topology      |V|      |E|        Edge prob.  RIS samples
ca-GrQc       5242     14496      n/a         n/a
ego-Facebook  4039     88234      0.01        500
ca-AstroPh    18772    198119     0.01        50000
com-Orkut     3072441  117185083  0.001       100000
Table 1: Datasets from SNAP [Leskovec and Krevl, 2014]

For the IM experiments in Section 3, we used social network topologies collected by SNAP [Leskovec and Krevl, 2014], as described in Table 1. We used the IC model with constant edge probabilities; that is, each edge exists independently with the constant probability listed in Table 1. The number of RIS samples is also indicated in Table 1. The reverse influence sets were sampled once and re-used for all algorithms; thus, each algorithm used the same submodular, approximate oracle.

B.2 Data Summarization (DS)

In this section, we include additional information about the Data Summarization (DS) objectives in Section B.2.1.

B.2.1 DS Objectives

We use two measures of summary effectiveness as the objective f for our experiments in Section 3. The first objective is based on the k-medoid problem [Kaufman and Rousseeuw, 2009], which is to minimize the average distance between elements in U and their closest representative in the chosen data points. For these experiments, we define for any A ⊆ U

f(A) = L({e₀}) - L(A ∪ {e₀}),  where L(A) = (1/|U|) Σ_{u ∈ U} min_{a ∈ A} ‖u - a‖,

e₀ is an auxiliary reference point, and ‖·‖ is the Euclidean norm. In this case, f is monotone submodular. We refer to this as the k-medoid objective in Section 3. The second objective that we use for our experiments in Section 3 involves Determinantal Point Processes (DPPs) [Kulesza et al., 2012]. Suppose that we order the universe U = {u₁, …, uₙ}. Define L to be the Gramian matrix of the vectors in U under the Gaussian kernel, i.e.

L_{ij} = exp(-‖uᵢ - uⱼ‖² / (2σ²)).

Then for a subset A of images define f(A) = log det(L_A), where L_A is the submatrix of L indexed by A. In this case, it has been proven that f is monotone weakly submodular; if λ₁, …, λₙ are the eigenvalues of L, then the submodularity ratio of f is lower bounded by a quantity depending on the largest and smallest of these eigenvalues [Bian et al., 2017].
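Both objectives can be illustrated with the following self-contained Python sketch (the random data, reference point E0, and kernel width sigma are our toy choices, not the paper's CIFAR-100 setup):

```python
import math
import random

# Toy dataset: 20 points in R^5 (illustrative, not the paper's images).
rng = random.Random(0)
DIM = 5
U = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(20)]
E0 = [0.0] * DIM                     # auxiliary reference point for k-medoid

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def avg_min_dist(reps):
    """L(A): average Euclidean distance from each point to its closest rep."""
    return sum(min(dist(u, r) for r in reps) for u in U) / len(U)

def f_kmedoid(A):
    """f(A) = L({e0}) - L(A ∪ {e0}); monotone submodular."""
    return avg_min_dist([E0]) - avg_min_dist([E0] + [U[i] for i in A])

def f_dpp(A, sigma=2.0):
    """log-det of the Gaussian-kernel Gram submatrix indexed by A,
    computed here by LU-style elimination for the toy example."""
    pts = [U[i] for i in A]
    m = len(pts)
    L = [[math.exp(-dist(p, q) ** 2 / (2 * sigma ** 2)) for q in pts] for p in pts]
    logdet = 0.0
    for i in range(m):               # pivots are positive for a PD kernel
        pivot = L[i][i]
        logdet += math.log(pivot)
        for r in range(i + 1, m):
            factor = L[r][i] / pivot
            for c in range(i, m):
                L[r][c] -= factor * L[i][c]
    return logdet
```

The k-medoid objective rewards summaries whose members sit close to the rest of the data, while the log-det objective rewards summaries whose members are mutually dissimilar, so the two emphasize complementary notions of a good summary.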