Sequential decision making plays a crucial role in machine learning. In various scenarios, we must design an effective policy that repeatedly decides the next action to be taken by using the feedback obtained so far. The greedy policy is a simple but empirically effective approach to sequential decision making. At each step, it myopically makes a decision that seems the most beneficial among feasible choices.
Adaptive submodularity (Golovin & Krause, 2011) is a well-established framework for analyzing greedy algorithms for sequential decision making. It extends submodularity, a diminishing returns property of set functions, to the setting of adaptive decision making. This framework has successfully provided theoretical guarantees for greedy algorithms for active learning (Golovin et al., 2010), recommendation (Gabillon et al., 2013), and touch-based localization in robotics (Javdani et al., 2014).
However, adaptive submodularity is not omnipotent. While the greedy policy works well for various sequential decision making problems, many of these problems lack adaptive submodularity. In fact, even if an objective function is submodular in the non-adaptive setting, its adaptive version is not always adaptive submodular. Adaptive influence maximization is one such example. In this problem, a decision maker aims to spread information about a product by selecting several advertisements; she repeatedly alternates between selecting an advertisement and observing its effect. The objective function of this problem is known to be adaptive submodular in the independent cascade model (Golovin & Krause, 2011), but not in a more general diffusion model called the triggering model (Kempe et al., 2003), which is extensively studied as an important class of diffusion models (Leskovec et al., 2007; Tang et al., 2014). Note that this objective function is submodular in the non-adaptive setting, yet fails to be adaptive submodular in the adaptive setting. Other problems lacking adaptive submodularity appear in many applications such as feature selection and active learning. An analysis framework that goes beyond adaptive submodularity is therefore needed.
[Table 1: Summary of our results for each problem: bounds on the adaptive submodularity ratio, approximation guarantees of the adaptive greedy algorithm, and bounds on the adaptivity gaps.]
In the non-adaptive setting, the submodularity ratio (Das & Kempe, 2011) is a prevalent tool for handling non-submodular functions (Khanna et al., 2017; Elenberg et al., 2017). Intuitively, it is a parameter of monotone set functions that measures their distance from submodular functions. An adaptive variant of the submodularity ratio would be a promising approach to handling functions that lack adaptive submodularity, but how to define it is quite non-trivial since there is a large discrepancy between the non-adaptive and adaptive settings, as exemplified above. In particular, a successful adaptive version of the submodularity ratio must meet two requirements: it must yield an approximation guarantee for the greedy policy, and it must be bounded in important applications such as adaptive influence maximization and adaptive feature selection. Previous works (Kusner, 2014; Yong et al., 2017) tried to define similar notions, but none of them meet both requirements.
We propose an analysis framework, adaptive submodularity ratio, that meets the aforementioned requirements. An advantage of our proposal is that it has the potential to yield various theoretical results as in Table 1. Below we summarize our main contributions.
We propose the definition of the adaptive submodularity ratio and, by using it, we prove an approximation guarantee of the adaptive greedy algorithm.
We give a bound on the adaptivity gap, which represents the superiority of adaptive policies over non-adaptive policies, through the lens of the adaptive submodularity ratio. (The adaptivity gap is a different concept from adaptive complexity (Balkanski & Singer, 2018).)
We provide lower bounds on the adaptive submodularity ratio for two important applications: adaptive influence maximization on bipartite graphs in the triggering model and adaptive feature selection. For the former, we show that our bound is tight.
Experiments confirm that the greedy policy performs well for the considered applications.
The rest of this paper is organized as follows. Section 2 provides the basic concepts and definitions. In Section 3, we formally define the adaptive submodularity ratio, which is the key concept of this study. In Sections 4 and 5, we provide bounds on the approximation ratio of the adaptive greedy algorithm and on adaptivity gaps, respectively, by using the adaptive submodularity ratio. In Sections 6 and 7, we apply the frameworks developed in Sections 4 and 5 to two applications: adaptive influence maximization and adaptive feature selection. In Section 8, we experimentally check the performance of the adaptive greedy algorithm in several applications. In Section 9 we review related work.
Adaptive Stochastic Optimization.
Adaptive stochastic optimization is a general framework for handling problems of sequentially selecting elements, where we can observe the states of only the selected elements. Let $E$ be the ground set consisting of a finite number of elements. Suppose every element is assigned to some state in $O$, the set of all possible states. We let $\phi: E \to O$ be a map, called a realization, that associates each element $e \in E$ with a state $\phi(e) \in O$. We consider the Bayesian setting where $\phi$ is generated from a known prior distribution $p$. Let $\Phi$ be a random variable representing the randomness of the realization.
A decision maker can select one element at each step. After selecting $e \in E$, she can observe the state $\Phi(e)$ of $e$. She repeatedly selects an element and then observes its state. The important point is that she can utilize the information about the states observed so far when selecting the next element. We denote by $\psi \subseteq E \times O$ the partial realization observed so far, where $\mathrm{dom}(\psi)$ is the set of selected elements. The decision maker's strategy can be described as a policy tree, or simply a policy, $\pi$. A policy is a decision tree that determines the element to be selected next. Formally, a policy $\pi$ is a partial map that returns an element $\pi(\psi) \in E$ to be selected next given the partial realization $\psi$ observed so far.
The goal of the decision maker is to maximize the expected value of the objective function $f: 2^E \times O^E \to \mathbb{R}_{\ge 0}$. The objective function value depends on the set of selected elements and the states of all elements. At the beginning, she does not know $\phi$, but she can get partial information on $\phi$ by observing the state $\phi(e)$ of each selected element $e$. In parallel, she must select elements to construct a set $X \subseteq E$ that has high utility $f(X, \phi)$ under the realization $\phi$. Let $E(\pi, \phi)$ be the set selected by policy $\pi$ under realization $\phi$. The expected value achieved by policy $\pi$ is
$f_{\mathrm{avg}}(\pi) := \mathbb{E}[f(E(\pi, \Phi), \Phi)]$,
where the expectation is taken with regard to the random variable $\Phi$ generated from $p$.
Adaptive Submodularity and Adaptive Monotonicity.
Adaptive submodularity, which is an adaptive extension of submodularity, is a diminishing returns property of the expected marginal gain. The expected marginal gain of $e \in E$ when $\psi$ has been observed so far is defined as
$\Delta(e \mid \psi) := \mathbb{E}[f(\mathrm{dom}(\psi) \cup \{e\}, \Phi) - f(\mathrm{dom}(\psi), \Phi) \mid \Phi \sim \psi]$,
where $\Phi \sim \psi$ means that $\Phi$ is consistent with $\psi$, i.e., $\Phi(e) = o$ for every $(e, o) \in \psi$; equivalently, $\Phi$ is generated from the posterior distribution conditioned on $\psi$. Given the current partial realization $\psi$, the expected marginal gain $\Delta(e \mid \psi)$ represents the expected increase in the objective value yielded by selecting $e$. Adaptive submodularity is defined as follows:
Definition 1 (Adaptive submodularity (Golovin & Krause, 2011)).
Let $f: 2^E \times O^E \to \mathbb{R}_{\ge 0}$ be a set function and $p$ a distribution of $\Phi$. We say $f$ is adaptive submodular with respect to $p$ if for any partial realizations $\psi \subseteq \psi'$ and any element $e \in E \setminus \mathrm{dom}(\psi')$, it holds that
$\Delta(e \mid \psi) \ge \Delta(e \mid \psi')$.
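For concreteness, when the prior is given explicitly as a small finite list of realizations, the expected marginal gain $\Delta(e \mid \psi)$ can be computed by brute force. The following Python sketch (the data layout and names are our own, for illustration only) conditions the prior on the observed partial realization and averages the gain of adding $e$:

```python
def expected_marginal_gain(f, prior, psi, e):
    """Brute-force Delta(e | psi) for a small, explicit prior.

    f     : callable f(X, phi) -> float, X a frozenset, phi a dict
    prior : list of (phi, prob) pairs, phi a dict element -> state
    psi   : partial realization, dict element -> observed state
    """
    dom = frozenset(psi)
    # Keep only realizations consistent with psi (Phi ~ psi).
    consistent = [(phi, pr) for phi, pr in prior
                  if all(phi[x] == s for x, s in psi.items())]
    total = sum(pr for _, pr in consistent)
    # Average the gain of adding e under the posterior on realizations.
    return sum(pr / total * (f(dom | {e}, phi) - f(dom, phi))
               for phi, pr in consistent)
```

Enumerating all realizations is exponential in general; this is only meant to make the conditioning in $\Delta(e \mid \psi)$ concrete.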
The monotonicity can also be extended to the adaptive setting as follows:
Definition 2 (Adaptive monotonicity (Golovin & Krause, 2011)).
Let $f: 2^E \times O^E \to \mathbb{R}_{\ge 0}$ be a set function and $p$ a distribution of $\Phi$. We say $f$ is adaptive monotone with respect to $p$ if for any partial realization $\psi$ with $\mathbb{P}[\Phi \sim \psi] > 0$ and any element $e \in E$, it holds that
$\Delta(e \mid \psi) \ge 0$.
Other Notations for Adaptive Stochastic Optimization.
The expected marginal gain of a policy $\pi$ with partial realization $\psi$ is defined as
$\Delta(\pi \mid \psi) := \mathbb{E}[f(\mathrm{dom}(\psi) \cup E(\pi, \Phi), \Phi) - f(\mathrm{dom}(\psi), \Phi) \mid \Phi \sim \psi]$.
Similarly, the expected marginal gain of a set $S \subseteq E$ with partial realization $\psi$ is defined as
$\Delta(S \mid \psi) := \mathbb{E}[f(\mathrm{dom}(\psi) \cup S, \Phi) - f(\mathrm{dom}(\psi), \Phi) \mid \Phi \sim \psi]$.
Let $\Pi_k$ be the set of all policies whose heights do not exceed $k$.
Submodularity Ratio and Supermodularity Ratio.
The submodularity ratio of a monotone non-negative set function $f: 2^E \to \mathbb{R}_{\ge 0}$ with respect to a set $U \subseteq E$ and a parameter $k \ge 1$ is defined to be
$\gamma_{U,k}(f) := \min_{L, S} \frac{\sum_{e \in S} (f(L \cup \{e\}) - f(L))}{f(L \cup S) - f(L)}$,
where the minimum is taken over $L \subseteq U$ and $S \subseteq E \setminus L$ with $|S| \le k$. If the numerator and denominator are both $0$, the ratio is considered to be 1. We have $\gamma_{U,k}(f) \in [0, 1]$, and a monotone set function $f$ is submodular if and only if $\gamma_{U,k}(f) = 1$ for every $U$ and $k$.
As an opposite concept of the submodularity ratio, the supermodularity ratio was considered in Bogunovic et al. (2018); it is defined as
$\check{\gamma}_{U,k}(f) := \min_{L, S} \frac{f(L \cup S) - f(L)}{\sum_{e \in S} (f(L \cup \{e\}) - f(L))}$,
where the minimum is taken over $L \subseteq U$ and $S \subseteq E \setminus L$ with $|S| \le k$, and we regard $0/0 = 1$. We have $\check{\gamma}_{U,k}(f) \in [0, 1]$, and $f$ is supermodular if and only if $\check{\gamma}_{U,k}(f) = 1$ for every $U$ and $k$. We omit $f$ from $\gamma_{U,k}(f)$ and $\check{\gamma}_{U,k}(f)$ if it is clear from the context.
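On small ground sets, both ratios can be evaluated by brute force directly from their definitions. The following Python sketch (our own illustration, not from the paper) enumerates all pairs $(L, S)$ and skips $0/0$ pairs, which count as ratio 1:

```python
from itertools import chain, combinations

def all_subsets(xs, max_size=None):
    """All subsets of xs, optionally up to a maximum size."""
    sizes = range((max_size if max_size is not None else len(xs)) + 1)
    return chain.from_iterable(combinations(sorted(xs), r) for r in sizes)

def sub_and_supermodularity_ratios(f, ground, U, k):
    """Brute-force gamma_{U,k} and check-gamma_{U,k} for a monotone
    set function f on a small ground set."""
    gamma, sup_gamma = 1.0, 1.0
    for L in all_subsets(U):
        L = frozenset(L)
        rest = [e for e in ground if e not in L]
        for S in all_subsets(rest, k):
            if not S:
                continue
            singles = sum(f(L | {e}) - f(L) for e in S)  # sum of marginals
            joint = f(L | frozenset(S)) - f(L)           # joint gain
            if joint > 0:
                gamma = min(gamma, singles / joint)
            if singles > 0:
                sup_gamma = min(sup_gamma, joint / singles)
    return gamma, sup_gamma
```

For a submodular coverage function the first ratio is 1, while for a supermodular function such as $|X|^2$ the second ratio is 1 and the first drops below 1.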
3 Adaptive Submodularity Ratio
In this section, we provide a precise definition of the adaptive submodularity ratio, which extends the submodularity ratio from the non-adaptive setting to the adaptive setting. We need to define it carefully so that it can yield an approximation guarantee of the greedy policy. An important point is to generalize the subset $S$ of size at most $k$, used to define the submodularity ratio, to a policy $\pi$ of height at most $k$.
Definition 3 (Adaptive submodularity ratio).
Suppose that $f$ is adaptive monotone w.r.t. a distribution $p$. The adaptive submodularity ratio of $f$ and $p$ with respect to a partial realization $\psi$ and a parameter $k \ge 1$ is defined to be
$\gamma_{\psi,k}(f, p) := \min_{\psi' \supseteq \psi,\ \pi \in \Pi_k} \frac{\sum_{e \in E} \mathbb{P}[e \in E(\pi, \Phi) \mid \Phi \sim \psi'] \, \Delta(e \mid \psi')}{\Delta(\pi \mid \psi')}$.
We omit $f$ and $p$ if they are clear from the context. We also define $\gamma_{\ell,k} := \min_{\psi : |\mathrm{dom}(\psi)| \le \ell} \gamma_{\psi,k}$.
Intuitively, the adaptive submodularity ratio indicates the distance between $f$ and the class of adaptive submodular functions. As in the non-adaptive setting, $\gamma_{\psi,k} = 1$ implies the adaptive submodularity of $f$, which can formally be written as follows:
Proposition 1. It holds that $\gamma_{\psi,k}(f, p) = 1$ for any partial realization $\psi$ and any $k \ge 1$ if and only if $f$ is adaptive submodular with respect to $p$.
The proof is given in Appendix A.
4 Adaptive Greedy Algorithm
In this section, we present a new approximation ratio guarantee for the adaptive greedy algorithm based on the adaptive submodularity ratio. Thanks to this result, once the adaptive submodularity ratio is bounded, we can obtain approximation guarantees of the adaptive greedy algorithm for various applications. The adaptive greedy algorithm starts with an empty set and repeatedly selects the element with the largest expected marginal gain. The detailed description is given in Algorithm 1. Golovin & Krause (2011) have shown that this algorithm achieves $(1 - 1/e)$-approximation to the expected objective value of an optimal policy if $f$ is adaptive submodular w.r.t. $p$. Here we extend their result and show that the adaptive greedy algorithm achieves $(1 - \exp(-\gamma_{\ell,k}\ell/k))$-approximation, where $\ell$ is the number of selected elements. More precisely, we can bound the approximation ratio relative to any policy of height $k$ as follows:
Theorem 1. Suppose $f$ is adaptive monotone with respect to $p$. Let $\pi^{\mathrm{g}}_\ell$ be a policy representing the adaptive greedy algorithm until the $\ell$-th step. Then, for any policy $\pi \in \Pi_k$, it holds that
$f_{\mathrm{avg}}(\pi^{\mathrm{g}}_\ell) \ge \left(1 - \exp\!\left(-\frac{\gamma_{\ell,k}\,\ell}{k}\right)\right) f_{\mathrm{avg}}(\pi)$,
where $\gamma_{\ell,k}$ is the adaptive submodularity ratio of $f$ w.r.t. $p$.
We provide the proof in Appendix B.
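The adaptive greedy policy itself is short. The following Python sketch runs one greedy trajectory; the estimation of the expected marginal gain and the state observation are passed in as callables, an interface of our own choosing for illustration:

```python
def adaptive_greedy(elements, budget, marginal_gain, observe):
    """Run one trajectory of the adaptive greedy policy.

    marginal_gain(e, psi): estimate of Delta(e | psi)
    observe(e):            reveals the state of the selected element
    Returns the partial realization psi built over `budget` steps.
    """
    psi = {}
    for _ in range(budget):
        candidates = [e for e in elements if e not in psi]
        if not candidates:
            break
        # Select the element with the largest expected marginal gain.
        best = max(candidates, key=lambda e: marginal_gain(e, psi))
        psi[best] = observe(best)
    return psi
```

In practice, `marginal_gain` would recompute the posterior from the growing partial realization `psi`; here it is left abstract.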
5 Non-adaptive Policies and Adaptivity Gaps
We show that the adaptive submodularity ratio is also useful for theoretically comparing the performance of adaptive and non-adaptive policies. More precisely, we present a bound on the adaptivity gap, which represents the performance gap between adaptive and non-adaptive policies, by using the adaptive submodularity ratio. The adaptivity gap is defined as follows:
Definition 4 (Adaptivity gaps).
The adaptivity gap of an objective function $f$ and a probability distribution $p$ of $\Phi$ is defined as the ratio between the values achieved by an optimal adaptive policy and an optimal non-adaptive policy, i.e.,
$\mathrm{GAP}_k(f, p) := \frac{\max_{\pi \in \Pi_k} f_{\mathrm{avg}}(\pi)}{\max_{S \subseteq E :\, |S| \le k} f_{\mathrm{avg}}(S)}$,
where $k$ is the height of the adaptive and non-adaptive policies and $f_{\mathrm{avg}}(S) := \mathbb{E}[f(S, \Phi)]$.
Theorem 2. Let $f$ be an objective function and $p$ a probability distribution of $\Phi$. Let $\gamma_{k,k}$ be the adaptive submodularity ratio of $f$ w.r.t. $p$, and let $\check{\gamma}$ be the supermodularity ratio of the set function $f_{\mathrm{avg}}(S) = \mathbb{E}[f(S, \Phi)]$ of non-adaptive policies. We have
$\mathrm{GAP}_k(f, p) \le \frac{1}{\gamma_{k,k}\,\check{\gamma}}$.
Therefore, given any non-adaptive $\beta$-approximation algorithm, we can evaluate its performance relative to an optimal adaptive policy as follows:
Let $S$ be a non-adaptive solution that achieves $\beta$-approximation to an optimal non-adaptive policy. Let $\gamma_{k,k}$ be the adaptive submodularity ratio of $f$ w.r.t. $p$, and let $\check{\gamma}$ be the supermodularity ratio of the non-adaptive objective function $f_{\mathrm{avg}}$. Let $\pi^*$ be an optimal adaptive policy in $\Pi_k$. We have
$f_{\mathrm{avg}}(S) \ge \beta\,\gamma_{k,k}\,\check{\gamma}\,f_{\mathrm{avg}}(\pi^*)$.
Proofs are given in Appendix C.
6 Adaptive Influence Maximization
In this section, we consider adaptive influence maximization on bipartite graphs. We provide a bound on the adaptive submodularity ratio in the case of the triggering model, and we show that this result is tight. We also present bounds on the adaptivity gaps in the case of the independent cascade and linear threshold models by using the adaptive submodularity ratio.
Let $G = (V, U; A)$ be a directed bipartite graph with source vertices $V$, sink vertices $U$, and directed edges $A \subseteq V \times U$. In the bipartite influence model (Alon et al., 2012), this graph represents the relationship between advertisements $V$ and customers $U$. We consider the problem of selecting several advertisements so as to influence the customers as much as possible. Here, each edge is determined to be live or dead according to a certain distribution, and influence can spread only through live edges. Given vertex weights $w: U \to \mathbb{R}_{\ge 0}$, the objective function to be maximized is
$f(X, \phi) := \sum_{u \in \bigcup_{v \in X} R_\phi(v)} w(u)$,
where, for each $v \in V$, $R_\phi(v)$ represents the set of vertices that are reachable from $v$ by going through only live edges. In the adaptive version of influence maximization, at each step, we select a vertex $v \in V$ and observe the states of all outgoing edges $(v, u) \in A$, while, in the non-adaptive setting, we select $X \subseteq V$ before observing the states of any edges.
We consider a general diffusion model called the triggering model (Kempe et al., 2003), which includes various important models such as the independent cascade model and the linear threshold model as special cases. In the triggering model, each vertex is associated with some known probability distribution over the power set of incoming edges. According to this distribution, a subset of incoming live edges is determined. A vertex gets activated if and only if it is reachable from some selected vertex (or seed vertex) through only live edges. We aim to maximize the total weight of activated vertices by appropriately selecting seed vertices. Note that this objective function is submodular in the non-adaptive setting.
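To make the triggering model on a bipartite graph concrete, the following Python sketch samples one realization of edge states and computes the resulting spread. The representation of the per-vertex trigger distributions (an explicit list of subset/probability pairs) is our own assumption for illustration:

```python
import random

def sample_live_edges(trigger_dist, rng):
    """Sample one realization of edge states in the triggering model.

    trigger_dist maps each sink vertex v to a list of (subset, prob)
    pairs: v's own distribution over subsets of its incoming edges.
    """
    live = set()
    for v, dist in trigger_dist.items():
        subsets, probs = zip(*dist)
        live |= set(rng.choices(subsets, probs)[0])  # draw one subset
    return live

def spread(seeds, live_edges, weight):
    """Total weight of sinks activated by the seed set; on a bipartite
    graph a sink is activated iff a live edge connects it to a seed."""
    activated = {v for (u, v) in live_edges if u in seeds}
    return sum(weight[v] for v in activated)
```

Averaging `spread` over many sampled realizations gives a Monte Carlo estimate of the expected influence of a seed set.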
For later use, we explain the linear threshold model, a special case of the triggering model. In this model, the probability distribution on the incoming edges of each vertex is restricted so that each vertex has at most one live edge in any realization. In other words, there exists $q: A \to [0, 1]$ such that, for each $u \in U$, we have $\sum_{a \in A_u} q_a \le 1$, where $A_u$ is the full set of edges pointing to $u$, and each $a \in A_u$ is alive with probability $q_a$ exclusively over $A_u$ (i.e., at most one edge in $A_u$ is alive). In contrast to the linear threshold model, the triggering model accepts any distribution over the power set of $A_u$.
6.1 Bound of Adaptive Submodularity Ratio
Theorem 3. Let $G$ be an arbitrary directed bipartite graph and $w$ be any weight function. For any $k \ge 1$ and partial realization $\psi$, the adaptive submodularity ratio of the objective function $f$ and the distribution $p$ of adaptive influence maximization in the triggering model is lower-bounded as
$\gamma_{\psi,k}(f, p) \ge \frac{1}{2}$.
Proof sketch of Theorem 3.
Since the objective function and the probability distribution of edge states can be decomposed into those defined for each sink vertex, it is sufficient to consider the case where $U$ consists of a single sink $u$.
Our goal is to prove
$\sum_{v \in V} \mathbb{P}[v \in E(\pi, \Phi) \mid \Phi \sim \psi] \, \Delta(v \mid \psi) \ge \frac{1}{2}\,\Delta(\pi \mid \psi)$
for any observation $\psi$ and policy $\pi \in \Pi_k$. By duplicating each $v$ that appears multiple times in the policy tree $\pi$, we can write the above inequality as
$\sum_{v} \mathbb{P}[\Phi \sim \psi_v \mid \Phi \sim \psi] \, \Delta(v \mid \psi) \ge \frac{1}{2}\,\Delta(\pi \mid \psi)$,
where $\psi_v$ is the observation just before $v$ is selected. We decompose the policy tree into the path wherein $u$ remains inactive and the rest, and prove the inequality for each part separately. ∎
We can see that the above bound is tight even for the linear threshold model by considering the following example.
Let $G = (V, \{u\}; A)$ be a bipartite directed graph with sources $V = \{v_1, \dots, v_n\}$, a single sink $u$, and $A = \{(v_i, u) : i = 1, \dots, n\}$. Let $w$ be the vertex weight such that $w(u) = 1$. We consider the linear threshold model in which an edge selected out of $A$ uniformly at random is alive and the other edges are dead. We consider a simple policy $\pi$ that selects the vertices $v_1, v_2, \dots$ one by one until $u$ is activated. This graph and policy are illustrated in Figure 1. Since $\pi$ finally activates $u$, the expected gain of $\pi$ is $\Delta(\pi \mid \emptyset) = 1$. The probability that $\pi$ selects vertex $v_i$ is $(n - i + 1)/n$. The expected marginal gain of each $v_i$ is $\Delta(v_i \mid \emptyset) = 1/n$. The adaptive submodularity ratio can be upper-bounded as
$\gamma_{\emptyset,n} \le \sum_{i=1}^{n} \frac{n - i + 1}{n} \cdot \frac{1}{n} = \frac{n + 1}{2n} \to \frac{1}{2} \quad (n \to \infty)$.
Hence the lower bound in Theorem 3 is tight.
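Under our reading of the example ($n$ sources pointing to one unit-weight sink, exactly one uniformly chosen incoming edge alive, and a policy that selects sources until the sink activates), the quantity appearing in the numerator of the ratio can be checked by Monte Carlo simulation:

```python
import random

def greedy_numerator_over_gain(n, trials, seed=0):
    """Estimate sum_i P[v_i selected] * Delta(v_i | {}) for the policy
    that selects v_1, v_2, ... until the sink activates. Exactly one of
    the n edges (uniform) is live and the sink has weight 1, so
    Delta(v_i | {}) = 1/n and Delta(pi | {}) = 1; the returned value is
    therefore also the ratio against the policy's expected gain."""
    rng = random.Random(seed)
    selected = [0] * n
    for _ in range(trials):
        live = rng.randrange(n)          # index of the unique live edge
        for i in range(n):
            selected[i] += 1
            if i == live:                # sink activated; policy stops
                break
    return sum(c / trials * (1.0 / n) for c in selected)
```

Since the policy selects $(n+1)/2$ sources in expectation, the estimate concentrates around $(n+1)/(2n)$, which approaches $1/2$ as $n$ grows (e.g., about $0.625$ for $n = 4$).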
The assumption that $G$ is bipartite, considered in Theorem 3, may seem excessively strong, but it is actually vital. We show that, if $G$ is not bipartite, the adaptive submodularity ratio can be arbitrarily small; in fact, such an example can be constructed with the linear threshold model on a very simple graph. We describe the details in Section D.3.
6.2 Bound of Adaptivity Gap
Next we provide a bound on the adaptivity gaps of bipartite influence maximization problems by using the adaptive submodularity ratio. First we consider the independent cascade model. Since adaptive submodularity holds for the independent cascade model (Golovin & Krause, 2011), the adaptive submodularity ratio of its objective function is $1$ by Proposition 1. In addition, by using a bound on the curvature (Maehara et al., 2017) and an inequality between the supermodularity ratio and the curvature (Bogunovic et al., 2018), we obtain a lower bound on the supermodularity ratio $\check{\gamma}$ of $f_{\mathrm{avg}}$ in terms of an upper bound on the probability that each edge is alive and the largest degree of a vertex in $G$. From Theorem 2, we obtain the following result.
Let $f$ be the objective function and $p$ the probability distribution of bipartite influence maximization in the independent cascade model. We have
$\mathrm{GAP}_k(f, p) \le \frac{1}{\check{\gamma}}$,
where $\check{\gamma}$ is the supermodularity ratio of $f_{\mathrm{avg}}$ bounded as above.
We can derive a similar bound for the linear threshold model. Since the expected objective function $f_{\mathrm{avg}}$ is a linear function of the seed set (each sink has at most one live incoming edge), its supermodularity ratio is $\check{\gamma} = 1$. As a special case of Theorem 3, we have $\gamma_{k,k} \ge 1/2$. Combining these bounds with Theorem 2, we obtain the following result.
Let $f$ be the objective function and $p$ the probability distribution of bipartite influence maximization in the linear threshold model. We have
$\mathrm{GAP}_k(f, p) \le 2$.
7 Adaptive Feature Selection
In this section, we consider an adaptive variant of feature selection for sparse regression. All proofs related to this section are presented in Appendix E.
Let us consider the following scenario. A learner has all feature vectors in advance, but they are not accurate due to sensing noise. Here each sensor corresponds to a single feature vector. The learner can obtain accurate feature vectors by replacing inaccurate sensors with high-quality sensors, but the number of high-quality sensors is limited to $k$. The learner thus selects $k$ features for which to observe accurate feature vectors.
We formalize this scenario as the following problem. At the beginning, a learner knows a response vector $y \in \mathbb{R}^d$ and a prior distribution over the features, but does not know the features themselves. Namely, we regard the inaccurate feature vectors obtained with noisy sensors as prior distributions on accurate feature vectors. A random variable $\Phi$ indicates the uncertainty over the observed feature vectors. From the noisy sensors, we can know only a prior distribution $p$ of $\Phi$ but not the true realization $\phi$. Let $E = \{1, \dots, n\}$ be the set of features. At each step, the learner can query a feature $e \in E$ and observe its feature vector $\phi(e) \in \mathbb{R}^d$. We assume the noise of the sensors is independent across sensors; i.e., there exists a distribution $p_e$ for each $e \in E$ and we can factorize $p$ as $p(\phi) = \prod_{e \in E} p_e(\phi(e))$.
Let $A_\phi = (\phi(1), \dots, \phi(n)) \in \mathbb{R}^{d \times n}$ be the realized feature matrix under realization $\phi$. The objective function to be maximized is defined as
$f(X, \phi) := \|y\|_2^2 - \min_{w \in \mathbb{R}^n :\, \mathrm{supp}(w) \subseteq X} \|y - A_\phi w\|_2^2$.
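Assuming a squared-error-reduction objective of the kind studied by Das & Kempe (2011) — the paper's exact normalization may differ — the value $f(X, \phi)$ for a given realization reduces to one least-squares solve on the selected columns. A minimal sketch:

```python
import numpy as np

def squared_error_reduction(X, A, y):
    """f(X, phi): reduction in squared error from regressing y on the
    realized feature vectors A[:, X] (a sketch of the R^2-type
    objective; only an illustration of the form above)."""
    if not X:
        return 0.0
    cols = A[:, sorted(X)]
    # Least-squares fit restricted to the selected features.
    w, *_ = np.linalg.lstsq(cols, y, rcond=None)
    residual = y - cols @ w
    return float(y @ y - residual @ residual)
```

In the adaptive problem, the learner maximizes the expectation of this quantity over the posterior of the unobserved feature vectors.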
7.1 Bound of Adaptive Submodularity Ratio
To bound the adaptive submodularity ratio of adaptive feature selection, we give a general lower bound of the adaptive submodularity ratio by using (non-adaptive) submodularity ratios of all realizations.
Let $f$ be adaptive monotone w.r.t. a distribution $p$. Assume the value of $f(X, \phi)$ depends only on the states of the selected elements, i.e., $f(X, \phi) = f(X, \phi')$ for all $\phi$ and $\phi'$ such that $\phi(e) = \phi'(e)$ for all $e \in X$. We also assume $p$ can be factorized into distributions of the states of each element, i.e., $p(\phi) = \prod_{e \in E} p_e(\phi(e))$. Let $\gamma^{\phi}_{U,k}$ be the submodularity ratio of $f(\cdot, \phi)$ for each realization $\phi$. For any distribution $p$ of this form, the adaptive submodularity ratio can be bounded as
$\gamma_{\psi,k}(f, p) \ge \min_{\phi} \gamma^{\phi}_{\mathrm{dom}(\psi),k}$.
Assume each column of $A_\phi$ is normalized. For any $\ell, k$ and any distribution $p_e$ of each feature vector, the adaptive submodularity ratio of adaptive feature selection can be bounded as
$\gamma_{\ell,k}(f, p) \ge \min_{\phi} \lambda_{\min}(A_\phi^{\top} A_\phi,\ \ell + k)$,
where $\lambda_{\min}(C, s)$ represents the smallest $s$-sparse eigenvalue of $C$.
7.2 Bound of Adaptivity Gap
We can also obtain a bound on the adaptivity gap of adaptive feature selection as follows:
Suppose that $p$ can be factorized as $p(\phi) = \prod_{e \in E} p_e(\phi(e))$. Then the adaptivity gap of adaptive feature selection can be bounded by combining Theorem 2 with the above lower bound on the adaptive submodularity ratio and the supermodularity ratio of $f_{\mathrm{avg}}$.
8 Experiments
We conduct experiments on two applications: adaptive influence maximization and adaptive feature selection. For each setting, we conduct multiple trials and plot their mean values.
8.1 Adaptive Influence Maximization
We conduct experiments on two datasets for adaptive influence maximization. The first dataset is a synthetic bipartite graph generated randomly according to the Erdős–Rényi rule. We set the numbers of source and sink vertices to 10000 each, and, for each pair $(v, u) \in V \times U$, we add an edge between $v$ and $u$ with a fixed probability. The second dataset is the Yahoo! Search Marketing Advertiser–Phrase Bipartite Graph (Yah, ), a bipartite graph representing relationships between advertisers and search phrases. For both datasets, the weight of each vertex in $U$ is drawn from a uniform distribution.
We consider two diffusion models. The first is the linear threshold model, where the probability that each edge is alive is set to the reciprocal of the degree of its sink vertex, that is, $q_{(v,u)} = 1/\deg(u)$ for each $(v, u) \in A$. As the second diffusion model, we consider an extended version of the linear threshold model, which is also a special case of the triggering model. In this model, for each sink vertex $u$, the subset of incoming live edges is determined as follows. We sample $r$ edges with replacement from $A_u$ uniformly at random, and an edge turns alive if it is sampled at least once. In our experiments, the parameter $r$ is set to a fixed constant.
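The sampling scheme of the extended model can be sketched in a few lines of Python (the data layout is our own assumption; `r` denotes the number of with-replacement draws described above):

```python
import random

def sample_live_edges_extended_lt(in_edges, r, rng):
    """Extended linear threshold sampling: for each sink, draw r of its
    incoming edges uniformly with replacement; an edge is live iff it
    is sampled at least once. With r = 1 this recovers a linear
    threshold model with uniform edge probabilities."""
    live = set()
    for v, edges in in_edges.items():
        if edges:
            live |= {rng.choice(edges) for _ in range(r)}
    return live
```

Each sink can end up with several live incoming edges when $r > 1$, which is exactly what takes this model outside the linear threshold class while staying inside the triggering model.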
We compare the adaptive greedy algorithm with three non-adaptive benchmarks. The first benchmark, called Non-adaptive, is the standard greedy algorithm (Nemhauser et al., 1978) for maximizing the expected value $f_{\mathrm{avg}}(S)$ of the objective function. The second benchmark is Degree, which selects the $k$ vertices with the largest degrees. The third benchmark is Random, which selects a random subset of size $k$.
8.2 Adaptive Feature Selection
We use synthetic datasets generated randomly as follows. First we determine the mean feature matrix according to a uniform distribution, after which each column is normalized to have zero mean and unit standard deviation. We obtain each realized feature matrix by adding noise drawn from a uniform distribution to the mean matrix, and we consider two noise-level settings. We select a random sparse subset of features and let $y$ be the response vector generated from those features, where each element of the additive noise is drawn from the standard normal distribution.
We compare the adaptive greedy algorithm with two benchmarks. The first benchmark is the non-adaptive greedy algorithm. For both the adaptive and non-adaptive greedy algorithms, it is hard to evaluate the exact values of the objective functions, so we evaluate them approximately by sampling realizations randomly according to the posterior distributions. The second benchmark is the noise-oblivious greedy algorithm, a non-adaptive algorithm that greedily selects a subset based on the mean feature matrix.
9 Related Work
Comparison with (Kusner, 2014).
To our knowledge, the first attempt to generalize the submodularity ratio to the adaptive setting is (Kusner, 2014). They defined approximate adaptive submodularity, a notion similar to ours, by comparing the expected marginal gains of single elements before and after further observations. The key difference is that they did not replace the subset $S$ with a policy $\pi$. In Appendix F, we show that approximate adaptive submodularity is not sufficient for providing an approximation guarantee of the adaptive greedy algorithm.
Comparison with (Yong et al., 2017).
Another attempt to relax adaptive submodularity is presented in (Yong et al., 2017). They introduced $\alpha$-weakly adaptive submodular functions as follows:
Definition 5 ($\alpha$-weak adaptive submodularity).
Let $f$ be a set function and $p$ a distribution of $\Phi$. For any $\alpha \ge 1$, we say $f$ is $\alpha$-weakly adaptive submodular with respect to $p$ if for any partial realizations $\psi \subseteq \psi'$ and any element $e \in E \setminus \mathrm{dom}(\psi')$, it holds that $\Delta(e \mid \psi) \le \alpha\,\Delta(e \mid \psi')$. Let $\alpha^*$ be the infimum of $\alpha$ satisfying the above inequality.
Analogous to our adaptive submodularity ratio, one can readily see that $1$-weak adaptive submodularity is equivalent to adaptive submodularity. In general, however, there is a difference between the two notions; the adaptive submodularity ratio can be bounded from below by $1/\alpha^*$, implying that it is more demanding to bound the value of $\alpha^*$ than to bound the adaptive submodularity ratio.
For any set function $f$ and distribution $p$, we have $\gamma_{\ell,k}(f, p) \ge 1/\alpha^*$.
We provide a proof in Section G.1. Yong et al. (2017) studied a problem called group-based active diagnosis and gave a bound on $\alpha^*$, but some vital assumptions seem to have been missed. In Section G.2, we provide a problem instance in which their bound does not hold. In Sections G.3 and G.4, we also present instances of adaptive influence maximization and adaptive feature selection for which our framework provides strictly better approximation ratios than those obtained with weak adaptive submodularity.
Adaptive submodularity was proposed by Golovin & Krause (2011). There are several attempts to adaptively maximize set functions that do not satisfy adaptive submodularity (e.g., (Kusner, 2014; Yong et al., 2017)). Chen et al. (2015) analyzed the greedy policy focusing on the maximization of mutual information, which does not have adaptive submodularity.
Submodularity ratio was proposed by Das & Kempe (2011) for sparse regression with squared loss. Recently, Elenberg et al. (2018) extended this result to more general loss functions with restricted strong convexity and restricted smoothness. Bogunovic et al. (2018) proposed the notion of supermodularity ratio. Bian et al. (2017) provided a guarantee of the non-adaptive greedy algorithm for the case where the total curvature and submodularity ratio of objective functions are bounded.
Influence maximization was proposed by Kempe et al. (2003). An adaptive version of influence maximization was first considered by Golovin & Krause (2011). They showed that this objective function satisfies adaptive submodularity under the independent cascade model in general graphs. Influence maximization on a bipartite graph has been studied for applications to advertisement selection (Alon et al., 2012; Soma et al., 2014). This problem setting was extended to the adaptive setting by Hatano et al. (2016), but only the independent cascade model was considered. The curvature of its objective function was studied by Maehara et al. (2017).
Kale et al. (2017) considered the problem called adaptive feature selection, but their problem setting is different from ours. In their setting, the learner solves feature selection problems multiple times. They studied the adaptivity among the multiple rounds, while we studied the adaptivity inside of a single round.
K.F. was supported by JSPS KAKENHI Grant Number JP 18J12405.
- Yahoo! webscope dataset: G1 - Yahoo! Search Marketing Advertiser-Phrase Bipartite Graph, Version 1.0. URL https://webscope.sandbox.yahoo.com/.
- Alon et al. (2012) Alon, N., Gamzu, I., and Tennenholtz, M. Optimizing budget allocation among channels and influencers. In Proceedings of the 21st World Wide Web Conference 2012, WWW 2012, pp. 381–388, 2012.
- Balkanski & Singer (2018) Balkanski, E. and Singer, Y. The adaptive complexity of maximizing a submodular function. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, pp. 1138–1151, 2018.
- Bian et al. (2017) Bian, A. A., Buhmann, J. M., Krause, A., and Tschiatschek, S. Guarantees for greedy maximization of non-submodular functions with applications. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, pp. 498–507, 2017.
- Bogunovic et al. (2018) Bogunovic, I., Zhao, J., and Cevher, V. Robust maximization of non-submodular objectives. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics, AISTATS 2018, pp. 890–899, 2018.
- Chen et al. (2015) Chen, Y., Hassani, S. H., Karbasi, A., and Krause, A. Sequential information maximization: When is greedy near-optimal? In Proceedings of The 28th Conference on Learning Theory, COLT 2015, pp. 338–363, 2015.
- Das & Kempe (2011) Das, A. and Kempe, D. Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pp. 1057–1064, 2011.
- Elenberg et al. (2017) Elenberg, E. R., Dimakis, A. G., Feldman, M., and Karbasi, A. Streaming weak submodularity: Interpreting neural networks on the fly. In Advances in Neural Information Processing Systems 30, pp. 4047–4057, 2017.
- Elenberg et al. (2018) Elenberg, E. R., Khanna, R., Dimakis, A. G., and Negahban, S. Restricted strong convexity implies weak submodularity. Ann. Statist., 46(6B):3539–3568, 2018.
- Gabillon et al. (2013) Gabillon, V., Kveton, B., Wen, Z., Eriksson, B., and Muthukrishnan, S. Adaptive submodular maximization in bandit setting. In Advances in Neural Information Processing Systems 26, pp. 2697–2705, 2013.
- Golovin & Krause (2011) Golovin, D. and Krause, A. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. J. Artif. Intell. Res., 42:427–486, 2011.
- Golovin et al. (2010) Golovin, D., Krause, A., and Ray, D. Near-optimal bayesian active learning with noisy observations. In Advances in Neural Information Processing Systems 23, pp. 766–774, 2010.
- Hatano et al. (2016) Hatano, D., Fukunaga, T., and Kawarabayashi, K. Adaptive budget allocation for maximizing influence of advertisements. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, IJCAI 2016, pp. 3600–3608, 2016.
- Javdani et al. (2014) Javdani, S., Chen, Y., Karbasi, A., Krause, A., Bagnell, D., and Srinivasa, S. S. Near optimal bayesian active learning for decision making. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, AISTATS 2014, pp. 430–438, 2014.
- Kale et al. (2017) Kale, S., Karnin, Z., Liang, T., and Pál, D. Adaptive feature selection: Computationally efficient online sparse linear regression under RIP. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, pp. 1780–1788, 2017.
- Kempe et al. (2003) Kempe, D., Kleinberg, J. M., and Tardos, É. Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2003, pp. 137–146, 2003.
- Khanna et al. (2017) Khanna, R., Elenberg, E. R., Dimakis, A. G., Negahban, S., and Ghosh, J. Scalable greedy feature selection via weak submodularity. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, pp. 1560–1568, 2017.
- Kusner (2014) Kusner, M. J. Approximately adaptive submodular maximization. In NIPS Workshop on Discrete and Combinatorial Problems in Machine Learning, 2014.
- Leskovec et al. (2007) Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J. M., and Glance, N. S. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2007, pp. 420–429, 2007.
- Maehara et al. (2017) Maehara, T., Kawase, Y., Sumita, H., Tono, K., and Kawarabayashi, K. Optimal pricing for submodular valuations with bounded curvature. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, AAAI 2017, pp. 622–628, 2017.
- Nemhauser et al. (1978) Nemhauser, G. L., Wolsey, L. A., and Fisher, M. L. An analysis of approximations for maximizing submodular set functions - I. Math. Program., 14(1):265–294, 1978.
- Soma et al. (2014) Soma, T., Kakimura, N., Inaba, K., and Kawarabayashi, K. Optimal budget allocation: Theoretical guarantee and efficient algorithm. In Proceedings of the 31st International Conference on Machine Learning, ICML 2014, pp. 351–359, 2014.
- Tang et al. (2014) Tang, Y., Xiao, X., and Shi, Y. Influence maximization: near-optimal time complexity meets practical efficiency. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD 2014, pp. 75–86, 2014.
- Yong et al. (2017) Yong, S. Z., Gao, L., and Ozay, N. Weak adaptive submodularity and group-based active diagnosis with applications to state estimation with persistent sensor faults. In 2017 American Control Conference (ACC), pp. 2574–2581, 2017.
Appendix A Proof for Adaptive Submodularity Ratio
Proof of Proposition 1.
First we deal with the “if” part. Let be the partial realization just before is selected in . If there are multiple partial realizations such that , we can duplicate and take them to be different elements. From adaptive submodularity, for any partial realization and policy , we have
Thus we can see . Moreover, if is a policy that selects a single element, the above inequality holds with equality. These two facts imply .
Next we deal with the “only if” part. Let be any partial realization such that and be any element. We define to be the additional element and its state in , i.e., . Let us consider a policy that first selects and, if , proceeds to select . From the assumption, we have , and thus . We can calculate the left and right hand sides as follows:
Therefore, we obtain . By sequentially concatenating inequalities of this type, we can show that the statement holds for any . ∎
Appendix B Proof for the Adaptive Greedy Algorithm
To prove Theorem 1, we introduce a lemma provided by Golovin & Krause (2011). Let be a concatenated policy, i.e., a policy that executes as if from scratch after executing . Adaptive monotonicity is known to be equivalent to the following condition:
Lemma 1 (Adapted from (Golovin & Krause, 2011, Lemma A.8)).
Fix . Then we have for all with and all if and only if for all policies and , we have .
Proof of Theorem 1.
Let be any possible partial realization that can appear while executing the adaptive greedy policy . Since stops after steps, we have . According to the definition of adaptive submodularity ratio, we have
since . Let be a random partial realization observed by executing , where is a policy obtained by running until it terminates or it selects elements. Formally, conforms to the distribution . Then we can lower-bound the expected single step gain as follows:
Let . The above inequality can be rewritten as , which implies . By repeatedly using this inequality, we obtain . Consequently, we have . ∎
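The telescoping argument above is easy to sanity-check numerically. The following sketch iterates the worst case of the per-step recurrence and compares it against the closed-form bound of the form (1 − e^{−γk/ℓ})·OPT; the function name and the values of the ratio γ, the budget ℓ, and the optimum are hypothetical choices for illustration, not quantities from the paper.

```python
import math

def greedy_lower_bound(gamma, ell, k, opt):
    """Iterate the worst case of the per-step recurrence
    f_{i+1} >= f_i + (gamma / ell) * (opt - f_i)
    for k steps, and return both the iterated value and the
    closed-form lower bound (1 - e^{-gamma * k / ell}) * opt."""
    f = 0.0
    for _ in range(k):
        # Worst-case single-step gain permitted by the recurrence.
        f = f + (gamma / ell) * (opt - f)
    closed_form = (1.0 - math.exp(-gamma * k / ell)) * opt
    return f, closed_form

# Hypothetical instance: gamma = 0.8, ell = k = 10, opt = 1.
f_k, bound = greedy_lower_bound(gamma=0.8, ell=10, k=10, opt=1.0)
# The iterate dominates the bound because (1 - x)^k <= e^{-xk}.
assert f_k >= bound
```

The assertion reflects the elementary inequality (1 − γ/ℓ)^k ≤ e^{−γk/ℓ} used in the last step of the proof.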
Appendix C Proofs for Adaptivity Gaps
Proof of Theorem 2.
Let be an optimal non-adaptive policy and be an optimal adaptive policy. Since is a non-adaptive policy, it selects the same subset for all , i.e., for all and . Let , and let be the non-adaptive policy that selects . From the optimality of , we have
By the definition of the supermodularity ratio, we have
Note that and for each . Due to the definition of , we have
From the definition of adaptive submodularity ratio, we have
Combining these inequalities, we obtain the desired bound, which completes the proof. ∎
Proof of Corollary 1.
From the approximation ratio, we have
From Theorem 2, we have
The above two inequalities imply the statement. ∎
The following example shows that Theorem 2 is tight, i.e., for any rationals and in , there exist and such that the equality holds.
Let be the ground set, where . Let . Let . We define the probability distribution such that with probability for each and with probability . The other elements are always in state , i.e., with probability for all . We define the objective function as
where is a parameter to be specified later. We have for all . The supermodularity ratio of is
The adaptive submodularity ratio is
The adaptivity gap is
For any rationals and , there exist some such that and .
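Since the bounds in this appendix are stated in terms of such ratios, it may help to see how a ratio of this kind is evaluated on a concrete instance. The sketch below brute-forces the closely related Das–Kempe-style submodularity ratio of a small coverage function; the instance, the function names, and the exact definition used here are illustrative assumptions, not the construction from the tightness example above.

```python
from itertools import chain, combinations

def powerset(ground):
    """All subsets of the ground list, as tuples."""
    return chain.from_iterable(combinations(ground, r) for r in range(len(ground) + 1))

def submodularity_ratio(f, ground):
    """Brute-force a Das-Kempe-style submodularity ratio: the minimum, over
    disjoint L and nonempty S, of
        sum_{x in S} (f(L + {x}) - f(L))  /  (f(L + S) - f(L)).
    For a submodular f this minimum is at least 1."""
    ratio = float("inf")
    for L in powerset(ground):
        rest = [x for x in ground if x not in L]
        for S in powerset(rest):
            if not S:
                continue
            joint = f(set(L) | set(S)) - f(set(L))
            singles = sum(f(set(L) | {x}) - f(set(L)) for x in S)
            if joint > 1e-12:  # skip pairs with zero joint gain
                ratio = min(ratio, singles / joint)
    return ratio

# Toy coverage function: f(S) = size of the union of covered items.
cover = {"a": {1, 2}, "b": {2, 3}, "c": {3}}
f = lambda S: len(set().union(*(cover[x] for x in S))) if S else 0
print(submodularity_ratio(f, ["a", "b", "c"]))  # → 1.0
```

The exponential enumeration is only feasible for tiny ground sets; it is meant to make the definition concrete, not to be practical.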
Appendix D Proof for Adaptive Influence Maximization
In this section, we provide the full proof of Theorem 3. For readability, we first give a proof for the linear threshold model, which is a special case of the triggering model, and then extend it to the general triggering model.
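Before the proofs, it may help to see the linear threshold model on a bipartite source–sink graph in executable form. The following Monte Carlo sketch is a hypothetical instance (the names `weights` and `expected_spread_lt` are ours, not from the paper): each sink draws a threshold uniformly from [0, 1] and activates once the total incoming weight from active sources reaches that threshold.

```python
import random

def expected_spread_lt(weights, active_sources, trials=20000, seed=0):
    """Monte Carlo estimate of the expected number of activated sinks in a
    bipartite linear threshold model. weights[s][t] is the edge weight from
    source s to sink t; a sink with threshold theta ~ Uniform[0, 1] activates
    iff the incoming weight from active sources is at least theta."""
    rng = random.Random(seed)
    sinks = sorted({t for w in weights.values() for t in w})
    total = 0
    for _ in range(trials):
        for t in sinks:
            incoming = sum(weights[s].get(t, 0.0) for s in active_sources)
            if rng.random() <= incoming:  # threshold drawn and compared
                total += 1
    return total / trials

# Hypothetical instance; per-sink incoming weights sum to at most 1,
# as the linear threshold model requires.
weights = {"s1": {"t1": 0.4, "t2": 0.3}, "s2": {"t1": 0.5}}
est = expected_spread_lt(weights, {"s1", "s2"})
# Exact value: min(1, 0.4 + 0.5) + min(1, 0.3) = 1.2
assert abs(est - 1.2) < 0.05
```

On such a bipartite graph the expected spread is simply the sum of the clipped incoming weights, which is what makes the per-sink decomposition used in the proof below possible.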
D.1 Proof for the Linear Threshold Model
Proof of Theorem 3 in the case of the linear threshold model.
Let be the source vertices, the sink vertices, and the directed edges. For notational simplicity, assume that is a complete bipartite graph, i.e., . By setting for all edges that originally do not exist, we can assume this without loss of generality. Fix any and . It suffices to prove
Let be the expected marginal gain obtained by activating . Below we explain that the above inequality can be separated for each ; i.e., it suffices to prove the above inequality in the case where for just one vertex and for the others. The objective function is the sum of the objectives for the individual vertices: . Therefore, the above inequality is decomposed into the sum of