Beyond Adaptive Submodularity: Approximation Guarantees of Greedy Policy with Adaptive Submodularity Ratio

04/24/2019
by   Kaito Fujii, et al.

We propose a new concept named adaptive submodularity ratio to study the greedy policy for sequential decision making. While the greedy policy is known to perform well for a wide variety of adaptive stochastic optimization problems in practice, its theoretical properties have been analyzed only for a limited class of problems. We narrow the gap between theory and practice by using adaptive submodularity ratio, which enables us to prove approximation guarantees of the greedy policy for a substantially wider class of problems. Examples of newly analyzed problems include important applications such as adaptive influence maximization and adaptive feature selection. Our adaptive submodularity ratio also provides bounds of adaptivity gaps. Experiments confirm that the greedy policy performs well with the applications being considered compared to standard heuristics.


1 Introduction

Sequential decision making plays a crucial role in machine learning. In various scenarios, we must design an effective policy that repeatedly decides the next action to be taken by using the feedback obtained so far. The greedy policy is a simple but empirically effective approach to sequential decision making. At each step, it myopically makes a decision that seems the most beneficial among feasible choices.

Adaptive submodularity (Golovin & Krause, 2011) is a well-established framework for analyzing greedy algorithms for sequential decision making. It extends submodularity, which is a diminishing returns property of set functions, to the setting of adaptive decision making. This framework has successfully provided theoretical guarantees for greedy algorithms for active learning (Golovin et al., 2010), recommendation (Gabillon et al., 2013), and touch-based localization in robotics (Javdani et al., 2014).

However, adaptive submodularity is not omnipotent. While the greedy policy works well for various sequential decision making problems, many of these problems do not have adaptive submodularity. In fact, even if an objective function is submodular in the non-adaptive setting, its adaptive version does not always have adaptive submodularity. Adaptive influence maximization is one such example. In this problem, a decision maker aims at spreading information about a product by selecting several advertisements. She repeatedly alternates between selecting an advertisement and observing its effect. The objective function of this problem is known to have adaptive submodularity in the independent cascade model (Golovin & Krause, 2011), but not in a more general diffusion model called the triggering model (Kempe et al., 2003), which is extensively studied as an important class of diffusion models (Leskovec et al., 2007; Tang et al., 2014). That is, in the triggering model the objective function is submodular in the non-adaptive setting but not adaptive submodular in the adaptive setting. Other problems lacking adaptive submodularity appear in many applications such as feature selection and active learning. Therefore, an analysis framework that goes beyond adaptive submodularity is needed.


Problem Adaptive submodularity ratio Adaptive greedy Adaptivity gaps
Linear threshold
Independent cascade
Triggering
Feature selection
Table 1: Summary of our theoretical results about adaptive bipartite influence maximization and adaptive feature selection. We show lower bounds for the adaptive submodularity ratios, the approximation ratios of the adaptive greedy algorithm, and the adaptivity gaps. Let and . Parameters and are determined by the diffusion model and the underlying graph structure. The results of (Golovin & Krause, 2011) are indicated by .

In the non-adaptive setting, submodularity ratio (Das & Kempe, 2011) is a prevalent tool for handling non-submodular functions (Khanna et al., 2017; Elenberg et al., 2017). Intuitively, it is a parameter of monotone set functions that measures their distance to submodular functions. An adaptive variant of submodularity ratio would be a promising approach to handling functions that lack adaptive submodularity, but how to define it is quite non-trivial since there is a large discrepancy between the non-adaptive and adaptive settings as exemplified above. In particular, success in defining an adaptive version of submodularity ratio involves meeting the following two requirements: it must yield an approximation guarantee of the greedy policy, and it must be bounded in various important applications such as the adaptive influence maximization and adaptive feature selection. Previous works (Kusner, 2014; Yong et al., 2017) tried to define similar notions, but none of them meet the requirements.

Our Contribution.

We propose an analysis framework, adaptive submodularity ratio, that meets the aforementioned requirements. An advantage of our proposal is that it has the potential to yield various theoretical results as in Table 1. Below we summarize our main contributions.

  • We propose the definition of the adaptive submodularity ratio and, by using it, we prove an approximation guarantee of the adaptive greedy algorithm.

  • We give a bound on the adaptivity gap, which represents the superiority of adaptive policies over non-adaptive policies, through the lens of the adaptive submodularity ratio. (The adaptivity gap is a different concept from adaptive complexity (Balkanski & Singer, 2018).)

  • We provide lower bounds on the adaptive submodularity ratio for two important applications: adaptive influence maximization on bipartite graphs in the triggering model and adaptive feature selection. For the former, we show that our result is tight.

  • Experiments confirm that the greedy policy performs well for the considered applications.

Organization.

The rest of this paper is organized as follows. Section 2 provides the basic concepts and definitions. In Section 3, we formally define the adaptive submodularity ratio, which is the key concept of this study. In Sections 4 and 5, we provide bounds on the approximation ratio of the adaptive greedy algorithm and on adaptivity gaps, respectively, by using the adaptive submodularity ratio. In Sections 6 and 7, we apply the frameworks developed in Sections 4 and 5 to two applications: adaptive influence maximization and adaptive feature selection, respectively. In Section 8, we experimentally check the performance of the adaptive greedy algorithm in several applications. In Section 9, we review related work.

2 Preliminaries

Adaptive Stochastic Optimization.

Adaptive stochastic optimization is a general framework for handling problems of sequentially selecting elements, where we can observe the states of only the selected elements. Let $V$ be the ground set consisting of a finite number of elements. Suppose every element is assigned to some state in $\mathcal{Y}$, which is the set of all possible states. We let $\phi\colon V \to \mathcal{Y}$ be a map that associates each element $v \in V$ with a state $\phi(v) \in \mathcal{Y}$. We consider the Bayesian setting where $\phi$ is generated from a known prior distribution $p(\phi)$. Let $\Phi$ be a random variable representing the randomness of the realization $\phi$.

A decision maker can select one element at each step. After selecting $v \in V$, she can observe the state $\phi(v)$ of $v$. She repeatedly selects an element and then observes its state. The important point is that she can utilize the information about the states observed so far for selecting the next element. We denote by $\psi \subseteq V \times \mathcal{Y}$ the partial realization observed so far, where $\mathrm{dom}(\psi) \subseteq V$ is the set of selected elements. The decision maker's strategy can be described as a policy tree, or simply policy $\pi$. A policy is a decision tree that determines the element to be selected next. Formally, a policy $\pi$ is a partial map that returns an element to be selected next given the partial realization $\psi$ observed so far.

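The goal of the decision maker is to maximize the expected value of the objective function $f\colon 2^V \times \mathcal{Y}^V \to \mathbb{R}_{\ge 0}$. The objective function value $f(S, \phi)$ depends on the set $S$ of selected elements and the states $\phi$ of all elements. At the beginning, she does not know $\phi$, but she can get partial information about $\phi$ by observing the state $\phi(v)$ of each selected $v$. In parallel, she must select elements to construct a set $S$ that has high utility $f(S, \phi)$ under the realization $\phi$. Let $E(\pi, \phi) \subseteq V$ be the set selected by policy $\pi$ under realization $\phi$. The expected value achieved by policy $\pi$ is

$$f_{\mathrm{avg}}(\pi) = \mathbb{E}_{\Phi}\left[ f(E(\pi, \Phi), \Phi) \right], \tag{1}$$

where the expectation is taken with regard to the random variable $\Phi$ generated from $p(\phi)$.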

Adaptive Submodularity and Adaptive Monotonicity.

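Adaptive submodularity, which is an adaptive extension of submodularity, is a diminishing returns property of the expected marginal gain. The expected marginal gain of $v \in V$ when $\psi$ has been observed so far is defined as

$$\Delta(v \mid \psi) = \mathbb{E}\left[ f(\mathrm{dom}(\psi) \cup \{v\}, \Phi) - f(\mathrm{dom}(\psi), \Phi) \mid \Phi \sim \psi \right] \tag{2}$$
$$= \sum_{\phi} p(\phi \mid \psi) \left( f(\mathrm{dom}(\psi) \cup \{v\}, \phi) - f(\mathrm{dom}(\psi), \phi) \right), \tag{3}$$

where $p(\phi \mid \psi) = \Pr(\Phi = \phi \mid \Phi(u) = \psi(u) \text{ for all } u \in \mathrm{dom}(\psi))$. We write $\Phi \sim \psi$ if $\Phi$ is generated from the posterior distribution $p(\phi \mid \psi)$. Given the current partial realization $\psi$, the expected marginal gain $\Delta(v \mid \psi)$ represents the expected increase in the objective value yielded by selecting $v$. Adaptive submodularity is defined as follows: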

Definition 1 (Adaptive submodularity (Golovin & Krause, 2011)).

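Let $f\colon 2^V \times \mathcal{Y}^V \to \mathbb{R}_{\ge 0}$ be a set function and $p$ a distribution of $\phi$. We say $f$ is adaptive submodular with respect to $p$ if for any partial realizations $\psi \subseteq \psi'$ and any element $v \in V \setminus \mathrm{dom}(\psi')$, it holds that

$$\Delta(v \mid \psi) \ge \Delta(v \mid \psi'). \tag{4}$$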

The monotonicity can also be extended to the adaptive setting as follows:

Definition 2 (Adaptive monotonicity (Golovin & Krause, 2011)).

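Let $f\colon 2^V \times \mathcal{Y}^V \to \mathbb{R}_{\ge 0}$ be a set function and $p$ a distribution of $\phi$. We say $f$ is adaptive monotone with respect to $p$ if for any partial realization $\psi$ and any element $v \in V \setminus \mathrm{dom}(\psi)$, it holds that

$$\Delta(v \mid \psi) \ge 0. \tag{5}$$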
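
To make the expected marginal gain concrete, here is a minimal sketch that computes it by exact enumeration of the posterior. The toy instance (two elements with binary states, an independent uniform prior, and a utility that counts selected elements in state 1) and all function names are assumptions made for illustration; they do not come from the paper.

```python
from itertools import product

V = ["a", "b"]        # ground set of the hypothetical toy instance
STATES = [0, 1]       # possible states of each element

def prior():
    """Yield (realization, probability) pairs under an independent uniform prior."""
    for states in product(STATES, repeat=len(V)):
        yield dict(zip(V, states)), 1.0 / len(STATES) ** len(V)

def f(selected, phi):
    """Toy utility: number of selected elements whose state is 1."""
    return sum(phi[v] for v in selected)

def consistent(phi, psi):
    """True if realization phi agrees with the partial realization psi."""
    return all(phi[v] == y for v, y in psi.items())

def marginal_gain(v, psi):
    """Expected marginal gain of v given psi (a dict element -> observed state)."""
    dom = set(psi)
    num = den = 0.0
    for phi, prob in prior():
        if consistent(phi, psi):
            num += prob * (f(dom | {v}, phi) - f(dom, phi))
            den += prob
    return num / den  # normalize by the probability of observing psi
```

In this toy instance the gain of each element is 0.5 whatever has been observed, so the utility is both adaptive monotone and adaptive submodular. Enumeration is only feasible for tiny instances; larger ones would need sampling from the posterior.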

Other Notations for Adaptive Stochastic Optimization.

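The expected marginal gain of policy $\pi$ with partial realization $\psi$ is defined as

$$\Delta(\pi \mid \psi) = \mathbb{E}\left[ f(\mathrm{dom}(\psi) \cup E(\pi, \Phi), \Phi) - f(\mathrm{dom}(\psi), \Phi) \mid \Phi \sim \psi \right] \tag{6}$$
$$= \sum_{\phi} p(\phi \mid \psi) \left( f(\mathrm{dom}(\psi) \cup E(\pi, \phi), \phi) - f(\mathrm{dom}(\psi), \phi) \right). \tag{7}$$

Similarly, the expected marginal gain of set $S \subseteq V$ with partial realization $\psi$ is defined as

$$\Delta(S \mid \psi) = \mathbb{E}\left[ f(\mathrm{dom}(\psi) \cup S, \Phi) - f(\mathrm{dom}(\psi), \Phi) \mid \Phi \sim \psi \right]. \tag{8}$$

Let $\Pi_k$ be the set of all policies whose heights do not exceed $k$.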

Submodularity Ratio and Supermodularity Ratio.

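The submodularity ratio of a monotone non-negative set function $f\colon 2^V \to \mathbb{R}_{\ge 0}$ with respect to set $U \subseteq V$ and parameter $k \ge 1$ is defined to be

$$\gamma_{U,k}(f) = \min_{L \subseteq U,\ S\colon |S| \le k,\ S \cap L = \emptyset} \frac{\sum_{v \in S} f(v \mid L)}{f(S \mid L)}, \tag{9}$$

where $f(v \mid L) = f(L \cup \{v\}) - f(L)$ and $f(S \mid L) = f(L \cup S) - f(L)$. If the numerator and denominator are both $0$, the submodularity ratio is considered to be $1$. We have $\gamma_{U,k}(f) \in [0, 1]$, and a monotone set function $f$ is submodular if and only if $\gamma_{U,k}(f) = 1$ for every $U \subseteq V$ and $k \ge 1$.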

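The supermodularity ratio, a counterpart of the submodularity ratio, was considered by Bogunovic et al. (2018) and is defined as follows:

$$\beta_{U,k}(f) = \min_{L \subseteq U,\ S\colon |S| \le k,\ S \cap L = \emptyset} \frac{f(S \mid L)}{\sum_{v \in S} f(v \mid L)}, \tag{10}$$

where we regard $0/0 = 1$. We have $\beta_{U,k}(f) \in [0, 1]$, and $f$ is supermodular if and only if $\beta_{U,k}(f) = 1$ for every $U \subseteq V$ and $k \ge 1$. We omit $f$ from $\gamma_{U,k}(f)$ and $\beta_{U,k}(f)$ if it is clear from the context.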
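
The non-adaptive submodularity ratio can be evaluated by brute force on small ground sets, which may help build intuition for the definition. The sketch below follows the usual Das & Kempe form (sum of singleton gains divided by the joint gain, minimized over base sets and additions of bounded size); the function names and the exhaustive search are illustrative assumptions, and the running time is exponential.

```python
from itertools import combinations

def subsets(items, max_size=None):
    """Yield all subsets of `items` with at most `max_size` elements as frozensets."""
    items = list(items)
    max_size = len(items) if max_size is None else max_size
    for r in range(max_size + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def submodularity_ratio(f, ground, U, k):
    """Minimum over L within U and nonempty S (|S| <= k, disjoint from L) of
    sum of singleton gains f(v | L) divided by the joint gain f(S | L)."""
    best = 1.0  # singleton S always gives ratio 1 (or 0/0, regarded as 1)
    for L in subsets(U):
        rest = [v for v in ground if v not in L]
        for S in subsets(rest, k):
            if not S:
                continue
            joint = f(L | S) - f(L)
            if joint == 0:
                continue  # for monotone f the numerator is 0 too, so the ratio is 1
            singles = sum(f(L | {v}) - f(L) for v in S)
            best = min(best, singles / joint)
    return best
```

For example, a coverage function (which is submodular) yields ratio 1, whereas the supermodular function that maps a set to the square of its size yields 0.5 on a two-element ground set.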

3 Adaptive Submodularity Ratio

In this section, we provide a precise definition of the adaptive submodularity ratio, which extends the submodularity ratio from the non-adaptive setting to the adaptive setting. We need to define it carefully so that it can yield an approximation guarantee of the greedy policy. An important point is to generalize the subset $S$ of size at most $k$, used to define the submodularity ratio, to a policy $\pi$ of height at most $k$.

Definition 3 (Adaptive submodularity ratio).

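Suppose that $f$ is adaptive monotone w.r.t. a distribution $p$. The adaptive submodularity ratio of $f$ and $p$ with respect to partial realization $\psi$ and parameter $k \in \mathbb{Z}_{>0}$ is defined to be

$$\gamma_{\psi,k}(f, p) \tag{11}$$
$$= \min_{\psi' \subseteq \psi,\ \pi \in \Pi_k} \frac{\sum_{v \in V} \Pr(v \in E(\pi, \Phi) \mid \Phi \sim \psi')\, \Delta(v \mid \psi')}{\Delta(\pi \mid \psi')}. \tag{12}$$

We omit $f$ and $p$ if they are clear from the context. We also define $\gamma_{\ell,k}(f, p) = \min_{\psi\colon |\psi| \le \ell} \gamma_{\psi,k}(f, p)$.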

Intuitively, the adaptive submodularity ratio indicates the distance between $f$ and the class of adaptive submodular functions. As in the non-adaptive setting, $\gamma_{\psi,k}(f, p) = 1$ implies the adaptive submodularity of $f$, which can formally be written as follows:

Proposition 1.

It holds that $\gamma_{\psi,k}(f, p) = 1$ for any partial realization $\psi$ and $k \in \mathbb{Z}_{>0}$ if and only if $f$ is adaptive submodular with respect to $p$.

The proof is given in Appendix A.

4 Adaptive Greedy Algorithm

In this section, we present a new approximation ratio guarantee for the adaptive greedy algorithm based on the adaptive submodularity ratio. Thanks to this result, once the adaptive submodularity ratio is bounded, we can obtain approximation guarantees of the adaptive greedy algorithm for various applications. The adaptive greedy algorithm starts with an empty set and repeatedly selects the element with the largest expected marginal gain. The detailed description is given in Algorithm 1. Golovin & Krause (2011) have shown that this algorithm achieves a $(1 - 1/e)$-approximation to the expected objective value of an optimal policy if $f$ is adaptive submodular w.r.t. $p$. Here we extend their result and show that the adaptive greedy algorithm achieves a $(1 - \exp(-\gamma_{\ell,\ell}))$-approximation, where $\ell$ is the number of selected elements. More precisely, we can bound the approximation ratio relative to any policy of height $k$ as follows:

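Require: The value oracle for the expected marginal gain $\Delta(\cdot \mid \cdot)$ associated with $f$ and $p$, a cardinality constraint $\ell \in \mathbb{Z}_{>0}$.
Ensure: $\psi_\ell$, a set of observations of size $\ell$.
1:  $\psi_0 \leftarrow \emptyset$.
2:  for $i = 1, \dots, \ell$ do
3:     $v_i \in \operatorname{argmax}_{v \in V} \Delta(v \mid \psi_{i-1})$.
4:     Observe $\phi(v_i)$ and let $\psi_i \leftarrow \psi_{i-1} \cup \{(v_i, \phi(v_i))\}$.
5:  end for
6:  return $\psi_\ell$.
Algorithm 1 Adaptive greedy algorithm (Golovin & Krause, 2011)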
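
The loop of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the callables `marginal_gain` and `observe` are hypothetical stand-ins for the value oracle and for the act of observing a state, and the sketch skips already selected elements, which is harmless when re-selecting an element yields no gain.

```python
def adaptive_greedy(elements, marginal_gain, observe, budget):
    """Run the adaptive greedy policy for at most `budget` steps.

    marginal_gain(v, psi): expected marginal gain of v given observations psi.
    observe(v): reveals the true state of v (called once per selected element).
    Returns the partial realization as a dict {element: observed state}.
    """
    psi = {}
    for _ in range(budget):
        candidates = [v for v in elements if v not in psi]
        if not candidates:
            break  # every element has already been selected
        best = max(candidates, key=lambda v: marginal_gain(v, psi))
        psi[best] = observe(best)  # feedback is used in the next iteration
    return psi
```

Because `psi` is passed back into `marginal_gain` at every step, the choice at step i can depend on all states observed before it, which is what separates this policy from a non-adaptive greedy selection.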
Theorem 1.

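Suppose $f$ is adaptive monotone with respect to $p$. Let $\pi_\ell$ be a policy representing the adaptive greedy algorithm until the $\ell$-th step. Then, for any policy $\pi^* \in \Pi_k$, it holds that

$$f_{\mathrm{avg}}(\pi_\ell) \ge \left( 1 - \exp\left( -\frac{\gamma_{\ell,k}\, \ell}{k} \right) \right) f_{\mathrm{avg}}(\pi^*), \tag{13}$$

where $\gamma_{\ell,k}$ is the adaptive submodularity ratio of $f$ w.r.t. $p$.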

We provide the proof in Appendix B.
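
As a rough numerical illustration, the sketch below evaluates a guarantee of the form 1 − exp(−γℓ/k), where γ is the adaptive submodularity ratio, ℓ the number of greedy steps, and k the height of the comparison policy. That the bound takes exactly this form is an assumption of the sketch, and the function name is hypothetical.

```python
import math

def greedy_guarantee(gamma, ell, k):
    """Approximation factor 1 - exp(-gamma * ell / k), assumed form of the bound."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("the ratio gamma is expected to lie in [0, 1]")
    return 1.0 - math.exp(-gamma * ell / k)
```

With gamma equal to 1 and ell equal to k, this reduces to the familiar 1 − 1/e factor for adaptive submodular functions; smaller ratios weaken the factor smoothly rather than voiding it.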

5 Non-adaptive Policies and Adaptivity Gaps

We show that the adaptive submodularity ratio is also useful for theoretically comparing the performances of adaptive and non-adaptive policies. More precisely, we present a lower bound on the adaptivity gap, which represents the performance gap between adaptive and non-adaptive policies, by using the adaptive submodularity ratio. The adaptivity gap is defined as follows:

Definition 4 (Adaptivity gaps).

The adaptivity gap of an objective function

and a probability distribution

of is defined as the ratio between an optimal adaptive policy and an optimal non-adaptive policy, i.e.,

(14)

where is the height of adaptive and non-adaptive policies.

Theorem 2.

Let be an objective function and a probability distribution of . Let be the adaptive submodularity ratio of w.r.t. . Let be the supermodularity ratio of the set function of non-adaptive policies. We have

(15)

Therefore, given any non-adaptive -approximation algorithm, we can evaluate its performance relative to an optimal adaptive policy as follows:

Corollary 1.

Let be a non-adaptive policy that achieves -approximation to an optimal non-adaptive policy . Let be the adaptive submodularity ratio of w.r.t. . Let be the supermodularity ratio of the non-adaptive objective function . Let be an optimal adaptive policy. We have

(16)

Proofs are given in Appendix C.

6 Adaptive Influence Maximization

In this section, we consider adaptive influence maximization on bipartite graphs. We provide a bound on the adaptive submodularity ratio in the case of the triggering model, and we show that this result is tight. We also present bounds on the adaptivity gaps in the case of the independent cascade and linear threshold models by using the adaptive submodularity ratio.

Let be a directed bipartite graph with source vertices , sink vertices , and directed edges . In the case of bipartite influence model (Alon et al., 2012), this graph represents the relationship between advertisements and customers . We consider the problem of selecting several advertisements to make as much influence as possible on the customers. Here, each edge is determined to be alive or dead according to a certain distribution, and influence can be spread only through live edges. Given vertex weights , the objective function to be maximized is , where, for each , represents a set of vertices that are reachable from by going through only live edges. In the adaptive version of influence maximization, at each step, we select a vertex and observe the states of all outgoing edges , while, in the non-adaptive setting, we select before observing the states of any edges.

We consider a general diffusion model called the triggering model (Kempe et al., 2003), which includes various important models such as the independent cascade model and the linear threshold model as special cases. In the triggering model, each vertex is associated with some known probability distribution over the power set of incoming edges. According to this distribution, a subset of incoming live edges is determined. A vertex gets activated if and only if it is reachable from some selected vertex (or seed vertex) through only live edges. We aim to maximize the total weight of activated vertices by appropriately selecting seed vertices. Note that this objective function is submodular in the non-adaptive setting.

For later use, we explain the linear threshold model, a special case of the triggering model. In this model, the probability distribution on the incoming edges of each vertex is restricted so that each vertex has at most one live edge in any realization. In other words, there exists such that, for each , we have , where is the full set of edges pointing to , and is alive with probability exclusively over . In contrast to the linear threshold model, the triggering model accepts any distribution over the power set of .
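
The linear threshold model on a bipartite graph can be simulated directly from its description: each sink receives at most one live incoming edge. The following sketch is illustrative only; the data layout (`in_edges` mapping each sink to its sources and their probabilities) and the function names are assumptions, not notation from the paper.

```python
import random

def sample_linear_threshold(in_edges, rng):
    """For each sink t, pick at most one live incoming edge.

    in_edges[t] maps each source s with an edge (s, t) to its probability;
    the probabilities of one sink must sum to at most 1.
    Returns {t: source of the live edge, or None if no edge is live}.
    """
    live = {}
    for t, probs in in_edges.items():
        r = rng.random()
        acc = 0.0
        live[t] = None
        for s, q in probs.items():
            acc += q
            if r < acc:       # r falls into the interval assigned to edge (s, t)
                live[t] = s
                break
    return live

def influence(seeds, live, weight):
    """Total weight of sinks whose live edge starts at a selected source."""
    return sum(weight[t] for t, s in live.items() if s in seeds)
```

In the adaptive setting, selecting a source would reveal which of its outgoing edges are live in the sampled `live` map, and that feedback could then drive the next selection.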

6.1 Bound of Adaptive Submodularity Ratio

We first present the bound on the adaptive submodularity ratio. Here we provide a proof sketch; the full proof is given in Sections D.1 and D.2.

Theorem 3.

Let $G$ be an arbitrary directed bipartite graph and $w$ be any weight function. For any $k \in \mathbb{Z}_{>0}$ and partial realization $\psi$, the adaptive submodularity ratio of the objective function $f$ and the distribution $p$ of the adaptive influence maximization in the triggering model is lower-bounded as follows:

$$\gamma_{\psi,k}(f, p) \ge \frac{k+1}{2k}. \tag{17}$$
Proof sketch of Theorem 3.

Since the objective function and the probability distribution of edge states can be decomposed into those defined for each vertex , it is sufficient to consider the case where .

Our goal is to prove

(18)

for any observation and policy . By duplicating that appears multiple times in policy tree , we can write the above inequality as

(19)

where is a shorthand for and is the observation just before is selected. We decompose the policy tree into the path wherein remains inactive and the rest, and prove the inequality for each part separately. ∎

Figure 1: An example that implies the tightness of our bound. The two panels show the graph and the policy tree, whose branches are labeled by whether the observed edge is alive or dead.

We can see that the above bound is tight even for the linear threshold model by considering the following example.

Example 1.

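Let $G = (S \cup T; A)$ be a bipartite directed graph with $S = \{v_1, \dots, v_k\}$, $T = \{t\}$, and $A = S \times T$. Let $w$ be the vertex weight such that $w(t) = 1$. We consider the linear threshold model in which an edge selected out of $A$ uniformly at random is alive and the other edges are dead. We consider a simple policy $\pi$ that selects the vertices $v_1, \dots, v_k$ one by one until $t$ is activated. This graph and policy are illustrated in Figure 1. Since $\pi$ finally activates $t$, the expected gain of $\pi$ is $\Delta(\pi \mid \emptyset) = 1$. The probability that $\pi$ selects each vertex $v_i$ is $\Pr(v_i \in E(\pi, \Phi)) = (k - i + 1)/k$. The expected marginal gain of $v_i$ is $\Delta(v_i \mid \emptyset) = 1/k$. The adaptive submodularity ratio can be upper-bounded as

$$\gamma_{\emptyset,k} \le \frac{\sum_{i=1}^{k} \Pr(v_i \in E(\pi, \Phi))\, \Delta(v_i \mid \emptyset)}{\Delta(\pi \mid \emptyset)} \tag{20}$$
$$= \sum_{i=1}^{k} \frac{k - i + 1}{k} \cdot \frac{1}{k} \tag{21}$$
$$= \frac{k+1}{2k}. \tag{22}$$

Hence the lower bound in Theorem 3 is tight.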
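
The arithmetic behind this tightness example can be checked exactly with rational numbers. The sketch assumes the reading of Example 1 in which k source vertices all point to a single sink of unit weight, exactly one uniformly random edge is alive, and the policy tries the sources in a fixed order until the sink is activated; the function name is hypothetical.

```python
from fractions import Fraction

def example_ratio(k):
    """Sum over sources of Pr(policy selects it) * its marginal gain,
    divided by the expected gain of the whole policy."""
    # The i-th source (i = 1..k) is selected iff the live edge is not among the first i - 1.
    select_prob = [Fraction(k - i + 1, k) for i in range(1, k + 1)]
    single_gain = Fraction(1, k)   # each source activates the sink with probability 1/k
    policy_gain = Fraction(1)      # the policy always activates the unit-weight sink
    return sum(p * single_gain for p in select_prob) / policy_gain
```

Under these assumptions the ratio equals (k + 1) / (2k) for every k, for instance 3/4 when k = 2, and it approaches 1/2 as k grows.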

The assumption in Theorem 3 that the graph is bipartite may seem excessively strong, but it is actually a vital assumption. We show that, if the graph is not bipartite, the adaptive submodularity ratio can be arbitrarily small; in fact, such an example can be constructed with the linear threshold model on a very simple graph. We describe the details in Section D.3.

6.2 Bound of Adaptivity Gap

Next we provide a bound on the adaptivity gaps of bipartite influence maximization problems by using the adaptive submodularity ratio. First we consider the independent cascade model. Since adaptive submodularity holds for the independent cascade model (Golovin & Krause, 2011), the adaptive submodularity ratio of its objective function is $1$ by Proposition 1. In addition, by using a bound of the curvature (Maehara et al., 2017) and an inequality between the supermodularity ratio and the curvature (Bogunovic et al., 2018), we obtain a lower bound on the supermodularity ratio that depends on an upper bound of the probability that each edge is alive and on the largest vertex degree. From Theorem 2, we obtain the following result.

Proposition 2.

Let be the objective function and the probability distribution of bipartite influence maximization in the independent cascade model. We have

(23)

We can derive a similar bound for the linear threshold model. Since the expected objective function is a linear function, its supermodularity ratio is $1$. As a special case of Theorem 3, we have $\gamma_{\emptyset,k} \ge \frac{k+1}{2k}$. Combining these bounds with Theorem 2, we obtain the following result.

Proposition 3.

Let be the objective function and the probability distribution of bipartite influence maximization in the linear threshold model. We have

(24)

7 Adaptive Feature Selection

In this section, we consider an adaptive variant of feature selection for sparse regression. All proofs related to this section are presented in Appendix E.

Let us consider the following scenario. A learner has all feature vectors in advance, but they are not accurate due to sensing noise. Here each sensor corresponds to a single feature vector. The learner can obtain accurate feature vectors by replacing inaccurate sensors with high-quality sensors, but the number of high-quality sensors is limited to

. The learner selects features for observing their accurate feature vectors.

We formalize this scenario as the following problem. At the beginning, a learner knows a response vector and a prior distribution over the features, but does not know the features themselves. Namely, we regard the inaccurate feature vectors obtained with noisy sensors as prior distributions on accurate feature vectors. A random variable indicates the uncertainty over the observed feature vectors. From the noisy sensors, we can know only a prior distribution of but not the true . Let be the set of features. At each step, the learner can query a feature and observe its feature vector . We assume the noise of sensors are independent of each other; i.e., there exists a distribution for each and we can factorize as .

Let be the realized feature matrix under realization . The objective function to be maximized is defined as

7.1 Bound of Adaptive Submodularity Ratio

To bound the adaptive submodularity ratio of adaptive feature selection, we give a general lower bound of the adaptive submodularity ratio by using (non-adaptive) submodularity ratios of all realizations.

Theorem 4.

Let be adaptive monotone w.r.t. distribution . Assume the value of depends only on not on , i.e., for all and such that for all . We also assume can be factorized to distributions of states of each , i.e., . Let be the submodularity ratio of for each realization . For any distribution of , the adaptive submodularity ratio can be bounded as

(25)

By using Theorem 4 and the result of Das & Kempe (2011), we obtain the following lower bound of the adaptive submodularity ratio.

Corollary 2.

Assume each column of is normalized. For any and any distribution of each , the adaptive submodularity ratio can be bounded as

(26)

where

represents the smallest eigenvalue.

7.2 Bound of Adaptivity Gap

We can also obtain a bound on the adaptivity gap of adaptive feature selection as follows:

Proposition 4.

Let and suppose that can be factorized as . We have

(27)
Remark 1.

These results on the adaptive submodularity ratio and adaptivity gap can be extended to more general loss functions with restricted strong concavity and restricted smoothness as in

Elenberg et al. (2018).

Figure 2: Experimental results on adaptive influence maximization (a)–(d) and adaptive feature selection (e)–(f). Panels (a) and (b) show the results on synthetic datasets with the linear threshold model and the extended linear threshold model, respectively. Panels (c) and (d) show the results on the Yahoo! dataset (Yahoo! Webscope dataset G1) with the linear threshold model and the extended linear threshold model, respectively. Panels (e) and (f) show the results on synthetic datasets with uniform noise distributions under the two noise settings, respectively.

8 Experiments

We conduct experiments on two applications: adaptive influence maximization and adaptive feature selection. For each setting, we conduct trials and plot their mean values.

8.1 Adaptive Influence Maximization

Datasets.

We conduct experiments on two datasets of adaptive influence maximization. The first dataset is a synthetic bipartite graph generated randomly according to Erdös–Renyi rule. We set the number of source and sink vertices to 10000, i.e., . For each pair , we add an edge between and with probability . The second dataset is Yahoo! Search Marketing Advertiser–Phrase Bipartite Graph (Yah, ), which is a bipartite graph representing relationships between advertisers and search phrases; we have , , and . For both datasets, the weight of each vertex in

is drawn from the uniform distribution on

.

Diffusion Model.

We consider two diffusion models. The first one is the linear threshold model. The probability that each edge is alive is set to the reciprocal of the degree of the sink vertex, that is, . As the second diffusion model, we consider an extended version of the linear threshold model, which is also a special case of the triggering model. In this model, for each sink vertex , the subset of incoming live edges is determined as follows. We sample edges with replacement from uniformly at random, and an edge turns alive if it is sampled at least once. In our experiments, parameter is set to .
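
The extended linear threshold model described above can be simulated by sampling incoming edges with replacement for each sink. The sketch below is a hedged illustration: the adjacency layout (`in_neighbors`), the parameter name `num_samples`, and the function name are assumptions, and the number of samples used in the experiments is not reproduced here.

```python
import random

def sample_extended_linear_threshold(in_neighbors, num_samples, rng):
    """For each sink, draw `num_samples` incoming edges uniformly at random
    with replacement; an edge is live if it is drawn at least once.
    Returns {sink: set of sources whose edge to the sink is live}."""
    live = {}
    for t, sources in in_neighbors.items():
        if sources:
            live[t] = {rng.choice(sources) for _ in range(num_samples)}
        else:
            live[t] = set()  # a sink without incoming edges can never be activated
    return live
```

With `num_samples` set to 1, each sink gets exactly one live edge chosen uniformly at random, which matches the linear threshold setting with probabilities equal to the reciprocal of the sink's degree.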

Benchmarks.

We compare the adaptive greedy algorithm with three non-adaptive benchmarks. The first benchmark is the non-adaptive greedy algorithm, called non-adaptive, which is a standard greedy algorithm (Nemhauser et al., 1978) for maximizing the expected value of the objective function . The second benchmark is Degree, which selects the set of vertices with the top- largest degree. The third benchmark is Random, which selects a random subset of size .
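
The two simplest benchmarks are easy to state in code. The following sketch assumes a hypothetical adjacency map `out_neighbors` from each source vertex to the list of its sink neighbors; the tie-breaking rule by vertex name is also an assumption made so that the result is deterministic.

```python
import random

def degree_baseline(out_neighbors, k):
    """Select the k sources with the largest out-degree (ties broken by name)."""
    ranked = sorted(out_neighbors, key=lambda s: (-len(out_neighbors[s]), s))
    return set(ranked[:k])

def random_baseline(out_neighbors, k, rng):
    """Select a uniformly random subset of k sources."""
    return set(rng.sample(sorted(out_neighbors), k))
```

Neither baseline uses edge probabilities or feedback, so both are non-adaptive and serve only as cheap reference points for the greedy methods.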

Results.

Objective values achieved by the algorithms are shown in Figures 2(a), 2(b), 2(c), and 2(d). In all settings, the adaptive greedy algorithm outperforms all the benchmarks.

8.2 Adaptive Feature Selection

Datasets.

We use synthetic datasets generated randomly as follows. First we determine the mean according to the uniform distribution on . After that, each column is normalized so that its mean is

and its standard deviation is

. We obtain by adding to , where each element of is drawn from the uniform distribution on . We consider two settings: and . We select a random sparse subset of features such that , and we let be the response vector, where each element of

is drawn from the standard normal distribution. In all settings, we set

and .

Benchmarks.

We compare the adaptive greedy algorithm with two benchmarks. The first benchmark is the non-adaptive greedy algorithm. Regarding the adaptive and non-adaptive greedy algorithms, it is hard to evaluate the exact values of the objective functions, and so we approximately evaluate them by sampling randomly according to posterior distributions. The second benchmark is the noise-oblivious greedy algorithm, a non-adaptive algorithm that greedily selects a subset based on the mean, .

Results.

The results are shown in Figures 2(e) and 2(f). In both settings, the adaptive greedy algorithm outperforms the two benchmarks.

9 Related Work

Comparison with (Kusner, 2014).

To our knowledge, the first attempt to generalize submodularity ratio to the adaptive setting is (Kusner, 2014). They defined approximate adaptive submodularity, a notion that is similar to ours, as follows:

(28)

The key difference is that they did not replace subset with policy . In Appendix F, we show that the approximate adaptive submodularity is not sufficient for providing an approximation guarantee of the adaptive greedy algorithm.

Comparison with (Yong et al., 2017).

Another attempt to relax adaptive submodularity is presented in (Yong et al., 2017). They introduced -weakly adaptive submodular functions as follows:

Definition 5 (-weak adaptive submodularity).

Let be a set function and be a distribution of . For any , we say is adaptive submodular with respect to if for any partial realization and any element , it holds Let be the infimum of satisfying the above inequality.

Analogous to our adaptive submodularity ratio, one can readily see that -weak adaptive submodularity is equivalent to the adaptive submodularity. In general, however, there is a difference between the two notions; the adaptive submodularity ratio can be bounded from below by , implying that it is more demanding to bound the value of than that of the adaptive submodularity ratio.

Proposition 5.

For any set function and distribution , we have

We provide a proof in Section G.1. Yong et al. (2017) studied a problem called group-based active diagnosis and gave a bound of , but some vital assumptions seem to have been missed. In Section G.2, we provide a problem instance in which their bound does not hold. In Sections G.3 and G.4, we also present instances of adaptive influence maximization and adaptive feature selection for which our framework provides strictly better approximation ratios than those obtained with weak adaptive submodularity.

Adaptive Submodularity.

Adaptive submodularity was proposed by Golovin & Krause (2011). There are several attempts to adaptively maximize set functions that do not satisfy adaptive submodularity (e.g., (Kusner, 2014; Yong et al., 2017)). Chen et al. (2015) analyzed the greedy policy focusing on the maximization of mutual information, which does not have adaptive submodularity.

Submodularity Ratio.

Submodularity ratio was proposed by Das & Kempe (2011) for sparse regression with squared loss. Recently, Elenberg et al. (2018) extended this result to more general loss functions with restricted strong convexity and restricted smoothness. Bogunovic et al. (2018) proposed the notion of supermodularity ratio. Bian et al. (2017) provided a guarantee of the non-adaptive greedy algorithm for the case where the total curvature and submodularity ratio of objective functions are bounded.

Influence Maximization.

Influence maximization was proposed by Kempe et al. (2003). An adaptive version of influence maximization was first considered by Golovin & Krause (2011). They showed that this objective function satisfies adaptive submodularity under the independent cascade model in general graphs. Influence maximization on a bipartite graph has been studied for applications to advertisement selection (Alon et al., 2012; Soma et al., 2014). This problem setting was extended to the adaptive setting by Hatano et al. (2016), but only the independent cascade model was considered. The curvature of its objective function was studied by Maehara et al. (2017).

Feature Selection.

Kale et al. (2017) considered a problem called adaptive feature selection, but their problem setting differs from ours. In their setting, the learner solves feature selection problems multiple times. They studied adaptivity across the multiple rounds, whereas we study adaptivity within a single round.

Acknowledgements

K.F. was supported by JSPS KAKENHI Grant Number JP 18J12405.

References

  • (1) Yahoo! webscope dataset: G1 - Yahoo! Search Marketing Advertiser-Phrase Bipartite Graph, Version 1.0. URL https://webscope.sandbox.yahoo.com/.
  • Alon et al. (2012) Alon, N., Gamzu, I., and Tennenholtz, M. Optimizing budget allocation among channels and influencers. In Proceedings of the 21st World Wide Web Conference 2012, WWW 2012, pp. 381–388, 2012.
  • Balkanski & Singer (2018) Balkanski, E. and Singer, Y. The adaptive complexity of maximizing a submodular function. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, pp. 1138–1151, 2018.
  • Bian et al. (2017) Bian, A. A., Buhmann, J. M., Krause, A., and Tschiatschek, S. Guarantees for greedy maximization of non-submodular functions with applications. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, pp. 498–507, 2017.
  • Bogunovic et al. (2018) Bogunovic, I., Zhao, J., and Cevher, V. Robust maximization of non-submodular objectives. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics, AISTATS 2018, pp. 890–899, 2018.
  • Chen et al. (2015) Chen, Y., Hassani, S. H., Karbasi, A., and Krause, A. Sequential information maximization: When is greedy near-optimal? In Proceedings of The 28th Conference on Learning Theory, COLT 2015, pp. 338–363, 2015.
  • Das & Kempe (2011) Das, A. and Kempe, D. Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pp. 1057–1064, 2011.
  • Elenberg et al. (2017) Elenberg, E. R., Dimakis, A. G., Feldman, M., and Karbasi, A. Streaming weak submodularity: Interpreting neural networks on the fly. In Advances in Neural Information Processing Systems 30, pp. 4047–4057, 2017.
  • Elenberg et al. (2018) Elenberg, E. R., Khanna, R., Dimakis, A. G., and Negahban, S. Restricted strong convexity implies weak submodularity. Ann. Statist., 46(6B):3539–3568, 2018.
  • Gabillon et al. (2013) Gabillon, V., Kveton, B., Wen, Z., Eriksson, B., and Muthukrishnan, S. Adaptive submodular maximization in bandit setting. In Advances in Neural Information Processing Systems 26, pp. 2697–2705, 2013.
  • Golovin & Krause (2011) Golovin, D. and Krause, A. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. J. Artif. Intell. Res., 42:427–486, 2011.
  • Golovin et al. (2010) Golovin, D., Krause, A., and Ray, D. Near-optimal Bayesian active learning with noisy observations. In Advances in Neural Information Processing Systems 23, pp. 766–774, 2010.
  • Hatano et al. (2016) Hatano, D., Fukunaga, T., and Kawarabayashi, K. Adaptive budget allocation for maximizing influence of advertisements. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, IJCAI 2016, pp. 3600–3608, 2016.
  • Javdani et al. (2014) Javdani, S., Chen, Y., Karbasi, A., Krause, A., Bagnell, D., and Srinivasa, S. S. Near optimal Bayesian active learning for decision making. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, AISTATS 2014, pp. 430–438, 2014.
  • Kale et al. (2017) Kale, S., Karnin, Z., Liang, T., and Pál, D. Adaptive feature selection: Computationally efficient online sparse linear regression under RIP. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, pp. 1780–1788, 2017.
  • Kempe et al. (2003) Kempe, D., Kleinberg, J. M., and Tardos, É. Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2003, pp. 137–146, 2003.
  • Khanna et al. (2017) Khanna, R., Elenberg, E. R., Dimakis, A. G., Negahban, S., and Ghosh, J. Scalable greedy feature selection via weak submodularity. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, pp. 1560–1568, 2017.
  • Kusner (2014) Kusner, M. J. Approximately adaptive submodular maximization. In NIPS Workshop on Discrete and Combinatorial Problems in Machine Learning, 2014.
  • Leskovec et al. (2007) Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J. M., and Glance, N. S. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2007, pp. 420–429, 2007.
  • Maehara et al. (2017) Maehara, T., Kawase, Y., Sumita, H., Tono, K., and Kawarabayashi, K. Optimal pricing for submodular valuations with bounded curvature. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, AAAI 2017, pp. 622–628, 2017.
  • Nemhauser et al. (1978) Nemhauser, G. L., Wolsey, L. A., and Fisher, M. L. An analysis of approximations for maximizing submodular set functions - I. Math. Program., 14(1):265–294, 1978.
  • Soma et al. (2014) Soma, T., Kakimura, N., Inaba, K., and Kawarabayashi, K. Optimal budget allocation: Theoretical guarantee and efficient algorithm. In Proceedings of the 31st International Conference on Machine Learning, ICML 2014, pp. 351–359, 2014.
  • Tang et al. (2014) Tang, Y., Xiao, X., and Shi, Y. Influence maximization: near-optimal time complexity meets practical efficiency. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD 2014, pp. 75–86, 2014.
  • Yong et al. (2017) Yong, S. Z., Gao, L., and Ozay, N. Weak adaptive submodularity and group-based active diagnosis with applications to state estimation with persistent sensor faults. In 2017 American Control Conference (ACC), pp. 2574–2581, 2017.

Appendix A Proof for Adaptive Submodularity Ratio

Proof of Proposition 1.

First we deal with the “if” part. Let be the partial realization just before is selected in . If there are multiple partial realizations such that , we can duplicate and take them to be different elements. From adaptive submodularity, for any partial realization and policy , we have

(A1)
(A2)

Thus we can see . Moreover, if is a policy that selects a single element, the above inequality holds with equality. These two facts imply .

Next we deal with the “only if” part. Let be any partial realization such that and be any element. We define to be the additional element and its state in , i.e., . Let us consider a policy that first selects and, if , proceeds to select . From the assumption, we have , and thus . We can calculate the left and right hand sides as follows:

(LHS) (A3)
(RHS) (A4)

Therefore, we obtain . By sequentially concatenating inequalities of this type, we can show that the statement holds for any . ∎

Appendix B Proof for the Adaptive Greedy Algorithm

To prove Theorem 1, we introduce a lemma provided by Golovin & Krause (2011). Let be a concatenated policy, i.e., a policy that executes as if from scratch after executing . Adaptive monotonicity is known to be equivalent to the following condition:

Lemma 1 (Adapted from Golovin & Krause, 2011, Lemma A.8).

Fix . Then we have for all with and all if and only if for all policies and , we have .
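The notion of a concatenated policy used in the lemma can be illustrated with a minimal sketch. It assumes a policy is a function from the observations made so far to the next element (or `None` to stop); the names `run_policy`, `run_concatenated`, `pick_a`, and `pick_b_then_c` are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, Hashable, Optional

# Illustrative types: observations map each selected element to its state.
Observations = Dict[Hashable, Hashable]
Policy = Callable[[Observations], Optional[Hashable]]


def run_policy(policy: Policy, realization: Observations) -> Observations:
    """Run a policy under a fixed realization and return what it observed."""
    observed: Observations = {}
    while True:
        element = policy(observed)
        # Stop when the policy terminates; also stop on a repeated element
        # so that a badly written policy cannot loop forever.
        if element is None or element in observed:
            return observed
        observed[element] = realization[element]


def run_concatenated(first: Policy, second: Policy,
                     realization: Observations) -> Observations:
    """Run `first`, then run `second` as if from scratch.

    `second` starts from empty observations, so it ignores everything that
    `first` learned; the result is the union of both sets of observations.
    """
    seen_by_first = run_policy(first, realization)
    seen_by_second = run_policy(second, realization)
    return {**seen_by_first, **seen_by_second}


# Toy example with three elements and binary states.
realization = {"a": 1, "b": 0, "c": 1}


def pick_a(obs: Observations) -> Optional[Hashable]:
    return "a" if "a" not in obs else None


def pick_b_then_c(obs: Observations) -> Optional[Hashable]:
    if "b" not in obs:
        return "b"
    if obs["b"] == 0 and "c" not in obs:
        return "c"
    return None


combined = run_concatenated(pick_a, pick_b_then_c, realization)
```

Under every realization, the elements selected by the concatenated policy include those selected by the first policy, which is why adaptive monotonicity can be phrased as "concatenating never hurts in expectation".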

Proof of Theorem 1.

Let be any possible partial realization that can appear while executing the adaptive greedy policy . Since stops after steps, we have . According to the definition of adaptive submodularity ratio, we have

(A5)

since . Let be a random partial realization observed by executing , where is a policy obtained by running until it terminates or it selects elements. Formally, conforms to the distribution . Then we can lower-bound the expected single step gain as follows:

(due to the property of the adaptive greedy algorithm)
(due to (A5))
(A6)
(due to adaptive monotonicity and Lemma 1)

Let . The above inequality can be rewritten as , which implies . By repeatedly using this inequality, we obtain . Consequently, we have . ∎
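The last step of the proof is a geometric-decay argument. The following is a minimal numeric sketch of that step. It assumes that the normalized shortfall relative to the benchmark policy shrinks by a factor of at least 1 − γ/k per greedy step, and that the resulting closed-form bound is 1 − exp(−γℓ/k). The symbols `gamma`, `k`, and `ell` and both function names are illustrative placeholders, not notation recovered from the text.

```python
import math


def worst_case_gap(gamma: float, k: int, ell: int) -> float:
    """Iterate gap_{i+1} = (1 - gamma / k) * gap_i starting from gap_0 = 1.

    gap_i stands for the normalized shortfall after i greedy steps; the
    recursion itself is an assumption made for this illustration.
    """
    gap = 1.0
    for _ in range(ell):
        gap *= 1.0 - gamma / k
    return gap


def greedy_ratio_bound(gamma: float, k: int, ell: int) -> float:
    """Closed-form lower bound 1 - exp(-gamma * ell / k)."""
    return 1.0 - math.exp(-gamma * ell / k)


for gamma in (0.25, 0.5, 1.0):
    for k, ell in ((5, 5), (10, 20), (50, 10)):
        achieved = 1.0 - worst_case_gap(gamma, k, ell)
        # (1 - x)^ell <= exp(-x * ell) for 0 <= x <= 1, so the ratio obtained
        # from the recursion is never below the closed-form bound.
        assert achieved >= greedy_ratio_bound(gamma, k, ell)
```

With `gamma = 1` and `ell = k`, the closed form reduces to the familiar 1 − 1/e factor of Nemhauser et al. (1978).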

Appendix C Proofs for Adaptivity Gaps

Proof of Theorem 2.

Let be an optimal non-adaptive policy and be an optimal adaptive policy. Since is a non-adaptive policy, it selects the same subset for all , i.e., for all and . Let and the non-adaptive policy that selects . From the optimality of , we have

(A7)

By the definition of the supermodularity ratio, we have

(A8)

Note that and for each . Due to the definition of , we have

(A9)

From the definition of adaptive submodularity ratio, we have

(A10)

Combining these inequalities, we have

(A11)
(A12)
(A13)

Proof of creftypecap 1.

From the approximation ratio, we have

(A14)

From Theorem 2, we have

(A15)

The above two inequalities imply the statement. ∎

From the following example, we can see that Theorem 2 is tight, i.e., for any rationals and in , there exist and such that the equality holds.

Example 2.

Let be the ground set, where . Let . Let . We define the probability distribution such that with probability for each and with probability . Other elements are always in state , i.e., with probability for all . We define the objective function as

(A16)

where is the parameter specified later. We have for all . The supermodularity ratio of is

(A17)

The adaptive submodularity ratio is

(A18)

The adaptivity gap is

(A19)

For any rationals and , there exist some such that and .

Appendix D Proof for Adaptive Influence Maximization

In this section, we provide the full proof for Theorem 3. For readability, we first give a proof for the case of the linear threshold model, which is a special case of the triggering model. After that, we give a proof for the case of the triggering model.

D.1 Proof for the Linear Threshold Model

Proof of Theorem 3 in the case of the linear threshold model.

Let be the source vertices, the sink vertices, and the directed edges. For notational simplicity, assume that is a complete bipartite graph, i.e., . By setting for all edges that originally do not exist, we can assume this without loss of generality. Fix any and . It suffices to prove

(A20)

Let be the expected marginal gain obtained by activating . Below we explain that the above inequality can be separated for each ; i.e., it is enough to prove the above inequality for the case where for just one vertex and for the others. The objective function is the linear sum of the one for each : . Therefore, the above inequality is decomposed into the sum of