# Optimal Sampling Gaps for Adaptive Submodular Maximization

Running machine learning algorithms on large and rapidly growing volumes of data is often computationally expensive. One common trick to reduce the size of a data set, and thus the computational cost of machine learning algorithms, is probability sampling, which creates a sampled data set by including each data point from the original data set with a known probability. Although the benefit of running machine learning algorithms on the reduced data set is obvious, one major concern is that the performance of the solution obtained from samples might be much worse than that of the optimal solution when using the full data set. In this paper, we examine the performance loss caused by probability sampling in the context of adaptive submodular maximization. We consider a simple probability sampling method which selects each data point independently with probability r∈[0,1]. We define the sampling gap as the largest ratio between the optimal solution obtained from the full data set and the optimal solution obtained from the samples, over independence systems. Our main contribution is to show that if the utility function is policywise submodular, then for a given sampling rate r, the sampling gap is both upper bounded and lower bounded by 1/r. One immediate implication of our result is that if we can find an α-approximation solution based on a sampled data set (which is sampled at sampling rate r), then this solution achieves an αr approximation ratio for the original problem when using the full data set. We also show that the property of policywise submodularity can be found in a wide range of real-world applications, including pool-based active learning and adaptive viral marketing.

## 1 Introduction

Many machine learning methods benefit greatly when they are fed the right volume of data. One common approach to reduce the volume of a large data set is probability sampling, which generates a sampled data set by including each data point with a known probability. However, one major concern of running an algorithm on a sampled data set is that the performance of the sampling-based solution might be much worse than that of the optimal solution when using the full data set. In this paper, we examine the performance loss caused by probability sampling in the context of adaptive submodular maximization over independence systems.

Due to the wide applicability of submodular functions, submodular maximization, whose objective is to select a group of items that maximizes a submodular function over various types of independence systems, including matroid (Nemhauser et al. 1978, Calinescu et al. 2007) and knapsack (Sviridenko 2004) constraints, has been extensively studied in the literature. Most existing studies focus on the non-adaptive setting, where each item has a deterministic state and all items must be selected at once. Recently, Golovin and Krause (2011) and Asadpour and Nazerzadeh (2016) extended this study to the adaptive setting. Golovin and Krause (2011) introduce the problem of adaptive submodular maximization, which is a stochastic variant of the classical non-adaptive submodular maximization problem. Under the adaptive setting, each item has a particular state which is unknown initially. One must pick an item before observing its realized state. An adaptive policy can be represented using a decision tree which specifies which item to pick next based on the realizations observed so far.

One classical example of adaptive submodular maximization is sensor selection. In this example, we would like to select a limited number of sensors to monitor some targets (Asadpour and Nazerzadeh 2016). The state of each sensor has two possible values: failure or normal. One must select a sensor before observing its realized state. We further assume that each sensor can monitor a known set of targets if it is in the normal state; otherwise, it fails to monitor any targets. Our objective is to adaptively select a group of sensors that maximizes the expected total number of targets that can be monitored. A typical adaptive selection policy works as follows: select the first sensor and observe its state, then select the second sensor based on the past outcome, and iterate this process until $k$ sensors have been selected.

Golovin and Krause (2011) develop a simple adaptive greedy algorithm that achieves a $(1-1/e)$ approximation for the problem of cardinality constrained adaptive submodular maximization. Their algorithm starts with an empty set, and in each iteration, it selects an item with the largest marginal utility on top of the current observation. This algorithm requires $O(nk)$ value oracle queries, where $n$ is the size of the ground set and $k$ is the cardinality constraint. However, evaluating the marginal utility of an item is expensive in many data-intensive applications, making the standard greedy algorithm infeasible in practice. One natural idea to reduce the computational cost of any machine learning algorithm is to run it on a reduced ground set that is sampled from the full set. One major concern is that the output restricted to the sampled data set might be much worse than the optimal solution when using the full data set. This raises the sampling gap question:

What is the maximum ratio between the expected utility of the optimal solution (over independence systems) when using the sampled data set and the optimal solution when using the full data set?

If this sampling gap is small, then we can focus on finding a good solution based on the sampled data set, enjoying a reduced computational cost at a small performance loss. In this work, we consider a simple sampling method that selects each item from the full set independently with probability $r$. Our objective is to examine the performance loss due to probability sampling in the context of adaptive submodular maximization.
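The two ingredients discussed so far, independent sampling of the ground set and adaptive greedy selection, can be sketched for the sensor-selection example as follows. This is a minimal illustration and not the authors' implementation: the names `sample_ground_set`, `adaptive_greedy`, `covers`, `p_normal`, and `observe` are hypothetical, and sensor states are assumed to be independent so that the expected marginal gain has a closed form.

```python
import random

def sample_ground_set(items, r, rng):
    """Keep each item independently with probability r."""
    return [e for e in items if rng.random() < r]

def adaptive_greedy(items, covers, p_normal, k, observe):
    """Pick up to k sensors; each pick maximizes expected newly covered targets.

    covers[e]   : targets sensor e monitors when it is in the normal state
    p_normal[e] : prior probability that e is normal (states assumed independent)
    observe(e)  : reveals the state of e after it is picked (True = normal)
    """
    covered, chosen = set(), []
    remaining = list(items)
    for _ in range(min(k, len(remaining))):
        # Expected marginal gain on top of the observations made so far.
        best = max(remaining, key=lambda e: p_normal[e] * len(covers[e] - covered))
        remaining.remove(best)
        chosen.append(best)
        if observe(best):
            covered |= covers[best]
    return chosen, covered

# Hypothetical instance: three sensors, integer-labelled targets.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}
p_normal = {"a": 0.5, "b": 0.5, "c": 0.9}
true_state = {"a": True, "b": True, "c": True}  # hidden until a sensor is picked

# r = 1.0 keeps every sensor; a smaller r shrinks the candidate pool.
sampled = sample_ground_set(sorted(covers), r=1.0, rng=random.Random(0))
chosen, covered = adaptive_greedy(sampled, covers, p_normal, k=2, observe=true_state.get)
```

Each greedy step scans the remaining candidates once, so running on a sampled pool reduces the number of marginal-utility evaluations roughly in proportion to the sampling rate.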

Overview of Results. We first introduce a class of stochastic functions, called policywise submodular functions. Policywise submodularity refers to the property of diminishing returns over optimal policies, and we show that this property can be found in a wide range of real-world applications, including pool-based active learning and adaptive viral marketing. Our main contribution is to show that if the utility function is policywise submodular, then for a given sampling rate $r$, the sampling gap, i.e., the maximum ratio between the optimal solution based on the full data set and the optimal solution based on samples, over independence systems, is both upper bounded and lower bounded by $1/r$. One major implication of our result is that if we can find an $\alpha$-approximation solution based on a sampled data set (which is sampled at sampling rate $r$), then this solution achieves an $\alpha r$ approximation ratio for the original problem when using the full data set.

## 2 Related Works

The adaptive variant of submodular maximization has been extensively studied in the literature (Chen and Krause 2013, Tang and Yuan 2020, Tang 2020, Yuan and Tang 2017, Fujii and Sakaue 2019, Gabillon et al. 2013, Golovin et al. 2010, Alaei et al. 2021). For the case of maximizing an adaptive monotone and adaptive submodular function subject to a cardinality constraint, Golovin and Krause (2011) develop a simple adaptive greedy policy that achieves a tight $(1-1/e)$ approximation ratio. For the non-monotone case, Tang (2021) proposes an adaptive random greedy algorithm that achieves a $1/e$ approximation ratio. Very recently, Amanatidis et al. (2020) consider the problem of maximizing a non-monotone adaptive submodular and pointwise submodular function subject to a knapsack constraint, and they develop a constant approximation solution for this case. Given the rapid growth of data volume, much recent research in submodular maximization has explored the possibility of developing fast and practical algorithms (Leskovec et al. 2007, Badanidiyuru and Vondrák 2014, Mirzasoleiman et al. 2016, Ene and Nguyen 2018, Mirzasoleiman et al. 2015, Tang 2021), and many of them have adopted random sampling to reduce the computational cost. However, all these results are both problem- and algorithm-dependent, and do not extend to arbitrary algorithms and constraints. Our study complements the existing studies by establishing a general framework for measuring the performance loss caused by probability sampling in the context of adaptive submodular maximization. Our main technical contribution is to obtain optimal bounds on the sampling gap for maximizing a policywise submodular function over independence systems.

## 3 Preliminaries

We start by introducing some important notation. In the rest of this paper, we use $|S|$ to denote the cardinality of a set $S$. All missing proofs are deferred to the appendix.

### 3.1 Independence System

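An independence system on the set $V$ is a collection of subsets $\mathcal{I}\subseteq 2^V$ of $V$ such that:

1. $\emptyset\in\mathcal{I}$;

2. $\mathcal{I}$, whose members are called the independent sets, is downward-closed, that is, $A\in\mathcal{I}$ and $B\subseteq A$ imply that $B\in\mathcal{I}$.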

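Examples of independence systems include matroid, knapsack, matching, and independent set. The upper rank $u(\mathcal{I})$ of an independence system $\mathcal{I}$ on $V$ is defined as the size of the largest subset from $\mathcal{I}$, i.e., $u(\mathcal{I})=\max_{S\in\mathcal{I}}|S|$. For any two sets $R\subseteq V$ and $S\subseteq V$ such that $R\cap S=\emptyset$, let $\mathcal{I}_R^S=\{A\subseteq R\mid A\cup S\in\mathcal{I}\}$.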

We next present three useful properties of any independence system . These properties will be used later to derive the main results of this paper. Consider any two sets and such that , we have is an independence system and .

Assume . For any item such that , we have is an independence system and .

Consider any three sets , , and such that and , we have .
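The definitions of this subsection can be checked by brute force on a tiny ground set. The sketch below is illustrative only; the helper names are hypothetical, and `restrict` encodes an assumed reading of the restricted system, namely the subsets $A$ of $R$ for which $A\cup S$ is independent.

```python
from itertools import combinations

def powerset(ground):
    """All subsets of `ground`, as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def is_independence_system(family):
    """Check that the empty set is present and the family is downward-closed."""
    fam = set(family)
    if frozenset() not in fam:
        return False
    return all(sub in fam for a in fam for sub in powerset(a))

def upper_rank(family):
    """Size of the largest independent set."""
    return max(len(a) for a in family)

def restrict(family, R, S):
    """Assumed reading of the restricted system: subsets A of R with A | S independent."""
    fam = set(family)
    return {a for a in powerset(R) if (a | frozenset(S)) in fam}

# Cardinality constraint "at most 2 items" on V = {1, 2, 3}.
V = {1, 2, 3}
card2 = {a for a in powerset(V) if len(a) <= 2}
after_picking_1 = restrict(card2, R={2, 3}, S={1})
```

Here `after_picking_1` contains only the empty set and the two singletons, so committing to one item lowers the upper rank from 2 to 1, which is the kind of rank decrease an induction on the upper rank relies on.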

### 3.2 Items and States

We consider a set $V$ of items, where each item $e\in V$ is in a particular state, which is unknown initially, from $O$. We use a function $\phi: V\rightarrow O$, called a realization, to represent the states of all items, where $\phi(e)$ represents the realization of item $e$. Let $\Phi=\{\Phi(e)\mid e\in V\}$ denote the random realizations of $V$, where $\Phi(e)\in O$ is a random realization of $e$. For any $S\subseteq V$, let $\Phi(S)$ denote the random realizations of $S$. There is a known prior probability distribution $p(\phi)=\Pr[\Phi=\phi]$ over realizations. One must select an item $e$ before observing the value of its realization $\Phi(e)$. After selecting a set of items, we are able to observe a partial realization $\psi$ of those items' states. The domain $\mathrm{dom}(\psi)$ of a partial realization $\psi$ is the set of all items involved in $\psi$. A partial realization $\psi$ is said to be consistent with a realization $\phi$, denoted $\phi\sim\psi$, if they are equal everywhere in $\mathrm{dom}(\psi)$. A partial realization $\psi$ is said to be a subrealization of another partial realization $\psi'$, denoted $\psi\subseteq\psi'$, if $\mathrm{dom}(\psi)\subseteq\mathrm{dom}(\psi')$ and they are equal everywhere in the domain of $\psi$. Given a partial realization $\psi$, denote by $p(\phi\mid\psi)$ the conditional distribution over realizations conditioned on $\psi$: $p(\phi\mid\psi)=\Pr[\Phi=\phi\mid\Phi\sim\psi]$.
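A small sketch may help fix the notions of consistency, subrealization, and conditioning on a partial realization. It assumes, purely for illustration, that realizations and partial realizations are Python dictionaries from items to states and that the prior is an explicit list of `(realization, probability)` pairs; none of these representation choices come from the paper.

```python
def consistent(phi, psi):
    """phi ~ psi: the full realization agrees with psi on dom(psi)."""
    return all(phi[e] == o for e, o in psi.items())

def is_subrealization(psi_a, psi_b):
    """psi_a is a subrealization of psi_b: smaller domain, equal states on it."""
    return all(e in psi_b and psi_b[e] == o for e, o in psi_a.items())

def condition(prior, psi):
    """Return p(phi | psi) as a list of (realization, probability) pairs."""
    kept = [(phi, p) for phi, p in prior if consistent(phi, psi)]
    total = sum(p for _, p in kept)
    return [(phi, p / total) for phi, p in kept]

# Hypothetical prior over two items with correlated binary states.
prior = [
    ({"a": 0, "b": 0}, 0.4),
    ({"a": 0, "b": 1}, 0.1),
    ({"a": 1, "b": 0}, 0.2),
    ({"a": 1, "b": 1}, 0.3),
]
posterior = condition(prior, {"a": 1})
```

After observing that item `a` is in state 1, only the last two realizations survive, with renormalized probabilities 0.4 and 0.6.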

### 3.3 Policies

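Any adaptive policy can be represented as a function $\pi$ that maps a set of observations to a distribution $\mathcal{P}(V)$ of $V$: $\pi: 2^V\times O^V\rightarrow\mathcal{P}(V)$. It specifies which item to select next based on the past outcomes. There is a utility function $f$ from a subset of items and their states to a non-negative real number: $f: 2^V\times O^V\rightarrow\mathbb{R}_{\geq 0}$. Let the random variable $V(\pi,\phi)$ denote the subset of items selected by $\pi$ under realization $\phi$. The expected utility $f_{avg}(\pi)$ of a policy $\pi$ can be written as

$$f_{avg}(\pi)=\mathbb{E}_{\Phi\sim p(\phi),\Pi}\,f(V(\pi,\Phi),\Phi) \tag{1}$$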

The expectation is taken over the realization and the random output of the policy.

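[Conditional Expected Marginal Utility of a Policy] Given a utility function $f$, the conditional expected marginal utility $f_{avg}(\pi\mid\psi)$ of a policy $\pi$ on top of a partial realization $\psi$ is $f_{avg}(\pi\mid\psi)=\mathbb{E}_{\Phi,\Pi}[f(V(\pi,\Phi)\cup\mathrm{dom}(\psi),\Phi)-f(\mathrm{dom}(\psi),\Phi)\mid\Phi\sim\psi]$, where the expectation is taken over (1) realizations $\Phi$ with respect to $p(\phi\mid\psi)$, and (2) the random output of the policy.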

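[$\mathcal{I}_R^S$-restricted Policy] Consider two sets $R\subseteq V$ and $S\subseteq V$ such that $R\cap S=\emptyset$. A policy $\pi$ is $\mathcal{I}_R^S$-restricted if $V(\pi,\phi)\in\mathcal{I}_R^S$ for every realization $\phi$. Let $\Omega(\mathcal{I}_R^S)$ denote the set of all $\mathcal{I}_R^S$-restricted policies.

[Optimal $\mathcal{I}_R^S$-restricted Policy on top of $\psi$] Consider two sets $R\subseteq V$ and $S\subseteq V$ such that $R\cap S=\emptyset$, and a partial realization $\psi$. Define $\pi^*(\mathcal{I}_R^S,\psi)$ as the optimal $\mathcal{I}_R^S$-restricted policy on top of $\psi$, i.e.,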

$$\pi^*(\mathcal{I}_R^S,\psi)\in\arg\max_{\pi\in\Omega(\mathcal{I}_R^S)}f_{avg}(\pi\mid\psi) \tag{2}$$

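Based on the above notation, $\pi^*(\mathcal{I}_V^{\emptyset},\emptyset)$ represents the best policy over an independence system $\mathcal{I}$, i.e., $\pi^*(\mathcal{I}_V^{\emptyset},\emptyset)\in\arg\max_{\pi\in\Omega(\mathcal{I}_V^{\emptyset})}f_{avg}(\pi\mid\emptyset)$ or equivalently, $\pi^*(\mathcal{I}_V^{\emptyset},\emptyset)\in\arg\max_{\pi\in\Omega(\mathcal{I}_V^{\emptyset})}f_{avg}(\pi)$. In the rest of this paper, for any $T\subseteq V$, let $\pi^*_T$ denote $\pi^*(\mathcal{I}_T^{\emptyset},\emptyset)$ for short.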
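The expected utility in (1) can be evaluated by direct enumeration when the prior is small. The following sketch assumes a deterministic policy given as a Python function from the current partial realization (a dictionary) to the next item, or `None` to stop; the sensor instance and all names are hypothetical.

```python
from itertools import product

def run_policy(policy, phi, max_steps):
    """Follow a deterministic policy under realization phi; return the selected set."""
    psi = {}
    for _ in range(max_steps):
        e = policy(psi)        # next item given the observations so far, or None
        if e is None:
            break
        psi[e] = phi[e]        # the state is revealed only after selection
    return frozenset(psi)

def f_avg(policy, prior, f, max_steps):
    """Expected utility: sum over realizations of p(phi) * f(V(pi, phi), phi)."""
    return sum(p * f(run_policy(policy, phi, max_steps), phi) for phi, p in prior)

# Hypothetical sensor instance: state 1 = normal, 0 = failure, independent and fair.
covers = {"a": {1, 2}, "b": {2, 3}}
prior = [(dict(zip("ab", s)), 0.25) for s in product([0, 1], repeat=2)]

def coverage(S, phi):
    """Number of targets monitored by the selected sensors that are normal."""
    return len(set().union(*[covers[e] for e in S if phi[e] == 1]))

def try_a_then_b_on_failure(psi):
    if "a" not in psi:
        return "a"
    if psi["a"] == 0 and "b" not in psi:
        return "b"
    return None

value = f_avg(try_a_then_b_on_failure, prior, coverage, max_steps=2)
```

For this policy the enumeration gives an expected utility of 1.5: sensor `a` covers two targets half of the time, and sensor `b` is tried only after `a` fails.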

### 3.4 Policywise Submodularity and Sampling Gap

In this paper, we propose a new class of stochastic utility functions, policywise submodular functions. We define policywise submodularity as a diminishing-returns property of the expected marginal gain of the optimal policy over independence systems.

[Policywise Submodularity] A function $f$ is policywise submodular with respect to a prior $p(\phi)$ and an independence system $\mathcal{I}$ if for any two partial realizations $\psi^a$ and $\psi^b$ such that $\psi^a\subseteq\psi^b$ and $\mathrm{dom}(\psi^b)\in\mathcal{I}$, and any $R\subseteq V$ such that $R\cap\mathrm{dom}(\psi^b)=\emptyset$, we have

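$$f_{avg}(\pi^*(\mathcal{I}_R^{\mathrm{dom}(\psi^b)},\psi^a)\mid\psi^a)\geq f_{avg}(\pi^*(\mathcal{I}_R^{\mathrm{dom}(\psi^b)},\psi^b)\mid\psi^b) \tag{3}$$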

Later we show that the above property can be found in many real-world applications. For example, a variety of objective functions used in active learning, including generalized binary search (Golovin and Krause 2011), EC (Golovin et al. 2010), ALuMA (Gonen et al. 2013), and the maximum Gibbs error criterion (Cuong et al. 2013), are policywise submodular. This property can also be found in other applications wherever the states of the items are independent (Asadpour and Nazerzadeh 2016). Moreover, we prove that the utility function of adaptive viral marketing (Golovin and Krause 2011) is also policywise submodular.

Next we introduce the concept of sampling gap, which is defined as the ratio of the optimal solution obtained from the full data set and the optimal solution obtained from the sampled data set, over independence systems. The sampling gap measures the performance loss of the optimal solution due to probability sampling. Intuitively, for a given sampling rate $r$, a smaller sampling gap indicates less performance loss due to probability sampling.

[Sampling Gap] Let $T$ be a random subset of $V$ where each item is included in $T$ independently with probability $r$. Define the sampling gap at sampling rate $r$ as the largest (worst-case instance of $(f,p(\phi),V,\mathcal{I})$) ratio of the expected utility of the optimal policy when using the full ground set $V$ and the expected utility of the optimal policies when using the sampled set $T$:

$$\max_{(f,p(\phi),V,\mathcal{I})}\frac{f_{avg}(\pi^*_V)}{\mathbb{E}_T[f_{avg}(\pi^*_T)]} \tag{4}$$

In this paper, we restrict our attention to the case of maximizing a policywise submodular function over independence systems. That is, our goal is to provide an answer to the following question:

Given that $f$ is policywise submodular with respect to a prior $p(\phi)$ and an independence system $\mathcal{I}$, what is the sampling gap at sampling rate $r$ for any given $r\in[0,1]$?
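For very small instances the two quantities in the ratio (4) can be computed exactly, which gives a concrete feel for the question. The sketch below is a brute-force illustration under stated assumptions, not an algorithm from the paper: the prior is an explicit list, the constraint is passed as a hypothetical `feasible` predicate, and the instance is a sensor-coverage function with independent states (the setting of Application 2).

```python
from itertools import combinations, product

def optimal_value(T, psi, prior, f, feasible):
    """Best expected utility reachable from partial realization psi using items of T."""
    live = [(phi, p) for phi, p in prior if all(phi[e] == o for e, o in psi.items())]
    mass = sum(p for _, p in live)
    chosen = frozenset(psi)
    best = sum(p * f(chosen, phi) for phi, p in live) / mass  # value of stopping now
    for e in T - chosen:
        if not feasible(chosen | {e}):
            continue
        value = 0.0
        for o in {phi[e] for phi, _ in live}:  # possible states of e given psi
            w = sum(p for phi, p in live if phi[e] == o) / mass
            value += w * optimal_value(T, {**psi, e: o}, prior, f, feasible)
        best = max(best, value)
    return best

def expected_sampled_value(V, r, prior, f, feasible):
    """E_T[f_avg(pi*_T)] when each item of V enters T independently with probability r."""
    total = 0.0
    for k in range(len(V) + 1):
        for T in combinations(sorted(V), k):
            weight = r ** k * (1 - r) ** (len(V) - k)
            total += weight * optimal_value(frozenset(T), {}, prior, f, feasible)
    return total

# Hypothetical instance: three sensors, fair independent states, pick at most two.
covers = {"a": {1, 2}, "b": {2, 3}, "c": {4}}
items = sorted(covers)
prior = [(dict(zip(items, s)), 0.5 ** 3) for s in product([0, 1], repeat=3)]

def coverage(S, phi):
    return len(set().union(*[covers[e] for e in S if phi[e] == 1]))

def at_most_two(S):
    return len(S) <= 2

r = 0.5
full = optimal_value(frozenset(items), {}, prior, coverage, at_most_two)
sampled = expected_sampled_value(items, r, prior, coverage, at_most_two)
ratio = full / sampled
```

On this instance the full-set optimum is 1.75 and the expected sampled optimum is 1.125, so the observed ratio is about 1.56, below $1/r=2$.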

## 4 Sampling Gaps for a Policywise Submodular Function

In this section we provide our main result, the optimal sampling gap for policywise submodular functions over independence systems. We prove the upper bound of Theorem 4 in Section 4.1 and the lower bound of Theorem 4 in Section 4.2.

The sampling gap at sampling rate $r$ for maximizing a policywise submodular function over independence systems is exactly $1/r$.

### 4.1 Upper Bound of 1/r

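We first present an upper bound of the sampling gap over independence systems. Note that for an arbitrary partial realization $\psi$ such that $\mathrm{dom}(\psi)\in\mathcal{I}$, Lemma 3.1 implies that $\mathcal{I}_{V\setminus\mathrm{dom}(\psi)}^{\mathrm{dom}(\psi)}$ is an independence system. Before presenting the main theorem, we first prove the following technical lemma.

Suppose $f$ is policywise submodular with respect to a prior $p(\phi)$ and an independence system $\mathcal{I}$. For an arbitrary partial realization $\psi$ such that $\mathrm{dom}(\psi)\in\mathcal{I}$, $f$ is also policywise submodular with respect to a prior $p(\phi\mid\psi)$ and an independence system $\mathcal{I}_{V\setminus\mathrm{dom}(\psi)}^{\mathrm{dom}(\psi)}$.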

Now we are ready to present the main theorem of this paper.

Let $T$ be a random subset of $V$ where each item is included in $T$ independently with probability $r$. Suppose $f$ is policywise submodular with respect to a prior $p(\phi)$ and an independence system $\mathcal{I}$; then we have

$$\mathbb{E}_T[f_{avg}(\pi^*_T)]\geq(1-r)f(\emptyset)+r f_{avg}(\pi^*_V)$$

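Proof: We prove this theorem by induction on the upper rank $u(\mathcal{I})$ of the independence system $\mathcal{I}$.

For the base case when $u(\mathcal{I})=0$, we have $f_{avg}(\pi^*_T)=f(\emptyset)$ for any $T\subseteq V$ and $f_{avg}(\pi^*_V)=f(\emptyset)$. Hence, $\mathbb{E}_T[f_{avg}(\pi^*_T)]=f(\emptyset)=(1-r)f(\emptyset)+r f_{avg}(\pi^*_V)$.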

Assume the statement holds for all independence systems $\mathcal{I}$ such that $u(\mathcal{I})\leq k$; we next prove that it holds for all independence systems with $u(\mathcal{I})=k+1$. Assume $s$ is the root of the decision tree of $\pi^*_V$, i.e., $s$ is the first item selected by $\pi^*_V$. We next construct a policy $\pi_T$ such that

• If $s\in T$, $\pi_T$ first selects $s$, then adopts the optimal $\mathcal{I}_{T\setminus\{s\}}^{\{s\}}$-restricted policy on top of $\Phi(s)$.

• If $s\notin T$, $\pi_T$ adopts the optimal $\mathcal{I}_T^{\{s\}}$-restricted policy on top of $\emptyset$.

We first note that $\pi_T$ is a feasible $\mathcal{I}_T^{\emptyset}$-restricted policy (Lemma 4.1).

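Because $\pi^*_T$ is an optimal $\mathcal{I}_T^{\emptyset}$-restricted policy and $\pi_T$ is a feasible $\mathcal{I}_T^{\emptyset}$-restricted policy (Lemma 4.1), we have $f_{avg}(\pi^*_T)\geq f_{avg}(\pi_T)$ for any $T\subseteq V$. Hence, $\mathbb{E}_T[f_{avg}(\pi^*_T)]\geq\mathbb{E}_T[f_{avg}(\pi_T)]$. To prove this theorem, it suffices to show that

$$\mathbb{E}_T[f_{avg}(\pi_T)]\geq(1-r)f(\emptyset)+r f_{avg}(\pi^*_V) \tag{5}$$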

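Hence, we next focus on proving (5). We first compute the expected utility of $\pi_T$ for a given $T$.

In the case of $s\in T$, we have

$$f_{avg}(\pi_T)=f(\emptyset)+f(s\mid\emptyset)+\mathbb{E}_{\Phi(s)}\big[f_{avg}(\pi^*(\mathcal{I}_{T\setminus\{s\}}^{\{s\}},\Phi(s))\mid\Phi(s))\big] \tag{6}$$

In the case of $s\notin T$, we have

$$
\begin{aligned}
f_{avg}(\pi_T) &= f(\emptyset)+f_{avg}(\pi^*(\mathcal{I}_T^{\{s\}},\emptyset)\mid\emptyset) && (7)\\
&\geq f(\emptyset)+\mathbb{E}_{\Phi(s)}\big[f_{avg}(\pi^*(\mathcal{I}_T^{\{s\}},\Phi(s))\mid\Phi(s))\big] && (8)
\end{aligned}
$$

The inequality holds because $f$ is policywise submodular with respect to $p(\phi)$ and $\mathcal{I}$. Taking the expectation over $T$, we next bound the expected utility of $\pi_T$.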

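$$
\begin{aligned}
\mathbb{E}_T[f_{avg}(\pi_T)]
&= r\Big(f(\emptyset)+f(s\mid\emptyset)+\mathbb{E}_{\Phi(s),T}\big[f_{avg}(\pi^*(\mathcal{I}_{T\setminus\{s\}}^{\{s\}},\Phi(s))\mid\Phi(s))\,\big|\,s\in T\big]\Big) && (10)\\
&\quad+(1-r)\Big(f(\emptyset)+\mathbb{E}_T\big[f_{avg}(\pi^*(\mathcal{I}_T^{\{s\}},\emptyset)\mid\emptyset)\,\big|\,s\notin T\big]\Big)\\
&\geq r\Big(f(\emptyset)+f(s\mid\emptyset)+\mathbb{E}_{\Phi(s),T}\big[f_{avg}(\pi^*(\mathcal{I}_{T\setminus\{s\}}^{\{s\}},\Phi(s))\mid\Phi(s))\,\big|\,s\in T\big]\Big) && (12)\\
&\quad+(1-r)\Big(f(\emptyset)+\mathbb{E}_{\Phi(s),T}\big[f_{avg}(\pi^*(\mathcal{I}_T^{\{s\}},\Phi(s))\mid\Phi(s))\,\big|\,s\notin T\big]\Big)\\
&= r\Big(f(\emptyset)+f(s\mid\emptyset)+\mathbb{E}_{\Phi(s),T}\big[f_{avg}(\pi^*(\mathcal{I}_{T\setminus\{s\}}^{\{s\}},\Phi(s))\mid\Phi(s))\big]\Big) && (14)\\
&\quad+(1-r)\Big(f(\emptyset)+\mathbb{E}_{\Phi(s),T}\big[f_{avg}(\pi^*(\mathcal{I}_{T\setminus\{s\}}^{\{s\}},\Phi(s))\mid\Phi(s))\big]\Big)\\
&= (1-r)f(\emptyset)+r\big(f(\emptyset)+f(s\mid\emptyset)\big)+\mathbb{E}_{\Phi(s),T}\big[f_{avg}(\pi^*(\mathcal{I}_{T\setminus\{s\}}^{\{s\}},\Phi(s))\mid\Phi(s))\big] && (15)\\
&\geq (1-r)f(\emptyset)+r\Big(f(\emptyset)+f(s\mid\emptyset)+\mathbb{E}_{\Phi(s)}\big[f_{avg}(\pi^*(\mathcal{I}_{V\setminus\{s\}}^{\{s\}},\Phi(s))\mid\Phi(s))\big]\Big) && (16)\\
&= (1-r)f(\emptyset)+r f_{avg}(\pi^*_V) && (17)
\end{aligned}
$$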

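The first inequality is due to (6) and (8). The second inequality is due to the observation that for any $\Phi(s)$,

$$\mathbb{E}_T\big[f_{avg}(\pi^*(\mathcal{I}_{T\setminus\{s\}}^{\{s\}},\Phi(s))\mid\Phi(s))\big]\geq r f_{avg}(\pi^*(\mathcal{I}_{V\setminus\{s\}}^{\{s\}},\Phi(s))\mid\Phi(s)) \tag{18}$$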

(18) follows from the inductive assumption based on the following three facts: (1) (Lemma 3.1), (2) is policywise submodular with respect to a prior and an independence system for any (Lemma 4.1), and (3) for any .

Theorem 4.1 implies the following two corollaries.

Given that $f$ is policywise submodular with respect to a prior $p(\phi)$ and an independence system $\mathcal{I}$, the sampling gap at sampling rate $r$ is upper bounded by $1/r$, i.e.,

$$\max_{(f,p(\phi),V,\mathcal{I})\in\Lambda}\frac{f_{avg}(\pi^*_V)}{\mathbb{E}_T[f_{avg}(\pi^*_T)]}\leq\frac{1}{r} \tag{19}$$

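where $\Lambda$ denotes the set of all instances that satisfy the aforementioned condition, i.e., $\Lambda=\{(f,p(\phi),V,\mathcal{I})\mid f$ is policywise submodular with respect to $p(\phi)$ and $\mathcal{I}\}$. The following corollary shows that if we can find an approximation solution over a sampled data set, then this solution achieves a bounded approximation ratio for the original problem when using the full data set.

If $f$ is policywise submodular with respect to a prior $p(\phi)$ and an independence system $\mathcal{I}$, and there exists an $\alpha$-approximation $\mathcal{I}_T^{\emptyset}$-restricted policy $\pi^{\alpha}_T$ for every $T\subseteq V$, i.e., $f_{avg}(\pi^{\alpha}_T)\geq f_{avg}(\pi^*_T)/\alpha$, then

$$\frac{f_{avg}(\pi^*_V)}{\mathbb{E}_T[f_{avg}(\pi^{\alpha}_T)]}\leq\frac{\alpha}{r} \tag{20}$$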
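The bound in (20) follows from Theorem 4.1 in a few steps. The derivation below is a sketch under two assumptions that are not spelled out in the extracted text: the approximation guarantee is read as $f_{avg}(\pi^{\alpha}_T)\geq f_{avg}(\pi^*_T)/\alpha$ with $\alpha\geq 1$, and $f(\emptyset)\geq 0$ because the utility is non-negative.

```latex
\begin{aligned}
\mathbb{E}_T\big[f_{avg}(\pi^{\alpha}_T)\big]
  &\geq \frac{1}{\alpha}\,\mathbb{E}_T\big[f_{avg}(\pi^*_T)\big]
     && \text{(approximation guarantee for every } T\text{)} \\
  &\geq \frac{1}{\alpha}\Big((1-r)\,f(\emptyset) + r\,f_{avg}(\pi^*_V)\Big)
     && \text{(Theorem 4.1)} \\
  &\geq \frac{r}{\alpha}\,f_{avg}(\pi^*_V)
     && \text{(since } f(\emptyset)\geq 0\text{)}
\end{aligned}
```

Rearranging the last line gives a ratio of at most $\alpha/r$ between the full-set optimum and the expected utility of the approximate policy on the sample.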

### 4.2 Lower Bound of 1/r

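In this section we present a policywise submodular function and a simple cardinality constraint for which the sampling gap at sampling rate $r$ is $1/r$. Assume $V=\{e\}$, i.e., the ground set contains only one item, and $O=\{o\}$, i.e., there is only one state. Define $f(\emptyset,\phi)=0$ and $f(\{e\},\phi)=1$. Moreover, $\mathcal{I}=\{\emptyset,\{e\}\}$. We first show that $f$ is policywise submodular with respect to $p(\phi)$ and $\mathcal{I}$. Because there is a single item with a single state, inequality (3) can be verified directly for every admissible choice of $\psi^a$, $\psi^b$, and $R$, which implies that $f$ is policywise submodular with respect to $p(\phi)$ and $\mathcal{I}$. We next show that the sampling gap at sampling rate $r$ is at least $1/r$. Let $T$ be a random set where $e$ appears with probability $r$. In the case of $T=\{e\}$, $\pi^*_T$ selects $e$, thus $f_{avg}(\pi^*_T)=1$. In the case of $T=\emptyset$, $\pi^*_T$ selects $\emptyset$, thus $f_{avg}(\pi^*_T)=0$. Thus, $\mathbb{E}_T[f_{avg}(\pi^*_T)]=r$. Moreover, $\pi^*_V$ always selects $e$, thus $f_{avg}(\pi^*_V)=1$. Hence, the sampling gap at sampling rate $r$ is lower bounded by $1/r$.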
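The arithmetic of this single-item instance is short enough to spell out. The sketch assumes the utility values $f(\emptyset)=0$ and $f(\{e\})=1$; any positive value for $f(\{e\})$ would give the same ratio.

```python
def single_item_gap(r):
    """Sampling gap for a one-item ground set with f(empty) = 0 and f({e}) = 1 (assumed)."""
    full = 1.0                          # with the full set, the item is always selected
    sampled = r * 1.0 + (1 - r) * 0.0   # the item is available only when it is sampled
    return full / sampled

gap = single_item_gap(0.25)
```

With $r=0.25$ the gap equals 4, matching $1/r$.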

## 5 Applications

We next show that the property of policywise submodularity can be found in a wide range of real-world applications. We start by introducing two well-studied classes of stochastic functions. Then we relate policywise submodularity to these classes.

[Policy-adaptive Submodularity] A function $f$ is policy-adaptive submodular with respect to $p(\phi)$ if for any two partial realizations $\psi^a$ and $\psi^b$ such that $\psi^a\subseteq\psi^b$, and any policy $\pi$ such that $V(\pi,\phi)\subseteq V\setminus\mathrm{dom}(\psi^b)$ for any realization $\phi\sim\psi^b$, we have

$$f_{avg}(\pi\mid\psi^a)\geq f_{avg}(\pi\mid\psi^b) \tag{21}$$

[Adaptive Submodularity] A function $f$ is adaptive submodular with respect to $p(\phi)$ if for any two partial realizations $\psi^a$ and $\psi^b$ such that $\psi^a\subseteq\psi^b$, and any item $e\in V\setminus\mathrm{dom}(\psi^b)$, we have

$$f_{avg}(\{e\}\mid\psi^a)\geq f_{avg}(\{e\}\mid\psi^b) \tag{22}$$

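Our next lemma shows that policy-adaptive submodularity implies both adaptive submodularity and policywise submodularity. Later, we show that policy-adaptive submodularity is a strictly stronger condition than policywise submodularity.

If $f$ is policy-adaptive submodular with respect to $p(\phi)$, then $f$ is both adaptive submodular with respect to $p(\phi)$ and policywise submodular with respect to $p(\phi)$ and any independence system $\mathcal{I}$.

Proof: Because selecting a single item is also a policy, policy-adaptive submodularity implies adaptive submodularity. We next prove that policy-adaptive submodularity implies policywise submodularity. Consider any two partial realizations $\psi^a$ and $\psi^b$ such that $\psi^a\subseteq\psi^b$ and $\mathrm{dom}(\psi^b)\in\mathcal{I}$, and any $R\subseteq V$ such that $R\cap\mathrm{dom}(\psi^b)=\emptyset$. Because $f$ is policy-adaptive submodular with respect to $p(\phi)$, we have

$$f_{avg}(\pi^*(\mathcal{I}_R^{\mathrm{dom}(\psi^b)},\psi^b)\mid\psi^a)\geq f_{avg}(\pi^*(\mathcal{I}_R^{\mathrm{dom}(\psi^b)},\psi^b)\mid\psi^b) \tag{23}$$

Based on the definition of $\pi^*(\mathcal{I}_R^{\mathrm{dom}(\psi^b)},\psi^a)$, we have $f_{avg}(\pi^*(\mathcal{I}_R^{\mathrm{dom}(\psi^b)},\psi^a)\mid\psi^a)\geq f_{avg}(\pi^*(\mathcal{I}_R^{\mathrm{dom}(\psi^b)},\psi^b)\mid\psi^a)$. Together with (23), we have $f_{avg}(\pi^*(\mathcal{I}_R^{\mathrm{dom}(\psi^b)},\psi^a)\mid\psi^a)\geq f_{avg}(\pi^*(\mathcal{I}_R^{\mathrm{dom}(\psi^b)},\psi^b)\mid\psi^b)$. Hence, $f$ is policywise submodular with respect to $p(\phi)$ and any independence system $\mathcal{I}$.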

Thanks to the recent progress in adaptive submodular maximization (Golovin and Krause 2011, Tang 2021), there exist efficient solutions for maximizing an adaptive submodular function subject to many practical constraints, including matroid and knapsack constraints. Hence, Lemma 5, together with Corollary 4.1, implies that if a function is policy-adaptive submodular with respect to $p(\phi)$, then running existing algorithms on the sampled ground set has comparable performance to running them on the full set.

We next discuss three representative applications whose objective functions satisfy policywise submodularity. The objective functions in the first two applications are policy-adaptive submodular, which implies both adaptive submodularity and policywise submodularity. Although the objective function in the third application does not satisfy policy-adaptive submodularity, it is still adaptive submodular and policywise submodular. This implies that policy-adaptive submodularity is a strictly stronger condition than policywise submodularity.

#### Application 1: Pool-based Active Learning (Golovin et al. 2010).

We use $H$ to denote the set of candidate hypotheses. Each hypothesis $h\in H$ represents some realization, i.e., $h: V\rightarrow O$. Let $p_H$ be a prior distribution over hypotheses. Define $p_H(H')=\sum_{h\in H'}p_H(h)$ for any $H'\subseteq H$. Then the prior distribution over realizations can be represented as $p(\phi)=p_H(\{h\in H\mid h\sim\phi\})$. The version space under observations $\psi(S)$ is defined to be $H(\psi(S))=\{h\in H\mid h\sim\psi(S)\}$. The utility function of generalized binary search under the Bayesian setting is

$$f(S,\phi)=1-p_H(H(\psi(S))) \tag{24}$$

The following proposition follows from the fact that the above utility function is policy-adaptive submodular with respect to $p(\phi)$ (Proposition A.1 in Fujii and Kashima (2016)) and Lemma 5.

The utility function of pool-based active learning is both adaptive submodular with respect to $p(\phi)$ and policywise submodular with respect to $p(\phi)$ and any independence system $\mathcal{I}$.

Moreover, many other types of objective functions of active learning, including EC (Golovin et al. 2010), ALuMA (Gonen et al. 2013), and the maximum Gibbs error criterion (Cuong et al. 2013), are both adaptive submodular and policywise submodular.
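The generalized binary search utility in (24) measures how much prior mass the observed labels eliminate from the version space. A minimal sketch, assuming hypotheses are dictionaries from queried points to labels and the prior `p_H` is a list aligned with the hypothesis list (both hypothetical representation choices):

```python
def version_space(hypotheses, psi):
    """Indices of the hypotheses that agree with every observed label in psi."""
    return [i for i, h in enumerate(hypotheses)
            if all(h[e] == o for e, o in psi.items())]

def gbs_utility(S, phi, hypotheses, p_H):
    """f(S, phi) = 1 - p_H(version space after observing phi on S)."""
    psi = {e: phi[e] for e in S}
    return 1.0 - sum(p_H[i] for i in version_space(hypotheses, psi))

# Hypothetical threshold classifiers on three ordered points, uniform prior.
hypotheses = [
    {"x1": 0, "x2": 0, "x3": 0},
    {"x1": 0, "x2": 0, "x3": 1},
    {"x1": 0, "x2": 1, "x3": 1},
    {"x1": 1, "x2": 1, "x3": 1},
]
p_H = [0.25, 0.25, 0.25, 0.25]
truth = hypotheses[2]

after_one_query = gbs_utility({"x2"}, truth, hypotheses, p_H)
after_two_queries = gbs_utility({"x1", "x2"}, truth, hypotheses, p_H)
```

Querying the middle point first removes half of the prior mass, and the second query isolates the true hypothesis, leaving a utility of 0.75.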

#### Application 2: The Case of Independent Items (Asadpour and Nazerzadeh 2016).

The property of policywise submodularity can also be found in any application in which the states of items are independent of each other. One such application is sensor selection (Golovin and Krause 2011). The following proposition follows from Lemma 5 and the fact that if the item states $\{\Phi(e)\mid e\in V\}$ are independent and $f$ is adaptive submodular with respect to $p(\phi)$, then $f$ is policy-adaptive submodular with respect to $p(\phi)$ (Proposition A.5 in Fujii and Kashima (2016)).

If the item states $\{\Phi(e)\mid e\in V\}$ are independent and $f$ is adaptive submodular with respect to $p(\phi)$, then $f$ is policywise submodular with respect to $p(\phi)$ and any independence system $\mathcal{I}$.

#### Application 3: Adaptive Viral Marketing (Golovin and Krause 2011).

The third application is the adaptive variant of viral marketing (Golovin and Krause 2011). We use a directed graph $G=(V,E)$ to represent a social network, where $V$ represents a set of individuals and $E$ represents a set of edges. Under the Independent Cascade Model (Kempe and Mahdian 2008), each edge $(u,v)\in E$ is associated with a propagation probability $p_{(u,v)}$. In step $0$, we activate a set of seeds. Then, in each subsequent step $t$, each individual $u$ that is newly activated has a single chance to activate each of its neighbors $v$; it succeeds with probability $p_{(u,v)}$. If $u$ succeeds, then $v$ becomes activated in step $t+1$. This process is iterated until no more individuals are newly activated. Under the adaptive setting (Golovin and Krause 2011), we model the state $\phi_u$ of $u$ as a function $\phi_u: E\rightarrow\{0,1,?\}$, where $\phi_u((w,v))=0$ means that selecting $u$ reveals that $(w,v)$ is blocked (i.e., $w$ fails to activate $v$), $\phi_u((w,v))=1$ means that selecting $u$ reveals that $(w,v)$ is live (i.e., $w$ succeeds in activating $v$), and $\phi_u((w,v))=?$ means that selecting $u$ cannot reveal the status of $(w,v)$. For a given set of seeds $S$ and a realization $\phi$, we define the utility $f(S,\phi)$ of $S$ given $\phi$ as the number of individuals that can be reached by at least one seed from $S$ through live edges, i.e.,

$$f(S,\phi)=\left|\{v\mid\exists u\in S,\,w\in V\setminus S,\,\phi_u((w,v))=1\}\right|+|S| \tag{25}$$
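The verbal description of this utility, the number of individuals reachable from at least one seed through live edges, amounts to a graph search over the live-edge graph. The sketch below follows that description and counts the seeds themselves as reached; representing a realization by its set of live edges is an assumption made here for illustration.

```python
from collections import deque

def influence(seeds, live_edges):
    """Number of individuals reachable from the seeds through live edges (seeds included)."""
    out = {}
    for u, v in live_edges:
        out.setdefault(u, []).append(v)
    reached, queue = set(seeds), deque(seeds)
    while queue:                      # breadth-first search from all seeds at once
        u = queue.popleft()
        for v in out.get(u, []):
            if v not in reached:
                reached.add(v)
                queue.append(v)
    return len(reached)

# Hypothetical realization: only these directed edges turned out to be live.
live = [(1, 2), (2, 3), (4, 5)]
spread_one_seed = influence({1}, live)
spread_two_seeds = influence({1, 4}, live)
```

In the adaptive setting, selecting a seed reveals the status of the edges along the live paths it triggers, so the same search also identifies which individuals are already activated before the next seed is chosen.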

The utility function of adaptive viral marketing is policywise submodular with respect to and any independence system . Proof: For any partial realization , let denote the set of seeds that are selected under , denote the set of all individuals that are activated under , and denote the set of observed edges under . Consider any two partial realizations and such that and , and any such that . Given an optimal -restricted policy and its decision tree conditioned on , we next build a -restricted policy such that .

Construction of . For each pair of partial realization conditioned on and item such that , we define for all such that and . The intuition behind is to mimic the execution of conditioned on by ignoring those individuals which can not be activated through a live path (i.e., a path composed of all live edges) within .

We first show that is a -restricted policy. Because is a -restricted policy, we have and for each pair of partial realization conditioned on and item such that . Recall that we define for all such that and , the second condition ensures that . Together with the fact that , we have that is a -restricted policy.

We next prove that . Let denote the set of individuals that are not activated under . Clearly, will only select individuals from , this is because selecting any individual from has zero marginal utility on top of . Conditioned on , let denote the realization of , excluding the states of those edges from . Following the design of , and select the same group of individuals as seeds conditioned on a given realization . Thus, for any node , if is activated by conditioned on and , which implies that there exists a live path in such that it connects with some seed selected by , then must be activated by conditioned on and . Hence, . Moreover, because , has the same distribution conditioned on both and . Thus,

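$$
\begin{aligned}
f_{avg}(\pi\mid\psi^a)
&= \sum_{\Phi_{-E(\psi^b)}(U)}\Pr[\Phi_{-E(\psi^b)}(U)\mid\psi^a]\,f_{avg}(\pi\mid\Phi_{-E(\psi^b)}(U)\cup\psi^a) && (26)\\
&\geq \sum_{\Phi_{-E(\psi^b)}(U)}\Pr[\Phi_{-E(\psi^b)}(U)\mid\psi^b]\,\mathbb{E}\big[f_{avg}(\pi^*(\mathcal{I}_R^{\mathrm{dom}(\psi^b)},\psi^b)\mid\psi^b)\,\big|\,\Phi_{-E(\psi^b)}(U)\cup\psi^b\big] && (27)\\
&= f_{avg}(\pi^*(\mathcal{I}_R^{\mathrm{dom}(\psi^b)},\psi^b)\mid\psi^b) && (28)
\end{aligned}
$$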

Because is the optimal -restricted policy on top of , we have