# Approximate Submodular Functions and Performance Guarantees

We consider the problem of maximizing non-negative non-decreasing set functions. Although most recent work focuses on exploiting submodularity, it turns out that several objectives we encounter in practice are not submodular. Nonetheless, greedy algorithms designed for submodular functions are often used to determine a solution to non-submodular problems. Hereafter, we propose to address the original problem by approximating the non-submodular function with a submodular one and analyzing the incurred error, as well as the performance trade-offs. To quantify the approximation error, we introduce the novel concept of a δ-approximation of a function, which we use to define the space of submodular functions that lie within a given approximation error. We provide necessary conditions on the existence of such δ-approximation functions, which need not be unique. Consequently, we characterize this subspace, which we refer to as the region of submodularity. Furthermore, since submodular functions are known to lead to different sub-optimality guarantees, we generalize those dependencies under a δ-approximation into the notion of greedy curvature. Finally, we use this latter notion to simplify some of the existing results and to efficiently (i.e., with linear complexity) determine tightened bounds on the sub-optimality guarantees for objective functions commonly used in practical setups, which we validate using real data.

## 1 Introduction

A multitude of problems in machine learning, control, game theory, and economics can be modeled as discrete optimization (Das et al., 2012; Tropp, 2004; Zhang, 2008; Golovin and Krause, 2011; Guillory and Bilmes, 2011; Hoi et al., 2006). Specifically, these are problems captured by the maximization of set functions subject to cardinality constraints (Xue et al., 2016; Tzoumas et al., 2018; Krause et al., 2008; Schnitzler et al., 2015; Das and Kempe, 2011). The set functions chosen as objectives can have arbitrary, problem-dependent structures. Nonetheless, in a quest to quantify sub-optimality in such discrete optimization problems, we often try to unveil structures (i.e., subclasses of functions with specific properties) that enable us either to develop efficient algorithms to determine the optimal solution, or to approximate the solution, when the problem is NP-hard, with sub-optimality guarantees. Within the latter class, there has been a surge of interest in submodular functions¹, whose approximate solution can be determined by a greedy algorithm that builds the suboptimal solution by recursively adding the element which maximizes the objective. This algorithm is known to achieve a constant performance bound (Feige et al., 2011; Nemhauser et al., 1978; Sviridenko, 2004; Buchbinder et al., 2012), which can be improved by using the concept of curvature (Conforti and Cornuejols, 1984; Iyer et al., 2013).

¹ A set function f over a ground set Ω is referred to as submodular if and only if f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B) for all sets A, B ⊆ Ω (Bach, 2013).
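As a point of reference, the greedy algorithm discussed throughout can be sketched in a few lines. The set-function interface below (a plain Python callable) is a hypothetical illustration, not code from the paper.

```python
def greedy_max(f, ground_set, k):
    """Greedily build a set of size at most k, adding the element with the
    largest marginal gain f(S ∪ {a}) − f(S) at each step."""
    S = set()
    for _ in range(k):
        candidates = [a for a in ground_set if a not in S]
        if not candidates:
            break
        best = max(candidates, key=lambda a: f(S | {a}) - f(S))
        S.add(best)
    return S
```

For submodular f this is the classical procedure behind the constant-factor guarantees cited above; for non-submodular f, its empirical success is precisely the question the paper studies.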

Notwithstanding, some objectives of interest do not possess such properties (Bian et al., 2017), e.g., Bayesian A-optimality, the determinantal function, the subset selection objective (Das and Kempe, 2011), and sparse approximation (Das and Kempe, 2008; Krause and Cevher, 2010). Subsequently, it has been proposed to use the greedy algorithm anyway, which empirically leads to good performance. Is this a coincidence, or are there implicit features that justify some of the empirical performances obtained?

This is the quest we pursue in the present work. Specifically, we seek answers to the following questions:

• Is it possible to approximate an arbitrary set function by a submodular one within a given error everywhere (i.e., for all possible sets)?

• Can we quantify the error incurred by such approximation?

• For functions with the same approximation error, can we find some functions that are preferable, in the sense that if we perform suboptimal greedy algorithms, we are guaranteed to achieve a smaller sub-optimality gap?

In this work, we will see how the submodularity ratio parameter can be used to better interpret the meaning of closeness to submodularity, and how the approximation error can be bounded. We will also see that the curvature of the submodular function plays a vital role when choosing among approximating functions that offer the same error. The techniques developed establish a relationship between non-submodular and submodular functions that may not be intuitive at first glance but can be very useful.

Main contributions.

The present paper is motivated by the above questions, and in particular, the main contributions are as follows:

δ-approximation. We propose that any set function can be approximated as a δ-approximate submodular function. This notion is used to capture the 'divergence' between a given non-submodular function and the submodular function we use as an approximation.

Region of submodularity. A novel notion of the region of submodularity is introduced, which characterizes the class of submodular functions that are δ-approximations of a given set function. This notion can be used to better understand what it means for a function to be close to submodular.

Performance bounds. We propose a novel definition of greedy curvature which unifies the existing one in (Conforti and Cornuejols, 1984). The proposed definition can be leveraged to simplify the proofs of current theorems on curvature bounds (Conforti and Cornuejols, 1984; Bian et al., 2017). Building upon it, we prove a performance bound for the greedy algorithm applied to δ-approximate submodular functions. Remarkably, the computational complexity is only linear (in contrast with the bounds available in the literature).

The performance bound will be lower than that of a submodular function. We interpret this decrease in performance as a penalty paid for deviating from submodularity, expressed in terms of the approximation error.

## 2 Preliminaries

A discrete function f defined over the ground set Ω is monotone non-decreasing if f(S) ≤ f(T) for all S ⊆ T ⊆ Ω. The marginal gain of an element a ∈ Ω with respect to a set S ⊆ Ω is defined as f_S(a) = f(S ∪ {a}) − f(S).

Throughout this work, we are concerned about cardinality constrained maximization problem, i.e.,

 max_{|S| ≤ k, S ⊆ Ω} f(S).   (1)

The objective function f in the above discrete optimization problem may be non-submodular. The optimal solution of this problem will be referred to as Ω*, such that |Ω*| ≤ k and f(Ω*) = OPT. Before stating the definition of approximate submodularity, we define the divergence between two discrete set functions.

###### Definition 1.

The divergence between two non-decreasing set functions f and g defined over the same ground set Ω is denoted as

 d(f, g) = max_{S ⊆ Ω, a ∈ Ω∖S} | f_S(a)/g_S(a) − 1 |.   (2)
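For small ground sets, the divergence in (2) can be evaluated by brute force. The callable interface below is an illustrative assumption of this sketch.

```python
from itertools import combinations

def divergence(f, g, ground):
    """d(f, g) = max over S ⊆ Ω and a ∈ Ω \\ S of |f_S(a)/g_S(a) − 1|.
    Exponential in |Ω|: for illustration on tiny ground sets only."""
    ground = list(ground)
    d = 0.0
    for r in range(len(ground) + 1):
        for combo in combinations(ground, r):
            S = set(combo)
            for a in set(ground) - S:
                f_marg = f(S | {a}) - f(S)  # marginal gain of f
                g_marg = g(S | {a}) - g(S)  # marginal gain of g
                d = max(d, abs(f_marg / g_marg - 1))
    return d
```

For example, scaling a function's marginals by a constant factor c yields a divergence of |c − 1|.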

Using the above definition of divergence, the δ-approximate submodular function is defined as follows.

###### Definition 2.

A function f, possibly non-submodular, is referred to as δ-approximate submodular if there exists a non-decreasing submodular function g such that d(f, g) ≤ δ.

The above definition can be rewritten in the form most useful for the rest of the paper as

 (1−δ)gS(a)≤fS(a)≤(1+δ)gS(a), ∀S⊆Ω,a∈Ω∖S. (3)

Intuitively, any finite-valued set function can be viewed as a δ-approximate submodular function, but we will restrict ourselves to the scenario where δ < 1 in Definition 2. It can be seen that f is exactly submodular if and only if δ = 0. For any non-negative set function, the submodularity ratio was introduced as a parameter measuring closeness to submodularity (Das and Kempe, 2011). Furthermore, this notion can be generalized to the so-called generalized submodularity ratio, as described in the next definition.

###### Definition 3 (generalized submodularity ratio (Bian et al., 2017)).

The submodularity ratio of a non-negative set function f is given by

 γ_f = min_{S,T ⊆ Ω} (∑_{t ∈ T∖S} f_S(t)) / f_S(T).   (4)
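Definition 3 can likewise be checked exhaustively on toy examples. The callable interface and the skipping of zero-marginal pairs are assumptions of this sketch.

```python
from itertools import combinations

def submodularity_ratio(f, ground):
    """γ_f = min over S, T ⊆ Ω of Σ_{t∈T\\S} f_S(t) / f_S(T),
    taken over pairs with f_S(T) > 0 and T \\ S nonempty (brute force)."""
    ground = list(ground)
    subsets = [set(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    best = float("inf")
    for S in subsets:
        for T in subsets:
            joint = f(S | T) - f(S)          # f_S(T)
            if joint <= 0 or not (T - S):
                continue
            singles = sum(f(S | {t}) - f(S) for t in T - S)
            best = min(best, singles / joint)
    return best
```

For a modular function the sum of singleton marginals always equals the joint marginal, so the ratio is exactly 1.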

The ratio γ_f is such that 0 ≤ γ_f ≤ 1, and is equal to 1 if and only if f is submodular. Another useful parameter, the curvature of submodular functions, which measures the deviation from modularity, was introduced in (Conforti and Cornuejols, 1984) to obtain better performance bounds. The total curvature α_T of a submodular function f is defined as follows.

 α_T = 1 − min_{a ∈ Ω} f_{Ω∖{a}}(a) / f({a}).   (5)
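Unlike γ_f, the total curvature in (5) needs only one pass over the ground set, which is the linear-complexity property the paper exploits later. The callable interface is again an assumption of this sketch.

```python
def total_curvature(f, ground):
    """α_T = 1 − min over a ∈ Ω of f_{Ω\\{a}}(a) / f({a}): O(|Ω|) evaluations."""
    Omega = set(ground)
    f_Omega = f(Omega)
    f_empty = f(set())
    return 1.0 - min(
        (f_Omega - f(Omega - {a})) / (f({a}) - f_empty) for a in ground
    )
```

A modular function has α_T = 0, while a strongly saturating function (e.g., the square root of the set size) has curvature close to 1.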

The total curvature is such that 0 ≤ α_T ≤ 1 and is equal to 0 for modular/additive functions. Similar to the total curvature for submodular functions, we can write a generalized curvature for any non-negative set function as

###### Definition 4 (Generalized curvature (Bian et al., 2017)).

The curvature of a non-negative set function f can be written as

 α = 1 − min_{S,T ⊆ Ω, s ∈ S∖T} f_{(S∖{s}) ∪ T}(s) / f_{S∖{s}}(s).   (6)

The generalized curvature lies between 0 and 1 for the case of non-decreasing functions, and for the particular case of f being submodular it can be verified that α = α_T.

For the greedy algorithm, let S^G = {s_1, …, s_k} denote the ordered set solution of the cardinality-constrained maximization problem and let Ω* be the optimal solution, such that |Ω*| ≤ k and f(Ω*) = OPT. We can then define the greedy curvature, in contrast to that in (Conforti and Cornuejols, 1984), as follows.

###### Definition 5 (Greedy curvature).

For a non-decreasing set function f and S^G such that |S^G| = k, the greedy curvature is defined as

 α_G = 1 − min_{1 ≤ i ≤ k} { min_{a ∈ S^G ∖ (S^G_{i−1} ∪ Ω*)} f_{S^G_{i−1} ∪ Ω*}(a) / f_{S^G_{i−1}}(a),  min_{a ∈ (S^G ∩ Ω*) ∖ S^G_{i−1}, i ≤ j ≤ k} f_{S^G_{j−1}}(s_j) / f_{S^G_{i−1}}(a) },   (7)

where S^G_i = {s_1, …, s_i} for i ≥ 1, and S^G_0 = ∅.

The greedy curvature defined above is always less than or equal to the greedy curvature defined in (Conforti and Cornuejols, 1984). The second term introduced in the above expression bounds the consecutive marginals of the greedy selection. This kind of technique will play a key role in proving performance bounds for the greedy algorithm and will simplify the proofs to a great extent. It can be easily verified that 0 ≤ α_G ≤ 1. The main results and interpretations are presented in the following section.

## 3 Results

We start by addressing the necessary conditions that a set function f must satisfy to be δ-approximate submodular. First, notice that if the function is submodular, then δ can be set to zero, and vice versa. On the other hand, if f is not submodular, then we show that δ cannot be arbitrarily close to zero. In other words, we obtain a necessary condition that establishes a 'submodularity gap'. Subsequently, we aim to characterize in further detail the properties of these functions, which leads us to introduce the notion of the region of submodularity (ROS). Within this region, there may exist different submodular functions, whose curvatures differ and directly impact the optimality guarantees. Specifically, the lower the curvature, the better the performance, as shown in (Conforti and Cornuejols, 1984). Therefore, we aim to leverage these properties to characterize the performance of the greedy algorithm and obtain one of the main results of the paper, i.e., improved constant optimality guarantees.

### 3.1 Approximate submodularity

###### Lemma 1.

A non-submodular function f with submodularity ratio γ_f can be represented as δ-approximate submodular with respect to some submodular function g only if δ ≥ (1 − γ_f)/(1 + γ_f).

###### Proof.

We prove this by contradiction. Let us assume that δ < (1 − γ_f)/(1 + γ_f), or equivalently (1 + δ)/(1 − δ) < 1/γ_f. The submodularity ratio of any g satisfying (3) can be written as

 γ_g = min_{S,T ⊆ Ω} (∑_{t ∈ T∖S} g_S(t)) / g_S(T) ≤ (1+δ)/(1−δ) · min_{S,T ⊆ Ω} (∑_{t ∈ T∖S} f_S(t)) / f_S(T) < (1/γ_f) · γ_f = 1,

where the first inequality follows from (3). Hence, g cannot be submodular. ∎

The above result gives interesting insight into what we mean by closeness of a function to being submodular, using the submodularity ratio. The value of γ_f limits the smallest possible value of δ and determines how close the function's marginals can be to those of some submodular function. The region of submodularity is defined as

 ROS(f, δ) = { g | (1 − γ_f)/(1 + γ_f) ≤ d(f, g) ≤ δ },   (8)

which is the collection of submodular functions g for which a given function f can be termed δ-approximate submodular. As shown in Figure 1, for a given f, the submodular functions in the shaded region can be used to describe it as a δ-approximation. The smallest attainable divergence is restricted by γ_f. It should be noted that multiple functions g can be used to approximate a given f as δ-approximate submodular, and hence, from the viewpoint of the performance bound, we are interested in the g with minimum total curvature for the given value of δ. Let us denote by α_δ the curvature of the selected g, i.e.,

 α_δ = min_{g ∈ ROS(f, δ)} α_T(g).   (9)

We now state the result regarding the performance of naive greedy selection for δ-approximate submodular functions.

### 3.2 Constant performance bound

It is worth mentioning the result for the performance bound of a submodular function with total curvature α_T.

###### Theorem 1 (from (Conforti and Cornuejols, 1984)).

For a non-negative non-decreasing submodular function f with total curvature α_T, if OPT denotes the optimal value of f subject to |S| ≤ k, then the output of the greedy algorithm, S^G, satisfies f(S^G) ≥ (1/α_T)(1 − (1 − α_T/k)^k) OPT.
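The curvature-dependent guarantee of Theorem 1 is easy to tabulate; the helper below is an illustrative evaluation of that closed form, with the modular limit α_T → 0 handled explicitly.

```python
def theorem1_bound(alpha_T, k):
    """(1/α_T)(1 − (1 − α_T/k)^k); tends to (1 − e^{−α_T})/α_T as k grows,
    and to 1 as α_T → 0 (the modular case, where greedy is optimal)."""
    if alpha_T == 0.0:
        return 1.0
    return (1.0 - (1.0 - alpha_T / k) ** k) / alpha_T
```

For α_T = 1 and large k this recovers the familiar 1 − 1/e ≈ 0.632 factor, and lower curvature yields a strictly better guarantee.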

Also, for any set function (may or may not be approximate), the performance of greedy can be lower bounded as explained in the next result.

###### Theorem 2 (from (Bian et al., 2017)).

Let f be a non-negative non-decreasing set function with submodularity ratio γ and curvature α. The output of the greedy algorithm satisfies f(S^G) ≥ (1/α)(1 − e^{−αγ}) OPT.

For δ-approximate submodular functions, it can be shown that the greedy selection algorithm offers a constant performance bound, as captured in the next result.

###### Theorem 3.

For a given δ-approximate submodular function f with submodularity ratio γ_f and δ such that δ ≥ (1 − γ_f)/(1 + γ_f), the greedy algorithm has a guaranteed constant performance:

 f(S^G) ≥ [1 / (α_δ (1−δ)/(1+δ) + 2δ/(1+δ))] · (1 − (1 − (1/k)(α_δ (1−δ)/(1+δ) + 2δ/(1+δ)) · (1−δ)/(1+δ))^k) · OPT.   (10)
###### Proof (sketch).

The detailed proof is provided in the Appendix. The main idea of the proof is to track the deviation of the optimal solution from the greedy solution at each step. The δ-approximation property can be exploited to bound the given function's marginals in terms of those of some submodular function, which enables us to apply the properties of submodularity. The greedy curvature defined in (7) is used to bound the successive marginals occurring at any i-th step. Formally, let us denote the output of the greedy algorithm at the i-th stage as S^G_i. The term f(Ω* ∪ S^G_i) can be expanded in two different ways as follows:

 f(Ω* ∪ S^G_i) ≥ f(Ω*) + (1 − α_G) ∑_{s_j ∈ S^G_i ∖ Ω*} f_{S^G_{j−1}}(s_j),

and

 f(Ω* ∪ S^G_i) = f(S^G_i) + f_{S^G_i}(Ω*).   (11)

The above equations can be combined and rewritten, using the δ-approximation property (3), as

 f(Ω*) ≤ α_G f(S^G_i) + (1 − α_G) ∑_{s_j ∈ S^G_i ∩ Ω*} f_{S^G_{j−1}}(s_j) + (1+δ)/(1−δ) ∑_{ω ∈ Ω* ∖ S^G_i} f_{S^G_i}(ω).

Besides, the greedy algorithm at the (i+1)-th step selects the element s_{i+1} such that

 f_{S^G_i}(s_{i+1}) = max_{a ∈ Ω ∖ S^G_i} f_{S^G_i}(a).   (12)

Therefore, the inequality above can be rewritten by upper bounding its last term using (12) as

 f(Ω*) ≤ α_G f(S^G_i) + ∑_{s_j ∈ S^G_i ∩ Ω*} {(1 − α_G) f_{S^G_{j−1}}(s_j) − f_{S^G_i}(s_{i+1})} + k (1+δ)/(1−δ) f_{S^G_i}(s_{i+1}).

The middle term in the above inequality can be eliminated by exploiting the second term in the definition of the greedy curvature in (7). The remaining inequality then reduces to a recursion whose solution is obtained using mathematical induction. Finally, we can write

 f(S^G) = ∑_{i=1}^{k} f_{S^G_{i−1}}(s_i) ≥ [1 / (2δ/(1+δ) + α_δ (1−δ)/(1+δ))] · (1 − (1 − (1/k)(2δ/(1+δ) + α_δ (1−δ)/(1+δ)) · (1−δ)/(1+δ))^k) · OPT.

For δ = 0, the above result coincides with Theorem 1 because f would be submodular. For δ > 0, the performance guarantee degrades due to the divergence from submodularity. While α_δ corresponds to the total curvature of a submodular function used to approximate the given function f, the term 2δ/(1+δ) can be viewed as the penalty paid for deviating from this submodular function.
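The bound of Theorem 3 can be evaluated numerically. The sketch below groups the recurring quantity α_δ(1−δ)/(1+δ) + 2δ/(1+δ) into one 'effective curvature' term; this grouping is our reading of the theorem, not notation from the paper.

```python
def theorem3_bound(alpha_delta, delta, k):
    """Performance guarantee of Theorem 3 for a δ-approximate submodular
    function whose best approximation has total curvature α_δ."""
    r = (1.0 - delta) / (1.0 + delta)
    c = alpha_delta * r + 2.0 * delta / (1.0 + delta)  # effective curvature
    if c == 0.0:
        return 1.0
    return (1.0 - (1.0 - (c / k) * r) ** k) / c
```

Setting δ = 0 collapses c to α_δ and r to 1, recovering the Theorem 1 expression, while the guarantee weakens monotonically as δ grows.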

It is worth mentioning that the proofs of Theorem 1 and Theorem 2 can be simplified to a large extent by using new strategies that employ our proposed unified definition of the greedy curvature in (7). The approach of constructing the linear programs (LPs) introduced in (Conforti and Cornuejols, 1984) can then be skipped. Consequently, we provide an alternative proof of Theorem 2 in the appendix for reference.

The definition of approximate submodularity in (3) is sufficient for identifying such functions, but the upper and lower bounds are often not symmetric, as we will see in the next section. However, instead of loosening the bounds to restore symmetry, such asymmetric behavior can be leveraged to obtain tighter performance bounds for the greedy algorithm from Theorem 3. The feasibility conditions from Lemma 1 also change accordingly. The variations of such asymmetric behavior are summarized in Table 1. Next, we identify some important examples that can be recognized as approximate submodular functions according to Definition 2.

## 4 Applications

In this section, we identify some important functions that are not submodular yet appear frequently in applications such as sensor selection and sparse learning. These functions are known to perform well with greedy search techniques, and we will establish that being approximate submodular helps in improving their existing performance bounds as well as in developing guarantees of closeness to submodularity.

### 4.1 Trace of inverse Gramian

Let us consider a matrix W_S, which for the rest of the paper is denoted in its simplest form as W_S = Λ_0 + ∑_{s∈S} x_s x_s^T, where x_s is a column taken from the data matrix X. The ordered eigenvalues of W_S are denoted as λ_1(W_S) ≤ λ_2(W_S) ≤ ⋯ ≤ λ_n(W_S).

The negative trace of the inverse covariance matrix is used as the criterion for Bayesian A-optimality in (Krause et al., 2008). The problem is stated in (Bian et al., 2017) as variance reduction of a Bayesian-estimated unknown. For the sake of completeness, we define the problem here as follows. We are given a set of observations y and a linear model relating the parameter β to the observations in the presence of Gaussian noise. The problem of sensor selection can then be stated as minimizing the conditional variance of β with a given sensor budget. If S denotes the set of selected sensors, then y_S is the set of observations indexed according to the elements in S, and X_S has its columns taken from X accordingly. With a Gaussian prior assumption on β, the conditional covariance of β given y_S can be written in terms of Λ_0 + ∑_{s∈S} x_s x_s^T. The objective function is defined as

 f(S) = tr(Λ_0) − tr((Λ_0 + ∑_{s∈S} x_s x_s^T)^{−1}).   (13)

The negative trace of the matrix inverse, −tr(W_S^{−1}), is a non-submodular function. We will see that it can be labeled as approximately submodular to a known submodular function, the log determinant of W_S, using the following result.
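A literal implementation of the objective (13) might look as follows; the column-layout of the data matrix and the identity prior in the test are illustrative assumptions of this sketch.

```python
import numpy as np

def variance_reduction(S, X, Lambda0):
    """f(S) = tr(Λ0) − tr((Λ0 + Σ_{s∈S} x_s x_sᵀ)⁻¹), as in (13).
    X holds one sensor vector per column; S is an iterable of column indices."""
    M = Lambda0.copy()
    for s in S:
        x = X[:, s:s + 1]        # keep x as an n×1 column
        M += x @ x.T
    return np.trace(Lambda0) - np.trace(np.linalg.inv(M))
```

With Λ0 = I, f(∅) = 0 and each added sensor weakly increases f, matching the monotonicity assumed throughout.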

###### Proposition 1.

The negative trace of the matrix inverse, −tr(W_S^{−1}), is approximately submodular to the submodular function log det(W_S), with upper and lower bounds whose closed forms follow from the proof in the appendix.

Interestingly, this result establishes a relationship between two functions, the negative trace of the matrix inverse (non-submodular) and the log determinant of the matrix (submodular), which may not look intuitive at first glance. We will see the usefulness of this kind of association in the Experiments section.

Matrices of the above form can be extended to Gramians, which have applications in the domain of complex networks. The controllability of complex networks plays a fundamental role in network science, cyber-physical systems, and systems biology. The steering of networks has applications ranging from drug design for cancer networks to electrical grids. It has been realized in (Pasqualetti et al., 2014) that the negative trace of the Gramian inverse can be used to quantify the controllability of networks subject to some sensors.

### 4.2 Minimum Eigenvalue

While controllability helps to steer networks, the observability criterion of complex networks concerns the problem of selecting sensors and an estimation method such that the network state can be reconstructed from the measurements collected by the selected sensors. It has been stated in (Pasqualetti et al., 2014) that the minimum eigenvalue of the so-called observability Gramian can be used to quantify the energy associated with a network state. The maximization of the minimum eigenvalue also serves as a criterion for matrix inversion in the presence of numerical errors for sparse matrices.

The minimum eigenvalue of the Gramian

 f(S)=λ1(WS), (14)

is a non-submodular function (Summers et al., 2016), but we now show that it can be modeled as a δ-approximate submodular function.
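Under an additive-Gramian convention W_S = Σ_{ω∈S} W_ω (an assumption of this sketch; the excerpt does not spell out the composition rule), the objective (14) reads:

```python
import numpy as np

def min_eig(S, W):
    """f(S) = λ_1(W_S) with W_S = Σ_{ω∈S} W_ω; W maps each sensor index
    to its PSD Gramian term. Returns 0 for the empty set by convention."""
    if not S:
        return 0.0
    M = sum(W[s] for s in S)
    return float(np.linalg.eigvalsh(M)[0])   # eigvalsh sorts eigenvalues ascending
```

This is the set function whose greedy maximization drives the sensor-selection experiments later in the paper.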

###### Proposition 2.

The minimum eigenvalue of the Gramian, λ_1(W_S), is approximately submodular to the modular function tr(W_S), the trace of the Gramian, with upper and lower bounds δ_u and δ_l, respectively, where

 δ_u = 1 − ((n−1)/n) · min_{ω ∈ Ω} λ_1(W_ω)/λ_n(W_ω),   δ_l = (1/n) · min_{ω ∈ Ω} λ_1(W_ω)/λ_n(W_ω).

The above result represents the non-submodular function as an approximation to a modular function. Since tr(W_S) is modular, its total curvature is α_T = 0. This property can be leveraged in Table 1 to obtain tight bounds when δ_u is not large. Next, we show that there exists another function in the ROS of λ_1(W_S) through the following result.
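The closed-form bounds of Proposition 2 depend only on per-element eigenvalue ratios and are therefore computable with a linear scan. A small helper, assuming the same dictionary-of-Gramians interface as above:

```python
import numpy as np

def prop2_bounds(W):
    """δ_u = 1 − ((n−1)/n)·min_ω λ_1(W_ω)/λ_n(W_ω),
       δ_l = (1/n)·min_ω λ_1(W_ω)/λ_n(W_ω)  — one eigendecomposition per element."""
    n = next(iter(W.values())).shape[0]
    ratio = min(
        np.linalg.eigvalsh(M)[0] / np.linalg.eigvalsh(M)[-1] for M in W.values()
    )
    delta_u = 1.0 - (n - 1) / n * ratio
    delta_l = ratio / n
    return delta_u, delta_l
```

Well-conditioned per-element Gramians (eigenvalue ratio near 1) yield the smallest δ_u, and hence the tightest guarantees.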

###### Proposition 3.

The minimum eigenvalue of the Gramian, λ_1(W_S), is approximately submodular to the maximum eigenvalue of the Gramian, λ_n(W_S), which is submodular, with upper and lower bounds whose closed forms follow from the proof in the appendix.

The proofs of Propositions 1, 2, and 3 are provided in the appendix. Having covered these theoretical applications, we now turn to their practical use in the following section.

## 5 Experiments

We apply some of the non-submodular functions identified in the previous section to real-world data and observe the performance of greedy selection. Specifically, we first establish through simulations that the bounds computed using the approximate submodularity concept widely improve upon the state-of-the-art bounds. Next, we apply a non-submodular function to the problem of selecting sensors for structured data using the augmented linear model, in the context of a real-world electroencephalogram (EEG) dataset.

### 5.1 Tightness analysis of the performance bounds

The performance bound presented in (Bian et al., 2017) requires an exhaustive combinatorial search for the computation of its parameters, making it extremely difficult to realize in practical scenarios. To remedy this, the authors presented bounds on the parameters (curvature and submodularity ratio). The performance bound for the greedy algorithm presented in this work, in Theorem 3, requires only a linear search to compute the curvature of the submodular function from (5). We make a comparison between the presented bounds and those in (Bian et al., 2017) through simulation of the negative trace of the inverse in (13). To simulate the matrix W_S, the entries of the data matrix X are generated from a Gaussian distribution and each column is normalized to unit norm. The matrix Λ_0 in (13) is taken as a fixed prior matrix. Figure 2 shows the performance bound for different values of k. It can be observed that there is a huge gap between the performance bounds using δ_u and δ_l from Proposition 1 and Table 1 and the ones using the parameter bounds from (Bian et al., 2017).

An interesting observation can be made when k grows large, which can be explained using the current theory of approximation. In the limit of large k, the ratio of the bounds stabilizes and the identified submodular function, i.e., the log determinant of the Gramian, becomes essentially constant; substituting these values in Table 1, the performance bound approaches that of a submodular function.
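The synthetic setup just described (Gaussian data with unit-norm columns) is reproduced below; the dimensions and seed are illustrative assumptions, since the paper elides the exact values.

```python
import numpy as np

def make_synthetic(n, m, seed=0):
    """Draw an n×m data matrix with i.i.d. Gaussian entries and
    normalize each column to unit Euclidean norm."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, m))
    return X / np.linalg.norm(X, axis=0)
```

Each W_S = Λ_0 + Σ_{s∈S} x_s x_sᵀ built from these columns then feeds the bound comparison of Figure 2.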

### 5.2 Efficient sensor selection for accurate brain activity mining

We now present some experiments in a real-world application setting, with data collected by the BCI system (Schalk et al., 2004). Specifically, we consider a multi-channel EEG data set which records the brain activity of subjects performing motor and imagery tasks. Each subject completed a set of tasks while interacting with a target appearing on a screen to prompt motor movements (Goldberger et al., 2000). Briefly, Task 1 was the opening/closing of the left or right fist as the target appears on the screen, and Task 2 was imagining the opening/closing of the left or right fist as the target appears. Task 3 was the opening/closing of both fists/feet, and Task 4 was imagining the opening/closing of both fists/feet as the target appears on the screen. The brain activity recorded through the EEG sensors during this process is mathematically modeled using the augmented linear model described in (Xue et al., 2016; Gupta et al., 2018).

The problem can be stated as follows: select appropriate sensors, subject to a constraint on the number of sensors, such that the estimation error of the initial state is minimized. We resort to the identified approximate submodular function, the minimum eigenvalue of the observability Gramian (14), to select the sensors. Selecting sensors to maximize the minimum eigenvalue helps improve the quality of estimation when inversion of the observability matrix is required. The estimation of the initial state is executed via the minimum mean squared error criterion (Pequito et al., 2017; Tzoumas et al., 2018). The performance of the selection metric, i.e., the minimum eigenvalue, is plotted in Figure 3 for all the tasks. For evaluation purposes, two metrics are explored, namely, the mean squared error of the initial state estimate and the entropy of the error incurred in the initial state estimation. We have omitted the additive constants in the plots of the entropy. It is observed that the minimum eigenvalue metric performs well with the greedy selection algorithm. We also compare in Figure 3 the performance with respect to the entropy of the initial state estimation error for greedy and random sensor selection. The performance of the random selection is presented in the form of box plots. We observe a gap between greedy and random selection in all of the tasks.

## 6 Related Work

In this section, we acknowledge works related to non-submodular functions and approximations, and we highlight how our current work differs from them.

There has been work in the domain of approximating set functions. Specifically for submodular functions, (Iyer and Bilmes, 2012) discuss functions representable as a difference of submodular functions, while (Goemans et al., 2009; Goel et al., 2009; Svitkina and Fleischer, 2008) approximated submodular functions in the value-oracle model with polynomially many queries. (Iyer et al., 2013) used the curvature of submodular functions to improve the bounds of (Goemans et al., 2009). (Devanur et al., 2013) approximated submodular functions with respect to cut functions of directed/undirected graphs. While these works focus on approximating submodular functions, the closest ones related to approximating non-submodular functions are (Horel and Singer, 2016; Hassidim and Singer, 2016), whose authors considered deviations from submodularity in the context of a noisy oracle model. Similarly, there are other works related to noisy versions (Chen et al., 2015; Singla et al., 2015; Kempe et al., 2003). The fundamental model there is a submodular function available through a value oracle, whose deviation from submodularity arises upon corruption by additive or multiplicative noise. But, as we have seen, in many applications the objective function is provided in closed form, and therefore the number of queries is not the primary concern. Since marginals of functions are more useful in optimization algorithms, defining the approximation in terms of function marginals is more useful than defining it in terms of the function value itself, as is done, for example, in the definitions of approximate submodularity in (Horel and Singer, 2016; Li et al., 2017).

Other versions of non-submodularity, such as restricted and shifted submodularity, are pointed out in (Du et al., 2008). Restricted submodularity arises when a function is not submodular in general but only when restricted to some subsets, while shifted submodularity allows the submodularity inequality to hold up to an additive shift. (Borodin et al., 2014) introduced the notion of weakly submodular functions, which are monotone, and proved corresponding performance bounds.

For a general non-submodular function, (Das and Kempe, 2011) introduced the submodularity ratio (later generalized in (Bian et al., 2017)), which is, roughly speaking, a measure of the deviation from submodularity. Using the generalized submodularity ratio and generalized curvature, (Bian et al., 2017) derived a performance bound for any non-submodular function. The computation of the parameters in that bound is challenging, as they require an exhaustive combinatorial search. Instead, the proposed approximation framework computes the performance bound with linear complexity. Besides, it also helps in establishing the correspondence of a non-submodular function to some submodular function with closed-form bounds.

## 7 Conclusion

In this work, we have proposed that any set function can be modeled as a δ-approximate submodular function. A new interpretation of closeness to submodularity is presented using the region of submodularity (ROS), which is a function of the submodularity ratio. This methodology offers fundamental insight into the proximity of a function to submodularity, and it is shown that better performance guarantees for the greedy algorithm can be achieved by carefully choosing a submodular function in the ROS with minimum total curvature as a function of δ. The expression for the performance bound is computable with just linear complexity, as compared to the existing ones which require a combinatorial search.

Besides, the present results can be coupled with the existing literature on learning submodular functions to bound the approximation errors in the context of query-based setups, and they open new avenues of research for the interpolation of closed-form set functions that often occur in applications such as sensor and/or feature selection. Future work includes sufficient conditions for the existence of approximating submodular functions and lower bounds on the minimum curvature in the ROS as a function of δ, such that the sub-optimality guarantees are improved.

## References

• Bach (2013) Francis Bach. Learning with submodular functions: A convex optimization perspective, 2013. arXiv:1111.6453.
• Bian et al. (2017) Andrew An Bian, Joachim M. Buhmann, Andreas Krause, and Sebastian Tschiatschek. Guarantees for greedy maximization of non-submodular functions with applications, 2017. arXiv:1703.02100.
• Borodin et al. (2014) Allan Borodin, Dai Tri Man Le, and Yuli Ye. Weakly submodular functions, 2014. arXiv:1401.6697.
• Buchbinder et al. (2012) N. Buchbinder, M. Feldman, J. Naor, and R. Schwartz. A tight (1/2) linear-time approximation to unconstrained submodular maximization. In FOCS, 2012.
• Chen et al. (2015) Yuxin Chen, S. Hamed Hassani, Amin Karbasi, and Andreas Krause. Sequential information maximization: When is greedy near-optimal? In COLT, 2015.
• Conforti and Cornuejols (1984) Michele Conforti and Gerard Cornuejols. Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds and some generalizations of the Rado-Edmonds theorem. Discrete Applied Math, 7(3):251–274, 1984.
• Das and Kempe (2008) Abhimanyu Das and David Kempe.

Algorithms for subset selection in linear regression.

In

Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing

, pages 45–54, 2008.
• Das and Kempe (2011) Abhimanyu Das and David Kempe. Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. In ICML, 2011.
• Das et al. (2012) Abhimanyu Das, Anirban Dasgupta, and Ravi Kumar. Selecting diverse features via spectral regularization. In NIPS, 2012.
• Devanur et al. (2013) Nikhil R. Devanur, Shaddin Dughmi, Roy Schwartz, Ankit Sharma, and Mohit Singh. On the approximation of submodular functions, 2013. arXiv:1304.4948.
• Du et al. (2008) Ding-Zhu Du, Ronald L. Graham, Panos M. Pardalos, Peng-Jun Wan, Weili Wu, and Wenbo Zhao. Analysis of greedy approximations with nonsubmodular potential functions. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’08, pages 167–175. Society for Industrial and Applied Mathematics, 2008.
• Feige et al. (2011) U. Feige, V. S. Mirrokni, and J. Vondrak. Maximizing non-monotone submodular functions. SIAM J. Comput, 40(4):1133–1153, 2011.
• Goel et al. (2009) G. Goel, C. Karande, P. Tripathi, and L. Wang. Approximability of combinatorial problems with multi-agent submodular cost functions. In FOCS, 2009.
• Goemans et al. (2009) M. Goemans, N. Harvey, S. Iwata, and V. Mirrokni. Approximating submodular functions everywhere. In SODA, pages 535–544, 2009.
• Goldberger et al. (2000) Ary L. Goldberger, Luis A. N. Amaral, Leon Glass, Jeffrey M. Hausdorff, Plamen Ch. Ivanov, Roger G. Mark, Joseph E. Mietus, George B. Moody, Chung-Kang Peng, and H. Eugene Stanley. Physiobank, physiotoolkit, and physionet. Circulation, 101(23):e215–e220, 2000.
• Golovin and Krause (2011) D. Golovin and A. Krause.

Adaptive submodularity: Theory and applications in active learning and stochastic optimization.

JAIR, 42:427–486, 2011.
• Guillory and Bilmes (2011) A. Guillory and J. Bilmes. Simultaneous learning and covering with adversarial noise. In ICML, 2011.
• Gupta et al. (2018) Gaurav Gupta, Sérgio Pequito, and Paul Bogdan. Dealing with unknown unknowns: Identification and selection of minimal sensing for fractional dynamics with unknown inputs. to appear in American Control Conference 2018, 2018. arXiv:1803.04866.
• Hassidim and Singer (2016) Avinatan Hassidim and Yaron Singer. Submodular optimization under noise, 2016. CoRR, abs/1601.03095.
• Hoi et al. (2006) S. Hoi, R. Jin, J. Zhu, and M. Lyu. Batch mode active learning and its application to medical image classification. In ICML, 2006.
• Horel and Singer (2016) Thibaut Horel and Yaron Singer. Maximization of approximately submodular functions. In NIPS, 2016.
• Iyer and Bilmes (2012) R. K. Iyer and J. A. Bilmes. Algorithms for approximate minimization of the difference between submodular functions, with applications. In UAI, 2012.
• Iyer and Bilmes (2013) R. K. Iyer and J. A. Bilmes. Submodular optimization with submodular cover and submodular knapsack constraints. In NIPS, pages 2436–2444, 2013.
• Iyer et al. (2013) R. K. Iyer, S. Jegelka, and J. A. Bilmes. Curvature and optimal algorithms for learning and minimizing submodular functions. In NIPS, pages 2742–2750, 2013.
• Kempe et al. (2003) David Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In KDD, 2003.
• Krause et al. (2008) A. Krause, A. Singh, and C. Guestrin. Near-optimal sensor placements in gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 9:235–284, Jun 2008.
• Krause and Cevher (2010) Andreas Krause and Volkan Cevher. Submodular dictionary selection for sparse representation. In ICML, pages 567–574, 2010.
• Li et al. (2017) Qiang Li, Wei Chen, Institute of Computing Xiaoming Sun, and Institute of Computing Jialin Zhang. Influence maximization with almost submodular threshold functions. In Advances in Neural Information Processing Systems 30, pages 3804–3814. 2017.
• Nemhauser et al. (1978) G. Nemhauser, L. Wolsey, , and M. Fisher. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14(1):265–294, 1978.
• Pasqualetti et al. (2014) F. Pasqualetti, S. Zampieri, and F. Bullo. Controllability metrics, limitations and algorithms for complex networks. IEEE Transactions on Control of Network Systems, 1(1):40–52, March 2014.
• Pequito et al. (2017) S. Pequito, A. Clark, and G. J. Pappas. Discrete-time fractional-order multiple scenario-based sensor selection. In 2017 American Control Conference (ACC), pages 5488–5493, May 2017.
• Schalk et al. (2004) G. Schalk, D. J. McFarland, T. Hinterberger, N. Birbaumer, and J. R. Wolpaw. Bci2000: a general-purpose brain-computer interface (BCI) system. IEEE Transactions on Biomedical Engineering, 51(6):1034–1043, June 2004.
• Schnitzler et al. (2015) F. Schnitzler, J. Y. Yu, , and S. Mannor. Sensor selection for crowdsensing dynamical systems. In AISTATS, 2015.
• Singla et al. (2015) A. Singla, S. Tschiatschek, and A. Krause. Noisy submodular maximization via adaptive sampling with applications to crowdsourced image collection summarization, 2015. arXiv:1511.07211.
• Summers et al. (2016) Tyler H. Summers, Fabrizio L. Cortesi, and John Lygeros. On submodularity and controllability in complex dynamical networks. IEEE Transactions on Control of Network Systems, 3:91–101, 2016.
• Sviridenko (2004) M. Sviridenko. A note on maximizing a submodular set function subject to a knapsack constraint. Operations Research Letters, 32(1):41–43, 2004.
• Svitkina and Fleischer (2008) Z. Svitkina and L. Fleischer. Submodular approximation: Sampling-based algorithms and lower bounds. In FOCS, pages 697–706, 2008.
• Tropp (2004) J. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Information Theory, 50:2231–2242, 2004.
• Tzoumas et al. (2018) V. Tzoumas, Y. Xue, S. Pequito, P. Bogdan, and G. J. Pappas. Selecting sensors in biological fractional-order systems. IEEE Transactions on Control of Network Systems, pages 1–1, 2018.
• Xue et al. (2016) Yuankun Xue, Sergio Pequito, Joana R. Coelho, Paul Bogdan, and George J. Pappas. Minimum number of sensors to ensure observability of physiological systems: a case study. In Allerton, 2016.
• Zhang (2008) T. Zhang. Adaptive forward-backward greedy algorithm for sparse learning with linear models. In NIPS, 2008.

## Appendix A Proof of Theorem 3

###### Proof.

Let $S^G$ be the output of the greedy algorithm, and therefore $|S^G| = k$. At the $i$th stage of the algorithm, if $S_i^G$ is the selected set, then $f(\Omega^* \cup S_i^G)$ can be expanded in two ways. First, we have

$$\begin{aligned} f(\Omega^* \cup S_i^G) &= f(\Omega^*) + f_{\Omega^*}(S_i^G) = f(\Omega^*) + \sum_{s_j \in S_i^G \setminus \Omega^*} f_{\Omega^* \cup S_{j-1}^G}(s_j) \\ &\geq f(\Omega^*) + (1-\alpha^G)\sum_{s_j \in S_i^G \setminus \Omega^*} f_{S_{j-1}^G}(s_j), \end{aligned} \tag{15}$$

where the inequality is written using the definition of greedy curvature from (7). On the other hand, the expansion is as follows

$$f(\Omega^* \cup S_i^G) = f(S_i^G) + f_{S_i^G}(\Omega^*). \tag{16}$$

After combining (15) and (16), we can write that

$$f(\Omega^*) + (1-\alpha^G)\sum_{s_j \in S_i^G \setminus \Omega^*} f_{S_{j-1}^G}(s_j) \;\leq\; \sum_{s_j \in S_i^G} f_{S_{j-1}^G}(s_j) + f_{S_i^G}(\Omega^*),$$

which can be re-written as

$$f(\Omega^*) \;\leq\; \alpha^G \sum_{s_j \in S_i^G} f_{S_{j-1}^G}(s_j) + (1-\alpha^G)\sum_{s_j \in S_i^G \cap \Omega^*} f_{S_{j-1}^G}(s_j) + f_{S_i^G}(\Omega^*).$$

At this point, it should be noted that, out of the region of submodularity, we have chosen the submodular function $g$, which has total curvature $\alpha_\delta$. We can upper bound the last term of the above inequality using (3) and use the diminishing-returns property of submodular functions to write

$$\begin{aligned} f(\Omega^*) &\leq \alpha^G f(S_i^G) + (1-\alpha^G)\sum_{s_j \in S_i^G \cap \Omega^*} f_{S_{j-1}^G}(s_j) + (1+\delta)\sum_{\omega \in \Omega^* \setminus S_i^G} g_{S_i^G}(\omega) \\ &\leq \alpha^G f(S_i^G) + (1-\alpha^G)\sum_{s_j \in S_i^G \cap \Omega^*} f_{S_{j-1}^G}(s_j) + \frac{1+\delta}{1-\delta}\sum_{\omega \in \Omega^* \setminus S_i^G} f_{S_i^G}(\omega). \end{aligned}$$

The greedy algorithm at the $(i+1)$th step selects $s_{i+1}$ according to

$$A(i+1) = \max_{a \in \Omega \setminus S_i^G} f_{S_i^G}(a) = f_{S_i^G}(s_{i+1}), \tag{17}$$

where $A(i+1)$ is the gain at the $(i+1)$th step. The last term can be upper bounded by $f_{S_i^G}(s_{i+1})$, and let us denote the size of the set $S_i^G \cap \Omega^*$ by $t_i$. Subsequently, we can write

$$\begin{aligned} f(\Omega^*) &\leq \alpha^G f(S_i^G) + (1-\alpha^G)\sum_{s_j \in S_i^G \cap \Omega^*} f_{S_{j-1}^G}(s_j) + \frac{1+\delta}{1-\delta}(k-t_i)\, f_{S_i^G}(s_{i+1}) \\ &\leq \alpha^G f(S_i^G) + \sum_{s_j \in S_i^G \cap \Omega^*}\Bigl\{(1-\alpha^G) f_{S_{j-1}^G}(s_j) - f_{S_i^G}(s_{i+1})\Bigr\} + k\,\frac{1+\delta}{1-\delta}\, f_{S_i^G}(s_{i+1}), \end{aligned}$$

where we have used the fact that $\Omega^*$ is feasible according to Lemma 1, and hence $|\Omega^* \setminus S_i^G| = k - t_i$. The summation term in the above inequality can be upper-bounded by zero using the definition of greedy curvature from (7), and we obtain

$$f(\Omega^*) \;\leq\; \alpha^G f(S_i^G) + k\,\frac{1+\delta}{1-\delta}\, f_{S_i^G}(s_{i+1}). \tag{18}$$

We shall now upper bound the greedy curvature $\alpha^G$ in terms of $\alpha_\delta$. In that process, we can write

$$\begin{aligned} \min_{a \in S^G \setminus (S_{i-1}^G \cup \Omega^*)} \frac{f_{S_{i-1}^G \cup \Omega^*}(a)}{f_{S_{i-1}^G}(a)} &\geq \frac{1-\delta}{1+\delta}\, \min_{a \in S^G \setminus (S_{i-1}^G \cup \Omega^*)} \frac{g_{S_{i-1}^G \cup \Omega^*}(a)}{g_{S_{i-1}^G}(a)} \\ &\geq \frac{1-\delta}{1+\delta}\, \min_{a \in \Omega} \frac{g_{\Omega \setminus a}(a)}{g(a)} = \frac{1-\delta}{1+\delta}\,(1-\alpha_\delta), \end{aligned} \tag{19}$$

where the first inequality is written using (3) and the second inequality uses the diminishing-returns property of submodular functions. The last equality uses the definition of total curvature from (5). Using a similar approach, we can again write

$$\begin{aligned} \min_{\substack{a \in (S^G \cap \Omega^*) \setminus S_{i-1}^G \\ i \leq j \leq k}} \frac{f_{S_{j-1}^G}(s_j)}{f_{S_{i-1}^G}(a)} &\geq \min_{\substack{a \in (S^G \cap \Omega^*) \setminus S_{i-1}^G \\ i \leq j \leq k}} \frac{f_{S_{j-1}^G}(a)}{f_{S_{i-1}^G}(a)} \\ &\geq \frac{1-\delta}{1+\delta}\, \min_{\substack{a \in (S^G \cap \Omega^*) \setminus S_{i-1}^G \\ i \leq j \leq k}} \frac{g_{S_{j-1}^G}(a)}{g_{S_{i-1}^G}(a)} \\ &\geq \frac{1-\delta}{1+\delta}\, \min_{\substack{a \in (S^G \cap \Omega^*) \setminus S_{i-1}^G \\ i \leq j \leq k}} \frac{g_{\Omega \setminus a}(a)}{g(a)} \\ &\geq \frac{1-\delta}{1+\delta}\, \min_{a \in \Omega} \frac{g_{\Omega \setminus a}(a)}{g(a)} = \frac{1-\delta}{1+\delta}\,(1-\alpha_\delta). \end{aligned} \tag{20}$$

Using equations (19), (20) and (7), we can now conclude that $1-\alpha^G \geq \frac{1-\delta}{1+\delta}(1-\alpha_\delta)$, or equivalently $\alpha^G \leq \frac{2\delta}{1+\delta} + \frac{1-\delta}{1+\delta}\alpha_\delta$. Therefore, equation (18) can now be written as

$$f(\Omega^*) \;\leq\; \left(\frac{2\delta}{1+\delta} + \frac{1-\delta}{1+\delta}\,\alpha_\delta\right) f(S_i^G) + k\,\frac{1+\delta}{1-\delta}\, f_{S_i^G}(s_{i+1}). \tag{21}$$

Equation (21) is of the form $\mathrm{OPT} \leq C\, f(S_i^G) + kD\, f_{S_i^G}(s_{i+1})$ with $C = \frac{2\delta}{1+\delta} + \frac{1-\delta}{1+\delta}\alpha_\delta$ and $D = \frac{1+\delta}{1-\delta}$, which has a solution of the form $f(S^G) \geq \frac{1}{C}\bigl(1 - (1 - \frac{C}{kD})^k\bigr)\,\mathrm{OPT}$ by simple mathematical induction. Therefore, we can write

$$\begin{aligned} f(S^G) = \sum_{i=1}^{k} A(i) &\geq \frac{1}{\frac{2\delta}{1+\delta}+\frac{1-\delta}{1+\delta}\alpha_\delta}\left(1-\left(1-\frac{1}{k}\left(\frac{2\delta}{1+\delta}+\frac{1-\delta}{1+\delta}\alpha_\delta\right)\frac{1-\delta}{1+\delta}\right)^{k}\right)\mathrm{OPT} \\ &\geq \frac{1}{\frac{2\delta}{1+\delta}+\frac{1-\delta}{1+\delta}\alpha_\delta}\left(1-e^{-\left(\frac{2\delta}{1+\delta}+\frac{1-\delta}{1+\delta}\alpha_\delta\right)\frac{1-\delta}{1+\delta}}\right)\mathrm{OPT}. \end{aligned} \tag{22}$$
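The induction referred to above can be sketched as follows; this is a reconstruction consistent with the derivation here (writing $C = \frac{2\delta}{1+\delta} + \frac{1-\delta}{1+\delta}\alpha_\delta$ and $D = \frac{1+\delta}{1-\delta}$ as shorthand), not necessarily the paper's verbatim argument:

```latex
% Rearranging (21) with A(i+1) = f_{S_i^G}(s_{i+1}) gives the recursion
A(i+1) \;\geq\; \frac{1}{kD}\Bigl(\mathrm{OPT} - C\sum_{j=1}^{i} A(j)\Bigr).
% Define the residual B_i := OPT - C \sum_{j=1}^{i} A(j), with B_0 = OPT. Then
B_{i+1} \;=\; B_i - C\,A(i+1) \;\leq\; B_i\Bigl(1-\frac{C}{kD}\Bigr)
\;\Longrightarrow\;
B_k \;\leq\; \Bigl(1-\frac{C}{kD}\Bigr)^{k}\mathrm{OPT},
% and therefore
f(S^G) \;=\; \sum_{i=1}^{k} A(i) \;=\; \frac{\mathrm{OPT}-B_k}{C}
\;\geq\; \frac{1}{C}\left(1-\Bigl(1-\frac{C}{kD}\Bigr)^{k}\right)\mathrm{OPT}.
```

The exponential form then follows from $(1-x/k)^k \leq e^{-x}$.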

## Appendix B Alternate proof of Theorem 2

###### Proof.

The reader may benefit from this alternate proof, which uses the new definition of greedy curvature in (7); it shows that the proof can be completed without constructing LPs. Following the same steps and notation as the proof of Theorem 1, we can write at the $i$th step that

$$\begin{aligned} f(\Omega^*) &\leq (\alpha^G - 1)\sum_{s_j \in S_i^G \setminus \Omega^*} f_{S_{j-1}^G}(s_j) + \sum_{s_j \in S_i^G} f_{S_{j-1}^G}(s_j) + f_{S_i^G}(\Omega^*) \\ &= \alpha^G \sum_{s_j \in S_i^G \setminus \Omega^*} f_{S_{j-1}^G}(s_j) + \sum_{s_j \in S_i^G \cap \Omega^*} f_{S_{j-1}^G}(s_j) + f_{S_i^G}(\Omega^*). \end{aligned}$$

Using the definition of the submodularity ratio from (4), we can upper bound the last term to write

$$f(\Omega^*) \;\leq\; \alpha^G \sum_{s_j \in S_i^G \setminus \Omega^*} f_{S_{j-1}^G}(s_j) + \sum_{s_j \in S_i^G \cap \Omega^*} f_{S_{j-1}^G}(s_j) + \frac{1}{\gamma}\sum_{\omega \in \Omega^* \setminus S_i^G} f_{S_i^G}(\omega).$$

Using the property of the greedy algorithm from (7), the last term can be upper-bounded by