Submodular functions provide a rich class of expressive models for a variety of machine learning problems. They occur naturally in two flavors. In minimization problems, they model notions of cooperation, attractive potentials, and economies of scale, while in maximization problems, they model aspects of coverage, diversity, and information. A set function $f : 2^V \to \mathbb{R}$ over a finite set $V$ is submodular if for all subsets $S, T \subseteq V$, it holds that $f(S) + f(T) \geq f(S \cup T) + f(S \cap T)$. Given a set $S \subseteq V$, we define the gain of an element $j \notin S$ in the context $S$ as $f(j \mid S) = f(S \cup j) - f(S)$. A perhaps more intuitive characterization of submodularity is as follows: a function is submodular if it satisfies diminishing marginal returns, namely $f(j \mid S) \geq f(j \mid T)$ for all $S \subseteq T, j \notin T$, and is monotone if $f(j \mid S) \geq 0$ for all $j \notin S, S \subseteq V$.
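The diminishing returns characterization above can be checked by brute force on small ground sets. The following is a minimal sketch; the helper names and the toy weights are illustrative assumptions, and the exhaustive enumeration is only feasible for a handful of elements.

```python
from itertools import combinations
import math

def powerset(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = list(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def gain(f, j, S):
    """Marginal gain f(j | S) = f(S + j) - f(S)."""
    return f(S | {j}) - f(S)

def is_submodular(f, ground, tol=1e-9):
    """Check f(j | S) >= f(j | T) for all S <= T and j not in T."""
    subsets = list(powerset(ground))
    for S in subsets:
        for T in subsets:
            if not S <= T:
                continue
            for j in ground - T:
                if gain(f, j, S) < gain(f, j, T) - tol:
                    return False
    return True

def is_monotone(f, ground, tol=1e-9):
    """Check f(j | S) >= 0 for every S and every j outside S."""
    return all(gain(f, j, S) >= -tol
               for S in powerset(ground) for j in ground - S)

# Hypothetical toy instance: functions of a modular (additive) weight.
weights = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
ground = frozenset(weights)

def sqrt_cost(S):
    """Concave over modular: submodular and monotone."""
    return math.sqrt(sum(weights[j] for j in S))

def squared_cost(S):
    """Convex over modular: monotone but not submodular."""
    return sum(weights[j] for j in S) ** 2

print(is_submodular(sqrt_cost, ground), is_monotone(sqrt_cost, ground))
print(is_submodular(squared_cost, ground))
```

The square root of an additive weight passes both checks, while the squared weight fails the diminishing returns test because its gains grow with the context.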
The first problem we consider is constrained submodular minimization:

Problem 1: $\min \{ f(X) \mid X \in \mathcal{C} \}$,

where the function $f$ is monotone submodular, and $\mathcal{C}$ is a combinatorial constraint, which could represent a cardinality lower bound constraint, or more complicated ones like cuts, matchings, trees, or paths in a graph. With cut constraints, this problem becomes cooperative cuts, and with matching constraints, we call this cooperative matchings, which we introduce and utilize in this paper.
The second problem asks for minimizing a monotone submodular cost function $f$, while simultaneously maximizing a monotone submodular coverage function $g$. A natural way to model this bi-optimization problem is to introduce one of $f$ and $g$ as a constraint. In particular, we obtain two optimization problems:

Problem 2: $\min \{ f(X) \mid g(X) \geq c \}$, Problem 3: $\max \{ g(X) \mid f(X) \leq b \}$.
The fourth problem considered in this paper is minimizing the ratio of submodular functions:

Problem 4: $\min_{X} f(X) / g(X)$.
A key assumption in this paper is that the cost and coverage functions in Problems 1-4 are monotone submodular, an assumption that, as we shall see, is natural in many applications. Problem 2 is a special case of Problem 1, with the constraint set defined by the coverage requirement. Furthermore, Problem 2 and Problem 3 are closely related and, loosely speaking, duals of each other. Similarly, Problem 4 is closely related to Problems 2 and 3, in that given an approximation algorithm for Problems 2 or 3, we can obtain an approximation with similar guarantees for Problem 4 (a problem also considered with general monotone set functions). Problem 1 is constrained submodular minimization, while Problems 2, 3 and 4 try to simultaneously minimize one submodular function while maximizing another.
Problems 1-4 appear naturally in several machine learning applications. However, in the worst case all four problems have polynomial hardness factors [29, 15, 12, 1]. An important observation is that the polynomial hardness of Problems 1-3 arises mainly from the submodular cost function; it does not depend as much on the constraints or on the submodular coverage function [15, 12]. In the case of Problem 4, the hardness depends on both the cost and the coverage functions.
On the other hand, these problems come up as models in many machine learning applications. These lower bounds are specific to rather contrived classes of functions, whereas much better results can be achieved for many practically relevant cases. The pessimistic worst case results are somewhat discouraging, begging the need to quantify sub-classes of submodular functions that are more amenable to these optimization problems. Only limited past work has investigated these problems for restricted subclasses of submodular functions. [14, 12, 1] provide bounds for Problems 1-4 based on the notion of curvature, and argue how several submodular functions (e.g. clustered concave over modular functions) have bounded curvature. Their curvature bounds depend on the choice of the submodular functions, and in certain cases yield no improvement over the worst case bounds. For classes of functions with bounded curvature, their bounds yield improved results.
In this paper, we focus on a tractable yet expressive subclass of submodular cost functions , namely low rank sums of concave over modular functions.
Low rank sums of concave over modular functions are the class of functions representable as , where s are monotone concave, and is constant or .
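As a minimal sketch of this function class, the snippet below builds a sum of concave functions applied to additive weights. The names `make_low_rank_cost`, `w1`, `w2` and the toy weights are assumptions made for illustration; each component corresponds to one term of the sum, and elements missing from a weight vector are treated as having weight zero in that component.

```python
import math

def make_low_rank_cost(concave_fns, weight_vectors):
    """Return f(X) = sum_i psi_i(w_i(X)), where w_i(X) = sum_{j in X} w_i[j]."""
    assert len(concave_fns) == len(weight_vectors)

    def f(X):
        return sum(psi(sum(w.get(j, 0.0) for j in X))
                   for psi, w in zip(concave_fns, weight_vectors))

    return f

# Two components (rank 2) over four items: a, b form one group and c, d another.
w1 = {"a": 4.0, "b": 5.0}
w2 = {"c": 9.0, "d": 16.0}
f = make_low_rank_cost([math.sqrt, math.sqrt], [w1, w2])

# Within a group the joint cost is discounted; across groups costs simply add.
print(f({"a"}), f({"b"}), f({"a", "b"}))
print(f({"a"}), f({"c"}), f({"a", "c"}))
```

The discount inside a group is what the text refers to as cooperation: taking `a` and `b` together costs less than the sum of their individual costs, whereas `a` and `c` share no component and receive no discount.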
Low Rank in this context means that the number of components in the sum is small (i.e., is small). Our use of the terminology “low rank” is identical to that used in . We argue how this subclass naturally models many interesting applications of problems 1 - 4 in machine learning. We do not need to consider the entire class of submodular functions (which includes rather contrived instances), but only this subclass. This observation helps us in providing better connections between theory and practice. The main specialty of this subclass is that these functions effectively model cooperation between objects via discounts provided by concave functions. Moreover, we show that this subclass admits fully polynomial time approximation schemes for Problem 1, and constant factor approximation guarantees for Problems 2 and 3. Similarly, we achieve constant factor approximation guarantees for Problem 4, when is a low rank sum of concave over Modular functions, and is an arbitrary submodular function, a significant improvement over . The bounds we obtain are significantly better than the worst case bounds, and also an improvement over the bounds achieved using the curvature [15, 12].
Low rank sums of concave over modular functions fit as natural models for Problems 1-4 in several machine learning applications. Below, we summarize some of these.
Image segmentation (Cooperative Cuts): Markov random fields with pairwise attractive potentials occur naturally in modeling image segmentation and related applications . While models are tractably solved using graph-cuts, they suffer from the shrinking bias problem, and images with elongated edges are not segmented properly. When modeled via a submodular function, however, the cost of a cut is not just the sum of the edge weights, but a richer function that allows cooperation between edges, and yields superior results on many challenging tasks (see, for example, the results of the image segmentations in ). This was achieved in  by partitioning the set of edges of the grid graph into groups of similar edges (or types) , and defining a function , where s are concave functions and encodes the edge potentials. This ensures that we offer a discount to edges of the same type. Moreover, the number of types of edges are typically much smaller than the number of pixels, so this is a low-rank sum of concave functions.
Image Correspondence (Cooperative Matchings): The simplest model for matching key-points in pairs of images (which is also called the correspondence problem) can be posed as a bipartite matching. These models, however, do not capture interaction between the pixels. We illustrate the difficulty of this in Figure 1. One kind of desirable interaction is that similar or neighboring pixels be matched together. We can achieve this as follows. First we cluster the key-points in the two images into groups (this is illustrated in Figure 1-left via green, blue and red key-points). This induces a clustering of edges that can be given a discount via a submodular function (details are given in Section 4.1). In practice, the number of groups () can be much smaller than and this is a low-rank sum of concave over modular functions. Figure 1-right shows how the submodular matchings improves over the simple bipartite matching. In particular, the minimum matching approach produces many spurious matches between clusters (shown in red) that are avoided via the cooperation described above.
Sensor Placement or Feature Selection: Often, the problem of choosing sensor locations from a given set of possible locations can be modeled [22, 10] by maximizing the mutual information between the chosen variables and the unchosen set (i.e., ). Alternatively, we may wish to maximize the mutual information between a set of chosen sensors and a quantity of interest (i.e., ) assuming that the set of features are conditionally independent given . Both these functions are submodular. Since there are costs involved, we want to simultaneously minimize the cost . Often this cost is submodular [22, 10], since there is typically a discount when purchasing sensors in bulk (or computing features), and we can express this via Problems 2 and 3. For example, there may be diminished cost for placing a sensor in a particular location given placement in certain other locations. Similarly, certain features might be cheaper to use given that others are already being computed (e.g., those that use an FFT). A natural cost model in such cases is where ’s are concave, is the cost of sensor (or feature) and are groups of similar sensors or features. Typically, is much smaller than and this can be expressed as low rank sum of concave over modular functions.
2 Background & Existing Algorithms
Most combinatorial algorithms for Problems 1-4 are based on approximating the cost function with a tractable surrogate function [6, 7, 16, 13, 12, 15, 1]. Moreover, all four problems have similar guarantees. We characterize the quality of the solution via the notion of approximation factors. In particular, we say that an algorithm achieves an approximation factor of for Problem 1, if we can obtain a set such that , where is the optimizer of Problem 1. For Problems 2 and 3, we use the notion of bi-criterion approximation factors. An algorithm is a bi-criterion algorithm for Problem 2 if it is guaranteed to obtain a set such that (approximate optimality) and (approximate feasibility), where is an optimizer of Problem 2. Typically, and . Similarly, an algorithm is a bi-criterion algorithm for Problem 3 if it is guaranteed to obtain a set such that and , where is the optimizer of Problem 3. Moreover, Problems 2 and 3 are very closely related, in that an approximation algorithm for one problem can be used to obtain guarantees for the other problem. The two problems also have matching hardness factors. For Problem 4, we study an algorithm which achieves -approximation guarantees, in that we can obtain a set such that where and is the optimal minimizer of .
Supergradient based Algorithm (SGA): One such method uses the supergradients of a submodular function [15, 13, 6, 17, 11] to obtain modular upper bounds in an iterative manner. In particular, define a modular upper bound:
The algorithm starts with the and sequentially sets as the solution of the corresponding problem (1, 2 or 3) with a surrogate function as [15, 16, 12]. In each case, this subproblem is much easier. For example, in the case of Problem 1, the subproblem becomes,
which is a linear cost problem, poly-time solvable for many constraints, like cardinality, cuts, matchings, paths etc.
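The iteration described above can be sketched for Problem 1 under a cardinality constraint. The code below is only an illustration under stated assumptions: it uses one well-known modular upper bound, $m^X(Y) = f(X) - \sum_{j \in X \setminus Y} f(j \mid X \setminus j) + \sum_{j \in Y \setminus X} f(j \mid \emptyset)$, which need not be the exact bound used in the cited work, and the constraint $|X| = k$, the function names and the toy instance are hypothetical.

```python
import math

def modular_upper_bound(f, ground, X):
    """Weights c and offset c0 with f(Y) <= c0 + sum_{j in Y} c[j], tight at Y = X."""
    X = frozenset(X)
    f_empty, f_X = f(frozenset()), f(X)
    c = {j: (f_X - f(X - {j})) if j in X else (f(frozenset({j})) - f_empty)
         for j in ground}
    c0 = f_X - sum(c[j] for j in X)
    return c, c0

def sga_cardinality(f, ground, k, max_iters=50):
    """Iteratively minimize the modular upper bound subject to |X| = k."""
    X = frozenset()                       # start from the empty set
    best, best_val = None, math.inf
    for _ in range(max_iters):
        c, _ = modular_upper_bound(f, ground, X)
        # Linear-cost subproblem: the k cheapest elements (ties broken by name).
        Y = frozenset(sorted(ground, key=lambda j: (c[j], j))[:k])
        val = f(Y)
        if val >= best_val - 1e-12:       # no further improvement
            break
        best, best_val, X = Y, val, Y
    return best, best_val

# Hypothetical toy cost: two groups, each discounted by a square root.
w1 = {"a": 4.0, "b": 4.0, "c": 4.0}
w2 = {"d": 1.0, "e": 1.0}
ground = frozenset("abcde")

def f(X):
    return (math.sqrt(sum(w1.get(j, 0.0) for j in X))
            + math.sqrt(sum(w2.get(j, 0.0) for j in X)))

best, val = sga_cardinality(f, ground, k=3)
print(sorted(best), val)
```

Because the bound is tight at the current set, each accepted iterate is no worse than the previous one once the current set is feasible. For other constraints (cuts, matchings, paths), only the line that solves the linear-cost subproblem would change.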
In the case of Problems 2 and 3, these subproblems are
With Problem 4, the subproblem becomes,
This can be approximated up to a factor of via a Greedy algorithm .
Define , where represents the average curvature of the function . The supergradient based iterative algorithm (SGA) achieves an approximation factor of for Problem 1, and bicriteria factors satisfying and for Problems 2 and 3. Finally, SGA achieves an approximation factor of for Problem 4.
This Lemma follows easily from the results in [15, 13, 12, 1]. We can also achieve a non-bicriteria approximation factor for Problem 2, which is worse than the bicriteria factor by a factor . A key quantity which defines the approximation factor above is the average curvature , which in turn depends on the concave functions. If the concave function is , SGA admits approximation factors of . On the other hand, if the concave function is , the guarantees are , which is much poorer.
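To make the dependence on the concave function concrete, the sketch below computes the standard total curvature $\kappa_f = 1 - \min_j f(j \mid V \setminus j) / f(j)$ for a few concave functions of cardinality. This is the worst case notion of curvature; the average curvature used in the lemma is a related quantity whose definition is not reproduced here, so the numbers are only indicative. The function names and the choice of 100 elements are assumptions.

```python
import math

def total_curvature(f, ground):
    """Total curvature 1 - min_j f(j | V minus j) / f(j) of a monotone submodular f.

    Assumes every singleton has strictly positive value.
    """
    V = frozenset(ground)
    f_V, f_empty = f(V), f(frozenset())
    ratios = []
    for j in V:
        last_gain = f_V - f(V - {j})               # gain of j when added last
        first_gain = f(frozenset({j})) - f_empty   # gain of j when added first
        ratios.append(last_gain / first_gain)
    return 1.0 - min(ratios)

ground = range(100)
modular = lambda X: float(len(X))
sqrt_card = lambda X: math.sqrt(len(X))
log_card = lambda X: math.log(1 + len(X))
truncated = lambda X: float(min(len(X), 10))

for name, fn in [("modular", modular), ("sqrt", sqrt_card),
                 ("log", log_card), ("truncation", truncated)]:
    print(name, round(total_curvature(fn, ground), 4))
```

A modular function has curvature zero, while a saturating function such as a truncation reaches the maximum value of one; the logarithm is closer to one than the square root, in line with the poorer guarantee mentioned above.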
The supergradient based algorithm is easy to implement, and also works well in practice [15, 17]. For the general class of submodular functions, these results are close to the optimal bounds, and are, in fact, tight for some constraints. Nevertheless, the worst case guarantees seem discouraging, particularly for the class of low rank sums of concave over modular functions that we consider here, and that as mentioned above are natural for many applications.
Ellipsoidal Approximation based Algorithm (EA): Another generic approximation of a submodular function, introduced by Goemans et al., is based on approximating the submodular polyhedron by an ellipsoid. The main result states that any polymatroid (monotone submodular) function can be approximated by a function of the form
for a certain modular weight vector, such that . A simple trick then provides a curvature-dependent approximation . We have the following result borrowed from [12, 13, 1].
Define , where represents the worst case curvature of the function . The Ellipsoidal Approximation based algorithm (EA) achieves an approximation factor of for Problem 1, and bicriteria factors satisfying and for Problems 2 and 3. Similarly EA achieves an approximation guarantee of for Problem 4.
The Ellipsoidal Approximation obtains the tightest bounds for Problems 1-4 [12, 13, 7, 6, 16, 1]. This is again for the general class of submodular functions and the worst case factor of is quite discouraging. This algorithm, however, is very expensive computationally, and is not practical for solving machine learning applications.
3 Improved Algorithms for Low-rank sums of concave-modular functions
Our main new results are that we can achieve a fully polynomial time approximation scheme for Problem 1, and constant factor approximation guarantees for Problems 2, 3 and 4 when the cost function is a low rank sum of concave over modular functions (Theorem 4). Our techniques build on recent methods used for minimizing quasi-concave functions over solvable polytopes [25, 23, 8, 19].
Assume the concave functions ’s are monotone functions, i.e., . We also assume that for all , for and some constant . The second assumption holds for a number of concave functions, including , and .
The main idea of this approach is to replace the concave functions ’s by piece-wise linear approximations . We define an approximation of as defined as . We then optimize this piece-wise linear approximation function, and the approximation factor comes based on the tightness of this piece-wise linear approximation. We call this procedure the piece-wise linear approximation based algorithm (PLA).
We compute this approximation as follows. In the case of Problem 1, compute and for each . Both these computations are linear cost problems and are polynomial time for most constraints. In case these are NP hard for Problem 1, or in the case of Problems 2, 3 and 4, we set and . Then divide the range into pieces with breakpoints such that , , and so on, for any . It is easy to see that . The precision defines the fineness of the points, and the quality of the approximation.
For all , define the piece-wise linear function , via the breakpoints . A visualization of this is shown in Figure 2, where the dotted lines are the piece-wise approximation, while the solid curve is the concave function . We first show that the function approximates the function within a factor of .
The piece-wise linear function defined with a precision satisfies,
where is a constant such that for for all .
By the construction of , and the concavity of the s, it is easy to see that . To show the upper bound, consider a region defined by breakpoints and . Due to concavity of , there exists a tangent at some point in whose slope equals that of the line connecting and . This tangent line upper bounds the concave function , and we can denote the corresponding upper bound as . It then holds that . We now show that .
We now focus on the region . Let be the constant difference between the two (parallel) lines, in terms of the value. A visualization of this is shown in Figure 3. We would like to give a worst case bound on . Notice that . The last inequality holds since , and the second last one holds since is a break point.
Moreover, and hence . ∎
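The breakpoint construction and the resulting sandwich bound can be checked numerically. The sketch below places breakpoints geometrically, interpolates linearly between them, and verifies that the interpolant stays below the concave function and within a factor of $(1 + \epsilon)$ of it on the approximated range. This factor is what a standard argument gives for monotone concave functions that are non-negative at zero; the exact constant in the lemma may differ. The function names, the range $[1, 100]$ and $\epsilon = 0.1$ are assumptions.

```python
import math
from bisect import bisect_right

def geometric_breakpoints(lo, hi, eps):
    """Breakpoints lo, (1+eps)*lo, (1+eps)^2*lo, ..., with the last one capped at hi."""
    assert 0 < lo <= hi and eps > 0
    points = [lo]
    while points[-1] < hi:
        points.append(min(points[-1] * (1 + eps), hi))
    return points

def piecewise_linear(psi, points):
    """Linear interpolant of psi through the given (increasing) breakpoints."""
    values = [psi(x) for x in points]

    def psi_hat(x):
        if not points[0] <= x <= points[-1]:
            raise ValueError("x outside the approximated range")
        i = bisect_right(points, x)        # first breakpoint strictly greater than x
        if i == len(points):               # x equals the last breakpoint
            return values[-1]
        x0, x1 = points[i - 1], points[i]
        t = (x - x0) / (x1 - x0)
        return (1 - t) * values[i - 1] + t * values[i]

    return psi_hat

eps, lo, hi, n = 0.1, 1.0, 100.0, 2000
points = geometric_breakpoints(lo, hi, eps)
sqrt_hat = piecewise_linear(math.sqrt, points)
samples = [lo + (hi - lo) * t / n for t in range(n + 1)]
worst_ratio = max(math.sqrt(x) / sqrt_hat(x) for x in samples)
print(len(points), worst_ratio)
```

The number of breakpoints grows only logarithmically in the ratio between the upper and lower end of the range, which is what keeps the number of linear pieces per concave function small.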
We now show how we can exactly solve Problems 1, 2 and 3 using the cost function . Let denote the slopes of the piece-wise linear functions – in other words, . Also, we denote as the corresponding intercepts. The functions are characterized by the pairs , and . We then consider the different possibilities of the cross-terms. Define as a vector such that .
In the case of Problem 1, PLA solves a set of optimization problems,
The final solution is the minimum among the ones above. For problem 2, we consider the set of problems,
and again the final solution is the minimum among the solutions above. Similarly, for Problem 3, we solve,
We set corresponding to the set with the largest value of . Finally, for Problem 4, we have:
Our main result is that these simple procedures provide improved guarantees for all four problems.
PLA achieves an approximation factor of for Problem 1 as long as a linear function can be exactly minimized under . PLA also achieves a bi-criterion approximation factor satisfying and for Problems 2 and 3. PLA also achieves a non bicriterion approximation factor of for Problem 2. PLA also achieves an approximation factor of for Problem 4. The worst case complexity of PLA is , where is the complexity of Problems 1-4, with a linear cost function .
We first show that PLA solves Problems 1-4 with the surrogate function . Note that with the piece-wise linear approximation, Problem 1 becomes . This holds since , due to the concavity of ’s. We can then rewrite this as , where . Combining these facts, we can rewrite the problem as , which after interchanging the ’s becomes . This is exactly Eq. (3).
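The interchange of minimizations in this argument can be sketched in code for Problem 1. The snippet enumerates one linear piece per concave component, solves the resulting linear-cost problem, and keeps the best candidate. It is an illustration under several assumptions that are not taken from the paper: the constraint is $|X| = m$, zero is added as an extra breakpoint so that components with no selected element are handled exactly, candidates are compared by their true cost rather than by the surrogate, and all names and the toy instance are hypothetical.

```python
import math
from itertools import product

def affine_pieces(psi, lo, hi, eps):
    """(slope, intercept) of the chords of psi through 0, lo, (1+eps)*lo, ..., hi."""
    points = [0.0, lo]
    while points[-1] < hi:
        points.append(min(points[-1] * (1 + eps), hi))
    pieces = []
    for x0, x1 in zip(points, points[1:]):
        slope = (psi(x1) - psi(x0)) / (x1 - x0)
        pieces.append((slope, psi(x0) - slope * x0))
    return pieces

def pla_cardinality(concave_fns, weight_vectors, ground, m, eps):
    """Sketch of PLA for: minimize sum_i psi_i(w_i(X)) subject to |X| = m."""
    def f(X):
        return sum(psi(sum(w.get(e, 0.0) for e in X))
                   for psi, w in zip(concave_fns, weight_vectors))

    all_pieces = []
    for psi, w in zip(concave_fns, weight_vectors):
        positive = [v for v in w.values() if v > 0]
        all_pieces.append(affine_pieces(psi, min(positive), sum(positive), eps))

    best, best_val = None, math.inf
    for choice in product(*all_pieces):      # one (slope, intercept) per component
        # Linear cost of each element for this choice of slopes.
        cost = {e: sum(a * w.get(e, 0.0)
                       for (a, _), w in zip(choice, weight_vectors))
                for e in ground}
        # Linear-cost subproblem under |X| = m: take the m cheapest elements.
        X = frozenset(sorted(ground, key=lambda e: (cost[e], e))[:m])
        val = f(X)                           # compare candidates by true cost
        if val < best_val:
            best, best_val = X, val
    return best, best_val

ground = list("abcdef")
w1 = {"a": 1.0, "b": 2.0, "c": 3.0}
w2 = {"d": 2.0, "e": 2.0, "f": 5.0}
best, best_val = pla_cardinality([math.sqrt, math.sqrt], [w1, w2], ground, m=3, eps=0.25)
print(sorted(best), best_val)
```

The number of linear subproblems is the product of the numbers of pieces across components, which is why the procedure is only practical when the number of components is small. Replacing the sorting step by a minimum cut or minimum weight matching solver would adapt the sketch to other constraints.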
The algorithm for Problem 2 (Eq. (3)) is basically the same as that of Problem 1, since it is a special case. Similarly we can write Problem 4 as which is equivalent to , which becomes equation (3). Equations (3), (3) and (3) each become instances of Problems 1, 2 and 4 with being modular and the approximation guarantees follow directly from [14, 12, 1].
To deal with Problem 3, we use the fact that , and hence we have the constraint, . First we show that is feasible for all . This follows easily from the fact that if for any , , it holds that . Next, let be the optimal solution of Problem 3, and let be such that . Note that our algorithm covers and hence , where is the approximation factor of the submodular knapsack problem . Note that the approximation factor of Problem 1 with is assuming admits an exact solution with linear cost functions, while the factor for problem is for non-bicriterion algorithms, and a bicriterion factor of with a bi-criterion algorithm [31, 12]. ∎
This problem in general is not a combinatorial optimization problem. However, when the functions are quasi-concave, the optimum lies on an extreme point, and hence the problem can be posed as a combinatorial optimization problem. Problem 1 asks for optimizing a specific subclass of concave (and hence quasi-concave) functions. [25, 8, 19] focus on the class of low rank quasi-concave functions, while the general class of low rank functions has also been considered. While these algorithms apply to our class of functions as well, the approach, while being more general, is also more complicated and involved. A special case of Problem 1, with the constraint being the family of cuts (i.e., the cooperative cut problem), has also been considered. Interestingly, the algorithm suggested there is identical to PLA when the cost function is a (low rank) sum of truncations. For general sums of low-rank concave functions, that work resorts to the algorithms of [23, 8]. We provide a generic algorithm, which not only works for a much larger class of constraints and functions, but also extends to Problems 2, 3 and 4. Moreover, it is easy to see that our algorithms would also work for the more general problem of minimizing low rank sums of concave functions over a solvable polytope.
Note that the complexity of PLA is polynomial in , but exponential in . Hence this makes sense only if is a constant or is . If is a constant (with respect to ), PLA is a fully polynomial time approximation scheme (FPTAS) . If , then PLA is a polynomial-time approximation scheme (PTAS). This assumption is reasonable for many of the applications of Problems 1-4 (see details of this in the experiments section). Moreover, there are a number of ways one can speed up PLA. A very simple observation is that PLA is amenable to a distributive implementation via Map-Reduce. In particular, let denote the total number of computations of PLA (i.e., this is the number of times one performs an instance of Problems 1-4 with a modular function). All these can be performed in parallel on processing systems. We output the best from each system to a central processor, which finds the optimal amongst these. The complexity of this distributive procedure is , (where is the complexity of using a modular function in the place of in Problems 1-4), which improves the overall complexity by a factor of .
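The map and reduce steps described above can be sketched with a thread pool standing in for the processing systems. This is only an illustration: `solve_in_parallel`, the toy subproblems and the toy objective are hypothetical, and in CPython a process pool or a cluster framework would be needed for real speedups on CPU-bound subproblems.

```python
from concurrent.futures import ThreadPoolExecutor

def solve_in_parallel(subproblems, solve, objective, workers=4):
    """Map: solve each modular subproblem independently. Reduce: keep the best."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        candidates = list(pool.map(solve, subproblems))   # map step
    return min(candidates, key=objective)                 # reduce step

# Hypothetical toy instance: each subproblem is a modular weight vector and the
# "solver" returns the single cheapest element under that vector.
true_cost = {"a": 5.0, "b": 4.0, "c": 1.0}
subproblems = [{"a": 3.0, "b": 1.0, "c": 2.0},
               {"a": 1.0, "b": 3.0, "c": 2.0}]

def cheapest_element(weights):
    return frozenset({min(weights, key=weights.get)})

def objective(X):
    return sum(true_cost[j] for j in X)

result = solve_in_parallel(subproblems, cheapest_element, objective, workers=2)
print(sorted(result))
```

Since the subproblems share no state, the only communication is the final comparison of one candidate per worker, which matches the description of sending the best solution from each system to a central processor.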
In addition, we can also provide early stopping criterion and heuristics for speeding up PLA. One strategy of implementing PLA, is to start with, and incrementally increase in a coordinate ascent fashion. The following lemma gives a sufficient condition for stopping PLA.
Let be such that the corresponding solution satisfies . Then is the (near) optimal solution for Problems 1, 2 and 3.
The values of also suggest the direction of the coordinate-wise algorithm. For example, if , it suggests that the value of be decreased. Similarly, if , it is a sign that be decreased. In this manner, one can define a greedy-like heuristic to implement PLA, which picks, for every coordinate, the slope which increases the objective value the most. Many of these heuristics have been considered in the case of cuts, when the function class is low rank sums of truncations. These heuristics are all polynomial in , but are not guaranteed to obtain the optimal solutions. Moreover, in certain cases (for example, the case of cuts), one can do parametric versions, thereby solving a set of related problems simultaneously [20, 5].
We next experimentally evaluate the performance of our methods. The utility of the constrained minimization algorithms for cooperative cuts have been investigated in . In this paper, we consider the applications of cooperative image matching and sensor placement.
4.1 Cooperative Image matching
The problem of matching key-points in images, also called image correspondence, is an important problem in computer vision. The simplest model for this problem constructs a matching with linear scores, i.e., a max bipartite matching, called a linear assignment. This model does not allow a representation of interaction between the pixels. For example, we see many obviously spurious matches in Figure 4b. Many models try to capture this interaction, for example via quadratic assignments. Instead of just looking at the best linear assignment, the quadratic models try to incorporate pairwise constraints. This is also called graph matching.
We describe a new and different model here. First, we cluster key-points, separately in each of the two images, into clusters. Figure 4a shows a particular clustering of an image into groups. The clustering can be performed based on the pixel color map, or simply the distance of the key-points. That is, each image has clusters. Let and be the two sets of clusters. We then compute the linear assignment problem, letting be the resulting maximum matching. We then partition the edge set where for corresponding to the ’th largest intersection, and are the remaining edges either that were not matched or that did not lie within a frequently associated pair of image key-point clusters. We then define a submodular function as follows:
which provides an additional discount to the edges corresponding to key-points that were frequently associated in the initial pass. The problem of cooperative matching then becomes an instance of Problem 1 with the submodular function (over the edges) defined above, and a constraint that the edges form a matching. Figures 4b and 4c show how the submodular matchings improve over the simple bipartite matching, with . The minimum matching approach obtains many spurious matches between clusters (shown in red), while the cooperation described above reduces these spurious matches. The cooperative matching improves the performance over the modular method on these images by about .
We also test the performance of our algorithms on the CMU House and Hotel datasets. The house dataset has images, while the hotel dataset has images. We consider all possible pairs of images, with differences between the two images ranging from in both cases. We consider three algorithms: PLA, SGA (both using Equation (7)) and the simple modular bipartite matching as a baseline (Mod). Again, we set . The results are shown in Figure 4(d-e), where we observe that PLA and SGA beat Mod by about on average. Moreover, we also see that PLA, in general, outperforms SGA, thus showing how superior theoretical guarantees translate into better empirical performance. In PLA, we chose such that each concave function has four break points. We also observed that setting lower values of does not improve the objective value in this application. Finally, PLA also beats SGA in terms of objective value. We do not compare against the ellipsoidal approximation algorithm (EA), mainly because it is too slow to run on real world problems. Moreover, this algorithm has been observed to perform comparably to the much simpler SGA. While we considered the simple linear assignment as a baseline for the cooperative matching, it seems possible to embed this cooperation in more involved graph matching models as well.
4.2 Sensor Placement
We next consider an application of sensor placement. A number of natural models for this problem are forms of submodular maximization [21, 22]. A natural model, that performs very well in practice, is to maximize the mutual information , where refers to the set of sensors chosen. [21, 22] investigate this in the setting of additive costs on the sensors. Often however, the costs are not additive in practice. In fact, very often, they are also submodular , and a natural model is,
where s are concave, is the cost of sensor and are groups of similar sensors. This was posed as an open problem in . We can naturally pose this as instances of Problem 3, where and is the cost function above. Note that we could equivalently also express this as an instance of Problem 2 with a constraint on while minimizing .
We consider real world data of placing sensors to predict the pH values from the lake of Merced. We also assume that the function is piece-wise linear, shown in Figure 5a (far left). Figure 5b shows the locations (horizontal and vertical). We assume that there are three kinds of locations, shown in blue, green and red colors respectively, and that the cost of placing sensors in the same kind of location is discounted. Correspondingly, we assume the cost function is an instance of Equation (8) with . For simplicity, we assume also that all three types of sensor locations have the same coverage model (though, in general, it would make sense for them to have different models for coverage, based on their type). Under this assumption, the optimal configuration would tend to be spatially diverse, yet cooperative (in the sense that the same type of sensors would be chosen).
We compare three algorithms: PLA and SGA (both on Problem 3), and a simple cost agnostic greedy algorithm (AG), which ignores the cost function and greedily adds sensors. Figure 5c shows the sensors chosen by PLA (the cost sensitive one), and Figure 5d shows the choices of AG (the cost agnostic one). While both have the same cost budget, the cost agnostic one does not utilize the discounts of placing sensors in similar locations, and correspondingly, places fewer sensors. The cost sensitive algorithms (PLA and SGA), on the other hand, simultaneously achieve coverage while making use of the discounts. Figure 5e plots the objective functions attained by the three algorithms. We see that both PLA and SGA outperform the agnostic greedy algorithm. Moreover, PLA also performs better than SGA. Note that the function used in this case is piece-wise linear, and correspondingly PLA is exact in this case.
In this paper, we investigated a new class of algorithms for various forms of constrained submodular programs, with a special subclass of submodular cost functions. We focus on problems that for the general class of submodular functions are hard, and yet occur naturally in many applications. We showed that when we restrict the class of functions to low rank sums of concave over modular functions, we can obtain significantly improved worst case theoretical results. We also complemented our results with experimental results in sensor placement and image correspondence. An immediate open question is whether there are similar algorithms for other rich and useful subclasses of submodular functions. In particular, it would be interesting if one can remove the low rank assumption, and provide tighter approximation algorithms for general sums of concave over modular functions, which would be very powerful.
This material is based upon work supported by the National Science Foundation under Grant No. (IIS-1162606), as well as a Google and a Microsoft award. This work was also supported in part by the CONIX Research Center, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.
- [1] W. Bai, R. Iyer, K. Wei, and J. Bilmes. Algorithms for optimizing the ratio of submodular functions. In International Conference on Machine Learning, pages 2751–2759, 2016.
- [2] Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. TPAMI, 26(9):1124–1137, 2004.
- [3] T. S. Caetano, J. J. McAuley, L. Cheng, Q. V. Le, and A. J. Smola. Learning graph matching. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31(6):1048–1058, 2009.
- [4] S. Fujishige. Submodular functions and optimization, volume 58. Elsevier Science, 2005.
- [5] S. Fujishige and S. Iwata. Minimizing a submodular function arising from a concave function. Discrete Applied Mathematics, 92(2):211–215, 1999.
- [6] G. Goel, P. Tripathi, and L. Wang. Combinatorial problems with discounted price functions in multi-agent systems. In FSTTCS, 2010.
- [7] M. Goemans, N. Harvey, S. Iwata, and V. Mirrokni. Approximating submodular functions everywhere. In SODA, pages 535–544, 2009.
- [8] V. Goyal and R. Ravi. An FPTAS for minimizing a class of low-rank quasi-concave functions over a convex set. Operations Research Letters, 41(2):191–196, 2013.
- [9] S. Iwata and K. Nagano. Submodular function minimization under covering constraints. In FOCS, pages 671–680. IEEE, 2009.
- [10] R. Iyer and J. Bilmes. Algorithms for approximate minimization of the difference between submodular functions, with applications. In UAI, 2012.
- [11] R. Iyer and J. Bilmes. The submodular Bregman and Lovász-Bregman divergences with applications. In NIPS, 2012.
- [12] R. Iyer and J. Bilmes. Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints. In NIPS, 2013.
- [13] R. Iyer, S. Jegelka, and J. Bilmes. Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions. In Neural Information Processing Systems (NIPS), 2013.
- [14] R. Iyer, S. Jegelka, and J. Bilmes. Curvature and Optimal Algorithms for Learning and Optimization of Submodular Functions: Extended arxiv version, 2013.
- [15] R. Iyer, S. Jegelka, and J. Bilmes. Fast Semidifferential based Submodular function optimization. In ICML, 2013.
- [16] S. Jegelka and J. A. Bilmes. Approximation bounds for inference using cooperative cuts. In ICML, 2011.
- [17] S. Jegelka and J. A. Bilmes. Submodularity beyond submodular energies: coupling edges in graph cuts. In CVPR, 2011.
- [18] S. Jegelka, A. Kapoor, and E. Horvitz. An interactive approach to solving correspondence problems. International Journal of Computer Vision, pages 1–10, 2013.
- [19] J. A. Kelner and E. Nikolova. On the hardness and smoothed complexity of quasi-concave minimization. In Foundations of Computer Science, 2007. FOCS’07. 48th Annual IEEE Symposium on, pages 472–482. IEEE, 2007.
- [20] P. Kohli, A. Osokin, and S. Jegelka. A principled deep random field for image segmentation. In CVPR, 2013.
- [21] A. Krause and C. Guestrin. Optimizing sensing: From water to the web. Technical report, DTIC Document, 2009.
- [22] A. Krause, A. Singh, and C. Guestrin. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. JMLR, 9:235–284, 2008.
- [23] S. Mittal and A. S. Schulz. An FPTAS for optimizing a class of low-rank functions over a polytope. Mathematical Programming, 141(1-2):103–120, 2013.
- [24] G. Nemhauser and L. Wolsey. Best algorithms for approximating the maximum of a submodular set function. Mathematics of Operations Research, 3(3):177–188, 1978.
- [25] E. Nikolova. Approximation algorithms for offline risk-averse combinatorial optimization, 2010.
- [26] A. S. Ogale and Y. Aloimonos. Shape and the stereo correspondence problem. International Journal of Computer Vision, 65(3):147–162, 2005.
- [27] C. Qian, J.-C. Shi, Y. Yu, K. Tang, and Z.-H. Zhou. Optimizing ratio of monotone set functions. In IJCAI, pages 2606–2612, 2017.
- [28] M. Sviridenko. A note on maximizing a submodular set function subject to a knapsack constraint. Operations Research Letters, 32(1):41–43, 2004.
- [29] Z. Svitkina and L. Fleischer. Submodular approximation: Sampling-based algorithms and lower bounds. In FOCS, pages 697–706, 2008.
- [30] V. V. Vazirani. Approximation algorithms. Springer, 2004.
- [31] L. A. Wolsey. An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica, 2(4):385–393, 1982.