 # Majorisation-minimisation algorithms for minimising the difference between lattice submodular functions

We consider the problem of minimising functions represented as a difference of lattice submodular functions. We propose analogues to the SupSub, SubSup and ModMod routines for lattice submodular functions. We show that our majorisation-minimisation algorithms produce iterates that monotonically decrease, and that we converge to a local minimum. We also extend additive hardness results, and show that a broad range of functions can be expressed as the difference of submodular functions.


## 1 Introduction

In discrete optimisation, many objectives are expressible as submodular minimisation or maximisation problems. This is useful, as submodular functions can be minimised in strongly polynomial time, and maximisation admits strong constant-factor approximations under a variety of constraints. Applications of these problems have included medical imaging, document summarisation and online advertising.

We refer to $V$ as the ground set, and a function $f : 2^V \to \mathbb{R}$ is then called submodular if the following inequality holds for all $A, B \subseteq V$:

 f(A)+f(B)≥f(A∪B)+f(A∩B). (1)

We note that a subset $A \subseteq V$ can also be represented as a vector $x \in \{0,1\}^{n}$ with $n = |V|$, where a component $x_i$ takes on the value $1$ if the element $i$ is present in the subset $A$, and $0$ otherwise. In this way our inequality can be written as

 f(x)+f(y)≥f(min(x,y))+f(max(x,y)) (2)

for all $x, y \in \{0,1\}^{n}$, with the minimum and maximum taken componentwise. A popular and often-studied application of submodular optimisation, which we shall use as our running example, is that of sensor placement. Suppose we wish to detect the level of air pollution in a town. We have a finite set of locations in which we can install air pollution sensors, but we are constrained by some budget. Knowing the geography of our town, we wish to select the set of locations such that the information we receive is maximised. This problem is known to be submodular.

In the classic case, this corresponds to only having one type of sensor. However, we can suppose that we have a number of different types of sensors available, each with a strength level that can be parameterised by an integer variable $x_i \in \{0, \dots, k-1\}$. Lattice submodularity is then determined by satisfying Equation (2), but for all $x, y \in \{0, \dots, k-1\}^n$. Note that in general, we can allow $k$ to be different for different elements, writing $k_i$ for element $i$.

Submodular functions in the set case also lend themselves to a useful diminishing returns property, which is often considered as an equivalent definition.

###### Definition 1.1 (Diminishing returns property).

A submodular set function can also be defined by the following. For any $A \subseteq B \subseteq V$ and $e \in V \setminus B$, we have

 f(A∪e)−f(A)≥f(B∪e)−f(B). (3)

However, for lattice functions we unfortunately do not get this for free, so we define a proper subclass known as DR-submodular functions as follows:

###### Definition 1.2 (DR-Submodular).

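Let $f$ be a lattice submodular function on $\prod_{i=1}^n \{0, \dots, k_i - 1\}$. Then $f$ is called a DR-submodular function if for all $x, y$ with $x \le y$ componentwise, every basis vector $e_j$ and every positive integer $c$ such that $y + c e_j$ stays in the domain, we have

$$f(x + c e_j) - f(x) \ge f(y + c e_j) - f(y). \tag{4}$$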
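The two definitions can be compared on small lattices by brute force. The following is a minimal sketch, not part of the original paper: the helper names and the two test functions are illustrative assumptions, and the lattice is taken to be $\{0,\dots,k_1-1\}\times\dots\times\{0,\dots,k_n-1\}$. It checks inequality (2) and the unit-step form of inequality (4), and shows a function that is lattice submodular but not DR-submodular.

```python
from itertools import product


def lattice_points(k):
    """All points of {0..k[0]-1} x ... x {0..k[n-1]-1}."""
    return list(product(*(range(ki) for ki in k)))


def is_lattice_submodular(f, k):
    """Brute-force check of f(x) + f(y) >= f(min(x, y)) + f(max(x, y))."""
    pts = lattice_points(k)
    for x in pts:
        for y in pts:
            lo = tuple(map(min, x, y))
            hi = tuple(map(max, x, y))
            if f(x) + f(y) < f(lo) + f(hi):
                return False
    return True


def is_dr_submodular(f, k):
    """Check f(x + e_j) - f(x) >= f(y + e_j) - f(y) for all x <= y.

    Unit steps suffice: the inequality for a step of size c follows by
    summing c unit-step inequalities.
    """
    pts = lattice_points(k)
    for x in pts:
        for y in pts:
            if any(a > b for a, b in zip(x, y)):
                continue  # only compare x <= y componentwise
            for j in range(len(k)):
                if y[j] + 1 >= k[j]:
                    continue  # y + e_j would leave the lattice
                x_up = x[:j] + (x[j] + 1,) + x[j + 1:]
                y_up = y[:j] + (y[j] + 1,) + y[j + 1:]
                if f(x_up) - f(x) < f(y_up) - f(y):
                    return False
    return True


def concave_of_sum(x):
    # Hypothetical example: a concave function of the total strength.
    s = sum(x)
    return 8 * s - s * s


def convex_in_each_coordinate(x):
    # Hypothetical example: convex along each axis, negative cross term.
    return x[0] ** 2 + x[1] ** 2 - x[0] * x[1]


k = (3, 3)
print(is_lattice_submodular(concave_of_sum, k), is_dr_submodular(concave_of_sum, k))
print(is_lattice_submodular(convex_in_each_coordinate, k),
      is_dr_submodular(convex_in_each_coordinate, k))
```

The second function passes the lattice check because only the cross term matters there, but fails the DR check because its increments grow along each axis.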

Many typical objectives we come across are in fact DR-submodular, so it makes sense to consider this subclass. The specific problem of optimising DR-submodular functions has been studied before: across the literature, maximisation has been studied in monotone and non-monotone settings under a variety of constraints.

In this paper, we will carry out our majorisation-minimisation algorithm by discretising the lattice submodular functions, giving us submodular functions on some lattice. It is known that DR-submodular functions on a lattice can be reduced to the submodular set function case, where the ground set will be smaller. Due to this, we do not consider the specific DR subclass here, as it would be more efficient to carry out this reduction and then use the existing set function algorithms.

To motivate the problem of considering the difference of submodular functions, we turn back to the sensor placement problem. Here, we may wish to also factor in the costs of the sensors. As is typical in real-world applications, we expect to get a bulk discount if we buy many sensors. Additionally, we may expect that for any one sensor, the unit price increases more slowly as the sensor gets stronger. This leads us to the diminishing returns property for the cost as well. Specifically, for a given sensor placement, we wish to minimise the cost spent minus the information obtained, with the information weighted by some tradeoff parameter. This formulation also applies to any other problem of optimising a submodular function with some cost.

In the set function case, this problem has been considered before, with several different majorisation-minimisation techniques proposed. Here, our algorithms build on earlier work on discretised versions of continuous submodular functions, and hence on lattice submodular functions. We derive majorisation-minimisation algorithms for the difference $f - g$. They can be considered the extension to the lattice case of the SupSub, SubSup and ModMod procedures from the set function literature. In fact, by discretising continuous functions, we can also consider them an extension to the continuous domain. We will show that we converge to a local minimum of the function, and consider the class of functions we can represent as the difference of submodular functions.

### 1.1 Outline

We begin by discussing some preliminaries in Section 2. We consider the theory of continuous and lattice submodular functions, and show how subgradients are obtained. We show modular and tight lower and upper bounds, along with a decomposition that allows us to utilise the theory of DR-submodular functions effectively. Section 3 presents and discusses the three algorithms, along with a discussion of a stopping criterion. In Section 4, we discuss complexity, deriving the functions for the representation $v = f - g$, along with other theoretical issues. We conclude with a brief discussion in Section 5.

## 2 Preliminaries

We are considering the problem of minimising the difference between lattice submodular functions. In particular, we seek to minimise

 v(x)=f(x)−g(x) (5)

where $f$ and $g$ are lattice submodular. For our algorithms, our plan is to use a previously defined extension of these functions. While doing this, we will also give the main ingredient for computing the lower bound.

### 2.1 An Extension

Our extension will be from the lattice $\prod_{i=1}^n \{0, \dots, k_i - 1\}$ to the set of product probability measures on it. The motivation for this can be thought of as follows. In the Lovász extension, the value taken on by a component can be thought of as the probability that an element is included in a set. Similarly, we will extend here to probability mass functions over the domain $\{0, \dots, k_i - 1\}$ of each component.

We note that any PMF on this domain can be represented by its reverse cumulative distribution function $\rho_i(x_i) = P(X_i \ge x_i)$. We then see that $\rho_i(0) = 1$ always, and the only constraint on the remaining entries $\rho_i(1), \dots, \rho_i(k_i - 1)$ is that they are non-increasing and all belong to $[0, 1]$. We collect these vectors, one per component, into $\rho = (\rho_1, \dots, \rho_n)$.

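Next, consider one such vector $\rho_i$, and note that if we mark the locations of $\rho_i(1), \dots, \rho_i(k_i - 1)$ on $[0, 1]$, we divide the interval into at most $k_i$ segments. This induces a map $\theta(\rho_i, \cdot)$ from $[0, 1]$ to $\{0, \dots, k_i - 1\}$ with $\theta(\rho_i, t) = k_i - 1$ if $t < \rho_i(k_i - 1)$, $\theta(\rho_i, t) = x_i$ if $t \in (\rho_i(x_i + 1), \rho_i(x_i))$ for $x_i \in \{1, \dots, k_i - 2\}$, and $\theta(\rho_i, t) = 0$ for $t > \rho_i(1)$.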

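Our extension is then defined, for a submodular function $f$, by

$$f_{\downarrow}(\rho) = \int_0^1 f\bigl(\theta(\rho_1, t), \dots, \theta(\rho_n, t)\bigr)\,dt. \tag{6}$$

A more thorough exposition can be found in the work introducing this extension. We now detail how the extension is evaluated using a greedy algorithm.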

### 2.2 A lower bound

We now give the greedy algorithm for evaluating the extension.

###### Theorem 2.1.

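Consider the extension $f_{\downarrow}$ of a submodular function $f$. Order all $r = \sum_{i}(k_i - 1)$ values of $\rho_i(x_i)$ in decreasing order, breaking ties arbitrarily but ensuring that all components for a given $i$ are in the correct order. Assume the value at position $s$ is equal to $t(s)$ and corresponds to $\rho_{i(s)}(j(s))$, and let $y(s)$ be the lattice point reached after the first $s$ increments, with $y(0) = 0$. Then we have the following expression for $f_{\downarrow}(\rho)$:

$$f_{\downarrow}(\rho) = f(0) + \sum_{s=1}^{r} t(s)\bigl(f(y(s)) - f(y(s-1))\bigr), \tag{7}$$

which we write in the form $f(0) + \langle w, \rho \rangle$, where $w_i(j)$ corresponds to the difference in value between two function evaluations whose arguments differ by a single basis vector.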
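To make the greedy formula concrete, the sketch below evaluates the extension in two ways: by integrating the piecewise-constant integrand of Equation (6) exactly, and by the sorted sum of Equation (7). It is only an illustration under stated assumptions: each $\rho_i$ is passed as a non-increasing tuple $(\rho_i(1), \dots, \rho_i(k_i-1))$ of `Fraction` values, `theta` counts the entries above the threshold, and the function names and the example function are hypothetical.

```python
from fractions import Fraction


def theta(rho_i, t):
    """Level picked out by threshold t: how many entries of rho_i exceed t."""
    return sum(1 for r in rho_i if r > t)


def extension_by_integration(f, rho):
    """Integrate t -> f(theta(rho_1, t), ..., theta(rho_n, t)) over [0, 1].

    The integrand is piecewise constant, so it is enough to evaluate it at
    the midpoint of every interval between consecutive breakpoints.
    """
    cuts = sorted({Fraction(0), Fraction(1), *(r for rho_i in rho for r in rho_i)})
    total = Fraction(0)
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        total += (hi - lo) * f(tuple(theta(rho_i, mid) for rho_i in rho))
    return total


def extension_by_greedy(f, rho):
    """Evaluate f(0) + sum_s t(s) * (f(y(s)) - f(y(s-1)))."""
    # One entry (value, level, coordinate) per rho_i(j); sort by decreasing
    # value, and on ties put lower levels of a coordinate first.
    entries = sorted(
        ((r, j, i) for i, rho_i in enumerate(rho) for j, r in enumerate(rho_i, start=1)),
        key=lambda e: (-e[0], e[1]),
    )
    y = [0] * len(rho)
    total = Fraction(f(tuple(y)))
    for t_s, _, i in entries:
        before = f(tuple(y))
        y[i] += 1  # increment the coordinate this value belongs to
        total += t_s * (f(tuple(y)) - before)
    return total


def example_f(x):
    # Hypothetical submodular function: concave in the sum of the levels.
    s = sum(x)
    return 8 * s - s * s


# Lattice {0, 1, 2} x {0, 1}: rho_1 has two entries, rho_2 has one.
rho = [(Fraction(3, 4), Fraction(1, 4)), (Fraction(1, 2),)]
print(extension_by_integration(example_f, rho), extension_by_greedy(example_f, rho))
```

Both routes agree, and when every entry of $\rho$ is $0$ or $1$ the extension returns the value of the function at the corresponding lattice point.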

In this section, we will first remind ourselves of the different characterisations of the base polyhedron, how the greedy algorithm is used to construct extreme points for one of these characterisations, and how subgradients are computed. Using this, we will compute a lower bound that will be used for our algorithm.

We first recall the characterisation of the base polyhedron as it was initially given:

###### Definition 2.1 (Base Polyhedron).

Let $f$ be a discrete submodular function, WLOG with $f(0) = 0$, defined in each argument $x_i$ from $0$ to the integer $k_i - 1$ respectively. Then the base polyhedron can be defined by weights $w_i(y_i)$ such that for all $x$:

$$\sum_{i=1}^{n} \sum_{y_i=1}^{x_i} w_i(y_i) \le f(x_1, \dots, x_n), \tag{8}$$

$$\sum_{i=1}^{n} \sum_{y_i=1}^{k_i-1} w_i(y_i) = f(k_1 - 1, \dots, k_n - 1). \tag{9}$$

This set is in fact not a polytope: it is unbounded if there is any $k_i > 2$. This can be made explicit in the following example.

###### Example 2.1 (Base Polyhedron Unbounded).

Let $k_i > 2$ for some $i$, say $i = 1$. Then we can add to any $w$ in this set a vector $u$, zero in all other entries, such that

$$u_1(1) = -1, \quad u_1(2) = 1, \tag{10}$$

and we see this won’t violate any of the equations in the definition. This argument extends straightforwardly to any other component with $k_i > 2$.

Because of this, the base polyhedron is instead defined as the convex hull of the outputs of the greedy algorithm. The following result shows that the base polyhedron still behaves in the same way:

###### Lemma 2.1.

Let $f$ be some submodular function. Then for any $\rho$ in the domain of the extension we have that

$$\max_{w} \langle w, \rho \rangle = f_{\downarrow}(\rho) - f(0), \tag{11}$$

where we take the max over either characterisation of the base polyhedron.

We see that as long as we take a $w$ compatible with the ordering of $\rho$, we will get a subgradient for the function $f_{\downarrow}$. Restricting to the $\rho$ that correspond to points in the domain of our original submodular function $f$, we get an element of the subdifferential of $f$.

Now we can construct a lower bound. First, we construct a set with $r = \sum_i (k_i - 1)$ elements, each of which corresponds to an increment of one of the basis vectors. There are $k_i - 1$ copies of the increment of element $i$. Note that an ordering of such a set corresponds to a chain defined by

$$0 = p_0 < p_1 < \dots < p_r. \tag{12}$$

Take a permutation of this set, denoted by $\sigma$, and form its corresponding chain. Ensure $\sigma$ is such that the chain contains $y$, something we define now.

###### Definition 2.2 (Chain containing an element).

Let $p_0, \dots, p_r$ be a chain as defined in Equation (12). We say the chain contains $y$, for some vector $y$, if we have $p_s = y$ for some $s$.

Note that any chain that contains a vector $y$ is compatible with the ordering of the $\rho$ that corresponds to $y$. In particular, we’ll take the chain where we increment the first element to $y_1$, then the second to $y_2$ and so on. After it reaches $y$, we let it have any behaviour. Then we can form $w$ by taking differences of successive elements of the chain.

Now we can form the function corresponding to the lower bound by making $w$ as described earlier. This will be denoted by

$$h_{f,y}(i, j) = w_i(j). \tag{13}$$

To extend this definition to an entire point $x$ we can do the following:

$$h_{f,y}(x) = \sum_{i=1}^{n} \sum_{j=1}^{x_i} h_{f,y}(i, j) = \sum_{i=1}^{n} \sum_{j=1}^{x_i} w_i(j). \tag{14}$$

Now note that for each $p_i$ in the chain, we have that

$$h_{f,y}(p_i) = f(p_i). \tag{15}$$

In particular, note that as $y$ is contained in this chain, we have that

$$h_{f,y}(y) = f(y). \tag{16}$$

Then due to the fact that this is a subgradient, we have that

$$h_{f,y}(x) \le f(x) \tag{17}$$

for all $x$. This bound is parameterised by $y$ and is tight at $y$.
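The construction of the lower bound can be sketched in a few lines. The code below is an illustration under assumptions rather than the paper's implementation: it assumes $f(0) = 0$, fixes one particular chain (climb to $y$ coordinate by coordinate, then continue to the top in a fixed order), and uses hypothetical helper names and a hypothetical test function. It then checks Equations (16) and (17) by brute force.

```python
from itertools import product


def chain_steps(y, k):
    """Increments (i, j), meaning 'raise coordinate i to level j'.

    The chain first climbs to y one coordinate at a time, then continues
    (in an arbitrary but fixed order) up to the top of the lattice.
    """
    n = len(k)
    to_y = [(i, j) for i in range(n) for j in range(1, y[i] + 1)]
    beyond = [(i, j) for i in range(n) for j in range(y[i] + 1, k[i])]
    return to_y + beyond


def lower_bound_weights(f, y, k):
    """w[i][j] = f(p_s) - f(p_{s-1}) for the step raising coordinate i to j."""
    w = [[0] * ki for ki in k]  # w[i][0] is unused
    p = [0] * len(k)
    for i, j in chain_steps(y, k):
        before = f(tuple(p))
        p[i] = j
        w[i][j] = f(tuple(p)) - before
    return w


def modular_lower_bound(w, x):
    """h(x) = sum_i sum_{j=1}^{x_i} w_i(j), assuming f(0) = 0."""
    return sum(w[i][j] for i, xi in enumerate(x) for j in range(1, xi + 1))


def example_g(x):
    # Hypothetical lattice submodular function (not DR-submodular), g(0) = 0.
    return x[0] ** 2 + x[1] ** 2 - 4 * x[0] * x[1]


k, y = (4, 4), (2, 1)
w = lower_bound_weights(example_g, y, k)
points = list(product(*(range(ki) for ki in k)))
print(modular_lower_bound(w, y) == example_g(y))                      # tight at y
print(all(modular_lower_bound(w, x) <= example_g(x) for x in points))  # lower bound
```

Note that the example function is not DR-submodular, which matches the remark that the lower bound only needs lattice submodularity.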

### 2.3 Upper Bound

An upper bound has previously been derived for submodular set functions. This can be extended to the lattice, but it does require DR-submodularity. We now show how to get around that requirement if we know one particular quantity:

###### Lemma 2.2.

Let $f$ be a lattice submodular function. Then $f$ can be represented as the sum of a modular function and a DR-submodular function.

###### Proof.

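Let $\lambda$ be the largest second difference of the function that violates the DR property, or at least an upper bound of it. Because of submodularity, we know the difference will be a second difference entirely within one basis element. Namely, take:

$$\lambda \ge \max_{x, i}\bigl(f(x + 2e_i) - 2f(x + e_i) + f(x)\bigr). \tag{18}$$

Then subtract from $f$ a modular (separable) function whose second difference in every coordinate is $\lambda$, for example $\frac{\lambda}{2}\sum_i x_i^2$. The remainder is DR-submodular, as required. ∎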
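The decomposition can be illustrated numerically. This is a sketch under assumptions: $\lambda$ is found by brute force rather than bounded analytically, the separable term $\lambda \sum_i x_i(x_i - 1)/2$ is one convenient choice with second difference $\lambda$ in every coordinate (the paper's exact choice is not recoverable from the text here), and all function names are hypothetical.

```python
from itertools import product


def shift(x, i):
    """Return x + e_i."""
    return x[:i] + (x[i] + 1,) + x[i + 1:]


def second_differences(f, k, same_coordinate):
    """Yield f(x+e_i+e_j) - f(x+e_i) - f(x+e_j) + f(x) over the lattice,
    for i == j if same_coordinate is True, otherwise for i != j."""
    n = len(k)
    for x in product(*(range(ki) for ki in k)):
        for i in range(n):
            for j in range(n):
                if (i == j) != same_coordinate:
                    continue
                top = shift(shift(x, i), j)
                if any(t >= kt for t, kt in zip(top, k)):
                    continue  # x + e_i + e_j leaves the lattice
                yield f(top) - f(shift(x, i)) - f(shift(x, j)) + f(x)


def split_modular_plus_dr(f, k):
    """Return (lam, modular, dr_part) with f = modular + dr_part."""
    lam = max(0, max(second_differences(f, k, same_coordinate=True), default=0))

    def modular(x):
        # Separable, with second difference lam in every coordinate.
        return lam * sum(xi * (xi - 1) // 2 for xi in x)

    def dr_part(x):
        return f(x) - modular(x)

    return lam, modular, dr_part


def example_f(x):
    # Hypothetical lattice submodular function that is not DR-submodular.
    return x[0] ** 2 + x[1] ** 2 - 4 * x[0] * x[1]


k = (4, 4)
lam, modular, dr_part = split_modular_plus_dr(example_f, k)
print(lam)
# DR-submodularity on the lattice: every unit second difference is <= 0.
print(all(d <= 0 for flag in (True, False)
          for d in second_differences(dr_part, k, same_coordinate=flag)))
```

In practice $\lambda$ would come from one of the analytic upper bounds rather than from enumeration, which is only feasible on tiny lattices.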

Computing the exact tight value of $\lambda$ is often hard, but there are some cases where it will be easier to derive upper bounds:

1. If the function is the discretisation of a continuous function, we can compute the largest positive eigenvalue of its Hessian, or an upper bound of it, via a number of eigenvalue algorithms.

2. If the function is also $L^\natural$-convex (or midpoint convex), the function is convex-extensible, and so we can proceed similarly to above.

3. If the function is a quadratic program with quadratic term $\frac{1}{2} x^\top A x$, then $\lambda$ is the maximum positive diagonal element of $A$ (or $0$ if none exist).

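The idea here is that we will write $f$ as a modular function plus a DR-submodular function, and then derive a modular and tight upper bound for the DR-submodular part, thus giving our modular and tight upper bound for $f$, as the remaining part is already modular. The aim is to generalise the bounds known for set functions, given as

$$f(Y) \le f(X) - \sum_{j \in X \setminus Y} f(j \mid X \setminus j) + \sum_{j \in Y \setminus X} f(j \mid \emptyset), \tag{19}$$

$$f(Y) \le f(X) - \sum_{j \in X \setminus Y} f(j \mid V \setminus j) + \sum_{j \in Y \setminus X} f(j \mid X). \tag{20}$$

In this section, we assume that $f$ is a lattice submodular function given on the product of sets $\prod_{i=1}^n \{0, \dots, k_i - 1\}$. The extension of these bounds is given in the following lemma: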

###### Lemma 2.3.

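Let $f$ be a DR-submodular function as described. Let $m(z) = \max(z, 0)$, extending its argument to vectors componentwise. Let $x, y$ be vectors in the domain of $f$. Let

$$(a_1, \dots, a_n) = m(x - y), \tag{21}$$

$$(b_1, \dots, b_n) = m(y - x). \tag{22}$$

Then we have the following:

$$f(y) \le f(x) - \sum_{i=1}^{n} \bigl[f(x) - f(x - a_i e_i)\bigr] + \sum_{i=1}^{n} f(b_i e_i), \tag{23}$$

$$f(y) \le f(x) - \sum_{i=1}^{n} \bigl[f(k_{\max}) - f(k_{\max} - a_i e_i)\bigr] + \sum_{i=1}^{n} \bigl[f(x + b_i e_i) - f(x)\bigr], \tag{24}$$

where $e_i$ are the usual basis vectors and $k_{\max} = (k_1 - 1, \dots, k_n - 1)$ is the largest point of the lattice.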

###### Proof.

We first show the following bounds:

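$$f(y) \le f(x) - \sum_{i=1}^{n} \bigl[f(x) - f(x - a_i e_i)\bigr] + \sum_{i=1}^{n} \bigl[f(b_i e_i + \min(x, y)) - f(\min(x, y))\bigr], \tag{25}$$

$$f(y) \le f(x) - \sum_{i=1}^{n} \bigl[f(\max(x, y)) - f(\max(x, y) - a_i e_i)\bigr] + \sum_{i=1}^{n} \bigl[f(x + b_i e_i) - f(x)\bigr]. \tag{26}$$

The proof proceeds similarly to the derivation in the set function case, but with unions and intersections replaced with maxes and mins respectively. We start with the second statement. Take arbitrary $x, y$ with $a, b$ as before. Then note we have

$$f(\max(x, y)) - f(x) = \sum_{i=1}^{n} \Bigl[f\Bigl(x + \sum_{j=1}^{i} b_j e_j\Bigr) - f\Bigl(x + \sum_{j=1}^{i-1} b_j e_j\Bigr)\Bigr] \tag{27}$$

$$= \sum_{i=1}^{n} \rho_{b_i, e_i}\Bigl(x + \sum_{j=1}^{i-1} b_j e_j\Bigr), \tag{28}$$

where we take an empty sum to be zero. Here $\rho_{c, e_i}(z) = f(z + c e_i) - f(z)$ denotes the marginal return on adding the value $c$ in the basis vector $e_i$ when the function already has argument $z$. Using the DR-submodular property, we then see

$$\sum_{i=1}^{n} \rho_{b_i, e_i}\Bigl(x + \sum_{j=1}^{i-1} b_j e_j\Bigr) \le \sum_{i=1}^{n} \rho_{b_i, e_i}(x) = \sum_{i=1}^{n} \bigl[f(x + b_i e_i) - f(x)\bigr]. \tag{29}$$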

Similarly, we’ll now consider the following expression:

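$$f(\max(x, y)) - f(y) = \sum_{i=1}^{n} \Bigl[f\Bigl(y + \sum_{j=1}^{i} a_j e_j\Bigr) - f\Bigl(y + \sum_{j=1}^{i-1} a_j e_j\Bigr)\Bigr] \tag{30}$$

$$= \sum_{i=1}^{n} \rho_{a_i, e_i}\Bigl(y + \sum_{j=1}^{i} a_j e_j - a_i e_i\Bigr) \tag{31}$$

$$\ge \sum_{i=1}^{n} \rho_{a_i, e_i}(\max(x, y) - a_i e_i) = \sum_{i=1}^{n} \bigl[f(\max(x, y)) - f(\max(x, y) - a_i e_i)\bigr]. \tag{32}$$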

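Subtracting these two gives us the required result. We now proceed to the first statement. We get the first inequality in the same way, building $x$ up from $\min(x, y)$:

$$f(x) - f(\min(x, y)) = \sum_{i=1}^{n} \Bigl[f\Bigl(\min(x, y) + \sum_{j=1}^{i} a_j e_j\Bigr) - f\Bigl(\min(x, y) + \sum_{j=1}^{i-1} a_j e_j\Bigr)\Bigr] \tag{33}$$

$$= \sum_{i=1}^{n} \rho_{a_i, e_i}\Bigl(\min(x, y) + \sum_{j=1}^{i} a_j e_j - a_i e_i\Bigr) \tag{34}$$

$$\ge \sum_{i=1}^{n} \rho_{a_i, e_i}(x - a_i e_i) = \sum_{i=1}^{n} \bigl[f(x) - f(x - a_i e_i)\bigr]. \tag{35}$$

The second also follows easily, building $y$ up from $\min(x, y)$:

$$f(y) - f(\min(x, y)) = \sum_{i=1}^{n} \Bigl[f\Bigl(\min(x, y) + \sum_{j=1}^{i} b_j e_j\Bigr) - f\Bigl(\min(x, y) + \sum_{j=1}^{i-1} b_j e_j\Bigr)\Bigr] \tag{36}$$

$$= \sum_{i=1}^{n} \rho_{b_i, e_i}\Bigl(\min(x, y) + \sum_{j=1}^{i-1} b_j e_j\Bigr) \tag{37}$$

$$\le \sum_{i=1}^{n} \rho_{b_i, e_i}(\min(x, y)) \tag{38}$$

$$= \sum_{i=1}^{n} \bigl[f(b_i e_i + \min(x, y)) - f(\min(x, y))\bigr]. \tag{39}$$

And again we subtract to get the first bound that we wanted. To get to the required result, simply apply DR-submodularity to the final term of the first bound and the second term of the second bound. ∎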

We note that these bounds are separable. Additionally, we see that these bounds are tight at $x$, namely equality is achieved with $y = x$. This result also applies as written for continuous submodular functions, where the vectors instead belong to a box, WLOG $[0, 1]^n$.
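Both bounds of Lemma 2.3 can be checked by enumeration on a small lattice. The sketch below is illustrative only: the function names are hypothetical, the example function is an assumed DR-submodular function with $f(0) = 0$, and $k_{\max}$ is taken to be $(k_1 - 1, \dots, k_n - 1)$.

```python
from itertools import product


def shift(x, i, c):
    """Return x + c * e_i."""
    return x[:i] + (x[i] + c,) + x[i + 1:]


def upper_bound_23(f, x, y):
    """Bound (23), anchored at x and evaluated at y."""
    zero = (0,) * len(x)
    total = f(x)
    for i, (xi, yi) in enumerate(zip(x, y)):
        a, b = max(xi - yi, 0), max(yi - xi, 0)
        total -= f(x) - f(shift(x, i, -a))
        total += f(shift(zero, i, b)) - f(zero)  # f(zero) is 0 when normalised
    return total


def upper_bound_24(f, x, y, k):
    """Bound (24), anchored at x and evaluated at y."""
    top = tuple(ki - 1 for ki in k)
    total = f(x)
    for i, (xi, yi) in enumerate(zip(x, y)):
        a, b = max(xi - yi, 0), max(yi - xi, 0)
        total -= f(top) - f(shift(top, i, -a))
        total += f(shift(x, i, b)) - f(x)
    return total


def example_f(x):
    # Hypothetical DR-submodular function: concave in the sum, f(0) = 0.
    s = sum(x)
    return 8 * s - s * s


k = (3, 3)
points = list(product(*(range(ki) for ki in k)))
valid = all(
    example_f(y) <= upper_bound_23(example_f, x, y)
    and example_f(y) <= upper_bound_24(example_f, x, y, k)
    for x in points for y in points
)
tight = all(
    upper_bound_23(example_f, x, x) == example_f(x) == upper_bound_24(example_f, x, x, k)
    for x in points
)
print(valid, tight)
```

Each bound is a sum of terms that depend on a single coordinate of $y$, which is what makes it modular in $y$.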

## 3 Three algorithms

We describe the integer lattice majorisation-minimisation algorithms here. In the SubSup procedure, at every step we minimise a lattice submodular function; complexity results for reaching arbitrary precision are known for continuous submodular functions with a given Lipschitz constant, obtained by minimising a discretised version of the continuous function. In the SupSub procedure we instead maximise a lattice submodular function at every step, for which approximation algorithms are available. In the ModMod procedure we minimise a modular function at every step, which can be done easily in $O(\sum_i k_i)$ evaluations of the separated terms, by evaluating each separated function at every point and taking the minima. Additionally, we note that the procedures that use the upper bound, SupSub and ModMod, require in principle an upper bound on the quantity $\lambda$ as described in Lemma 2.2.

We note that we have quite easily:

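$$v(x^{t+1}) \le f(x^{t+1}) - h^{g}_{\sigma}(x^{t+1}) \tag{40}$$

$$\le m^{f}_{x^t}(x^t) - h^{g}_{\sigma}(x^t) \tag{41}$$

$$= f(x^t) - g(x^t) = v(x^t), \tag{42}$$

where $m^{f}_{x^t}$ is the modular upper bound of $f$ anchored at $x^t$ and $h^{g}_{\sigma}$ is the modular lower bound of $g$ given by the permutation $\sigma$, the second to last equality coming from the demonstrated tightness of our bounds at $x^t$. However, we note that the sequence may not strictly decrease, and we seek a convergence condition. For the upper bound, we can simply try both of them, and if neither strictly decreases the function, we’re done. For the lower bound, there are many permutations we can choose from, and we want some sort of stopping criterion.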
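A ModMod-style iteration can be sketched as follows. This is a simplified illustration, not the paper's algorithm listing: it assumes $f$ is already DR-submodular (so no decomposition step is needed), uses only the first upper bound (23) and a single fixed chain through the current point, and stops at the first iteration without strict decrease. The function names and the example functions are hypothetical.

```python
def shift(x, i, c):
    """Return x + c * e_i."""
    return x[:i] + (x[i] + c,) + x[i + 1:]


def upper_terms(f, x, k):
    """u[i][y_i] such that f(x) + sum_i u[i][y_i] is bound (23) anchored at x."""
    zero = (0,) * len(k)
    u = []
    for i, ki in enumerate(k):
        row = []
        for yi in range(ki):
            a, b = max(x[i] - yi, 0), max(yi - x[i], 0)
            row.append(f(shift(x, i, -a)) - f(x) + f(shift(zero, i, b)) - f(zero))
        u.append(row)
    return u


def lower_terms(g, x, k):
    """l[i][y_i] such that g(0) + sum_i l[i][y_i] is the chain bound through x."""
    n = len(k)
    steps = [(i, j) for i in range(n) for j in range(1, x[i] + 1)]
    steps += [(i, j) for i in range(n) for j in range(x[i] + 1, k[i])]
    l = [[0] * ki for ki in k]
    p = [0] * n
    for i, j in steps:
        before = g(tuple(p))
        p[i] = j
        l[i][j] = g(tuple(p)) - before  # weight w_i(j)
    for row in l:  # turn weights into partial sums over levels 1..y_i
        for j in range(1, len(row)):
            row[j] += row[j - 1]
    return l


def mod_mod(f, g, k, x0, max_iter=100):
    """Iterate x -> argmin of (upper bound of f) - (lower bound of g)."""
    x = tuple(x0)
    history = [f(x) - g(x)]
    for _ in range(max_iter):
        u, l = upper_terms(f, x, k), lower_terms(g, x, k)
        # The surrogate is separable, so minimise each coordinate on its own.
        candidate = tuple(
            min(range(ki), key=lambda yi, i=i: u[i][yi] - l[i][yi])
            for i, ki in enumerate(k)
        )
        value = f(candidate) - g(candidate)
        if value >= history[-1]:
            break  # no strict decrease with this bound and this chain
        x = candidate
        history.append(value)
    return x, history


def example_f(x):
    # Hypothetical DR-submodular function with example_f(0) = 0.
    s = sum(x)
    return 12 * s - s * s


def example_g(x):
    # Hypothetical lattice submodular function, not DR-submodular.
    return x[0] ** 2 + x[1] ** 2 - x[0] * x[1]


x_final, history = mod_mod(example_f, example_g, (4, 4), (1, 1))
print(x_final, history)
```

Because this sketch tries one chain and one upper bound only, stopping here does not certify a local minimum; the stopping criterion discussed next requires trying several permutations and both upper bounds.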

###### Lemma 3.1.

If we choose permutations, each with a different increment directly before and after $x^t$ in the chain, and attempt to decrease the objective with both upper bounds without success, then we have reached a local minimum.

###### Proof.

The proof is functionally identical to the set function case, except that instead of considering all sets that differ from the current one by a single element, we consider all points $x^t \pm e_i$, where $e_i$ are basis vectors. ∎

## 4 Theoretical Analysis

We note that we don’t get any multiplicative approximation guarantees, as the hardness results are inherited from the set function case. However, we would like to extend the additive hardness results known for set functions. While following that proof relies on the functions being DR-submodular, we recall that we can write a lattice submodular function as the sum of a modular function and a DR-submodular function via Lemma 2.2.

So now we act just on the DR-submodular function. This requires an extension of the set function decomposition, which we give in a slightly weaker form:

###### Lemma 4.1.

Let $f$ be any DR-submodular function with $f(0) = 0$. It can be decomposed into a modular function $g$ plus a monotone function $f'$ with $f'(0) = 0$.

###### Proof.

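We first construct the modular function $g$, then show that the function $f' = f - g$ is monotone, and takes the value $0$ at $0$. For any input $y$ we form:

$$g(y) = \sum_{k=1}^{n} \sum_{j=1}^{y_k} \frac{m_{j,k}}{j}, \tag{43}$$

for $m_{j,k} = f(k_{\max}) - f(k_{\max} - j e_k)$ for each level $j$ and coordinate $k$. We note this reduces to the modular function used in the set function case if we restrict all coordinates to $\{0, 1\}$. This function is clearly modular. Additionally, it is clear that $f'$ has $f'(0) = 0$. To show that it is monotone: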

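$$f'(k + e_i) = f(k + e_i) - \sum_{k=1}^{n} \sum_{j=1}^{y'_k} \frac{m_{j,k}}{j}, \tag{44}$$

$$f'(k) = f(k) - \sum_{k=1}^{n} \sum_{j=1}^{y_k} \frac{m_{j,k}}{j}, \tag{45}$$

where $y' = k + e_i$ and $y = k$. Note that in the first equation, there will be one extra term subtracted, $\frac{m_{k_i+1,\,i}}{k_i+1}$. We claim the right hand side of the first equation here is greater than the second, as:

$$f(k + e_i) - \sum_{k=1}^{n} \sum_{j=1}^{y'_k} \frac{m_{j,k}}{j} - \Bigl(f(k) - \sum_{k=1}^{n} \sum_{j=1}^{y_k} \frac{m_{j,k}}{j}\Bigr) \tag{46}$$

$$= f(k + e_i) - f(k) - \frac{m_{k_i+1,\,i}}{k_i+1} \tag{47}$$

$$= \frac{1}{k_i+1}\Bigl((k_i+1)\bigl(f(k + e_i) - f(k)\bigr) - \bigl(f(k_{\max}) - f(k_{\max} - (k_i+1) e_i)\bigr)\Bigr). \tag{48}$$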

We then split up the second term to get:

$$f(k_{\max}) - f(k_{\max} - (k_i+1) e_i) = \sum_{j=1}^{k_i+1} \bigl[f(k_{\max} - (j-1) e_i) - f(k_{\max} - j e_i)\bigr], \tag{49}$$

and use the DR property to bound each individual term by $f(k + e_i) - f(k)$, giving us our result. ∎

Using this, we now have the following combining the two decompositions:

###### Theorem 4.1.

Let $f$ be a lattice submodular function with $f(0) = 0$. Then $f$ can be represented as the sum of a modular function $k$ and a monotone submodular function $f'$, with $f'(0) = 0$.

This will allow us to get additive bounds similar to those of the set function case:

###### Lemma 4.2.

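Consider the problem of minimising $v(x) = f(x) - g(x)$. Apply our decomposition to instead have $v(x) = f'(x) - g'(x) + k(x)$. Here $k$ is modular, and $f', g'$ are monotone. Then we have:

$$\min_x v(x) \ge \min_x \bigl(f'(x) + k(x)\bigr) - g'(k_{\max}), \tag{50}$$

$$\min_x v(x) \ge f'(0) - g'(k_{\max}) + \sum_{k=1}^{n} \Bigl(\min_y \sum_{i=1}^{y} \frac{m_{i,k}}{i}\Bigr). \tag{51}$$

###### Proof.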

We have:

$$\min_x v(x) = \min_x \bigl(f'(x) - g'(x) + k(x)\bigr) \tag{53}$$

$$\ge \min_x \bigl(f'(x) + k(x)\bigr) - \max_x g'(x) \tag{54}$$

$$= \min_x \bigl(f'(x) + k(x)\bigr) - g'(k_{\max}), \tag{55}$$

by monotonicity of $g'$. Next, we continue from the previous line:

$$\ge \min_x f'(x) + \min_x k(x) - g'(k_{\max}) \tag{56}$$

$$= f'(0) - g'(k_{\max}) + \sum_{k=1}^{n} \Bigl(\min_y \sum_{i=1}^{y} \frac{m_{i,k}}{i}\Bigr). \tag{57}$$

The nested optimisation problem here in the second bound is non-trivial, but for our purposes it doesn’t actually need to be solved. All we need to see is that by setting $y$ to zero, we can eliminate this term, and thus by the monotonicity of $g'$ we will always get that this second bound is at most zero.

We also prove some other results that show this algorithm can be broadly applied. The first result shows that a broad range of functions can be handled by this approach:

###### Lemma 4.3.

Let $v$ be any function on the lattice $\prod_{i=1}^n \{0, \dots, k_i - 1\}$. Then $v$ can be written as the difference of two submodular functions.

###### Proof.

Let $g$ be any strictly submodular function on the same lattice, that is, one with all second differences over pairs $i \ne j$ strictly less than zero. We shall denote by $m$ the minimum absolute value of the second differences over all pairs $i \ne j$ and points $x$, that is:

$$m = \min_{i \ne j,\, x} \bigl|g(x + e_i + e_j) - g(x + e_i) - g(x + e_j) + g(x)\bigr|. \tag{58}$$

As all second differences of $g$ are less than zero, as it is submodular, note that for any such difference we have $D_{i,j}(g)(x) \le -m$. We shall additionally define $n$ to be the maximum absolute value of the second difference of $v$ under the same constraint:

$$n = \max_{i \ne j,\, x} \bigl|v(x + e_i + e_j) - v(x + e_i) - v(x + e_j) + v(x)\bigr|. \tag{59}$$

We can now form the function $f$:

$$f(x) = v(x) + \frac{n}{m}\, g(x). \tag{60}$$

We claim that $f$ is submodular, which will give our result $v = f - \frac{n}{m} g$, as a submodular function scaled by a nonnegative constant is submodular. To do this, take any of the second differences over pairs $i \ne j$ of the function $f$. Denote the second difference as $D_{i,j}(f)(x)$. We find:

$$D_{i,j}(f)(x) = D_{i,j}(v)(x) + \frac{n}{m}\, D_{i,j}(g)(x) \tag{61}$$

$$\le D_{i,j}(v)(x) - n \tag{62}$$

$$\le 0, \tag{63}$$

as required. ∎

Of course, $n$ is in general hard to find, so another question that may be asked regarding this problem is how difficult it is to find the functions $f$ and $g$ corresponding to some $v$. That is answered with this result, extending the corresponding set function result:

###### Lemma 4.4.

Suppose that we know $n$ as in the previous lemma, or an upper bound on it. Then given a function $v$, we can construct submodular functions $f$ and $g$ such that $v = f - g$.

###### Proof.

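Consider the submodular function on the lattice:

$$g(x_1, \dots, x_p) = x_1^2 + \dots + x_p^2 - 4 \sum_{i \ne j} x_i x_j. \tag{64}$$

We can clearly verify here that $g$ is strictly submodular, and its $m$ can be read off directly from the coefficient of the cross terms. Thus if we know an upper bound on $n$ we can form $f$ in the manner of Lemma 4.3 as required. ∎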
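The construction in Lemmas 4.3 and 4.4 can be sketched directly. The following is an illustration under assumptions: $m$ and $n$ are computed by enumeration (only feasible on tiny lattices), the sum in Equation (64) is read as running over ordered pairs $i \ne j$, $v$ is assumed integer-valued so that exact fractions can be used, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def shift(x, i):
    """Return x + e_i."""
    return x[:i] + (x[i] + 1,) + x[i + 1:]


def cross_second_differences(fn, k):
    """Yield fn(x+e_i+e_j) - fn(x+e_i) - fn(x+e_j) + fn(x) for all i < j."""
    n = len(k)
    for x in product(*(range(ki) for ki in k)):
        for i in range(n):
            for j in range(i + 1, n):
                if x[i] + 1 >= k[i] or x[j] + 1 >= k[j]:
                    continue  # x + e_i + e_j leaves the lattice
                yield fn(shift(shift(x, i), j)) - fn(shift(x, i)) - fn(shift(x, j)) + fn(x)


def g64(x):
    """Equation (64), with the cross sum taken over ordered pairs i != j."""
    squares = sum(xi * xi for xi in x)
    cross = sum(x[i] * x[j] for i in range(len(x)) for j in range(len(x)) if i != j)
    return squares - 4 * cross


def difference_of_submodular(v, k, g=g64):
    """Return (f, scaled_g) with v = f - scaled_g and both submodular."""
    m = min(abs(d) for d in cross_second_differences(g, k))
    n_v = max(abs(d) for d in cross_second_differences(v, k))
    scale = Fraction(n_v, m)

    def f(x):
        return v(x) + scale * g(x)

    def scaled_g(x):
        return scale * g(x)

    return f, scaled_g


def example_v(x):
    # Hypothetical objective that is supermodular, hence not submodular.
    return x[0] * x[1]


k = (3, 3)
f, scaled_g = difference_of_submodular(example_v, k)
print(all(d <= 0 for d in cross_second_differences(f, k)))
print(all(d <= 0 for d in cross_second_differences(scaled_g, k)))
```

Any upper bound on $n$ can replace the enumerated value, at the cost of a larger scaling of $g$ and therefore a looser decomposition.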

Note that the choice of $g$ in the proof above is not special: it simply needs to be strictly submodular, and we need to know its $m$. Thus depending on the particular problem $v$, different choices of $g$ may give nice or meaningful decompositions.

To do complexity analysis, we will be working on an $\epsilon$-approximate version of the algorithm, as introduced for the set function case. This means that we will only proceed to step $t+1$ if the objective improves by at least a factor governed by $\epsilon$. The reason we do this is that the corresponding problem for set functions is known to be PLS-complete. We now consider the complexity of this procedure. The complexity of minimising the lattice version can be found from a lemma whose statement and proof can be adapted directly from the set function case:

###### Lemma 4.5.

The -approximate version of each algorithm has a worst case complexity of , where is the complexity of the iterative step, and .

We also note that as described earlier, the ModMod procedure has the lowest complexity at each iteration.

### 4.1 Constrained Optimisation

For the SubSup procedure, we are minimising a submodular function with some constraint at every iteration. However, we know that this is hard, and also hard to approximate, in the set function case, and so it will be for ours also. Therefore this algorithm is not suitable for use with constraints.

We know that for some simple constraints such as cardinality, knapsack and polymatroid constraints, the submodular maximisation problem has been studied on the integer lattice for monotone functions. So at least for some subclasses, we can use the SupSub procedure to optimise under constraints.

While to the best of our knowledge no-one has explicitly written algorithms for modular minimisation on the integer lattice, we know it is easy and can be done exactly at least for cardinality constraints, where we can just enumerate all marginal gains and take the lowest in each variable. We then take as many of the lowest as the cardinality constraint allows.

In the set function case, we can optimise easily and exactly over a variety of other constraints. However, there separability also gives us linearity, something we lose on our lattice as we separate into arbitrary discrete functions. It is worth looking into more of these constrained optimisation problems.

## 5 Conclusion

Previous work studied the problem of minimising the difference between two submodular set functions. We have extended this to the case of general lattice submodular functions, without the DR requirement. Additionally, we note that via discretisation, our method can be applied to continuous functions also.

In performing the majorisation-minimisation technique, we extended an earlier set function bound for the upper bound, which is valid for DR-submodular functions, and used a decomposition of submodular functions into DR-submodular functions plus a modular function to take advantage of it. For the lower bound, we used the existing method of computing the subgradient. The result of this greedy algorithm gives us our lower bound, and did not require the function to be DR-submodular.

After that we formally stated our algorithms, performed some complexity analysis, and analysed other theoretical properties of them. One clear extension of this work would be finding an alternative lower bound that can be used on continuous functions, as our upper bound can already be used in this context.