# Independence Properties of Generalized Submodular Information Measures

Recently a class of generalized information measures was defined on sets of items parametrized by submodular functions. In this paper, we propose and study various notions of independence between sets with respect to such information measures, and connections thereof. Since entropy can also be used to parametrize such measures, we derive interesting independence properties for the entropy of sets of random variables. We also study the notion of multi-set independence and its properties. Finally, we present optimization algorithms for obtaining a set that is independent of another given set, and also discuss the implications and applications of combinatorial independence.


## I Introduction

In this paper, we consider the recently proposed class of submodular information measures [8], and study the general combinatorial independence characterizations they admit. A set function f: 2^V → ℝ over a finite ground set V is submodular [3] if for all subsets A, B ⊆ V, it holds that f(A) + f(B) ≥ f(A∪B) + f(A∩B). Given a set A ⊆ V, we define the gain of an element j ∈ V in the context A as f(j|A) = f(A∪{j}) − f(A). A perhaps more intuitive characterization of submodularity is as follows: a function f is submodular if it satisfies diminishing marginal returns, namely f(j|A) ≥ f(j|B) for all A ⊆ B ⊆ V and j ∉ B, and is monotone if f(A) ≤ f(B) for all A ⊆ B ⊆ V. Submodular functions are a rich and expressive class of models which capture a number of important aspects like coverage (e.g., the set cover function), representation (e.g., the facility location function), diversity (e.g., log-determinants), and information (e.g., the joint entropy of a set of random variables). Submodular functions have been shown to be closely connected with convexity [14, 2] and concavity [5].
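Since these definitions are finite and combinatorial, they can be checked by brute force on small ground sets. The sketch below is an illustrative aside (the coverage map is a toy example of our own choosing) that verifies the diminishing-returns characterization directly:

```python
from itertools import chain, combinations

def powerset(V):
    """All subsets of V, as frozensets."""
    return [frozenset(s) for s in
            chain.from_iterable(combinations(sorted(V), r) for r in range(len(V) + 1))]

def is_submodular(f, V):
    """Brute-force check of diminishing returns: f(j|A) >= f(j|B) for all A ⊆ B, j ∉ B."""
    for A in powerset(V):
        for B in powerset(V):
            if A <= B:
                for j in V - B:
                    if f(A | {j}) - f(A) < f(B | {j}) - f(B) - 1e-12:
                        return False
    return True

# Toy set cover function: each item covers a few abstract concepts.
cover = {1: {"u1"}, 2: {"u1", "u2"}, 3: {"u3"}}
f_cover = lambda S: len(set().union(*(cover[i] for i in S))) if S else 0

assert is_submodular(f_cover, frozenset({1, 2, 3}))                  # set cover is submodular
assert not is_submodular(lambda S: len(S) ** 2, frozenset({1, 2}))   # |S|^2 is not
```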

Given a submodular function f, we can define the submodular (conditional) mutual information as:

 I_f(A;B|C) = f(A|C) + f(B|C) − f(A∪B|C) (1)

where the conditional information f(A|C) = f(A∪C) − f(C). Recently, [8, 4] studied several important and interesting properties of I_f, such as non-negativity, monotonicity, conditions for submodularity, and upper/lower bounds. Furthermore, [8] also studied multi-set extensions of the submodular mutual information to capture the joint information between sets. As argued by [8, 4], the submodular mutual information (or multi-set mutual information) effectively captures the shared information between two sets (or multiple sets), from the lens of the submodular function f. [8] also study a number of examples of I_f, and the modeling capabilities of these in various machine learning applications such as query-focused and privacy-preserving summarization, clustering, and disparate partitioning.
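Equation (1) is directly computable given oracle access to f. As a small illustrative sketch (the coverage map below is a made-up example), one can evaluate I_f on a toy set cover function:

```python
def conditional_gain(f, A, C):
    """f(A|C) = f(A ∪ C) − f(C)."""
    return f(A | C) - f(C)

def submodular_cmi(f, A, B, C=frozenset()):
    """I_f(A;B|C) = f(A|C) + f(B|C) − f(A∪B|C), as in Eq. (1)."""
    return (conditional_gain(f, A, C) + conditional_gain(f, B, C)
            - conditional_gain(f, A | B, C))

# Toy set cover function (an assumed example).
cover = {1: {"u1"}, 2: {"u1", "u2"}, 3: {"u3"}}
f = lambda S: len(set().union(*(cover[i] for i in S))) if S else 0

# Items 1 and 2 share concept u1, so they carry mutual information ...
assert submodular_cmi(f, frozenset({1}), frozenset({2})) == 1
# ... while item 3 covers disjoint concepts: zero mutual information.
assert submodular_cmi(f, frozenset({1, 2}), frozenset({3})) == 0
```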

In this work, we extend the paradigm of combinatorial information measures to study the concept of independence among sets. In particular, we first introduce six notions of (conditional) independence between subsets in Section II. Next, in Section III, we study the relationships between the independence types for general submodular functions, and then study these connections for a few special cases, including set cover functions, modular functions, and entropy. We then study independence between multiple sets (Section IV), present optimization algorithms for obtaining a set which is independent of a given set (Section V), and discuss the implications and applications of this study. While the main focus of this work is combinatorial independence, we note that our results in the special case of the entropy function (specifically Lemma III.3) are, to our knowledge, among the first to study the different possible types of independence between sets of random variables. We provide proofs of our results in Section VI, and conclude this paper in Section VII.

## II Submodular (Conditional) Independence

Two random variables X and Y are statistically independent iff the mutual information I(X;Y) = 0. Since we are studying sets of variables (rather than two random variables), the notion of independence becomes a little more intricate. Consequently, we define six types of combinatorial (or submodular) independence relations between two sets A, B ⊆ V with respect to a submodular function f. These relations also hold in the entropic case, in which they recover exactly the statistical independence between two sets of random variables. Before defining the different independence types, we provide some definitions.

A function f is said to satisfy the condition 𝒞(A,B) (for any A, B ⊆ V) if f(a|X) = f(a) for every a ∈ A and every X ⊆ (A∪B) ∖ {a}. This condition implies that the function is modular with respect to the addition of any element from A to any subset of A∪B not containing that specific element. This notation is introduced to draw attention to the differences among the definitions of the independence types. Below, we define the six types of independence conditions.

1. Joint Independence (JI): We say set A is jointly independent of another set B, or A ⊥ B with respect to f, if I_f(A;B) = 0 or, equivalently, f(A∪B) = f(A) + f(B) (or f(A|B) = f(A)).

2. Marginal Independence (MI): We say set A is marginally independent of another set B with respect to f if I_f(a;B) = 0 for all a ∈ A and I_f(A;b) = 0 for all b ∈ B. In other words, if f(a|B) = f(a) for all a ∈ A and f(b|A) = f(b) for all b ∈ B.

3. Pairwise Independence (PI): Set A is pairwise independent of another set B with respect to f if I_f(a;b) = 0 for all a ∈ A and b ∈ B. In other words, if f({a,b}) = f(a) + f(b) for all a ∈ A, b ∈ B.

4. Subset Marginal Independence (SMI): We say set A holds subset marginal independence with another set B with respect to f if I_f(a;Y) = 0 for all a ∈ A, Y ⊆ B, and I_f(X;b) = 0 for all X ⊆ A, b ∈ B. In other words, if f(a|Y) = f(a) and f(b|X) = f(b). Observe that SMI generalizes MI and PI.

5. Modular Independence (ModI): Set A holds modular independence with B with respect to f if f(A∪B) = Σ_{i∈A∪B} f(i), i.e., f is modular on A∪B. Thus, f satisfies the condition 𝒞(A,B) (and, symmetrically, 𝒞(B,A)).

6. Subset Modular Independence (SModI): Set A is said to be subset modular independent of B with respect to f if A and B are marginally independent and, in addition, f(X) = Σ_{i∈X} f(i) for all X ⊆ A or f(Y) = Σ_{j∈Y} f(j) for all Y ⊆ B. Thus, f is modular on at least one of the two sets.

Unless specified otherwise, we will use joint independence as the default notion of independence. We also assume that A and B are disjoint without loss of generality: joint independence implies f(A∩B) = 0, and since f is assumed to be monotone, this essentially means that we can remove the elements in A∩B from the ground set without letting them affect the functional values.
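The first three notions above reduce to simple numerical predicates. The following sketch (with a rank-style function f(X) = min(|X|, 3) chosen purely for illustration) exhibits sets that are marginally and pairwise independent but not jointly independent:

```python
def joint_indep(f, A, B):
    """JI: f(A ∪ B) = f(A) + f(B)."""
    return abs(f(A | B) - f(A) - f(B)) < 1e-12

def marginal_indep(f, A, B):
    """MI: f(a|B) = f(a) for all a in A, and f(b|A) = f(b) for all b in B."""
    return (all(abs(f(B | {a}) - f(B) - f({a})) < 1e-12 for a in A)
            and all(abs(f(A | {b}) - f(A) - f({b})) < 1e-12 for b in B))

def pairwise_indep(f, A, B):
    """PI: f({a,b}) = f(a) + f(b) for every cross pair."""
    return all(abs(f({a, b}) - f({a}) - f({b})) < 1e-12 for a in A for b in B)

# Rank-style function: disjoint 2-element sets are MI and PI but not JI.
f = lambda S: min(len(S), 3)
A, B = frozenset({1, 2}), frozenset({3, 4})
assert not joint_indep(f, A, B)   # f(A∪B) = 3 < 4 = f(A) + f(B)
assert marginal_indep(f, A, B)    # any single element still adds exactly 1
assert pairwise_indep(f, A, B)
```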

Similar to independence, we also define conditional combinatorial independence between two sets A and B given a third set C. We say that A and B are conditionally independent of each other given C, in the context of f, denoted A ⊥ B | C, iff I_f(A;B|C) = 0. An equivalent way of viewing this is in terms of the submodular conditional mutual information: A ⊥ B | C iff f(A∪B|C) = f(A|C) + f(B|C). In terms of the conditional gains, it implies that f(A|B∪C) = f(A|C) and f(B|A∪C) = f(B|C). Similar to independence, we can also define the six types of conditional independence.

## III Connections between Independence Types

We start this section by providing a relationship between the different types of (conditional) independence. Figure 1 illustrates the containment relationships between the different types of (conditional) independence.

Also, note that we prove the relationships only for the independence case; since conditional independence with respect to f is equivalent to independence with respect to the function f_C(·) ≜ f(·|C) for a fixed conditioning set C, the conditions below also hold for the six types of conditional independence.

###### Theorem III.1.

The following relations hold between the different types of independence defined in the previous section. We use the abbreviations JI, MI, PI, SMI, ModI, and SModI to denote the different types compactly.

 ModI ⟹ JI ⟹ MI ⟺ SMI ⟹ PI (2)
 and ModI ⟹ SModI ⟹ MI (3)

Moreover, there exist submodular functions where JI ⇏ ModI, MI ⇏ JI, PI ⇏ MI, SModI ⇏ ModI, MI ⇏ SModI, JI ⇏ SModI, and SModI ⇏ JI.

Thus we can see that there is a certain hierarchy attached to the independence types in general which is illustrated in Fig. 1.

### III-A Relationship between the Independence Types for the Modular and Set Cover Functions

For the special case when f is a modular function, all six independence types are equivalent, i.e., the reverse implications also hold; in fact, this follows from the very definition of a modular function. Next, we show that for the set cover function, the first four types are indeed equivalent.

###### Lemma III.2.

When f is a set cover function, the relation shown in Theorem III.1 is not tight. In fact, the reverse implications also hold among the first four types, which makes them in essence equivalent to one another: JI ⟺ MI ⟺ SMI ⟺ PI. However, JI ⇏ ModI.
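This equivalence can be sanity-checked by brute force on a small instance. In the sketch below (the coverage map is an assumed toy example), joint and marginal independence agree on every disjoint pair of subsets:

```python
from itertools import chain, combinations

cover = {"a1": {"u1"}, "a2": {"u1", "u2"}, "b1": {"u3"}, "b2": {"u3", "u4"}}
items = list(cover)
f = lambda S: len(set().union(*(cover[i] for i in S))) if S else 0

def subsets(xs):
    return [frozenset(s) for s in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

def joint_indep(A, B):
    return f(A | B) == f(A) + f(B)

def marginal_indep(A, B):
    return (all(f(B | {a}) - f(B) == f({a}) for a in A)
            and all(f(A | {b}) - f(A) == f({b}) for b in B))

# For set cover, joint and marginal independence coincide on disjoint pairs.
for A in subsets(items):
    for B in subsets(items):
        if not (A & B):
            assert joint_indep(A, B) == marginal_indep(A, B)
```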

### Iii-B Relationship between the Independence Types for the Entropy Function

As we have noted earlier, the entropy function is generalized by our framework. This is done by defining the ground set as a universal set of random variables V = {X_1, …, X_n}, with a corresponding joint probability distribution on the random variables. Thus, for A ⊆ V we have a well-defined submodular function, the entropy f(A) = H(X_A), where X_A = (X_i : i ∈ A). The usual celebrated notion of independence is that of I(X_A; X_B) = 0, i.e., the set of random variables X_A is independent of the set of random variables X_B. This is exactly the notion of joint independence in our exposition of the different types of independence. However, with the above definitions of the six types of independence in the context of generalized submodular information, we obtain a richer set of notions of independence, and hierarchies thereof, for the entropy function. The lemma below further elaborates on the relations between the different types of independence for the entropy function.

###### Lemma III.3.

For the entropy function:

 ModI ⟹ JI ⟹ MI ⟺ SMI ⟹ PI (4)
 and ModI ⟹ SModI ⟹ MI (5)

Moreover, there exist joint distributions for which none of the one-directional implications above can be reversed.
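The entropic instantiation can be explored numerically. The sketch below (using the standard XOR construction as an assumed example) exhibits sets of random variables that are pairwise independent under entropy but not jointly independent:

```python
import math
from itertools import product

# Joint distribution of (X1, X2, X3): X1, X2 i.i.d. fair bits, X3 = X1 XOR X2.
p = {(x1, x2, x1 ^ x2): 0.25 for x1, x2 in product([0, 1], repeat=2)}

def H(idx):
    """Entropy (bits) of the marginal over the variable indices in idx."""
    marg = {}
    for outcome, pr in p.items():
        key = tuple(outcome[i] for i in sorted(idx))
        marg[key] = marg.get(key, 0.0) + pr
    return -sum(q * math.log2(q) for q in marg.values() if q > 0)

I = lambda A, B: H(A) + H(B) - H(A | B)   # entropic mutual information on sets

A, B = frozenset({0, 1}), frozenset({2})
# Every pair of variables is independent ...
assert abs(I({0}, {2})) < 1e-12 and abs(I({1}, {2})) < 1e-12
# ... but A = {X1, X2} and B = {X3} are not jointly independent.
assert abs(I(A, B) - 1.0) < 1e-12
```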

### Iii-C Miscellaneous Results

Next, we provide sufficient but not necessary conditions for submodular (conditional) independence to hold.

###### Lemma III.4.

For a submodular f, A ⊥ ∅ for any A ⊆ V. Also, A ⊥ B if f(A) = 0 (and B is any set) or f(B) = 0 (and A is any set). Moreover, if A ⊥ B, then A ⊥ C for any C ⊆ B.

###### Proof.

We first prove that A ⊥ ∅. This follows from the definition, since I_f(A;∅) = f(A) + f(∅) − f(A∪∅) = f(∅) = 0 if f is normalized. Next, if f(A) = 0, then by monotonicity f(A∪B) ≥ f(B), and hence I_f(A;B) = f(B) − f(A∪B) ≤ 0. Similarly, subadditivity gives I_f(A;B) ≥ 0, and hence I_f(A;B) = 0. The same proof holds for the case f(B) = 0.

Finally, we show that A ⊥ B implies A ⊥ C for C ⊆ B (the case C ⊆ A follows from a symmetric argument). Recall that A ⊥ B implies that I_f(A;B) = 0. This implies that I_f(A;C) ≤ I_f(A;B) = 0. The first inequality follows from the monotonicity of I_f in one argument given the other. Hence proved. ∎

The last result in Lemma III.4 is similar to the classical data processing inequality, but defined on sets of variables. Given that the sets A → B → C form a Markov chain (in the usual sense of Markovity: knowing B, C is fully determined irrespective of A, and so forth), it implies that the conditional mutual information I_f(A;C|B) = 0. Furthermore, let g be a processing operator producing C = g(B). If the processing involves taking subsets C ⊆ B, this implies that I_f(A;C) ≤ I_f(A;B) (from Lemma III.4). For specific sub-classes of functions, the processing can be more interesting. For example, in the case of set cover with coverage map Γ, one can define the inverse operator Γ⁻¹ which, given a concept, returns the subset of items covering it, and analogously Γ⁻¹(W) for a set of concepts W; processing a set B into a subset of Γ⁻¹(Γ(B)) then again yields a Markov chain in the above sense.
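The monotonicity property underlying this data-processing-style argument — I_f(A;C) ≤ I_f(A;B) whenever C ⊆ B — can be verified exhaustively on a small set cover instance (the coverage map below is an illustrative choice):

```python
from itertools import chain, combinations

cover = {1: {"u1", "u2"}, 2: {"u2", "u3"}, 3: {"u3"}, 4: {"u4"}}
f = lambda S: len(set().union(*(cover[i] for i in S))) if S else 0
I = lambda A, B: f(A) + f(B) - f(A | B)   # submodular mutual information

def subsets(xs):
    return [frozenset(s) for s in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

# C ⊆ B implies I_f(A;C) <= I_f(A;B); in particular A ⊥ B forces A ⊥ C.
for A in subsets([1, 2, 3, 4]):
    for B in subsets([1, 2, 3, 4]):
        for C in subsets(B):
            assert I(A, C) <= I(A, B)
```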

Finally, we give some examples of properties which do not hold for combinatorial independence.

###### Lemma III.5.

Suppose A, B, C are subsets such that A ⊥ B and A ⊥ C. This does not, however, imply that A ⊥ (B∪C).

###### Proof.

To see this, again define f(X) = min(|X|, 2) and let A, B, C be mutually disjoint singleton sets. Note that A ⊥ B and A ⊥ C, since f(A∪B) = 2 = f(A) + f(B), and similarly for A and C. However, I_f(A; B∪C) = f(A) + f(B∪C) − f(A∪B∪C) = 1 + 2 − 2 = 1 > 0. This means that A is not jointly independent of B∪C. This is a good segue to the next section, which studies independence among k sets. ∎
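A small numerical check of this counterexample style (assuming the truncated-rank function f(X) = min(|X|, 2), which is monotone submodular):

```python
f = lambda S: min(len(S), 2)              # truncated matroid rank: monotone submodular
I = lambda A, B: f(A) + f(B) - f(A | B)   # submodular mutual information

A, B, C = frozenset({1}), frozenset({2}), frozenset({3})
assert I(A, B) == 0 and I(A, C) == 0      # A ⊥ B and A ⊥ C ...
assert I(A, B | C) == 1                   # ... but A is not independent of B ∪ C
```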

## IV Multi-Set Submodular Independence

In this section, we first introduce multi-set submodular independence. In particular, we introduce two concepts of independence among k sets A_1, …, A_k.

• Sets A_1, …, A_k are mutually independent iff f(∪_{i=1}^k A_i) = Σ_{i=1}^k f(A_i).

• Sets A_1, …, A_k are pairwise independent iff I_f(A_i; A_j) = 0 for all i ≠ j.

Next, we study the connection between mutual and pairwise independence of the sets A_1, …, A_k. We first define the multi-set total correlation as [8]:

 C_f(A_1;⋯;A_k) = Σ_{i=1}^k f(A_i) − f(∪_{i=1}^k A_i) (6)

Next, note that mutual submodular independence between the sets means that C_f(A_1;⋯;A_k) = 0. On the other hand, pairwise independence implies that I_f(A_i;A_j) = 0 for all pairs i ≠ j. Both types of independence are, again, w.l.o.g. defined on disjoint sets of items. The following result connects the two types of independence.

###### Lemma IV.1.

Given a monotone, non-negative and normalized submodular function , mutual independence implies pairwise independence. However, pairwise independence does not imply mutual independence.

###### Proof.

Let us first prove that mutual independence implies pairwise independence. Assume, for the sake of contradiction, that the sets A_1, …, A_k are mutually independent but not pairwise independent. That is, there exist two sets which are not jointly independent; w.l.o.g., let these be A_1 and A_2, so that f(A_1∪A_2) < f(A_1) + f(A_2). Subadditivity (which follows from submodularity with non-negativity) implies:

 f(∪_{i=1}^k A_i) ≤ f(A_1∪A_2) + Σ_{i=3}^k f(A_i) (7)

Now, invoking the assumption that f(A_1∪A_2) < f(A_1) + f(A_2), this means that:

 f(∪iAi)

which means that C_f(A_1;⋯;A_k) > 0, which contradicts mutual independence. Hence, mutual independence must imply pairwise independence. However, pairwise independence does not imply mutual independence, which we prove with an example. Again, define f(X) = min(|X|, 2) and let A_1, …, A_k (with k > 2) be mutually disjoint singleton sets. Note that all pairs are pairwise independent, since f(A_i∪A_j) = 2 = f(A_i) + f(A_j). However, the sets are not mutually independent, since Σ_{i=1}^k f(A_i) = k > 2 = f(∪_{i=1}^k A_i) for k > 2. ∎
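The quantities in this argument are easy to compute directly. The following sketch (again with the truncated-rank function f(X) = min(|X|, 2) as an assumed example) evaluates the total correlation of Eq. (6) and confirms that pairwise independence can coexist with positive total correlation:

```python
from functools import reduce

def total_correlation(f, sets):
    """C_f(A_1;…;A_k) = Σ_i f(A_i) − f(∪_i A_i), as in Eq. (6)."""
    union = reduce(lambda x, y: x | y, sets, frozenset())
    return sum(f(A) for A in sets) - f(union)

f = lambda S: min(len(S), 2)   # truncated-rank submodular function
sets = [frozenset({1}), frozenset({2}), frozenset({3})]

# Every pair of singletons is (jointly) independent ...
assert all(f(A | B) == f(A) + f(B) for A in sets for B in sets if A != B)
# ... yet the total correlation is positive: mutual independence fails.
assert total_correlation(f, sets) == 1
```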

Finally, we consider an alternative to mutual independence, which is I_f(A_1;…;A_k) = 0 instead of C_f(A_1;…;A_k) = 0. Recall that the multi-set mutual information is defined as [8]:

 I_f(A_1;A_2;…;A_k) = −Σ_{T⊆[k]} (−1)^{|T|} f(∪_{i∈T} A_i). (9)

Note that C_f and I_f are different quantities, and they will in general provide different conditions for the mutual independence between the sets A_1, …, A_k. However, the condition that the multi-set submodular mutual information is zero is unfortunately not very interesting. To understand why, we look at the example of the set cover function.

###### Example IV.1.

If f is a set cover function with coverage map Γ, the multi-set submodular mutual information being zero is equivalent to ∩_{i=1}^k Γ(A_i) = ∅. This is, however, not very interesting, since even if only two sets A_i and A_j are jointly independent, they will satisfy Γ(A_i) ∩ Γ(A_j) = ∅, and hence it will hold that I_f(A_1;…;A_k) = 0. This means that even if only two of the sets are independent, while the rest of the sets are completely dependent (or even identical), the multi-set submodular mutual information will still be zero.
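The inclusion–exclusion form of Eq. (9) can be implemented verbatim. The sketch below (with a made-up coverage map) shows the vanishing behavior discussed in Example IV.1: two sets with disjoint coverage zero out the multi-set measure even though a third set overlaps both:

```python
from itertools import chain, combinations

def multiset_mi(f, sets):
    """I_f(A_1;…;A_k) = −Σ_{T⊆[k]} (−1)^{|T|} f(∪_{i∈T} A_i), as in Eq. (9)."""
    k = len(sets)
    total = 0.0
    for T in chain.from_iterable(combinations(range(k), r) for r in range(k + 1)):
        union = frozenset().union(*(sets[i] for i in T))
        total -= (-1) ** len(T) * f(union)
    return total

cover = {1: {"u1"}, 2: {"u2"}, 3: {"u1", "u2"}}
f = lambda S: len(set().union(*(cover[i] for i in S))) if S else 0

# A1 and A2 cover disjoint concepts, so the 3-set measure vanishes even
# though A3 overlaps both of them.
assert multiset_mi(f, [frozenset({1}), frozenset({2}), frozenset({3})]) == 0
# For two sets, the measure reduces to I_f(A1;A2): here the shared concept is u1.
assert multiset_mi(f, [frozenset({1}), frozenset({3})]) == 1
```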

## V The Utility of Independence Characterizations

Here, we present some discussion on the applications and utility of our formulations. In several data subset selection applications (e.g., video/image collection summarization [10, 11, 16], data selection for efficient training [17, 13, 12], and active learning [1, 13, 17]), submodular functions have been shown to be a natural fit. In particular, many of these problems involve optimizing a submodular function under constraints, such as cardinality, knapsack, and matroid constraints [9, 15].

Submodular independence, as discussed in this work, can be viewed as a new class of combinatorial constraints, and we will write this constraint as A ⊥_f P. In particular, we can then consider the optimization problem:

 max_{A⊆V} g(A), s.t. A ⊥_f P (10)

for a given set P. This has several natural applications. The first is privacy-preserving summarization [10, 11], where we want to select a subset A which is as different as possible from a private set P (and this difference is measured with respect to the submodular function). For instance, this private set could be one's personal picture collection or medical data, or could be images of one's family. The independence equates to an inability to discern any information contained in P by knowing A. If the independence considered here is JI, then this is equivalent to the constraint I_f(A;P) = 0, which we can equivalently relax to I_f(A;P) ≤ ε for a very small ε. We can also consider the different types of independence here, since the constraint may not be amenable to tractable optimization algorithms when I_f(A;P) is not submodular in A for a fixed P (which is true for several important classes of submodular functions). In such cases, we can relax JI to SMI or MI instead, which requires only that f(a|P) = f(a) for all a ∈ A and f(p|A) = f(p) for all p ∈ P.

If f is second-order supermodular [8], then the problem of maximizing g(A) subject to the constraint I_f(A;P) ≤ ε is an instance of SCSK [7, 6], which admits bounded approximation guarantees. For general f, though, achieving such a set could be NP-hard. On the other hand, the independence characterizations for MI (equivalently SMI) and PI are much easier to enforce. In particular, for MI, we just need to restrict attention to elements a such that f(a|P) = f(a). Similarly, for PI, we restrict to elements a such that f({a, p}) = f(a) + f(p) for all p ∈ P.
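A simple way to exploit the MI characterization is to filter the ground set down to elements satisfying f(a|P) = f(a) and then run a standard greedy on the survivors. The sketch below is one such heuristic, not an algorithm from the paper; the coverage map and the choice g = f are illustrative assumptions:

```python
def greedy_mi_constrained(g, f, V, P, k, tol=1e-9):
    """Greedily maximize g over elements marginally independent of P under f,
    i.e. candidates a with f(a|P) = f(a). A heuristic sketch only."""
    feasible = [a for a in sorted(V - P)
                if abs((f(P | {a}) - f(P)) - f({a})) < tol]
    A = frozenset()
    for _ in range(k):
        best = max(feasible, key=lambda a: g(A | {a}) - g(A), default=None)
        if best is None:
            break
        A |= {best}
        feasible.remove(best)
    return A

cover = {1: {"u1"}, 2: {"u1", "u2"}, 3: {"u3"}, 4: {"u4"}}
f = lambda S: len(set().union(*(cover[i] for i in S))) if S else 0
g = f                                  # summarize with the same coverage objective

P = frozenset({1})                     # the "private" set covers concept u1
A = greedy_mi_constrained(g, f, frozenset({1, 2, 3, 4}), P, k=2)
assert A == frozenset({3, 4})          # item 2 is excluded: it shares u1 with P
```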

## VI Proofs of Results in Section III

Proof of Theorem III.1.

###### Proof.

We start with JI ⟹ MI. JI implies that f(A|B) = f(A). Let a_1, …, a_m be the elements in A, and define a chain of sets corresponding to this ordering as A_i = {a_1, …, a_i}, with A_0 = ∅ and A_m = A. Similarly, consider the chain of sets B_i = A_i ∪ B, with B_0 = B. Then note that f(A) = Σ_{i=1}^m f(a_i | A_{i−1}) and f(A|B) = Σ_{i=1}^m f(a_i | A_{i−1} ∪ B). Since f(A|B) = f(A), this implies that the two sums are equal. Note that f(a_i | A_{i−1} ∪ B) ≤ f(a_i | A_{i−1}) by submodularity, and hence this implies that for each i, f(a_i | A_{i−1} ∪ B) = f(a_i | A_{i−1}). Setting i = 1, we have f(a_1 | B) = f(a_1). Note that since the order can be arbitrary, we have f(a|B) = f(a) for every a ∈ A; the symmetric argument gives f(b|A) = f(b) for every b ∈ B.

Next, we show that MI ⟹ SMI. For this, first observe that for any Y ⊆ B, f(a|Y) ≥ f(a|B) by submodularity. From MI we have f(a|B) = f(a), and also with submodularity we have f(a|Y) ≤ f(a|∅) = f(a). But then f(a) = f(a|B) ≤ f(a|Y) ≤ f(a), so f(a|Y) = f(a). A similar proof works for showing the other case, with f(b|X) = f(b) for X ⊆ A. For SMI ⟹ MI, we can just use Y = B (since we trivially have B ⊆ B). Similarly, for the other case, use X = A.

We have SMI ⟹ PI by just substituting in the singleton set Y = {b} in the definition of SMI, and it follows through trivially since f(a|{b}) = f(a) is equivalent to f({a,b}) = f(a) + f(b). Similarly, we can also substitute X = {a} for showing the other case.

For ModI ⟹ JI, recall that ModI implies that f(A∪B) = Σ_{i∈A∪B} f(i). By subadditivity, f(A) ≤ Σ_{i∈A} f(i) and f(B) ≤ Σ_{j∈B} f(j); combined with f(A∪B) ≤ f(A) + f(B), this forces f(A) = Σ_{i∈A} f(i) and f(B) = Σ_{j∈B} f(j). Then f(A∪B) = f(A) + f(B) when A and B are disjoint. This shows that ModI ⟹ JI.

We then show ModI ⟹ SModI and SModI ⟹ MI. ModI implies that f is modular on every subset of A∪B, so in particular f(X) = Σ_{i∈X} f(i) for all X ⊆ A, and MI holds by the implication ModI ⟹ JI ⟹ MI above; hence ModI ⟹ SModI. That SModI ⟹ MI is immediate from the definition, since SModI requires MI in addition to modularity on one of the sets, while ModI requires modularity jointly over A∪B. Finally, we show the reverse implications do not hold with examples. We start with JI ⇏ ModI, using a set cover function. Concretely, let A = {a_1, a_2} and B = {b} with Γ(a_1) = Γ(a_2) = {u_1} and Γ(b) = {u_2}. The sets A and B are jointly independent, since f(A∪B) = 2 = f(A) + f(B). However, they are not ModI, since taking X = {a_1, a_2} gives f(X) = 1 < 2 = f(a_1) + f(a_2).

Next, we show that MI ⇏ JI. Let f(X) = min(|X|, 3), with A and B being disjoint sets of size 2. Then A and B are marginally independent, since for every a ∈ A (and symmetrically every b ∈ B), f(a|B) = min(3, 3) − 2 = 1 = f(a) (the function is modular if we only add a single element). However, they are not jointly independent, since f(A∪B) = 3 < 4 = f(A) + f(B). The same instance shows SModI ⇏ JI and SModI ⇏ ModI: f is modular on A (and on B), so SModI holds, while JI, and hence ModI, fail. To show JI ⇏ SModI and MI ⇏ SModI, use a set cover function with Γ(a_1) = Γ(a_2) = {u_1} and Γ(b_1) = Γ(b_2) = {u_2}: the sets A = {a_1, a_2} and B = {b_1, b_2} are jointly (and hence marginally) independent, but f is not modular on A or on B, since f({a_1, a_2}) = 1 < 2 and f({b_1, b_2}) = 1 < 2.

Finally, we show that PI ⇏ MI. Again, let f(X) = min(|X|, 2), with A and B being disjoint sets of size 2. Note that every pair of elements satisfies f({a, b}) = 2 = f(a) + f(b), so A and B are pairwise independent. However, f(a|B) = min(3, 2) − 2 = 0 ≠ 1 = f(a), and hence A and B are not marginally independent. The remaining separations follow by transitivity along the hierarchy; for instance, MI ⇏ ModI follows since MI ⇏ JI and ModI ⟹ JI. ∎

Proof of Lemma III.2.

###### Proof.

To prove this, we show that PI ⟹ JI for set cover. We know that A ⊥ B (when f is a set cover function with coverage map Γ) iff Γ(A) ∩ Γ(B) = ∅. Similarly, A and B are pairwise independent iff Γ(a) ∩ Γ(b) = ∅ for every a ∈ A and b ∈ B. Note that this condition then also implies Γ(A) ∩ Γ(B) = (∪_{a∈A} Γ(a)) ∩ (∪_{b∈B} Γ(b)) = ∅, and hence JI holds, which proves the equivalence. Since JI ⟹ MI ⟺ SMI ⟹ PI, this means that all four types are equivalent. Finally, JI ⇏ ModI, since the proof of Theorem III.1 uses an instance of set cover to show exactly this separation. ∎

Proof of Lemma III.3.

###### Proof.

Since entropy is a submodular function, the following holds from the proof of Theorem III.1.

 ModI ⟹ JI ⟹ MI ⟺ SMI ⟹ PI (11)
 and ModI ⟹ SModI ⟹ MI (12)

Next, we show that the reverse implications do not hold. We start with PI ⇏ JI. Consider V = {X_1, X_2, X_3} with a joint distribution that is pairwise, but not mutually, independent. For an example, let X_1, X_2, X_3 be three binary random variables (taking values in {0, 1}) such that X_1 and X_2 are jointly distributed as independent fair bits, and X_3 = X_1 ⊕ X_2. Take A = {X_1, X_2} and B = {X_3}. Since each of X_1 and X_2 is independent of X_3, we have I(X_1; X_3) = 0 and I(X_2; X_3) = 0, i.e., A and B are pairwise independent, but we have I(X_1, X_2; X_3) = H(A) + H(B) − H(A∪B) = 2 + 1 − 2 = 1 > 0, implying A and B are not jointly independent.

For showing MI ⇏ SModI, we can use two independent copies of such an XOR triple: let A = {X_1, X_2, X_1 ⊕ X_2} and B = {Y_1, Y_2, Y_1 ⊕ Y_2}, where X_1, X_2, Y_1, Y_2 are i.i.d. fair bits. A and B are marginally independent, since every single variable in A is independent of all of B, and vice versa. However, this does not imply the SModI criterion, which here additionally needs the entropy to be modular on one of the sets, i.e., mutual independence of the variables within A or within B; this fails since H(A) = 2 < 3 = Σ_{X∈A} H(X), and similarly for B.

Next, we show JI ⇏ ModI and JI ⇏ SModI. To prove them both, we consider the example V = {X_1, X_2, X_3, X_4}, where X_1 is a fair bit with X_2 = X_1, X_3 is a fair bit with X_4 = X_3, and (X_1, X_2) is independent of (X_3, X_4). Let A = {X_1, X_2} and B = {X_3, X_4}. With the chosen joint distribution, JI is true, since H(A∪B) = 2 = H(A) + H(B). Since H(X_1, X_2) = 1 < 2 = H(X_1) + H(X_2), ModI is not true; as the same holds for B, the entropy is not modular on either set, and SModI is not true. Therefore, JI ⇏ ModI and JI ⇏ SModI.

MI ⇏ ModI follows since MI ⇏ SModI and ModI ⟹ SModI. Similarly, PI ⇏ SModI follows since PI ⇏ MI (established below) and SModI ⟹ MI.

In the end, we show PI ⇏ MI. Here, consider an example where the assumed probability distribution on (X_1, X_2, X_3), which takes values in {0,1}³, is as follows: X_1 and X_2 are independent fair bits, and X_3 = X_1 ⊕ X_2. It is easy to check that there is pairwise independence, as I(X_1; X_2) = I(X_1; X_3) = I(X_2; X_3) = 0. Thus, there is pairwise independence, which implies PI for the sets A = {X_1, X_2} and B = {X_3}. However, there is no mutual independence here, as we can see that H(X_1, X_2, X_3) = 2 < 3 = H(X_1) + H(X_2) + H(X_3). In fact, by the very construction of the example, knowing X_1 and X_2 one knows X_3 with certainty, which implies H(X_3 | X_1, X_2) = 0. This implies marginal independence does not hold for these sets, as it requires at least H(X_3 | X_1, X_2) = H(X_3), which is not true in this example. ∎

## VII Conclusions

To conclude, in this paper we study different classes of combinatorial independence and their relationships. We then discuss the implications of these results in the entropic case, and provide algorithms for obtaining independent sets for the different independence classes. Finally, we discuss some implications of the results for applications like summarization and clustering.

## References

• [1] J. T. Ash, C. Zhang, A. Krishnamurthy, J. Langford, and A. Agarwal (2019) Deep batch active learning by diverse, uncertain gradient lower bounds. arXiv preprint arXiv:1906.03671. Cited by: §V.
• [2] F. Bach (2011) Learning with submodular functions: a convex optimization perspective. arXiv preprint arXiv:1111.6453. Cited by: §I.
• [3] S. Fujishige (2005) Submodular functions and optimization. Vol. 58, Elsevier Science. Cited by: §I.
• [4] A. Gupta and R. Levin (2020) The online submodular cover problem. In ACM-SIAM Symposium on Discrete Algorithms, Cited by: §I.
• [5] R. Iyer and J. Bilmes (2020) Concave aspects of submodular functions. In Proc. ISIT. Cited by: §I.
• [6] R. Iyer, S. Jegelka, and J. Bilmes (2013) Fast semidifferential-based submodular function optimization: extended version. In ICML, Cited by: §V.
• [7] R. K. Iyer and J. A. Bilmes (2013) Submodular optimization with submodular cover and submodular knapsack constraints. In Advances in Neural Information Processing Systems, pp. 2436–2444. Cited by: §V.
• [8] R. Iyer, N. Khargoankar, J. Bilmes, and H. Asnani (2021) Submodular combinatorial information measures with applications in machine learning. In Algorithmic Learning Theory. Cited by: Independence Properties of Generalized Submodular Information Measures, §I, §I, §IV, §V.
• [9] R. K. Iyer (2015) Submodular optimization and machine learning: theoretical results, unifying and scalable algorithms, and applications. Ph.D. Thesis. Cited by: §V.
• [10] V. Kaushal, S. Kothawade, G. Ramakrishnan, J. Bilmes, H. Asnani, and R. Iyer (2020) A unified framework for generic, query-focused, privacy preserving and update summarization using submodular information measures. arXiv preprint arXiv:2010.05631. Cited by: §V, §V.
• [11] V. Kaushal, S. Kothawade, G. Ramakrishnan, J. Bilmes, and R. Iyer (2021) PRISM: a unified framework of parameterized submodular information measures for targeted data subset selection and summarization. arXiv preprint arXiv:2103.00128. Cited by: §V, §V.
• [12] K. Killamsetty, D. Sivasubramanian, B. Mirzasoleiman, G. Ramakrishnan, A. De, and R. Iyer (2021) GRAD-match: a gradient matching based data subset selection for efficient learning. In Proc. ICML. Cited by: §V.
• [13] K. Killamsetty, D. Sivasubramanian, G. Ramakrishnan, and R. Iyer (2020) GLISTER: generalization based data subset selection for efficient and robust learning. arXiv preprint arXiv:2012.10630. Cited by: §V.
• [14] L. Lovász (1983) Submodular functions and convexity. In Mathematical Programming The State of the Art, pp. 235–257. Cited by: §I.
• [15] E. Tohidi, R. Amiri, M. Coutino, D. Gesbert, G. Leus, and A. Karbasi (2020) Submodularity in action: from machine learning to signal processing applications. IEEE Signal Processing Magazine 37 (5), pp. 120–133. Cited by: §V.
• [16] S. Tschiatschek, R. K. Iyer, H. Wei, and J. A. Bilmes (2014) Learning Mixtures of Submodular Functions for Image Collection Summarization. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.), Cited by: §V.
• [17] K. Wei, R. Iyer, and J. Bilmes (2015) Submodularity in data subset selection and active learning. In International Conference on Machine Learning, pp. 1954–1963. Cited by: §V.