1 Introduction
Background and motivation
The subspace selection problem involves seeking a good subspace from data. Mathematically, the problem is formulated as follows. Let $\mathcal{L}$ be a family of subspaces of $\mathbb{R}^d$, $\mathcal{F} \subseteq \mathcal{L}$ be a set of feasible subspaces, and $f\colon \mathcal{L} \to \mathbb{R}$ be an objective function. Then, the task is to solve the following optimization problem.
(1.1) maximize $f(X)$ subject to $X \in \mathcal{F}$.
This problem is a kind of feature selection problem, and it contains several important machine learning problems such as principal component analysis and the sparse dictionary selection problem.
In general, the subspace selection problem is a nonconvex continuous optimization problem; hence, one cannot hope for a provable approximate solution in general. On the other hand, such a solution can be obtained efficiently in some special cases. The most important example is principal component analysis. Let $\mathcal{L}(\mathbb{R}^d)$ be the set of all subspaces of $\mathbb{R}^d$, $\mathcal{F}$ be the subspaces of dimension at most $k$, and $f\colon \mathcal{L}(\mathbb{R}^d) \to \mathbb{R}$ be the function defined by
(1.2) $f(X) = \sum_{i \in I} \| \Pi_X u_i \|^2$,
where $\{u_i\}_{i \in I} \subset \mathbb{R}^d$ is the given data and $\Pi_X$ is the orthogonal projection onto subspace $X$. Then, problem (1.1) with these $\mathcal{L}(\mathbb{R}^d)$, $\mathcal{F}$, and $f$ defines the principal component analysis problem. As is well known, the greedy algorithm, which iteratively selects a new direction that maximizes the objective function, gives the optimal solution to problem (1.1). Another important problem is the sparse dictionary selection problem. Let $V \subset \mathbb{R}^d$ be a set of vectors, called a dictionary. For a subset $S \subseteq V$, we denote by $\mathrm{span}(S)$ the subspace spanned by $S$. Let $\mathcal{L}(V)$ be the subspaces spanned by a subset of $V$, and $\mathcal{F}$ be the subspaces spanned by at most $k$ vectors of $V$. Then, problem (1.1) with these $\mathcal{L}(V)$, $\mathcal{F}$, and $f$ in (1.2) defines the sparse dictionary selection problem. The problem is in general difficult to solve [18]; however, greedy-type algorithms, e.g., orthogonal matching pursuit, yield provable approximation guarantees depending on the mutual coherence of $V$.
Here, we are interested in the following research question: Why can principal component analysis and the sparse dictionary selection problem be solved by greedy algorithms, and what classes of objective functions and constraints have the same property?
Existing approach
Several researchers have considered this research question (see Related Work below). One successful approach is to employ submodularity. Let $V \subseteq \mathbb{R}^d$ be a (possibly infinite) set of vectors. We define $F\colon 2^V \to \mathbb{R}$ by $F(S) = f(\mathrm{span}(S))$. If this function satisfies submodularity, $F(S) + F(T) \ge F(S \cup T) + F(S \cap T)$, or some approximate variant of it, we obtain a provable approximation guarantee for the greedy algorithm [15, 6, 9, 13].
However, this approach has a crucial issue: it cannot capture the structure of vector spaces. Consider three pairwise linearly independent vectors $a$, $b$, and $c$ in $\mathbb{R}^2$. Then, we have $\mathrm{span}\{a, b\} = \mathrm{span}\{b, c\} = \mathrm{span}\{c, a\}$; therefore, $F(\{a, b\}) = F(\{b, c\}) = F(\{c, a\})$. However, this property (a single subspace is spanned by different bases) is overlooked in the existing approach, which leads to underestimation of the approximation factors of the greedy algorithms (see Section 4.2).
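The point can be checked numerically. The following is a minimal sketch; the three vectors and the two data points are hypothetical values chosen only for illustration. Every pair of vectors spans the whole plane, so the projection objective (1.2) takes the same value on all three pairs, although a set function sees three unrelated sets.

```python
from itertools import combinations

# Hypothetical vectors in R^2 and hypothetical data points (illustration only).
vectors = {"a": (1.0, 0.0), "b": (1.0, 1.0), "c": (0.0, 1.0)}
data = [(2.0, 1.0), (-1.0, 3.0)]


def span_dim(u, v, tol=1e-12):
    """Dimension of span{u, v} for two nonzero vectors in R^2."""
    det = u[0] * v[1] - u[1] * v[0]
    return 2 if abs(det) > tol else 1


def projected_energy(u, v, points):
    """sum_i ||Pi_X u_i||^2 for X = span{u, v}, using Gram-Schmidt on (u, v)."""
    norm_u = (u[0] ** 2 + u[1] ** 2) ** 0.5
    q1 = (u[0] / norm_u, u[1] / norm_u)
    coeff = v[0] * q1[0] + v[1] * q1[1]
    r = (v[0] - coeff * q1[0], v[1] - coeff * q1[1])
    norm_r = (r[0] ** 2 + r[1] ** 2) ** 0.5
    basis = [q1]
    if norm_r > 1e-12:
        basis.append((r[0] / norm_r, r[1] / norm_r))
    return sum((p[0] * q[0] + p[1] * q[1]) ** 2 for p in points for q in basis)


values = {}
for (n1, v1), (n2, v2) in combinations(vectors.items(), 2):
    values[n1 + n2] = projected_energy(v1, v2, data)
    print(n1 + n2, "dim =", span_dim(v1, v2), "f =", round(values[n1 + n2], 6))
```

All three pairs report dimension 2 and the same objective value, which is the redundancy that a formulation over subspaces can exploit.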
Our approach
In this study, we employ lattice theory to capture the structure of vector spaces. A lattice $\mathcal{L}$ is a partially ordered set closed under the greatest lower bound (a.k.a. meet, $\wedge$) and the least upper bound (a.k.a. join, $\vee$).
The family of all subspaces of $\mathbb{R}^d$ is called the vector lattice $\mathcal{L}(\mathbb{R}^d)$, which forms a lattice whose meet and join operators correspond to the intersection and the sum of subspaces, respectively. This lattice can capture the structure of vector spaces as mentioned above. Also, the family $\mathcal{L}(V)$ of subspaces spanned by a subset of $V \subseteq \mathbb{R}^d$ forms a lattice.
We want to establish a submodular maximization theory on lattices. Here, the main difficulty is finding a “nice” definition of submodularity. Usually, lattice submodularity is defined by the following inequality [22], which is a natural generalization of set submodularity.
(1.3) $f(X) + f(Y) \ge f(X \wedge Y) + f(X \vee Y)$.
However, this condition is so strong that it cannot capture principal component analysis, as shown below.
Example 1.
Consider the vector lattice . Let and be subspaces of where is sufficiently small. Let be the given data. Then, function (1.2) satisfies , , , and . Therefore, it does not satisfy the lattice submodularity. A more important point is that, since we can take , there are no constants and such that on this lattice. This means that it is very difficult to formulate this function as an approximate version of a lattice submodular function.
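A concrete instance of this phenomenon can be sketched as follows. The choice of subspaces and data is an assumption made for illustration and is not necessarily the instance used in the example: $X = \mathrm{span}\{(1, 0)\}$, $Y = \mathrm{span}\{(1, \epsilon)\}$, and a single data point $u = (0, 1)$ in $\mathbb{R}^2$. Then $X \wedge Y = \{0\}$ and $X \vee Y = \mathbb{R}^2$, so the left-hand side of (1.3) vanishes as $\epsilon \to 0$ while the right-hand side stays at one.

```python
def proj_energy_line(direction, point):
    """||Pi_X u||^2 when X is the line spanned by `direction` in R^2."""
    dot = direction[0] * point[0] + direction[1] * point[1]
    norm_sq = direction[0] ** 2 + direction[1] ** 2
    return dot * dot / norm_sq


def lattice_submodularity_sides(eps):
    """Return (f(X) + f(Y), f(X meet Y) + f(X join Y)) for the assumed instance."""
    u = (0.0, 1.0)                          # single hypothetical data point
    f_x = proj_energy_line((1.0, 0.0), u)   # X = span{(1, 0)}
    f_y = proj_energy_line((1.0, eps), u)   # Y = span{(1, eps)}
    f_meet = 0.0                            # X meet Y = {0} whenever eps != 0
    f_join = u[0] ** 2 + u[1] ** 2          # X join Y = R^2, projection is the identity
    return f_x + f_y, f_meet + f_join


for eps in (1e-1, 1e-2, 1e-3):
    lhs, rhs = lattice_submodularity_sides(eps)
    print(f"eps={eps:g}: f(X)+f(Y)={lhs:.3e}, f(meet)+f(join)={rhs:.1f}")
```

Because the ratio of the two sides tends to zero, no fixed multiplicative or additive slack repairs inequality (1.3) for this instance.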
Another commonly used submodularity is the diminishing return (DR)-submodularity [19, 3, 20], which was originally introduced on the integer lattice $\mathbb{Z}^V$. A function $f\colon \mathbb{Z}^V \to \mathbb{R}$ is DR-submodular if
(1.4) $f(x + e_i) - f(x) \ge f(y + e_i) - f(y)$
for all $x \le y$ (componentwise inequality) and $i \in V$, where $e_i$ is the $i$-th unit vector. This definition was later extended to distributive lattices [11] and can be extended to general lattices (see Section 3). However, Example 1 above still poses the same obstacle, and therefore the objective function of principal component analysis cannot be an approximate version of a DR-submodular function.
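For intuition, the diminishing return inequality on the integer lattice can be verified by brute force on a small box. The sketch below is illustrative only; the two test functions and the box size are assumptions and do not come from the paper.

```python
from itertools import product
import math


def is_dr_submodular(f, upper, tol=1e-12):
    """Check f(x + e_i) - f(x) >= f(y + e_i) - f(y) for all x <= y in the box
    {0..upper[0]} x ... x {0..upper[n-1]}, whenever y + e_i stays in the box."""
    n = len(upper)
    points = list(product(*(range(u + 1) for u in upper)))
    for x in points:
        for y in points:
            if any(xi > yi for xi, yi in zip(x, y)):
                continue  # the inequality is only required for x <= y
            for i in range(n):
                if y[i] + 1 > upper[i]:
                    continue
                x_up = x[:i] + (x[i] + 1,) + x[i + 1:]
                y_up = y[:i] + (y[i] + 1,) + y[i + 1:]
                if f(x_up) - f(x) < f(y_up) - f(y) - tol:
                    return False
    return True


def concave_of_sum(x):
    """Concave function of the coordinate sum: marginal gains shrink."""
    return math.sqrt(sum(x))


def product_fn(x):
    """Marginal gain in one coordinate grows with the other coordinate."""
    return float(x[0] * x[1])


print(is_dr_submodular(concave_of_sum, (3, 3)))  # True
print(is_dr_submodular(product_fn, (3, 3)))      # False
```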
To summarize the above discussion, our main task is to define submodularity on lattices that should satisfy the following two properties:

It captures some important practical problems such as the principal component analysis.

It admits efficient approximation algorithms on some constraints.
Our contributions
In this study, in response to the above two requirements, we make the following contributions:

We define downward DR-submodularity and upward DR-submodularity on lattices, which generalize DR-submodularity (Section 3). Our directional DR-submodularities are capable of representing important machine learning problems such as principal component analysis and the sparse dictionary selection problem (Section 4).

We propose approximation algorithms for maximizing (1) a monotone downward DR-submodular function under a height constraint, (2) a monotone downward DR-submodular function under a knapsack constraint, and (3) a nonmonotone bidirectional DR-submodular function (Section 5). These are obtained by generalizing the existing algorithms for maximizing submodular set functions. Thus, even though our directional DR-submodularities are strictly weaker than the lattice DR-submodularity, they are sufficient to admit approximation algorithms.
All the proofs of propositions and theorems are given in Appendix in the supplementary material.
Related Work
For principal component analysis, the greedy algorithm, which iteratively selects the eigenvector of the correlation matrix with the largest eigenvalue among those not yet selected, solves the problem exactly [1].
With regard to the sparse dictionary selection problem, several studies [10, 24, 23, 5] have analyzed greedy algorithms. In general, the objective function for the sparse dictionary selection problem is not submodular. Therefore, researchers introduced approximate versions of submodularity and analyzed the approximation guarantee of algorithms with respect to the corresponding parameter.
Krause and Cevher [15] showed that function (1.2) is an approximately submodular function whose additive gap depends on the mutual coherence. They also showed that the greedy algorithm gives an approximate solution. (Footnote 1: A solution $X$ is an $(\alpha, \delta)$-approximate solution if it satisfies $f(X) \ge \alpha \max_{X' \in \mathcal{F}} f(X') - \delta$. If $\delta = 0$, then we simply say that it is an $\alpha$-approximate solution.)
Das and Kempe [6] introduced the submodularity ratio, which is another measure of submodularity. For the set function maximization problem, the greedy algorithm attains a provable approximation guarantee depending on the submodularity ratio. The approximation ratio of the greedy algorithm is further improved by combining it with the curvature [2]. Elenberg et al. [9] showed that, if a function has bounded restricted strong convexity and bounded smoothness, the corresponding set function has a bounded submodularity ratio. Khanna et al. [13] applied the submodularity ratio to the low-rank approximation problem.
It should be emphasized that all the existing studies analyzed the greedy algorithm in terms of a function of a set of vectors (the basis of the subspace), instead of a function of a subspace. This overlooks the structure of subspaces and causes the difficulties described above.
2 Preliminaries
A lattice $(\mathcal{L}, \le)$ is a partially ordered set (poset) such that, for any $X, Y \in \mathcal{L}$, the least upper bound $X \vee Y$ and the greatest lower bound $X \wedge Y$ uniquely exist. We often say “$\mathcal{L}$ is a lattice” by omitting $\le$ if the order is clear from the context. In this paper, we assume that the lattice has the smallest element $\bot$.
A subset $I \subseteq \mathcal{L}$ is a lower set if $Y \in I$ implies that any $X \in \mathcal{L}$ with $X \le Y$ is also in $I$. For $Y \in \mathcal{L}$, the set $I(Y) = \{X \in \mathcal{L} : X \le Y\}$ is called the lower set of $Y$.
A sequence $X_1 < X_2 < \cdots < X_k$ of elements of $\mathcal{L}$ is a composition series if, for every $i$, there is no $Y \in \mathcal{L}$ such that $X_i < Y < X_{i+1}$. The length of the longest composition series from $\bot$ to $X$ is referred to as the height of $X$ and is denoted by $h(X)$. The height of a lattice is defined by $\sup_{X \in \mathcal{L}} h(X)$. If this value is finite, the lattice has the largest element $\top$. Note that the height of a lattice can be finite even if the lattice has infinitely many elements. For example, the height of the vector lattice $\mathcal{L}(\mathbb{R}^d)$ is $d$.
A lattice is distributive if it satisfies the distributive law: $(X \vee Y) \wedge Z = (X \wedge Z) \vee (Y \wedge Z)$. A lattice is modular if it satisfies the modular law: $X \le B \Rightarrow X \vee (A \wedge B) = (X \vee A) \wedge B$. Every distributive lattice is modular. On a modular lattice $\mathcal{L}$, all the composition series between $X$ and $Y$ have the same length. The lattice is modular if and only if its height function satisfies the modular equality: $h(X) + h(Y) = h(X \wedge Y) + h(X \vee Y)$. Modular lattices often appear with algebraic structures. For example, the set of all subspaces of a vector space forms a modular lattice. Similarly, the set of all normal subgroups of a group forms a modular lattice.
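The modular equality and the failure of the distributive law in the vector lattice can be checked on small examples. The sketch below represents a subspace by a list of spanning row vectors and uses exact rational arithmetic; the helper names (`eliminate`, `dim`, `join`, `meet`) and the example subspaces are assumptions introduced here for illustration.

```python
from fractions import Fraction


def eliminate(rows):
    """Row-reduce [rows | I] exactly. Returns (rank, relations), where each
    relation c satisfies sum_k c[k] * rows[k] = 0."""
    n = len(rows)
    if n == 0:
        return 0, []
    d = len(rows[0])
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == k)) for k in range(n)]
           for i, row in enumerate(rows)]
    r = 0
    for c in range(d):
        pivot = next((i for i in range(r, n) if aug[i][c] != 0), None)
        if pivot is None:
            continue
        aug[r], aug[pivot] = aug[pivot], aug[r]
        for i in range(n):
            if i != r and aug[i][c] != 0:
                factor = aug[i][c] / aug[r][c]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[r])]
        r += 1
    return r, [row[d:] for row in aug[r:]]


def dim(X):
    """Height of a subspace in the vector lattice, i.e., its dimension."""
    return eliminate(X)[0]


def join(X, Y):
    """X join Y = X + Y: concatenate the spanning sets."""
    return X + Y


def meet(X, Y):
    """X meet Y = intersection: every relation among the rows of X + Y yields a
    vector sum_k c[k] * X[k] that lies in both X and Y."""
    if not X or not Y:
        return []
    d = len(X[0])
    _, relations = eliminate(X + Y)
    return [[sum(c[k] * X[k][j] for k in range(len(X))) for j in range(d)]
            for c in relations]


X, Y, Z = [[1, 0]], [[0, 1]], [[1, 1]]  # three distinct lines in R^2
# Modular equality: h(X) + h(Y) = h(X meet Y) + h(X join Y).
print(dim(X) + dim(Y), dim(meet(X, Y)) + dim(join(X, Y)))            # 2 2
# Distributive law fails: X meet (Y join Z) = X, but (X meet Y) join (X meet Z) = {0}.
print(dim(meet(X, join(Y, Z))), dim(join(meet(X, Y), meet(X, Z))))   # 1 0
```

Three distinct lines in the plane are the smallest witness that the vector lattice is modular but not distributive.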
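For a lattice $\mathcal{L}$, an element $a \in \mathcal{L}$ is join-irreducible if $a \ne \bot$ and there are no $X \ne a$, $Y \ne a$ such that $a = X \vee Y$. (Footnote 2: For the set lattice $2^V$ of a set $V$, the join-irreducible elements correspond to the singleton sets $\{a\}$, $a \in V$. Thus, for clarity, we use upper case letters for general lattice elements (e.g., $X$ or $Y$) and lower case letters for join-irreducible elements (e.g., $a$ or $b$).) We denote by $J(\mathcal{L}) \subseteq \mathcal{L}$ the set of all join-irreducible elements. Any element $X \in \mathcal{L}$ is represented by a join of join-irreducible elements; therefore the structure of $\mathcal{L}$ is specified by the structure of $J(\mathcal{L})$. A join-irreducible element $a \in J(\mathcal{L})$ is admissible with respect to an element $X \in \mathcal{L}$ if $a \not\le X$ and any $a' \in J(\mathcal{L})$ with $a' < a$ satisfies $a' \le X$. We denote by $\mathrm{adm}(X)$ the set of all admissible elements with respect to $X$. A set $\mathrm{cl}(a \mid X) = \{a' \in \mathrm{adm}(X) : X \vee a' = X \vee a\}$ is called a closure of $a$ at $X$. See Figures 2.2 and 2.2 for the definition of admissible elements and closure. Note that $a$ is admissible with respect to $X$ if and only if the distance from the lower set of $X$ to $a$ is one.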
Example 2.
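In the vector lattice $\mathcal{L}(\mathbb{R}^d)$, each element $X$ corresponds to a subspace. An element $a$ is join-irreducible if and only if it has dimension one. A join-irreducible element $a$ is admissible with respect to $X$ if it is not contained in $X$, i.e., if a vector spanning $a$ is linearly independent of $X$. The closure of $a$ at $X$ consists of the one-dimensional subspaces contained in $X \vee a$ that are independent of $X$.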
3 Directional DR-submodular functions on modular lattices
We introduce new submodularities on lattices. As described in Section 1, our task is to find useful definitions of “submodularities” on lattices; thus, this section is the most important part of this paper.
Recall definition (1.4) of the DR-submodularity on the integer lattice. Then, we can see that and for and , where and are the th components of and , respectively. Here, and are join-irreducibles in the integer lattice, , , and . Thus, a natural definition of the DR-submodularity on lattices is as follows.
Definition 3 (Strong DR-submodularity).
A function is strong DR-submodular if, for all with and with , the following holds.
(3.1)
The same definition was introduced by Gottschalk and Peis [11] for distributive lattices. However, this is too strong for our purpose because it cannot capture principal component analysis; see Example 1. Therefore, we need a weaker concept of DR-submodularity.
Recall that for all . Thus, the strong DR-submodularity (3.1) is equivalent to the following.
(3.2)
By relaxing the outer to , we obtain the following definition.
Definition 4 (Downward DR-submodularity).
Let be a lattice. A function is downward DR-submodular with additive gap , if for all and , the following holds.
(3.3)
Similarly, the strong DR-submodularity (3.1) is equivalent to the following.
(3.4)
By relaxing the inner to , we obtain the following definition.
Definition 5 (Upward DR-submodularity).
Let be a lattice. is upward DR-submodular with additive gap , if for all and with , the following holds.
(3.5)
If a function is both downward DR-submodular with additive gap and upward DR-submodular with additive gap , then we say that is bidirectional DR-submodular with additive gap . We use the term directional DR-submodularity to refer to these new DR-submodularities.
The strong DR-submodularity implies the bidirectional DR-submodularity, because both downward and upward DR-submodularities are relaxations of the strong DR-submodularity. Interestingly, the converse also holds in distributive lattices.
Proposition 6.
On a distributive lattice, the strong DR-submodularity, downward DR-submodularity, and upward DR-submodularity are equivalent. ∎
Therefore, we can say that directional DR-submodularities are required to capture the special nature of nondistributive lattices such as the vector lattice.
4 Examples
In this section, we present several examples of directional DR-submodular functions to show that our concepts can capture several machine learning problems.
4.1 Principal component analysis
Let $\{u_i\}_{i \in I} \subset \mathbb{R}^d$ be the given data. We consider the vector lattice $\mathcal{L}(\mathbb{R}^d)$ of all the subspaces of $\mathbb{R}^d$, and the objective function $f$ defined by (1.2). Then, the following holds.
Proposition 7.
The function defined by (1.2) is a monotone bidirectional DR-submodular function. ∎
This explains, from the viewpoint of submodularity, why principal component analysis is solved by the greedy algorithm.
The objective function can be generalized further. Let $\rho_i\colon \mathbb{R}_{\ge 0} \to \mathbb{R}$ be a monotone nondecreasing concave function with $\rho_i(0) = 0$ for each $i \in I$. Let
(4.1) $f_\rho(X) = \sum_{i \in I} \rho_i(\| \Pi_X u_i \|^2)$,
where $u_i$ and $\Pi_X$ are as in (1.2). Then, the following holds.
Proposition 8.
The function defined by (4.1) is a monotone bidirectional DR-submodular function. ∎
If we use this function instead of the standard function (1.2), we can ignore the contributions from very large vectors: if $u_i$ is already well approximated in $X$, there is less incentive to seek a larger subspace for $u_i$ due to the concavity of $\rho_i$. See the experiment in the Appendix.
4.2 Sparse dictionary selection
Let $V \subset \mathbb{R}^d$ be a set of vectors called a dictionary. We consider the family $\mathcal{L}(V)$ of all subspaces spanned by subsets of $V$, which forms a (not necessarily modular) lattice. The height of $X \in \mathcal{L}(V)$ coincides with the dimension of $X$. Let $\{u_i\}_{i \in I}$ be the given data. Then the sparse dictionary selection problem is formulated as the maximization of $f$ defined by (1.2) on this lattice under the height constraint.
In general, the function $f$ is not a directional DR-submodular function on this lattice. However, we can prove that $f$ is a downward DR-submodular function with a provable additive gap. We introduce the following definition.
Definition 9 (Mutual coherence of lattice).
Let be a lattice of subspaces. For , the lattice has mutual coherence , if for any , there exists such that , , and for all unit vectors and , . The infimum of such is called the mutual coherence of , and is denoted by .
Our mutual coherence of a lattice is a generalization of the mutual coherence of a set of vectors [7]. For a set of unit vectors $V = \{v_1, \ldots, v_n\}$, its mutual coherence is defined by $\mu(V) = \max_{i \ne j} |\langle v_i, v_j \rangle|$. The mutual coherence of a set of vectors is extensively used in compressed sensing to prove the uniqueness of the solution in a sparse recovery problem [8]. Here, we have the following relation between the mutual coherence of a lattice and that of a set of vectors, which is the reason why we named our quantity mutual coherence.
Lemma 10.
Let be a set of unit vectors whose mutual coherence is . Then, the lattice generated by the vectors has mutual coherence . ∎
This means that if a set of vectors has a small mutual coherence, then the lattice generated by the vectors has a small mutual coherence. Note that the converse does not hold. Consider where , , and for sufficiently small . Then the mutual coherence of the vectors is ; however, the mutual coherence of the lattice generated by is . This shows that the mutual coherence of a lattice is a more robust concept than that of a set of vectors, which is a strong advantage of considering a lattice instead of a set of vectors.
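The classical mutual coherence of a set of vectors, i.e., the largest absolute inner product between two distinct normalized vectors, is straightforward to compute. The sketch below uses made-up vectors and computes only this set-of-vectors quantity, not the lattice version of Definition 9.

```python
import math


def normalize(v):
    """Scale a nonzero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def mutual_coherence(vectors):
    """max over i != j of |<v_i, v_j>| after normalizing every vector."""
    units = [normalize(v) for v in vectors]
    return max(abs(sum(a * b for a, b in zip(u, w)))
               for i, u in enumerate(units) for w in units[i + 1:])


orthonormal = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
nearly_parallel = [(1, 0, 0), (1, 0.01, 0), (0, 0, 1)]

print(mutual_coherence(orthonormal))      # 0.0
print(mutual_coherence(nearly_parallel))  # close to 1
```

A single nearly parallel pair drives the coherence of the set of vectors toward one, which illustrates why this quantity is sensitive to the choice of generators.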
If a lattice has a small mutual coherence, we can prove that the function is a monotone downward DR-submodular function with a small additive gap.
Proposition 11.
Let be normalized vectors and be a lattice generated by . Suppose that forms a modular lattice. Let . Then, the function defined in (4.1) is a downward DR-submodular function with additive gap at most where . ∎
4.3 Quantum cut
Finally, we present an example of a nonmonotone bidirectional DR-submodular function. Let $G = (V, E)$ be a directed graph, and $c\colon E \to \mathbb{R}_{\ge 0}$ be a weight function. The cut function is then defined by $g(S) = \sum_{(i, j) \in E} c(i, j) \, 1[i \in S] \, 1[j \in \bar{S}]$, where $1[i \in S]$ is the indicator function of $S$ and $\bar{S}$ is the complement of $S$. This is a nonmonotone submodular function. Maximizing the cut function has applications in feature selection problems with diversity [16].
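The directed cut function on sets is easy to evaluate, and its nonmonotonicity and submodularity can be confirmed by enumeration on a small graph. The weighted graph below is hypothetical and serves only as an illustration.

```python
from itertools import combinations


def cut_value(edges, S):
    """Total weight of directed edges leaving S; edges maps (i, j) to a weight."""
    S = set(S)
    return sum(w for (i, j), w in edges.items() if i in S and j not in S)


# Hypothetical weighted directed graph on vertices 0..3.
edges = {(0, 1): 2.0, (1, 2): 1.0, (2, 0): 3.0, (2, 3): 1.5, (3, 1): 0.5}

print(cut_value(edges, {0}))           # 2.0: only edge (0, 1) leaves {0}
print(cut_value(edges, {0, 2}))        # 3.5: edges (0, 1) and (2, 3)
print(cut_value(edges, {0, 1, 2, 3}))  # 0: nothing leaves the full vertex set

# Brute-force check of g(S) + g(T) >= g(S | T) + g(S & T) over all pairs of subsets.
subsets = [frozenset(c) for r in range(5) for c in combinations(range(4), r)]
is_submodular = all(
    cut_value(edges, S) + cut_value(edges, T)
    >= cut_value(edges, S | T) + cut_value(edges, S & T) - 1e-12
    for S in subsets for T in subsets
)
print(is_submodular)  # True
```

The value drops from 3.5 on {0, 2} to 0 on the full vertex set, so the function is not monotone.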
We extend the cut function to the “quantum” setting. We say that a lattice of vector spaces $\mathcal{L}$ is orthocomplementable if $X \in \mathcal{L}$ implies $X^\perp \in \mathcal{L}$, where $X^\perp$ is the orthogonal complement of $X$. Let $\{p_i\}_{i \in V}$ be vectors assigned to the vertices. For an orthocomplementable lattice $\mathcal{L}$, the quantum cut function is defined by
(4.2) $f(X) = \sum_{(i, j) \in E} c(i, j) \, \| \Pi_X p_i \|^2 \, \| \Pi_{X^\perp} p_j \|^2$.
If $p_i = e_i$ for all $i \in V$, where $e_i$ is the $i$-th unit vector, and $\mathcal{L}$ is the lattice of axis-parallel subspaces of $\mathbb{R}^V$, function (4.2) coincides with the original cut function. Moreover, it inherits the submodularity.
Proposition 12.
The function defined by (4.2) is a bidirectional DR-submodular function. ∎
The quantum cut function could be used for the subspace selection problem with diversity. For example, in a natural language processing problem, the words are usually embedded into a latent vector space [17]. Usually, we select a subset of words to summarize documents; however, if we want to select a “meaning”, which is encoded in the vector space as a subspace [14], it would be promising to select a subspace. In such an example, the quantum cut function (4.2) can be used to incorporate the diversity represented by the graph of words.
5 Algorithms
We provide algorithms for maximizing (1) a monotone downward DR-submodular function under the height constraint, which generalizes the cardinality constraint (Section 5.1), (2) a monotone downward DR-submodular function under a knapsack constraint (Section 5.2), and (3) a nonmonotone bidirectional DR-submodular function (Section 5.3). Basically, these algorithms are extensions of the algorithms for the set lattice. This indicates that our definitions of directional DR-submodularities are natural and useful.
Below, we always assume that $f$ is normalized, i.e., $f(\bot) = 0$.
5.1 Height constraint
We first consider the height constraint, i.e., . This coincides with the cardinality constraint if is the set lattice. In general, this constraint is very difficult to analyze because can be arbitrarily large. Thus, we assume that the height function is incremental, i.e., for all and . Note that if and only if is modular.
We show that, similarly to the set lattice, the greedy algorithm (Algorithm 1) achieves approximation for the downward DR-submodular maximization problem over the height constraint.
Theorem 13.
Let be a lattice whose height function is incremental, and be a downward DR-submodular function with additive gap . Then, Algorithm 1 finds approximate solution of the height constrained monotone submodular maximization problem. In particular, on modular lattice with , it gives approximation. ∎ (Footnote 3: Algorithm 1 requires solving the nonconvex optimization problem in Step 3. If we can only obtain an approximate solution in Step 3, the approximation ratio of the algorithm reduces to .)
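To make the greedy scheme concrete, the following is a minimal sketch for the sparse dictionary selection setting, where each step joins the current subspace with the span of one dictionary vector and the height constraint limits the number of steps. Algorithm 1 itself is not reproduced in this section, so this is a generic greedy under stated assumptions rather than the authors' exact procedure; the function names, the dictionary, and the data are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residual_direction(v, basis, tol=1e-10):
    """Unit vector along v minus its projection onto span(basis) (orthonormal),
    or None if v already lies in that span."""
    r = list(v)
    for q in basis:
        coeff = dot(r, q)
        r = [ri - coeff * qi for ri, qi in zip(r, q)]
    norm = math.sqrt(dot(r, r))
    return None if norm < tol else [ri / norm for ri in r]


def greedy_subspace(dictionary, data, k):
    """Greedily pick up to k dictionary vectors to maximize sum_i ||Pi_X u_i||^2."""
    basis, chosen, value = [], [], 0.0
    for _ in range(k):
        best = None
        for idx, v in enumerate(dictionary):
            q = residual_direction(v, basis)
            if q is None:
                continue  # joining v would not enlarge the current subspace
            # Marginal gain of joining span{v}: energy captured by the new direction q.
            gain = sum(dot(u, q) ** 2 for u in data)
            if best is None or gain > best[0]:
                best = (gain, idx, q)
        if best is None:
            break
        value += best[0]
        chosen.append(best[1])
        basis.append(best[2])
    return chosen, value


# Hypothetical dictionary and data in R^3.
dictionary = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
data = [(3, 0, 0), (0, 2, 0), (0, 0, 1)]
chosen, value = greedy_subspace(dictionary, data, k=2)
print(chosen, value)  # [0, 1] 13.0
```

In the second step the vectors (0, 1, 0) and (1, 1, 0) give the same gain because they produce the same join with the current subspace; this is the kind of redundancy that a formulation over subspaces treats as a single choice.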
5.2 Knapsack constraint
Next, we consider the knapsack constrained problem. A knapsack constraint on a lattice is specified by a nonnegative modular function (cost function) and nonnegative number (budget) such that the feasible region is given by .
In general, it is NP-hard to obtain a constant factor approximation for a knapsack constrained problem even for a distributive lattice [11]. Therefore, we need additional assumptions on the cost function.
We say that a modular function is order-consistent if for all , , , and . The height function of a modular lattice is order-consistent, because for all and ; therefore it generalizes the height function. Moreover, on the set lattice , any modular function is order-consistent because there is no join-irreducible such that holds; therefore it generalizes the standard knapsack constraint on sets.
For a knapsack constraint with an order-consistent nonnegative modular function, we obtain a provable approximation ratio.
Theorem 14.
Let be a lattice, be a knapsack constraint where is an order-consistent modular function, , and be a monotone downward DR-submodular function with additive gap . Then, Algorithm 2 gives approximation of the knapsack constrained monotone submodular maximization problem. ∎
5.3 Nonmonotone unconstrained maximization
Finally, we consider the unconstrained nonmonotone maximization problem.
The double greedy algorithm [4] achieves the optimal approximation ratio for the unconstrained nonmonotone submodular set function maximization problem. To extend the double greedy algorithm to lattices, we have to assume that the lattice has a finite height. This is needed for the algorithm to terminate in finitely many steps. We also assume both downward DR-submodularity and upward DR-submodularity, i.e., bidirectional DR-submodularity. Finally, we assume that the lattice is modular. This is needed to analyze the approximation guarantee.
Theorem 15.
Let be a modular lattice of finite height, , and be a nonmonotone bidirectional DR-submodular function with additive gap . Then, Algorithm 3 gives approximate solution of the unconstrained nonmonotone submodular maximization problem.
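For reference, the set-lattice special case can be sketched as follows. This is the deterministic double greedy of [4] for set functions, which guarantees a 1/3-approximation for nonnegative submodular functions (the randomized variant attains 1/2); it is not Algorithm 3, whose lattice version is not reproduced in this section. The weighted graph is hypothetical.

```python
def cut_value(edges, S):
    """Total weight of directed edges leaving S; edges maps (i, j) to a weight."""
    return sum(w for (i, j), w in edges.items() if i in S and j not in S)


def double_greedy(f, ground):
    """Deterministic double greedy for unconstrained set function maximization:
    grow `lower` from the empty set and shrink `upper` from the ground set."""
    lower, upper = set(), set(ground)
    for e in ground:
        gain_add = f(lower | {e}) - f(lower)      # benefit of adding e to lower
        gain_remove = f(upper - {e}) - f(upper)   # benefit of dropping e from upper
        if gain_add >= gain_remove:
            lower.add(e)
        else:
            upper.discard(e)
    return lower  # lower and upper coincide after the last element


# Hypothetical weighted directed graph on vertices 0..3.
edges = {(0, 1): 2.0, (1, 2): 1.0, (2, 0): 3.0, (2, 3): 1.5, (3, 1): 0.5}
result = double_greedy(lambda S: cut_value(edges, S), [0, 1, 2, 3])
print(sorted(result), cut_value(edges, result))  # [1, 2] 4.5
```

On this small instance the returned set happens to be optimal; in general only the stated approximation factor is guaranteed.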
6 Conclusion
In this paper, we formulated the subspace selection problem as an optimization problem over lattices. By introducing new “DR-submodularities” on lattices, named directional DR-submodularities, we characterized the solvable subspace selection problems in terms of submodularity. In particular, our definitions capture the solvability of principal component analysis and the sparse dictionary selection problem. We proposed algorithms with provable approximation guarantees for directional DR-submodular functions under several constraints.
There are several interesting future directions. Developing an algorithm for the matroid constraint over lattices is important since it is a fundamental constraint in the submodular set function maximization problem. Related to this direction, extending the continuous relaxation type algorithms to lattices is very interesting. Such algorithms have been used to obtain the optimal approximation factors for the matroid constrained submodular set function maximization problem.
It is also an interesting direction to look for machine learning applications of directional DR-submodular maximization other than the subspace selection problem. Possible candidates include the subgroup selection problem and the subpartition selection problem.
References
 [1] Hervé Abdi and Lynne J Williams. Principal component analysis. Wiley interdisciplinary reviews: computational statistics, 2(4):433–459, 2010.
 [2] Andrew An Bian, Joachim M Buhmann, Andreas Krause, and Sebastian Tschiatschek. Guarantees for greedy maximization of non-submodular functions with applications. In International Conference on Machine Learning (ICML’17), 2017.
 [3] Andrew An Bian, Baharan Mirzasoleiman, Joachim Buhmann, and Andreas Krause. Guaranteed non-convex optimization: Submodular maximization over continuous domains. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS’17), pages 111–120, 2017.
 [4] Niv Buchbinder, Moran Feldman, Joseph (Seffi) Naor, and Roy Schwartz. A tight linear time (1/2)-approximation for unconstrained submodular maximization. SIAM Journal on Computing, 44(5):1384–1402, 2015.
 [5] Abhimanyu Das and David Kempe. Algorithms for subset selection in linear regression. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC’08), pages 45–54, 2008.
 [6] Abhimanyu Das and David Kempe. Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. In Proceedings of the 28th International Conference on Machine Learning (ICML’11), pages 1057–1064, 2011.
 [7] David L Donoho and Michael Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization. Proceedings of the National Academy of Sciences, 100(5):2197–2202, 2003.
 [8] Yonina C Eldar and Gitta Kutyniok. Compressed sensing: theory and applications. Cambridge University Press, 2012.
 [9] Ethan R Elenberg, Rajiv Khanna, Alexandros G Dimakis, and Sahand Negahban. Restricted strong convexity implies weak submodularity. arXiv preprint arXiv:1612.00804, 2016.
 [10] Anna C Gilbert, S Muthukrishnan, and Martin J Strauss. Approximation of functions over redundant dictionaries using coherence. In Proceedings of the 14th ACM-SIAM Symposium on Discrete Algorithms (SODA’03), pages 243–252, 2003.
 [11] Corinna Gottschalk and Britta Peis. Submodular function maximization over distributive and integer lattices. arXiv preprint arXiv:1505.05423, 2015.
 [12] George Grätzer. General lattice theory. Springer Science & Business Media, 2002.
 [13] Rajiv Khanna, Ethan R. Elenberg, Alexandros G. Dimakis, Joydeep Ghosh, and Sahand Negahban. On approximation guarantees for greedy low rank optimization. In Proceedings of the 34th International Conference on Machine Learning (ICML’17), pages 1837–1846, 2017.
 [14] Joo-Kyung Kim and Marie-Catherine de Marneffe. Deriving adjectival scales from continuous space word representations. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP’13), pages 1625–1630, 2013.
 [15] Andreas Krause and Volkan Cevher. Submodular dictionary selection for sparse representation. In Proceedings of the 27th International Conference on Machine Learning (ICML’10), pages 567–574, 2010.
 [16] Hui Lin, Jeff Bilmes, and Shasha Xie. Graph-based submodular selection for extractive summarization. In Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU’09), pages 381–386. IEEE, 2009.
 [17] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (NIPS’13), pages 3111–3119, 2013.
 [18] Balas Kausik Natarajan. Sparse approximate solutions to linear systems. SIAM Journal on Computing, 24(2):227–234, 1995.
 [19] Tasuku Soma and Yuichi Yoshida. A generalization of submodular cover via the diminishing return property on the integer lattice. In Advances in Neural Information Processing Systems (NIPS’15), pages 847–855, 2015.
 [20] Tasuku Soma and Yuichi Yoshida. Non-monotone DR-submodular function maximization. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI’17), volume 17, pages 898–904, 2017.
 [21] Gilbert Strang. Introduction to linear algebra, volume 3. Wellesley-Cambridge Press, Wellesley, MA, 1993.
 [22] Donald M Topkis. Minimizing a submodular function on a lattice. Operations Research, 26(2):305–321, 1978.
 [23] Joel A Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10):2231–2242, 2004.
 [24] Joel A Tropp, Anna C Gilbert, Sambavi Muthukrishnan, and Martin J Strauss. Improved sparse approximation over quasi-incoherent dictionaries. In Proceedings of the International Conference on Image Processing (ICIP’03), volume 1, pages I–37. IEEE, 2003.
Appendix A Proofs
In this section, we provide proofs omitted in the main body.
Proof of Proposition 6.
We use Birkhoff’s representation theorem for distributive lattices. A set is a lower set if then for all . The lower sets form a lattice under the inclusion order. We call this lattice the lower set lattice of .
Theorem 16 (Birkhoff’s representation theorem; see [12]).
Any finite distributive lattice is isomorphic to the lower set lattice of . The isomorphism is given by . ∎
This theorem implies that, for any , the corresponding lower set of is uniquely determined. Therefore, for any , we have for all .
(Downward ⇒ Strong) By Birkhoff’s representation theorem, we have . Thus the replaced maximum in (3.3) coincides with the minimum.
(Upward ⇒ Strong) By Birkhoff’s representation theorem, for any and with , the element such that is uniquely determined (i.e., represent as a lower set of and remove from the lower set). Thus, the replaced minimum in (3.5) coincides with the maximum. ∎
Proofs of Propositions 7 and 8.
The downward DR-submodularity follows from Proposition 11, which is proved below, since the mutual coherence of is zero. Thus, we here prove the upward DR-submodularity. To simplify the notation, we prove the case that . Extension to the general case is easy.
Let and with . Since the height of join-irreducible elements is one in the vector lattice, the outer max in (3.5) can be ignored. Let , where is the orthogonal complement of . By the modularity of the height, is a 1-dimensional subspace. In particular, it is join-irreducible. Notice that . Since , we have
(A.1) 
Here, we identify a 1-dimensional subspace with a unit vector in the space. Let . By using the modularity of the height again, we have . Since , we have
(A.2) 
By the concavity of and , we obtain
(A.3) 
This shows the upward DR-submodularity. ∎
Proof of Proposition 11.
To simplify the notation, we prove the case that . Extension to the general case is easy.
Let . Since the join-irreducible elements have height one in this lattice, the additive gap is given by
(A.4) 
Let and arbitrary. By the definition of mutual coherence, there exists that has low coherence with . Let . Then, by the modularity of the height function, we have and it is a joinirreducible element. Since , we have . By comparing the height of and , we have .
We use at the RHS and evaluate
(A.5) 
Let be a unit vector in orthogonal to . Note that may not be the element of .
(A.6)  
(A.7) 
where the second inequality follows from the concavity of with the monotonicity of the mapping . Thus,
(A.8) 
where is the unit vector proportional to . If then, by the monotonicity of , we have . Therefore, we only have to consider the reverse case. In such case, by the concavity, we have
(A.9) 
Here, is the derivative of at .
Let us denote where is a unit vector in orthogonal to . Then, by the definition of the mutual coherence, we have . Also, we have . By the construction, we have where . Thus, we have
(A.10)  
(A.11) 
Therefore, by using , we have
(A.12)  
(A.13) 
∎
Proof of Lemma 10.
Suppose that has dimension . Let . Then, there exists such that any vector is represented by a linear combination of them. We construct by selecting maximally independent vectors to and let , where . By the dimension theorem of vector spaces and the fact that is a subspace of the intersection of and , we have . Here, the left-hand side is and the right-hand side is . Therefore, . This shows .
We check the condition of the mutual coherence. Let and be normalized vectors in and . Then we have
(A.14) 
where , , and . Here, . Therefore we prove that and are small. Since is normalized, we have
(A.15) 
where and is the smallest eigenvalue of . Since the diagonal elements of are one, and the absolute values of the off-diagonal elements are at most , the Gerschgorin circle theorem [21] implies that . Therefore, . Similarly, . Therefore, . ∎
Proof of Proposition 12.
We first check the downward DR-submodularity. Take arbitrary subspaces and with and . Without loss of generality, we can suppose . To simplify the notation, we use the same symbol to represent the unit vector in the subspace . By a direct calculation,
(A.16) 
Hence,
(A.17) 
Since , we have
(A.18) 
Since , we have and . Hence, each summand in is smaller than that in . This shows the downward DR-submodularity.
Next, we check the upward DR-submodularity. Take arbitrary subspaces and a vector with . Let . To simplify the notation, we use the same symbol to represent the unit vector in the subspace . Notice that . Then, we can show the following equalities by the same argument as in the downward DR-submodular case.
(A.19) 