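Column (or row) stochastic matrices are those where each column (or row) has non-negative entries that sum to $1$. Such matrices have been shown to be useful in many machine learning applications [LL04, ZCL05, IM07, RBCG08, SGH15]. For a column (or row) $x = (x_1, \ldots, x_K)$ of such a matrix, the constraint of interest is
$$x_k \geq 0 \ \text{ for all } k \quad \text{and} \quad \sum_{k=1}^{K} x_k = 1, \tag{1}$$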
which is also called the standard simplex constraint. Sun et al. [SGH15] proposed a Riemannian geometry for the set obtained from the constraint (1) with strictly positive entries. The strict positivity ensures that this set has a differentiable manifold structure. They developed optimization-related ingredients to enable first- and second-order optimization.
In this work, we propose to generalize the constraint (1) to constraints with matrices, i.e., the matrix simplex constraint
$$\sum_{i=1}^{K} X_i = I, \tag{2}$$
where $X_i \succeq 0$ is a symmetric positive semidefinite matrix of size $n \times n$ for all $i \in \{1, \ldots, K\}$ and $I$ is the $n \times n$ identity matrix. Although the constraint (2) is a natural generalization of (1), its study is rather limited [ŘEK05, LSC08]. To that end, we discuss a novel Riemannian geometry for the set obtained from the constraint (2) with strict positive definiteness of the matrices. (Strict positive definiteness of the matrices is needed to obtain a differentiable manifold structure. The proposed Riemannian structure handles potentially semidefinite, i.e., rank-deficient, matrices gracefully by scaling those elements to the boundary of the manifold.) The main aim of the work is to develop the optimization-related ingredients needed to build optimization algorithms on this constraint set. The expressions of the ingredients extend to the case of Hermitian positive definite matrices. We provide manifold description files for easy integration with the manifold optimization toolbox Manopt [BMAS14]. The files are available at https://bamdevmishra.in.
2 The matrix simplex manifold
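We define the matrix simplex manifold of interest as
$$\Delta_n^K := \Big\{ (X_1, X_2, \ldots, X_K) \;:\; \sum_{i=1}^{K} X_i = I, \ X_i \succ 0 \ \text{symmetric of size } n \times n \ \text{for all } i \Big\}. \tag{3}$$
It should be noted that the positive semidefiniteness constraint is replaced with the positive definiteness constraint to ensure that the set has a differentiable manifold structure. Below, we impose a Riemannian structure on the matrix simplex manifold (3) and discuss the ingredients that allow optimization algorithms to be developed systematically [AMS08].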
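As an illustration of the definition (3), the following minimal NumPy sketch generates a point on the manifold and checks membership numerically. The function names `random_point` and `is_on_manifold` are hypothetical and are not taken from the Manopt description files mentioned above; the construction (normalizing arbitrary positive definite matrices by the inverse square root of their sum) is one simple choice among many.

```python
import numpy as np


def sym(a):
    """Symmetric part of a square matrix."""
    return 0.5 * (a + a.T)


def inv_sqrt_spd(s):
    """Inverse square root of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(s)
    # Scale the eigenvector columns by w^{-1/2} and recombine.
    return (v / np.sqrt(w)) @ v.T


def random_point(n, K, seed=0):
    """Return K symmetric positive definite n x n matrices summing to I."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((K, n, n))
    # A_i A_i^T + I is symmetric positive definite for any square A_i.
    Y = A @ A.transpose(0, 2, 1) + np.eye(n)
    # Congruence with S^{-1/2}, S = sum_i Y_i, keeps every factor positive
    # definite and makes the sum equal to the identity.
    W = inv_sqrt_spd(Y.sum(axis=0))
    return np.stack([sym(W @ Yi @ W) for Yi in Y])


def is_on_manifold(X, tol=1e-10):
    """Check symmetry, positive definiteness, and the sum-to-identity constraint."""
    n = X.shape[1]
    sums_to_identity = np.allclose(X.sum(axis=0), np.eye(n), atol=tol)
    symmetric = np.allclose(X, X.transpose(0, 2, 1), atol=tol)
    positive = all(np.linalg.eigvalsh(Xi).min() > 0 for Xi in X)
    return sums_to_identity and symmetric and positive
```

Here a point is stored as an array of shape `(K, n, n)`, which is only one possible numerical representation.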
2.1 Riemannian metric and tangent space projector
An element $x$ of $\Delta_n^K$ is numerically represented as the structure $(X_1, X_2, \ldots, X_K)$, which is a collection of $K$ symmetric positive definite matrices of size $n \times n$.
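The tangent space of $\Delta_n^K$ at an element $x$ is the linearization of the manifold, i.e., of the constraint (2). Accordingly, the tangent space of $\Delta_n^K$ at $x$ is characterized as
$$T_x\Delta_n^K = \Big\{ (\xi_1, \xi_2, \ldots, \xi_K) \;:\; \xi_i \ \text{symmetric of size } n \times n \ \text{for all } i, \ \sum_{i=1}^{K} \xi_i = 0 \Big\}. \tag{4}$$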
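It can be shown that $\Delta_n^K$ is an embedded submanifold of $(\mathbb{S}_{++}^n)^K$, which is the Cartesian product of $K$ manifolds of symmetric positive definite matrices of size $n \times n$ [AMS08, Chapter 3.3]. Here, $\mathbb{S}_{++}^n$ denotes the manifold of $n \times n$ symmetric positive definite matrices, which has a well-known Riemannian geometry [Bha09]. The dimension of the manifold $\Delta_n^K$ is $(K-1)n(n+1)/2$. We endow the manifold with a smooth metric $g_x$ (inner product) on the tangent space at every $x \in \Delta_n^K$ [AMS08]. A natural choice of the metric is based on the well-known bi-invariant metric of $\mathbb{S}_{++}^n$ [Bha09], i.e.,
$$g_x(\xi_x, \eta_x) := \sum_{i=1}^{K} \mathrm{trace}\big(X_i^{-1} \xi_i X_i^{-1} \eta_i\big), \tag{5}$$
where $\xi_x = (\xi_1, \ldots, \xi_K)$ and $\eta_x = (\eta_1, \ldots, \eta_K)$ are tangent vectors at $x = (X_1, \ldots, X_K)$.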
Once the manifold $\Delta_n^K$ is endowed with the metric (5), it becomes a Riemannian submanifold of $(\mathbb{S}_{++}^n)^K$. Following [AMS08, Chapters 3 and 4], the Riemannian submanifold structure allows the Riemannian gradient and Hessian of a function (on the manifold) to be computed in a straightforward manner from the partial derivatives of the function.
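For concreteness, the metric (5), i.e., the sum over the $K$ components of $\mathrm{trace}(X_i^{-1}\xi_i X_i^{-1}\eta_i)$, can be evaluated as in the following NumPy sketch. The function name `metric` and the `(K, n, n)` array layout are assumptions made for illustration only.

```python
import numpy as np


def metric(X, xi, eta):
    """Evaluate sum_i trace(X_i^{-1} xi_i X_i^{-1} eta_i).

    X, xi, eta are arrays of shape (K, n, n): a point with symmetric
    positive definite components and two tangent vectors at that point.
    """
    total = 0.0
    for Xi, xi_i, eta_i in zip(X, xi, eta):
        a = np.linalg.solve(Xi, xi_i)   # X_i^{-1} xi_i, without forming the inverse
        b = np.linalg.solve(Xi, eta_i)  # X_i^{-1} eta_i
        total += np.trace(a @ b)
    return total
```

Linear solves are used instead of explicit inverses, which is a common numerical choice rather than something prescribed by the text.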
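A critical ingredient in those computations is the linear projection operator that maps a vector in the ambient space onto the tangent space (4) at an element of $\Delta_n^K$. In particular, given $Z = (Z_1, \ldots, Z_K)$ in the ambient space, with each $Z_i$ symmetric, we compute the projection operator $\Pi_x$, orthogonal with respect to the metric (5), as [MS16]
$$\Pi_x(Z) = \operatorname*{arg\,min}_{\xi_x \in T_x\Delta_n^K} \; g_x(Z - \xi_x, Z - \xi_x), \tag{6}$$
which has the expression
$$\Pi_x(Z) = (Z_1 - X_1 \Lambda X_1, \ \ldots, \ Z_K - X_K \Lambda X_K),$$
where $\Lambda$ is the symmetric matrix that is the solution to the linear system
$$\sum_{i=1}^{K} X_i \Lambda X_i = \sum_{i=1}^{K} Z_i. \tag{7}$$
It is easy to verify that $\Pi_x(Z)$ belongs to the tangent space and that $\Pi_x(Z)$ and $Z - \Pi_x(Z)$ are complementary (orthogonal) to each other with respect to the chosen metric (5) for all choices of $Z$. Indeed, writing $\xi_i = Z_i - X_i \Lambda X_i$, we have $g_x(\Pi_x(Z), Z - \Pi_x(Z)) = \sum_i \mathrm{trace}(\xi_i \Lambda) = 0$ because $\sum_i \xi_i = 0$.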
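The projection described above can be sketched in NumPy as follows, under the assumption that it takes the form $\xi_i = Z_i - X_i \Lambda X_i$ with $\Lambda$ solving $\sum_i X_i \Lambda X_i = \sum_i Z_i$. The sketch solves the linear system by vectorization with a dense Kronecker-product matrix, which is only sensible for small $n$ and is meant purely as a readable reference; the name `project` and the defensive symmetrization of the input are illustrative choices.

```python
import numpy as np


def sym(a):
    """Symmetric part of a square matrix."""
    return 0.5 * (a + a.T)


def project(X, Z):
    """Project an ambient vector Z onto the tangent space at X.

    X and Z have shape (K, n, n). Returns the tangent vector and the
    symmetric multiplier Lam of the linear system.
    """
    K, n, _ = X.shape
    Zs = np.stack([sym(Zi) for Zi in Z])  # symmetrize defensively
    # With row-major vec and symmetric X_i:
    # vec(X_i Lam X_i) = (X_i kron X_i) vec(Lam).
    A = sum(np.kron(Xi, Xi) for Xi in X)
    rhs = Zs.sum(axis=0).reshape(-1)
    Lam = sym(np.linalg.solve(A, rhs).reshape(n, n))
    xi = Zs - np.stack([Xi @ Lam @ Xi for Xi in X])
    return xi, Lam
```

By construction the components of the returned vector sum to zero, and applying `project` to its own output leaves it unchanged, which gives two quick sanity checks for an implementation.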
2.2 Retraction operator
Given a vector in the tangent space, the retraction operator maps it to an element of the manifold [AMS08, Chapter 4]. Overall, the notion of retraction allows us to move on the manifold, which is required by any optimization algorithm.
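A natural choice of the retraction operator on the manifold $\Delta_n^K$ is inspired by the well-known exponential mapping on $\mathbb{S}_{++}^n$, the manifold of positive definite matrices [Bha09]. However, this only ensures positive definiteness of the output matrices. To maintain the constraint that the matrices sum to the identity, we additionally normalize in a particular fashion. Overall, given a tangent vector $\xi_x \in T_x\Delta_n^K$, the expression for the retraction operator is
$$R_x(\xi_x) := \big(Y_{\rm sum}^{-1/2}\, Y_1\, Y_{\rm sum}^{-1/2}, \ \ldots, \ Y_{\rm sum}^{-1/2}\, Y_K\, Y_{\rm sum}^{-1/2}\big), \tag{8}$$
where $\xi_x = (\xi_1, \ldots, \xi_K)$, $Y_i = X_i\, \mathrm{expm}(X_i^{-1} \xi_i)$, $Y_{\rm sum} = \sum_{i=1}^{K} Y_i$, and $\mathrm{expm}$ is the matrix exponential operator. The operator (8) satisfies the two conditions required of a retraction [AMS08, Chapter 4]:
- the centering condition: $R_x(0) = x$, and
- the local rigidity condition: $\mathrm{D}R_x(0) = \mathrm{id}_{T_x\Delta_n^K}$, where $\mathrm{id}_{T_x\Delta_n^K}$ denotes the identity mapping on $T_x\Delta_n^K$.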
The centering condition for (8) is straightforward to verify by setting $\xi_x = 0$. To verify the local rigidity condition, we analyze the differential of the retraction operator locally, which is the composition of two steps: the first is through the matrix exponential and the second is through the normalization by pre- and post-multiplying with $Y_{\rm sum}^{-1/2}$. The matrix exponential step is locally rigid because it defines the well-known exponential mapping on the manifold $\mathbb{S}_{++}^n$ [AMS08, Bha09]. The normalization step (pre- and post-multiplying by $Y_{\rm sum}^{-1/2}$) does not change local rigidity: since the components of a tangent vector sum to zero, $Y_{\rm sum}$ equals the identity up to second-order terms in $\xi_x$, so the normalization acts as the identity to first order. Hence, the overall composition (8) satisfies both the centering and local rigidity conditions needed to be a retraction.
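A minimal NumPy sketch of such a retraction is given below, assuming the two-step form described above: an exponential-map step $X_i\,\mathrm{expm}(X_i^{-1}\xi_i)$ on each component, followed by congruence with the inverse square root of the sum. To stay within symmetric eigendecompositions, the sketch uses the identity $X\,\mathrm{expm}(X^{-1}\xi) = X^{1/2}\,\mathrm{expm}(X^{-1/2}\xi X^{-1/2})\,X^{1/2}$ for symmetric positive definite $X$ and symmetric $\xi$. The names `spd_fun` and `retract` are hypothetical.

```python
import numpy as np


def sym(a):
    """Symmetric part of a square matrix."""
    return 0.5 * (a + a.T)


def spd_fun(s, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, v = np.linalg.eigh(s)
    return (v * fun(w)) @ v.T


def retract(X, xi):
    """Map a tangent vector xi at X back to the manifold.

    X and xi have shape (K, n, n); the components of xi are symmetric
    and sum to zero.
    """
    Y = []
    for Xi, xi_i in zip(X, xi):
        root = spd_fun(Xi, np.sqrt)                          # X_i^{1/2}
        root_inv = spd_fun(Xi, lambda w: 1.0 / np.sqrt(w))   # X_i^{-1/2}
        # Exponential-map step, written with symmetric factors only.
        E = spd_fun(sym(root_inv @ xi_i @ root_inv), np.exp)
        Y.append(root @ E @ root)
    Y = np.stack(Y)
    # Normalization step: congruence with (sum_i Y_i)^{-1/2}.
    W = spd_fun(Y.sum(axis=0), lambda w: 1.0 / np.sqrt(w))
    return np.stack([sym(W @ Yi @ W) for Yi in Y])
```

The centering condition can be checked by passing a zero tangent vector, and local rigidity can be checked approximately with a finite difference of `retract(X, t * xi)` for small `t`.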
2.3 Riemannian gradient and Hessian computations
As mentioned earlier, a benefit of the Riemannian submanifold structure is that it allows the Riemannian gradient and Hessian of a function to be computed in a systematic manner. To that end, we consider a smooth function $f : \Delta_n^K \to \mathbb{R}$ on the manifold. We also assume that it is well-defined on $(\mathbb{S}_{++}^n)^K$.
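If $\nabla_x f = (\partial f/\partial X_1, \ldots, \partial f/\partial X_K)$ is the Euclidean gradient of $f$ at $x = (X_1, \ldots, X_K)$, then the Riemannian gradient $\mathrm{grad}_x f$ has the expression
$$\mathrm{grad}_x f = \Pi_x\big(X_1\, \mathrm{symm}(\partial f/\partial X_1)\, X_1, \ \ldots, \ X_K\, \mathrm{symm}(\partial f/\partial X_K)\, X_K\big),$$
where $\partial f/\partial X_i$ is the partial derivative of $f$ at $x$ with respect to $X_i$ and $\Pi_x$ is the tangent space projection operator defined in (6). Here, $\mathrm{symm}$ extracts the symmetric part of a matrix, i.e., $\mathrm{symm}(A) = (A + A^\top)/2$.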
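The conversion from Euclidean partial derivatives to the Riemannian gradient can be sketched as follows, assuming the scaling $X_i\,\mathrm{symm}(\partial f/\partial X_i)\,X_i$ of each component followed by the tangent space projection. The helper names are hypothetical, and the projection is solved densely for readability. A convenient sanity check is the linear function $f(x) = \sum_i \mathrm{trace}(C_i X_i)$, whose partial derivatives are the matrices $C_i$: when all $C_i$ coincide, $f$ is constant on the manifold and the Riemannian gradient must vanish.

```python
import numpy as np


def sym(a):
    """Symmetric part of a square matrix."""
    return 0.5 * (a + a.T)


def project(X, Z):
    """Tangent space projection: Z_i - X_i Lam X_i with sum_i X_i Lam X_i = sum_i Z_i."""
    K, n, _ = X.shape
    Zs = np.stack([sym(Zi) for Zi in Z])
    # Dense Kronecker formulation of the linear system (small n only).
    A = sum(np.kron(Xi, Xi) for Xi in X)
    Lam = sym(np.linalg.solve(A, Zs.sum(axis=0).reshape(-1)).reshape(n, n))
    return Zs - np.stack([Xi @ Lam @ Xi for Xi in X])


def riemannian_grad(X, egrad):
    """Convert Euclidean partial derivatives (shape (K, n, n)) to the Riemannian gradient."""
    # Scale each symmetrized partial derivative by X_i on both sides,
    # then project onto the tangent space.
    scaled = np.stack([Xi @ sym(Gi) @ Xi for Xi, Gi in zip(X, egrad)])
    return project(X, scaled)


def linear_cost_egrad(C):
    """Partial derivatives of f(x) = sum_i trace(C_i X_i); they do not depend on x."""
    return np.asarray(C, dtype=float)
```

For a tangent vector with components $\xi_i$, the metric inner product of the output with that vector should equal the directional derivative $\sum_i \mathrm{trace}(C_i \xi_i)$, which offers a second check.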
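The computation of the Riemannian Hessian on the manifold $\Delta_n^K$ involves the notion of the Riemannian connection [AMS08, Section 5.5]. The Riemannian connection, denoted as $\nabla_{\xi_x} \eta_x$, at $x$ generalizes the covariant derivative of the tangent vector $\eta_x$ along the direction of the tangent vector $\xi_x$ on the manifold $\Delta_n^K$. Since $\Delta_n^K$ is a Riemannian submanifold of the manifold $(\mathbb{S}_{++}^n)^K$, the Riemannian connection has a simple expression in terms of the computations on the symmetric positive definite manifold $\mathbb{S}_{++}^n$ [Bha09]. In particular, the Riemannian connection on $\Delta_n^K$ is obtained by projecting the connection on $(\mathbb{S}_{++}^n)^K$ onto the tangent space $T_x\Delta_n^K$. The connection on $(\mathbb{S}_{++}^n)^K$ is easy to derive thanks to the well-known Riemannian geometry of $\mathbb{S}_{++}^n$. Overall, the Riemannian connection expression for $\Delta_n^K$ is
$$\nabla_{\xi_x} \eta_x = \Pi_x\Big(\mathrm{D}\eta_x[\xi_x] - \big(\mathrm{symm}(\xi_1 X_1^{-1} \eta_1), \ \ldots, \ \mathrm{symm}(\xi_K X_K^{-1} \eta_K)\big)\Big), \tag{9}$$
where $\mathrm{D}\eta_x[\xi_x]$ denotes the directional derivative of $\eta_x$ along $\xi_x$. Based on the expression (9), the Riemannian Hessian operation along a tangent vector $\xi_x$ has the expression
$$\mathrm{Hess}_x f[\xi_x] = \nabla_{\xi_x}\, \mathrm{grad}_x f = \Pi_x\Big(\mathrm{D}\,\mathrm{grad}_x f[\xi_x] - \big(\mathrm{symm}(\xi_1 X_1^{-1} \gamma_1), \ \ldots, \ \mathrm{symm}(\xi_K X_K^{-1} \gamma_K)\big)\Big),$$
where $(\gamma_1, \ldots, \gamma_K) = \mathrm{grad}_x f$ denotes the components of the Riemannian gradient. This expression is easy to compute.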
2.4 Computational cost
The expressions shown earlier involve matrix operations that cost $O(Kn^3)$. The solution to the system (7) can be obtained iteratively using standard linear equation solvers. The overall cost for the computations is linear in $K$.
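As one possible instance of such an iterative solver, the sketch below applies a matrix-free conjugate gradient method to a linear system of the form $\sum_i X_i \Lambda X_i = B$, assuming this is the structure of the system (7). The operator $\Lambda \mapsto \sum_i X_i \Lambda X_i$ is self-adjoint and positive definite with respect to the trace inner product when every $X_i$ is symmetric positive definite, so conjugate gradients apply, and each operator evaluation needs only $2K$ matrix products of size $n \times n$. The function name and the stopping rule are illustrative choices, not the solver used in the accompanying Manopt files.

```python
import numpy as np


def solve_lambda_cg(X, B, tol=1e-12, max_iter=200):
    """Solve sum_i X_i Lam X_i = B for symmetric Lam by conjugate gradients.

    X has shape (K, n, n) with symmetric positive definite components and
    B is a symmetric n x n matrix.
    """
    def op(L):
        # One application costs O(K n^3) operations.
        return sum(Xi @ L @ Xi for Xi in X)

    Lam = np.zeros_like(B, dtype=float)
    r = B - op(Lam)          # residual
    p = r.copy()             # search direction
    rs = np.sum(r * r)       # squared Frobenius norm of the residual
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol:
            break
        Ap = op(p)
        alpha = rs / np.sum(p * Ap)
        Lam += alpha * p
        r -= alpha * Ap
        rs_new = np.sum(r * r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    # Remove any asymmetry introduced by rounding.
    return 0.5 * (Lam + Lam.T)
```

Because the system is never formed explicitly, memory use stays at $O(Kn^2)$, in contrast to a dense Kronecker-product formulation.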
2.5 The Hermitian case
3 Conclusion
We discussed the matrix simplex manifold, as a generalization of the standard simplex constraint to symmetric positive definite matrices, from a Riemannian optimization point of view. As a future research direction, it would be interesting to identify machine learning applications where the matrix simplex constraint arises naturally.
- [AMS08] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization algorithms on matrix manifolds, Princeton University Press, 2008.
- [Bha09] R. Bhatia, Positive definite matrices, Princeton University Press, 2009.
- [BMAS14] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre, Manopt, a Matlab toolbox for optimization on manifolds, Journal of Machine Learning Research 15 (2014), no. Apr, 1455–1459.
- [IM07] R. Inokuchi and S. Miyamoto, C-means clustering on the multinomial manifold, International Conference on Modeling Decisions for Artificial Intelligence (MDAI), 2007.
- [LL04] G. Lebanon and J. Lafferty, Hyperplane margin classifiers on the multinomial manifold, International conference on Machine learning (ICML), 2004.
- [LSC08] K. L. Lee, J. Shang, W. K. Chua, S. Y. Looi, and B.-G. Englert, SOMIM: An open-source program code for the numerical search for optimal measurements by an iterative method, Tech. report, arXiv preprint arXiv:0805.2847, 2008.
- [MS16] B. Mishra and R. Sepulchre, Riemannian preconditioning, SIAM Journal on Optimization 26 (2016), no. 1, 635–660.
- [RBCG08] A. Rakotomamonjy, F. R. Bach, S. Canu, and Y. Grandvalet, SimpleMKL, Journal of Machine Learning Research 9 (2008), no. Nov, 2491–2521.
- [ŘEK05] J. Řeháček, B.-G. Englert, and D. Kaszlikowski, Iterative procedure for computing accessible information in quantum communication, Physical Review A 71 (2005), no. 5, 054303.
- [SGH15] Y. Sun, J. Gao, X. Hong, B. Mishra, and B. Yin, Heterogeneous tensor decomposition for clustering via manifold optimization, IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (2015), no. 3, 476–489.
- [SH15] S. Sra and R. Hosseini, Conic geometric optimization on the manifold of positive definite matrices, SIAM Journal on Optimization 25 (2015), no. 1, 713–739.
- [ZCL05] D. Zhang, X. Chen, and W. S. Lee, Text classification with kernels on the multinomial manifold, International ACM SIGIR conference on Research and development in information retrieval (SIGIR), 2005.