 # Riemannian optimization on the simplex of positive definite matrices

We discuss optimization-related ingredients for the Riemannian manifold defined by the constraint X_1 + X_2 + ... + X_K = I, where each matrix X_i ≻ 0 is symmetric positive definite of size n × n for all i ∈ {1, ..., K}. For the case n = 1, the constraint reduces to the popular standard simplex constraint.


## 1 Introduction

Column (or row) stochastic matrices are those where each column (or row) has non-negative entries that sum to $1$. Such matrices have been shown to be useful in many machine learning applications [LL04, ZCL05, IM07, RBCG08, SGH15]. The constraint of interest for those matrices is

$$x_1 + x_2 + \ldots + x_K = 1, \quad \text{where } x_i \geq 0 \text{ for all } i \in \{1, \ldots, K\}, \tag{1}$$

which is also called the standard simplex constraint. Sun et al. [SGH15] proposed a Riemannian geometry for the set obtained from the constraint (1) with strictly positive entries. The strict positivity ensures that this set has a differentiable manifold structure. They develop optimization-related ingredients that enable first- and second-order optimization.

In this work, we propose to generalize the constraint (1) to constraints with matrices, i.e., the matrix simplex constraint

$$X_1 + X_2 + \ldots + X_K = I, \tag{2}$$

where $X_i$ is a symmetric positive semidefinite matrix of size $n \times n$ for all $i \in \{1, \ldots, K\}$. Although the constraint (2) is a natural generalization of (1), its study is rather limited [ŘEK05, LSC08]. To that end, we discuss a novel Riemannian geometry for the set obtained from the constraint (2) with strict positive definiteness of the matrices. (Strict positive definiteness of the matrices is needed to obtain a differentiable manifold structure. The proposed Riemannian structure handles potentially semidefinite, i.e., rank-deficient, matrices gracefully by scaling those elements to the boundary of the manifold.) The main aim of the work is to develop the optimization-related ingredients needed to build optimization algorithms on this constraint set. The expressions of the ingredients extend to the case of Hermitian positive definite matrices. We provide manifold description files for easy integration with the manifold optimization toolbox Manopt [BMAS14]. The files are available at https://bamdevmishra.in.

## 2 The matrix simplex manifold

We define the matrix simplex manifold of interest as

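$$\mathcal{M}_n^K \coloneqq \{(X_1, X_2, \ldots, X_K) : X_1 + X_2 + \ldots + X_K = I,\ X_i \in \mathbb{R}^{n \times n}, \text{ and } X_i \succ 0 \text{ for all } i \in \{1, 2, \ldots, K\}\}. \tag{3}$$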

It should be noted that the positive semidefiniteness constraint is replaced with the positive definiteness constraint to ensure that the set has a differentiable manifold structure. Below, we impose a Riemannian structure on the matrix simplex manifold (3) and discuss the ingredients needed to develop optimization algorithms systematically [AMS08].

### 2.1 Riemannian metric and tangent space projector

An element $x$ of $\mathcal{M}_n^K$ is numerically represented as the structure $x = (X_1, X_2, \ldots, X_K)$, which is a collection of $K$ symmetric positive definite matrices of size $n \times n$.
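One way to produce such a collection numerically is to draw arbitrary symmetric positive definite matrices and normalize them so that they sum to the identity. The sketch below is an illustration only: the function names `inv_sqrtm_spd` and `random_point`, and the choice to store the element as an array of shape `(K, n, n)`, are assumptions made here and are not taken from the Manopt files.

```python
import numpy as np

def inv_sqrtm_spd(S):
    """Inverse principal square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T

def random_point(n, K, seed=0):
    """Return an array of shape (K, n, n) whose blocks are SPD and sum to I."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((K, n, n))
    # A_i A_i^T + I is symmetric positive definite for any A_i.
    P = A @ A.transpose(0, 2, 1) + np.eye(n)
    W = inv_sqrtm_spd(P.sum(axis=0))
    # Congruence by W = (sum_i P_i)^{-1/2} keeps every block SPD
    # and turns the sum into the identity.
    return W @ P @ W

x = random_point(n=3, K=4)
```

The congruence $P_i \mapsto W P_i W$ preserves positive definiteness, so the blocks satisfy both requirements in (3) up to rounding error.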

The tangent space of $\mathcal{M}_n^K$ at an element $x$ is the linearization of the manifold, i.e., of the constraint (2). Accordingly, the tangent space characterization of $\mathcal{M}_n^K$ at $x$ is

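$$T_x\mathcal{M}_n^K = \{(\xi_{X_1}, \xi_{X_2}, \ldots, \xi_{X_K}) : \xi_{X_1} + \xi_{X_2} + \ldots + \xi_{X_K} = 0,\ \xi_{X_i} \in \mathbb{R}^{n \times n}, \text{ and } \xi_{X_i}^\top = \xi_{X_i} \text{ for all } i \in \{1, 2, \ldots, K\}\}. \tag{4}$$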

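It can be shown that $\mathcal{M}_n^K$ is an embedded submanifold of $\mathcal{S}_n^K \coloneqq \mathcal{S}_n \times \mathcal{S}_n \times \ldots \times \mathcal{S}_n$, which is the Cartesian product of $K$ manifolds of symmetric positive definite matrices of size $n \times n$ [AMS08, Chapter 3.3]. Here $\mathcal{S}_n$ denotes the manifold of $n \times n$ symmetric positive definite matrices, which has a well-known Riemannian geometry [Bha09]. The dimension of the manifold $\mathcal{M}_n^K$ is $(K-1)n(n+1)/2$, since (2) removes one symmetric matrix worth of degrees of freedom from $K$ symmetric matrices. We endow the manifold $\mathcal{M}_n^K$ with a smooth metric (inner product) $g_x : T_x\mathcal{M}_n^K \times T_x\mathcal{M}_n^K \to \mathbb{R}$ at every $x \in \mathcal{M}_n^K$ [AMS08]. A natural choice of the metric is based on the well-known bi-invariant metric of $\mathcal{S}_n$ [Bha09], i.e.,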

$$g_x(\xi_x, \eta_x) \coloneqq \sum_i \mathrm{trace}(X_i^{-1} \xi_{X_i} X_i^{-1} \eta_{X_i}). \tag{5}$$

Once the manifold $\mathcal{M}_n^K$ is endowed with the metric (5), it turns into a Riemannian submanifold of $\mathcal{S}_n^K$. Following [AMS08, Chapters 3 and 4], the Riemannian submanifold structure allows the computation of the Riemannian gradient and Hessian of a function (on the manifold) in a straightforward manner from the partial derivatives of the function.

A critical ingredient in those computations is the linear projection operator of a vector in the ambient space onto the tangent space (4) at an element $x$ of $\mathcal{M}_n^K$. In particular, given $z = (Z_1, Z_2, \ldots, Z_K)$ in the ambient space, we compute the projection operator $\Pi_x(z)$, orthogonal with respect to the metric (5), as [MS16]

$$\Pi_x(z) = \operatorname*{arg\,min}_{\xi_x \in T_x\mathcal{M}_n^K} \; -g_x(z, \xi_x) + \frac{1}{2} g_x(\xi_x, \xi_x),$$

which has the expression

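$$\Pi_x(z) = (Z_1 + X_1 \Lambda X_1,\ Z_2 + X_2 \Lambda X_2,\ \ldots,\ Z_K + X_K \Lambda X_K), \tag{6}$$

where $\Lambda$ is the symmetric matrix that is the solution to the linear system

$$\sum_i X_i \Lambda X_i = -\sum_i Z_i. \tag{7}$$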

It is easy to verify that

• $\Pi_x(z)$ belongs to the tangent space $T_x\mathcal{M}_n^K$ and

• $\Pi_x(z)$ and $z - \Pi_x(z)$ are complementary (orthogonal) to each other with respect to the chosen metric (5)

for all choices of $z$.
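The two properties above can be checked numerically. The following sketch solves (7) with a dense Kronecker-product formulation, which is a simplification chosen here for readability and is practical only for small $n$. The function names `project` and `metric` and the `(K, n, n)` array layout are assumptions of this example, and the blocks of `z` are taken to be symmetric.

```python
import numpy as np

def project(x, z):
    """Projection (6); x and z have shape (K, n, n), blocks of z symmetric."""
    K, n, _ = x.shape
    # For symmetric X, vec(X L X) = (X kron X) vec(L), so (7) becomes
    # one dense linear system in the n*n entries of Lambda.
    A = sum(np.kron(Xi, Xi) for Xi in x)
    rhs = -z.sum(axis=0).reshape(-1)
    Lam = np.linalg.solve(A, rhs).reshape(n, n)
    Lam = 0.5 * (Lam + Lam.T)  # remove rounding asymmetry
    return z + x @ Lam @ x

def metric(x, xi, eta):
    """Metric (5): sum_i trace(X_i^{-1} xi_i X_i^{-1} eta_i)."""
    a = np.linalg.solve(x, xi)
    b = np.linalg.solve(x, eta)
    return np.trace(a @ b, axis1=1, axis2=2).sum()

# Build a test point: SPD blocks normalized so that they sum to I.
rng = np.random.default_rng(0)
K, n = 3, 4
A = rng.standard_normal((K, n, n))
P = A @ A.transpose(0, 2, 1) + np.eye(n)
w, V = np.linalg.eigh(P.sum(axis=0))
W = (V / np.sqrt(w)) @ V.T
x = W @ P @ W

B = rng.standard_normal((K, n, n))
z = B + B.transpose(0, 2, 1)  # symmetric ambient vector
p = project(x, z)

print(np.abs(p.sum(axis=0)).max())  # blocks of p sum to (numerically) zero
print(abs(metric(x, p, z - p)))     # p and z - p are orthogonal under (5)
```

The orthogonality follows because $z - \Pi_x(z)$ has blocks $-X_i \Lambda X_i$, so its inner product (5) with any tangent vector $\xi_x$ reduces to $-\mathrm{trace}(\Lambda \sum_i \xi_{X_i}) = 0$.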

### 2.2 Retraction operator

Given a vector in the tangent space, the retraction operator maps it to an element of the manifold [AMS08, Chapter 4]. The retraction operation makes it possible to move on the manifold, which is required by any optimization algorithm.

A natural choice of the retraction operator on the manifold $\mathcal{M}_n^K$ is inspired by the well-known exponential mapping operation on $\mathcal{S}_n$, the manifold of positive definite matrices [Bha09]. However, this only ensures positive definiteness of the output matrices. To maintain the constraint that the matrices sum to $I$, we additionally normalize in a particular fashion. Overall, given a tangent vector $\xi_x \in T_x\mathcal{M}_n^K$, the expression for the retraction operator is

$$R_x(\xi_x) \coloneqq (Y_{\rm sum}^{-1/2} Y_1 Y_{\rm sum}^{-1/2},\ Y_{\rm sum}^{-1/2} Y_2 Y_{\rm sum}^{-1/2},\ \ldots,\ Y_{\rm sum}^{-1/2} Y_K Y_{\rm sum}^{-1/2}), \tag{8}$$

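where $Y_{\rm sum} = \sum_i Y_i$, $Y_i = X_i\,\mathrm{expm}(X_i^{-1} \xi_{X_i})$ is the exponential mapping on $\mathcal{S}_n$ applied to the $i$-th block, $Y_{\rm sum}^{-1/2}$ is the inverse of the symmetric square root of $Y_{\rm sum}$, and $\mathrm{expm}(\cdot)$ is the matrix exponential operator.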
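A minimal sketch of the retraction (8) follows. It assumes that each $Y_i$ is the exponential mapping of the positive definite manifold, $Y_i = X_i\,\mathrm{expm}(X_i^{-1}\xi_{X_i})$, evaluated in the equivalent symmetric form $X_i^{1/2}\,\mathrm{expm}(X_i^{-1/2}\xi_{X_i}X_i^{-1/2})\,X_i^{1/2}$ so that only symmetric eigendecompositions are needed. The helper `sym_fun`, the name `retract`, and the `(K, n, n)` array layout are illustrative choices, not the interface of the Manopt files.

```python
import numpy as np

def sym_fun(S, fun):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * fun(w)) @ V.T

def retract(x, xi):
    """Retraction (8) for x and xi of shape (K, n, n)."""
    Y = []
    for Xi, Ei in zip(x, xi):
        Xh = sym_fun(Xi, np.sqrt)                      # X_i^{1/2}
        Xmh = sym_fun(Xi, lambda w: 1.0 / np.sqrt(w))  # X_i^{-1/2}
        # X^{1/2} expm(X^{-1/2} xi X^{-1/2}) X^{1/2} equals X expm(X^{-1} xi).
        Y.append(Xh @ sym_fun(Xmh @ Ei @ Xmh, np.exp) @ Xh)
    Y = np.stack(Y)
    W = sym_fun(Y.sum(axis=0), lambda w: 1.0 / np.sqrt(w))  # Y_sum^{-1/2}
    R = W @ Y @ W
    return 0.5 * (R + R.transpose(0, 2, 1))  # remove rounding asymmetry

# Build a test point: SPD blocks normalized so that they sum to I.
rng = np.random.default_rng(1)
K, n = 3, 4
A = rng.standard_normal((K, n, n))
P = A @ A.transpose(0, 2, 1) + np.eye(n)
x = sym_fun(P.sum(axis=0), lambda w: 1.0 / np.sqrt(w)) @ P
x = x @ sym_fun(P.sum(axis=0), lambda w: 1.0 / np.sqrt(w))

# A tangent vector: symmetric blocks with the block average removed.
B = rng.standard_normal((K, n, n))
xi = B + B.transpose(0, 2, 1)
xi = xi - xi.mean(axis=0)

y = retract(x, 0.01 * xi)
print(np.abs(y.sum(axis=0) - np.eye(n)).max())  # sum constraint is preserved
print(np.linalg.eigvalsh(y).min())              # smallest eigenvalue over all blocks
```

The normalization is a congruence with a positive definite matrix, so it keeps every block positive definite while restoring the sum to the identity.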

To show that the operator (8) is a retraction operator, we need to verify certain conditions [AMS08, Chapter 4], which are

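1. the centering condition: $R_x(0) = x$ and

2. the local rigidity condition: $\mathrm{D}R_x(0) = \mathrm{id}_{T_x\mathcal{M}_n^K}$, where $\mathrm{id}_{T_x\mathcal{M}_n^K}$ denotes the identity mapping on $T_x\mathcal{M}_n^K$.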

The centering condition for (8) is straightforward to verify by setting $\xi_x = 0$, in which case $Y_i = X_i$ and $Y_{\rm sum} = I$. To verify the local rigidity condition, we analyze the differential of the retraction operator at $\xi_x = 0$, which is the composition of two steps: the first is through the matrix exponential and the second is through the normalization by pre- and post-multiplying with $Y_{\rm sum}^{-1/2}$. The matrix exponential step is locally rigid because it defines the well-known exponential mapping on the manifold $\mathcal{S}_n$ [AMS08, Bha09]. The normalization step does not change local rigidity: to first order, $Y_{\rm sum}$ changes by $\sum_i \xi_{X_i}$, which is zero for a tangent vector, so $Y_{\rm sum}^{-1/2}$ remains the identity to first order. Hence, the overall composition (8) satisfies both the centering and local rigidity conditions needed to be a retraction operation.

### 2.3 Riemannian gradient and Hessian computations

As mentioned earlier, a benefit of the Riemannian submanifold structure is that the Riemannian gradient and Hessian of a function can be computed in a systematic manner. To that end, we consider a smooth function $f : \mathcal{M}_n^K \to \mathbb{R}$ on the manifold. We also assume that it is well-defined on $\mathcal{S}_n^K$.

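If $\nabla_x f = (\partial f/\partial X_1, \ldots, \partial f/\partial X_K)$ is the Euclidean gradient of $f$ at $x$, then the Riemannian gradient $\mathrm{grad}_x f$ has the expression

$$\mathrm{grad}_x f = \Pi_x\big(X_1\,\mathrm{symm}(\partial f/\partial X_1)\,X_1,\ X_2\,\mathrm{symm}(\partial f/\partial X_2)\,X_2,\ \ldots,\ X_K\,\mathrm{symm}(\partial f/\partial X_K)\,X_K\big),$$

where $\partial f/\partial X_i$ is the partial derivative of $f$ at $x$ with respect to $X_i$ and $\Pi_x$ is the tangent space projection operator defined in (6). Here, $\mathrm{symm}(\cdot)$ extracts the symmetric part of a matrix, i.e., $\mathrm{symm}(\Delta) = (\Delta + \Delta^\top)/2$.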
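As an illustration, the sketch below converts a Euclidean gradient into a Riemannian gradient by applying the projection (6) to the blocks $X_i\,\mathrm{symm}(\partial f/\partial X_i)\,X_i$, which is the form implied by the metric (5). It then checks the defining property $g_x(\mathrm{grad}_x f, \xi_x) = \sum_i \mathrm{trace}((\partial f/\partial X_i)\,\xi_{X_i})$ for a tangent vector $\xi_x$. The test function $f(x) = \sum_i \mathrm{trace}(C_i X_i)$, the function names, and the dense solve of (7) are assumptions made for this example.

```python
import numpy as np

def symm(D):
    """Symmetric part of each n x n block."""
    return 0.5 * (D + np.swapaxes(D, -1, -2))

def project(x, z):
    """Projection (6) with a dense Kronecker solve of (7); small n only."""
    K, n, _ = x.shape
    A = sum(np.kron(Xi, Xi) for Xi in x)
    Lam = np.linalg.solve(A, -z.sum(axis=0).reshape(-1)).reshape(n, n)
    return z + x @ symm(Lam) @ x

def riemannian_gradient(x, egrad):
    """Pi_x applied to the blocks X_i symm(df/dX_i) X_i."""
    return project(x, x @ symm(egrad) @ x)

def metric(x, xi, eta):
    """Metric (5)."""
    a = np.linalg.solve(x, xi)
    b = np.linalg.solve(x, eta)
    return np.trace(a @ b, axis1=1, axis2=2).sum()

# Build a test point: SPD blocks normalized so that they sum to I.
rng = np.random.default_rng(2)
K, n = 3, 4
A = rng.standard_normal((K, n, n))
P = A @ A.transpose(0, 2, 1) + np.eye(n)
w, V = np.linalg.eigh(P.sum(axis=0))
W = (V / np.sqrt(w)) @ V.T
x = W @ P @ W

# Hypothetical cost f(x) = sum_i trace(C_i X_i); its Euclidean gradient is C.
C = symm(rng.standard_normal((K, n, n)))
rgrad = riemannian_gradient(x, C)

# A tangent vector: symmetric blocks with the block average removed.
xi = symm(rng.standard_normal((K, n, n)))
xi = xi - xi.mean(axis=0)

lhs = metric(x, rgrad, xi)                       # g_x(grad f, xi)
rhs = np.trace(C @ xi, axis1=1, axis2=2).sum()   # directional derivative of f
print(lhs, rhs)
```

The two printed numbers agree up to rounding, which is the relation that characterizes the Riemannian gradient on the tangent space.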

The computation of the Riemannian Hessian on the manifold $\mathcal{M}_n^K$ involves the notion of Riemannian connection [AMS08, Section 5.5]. The Riemannian connection, denoted as $\nabla_{\xi_x}\eta_x$, at $x$ generalizes the covariant derivative of the tangent vector $\eta_x$ along the direction of the tangent vector $\xi_x$ on the manifold $\mathcal{M}_n^K$. Since $\mathcal{M}_n^K$ is a Riemannian submanifold of the manifold $\mathcal{S}_n^K$, the Riemannian connection has a simple expression in terms of computations on the symmetric positive definite manifold $\mathcal{S}_n$ [Bha09]. In particular, the Riemannian connection on $\mathcal{M}_n^K$ is obtained by projecting the connection on $\mathcal{S}_n^K$ onto the tangent space $T_x\mathcal{M}_n^K$. The connection on $\mathcal{S}_n^K$ is easy to derive thanks to the well-known Riemannian geometry of $\mathcal{S}_n$. Overall, the Riemannian connection expression for $\mathcal{M}_n^K$ is

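$$\nabla_{\xi_x}\eta_x = \Pi_x(\text{connection on } \mathcal{S}_n^K) = \Pi_x\big(\mathrm{D}\eta_x[\xi_x] - (\mathrm{symm}(\xi_{X_1} X_1^{-1} \eta_{X_1}),\ \ldots,\ \mathrm{symm}(\xi_{X_K} X_K^{-1} \eta_{X_K}))\big), \tag{9}$$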

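where $\mathrm{D}\eta_x[\xi_x]$ denotes the directional derivative of $\eta_x$ along $\xi_x$. Based on the expression (9), the Riemannian Hessian operation along a tangent vector $\xi_x \in T_x\mathcal{M}_n^K$ has the expression

$$\mathrm{Hess}_x f[\xi_x] = \nabla_{\xi_x}\,\mathrm{grad}_x f,$$

i.e., (9) evaluated with $\eta_x$ equal to the Riemannian gradient $\mathrm{grad}_x f$ of $f$ at $x$, which is easy to compute.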

### 2.4 Computational cost

The expressions shown earlier involve matrix operations (products, inverses, and exponentials of $n \times n$ matrices) that cost $O(n^3)$ each. The solution to the system (7) can be obtained iteratively using standard linear equation solvers. The overall cost of the computations is linear in $K$.
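To make the iterative solution of (7) concrete, the sketch below uses a plain conjugate gradient iteration. This is one possible choice of "standard linear equation solver", picked here because the map $\Lambda \mapsto \sum_i X_i \Lambda X_i$ is self-adjoint and positive definite with respect to the trace inner product. The function name `solve_lambda_cg`, the tolerance, and the iteration cap are assumptions of this example. Each iteration applies the map once, which takes $2K$ products of $n \times n$ matrices.

```python
import numpy as np

def solve_lambda_cg(x, rhs, tol=1e-12, max_iter=500):
    """Solve sum_i X_i L X_i = rhs for L by matrix-free conjugate gradients."""
    op = lambda L: (x @ L @ x).sum(axis=0)  # applies the left-hand side of (7)
    L = np.zeros_like(rhs)
    r = rhs - op(L)
    p = r.copy()
    rs = np.sum(r * r)
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol * max(1.0, np.linalg.norm(rhs)):
            break
        Ap = op(p)
        alpha = rs / np.sum(p * Ap)
        L = L + alpha * p
        r = r - alpha * Ap
        rs_new = np.sum(r * r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    return L

# Build a test point: SPD blocks normalized so that they sum to I.
rng = np.random.default_rng(3)
K, n = 5, 4
A = rng.standard_normal((K, n, n))
P = A @ A.transpose(0, 2, 1) + np.eye(n)
w, V = np.linalg.eigh(P.sum(axis=0))
W = (V / np.sqrt(w)) @ V.T
x = W @ P @ W

B = rng.standard_normal((K, n, n))
z = B + B.transpose(0, 2, 1)  # symmetric ambient vector
rhs = -z.sum(axis=0)          # right-hand side of (7)
Lam = solve_lambda_cg(x, rhs)

residual = (x @ Lam @ x).sum(axis=0) - rhs
print(np.abs(residual).max())
```

Because the right-hand side is symmetric and the iteration starts from zero, every iterate stays symmetric up to rounding, so no explicit symmetrization is needed in this sketch.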

### 2.5 The Hermitian case

The developments in Section 2 easily extend to Hermitian positive definite matrices satisfying the constraint (2). The matrix transpose operation is replaced with the conjugate transpose operation [SH15]. All other expressions are similarly developed.

## 3 Conclusion

We discussed the matrix simplex manifold, as a generalization of the standard simplex constraint to symmetric positive definite matrices, from a Riemannian optimization point of view. As a future research direction, it would be interesting to identify machine learning applications where the matrix simplex constraint arises naturally.

## References

• [AMS08] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization algorithms on matrix manifolds, Princeton University Press, 2008.
• [Bha09] R. Bhatia, Positive definite matrices, Princeton University Press, 2009.
• [BMAS14] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre, Manopt, a Matlab toolbox for optimization on manifolds, Journal of Machine Learning Research 15 (2014), no. Apr, 1455–1459.
• [IM07] R. Inokuchi and S. Miyamoto, C-means clustering on the multinomial manifold, International Conference on Modeling Decisions for Artificial Intelligence (MDAI), 2007.

• [LL04] G. Lebanon and J. Lafferty, Hyperplane margin classifiers on the multinomial manifold, International conference on Machine learning (ICML), 2004.
• [LSC08] K. L. Lee, J. Shang, W. K. Chua, S. Y. Looi, and B.-G. Englert, Somim: An open-source program code for the numerical search for optimal measurements by an iterative method, Tech. report, arXiv preprint arXiv:0805.2847, 2008.
• [MS16] B. Mishra and R. Sepulchre, Riemannian preconditioning, SIAM Journal on Optimization 26 (2016), no. 1, 635–660.
• [RBCG08] A. Rakotomamonjy, F. R. Bach, S. Canu, and Y. Grandvalet, Simplemkl, Journal of Machine Learning Research 9 (2008), no. Nov, 2491–2521.
• [ŘEK05] J. Řeháček, B.-G. Englert, and D. Kaszlikowski, Iterative procedure for computing accessible information in quantum communication, Physical Review A 71 (2005), no. 5, 054303.
• [SGH15] Y. Sun, J. Gao, X. Hong, B. Mishra, and B. Yin, Heterogeneous tensor decomposition for clustering via manifold optimization, IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (2015), no. 3, 476–489.
• [SH15] S. Sra and R. Hosseini, Conic geometric optimization on the manifold of positive definite matrices, SIAM Journal on Optimization 25 (2015), no. 1, 713–739.
• [ZCL05] D. Zhang, X. Chen, and W. S. Lee, Text classification with kernels on the multinomial manifold, International ACM SIGIR conference on Research and development in information retrieval (SIGIR), 2005.