Real-world data often contain degrees of freedom that are redundant. Matrix decomposition [6, 3, 23] is an important tool in machine learning and data mining for normalizing data. A prominent example of data normalization by matrix decomposition is principal component analysis (PCA). When a point cloud is represented as a matrix whose rows are the coordinates of the points, PCA removes the translational and rotational degrees of freedom of the point cloud with the help of the singular value decomposition (SVD) of the matrix. The choice of a particular matrix decomposition determines which degrees of freedom are removed. In the PCA example, SVD extracts an orthonormal basis that makes the normalized data invariant to rotation.
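The PCA normalization described above can be sketched numerically. The following is a minimal illustration, not the authors' code: the function name `pca_normalize` and the test cloud are assumptions made for this example. It centers a point cloud, rotates it onto its principal axes, and checks that a rotated and translated copy yields the same canonical form up to the sign of each axis.

```python
import numpy as np

def pca_normalize(points):
    """Center the cloud, then express it in its principal axes."""
    centered = points - points.mean(axis=0)            # removes translation
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T                              # removes rotation

rng = np.random.default_rng(0)
# Anisotropic cloud so that the two principal axes are well separated.
cloud = rng.normal(size=(200, 2)) * np.array([3.0, 1.0])

theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
moved = cloud @ rotation.T + np.array([5.0, -2.0])      # rotate, then translate

a = pca_normalize(cloud)
b = pca_normalize(moved)
# Principal axes are only defined up to sign, so compare absolute values.
same_up_to_sign = np.allclose(np.abs(a), np.abs(b))
```

The comparison uses absolute values because SVD fixes each principal axis only up to a sign flip.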
There are cases when other degrees of freedom exist in data. For example, planar objects like digits, characters or iconic symbols often look distorted in photos because the camera sensor plane may not be parallel to the plane carrying the objects. In this case, the degrees of freedom we would like to eliminate from the data are homography transforms, which can be approximated as a combination of translation, rotation, shearing and squeezing when the planar objects are sufficiently far away relative to their size. However, PCA cannot eliminate these degrees of freedom, because the normalized form found with PCA is not invariant under shearing and squeezing. In general, depending on the properties of the data, we need new data normalization methods that can uncover invariant structures matching the degrees of freedom we would like to remove.
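To see why PCA does not remove shearing, one can compare the singular values of a centered cloud before and after a shear. This is a small sketch with an arbitrarily chosen shear matrix (an assumption of this example): the singular values change, so the PCA canonical form changes, while their product, which is tied to the determinant, is preserved.

```python
import numpy as np

rng = np.random.default_rng(1)
cloud = rng.normal(size=(300, 2))
cloud -= cloud.mean(axis=0)

# A shear has determinant 1 but is not orthogonal.
shear = np.array([[1.0, 0.8],
                  [0.0, 1.0]])
sheared = cloud @ shear.T

sv_original = np.linalg.svd(cloud, compute_uv=False)
sv_sheared = np.linalg.svd(sheared, compute_uv=False)

# PCA's canonical form is built from these values, so it is not shear invariant.
changed = not np.allclose(sv_original, sv_sheared)
# The product of the singular values is unchanged because det(shear) = 1.
volume_kept = np.isclose(np.prod(sv_original), np.prod(sv_sheared))
```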
In this paper we study the cases when degrees of freedom to be removed have a group structure when combined. Under such a condition, a data matrix can be mapped to its quotient set by the equivalence relation defined as
We call the elements of the quotient set canonical forms of the data, as they are invariant with respect to (w.r.t.) group actions. An important example of using the quotient set is the shape space method, which works in the quotient space of rotation matrices and is closely related to PCA and SVD.
Here and later, we restrict ourselves to the case when the group is a matrix group that acts by simple matrix product. The quotient mapping can then be represented in the form of a matrix decomposition:
Instead of constructing separate algorithms for different groups, we use an optimization process to induce the corresponding matrix decomposition techniques. In particular, given a data matrix, we consider a group orbit optimization (GOO) problem as follows:
where is a cost function and is some number field.
In Section 3 we present several special classes of cost functions, which are used in Section 4 to construct new formulations of several matrix decompositions, including SVD, Schur, LU, Cholesky and QR. As an application, in Section 6 we illustrate how to use GOO to normalize low-dimensional point cloud data over a special linear group. Experimental results for two-dimensional and three-dimensional point clouds are given in Figure 1 and Figure 2. It can be observed that the effects of rotation, shearing and squeezing have been mostly eliminated in the normalized point clouds. The details of this normalization are explained in Section 6.
The GOO formulation also allows us to generalize some matrix decompositions to tensors. Real-world data have tensor structure when some value depends on multiple factors. For example, on an electronic-commerce site, user preferences for different brands form a matrix. As such preferences change over time, the time-dependent preferences form a third-order tensor. As in the matrix case, tensor decomposition techniques [13, 14] aim to eliminate degrees of freedom in data while respecting the tensor structure of the data. In Section 5, we use GOO to induce tensor decompositions that can be used for normalizing tensors. In the unified framework of GOO, the GOO inducing a tensor decomposition, when applied to a second-order tensor, is exactly the same as the GOO inducing the matrix decomposition, provided the same group and cost function are used for both GOO problems.
The remainder of the paper is organized as follows. Section 2 gives the notation used in this paper. Section 3 defines several properties of the cost functions used in defining GOO to induce matrix and tensor decompositions. Section 4 studies GOO formulations that can induce SVD, Schur, LU, Cholesky, QR, etc. Section 5 demonstrates how to use GOO to induce tensor decompositions and proves a few inequalities relating several forms of GOO. Section 6 demonstrates how to normalize point cloud data distorted by rotation, shearing and squeezing with GOO over the special linear group. Section 7 presents numerical algorithms and examples of matrix decomposition, point cloud normalization and tensor decomposition. Finally, we conclude the work in Section 9.
2.1 Matrix operation notation
In this paper, we let denote the identity matrix. Given an matrix , we denote and . The -norm of is defined by
for . Note that we abuse the notation a little bit as is not a norm when . When , it is also called the Frobenius norm and usually denoted by
. When applied to vector, is the -norm and it is shortened as . The dual norm of the -norm where is equivalent to the -norm, where . We let denote the Schatten -norm; that is, it is the norm of the vector of the singular values of .
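The two families of norms introduced above can be sketched as follows. The displayed definition did not survive in this text, so the code assumes the entrywise form (sum of the p-th powers of absolute entries, raised to 1/p), which agrees with the stated fact that p = 2 gives the Frobenius norm. The function names are hypothetical.

```python
import numpy as np

def entrywise_p_norm(a, p):
    """Assumed entrywise p-norm: (sum_ij |a_ij|^p)^(1/p)."""
    return float((np.abs(a) ** p).sum() ** (1.0 / p))

def schatten_p_norm(a, p):
    """Schatten p-norm: the vector p-norm of the singular values."""
    s = np.linalg.svd(a, compute_uv=False)
    return float((s ** p).sum() ** (1.0 / p))

a = np.array([[3.0, 0.0],
              [4.0, 5.0]])
fro = np.linalg.norm(a, "fro")
nuclear = np.linalg.norm(a, "nuc")
```

For p = 2 both functions return the Frobenius norm, and for p = 1 the Schatten norm is the nuclear norm.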
Assume that is some number field. Let be the complex conjugate of , and be the complex conjugate transpose of . Let be a vector consisting of the diagonal entries of , and be a matrix with as its diagonals.
Given two matrices and , is their Hadamard product and is the Kronecker product. Similarly, is the Kronecker product of vectors and . For groups and , we denote group as . The Kronecker sum for two square matrices is defined as
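A short numerical illustration of these products may help. The formula for the Kronecker sum is missing from this text, so the sketch assumes the standard convention, A ⊗ I + I ⊗ B; under that convention the eigenvalues of the Kronecker sum are all pairwise sums of the eigenvalues of the two factors.

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [0.0, 3.0]])      # eigenvalues 1 and 3
b = np.array([[4.0, 0.0],
              [1.0, 5.0]])      # eigenvalues 4 and 5

hadamard = a * b                # entrywise (Hadamard) product
kron = np.kron(a, b)            # Kronecker product, shape (4, 4)

def kronecker_sum(x, y):
    """Assumed standard convention: kron(x, I) + kron(I, y)."""
    return np.kron(x, np.eye(y.shape[0])) + np.kron(np.eye(x.shape[0]), y)

ksum = kronecker_sum(a, b)
ksum_eigenvalues = np.sort(np.linalg.eigvals(ksum).real)
```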
A matrix is said to be pseudo-diagonal if there exist permutation matrices and such that is diagonal.
Note that a diagonal matrix is also pseudo-diagonal.
Given a pseudo-diagonal matrix , we have that
, , and are diagonal.
There exists a row permutation matrix such that is diagonal.
There exists a row permutation matrix such that is diagonal.
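A matrix can be permuted into diagonal form exactly when every row and every column holds at most one nonzero entry, which gives a simple test. The sketch below uses a hypothetical helper `is_pseudo_diagonal` and also checks, on one example, that both Gram matrices of a pseudo-diagonal matrix are diagonal.

```python
import numpy as np

def is_pseudo_diagonal(d, tol=0.0):
    """True if each row and each column has at most one nonzero entry."""
    nonzero = np.abs(d) > tol
    rows_ok = (nonzero.sum(axis=1) <= 1).all()
    cols_ok = (nonzero.sum(axis=0) <= 1).all()
    return bool(rows_ok and cols_ok)

def is_diagonal(a):
    return bool(np.allclose(a, np.diag(np.diag(a))))

d = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, 0.0],
              [5.0, 0.0, 0.0]])

# Moving row 2 to the top and row 0 to the middle makes d diagonal.
row_permutation = np.eye(3)[[2, 0, 1]]
permuted = row_permutation @ d
```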
We let be the polyhedron formed by the points whose coordinates are the rows of , and be the Lebesgue measure of . We let be a matrix where is the image pixel value at coordinate of the image rasterized from the polyhedron with a unit grid.
2.2 Tensor operation notation
The notation of tensor operations used in this paper mostly follows that of . Given an order- tensor and matrices where , we define to be the inner product over the -th mode. That is, if , then
For shorthand, we denote
Here when is also known as the Tucker decomposition in the literature . With this notation, the SVD of a real matrix can be written as
Using the vectorization operation for tensor, we have
where we denote as shorthand for .
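The mode product and the vectorization identity can be checked numerically. The displayed formulas are missing from this text, so the sketch below assumes one common convention: the mode-k product contracts the second index of the matrix with mode k of the tensor, modes are numbered from 0, and vectorization is column-major. The helper name `mode_product` is hypothetical.

```python
import numpy as np

def mode_product(t, a, k):
    """Contract the second index of `a` with mode `k` of tensor `t`."""
    return np.moveaxis(np.tensordot(a, t, axes=(1, k)), 0, k)

rng = np.random.default_rng(0)

# Matrix case: applying U^T on mode 0 and V^T on mode 1 of M gives diag(s).
m = rng.normal(size=(4, 3))
u, s, vt = np.linalg.svd(m, full_matrices=False)
core = mode_product(mode_product(m, u.T, 0), vt, 1)

# Order-3 case: vectorization turns mode products into one Kronecker product.
t = rng.normal(size=(2, 3, 4))
a0, a1, a2 = (rng.normal(size=(n, n)) for n in t.shape)
y = mode_product(mode_product(mode_product(t, a0, 0), a1, 1), a2, 2)
lhs = y.flatten(order="F")
rhs = np.kron(a2, np.kron(a1, a0)) @ t.flatten(order="F")
```

With a row-major vectorization the order of the Kronecker factors would be reversed, so the identity depends on the chosen convention.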
We let be a map from a sequence of indices to an integer such that
We note that is well-defined.
The unfold operation maps a tensor to a tensor of lower order and is defined by
where is an index set grouping of the indices into sets , , and satisfies:
When unfolding a single index, i.e., , we also denote as .
The -norm of tensor is defined as
for an arbitrary mode . For tensors , is their Frobenius inner product defined as:
Finally, given and , is defined as a tensor-valued function with applied to each entry of . Therefore, . When , we denote as .
2.3 Group notation
is the orthogonal group over the real field . is the special orthogonal group over . is the unitary group over the complex field. We let denote the upper-unit-triangular group and denote the lower-unit-triangular group, both of which have all diagonal entries equal to 1. is the group formed by the (calibrated) homography transforms below:
where is attitude of the camera; is position of the camera, and is equation of the object plane.
In this paper we would like to show that matrix and tensor decomposition techniques can be induced from formulations of the group orbit optimization. As we have seen in formula (1), a GOO problem includes two key ingredients: a cost function and a group structure . Thus, we present preliminaries on sparsifying functions and unit matrix groups. The sparsifying functions will be used to define cost functions for those matrix decompositions in Table 1 whose decomposed formulations contain diagonal matrices.
It should be noted that other classes of functions can be used together with some unit matrix groups to induce interesting matrix and tensor decompositions. See the Schur decomposition in Table 1 for an example.
3.1 Sparsifying functions
For two functions and , we here and later denote their composition as s.t. . We first prove several utility lemmas used for characterizing sparsifying functions.
[Subadditive properties] If is subadditive, then
First we have that . By the subadditivity of we further have , hence .
If is convex for any , then when , we have:
Since is convex, we have
If is strictly concave and , then where , with equality only when or .
We have . Obviously, the first equality holds only when or .
Assume . Then is concave and iff is concave and subadditive.
Because , w.l.o.g. we assume . We first prove “ part”. When and , we trivially have . Otherwise, we have
Thus, when or ,
As for “ part”, we have . Hence .
Now we are ready to define the sparsifying function. [sparsifying function] A function f is sparsifying if
is symmetric about the origin; i.e., ;
there is at most one with .
The following theorem gives a sufficient condition for function to be sparsifying.
[sufficient condition for sparsifying] If and is strictly concave and subadditive, then is sparsifying. Because , w.l.o.g. we assume . By Lemma 3.1, is strictly concave and . When , there is no with . Otherwise, it follows from Lemma 3.1 that
Also by Lemma 3.1, the equality holds iff . Because , there is only one with . In both cases, there is at most one with .
Conical combination of sparsifying functions. In particular, if and are sparsifying, then so is , where and are two nonnegative constants. As strict concavity is preserved by conical combination, we only need to prove that subadditivity is preserved by conical combination, which holds because:
It can be directly checked that the following functions are sparsifying:
Power function: for ;
Capped power function: for ;
Shannon Entropy: when ;
Squared entropy: when ;
for and ;
for and ;
We note that is not subadditive because . Although for is subadditive, is not concave. Thus, these two functions are not sparsifying.
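The role of subadditivity and concavity can be illustrated numerically. The exponent ranges in the list above were lost in this text, so the sketch assumes the power function with an exponent strictly between 0 and 1, and uses the square function as an independently chosen counterexample. It also compares the cost of a one-hot vector with that of an evenly spread vector of the same Euclidean norm.

```python
import numpy as np

def power(x, p):
    return np.abs(x) ** p

rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=1000)
y = rng.uniform(-5, 5, size=1000)

# |x|^p with 0 < p < 1: f(x + y) <= f(x) + f(y) on every sampled pair.
subadditive_half = bool(
    np.all(power(x + y, 0.5) <= power(x, 0.5) + power(y, 0.5) + 1e-12)
)

# x^2 is not subadditive: f(1 + 1) = 4 exceeds f(1) + f(1) = 2.
square_fails = bool(power(2.0, 2) > power(1.0, 2) + power(1.0, 2))

# Two unit-length vectors: the sparse one has the lower cost.
one_hot = np.array([1.0, 0.0, 0.0, 0.0])
spread = np.array([0.5, 0.5, 0.5, 0.5])
cost_one_hot = power(one_hot, 0.5).sum()
cost_spread = power(spread, 0.5).sum()
```

The last comparison is the behavior that motivates the name: at fixed Euclidean norm, concentrating the mass in one entry lowers the cost.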
3.2 Unit Matrix Groups
[unit group] A matrix group is a unit group if .
Clearly, unitary, orthogonal, and unit-triangular matrix groups are unit groups. We now present some properties of the unit groups.
A unit group has the following properties.
A unit group is well-defined, i.e., closed under multiplication and inverse, and has an identity element which happens to be .
The Kronecker product of unit groups is also a unit group. In particular, if and are unit groups, then is also a unit group.
is a unit group.
is a group, and is a unit group iff is a unit group.
is a unit group iff is a unit group. is a unit group iff is a unit group.
Let . Then and
Hence and .
We first check is a group. This can be done by noting that when , ; and
Also . Moreover, since for any and , is a unit group.
Closure under multiplication and inverse can be proved by noting
Also we have
Thus forms a group with as the identity. It is also a unit group as .
Closure under multiplication and inverse can be proved based on
and . Thus forms a group with as the identity. Moreover , i.e., forms a unit group iff is from a unit group.
Note that is a unit group with a single element, so this property follows from property (ii).
It is worth pointing out that does not form a group in general because .
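The closure properties above can be spot-checked numerically. The defining condition of a unit group is missing from this text; the sketch assumes it is that every element has a determinant of absolute value 1, which is consistent with the orthogonal, unitary and unit-triangular examples named above. The matrices below are arbitrary test data.

```python
import numpy as np

rng = np.random.default_rng(0)

def abs_det(g):
    return abs(np.linalg.det(g))

# Two orthogonal matrices obtained from QR factorizations.
q1, _ = np.linalg.qr(rng.normal(size=(3, 3)))
q2, _ = np.linalg.qr(rng.normal(size=(3, 3)))
# A lower unit-triangular matrix: ones on the diagonal.
low = np.tril(rng.normal(size=(2, 2)), k=-1) + np.eye(2)

checks = {
    "orthogonal": abs_det(q1),
    "unit_triangular": abs_det(low),
    "product": abs_det(q1 @ q2),
    "inverse": abs_det(np.linalg.inv(low)),
    "kronecker": abs_det(np.kron(q1, low)),
}
all_unit = all(np.isclose(v, 1.0) for v in checks.values())
```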
Finally, in Table 1 we list matrix decompositions of used in this paper. When referring to the Cholesky decomposition, should be positive definite.
|real SVD||, is diagonal|
|complex SVD||, is diagonal|
|QR||, is diagonal|
|LU||, is diagonal|
|Cholesky||, is diagonal|
|Schur||, is upper triangular|
4 Group Orbit Optimization
4.1 Matrix Decomposition Induced from Group Orbit Optimization
4.1.1 GOO formulation
We now illustrate how matrix decomposition can be induced from GOO. Given two groups and a data matrix , we consider the following optimization problem
Assume that and are minimizers of the above GOO and , then we refer to
as a matrix decomposition of which is induced from Formula (3).
When , an equivalent formulation of Formula (3) is:
where and .
4.1.2 GOO over unit group
For a general matrix group , implies that . However, group structure may not be sufficient to induce non-trivial matrix decomposition, as with some groups and cost functions the infimum will be trivially zero. For example, with general linear group and for any matrix , we have
Nevertheless, if we require to be a unit group, we have . Consequently, we can prevent the infimum from vanishing trivially for any -norm. Thus, we mainly consider the case where is a unit group in this paper.
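The degenerate behavior over the general linear group is easy to reproduce. In this sketch (the matrix and the scales are arbitrary choices), the invertible matrices t·I drive the Frobenius norm of the product toward zero as t shrinks, whereas a rotation, which has determinant 1, leaves the norm unchanged.

```python
import numpy as np

m = np.array([[2.0, 1.0],
              [0.5, 3.0]])

# g_t = t * I is invertible for every t > 0, so it lies in the general linear group.
scales = [1.0, 1e-2, 1e-4, 1e-6]
costs = [float(np.linalg.norm((t * np.eye(2)) @ m, "fro")) for t in scales]

# A rotation is orthogonal, so it preserves the Frobenius norm.
theta = 0.3
g = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rotated_cost = float(np.linalg.norm(g @ m, "fro"))
```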
The following theorem shows that many matrix decompositions can be induced from the group orbit optimization. SVD, LU, QR, Schur and Cholesky decompositions of matrix can be induced from GOO of the form
by using the corresponding unit group and cost function , which are given in Table 2.
The cost function for SVD, QR and Matrix Equivalence can be . And the cost function for LU, Schur and Cholesky can be .
|Decomposition||Unit group||Objective function|
|real SVD:||where is strictly concave,|
|complex SVD:||where is strictly concave,|
|QR:||where is strictly concave and increasing,|
|Matrix Equivalence:||where , is strictly concave and increasing; is convex|
|LU:||where , ,|
|Cholesky:||where , ,|
|Schur:||where , ,|
The formulation of QR decomposition exploits the fact that is equivalent to where , is upper-triangular, , and is diagonal.
However, there are matrix decompositions whose formulation cannot be expressed as GOO in the same way as in Table 2. For example, the polar decomposition where and , though derivable from SVD, cannot be induced from a GOO formulation of diagonalization. This is because does not form a group, as it is not closed under multiplication. As another example, consider a decomposition where and is diagonal. As we stated earlier, is not a group in general, so cannot be induced from a GOO formulation of diagonalization.
Consider a matrix decomposition of the form , where and with . In this case, we can zero-pad to , and extend and to and , which are square matrices. Accordingly, we formulate a decomposition which may be induced from GOO.
We next prove a lemma that characterizes the optimum. [Criteria for infimum] If for any and there exists s.t. , then
We note that . By the group structure, the coset . Hence we have
Using the condition , we have
On the other hand, as we have . Hence
By virtue of Lemma 2, if we want to prove that a matrix decomposition is induced by a GOO w.r.t. and , we only need to prove that there exists a s.t. , and . The equality condition will determine the uniqueness of the optimum of the optimization problem.
4.2 Matrix Diagonalization as GOO
Next we demonstrate how matrix diagonalization can be induced from GOO with proper choice of cost function and unit group.
4.2.1 Singular Value Decomposition
First we discuss SVD of a complex matrix and of a real matrix. [Cost function and group for SVD] Let be pseudo-diagonal, and . Given a function such that and is strictly concave and subadditive, and we have
with equality iff there exists a row permutation matrix such that .
Furthermore, if , we have
with equality iff there exists a row permutation matrix such that .
First we prove the inequality. We write and . We let be a matrix-valued function of . As is concave and subadditive, by Lemma 3.1 for a vector , we have . Applying this to each column of , we have
Alternatively, we can also apply the inequality to each row of and have
As is pseudo-diagonal, is diagonal. Because is concave and , we can apply Jensen’s inequality, obtaining
Hence altogether we have:
Next we check the equality condition. By Theorem 3.1, is sparsifying. For the equality condition in inequality (5) to hold, can have at most one nonzero entry in each column. By the symmetry between (5) and (6), and noting and , can also have at most one nonzero entry in each row for to hold. Hence when the equality holds, is pseudo-diagonal. Then there exists a permutation matrix such that is a diagonal matrix whose diagonal elements are all non-negative and in descending order, where is a diagonal matrix s.t. . By the uniqueness of the singular values of a matrix, we have . Hence equality in inequality (4) holds when .
The proof for is similar.
Note that , modulo sign and permutation, is the global minimizer for a large class of functions .
After applying Lemma 2, we have the following theorem. [SVD induced from optimization] We are given a function such that and is strictly concave and subadditive, and . Let and be an optimal solution of the following optimization:
Then if SVD of is , there exist a permutation matrix and a diagonal matrix such that and .
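The theorem can be illustrated by sampling the orbit. The sketch below uses the entrywise absolute-value sum as the cost, which is an assumed stand-in for the theorem's cost function and not taken from the source. It compares the cost at the SVD factors with the cost at randomly drawn pairs of orthogonal matrices; none of the sampled pairs does better than the SVD, whose cost equals the sum of the singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
m = rng.normal(size=(4, 4))

def cost(a):
    """Entrywise l1 cost, an assumed stand-in for the theorem's cost function."""
    return float(np.abs(a).sum())

def random_orthogonal(n):
    """Orthogonal matrix from the QR factorization of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q

u, s, vt = np.linalg.svd(m)
svd_cost = cost(u.T @ m @ vt.T)      # the product is diag(s) up to rounding

# Evaluate the same cost at 500 random points of the orbit.
orbit_costs = [
    cost(random_orthogonal(4).T @ m @ random_orthogonal(4))
    for _ in range(500)
]
best_sampled = min(orbit_costs)
```

Random sampling only gives evidence, not a proof; the guarantee comes from the inequality established in the lemma above.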
With as in Theorem 4.2.1, the eigendecomposition of a Hermitian matrix can be induced from
Similarly, the eigendecomposition of a real symmetric matrix can be induced from
From the above optimization, we can derive several inequalities.
[The Schatten -norm and -norm inequality] The -norm of matrix is larger (smaller) than the Schatten -norm of when .
In particular, we have