Group Orbit Optimization: A Unified Approach to Data Normalization

In this paper we propose and study an optimization problem over a matrix group orbit that we call Group Orbit Optimization (GOO). We prove that GOO can be used to induce matrix decomposition techniques such as singular value decomposition (SVD), LU decomposition, QR decomposition, Schur decomposition and Cholesky decomposition. This gives rise to a unified framework for matrix decomposition and allows us to bridge these matrix decomposition methods. Moreover, we generalize GOO to tensor decomposition. As a concrete application of GOO, we devise a new data decomposition method over a special linear group to normalize point cloud data. Experimental results show that our normalization method recovers well from distortions such as shearing, rotation and squeezing.


1 Introduction

Real world data often contain some degrees of freedom that might be redundant. Matrix decomposition [6, 3, 23] is an important tool in machine learning and data mining to normalize data. A prominent example of data normalization by matrix decomposition is principal component analysis (PCA). When the given point cloud is represented as a matrix with each row being the coordinates of a point, PCA removes the degrees of freedom in translation and rotation of the point cloud with the help of singular value decomposition (SVD) on the matrix. The selection of a particular matrix decomposition corresponds to which degrees of freedom we would like to remove. In the PCA example, SVD extracts an orthonormal basis that makes the normalized data invariant to rotation.
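The PCA example above can be illustrated with a minimal sketch. It assumes NumPy, a hypothetical helper `pca_normalize`, and synthetic data; it is not the authors' implementation. It centers a point cloud (removing translation) and rotates it into the basis of right singular vectors (removing rotation, up to the sign of each axis).

```python
import numpy as np

def pca_normalize(points):
    """Center the rows of `points`, then express them in the right-singular basis."""
    centered = points - points.mean(axis=0)              # removes translation
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T                               # removes rotation (up to axis signs)

rng = np.random.default_rng(0)
# Synthetic anisotropic 2-D cloud (illustrative data only).
cloud = rng.normal(size=(50, 2)) * np.array([3.0, 0.5])

theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
moved = cloud @ rot.T + np.array([5.0, -2.0])            # rotate, then translate

a = pca_normalize(cloud)
b = pca_normalize(moved)
# Singular vectors are only defined up to sign, so compare absolute values.
same_up_to_sign = np.allclose(np.abs(a), np.abs(b), atol=1e-8)
```

The comparison uses absolute values because the SVD fixes each axis only up to sign; shearing or squeezing the cloud would break this agreement, which is the limitation discussed next.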

There are cases when other degrees of freedom exist in data. For example, planar objects like digits, characters or iconic symbols often look distorted in photos because the camera sensor plane may not be parallel to the plane carrying the objects. In this case, the degrees of freedom we would like to eliminate from data are homography transforms [8], which can be approximated as a combination of translation, rotation, shearing and squeezing when the planar objects are sufficiently far away relative to their size. However, PCA cannot eliminate these degrees of freedom, because the normalized form found with PCA is not invariant under shearing and squeezing. In general, depending on the properties of the data, we need new data normalization methods that can uncover invariant structures matching the degrees of freedom we would like to remove.

In this paper we study the cases when degrees of freedom to be removed have a group structure when combined. Under such a condition, a data matrix can be mapped to its quotient set by the equivalence relation defined as

 $$x_1 \sim x_2 \iff \exists g \in \mathcal{G},\ x_1 = g x_2.$$

We call the elements of the quotient set canonical forms of data, as they are invariant with respect to (w.r.t.) the group actions of $\mathcal{G}$. An important example of using the quotient set is the shape space method [4], which works in the quotient space of rotation matrices and is closely related to PCA and SVD.

Here and later, we restrict ourselves to the case when $\mathcal{G}$ is a matrix group and when the group acts by simple matrix product. The quotient mapping can then be represented in the form of a matrix decomposition:

 $$X = G\hat{X}, \quad G \in \mathcal{G}.$$

Instead of constructing separate algorithms for different $\mathcal{G}$, we use an optimization process to induce the corresponding matrix decomposition techniques. In particular, given a data matrix $M$, we consider a group orbit optimization (GOO) problem as follows:

 $$\inf_{G \in \mathcal{G}} \phi(GM), \tag{1}$$

where $\phi$ is a cost function on matrices over $\mathbb{F}$, and $\mathbb{F}$ is some number field.
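As a rough illustration of problem (1), the sketch below runs a brute-force search over the rotation group in two dimensions. The choice of group (planar rotations), the cost (sum of absolute entries), the test matrix, and the grid search itself are all assumptions made for this example; the paper's numerical algorithms are described in Section 7.

```python
import numpy as np

def rotation(theta):
    """2x2 rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def cost(x):
    """Illustrative cost: sum of absolute values of the entries."""
    return np.abs(x).sum()

# Data matrix built as a rotated diagonal matrix, so the orbit contains d itself.
d = np.diag([3.0, 0.5])
m = rotation(np.deg2rad(40.0)) @ d

# Grid search over the orbit {G m : G a planar rotation}.
angles = np.deg2rad(np.arange(0.0, 360.0, 0.1))
costs = np.array([cost(rotation(t) @ m) for t in angles])
best_cost = costs.min()
best_angle_deg = np.rad2deg(angles[costs.argmin()])
```

The minimizing angle is not unique: rotating the diagonal form by a further quarter turn permutes rows and flips signs without changing the cost, so only `best_cost` is a stable quantity here.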

In Section 3 we present several special classes of cost functions, which are used in Section 4 to construct new formulations for several matrix decompositions including SVD, Schur, LU, Cholesky and QR. As an application, in Section 6 we illustrate how to use GOO to normalize low-dimensional point cloud data over a special linear group. Experimental results for two-dimensional and three-dimensional point clouds are given in Figure 1 and Figure 2. It can be observed that the effect of rotation, shearing and squeezing in data has been mostly eliminated in the normalized point clouds. The details of this normalization are explained in Section 6.

The GOO formulation also allows us to generalize some matrix decompositions to tensors. Real world data have tensor structure when some value depends on multiple factors. For example, on an electronic-commerce site, user preferences for different brands form a matrix. As such preferences change over time, the time-dependent preferences form a third-order tensor. As in the matrix case, tensor decomposition techniques [13, 14] aim to eliminate degrees of freedom in data while respecting the tensor structure of data. In Section 5, we use GOO to induce tensor decompositions that can be used for normalizing tensors. In the unified framework of GOO, the GOO inducing a tensor decomposition, when applied to a second-order tensor, is exactly the same as the GOO inducing the matrix decomposition, provided the same group and cost function are used for both GOO problems.

The remainder of the paper is organized as follows. Section 2 gives the notation used in this paper. Section 3 defines several properties for describing the cost functions used in defining GOO to induce matrix and tensor decompositions. Section 4 studies GOO formulations that can induce SVD, Schur, LU, Cholesky, QR, etc. Section 5 demonstrates how to use GOO to induce tensor decompositions and proves a few inequalities relating a few forms of GOO. Section 6 demonstrates how to normalize point cloud data distorted by rotation, shearing and squeezing with GOO over the special linear group. Section 7 presents numerical algorithms and examples of matrix decomposition, point cloud normalization and tensor decomposition. Finally, we conclude the work in Section 9.

2 Notation

2.1 Matrix operation notation

In this paper, we let $I_n$ denote the $n \times n$ identity matrix. Given an $m \times n$ matrix $X = [x_{ij}]$, we denote and . The $p$-norm of $X$ is defined by

 $$\|X\|_p \overset{\mathrm{def}}{=} \Big(\sum_{i,j} |x_{ij}|^p\Big)^{1/p}$$

for $p > 0$. Note that we abuse the notation a little, as $\|X\|_p$ is not a norm when $0 < p < 1$. When $p = 2$, it is also called the Frobenius norm and usually denoted by $\|X\|_F$. When applied to a vector $x$, $\|x\|_p$ is the $\ell_p$-norm and it is shortened as . The dual norm of the $\ell_p$-norm where $p > 1$ is equivalent to the $\ell_q$-norm, where $1/p + 1/q = 1$. We let $\|X\|_{*p}$ denote the Schatten $p$-norm; that is, it is the $\ell_p$-norm of the vector of the singular values of $X$.

Assume that $\mathbb{F}$ is some number field. Let $X^c$ be the complex conjugate of $X$, and $X^*$ be the complex conjugate transpose of $X$. Let be a vector consisting of the diagonal entries of , and be a matrix with as its diagonals.

Given two matrices and , is their Hadamard product and is the Kronecker product. Similarly, is the Kronecker product of vectors and . For groups and , we denote group as . The Kronecker sum for two square matrices is defined as

 $$A \oplus B = A \otimes I_n + I_m \otimes B.$$
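A small numerical sketch of the Kronecker sum, assuming NumPy and a hypothetical helper `kron_sum`, with $A$ of size $m \times m$ and $B$ of size $n \times n$. For symmetric inputs it also checks the standard fact that the eigenvalues of $A \oplus B$ are the pairwise sums of the eigenvalues of $A$ and $B$.

```python
import numpy as np

def kron_sum(a, b):
    """Kronecker sum of square matrices a (m x m) and b (n x n)."""
    m, n = a.shape[0], b.shape[0]
    return np.kron(a, np.eye(n)) + np.kron(np.eye(m), b)

# Illustrative symmetric matrices.
a = np.array([[2.0, 1.0],
              [1.0, 2.0]])
b = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 5.0]])

ks = kron_sum(a, b)
eig_of_sum = np.sort(np.linalg.eigvalsh(ks))
pairwise = np.sort(np.add.outer(np.linalg.eigvalsh(a), np.linalg.eigvalsh(b)).ravel())
eigs_match = np.allclose(eig_of_sum, pairwise)
```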

A matrix $D$ is said to be pseudo-diagonal if there exist permutation matrices $P$ and $Q$ such that $PDQ$ is diagonal.

Remark

Note that a diagonal matrix is also pseudo-diagonal.

Given a pseudo-diagonal matrix , we have that

1. , , and are diagonal.

2. There exists a row permutation matrix such that is diagonal.

3. There exists a row permutation matrix such that is diagonal.

We let be the polyhedron formed by the points whose coordinates are the rows of , and be the Lebesgue measure of . We let be a matrix where is the image pixel value at coordinate of the image rasterized from the polyhedron with unit grid.

2.2 Tensor operation notation

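The notation of tensor operations used in this paper mostly follows that of [13]. Given an order-$k$ tensor $X \in \mathbb{F}^{n_1 \times n_2 \times \cdots \times n_k}$ and matrices $U_a$ where $1 \le a \le k$, we define $X \times_a U_a$ to be the inner product over the $a$-th mode. That is, if $Y = X \times_a U$ with $U = [u_{j i_a}]$, then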

 $$y_{i_1 \cdots i_{a-1}\, j\, i_{a+1} \cdots i_k} = \sum_{i_a=1}^{n_a} x_{i_1 i_2 \cdots i_k}\, u_{j i_a}.$$

For shorthand, we denote

 $$\prod_{i} X U_i \overset{\mathrm{def}}{=} X \times_1 U_1 \times_2 U_2 \cdots \times_k U_k.$$

Here when is also known as the Tucker decomposition in the literature [24]. With this notation, the SVD of a real matrix can be written as

 $$M = \Sigma \times_1 U_1 \times_2 U_2 = \prod_{i=1}^{2} \Sigma U_i.$$

Using the vectorization operation for tensor, we have

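 $$\mathrm{vec}\Big(\prod_{i} X U_i\Big) = [U_k \otimes U_{k-1} \otimes \cdots \otimes U_1]\, \mathrm{vec}(X) \overset{\mathrm{def}}{=} \bigotimes_{i}^{\downarrow} U_i\, \mathrm{vec}(X),$$

where we denote $\bigotimes_{i}^{\downarrow} U_i$ as shorthand for $U_k \otimes U_{k-1} \otimes \cdots \otimes U_1$.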
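The mode product and the vectorization identity above can be checked numerically. The sketch below assumes NumPy, column-major vectorization (first index varying fastest), and hypothetical helper names `mode_product` and `vec`; the tensor and matrix sizes are arbitrary.

```python
import numpy as np

def mode_product(x, u, mode):
    """y[..., j, ...] = sum_i x[..., i, ...] * u[j, i], contracting axis `mode` of x."""
    moved = np.tensordot(u, x, axes=(1, mode))   # the new index j lands on axis 0
    return np.moveaxis(moved, 0, mode)           # put j back at position `mode`

def vec(x):
    """Column-major vectorization (first index varies fastest)."""
    return x.reshape(-1, order="F")

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4))                   # order-3 tensor
u1 = rng.normal(size=(5, 2))
u2 = rng.normal(size=(3, 3))
u3 = rng.normal(size=(2, 4))

# X x_1 U1 x_2 U2 x_3 U3
y = mode_product(mode_product(mode_product(x, u1, 0), u2, 1), u3, 2)

lhs = vec(y)
rhs = np.kron(np.kron(u3, u2), u1) @ vec(x)      # (U3 kron U2 kron U1) vec(X)
identity_holds = np.allclose(lhs, rhs)
```

The reversed order of the Kronecker factors is tied to the column-major convention; with row-major flattening the factors would appear in the opposite order.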

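We let $\mathrm{index}_{n_1, n_2, \ldots, n_k}$ be a map from a sequence of indices to an integer such that

 $$[\mathrm{vec}\, X]_{\mathrm{index}_{n_1, n_2, \ldots, n_k}(i_1, i_2, \ldots, i_k)} = X_{i_1, i_2, \ldots, i_k}. \tag{2}$$

We note that $\mathrm{index}_{n_1, n_2, \ldots, n_k}$ is well-defined.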

The unfold operation maps a tensor to a tensor of lower order and is defined by

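 $$\mathrm{fold}^{-1}_{J} : \mathbb{F}^{n_1 \times n_2 \times \ldots \times n_k} \mapsto \mathbb{F}^{m_1 \times m_2 \times \ldots \times m_l},$$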

where is an index set grouping of the indices into sets , , and satisfies:

 $$\mathrm{vec}\big[\mathrm{fold}^{-1}_{J}(A)\big] = \mathrm{vec}(A).$$

When unfolding a single index, i.e., , we also denote as .

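The $p$-norm of a tensor $A$ is defined as

 $$\|A\|_p \overset{\mathrm{def}}{=} \|\mathrm{fold}^{-1}_{i} A\|_p$$

for an arbitrary mode $i$. For tensors $A, B$, $\langle A, B \rangle$ is their Frobenius inner product defined as:

 $$\langle A, B \rangle \overset{\mathrm{def}}{=} \langle \mathrm{vec}(A), \mathrm{vec}(B) \rangle.$$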

Finally, given and , is defined as a tensor-valued function with applied to each entry of . Therefore, . When , we denote as .

2.3 Group notation

$O(n)$ is the orthogonal group over the real field $\mathbb{R}$. $SO(n)$ is the special orthogonal group over $\mathbb{R}$. $U(n)$ is the unitary group over the complex field. We let denote the upper-unit-triangular group and denote the lower-unit-triangular group, both of which have all entries along the diagonals being $1$. is the group formed by the (calibrated) homography transform below:

 $$H_{2w} = R_{2w}\Big(I_3 + \frac{p_2 n^\top}{d}\Big),$$

where is attitude of the camera; is position of the camera, and is equation of the object plane.

3 Preliminaries

In this paper we would like to show that matrix and tensor decomposition techniques can be induced from formulations of the group orbit optimization. As we have seen in formula (1), a GOO problem includes two key ingredients: a cost function $\phi$ and a group structure $\mathcal{G}$. Thus, we present preliminaries, including sparsifying functions and unit matrix groups. The sparsifying functions will be used to define cost functions for some matrix decompositions in Table 1 that have diagonal matrices in decomposed formulations.

It should be noted that other classes of functions can be used together with some unit matrix groups to induce interesting matrix and tensor decompositions. See the Schur decomposition in Table 1 for an example.

3.1 Sparsifying functions

For two functions $f$ and $g$, we here and later denote their composition as $f \circ g$ s.t. $(f \circ g)(x) = f(g(x))$. We first prove several utility lemmas used for characterizing sparsifying functions.

1. where .

2. .

First we have that . By the subadditivity of we further have , hence .

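If $f(e^x)$ is convex for any $x$, then when , we have:

 $$\sum_{i=1}^{n} f(|x_i|) \ge n f\Big(\Big(\prod_{i=1}^{n} |x_i|\Big)^{\frac{1}{n}}\Big).$$

Since $f(e^x)$ is convex, we have

 $$\sum_{i=1}^{n} f(|x_i|) = \sum_{i=1}^{n} f\big(e^{\ln |x_i|}\big) \ge n f\Big(e^{\frac{1}{n} \sum_{i=1}^{n} \ln |x_i|}\Big) = n f\Big(\Big(\prod_{i=1}^{n} |x_i|\Big)^{\frac{1}{n}}\Big).$$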
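The inequality just derived can be spot-checked numerically. The sketch below uses only the standard library; the helper name `jensen_gap`, the two test functions (square root and $\ln(1+x)$, for which $f(e^x)$ is convex), and the random nonzero samples are assumptions made for illustration.

```python
import math
import random

def jensen_gap(f, xs):
    """sum_i f(|x_i|) - n * f(geometric mean of |x_i|); requires all x_i nonzero."""
    n = len(xs)
    lhs = sum(f(abs(x)) for x in xs)
    geo_mean = math.exp(sum(math.log(abs(x)) for x in xs) / n)
    return lhs - n * f(geo_mean)

random.seed(0)
samples = [
    [random.uniform(0.1, 10.0) * random.choice([-1, 1]) for _ in range(5)]
    for _ in range(100)
]

# sqrt(e^x) = e^(x/2) and log(1 + e^x) are both convex in x.
gaps_sqrt = [jensen_gap(math.sqrt, xs) for xs in samples]
gaps_log1p = [jensen_gap(math.log1p, xs) for xs in samples]
```

A nonnegative gap on every sample is consistent with the lemma; it is a sanity check, not a proof.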

If is strictly concave and , then where , with equality only when or .

We have . Obviously, the first equality holds only when or .

Assume . Then is concave and iff is concave and subadditive.

Because , w.l.o.g. we assume . We first prove “ part”. When and , we trivially have . Otherwise, we have

 $$f(tx) = f(tx + (1-t) \cdot 0) \ge t f(x) + (1-t) f(0) \ge t f(x).$$

Thus, when or ,

 $$f(a) + f(b) = f\Big((a+b)\frac{a}{a+b}\Big) + f\Big((a+b)\frac{b}{a+b}\Big) \ge \frac{a}{a+b} f(a+b) + \frac{b}{a+b} f(a+b) = f(a+b).$$

As for “ part”, we have . Hence .

Now we are ready to define the sparsifying function. [sparsifying function] A function f is sparsifying if

1. $f$ is symmetric about the origin; i.e., $f(x) = f(-x)$;

2. there is at most one with .

The following theorem gives a sufficient condition for function to be sparsifying.

[sufficient condition for sparsifying] If and is strictly concave and subadditive, then is sparsifying. Because , w.l.o.g. we assume . By Lemma 3.1, is strictly concave and . When , there is no with . Otherwise, it follows from Lemma 3.1 that

Also by Lemma 3.1, the equality holds iff . Because , there is only one with . In both cases, there is at most one with .

Conical combination of sparsifying functions. In particular, if $f$ and $g$ are sparsifying, then so is $\alpha f + \beta g$, where $\alpha$ and $\beta$ are two nonnegative constants. As strict concavity is preserved by conical combination, we only need to prove that subadditivity is preserved by conical combination, which holds because:

 $$(\alpha f + \beta g)(x+y) = \alpha f(x+y) + \beta g(x+y) \le \alpha f(x) + \alpha f(y) + \beta g(x) + \beta g(y) = (\alpha f + \beta g)(x) + (\alpha f + \beta g)(y).$$

It can be directly checked that the following functions are sparsifying.

Example

The following functions are sparsifying:

1. Power function: for ;

2. Capped power function: for ;

3. for ;

4. ;

5. Shannon Entropy: when ;

6. Squared entropy: when ;

7. for and ;

8. for and ;

Remark

We note that is not subadditive because . Although for is subadditive, is not concave. Thus, these two functions are not sparsifying.

3.2 Unit Matrix Groups

[unit group] A matrix group $\mathcal{G}$ is a unit group if $|\det(G)| = 1$ for all $G \in \mathcal{G}$.

Clearly, unitary, orthogonal, and unit-triangular matrix groups are unit groups. We now present some properties of the unit groups.

Unit groups have the following properties.

1. A unit group is well-defined, i.e., closed under multiplication and inverse, and has an identity element, which happens to be $I$.

2. The Kronecker product of unit groups is also a unit group. In particular, if $\mathcal{G}_1$ and $\mathcal{G}_2$ are unit groups, then $\mathcal{G}_1 \otimes \mathcal{G}_2$ is also a unit group.

3. is a unit group.

4. is a group, and is a unit group iff is a unit group.

5. is a unit group iff is a unit group. is a unit group iff is a unit group.

1. Let . Then and

 $$|\det(G_1 G_2)| = |\det(G_1)|\,|\det(G_2)| = 1.$$

Hence and .

2. We first check is a group. This can be done by noting that when , ; and

 $$(G_1 \otimes G_2)(G_3 \otimes G_4) = (G_1 G_3) \otimes (G_2 G_4).$$

Also . Moreover, since for any and , is a unit group.

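3. Closure under multiplication and inverse can be proved by noting

 $$(P \otimes P^{-\top})(Q \otimes Q^{-\top}) = (PQ) \otimes (P^{-\top} Q^{-\top}) = (PQ) \otimes (PQ)^{-\top}.$$

Also we have

 $$(P \otimes P^{-\top})^{-1} = P^{-1} \otimes P^{\top}.$$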

Thus forms a group with as the identity. It is also a unit group as .

4. Closure under multiplication and inverse can be proved based on

 $$(L \otimes L^c)(R \otimes R^c) = (LR) \otimes (L^c R^c) = (LR) \otimes (LR)^c,$$

and . Thus forms a group with as the identity. Moreover , i.e., forms a unit group iff is from a unit group.

5. Note is a unit group with single element. By property (ii) we can prove this property.
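Properties 1 and 2 can be sanity-checked numerically. The sketch below assumes NumPy and uses randomly generated orthogonal and upper-unit-triangular matrices as illustrative unit-group elements; it checks the mixed-product rule used in the proof and that the determinants keep absolute value one.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_orthogonal(n):
    """Orthogonal matrix from the QR factorization of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q

def random_unit_upper(n):
    """Upper-triangular matrix with ones on the diagonal."""
    return np.eye(n) + np.triu(rng.normal(size=(n, n)), k=1)

q1, q2 = random_orthogonal(3), random_orthogonal(3)
t1, t2 = random_unit_upper(2), random_unit_upper(2)

# Property 1: a product of unit-group elements keeps |det| = 1.
abs_det_product = abs(np.linalg.det(q1 @ q2))

# Property 2: Kronecker products keep |det| = 1 and obey the mixed-product rule.
abs_det_kron = abs(np.linalg.det(np.kron(q1, t1)))
mixed_product_ok = np.allclose(
    np.kron(q1, t1) @ np.kron(q2, t2),
    np.kron(q1 @ q2, t1 @ t2),
)
```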

It is worth pointing out that does not form a group in general because .

Finally, in Table 1 we list the matrix decompositions of $M$ used in this paper. When referring to the Cholesky decomposition, $M$ should be positive definite.

4 Group Orbit Optimization

4.1 Matrix Decomposition Induced from Group Orbit Optimization

4.1.1 GOO formulation

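We now illustrate how matrix decomposition can be induced from GOO. Given two groups $\mathcal{G}_1, \mathcal{G}_2$ and a data matrix $M$, we consider the following optimization problem

 $$\inf_{G_1 \in \mathcal{G}_1,\, G_2 \in \mathcal{G}_2} \phi(G_2 M G_1^\top). \tag{3}$$

Assume that $\hat{G}_1$ and $\hat{G}_2$ are minimizers of the above GOO and $D = \hat{G}_2 M \hat{G}_1^\top$; then we refer to

 $$M = \hat{G}_2^{-1} D \hat{G}_1^{-\top}$$

as a matrix decomposition of $M$ which is induced from Formula (3).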

When , an equivalent formulation of Formula (3) is:

 $$\inf_{G_1 \in \mathcal{G}_1,\, G_2 \in \mathcal{G}_2} \phi(G_2 M G_1^\top) \equiv \inf_{G \in \mathcal{G}} \varphi(G\, \mathrm{vec}(M)),$$

where and .
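The equivalence between the two-sided and the vectorized formulation rests on the standard identity $\mathrm{vec}(G_2 M G_1^\top) = (G_1 \otimes G_2)\,\mathrm{vec}(M)$ for column-major vectorization. The sketch below checks this with NumPy; taking $G = G_1 \otimes G_2$, an entrywise cost for both $\phi$ and $\varphi$, and the random test matrices are assumptions of the example rather than definitions quoted from the text.

```python
import numpy as np

def vec(x):
    """Column-major vectorization."""
    return x.reshape(-1, order="F")

def entrywise_cost(x, p=1.0):
    """Illustrative cost: sum of |entry|^p; works for matrices and vectors alike."""
    return (np.abs(x) ** p).sum()

rng = np.random.default_rng(2)
m = rng.normal(size=(3, 4))
g2 = rng.normal(size=(3, 3))     # acts on the left
g1 = rng.normal(size=(4, 4))     # acts on the right (transposed)

two_sided = g2 @ m @ g1.T
vectorized = np.kron(g1, g2) @ vec(m)

vec_identity_ok = np.allclose(vec(two_sided), vectorized)
costs_agree = np.isclose(entrywise_cost(two_sided), entrywise_cost(vectorized))
```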

4.1.2 GOO over unit group

For a general matrix group , implies that . However, group structure may not be sufficient to induce non-trivial matrix decomposition, as with some groups and cost functions the infimum will be trivially zero. For example, with general linear group and for any matrix , we have

 $$\inf_{G \in GL} \|GM\|_p = 0,$$

because and

 $$\lim_{s \to 0} \inf_{s \in \mathbb{R}} \|sIM\|_p = \lim_{s \to 0} s\|M\|_p = 0.$$

Nevertheless, if we require $\mathcal{G}$ to be a unit group, we have $|\det(G)| = 1$ for every $G \in \mathcal{G}$. Consequently, we can prevent the infimum from vanishing trivially for any $p$-norm. Thus, we mainly consider the case where $\mathcal{G}$ is a unit group in this paper.
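The degenerate behaviour over the general linear group is easy to reproduce. In the sketch below (NumPy; the matrix, the scales, and the helper `p_norm` are illustrative choices) the scaled identity $sI$ drives the cost to zero, while its determinant $s^2$ shows that such elements are excluded from any unit group.

```python
import numpy as np

def p_norm(x, p):
    """Entrywise p-norm of a matrix."""
    return (np.abs(x) ** p).sum() ** (1.0 / p)

m = np.array([[1.0, 2.0],
              [3.0, 4.0]])
scales = [1.0, 1e-2, 1e-4, 1e-8]

# Cost along the path G = s * I inside the general linear group.
costs = [p_norm((s * np.eye(2)) @ m, 1.0) for s in scales]

# |det(s * I)| = s^2 differs from 1 unless |s| = 1, so these G leave the unit group.
abs_dets = [abs(np.linalg.det(s * np.eye(2))) for s in scales]
```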

The following theorem shows that many matrix decompositions can be induced from the group orbit optimization. SVD, LU, QR, Schur and Cholesky decompositions of matrix can be induced from GOO of the form

 $$\inf_{G_1 \in \mathcal{G}_1,\, G_2 \in \mathcal{G}_2} \phi(G_2 M G_1^\top),$$

by using the corresponding unit group $\mathcal{G}$ and cost function $\phi$, which are given in Table 2.

Clearly, the matrix groups in Table 2 are unit groups by Lemma 3.2. We will prove the rest of theorem in Section 4.2 and Section 4.3.

Remark

The cost function for SVD, QR and Matrix Equivalence can be . And the cost function for LU, Schur and Cholesky can be .

Remark

The formulation of QR decomposition exploits the fact that is equivalent to where , is upper-triangular, , and is diagonal.

Remark

“Matrix Equivalence” in Table 2 finds a diagonal matrix equivalent to an invertible matrix, as defined in Section 4.2.3.

Remark

However, there are matrix decompositions whose formulation cannot be expressed as GOO in the same way as Table 2. For example, Polar decomposition where and , though derivable from SVD, cannot be induced from a GOO formulation of diagonalization. This is because does not form a group as it is not closed under multiplication. For another example, consider a formulation of decomposition where and is diagonal. As we stated earlier, is not a group in general, so cannot be induced from a GOO formulation of diagonalization.

Remark

For matrix decomposition of the form , where and with . In this case, we can zero-pad to , and extend and to and which are square matrices. Accordingly, we formulate a decomposition which may be induced from GOO.

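We next prove a lemma that characterizes the optimum. [Criteria for infimum] If $\phi(GD) \ge \phi(D)$ for any $G \in \mathcal{G}$ and there exists $\hat{G} \in \mathcal{G}$ s.t. $D = \hat{G}M$, then

 $$\inf_{G \in \mathcal{G}} \phi(GM) = \phi(D).$$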

We note that . By the group structure, the coset . Hence we have

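Using the condition $\phi(GD) \ge \phi(D)$, we have

 $$\inf_{G \in \mathcal{G}} \phi(GD) \ge \inf_{G \in \mathcal{G}} \phi(D) = \phi(D).$$

On the other hand, as $I \in \mathcal{G}$ we have $\inf_{G \in \mathcal{G}} \phi(GD) \le \phi(D)$. Hence

 $$\phi(D) = \inf_{G \in \mathcal{G}} \phi(GD) = \inf_{G \in \mathcal{G}} \phi(GM).$$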

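By virtue of Lemma 2, if we want to prove that a matrix decomposition $M = \hat{G}^{-1} D$ is induced by a GOO w.r.t. $\phi$ and $\mathcal{G}$, we only need to prove that there exists a $\hat{G} \in \mathcal{G}$ s.t. $D = \hat{G}M$, and $\phi(GD) \ge \phi(D)$ for any $G \in \mathcal{G}$. The equality condition will determine the uniqueness of the optimum of the optimization problem.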

4.2 Matrix Diagonalization as GOO

Next we demonstrate how matrix diagonalization can be induced from GOO with proper choice of cost function and unit group.

4.2.1 Singular Value Decomposition

First we discuss SVD of a complex matrix and of a real matrix. [Cost function and group for SVD] Let be pseudo-diagonal, and . Given a function such that and is strictly concave and subadditive, and we have

 $$\phi(UDV^*) \ge \phi(D),$$

with equality iff there exists a row permutation matrix such that .

Furthermore, if , we have

 $$\phi(UDV^\top) \ge \phi(D), \tag{4}$$

with equality iff there exists a row permutation matrix such that .

First we prove the inequality. We write and . We let be a matrix-valued function of . As is concave and subadditive, by Lemma 3.1 for a vector , we have . Applying this to each column of , we have

 $$\phi(A) \ge \mathrm{tr}[g(A^*A)] = \mathrm{tr}[g(VD^*DV^*)]. \tag{5}$$

Alternatively, we can also apply the inequality to each row of and have

 $$\phi(A) \ge \mathrm{tr}[g(AA^*)] = \mathrm{tr}[g(UDD^*U^*)]. \tag{6}$$

As $D$ is pseudo-diagonal, $D^*D$ is diagonal. Because $g$ is concave and , we can apply Jensen’s inequality, obtaining

 $$\mathrm{tr}(g(VD^*DV^*)) \ge \mathrm{tr}(V(g(D^*D))V^*).$$

Hence altogether we have:

 $$\phi(A) \ge \mathrm{tr}(g(VD^*DV^*)) \ge \mathrm{tr}(V(g(D^*D))V^*) = \mathrm{tr}(g(D^*D)) = \sum_{ij} f(d_{ij}) = \phi(D).$$

Next we check the equality condition. By Theorem 3.1, $f$ is sparsifying. For the equality condition in inequality (5) to hold, $A$ can have at most one nonzero in each column. By the symmetry between (5) and (6), and noting and , $A$ can also have at most one nonzero in each row for to hold. Hence when the equality holds, $A$ is pseudo-diagonal. Then there exists a permutation matrix such that is a diagonal matrix whose diagonal elements are non-negative and in descending order, where is a diagonal matrix s.t. . By the uniqueness of singular values of a matrix, we have . Hence equality in inequality (4) holds when .

The proof for is similar.

Note that , modulo sign and permutation, is the global minimizer for a large class of functions .
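The inequality of the theorem can be probed numerically in the real case. The sketch below assumes NumPy, the entrywise cost $\phi(X) = \sum_{ij} |x_{ij}|^p$ with $p = 1$ (so that $f(\sqrt{x}) = x^{p/2}$ is strictly concave and subadditive), a fixed diagonal $D$, and randomly drawn orthogonal matrices; all of these are illustrative choices. It also checks that a signed permutation leaves the cost unchanged, matching the "modulo sign and permutation" remark.

```python
import numpy as np

def phi(x, p):
    """Illustrative cost: sum over entries of |x_ij|^p."""
    return (np.abs(x) ** p).sum()

def random_orthogonal(rng, n):
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q

rng = np.random.default_rng(3)
d = np.diag([3.0, 1.0, 0.2])
p = 1.0

# phi(U D V^T) - phi(D) for random orthogonal U, V.
gaps = []
for _ in range(200):
    u = random_orthogonal(rng, 3)
    v = random_orthogonal(rng, 3)
    gaps.append(phi(u @ d @ v.T, p) - phi(d, p))
min_gap = min(gaps)

# A signed permutation on the left and a permutation on the right attain equality.
perm = np.eye(3)[[2, 0, 1]]
signs = np.diag([1.0, -1.0, 1.0])
gap_signed_perm = phi(perm @ signs @ d @ perm.T, p) - phi(d, p)
```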

After applying Lemma 2, we have the following theorem. [SVD induced from optimization] We are given a function such that and is strictly concave and subadditive, and . Let and be an optimal solution of the following optimization:

 $$\inf_{U \in U(n),\, V \in U(n)} \phi(U^* M V).$$

Then if SVD of is , there exist a permutation matrix and a diagonal matrix such that and .

With $\phi$ as in Theorem 4.2.1, the eigendecomposition of a Hermitian matrix $M$ can be induced from

 $$\inf_{U \in U(n)} \phi(U^* M U).$$

Similarly, the eigendecomposition of a real symmetric matrix $M$ can be induced from

 $$\inf_{U \in O(n)} \phi(U^\top M U).$$

From the above optimization, we can derive several inequalities.

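[The Schatten $p$-norm and $p$-norm inequality] The $p$-norm of a matrix $M$ is at least (at most) the Schatten $p$-norm of $M$ when $0 < p < 2$ ($p > 2$).

In particular, we have

 $$\|M\|_p \ge \inf_{U, V \in U(n)} \|UMV^*\|_p = \|M\|_{*p} \quad \text{when } 0 < p < 2$$

and

 $$\|M\|_p \le \sup_{U, V \in U(n)} \|UMV^*\|_p = \|M\|_{*p} \quad \text{when } p > 2.$$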
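Both directions of this inequality can be observed on a random matrix. The sketch below assumes NumPy; the helper names, the random real test matrix, and the sampled values of $p$ are choices made for the example.

```python
import numpy as np

def entrywise_p_norm(m, p):
    """(sum_ij |m_ij|^p)^(1/p)."""
    return (np.abs(m) ** p).sum() ** (1.0 / p)

def schatten_p_norm(m, p):
    """Same formula applied to the singular values of m."""
    s = np.linalg.svd(m, compute_uv=False)
    return (s ** p).sum() ** (1.0 / p)

rng = np.random.default_rng(4)
m = rng.normal(size=(4, 4))

# Map p -> (entrywise p-norm, Schatten p-norm).
results = {
    p: (entrywise_p_norm(m, p), schatten_p_norm(m, p))
    for p in (0.5, 1.0, 2.0, 4.0)
}
```

At $p = 2$ the two quantities coincide (both equal the Frobenius norm), which is the boundary between the two regimes.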