Constrained matrix decompositions are among the basic methods for unsupervised data analysis. These techniques play a role in many scientific and engineering fields, ranging from environmental engineering [PT94] and neuroscience [OF96] to signal processing [Com94] and statistics [ZHT06]. Constrained factorizations are powerful tools for identifying latent structure in a matrix; they also support data compression, summarization, and visualization.
The literature contains a number of frameworks [TB99, CDS02, Tro04, Sre04, Wit10, Jag11, Bac13, Ude15, BE16, Bru17, HV19] for thinking about constrained matrix factorization and for developing algorithms that pursue these factorizations. Nevertheless, we still lack theory that fully justifies these approaches. For instance, researchers have only attained a partial understanding of which factorization models are identifiable and which ones we can compute provably using efficient algorithms.
The purpose of this paper and its companion [KT19] is to develop foundational results on factorization models that we call binary component decompositions. In these models, one (or both) of the factors takes values in the set $\{\pm 1\}$ or in the set $\{0, 1\}$. Binary component decompositions are appropriate when the latent factors reflect an exclusive choice. From a mathematical perspective, these constrained factorizations also happen to be among the easiest ones to understand.
In this second paper, we consider the problem of factorizing a rectangular matrix into a binary factor and an unconstrained matrix of weights. We develop results on existence, uniqueness, tractable computation, and robustness to gross errors. Our analysis builds heavily on the work in the companion paper [KT19], which treats the problem of decomposing a positive-semidefinite matrix into symmetric binary factors.
We rely on standard notation from linear algebra and optimization. Scalars are written with lowercase Roman or Greek letters ($x$, $\xi$); lowercase bold letters ($\mathbf{x}$, $\boldsymbol{\xi}$) denote (column) vectors; uppercase bold letters ($\mathbf{X}$, $\boldsymbol{\Xi}$) denote matrices. We reserve calligraphic letters ($\mathcal{X}$) for sets. The symbol $\lesssim$ suppresses universal constants.
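Throughout, $n$ and $m$ are natural numbers. We work in the real linear spaces $\mathbb{R}^n$ and $\mathbb{R}^m$ equipped with the standard inner product $\langle \cdot, \cdot \rangle$ and the associated norm $\|\cdot\|$. The standard basis vector $\mathbf{e}_i$ has a one in the $i$th coordinate and zeros elsewhere, while $\mathbf{e}$ is the vector of ones; the dimension of these vectors depends on context. The map ${}^{\mathsf{t}}$ transposes a vector or matrix. The binary operator $\odot$ is the Schur (i.e., componentwise) product of vectors. The closed and open probability simplices are the sets
$$\Delta_r := \Big\{ \boldsymbol{\tau} \in \mathbb{R}^r : \sum_{i=1}^r \tau_i = 1,\ \tau_i \geq 0 \text{ for all } i \Big\} \quad\text{and}\quad \Delta_r^+ := \Big\{ \boldsymbol{\tau} \in \mathbb{R}^r : \sum_{i=1}^r \tau_i = 1,\ \tau_i > 0 \text{ for all } i \Big\}.$$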
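We write $\mathbb{H}_n$ for the linear space of symmetric real $n \times n$ matrices. The symbol $\mathbf{I}$ denotes the identity matrix, and $\mathbf{E}$ denotes the matrix of ones; their dimensions are determined by the context. The dagger ${}^{\dagger}$ refers to the Moore–Penrose pseudoinverse. A positive-semidefinite (psd) matrix is a symmetric matrix $\mathbf{X}$ that satisfies $\mathbf{u}^{\mathsf{t}} \mathbf{X} \mathbf{u} \geq 0$ for all vectors $\mathbf{u}$ with compatible dimension. The statement $\mathbf{X} \succcurlyeq \mathbf{Y}$ means that $\mathbf{X} - \mathbf{Y}$ is psd, and $\mathbf{X} \succ \mathbf{Y}$ means that $\mathbf{X} - \mathbf{Y}$ is strictly positive definite, i.e., $\mathbf{u}^{\mathsf{t}} (\mathbf{X} - \mathbf{Y}) \mathbf{u} > 0$ for all nonzero vectors $\mathbf{u}$ with compatible dimension.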
2. Sign component decomposition and binary component decomposition
We begin with a short discussion of the singular-value decomposition and its properties (Section 2.1). Afterward, we introduce the two factorizations that we treat in this paper, the sign component decomposition (Section 2.2) and the binary component decomposition (Section 2.3). We present our main results on situations where these factorizations are uniquely determined and when they can be computed using efficient algorithms. An outline of the rest of the paper appears in Section 2.5.
2.1. The singular-value decomposition
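We begin with the singular-value decomposition (SVD), the royal emperor among all matrix factorizations. Let $\mathbf{B} \in \mathbb{R}^{n \times m}$ be a rectangular matrix. For some natural number $r \leq \min\{n, m\}$, we can decompose this matrix as
$$\mathbf{B} = \sum_{i=1}^{r} \sigma_i \mathbf{u}_i \mathbf{v}_i^{\mathsf{t}}. \tag{2.1}$$
In this expression, $\{\mathbf{u}_1, \dots, \mathbf{u}_r\} \subset \mathbb{R}^n$ and $\{\mathbf{v}_1, \dots, \mathbf{v}_r\} \subset \mathbb{R}^m$ are orthonormal families of left and right singular vectors associated with the positive singular values $\sigma_1, \dots, \sigma_r$. We can also convert the decomposition (2.1) into a matrix factorization:
$$\mathbf{B} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^{\mathsf{t}} \quad\text{where}\quad \mathbf{U} = [\mathbf{u}_1 \ \dots \ \mathbf{u}_r], \quad \mathbf{V} = [\mathbf{v}_1 \ \dots \ \mathbf{v}_r], \quad \boldsymbol{\Sigma} = \operatorname{diag}(\sigma_1, \dots, \sigma_r). \tag{2.2}$$
The matrices $\mathbf{U}$ and $\mathbf{V}$ are orthonormal; that is, $\mathbf{U}^{\mathsf{t}} \mathbf{U} = \mathbf{I}$ and $\mathbf{V}^{\mathsf{t}} \mathbf{V} = \mathbf{I}$.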
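The singular-value decomposition is intimately connected to the problem of finding a best low-rank approximation of a matrix [Mir60]. Indeed, for any unitarily invariant norm $|\!|\!|\cdot|\!|\!|$, with the singular values arranged in decreasing order,
$$\sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^{\mathsf{t}} \in \operatorname{arg\,min} \big\{ |\!|\!| \mathbf{B} - \mathbf{M} |\!|\!| : \operatorname{rank}(\mathbf{M}) \leq k \big\} \quad\text{for each } k \leq r.$$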
This variational property has a wide range of consequences, both theoretical and applied.
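As a hedged illustration of this variational property, the following NumPy sketch truncates an SVD and compares the spectral-norm error with the first discarded singular value. The helper name `best_rank_k`, the random test matrix, and the choice of the spectral norm are illustrative assumptions, not part of the paper.

```python
import numpy as np

def best_rank_k(B, k):
    """Keep the k largest singular triples of B (hypothetical helper)."""
    # np.linalg.svd returns singular values sorted in decreasing order.
    U, sigma, Vt = np.linalg.svd(B, full_matrices=False)
    return (U[:, :k] * sigma[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 4))
B2 = best_rank_k(B, 2)

sigma = np.linalg.svd(B, compute_uv=False)
# In the spectral norm, the truncation error is the first discarded singular value.
err = np.linalg.norm(B - B2, 2)
```

The same truncation is optimal for the Frobenius norm, where the error is the root sum of squares of the discarded singular values.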
The singular-value decomposition also holds a distinguished place in statistics because of its connection with principal component analysis [Jol02]. Given a data matrix $\mathbf{B}$ with standardized rows (a vector is standardized if its entries sum to zero and its Euclidean norm equals one), we can perform a singular-value decomposition to express $\mathbf{B} = \mathbf{U} \mathbf{W}^{\mathsf{t}}$, where $\mathbf{W} = \mathbf{V} \boldsymbol{\Sigma}$. In this setting, the left singular vectors $\mathbf{u}_i$ are called principal components, the directions in which the columns of $\mathbf{B}$ exhibit the most variability. The entries of the matrix $\mathbf{W}$ are called weights or loadings; they are the coefficients with which we combine the principal components to express the original data points.
On the positive side of the ledger, the singular-value decomposition (2.1)–(2.2) always exists, and it is uniquely determined when the (nonzero) singular values are distinct. Moreover, we can compute the singular-value decomposition, up to a fixed (high) accuracy, by means of highly refined algorithms, in polynomial time.
On the negative side, we cannot impose constraints on the singular vectors to enforce prior knowledge about the data. Second, we generally cannot assign an interpretation or meaning to the singular vectors, without committing the sin of reification. Moreover, the orthogonality of singular vectors may not be an appropriate constraint in applications. Structured matrix factorizations are designed to address one or more of these shortcomings.
2.2. Sign component decomposition
In this project, we consider matrix factorization models where one of the factors is required to take binary values. In this section, we treat the case where the entries of the binary factor are limited to the set $\{\pm 1\}$. In Section 2.3, we turn to the case where the entries are drawn from the set $\{0, 1\}$.
2.2.1. The decomposition
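As before, assume that $\mathbf{B} \in \mathbb{R}^{n \times m}$ is a rectangular matrix. We seek a decomposition of the form
$$\mathbf{B} = \mathbf{S} \mathbf{W}^{\mathsf{t}} \quad\text{where}\quad \mathbf{S} \in \{\pm 1\}^{n \times r} \quad\text{and}\quad \mathbf{W} \in \mathbb{R}^{m \times r}. \tag{2.3}$$
This factorization can also be written in vector notation as
$$\mathbf{B} = \sum_{i=1}^{r} \mathbf{s}_i \mathbf{w}_i^{\mathsf{t}} \quad\text{where}\quad \mathbf{s}_i \in \{\pm 1\}^{n} \quad\text{and}\quad \mathbf{w}_i \in \mathbb{R}^{m}. \tag{2.4}$$
We call (2.3)–(2.4) an (asymmetric) sign component decomposition of the matrix $\mathbf{B}$. The left factor $\mathbf{S}$ is called the sign component; its columns $\mathbf{s}_i$ are also called sign components. The right factor $\mathbf{W}$ is unconstrained; its entries are called weights or loadings. See Figure 2.1 for an illustration.
It is not hard to show that each matrix $\mathbf{B}$ admits a plethora of distinct sign component decompositions (2.3) where the inner dimension is $r = n$; see Proposition 4.1. It is more interesting to consider a low-rank matrix $\mathbf{B}$ and to search for minimal decompositions, those where the inner dimension $r$ of the factorization (2.3) equals the rank of $\mathbf{B}$.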
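To make the two forms of the decomposition concrete, here is a small NumPy sketch that builds a matrix from a sign factor and a weight factor, and checks that the matrix form agrees with the sum of outer products. The particular matrices `S` and `W` are made-up illustrative values.

```python
import numpy as np

# Hypothetical sign component (n = 4, r = 2) with entries in {+1, -1}.
S = np.array([[1, 1],
              [1, -1],
              [-1, 1],
              [1, 1]])
# Hypothetical weight matrix (m = 3, r = 2) with full column rank.
W = np.array([[2.0, 0.5],
              [-1.0, 3.0],
              [0.0, 1.5]])

B = S @ W.T                                   # matrix form: B = S W^t
B_sum = sum(np.outer(S[:, i], W[:, i])        # vector form: sum_i s_i w_i^t
            for i in range(S.shape[1]))
```

Because both factors have full column rank here, the inner dimension 2 equals the rank of `B`, so this is a minimal decomposition in the sense described above.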
Remark 2.1 (Matrix sign function).
The sign component decomposition must not be confused with the matrix sign function, which is a spectral computation related to the polar factorization [Hig08, Chap. 5].
2.2.2. Schur independence
The sign component decomposition (2.3)–(2.4) has a combinatorial quality, which suggests that it might be hard to find. Remarkably, there is a large class of matrices for which we can tractably compute a minimal sign component decomposition. The core requirement is that the sign components must be somewhat different. The following definition [LP96, KT19] encapsulates this idea.
Definition 2.2 (Schur independence of sign vectors).
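A set $\{\mathbf{s}_1, \dots, \mathbf{s}_r\} \subseteq \{\pm 1\}^n$ of sign vectors is Schur independent when the set
$$\{\mathbf{e}\} \cup \{\mathbf{s}_i \odot \mathbf{s}_j : 1 \leq i < j \leq r\} \subseteq \mathbb{R}^n \quad\text{is linearly independent.}$$
By extension, we also say that the sign matrix $\mathbf{S} = [\mathbf{s}_1 \ \dots \ \mathbf{s}_r] \in \{\pm 1\}^{n \times r}$ is Schur independent when its columns form a Schur independent set.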
Fact 2.3 (Schur independence).
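Assume that the set $\{\mathbf{s}_1, \dots, \mathbf{s}_r\} \subseteq \{\pm 1\}^n$ of sign vectors is Schur independent. We have the following consequences.
(1) The family $\{\mathbf{s}_1, \dots, \mathbf{s}_r\}$ is linearly independent.
(2) Each subset of $\{\mathbf{s}_1, \dots, \mathbf{s}_r\}$ is Schur independent.
(3) For any choice $\boldsymbol{\xi} \in \{\pm 1\}^r$ of signs, the set $\{\xi_1 \mathbf{s}_1, \dots, \xi_r \mathbf{s}_r\}$ remains Schur independent.
(4) The cardinality $r$ of the set satisfies $r \leq \tfrac{1}{2}\big(1 + \sqrt{8n - 7}\big)$.
(5) We can determine whether or not $\{\mathbf{s}_1, \dots, \mathbf{s}_r\}$ is Schur independent in polynomial time.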
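The polynomial-time test can be sketched as a rank computation. The snippet below assumes the definition in which the all-ones vector together with the pairwise Schur products of the columns must be linearly independent; the function name `is_schur_independent` and the example matrices are illustrative, and a floating-point rank is used in place of exact arithmetic.

```python
import numpy as np
from itertools import combinations

def is_schur_independent(S):
    """Check Schur independence of the columns of a sign matrix S (n x r).

    Assumed definition: {e} together with {s_i * s_j : i < j} is linearly independent.
    """
    S = np.asarray(S, dtype=float)
    n, r = S.shape
    cols = [np.ones(n)] + [S[:, i] * S[:, j] for i, j in combinations(range(r), 2)]
    M = np.column_stack(cols)                 # n x (1 + r(r-1)/2)
    return np.linalg.matrix_rank(M) == M.shape[1]

# Three columns of a 4 x 4 Hadamard matrix: the products fill out the fourth column.
S_good = np.array([[1, 1, 1],
                   [1, 1, -1],
                   [1, -1, 1],
                   [1, -1, -1]])
# A repeated column makes two Schur products coincide.
S_bad = S_good[:, [0, 1, 1]]
```

The test matrix has $1 + r(r-1)/2$ columns of length $n$, which is where a cardinality bound of the form $1 + r(r-1)/2 \leq n$ comes from.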
The main result of this paper is an algorithm for computing the minimal asymmetric sign component decomposition of a low-rank matrix. This algorithm succeeds precisely when the sign component is Schur independent. Moreover, this condition is sufficient to ensure that the sign component decomposition is essentially unique.
Theorem I (Sign component decomposition).
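Let $\mathbf{B} \in \mathbb{R}^{n \times m}$ be a matrix that admits a sign component decomposition $\mathbf{B} = \mathbf{S} \mathbf{W}^{\mathsf{t}}$ where
The sign matrix $\mathbf{S} \in \{\pm 1\}^{n \times r}$ is Schur independent;
The weight matrix $\mathbf{W} \in \mathbb{R}^{m \times r}$ has full column rank.
Then the minimal sign component decomposition (with inner dimension $r$) is determined up to simultaneous sign flips and permutations of the columns of the factors. Algorithm 1 computes this decomposition in time polynomial in $n$ and $m$.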
Theorem I identifies a rich set of factorizable matrices for which exact identification is always tractable and essentially unique. Moreover, existing denoising techniques allow us to compute the factorization in the presence of gross errors; see Section 7. Small perturbations appear more challenging; we will study this problem in future work.
It is surprising that the exact sign component decomposition is tractable. Most existing approaches to structured matrix factorization only produce approximations, and many of these approaches lack rigorous guarantees. The companion paper [KT19, Sec. 8] contains a discussion of the related work.
2.3. Binary component decomposition
The asymmetric sign component decomposition also serves as a primitive that allows us to compute other discrete matrix factorizations. In this section, we turn to the problem of producing a decomposition where one component takes values in the set $\{0, 1\}$.
2.3.1. The decomposition
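Suppose that $\mathbf{C} \in \mathbb{R}^{n \times m}$ is a rectangular matrix. We consider a decomposition of the form
$$\mathbf{C} = \mathbf{Z} \mathbf{W}^{\mathsf{t}} \quad\text{where}\quad \mathbf{Z} \in \{0, 1\}^{n \times r} \quad\text{and}\quad \mathbf{W} \in \mathbb{R}^{m \times r}. \tag{2.5}$$
The vector formulation of this decomposition is
$$\mathbf{C} = \sum_{i=1}^{r} \mathbf{z}_i \mathbf{w}_i^{\mathsf{t}} \quad\text{where}\quad \mathbf{z}_i \in \{0, 1\}^{n} \quad\text{and}\quad \mathbf{w}_i \in \mathbb{R}^{m}. \tag{2.6}$$
We refer to (2.5)–(2.6) as an (asymmetric) binary component decomposition of the matrix $\mathbf{C}$. The left factor $\mathbf{Z}$ is called the binary component, and its columns $\mathbf{z}_i$ are also called binary components. The right factor $\mathbf{W}$ is unconstrained; we refer to it as a weight matrix.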
2.3.2. Schur independence
We can reduce the problem of computing a binary component decomposition to the problem of computing a sign component decomposition.
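To do so, we first observe that there is an affine map that places the binary vectors and sign vectors in one-to-one correspondence:
$$\mathbf{F} : \{0, 1\}^n \to \{\pm 1\}^n \quad\text{where}\quad \mathbf{F}(\mathbf{z}) = 2\mathbf{z} - \mathbf{e} \quad\text{and}\quad \mathbf{F}^{-1}(\mathbf{s}) = \tfrac{1}{2}(\mathbf{s} + \mathbf{e}). \tag{2.7}$$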
We can extend the map to a matrix by applying it to each column. This correspondence suggests that there should also be a concept of Schur independence for binary vectors. Here is the notion that suits our purposes.
Definition 2.4 (Schur independence of binary vectors).
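A set $\{\mathbf{z}_1, \dots, \mathbf{z}_r\} \subseteq \{0, 1\}^n$ of binary vectors is Schur independent when the set
$$\{\mathbf{e}\} \cup \{\mathbf{z}_i : 1 \leq i \leq r\} \cup \{\mathbf{z}_i \odot \mathbf{z}_j : 1 \leq i < j \leq r\} \subseteq \mathbb{R}^n \quad\text{is linearly independent.}$$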
By extension, we say that a binary matrix is Schur independent when its columns compose a Schur independent set.
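The correspondence between binary and sign vectors, and a rank-based test for binary Schur independence, can be sketched as follows. This is a minimal illustration under two assumptions: that the affine map is $\mathbf{z} \mapsto 2\mathbf{z} - \mathbf{e}$, and that binary Schur independence means linear independence of the all-ones vector, the vectors themselves, and their pairwise Schur products. All function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def binary_to_sign(Z):
    """Entrywise affine map {0, 1} -> {-1, +1}, assumed to be z -> 2z - 1."""
    return 2 * np.asarray(Z) - 1

def sign_to_binary(S):
    """Inverse map {-1, +1} -> {0, 1}."""
    return (np.asarray(S) + 1) // 2

def is_binary_schur_independent(Z):
    """Assumed definition: {e}, {z_i}, and {z_i * z_j : i < j} are linearly independent."""
    Z = np.asarray(Z, dtype=float)
    n, r = Z.shape
    cols = [np.ones(n)]
    cols += [Z[:, i] for i in range(r)]
    cols += [Z[:, i] * Z[:, j] for i, j in combinations(range(r), 2)]
    M = np.column_stack(cols)
    return np.linalg.matrix_rank(M) == M.shape[1]

Z_good = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])   # columns (1,1,0,0) and (1,0,1,0)
Z_bad = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])    # the two columns sum to e
```

The second example fails because the two binary columns add up to the all-ones vector, which is a linear dependency among the vectors that the definition requires to be independent.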
The following result [KT19, Prop. 6.3] describes the precise connection between the two flavors of Schur independence.
Fact 2.5 (Kueng & Tropp).
The binary matrix is Schur independent if and only if the sign matrix is Schur independent.
With these definitions at hand, we can state our main result on binary component decompositions.
Theorem II (Binary component decomposition).
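Let $\mathbf{C} \in \mathbb{R}^{n \times m}$ be a matrix that admits a binary component decomposition $\mathbf{C} = \mathbf{Z} \mathbf{W}^{\mathsf{t}}$ where
The binary matrix $\mathbf{Z} \in \{0, 1\}^{n \times r}$ is Schur independent;
The weight matrix $\mathbf{W} \in \mathbb{R}^{m \times r}$ has full column rank.
Then the minimal binary component decomposition (with inner dimension $r$) is determined up to simultaneous permutation of the columns of the factors. Algorithm 2 computes the decomposition in time polynomial in $n$ and $m$.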
2.4. The planted sign basis problem
Problem 2.6 (Planted sign basis).
Let be an -dimensional subspace that admits a sign basis:
Given the subspace , find a sign basis for the subspace.
To clarify, we can assume that the problem data is a matrix whose range equals the -dimensional subspace . We must output a set of sign vectors that generates the subspace. The brute force approach may require us to sift through around families of sign vectors. Is it possible to solve the problem more efficiently?
Let us outline a solution for Problem 2.6 in the case where has a sign basis that is Schur independent. This is a rather mild deterministic condition, provided that the dimension of the subspace satisfies . The hypothesis also guarantees that the basis is determined up to permutation and sign flips, per Theorem I.
Here is how we solve the problem. Let be a matrix whose range coincides with the subspace . A Schur independent set is linearly independent, so we can write the matrix in the form , where and the weight matrix has full column rank. As a consequence, we can apply Algorithm 1 to the matrix to obtain a sign component decomposition . Theorem I ensures that the columns of coincide with the columns of up to sign flips and permutations. In other words, the columns of compose the (unique) sign basis that generates . In summary, we can solve Problem 2.6 for any subspace that is spanned by a Schur independent family of sign vectors.
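For very small instances, the brute-force baseline can be written down directly, which is useful as a sanity check. The sketch below is not Algorithm 1: it simply enumerates sign vectors (normalized so that the first entry is $+1$) and keeps those that lie in the range of the input matrix. The function name, the tolerance, and the example matrices are illustrative assumptions, and the cost grows like $2^{n-1}$.

```python
import numpy as np
from itertools import product

def sign_vectors_in_range(M, tol=1e-9):
    """Enumerate sign vectors with first entry +1 that lie in range(M).

    Assumes M has full column rank. Exponential cost: only for tiny n.
    """
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    Q, _ = np.linalg.qr(M)                    # orthonormal basis for range(M)
    found = []
    for tail in product([1, -1], repeat=n - 1):
        x = np.array((1,) + tail, dtype=float)
        residual = x - Q @ (Q.T @ x)          # component of x outside range(M)
        if np.linalg.norm(residual) <= tol:
            found.append(x.astype(int))
    return found

# A Schur independent sign basis (three columns of a 4 x 4 Hadamard matrix).
S = np.array([[1, 1, 1],
              [1, 1, -1],
              [1, -1, 1],
              [1, -1, -1]])
G = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])              # invertible mixing matrix (det = 5)
M = S @ G                                     # problem data: range(M) = range(S)
found = sign_vectors_in_range(M)
```

In this example the only sign vectors in the subspace are the planted basis vectors and their negatives, which matches the uniqueness claim for Schur independent bases.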
2.5. Roadmap
We continue with a discussion about symmetric sign component decompositions in Section 3. In Section 4, we develop basic results about existence and uniqueness of asymmetric sign component decompositions. Section 5 explains how to compute a sign component decomposition. We turn to binary component decomposition in Section 6. Finally, in Section 7, we state some results on robustness of sign component decomposition, which we prove in the appendices. For a discussion of related work, see the companion paper [KT19, Sec. 8].
3. Symmetric sign component decomposition
This section contains a summary of the principal results from the companion paper [KT19]. These results play a core role in our study of asymmetric factorizations.
3.1. Signed permutations
Matrix factorizations are usually not fully determined because they are invariant under some group of symmetries. For example, consider the decomposition of a psd matrix as the outer product of two symmetric factors:
Each of the factorizations on the right is equally valid, because there is no constraint that forbids rotations.
For binary component decompositions, permutations compose the relevant symmetry group.
Definition 3.1 (Permutation).
A permutation on letters is an element of the symmetric group . A permutation acts on via the linear map . This linear map can be represented by the permutation matrix whose entries take the form where and are zero otherwise. A permutation matrix is orthogonal: .
For sign component decompositions, the signed permutations make up the relevant symmetry group.
Definition 3.2 (Signed permutation).
A signed permutation on letters is a pair consisting of a permutation on letters and a sign vector . The signed permutation acts on via the linear map . This linear map can also be represented by the signed permutation matrix whose entries satisfy when and are otherwise zero. Each signed permutation matrix is orthogonal.
3.2. Symmetric sign component decomposition
In the companion paper [KT19], we explored the problem of computing a (symmetric) sign component decomposition of a correlation matrix. This research provides the foundation for the asymmetric sign component decomposition. Let us take a moment to present the principal definitions and results from the associated work.
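Let $\mathbf{A} \in \mathbb{H}_n$ be a correlation matrix; that is, $\mathbf{A}$ is psd with all diagonal entries equal to one. We say that $\mathbf{A}$ has a symmetric sign component decomposition when
$$\mathbf{A} = \mathbf{S} \operatorname{diag}(\boldsymbol{\tau}) \mathbf{S}^{\mathsf{t}} \quad\text{where}\quad \mathbf{S} \in \{\pm 1\}^{n \times r} \quad\text{and}\quad \boldsymbol{\tau} \in \Delta_r^+.$$
In vector form,
$$\mathbf{A} = \sum_{i=1}^{r} \tau_i \mathbf{s}_i \mathbf{s}_i^{\mathsf{t}} \quad\text{where}\quad \mathbf{s}_i \in \{\pm 1\}^{n} \quad\text{and}\quad (\tau_1, \dots, \tau_r) \in \Delta_r^+.$$
The sign matrix $\mathbf{S}$ is called the sign component, while the positive diagonal matrix, $\operatorname{diag}(\boldsymbol{\tau})$, is a list of convex coefficients. Not all correlation matrices admit a symmetric sign component decomposition, nor does the factorization need to be uniquely determined; see [KT19] for a full discussion.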
The situation improves markedly when the sign component is Schur independent. In this case, the sign component decomposition is essentially unique, and we can compute it by means of an efficient algorithm [KT19, Thm. I].
Fact 3.3 (Kueng & Tropp).
Let be a correlation matrix that admits a sign component decomposition:
Then the sign component decomposition of is determined up to signed permutation. Moreover, with probability one, Algorithm 3 computes the sign component decomposition. That is, the output is a pair where the sign matrix and the convex coefficients (), for a signed permutation matrix .
Fact 3.4 (Kueng & Tropp).
Suppose that is a Schur independent sign matrix, and let be the orthogonal projector onto . Then
Fact 3.4 is a powerful tool for working with sign component decompositions. Indeed, we can compute the projector onto the range of a Schur independent sign matrix directly from any particular correlation matrix with . As a consequence, the identity (3.2) provides an alternative representation for the set of all correlation matrices with sign component , which allows us to optimize over this set. Fact 3.4 also plays a critical role in our method for computing an asymmetric sign component decomposition.
4. Existence and uniqueness of the asymmetric sign component decomposition
In this section, we begin our investigation of the asymmetric sign component decomposition. We lay out some of the basic questions, and we start to deliver the answers.
Existence: Which matrices admit a sign component decomposition?
Uniqueness: When is the sign component decomposition unique, modulo symmetries?
Computation: How can we find a sign component decomposition in polynomial time?
Robustness: How can we find a sign component decomposition from a noisy observation?
We quickly dispatch the first question, which concerns the existence of asymmetric sign component decompositions.
Proposition 4.1 (Sign component decomposition: Existence).
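Every matrix $\mathbf{B} \in \mathbb{R}^{n \times m}$ admits a sign component decomposition (2.3) with inner dimension $n$.
Proof. Let $\mathbf{S} \in \{\pm 1\}^{n \times n}$ be a nonsingular matrix of signs. Define the second factor $\mathbf{W} = (\mathbf{S}^{-1} \mathbf{B})^{\mathsf{t}}$, so that $\mathbf{S} \mathbf{W}^{\mathsf{t}} = \mathbf{B}$. ∎
As an aside, we remark that nonsingular sign matrices are ubiquitous. Indeed, a uniformly random element of $\{\pm 1\}^{n \times n}$ is nonsingular with exceedingly high probability [Tik18].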
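The existence argument is constructive, and it can be sketched in a few lines of NumPy. The snippet below draws random square sign matrices until it finds a nonsingular one and then solves for the weights; the seed, the dimensions, and the retry loop are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 3
B = rng.standard_normal((n, m))               # arbitrary target matrix

# Draw a random n x n sign matrix; retry in the unlikely event that it is singular.
while True:
    S = rng.choice([-1.0, 1.0], size=(n, n))
    if np.linalg.matrix_rank(S) == n:
        break

# Second factor: W^t = S^{-1} B, computed with a linear solve instead of an explicit inverse.
W = np.linalg.solve(S, B).T                   # shape (m, n)
```

Every nonsingular sign matrix yields a different factorization, which is why full-dimensional decompositions carry no identifying information about the matrix.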
Proposition 4.1 ensures that every matrix has an exorbitant number of sign component decompositions. Therefore, we need to burden the factorization with extra conditions before it is determined uniquely. We intend to focus on minimal factorizations, where the target matrix $\mathbf{B}$ has rank $r$, and the number of sign components coincides with the rank.
Like many other matrix factorizations, the sign component decomposition has some symmetries that we can never resolve. Before we can turn to the question of uniqueness, we need to discuss invariants of the factorization.
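For a signed permutation $(\pi, \boldsymbol{\xi})$ on $r$ letters with associated signed permutation matrix $\boldsymbol{\Pi} \in \mathbb{R}^{r \times r}$, we have
$$\mathbf{S} \mathbf{W}^{\mathsf{t}} = \mathbf{S} \boldsymbol{\Pi} \boldsymbol{\Pi}^{\mathsf{t}} \mathbf{W}^{\mathsf{t}} = (\mathbf{S} \boldsymbol{\Pi}) (\mathbf{W} \boldsymbol{\Pi})^{\mathsf{t}}.$$
Observe that $\mathbf{S} \boldsymbol{\Pi}$ remains a sign matrix. Therefore, $\mathbf{B} = \mathbf{S} \mathbf{W}^{\mathsf{t}}$ and $\mathbf{B} = (\mathbf{S} \boldsymbol{\Pi}) (\mathbf{W} \boldsymbol{\Pi})^{\mathsf{t}}$ are both sign component decompositions of $\mathbf{B}$.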
We have no cause to prefer one of the sign component decompositions induced by a signed permutation over the others. Thus, it is appropriate to treat them all as equivalent.
Definition 4.2 (Sign component decomposition: Equivalence).
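Suppose that $\mathbf{B} = \mathbf{S} \mathbf{W}^{\mathsf{t}}$ and $\mathbf{B} = \tilde{\mathbf{S}} \tilde{\mathbf{W}}^{\mathsf{t}}$ are two sign component decompositions (2.3) with the same inner dimension $r$. We say that the decompositions are equivalent if there is a signed permutation matrix $\boldsymbol{\Pi}$ for which $\tilde{\mathbf{S}} = \mathbf{S} \boldsymbol{\Pi}$ and $\tilde{\mathbf{W}} = \mathbf{W} \boldsymbol{\Pi}$.
Alternatively, consider two sign component decompositions $\mathbf{B} = \sum_{i=1}^r \mathbf{s}_i \mathbf{w}_i^{\mathsf{t}}$ and $\mathbf{B} = \sum_{i=1}^r \tilde{\mathbf{s}}_i \tilde{\mathbf{w}}_i^{\mathsf{t}}$ with the same number of terms. The decompositions are equivalent if there is a signed permutation $(\pi, \boldsymbol{\xi})$ on $r$ letters for which $\tilde{\mathbf{s}}_i = \xi_i \mathbf{s}_{\pi(i)}$ and $\tilde{\mathbf{w}}_i = \xi_i \mathbf{w}_{\pi(i)}$ for each $i$.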
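The invariance under signed permutations is easy to check numerically. The sketch below builds a signed permutation matrix under one possible indexing convention (column $j$ equals a signed standard basis vector), which is an assumption and may differ from the convention in Definition 3.2; the helper name and the example factors are hypothetical.

```python
import numpy as np

def signed_permutation_matrix(perm, signs):
    """Matrix whose j-th column is signs[j] * e_{perm[j]} (one possible convention)."""
    r = len(perm)
    P = np.zeros((r, r))
    for j, (i, s) in enumerate(zip(perm, signs)):
        P[i, j] = s
    return P

S = np.array([[1, 1, 1],
              [1, 1, -1],
              [1, -1, 1],
              [1, -1, -1]], dtype=float)      # hypothetical sign component
W = np.array([[1.0, 0.0, 2.0],
              [0.5, -1.0, 0.0],
              [0.0, 3.0, 1.0]])               # hypothetical weights

Pi = signed_permutation_matrix(perm=[2, 0, 1], signs=[1, -1, -1])
S2, W2 = S @ Pi, W @ Pi                       # an equivalent pair of factors
```

Both pairs of factors multiply to the same matrix, so no algorithm can distinguish between them from the product alone.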
4.4. The role of Schur independence
As we have just seen, signed permutations preserve the class of sign component decompositions of a given matrix. Meanwhile, the proof of Proposition 4.1 warns us that we can sometimes map one sign component decomposition to an inequivalent decomposition via an invertible transformation. Remarkably, we can preclude the latter phenomenon by narrowing our attention to Schur independent sign matrices. In this case, signed permutations are the only invertible transformations that respect the sign structure.
Proposition 4.3 (Schur independence: Transformations).
Let $\mathbf{S} \in \{\pm 1\}^{n \times r}$ be a Schur independent sign matrix, and let $\mathbf{Q} \in \mathbb{R}^{r \times r}$ be an invertible matrix. Then $\mathbf{S} \mathbf{Q}$ is a sign matrix if and only if $\mathbf{Q}$ is a signed permutation.
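Proof. If $\mathbf{Q}$ is a signed permutation, then it is immediate that $\mathbf{S} \mathbf{Q}$ is a sign matrix. The reverse implication is the more interesting fact.
Introduce notation for the columns of the matrices under discussion:
$$\mathbf{S} = [\mathbf{s}_1 \ \dots \ \mathbf{s}_r] \quad\text{and}\quad \mathbf{Q} = [\mathbf{q}_1 \ \dots \ \mathbf{q}_r] \quad\text{with}\quad \mathbf{q}_j = (q_{1j}, \dots, q_{rj})^{\mathsf{t}}.$$
For each index $j$, the $j$th column of the matrix $\mathbf{S} \mathbf{Q}$ satisfies
$$\mathbf{S} \mathbf{q}_j = \sum_{i=1}^{r} q_{ij} \mathbf{s}_i.$$
By assumption, $\mathbf{S} \mathbf{q}_j$ is a sign vector, so
$$\mathbf{e} = (\mathbf{S} \mathbf{q}_j) \odot (\mathbf{S} \mathbf{q}_j) = \Big( \sum_{i=1}^{r} q_{ij}^2 \Big) \mathbf{e} + 2 \sum_{i < k} q_{ij} q_{kj} \, \mathbf{s}_i \odot \mathbf{s}_k.$$
Schur independence of the matrix $\mathbf{S}$ ensures that the family $\{\mathbf{e}\} \cup \{\mathbf{s}_i \odot \mathbf{s}_k : i < k\}$ is linearly independent. As a consequence,
$$\sum_{i=1}^{r} q_{ij}^2 = 1 \quad\text{and}\quad q_{ij} q_{kj} = 0 \quad\text{for all } i < k.$$
Since $\mathbf{q}_j$ solves this quadratic system, it must be a signed standard basis vector: $\mathbf{q}_j = \xi_j \mathbf{e}_{\pi(j)}$ for a sign $\xi_j \in \{\pm 1\}$ and an index $\pi(j) \in \{1, \dots, r\}$. Since the matrix $\mathbf{Q}$ is invertible, it must be the case that $\pi$ is a permutation on $r$ letters. It follows that $\mathbf{Q}$ is a signed permutation. ∎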
With this preparation, we can delineate circumstances where the (minimal) sign component decomposition of a low-rank matrix is unique up to equivalence.
Theorem 4.4 (Sign component decomposition: Uniqueness).
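Consider a matrix $\mathbf{B} \in \mathbb{R}^{n \times m}$ that admits a sign component decomposition $\mathbf{B} = \mathbf{S} \mathbf{W}^{\mathsf{t}}$. Assume that
The sign matrix $\mathbf{S} \in \{\pm 1\}^{n \times r}$ is Schur independent;
The weight matrix $\mathbf{W} \in \mathbb{R}^{m \times r}$ has full column rank.
Then all minimal sign component decompositions of $\mathbf{B}$ (with inner dimension $r$) are equivalent.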
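Proof. The sign matrix $\mathbf{S}$ has full column rank because it is Schur independent (Fact 2.3(1)), while the weight matrix $\mathbf{W}$ has full column rank by assumption. We discover that the matrix $\mathbf{B} = \mathbf{S} \mathbf{W}^{\mathsf{t}}$ has rank $r$. Therefore, every sign component decomposition of $\mathbf{B}$ has inner dimension at least $r$, and the distinguished decomposition $\mathbf{B} = \mathbf{S} \mathbf{W}^{\mathsf{t}}$ has the minimal inner dimension.
Suppose that $\mathbf{B} = \tilde{\mathbf{S}} \tilde{\mathbf{W}}^{\mathsf{t}}$ is another sign component decomposition with inner dimension $r$. Since $\mathbf{B}$ has rank $r$, both factors $\tilde{\mathbf{S}}$ and $\tilde{\mathbf{W}}$ must have full column rank. As a consequence, the ranges of $\tilde{\mathbf{S}}$ and $\mathbf{S}$ both coincide with the range of $\mathbf{B}$, so there is an invertible transformation $\mathbf{Q} \in \mathbb{R}^{r \times r}$ for which $\tilde{\mathbf{S}} = \mathbf{S} \mathbf{Q}$. Since $\mathbf{S}$ is a Schur independent sign matrix and $\mathbf{S} \mathbf{Q}$ is a sign matrix, Proposition 4.3 forces $\mathbf{Q}$ to be a signed permutation, which we denote by $\boldsymbol{\Pi}$. Now, we have the chain of identities
$$\mathbf{S} \mathbf{W}^{\mathsf{t}} = \mathbf{B} = \tilde{\mathbf{S}} \tilde{\mathbf{W}}^{\mathsf{t}} = \mathbf{S} \boldsymbol{\Pi} \tilde{\mathbf{W}}^{\mathsf{t}}.$$
Since the matrix $\mathbf{S}$ has full column rank, we can cancel $\mathbf{S}$ to see that $\mathbf{W}^{\mathsf{t}} = \boldsymbol{\Pi} \tilde{\mathbf{W}}^{\mathsf{t}}$. The signed permutation $\boldsymbol{\Pi}$ is orthogonal, so it follows that $\tilde{\mathbf{W}} = \mathbf{W} \boldsymbol{\Pi}$.
To summarize, we have been given two sign component decompositions with inner dimension $r$. We have shown that they are related by $\tilde{\mathbf{S}} = \mathbf{S} \boldsymbol{\Pi}$ and $\tilde{\mathbf{W}} = \mathbf{W} \boldsymbol{\Pi}$ for a signed permutation $\boldsymbol{\Pi}$. Therefore, the two decompositions are equivalent. ∎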
Theorem 4.4 describes conditions under which the minimal sign component decomposition of a matrix is uniquely determined. It is natural to demand that both the left and the right factors have full column rank. The geometry of the factorization problem dictates the stronger requirement that the sign matrix is Schur independent. As we have discussed, most families of sign vectors are Schur independent, so this condition holds for a rich class of matrices.
5. Computation of the asymmetric sign component decomposition
In this section, we derive and justify Algorithm 1, which computes the asymmetric sign component decomposition of a matrix whose sign component is Schur independent. We establish the following result.
Theorem 5.1 (Sign component decomposition: Computation).
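Consider a matrix $\mathbf{B} \in \mathbb{R}^{n \times m}$ that admits a sign component decomposition $\mathbf{B} = \mathbf{S} \mathbf{W}^{\mathsf{t}}$. Assume that
The sign matrix $\mathbf{S} \in \{\pm 1\}^{n \times r}$ is Schur independent;
The weight matrix $\mathbf{W} \in \mathbb{R}^{m \times r}$ has full column rank.
Then, with probability one, Algorithm 1 identifies the minimal sign component decomposition, up to signed permutation. That is, the output is a pair $(\tilde{\mathbf{S}}, \tilde{\mathbf{W}})$ where $\tilde{\mathbf{S}} = \mathbf{S} \boldsymbol{\Pi}$ and $\tilde{\mathbf{W}} = \mathbf{W} \boldsymbol{\Pi}$ for a signed permutation $\boldsymbol{\Pi}$.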
5.1. Factorization and semidefinite programming
Although constrained matrix factorization is viewed as a challenging problem, certain aspects are simpler than they appear. In particular, we can expose properties of the components of a matrix factorization by means of a semidefinite constraint.
Fact 5.2 (Factorization constraint).
Let be a matrix. The semidefinite relation
enforces a factorization of in the following sense.
We omit the easy proof, because we do not use this result directly.
The factorization constraint (5.1) does not give us direct access to the factors and . Nevertheless, we can place restrictions on the variables and to limit the possible values that the factors and can take. If the conditions are strong enough, it is sometimes possible to determine the factors completely, modulo symmetries.
Example 5.3 (From SVD to eigenvalue decomposition).
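Let $\mathbf{B} \in \mathbb{R}^{n \times m}$ be a matrix. Consider the semidefinite program
$$\operatorname*{minimize}_{\mathbf{X}, \mathbf{Y}} \quad \tfrac{1}{2} \big( \operatorname{trace}(\mathbf{X}) + \operatorname{trace}(\mathbf{Y}) \big) \quad\text{subject to}\quad \begin{bmatrix} \mathbf{X} & \mathbf{B} \\ \mathbf{B}^{\mathsf{t}} & \mathbf{Y} \end{bmatrix} \succcurlyeq \mathbf{0}.$$
Every minimizer takes the form $\mathbf{X} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{U}^{\mathsf{t}}$ and $\mathbf{Y} = \mathbf{V} \boldsymbol{\Sigma} \mathbf{V}^{\mathsf{t}}$ where $\mathbf{B} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^{\mathsf{t}}$ is a singular value decomposition. We can find the left and right singular vectors of $\mathbf{B}$ by computing the eigenvalue decompositions of $\mathbf{X}$ and $\mathbf{Y}$. As a side note, the minimal value of the optimization problem is the Schatten 1-norm (i.e., the sum of singular values) of the matrix $\mathbf{B}$.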
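The claimed form of the minimizer can be checked numerically without an SDP solver. The sketch below assumes that the program minimizes half the sum of the traces of the two diagonal blocks subject to the block matrix being psd; it verifies only that the pair built from the SVD is feasible and attains the Schatten 1-norm, not that it is optimal.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 3))

U, sigma, Vt = np.linalg.svd(B, full_matrices=False)
X = (U * sigma) @ U.T                         # U diag(sigma) U^t
Y = (Vt.T * sigma) @ Vt                       # V diag(sigma) V^t

# The block matrix equals [U; V] diag(sigma) [U; V]^t, so it is psd by construction.
block = np.block([[X, B], [B.T, Y]])
min_eig = np.linalg.eigvalsh(block).min()

# Assumed objective: (trace X + trace Y) / 2, which evaluates to the sum of singular values.
value = 0.5 * (np.trace(X) + np.trace(Y))
```

The eigenvectors of `X` and `Y` associated with nonzero eigenvalues recover the left and right singular vectors, which is the sense in which constraints on the diagonal blocks expose the factors.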
5.2. Overview of algorithm and proof of Theorem 5.1
Given an input matrix with a Schur independent sign component , our aim is to find the (unknown) asymmetric sign component decomposition. We reduce this challenge to the solved problem of computing a symmetric sign component decomposition of a correlation matrix. In this section, we outline the procedure, along with the proof of Theorem 5.1. Algorithm 1 encapsulates the computations, and some details of the argument are postponed to the next sections.
The first step is to construct a correlation matrix whose symmetric sign component decomposition has the same sign factor as the input matrix . To that end, construct the orthogonal projector onto the range of . Then solve the semidefinite program (SDP)
Fact 5.2 shows that the semidefinite constraint in (5.2) links the variables and to a factorization of . Meanwhile, courtesy of Fact 3.4, the equality constraints in (5.2) force the variable to be a correlation matrix whose range equals the range of . The following lemma packages these claims.
Proposition 5.4 (Factorization SDP).
The next step is to extract the sign component of the correlation matrix that solves (5.2). According to Proposition 5.4, the correlation matrix meets the requirements of Fact 3.3. Therefore, we can invoke Algorithm 3, the symmetric sign component decomposition method, to obtain a factorization
We cannot resolve the signed permutation, but the computed sign component is equivalent with the designated sign component .
To complete the sign component decomposition, it remains to determine the weight matrix. We may do so by solving the linear system
The pair yields a sign component decomposition of the matrix that is equivalent with the specified decomposition . This observation completes the proof of Theorem 5.1.
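The final step, recovering the weights once a sign component is known, can be sketched with a least-squares solve. This example assumes the linear system has the form "computed sign matrix times transposed weights equals the input matrix"; the sign component is taken as given (here, a signed and permuted copy of the planted one) rather than computed by the symmetric algorithm, and all matrices are made-up values.

```python
import numpy as np

S = np.array([[1, 1, 1],
              [1, 1, -1],
              [1, -1, 1],
              [1, -1, -1]], dtype=float)      # planted sign component (hypothetical)
W = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0],
              [3.0, 1.0, 0.0],
              [0.5, -2.0, 1.0]])              # planted weights with full column rank
B = S @ W.T

# Stand-in for the output of the symmetric step: columns permuted and sign-flipped.
order, signs = [1, 2, 0], np.array([1.0, -1.0, 1.0])
S_tilde = S[:, order] * signs

# Solve S_tilde @ X = B for X = W_tilde^t; the solution is unique because
# S_tilde has full column rank.
Wt_tilde, *_ = np.linalg.lstsq(S_tilde, B, rcond=None)
W_tilde = Wt_tilde.T
```

The recovered weights inherit the same permutation and sign flips as the sign component, so the pair is equivalent to the planted decomposition.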
5.3. Positive-semidefinite matrices
Fact 5.5 (Conjugation rule).
Conjugation respects the semidefinite order in the following sense.
If , then for each matrix with compatible dimensions.
If has full column rank and , then .
Fact 5.6 (Schur complements).
Assume that is a (strictly) positive-definite matrix. Then
Related results hold when is merely psd.
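The Schur complement criterion can be illustrated numerically in its standard form: when the lower-right block is strictly positive definite, the block matrix is psd exactly when the upper-left block dominates the Schur complement. Which block the paper assumes to be positive definite is an assumption here, as are the helper names and test matrices.

```python
import numpy as np

def is_psd(M, tol=1e-9):
    """Numerical psd test via the smallest eigenvalue of the symmetrized matrix."""
    return np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 2))
G = rng.standard_normal((2, 2))
Y = G @ G.T + np.eye(2)                       # strictly positive definite block

schur = B @ np.linalg.solve(Y, B.T)           # B Y^{-1} B^t

def block(X):
    return np.block([[X, B], [B.T, Y]])

X_ok = schur + 0.1 * np.eye(4)                # X - B Y^{-1} B^t is psd
X_bad = schur - 0.1 * np.eye(4)               # X - B Y^{-1} B^t is not psd
```

The boundary case, where the upper-left block equals the Schur complement exactly, is the one that matters for trace minimization problems of the kind considered in this section.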
Fact 5.7 (Trace is monotone).
Let and be psd matrices that satisfy . Then , and equality holds precisely when .
5.4. The Factorization SDP
Lemma 5.8 (Factorization SDP).
Recall that for a Schur independent sign matrix and a matrix with full column rank.
First, we argue that a feasible point of the factorization SDP (5.2) must be a correlation matrix of the form
Indeed, the block matrix constraint in (5.2) ensures that , and the constraint makes a correlation matrix. At the same time, since the matrix has full column rank,
Fact 3.4 shows that the constraint isolates the family of correlation matrices. This establishes the claim.
Now, we can recognize that is a strictly positive-definite matrix. Indeed, owing to (5.4), the relation would imply that the corresponding column of the weight matrix equals zero, but this is impossible because has full column rank.
The objective function, , of the semidefinite program (5.2) is strictly monotone with respect to the semidefinite order (Fact 5.7). The variable is otherwise unconstrained, so the SDP achieves its minimum if and only if
It remains to determine the vector that minimizes the trace of .
To that end, calculate that
Equality holds if and only if the quantities are identical for all indices . Since , we may conclude that the minimizer has coordinates
In summary, we have shown that the unique matrices that optimize (5.2) take the form
Identify the diagonal matrix from the statement to complete the proof. ∎
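The last optimization step in the proof can be sanity-checked numerically. The sketch assumes, as the surrounding derivation suggests, that the quantity being minimized over the open simplex is the sum of squared column norms of the weight matrix divided by the convex coefficients; under that assumption, the Cauchy–Schwarz inequality gives a minimizer proportional to the column norms. The variable names and the random comparison points are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.standard_normal((6, 3))               # hypothetical weight matrix
norms = np.linalg.norm(W, axis=0)             # column norms ||w_i||

def objective(tau):
    """Assumed objective: sum_i ||w_i||^2 / tau_i."""
    return np.sum(norms ** 2 / tau)

tau_star = norms / norms.sum()                # candidate minimizer on the simplex
best = objective(tau_star)                    # equals (sum_i ||w_i||)^2

# Compare against random points of the simplex.
trials = rng.dirichlet(np.ones(3), size=1000)
gap = min(objective(t) for t in trials) - best
```

The candidate value is the square of the sum of the column norms, and no sampled point of the simplex improves on it.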
6. Asymmetric binary component decomposition
In this section, we develop a procedure (Algorithm 2) for computing an asymmetric binary component decomposition (2.5)–(2.6). We prove Theorem II, which states that the algorithm succeeds under a Schur independence condition. Our approach reduces the problem of computing a binary component decomposition to the problem of computing a sign component decomposition of a related matrix.
6.1. Correspondence between binary vectors and sign vectors
As we have discussed, there is a one-to-one correspondence between sign vectors and binary vectors (2.7). The correspondence between asymmetric sign component decompositions and binary component decompositions, however, is more subtle because they are invariant under different transformations. Indeed, $\mathbf{s} \mathbf{w}^{\mathsf{t}}$ does not change if we flip the sign of both $\mathbf{s}$ and $\mathbf{w}$. On the other hand, the matrix $\mathbf{z} \mathbf{w}^{\mathsf{t}}$ completely determines the vectors $\mathbf{z}$ and $\mathbf{w}$.
6.2. Reducing binary component decomposition to sign component decomposition
Given a matrix $\mathbf{C}$ that has a binary component decomposition, we can apply a simple transformation to construct a related matrix $\mathbf{B}$ that admits a sign component decomposition.
Proposition 6.1 (Binary component decomposition: Reduction).
Consider a matrix that has a binary component decomposition
Construct the matrix
Then admits a sign component decomposition with inner dimension :
Recall that is a matrix of ones with appropriate dimensions.
The result follows from a straightforward calculation: