Identifiability of an X-rank decomposition of polynomial maps

03/04/2016 ∙ by Pierre Comon, et al. ∙ The University of Chicago ∙ GIPSA-Lab

In this paper, we study a polynomial decomposition model that arises in problems of system identification, signal processing and machine learning. We show that this decomposition is a special case of the X-rank decomposition, a powerful novel concept in algebraic geometry that generalizes the tensor CP decomposition. We prove new results on generic/maximal rank and on identifiability of a particular polynomial decomposition model. In the paper, we try to make the results and basic tools accessible to a general audience (assuming no knowledge of algebraic geometry or its prerequisites).


1 Introduction: polynomial decompositions

1.1 Notation

We use boldface letters ( , …) for vectors, and boldface capital letters ( , , …) for matrices. Given an -dimensional vector space over a field , fix a basis for ; then a vector can be identified with an matrix, i.e., , where denotes the transpose. Thus, stands for the matrix multiplication (footnote 1: note that this is not the inner product in the case ). By we denote the space of multivariate polynomials in variables of total degree , and we write an element of in the form , where .

As usual, we use for the Cartesian product of sets, and a shorthand notation . We use for the direct sum of vector spaces (footnote 2: i.e. the Cartesian product equipped with the vector space structure), and for the tensor product. By or we denote the space of -th order symmetric tensors on an -dimensional vector space (i.e., symmetric tensors). In , means .

1.2 Model and examples

Let be or . Consider a multivariate polynomial map , i.e., a vector of multivariate polynomials of total degree in variables (i.e., each ). Without loss of generality, in this paper, we assume that (i.e., the constant part of is zero).

Following [21], we say that has a decoupled representation, if it can be expressed as

(1)

where , , and where are univariate polynomials over . The problem is often to find a decoupled representation (1) with minimum.
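Because the symbols of display (1) are not legible above, the following is only a hedged sketch of the form that the surrounding description suggests (linear forms of the input, univariate polynomials without constant term, and a linear recombination). All symbol names here ($f$, $\mathbf{u}$, $\mathbf{v}_k$, $\mathbf{w}_k$, $g_k$, $c_{j,k}$, $r$, $m$, $n$, $d$) are assumptions made for illustration and need not match the authors' notation.

```latex
f(\mathbf{u}) \;=\; \sum_{k=1}^{r} \mathbf{w}_k \, g_k\!\left(\mathbf{v}_k^{\top}\mathbf{u}\right),
\qquad \mathbf{v}_k \in \mathbb{K}^{m}, \quad \mathbf{w}_k \in \mathbb{K}^{n},
\qquad g_k(t) \;=\; c_{1,k}\, t + c_{2,k}\, t^{2} + \dots + c_{d,k}\, t^{d}.
```

Under this reading, the number of terms $r$ is the quantity to be minimized.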

Example 1 ().

In this case, is a linear map, i.e. with . Without loss of generality we can assume , and (1) becomes a low-rank factorization (footnote 3: Example 1 shows that (1) can be interpreted as a “low-rank factorization” of a nonlinear map).

The next special case is one of the key examples in this paper.

Example 2 ().

In this case is a single polynomial , and (1) becomes

(2)

since we can assume that . An example of (2) is shown in Fig. 5.

The decomposition (2)

  • is known as sum of ridge functions or plane waves [30, 32] in approximation theory;

  • corresponds to ridge polynomial neural networks [37] (RPNs) in machine learning;

  • appears in blind source separation problems in signal processing [18].

Figure 5 (panels (a) to (d)): an example of decomposition (2).

Next, the homogeneous versions of Eq. 1 and Eq. 2 are well-known in algebraic geometry.

Example 3 (, — homogeneous).

If is homogeneous of degree , then must also be homogeneous, i.e. . Hence, the decomposition (1) becomes

(3)

The decomposition (3) is known as the Waring decomposition, and has been the subject of numerous studies in the literature [27, 1]. Via the correspondence between homogeneous polynomials and symmetric tensors (see Section A.1), (3) becomes the symmetric tensor decomposition

(4)

where is the symmetric tensor corresponding to the polynomial in .
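To make the Waring decomposition concrete, here is a minimal sketch that checks a two-term decomposition of a quadratic form exactly, using rational arithmetic. The function name `waring_eval`, the chosen polynomial u1*u2 and the particular coefficients are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def waring_eval(terms, degree, u):
    """Evaluate sum_k c_k * (v_k . u)**degree for terms = [(c_k, v_k), ...]."""
    total = Fraction(0)
    for c, v in terms:
        linear_form = sum(vi * ui for vi, ui in zip(v, u))
        total += Fraction(c) * linear_form ** degree
    return total


# Hypothetical example: u1*u2 = 1/4 (u1 + u2)^2 - 1/4 (u1 - u2)^2,
# i.e. a sum of two squares of linear forms.
terms = [(Fraction(1, 4), (1, 1)), (Fraction(-1, 4), (1, -1))]

for u in [(2, 3), (-1, 5), (7, 0)]:
    assert waring_eval(terms, 2, u) == u[0] * u[1]
```

The same helper evaluates any candidate decomposition of the form (3) once the coefficients and linear forms are given, which is convenient for sanity-checking a decomposition at a few points.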

For the homogeneous case, the general decomposition (for ) has also been considered.

Example 4 (, — homogeneous).

As in Example 3, (1) can be rewritten as

(5)

The decomposition (5) is exactly the simultaneous Waring decomposition of homogeneous polynomials (equivalently, CP decomposition of a partially symmetric tensor).

Example 5 (the general case, , — non-homogeneous).

As summarized in [21], the general decomposition (1) appears in the field of nonlinear system identification [36, 25]. A common problem in identification (parameter estimation) for several challenging nonlinear block-structured systems (parallel Wiener-Hammerstein [36] and nonlinear feedback [41] models) is to decompose a nonlinear function (represented by a polynomial) in the form (1).

Remark 1

In the system identification literature ([21]), the decomposition (1) is often written in a compact form

where , and defined as . Also, a block diagram for decomposition (1) (given in Fig. 6) is often used, where the “input” variables are transformed by a linear transformation, followed by component-wise nonlinear transformations. The “outputs” are obtained by linear combinations of the results of the nonlinear transformation.
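The three stages of the block diagram (linear transformation, component-wise nonlinearity, linear recombination) can be sketched in a few lines. This is only an illustration under assumed conventions: the names `decoupled_eval`, `V`, `W`, `G` and the storage of each univariate polynomial as a coefficient list without constant term are hypothetical choices, not the authors' notation.

```python
def poly_eval(coeffs, t):
    """Evaluate g(t) = coeffs[0]*t + coeffs[1]*t**2 + ... (no constant term)."""
    return sum(c * t ** (j + 1) for j, c in enumerate(coeffs))


def decoupled_eval(V, W, G, u):
    """Evaluate a decoupled representation at the input vector u.

    V: list of r input-side vectors (each of length m)
    W: list of r output-side vectors (each of length n)
    G: list of r coefficient lists, one univariate polynomial per branch
    """
    out = [0] * len(W[0])
    for v, w, g in zip(V, W, G):
        t = sum(vi * ui for vi, ui in zip(v, u))      # linear transformation
        z = poly_eval(g, t)                           # component-wise nonlinearity
        out = [o + wi * z for o, wi in zip(out, w)]   # linear recombination
    return out


# Hypothetical example with m = 2 inputs, n = 2 outputs, r = 2 branches.
V = [(1, 1), (1, -1)]
W = [(1, 0), (0, 2)]
G = [[0, 1], [1, 0, 1]]   # g_1(t) = t^2, g_2(t) = t + t^3
print(decoupled_eval(V, W, G, (2, 1)))   # prints [9, 4]
```

Keeping the branches in a loop, rather than forming matrices, mirrors the sum over the terms of the decomposition and avoids any dependency beyond the standard library.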

Figure 6: Representation of a polynomial decomposition.

1.3 Goals and previous works

When using model (1), a few natural theoretical questions arise that are important for understanding the limits of applicability of the model.

  1. When is the model identifiable (i.e., when is the decomposition (1) unique)?

  2. What is the upper bound on in (1) needed to represent any polynomial?

  3. What is the typical (for a “random” ) behavior of in the shortest decomposition?

As for the special (homogeneous) cases of decomposition (1) (Examples 1, 3, 4), all three cases have seen rapid development over the last two decades, and many results are available. In this paper, we address the non-homogeneous case (Examples 2 and 5), where very few results are available (listed below).

Bounds on and typical behavior

This question was considered only for , in the papers [34, 35, 6]. The best result shows that any can be decomposed as (2) whenever

(6)

where the bound (6) is valid for , and for certain finite fields (footnote 4: bound (6) is better than a naive bound, namely the number of monomials in the highest degree part of ). The typical behavior of in the shortest decomposition is known only for the case and [34] (the case of bivariate polynomials).
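The naive bound mentioned in footnote 4, the number of monomials in the highest degree part, is easy to tabulate. The sketch below counts monomials of degree exactly d in m variables with a binomial coefficient; the names `m`, `d` and `num_monomials` are assumptions for illustration, and the formula for bound (6) itself is not reproduced here because it is not legible in the text above.

```python
from math import comb


def num_monomials(m, d):
    """Number of monomials of total degree exactly d in m variables."""
    return comb(m + d - 1, d)


# Naive upper bounds on the number of terms for a few small cases.
for m, d in [(2, 3), (3, 3), (3, 4)]:
    print(f"m={m}, d={d}: naive bound {num_monomials(m, d)}")
```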

Uniqueness

The uniqueness of representations (1) has hardly been studied. The authors of [21] suggested constructing a structured tensor from the coefficients of polynomials. Based on a Kruskal-type condition for unstructured tensors, they propose a bound for generic uniqueness that depends on . This bound is, however, applicable only to unstructured tensors, and not to the decomposition (1), as we argue in Remark 3.

1.4 Contribution and structure of this paper

In this paper, we show that the decomposition (1) can be viewed as a special case of X-rank decomposition. The notion of X-rank (or rank with respect to a variety ) is a powerful concept developed in the field of algebraic geometry that generalizes matrix rank, tensor rank, symmetric tensor rank and other notions of rank. The questions raised in Section 1.3 can be addressed in the framework of X-rank and correspond to finding maximal, typical, generic ranks and to checking -identifiability (generic uniqueness). In particular, we:

  1. Obtain results on identifiability and partial identifiability of (1).

  2. Determine the value of generic rank for some special cases of .

  3. Obtain a new bound on (for or ) that is better than (6).

Although in this paper we do not develop decomposition algorithms (see [21], [41], [40] for available algorithms), we believe that the ideas may lead to new or improved algorithms.

In Section 2, we introduce the concept of X-rank decompositions and review recent results. We prefer a deliberately simple exposition and hope that Section 2 may serve as an entry point to the literature on X-rank for a wider audience, including applied mathematicians and engineers. In Section 3, we recall the definition and known results on generic uniqueness (identifiability), and prove equivalence of different definitions appearing in the literature. In Section 4, we introduce Veronese scrolls, show that decompositions (1) and (2) are related to X-rank decompositions for Veronese scrolls, and give defining equations for this variety. Section 5 contains the main results of the paper, including identifiability of Veronese scrolls and polynomial decompositions, dimensions of secant varieties, and results on generic ranks.

2 X-rank decompositions

The concept of X-rank (or rank with respect to a variety) was probably first proposed in [42], and popularized in [7, 28]. In this section we give key definitions and basic results, in a simplified form. In particular, we avoid the use of projective varieties whenever possible.

2.1 X-rank: definitions

Consider an -dimensional vector space (footnote 5: for simplicity, one can think that ) over , where is or . Assume that a subset is fixed that satisfies the following conditions.

Assumption 1.

is scale-invariant, i.e. and implies .

Assumption 2.

is non-degenerate, i.e. it is not contained in any hyperplane of .

Assumption 3.

is an algebraic variety, i.e. the zero set of a system of polynomial equations (see also Section A.2).

Definition 1

Given a subset , the X-rank of any vector is defined as the smallest number of rank-one elements, such that can be represented as their sum:

(7)

Such a decomposition with the minimal possible number of terms is called the X-rank decomposition. (The rank of , by convention, is zero.)

Assumption 1 guarantees that the X-rank is compatible with linear operations, whereas Assumption 2 ensures that any vector has an X-rank decomposition and that the X-rank does not exceed . Assumption 3 allows for an algebraic analysis of X-rank decompositions.

The X-rank decomposition is illustrated in Fig. 7. It is also similar in spirit to sparse (atomic) decompositions, which have appeared recently in other branches of applied mathematics [11].

Figure 7: Vector can be decomposed into the sum of 2 elements of the variety .

In fact, Assumptions 3 and 1 imply that is an affine cone of a projective algebraic variety (footnote 6: where is the projective space). The projective variety is the usual starting point in the definition of X-rank, see [42, 7, 28]. In this paper, however, we prefer to work and give definitions in terms of the affine variety , which simplifies some expressions (as we will show later). One only has to bear in mind that . To avoid pathological phenomena and also for convenience of using algebraic geometry, the following assumption is often imposed.

Assumption 4.

is an irreducible variety (see Section A.2).

Finally, for real varieties, the following assumption is often added, to avoid unexpected phenomena and make use of the powerful tools from complex algebraic geometry.

Assumption 5.

The complex variety is defined by polynomial equations with real coefficients. In addition, the corresponding real variety contains a smooth point of (see Section A.2).

2.2 Examples

The basic examples considered in Example 1, Example 3 and Example 4 fit in the framework of X-rank, and are explained in Table 1. All the examples in Table 1 satisfy Assumptions 1 to 5.

Ambient space ()             Variety
tensor                       Segre variety
symmetric tensor             Veronese variety
several symmetric tensors    Segre-Veronese variety
Table 1: Varieties and X-ranks

The dimension of the variety of rank-one elements reflects the number of degrees of freedom in the parameterization of . Take, for instance, the case of non-symmetric tensors (first row in Table 1). It is parameterized by parameters, but there are redundancies since any element of has many representations in the form , due to exchange of scaling. The other examples in Table 1 follow the same pattern: the dimension of is equal to the number of parameters minus the number of “dependencies”.
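The counting argument for non-symmetric tensors can be checked on a small example. In the sketch below, a rank-one tensor of order k is built from k vectors; rescaling two of the factors in opposite directions leaves the tensor unchanged, which accounts for k - 1 redundant parameters. The helper names `outer` and `segre_cone_dim` and the sample vectors are hypothetical, and the dimension formula is the standard count for the affine cone over the Segre variety (stated here as an assumption about the convention used in Table 1).

```python
from fractions import Fraction
from itertools import product


def outer(*vectors):
    """Rank-one tensor built from the given vectors, stored as {index: entry}."""
    tensor = {}
    for idx in product(*(range(len(v)) for v in vectors)):
        entry = Fraction(1)
        for v, i in zip(vectors, idx):
            entry *= v[i]
        tensor[idx] = entry
    return tensor


def segre_cone_dim(dims):
    """Number of parameters minus the (order - 1) scaling redundancies."""
    return sum(dims) - (len(dims) - 1)


a, b, c = (1, 2), (3, 4), (5, 6)
rescaled = outer((2, 4), (Fraction(3, 2), 2), c)   # (2a) and (b/2): same tensor
assert outer(a, b, c) == rescaled

print(segre_cone_dim((2, 2)))      # rank-one 2x2 matrices
print(segre_cone_dim((3, 3, 3)))   # rank-one 3x3x3 tensors
```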

2.3 Maximal, typical ranks and basic relations

First, we introduce two notions:

Definition 2 (Maximal rank)

The maximal X-rank is defined as the smallest such that , and denoted by .

Definition 3

A rank is called typical if contains an open Euclidean ball in .

Since is a semialgebraic set [33], a rank is typical if and only if has nonzero Lebesgue measure. Hence, a rank is typical if and only if it appears with nonzero probability (if the vectors of are drawn from an absolutely continuous probability distribution). The following properties of typical ranks over and are known.

Lemma 1

If , there exists only one typical rank, which is called the generic rank, and denoted by . Moreover, the elements of rank are Zariski-dense in , i.e. there exists an algebraic subvariety such that for any .

Theorem 1 ([5])

Over the real field, the typical ranks form a contiguous set, i.e. there exist numbers and such that:

  • Any such that is typical;

  • Any such that or is not typical.

Next, the following theorem relates maximal and typical/generic ranks.

Theorem 2 ([7])

  • If , then .

  • If , then .

Finally, there is a relation between real typical ranks and generic complex ranks.

Theorem 3 ([7])

Let be a real variety satisfying Assumptions 1 to 5, and be its complexification. Then it holds that

i.e. the smallest typical real rank is equal to the complex generic rank.

All the varieties that we consider in this paper satisfy Assumptions 1 to 5.

2.4 Secant varieties and border rank

The -th secant variety (footnote 7: here we again prefer using affine varieties; for projective definitions, we invite the reader to consult [28]) is, by definition, the Zariski closure of the elements of rank :

The following properties of are known, see for example [28, Section 5.1] and [2, Theorem 4.3] for more details.

Theorem 4

  • If , then is the Euclidean closure of .

  • If , and , then a general point in has rank , i.e. there exists a subvariety , such that

  • If , this is not the case: there may exist a nonempty Euclidean open subset of such that each point in this open subset has X-rank strictly larger than .

Nevertheless, there is a correspondence between real and complex varieties [33]: Let be a real variety satisfying Assumptions 1 to 5, and . Then for all the secant variety satisfies Assumptions 1 to 5, and is a complexification of .

2.5 Defectivity, expected dimension and generic rank

In this subsection, we only consider the case , and we assume that satisfies Assumptions 1 to 4.

A direct consequence of Theorem 4 is that the dimensions of are increasing until , i.e.,

This tells us that the generic rank can be found by looking at the dimensions of . For this purpose, a useful concept, the expected dimension, is introduced.

Definition 4 (Expected dimension)

The expected dimension of is defined as

The intuition behind Definition 4 is that if we add in (7) vectors from the variety of dimension , we obtain an object of dimension times larger. In general,

If there is a strict inequality, is called defective. Otherwise is called non-defective.
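As a hedged reading of Definition 4 and of the inequality that follows it, the standard affine formulation is sketched below. The symbols $\sigma_r(X)$ for the $r$-th secant variety, $\dim X$ for the dimension of the affine cone and $N$ for the dimension of the ambient space are assumed here, since the original displays are not legible above.

```latex
\operatorname{exp.dim} \sigma_r(X) \;=\; \min\{\, r \cdot \dim X,\; N \,\},
\qquad
\dim \sigma_r(X) \;\le\; \operatorname{exp.dim} \sigma_r(X).
```

In this reading, $\sigma_r(X)$ is defective exactly when the inequality on the right is strict.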

Corollary 1

The following bound on can be given:

(8)

In particular, if all are non-defective, then .

The Alexander-Hirschowitz theorem [1] states that for , all the secant varieties are non-defective, with a finite number of exceptions. Hence, by Corollary 1 and Table 1, the generic rank is equal to , where

except , where is increased by .
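The resulting generic rank of symmetric tensors can be tabulated with a short function. This is a sketch under stated assumptions: `n` denotes the number of variables (the dimension of the underlying vector space), `d >= 3` the degree, and the exceptional cases are hard-coded as they are commonly stated for the Alexander-Hirschowitz theorem; the function name and the parameter conventions are illustrative and may differ from the paper's.

```python
from math import comb

# Exceptional (n, d) pairs of the Alexander-Hirschowitz theorem for d >= 3,
# with n the number of variables; the generic rank exceeds the expected value by 1.
AH_EXCEPTIONS = {(3, 4), (4, 4), (5, 4), (5, 3)}


def generic_symmetric_rank(n, d):
    """Generic rank of degree-d forms in n variables over the complex numbers."""
    if d < 3:
        raise ValueError("this sketch only covers degree d >= 3")
    ambient_dim = comb(n + d - 1, d)       # dimension of the space of forms
    expected = -(-ambient_dim // n)        # ceiling of ambient_dim / n
    return expected + 1 if (n, d) in AH_EXCEPTIONS else expected


for n, d in [(2, 3), (3, 3), (3, 4), (5, 3)]:
    print(f"n={n}, d={d}: generic rank {generic_symmetric_rank(n, d)}")
```

The ceiling is computed with integer arithmetic to avoid floating-point rounding for large binomial coefficients.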

3 Uniqueness and identifiability

3.1 Uniqueness of a decomposition

First, we introduce the notion of uniqueness.

Definition 5

An -rank decomposition (7) is unique if all the other decompositions of the form (7) differ only by permutation of the summands in (7).

This definition corresponds to the standard definition of uniqueness of tensor decompositions. For instance, a tensor decomposition

(9)

is unique if it is unique up to permutation of summands and exchange of scaling in the vectors. In this paper, we study the notion of generic uniqueness, or uniqueness of “almost all” decompositions. The following algebraic definition is often adopted in the literature.

Definition 6

A variety is called -identifiable if a general element in has a unique rank- decomposition, i.e. there exists a semialgebraic subset of strictly smaller dimension such that any element in has a unique rank- decomposition.

First, we remark on the relation between real and complex identifiability.

Lemma 2 ([33])

Assume that satisfies Assumptions 1 to 5, and is -identifiable. Then is also -identifiable.

Next, we give some interpretation to Definition 6. The following lemma (Lemma 3) states that is -identifiable if for “randomly chosen” their sum has a unique -rank decomposition. The following proposition (Proposition 1) gives an equivalent definition of identifiability in the parameter space. The proofs of both results are given in Section 6.1.

Lemma 3

Let , satisfy Assumptions 1 to 4. Then is -identifiable if and only if

(10)

Proposition 1

Let be an algebraic variety over ( or ) satisfying Assumptions 1 to 5. Assume that there exists a polynomial map such that . Then is -identifiable if and only if for a general point , the decomposition

(11)

is unique, i.e., the semialgebraic set

(12)

has Lebesgue measure zero.

Consider the case of Eq. 9. The Segre variety is -identifiable if and only if the decomposition Eq. 9 is unique for general (i.e. drawn randomly with respect to an absolutely continuous probability distribution). Note that uniqueness of the decomposition Eq. 9 does not mean that are unique; in fact, they are unique only up to scaling. The definition in the parameter space is more common in the linear algebra and engineering literature. Hence Proposition 1 establishes a correspondence between these two definitions.

Finally, there is an important corollary of Definition 6 (in the case ) and Proposition 1 (in the case ).

Corollary 2

Let or , satisfy assumptions of Proposition 1. If is -identifiable, then any vector is a limit of a sequence of vectors with a unique decomposition.

Thus, any rank- vector can be approximated by rank- uniquely decomposable vectors to arbitrary precision. To our knowledge, in the case , this fact is not explicitly mentioned in the literature.

3.2 Necessary and sufficient conditions for generic uniqueness

In what follows, we consider only the case . First, by [39], if is defective, then is not -identifiable. If is non-defective, then a general point in has a finite number of decompositions. Thus, the dimension of alone already carries information about -identifiability: in the defective case, is not -identifiable. This can be checked numerically using Terracini’s lemma.

Lemma 4 (Terracini)

Assume that satisfies Assumptions 1 to 4. Then for a general point , the tangent space is

Hence, the non-defectivity can be checked numerically, by picking “random” points and comparing with . A variety is called -weakly defective if for general points in a general hyperplane tangent to them is tangent to elsewhere [12]. If is not -weakly defective, then is -identifiable (the converse is not true).
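The numerical check described above can be sketched for the Veronese variety (powers of linear forms). The code picks random integer points, stacks the coefficient vectors that span the tangent spaces at these points, and computes the rank exactly with rational arithmetic; by Terracini's lemma this rank estimates the dimension of the secant variety. Everything here is an illustrative assumption rather than the authors' procedure: the function names, the use of small random integer points in place of truly general points, and the restriction to the Veronese case. For special points the computed rank can only be lower than the true dimension, never higher.

```python
import random
from fractions import Fraction
from itertools import combinations_with_replacement
from math import factorial


def exponents(n, d):
    """All exponent tuples of total degree d in n variables."""
    result = []
    for combo in combinations_with_replacement(range(n), d):
        e = [0] * n
        for i in combo:
            e[i] += 1
        result.append(tuple(e))
    return result


def tangent_rows(v, d):
    """Coefficient vectors of (v.x)^(d-1) * x_i for i = 1..n; they span the
    tangent space to the cone of d-th powers of linear forms at (v.x)^d."""
    n = len(v)
    column = {e: j for j, e in enumerate(exponents(n, d))}
    rows = []
    for i in range(n):
        row = [0] * len(column)
        for a in exponents(n, d - 1):
            multinomial, power = factorial(d - 1), 1
            for ak, vk in zip(a, v):
                multinomial //= factorial(ak)
                power *= vk ** ak
            shifted = list(a)
            shifted[i] += 1                      # multiply the monomial by x_i
            row[column[tuple(shifted)]] += multinomial * power
        rows.append(row)
    return rows


def rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][c] / m[rk][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[rk])]
        rk += 1
        if rk == len(m):
            break
    return rk


def secant_dim_estimate(n, d, r, seed=0):
    """Rank of the stacked tangent spaces at r random integer points."""
    rng = random.Random(seed)
    rows = []
    for _ in range(r):
        v = [rng.randint(1, 9) for _ in range(n)]
        rows.extend(tangent_rows(v, d))
    return rank(rows)


# Ternary quartics, r = 5: the expected dimension is min(5 * 3, 15) = 15;
# a smaller computed value indicates defectivity.
print(secant_dim_estimate(3, 4, 5), "vs expected", min(5 * 3, 15))
```

Exact rational arithmetic is used instead of floating-point singular values so that the rank decision does not depend on a numerical tolerance; for larger problems a floating-point rank estimate would be the more practical choice.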

3.3 Examples: Veronese and Segre-Veronese varieties

We review here some results on identifiability of varieties from Table 1, that will be needed. First, recall a recent result that for all subgeneric ranks, the Veronese variety is -identifiable.

Theorem 5 ([15, Theorem 1.1])

Let and . Then is -identifiable for all , where

(13)

Next, we recall stronger results on -weak defectivity of the Veronese varieties.

Theorem 6 ([4, 31, 13])

Let and . Then the Veronese variety is not -weakly defective (footnote 8: the case was proved in the proof of [13, Thm 5.1], was proved in [31, Thm. 4.1], and the case is proved in [4, Thm. 1.1.] (see also [31, Corollary 4.5])) for , where

For Segre-Veronese varieties, we are not aware of explicitly available results on identifiability. However, the identifiability of such varieties can be easily deduced from Theorem 6 and the results of [8] on identifiability of Segre products of varieties. Let

(14)

Corollary 3

Let , , , and , where

(15)

Then the variety is -identifiable.

Proof

The proof is given in Section 6.1.

Although the expression in (15) looks complicated, in fact,

if or if .

4 Veronese scrolls

In this section, we recall a variety that is a generalization of the well-known rational normal scroll [10].

4.1 Simultaneous Waring decompositions

Let be a sequence of natural numbers (footnote 9: by convention, is the set of nonnegative integers and includes ) put in one vector and define a shorthand notation

which is a vector space of dimension

We say that has a Waring-like decomposition of rank if there exist and such that

(16)

In other words, decomposition (16) is equivalent to simultaneous Waring decompositions with the same vectors but different coefficients.

Example 6.

Let us show that Example 2 is a special case of the Waring-like decomposition (16). Since in (2), we have that

where is the -th degree homogeneous part of . Hence, if the polynomial admits a decomposition (2), then all the homogeneous parts can be decomposed as

which is a special case of Eq. 16 for the vector of integers .
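The step from (2) to its homogeneous parts can be written out explicitly. The following is a hedged sketch in assumed notation ($\mathbf{v}_k$ for the direction vectors, $c_{j,k}$ for the coefficients of the univariate polynomials $g_k$, and $f^{(j)}$ for the degree-$j$ homogeneous part), since the original displays are not legible above.

```latex
f(\mathbf{u}) \;=\; \sum_{k=1}^{r} g_k\!\left(\mathbf{v}_k^{\top}\mathbf{u}\right),
\quad
g_k(t) \;=\; \sum_{j=1}^{d} c_{j,k}\, t^{j}
\;\;\Longrightarrow\;\;
f^{(j)}(\mathbf{u}) \;=\; \sum_{k=1}^{r} c_{j,k} \left(\mathbf{v}_k^{\top}\mathbf{u}\right)^{j},
\qquad j = 1, \dots, d.
```

Each homogeneous part thus has a Waring decomposition with the same vectors $\mathbf{v}_k$ and degree-dependent coefficients, which is the simultaneous structure described for the Waring-like decomposition.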

4.2 Veronese scrolls: a parametric definition

The decomposition Eq. 16 can be put in the framework of -rank as follows. Define the following map:

(17)

and define the image of this map as

(18)

and the corresponding subset in the projective space.

It is easy to see that has a Waring-like decomposition if and only if it has an X-rank decomposition with . It can be shown that satisfies Assumptions 1 to 4 (affine cone of a projective variety ). In particular, when , is the rational normal (-fold) scroll, a classic object in algebraic geometry [10]. When , we did not find a name for in the literature, so we call it the Veronese scroll, as a hybrid of “rational normal scroll” and “Veronese variety”. When , can be realized as a projective bundle [3, 10, 18] (footnote 10: we are not reproducing the bundle construction, since it is difficult without going into technical details). In the following sections, we give explicit (ideal-theoretic) defining equations for the set Eq. 18, which will provide an alternative proof that is a variety.

Now consider the following map

and define the image of . It is easy to see that . Moreover, as in Section 4.1, we can show that the polynomial decomposition Eq. 1 is exactly the X-rank decomposition for .

4.3 Determinantal construction (defining equations)

This section is not needed to prove the main results of the paper, but it gives more insight into the nature of the Veronese scrolls.

First, recall a definition of the catalecticant matrix [27, Ch. 1] (we prefer giving it in coordinates). Let be given by coordinates , as defined in Section A.1. Then the first catalecticant matrix, for , is defined as (footnote 11: in fact, this is the matrix representation map given by differentiation)

where the columns are indexed by .

Proposition 2

Let , and . Define the stacked matrix as

(19)

Then it holds that

i.e. is defined (set-theoretically) by the vanishing of all minors of .

Proof

The proof is contained in Section 6.2.

A similar construction for the matrix can be found in [3, §3].

Proposition 3

Let , and be defined as in Eq. 19. Then the minors of generate the ideal of .

This proposition is much stronger than Proposition 2. The proof relies on the tools of representation theory, and is contained in Section 6.2.

5 Main results

Throughout this section we assume that . By [33, Section 5], all our results hold for the real case too. We will also use a shorthand instead of .

Remark 2

A common idea for treating our model (1) (suggested to us by one of the reviewers) is that decomposition (2) can be brought to the form (3) by homogenization, and hence Waring decomposition can be applied (the same argument can be applied to bring (1) to the form (5)). However, homogenization can increase the number of terms, and does not give a good answer to our questions.

For example, the homogenization of in Fig. 5 is the trivariate polynomial

But it is known [9] that this homogeneous polynomial does not have a Waring decomposition Eq. 3 with fewer than terms (compare with terms in Fig. 5). The reason is that the polynomials do not correspond to powers of linear forms for the homogenized polynomial. In fact, homogenization restricts the form of polynomials . We will study this model by investigating properties of Veronese scrolls.

5.1 Identifiability of Veronese scrolls and polynomial decompositions

Proposition 4

Let , , . Next, consider the Veronese scroll with , , and the variety . Then we have the following.

  1. is -identifiable if