# Filtrated Algebraic Subspace Clustering

Subspace clustering is the problem of clustering data that lie close to a union of linear subspaces. In the abstract form of the problem, where no noise or other corruptions are present, the data are assumed to lie in general position inside the algebraic variety of a union of subspaces, and the objective is to decompose the variety into its constituent subspaces. Prior algebraic-geometric approaches to this problem require the subspaces to be of equal dimension, or the number of subspaces to be known. Subspaces of arbitrary dimensions can still be recovered in closed form, in terms of all homogeneous polynomials of degree m that vanish on their union, when an upper bound m on the number of the subspaces is given. In this paper, we propose an alternative, provably correct, algorithm for addressing a union of at most m arbitrary-dimensional subspaces, based on the idea of descending filtrations of subspace arrangements. Our algorithm uses the gradient of a vanishing polynomial at a point in the variety to find a hyperplane containing the subspace S passing through that point. By intersecting the variety with this hyperplane, we obtain a subvariety that contains S, and recursively applying the procedure until no non-trivial vanishing polynomial exists, our algorithm eventually identifies S. By repeating this procedure for other points, our algorithm eventually identifies all the subspaces by returning a basis for their orthogonal complement. Finally, we develop a variant of the abstract algorithm, suitable for computations with noisy data. We show by experiments on synthetic and real data that the proposed algorithm outperforms state-of-the-art methods on several occasions, thus demonstrating the merit of the idea of filtrations.


## 1 Introduction

Given a set of points lying close to a union of linear subspaces, subspace clustering refers to the problem of identifying the number of subspaces, their dimensions, a basis for each subspace, and the clustering of the data points according to their subspace membership. This is an important problem with widespread applications in computer vision [38], systems theory [24] and genomics [17].

### 1.1 Existing work

Over the past years, various subspace clustering methods have appeared in the literature [36]. Early techniques, such as K-subspaces [2, 34] or Mixtures of Probabilistic PCA [30, 13], rely on solving a non-convex optimization problem by alternating between assigning points to subspaces and re-estimating a subspace for each group of points. As such, these methods are sensitive to initialization. Moreover, these methods require a-priori knowledge of the number of subspaces and their dimensions. This motivated the development of a family of purely algebraic methods, such as Generalized Principal Component Analysis or GPCA [41], which feature closed form solutions for various subspace configurations, such as hyperplanes [40, 39]. A little later, ideas from spectral clustering [44] led to a family of algorithms based on constructing an affinity between pairs of points. Some methods utilize local geometric information to construct the affinities [47]. Such methods can estimate the dimension of the subspaces, but cannot handle data near the intersections. Other methods use global geometric information to construct the affinities, such as the spectral curvature [3]. Such methods can handle intersecting subspaces, but require the subspaces to be low-dimensional and of equal dimensions. In the last five years, methods from sparse representation theory, such as Sparse Subspace Clustering [8, 9, 10], low-rank representation, such as Low-Rank Subspace Clustering [22, 11, 20, 37], and least-squares, such as Least-Squares-Regression Subspace Clustering [23], have provided new ways for constructing affinity matrices using convex optimization techniques. Among them, sparse-representation based methods have become extremely attractive because they have been shown to provide affinities with guarantees of correctness as long as the subspaces are sufficiently separated and the data are well distributed inside the subspaces [10, 28]. 
Moreover, they have also been shown to handle noise [45] and outliers [29]. However, existing results require the subspace dimensions to be small compared to the dimension of the ambient space. This is in sharp contrast with algebraic methods, which can handle the case of hyperplanes.

### 1.2 Motivation

This paper is motivated by the highly complementary properties of Sparse Subspace Clustering (SSC) and Algebraic Subspace Clustering (ASC), previously known as GPCA. (Following the convention introduced in [42], we have taken the liberty of changing the name from GPCA to ASC for two reasons. First, to have a consistent naming convention across many subspace clustering algorithms, such as ASC, SSC, LRSC, which is indicative of the type of each algorithm (algebraic, sparse, low-rank). Second, we believe that GPCA is a more general name that is best suited for the entire family of subspace clustering algorithms, which are all generalizations of PCA.) On the one hand, theoretical results for SSC assume that the subspace dimensions are small compared to the dimension of the ambient space. Furthermore, SSC is known to be very robust in the presence of noise in the data. On the other hand, theoretical results for ASC are valid for subspaces of arbitrary dimensions, with the easiest case being that of hyperplanes, provided that an upper bound on the number of subspaces is known. However, all known implementations of ASC for subspaces of different dimensions, including the recursive algorithm proposed in [16], are very sensitive to noise and are thus considered impractical. As a consequence, our motivation for this work is to develop an algorithm that enjoys the strong theoretical guarantees associated with ASC, but is also robust to noise.

### 1.3 Paper contributions

This paper features two main contributions.

As a first contribution, we propose a new ASC algorithm, called Filtrated Algebraic Subspace Clustering (FASC), which can handle an unknown number of subspaces of possibly high and different dimensions, and give a rigorous proof of its correctness. (Partial results from the present paper have been presented without proofs in [32].) Our algorithm solves the following problem:

[Algebraic subspace clustering problem] Given a finite set of points $\mathcal{X}$ lying in general position (formally defined in Definition 4.1) inside a transversal subspace arrangement $\mathcal{A}$ (formally defined in Definition 2.3), decompose $\mathcal{A}$ into its irreducible components, i.e., find the number of subspaces and a basis for each subspace.

Our algorithm approaches this problem by selecting a suitable polynomial vanishing on the subspace arrangement. The gradient of this polynomial at a point of the arrangement gives the normal vector to a hyperplane containing the subspace passing through the point. By intersecting the subspace arrangement with the hyperplane, we obtain a subspace sub-arrangement, which lives in an ambient space of dimension one less than the original ambient dimension and still contains that subspace. By choosing another suitable polynomial that vanishes on the sub-arrangement, computing the gradient of this new polynomial at the same point, intersecting again with the new hyperplane, and so on, we obtain a descending filtration of subspace arrangements, which eventually gives us the subspace containing the point. This happens after a number of steps precisely equal to the codimension of the subspace, at which point no non-trivial vanishing polynomial exists, and the final ambient space, which is the orthogonal complement of the span of all the gradients used in the filtration, can be identified with the subspace. By repeating this procedure at another point not in the first subspace, we can identify the second subspace and so on, until all subspaces have been identified. Using results from algebraic geometry, we rigorously prove that this algorithm correctly identifies the number of subspaces, their dimensions and a basis for each subspace.

As a second contribution, we extend the ideas behind the purely abstract FASC algorithm to a working algorithm called Filtrated Spectral Algebraic Subspace Clustering (FSASC), which is suitable for computations with noisy data. (A preliminary description of this method appeared in a workshop paper [33].) The first modification is that intersections with hyperplanes are replaced by projections onto them. In this way, points in the subspace contained in the hyperplane are preserved by the projection, while other points are generally shrunk. The second modification is that we compute a filtration at each data point and use the norm of every other point at the end of that filtration to define an affinity between the two points. The intuition is that the filtration associated with a given point will in theory preserve the norms of all points lying in the same subspace as that point. This process leads to an affinity matrix of high intra-class and low cross-class connectivity, upon which spectral clustering is applied to obtain the clustering of the data. By experiments on real and synthetic data we demonstrate that the idea of filtrations leads to affinity matrices of superior quality, i.e., affinities with high intra- and low inter-cluster connectivity, and as a result to better clustering accuracy. In particular, FSASC is shown to be superior to state-of-the-art methods in the problem of motion segmentation using the Hopkins155 dataset [31].

Finally, we have taken the liberty of presenting in an appendix the foundations of the algebraic geometric theory of subspace arrangements relevant to Algebraic Subspace Clustering, in a manner that is both rigorous and accessible to the interested audience outside the algebraic geometry community, thus complementing existing reviews such as [25].

### 1.4 Notation

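For any positive integer $n$, we define $[n] := \{1, 2, \dots, n\}$. We denote by $\mathbb{R}$ the real numbers. The right null space of a matrix $B$ is denoted by $\mathcal{N}(B)$. If $\mathcal{S}$ is a subspace of $\mathbb{R}^D$, then $\dim(\mathcal{S})$ denotes the dimension of $\mathcal{S}$ and $\pi_{\mathcal{S}}: \mathbb{R}^D \to \mathcal{S}$ is the orthogonal projection of $\mathbb{R}^D$ onto $\mathcal{S}$. The symbol $\oplus$ denotes direct sum of subspaces. We denote the orthogonal complement of a subspace $\mathcal{S}$ in $\mathbb{R}^D$ by $\mathcal{S}^\perp$. If $y_1, \dots, y_s$ are elements of $\mathbb{R}^D$, we denote by $\operatorname{Span}(y_1, \dots, y_s)$ the subspace of $\mathbb{R}^D$ spanned by these elements. For two vectors $x, y \in \mathbb{R}^D$, the notation $x \cong y$ means that $x$ and $y$ are colinear. We let $\mathbb{R}[x] = \mathbb{R}[x_1, \dots, x_D]$ be the polynomial ring over the real numbers in $D$ indeterminates. We use $x$ to denote the vector of indeterminates $x = (x_1, \dots, x_D)^\top$, while we reserve $\boldsymbol{x}$ to denote a data point of $\mathbb{R}^D$. We denote by $\mathbb{R}[x]_\ell$ the set of all homogeneous polynomials of degree $\ell$ and similarly by $\mathbb{R}[x]_{\le \ell}$ the set of all homogeneous polynomials of degree less than or equal to $\ell$. (A polynomial in many variables is called homogeneous if all monomials appearing in the polynomial have the same degree.) $\mathbb{R}[x]$ is an infinite dimensional real vector space, while $\mathbb{R}[x]_\ell$ and $\mathbb{R}[x]_{\le \ell}$ are finite dimensional subspaces of $\mathbb{R}[x]$ of dimensions $M_\ell(D) := \binom{\ell + D - 1}{\ell}$ and $\binom{\ell + D}{\ell}$, respectively. We denote by $\mathbb{R}(x)$ the field of all rational functions over $\mathbb{R}$ and indeterminates $x_1, \dots, x_D$. If $\{p_1, \dots, p_s\}$ is a subset of $\mathbb{R}[x]$, we denote by $\langle p_1, \dots, p_s \rangle$ the ideal generated by $p_1, \dots, p_s$ (see Definition A). If $\mathcal{A}$ is a subset of $\mathbb{R}^D$, we denote by $\mathcal{I}_{\mathcal{A}}$ the vanishing ideal of $\mathcal{A}$, i.e., the set of all elements of $\mathbb{R}[x]$ that vanish on $\mathcal{A}$, and similarly $\mathcal{I}_{\mathcal{A},\ell} := \mathcal{I}_{\mathcal{A}} \cap \mathbb{R}[x]_\ell$ and $\mathcal{I}_{\mathcal{A},\le \ell} := \mathcal{I}_{\mathcal{A}} \cap \mathbb{R}[x]_{\le \ell}$. Finally, for a point $\boldsymbol{x} \in \mathbb{R}^D$ and a set $\mathcal{I} \subset \mathbb{R}[x]$ of polynomials, $\nabla \mathcal{I}|_{\boldsymbol{x}}$ is the set of gradients of all the elements of $\mathcal{I}$ evaluated at $\boldsymbol{x}$.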

### 1.5 Paper organization

The remainder of the paper is organized as follows. Section 2 provides a careful, yet concise review of the state of the art in algebraic subspace clustering. In section 3 we discuss the FASC algorithm from a geometric viewpoint with as few technicalities as possible. Throughout sections 2 and 3, we use a running example of two lines and a plane in $\mathbb{R}^3$ to illustrate various ideas; the reader is encouraged to follow these illustrations. We save the rigorous treatment of FASC for section 4, which constitutes the technical heart of the paper. In particular, the listing of the FASC algorithm can be found in Algorithm 3 and the theorem establishing its correctness is Theorem 3. In section 5 we describe FSASC, which is the numerical adaptation of FASC, and compare it to other state-of-the-art subspace clustering algorithms using both synthetic and real data. Finally, appendices A, B and C cover basic notions and results from commutative algebra, algebraic geometry and subspace arrangements respectively, mainly used throughout section 4.

## 2 Review of Algebraic Subspace Clustering (ASC)

This section reviews the main ideas behind ASC. For the sake of simplicity, we first discuss ASC in the case of hyperplanes (section 2.1) and subspaces of equal dimension (section 2.2), for which a closed form solution can be found using a single polynomial. In the case of subspaces of arbitrary dimensions, the picture becomes more involved, but a closed form solution from multiple polynomials is still available when the number of subspaces $n$ is known (section 2.3) or an upper bound $m$ for $n$ is known (section 2.4). In section 2.5 we discuss one limitation of ASC due to computational complexity and a partial solution based on a recursive ASC algorithm. In section 2.6 we discuss another limitation of ASC due to sensitivity to noise and a practical solution based on spectral clustering. We conclude in section 2.7 with the main challenge that this paper aims to address.

### 2.1 Subspaces of codimension 1

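The basic principles of ASC can be introduced more smoothly by considering the case where the union of subspaces is the union of $n$ hyperplanes $\mathcal{A} = \bigcup_{i=1}^n \mathcal{H}_i$ in $\mathbb{R}^D$. Each hyperplane $\mathcal{H}_i$ is uniquely defined by its unit length normal vector $b_i \in \mathbb{R}^D$ as $\mathcal{H}_i = \{x \in \mathbb{R}^D : b_i^\top x = 0\}$. In the language of algebraic geometry this is equivalent to saying that $\mathcal{H}_i$ is the zero set of the polynomial $b_i^\top x$ or equivalently $\mathcal{H}_i$ is the algebraic variety defined by the polynomial equation $b_i^\top x = 0$, where $b_i^\top x = b_{i,1} x_1 + \cdots + b_{i,D} x_D$ with $b_i := (b_{i,1}, \dots, b_{i,D})^\top$ and $x := (x_1, \dots, x_D)^\top$. We write this more succinctly as $\mathcal{H}_i = \mathcal{Z}(b_i^\top x)$. We then observe that a point $\boldsymbol{x}$ of $\mathbb{R}^D$ belongs to $\bigcup_{i=1}^n \mathcal{H}_i$ if and only if $\boldsymbol{x}$ is a root of the polynomial $p(x) = (b_1^\top x)(b_2^\top x) \cdots (b_n^\top x)$, i.e., the union of hyperplanes $\mathcal{A}$ is the algebraic variety $\mathcal{A} = \mathcal{Z}(p)$ (the zero set of $p$). Notice the important fact that $p$ is homogeneous of degree equal to the number $n$ of distinct hyperplanes and moreover it is the product of linear homogeneous polynomials $b_i^\top x$, i.e., a product of linear forms, each of which defines a distinct hyperplane $\mathcal{H}_i$ via the corresponding normal vector $b_i$.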

Given a set of points in general position in the union of hyperplanes, the classic polynomial differentiation algorithm proposed in [39, 41] recovers the correct number of hyperplanes as well as their normal vectors by

1. embedding the data into a higher-dimensional space via a polynomial map,

2. finding the number of subspaces by analyzing the rank of the embedded data matrix,

3. finding the polynomial from the null space of the embedded data matrix,

4. finding the hyperplane normal vectors from the derivatives of $p$ at a nonsingular point of the arrangement. (A nonsingular point of a subspace arrangement is a point that lies in one and only one of the subspaces that constitute the arrangement.)

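More specifically, observe that the polynomial $p(x)$ can be written as a linear combination of the set of all monomials of degree $n$ in $D$ variables, as:

$$p(x) = \sum_{n_1 + n_2 + \cdots + n_D = n} c_{n_1, n_2, \dots, n_D}\, x_1^{n_1} x_2^{n_2} \cdots x_D^{n_D} = c^\top \nu_n(x). \tag{1}$$

In the above expression, $c$ is the vector of all coefficients $c_{n_1, n_2, \dots, n_D}$, and $\nu_n$ is the Veronese or Polynomial embedding of degree $n$, as it is known in the algebraic geometry and machine learning literature, respectively. It is defined by taking a point of $\mathbb{R}^D$ to a point of $\mathbb{R}^{M_n(D)}$ under the rule

$$(x_1, \dots, x_D)^\top \overset{\nu_n}{\longmapsto} \left(x_1^n,\; x_1^{n-1} x_2,\; x_1^{n-1} x_3,\; \dots,\; x_1 x_D^{n-1},\; \dots,\; x_D^n\right)^\top, \tag{2}$$

where $M_n(D) := \binom{n + D - 1}{n}$ is the dimension of the space of homogeneous polynomials of degree $n$ in $D$ indeterminates. The image of the data set $\mathcal{X} = \{\boldsymbol{x}_1, \dots, \boldsymbol{x}_N\}$ under the Veronese embedding is used to form the so-called embedded data matrix

$$\nu_\ell(\mathcal{X}) := \left[\nu_\ell(\boldsymbol{x}_1) \;\cdots\; \nu_\ell(\boldsymbol{x}_N)\right]^\top. \tag{3}$$

It is shown in [41] that when there are sufficiently many data points that are sufficiently well distributed in the subspaces, the correct number of hyperplanes is the smallest degree $\ell$ for which $\nu_\ell(\mathcal{X})$ drops rank by 1: $n = \min\{\ell : \operatorname{rank} \nu_\ell(\mathcal{X}) = M_\ell(D) - 1\}$. Moreover, it is shown in [41] that the polynomial vector of coefficients $c$ is the unique up to scale vector in the one-dimensional null space of $\nu_n(\mathcal{X})$.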
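The embedding and the rank test described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names `veronese` and `count_hyperplanes`, the relative tolerance `1e-8`, and the synthetic data (three random planes through the origin in three dimensions, noise-free) are all assumptions made for illustration.

```python
import itertools
import numpy as np

def veronese(X, degree):
    """Map each row of X (a point in R^D) to all monomials of the given degree.

    Monomials are ordered as in (2): x1^n, x1^(n-1) x2, ..., xD^n.
    """
    D = X.shape[1]
    # Each monomial corresponds to a multiset of `degree` variable indices.
    combos = itertools.combinations_with_replacement(range(D), degree)
    cols = [np.prod(X[:, list(c)], axis=1) for c in combos]
    return np.stack(cols, axis=1)

def count_hyperplanes(X, max_degree, tol=1e-8):
    """Return the smallest degree at which the embedded data matrix drops rank by one."""
    for ell in range(1, max_degree + 1):
        V = veronese(X, ell)
        s = np.linalg.svd(V, compute_uv=False)
        rank = int(np.sum(s > tol * s[0]))
        if rank == V.shape[1] - 1:
            return ell
    return None

# Hypothetical noise-free data: 30 points on each of 3 random planes in R^3.
rng = np.random.default_rng(0)
normals = rng.standard_normal((3, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
points = []
for b in normals:
    P = rng.standard_normal((30, 3))
    points.append(P - np.outer(P @ b, b))  # project onto the plane b^T x = 0
X = np.vstack(points)

n_est = count_hyperplanes(X, max_degree=4)
# Coefficient vector c: right singular vector for the smallest singular value.
c = np.linalg.svd(veronese(X, n_est))[2][-1]
residual = np.max(np.abs(veronese(X, n_est) @ c))
```

With noisy data the hard rank threshold used here becomes unreliable, which is the sensitivity issue the paper discusses in section 2.6.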

It follows that the task of identifying the normals to the hyperplanes from $p$ is equivalent to extracting the linear factors of $p$. (A direct factorization has been shown to be possible as well [40]; however this approach has not been generalized yet to the case of subspaces of different dimensions.) This is achieved by observing that if we have a point $\boldsymbol{x}$ that lies in the $i$-th hyperplane and in no other hyperplane, then the gradient of $p$ evaluated at $\boldsymbol{x}$,

$$\nabla p|_{\boldsymbol{x}} = \sum_{j=1}^{n} b_j \prod_{j' \neq j} \left(b_{j'}^\top \boldsymbol{x}\right), \tag{4}$$

is equal to $b_i$ up to a scale factor, because $b_i^\top \boldsymbol{x} = 0$ and hence all the terms in the sum vanish except for the $i$-th (see Proposition C for a more general statement). Having identified the normal vectors, the task of clustering the points in $\mathcal{X}$ is straightforward.
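As a small numerical check of (4), the sketch below evaluates the gradient of a product of linear forms at a point lying on exactly one hyperplane and compares its direction with the corresponding normal. The function name `grad_product_of_linear_forms` and the random normals are assumptions for illustration only.

```python
import numpy as np

def grad_product_of_linear_forms(B, x):
    """Gradient at x of p(x) = prod_j (b_j^T x), computed with the sum in (4).

    The rows of B are the normal vectors b_j.
    """
    vals = B @ x
    grad = np.zeros_like(x, dtype=float)
    for j in range(B.shape[0]):
        # Product of all linear forms except the j-th one.
        grad += np.prod(np.delete(vals, j)) * B[j]
    return grad

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 4))                 # three hyperplanes in R^4
B /= np.linalg.norm(B, axis=1, keepdims=True)   # unit length normals

i = 1
z = rng.standard_normal(4)
x = z - (z @ B[i]) * B[i]                       # a generic point on hyperplane i

g = grad_product_of_linear_forms(B, x)
cosine = abs(g @ B[i]) / np.linalg.norm(g)      # 1 when g is colinear with b_i
```

The cosine is 1 up to rounding, because every term of the sum other than the one for index `i` contains the factor $b_i^\top \boldsymbol{x} = 0$.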

### 2.2 Subspaces of equal dimension

Let us now consider a more general case, where we know that the subspaces are of equal and known dimension . Such a case can be reduced to the case of hyperplanes, by noticing that a union of subspaces of dimension of becomes a union of hyperplanes of after a generic projection . We note that any random orthogonal projection will almost surely preserve the number of subspaces and their dimensions, as the set of projections that do not have this preserving property is a zero measure subset of the set of orthogonal projections .

When the common dimension is unknown, it can be estimated exactly by analyzing the right null space of the embedded data matrix, after projecting the data generically onto subspaces of dimension , with [35]. More specifically, when , we have that , while when we have . On the other hand, the case is the only case for which the null space is one-dimensional, and so .

Finally, when both and are unknown, one can first recover as the smallest such that there exists an for which , and subsequently recover as the smallest such that ; see [35] for further details.

### 2.3 Known number of subspaces of arbitrary dimensions

When the dimensions of the subspaces are unknown and arbitrary, the problem becomes much more complicated, even if the number of subspaces is known, which is the case examined in this subsection. In such a case, a union of subspaces of , henceforth called a subspace arrangement, is still an algebraic variety. The main difference with the case of hyperplanes is that, in general, multiple polynomials of degree are needed to define , i.e., is the zero set of a finite collection of homogeneous polynomials of degree in indeterminates.

###### Example

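Consider the union $\mathcal{A}$ of a plane $\mathcal{S}_1$ and two lines $\mathcal{S}_2, \mathcal{S}_3$ in general position in $\mathbb{R}^3$ (Fig. 1).

Then $\mathcal{A}$ is the zero set of the degree-$3$ homogeneous polynomials

$$\begin{aligned} p_1 &:= (b_1^\top x)(b_{2,1}^\top x)(b_{3,1}^\top x), & p_2 &:= (b_1^\top x)(b_{2,1}^\top x)(b_{3,2}^\top x), \\ p_3 &:= (b_1^\top x)(b_{2,2}^\top x)(b_{3,1}^\top x), & p_4 &:= (b_1^\top x)(b_{2,2}^\top x)(b_{3,2}^\top x), \end{aligned} \tag{5-6}$$

where $b_1$ is the normal vector to the plane $\mathcal{S}_1$ and $b_{i,1}, b_{i,2}$ are two linearly independent vectors that are orthogonal to the line $\mathcal{S}_i$, for $i = 2, 3$. These polynomials are linearly independent and form a basis for the vector space of the degree-$3$ homogeneous polynomials that vanish on $\mathcal{A}$. (The interested reader is encouraged to prove this claim.)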
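The two claims of this example (the four polynomials vanish on the arrangement and are linearly independent) can be sanity-checked numerically. This is only a sketch on one random instance, not a proof; the helper names `normals_to_line` and `polys`, and the random plane and line directions, are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
b1 = rng.standard_normal(3)                              # normal to the plane
d2, d3 = rng.standard_normal(3), rng.standard_normal(3)  # directions of the two lines

def normals_to_line(d, rng):
    """Two linearly independent (here orthonormal) vectors orthogonal to direction d."""
    Q, _ = np.linalg.qr(np.column_stack([d, rng.standard_normal((3, 2))]))
    return Q[:, 1], Q[:, 2]

b21, b22 = normals_to_line(d2, rng)
b31, b32 = normals_to_line(d3, rng)

def polys(X):
    """Evaluate p1, ..., p4 of (5)-(6) at the rows of X; returns an N x 4 matrix."""
    l1 = X @ b1
    return np.column_stack([
        l1 * (X @ b21) * (X @ b31),
        l1 * (X @ b21) * (X @ b32),
        l1 * (X @ b22) * (X @ b31),
        l1 * (X @ b22) * (X @ b32),
    ])

# Points on the arrangement: the plane and the two lines.
P = rng.standard_normal((20, 3))
plane_pts = P - np.outer(P @ b1, b1) / (b1 @ b1)
line_pts = np.vstack([np.outer(rng.standard_normal(5), d2),
                      np.outer(rng.standard_normal(5), d3)])
vanish_on_arrangement = np.allclose(polys(np.vstack([plane_pts, line_pts])), 0, atol=1e-10)

# Linear independence: the evaluations at generic points of R^3 have rank 4.
rank_generic = np.linalg.matrix_rank(polys(rng.standard_normal((50, 3))))
```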

In contrast to the case of hyperplanes, when the subspace dimensions are different, there may exist vanishing polynomials of degree strictly less than the number of subspaces.

###### Example

Consider the setting of Example 2.3. Then there exists a unique up to scale vanishing polynomial of degree $2$, which is the product of two linear forms: one form is $b_1^\top x$, where $b_1$ is the normal to the plane $\mathcal{S}_1$, and the other linear form is $f^\top x$, where $f$ is the normal to the plane defined by the lines $\mathcal{S}_2$ and $\mathcal{S}_3$ (Fig. 2).

As Example 2.3 shows, all the relevant geometric information is still encoded in the factors of some special basis of the vector space of degree-$n$ vanishing polynomials, that consists of degree-$n$ homogeneous polynomials that factorize into the product of linear forms. (Strictly speaking, this is not always true. However, it is true if the subspace arrangement is general enough, in particular if it is transversal; see Definition 2.3 and Theorem C.) However, computing such a basis remains, to the best of our knowledge, an unsolved problem. Instead, one can only rely on computing (or be given) a general basis for this vector space. In our example such a basis could be

$$p_1 + p_4, \quad p_1 - p_4, \quad p_2 + p_3, \quad p_2 - p_3 \tag{7}$$

and it can be seen that none of these polynomials is factorizable into the product of linear forms. This difficulty was not present in the case of hyperplanes, because there was only one vanishing polynomial (up to scale) of degree and it had to be factorizable.

In spite of this difficulty, a solution can still be achieved in an elegant fashion by resorting to polynomial differentiation. The key fact that allows this approach is that any homogeneous polynomial $p$ of degree $n$ that vanishes on the subspace arrangement $\mathcal{A}$ is a linear combination of vanishing polynomials, each of which is a product of linear forms, with each distinct subspace contributing a vanishing linear form in every product (Theorem C). As a consequence (Proposition C), the gradient of $p$ evaluated at some point $\boldsymbol{x} \in \mathcal{S}_i \setminus \bigcup_{i' \neq i} \mathcal{S}_{i'}$ lies in $\mathcal{S}_i^\perp$ and the linear span of the gradients at $\boldsymbol{x}$ of all such $p$ is precisely equal to $\mathcal{S}_i^\perp$. We can thus recover $\mathcal{S}_i$, remove it from $\mathcal{A}$ and then repeat the procedure to identify all the remaining subspaces. As stated in Theorem 2.3, this process is provably correct as long as the subspace arrangement $\mathcal{A}$ is transversal, as defined next.

[Transversal subspace arrangement [5]] A subspace arrangement $\mathcal{A} = \bigcup_{i=1}^n \mathcal{S}_i \subset \mathbb{R}^D$ is called transversal, if for any subset $\mathfrak{I}$ of $\{1, \dots, n\}$, the codimension of $\bigcap_{i \in \mathfrak{I}} \mathcal{S}_i$ is the minimum between $D$ and the sum of the codimensions of all $\mathcal{S}_i$, $i \in \mathfrak{I}$.

###### Remark

Transversality is a geometric condition on the subspaces, which in particular requires the dimensions of all possible intersections among subspaces to be as small as the dimensions of the subspaces allow (see Appendix C for a discussion).

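[ASC by polynomial differentiation when $n$ is known, [41, 25]] Let $\mathcal{A} = \bigcup_{i=1}^n \mathcal{S}_i$ be a transversal subspace arrangement of $\mathbb{R}^D$, let $\boldsymbol{x} \in \mathcal{S}_i \setminus \bigcup_{i' \neq i} \mathcal{S}_{i'}$ be a nonsingular point in $\mathcal{A}$, and let $\mathcal{I}_{\mathcal{A},n}$ be the vector space of all degree-$n$ homogeneous polynomials that vanish on $\mathcal{A}$. Then $\mathcal{S}_i$ is the orthogonal complement of the subspace spanned by all vectors of the form $\nabla p|_{\boldsymbol{x}}$, where $p \in \mathcal{I}_{\mathcal{A},n}$, i.e., $\mathcal{S}_i = \operatorname{Span}\left(\nabla \mathcal{I}_{\mathcal{A},n}|_{\boldsymbol{x}}\right)^\perp$.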

Theorem 2.3 and its proof are illustrated in the next example.

###### Example

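Consider Example 2.3 and recall the polynomials $p_1, p_2, p_3, p_4$ defined in (5)-(6). Let $\boldsymbol{x}_2$ be a generic point in $\mathcal{S}_2$. Then

$$\nabla p_1|_{\boldsymbol{x}_2} \cong \nabla p_2|_{\boldsymbol{x}_2} \cong b_{2,1}, \qquad \nabla p_3|_{\boldsymbol{x}_2} \cong \nabla p_4|_{\boldsymbol{x}_2} \cong b_{2,2}. \tag{8}$$

Hence $b_{2,1}$ and $b_{2,2}$ both lie in the span of the gradients at $\boldsymbol{x}_2$ of the degree-$3$ vanishing polynomials, and so $\mathcal{S}_2^\perp$ is contained in that span. Conversely, let $p$ be any degree-$3$ homogeneous polynomial that vanishes on $\mathcal{A}$. Then there exist $\alpha_1, \dots, \alpha_4 \in \mathbb{R}$ such that $p = \sum_{i=1}^4 \alpha_i p_i$, and so

$$\nabla p|_{\boldsymbol{x}_2} = \sum_{i=1}^{4} \alpha_i \nabla p_i|_{\boldsymbol{x}_2} \in \operatorname{Span}(b_{2,1}, b_{2,2}) = \mathcal{S}_2^\perp. \tag{9}$$

Hence the span of the gradients is contained in $\mathcal{S}_2^\perp$, and so the two are equal.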
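The statement of Theorem 2.3 can be checked numerically on the running example. The sketch below samples noise-free points from a random plane and two random lines in three dimensions, computes a general basis of the degree-3 vanishing polynomials from the null space of the embedded data matrix, and verifies that their gradients at a point of one line span a two-dimensional space orthogonal to that line. The helper names, the tolerance, and the random data are assumptions; this is an illustration, not the authors' code.

```python
import itertools
import numpy as np

def exponents(D, degree):
    """Exponent vectors of all monomials of the given degree in D variables."""
    exps = []
    for combo in itertools.combinations_with_replacement(range(D), degree):
        e = np.zeros(D, dtype=int)
        for k in combo:
            e[k] += 1
        exps.append(e)
    return np.array(exps)

def veronese(X, E):
    """N x M matrix whose (j, m) entry is the m-th monomial evaluated at point j."""
    return np.prod(X[:, None, :] ** E[None, :, :], axis=2)

def gradient(c, E, x):
    """Gradient at x of the polynomial with coefficient vector c over monomials E."""
    g = np.zeros(len(x))
    for k in range(len(x)):
        Ek = E.copy()
        Ek[:, k] = np.maximum(Ek[:, k] - 1, 0)  # lower the exponent of x_k by one
        g[k] = np.sum(c * E[:, k] * np.prod(x[None, :] ** Ek, axis=1))
    return g

rng = np.random.default_rng(1)
b1 = rng.standard_normal(3)
b1 /= np.linalg.norm(b1)
d2, d3 = rng.standard_normal(3), rng.standard_normal(3)  # line directions

P = rng.standard_normal((20, 3))
X = np.vstack([P - np.outer(P @ b1, b1),                  # plane S1
               np.outer(rng.standard_normal(5), d2),      # line S2
               np.outer(rng.standard_normal(5), d3)])     # line S3

E = exponents(3, 3)
_, s, Vt = np.linalg.svd(veronese(X, E))
null_dim = int(np.sum(s < 1e-8 * s[0]))
basis = Vt[len(s) - null_dim:]            # a general basis of vanishing cubics

x2 = 1.7 * d2                             # a point on S2
G = np.array([gradient(c, E, x2) for c in basis])
grad_rank = np.linalg.matrix_rank(G, tol=1e-8)
```

Here `null_dim` recovers the four-dimensional space spanned by $p_1, \dots, p_4$, and the null space of `G` is the line through `d2`, in agreement with the theorem.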

### 2.4 Unknown number of subspaces of arbitrary dimensions

As it turns out, when the number of subspaces is unknown, but an upper bound is given, one can obtain the decomposition of the subspace arrangement from the gradients of the vanishing polynomials of degree , precisely as in Theorem 2.3, simply by replacing with .

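[ASC by polynomial differentiation when an upper bound on $n$ is known, [41, 25]] Let $\mathcal{A} = \bigcup_{i=1}^n \mathcal{S}_i$ be a transversal subspace arrangement of $\mathbb{R}^D$, let $\boldsymbol{x} \in \mathcal{S}_i \setminus \bigcup_{i' \neq i} \mathcal{S}_{i'}$ be a nonsingular point in $\mathcal{A}$, and let $\mathcal{I}_{\mathcal{A},m}$ be the vector space of all degree-$m$ homogeneous polynomials that vanish on $\mathcal{A}$, where $m \ge n$. Then $\mathcal{S}_i$ is the orthogonal complement of the subspace spanned by all vectors of the form $\nabla p|_{\boldsymbol{x}}$, where $p \in \mathcal{I}_{\mathcal{A},m}$, i.e., $\mathcal{S}_i = \operatorname{Span}\left(\nabla \mathcal{I}_{\mathcal{A},m}|_{\boldsymbol{x}}\right)^\perp$.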

###### Example

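Consider the setting of Examples 2.3 and 2.3. Suppose that we have the upper bound $m = 4$ on the number of underlying subspaces ($n = 3$). It can be shown that the vector space of degree-$4$ homogeneous polynomials that vanish on $\mathcal{A}$ has dimension $8$ and is spanned by the polynomials

$$\begin{aligned} q_1 &:= (b_1^\top x)(f^\top x)^3, & q_5 &:= (b_1^\top x)(f^\top x)(b_3^\top x)^2, \\ q_2 &:= (b_1^\top x)^2 (f^\top x)^2, & q_6 &:= (b_1^\top x)(b_2^\top x)^2 (f^\top x), \\ q_3 &:= (b_1^\top x)^3 (f^\top x), & q_7 &:= (b_1^\top x)(b_2^\top x)^2 (b_3^\top x), \\ q_4 &:= (b_1^\top x)(f^\top x)^2 (b_3^\top x), & q_8 &:= (b_1^\top x)(b_2^\top x)(b_3^\top x)^2, \end{aligned} \tag{10-13}$$

where $b_1$ is the normal to $\mathcal{S}_1$, $f$ is the normal to the plane defined by lines $\mathcal{S}_2$ and $\mathcal{S}_3$, and $b_i$ is a normal to line $\mathcal{S}_i$ that is linearly independent from $f$, for $i = 2, 3$. (The dimension can be verified by applying the dimension formula of Corollary 3.4 in [5].) Hence $\mathcal{S}_1 = \operatorname{Span}(b_1)^\perp$ and $\mathcal{S}_i = \operatorname{Span}(b_i, f)^\perp$, $i = 2, 3$. Then for a generic point $\boldsymbol{x}_2 \in \mathcal{S}_2$, we have that

$$\begin{aligned} &\nabla q_1|_{\boldsymbol{x}_2} = \nabla q_2|_{\boldsymbol{x}_2} = \nabla q_4|_{\boldsymbol{x}_2} = \nabla q_6|_{\boldsymbol{x}_2} = \nabla q_7|_{\boldsymbol{x}_2} = 0, \\ &\nabla q_3|_{\boldsymbol{x}_2} \cong \nabla q_5|_{\boldsymbol{x}_2} \cong f, \qquad \nabla q_8|_{\boldsymbol{x}_2} \cong b_2. \end{aligned} \tag{14-15}$$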

Hence and so . Similarly to Example 2.3, since every element of is a linear combination of the , we have .

###### Remark

Notice that both Theorems 2.3 and 2.4 are statements about the abstract subspace arrangement , i.e., no finite subset of is explicitly considered. To pass from to and get similar Theorems, we need to require to be in general position in , in some suitable sense. As one may suspect, this notion of general position must entail that polynomials of degree for Theorem 2.3, or of degree for Theorem 2.4, that vanish on must also vanish on and vice versa. In that case, we can compute the required basis for , simply by computing a basis for , by means of the Veronese embedding described in section 2.1, and similarly for . We will make the notion of general position precise in Definition 4.1.

### 2.5 Computational complexity and recursive ASC

Although Theorem 2.4 is quite satisfactory from a theoretical point of view, using an upper bound for the number of subspaces comes with the practical disadvantage that the dimension of the Veronese embedding grows exponentially with the upper bound. In addition, increasing the upper bound also increases the number of polynomials in the null space of the embedded data matrix, some of which will eventually, as the upper bound becomes large, be polynomials that simply fit the data but do not vanish on the subspace arrangement. To reduce the computational complexity of the polynomial differentiation algorithm, one can consider vanishing polynomials of degree smaller than the number of subspaces, as suggested by Example 2.3. While such vanishing polynomials may not be sufficient to cluster the data into the individual subspaces, they still provide a clustering of the data into a smaller number of subspaces. We can then look at each of these clusters and see if they can be partitioned further. For instance, in Example 2.3, we can first cluster the data into two planes, the plane $\mathcal{S}_1$ and the plane containing the two lines $\mathcal{S}_2$ and $\mathcal{S}_3$, and then partition the data lying in the latter plane into the two lines $\mathcal{S}_2$ and $\mathcal{S}_3$.

This leads to the recursive ASC algorithm proposed in [16, 41], which is based on finding the polynomials of the smallest possible degree that vanish on the data, computing the gradients of these vanishing polynomials to cluster the data into groups, and then repeating the procedure for each group until the data from each group can be fit by polynomials of degree $1$, in which case each group lies in a single linear subspace. While this recursive ASC algorithm is very intuitive, no rigorous proof of its correctness has appeared in the literature. In fact, there are examples where this recursive method provably fails in the sense of producing ghost subspaces in the decomposition of the arrangement. For instance, when partitioning the data from Example 2.3 into the two planes, we may assign the data from the intersection of the two planes to the plane containing the two lines. If this is the case, when trying to partition further the data of that plane, we will obtain three lines: $\mathcal{S}_2$, $\mathcal{S}_3$ and the ghost line given by the intersection of the two planes (see Fig. 3).

### 2.6 Instability in the presence of noise and spectral ASC

Another important issue with Theorem 2.4 from a practical standpoint is its sensitivity to noise. More precisely, when implementing Theorem 2.4 algorithmically, one is required to estimate the dimension of the null space of , which is an extremely challenging problem in the presence of noise. Moreover, small errors in the estimation of have been observed to have dramatic effects in the quality of the clustering, thus rendering algorithms that are directly based on Theorem 2.4 unstable. While the recursive ASC algorithm of [16, 41] is more robust than such algorithms, it is still sensitive to noise, as considerable errors may occur in the partitioning process. Moreover, the performance of the recursive algorithm is always subject to degradation due to the potential occurrence of ghost subspaces.

To enhance the robustness of ASC in the presence of noise and obtain a stable working algebraic algorithm, the standard practice has been to apply a variation of the polynomial differentiation algorithm based on spectral clustering [35]. More specifically, given noisy data $\mathcal{X}$ lying close to a union of subspaces $\mathcal{A}$, one computes an approximate vanishing polynomial $p$ whose coefficients are given by the right singular vector of the embedded data matrix corresponding to its smallest singular value. Given $p$, one computes the gradient of $p$ at each point in $\mathcal{X}$ (which gives a normal vector associated with each point in $\mathcal{X}$), and builds an affinity matrix between points $\boldsymbol{x}_j$ and $\boldsymbol{x}_{j'}$ as the cosine of the angle between their corresponding normal vectors, i.e.,

$$C_{jj',\text{angle}} = \left| \left\langle \frac{\nabla p|_{\boldsymbol{x}_j}}{\left\| \nabla p|_{\boldsymbol{x}_j} \right\|}, \frac{\nabla p|_{\boldsymbol{x}_{j'}}}{\left\| \nabla p|_{\boldsymbol{x}_{j'}} \right\|} \right\rangle \right|. \tag{16}$$

This affinity is then used as input to any spectral clustering algorithm (see [44] for a tutorial on spectral clustering) to obtain a clustering of the data. We refer to this Spectral ASC method with angle-based affinity as SASC-A.
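The construction of the angle-based affinity (16) can be sketched as follows. The code stops at the affinity matrix and leaves the spectral clustering step to any standard implementation. The function names (`exponents`, `veronese`, `gradients`, `sasc_a_affinity`), the two coordinate planes used as data, and the noise level `1e-3` are all assumptions made for illustration; this is not the reference implementation of [35].

```python
import itertools
import numpy as np

def exponents(D, degree):
    """Exponent vectors of all monomials of the given degree in D variables."""
    exps = []
    for combo in itertools.combinations_with_replacement(range(D), degree):
        e = np.zeros(D, dtype=int)
        for k in combo:
            e[k] += 1
        exps.append(e)
    return np.array(exps)

def veronese(X, E):
    """N x M matrix of all monomials in E evaluated at the rows of X."""
    return np.prod(X[:, None, :] ** E[None, :, :], axis=2)

def gradients(c, E, X):
    """N x D matrix whose j-th row is the gradient of the polynomial c at X[j]."""
    G = np.zeros_like(X, dtype=float)
    for k in range(X.shape[1]):
        Ek = E.copy()
        Ek[:, k] = np.maximum(Ek[:, k] - 1, 0)
        G[:, k] = veronese(X, Ek) @ (c * E[:, k])
    return G

def sasc_a_affinity(X, degree):
    """Angle-based affinity (16) from one approximate vanishing polynomial."""
    E = exponents(X.shape[1], degree)
    _, _, Vt = np.linalg.svd(veronese(X, E), full_matrices=False)
    c = Vt[-1]                      # right singular vector, smallest singular value
    G = gradients(c, E, X)
    G /= np.linalg.norm(G, axis=1, keepdims=True)
    return np.abs(G @ G.T)

# Hypothetical data: two orthogonal planes in R^3 with small additive noise.
rng = np.random.default_rng(0)
P1 = rng.standard_normal((50, 3)); P1[:, 0] = 0.0
P2 = rng.standard_normal((50, 3)); P2[:, 2] = 0.0
X = np.vstack([P1, P2]) + 1e-3 * rng.standard_normal((100, 3))
labels = np.repeat([0, 1], 50)

C = sasc_a_affinity(X, degree=2)
same = labels[:, None] == labels[None, :]
intra, inter = C[same].mean(), C[~same].mean()
```

For this hyperplane example the intra-cluster affinities are close to 1 and the inter-cluster ones close to 0, matching the intuition given in the text; points very near the intersection of the two planes have small gradients and are the ones most affected by noise.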

To gain some intuition about , suppose that is a union of hyperplanes and that there is no noise in the data. Then must be of the form . In this case is simply the cosine of the angle between the normals to the hyperplanes that are associated with points and . If both points lie in the same hyperplane, their normals must be equal, and hence . Otherwise, is the cosine of the angles between the hyperplanes. Thus, assuming that the smallest angle between any two hyperplanes is sufficiently large and that the points are well distributed on the union of the hyperplanes, applying spectral clustering to the affinity matrix will in general yield the correct clustering.

Even though SASC-A is much more robust in the presence of noise than purely algebraic methods for the case of a union of hyperplanes, it is fundamentally limited by the fact that, theoretically, it applies only to unions of hyperplanes. Indeed, if the orthogonal complement of a subspace has dimension greater than , there may be points inside such that the angle between and is as large as . In such instances, points associated to the same subspace may be weakly connected and thus there is no guarantee for the success of spectral clustering.

### 2.7 The challenge

As the discussion so far suggests, the state of the art in ASC can be summarized as follows:

1. A complete closed form solution to the abstract subspace clustering problem (Problem 1.3) exists and can be found using the polynomial differentiation algorithm implied by Theorem 2.4.

2. All known algorithmic variants of the polynomial differentiation algorithm are sensitive to noise, especially for subspaces of arbitrary dimensions.

3. The recursive ASC algorithm described in section 2.5 does not in general solve the abstract subspace clustering problem (Problem 1.3), and is in addition sensitive to noise.

4. The spectral algebraic algorithm described in section 2.6 is less sensitive to noise, but is theoretically justified only for unions of hyperplanes.

The above list reveals the challenge that we address in the rest of this paper: develop an ASC algorithm that solves the abstract subspace clustering problem for perfect data while also being robust to noisy data.

## 3 Filtrated Algebraic Subspace Clustering - Overview

This section provides an overview of our proposed Filtrated Algebraic Subspace Clustering (FASC) algorithm, which conveys the geometry of the key idea of this paper while keeping technicalities at a minimum. To that end, let us pretend for a moment that we have access to the entire set , so that we can manipulate it via set operations such as taking its intersection with some other set. Then the idea behind FASC is to construct a descending filtration of the given subspace arrangement , i.e., a sequence of inclusions of subspace arrangements, that starts with and terminates after a finite number of steps with one of the irreducible components of [Footnote 13: We will also be using the notation , where the arrows denote embeddings.]:

 (17)

The mechanism for generating such a filtration is to construct a strictly descending filtration of intermediate ambient spaces, i.e.,

$$V_0 \supset V_1 \supset V_2 \supset \cdots, \tag{18}$$

such that , , and each contains the same fixed irreducible component of . Then the filtration of subspace arrangements is obtained by intersecting with the filtration of ambient spaces, i.e.,

$$A_0 := A \supset A_1 := A \cap V_1 \supset A_2 := A \cap V_2 \supset \cdots. \tag{19}$$

This can be seen equivalently as constructing a descending filtration of pairs , where is a subspace arrangement of :

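$$(\mathbb{R}^{D}, A) \leftarrow (V_1 \cong \mathbb{R}^{D-1}, A_1) \leftarrow (V_2 \cong \mathbb{R}^{D-2}, A_2) \leftarrow \cdots. \tag{20}$$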

But how can we construct a filtration of ambient spaces (18) that satisfies the apparently strong condition ? The answer lies at the heart of ASC: to construct , pick a suitable polynomial vanishing on and evaluate its gradient at a nonsingular point of . Notice that will lie in some irreducible component of . Then take to be the hyperplane of defined by the gradient of at . We know from Proposition C that must contain .

To construct we apply essentially the same procedure on the pair : take a suitable polynomial that vanishes on , but does not vanish on , and take to be the hyperplane of defined by . As we will show in section 4, it is always the case that and so .

Now notice that after precisely such steps, where is the codimension of , will be a -dimensional linear subspace of that by construction contains . But is also a -dimensional subspace, and the only possibility is that . Observe also that this is precisely the step where the filtration naturally terminates, since there is no polynomial that vanishes on but does not vanish on .

The relations between the intermediate ambient spaces and subspace arrangements are illustrated in the commutative diagram of (21). The filtration in (21) will yield the irreducible component of that contains the nonsingular point that we started with. We will be referring to such a point as the reference point. We can also take without loss of generality . Having identified , we can pick a nonsingular point and construct a filtration of as above with reference point . Such a filtration will terminate with the irreducible component of containing , which without loss of generality we take to be . Picking a new reference point and so on, we can identify the entire list of irreducible components of , as described in Algorithm 1.

 (21)

###### Example

Consider the setting of Examples 2.3 and 2.3. Suppose that in the first filtration the algorithm picks as reference point . Suppose further that the algorithm picks the polynomial , which vanishes on but certainly not on . Then the first ambient space of the filtration associated to is constructed as . Since , this gives that is precisely the plane of with normal vector . Then is constructed as , which consists of the union of three lines , where is the intersection of with (see Figs. 3 and 3).

Since , the algorithm takes one more step in the filtration. Suppose that the algorithm picks the polynomial , where is the unique normal vector of that is orthogonal to , for (see Fig. 3). Because of the general position assumption, none of the lines is orthogonal to another. Consequently, . Moreover, since , we have that defines a line in that must contain . Intersecting with we obtain , and the filtration terminates, returning the irreducible component of associated to reference point .

Continuing, the algorithm now picks a new reference point , say . A similar process as above will identify as the intermediate ambient space of the filtration associated to that arises after one step. Then a third reference point will be chosen as and will be identified as the intermediate ambient space of the filtration associated to that arises after two steps. Since the set is empty, the algorithm will terminate and return , which is, up to a permutation, a decomposition of the original subspace arrangement into its constituent subspaces.
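The first filtration of the example can be traced numerically. The sketch below assumes an arrangement in R^3 made of one plane and two lines, with the reference point on one of the lines; the specific vectors, the helper names, and the choice of the second polynomial (a product of three linear forms, one per line of the intersected arrangement) are our own assumptions and need not coincide with the polynomials used in the paper.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def grad_product(normals, x):
    """Gradient at x of p(x) = prod_i (n_i . x), the rows of `normals` being the n_i."""
    vals = normals @ x
    # Product rule: differentiate one linear factor at a time.
    return sum(np.prod(np.delete(vals, i)) * normals[i] for i in range(len(normals)))

# Toy arrangement in R^3 (illustrative numbers): a plane S1 and two lines S2, S3.
b1 = unit(np.array([1.0, 2.0, 3.0]))   # normal of the plane S1
d2 = np.array([1.0, 0.0, 0.0])         # direction of the line S2
d3 = unit(np.array([0.0, 1.0, 1.0]))   # direction of the line S3
f = unit(np.cross(d2, d3))             # normal of the plane spanned by S2 and S3

x = d2                                 # reference point: on S2, not on S1 or S3

# Step 1: p1(x) = (b1 . x)(f . x) vanishes on S1, S2 and S3.
# Its gradient at the reference point is the normal of the hyperplane V1.
n1 = unit(grad_product(np.stack([b1, f]), x))

# Intersecting the arrangement with V1 leaves three lines:
# S2, S3, and the line where the plane S1 meets V1.
l1 = unit(np.cross(b1, n1))
lines = [l1, d2, d3]

# Step 2: inside V1, each line is cut out by a normal that is orthogonal to n1.
g = np.stack([unit(np.cross(n1, d)) for d in lines])
n2 = unit(grad_product(g, x))

# V2 = V1 intersected with {n2 . x = 0} is a line in R^3 with direction n1 x n2.
recovered = unit(np.cross(n1, n2))
print(abs(recovered @ d2))
```

After two steps, matching the codimension of the line containing the reference point, the intermediate ambient space is one-dimensional and can be compared against the direction `d2` of that line.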

Strictly speaking, Algorithm 1 is not a valid algorithm in the computer-science-theoretic sense, since it takes as input an infinite set , and it involves operations such as checking equality of the infinite sets and . Moreover, the reader may reasonably ask:

1. Why is it the case that through the entire filtration associated with reference point we can always find polynomials such that ?

2. Why is it true that even if then ?

We address all of the above issues, and more, in the next section, which is devoted to rigorously establishing the theory of the FASC algorithm. [Footnote 14: At this point the reader unfamiliar with algebraic geometry is encouraged to read the appendices before proceeding.]

## 4 Filtrated Algebraic Subspace Clustering - Theory

This section formalizes the concepts outlined in section 3. Section 4.1 formalizes the notion of a set being in general position inside a subspace arrangement . Sections 4.2-4.4 establish the theory of a single filtration of a finite subset lying in general position inside a transversal subspace arrangement , and culminate with the Algebraic Descending Filtration (ADF) algorithm for identifying a single irreducible component of (Algorithm 2) and the theorem establishing its correctness (Theorem 2). The ADF algorithm naturally leads us to the core contribution of this paper in section 4.5, which is the FASC algorithm for identifying all irreducible components of (Algorithm 3) and the theorem establishing its correctness (Theorem 3).

### 4.1 Data in general position in a subspace arrangement

From an algebraic-geometric point of view, a union of linear subspaces is the same as the set of polynomial functions that vanish on . However, from a computer-science-theoretic point of view, and are quite different: is an infinite set and hence it cannot be given as input to any algorithm. On the other hand, even though is also an infinite set, it is generated as an ideal by a finite set of polynomials, which can certainly serve as input to an algorithm. That said, from a machine-learning point of view, both and are often unknown, and one is usually given only a finite set of points in , from which we wish to compute its irreducible components .
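To make the passage from a finite sample to vanishing polynomials concrete, here is a minimal sketch of one common device in ASC: evaluate all monomials of a fixed degree at the data points and take the left null space of the resulting matrix. The function names, the tolerance, and the toy data (two coordinate planes in R^3, sampled at random) are assumptions made for illustration; the sketch only shows that a finite sample in sufficiently general position can determine the vanishing polynomials of a given degree.

```python
import numpy as np
from itertools import combinations_with_replacement

def monomial_matrix(X, degree):
    """Evaluate every monomial of the given degree at the columns of X.

    Returns the matrix (one row per monomial) and the list of index tuples,
    where e.g. (0, 2) stands for the monomial x_0 * x_2.
    """
    D = X.shape[0]
    monomials = list(combinations_with_replacement(range(D), degree))
    V = np.array([np.prod(X[list(idx), :], axis=0) for idx in monomials])
    return V, monomials

def vanishing_polynomials(X, degree, tol=1e-8):
    """Coefficient vectors of the degree-`degree` polynomials vanishing on the columns of X."""
    V, monomials = monomial_matrix(X, degree)
    # A coefficient vector c vanishes on the data iff c^T V = 0 (left null space of V).
    U, s, _ = np.linalg.svd(V, full_matrices=True)
    rank = int(np.sum(s > tol * s[0]))
    return U[:, rank:], monomials

# Toy sample: 10 random points on the plane {x_2 = 0} and 10 on the plane {x_0 = 0}.
rng = np.random.default_rng(0)
P1 = rng.standard_normal((3, 10)); P1[2, :] = 0.0
P2 = rng.standard_normal((3, 10)); P2[0, :] = 0.0
X = np.hstack([P1, P2])

C, monomials = vanishing_polynomials(X, degree=2)
print(C.shape, monomials)
```

For this sample the only quadratic that vanishes on both planes is, up to scale, the monomial x_0 x_2, so the returned coefficient matrix has a single column concentrated on the entry indexed by `(0, 2)`.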

To harness the power of the algebraic-geometric machinery, while providing an algorithm of interest to the machine learning and computer science communities, we adopt the following setting. The input to our algorithm will be the pair , where is a finite subset of an unknown union of linear subspaces of , and is an upper bound on . To make the problem of recovering the decomposition from well-defined, it is necessary that be uniquely identifiable from . In other words, must be in general position inside , as defined next.

###### Definition (Points in general position)

Let be a finite subset of a subspace arrangement . We say that