A Notion of Harmonic Clustering in Simplicial Complexes

10/16/2019
by Stefania Ebli, et al.

We outline a novel clustering scheme for simplicial complexes that produces clusters of simplices in a way that is sensitive to the homology of the complex. The method is inspired by, and can be seen as a higher-dimensional version of, graph spectral clustering. The algorithm involves only sparse eigenproblems, and is therefore computationally efficient. We believe that it has broad application as a way to extract features from simplicial complexes that often arise in topological data analysis.


1 Introduction

An important objective in modern machine learning, and part of many scientific and data analysis pipelines, is clustering [berkhin2007survey]. By clustering, we generally mean the separation of data into groups, in a way that is somehow meaningful for the domain-specific relationships that govern the underlying data and problem in question. However, the demands that the clustering scheme should satisfy are of course inherently vague.

For data that form a point cloud in Euclidean space, and where one expects clusters to exist, one may employ elementary methods such as k-means clustering [steinhaus1956division]. For data in a more abstract “similarity space”, for which no obviously meaningful Euclidean embedding exists, researchers invented the schemes [sorensen1948method, sneath1957application] that we today refer to as hierarchical clustering. Alternatively, one can derive a graph structure from some notion of similarity between the data points. Treating the data points as vertices of a graph allows one to exploit the popular and highly successful spectral clustering techniques [vonluxburg2007tutorial, ng2001] which developed from the field of spectral graph theory [chung1997spectral].

Although the graph structure provides us with additional information about the data, graphs are intrinsically limited to modeling pairwise interactions. The success of topological methods in studying data, and the parallel establishment of topological data analysis (TDA) as a field [edelsbrunner2000topological, zomorodian2005computing] (see also [carlsson2008, chazal2017, edelsbrunner2010computational, ghrist2008barcodes] for modern introductions and surveys), have confirmed the usefulness of viewing data through a higher-dimensional analog of graphs [moore2012, patania2017]. Such a higher-dimensional analog is called a simplicial complex, a mathematical object whose structure can describe n-fold interactions between points. The ability of simplicial complexes to capture hidden patterns in the data has led to various applications from biology [giusti2015, reimann2017] to material science [hiraoka2016]. Recent work has also expanded classical graph-centric results — such as Cheeger inequalities [gundert2014higher, braxton2018], isoperimetric inequalities [parzanchevski2016] and spectral methods [horak2013spectra] — to simplicial complexes. This leads naturally towards a novel domain of “spectral TDA” methods.

In this paper we present the harmonic clustering algorithm, a novel clustering scheme inspired by the well-known spectral clustering algorithm for graphs. Our method, like spectral clustering, does not require any parameter optimization and involves only computing the eigenvectors associated with the smallest eigenvalues of a sparse matrix. The harmonic clustering algorithm is applied directly to a simplicial complex, and it outputs a clustering of the simplices (of a fixed degree) that is sensitive to the homological structure of the complex, something that is highly relevant in TDA. Moreover, since simplices can encode interactions of higher order than just the pairs captured by graphs, our algorithm allows us to cluster complex community structures rather than just the entities they comprise.

Our method can be seen as complementary to the one presented in [braxton2018].

1.1 Spectral graph theory

The method we present in this paper does not require many formal results from spectral graph theory. The notions relevant for our purposes are described below for the sake of completeness.

In its simplest form, the Laplacian of an undirected and unweighted finite graph G is taken to be the positive semidefinite matrix L = D − A, where A is the adjacency matrix of G and D its diagonal degree matrix (i.e. the row/column sums of A). The normalized Laplacian is then defined as L_norm = D^(−1/2) L D^(−1/2). For reasons that will become clear later on (see 2.1), we will write C_0(G) for the free real vector space generated by the vertices of G, and consider L as the matrix of a linear map C_0(G) → C_0(G) in this basis.

Already in the middle of the 19th century it was clear that the eigenvalue spectrum of L has a lot to say about G, as is evident from Kirchhoff’s historic theorem relating the eigenvalues of the Laplacian to the number of spanning trees in the graph [kirchhoff1847]. From the 1950s, graph theorists and quantum chemists were independently discovering more relationships between a graph and the eigenspectrum of its Laplacian. However, the publication of the book [cvetkovic1979] may be said to mark the start of spectral graph theory as a field in its own right. A modern introduction to the field and references to the results listed below can be found in [chung1997spectral].

The spectrum of L encodes information about the connectivity of the graph. For instance, the number of connected components of the graph is equal to the dimension of the kernel of L. Moreover, the eigenvectors associated to the zero eigenvalue, also called harmonic representatives, take constant values on connected components. A perhaps more interesting result is given by the Cheeger constant [cheeger1969lower], a measure of how far a connected graph is from being disconnected, which bounds the smallest non-zero eigenvalue of L.

Theorem 1 (Cheeger [cheeger1969lower]; see also e.g. [chung1997spectral]).

Let G be a finite, connected, undirected, unweighted graph. Write P(G) for the partitions of the vertex set into non-empty subsets A and B, and write E(A, B) for the set of edges with one endpoint in A and the other in B. Define the Cheeger constant of G as

h(G) = min over (A, B) in P(G) of |E(A, B)| / min(vol(A), vol(B)),

where vol(A) denotes the sum of the degrees of the vertices in A. Then the first non-zero eigenvalue λ₁ of the graph’s normalized Laplacian satisfies

h(G)²/2 ≤ λ₁ ≤ 2 h(G).

A partition of G into A and B that attains the Cheeger constant is called a Cheeger cut. It is known that finding an exact Cheeger cut is an NP-hard problem [szlam2010total]. One of the best known approaches to approximating the Cheeger cut is the spectral clustering method, which takes the eigenvector associated with the first non-zero eigenvalue of the graph Laplacian as a relaxed real-valued solution of the original discrete optimization problem [vonluxburg2007tutorial]. Namely, this eigenvector of L, also called the Fiedler vector or the connectivity vector [fiedler1973algebraic], can be exploited to find the best partition of the graph into two “almost disconnected” components. The Cheeger cut can be easily generalized to find k “almost disconnected” components using the eigenvectors of the first k non-zero eigenvalues of the graph Laplacian [vonluxburg2007tutorial].
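As a sanity check of Theorem 1 on a toy example, the following sketch (assuming a small path graph; the brute-force enumeration is only feasible for tiny graphs) computes the Cheeger constant over all bipartitions and verifies the bounds against λ₁ of the normalized Laplacian:

```python
import numpy as np
from itertools import combinations

# Path graph on 4 vertices: 0 - 1 - 2 - 3.
edges = [(0, 1), (1, 2), (2, 3)]
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
deg = A.sum(axis=1)

# Normalized Laplacian D^(-1/2) (D - A) D^(-1/2), and its first non-zero eigenvalue.
Dinv = np.diag(deg ** -0.5)
L_norm = Dinv @ (np.diag(deg) - A) @ Dinv
lam1 = np.sort(np.linalg.eigvalsh(L_norm))[1]

# Brute-force Cheeger constant: minimize |E(A,B)| / min(vol A, vol B)
# over all bipartitions into non-empty vertex sets A and B.
h = np.inf
for k in range(1, n // 2 + 1):
    for Aset in combinations(range(n), k):
        Aset = set(Aset)
        cut = sum(1 for i, j in edges if (i in Aset) != (j in Aset))
        volA = sum(deg[v] for v in Aset)
        volB = deg.sum() - volA
        h = min(h, cut / min(volA, volB))

# The Cheeger inequality: h^2/2 <= lam1 <= 2h.
assert h ** 2 / 2 <= lam1 <= 2 * h
```

For this path graph the optimal cut is the middle edge, giving h = 1/3.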

1.2 Graph spectral clustering

The Fiedler vector being a relaxed solution of the Cheeger cut has implications for clustering the vertices of a graph into “almost disconnected” components [vonluxburg2007tutorial, belkin2002laplacian]. For the remainder of this section we will assume that the graph under consideration is connected.

Graph spectral clustering of a graph G with Laplacian L works in two steps. First, one uses the information encoded in the lowest-eigenvalue eigenvectors of L to map V(G) into low-dimensional Euclidean space. One thereafter uses standard k-means or any applicable Euclidean clustering technique on the points in the image of this map, before pulling back to V(G). Specifically, we will write v_1, …, v_k for the eigenvectors associated with the first k non-zero eigenvalues of L. One defines a function f: V(G) → R^k, also called a spectral embedding, by

f(u) = (⟨u, v_1⟩, …, ⟨u, v_k⟩),    (1)

where ⟨·, ·⟩ is the inner product on C_0(G) that makes the vertices orthonormal. As a finite Euclidean point cloud, f(V(G)) is then clustered in R^k by standard k-means or any suitable clustering algorithm. The clustering obtained is then pulled back to V(G). Figure 1 shows an example. Observe that in this case, mapping into the real line using only the Fiedler vector would suffice (i.e. k = 1).

Figure 1: Graph spectral clustering of the nodes of a graph with two well-connected components weakly interconnected. Clustering using the Fiedler vector produces as clusters the well-connected components.
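The two-step procedure can be made concrete with a minimal sketch in Python (using only NumPy, on a hypothetical toy graph of two triangles joined by a single bridge edge); for the k = 1 case the sign of the Fiedler vector alone separates the two well-connected components, as in figure 1:

```python
import numpy as np

# A small graph: two triangles joined by a single bridge edge.
# Vertices 0-2 form one triangle, 3-5 the other; edge (2, 3) bridges them.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6

# Adjacency matrix, diagonal degree matrix, and the Laplacian L = D - A.
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
D = np.diag(A.sum(axis=1))
L = D - A

# Eigenvectors sorted by eigenvalue; the Fiedler vector is the one
# associated with the smallest non-zero eigenvalue.
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]

# Clustering by the sign of the Fiedler vector separates the two triangles.
cluster = fiedler > 0
```

For larger graphs one would use a sparse matrix and an iterative eigensolver rather than a dense eigendecomposition.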

As pointed out in [vonluxburg2007tutorial], spectral clustering is one of the standard approaches to identifying groups of “similar behavior” in empirical data. It is therefore not surprising that it has been successfully employed in many fields ranging from computer science and statistics to biology and social science. Moreover, compared to other approaches, such as Gaussian mixture model clustering, spectral clustering does not require any parameter optimization and can be solved efficiently by standard linear algebra methods.

2 Harmonic clustering in simplicial complexes

Our method is inspired by spectral clustering in graphs, but applies instead to a higher-dimensional analog, namely simplicial complexes. Instead of clustering only vertices, which are the zero-dimensional building blocks of graphs and simplicial complexes, the method independently clusters building blocks of any dimension.

This section outlines the prerequisite basic constructions from algebraic topology before describing our method. A reader interested in more background on algebraic topology is directed to standard textbooks [hatcher].

Those wishing a quick overview of the method can view it in algorithmic form in figure 3.

2.1 Algebraic topology

A simplicial complex is a collection of finite sets closed under taking subsets. We refer to a set in a simplicial complex as a simplex of dimension n if it has cardinality n + 1. Such an n-simplex has n + 1 faces of dimension n − 1, namely the subsets obtained by omitting one element each. While this definition is entirely combinatorial, we will soon see that there is a geometric interpretation, and it will make sense to refer to and think of 0-simplices as vertices, 1-simplices as edges, 2-simplices as triangles, 3-simplices as tetrahedra, and so forth.

Let C_n(K) be the free real vector space with basis K_n, the set of n-simplices in a simplicial complex K. The elements of C_n(K) are called n-chains. These vector spaces come equipped with boundary maps, namely linear maps ∂_n: C_n(K) → C_{n−1}(K) defined on a simplex {v_0, …, v_n} (with a chosen ordering of the vertices) by

∂_n {v_0, …, v_n} = Σ_{i=0}^{n} (−1)^i {v_0, …, v̂_i, …, v_n},

where v̂_i denotes omission of the i’th vertex, with the convention that ∂_0 = 0 and C_{−1}(K) = 0 for convenience. Figure 2 shows how the boundary maps give a geometric interpretation of simplicial complexes.

One readily verifies that ∂_n ∘ ∂_{n+1} = 0, and so (C_•(K), ∂_•) is a real chain complex. By the n’th homology vector space of K we will mean the n’th homology vector space of this chain complex, namely

H_n(K) = ker ∂_n / im ∂_{n+1}.

The elements of ker ∂_n are called n-cycles, while those of im ∂_{n+1} are called n-boundaries, as can be seen geometrically in Figure 2. The Betti numbers are the dimensions of the homology vector spaces, and we write β_n = dim H_n(K). Intuitively, the Betti numbers count connected components, non-bounding loops, non-bounding cavities, and so forth.
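For readers who prefer code, a minimal sketch of the boundary maps might look as follows (representing simplices as sorted tuples of vertex indices is an assumption of this illustration); it checks the fundamental identity that the boundary of a boundary vanishes:

```python
import numpy as np

def boundary_matrix(k_simplices, km1_simplices):
    """Matrix of the boundary map sending a k-simplex (v_0, ..., v_k)
    to the alternating sum of its (k-1)-dimensional faces."""
    index = {s: i for i, s in enumerate(km1_simplices)}
    B = np.zeros((len(km1_simplices), len(k_simplices)))
    for j, s in enumerate(k_simplices):
        for omit in range(len(s)):
            face = s[:omit] + s[omit + 1:]
            B[index[face], j] = (-1) ** omit
    return B

# A filled triangle on vertices 0, 1, 2: three vertices, three edges, one 2-simplex.
verts = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]
tris = [(0, 1, 2)]

B1 = boundary_matrix(edges, verts)  # edges -> vertices
B2 = boundary_matrix(tris, edges)   # triangles -> edges

# The composite of two consecutive boundary maps is zero.
assert np.allclose(B1 @ B2, 0)
```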

We emphasize again that this is homology with real coefficients, not integer or finite field, as is common in TDA.

Figure 2: A simplicial complex with 0-simplices (the vertices), 1-simplices (the edges) and 2-simplices (the filled triangles), with some highlighted 1-chains. The highlighted simplices represent the edges with non-zero coefficient in each chain (the unfamiliar reader is invited to fill in possible values for these coefficients). The red 1-chain consists of a single 1-simplex, and is neither a cycle nor a boundary. The orange 1-chain has trivial boundary, and is therefore a cycle. It is not a representative of any non-trivial homology class, for it is the boundary of the 2-chain consisting of the three 2-simplices it encloses. The green and the blue 1-chains are cycles that represent the same homology class (intuitively the 1-dimensional hole in the middle). H_0 is 1-dimensional, reflecting the complex’s single connected component, while H_1 is 1-dimensional due to the central hole.

2.2 Simplicial Laplacians

We are in this paper concerned with finite simplicial complexes, and assume that they are built in a way that encodes useful information about the data being studied. We will briefly discuss the case where each simplex in K comes equipped with extra data — including, but not limited to, the filtration/weighting information that is ubiquitous in TDA — or with a normalization factor derived from the complex’s structure, in the form of a positive weight function on the simplices. The latter is analogous to the various normalization schemes that are often used in graph spectral theory. Our computational experiments, however, will only consider the case of trivial weights.

The weights are encoded into the chain complex by endowing each degree with the inner product that makes distinct simplices orthogonal and gives each simplex norm equal to its weight.

Further discussions of weighting schemes can be found in [horak2013spectra].

We place no further assumptions on the simplicial complex that we take as input. In particular, it is not necessary for it to come equipped with some embedding into Euclidean space, nor do we demand that it triangulates a Riemannian manifold. Therefore dualities like the Hodge star, which is used to construct the Hodge–de Rham Laplacian in the smooth setting [madsen1997calculus] that motivates us, are unavailable for our method. The same is true for discrete versions of the Hodge star, such as that of Hirani [hirani2003thesis]. Instead of dualizing with respect to a Hodge star, to define a discrete version of the Laplacian for simplicial complexes, we simply take the linear adjoint of the boundary operator with respect to the inner product, defining ∂_n^*: C_{n−1}(K) → C_n(K) by

⟨∂_n^* x, y⟩ = ⟨x, ∂_n y⟩ for all x ∈ C_{n−1}(K) and y ∈ C_n(K).

In analogy with Hodge–de Rham theory, we then define the degree-n simplicial Laplacian of a simplicial complex K as the linear operator L_n: C_n(K) → C_n(K) given by

L_n = ∂_{n+1} ∂_{n+1}^* + ∂_n^* ∂_n.

The harmonics are defined as

H_n = ker L_n.

Observe that there are d + 1 Laplacians L_0, …, L_d for a complex of dimension d. In most practical applications, the matrices for the Laplacians are very sparse and can easily be computed as products of sparse boundary matrices and their transposes.
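Under trivial weights, the matrix of the adjoint is just the transpose of the boundary matrix, so the Laplacian is a product of boundary matrices and their transposes. A minimal sketch for a hollow triangle, whose single non-bounding loop gives β₁ = 1 (dense NumPy arrays are used for brevity; in practice one would use sparse matrices and iterative eigensolvers):

```python
import numpy as np

# Hollow triangle: vertices 0, 1, 2 and edges (0,1), (0,2), (1,2),
# with no filled 2-simplex, so there is one non-bounding loop (beta_1 = 1).
# B1 is the boundary matrix from edges to vertices; column j holds the
# alternating-sum coefficients of the faces of edge j.
B1 = np.array([
    [-1, -1,  0],   # vertex 0
    [ 1,  0, -1],   # vertex 1
    [ 0,  1,  1],   # vertex 2
])

# With trivial weights, the degree-1 simplicial Laplacian is
# L1 = B2 @ B2.T + B1.T @ B1; here there are no 2-simplices, so B2 = 0.
L1 = B1.T @ B1

# The harmonics are the kernel of L1; its dimension is the Betti number beta_1.
eigvals, eigvecs = np.linalg.eigh(L1)
harmonic_dim = int(np.sum(np.isclose(eigvals, 0)))
```

The kernel is spanned by the loop (0,1) − (0,2) + (1,2), i.e. the coefficient vector (1, −1, 1).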

The following discrete version of the important Hodge decomposition theorem is a simple exercise in linear algebra in the current setting.

Theorem 2 (Eckmann [eckmann1944]).

The vector spaces of chains decompose orthogonally as

C_n(K) = im ∂_{n+1} ⊕ H_n ⊕ im ∂_n^*.

Moreover,

  1. the harmonics are both cycles and cocycles (i.e. cycles with respect to ∂^*), and

  2. the harmonics are the norm-minimal representatives of their (co)homology classes, i.e. if x ∈ H_n and y ∈ C_n(K) are homologous, then ‖x‖ ≤ ‖y‖.

The first detailed work on the spectral properties of this kind of simplicial Laplacian was carried out by Horak and Jost [horak2013spectra]. Recently, Steenbergen et al. [steenbergen2014cheeger] provided a notion of a higher-dimensional Cheeger constant for simplicial complexes. At the same time, Gundert and Szedlák [gundert2014higher] proved a lower bound for a modified version of the higher-dimensional Cheeger constant for simplicial complexes, which was later generalized to weighted complexes [braxton2018]. Mukherjee and Steenbergen [mukherjee2016random] developed an appropriate notion of random walks on simplicial complexes, and related the asymptotic properties of these walks to the simplicial Laplacians and harmonics. It is worth mentioning that, to the best of our knowledge, no connection between the eigenvectors of the simplicial Laplacian and an optimal cut for simplices in higher dimensions is known.

Our contribution is a notion of spectral clustering for simplicial complexes using the harmonics.

2.3 Harmonic clustering

Observe that the ordinary graph Laplacian, as described in section 1.1, is just the matrix of L_0 in the standard basis for C_0(G). The function f in equation (1) can thus be seen as projecting the 0-simplices onto a subspace spanned by low-but-non-zero-eigenvalue eigenvectors of L_0. The zero part of the spectrum is not used. Theorem 2 makes the reason clear: harmonics in C_0(G) have the same coefficient for every vertex in a connected component of G. As connectivity information is easy to obtain anyway, there is little use in adding these eigenvectors to the subspace that f projects onto. This is not so for the higher Laplacians. In fact, our method primarily uses the harmonics, and only optionally ventures into the non-zero part of the eigenspectrum.

In what follows, K is a fixed simplicial complex arising from data. The particulars of how K was built from data are outside the scope of this paper, and are a topic that is well studied in the field of TDA in general. Our goal is to obtain a useful clustering of the n-simplices K_n for some chosen n. We assume that K is of low “homological complexity” in degree n, by which we mean that β_n is small.

Analogously to above, we define the harmonic embedding h: K_n → R^{β_n} as the composition

h = ι ∘ π,

where π: C_n(K) → H_n is the orthogonal projection onto the harmonics, and ι: H_n → R^{β_n} is any vector space isomorphism. In practice, we simply pick an orthonormal basis e_1, …, e_{β_n} for H_n and let

h(σ) = (⟨σ, e_1⟩, …, ⟨σ, e_{β_n}⟩).

In many situations of practical use, it turns out that many points in h(K_n) lie along one-dimensional subspaces of R^{β_n}. The membership of a point h(σ) in such a subspace is what is used to cluster the n-simplex σ (or to leave it unclustered in case it is not judged to be sufficiently close to lying in one of the subspaces). This amounts to clustering K_n by performing Euclidean subspace clustering of h(K_n). A variety of Euclidean subspace clustering methods are available, but a survey of them is outside the scope of this paper. Examples include independent component analysis [ica], SUBCLU [subclu], and density maximization, which itself has a multitude of approaches, including purely TDA-based ones by means of persistent homology of sublevel sets.

We point out that the choice of the isomorphism ι does not matter on a theoretical level. It may, however, have practical implications for how easy it is to perform subspace clustering. In experiments we typically choose ι to be the isomorphism that sends the i’th basis vector of a chosen orthonormal basis for H_n to the i’th standard basis vector of R^{β_n}. Choosing a different orthonormal basis then just amounts to an element of the orthogonal group acting on R^{β_n}.

Figure 3 summarizes our method in algorithmic form.

Input: an integer n; a simplicial complex K with β_n small, and inner products on the chain spaces of K.
   Compute an orthonormal basis e_1, …, e_{β_n} of the harmonics H_n = ker L_n (using iterative methods [SLEPc] on the sparse matrix of L_n).
   for each n-simplex σ in K_n do
       compute the harmonic embedding h(σ) ∈ R^{β_n}
   end for
   Perform Euclidean subspace clustering on the point cloud h(K_n).
   Output: a homologically sensitive clustering of the n-simplices in K.
Figure 3: Our method in algorithmic form. The subspace clustering subroutine refers to any Euclidean subspace clustering scheme, such as independent component analysis [ica], SUBCLU [subclu], or density maximization. The latter can be done using methods from TDA, for example by means of persistent homology of certain sublevel sets. Note that there may be unclustered simplices, i.e. it may happen that the clusters do not cover all of K_n.

3 Experimental results

In this section we present experimental results for the harmonic clustering algorithm on synthetic data. Specifically, we focus on clustering the edges of various constructed simplicial complexes. The outcomes of our experiments suggest that the harmonic clustering algorithm provides clusters sensitive to the homology of the complex. Comparing our results with those of the traditional spectral clustering algorithm applied to the graph underlying the simplicial complex reinforces the idea that our algorithm reveals substantially different patterns in data compared to the classical method.

Below, we consider four simplicial complexes. Three of them are complexes built from Euclidean point clouds by standard methods from TDA, while one is a triangulation of a torus. We reiterate that our method works with abstract simplicial complexes without utilizing any embedding of these into an ambient space. Euclidean point clouds just happen to be a good and common source of simplicial complexes in TDA, and allow for visualization of the obtained clustering in a way that easily relates to the original data.

An important step in preprocessing many kinds of input data in TDA is constructing a simplicial complex satisfying certain theoretical properties. In particular, if the input data come from points sampled from a topological space, one may wish for the homology of the complex to coincide with the homology of that space.

Two constructions for which some such guarantees exist are the alpha complex [alpha] and the Vietoris–Rips (VR) complex [vietoris1927]. Both can be seen as taking a point cloud and producing a filtered simplicial complex, i.e. a sequence of complexes K_s with the property that K_s ⊆ K_t whenever s ≤ t. We wish to work with a single simplicial complex, not a filtration, so we use persistent homology (see e.g. [ghrist2008barcodes]) to find a filtration scale s for which K_s has the appropriate homology. Of course, since in practice one probably has little or no knowledge of the underlying space itself, one cannot necessarily know the “correct” scale to consider. However, it is often the case in TDA that long-lived homological features — that is to say, homology classes that remain non-trivial under the induced maps for large scale ranges — express interesting properties of the underlying space. We therefore choose a scale to consider by looking for one within the range of a manageable number of long-lived features and few short-lived ones in the degree under consideration.

In the following experiments, we simplify the setup in the algorithm in figure 3 by performing the subspace clustering subroutine in a somewhat ad hoc semi-manual way. Specifically, all the images of h lie in R^2 or R^3 in these experiments, so we manually pick out the one-dimensional subspaces in question. Then, the points in the image are orthogonally projected onto each of the subspaces. A point x is determined to lie on the i’th subspace if its orthogonal projection onto that subspace has norm at least a chosen threshold, while its projections onto all the other subspaces have norms below a (generally smaller) threshold. The corresponding simplex is then said to be in cluster number i. If the above is not true for any of the subspaces, the simplex is considered unclustered.
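The semi-manual assignment criterion can be sketched as follows (the threshold values and the helper name `assign_to_subspaces` are hypothetical stand-ins for this illustration, not the cutoffs actually used in the experiments):

```python
import numpy as np

def assign_to_subspaces(points, directions, hi=0.9, lo=0.3):
    """Assign each point to the 1-D subspace it projects well onto.

    `directions` holds unit vectors spanning the candidate subspaces; the
    fractions `hi` and `lo` of the point's norm are hypothetical thresholds.
    Returns -1 for points left unclustered."""
    labels = []
    for x in points:
        norms = np.abs(directions @ x)  # norm of the projection onto each line
        good = norms >= hi * np.linalg.norm(x)
        bad = norms < lo * np.linalg.norm(x)
        if good.sum() == 1 and bad.sum() == len(directions) - 1:
            labels.append(int(np.argmax(norms)))
        else:
            labels.append(-1)  # projects well onto none, or onto several
    return labels

dirs = np.array([[1.0, 0.0], [0.0, 1.0]])  # two orthogonal lines in R^2
pts = np.array([[2.0, 0.1], [0.05, 1.5], [1.0, 1.0]])
labels = assign_to_subspaces(pts, dirs)
```

Here the first two points are assigned to the first and second line respectively, while the diagonal point is left unclustered.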

In many of the experiments that follow, many points end up determined as “unclustered” because they project well onto none of the one-dimensional subspaces, or project too well onto multiple of them, as described in 2.3. This is not necessarily a problem, as the parts that are clustered contain a lot of useful information. Moreover, the problem can be reduced by choosing less ad hoc subspace clustering methods than we are currently employing.

To ease visualization, we focus on simplicial complexes that naturally live in R^2 or R^3 because they arise from point clouds there.

3.1 Wedge of a sphere and two circles

In this experiment, we consider a noisy sampling of the wedge of a sphere and two circles, realized as a central unit sphere with unit circles wedged onto antipodal points. We sampled points uniformly randomly from the central sphere, adding radial uniform noise, and likewise sampled the circles with radial noise. The resulting point cloud is shown in figure 4. The VR complex is certainly a suboptimal choice of simplicial complex to build on this kind of data, but we chose it to demonstrate that our method works well also for such an overly dense complex. The complex, denoted K within this section, has Betti numbers β_0 = 1, β_1 = 2 and β_2 = 1, as for the wedge itself.

Figure 4: The point sample under consideration in 3.1.

We focus on clustering the 1-simplices of the complex. The image of K_1 under the harmonic embedding in R^2 is shown in figure 5. The points are colored according to which of the two one-dimensional subspaces they are deemed to belong to. The determination was made by a simple criterion of projecting well enough onto one of the lines, but not the other. Points that project well onto both or neither are considered unclustered and shown in gray. Figure 6 shows this clustering pulled back to the complex itself, excluding the unclustered edges. Observe how the method separates the 1-simplices of the VR complex in a manner that is sensitive to the two non-bounding cycles that generate homology in degree 1 (the two circles).

Figure 5: The image of the harmonic embedding of the 1-simplices in the VR complex from the experiment in 3.1. The dashed lines indicate the subspaces used for clustering. The inset shows a detailed view near the origin, where one can see a large number of unclustered points in gray, due to them projecting too well onto both subspaces.
Figure 6: The clustering from figure 5 pulled back to the 1-simplices of the VR complex from 3.1, which is here drawn using the coordinates of the points for visualization purposes only. The unclustered 1-simplices are not drawn.

We also repeated the experiment with one of the circles moved to be attached to the other circle instead of the sphere. This space is obviously homotopy equivalent to the original wedge, but is geometrically very different. Figure 7 shows the result. Observe that the sphere is now captured by its adjacent circle, and that the unclustered edges tend to be those near where the two circles intersect.

Figure 7: The result of clustering the rearranged point cloud from 3.1. Again the 1-simplices of the VR complex are clustered in a way respecting the generators of 1-homology. The unclustered simplices are drawn in gray. (That the sphere appears solid is only a visualization artifact; the 2-simplices are not drawn.)

3.2 Punctured plane

In this experiment we uniformly randomly sample points from a unit square in R^2 with three disks cut out. The points are seen as faint dots in figure 8. We construct the alpha complex, denoted K in this section. It has Betti numbers β_0 = 1, β_1 = 3 and β_2 = 0. We again focus on the 1-simplices for clustering. The codomain of the harmonic embedding is now R^3, and the image is clustered according to three one-dimensional linear subspaces. The result is shown in figure 8, and we again observe how the obtained clustering occurs with respect to the punctures of the square.

Figure 8: The point cloud of the experiment in 3.2 is shown as faint dots. The three punctures are clearly visible, and one observes that the clustered 1-simplices (blue, green and orange) follow the punctures. The gray 1-simplices are unclustered. The 2-simplices have not been drawn.

3.3 Torus

We next perform clustering of the edges of two different tori.

3.3.1 From a point cloud

We uniformly randomly sampled points from the unit square and mapped these under a standard parametrization of the torus to produce a point sample of a torus in R^3. The points were then given a uniformly random noise in both radii. Again a VR complex was built, at a scale for which it has the homology of a torus, i.e. β_0 = 1, β_1 = 2 and β_2 = 1. The VR complex was chosen in order for the clustering task to be more complicated than it would be in a more orderly alpha complex.

Figure 9 shows the image in R^2 of the 1-simplices under the harmonic embedding. The subspaces for clustering are somewhat harder to make out than before, but they can still be found. The result of clustering by them can be seen in figure 10. Observe that the two clusters respect the two independent unfilled loops of the torus.

Figure 9: The clustering of the 1-simplices from the simplicial complex obtained from the sampled torus in the experiment in 3.3.1. The unclustered points are shown in gray.
Figure 10: The clustering in figure 9 pulled back to the 1-simplices of the torus from the experiment in 3.3.1. The unclustered ones are not shown, something which may make the torus appear broken.

3.3.2 A triangulation of the flat torus

As a smaller, more abstract and noise-free example, we consider a triangulation of the flat torus. The image of its 1-simplices in R^2 under the harmonic embedding is shown in figure 11. The arrangement into a perfect hexagon means that there are in fact three subspaces that can be chosen for clustering. The clusters are shown in figure 12. The arrangement into a hexagon, and therefore the appearance of three instead of the expected two clusters, disappears if one breaks some of the symmetry in the triangulation, for example by having some of the diagonal edges go the opposite direction.

Figure 11: The clustering of the 1-simplices from the simplicial complex obtained as a triangulation of the flat torus from the experiment in 3.3.2. Note that many points overlap. Three clusters are given by points lying on three different linear subspaces.
Figure 12: A triangulation of the flat torus represented as a rectangle with pairs of opposing edges identified. The 1-simplices are clustered into three groups (orange, blue and green). Those in orange and blue are representatives of the two generating 1-homology classes of the complex, whereas the green ones are a linear combination of the others.

3.4 Clustering 2-simplices

We have illustrated our method only on 1-simplices so far for ease of visualization. To point out that it also performs well in other dimensions, we sampled points from each of two spheres, each with radial uniform random noise. We computed the alpha complex at a parameter chosen so as to create a rather messy region between the spheres. The complex has β_0 = 1, β_1 = 0 and β_2 = 2, as expected. Our clustering method performs as expected, producing clusters of 2-simplices that correspond to the homological features, as is shown in figure 13.

Figure 13: The output of our method when clustering the 2-simplices from the complex in the experiment in 3.4 consists of the blue and orange clusters. The gray simplices are unclustered.

3.5 Comparison with graph spectral clustering

It is worth comparing the clusterings obtained from our method with those obtained by clustering the nodes of the graph underlying each simplicial complex using the graph spectral clustering algorithm. Figure 14 shows the results of graph spectral clustering on the nodes of the graph underlying the complex in figure 7. The first two graph Laplacian eigenvectors were used to map the nodes into R^2, and then k-means was used to find two clusters. Similarly, figure 15 displays three clusters on the nodes of the graph underlying the complex representing a punctured plane with three holes in figure 8. The nodes are mapped to R^3 using the first three graph Laplacian eigenvectors, after which k-means was used to find three clusters. In both cases we see that the clusters do not reflect any obviously meaningful property of the underlying data, unlike our method, which clusters in a way sensitive to homology.

Figure 14: Graph spectral clustering of the vertices of the graph underlying the VR complex of figure 7.
Figure 15: Graph spectral clustering of the vertices of the graph underlying the alpha complex from figure 8.

4 Conclusions and future work

In this paper we have presented a novel clustering method for simplicial complexes, one that is sensitive to the homology of the complex. We see the method as a contribution to an emerging field of spectral TDA [cascade, barbarossa2018learning]. Our results suggest that the algorithm can be used to extract homological features for simplices of any degree. Experiments on various simplicial complexes demonstrate the ability of the method to accurately detect edges belonging to different non-bounding cycles. Similar results, not shown in this article for practical considerations of visualization, have been obtained by clustering simplices in higher dimensions. The sub-problem of the structure of the linear subspaces in the image of the harmonic embedding, and how to accurately cluster based on them, requires further investigation of both a mathematical and an algorithmic nature.

While our method seems robust to noise in the underlying data, a more thorough investigation into the output’s dependence on noise, and the output’s dependence on the scale at which a point-cloud-derived simplicial complex is built, is warranted.

Moreover, it has not eluded us that the method as outlined is not restricted to clustering just simplices. Other finitely generated chain complexes, such as discrete Morse complexes or cubical complexes, naturally lend themselves to the same analysis. One may also want to consider if there are theoretical implications even in the smooth case.

Further development will also include enlarging the target of the projection in the harmonic embedding to include non-zero-eigenvalue eigenvectors of the simplicial Laplacian, as in graph spectral clustering. Preliminary results indicate that this yields a further refinement of the homologically sensitive clusters into “fair” subdivisions.

Finally, further work needs to explore the effects of weighting, both structural weighting, i.e. deriving weights from the local connectivity properties of the complex, as is often done in graph spectral clustering, and weighting originating from the underlying data itself, as is common in TDA.

A potential future application that we suspect fits our method well is collaboration networks [patania2017], where n-fold collaborations clearly cannot accurately be encoded as pairwise ones.

Acknowledgments

Alpha complexes and persistent homology were computed using GUDHI [GUDHI]. Eigenvector computations were done with SLEPc [SLEPc].

We would like to thank K. Hess for valuable discussions.

Both authors were supported by the Swiss National Science Foundation grant number 200021_172636.