Fitting a Simplicial Complex using a Variation of k-means

07/13/2016
by   Piotr Beben, et al.
University of Southampton

We give a simple and effective two stage algorithm for approximating a point cloud S⊂R^m by a simplicial complex K. The first stage is an iterative fitting procedure that generalizes k-means clustering, while the second stage involves deleting redundant simplices. A form of dimension reduction of S is obtained as a consequence.


1. Introduction

The use of simplicial complexes as a means for estimating topology has found many applications to data analysis in recent times. For example, unsupervised learning techniques such as persistent homology [MR2476414, MR2405684] often use what are known as Cech or Vietoris-Rips filtrations to capture multi-scale topological features of a point cloud S. The simplicial complexes in these filtrations are not individually always a good representation of the actual physical shape of S since, for example, they often have a higher dimension than S. Our aim is to give an algorithm that approximates S to the greatest extent possible when fed any simplicial complex K mapped linearly into R^m. This algorithm has several nice properties, including a tendency towards preserving embeddings, as well as reducing to k-means clustering when K is 0-dimensional. The resulting fitting is further refined by deleting simplices that have been poorly matched with S; the end result is a locally linear approximation of S. A lower dimensional representation of S in terms of barycentric coordinates follows by projecting S onto this approximation.

2. Algorithm Description

Fix K to be a (geometric) simplicial complex, and let V be the set of vertices of K. The facets of K are the simplices of K that have the highest dimension, in the sense that they are not contained in the boundary of any other simplex. K may then be represented as a collection of facets, each of which is represented by the set of vertices that it contains. The dimension of a facet is equal to the number of vertices it contains minus 1. When we refer to the boundary of a simplex Δ, we mean the union of its smaller dimensional boundary simplices, even when the simplex is embedded as a subset of R^m. Its interior is Δ minus this boundary.

Any point x ∈ K is contained in a unique smallest dimensional simplex Δ_x. We may represent x uniquely as a convex combination

x = Σ_{v ∈ Δ_x} λ_{x,v} v

over the vertices v that are contained in Δ_x, where λ_{x,v} > 0 and Σ_{v ∈ Δ_x} λ_{x,v} = 1 for some barycentric coordinates λ_{x,v}. A map f: K → R^m is said to be linear if it is linear on each simplex of K. Namely,

f(x) = Σ_{v ∈ Δ_x} λ_{x,v} f(v).

So f restricts to a linear embedding on each simplex of K, and is uniquely determined by the values f(v) that it takes on its vertices v ∈ V. Thus, we have a convenient representation of f in terms of the f(v)'s.
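As an illustration, the barycentric coordinates of a point with respect to a simplex can be computed by solving a small linear system, and a linear map evaluated from them. The NumPy sketch below is our own minimal illustration (the function names are not from the paper), assuming the point lies in the affine span of the simplex:

```python
import numpy as np

def barycentric_coords(x, verts):
    """Barycentric coordinates of point x with respect to the simplex
    spanned by the rows of `verts` (the k+1 vertices of a k-simplex in R^m).
    Solves  sum_v lambda_v * v = x  together with  sum_v lambda_v = 1
    in the least-squares sense."""
    k1, m = verts.shape
    A = np.vstack([verts.T, np.ones(k1)])   # (m+1) x (k+1) system
    b = np.concatenate([x, [1.0]])
    lam, *_ = np.linalg.lstsq(A, b, rcond=None)
    return lam

def linear_map(lam, fverts):
    """Evaluate a linear map at the point with barycentric coordinates
    `lam`:  f(x) = sum_v lambda_v f(v),  rows of `fverts` being the f(v)."""
    return lam @ fverts

# The midpoint of an edge of a triangle in R^2:
tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
lam = barycentric_coords(np.array([0.5, 0.0]), tri)
```

Here `lam` comes out as (0.5, 0.5, 0), with the coordinate opposite the edge equal to zero, matching the convention that x lies in the smallest simplex Δ_x whose coordinates are all positive.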

2.1. Fitting

Fix S ⊂ R^m, our finite set of data points. Suppose f_0: K → R^m is any choice of linear map, meant to represent our initial fitting of K to S. Starting with f_0, our aim is to obtain successively better fittings f_i of K to S, at each iteration giving a better reflection of the shape and structure of S. We do this as follows.

1:  set i = 0 and f_i = f_0;
2:  repeat
3:     increment i;
4:     for each x ∈ S do
5:        find a choice of p_x ∈ K such that f_{i-1}(p_x) is nearest to x, and Δ_x the smallest dimensional simplex that contains p_x;
6:        compute barycentric coordinates λ_{x,v} such that λ_{x,v} > 0 and p_x is the convex combination p_x = Σ_{v ∈ Δ_x} λ_{x,v} v;
7:     end for
8:      using these, construct a new linear map f_i, defined on each vertex v ∈ V by setting
f_i(v) = ( Σ_{x ∈ S_v} λ_{x,v} x ) / ( Σ_{x ∈ S_v} λ_{x,v} )
when S_v ≠ ∅, and f_i(v) = f_{i-1}(v) otherwise, where
S_v = { x ∈ S : v ∈ Δ_x };
9:  until a given stop condition has been reached (e.g. Σ_{v ∈ V} ‖f_i(v) − f_{i-1}(v)‖ is small)
10:  set N = i;
11:  return f_N, each p_x, Δ_x, and the barycentric coordinates λ_{x,v}.
Algorithm 1 (Simplicial Means) First stage: fitting K to S
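To make Steps 5 through 8 concrete, here is a minimal Python sketch of one fitting iteration, restricted for simplicity to a 1-dimensional complex (a graph), where nearest points and barycentric coordinates are easy to compute in closed form. The names and the restriction to edges are our own simplifications, not the paper's implementation:

```python
import numpy as np

def closest_on_segment(x, a, b):
    """Nearest point to x on the segment [a, b], returned together with
    its barycentric coordinates (lam_a, lam_b)."""
    ab = b - a
    t = np.clip(np.dot(x - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return (1.0 - t) * a + t * b, np.array([1.0 - t, t])

def fit_iteration(S, edges, fv):
    """One pass of Steps 4-8 for a 1-dimensional complex.
    S:     (n, m) array of data points
    edges: list of (u, v) vertex pairs (the facets of K)
    fv:    dict vertex -> current image f_{i-1}(v) in R^m
    Returns the updated images f_i(v): the lambda-weighted centroid of the
    data points whose nearest point lies on a simplex containing v."""
    wsum = {v: 0.0 for v in fv}
    xsum = {v: np.zeros_like(fv[v], dtype=float) for v in fv}
    for x in S:
        # Step 5: nearest point over all edges, with barycentric coordinates
        best = min((closest_on_segment(x, fv[u], fv[v]) + ((u, v),)
                    for u, v in edges),
                   key=lambda r: np.linalg.norm(x - r[0]))
        _, lam, (u, v) = best
        # Step 8: accumulate pulls; vertices with lambda = 0 (boundary
        # faces opposite to the vertex) are ignored
        for vert, l in zip((u, v), lam):
            if l > 0.0:
                wsum[vert] += l
                xsum[vert] += l * x
    return {v: xsum[v] / wsum[v] if wsum[v] > 0 else fv[v] for v in fv}
```

Iterating `fit_iteration` until the vertex images stop moving corresponds to the repeat loop of Algorithm 1; extending it to higher dimensional facets requires a nearest-point routine for general simplices in place of `closest_on_segment`.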

The assignment for f_i(v) in Step 8 can be generalized to

f_i(v) = ( Σ_{x ∈ S_v} (λ_{x,v} + ε) x ) / ( Σ_{x ∈ S_v} (λ_{x,v} + ε) ),

where ε ≥ 0 is the learning rate. Larger values for ε lead to faster convergence, but poorer fittings and reduced stability. Taking ε to be small but nonzero has every advantage, in addition to preventing the change in the mapping of a vertex from becoming sluggish between iterations when the barycentric coordinates λ_{x,v} are all near zero. On the other hand, this does not prevent the mapping of v from becoming stuck in its current position when S_v is empty (regardless of the value of ε), since S_v consists only of those x for which p_x is on the interior of some simplex that contains v (meaning those p_x on the boundary faces opposite to v are ignored). There are advantages and disadvantages to this. On the positive side, higher dimensional simplices are often prevented from being collapsed onto lower dimensional linear patches of data, but this can also prevent simplices from being fitted in situations where it would be desired. For example, when fitting a complex K homeomorphic to a sphere to data sampled from a sphere (Figure LABEL:LF8), craters can form and remain in subsequent iterations if data points surrounding the crater are in fact nearest to points lying on its rim. This issue can be circumvented (but our advantages reversed!) by taking the slightly larger set

S'_v = { x ∈ S : v and Δ_x are contained in a common simplex of K }

in place of S_v, together with ε > 0.

There is a straightforward intuition underlying Algorithm 1. Consider the case where f_{i-1} at the (i−1)-st iteration is a linear embedding. Then K and f_{i-1}(K) can be identified, with K thought of as the subspace f_{i-1}(K) ⊆ R^m, and Algorithm 1 in essence has K attract towards S by having each point x ∈ S exert a pull on a nearest point f_{i-1}(p_x). The caveat here is that in doing so the embedding must be kept linear, so the net effect of this pull must come down to its influence on the individual vertices of the simplex Δ_x containing p_x. The influence on each of these vertices should in turn decrease with some measure of distance of p_x to the vertex. This is analogous to pulling on a string attached at some point along a perpendicular uniform rod floating in space, in which case the acceleration of an endpoint of the rod in the direction of the pull decreases with increasing distance from the string. Since the size or shape of the simplex in our context is irrelevant, the distance from p_x to a vertex v is measured in terms of its barycentric coordinates λ_{x,v}. In particular, if p_x lies near a boundary simplex opposite to v, then λ_{x,v} is near 0 and x has little influence on v, while x has full influence on v when λ_{x,v} = 1, or equivalently, when p_x = v. The accumulation of these pulling forces on a vertex v leads us to take the weighted centroid of the x over all x ∈ S_v; equivalently, over all x that are closest to a point lying on the interior of a simplex that has v as a vertex.

When K is 0-dimensional – namely a collection of disjoint vertices – there is only a single barycentric coordinate λ_{x,v} = 1 for each x, so Algorithm 1 reduces to the classical k-means algorithm with initial cluster centers f_0(v), v ∈ V. Algorithm 1 is thereby a higher dimensional non-discrete generalization of k-means clustering. This is perhaps in the same spirit as persistent homology being a higher dimensional generalization of hierarchical clustering (also by way of simplicial complexes).
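In the 0-dimensional case Step 8 can be written down directly; the sketch below (our own minimal version, not code from the paper) is then exactly one Lloyd iteration of k-means:

```python
import numpy as np

def kmeans_step(S, centers):
    """Step 8 when K is 0-dimensional: each x has the single barycentric
    coordinate 1 at its nearest vertex, so the update is the plain centroid
    of each vertex's cluster -- one Lloyd iteration of k-means."""
    dists = np.linalg.norm(S[:, None, :] - centers[None, :, :], axis=2)
    nearest = np.argmin(dists, axis=1)       # index of nearest f(v) per point
    return np.array([S[nearest == j].mean(axis=0) if np.any(nearest == j)
                     else centers[j]         # empty cluster: vertex stays put
                     for j in range(len(centers))])
```

The empty-cluster branch mirrors the f_i(v) = f_{i-1}(v) case of Step 8 when S_v is empty.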

2.2. Preserving embeddings

To obtain our fitting of K to S, why not simply apply the k-means algorithm to the vertices V? This would likely be a poor fitting of S since the arrangement of simplices comprising K is ignored completely. Moreover, the resulting f would probably not be an embedding irrespective of f_0. Take for example the case where K is the following four-vertex graph embedded in