1. Introduction
The problem of data clustering has been extensively studied. Clustering is used in fields as diverse as biology, psychology, machine learning, sociology, image processing, and chemistry, in order to discover hidden structure in data. Among the earliest systematic treatments of clustering theory was that of Jardine and Sibson in 1971 [20]. Since then, there have been several distinct directions of research in clustering theory, with only modest communication between researchers pursuing different paths.
The classical work of Jardine and Sibson was followed by other similarly comprehensive works such as Everitt [16]. Further theoretical work on these mostly classical methods was also done by Kleinberg [21] and Carlsson and Mémoli [9, 10]. Work on computing phylogenetic trees inspired a seminal paper by Bandelt and Dress [3] on split decompositions of metrics. This line of research was continued with investigations into split systems and cut points of injective envelopes of metric spaces. Representative papers include [14] and [15]. While not explicitly clustering methods, these methods are quite similar in spirit to stratified clustering schemes. In this category we might also add the classification of injective envelopes of six-point metric spaces by Sturmfels and Yu [25].
Bandelt and Dress also had a large influence on another field as a result of their work on weak hierarchies [2, 4]. This led to work by Diatta, Bertrand, Barthélemy, Brucker, and others on indexed set systems (see, e.g., [6, 8, 13]). An interesting recent development here is the work by Janowitz on ordinal clustering [18].
Additional work has been done on topologically based clustering methods. This includes the Mapper algorithm by Singh, Mémoli, and Carlsson [24], as well as work on persistence-based methods [12] and Reeb graphs [17].
Meanwhile, most users of clustering methods default either to a classical linkage-based clustering method (such as single-linkage or complete-linkage) or to more geometrically based methods like $k$-means. Unfortunately, the wide array of clustering theories has had little impact on the actual practice of clustering.
This paper works to bridge some of these gaps by extending a recent paper of Carlsson and Mémoli [10]. Their paper introduced the idea of viewing clustering methods as functors from a category of metric spaces to a category of classifying objects giving rise to clusters (e.g. partitions, dendrograms). We will only make use of the very basics of category theory, including the notions of categories, morphisms, functors and natural transformations. This abstract language is extremely powerful, not only for compactly representing complex information but also for providing a formalism for reasoning about natural operations. For those unfamiliar with these concepts, see either [22] (for a mathematical treatment) or [5] (for a computer science perspective). Many desirable properties of a clustering method are subsumed in functoriality when morphisms are properly chosen. One of our principal goals is to extend their theory of functorial clustering schemes to methods that allow overlapping clusters, and in so doing obviate some of the unpleasant effects of chaining that occur, for example, with single-linkage. Rather than relying on chaining to overcome certain technical problems, we accept overlapping clusters.
2. Definitions
Let $X$ be a set, to be thought of as a set of unlabeled data to be analyzed. In order to make as few assumptions as possible, we only require that $X$ be endowed with a metric, which we will often refer to as $d_X$. However, we do not assume, for example, that $X$ is embeddable in Euclidean space or that $X$ is obtained by sampling from some distribution. Recall that a cover of $X$ is a collection $\mathcal{C}$ of subsets of $X$ whose union is $X$. A cover $\mathcal{P}$ of $X$ is a partition of $X$ if $A \cap B = \emptyset$ for any pair of distinct subsets $A, B \in \mathcal{P}$. Also recall that a cover $\mathcal{C}$ is said to refine a cover $\mathcal{C}'$ if every $A \in \mathcal{C}$ is contained in some $B \in \mathcal{C}'$.
Traditionally, a clustering method applied to an input dataset $X$ is expected to produce a partition of $X$. The work by Kleinberg [21] highlights the need for a rigorous treatment of the formal relations between the nonexpansive maps among finite metric spaces on the one hand, and refinement relations among partitions produced by distance-based clustering methods on the other. In fact, the main result of loc. cit. clearly states refinement relations as the obstruction to the “richness” axiom (which states that every partition be obtainable as an output of the clustering method for some suitably chosen input; this axiom seems to us the single least debatable of Kleinberg’s axioms).
Accepting the philosophical position of Carlsson and Mémoli [9] that functoriality of the clustering map is a suitable replacement for the rest of Kleinberg’s axioms, our interest in clustering with overlaps leads us to formulate a restriction on the class of covers of $X$ acceptable as outputs of a distance-based clustering method. However, we prefer to view functoriality as a way of imposing constraints on consistent clustering across datasets, rather than as a set of axioms that must be adhered to.
Following Jardine and Sibson [20], we consider a clustering as encoded by a symmetric and reflexive relation $R \subseteq X \times X$, with clusters being defined as the fibers of the relation. This point of view shows that, in addition to the functoriality constraints already mentioned, a clustering method affording overlaps requires a weakening of the transitivity property (characteristic of partitioning methods). Should transitivity be dropped completely, all that remains is the observation that the fibers of $R$ form a cover of $X$. Still, intuitively, for the purpose of distance-based clustering one feels that three points $x$, $y$, and $z$, which are pairwise “similar” to some (measurable) degree, need to be regarded as “jointly” similar to the same degree. Likewise, this observation should remain valid for larger sets of points. This motivates the following definition:
Definition 1.
Let $X$ be a nonempty finite set. A non-nested flag cover (or simply flag cover) is a cover $\mathcal{C}$ of $X$ satisfying the following conditions:
(1) If $A, B \in \mathcal{C}$ and $A \subseteq B$, then $A = B$.
(2) The abstract simplicial complex $\Delta(\mathcal{C})$ consisting of all the subsets of elements of $\mathcal{C}$ is a flag complex.
We denote the set of flag covers of $X$ by $\mathrm{Cov}(X)$.∎
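Note that $\mathcal{C}$ is the collection of maximal simplices of the complex $\Delta(\mathcal{C})$ of all subsets of elements of $\mathcal{C}$, with a subset $S \subseteq X$ spanning a simplex if and only if $S$ is contained in some element of the cover $\mathcal{C}$. Thus, $\mathcal{C}$ is flag if, for every $S \subseteq X$, $S$ spans a simplex in $\Delta(\mathcal{C})$ whenever every pair of distinct points $x, y \in S$ is contained in an element of $\mathcal{C}$. In particular, every partition of $X$ is a flag cover of $X$.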
Finally, note that any cover $\mathcal{C}$ of $X$ can be “upgraded” to a non-nested flag cover $\overline{\mathcal{C}}$, commonly referred to as the flagification of $\mathcal{C}$, in a minimal way, where $\mathcal{C}$ will refine $\overline{\mathcal{C}}$ and $\overline{\mathcal{C}}$ will refine any other flag cover which $\mathcal{C}$ refines. This may be done by iteratively adjoining to $\mathcal{C}$ any clusters mandated by the flag condition, and then removing all the nonmaximal ones.
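The flagification just described can be sketched in a few lines. This is only an illustration under stated assumptions: the cover is given as a list of Python sets, and the names `maximal_cliques` and `flagify` are hypothetical, not taken from the paper. Instead of adjoining clusters iteratively, the sketch uses the equivalent observation that the maximal simplices of a flag complex are the maximal cliques of its 1-skeleton, and enumerates them with a basic Bron-Kerbosch recursion (exponential in the worst case).

```python
def maximal_cliques(vertices, adjacent):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already processed vertices
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            nbrs = {u for u in vertices if u != v and adjacent(u, v)}
            expand(r | {v}, p & nbrs, x & nbrs)
            p = p - {v}
            x = x | {v}

    expand(set(), set(vertices), set())
    return set(cliques)


def flagify(points, cover):
    """Flag cover generated by `cover`: two points are linked when some
    set of the cover contains both; clusters are the maximal cliques."""
    def linked(a, b):
        return any(a in s and b in s for s in cover)

    return maximal_cliques(set(points), linked)


if __name__ == "__main__":
    # The three edges {1,2}, {2,3}, {1,3} force the triangle {1,2,3}.
    cover = [{1, 2}, {2, 3}, {1, 3}, {3, 4}]
    print(sorted(map(sorted, flagify({1, 2, 3, 4}, cover))))
    # prints [[1, 2, 3], [3, 4]]
```

A partition is returned unchanged, in line with the remark that every partition is already a flag cover.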
Perhaps the most common notion in the clustering literature (see, e.g., [20, 13]) related to flag complexes is that of maximally linked sets. We recall:
Definition 2.
Let $X$ be a set and let $R$ be a symmetric, reflexive relation on $X$. A subset $A \subseteq X$ is maximally linked with respect to $R$ if (1) $x \mathrel{R} y$ for all $x, y \in A$, and (2) $A$ is not properly contained in any subset of $X$ satisfying (1).∎
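Clearly, picking $\mathcal{C}$ to be the set of all maximally linked subsets of $X$ with respect to $R$ results in a flag cover of $X$. One of the most studied constructions of this form is the Vietoris–Rips complex, arising from a metric space $(X, d_X)$ as the complex of all subsets of elements of $\mathcal{C}$ upon setting $x \mathrel{R} y$ if and only if $d_X(x, y) \le \delta$, for some $\delta \ge 0$.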
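As a small illustration of Definition 2, the following sketch computes maximally linked sets by brute force and applies it to the Vietoris–Rips relation. The names `maximally_linked_sets` and `rips_relation` are hypothetical, and the threshold is assumed to be inclusive (distance at most the scale). The enumeration visits every subset, so it is only suitable for very small examples.

```python
from itertools import combinations


def maximally_linked_sets(points, related):
    """All subsets whose points are pairwise related and which are not
    properly contained in another such subset (brute force)."""
    pts = list(points)
    linked = [
        frozenset(s)
        for size in range(1, len(pts) + 1)
        for s in combinations(pts, size)
        if all(related(a, b) for a, b in combinations(s, 2))
    ]
    return {s for s in linked if not any(s < t for t in linked)}


def rips_relation(dist, delta):
    """Relation linking two points when their distance is at most delta."""
    return lambda a, b: dist(a, b) <= delta


if __name__ == "__main__":
    line = [0, 1, 2, 5]
    dist = lambda a, b: abs(a - b)
    clusters = maximally_linked_sets(line, rips_relation(dist, 1))
    print(sorted(map(sorted, clusters)))
    # prints [[0, 1], [1, 2], [5]]
```

The point 1 belongs to two clusters, which is exactly the kind of overlap a partition cannot express.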
Definition 3.
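A persistent cover on $X$ is a function $\theta_X : [0, \infty) \to \mathrm{Cov}(X)$ into the set of flag covers of $X$ such that
(1) If $r \le s$ then $\theta_X(r)$ refines $\theta_X(s)$.
(2) For any $r$, there is an $\epsilon > 0$ with $\theta_X(r') = \theta_X(r)$ for all $r' \in [r, r + \epsilon]$.
If we also have (3) below, then we call $\theta_X$ a sieve on $X$:
(3) There exists $t > 0$ such that $\theta_X(t)$ is the trivial cover $\{X\}$.∎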
Persistent covers and sieves are a direct generalization of Carlsson and Mémoli’s persistent sets and dendrograms, which satisfy the same conditions, but have the set of partitions of $X$ as codomain. They may also be seen as a sort of strictly isotone indexed set system as in [7], where the index of each set $A$ in a persistent cover $\theta_X$ is given by the infimum of the values of $r$ such that $A \in \theta_X(r)$.
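We now consider the category $\mathbf{Met}$, which has finite metric spaces as objects and nonexpansive mappings as morphisms. That is, a map of sets $f : X \to Y$ is a morphism in $\mathbf{Met}$ if for any $x, x' \in X$, $d_Y(f(x), f(x')) \le d_X(x, x')$. This is the same as saying $f^* d_Y \le d_X$, where $f^* d_Y$ is the metric on $X$ given by $f^* d_Y(x, x') = d_Y(f(x), f(x'))$. Note that any morphism $f : (X, d_X) \to (Y, d_Y)$ factors through $(X, f^* d_Y)$. We abuse terminology somewhat by allowing zero distances between points in our finite metric spaces. In this way $(X, f^* d_Y)$ is a valid object in $\mathbf{Met}$ even when $f$ is not injective.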
We want to take objects in $\mathbf{Met}$ and convert them into (collections of) clusters in various ways. The category $\mathbf{Cov}$ is the category of ordered pairs $(X, \mathcal{C}_X)$, where $X$ is a set and $\mathcal{C}_X$ is a flag cover of $X$. A morphism between $(X, \mathcal{C}_X)$ and $(Y, \mathcal{C}_Y)$ is a map of sets $f : X \to Y$ such that $\mathcal{C}_X$ is a refinement of the pullback $f^*(\mathcal{C}_Y)$. These are called consistent maps. Note that $f^*(\mathcal{C}_Y)$ need not be a flag cover of $X$, though it becomes one upon removal of its nonmaximal elements.
The category of partitions $\mathbf{Part}$ is a subcategory of $\mathbf{Cov}$, where only coverings that are also partitions are allowed. We define $\mathbf{Sieve}$ as the category of pairs $(X, \theta_X)$, where $\theta_X$ is a sieve on $X$. The morphisms in $\mathbf{Sieve}$ are an extension of the morphisms of $\mathbf{Cov}$; that is, a set map $f : X \to Y$ is a morphism of sieves if for every $r \ge 0$, $\theta_X(r)$ refines $f^*(\theta_Y(r))$. Note that this means that we have a family of functors from $\mathbf{Sieve}$ to $\mathbf{Cov}$, by restricting to a particular value of the parameter $r$. For convenience, we summarize these categories in Figure 1.
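Category | Objects | Morphisms
$\mathbf{Met}$ | Finite metric spaces | Nonexpansive maps
$\mathbf{Met}^{\mathrm{inj}}$ | Finite metric spaces | Nonexpansive injections
$\mathbf{Part}$ | $(X, \mathcal{P}_X)$, $\mathcal{P}_X$ a partition of $X$ | Consistent maps
$\mathbf{Cov}$ | $(X, \mathcal{C}_X)$, $\mathcal{C}_X$ a flag cover of $X$ | Consistent maps
$\mathbf{PSet}$ | $(X, \theta_X)$, $\theta_X$ a persistent set on $X$ | Consistent maps
$\mathbf{Dend}$ | $(X, \theta_X)$, $\theta_X$ a dendrogram on $X$ | Consistent maps
$\mathbf{PCov}$ | $(X, \theta_X)$, $\theta_X$ a persistent cover of $X$ | Consistent maps
$\mathbf{Sieve}$ | $(X, \theta_X)$, $\theta_X$ a sieve on $X$ | Consistent maps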
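The two kinds of morphisms summarized above can be checked mechanically on small examples. The sketch below is a hedged illustration: maps are assumed to be Python dictionaries, covers are collections of sets, and the helper names (`pullback`, `refines`, `is_nonexpansive`, `is_consistent`) are hypothetical. Consistency is tested as "the source cover refines the pullback of the target cover", which is equivalent to asking that the image of every source cluster lie inside some target cluster.

```python
def pullback(cover_y, f, xs):
    """Preimages under f of the sets of cover_y (empty preimages dropped)."""
    pre = {frozenset(x for x in xs if f[x] in block) for block in cover_y}
    return {s for s in pre if s}


def refines(fine, coarse):
    """True when every set of `fine` is contained in some set of `coarse`."""
    return all(any(a <= b for b in coarse) for a in fine)


def is_nonexpansive(f, xs, d_x, d_y):
    """True when f never increases distances between points of xs."""
    return all(d_y(f[a], f[b]) <= d_x(a, b) for a in xs for b in xs)


def is_consistent(f, xs, cover_x, cover_y):
    """True when cover_x refines the pullback of cover_y along f."""
    return refines(cover_x, pullback(cover_y, f, xs))


if __name__ == "__main__":
    xs = [0, 1, 2]
    d_x = lambda a, b: abs(a - b)
    d_y = lambda u, v: 0 if u == v else 1
    cover_x = [frozenset({0, 1}), frozenset({2})]
    cover_y = [frozenset({"a"}), frozenset({"b"})]
    f = {0: "a", 1: "a", 2: "b"}   # keeps the cluster {0, 1} together
    g = {0: "a", 1: "b", 2: "b"}   # splits the cluster {0, 1}
    print(is_nonexpansive(f, xs, d_x, d_y), is_consistent(f, xs, cover_x, cover_y))
    print(is_nonexpansive(g, xs, d_x, d_y), is_consistent(g, xs, cover_x, cover_y))
```

In this toy example both maps are nonexpansive, but only the first is consistent with the chosen covers; a clustering functor must assign covers so that every nonexpansive map is consistent.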
3. Flat Clustering
The primary development of this paper focuses on clustering methods that work at a fixed scale, giving clusters of similar data points either as blocks of a partition or sets in a covering. In the next section, we will briefly describe how these methods can be extended to hierarchical versions.
3.1. Functors on Met
We consider a flat or nonhierarchical (overlapping) clustering to be a covariant functor $\mathfrak{C}$ from $\mathbf{Met}$ to $\mathbf{Cov}$, which restricts to the identity on the underlying set, i.e., takes the form $\mathfrak{C}(X, d_X) = (X, \mathcal{C}_X)$ where $\mathcal{C}_X$ is a flag cover of $X$. We will refer to such $\mathfrak{C}$ as clustering functors. A reasonable first question is whether there are any interesting such functors, and the following definition provides a useful way of constructing many examples.
Definition 4.
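Let $\Omega$ be a set of finite metric spaces. Given a metric space $(X, d_X)$, define a relation $R_\Omega$ on $X$ with $x \mathrel{R_\Omega} x'$ if and only if there exists a morphism $\phi$ from some $\omega \in \Omega$ into $X$ satisfying $x, x' \in \phi(\omega)$. Let $\mathcal{C}_X$ be the covering of $X$ by maximally linked subsets under $R_\Omega$. We refer to $\mathfrak{C}^\Omega : (X, d_X) \mapsto (X, \mathcal{C}_X)$ as the clustering functor generated by $\Omega$.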
Remark 5.
Clearly, the relation above is preserved under any morphism. By this we mean that if we have a morphism in and , so that there is a morphism for some with and in the image of , then the composition yields . This verifies that is, indeed, functorial for .
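A wide range of clustering functors are obtainable in this way. We begin with the standard single-linkage clustering scheme. Given a parameter $\delta \ge 0$, we can construct the Vietoris–Rips complex from any metric space $(X, d_X)$ by adding an edge between two points whenever the distance between them is at most $\delta$. Define $\mathcal{P}_X(\delta)$ as the partition of $X$ given by the connected components of the Vietoris–Rips complex of $X$, and define $\mathcal{SL}(\delta)$ by $(X, d_X) \mapsto (X, \mathcal{P}_X(\delta))$. Carlsson and Mémoli, in [10], have shown that $\mathcal{SL}(\delta)$ is functorial when viewed as a map to the category of partitions $\mathbf{Part}$. Since the category of flag covers $\mathbf{Cov}$ contains $\mathbf{Part}$, the mapping $\mathcal{SL}(\delta)$ is also functorial with $\mathbf{Cov}$ as target category. Alternatively, it is easy to see that $\mathcal{SL}(\delta)$ is generated by the collection of spaces $\{1, \dots, n\}$ endowed with the metric $d(i, j) = \delta\,|i - j|$, with $n$ ranging over the positive integers (see Definition 4).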
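The single-linkage scheme described above amounts to taking connected components of the threshold graph. A minimal sketch, assuming an inclusive threshold and using a small union-find structure; the function name `single_linkage` is illustrative only:

```python
def single_linkage(points, dist, delta):
    """Partition `points` into connected components of the graph that
    joins two points whenever their distance is at most `delta`."""
    pts = list(points)
    parent = {p: p for p in pts}

    def find(p):
        # Follow parent links to the representative, halving paths on the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for i, a in enumerate(pts):
        for b in pts[i + 1:]:
            if dist(a, b) <= delta:
                parent[find(a)] = find(b)

    blocks = {}
    for p in pts:
        blocks.setdefault(find(p), set()).add(p)
    return {frozenset(b) for b in blocks.values()}


if __name__ == "__main__":
    dist = lambda a, b: abs(a - b)
    print(sorted(map(sorted, single_linkage([0, 1, 2, 5], dist, 1))))
    # prints [[0, 1, 2], [5]]
```

The points 0 and 2 end up in one block even though they are at distance 2, which is the chaining effect mentioned in the introduction.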
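We now define maximal-linkage clustering in a similar fashion. Given $\delta \ge 0$, again construct the Vietoris–Rips complex of $(X, d_X)$. We then take $\mathcal{C}_X(\delta)$ to be the set of maximal simplices of this complex. We define $\mathcal{ML}(\delta)$ as the map taking $(X, d_X)$ to $(X, \mathcal{C}_X(\delta))$. Alternatively, $\mathcal{ML}(\delta)$ is $\mathfrak{C}^\Omega$ (see Definition 4) for $\Omega = \{\Delta_2(\delta)\}$, where $\Delta_2(\delta)$ is the two-point metric space with distance $\delta$ between its points.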
Theorem 6.
$\mathcal{ML}(\delta)$ is a surjective functor from $\mathbf{Met}$ to $\mathbf{Cov}$.
Proof.
The image under of a morphism in should be the morphism in given by the same set function, if it is indeed a morphism in . Thus, as long as maps morphisms to morphisms, it will respect composition. In the following diagram, we need to show that is a morphism in given that is a morphism in .
Define a reflexive symmetric relation on with if ; similarly, let if . Then for any morphism in , . Under a mild abuse of notation, this means that . Note that the sets in are the maximal linked sets under . Since contains , every maximal linked set under is contained in a maximal linked set under . Hence refines .
An alternative proof of this fact is to note that the sets in are the maximal linked sets, i.e. if is one such set then for every and , and for every there is an with . It follows that so that all the points in are within of each other. They therefore lie in some maximal linked set in . It follows that and that refines .
To see that is surjective, note that the cover implicitly defines a simplicial complex on by taking the sets in the cover as maximal simplices. Because is a flag cover, this complex is flag, uniquely determined by its skeleton. We can therefore metrize the skeleton of this complex by setting every edge length to , and setting the distance between any two disconnected points to be . Then the distance between two points in the complex is less than or equal to if and only if they are in the same simplex. This implies that the maximal simplices are exactly the maximal linked sets under this metric. Thus every flag cover arises from some metric on under the map, so that this map is surjective. ∎
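The surjectivity argument can be tried out on a small flag cover. The sketch below is a simplification rather than the construction of the proof verbatim: it assigns distance `delta` to any two distinct points sharing a cluster and, as an assumption made here, `2 * delta` to every other pair (which keeps the triangle inequality valid). The names `metric_from_flag_cover` and `maximal_linkage` are hypothetical, and the clique enumeration is brute force.

```python
from itertools import combinations


def metric_from_flag_cover(cover, delta):
    """Metric with distance delta inside a common cluster, 2*delta otherwise."""
    def dist(a, b):
        if a == b:
            return 0.0
        if any(a in s and b in s for s in cover):
            return delta
        return 2 * delta  # assumed value for points sharing no cluster
    return dist


def maximal_linkage(points, dist, delta):
    """Maximal subsets whose points are pairwise within distance delta."""
    pts = list(points)
    linked = [
        frozenset(s)
        for size in range(1, len(pts) + 1)
        for s in combinations(pts, size)
        if all(dist(a, b) <= delta for a, b in combinations(s, 2))
    ]
    return {s for s in linked if not any(s < t for t in linked)}


if __name__ == "__main__":
    cover = {frozenset({1, 2, 3}), frozenset({3, 4}), frozenset({5})}
    points = [1, 2, 3, 4, 5]
    d = metric_from_flag_cover(cover, 1.0)
    print(maximal_linkage(points, d, 1.0) == cover)
    # prints True
```

The recovery of the cover relies on the flag condition; for a cover that is not flag, the maximal linked sets would be its flagification instead.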
The concept in the preceding proof of defining a symmetric reflexive relation on and then taking its maximal linked sets is an important one. We may reformulate in terms of a relation by letting for two elements of if there is a positive integer and a sequence of points such that for all . This is just a more explicit way of saying that there is a nonexpansive map containing in its image (see the definition of above). In this case the relation is an equivalence relation and the singlelinkage clusters are simply the equivalence classes of . Similarly, consists of the maximal linked sets of the relation where if .
This suggests the possibility of expanding the relation to include more pairs but not to the extent of the relation . One potential method is to fix a positive integer and define a relation as before, such that if there exists a sequence of points (not necessarily all distinct) such that for . In other words, we can get from to in steps of size at most . We denote the resultant map from , given by taking maximal linked sets of , as , and call it linkage clustering. Observe is generated by . In particular, is a functor by Remark 5.
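The intermediate relation described above, in which two points are related when one can be reached from the other in a fixed number of steps of bounded size, can be sketched as follows. The function name `k_linkage` and its arguments `k` and `delta` are hypothetical labels for the step count and step size. Since repeated points are allowed in the sequence, "exactly k steps" is the same as "at most k steps", which the code exploits by letting every point be its own neighbor.

```python
from itertools import combinations


def k_linkage(points, dist, delta, k):
    """Maximal linked sets of the relation 'reachable in at most k steps
    of size at most delta'. Brute force over subsets; small inputs only."""
    pts = list(points)
    # Each point is its own neighbor because dist(p, p) = 0 <= delta.
    nbrs = {p: {q for q in pts if dist(p, q) <= delta} for p in pts}

    reach = {}
    for p in pts:
        frontier = {p}
        for _ in range(k):
            frontier = set().union(*(nbrs[q] for q in frontier))
        reach[p] = frontier

    linked = [
        frozenset(s)
        for size in range(1, len(pts) + 1)
        for s in combinations(pts, size)
        if all(b in reach[a] for a, b in combinations(s, 2))
    ]
    return {s for s in linked if not any(s < t for t in linked)}


if __name__ == "__main__":
    dist = lambda a, b: abs(a - b)
    for k in (1, 2, 3):
        print(k, sorted(map(sorted, k_linkage([0, 1, 2, 3], dist, 1, k))))
    # prints:
    # 1 [[0, 1], [1, 2], [2, 3]]
    # 2 [[0, 1, 2], [1, 2, 3]]
    # 3 [[0, 1, 2, 3]]
```

On this four-point path, one step reproduces the maximal linked sets of the threshold relation, while three steps already give the single-linkage component.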
Of course, additional relations are possible. For example, we could also define where we can take as many steps as we like provided that the sum of the lengths are no more than . Alternatively, we could combine this with to obtain the relation where we require that .
An immediate consequence of the definition of is that for any metric space and any threshold value , there exists such that is equivalent to on . This may be summarized as saying that . Similarly, . The functor is also known as “Cech clustering at scale ” or sometimes “at ”.
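Now let $\mathfrak{C}$ be any (flat/nonhierarchical) clustering functor. We consider the two-point metric space $\Delta_2(\delta)$ with distance $\delta$ between the two points. Note that if $\delta \le \delta'$ then there is a nonexpansive mapping (morphism in $\mathbf{Met}$) $\Delta_2(\delta') \to \Delta_2(\delta)$. Thus if $\mathfrak{C}(\Delta_2(\delta))$ consists of two single point clusters then so does $\mathfrak{C}(\Delta_2(\delta'))$. On the other hand, if $\mathfrak{C}(\Delta_2(\delta'))$ is a single cluster, then so is $\mathfrak{C}(\Delta_2(\delta))$ for any $\delta \le \delta'$.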
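We call a clustering functor $\mathfrak{C}$ trivial if $\mathfrak{C}(\Delta_2(\delta))$ is two single point clusters for all $\delta > 0$ or if $\mathfrak{C}(\Delta_2(\delta))$ is a single two point cluster for all $\delta > 0$, where $\Delta_2(\delta)$ denotes the two-point metric space with distance $\delta$ between its points. One can easily show that in the former case $\mathfrak{C}(X)$ is the cover by singletons for all $X$ in $\mathbf{Met}$ and in the latter case $\mathfrak{C}(X)$ is just the cover consisting of $X$ itself for all $X$. (Keep in mind that the cover is a flag cover.)
Thus if $\mathfrak{C}$ is nontrivial there exists a number $\delta_{\mathfrak{C}} > 0$ such that $\mathfrak{C}(\Delta_2(\delta))$ is a single two point cluster if $\delta < \delta_{\mathfrak{C}}$ and two singleton clusters if $\delta > \delta_{\mathfrak{C}}$. The question of what happens when $\delta = \delta_{\mathfrak{C}}$ is a minor annoyance, and we will assume $\mathfrak{C}(\Delta_2(\delta_{\mathfrak{C}}))$ is a single two point cluster. The other case can be handled with some minor changes to our discussion.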
Definition 7.
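Given a nontrivial clustering functor $\mathfrak{C}$, we call the threshold $\delta_{\mathfrak{C}}$, below which a two-point space forms a single cluster and above which it splits into two singletons, the clustering parameter for $\mathfrak{C}$.∎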
Note that if $\mathfrak{C}$ has clustering parameter $\delta$ and $X$ is any metric space with $x, x' \in X$ and $d_X(x, x') \le \delta$ then $x$ and $x'$ lie in a common cluster (set of the cover) of $\mathfrak{C}(X)$.
Theorem 8.
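Suppose $\mathfrak{C}$ is a nontrivial clustering functor with clustering parameter $\delta$. Then for any input space $X$, the output of $\mathfrak{C}$ refines the output of single-linkage clustering $\mathcal{SL}(\delta)$ and is refined by the output of maximal-linkage clustering $\mathcal{ML}(\delta)$.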
Proof.
Suppose such that . Then there exists a morphism with image . By the hypothesis, merges into one cluster, so there must be some cluster in such that . Here we are using the functoriality of , which means we have a morphism
in , and our single two point cluster must refine the pullback of to . Since is flag, if is a maximal linkage component of at scale then is contained in an element of .
Now suppose are elements of such that and are in separate components of . Then there exists a morphism for some which sends and to different points in . But this implies that and can never be in the same cluster in . ∎
Note that this does not imply that the clusters in are unions of Rips clusters (i.e., ), which is false in general.
3.2. Functors on
Carlsson and Mémoli, after proving the uniqueness of as a functor , considered an expanded class of functors, those from . In this section we consider some other clustering schemes in this context.
A number of overlapping clustering schemes have been suggested in the literature. Jardine and Sibson [20] proposed two “type B” methods that restricted the size of the overlap between clusters. We consider these two methods, along with two similar methods based on vertex and edge connectivity. The clustering method is designed to prohibit overlaps of cardinality greater than or equal to . One way to obtain it is by taking the maximally linked clusters for a given level , and repeatedly merging any two clusters that overlap in or more points. Alternately, one may construct the threshold graph for a given , and then repeatedly add edges implied by the following property: if and are vertices, and there exists a complete subgraph of size such that both and are adjacent to every vertex in , then and are adjacent. Then the clusters are the maximal cliques of this graph. This requirement is relaxed in the coarser method , in which need not be a complete subgraph, or even connected at all, i.e., and are each adjacent to a subset of points. Note that neither nor are functorial for (see figure 3 below).
We define the clustering methods as follows: Given a metric space $(X, d_X)$ and $\delta \ge 0$, we construct the graph with vertices equal to the set $X$, where there is an edge between $x$ and $x'$ if and only if $d_X(x, x') \le \delta$. We call this graph the $\delta$-threshold graph for $X$. Then for any integer $k \ge 1$ construct the covering of $X$ given by the maximal $k$-vertex-connected subgraphs of the threshold graph. We denote this clustering method . Note that . Further inspection shows that . All three of these methods and are distinct, as Figure 2 shows.
The use of vertex connectivity in defining the clustering methods leads naturally to the idea of using edgeconnectivity to separate clusters. Note that the maximal edgeconnected subgraphs always form a partition of the vertices, unlike the maximal vertexconnected subgraphs. We will call this clustering method . As with vertex connectivity, we also have that and for all . In general, however, and will produce different results.
It is easy to see that each of these clustering methods fails to be functorial on for finite and any . Consider the two spaces in Figure 3. For , is grouped into a single cluster under the three overlaprestricting methods. However, the nonexpansive mapping takes onto a metric space that has two clusters. The lack of functoriality stems from the restriction on numbers of overlapping points. Morphisms in may collapse several points into one, thus splitting a vertexlinked subgraph. Similarly, the two spaces in Figure 4 show that () is not functorial on , with the problem again arising from the fact that multiple points can be collapsed into a single point. This motivates the consideration of the category as in [10], which restricts morphisms to injective nonexpansive maps.
Theorem 9.
The mappings and from are functorial for each and .
Proof.
Let be a morphism in . If and are the threshold graphs associated to and , then induces a graph homomorphism since is injective and nonexpansive. Thus preserves both edge and vertex connectedness, and the maximal edge or vertexconnected subsets of contain the images of the maximal connected subsets of . In other words, refines and refines , as desired.
Theorem 10.
For every , there is a sequence of natural transformations
in the category of functors .
Proof.
Note that if , the clustering given by always refines the clustering given by . Then the identity maps from to are morphisms in for any . ∎
The biconnected components of a graph can be computed in linear time; given a division of a graph into maximal $k$-connected subgraphs, these can be divided into $(k+1)$-connected subgraphs by finding all $k$-element vertex cuts. This can be done in polynomial time for each $k$, so the $k$-vertex-connected components of a graph can be enumerated in polynomial time. (For more information see [23].) Constructing the adjacency graph for a given metric requires quadratic time in the number of points, so the clustering schemes can be calculated in polynomial time for any fixed $k$. However, the maximal clique problem is NP-complete, so no polynomial time algorithm is known to compute in general.
Note that the method is excisive in the sense of Carlsson–Mémoli [10] for each , so by Theorem of loc. cit., it is representable by a set of test metric spaces whose injections into determine the clusters. However, it may be more efficiently calculated using one of several fast algorithms for finding maximal edgeconnected subgraphs, such as those in [26], [11], and [1].
4. Hierarchical Clustering
All of the parameterized flat clustering schemes we have considered generalize naturally to hierarchical clustering methods which we call sieving functors .
Theorem 11.
Suppose . Then for any (including ) there is a natural transformation .
Proof.
The theorem follows easily from the fact that for any , the clustering given by refines that given by whenever . ∎
Theorem 12.
Suppose is a family of functors from to indexed by nonnegative real numbers such that whenever , there is a natural transformation . Then the map given by , with is a functor.
The proof of Theorem 12 follows easily from the definition of a sieve, and we call a functor of this type a sieving functor. The two previous theorems then give us a family of hierarchical clustering schemes, . Note, however, that there are many more functorial hierarchical clustering schemes. A broader theoretical treatment will be given in a forthcoming paper, “Functorial Clustering via Projections,” where we work with sets having more general dissimilarity measures and provide a characterization of stable sieving functors.
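To make the passage from a parameterized flat scheme to a sieve concrete, the sketch below tabulates the covers by maximal linked sets at every scale where the cover can change, namely zero and the pairwise distances. It assumes an inclusive threshold; the names `maximal_linkage`, `refines`, and `sieve_levels` are hypothetical, and the enumeration is brute force, so this is an illustration rather than a practical algorithm.

```python
from itertools import combinations


def maximal_linkage(points, dist, delta):
    """Maximal subsets whose points are pairwise within distance delta."""
    pts = list(points)
    linked = [
        frozenset(s)
        for size in range(1, len(pts) + 1)
        for s in combinations(pts, size)
        if all(dist(a, b) <= delta for a, b in combinations(s, 2))
    ]
    return {s for s in linked if not any(s < t for t in linked)}


def refines(fine, coarse):
    """True when every set of `fine` is contained in some set of `coarse`."""
    return all(any(a <= b for b in coarse) for a in fine)


def sieve_levels(points, dist):
    """Covers at scale 0 and at each pairwise distance, in increasing order.
    Between consecutive scales the cover stays constant."""
    pts = list(points)
    scales = sorted({0.0} | {dist(a, b) for a, b in combinations(pts, 2)})
    return [(s, maximal_linkage(pts, dist, s)) for s in scales]


if __name__ == "__main__":
    dist = lambda a, b: abs(a - b)
    for scale, cover in sieve_levels([0, 1, 3], dist):
        print(scale, sorted(map(sorted, cover)))
    # prints:
    # 0.0 [[0], [1], [3]]
    # 1 [[0, 1], [3]]
    # 2 [[0, 1], [1, 3]]
    # 3 [[0, 1, 3]]
```

Each level refines the next and the last level is the trivial cover, which are the refinement and termination conditions required of a sieve in Definition 3.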
5. Acknowledgements
The authors are grateful for the financial support of the Air Force Office of Scientific Research under the LRIR 12RY02COR, LRIR 15RYCOR153, and MURI FA95501010567 grants.
References
 [1] Takuya Akiba, Yoichi Iwata, and Yuichi Yoshida. Linear-time enumeration of maximal $k$-edge-connected subgraphs in large networks by random contraction. In Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, pages 909–918, 2013.
 [2] Hans-Jürgen Bandelt and Andreas W. M. Dress. Weak hierarchies associated with similarity measures: An additive clustering technique. Bulletin of Mathematical Biology, 51(1):133–166, 1989.
 [3] Hans-Jürgen Bandelt and Andreas W. M. Dress. A canonical decomposition theory for metrics on a finite set. Advances in Mathematics, 92:47–105, 1992.
 [4] Hans-Jürgen Bandelt and Andreas W. M. Dress. An order theoretic framework for overlapping clustering. Discrete Mathematics, 136(1–3):21–37, December 1994.
 [5] Michael Barr and Charles Wells. Category Theory for Computing Science. Les Publications CRM, Montréal, 3rd edition, 1999.
 [6] J.-P. Barthélemy, F. Brucker, and C. Osswald. Combinatorial optimisation and hierarchical classifications. Annals of Operations Research, 153(1):179–214, September 2007.
 [7] Patrice Bertrand. Set systems and dissimilarities. European Journal of Combinatorics, 21:727–743, 2000.
 [8] Patrice Bertrand and Jean Diatta. Weak hierarchies: A central clustering structure. In Fuad Aleskerov, Boris Goldengorin, and Panos M. Pardalos, editors, Clusters, Orders, and Trees: Methods and Applications, Springer Optimization and Its Applications, pages 211–230. Springer New York, 2014.
 [9] Gunnar Carlsson and Facundo Mémoli. Characterization, stability, and convergence of hierarchical clustering methods. Journal of Machine Learning Research, 11:1425–1470, 2010.
 [10] Gunnar Carlsson and Facundo Mémoli. Classifying clustering schemes. Foundations of Computational Mathematics, 13(1):221–252, 2013.
 [11] Lijun Chang, Jeffrey Xu Yu, Lu Qin, Xuemin Lin, Chengfei Liu, and Weifa Liang. Efficiently computing $k$-edge connected components via graph decomposition. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 205–216, 2013.
 [12] Frédéric Chazal, Steve Oudot, Primoz Skraba, and Leonidas J. Guibas. Persistence-based clustering in Riemannian manifolds. Journal of the Association of Computing Machinery, 60(6), 2013.
 [13] Jean Diatta. One-to-one correspondence between indexed cluster structures and weakly indexed closed cluster structures. In Paula Brito, Patrice Bertrand, Guy Cucumel, and Francisco de Carvalho, editors, Selected Contributions in Data Analysis and Classification, Studies in Classification, Data Analysis, and Knowledge Organization, pages 477–482. Springer Berlin Heidelberg, 2007.
 [14] Andreas Dress, Vincent Moulton, Andreas Spillner, and Taoyang Wu. Obtaining splits from cut sets of tight spans. Discrete Applied Mathematics, 161:1409–1420, 2013.
 [15] Andreas W. M. Dress, Katarina T. Huber, and Vincent Moulton. Totally split-decomposable metrics of combinatorial dimension two. Annals of Combinatorics, 5(1):99–112, 2001.
 [16] Brian Everitt. Cluster Analysis. Wiley Series in Probability and Statistics. Wiley, 5th edition, 2011.
 [17] W Harvey, O Rübel, V Pascucci, PT Bremer, and Y Wang. Enhanced topology-sensitive clustering by Reeb graph shattering. In Ronald Peikert, Helwig Hauser, Hamish Carr, and Raphael Fuchs, editors, Topological Methods in Data Analysis and Visualization II: Theory, Algorithms, and Applications, Mathematics and Visualization, pages 77–90. Springer Berlin Heidelberg, 2012.
 [18] Melvin F. Janowitz. Ordinal and Relational Clustering, volume 10 of Interdisciplinary Mathematical Sciences. World Scientific, 2010.
 [19] Nicholas Jardine and Robin Sibson. The construction of hierarchic and nonhierarchic classifications. Computer Journal, 11:117–184, 1968.
 [20] Nicholas Jardine and Robin Sibson. Mathematical Taxonomy. Wiley Series in Probability and Mathematical Statistics: Tracts on Probability and Statistics. Wiley, 1971.
 [21] Jon Kleinberg. An impossibility theorem for clustering. Advances in Neural Information Processing Systems, 15, 2002.
 [22] Saunders Mac Lane. Categories for the Working Mathematician. Number 5 in Graduate Texts in Mathematics. Springer, New York, 2nd edition, 1998.
 [23] David W. Matula. $k$-blocks and ultrablocks in graphs. Journal of Combinatorial Theory, B(24):1–13, 1978.
 [24] Gurjeet Singh, Facundo Mémoli, and Gunnar Carlsson. Topological methods for the analysis of high dimensional data sets and 3D object recognition. In Symposium on Point-Based Graphics, pages 91–100, 2007.
 [25] Bernd Sturmfels and Josephine Yu. Classification of six-point metrics. The Electronic Journal of Combinatorics, 11, 2004.
 [26] Rui Zhou, Chengfei Liu, Jeffrey Xu Yu, Weifa Liang, Baichen Chen, and Jianxin Li. Finding maximal $k$-edge-connected subgraphs from a large graph. In EDBT/ICDT Joint Conference, March 2012.