A fundamental task in data analysis is to characterize variability that separates informative data relations from disruptive ones, e.g., due to noise or collection artifacts. In predictive tasks such as classification, for example, one might seek to extract and preserve information that enhances class separation, while eliminating intra-class variance. However, in descriptive tasks and data exploration, such knowledge does nota priori exist, and instead data processing methods must detect emergent patterns that encode meaningful abstractions of the data. Furthermore, it is often the case that data abstraction cannot be conducted at a single scale, and instead one must consider multiresolution data representations that generate several scales of abstraction – each emphasizing different properties in the data.
The need for multiresolution data representations is of particular importance in biomedical data exploration, where recent technological advances introduce vast amounts of unlabeled data to be explored by limited numbers of domain experts. For example, in single-cell transcriptomics, high-throughput genomic and epigenetic assays have led to an explosion in high-dimensional biological data measured from various systems including imaging giesen2014highly ; angelo2014multiplexed , mass cytometry Bendall2011 , and scRNA-seq shapiro2013single ; kolodziejczyk2015technology . To fully utilize this transformative big data availability, computational methods are needed that leverage the intrinsic data geometry (e.g., using manifold learning techniques (moon2018manifold, )) to enable exploratory analysis upon it.
A common approach towards data abstraction is to use clustering algorithms that provide coarse-grained representations of the data by grouping data points into salient clusters (levine2015data, ; galluccio2013clustering, ; von2007tutorial, ), either at a single scale or hierarchically (see Section 2). However, standard clustering algorithms such as -means (lloyd1982least, ; kanungo2002efficient, )
or expectation maximization (EM)(moon1996expectation, )
have many limitations. For example, they fail to perform well on high-dimensional data, or they require a number of assumptions about the underlying structure of the data(Ng2002, ). In particular, a primary challenge in clustering is determining the optimal number of clusters or groups. Many algorithms require the user to explicitly choose the number of clusters (as in -means) or tune a parameter that directly relates to the number of detected clusters (e.g., as in Phenograph levine2015data ). In exploratory settings, this makes it particularly challenging to detect small, unique, or otherwise rare data type clusters, and extract new knowledge from them.
Here, we present a new approach to address the challenge of multiscale data coarse graining by using a data-driven time-inhomogeneous diffusion process, which we call diffusion condensation. Our proposed diffusion condensation process learns a “continuous hierarchy” of coarse-grained representations by iteratively contracting the data-points towards a time-varying data manifold that represents increasingly coarser resolutions. At each iteration, the data points move to the center of gravity of their local neighbors as defined by this data-driven diffusion process Coifman2006 ; Nadler2005
. This in turn alters the next steps of the diffusion process to reflect the new data positions. Across iterations, this construction creates a time-inhomogeneous Markov process on the data, which represents the changing affinities between data points, along with changing granularities. The process eventually collapses the entire data set to a single point. However, intermediate steps in this process produce coarse-grained data representations at particular granularities or abstraction levels. Importantly, our results show that distinct clusters emerge at different scales and each data point (e.g., each cell in transcriptomic data) is represented by a time series of feature vectors that capture multiresolution information in the data. Therefore, the data embedding provided by the constructed diffusion condensation process can be thought of as a dynamic video, as opposed to static snapshots provided by traditional manifold learning, such as diffusion mapsCoifman2006 and other dimensionality reduction methods van2009dimensionality .
2 Related work
Typical attempts at providing multiscale data abstraction or summarization rely on hierarchical clustering, which is a family of methods that attempts to derive a tree of clusters based on either recursive agglomeration of datapoints or recursive splitting. Agglommerative methods include the popular linkage clustering, or community detection methods including the Louvain Method (blondel2008fast, ). Splitting based approaches include recursive bisection dasgupta2006spectral and divisive analysis clustering kaufman2009finding . At each iteration, these methods explicitly attempt to discover the best split or merge at each iteration, thereby forcing points together or apart as the case may be. Diffusion condensation by contrast does not force any splits or mergers at any iteration and simply allows datapoints to come together naturally via repeated condensation steps. Thus, there may be many iterations in which a cluster of datapoints remains distinct from other clusters. This time length under which the cluster persists can itself be a metric of the distinctness of a cluster, and the agglomeration of all such cluster persistence times creates a diagram similar to those created in persistent homology wasserman2018topological ; kwitt2015statistical . Thus the hierarchical tree created by diffusion condensation (displayed as a Sanky diagram in Figure 3) has branches whose lengths are meaningful in terms of cluster separation.
High dimensional data can often be modeled as originating from a sampling of a dimensional manifold that is mapped to dimensional observations via a nonlinear function . Intuitively, the reason for this phenomenon is that data collection measurements (modeled here via
) typically result in high dimensional observations, even when the intrinsic dimensionality, or degrees of freedom, in the data is relatively low. This manifold assumption is at the core of the vast field of manifold learning (e.g.,moon2018manifold ; Coifman2006 ; van2009dimensionality ; izenman2012introduction , and references therein), which leverages the intrinsic data geometry, modeled as a manifold, for exploring and understanding patterns, trends, and structure in data.
In Coifman2006 , diffusion maps were proposed as a robust way to capture intrinsic manifold geometry in data using random walks that aggregate local affinity to reveal nonlinear relations in data and allow their embedding in low dimensional coordinates. These local affinities are commonly constructed using a Gaussian kernel
where is an Gram matrix whose entry is denoted by to emphasize the dependency on the data . The bandwidth parameter
controls neighborhood sizes. A diffusion operator is defined as the row-stochastic matrixwhere is a diagonal matrix with , which is referred to as the degree of . The matrix
defines single-step transition probabilities for a time-homogeneous diffusion process (which is a Markovian random walk) over the data, and is thus referred to as the diffusion operator. Furthermore, as shown inCoifman2006 , powers of this matrix , for , can be used for multiscale organization of , which can be interpreted geometrically when the manifold assumption is satisfied.
While originally conceived for dimensionality reduction via the eigendecomposition of the diffusion operator, recent works (e.g., van2018recovering ; lindenbaum2018geometry ; gama:diffScatGraphs2018 ; gao:graphScat2018 ) have extended the diffusion framework of Coifman2006 to allow processing of data features by directly using the operator . In this case, serves as a smoothing operator, and may be regarded as a generalization of a low-pass filter for either unstructured or graph-structured data. Indeed, consider a vector that we think of as a signal over . Then replaces the value with a weighted average of the values for those points such that
. Applications of this approach include data denoising and imputationvan2018recovering , data generation lindenbaum2018geometry , and graph embedding with geometric scattering gama:diffScatGraphs2018 ; gao:graphScat2018 .
4 Diffusion condensation
4.1 Time inhomogeneous heat diffusion
The matrix defines the transition probabilities of a random walk over the data set . Computing powers of runs the walk forward, so that gives the transition probabilities of the -step random walk. Since the same transition probabilities are used for every step of the walk, the resulting diffusion process is time homogeneous.
A time inhomogeneous diffusion process arises from an inhomogeneous random walk in which the transition probabilities change with every step. Its -step transition probabilities are given by
is the Markov matrix that encodes the transition probabilities at step.
Suppose the data set has an additional parameter that results from measurements of a time-varying manifold at discretely sampled times . Let be the resulting Markov matrix derived from , constructed according to the anisotropic diffusion process of (Coifman2006, , Section 3) (which is similar to the construction described in Section 3). One can show hirn:timeCoupledDM2014 the resulting inhomogeneous diffusion process approximates heat diffusion over the time varying manifold . The singular vectors of this process can be used to construct a so-called time coupled diffusion map, which gives time-space geometric summaries of the data .
The perspective of hirn:timeCoupledDM2014 is that the data is intrinsically time varying. However, one can also start with a static data set and construct a series of deformations of the data. In this paper we take the latter perspective and deform the data set according to an imposed, data driven time inhomogeneous process that reduces variability within the data over time. The resulting process is referred to as condensation, and is described in the next section.
4.2 The diffusion condensation process
Recall from Section 3 that the application of the operator to a vector averages the values of over small neighborhoods in the data. In the case of data measured from an underlying manifold with the model for , this averaging operator can be directly applied to the coordinate functions . Let be the vector corresponding to the coordinate function evaluated on the data samples, i.e., . The resulting description of the data is given by where . The coordinates of are smoothed versions of the coordinates of , which dampens high frequency variations in the coordinate functions and thus removes small perturbations in the data. This smoothing technique is used in van2018recovering to impute and denoise data.
Here we consider not only the task of eliminating variability that originates from noise, but also coarse graining the data coordinates to provide multiple resolutions of the captured information in them. Therefore, we aim to gradually eliminate local variability in the data using a time inhomogeneous diffusion process that refines the constructed diffusion geometry to the coarser resolution as time progresses. This condensation process proceeds as follows. Let be the original data set with Markov matrix and the coordinate-smoothed data described in the previous paragraph. We can iterate this process to further reduce the variability in the data by computing the Markov matrix using the coordinate representation . A new coordinate representation is obtained by applying to the coordinate functions of . In general, one can apply the process for an arbitrary number of steps, which results in the condensation process. Let be the coordinate representation of the data after steps so that with , where . We obtain by applying , the Markov matrix computed from , to the coordinate vectors . This process results in:
From (2) we see the coordinate functions of the condensation process at time are derived from the imposed time inhomogeneous diffusion process . The low pass operator applies a localized smoothing operation to the coordinate functions . Over the entire condensation time, however, the original coordinate functions are smoothed by the cascade of diffusion operators . This process adaptively removes the high frequency variations in the original coordinate functions. The effect on the data points is to draw them towards local barycenters, which are defined by the inhomogeneous diffusion process. Once two or more points collapse into the same barycenter, they are identified as being members of the same cluster. A detailed description of the algorithm is available in Appendix A. In Section 5 we demonstrate condensation’s dynamic data deformations to remove variability and collapse points into clusters.
5 Properties of diffusion condensation
5.1 Cluster self-organization
Unlike other state-of-the-art clustering algorithms, such as -means, diffusion condensation does not require the user to a priori choose a potentially arbitrary number of data clusters to find. Rather, condensation grows self-organizing cluster hierarchies that emerge through local interactions among the data manifold’s sampling density and curvature variation. To disentangle and illustrate such properties, we provide condensation video stills in Figure 1. To begin we highlight the hyperuniformly-sampled (i.e., grid-sampled) circle manifold on the top-left of Figure 1, which demonstrates the base case of homogeneous data density and constant curvature. Note the absence of cluster formation. Comparing this to the hyperuniformly-sampled ellipse on the right of Figure 1, we observe the formation of nontrivial condensation clusters, particularly in the regions of high curvature. Similarly, the (non-hyper) uniformly-sampled circle manifold of constant curvature in the bottom-left of Figure 1 exhibits local cluster formation. Hence, we conjecture that nontrivial data density or curvature variation are sufficient conditions for the formation of diffusion condensates.
5.2 Cluster characterization via spectral decay
In addition to still frames, it is enlightening to consider cluster formation via its correspondence with the spectral decay of three related operators: , the diffusion maps operator, , the Markov operator built from the condensation, , of the data set , and , the time-inhomogeneous diffusion operator, which is composed of . Figure 2 demonstrates that data condensation corresponds with sudden, rapid spectral decay. Recall that a nested series of hierarchical data representations may be achieved through diffusion maps by taking successive powers of the diffusion operator, (not
), or, equivalently, powering its eigenvalues,for , which function as coordinates of the spectral embedding (e.g., , for all ). Of particular interest is the smooth decay to in the limit as of the diffusion maps spectrum as contrasted with the eigenvalues of (Figure 5 in Appendix B) which exhibit rapid, finite-time decay to zero.
Figure 2 displays many features of interest. Most striking is the correspondence between rapid spectral decay of
and cluster formation, which are depicted just before the moment of condensation. We see three major areas of cluster formation beginning near iteration 15, again near iteration 53, and once again near the last iteration, 100, when the algorithm halts.
5.3 Condensation allows multiscale persistence analysis
Since the condensation algorithm naturally allows points to come together via a low pass filter application at each iteration, then the time-point in the process at which clusters naturally come together, and the period of time which a cluster stays persistent (without merging), offer a notion of cluster metastability. This can be used to derive a partitioning of the dataspace that has mixed levels of granularity. By contrast, most clustering methods are only able to produce results at a particular level of granularity, for example -means tends to favor clusters that roughly divide the data into partitions of similar sizes. However, different parts of the dataspace may naturally separate at different levels of granularity and this is not visible in other methods. Even hierarchical clustering, due to forced splits and merges, may not reveal the levels of granularity at which data groupings are most distinct. We visualize this persistence information using Sanky diagrams (see Figures 3 and 4) that show natural groupings of the data. In Section 6.1 we use this capability of condensation to suggest a more relevant subtyping of retinal bipolar neurons on the basis of their transcriptomic profile, as compared to previous literature.
6 Empirical results
6.1 Single-cell transcriptomics data
A recent study of retinal bipolar neurons using single-cell transcriptomics was performed Shekhar2016
to classify cells into coherent subtypes. The study identified 15 cell sub-types by using the method ofblondel2008fast , of which 13 were well known and 2 were novel. We use the findings of said study to benchmark the condensation algorithm. From the dataset, we use a randomly selected sample of 20,000 cells with gene expression counts sequenced to a median depth of 8,200 mapped reads per cell to perform condensation. The condensation ran for 64 iterations until it achieved a metastable state of 12 clusters (close to the 15 reported in Shekhar2016 ). However due to the continuous clustering history offered by condensation, we are able to assess when these metastable clusters first form; see Figure 3 for a diagram of iterations to . A key advantage of the condensation method is its ability to compute cluster persistence based on the lengths of the clustering tree branches, which we use to reassess the subtyping of retinal bipolar cells performed in Shekhar2016 .
Using community detection methods, Shekhar2016 found that cluster BC1 (bipolar cone cells, subtype 1) is better described as two clusters, BC1A and BC1B. Shekhar et al. Shekhar2016 even confirm that morphologically BC1B seems to be a unipolar cluster rather than bipolar. Condensation clustering corroborates this new finding. Indeed, as shown in Figure 3, the dark grey BC1 subclusters stay persistently separated until the last iteration shown. Therefore, the two subcluster-state is more persistent than the single cluster.
On the other hand, condensation suggests alternative groupings of other clusters not identified by previous papers on retinal bipolar neurons including Shekhar2016 . Among these, we find that although BC3 has been described in terms of two subcomponents, BC3A and BC3B in biological literature, and in Shekhar2016 , these subclusters merge early (iteration 53) and the transcriptional profiles are not significantly distinct overall, despite certain selective markers such as Erbb4, Nnat being different between the two. Additionally, we find that our results strongly suggest that BC7 consists of 3 distinct subtypes that persist separately until the last iteration. Previously, the BC7 cell type has been described as a Vstm2b+Casp7+ cone cell that is distinct from other BC types as predicted by Shekhar2016 . Our analysis, however, reveals that there may be multiple sub-populations that are distinct within this cell type designation. While additional experimentation are required to follow up on this finding, condensation provides a way to examine granularities at which data is best organized based on cluster persistence via the whole condensation history.
6.2 Neural connectome data
Since the condensation algorithm operates via a series of diffusion operators, which can be regarded as types of adjacency matrices, we sought to understand if the algorithm would apply to coordinate-free spaces. To achieve this we took a datatype that naturally exists as a graph: the neural connectome data of the Caenorhabditis elegans brain, a neuropil called the “nerve ring” consisting of 181 neurons. Here an adjacency matrix was created from the contact profiles determined by images along slices of the worm, i.e., neurons that were more frequently in contact with one another were assumed to have a stronger connection and communication with one another. This adjacency matrix was then eigendecomposed to create a coordinate space in order to perform the condensation. The remainder of the algorithm remained as described.
First we sought to test out the robustness of the condensation algorithm by applying it to two complete connectomes of the Caenorhabditis elegans brain. Previous comparisons between these connectomes had concluded that they largely share similar structure at the level of cell morphology and synaptic positions white1986structure . We therefore hypothesized that by comparing the output from these similar connectomes we could test the robustness of our algorithm. Specifically, we focused on analyzing the relationship between cell-cell contact profiles for every neuron within the two connectomes brittin2018volumetric . Cell-cell contact relationships should define modules with the brain that are bundled together, and we hypothesized that if the algorithm was working as expected, it should extract similar contact profiles among the two connectomes. We observed that our algorithm produced similar condense profiles for the two connectomes (See Figure 4), suggesting that our method can be used to robustly analyze connectomics data. We see that the sanky diagrams preserve much of the structure including an important mechanosensory circuit. To quantify the similarity between condensation clusters generated from the two connectomes, we compute the adjusted Rand index (ARI) at each condensation iteration from to and then take the mean. This yields an , for , where the closer to the better.
A major advantage of the diffusion condensation algorithm (in Section A) is that it allows analyses of computational iterations to extract biologically relevant information informing the clustering steps. We hypothesized that these iterative steps could reveal units of circuit architecture underlying the brain. To test this, we examined the clusters for well described circuits, specifically, for the anterior body mechanosensation circuit girard2006wormbook . The anterior body mechanosensation circuit contains 2 classes of mechanosensory neurons and 4 classes of command interneurons that contact and connect to each other, and based on their contact profile, should be identified by the algorithm chalfie1985neural ; wicks1995integration . Indeed, iteration 14 (Figure 4) identified the circuit in both worms, revealing the predicted relationships between these connecting neurons. Interestingly, iteration 14 also contains neurons of unknown function that, upon closer inspection, are closely associated to the circuit, but have not been implicated in mechanosensory behaviors. Therefore inspection of the condensation algorithm not only extracted the known circuit, but also motivated a new hypothesis regarding the function of unknown neurons associated to the circuit. Together, our analyses demonstrate that the method can be used to compare connectomics data across organisms, to extract biologically relevant units of circuit architecture and even to inform new experiments and discoveries of biological importance. We propose that this method will be broadly useful for systems level analysis of connectomics data.
We presented a multiresolution data abstraction approach based on a time-inhomogeneous diffusion condensation process that gradually coarse grains data features along the intrinsic data geometry. We demonstrated the application of this method to biomedical data analysis, in particular in single cell transcriptomics. Furthermore, the presented diffusion condensation can be seen as a cascade of data-driven lowpass filters that gradually eliminates variations in the data to extract increasingly abstract features. Indeed, under this interpretation, the abstraction provided by the condensation process can be related to common intuitions of features extracted by hidden layers of deep convolutional networks, e.g., in image processing. Such features are commonly considered as increasing in abstraction capabilities together with the depth of the network. However, we note that while convolutional networks typically employ relatively-simple pointwise nonlinearities, here the nonlinearity we employ is the reconstruction of the diffusion geometry based on the coarse grained features along the cascade. Therefore, our cascade both learns a multiresolution data geometry and extracts multiresolution characterizations of groupings based on invariant features at each iteration. Finally, we note the increasing interest in geometric deep learning, which aims to tie together filter training in deep networks with non-Euclidean geometric structures that often exist intrinsically in modern data. While our approach here relies only on lowpass filters, it opens interesting directions in employing trained filters (or even designed diffusion wavelets, as done in diffusion and geometric scatteringgama:diffScatGraphs2018 ; gao:graphScat2018 ) together with geometric reconstruction for multiscale feature extraction from data.
Software and media
All code along with selected media are available at https://github.com/matthew-hirn/condensation.git.
The authors would like to thank Tobias Welp for his help in initiating this research. This research was partially funded by: NIH grant F32-NS098616 [M.W.M.]; IVADO (l’institut de valorisation des données) [G.W.]; the Alfred P. Sloan Fellowship (grant FG-2016-6607), the DARPA Young Faculty Award (grant D16AP00117), NSF grant 1620216 [M.H.]; and the Chan-Zuckerberg Initiative (grant ID: 182702) [S.K.].
- (1) C. Giesen, H. A. Wang, D. Schapiro, N. Zivanovic, A. Jacobs, B. Hattendorf, P. J. Schüffler, D. Grolimund, J. M. Buhmann, S. Brandt et al., “Highly multiplexed imaging of tumor tissues with subcellular resolution by mass cytometry,” Nature methods, vol. 11, no. 4, p. 417, 2014.
- (2) M. Angelo, S. C. Bendall, R. Finck, M. B. Hale, C. Hitzman, A. D. Borowsky, R. M. Levenson, J. B. Lowe, S. D. Liu, S. Zhao et al., “Multiplexed ion beam imaging of human breast tumors,” Nature medicine, vol. 20, no. 4, p. 436, 2014.
- (3) S. C. Bendall, E. F. Simonds, P. Qiu, E.-a. D. Amir, P. O. Krutzik, R. Finck, R. V. Bruggner, R. Melamed, A. Trejo, O. I. Ornatsky, R. S. Balderas, S. K. Plevritis, K. Sachs, D. Pe, S. D. Tanner, and G. P. Nolan, “Single-Cell Mass Cytometry of Differential,” Science, vol. 332, no. May, pp. 687–695, 2011.
- (4) E. Shapiro, T. Biezuner, and S. Linnarsson, “Single-cell sequencing-based technologies will revolutionize whole-organism science,” Nature Reviews Genetics, vol. 14, no. 9, p. 618, 2013.
- (5) A. A. Kolodziejczyk, J. K. Kim, V. Svensson, J. C. Marioni, and S. A. Teichmann, “The technology and biology of single-cell rna sequencing,” Molecular cell, vol. 58, no. 4, pp. 610–620, 2015.
- (6) K. R. Moon, J. Stanley, D. Burkhardt, D. van Dijk, G. Wolf, and S. Krishnaswamy, “Manifold learning-based methods for analyzing single-cell rna-sequencing data,” Current Opinion in Systems Biology, vol. 7, pp. 36–46, 2018.
- (7) J. H. Levine, E. F. Simonds, S. C. Bendall, K. L. Davis, D. A. El-ad, M. D. Tadmor, O. Litvin, H. G. Fienberg, A. Jager, E. R. Zunder et al., “Data-driven phenotypic dissection of aml reveals progenitor-like cells that correlate with prognosis,” Cell, vol. 162, no. 1, pp. 184–197, 2015.
- (8) L. Galluccio, O. Michel, P. Comon, M. Kliger, and A. O. Hero, “Clustering with a new distance measure based on a dual-rooted tree,” Information Sciences, vol. 251, pp. 96–113, 2013.
U. Von Luxburg, “A tutorial on spectral clustering,”Statistics and computing, vol. 17, no. 4, pp. 395–416, 2007.
- (10) S. Lloyd, “Least squares quantization in pcm,” IEEE transactions on information theory, vol. 28, no. 2, pp. 129–137, 1982.
T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, “An efficient k-means clustering algorithm: Analysis and implementation,”IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 24, no. 7, pp. 881–892, 2002.
- (12) T. K. Moon, “The expectation-maximization algorithm,” IEEE Signal processing magazine, vol. 13, no. 6, pp. 47–60, 1996.
A. Y. Ng, M. I. Jordan, and Y. Weiss, “On spectral clustering: Analysis and an algorithm,” inAdvances in Neural Information Processing Systems 14, T. G. Dietterich, S. Becker, and Z. Ghahramani, Eds. MIT Press, 2002, pp. 849–856.
- (14) R. R. Coifman and S. Lafon, “Diffusion maps,” Applied and Computational Harmonic Analysis, vol. 21, no. 1, pp. 5–30, 2006.
B. Nadler, S. Lafon, I. Kevrekidis, and R. R. Coifman, “Diffusion maps, spectral clustering and eigenfunctions of fokker-planck operators,” inAdvances in Neural Information Processing Systems 18, Y. Weiss, P. B. Schölkopf, and J. C. Platt, Eds. MIT Press, 2005, pp. 955–962.
- (16) L. Van Der Maaten, E. Postma, and J. Van den Herik, “Dimensionality reduction: a comparative,” J Mach Learn Res, vol. 10, no. 66-71, p. 13, 2009.
- (17) V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of statistical mechanics: theory and experiment, vol. 2008, no. 10, p. P10008, 2008.
- (18) A. Dasgupta, J. Hopcroft, R. Kannan, and P. Mitra, “Spectral clustering by recursive partitioning,” in European Symposium on Algorithms. Springer, 2006, pp. 256–267.
- (19) L. Kaufman and P. J. Rousseeuw, Finding groups in data: an introduction to cluster analysis. John Wiley & Sons, 2009, vol. 344.
- (20) L. Wasserman, “Topological data analysis,” Annual Review of Statistics and Its Application, vol. 5, pp. 501–532, 2018.
- (21) R. Kwitt, S. Huber, M. Niethammer, W. Lin, and U. Bauer, “Statistical topological data analysis-a kernel perspective,” in Advances in neural information processing systems, 2015, pp. 3070–3078.
- (22) A. J. Izenman, “Introduction to manifold learning,” Wiley Interdisciplinary Reviews: Computational Statistics, vol. 4, no. 5, pp. 439–446, 2012.
- (23) D. Van Dijk, R. Sharma, J. Nainys, K. Yim, P. Kathail, A. Carr, C. Burdziak, K. R. Moon, C. L. Chaffer, D. Pattabiraman et al., “Recovering gene interactions from single-cell data using data diffusion,” Cell, vol. 174, no. 3, pp. 716–729, 2018.
- (24) O. Lindenbaum, J. Stanley, G. Wolf, and S. Krishnaswamy, “Geometry based data generation,” in Advances in Neural Information Processing Systems, 2018, pp. 1400–1411.
- (25) F. Gama, A. Ribeiro, and J. Bruna, “Diffusion scattering transforms on graphs,” in International Conference on Learning Representations, 2019, arXiv:1806.08829.
- (26) F. Gao, G. Wolf, and M. Hirn, “Geometric scattering for graph data analysis,” 2019, to appear in the Proceedings of the 36th International Conference on Machine Learning, arXiv:1810.03068.
- (27) N. Marshall and M. J. Hirn, “Time-coupled diffusion maps,” Applied and Computational Harmonic Analysis, vol. 45, no. 3, pp. 709–728, 2018.
- (28) K. Shekhar, S. W. Lapan, I. E. Whitney, N. M. Tran, E. Z. Macosko, M. Kowalczyk, X. Adiconis, J. Z. Levin, J. Nemesh, M. Goldman, S. A. McCarroll, C. L. Cepko, A. Regev, and J. R. Sanes, “Comprehensive Classification of Retinal Bipolar Neurons by Single-Cell Transcriptomics,” Cell, vol. 166, no. 5, pp. 1308–1323.e30, 2016.
- (29) L. R. Girard, T. J. Fiedler, T. W. Harris, F. Carvalho, I. Antoshechkin, M. Han, P. W. Sternberg, L. D. Stein, and M. Chalfie, “Wormbook: the online review of caenorhabditis elegans biology,” Nucleic acids research, vol. 35, no. suppl_1, pp. D472–D475, 2006.
- (30) J. G. White, E. Southgate, J. N. Thomson, and S. Brenner, “The structure of the nervous system of the nematode caenorhabditis elegans,” Philos Trans R Soc Lond B Biol Sci, vol. 314, no. 1165, pp. 1–340, 1986.
- (31) C. A. Brittin, S. J. Cook, D. H. Hall, S. W. Emmons, and N. Cohen, “Volumetric reconstruction of main caenorhabditis elegans neuropil at two different time points,” bioRxiv, p. 485771, 2018.
- (32) M. Chalfie, J. E. Sulston, J. G. White, E. Southgate, J. N. Thomson, and S. Brenner, “The neural circuit for touch sensitivity in caenorhabditis elegans,” Journal of Neuroscience, vol. 5, no. 4, pp. 956–964, 1985.
- (33) S. R. Wicks and C. H. Rankin, “Integration of mechanosensory stimuli in caenorhabditis elegans,” Journal of Neuroscience, vol. 15, no. 3, pp. 2434–2444, 1995.
Appendix A Algorithm
Pseudocode for the diffusion condensation algorithm is provided in Algorithm 1. We wish to highlight that while not absolutely necessary, we increase the bandwidth when the -norm of the difference of the densities, and , of the current and previous iteration is below some threshold, to speed up cluster convergence. When not iteratively increasing , the user should be aware that, viewing as a graph diffusion operator, the underlying data graph can become disconnected, thereby halting the algorithm at a lower level of cluster abstraction.
Appendix B Operator spectra
As originally discussed in Section 5 and illustrated in Figure 2, the Markov operators and provide alternative characterizations of condensation cluster formation. More precisely, and exhibit rapid, finite-time spectral decay that directly corresponds with condensate formation (e.g., the merger of two or more points). This differs from the spectral decay of the diffusion maps operator, , as presented in Figure 5. The former characterizations may be useful in high dimensions, for example, in the identification of major condensation events. One might plot iterations, , against , for , and use to decide the desired level of abstraction (e.g., in analogy with the use of a Scree plot in PCA).
Appendix C Condensation naturally denoises
The video stills in Figure 6 are supplied to demonstrate the robustness of diffusion condensation to noise perturbation. The image at the left is the unperturbed tree, and it is followed by a subsequent iteration of the algorithm. We then perturb the data with Gaussian noise, and repeat the clustering process. Notice the similarity between the condensed forms (middle-left and far-right). This suggests that our algorithm is robust to noise (i.e., inhomogeneous diffusion condensation captures salient information about the data and produces consistent abstractions in the presence of noise.). This is reasonable to expect by similar reasoning as presented in Coifman2006 .