This section presents the theoretical background of topological data analysis (TDA). It contains definitions adapted from Tierny et al. Introductory reference books on Morse theory and computational topology have been published by Milnor and by Edelsbrunner and Harer.
The input data is given as an ensemble of $n$ piecewise linear (PL) scalar fields $f_i : \mathcal{M} \rightarrow \mathbb{R}$ on a PL $d$-manifold $\mathcal{M}$ (with $d = 2$ or $d = 3$), with $i \in \{1, \dots, n\}$. Each individual scalar field $f_i$ is an ensemble member. For brevity, we will only use the notation $f_i$ in cases where several members are considered and thus need disambiguation, and write $f$ otherwise. Each member $f$ is valued at the vertices of $\mathcal{M}$ and is linearly interpolated with barycentric coordinates on the remaining simplices of $\mathcal{M}$. In practice, $f$ is enforced to be injective on the vertices of $\mathcal{M}$ with simulation of simplicity. The set of simplices having a given simplex $\sigma$ as a face forms the star of $\sigma$, $St(\sigma)$. The set of faces of the simplices of $St(\sigma)$ that do not intersect $\sigma$ forms the link of $\sigma$, $Lk(\sigma)$.
For a vertex $v$, let us define $Lk^-(v)$ to be the lower link of $v$ ($Lk^-(v) = \{\sigma \in Lk(v) \mid \forall u \in \sigma : f(u) < f(v)\}$) and $Lk^+(v)$ as the upper link of $v$ ($Lk^+(v) = \{\sigma \in Lk(v) \mid \forall u \in \sigma : f(u) > f(v)\}$). When both $Lk^-(v)$ and $Lk^+(v)$ are simply connected, $v$ is called a regular vertex. If not, $v$ is a critical point of $f$. Such points correspond to configurations where the sub-level sets of $f$ (the subset of $\mathcal{M}$ valued below a given isovalue) change their topology when this isovalue varies continuously. Critical points often correspond to features of interest in applications. They are usually classified with the notion of index $\mathcal{I}$, which is equal to $0$, $1$, $(d-1)$ and $d$ for, respectively, minima (empty lower link), $1$-saddles (2 connected components of lower link), $(d-1)$-saddles (2 connected components of upper link), and maxima (empty upper link).
The population of critical points of $f$ can be visually encoded with the notion of persistence diagram (Figure 1). This diagram encodes critical points as pairs $(c, c')$ such that $f(c) < f(c')$ and $\mathcal{I}(c) = \mathcal{I}(c') - 1$. These pairs follow the Elder rule, which intuitively implies that if two topological features of $f$ meet at a critical point of $f$, the youngest feature (created at the highest function value) dies, favoring the oldest (created at the lowest function value). In a persistence diagram, each pair $(c, c')$ is represented as a point in 2D at coordinates $(f(c), f(c'))$, which are the birth and death of the pair respectively. The persistence of the pair is given by its height in the diagram, $P(c, c') = |f(c') - f(c)|$. It describes the lifespan in the range of the corresponding topological feature. In the following, only the critical point pairs involving local extrema, $(0, 1)$ and $((d-1), d)$, will be considered. The consequences of this simplifying assumption are described in section 5. Moreover, for genericity purposes, all persistence evaluations will be normalized with regard to the largest persistence found in the data ($P(c, c') \in [0, 1]$). In practice, the pairs of the diagram located in the vicinity of the diagonal denote low-amplitude noise while prominent features will be associated with persistent pairs, located far away from the diagonal (Figure 1). The persistence diagram has been extensively studied from a theoretical perspective and its stability to perturbations in the input data has been demonstrated. This stability result greatly motivated the use of persistence in applications, ranging from machine learning to visualization, where it has been shown to significantly help users distinguish salient features from noise.
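As an illustration of the Elder rule, the following minimal sketch pairs the minima of a 1D PL function with the vertices at which their sub-level set components merge. It is a toy setting, not the implementation used in this work: the function name `extremum_persistence_pairs` is hypothetical, the domain is a simple chain of vertices (so merge points play the role of saddles), and the input values are assumed to be pairwise distinct.

```python
def extremum_persistence_pairs(values):
    """Pair minima with merge vertices of a 1D PL function (Elder rule).

    Returns a list of (birth, death) function values. The global minimum
    is paired with the global maximum by convention.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # sweep by increasing value
    parent = [None] * n  # union-find forest; None means "not visited yet"
    birth = {}           # component root -> vertex that created the component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for v in order:
        parent[v] = v
        birth[v] = v
        for u in (v - 1, v + 1):
            if 0 <= u < n and parent[u] is not None:
                ru, rv = find(u), find(v)
                if ru == rv:
                    continue
                # Elder rule: the component born at the higher value dies.
                if values[birth[ru]] <= values[birth[rv]]:
                    old, young = ru, rv
                else:
                    old, young = rv, ru
                if birth[young] != v:  # skip trivial zero-persistence pairs
                    pairs.append((values[birth[young]], values[v]))
                parent[young] = old
    pairs.append((values[order[0]], values[order[-1]]))
    return pairs


def normalized_persistence(pairs):
    """Persistence of each pair, divided by the largest persistence."""
    heights = [death - birth for birth, death in pairs]
    largest = max(heights)
    return [h / largest for h in heights]
```

Pairs with a small normalized persistence would sit close to the diagonal of the diagram, while the pair of the global minimum has normalized persistence 1.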
Our approach is composed of three main steps (Figure 2). It takes as input PL scalar fields defined on the same PL manifold .
First (section 2), the persistence map of each ensemble member is computed. The purpose of this representation is to evaluate the spatial distribution of the critical points in each member, while at the same time balancing the contribution of each critical point by its persistence to emphasize salient features and reduce the contribution of noise.
Second (section 3), we leverage spectral embedding to represent each member as a point in a low-dimensional Euclidean space. Distances in this feature space denote dissimilarities between persistence maps. This space is conducive to further statistical analysis of the members which are clustered based on their persistence maps. The first two dimensions of this space are used to generate planar views which enable the direct visualization of the main trends in the ensemble in terms of critical point layouts.
Third (section 4), confidence regions in the geometrical domain are computed for each cluster by leveraging the notion of mandatory critical points . Finally, the confidence regions of all clusters are composed together into the final persistence atlas. This enables the visualization of the regions of occurrence of the most prominent critical points along with estimations of their probability of appearance.
2 Persistence maps
In this section, we introduce the notion of persistence map, a representation of the critical point distribution in each member.
The main target of persistence maps is to facilitate the comparison of two members and in terms of the layout of their critical points. As discussed in subsection 0.1, existing topological metrics (e.g. the Bottleneck distance) do not take into account the spatial embedding of the critical points in and are therefore not suited for our purpose. Let and be the sets of critical points of and respectively, which can be interpreted as point clouds in . The problem of comparing the layouts of critical points of and then reduces to that of comparing two point clouds, a problem for which no universal solution exists. For instance, the Hausdorff distance can be seen as a worst-case metric that only measures the distance between the two most distant points of the two sets. This is too limiting for our setting since the similarity of the rest of the point cloud is not assessed. Moreover, due to the presence of noise in the data, it is highly likely in practice that a significant number of the critical points of and are noise artifacts. Such artifacts must be taken into account in the similarity estimation in order to reduce their importance and highlight salient features. This last observation is the main motivation behind persistence maps.
A reason for the difficulty in estimating the similarity between the point clouds and is that there exists no canonical parameterization of these sets allowing for a straightforward comparison with established distance measures, as can be done for streamlines for instance . and may not even be of the same size. This observation motivates the transformation of and into an alternate representation that would yield a natural parameterization directly usable with standard distance measures.
Breckner and Möller faced a similar problem in the context of isosurface comparison and introduced a signed distance field transform, measuring the distance between each vertex of and the considered isosurface. Then, the similarity between two isosurfaces can be evaluated based on the standard distance between their distance transforms. The same idea was later used by Ferstl et al. in a context that resembles our setting (isocontour clustering for level set variability analysis and visualization). We build upon this strategy to construct persistence maps. In particular, one could derive a distance transform for a critical point set , by considering for each vertex , the distance to the closest critical point of . However, such a distance transform would be highly sensitive to the presence of noise in the data since all the critical points of would be considered for its computation. Therefore, it is necessary to develop a transformation where the contribution of each critical point could be weighted by an importance measure, such as topological persistence. While such a weighting strategy is difficult to elaborate for distance fields, it is much easier to derive for sums of Gaussian radial basis functions. In particular, let be the following scalar function, where and are scalars controlling the amplitude and spatial spread of the contribution of the critical point :
If constant values are considered for both and , is a measure of the local critical point density (Figure 3, inset). To limit the importance of noisy critical points in this density estimation and to highlight salient features, we use persistence as an importance measure in the expressions of and as follows, where stands for the persistence of the critical point pair containing in :
controls the focus that is given to salient features in terms of their spread in the spatial domain. Distances are normalized with regard to the bounding box diagonal. We have found that is a good value in practice. This representation resembles the notion of persistence images , which focuses on range rather than domain density.
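The equations defining the persistence map are not reproduced in this text, so the following is only a minimal sketch of a persistence-weighted sum of Gaussian radial basis functions, under stated assumptions. The function name `persistence_map`, the parameter `spread` and its value, and the specific choices "amplitude equal to persistence" and "spatial spread proportional to persistence" are all hypothetical; only the normalization of distances by the bounding box diagonal is taken from the text.

```python
import numpy as np


def persistence_map(vertices, critical_points, persistence, spread=0.1):
    """Evaluate a persistence-weighted Gaussian sum at every vertex.

    vertices:        (V, d) array of vertex coordinates
    critical_points: (C, d) array of critical point coordinates
    persistence:     (C,) normalized persistence of each critical point
    spread:          hypothetical scalar controlling the spatial spread
    """
    vertices = np.asarray(vertices, dtype=float)
    critical_points = np.asarray(critical_points, dtype=float)
    persistence = np.asarray(persistence, dtype=float)

    # Distances are normalized with regard to the bounding box diagonal.
    diagonal = np.linalg.norm(vertices.max(axis=0) - vertices.min(axis=0))
    diff = vertices[:, None, :] - critical_points[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2) / diagonal ** 2

    # ASSUMPTION: amplitude and spread both scale with persistence, so that
    # low-persistence (noisy) critical points contribute very little.
    amplitude = persistence
    sigma = np.maximum(spread * persistence, 1e-12)

    return (amplitude * np.exp(-sq_dist / (2.0 * sigma ** 2))).sum(axis=1)
```

With constant amplitude and spread instead, the same sum would behave as a plain critical point density estimate, which is the case the text contrasts against.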
Since they are both defined on the same spatial domain , the persistence maps and of two critical point sets and benefit from a common parameterization and their distance can be estimated with standard distance measures, such as the norm:
By design, this metric is robust to noise, since the contribution of critical points to the persistence maps is weighted by their persistence (Eq. 2). Hence, small persistence pairs (typically corresponding to low amplitude noise, Figure 1(b)) will have a negligible contribution in practice to the persistence maps (Figure 3(b), further discussion in subsection 5.4). This is important since small-scale additive noise often occurs in practice, even for simulation data assumed to be smooth. This metric is also robust by design to global variations in data values which do not change the critical point spatial layout, since the actual data values are not taken into account in the persistence map. In contrast, the standard distance would tend to miss the possible preservation of salient features in the presence of global shifts in data values, as can be the case with seasonal effects in climate data. Finally, the distance is specifically designed to penalize changes in the layout of salient critical points. The above properties are illustrated in Figure 3, which shows persistence maps on a toy example, , along with three variants: with additive noise, which contains a global shift in data values (slope), and which contains an additional salient feature. For this data, we have: . In other words, with the distance between the actual data values, the noise affected dataset () is the most distant from the original (), while the dataset with a drastic change in critical point layout () is the closest. In contrast, the distance between the corresponding persistence maps results in a different ordering: . In other words, with the persistence map metric, the closest dataset to the original () is the one which better preserves the critical point layout (), while the most distant is the one which changes it the most ().
This indicates that the metric is indeed more robust to noise and global shift in data values than and that it better describes variations in the layout of salient critical points. Our distance (Eq. 3) resembles the kernel distance defined for generic point cloud data. In contrast, persistence maps focus on the critical points of a scalar field (instead of generic point clouds). This allows the density estimation to additionally account for the persistence of each critical point as an importance measure (Eq. 2), in order to highlight salient features and reduce the effect of noise.
3 Space of persistence maps
As described above, the distance between persistence maps is a good candidate to compare the spatial layout of critical points between two members. Based on this metric, a distance matrix is computed for the entire ensemble, with , and then normalized. In this section, we exploit this distance matrix to visualize and identify the main trends in critical point layouts within the ensemble.
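The distance matrix computation can be sketched as follows, assuming that each persistence map is stored as one row of values sampled on the shared vertices and that the distance is a Euclidean norm between rows. The text does not state how the matrix is normalized; dividing by the largest entry is an assumption made here, and the function name is hypothetical.

```python
import numpy as np


def persistence_map_distance_matrix(maps):
    """Pairwise Euclidean distances between persistence maps.

    maps: (n, V) array, one persistence map per ensemble member.
    Returns an (n, n) symmetric matrix with entries scaled to [0, 1].
    """
    maps = np.asarray(maps, dtype=float)
    sq_norms = (maps ** 2).sum(axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, clipped against round-off.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * maps @ maps.T
    dist = np.sqrt(np.maximum(sq_dist, 0.0))
    np.fill_diagonal(dist, 0.0)

    # ASSUMPTION: normalization by the largest pairwise distance.
    peak = dist.max()
    return dist / peak if peak > 0 else dist
```

Each entry depends only on two rows, which is consistent with the remark made later in the text that the entries can be computed independently.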
3.1 Low dimensional embedding
To directly visualize the global trends in critical point layouts, we first consider a low dimensional embedding of the ensemble into a space of persistence maps, noted , where each map is represented by a point and where distances between points denote distances between persistence maps. For this, we employ established methods for non-linear dimensionality reduction [87, 13]. In particular, we focus on the spectral approach by Belkin et al. based on Laplacian eigenmaps, which has been shown to better preserve locality than standard methods such as principal component analysis or Isomap. This property is particularly beneficial if clustering is subsequently considered, which is the case in our framework (subsection 3.2). For completeness, we briefly sketch the main steps of the Laplacian eigenmap approach and we refer the reader to for further details.
First, an adjacency graph is constructed, where the node represents the ensemble member and where arcs are introduced between the node and its nearest neighbors (according to the distance matrix ). In practice, we set to a default recommended value (). Next, a weight matrix is constructed such that if and are connected in the adjacency graph and otherwise. A diagonal matrix is also established such that . Then, the Laplacian, , of the adjacency graph is considered as , which is a symmetric, positive semidefinite matrix. Finally, the low-dimensional space is constructed by projecting each ensemble member along the first eigenvectors of , which are solutions of the generalized eigenvector problem: (where stands for the eigenvalues of ). In practice, the first eigenvector is discarded, as suggested by Belkin et al. Thus, the ensemble member is then embedded at position . Since the first eigenvectors of are usually considered to be the most informative, for visualization purposes, we typically represent planar layouts of the space of persistence maps by only considering the first two components of this vector ().
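The steps above can be sketched with NumPy as follows. This is an illustrative reading of the Laplacian eigenmap construction with constant weights, not the adapted scikit-learn classes mentioned later in the text. The function name and the default `n_neighbors=5` are hypothetical (the recommended default is not recoverable here), and the generalized problem is solved through the equivalent symmetric normalized form.

```python
import numpy as np


def laplacian_eigenmap(dist, n_neighbors=5, n_components=2):
    """Embed ensemble members from a pairwise distance matrix.

    Returns (eigenvalues, embedding); the embedding skips the first
    (constant) eigenvector.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]

    # k-nearest-neighbor adjacency graph with constant weights (1 or 0).
    W = np.zeros((n, n))
    for i in range(n):
        order = [j for j in np.argsort(dist[i]) if j != i]
        W[i, order[:n_neighbors]] = 1.0
    W = np.maximum(W, W.T)  # an arc in either direction connects two nodes

    degree = W.sum(axis=1)          # diagonal of the degree matrix
    L = np.diag(degree) - W         # graph Laplacian

    # Solve L y = lambda D y via the symmetric matrix D^-1/2 L D^-1/2.
    inv_sqrt = 1.0 / np.sqrt(degree)
    S = inv_sqrt[:, None] * L * inv_sqrt[None, :]
    eigenvalues, Z = np.linalg.eigh(S)  # ascending eigenvalues
    Y = inv_sqrt[:, None] * Z           # back to generalized eigenvectors

    return eigenvalues, Y[:, 1:n_components + 1]
```

Keeping the eigenvalues alongside the embedding is convenient, since they are reused by the eigengap heuristic for suggesting the number of clusters.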
3.2 Persistence map clusters
Figure 2(d) shows a typical 2D layout of the first two dimensions of the space of persistence maps, , for a toy ensemble dataset. As shown in this example, clear patterns that correspond to distinct trends in critical point layout emerge from this visualization. To quantitatively analyze these patterns, we next employ clustering algorithms. In particular, we employ the popular k-means algorithm, which has been shown to be well suited for a combined usage with spectral embedding (subsection 3.1), yielding the notion of spectral clustering. This algorithm is based on the classical Lloyd relaxation scheme which, given an initial assignment of cluster centroids chosen among the data points, assigns each data point to the cluster of its closest centroid. Next, for each cluster, a new centroid is selected as the point being the closest to the new cluster barycenter and the procedure is iterated until convergence. Note that for the above clustering procedure, the spectral clustering literature recommends using only the first components of , although we found in practice that, with our implementation, the most stable results were obtained for .
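A minimal sketch of the relaxation scheme described above is given below, in which centroids are always data points: after each assignment step, the new centroid of a cluster is the member closest to the cluster barycenter. The function name, the random initialization and the iteration cap are assumptions made for illustration.

```python
import numpy as np


def lloyd_with_data_centroids(points, k, max_iter=100, seed=0):
    """Lloyd-style clustering whose centroids are chosen among the points.

    Returns (labels, centroid_ids), where centroid_ids index into points.
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centroid_ids = rng.choice(len(points), size=k, replace=False)

    def assign(ids):
        d = np.linalg.norm(points[:, None, :] - points[ids][None, :, :], axis=2)
        return d.argmin(axis=1)

    for _ in range(max_iter):
        labels = assign(centroid_ids)
        new_ids = centroid_ids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the previous centroid for an empty cluster
            barycenter = points[members].mean(axis=0)
            closest = np.linalg.norm(points[members] - barycenter, axis=1).argmin()
            new_ids[c] = members[closest]
        if np.array_equal(new_ids, centroid_ids):
            break  # converged: centroids no longer move
        centroid_ids = new_ids

    return assign(centroid_ids), centroid_ids
```

In the pipeline described here, `points` would be the rows of the spectral embedding rather than raw data values.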
The number of clusters to be considered is particularly important as it directly corresponds to the number of trends which can be visualized in the ensemble. While we offer users the possibility to explicitly specify pre-defined values of , we also provide an automatic estimation procedure. Several statistical measures have been studied for the automatic estimation of , such as the Bayesian Information Criterion . In the specific case of spectral clustering however, it has been shown that the eigenvalues of the Laplacian matrix (subsection 3.1) already exhibit important hints regarding cluster numbers and that they are particularly useful to identify proper values for . In particular, the first eigenvalue resulting in a significant eigengap is usually considered as a good value for (see von Luxburg  for formal arguments based on perturbation theory). Thus, in practice, we provide as an initial guess for , the position of the first local maximum of eigengap . Figure 4 plots the evolution of the eigengaps for the example of Figure 2. As shown in this figure, the appropriate number of clusters for this specific dataset indeed corresponds to the first local maximum of eigengap (). Note that several other local maxima of eigengaps occur for higher eigenvalues. We also offer users the possibility to interactively explore them individually.
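The eigengap heuristic can be sketched as follows, assuming the Laplacian eigenvalues are available in ascending order and that a gap between the k-th and (k+1)-th eigenvalues suggests k clusters. The function name and the strict local maximum test are illustrative choices, not the authors' implementation.

```python
import numpy as np


def suggest_cluster_count(eigenvalues):
    """Return the position of the first local maximum of the eigengap.

    With eigenvalues sorted in ascending order, gaps[i] is the difference
    between the (i+2)-th and (i+1)-th eigenvalues, so a local maximum at
    index i suggests i + 1 clusters.
    """
    ev = np.sort(np.asarray(eigenvalues, dtype=float))
    gaps = np.diff(ev)
    for i in range(len(gaps)):
        left = gaps[i - 1] if i > 0 else -np.inf
        right = gaps[i + 1] if i + 1 < len(gaps) else -np.inf
        if gaps[i] > left and gaps[i] > right:
            return i + 1
    return 1  # no local maximum found: fall back to a single cluster
```

Later local maxima of the same `gaps` array correspond to the alternative cluster counts that users can explore interactively.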
4 Confidence regions for persistence map clusters
The major trends in critical point layout in the ensemble can be identified by clustering the persistence maps (section 3). In this section, we describe how to visualize the spatial variability of critical points within each of the identified clusters.
4.1 Per cluster variability analysis
The clustering procedure described in the previous section identifies disjoint subsets of ensemble members which share a common pattern in critical point layout. Let be such a subset (). To understand the variability of critical points within this subset, one needs first (i) to identify a common topological structure among all of the members of and second (ii) to analyze its spatial variability. As discussed in subsection 0.1, several approaches have been proposed to study the positional uncertainty of critical points. Among those, we focus on the approach based on mandatory critical points since it is based on point-wise intervals and is, therefore, well suited for the analysis of ensemble data, where no specific assumption can be made about the structure of the point-wise random variables locally modeling the data variability. For completeness, we briefly sketch the main steps of this method and refer the reader to for further details.
First, pointwise scalar value bounds are extracted as two scalar fields and , such that and . Given an isovalue , let be a connected component of sub-level set of (blue region in Figure 5). By construction, for each vertex in , there exists at least one member for which . Then, there exists a member for which a connected component of sub-level set passes through at isovalue (gray regions in Figure 5). Then, is called a candidate region for the appearance of a local minimum (responsible for the creation of the component in ).
Let be a minimum of . Since and are nested, must be located inside a connected component of sub-level set of at isovalue . Let be that region and let us first consider that is the only minimum of in it. At isovalue , by construction, all the members are such that . This means that for all the members of the subset , there exists a connected component of sub-level set passing through (gray components containing in Figure 5(c) and Figure 5(d)). In particular, this connected component was created at an earlier isovalue, at one of the vertices of the corresponding candidate region, . Overall, this means that must contain at least one minimum (responsible for the initial creation of the component ) for all the members of . Thus, the region is called a mandatory minimum: a minimal connected component of , associated with a minimal interval , such that any contains at least one minimum in with .
Figure 5 illustrates this process where candidate regions (blue) may contain several connected components (gray) of sub-level set of ensemble members ( is shown in green). Note that if the candidate region contains a second minimum such that , this implies that the sub-level sets of all members pass through as well. However, they may do so with the component which already contains (Figure 5(e)). Thus, the existence of such a second minimum does not necessarily imply the existence of an additional minimum in , as is the case in Figure 5(e) (as opposed to Figure 5(f)). As discussed in section 5, this observation may have important practical implications, as it may prevent the detection of a mandatory critical point in case of high pointwise value variability.
Other types of mandatory critical points are extracted similarly, as described in . Eventually, each cluster is associated with a collection of mandatory critical points, which describe the spatial variability of the common topological structure found among its members.
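To make the construction more concrete, the following simplified sketch handles 1D fields only and covers just the single-minimum case discussed above. It assumes that the pointwise bounds are the per-vertex minimum and maximum over the members of a cluster, and that the region of a mandatory minimum is the connected run of vertices around a minimum of the upper bound whose lower bound does not exceed that minimum's upper-bound value. The function name is hypothetical, and overlapping regions are not resolved.

```python
import numpy as np


def mandatory_minima_1d(members):
    """Simplified mandatory minima for 1D fields on a chain of vertices.

    members: (n_members, n_vertices) array of scalar values.
    Returns (lower, upper, regions), where each region is a tuple
    (first_vertex, last_vertex, interval_low, interval_high).
    """
    members = np.asarray(members, dtype=float)
    lower = members.min(axis=0)  # pointwise lower bound
    upper = members.max(axis=0)  # pointwise upper bound
    n = upper.size

    regions = []
    for m in range(n):
        left_ok = m == 0 or upper[m] < upper[m - 1]
        right_ok = m == n - 1 or upper[m] < upper[m + 1]
        if not (left_ok and right_ok):
            continue  # m is not a local minimum of the upper bound

        # Grow the sub-level set component of the lower bound around m.
        level = upper[m]
        first = m
        while first > 0 and lower[first - 1] <= level:
            first -= 1
        last = m
        while last < n - 1 and lower[last + 1] <= level:
            last += 1

        interval_low = float(lower[first:last + 1].min())
        regions.append((first, last, interval_low, float(level)))
    return lower, upper, regions
```

In this toy setting, every member has at least one minimum inside each returned region, with a value inside the associated interval.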
4.2 Global visualization
The mandatory critical points can be visualized for each cluster independently, by displaying each critical component with a colored region. Additionally, the positional variability of critical points within each region can be estimated and visualized as follows. Given a histogram representation of the data values taken by a vertex in , we estimate this variability as the probability that this vertex admits a scalar value within the critical interval of each mandatory critical point. Finally, we estimate the overall probability of appearance of a mandatory critical point as the ratio of the size of to the total number, , of members in the ensemble. As shown in Figure 2 (right), this probability can be visualized in the form of a barplot. The Persistence Atlas is then created from a collection of confidence maps (composed together) that provide, for each major trend found in the ensemble, confidence regions for the appearance of critical points along with their probability of appearance, as well as their individual critical point spatial variability given by the above estimation (Figure 2).
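The two estimates above can be sketched as follows. The function names are hypothetical, and the per-vertex estimate uses the empirical fraction of cluster members whose value falls in the critical interval, which is an assumed stand-in for the histogram representation mentioned in the text.

```python
import numpy as np


def appearance_probability(labels, cluster):
    """Proportion of ensemble members assigned to the given cluster."""
    labels = np.asarray(labels)
    return float((labels == cluster).sum()) / labels.size


def vertex_interval_probability(cluster_values, interval):
    """Per-vertex probability of taking a value in the critical interval.

    cluster_values: (n_members_in_cluster, n_vertices) array of values.
    interval:       (low, high) bounds of the critical interval.
    """
    values = np.asarray(cluster_values, dtype=float)
    low, high = interval
    inside = (values >= low) & (values <= high)
    return inside.mean(axis=0)  # fraction of members inside, per vertex
```

The first value is the quantity that would feed the barplot, and the second could be mapped to color inside each confidence region.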
This section presents experimental results obtained on a desktop computer with a Xeon CPU (2.6 GHz, 2x6 cores), with 64 GB of RAM. For the computation of the persistence diagrams, we used the Topology ToolKit (TTK) . For the spectral embedding and clustering, we adapted classes from the scikit-learn package . The other components of our approach have been implemented as TTK modules.
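Table 1: Running times for the datasets presented in this paper.
| Dataset | Members | Vertices | Persistence maps | Distance matrix | Embedding | Clustering | Mandatory critical points | Total |
|---|---|---|---|---|---|---|---|---|
| Gaussians (Figure 2) | 100 | 262,144 | 57.28 | 1.03 | 0.67 | 0.08 | 2.53 | 61.59 |
| Vortex street (Persistence Atlas for Critical Point Variability in Ensembles) | 45 | 30,000 | 2.28 | 0.02 | 0.67 | 0.09 | 0.22 | 3.28 |
| Starting vortex (Figure 9) | 12 | 1,500,000 | 61.44 | 0.09 | 0.65 | 0.07 | 9.08 | 71.33 |
| Isabel (Figure 10) | 12 | 3,125,000 | 168.70 | 0.18 | 0.63 | 0.07 | 41.84 | 211.68 |
| Sea Surface Height (Figure 11) | 48 | 1,036,800 | 290.25 | 0.99 | 0.65 | 0.08 | 8.38 | 300.35 |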
The opening figure (Persistence Atlas for Critical Point Variability in Ensembles) and Figures 6 to 11 report various experiments on simulated and acquired 2D and 3D ensemble datasets. The opening figure presents our entire approach on an ensemble of 45 von Kármán vortex streets, where the considered scalar data is the orthogonal component of the curl taken at a fixed time-step, for five different fluids of distinct viscosity (9 runs per fluid, each run with a different Reynolds number). For such scalar fields, local extrema are typically considered as reliable estimations of the center of the vortices. Extrema of a few representative members (panel (a)) exhibit clearly distinct layout patterns, in terms of both the position and number of vortices, revealing high spatial and trend variabilities within the ensemble. The mandatory critical points estimated for the entire ensemble are particularly conservative given these variabilities: only one region is extracted for each side of the street (one for minima, one for maxima). The persistence atlas manages to automatically identify five clusters in the ensemble, corresponding to distinct critical point layouts (one per viscosity regime). The mandatory critical points extracted from these clusters provide more accurate and useful predictions for the appearance of vortices (colored regions in (d) to (h), one color per cluster). In particular, the persistence atlas reveals that the number of vortices increases with the Reynolds number (from left to right: 6, 10, 12, 14 and 15 vortices) while the spatial variability of each vortex tends to decrease for increasing Reynolds numbers (smaller mandatory critical points). Figure 6 illustrates persistence maps for five representative members of the ensemble and shows how salient features are captured by this representation.
Figure 7 shows persistence maps on a volumetric ensemble composed of groups of key timesteps (formation, drift and landfall) in the simulation of the Isabel hurricane . For such datasets, the eyewall of the hurricane is typically characterized by high wind velocities (green regions, Figure 7, left) and contains salient maxima. In particular, this figure shows that subtle features of the hurricane (eyewall, high wind speed peripheral regions and hurricane’s tail) are well captured by local maxima of the wind velocity magnitude and by the corresponding persistence maps. As discussed in subsection 2.3, the norm between persistence maps is more suited to our purpose than the norm between the actual data values, since it is more robust to noise and global shifts in data values, while better discriminating changes in salient features (Figure 3). Figure 8 further exemplifies this observation on the Starting vortex ensemble, which includes 12 runs of a 2D simulation of the formation of a vortex behind a wing, for two distinct wing configurations. The considered scalar field is the curl orthogonal component and salient extrema are expected at the center of vortices. Given the small spatial extent of the features behind the wing, the norm between the actual data values fails at capturing similarities between members belonging to the second configuration, as denoted by the corresponding distance matrix, where distances are important in the upper-right corner. In particular, two members belonging to the same wing configuration are reported by this distance as the two furthest members (darkest green entry). In contrast, the distance matrix computed from persistence maps exhibits much smaller (resp. higher) distances between the members belonging to a common (resp. distinct) wing configuration.
Figure 9 shows the persistence atlas for the Starting vortex ensemble. Given the trend variability of this dataset, the mandatory critical points computed from the entire ensemble exhibit only one, very large, mandatory maximum (colored region) describing the appearance of vortices for both wing configurations, although these two vortices never occur simultaneously in the data. The persistence atlas automatically identifies the two trends present in the data, as shown in the planar view (center), resulting in much more accurate predictions for the appearance of the distinctly identified vortices (green and blue region, right). Figure 10 shows the persistence atlas for the Isabel ensemble. Similarly to the previous example, mandatory critical points computed from the entire ensemble identify only one, very large, mandatory maximum, which merges the three distinct states of the hurricane. In contrast, the persistence atlas manages to isolate these three states and provides much more accurate confidence regions for the position of the hurricane eyewall. Note that this example is the only dataset for which the initial automatic suggestion for the number of clusters provided by the eigengap heuristic needed adjustment. All the other results have been generated with the automatic suggestion. Figure 11 shows the persistence atlas for the Sea surface height ensemble, which is composed of 48 observations taken in January, April, July and October 2012 (https://ecco.jpl.nasa.gov/products/all/). For such datasets, salient extrema in the height are expected at the center of eddies. The mandatory critical points globally extracted on the entire ensemble identify only a few features (Figure 11(a)), due to the high pointwise data variability (subsection 4.1). The clustering automatically performed by our approach based on the persistence maps correctly identifies four clusters, corresponding to the four seasons: winter (c), spring (d), summer (e) and fall (f).
This seasonal decomposition drastically reduces pointwise data variability and enables mandatory critical points to identify many more structures, corresponding to clockwise and counterclockwise vortices (minima and maxima) and revealing complex structures in the Gulf stream area (insets). Note that, for this example, due to the high number of critical points and their respective proximity, the parameter , controlling the spread of salient features in the persistence map, has been set to instead of the default value ().
5.2 Time performance
Table 1 presents the running times we obtained for the datasets presented in this paper. The most time consuming portion of our approach is the computation of the persistence maps, which typically needs to be run for each ensemble member as a pre-process. Since the number of pairs in the diagram is typically proportional to the number of vertices in the domain, this part requires steps overall. In practice, to accelerate this computation, we ignore all pairs with a persistence less than of the total function range. The distance matrix computation takes steps, but since is typically much smaller than , the computation time for this step is small in practice. Both the spectral embedding (subsection 3.1) and clustering (subsection 3.2) employ iterative solvers but these computations are typically the fastest steps of the pipeline. The computation of the mandatory critical points for each cluster admits quadratic complexity .
Most of these steps can be trivially parallelized. The persistence diagram is computed in parallel  and the persistence map can be evaluated independently for each vertex. Each entry of the distance matrix (subsection 2.3) can be computed independently. Finally, mandatory critical points are computed in parallel for each cluster. As reported in Table 1, once the persistence maps have been computed in a pre-process, the rest of the framework is sufficiently fast to allow interactivity.
In this section, we compare our approach to alternative critical point clustering strategies. First, we consider a baseline approach, which consists in simply clustering persistent critical points in the spatial domain, by using a vanilla implementation of spectral clustering, combined with our eigengap heuristic for the automatic suggestion of the number of clusters (subsection 3.2). Once such clusters have been computed, this baseline strategy evaluates confidence regions for the appearance of critical points by considering the convex hull of each cluster in the spatial domain. As shown in Figure 12(a), this simple strategy provides unsatisfactory results for the von Kármán vortex street ensemble (Persistence Atlas for Critical Point Variability in Ensembles) since features which never occur simultaneously in the ensemble are clustered based on their proximity. In particular, the extracted clusters mix the two types of vortices (right and left) and group them based on their distance from the obstacle (bottom).
To further evaluate our approach, we consider the ensemble from Figure 2, which we split in half into a training and test ensemble. The training ensemble is analyzed with (i) the baseline approach (Figure 12(b)), (ii) a strategy based on the kernel method by Reininghaus et al.  (Figure 12(c), where the distance matrix considered for clustering has been generated with the authors’ implementation of the kernel method  run with default parameters) and (iii) the persistence atlas (Figure 12(e)). To quantitatively evaluate the prediction performance of these approaches, we consider the persistent critical points of the test ensemble having a persistence higher than 20% of the function range. Next, the test critical points are assigned to the confidence region in which they land in the domain (spheres of matching colors in Figure 12). As shown in Figure 12(b), the baseline approach overestimates the number of clusters. In particular, it fails at clustering together features which always occur simultaneously (dark green and light blue clusters in Figure 12(b)). Kernel based methods for persistence diagrams [70, 40] do not take the spatial embedding of critical points into account (Figure 12(c)) and cluster members with the same persistence profile, irrespective of the features’ location. This leads to an underestimated number of clusters: the blue and white clusters of Figure 2, which both include a single very persistent maximum, are erroneously merged although the corresponding features never occur simultaneously in the ensemble. Moreover, convex hulls obtained from this clustering overestimate the size of the confidence regions in the presence of multiple salient features per cluster. Even when the correct clustering is explicitly provided (Figure 12(d)), confidence regions based on convex hulls miss 21% of the persistent critical points of the test ensemble. 
In contrast, the persistence atlas (Figure 12(e)) provides a correct prediction for 100% of the critical points of the test ensemble, which illustrates the quantitative performance of the persistence atlas regarding critical point prediction.
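The prediction score used in this comparison can be sketched as follows for the convex hull regions of the baseline. This is a minimal 2D illustration under our own assumptions, not the code used for the experiments: the helper names `convex_hull`, `contains` and `hit_rate` are hypothetical, and the mandatory critical point regions of the persistence atlas would require a different containment test.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def contains(hull, q, eps=1e-12):
    """True if q is inside or on the boundary of a counter-clockwise hull."""
    if len(hull) < 3:
        return tuple(q) in hull  # degenerate region: exact match only
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], q) >= -eps for i in range(n))


def hit_rate(training_clusters, test_points):
    """Fraction of test critical points landing in at least one region."""
    hulls = [convex_hull(cluster) for cluster in training_clusters]
    hits = sum(any(contains(h, q) for h in hulls) for q in test_points)
    return hits / len(test_points)
```

With this convention, a region family that misses 21% of the test critical points has a hit rate of 0.79, while a hit rate of 1.0 corresponds to a correct prediction for every test critical point.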
Our entire pipeline assumes that the input data is given as a collection of piecewise linear scalar fields (subsection 1.1). In many applications, this may be too restrictive (motivating tailored interpolants for uncertainty modeling). However, generalizing the TDA arsenal to a larger set of interpolants is a vast research topic (see  for an example) which goes beyond the scope of this paper.
Our approach focuses on the persistence pairs that involve local extrema (minima and maxima). Persistence maps (Eq. 1) are therefore only computed based on the location of either the minima or the maxima (Figs. 2, 7 and 9), or both (Figs. 6 and 11). Hence, saddle points are not taken into account by our framework in its current form. However, we have found that, in practice, the correspondence between saddle points and features of interest was less clear in our applications.
Also, when the data exhibits salient large flat plateaus, persistent critical points can appear at arbitrary locations inside these plateaus, which can potentially impair the stability of the persistence map. However, we did not observe this behavior in practice on our datasets since large plateaus, when they occurred, were not collocated with salient features.
We found in practice that using constant weights for the evaluation of the Laplacian of the adjacency graph of the persistence maps (section 3) resulted in more stable and accurate clusterings than the second weighting strategy (based on heat kernels) described by Belkin and Niyogi. However, constant weights have the limitation that several members can be projected to the exact same point in the low-dimensional space when they exhibit a very similar neighborhood pattern in the adjacency graph. In this case, the number of visible points in our planar layouts may be smaller than the actual number of members. This non-uniqueness in the embedding only occurs for persistence maps which are very close to each other, hence it does not negatively impact the clustering or the analysis.
Although the automatic suggestion for the number of clusters provided satisfactory results for all but one example (where it needed to be changed, Figure 10), an exhaustive interactive exploration may be needed when there is no clear trend in the ensemble. Finally, the persistence atlas currently displays the mandatory critical points of all clusters simultaneously. This may result in cluttered visualizations due to overlapping. Although we provide users with the possibility of restricting this visualization to a selected subset of clusters, improved strategies for the overall visualization of the atlas could be researched in the future.
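The limitation of constant weights can be reproduced on a toy example. The sketch below is a minimal illustration under our own assumptions and not the implementation used in this paper: it builds an epsilon-neighborhood graph from a distance matrix, fills it with either constant or heat kernel weights (the two options of Laplacian eigenmaps), and embeds the members with the first non-trivial generalized eigenvectors. The names `graph_weights` and `laplacian_eigenmap`, the parameters `eps` and `t`, and the 1D toy positions standing in for distances between persistence maps are all hypothetical.

```python
import numpy as np


def graph_weights(dist, eps, mode="constant", t=1.0):
    """Weights of an epsilon-neighborhood graph (assumed construction)."""
    dist = np.asarray(dist, dtype=float)
    adjacent = (dist <= eps) & ~np.eye(len(dist), dtype=bool)
    if mode == "constant":
        return adjacent.astype(float)  # every edge weighs 1
    # Heat kernel weights: exp(-d^2 / t) on the edges, 0 elsewhere
    return np.where(adjacent, np.exp(-dist ** 2 / t), 0.0)


def laplacian_eigenmap(W, dim=2):
    """Embed nodes with the first non-trivial solutions of L y = lambda D y."""
    degree = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degree)
    L_sym = np.eye(len(W)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    _, vectors = np.linalg.eigh(L_sym)  # ascending eigenvalues
    # y = D^{-1/2} v maps symmetric eigenvectors to generalized ones;
    # column 0 is the trivial constant solution and is skipped.
    return inv_sqrt[:, None] * vectors[:, 1 : dim + 1]


# Toy members: the first two are close and share the same neighbors.
x = np.array([0.0, 0.3, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
dist = np.abs(x[:, None] - x[None, :])

emb_const = laplacian_eigenmap(graph_weights(dist, eps=1.1, mode="constant"))
emb_heat = laplacian_eigenmap(graph_weights(dist, eps=1.1, mode="heat"))

# Gap between the embedded positions of the first two members
print(np.abs(emb_const[0] - emb_const[1]).max())
print(np.abs(emb_heat[0] - emb_heat[1]).max())
```

In this toy graph, the first two members have identical neighborhoods, so constant weights make them indistinguishable and their embedded positions coincide up to floating-point error. Heat kernel weights break this symmetry and separate them, at the cost of an extra parameter `t`.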
In this paper, we presented the Persistence Atlas, an approach for the visual analysis of the spatial variability of features of interest represented by critical points in ensemble data. By analyzing the structure of the ensemble in terms of patterns of critical point layouts, our method addresses trend variability: it identifies clusters of ensemble members which share a common geometrical configuration of critical points. By computing mandatory critical points for each cluster, our approach addresses spatial variability: it shows minimal regions where at least one critical point is guaranteed to occur for each member of the cluster, hence conveying the possible extent of features for each trend. Our approach is based on the new notion of Persistence Map, which describes the local density of critical points and leverages topological persistence to emphasize salient features, and which has been shown to be well suited for comparing geometrical layouts of critical points. We showed how to leverage spectral embedding methods to provide low-dimensional views representing the main trends found in the ensemble. We also showed how to leverage spectral clustering to automatically identify relevant clusters of ensemble members and how to provide relevant automatic guesses, based on eigengaps, for the number of clusters. In practice, our approach has been shown to provide more accurate descriptions of the variability of critical points than global methods, such as the original mandatory critical points, which either miss features or considerably overestimate spatial variability in the presence of trend variability. We quantitatively evaluated the prediction accuracy of our method and showed that it compared favorably to a baseline strategy based on an off-the-shelf clustering approach.
Our work extends recent advances in the visual analysis of spatial variability in ensembles of geometrical objects, such as level sets or streamlines, to topological structures. In particular, we focused in this paper on features of interest represented by critical points. However, many more topological constructions could benefit from a similar variability analysis based on such tailored clustering strategies. For instance, the separatrices of the Morse-Smale complex [32, 71] have been shown to excel at representing filament structures in various applications, such as chemistry [28, 10] or astrophysics [79, 78], and studying their trend and spatial variabilities would be of tremendous help for the understanding of non-deterministic models in these applications. By first focusing on critical points, we believe we made a first step in this direction, which will be helpful and inspirational for future generalizations to other topological constructions.
Acknowledgements. This work is partially supported by the BPI grant “AVIDO” (PIA FSN2, reference P112017-2661376/DOS0021427), NSF CRII 1657020, and NSF/NIH QuBBD 1664848. We would like to thank the reviewers for their thoughtful remarks and suggestions. Julien Tierny would like to dedicate this paper to his son Otis.
-  ISO/IEC Guide 98-3:2008. Uncertainty of measurement, Part 3: Guide to the expression of uncertainty in measurement (GUM), 2008.
-  H. Abdi and L. Williams. Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2010.
-  H. Adams, T. Emerson, M. Kirby, R. Neville, C. Peterson, P. Shipman, S. Chepushtanova, E. Hanson, F. Motta, and L. Ziegelmeier. Persistence images: A stable vector representation of persistent homology. Journal of Machine Learning Research, 2017.
-  T. Athawale and A. Entezari. Uncertainty quantification in linear interpolation for isosurface extraction. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2013.
-  T. Athawale, E. Sakhaee, and A. Entezari. Isosurface visualization of data with nonparametric models for uncertainty. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2016.
-  T. F. Banchoff. Critical points and curvature for embedded polyhedral surfaces. The American Mathematical Monthly, 1970.
-  U. Bauer, X. Ge, and Y. Wang. Measuring distance between Reeb graphs. In Symp. on Comp. Geom., 2014.
-  K. Beketayev, D. Yeliussizov, D. Morozov, G. H. Weber, and B. Hamann. Measuring the distance between merge trees. In Topological Methods in Data Analysis and Visualization III, Theory, Algorithms, and Applications. 2014.
-  M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 2003.
-  H. Bhatia, A. G. Gyulassy, V. Lordi, J. E. Pask, V. Pascucci, and P.-T. Bremer. TopoMS: Comprehensive topological exploration for molecular and condensed-matter systems. Journal of Computational Chemistry, 2018.
-  H. Bhatia, S. Jadhav, P. Bremer, G. Chen, J. Levine, L. Nonato, and V. Pascucci. Flow visualization with quantified spatial and temporal errors using edge maps. IEEE Transactions on Visualization and Computer Graphics, 18(9):1383–1396, 2012.
-  G.-P. Bonneau, H.-C. Hege, C. Johnson, M. M. Oliveira, K. Potter, and P. Rheingans. Overview and State-of-the-Art of Uncertainty Visualization. In Scientific Visualization: Uncertainty, Multifield, Biomedical, Scalable, Mathematics and Visualization. Springer, 2014.
-  I. Borg and P. Groenen. Modern multidimensional scaling: Theory and Applications. Springer, 2005.
-  M. B. Botnan and H. B. Bjerkevik. Computational complexity of the interleaving distance. In Symp. on Comp. Geom., 2018.
-  P. Bremer, G. Weber, J. Tierny, V. Pascucci, M. Day, and J. Bell. Interactive exploration and analysis of large scale simulations using topology-based data segmentation. IEEE Transactions on Visualization and Computer Graphics, 2011.
-  S. Bruckner and T. Möller. Isosurface similarity maps. Computer Graphics Forum (Proc. of EuroVis), 2010.
-  M. Carrière, M. Cuturi, and S. Oudot. Sliced Wasserstein kernel for persistence diagrams. In ICML, 2017.
-  F. Chazal, D. Cohen-Steiner, M. Glisse, L. J. Guibas, and S. Oudot. Proximity of persistence modules and their diagrams. In Symp. on Comp. Geom., 2009.
-  F. Chazal, L. Guibas, S. Oudot, and P. Skraba. Persistence-based clustering in Riemannian manifolds. Journal of the ACM, 2013.
-  D. Cohen-Steiner, H. Edelsbrunner, and J. Harer. Stability of persistence diagrams. In Symp. on Comp. Geom., 2005.
-  P. Diggle, P. Heagerty, K.-Y. Liang, and S. Zeger. Analysis of longitudinal data. Oxford University Press, 2002.
-  H. Edelsbrunner and J. Harer. Computational Topology: An Introduction. American Mathematical Society, 2009.
-  H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological persistence and simplification. Disc. Compu. Geom., 2002.
-  H. Edelsbrunner and E. P. Mucke. Simulation of simplicity: a technique to cope with degenerate cases in geometric algorithms. ACM Trans. on Graph., 1990.
-  G. Favelier, C. Gueunet, and J. Tierny. Visualizing ensembles of viscous fingers. In IEEE SciVis Contest, 2016.
-  F. Ferstl, K. Bürger, and R. Westermann. Streamline variability plots for characterizing the uncertainty in vector field ensembles. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2016.
-  F. Ferstl, M. Kanzler, M. Rautenhaus, and R. Westermann. Visual analysis of spatial variability and global correlations in ensembles of iso-contours. Computer Graphics Forum (Proc. of EuroVis), 2016.
-  D. Guenther, R. Alvarez-Boto, J. Contreras-Garcia, J.-P. Piquemal, and J. Tierny. Characterizing molecular interactions in chemical systems. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2014.
-  D. Guenther, J. Salmon, and J. Tierny. Mandatory critical points of 2D uncertain scalar fields. Computer Graphics Forum (Proc. of EuroVis), 2014.
-  C. Gueunet, P. Fortin, J. Jomier, and J. Tierny. Task-based Augmented Merge Trees with Fibonacci Heaps. In IEEE LDAV, 2017.
-  A. Gyulassy, P. Bremer, R. Grout, H. Kolla, J. Chen, and V. Pascucci. Stability of dissipation elements: A case study in combustion. Computer Graphics Forum (Proc. of EuroVis), 2014.
-  A. Gyulassy, P. T. Bremer, B. Hamann, and V. Pascucci. A practical approach to Morse-Smale complex computation: Scalability and generality. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2008.
-  A. Gyulassy, M. A. Duchaineau, V. Natarajan, V. Pascucci, E. Bringa, A. Higginbotham, and B. Hamann. Topologically clean distance fields. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2007.
-  A. Gyulassy, A. Knoll, K. Lau, B. Wang, P. Bremer, M. Papka, L. A. Curtiss, and V. Pascucci. Interstitial and interlayer ion diffusion geometry extraction in graphitic nanosphere battery materials. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2015.
-  C. Heine, H. Leitte, M. Hlawitschka, F. Iuricich, L. De Floriani, G. Scheuermann, H. Hagen, and C. Garth. A survey of topology-based methods in visualization. Comp. Grap. For., 2016.
-  M. Hilaga, Y. Shinagawa, T. Kohmura, and T. L. Kunii. Topology matching for fully automatic similarity estimation of 3D shapes. In Proc. of ACM SIGGRAPH, 2001.
-  M. Hummel, H. Obermaier, C. Garth, and K. I. Joy. Comparative visual analysis of lagrangian transport in CFD ensembles. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2013.
-  C. Johnson and A. Sanderson. A next step: Visualizing errors and uncertainty. IEEE Computer Graphics and Applications, 2003.
-  J. Kasten, J. Reininghaus, I. Hotz, and H. Hege. Two-dimensional time-dependent vortex regions based on the acceleration magnitude. IEEE Transactions on Visualization and Computer Graphics, 2011.
-  R. Kwitt. Persistence learning. https://github.com/rkwitt/persistence-learning, 2015.
-  D. E. Laney, P. Bremer, A. Mascarenhas, P. Miller, and V. Pascucci. Understanding the structure of the turbulent mixing layer in hydrodynamic instabilities. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2006.
-  T. Liebmann and G. Scheuermann. Critical points of Gaussian-distributed scalar fields on simplicial grids. Computer Graphics Forum (Proc. of EuroVis), 2016.
-  S. P. Lloyd. Least squares quantization in PCM. Technical report, Bell Telephone Laboratories, 1957.
-  A. M. MacEachren, A. Robinson, S. Hopper, S. Gardner, R. Murray, M. Gahegan, and E. Hetzler. Visualizing geospatial information uncertainty: What we know and what we need to know. Cartography and Geographic Information Science, 32(3):139–160, 2005.
-  J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In Proc. Symposium on Mathematical Statistics and Probability, 1967.
-  J. Milnor. Morse Theory. Princeton U. Press, 1963.
-  M. Mirzargar, R. T. Whitaker, and R. M. Kirby. Curve boxplot: Generalization of boxplot for ensembles of curves. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2014.
-  J. Munkres. Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics, 1957.
-  G. Nucha, G. Bonneau, S. Hahmann, and V. Natarajan. Computing contour trees for 2d piecewise polynomial functions. Computer Graphics Forum (Proc. of EuroVis), 2017.
-  S. Oeltze, D. J. Lehmann, A. Kuhn, G. Janiga, H. Theisel, and B. Preim. Blood flow clustering and applications in virtual stenting of intracranial aneurysms. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2014.
-  M. Otto, T. Germer, H.-C. Hege, and H. Theisel. Uncertain 2D vector field topology. Comp. Graph. For., 29:347–356, 2010.
-  M. Otto, T. Germer, and H. Theisel. Uncertain topology of 3d vector fields. In Proc. of IEEE PacificVis, 2011.
-  A. T. Pang, C. M. Wittenbrink, and S. K. Lodha. Approaches to uncertainty visualization. The Visual Computer, 13(8):370–390, 1997.
-  F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. VanderPlas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 2011.
-  D. Pelleg and A. W. Moore. X-means: Extending k-means with efficient estimation of the number of clusters. In Proc. of ICML, 2000.
-  C. Petz, K. Pöthkow, and H.-C. Hege. Probabilistic local features in uncertain vector fields with spatial correlation. Computer Graphics Forum (Proc. of EuroVis), 31(3pt2):1045–1054, 2012.
-  T. Pfaffelmoser, M. Mihai, and R. Westermann. Visualizing the variability of gradients in uncertain 2d scalar fields. IEEE Transactions on Visualization and Computer Graphics, 2013.
-  T. Pfaffelmoser, M. Reitinger, and R. Westermann. Visualizing the positional and geometrical variability of isosurfaces in uncertain scalar fields. Computer Graphics Forum (Proc. of EuroVis), 30:951–960, 2011.
-  T. Pfaffelmoser and R. Westermann. Visualization of global correlation structures in uncertain 2d scalar fields. Computer Graphics Forum (Proc. of EuroVis), 2012.
-  T. Pfaffelmoser and R. Westermann. Visualizing contour distributions in 2d ensemble data. In EuroVis-Short Papers, pp. 55–59. The Eurographics Association, 2013.
-  J. M. Phillips, B. Wang, and Y. Zheng. Geometric inference on kernel density estimates. In Symp. on Comp. Geom., 2015.
-  K. Pöthkow and H.-C. Hege. Positional uncertainty of isocontours: Condition analysis and probabilistic measures. IEEE Transactions on Visualization and Computer Graphics, 17(10):1393–1406, 2011.
-  K. Pöthkow and H.-C. Hege. Nonparametric models for uncertainty visualization. Computer Graphics Forum (Proc. of EuroVis), 32:131–140, 2013.
-  K. Pöthkow, C. Petz, and H.-C. Hege. Approximate level-crossing probabilities for interactive visualization of uncertain isocontours. Int. J. Uncert. Quantif., 3:101–117, 2013.
-  K. Pöthkow, B. Weber, and H. Hege. Probabilistic marching cubes. Computer Graphics Forum (Proc. of EuroVis), 2011.
-  K. Potter, S. Gerber, and E. Anderson. Visualization of uncertainty without a mean. IEEE CGA, 33:75–79, 2013.
-  K. Potter, P. Rosen, and C. R. Johnson. From quantification to visualization: A taxonomy of uncertainty visualization approaches. In Uncertainty Quantification in Scientific Computing, vol. 377, pp. 226–249. Springer, 2012.
-  K. Potter, A. T. Wilson, P. Bremer, D. N. Williams, C. M. Doutriaux, V. Pascucci, and C. R. Johnson. Ensemble-vis: A framework for the statistical visualization of ensemble data. In IEEE International Conference on Data Mining Workshops, 2009.
-  P. S. Quinan and M. D. Meyer. Visually comparing weather features in forecasts. IEEE Transactions on Visualization and Computer Graphics, 2016.
-  J. Reininghaus, S. Huber, U. Bauer, and R. Kwitt. A stable multi-scale kernel for topological machine learning. In IEEE CVPR, 2015.
-  V. Robins, P. Wood, and A. Sheppard. Theory and algorithms for constructing discrete Morse complexes from grayscale digital images. IEEE Trans. on Pat. Ana. and Mach. Int., 2011.
-  H. Saikia, H. Seidel, and T. Weinkauf. Extended branch decomposition graphs: Structural comparison of scalar data. Computer Graphics Forum (Proc. of EuroVis), 2014.
-  J. Sanyal, S. Zhang, J. Dyer, A. Mercer, P. Amburn, and R. J. Moorhead. Noodles: A tool for visualization of numerical weather model ensemble uncertainty. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2010.
-  G. Scheuermann, X. Tricoche, and H. Hagen. C1-interpolation for vector field topology visualization. In IEEE VIS, 1999.
-  S. Schlegel, N. Korn, and G. Scheuermann. On the interpolation of data with normally distributed uncertainty for visualization. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2012.
-  I. SciVisContest. Simulation of the Isabel hurricane. http://sciviscontest-staging.ieeevis.org/2004/data.html.
-  J. Shi and J. Malik. Normalized cuts and image segmentation. In Proc. of IEEE CVPR, 1997.
-  N. Shivashankar, P. Pranav, V. Natarajan, R. van de Weygaert, E. P. Bos, and S. Rieder. Felix: A topology based framework for visual exploration of cosmic filaments. IEEE Transactions on Visualization and Computer Graphics, 2016. http://vgl.serc.iisc.ernet.in/felix/index.html.
-  T. Sousbie. The persistent cosmic web and its filamentary structure: Theory and implementations. Royal Astronomical Society, 2011. http://www2.iap.fr/users/sousbie/web/html/indexd41d.html.
-  A. Szymczak. Hierarchy of stable Morse decompositions. IEEE Transactions on Visualization and Computer Graphics, 19(5):799–810, 2013.
-  J. B. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 2000.
-  D. M. Thomas and V. Natarajan. Multiscale symmetry detection in scalar fields by clustering contours. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2014.
-  J. Tierny, G. Favelier, J. A. Levine, C. Gueunet, and M. Michaux. The Topology ToolKit. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2017. https://topology-tool-kit.github.io/.
-  K. Turner, Y. Mileyko, S. Mukherjee, and J. Harer. Fréchet means for distributions of persistence diagrams. Discrete & Computational Geometry, 2014.
-  U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 2007.
-  R. T. Whitaker, M. Mirzargar, and R. M. Kirby. Contour boxplots: A method for characterizing uncertainty in feature sets from simulation ensembles. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2013.
-  F. Wickelmaier. An introduction to MDS. Technical report, Aalborg University, 2003.