Progressive Wasserstein Barycenters of Persistence Diagrams

07/10/2019 ∙ by Jules Vidal, et al.

This paper presents an efficient algorithm for the progressive approximation of Wasserstein barycenters of persistence diagrams, with applications to the visual analysis of ensemble data. Given a set of scalar fields, our approach enables the computation of a persistence diagram which is representative of the set, and which visually conveys the number, data ranges and saliences of the main features of interest found in the set. Such representative diagrams are obtained by explicitly computing the discrete Wasserstein barycenter of the set of persistence diagrams, a notoriously computationally intensive task. In particular, we revisit efficient algorithms for Wasserstein distance approximation [12,51] to extend previous work on barycenter estimation [94]. We present a new fast algorithm, which progressively approximates the barycenter by iteratively increasing the computation accuracy as well as the number of persistent features in the output diagram. Such progressivity drastically improves convergence in practice and allows us to design an interruptible algorithm, capable of respecting computation time constraints. This enables the approximation of Wasserstein barycenters within interactive times. We present an application to ensemble clustering, where we revisit the k-means algorithm to exploit our barycenters and compute, within execution time constraints, meaningful clusters of ensemble data along with their barycenter diagram. Extensive experiments on synthetic and real-life data sets report that our algorithm converges to barycenters that are qualitatively meaningful with regard to the applications, and quantitatively comparable to previous techniques, while offering an order of magnitude speedup when run until convergence (without time constraint). Our algorithm can be trivially parallelized to provide additional speedups in practice on standard workstations. [...]


1 Preliminaries

This section presents the theoretical background of our approach. It contains definitions adapted from the Topology ToolKit [92]. It also provides, for self-completeness, concise descriptions of the key algorithms [94, 51] that our work extends. We refer the reader to Edelsbrunner and Harer [27] for an introduction to computational topology.

Figure 3: Illustration of the auction algorithm [12] on a 2D example for the optimal assignment of two point sets (light and dark green). Boxes and columns represent auction rounds and auction iterations respectively. Each matrix reports the value $v(a, b)$ currently estimated by each bidder $a$ for the purchase of each object $b$. Assignments are shown with black arrows. After the first round (left box), a sub-optimal assignment is achieved, which becomes optimal at the second round (right box). Note that bidders can steal objects from each other within one auction round (second round).

1.1 Persistence diagrams

The input data is an ensemble of piecewise linear (PL) scalar fields $f : M \to \mathbb{R}$ defined on a PL $d$-manifold $M$, with $d = 2$ or $3$ in our applications. We note $f_{-\infty}(w) = f^{-1}\big((-\infty, w]\big)$ the sub-level set of $f$, namely the pre-image of $(-\infty, w]$ by $f$. When continuously increasing $w$, the topology of $f_{-\infty}(w)$ can only change at specific locations, called the critical points of $f$. In practice, $f$ is enforced to be injective on the vertices of $M$ [30], which are in finite number. This guarantees that the critical points of $f$ are isolated and also in finite number. Moreover, they are also enforced to be non-degenerate, which can be easily achieved with local re-meshing [28]. Critical points are classified according to their index $I$: 0 for minima, 1 for 1-saddles, $d-1$ for $(d-1)$-saddles, and $d$ for maxima. Each topological feature of $f_{-\infty}(w)$ (i.e. connected component, independent cycle, void) can be associated with a unique pair of critical points $(c, c')$, corresponding to its birth and death. Specifically, the Elder rule [27] states that critical points can be arranged according to this observation in a set of pairs, such that each critical point appears in only one pair $(c, c')$, with $f(c) < f(c')$ and $I(c) = I(c') - 1$. Intuitively, this rule implies that if two topological features of $f_{-\infty}(w)$ (e.g. two connected components) meet at a given critical point $c'$, the youngest feature (i.e. created last) dies, favoring the oldest one (i.e. created first). Critical point pairs can be visually represented by the persistence diagram, noted $D(f)$, which embeds each pair to a single point in the 2D plane at coordinates $(f(c), f(c'))$, which respectively correspond to the birth and death of the associated topological feature. The persistence of a pair, noted $P(c, c')$, is then given by its height: $P(c, c') = f(c') - f(c)$. It describes the lifetime in the range of the corresponding topological feature. The construction of the persistence diagram is often restricted to a specific type of pairs, such as $(0, 1)$ pairs or $(d-1, d)$ pairs, respectively capturing features represented by minima or maxima. In the following, we will consider that each persistence diagram is only composed of pairs of a fixed critical type, which will be systematically detailed in the experiments. Note that in practice, each critical point in the diagram additionally stores its 3D coordinates in $M$ to allow geometrical lifting (Sec. 1.2).
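For concreteness, the following minimal C++ sketch shows one possible in-memory representation of such diagrams, as assumed by the code sketches in the remainder of the paper; the type and field names are illustrative and do not correspond to an actual TTK implementation.

```cpp
#include <array>
#include <vector>

// One persistence pair: a point (birth, death) in the 2D birth/death plane.
// The 3D location of the corresponding extremum is kept for the geometrical
// lifting of Sec. 1.2 (field names hypothetical).
struct PersistencePair {
  double birth{}, death{};          // (f(c), f(c')) coordinates in the diagram
  std::array<double, 3> location{}; // critical point coordinates in M
  double persistence() const { return death - birth; }
};

using Diagram = std::vector<PersistencePair>;
```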

In practice, critical point pairs often directly correspond to features of interest in the applications, and their persistence has been shown to be a reliable importance measure to discriminate noise from important features. In the diagram, the pairs located in the vicinity of the diagonal represent low-amplitude noise, while salient features are associated with persistent pairs, standing out far away from the diagonal (Fig. 1). For instance, in Fig. 1, the two hills are captured by two large pairs, while noise is encoded with smaller pairs near the diagonal. Thus, the persistence diagram has been shown in practice to be a useful, concise visual representation of the population of features of interest in the data, for which the number, range values and salience can be directly and visually read. In addition to its stability properties [22], these observations motivate its usage for data summarization, where it significantly helps users distinguish salient features from noise. In the following, we review distances between persistence diagrams [22, 27] (Sec. 1.2), efficient algorithms for their approximation [12, 51] (Sec. 1.3), as well as the reference approach for barycenter estimation [94] (Sec. 1.4).

1.2 Distances between Persistence diagrams

To compute the barycenter of a set of persistence diagrams, a necessary ingredient is the notion of distance between them. Given two diagrams $D_1$ and $D_2$, a pointwise distance, noted $d_q$, inspired from the $L^q$ norm, can be introduced in the 2D birth/death space between two points $a = (x_a, y_a) \in D_1$ and $b = (x_b, y_b) \in D_2$, with $q > 0$, as follows:

$d_q(a, b) = \left( |x_b - x_a|^q + |y_b - y_a|^q \right)^{1/q} = \|a - b\|_q$   (1)

By convention, $d_q(a, b)$ is set to zero if both $a$ and $b$ exactly lie on the diagonal ($x_a = y_a$ and $x_b = y_b$). The $q$-Wasserstein distance [49, 62], noted $W_q$, between $D_1$ and $D_2$ can then be introduced as:

$W_q(D_1, D_2) = \min_{\phi \in \Phi} \left( \sum_{a \in D_1} d_q\big(a, \phi(a)\big)^q \right)^{1/q}$   (2)

where $\Phi$ is the set of all possible assignments $\phi$ mapping each point $a \in D_1$ to a point $b \in D_2$, or to its projection onto the diagonal, $\Delta(a) = \big( \frac{x_a + y_a}{2}, \frac{x_a + y_a}{2} \big)$, which denotes the removal of the corresponding feature from the assignment, with a cost $d_q\big(a, \Delta(a)\big)$ (see additional materials for an illustration). The Wasserstein distance can be computed by solving an optimal assignment problem, for which existing algorithms [64, 63] however often require a balanced setting. To address this, the input diagrams $D_1$ and $D_2$ are typically augmented into $D_1'$ and $D_2'$, which are obtained by injecting the diagonal projections of all the points of one diagram into the other:

$D_1' = D_1 \cup \{ \Delta(b) \mid b \in D_2 \}$   (3)
$D_2' = D_2 \cup \{ \Delta(a) \mid a \in D_1 \}$   (4)

In this way, the Wasserstein distance is guaranteed to be preserved by construction, $W_q(D_1, D_2) = W_q(D_1', D_2')$, while the assignment problem becomes balanced ($|D_1'| = |D_2'|$) and thus solvable with traditional assignment algorithms. When $q \to \infty$, $W_q$ becomes a worst-case assignment distance called the Bottleneck distance [22], which is often interpreted in practice as less informative than the Wasserstein distance. In the following, we will focus on $W_2$.
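For illustration purposes, the following self-contained C++ sketch implements the pointwise distance of Eq. 1 (for $q = 2$), the diagonal projection $\Delta$, and the augmentation of Eqs. 3 and 4; the type and function names are ours, not those of an actual implementation.

```cpp
#include <cmath>
#include <vector>

struct Point2D { double birth, death; };
using Diagram = std::vector<Point2D>;

// Pointwise distance d_q between two points of the birth/death plane (Eq. 1),
// for q = 2.
double d2(const Point2D &a, const Point2D &b) {
  const double db = b.birth - a.birth, dd = b.death - a.death;
  return std::sqrt(db * db + dd * dd);
}

// Projection of a point onto the diagonal (removal of the feature).
Point2D diagonalProjection(const Point2D &a) {
  const double m = 0.5 * (a.birth + a.death);
  return {m, m};
}

// Augmentation of Eqs. 3 and 4: inject the diagonal projections of the points
// of one diagram into the other, making the assignment problem balanced.
void augment(Diagram &d1, Diagram &d2) {
  const size_t n1 = d1.size(), n2 = d2.size();
  for (size_t i = 0; i < n2; ++i) d1.push_back(diagonalProjection(d2[i]));
  for (size_t i = 0; i < n1; ++i) d2.push_back(diagonalProjection(d1[i]));
}
```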

In the applications, it can often be useful to geometrically lift the Wasserstein metric by also taking into account the geometrical layout of critical points [86]. Let $(c, c') \in M \times M$ be the critical point pair corresponding to a point $a \in D$. Let $p_a$ be the linear combination of the two critical points, with coefficient $\lambda \in [0, 1]$: $p_a = \lambda\, c + (1 - \lambda)\, c'$. Our experiments (Sec. 5) only deal with extrema, and we choose $\lambda$ such that $p_a$ coincides with the location of the extremum. Then, the geometrically lifted pointwise distance can be given as:

$d_q^{\alpha}(a, b) = \Big( (1 - \alpha)\, d_q(a, b)^q + \alpha\, \| p_a - p_b \|_q^q \Big)^{1/q}$   (5)

The parameter $\alpha \in [0, 1]$ quantifies the importance that is given to the geometry of critical points, and it must be tuned on a per-application basis. When $\alpha = 1$, $d_q^{\alpha}$ becomes the classical Earth Mover's distance [DBLP:conf/iccv/LevinaB01] between the critical points in $M$; Eq. 5 thus enables users to simply linearly blend these two metrics.
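A minimal sketch of this lifted distance, under the blend formulation reconstructed in Eq. 5 and for $q = 2$, could read as follows; the blend itself is an assumption consistent with the surrounding text, not necessarily the paper's exact formula.

```cpp
#include <array>
#include <cmath>

// Geometrically lifted distance (in the spirit of Eq. 5, q = 2): a blend,
// controlled by alpha in [0, 1], of the squared birth/death distance and the
// squared distance between the lifted critical point locations p_a and p_b.
double liftedDistance(double birthDeathDistance, // d_2(a, b), Eq. 1
                      const std::array<double, 3> &pa,
                      const std::array<double, 3> &pb, double alpha) {
  double squaredGeometricDistance = 0.0;
  for (int i = 0; i < 3; ++i)
    squaredGeometricDistance += (pb[i] - pa[i]) * (pb[i] - pa[i]);
  // alpha = 0: original Wasserstein metric; alpha = 1: Earth Mover's distance
  // between the critical point locations in M.
  return std::sqrt((1.0 - alpha) * birthDeathDistance * birthDeathDistance
                   + alpha * squaredGeometricDistance);
}
```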

1.3 Efficient distance computation by Auction

This section briefly describes a fast approximation of Wasserstein distances by the auction algorithm [12, 51], which is central to our method.

The assignment problem involved in Eq. 2 can be modeled in the form of a weighted bipartite graph, where the points $a \in D_1'$ are represented as nodes, connected by edges to nodes representing the points $b \in D_2'$, with an edge weight given by $d_q(a, b)^q$ (Eq. 1). To efficiently estimate the optimal assignment, Bertsekas introduced the auction algorithm [12] (Fig. 3), which replicates the behavior of a real-life auction: the points of $D_1'$ act as bidders that iteratively make offers for the purchase of the points of $D_2'$, known as the objects. Each bidder $a$ makes a benefit $\beta(a, b) = -d_q(a, b)^q$ for the purchase of an object $b$, which is itself labeled with a price $p_b$, initially set to 0. During the iterations of the auction, each bidder $a$ tries to purchase the object $b$ of highest value $v(a, b) = \beta(a, b) - p_b$. The bidder $a$ is then said to be assigned to the object $b$. If $b$ was previously assigned, its previous owner becomes unassigned. At this stage, the price of $b$ is increased by $\delta_v + \epsilon$, where $\delta_v$ is the absolute difference between the two highest values that the bidder $a$ found among the objects, and where $\epsilon$ is a positive constant. This bidding procedure is repeated iteratively among the bidders, until all bidders are assigned (which is guaranteed to occur by construction, thanks to the $\epsilon$ constant). At this point, an auction round has completed: a bijective, possibly sub-optimal, assignment exists between $D_1'$ and $D_2'$. The overall algorithm repeats auction rounds, which progressively increases prices under the effect of competing bidders.
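To make the bidding procedure concrete, the following self-contained C++ sketch implements one such auction round for $q = 2$, without the kd-tree and lazy-heap accelerations of [51]; all names are illustrative.

```cpp
#include <cmath>
#include <limits>
#include <queue>
#include <vector>

struct Point2D { double birth, death; };

// Benefit of bidder a for object b: minus the assignment cost (squared
// distance of Eq. 1, i.e. q = 2).
double benefit(const Point2D &a, const Point2D &b) {
  return -((b.birth - a.birth) * (b.birth - a.birth)
           + (b.death - a.death) * (b.death - a.death));
}

// One auction round (simplified): bidding is repeated until every bidder owns
// an object. owner[j] is the bidder currently assigned to object j (-1 if
// none); assignment[i] is the object owned by bidder i (-1 if none).
void auctionRound(const std::vector<Point2D> &bidders,
                  const std::vector<Point2D> &objects,
                  std::vector<double> &prices, std::vector<int> &owner,
                  std::vector<int> &assignment, double epsilon) {
  std::queue<int> unassigned;
  for (int i = 0; i < (int)bidders.size(); ++i)
    if (assignment[i] < 0) unassigned.push(i);

  while (!unassigned.empty()) {
    const int i = unassigned.front();
    unassigned.pop();
    // Find the two most valuable objects for bidder i.
    int best = -1;
    double bestValue = -std::numeric_limits<double>::infinity();
    double secondValue = bestValue;
    for (int j = 0; j < (int)objects.size(); ++j) {
      const double value = benefit(bidders[i], objects[j]) - prices[j];
      if (value > bestValue) {
        secondValue = bestValue;
        bestValue = value;
        best = j;
      } else if (value > secondValue) {
        secondValue = value;
      }
    }
    if (!std::isfinite(secondValue)) secondValue = bestValue; // single object
    // Raise the price by the gap between the two highest values, plus epsilon
    // (which guarantees progress and termination).
    prices[best] += (bestValue - secondValue) + epsilon;
    // The previous owner, if any, is evicted: objects can be "stolen" within
    // one auction round (Fig. 3).
    if (owner[best] >= 0) {
      assignment[owner[best]] = -1;
      unassigned.push(owner[best]);
    }
    owner[best] = i;
    assignment[i] = best;
  }
}
```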

The constant $\epsilon$ plays a central role in the auction algorithm. Let $\widehat{W}_q(D_1, D_2)$ be the approximation of the Wasserstein distance $W_q(D_1, D_2)$ obtained with the assignment returned by the algorithm. Large values of $\epsilon$ will drastically accelerate convergence (as they imply fewer iterations for the construction of a bijective assignment within one auction round, Fig. 3), while low values will improve the accuracy of $\widehat{W}_q(D_1, D_2)$. This observation is a key insight at the basis of our approach. Bertsekas suggests a strategy called $\epsilon$-scaling, which decreases $\epsilon$ after each auction round. In particular, if:

$\epsilon \le \dfrac{\delta \cdot \widehat{W}_q(D_1, D_2)^q}{q \cdot \min(|D_1'|, |D_2'|)}$   (6)

then it can be shown that [13, 51]:

$W_q(D_1, D_2) \le \widehat{W}_q(D_1, D_2) \le (1 + \delta) \cdot W_q(D_1, D_2)$   (7)

This result is particularly important, as it enables the estimation of the optimal assignment, and thus of the Wasserstein distance, with an on-demand accuracy (controlled by the parameter $\delta$), by using Eq. 6 as a stopping condition for the overall auction algorithm. For persistence diagrams, Kerber et al. showed how the computation could be accelerated by using space partitioning data structures such as kd-trees [51]. In practice, $\epsilon$ is initially set to a fixed fraction of the largest edge weight $d_q(a, b)^q$ and is divided by a constant factor after each auction round, as recommended by Bertsekas [12], and $\delta$ is set as suggested by Kerber et al. [51].
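The following standalone sketch illustrates this $\epsilon$-scaling schedule; all numerical values (initial fraction, division factor, accuracy $\delta$, distance estimate, diagram size) are hypothetical placeholders, and the stopping test only paraphrases Eq. 6.

```cpp
#include <cstdio>

// Standalone illustration of the epsilon-scaling schedule (Sec. 1.3):
// epsilon starts at a fraction of the largest edge weight and is divided by a
// constant factor after each auction round, until it is small enough,
// relative to the current distance estimate, to guarantee the relative
// accuracy delta (in the spirit of Eq. 6).
int main() {
  const double largestEdgeWeight = 10.0; // max d_q(a, b)^q over all edges
  const double delta = 0.01;             // on-demand relative accuracy
  const double currentEstimate = 2.5;    // current value of W_q^q (assumed)
  const int numberOfBidders = 100;       // size of the augmented diagram

  double epsilon = largestEdgeWeight / 4.0; // assumed initial fraction
  int round = 0;
  while (epsilon > delta * currentEstimate / numberOfBidders) {
    std::printf("auction round %d: epsilon = %g\n", ++round, epsilon);
    epsilon /= 5.0; // epsilon-scaling (assumed division factor)
  }
  return 0;
}
```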

Input: Set of diagrams $S = \{D_1, \dots, D_N\}$

Output: Wasserstein barycenter $B$

1:  $B \leftarrow D_i$  // with $i$ randomly chosen in $\{1, \dots, N\}$
2:  while the assignments $(\phi_1, \dots, \phi_N)$ change do
3:     // Relaxation start
4:     for $i \in \{1, \dots, N\}$ do
5:        $\phi_i \leftarrow$ optimal assignment between $B$ and $D_i$  // optimizing Eq. 2
6:     end for
7:     $B \leftarrow$ arithmetic means of the assigned points, in the birth/death space
8:     // Relaxation end
9:  end while
10: return $B$

Algorithm 1: Reference algorithm for Wasserstein barycenters [94].

1.4 Wasserstein barycenters of Persistence diagrams

Let $\mathbb{D}$ be the space of persistence diagrams. The discrete Wasserstein barycenter of a set $S = \{D_1, \dots, D_N\} \subset \mathbb{D}$ of persistence diagrams can be introduced as the Fréchet mean of the set, under the metric $W_2$. It is the diagram $B^*$ that minimizes its distance to all the diagrams of the set (i.e. the minimizer of the so-called Fréchet energy):

$B^* = \arg\min_{B \in \mathbb{D}} F(B), \quad \text{with} \quad F(B) = \sum_{i=1}^{N} W_2(B, D_i)^2$   (8)

The computation of Wasserstein barycenters involves a computationally demanding optimization problem, for which the existence of at least one locally optimal solution has been shown by Turner et al. [94], who also introduced the first algorithm for its computation. This algorithm (Alg. 1) consists in iterating a procedure that we call Relaxation (lines 3 to 8), which resembles a Lloyd relaxation [56], and which is itself composed of two sub-routines: (i) Assignment (line 5) and (ii) Update (line 7). Given an initial barycenter candidate $B$, randomly chosen among the set $S$, the first step ((i) Assignment) consists in computing an optimal assignment $\phi_i$ between $B$ and each diagram $D_i$ of the set $S$, with regard to Eq. 2. The second step ((ii) Update) consists in updating the candidate $B$ to a position which minimizes the sum of its squared distances to the diagrams of $S$ under the current set of assignments $\{\phi_i\}$. In practice, this last step is achieved by simply replacing each point of $B$ by the arithmetic mean (in the birth/death space) of all its assigned points. The overall algorithm continues to iterate the Relaxation procedure until the set of optimal assignments remains identical for two consecutive iterations.
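The Update step is simple enough to be sketched directly. Assuming the assignments have been materialized as, for each input diagram, the point matched to each barycenter point (diagonal assignments being materialized through the augmentation of Sec. 1.2), a C++ sketch could read (names and data layout are ours):

```cpp
#include <vector>

struct Point2D { double birth, death; };
using Diagram = std::vector<Point2D>;

// Update step of Alg. 1 (line 7): move each barycenter point to the
// arithmetic mean, in the birth/death space, of its assigned points.
// matchedPoints[i][k] is the point of diagram i assigned to barycenter
// point k (a layout we assume here for simplicity).
void updateBarycenter(Diagram &barycenter,
                      const std::vector<Diagram> &matchedPoints) {
  const double n = (double)matchedPoints.size();
  for (size_t k = 0; k < barycenter.size(); ++k) {
    double birth = 0.0, death = 0.0;
    for (const Diagram &match : matchedPoints) {
      birth += match[k].birth;
      death += match[k].death;
    }
    barycenter[k] = {birth / n, death / n};
  }
}
```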

1.5 Overview of our approach

The reference algorithm for Wasserstein barycenters (Alg. 1) proves impractical for real-life data sets. Its main computational bottleneck is the Assignment step, which involves the computation of Wasserstein distances (Eq. 2). However, as detailed in the result section (Sec. 5), even when combined with efficient algorithms for the exact computation [86] or the approximation [51] (Sec. 1.3) of Wasserstein distances, this overall method can still lead to hours of computation in practice.

Thus, a drastically different approach is needed to improve computation efficiency, especially for applications such as ensemble clustering, which require multiple barycenter estimations per iteration.

In this work, we introduce a novel progressive framework for Wasserstein barycenters, which is based on two key observations. First, from one Relaxation iteration to the next (lines 3 to 8, Alg. 1), the estimated barycenter is likely to vary only slightly. This motivates the design of a faster algorithm for the Assignment step, capable of starting its optimization from a relevant initial assignment, typically given in our setting by the previous Relaxation iteration. Second, in the initial Relaxation iterations, the estimated barycenter can be arbitrarily far from the final, optimized barycenter. Thus, for these iterations, it can be beneficial to relax the level of accuracy of the Assignment step, which is the main bottleneck, and to progressively increase it when it is most needed, as the barycenter converges to a solution.


3 Progressive Barycenters

This section presents our novel progressive framework for the approximation of Wasserstein barycenters of a set $S = \{D_1, \dots, D_N\}$ of persistence diagrams.

3.1 Overview

The key insights of our approach are twofold. First, in the reference algorithm (Alg. 1), from one Relaxation iteration to the next (lines 3 to 8), the estimated barycenter is likely to vary only slightly. The assignments involved in the Wasserstein distance estimations can thus be re-used as initial conditions along the Relaxation iterations of the barycenter (Sec. 3.2). Second, in the initial Relaxation iterations, the estimated barycenter can be arbitrarily far from the final, optimized barycenter. Thus, for these early iterations, it can be beneficial to relax the level of accuracy of the Assignment step, which is the main bottleneck, and to progressively increase it as the barycenter converges to a solution. Progressivity can be injected at two levels: by controlling the accuracy of the distance estimation itself (Sec. 3.3) and the resolution of the input diagrams (Sec. 3.4). Our framework is easily parallelizable (Sec. 3.5), and its progressivity allows us to design an interruptible algorithm, capable of respecting running time constraints (Sec. 3.6).

Figure 4: Visual comparison between the converged Wasserstein barycenters obtained with the Auction barycenter algorithm [94]+[51] (top) and our approach (bottom) for one cluster (Sec. 5.3) of each ensemble data set. Differences are barely noticeable, and only for small persistence pairs.

3.2 Auctions with Price Memorization

The Assignment step of the Wasserstein barycenter computation (line 5, Alg. 1) can in principle be resolved with any of the existing techniques for Wasserstein distance estimation [64, 63, 51, 86]. Among them, the auction-based approach [12, 51] (Sec. 1.3) is particularly relevant, as it can very efficiently compute approximations with on-demand accuracy.

In the following, we consider that each distance computation involves augmented diagrams (Sec. 1.2). Each input diagram $D_i$ is then considered as a set of bidders (Sec. 1.3), while the output barycenter $B$ contains the objects to purchase. Each input diagram $D_i$ maintains its own list of prices for the purchase of the objects of $B$ by its bidders. The search by a bidder for the two most valuable objects to purchase is accelerated with space partitioning data structures, using a kd-tree and a lazy heap respectively for the off- and on-diagonal points [51] (these structures are re-computed for each Relaxation). Thus, the output barycenter maintains only one kd-tree and one lazy heap for this purpose. Since Wasserstein distances are only approximated in this strategy, we suggest relaxing the overall stopping condition (Alg. 1) and stopping the iterations after two successive increases of the Fréchet energy (Eq. 8), as commonly done in gradient descent optimizations. In the rest of the paper, we call the above strategy the Auction barycenter algorithm, noted [94]+[51], as it just combines the algorithms by Turner et al. [94] and Kerber et al. [51].

However, this naive usage of the auction algorithm results in a complete reboot of the entire sequence of auction rounds at each Relaxation, while in practice, for the barycenter problem, the output assignments may be very similar from one Relaxation iteration to the next and could thus be re-used as initial solutions. For this, we introduce a mechanism that we call Price Memorization, which simply consists in initializing the prices for each bidder to the prices obtained at the previous Relaxation iteration (instead of zero). This has the positive effect of encouraging bidders to bid in priority on objects which were previously assigned to them, hence effectively re-using the previous assignments as an initial solution. This memorization makes most of the early auction rounds unnecessary in practice, which drastically reduces their number, as detailed in the following. The bookkeeping involved is sketched below.
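A minimal sketch of this bookkeeping, assuming one price vector per input diagram (a layout of our choosing), could read:

```cpp
#include <algorithm>
#include <vector>

// Price Memorization (Sec. 3.2): each input diagram keeps its own price
// vector across Relaxation iterations, instead of restarting from zero.
// prices[i][j] is the price of object j (barycenter point) for diagram i.
struct AuctionPrices {
  std::vector<std::vector<double>> prices;

  // At the end of a Relaxation, nothing needs to be done: the prices are
  // simply kept and re-used by the next auction round.

  // When the barycenter grows (persistence-driven progressivity, Sec. 3.4),
  // new objects are priced with the minimum price found at the previous
  // iteration, as described in the text.
  void resize(size_t newNumberOfObjects) {
    for (std::vector<double> &p : prices) {
      const double minPrice =
          p.empty() ? 0.0 : *std::min_element(p.begin(), p.end());
      p.resize(newNumberOfObjects, minPrice);
    }
  }
};
```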

3.3 Accuracy-driven progressivity

The reference algorithm for Wasserstein barycenter computation (Alg. 1) can also be interpreted as a variant of gradient descent [94]. For such methods, it is often observed that approximations of the gradient, instead of exact computations, can be sufficient in practice to reach convergence. This observation is at the basis of our progressive strategy. Indeed, in the early Relaxation iterations, the barycenter can be arbitrarily far from the converged result, and achieving a high accuracy in the Assignment step (line 5) for these iterations is often a waste of computation time. Therefore, we introduce a mechanism that progressively increases the accuracy of the Assignment step along the Relaxation iterations, in order to obtain more accuracy near convergence.

To achieve this, inspired by the internals of the auction algorithm, we apply a global $\epsilon$-scaling (Sec. 1.3), where we progressively decrease the value of $\epsilon$, but only at the end of each Relaxation. Combined with Price Memorization (Sec. 3.2), this strategy enables us to perform only one auction round per Assignment step. As large $\epsilon$ values accelerate auction convergence at the price of accuracy, this strategy effectively speeds up the early Relaxation iterations and leads to more and more accurate auctions, and thus assignments, along the Relaxation iterations.

In practice, we divide $\epsilon$ by a constant factor after each Relaxation, as suggested by Bertsekas [12] in the case of the regular auction algorithm (Sec. 1.3). Moreover, to guarantee precise final barycenters (obtained for small $\epsilon$ values), we modify the overall stopping condition to prevent the algorithm from stopping while $\epsilon$ is still larger than a given fraction of its initial value.
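This global schedule can be encapsulated as follows; the division factor and the stopping fraction shown are placeholders, as the paper's exact values are not reproduced here:

```cpp
// Accuracy-driven progressivity (Sec. 3.3): the global epsilon is decreased
// only at the end of each Relaxation, and the barycenter loop is not allowed
// to stop while epsilon remains large relative to its initial value.
struct GlobalEpsilon {
  double value, initial;
  explicit GlobalEpsilon(double e0) : value(e0), initial(e0) {}
  void endOfRelaxation() { value /= 5.0; } // assumed division factor
  bool allowsStopping() const {
    return value <= initial / 100.0;       // assumed stopping fraction
  }
};
```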

Data set                       N    |D|     Sinkhorn [53]   Munkres [94]+[86]   Auction [94]+[51]   Ours     Speedup
Gaussians (Fig. 7)             100  2,078   7,499.33        > 24 H              8,975.60            785.53   11.4
Vortex Street (Fig. 8)         45   14      54.21           0.14                0.47                0.23     0.6
Starting Vortex (Fig. 9)       12   36      40.98           0.06                0.67                0.28     0.2
Isabel (3D)                    12   1,337   1,070.57        > 24 H              377.42              82.95    4.5
Sea Surface Height (Fig. 10)   48   1,379   4,565.37        > 24 H              949.08              75.90    12.5

Table 1: Comparison of running times (in seconds, 1 thread) for the estimation of Wasserstein barycenters of persistence diagrams. N and |D| respectively stand for the number of members in the ensemble and the average size of the input persistence diagrams.

3.4 Persistence-driven progressivity

In practice, the persistence diagrams of real-life data sets often contain a very large number of critical point pairs of low persistence. These numerous small pairs correspond to noise and are often meaningless for the applications. However, although they individually have only little impact on Wasserstein distances (Eq. 2), their overall contribution may be non-negligible. To account for this, we introduce in this section a persistence-driven progressive mechanism, which progressively inserts in the input diagrams critical point pairs of decreasing persistence. This focuses the early Relaxation iterations on the most salient features of the data, while considering the noisy ones last. In practice, this encourages the optimization to explore more relevant local minima of the Fréchet energy (Eq. 8), which favor persistent features.

Given an input diagram $D_i$, let $D_i^{\rho}$ be the subset of its points with persistence higher than $\rho$: $D_i^{\rho} = \{ a \in D_i \mid P(a) > \rho \}$. To account for persistence-driven progressivity, we run our barycenter algorithm (with Price Memorization, Sec. 3.2, and accuracy-driven progressivity, Sec. 3.3) by initially considering as inputs the diagrams $D_i^{\rho}$. After each Relaxation iteration (Alg. 2, line 9), we decrease $\rho$ such that $|D_i^{\rho}|$ does not increase by more than 10% (to progress at uniform speed) and such that $\rho$ does not get smaller than a fixed lower bound (chosen to replicate locally Bertsekas's suggestion for setting $\epsilon$, Sec. 1.3). Initially, $\rho$ is set to half of the maximum persistence found in the input diagrams. Along the Relaxation iterations, the input diagrams are progressively populated, which yields the introduction of new points in the barycenter $B$; we initialize these at locations selected uniformly among the newly introduced points of the inputs. This strategy distributes the initialization of the new barycenter points among the inputs. The corresponding prices are initialized with the minimum price found for the objects of $B$ at the previous iteration (see the sketch of Sec. 3.2).
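The threshold update can be sketched as follows, assuming a non-strict persistence threshold and diagrams sorted by decreasing persistence (both assumptions of this sketch):

```cpp
#include <algorithm>
#include <vector>

struct Point2D { double birth, death; };
double persistence(const Point2D &a) { return a.death - a.birth; }

// D_i^rho: subset of the points of a diagram with persistence at least rho.
std::vector<Point2D> filterByPersistence(const std::vector<Point2D> &diagram,
                                         double rho) {
  std::vector<Point2D> out;
  for (const Point2D &p : diagram)
    if (persistence(p) >= rho) out.push_back(p);
  return out;
}

// After each Relaxation, decrease rho so that the number of considered points
// grows by at most 10% (uniform progression speed), without going below the
// lower bound rhoMin.
double nextThreshold(const std::vector<Point2D> &sortedByPersistence,
                     size_t currentCount, double rhoMin) {
  if (sortedByPersistence.empty())
    return rhoMin;
  const size_t grownCount =
      currentCount + std::max<size_t>(1, currentCount / 10);
  const size_t target = std::min(sortedByPersistence.size(), grownCount);
  const double rho = persistence(sortedByPersistence[target - 1]);
  return std::max(rho, rhoMin);
}
```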

Input: Set of diagrams $S = \{D_1, \dots, D_N\}$, time constraint $t_{max}$

Output: Wasserstein barycenter $B$

1:  $B \leftarrow D_i^{\rho}$  // with $i$ randomly chosen in $\{1, \dots, N\}$
2:  while the Fréchet energy decreases do
3:     // Relaxation start
4:     for $i \in \{1, \dots, N\}$ do  // in parallel (Sec. 3.5)
5:        $\phi_i \leftarrow$ one auction round between $D_i^{\rho}$ and $B$, with Price Memorization  // Sec. 3.2
6:     end for
7:     $B \leftarrow$ arithmetic means in the birth/death space
8:     decrease $\epsilon$  // Sec. 3.3
9:     if time allows the introduction of new pairs then decrease $\rho$  // Sec. 3.4
10:    else if the time constraint $t_{max}$ is reached then return $B$  // Sec. 3.6
11:    // Relaxation end
12: end while
13: return $B$

Algorithm 2: Our overall algorithm for Progressive Wasserstein Barycenters.

3.5 Parallelism

Our progressive framework can be trivially parallelized, as its most computationally demanding task, the Assignment step (Alg. 2), is independent for each input diagram $D_i$. The space partitioning data structures used for proximity queries to $B$ are accessed independently by each bidder diagram. Thus, we parallelize our approach by running the Assignment steps in independent threads, as sketched below.
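A minimal OpenMP sketch of this parallelization could read as follows; the dynamic scheduling shown is our own suggestion to mitigate load imbalance (see Sec. 5.1), not necessarily the paper's setting.

```cpp
#include <functional>

// Trivial parallelization of the Assignment step (Sec. 3.5): each input
// diagram runs its auction in an independent thread (compile with -fopenmp).
// runAssignment(i) is a placeholder for the per-diagram auction of Sec. 3.2;
// it only reads the shared barycenter structures and writes per-diagram data.
void parallelAssignment(int numberOfDiagrams,
                        const std::function<void(int)> &runAssignment) {
#pragma omp parallel for schedule(dynamic)
  for (int i = 0; i < numberOfDiagrams; ++i)
    runAssignment(i);
}
```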

3.6 Computation time constraints

Our persistence-driven progressivity (Sec. 3.4) focuses the early iterations of the optimization on the most salient features, while considering the noisy ones last. However, as discussed before, low persistence pairs in the input diagrams are often considered as meaningless in the applications. This means that our progressive framework can in principle be interrupted before convergence and still provide a meaningful result.

Let $t_{max}$ be a user-defined time constraint. We first progressively introduce points in the input diagrams and perform the Relaxation iterations, as described in Sec. 3.4, for a first portion of the time constraint $t_{max}$. At this point, the optimized barycenter contains only a fraction of the points it would have contained if computed until convergence. To guarantee a precise output barycenter, we found that continuing the Relaxation iterations for the remaining time, without introducing new persistence pairs, provided the best results. In practice, in most of our experiments, we observed that this second optimization part fully converged even before reaching the computation time constraint. Alg. 2 summarizes our overall approach for Wasserstein barycenters of persistence diagrams, with Price Memorization, progressivity, parallelism and time constraints. Several iterations of our algorithm are illustrated on a toy example in additional material.
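A sketch of the corresponding interruptible loop is given below; the fraction of the budget devoted to the pair-introduction phase is an assumption of this sketch, as the paper's exact split is not reproduced here.

```cpp
#include <chrono>
#include <functional>

// Sketch of the interruptible optimization loop (Sec. 3.6): a first phase
// introduces persistence pairs while relaxing, a second phase only refines
// the barycenter, without adding new pairs.
void timeConstrainedBarycenter(
    double budgetInSeconds,
    const std::function<void()> &relaxationIteration, // Assignment + Update
    const std::function<void()> &introduceNewPairs) { // Sec. 3.4
  using Clock = std::chrono::steady_clock;
  const Clock::time_point start = Clock::now();
  const auto elapsed = [&start]() {
    return std::chrono::duration<double>(Clock::now() - start).count();
  };
  const double introductionPhase = 0.5 * budgetInSeconds; // assumed split
  while (elapsed() < budgetInSeconds) {
    relaxationIteration();
    if (elapsed() < introductionPhase)
      introduceNewPairs(); // first phase: grow the diagrams
    // second phase: refinement only, no new persistence pairs
  }
}
```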

4 Application to Ensemble Topological Clustering

This section presents an application of our progressive framework for Wasserstein barycenters of persistence diagrams to the clustering of the members of ensemble data sets. Since it focuses on persistence diagrams, this strategy groups together ensemble members that have the same topological profile, hence effectively highlighting the main trends found in the ensemble in terms of features of interest.

The k-means is a popular algorithm for the clustering of the elements of a set, provided that distances and barycenters can be estimated on this set. The latter are efficiently computable for persistence diagrams thanks to our novel progressive framework, and we detail in the following how to revisit the k-means algorithm to use our progressive barycenters as estimates of the cluster centroids.

Data set                       N    |D|     1 thread   8 threads   Speedup
Gaussians (Fig. 7)             100  2,078   785.53     117.91      6.6
Vortex Street (Fig. 8)         45   14      0.23       0.10        2.3
Starting Vortex (Fig. 9)       12   36      0.28       0.19        1.5
Isabel (3D)                    12   1,337   82.95      31.75       2.6
Sea Surface Height (Fig. 10)   48   1,379   75.90      19.40       3.9

Table 2: Running times (in seconds) of our approach (run until convergence) with 1 and 8 threads. N and |D| stand for the number of members in the ensemble and the average size of the diagrams.
Data set                       N    |D|     Auction [94]+[51]   Ours        Ratio
Gaussians (Fig. 7)             100  2,078   39.4                39.0        0.99
Vortex Street (Fig. 8)         45   14      415.1               412.5       0.99
Starting Vortex (Fig. 9)       12   36      112,787.0           112,642.0   1.00
Isabel (3D)                    12   1,337   2,395.6             2,337.1     0.98
Sea Surface Height (Fig. 10)   48   1,379   7.2                 7.1         0.99

Table 3: Comparison of the Fréchet energy (Eq. 8) at convergence between the Auction barycenter method [94]+[51] and our approach.
Figure 5: Comparison of the evolution of the Fréchet energy (log scale, Sea Surface Height, maximum diagrams) for the Auction barycenter method [94]+[51] (red) and 3 variants of our approach: without (blue) and with (orange) persistence progressivity, and with time constraints (green). Persistence-driven progressivity drastically accelerates convergence.

The k-means is an iterative algorithm which highly resembles barycenter computation algorithms (Sec. 1.4), where each Clustering iteration is itself composed of two sub-routines: (i) Assignment and (ii) Update. Initially, $k$ cluster centroids are initialized on diagrams of the input set $S$. For this, in practice, we use the k-means++ heuristic described by Celebi et al. [21], which aims at maximizing the distance between centroids. Then, the Assignment step consists in assigning each diagram $D_i$ to its closest centroid $c_j$. This implies the computation, for each diagram $D_i$, of its Wasserstein distance to all the centroids. For this step, we estimate these pairwise distances with the auction algorithm run until convergence (Sec. 1.3). In practice, we use the accelerated k-means strategy described by Elkan [31], which exploits the triangle inequality between centroids to skip, given a diagram $D_i$, the computation of its distance to centroids which cannot be the closest (this pruning is sketched at the end of this section). Next, the Update step consists in updating each centroid's location by placing it at the barycenter (computed with Alg. 2) of its assigned diagrams. The algorithm iterates these Clustering iterations until convergence, i.e. until the assignments between the diagrams and the centroids do not evolve anymore, hence yielding the final clustering.

From our experience, the Update step of a Clustering iteration is by far the most computationally expensive. To speed up this stage in practice, we derive a strategy similar to our approach for barycenter approximation: we reduce the computational load of each Clustering iteration and progressively increase its accuracy along the optimization. This strategy is motivated by a similar observation: early centroids are quite different from the converged ones, which motivates an accuracy reduction in the early Clustering iterations of the algorithm. Thus, for each Clustering iteration, we use a single round of auction with Price Memorization (Sec. 3.2) and a single barycenter update (i.e. a single Relaxation iteration, Alg. 2). Overall, only one global $\epsilon$-scaling (Sec. 3.3) is applied at the end of each Clustering iteration. This enhances the k-means algorithm with accuracy progressivity. If a diagram $D_i$ migrates from a cluster $c_j$ to a cluster $c_k$, the prices of the objects of $c_k$ for the bidders of $D_i$ are initialized to zero, and we run the auction algorithm between $D_i$ and $c_k$ until the pairwise $\epsilon$ value matches the global $\epsilon$ value, in order to obtain prices for $D_i$ which are comparable to those of the other diagrams. Also, we apply persistence-driven progressivity (Sec. 3.4) by adding pairs of decreasing persistence in each diagram along the Clustering iterations. Finally, a computation time constraint can also be provided, as described in Sec. 3.6. Results of our clustering scheme are presented in Sec. 5.3.
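As a complement, here is a self-contained sketch of the Elkan-style pruning [31] mentioned above; the data layout and function names are ours.

```cpp
#include <functional>
#include <vector>

// Elkan-style pruning [31] for the clustering Assignment step (Sec. 4): by
// the triangle inequality, if W(c_best, c_j) >= 2 * W(D, c_best), then
// W(D, c_j) >= W(D, c_best) and the expensive distance to c_j can be skipped.
// centroidDistances[a][b] holds the distance between centroids a and b;
// distanceToCentroid(j) computes the auction-based distance of the current
// diagram to centroid j.
int closestCentroid(int k, int initialGuess, double initialDistance,
                    const std::vector<std::vector<double>> &centroidDistances,
                    const std::function<double(int)> &distanceToCentroid) {
  int best = initialGuess;
  double bestDistance = initialDistance;
  for (int j = 0; j < k; ++j) {
    if (j == best)
      continue;
    if (centroidDistances[best][j] >= 2.0 * bestDistance)
      continue; // pruned: c_j cannot be closer than c_best
    const double d = distanceToCentroid(j); // expensive W_2 estimation
    if (d < bestDistance) {
      bestDistance = d;
      best = j;
    }
  }
  return best;
}
```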

5 Results

This section presents experimental results obtained on a computer with two Xeon CPUs (3.0 GHz, 2x4 cores) and 64 GB of RAM. The input persistence diagrams were computed with the FTM algorithm by Gueunet et al. [37], which is available in the Topology ToolKit (TTK) [92]. We implemented our approach in C++, as TTK modules.

Our experiments were performed on a variety of simulated and acquired 2D and 3D ensembles, taken from Favelier et al. [32]. The Gaussians ensemble contains 100 2D synthetic noisy members, with 3 patterns of Gaussians (Fig. 7). The considered features of interest in this example are the maxima, hence only the diagrams involving maxima are considered. The Vortex Street ensemble (Fig. 8) includes 45 runs of a 2D simulation of flow turbulence behind an obstacle. The considered scalar field is the orthogonal component of the curl, for 5 fluids of different viscosity (9 runs per fluid, each with varying Reynolds numbers). In this application, salient extrema are typically considered as reliable estimations of the centers of vortices. Thus, each run is represented by two diagrams, processed independently by our algorithms: one involving minima and one involving maxima. The Starting Vortex ensemble (Fig. 9) includes 12 runs of a 2D simulation of the formation of a vortex behind a wing, for 2 distinct wing configurations. The considered data is also the orthogonal component of the curl, and diagrams involving minima and maxima are also considered. The Isabel data set is a volume ensemble of 12 members, showing key time steps (formation, drift and landfall) in the simulation of the Isabel hurricane [84]. In this example, the eyewall of the hurricane is typically characterized by high wind velocities, well captured by velocity maxima. Thus, we only consider diagrams involving maxima. Finally, the Sea Surface Height ensemble (Fig. 10) is composed of 48 observations taken in January, April, July and October 2012 (https://ecco.jpl.nasa.gov/products/all/). Here, the features of interest are the centers of eddies, which can be reliably estimated with height extrema. Thus, the diagrams involving both the minima and the maxima are considered and independently processed by our algorithms. Unless stated otherwise, all results were obtained by considering the Wasserstein metric based on the original pointwise metric (Eq. 1), without geometrical lifting (i.e. $\alpha = 0$, Sec. 1.2).

5.1 Time performance

We first evaluate the time performance of our progressive framework when run until convergence (i.e. without computation time constraint). The corresponding timings are reported in Tab. 1, which also provides running times for 3 alternatives. The first column, Sinkhorn, provides the timings obtained with a Python CPU implementation kindly provided by Lacombe et al. [53], for which we used the recommended parameter values (entropic term and heat map resolution). Note that this approach casts the problem as an Eulerian transport optimization under an entropic regularization term. Thus, it optimizes a convex functional which is considerably different from the Fréchet energy considered in our approach (Eq. 8). Overall, these aspects, in addition to the difference in programming language, challenge direct comparisons, and we only report running times for completeness. The columns Munkres, noted [94]+[86], and Auction, noted [94]+[51], report the running times of our own C++ implementation of Turner's algorithm [94], where distances are respectively estimated with the exact method by Soler et al. [86] (available in TTK [92]) and with our own C++ implementation of the auction-based approximation by Kerber et al. [51] (with kd-tree and lazy heap, run until convergence).

Figure 6: Interrupted Wasserstein barycenters for one cluster of the Sea Surface Height ensemble with different computation time constraints. From left to right: 0.1 s., 1 s., 10 s., and full convergence (21 s.).
Figure 7: Clustering the Gaussians ensemble. From left to right: pointwise mean and Wasserstein barycenter for each of the identified clusters, with geometrical lifting.

As predicted, the cubic time complexity of the Munkres algorithm makes it impractical for barycenter estimation, as the computation completed within 24 hours for only two ensembles (Vortex Street and Starting Vortex), where the diagrams are particularly small. The Auction approach is more practical, but still requires up to hours to converge for the largest data sets. In contrast, our approach, in sequential mode, converges in less than 15 minutes at most. The column Speedup reports the gain obtained with our method against the fastest of the two explicit alternatives, Munkres or Auction. For ensembles of realistic size, this speedup is about an order of magnitude. Our approach can be trivially parallelized by running the Assignment step (Alg. 2) in independent threads (Sec. 3.5). We implemented this strategy with OpenMP and report the results in Tab. 2. As the size of the input diagrams may strongly vary within an ensemble, this trivial parallelization may result in load imbalance among the threads, impairing parallel efficiency. In practice, this strategy still provides reasonable speedups, bringing the computation down to a couple of minutes at most.

5.2 Barycenter quality

Tab. 3 compares the Fréchet energy (Eq. 8) of the converged barycenters for our method and the Auction barycenter alternative [94]+[51] (the Wasserstein distances between the results of the two approaches are provided in additional material for further details). The Fréchet energy has been precisely evaluated with an estimation of the Wasserstein distances based on the auction algorithm run until convergence. While the actual values of this energy are not specifically relevant (because of the various data ranges), the ratio between the two methods indicates that the local minima approximated by both approaches are of comparable numerical quality, with a variation in energy of 2% at most. Fig. 4 provides a visual comparison of the converged Wasserstein barycenters obtained with the Auction barycenter alternative [94]+[51] and our method, for one cluster of each of our data sets (Sec. 5.3). This figure shows that differences are barely noticeable and only involve pairs with low persistence, which are often of small interest in the applications.

Fig. 5 compares the convergence rates of the Auction barycenter method [94]+[51] (red) to three variants of our framework: without (blue) and with (orange) persistence-driven progressivity, and with computation time constraints (green). For the interruptible version of our algorithm (green), the energy is reported after each complete computation, for increasing time constraints. This plot indicates that our approach based on Price Memorization and a single auction round per Relaxation (Secs. 3.2 and 3.3, blue) already substantially accelerates convergence (the first iteration, dashed, is performed with a large $\epsilon$ and thus induces a high energy). Interestingly, persistence-driven progressivity (orange) provides the most important gains in convergence speed. The number of Relaxation iterations is larger for our approach (orange) than for the Auction barycenter method (red), which emphasizes the low computational effort of each of our iterations. Finally, when the Auction barycenter method completed its first Relaxation iteration (leftmost red point), our persistence-driven progressive algorithm had already achieved 80% of its iterations, resulting in a Fréchet energy almost twice as small.

The quality of the barycenters obtained with the interruptible version of our approach (Sec. 3.6) is illustrated in Fig. 6 for varying time constraints. As predicted, features of decreasing persistence progressively appear in the diagrams, while the most salient features are accurately represented for very small constraints, allowing for reliable estimations within interactive times (below a second).

Figure 8: Clusters automatically identified by our topological clustering ($t_{max}$: 10 seconds). From top to bottom: pointwise mean of each cluster. Left: centroids computed by our interruptible clustering algorithm. Right: Wasserstein barycenters of the clusters, computed by our progressive algorithm run until convergence. Differences are visually indistinguishable. Barycenter extrema are scaled in the domain by persistence (spheres).

5.3 Ensemble visual analysis with Topological Clustering

In the following, we systematically set a time constraint $t_{max}$ of 10 seconds. To facilitate the reading of the diagrams, each pair with a persistence smaller than a small fraction of the function range is shown in transparent white, to help visually discriminate salient features from noise. Fig. 7 shows the clustering of the Gaussians ensemble by our approach. This synthetic ensemble exemplifies the motivation for the geometrical lifting introduced in Sec. 1.2. The first and third clusters both contain a single Gaussian, resulting in diagrams with a single persistent feature, but located in drastically different areas of the domain. Thus, the diagrams of these two clusters would be indistinguishable for the clustering algorithm if geometrical lifting was not considered. If feature location is important for the application, our approach can be adjusted thanks to geometrical lifting (Sec. 1.2). For the Gaussians ensemble, this makes our approach compute the correct clustering. Moreover, taking the geometry of the critical points into account allows us to represent, in the domain, the extrema involved in the Wasserstein barycenters (spheres, scaled by persistence, Fig. 7), which gives users a visual feedback, in the domain, of the features representative of the set of scalar fields. Geometrical lifting is particularly important in applications where feature location bears a meaning, such as the Isabel ensemble. For this example, our clustering algorithm with geometrical lifting automatically identifies the right clusters, corresponding to the three states of the hurricane (formation, drift and landfall). For the remaining examples, geometrical lifting was not necessary ($\alpha = 0$).

For the Vortex Street ensemble (Fig. 8), our approach automatically identifies the correct clusters, precisely corresponding to the distinct viscosity regimes found in the ensemble. Note that the centroids computed by our topological clustering algorithm with a time constraint of 10 seconds (left) are visually indistinguishable from the Wasserstein barycenters of each cluster, computed after the fact with our progressive algorithm run until convergence (right). This indicates that the centroids provided by our topological clustering are reliable and can be used to visually represent the features of interest in the ensemble. In particular, for the Vortex Street example, these centroids enable the clear identification of the number and salience of the vortices: pairs which align horizontally and vertically respectively denote minima and maxima of flow vorticity, which respectively correspond to clockwise and counterclockwise vortices.

Fig. 9 presents our results on the Starting Vortex, where our approach also automatically identifies the correct clustering, corresponding to the two wing configurations. In this example, the difference in turbulence (number and strength of vortices) can be directly and visually read from the centroids returned by our algorithm (insets). Finally, Fig. 10 shows our results for the Sea Surface Height, where our topological clustering automatically identifies four clusters, corresponding to the four seasons: winter (top left), spring (top right), summer (bottom left) and fall (bottom right). As shown in the insets, each season leads to a visually distinct centroid diagram. In this example, as the diagrams are larger, differences between the interrupted centroids (left) and the converged barycenters (right) become noticeable. However, these differences only involve pairs of small persistence, whose contribution to the final clustering proves negligible in practice.

Compared to the approach by Favelier et al. [32], on the exact same data, we obtain similar qualitative results: the returned clusterings are correct for both approaches, for all of the above data sets. Performance-wise, our time-constrained approach finds the correct clustering within ten seconds, even for a large ensemble such as Sea Surface Height (in addition to the 35 seconds needed to compute the input persistence diagrams), while the approach by Favelier et al. requires up to hundreds of seconds (on the same hardware) to compute intermediate representations (Persistence Maps) which are not needed in our work.

Figure 9: Clusters automatically identified by our topological clustering. Left insets (a, c): centroids computed by our interruptible clustering algorithm. Right insets (b, d): Wasserstein barycenters of the clusters, computed by our progressive algorithm run until convergence. Differences are visually indistinguishable. Top: pointwise mean of each cluster, with barycenter extrema scaled by persistence (spheres).

5.4 Limitations

In our experiments, we focused on persistence diagrams which only involve extrema, as these often directly translate into features of interest in the applications. Although our approach can consider other types of persistence pairs (e.g. saddle-saddle pairs in 3D), from our experience, the interpretation of these structures is not obvious in practice, and future work is needed to improve the understanding of these pairs in the applications. Thanks to the assignments computed by our algorithm, the extrema of the output barycenter can be embedded in the original domain (Figs. 7 to 10). However, in practice, a given barycenter extremum can potentially be assigned to extrema which are distant from each other in the ensemble members, resulting in its placement at an in-between location which may not be relevant for the application. Regarding the Fréchet energy, our experiments confirm the proximity of our approximated barycenters to actual local minima (Fig. 4, Tab. 3). However, theoretical proximity bounds to these minima are difficult to formulate, and we leave this for future work. Also, as is the case for the original algorithm by Turner et al. [94], there is no guarantee that our solutions are global minimizers. For the clustering, we observed that the initialization of the k-means algorithm had a major impact on its outcome, but we found that the k-means++ heuristic [21] provided excellent results in practice. Finally, when the geometrical location of features in the domain has a meaning for the applications, the geometrical lifting coefficient $\alpha$ (Sec. 1.2) must be manually adjusted by the user on a per-application basis, which involves a trial-and-error process. However, our interruptible approach greatly helps in this process, as users can perform such adjustments at interactive rates.

Figure 10: Clusters automatically identified by our topological clustering. From left to right, top to bottom: pointwise mean of each cluster, with barycenter extrema embedded in the domain as spheres scaled by persistence. Left insets (a, c, e, g): centroids computed by our interruptible clustering algorithm. Right insets (b, d, f, h): Wasserstein barycenters of the clusters, computed by our progressive algorithm run until convergence.

6 Conclusion

In this paper, we presented an algorithm for the progressive approximation of Wasserstein barycenters of persistence diagrams, with applications to the visual analysis of ensemble data. Our approach revisits efficient algorithms for Wasserstein distance approximation [12, 51] in order to specifically extend previous work on barycenter estimation [94]. Our experiments showed that our strategy drastically accelerates convergence and reported an order of magnitude speedup against previous work, while providing barycenters which are quantitatively and visually comparable. The progressivity of our approach allows for the definition of an interruptible algorithm, enabling the estimation of reliable barycenters within interactive times. We presented an application to ensemble data clustering, where the obtained centroid diagrams provide key visual insights about the global feature trends of the ensemble.

A natural direction for future work is the extension of our framework to other topological abstractions, such as Reeb graphs or Morse-Smale complexes. However, the question of defining a relevant and, importantly, computable metric between these objects is still an active research topic. Our framework only provides approximations of Wasserstein barycenters; in the future, it would be useful to study the convergence of these approximations from a theoretical point of view. Although we focused on scientific visualization applications, our framework can be used mostly as-is for persistence diagrams of more general data, such as filtrations of high-dimensional point clouds. In that context, applications other than clustering will also be investigated. Moreover, we believe our progressive strategy for Wasserstein barycenters can also be used for more general inputs than persistence diagrams, such as generic point clouds, as long as an importance measure substituting persistence is available, which would significantly enlarge the spectrum of applications of the ideas presented in this paper.

Acknowledgements.
The authors would like to thank T. Lacombe, M. Cuturi and S. Oudot for sharing an implementation of their approach [53]. We also thank the reviewers for their thoughtful remarks and suggestions. This work is partially supported by the European Commission grant H2020-FETHPC-2017 “VESTEC” (ref. 800904).

References

  • [1] ISO/IEC Guide 98-3:2008 uncertainty of measurement - part 3: Guide to the expression of uncertainty in measurement (GUM). 2008.
  • [2] H. Adams, S. Chepushtanova, T. Emerson, E. Hanson, M. Kirby, F. Motta, R. Neville, C. Peterson, P. Shipman, and L. Ziegelmeier.

    Persistence Images: A Stable Vector Representation of Persistent Homology.

    Journal of Machine Learning Research, 2017.
  • [3] K. Anderson, J. Anderson, S. Palande, and B. Wang. Topological data analysis of functional MRI connectivity in time and space domains. In MICCAI Workshop on Connectomics in NeuroImaging, 2018.
  • [4] T. Athawale and A. Entezari. Uncertainty quantification in linear interpolation for isosurface extraction. IEEE Transactions on Visualization and Computer Graphics, 2013.
  • [5] T. Athawale and C. R. Johnson. Probabilistic asymptotic decider for topological ambiguity resolution in level-set extraction for uncertain 2D data. IEEE Transactions on Visualization and Computer Graphics, 2019.
  • [6] T. Athawale, E. Sakhaee, and A. Entezari. Isosurface visualization of data with nonparametric models for uncertainty. IEEE Transactions on Visualization and Computer Graphics, 2016.
  • [7] U. Ayachit, A. C. Bauer, B. Geveci, P. O’Leary, K. Moreland, N. Fabian, and J. Mauldin. Paraview catalyst: Enabling in situ data analysis and visualization. In ISAV, 2015.
  • [8] A. C. Bauer, H. Abbasi, J. Ahrens, H. Childs, B. Geveci, S. Klasky, K. Moreland, P. O’Leary, V. Vishwanath, B. Whitlock, and E. W. Bethel. In-situ methods, infrastructures, and applications on high performance computing platforms. Comp. Grap. For., 2016.
  • [9] U. Bauer, X. Ge, and Y. Wang. Measuring distance between Reeb graphs. In Symp. on Comp. Geom., 2014.
  • [10] U. Bauer, E. Munch, and Y. Wang. Strong equivalence of the interleaving and functional distortion metrics for Reeb graphs. In Symp. on Comp. Geom., 2015.
  • [11] K. Beketayev, D. Yeliussizov, D. Morozov, G. H. Weber, and B. Hamann. Measuring the distance between merge trees. In TopoInVis. 2014.
  • [12] D. P. Bertsekas. A new algorithm for the assignment problem. Mathematical Programming, 1981.
  • [13] D. P. Bertsekas and D. Castañon. Parallel synchronous and asynchronous implementations of the auction algorithm. Parallel Computing, 1991.
  • [14] H. Bhatia, A. G. Gyulassy, V. Lordi, J. E. Pask, V. Pascucci, and P.-T. Bremer. Topoms: Comprehensive topological exploration for molecular and condensed-matter systems. J. of Computational Chemistry, 2018.
  • [15] H. Bhatia, S. Jadhav, P. Bremer, G. Chen, J. A. Levine, L. G. Nonato, and V. Pascucci. Flow visualization with quantified spatial and temporal errors using edge maps. IEEE Transactions on Visualization and Computer Graphics, 2012.
  • [16] S. Biasotti, D. Giorgio, M. Spagnuolo, and B. Falcidieno. Reeb graphs for shape analysis and applications. Theoretical Computer Science, 2008.
  • [17] G. Bonneau, H. Hege, C. Johnson, M. Oliveira, K. Potter, P. Rheingans, and T. Schultz. Overview and state-of-the-art of uncertainty visualization. Mathematics and Visualization, 37:3–27, 2014.
  • [18] P. Bremer, G. Weber, J. Tierny, V. Pascucci, M. Day, and J. Bell. Interactive exploration and analysis of large scale simulations using topology-based data segmentation. IEEE Transactions on Visualization and Computer Graphics, 2011.
  • [19] H. Carr, J. Snoeyink, and U. Axen. Computing contour trees in all dimensions. In Symp. on Dis. Alg., 2000.
  • [20] M. Carrière, M. Cuturi, and S. Oudot. Sliced Wasserstein Kernel for Persistence Diagrams. ICML, 2017.
  • [21] M. E. Celebi, H. A. Kingravi, and P. A. Vela. A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst. Appl., 2013.
  • [22] D. Cohen-Steiner, H. Edelsbrunner, and J. Harer. Stability of persistence diagrams. In Symp. on Comp. Geom., 2005.
  • [23] M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In NIPS. 2013.
  • [24] M. Cuturi and A. Doucet. Fast computation of wasserstein barycenters. In ICML, 2014.
  • [25] L. De Floriani, U. Fugacci, F. Iuricich, and P. Magillo. Morse complexes for shape segmentation and homological analysis: discrete models and algorithms. Comp. Grap. For., 2015.
  • [26] P. Diggle, P. Heagerty, K.-Y. Liang, and S. Zeger. The Analysis of Longitudinal Data. Oxford University Press, 2002.
  • [27] H. Edelsbrunner and J. Harer. Computational Topology: An Introduction. American Mathematical Society, 2009.
  • [28] H. Edelsbrunner, J. Harer, and A. Zomorodian. Hierarchical Morse-Smale complexes for piecewise linear 2-manifolds. Disc. Compu. Geom., 2003.
  • [29] H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological persistence and simplification. Disc. Compu. Geom., 2002.
  • [30] H. Edelsbrunner and E. P. Mucke. Simulation of simplicity: a technique to cope with degenerate cases in geometric algorithms. ACM Trans. on Graph., 1990.
  • [31] C. Elkan. Using the triangle inequality to accelerate k-means. In ICML, 2003.
  • [32] G. Favelier, N. Faraj, B. Summa, and J. Tierny. Persistence Atlas for Critical Point Variability in Ensembles. IEEE Transactions on Visualization and Computer Graphics, 2018.
  • [33] G. Favelier, C. Gueunet, and J. Tierny. Visualizing ensembles of viscous fingers. In IEEE SciVis Contest, 2016.
  • [34] F. Ferstl, K. Bürger, and R. Westermann. Streamline variability plots for characterizing the uncertainty in vector field ensembles. IEEE Transactions on Visualization and Computer Graphics, 2016.
  • [35] F. Ferstl, M. Kanzler, M. Rautenhaus, and R. Westermann. Visual analysis of spatial variability and global correlations in ensembles of iso-contours. Comp. Grap. For., 2016.
  • [36] D. Guenther, R. Alvarez-Boto, J. Contreras-Garcia, J.-P. Piquemal, and J. Tierny. Characterizing molecular interactions in chemical systems. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2014.
  • [37] C. Gueunet, P. Fortin, J. Jomier, and J. Tierny. Task-based Augmented Merge Trees with Fibonacci Heaps. In IEEE LDAV, 2017.
  • [38] D. Günther, J. Salmon, and J. Tierny. Mandatory critical points of 2D uncertain scalar fields. Computer Graphics Forum, 2014.
  • [39] A. Gyulassy, P. Bremer, R. Grout, H. Kolla, J. Chen, and V. Pascucci. Stability of dissipation elements: A case study in combustion. Computer Graphics Forum, 2014.
  • [40] A. Gyulassy, P. Bremer, and V. Pascucci. Shared-memory parallel computation of Morse-Smale complexes with improved accuracy. IEEE Transactions on Visualization and Computer Graphics, 2019.
  • [41] A. Gyulassy, P. T. Bremer, B. Hamann, and V. Pascucci. A practical approach to Morse-Smale complex computation: Scalability and generality. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2008.
  • [42] A. Gyulassy, M. A. Duchaineau, V. Natarajan, V. Pascucci, E. Bringa, A. Higginbotham, and B. Hamann. Topologically clean distance fields. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2007.
  • [43] A. Gyulassy, D. Guenther, J. A. Levine, J. Tierny, and V. Pascucci. Conforming Morse-Smale complexes. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2014.
  • [44] A. Gyulassy, A. Knoll, K. Lau, B. Wang, P. Bremer, M. Papka, L. A. Curtiss, and V. Pascucci. Interstitial and interlayer ion diffusion geometry extraction in graphitic nanosphere battery materials. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2015.
  • [45] C. Heine, H. Leitte, M. Hlawitschka, F. Iuricich, L. De Floriani, G. Scheuermann, H. Hagen, and C. Garth. A survey of topology-based methods in visualization. Comp. Grap. For., 2016.
  • [46] M. Hilaga, Y. Shinagawa, T. Kohmura, and T. L. Kunii. Topology matching for fully automatic similarity estimation of 3D shapes. In Proc. of ACM SIGGRAPH, 2001.
  • [47] M. Hummel, H. Obermaier, C. Garth, and K. I. Joy. Comparative visual analysis of Lagrangian transport in CFD ensembles. IEEE Transactions on Visualization and Computer Graphics, 2013.
  • [48] C. R. Johnson and A. R. Sanderson. A next step: Visualizing errors and uncertainty. IEEE Computer Graphics and Applications, 2003.
  • [49] L. Kantorovich. On the translocation of masses. AS URSS, 1942.
  • [50] J. Kasten, J. Reininghaus, I. Hotz, and H. Hege. Two-dimensional time-dependent vortex regions based on the acceleration magnitude. IEEE Transactions on Visualization and Computer Graphics, 2011.
  • [51] M. Kerber, D. Morozov, and A. Nigmetov. Geometry helps to compare persistence diagrams. ACM Journal of Experimental Algorithmics, 2016.
  • [52] M. Kraus. Visualization of uncertain contour trees. In International Conference on Information Visualization Theory and Applications, 2010.
  • [53] T. Lacombe, M. Cuturi, and S. Oudot. Large Scale Computation of Means and Clusters for Persistence Diagrams using Optimal Transport. In NIPS, 2018.
  • [54] D. E. Laney, P. Bremer, A. Mascarenhas, P. Miller, and V. Pascucci. Understanding the structure of the turbulent mixing layer in hydrodynamic instabilities. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2006.
  • [55] T. Liebmann and G. Scheuermann. Critical points of Gaussian-distributed scalar fields on simplicial grids. Computer Graphics Forum (Proc. of EuroVis), 2016.
  • [56] S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 1982.
  • [57] M. Otto, T. Germer, H.-C. Hege, and H. Theisel. Uncertain 2D vector field topology. Comp. Grap. For., 29:347–356, 2010.
  • [58] M. Otto, T. Germer, and H. Theisel. Uncertain topology of 3D vector fields. In Proc. of IEEE PacificVis, 2011.
  • [59] A. MacEachren, A. Robinson, S. Hopper, S. Gardner, R. Murray, M. Gahegan, and E. Hetzler. Visualizing geospatial information uncertainty: What we know and what we need to know. Cartography and Geographic Information Science, 2005.
  • [60] D. Maljovec, B. Wang, P. Rosen, A. Alfonsi, G. Pastore, C. Rabiti, and V. Pascucci. Topology-inspired partition-based sensitivity analysis and visualization of nuclear simulations. In Proc. of IEEE PacificVis, 2016.
  • [61] M. Mirzargar, R. Whitaker, and R. Kirby. Curve boxplot: Generalization of boxplot for ensembles of curves. IEEE Transactions on Visualization and Computer Graphics, 2014.
  • [62] G. Monge. Mémoire sur la théorie des déblais et des remblais. Académie Royale des Sciences de Paris, 1781.
  • [63] D. Morozov. Dionysus. http://www.mrzv.org/software/dionysus, 2010. Accessed: 2019-03-01.
  • [64] J. Munkres. Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics, 1957.
  • [65] A. T. Pang, C. M. Wittenbrink, and S. K. Lodha. Approaches to uncertainty visualization. The Visual Computer, 1997.
  • [66] V. Pascucci, G. Scorzelli, P. T. Bremer, and A. Mascarenhas. Robust on-line computation of Reeb graphs: simplicity and speed. ACM Trans. on Graph., 2007.
  • [67] C. Petz, K. Pöthkow, and H.-C. Hege. Probabilistic local features in uncertain vector fields with spatial correlation. Computer Graphics Forum, 2012.
  • [68] T. Pfaffelmoser, M. Mihai, and R. Westermann. Visualizing the variability of gradients in uncertain 2D scalar fields. IEEE Transactions on Visualization and Computer Graphics, 2013.
  • [69] T. Pfaffelmoser, M. Reitinger, and R. Westermann. Visualizing the positional and geometrical variability of isosurfaces in uncertain scalar fields. Computer Graphics Forum, 2011.
  • [70] T. Pfaffelmoser and R. Westermann. Visualization of global correlation structures in uncertain 2D scalar fields. Comp. Grap. For., 2012.
  • [71] K. Pöthkow and H.-C. Hege. Positional uncertainty of isocontours: Condition analysis and probabilistic measures. IEEE Transactions on Visualization and Computer Graphics, 2011.
  • [72] K. Pöthkow and H.-C. Hege. Nonparametric models for uncertainty visualization. Comp. Grap. For., 2013.
  • [73] K. Pöthkow, C. Petz, and H.-C. Hege. Approximate level-crossing probabilities for interactive visualization of uncertain isocontours. Int. J. Uncert. Quantif., 2013.
  • [74] K. Pöthkow, B. Weber, and H.-C. Hege. Probabilistic marching cubes. In Computer Graphics Forum, 2011.
  • [75] K. Potter, S. Gerber, and E. W. Anderson. Visualization of uncertainty without a mean. IEEE Computer Graphics and Applications, 2013.
  • [76] K. Potter, A. Wilson, P. Bremer, D. Williams, C. Doutriaux, V. Pascucci, and C. R. Johnson. Ensemble-Vis: A framework for the statistical visualization of ensemble data. In IEEE International Conference on Data Mining Workshops, 2009.
  • [77] K. Potter, P. Rosen, and C. R. Johnson. From quantification to visualization: A taxonomy of uncertainty visualization approaches. IFIP Advances in Information and Communication Technology, 2012.
  • [78] J. Reininghaus, S. Huber, U. Bauer, and R. Kwitt. A stable multi-scale kernel for topological machine learning. In IEEE CVPR, 2015.
  • [79] B. Rieck, F. Sadlo, and H. Leitte. Topological machine learning with persistence indicator functions. In Proc. of TopoInVis, 2017.
  • [80] L. Roy, P. Kumar, Y. Zhang, and E. Zhang. Robust and fast extraction of 3D symmetric tensor field topology. IEEE Transactions on Visualization and Computer Graphics, 2019.
  • [81] H. Saikia, H. Seidel, and T. Weinkauf. Extended branch decomposition graphs: Structural comparison of scalar data. Computer Graphics Forum, 2014.
  • [82] J. Sanyal, S. Zhang, J. Dyer, A. Mercer, P. Amburn, and R. Moorhead. Noodles: A tool for visualization of numerical weather model ensemble uncertainty. IEEE Transactions on Visualization and Computer Graphics, 2010.
  • [83] S. Schlegel, N. Korn, and G. Scheuermann. On the interpolation of data with normally distributed uncertainty for visualization. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2012.
  • [84] IEEE SciVis Contest. Simulation of the Isabel hurricane. http://sciviscontest-staging.ieeevis.org/2004/data.html, 2004.
  • [85] N. Shivashankar, P. Pranav, V. Natarajan, R. van de Weygaert, E. P. Bos, and S. Rieder. Felix: A topology based framework for visual exploration of cosmic filaments. IEEE Transactions on Visualization and Computer Graphics, 2016. http://vgl.serc.iisc.ernet.in/felix/index.html.
  • [86] M. Soler, M. Plainchault, B. Conche, and J. Tierny. Lifted Wasserstein Matcher for Fast and Robust Topology Tracking. In IEEE Symposium on Large Data Analysis and Visualization, 2018.
  • [87] J. Solomon, F. De Goes, G. Peyré, M. Cuturi, A. Butscher, A. Nguyen, T. Du, and L. Guibas. Convolutional Wasserstein distances. ACM Transactions on Graphics, 2015.
  • [88] T. Sousbie. The persistent cosmic web and its filamentary structure: Theory and implementations. Monthly Notices of the Royal Astronomical Society, 2011. http://www2.iap.fr/users/sousbie/web/html/indexd41d.html.
  • [89] A. Szymczak. Hierarchy of stable Morse decompositions. IEEE Transactions on Visualization and Computer Graphics, 2013.
  • [90] D. M. Thomas and V. Natarajan. Detecting symmetry in scalar fields using augmented extremum graphs. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2013.
  • [91] D. M. Thomas and V. Natarajan. Multiscale symmetry detection in scalar fields by clustering contours. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2014.
  • [92] J. Tierny, G. Favelier, J. A. Levine, C. Gueunet, and M. Michaux. The Topology ToolKit. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2017. https://topology-tool-kit.github.io/.
  • [93] J. Tierny, A. Gyulassy, E. Simon, and V. Pascucci. Loop surgery for volumetric meshes: Reeb graphs reduced to contour trees. IEEE Transactions on Visualization and Computer Graphics (Proc. of IEEE VIS), 2009.
  • [94] K. Turner, Y. Mileyko, S. Mukherjee, and J. Harer. Fréchet Means for Distributions of Persistence Diagrams. Disc. Compu. Geom., 2014.
  • [95] R. T. Whitaker, M. Mirzargar, and R. M. Kirby. Contour boxplots: A method for characterizing uncertainty in feature sets from simulation ensembles. IEEE Transactions on Visualization and Computer Graphics, 2013.
  • [96] K. Wu and S. Zhang. A contour tree based visualization for exploring data with uncertainty. International Journal for Uncertainty Quantification, 2013.