1 Introduction
Simulation ensembles of spatiotemporal phenomena are common in many areas of science like physics or geosciences and in medicine. An ensemble consists of several simulation runs where each run depends on different parameters or initial conditions. The main idea behind generating an ensemble is to capture the uncertainty in the choice of correct parameter settings or to capture the dependence of the simulation outcome on the parameters. In the latter case, the main analysis task is to extract this dependence.
Ensemble visualization faces the challenge of connecting the simulation outcome with the parameter space. The parameter space can be understood as a multidimensional space where each dimension corresponds to one parameter. It contains a set of multidimensional points (the parameter-space samples), where each point represents one simulation run. For the presented work, we assume a multidimensional parameter space with a dimensionality larger than 3. Due to the complexity and, thus, high computation costs of spatiotemporal simulations, it remains infeasible to sample parameter spaces of very high dimensionality. Domain experts who analyze their simulation results want to investigate how the parameters influence the outcome. It is often desired to obtain a segmentation or partitioning of the parameter space, where each segment represents a connected region with similar simulation behavior. One example would be a blood flow simulation, where the flow behavior may change from laminar to turbulent depending on the parameter choices, i.e., one would have a parameter space that is partitioned into two segments (laminar and turbulent flow). The questions would then be which parameter settings lead to which flow behavior and where the transition between the two behaviors lies in parameter space, i.e., one wants to analyze the extent of the parameter-space segments and their interface.
The overarching goal of understanding the partitioning of the parameter space is achieved using three components: First, the similarity between the ensemble’s simulation runs is analyzed, leading to a semi-automatic generation of similarity clusters. Second, these similarity clusters induce the parameter-space segments, whose extents and transitions are analyzed using parameter-space visualizations. Third, coordinated interactions of similarity and parameter-space visualizations allow for an interactive refinement of the partitioning.
The methodological part of the paper starts with a problem statement, see Section 3. In Section 4, we provide an overview of the proposed solution for the stated problem. It includes a novel, distortion-free visualization for analyzing the parameter-space partitions based on the concept of a hyperslicer. This view is complemented with a 2D embedding of the parameter-space samples to ease the navigation within the parameter space, see Section 6. For the generation of the parameter-space partitioning, the similarities of the ensemble’s simulation runs are analyzed using similarity-space visualizations of automatic clustering outcomes in conjunction with spatial visualizations as well as visualizations of the runs’ temporal evolution, see Section 7. Coupling similarity and parameter-space analyses in coordinated views allows for comprehension and interactive refinement of the parameter-space partitioning.
We evaluate our approach on synthetic data and compare it to alternative approaches. We then apply it in three case studies to datasets from different fields, performing the analytical workflow towards the parameter-space partitioning in joint sessions with domain experts (Section 8). Our main contributions can be summarized as follows:

We propose a novel visualization approach to investigate partitions of multidimensional parameter spaces of simulation ensembles, where partitions are imposed by clusters of simulation runs with similar behavior.

We link the parameter-space partitioning visualization to similarity-space visualizations of the spatiotemporal simulation runs and embed them in an interactive visual analysis tool that supports the analysis of all facets of the simulation ensemble.

We compare our approach to parallel coordinates and scatterplot matrices on synthetic data. We further discuss the usefulness and intuitiveness of our approach with domain experts from different fields within three real-world application scenarios (blood flow, semiconductor quantum wire, microswimmers).
2 Related Work
Analysis approaches for spatiotemporal simulation ensembles commonly focus on statistical properties like the mean of an ensemble potter2009ensemble; sanyal2010noodles. However, such approaches neither support a comparative analysis nor allow for determining the parameters’ influence. There is also a wide range of clustering approaches for simulation data available pretorius2011visualization; Hao2016; Ferstl2017; Ma2019; Kappe2019; Kappe2019a. However, these approaches do not support a direct analysis of the parameter space. The approach proposed by Phadke et al. phadke2012exploring includes techniques for comparisons, but it is limited to a small number of ensemble members. Eichner et al. eichner2020making present a tool for analyzing ensembles of timelines and investigating the parameter space for segmentation algorithms. However, as this approach uses correlations between timelines, it cannot be transferred easily to spatiotemporal simulation data. Wang et al. Wang2019 presented a comprehensive survey about the different aspects of ensemble visualization. Another approach kumpf2021visual analyzes the value distributions in multi-field simulation ensembles but does not consider the dependency on input parameters. A comparative visualization of the whole ensemble is possible with multi-run plots as proposed by Fofonov et al. fofonov2016visual; fofonovCGF2018. This analysis approach allows for comparing individual ensemble members, whose temporal evolution is visually represented as a curve in similarity space. We build upon their approach by adopting their field similarity measure and extending it to spatiotemporal similarities.
In ensemble analysis, parameter-space analysis is a key task. Sedlmair et al. sedlmair2014visual provided a conceptual framework together with a literature review. Our system mainly follows the global-to-local navigation strategy, even though it is common to switch back from a local level to a more global view during the analysis process. The framework includes six analysis tasks, of which we focus on the partitioning task, for which we propose new visualization approaches, but outlier detection via similarity visualizations is also supported. Clustering ensemble data is a preliminary step to the analysis of the parameter space partitioning. This problem has been addressed by Jarema et al. Jarema2015, who create a distance matrix and use it for hierarchical clustering. However, they neither deal with parameter space analysis nor with parameter space visualization. Recently, it has been proposed to use machine learning to fill gaps in the parameter space and find interesting regions Hazarika2020; He2019. However, these approaches do not include a direct visualization of the parameter space and, thus, do not allow the user to obtain a global overview.
Several approaches for parameter optimization have been proposed. Tuner TorsneyWeir2011 is a visualization tool based on hyperslices to investigate the parameter space for image segmentation. They use hyperslices to evaluate different measures for the quality of the segmentation. In contrast to our approach, they visualize the parameter space concerning quality measures while we investigate a parameter space partitioning based on similarities in spatiotemporal simulation outcomes. Unger et al. unger2012visual also facilitate parameter optimization by showing the goodness of fit for geoscientific data. For many applications of spatiotemporal simulation ensembles, it is not clear what result is expected and what would be a good measure to quantify how good the results are. Thus, this tool cannot be applied to the problem of generally understanding the parameter space. Bruckner and Möller bruckner2010result propose a tool to facilitate the goal-driven parameter choice for physical animations. They use spatiotemporal clustering to explore the parameter space for the creation of visual effects. However, they do not analyze the relationship between input parameters and simulation output. Pretorius et al. pretorius2011visualization address the optimization of parameters via a gallery-based visualization for determining parameter settings of image analysis algorithms. However, they only consider a small part of the parameter space because the user’s domain knowledge allows limitations. In contrast, we want to analyze the global parameter space to spot interesting features. The local parameter space analysis is also supported by Berger et al. berger2011parameterspace, who use a sensitivity analysis to optimize parameter settings. Even though they provide supporting visualizations of local neighborhoods of points, which should avoid getting lost in the multidimensional parameter space, they neither support global overviews nor a geometric understanding of the parameter space.
Obermaier et al. obermaier2015visual propose a trend analysis framework and highlight the importance of discovering the dependence on parameters. They also connect their results to the parameter space by using parallel coordinates for parameter-space visualization. Wang et al. Wang2017 also propose an adapted parallel coordinates plot to visualize the parameter space. Apart from parallel coordinates, other techniques have been proposed for parameter-space visualization including radial layouts bruckner2010result and projections/dimensionality reduction spence1995visualization; Orban2019. One of our objectives was to obtain an undistorted view of the parameter space, where the values can be read and distances can be interpreted. Hence, neither parallel coordinates nor dimensionality reduction methods should be used. Glyph-based visualizations are another option bock2015visual, but they do not scale to higher dimensions in parameter space. Instead, we propose to achieve this objective by adopting the concept of hyperslices van1993hyperslice. This concept has been extended to show slices for multiple focus points simultaneously TorsneyWeir2018. However, these visualizations quickly become cluttered and, therefore, are not suitable for our purpose of visualizing partition boundaries. Hyperslices were also used in HyperMoVal Piringer2010 to show the model space. The tool aims to validate regression models and does not include partitioning the parameter space and investigating its properties.
Other works link parameter space and simulation outcome via iterative selection and refinement of parameter values Splechtna2015; matkovic2008interactive. In contrast to our work, they do not aim at getting an overview of the parameter space. Luboschik et al. Luboschik2014 link input parameters to simulation outcome by color-coding the parameters for each ensemble member, which does not provide a geometric overview of the parameter space. In Paraglide bergner2013paraglide, the focus was on sampling aspects and use case evaluations. They include a manual parameter space partitioning, while we include a semi-automatic partitioning approach that facilitates working with several similarity clusterings. Paraglide visualizes the parameter space partitioning using a scatterplot matrix (SPLOM). This approach is quite similar to ours, but a SPLOM comes with the issue of overplotting for complete partitions. We compare our approach against visualizations with parallel coordinates and SPLOMs.
3 Problem Specification
Table 1: Tools and the analysis tasks (T1 to T4) they support.
Approach  T1  T2  T3  T4
Piringer et al. Piringer2010
Splechtna et al. Splechtna2015
Matkovic et al. matkovic2008interactive
Fofonov et al. fofonovCGF2018
Torsney-Weir et al. TorsneyWeir2011
Unger et al. unger2012visual
Berger et al. berger2011parameterspace
Obermaier et al. obermaier2015visual
Wang et al. Wang2017
Luboschik et al. Luboschik2014
Bergner et al. bergner2013paraglide  ()
Our approach
We consider an ensemble consisting of several time-dependent simulation runs, which are also referred to as ensemble members. Each simulation run was created by a simulation with a unique set of input parameter values. The input parameters form the parameter space. In the parameter space, each simulation run corresponds to a single point with coordinates that reflect the chosen parameter values. In the context of this work, we only consider numerical or binary input parameters. The number of input parameters, i.e., the dimensionality of the parameter space, may be larger than three, but we assume it not to become too large. Further, each run includes time-varying simulation data, where each time step corresponds to a 2D or 3D scalar field.
The visualization approach should target the following analysis tasks:
(T1) Partitioning the parameter space based on the simulation outcome: The partitions in parameter space shall be formed by clusters of similar simulation runs. The analysis approach should support the user in defining these clusters and thus creating the partitioning.
(T2) Obtaining an overview about the parameter space and its partitioning: The parameter space samples should be shown all at once to provide an overview, and it should become clear to which segment of the partitioning the samples belong.
(T3) Analyzing the extent and the uncertainty of the different parameter space partitions: The user should understand the geometrical structure of the partitions in parameter space, including their distances, sizes, and shapes. The uncertainties of the partitioning shall be conveyed to avoid misinterpretations.
(T4) Exploring the simulation outcome on different levels of detail: For a complete understanding of the simulation ensembles and the meaning of the segments, an analysis of the simulation data is inevitable. Here, the analysis of the temporal behavior is important as well as the spatial analysis of individual time steps.
A commonly applied approach to solve these tasks involves a manual analysis by observing each ensemble member individually and comparing the observations. However, this is a tedious and time-consuming process. Thus, the goal is to develop an interactive approach for the visual analysis of the parameter space that incorporates (semi-)automatic algorithms to define and analyze the parameter space partitioning. All methods should be embedded into an integrated tool that supports the entire analytical workflow. Table 1 summarizes which tools support which tasks. It documents that our integrated tool is the only one that addresses all tasks.
4 Overview
To address the analysis tasks listed above, multiple facets of the data need to be investigated, which we propose to do using linked coordinated views of these facets as detailed below. For the analytical workflow, we follow an “Overview first, zoom and filter, then details on demand” approach shneiderman1996eyes. To have a flexible and extendable tool, we decided to integrate our visualizations into the modular structure of the Voreen framework meyer2009voreen.
To support the partitioning of the parameter space (T1), we need to establish a clustering of the ensemble members based on the similarity of the simulation outcomes. While cluster generation is a widely used pattern-recognition task and can be performed automatically, any clustering algorithm comes with choices for its settings that influence the clustering outcome kumpf2018cluster. On the other hand, a fully manual cluster generation would require us to inspect all simulation runs in sufficient detail. Hence, we allow for an automatic cluster generation coupled with interactive analysis and adjustment of the clustering outcome. We chose a hierarchical clustering approach because it only depends on a pruning level in a cluster tree. We support the interactive selection of the pruning level by visualizing the cluster tree in an interactive clustering dendrogram, see Figure 1 and Section 7.1. Other clustering methods depend on much less intuitive parameters like the number of expected clusters, kernel sizes, or bin sizes, leading to a static output.
To allow for an effective judgment of the clustering outcome, we support its visual inspection. The similarity space formed by pairwise similarities of all ensemble runs can be visualized by embedding them into a 2D visual space where each point in the embedding represents an ensemble member. Since we are interested in observing similarities (or dissimilarities), those should be represented as distances in the embedding. Such a representation is obtained by minimizing the objective function of a multidimensional scaling (MDS) approach wickelmaier2003introduction. We refer to the visualization of the ensemble members’ similarities in an MDS embedding as similarity embedding, see Figure 1 and Section 7.2. We visually encode the clusters in the embedding by color-coding the sample points.
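To illustrate how a single pruning level turns a cluster tree into clusters, the following sketch runs an agglomerative clustering on a precomputed dissimilarity matrix and stops merging once the closest pair of clusters is farther apart than the chosen level. The average-linkage criterion, the function name `cluster_at_level`, and the toy matrix are assumptions made for illustration; the text above does not specify them.

```python
def cluster_at_level(dist, level):
    """Average-linkage agglomerative clustering, pruned at `level`.

    dist  -- symmetric matrix (list of lists) of pairwise dissimilarities
    level -- pruning level: merging stops once the closest pair of
             clusters is farther apart than this value
    Returns a sorted list of clusters, each a sorted list of run indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average dissimilarity between all members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > level:
            break  # the tree is pruned here
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Toy dissimilarities for four runs: runs 0/1 and runs 2/3 are similar.
toy = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
fine = cluster_at_level(toy, 0.05)   # below every distance: no merges
medium = cluster_at_level(toy, 0.5)  # merges the two tight pairs
coarse = cluster_at_level(toy, 1.0)  # everything ends up in one cluster
```

Moving the level up or down only changes where merging stops, which is why a single slider on a dendrogram is enough to steer the result.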
For an overview of the complete, multidimensional parameter space and its partitioning (T2), we propose to use a dimensionality reduction of color-coded parameter space samples. We refer to the respective visualization as parameter sample embedding (see Figure 1 and Section 6.2). For the embedding, we chose MDS because it minimizes stress such that distances are maximally preserved. This view also allows for easy navigation by selecting single samples for a more detailed analysis.
For analyzing the geometric structure of single segments (T3), a distortion-free visualization of the parameter space is needed. This is generally not fulfilled by embeddings that do not allow for reading off the dimensions’ values. Parallel coordinates or SPLOMs are alternatives that support this task. However, they suffer from overplotting and, thus, hinder an intuitive understanding of the geometry of the underlying space as shown in Section 9. Therefore, we propose a hyperslicer, see Figure 1 and Section 6.1. Hyperslices provide distortion-free visualizations, which facilitate the required analysis, cf. TorsneyWeir2011; Piringer2010. Additionally, they are based on viewing slices, which is a visual encoding that most domain experts are familiar with because of commonly used slice-based volume viewers. However, existing hyperslice visualizations only provide information about the selected focus point. We propose a hyperslice visualization that we enhance with information about the parameter space partitioning outside of the slice. One remaining issue may be that hyperslices do not scale well with the number of dimensions. We alleviate this issue by including an automatic reordering based on the correlation between the single parameters and the simulation result such that one can concentrate on the most relevant dimensions.
To understand the simulation behavior over time (T4), we visualize the temporal evolution in similarity space. While the similarity embedding aggregates information over space and time, we also add a visualization that encodes the time component explicitly. Using the same considerations as for the similarity embedding, we create a 2D embedding where the temporal evolution is shown as curves parametrized over time. We refer to the respective view as the temporal evolution plot, see Figure 1 and Section 7.2. To investigate single time steps, we use direct volume rendering, which has become the standard for visualizing volumetric scalar fields. In case of occlusion issues or for 2D simulations, a slice-based visualization referred to as a slice viewer can be used instead or in addition.
For establishing correspondences between single views, we use consistent color throughout the different visualizations because color is the most intuitive visual variable for this purpose.
5 Synthetic Dataset
Throughout the following sections, we will use a synthetic dataset to illustrate the different components, support the explanations of the techniques, and validate our approach. The dataset has a four-dimensional parameter space. Three of the parameters influence the number and position of Gaussian kernels in a 2D spatial domain, while the fourth parameter does not influence the output. Four different behaviors are modeled, where each behavior is associated with one segment of the parameter space, as shown by the four colors in Figure 2. We sample the 4D parameter space equidistantly. The details for the creation of the dataset can be found in Appendix, Section 1.
6 Parameter Space Visualization
Given a parameter-space partitioning (based on a clustering of simulation runs, cf. Section 7), this section is concerned with its visualization.
6.1 Hyperslicer
The visualization of the parameter-space partitioning with undistorted views, preservation of distances, and the possibility to directly read off the parameter values (task (T3)) adopts and extends the hyperslice approach suggested by van Wijk and van Liere van1993hyperslice. The main idea of hyperslices for the visualization of a multidimensional space is to show axes-aligned slices through all pairwise combinations of dimensions in a matrix-like layout, i.e., a layout similar to SPLOMs. However, in contrast to SPLOMs, hyperslices do not show a projection of all data samples, but only the selection that is in the slice. Thus, the hyperslice approach corresponds to a generalization of a volumetric slice viewer to higher dimensions, where multiple slices are shown simultaneously. Treating the parameter space as a multidimensional space where each parameter represents one dimension and corresponds to one axis of the multidimensional Cartesian coordinate system of the parameter space, we slice the parameter space around a focus point that can be chosen and changed interactively by the user. This procedure is shown schematically in Figure 3I for a three-dimensional space. The hyperslice viewer then shows all axes-aligned 2D planes that contain the selected focus point. In each slice viewer, two parameter values are varied along the axes while the others are kept fixed. Those fixed values form the focus point, which represents one parameter setting and is shown by the grey cross. The exact parameter settings of the focus point are additionally shown below the parameter names on the diagonal of the matrix layout.
Partitioning Visualization. We extend the hyperslice approach van1993hyperslice to enhance the hyperslices with the partitioning information. We include the samples’ memberships to segments, the segments’ boundaries, and respective uncertainties. We show our visual encodings in Figure 3.
First, all simulation runs whose parameter settings belong to the selected slice are drawn as colored dots with a white outline that separates them from the background, see Figure 3. The colors indicate the cluster membership. Each similarity cluster is assigned a unique color and all parameter-space samples whose simulation runs belong to a similarity cluster are assigned the same color. We draw a black circle at all projected locations of simulation runs that lie outside the selected slice. This provides context about the locations in parameter space of other simulation runs and gives a hint to the user for which of the parameter settings simulation runs exist. As the parameter space is often sampled on an axis-aligned grid, multiple samples may be projected to the same position within a slice (see Figure 3III). We show the names of all runs when hovering over the position, where the names are written in the corresponding cluster color. In summary, the visual appearance of the parameter-space samples’ visualization is similar to a SPLOM with the points lying in the currently selected slice being highlighted.
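The distinction between samples that lie in a slice and samples that are only projected onto it can be sketched as follows. This is a minimal illustration, assuming samples are given as coordinate tuples; the function name `split_by_slice` and the tolerance are hypothetical and not taken from the described tool.

```python
def split_by_slice(samples, focus, i, j, tol=1e-9):
    """Separate samples lying in the (i, j) slice from projected ones.

    A sample lies in the axis-aligned slice spanned by dimensions i and j
    if it agrees with the focus point in every other dimension.
    Returns two lists of (x, y, run_index) tuples: in-slice samples and
    samples that are merely projected onto the slice.
    """
    in_slice, projected = [], []
    for run, p in enumerate(samples):
        fixed_match = all(
            abs(p[d] - focus[d]) <= tol
            for d in range(len(focus))
            if d not in (i, j)
        )
        target = in_slice if fixed_match else projected
        target.append((p[i], p[j], run))
    return in_slice, projected


# Regular 2 x 2 x 2 grid in a 3D parameter space.
grid = [(a, b, c) for a in (0.0, 1.0) for b in (0.0, 1.0) for c in (0.0, 1.0)]
focus = (0.0, 1.0, 0.0)
inside, outside = split_by_slice(grid, focus, 0, 1)
```

On a regular grid, every projected sample shares its 2D position with an in-slice sample, which is the overlap situation that the hover labels are meant to resolve.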
The parameter-space partitioning is visualized by color-coding each slice according to the similarity clusters. To obtain the segmentation, we use multi-class support vector machines (SVMs) (see Figure 3II) as implemented in libsvm libsvm. We train the SVM based on the clustering in the similarity space. For the training, we use radial basis functions as a kernel. The parameters $C$, defining the costs for wrong classifications, and $\gamma$, which can be intuitively interpreted as the range of influence of a single sample, can be defined by the user. However, as we want to avoid wrong classifications, we show a warning if they occur such that the user can adapt the parameters. The use of SVMs leads to significantly smoother boundaries for sparsely sampled spaces when compared to, e.g., an approximation of the Voronoi diagram. In any case, the boundaries that we show are only an approximation. If the domain experts want to define segment boundaries more precisely, further simulations are unavoidable. Our tool provides valuable hints about where to sample the parameter space for executing further simulation runs and thus refining the segment boundaries.
We discretize the parameter space to a regular grid for fast computations of the segment boundaries and use the trained SVM to determine to which cluster each grid point belongs. The user can interactively choose the resolution of the grid. A relatively low resolution suffices for most applications, as parameter-space sampling is often sparse. Using low resolutions, the visual output of the color-coded slices may exhibit staircase artifacts (cf. Figure 3III). The artifacts can be reduced by using a higher grid resolution, possibly at the expense of a short delay.
Uncertainty visualization. Since the parameter-space segmentation partitions the entire parameter space, each grid point in parameter space gets assigned a color no matter whether there was any simulation run located close by. Hence, parts of the parameter space may get colored despite having no simulation run with parameter settings in those parts. To provide a visual hint, we add an uncertainty visualization to our slices. We support the evaluation of the expressiveness by varying the saturation of the colors in the slice viewer (see Figure 3III). The saturation is adapted according to the Euclidean distance to the closest parameter-space sample. We first normalize the parameter values, calculate the distances, and then normalize the distances to the unit interval. The saturation of the colors is then decreased by multiplying it with $1 - d$, where $d$ denotes the normalized distance. Thus, lower saturation of the colors corresponds to higher uncertainty. In Figure 3III, which shows a regularly sampled parameter space, one can see that saturation is lowest in between the sample points, i.e., there the uncertainty is highest. The colors are most saturated close to the sample points, indicating a low uncertainty. This corresponds to the intuitive expectation that the uncertainty increases in regions without sample points. The uncertainty visualizations are especially helpful for irregularly sampled parameter spaces, where it is unclear if any sample points outside the selected slice are close to the observed region. The desaturated colors in uncertain regions remind the user not to overinterpret the clustering results. In fact, segment boundaries become less visible in uncertain regions. Of course, the uncertainty visualization can be turned off at any time.
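The distance-based desaturation can be sketched in a few lines. This is a minimal illustration under stated assumptions: the multiplier is taken to be one minus the normalized distance, parameter values are rescaled per dimension using the sample ranges, and the function name `saturation_factors` is hypothetical.

```python
import math


def saturation_factors(samples, grid_points):
    """Return one saturation multiplier in [0, 1] per grid point.

    Parameter values are rescaled to [0, 1] per dimension, the Euclidean
    distance to the nearest sample is computed, and the distances are
    normalized to the unit interval. The factor 1 - distance is an
    assumed form of the multiplier.
    """
    dims = range(len(samples[0]))
    lo = [min(s[d] for s in samples) for d in dims]
    hi = [max(s[d] for s in samples) for d in dims]

    def normalize(p):
        return [(p[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
                for d in dims]

    norm_samples = [normalize(s) for s in samples]
    nearest = [
        min(math.dist(normalize(g), s) for s in norm_samples)
        for g in grid_points
    ]
    d_max = max(nearest) or 1.0  # avoid division by zero
    return [1.0 - d / d_max for d in nearest]


# Four samples at the corners of a rectangle; the center is farthest away.
samples = [(0.0, 0.0), (10.0, 0.0), (0.0, 2.0), (10.0, 2.0)]
cells = [(0.0, 0.0), (5.0, 1.0), (10.0, 2.0), (5.0, 0.0)]
factors = saturation_factors(samples, cells)
```

Grid cells that coincide with a sample keep their full saturation, while the cell farthest from any sample is fully desaturated.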
Boundary projection. A hyperslice visualization of the partitioning per se only shows the segmentation within the chosen slices. To understand the whole parameter space, one would need to traverse it in all dimensional directions, which can be time-consuming and may impose a high cognitive load. We support this task by providing additional information about the investigated segment within the slice view. More precisely, we include a projection of segmentation boundaries for a selected cluster to guide the exploration of the parameter space. This provides a hint about which parameters need to be varied to observe the shape of the segment. We encode the existence of clusters by using a texture overlay (see Figure 3IV). The texture we chose exhibits dots on a rotated grid, colored in the same color as the selected similarity cluster to encode the cluster’s extent directly. The dotted texture leads to a perceived extension of the selected segment while keeping the segmentation in the selected slice visible. For the projection of a segment with label $l$ along parameter $p_k$, we create a discrete binary mask $M_{ij}$ for each slice that is spanned by parameters $p_i$ and $p_j$, $i, j \neq k$. Without loss of generality, assume $i < j < k$. Further, let $f = (f_1, \dots, f_n)$ be the current focus point. The mask indicates whether a point of the slice shall be textured. For each point $(x, y)$ of the slice, the mask is computed by
$$M_{ij}(x, y) = \begin{cases} 1 & \text{if } \exists\, z : L(f_1, \dots, f_{i-1}, x, f_{i+1}, \dots, f_{j-1}, y, f_{j+1}, \dots, f_{k-1}, z, f_{k+1}, \dots, f_n) = l, \\ 0 & \text{otherwise,} \end{cases}$$
where $L$ is the labeled $n$-dimensional parameter space. The projection itself is represented schematically in Figure 4(a) for a 3D parameter space (for illustration purposes only). The parameter space is partitioned into two segments by the bright-red plane. Now, the red segment is selected for projection into the slice spanned by two of the parameters along the third dimension. The boundaries of the red segment in that dimension are extracted and projected onto the slice. The projected area within the boundary is shown with the texture using red dots. The same result is shown on the left-hand side of Figure 4(b). If we have a fourth parameter, we could switch to projecting along that dimension and visualize the respective boundaries of the red cluster. A possible result is shown on the right-hand side of Figure 4(b).
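A possible way to compute such a projection mask on a discretized, labeled parameter space is sketched below. The data layout (a dictionary from grid index tuples to labels), the function name `projection_mask`, and the toy labeling are assumptions for illustration only.

```python
def projection_mask(labels, label, focus, i, j, k):
    """Binary mask for projecting segment `label` along dimension k.

    labels -- dict mapping grid index tuples to segment labels
    focus  -- grid index tuple of the current focus point
    i, j   -- dimensions spanning the slice; k is the projection dimension
    A slice cell (x, y) is marked if the segment occurs anywhere along
    dimension k while all remaining dimensions stay at the focus point.
    """
    size = [max(idx[d] for idx in labels) + 1 for d in range(len(focus))]
    mask = [[False] * size[j] for _ in range(size[i])]
    for x in range(size[i]):
        for y in range(size[j]):
            point = list(focus)
            point[i], point[j] = x, y
            for z in range(size[k]):
                point[k] = z
                if labels[tuple(point)] == label:
                    mask[x][y] = True
                    break
    return mask


# Toy 3 x 3 x 3 labeling: the "red" segment is bounded by a diagonal plane.
labels = {
    (x, y, z): "red" if x + z >= 3 else "blue"
    for x in range(3) for y in range(3) for z in range(3)
}
mask = projection_mask(labels, "red", (0, 0, 0), 0, 1, 2)
```

In this toy case the slice through the focus point contains no red cell at all, yet the mask marks two of its three rows, hinting that the red segment appears once the third parameter is varied.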
Finally, we can also combine projection dimensions using Boolean operations. If we want to consider the union of the segment boundaries in two dimensions, we combine the two dimensions with “or”, extract the boundaries of the union, and visualize them as shown at the bottom of Figure 4(b). The union is computed by executing an or-operation on the two binary masks. Arbitrarily many parameters (apart from the two dimensions that span the plane) can be combined using Boolean operations. We support the most common Boolean operators including and, or, not, xor, nor, nand, implication, and equivalence. They are entered within the graphical user interface using a command line. If the projection for the complete parameter space is desired, the keyword Complete can be used in the Boolean expression and the occurrence of segments from all possible parameter combinations is visualized.
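Combining two projection masks reduces to a cell-wise Boolean operation, as the following sketch shows. The operator table, the function names, and the two toy masks `along_c` and `along_d` are hypothetical; the expression syntax of the actual command line is not reproduced here.

```python
import operator

# Element-wise Boolean operators on two equally sized binary masks.
OPERATORS = {
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
    "nand": lambda a, b: not (a and b),
    "nor": lambda a, b: not (a or b),
    "implication": lambda a, b: (not a) or b,
    "equivalence": lambda a, b: a == b,
}


def combine(mask_a, mask_b, op):
    """Combine two binary masks (lists of lists of bool) cell by cell."""
    f = OPERATORS[op]
    return [[bool(f(a, b)) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


def negate(mask):
    """The unary `not` operator."""
    return [[not a for a in row] for row in mask]


# Hypothetical masks obtained by projecting along two different dimensions.
along_c = [[True, False], [False, False]]
along_d = [[True, True], [False, False]]
union = combine(along_c, along_d, "or")
difference = combine(along_c, along_d, "xor")
```

Because every operator acts on single cells, longer expressions can be evaluated by nesting calls to `combine` and `negate`.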
We analyze the results for the synthetic data to verify our approach. Figure 3II shows the whole hyperslicer for the entire parameter space and provides an overview. One slice in particular shows all clusters. We enlarge this slice and activate the uncertainty visualization (see Figure 3III). As expected, the saturation decreases with increasing distance to the sample points. Based on the colored labels, we can also already see that runs of both the red and the green segment are projected to the same sample point. We use the projection of the segments to see the full extent of the green cluster, as shown in Figure 3IV. We see that it is found for larger c values and that its boundary is diagonal in this slice. This fits the definition of the dataset. We also observe that the parameter that does not influence the output by construction has no impact.
Dimensionality reduction. An obvious general drawback of the hyperslice approach is that it does not scale to very high dimensionality. However, in the context of parameterspace visualization for spatiotemporal simulation ensembles, the number of parameters does not become very large. Still, it may be desired to keep the number of dimensions as low as possible. We provide a method to reduce the dimensionality of the parameter space interactively by having the user select which parameters to include in the analysis. To facilitate the selection, we propose to reorder the parameters based on the absolute value of the correlation between simulation data and each parameter. This calculation is determined by computing the Pearson correlation coefficient between the parameter values and the first principle component taken from the similarity embedding (see below) as
where and are the mean of and , and denotes the ensemble member. We visualize the correlations for all dimensions using a bar chart, which supports the user in deciding how many dimensions to include in the analysis, see Figure 8 (bottom left).
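The ranking step can be sketched as follows. This is an illustration under assumptions, not our implementation: the first principal component of the similarity embedding is taken as a given list of per-run coordinates, and the names `pearson` and `rank_parameters` are hypothetical. A parameter with constant values has an undefined correlation, which the sketch treats as zero.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumption: constant input is reported as uncorrelated
    return cov / (norm_x * norm_y)


def rank_parameters(
    parameters: Dict[str, Sequence[float]],
    first_component: Sequence[float],
) -> List[Tuple[str, float]]:
    """Sort parameters by decreasing absolute correlation with the component."""
    scores = [(name, abs(pearson(values, first_component)))
              for name, values in parameters.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

The sorted list corresponds to the order of the bars in the bar chart; the user would then keep the leading parameters and deselect the rest.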
6.2 Parameter Sample Embedding
To get an overview of the parameter-space samples, we propose to use a 2D embedding of the parameter space based on MDS projections (task (T2)), using the R implementation. This view facilitates navigation in parameter space. Moreover, it allows for investigating whether similarity clusters also form clusters in parameter space, i.e., it facilitates the evaluation of the quality of the clustering and especially of how the clusters are connected in parameter space. An annotated example can be found in Figure 5. Each cluster is represented by drawing its points in the cluster color and by an additional point showing the cluster’s barycenter. The barycenters, shown as larger dots in the cluster color, are added to facilitate selecting the middle of the cluster as the focus point for the hyperslice visualization. Since adding the barycenters shall not affect the projection layout, especially not when interactively changing the clusters, the barycenters are computed after projection.
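Computing the barycenters after projection can be sketched in a few lines. The sketch assumes that the 2D coordinates of the projected samples and one cluster label per sample are given; the function name `barycenters` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def barycenters(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, Point]:
    """Average the already projected 2D points of each cluster."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # label -> [sum x, sum y, count]
    for (x, y), label in zip(points, labels):
        acc = sums[label]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}
```

Because only the labels enter the computation, changing the clustering interactively requires recomputing the averages but leaves the projected layout untouched.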
The parameter sample embedding for the synthetic dataset presented in Figure 5 shows the regular structure of the parameter space. One can also see that the segments are connected even though there is an overlap. This overlap is to be expected because of the dimensionality reduction, but it can be resolved by using the hyperslicer.
7 Ensemble Analysis
To support the whole analysis process for spatiotemporal simulation ensembles, we embed the parameter-space visualizations described above into an interactive visual system with coordinated views, see Figure 1. The system allows for a detailed analysis of all facets of the ensemble. It also supports a similarity-based clustering of simulation runs and its interactive adjustment to induce a parameter-space partitioning.
7.1 Clustering
Similarity Measure. Clustering of simulation runs shall be based on their similarity. In order to define and establish a similarity space, we first need to define a similarity measure. Fofonov and Linsen fofonovCGF2018 proposed a field similarity measure that determines the similarity between scalar fields as a generalization of an isosurface similarity measure. They presented evidence that its behavior is preferable over other similarity measures in preserving the characteristics of the observed phenomena in ensemble data sets. For a detailed discussion, we refer to their paper. A Monte Carlo approach allows for fast computations using random sample points. From these sample points, vectors that describe the scalar field can be created. The distance between two timesteps and of runs and characterized by vectors and can be calculated as
(1)
We generalize this similarity measure to simulation runs. To establish a fast similarity measure between two runs, we calculate the average distance over time as shown schematically in Figure 6a. Different runs may cover different time intervals and may have different step lengths. When comparing two runs, we only consider their overlapping time interval . This interval is equidistantly resampled with sampling length , where the number of sampling points is determined as the maximum number of time steps of the two runs within the considered time interval. We assume that the distances between consecutive time steps are sufficiently small to allow for their linear interpolation. Hence, the distance between runs and is calculated as
(2)
with .
During the interactive analysis, the user may select a temporal region of interest such that the similarity computation is restricted to a subinterval. Having precomputed the similarities for all time samples as in Equation 1, the recomputation of Equation 2 for the subinterval can be performed at interactive rates. We also want to point out that, since we are computing similarities pairwise, we can always use the largest temporal overlap of each pair of runs for the similarity computation.
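The averaging over the overlapping interval, including the optional restriction to a user-selected subinterval, can be sketched as follows. Several points are assumptions made for illustration only: the distances of Equation 1 are taken as a precomputed table `pair_dist[i][j]` between time step `i` of the first run and time step `j` of the second, values between time steps are obtained by bilinear interpolation of this table, each run has at least two time steps, and all function names are hypothetical.

```python
from bisect import bisect_right
from typing import Optional, Sequence, Tuple


def frac_index(times: Sequence[float], t: float) -> Tuple[int, float]:
    """Return (i, w) with t = (1 - w) * times[i] + w * times[i + 1]."""
    i = min(bisect_right(times, t) - 1, len(times) - 2)
    return i, (t - times[i]) / (times[i + 1] - times[i])


def distance_at(pair_dist, times_a, times_b, t: float) -> float:
    """Bilinearly interpolate the precomputed table at time t for both runs."""
    i, u = frac_index(times_a, t)
    j, v = frac_index(times_b, t)
    return ((1 - u) * (1 - v) * pair_dist[i][j]
            + u * (1 - v) * pair_dist[i + 1][j]
            + (1 - u) * v * pair_dist[i][j + 1]
            + u * v * pair_dist[i + 1][j + 1])


def run_distance(pair_dist, times_a, times_b,
                 window: Optional[Tuple[float, float]] = None) -> float:
    """Average distance over the overlapping (optionally restricted) interval."""
    start = max(times_a[0], times_b[0])
    end = min(times_a[-1], times_b[-1])
    if window is not None:
        start, end = max(start, window[0]), min(end, window[1])
    # Number of samples: the larger number of time steps inside the interval.
    n = max(sum(start <= t <= end for t in times_a),
            sum(start <= t <= end for t in times_b))
    if n <= 1:
        samples = [start]
    else:
        step = (end - start) / (n - 1)
        samples = [min(end, start + k * step) for k in range(n)]
    return sum(distance_at(pair_dist, times_a, times_b, t)
               for t in samples) / len(samples)
```

Since the table is computed once, changing `window` only repeats the cheap interpolation and averaging, which is what makes the interactive restriction to a subinterval feasible.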
Depending on the application scenario, it may be desirable to compute the similarities as in Equation 2 or to compute the best match of the time series by allowing a time shift (up to an upper threshold ) for a temporal alignment. In the latter case, we replace Equation 2 by
(3) 
where is varied in discrete steps between and .
Computing pairwise distances of all simulation runs leads to a distance matrix , which spans the similarity space. Which similarity measure to use (Equation 2 or Equation 3) is decided by the domain expert performing the interactive analysis.
Hierarchical Clustering. To detect patterns in the similarity space of the simulation runs, we apply a clustering method based on distance matrix (task (T1)). As it is not a priori known how many clusters exist, we propose to use a hierarchical clustering approach. A hierarchical clustering approach generates a cluster tree by iteratively merging clusters in a bottom-up fashion until all clusters are merged into one cluster, which becomes the root of the tree. The number of clusters can then be determined in retrospect by analyzing the cluster tree.
In hierarchical clustering, the results depend on the choice of the linkage algorithm. As the optimal choice for the linkage algorithm is data-dependent and thus cannot be generally answered, we implement the possibility for the user to choose from a list of the most common linkage algorithms. The users may interactively test multiple linkage algorithms and visually compare their outcomes to choose the optimal algorithm for their goal. The list of available algorithms includes (i) Ward’s minimum variance method that minimizes the increase in variance when two clusters merge (called ward.D2 in the following), (ii) an adaptation thereof where the input distances are not squared (ward.D), (iii) single linkage which is based on the minimum distance, (iv) complete linkage which is based on the maximum distance between points, (v) average linkage using the unweighted pair-group method for arithmetic averages (UPGMA) which minimizes the average distance between pairs of elements in the union of the clusters, and (vi) average linkage using the weighted pair-group method for arithmetic averages (WPGMA) which is based on the average distance between pairs from both clusters. For the implementation of the hierarchical clustering algorithms, we used the R package R, which we accessed from our C++ code using the RInside package RInside.
Cluster Analysis. A hierarchical clustering does not create a single clustering solution. Instead, it represents a hierarchical ensemble of clusterings, from which a single clustering result can be extracted by pruning the cluster tree. To facilitate the pruning, we visualize the hierarchical clustering outcome with a dendrogram. This cluster tree visualization exhibits in which order the clusters merge and split. The height in the dendrogram also conveys the distances at which the clusters change. The users can prune the cluster tree by interactively adjusting the height of a horizontal line representing the pruning level. The number of clusters is thus implicitly given by the pruning selection. For example, in Figure 7a, the number of clusters is four.
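How a pruning level turns the cluster tree into a flat clustering can be sketched independently of the clustering library. The sketch assumes that the tree is given as a list of merges, each naming one run from either of the two merged clusters together with the merge height, and that heights do not decrease along the tree; this input format and the name `prune` are hypothetical and do not mirror the R interface.

```python
from typing import Dict, List, Sequence, Tuple

Merge = Tuple[int, int, float]  # (run in first cluster, run in second cluster, height)


def prune(n_runs: int, merges: Sequence[Merge], level: float) -> List[int]:
    """Return one cluster label per run for the given pruning level."""
    parent = list(range(n_runs))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Only merges below the horizontal pruning line are carried out.
    for a, b, height in merges:
        if height <= level:
            parent[find(a)] = find(b)

    # Relabel the remaining roots as 0, 1, 2, ... in order of first appearance.
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n_runs)]
```

Moving the pruning line corresponds to calling `prune` with a different `level`; the number of clusters is the number of distinct labels in the result.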
We link the interactively chosen clustering result to other views to analyze the clusters and their interactive adjustment further. This is achieved through the use of colors. We identify three requirements to apply a color map to the cluster tree: (1) The colors should be clearly distinguishable. (2) Since clusters correspond to the parameter-space segments, an excessive number of clusters should be avoided, as it would lead to an oversegmentation of the parameter space. (3) When adjusting the pruning level of the cluster tree, the assigned colors should remain unchanged as much as possible. Hence, we decided to pick the colors from a qualitative (or categorical) color map with up to clusters generated with ColorBrewer colorBrewer. (For the examples shown in this paper, we used the scheme “Set 1” with classes.) We assign the colors by traversing the cluster tree in a top-down manner. The root node gets assigned the first color. At each inner node, the child with the largest subtree gets assigned the same color as its parent, while the other child gets a new color from the set of still available colors. If no further color is available, both children get assigned the color of the parent. The procedure progresses until the leaves are reached, see Figure 7a. As a consequence, clusters at a lower level in the hierarchy may be assigned the same color. However, it is possible to select subclusters for further analysis as in Figure 7b, where colors are reassigned to the selected subclusters, while already assigned colors within the selection are maintained. Subclusters that are not selected are colored in light grey.
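The top-down color assignment can be sketched as follows. Two details are assumptions of this sketch rather than statements about our implementation: inner nodes are visited in order of decreasing merge height, so that splits appearing first when lowering the pruning level receive colors first, and ties in subtree size are resolved in favor of the left child. The `Node` class and the function `assign_colors` are hypothetical.

```python
import heapq
from dataclasses import dataclass
from itertools import count
from typing import Optional, Sequence


@dataclass
class Node:
    height: float = 0.0            # merge height; leaves keep 0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    color: Optional[str] = None

    def size(self) -> int:
        """Number of leaves (simulation runs) below this node."""
        if self.left is None or self.right is None:
            return 1
        return self.left.size() + self.right.size()


def assign_colors(root: Node, palette: Sequence[str]) -> None:
    """Color the tree top-down, reusing the parent color for the larger child."""
    available = list(palette)
    root.color = available.pop(0)
    tie = count()  # tie-breaker so the heap never has to compare Node objects
    queue = [(-root.height, next(tie), root)]
    while queue:
        _, _, node = heapq.heappop(queue)
        if node.left is None or node.right is None:
            continue  # leaf: nothing to split
        if node.left.size() >= node.right.size():
            big, small = node.left, node.right
        else:
            big, small = node.right, node.left
        big.color = node.color
        # A new color if one is left, otherwise the color of the parent.
        small.color = available.pop(0) if available else node.color
        for child in (big, small):
            heapq.heappush(queue, (-child.height, next(tie), child))
```

Because the larger child always inherits the color of its parent, moving the pruning level up or down changes as few colors as possible, which addresses requirement (3).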
7.2 Visual Encodings
Similarity Embedding. We introduce a similarity embedding to visualize the similarity space of all simulation runs as captured by distance matrix . To this end, we use a low-dimensional embedding, where each point represents a simulation run. The distances in the embedding shall reflect the distances in the distance matrix as much as possible. This goal reflects the objective function of an MDS approach wickelmaier2003introduction. The similarity embedding is linked to the other visualizations via the assigned cluster colors, see Figure 7c. It allows the user to evaluate the quality of the clustering, and the user can interact with the dendrogram to adjust the cluster selection for achieving a better match. For the synthetic dataset, we observe four separate groups corresponding to the clusters for the pruning level selected in Figure 7a. By choosing a lower pruning level in the dendrogram, we can also confirm that the green and the blue cluster split into three subgroups each.
Temporal Evolution Plot. The clustering approach that induces the parameterspace partitioning is governed by distance matrix , where we aggregate over time. However, the investigation of the temporal evolution is essential to observe if the runs also behave similarly over time and find out, e.g., if they diverge or converge (task (T4)).
To visualize the temporal evolution, we follow the ideas of the multi-run plot proposed by Fofonov et al. fofonov2016visual. To this end, we compute the dissimilarities between all time steps of all runs according to Equation 1 and store them in a distance matrix . We then use a low-dimensional embedding, where each time step of a simulation run is represented by a point and where the distances in the embedding reflect the distances in matrix as much as possible, which is again achieved by an MDS approach wickelmaier2003introduction. The projected points are then connected in temporal order to obtain one curve per run, which gives an overview of the temporal evolution of the runs’ similarity. The simulation runs are color-coded according to the clustering, which allows for a quick evaluation of the temporal evolution of cluster members, see Figure 6c. For the embeddings, we can use 1D, 2D, or 3D spaces. Which dimensionality is chosen depends on the intrinsic dimensionality of the data set. 1D embeddings are particularly intuitive, as they can be plotted using time as a second axis. The 1D embedding can also be used to intuitively select a time interval of interest that should be analyzed further. This time interval is shown in 2D and 3D embeddings by rendering the parts of the curve outside the selected time interval with decreased saturation.
7.3 Interplay of Components & Analytical Workflow
A typical analytical workflow using our tool starts by getting an overview of the ensemble using the similarity embedding and the temporal evolution plot. The temporal evolution plot can be used to limit the time range taken into consideration for the analysis. Using the clustering dendrogram together with the similarity embedding, a suitable linkage algorithm and pruning level of the hierarchical clustering are chosen. This clustering induces a partitioning of the parameter space, which can be observed in the parameter sample embedding to get an overview. Then, we can investigate this partitioning and the boundaries of the segments in detail and understand their relationship to the parameter values. This is done via the hyperslicer. Using the correlation between parameter values and simulation outcome, we reduce the dimensionality of the hyperslicer by deselecting uninteresting parameters. Clusters can be selected in the parameter sample embedding to change the focus point to the respective cluster in the hyperslicer. The boundaries of this cluster can be investigated to understand its extent in parameter space. For a connection back to the simulation outcome, we select a single ensemble member in the hyperslicer and observe its temporal evolution in a volume rendering or a slice viewer, which are both common techniques for volume visualization (task (T4)). Using our system, it is easy to change the level of detail of the investigations. Common usage scenarios include, e.g., refining the temporal selection, changing the method or the pruning level of the clustering, or selecting various clusters for in-detail investigations. A detailed walkthrough for a concrete application scenario is presented in Section 8.1.
8 Case Studies
We present three case studies from different domains, one within this paper and two in the supplementary material. For the first case study, we include an in-detail walkthrough explaining how the insights were obtained. We also conducted an informal user study with a total of four domain experts (two professors and two graduate students), who were involved in the creation of the three datasets we used. We started with a short introduction to our tool and the respective visualizations. Then, they explored their datasets (in the same session) and gave us feedback. The timings reported below were taken on a laptop with a 1.6 GHz Intel Core i5 processor.
8.1 Blood Flow
We analyzed the simulation results of blood flow in an aneurysm leistikow2020interactive. The scientists want to understand how the parameters influence the simulation outcome to find a suitable match to experimental data. We used the magnitude of the flow field created by using a Lattice Boltzmann method. The simulation outcome depends on five parameters: a characteristic length, a characteristic velocity, the viscosity of the fluid, its density, and a parameter that stores whether Bouzidi boundary conditions, which are a special kind of boundary conditions for fluid simulations, are used. The dataset consists of runs with to equidistant timesteps. Each scalar field has a grid resolution of sample points. To obtain a sufficiently good representation, Monte Carlo seed points per timestep are chosen for the similarity calculations. The relevant time interval for further analysis was determined and interactively chosen to only start at simulation time s. Timings and scaling for computing distance matrix are discussed in detail in literature fofonov2016visual. Given , the subsequent calculation of distance matrix took s. We evaluated the application on this dataset with the help of two domain experts, where one of them created the simulation ensemble and the other one ran accompanying measurements.
At first, the similarities were investigated, and a suitable clustering based on the similarity plots was chosen. Using our tool, different linkage algorithms were tested, and Ward’s minimum variance method without squared distances (ward.D) was identified to create the most reasonable clusters. A division into two similarity clusters directly becomes evident from the clustering dendrogram (see Figure 1). We selected the partitioning by choosing a suitable pruning level. This clustering was investigated next in the parameter space. Looking at the parameter sample embedding, one can see that the two clusters in parameter space do not agree with those in similarity space. Using the hyperslicer, we observed that the parameter length leads to a separation of these two clusters. These findings agree with the expectations of the domain expert. However, he had expected the influence of the boundary conditions to be larger than that of the length. Our observation can nevertheless be confirmed by calculating the absolute value of the correlation between the simulation data and the individual parameters (see Figure 8 bottom left). We further observed that the two parameters density and viscosity play no significant role and decided to exclude them from further analysis.
Next, we intended to refine the two initial clusters. In the clustering dendrogram in Figure 1, we observe that both clusters split into two subclusters at approximately the same height (red cluster splits into red and purple, while blue cluster splits into green and blue). In the parameter sample embedding in Figure 8, we can see that this splitting almost leads to a separation of the two cluster pairs in parameter space. Only the green cluster occurs in both groups. Thus, this green cluster is selected in the parameter sample embedding for further investigation. Projecting the segment to the slice, we observe that no further regions contain the cluster. The hyperslice view also shows a clear separation of the red and the purple cluster caused by the Bouzidi boundary condition. The green and blue clusters, which occur only for small lengths (i.e., a high spatial resolution of the simulation domain), are not separated by the choice of boundary conditions. Instead, the blue cluster only occurs without Bouzidi boundary conditions and for higher velocities. This observation can be interpreted as a lower influence of the boundary conditions for higher spatial resolutions. To better understand the simulation result, single points in each cluster were selected and investigated using a direct volume renderer.
8.2 Findings
The domain experts ranked the tool as very helpful for their work. Our visualizations facilitate the process of understanding the data and the parameter influence. Two of the domain experts explicitly mentioned the importance of linking the abstract visualizations with the spatial visualizations to support an intuitive understanding. The domain expert who ran the blood flow simulations stated that our tool allowed him to confirm prior knowledge like the strong influence of the characteristic length and the boundary conditions. At the same time, he got new insights into the dominance of the individual parameters. Three domain experts positively highlighted the broad applicability of the tool. The one who works with the blood flow dataset as well as with experimental data would like to include her experimental data into the visualization and use the approach to analyze the parameter values of the segment closest to the measurements. The physicist working on microswimmers (see Appendix, Section 3) stated that the tool permits new possibilities in analyzing data that depends on more than three parameters. He rated the hyperslicer as especially helpful to obtain a mental picture of the parameter space’s geometry. It helps him to understand phenomena depending on multidimensional parameter spaces, which he usually would limit to fewer parameters. However, the limitation of parameters comes with the risk of missing interesting behavior predicted by the model. The domain scientist working on semiconductor simulations (see Appendix, Section 2) pointed out that the tool saves him time in the analysis process and showed him some parameter dependencies worth a deeper analysis with further simulations. Without our visualizations, he was not aware of this interesting region in the parameter space. He sees an application of our approach in finding parameter subspaces for further analysis.
One of the domain scientists would appreciate further visual support in investigating a selected cluster, which we plan to include in future research. Including an uncertainty visualization into the hyperslicer was also suggested by the user with whom we conducted the first case study. We included this before the sessions with the other experts. The users rated the interactive exploration in particular as very helpful and intuitive, and one of them also positively highlighted the high information density. However, it became clear that training is necessary to use our tool. The need for an initial training session is common for most complex tools and does not limit the intuitiveness for a trained user.
9 Comparison to SPLOM and PCP
We compare our extended hyperslicer approach to the use of scatter plot matrices (SPLOMs) and parallel coordinate plots (PCPs). Here, PCP and SPLOM are only compared to the visual encoding of our hyperslicer, not as a replacement for our whole system. Parameter spaces are often sampled on a structured grid, e.g., for the datasets presented in Section 8.1 (Blood Flow) and Appendix, Section 3 (Microswimmers).
The results for a structured sampling of the parameter space when using the three approaches are shown in Figure 9a-c. As in the hyperslicer, the segments in the PCP and SPLOM are color-coded. One can directly see that both PCP and SPLOM suffer significantly from overplotting, which can be explained by the structured sampling. For PCP, it is especially problematic as all possible parameter combinations are present in the dataset. Thus, many lines are drawn on top of each other. This also explains why there are no visible purple lines between and . Another problem might arise due to misleading information. In both visualizations, it is not directly possible to see how many points or lines are drawn on top of each other. The visualization is also strongly impacted by the rendering order, which determines which color is shown. Even though one can suspect from the PCP as well as the SPLOM that the red cluster only occurs for small values of c while the blue one is visible for larger values, it is not clear whether this is a projection artifact due to overplotting. Transparency or order-independent blending might reduce this issue in some cases, mainly if the density is not homogeneous, but it does not generally help in visualizing differently colored clusters and may cause some hard-to-interpret mixed colors. The hyperslicer addresses this issue by only showing a clearly determined slice and, thus, a subset of the total information.
In principle, a similar extension of showing subsets can be introduced for PCP and SPLOM as well. However, the hyperslicer enables the user to develop a geometric understanding. To compare the different approaches for this task, we use an unstructured sampling as presented in Figure 9d-f. In those cases, overplotting is less prominent for SPLOM and PCP. However, it is still hard to identify some structures. For example, while the separation caused by parameter can be identified in PCP and SPLOM, the diagonal structure for the green and the purple cluster is hardly visible (see Figure 2 for the shape of the segments in 3D). Different adjustments like edge bundling for PCP or density-based scatterplots for SPLOM might partially improve the perception but will not fully solve the problem. The reason for this lies in the large amount of data, which is shown completely in SPLOMs and PCPs, while only a subset is visualized in the hyperslicer and this subset can be changed interactively. The local view of the hyperslicer allows us to spot the diagonal shapes, for example, in the slice spanned by parameters and . The possibility to interactively change the focus point further supports building a geometric understanding of the parameter space (see video).
10 Discussion and Conclusion
We presented an approach to analyze simulation ensembles leading to a parameter-space partitioning based on the similarity of the simulation runs. We successfully applied our visualizations to different analysis tasks, and the domain experts were able to confirm existing knowledge and get new insights into their data. We received positive feedback together with helpful suggestions. It was even mentioned that our approach would allow the analysis of datasets with higher-dimensional parameter spaces, e.g., a ten-dimensional parameter space, which before was mainly avoided due to the high effort of the necessary manual analysis. Additionally, our visualizations hint at possibly interesting parameter regions that otherwise might be missed in manual analysis.
A well-known problem arising with hyperslices is the poor scaling to high dimensions known from SPLOMs. Even though this might lead to problems in some specific cases, for most simulation ensembles, the number of simulation parameters is not that high. For higher-dimensional parameter spaces, a dense sampling is computationally barely possible. Additionally, we included the possibility to determine the correlation coefficient between the single parameter values and the simulation data. This introduces a ranking of the parameters and facilitates the selection of relevant parameters. Thus, the dimensionality of the parameter space can be reduced. One of our domain experts would have liked to select more than one cluster at a time, which could be easily added to our approach. However, when selecting several clusters simultaneously, the visualizations may quickly become cluttered. Finding a suitable visualization dealing with this problem and facilitating the further analysis of identified clusters will be part of future work.
Our methods are based on a (dis)similarity matrix. The choice of a similarity measure is crucial. Depending on the application, there might be cases where the field similarity is not the best choice. To analyze the dependence of the analysis result with our methods on the similarity measure, further research is needed. However, exchanging the similarity measure is straightforward, which then allows for the direct application of our visualization methods to vector fields or even tensor fields. In addition to changing the similarity measure, it is only necessary to adapt the spatial visualizations to analyze flow field ensembles. Suitable flow visualizations are already available in Voreen and can be easily exchanged due to the modular structure.
Acknowledgments
This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) grant 260446826 (LI 1530/212). We would like to thank Verena Hörr, Simon Leistikow, Andreas Völker, and Raphael Wittkowski for their valuable ideas and feedback.