1 Introduction
Turbulent, seemingly chaotic flows, are known to create selfsimilar structures and flow patterns on multiple scales. Especially interesting are any discrete longlived structures that appear close to the dissipation scales, originating from the intermittency of the turbulence. In magnetohydrodynamical (MHD) turbulence these intermittent structures manifest themselves as the socalled current sheets that are regions of intense electric current flowing in thin twodimensional sheetlike configurations (Biskamp_2003). Analyzing the geometrical shapes and sizes of these structures can help us better quantify the role of intermittency in turbulent fluids and plasmas.
In the past, different statistical methods for automating the detection of such structures have been conceived but the algorithms are known to be computationally very demanding (e.g. Uritsky2010; Zhdankin2013; Azizabadi2020). Machine learning methods and computer vision algorithms offer a new promising avenue for constructing such detection frameworks since they are fast to evaluate and highly optimized (e.g., 2020Dupuis; Hu2020; 2021Sisti). Here, we present an ensemble unsupervised machine learning algorithm for image structure detection and automated segmentation that is specifically tailored for structure detection from computer simulation outputs.
In general, computer vision algorithms and the related machine learning tools offer an interesting new way of studying physical systems. These algorithms enable an easytouse and efficient automation of visual segmentation of computer simulation outputs. Image segmentation is an actively studied problem in the field of computer vision where the aim is to label each pixel in an image into a distinct category. Typically, supervised learning models are used to perform the pixel category assignment — in this case, large training sets containing labels for each pixel needs to be compiled a priori. The stateoftheart models for realworld images are prominently deep learning algorithms
(he2017mask; choy20194d; Valada_2019) and they are rapidly evolving and improving (e.g., Landrieu2018PointCloud; wang2017nonlocal; Zhu2019AsymmetricNN). However, the large computational cost and requirement for a vast pixellevel labeled pool of data make some of the best performing supervised models not suitable for practical analysis. Additionally, obtaining a desired accuracy for deep learning models is known to be notoriously hard (HURTADO2001; Shrestha2019).A major class of machine learning methods tackling data with no ground truth is the unsupervised learning method. Unsupervised learning algorithms are prominently used for cluster detection, dimensionality reduction of high dimensional data, or aiming at representing the data with a few prototypes. Unsupervised algorithms, which can be applied to data with no knowledge of the ground truth or even of the amount of possible clusters, offer an easytouse alternative for segmentation of images and videos
(gansbeke2020scan; Ventura2019RVOSER). A major drawback of unsupervised machine learning methods is that the resulting image segments and their associated clusters tend to be varying from one evaluation of the algorithm to another. This can be highly unwanted especially in applications where robust cluster classification, segmentation, and pixellevel image dissection are needed. The validity of unsupervised clustering algorithms depends on the algorithm and prior knowledge about the data; this estimation is usually done with external or internal metrics
(halkidi2001clustering; Rini_2018).A common cure for increasing the ”signal strength” of individual analysis results is to combine multiple evaluations together, hence increasing the signaltonoise ratio. These algorithms are typically known as ensemble machine learning methods. Ensemble frameworks have been developed to increase the robustness and stability of unsupervised clustering algorithms in many previous studies
(Fred2002; zhang2014unsupervised; topchy2004mixture). Graph partitioning models to create cluster ensembles show promising results in (GraphPartitioning1; strehl2002cluster). Multimodal hypergraph methods are developed, for example, in (YuIEEE; YuIEEE2). In jiang2004soman unsupervised ensemble framework is developed for Selforganizing Map clustering results, similarly to this work.
In this paper we construct a new ensemble framework that combines independent realizations of multiple clustering algorithm results, with the aim to provide reliable and robust segmentation regions obtained via unsupervised learning methods. The ensemble framework is devised to increase stability and robustness of the clusters detected by an unsupervised learning algorithm from image data. We use the SelfOrganizing Map (SOM) algorithm (also known as Kohonen’s map; kohonenSOM; kohonen2013essentials) as our base clustering algorithm, but labels obtained from other clustering algorithms may also be used.^{1}^{1}1 The SOM is chosen for its topology preservation quality, ability of obtaining fine structures from image data, and conceptual simplicity. The SOM algorithm has been used and further developed in thousands of scientific research papers and used in many fields of science including medical sciences, financial sciences, and speech analysis (see e.g., kohonen2013essentials). We show that the resulting region of interest (ROI) of the objects from the combined ensemble algorithm are more easily interpretable and geometrically more stable against fluctuations. This makes the proposed method ideal for automating the segmentation procedure of computer simulations. We, however, note that the new ensemble method may also be used for many other similar image segmentation tasks with no ground truth information, where high accuracy of the ROIs are needed like, for example, in medicine, remote sensing, or biology where images need to be dissected and segmented into different structures automatically (2020Qiu; 2020Xiangchun)
2 Data
Our main aim is to dissect the simulation data snapshot pixels into distinct physical structures based on the similarity of their feature vectors. We are, in particular, interested in automating the grouping of pixels into different physical categories whose geometrical size distributions we, in the end, want to measure.
We apply our image segmentation algorithms to fullykinetic particleincell simulations of decaying relativistic fullykinetic turbulence (see e.g., Zhdankin2017; Comisso2018; Nattila2019). Here we omit most of the data interpretation and physics discussion and focus instead on the results and performance of the image segmentation algorithms. We refer the reader to Nattila2019 for a presentation and discussion of the various technical simulation parameters. We also note that no ground truth data exists in our case. The situation is common for many nonlinear physicsmotivated problems that have no apriori known solution; instead, both the new algorithm and the system are studied simultaneously.
The analyzed input data is complex, sophisticated and the features are highly correlated. It consists of large 2dimensional rectangular images where each pixel carries multiple features (i.e., a feature vector). Generalization to 3dimensional data input is straightforward; here we focus on dimensional case only to simplify the visualization of the results. We describe each pixel with a set of restricted invariant physical features obtained from the initial data because image analysis features need to be as invariant as possible (kohonen2013essentials).
One simulation image snapshot consists of roughly million pixels with each pixel storing multiple physical quantities. We perform the clustering by using consecutive snapshots from the simulations, with equal time steps between the samplings (corresponding to roughly one eddy turnover time in physical units). This increases the total number of data points to roughly million pixels. For our segmentation analysis we select a dimensional data vector . These characteristics were chosen by relying on domain knowledge and on the fact that they capture most of the variability in the data.
Physically we expect that the perpendicular magnetic field averages out, , but has a finite rootmeansquare value,
. The quantity is, to a first approximation, seen to follow a normal distribution. The parallel component of the current,
, should be small except for a few highly localized regions; its distribution is therefore expected to strongly deviate from a Gaussian normal distribution. In our analysis we normalize the value of to the theoretical maximum that a uniform charged particle background can support, yielding . The quantity is seen to strongly deviate from a normal distribution due to heavy tails. The third feature, , is a measure of an energy transport from the electromagnetic fields to the plasma; for a volumetric dissipation we would expect it to be a constant—in reality its value is highly variable, reminiscent of the intermittency of the turbulence. In our segmentation analysis we normalize its value with its rootmeansquare value. The quantity deviates from a normal distribution due to overpronounced tails. All of the described features have nontrivial mutual correlations.Figure 1 visualizes the used simulation data. The visualizations shows five physical features processed from the raw simulation data. The upper left panel shows the whole simulation box consisting of a little over million pixels. The depicted feature is the plasma number density (in units of initial number density ) at a physical time of eddy turnover times. Rest of the images are zoomin views to the simulation box. Left panel of the second row shows the current along the outoftheplane axis, . Right panel of the second row shows the magnetic field strength perpendicular to the outofplane axis, . The lower left panel visualize the work done by the parallel electric field and lower right panel the plasma bulk Lorenz factor, . All of these visualized features have multiple selfsimilar structures that are clearly visible in Fig. 1. For example, we identify circular islands that coincide with a high signal in , and in . Another prominent feature are the currents sheets that correspond to a maximum in and , and a minimum in .
3 SOM algorithm
We start by presenting a short overview of the SOM algorithm and its process parameters, following kohonen2013essentials, Kohonen1989 and kohonenSOM. Next, we discuss some of its shortcomings that motivated the development of the subsequent stacking framework.
3.1 Theory
The data sample is described by a vector , , , where each element describes a set of input variables , such that , , , . We adopt a regular 2dimensional rectangular shaped Kohonen neural map of dimensions . kohonen2013essentials advises to select map dimensions such that they describe the lengths of the first principal components and for a bigger map size to detect fine structures. A rigorous choice for the dimensions and architecture of the map will result in a faster convergence.
Each neuron on the map is associated with a parametric vector
. The initial values for the parametric vectors of nodes may be sampled randomly from the domain of the input vector parameter space. A more sophisticated nomination could also be used, if needed (kohonen2013essentials; valova2013initialization).Let denote the distance metric used to determine the Best Matching Unit, (BMU) from the 2dimensional neuron map, which is the neuron most similar to the sampled data vector ,
(1) 
In this work we use the Euclidean distance on normalized data vectors and neuron models , but other measures can also be used. The data vector is compared to all the neurons . The neuron with the smallest Euclidean distance is chosen as the BMU.
A neighborhood consists of all the nodes up to a certain distance from a node on the neuron map, according to some geometric distance. The set usually shrinks with time and is determined by the neighborhood function . The choice of influences the map’s ability to order and learn the underlying data distribution (kohonenSOM; lee2002self). In the SOM algorithm learning step, both the BMU and its spatial neighbors learn from the input vector.
The learning step to modify the neuron weight vectors is given as
(2) 
where is a discrete time value. The value is the learning rate, which determines the statistical accuracy of the neuron map and the ordering of the neurons on the map. The function acts like a neighborhood function and for convergence as . Neighborhood function determines the rate of change for neurons on the map (kohonenSOM; kohonen2013essentials; stefanovivc2011influence).
Neurons in the neighborhood of the winning neuron will be updated in accordance to a chosen criterion during the learning step. During this the smoothing of the map takes place. The matching law used in Equation (1) and the updating law in Equation (2) need to be compatible.
The algorithm is stochastic, which means that the reliability of the learning is also dependent on the number of training steps. kohonenSOM proposes an empirical rule of thumb of times the count of network units (neuron map size) for the total number of training steps.
The result of the SOM is a dimensional neural map representing the dimensional input space. Each input sample vector has a neuron on the neuron map, which describes its parametric vector the best on the twodimensional map.
3.2 Stability of SOM clustering
The obtained clusters depend on the initial neuron map size (), learning rate , neighborhood function , training step size, distance metric, the rule for joining the neurons, sampling of the input vector , the nature of initialization of the neurons and the stochastic nature of the algorithm. Thus two SOM algorithm evaluations on the same data, applying same process parameters and initial functions, commonly converge to somewhat different clustering results (statisticsSOM). In addition we do not actually know, whether the algorithm has reached a global optimum.
We apply the SOM clustering method to astrophysical plasma simulation data ’images’. Our goal is to use a clustering algorithm to detect different physical structures in the data. We observe that the results obtained with the SOM algorithm are strongly dependent on the selection of process parameters, iteration steps and randomness of the initial conditions. This volatility of the segmentation is visualized in Figure 3 showing four independent SOM runs with only slightly different process parameter values. Notably, we obtain differing cluster sets for the same input image.
4 Stacking of SOMs in the SCE framework
We will now present a new theoretical description for stacking of many clustering evaluations in order to obtain more stable clustering regions. We call the algorithm the Statistically Combined Ensemble (SCE) since it is based on clustering results in not one map but in a statistically combined ensemble of independent maps. We emphasize that the method is not limited to the use of SOMs as the base algorithm. The SCE ensemble framework is specifically designed for unsupervised image segmentation ensembles.
In this case study, first we obtain independent SOM clustering results for astrophysical simulation image data. The SCE framework stacks cluster maps detected in independent SOM algorithm runs and gives an estimate to how well the observed cluster fits with other clusters detecting a similar structure. A key realization that enables to combine different SOM results is that we use the projected original images to perform the stacking operations; not the SOM results themselves that are bound to change from one realization to another. Metrics to estimate the goodness of fit between cluster maps are then constructed. As a result we obtain cluster elements that are the best detected in all the independent SOM runs.
4.1 Cluster masks
Applying the SOM algorithm to an image of size will result in dividing the original data into clusters. Each pixel , where and , on this image can be associated with a cluster from the set of detected clusters , where .
For every cluster in the set of clusters a mask can be defined. A mask is a boolean matrix , such that
(3) 
Elements in this mask matrix consists of values if the pixel is in the observed cluster and otherwise. Using the set of clusters we have now defined mask matrices . The mask matrix describes the locations and area of the structure on the image providing a mapping from the cluster groupings into the initial data view.
We can then apply the SOM algorithm independent times on the same image data. This means that for each pixel on the image we will have independent clusters. In other words, we have gained cluster sets , with elements. From these, we can then create independent sets of mask matrices , , , , , , according to Equation (3). We use a notation where the upper index of a mask matrix refers to the SOM realization index and the lower index to the detected cluster in that realization.
4.2 Masktomask stacking
Let us randomly select a set of masks , , , , from the set of independent mask sets . We call this mask set the base mask set. We compare each mask in to every other mask in the set . For every mask in we obtain comparisons. The comparison is performed between every mask in a randomly chosen base mask set against every other mask set independent of . This process is carried out as long as each set of masks has been chosen as the base mask set.
In supervised segmentation the obtained result is compared to the known truth about the data. In the unsupervised case the base mask matrix is compared to every other cluster realization from . The stacking of masks from on the base mask from is used to evaluate the reliability of the base mask.
4.2.1 Similarity measures
We define three different similarity measure matrices for a mask in the base mask set , with masks and , and a mask from a mask set , with masks, and . These matrices are: The union matrix , with possible values of , defined as
(4) 
where and . The intersection matrix , with possible values of , defined as
(5) 
where and . The sum matrix , with possible values of , is defined as
(6) 
where and .
We calculate these similarity measures between mask in the base mask set and every mask in the set .
This is done until all masks in the set have been chosen as a base mask set.
4.2.2 Signal strength
Using the similarity matrices we can construct a socalled signal strength measure. It quantifies how well the observed cluster resembles other clusters. We denote two independent clusters similar, if their mask matrices are identical.
The signal strength metric is identical to the Mean IntersectionoverUnion (MIoU) metric commonly used in supervised segmentation. In the supervised learning the MIoU metric is used to compare the detected segmented objects to the ground truth — information that is lacking in the unsupervised learning case. The metric is defined between independent realizations of unsupervised clustering algorithms. Every evaluation set is taken as the ground truth and each of these ”true” cluster masks matrices are compared to every other cluster result from independent realizations.
The signal strength gives an estimate for the strength of intersection of the mask matrices, measuring the intersection in comparison to the union of the two masks. For any base mask matrix and a random mask , where , the signal strength scalar is defined as
(7) 
The signal strength scalar weighs the intersection matrix (Equation (5)) of two masks with their union matrix (Equation (4)). This is done to eliminate the cluster size dependency; large clusters will likely have more common pixels with any other clusters in independent SOM runs, so we need to counter this effect by normalizing the quantities with the union measure.
If , then . This means that the two masks have the value in the same locations and in same quantity in their matrices. In that case, these two independent SOM algorithm evaluations have detected the same shape structures in the same pixel locations.
The signal strength scalar is a normalized quantity, which makes it a quantitative measure to order and rank different masks. The mask will have elements in the vector of all signal strength scalars, a vector noted as . Each mask in the set of all masks will obtain a signal strength vector. Using this measure we can order the masks.
4.2.3 Quality of cluster unions
Another accompanying measure to the strength is the socalled quality measure as it gives an estimate to the quality of the union of the stacked masks. The best quality measure for a specific cluster corresponds to having a union of two masks close to their intersection. This corresponds to a small residual area, area of the symmetric difference in set theoretical notion, between the masks. This means that their mask matrices overlap perfectly resulting in a union that is equal to the intersection. The worst quality measure, on the other hand, corresponds to a case when the area of the union of two masks is close to the area of their sum.
For any base mask matrix and a random mask , where , the quality scalar is defined as
(8) 
The value of quality scalar goes to zero as , which means the two independent stacked masks and have detected exactly the same structure in the same locations on the image. On the other hand, as goes to unity, the element sum of the union matrix and element sum of the sum matrix is equal. This means that the intersection of the two masks is negligible and they correspond to completely different structures. The union quality measure can be used to discard clusters from independent SOM runs that have detected completely different structures from other evaluations.
This metric is similar, but not equivalent, to the Dice coefficient that is widely used in the supervised image segmentation. Using the intersection and union matrices defined in this paper, the Dice coefficient is defined as . The Dice coefficient is positively correlated with the MIoU whereas the of cluster masks is negatively correlated to the MIoU, or in our concept.
4.3 Stacking of multiple masks
In this subsection we will generalize the masktomask comparisons to general quantities averaged over the complete set of independent realizations. To do this, we combine the signal strength and quality of union . We will denote this matrix with
, as it estimates the goodness of the fit between independent cluster masks. The measure peaks in areas, where the base mask has often a high percentage of its area in common with other detection algorithm clusters. This results in highlighting the highest probability structures on the image. The measure gives a quantitative value to estimate the goodness of fit between independently detected cluster masks. Cluster masks, which have detected the same structures, will fit together well and contribute more to the total sum of the measure. Similarly, cluster masks that do not represent the same physical structures end up not contributing significantly to the total integrated goodness measure for that base mask.
In subsections 4.2.2 and 4.2.3 a mask in the base mask set is combined with every mask in the set . This corresponds to signal strengths and qualities of union . The goodness of fit of a cluster mask to any other cluster mask from , is defined as
(9) 
where and are the signal strength scalar and quality of union scalar of the masks and . The matrix refers to all the pixels on the image that belong to either of these two cluster masks. For an observed mask in the base mask set we obtain matrices according to the Equation (9), which have the corresponding quotient value in the union of the two compared masks. Then for every base mask in all of its matrices can be summed together to yield
(10) 
where . This is done for every base mask, until all mask sets in have been chosen as a base mask set. We then obtain summed goodness of fit matrices . Each of them characterizing the fit of a cluster mask to all other cluster masks detected in other independent SOM cluster evaluations.
By summing the quotients of the signal strength and the union quality in the Equation (10), we will accumulate value to pixels, where the base mask and other independent masks have detected a structure. The value added to the by a base mask and a compared mask will be high for those pairs, that have a similar number of pixels assigned to the cluster and their masks have value in same location. The contribution from mask pairs, which detect distinct structures on the image, is negligible since the will be close to zero in value for those cases.
We apply the stacking framework on clusters detected by the SOM algorithm. From SOM results we attain in total cluster masks. After calculating the for a randomly chosen base mask , each pixel on the image obtains a value between and illustrating its stability of belonging to the cluster with that shape and location. In a sense every cluster realization from all of the SOM realizations is viewed as the ground truth for a cluster.
Next we define a scalar value for the goodness of fit for as the following
(11) 
where the sum is taken over all the comparisons between and every other mask in .
By ordering the scalar of all cluster masks, we can find the base mask, which has learned most of the information that describes this cluster. In other words, this mask fits the best with all the other independent realizations of that cluster. Same cluster realization base masks will likely follow the winning base mask stack on the ordering of the
value. Correlated classifiers for clusters from an unsupervised ensemble framework indicate an accurate classification
pmlrv51jaffe16; Platanios2017; ROKACH2009. A base mask detecting a cluster from the image is a classifier for a given object. This means that cluster masks with values that are highly correlated indicate that the cluster realization is accurate.Therefore, same clusters are bound to have similar goodness of fit measures; a large change in its value indicates a physically different cluster group or an nonaccurate classifier for the given group.
4.4 Summary of the SCE algorithm
SCE framework is summarized in Figure 2. The input data is a set of simulation images obtained from an astrophysical simulation, each pixel has 3 different physical features attached. These images are fed into N independent unsupervised clustering algorithms, to obtain cluster realizations, with elements (in this work the SelfOrganizing Map was used). Clustering results are then deconstructed into sets of mask matrices
where each mask represent a cluster detected in the image in the specific clustering run. These masks are then compared with all other independent cluster masks with a quality of union metric and signal strength metric. These similarity measures are combined together to create a goodnessoffit metric the . Then for each cluster mask all of its matrices are summed together to yield the matrix. These stacked matrices for each detected cluster are the output of the SCE framework.
5 Applying SOM and SCE
In this section we apply the SOM and then the SCE method on complex images of turbulent magnetically dominated collisionless plasma simulations described in Section 2. The images we dissect — and that inspired us to develop the discussed method — are 2dimensional computer simulation images. Our main aim is to dissect the simulation snapshot pixels into distinct physical structures based on the similarity of their feature vectors. We are, in particular, interested in automating the grouping of pixels into three different physical shapes: magnetic flux tubes (islands), current sheets (long stripes), and background medium (see e.g., Zhdankin2017; Comisso2018; Nattila2019, for a more detailed discussion of the physics).
Similar problem setups can also be envisioned in many other fields of science like, for example, in medicine where medical images need to be dissected and segmented into different structures automatically.
5.1 Results of SOM segmentation
We apply the SOM algorithm to consecutive simulation image snapshots. We use rectangular shaped neuron map with dimensions . The learning rate values are chosen to be , , or . Number of total iterations is chosen to be , , , , or .
SOM algorithm is applied on the studied images with all the possible aforementioned parameter combinations. This gives independent SOM clustering results for the astrophysical plasma simulation images. Figure 3 visualizes results from four representative SOM clustering algorithm realizations projected back to the original image view. The resulting clusters can be directly compared to Figure 1. The clusters detected by the SOM algorithm on Figure 3 correspond to distinct regions in the original images that are also visible in Figure 1.
The output of the algorithm is dependent of the randomized nature of the initial neural map, sampled input data and the process parameters. As a consequence, the resulting clusters differ for each SOM outcome. The geometric sizes of resulting segmented cluster regions vary significantly between independent realizations.
5.2 Results of SCE segmentation
In our showcase we stack a set of SOM cluster masks, which were obtained from the independent SOM runs with different process parameters. The goodnessoffit metrics described in Section 4 were used to find most stable structures in the image and discard bad SOM maps.
As an example, Figure 4 visualizes two cluster maps of two SOM algorithm results labeled as map and map . Map has detected clusters and map clusters. Thus, the corresponding mask sets are , , , , and , , . By taking the cluster mask from map and cluster mask from map , we can calculate the intersection matrix (Equation (5)) and union matrix (Equation (4)). Figure 4 bottom row images show the intersection and union matrices for these masks. The pixels colored black correspond to the value and white to . The matrix highlights the common pixels of and . The matrix highlights pixels belonging to both and . One can see that and are similarly shaped and have similarly positioned structures, suggesting that map and map have detected the same structure from the input image.
The signal strength (Equation (7)) for base mask and a mask is shown on the left panel of Figure 5. The metric compares the fraction of overlap between mask and mask (see Figure 4). Value of the quantity is the highest on the location where the two masks overlap.
The right panel of Figure 5 visualizes the quality of the union metric (Equation (8)). The depicts how well the two cluster masks and align on top of each other. If their position is exactly the same, then the value for the pixels approaches . This corresponds to having identical masks. In Figure 5 the pixels with values close to indicate a good fit between the two masks.
Figure 6 shows the total integrated scalars of goodness of fit, (Equation (11)) for all the masks detected by the SOM evaluations. We have visually inspected the resulting clusters and recovered the major physical structures from these results: background pixels, circular islands, and sheets.
The background plasma (red points in Figure 6) is detected the best among all the independent SOM algorithm evaluations, since the sum of the stacked values are the highest. The second best cluster detected is that of the magnetic flux tubes or islands, (darkblue diamonds in Figure 6). The third best detected structure is the current sheets (violet triangles in Figure 6). This manual classification is seen to correlate fairly well with the corresponding goodnessoffit value. A large change in the value of is seen to match well with the change of the physical meaning of the clusters. As was discussed in Section 4.3, this fact could be further used to group different clusters between different SOM realizations together, since a same structure is expected to have a similar even if a different base map is used. Additionally, highly correlated base masks detecting same physical structures indicate a high accuracy for detecting the structure.
We also see that after the cluster ranked th there is a sharp drop in the value of indicating that these clusters are only weakly correlated with other clusters in the SCE. We use this fact to discard these points and select only the masks with rank as having a meaningfully strong signal (in comparison to being just noise). A statistically robust method to analyse, group, and select different data clusters based on their goodnessoffit metrics is left for future work.
5.3 Advantage of applying the SCE
The best base mask clusters (highest values) with their corresponding goodnessoffit matrix values are visualized in Figure 7. Here each pixel has accumulated a value (see Equation (10)) to the matrix from every comparison in the SCE. These four panels describe the four best detected distinct clusters in the input data images. These matrices are the main outcome of the SCE algorithm and can be used at least in three different ways to help with the data analysis. The four ”winning” clusters of each physical cluster set are denote with star symbols in the Figure 6.
Firstly, the activated pixels in each images are visually seen to capture each empirically defined classification category; the benefit of the framework is that this work is now fully automated as it could also be performed by just ranking the masks by their . In the practical sense, these matrices can now be used for easily viewing the different physical structures that emerge from the clustering analysis.
Secondly, the resulting matrices provide a clear way of defining the actual segmentation boundaries: we can now set a direct contour cutoff value for the providing a quantitative way of selecting which pixels belong to which cluster. This is especially helpful since we need to perform statistically robust geometrical analysis of the resulting shapes.
Thirdly, throughout this analysis we have used the million pixels composed of a feature vector , , . However, nothing guarantees that these features are the best to detect these physical shapes. Therefore, the presented framework can also be used to determine the best combination of the features for detecting the clusters of interest from the observed images. In practice, the goodness of fit measures can be compared between the different SOM algorithm realizations that have been trained with distinct feature vectors on the same input data. The actual numerical value can be easily used to rank the different feature vector combinations. We leave this analysis for further work.
5.4 Analysis of SCE results
The SCE framework was designed to combine decisions of independent unsupervised clustering results for image data. The clusters present in our astrophysical plasma simulations represent fine geometrical structures. The SCE algorithm is designed to tackle the prevailing issue of not gaining robust regions of interest for the geometrical structures of interest. This is especially important for fine geometrical shapes, whose geometrical features such as width and length are of interest.
Figure 8 compares clusters from two independent ensembles, , and , and different SOM segmentations to asses the quantitative improvement of using the proposed method. It shows pairwise comparisons of all clusters detected by the different methods. We use the signal strength as our comparison metric. It is mathematically defined for the comparison of independent unsupervised clustering realizations in Section 4.2.2 (Equation 7). The signal strength describes the amount of pixels similarly classified by the two classifiers in relation to the union of the compared clusters. The metric is equal to the IOU metric defined in supervised learning. Each diamond on the Figure 8 represent the value between one cluster from and another cluster from . The estimates the agreement between two independent cluster realizations. If the clusters consist of pixels describing the same phenomenon on the image, their will be close to 1. For the contrary, if the clusters describe completely different phenomenon, the metric will be close to 0.
As described in Section (4.3) the SCE framework actually results in a continuous scalar for each pixel on the image, and we therefore have a freedom in selecting how to quantize each mask. We have tested that the SCE threshold level does not play an important role in the results by testing threshold values of and (blue vs. orange diamonds).
Figure 8 demonstrates that the SCE algorithm is capable of stabilizing the regions of interest for geometrical structures on images, the for vs is close to 1 for many clusters. The for the pairwise comparisons of independent clusters in SOM set 1 and in SOM set 2 are significantly lower and they reach the highest agreement of around 0.8. We therefore conclude that the independent clusters from two different SCE results agree systematically better than the independent SOM cluster realizations.
We also note that the distributions of for all comparisons are actually bimodal, as expected theoretically. The difference originates from comparing masks that have detected the same cluster (corresponding to high similarity and therefore high ) and from comparing physically different clusters (poor similarity and low ). The cutoff value is present around for SCE comparisons and at for SOM comparisons.
6 Conclusion
We use computer vision and machine learning tools to automate the segmentation of physical structures from 2dimensional image snapshots originating from large supercomputer simulations of fullykinetic turbulent plasma flows. Machine learning clustering algorithms provide a fast and powerful method for detecting and segmenting visually distinct structures in data. This makes them an attractive choice for automating data segmentation of computer simulation results: Firstly, such algorithms are typically fast to evaluate which allows to couple them directly to the simulation time advance loops. Secondly, unsupervised algorithms need very little initial input from the user.
Many such clustering algorithms are, however, nondeterministic and so the end result depends on the initial conditions and different technical process parameters that are used. This can be an especially prohibiting feature in science applications where accurate and stable ROI boundaries are needed for e.g., in order to perform geometrical measurements of the segmented objects. Here we designed a new machine learning framework that combines clustering results from multiple different segmentation realizations. By averaging segmentation results from many independent clustering realizations the presented Statistically Combined Ensemble (SCE) algorithm can yield much more accurate ROI objects. The SCE algorithm can be used to

determine optimal number of meaningful clusters present in the input data,

uniquely identify each pixel with the corresponding object cluster, and

segment clustered pixels into stable contours that can be further analyzed for their geometrical shape, size, and area.
The SCE algorithm uses image cluster masks as a base unit for the stacking operations; each individual clustering realization (obtained here via SOM algorithm) can be split into different image masks where only the pixels from one specific cluster are ’active’. We defined stacking operations of these masks by using the union, intersection, and sum of two masks. These quantities are shown to reflect the cluster signal strength and quality of the mask matching in each comparison. The resulting quality and strength measures can be combined with a weighted average over the complete SCE. This procedure enables to stack and combine information from many parallelly evaluated clustering realizations. This, in turn, enhances the accuracy of the detection algorithm and makes it suitable for use in many science applications where accurate and stable ROI boundaries are needed.
In the future, we plan to apply the SCE method for studying the spatial and temporal properties of intermittent turbulent structures found in the simulations discussed.
Acknowledgements
We thank Pekka Manninen for useful discussions related to designing the stacking operations and the two anonymous referees for their comments and suggestions that helped to improve the paper. MB would like to thank Elmo Tempel and Radu S. Stoica for support and discussions. MB acknowledges support from NORDITA via the Visiting PhD program and JN via the NORDITA Postdoctoral Fellowship. MB acknowledges the financial support by the institutional research funding IUT402 of the Estonian Ministry of Education and Research and the support by the Centre of Excellence “Dark side of the Universe” (TK133), which is financed by the European Union through the European Regional Development Fund. The work of MB was also supported by the European Research Council Consolidator grant 682068PRESTISSIMO. The work has been partially performed under the Project HPCEUROPA3 (INFRAIA 20161730897), with the support of the EC Research Innovation Action under the H2020 Programme. The simulations were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) at PDC.