Analyzing causality is one of the central tasks of science, since it informs decision making in domains as diverse as the natural, social and health sciences. Causality is the relationship between two events in which changes in one (the cause) trigger changes in the other (the effect). In our previous work, a causal analysis model was developed for analyzing the causality of multivariate, nonlinear and unlabeled data. In that model, separate Self-Organizing Maps (SOMs) for the input and output data sets were networked through a weight association based on the similarity of the connected prototype feature vectors. Given such SOMs, similarity weights conditional on a given input can be assigned to the neurons of the output SOM. Such a weighted SOM pattern of the output is described by two types of information: the weight distribution and the property (prototype feature vector) distribution. To assess how the output changes when the input changes, it is therefore crucial to measure the property difference between weighted SOM patterns.
There have been many attempts to measure the dissimilarity between two distributions (patterns), such as the Minkowski and Shannon entropy families [1, 4, 6], the Quadratic Form Distance and the Earth Mover’s Distance (EMD). Because the weights of a weighted SOM pattern are attached to adaptive neurons rather than to fixed bins, we found the EMD to be the most suitable of these for measuring the dissimilarity of weighted SOM patterns. Like the others, however, the EMD yields a single numerical value that captures only the overall resemblance of two patterns, so it cannot differentiate between weighted SOM patterns that have the same dissimilarity but different properties. We therefore introduce a method, called Property EMD (PEMD), that measures the property difference through the individual feature differences in the pattern change. Even so, it remains difficult to represent and compare the overall property dissimilarity of the change for high dimensional data, and to observe the possible feature values that drive the pattern change.
Because of these limitations of quantitative approaches, we propose a visualization approach, used together with the PEMD, for measuring the property change of weighted SOM patterns. Our visualization integrates colors and star glyph shapes to represent the pattern information in the change. With this approach, the property dissimilarity of weighted SOM patterns can be captured in terms of the size and the direction of the change, and the possible feature values that drive the pattern change can be observed by exploring regions of interest in the weighted SOM patterns. Ecological data are used to demonstrate that our approach is useful for comparing pattern properties in the pattern change. The experimental results show that our approach provides, in an effective visual way, the change information needed for causal analysis.
A Self-Organizing Map (SOM) projects high dimensional data onto a low dimensional (typically 2-dimensional) grid map space. The neurons of the map hold prototype feature vectors that are adapted to the original feature vectors, so that they reflect the data properties. Using the causal analysis model of our previous work, a weight distribution is estimated over the property distribution of the output SOM for a given input. Fig. 1 shows a simple example illustrating the two types of information in a weighted SOM pattern: the weight distribution and the property distribution. The SOM in Fig. 1 is used because its 3-dimensional RGB color property and neuron positions are easy to visualize. Based on this SOM, several weighted SOM patterns (Fig. 1) are created with different color opacity values (weights). Such a weighted SOM pattern can be written as $P = \{(\mathbf{p}_1, w_{p_1}), (\mathbf{p}_2, w_{p_2}), \ldots, (\mathbf{p}_m, w_{p_m})\}$, where $m$ is the number of neurons; $\mathbf{p}_i = (p_{i1}, p_{i2}, \ldots, p_{in})$ is the $i$-th prototype feature vector ($1 \le i \le m$), with $n$ the number of features and $p_{ik}$ its $k$-th component ($1 \le k \le n$); and $w_{p_i}$ is the $i$-th weight. Together, the prototype feature vectors and the weights represent the two types of information.
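The pattern notation above maps naturally onto a small data structure. The sketch below is only illustrative. The class name `WeightedSOMPattern`, the toy 2x2 RGB map and the normalization of weights to sum to 1 are all assumptions, not part of the described model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WeightedSOMPattern:
    """One (prototype feature vector, weight) pair per neuron of a SOM.

    prototypes[i] plays the role of p_i = (p_i1, ..., p_in), weights[i] of w_{p_i}.
    """
    prototypes: List[Tuple[float, ...]]
    weights: List[float]

    def __post_init__(self) -> None:
        if len(self.prototypes) != len(self.weights):
            raise ValueError("need exactly one weight per neuron")
        if len({len(p) for p in self.prototypes}) != 1:
            raise ValueError("all prototypes must have the same number of features")
        if any(w < 0 for w in self.weights):
            raise ValueError("weights must be non-negative")

    @property
    def m(self) -> int:
        """Number of neurons."""
        return len(self.prototypes)

    @property
    def n(self) -> int:
        """Number of features per prototype."""
        return len(self.prototypes[0])

    def normalized(self) -> "WeightedSOMPattern":
        """Copy whose weights sum to 1 (an assumption of this sketch)."""
        total = sum(self.weights)
        return WeightedSOMPattern(self.prototypes, [w / total for w in self.weights])


# Hypothetical 2x2 RGB SOM shared by all patterns; opacity values act as weights.
rgb_som = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
reference = WeightedSOMPattern(rgb_som, [3.0, 1.0, 0.0, 0.0]).normalized()
changed = WeightedSOMPattern(rgb_som, [0.0, 1.0, 3.0, 0.0]).normalized()
```

Because every pattern shares the same prototypes, two patterns built from one SOM differ only in their weights. This matches the setting in which they are compared.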
The perceptual dissimilarity between two weighted SOM patterns in Fig. 1 can be judged by observing the color properties of the highly weighted neurons. For example, two of the patterns differ in their color properties even though they are at the same distance from the reference pattern from which both are changed. Another pattern reflects its change from two perspectives, through two different highlighted color properties. Such differences can be measured using the 3-dimensional color property. Nonetheless, it is still difficult to estimate the size and the direction of the change with respect to the pattern property, and such differences become even harder to measure in higher dimensions.
There have been many methods to quantitatively measure the dissimilarity between two distributions (patterns) of high dimensional data. The most widely used families of functions are the Minkowski and the Shannon entropy families [1, 6]. However, these functions do not match perceptual dissimilarity well. They compare only the weights of corresponding fixed bins and ignore the similarity information across neighboring bins, such as the adaptive neurons of weighted SOM patterns. Among functions that do use information across bins, the Quadratic Form Distance (QFD) and the Earth Mover’s Distance (EMD) are the most widely used. The QFD, however, tends to overestimate the dissimilarity of patterns, because the weight of each bin is compared simultaneously with the weights of all bins. The EMD, in contrast, finds the minimum-cost flow of weight between bins, using the ground distance between their feature vectors as the cost, which provides a better perceptual match.
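The difference between bin-by-bin and cross-bin measures shows up in a toy 1-D example. The sketch below assumes evenly spaced fixed bins with unit ground distance between neighbors, which simplifies the adaptive SOM neurons. The function names are hypothetical. The L1 (Minkowski) distance is the same whether the weight moves one bin or three, whereas the EMD grows with the distance moved.

```python
def minkowski_l1(h1, h2):
    """Bin-by-bin L1 distance: only corresponding bins are compared."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def emd_1d(h1, h2):
    """EMD between equal-mass histograms on evenly spaced 1-D bins.

    With unit ground distance between adjacent bins, the minimum-cost flow
    carries the running surplus (or deficit) of h1 over h2 across each gap.
    """
    cost, carried = 0.0, 0.0
    for a, b in zip(h1, h2):
        carried += a - b      # surplus that must still move past this bin
        cost += abs(carried)  # moving it across the next gap costs |carried|
    return cost


ref = [1.0, 0.0, 0.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0, 0.0, 0.0]  # all weight shifted by one bin
far = [0.0, 0.0, 0.0, 1.0, 0.0]   # all weight shifted by three bins
print(minkowski_l1(ref, near), minkowski_l1(ref, far))  # 2.0 2.0
print(emd_1d(ref, near), emd_1d(ref, far))              # 1.0 3.0
```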
Like the other functions, the EMD provides a single numeric value that captures only the overall resemblance of two patterns. This overall dissimilarity cannot differentiate between weighted SOM patterns whose feature vector distances and weight distributions stand in the same relation but whose feature values differ. An example is the two patterns in Fig. 1 that are at the same distance from the same reference pattern. Such patterns can, however, be further differentiated by information about the pattern property in relation to the weight distribution. Moreover, it is difficult to identify the two different properties in one pattern that drive its change to another (Fig. 1). To address these issues, we propose a visualization approach based on an EMD-based metric for measuring the property change of weighted SOM patterns.
3 Our Approach
In this section, we propose a metric for measuring the property difference of weighted SOM patterns, together with a visualization approach that uses it to capture the change information based on the property dissimilarity.
3.1 A Metric for Pattern Property Difference
In order to measure the pattern property difference, we introduce an extension of the EMD, called Property EMD (PEMD). Building on the EMD as a measure of pattern dissimilarity, the PEMD measures the individual feature differences in the pattern change.
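The EMD between two weighted SOM patterns $P = \{(\mathbf{p}_i, w_{p_i})\}$ and $Q = \{(\mathbf{q}_j, w_{q_j})\}$ is defined as follows:

$$\mathrm{EMD}(P, Q) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{m} d_{ij} f_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{m} f_{ij}}$$

where $d_{ij} = d(\mathbf{p}_i, \mathbf{q}_j)$ is the ground distance function. $F = [f_{ij}]$ is the flow that minimizes the total cost $\sum_{i=1}^{m} \sum_{j=1}^{m} d_{ij} f_{ij}$ under four constraints: $f_{ij} \ge 0$ for $1 \le i, j \le m$; $\sum_{j=1}^{m} f_{ij} \le w_{p_i}$ for $1 \le i \le m$; $\sum_{i=1}^{m} f_{ij} \le w_{q_j}$ for $1 \le j \le m$; and $\sum_{i=1}^{m} \sum_{j=1}^{m} f_{ij} = \min\left(\sum_{i=1}^{m} w_{p_i}, \sum_{j=1}^{m} w_{q_j}\right)$. The weighted SOM patterns $P$ and $Q$ are based on the same SOM; thus, they have the same number ($m$) of neurons and their weights sum to the same total.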
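Based on the EMD, the difference in the $k$-th feature of $Q$ for a given $P$ can be measured by a function $\mathrm{PEMD}_k$ as follows:

$$\mathrm{PEMD}_k(P, Q) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{m} f_{ij} \, (q_{jk} - p_{ik})}{\sum_{i=1}^{m} \sum_{j=1}^{m} f_{ij} \, d_{ij}}$$

where $f_{ij}$ and $d_{ij}$ are the optimal flow and the ground distance of the EMD, and $p_{ik}$ and $q_{jk}$ are the $k$-th components of $\mathbf{p}_i$ and $\mathbf{q}_j$.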
The direction of the feature change is preserved because the difference is signed: it is positive for an increase and negative for a decrease. The resulting feature difference in the change is then normalized by the total work of the EMD flow, so that pattern changes with larger overall differences are not favored in the comparison.
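The EMD and PEMD described above can be sketched in plain Python. This is a minimal sketch under several assumptions:

- A pattern is a `(prototypes, weights)` pair, and both patterns share the same SOM and total weight.
- The ground distance is Euclidean.
- The per-feature PEMD is the flow-weighted signed change of feature $k$ divided by the EMD work.
- The maximum EMD used for scaling is the largest distance between two prototypes.

All function names are hypothetical. The flow is found with a small successive-shortest-path min-cost-flow routine rather than a dedicated transportation solver.

```python
import math

INF = float("inf")


def _transport(wp, wq, dist, eps=1e-12):
    """Minimum-cost flow f[i][j] moving the weights wp onto wq (equal totals).

    Successive shortest paths with Bellman-Ford on the residual graph: a
    plain solver for small illustrative maps, not tuned for large SOMs.
    """
    m, n = len(wp), len(wq)
    src, snk = m + n, m + n + 1
    graph = [[] for _ in range(m + n + 2)]  # edge = [to, capacity, cost, reverse index]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0.0, -cost, len(graph[u]) - 1])

    for i in range(m):
        add_edge(src, i, wp[i], 0.0)
        for j in range(n):
            add_edge(i, m + j, INF, dist[i][j])
    for j in range(n):
        add_edge(m + j, snk, wq[j], 0.0)

    while True:
        best, prev = [INF] * len(graph), [None] * len(graph)
        best[src] = 0.0
        for _ in range(len(graph) - 1):  # Bellman-Ford: residual costs can be negative
            updated = False
            for u, edges in enumerate(graph):
                if best[u] == INF:
                    continue
                for k, (v, cap, cost, _rev) in enumerate(edges):
                    if cap > eps and best[u] + cost < best[v] - eps:
                        best[v], prev[v], updated = best[u] + cost, (u, k), True
            if not updated:
                break
        if best[snk] == INF:  # no augmenting path: all weight has been moved
            break
        push, v = INF, snk
        while v != src:  # bottleneck capacity along the shortest path
            u, k = prev[v]
            push, v = min(push, graph[u][k][1]), u
        v = snk
        while v != src:  # augment: consume forward capacity, open the reverse edge
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u

    flow = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for v, _cap, _cost, rev in graph[i]:
            if m <= v < m + n:  # forward edge i -> j; its flow sits on the reverse edge
                flow[i][v - m] = graph[v][rev][1]
    return flow


def _solve(P, Q):
    """Optimal flow, ground distances and total work between two patterns."""
    (p_vecs, wp), (q_vecs, wq) = P, Q
    dist = [[math.dist(p, q) for q in q_vecs] for p in p_vecs]  # Euclidean (assumed)
    flow = _transport(wp, wq, dist)
    work = sum(f * d for f_row, d_row in zip(flow, dist) for f, d in zip(f_row, d_row))
    return flow, dist, work


def emd(P, Q):
    """EMD = total work divided by total flow."""
    flow, _, work = _solve(P, Q)
    return work / sum(map(sum, flow))


def scaled_emd(P, Q):
    """EMD divided by the largest prototype distance, taken here as the maximum EMD."""
    flow, dist, work = _solve(P, Q)
    return work / sum(map(sum, flow)) / max(map(max, dist))


def pemd(P, Q):
    """PEMD_k per feature: flow-weighted signed change of feature k over the EMD work."""
    (p_vecs, _), (q_vecs, _) = P, Q
    flow, _, work = _solve(P, Q)
    n_features = len(p_vecs[0])
    if work == 0.0:  # nothing had to move, so no feature changed
        return [0.0] * n_features
    return [sum(flow[i][j] * (q_vecs[j][k] - p_vecs[i][k])
                for i in range(len(p_vecs)) for j in range(len(q_vecs))) / work
            for k in range(n_features)]


# Hypothetical black/gray/white map: all weight on gray vs. half on black, half on white.
gray_som = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]
ref, split = (gray_som, [0.0, 1.0, 0.0]), (gray_som, [0.5, 0.0, 0.5])
print(emd(ref, split), pemd(ref, split))  # positive EMD, all-zero PEMD
```

Both patterns have equal total weight, so the row and column sums of the flow equal the weights. Under this assumed form, the PEMD numerator therefore reduces to the difference between the weighted feature means. An increase and a decrease of the same size cancel out, which is why the usage example gives a positive EMD with zero PEMD.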
Table 1. Scaled EMD and per-feature PEMD for the pattern changes in Fig. 1
| Pattern change | Scaled EMD | PEMD (R) | PEMD (G) | PEMD (B) |
The individual feature differences of the weighted SOM patterns in Fig. 1 are measured by the PEMD for the property comparison. The EMD is also measured, and scaled by the maximum EMD of the data space, for the dissimilarity comparison. As the results in Table 1 show, the individual feature differences explain the property difference between the two patterns that are changed from the same reference pattern and have the same EMD. However, they do not explain why another pair of patterns differ according to the EMD even though their individual feature differences show no property difference. Such patterns can be further explained by the possible feature values that drive the pattern change. Furthermore, the overall property dissimilarity is difficult to compare when the dimensionality is high. Therefore, in the next section we propose a visualization approach that uses the property difference measured by the PEMD to better analyze the pattern change of high dimensional data.
3.2 Visualization of Pattern Property Changes
Our visualization integrates colors and graphical objects, which are perceptually orthogonal, to represent the pattern dissimilarity of high dimensional data. Hue colors and star glyph shapes are used to display a weighted SOM pattern. The scaled hue colors in Fig. 2 indicate high weights in red and low weights in blue. Each prototype feature vector is mapped to a star glyph shape that represents the property of its neuron in the SOM. A star glyph has evenly angled branches emanating from a central point, with the dimensions in the same order in every glyph. The length of each branch marks the value along the dimension it represents, and the value points are connected to form a bounded polygon. The patterns of Fig. 1 are illustrated in this way in Fig. 2. Because shape is used as a single visual parameter, the property difference between neurons is easier to recognize by considering only the shape variations in the fixed orientation. The perceptual dissimilarity of the hue colors also gives the weights clear boundaries. This facilitates the user selection of regions of interest for understanding the main properties of the weighted SOM patterns.
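The glyph layout and weight coloring described above can be sketched as follows. Several choices here are assumptions made for illustration: the starting angle (12 o'clock), the clockwise branch order, the pre-scaling of feature values to [0, 1] and the blue-to-red HSV hue ramp. `star_glyph` and `weight_to_rgb` are hypothetical names.

```python
import colorsys
import math


def star_glyph(values, radius=1.0):
    """Vertices of a star glyph polygon for one prototype feature vector.

    Branch k points at an evenly spaced angle (same feature order for every
    glyph) and its length is the feature value, assumed pre-scaled to [0, 1].
    Connecting the vertices in order, and back to the first, gives the shape.
    """
    n = len(values)
    vertices = []
    for k, value in enumerate(values):
        angle = math.pi / 2 - 2 * math.pi * k / n  # first branch at 12 o'clock, then clockwise
        vertices.append((radius * value * math.cos(angle), radius * value * math.sin(angle)))
    return vertices


def weight_to_rgb(weight, w_min, w_max):
    """Hue scale for neuron weights: blue for the lowest weight, red for the highest."""
    t = 0.0 if w_max == w_min else (weight - w_min) / (w_max - w_min)
    hue = (1.0 - t) * 2.0 / 3.0  # in colorsys HSV, hue 2/3 is blue and hue 0 is red
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)


# Hypothetical neuron with four features and a weight in the range [0, 0.4].
glyph = star_glyph([1.0, 0.5, 0.0, 1.0])
color = weight_to_rgb(0.4, 0.0, 0.4)
```

Keeping the branch order and orientation fixed for every glyph is what lets shape differences alone signal property differences between neurons.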
The property dissimilarity between weighted SOM patterns can also be visualized by integrating colors and star glyph shapes. As shown in Fig. 2, a star glyph shape built from the individual feature differences is superimposed on a star glyph frame with one branch per feature, indicating the overall property dissimilarity. For the visualization, the PEMD value of each feature is scaled to a common range. The average of the PEMD values indicates the direction of the overall property change and is applied to the color saturation. The property shape is filled with the direction color: red for an increase, blue for a decrease and white for no change. The direction color can thus differentiate between property changes that have the same property shape but opposite directions in some of the individual feature changes. The possible value changes in each feature can be visualized for user-selected regions (e.g., highly weighted regions). They are indicated in red for an increase and blue for a decrease, with empty dots for the reference pattern and full dots for the changed pattern. A red or blue line from the center to 0.1 indicates whether the individual feature increases or decreases, respectively. The overall property dissimilarity of the patterns can then be captured by the variation of the property shape, together with this change information, in the fixed orientation. The EMD is also shown as a gray-scale fill of the frame for the dissimilarity comparison. Fig. 2 shows that two patterns differ according to the EMD although there is no property difference in the change. This can be explained by observing the change, which shows possible changes in two features: an increase and a decrease of the same size that leave the pattern property unchanged. Fig. 2 also shows that two patterns obtained from the same reference pattern are very different, because their property shapes differ even though they show the same EMD and the same shape size and direction color.
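The following sketches how per-feature PEMD values could be turned into the property shape and its direction color. It assumes the PEMD values lie in [-1, 1], with `limit` as the clipping bound. It blends white toward red or blue, using the magnitude of the average PEMD as the saturation. This scaling and color ramp are assumptions, and the function names are hypothetical.

```python
def property_shape(pemd_values, limit=1.0):
    """Branch lengths and per-feature directions of the property shape.

    Lengths are |PEMD_k| clipped to `limit` and scaled to [0, 1]; directions
    are +1 (increase), -1 (decrease) or 0 (no change) for the branch lines.
    """
    lengths = [min(abs(v), limit) / limit for v in pemd_values]
    directions = [(v > 0) - (v < 0) for v in pemd_values]
    return lengths, directions


def direction_color(pemd_values, limit=1.0):
    """RGB fill of the property shape from the average PEMD value.

    Positive average: red, negative: blue, zero: white; the saturation grows
    with the magnitude of the average, clipped to `limit`.
    """
    avg = sum(pemd_values) / len(pemd_values)
    s = min(abs(avg), limit) / limit
    if avg > 0:
        return (1.0, 1.0 - s, 1.0 - s)  # white blended toward red
    if avg < 0:
        return (1.0 - s, 1.0 - s, 1.0)  # white blended toward blue
    return (1.0, 1.0, 1.0)  # no overall change


lengths, directions = property_shape([0.5, -0.25, 0.0])
fill = direction_color([0.5, -0.25, 0.0])
```

Two changes with identical branch lengths but opposite signs end up with different fill colors. This is the distinction the direction color is meant to carry.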
In summary, our approach measures the dissimilarity of weighted SOM patterns in terms of the pattern property. It measures the individual feature differences with the PEMD and captures the overall property dissimilarity of the pattern change through the visualization. It also facilitates a simultaneous comparison of weighted SOM patterns and shows how the patterns change.
4 Experimental Results
In this section, we test our approach by applying it to ecological domain data¹ to analyze the changes of the output pattern caused by changes of the input in the causal analysis. The physical and biological SOMs were trained on hexagonal grids, choosing the maps with the minimum quantization and topological errors. The physical input SOM was then associated with the biological output SOM. Among the physical features, Embeddedness is known to have a strong impact on biotic integrity. Thus, in our experiments we varied the value of Embeddedness in the physical input and examined the resulting changes in the biological output. The value of Embeddedness was increased by a given number of standard deviations (SD) for the first change and by a larger number of SD for the second change. The other features were fixed at their initial values. The standardized Z-scores used in the analysis were converted to T-scores for visualization with our approach.

¹ The ecological features are Shredders, Filtering-Collectors, Collector-Gatherers, Scrapers and Predators for the biological data set, and Elevation, Slope, Stream Order, Embeddedness and Water Temperature for the physical data set. All feature values are standardized over the total data.
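The input perturbation and score conversion described above are simple to express. T-scores use the conventional mean of 50 and standard deviation of 10. The feature keys, the initial Z-scores and the step sizes of 1 and 2 SD below are hypothetical placeholders, not the values used in the experiments.

```python
def z_to_t(z):
    """Convert a standardized Z-score to a T-score (mean 50, SD 10)."""
    return 50.0 + 10.0 * z


def perturb(input_z, feature, delta_sd):
    """Copy of a standardized input with one feature increased by delta_sd SDs.

    All other features keep their initial values.
    """
    changed = dict(input_z)
    changed[feature] += delta_sd
    return changed


# Hypothetical initial physical input (Z-scores) and hypothetical step sizes.
initial = {"Elevation": 0.0, "Slope": 0.0, "StreamOrder": 0.0,
           "Embeddedness": 0.0, "WaterTemperature": 0.0}
first = perturb(initial, "Embeddedness", 1.0)
second = perturb(initial, "Embeddedness", 2.0)
second_t = {name: z_to_t(z) for name, z in second.items()}
```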
Fig. 3 shows the weighted biological output SOM patterns for the given physical input, for the input after the first change and for the input after the second change. More than one region is highly weighted in the pattern for the given input, indicating a high possibility of different biological outputs for that physical input. The reddish regions of each pattern are selected for analyzing the change. The differences between the patterns are measured and visualized with our approach for the first and for the second change. Further information can be added to the view. Here we add significance information, obtained by testing the difference of every weighted feature distribution with the Kolmogorov-Smirnov test. Insignificant changes are indicated in yellow (increase) and cyan (decrease), and significant changes in red (increase) and blue (decrease).
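A weighted two-sample Kolmogorov-Smirnov comparison could be sketched as below. The statistic D is the largest gap between the two weighted empirical CDFs. Turning D into a significance decision needs further assumptions: this sketch uses the asymptotic critical value with Kish effective sample sizes, a heuristic that is not necessarily the procedure used for Fig. 3. All function names are hypothetical.

```python
import math


def weighted_ks_statistic(x, wx, y, wy):
    """Largest gap D between the weighted empirical CDFs of two samples."""
    total_x, total_y = sum(wx), sum(wy)
    mass_x, mass_y = {}, {}
    for v, w in zip(x, wx):  # probability mass at each distinct value
        mass_x[v] = mass_x.get(v, 0.0) + w / total_x
    for v, w in zip(y, wy):
        mass_y[v] = mass_y.get(v, 0.0) + w / total_y
    d = cdf_x = cdf_y = 0.0
    for v in sorted(set(mass_x) | set(mass_y)):
        cdf_x += mass_x.get(v, 0.0)
        cdf_y += mass_y.get(v, 0.0)
        d = max(d, abs(cdf_x - cdf_y))
    return d


def effective_size(w):
    """Kish effective sample size (sum w)^2 / sum w^2 of a weight vector."""
    return sum(w) ** 2 / sum(v * v for v in w)


def ks_significant(x, wx, y, wy, alpha=0.05):
    """Asymptotic two-sample KS decision using effective sample sizes (heuristic)."""
    n, m = effective_size(wx), effective_size(wy)
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return weighted_ks_statistic(x, wx, y, wy) > critical
```

For instance, with a feature's values taken from the selected neurons and their weights as `wx` and `wy`, a `True` result would be drawn in red or blue and a `False` result in yellow or cyan.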
In Fig. 3, the user can observe that the second change is larger than the first, with a similar tendency in the property change. This can be seen by comparing the size, the direction color and the similarity of the property shapes, as well as the EMD gray scale in the frame. The individual change of each feature is also captured by the change information at the center of each branch. As the value of Embeddedness increases, the changes in two of the biological features become significant and the changes in three features become larger. For the selected regions, the possible changes are also detailed on each branch. In particular, one feature shows both an increase and a decrease, which a single quantitative measure cannot reveal. From these experiments, the user can infer the impact of Embeddedness: its increase lowers the balance of the biological composition of the ecosystem. Causal effects can be analyzed more effectively, supporting well-informed decision making, when all possible changes are considered. Our approach supports this by helping the user identify regions of interest and by presenting the change information visually. It can thus be very useful for comparing pattern changes in the process of causal analysis.
In this paper, we have presented an approach for analyzing changes of weighted output SOM patterns caused by input changes in causal analysis. We described the idea of analyzing the change of weighted SOM patterns by comparing the dissimilarity of the pattern properties corresponding to the weight distributions. Our approach measures the property difference with a metric, the PEMD, and uses a visualization to capture the property change of weighted SOM patterns. The experiments show that our approach is useful for measuring and comparing the pattern properties in the change of weighted SOM patterns, for exploring regions of user interest and for capturing all possible changes to the pattern property. They also show that it provides the property change information in an interactive and effective visual way when analyzing causal effects.
-  Cha, S.H.: Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences 1, 300–307 (2007)
-  Chung, Y., Takatsuka, M.: A Causal Model using Self-Organizing Maps. In: Proceedings of ICONIP’15, Lecture Notes in Computer Science. pp. 591–600 (2015)
-  Giddings, E.M.P., Bell, A.H., Beaulieu, K.M., Cuffney, T.F., Coles, J.F., Brown, L.R., Fitzpatrick, F.A., Falcone, J., Sprague, L.A., Bryant, W.L., Peppler, M.C., Stephens, C., McMahon, G.: Selected physical, chemical, and biological data used to study urbanizing streams in nine metropolitan areas of the United States, 1999-2004. Technical Report Data Series 423, U.S. Geological Survey (2009)
-  Johnson, D.H., Sinanovic, S.: Symmetrizing the Kullback-Leibler Distance. Tech. rep., IEEE Transactions on Information Theory (2000)
-  Kohonen, T.: Self-Organizing Maps. Information Sciences, Springer-Verlag, Heidelberg, 3 edn. (2001)
-  Kullback, S.: Information Theory and Statistics. Courier Corporation (2012)
-  May, W.E.: Knowledge of Causality in Hume and Aquinas. The Thomist 34, 254–288 (1970)
-  Niblack, C.W., Barber, R., Equitz, W., Flickner, M.D., Glasman, E.H., Petkovic, D., Yanker, P., Faloutsos, C., Taubin, G.: QBIC project: querying images by content, using color, texture, and shape. Proc. SPIE 1908, 173–187 (1993)
-  Novotny, V., Virani, H., Manolakos, E.: Self Organizing Feature Maps Combined With Ecological Ordination Techniques For Effective Watershed Management. Technical Report 4, Northeastern University, Boston (2005)
-  Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 2 edn. (1992)
-  Ward, M.O.: Multivariate Data Glyphs: Principles and Practice, pp. 179–198. Springer Berlin Heidelberg (2008)
-  Wong, P.C., Bergeron, R.D.: 30 years of multidimensional multivariate visualization. In: Scientific Visualization, Overviews, Methodologies, and Techniques. pp. 3–33. IEEE Computer Society (1997)