SoRC – Evaluation of Computational Molecular Co-Localization Analysis in Mass Spectrometry Images

09/24/2020 ∙ by Karsten Wüllems, et al. ∙ Bielefeld University 0

The computational analysis of Mass Spectrometry Imaging (MSI) data aims at the identification of interesting mass co-localizations and the visualization of their lateral distribution in the sample, usually a tissue cross section. But as the morphological structure of tissues and the different kinds of mass co-localization naturally show a huge diversity, the selection and tuning of the computational method is a time-consuming effort. In this work we address the special problem of computationally grouping mass channel images according to their similarities in their lateral distribution patterns. Such an analysis is driven by the idea, that groups of molecules that feature a similar distribution pattern may have a functional relation. But the selection of the similarity function and other parameters is often done by a time-consuming and unsatsifactory trial and error. We propose a new flexible workflow scheme called SoRC (sum of ranked cluster indices) for automating this tuning step and making it much more efficient. We test SoRC using three different data sets acquired from the lab for three different kinds of samples (barley seed, mouse bladder tissue, human PXE skin). We show, that SORC can be applied to score and visualize the results obtained with the applied methods in short time without too much effort. In our application example, the SoRC results for the three data sets reveal that a) some well-known similarity functions are suited to achieve good results for all three data sets and b) for the MSI data featuring a higher degree of irregularity improved results can be achieved by applying non-standard similarity functions. The SoRC scores computed with our approach indicate that an automated testing and scoring of different methods for mass channel image grouping can improve the final outcome of a study by finally selecting the methods of the highest scores.



There are no comments yet.


page 4

page 9

page 10

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Mass spectrometry imaging (MSI) is a well established and widely used technique to identify the spatial distributions of various molecules within biological samples. The extension of this technology into ever new domains leads to a constant increase in the amount of data and in the diverity of biological samples investigated with this technology. The high dimension of MSI analysis of any MSI data is a complex problem as it combines two domains of information. On the one hand, there is the information provided by many mass spectra. On the other hand, there is the spatial information provided by molecular distributions. In addition, both are influenced by the morphology of the biological sample.

A computational approach to MSI analysis can either target the grouping (or clustering) of the of pixels and their mass spectra [mccombie2005spatial, deininger2008maldi, alexandrov2010spatial, kolling2012whide] or the grouping of masses and their associated lateral molecular distribution patterns [alexandrov2013analysis, wijetunge2014unsupervised, wullems2019detection]. The first approach often pursues the objective of grouping pixels into clusters and the final clustering outcome is visualized as a so-called segmentation map [alexandrov2010spatial, kolling2012whide]. These segmentation maps use colors to present the location of each cluster in the MSI visualization. The second approach pursues grouping of similar lateral molecular distribution patterns into clusters. These molecular distribution patterns are often called mass channel images and are visualized as grey value or heat map visualizations. The idea behind this approach is that co-localization of molecules can hint to molecular interactions, which in turn can be an indication for a relationship within a molecular network. An important requirement for the clustering of mass channel images is the non-trivial problem of the definition of a function to map pairs of mass channel images to similarity or dissimilarity values.

Since the analysis of molecular distributions by mass spectrometry imaging is mainly focused around the analysis of mass channel images, there is a close relation to the field of image processing. The field of image processing has proposed several functions to quantify the similarity of two images. It has been shown that the quantification of similarity between two images can not be defined in a unique way, because visual similarity is often context dependent. There are examples, like natural scenes

[simoncelli2001natural, srivastava2003advances, sheikh2005information] or face images [abaza2012quality], where it has already been shown that some similarity functions perform significantly better than others. Transferring the problem of defining pairwise similarity to the field of MSI we realize the same kind of context dependency. First, there is a biological context, which can be described by the biological and biochemical characteristics of the sample, such as the morphology, molecular networks, biochemical reactions and functions etc. Second, there is the technical context, which is given by the measurement conditions, e.g. the measuring instrument, parameter settings or the matrix protocol applied. This multitude of contexts raises the question if some algorithms are more suitable than others to assess the similarity between two mass channel images in one particular combination of biological and technical contexts. This question becomes even more difficult to answer considering the fact that the grouping of mass images is usually achieved using a pipeline of inter-related consecutive computational steps. These steps include for instance methods for denoising, image/signal enhancement, image transformations and clustering methods. In this manuscript, one particular pipeline of parameterized algorithmic modules as a pipeline setup.

Many previous works on MSI data and grouping mass channel images preferably used quite basic functions for assessing the similarity of mass channel images, such as the Pearson Correlation Coefficient [mather1992cluster], the Cosine Similarity [mather1992cluster] or the Structural Similarity Index [wang2004image]. But most of these works do not provide any motivation for the choice of their function. To the best of our knowledge, there is only one comparative study on different similarity measures for co-localization, which also investigates the influence of some basic pre-processing methods [ovchinnikova2020colocml]. However, the number of methods is considerably smaller compared to our study, the pipeline setup plays a much smaller role and, because of the individual focus of this work, no systematic evaluation procedure is presented.

In this manuscript three biological samples are used for testing our approach in three different biological contexts. We will investigate the influence of the pipeline setup, with a focus on various similarity measures, which are mostly derived from the field of image processing. In addition, we will propose a methodology called SoRC (similarity of ranked c

luster indices). SoRC is a workflow scheme to evaluate the suitability of various pipeline setups. Part of the workflow is the computation of a score for each context, called SoRC score. The score is computed by utilizing two statistical measures that estimate the quality of clusters, called cluster indices. The SoRC scores will be visualized to suggest suitable combinations of methods for a given biological sample and a defined clustering method. This way it will support the user to design and evaluate a good analysis pipeline to cluster the mass channel images of a given biological sample.

2 Materials and Methods

2.1 Data Set Definition

This section briefly introduces the formal description of an MSI data set as used in this manuscript. One mass spectrometry image data set is defined as a multivariate image that consists of rows and columns of pixels , with and . The subset of all spectral positions is defined as . This set describes all positions where the MSI instrument applied a measurement, i.e. a mass spectrum has been recorded. The total number of all spectral positions for and is defined by and . Each spectral pixel consists of measured intensity values, each of which corresponds to a specific mass channel. These mass channels are indexed by and are also named mass-to-charge ratio values (-values), each of which corresponds to the molecular mass. At each spectral position with , the term represents the intensity value of mass channel . For all other positions we set .

It is known that MSI technology suffers from a large variation of different intensity value ranges between individual mass channel images. To compensate for this, each mass channel image was initially linearly scaled into a range of . However, due to write and read operations with the viridis colormap the range shifted into .

2.2 Mass Channel Image Features and Representations

We define the following feature maps and image representations in order to encode pixel-related mass signal characteristics.

Gradient vector image:

To integrate the local signal change and its direction (i.e. the signal value gradient) within a mass channel image we compute a gradient vector image

for each mass.


Here, and are the partial derivatives of with respect to and . Both derivatives describe the local directional changes of intensity values in their respective horizontal or vertical direction.

Magnitude image

Based on the gradient vector image the strength of the local directional changes can be quantified. The resulting image is known as the magnitude image () and can be defined as follows:

Orientation image

The orientation image () can also be derived from the gradient vector image and can be interpreted as a complement to the magnitude image. While the magnitude image describes the strength of each local change, the orientation image describes the direction of each local change.


The vectorization of an image describes the process to transform the two-dimensional image into a one-dimensional vector , which contains only spectral positions. A common method, which is also applied in this work is the “stacking of pixel rows”:

Image histogram

Histograms of intensity values are another popular tool to analyse images. The one-dimensional histogram of a two-dimensional image , over all spectral positions, is referred to as .

2.3 Image Scale-Space Representations

Image scale-space representation are used to analyse structures and details at different granularity in the image [witkin1987scale, lindeberg1994scale, lindeberg2007scale]. Lower scale features emphasize finer structures (like edges, corners and hot spots), while higher scales should emphasize coarser structures (like homogeneous regions). As biological tissue is organized and structured on several scales, scale space representations appear well motivated.

The scale-space representation is a family of images generated by repeated convolution with a two-dimensional Gaussian kernel (), where each repetition defines a single scale level (). Formally, a scale-space representation can be defined as:


The “;” in and denotes a convolution that is performed over all pixels and , while is a constant for the predefined scale level. Two special cases for are , which corresponds to the convolution with a standard Gaussian filter and , which corresponds to the identity transformation . In this manuscript Equation 7 is adjusted by setting . The step size variable is introduced and the definition of is changed to and changed to . denotes a -fold convolution of .

To illustrate the influence of scale-space representations, three sets of different step sizes are used in this manuscript: and

The main idea behind considering these three sets is, that the original scale should be more suitable for fine detailed structures like edges, corners and hot spots, while higher scales should be more suitable for coarser structures like homogeneous regions. The omission of a scale-space representation, i.e.  set above, is used as some kind of control to examine whether there is any value using scale-space representations. Utilizing sets with two different step sizes, i.e. (b.) and (c.), will examine whether there is a potential value using different scale-space levels.

2.4 Pattern Regularity of Mass Channel Images

There is no terminology for describing morphology and structure in a biological sample beyond single chosen contexts (like histopathology classification schemes). And there also is no generally accepted terminology for the description of intensity patterns in mass channel images. In some discussions, morphologies and structures are described with one-dimensional quality scales such as simple, regular, partly regular, irregular or complex [rezaeilouyeh2016microscopic, thiran1996morphological, yuan2006image, guo2016pathology, xing2016robust]. Following this observation we also apply such a scale to describe and rank the structure in biological samples. The scale ranges from the term regular to the other end denoted with irregular. To describe the term regularity we propose and use the following expression: The degree of regularity in a pattern is related to the effort taken to describe the pattern, i.e. with the number of points, lines or geometrical shapes necessary to describe it. To give an intuition for the assessment of different levels of regularity using this expression, Figure 1 illustrates examples of three different levels of regularity using abstract shapes.

We use the proposed concept to describe the irregularity in Section 2.5 below to categorize the samples and related data sets used in our experiments.

Figure 1: Pattern regularity example – Three artificial image set of abstract shapes to exemplify high (A), medium (B) and low (C) regularity. The three sets consist of six images each. Images of the upper and lower row for each example are considered to build a cluster. (A) Artificial images with high regularity. All signal distribution pattern are areal, crisp, well defined and the pattern of the upper and lower row are well disjunct. (B) Artificial images with medium regularity. The patterns are finer and less areal, but still well defined. The pattern of the upper and lower row partially overlap, but due to their difference in shape they are still well disjunct. (C) Artificial images with low regularity. The patterns are noisy, filigree and not well defined. The pattern of the upper and lower row are still disjunct, but not as clearly as in (A).

2.5 Data sets

Three data sets from different kinds of samples (a detailed description of the data sets is given below) were chosen to represent different degrees of regularity in morphological structures, with showing the most regular structures and showing the least regular structures. The three data sets were pre-processed with an in-house software, including signal alignment, normalization, matrix signal removal and peak picking.

Barley Seed

A barley seed cross section with a thickness of 14 µm was measured with a MALDI-ToF/ToF ultrafleXtreme from Bruker Daltonics in positive ion mode. The laser diameter (pixel width) was set to 50 µm, with a laser energy of 44% to 47% and 300 laser shots per spot. DHB was used as matrix. The ion images have a dimension of () pixels, with 3422 spectral pixels. Peak picking resulted in a set of 101 mass channel images, referred to as . The barley seed consists of a small number of compartments, which are disjunct and well defined.

Mouse Urinary Bladder

The mouse urinary bladder data set is an example file from the website. The tissue section has a thickness of 20 µm and was measured with an AP-SMALDI LTQ Orbitrap Discovery from Thermo Scientific, in positive ion mode, with a laser diameter of 10 µm (pixel width). DHB was used as matrix and applied with a pneumatic sprayer. More details about the preparation of this sample can be found in [rompp2010histology]. The ion images have a dimension of () pixels, with 34840 spectral pixels. Peak picking resulted in a set of 150 mass channel images, referred to as . The morphological structures in the mouse urinary bladder data are also well defined and many of the intensity patterns are disjunct. However, some of the morphological structures are filigree, which places this sample further towards the middle of the regularity spectrum .

Human PXE Skin

A female human pxe diseased skin section was donated after surgery and measured with a MALDI-ToF/ToF ultrafleXtreme from Bruker Daltonics in positive mode. The laser diameter (pixel width) was set to 20 µm and DHB was used as matrix. The ion images have a dimension of () pixels, with 75042 spectral pixels. Peak picking resulted in a set of 51 mass channel images, referred to as . The human Pseudoxanthoma elasticum (pxe) skin sample has a more irregular morphology. The molecular patterns, i.e. intensity distributions, are blurred and they overlap. In addition, some of them are wide-spread, while others are quite small and densely structured.

2.6 SoRC Workflow

The proposed SoRC (similarity of ranked cluster indices) methodology consists of three parts: I. Pre-prorcessing (i.e. transformation and enhancement of data), II. Analysis (i.e. mass channel similarity computation (II.1) and clustering (II.2)) and III. Evaluation (i.e. computation of the final SoRC scores).

In part I and II, different parameter settings of a modular pipeline to pre-process and cluster mass channel images are executed. As motivated above, the aim is to find clusters of mass channel images that show similar lateral molecular localization, i.e. similar signal distribution patterns. The implementation as used in this study, with clustering of mass channel images as analysis objective, is presented in Figure 2. Part I is subdivided into two procedures: () thresholding and smoothing and () computation of scale-space representations. The latter is carried out in two variants, referred to as and (see details below). The two kinds of pre-processings are tested and combined in six different setups in our study: no pre-processing, , , , and .

Figure 2: SoRC workflow – Outline of the SoRC workflow for the task of mass channel image clustering. The workflow evaluates multiple pipeline setups to propose the most suitable ones. To cluster mass channel images the computation of a similarity matrix and the application of a clustering procedure are required. The application of pre-processing and computation of scale-space representations is optional. A SoRC score is computed for each pipeline setup to suggest suitable combinations of algorithms for a given data set. The results are visualized as bar-chart-tables, i.e. ranking tables, combined with a bar-chart visualization.

The second part II includes the algorithmic process to achieve to required analytical output, in this case the computation of the similarities between pairs of mass channel images and subsequent clustering. This step uses three different clustering algorithms in combination with 13 different similarity functions (see details below). Together with the six different setups from part I sets this results in different clustering results. However, since the scale-space representations are not utilized by every similarity function, the actual number of different results is lowered to , which is still too much to evaluate manually at the end.

The third part III consists of the computation and visualization of SoRC scores for each result computed in II. The visualization is implemented as a bar-chart integrated within a table. This tabular visualization will be referred to as bar-chart-table.

As stated in the introduction, the intention of this work is to propose a new methodology for evaluating computational pipelines and to promote one pipeline to optimal in some regard. In the following, the single computational steps of part I to III are described in more detail:

2.6.1 Part I: Pre-processing

The following two pre-processing procedures are applied:

  • Thresholding and smoothing.

    • Each mass channel images is zero-padded with a padding width of thirteen pixels.

    • The intensity values are thresholded by Otsu’s method [otsu1979threshold].

    • Gaussian smoothing with a kernel size is applied.

  • scale-space computation.

    • Each mass channel images is zero-padded with a padding width of thirteen pixels.

    • Computation of two scale-space representations according to Equation 7 using () and ().

2.6.2 Part II.1: Analysis – Similarity Functions

We investigate thirteen different functions that cover multiple different approaches. The terms in brackets are used in the results section to relate to the individual function applied. The descriptions in this section are intentionally short and informal to provide an intuition about the concepts of the various functions. Detailed definitions can be found in Supplementary S1.

Multiscale Pearson Correlation Coefficient (Pearson):

The Pearson Correlation Coefficient is one of the most commonly used similarity functions. Since it is defined for one-dimensional vectors or arrays we apply the transformation according to Equation 6. If a scale-space representation is applied to the mass channel images before, the Pearson Correlation Coefficient is computed for each scale level individually.

Multiscale Cosine Similarity (Cosine):

The Cosine Similarity is widely applied and also originally defined for one-dimensional vectors, so we again apply the transformation . The application to scale-space representation is the same as described for the Pearson Correlation Coefficient above.

Multiscale Angular Similarity (Angular):

The Angular Similarity is similar to the Cosine Similarity, but more sensitive to differences between small angles. The scale-space representation handling is the same as for the Pearson Correlation Coefficient.

Multiscale Structural Similarity Index (MSSIM):

The Structural Similarity Index (SSIM) uses local statistics to compare luminance, contrast and structure within the image and combines the result to a similarity index. In its original form, the function is defined for a pair of sliding windows to compute a similarity map for a pair of images. Due to the sliding window approach the local neighborhood (Moore neighborhood) of each pixel is incorporated. An arithmetic mean pooling operation is applied to compute the single similarity value.

Multiscale Multifeature Similarity Index (MMFS):

Similar to the SSIM approach, the Multifeature Similarity Index (MFS) is defined between two sliding windows. It follows the basic formula of the SSIM. However, it does not only consider the pixel intensity values, but also the luminance, contrast and structure features of the orientation images and the magnitude images (see above). So for each pair of mass channel images three similarity maps are computed and fused into one similarity . To combine them into a single similarity map, a pooling function is applied at each pixels position. In order to consider only the most dominant features, maximization is the pooling function of choice. However, the pooling function can be exchanged if desired. Again, the final similarity map has to be pooled to compute a single similarity value. Like in the SSIM, the arithmetic mean is selected.

Shared Pixel Information (Shared Pixel):

The Shared Pixel Information was implemented as a fast pixel-to-pixel comparison function that focuses only on the actual intensity values.

Contingency Similarity Index (Contingency):

The Contingency Similarity Index was implemented to investigate the suitability of multi-thresholding.

Hypergeometric Similarity Measure (Hypergeometric):

Since a previous study reported very good results of the Hypergeometric Similarity Measure [kaddi2011hypergeometric] we tested this measure as well, however in a slightly modified version that achieved better results on our data compared to the originally proposed formula.

Local Standard Deviation based Image Quality Index (Local Std)

An adjusted version of the

Local Standard Deviation based Image Quality Index

[gore2015full] is included to investigate the effect of an intensity deviation based approach.

Intensity-Magnitude-Angle Similarity (IMA Sim):

The Intensity-Magnitude-Angle Similarity is very similar to the Multifeature Similarity Index, but it is based a pixel-to-pixel comparison instead of a sliding window approach. Therefore, the function is faster to compute, but no local neighborhood information is incorporated (except for the gradient and magnitude information).

Gradient Information (Grad Info)

The Gradient Information function is included as a function that considers just the gradient information, i.e. the gradient direction and gradient magnitude, so the intensity of the signal is ignored.

Intensity Histogram Similarity (Histogram)

The Intensity Histogram Similarity compares intensity value histograms () of mass channel images using the Hellinger distance [hellinger1909neue] or another function. In this work we apply the Hellinger distance [hellinger1909neue], as it is directly related to the Euclidean norm, which supports an intuitive understanding of the comparison approach.

Mutual Information (Mutual Info)

Mutual Information is a commonly known information theoretical measure that quantifies the “amount of information”. It is included since information theoretical approaches are regularly used in many different areas of signal analysis [pluim2003mutual].

2.6.3 Part II.2: Analysis – Clustering Methods

The following three clustering algorithms are applied in this manuscript. All three methods are briefly described with more details in the supplementary (S2).

Agglomerative hierarchical clustering

[sokal1958statistical] is very prominent in bioinformatics and applied in many contexts. Data points are iteratively joint into clusters according a selected similarity criterion until all data points are joint into one cluster. In this work we apply the agglomerative average linkage version [sokal1958statistical].

Affinity propagation [frey2007clustering] determines a set of data points that best suited to represent the entire data in an iterative way, by exchanging messages between points evaluating the suitability of a point to be representative for others.

Community detection [wullems2019detection] clusters data points based on the evaluation of a graph representation. The graph is based on an adjacency matrix computed from a similarity matrix computed from the points’ coordinates.

Besides these three clustering algorithms, the publicly available SoRC workflow implementation currently supports -medoids (partitioning around medoids (PAM) version [novikov2019pyclustering]

and expectation-maximization (EM) version

[de2003c]), DBSCAN [ester1996density], OPTICS [ankerst1999optics]

and Spectral clustering

[shi2000normalized]. But the workflow can easily be extended to include any type of clustering procedure, as long as the labels are adjusted to start with index one and noise labels, if they exist, are assigned to unique numbers.

2.6.4 Part III: Evaluation – SoRC Score

The SoRC score is designed to quantify the quality of a clustering result with a single number. The computation of a single score allows an easy visualization, ranking and comparison for a multitude of chosen methods and modifications. This way it increases the efficiency in the process of designing and improving a software pipeline for grouping mass channel images into clusters.

A clustering is considered to be of a higher quality if the result clusters are dense, well separated and represent the most important structural features of the data. The SoRC score uses two different cluster indices, which are not strictly correlated, to express these properties in numbers: The Silhouette Coefficient score (SCS) [rousseeuw1987silhouettes] and the Calinski-Harabasz index (CHI) [calinski1974dendrite]

The Silhouette Coefficient SCS is based on a comparison of a data point’s position in relation to its assigned cluster (cohesion) and to the next (or second best) nearest other cluster (separation). The arithmetic mean over all data points is taken as final SCS value. A higher number of unambiguous data point - cluster assignments results in a higher number of SCS values. So indicates a coherent and well separated (i.e. non-ambiguous) cluster assignment. Lower values indicate lesser separation and lesser coherence.

The Calinski-Harabasz index (CHI)

is defined as the ratio between the within-cluster dispersion and the between-cluster dispersion. The index increases with decreasing intra-cluster variance and increasing inter-cluster distance.

The main difference between these two cluster coefficients (or indices) is that the SCS is computed on the actual similarity values, while the CHI is computed directly on the data points within the clusters. So an integration of both values into one SoRC score enables the integration of both aspects, the composition of the similarity values and the composition of the images (i.e. data points) within each cluster.

Due to their different output ranges (see above) we do consider the rank and not the actual value of the values. So for both measures, the final values are ranked and the rank position values are normalized into . Consequently, higher ranks represent better clusters. Ties are solved by mean ranks, e.g. the values rank to . The actual SoRC score is the arithmetic mean of the normalized ranks of both cluster indices.

3 Results

Before all results are compared with each other, we present an example for the entire evaluation procedure using the agglomerative hierarchical clustering results for the mouse urinary bladder data set (). Figure 3 shows the bar-chart-table visualization for the different combinations in part I and II (pre-processing and similarity function) but with a fixed clustering algorithm (hierarchical) with the highest SoRC scores (out of a total of ). It can be seen that in these examples, functions with pre-processing always score better than their counterparts without pre-processing. For those similarity functions where the integration of a scale-space is available, the larger step size scores better than the smaller step size, which scores better than no scale-space. The evaluation of the individual columns for the SCS and the CHI shows that the indices “disagree” for some combinations, i.e. one index rates the clustering better than the other. This demonstrates the importance of using two not strictly correlated cluster indices that assess different characteristics of a clustering. In general, the SoRC score is higher if both indices rate the clustering similarly. The original values, presented in the last two columns, can be used to either examine the quality difference between two pipeline setups in more detail or to compare the bar-chart-tables for different clustering procedures.

Mouse Urinary Bladder () with Agglomerative Hierarchical Clustering

Figure 3: SoRC score bar-chart-table example – The bar-chart-table visualization shows the highest SoRC scores (out of a total of ) for the mouse urinary bladder () in combination with agglomerative hierarchical clustering. The Method column contains the applied combination of pre-processing, scale-space representation and similarity function. PP indicates pre-processing and 0MS, 1MS and 2MS indicate the utilization of scale-spaces with , and , respectively. The columns SCS Rank and CHI Rank contain the normalized ranks. The ranks are computed on the basis of the absolute values of the respective cluster index, which are shown in the columns SCS Val and CHI Val. The whole bar-chart-table is sorted according to the SoRC scores, which are shown in the Score column.

To finally compare the different clustering methods in the analysis, multiple bar-chart-tables can be compared, as presented in Figure 4. HTML files with bar-chart-tables for each individual combination of data set and clustering method are provided as part of the Additional Files. To provide a clear comparison, the columns of the bar-chart-tables are reduced to the SoRC scores and the rows are reduced to the five highest scoring context combinations. The evaluation of the comparison yields five major observations:

  1. The Cosine Similarity and the Pearson Correlation Coefficient are among the best scoring similarity functions for most combinations of data set and clustering method. The only exception is the human skin data set ().

  2. For and , most similarity functions achieve better results in combination with pre-processing.

  3. The results from our study indicate, that similarity measures based clustering can achieve better results when they include scale-space representations. But there is no clear result whether the smaller or the larger step size performs better.

  4. The five highest SoRC scores for and are higher than for in almost all cases. This shows that the selected cluster indices quantify the clustering quality of the irregular sample differently more often than it is the case for samples with higher regularity.

  5. Global image features, discarding the local features achieve inferior results which indicates that discarding any positional relationship between the pixels is likely to have a negative impact on the clustering result.

  6. The last and probably most important observation is that no pipeline setup always leads to the best score.

Affinity Propagation Hierarchical Clustering Community Detection
Figure 4: SoRC score comparison – Comparison of the top five computed SoRC scores for each combination of data set and cluster method. PP indicates pre-processing. 0MS, 1MS and 2MS indicate the utilization of scale-spaces with , and , respectively. A higher score at the same position does not necessarily imply a better result. It rather indicates that there are less numerical ties and less disagreements between SHS and CHI.

Table 1 shows a summary of the time requirements needed to evaluate each data set with the SoRC workflow. Each pre-processing setup was executed in parallel. It can be seen that with increasing number and size of images the time requirements increase notably. This is even more important if the different pre-processing setups are not executed in parallel.

Pre-processing Setup Barley Bladder Skin
0 MS 1:08:58 4:05:14 1:00:43
1 MS 1:20:42 4:36:36 1:08:46
2 MS 1:21:21 4:36:52 1:10:47
PP – 0 MS 1:08:04 3:17:23 0:42:36
PP – 1 MS 1:19:03 3:57:17 0:50:53
PP – 2 MS 1:19:59 3:57:14 0:50:47
Total 7:38:07 24:30:36 5:44:32
Table 1: Time requirements – Time requirements for the whole SoRC workflow. Each pre-processing setup was executed in parallel on a cloud machine with 500 GB memory and 28 CPUs. Total illustrates the time requirement if no parallel execution is used.

4 Discussion

Using our workflow and the SoRC visualizations we were able to observe, that pre-processing (de-noising, multi scale representation) seems to have a positive impact on the final clustering results, independent from the the clustering method or the similarity function. Please note, that we do not want to generalize this observation to data outside this study. On the one hand, de-noising pre-processing can have a positive influence on commonly known MSI problems, such as pixel-to-pixel variation and hotspots. On the other hand, it can have a negative influence on molecular distributions that are either small or finely structured. The same applies to the integration of different scale-space levels. The idea to combine multiple levels to consider fine grained structures at lower scales and coarser structures at higher scales seems to work out well in most cases. An integration of this approach as presented in this manuscript, i.e. by combining the results of several scale levels with a mean function, can theoretically be used in combination with any similarity function.

Since we did not find such a comparative analysis for similarity functions and different degrees of sample complexity in the literature one motivation for SoRC was the comparative analysis of different similarity functions. The results confirmed that two of the most commonly used functions (Pearson Correlation Coefficient and Cosine Similarity) are good choices, which is especially true for samples with a more regular structure. Interestingly, this result is in line with the recommendation given in the work of Ovchinnikova et al. [ovchinnikova2020colocml]. This previous comparison also recommends to use pre-processing in comparison with the Cosine Similarity. To be precise, they recommend to use a median signal thresholding and a sliding window median filter of size three, which also supports our statement about the importance of the pipeline setup.

However, it was also revealed that the choice of a suitable similarity measure becomes more difficult with increasing irregularity of the sample. This is supported by the observation that alternative image features, such as gradients or magnitudes, seem to gain in importance with increasing irregularity. The same seems to apply to the preservation of local neighborhoods.

All results taken together it becomes clear, that evaluation workflows with numerical quality estimations provide a great support to increase the quality of analysis pipelines.

5 Conclusion

In this manuscript, the suitability of different similarity functions to evaluate the co-localization of mass channel images was investigated. The influence of pre-processing and image scales was considered and the interaction of data source, analysis pipeline and analysis objective was addressed. Additionally, the SoRC evaluation workflow was presented. The workflow proposes a method to estimate the quality of an analysis pipeline by using a single score, the SoRC score. Based on this score, the user is provided with suggestions for combinations of methods. The use of the SoRC workflow allows to evaluate multiple pipeline setups fully automatically. This saves a time-consuming execution and evaluation of each pipeline setup and its parameters individually (see Table 1). In addition, the otherwise necessary tedious pairwise comparison of the results is avoided. Consequently, streamlining the comparison of pipeline setups through the SoRC workflow significantly improves the efficiency of the evaluation. SoRC in its present form is only suitable to evaluate clustering as analysis objective and its execution requires some programming knowledge. However, the underlying methodology can be extended to any analysis objective, as long as one can define at least one numerical value to estimate the quality of a result. However, if time or computational resources are limited we advice to use either the Pearson Correlation Coefficient or the Cosine Similarity in combination with basic pre-processing techniques and the integration of a small scale-space representation. Chances are high that a combination of these methods and functions provides good results for regular structured samples and at least acceptable results for irregular structured samples.


Source code is available on github.
Project name: SoRC
Project home page:
Operating system(s): Platform independent
Programming language: Python
Other requirements: Python 3.5 or higher
License: GNU GPLv3

Author’s contributions

Concept and design of study and software: KW, TWN. Developed and implemented software: KW. Analyzed data: KW. Evaluation and interpretation of results: KW, TWN. Writing: KW, TWN. All authors read and approved the final manuscript.


KW was funded by the International DFG Research Training Group GRK 1906. The funding body played no role regarding the design of the study, data collection, analysis or interpretation or the writing of the manuscript.

Competing interests

The authors declare that they have no competing interests.

Ethics approval and consent to participate

The collection and use of the human skin tissue was approved by the ethics commission of the Ruhr University of Bochum Faculty of Medicine, located in Bad Oeynhausen. All patients (donors of the human tissue) provided their informed consent in written form. We obtained the data completely anonymized.


S1 Similarity Functions

s1.1 Multiscale Pearson Correlation Coefficient (Pearson):


where and are the variance and covariance function.

s1.2 Multiscale Cosine Similarity (Cosine):


where is the Euclidean norm.

s1.3 Multiscale Angular Similarity (Angular):


Equation 10 is only valid for . For the factor must be changed to .

s1.4 Multiscale Structural Similarity Index (MSSIM):

The structural similarity function is defined between two sliding windows of size , with .


where is a sliding window function of size across two images. The function returns a new image, which is called similarity map. and calculate the arithmetic mean and the standard deviation of the sliding window, are two small constants to avoid division by zero and is the size of the set of all spectral positions.

s1.5 Multifeature Similarity Index (MMFS):

Similar to the SSIM approach, the MFS Max is defined between two sliding windows of size , with .


where , and are sliding window functions of size across two images. Like in Equation 11 these functions return similarity maps. , and calculate the arithmetic mean, standard deviation and covariance of the sliding window. represents the signs of its input, represents the absolute values of and is a small constants to avoid division by zero.

s1.6 Shared Pixel Information (Shared Pixel)


s1.7 Contingency Similarity Index (Contingency):


where calculates a contingency table of two images for values in {1,2,3}.

s1.8 Hypergeometric Similarity Measure (Hypergeometric):



describes a binarization function.

s1.9 Local Standard Deviation based Image Quality Index (Local Std):


where is the result of the convolution of with a standard deviation kernel of size and c is a small constant to avoid division by zero.

s1.10 Intensity-Magnitude-Angle Similarity (IMA Sim):


s1.11 Gradient Information (Grad Info):


s1.12 Intensity Histogram Similarity (Histogram):


where is the Hellinger distance and is the representation of an image as one-dimensional intensity histogram.

s1.13 Mutual Information (Mutual Info):


where is a one-dimensional joint histogram.

S2 Clustering Methods

s2.1 Hierarchical Clustering

Hierarchical clustering is commonly used within the MSI community and in the entire realm of bioinformatics. One of its main advantages is that it allows to track each clustering step by inspecting the tree like result called dendrogram. For this study an agglomerative average linkage (UPGMA) procedure is applied [sokal1958statistical]. The idea behind agglomerative average linkage hierarchical clustering is to initialize each data item (i.e. -image) as a single cluster. Thereafter, the two clusters with the closest average distance are merged together. This procedure is applied iteratively until only one cluster remains or until the intended number of clusters is reached. Without the use of approximation techniques, hierarchical clustering requires that the number of clusters be determined manually. To avoid an arbitrary choice, a universally applicable approximation technique is used.


where is the set of all distances resulting from all cluster merges, is the arithmetic mean, is the standard variance and is a constant, which was set to one.

s2.2 Affinity Propagation

The idea behind affinity propagation is to find data points that are suitable to be a representative of a number of other data points. To identify those points, messages are sent between pairs of samples until convergence. Each message belongs to one of two categories. First, the responsibility, which describes how well one data item () is suited as a representative for another item () and second, the availability, which describes how appropriate it would be for an item () to pick () as its representative [frey2007clustering]. The method that detects the number of clusters automatically. The applied implementation requires three parameters: the convergence iteration (set to 15), the maximum number of iterations (set to 200) and the damping factor (set to 0.5).

s2.3 Community Detection on Mass Channel Similarity Graphs

The idea behind this method is to create a mass channel similarity graph from the set of all mass channel images and use community detection as clustering approach [wullems2019detection]. The basic outline of this approach follows a three step procedure:

  • A pairwise similarity matrix is computed.

  • A threshold is computed to transform the similarity matrix into an adjacency matrix, which is an alternative representation of the mass channel similarity graph.

  • A community detection algorithm is applied.

The applied thresholding procedure is similar to Equation 21, but in this case equals the set of all pairwise similarity values of the similarity matrix.