Multi-Spectral Imaging via Computed Tomography (MUSIC) - Comparing Unsupervised Spectral Segmentations for Material Differentiation

by   Christian Kehl, et al.

Multi-spectral computed tomography is an emerging technology for the non-destructive identification of object materials and the study of their physical properties. Applications of this technology can be found in various scientific and industrial contexts, such as luggage scanning at airports. Material distinction and identification are challenging, even with spectral x-ray information, due to acquisition noise, tomographic reconstruction artefacts and application constraints on the scanning setup. We present MUSIC, an open-access multi-spectral CT dataset in 2D and 3D, to promote further research in the area of material identification. We demonstrate the value of this dataset on the image analysis challenge of object segmentation based purely on the spectral response of its composing materials. In this context, we compare the segmentation accuracy of fast adaptive mean shift (FAMS) and unconstrained graph cuts on both datasets. We further discuss the impact of reconstruction artefacts and segmentation controls on the achievable results. The dataset, related software packages and further documentation are made available to the imaging community in an open-access manner to promote further data-driven research on the subject.





1 Introduction

Spectral CT is researched in physics applications with the potential to be "the emerging technology" for material distinction of objects [1, 2]. Due to advances in the flux handling of detector technology made in recent years, multi-energy CT (MECT) instruments are becoming a feasible alternative to their single- and dual-energy counterparts. Deducing materials with anisotropic spectral properties within volumetric, tomographic scans is more involved than the spectral analysis of natural [3] or hyperspectral remote sensing [4] images and usually requires a set of prior assumptions on object behaviour or prior knowledge of the image content. In this work, we present an approach for deducing object boundaries, represented by segmentation masks, based on material properties in spectral volume CT scans that cover large parts of the x-ray spectrum. The presented approach is superior to previous segmentations of single-energy CT (SECT) data with established methods (such as Otsu's method [5] or domain-specific tools [6]), which is demonstrated in fig. 1. The spectral dataset, which is made available to the community, also provides input for training supervised segmentation methods, e.g. via neural networks. The material differentiation objective itself is synonymous with object segmentation based on material properties. The computed segmentation maps provide an input to improve subsequent material identification (i.e. object classification).

Fig. 1: The added value of using spectral CT data for fluid segmentation: a single-energy CT (b) segmentation suffers from lots of noise and the inability to clearly differentiate between materials, compared to manual reference segmentations (a). A full-spectrum analysis can differentiate between the materials, while still suffering in quality from low signal-to-noise ratios (c). The proposed method (d) is best able to differentiate individual segments that are closest to given manual reference outlines.

Material differentiation via spectral imaging has various application scenarios related to the imaging modality, ranging from well-known medical diagnosis and monitoring [2] through manufacturing quality control [7, 8] to geological evaluation in the planetary sciences and for natural resources. The focus of this article is material differentiation in tomographic reconstructions, for which we extend previously published segmentation methods, originally presented in other contexts and on other imaging modalities. The target application of our research is the reliable detection of fluid threat items (e.g. explosives, acids) within the CIL scanning project (CIL2018: NextGen Scanner for Checked In Luggage) for airport security. Despite the strong focus on spectral tomography, the presented segmentation method extensions also apply to other sources of spectral image data.

A sketch of the tomographic data processing is shown in fig. 2. In contrast to other application areas where the range of imaged materials is limited and objects are easier to separate, luggage scanning deals with a wide range of material compositions, tightly arranged next to one another and inadequately distinguished by just the material phase (e.g. fluid, viscous semi-fluid, solid lattice with air pockets, solid filled, metal). Therefore, information from multiple parts of the x-ray spectrum is needed to distinguish objects and materials in more detail (e.g. different types and concentrations of fluids; see fig. 3 for the part of the spectrum used). Moreover, as fluids are non-rigid and arbitrarily arranged in luggage, an actual tomographic reconstruction is required to detect their traces. Luggage scanning is therefore an ideal test case for spectral segmentation.

Fig. 2: Sketch of the tomographic data processing for segmenting MECT data in the CIL project.
Fig. 3: Chart of the electromagnetic spectrum. The range of x-ray spectrum used in this work is highlighted in green.

In terms of computational requirements, we distinguish between three image segmentation and classification scenarios: (i) supervised, where the expected content is known in advance or can be compared to a reference database of known materials and segmentation maps (e.g. medical diagnosis and monitoring, atlas-based segmentation [9]); (ii) semi-supervised, where external annotations and input can be used to create a base segmentation and classification (e.g. quality control); (iii) unsupervised, where no prior knowledge about the acquired content is available. Methodologically, semi-supervised and unsupervised methods differ in that the number of separate objects and materials can be determined and indicated in semi-supervised schemes, while this information is commonly unknown for unsupervised schemes.

Spectral (i.e. multivariate) multi-label object distinction is a long-standing challenge in image analysis and various algorithms have been proposed to address this task. A short overview of existing methods applicable to spectral CT segmentation is given in section 2. Utilizing recent advances in neural network-based image segmentation [10] requires extensive, domain-specific training datasets. Hence, one major gap addressed by this article is the lack of an openly available benchmark dataset for spectral CT segmentation with which to test the plethora of proposed techniques in a standardized manner. We introduce the spectral 2D (i.e. MUSIC2D) and 3D (i.e. MUSIC3D) CT datasets for algorithmic benchmarking of multivariate image segmentation methods. Both datasets are made publicly available with manual segmentations for reference and scoring purposes. Apart from spectral tomographic segmentation, the dataset also provides input for research in CT correction algorithms, such as metal artefact reduction (MAR). The benchmark dataset is curated collaboratively by physicists and computer scientists at the Technical University of Denmark (DTU) for future assessments. Based on benchmark segmentation results, we discuss the effects of reconstruction artefacts and spectral binning on the visual quality and quantitative precision of the detected materials and objects.

2 Scientific background and related literature

The following paragraphs give an overview of the tasks inherent to tomographic segmentation and the approaches proposed in the literature to address them. The radiographic scanning applied here follows a common workflow for CT acquisition and processing (see fig. 2): the x-ray acquisition system collects spectral radiographic projections of the object of interest in a given scanning setup. Correction algorithms are applied to account for detector response and photon correlation ambiguities (see Dreier et al. [11] for details). Then, a sinogram for each energy bin is computed. The sinograms are used in an iterative tomographic reconstruction algorithm [12] to compute the interior x-ray attenuation contribution (i.e. the linear attenuation coefficient, LAC) per energy bin. Algorithms such as MAR and spectral range reduction can be employed beforehand or be integrated into the reconstruction process to improve the smoothness and reduce the noise in the tomographic dataset. The improved reconstruction data are then subject to the object segmentation algorithms assessed in this article. Due to over-segmentation effects, a fusion of segments based on the statistical distribution or morphological constraints may be required for each material. This latter task is performed manually in the remainder of the article for reasons of brevity and due to the focus on global segmentation methods.

2.1 CT reconstruction

Common fan-beam CT reconstruction aims at computing the x-ray attenuation in a 2D image from multiple 1D angular radiographic projections (analogously: computing the 3D volume from 2D angular projections) [13]. The resulting LAC of a scanned object depends on its material properties (e.g. density and element composition).

In general, the CT reconstruction task is an inverse problem [14]. A reconstruction can be computed analytically using the inverse Radon transform (i.e. filtered back-projection, FBP; [15, 13]) if sufficient projections for a given image resolution are available. The acquisition of such a large number of projections for high-resolution CT images is infeasible in some practical scenarios due to (i) physical constraints of the imaging system, (ii) computational runtime demands or (iii) the x-ray dose for radiation-sensitive objects (e.g. medical applications). On the other hand, an acquisition with too few projections lowers the signal-to-noise ratio (SNR) and introduces currently unrecoverable reconstruction artefacts in the tomographic images.

Compressed sensing theory and reconstruction methods [16] circumvent the excessive x-ray acquisition while still preserving a high quality in the image reconstruction. Total variation methods introduced geometric constraints and regularisation terms to solve inverse imaging problems more adequately [14]. Within our experiments, we use ART-TV for the CT reconstruction [17, 12]. Still, when using as few as 9 to 37 projections with 256 detector pixels each to reconstruct slices of 100 x 100 pixels, the SNR poses a major challenge to unsupervised image analysis approaches.
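To make the algebraic part of such a reconstruction concrete, the following is a minimal sketch of plain ART (Kaczmarz) sweeps. It deliberately omits the total variation step of ART-TV [17, 12], and the dense system matrix `A`, the function name and the parameters `n_sweeps` and `relax` are hypothetical choices for illustration, not the implementation used for the datasets.

```python
import numpy as np

def art_reconstruct(A, b, n_sweeps=50, relax=1.0):
    """Plain ART (Kaczmarz) sweeps over all rays; no TV regularisation."""
    A = np.asarray(A, dtype=float)      # one row per ray, one column per pixel
    b = np.asarray(b, dtype=float)      # measured line integrals (sinogram values)
    x = np.zeros(A.shape[1])            # flattened image estimate
    row_norms = np.einsum("ij,ij->i", A, A)
    for _ in range(n_sweeps):
        for i in range(A.shape[0]):
            if row_norms[i] == 0.0:
                continue                # ray that misses the image
            residual = b[i] - A[i] @ x
            # project the estimate onto the hyperplane of ray i
            x += relax * residual / row_norms[i] * A[i]
    return x
```

With only 9 to 37 projections the system is strongly under-determined, which is why a regulariser such as total variation is interleaved with these sweeps in practice.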

2.2 Unsupervised tomographic image segmentation

For the purpose of this article, we define image segmentation, the result of an iterative process over the image, as extracting non-overlapping regions whose pixel intensities are maximally homogeneous, whereas their average intensities are maximally heterogeneous with respect to all other regions. Equivalently, we can state the problem as minimising the inter-segment similarity while maximising the in-segment similarity (see eq. 1 to 4).


When segmenting spectral data, previous approaches [8, 7] utilise statistical distributions and a priori knowledge (e.g. the number of expected materials and objects) as conditioning parameters to simplify the segmentation task. This information is hard to obtain in a general acquisition setup: the statistical distribution of x-ray intensities (fig. 4) and material attenuations (fig. 5) does not allow for a clear separation and derivation of the number of scanned materials. As visible in the related images, the segmentation task can even be perceptually challenging for humans.

Fig. 4: Side-by-side comparison of the x-ray projections and their statistical intensity distribution (using a histogram) for energy bins 19 (a), 39 (b) and 96 (c), corresponding to energy levels 40.52 keV, 92.60 keV and 120.72 keV.
Fig. 5: Side-by-side comparison of the tomographic reconstructions and their statistical LAC distribution (see histogram) for energy bins 19 (a), 39 (b) and 96 (c), corresponding to energy levels 40.52 keV, 92.60 keV and 120.72 keV.

This work focusses on the fully automatic, unsupervised segmentation of 2D- and 3D spectral CT images. Therefore, an in-depth discussion on conditioned graph cut methods [18, 19, 20] or active contour models [21, 22, 23] is not within the scope of this article, other than being applicable to the image segmentation if the number of distinct materials or objects and their approximate position were known a priori. Comparable approaches that either need prior conditioning (GMM[8], SDA [24, 7]) or that are related to supervised segmentation, such as random walks [25] via texture synthesis or SVM learning [26], are described elsewhere as they are not directly applicable to the given problem.

Traditional multi-label unsupervised image segmentation techniques without prior knowledge of the image content include approaches such as hierarchical clustering [27] and minimum spanning trees (MST) [28]. More recently, mean shift (MS) algorithms [29] have been used in multivariate segmentation [30], which shares similarities with spectral segmentation. The drawback of fixed kernel sizes for analysing anisotropic value distributions has been addressed by adaptive MS [31, 32]. In this work, we utilise the FAMS algorithm [31] together with its spectral gradient extension [3] as one reference procedure for unbiased, unconstrained spectral CT segmentation in 2D and 3D.

Directly mapping algorithms for analysing multivariate statistics to image analysis challenges (e.g. using MS for image segmentation) tends to omit the dense connectivity between data points inherent in images. The Power Watershed algorithm by Couprie et al. represents a recent approach for unsupervised image segmentation that utilizes dense connectivity constraints [33], which has not been tested in this paper due to the computational complexity of the watershed calculation across multiple channels. In this work, we extend an unconditioned graph cut algorithm by Felzenszwalb and Huttenlocher [34] to perform the image segmentation while taking advantage of the inherent dense pixel connectivity. The specific graph cut formulation utilised in this article is explained in section 5.2.

2.3 Object- and material identification in luggage scanning

In the baggage screening application, we first distinguish between the object identification and the material identification, which also approximately relates to the distinction between the segmentation and classification. Separating and locating objects within the CT scan is a segmentation challenge. Identifying rigid threat items, such as small arms, has been the identification focus in traditional literature (e.g. [35, 36]), which requires a good object segmentation combined with simple shape priors in the classification.

Recent interest has shifted to the identification of explosives and liquid, aerosol and gel (LAG) threats in baggage. These objects are non-rigid and occur in arbitrary shapes. In such cases, the segmentation is only an auxiliary tool to group areas of similar x-ray intensity or attenuation. In turn, the use of attenuation contributions within the volume necessitates an actual tomographic reconstruction, whereas rigid threat item detection can equally be performed on the radiographic projections themselves.

Classic single-energy CT and x-ray radiography are still prevalently used for the identification of rigid threat items due to the rapid acquisition and processing. Furthermore, for solid objects, the captured attenuation relates well to the material density and the rough distinction between solids (e.g. plastics and metals). Liquid materials generally appear as low-attenuation objects in single-energy CT, and different liquids exhibit almost identical attenuation values, which is why baggage screening for such threats uses dual-energy CT (DECT) [37] or MECT [38, 39]. The dual-energy index (DEI) is used in DECT to distinguish materials on a chemical level [40]. The use of MECT is a promising recent research direction due to its additional material information, though CT reconstruction and image processing algorithms are more complex. A review of the state of the art in baggage screening is provided by Mouton et al. [41], which is based on the final project report of the ALERT initiative and supplementary review extensions.

More specific to DECT and MECT segmentation, Eger et al. attempt to use the full energy spectrum (10 keV to 150 keV) for material identification via extensive chemical reference samples and direct, pixelwise LAC comparison [38]. Then, they perform a linear dimensionality reduction via SVD and LDA to extract the most relevant image information, while the final classification is performed via trained classifiers (i.e. likelihood ratio test) for reference material data. Mouton et al. provide a benchmark overview of four chained DECT segmentation methods, consisting of reference material comparison via DEI, a subsequent quality estimation and the refinement of critical segment areas via connected components and split-and-merge strategies. The presented results are acceptable on visual inspection, though note that the evaluation dataset depicts mainly rigid objects and easily distinguishable materials. Martin et al. presented a learning-based segmentation algorithm for DECT data that trains a probabilistic kNN classifier while employing unconditioned graph cuts for the class separation [42]. Our approach of using the spectral information as an explicit auxiliary dimension is similar to [42], though we explicitly address the segmentation without prior knowledge of the number of objects (e.g. the knowledge of k for kNN).

Currently published material classification methods that purely operate on a material database comparison achieve limited success due to reconstruction artefacts, noise and natural material variations, which is discussed in the literature [38, 40, 42]. More recently, material classification has been approached with pattern recognition and machine learning algorithms such as visual bag-of-words [43], with a classification accuracy of 70% (see review in [44]). A recent trend in material classification is the training of convolutional neural networks (CNN) to perform object segmentation and material classification. Mery et al. first attempted to use pre-trained models on the ImageNet dataset from the two prevalent neural network architectures with mixed success. Akcay et al. use a pre-trained model with ImageNet data from ConvNet [45], which is refined using actual 2D reconstructions of a limited-size dataset from the UK airport authorities [46]. The final-stage classification uses SVM. Both research groups report the lack of sufficient training data as a major impediment to better detection accuracy with CNN.

Possibly one of the largest application areas of the 2D and 3D spectral dataset published with this paper is the provision of an overall set of 2376 spectral images from the domain of material science that can be directly used as input for advanced machine learning algorithms (e.g. CNN, training-based MCMC [47, 25]) in baggage screening.

3 Acquisition Instrumentation and Parameters

The 2D and 3D spectral datasets in this article are acquired with a custom-built tomography setup using MULTIX ME-100 V2 cadmium telluride (CdTe) x-ray detectors. The detector counts photons in the energy range of 20 keV to 160 keV and resolves them in up to 128 energy bins [48, 49]. The detector itself consists of two elements of 128 pixels each. The x-ray generator is a microfocus Hamamatsu ML12161-07. For the datasets presented here, the source was operated at a tube voltage of 150 kVp and a current of 250 µA, resulting in a focal spot size of 75 µm. The beam is collimated horizontally with a JJ X-ray IB-80-Air to a height of 0.6 mm at the source. A custom-built, 5 mm thick tungsten slit with an opening of 0.6 mm, placed directly in front of the detector, keeps scattered photons from the detector.

A spectral correction algorithm is applied to the real data in order to remove spectral detector pixel artefacts [11]. The result is a set of 37 energy-corrected x-ray projections covering the full angular range of 360 degrees. These spectral sinograms are included with each dataset. Subsequently, we use an ART-TV reconstruction algorithm to obtain the spectral images and volumes. The lateral reconstruction resolution is limited by the number of projections and the detector pixel resolution, currently resulting in a target resolution of 100 x 100 pixels laterally (i.e. per slice). The datasets are still subject to metal artefacts, as proper MAR is still under development in our acquisition procedure.

4 The Datasets

In this section we present the datasets (2D- and 3D-spectral) that form the major contribution of this article. We discuss the core properties of each dataset, such as SNR, CNR, and the energy responses of various materials in the scanner.

4.1 MUSIC2D

Tomographic 2D images are more easily accessible nowadays and, owing to the previously mentioned body of literature on luggage and cargo inspection, even dual-energy data can be obtained upon request from the application domain community. We decided to evaluate and publish our spectral data as a novel contribution because (i) most open data available for segmentation originate from the medical domain or cargo inspection, where segmentation can be steered by strong shape priors (which is in contrast to actual material identification), and (ii) spectral tomographic data beyond dual energy, covering larger parts of the x-ray spectrum, are rare.

The MUSIC2D dataset consists of 32 spectral images in total, of which there are 11 single-object material reference images as material database and 21 multi-object realistic scans to evaluate attenuation interference between the different materials. Fig. 6 shows a collection of material reference scans and multi-object scans, composed as follows:

Fig. 6: Collection of multi-spectral CT illustrations of MUSIC2D dataset, where each colour channel depicts a specific x-ray energy channel. Note the considerable noise in each energy channel that poses intrinsic challenges to the material identification.

Segmenting the data according to their spectral response is challenging without the use of shape priors. The SNR within the detector-noise-corrected x-ray projections (expressed via the non-background photon count and its standard deviation, see [50]) is between 0.92 and 1.85 in metal-heavy scans and 2.34 ± 0.05 in scans unaffected by metal, while the SNR in the reconstruction (measured within the image, as in [51]) is between 3.1 (high metal content) and 4.9 (low metal content). The average CNR (consult [52] for symbols and definition) is in the range of 3.1 to 6.5. The challenge becomes more apparent when considering the average CNR across all 11 single-object scans as a function of the energy spectrum (fig. 7): the low-energy range of 20 keV to 35 keV and the high-energy range of 125 keV to 160 keV show excessive noise. One control parameter for the SNR is the count statistics during the x-ray acquisition (i.e. the number of emitted and received photons contributing to a pixel's attenuation coefficient), which can be adjusted in the instrument. Low-energy noise artefacts further originate from saturation truncation in the correction algorithm [11], as well as from the sample absorption in the radiographic projections (i.e. objects sample one another in the projections, where the object further back receives fewer photons to detect). High-energy noise is due to generally decreasing count statistics in this part of the spectrum. The typical LAC profiles for some of the materials tested in this article are presented in fig. 8.
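As a rough illustration of how such figures can be computed on a reconstructed slice, the sketch below uses two commonly used definitions: SNR as mean over standard deviation inside a region, and CNR as absolute mean contrast over background standard deviation. These definitions, the function names and the mask arguments are assumptions for illustration; the exact formulas of [50], [51] and [52] are not reproduced here and may differ.

```python
import numpy as np

def snr(image, mask):
    """Mean over standard deviation inside a region of interest (assumed definition)."""
    values = np.asarray(image, dtype=float)[mask]
    return values.mean() / values.std()

def cnr(image, object_mask, background_mask):
    """Absolute mean contrast divided by the background standard deviation (assumed definition)."""
    img = np.asarray(image, dtype=float)
    obj = img[object_mask]
    bg = img[background_mask]
    return abs(obj.mean() - bg.mean()) / bg.std()
```

Evaluating `cnr` separately for every energy channel of a single-object scan gives a per-channel curve of the kind summarised in fig. 7.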

Fig. 7: The CNR for the various testing materials (shown as a function of x-ray energy spectrum) shows low signal-to-noise contrast in the spectral regions around 20-35 and 125-160 keV. Conversely, the region close around 30 keV is essential for most fluid identifications.
Fig. 8: The LAC response curves for all 11 reference material scans of MUSIC2D. Notice the photon starvation below 30 keV and that the boundary effects from the correction algorithm result in deviations from the physical norm in LACs.

4.2 MUSIC3D

The major novelty of this article is the treatment of 3D spectral data (i.e. the MUSIC3D dataset). For medical imaging, volumetric phantoms of single-energy CT are openly available whereas multi-energy datasets are not. Additionally, the acquisition of MECT for medical diagnosis and treatment is uncommon due to the considerably-increased x-ray exposure of potential patients, and actual patient data is rarely being made public. MECT scans are more common in cargo- and luggage assessment, where x-ray exposure of organisms is less of a concern, but openly available datasets are missing while they are in high demand for advanced image analysis. This is the scientific gap filled by MUSIC3D.

The MUSIC3D dataset consists of 7 spectral samples in total, all including multiple objects in realistic settings. Two scans (i.e. ’Sample 23012018’ and ’Sample 24012018’) pose increased challenges for image segmentation due to an aluminium bar that causes considerable metal artefacts. Fig. 9 shows the DVR-generated images for each scan, with selected energy channels (in keV) mapped to the colour channels.

Fig. 9: Collection of multi-spectral CT illustrations of MUSIC3D dataset, where each colour channel depicts a specific x-ray energy channel.

5 Algorithms & Methods

For the segmentation of spectral images and volumes and with respect to the introduced target domain, we quickly summarize key requirements on our target methodology:

  • minimize inner-segment heterogeneity while maximising inter-segment heterogeneity

  • make explicit use of the extra, spectral data dimension

  • not require user input for the approximate number or locations of target segments

  • not require a priori knowledge about the number of objects or materials within the data (i.e. fully unsupervised segmentation)

  • not rely on rigid boundaries or prominent shapes for the segmentation

  • reduce bias by not expecting specific materials to be present in the data

The boundary conditions on this spectral segmentation are very strict and thus leave few existing algorithms applicable to the task; these are discussed in the following.

5.1 Spectral Fast Adaptive Mean Shift

The mean shift algorithm [29] is an established tool for multivariate data analysis and has been applied to multi-spectral image analysis in many application domains [53, 3, 28]. The algorithm aims at estimating the data density in $\mathbb{R}^d$ using a kernel function (see eq. 7).

$$\hat{f}(x) = \frac{c_{k,d}}{n h^d} \sum_{i=1}^{n} k\left( \left\| \frac{x - x_i}{h} \right\|^2 \right) \tag{7}$$

, with $x_i$ being the $n$ data samples, $h$ being the kernel bandwidth, $c_{k,d}$ being a kernel normalisation constant in $\mathbb{R}^d$ and $k(\cdot)$ being the kernel profile. With this density estimation, a mean shift vector is constructed, as in eq. 8.

$$m_h(x) = \frac{\sum_{i=1}^{n} x_i \, g\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)}{\sum_{i=1}^{n} g\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)} - x \tag{8}$$

, with $g(\cdot) = -k'(\cdot)$ being the kernel function and $x$ being the kernel centre. Within the resulting data mapping function in $\mathbb{R}^d$, the algorithm locates extremal points whose density derivative equals zero (eq. 9). These extremal points are called modes and are the data cluster centroids in kernel space. In an image segmentation scenario, the modes are segment centres in kernel space and the (pruned) number of modes equals the number of segments.

$$\nabla \hat{f}(x) = 0 \tag{9}$$

A drawback of the initial equation is the use of a constant kernel size $h$ (eq. 8), which assumes an isotropic value distribution in $\mathbb{R}^d$. From our initial experiments and the analysis of the statistical distribution of x-ray intensities, we see that the isotropic value sampling does not apply to MECT data. Thus, we apply the FAMS extension [31] of MS to our data, where the mean shift vector follows the formulation in eq. 10.

$$m(x) = \frac{\sum_{i=1}^{n} \frac{1}{h_i^{d+2}} \, x_i \, g\left( \left\| \frac{x - x_i}{h_i} \right\|^2 \right)}{\sum_{i=1}^{n} \frac{1}{h_i^{d+2}} \, g\left( \left\| \frac{x - x_i}{h_i} \right\|^2 \right)} - x \tag{10}$$

In this formulation, the mean shift vector does not depend on a globally estimated, isotropic bandwidth $h$, but adapts to local anisotropic density variations with an individual density bandwidth $h_i$ per datum $x_i$. Furthermore, Jordan and Angelopoulou proposed using the spectral gradient as input to the mean shift (as opposed to the actual absorption intensities in their spectral photographs) to obtain more coherent segmentations [3]. After initial experimental comparisons between mean shifting x-ray intensities and spectral gradients thereof, we decided to use the spectral gradient adaptation.
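The adaptive mean shift update described above can be sketched in a few lines of NumPy. This is a brute-force illustration under stated assumptions, not the FAMS reference implementation [31]: it uses a Gaussian kernel, takes the per-datum bandwidth as the distance to the k-th nearest neighbour, and computes all pairwise distances exactly instead of using an approximate neighbour search. The function names and parameters are hypothetical.

```python
import numpy as np

def knn_bandwidths(points, k):
    """Per-datum bandwidth: distance to the k-th nearest neighbour (assumed pilot rule)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return np.sort(d, axis=1)[:, k]     # column 0 is the point itself

def adaptive_mean_shift(points, k=5, n_iter=50, tol=1e-6):
    """Move every sample to its density mode using per-datum bandwidths."""
    points = np.asarray(points, dtype=float)
    dim = points.shape[1]
    h = np.maximum(knn_bandwidths(points, k), 1e-12)
    modes = points.copy()
    for _ in range(n_iter):
        diff = modes[:, None, :] - points[None, :, :]
        sq = np.sum(diff ** 2, axis=2) / h[None, :] ** 2
        # Gaussian kernel weight, scaled by 1 / h_i^(d+2) per datum
        w = np.exp(-0.5 * sq) / h[None, :] ** (dim + 2)
        new_modes = (w @ points) / w.sum(axis=1, keepdims=True)
        shift = np.linalg.norm(new_modes - modes, axis=1).max()
        modes = new_modes
        if shift < tol:
            break
    return modes
```

Samples whose modes coincide (up to a pruning tolerance) would then be assigned to the same segment. The pairwise computation scales quadratically with the number of samples, so it only suits small test images.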

We implemented the algorithm using the reference implementation of [31] in C and wrapped its functionality in Python to provide an easy interface to the actual data processing. The spectral gradient computation is performed in NumPy, as are the uniform data normalisation and quantisation, which are necessary because the FAMS procedure was specifically designed for integer numerics.
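A minimal sketch of this preprocessing is given below. It assumes the spectral gradient is taken as the finite difference of log-values between adjacent energy channels, and that the quantisation targets unsigned 16-bit integers; both choices, as well as the function names and the `eps` guard, are illustrative assumptions rather than the published code.

```python
import numpy as np

def spectral_gradient(cube, eps=1e-6):
    """Finite difference of log-values along the last (energy) axis."""
    logs = np.log(np.maximum(np.asarray(cube, dtype=float), eps))
    return np.diff(logs, axis=-1)

def normalise_and_quantise(data, levels=65535):
    """Uniformly rescale to [0, levels] and round to unsigned integers."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(), data.max()
    scale = (hi - lo) if hi > lo else 1.0   # guard against constant input
    return np.rint((data - lo) / scale * levels).astype(np.uint16)
```

For an input of shape (rows, cols, channels), the gradient has one channel fewer, and the quantised array can be flattened to one feature vector per pixel before it is handed to the clustering routine.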

5.2 Spectral Graph Cut

For the unconditioned graph cut, we use the method formulation by Felzenszwalb and Huttenlocher [34]: let the image (or volume) be represented by an undirected graph with each pixel (or voxel) being a vertex of the graph. Each vertex contains a property vector representing the x-ray intensities or LAC. All vertices are connected to their 1-ring neighbourhood by edges, where the edge length is given by weights. We define the edge weights as the mean vector derivative of two vertices (eq. 11).


The implications of different 1-ring neighbourhood definitions are discussed with the experiment results.

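After the definition of the data terms, our segmentation proceeds as originally described in [34]: each vertex is initialised with a unique segment number. At each iteration, the internal difference per segment is computed as the maximum weight of its MST (eq. 12). The inter-segment difference is computed as the minimum edge weight connecting two segments (eq. 13). An inter-segment boundary is indicated by the delta-function (eq. 14). Segments not separated by a boundary are merged. The algorithm terminates when all segments are separated by boundaries. This formulation strictly follows the first criterion laid out for our segmentation goals. Note that these boundaries are limits of the spectral signature and not expected shape boundaries. In the notation of [34], with $w(e)$ being the weight of edge $e$, $C$ being a segment, $E$ being the edge set and $\tau(C) = k / |C|$ being the size-dependent threshold with scale parameter $k$:

$$\mathrm{Int}(C) = \max_{e \in \mathrm{MST}(C, E)} w(e) \tag{12}$$

$$\mathrm{Dif}(C_1, C_2) = \min_{v_i \in C_1,\; v_j \in C_2,\; (v_i, v_j) \in E} w(v_i, v_j) \tag{13}$$

$$\delta(C_1, C_2) = \begin{cases} 1 & \text{if } \mathrm{Dif}(C_1, C_2) > \min\left( \mathrm{Int}(C_1) + \tau(C_1),\; \mathrm{Int}(C_2) + \tau(C_2) \right) \\ 0 & \text{otherwise} \end{cases} \tag{14}$$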
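The merging procedure of [34] can be sketched for a multi-channel 2D image as follows. This is a simplified illustration under stated assumptions: the edge weight is taken as the mean absolute channel difference between neighbouring pixels, only a 4-neighbourhood is used, and the scale parameter `k` and all names are hypothetical. It is not the extended implementation evaluated in this article.

```python
import numpy as np

class DisjointSet:
    """Union-find that tracks segment size and internal difference."""
    def __init__(self, n):
        self.parent = np.arange(n)
        self.size = np.ones(n, dtype=int)
        self.internal = np.zeros(n)     # largest MST edge weight per segment

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]   # path halving
            a = self.parent[a]
        return a

    def union(self, a, b, w):
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        self.internal[a] = w            # edges arrive sorted, so w is the maximum

def segment_spectral_image(image, k=1.0):
    """image: (rows, cols, channels) array; returns an integer label map."""
    image = np.asarray(image, dtype=float)
    rows, cols, _ = image.shape
    idx = np.arange(rows * cols).reshape(rows, cols)
    edges = []
    # 4-neighbourhood: horizontal and vertical neighbour pairs
    for a, b, va, vb in (
        (idx[:, :-1], idx[:, 1:], image[:, :-1], image[:, 1:]),
        (idx[:-1, :], idx[1:, :], image[:-1, :], image[1:, :]),
    ):
        w = np.abs(va - vb).mean(axis=-1)   # assumed spectral edge weight
        edges.extend(zip(w.ravel(), a.ravel(), b.ravel()))
    edges.sort(key=lambda e: e[0])
    ds = DisjointSet(rows * cols)
    for w, a, b in edges:
        ra, rb = ds.find(a), ds.find(b)
        if ra == rb:
            continue
        threshold = min(ds.internal[ra] + k / ds.size[ra],
                        ds.internal[rb] + k / ds.size[rb])
        if w <= threshold:              # no boundary between the two segments
            ds.union(ra, rb, w)
    roots = np.array([ds.find(i) for i in range(rows * cols)])
    _, labels = np.unique(roots, return_inverse=True)
    return labels.reshape(rows, cols)
```

Larger values of `k` favour larger segments; for volumes, the neighbour pairs would be generated along three axes instead of two.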


The chosen neighbourhood definition applied to the edge connectivity in the graph impacts the segmentation results because the weights of the edges are composed of neighbouring absolute voxel differences. The original implementation uses a box filter kernel (27-neighbourhood in 3D grids) as connectivity representation (fig. 10b). This leads to leakage between segments, as diagonal boundaries are not respected when containing a segment. We experimented with the constrained 7-neighbourhood definition for vertex adjacency (fig. 10a), which prevents leakage explicitly but results in a larger number of isolated segments, as well as with a bell-curve weighted box filter (i.e. weighted 27-neighbourhood; fig. 10c), which represents an adaptable trade-off between segment leakage and segment isolation.

(a) 7-neighbourhood
(b) 27-neighbourhood
(c) weighted 27-neighbourhood
Fig. 10: Neighbourhood definition illustration. For the weighted neighbourhood, the block intensity maps to the contribution of each adjacent cell.
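The three neighbourhood definitions can be written down as voxel offset sets. The following is a minimal sketch; the function names, the Gaussian form of the bell curve and the `sigma` parameter are assumptions, since the text does not state the exact kernel or how its weights enter the edge weights.

```python
from itertools import product
from math import exp


def neighbourhood_7():
    """Centre voxel plus its 6 face-adjacent neighbours."""
    offsets = [(0, 0, 0)]
    for axis in range(3):
        for step in (-1, 1):
            offset = [0, 0, 0]
            offset[axis] = step
            offsets.append(tuple(offset))
    return offsets


def neighbourhood_27():
    """Full 3x3x3 box around (and including) the centre voxel."""
    return list(product((-1, 0, 1), repeat=3))


def weighted_neighbourhood_27(sigma=1.0):
    """Map each 3x3x3 offset to a normalised Gaussian weight (assumed kernel)."""
    raw = {o: exp(-sum(c * c for c in o) / (2.0 * sigma ** 2))
           for o in neighbourhood_27()}
    total = sum(raw.values())
    return {o: w / total for o, w in raw.items()}
```

In the weighted variant, diagonal neighbours still contribute, but less than face-adjacent ones, which is what makes the trade-off between leakage and isolation adjustable.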

5.3 Adaptive, anisotropic binning

The initial segmentation results, especially in 3D spectral domains, were unsatisfactory for subsequent material identification, even when omitting the high-noise bottom and top ends of the acquired x-ray spectrum (see CNR in fig. 7). A detailed analysis of the attenuation profiles in the 3D spectral data revealed that the material information most distinctive for fluids is located in narrow bands of the x-ray spectrum between 38 keV and 70 keV (see fig. 8). The channels in focus for material distinction are in a low-to-mid energy range where solid materials as well as liquids significantly attenuate the penetrating x-rays (i.e. by more than 30% [54]). Different solid materials (e.g. plastics, metals) can also be separated in higher energy channels, while liquids attenuate the high-energy x-rays too little to be visible in the upper spectral range. With uniform (i.e. isotropic) sampling of the energy bins, these low-energy, high-information channels are accumulated into one energy bin, leading to a loss of information.

We correct the information loss by introducing an adaptive, anisotropic binning scheme that reserves more output bins for the high-variability bands in the x-ray spectrum. The adaptive binning computes the variance for each input energy channel separately, as well as the global sum over all energy channels. We then define the budget for each output bin from this global sum and the number of output bins. According to this budget, energy channels are accumulated while the distinct energy information is preserved relative to the overall data variability.
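A minimal sketch of such a variance-budget allocation is given below. It assumes that the budget of each output bin is an equal share of the summed per-channel variances and that accumulated channels are averaged; both choices, as well as the function names, are illustrative assumptions and not taken from the released code.

```python
from statistics import pvariance


def adaptive_bin_edges(channels, n_bins):
    """channels: one list of samples per energy channel (len >= n_bins).

    Returns n_bins consecutive [start, stop) channel index ranges.
    """
    variances = [pvariance(values) for values in channels]
    budget = sum(variances) / n_bins  # assumed: equal share of total variance
    edges, start, spent = [], 0, 0.0
    for i, var in enumerate(variances):
        spent += var
        bins_left = n_bins - len(edges) - 1
        channels_left = len(variances) - i - 1
        # Close the bin once its budget is used up, but always keep at
        # least one channel in reserve for every remaining output bin.
        if bins_left > 0 and (spent >= budget or channels_left == bins_left):
            edges.append((start, i + 1))
            start, spent = i + 1, 0.0
    edges.append((start, len(variances)))
    return edges


def rebin(spectrum, edges):
    """Average one voxel's spectrum over each [start, stop) channel range."""
    return [sum(spectrum[a:b]) / (b - a) for a, b in edges]
```

With this allocation, a few high-variance channels each claim their own output bin, while long runs of low-variance channels are merged into a single bin.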

6 Results

In this section, we present the segmentation results obtained with the methods discussed above. A discussion of the FAMS and unsupervised graph cut segmentation with minimal data modification is followed by an in-depth analysis of the impact of adaptive spectral binning. Moreover, we discuss how the deteriorating quality of limited-projection tomographic reconstruction influences the segmentation results on the adaptively-binned spectral datasets.

The two segmentation methods utilised in this study each evaluate the spectral gradient: the graph cut method does so explicitly via edge weights, while the mean shift does so implicitly by localising extremal points in the feature plane, which is here described by the spectral dimension. As there is too little variation between the individual channel responses and because the data need to be trimmed to avoid harmful frequencies (see section 2.1), the full-spectrum data are first clipped between 35 keV and 140 keV and then uniformly rebinned into 20 averaged channels.
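The clipping and uniform rebinning step can be sketched as follows. The function name and the handling of channel counts that do not divide evenly are assumptions; the per-channel energies must come from the detector calibration, which is not reproduced here.

```python
def clip_and_rebin(spectrum, energies, e_min=35.0, e_max=140.0, n_bins=20):
    """Keep channels with e_min <= energy <= e_max and average them into n_bins."""
    kept = [v for v, e in zip(spectrum, energies) if e_min <= e <= e_max]
    if len(kept) < n_bins:
        raise ValueError("fewer channels than requested bins")
    binned = []
    for b in range(n_bins):
        # Integer arithmetic spreads any remainder evenly across the bins.
        start = b * len(kept) // n_bins
        stop = (b + 1) * len(kept) // n_bins
        binned.append(sum(kept[start:stop]) / (stop - start))
    return binned
```

The function operates on the spectrum of a single pixel or voxel, so it would be applied independently to every element of the image or volume.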

6.1 FAMS segmentation

Fig. 11 shows the segmentation results on 9 of the 11 single-object scans of the MUSIC2D dataset using FAMS across the full spectrum (i.e. uniform binning). Fig. 12 shows the segmentation maps for 3 selected cases of the multi-object scans in the 2D dataset. As seen in the figures, the spectral FAMS is mostly unable to extract larger segment regions, though it is often able to extract the spectral boundary (i.e. the boundary formed by a high-intensity spectral gradient) between materials. Hence, while the spectral FAMS method fails at providing good segmentations for MECT (in contrast to previously documented literature for visible and infrared spectra [55, 3]), it provides spectral boundary constraints that can be used to extract connected components and then to derive spectrally separate objects. Other versions of the local density-adaptive mean shift (also listed in [3]) have been tested directly on the recorded energy channels, but no mean shift algorithm was able to extract consistent spectral object maps usable for subsequent material identification.
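The connected-component step mentioned above can be sketched on a binary boundary map. This is an illustrative breadth-first labelling with 4-connectivity on a 2D grid; the function name and the connectivity are assumptions and not part of the FAMS software.

```python
from collections import deque


def label_regions(boundary):
    """Label 4-connected regions of non-boundary pixels; boundary pixels get 0."""
    rows, cols = len(boundary), len(boundary[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if boundary[r][c] or labels[r][c]:
                continue
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not boundary[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels, next_label
```

A closed boundary contour therefore yields one label for the enclosed object and another for the background, provided that the extracted boundary has no gaps.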

Fig. 11: Collection of the FAMS segmentation results achieved on 9 of the available 11 reference material scans. Completely greyed-out images did not provide any segments.
Fig. 12: Collection of the FAMS segmentation results achieved on 3 of the available 20 multi-object scans. Completely greyed-out images did not provide any segments. The composition of each sample is given in app. A.

These 2D results apply the standard parameterization of k (neighbourhood query size), K (observation size) and L (detail level) (consult [31] for parameter descriptions).

Fig. 13 provides an overview of the achievable segmentations on the MUSIC3D dataset with the FAMS segmentation. As with the 2D spectral data, the segmentation results of 3D data are of insufficient quality for direct material identification. Because of the low SNR, we failed to extract any segments via FAMS for 3 samples of the 3D spectral dataset, and the method’s parameters k, K and L (see their description in [31]) needed to be fine-tuned for each sample to achieve a non-arbitrary segmentation. When segmenting the 3D datasets, we experimentally established the following ranges of working parameter settings:

  • k: 100 – 220 (std.: 100)

  • K: 20 – 30 (std.: 24)

  • L: 30 – 40 (std.: 35)

Fig. 13: FAMS segmentation results achieved on the MUSIC3D dataset. Spectral FAMS predominantly extracts the segment boundaries as the target segments (i.e. output is a binary segmentation of boundaries). For some datasets (’Sample24012018’,’Fluids’,’Fruits’), their content is indistinguishable from the background attenuation, leading to an empty segment map. The illustration embeds the segments (black colour) in a DVR of energy channel 39 (61.35 keV).

The drawback of the parameterisation is rooted in the FAMS method: it assumes that the data are Gaussian distributed and that each value is equiprobable. Due to the underlying decay in attenuation (see fig. ) for most non-metallic materials, the input data to FAMS from MECT are neither Gaussian distributed nor is each value equally probable. This affects the neighbourhood determination via LSH [56], upon which FAMS is based, because a neighbour query is more likely to locate neighbours in higher parts of the x-ray spectrum. The speed of the LSH query is what drives the observation size parameters K and L, which in turn results in small observation sizes (and thus small segments) of FAMS. This theory is supported by our experiments: we executed the integrated auto-parameterization provided by Georgescu and Shimshoni [31, 32], which determines optimal observation sizes depending on the size of the neighbourhood query k. The auto-parameterization yields better results for the full-spectrum MUSIC2D scans that actually include multiple material segments (see fig. 14 for samples with multi-material objects). The results of the auto-parameterization for scan ’Sample 31102016’ vary significantly with the neighbourhood query size and, in one case, even provide an acceptable object segmentation (fig. 15). Despite the potential improvements of the auto-parameterization, the actual segments still need to be extracted via connected component extraction. Furthermore, because the auto-parameterization relies on execution time measurements, the results of the procedure vary with changing system load during the program’s execution.

Fig. 14: Results of the auto-parameterization on the full-spectrum MUSIC2D FAMS segmentation. The images show results of ’sample3’ (k=20,K=15,L=15), ’sample5’ (k=150,K=12,L=16), ’sample11’ (k=120,K=13,L=17) and ’sample15’ (k=100,K=13,L=14), which show improved segmentation of material segments compared to the standard parameterization in fig. 12. All images are zoomed-in versions of the full-scale results for better detail visibility.
Fig. 15: Result of the auto-parameterization option of FAMS: the results for neighbourhood query sizes of 80, 100 and 220 are shown after optimal observation sizes were determined for ’Sample 31102016’. The algorithm is able to extract object interiors (semi-translucent grey contours) and exteriors (opaque orange contours). The interiors can be used as input for connected component extraction.

6.2 Graph cut segmentation

In comparison to the spectral FAMS method, the unsupervised spectral graph cut yields more appropriate segmentation results. Figures 16 and 17 display the segmentation maps for the 9 single-material reference samples of MUSIC2D and side-by-side comparisons of LAC and graph cut segment maps for 3 multi-object scans in the same 2D dataset. The material boundaries still show considerable uncertainties, but the segments are visually of acceptable quality to be used for subsequent material identification.

Fig. 16: Renderings of the achievable graph cut segmentation map for 9 of the 11 reference material scans. Segments with equal tones but different brightness are composed of multiple segment indicators.
Fig. 17: Side-by-side comparison of the graph cut segments and LAC responses per material for 3 multi-object scans (for sample composition, see app. A). Segments with equal tones but different brightness are composed of multiple segment indicators.

Table I presents the Dice coefficient as a quantitative evaluation of the graph cut segmentation quality for each scan in the MUSIC2D dataset. The reference segmentation for each scan was obtained manually beforehand using MITK [57] and DeVIDE [58].

Sub-dataset              Dice coeff.
Acetone                  0.7023
Brandy Chantré           0.5879
Cien hand cream          0.8145
Garnier Fructis          0.5760
H2O2                     0.8290
Methanol                 0.4573
Nitromethane             0.5930
Nivea sun lotion 50+     0.5617
Olive oil                0.5340
H2O                      0.8694
Whiskey Tullamore Dew    0.5844
Overall                  0.6463
TABLE I: Dice coefficient overview for the MUSIC2D dataset as quantitative evaluation of the graph cut segmentation quality
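For reference, the Dice coefficient of a binary segment mask against a binary reference mask can be computed as sketched below. How the per-scan values in the table were aggregated over multiple segments is not stated in the text, so this sketch only covers the standard two-mask definition; the function name is illustrative.

```python
def dice_coefficient(segmentation, reference):
    """Dice overlap 2*|A and B| / (|A| + |B|) of two flattened binary masks."""
    a = [bool(v) for v in segmentation]
    b = [bool(v) for v in reference]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of elements")
    overlap = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * overlap / total
```

A value of 1 indicates perfect agreement with the manual reference, while 0 indicates no overlap at all.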

The unconditioned graph cut also provides an improved segmentation quality for MUSIC3D, which is illustrated in fig. 18. As with FAMS, due to the high noise level, the observation size parameter k (see Felzenszwalb and Huttenlocher [34]) often needs to be adapted for each scan individually.

Fig. 18: Graph cut segmentation results achieved on the MUSIC3D dataset. Some segments are composed of multiple indicators, as in the 2D graph cut results (figs. 16 and 17). Note the high degree of noise that leads to insufficient material distinction. The illustration embeds the segments (colours) in a DVR of energy channel 39 (61.35 keV).

The overall segmentation quality of the graph cut is shown in table II using the Dice coefficient.

Sub-dataset         Dice coeff.
Sample 31102016     0.7646
Sample 23012018     0.3391
Sample 24012018     0.4424
Fluids              0.3428
Fruits              0.5578
Non-threat items    0.6145
Threat items        0.7010
Overall             0.5374
TABLE II: Dice coefficient overview for the MUSIC3D dataset as quantitative segmentation quality evaluation

The neighbourhood definition applied for the edge connectivity in the graph, as described in sec. 5.2, has a significant impact on the segmentation results. Fig. 19 shows the effect of varying neighbourhood definitions on the graph cut segmentation result: the common 27-neighbourhood definition performs relatively poorly as it is visibly prone to noise. A 7-neighbourhood definition results in cleaner separations between segments, though at the cost of drastic oversegmentation (two to three times as many extracted segments as with the other neighbourhood definitions). The weighted 27-neighbourhood definition gives the most visually acceptable results for all full-spectrum 3D samples.

Fig. 19: Comparison of the graph cut segmentation with full-spectrum data depending on different neighbourhood definitions. The illustration embeds the segments (colours) in a DVR of energy channel 39 (61.35 keV).

6.3 Adaptive spectral binning

The application of the adaptive, anisotropic spectral binning scheme generally offers large improvements upon the poor segmentations presented until now. The improved segmentation quality can be observed quantitatively in the Dice coefficient (see tables III and IV) as well as qualitatively by visual inspection (see figs. 20, 21 and 22).

Sub-dataset              Dice coeff.
Acetone                  0.4340
Brandy Chantré           0.6613
Cien hand cream          0.8676
Garnier Fructis          0.8961
H2O2                     0.6749
Methanol                 0.9155
Nitromethane             0.4545
Nivea sun lotion 50+     0.8301
Olive oil                0.4778
H2O                      0.4992
Whiskey Tullamore Dew    0.5011
Overall                  0.6557
TABLE III: Dice coefficient overview of the graph cut for the MUSIC2D dataset after adaptive binning

Sub-dataset         Dice coeff.
Sample 31102016     0.7630
Sample 23012018     0.5155
Sample 24012018     0.8758
Fluids              0.9095
Fruits              0.4195
Non-threat items    0.8995
Threat items        0.7569
Overall             0.7342
TABLE IV: Dice coefficient overview for the MUSIC3D dataset after adaptive binning

Fig. 20: Improvements of FAMS segmentation maps after adaptive binning.
Fig. 21: Improvements of graph cut segmentation maps after adaptive binning.
Fig. 22: MUSIC3D graph cut segmentation results achieved after adaptive binning. The illustration embeds the segments (colours) in a DVR of energy channel 39 (61.35 keV).

Adaptive binning additionally allows a fixed observation size parameterisation to be determined for FAMS and graph cuts that is valid across all scans, removing the need for manual parameter optimisation. This is due to the reduced noise in each target energy bin, the increase in self-information carried by each energy channel, and the data-adaptive nature of the binning. Thus, the obtained results for adaptive binning use the following parameterisation:

FAMS:


  • k: 220

  • K: 24

  • L: 35

unconditioned graph cuts:

  • k: 3.0

  • minSize: 625

  • edge connectivity: 27-neighbourhood

The type of neighbourhood definition has a distinct impact on the graph cut segmentation results. Fig. 23 shows the achievable quality of different neighbourhood definitions on adaptively binned data: within the ’Fruits’ sample, the objects are located close to the bounding plexi-glass cylinder. Due to the lack of a definite hull separating the plexi-glass and the objects, the segments leak so that fruits and plexi-glass are incorrectly labelled with the same indicator. This leakage can be prevented by applying a 7-neighbourhood definition. For cluttered object arrangements such as check-in luggage, this is a good control to steer the target level of detail. Note that this result on adaptively-binned data is in contrast to the full-spectrum graph cut segmentation and that the weighted 27-neighbourhood definition performs poorly with adaptive binning. The reason for the contrasting results lies in the improved SNR of adaptively-binned data: a 7-neighbourhood kernel on full-spectrum data extracts an excessive number of segments that are poorly connected between slices in the volume stack due to high noise, which the weighted 27-neighbourhood kernel filters out. With adaptive binning, the improved SNR allows for better-connected segments in general, which explains the superior performance of the 7-neighbourhood kernel.

Fig. 23: Comparison of the graph cut segmentation with adaptive binning depending on different neighbourhood definitions. The illustration embeds the segments (colours) in a DVR of energy channel 39 (61.35 keV).

6.4 CT reconstruction influence

When comparing scans ’Sample 31102016’ and ’Sample 24102016’ in MUSIC3D (see fig. 9), we observed a difference in segmentation quality due to the influence of metal artefacts. In places, the metal artefacts are so dominant that material identification is impossible. MAR is needed and, in the common case of metal artefacts affecting only minor portions of the whole scan, it can possibly eliminate the metal artefact issue for material identification [59]. In other application areas, MAR remains an open problem that, despite recent advances [60], requires further treatment in the literature [61].

Another significant influence on SNR and CNR is rooted in the inverse reconstruction from x-ray projections to the computed tomography, as discussed in section 2.1. With the projections available from compressed sensing, an analytical reconstruction for the MECT data is not possible. In an undersampled tomography domain, image quality (with respect to SNR and CNR) degrades rapidly with a decrease in available projection data. Fig. 24 shows the effects of this decreasing image quality with the availability of 74, 37 and 9 projections on the segmentation using the above-outlined adaptive binning and unconstrained graph cut. As can be seen in the images, even for a considerably undersampled tomographic scan (9 projections), which leads to severe reconstruction artefacts, the presented segmentation procedure still extracts a reasonable segmentation.

Fig. 24: Comparison of the graph cut segmentation with adaptive binning depending on different number of projections used for the reconstruction with the reference manual segmentation. The illustration embeds the segments (colours) in a DVR of energy channel 39 (61.35 keV).

7 Conclusion

In conclusion, we presented an openly available spectral CT dataset, acquired in the domain of baggage scanning, which is aimed at improving automatic image analysis by method benchmarking. The dataset itself, called MUSIC, consists of two separate parts for 2D (32 sets)- and 3D (7 sets) x-ray projections and tomographic reconstructions. The dataset, including further information on how to use it, is available at The set includes the corrected x-ray projections, their ART-TV reconstructions, as well as the presented segmentations (for FAMS and graph cuts). The data are provided in the following formats:

  • MatLab/Octave/Python for 3D data: HDF5 (.h5)

  • C++/Python/available CT software: MetaIO format (.mhd)

Furthermore, the C++ implementation of the unconditioned graph cuts as well as the utilized Python scripts in the data processing can be obtained at

As for the methodology analysis, we presented and compared two techniques for fully unsupervised spectral image and volume segmentation, namely 3D spectral FAMS and unconditioned 3D spectral graph cuts, that are based on existing literature and that were adapted to process MECT data. The article compared the results of both segmentation methods using an isotropic, uniform spectral binning from 128 down to 20 energy channels of the MUSIC dataset, which shows significant drawbacks of both methods in the presence of high noise and low SNR. Based on the results for isotropic binning, we presented an adaptive, anisotropic spectral binning that follows a variance budget allocation scheme. The adaptive binning scheme, using 10 compressed energy bins, improves the SNR per energy bin while maintaining the information diversity in low-energy channels. The improvements are observable visually in the segmentation volume maps as well as quantitatively in the Dice coefficient measurements (overall 0.6463 to 0.6557 for 2D spectral data and 0.5374 to 0.7342 for 3D spectral data; see tables I to IV). Besides the improved SNR, the adaptive binning also eliminates the need for meticulous, manually tuned parameterisation of each segmentation method.

Evaluating the segmentation methods in more detail, we generally observe better segmentation maps given by the unconditioned graph cut while the MS-based algorithm tends to supply object boundary maps (similar to edge filtering). The result of the FAMS method on MECT data shows significantly different behaviour than previously-published results on hyperspectral imaging in the visible- and infrared part of the spectrum [3]. A detailed analysis has revealed that the different results may be rooted in the problem of signal quantization: the FAMS method was designed for images obtained by digital cameras with common CCD sensors and optical lens filters to separate the various spectra, quantifying the recorded light intensities in 8- or 16-bit values (irrespective of integer- or floating-point value representation). For tomographic reconstructions and with the goal to separate marginal material differences, a 16-bit quantization range is insufficient (see fig. 8 to get an impression of the recorded attenuation range and the quantization resolution required to differentiate various fluids). In our experiments, the spectral gradient itself is significant enough to detect material boundaries, but the underlying signal response (i.e. the LAC) does not facilitate multi-material differentiation. As discussed in section 6.1, a connected-component analysis based on clean object separation via FAMS may yield appropriate segmentation maps for later processing.

Lastly, this article illustrated and discussed detrimental effects on the segmentation quality, such as metal artefacts and the considerable undersampling of x-ray projections in ART-TV reconstruction. Both issues remain obstacles that need to be addressed in future work on the subject. This is particularly important for the application of automatic check-in luggage scanning and material identification because (i) metal objects cannot be excluded from the luggage and (ii) the physical constraints of airport CT scanners in some cases do not allow for more than 9 x-ray detectors (i.e. 9 x-ray projections) being acquired simultaneously.

8 Discussion

Higher-quality segmentations may be achievable with other existing methods. The strongest postulate of the presented research is the unknown number of objects and materials within a specific dataset. We apply this postulate to automatically acquired scans because, from a strict statistical perspective, predefining the number of expected segments introduces a bias in the segmentation. This bias is variable and depends on the observation scale, the segment definition (or definition policy) or the personal judgement of the domain expert with respect to the distinction detail. As such, the bias allows for flexibility and uncertainty in the segmentation itself and provides an error margin for initial estimations. A priori knowledge about the number of distinct materials or even their approximate position inside the scan (i.e. material seeds) facilitates more robust semi-supervised segmentations (e.g. minCut-maxFlow graph cuts, Gaussian mixture models). Hence, our future research involves the robust, probabilistic estimation of material seed points inside MECT scans.

The unsupervised methods presented here deliver numerous equiprobable segmentations for each scan, which provide large reference datasets for subsequent machine learning approaches (e.g. neural networks). The increasing number of segmentation volumes enables the training of neural networks from a limited number of reference scans. One of the potentially major uses of MUSIC is thus the provision of a spectral volume image database for other applications that also require material classification without shape priors.


The authors would like to thank Mark Lyksborg and Mina Kheirabadi for their former MECT research, which some concepts in this paper extend and build upon. We further acknowledge the Innovation Fund Denmark for funding the presented research in the CIL2018 project (code: 10437). We further thank and acknowledge NVIDIA Corp. for the donation of one NVIDIA Titan Xp in support of our research efforts.


  • [1] J. Fornaro, S. Leschka, D. Hibbeln, A. Butler, N. Anderson, G. Pache, H. Scheffel, S. Wildermuth, H. Alkadhi, and P. Stolzmann, “Dual-and multi-energy ct: approach to functional imaging,” Insights into imaging, vol. 2, no. 2, pp. 149–159, 2011.
  • [2] C. H. McCollough, S. Leng, L. Yu, and J. G. Fletcher, “Dual-and multi-energy ct: principles, technical approaches, and clinical applications,” Radiology, vol. 276, no. 3, pp. 637–653, 2015.
  • [3] J. Jordan and E. Angelopoulou, “Mean-shift clustering for interactive multispectral image analysis,” in Image Processing (ICIP), 2013 20th IEEE International Conference on.   IEEE, 2013, pp. 3790–3794.
  • [4] C. Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification, ser. Hyperspectral Imaging: Techniques for Spectral Detection and Classification.   Springer US, 2003, no. v. 1.
  • [5] N. Otsu, “A threshold selection method from gray-level histograms,” IEEE transactions on systems, man, and cybernetics, vol. 9, no. 1, pp. 62–66, 1979.
  • [6] M. Lyksborg, “Implementation: Material and 3D-object recognition – EASI,” Technical University of Denmark (DTU Compute), Technical Report, October 2016.
  • [7] G. Einarsson, J. N. Jensen, R. R. Paulsen, H. Einarsdottir, B. K. Ersbøll, A. B. Dahl, and L. B. Christensen, “Foreign object detection in multispectral x-ray images of food items using sparse discriminant analysis,” in Scandinavian Conference on Image Analysis.   Springer, 2017, pp. 350–361.
  • [8] H. Einarsdóttir, M. S. Nielsen, R. Miklos, R. Lametsch, R. Feidenhans’l, R. Larsen, and B. K. Ersbøll, “Analysis of micro-structure in raw and heat treated meat emulsions from multimodal x-ray microtomography,” Innovative Food Science & Emerging Technologies, vol. 24, pp. 88 – 96, 2014, food Microstructure.
  • [9] B. Dogdas, D. Stout, A. F. Chatziioannou, and R. M. Leahy, “Digimouse: a 3d whole body mouse atlas from ct and cryosection data,” Physics in Medicine & Biology, vol. 52, no. 3, p. 577, 2007.
  • [10] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
  • [11] E. S. Dreier, J. Kehres, M. Khalil, M. Busi, Y. Gu, R. Feidenhans’l, and U. L. Olsen, “Spectral correction algorithm for multispectral cdte x-ray detectors,” Optical Engineering, vol. 57, pp. 57 – 57 – 13, 2018.
  • [12] E. Y. Sidky and X. Pan, “Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization,” Physics in Medicine & Biology, vol. 53, no. 17, p. 4777, 2008.
  • [13] A. C. Kak and M. Slaney, Principles of computerized tomographic imaging.   IEEE press, 1988.
  • [14] P. C. Hansen, Discrete inverse problems: insight and algorithms.   Siam, 2010, vol. 7.
  • [15] R. N. Bracewell and A. Riddle, “Inversion of fan-beam scans in radio astronomy,” The Astrophysical Journal, vol. 150, p. 427, 1967.
  • [16] D. L. Donoho, “Compressed sensing,” IEEE Transactions on information theory, vol. 52, no. 4, pp. 1289–1306, 2006.
  • [17] E. Y. Sidky, C.-M. Kao, and X. Pan, “Accurate image reconstruction from few-views and limited-angle data in divergent-beam ct,” Journal of X-ray Science and Technology, vol. 14, no. 2, pp. 119–139, 2006.
  • [18] Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max- flow algorithms for energy minimization in vision,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1124–1137, Sept 2004.
  • [19] C. Rother, V. Kolmogorov, and A. Blake, “Grabcut: Interactive foreground extraction using iterated graph cuts,” in ACM transactions on graphics (TOG), vol. 23, no. 3.   ACM, 2004, pp. 309–314.
  • [20] Y. Boykov and G. Funka-Lea, “Graph cuts and efficient n-d image segmentation,” International Journal of Computer Vision, vol. 70, no. 2, pp. 109–131, Nov 2006.
  • [21] M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active contour models,” International Journal of Computer Vision, vol. 1, no. 4, pp. 321–331, 1988.
  • [22] G. Sapiro, “Color snakes,” Computer Vision and Image Understanding, vol. 68, no. 2, pp. 247 – 253, 1997.
  • [23] C. Xu and J. L. Prince, “Generalized gradient vector flow external forces for active contours,” Signal Processing, vol. 71, no. 2, pp. 131 – 139, 1998.
  • [24] L. Clemmensen, T. Hastie, D. Witten, and B. Ersbøll, “Sparse discriminant analysis,” Technometrics, vol. 53, no. 4, pp. 406–413, 2011.
  • [25] Z. Tu and S.-C. Zhu, “Image segmentation by data-driven markov chain monte carlo,” IEEE Transactions on pattern analysis and machine intelligence, vol. 24, no. 5, pp. 657–673, 2002.
  • [26] G. Camps-Valls, D. Tuia, L. Bruzzone, and J. A. Benediktsson, “Advances in hyperspectral image classification: Earth monitoring with statistical learning methods,” IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 45–54, Jan 2014.
  • [27] Z. Wu and R. Leahy, “An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation,” IEEE transactions on pattern analysis and machine intelligence, vol. 15, no. 11, pp. 1101–1113, 1993.
  • [28] B. Banerjee, S. Varma, K. M. Buddhiraju, and L. N. Eeti, “Unsupervised multi-spectral satellite image segmentation combining modified mean-shift and a new minimum spanning tree based clustering technique,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 3, pp. 888–894, March 2014.
  • [29] D. Comaniciu and P. Meer, “Mean shift: a robust approach toward feature space analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603–619, May 2002.
  • [30] W. Tao, H. Jin, and Y. Zhang, “Color image segmentation based on mean shift and normalized cuts,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 37, no. 5, pp. 1382–1389, 2007.
  • [31] B. Georgescu, I. Shimshoni, and P. Meer, “Mean shift based clustering in high dimensions: A texture classification example,” in ICCV, vol. 3, 2003, p. 456.
  • [32] I. Shimshoni, B. Georgescu, and P. Meer, “Adaptive mean shift based clustering in high dimensions,” Nearest-neighbor methods in learning and vision: theory and practice, pp. 203–220, 2006.
  • [33] C. Couprie, L. Grady, L. Najman, and H. Talbot, “Power watershed: A unifying graph-based optimization framework,” IEEE transactions on pattern analysis and machine intelligence, vol. 33, no. 7, pp. 1384–1399, 2011.
  • [34] P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient graph-based image segmentation,” International journal of computer vision, vol. 59, no. 2, pp. 167–181, 2004.
  • [35] M. Singh and S. Singh, “Image segmentation optimisation for x-ray images of airline luggage,” in Computational Intelligence for Homeland Security and Personal Safety, 2004. CIHSPS 2004. Proceedings of the 2004 IEEE International Conference on.   IEEE, 2004, pp. 10–17.
  • [36] B. R. Abidi, Y. Zheng, A. V. Gribok, and M. A. Abidi, “Improving weapon detection in single energy x-ray images through pseudocoloring,” IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews, vol. 36, no. 6, p. 784, 2006.
  • [37] G. Heitz and G. Chechik, “Object separation in x-ray image sets,” in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on.   IEEE, 2010, pp. 2093–2100.
  • [38] L. Eger, P. Ishwar, W. C. Karl, and H. Pien, “Classification-aware dimensionality reduction methods for explosives detection using multi-energy x-ray computed tomography,” in Computational Imaging IX, vol. 7873.   International Society for Optics and Photonics, 2011, p. 78730Q.
  • [39] N. Megherbi, T. P. Breckon, and G. T. Flitton, “Investigating existing medical ct segmentation techniques within automated baggage and package inspection,” in Optics and Photonics for Counterterrorism, Crime Fighting and Defence IX; and Optical Materials and Biomaterials in Security and Defence Systems Technology X, vol. 8901.   International Society for Optics and Photonics, 2013, p. 89010L.
  • [40] A. Mouton and T. P. Breckon, “Materials-based 3d segmentation of unknown objects from dual-energy computed tomography imagery in baggage security screening,” Pattern Recognition, vol. 48, no. 6, pp. 1961–1978, 2015.
  • [41] ——, “A review of automated image understanding within 3d baggage computed tomography security screening,” Journal of X-ray science and technology, vol. 23, no. 5, pp. 531–555, 2015.
  • [42] L. Martin, A. Tuysuzoglu, W. C. Karl, and P. Ishwar, “Learning-based object identification and segmentation using dual-energy ct images for security,” IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 4069–4081, 2015.
  • [43] M. Baştan, M. R. Yousefi, and T. M. Breuel, “Visual words on baggage x-ray images,” in Computer Analysis of Images and Patterns, P. Real, D. Diaz-Pernil, H. Molina-Abril, A. Berciano, and W. Kropatsch, Eds.   Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 360–368.
  • [44] M. Kundegorski, S. Akcay, M. Devereux, A. Mouton, and T. Breckon, “On using feature descriptors as visual words for object detection within x-ray baggage security screening,” IET Conference Proceedings, pp. 12 (6 .)–12 (6 .)(1), January 2016.
  • [45] S. Akçay, M. E. Kundegorski, M. Devereux, and T. P. Breckon, “Transfer learning using convolutional neural networks for object classification within x-ray baggage security imagery.”   IEEE, 2016.
  • [46] S. Akçay, M. E. Kundegorski, C. G. Willcocks, and T. P. Breckon, “Using deep convolutional neural network architectures for object classification and detection within x-ray baggage security imagery,” IEEE Transactions on Information Forensics and Security, vol. 13, no. 9, pp. 2203–2215, Sept 2018.
  • [47] S. B. Strebelle, A. G. Journel et al., “Reservoir modeling using multiple-point statistics,” in SPE Annual Technical Conference and Exhibition.   Society of Petroleum Engineers, 2001.
  • [48] J. Kehres, M. Lyksborg, and U. L. Olsen, “Threat detection of liquid explosives and precursors from their x-ray scattering pattern using energy dispersive detector technology,” Proceedings of SPIE - International Society for Optical Engineering, vol. 10391, 2017.
  • [49] U. Olsen, E. Christensen, M. Khalil, Y. Gu, and J. Kehres, “Detector response artefacts in spectral reconstruction,” Proceedings of SPIE - International Society for Optical Engineering, vol. 10391, 2017.
  • [50] F. Verdun, D. Racine, J. Ott, M. Tapiovaara, P. Toroi, F. Bochud, W. Veldkamp, A. Schegerer, R. Bouwman, I. H. Giron, N. Marshall, and S. Edyvean, “Image quality in ct: From physical measurements to model observers,” Physica Medica, vol. 31, no. 8, pp. 823–843, 2015.
  • [51] F. F. Behrendt, B. Schmidt, C. Plumhans, S. Keil, S. G. Woodruff, D. Ackermann, G. Mühlenbruch, T. Flohr, R. W. Günther, and A. H. Mahnken, “Image fusion in dual energy computed tomography: effect on contrast enhancement, signal-to-noise ratio and image quality in computed tomography angiography,” Investigative radiology, vol. 44, no. 1, pp. 1–6, 2009.
  • [52] R. J. Jaszczak, R. E. Coleman, and F. R. Whitehead, “Physical factors affecting quantitative measurements using camera-based single photon emission computed tomography (spect),” IEEE Transactions on Nuclear Science, vol. 28, no. 1, pp. 69–80, Feb 1981.
  • [53] S. Z. Gilani and N. I. Rao, “A clustering based automated glacier segmentation scheme using digital elevation model,” in Digital Image Computing: Techniques and Applications, 2009. DICTA’09.   IEEE, 2009, pp. 277–284.
  • [54] M. Vopálenský, D. Vavrík, and I. Kumpová, “Optimization of acquisition parameters in radiography and tomography,” in 7th Conference on Industrial Computed Tomography, Leuven, Belgium (iCT 2017) CT in NDT and Manufacturing: NDT.net, 2017, p. 20845.
  • [55] E. Angelopoulou, S. W. Lee, and R. Bajcsy, “Spectral gradient: a material descriptor invariant to geometry and incident illumination,” in Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, Sept 1999, pp. 861–867.
  • [56] A. Gionis, P. Indyk, R. Motwani et al., “Similarity search in high dimensions via hashing,” in Proc. Int. Conf. on Very Large Data Bases, vol. 99, no. 6, 1999, pp. 518–529.
  • [57] I. Wolf, M. Vetter, I. Wegner, T. Böttger, M. Nolden, M. Schöbinger, M. Hastenteufel, T. Kunert, and H.-P. Meinzer, “The medical imaging interaction toolkit,” Medical image analysis, vol. 9, no. 6, pp. 594–604, 2005.
  • [58] C. P. Botha and F. H. Post, “Hybrid scheduling in the DeVIDE dataflow visualisation environment.” in SimVis, 2008, pp. 309–322.
  • [59] J. F. Barrett and N. Keat, “Artifacts in ct: recognition and avoidance,” Radiographics, vol. 24, no. 6, pp. 1679–1691, 2004.
  • [60] S. Karimi, H. Martz, and P. Cosman, “Metal artifact reduction for ct-based luggage screening,” Journal of X-ray science and technology, vol. 23, no. 4, pp. 435–451, 2015.
  • [61] M. N. Bongers, C. Schabel, C. Thomas, R. Raupach, M. Notohamiprodjo, K. Nikolaou, and F. Bamberg, “Comparison and combination of dual-energy-and iterative-based metal artefact reduction on hip prosthesis and dental implants,” PLoS One, vol. 10, no. 11, p. e0143584, 2015.

Appendix A 2D multi-object composition

Fig. 25: Sketch of the location of each sample referred to in Tab. V.
Sub-dataset | Material 1 | Material 2 | Material 3 | Material 4
Sample 1 | Olive oil | Brandy Chantré | Cien hand cream | Whiskey Tullamore Dew
Sample 2 | Nivea sun lotion 50+ | Brandy Chantré | Methanol | H₂O₂ (50%)
Sample 3 | Garnier Fructis | Whiskey Tullamore Dew | Nitromethane | Brandy Chantré
Sample 4 | H₂O | H₂O₂ (50%) | Garnier Fructis | Nivea sun lotion 50+
Sample 5 | Cien hand cream | Nitromethane | Whiskey Tullamore Dew | H₂O
Sample 6 | H₂O | Brandy Chantré | Olive oil | Whiskey Tullamore Dew
Sample 7 | Brandy Chantré | Garnier Fructis | H₂O₂ (50%) | Nivea sun lotion 50+
Sample 8 | H₂O₂ (50%) | Methanol | Garnier Fructis | Olive oil
Sample 9 | Methanol | Cien hand cream | Acetone | Nivea sun lotion 50+
Sample 10 | Acetone | H₂O | Garnier Fructis | Whiskey Tullamore Dew
Sample 11 | Acetone | H₂O | Garnier Fructis | Whiskey Tullamore Dew
Sample 12 | Brandy Chantré | Olive oil | Methanol | H₂O₂ (50%)
Sample 13 | Garnier Fructis | Whiskey Tullamore Dew | H₂O | Acetone
Sample 14 | Nitromethane | Nivea sun lotion 50+ | Cien hand cream | Brandy Chantré
Sample 15 | H₂O₂ (50%) | Olive oil | Garnier Fructis | Nitromethane
Sample 16 | Nivea sun lotion 50+ | Whiskey Tullamore Dew | H₂O | Olive oil
Sample 17 | Whiskey Tullamore Dew | Brandy Chantré | Garnier Fructis | Nitromethane
Sample 18 | Cien hand cream | Nivea sun lotion 50+ | Acetone | H₂O₂ (50%)
Sample 19 | Olive oil | Garnier Fructis | Cien hand cream | Brandy Chantré
Sample 20 | H₂O | Acetone | Olive oil | Whiskey Tullamore Dew
Sample Test | Olive oil | Brandy Chantré | Whiskey Tullamore Dew | Cien hand cream
TABLE V: Complete overview of all materials inside the composed MUSIC2D subset. Materials are numbered counter-clockwise, starting at the top. An aluminium pin is located in the top-right corner of each image, and the coordinate system of each image originates in its top-left corner.
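The numbering convention in Tab. V is easy to misread, because "counter-clockwise" is described as seen on screen while the image coordinate system has its origin in the top-left corner, so the y axis points downwards. The following is a minimal Python sketch of that mapping, not part of the MUSIC software packages. It assumes, purely for illustration, that the four materials sit evenly spaced on a circle around a known centre; the function name `material_position`, the centre and the radius are hypothetical, and the actual positions should be taken from Fig. 25.

```python
import math


def material_position(index, center, radius, n_materials=4):
    """Approximate (x, y) pixel position of the index-th material (0-based).

    Assumption (not from the paper): materials are evenly spaced on a
    circle of the given radius around `center`.
    """
    # Angle measured counter-clockwise from "top", as seen on screen.
    angle = 2.0 * math.pi * index / n_materials
    cx, cy = center
    # Origin is top-left and y grows downwards, so "top" is smaller y.
    # Counter-clockwise from the top moves towards the left first.
    x = cx - radius * math.sin(angle)
    y = cy - radius * math.cos(angle)
    return (x, y)


# Sample 1 from Tab. V; centre and radius are hypothetical values.
sample_1 = ["Olive oil", "Brandy Chantré", "Cien hand cream", "Whiskey Tullamore Dew"]
for i, name in enumerate(sample_1):
    x, y = material_position(i, center=(100.0, 100.0), radius=50.0)
    print(f"Material {i + 1}: {name} at approx. ({x:.1f}, {y:.1f})")
```

Under these assumptions, Material 1 lies at the top, Material 2 on the left, Material 3 at the bottom and Material 4 on the right of the image.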

Appendix B 3D manual segmentations

Fig. 26: Sample 23012018
Fig. 27: Sample 24012018
Fig. 28: Sample 31102016
Fig. 29: Fluids
Fig. 30: Fruits
Fig. 31: Non-threat items
Fig. 32: Threat items