1.1 Notations and Conventions
We follow the notations of Depeursinge et al. [14] and Unser [40]. The imaginary unit for complex numbers is denoted by j. A continuous image is modeled as a D-dimensional function f of the spatial coordinates x = (x_1, …, x_D), taking values f(x) ∈ ℝ.
The Fourier transform of an integrable f is noted f̂ and is defined as

f̂(ω) = ∫_{ℝ^D} f(x) e^{−j⟨ω, x⟩} dx,

where ω = (ω_1, …, ω_D) is the frequency coordinate vector and ⟨·, ·⟩ denotes the scalar product.
A discrete image is modeled as a D-dimensional function of the discrete variable k = (k_1, …, k_D), taking values f[k] ∈ ℝ. The discrete Fourier transform of a summable f is noted f̂ (in this discrete setting, the maximum frequency coordinate is the normalised Nyquist frequency), with ω the discrete frequency coordinate vector, and is defined as

f̂(ω) = Σ_k f[k] e^{−j⟨ω, k⟩}.
1.1.1 Image Directions
Two reference frames are important for image filters (for further information, see https://www.slicer.org/wiki/Coordinate_systems, as of November 2019): (i) the patient frame of reference and (ii) the image frame of reference. The patient frame of reference determines how the image itself is oriented with regard to anatomical directions. Filters are computed in the image frame of reference, which does not need to exactly match the patient frame of reference. However, to use filters consistently, all images in a given study should have the same orientation relative to the patient frame of reference. This may require the rotation of image sets (e.g. scans) to a common orientation relative to the patient frame of reference, e.g. in a study with mixed sagittal and axial images. In addition, if data stored in different image formats are used in the same study, special care needs to be taken to ensure that the image reference frame of all image sets has the same orientation relative to the patient frame of reference (e.g. this may require flipping and/or transposing images along different directions). For example, image file types such as the Digital Imaging and Communications in Medicine (DICOM) and Neuroimaging Informatics Technology Initiative (NIfTI) formats may store an image using different orientations relative to the patient frame of reference. Overall, to maximize the reproducibility potential of a given study, two important parameters should be reported: (i) the common orientation of the image frame of reference of all image sets relative to the patient frame of reference (e.g. the Image Orientation Patient (0020,0037) field for the DICOM format) and (ii) if applicable, the (potentially different) rotation operations applied to each image set separately to ensure a common orientation of the image frame of reference relative to the patient frame of reference.
The image frame of reference determines how the stored image grid is oriented and thus where image voxels are located. Within the image reference frame, we define filter directions as follows:
x goes from left to right with increasing grid index values,
y goes from top to bottom with increasing grid index values,
z goes from front to back with increasing grid index values.
This section provides an overview of the filter-based image biomarker extraction process, which is shown in Fig. 1.1. It consists of the following steps:
1.2.1 Image Padding
The image f is convolved with a filter g, which yields a response map h with the same dimensions as f. This process is described in Section 2 and consists of applying the filter at all possible locations in the input image. Local or global rotation invariance operations may be applied during this step to minimise the variation of h with respect to rotations of the input image, as well as variations in orientations of local image patterns (e.g. tumor walls or vessels, see Chapter 3).
The response map h is not a directly usable quantitative image feature and needs to be aggregated (i.e. summarised) over a given Region Of Interest (ROI) using an aggregation function [14]. The latter transforms the response map into a scalar measurement η, which can be further used in a statistical or machine learning model. h can be seen as a new image derived from f, on which any feature defined in the first IBSI reference manual can be computed [49]. However, the interpretability of the latter would be difficult (unless the filter is simply used to e.g. remove high-frequency noise), and the large number of possible combinations of filters and features increases the risk of fortuitous correlations between features and clinical endpoints. In practice, the most obvious aggregation functions are the first four statistical moments of the voxel value distribution in the ROI, the first one being the average

η = (1/N_v) Σ_{k ∈ ROI} h[k],   (1.3)
where N_v denotes the number of voxels in the ROI. It is worth noting that the influence of the aggregation function is as important as that of the filter, and it should be considered carefully [12]. For the particular case of average-based aggregation in Eq. (1.3), one must be careful when using zero-mean filters (e.g. LoG, some Laws kernels, Gabor, wavelets, Riesz), which lead to η ≈ 0. In this case, taking the energy or the absolute value of the response map before the aggregation is recommended.
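As an illustration, average- and energy-based aggregation can be sketched with NumPy as follows (a minimal sketch; the function names are ours and the ROI is assumed to be a boolean mask):

```python
import numpy as np

def aggregate_mean(response_map, roi_mask):
    # Average of the response map over the ROI, as in Eq. (1.3)
    return float(response_map[roi_mask].mean())

def aggregate_energy(response_map, roi_mask):
    # Mean absolute response: recommended for zero-mean filters
    return float(np.abs(response_map[roi_mask]).mean())

# A zero-mean filter response averages to ~0 over a symmetric ROI,
# while the absolute-value aggregation remains informative.
h = np.array([[-1.0, 1.0], [1.0, -1.0]])  # toy zero-mean response map
mask = np.ones_like(h, dtype=bool)
```

Here the toy response map averages to zero, whereas its energy aggregation does not, illustrating why the absolute value is taken before averaging for zero-mean filters.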
2.1 Separability of the Convolution
Specific filter subtypes are separable, which means that their D-dimensional kernel can be obtained from the outer product of D 1-dimensional kernels as

g = g_1 ⊗ g_2 ⊗ ⋯ ⊗ g_D.
An example of a separable 2D unit-norm smoother is the outer product of a 1D unit-norm smoother with itself. The steps of a separable convolution of a 2D image with such a smoother are illustrated in Fig. 2.1.
Thanks to the associativity of convolution, separable filters can be implemented efficiently with D successive convolutions with 1D filters when compared to more computationally expensive convolutions with D-dimensional filters. For instance, we have the composition

f * g = (⋯((f *_1 g_1) *_2 g_2) ⋯ *_D g_D),   (2.5)

where *_i denotes a 1D convolution along the i-th image dimension. However, satisfying Eq. (2.5) strongly constrains the design of the filter. Moreover, because image axes are analysed separately, separable convolutions yield directional image analyses biased along the image axes, with no rotation invariance by default (the importance of the latter is detailed in Section 3.1). The only separable and directionally insensitive kernel is the Gaussian.
As a convention, the filter responses are noted g_x g_y g_z when resulting from a convolution with three 1D filters as follows: g_x along x, g_y along y, and g_z along z (see Section 1.1.1). The corresponding 3D kernel is noted g = g_x ⊗ g_y ⊗ g_z.
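The equivalence between a full 2D convolution with an outer-product kernel and two successive 1D convolutions can be checked numerically. A minimal sketch with SciPy, assuming a [1, 2, 1]/4 smoother as the example kernel:

```python
import numpy as np
from scipy.ndimage import convolve, convolve1d

rng = np.random.default_rng(0)
f = rng.standard_normal((32, 32))

# 1D unit-sum smoother and its separable 2D kernel (outer product)
g1 = np.array([1.0, 2.0, 1.0]) / 4.0
g2d = np.outer(g1, g1)

# Full 2D convolution vs. two successive 1D convolutions
full = convolve(f, g2d, mode="mirror")
sep = convolve1d(convolve1d(f, g1, axis=0, mode="mirror"),
                 g1, axis=1, mode="mirror")
```

Both results agree to machine precision, while the separable version requires 2 × 3 multiplications per voxel instead of 9.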
2.2 Boundary Conditions
Computing the discrete convolution as in Eq. (2.2) when the distance between the centre of the filter and a boundary of the image is smaller than half of the support of the filter requires accessing pixel values outside the support of the image. Therefore, a method must be used to impute pixel values in the vicinity of the image boundaries, i.e. to generate an extended image that includes a margin with a width/height/depth of at least half the filter support. Four common padding methods are described in the following subsections and are compared qualitatively in Figs. 2.2 and 2.3, where a CT image of honeycombing lung parenchyma (Fig. 2.2) is smoothed by convolution with an isotropic Gaussian filter.
Because all padding methods make arbitrary assumptions concerning image content beyond the boundaries, a method should be chosen based on the expected image background. However, general advantages and disadvantages exist and are discussed in the next sections.
2.2.1 Constant Value Padding
The simplest method to extend the support of f is to pad it with a constant value c. Thus, all pixels outside the original image are assigned the constant value c. This is illustrated in Fig. 2.2. Though constant value padding is simple to implement, potentially sharp transitions between c and the image values at a boundary may yield inconsistent filter responses. Distortions may appear in the boundary region of the response image as a consequence. Such artefacts can be observed at all boundaries of the filtered image in Fig. 2.2. With c = 0, this method is also called zero padding.
2.2.2 Nearest Value Padding
The nearest value padding method consists of repeating the values of the image at the boundary: each pixel outside the original image is assigned the intensity value of the closest pixel in the original image. This is illustrated in Fig. 2.2. The advantage of this method is that the transitions at the boundary are usually relatively small. However, it introduces a nonexistent pattern in the response image. This method is also called replicate.
2.2.3 Periodic Padding
Another straightforward method is to repeat the image along every dimension, i.e. to periodise the image content. The extended image is therefore equivalent to the original image modulo its support.
This is illustrated in Fig. 2.3. The introduced patterns are consistent with the actual image content. However, strong transitions are possible at the boundaries. The subsequent artefacts can be observed in the upper part of the left boundary of the example response image (see Fig. 2.3). It is worth noting that this periodisation is implicit if the convolution operation is performed in the Fourier domain using Eq. (2.3). This method is also called wrapping, circular or tiling.
2.2.4 Mirror Padding
The mirror folding method consists of symmetrising the image at the boundaries.
This is illustrated in Fig. 2.3. The introduced patterns are consistent with the actual image content and the transitions at the boundaries are relatively smooth, minimising convolution artefacts (see Fig. 2.3). Depending on the software implementation, the extended image may include or exclude the boundary pixels of the original image. This method is also called symmetric.
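The four padding methods above map directly onto `numpy.pad` modes. A minimal sketch (the mode names are NumPy's, not the terminology used in this document):

```python
import numpy as np

f = np.array([[1, 2], [3, 4]], dtype=float)

# The four boundary extensions, with a margin of one pixel
constant = np.pad(f, 1, mode="constant", constant_values=0.0)  # zero padding
nearest  = np.pad(f, 1, mode="edge")       # "replicate"
periodic = np.pad(f, 1, mode="wrap")       # "wrapping"/"circular"/"tiling"
mirror   = np.pad(f, 1, mode="symmetric")  # mirror folding, edge pixel included
```

Note that NumPy distinguishes `symmetric` (boundary pixels repeated) from `reflect` (boundary pixels excluded); as mentioned above, which variant is used depends on the software implementation.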
2.2.5 Considerations for Radiomics
In many radiomics applications, the choice of boundary extension method is not important, as long as the ROI is sufficiently far from the image boundary, i.e. by more than half of the convolution filter width/height/depth. In the uncommon case where the ROI is close to the border, the mirror method is generally recommended because it avoids sharp transitions and artificial patterns.
3.1 Geometric Invariances for Medical Image Analysis
The analysis of medical images requires invariance or equivariance to various geometric transformations. As in Depeursinge et al. [14], we define the invariance of an operator G to a transformation T of the image f as G(T f) = G(f). In the case of filtering, this means that the response map obtained by the convolution of the image with a filter is unchanged under the effect of T. Equivariance is defined as G(T f) = T(G(f)), i.e. the response map undergoes the same transformation as the image.
Translation: The response maps of linear filters are equivariant to translation by construction, thanks to the convolution operation. For a translation by x_0, (f(· − x_0)) * g = (f * g)(· − x_0), meaning that the filtering process does not differ according to position: if an input f(x) leads to the response h(x), then the input f(x − x_0) leads to the response h(x − x_0). Equivariance to translations is required in medical imaging since we want to keep track of the positions where the filter responded to create full response maps, where patterns of interest may appear at random positions.
Scaling: In most cases, scale is a discriminative property in medical images. Thus, filters should not be scale invariant.
Rotation: Response maps should be equivariant to global rotations and invariant to local rotations as defined in Depeursinge et al. [14]. We consider a rotation to be global when it is applied around the origin of the image. In this case, equivariance to global rotations is required for the same reasons as translation equivariance: the positions where the filter responded should rotate in the same fashion as the image itself. For a transformation that locally rotates patterns around a given position, we instead seek invariance: the filter response should not depend on the local orientation of the pattern. Invariance to local rotations is required to obtain equivalent filter responses for any orientation of a local pattern. For example, patterns such as vessels may have arbitrary orientations and all must be characterised equivalently (see Section 3.3.1 of Depeursinge et al. [14]). In general, we are not interested in the local orientation of the pattern itself, but only in its presence, hence justifying the invariance to local rotations.
3.2 Directional Sensitivity
Most structures of interest in medical images are composed of directional components such as edges and corners. Filters may be sensitive to image directions, but not necessarily. This has a one-to-one relation with the circular/spherical symmetry of the filter [16]. We adopt the definition of directionality introduced in Tamura et al. [39]. Formally, a non-directionally sensitive filter consists of a circular/spherical averaging, i.e. its profile only depends on the radius r = ‖x‖. Directional sensitivity is important: circularly/spherically symmetric filters cannot differentiate between blob and tubular structures (e.g. a small nodule and a vessel). An example of a non-directionally sensitive filter is the Laplacian of Gaussian (LoG) defined in Section 4.3, in particular Eq. (4.2) (see Fig. 4.1). An example of a directionally sensitive filter is the Gabor filter defined in Section 4.5 (see Fig. 4.5).
3.3 Combining Directional Sensitivity and Invariance to Local Rotations
Ideally, filters should combine directional sensitivity with invariance to local rotations. A simple convolutional filter is invariant to local rotations (as defined in Section 3.1) if and only if the filter is directionally-insensitive, i.e. circularly symmetric (see Proposition 1 of Andrearczyk et al. 2). Therefore, slightly more complex and non-linear filtering designs are needed to combine the two properties.
A suitable strategy to achieve invariance to local rotations with a directionally-sensitive filter is to (i) compute a pseudo rotation-equivariant representation via a collection of rotated filter responses at uniformly sampled orientations and (ii) apply voxelwise orientation pooling, taking either the average or the max of the oriented responses, which yields locally rotation-invariant response maps.
The pseudo equivariant representation (i) is

h_n = f * g(R_n ·),  n = 1, …, N,   (3.1)

where R_n is a rotation matrix and {R_n}_{n=1}^{N} is a uniformly sampled set of rotations (e.g. the set of right-angle rotations in 2D). The process is illustrated in 2D in Fig. 3.1. It is worth noting that Eq. (3.1) can be efficiently computed using steerable filters, which obviates the need to reconvolve the image with rotated versions of the filters [45, 1]. In the particular case of separable filters and right-angle rotations, the equivariant representation can be efficiently obtained using permutations and unidimensional filter flipping. The latter is described for the 2D and 3D cases in Appendix A.
Voxelwise orientation pooling (ii) can be done using either the average or the max over the elements of the equivariant representation provided by Eq. (3.1). Although commonly used, average pooling strongly deteriorates the directional sensitivity of the filtering operation. For a large set of rotations, averaging is equivalent to filtering with a single circularly symmetric filter. The max response preserves the directional sensitivity of the filters and achieves invariance to local rotations [10, 1]. Taking the max response can be interpreted as "aligning" the filter locally to seek the best match between the filter and the local image pattern (see Fig. 3.1).
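For right-angle rotations in 2D, the equivariant representation and both pooling options can be sketched as follows (the edge-like kernel is our toy example). Note how average pooling of this antisymmetric kernel cancels its responses entirely, while max pooling preserves directional information:

```python
import numpy as np
from scipy.ndimage import convolve

rng = np.random.default_rng(0)
f = rng.standard_normal((16, 16))

# Directional (edge-like) kernel and its four right-angle rotations
g = np.array([[-1.0, 0.0, 1.0]] * 3) / 6.0
rotations = [np.rot90(g, k) for k in range(4)]

# (i) pseudo rotation-equivariant representation: one response per orientation
responses = np.stack([convolve(f, gr, mode="mirror") for gr in rotations])

# (ii) voxelwise orientation pooling
h_avg = responses.mean(axis=0)  # deteriorates directional sensitivity
h_max = responses.max(axis=0)   # preserves it; locally rotation invariant
```

Since the 180-degree rotation of this kernel is its negation, the four rotated kernels sum to zero and the average-pooled response vanishes, a direct illustration of the loss of directional sensitivity discussed above.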
3.4 Spectral Coverage
The filters considered here change the frequency content of an image in the Fourier domain, thereby altering its appearance. Each filter has a frequency profile that can be characterised in the Fourier domain. Filters are usually assigned to one of four categories based on their frequency profiles:
All-pass filter: An all-pass filter does not change the frequency content of the image. Such filters are rarely used. Examples are the identity filter, where the response image is identical to the input image, and a translation filter, where the response image is identical to the input, except for a shift.
Low-pass filter: A low-pass filter attenuates the high-frequency content of the image. Examples are mean and Gaussian filters, which produce a smoothed version of the input image.
High-pass filter: A high-pass filter attenuates the low-frequency content of the image. High-pass filters may be used to sharpen an image, at the cost of amplifying noise.
Band-pass filter: A band-pass filter attenuates both high- and low-frequency content in an image in specific ranges. Band-pass filters can capture specific spectral signatures of patterns at relevant scales that are related to the problem at hand (e.g., fibrosis, necrosis).
Examples of low-, high- and band-pass filtered images are shown in Fig. 4.9.
4.1 Identification of Common Approaches
4.2 Mean Filter
One of the simplest existing kernels is the mean filter, which computes the average intensity over its spatial support of M^D voxels (we restrict the definition to odd values of M), i.e. each kernel coefficient equals 1/M^D.
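A mean-filtering sketch with SciPy, whose `uniform_filter` implements this local averaging (`mode="mirror"` selects one of the boundary extensions of Section 2.2):

```python
import numpy as np
from scipy.ndimage import uniform_filter

f = np.arange(25, dtype=float).reshape(5, 5)

# 3x3 mean filter (M = 3) with mirror boundary extension
h = uniform_filter(f, size=3, mode="mirror")
```

For an interior voxel, the response is exactly the average of its 3 × 3 neighbourhood; near the boundary, the result depends on the chosen extension mode.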
4.3 Laplacian of Gaussian
The Laplacian of Gaussian (LoG) is a band-pass and circularly/spherically symmetric convolutional operator. It is therefore invariant to local rotations, but also directionally insensitive. Its profile only depends on the 1D radius r = ‖x‖ and corresponds to a radial second-order derivative of a D-dimensional Gaussian filter G_σ as

LoG(x) = ∇²G_σ(x) = ( ‖x‖²/σ⁴ − D/σ² ) G_σ(x),   (4.2)

where the standard deviation σ of the Gaussian controls the scale of the operator.
LoG filtering cannot be implemented with separable convolution and requires a full D-dimensional convolution. However, it can be approximated using a Difference of two Gaussians (DoG) when the ratio between their respective standard deviations σ_1 and σ_2 is approximately 1.6. Because Gaussian kernels are separable, DoG filtering can also be efficiently implemented using separable convolutions. With an adequate sequence of standard deviations, a collection of LoGs can cover the entire image spectrum. In this case, they form wavelets and are often called Mexican hat, Ricker, or Marr wavelets.
The spatial support of the LoG is infinite (i.e. not compact). Because an exact implementation would require a filter with infinite spatial support, the LoG filter is cropped in practice, usually based on the σ parameter. The 1D filter size is then

M = 1 + 2⌈T σ_vox⌉,

with T the truncation parameter and σ_vox the σ parameter expressed in voxel space (e.g. σ_vox = σ/s voxels for a voxel spacing of s mm). As a consequence, the size of a (1D) LoG filter is at least 3 and will have an odd, integer value. T = 4 is commonly used for truncation.
The profile of a 2D LoG is illustrated in Fig. 4.1.
The LoG filter can be used to enhance image blobs and ridges at a specific scale, controlled by σ. This is illustrated in Fig. 4.2.
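A LoG filtering sketch with SciPy; the filter-size helper follows our reading of the truncation rule above with T = 4:

```python
import math
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_filter_size(sigma_mm, spacing_mm, T=4.0):
    # Odd 1D filter size M = 1 + 2*ceil(T * sigma_vox), sigma_vox = sigma/s
    return 1 + 2 * math.ceil(T * sigma_mm / spacing_mm)

rng = np.random.default_rng(0)
f = rng.standard_normal((32, 32))

# LoG response; scipy crops the Gaussian at `truncate` standard deviations
h = gaussian_laplace(f, sigma=2.0, mode="mirror", truncate=4.0)
```

For example, σ = 3 mm at 1 mm voxel spacing yields a 25-voxel 1D support under this rule.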
4.4 Laws Kernels
Laws kernels are a collection of five types of small 1D filters [30]. They are combined using outer products to obtain 2D and 3D filters.
The first type is a low-pass kernel called Level for grey level averaging, which is available in two scales with a spatial support of 3 or 5 pixels:

L3 = [1, 2, 1] / √6,   L5 = [1, 4, 6, 4, 1] / √70.
The next kernels are all zero-mean, which makes them insensitive to the average grey level. They solely focus on spatial transitions between the values (i.e. texture). The four types of transitions covered are image Edges:

E3 = [−1, 0, 1] / √2,   E5 = [−1, −2, 0, 2, 1] / √10,

Spots and circular blobs:

S3 = [−1, 2, −1] / √6,   S5 = [−1, 0, 2, 0, −1] / √6,

Wave or undulation:

W5 = [−1, 2, 0, −2, 1] / √10,

and Ripple:

R5 = [1, −4, 6, −4, 1] / √70.
2D and 3D Laws kernels are separable by design, and response maps are created by 1D kernel convolution along each image direction. Such response maps are referred to by their 1D kernel names. For instance, a 2D response map called L5E5 is obtained by first convolving the image with the L5 kernel along the image lines (i.e. x) and subsequently convolving with the E5 kernel along the image columns (i.e. y). Examples of 2D Laws kernels are shown in Fig. 4.3. Note that the above definitions for Laws kernels are normalised, and in this sense deviate from those originally defined by Laws himself [30].
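The outer-product construction of 2D Laws kernels can be sketched as follows (normalisation to unit Euclidean norm is our reading of the normalised definitions above):

```python
import numpy as np

def unit(v):
    # Normalise a 1D kernel to unit Euclidean norm
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Classic 5-element Laws kernels, normalised
L5 = unit([1, 4, 6, 4, 1])    # Level
E5 = unit([-1, -2, 0, 2, 1])  # Edge
S5 = unit([-1, 0, 2, 0, -1])  # Spot
W5 = unit([-1, 2, 0, -2, 1])  # Wave
R5 = unit([1, -4, 6, -4, 1])  # Ripple

# 2D kernel L5E5: L5 along one image direction, E5 along the other
L5E5 = np.outer(L5, E5)
```

Because the zero-mean kernels (E, S, W, R) sum to zero, any 2D kernel involving at least one of them is insensitive to the average grey level.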
4.4.1 Laws Texture Energy Images
Laws used his kernels to generate texture energy images [30]. This is done in two steps. In the first step, a response map g is generated by convolving the image with a Laws kernel along each image direction, as described above. Then, a smoothed image is computed where the absolute intensities (it is worth noting that whereas "energy" often involves the computation of squared quantities, the absolute value was proposed by Laws; the goal is to regroup negative and positive filter responses) of voxels in g within Chebyshev distance δ of a centre voxel are summed to create an energy image E:

E[k] = (1/N_δ) Σ_{‖m‖_∞ ≤ δ} |g[k + m]|,

where N_δ = (2δ + 1)^D is the number of voxels in the D-dimensional neighbourhood. In practice, E can be computed using kernel convolutions by convolving |g| with a (2δ + 1)-element long 1D kernel with constant values along each of the image directions, i.e. a mean filter as described in Section 4.2.
Note that the definition given above deviates from the one given by Laws [30] by introducing the normalisation factor 1/N_δ. Laws moreover suggested using a sliding window of 15 × 15 pixels, which corresponds to δ = 7. We recommend that δ is chosen within the context of a given application.
To summarize, Laws filtering requires the following sequence of operations: (i) pad the input image, (ii) filter with a given Laws kernel, (iii) pad the response map and (iv) compute the energy image via the mean filter. We suggest computing the energy image with the same padding method (see Section 2.2) as used to compute the initial response of the Laws kernel. Laws filtering is not rotation-invariant. However, since the kernels are separable, rotational invariance can be efficiently approximated using permutations and unidimensional filter flipping, followed by orientation pooling with e.g. the max (see Section 3.3 and Appendix A). In this case, we recommend computing the texture energy image after orientation pooling.
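The sequence of operations above can be sketched as follows, using mirror padding throughout and δ = 3 as example values (both are our choices for illustration):

```python
import numpy as np
from scipy.ndimage import convolve1d, uniform_filter

rng = np.random.default_rng(0)
f = rng.standard_normal((32, 32))

# Normalised E5 Laws kernel
e5 = np.array([-1.0, -2.0, 0.0, 2.0, 1.0])
e5 /= np.linalg.norm(e5)

# Steps (i)-(ii): padded separable convolution with the E5E5 kernel
g = convolve1d(convolve1d(f, e5, axis=0, mode="mirror"),
               e5, axis=1, mode="mirror")

# Steps (iii)-(iv): energy image = local mean of absolute responses
# (Chebyshev distance delta = 3 -> a (2*3 + 1) = 7-voxel mean filter)
delta = 3
energy = uniform_filter(np.abs(g), size=2 * delta + 1, mode="mirror")
```

The 1D mean filter applied along each direction by `uniform_filter` reproduces the separable computation of E described in the text.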
An example of image filtering with a Laws kernel is shown in Fig. 4.4.
4.5 Gabor Filter
Gabor filter banks allow for extracting multi-directional and multi-scale texture information via a systematic parcellation of the Fourier domain with elliptic Gaussian windows [6] (see Fig. 3.2 of Depeursinge et al. [13]). In the spatial domain, 2D Gabor kernels are complex Gaussian-windowed oscillatory functions defined as

g(x, y) = exp( −(x′² + γ² y′²) / (2σ²) ) · exp( j 2π x′ / λ ),

where σ controls the scale of the filter in mm (standard deviation of the Gaussian envelope) and λ is the wavelength in mm (i.e. the inverse of the frequency of the oscillations). γ is the spatial aspect ratio (i.e. the ellipticity of the support of the filter), and (x′, y′) defines the radial and orthoradial elliptic Gaussian axes at the orientation θ via the 2D rotation matrix. As a convention, we define θ to turn clockwise in the axial plane. The considered coordinate system follows the conventions introduced in Section 1.1.1. It is depicted in Fig. 4.5 along with an example of a 2D Gabor filter seen in the spatial domain.
In practice, the Gabor function is used as a filter kernel and the spatial frequency bandwidth of the filter needs to be defined. Petkov et al. [35] established the relation between the half-response spatial frequency bandwidth b (in units of octaves) and the ratio σ/λ as

σ/λ = (1/π) · √(ln 2 / 2) · (2^b + 1) / (2^b − 1).
Gabor filters can then be constructed at multiple scales and orientations to explore the spectrum of patterns in an image. Bianconi et al. [6] proposed to extract response maps at multiple orientations and frequencies that are defined to cover all directions and scales up to the maximum frequency. It is worth noting that when using a suitable sequence of scales, Gabor filters can also satisfy the wavelet admissibility condition (in which case they are referred to as "Gabor wavelets") and can, in this particular case, fully cover the Fourier domain. Finally, since the kernel is complex, the modulus of the associated response map can be used before aggregation and feature calculation. An example of filtering with Gabor is depicted in Fig. 4.6.
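A 2D Gabor kernel can be constructed directly from the parameters above. This is a sketch: the discretisation onto a voxel grid, the kernel size and the sign convention for the clockwise rotation are our assumptions:

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(sigma, lambd, gamma, theta, size):
    # Complex 2D Gabor kernel: Gaussian envelope times complex carrier.
    # sigma: envelope scale; lambd: wavelength; gamma: aspect ratio;
    # theta: orientation (rotation convention is an assumption here).
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)   # radial axis x'
    yr = -x * np.sin(theta) + y * np.cos(theta)  # orthoradial axis y'
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2.0 * sigma**2))
    carrier = np.exp(1j * 2.0 * np.pi * xr / lambd)
    return envelope * carrier

g = gabor_kernel(sigma=3.0, lambd=4.0, gamma=0.5, theta=np.pi / 4, size=15)

# The modulus of the (complex) response map is taken before aggregation
rng = np.random.default_rng(0)
f = rng.standard_normal((32, 32))
h = np.abs(fftconvolve(f, g, mode="same"))
```

Taking the modulus of the complex response, as recommended above, yields a non-negative response map suitable for aggregation.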
Gabor filters are not rotation-invariant and are therefore best suited for applications where the absolute feature orientation is meaningful. Rotation equivariance and invariance can be approximated by combining the response maps of several elements of the Gabor filterbank (see Section 3.3).
The 3D extension of Gabor filters can be found in Qian et al. [37]. However, no recommendations are provided in the 3D case for sampling scales and orientations for further feature extraction. One simple option to achieve (i.e. approximate) 3D image analysis is to extract 2D Gabor features as recommended above in the three orthogonal planes of the image frame of reference, followed by an averaging of the response maps over the three planes.
4.6 Wavelets
Wavelets form a large category of filtering methods based on a collection of high-pass and low-pass filters that are designed to cover the entire image spectrum [32] (see Section 3.4). Pairwise combinations of one high- and one low-pass filter usually result in a sequence of band-pass response maps (i.e. wavelet coefficients) with a factor of 2 between their scales, and one remaining low-pass response map. Two properties must be considered when implementing wavelet transforms, namely decimation and separability. Decimation relates to the downsampling operation of the response maps and is discussed in Sections 4.6.1 and 4.6.2. Separable and non-separable wavelets concern the separability of the high- and low-pass filters and are detailed in Sections 4.6.3 and 4.6.4.
4.6.1 Decimated Transform
The decimated transform is not redundant and allows coding images with a minimal number of coefficients. However, the response maps containing the coefficients are iteratively decimated, which means that their size decreases throughout the levels of the decomposition, leading to a lack of translation invariance.
For instance, with separable wavelets (see Section 4.6.3), the image is first convolved with a high-pass filter h and a matching low-pass filter g along each image direction. In 2D, this yields four response maps: LL, LH, HL and HH. All response maps are then downsampled by a factor of two in all directions, yielding the wavelet coefficients of the first iteration of the discrete wavelet transform. For the next iterations, the low-pass coefficients LL are subsequently convolved with g and h along each image direction and downsampled. This means that the LL response map of each decomposition level j is used as the input image for the next level j + 1. It is worth noting that the intermediate LL response maps are traditionally discarded when the wavelet transform is used for image compression and reconstruction. Because the response maps are downsampled j times, this has the same effect as dilating (i.e. upsampling) the filters by 2^j in Eq. (4.5). The response maps resulting from the decimated separable wavelet decomposition are illustrated in Fig. 4.7 for the 2D case. Although illustrated in the context of separable wavelets, decimated transforms can also be used with non-separable wavelets.
It is worth noting that the convolution is, in this case, modified because the shifts are restricted to the resolution of the j-times downsampled response maps [11]. In addition, this must be taken into account when aggregating the response maps (see Section 1.2), where the ROI mask must also be downsampled to match the dimensions of the corresponding response maps.
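One level of the decimated separable 2D transform can be sketched with the Haar filter pair. This convolve-then-downsample implementation is a minimal illustration under our conventions, not a reference implementation:

```python
import numpy as np

# Orthonormal Haar analysis filters: low-pass (scaling) and high-pass (wavelet)
g = np.array([1.0, 1.0]) / np.sqrt(2.0)
h = np.array([1.0, -1.0]) / np.sqrt(2.0)

def analysis_1d(x, k, axis):
    # Convolve along one axis, then decimate by a factor of two
    return np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="full")[1::2], axis, x)

def haar_dwt2(f):
    # One level of the decimated separable 2D Haar transform
    lo = analysis_1d(f, g, axis=0)
    hi = analysis_1d(f, h, axis=0)
    return {"LL": analysis_1d(lo, g, 1), "LH": analysis_1d(lo, h, 1),
            "HL": analysis_1d(hi, g, 1), "HH": analysis_1d(hi, h, 1)}

f = np.arange(16, dtype=float).reshape(4, 4)
coeffs = haar_dwt2(f)
```

Each of the four response maps is half the input size per dimension, which is precisely why the ROI mask must be downsampled accordingly before aggregation.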
4.6.2 Undecimated Transform
The undecimated transform, also called stationary transform, yields a translation-invariant decomposition by obviating the downsampling steps required by the decimated transform. Although the transform becomes redundant (i.e. it yields more coefficients than strictly required for a perfect reconstruction), it is better suited to our case where we are not interested in image coding but rather image analysis.
The separable undecimated transform involves upsampling of the wavelet filters g and h to achieve the multiscale decomposition via the matched-size filters in Eq. (4.5). While others exist, a common upsampling approach is the à trous algorithm [20]. The response maps have the same dimensions as the input image and are simply obtained via the convolution of the image with the filters g and h along each image direction. In 2D, this yields four response maps for every iteration: LL, LH, HL and HH.
The response maps resulting from the undecimated separable wavelet decomposition are illustrated in Fig. 4.8 for the 2D case. Again, although illustrated here for separable wavelets, undecimated transforms can also be used with non-separable wavelets.
4.6.3 Separable Wavelets
The discrete separable wavelet transform yields a collection of 1D wavelet kernels obtained from dilations of one unique mother wavelet function, which is a high-pass filter h [11]. The remaining low frequencies are covered by a low-pass filter g called the scaling function. When considering dyadic dilations, the scales of the kernels g and h are indexed by j as

h_j(x) = 2^{−j/2} h(2^{−j} x),   g_j(x) = 2^{−j/2} g(2^{−j} x).
The distinctive property of the separable wavelet transform is to use the separable convolution. The latter is computationally efficient but for radiomics has two distinct disadvantages: it is not rotationally invariant, and only strictly separable wavelets can be used (see Section 2.1).
In 3D, the convolution of the low-pass and high-pass filters along the three directions of space yields eight different wavelet response maps: LLL, LLH, LHL, LHH, HLL, HLH, HHL and HHH. Within the image frame of reference (the image frame of reference of every image set (e.g. a scan) of a study should have the same common orientation relative to the patient reference frame, see Section 1.1.1), these three directions are x, y and z. For example, let us consider the LHL response map: a low-pass filter is applied in the x direction, a high-pass filter is applied in the y direction and a low-pass filter is applied in the z direction (see Section 2.1).
The simplest separable wavelet is the Haar wavelet. The Haar wavelet and scaling function form the simplest admissible function pair [31] and correspond to the discrete kernels

h = [1, −1] / √2,   g = [1, 1] / √2.
Daubechies wavelets and scaling functions are characterised by their number of vanishing moments N, i.e. their ability to approximate polynomials of order up to N − 1 [11]. For N = 1, Daubechies kernels are equivalent to Haar. Kernel values of wavelet and scaling functions for larger N can be obtained from the PyWavelets website (http://wavelets.pybytes.com/family/db/, as of July 2018).
There exists a large number of other separable wavelet/scaling function pairs (e.g. Meyer, Coiflets), each targeting different objectives in terms of signal analysis. The kernel values of most common wavelets can be obtained from the Wavelet Browser of PyWavelets (http://wavelets.pybytes.com, as of July 2018).
4.6.4 Non-separable Wavelets
As motivated in Section 3.1, filters (and more generally texture operators) should be invariant or equivariant to local rotations in most cases. However, with the exception of the Gaussian filter, separable filters (including separable wavelets) are not invariant/equivariant to rotations. Therefore, it is interesting to consider non-separable wavelets to achieve isotropic image analysis. The starting point for the definition of non-separable wavelets is a single (unidimensional) radial profile in the Fourier domain, i.e. a function of the radial coordinate ρ = ‖ω‖ only [42]. As a consequence, these functions are circularly symmetric and can be further combined with directional analysis such as the Riesz transform (Section 4.7). Similarly to separable wavelets, they can be implemented using either a decimated or an undecimated transform.
Non-separable wavelets are easily implemented directly in the Fourier domain using their unidimensional radial profile. The radial coordinate is obtained as ρ = ‖ω‖ = √(ω_1² + ⋯ + ω_D²). The support of the filter in the Fourier domain is the same as that of the considered image, to allow the efficient computation of the convolution as a simple Hadamard product via Eq. (2.3). The null frequency lies at the center of the image, and the maximum coordinate value is called the normalized Nyquist frequency. When the input image is not a cubic array, zero padding can be used in the spatial domain before the FFT (zero padding in the spatial domain corresponds to a sinc interpolation in the Fourier domain) to make it cubic and to ensure that the corresponding wavelet filters are circularly symmetric.
The simplest non-separable and circularly symmetric wavelet is the Shannon wavelet 42 characterised by the following mother function
Eq. (4.6) corresponds to the sinc wavelet in the spatial domain, which has the disadvantage of a large spatial support. An interesting alternative is therefore the smoother and more compactly supported (in space) Simoncelli wavelet 36, defined as
The response maps resulting from the undecimated non-separable wavelet decomposition are illustrated in Fig. 4.9 for the 2D case. Because these wavelets are non-separable, there is only one high-pass response map per decomposition level and one remaining low-pass response map. An example of a Simoncelli wavelet is shown in Fig. 4.10. Other alternatives can be found in Table 1 of Unser et al. 42.
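For reference, the two radial profiles can be sketched as below. The parameterisation in normalised frequency units (Nyquist at 0.5, pass-band (Nyquist/4, Nyquist] for the Simoncelli wavelet) is one common convention and should be checked against Eqs. (4.6) and (4.7).

```python
import numpy as np

def shannon_profile(rho, nyquist=0.5):
    """Shannon mother wavelet: ideal band-pass over the upper octave."""
    return ((rho > nyquist / 2) & (rho <= nyquist)).astype(float)

def simoncelli_profile(rho, nyquist=0.5):
    """Simoncelli mother wavelet: smooth band-pass on (nyquist/4, nyquist]."""
    out = np.zeros_like(rho, dtype=float)
    band = (rho > nyquist / 4) & (rho <= nyquist)
    out[band] = np.cos(np.pi / 2 * np.log2(2.0 * rho[band] / nyquist))
    return out
```

The Simoncelli profile peaks at half the Nyquist frequency and decays smoothly to zero at both band edges, which is what gives it its compact spatial support compared to the Shannon wavelet.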
4.6.5 Wavelets: Considerations for Radiomics
To summarise, undecimated non-separable wavelets have the advantage of yielding isotropic (i.e. rotation invariant/equivariant, see Section 3.3) and translation-equivariant image analysis. Moreover, they yield only one response map per decomposition level (i.e. one per scale), which significantly reduces the number of radiomics features compared to their separable counterparts. Separable wavelets were mostly designed for image coding, which has very different design constraints. They yield a large collection of response maps that are biased towards the image axes and lack rotational invariance. It is possible to make them rotation invariant using orientation pooling over equivariant right-angle representations, as suggested in Section 3.3 and Appendix A. However, when the directionality of image patterns of interest (e.g. tumour margin, vessels) is expected to be important, aligned directional wavelet filters such as Riesz (Section 4.7), which combine directional sensitivity with local invariance to rotations, can be considered (see Section 3.3).
Whereas non-separable wavelets such as the Simoncelli wavelet constitute an excellent first solution for using wavelets in radiomics studies, they are circularly symmetric and therefore cannot characterise directional patterns. However, these non-separable circularly symmetric wavelets can be advantageously combined with directional analysis methods such as directional derivatives or spherical harmonics.
An elegant approach to obtain features measuring directional transitions between pixel values (e.g. gradient- or Hessian-like) is to compute them in the Fourier domain instead of using pixel differences in Gaussian-smoothed images 13. This also provides the opportunity to easily compute higher-order image derivatives as
It can be noticed that differentiating an image along a given direction only requires multiplying its Fourier transform by the corresponding (purely imaginary) frequency variable. Computing derivatives has an intuitive interpretation (e.g. gradient for order 1, curvature for order 2), which makes them attractive for understanding the meaning of the texture measures in a particular medical or biological application context. However, a pure image derivative filter as computed in Eq. (4.8) is high-pass (its Fourier multiplier grows with frequency) and accentuates high frequencies along the differentiation direction. Therefore, it is desirable to implement image derivatives as all-pass filters, which is what the 1st-order Riesz transform yields 41
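A minimal sketch of spectral differentiation, the discrete analogue of Eq. (4.8) with frequencies expressed in cycles per sample, could look like:

```python
import numpy as np

def fourier_derivative(image, axis, order=1):
    """N-th order partial derivative along `axis`, computed spectrally.

    Differentiation multiplies the spectrum by (1j * 2 * pi * w)**order,
    where w is the normalised frequency (cycles per sample) along `axis`.
    """
    shape = [1] * image.ndim
    shape[axis] = image.shape[axis]
    w = np.fft.fftfreq(image.shape[axis]).reshape(shape)
    spectrum = np.fft.fftn(image) * (1j * 2.0 * np.pi * w) ** order
    return np.real(np.fft.ifftn(spectrum))
```

For band-limited signals this spectral derivative is exact, unlike finite pixel differences.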
It can be noticed that dividing the Fourier representation by the norm of the frequency vector transforms Eq. (4.8) into all-pass operators. Eq. (4.9) can be used to compute first-order directional derivatives (i.e. gradient-like, for characterising image edges). However, higher-order derivatives can also be relevant for radiomics studies (e.g. second-order or Hessian-like, for characterising image ridges such as tumour margins or vessels). For a fixed (maximal) order, the collection of higher-order all-pass image derivatives is defined in the Fourier domain as
which yields one all-pass filter for each combination of partial derivative orders summing to the fixed maximal order; together, these form the collection of Riesz operators of that order. For instance, in 3D, an element of this collection corresponds qualitatively to a second-order derivative of the image along one of its directions.
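A first-order Riesz transform along the lines of Eq. (4.9) can be sketched in the Fourier domain as below; the sign convention of the multiplier varies between references.

```python
import numpy as np

def riesz_first_order(image):
    """First-order Riesz transform: one all-pass component per image axis.

    Each component d multiplies the spectrum by -1j * w_d / ||w||, so only
    the phase (direction) is altered, never the magnitude: the filters are
    all-pass. The sign convention is one common choice.
    """
    grids = np.meshgrid(*[np.fft.fftfreq(n) for n in image.shape],
                        indexing="ij")
    norm = np.sqrt(sum(g ** 2 for g in grids))
    norm[norm == 0] = 1.0  # null frequency: w_d is 0 there anyway
    spectrum = np.fft.fftn(image)
    return [np.real(np.fft.ifftn(-1j * g / norm * spectrum)) for g in grids]
```

The all-pass property means the components jointly preserve the energy of every non-zero frequency of the input, which is exactly what distinguishes Eq. (4.9) from the plain derivative of Eq. (4.8).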
A set of band-pass, multi-scale and multi-orientation filters can be obtained by simply applying the Riesz transform to circularly symmetric non-separable wavelets, e.g. Eq. (4.7), or to multi-scale filters, e.g. the LoG filter of Eq. (4.2), as
An example of a 2D Riesz filterbank combined with a Simoncelli wavelet is shown in Fig. 4.11. Eq. (4.11) yields a collection of filters measuring higher-order (scaled) derivatives. However, these derivatives are computed along the image axes, which entails challenges similar to those of separable wavelets: (i) analyzing images along their axes does not have a particular significance for the problem at hand (e.g. characterising local tissue structures within a tumour) and (ii) they are not rotation invariant/equivariant.
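Combining the two ingredients of Eq. (4.11), a single band of a 2-D Riesz-Simoncelli filter bank can be sketched in the Fourier domain as follows. The normalisation constants of the higher-order Riesz components are omitted for brevity, and the normalised-frequency parameterisation of the Simoncelli profile is an assumption.

```python
import numpy as np

def riesz_simoncelli_band(shape, order_indices, nyquist=0.5):
    """One 2-D Fourier-domain band of a Riesz-wavelet filter bank.

    `order_indices` = (n1, n2) selects the Riesz component of total order
    n1 + n2; the radial part is the Simoncelli profile. Normalisation
    constants of the higher-order Riesz transform are omitted.
    """
    wy, wx = np.meshgrid(np.fft.fftfreq(shape[0]), np.fft.fftfreq(shape[1]),
                         indexing="ij")
    rho = np.sqrt(wx ** 2 + wy ** 2)
    radial = np.zeros_like(rho)
    band = (rho > nyquist / 4) & (rho <= nyquist)
    radial[band] = np.cos(np.pi / 2 * np.log2(2.0 * rho[band] / nyquist))
    n1, n2 = order_indices
    safe_rho = np.where(rho == 0, 1.0, rho)  # avoid division by zero at DC
    directional = (-1j * wx) ** n1 * (-1j * wy) ** n2 / safe_rho ** (n1 + n2)
    return radial * directional
```

The directional factor has magnitude at most one, so each band is bounded by the radial wavelet profile, consistent with the all-pass character of the Riesz components.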
4.7.1 Aligning Riesz kernels
To address (i) and (ii), one can locally align all Riesz filters using a given alignment criterion to combine directional sensitivity with local invariance to rotations, as motivated in Section 3.3 and Fig. 3.1. Whereas the max orientation pooling operation can be seen as an alignment criterion, the latter only focuses on the maximum response of one single filter. Since Riesz filter banks include several filters with distinct profiles (see e.g. Fig. 4.11), other alignment criteria based on meaningful quantities can be used. In particular, two suitable alignment criteria are those maximizing the gradients or the Hessian 18, which can be efficiently computed directly from the Riesz coefficients using the structure (or Hessian) tensor 8. The formulae for computing the regularised structure tensor from Riesz coefficients can be found in Section IV-B of Chenouard et al. 8. Regularizing the structure tensor is important to determine the scale of the structures whose orientation matters. This must be optimized for the problem at hand via the variance of a Gaussian window, as described in Section IV-B of 8. Computing these alignment maps determines how to orient all elements of the Riesz filter bank to achieve locally rotation-invariant image analysis. Moreover, thanks to the steerability property of the Riesz transform 41, no additional convolution operations are required to compute the responses of rotated filters. A steerable filter 25 is a filter that can be obtained as a linear combination of steerable kernels; all rotations of such a filter can be obtained via another linear combination of the filter responses parameterised by the desired rotation (e.g. one rotation angle in 2D), which obviates the need to re-convolve the oriented filter with the input image. The formulae and directions for computing the steered Riesz coefficients are detailed in Section III-E of Chenouard et al. 8. To summarise, aligned Riesz filters allow directional and locally rotation-invariant image analysis that can characterise interpretable transitions in medical images. Rotation-invariant steerable representations can also be learned to be specific to the problem at hand 15, 1, including with CNNs 1, 2, 45, 46, 44, 5.
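To illustrate one alignment criterion, a 2-D Gaussian-regularised structure tensor and its dominant orientation can be sketched with finite differences in place of Riesz coefficients; this is a simplification of the approach in 8, not its implementation.

```python
import numpy as np

def structure_tensor_orientation(image, sigma=2.0):
    """Dominant local orientation from a Gaussian-regularised structure tensor.

    2-D sketch: gradients via finite differences, tensor entries smoothed in
    the Fourier domain with a Gaussian window of standard deviation `sigma`
    (the regularisation scale). Returns the angle of the dominant eigenvector
    of [[jxx, jxy], [jxy, jyy]] at every pixel.
    """
    gy, gx = np.gradient(image.astype(float))
    freqs = np.meshgrid(*[np.fft.fftfreq(n) for n in image.shape],
                        indexing="ij")
    gauss = np.exp(-2.0 * (np.pi * sigma) ** 2 * sum(f ** 2 for f in freqs))

    def smooth(a):
        return np.real(np.fft.ifftn(np.fft.fftn(a) * gauss))

    jxx, jxy, jyy = smooth(gx * gx), smooth(gx * gy), smooth(gy * gy)
    # closed form for the dominant eigenvector angle of a 2x2 symmetric matrix
    return 0.5 * np.arctan2(2.0 * jxy, jxx - jyy)
```

The choice of `sigma` fixes the scale of the structures whose orientation drives the alignment, mirroring the role of the Gaussian window variance in 8.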
Implementations of the Riesz transform are available, e.g. in 27hhhhttps://pypi.org/project/itk-isotropicwavelets/, as of July 2019. and 8iiihttp://bigwww.epfl.ch/demo/steerable-wavelets-3d/, as of July 2019.; an adaptation for texture feature extraction, including filter alignment, is available in 18jjjhttp://publications.hevs.ch/index.php/publications/show/2035/, as of July 2019.. An example of 2D image filtering with a Riesz kernel is shown in Fig. 4.12.
4.8 Qualitative Comparison of Linear Filter Properties
Based on the comparison criteria introduced in Section 3, it appears that not all approaches can fulfill all properties that are desirable for local medical image analysis. One typically observes a trade-off between implementation simplicity and efficiency on the one hand, and filter specificity and invariance/equivariance to geometric transforms on the other 16. Table 4.1 provides a qualitative comparison of the filtering methods detailed in Section 4.
While approximate rotation invariance can be artificially added to all methods by computing the average of directional response maps, this often annihilates directional sensitivity; other designs allow combining rotation invariance with directional sensitivity (see Section 3.3).
| filter | directional sensitivity | local rotation invariance | separability of the convolution | wavelet |
|---|---|---|---|---|
| LoG | no | yes | no (yes with DoG) | yes/no |
| isotropic non-separable wavelets | no | yes | no | yes |
| aligned wavelets (e.g. Riesz) | yes | yes | no | yes |
6.1 Phase 1: Benchmarking filters using digital phantoms
The IBSI developed several 3D phantoms to test image filter implementations. All phantoms (except the orientation phantom) have the same dimensions and consist of isotropic voxels with 2.0 × 2.0 × 2.0 mm spacing. Voxel intensities in all phantoms are 8-bit and fall in the range [0, 255]. The phantoms, shown in Fig. 6.1, are stored in the NIfTI format. They include:
Empty phantom (empty.nii.gz): all voxels have zero intensity. It is intended to investigate the convolution process.
Impulse response phantom (impulse_response.nii.gz): all but one voxel have zero intensity; the single remaining centre voxel has a non-zero intensity. This allows visualisation of the filter itself.
Checkerboard phantom (checkerboard.nii.gz): alternates between cubic regions of two different intensities.
Noise phantom (noise.nii.gz): contains Gaussian noise with fixed mean intensity and standard deviation. As such, it has no inherent structure.
Sphere phantom (sphere.nii.gz): consists of four concentric spherical hulls centred on the phantom centre. Thus, the phantom lacks directionality.
Pattern #1 phantom (pattern_1.nii.gz): this is the first of three phantoms that involve directionality. Three perpendicular lines intersect at the centre of the phantom. Along with the Pattern #2 and Pattern #3 phantoms, it is intended to investigate filtering methods able to characterise the local organisation of image directions 17.
Pattern #2 phantom (pattern_2.nii.gz): this is the second of the directional phantoms. The phantom contains three parallel lines.
Pattern #3 phantom (pattern_3.nii.gz): this is the last of the directional phantoms. The phantom contains three lines, of which two are parallel.
Orientation phantom (orientation.nii.gz): not all filters are rotationally invariant, and the direction along which filters are applied therefore affects the response map. Using the orientation phantom, you can check whether the orientation of the image coordinate system in your software matches the orientation expected by the IBSI. The orientation phantom has a different number of voxels along each of its three axes. The voxel intensity increases with the distance from the origin, with the origin voxel having the lowest intensity and the most distal voxel the highest.
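For instance, an impulse-response-like phantom can be sketched as below; the shape used here is an arbitrary choice for illustration, not the IBSI phantom dimensions.

```python
import numpy as np

def impulse_phantom(shape=(64, 64, 64)):
    """Sketch of an impulse-response-like phantom.

    Zeros except for a single unit-intensity centre voxel: convolving it
    with any linear filter returns the filter kernel itself, centred on
    that voxel. The default shape is illustrative only.
    """
    phantom = np.zeros(shape, dtype=np.float32)
    phantom[tuple(n // 2 for n in shape)] = 1.0
    return phantom
```

Applying a candidate filter to such a phantom and inspecting the response map is a quick way to verify kernel shape, orientation, and padding behaviour before submitting results.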
The aim of benchmarking the filters using the phantom images is to arrive at a standard implementation. Filters are therefore applied to the phantoms to create response maps, which have the same dimensions as the phantoms. Instead of calculating one or more features from each response map (see Section 1.2), the entire response map is to be submitted online to ease debugging and to identify differences between software implementations. It is therefore important to note the following:
The phantom data need to be converted from an integer data type to at least 32-bit floating point precision prior to filtering.
Filters need to be applied using the settings provided in Table 6.1. Note that some settings refer to real-world spacing (in millimetres) instead of voxel spacing.
The response map needs to be exported in (compressed) NIfTI format with at least 32-bit floating point precision. Please make sure that the response map has the correct dimensions.
The response map should be uploaded to https://radiomics.hevs.ch/ibsi, which requires logging in with a GitHub account.
The voxel-wise distance between response maps from the different teams will then be compared to identify a consensus response map. In order to visualise and quantify the discrepancy between response maps, a variational approach based on Principal Component Analysis (PCA) in the high-dimensional voxel space is used. Each considered response map is an observation in this space, and the consensus is defined as the centroid of all submissions. A visualisation of the first two PCA components allows assessing the distribution and dispersion of the submissions and the consensus. In addition, a boxplot is used to visualise the distribution of the distances to the consensus and to flag outliers. An example comparing 120 simulated response maps is shown in Fig. 6.2.
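The comparison pipeline can be sketched as follows: flatten each submitted response map into a vector, take the centroid as the consensus, and project the centred observations onto the first two principal components.

```python
import numpy as np

def compare_response_maps(maps):
    """PCA-based comparison of submitted response maps (sketch).

    `maps` is a list of same-shaped arrays, one per submission. Each map is
    one observation in voxel space; the consensus is the centroid, the first
    two principal components give scatter-plot coordinates, and the norms of
    the centred observations give the distances to the consensus.
    """
    X = np.stack([m.ravel() for m in maps]).astype(float)
    consensus = X.mean(axis=0)
    centred = X - consensus
    # right singular vectors of the centred data are the principal axes
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[:2].T              # 2-D scatter-plot coordinates
    dists = np.linalg.norm(centred, axis=1)  # distance to the consensus
    return scores, dists, consensus.reshape(maps[0].shape)
```

Submissions clustering near the origin of the score plot agree with the consensus; large `dists` values flag outlier implementations.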
| ID | filter | phantom | settings |
|---|---|---|---|
| 1.a.1 | mean | checkerboard | support, zero padding |
| 1.a.2 | | | support, nearest value padding |
| 1.a.3 | | | support, periodic padding |
| 1.a.4 | | | support, mirror padding |
| 2.a | LoG | impulse | zero padding; scale (mm); filter size cutoff |
| 2.b | | checkerboard | mirror padding; scale (mm); filter size cutoff |
| 3.a.1 | Laws | impulse | zero padding; E5L5S5 response map |
| 3.a.2 | | | zero padding; E5L5S5 response map; 3D rotation invariance, max pooling |
| 3.a.3 | | | zero padding; E5L5S5 energy map, distance (voxels); 3D rotation invariance, max pooling |
| 3.b.1 | | checkerboard | mirror padding; E3W5R5 response map |
| 3.b.2 | | | mirror padding; E3W5R5 response map; 3D rotation invariance, max pooling |
| 3.b.3 | | | mirror padding; E3W5R5 energy map, distance (voxels); 3D rotation invariance, max pooling |
| 4.a.1 | Gabor | impulse | zero padding; 2D modulus response map (mm, mm, in-plane orientation) |
| 4.a.2 | | | zero padding; 2D modulus response map (mm, mm); 2D rotation invariance, average pooling; average 2D responses over orthogonal planes |
| 4.b.1 | | sphere | mirror padding; 2D modulus response map (mm, mm, in-plane orientation) |
| 4.b.2 | | | mirror padding; 2D modulus response map (mm, mm); 2D rotation invariance, average pooling; average 2D responses over orthogonal planes |
| 5.a.1 | Daubechies 2 | impulse | zero padding; undecimated; LHL map – 1st level |
| 5.a.2 | | | zero padding; undecimated; LHL map – 1st level; 3D rotation invariance, average pooling |
| 6.a.1 | Coiflet 1 | sphere | periodic padding; undecimated; HHL map – 1st level |
| 6.a.2 | | | periodic padding; undecimated; HHL map – 1st level; 3D rotation invariance, average pooling |
| 7.a.1 | Haar | checkerboard | mirror padding; undecimated; LLL map – 2nd level; 3D rotation invariance, average pooling |
| 7.a.2 | | | mirror padding; undecimated; HHH map – 2nd level; 3D rotation invariance, average pooling |
| 8.a.1 | Simoncelli | checkerboard | periodic padding; L map – 1st level |
| 8.a.2 | | | periodic padding; L map – 2nd level |
| 8.a.3 | | | periodic padding; H map – 3rd level |
| 9.a | Riesz-transformed LoG | impulse | zero padding; scale (mm); filter size cutoff |
| 9.b.1 | | sphere | zero padding; scale (mm); filter size cutoff |
| 9.b.2 | | | zero padding; scale (mm); filter size cutoff; aligned by structure tensor |
| 10.a | Riesz-transformed Simoncelli | impulse | zero padding; L map – 1st level |
| 10.b.1 | | pattern 1 | nearest value padding; L map – 1st level |
| 10.b.2 | | | nearest value padding; L map – 1st level; aligned by structure tensor |
PCA (left) and boxplot (right) visualisations of a comparison between 120 simulated response maps. The blue and red observations come from two distinct uniform distributions. The consensus is marked with a black cross in the scatter plot.
6.2 Phase 2: Benchmarking filters using a lung cancer CT image
CT images from four patients with non-small-cell lung carcinoma were made available to serve as radiomics phantoms (DOI: 10.17195/candat.2016.08.1). The IBSI previously used the image of the first patient (PAT1) to establish reference values for image features 49. This image is used here again.
The lung cancer CT image is stored as a stack of slices in DICOM format. The image slices can be identified by the DCM_IMG prefix. Additionally, the RT structure set (with the DCM_RS prefix) contains a delineation of the Gross Tumour Volume (GTV), which is used to create a segmentation mask for the ROI. The image and the mask have been converted to the NIfTI format. Note that some import algorithms alter the floating point bit depth of the image and mask. If this occurs, the imported image and mask should be converted back to (at least) 32-bit floating point and rounded prior to any further processing.
Compared to the digital phantoms, the lung cancer CT image offers a more realistic setting to benchmark applications of image filters as part of a radiomics analysis. Image filtering is assessed as part of two different image processing configurations, see Table 6.2. The main difference between the configurations is that one performs image processing and filtering within each image slice (i.e. in 2D), whereas the other operates in 3D. Features are still computed over the entire region of interest, as described in the IBSI 1 reference manual. The filter configurations are shown in Table 6.3.
| Parameter | Configuration A | Configuration B |
|---|---|---|
| slice-wise (2D) or as volume (3D) | 2D | 3D |
| resampled voxel spacing (mm) | (axial) | |
| interpolation method | bicubic spline | tricubic spline |
| intensity rounding | nearest integer | nearest integer |
| ROI interpolation method | bilinear | trilinear |
| ROI partial mask volume | | |
| filters | see Table 6.3 | see Table 6.3 |
| intensity histogram | FBN: 16 bins | FBN: 16 bins |

a Fixed bin size cannot be used with most image filters, as intensities in response maps may no longer represent calibrated intensities.
| ID | filter | settings |
|---|---|---|
| 1.A — 1.B | none | – |
| 2.A — 2.B | mean | support (voxels) |
| 3.A — 3.B | LoG | scale (mm); filter size cutoff |
| 4.A | Laws | L5E5 energy map, distance (voxels); 2D rotation invariance, max pooling |
| 4.B | | L5E5E5 energy map, distance (voxels); 3D rotation invariance, max pooling |
| 5.A — 5.B | Gabor | 2D modulus response map (mm, mm); 2D rotation invariance, average pooling; average 2D responses over orthogonal planes |
| 6.A | Daubechies 3 | undecimated; LH map – 1st level; 2D rotation invariance, average pooling |
| 6.B | | undecimated; LLH map – 1st level; 3D rotation invariance, average pooling |
| 7.A — 7.B | Daubechies 3 | |