Towards Automatic Abdominal Multi-Organ Segmentation in Dual Energy CT using Cascaded 3D Fully Convolutional Network

by   Shuqing Chen, et al.

Automatic multi-organ segmentation of dual energy computed tomography (DECT) data can benefit biomedical research and clinical applications, but it is a challenging task. Recent advances in deep learning have shown that 3D fully convolutional networks (FCNs) are feasible for voxel-wise dense prediction in single energy computed tomography (SECT). In this paper, we propose a 3D FCN based method for automatic multi-organ segmentation in DECT. The work builds on a cascaded FCN and a general model for the major organs trained on a large set of SECT data. We preprocessed the DECT data by linear weighting and fine-tuned the model on the DECT data. The method was evaluated on 42 torso DECT data sets acquired with a clinical dual-source CT system. Four abdominal organs (liver, spleen, left and right kidneys) were evaluated. The performance was estimated by cross-validation, and the effect of the weight on the accuracy was studied. Across the tests, the average Dice coefficients were about 91% to 93% for the liver and 84% to 91% for the spleen and kidneys (Tables 1 and 2), which shows that the method is feasible and promising.



1 Introduction

The Hounsfield unit (HU) value depends on the inherent tissue properties, the X-ray spectrum used for scanning, and the administered contrast media [1]. In a SECT image, materials with different elemental compositions can be represented by identical HU values [2]. SECT therefore faces challenges such as limited material-specific information, beam hardening, and limited tissue characterization [1]. DECT has been investigated to address these challenges. In DECT, two energy-specific image data sets are acquired simultaneously at two different X-ray spectra, which are produced at different energies. Multi-organ segmentation in DECT can be beneficial for biomedical research and clinical applications, such as material decomposition [3], organ-specific context-sensitive enhanced reconstruction and display [4, 5], and computation of bone mineral density [6]. We aim to exploit the prior anatomical information gained through multi-organ segmentation to provide improved context-sensitive DECT imaging [4, 5]. This novel technique makes it possible to present ever more complex information to radiologists simultaneously and has the potential to improve the clinical routine in CT diagnosis.

Automatic multi-organ segmentation on DECT images is a challenging task due to the inter-subject variance of the human abdomen, the complex 3D intra-subject variance among organs, soft anatomy deformation, and the different HU values that the same organ takes at different spectra. Recent research shows the power of deep learning in medical image processing [7]. To solve the DECT segmentation problem, we build on the successful experience with deep learning for multi-organ segmentation in volumetric SECT images [8, 9]. The proposed method is based on a cascaded 3D FCN, a two-stage, coarse-to-fine approach [8]. The first stage predicts the region of interest (ROI) of the target organs, while the second stage is trained to predict the final segmentation. The proposed method requires no organ-specific or energy-specific prior knowledge. The cross-validation results show that the proposed method is promising for solving the multi-organ segmentation problem in DECT. To the best of our knowledge, this is the first study on multi-organ segmentation in DECT images based on 3D FCNs.

2 Materials and Methods

2.1 Network Architecture for DECT Prediction

As described by Krauss et al. [10], a mixed image display is employed in clinical practice for diagnosis using DECT. The mixed image is calculated by linear weighting of the image values of the two spectra:

$$I_{\mathrm{mix}} = \alpha \cdot I_{\mathrm{low}} + (1 - \alpha) \cdot I_{\mathrm{high}} \qquad (1)$$

where $\alpha$ is the weight of the dual energy composition and $I_{\mathrm{mix}}$ denotes the mixed image. $I_{\mathrm{low}}$ and $I_{\mathrm{high}}$ are the images at low and high kV, respectively.
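The linear weighting of Eq. 1 can be illustrated with a few lines of NumPy. This is only a minimal sketch: the function name `mix_dect` and the toy volumes are assumptions made for illustration and are not part of the authors' pipeline.

```python
import numpy as np

def mix_dect(i_low, i_high, alpha):
    """Blend two co-registered volumes as alpha * I_low + (1 - alpha) * I_high (Eq. 1)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    i_low = np.asarray(i_low, dtype=np.float32)
    i_high = np.asarray(i_high, dtype=np.float32)
    if i_low.shape != i_high.shape:
        raise ValueError("low- and high-kV volumes must have the same shape")
    return alpha * i_low + (1.0 - alpha) * i_high

# Toy 2x2x2 volumes in HU (made-up values, not patient data).
low = np.full((2, 2, 2), 100.0)
high = np.full((2, 2, 2), 50.0)
mixed = mix_dect(low, high, alpha=0.6)
```

With $\alpha = 1$ the result equals the low-kV image and with $\alpha = 0$ it equals the high-kV image, so the weight moves the mixed image continuously between the two spectra.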

Figure 1: Cascaded network architecture for DECT multi-organ segmentation

We preprocessed the DECT images following Eq. 1. Figure 1 illustrates the network architecture of the proposed method for DECT multi-organ segmentation. First, the mixed image is calculated by combining the images at the low and high energy levels using Eq. 1. Then, a binary mask is generated by thresholding the skin contour of the mixed image. Subsequently, the mixed image, the binary mask and the labeled image are fed into the network as multi-channel inputs. The network consists of two stages. The first stage generates the region of interest (ROI) in order to reduce the search space for the second stage. The prediction result of the first stage is taken as the mask for the second stage. Each stage is based on a standard 3D U-Net [11], a fully convolutional network with an analysis path and a synthesis path. We used the open-source implementation of the two-stage cascaded network [8] developed by Roth et al., which is based on the 3D U-Net [11] and the Caffe deep learning library [12]. Roth et al. [8] trained a general model on a large set of SECT images that includes some of the major organ labels. Our model was trained by fine-tuning this general model with the mixed DECT images. The difference between the network output and the ground truth labels is measured using a softmax with weighted voxel-wise cross-entropy loss [11, 8].
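The two preprocessing ideas described above, a thresholded body mask and a first-stage prediction that narrows the search space, can be sketched as follows. The threshold of -500 HU, the bounding-box cropping, the margin, and both function names are assumptions chosen for illustration; the cascaded implementation of Roth et al. [8] may handle these steps differently.

```python
import numpy as np

def body_mask(volume_hu, threshold_hu=-500.0):
    """Binary mask of voxels denser than the threshold (threshold is an assumed value)."""
    return volume_hu > threshold_hu

def roi_bounding_box(stage1_foreground, margin=2):
    """Slices of the box around the first-stage foreground, padded by `margin` voxels."""
    coords = np.argwhere(stage1_foreground)
    if coords.size == 0:
        raise ValueError("first-stage prediction contains no foreground")
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, stage1_foreground.shape)
    return tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))

# Toy volume: air (-1000 HU) with a soft-tissue cube (40 HU) inside.
volume = np.full((10, 10, 10), -1000.0)
volume[2:8, 2:8, 2:8] = 40.0
mask = body_mask(volume)

# Hypothetical first-stage output marking a small organ candidate region.
stage1 = np.zeros_like(mask)
stage1[4:6, 4:6, 4:6] = True
roi = roi_bounding_box(stage1, margin=1)
cropped = volume[roi]  # the second stage would only see this sub-volume
```

Cropping to the padded box is one simple way to shrink the input of the second stage; using the first-stage prediction directly as a voxel mask, as the text describes, is an alternative with the same purpose.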

2.2 Experimental Setup

The proposed method was evaluated with 42 clinical torso DECT images scanned by the Department of Radiology, University Hospital Erlangen. All images were taken from male and female adult patients with different clinical indications justified by the radiologist. Ultravist 370 was given as contrast agent with volumes adapted to body weight. The images were acquired at X-ray tube voltage settings of 70 kV (560 mAs) and Sn 150 kV (140 mAs, with Sn filter) using a Siemens SOMATOM Force CT system with a Stellar detector, an energy-integrating detector. Each volume consists of 992 to 1290 slices of 512x512 pixels. The voxel dimensions are [0.6895-0.959, 0.6895-0.959, 0.6] mm. Four abdominal organs were tested: liver, spleen, right kidney and left kidney. The ground truth was generated by experts in an inter-observer way. Training, validation, and test data were selected randomly with the ratio 5:1:1, i.e. in each test we used 30 images for training, 6 images for validation, and 6 images for testing.
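The 5:1:1 random split of the 42 cases can be sketched as below. The function name, the fixed seed, and the use of integer case IDs are assumptions for illustration; the source does not describe how the random selection was implemented.

```python
import random

def split_cases(case_ids, n_val=6, n_test=6, seed=0):
    """Shuffle the case IDs and split them into training, validation and test lists."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)  # seeded only to make the sketch reproducible
    test_ids = ids[:n_test]
    val_ids = ids[n_test:n_test + n_val]
    train_ids = ids[n_test + n_val:]
    return train_ids, val_ids, test_ids

train_ids, val_ids, test_ids = split_cases(range(42))
```

For 42 cases this yields 30 training, 6 validation and 6 test cases, matching the ratio stated above.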

3 Results

3.1 Performance Estimation with Cross-Validation

An NVIDIA GeForce GTX 1080 Ti with 11 GB memory was used for all experiments. The similarity between the segmentation result and the ground truth was measured with the Dice metric using the tool provided by VISCERAL [13]. First, the performance of the proposed method was estimated by 8-fold cross-validation, using 0.6 as the training weight $\alpha_{\mathrm{train}}$ as well as the test weight $\alpha_{\mathrm{test}}$. Fig. 2 shows one segmentation result in 3D. Table 1 summarizes the Dice coefficients of the segmentation results and compares the DECT results with the SECT results. Under this weight condition, the proposed method yielded an average Dice coefficient of 92% for the liver, 84% for the spleen, 88% for the right kidney and 87% for the left kidney. Fig. 3 plots the distributions of the Dice coefficients for the different testing folds and shows the high robustness of the proposed method.
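For reference, the Dice coefficient between a predicted and a ground-truth binary mask is $2|P \cap T| / (|P| + |T|)$. The sketch below computes it with NumPy; it is not the VISCERAL tool used in the experiments, and the convention of returning 1.0 for two empty masks is an assumption.

```python
import numpy as np

def dice(pred, truth):
    """Dice coefficient 2|P∩T| / (|P| + |T|) of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)

def dice_per_label(pred_labels, truth_labels, labels):
    """Dice per organ for label maps; `labels` maps organ name to integer label."""
    return {name: dice(pred_labels == value, truth_labels == value)
            for name, value in labels.items()}

# Toy label maps (1 = hypothetical organ label, 0 = background).
truth_map = np.array([1, 1, 1, 1, 0, 0])
pred_map = np.array([1, 1, 1, 0, 0, 0])
scores = dice_per_label(pred_map, truth_map, {"organ": 1})
```

In the toy example the overlap is 3 voxels and the two masks contain 3 and 4 voxels, so the score is 6/7.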

                 Liver   Spleen   r.Kidney   l.Kidney
DECT      Avg.   0.92    0.84     0.88       0.87
          SD     0.02    0.08     0.03       0.03
          Min.   0.84    0.62     0.80       0.78
          Max.   0.94    0.95     0.94       0.93
SECT [9]  Avg.   0.95    0.90     0.90       0.88
          SD     0.02    0.06     0.06       0.05
          Min.   0.91    0.75     0.75       0.78
          Max.   0.97    0.95     0.95       0.94
Table 1: Dice coefficients of the cross-validation with $\alpha_{\mathrm{train}}$=0.6 and $\alpha_{\mathrm{test}}$=0.6, compared with SECT results. SD stands for standard deviation. Note that the methods used different data sets, so the numbers are not directly comparable.

Figure 2: 3D rendering of one DECT segmentation with yellow for liver, blue for spleen, green for right kidney and red for left kidney
Figure 3: Dice coefficients of the target organs with $\alpha_{\mathrm{train}}$=0.6 and $\alpha_{\mathrm{test}}$=0.6 for 8 different testing folds

3.2 Study on the Weight

Figure 4: Dice coefficients of target organs with alpha blending for testing fold 1

We aim to exploit the spectral information in the DECT data. Since the mixing essentially results in pseudo-monochromatic images comparable to single energy scans, the influence of the weight on the accuracy was studied further. The values 0.3, 0.6 and 0.9 were chosen for the training weight $\alpha_{\mathrm{train}}$ and the test weight $\alpha_{\mathrm{test}}$ in this study. Fig. 4 illustrates the distributions of the Dice coefficients with different weight combinations for testing fold 1. Table 2 lists the average Dice coefficients, with each combination written as $\alpha_{\mathrm{train}}$-$\alpha_{\mathrm{test}}$. In all cases, the liver had the highest accuracy (91%-93%), and the standard deviation of its Dice coefficients stayed low (around 2%). The segmentation of the right kidney was usually more accurate than that of the left kidney. The best Dice values per organ per training set are highlighted in Table 2. The test with $\alpha_{\mathrm{train}}$=0.9 and $\alpha_{\mathrm{test}}$=0.9 obtained the highest accuracy for the liver and the right kidney. The weight combination 0.9-0.6 gave the best segmentation for the spleen, and the combination 0.9-0.3 gave the best result for the left kidney. $\alpha_{\mathrm{train}}$=0.9 generated better segmentations for the liver, while $\alpha_{\mathrm{test}}$=0.6 worked better for the spleen.

$\alpha_{\mathrm{train}}$-$\alpha_{\mathrm{test}}$   Liver    Spleen   r.Kidney   l.Kidney
0.3-0.3   0.924    0.850    0.900*     0.891*
0.3-0.6   0.925    0.885*   0.891      0.881
0.3-0.9   0.926*   0.866    0.872      0.841
0.6-0.3   0.909    0.847    0.844      0.885
0.6-0.6   0.922*   0.899*   0.895*     0.887*
0.6-0.9   0.912    0.872    0.843      0.873
0.9-0.3   0.930    0.860    0.898      0.892*
0.9-0.6   0.932    0.900*   0.904      0.873
0.9-0.9   0.933*   0.896    0.905*     0.862
Table 2: Dice coefficients for different combinations of training weight and test weight ($\alpha_{\mathrm{train}}$-$\alpha_{\mathrm{test}}$) for testing fold 1. An asterisk denotes the best result per organ within each training set.
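The per-organ optimum within each training set can be read off Table 2 programmatically. The sketch below copies the table values into a dictionary keyed by (training weight, test weight); the data structure and function name are assumptions for illustration.

```python
ORGANS = ("liver", "spleen", "r_kidney", "l_kidney")

# (alpha_train, alpha_test) -> Dice per organ, copied from Table 2.
TABLE2 = {
    (0.3, 0.3): (0.924, 0.850, 0.900, 0.891),
    (0.3, 0.6): (0.925, 0.885, 0.891, 0.881),
    (0.3, 0.9): (0.926, 0.866, 0.872, 0.841),
    (0.6, 0.3): (0.909, 0.847, 0.844, 0.885),
    (0.6, 0.6): (0.922, 0.899, 0.895, 0.887),
    (0.6, 0.9): (0.912, 0.872, 0.843, 0.873),
    (0.9, 0.3): (0.930, 0.860, 0.898, 0.892),
    (0.9, 0.6): (0.932, 0.900, 0.904, 0.873),
    (0.9, 0.9): (0.933, 0.896, 0.905, 0.862),
}

def best_per_training_weight(table):
    """For each (alpha_train, organ), return the (alpha_test, Dice) with the highest Dice."""
    best = {}
    for (a_train, a_test), scores in table.items():
        for organ, score in zip(ORGANS, scores):
            key = (a_train, organ)
            if key not in best or score > best[key][1]:
                best[key] = (a_test, score)
    return best

best = best_per_training_weight(TABLE2)
# Example lookup: best test weight for the spleen with a model trained at 0.9.
spleen_at_09 = best[(0.9, "spleen")]
```

For the model trained at 0.6, all four organs reach their best score at the matched test weight of 0.6, which is the pattern the discussion refers to as matched training and test conditions.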

4 Discussion and Conclusion

We proposed a deep learning based method for automatic abdominal multi-organ segmentation in DECT. The evaluation results show the feasibility of the proposed method. Compared with the SECT results reported by Roth et al. [9], our method is promising and robust (see Table 1), although the liver and spleen were segmented less accurately than in SECT. The third testing fold showed a large deviation. One possible reason is that our image data were taken from patients with different diseases (liver tumor, spleen tumor, etc.), and the disease type was not considered in the data selection. Training and testing on cases with inconsistent pathologies could affect the accuracy.

The study on the weight $\alpha$ can be divided into three groups with different training weights $\alpha_{\mathrm{train}}$. A weight of 0.9 is close to the low energy images, which have on average the best soft-tissue contrast, so $\alpha_{\mathrm{train}}$=0.9 worked better in general. A weight of 0.6 is close to $\alpha$=0.5, which is the optimal fusion of both images with respect to signal-to-noise ratio (SNR); $\alpha$=0.6 therefore usually had the smallest deviation and showed the strongest adaptability in the inter-group comparison. The intra-group comparison showed that the cases with identical training and test conditions had a higher probability of producing the best segmentation result. This is expected, because mixed images generated under matched training and test conditions are likely to be the most similar. Furthermore, the comparison of the case 0.3-0.9 (low-contrast model for high-contrast image) with the case 0.9-0.3 (high-contrast model for low-contrast image) showed that a model trained on high-contrast images segments low-contrast test images better than the reverse. In addition, the liver is well segmented in the middle to high $\alpha$ ranges. The spleen is segmented best at $\alpha$=0.6. The kidneys work best under matched training and test conditions. This suggests that there is an optimal $\alpha$ for each organ for image segmentation.

The weight for the mixed image calculation is currently a user-defined preprocessing parameter in our approach. In the future, it could be varied to augment the training data. The network could also be modified to take the two images as separate inputs. Furthermore, more organs and more scans from different patients could be included.
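Both future directions can be sketched briefly: drawing the mixing weight at random for each training sample, and stacking the two energy images as separate input channels. The sampling range of 0.3 to 0.9, the function names, and the channel-first layout are assumptions for illustration and were not evaluated in this work.

```python
import numpy as np

def random_alpha_mix(i_low, i_high, rng, alpha_range=(0.3, 0.9)):
    """Mix the two volumes with a weight drawn uniformly from `alpha_range`."""
    alpha = rng.uniform(*alpha_range)
    return alpha * i_low + (1.0 - alpha) * i_high, alpha

def two_channel_input(i_low, i_high):
    """Stack the low- and high-kV volumes along a new leading channel axis."""
    return np.stack([i_low, i_high], axis=0)

rng = np.random.default_rng(0)
low = np.full((4, 4, 4), 100.0)   # toy low-kV volume in HU
high = np.full((4, 4, 4), 50.0)   # toy high-kV volume in HU
mixed, alpha = random_alpha_mix(low, high, rng)
stacked = two_channel_input(low, high)  # shape (2, 4, 4, 4)
```

Random weights expose the network to a range of contrast levels during training, whereas the two-channel variant leaves it to the network to learn how to combine the spectra.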

ACKNOWLEDGMENTS This work was supported by the German Research Foundation (DFG) through research grant No. KA 1678/20, LE 2763/2-1 and MA 4898/5-1.