The segmentation of liver tumours in computed tomography (CT) is required for the assessment of tumour load, treatment planning, prognosis, and monitoring of treatment response. Because manual segmentation is time consuming, tumour size is usually estimated in clinical practice from axial-plane measurements of the largest tumour diameter and the diameter perpendicular to it. Nevertheless, tumour volume is a better predictor of patient survival than diameter. Hence, there is a clear need for tools to aid with tumour detection and segmentation.
Most current approaches are based on fully convolutional neural networks (FCN) [9, 10], often similar to the UNet. We exploit the architecture that is evaluated in  to construct a model configuration for segmenting metastatic lesions in the liver within CT volumes.
We attain competitive liver and liver lesion detection and segmentation scores across a wide range of metrics in the 2017 MICCAI Liver Tumour Segmentation Challenge (LiTS). Unlike other top-scoring methods, we do not pre-process the data, we employ only trivial post-processing of model outputs, and we propose a single-stage model trained end-to-end.
We construct a model with two fully convolutional networks (FCNs), one stacked on the other, trained end-to-end to segment 2D axial slices. Both networks are UNet-like , with short and long skip connections as in . The combined network is shown in Figure 1 (A). FCN 1 takes an axial slice as input; its output is passed to a linear classifier that produces (via a sigmoid) a probability of each pixel being within the liver. FCN 2 takes as input both the axial slice and the output of FCN 1. Its input thus has a number of channels equal to the number of channels in the representation produced by FCN 1, plus one channel containing the axial slice. After passing through the first convolution layer of FCN 2, the representation produced by FCN 1 is effectively forwarded to every layer of FCN 2 via short skip connections. The output representation of FCN 2 is passed to a lesion classifier of the same type as the liver classifier.
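As a minimal NumPy sketch of this conditioning (function names and array shapes are ours, for illustration only): FCN 2's input is assembled by concatenating FCN 1's pre-classifier feature maps with the raw axial slice as one extra channel, and each per-pixel classifier is a linear map over channels followed by a sigmoid, i.e. a 1x1 convolution.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fcn2_input(axial_slice, fcn1_features):
    """Concatenate FCN 1's feature maps (C, H, W) with the raw axial
    slice (H, W) as one extra channel, giving FCN 2's (C+1, H, W) input."""
    return np.concatenate([fcn1_features, axial_slice[None]], axis=0)

def pixel_classifier(features, weights, bias):
    """Per-pixel linear classifier with a sigmoid output: equivalent to
    a 1x1 convolution collapsing C channels into one probability map."""
    logits = np.tensordot(weights, features, axes=([0], [0])) + bias
    return sigmoid(logits)
```

With zero weights every pixel receives probability 0.5, which makes the sigmoid mapping easy to sanity-check.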
The FCN 1 and FCN 2 networks have an identical architecture, shown in Figure 1 (B). In each FCN, an input passes through an initial convolution layer and is then processed by a sequence of convolution blocks at progressively lower resolutions and larger receptive field sizes. This contracting path is shown in blue on the left. An expanding path (right, in yellow) then reverses the downsampling performed by the contracting path, mirroring its structure. Each block in the expanding path takes as input the sum of the previous block's output and the output of its corresponding block from the contracting path; this allows the expanding path to recover spatial detail lost with downsampling. Representations are thus passed from left to right along long skip connections.
We used two types of blocks: block A and block B. Both have short skip connections which sum the block's input into its output, as shown in Figure 1 (C) and (D). Both blocks contain dropout layers, a downsampling layer when used along the contracting path, and an upsampling layer when used along the expanding path. The downsampling layer in block A is max pooling; in block B it is basic grid subsampling, achieved by applying convolutions with a stride of 2. The upsampling layer performs simple nearest-neighbour interpolation. The main difference between blocks A and B is the number of convolution operations: block A contains one convolution layer and block B contains two. All convolution layers use 3x3 filters; the number of filters is shown for each block in Figure 1 (B).
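The two resampling operations are simple enough to sketch in NumPy (illustrative only; in block B the stride-2 subsampling is realized by the strided convolutions themselves rather than by slicing):

```python
import numpy as np

def grid_subsample(x):
    """Stride-2 grid subsampling (block B's downsampling): keep every
    second row and column of a 2D feature map."""
    return x[::2, ::2]

def nearest_upsample(x):
    """Nearest-neighbour upsampling by a factor of 2, as used along the
    expanding path: each pixel is repeated into a 2x2 patch."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)
```

Note that `nearest_upsample(grid_subsample(x))` restores the original shape but not the discarded samples, which is why the long skip connections from the contracting path are needed to recover spatial detail.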
2.2 Data Set
The proposed segmentation method was applied to metastatic lesions in the liver imaged with CT. The dataset comprised 200 CT volumes with variable coverage, either limited to the abdomen or including the entire abdomen and thorax. All volumes were contrast-enhanced and imaged in the portal venous phase. Each volume contained a variable number of axial slices at a resolution of 512x512 pixels, with varying slice thickness. Of the 200 volumes, 130 were provided publicly with manual segmentations of the liver and liver lesions, while 70 were withheld until near the end of the LiTS challenge for evaluation. Manual segmentations were not provided for this evaluation set.
Of the 130 cases with segmentations, we used 115 for training and 15 for validating our segmentation models. We did not apply any pre-processing to the images except for basic image-independent scaling of the intensities to ensure inputs to our neural networks were within a reasonable range: we divided all pixel values by 255 and then clipped the resulting intensities to within [-2, 2].
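This intensity scaling amounts to a one-liner; a NumPy sketch (the function name is ours):

```python
import numpy as np

def scale_intensities(volume):
    """Image-independent intensity scaling used here: divide raw
    values by 255 and clip the result to [-2, 2]."""
    return np.clip(np.asarray(volume, dtype=np.float32) / 255.0, -2.0, 2.0)
```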
2.3 Training the Model
We trained the model only on 2D axial slices that contain the liver, using RMSprop and the Dice loss defined in [4, 6]. For data augmentation, we applied random horizontal and vertical flips, rotations of up to 15 degrees, zooming in and out by up to 10%, and elastic deformations as described in . To reduce training time, allowing us to test many models and hyperparameters quickly, we first downscaled all slices from 512x512 to 256x256 resolution. This initial model was trained with a learning rate of 0.001 (momentum 0.9). The model was then fine-tuned on full-resolution slices with a learning rate of 0.0001.
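The Dice loss of [4, 6] can be sketched as follows (one common soft formulation; the cited works differ in details such as squared terms in the denominator):

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss: 1 - 2|P.T| / (|P| + |T|), computed on per-pixel
    probabilities `pred` against binary labels `target`."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    intersection = np.sum(pred * target)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)
```

Unlike per-pixel cross-entropy, this loss is insensitive to the large foreground/background class imbalance typical of lesion segmentation.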
Table 1. Lesion segmentation metrics (Dice, VOE (%), RVD (%), ASSD (mm), MSD (mm), RMSD (mm)); detection precision and recall at 50% and 0% overlap; mixed measures (global Dice, Dice per case).
The model was trained for 200 epochs on downscaled slices (batch size 40) and fine-tuned for 30 epochs on full-resolution slices (batch size 10). The final model weights were those which yielded the best loss on the validation set.
The proposed model is limited to processing 2D slices due to memory constraints. To improve segmentation performance and consistency across slices for the LiTS challenge, we introduced some cross-slice context. For every slice, three consecutive slices were considered (the slice itself, the one above, and the one below). The pre-classifier outputs from each of the three slices were combined by a convolution (3x3 kernel), and a new classifier for the middle slice was trained on the resulting features.
2.4 Generating Segmentations
At test time, segmentation predictions were averaged across the four input orientations obtained by vertical and horizontal flips. This was done for three similar models and the predictions of the ensemble were averaged. A liver segmentation was extracted by selecting the largest connected component in the model's liver segmentation prediction. A lesion segmentation was extracted by cropping the model's lesion segmentation prediction to a dilated version of the liver segmentation. For dilation, we chose to extend the liver's boundaries by 20 mm. This eliminated false positives outside of the liver without incorrectly cropping out lesions when the liver was slightly under-segmented. Beyond cropping to a single liver, no post-processing was performed on the model outputs.
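This post-processing can be sketched with SciPy (function and parameter names are ours; for simplicity we assume isotropic in-plane spacing and approximate the 20 mm dilation by repeated unit dilations):

```python
import numpy as np
from scipy import ndimage

def postprocess(liver_prob, lesion_prob, voxel_mm, margin_mm=20.0, thr=0.5):
    """Keep the largest connected component of the liver prediction,
    dilate it by roughly `margin_mm`, and crop the lesion prediction
    to that dilated mask. `voxel_mm` is the in-plane voxel spacing."""
    liver = liver_prob > thr
    labels, n = ndimage.label(liver)
    if n > 0:
        sizes = ndimage.sum(liver, labels, range(1, n + 1))
        liver = labels == (np.argmax(sizes) + 1)  # largest component only
    # Approximate the metric margin with repeated unit dilations.
    iters = max(1, int(round(margin_mm / voxel_mm)))
    dilated = ndimage.binary_dilation(liver, iterations=iters)
    lesion = (lesion_prob > thr) & dilated  # drop lesions far from the liver
    return liver, lesion
```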
3 Results and Discussion
Segmentation metrics evaluate the segmentation of detected lesions (averaged across lesions). They comprise a per-lesion Dice score, the volume overlap error (VOE), the relative volume difference (RVD), the average symmetric surface distance (ASSD), the maximum surface distance (MSD), and the root mean square symmetric surface distance (RMSD). Detection metrics are precision and recall at 50% and 0% overlap (measured by intersection over union) of each predicted lesion with the corresponding ground truth. Dice metrics that confound detection and segmentation are the Dice score computed on all combined volumes (global Dice) and the mean Dice score per volume (Dice per case). Entries in the challenge were ranked according to the Dice per case, placing our method fourth in lesion segmentation with a score of 0.661. Liver segmentation performed well, with an average Dice per case of 0.951 (the best entry scored 0.963).
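To make the two confounded Dice metrics concrete, a small sketch (function names are ours): global Dice pools all volumes into a single score, while Dice per case averages per-volume scores, so they weight large and small volumes differently.

```python
import numpy as np

def dice(a, b, eps=1e-7):
    """Dice overlap between two binary masks."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    return (2.0 * np.sum(a & b) + eps) / (np.sum(a) + np.sum(b) + eps)

def global_dice(preds, targets):
    """One Dice score over all volumes pooled together."""
    return dice(np.concatenate(preds), np.concatenate(targets))

def dice_per_case(preds, targets):
    """Mean of per-volume Dice scores."""
    return float(np.mean([dice(p, t) for p, t in zip(preds, targets)]))
```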
Although three methods attained a higher mean Dice per case, our method compares favourably in terms of higher detection scores or lower complexity. While leHealth attained the top Dice per case score of 0.702, that method suffers from very low precision (0.156 compared to our 0.446, at 50% overlap) and relies on extensive model ensembling and post-processing. The second method, labeled hchen in Table 1, attained a Dice per case of 0.686 but at the cost of lower precision (0.409 at 50% overlap). This method relies on a three-stage process in which the liver is first roughly segmented, then the liver and lesions are segmented with a 2D FCN, and finally the segmentations are refined with a small 3D FCN that takes the initial segmentation predictions as input. The authors found that using a pre-trained 2D model significantly boosted performance. By contrast, we developed a single-stage pipeline without pre-trained models; we will extend our method to 3D in the future. Finally, hans.meine (using the approach described in ) attained a Dice per case of 0.676 with detection scores at 50% overlap slightly higher than ours; however, that method involved post-processing with a random forest classifier to improve precision and used data beyond that provided in the challenge to train a liver segmentation model. In comparison, our post-processing was trivial and we trained our models only on the data provided in the challenge.
All top methods used an FCN for lesion segmentation, conditioned on a prior liver segmentation. In this regard, our approach differs only in that it is a single-stage model, trained end-to-end, in which lesion segmentation (FCN 2) is conditioned on the high-dimensional pre-classifier representation of the liver (FCN 1) rather than on the liver classifier outputs. This configuration allows FCN 2 to focus on the liver when performing lesion segmentation and to ignore lesions far from the liver. We found that using a single FCN to segment the lesions and the liver simultaneously is less effective, perhaps because it does not model the dependence of the lesion segmentation on that of the liver. In addition, training FCN 1 and FCN 2 end-to-end allows FCN 1 to learn a representation amenable to lesion segmentation, boosting the performance of FCN 2. Indeed,  found that an FCN may act as an effective learned pre-processor for another FCN.
The proposed model performs end-to-end joint liver and lesion segmentation in CT quickly, without any need for pre-processing of input images or complicated post-processing of the outputs. Segmentation performance could be improved by extending the model to process whole CT volumes rather than individual slices. The proposed model's simplicity makes it a good base model for architectural research toward improving liver and liver lesion segmentation.
We thank An Tang and Gabriel Chartrand for preparing some of the data used in the LiTS challenge.
-  EA Eisenhauer, P Therasse, and J Bogaerts et al., “New response evaluation criteria in solid tumours: revised recist guideline (version 1.1),” European journal of cancer, vol. 45, no. 2, pp. 228–247, 2009.
-  J Chapiro, R Duran, M Lin, R Schernthaner, Z Wang, B Gorodetski, and JF Geschwind, “Identifying staging markers for hepatocellular carcinoma before transarterial chemoembolization: comparison of three-dimensional quantitative versus non–three-dimensional imaging markers,” Radiology, vol. 275, no. 2, pp. 438–447, 2014.
-  S Ioffe and C Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.
-  M Drozdzal, E Vorontsov, G Chartrand, S Kadoury, and C Pal, “The importance of skip connections in biomedical image segmentation,” in International Workshop on Large-Scale Annotation of Biomedical Data and Expert Label Synthesis. Springer, 2016, pp. 179–187.
-  PF Christ, MEA Elshaer, and F Ettlinger et al., “Automatic liver and lesion segmentation in ct using cascaded fully convolutional neural networks and 3d conditional random fields,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2016, pp. 415–423.
-  F Milletari, N Navab, and SA Ahmadi, “V-net: Fully convolutional neural networks for volumetric medical image segmentation,” in 3D Vision (3DV), 2016 Fourth International Conference on. IEEE, 2016, pp. 565–571.
-  RPK Poudel, P Lamata, and G Montana, “Recurrent fully convolutional neural networks for multi-slice mri cardiac segmentation,” arXiv preprint arXiv:1608.03974, 2016.
-  H Chen, X Qi, JZ Cheng, and PA Heng, “Deep contextual networks for neuronal structure segmentation,” in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI Press, 2016, pp. 1167–1173.
-  J Long, E Shelhamer, and T Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
-  B Hariharan, P Arbeláez, R Girshick, and J Malik, “Hypercolumns for object segmentation and fine-grained localization,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 447–456.
-  O Ronneberger, P Fischer, and T Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
-  T Tieleman and G Hinton, “Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude,” COURSERA: Neural Networks for Machine Learning, 2012.
-  X Li, H Chen, X Qi, Q Dou, CW Fu, and PA Heng, “H-denseunet: Hybrid densely connected unet for liver and liver tumor segmentation from ct volumes,” arXiv preprint arXiv:1709.07330, 2017.
-  G Chlebus, H Meine, JH Moltz, and A Schenk, “Neural network-based automatic liver tumor segmentation with random forest-based candidate filtering,” arXiv preprint arXiv:1706.00842, 2017.
-  M Drozdzal, G Chartrand, E Vorontsov, L Di Jorio, A Tang, A Romero, Y Bengio, C Pal, and S Kadoury, “Learning normalized inputs for iterative estimation in medical image segmentation,” arXiv preprint arXiv:1702.05174, 2017.