1 Introduction
Accurate lung boundaries provide valuable image-based information such as total lung volume or shape irregularities, and they are also an important prerequisite for developing computer-aided diagnosis (CAD) systems. However, automated segmentation of lung fields is challenging because shape and size vary widely across chest radiographs.
Many methods for automatic detection of lung fields have been proposed over the past decade [1, 3, 10, 12]. The early segmentation methods can be partitioned into rule-based, pixel classification-based, deformable model-based, and hybrid methods [3]. Recently, deep neural network-based approaches [10, 12]^{1} have been proposed, following the success of deep learning in various computer vision tasks including object classification [8], localization [13], and segmentation [2, 11].

^{1} In [10], the authors propose a hybrid model that combines distance regularized level sets with a deep learning model for lung segmentation. This model shows high overlap scores, but it requires good initial guesses. Therefore, we exclude it from our comparison.

For semantic segmentation, the encoder-decoder architecture is commonly used [11]. In this architecture, the encoder is a typical convolutional neural network (CNN), while the decoder consists of transposed convolutions and upsampling operations. The role of the decoder is to restore the abstracted feature map by learning how to densify the sparse activations. The final output of the decoder is a probability map of the same size as the ground-truth masks, and a pixel-wise cross-entropy loss is employed for training. The encoder-decoder architecture has also shown promising performance in various medical imaging problems [12, 14]. For example, U-Net [14], a variant of the encoder-decoder architecture, shows impressive results on the segmentation of neuronal structures in electron microscopic stacks. For the task of lung segmentation, the authors of [12] present a U-Net-based CNN architecture for automated segmentation of anatomical organs (e.g., lungs, clavicles, and heart) in chest radiographs. They also propose a modified loss function to deal with the multi-class segmentation problem.
Another successful approach for semantic segmentation is to replace some convolutional layers with atrous convolutional layers [2]. Atrous convolution is known to enlarge the global receptive field of a CNN effectively [9], so larger context information can be used efficiently to predict pixel-wise labels.
In this paper, we introduce an accurate lung segmentation model for chest radiographs based on a deep CNN with atrous convolutions. The proposed model has a deep-and-thin architecture with far fewer parameters than other CNN-based lung segmentation models. To improve it further, we propose a multi-stage training strategy, network-wise training, in which the current-stage network is fed with both the input images and the outputs of the previous-stage network. We show that this strategy reduces falsely predicted labels (i.e., false positives and false negatives) and produces smooth boundaries of the segmented lung fields.
We evaluate the proposed method on a common benchmark dataset, the Japanese Society of Radiological Technology (JSRT) dataset [15], and achieve state-of-the-art results under four popular segmentation metrics: the Jaccard similarity coefficient, Dice's coefficient, average contour distance, and average surface distance. To investigate the generalization capability of our method, we also test on another dataset, the Montgomery County (MC) dataset [6]. Performance on this dataset is comparable in terms of mean values but has high variance, since there is some degree of shift between the training (JSRT) and test (MC) distributions.
2 Methods
2.1 Lung Segmentation with Atrous Convolutions
We present a deep-and-thin CNN architecture based on residual learning [5], whose skip connections mitigate the vanishing gradient problem. Dense prediction problems require a large context to predict the class labels of pixels. A simple way to obtain a larger context is to increase the global receptive field of the network by stacking more convolutional layers or by using downsampling operations (e.g., pooling or strided convolution) [9].

Atrous convolution is known to be useful for enlarging the field of view (i.e., the receptive field) of filters. This enlargement is particularly effective for segmentation, since predicting the class label at a location requires the context around that location [2]. Atrous convolution places 'holes' between the weights of a filter so that a larger field is involved in computing activations. Given a filter $w_k$ for $k = 1, \dots, K$ and the input $x_i$ at location $i$, atrous convolution with rate $r$ computes the output $y_i$ as follows:

$$y_i = \sum_{k=1}^{K} x_{i + r \cdot k}\, w_k \qquad (1)$$

Note that if $r = 1$, Eq. 1 reduces to the standard convolution operation. Therefore, the global receptive field of the network can be controlled via the rate $r$ while the number of weights stays the same.
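As a rough illustration of the atrous convolution described above, the following 1-D NumPy sketch assumes the form used in [2], where each output sums the inputs sampled `rate` positions apart. The function name `atrous_conv1d`, the 0-based filter index, and the "valid" boundary handling are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def atrous_conv1d(x, w, rate=1):
    """1-D atrous convolution: y[i] = sum_k x[i + rate * k] * w[k].

    Only positions where the dilated filter fits entirely inside x are
    computed (no padding). The filter index k is 0-based here.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    span = rate * (len(w) - 1)       # distance between first and last tap
    n_out = len(x) - span
    if n_out <= 0:
        raise ValueError("input is shorter than the dilated filter")
    # Slice with step `rate` to pick the K taps for each output position.
    return np.array([np.dot(x[i:i + span + 1:rate], w) for i in range(n_out)])

def field_of_view(kernel_size, rate):
    """Number of input positions covered by one dilated filter."""
    return rate * (kernel_size - 1) + 1

x = np.arange(8)
w = [1.0, 1.0, 1.0]
print(atrous_conv1d(x, w, rate=1))   # standard convolution (rate 1)
print(atrous_conv1d(x, w, rate=2))   # same 3 weights, wider field of view
print(field_of_view(3, 1), field_of_view(3, 2))
```

With the same three weights, rate 2 covers five input positions instead of three, which is the sense in which the receptive field grows without adding parameters.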
Fig. 1 shows the architecture of our network for lung segmentation. It consists of 3 convolutional layers and 6 residual blocks, i.e., 15 convolutional layers in total. We employ atrous convolutions in the last two residual blocks. Every convolutional layer is followed by a batch normalization layer. The global stride of our network is 4: the 2 convolutional layers at the beginning of particular residual blocks (the first layer in each red block in Fig. 1) are 2-strided convolutions. To calculate the pixel-wise cross-entropy loss against the ground-truth mask, we upsample the network outputs by bilinear interpolation.

The advantage of the proposed deep-and-thin architecture is that it has far fewer model parameters than other CNN-based lung segmentation models. For example, our model has 120,672 weights, about 26 times fewer than the 3,140,771 weights of an encoder-decoder network such as U-Net [12].
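The figures quoted above can be cross-checked with a few lines of arithmetic. This is only a sanity check of the stated numbers: the value of two convolutions per residual block is inferred from the stated total of 15 layers, and the 256×256 input size is the one reported in the experiments section.

```python
# Layer count: 3 plain convolutions plus 6 residual blocks.
n_plain_convs = 3
n_residual_blocks = 6
convs_per_block = 2          # inferred from the stated total of 15 layers
total_convs = n_plain_convs + n_residual_blocks * convs_per_block

# Parameter counts quoted in the text.
ours = 120_672
unet = 3_140_771
ratio = unet / ours

# Two stride-2 convolutions give a global stride of 4.
global_stride = 2 * 2
input_size = 256             # training resolution used in the experiments
feature_size = input_size // global_stride

print(total_convs)           # 15
print(round(ratio, 2))       # 26.03
print(feature_size)          # 64, upsampled by a factor of 4 for the loss
```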
2.2 Network-wise Training of CNN
In general, a CNN with atrous convolutions and bilinear interpolation has some limitations. First, it may produce small false positive or false negative areas, mainly because the pixel-wise cross-entropy loss treats every pixel independently. Second, it outputs blurry object boundaries, which is inevitable when bilinear interpolation is used to upsample the downsampled feature maps. To overcome these issues, post-processing via conditional random fields [7] is widely used to smooth such noisy segmentation maps [2].
We propose another strategy, network-wise training, to refine the segmentation results. It is designed as a repeated training pipeline in which the output of the previous-stage model serves as an input (see Fig. 1). At the first stage (stage 1), a network is trained using only the input chest radiographs. After this training, both the input chest radiographs and the outputs of the trained stage 1 model are fed into the second network. Specifically, each input chest radiograph and the corresponding output of the previous-stage network are concatenated along the channel dimension. Starting from relatively coarse segmentation outputs, the network can focus more on the details and learn accurate boundaries of the lung fields. This procedure is iterated until the validation performance saturates. Note that this strategy can be considered iterative cascading, an extended version of the cascaded network [4].
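The network-wise training loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `train_fn` and `val_score_fn` are hypothetical callables supplied by the user, the tolerance `tol` and the cap `max_stages` are made-up stopping parameters, and images are assumed to be arrays of shape (N, H, W, 1).

```python
import numpy as np

def build_stage_input(images, prev_probs=None):
    """Concatenate radiographs with the previous stage's output map.

    images:     (N, H, W, 1) input chest radiographs
    prev_probs: (N, H, W, 1) probability maps from the previous stage,
                or None at stage 1 (the image alone is used).
    """
    if prev_probs is None:
        return images
    return np.concatenate([images, prev_probs], axis=-1)  # channel dimension

def network_wise_training(images, masks, train_fn, val_score_fn,
                          max_stages=5, tol=1e-3):
    """Train one network per stage until the score stops improving.

    train_fn(x, masks) -> predict, where predict(x) returns (N, H, W, 1)
    val_score_fn(probs, masks) -> float, higher is better
    Both callables are hypothetical placeholders for a real training
    routine and a real validation metric.
    """
    models, prev_probs, best = [], None, -np.inf
    for _ in range(max_stages):
        x = build_stage_input(images, prev_probs)
        predict = train_fn(x, masks)          # train this stage's network
        probs = predict(x)
        score = val_score_fn(probs, masks)
        if score - best < tol:                # performance has saturated
            break
        models.append(predict)
        best, prev_probs = score, probs       # feed outputs to next stage
    return models
```

For simplicity the sketch scores each stage on the same data it was trained on; in practice a held-out validation split would be used for the stopping decision.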
3 Computational Experiments
We use a common benchmark dataset, the Japanese Society of Radiological Technology (JSRT) dataset [15], to evaluate the lung segmentation performance of our model. The JSRT database contains 247 posterior-anterior (PA) chest radiographs, of which 154 have lung nodules and 93 have none. The ground-truth lung masks are available in the Segmentation in Chest Radiographs (SCR) database [3].
Following previous practice in the literature, the JSRT dataset is split into two folds: one contains the 124 odd-numbered chest radiographs and the other contains the 123 even-numbered ones. One fold is used for training^{2} and the other for testing, and vice versa. Final performance is computed by averaging the results of both cases. All training images are resized to 256×256, as in the literature. The network is trained via stochastic gradient descent with momentum 0.9. The learning rate is initially set to 0.1 and is decreased to 0.01 after 70 epochs of training.

^{2} After the hyperparameter search with a randomly selected 30% of the training data, the network is retrained with the entire training data.
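The two-fold protocol and the learning rate schedule above can be written down compactly. The sketch below makes two assumptions that the text does not spell out: the radiographs are identified by sequential numbers 1 to 247, and epochs are counted from 0, so the first 70 epochs (0 to 69) use the initial rate. The function names are chosen for this example only.

```python
def jsrt_two_fold_split(n_images=247):
    """Split image numbers 1..n_images into odd and even folds."""
    ids = range(1, n_images + 1)
    odd = [i for i in ids if i % 2 == 1]
    even = [i for i in ids if i % 2 == 0]
    return odd, even

def learning_rate(epoch, initial=0.1, decayed=0.01, decay_epoch=70):
    """Step schedule: `initial` for the first 70 epochs, then `decayed`."""
    return initial if epoch < decay_epoch else decayed

odd, even = jsrt_two_fold_split()
print(len(odd), len(even))                 # 124 123

# Each fold serves once as the training set and once as the test set;
# the reported numbers are the average over both runs.
for train_ids, test_ids in [(odd, even), (even, odd)]:
    print(len(train_ids), "train /", len(test_ids), "test")

print(learning_rate(0), learning_rate(69), learning_rate(70))
```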
We use the Montgomery County (MC) dataset [6] as another test set to investigate the generalization capability of our model. The MC dataset contains PA chest radiographs collected from the National Library of Medicine, National Institutes of Health, Bethesda, MD, USA. It consists of 80 normal cases and 58 abnormal cases with manifestations of tuberculosis. Segmentation performance on this dataset is of interest because its characteristics differ from those of the training set (JSRT): image acquisition equipment, the diseases present, the nationality of patients, etc.
3.1 Performance Metrics
We use four metrics that are common in the literature: the Jaccard similarity coefficient (JSC), Dice's coefficient (DC), average contour distance (ACD), and average surface distance (ASD)^{3}. JSC and DC are similar in that they consider only the numbers of true positives, false positives, and false negatives. They therefore ignore where the errors are located. ACD and ASD, on the other hand, are distance-based metrics: they penalize a pixel predicted as lung boundary when its minimum distance to the ground-truth boundary is large. Performance under these metrics may therefore vary even when JSC and DC are almost the same.

^{3} Average surface distance is also known as symmetric mean absolute surface distance [12].
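Let $s_i$, $i = 1, \dots, n_S$, and $g_j$, $j = 1, \dots, n_G$, be the pixels on the segmented boundary $S$ and the ground-truth boundary $G$, respectively. The minimum distance of $s_i$ on $S$ to $G$ is defined as $d(s_i, G) = \min_j \lVert s_i - g_j \rVert$. Then, ACD and ASD are computed as follows:

$$\begin{aligned}
\mathrm{ACD}(S,G) &= \frac{1}{2}\left( \frac{\sum_i d(s_i, G)}{n_S} + \frac{\sum_j d(g_j, S)}{n_G} \right) \\
\mathrm{ASD}(S,G) &= \frac{1}{n_S + n_G}\left( \sum_i d(s_i, G) + \sum_j d(g_j, S) \right)
\end{aligned} \qquad (2)$$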
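To make the four metrics concrete, here is a small NumPy sketch. It assumes the usual definitions: JSC and DC computed from binary masks, and ACD and ASD computed as symmetric averages of minimum boundary-to-boundary distances. Boundary pixels are passed in as coordinate arrays, since boundary extraction is outside the scope of this example. The function names and the `spacing_mm` parameter (pixel size used to convert distances to millimetres) are assumptions of this example.

```python
import numpy as np

def jaccard(pred, gt):
    """JSC = TP / (TP + FP + FN) for binary masks."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    return np.logical_and(pred, gt).sum() / np.logical_or(pred, gt).sum()

def dice(pred, gt):
    """DC = 2 TP / (2 TP + FP + FN) for binary masks."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    return 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum())

def min_distances(a, b, spacing_mm=1.0):
    """For each point in a (n, 2), distance to the closest point in b (m, 2)."""
    diff = a[:, None, :] - b[None, :, :]              # (n, m, 2) via broadcasting
    return spacing_mm * np.sqrt((diff ** 2).sum(-1)).min(axis=1)

def average_contour_distance(s, g, spacing_mm=1.0):
    """Mean of the two directed average distances."""
    d_sg = min_distances(s, g, spacing_mm)
    d_gs = min_distances(g, s, spacing_mm)
    return 0.5 * (d_sg.mean() + d_gs.mean())

def average_surface_distance(s, g, spacing_mm=1.0):
    """All minimum distances pooled and divided by n_S + n_G."""
    d_sg = min_distances(s, g, spacing_mm)
    d_gs = min_distances(g, s, spacing_mm)
    return (d_sg.sum() + d_gs.sum()) / (len(s) + len(g))

pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [0, 0]])
print(jaccard(pred, gt), dice(pred, gt))              # 0.5 and 2/3

s = np.array([[0.0, 0.0], [0.0, 1.0]])                # segmented boundary
g = np.array([[0.0, 0.0]])                            # ground-truth boundary
print(average_contour_distance(s, g))                 # 0.25
print(average_surface_distance(s, g))                 # 1/3
```

The toy boundaries show why the two distance metrics can differ: ACD averages each direction separately before combining, while ASD pools all points, so the boundary with more points carries more weight.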
| Dataset | Method | JSC | DC | ACD (mm) | ASD (mm) |
|---|---|---|---|---|---|
| JSRT | Human observer [3] | 0.946 ± 0.018 | - | 1.64 ± 0.69 | - |
| JSRT | Hybrid voting [3] | 0.949 ± 0.020 | - | 1.62 ± 0.66 | - |
| JSRT | Candemir [1] | 0.954 ± 0.015 | 0.967 ± 0.008 | 1.321 ± 0.316 | - |
| JSRT | InvertedNet [12] | 0.950 | 0.973 | - | 0.69 |
| JSRT | Proposed (Stage 1) | 0.950 ± 0.023 | 0.974 ± 0.012 | 1.347 ± 0.919 | 0.724 ± 0.163 |
| JSRT | Proposed (Stage 2) | 0.954 ± 0.020 | 0.976 ± 0.011 | 1.295 ± 0.846 | 0.690 ± 0.151 |
| JSRT | Proposed (Stage 3) | 0.956 ± 0.018 | 0.977 ± 0.010 | 1.283 ± 0.814 | 0.683 ± 0.145 |
| JSRT | Proposed w/ aug (Stage 3) | 0.961 ± 0.015 | 0.980 ± 0.008 | 1.237 ± 0.702 | 0.675 ± 0.122 |
| MC | Candemir [1] | 0.941 ± 0.034 | 0.960 ± 0.018 | 1.599 ± 0.742 | - |
| MC | Proposed w/ aug (Stage 3) | 0.931 ± 0.049 | 0.964 ± 0.028 | 2.186 ± 1.795 | 0.915 ± 0.258 |

Table 1: Mean and standard deviation of segmentation performance on the JSRT and MC datasets. A dash marks a value that is not reported.
3.2 Quantitative and Qualitative Results
Table 1 summarizes the segmentation performance of our model compared to previous methods^{4}. First, we evaluate the models at stages 1 to 3, which are trained without any preprocessing such as histogram equalization and without data augmentation, in order to exclude other factors that may affect performance. The results show that segmentation performance improves continuously through network-wise training, and that the stage 3 model outperforms the other methods.

^{4} Note that the JSC and DC numbers in Candemir [1] are incorrect, since DC should be 2JSC/(1+JSC).
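The relation between DC and JSC mentioned in the footnote follows directly from the standard definitions in terms of true positives (TP), false positives (FP), and false negatives (FN). The short derivation below is a sketch under those standard definitions; note that the identity is exact for a single image and holds only approximately for means over a dataset.

```latex
\begin{align*}
\mathrm{JSC} &= \frac{TP}{TP + FP + FN}, \qquad
\mathrm{DC} = \frac{2\,TP}{2\,TP + FP + FN} \\[4pt]
\mathrm{DC} &= \frac{2\,TP}{TP + (TP + FP + FN)}
  = \frac{2\,\dfrac{TP}{TP + FP + FN}}{\dfrac{TP}{TP + FP + FN} + 1}
  = \frac{2\,\mathrm{JSC}}{1 + \mathrm{JSC}} \\[4pt]
\mathrm{JSC} = 0.954 \;&\Rightarrow\;
\mathrm{DC} = \frac{2 \times 0.954}{1.954} \approx 0.976
\end{align*}
```

A JSC of 0.954 therefore corresponds to a DC of about 0.976, which matches the stage 2 row of Table 1 and differs from the 0.967 reported in [1].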
The left side of Fig. 2 shows the effects of the proposed network-wise training: reduction of false positives and false negatives, and boundary smoothing. The top row shows that the lung boundaries from the stage 3 model are much smoother than those from the stage 1 model. The second and third rows show that false positives and false negatives are suppressed as the stages progress. The performance plot on the right side of Fig. 2 shows how performance changes across stages. Performance saturates at stage 3, so we report the results of the stage 3 model.
In addition, we investigate the effect of data augmentation. For this, we randomly adjust the brightness and contrast of the pixel values so that the network becomes invariant to pixel value perturbations^{5}. As shown in Table 1, the stage 3 model trained with data augmentation gives better segmentation performance on every metric.

^{5} Cropping, horizontal flipping, and rotation were not effective. This is because a lung segmentation network does not need to be invariant to such transformations.
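A brightness and contrast perturbation of the kind described above could look like the following sketch. The paper does not give the perturbation ranges, so `max_delta` and `contrast_range` are placeholder values chosen for illustration, and the image is assumed to be scaled to [0, 1]. Because only pixel intensities change, the ground-truth mask is left untouched.

```python
import numpy as np

def random_brightness_contrast(image, rng, max_delta=0.1,
                               contrast_range=(0.8, 1.2)):
    """Randomly shift brightness and scale contrast of an image in [0, 1].

    max_delta and contrast_range are illustrative values, not taken
    from the paper.
    """
    delta = rng.uniform(-max_delta, max_delta)     # brightness shift
    factor = rng.uniform(*contrast_range)          # contrast scale
    mean = image.mean()
    out = (image - mean) * factor + mean + delta   # scale around the mean
    return np.clip(out, 0.0, 1.0)                  # keep a valid intensity range

rng = np.random.default_rng(0)
image = rng.random((256, 256))                     # stand-in for a radiograph
augmented = random_brightness_contrast(image, rng)
print(augmented.shape, augmented.min() >= 0.0, augmented.max() <= 1.0)
```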
However, it should be noted that performance on the MC dataset is not as good as on JSRT. Mean performance is slightly lower than that of the hybrid approach in [1], and the standard deviations are much higher even though the model is trained with data augmentation. This means that our model, trained on JSRT, gives unstable segmentation results for some cases in MC, as shown in Fig. 3. The cause is a shift between the distributions of the training and test datasets, which calls for solving a domain adaptation problem.
Samples of segmented lung boundaries are visualized in Fig. 3. The left two columns show the two best results in terms of JSC, and the right two columns show the two worst, for the JSRT and MC datasets.
4 Conclusion
In this paper, we present an accurate lung segmentation model based on a CNN with atrous convolutions. We also propose a novel multi-stage training strategy, network-wise training, to refine the segmentation results. Computational experiments on the benchmark dataset, JSRT, show that the proposed architecture and network-wise training are very effective for obtaining an accurate segmentation model for lung fields. We also evaluate the trained model on the MC dataset, which leaves us the task of developing a model that is insensitive to domain shift.
References
 [1] Candemir, S., Jaeger, S., Palaniappan, K., Musco, J.P., Singh, R.K., Xue, Z., Karargyris, A., Antani, S., Thoma, G., McDonald, C.J.: Lung segmentation in chest radiographs using anatomical atlases with nonrigid registration. IEEE Transactions on Medical Imaging 33(2), 577–590 (2014)
 [2] Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. arXiv preprint arXiv:1606.00915 (2016)
 [3] van Ginneken, B., Stegmann, M.B., Loog, M.: Segmentation of anatomical structures in chest radiographs using supervised methods: a comparative study on a public database. Medical Image Analysis 10(1), 19–40 (2006)
 [4] Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., Pal, C., Jodoin, P.M., Larochelle, H.: Brain tumor segmentation with deep neural networks. Medical Image Analysis 35, 18–31 (2017)
 [5] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR. pp. 770–778 (2016)
 [6] Jaeger, S., et al.: Automatic tuberculosis screening using chest radiographs. IEEE Transactions on Medical Imaging 33(2), 233–245 (2014)
 [7] Krähenbühl, P., Koltun, V.: Efficient inference in fully connected CRFs with Gaussian edge potentials. In: NIPS. pp. 109–117 (2011)
 [8] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS. pp. 1097–1105 (2012)
 [9] Luo, W., Li, Y., Urtasun, R., Zemel, R.: Understanding the effective receptive field in deep convolutional neural networks. In: NIPS. pp. 4898–4906 (2016)
 [10] Ngo, T.A., Carneiro, G.: Lung segmentation in chest radiographs using distance regularized level set and deep-structured learning and inference. In: 2015 IEEE International Conference on Image Processing (ICIP). pp. 2140–2143 (2015)
 [11] Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: ICCV. pp. 1520–1528 (2015)
 [12] Novikov, A.A., Major, D., Lenis, D., Hladůvka, J., Wimmer, M., Bühler, K.: Fully convolutional architectures for multi-class segmentation in chest radiographs. arXiv preprint arXiv:1701.08816 (2017)
 [13] Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. In: NIPS. pp. 91–99 (2015)
 [14] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015, Part III. LNCS, vol. 9351, pp. 234–241. Springer International Publishing (2015)
 [15] Shiraishi, J., Katsuragawa, S., Ikezoe, J., Matsumoto, T., Kobayashi, T., Komatsu, K., Matsui, M., Fujita, H., Kodera, Y., Doi, K.: Development of a digital image database for chest radiographs with and without a lung nodule: Receiver operating characteristic analysis of radiologists' detection of pulmonary nodules. American Journal of Roentgenology 174(1), 71–74 (2000)