iW-Net: an automatic and minimalistic interactive lung nodule segmentation deep network. By Guilherme Aresta, Colin Jacobs, Teresa Araújo, António Cunha, Isabel Ramos, Bram van Ginneken and Aurélio Campilho
We propose iW-Net, a deep learning model that allows for both automatic and interactive segmentation of lung nodules in computed tomography images. iW-Net is composed of two blocks: the first provides an automatic segmentation and the second allows the user to correct it by analyzing two points that the user places on the nodule’s boundary. For this purpose, we propose a physics-inspired weight map that takes the user input into account and is used both as a feature map and in the system’s loss function. Our approach is extensively evaluated on the public LIDC-IDRI dataset, where we achieve a state-of-the-art performance of 0.55 intersection over union vs the 0.59 inter-observer agreement. We also show that iW-Net allows the segmentation of small nodules to be corrected, which is essential for proper patient referral decisions, and improves the segmentation of the challenging non-solid nodules. It may thus be an important tool for increasing the early diagnosis of lung cancer.
Lung cancer is the most fatal cancer type in both men and women. Early diagnosis of this pathology and proper medical follow-up increase the patients’ survival rate. In particular, annual screening of risk groups with low-dose chest computed tomography (LDCT) reduces lung cancer mortality by 20%. During screening, radiologists search for lung nodules by visually inspecting the LDCT volumes. Potential findings are then characterized in terms of dimension (axes length and volume), texture (solid, sub-solid and non-solid), spiculation, calcification and location. Patient follow-up is then decided according to a specific lung cancer screening guideline. The initial nodule dimensions and growth rate are two pivotal characteristics in major screening guidelines [3, 4, 5], and thus accurate 3D lung nodule segmentation is an important task during screening. However, accurate manual segmentation is highly time-consuming, which motivates the need for automatic lung nodule segmentation solutions. Furthermore, nodule segmentation is known to be a subjective task and specialists often disagree on their annotations. Consequently, interactive segmentation tools are of high interest in this clinical setting.
Over the past years, several automatic lung nodule segmentation methods have been proposed with the goal of automating lung cancer screening. Despite achieving acceptable performance, these methods are still limited because they either do not allow for user interaction, are slow, or require extensive user interaction (e.g. adjustment of several parameters) to achieve a satisfying result.
We propose an end-to-end deep learning scheme, iW-Net (interactive W-Net), that allows for both automatic and optional interactive 3D lung nodule segmentation, as suggested in Fig. 1. The network receives as input a cube of fixed dimensions whose centroid is indicated by the user, or by an automatic nodule detection framework, and proposes a corresponding segmentation. If the user is not satisfied, the segmentation can be corrected by using the end-points of a manually inserted stroke of the nodule’s diameter. For this purpose, we use a second segmentation network that integrates the 3D image of the nodule, the initial segmentation and the coordinates of the end-points. In particular, this paper shows that the end-points can be represented by a physics-inspired weight map that, when used as a feature map and as a loss function term, makes it possible to approach the inter-observer agreement in the public LIDC-IDRI dataset. Our approach allows a simple and fast segmentation correction when that information is available, without introducing a significant overhead in comparison to the non-guided version of the model.
Lung nodule segmentation has been an active research topic over the last decade. Segmentation methods usually take advantage of the natural characteristics of solid nodules, which commonly have high contrast with the lung parenchyma and spherical shapes. A common approach is to do voxel-wise segmentation by extracting intensity [7, 8] and shape-related features, namely from Hessian matrices [10], to obtain the final result. However, extending feature-design approaches to non-solid and sub-solid nodules is a hard and tedious process, due to the cloudy texture, irregular shape and reduced contrast with the parenchyma of non-solid nodules, and the diffuse boundaries of sub-solid nodules.
Because of this, convolutional neural networks (CNNs) have become the standard approach for medical image segmentation, since they significantly reduce the field knowledge required to work with these images and thus the need for manual feature design. For instance, Wang et al. proposed a multi-scale CNN that performs voxel-wise predictions of the abnormal tissue inside a cube containing a lung nodule. Each predicted voxel corresponds to the center of a fixed-dimension patch to be processed by the network, and thus predicting an entire segmentation requires the evaluation of a high number of patches. Furthermore, this model has an inherent lack of global context, since the network only evaluates patches, and thus the 3D reconstruction of the nodule may be affected. A common solution is to adapt 3D U-Net architectures, since they consider both local and global context. With this in mind, Wu et al. proposed a multi-task scheme for pulmonary nodule segmentation together with the prediction of the nodules’ expected malignancy, achieving state-of-the-art performance in both tasks. This malignancy prediction is performed by concatenating the features of the segmentation network’s bottleneck with a convolved version of the produced segmentation prediction and processing them via a set of fully-connected layers.
Despite the high performance of deep learning methods, their application in the medical field has been criticized due to 1) the inherent lack of explanations behind the decision and 2) the production of deterministic outputs, which ignores the existing inter-observer variability of the annotations and prevents the medical specialist from interacting with and changing the decisions of the system. With this in mind, Kohl et al. proposed to model the inter-observer variability by combining a conditional variational auto-encoder (cVAE) with a U-Net. The cVAE is used for drawing a set of feature maps sampled from the trained latent space representation. These features are then concatenated with the last feature maps of the U-Net, which are then convolved to produce the segmentation output. By varying the sampled set of features from the cVAE, this model is capable of producing different, yet plausible, nodule segmentations. However, the method of Kohl et al. does not allow the clinician to alter the segmentation, instead forcing the specialist to opt for the result closest to his/her expectations.
Recently, Wang et al. proposed a scribble-based approach to refine 2D and 3D segmentations resulting from a fully-convolutional neural network. First, the user selects a bounding box containing the anatomical structure to segment. For each unseen image, the top of a pre-trained segmentation model is trained to accommodate the foreground and background scribbles by minimizing, via an expectation-maximization (EM) approach, a loss function composed of two terms: 1) a pixel-wise weighted categorical cross-entropy term that prioritizes the inclusion of foreground and the removal of background scribbles, and 2) a pair-wise smoothness term that encourages the aggregation of neighboring pixels of similar intensity. Even though this scheme achieves state-of-the-art results on organ segmentation in MRI images, its application to lung nodule segmentation is limited due to the nature of the abnormalities. For instance, nodules are often attached to structures of similar intensity, such as the pleural wall and blood vessels, and thus the EM scheme may include these structures in the segmentation, requiring extra manual correction. Also, sub-solid and non-solid nodules do not have a clear boundary, which can further hinder the minimization of the smoothness term.
Bearing in mind the limitations of the existing approaches for lung nodule segmentation, we propose iW-Net, a simple deep learning approach that allows segmentations to be altered while requiring minimal user interaction. The model’s design and the respective experimental setup are described in Section II. Then, in Section III we show that our approach achieves state-of-the-art performance. Finally, Section IV draws the main conclusions from this study.
iW-Net allows lung nodule segmentations to be easily corrected according to the specialists’ perception. As depicted in Fig. 2, iW-Net first performs 1) an automatic 3D segmentation of lung nodules, predicted by the first block (i.e. U) of the network, followed by 2) an optional segmentation correction, performed by the second block via the analysis of the end-points of the user-introduced stroke of the diameter of the nodule. For this, we propose a pixel-wise weight map to guide the segmentation, as detailed in Section II-A. This weight map is then used as a feature map of iW-Net and in a loss function term to train an auto-encoder segmentation network, as described in Sections II-B and II-C.
Our weight map is inspired by the attraction field generated by point electric charges of opposite sign. Consider a sphere of undetermined radius, defined by its center in Cartesian coordinates, and its unit-normalized gradient field. The norm of the vectors of this field can be weighted as a function of the distance to the center of the sphere, with one parameter controlling the decay of the vectors’ magnitude and a second one making the field centripetal or centrifugal. Two such fields then define a vector field that moves from the center of one sphere to the center of the other. In our approach, the two centers correspond to the user-introduced points and the weight map is a 3D feature map indicating how valuable each voxel is for the segmentation. In terms of magnitude, the weight map has high intensity in the region between the two centers and low vector magnitude elsewhere, indicating to the network that the region between the two points is of high interest for the segmentation. Changing the decay parameter affects the strength of the interaction between the two points, as shown in Fig. 3. Namely, a lower value increases the focus on the central region but also increases its overall volume, whereas a high value leads to more spherical regions of interest surrounding the points. Note that if no points exist, the weight map is a zero-valued tensor with the same size as the input volume.
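The construction above can be illustrated with a minimal sketch. It assumes that each user point generates a radial field whose magnitude decays as 1 / d**gamma and that the two fields have opposite sign, as for two point charges; the exact weighting used by the authors is not given in this text. The names `weight_map`, `gamma` and `eps` are hypothetical.

```python
import math

def weight_map(shape, p_a, p_b, gamma=1.0, eps=1e-6):
    """Magnitude of the summed fields of two opposite 'charges' at p_a and p_b.

    shape: (depth, height, width) of the volume.
    p_a, p_b: (z, y, x) user points, or None when no input is given.
    gamma: assumed decay exponent of the field magnitude (1 / d**gamma).
    """
    depth, height, width = shape
    grid = [[[0.0] * width for _ in range(height)] for _ in range(depth)]
    if p_a is None or p_b is None:
        return grid  # no user points: zero-valued map, as stated in the text

    for z in range(depth):
        for y in range(height):
            for x in range(width):
                total = [0.0, 0.0, 0.0]
                # sign +1: field points away from p_a; sign -1: toward p_b.
                for sign, centre in ((1.0, p_a), (-1.0, p_b)):
                    diff = (z - centre[0], y - centre[1], x - centre[2])
                    dist = math.sqrt(sum(c * c for c in diff)) + eps
                    # unit vector (diff / dist) scaled by 1 / dist**gamma
                    scale = sign / dist ** (gamma + 1.0)
                    for i in range(3):
                        total[i] += scale * diff[i]
                grid[z][y][x] = math.sqrt(sum(c * c for c in total))
    return grid
```

With this assumed form, the two fields reinforce each other between the points and partially cancel elsewhere, which matches the qualitative behavior described in the text.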
The proposed nodule segmentation scheme is an adaptation of the 3D U-Net. As shown in Fig. 2, iW-Net is composed of two auto-encoders: the first outputs an initial segmentation, which is then used as an input for the second block to produce the corrected segmentation. Each of the auto-encoders has a reduced number of filters in the encoding and decoding parts in comparison to the 3D U-Net, resulting in fewer parameters to tune and thus easing the back-propagation process.
We include the proposed segmentation weight map by concatenating it to the initial feature maps of the encoding part of the second block of the model, since preliminary experiments showed a significant performance drop if the weight map was included in the upsampling part only. In fact, adding the weight map at the initial part of the segmentation correction block ensures that all weights of the model are affected by these external features. Due to the skip connections, the weight map is also included in the final segmentation layer, thus directly affecting the model’s output.
iW-Net predicts a 3D map of the probability of each voxel belonging to the nodule. We use a two-term loss function. The first term is based on the intersection over union (IoU) between the ground truth mask and the soft prediction mask, computed with the Hadamard product. The second term forces the network to take into account the manually introduced points by evaluating whether there are segmentation points in the defined region of interest, where a threshold parameter controls the extent of the region of interest. The two terms are combined with a weighting parameter that controls their relative importance.
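A minimal sketch of such a two-term loss is given below on flattened masks. The soft IoU term follows the usual element-wise formulation; the region-of-interest term and the additive combination weighted by `lam` are assumptions made for illustration, since the exact expressions are not available in this text. All function and parameter names are hypothetical.

```python
def soft_iou_loss(y_true, y_pred, eps=1e-7):
    """1 - soft IoU; the Hadamard product is the element-wise product t * p."""
    inter = sum(t * p for t, p in zip(y_true, y_pred))
    union = sum(t + p - t * p for t, p in zip(y_true, y_pred))
    return 1.0 - inter / (union + eps)

def roi_loss(weights, y_pred, threshold, eps=1e-7):
    """Assumed second term: penalize low predictions where weights > threshold."""
    roi = [1.0 if w > threshold else 0.0 for w in weights]
    covered = sum(r * p for r, p in zip(roi, y_pred))
    return 1.0 - covered / (sum(roi) + eps)

def total_loss(y_true, y_pred, weights, threshold, lam):
    """Assumed combination: IoU term plus lam times the region-of-interest term."""
    return soft_iou_loss(y_true, y_pred) + lam * roi_loss(weights, y_pred, threshold)
```

Under this assumed form, a large `lam` pushes the prediction toward the thresholded weight map, while a small `lam` keeps the annotated mask as the main target.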
iW-Net was developed using the LIDC-IDRI  dataset, which contains 1012 LDCT scans with variable slice thickness. In this dataset, nodules with diameter have voxel-wise annotations from up to 4 different expert radiologists and the corresponding inter-observer agreement level is indicative of how likely an abnormality is in fact a nodule. The dataset also contains a numeric description of several nodule characteristics. Namely, nodule texture indicates the opacity of the nodule, with 1 being a pure non-solid nodule and 5 a pure solid nodule. We considered the 888 scans used for the LUNA16 challenge  and studied 2284 nodules (some samples were discarded due to annotation inconsistencies, poor scan reconstruction or excessive slice thickness). From those, 1593, 1190 and 790 have agreement level , and , respectively. In our experiments, a nodule is considered non-solid if it has an average texture , solid if and sub-solid otherwise. For an agreement level , the dataset has 135 non-solid, 300 sub-solid and 1695 solid nodules.
All nodules were collected by patching a cube centered at the average center of mass of the specialists’ annotations and were then isotropically resized to a fixed number of voxels. The intensity of the volume image was linearly mapped from Hounsfield Units to a fixed range. Adam was used as the optimizer (learning rate 0.001) and the network was trained using a batch size of 8 samples.
The dataset was artificially augmented by performing random rotations, translations, flips and zooms. For each epoch, user input was simulated by selecting the two most distant points on the middle axial slice of the segmentation. All agreement levels were considered to account for the inter-observer variability and thus no segmentation combination was performed, i.e. the same nodule was paired with different viable ground-truths to train the model. Furthermore, iW-Net was evaluated via stratified 5-fold cross-validation with partition at scan level, and we used 20% of the training set for validation. All hyper-parameters were found via random search with 100 search steps. At each step, the hyper-parameters were sampled from a uniform distribution. Optimization was performed on the validation set of the first train-test split.
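The random search described above can be sketched as follows. The search ranges and the `evaluate` callback, which would train the model and return a validation score, are hypothetical placeholders, since the sampled ranges are not given in this text.

```python
import random

def random_search(evaluate, ranges, steps=100, seed=0):
    """Sample each hyper-parameter uniformly and keep the best-scoring set.

    evaluate: callable mapping a dict of hyper-parameters to a score
              (higher is better), e.g. a validation IoU.
    ranges:   dict name -> (low, high) bounds of the uniform distribution.
    """
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(steps):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        score = evaluate(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```

For example, `random_search(validation_iou, {"gamma": (0.0, 1.0)})` would try 100 values of a hypothetical `gamma` and return the one with the highest score.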
iW-Net was trained in two steps. The first block was initially trained separately until the validation loss stopped improving for 3 epochs. The weights were then frozen and the entire iW-Net was trained using the output of the first segmentation block and the artificially generated user interaction until the loss stopped improving for 5 epochs. Since each nodule can have multiple segmentations (one per expert), iW-Net had to perform different corrections according to the expert’s annotation and the respective simulated user input. Experiments were performed on an Intel Core i7-5960X, 32 GB RAM, GTX1080 desktop with Python 3.5 and Keras 2.2 (https://github.com/gmaresta/iW-Net).
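The stopping rule used in both training steps (stop once the loss has not improved for a given number of epochs) can be sketched as below. This is an illustrative helper with a hypothetical name, not the authors’ training code.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training stops, or None if it never does.

    Training stops once the loss has failed to improve on its best value
    for `patience` consecutive epochs.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stale = loss, 0  # improvement: reset the counter
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None
```

With `patience=3` the helper mirrors the rule stated for the first block, and with `patience=5` the rule stated for the full network.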
iW-Net produces pixel-wise predictions, which are thresholded at 0.5 for the model’s evaluation. The predictions are evaluated in terms of 3D intersection over union (IoU) and average surface distance (ASD), both computed between the expert’s annotation and the model’s prediction. The ASD is defined from the number of surface elements of each mask, the Euclidean distance (mm) between surface elements and the minimum operation.
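A minimal sketch of both metrics follows, with masks represented as sets of (z, y, x) voxel coordinates. The symmetric form of the ASD used here (sum of minimum distances in both directions divided by the total number of surface elements) is one common definition and is an assumption; the surface definition via 6-connectivity and the `spacing` parameter are also illustrative choices.

```python
import math

NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def iou(a, b):
    """Intersection over union of two voxel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def surface(mask):
    """Voxels of the mask with at least one 6-connected neighbour outside it."""
    return {v for v in mask
            if any((v[0] + dz, v[1] + dy, v[2] + dx) not in mask
                   for dz, dy, dx in NEIGHBOURS)}

def asd(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric average surface distance; both masks must be non-empty."""
    sa, sb = surface(a), surface(b)

    def dist(p, q):
        return math.sqrt(sum(((pi - qi) * s) ** 2 for pi, qi, s in zip(p, q, spacing)))

    d_ab = [min(dist(p, q) for q in sb) for p in sa]
    d_ba = [min(dist(q, p) for p in sa) for q in sb]
    return (sum(d_ab) + sum(d_ba)) / (len(sa) + len(sb))
```

Passing the voxel size in millimetres as `spacing` gives the distance in mm, as in the text.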
For each nodule, the average inter-observer IoU performance is computed by iteratively considering one expert’s annotation as the ground-truth and the remaining as predictions and then averaging the results. For instance, the inter-observer IoU performance in an agreement level 4 nodule is the average of IoU results. For better comparison with the observers, iW-Net is only evaluated in nodules with agreement level . The segmentation performance is also analyzed in terms of nodule radius and texture. We consider the radius of each nodule as the average of the equivalent spherical radius of all the annotators.
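The inter-observer procedure and the equivalent spherical radius can be sketched as follows, again with masks as sets of voxel coordinates. Taking every ordered pair of annotations is how the iteration over ground-truths is read here; the function names and the `voxel_volume` parameter are hypothetical.

```python
import math
from itertools import permutations

def iou(a, b):
    """Intersection over union of two voxel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def inter_observer_iou(annotations):
    """Average IoU over all ordered (ground-truth, prediction) pairs.

    Requires at least two annotations of the same nodule.
    """
    pairs = list(permutations(annotations, 2))
    return sum(iou(gt, pred) for gt, pred in pairs) / len(pairs)

def equivalent_radius(n_voxels, voxel_volume=1.0):
    """Radius of the sphere whose volume equals that of the mask."""
    return (3.0 * n_voxels * voxel_volume / (4.0 * math.pi)) ** (1.0 / 3.0)
```

The radius of a nodule would then be the mean of `equivalent_radius` over its annotations.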
We study the performance of the non-guided segmentation unit (the first block of iW-Net), using as comparison terms the average inter-observer agreement and the segmentation produced by the 3D U-Net. This U-Net is trained and tested on the aforementioned dataset. Due to computational constraints, its batch size is reduced to 2. Evaluation is performed according to Eq. 9, which takes into account the experts’ agreement level for each nodule. Since a nodule can have multiple segmentations, the model is not expected to outperform the inter-observer agreement.
The goal of this experiment is to evaluate the impact of the user’s input on the segmentation of iW-Net. For that, we artificially generate user inputs on the axial plane of the slice that contains the nodule’s centroid. As in the training procedure, the two most distant points in the ground-truth boundary of that slice are selected.
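The simulated user input can be sketched as a brute-force search for the two most distant boundary points of a slice. The function name is hypothetical and the quadratic search is an illustrative choice that is adequate for the small boundaries of a single slice.

```python
from itertools import combinations

def farthest_pair(boundary_points):
    """Return the two boundary points with the largest Euclidean distance.

    boundary_points: iterable of coordinate tuples, e.g. (row, col).
    Squared distances are compared, which preserves the ordering.
    """
    return max(combinations(boundary_points, 2),
               key=lambda pair: sum((a - b) ** 2 for a, b in zip(*pair)))
```

The two returned points play the role of the end-points of the stroke drawn by the user.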
The performance of the full iW-Net is compared with the output of the first block in terms of IoU and ASD for different nodule sizes and textures. As in a real-case scenario, we consider that the experts can keep either the initial or the corrected segmentation, according to which better fits their needs. The evaluation is thus performed via Eq. 10. This principle is also applied to the ASD metric, with the IoU as the decision criterion, i.e., the same nodules are considered.
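The selection rule described above can be sketched as follows: for each nodule, the segmentation with the higher IoU is kept and its ASD is reported alongside. The function name and the tuple layout are hypothetical, and keeping the initial segmentation on ties is an assumption.

```python
def best_of_two(initial, corrected):
    """Average IoU and ASD when the better of two segmentations is kept.

    initial, corrected: lists of (iou, asd) tuples, one entry per nodule.
    The IoU decides which segmentation is kept; the ASD of that same
    segmentation is the one that is averaged.
    """
    kept = [c if c[0] > i[0] else i for i, c in zip(initial, corrected)]
    n = len(kept)
    return sum(k[0] for k in kept) / n, sum(k[1] for k in kept) / n
```

Note that a corrected segmentation with a lower ASD but a lower IoU is still discarded, since the IoU is the only decision criterion.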
The best performing set of parameters achieves an average validation IoU of 0.59 in the first train/test split. Intuitively, a decay parameter near 0.5 (recall Fig. 2(b)) creates a weight map that prioritizes the inclusion of the points and the respective connection region without overspreading (Fig. 2(a)) or over-emphasizing the points (Fig. 2(c)). Likewise, the threshold found allows the binarized weight map to have an ellipsoidal structure, following the approximate shape of most of the nodules. Finally, the loss weighting parameter balances the contribution of the initial manual segmentation and the added weight map during model training. In the limit where the weight map term dominates, the network would be trying to approximate the nodule segmentation to an ellipsoid. On the other hand, the value found ensures that the manual segmentation is the prioritized target during training and that the weight map (see Fig. 4) is used for local corrections.
iW-Net without user interaction outperforms the baseline 3D U-Net. As shown in Table I, the nodule segmentation performance is relatively increased by approximately 39% while the number of parameters is reduced by a factor of 6.9. In fact, the reduction of the size of the network contributed to the disparity between the referred IoUs, since it allowed a larger batch size during training and thus helped the error’s back-propagation via a better batch normalization.
As expected, iW-Net’s prediction without user interaction tends to be better for larger nodules (see Fig. 5). Indeed, since most segmentation errors occur near the nodules’ boundary, smaller nodules, which have a higher surface-area-to-volume ratio, should be more challenging. Interestingly, the inter-observer agreement follows the same tendency, indicating that smaller nodules are particularly difficult to segment.
|Method|IoU|Number of parameters|
|3D U-Net||19 080 001|
|iW-Net first block||1 592 093|
The proposed simple user interaction approach improves the baseline segmentation in more than 75% of the cases. Fig. 4 depicts examples where iW-Net significantly alters the 3D shape of the segmentation just by the introduction of two points, being capable of correcting, at least partially, poor segmentations (middle) as well as changing the orientation of the proposed region of interest (right). In fact, 44% of the user-introduced points are inside the new segmentations, further showing the tendency of iW-Net to alter the shape of the segmentation. Also, as detailed in Table II and Fig. 6, iW-Net especially enables the delineation correction of the challenging non-solid nodules.
Our proposed approach also has promising results for computer-aided lung cancer screening. As depicted in Fig. 7, the radius range is where iW-Net (user supervised) most improves the quality of the nodules’ segmentation. Importantly, several international lung cancer screening guidelines, such as LUNG-RADS , point this dimension range as essential to classify a nodule as either benign or malignant.
iW-Net with the simulated user interaction improves over the baseline for nodules of different dimensions and textures, as summarized in Figs. 4, 6 and 7. However, the achieved IoU is still, on average, 0.04 lower than the inter-observer agreement. A possible reason for this is that, due to the variability of the ground-truth in the data (i.e. several segmentations for the same nodule), the network is likely to learn an average segmentation in order to minimize the loss over the redundant training images. Also, during the segmentation correction we always select the two farthest points in the nodule boundary. This is a challenging scenario, since there is no guarantee that the selected points lie in the direction in which the segmentation needs to be corrected. Instead, we assume that providing an estimate of the nodule’s largest axis is sufficient to improve the segmentation.
Despite always using the two farthest points to correct the segmentation, iW-Net improves the baseline segmentation’s ASD by 24% (Fig. 8). Namely, the baseline’s average ASD is 1.09 and that of the corrected segmentation is 0.827, meaning that iW-Net has a segmentation error that is on average less than 1 voxel. Also, similarly to the IoU’s behavior, the simple user interaction significantly improves the quality of the nodules’ segmentation in non-solid and sub-solid abnormalities.
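As a quick check, the reported relative ASD improvement follows directly from the two averages quoted above:

```latex
\frac{1.09 - 0.827}{1.09} = \frac{0.263}{1.09} \approx 0.241
```

that is, a reduction of roughly 24% with respect to the baseline.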
iW-Net achieves a performance on par with the inter-observer agreement, similarly to other state-of-the-art approaches. Note that making a direct comparison between the approaches is non-trivial since 1) there is a great variation in the size of the test set, the type and size of the nodules used, and the minimum inter-observer agreement; 2) different methods use different voxel scales, and the inherent re-sampling affects the shape of the ground-truth; and 3) there are different ways of combining the ground-truth annotations from the different observers (using all, the average or the median, for instance) to produce the final evaluation mask. Nevertheless, for reference, Table III shows the achieved IoUs of different approaches on the LIDC-IDRI dataset. The performance of our method is close to the inter-observer agreement, even though a significantly larger number of samples has been studied. Advantageously, iW-Net does not rely on computationally heavy pre-processing steps and can segment nodules of all sizes and textures without the need to define bounding boxes or other specific parameters. Also, unlike the model of Wu et al., training iW-Net does not require other metadata, making it easier to enrich the training set and thus improve the generalization capability of the system.
|Tan et al. ||2013||NA||23||0.65||NA|
|Lassen et al. *||2015||NA||19|
|Messay et al. ||2015||300||66||NA|
|Gonçalves et al. ||2016||57||512|
|Wang et al. ||2017||350||493|
|Wu et al. ||2018||1404||1404||NA|
We propose iW-Net, a novel interactive lung nodule segmentation scheme. Drawing a stroke of the nodule’s diameter and extracting its end-points makes it possible to generate a weight map, which is then used for altering the prediction of the network. Specifically, the weight map is designed taking into account the expected spherical shape of the nodules and the distance between the introduced points. To promote the influence of the weight map on the resulting segmentation, this map is incorporated as a feature of the model and as a component of the loss function.
iW-Net improves the segmentation of more than 75% of the studied nodules. In fact, in comparison to the baseline, our model (with user interaction) significantly improves the segmentation of small nodules, which are essential for referral. Likewise, using iW-Net improves the segmentation performance for nodules with all types of textures, especially the challenging non-solid nodules. Given the inherent subjectivity of lung nodule segmentation, iW-Net may be an important tool to add to CAD systems, removing the need for manual segmentation while providing an easy and fast method to correct the produced output if needed.