DINs: Deep Interactive Networks for Neurofibroma Segmentation in Neurofibromatosis Type 1 on Whole-Body MRI

06/07/2021 · Jian-Wei Zhang, et al. · Zhejiang University · Harvard University

Neurofibromatosis type 1 (NF1) is an autosomal dominant tumor predisposition syndrome that involves the central and peripheral nervous systems. Accurate detection and segmentation of neurofibromas are essential for assessing tumor burden and longitudinal tumor size changes. Automatic convolutional neural networks (CNNs) are sensitive and vulnerable to tumors' variable anatomical location and heterogeneous appearance on MRI. In this study, we propose deep interactive networks (DINs) to address the above limitations. User interactions guide the model to recognize complicated tumors and to adapt quickly to heterogeneous tumors. We introduce a simple but effective exponential distance transform (ExpDT) that converts user interactions into guide maps regarded as spatial and appearance priors. Compared with the popular Euclidean and geodesic distances, ExpDT is more robust to variable image sizes, which preserves the distribution of interactive inputs. Furthermore, to enhance tumor-related features, we design a deep interactive module that propagates the guides into deeper layers. We train and evaluate DINs on three MRI data sets from NF1 patients. Experimental results show significant improvements of 44% and 29% in DSC compared with automated and other interactive methods, respectively. We also experimentally demonstrate the efficiency of DINs in reducing user burden compared with conventional interactive methods. The source code of our method is available at <https://github.com/Jarvis73/DINs>.


I Introduction

Neurofibromatosis type-1 (NF1) is an autosomal dominant neurogenetic disorder characterized by the development of both benign and malignant tumors. The hallmark tumors are neurofibromas, which are histologically benign tumors that arise from the peripheral nerve sheath and involve any body part. Despite benign histology, they can cause significant morbidity due to compression and invasion of nerves and other vital anatomical organs. Neurofibromas can be located deep inside the body and, if asymptomatic, are usually detected by whole-body magnetic resonance imaging (WBMRI) using short tau inversion recovery (STIR) sequences. Based on tumor morphology on MRI, plexiform (invasive or involving multiple nerves) neurofibromas (PNFs) carry an increased risk for transformation into malignant peripheral nerve sheath tumors.

Fig. 1 depicts two NF1 cases on WBMRI with the ground truth segmentation of tumor regions contoured in yellow. Accurate detection and evaluation of tumor burden on WBMRI are important for longitudinal tracking of tumor size, which enables accurate assessment of tumor growth and treatment response. However, the detection and segmentation of neurofibromas on WBMRI, particularly PNFs, are associated with three technical challenges.

Fig. 1: Two examples of neurofibromas with ground truth on 3D WBMRI. The three images in each example are (1) a coronal slice, (2) annotations of neurofibromas contoured in yellow, and (3) a 3D view of the neurofibromas, respectively.

Large number of tumors across the entire body and variable anatomical locations of tumors. Neurofibromas can develop anywhere along peripheral nerves. Their appearance across individuals varies in number (from none to hundreds) and size (from several cubic centimeters to several thousand cubic centimeters). Traditional interactive segmentation methods cannot reasonably handle such a large number of tumors at a time. Conventional segmentation methods for neurofibromas proposed in the literature include histogram thresholding using histogram templates [35], region growing [36], and 3D dynamic-threshold level sets [3]. These interactive methods are labor-intensive and time-consuming, since they involve manual identification of individual tumors by raters and interactive contour correction of the imperfect segmentations obtained from automated methods. In general, it may take from a few minutes to 1-2 hours to complete segmentation of a WBMRI study.

Heterogeneous and diffuse tumor architecture. PNFs can be elongated in shape with a characteristic ring-like or septate pattern that typically has a target-like appearance on MRI, with central low signal intensity and peripheral high signal intensity. Recently, deep convolutional neural networks (CNNs) have achieved great success in medical image segmentation, e.g., U-Net [28], V-Net [25], DeepMedic [17], and nnU-Net [13]. CNNs have brought breakthroughs in tumor segmentation for the brain [9], lung [15], liver [23], and other organs. Nevertheless, their application to neurofibroma segmentation on WBMRI has been minimal. Moreover, CNN-based approaches tend not to generalize well to new data, because the targeted neurofibromas may differ substantially in size, shape, intensity, and boundaries with adjacent organs between the training and testing data sets. Here, we explore how to embed user interactions into CNNs to improve generalizability and obtain an accurate and efficient interactive segmentation approach on WBMRI.

Guide maps suffer from distribution shift under variable image sizes. Some CNN-based interactive segmentation approaches [39, 34] extract foreground objects interactively by converting user interactions into distance maps using either the Euclidean distance transform (EDT) [39] or the geodesic distance transform (GDT) [34]. Typically, training CNNs on image patches and fine-tuning/testing on the whole image is a common trade-off between GPU memory and accuracy/inference speed [29, 26, 23]. However, both transforms are sensitive to image size, which leads to a distribution shift in the guide maps across varying sizes and a performance decrease when applied to neurofibroma data.

This paper proposes deep interactive networks (DINs) for interactive neurofibroma segmentation on WBMRI. We first adapt the popular 3D U-Net [6] to neurofibroma data on WBMRI by introducing anisotropic convolutional kernels for more accurate tumor-related feature extraction. Then, user interactions are encoded into guide maps by a distance transformation, provided as inputs, and embedded into multiple layers of the model to preserve user knowledge in deeper layers. The guide maps are regarded as a local appearance prior and a spatial prior. To avoid the effect of variable image sizes, we propose the exponential distance transform (ExpDT), whose intensity distribution is size-agnostic. With the guide maps, DINs embed users' prior knowledge into the network to correct segmentation results. Furthermore, to reduce the interaction effort during training and testing, we develop a strategy to simulate user interactions, which synthesizes various interaction patterns and allows rapid exploration of the best hyper-parameters. It is well known that medical images acquired from different devices exhibit a distribution shift that cannot be neglected and may lead to poor generalization of CNN models. In this situation, we experimentally demonstrate that DINs are more robust and stable than previous automated and interactive methods.

To evaluate DINs, we collected two WBMRI data sets and a local-region MRI (LRMRI) data set from NF1 patients, obtained using different MRI acquisition parameters. Experiments showed that DINs significantly outperformed automated methods by 44% in Dice similarity coefficient (DSC), demonstrating the effectiveness of CNN-based interactive segmentation. DINs outperformed other CNN-based interactive methods by 29%, and ExpDT outperformed other distance transforms (DTs) by 14% in DSC. Furthermore, comparisons with conventional interactive methods showed that DINs significantly reduce user interactions and running time.

The main contributions of this work are summarized as follows:

  • We propose DINs to cope with the challenges of interactive neurofibroma segmentation on WBMRI.

  • We introduce ExpDT for integrating user interactions into neural networks. ExpDT is size-independent compared with other common DTs and is therefore more suitable for WBMRI.

  • We propose a deep interactive module to integrate user knowledge into the deeper layers of the model, which effectively enhances the learned features about neurofibromas and improves the segmentation performance.

  • We develop a strategy to simulate user interactions for training 3D interactive neural network models.

  • DINs outperform automated and interactive methods by 44% and 29% in DSC, and ExpDT outperforms other DTs by 14% in DSC. Furthermore, DINs reduce user interactions and running time compared with conventional methods.

II Related Work

II-A Neurofibroma segmentation

A few interactive and semi-automated segmentation methods have been developed for NF1 in the literature. Solomon et al. [31] developed an interactive 2D segmentation method for PNFs that detects tumor regions within a manually defined area on each slice using a histogram-based threshold. This method fails if the histogram is unimodal or close to unimodal, so manual contouring was frequently required to correct the resulting contours. Following this idea, Weizman et al. [35] proposed 15 histogram templates of various distributions, from bimodal to unimodal, to identify the optimal threshold. Cai et al. [3] developed the 3DQI system for semi-automated neurofibroma segmentation, performed by a dynamic-threshold level-set method starting from a seed region. These existing methods required a large amount of interaction time and effort from users, either slice-by-slice or tumor-by-tumor, with user-provided scribbles or initial seeds. However, as shown in Fig. 1, there may be dozens or hundreds of tumors in a single study, which puts a heavy interaction burden on users. Some work [37, 11] introduced neural networks into the segmentation of neurofibromas: Wu et al. [37] integrated CNNs into an active contour model to predict its parametric maps, while Ho et al. [11] compared a multi-spectral neural network classifier with manual segmentation on diffusion-weighted imaging data. However, these methods still relied on conventional segmentation pipelines, and state-of-the-art deep CNNs had not yet been explored for neurofibroma segmentation. Therefore, a highly accurate and efficient neurofibroma segmentation method remained a technical challenge.

II-B CNNs in medical image segmentation

CNNs have been successfully adopted in various medical image segmentation applications. In particular, U-Net [28] was designed for medical image semantic segmentation with symmetric encoder and decoder paths; long skip connections between the two paths enhance the fusion of multi-level features. Based on this design, many encoder-decoder CNNs have subsequently been introduced for 2D and 3D medical image segmentation. 3D U-Net [6] extended U-Net from 2D image segmentation to 3D volumetric segmentation using 3D convolutions and a training strategy with sparse annotation. V-Net [25] and HighRes3DNet [22] incorporated residual modules [10] into their network structures; the difference between them is that V-Net enlarges the receptive field by downsampling feature maps with large-stride convolutions, while HighRes3DNet adopts dilated convolutions [4]. In addition, cascaded networks [5, 23] were explored to improve the learning ability of models: H-DenseUNet [23] adopted a multi-stage strategy that cascades 2D and 3D networks to jointly fuse and optimize intra-slice and inter-slice features for better liver and tumor segmentation. These methods have achieved promising results in various tumor and organ segmentation applications. However, existing CNN models do not generalize well to tumors in new data sets whose spatial and intensity distributions (such as the variable location and tumor morphology in NF1) differ from those of the training set.

II-C Interaction in CNNs

Semi-automated image segmentation techniques achieve higher accuracy with minimal interaction effort by providing cues, such as clicks or scribbles, that guide segmentation algorithms such as Graph Cut [2] and Random Walk [8]. Inspired by these techniques, recent works have introduced user interactions as extra channels of the CNN input [39, 34], typically transforming the interactions into guide maps by a DT. We review two well-known DTs:

  • Euclidean distance transform. Xu et al. [39] proposed deep interactive object selection (DIOS) for natural image segmentation, which directly fine-tunes a fully convolutional network (FCN) [24], introduces EDT to encode interactions, and refines the segmentation results with graph cuts.

  • Geodesic distance transform. Wang et al. [34] proposed two networks for 2D placenta segmentation: a proposal network (P-Net) for initial segmentation and a refinement network (R-Net) for refined segmentation. GDT transforms user interactions into intensity-aware distance maps that provide auxiliary information for accurate segmentation.

Both methods transform user clicks into guide maps using a DT. However, the intensity distributions of the two distance maps closely depend on the image size, which is commonly variable in 3D medical image segmentation. Inconsistent intensity distributions of the input guide maps significantly affect the segmentation accuracy of CNN-based models. Instead of interacting in the training stage, image-specific fine-tuning [33], which incorporates user interactions at test time, is another solution. Nevertheless, fine-tuning a deep neural network at runtime requires substantial computational resources, which are unavailable in most situations.

Fig. 2: The structure of DINs. User clicks are transformed into foreground and background guides by ExpDT (the exponential distance transform), which are concatenated with the raw image as the input of the backbone network. The two guides are also encoded and integrated into deeper layers through the deep interactive module to enhance the information flow about the interactions.

III Methods

In the context of neurofibroma segmentation on WBMRI, we propose deep interactive networks (DINs) for 3D medical image semantic segmentation, inspired by deep object selection [39] for interactive segmentation of natural images and by feature modulation [27, 40] for conditional control of neural networks. DINs employ an encoder-decoder backbone with an embedded deep interactive module (DIM). The DIM influences the segmentation via image-specific information generated from user interactions, represented by a distance map. The structure of DINs is shown in Fig. 2. Furthermore, for efficient training of DINs, we propose a strategy to simulate user interactions during training, thereby avoiding the manual creation of thousands of training samples. This strategy can also be used to evaluate the performance of DINs and to tune hyper-parameters quickly.

III-A Exponential distance transform

The DT of a binary mask specifies the minimum distance from each pixel to the boundaries of the non-zero regions, where the distances may be signed to distinguish the inside from the outside of the non-zero regions. Instead of the boundaries, unsigned DTs compute minimum distances to the whole masked regions. Various unsigned DTs have been studied for image segmentation in the literature [7, 39, 34]. Given an $N$-dimensional gray image $I$ and a corresponding binary mask $M$, let $I(x)$ denote the image intensity at a point $x$, and let $\mathcal{S} = \{x \mid M(x) = 1\}$ be the point set of user interactions. The DT of $x$ with respect to $\mathcal{S}$ is formulated as:

$$D(x; \mathcal{S}) = \min_{s \in \mathcal{S}} d(x, s), \qquad (1)$$

where $d(\cdot, \cdot)$ is a specific distance function between two points in an image. For the Euclidean distance and the geodesic distance, $d$ can be uniformly defined as:

$$d(a, b) = \min_{p \in \mathcal{P}_{a,b}} \int_0^1 \sqrt{\lambda \left\| p'(t) \right\|^2 + (1 - \lambda) \left( \nabla I(p(t)) \cdot \mathbf{u}(t) \right)^2} \, dt, \qquad (2)$$

where $\mathcal{P}_{a,b}$ is the set of all paths between the points $a$ and $b$, and $p$ is one such path, parameterized by $t \in [0, 1]$. $p'(t)$ is the derivative of $p$ with respect to $t$, and $\mathbf{u}(t) = p'(t) / \|p'(t)\|$ is a unit vector along the tangent direction of $p$. If $\lambda = 1$, Equation (1) is called the Euclidean distance transform, which is not conditioned on the image intensity and thus degenerates to the plain Euclidean distance $\|a - b\|$. If $\lambda = 0$, it becomes a geodesic distance transform. When $0 < \lambda < 1$, Equation (1) is a combination of the two distances. A common characteristic of these two distance functions is that they strongly depend on the image size: for images of different sizes, the intensity distributions of the resulting maps differ significantly. We call DTs with this property "global transforms". An illustration of EDT, GDT, and their blend is shown in Fig. 3 (c-e); the grayscale range (shown as the color bar) varies significantly with the image size (shown as the dotted box).
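For reference, the following sketch computes a truncated EDT guide map from a set of clicks with SciPy; the function name and the truncation cap (borrowed from the DIOS practice of capping distances) are our own illustration rather than the paper's released code.

```python
import numpy as np
from scipy import ndimage

def edt_guide_map(clicks, shape, cap=255.0):
    """Euclidean-distance guide map for user clicks (assumes >= 1 click).

    Every voxel stores the distance to the nearest click, truncated at
    `cap`. The raw value range grows with the image size, which is
    exactly the "global transform" issue discussed above.
    """
    seeds = np.zeros(shape, dtype=bool)
    for p in clicks:
        seeds[tuple(p)] = True
    # distance_transform_edt measures the distance to the nearest zero
    # voxel, so invert the seed mask to get distances to the clicks.
    return np.minimum(ndimage.distance_transform_edt(~seeds), cap)
```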

Fig. 3: Examples of different DTs. (a) Raw image; (b) ground truth; (c) Euclidean distance transform; (d) geodesic distance transform; (e) blend of Euclidean and geodesic distances; (f) exponential distance transform. Interactions are displayed as red points, i.e., the set $\mathcal{S}$. Color bars represent the grayscale of the distance maps for different image sizes.
Fig. 4: Details of the deepest layer of the encoder. DIM output 2 is added to the output of the first normalization layer.

Compared with popular CNN models that take fixed-size images as input for classification [30, 10], FCN-like models such as U-Net remove densely connected layers and can thus accept input images of arbitrary size and produce correspondingly-sized outputs [24]. Furthermore, FCN-like models allow the size of inference images to differ from that of training images, which is crucial for segmenting large 3D medical images such as WBMRI given limited GPU memory and insufficient training samples. For example, we may need to train a 3D U-Net with small volume patches due to limited GPU memory, but run inference on the whole volume, since inference does not need to store parameter gradients. Most operators in CNNs, such as addition, multiplication, ReLU, convolution, and max pooling, are either element-wise or window-wise [20], so predictions are hardly affected if image patches are expanded or clipped (i.e., if the size changes). However, the distribution of an integrated global DT depends on the actual image size, so inconsistent image sizes between the training and inference stages lead to distribution inconsistency, which may degrade segmentation performance. Therefore, we propose a "local transform", the exponential distance transform, that is unaffected by variable image sizes.
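The size-agnostic behavior of fully convolutional models is easy to verify; the following minimal sketch (with illustrative shapes) shows that the same 3D convolutional stack accepts both a training-size patch and a whole-volume-size input.

```python
import tensorflow as tf
from tensorflow.keras import layers

# A fully convolutional stack accepts any spatial size, so we can train
# on small patches and run inference on the whole volume.
inputs = tf.keras.Input(shape=(None, None, None, 3))  # image + 2 guides
x = layers.Conv3D(16, (1, 3, 3), padding="same", activation="relu")(inputs)
outputs = layers.Conv3D(2, 1)(x)  # 2-class logits
model = tf.keras.Model(inputs, outputs)

patch = tf.zeros([1, 16, 128, 128, 3])  # training-size patch (illustrative)
whole = tf.zeros([1, 60, 320, 320, 3])  # whole-volume input (illustrative)
assert model(patch).shape[1:4] == patch.shape[1:4]
assert model(whole).shape[1:4] == whole.shape[1:4]
```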

The ExpDT is formulated as:

$$D_{\exp}(x; \mathcal{S}) = \max_{s \in \mathcal{S}} g(x, s), \qquad (3)$$

$$g(x, s) = \exp \left( - \sum_{i=1}^{N} \frac{(x_i - s_i)^2}{2 \gamma_i^2} \right), \qquad (4)$$

with the scale parameter $\boldsymbol{\gamma} = (\gamma_1, \dots, \gamma_N)$ controlling the influence of the points in $\mathcal{S}$ on surrounding points. As shown in Fig. 3 (f), ExpDT is a local-enhanced distance map: pixels with high gray levels are tightly gathered near the points in $\mathcal{S}$, and ExpDT is therefore hardly affected by variable image sizes. When $\gamma_i \to 0$, ExpDT tends to form spikes at the points in $\mathcal{S}$; when $\gamma_i \to \infty$, ExpDT becomes flat and loses locality. Different from the previous two transforms, which use a min to compute distances to $\mathcal{S}$, ExpDT uses a max due to the negative sign in Equation (4). Considered purely from the perspective of DTs, ExpDT neither has global attributes nor incorporates image intensity. We argue that, within the proposed DINs framework, CNNs can still learn discriminative features from the local-enhanced ExpDT.
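A direct NumPy sketch of Equations (3)-(4) is given below; the anisotropic default scales are illustrative values, not the paper's tuned setting. Because each click contributes a bump bounded in [0, 1] and the map takes a per-voxel maximum, the value range is independent of the image size.

```python
import numpy as np

def expdt_guide_map(clicks, shape, gamma=(1.0, 10.0, 10.0)):
    """Exponential distance transform (Eqs. (3)-(4)), a minimal sketch.

    Each click s contributes exp(-sum_i (x_i - s_i)^2 / (2 * gamma_i^2));
    the guide map is the per-voxel maximum over clicks, so its values
    stay in [0, 1] regardless of image size ("local transform").
    """
    grids = np.meshgrid(*(np.arange(n, dtype=np.float32) for n in shape),
                        indexing="ij")
    guide = np.zeros(shape, dtype=np.float32)
    for point in clicks:
        sq = sum(((g - c) / s) ** 2 for g, c, s in zip(grids, point, gamma))
        guide = np.maximum(guide, np.exp(-0.5 * sq))
    return guide
```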

III-B Structure of DINs

The structure of DINs follows the encoder-decoder scheme with skip connections, like 3D U-Net [6]. However, 3D U-Net was originally evaluated on organs with relatively fixed size and balanced pixel spacing, such as the Xenopus kidney, and is not suitable for the highly variable neurofibromas. Considering the typical shape and pixel spacing of WBMRI, we fix the size of the input image to about half the size of the original image. Note that when the pixel spacings of an image differ significantly across axes, isotropic resampling is not a good choice because of the missing inter-slice information. Therefore, we build the backbone network with convolutional layers that have anisotropic kernel sizes and strides: there are four downsamplings in the coronal plane, but only one in the orthogonal direction. Instead of max pooling, downsampling is implemented by large-stride convolutions to save GPU memory and permit a larger batch size; upsampling is performed by deconvolutional layers [42]. Batch normalization (BN), commonly used to reduce internal covariate shift and stabilize training [12], performs poorly with small batch sizes [38], and Isensee et al. [14] experimentally demonstrated that instance normalization (IN) [32] performs better than BN on medical images. Therefore, we apply IN after the convolutional layers, followed by ReLU activation. For clarity, we list the details of the internal layers of DINs in Table I.

| Module | Layers |
| --- | --- |
| Input | 1-channel image; concat: DIM output 1 |
| E1 | [conv] ×2 |
| E2 | conv (strided downsampling); conv |
| E3 | conv (strided downsampling); conv |
| E4 | conv (strided downsampling); conv |
| E5 | conv (strided downsampling), add: DIM output 2; conv |
| D4 | deconv, concat: E4 output; [conv] ×2 |
| D3 | deconv, concat: E3 output; [conv] ×2 |
| D2 | deconv, concat: E2 output; [conv] ×2 |
| D1 | deconv, concat: E1 output; [conv] ×2 |
| Output | conv*; softmax |

TABLE I: Details of the feature extractor of DINs. "conv" denotes a convolutional layer followed by instance normalization and ReLU; "conv*" denotes a plain convolutional layer; "deconv" denotes a deconvolutional layer; "[·] ×2" denotes repeating the layer twice. Kernel sizes and strides are anisotropic, following the downsampling pattern described above (four in-plane, one inter-slice); the notation "k: 133, 30" denotes a kernel size of (1, 3, 3) with 30 output channels, and "s: 111" a stride of (1, 1, 1).
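To make the anisotropic design concrete, the sketch below implements the repeating "conv" unit of Table I with the InstanceNormalization layer from TensorFlow Addons; the kernel sizes, strides, and channel counts are illustrative, since Table I's exact per-layer values are not reproduced here.

```python
import tensorflow as tf
import tensorflow_addons as tfa  # provides InstanceNormalization
from tensorflow.keras import layers

def conv_in_relu(x, filters, kernel=(1, 3, 3), stride=(1, 1, 1)):
    """The repeating "conv" unit of Table I: conv -> IN -> ReLU."""
    x = layers.Conv3D(filters, kernel, strides=stride, padding="same")(x)
    x = tfa.layers.InstanceNormalization()(x)
    return layers.ReLU()(x)

def encoder_stage(x, filters):
    """One in-plane-downsampling encoder stage (illustrative sizes).

    A strided convolution halves the coronal-plane resolution while
    keeping the slice axis untouched, followed by a plain conv unit.
    """
    x = conv_in_relu(x, filters, stride=(1, 2, 2))
    return conv_in_relu(x, filters)
```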

To incorporate user interactions into deep neural networks, we develop a deep interactive module (DIM). Leveraging feature modulation [27], the DIM embeds additional image-specific information into the network backbone and guides the model to focus on the features enhanced by the distance maps, the so-called guide maps, as shown in Fig. 2. The DIM consists of an ExpDT, a max-pooling layer, and a convolutional layer, transforming user interactions into guide maps of two different sizes (DIM output 1 and DIM output 2). Concretely, user interactions are transformed into a foreground guide map and a background guide map by ExpDT, which are integrated into the input layer by concatenation with the raw image to form a three-channel input. The two guide maps are further encoded and integrated into the deepest layer of the encoder, to prevent the guide information from being gradually diluted as more complex features are extracted. A detailed experiment on the position where the DIM outputs are inserted is presented in Section V-C3. As shown in Fig. 4, the downsampled guide maps are added to the output of the first normalization layer in the deepest layer of the encoder, followed by a ReLU activation. The layers in the decoder path need no further integration, since the guide information is passed along the skip connections.
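A sketch of how the two DIM outputs could be wired is given below. The pooling factor (2, 16, 16) mirrors the one orthogonal and four coronal downsamplings of the encoder, while the channel count and function names are our own illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def deep_interactive_module(fg_guide, bg_guide, pool=(2, 16, 16), channels=240):
    """Sketch of the DIM (Fig. 2).

    `fg_guide` and `bg_guide` are ExpDT guide maps with a trailing
    channel axis. Output 1 is concatenated with the raw image at the
    input; output 2 is max-pooled to the deepest encoder resolution and
    projected by a 1x1x1 convolution so it can be added to the encoder
    features after the first instance normalization (Fig. 4).
    """
    guides = layers.Concatenate(axis=-1)([fg_guide, bg_guide])  # output 1
    pooled = layers.MaxPool3D(pool_size=pool, strides=pool)(guides)
    deep = layers.Conv3D(channels, kernel_size=1)(pooled)       # output 2
    return guides, deep
```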

III-C Simulation strategy

Simulating user interactions in the training and evaluation stages not only frees users from the burdensome interactive work of generating thousands of training samples, but also accelerates the exploration of optimal hyper-parameters. Our strategy for simulating user interactions is based on the work in [39] and extends it to the setting of 3D images. Let $G$ denote the ground truth segmentation of an image $I$, and let $\mathcal{O} = \{x \mid G(x) = 1\}$ denote the set of pixels of the foreground objects. We define the background region surrounding the objects as:

$$\mathcal{B} = \{ x \mid 0 < d_{euc}(x, \mathcal{O}) \le B \}, \qquad (5)$$

where $d_{euc}(x, \mathcal{O})$ is the Euclidean distance between the point $x$ and the set $\mathcal{O}$, and $B$ is the bandwidth.

When processing 2D natural images, Xu et al. [39] proposed to sample positive clicks randomly from $\mathcal{O}$, with the number of sampled points following a discrete uniform distribution from 1 to an upper bound. Negative clicks were randomly selected from the whole background (random selection) and evenly selected from $\mathcal{B}$ (uniform selection); their number did not exceed a second upper bound but could be zero. However, if we directly use the same upper bounds in 3D images, user interactions become quite sparse because of the additional axis, which we experimentally found to be harmful to model performance. Therefore, we adapt the two upper bounds to the 3D setting as follows:

(6)

This strategy may not be the best choice, but it is a simple and effective way to determine a better upper bound on the number of clicks sampled in the training stage. In addition, positive and negative clicks within a small margin of the boundaries should be avoided. Considering the large inter-slice spacing of WBMRI and the infiltrative MRI appearance of neurofibromas, this margin restriction is applied only within individual slices. Finally, a minimum spacing should be kept between any two points in each dimension.
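The following sketch shows the core of this training-time simulation; the margin and minimum-spacing constraints described above are omitted for brevity, and the function name is illustrative.

```python
import numpy as np
from scipy import ndimage

def sample_clicks(gt, n_pos_max, n_neg_max, bandwidth=40, rng=None):
    """Sample simulated clicks from a binary ground truth volume.

    Positive clicks come uniformly from foreground voxels; negative
    clicks from the background band within `bandwidth` voxels of the
    object (Eq. (5)). Assumes `gt` contains at least one foreground voxel.
    """
    rng = rng or np.random.default_rng()
    fg = np.argwhere(gt > 0)
    n_pos = int(rng.integers(1, n_pos_max + 1))
    pos = fg[rng.choice(len(fg), size=min(n_pos, len(fg)), replace=False)]

    dist = ndimage.distance_transform_edt(gt == 0)  # distance to the object
    band = np.argwhere((dist > 0) & (dist <= bandwidth))
    n_neg = int(rng.integers(0, n_neg_max + 1))
    neg = band[rng.choice(len(band), size=min(n_neg, len(band)), replace=False)]
    return pos, neg
```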

During evaluation, simulation is performed by placing the next positive/negative click at the center of the largest error region, obtained from the symmetric difference between the current prediction and the ground truth. Specifically, if the largest error region $\mathcal{R}$ is part of a foreground object, then the next click is a positive point placed at the center $c$ of $\mathcal{R}$. If $c \notin \mathcal{R}$, we replace $c$ by:

$$c' = \operatorname*{arg\,min}_{x \in \mathrm{skel}(\mathcal{R})} \| x - c \|, \qquad (7)$$

where $\mathrm{skel}(\mathcal{R})$ is the skeleton [21] of the region $\mathcal{R}$; that is, $c'$ is the point of the skeleton of $\mathcal{R}$ nearest to $c$. This situation may occur when $\mathcal{R}$ is concave. In this way, we guarantee that positive points are never placed in the wrong region (the background), and vice versa. The maximum number of clicks on a single study is limited to 20, and we set a threshold of 0.8 DSC (see Section IV-B) in the cross-validation experiments. If the target threshold cannot be achieved within 20 clicks, we terminate the interaction for the current study.
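A sketch of this evaluation-time click placement, using SciPy connected components and the scikit-image skeleton, follows; the helper name is illustrative.

```python
import numpy as np
from scipy import ndimage
from skimage.morphology import skeletonize_3d

def next_click(pred, gt):
    """Place the next simulated click at the largest error region (Eq. (7))."""
    error = pred.astype(bool) ^ gt.astype(bool)   # symmetric difference
    labels, n = ndimage.label(error)
    if n == 0:
        return None, None                          # perfect prediction
    sizes = ndimage.sum(error, labels, index=range(1, n + 1))
    region = labels == (int(np.argmax(sizes)) + 1)
    positive = bool(gt[region].any())              # missed foreground -> positive
    c = np.round(ndimage.center_of_mass(region)).astype(int)
    if not region[tuple(c)]:                       # concave region: snap to skeleton
        skel = np.argwhere(skeletonize_3d(region))
        c = skel[np.argmin(((skel - c) ** 2).sum(axis=1))]
    return tuple(int(v) for v in c), positive
```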

| Methods | λ | 1−λ | DSC | VOE | ARVD | FG | BG | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EDT [39] | 1.0 | 0.0 | 0.62 | 0.52 | 0.73 | 7.6 | 7.7 | 15.3 |
| EDT-half | 1.0 | 0.0 | 0.68 | 0.46 | 0.46 | 7.6 | 6.6 | 14.2 |
| GDT [34] | 0.0 | 1.0 | 0.72 | 0.41 | 0.49 | 5.9 | 5.7 | 11.6 |
| GDT-half | 0.0 | 1.0 | 0.70 | 0.43 | 0.51 | 5.4 | 6.0 | 11.4 |
| (EDT + GDT)-half | 0.5 | 0.5 | 0.70 | 0.43 | 0.55 | 5.6 | 6.4 | 12.0 |
| ExpDT (ours) | - | - | 0.74 | 0.39 | 0.25 | 7.1 | 3.1 | 10.2 |
| ExpDT (ours) | - | - | 0.73 | 0.40 | 0.33 | 4.8 | 5.5 | 10.3 |
| ExpDT (ours) | - | - | 0.75 | 0.38 | 0.26 | 5.2 | 4.3 | 9.5 |

TABLE II: Cross-validation results of ExpDT and other methods on the training set with a threshold of 80% DSC. The three ExpDT rows use different scale factors γ. FG: foreground points; BG: background points; Overall: total number of interactions.
| Methods | λ | 1−λ | DSC | VOE | ARVD | FG | BG | p |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EDT [39] | 1.0 | 0.0 | 0.36 | 0.73 | 62.25 | 4.2 | 15.8 | <0.01 |
| EDT-half | 1.0 | 0.0 | 0.38 | 0.73 | 22.92 | 5.0 | 15.0 | <0.01 |
| GDT [34] | 0.0 | 1.0 | 0.23 | 0.86 | 86.38 | 2.3 | 17.7 | <0.01 |
| GDT-half | 0.0 | 1.0 | 0.53 | 0.60 | 10.53 | 6.0 | 14.0 | <0.01 |
| (EDT + GDT)-half | 0.5 | 0.5 | 0.43 | 0.68 | 30.65 | 4.1 | 15.9 | <0.01 |
| ExpDT (ours) | - | - | 0.61 | 0.52 | 0.59 | 10.3 | 9.7 | 0.10 |
| ExpDT (ours) | - | - | 0.66 | 0.47 | 0.91 | 7.2 | 12.8 | 0.33 |
| ExpDT (ours) | - | - | 0.67 | 0.46 | 0.79 | 7.6 | 12.4 | - |

TABLE III: Comparison of ExpDT with other methods on the WBMRI test set with at most 20 interactions. The test set is more challenging due to the distribution shift relative to the training set. The three ExpDT rows use different scale factors γ. FG: foreground points; BG: background points; p: p-value of the t-test between each method and ExpDT (last row).

IV Experiment Settings

IV-A Data set and preprocessing

We collected two WBMRI data sets and an LRMRI data set from NF1 patients; one WBMRI data set was used as the training set and the remaining two as testing sets. Both WBMRI data sets were acquired on 1.5-T MR scanners (MAGNETOM Avanto fit, Siemens Medical Systems, USA) using different software (Syngo MR 2004 V for the training set, Syngo MR E11 for the testing set). We did not shuffle and reassign the training and testing sets, so as to evaluate DINs' ability to handle such a complicated situation. The training set contained 125 studies with 1156 NF1 tumors manually contoured by clinicians with expertise in identifying peripheral nerve sheath tumors. Their sizes and pixel spacings are described in Section III-B. We adopted online data augmentation to reduce overfitting, including random cropping from MRI scans, scaling between 1.0 and 1.25, rotating by an angle sampled from a Gaussian distribution, flipping in all three dimensions, and gamma transformation. The WBMRI testing set comprised 33 studies with consistent dimensions and unified spacing; a total of 224 tumors were manually contoured in this data set. The LRMRI testing set contained 45 studies with various dimensions and spacings.

IV-B Implementation details

We used weighted cross-entropy as the loss function, with weighting factors of 1.0 for background pixels and 3.0 for foreground pixels. The Adam optimizer [19] was used to update the model parameters. The learning rate was reduced by a factor of 0.2 once the validation loss had not decreased for 30 epochs, down to a fixed minimum. We trained 200 batches per epoch with a batch size of 8 and terminated training after 250 epochs. We forced 50% of the images in each batch to include tumors, while the others were randomly cropped without restriction. We implemented DINs with the TensorFlow [1] package in Python and conducted experiments on a single Tesla V100 GPU with 32 GB memory. We utilized ITK-SNAP [41], 3DQI [3], and imcut [16] for comparison with Active Contour, Random Walk, and Graph Cut, respectively.
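For reference, a minimal sketch of the weighted cross-entropy described above is shown below, assuming integer {0, 1} voxel labels and two-channel logits; the function name is our own.

```python
import tensorflow as tf

def weighted_xent(labels, logits, w_fg=3.0, w_bg=1.0):
    """Per-voxel cross-entropy with foreground voxels weighted 3x (Sec. IV-B)."""
    per_voxel = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)
    weights = tf.where(labels > 0, w_fg, w_bg)
    return tf.reduce_mean(per_voxel * weights)
```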

Foreground and background clicks were simulated following the strategies described in Section III-C. In DIOS, the upper bound on the number of positive clicks was chosen for 2D image segmentation; we therefore set its 3D counterpart as:

(8)

The maximum number of background clicks was set to the same value for simplicity. It should be noted that this is a crucial hyper-parameter for acceptable performance, and we discuss it in Section V-C. The boundary margin was set to 3 pixels by default; to preserve tumors smaller than 6 pixels in either direction, we removed this restriction for such small tumors. The minimum in-plane spacing between clicks was set to 10 pixels, while the inter-slice spacing was relaxed to 1 as a compromise for the huge inter-slice spacing of WBMRI. The bandwidth $B$ in Equation (5) was set to 40 pixels.

In the evaluation, the Dice similarity coefficient (DSC), the total number of clicks, the volumetric overlap error (VOE), and the absolute relative volume difference (ARVD) were the primary evaluation metrics. The numbers of positive and negative clicks were also logged for comparison. Let $P$ and $G$ denote the binary prediction and the ground truth, respectively. Then the three metrics are formulated as:

$$\mathrm{DSC} = \frac{2 |P \cap G|}{|P| + |G|}, \qquad (9)$$

$$\mathrm{VOE} = 1 - \frac{|P \cap G|}{|P \cup G|}, \qquad (10)$$

$$\mathrm{ARVD} = \left| \frac{|P|}{|G|} - 1 \right|, \qquad (11)$$

where $|\cdot|$ applied to a set denotes the number of non-zero elements, and the outer $|\cdot|$ in Equation (11) denotes the absolute value. For the interactive evaluation process, if not specified, user interactions were continuously provided until either 20 clicks or the threshold of 0.8 DSC was reached in all of the following experiments.
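The three metrics are straightforward to compute for binary volumes; a sketch follows (the function name is illustrative).

```python
import numpy as np

def seg_metrics(pred, gt):
    """DSC, VOE, and ARVD of Eqs. (9)-(11) for binary volumes."""
    p, g = pred.astype(bool), gt.astype(bool)
    inter = np.count_nonzero(p & g)
    union = np.count_nonzero(p | g)
    dsc = 2.0 * inter / (np.count_nonzero(p) + np.count_nonzero(g))
    voe = 1.0 - inter / union
    arvd = abs(np.count_nonzero(p) / np.count_nonzero(g) - 1.0)
    return dsc, voe, arvd
```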

V Results and Discussion

V-A Comparison of ExpDT with EDT and GDT

In this section, we compare ExpDT with EDT, GDT, and three other variants. We remove DIM output 2 in this group of experiments to highlight the contribution of the DT. The suffix "half" denotes that the input images are downsampled to half of the original image patches. The cross-validation results are listed in Table II. With its default scale setting (first ExpDT row), ExpDT outperforms EDT by 12% DSC and reduces the average number of interactions from 15.3 to 10.2. Compared with GDT, ExpDT does not involve image intensity but achieves better results on all metrics. Interestingly, with a DSC threshold of 80%, EDT-half and GDT-half require fewer interactions than EDT and GDT, respectively. We conjecture that the smaller image sizes alleviate the inconsistency of image sizes between the training and evaluation stages, which affects the performance of "global transforms". Finally, the best-scaled ExpDT achieves the best results on all four primary metrics. The last three rows of Table II indicate that successive improvements can be made by subtly adjusting the scale factor, even after the model parameters are fixed. The comparison of ExpDT with the other DTs demonstrates the effectiveness and flexibility of ExpDT.

Fig. 5: Examples of the segmentation results by different DTs. Each row is a coronal patch from the training set. The first column shows the raw images and the corresponding ground truths. The right three columns are the segmentation results of three DTs: EDT, GDT, and ExpDT, respectively. Red curve: the boundaries of the ground truth. Yellow curve: the boundaries of segmentation results.
Fig. 6: Examples of the segmentation results on WBMRI by DINs. Red curve: the boundaries of the ground truth. Yellow curve: the boundaries of segmentation results.

Table III compares the performance of the different DTs on the more challenging testing set. We report three accuracy metrics and the numbers of foreground and background points with 20 clicks provided. One can observe that the overall performance is inferior to that on the training set. Potential reasons include differences in image size and pixel spacing, as well as the distribution shift between the training and test sets. We observe that the accuracy of EDT and GDT decreases more than that of ExpDT: the number of background points is much larger than the number of foreground points, and the ARVD is notably larger than that of ExpDT. This indicates that EDT and GDT predict many false-positive regions on account of their "global transform" properties. On the contrary, ExpDT yields improved results, and the numbers of positive and negative clicks are relatively balanced. We also report the p-values of the t-test between ExpDT and the other DTs, which indicate the substantial improvement of ExpDT. Therefore, as a "local transform", ExpDT is more generalizable.

Fig. 5 displays four segmentation cases produced by DINs with different transform functions. For each case, only one positive click is provided. We observe that with only one click, DINs with ExpDT achieve better performance in segmenting discrete neurofibromas (first and second rows), whereas EDT and GDT produce more false-positive as well as false-negative regions. For plexiform neurofibromas, ExpDT segments consistently and with good accuracy (third and fourth rows). However, EDT and GDT may miss tumor regions far from the clicked object (third row); their high sensitivity to image size is the main reason for this instability. For large plexiform neurofibromas (fourth row), GDT may miss part of the lesions due to the heterogeneous and diffuse tumor architecture, while ExpDT shows better performance. Fig. 6 shows further results of DINs on WBMRI.

| Methods | VD | RVD (%) | Num. |
| --- | --- | --- | --- |
| NCI-3DQI [3] | -21 (-406 to 114) | 4.5 (0.3 to 28.4) | 43 |
| MGH-3DQI [3] | -30 (-782 to 353) | 9.5 (0.1 to 48.8) | 34 |
| DINs | -3 (-101 to 360) | 16.6 (7.9 to 29.2) | 40 |

TABLE IV: Comparison of DINs with other neurofibroma segmentation methods on the LRMRI test set (45 cases). VD: volume difference; RVD: relative volume difference. The values in the VD and RVD columns are median (range). Num.: number of cases with <20% RVD.
| Type | Methods | DSC | VOE | ARVD | p |
| --- | --- | --- | --- | --- | --- |
| A | DeepMedic [17] | 0.06 | 0.97 | 77.95 | <0.01 |
| A | 3D U-Net [6] | 0.06 | 0.97 | 77.62 | <0.01 |
| A | nnU-Net [13] | 0.25 | 0.84 | 5.98 | <0.01 |
| I | nnU-Net + DIM output 1 | 0.40 | 0.72 | 6.70 | <0.01 |
| I | DINs (DIM output 1) | 0.67 | 0.46 | 0.79 | 0.58 |
| I | DINs | 0.69 | 0.45 | 0.64 | - |

TABLE V: Comparison of DINs with deep CNN-based methods on the WBMRI test set with 20 interactions. "A" and "I" denote automated and interactive methods, respectively. p: p-value of the t-test between DINs and each method.
Fig. 7: Average DSC vs. the number of interactions on the WBMRI test set. Comparison of (a) different interactive methods, (b) upper bounds on the number of interactions during training, and (c) scale factors of ExpDT.
| Methods | Type of interactions | DSC (20 inters) | # of inters (0.8 DSC) | Running time/inter (min) |
| --- | --- | --- | --- | --- |
| RW [8] | boxes, points | 0.47 | 19.5 | 2.12 |
| GC [2] | boxes, points | 0.66 | 17.9 | 0.71 |
| AC [18] | boxes, thresholds, bubbles | 0.71 | 15.2 | 5.32 |
| DINs-full (ours) | points | 0.76 | 14.6 | 0.33 |
| DINs-box (ours) | boxes, points | 0.80 | 12.1 | 0.14 |

TABLE VI: Comparison of the proposed methods with conventional interactive methods on the WBMRI test set with a threshold of 0.8 DSC. RW: Random Walk; GC: Graph Cut; AC: Active Contour; inters: interactions. The number of interactions counts only points or bubbles.

V-B Comparison with other methods

V-B1 State-of-the-art interactive methods

Cai et al. [3] performed volume measurements on the LRMRI data set using 3DQI software at Massachusetts General Hospital (MGH) and the National Cancer Institute (NCI), and MEDx software at NCI. The first two rows of Table IV show the differences NCI-3DQI vs. NCI-MEDx and MGH-3DQI vs. NCI-MEDx. We take the segmentation results of NCI-MEDx as the ground truth and apply DINs to the LRMRI data set. The results in the last row indicate that DINs achieve performance comparable to 3DQI. Notice that the results of NCI-3DQI and MGH-3DQI were finalized with various editing tools, whereas the DINs results were not.

V-B2 Deep CNN-based methods

A comparison between DINs and state-of-the-art medical image segmentation methods is shown in Table V. One can observe that DINs outperform automated methods by 44%–63% DSC and outperform "nnU-Net + DIM output 1" by 29%. Automated methods obtain low scores, while interactive methods perform better with the help of user knowledge. The comparison between "nnU-Net + DIM output 1" and "DINs (DIM output 1)" indicates that DINs benefit from the proposed feature extractor adapted to WBMRI. Finally, with DIM output 2, DINs further increase the DSC by 2%, which suggests the effectiveness of integrating user knowledge into deeper layers. In addition, the p-values of the t-test between the results of DINs and the other methods are listed for reference.

V-B3 Conventional interactive methods

We compare DINs with several conventional interactive segmentation methods, including Random Walk (RW) [8], Graph Cut (GC) [2], and Active Contour (AC) [18]. RW and GC treat the volume as a discrete static graph and perform segmentation from many positive and negative clicks by solving a linear system (RW) or a min-cut problem (GC). Commonly, a bounding box is provided before running these conventional approaches to save computation time and reduce irrelevant information, restricting the search space to a smaller region. Therefore, we implement two versions of DINs (a cropping sketch follows the list below):

  • DINs-full. The whole 3D volume is fed into DINs for evaluation.

  • DINs-box. We manually create several bounding boxes in each volume following three criteria: (1) spatially close tumors are grouped into the same bounding box; (2) the heights and widths of the bounding boxes are at least 128 pixels, while the depths are set as tight to the tumor boundaries as possible, consistent with users' behavior; and (3) there are no more than five bounding boxes per case.
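The cropping scheme behind DINs-box is simple; the following sketch (with a hypothetical `predict_fn` wrapping the trained model) runs the network inside each bounding box and pastes the predictions back into the full volume.

```python
import numpy as np

def predict_with_boxes(volume, boxes, predict_fn):
    """Run segmentation inside each (z0, z1, y0, y1, x0, x1) box.

    `predict_fn` maps a cropped sub-volume to a binary mask of the same
    shape (a hypothetical wrapper around the trained DINs model).
    """
    out = np.zeros(volume.shape, dtype=np.uint8)
    for z0, z1, y0, y1, x0, x1 in boxes:
        out[z0:z1, y0:y1, x0:x1] |= predict_fn(
            volume[z0:z1, y0:y1, x0:x1]).astype(np.uint8)
    return out
```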

Fig. 8: Ablation study of the structure of the DIM. The models are evaluated on the WBMRI test set. Average DSCs and standard deviations after 2, 6, 12, and 20 clicks are presented in the bar charts. (a) Comparison of the DIM components. (b) Effect of the position where DIM output 2 is inserted in the encoder. (c) A further comparison of DIM-2, DIM-3, and DIM-input.

Fig. 9: Interactive segmentation results of plexiform neurofibromas with the DINs framework. Bounding boxes are used to trim noises and improve inference speed. The ground truth segmentation is shown in red contours and the prediction contour is shown in yellow. The positive and negative clicks are masked by red points and yellow points, respectively.

Fig. 7 (a) presents the trend of DSC for DINs and the conventional methods as interactive points are continuously provided. RW, GC, AC, and DINs-box run within a volume of interest, while DINs-full employs the entire 3D volume. In Fig. 7, we observe that RW and GC show low DSCs, caused by the limited information used to compute features. DINs-full shows only a modest improvement over AC, but AC requires setting a bounding box and adjusting two thresholds to filter the background, which demands much more user effort for precise tuning. With additional bounding boxes, DINs-box significantly exceeds all three conventional interactive methods. For a detailed comparison, we summarize the type of interactions, DSC, number of interactions, and running time in Table VI. DINs-full has the minimum interaction requirement for performing segmentation, which substantially reduces the complexity of the interaction. With extra bounding boxes, DINs-box further improves segmentation accuracy and reduces both the user burden and the running time.

Overall, as the number of interactions increases, DINs improve the segmentation accuracy more stably and successively. Furthermore, a substantial improvement can be made by providing a few bounding boxes for some difficult cases (DINs-box). This implies that DINs are more effective and flexible for interactive segmentation.

Fig. 10: Comparison of DINs with the GC and RW methods. The ground truth and the prediction are shown in red and yellow contours, respectively. The positive and negative clicks are marked by red and yellow points, respectively. (A negative click in the second image has a deviated location compared with the other two methods because the prediction at the original location was empty.)

V-C Ablation studies

In this section, we conduct ablation studies to investigate the influence of three crucial components of DINs: (1) the upper bound on the number of interactions sampled during training, (2) the scale parameter of ExpDT, and (3) the structure of the DIM.

V-C1 Number of interactions during training

The upper bound on the numbers of foreground and background clicks sampled during training may significantly influence the performance of the resulting model. We conduct experiments with four different upper bounds to train DINs and evaluate them on the test set. The comparison is shown in Fig. 7 (b). We find that a moderate upper bound yields a higher DSC once the number of clicks exceeds 3, while models trained with upper bounds that are too large or too small are significantly inferior. Too many points during training may lead the model to rely heavily on user interactions and become more conservative, while too few points may be inadequate to help the model choose discriminative features. This indicates that the hyper-parameter is crucial for training a well-performing model, and that the strategy of linking the upper bound on the interaction number to the image dimension (see Equation (6)) is reasonable and effective.

V-C2 Scale parameter of ExpDT

The scale factor of ExpDT is another key parameter for training a well-performing model; its three components correspond to the anatomical anterior-posterior, superior-inferior, and left-right directions, respectively. We fix one component to 1 and compare different values of the other two. The results are presented in Fig. 7 (c) and indicate that the scale factor is an important parameter for optimal model performance; the potential reason is the varied sizes of neurofibromas. However, this sensitivity can also be seen as flexibility: users can apply different scale factors to tumors of various sizes to achieve higher accuracy. The scale factor can also be adjusted in the inference stage for better segmentation results, as shown in Table II and Table III, where a slightly larger scale gives better results. Furthermore, the scale factor can be adjusted per case to improve performance.

V-C3 Deep interactive module

We conduct experiments to assess the effectiveness of each part of the DIM. Several variants of the DIM are compared: (1) DIM-input: DIM with only the output 1 branch; (2) DIM-highest: DIM with only the output 2 branch; (3) DIM-v2: the path to output 2 is implemented with a max-pooling layer and two large-stride convolutional layers. The results are shown in Fig. 8 (a). Intuitively, the user interactions provide additional features to discriminate the tumor from the background. We observe that DIM-highest obtains a poor DSC with only subtle improvement as the number of clicks increases, while DIM-input achieves a higher DSC. Combining DIM-highest and DIM-input yields a further improvement. The comparison among DIM-highest, DIM-input, and DIM indicates that guide maps with spatial information help neural networks learn more discriminative features, and that guide maps in the deepest layer of the encoder directly enhance the corresponding features. The comparison between DIM and DIM-v2 indicates that a single convolutional layer is adequate for passing the extra features to deeper layers; more layers introduce more parameters and increase the risk of overfitting.

We also compare the effect of the position where DIM output 2 is inserted in the encoder. Let DIM-i denote the variant connecting DIM output 2 to the i-th layer of the encoder, where i ranges over the encoder layers; DIM-5 is exactly the proposed structure. The results are shown in Fig. 8 (b). Overall, the DSC increases as the guide maps are integrated into deeper layers. The reason is that the information in the guide maps is limited compared with the abundant features from the images and is easily diluted as more features are extracted. Therefore, integrating the guide maps into the deepest layer of the encoder is the best choice for enhanced feature learning. Besides, DIM-input outperforms DIM-1 and DIM-2, as presented in Fig. 8 (c) for clarity. This indicates that integrating guide maps multiple times into shallow layers (including the input layer) of the encoder hurts feature learning because of overfitting, which impacts generalization.

V-C4 Effect of click positions

Click positions affect the interactive segmentation accuracy. To quantify this effect for DINs, we randomly select ten neurofibromas from the training set and click each neurofibroma once. Each example is evaluated five times with different click positions, and the standard deviation of the five results is computed for each neurofibroma. The median (range) of the ten standard deviations is 0.015 (0.005 to 0.044), which indicates that click positions affect the performance within a reasonable range. As the number of clicks increases to 2 and 3, the segmentation results become more stable, and the median standard deviation decreases to 0.008 (0.001 to 0.038) and 0.005 (0.001 to 0.026), respectively.

V-D Interactive results

Two interactive segmentation results of plexiform neurofibromas with DINs are displayed in Fig. 9. The ground truth contours (manual segmentation) are red, and the prediction contours are yellow. The positive and negative interactive clicks are marked by red and yellow points, respectively. DINs achieve accurate segmentation of multiple tumors with one click and iteratively improve the segmentation with additional interactions. Notice that the scale factor is set to its default value, which is suitable for most neurofibroma segmentation situations.

In Fig. 10, we compare the three interactive methods given the same clicks. (Note: a negative click in the second image has a deviated location compared with the other two methods because the prediction at the original location was empty.) Random Walk tends to produce under-segmentation, while Graph Cut cannot distinguish neurofibromas from normal organs and tends to produce over-segmentation. In comparison, DINs recognize neurofibromas accurately. The two groups of segmentation results support the advantages of DINs.

VI Conclusion

In conclusion, we propose the effective and flexible deep interactive networks (DINs) with a novel exponential distance transform for neurofibroma segmentation on WBMRI. The DINs framework efficiently extracts discriminative tumor features by incorporating user interactions into both low-level and high-level features. The "local transform" ExpDT is better equipped to handle the distribution shift caused by variable image sizes in medical images. Experiments on the training and test sets show that the proposed method outperforms conventional interactive methods and performs significantly better than automated and interactive CNN-based methods. Limitations of DINs include the following: (1) like conventional semi-automatic segmentation methods, DINs still need extra editing tools to achieve an acceptable volume measurement; (2) ExpDT generates guide maps while ignoring the image intensities, which might otherwise help improve the quality of the guide maps; (3) DINs may fail in some cases, such as neurofibromas near the orbits, due to similar intensities. Considering these limitations, integrating anatomical structure into neural networks and combining image intensity into the guide maps may be future directions for developing high-performance interactive neural network methods.

Acknowledgment

Wei Chen is supported by the National Key R&D Program of China under grant No. 2019YFB1404802 and the National Natural Science Foundation of China (61772456). Wenli Cai is supported by grants R42CA192600 and R42CA189637 from the National Institutes of Health and the Children's Tumor Foundation. Pengyi Hao is supported by the National Natural Science Foundation of China under grant No. 61801428. Scott Plotkin received support from the Department of Defense (W81XWH-06-1-0739) and philanthropic funds.

References

  • [1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, and M. Isard (2016) TensorFlow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283.
  • [2] Y. Y. Boykov and M.-P. Jolly (2001) Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. In Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001), Vol. 1, pp. 105–112.
  • [3] W. Cai, S. M. Steinberg, M. A. Bredella, G. Basinsky, B. Somarouthu, S. R. Plotkin, J. Solomon, B. C. Widemann, G. J. Harris, and E. Dombi (2018) Volumetric MRI analysis of plexiform neurofibromas in neurofibromatosis type 1: comparison of 2 methods. Acad. Radiol. 25 (2), pp. 144–152.
  • [4] L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille (2018) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40 (4), pp. 834–848.
  • [5] P. F. Christ, M. E. A. Elshaer, F. Ettlinger, S. Tatavarty, M. Bickel, P. Bilic, M. Rempfler, M. Armbruster, F. Hofmann, M. D'Anastasi, W. H. Sommer, S. Ahmadi, and B. H. Menze (2016) Automatic liver and lesion segmentation in CT using cascaded fully convolutional neural networks and 3D conditional random fields. In Medical Image Computing and Computer-Assisted Intervention, pp. 415–423.
  • [6] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger (2016) 3D U-Net: learning dense volumetric segmentation from sparse annotation. In Medical Image Computing and Computer-Assisted Intervention, pp. 424–432.
  • [7] A. Criminisi, T. Sharp, and A. Blake (2008) GeoS: geodesic image segmentation. In European Conference on Computer Vision, pp. 99–112.
  • [8] L. Grady (2006) Random walks for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (11), pp. 1768–1783.
  • [9] M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P. Jodoin, and H. Larochelle (2017) Brain tumor segmentation with deep neural networks. Med. Image Anal. 35, pp. 18–31.
  • [10] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
  • [11] C. Y. Ho, J. M. Kindler, S. Persohn, S. F. Kralik, K. A. Robertson, and P. R. Territo (2020) Image segmentation of plexiform neurofibromas from a deep neural network using multiple b-value diffusion data. Sci. Rep. 10 (1), pp. 1–10.
  • [12] S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167.
  • [13] F. Isensee, P. F. Jäger, S. A. A. Kohl, J. Petersen, and K. H. Maier-Hein (2020) Automated design of deep learning methods for biomedical image segmentation. arXiv:1904.08128.
  • [14] F. Isensee, J. Petersen, S. A. Kohl, P. F. Jäger, and K. H. Maier-Hein (2019) nnU-Net: breaking the spell on successful medical image segmentation. arXiv:1904.08128.
  • [15] J. Jiang, Y. Hu, C. Liu, D. Halpenny, M. D. Hellmann, J. O. Deasy, G. Mageras, and H. Veeraraghavan (2019) Multiple resolution residually connected feature streams for automatic lung tumor segmentation from CT images. IEEE Trans. Med. Imaging 38 (1), pp. 134–144.
  • [16] M. Jiřík, V. Lukes, M. Svobodova, and M. Železný (2013) Image segmentation in medical imaging via graph-cuts. In 11th International Conference on Pattern Recognition and Image Analysis: New Information Technologies.
  • [17] K. Kamnitsas, C. Ledig, V. F. J. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker (2017) Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. 36, pp. 61–78.
  • [18] M. Kass, A. Witkin, and D. Terzopoulos (1988) Snakes: active contour models. Int. J. Comput. Vis. 1 (4), pp. 321–331.
  • [19] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv:1412.6980.
  • [20] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105.
  • [21] T. Lee, R. L. Kashyap, and C. Chu (1994) Building skeleton models via 3-D medial surface axis thinning algorithms. CVGIP: Graphical Models and Image Processing 56 (6), pp. 462–478.
  • [22] W. Li, G. Wang, L. Fidon, S. Ourselin, M. J. Cardoso, and T. Vercauteren (2017) On the compactness, efficiency, and representation of 3D convolutional networks: brain parcellation as a pretext task. In Information Processing in Medical Imaging, pp. 348–360.
  • [23] X. Li, H. Chen, X. Qi, Q. Dou, C. Fu, and P. Heng (2018) H-DenseUNet: hybrid densely connected UNet for liver and tumor segmentation from CT volumes. IEEE Trans. Med. Imaging 37 (12), pp. 2663–2674.
  • [24] J. Long, E. Shelhamer, and T. Darrell (2015) Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440.
  • [25] F. Milletari, N. Navab, and S. Ahmadi (2016) V-Net: fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571.
  • [26] K. Nazeri, A. Aminpour, and M. Ebrahimi (2018) Two-stage convolutional neural network for breast cancer histology image classification. In International Conference on Image Analysis and Recognition, pp. 717–726.
  • [27] E. Perez, F. Strub, H. de Vries, V. Dumoulin, and A. Courville (2018) FiLM: visual reasoning with a general conditioning layer. Proceedings of the AAAI Conference on Artificial Intelligence 32 (1).
  • [28] O. Ronneberger, P. Fischer, and T. Brox (2015) U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241.
  • [29] T. B. Sekou, M. Hidane, J. Olivier, and H. Cardot (2019) From patch to image segmentation using fully convolutional networks – application to retinal images. arXiv:1904.03892.
  • [30] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556.
  • [31] J. Solomon, K. Warren, E. Dombi, N. Patronas, and B. Widemann (2004) Automated detection and volume measurement of plexiform neurofibromas in neurofibromatosis 1 using magnetic resonance imaging. Comput. Med. Imaging Graph. 28 (5), pp. 257–265.
  • [32] D. Ulyanov, A. Vedaldi, and V. Lempitsky (2016) Instance normalization: the missing ingredient for fast stylization. arXiv:1607.08022.
  • [33] G. Wang, W. Li, M. A. Zuluaga, R. Pratt, P. A. Patel, M. Aertsen, T. Doel, A. L. David, J. Deprest, S. Ourselin, and T. Vercauteren (2018) Interactive medical image segmentation using deep learning with image-specific fine tuning. IEEE Trans. Med. Imaging 37 (7), pp. 1562–1573.
  • [34] G. Wang, M. A. Zuluaga, W. Li, R. Pratt, P. A. Patel, M. Aertsen, T. Doel, A. L. David, J. Deprest, S. Ourselin, and T. Vercauteren (2019) DeepIGeoS: a deep interactive geodesic framework for medical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 41 (7), pp. 1559–1572.
  • [35] L. Weizman, D. Helfer, D. B. Bashat, L. Pratt, L. Joskowicz, S. Constantini, B. Shofty, and L. B. Sira (2014) PNist: interactive volumetric measurements of plexiform neurofibromas in MRI scans. Int. J. Comput. Assist. Radiol. Surg. 9 (4), pp. 683–693.
  • [36] L. Weizman, L. Hoch, D. B. Bashat, L. Joskowicz, L. Pratt, S. Constantini, and L. B. Sira (2012) Interactive segmentation of plexiform neurofibroma tissue: method and preliminary performance evaluation. Med. Biol. Eng. Comput. 50 (8), pp. 877–884.
  • [37] X. Wu, G. Tan, K. Li, S. Li, H. Wen, X. Zhu, and W. Cai (2020) Deep parametric active contour model for neurofibromatosis segmentation. Future Generation Computer Systems 112, pp. 58–66.
  • [38] Y. Wu and K. He (2018) Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV).
  • [39] N. Xu, B. Price, S. Cohen, J. Yang, and T. S. Huang (2016) Deep interactive object selection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 373–381.
  • [40] L. Yang, Y. Wang, X. Xiong, J. Yang, and A. K. Katsaggelos (2018) Efficient video object segmentation via network modulation. In The IEEE Conference on Computer Vision and Pattern Recognition.
  • [41] P. A. Yushkevich, J. Piven, H. C. Hazlett, R. G. Smith, S. Ho, J. C. Gee, and G. Gerig (2006) User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. NeuroImage 31 (3), pp. 1116–1128.
  • [42] M. D. Zeiler, G. W. Taylor, and R. Fergus (2011) Adaptive deconvolutional networks for mid and high level feature learning. In 2011 International Conference on Computer Vision, pp. 2018–2025.