Urinary bladder cancer (BC) is a life-threatening disease with high morbidity and mortality rates [1, 2, 3, 4, 5, 6]. Accurate identification of tumor stage and grade is of extreme clinical importance for the treatment decision and prognosis of patients with BC [5, 6, 7, 8, 9]. The clinical standard reference for this task is optical cystoscopy (OCy) with transurethral resection (TUR) biopsies. However, this procedure is limited by its invasiveness and the discomfort it causes patients. Because of the narrow field-of-view (FOV) for lumen observation and the local characterization of tissue samples, a single transurethral biopsy exhibits relatively high misdiagnosis rates, especially for staging [5, 7, 8, 9]. Recent advances in magnetic resonance imaging (MRI) and image processing technologies have made radiomics methods, which predict tumor stage and grade from image features, a potential alternative for non-invasive evaluation of BC [9, 10, 11, 12, 13].
Previous studies indicate that radiomics descriptors from inner and outer bladder wall (IW and OW) as well as attached tumor regions in MRI images have great potential in reflecting tumorous subtypes, properties or muscle invasiveness [12, 13, 14, 11, 10, 15]. The segmentation of bladder walls and tumors is an important step toward extracting these useful descriptors [9, 10, 12, 16]. Nowadays, the standard reference in clinical routine to segment BC structures in MRI images is based on manual delineation by experts, which is performed in a slice-by-slice manner [9, 10, 12]. However, it is a tedious process requiring a huge amount of human effort. Furthermore, the expert’s experience, imaging parameters like slice thickness, as well as image noise, motion artifacts, weak wall boundaries, or muscle-invasive tumors can impact the delineation efficiency and consistency, which may affect the radiomics-based prediction of BC properties [9, 10, 17, 12, 16]. Therefore, an accurate and automatic multi-region segmentation tool is highly desired for tumor prediction and prognosis in a radiomics perspective [9, 10, 17, 12, 13, 16].
|Reference||Method||Target region|
|Li et al., 2004||Markov random field (MRF)||IW|
|Li et al., 2008||Markov random field (MRF)||IW|
|Duan et al., 2010||Coupled level-sets||IW/OW|
|Chi et al., 2011||Coupled level-sets||IW/OW|
|Garnier et al., 2011||Active region growing||IW|
|Ma et al., 2011||Geodesic active contour (GAC) + shape-guided Chan-Vese||IW/OW|
|Duan et al., 2012||Coupled level-set + bladder wall thickness prior||Tumor|
|Han et al., 2013||Adaptive Markov random field (MRF) + coupled level-set||IW/OW|
|Qin et al., 2014||Coupled directional level-sets||IW/OW|
|Xiao et al., 2016||Coupled directional level-sets + fuzzy c-means||IW/OW/Tumor|
|Xu et al., 2017||Continuous max-flow + bladder wall thickness prior||IW/OW|
The automatic delineation of IW and OW in MRI images remains a challenging task due to large bladder shape variations, strong intensity inhomogeneity in urine caused by motion artifacts, weak boundaries and complex background intensity distribution (Fig. 1) [9, 10, 15]. When the presence of cancer is also considered, the problem becomes much harder, as it introduces more variability across the population. This might explain why literature on multi-region bladder segmentation remains scarce, with few techniques proposed to date (Table I). Initial attempts considered the use of Markov random fields to tackle the segmentation of the IW [18, 19]. Garnier et al. proposed a fast deformable model based on active region growing that solved the leakage issue of standard region growing algorithms. The algorithm combined an inflation force, which acts like a region growing process, and an internal force that constrains the shape of the surface. However, it would be difficult to apply these approaches directly to OW segmentation due to the complex distribution of tissues surrounding the bladder.
Several level-set based segmentation methods have also been introduced to extract both inner and outer bladder walls [9, 20, 23, 10]. Duan et al. developed a coupled level-set framework that adopts a modified Chan-Vese model to locate both IW and OW from T1-weighted MRI in a 2D slice-wise fashion. Chi et al. later applied a geodesic active contour (GAC) model to T2-weighted MRI images to segment the IW, and then coupled the constraint of maximum wall thickness in T1-weighted MRI images to segment the OW. The limitation of this work arises from the difficulty of registering slices between the two sequences. To overcome this limitation, Qin et al. proposed an adaptive shape prior constrained level-set algorithm that evolves both IW and OW simultaneously from T2-weighted images. Despite its precision, this algorithm can be sensitive to initialization. In an extension of these approaches, Xiao et al. introduced a second step based on fuzzy c-means to include tumor segmentation in the pipeline. However, this extended method showed inconsistent results across datasets.
While popular, level-sets present some important drawbacks. First, these variational approaches are based on local optimization techniques, making them highly sensitive to initialization and image quality. Second, if multiple objects are embedded in another object, multiple initializations of the active contours are required, which is time-consuming. Third, if there are gaps in the target, evolving contours may leak into those gaps and represent objects with incomplete contours. Finally, processing times can be prohibitive, particularly in medical applications where segmentation is typically performed on volumes. As reported in previous works, segmentation times usually exceed 20 minutes for a single 3D volume. As an alternative, a modified geodesic active contour (GAC) model and a shape-guided Chan-Vese model were proposed to segment bladder walls. More recently, Xu et al. introduced a continuous max-flow framework with global convex optimization to achieve a more accurate segmentation of both IW and OW. Nevertheless, the high sensitivity to initialization of all previous methods makes the full automation of segmentation very challenging. Further, most methods focus only on bladder walls and are unable to segment bladder walls and tumors simultaneously.
Deep learning has recently emerged as a powerful modeling technique, demonstrating significant improvements in various computer vision tasks such as image classification, object detection and semantic segmentation. In particular, convolutional neural networks (CNNs) have been applied with enormous success to many medical image segmentation problems [29, 30, 31, 32, 33]. Bladder segmentation has also been addressed with deep learning techniques, but the image modality studied has mainly been limited to computed tomography (CT) [34, 35, 36]. For example, Cha et al. proposed a CNN followed by a level-set method to segment the IW and OW. Considering the significant advantages of MRI, including its high soft-tissue contrast and absence of ionizing radiation, it may be more suitable for the characterization of bladder wall and tumor properties. Surprisingly, the application of deep learning to the multi-region segmentation of bladder cancer in MRI images remains, to the best of our knowledge, unexplored.
In light of the limitations of state-of-the-art methods, and inspired by the success of deep learning in medical image segmentation, we propose to address the task of multi-region bladder segmentation in MRI using a CNN. Specifically, we use a deep CNN that builds on UNet, a well-established model for segmentation which combines a contracting path and an expansive path to get a high-resolution output of the same size as the input. To increase the receptive field spanned by the network, we propose to use a sequence of progressively dilated convolutional layers. As motivated in prior work, aggressively increasing dilation factors might fail to aggregate local features due to the sparsity of the kernel, which is detrimental for small objects. Thus, we hypothesize that slowly increasing the dilation rate along the convolutions within each block may decrease sparsity in the dilated kernels, thereby capturing more context while preserving the resolution of the analyzed region. As both large and small tumor regions are present in the images of the current study, this strategy benefits both cases. It enables us to span broader regions of input images without incorporating large dilation rates that can degrade segmentation performance. The current work is the first attempt to apply CNNs to the multi-region segmentation of bladder cancer in MRI.
2.1 Fully convolutional neural networks
CNNs are a special type of artificial neural network that learns a hierarchy of increasingly complex features by successive convolution, pooling and non-linear activation operations [krizhevsky2012imagenet, lecun1998gradient]. Originally designed for image recognition and classification, CNNs are now commonly used in semantic image segmentation. A naive approach follows a sliding-window strategy where regions defined by the window are processed independently. This technique presents two main drawbacks: reduced segmentation accuracy and low efficiency. An alternative approach, known as fully convolutional networks (FCNs) [FCN], mitigates these limitations by considering the network as a single non-linear convolution that is trained in an end-to-end fashion. An important advantage of FCNs compared to standard CNNs is that they can be applied to images of arbitrary size. Moreover, because the spatial map of class scores is obtained in a single dense inference step, FCNs can avoid redundant convolution operations, making them computationally more efficient.
The networks explored in this work are built on the UNet architecture, which has shown outstanding performance in various medical segmentation tasks [39, 40, 41, 42, 43]. This network consists of a contracting and expanding path, the former collapsing an image down into a set of high level features and the latter using these features to construct a pixel-wise segmentation mask. The original architecture also proposed skip-connections between layers at the same level in both paths, by-passing information from early feature maps to the deeper layers in the network. These skip-connections allow incorporating high level features and fine pixel-wise details simultaneously.
Unlike in natural images, targeting clinical structures in segmentation tasks requires a certain knowledge of the global context. This information may, for example, indicate how an organ is arranged with respect to other ones. Standard convolutions have difficulty integrating global context, even when pooling operations are sequentially added into the network. For instance, in the original UNet model, the receptive field spanned by the deepest layer is only $128 \times 128$ pixels. This means that the context of the entire image is not fully considered in the deep architecture to generate its final prediction. A straightforward solution for increasing the receptive field is to include additional pooling operations in the network. However, this strategy usually decreases the performance since relevant information is lost in the added down-sampling operations.
2.2 Dilated convolutions
Dilated convolutions have been adopted for semantic segmentation to increase the receptive field of deep CNNs, as an alternative to down-sampling feature maps. The main idea is to insert “holes” (i.e., zeros) between pixels in convolutional kernels, which keeps a higher resolution in intermediate feature maps and thus enables dense feature extraction in deep CNNs with an enlarged field of view of the convolutional kernels (Fig. 2). This ultimately leads to more accurate predictions [45, 46, 47, 48, 49, 50].
Consider a convolutional kernel $K_l$ in layer $l$ with a size of $k \times k$. The receptive field of $K_l$, also known as the effective kernel size, can be defined as
$$RF_{K_l} = k + (k - 1)(r_l - 1),$$
where $r_l$ represents the dilation rate of kernel $K_l$, which determines the number of zeros (or holes) placed between pixels: $r_l - 1$ zeros are inserted between adjacent kernel elements. Note that, in standard convolutions, $r_l$ is equal to 1. Furthermore, the stride is considered equal to 1 for simplicity.
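As a quick numerical check of the effective kernel size, the following minimal sketch assumes the usual form $k + (k - 1)(r - 1)$ for a $k \times k$ kernel with dilation rate $r$ and stride 1. The function name `effective_kernel_size` is illustrative and is not taken from the original implementation.

```python
def effective_kernel_size(k: int, rate: int) -> int:
    """Effective side length of a k x k kernel with dilation rate `rate` (stride 1)."""
    if k < 1 or rate < 1:
        raise ValueError("kernel size and dilation rate must be positive integers")
    # rate - 1 zeros are inserted into each of the k - 1 gaps between kernel elements.
    return k + (k - 1) * (rate - 1)


# Effective size of a 3 x 3 kernel for the dilation rates discussed in the text.
sizes = {rate: effective_kernel_size(3, rate) for rate in (1, 2, 4, 8)}
print(sizes)  # {1: 3, 2: 5, 4: 9, 8: 17}
```

With a rate of 1 the expression reduces to $k$, which matches the standard convolution case.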
2.3 Architecture details
In this study, we propose to use dilated convolutions in a CNN architecture based on UNet. To evaluate the impact of dilated convolutions on segmentation performance, several models are investigated. First, we employ the original UNet implementation and a modified version that will serve as baselines (Section 2.3.1). Then, in the third network, the first standard convolution of each block in the baseline model is replaced by a dilated convolution (Section 2.3.2). For the proposed model, the entire standard block in the baseline is replaced by the proposed progressive dilated convolutional block (Section 2.3.3). Furthermore, we compare the proposed method to state-of-the-art segmentation architectures in the literature. In particular, we investigate related deep convolutional neural networks that include factorized convolutions, namely ERFNet and ENet.
2.3.1 UNet baselines
We employ the original version of UNet, with the exception of using 32 kernels in the first layer instead of 64, as a baseline network, which will be denoted as UNet-Original. In addition, we included three main modifications to the original version in a second baseline, denoted as UNet-Baseline. First, we employ convolutions with stride 2 instead of max-pooling in the contracting path. Second, the deconvolutional blocks in the decoding path are replaced by upsampling and convolutional blocks, which have been shown to improve performance (Fig. 3,a). Third, to obtain a more compact representation of learned features, a bottleneck block with residual connections (Fig. 3,b) is introduced between the contracting and expanding paths. The objective of these connections is to let information flow from the block’s input to its output without modification, thus encouraging the path through non-linearities to learn a residual representation of the input data. As in UNet-Original, the number of kernels in the first convolutional block is reduced from 64 to 32: no improvement was observed with the heavier model, and the smaller one is more efficient.
Furthermore, each convolutional layer in the proposed models applies batch normalization. By reducing variations between training samples in mini-batch learning, this technique was shown to accelerate convergence of the parameter learning process and make the model more robust in testing. In addition, all activation functions in our networks are parametric rectified linear units (PReLUs).
2.3.2 Dilated UNet
Our first dilated CNN model follows the general architecture of UNet, but introduces a context module at each block of the encoding path. The context module contains a dilated convolution as the first operation of each block to systematically aggregate multi-scale contextual information. An inherent problem when employing dilated convolutions is gridding (Fig. 4, top). As zeros are padded between pixels in a dilated convolutional kernel, the receptive field spanned by this kernel only covers an area with a checkerboard pattern, sampling only locations with non-zero values. This results in the loss of neighboring information, which might be relevant for effective learning. As the dilation rate increases, this issue becomes worse, since the convolution kernel becomes too sparse to capture any local information. To alleviate this problem, we follow the strategy proposed in other works [52, 51], where dilated convolutions are alternated with standard convolutions and dilation rates are progressively increased. Therefore, the dilation rates in the convolutional blocks of this model are equal to 1, 2, 4 and 8, from shallow to deep layers, respectively.
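The gridding effect can be illustrated with a small sketch that lists the input offsets sampled by a single dilated kernel and the fraction of its effective window that those samples cover. This is an illustrative calculation for odd kernel sizes only; the helper names `sampled_offsets` and `sampling_density` are hypothetical and do not come from the original code.

```python
def sampled_offsets(k: int, rate: int) -> list:
    """Input offsets (row, col), relative to the kernel centre, read by a dilated k x k kernel."""
    half = k // 2  # assumes an odd kernel size
    return [(i * rate, j * rate)
            for i in range(-half, half + 1)
            for j in range(-half, half + 1)]


def sampling_density(k: int, rate: int) -> float:
    """Fraction of the effective window that is actually sampled by the kernel."""
    window = k + (k - 1) * (rate - 1)  # effective kernel size
    return (k * k) / (window * window)


for rate in (1, 2, 4, 8):
    print(rate, len(sampled_offsets(3, rate)), round(sampling_density(3, rate), 3))
```

The number of sampled positions stays at 9 for a $3 \times 3$ kernel while the window grows, so the density drops from 1.0 at rate 1 to 0.36 at rate 2 and keeps falling for larger rates.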
2.3.3 UNet with progressive dilated convolutional blocks
Instead of gradually increasing the dilation factor through different layers, we propose to increase it within each context module. The main idea is that the features learned at each block are then able to capture multi-scale information. Therefore, at each block, the dilation rates are equal to 1, 2 and 4. With this, we avoid including large dilation values that span broader regions, while maintaining the same network receptive field.
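To make the intuition concrete, the sketch below computes, in one dimension, which input positions contribute to a single output of a stack of 3-tap convolutions with given dilation rates (stride 1). It is a simplified illustration under these assumptions, not a reproduction of the receptive field of the full network, which also depends on the strided convolutions. The function names are hypothetical.

```python
def stack_footprint(rates, k: int = 3) -> set:
    """Input offsets that reach one output of stacked 1D dilated convolutions (stride 1)."""
    half = k // 2  # assumes an odd kernel size
    offsets = {0}
    for rate in rates:
        offsets = {o + tap * rate for o in offsets for tap in range(-half, half + 1)}
    return offsets


def describe(rates, k: int = 3):
    """Return (span of the footprint, fraction of positions inside the span that are used)."""
    offsets = stack_footprint(rates, k)
    span = max(offsets) - min(offsets) + 1
    return span, len(offsets) / span


print(describe((1, 2, 4)))  # progressive rates inside one block
print(describe((8, 1, 1)))  # one large dilation followed by standard convolutions
print(describe((4, 4, 4)))  # constant dilation, shown only as a gridding example
```

In this toy setting the progressive rates 1, 2 and 4 cover every position in a span of 15, whereas a constant rate of 4 touches only 7 of 25 positions, which is the gridding pattern described in Section 2.3.2.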
Figure 5 gives the schematic of the proposed model. As shown, the deep network consists of two main components: an encoder and a decoder. While the encoder path learns the visual features of the input data, the decoder path is responsible for creating the dense segmentation mask and recovering the original resolution. The encoder path is composed of 4 convolutional blocks, each one containing 3 convolutional layers, followed by a strided convolution and a bridge block which contains two convolutional layers and a residual block (Fig. 3,b), generating a structure with a depth of 16 layers. On the other hand, the decoding path has 17 convolutional layers distributed as follows: 4 upsampling modules, each containing 4 convolutional layers, followed by a $1 \times 1$ convolution before the softmax layer. All the other convolutional layers are composed of $3 \times 3$ filters. The dilation rate is shown at the bottom of the convolutional layers in the first block. In the rest of the network, blocks with the same color correspond to the same dilation rate.
3.1.1 Patient population
The study was approved by the Ethics Committee of Tangdu Hospital of the Fourth Military Medical University. Informed consent was obtained from each enrolled subject. Sixty patients with pathologically-confirmed BC lesions between October 2013 and May 2016 were involved in this study (Table II). Among them, 12 patients had multiple focal bladder tumors, including 9 patients with two tumor sites, two patients with three tumor sites, and one patient with four tumor sites. A total of 76 BC lesions were identified, with diameters ranging from 0.57 to 6.05 cm (Table II).
|Patients, No.(%)||60 (100%)|
|Male, No.(%)||53 (88.33%)|
|Female, No.(%)||7 (11.67%)|
|Age, Median (Range), yrs||67 [42, 81]|
|Tumors, No.(%)||76 (100%)|
|Stage T1 or lower, No.(%)||16 (21.05%)|
|Stage T2, No.(%)||42 (55.26%)|
|Stage T3 or higher, No.(%)||18 (23.69%)|
|Tumor Size, Median (Range), cm||2.22 [0.57, 6.05]|
3.1.2 Image acquisition
All subjects were examined before treatment with a clinical whole-body MR scanner (GE Discovery MR 750 3.0T) with a phased-array body coil. A high-resolution 3D Axial Cube T2-weighted (T2W) MR sequence was adopted due to its high soft tissue contrast and relatively fast image acquisition. Prior to scanning, each patient was asked to drink enough mineral water and then waited for an adequate time period so that the bladder was sufficiently distended. The acquisition time ranged from 160.456 to 165.135 s with three-dimensional scanning. The repetition and echo times were 2500 ms and 135 ms, respectively. Each scan contained from 80 to 124 slices, each of size $512 \times 512$ pixels, with a pixel resolution of $0.5 \times 0.5$ mm. Moreover, the slice thickness was 1 mm, and the spacing between slices was also 1 mm.
3.1.3 Ground truth
For each dataset, the urine, bladder walls and tumor regions were manually delineated by two experts with 9 years of experience in MR image interpretation, using a custom-developed package of MATLAB 2016b. Particularly, during the delineation process, all the target regions, including IW, OW and tumor regions, were first independently outlined slice-by-slice by the two experts who were blinded to the pathological results of the patient. Afterwards, both their delineations were mapped to the corresponding images with different contour colors. Then, the two experts worked together to achieve a consensus by modifying their delineations, referring to the corresponding pathological results or the functional MRI images (if available) with the corresponding slice identified by the registration of functional MRI with T2W image data sets. If disagreements on the delineations remained, the average of the two delineations was computed.
Similarity between two segmentations can be assessed by employing several comparison metrics. Since each of these yields different information, their choice is very important and must be considered in the appropriate context. The Dice similarity coefficient (DSC) has been widely used to compare volumes based on their overlap. The DSC for two volumes $A$ and $B$ can be defined as
$$\mathrm{DSC}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}.$$
However, volume-based metrics generally lack sensitivity to the segmentation outline, and segmentations showing a high degree of spatial overlap might present clinically-relevant differences between their contours. This is particularly important in medical applications, such as radiation treatment planning, where contours serve as critical input to compute the delivered dose or to estimate prognostic factors. An additional analysis of the segmentation outline’s fidelity is highly recommended, since any over-inclusion of the target region might lead to a higher radiation exposure in healthy tissues and, conversely, an under-inclusion might lead to tumor regions not being sufficiently irradiated. Thus, distance-based metrics like the average symmetric surface distance (ASSD) were also considered in our evaluation. The ASSD between contours $A$ and $B$ is defined as follows:
$$\mathrm{ASSD}(A, B) = \frac{1}{|A| + |B|} \left( \sum_{a \in A} \min_{b \in B} d(a, b) + \sum_{b \in B} \min_{a \in A} d(b, a) \right),$$
where $d(a, b)$ is the distance between point $a$ and point $b$.
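The two metrics can be sketched in a few lines of plain Python, assuming the standard definitions: DSC as twice the overlap divided by the sum of the two region sizes, and ASSD as the mean of the nearest-neighbor Euclidean distances taken in both directions. Regions and contours are represented here as collections of pixel coordinates, and the function names `dice` and `assd` are illustrative; this is not the evaluation code used in the study.

```python
import math


def dice(region_a: set, region_b: set) -> float:
    """Dice similarity coefficient between two sets of pixel coordinates."""
    if not region_a and not region_b:
        return 1.0  # convention chosen for this sketch: two empty regions agree
    return 2 * len(region_a & region_b) / (len(region_a) + len(region_b))


def assd(contour_a, contour_b) -> float:
    """Average symmetric surface distance between two non-empty contours."""
    if not contour_a or not contour_b:
        raise ValueError("both contours must contain at least one point")

    def directed_sum(src, dst):
        # Sum over src of the Euclidean distance to the closest point in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src)

    total = directed_sum(contour_a, contour_b) + directed_sum(contour_b, contour_a)
    return total / (len(contour_a) + len(contour_b))


# Two 4 x 4 squares shifted by one pixel along x.
square_a = {(x, y) for x in range(0, 4) for y in range(4)}
square_b = {(x, y) for x in range(1, 5) for y in range(4)}
print(dice(square_a, square_b))  # 0.75

# Two vertical segments one pixel apart.
line_a = [(0, 0), (0, 1), (0, 2)]
line_b = [(1, 0), (1, 1), (1, 2)]
print(assd(line_a, line_b))  # 1.0
```

The brute-force nearest-neighbor search is quadratic in the number of contour points, which is acceptable for a toy example but would usually be replaced by a distance transform or a spatial index on full volumes.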
3.2 Implementation details
All networks were trained from scratch by employing the Adam optimizer with $\beta$ values equal to 0.9 and 0.99, minimizing the cross-entropy between the predicted probability distributions and the ground truth. Weights were initialized as in. The learning rate was initially set to 1e, and then halved each time 20 epochs passed without improvement on the validation set. During training, four images were employed for each mini-batch. All three models were implemented in PyTorch and experiments were run on a machine equipped with an NVIDIA TITAN X with 12 GB of memory.
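The learning rate schedule described above can be sketched as a small helper that halves the rate after a fixed number of epochs without improvement. This is a minimal sketch under stated assumptions: the class `HalveOnPlateau` is hypothetical, the monitored validation score is assumed to be one where higher is better, and the initial rate of 1.0 is a placeholder rather than the value used in the experiments.

```python
class HalveOnPlateau:
    """Halve the learning rate after `patience` consecutive epochs without improvement."""

    def __init__(self, initial_lr: float, patience: int = 20, factor: float = 0.5):
        self.lr = initial_lr
        self.patience = patience
        self.factor = factor
        self.best = float("-inf")   # best validation score seen so far
        self.stale_epochs = 0       # epochs since the last improvement

    def step(self, validation_score: float) -> float:
        """Record one epoch's validation score and return the learning rate to use next."""
        if validation_score > self.best:
            self.best = validation_score
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.lr *= self.factor
                self.stale_epochs = 0
        return self.lr


# Placeholder initial rate; one improving epoch followed by 20 stale epochs.
scheduler = HalveOnPlateau(initial_lr=1.0, patience=20)
scheduler.step(0.50)
rates = [scheduler.step(0.40) for _ in range(20)]
print(rates[18], rates[19])  # 1.0 0.5
```

In a training loop, `step` would be called once per epoch with the validation metric, and the returned value assigned to the optimizer's learning rate.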
The performance of the UNet-Progressive model was compared with that of the UNet-Baseline and UNet-Dilated models introduced in Sections 2.3.1 and 2.3.2, as well as against the original UNet implementation, ENet and ERFNet. For these experiments, the dataset was split into training, validation and testing sets composed of 40, 5 and 15 patients, respectively. These sets remained the same for training and testing all models. Figure 6 depicts the evolution of the DSC measured on the validation set at different training epochs. In general, all models yield similar performance for the inner wall (IW) and outer wall (OW) regions. However, in the case of tumor regions, the UNet-Progressive model obtained a higher accuracy than other models once the training converged.
Table III reports the accuracy in terms of DSC and ASSD obtained by all evaluated models on the testing set. The results show that the three models achieve comparable performance on inner and outer wall segmentation. However, as observed on the validation set, the UNet-Progressive model performed better than the baseline and UNet-Dilated when segmenting the tumor. Results from a one-tailed Wilcoxon signed-rank test between the proposed network with progressive dilated convolutions and other models are shown in Table IV. We see that our UNet-Progressive network is statistically superior to standard UNet for all regions and metrics. In all but one case, it also gives statistically better performance than ENet, which employs similar dilated modules. The advantage of our method is particularly substantial for tumor regions, where it statistically outperforms all compared methods except UNet-Dilated. Improvements for this region are especially pronounced for the ASSD metric (see Table III), which accounts for clinically-relevant differences between contours.
|Model||DSC, Inner Wall||DSC, Outer Wall||DSC, Tumor||ASSD, Inner Wall||ASSD, Outer Wall||ASSD, Tumor|
|UNet-Original||0.9701 ± 0.0130||0.7969 ± 0.0492||0.5638 ± 0.1646||0.4260 ± 0.1749||0.5590 ± 0.1673||4.4298 ± 2.6382|
|UNet-Baseline||0.9839 ± 0.0030||0.8344 ± 0.0214||0.6276 ± 0.0963||0.3379 ± 0.0796||0.4503 ± 0.0919||3.7432 ± 1.6923|
|UNet-Dilated||0.9844 ± 0.0030||0.8386 ± 0.0232||0.6791 ± 0.0818||0.3210 ± 0.0632||0.4238 ± 0.0725||3.4320 ± 1.9224|
|UNet-Progressive*||0.9836 ± 0.0033||0.8391 ± 0.0247||0.6856 ± 0.0827||0.3517 ± 0.0874||0.4299 ± 0.0859||2.8352 ± 1.1865|
|ENet||0.9788 ± 0.0082||0.8065 ± 0.0446||0.6185 ± 0.1436||0.3775 ± 0.1245||0.5046 ± 0.1452||4.3640 ± 3.5614|
|ERFNet||0.9822 ± 0.0038||0.8367 ± 0.0378||0.6412 ± 0.1192||0.3398 ± 0.1069||0.4377 ± 0.1314||3.4588 ± 2.1505|
|* UNet-Progressive corresponds to our proposed method.|
Figure 7 details the distribution of DSC and ASSD values for the tested models. In these plots, we can first observe how adding dilated convolutions to standard UNet architectures improves performance, which is reflected in the distribution of segmentation accuracy values. Further, if dilated convolutions are added progressively, as in the proposed modules, the distribution of values remains more compact (i.e., the variance is smaller). This is more prominent for results on tumor regions in both DSC and ASSD distribution plots.
To visualize the impact of including dilated convolutions in the standard way, or including progressive dilated modules, Fig. 8 shows segmentation results of the baseline UNet, UNet with standard dilated convolutions, and the proposed model. These results illustrate the variable sizes of tumors, some of them quite small and thus hard to segment (e.g., the tumor in the bottom row). Once again, we see that the three models achieve similar segmentations for inner and outer walls, and that differences arise when comparing the tumor segmentations provided by the models. Even though the tumor is typically identified by all the models, the proposed UNet-Progressive model produces the contours closest to the ground truth. UNet underestimates the tumor region in two of the three examples, and generates a blobby contour in the third case (top). UNet-Dilated improves results compared to the version without dilated convolutions, but fails to separate outer walls from tumor regions in some cases (top of the figure). By employing progressive dilated modules, our UNet-Progressive network can successfully differentiate tumor and outer walls, as shown in the top-right image of Fig. 8.
To show that adding the proposed progressive dilated convolution modules does not introduce a burden on computation time, we compared the different UNet-based architectures in terms of efficiency (Table V). We observe that the inference time per 2D slice is very similar across the three deep models. Taking into account that a volume contains between 80 and 124 2D slices, the segmentation of a whole volume was performed in less than a second, regardless of the architecture.
In this study, a deep CNN model with progressive dilated convolutional modules was proposed to segment multiple regions in MRI images of BC patients. The proposed network extends the well-known UNet model by including dilated convolutions where the dilation rate within each module increases progressively. We evaluated our model on MRI image datasets acquired from an in-house cohort of 60 patients with BC. The results show that the proposed approach achieves state-of-the-art accuracy compared to existing approaches for this task, in a fraction of the time. Additionally, when compared against similar networks, our architecture also demonstrated outstanding performance, particularly for tumor regions.
The tested models have shown similar results for the segmentation of the inner and outer bladder walls. However, a significant improvement was observed between the UNet-based models for tumor segmentation, particularly between the baselines and models with dilated convolutions. This improvement is due to the larger receptive field provided by dilated convolutions, which leverages more contextual information. When using progressive dilated convolutions, the ability to span similar-sized regions while avoiding large dilation rates (which insert many holes between neighboring pixels) might explain the improved accuracy compared to models with standard dilated convolutions.
The proposed model also outperformed recent deep architectures that incorporate dilated convolutions with different dilation rates, such as ENet or ERFNet. It is important to note that those networks were developed to achieve a trade-off between performance and inference time. Therefore, their limited number of learnable parameters may explain differences with the architecture proposed in this work. In addition, unlike our model, these networks do not implement skip connections between layers from the encoder and the decoder, which were shown to be important for recovering spatial information lost during downsampling.
Direct comparison with previous state-of-the-art methods for this task is challenging. First, as shown in Table I, research on automatic segmentation of multiple bladder regions in MRI images remains very limited. Moreover, different performance metrics were used to evaluate the few existing approaches, and some works only reported qualitative results [21, 23], which are subject to user interpretation. This makes it difficult to perform a fair and complete comparison between the proposed model and previous approaches for this problem. For example, Ma et al. reported a mean DSC of 0.97 for the IW, but performance on the OW was not assessed. More recently, Qin et al. evaluated their method on 11 subjects, reporting mean DSC values of 0.96 and 0.71 for IW and OW, and an average surface distance (ASD) of 1.45 and 1.94 for IW and OW, respectively. In another study, Xu et al. achieved a mean DSC of 0.87 measured on IW and OW. In light of the advantages of our models with respect to the state of the art, we believe that approaches similar to those proposed in this work should now be considered to assess the segmentation of BC images.
Although our results demonstrated the high performance of the proposed models, there are some regions where segmentation might not be satisfactory in a clinical setting (e.g., Fig. 9). In particular, these situations are observed at both the upper and lower extremes. We believe that these regions are more challenging to segment because of the thicker appearance of the walls. As segmentation is performed on 2D slices, there exists an imbalance in the number of samples representing the extremes of the urine region, where outer walls appear thicker in these images (see Fig. 9, right column), with respect to middle regions. This might bias the CNN towards providing thinner wall regions across all the 2D slices. To improve CNN-based segmentation, recent works have cast the probability maps from the CNN as unary potentials in an energy minimization framework [30, 49, 62]. In these works, length is typically employed as a regularizer in the energy function. More complex regularizers, e.g., convexity or compactness, have been shown to further boost the performance of segmentation techniques. Employing such regularizers may improve performance in the current application given the compact shape of the bladder. Furthermore, recent works have shown that combining several deep models can lead to important improvements in several segmentation tasks [65, 66, 67]. This promising strategy could also be investigated to improve performance, especially for the difficult task of segmenting bladder tumor regions.
Given the 3D nature of the volumetric data in this particular application, performing 2D segmentation in a slice-wise manner discards the anatomic context in directions orthogonal to the 2D plane. Adopting a 3D approach is, a priori, more appropriate in order to account for volumetric information. Nevertheless, in the context of the proposed approach, the large receptive field achieved by the proposed network makes the dimension of the input patch in 3D intractable. Due to memory limitations in current resources, 3D convolutional neural networks typically take input sub-volumes of size $64 \times 64 \times 64$. Taking into account that the proposed progressive dilated network enlarges the receptive field, the input patch should be at least 267 pixels per dimension, which cannot be efficiently allocated in our GPU facilities.
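A back-of-the-envelope estimate gives a sense of the scale involved. The sketch below assumes 32-bit floating point activations and 32 feature maps at full resolution (the number of kernels in the first block), and counts the output of a single layer only; gradients and the activations of all other layers, which training also has to store, are ignored. The function name `activation_bytes` is illustrative.

```python
def activation_bytes(side: int, channels: int, bytes_per_value: int = 4) -> int:
    """Memory needed to hold one layer's feature maps for a cubic patch of the given side."""
    return side ** 3 * channels * bytes_per_value


GIB = 1024 ** 3
for side in (64, 267):
    size = activation_bytes(side, channels=32)
    print(f"{side}^3 voxels x 32 feature maps: {size / GIB:.2f} GiB")

# Ratio between the two patch volumes.
print(round(267 ** 3 / 64 ** 3, 1))  # 72.6
```

Under these assumptions, a single full-resolution layer needs about 2.3 GiB for a patch of 267 voxels per side, against about 0.03 GiB for a patch of 64 voxels per side, so a network with dozens of layers quickly exceeds the 12 GB available on the GPU used here.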
A main limitation of the current study is the limited size of the dataset employed in our experiments. First, the small number of subjects used for training (i.e., 45 MRI scans) is insufficient to capture the high variability of tumor regions across the population, as demonstrated by the results. Second, the data were acquired with the same scanner and the same imaging parameters, which may limit the generalization of the proposed scheme and impair its overall segmentation performance. A larger validation group, including datasets acquired at multiple clinical centers with different scanners and imaging parameters, would further demonstrate its potential in real clinical applications.
Even though segmentation is a fundamental task in the medical field, it rarely represents the final objective of the clinical pipeline. In the assessment of bladder cancer patients, segmentation of the IW, OW and tumor is employed to evaluate the muscle invasiveness and grade of BC, which play a crucial role in treatment decisions and prognosis [12, 13, 68, 11]. In future work, we aim to combine the proposed multi-region segmentation scheme with a radiomics strategy for the pre-operative and automatic evaluation of BC.
We proposed an approach using progressive dilated convolution blocks for the multi-region semantic segmentation of bladder cancer in MRI. Progressive dilated blocks achieve the same receptive field as standard dilated blocks but with lower dilation rates. The proposed network achieved higher accuracy than approaches using standard dilated convolutions, particularly when segmenting tumors. Moreover, the proposed model outperformed state-of-the-art methods for the task at hand, bringing three important advantages: i) it segments multiple BC regions simultaneously, ii) it requires no contour initialization, and iii) it is computationally efficient, e.g., 2-3 orders of magnitude faster than current approaches based on level sets. In summary, deep CNNs in general, and the proposed network in particular, are very suitable for this task.
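The claim that a given receptive field can be reached with lower dilation rates can be checked with the standard receptive-field formula for stacked stride-1 convolutions. The sketch below is a generic illustration, not the exact configuration of the proposed network: the dilation rates (a single layer with rate 7 versus three layers with rates 1, 2 and 4) are hypothetical values chosen only because they yield the same receptive field.

```python
from typing import Sequence


def receptive_field(dilations: Sequence[int], kernel_size: int = 3) -> int:
    """Receptive field (along one axis) of stacked stride-1 convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Hypothetical rates, for illustration only.
single_high_rate = [7]        # one 3x3 convolution with a large dilation
progressive_rates = [1, 2, 4] # three 3x3 convolutions with growing dilation

rf_single = receptive_field(single_high_rate)
rf_progressive = receptive_field(progressive_rates)

print(rf_single, rf_progressive, max(progressive_rates))
```

Both configurations cover 15 pixels per axis, but the stacked variant never uses a rate above 4. Lower rates leave smaller gaps between the sampled positions of each kernel.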
This work is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant program, the National Natural Science Foundation of China under grant No. 81230035, the National Key Research and Development Program of China under grant No. 2017YFC0107400, a Key Project supported by the Military Science and Technology Foundation under grant No. BWS14C030, and the ETS Research Chair on Artificial Intelligence in Medical Imaging.
-  S. Antoni, J. Ferlay, I. Soerjomataram, A. Znaor, A. Jemal, and F. Bray, “Bladder cancer incidence and mortality: A global overview and recent trends,” European Urology, vol. 71, no. 1, p. 96, 2017.
-  S. Woo, C. H. Suh, S. Y. Kim, J. Y. Cho, and S. H. Kim, “Diagnostic performance of MRI for prediction of muscle-invasiveness of bladder cancer: A systematic review and meta-analysis,” European Journal of Radiology, pp. 46–55, 2017.
-  W. J. Alfred, T. Lebret, E. M. Compérat, N. C. Cowan, S. M. De, H. M. Bruins, V. Hernández, E. L. Espinós, J. Dunn, and M. Rouanne, “Updated 2016 EAU guidelines on muscle-invasive and metastatic bladder cancer,” European Urology, 2016.
-  American Cancer Society, “Cancer Facts & Figures 2016,” Cancer Facts & Figures 2016, pp. 1–9, 2016.
-  A. M. Kamat, N. M. Hahn, J. A. Efstathiou, S. P. Lerner, P.-u. Malmström, W. Choi, C. C. Guo, and Y. Lotan, “Bladder cancer,” The Lancet, vol. 388, 2016.
-  W. Choi, S. Porten, S. Kim, D. Willis, E. R. Plimack, B. Roth, T. Cheng, M. Tran, I.-l. Lee, J. Melquist, J. Bondaruk, T. Majewski, S. Zhang, S. Pretzsch, and K. Baggerly, “Identification of distinct basal and luminal subtypes of muscle-invasive bladder cancer with different sensitivities to frontline chemotherapy,” Cancer Cell, vol. 25, no. 2, pp. 152–165, 2015.
-  M. A. Knowles and C. D. Hurst, “Molecular biology of bladder cancer: new insights into pathogenesis and clinical diversity,” Nature Reviews Cancer, vol. 15, no. 1, pp. 25–41, 2015. [Online]. Available: http://dx.doi.org/10.1038/nrc3817
-  Cancer Genome Atlas Research Network and others, “Comprehensive molecular characterization of urothelial bladder carcinoma,” Nature, vol. 507, no. 7492, pp. 315–322, 2014. [Online]. Available: http://dx.doi.org/10.1038/nature12965
-  C. Duan, Z. Liang, S. Bao, H. Zhu, S. Wang, G. Zhang, J. J. Chen, and H. Lu, “A coupled level set framework for bladder wall segmentation with application to MR cystography,” IEEE Transactions on Medical Imaging, vol. 29, no. 3, pp. 903–915, 2010.
-  X. Qin, X. Li, Y. Liu, H. Lu, and P. Yan, “Adaptive shape prior constrained level sets for bladder MR image segmentation,” IEEE journal of biomedical and health informatics, vol. 18, no. 5, pp. 1707–1716, 2014.
-  X. Xu, X. Zhang, Q. Tian, G. Zhang, Y. Liu, G. Cui, J. Meng, Y. Wu, T. Liu, Z. Yang, and H. Lu, “Three-dimensional texture features from intensity and high-order derivative maps for the discrimination between bladder tumors and wall tissues via MRI,” International Journal of Computer Assisted Radiology and Surgery, 2017. [Online]. Available: http://link.springer.com/10.1007/s11548-017-1522-8
-  X. Xu, Y. Liu, X. Zhang, Q. Tian, Y. Wu, G. Zhang, J. Meng, Z. Yang, and H. Lu, “Preoperative prediction of muscular invasiveness of bladder cancer with radiomic features on conventional MRI and its high-order derivative maps.” Abdominal Radiology, vol. 42, no. 7, pp. 1–10, 2017.
-  X. Zhang, X. Xu, Q. Tian, B. Li, Y. Wu, Z. Yang, Z. Liang, Y. Liu, G. Cui, and H. Lu, “Radiomics assessment of bladder cancer grade using texture features from diffusion-weighted imaging,” Journal of Magnetic Resonance Imaging, vol. 46, 2017.
-  D. Xiao, G. Zhang, Y. Liu, Z. Yang, X. Zhang, L. Li, C. Jiao, and H. Lu, “3D detection and extraction of bladder tumors via MR virtual cystoscopy,” International journal of computer assisted radiology and surgery, vol. 11, no. 1, pp. 89–97, 2016.
-  C. Duan, K. Yuan, F. Liu, P. Xiao, G. Lv, and Z. Liang, “An adaptive window-setting scheme for segmentation of bladder tumor surface via MR cystography,” IEEE Transactions on Information Technology in Biomedicine, vol. 16, no. 4, pp. 720–729, 2012.
-  P. Lambin, R. T. H. Leijenaar, T. M. Deist, J. Peerlings, E. E. C. D. Jong, J. V. Timmeren, S. Sanduleanu, R. T. H. M. Larue, A. J. G. Even, and A. Jochems, “Radiomics: the bridge between medical imaging and personalized medicine,” Nature Reviews Clinical Oncology, vol. 14, no. 12, p. 749, 2017.
-  X. Xu, X. Zhang, Y. Liu, Q. Tian, G. Zhang, Z. Yang, H. Lu, and J. Yuan, “Simultaneous segmentation of multiple regions in 3D bladder MRI by efficient convex optimization of coupled surfaces,” in International Conference on Image and Graphics. Springer, 2017, pp. 528–542.
-  L. Li, Z. Wang, X. Li, X. Wei, H. L. Adler, W. Huang, S. A. Rizvi, H. Meng, D. P. Harrington, and Z. Liang, “A new partial volume segmentation approach to extract bladder wall for computer-aided detection in virtual cystoscopy,” in Medical Imaging 2004: Physiology, Function, and Structure from Medical Images, vol. 5369. International Society for Optics and Photonics, 2004, pp. 199–207.
-  L. Li, Z. Liang, S. Wang, H. Lu, X. Wei, M. Wagshul, M. Zawin, E. J. Posniak, and C. S. Lee, “Segmentation of multispectral bladder MR images with inhomogeneity correction for virtual cystoscopy,” in Medical Imaging 2008: Physiology, Function, and Structure from Medical Images, vol. 6916. International Society for Optics and Photonics, 2008, p. 69160U.
-  J. W. Chi, M. Brady, N. R. Moore, and J. A. Schnabel, “Segmentation of the bladder wall using coupled level set methods,” in Biomedical Imaging: From Nano to Macro, 2011 IEEE International Symposium on. IEEE, 2011, pp. 1653–1656.
-  C. Garnier, W. Ke, and J.-L. Dillenseger, “Bladder segmentation in MRI images using active region growing model,” in Engineering in Medicine and Biology Society, EMBC, 2011 Annual International Conference of the IEEE. IEEE, 2011, pp. 5702–5705.
-  Z. Ma, R. N. Jorge, T. Mascarenhas, and J. M. R. Tavares, “Novel approach to segment the inner and outer boundaries of the bladder wall in T2-weighted magnetic resonance images,” Annals of biomedical engineering, vol. 39, no. 8, pp. 2287–2297, 2011.
-  H. Han, L. Li, C. Duan, H. Zhang, Y. Zhao, and Z. Liang, “A unified EM approach to bladder wall segmentation with coupled level-set constraints,” Medical image analysis, vol. 17, no. 8, pp. 1192–1205, 2013.
-  X. Qin, X. Li, Y. Liu, H. Lu, and P. Yan, “Adaptive Shape Prior Constrained Level Sets for Bladder MR Image Segmentation,” IEEE Journal of Biomedical and Health Informatics, vol. 18, no. 5, pp. 1707–1716, 2014.
-  J. C. Bezdek, R. Ehrlich, and W. Full, “FCM: The fuzzy c-means clustering algorithm,” Computers & Geosciences, vol. 10, no. 2-3, pp. 191–203, 1984.
-  G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, “Densely connected convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, vol. 1, no. 2, 2017, p. 3.
-  J. Redmon and A. Farhadi, “YOLO9000: better, faster, stronger,” pp. 7263–7271, 2017.
-  F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” arXiv preprint arXiv:1511.07122, 2015.
-  J. Dolz, C. Desrosiers, and I. B. Ayed, “3D fully convolutional networks for subcortical segmentation in MRI: A large-scale study,” NeuroImage, vol. 170, pp. 456–470, 2018.
-  T. Fechter, S. Adebahr, D. Baltas, I. Ben Ayed, C. Desrosiers, and J. Dolz, “Esophagus segmentation in CT via 3D fully convolutional neural network and random walk,” Medical Physics, vol. 44, no. 12, pp. 6341–6352, 2017.
-  G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. van der Laak, B. Van Ginneken, and C. I. Sánchez, “A survey on deep learning in medical image analysis,” Medical image analysis, vol. 42, pp. 60–88, 2017.
-  J. Dolz, K. Gopinath, J. Yuan, H. Lombaert, C. Desrosiers, and I. Ben Ayed, “Hyperdense-Net: A hyper-densely connected CNN for multi-modal image segmentation,” arXiv preprint arXiv:1804.02967, 2018.
-  A. Carass, J. L. Cuzzocreo, S. Han, C. R. Hernandez-Castillo, P. E. Rasser, M. Ganz, V. Beliveau, J. Dolz, I. B. Ayed, C. Desrosiers et al., “Comparing fully automated state-of-the-art cerebellum parcellation from magnetic resonance images,” NeuroImage, vol. 183, pp. 150–172, 2018.
-  K. H. Cha, L. Hadjiiski, R. K. Samala, H.-P. Chan, E. M. Caoili, and R. H. Cohan, “Urinary bladder segmentation in CT urography using deep-learning convolutional neural network and level sets,” Medical physics, vol. 43, no. 4, pp. 1882–1896, 2016.
-  K. H. Cha, L. M. Hadjiiski, R. K. Samala, H.-P. Chan, R. H. Cohan, E. M. Caoili, C. Paramagul, A. Alva, and A. Z. Weizer, “Bladder cancer segmentation in CT for treatment response assessment: application of deep-learning convolution neural network—a pilot study,” Tomography: a journal for imaging research, vol. 2, no. 4, p. 421, 2016.
-  K. Men, J. Dai, and Y. Li, “Automatic segmentation of the clinical target volume and organs at risk in the planning CT for rectal cancer using deep dilated convolutional neural networks,” Medical physics, vol. 44, no. 12, pp. 6377–6389, 2017.
-  O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241.
-  R. Hamaguchi, A. Fujita, K. Nemoto, T. Imaizumi, and S. Hikosaka, “Effective use of dilated convolutions for segmenting small object instances in remote sensing imagery,” in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2018, pp. 1442–1450.
-  P. F. Christ, M. E. A. Elshaer, F. Ettlinger, S. Tatavarty, M. Bickel, P. Bilic, M. Rempfler, M. Armbruster, F. Hofmann, M. D’Anastasi et al., “Automatic liver and lesion segmentation in CT using cascaded fully convolutional neural networks and 3D conditional random fields,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2016, pp. 415–423.
-  Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, “3D U-Net: learning dense volumetric segmentation from sparse annotation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2016, pp. 424–432.
-  H. Dong, G. Yang, F. Liu, Y. Mo, and Y. Guo, “Automatic brain tumor detection and segmentation using U-Net based fully convolutional networks,” in Annual Conference on Medical Image Understanding and Analysis. Springer, 2017, pp. 506–517.
-  K. Sirinukunwattana, J. P. Pluim, H. Chen, X. Qi, P.-A. Heng, Y. B. Guo, L. Y. Wang, B. J. Matuszewski, E. Bruni, U. Sanchez et al., “Gland segmentation in colon histology images: The GlaS Challenge contest,” Medical image analysis, vol. 35, pp. 489–502, 2017.
-  C. Zotti, Z. Luo, O. Humbert, A. Lalande, and P.-M. Jodoin, “GridNet with automatic shape prior registration for automatic MRI cardiac segmentation.” arXiv preprint arXiv:1705.08943, 2017.
-  M. Holschneider, R. Kronland-Martinet, J. Morlet, and P. Tchamitchian, “A real-time algorithm for signal analysis with the help of the wavelet transform,” in Wavelets. Springer, 1990, pp. 286–297.
-  J. M. Wolterink, T. Leiner, M. A. Viergever, and I. Išgum, “Dilated convolutional neural networks for cardiovascular MR segmentation in congenital heart disease,” in Reconstruction, Segmentation, and Analysis of Medical Images. Springer, 2016, pp. 95–102.
-  Z. Wu, C. Shen, and A. v. d. Hengel, “High-performance semantic segmentation using very deep fully convolutional networks,” arXiv preprint arXiv:1604.04339, 2016.
-  P. Moeskops, M. Veta, M. W. Lafarge, K. A. Eppenhof, and J. P. Pluim, “Adversarial training and dilated convolutions for brain MRI segmentation,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, 2017, pp. 56–64.
-  M. M. Lopez and J. Ventura, “Dilated convolutions for brain tumor segmentation in MRI scans,” in International MICCAI Brainlesion Workshop. Springer, 2017, pp. 253–262.
-  L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 4, pp. 834–848, 2018.
-  M. Anthimopoulos, S. Christodoulidis, L. Ebner, T. Geiser, A. Christe, and S. Mougiakakou, “Semantic segmentation of pathological lung tissue with dilated fully convolutional networks,” arXiv preprint arXiv:1803.06167, 2018.
-  E. Romera, J. M. Alvarez, L. M. Bergasa, and R. Arroyo, “Efficient convnet for real-time semantic segmentation,” in Intelligent Vehicles Symposium (IV), 2017 IEEE. IEEE, 2017, pp. 1789–1794.
-  A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello, “Enet: A deep neural network architecture for real-time semantic segmentation,” arXiv preprint arXiv:1606.02147, 2016.
-  V. Badrinarayanan, A. Kendall, and R. Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.
-  K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
-  S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning, 2015, pp. 448–456.
-  K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1026–1034.
-  P. Wang, P. Chen, Y. Yuan, D. Liu, Z. Huang, X. Hou, and G. Cottrell, “Understanding convolution for semantic segmentation,” arXiv preprint arXiv:1702.08502, 2017.
-  L. R. Dice, “Measures of the amount of ecologic association between species,” Ecology, vol. 26, no. 3, pp. 297–302, 1945.
-  X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the thirteenth international conference on artificial intelligence and statistics, 2010, pp. 249–256.
-  A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” 2017.
-  M. Drozdzal, E. Vorontsov, G. Chartrand, S. Kadoury, and C. Pal, “The importance of skip connections in biomedical image segmentation,” in Deep Learning and Data Labeling for Medical Applications. Springer, 2016, pp. 179–187.
-  K. Kamnitsas, C. Ledig, V. F. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker, “Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation,” Medical image analysis, vol. 36, pp. 61–78, 2017.
-  L. Gorelick, O. Veksler, Y. Boykov, and C. Nieuwenhuis, “Convexity shape prior for binary segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 2, pp. 258–271, 2017.
-  J. Dolz, I. Ben Ayed, and C. Desrosiers, “Unbiased shape compactness for segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 755–763.
-  K. Kamnitsas, W. Bai, E. Ferrante, S. McDonagh, M. Sinclair, N. Pawlowski, M. Rajchl, M. Lee, B. Kainz, D. Rueckert et al., “Ensembles of multiple models and architectures for robust brain tumour segmentation,” in International MICCAI Brainlesion Workshop. Springer, 2017, pp. 450–462.
-  J. Dolz, C. Desrosiers, L. Wang, J. Yuan, D. Shen, and I. B. Ayed, “Deep CNN ensembles and suggestive annotations for infant brain MRI segmentation,” arXiv preprint arXiv:1712.05319, 2017.
-  J. V. Manjón, P. Coupé, P. Raniga, Y. Xia, P. Desmond, J. Fripp, and O. Salvado, “MRI white matter lesion segmentation using an ensemble of neural networks and overcomplete patch-based voting,” Computerized Medical Imaging and Graphics, 2018.
-  Y. Liu, X. Xu, L. Yin, X. Zhang, L. Li, and H. Lu, “Relationship between glioblastoma heterogeneity and survival time: An MR imaging texture analysis,” American Journal of Neuroradiology, vol. 38, no. 9, 2017.