Automatic segmentation of anatomical structures in medical images has many clinical applications. In radiotherapy, for example, prostate segmentation is essential to the diagnosis, therapy, and post-therapy analysis of prostate cancer: it is critical for selecting patients for a specific treatment, guiding source delivery, and computing the dose distribution [1, 2]. T2-weighted MRI is the modality of choice for prostate segmentation. However, CT and US are also routinely used because: 1) CT images are used to compute the dose distribution, since voxel intensity relates directly to tissue density, and 2) US imaging is suitable for real-time image-guided radiotherapy. Despite the need for accurate segmentation of the prostate in radiotherapy, manual segmentation is subject to inter- and intra-observer variability, is time-consuming, and depends on the experience of the physician. Automatic and reliable segmentation of the prostate on these images is thus an important but difficult task, owing to the inhomogeneous and inconsistent contrast of the prostate boundary and to large shape variations. This is particularly complicated on CT images because of the inherently low soft-tissue contrast of CT (e.g., at the prostate boundary), as can be seen in Fig. 1 (b).
Recently, organ boundary detection that models and incorporates the organ shape as prior information has been successfully used for automatic and reliable anatomical structure segmentation (such as prostate [1], brain [3], and heart [4]). The prior prostate shape has been modeled using principal component analysis from labeled prostate CT scans, and the modeled shape was then used to guide the segmentation of the prostate gland on CT images. A deep learning approach followed by multi-atlas based feature extraction has also been proposed [5], and a distinctive-curve-guided fully convolutional neural network has been employed for pelvic organ segmentation on CT images [6]. Kazemifar et al. [7] used convolutional networks (the U-net architecture [8]) to segment both the prostate and organs at risk in male pelvic CT images. Guo et al. [9]
have also used deep features and a sparse patch matching approach to segment the prostate on MR images. Although atlas and shape-prior based methods have demonstrated promising performance, they may not generalize well: the statistical shape model or atlas of an organ may differ for a new patient, which in turn requires a robust modeling and registration algorithm. Moreover, robust feature extraction for building an optimal shape model remains challenging, because in medical images the image contrast, organ shape, acquisition protocol, and deformable characteristics of an organ can vary widely.
Deep convolutional neural networks (CNN) have shown promising performance in various medical applications [10]. For example, the U-net architecture has often been used for medical image segmentation [8]. Adversarial neural networks have also been shown to improve medical image segmentation (e.g., for the liver [11]). Our hypothesis is thus that by combining CNN-based feature extraction with learning-based anatomical structure modeling (through a generative neural network) trained on reliable-contrast images (such as T2-weighted MRI for soft tissues), we can accurately predict an organ boundary in low-contrast imaging modalities (e.g., prostate segmentation on CT).
In this paper, we present a new deep generative model-driven anatomical structure segmentation method (named DGMNet), specifically designed for multimodal (CT and MR) prostate segmentation. The proposed method employs convolutional feature extraction with an embedded generative CNN [8, 12]. The generative CNN is designed for learning-based modeling of the prior organ shape from MRI and is applied to low-contrast CT images; it also performs a learning-based registration to the given raw input image. Experimental results on MRI and CT datasets reveal that our method can fully automatically segment the prostate, robustly and accurately, regardless of differences in contrast, size, and imaging modality.
2 Method
We aim at detecting the boundary of the prostate volume in a given 3D raw input image of size $W \times H \times D$, where $W$, $H$, and $D$ are the width, height, and depth of the image, respectively. We use a deep CNN that outputs a label map of size $W \times H \times D$ whose voxels contain the label 1 for the prostate volume and 0 otherwise. This is done by combining a predicted label mask from a decoder with a predicted shape from an embedded shape-model generator (see Fig. 2 (a)).
2.1 Network architecture
The network architecture is illustrated in Fig. 2. It consists of feature extraction, generative shape modeling, and feature map upsampling. The feature extraction path (encoder) is based on a convolutional block [8, 12] (see Fig. 2 (b)) and a 2x2 max pooling operation with stride 2 for downsampling. From the extracted feature maps, two paths, named the model path and the decoder path, are applied. We also used dropout regularization at the bottleneck layer to improve generalization by reducing over-fitting during training. The decoder path is composed of a 2x2 up-convolution and a concatenation layer, followed by the same block as the encoder (Fig. 2 (b)). The generative path (i.e., Model in Fig. 2 (a)) is composed of average pooling followed by fully connected (FC) layers. The outputs of these FC layers (corresponding to surface boundary coordinates) are fed to the generative model, which generates the shape of a given organ (in our case, the prostate gland). It consists of a projection and reshape block followed by repeated Leaky ReLU activations (except the last activation, which is a sigmoid), batch normalization, and up-convolution (similar to the generator proposed in [13]).
The model-generator and decoder outputs are merged by addition, and a further convolutional block is applied. The output layer is a 1x1 convolutional layer with a sigmoid activation function. It is worth mentioning that the proposed network involves only 1.5 million trainable parameters, while the U-net architecture has 31.024 million.
We formulate the model generator to predict the prostate volume given a few sampled prostate surface boundary coordinates $c$. To this end, the voxel depth (i.e., whether a given slice $i$ belongs to the prostate) is treated as a classification task (0 or 1), and the remaining in-plane coordinates ($x$ and $y$) as a regression task. Given the surface boundary coordinates $c$, the generator $G$ is trained to predict a labeled model $M$, in which the prostate volume is 1 and 0 otherwise, i.e., $G(c) = M$. We automatically extracted four surface boundary landmark coordinates (left, right, top, and bottom) per slice from the given labeled ground truth (from MRI) and repeated this over the whole volume of the prostate.
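The per-slice landmark extraction described above (left, right, top, and bottom boundary voxels of the labeled mask, plus a 0/1 flag for slices without prostate) can be sketched as follows. The function names and the exact landmark convention are our illustrative assumptions, not the paper's code:

```python
import numpy as np

def slice_landmarks(mask_slice):
    """Return (left, right, top, bottom) boundary voxels of a binary mask slice,
    or None if the slice contains no foreground."""
    ys, xs = np.nonzero(mask_slice)
    if xs.size == 0:
        return None
    left   = (int(ys[np.argmin(xs)]), int(xs.min()))   # leftmost foreground voxel
    right  = (int(ys[np.argmax(xs)]), int(xs.max()))   # rightmost foreground voxel
    top    = (int(ys.min()), int(xs[np.argmin(ys)]))   # topmost foreground voxel
    bottom = (int(ys.max()), int(xs[np.argmax(ys)]))   # bottommost foreground voxel
    return left, right, top, bottom

def volume_landmarks(mask):
    """Per-slice landmarks plus a 0/1 flag marking slices that contain the organ."""
    out = []
    for z in range(mask.shape[0]):
        lm = slice_landmarks(mask[z])
        out.append((1, lm) if lm is not None else (0, None))
    return out
```

The per-slice flag doubles as the classification target described above, while the landmark coordinates form the regression targets.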
2.2 Loss function
To train the proposed network, we define a multi-task loss function as a combined weighted sum:

$L = L_{seg} + \lambda L_{lm} \qquad (1)$

in which the segmentation (final mask) loss, $L_{seg}$, is calculated as a combination of the Dice and cross-entropy losses ($L_{seg} = L_{Dice} + L_{CE}$).
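A minimal NumPy sketch of such a combined Dice plus cross-entropy segmentation loss (an illustrative stand-in, not the paper's implementation; equal weighting of the two terms is our assumption):

```python
import numpy as np

def dice_ce_loss(pred, target, eps=1e-7):
    """Segmentation loss: soft Dice loss plus voxel-wise binary cross entropy.
    `pred` holds sigmoid probabilities, `target` holds 0/1 labels."""
    pred = np.clip(pred, eps, 1.0 - eps)          # avoid log(0)
    inter = (pred * target).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    ce = -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    return (1.0 - dice) + ce
```

A perfect prediction drives both terms toward zero, while the cross-entropy term keeps gradients informative for individual voxels where the Dice term alone can be flat.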
Given ground truth surface voxel coordinates $v_i = (x_i, y_i, p_i)$, where $i = 1, \dots, N$, and predicted values $\hat{v}_i = (\hat{x}_i, \hat{y}_i, \hat{p}_i)$, the joint classification and regression loss can be calculated as:

$L_{lm} = \frac{1}{N} \sum_{i=1}^{N} \left[ L_{cls}(p_i, \hat{p}_i) + p_i\, L_{reg}(v_i, \hat{v}_i) \right] \qquad (2)$
in which the classification loss, $L_{cls}$, is the cross-entropy loss. The likelihood of a given raw input image slice $i$ being part of the organ is $\hat{p}_i$; the ground-truth label, $p_i$, is 1 if the image slice contains the prostate, and 0 otherwise. The second loss, $L_{reg}$, is thus defined only over the surface landmarks whose ground truth is 1. For positive ground truth (i.e., $p_i = 1$), we use the smooth $L_1$ loss between corresponding voxels, which is considered a loss robust to outliers [14]:

$\text{smooth}_{L_1}(t) = \begin{cases} 0.5\,t^2 & \text{if } |t| < 1 \\ |t| - 0.5 & \text{otherwise} \end{cases} \qquad (3)$
where $x_i$ and $\hat{x}_i$ (and likewise $y_i$ and $\hat{y}_i$) are the ground truth and predicted surface boundary coordinates for a given positive slice ($p_i = 1$). The hyper-parameter $\lambda$ controls the relative contribution of the segmentation loss and the surface boundary coordinate loss.
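The joint landmark loss above (cross entropy on the per-slice organ flag, smooth L1 regression counted only on positive slices) can be sketched in NumPy as follows; the averaging convention is our assumption:

```python
import numpy as np

def smooth_l1(diff):
    """Elementwise smooth L1 (Huber-like) loss: quadratic near zero, linear in the tails."""
    a = np.abs(diff)
    return np.where(a < 1.0, 0.5 * a ** 2, a - 0.5)

def landmark_loss(p_true, p_pred, xy_true, xy_pred, eps=1e-7):
    """Joint loss: cross entropy on the per-slice 0/1 flag plus smooth-L1
    regression on landmark coordinates, masked to positive slices only."""
    p_pred = np.clip(p_pred, eps, 1.0 - eps)
    cls = -np.mean(p_true * np.log(p_pred) + (1 - p_true) * np.log(1 - p_pred))
    reg = np.mean(p_true[:, None] * smooth_l1(xy_true - xy_pred))
    return cls + reg
```

Multiplying the regression term by the ground-truth flag implements the "positive slices only" rule: background slices contribute nothing to the coordinate loss.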
3 Experimental setup and results
The proposed method was trained and evaluated on T2-weighted MRI and CT prostate images with large variability in organ size, shape, and scanning protocol, collected from multiple clinical centers. First, it was trained and evaluated on 60 T2-weighted MR exams, acquired with varying in-plane resolution and a slice thickness between 1.250 mm and 2.722 mm. Similarly, we also trained and evaluated on 40 CT patient datasets (patients who underwent permanent prostate brachytherapy for localized prostate cancer treatment). These CT exams were acquired from two clinical centers; their in-plane resolution varies, with a slice thickness between 1.5 mm and 2.5 mm (helical mode, 120 kVp, 172 mm FOV, and 440 mAs/slice). All datasets were resampled to a fixed voxel size for CT and MRI, respectively. The prostate was manually delineated by experienced radiologists.
The input images (MR and CT) were pre-processed by zero-centering the intensity values and normalizing them by the standard deviation of all images before being fed to the network. All images were also center-cropped and resized to a resolution of 256x256.
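This pre-processing can be sketched as below. Note that the sketch normalizes each image by its own statistics and only center-crops (the paper normalizes by the standard deviation of all images and also resizes), so it is an illustrative simplification:

```python
import numpy as np

def preprocess(img, out_size=256):
    """Zero-center the intensities, normalize by the standard deviation,
    then center-crop to a square of side `out_size` (no interpolation here)."""
    img = img.astype(np.float64)
    img = (img - img.mean()) / (img.std() + 1e-8)   # zero-center, unit variance
    h, w = img.shape
    side = min(h, w, out_size)
    y0, x0 = (h - side) // 2, (w - side) // 2       # symmetric crop offsets
    return img[y0:y0 + side, x0:x0 + side]
```

In practice the resize step would use an interpolation routine (e.g., from SciPy or an imaging library) after the crop.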
3.1.2 Training and testing details:
We trained the system by minimizing the loss (equation 1), as follows: 1) First, we trained the generative model with inputs from a few sampled surface boundary landmarks of the prostate volume, taken only from T2-weighted MRI labels, using a binary cross entropy loss; we conducted a five-fold cross validation experiment. 2) Second, the whole system was trained except the generator, which only predicts the model shape given the predicted surface coordinate values from the FC network. We used a batch size of 10 images for both MRI and CT. It is important to mention that we feed the network 2D slices instead of 3D volumes because our datasets are small; the predicted image labels are then stacked to create a 3D volume. The model was trained using the Adam optimizer with a learning rate of . The whole ensembled architecture (except the generator) was trained from scratch, with 10 patients (25% of the datasets, selected randomly) used for validation.
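Stacking the per-slice 2D predictions back into a 3D label volume can be sketched as below; `predict_slice` is a hypothetical stand-in for the trained 2D network:

```python
import numpy as np

def segment_volume(volume, predict_slice):
    """Run a 2D slice-wise predictor over a 3D volume (depth-first axis)
    and stack the resulting 2D label maps into a 3D label volume."""
    return np.stack([predict_slice(volume[z]) for z in range(volume.shape[0])],
                    axis=0)
```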
3.1.3 Ablation study:
We conducted ablation experiments (Table 1) to investigate the effect of the individual components of the proposed network. All ablation experiments were performed on CT images under similar settings: 1) the U-net architecture; 2) ResUnet (residual-based U-net); 3) SE-ResUnet (U-net with residual blocks and squeeze-and-excitation networks [12]); 4) SE-Unet (the proposed method without the generator and the FC network (Fig. 2 (b))).
3.1.4 Evaluation metrics:
All experiments were evaluated using the Dice similarity coefficient (DSC), sensitivity (Sen), positive predictive value (PPV), and average surface distance (ASD, in mm) [15].
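The three overlap metrics follow directly from the binary confusion counts; a small NumPy sketch (ASD is omitted, since it requires surface extraction and distance transforms):

```python
import numpy as np

def overlap_metrics(pred, gt):
    """Dice similarity coefficient, sensitivity, and positive predictive value
    computed from binary prediction and ground-truth masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # true positives
    fp = np.logical_and(pred, ~gt).sum()   # false positives
    fn = np.logical_and(~pred, gt).sum()   # false negatives
    dsc = 2 * tp / (2 * tp + fp + fn)
    sen = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return dsc, sen, ppv
```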
3.2 Experimental results
The generator was trained with a five-fold cross validation method using T2-weighted MRI. It was then kept as a shape predictor by freezing its weights during training of the proposed method. As an intermediate output of the method, it can be considered a region proposal (or an instantaneous shape generator) to be further refined by merging with the encoder-decoder output. Indeed, the model-generator can learn from good-contrast images (MRI) and be used directly (transferred without fine-tuning, by freezing) on low-contrast images (CT), while the encoder-decoder extracts additional features. As one can see from the qualitative prostate segmentation results in Fig. 3, the proposed method can accurately segment the prostate on both T2-weighted MR and CT images.
In almost all evaluation metrics (with and without the generator, Table 1), the proposed method with the shape model generator outperforms the state-of-the-art methods. Since the implanted radioactive seeds were not uniformly placed over the volume of the prostate gland, they were observed to degrade segmentation quality, particularly for the state-of-the-art methods, which might perform better on CT images without implanted radioactive seeds. Combining CNN-extracted features with prior shape knowledge of the organ can improve time, reproducibility, and accuracy in fully automatic segmentation of the prostate in radiotherapy.
4 Conclusion
In this paper we proposed DGMNet, a new CNN approach to feature-model learning based anatomical structure segmentation. It combines an encoder-decoder architecture with an embedded deep generative model-generator that enables training on limited data. The model-generator embeds prior shape knowledge via learning-based shape modeling and registration from high-contrast images (such as MRI) and is directly applied (by freezing) to low-contrast images (such as CT). Further, we demonstrated that combining a shape model with CNN-based feature extraction improves segmentation accuracy. We extensively evaluated models trained with and without the prior shape generator on CT images, using different metrics, to verify the effect of the embedded shape generator. Experimental results on MR and CT datasets reveal that this method can fully automatically segment the prostate gland in different imaging modalities. In the future, we plan to generalize the proposed method to other modalities such as US images (for intra-operative radiotherapy) as well as to other organs (such as the rectum, brain, and heart). In the case of US images, we propose to train the model-generator from MRI acquired with an endorectal coil, to account for the deformation of the prostate gland caused by the coil.
References

1. Martínez, F., Romero, E., Dréan, G., Simon, A., Haigron, P., De Crevoisier, R., et al.: Segmentation of pelvic structures for planning CT using a geometrical shape model tuned by a multi-scale edge detector. Phys. Med. Biol. 59(6), 1471 (2014). doi: 10.1088/0031-9155/59/6/1471
2. Girum, K.B., Lalande, A., Quivrin, M., Bessières, I., Pierrat, N., Martin, E., et al.: Inferring postimplant dose distribution of salvage permanent prostate implant (PPI) after primary PPI on CT images. Brachytherapy 17(6), 866-873 (2018). doi: 10.1016/j.brachy.2018.07.017
3. Ilunga-Mbuyamba, E., Avina-Cervantes, J.G., Lindner, D., Arlt, F., Ituna-Yudonago, J.F., et al.: Patient-specific model-based segmentation of brain tumors in 3D intraoperative ultrasound images. Int. J. Comput. Assist. Radiol. Surg. 13(3), 331-342 (2018). doi: 10.1007/s11548-018-1703-0
4. Zotti, C., Luo, Z., Lalande, A., Jodoin, P.M.: Convolutional neural network with shape prior applied to cardiac MRI segmentation. IEEE J. Biomed. Health Inform. 23(3), 1119-1128 (2018). doi: 10.1109/JBHI.2018.2865450
5. Ma, L., Guo, R., Zhang, G., Tade, F., Schuster, D.M., Nieh, P., et al.: Automatic segmentation of the prostate on CT images using deep learning and multi-atlas fusion. In: Medical Imaging 2017: Image Processing, p. 101332O. SPIE (2017). doi: 10.1117/12.2255755
6. He, K., Cao, X., Shi, Y., Nie, D., Gao, Y., Shen, D.: Pelvic organ segmentation using distinctive curve guided fully convolutional networks. IEEE Trans. Med. Imaging 38(2), 585-595 (2019). doi: 10.1109/TMI.2018.2867837
7. Kazemifar, S., Balagopal, A., Nguyen, D., McGuire, S., Hannan, R., Jiang, S., et al.: Segmentation of the prostate and organs at risk in male pelvic CT images using deep learning. Biomed. Phys. Eng. Express 4(5), 055003 (2018). doi: 10.1088/2057-1976/aad100
8. Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: MICCAI, pp. 234-241. Springer (2015). doi: 10.1007/978-3-319-24574-4_28
9. Guo, Y., Gao, Y., Shen, D.: Deformable MR prostate segmentation via deep feature learning and sparse patch matching. IEEE Trans. Med. Imaging 35(4), 1077-1089 (2016). doi: 10.1109/TMI.2015.2508280
10. Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., et al.: A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60-88 (2017). doi: 10.1016/j.media.2017.07.005
11. Yang, D., Xu, D., Zhou, S.K., Georgescu, B., Chen, M., Grbic, S., et al.: Automatic liver segmentation using an adversarial image-to-image network. In: MICCAI, pp. 507-515. Springer (2017). doi: 10.1007/978-3-319-66179-7_58
12. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: CVPR, pp. 7132-7141 (2018). doi: 10.1109/CVPR.2018.00745
13. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR (2016).
14. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961-2969 (2017). doi: 10.1109/TPAMI.2018.2844175
15. Litjens, G., Toth, R., van de Ven, W., Hoeks, C., Kerkstra, S., van Ginneken, B., et al.: Evaluation of prostate segmentation algorithms for MRI: the PROMISE12 challenge. Med. Image Anal. 18(2), 359-373 (2014). doi: 10.1016/j.media.2013.12.002