1 Introduction
Multi-organ segmentation of radiology images is essential to many clinical applications such as computer-aided diagnosis, computer-aided surgery, and radiation therapy. Compared with other internal human structures such as the brain or heart, abdominal organs are much more challenging to segment because of their low contrast and high shape variability in CT images. In this paper, we focus on multi-organ segmentation in the abdominal region, e.g., liver, pancreas, and kidneys.
Fully supervised approaches can usually achieve high accuracy given a large labeled training set consisting of radiology images paired with their pixel-wise label maps. However, obtaining such a training set is time-consuming and costly, especially in the medical imaging domain, for two reasons: 1) radiology images must be precisely annotated by hand by experienced radiologists and carefully checked by additional experts, and 2) contouring organs or tissues in 3D volumes requires tedious manual input. By contrast, large unannotated datasets of CT images are much easier to obtain. Our study therefore focuses on multi-organ segmentation in a semi-supervised fashion, i.e., on how to fully leverage unlabeled data to boost performance and thereby alleviate the need for a large annotated training set.
In the biomedical imaging domain, traditional semi-supervised learning methods usually adopt graph-based approaches [14, 18] with a clustering assumption to segment pixels (voxels) into meaningful regions, e.g., superpixels. These methods were studied for segmenting tissues or anatomical structures in 3D brain MR images, ultrasound images, etc. Other machine learning methods, such as kernel-based large margin algorithms [27], have been suggested for white matter hyperintensity segmentation. Although widely applied to biomedical image segmentation in the past decade, these traditional methods cannot always produce satisfactory results.
With the recent advances in deep learning and its applications [23, 38, 37, 39], fully convolutional networks (FCNs) [24] have been successfully applied, in a fully supervised manner, to many biomedical segmentation tasks such as neuronal structure segmentation [9, 13, 30, 35], single-organ segmentation [32, 48, 47], and multi-organ segmentation [33, 42]. Their impressive performance shows that we are now equipped with much more powerful techniques than traditional methods. Nevertheless, network-based semi-supervised learning for biomedical image segmentation has not drawn enough attention. The current use of deep learning for semi-supervised multi-organ segmentation in the biomedical imaging domain is to train an FCN on both labeled and unlabeled data, alternately updating the automated segmentations (pseudo-labels) of the unlabeled data and the network parameters [5]. However, if an error occurs in the initial pseudo-labels of the unlabeled data, it will be reinforced by the network during subsequent iterations. Improving the quality of the pseudo-labels for unlabeled data is hence a promising direction to alleviate this negative effect.
In this paper, we exploit the fact that CT scans are high-resolution three-dimensional volumes that can be represented by multiple planes, i.e., the axial, coronal, and sagittal planes. Taking advantage of this multi-view property, we propose Deep Multi-Planar Co-Training (DMPCT), a systematic EM-like semi-supervised learning framework. DMPCT consists of a teacher model, a multi-planar fusion module, and a student model. The teacher model is trained on each plane separately, slice by slice, using a small number of annotations. The key advantage of DMPCT is that the multi-planar fusion module continuously generates more reliable pseudo-labels, which in turn help train the student model by making full use of massive unlabeled data. Because the teacher and student models each contain multiple segmentation networks corresponding to different planes, co-training [7, 26] is introduced so that these networks can be trained simultaneously in our unified framework and benefit from each other.
We evaluate our algorithm on our newly collected large dataset and observe a significant improvement over the fully supervised method. Finally, because DMPCT is a generic and flexible framework, better backbone models and fusion strategies can be easily plugged into it. Our unified system can also be practically useful in current clinical environments because it efficiently leverages massive unlabeled data to boost segmentation performance.
2 Related Work
Fully-supervised multi-organ segmentation. Early studies of abdominal organ segmentation focused on atlas-based methods [22, 11, 43]. These frameworks are usually problematic because 1) they cannot capture the large inter-subject variations of abdominal regions and 2) their computational time depends heavily on the number of atlases. Recently, learning-based approaches with relatively large datasets have been introduced for multi-organ segmentation [17, 34, 8]. In particular, methods based on deep Convolutional Neural Networks (CNNs) have achieved great success in medical image segmentation [33, 10, 16, 41, 42, 46, 21] in the last few years. Compared with multi-atlas-based approaches, CNN-based methods are generally more efficient and accurate. CNN-based methods for multi-organ segmentation can be divided into two major categories: those based on 3D CNNs [33, 10, 16] and those based on 2D CNNs [41, 42, 46, 21]. 3D CNNs usually adopt a sliding-window strategy to avoid running out of memory, which leads to high time complexity. By comparison, 2D CNN-based algorithms can be trained directly end-to-end using 2D deep networks, which is less time-consuming.
Semi-supervised learning. The most commonly used techniques for semi-supervised learning include self-training [31, 28], co-training [7], multi-view learning [44], and graph-based methods [6, 40].
In self-training, the classifier is iteratively retrained on a training set augmented with the unlabeled data and the classifier's own predictions for them. The procedure is repeated until some convergence criteria are satisfied. In this setting, a classification mistake can reinforce itself. Self-training has achieved strong performance in many computer vision problems [31, 28] and has recently been applied to deep-learning-based semi-supervised learning in the biomedical imaging domain [5].
Co-training [7] assumes that (1) features can be split into two independent sets and (2) each sub-feature set is sufficient to train a good classifier. During the learning process, each classifier is retrained with the additional training examples given by the other classifier. Co-training utilizes multiple sets of independent features which describe the same data, and therefore tends to yield more accurate and robust results than self-training [36]. Multi-view learning [44], in general, defines learning paradigms that utilize the agreement among different learners. Co-training is one of the earliest schemes for multi-view learning.
Graph-based semi-supervised methods define a graph whose nodes are the labeled and unlabeled examples in the dataset and whose edges reflect the similarity of examples. These methods have been widely adopted in non-deep-learning-based semi-supervised learning algorithms in the biomedical imaging domain [14, 18, 27].
Unlike these methods, our work embeds the multi-view property of 3D medical data into the co-training framework, an approach that is simple and effective.
3 Deep Multi-Planar Co-Training
We propose Deep Multi-Planar Co-Training (DMPCT), a semi-supervised multi-organ segmentation method which exploits multi-planar information to generate pseudo-labels for unlabeled 3D CT volumes. Assume that we are given a 3D CT volume dataset containing organs. This includes labeled volumes and unlabeled volumes , where and denote a 3D input volume and its corresponding ground-truth segmentation mask. and are the numbers of labeled and unlabeled volumes, respectively. Typically . As shown in Figure 1, DMPCT involves the following steps:

Step 1: train a teacher model on the manually labeled data in the fully supervised setting (see Sec. 3.1).

Step 2: the trained model is then used to assign pseudo-labels to the unlabeled data by fusing the estimations from all planes (see Sec. 3.2).
Step 3: train a student model on the union of the manually labeled data and automatically labeled data (see Sec. 3.3).

Step 4: repeat Steps 2 and 3 iteratively.
3.1 Teacher Model
We train the teacher model on the labeled dataset . By splitting each volume and its corresponding label mask from the sagittal (S), coronal (C), and axial (A) planes, we can get three sets of 2D slices, i.e., , , where is the number of 2D slices obtained from plane . We train a 2D-FCN model (we use [24] as our reference CNN model throughout this paper) to perform segmentation from each plane individually.
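The slicing step described above can be sketched with NumPy as follows. This is only an illustration: the function name `split_into_planes` and the mapping from array axes to anatomical planes are assumptions, since the actual axis order depends on how the volumes are stored.

```python
import numpy as np

# Assumed axis convention for a volume stored as (x, y, z); real data may differ.
PLANE_AXES = {"S": 0, "C": 1, "A": 2}

def split_into_planes(volume, mask):
    """Return, for each plane, a list of (image slice, label slice) pairs."""
    if volume.shape != mask.shape:
        raise ValueError("volume and mask must have the same shape")
    slices = {}
    for plane, axis in PLANE_AXES.items():
        # Move the slicing axis to the front so that iterating yields 2D slices.
        vol_stack = np.moveaxis(volume, axis, 0)
        msk_stack = np.moveaxis(mask, axis, 0)
        slices[plane] = list(zip(vol_stack, msk_stack))
    return slices

# Toy volume with distinct extents so the three planes are easy to tell apart.
rng = np.random.default_rng(0)
volume = rng.random((4, 5, 6)).astype(np.float32)
mask = rng.integers(0, 3, size=volume.shape)
planes = split_into_planes(volume, mask)
```

Each plane then provides its own set of 2D training pairs, one per slice along the corresponding axis, which is what lets a separate 2D network be trained per plane.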
Without loss of generality, let and denote a 2D slice and its corresponding label mask in , where is the organ label (0 means background) of the th pixel in . Consider a segmentation model , where denotes the model parameters and denotes the prediction for . Our objective function is
(1) 
where denotes the probability of the th pixel being classified as label on 2D slice and is the indicator function. We train the teacher model by optimizing w.r.t. .
3.2 Multi-Planar Fusion Module
Given a well-trained teacher model , the goal of the multi-planar fusion module is to generate the pseudo-labels for the unlabeled data .
We first make predictions on the 2D slices from each plane and then reconstruct the 3D volume by stacking all slices back together. Several previous studies [20, 29, 3, 4] suggest that combining predictions from multiple views can often improve the accuracy and robustness of the final decision, since complementary information can be exploited from multiple views simultaneously. The fused prediction from multiple planes is therefore expected to be superior to the estimation from any single plane. The overall module is shown in Figure 2.
More specifically, majority voting is applied to fuse the hard estimations by seeking an agreement among different planes. If the predictions from all planes disagree on a voxel, we select the prediction for that voxel with the maximum confidence. Simple as this strategy might sound, it has been shown to produce highly robust and efficient outcomes in various previous studies [1, 20, 29, 48]. The final decision for the th voxel of is:
(2) 
where . , , and denote the probabilities of the th pixel classified as label from the sagittal, coronal, and axial planes, respectively. denotes the hard estimation for the th pixel on plane , i.e., .
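As a rough illustration of the fusion rule described in the prose above, the following NumPy sketch takes one probability map per plane, applies majority voting over the hard labels, and falls back to the most confident plane when all three disagree. The function name `fuse_planes`, the array layout `(num_labels, D, H, W)`, and the toy inputs are assumptions made for this example.

```python
import numpy as np

def fuse_planes(prob_s, prob_c, prob_a):
    """Fuse per-plane probability maps of shape (num_labels, D, H, W)."""
    probs = np.stack([prob_s, prob_c, prob_a])   # (3, L, D, H, W)
    hard = probs.argmax(axis=1)                  # hard label per plane
    conf = probs.max(axis=1)                     # confidence of that label
    s, c, a = hard
    # Fallback: the label from the most confident plane.
    best_plane = conf.argmax(axis=0)
    fused = np.take_along_axis(hard, best_plane[None], axis=0)[0]
    # Majority vote: with three voters, any agreeing pair decides the label.
    fused = np.where(c == a, c, fused)
    fused = np.where((s == c) | (s == a), s, fused)
    return fused

def toy_probs(labels, confs, num_labels=3):
    """Build a (num_labels, 1, 1, V) map with the given winning labels."""
    p = np.zeros((num_labels, 1, 1, len(labels)))
    for v, (label, cf) in enumerate(zip(labels, confs)):
        p[:, 0, 0, v] = (1.0 - cf) / (num_labels - 1)
        p[label, 0, 0, v] = cf
    return p

# Voxel 0: S and C agree. Voxel 1: all disagree. Voxel 2: C and A agree.
prob_s = toy_probs([1, 0, 2], [0.80, 0.50, 0.99])
prob_c = toy_probs([1, 1, 0], [0.70, 0.90, 0.60])
prob_a = toy_probs([2, 2, 0], [0.90, 0.60, 0.60])
fused = fuse_planes(prob_s, prob_c, prob_a)
```

In the third toy voxel the sagittal plane is by far the most confident, yet the agreeing coronal and axial planes win: confidence only acts as a tie-breaker when no two planes agree.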
As shown in Figure 3, our multi-planar fusion module reduces both over- and under-estimation by fusing aspects from different planes and therefore yields a much better outcome. Note that other rules [2, 41] can also be easily adapted to this module. We do not focus on the influence of the fusion module in this paper, although intuitively a better fusion module should lead to higher performance.
3.3 Student Model
After generating the pseudo-labels for the unlabeled dataset , the training set can be enlarged by taking the union of the labeled and the pseudo-labeled datasets, i.e., . The student model is trained on this augmented dataset in the same way as the teacher model (see Sec. 3.1). The overall training procedure is summarized in Algorithm 1. In the training stage, we first train a teacher model in a supervised manner and use it to generate pseudo-labels for the unlabeled dataset. We then alternate between training the student model and regenerating the pseudo-labels, optimizing the student model times. In the testing stage, we follow the method in Sec. 3.2 to generate the final estimation using the th student model.
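The training procedure described above can be sketched as a short loop. This is a schematic outline rather than the authors' Algorithm 1: `train_fn`, `fuse_predict_fn`, and `num_rounds` are hypothetical names, and the toy stand-ins below only record what each stage is called with.

```python
def dmpct_train(labeled, unlabeled, train_fn, fuse_predict_fn, num_rounds):
    """Schematic teacher/student loop.

    labeled: list of (volume, mask) pairs; unlabeled: list of volumes.
    train_fn(dataset) returns a model trained on (volume, mask) pairs.
    fuse_predict_fn(model, volume) returns a fused pseudo-label mask.
    """
    model = train_fn(labeled)  # teacher: supervised training only
    for _ in range(num_rounds):
        # Pseudo-label every unlabeled volume with the current model.
        pseudo = [(x, fuse_predict_fn(model, x)) for x in unlabeled]
        # Student: train on the union of labeled and pseudo-labeled data.
        model = train_fn(labeled + pseudo)
    return model

# Toy stand-ins (assumptions) that track dataset sizes instead of training.
train_sizes = []

def toy_train(dataset):
    train_sizes.append(len(dataset))
    return {"trained_on": len(dataset)}

def toy_fuse_predict(model, volume):
    return [0 for _ in volume]

final_model = dmpct_train(
    labeled=[([1, 2], [0, 1])],
    unlabeled=[[3, 4], [5, 6]],
    train_fn=toy_train,
    fuse_predict_fn=toy_fuse_predict,
    num_rounds=2,
)
```

Note that the pseudo-labels are regenerated in every round from the most recent model, so the student of one round plays the role of the teacher in the next.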
4 Experiments
4.1 Dataset and Evaluation
Our fully-labeled dataset includes 210 contrast-enhanced abdominal clinical CT images in the portal venous phase, from which we randomly choose 50/30/80 patients for training, validation, and testing, unless otherwise specified. A total of 16 structures (Aorta, Adrenal gland, Celiac AA, Colon, Duodenum, Gallbladder, Inferior Vena Cava (IVC), Kidney (left, right), Liver, Pancreas, Superior Mesenteric Artery (SMA), Small bowel, Spleen, Stomach, Veins) were segmented for each case by four experienced radiologists and confirmed by an independent senior expert. Our unlabeled dataset consists of 100 unlabeled cases acquired from a local hospital. To the best of our knowledge, this is the largest abdominal CT dataset with the largest number of organs segmented. Each CT volume consists of slices of pixels, and has a voxel spatial resolution of . The metric we use is the Dice-Sørensen Coefficient (DSC), which measures the similarity between the prediction voxel set and the ground-truth set , with the mathematical form of . For each organ, we report the average DSC together with the standard deviation over all the testing cases.
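The standard Dice-Sørensen Coefficient of two voxel sets is twice the size of their intersection divided by the sum of their sizes. A minimal NumPy sketch of a per-organ computation follows; the function name `dice_per_organ` and the choice to return NaN when an organ is absent from both masks are assumptions of this example.

```python
import numpy as np

def dice_per_organ(pred, truth, label):
    """DSC between the voxels labeled `label` in two label maps."""
    z = pred == label    # predicted voxel set
    y = truth == label   # ground-truth voxel set
    denom = z.sum() + y.sum()
    if denom == 0:
        # Organ absent from both masks: DSC is undefined (assumed handling).
        return float("nan")
    return 2.0 * np.logical_and(z, y).sum() / denom

pred = np.array([0, 1, 1, 2])
truth = np.array([0, 1, 2, 2])
dsc_label1 = dice_per_organ(pred, truth, 1)
dsc_label2 = dice_per_organ(pred, truth, 2)
```

Averaging such per-case values over the test set, for instance with `np.nanmean` and `np.nanstd`, gives a mean and standard deviation of the kind reported per organ.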
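Organ Type  FCN (50 / 0)  SPSL (50 / 50)  SPSL (50 / 100)  DMPCT (Ours) (50 / 50)  DMPCT (Ours) (50 / 100)  p-value
(Column headers give the number of labeled / unlabeled training cases.)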
Aorta  
Adrenal gland  
Celiac AA  
Colon  
Duodenum  
Gallbladder  
IVC  
Kidney (L)  
Kidney (R)  
Liver  
Pancreas  
SMA  
Small bowel  
Spleen  
Stomach  
Veins  
Mean 
4.2 Implementation Details
We set the learning rate to be . The teacher model and the student model are trained for and iterations respectively. The validation set is used for tuning the hyperparameters. Similar to [19], we use three windows of , , and Hounsfield Units as the three input channels respectively. The intensities of each slice are rescaled to [0.0, 1.0]. Similar to [48, 45, 41], we initialize the network parameters by using the FCN-8s model [24] pretrained on the PASCAL VOC image segmentation dataset. The iteration number in Algorithm 1 is set to 2, i.e., , since the performance on the validation set saturates.
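The windowing step can be illustrated as follows: each window clips the CT intensities to a range of Hounsfield Units and rescales the result to [0, 1], and the three windowed copies are stacked as input channels. The window ranges in `WINDOWS` are hypothetical placeholders chosen for this sketch and are not the values used in the experiments; `window_slice` is likewise an assumed name.

```python
import numpy as np

# Hypothetical (low, high) windows in Hounsfield Units, for illustration only.
WINDOWS = [(-1000.0, 1000.0), (-160.0, 240.0), (-50.0, 150.0)]

def window_slice(hu_slice, windows=WINDOWS):
    """Turn one 2D slice in HU into a (num_windows, H, W) array in [0, 1]."""
    channels = []
    for low, high in windows:
        clipped = np.clip(hu_slice, low, high)
        channels.append((clipped - low) / (high - low))  # rescale to [0, 1]
    return np.stack(channels, axis=0)

hu = np.array([[-2000.0, 0.0], [40.0, 3000.0]])
channels = window_slice(hu)
```

A three-channel input of this form matches the input shape expected by a network pretrained on RGB images, which is one practical reason to use exactly three windows.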
4.3 Comparison with the Baseline
We compare our proposed DMPCT with two other methods: 1) the fully supervised learning method [24] (denoted as FCN), and 2) the single-planar semi-supervised learning approach [5] (denoted as SPSL). Both 1) and 2) are applied on each individual plane separately, and the final result is then obtained via multi-planar fusion (see Sec. 3.2). As shown in Table 1, with labeled data, by varying the number of unlabeled data from 0 to 100, the average DSC of DMPCT increases from to and the standard deviation decreases from to . Compared with SPSL, our proposed DMPCT boosts the performance in both settings (i.e., 50 labeled data + 50 unlabeled data and 50 labeled data + 100 unlabeled data). In addition, the p-values for testing the significance of the difference between our DMPCT (50 labeled data + 100 unlabeled data) and FCN (50 labeled data + 0 unlabeled data) for organs are shown in the last column of Table 1, which suggests statistically significant improvements for almost all organs. Figure 4 compares our DMPCT and the fully supervised method using box plots.
It is noteworthy that greater improvements are observed for the difficult organs, i.e., organs that are either small in size or have complex geometric characteristics. Table 1 indicates that our DMPCT approach boosts the segmentation performance of these small, hard organs by (Pancreas), (Colon), (Duodenum), (Small bowel) and (Veins), (IVC). This promising result indicates that our method distills a reasonable amount of knowledge from the unlabeled data. An example is shown in Figure 5. In this particular case, the DSCs for Celiac AA, Colon, Duodenum, IVC, Pancreas and Veins are boosted from , , , , to , , , , respectively.
Organ  Spleen  Kidney (R)  Kidney (L)  Gall Bladder  Liver 

FCN  
DMPCT (Ours)  
Organ  Stomach  Aorta  IVC  Veins  Pancreas 
FCN  
DMPCT (Ours) 
4.4 Discussion
4.4.1 Amount of labeled data
For ablation analysis, we enlarge the labeled training set to 100 cases and keep the rest of the settings the same. As shown in Figure 6, with more labeled data, the semi-supervised methods (DMPCT, SPSL) still obtain better performance than the supervised method (FCN), although the performance gain becomes less prominent. This is probably because the network is already trained well when a large labeled training set is available. We believe that performance should improve considerably if much more unlabeled data can be provided. In addition, we find that DMPCT outperforms SPSL in every setting, which further demonstrates the usefulness of multi-planar fusion in our co-training framework.
4.4.2 Comparison with 3D network-based self-training
Various previous studies [25, 41] demonstrate that 2D multi-planar fusion outperforms direct 3D learning in the fully supervised setting. 3D CNNs come with an increased number of parameters and significant memory and computational requirements. Due to GPU memory restrictions, 3D CNN approaches that adopt the sliding-window strategy do not act on the entire 3D CT volume, but instead on local 3D patches [12, 15, 17]. This results in a lack of holistic information and in low efficiency. To test whether DMPCT also outperforms direct 3D learning in the semi-supervised setting, we implement a patch-based 3D U-Net [12]. 3D U-Net gets in terms of mean DSC using 50 labeled data. When adding 100 unlabeled data the performance even drops to . This suggests that in 3D learning the teacher model is not trained well, so the errors in the pseudo-labels are reinforced during student model training.
4.4.3 Comparison with traditional co-training
To compare our DMPCT with the traditional co-training algorithm [7], we also implement a variant that selects only the most confident samples during each iteration. Here the confidence score is measured by the entropy of the probability distribution for each voxel in one slice. Under the setting of 50 labeled cases and 50 unlabeled cases, we select the top 5000 samples with the highest confidence in each iteration. The whole training process takes about 67 iterations for each plane. The complete training requires more than 50 hours. Compared with our approach, this method requires much more time to converge. It obtains a mean DSC of , slightly better than SPSL but worse than our DMPCT, which suggests that selecting only the most confident samples during training may not be a wise choice for deep-network-based semi-supervised learning because of its low efficiency.
4.4.4 Cross dataset generalization
We apply our trained DMPCT model (50 labeled data + 100 unlabeled data) and the baseline FCN model (50 labeled data + 0 unlabeled data) to a publicly available abdominal CT dataset (30 training data sets at https://www.synapse.org/#!Synapse:syn3193805/wiki/217789) with 13 anatomical structures labeled, without any retraining on the new cases. We evaluate the 10 of the 13 structures that are also manually annotated in our own dataset, and find that our proposed method improves the overall mean DSC and also reduces the standard deviation significantly, as shown in Table 2. The overall mean DSC as well as the standard deviation for the 10 organs is improved from to . We also directly test our models on the NIH pancreas segmentation dataset of 82 cases (https://wiki.cancerimagingarchive.net/display/Public/PancreasCT) and observe that our DMPCT model achieves an average DSC of , outperforming the fully supervised method, with an average DSC of , by more than . This suggests that our approach, which leverages more unlabeled data from multiple planes, generalizes better than the baseline model.
4.4.5 Computation time
In our experiments, the teacher model training process takes about hours on an NVIDIA TITAN Xp GPU card for iterations over all the training cases. The average computation time per volume for generating pseudo-labels, as well as for testing, depends on the volume of the target structure; the average computation time for organs is approximately minutes, which is comparable to other recent methods [48, 32] even for single-structure inference. The student model training process takes about hours for iterations.
5 Conclusion
In this paper, we designed DMPCT, a systematic framework for multi-organ segmentation in abdominal CT scans, which is motivated by the traditional co-training strategy and incorporates multi-planar information for the unlabeled data during training. The pseudo-labels are iteratively updated by performing inference on multiple planes of the unlabeled data and combining the results with a multi-planar fusion module. We evaluate our approach on our own large, newly collected, high-quality dataset. The results show that 1) our method outperforms the fully supervised learning approach by a large margin; 2) it outperforms the single-planar method, which further demonstrates the benefit of multi-planar fusion; and 3) it can learn better if more unlabeled data are provided, especially when the scale of the labeled data is small.
Our framework can be practical in assisting radiologists in clinical applications, since the annotation of multiple organs in 3D volumes requires massive labor from radiologists. Our framework is not specific to a certain structure, but shows robust results on multiple complex anatomical structures within efficient computational time. Our algorithm may achieve even higher accuracy if a more powerful backbone network or a more advanced fusion algorithm is employed, which we leave as future work.
Acknowledgement. This work was supported by the Lustgarten Foundation for Pancreatic Cancer Research and also supported by NSFC No. 61672336. We thank Prof. Seyoun Park, Dr. Lingxi Xie, Cihang Xie, Zhishuai Zhang, Fengze Liu, Zhuotun Zhu and Yingda Xia for instructive discussions.
References
 [1] P. Aljabar, R. A. Heckemann, A. Hammers, J. V. Hajnal, and D. Rueckert. Multiatlas based segmentation of brain images: atlas selection and its effect on accuracy. NeuroImage, 46(3):726–738, 2009.
 [2] A. J. Asman and B. A. Landman. Nonlocal statistical label fusion for multiatlas segmentation. MIA, 17(2):194–208, 2013.
 [3] S. Bai, X. Bai, Z. Zhou, Z. Zhang, and L. J. Latecki. Gift: A realtime and scalable 3d shape search engine. In CVPR, pages 5023–5032, 2016.
 [4] S. Bai, X. Bai, Z. Zhou, Z. Zhang, Q. Tian, and L. J. Latecki. Gift: Towards scalable 3d shape retrieval. TMM, 19(6):1257–1271, 2017.
 [5] W. Bai, O. Oktay, M. Sinclair, H. Suzuki, M. Rajchl, G. Tarroni, B. Glocker, A. King, P. M. Matthews, and D. Rueckert. Semisupervised learning for networkbased cardiac mr image segmentation. In MICCAI, pages 253–260, 2017.
 [6] A. Blum and S. Chawla. Learning from labeled and unlabeled data using graph mincuts. 2001.
 [7] A. Blum and T. Mitchell. Combining labeled and unlabeled data with cotraining. In COLT, pages 92–100, 1998.
 [8] T. Brosch and A. Saalbach. Foveal fully convolutional nets for multiorgan segmentation. In Medical Imaging 2018: Image Processing, volume 10574, page 105740U, 2018.
 [9] H. Chen, X. Qi, J.Z. Cheng, P.A. Heng, et al. Deep contextual networks for neuronal structure segmentation. In AAAI, pages 1167–1173, 2016.
 [10] S. Chen, H. Roth, S. Dorn, M. May, A. Cavallaro, M. M. Lell, M. Kachelrieß, H. Oda, K. Mori, and A. Maier. Towards automatic abdominal multiorgan segmentation in dual energy ct using cascaded 3d fully convolutional network. arXiv preprint arXiv:1710.05379, 2017.
 [11] C. Chu, M. Oda, T. Kitasaka, K. Misawa, M. Fujiwara, Y. Hayashi, Y. Nimura, D. Rueckert, and K. Mori. Multiorgan Segmentation based on SpatiallyDivided Probabilistic Atlas from 3D Abdominal CT Images. In MICCAI, pages 165–172, 2013.
 [12] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger. 3d unet: learning dense volumetric segmentation from sparse annotation. In MICCAI, pages 424–432, 2016.
 [13] D. Ciresan, A. Giusti, L. M. Gambardella, and J. Schmidhuber. Deep neural networks segment neuronal membranes in electron microscopy images. In NIPS, pages 2843–2851, 2012.
 [14] A. Ciurte, X. Bresson, O. Cuisenaire, N. Houhou, S. Nedevschi, J.P. Thiran, and M. B. Cuadra. Semisupervised segmentation of ultrasound images based on patch representation and continuous min cut. PloS one, 9(7):e100972, 2014.
 [15] Q. Dou, H. Chen, Y. Jin, L. Yu, J. Qin, and P.A. Heng. 3d deeply supervised network for automatic liver segmentation from ct volumes. In MICCAI, pages 149–157, 2016.
 [16] E. Gibson, F. Giganti, Y. Hu, E. Bonmati, S. Bandula, K. Gurusamy, B. Davidson, S. P. Pereira, M. J. Clarkson, and D. C. Barratt. Automatic multiorgan segmentation on abdominal ct with dense vnetworks. TMI, 2018.
 [17] E. Gibson, F. Giganti, Y. Hu, E. Bonmati, S. Bandula, K. Gurusamy, B. R. Davidson, S. P. Pereira, M. J. Clarkson, and D. C. Barratt. Towards imageguided pancreas and biliary endoscopy: automatic multiorgan segmentation on abdominal ct with dense dilated networks. In MICCAI, pages 728–736, 2017.
 [18] L. Gu, Y. Zheng, R. Bise, I. Sato, N. Imanishi, and S. Aiso. Semisupervised learning for biomedical image segmentation via forest oriented super pixels (voxels). In MICCAI, pages 702–710, 2017.
 [19] A. P. Harrison, Z. Xu, K. George, L. Lu, R. M. Summers, and D. J. Mollura. Progressive and multipath holistically nested neural networks for pathological lung segmentation from ct images. In MICCAI, pages 621–629, 2017.
 [20] R. A. Heckemann, J. V. Hajnal, P. Aljabar, D. Rueckert, and A. Hammers. Automatic anatomical brain mri segmentation combining label propagation and decision fusion. NeuroImage, 33(1):115–126, 2006.
 [21] P. Hu, F. Wu, J. Peng, Y. Bao, F. Chen, and D. Kong. Automatic abdominal multiorgan segmentation using deep convolutional neural network and timeimplicit level sets. International journal of computer assisted radiology and surgery, 12(3):399–411, 2017.
 [22] J. E. Iglesias and M. R. Sabuncu. Multiatlas segmentation of biomedical images: A survey. MIA, 24(1):205–219, 2015.
 [23] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, pages 1097–1105, 2012.
 [24] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, pages 3431–3440, 2015.
 [25] A. Prasoon, K. Petersen, C. Igel, F. Lauze, E. Dam, and M. Nielsen. Deep Feature Learning for Knee Cartilage Segmentation Using a Triplanar Convolutional Neural Network. MICCAI, 2013.
 [26] S. Qiao, W. Shen, Z. Zhang, B. Wang, and A. Yuille. Deep cotraining for semisupervised image recognition. In ECCV, 2018.
 [27] C. Qin, R. G. Moreno, C. Bowles, C. Ledig, P. Scheltens, F. Barkhof, H. RhodiusMeester, B. M. Tijms, A. W. Lemstra, W. M. van der Flier, B. Glocker, and D. Rueckert. A semisupervised large margin algorithm for white matter hyperintensity segmentation. In MICCAI, pages 104–112, 2016.
 [28] I. Radosavovic, P. Dollár, R. Girshick, G. Gkioxari, and K. He. Data distillation: Towards omnisupervised learning. In CVPR, 2018.
 [29] T. Rohlfing, R. Brandt, R. Menzel, and C. R. Maurer Jr. Evaluation of atlas selection strategies for atlasbased image segmentation with application to confocal microscopy images of bee brains. NeuroImage, 21(4):1428–1442, 2004.
 [30] O. Ronneberger, P. Fischer, and T. Brox. Unet: Convolutional networks for biomedical image segmentation. In MICCAI, pages 234–241, 2015.
 [31] C. Rosenberg, M. Hebert, and H. Schneiderman. Semisupervised selftraining of object detection models. In WACV/MOTION, pages 29–36, 2005.
 [32] H. R. Roth, L. Lu, N. Lay, A. P. Harrison, A. Farag, A. Sohn, and R. M. Summers. Spatial aggregation of holisticallynested convolutional neural networks for automated pancreas localization and segmentation. MIA, 45:94 – 107, 2018.
 [33] H. R. Roth, H. Oda, Y. Hayashi, M. Oda, N. Shimizu, M. Fujiwara, K. Misawa, and K. Mori. Hierarchical 3d fully convolutional networks for multiorgan segmentation. arXiv preprint arXiv:1704.06382, 2017.
 [34] H. R. Roth, C. Shen, H. Oda, T. Sugino, M. Oda, Y. Hayashi, K. Misawa, and K. Mori. A multiscale pyramid of 3d fully convolutional networks for abdominal multiorgan segmentation. arXiv preprint arXiv:1806.02237, 2018.
 [35] W. Shen, B. Wang, Y. Jiang, Y. Wang, and A. Yuille. Multistage multirecursiveinput fully convolutional networks for neuronal boundary detection. In ICCV, pages 2391–2400, 2017.
 [36] R. T. Sousa and J. Gama. Comparison between cotraining and selftraining for singletarget regression in data streams using amrules. 2017.
 [37] P. Tang, X. Wang, S. Bai, W. Shen, X. Bai, W. Liu, and A. L. Yuille. Pcl: Proposal cluster learning for weakly supervised object detection. TPAMI, 2018.
 [38] P. Tang, X. Wang, X. Bai, and W. Liu. Multiple instance detection network with online instance classifier refinement. In CVPR, pages 2843–2851, 2017.
 [39] P. Tang, X. Wang, A. Wang, Y. Yan, W. Liu, J. Huang, and A. Yuille. Weakly supervised region proposal network and object detection. In ECCV, pages 370–386, 2018.
 [40] J. Wang, T. Jebara, and S.F. Chang. Semisupervised learning using greedy maxcut. JMLR, 14(Mar):771–800, 2013.
 [41] Y. Wang, Y. Zhou, W. Shen, S. Park, E. K. Fishman, and A. L. Yuille. Abdominal multiorgan segmentation with organattention networks and statistical fusion. arXiv preprint arXiv:1804.08414, 2018.
 [42] Y. Wang, Y. Zhou, P. Tang, W. Shen, E. K. Fishman, and A. L. Yuille. Training multiorgan segmentation networks with sample selection by relaxed upper confident bound. In MICCAI, 2018.
 [43] R. Wolz, C. Chu, K. Misawa, M. Fujiwara, K. Mori, and D. Rueckert. Automated abdominal multiorgan segmentation with subjectspecific atlas generation. TMI, 32(9):1723–1730, 2013.
 [44] C. Xu, D. Tao, and C. Xu. A survey on multiview learning. arXiv preprint arXiv:1304.5634, 2013.

 [45] Q. Yu, L. Xie, Y. Wang, Y. Zhou, E. K. Fishman, and A. L. Yuille. Recurrent Saliency Transformation Network: Incorporating Multi-Stage Visual Cues for Small Organ Segmentation. In CVPR, 2018.
 [46] X. Zhou, T. Ito, R. Takayama, S. Wang, T. Hara, and H. Fujita. Three-dimensional CT image segmentation by combining 2D fully convolutional network with 3D majority voting. In Deep Learning and Data Labeling for Medical Applications, pages 111–120. 2016.
 [47] Y. Zhou, L. Xie, E. K. Fishman, and A. L. Yuille. Deep supervision for pancreatic cyst segmentation in abdominal ct scans. In MICCAI, 2017.
 [48] Y. Zhou, L. Xie, W. Shen, Y. Wang, E. K. Fishman, and A. L. Yuille. A fixedpoint model for pancreas segmentation in abdominal ct scans. In MICCAI, pages 693–701, 2017.