MDU-Net: Multi-scale Densely Connected U-Net for biomedical image segmentation

The radiologist is the "doctor's doctor", and biomedical image segmentation plays a central role in quantitative analysis, clinical diagnosis, and medical intervention. Following fully convolutional networks (FCN) and U-Net, deep convolutional neural networks (DCNNs) have made significant contributions to biomedical image segmentation. In this paper, based on U-Net, we propose MDU-Net, a multi-scale densely connected U-Net for biomedical image segmentation. We propose three different multi-scale dense connections for the encoder, the decoder, and the connections across them in a U-shaped architecture. The highlight of our architecture is that it directly fuses neighboring feature maps of different scales from both higher and lower layers to strengthen feature propagation in the current layer, which largely improves the information flow within the encoder, the decoder, and across them. Multi-scale dense connections, which contain shorter connections between layers close to the input and output, also make a much deeper U-Net possible. We adopt the optimal model based on our experiments and propose a novel Multi-scale Dense U-Net (MDU-Net) architecture with quantization, which reduces overfitting in MDU-Net for better accuracy. We evaluate the proposed model on the MICCAI 2015 Gland Segmentation dataset (GlaS). The three multi-scale dense connections improve U-Net performance by up to 1.8%, and MDU-Net with quantization outperforms U-Net by up to 3%.




1 Introduction

Figure 1: Example of a multi-scale dense connected encoder block.

Biomedical image segmentation delineates biological structures to support medical diagnosis, surgical planning and treatment. Based on fully convolutional networks (FCN) and U-Net [31, 26], deep convolutional networks (DCNNs) have made significant improvements in biomedical image segmentation. Due to their high efficiency and capability to automatically capture information without hand-designed features, deep learning methods have dominated biomedical image analysis. Because of segmentation abnormalities and histological variations, a higher level of pixelwise prediction is required in biomedical image analysis than in natural images. In particular, a marginal bias in biomedical segmentation may result in incorrect clinical treatment, so improving segmentation accuracy continues to attract attention. Recent works such as U-Net apply skip connections to combine feature maps from the current layer with higher-layer feature maps and have proven competitive at maintaining fine-grained information; segmentation masks are generated with contextual detail even when the background composition is complicated. Existing approaches that introduce dense connections into U-shaped networks can be divided into two categories.

1) Intra-block dense connections, which embed a dense block into the traditional convolutional block, such as FD-UNet [15]. In addition, cascades of stacked U-Nets have also gained attention: CU-Net [9] performs dense connections at the same level among multiple U-Nets. However, these works do not consider transforming the size of feature maps and are therefore substantially different from ours. 2) Inter-block dense connections, which means the current layer can fuse feature maps of different scales from previous layers. For instance, MIMO-Net [30] takes input images of different scales in the encoder unit, but the feature maps are not actually reused. UNet++ [45] fuses higher-resolution feature maps in the decoder unit, but it involves a massive computational cost due to the large number of intermediate convolutions, and the current layer can only fuse feature maps from higher layers.

Inspired by DenseNet [18], to improve segmentation accuracy we directly down-sample features from lower layers and up-sample features from higher layers to the resolution of the current layer, and fuse them with the current layer's feature maps. We use two 1×1 convolutions to keep the number of channels the same as before, so the whole operation involves only a small constant number of extra parameters. To the best of our knowledge, we are the first to explore directly fusing deep, semantic, coarse-grained feature maps from higher layers with low-level, fine-grained feature maps from lower layers. The modified fusing operation contains more object information and pixel information, and therefore improves segmentation in the U-Net architecture. We also systematically analyze the impact of different kinds of densely connected structures. The experiments show that fusing higher- and lower-layer feature maps simultaneously is more effective and achieves higher precision.
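The fusion step can be illustrated with a minimal numpy sketch (the shapes, the pooling/upsampling choices, and the single 1×1 weight matrix are assumptions for illustration, not the paper's implementation):

```python
import numpy as np

def avg_pool2x(x):
    """Downsample a (C, H, W) feature map by 2 with average pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2x(x):
    """Upsample a (C, H, W) feature map by 2 with nearest neighbour."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse(current, lower, higher, w_1x1):
    """Resize the neighbouring layers' maps to the current resolution,
    concatenate along channels, then apply a 1x1 convolution (a
    per-pixel matmul over channels) to restore the channel count."""
    stacked = np.concatenate(
        [avg_pool2x(lower), current, upsample2x(higher)], axis=0)
    c_tot, h, w = stacked.shape
    out = w_1x1 @ stacked.reshape(c_tot, -1)   # (C_out, H*W)
    return out.reshape(-1, h, w)

# Toy shapes: current 16x32x32, lower 8x64x64, higher 32x16x16.
cur = np.ones((16, 32, 32))
low = np.ones((8, 64, 64))
high = np.ones((32, 16, 16))
w = np.random.randn(16, 16 + 8 + 32)           # 1x1 conv weights
fused = fuse(cur, low, high, w)                # back to 16x32x32
```

The key point the sketch captures is that only the 1×1 weight matrix `w` is new; the resized neighbour maps are reused, not recomputed.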

The contributions of our work are: 1) a complete and systematic experimental analysis of the influence of multi-scale dense connections on U-Net; 2) a novel Multi-scale Dense U-Net (MDU-Net) architecture with quantization, adopting the optimal model from our experiments. The proposed model outperforms U-Net by up to 3% on Test A and 4.1% on Test B.

2 Related Work

In this section, we introduce recent approaches to the U-Net architecture, dense connections, multi-scale representation, network quantization, and biomedical image segmentation.

2.1 U-Net architecture

Encoder-decoder models are designed to recover high-resolution representations of an image from low-resolution ones. [31] initially proposed the U-shaped network architecture with direct skip connections between the encoder and decoder, and later work systematically analyzed and demonstrated the importance of long skip connections in U-Net for biomedical image segmentation. Beyond image segmentation, a variety of tasks build on U-Net-based architectures. Stacked U-Nets [34] iteratively fuse multi-scale features without changing the resolutions. For human pose estimation, [27, 38, 41] stacked modified U-Nets that capture both top-down and bottom-up features as a whole. [33, 12] follow a grid pattern in the U-shaped structure. More generally, [24] additionally employed multi-path refinement and global convolutional blocks between the encoder and decoder. The classification and localization problems are solved simultaneously during the successive down-sampling and up-sampling operations in U-Net. Furthermore, we conduct detailed experiments on the impact of various dense connections on the U-Net architecture.

2.2 Dense connections

Recently, both the depth and the width of network architectures have been actively explored. Approaches toward wider networks begin with [36, 37], which introduced the 'Inception Module' by concatenating feature maps to approximate a sparse structure. Moreover, residual networks [17, 19] alleviated the vanishing-gradient problem by summing a shortcut connection with the residual function. Recent methods such as PSPNet [43] and RefineNet [24] frequently apply residual architectures as feature extractors in dense prediction tasks. [11] combined U-Net with a residual network and proved skip connections effective in biomedical image segmentation. Additionally, to improve representational power without increasing the depth and width of the network, [18] proposed the now-typical structure of dense connections: in a dense block, the output of each convolution unit is passed to all subsequent units as input through concatenation. With substantially fewer parameters, the network enables feature reuse and better gradient flow and therefore yields extremely competitive results. FC-DenseNet [21] extended DenseNet [18] to fully convolutional networks with dense blocks, using transition-up modules in the upsampling path, to address semantic segmentation. [40] further improved dense decoder blocks with feature-level long-range skip connections; with a cascaded single-pass architecture, the network obtained strong results at a lower computational cost on multi-scale tasks. The compact structure of dense connections integrates shortcut connections, feature reuse, and implicit deep supervision while adding no extra optimization difficulty. Apart from directly adding dense connections in convolutional blocks, [1] composed denser scale sampling and denser pixel sampling in an atrous spatial pyramid pooling module [4]. Dense connections have proved extraordinarily effective in biomedical image processing due to the limited amount of data: [15] incorporated dense connectivity [22] within the encoder and decoder paths, and to address the spatial information of 3D input data, [23] used 2D-DenseUNet as an intra-slice feature extractor along with a hybrid feature fusion module to formulate end-to-end learning.
Inspired by this literature, we generalize dense connections to extend feature fusion and contextual information across scales between the encoder and decoder.

2.3 Multi-scale Representation

Approaches to encoding multi-scale context information have been widely explored. Besides the encoder-decoder structures discussed above, image pyramids [3, 5, 25] are frequently constructed so that objects at various scales are captured by the network. Dilated (atrous) convolutions [3, 4, 42], deployed in parallel or in cascade, expand the receptive field while adding no extra parameters. Further, ASPP [4] arranges atrous convolutions in parallel within spatial pyramid pooling to efficiently capture features at arbitrary scales; in particular, DenseASPP [1] stacks ASPP modules in a denser manner. Beyond atrous convolution, deformable convolution [7] generalizes it by learning offsets for the spatial sampling locations.
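The receptive-field gain from dilation can be checked with a few lines (standard stride-1 receptive-field arithmetic, not code from any cited paper):

```python
def receptive_field(layers):
    """Receptive field of a stack of convolutions, each given as
    (kernel_size, dilation), with stride 1 throughout."""
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d   # each layer adds (k-1)*d pixels of context
    return rf

# Three plain 3x3 convs vs. three dilated 3x3 convs (dilations 1, 2, 4),
# same parameter count in both cases:
plain = receptive_field([(3, 1)] * 3)                 # -> 7
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])   # -> 15
```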

2.4 Network Quantization

The increasing scale of networks usually results in high consumption of computational resources and relatively difficult optimization. Quantization techniques for training deep neural networks are therefore gaining attention, and recent approaches [16, 6, 29] have succeeded in reducing network size by cropping the precision of operations and operands. Incremental network quantization [44] compresses the parameters to powers of two or zero through iterative weight partition, group-wise quantization, and re-training. This pruning-inspired strategy treats the quantized parameters as a weak model and compensates for the loss of precision by re-training the remaining parameters. Quantization improves the generalization of the network and its robustness to potential overfitting at the cost of a subtle loss of precision.

Figure 2: Illustration of a sample combination of the multi-scale dense encoder, multi-scale dense decoder, and multi-scale dense cross-connection architectures based on U-Net.

2.5 Biomedical Image Segmentation

Previously, hand-crafted features containing morphological information were designed, and traditional graph-based models were frequently used [20, 28, 35, 13]. However, malignant subjects vary considerably in appearance and are beyond the capacity of traditional methods. Therefore, deep learning methods have dominated biomedical image processing in recent years [8, 10, 32], especially in histological section analysis [2, 32]. To relieve the effort of manual annotation, Suggestive Annotation [esa] combined a fully convolutional network with active learning to select hard examples for further annotation. [2, 14] modified loss functions and achieved promising results for gland segmentation. In addition, MIMO-Net [30] deals with the variation of cell boundary intensities and sizes by exploiting multiple inputs and outputs in the network. To this end, we propose a simple yet effective multi-scale connectivity pattern for biomedical image segmentation.

3 Method

In this section, we first introduce three multi-scale dense connected blocks: in the encoder, in the decoder, and across the encoder and decoder. The overall architecture combining the three blocks is illustrated in Figure 2, and we compare the proposed blocks with U-Net in detail. Second, we describe the implementation of quantization in the proposed model, which reduces overfitting.

3.1 Dense Encoder and Decoder Block

Our improvements are based on U-Net, so we briefly review its basic structure first. A traditional encoder unit is shown on the left of Figure 3. Let x_I and y_I denote the input and output of the current layer I, where x_I is the output y_{I-1} of the previous layer after downsampling; Eq. 1 and Eq. 2 describe this process.

Figure 3: A traditional encoder unit in U-Net vs. our proposed dense connected encoder unit

Our method uses a fused input in place of x_I, defined in Eq. 4. We encode the feature maps from previous layers I−n to I−2, adjusted to the same size as x_I, and fuse them with x_I; here the fusion operation consists of concatenation followed by a 1×1 convolution. The value n denotes how many ordered previous layers' feature maps the current layer fuses; the influence of the dense connection number n is discussed in Section 4.1.

Specifically, each convolutional block is composed of two cascaded convolutions, each followed by batch normalization and a ReLU activation function. Figure 1 shows a sample of a densely connected encoder unit. The dense decoder block is similar to the dense encoder block, so we do not repeat its description.

There are also meaningful multi-scale dense connection variants in the encoder and decoder beyond the above, such as the multi-input (Min) and multi-output (Mout) variants shown in Eq. 5 and Eq. 6. In a Min dense connected unit, each layer fuses only the feature maps of the input, downsampled to the corresponding size; in a Mout dense connected unit, only the last layer fuses all feature maps from previous layers, upsampled to the corresponding size.
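The three connectivity patterns above can be made concrete with a small helper that lists, for each layer, which earlier layers' feature maps it fuses (an illustrative sketch; the layer-indexing convention is our assumption):

```python
def fusion_sources(layer, depth, mode, n=2):
    """Indices of the earlier layers whose (resized) feature maps a given
    layer fuses, under three connectivity patterns:
      'dense' - fuse up to n preceding layers (the MDU-Net style pattern),
      'min'   - each layer fuses only the network input (layer 0),
      'mout'  - only the last layer fuses all preceding layers."""
    if mode == "dense":
        return list(range(max(0, layer - n), layer))
    if mode == "min":
        return [0] if layer > 0 else []
    if mode == "mout":
        return list(range(layer)) if layer == depth - 1 else []
    raise ValueError(mode)

# In a 5-layer encoder, layer 3 under each pattern:
dense3 = fusion_sources(3, 5, "dense", n=2)   # [1, 2]
min3 = fusion_sources(3, 5, "min")            # [0]
mout3 = fusion_sources(3, 5, "mout")          # [] (not the last layer)
```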

3.2 Dense Cross connections Block

In this section, we again start from the traditional U-Net cross connections. As shown in Figure 4, a traditional cross-connection unit is defined by Eq. 7, Eq. 8 and Eq. 9. Let x_I and y_I be the input and output of the current decoder layer, and e_I the encoder feature map corresponding to x_I; the decoder input is formed from the upsampled output of the previous decoder layer. The unit thus encodes the feature maps from layer I−1 of the encoder together with the upsampled output of the previous decoder layer.

Figure 4: A traditional connection across the encoder and decoder in U-Net vs. our proposed multi-scale dense connections across the encoder and decoder

Our method uses a fused input in place of x_I, defined in Eq. 11. The decoder layer encodes two groups of feature maps: from the higher-ordered encoder layers I+1 to I+n and from the lower-ordered encoder layers I−n to I−1. These are fused with the current input by the same concatenation and 1×1 convolution operation as before, which keeps the number of channels the same as x_I.

There are also meaningful variants of these dense connections, such as the Upper and Lower variants shown in Eq. 12 and Eq. 13. In an Upper dense connected unit, each decoder layer fuses only features from higher encoder layers, while in a Lower dense connected unit each decoder layer fuses only features from lower encoder layers.

3.3 Fully Multi-scale Dense Connected U-shape Architecture

In this section, we introduce the fully dense connected U-shaped architecture based on U-Net. As illustrated in Figure 2, the improved encoder structure is identical to that of Section 3.1, while the decoding structure combines the multi-scale dense cross connections and the multi-scale dense decoder. The details follow Eq. 7, 8 and 9 in Section 3.2, and the variants and operations share the same descriptions as in Section 3.2.

FMDU-Net encodes the dense cross connections and the dense connected decoder with the corresponding multi-scale feature maps from the encoder and the feature maps from previous decoding blocks, respectively. We re-encode the information obtained from the first encoding operation, and the encoded feature maps share the same number of channels as the original ones.

3.4 Network Quantization

As the increasing scale of the network results in high consumption of computational resources and relatively difficult optimization, we adopt Incremental Network Quantization (INQ) to compress the parameters, acting as a regularizer against potential overfitting. We integrate the results of multiple networks as the final result; the number of parallel models follows [39]. INQ quantizes the parameters to powers of two or zero, which makes shift operations possible, as shown in Eq. 18, where W denotes the original weights, Ŵ the quantized weights, and u and l the upper and lower bounds of the quantization range. Iteratively, half of the remaining weights are quantized and fixed, and the network is then fine-tuned end-to-end until all parameters are quantized. We experiment with 3, 5 and 7 bits to reduce the overfitting of dense connections in Section 4.4.
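The partition-quantize-retrain loop can be sketched as follows (a toy numpy sketch for a 1-D weight vector; the power-of-two range, the magnitude-based partition, and the stubbed fine-tuning step are assumptions, not the paper's exact procedure):

```python
import numpy as np

def quantize_pow2(w, lo=-8, hi=0):
    """Snap each weight to the nearest power of two in 2**lo .. 2**hi
    (keeping its sign), or to zero if it is too small."""
    out = np.zeros_like(w)
    nz = np.abs(w) >= 2.0 ** (lo - 1)
    exp = np.clip(np.round(np.log2(np.abs(w[nz]))), lo, hi)
    out[nz] = np.sign(w[nz]) * 2.0 ** exp
    return out

def incremental_quantize(weights, steps=2):
    """Iteratively quantize the largest half of the still-float weights
    and keep them fixed; real INQ fine-tunes the rest between steps."""
    w = weights.copy()
    fixed = np.zeros(w.shape, dtype=bool)
    for _ in range(steps):
        free = np.flatnonzero(~fixed)
        if free.size == 0:
            break
        order = free[np.argsort(-np.abs(w[free]))]   # largest magnitude first
        chosen = order[: max(1, free.size // 2)]
        w[chosen] = quantize_pow2(w[chosen])
        fixed[chosen] = True
        # (fine-tune the remaining float weights here)
    return w, fixed

w0 = np.array([0.3, -1.2, 0.0001, 0.6])
q, fixed = incremental_quantize(w0, steps=2)
# q -> [0.25, -1.0, 0.0001, 0.5]; the smallest weight is still float.
```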

4 Experiments

To evaluate the proposed model thoroughly, we use the Gland Segmentation (GlaS) dataset, a biomedical image dataset from the Histology Image Challenge held at MICCAI 2015. It contains 165 images derived from 16 H&E stained histological sections of colon cancer. 85 images (37 benign and 48 malignant) are selected for training, while 80 images (37 benign and 43 malignant) are used for testing; the test images are separated into two parts (60 in Test Part A and 20 in Test Part B). We train our proposed end-to-end network with backpropagation on two NVIDIA GeForce GTX TITAN X GPUs, each with 12 GB of memory. We set the initial learning rate to 0.005 and divide it by 10 every time the iteration count reaches a threshold. The SGD optimization algorithm and a batch size of 4 are used during training, and the optimal model is selected based on performance on the training set. Additionally, we conduct experiments on dense connections of various sizes and shapes. For the dense encoder and dense decoder blocks, we compare connection numbers from 1 to 4 and the two special cases (Min and Mout) mentioned before. For the dense cross block, due to the limited depth of the network, we examine the effectiveness on a limited number of connection settings. Besides, the performance of quantization is evaluated independently.
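The step-decay schedule described above can be sketched as follows (the milestone iterations are placeholders; the text only states the initial rate of 0.005 and the divide-by-10 rule):

```python
def step_decay_lr(iteration, base_lr=0.005, milestones=(10_000, 20_000)):
    """Learning rate under step decay: divide base_lr by 10 each time
    the iteration count passes a milestone (milestones are assumed)."""
    drops = sum(iteration >= m for m in milestones)
    return base_lr / (10 ** drops)

lr0 = step_decay_lr(0)        # 0.005
lr1 = step_decay_lr(10_000)   # 0.0005
lr2 = step_decay_lr(25_000)   # 0.00005
```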

Figure 5: Training loss on the Gland dataset with various dense connected architectures based on U-Net

As illustrated in Figure 5, we compare the training loss of the original U-Net, the single dense connected models, and the combined dense connected models based on U-Net over the first four hundred epochs. After one hundred epochs, our proposed models are more stable than the original U-Net, which supports our claim that multi-scale dense connections improve the information flow within the encoder, the decoder, and across them, and achieve higher precision. We also compare the outputs of our proposed dense connected models with those of the original U-Net, and our models perform better. In the following subsections, we discuss the effect of the number of dense connections in the single and combined models, from which we find that too many dense connections may lead to overfitting even as they improve accuracy; we then discuss model quantization as a way to reduce this overfitting.

4.1 Discussion on the number of dense connections

In this section, we explore in detail how each dense structure (dense encoder block, dense decoder block, dense cross connections) behaves as the number of connections varies. As shown in Tables 1, 2 and 3, each structure is listed with its corresponding number of connections. The experiments show that accuracy generally increases with the number of dense connections. This indicates that dense connections, which encode object information from higher layers and pixel information from lower layers, improve feature reuse and thus yield a promising segmentation accuracy. On the MICCAI 2015 Gland dataset, the modified structures obtain an accuracy of 91.8% on Test A and 87.1% on Test B, a superiority of about 2% on average over U-Net.

       Method mean IoU Dice Coefficient
Part A Part B Part A Part B
0.797 0.738 0.886 0.853
0.841 0.753 0.906 0.862
0.852 0.771 0.915 0.871
0.856 0.772 0.918 0.869
0.859 0.779 0.919 0.877
0.861 0.778 0.919 0.872
Table 1: Prediction performance comparison of Unet with Multi-scale dense connected encoder
       Method mean IoU Dice Coefficient
Part A Part B Part A Part B
0.797 0.738 0.886 0.853
0.841 0.759 0.908 0.861
0.852 0.768 0.915 0.866
0.857 0.770 0.917 0.870
0.860 0.784 0.919 0.877
0.861 0.784 0.920 0.870
Table 2: Prediction performance comparison of Unet with Multi-scale dense connected decoder
       Method mean IoU Dice Coefficient
Part A Part B Part A Part B
0.797 0.738 0.886 0.853
0.852 0.762 0.917 0.866
0.855 0.766 0.918 0.870
0.857 0.770 0.916 0.868
0.861 0.778 0.920 0.872
Table 3: Prediction performance comparison of Unet with Multi-scale dense cross connected block
                       Method mean IoU Dice Coefficient
Part A Part B Part A Part B
Unet 0.797 0.738 0.902 0.842
0.853 0.764 0.916 0.864
0.859 0.770 0.918 0.870
0.863 0.768 0.920 0.871
0.866 0.764 0.925 0.857
Table 4: Prediction performance comparison of combinations of multi-scale dense connected blocks based on U-Net
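For reference, for binary masks the two reported metrics reduce to the standard definitions below (a plain-Python sketch, not the evaluation code used for the tables):

```python
def iou(pred, target):
    """Intersection over union of two binary masks (sequences of 0/1)."""
    inter = sum(p & t for p, t in zip(pred, target))
    union = sum(p | t for p, t in zip(pred, target))
    return inter / union if union else 1.0

def dice(pred, target):
    """Dice coefficient: 2|A∩B| / (|A| + |B|)."""
    inter = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2 * inter / total if total else 1.0

p = [1, 1, 0, 1]
t = [1, 0, 0, 1]
iou_pt = iou(p, t)    # 2/3
dice_pt = dice(p, t)  # 0.8
```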

4.2 Discussion on the Combination of three Dense connections

In this section, we investigate the impact of combining the three different dense connected blocks. We concluded above that an increasing number of dense connections yields better performance, so we select the encoder block with four dense connections as the basic component, indicating that the feature maps in each encoding block contribute to the four subsequent blocks, and the decoder block is chosen in the same manner. Note that the cross connections consist of two upper connections from subsequent layers, two lower connections from previous layers, and the direct skip connection as in U-Net. We systematically experiment with combinations of two or three basic components; the results are shown in Table 4. On Test A, either combination achieves a reasonable improvement. On Test B, however, performance drops compared with the single models. We believe this decrease is caused by overfitting, as the distributions of the training set and Test A are closer. In the next section, we explore quantization methods to reduce this overfitting.

4.3 Discussion on network efficiency

Apart from assessing segmentation accuracy, we evaluate the number of parameters of the network. Recent methods based on U-Net tend to be wider, deeper, and harder to optimize, with deep supervision serving as an auxiliary training trick. In contrast, even the extremely dense structure we propose adds only a tiny number of parameters compared with U-Net: due to the reuse of feature maps and the concatenation operation, no extra computation or parameters are involved except for the 1×1 convolutions. Table 5 compares the parameter counts of several strong methods. We achieve state-of-the-art accuracy with a negligible increase in parameters, which reveals the high efficiency of our proposed model. Moreover, the proposed model is readily extensible and can serve as a backbone in place of U-Net for U-shape-based networks.

Method parameter number
U-Net 8M
U + dense encoder block 8M + 0.005M
U + dense decoder block 8M + 0.005M
U + dense cross connections 8M + 0.005M
MDUnet* 8M + 0.015M
Unet++ 8M + 1M
MILDnet 8M + 68M
MIMOnet 8M + 166M
  • * MDU-Net denotes the framework containing all three dense connections based on U-Net

Table 5: Comparison of parameter counts of variant models based on U-Net
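The small overhead in Table 5 is consistent with the cost of 1×1 convolutions: a k×k convolution from C_in to C_out channels has C_in·C_out·k² weights, so a 1×1 fusion layer is 9× cheaper than a 3×3 one. A sketch with hypothetical channel counts (not the actual MDU-Net layer sizes):

```python
def conv_params(c_in, c_out, k=1, bias=True):
    """Number of parameters in a k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)

# Hypothetical fusion: 56 concatenated channels squeezed back to 16.
p_1x1 = conv_params(56, 16, k=1)   # 912 parameters
p_3x3 = conv_params(56, 16, k=3)   # 8080 parameters
```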
       Method mean IoU Dice Coefficient
Part A Part B Part A Part B
MDU-Net 0.866 0.764 0.925 0.857
* 0.871 0.784 0.925 0.873
0.866 0.790 0.923 0.876
0.859 0.791 0.918 0.865
0.872 0.772 0.928 0.878
0.865 0.786 0.922 0.876
0.857 0.750 0.916 0.881
0.867 0.776 0.919 0.871
0.862 0.772 0.925 0.870
0.859 0.768 0.922 0.878
  • The subscript 1/2 means that 1/2 parameters of the model are quantized

Table 6: Prediction performance comparison of quantization method

4.4 Discussion on network quantization

In this section, we explore quantization to improve the performance of the proposed network. In particular, Incremental Network Quantization is applied to quantize the parameters in order to reduce overfitting, rather than for model compression. We analyze models quantized to different degrees, because completely quantizing all parameters reduces segmentation accuracy. As shown in Table 6, overfitting is largely reduced after the first quantization operation, in which half of the parameters are quantized: performance on Test B improves as expected while accuracy on Test A is maintained, and the generalization ability is enhanced compared with the fully quantized model. We obtain a surprisingly competitive accuracy of 0.88 on Test B. On balance, we adopt the half-quantized architecture as our final model.

5 Conclusion

In this paper, we propose three different multi-scale dense connections for the encoder, the decoder, and the connections across them in a U-shaped architecture. Our architecture directly fuses neighboring feature maps of different scales from both higher and lower layers to strengthen feature propagation in the current layer, which largely improves the information flow within the encoder, the decoder, and across them. We then explore their effects in detail based on U-Net; the experiments show that accuracy generally increases with the number of dense connections. We adopt the optimal model from the experiments and propose a novel MDU-Net that combines the three dense connected architectures with quantization, which reduces the overfitting caused by dense connections. Finally, our model improves the Dice coefficient over U-Net by up to 3% on Test A and 4.1% on Test B.

Figure 6: Visual gland segmentation results on the GlaS dataset. We compare our various multi-scale dense connected model based on U-Net to U-Net.


  • [1] P. Bilinski and V. Prisacariu. Dense decoder shortcut connections for single-pass semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6596–6605, 2018.
  • [2] H. Chen, X. Qi, L. Yu, and P.-A. Heng. Dcan: deep contour-aware networks for accurate gland segmentation. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pages 2487–2496, 2016.
  • [3] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence, 40(4):834–848, 2018.
  • [4] L. C. Chen, G. Papandreou, F. Schroff, and H. Adam. Rethinking atrous convolution for semantic image segmentation. 2017.
  • [5] L. C. Chen, Y. Yang, J. Wang, W. Xu, and A. L. Yuille. Attention to scale: Scale-aware semantic image segmentation. In Computer Vision and Pattern Recognition, pages 3640–3649, 2016.
  • [6] M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. 2016.
  • [7] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei. Deformable convolutional networks. pages 764–773, 2017.
  • [8] N. Dhungel, G. Carneiro, and A. P. Bradley. Deep learning and structured prediction for the segmentation of mass in mammograms. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 605–612. Springer, 2015.
  • [9] L. Dong, L. He, M. Mao, G. Kong, X. Wu, Q. Zhang, X. Cao, and E. Izquierdo. Cunet: a compact unsupervised network for image classification. IEEE Transactions on Multimedia, 20(8):2012–2021, 2018.
  • [10] Q. Dou, H. Chen, L. Yu, L. Zhao, J. Qin, D. Wang, V. C. Mok, L. Shi, and P.-A. Heng. Automatic detection of cerebral microbleeds from mr images via 3d convolutional neural networks. IEEE transactions on medical imaging, 35(5):1182–1195, 2016.
  • [11] M. Drozdzal, E. Vorontsov, G. Chartrand, S. Kadoury, and C. Pal. The importance of skip connections in biomedical image segmentation. pages 179–187, 2016.
  • [12] D. Fourure, R. Emonet, E. Fromont, D. Muselet, A. Tremeau, and C. Wolf. Residual conv-deconv grid network for semantic segmentation. arXiv preprint arXiv:1707.07958, 2017.
  • [13] H. Fu, G. Qiu, J. Shu, and M. Ilyas. A novel polar space random field model for the detection of glandular structures. IEEE transactions on medical imaging, 33(3):764–776, 2014.
  • [14] S. Graham, H. Chen, Q. Dou, P.-A. Heng, and N. Rajpoot. Mild-net: Minimal information loss dilated network for gland instance segmentation in colon histology images. arXiv preprint arXiv:1806.01963, 2018.
  • [15] S. Guan, A. Khan, S. Sikdar, and P. V. Chitnis. Fully dense unet for 2d sparse photoacoustic tomography artifact removal, 2018.
  • [16] S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. Fiber, 56(4):3–7, 2015.
  • [17] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. pages 770–778, 2015.
  • [18] G. Huang, Z. Liu, L. V. D. Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2261–2269, 2017.
  • [19] G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger. Deep networks with stochastic depth. pages 646–661, 2016.
  • [20] J. G. Jacobs, E. Panagiotaki, and D. C. Alexander. Gleason grading of prostate tumours with max-margin conditional random fields. In International Workshop on Machine Learning in Medical Imaging, pages 85–92. Springer, 2014.
  • [21] S. Jegou, M. Drozdzal, D. Vazquez, A. Romero, and Y. Bengio. The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation. pages 1175–1183, 2016.
  • [22] K. H. Jin, M. T. Mccann, E. Froustey, and M. Unser. Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing A Publication of the IEEE Signal Processing Society, 26(9):4509–4522, 2016.
  • [23] X. Li, H. Chen, X. Qi, Q. Dou, C. W. Fu, and P. A. Heng. H-denseunet: Hybrid densely connected unet for liver and tumor segmentation from ct volumes. IEEE Transactions on Medical Imaging, PP(99):1–1, 2017.
  • [24] G. Lin, A. Milan, C. Shen, and I. D. Reid. Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In Cvpr, volume 1, page 5, 2017.
  • [25] G. Lin, C. Shen, A. V. D. Hengel, and I. Reid. Efficient piecewise training of deep structured models for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3194–3203, 2016.
  • [26] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015.
  • [27] A. Newell, K. Yang, and J. Deng. Stacked hourglass networks for human pose estimation. In European Conference on Computer Vision, pages 483–499. Springer, 2016.
  • [28] K. Nguyen, A. Sarkar, and A. K. Jain. Structure and context in prostatic gland segmentation and classification. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 115–123. Springer, 2012.
  • [29] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi. Xnor-net: Imagenet classification using binary convolutional neural networks. In European Conference on Computer Vision, pages 525–542, 2016.
  • [30] S. E. A. Raza, L. Cheung, D. Epstein, S. Pelengaris, M. Khan, and N. M. Rajpoot. Mimo-net: A multi-input multi-output convolutional neural network for cell segmentation in fluorescence microscopy images. In 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), pages 337–340, April 2017.
  • [31] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
  • [32] H. R. Roth, L. Lu, A. Farag, H.-C. Shin, J. Liu, E. B. Turkbey, and R. M. Summers. Deeporgan: Multi-level deep convolutional networks for automated pancreas segmentation. In International conference on medical image computing and computer-assisted intervention, pages 556–564. Springer, 2015.
  • [33] S. Saxena and J. Verbeek. Convolutional neural fabrics. In Advances in Neural Information Processing Systems, pages 4053–4061, 2016.
  • [34] S. Shah, P. Ghosh, L. S. Davis, and T. Goldstein. Stacked u-nets: A no-frills approach to natural image segmentation. arXiv preprint arXiv:1804.10343, 2018.
  • [35] K. Sirinukunwattana, D. R. Snead, and N. M. Rajpoot. A novel texture descriptor for detection of glandular structures in colon histology images. In Medical Imaging 2015: Digital Pathology, volume 9420, page 94200S. International Society for Optics and Photonics, 2015.
  • [36] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.
  • [37] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826, 2016.
  • [38] Z. Tang, X. Peng, S. Geng, L. Wu, S. Zhang, and D. Metaxas. Quantized densely connected u-nets for efficient landmark localization. In European Conference on Computer Vision (ECCV), 2018.
  • [39] X. Xu, Q. Lu, L. Yang, S. Hu, D. Chen, Y. Hu, and Y. Shi. Quantization of fully convolutional networks for accurate biomedical image segmentation. arXiv preprint arXiv:1803.04907, 2018.
  • [40] M. Yang, K. Yu, C. Zhang, Z. Li, and K. Yang. Denseaspp for semantic segmentation in street scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3684–3692, 2018.
  • [41] W. Yang, S. Li, W. Ouyang, H. Li, and X. Wang. Learning feature pyramids for human pose estimation. In The IEEE International Conference on Computer Vision (ICCV), volume 2, 2017.
  • [42] F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. CoRR, abs/1511.07122, 2015.
  • [43] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia. Pyramid scene parsing network. In IEEE Conference on Computer Vision and Pattern Recognition, pages 6230–6239, 2017.
  • [44] A. Zhou, A. Yao, Y. Guo, L. Xu, and Y. Chen. Incremental network quantization: Towards lossless cnns with low-precision weights. In International Conference on Learning Representations (ICLR), 2017.
  • [45] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang. Unet++: A nested u-net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pages 3–11. Springer, 2018.