CC-Net: Image Complexity Guided Network Compression for Biomedical Image Segmentation

01/06/2019 ∙ by Suraj Mishra, et al.

Convolutional neural networks (CNNs) for biomedical image analysis are often very large, resulting in high memory requirements and high operation latency. Searching for an acceptable compressed representation of the base CNN for a specific imaging application typically involves a series of time-consuming training/validation experiments to achieve a good compromise between network size and accuracy. To address this challenge, we propose CC-Net, a new image complexity-guided CNN compression scheme for biomedical image segmentation. Given a CNN model, CC-Net predicts the final accuracy of networks of different sizes based on the average image complexity computed from the training data. It then selects a multiplicative factor for producing a desired network with acceptable network accuracy and size. Experiments show that CC-Net is effective for generating compressed segmentation networks, retaining up to 95% of the base network segmentation accuracy and utilizing only 0.1% of trainable parameters of the full-sized networks in the best case.




1 Introduction

CNNs are often of very large size, resulting in high memory requirement and high latency of operations, and thus not suitable for resource-constrained applications (e.g., edge computing). To find a good compromise between network size and performance, a series of time-consuming training/validation experiments is often used for a specific imaging application. To address this challenge, we propose a new network compression scheme targeting biomedical image segmentation in resource-constrained application settings (e.g., low cost and easy-to-carry imaging devices for disaster/emergency response and military rescue).

Since the inception of FCNs [1], various improved segmentation networks [2, 3, 4, 5] have been developed. To compress CNNs, various pre-training [6, 7] and post-training [8, 9] compression schemes have been suggested. In these techniques, compression thresholds often need to be set manually over multiple pruning iterations.

In contrast with natural scene images, images in biomedical or healthcare application settings are often acquired for a specific type of disease/injury and captured by specific imaging devices; hence, their objects and settings are quite “stable”, making the image characteristics and complexity much more specific and thus easier to analyze. In this paper, we leverage this observation to introduce CC-Net.

Based on the image complexity measure, target CNN, and user constraints (e.g., desired accuracy or available memory), CC-Net determines for the given dataset the most suitable multiplicative factor to compress the original CNN. The resulting compressed network is then trained, with much less effort and memory compared to the original network. Experiments using 5 public and 2 in-house datasets and 3 commonly-used CNN segmentation models as representative networks show that CC-Net is effective for compressing segmentation networks, retaining up to 95% of the base network segmentation accuracy and utilizing only 0.1% of trainable parameters of the full-sized networks in the best case.

2 Methodology

Feature-map (filter output) energy is a good indicator of a filter’s feature extraction capability. We have conducted a large set of experiments to study the relationship between feature-map energy and training datasets. Fig. 1 depicts 3 example energy distributions for the first convolution layer of U-Net [2]. One can observe that (i) a significant number of filter outputs have very low energy, and (ii) less “complex” (to be defined more precisely later) datasets have more low-energy filter outputs. These observations suggest that U-Net [2] may be unnecessarily large for some biomedical datasets, and in these cases, filters can be pruned without significantly deteriorating the accuracy.

Figure 1: Feature map energy distributions of the first convolutional layer of U-Net for several datasets: (left) gland images (high complexity), (middle) C2DH-HeLa cell images, and (right) wing-disk images (low complexity).
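As a rough illustration of the observation above, the following sketch computes a per-filter feature-map energy and the fraction of low-energy filter outputs. The energy definition (sum of squared activations), the function names, and the relative threshold are assumptions made for illustration only; they are not specified in the text.

```python
def feature_map_energy(feature_map):
    """Energy of one 2-D feature map (list of rows), assumed here to be
    the sum of squared activations."""
    return sum(v * v for row in feature_map for v in row)


def low_energy_fraction(feature_maps, rel_threshold=0.05):
    """Fraction of feature maps whose energy is below rel_threshold times
    the largest energy in the layer (threshold choice is hypothetical)."""
    energies = [feature_map_energy(fm) for fm in feature_maps]
    peak = max(energies)
    if peak == 0:
        return 1.0  # every filter output is silent
    low = sum(1 for e in energies if e < rel_threshold * peak)
    return low / len(energies)


# Toy outputs of four filters on a 2x2 input (made-up numbers).
maps = [
    [[0.0, 0.1], [0.0, 0.0]],
    [[1.0, 2.0], [0.5, 0.0]],
    [[0.2, 0.1], [0.1, 0.0]],
    [[3.0, 1.0], [1.0, 1.0]],
]
print(low_energy_fraction(maps))
```

A larger fraction for a given dataset would hint that more filters of that layer are candidates for removal.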

Based on the above observations, we develop CC-Net, depicted in Fig. 2. Inputs and internal operations of CC-Net are shown in parallelograms and rectangles, respectively. Existing architectures are the 3 CNNs studied and parameterized in our work. Colored boxes highlight the key contributions of this paper. We elaborate on the major components of CC-Net below.

Figure 2: Our proposed scheme for CC-Net.

2.1 Image Complexity Computation

We seek an image complexity metric that can (i) indicate the trends of segmentation accuracy and (ii) be easily computed. Our work examined the following candidate metrics: (i) signal energy, (ii) edge information (Sobel and Scharr filters along with image pyramid), (iii) local key-point detection using SURF [10], (iv) visual clutter information [11], (v) JPEG complexity [12] and (vi) blob density. To obtain a single complexity value for an entire dataset, we take the average of complexity values over all the images in the dataset.
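To make the dataset-level averaging concrete, here is a minimal sketch in plain Python using one of the candidate metrics (Sobel edge information). The exact metric definition (mean gradient magnitude over interior pixels) and the function names are illustrative assumptions, not the authors' implementation, and the image pyramid is omitted.

```python
def sobel_edge_density(img):
    """Mean Sobel gradient magnitude over the interior pixels of a
    2-D grayscale image given as a list of rows (at least 3x3)."""
    h, w = len(img), len(img[0])
    total = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            total += (gx * gx + gy * gy) ** 0.5
    return total / ((h - 2) * (w - 2))


def dataset_complexity(images, metric):
    """Single complexity value for a dataset: the average of the
    per-image metric over all images."""
    return sum(metric(im) for im in images) / len(images)


flat = [[5, 5, 5, 5] for _ in range(4)]   # no edges at all
step = [[0, 0, 9, 9] for _ in range(4)]   # one vertical edge
print(dataset_complexity([flat, step], sobel_edge_density))
```

Any other per-image metric from the list (for example a JPEG-based one) could be passed as `metric` in the same way.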

Dataset            Size  Type        J       B       Source
Glands (GL)        165   RGB         0.2401  0.5711  [13]
Lymph Nodes (LN)   74    Ultrasound  0.2445  0.0715  in-house
Melanoma (ME)      2750  RGB         0.1505  0.3055  [14]
C2DH-HeLa (CH)     20    Gray        0.1403  0.4607  [15]
Wing Discs (WD)    20    Gray        0.0925  0.1348  in-house
C2DH-U373 (CU)     34    Gray        0.1473  0.0699  [15]
C2DL-PSC (CP)      4     Gray        0.2296  0.3066  [15]
Table 1: Datasets and properties (J: JPEG complexity; B: blob density).

Out of the 7 datasets shown in Table 1, 5 datasets (train-set, top 5 rows) are used to formulate the methodology, while the remaining 2 datasets (test-set) are used for blind evaluation. Fig. 4 plots the average complexities (normalized to the range [0,1]) of the train-set datasets, arranged in increasing order of their F1 and IU score degradation (F1 and IU being the two most popular segmentation accuracy metrics). Among these complexity measures, the JPEG complexity best follows the trend of F1 score degradation (i.e., higher complexity leads to lower F1). Since IU is related to both feature variety and quantity, to represent it, we linearly combine the JPEG complexity and blob density ($J$ and $B$, see Table 1), as $JB = \lambda J + (1 - \lambda) B$, where $\lambda$ is a value in $[0, 1]$. The value of $\lambda$ is determined by inspecting the optimal regression fitting on the training datasets in our experiments. We consider J and JB for multiplier determination, explained as follows.
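The combination of JPEG complexity and blob density can be sketched as follows, assuming it is a convex combination with a weight in [0, 1]. The weight value 0.5 used below is purely hypothetical (the fitted value is not given here); the J and B values are copied from Table 1.

```python
# (J, B) for the five train-set datasets, copied from Table 1.
TRAIN_SET = {
    "Glands": (0.2401, 0.5711),
    "Lymph Nodes": (0.2445, 0.0715),
    "Melanoma": (0.1505, 0.3055),
    "C2DH-HeLa": (0.1403, 0.4607),
    "Wing Discs": (0.0925, 0.1348),
}


def combined_complexity(j, b, weight):
    """Assumed convex combination of JPEG complexity j and blob density b."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * j + (1.0 - weight) * b


HYPOTHETICAL_WEIGHT = 0.5  # placeholder, not the value fitted in the paper

jb = {name: combined_complexity(j, b, HYPOTHETICAL_WEIGHT)
      for name, (j, b) in TRAIN_SET.items()}

# Datasets ordered from lowest to highest combined complexity.
ranking = sorted(jb, key=jb.get)
print(ranking)
```

With a different weight the ordering can change, which is why the weight would have to be chosen by checking the regression fit on the train-set datasets.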

2.2 Multiplier Determination and Network Compression

Keeping all other variables unchanged, we can express the relationship between the segmentation accuracy ($A$) and data complexity ($C$) as $A = f(C, P)$, where $P$ is the number of trainable parameters in a CNN. For general networks, the function $f$ can be rather complicated. But in general, segmentation accuracy is monotonically non-decreasing with respect to $P$ and non-increasing with respect to $C$, i.e., $\partial f / \partial P \geq 0$ and $\partial f / \partial C \leq 0$.

For CNNs (see Fig. 3), we observe (as discussed in Section 3) that $\partial A / \partial \log P$ can be approximated by a linear function of $C$. That is, $\partial A / \partial \log P = m \cdot C$ for a constant $m$ that reflects the degree of degradation. Given the linear dependency, if $m$ and $C$ are known, then it is straightforward to compute the change in accuracy or in the number of parameters when the other is provided. The value of $m$ is network-dependent, and can be obtained by performing a systematic analysis of network compression and tracking the change in accuracy.

A simple way of compression is to uniformly scale down the number of feature maps in every convolution layer using a single multiplier ($\alpha$). Existing work has shown that this performs very well [7, 16]. The number of trainable parameters of a convolution layer after scaling becomes $\alpha M \cdot \alpha N \cdot K_h \cdot K_w$, where $M$ and $N$ are the numbers of input and output feature maps, and $K_h$ and $K_w$ are the filter dimensions. However, finding a good $\alpha$ is challenging. We employ complexity measures to determine $\alpha$.
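The effect of a single multiplier on the size of one convolution layer can be illustrated with a short sketch. The function name and the rounding rule are assumptions; biases and batch-normalization parameters are ignored, and in a real network the input channels of the first layer (the image channels) would not be scaled.

```python
def scaled_conv_params(in_maps, out_maps, k_h, k_w, multiplier):
    """Weight count of a convolution layer after scaling both the input and
    output feature-map counts by the same multiplier (rounded, at least 1)."""
    scaled_in = max(1, round(multiplier * in_maps))
    scaled_out = max(1, round(multiplier * out_maps))
    return scaled_in * scaled_out * k_h * k_w


full = scaled_conv_params(64, 128, 3, 3, 1.0)   # unscaled layer
half = scaled_conv_params(64, 128, 3, 3, 0.5)   # multiplier of one half
print(full, half, full / half)
```

Halving the feature maps on both sides of the layer cuts its weights by a factor of four, i.e. the layer size shrinks roughly with the square of the multiplier.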

Figure 3: CNN architectures. A colorless block represents a group of convolution, batch normalization, and ReLU. A red block and a green block represent pooling and up-scaling operations, respectively.

Figure 4: Mapping image complexity with accuracy degradation. (left) Our datasets are arranged in increasing order of drop in F1 score; (right) our datasets are arranged in increasing order of drop in IU score.

When producing compressed networks, we consider two practical scenarios: (1) memory-constrained best possible accuracy, and (2) accuracy-guided least memory usage. For (1), there are two sub-cases: (1.a) a disk space budget and (1.b) a main memory budget. For case (1.a), given a disk space budget in MB, we first determine the admissible number of trainable parameters $P'$, based on the number of bits for each parameter. Then the multiplier $\alpha$ can be computed as $\alpha = \sqrt{P'/P}$, where $P$ is the number of trainable parameters of the base network, since the parameter count scales with $\alpha^2$. For case (1.b), the sizes of the feature maps are considered along with the number of bits for $P'$, and the value of $\alpha$ is determined from the combined memory requirement. For (2), given the lowest acceptable accuracy $A'$ and the original base network accuracy $A_0$, $P'$ is obtained using the linear model, and so $\alpha$ can be readily computed. Using $\alpha$, a compressed network is produced, which can then be trained.
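The two scenarios can be sketched numerically under explicit assumptions. First, the parameter count is assumed to scale with the square of the multiplier. Second, relative accuracy is assumed to drop linearly per decade of removed parameters, with a slope supplied by the caller. Third, 8 bytes per parameter and 1 MB = 10^6 bytes are assumed, an inference from Table 2, where the 1 MB budget corresponds to log(#P) = 5.097, i.e. about 125,000 parameters. All function names and the example slope are hypothetical.

```python
import math


def allowed_params(budget_mb, bits_per_param=64):
    """Parameters that fit into a disk budget, taking 1 MB = 10**6 bytes."""
    return budget_mb * 1e6 * 8 / bits_per_param


def multiplier_for_params(target_params, base_params):
    """Case (1.a): multiplier, assuming parameters scale with multiplier**2."""
    return min(1.0, math.sqrt(target_params / base_params))


def multiplier_for_accuracy(relative_accuracy, slope_per_decade):
    """Case (2): multiplier for a target accuracy ratio (target / base),
    assuming the ratio falls by slope_per_decade for every factor of 10
    removed from the parameter count."""
    if slope_per_decade <= 0:
        raise ValueError("slope_per_decade must be positive")
    decades = (1.0 - relative_accuracy) / slope_per_decade
    return min(1.0, 10 ** (-decades / 2.0))


budget_params = allowed_params(1.0)           # 1 MB disk budget
base_params = 10 ** 7.492                     # U-Net, log(#P) from Table 2
alpha_disk = multiplier_for_params(budget_params, base_params)
alpha_acc = multiplier_for_accuracy(0.95, 0.025)  # hypothetical inputs
print(alpha_disk, alpha_acc)
```

In practice the computed multiplier would still have to be rounded so that every layer keeps an integer number of filters.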

Figure 5: Some C2DH-U373 (top row) & C2DL-PSC (bottom row) segmentation output. (a) Input images, (b) ground truth, the segmentation output of (c) U-net, (d) CC-U-Net, (e) U-Net + [9], (f) CC-U-Net + [9], (g) U-Net + [6], and (h) CC-U-Net + [6].

3 Experimental Evaluation

The 5 train-set datasets (Glands, Lymph Nodes, Melanoma, C2DH-HeLa, Wing Discs) are used to determine the degree of degradation for the 3 CNN models (Fig. 3), which is then mapped to J and JB to determine the multiplier. For simple calculations that keep the number of filters an integer, a fixed set of multiplier values is considered (x-axes of panels (a) and (c) in Fig. 6 and Fig. 7). The 2 test-set datasets (C2DH-U373, C2DL-PSC) are used to validate our method. We use standard back-propagation with the Adam optimizer (learning rate = 0.00005), cross entropy as the loss function, and data augmentation. Experiments are performed on NVIDIA TITAN and Tesla P100 GPUs, using the Torch framework.

Fig. 5 shows some segmentation output. Fig. 6 and Fig. 7 show the calculated degree of degradation for the FCN [1], U-Net [2], and CUMedVision [3] networks. In these figures, (a) and (c) give the degradation in the relative F1 and IU accuracy, respectively (i.e., accuracy relative to that of the base network), with respect to changes in the number of parameters on a logarithmic scale. The slopes of the regression lines for each dataset in (a) and (c) are plotted against the respective complexities in (b) and (d).
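The per-dataset regression described above amounts to fitting a line through (log(#P), relative accuracy) points and reading off its slope. The sketch below does this with ordinary least squares in plain Python. The points are the U-Net F1 values for C2DH-U373 taken from Table 2 (base network, CC-Net-case1, CC-Net-case1-, CC-Net-case2) and are used only as an illustration; the slopes in the paper are derived from the train-set datasets.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# (log(#P), F1) pairs for U-Net on C2DH-U373, copied from Table 2.
points = [(7.492, 0.896), (5.436, 0.863), (5.277, 0.841), (5.097, 0.832)]
base_f1 = points[0][1]

log_params = [p for p, _ in points]
relative_f1 = [f1 / base_f1 for _, f1 in points]

# Loss of relative F1 per factor-of-10 reduction in parameters.
slope = least_squares_slope(log_params, relative_f1)
print(slope)
```

Repeating the fit for several datasets and plotting the slopes against dataset complexity would give the kind of relationship shown in panels (b) and (d).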

Figure 6: Calculated degree of degradation for the FCN architecture, for F1 and IU.
Figure 7: Calculated degree of degradation for U-Net (left) and CUMedVision (right), for F1 and IU.
Figure 8: Predicted (CC-Net) and experimental F1 scores for test cases 1 (accuracy constraint) and 2 (memory constraint).
Figure 9: Trainable parameter and inference latency reduction achieved (on test case 1) for various datasets, arranged along the x-axis in increasing image complexity.

Test case 1 (accuracy-guided least memory usage). We consider an example constraint on the lowest acceptable accuracy. The multiplier is estimated using the degree of degradation and the complexity (Table 1). Using the ceiling values of the estimated multiplier, compressed networks are trained and analyzed. As shown in Table 2, a significant compression is achieved (at best 113x for C2DH-U373 on U-Net and at least 3.5x for C2DL-PSC on CUMedVision) with much better accuracy compared to the compression achieved using only [6] or [9]. To validate the effectiveness of the multiplier estimation, we introduce a small reduction in the multiplier value (the smallest possible while keeping integer filter counts); the accuracy then degrades below the constraint (Table 2, row CC-Net-case1-). CC-Net compression does not show much improvement when pruned further, indicating that few ineffective filters remain.

Test case 2 (memory-constrained best possible accuracy). We consider a disk space budget of 1 MB. Using the ceiling of the estimated multiplier, compressed networks are produced as shown in Table 2, whose accuracy satisfies the accuracy prediction made by our method (Fig. 8).

Network                                 U-Net [2]                CUMedVision [3]          FCN [1]
Method                      Dataset     F1     IU     log(#P)    F1     IU     log(#P)    F1     IU     log(#P)
Base Network                C2DH-U373   0.896  0.900  7.492      0.891  0.895  6.887      0.891  0.894  7.552
Base Network                C2DL-PSC    0.801  0.820             0.793  0.814             0.755  0.788
Compressed Networks
Base Network + Squeeze [6]  C2DH-U373   0.819  0.854  7.049      0.832  0.863  6.669      0.844  0.875  7.369
Base Network + Squeeze [6]  C2DL-PSC    0.752  0.781             0.751  0.781             0.697  0.753
Base Network + Prune [9]    C2DH-U373   0.858  0.867  7.491      0.848  0.861  6.886      0.809  0.837  7.551
Base Network + Prune [9]    C2DL-PSC    0.749  0.785  7.491      0.744  0.768  6.886      0.691  0.738  7.552
CC-Net-case1                C2DH-U373   0.863  0.890  5.436      0.868  0.866  5.378      0.880  0.885  5.939
CC-Net-case1                C2DL-PSC    0.775  0.818  6.640      0.763  0.794  6.341      0.720  0.766  6.949
CC-Net-case1 + Squeeze      C2DH-U373   0.806  0.840  5.243      0.820  0.853  5.245      0.824  0.860  5.915
CC-Net-case1 + Squeeze      C2DL-PSC    0.681  0.735  6.197      0.629  0.705  6.176      0.663  0.728  6.786
CC-Net-case1 + Prune        C2DH-U373   0.834  0.847  5.435      0.834  0.847  5.377      0.830  0.843  5.938
CC-Net-case1 + Prune        C2DL-PSC    0.772  0.800  6.639      0.750  0.786  6.341      0.678  0.730  6.949
CC-Net-case1-               C2DH-U373   0.841  0.872  5.277      0.816  0.849  5.297      0.817  0.844  5.847
CC-Net-case1-               C2DL-PSC    0.751  0.781  6.603      0.759  0.785  6.315      0.713  0.742  6.922
CC-Net-case2                C2DH-U373   0.832  0.863  5.097      0.807  0.837  5.097      0.803  0.834  5.097
CC-Net-case2                C2DL-PSC    0.698  0.745  5.097      0.711  0.743  5.097      0.644  0.719  5.097
Table 2: Segmentation accuracy and network parameters on the C2DH-U373 and C2DL-PSC datasets.
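The compression factors quoted for test case 1 can be recovered from the log(#P) columns of Table 2. The sketch assumes that the logarithm is base 10 (which is consistent with the reported 113x and 3.5x) and that the base parameter count of a network is the same for both datasets; the function name is illustrative.

```python
def compression_factor(log_base_params, log_compressed_params):
    """Ratio of base to compressed parameter counts, given base-10 logs."""
    return 10 ** (log_base_params - log_compressed_params)


# U-Net on C2DH-U373: base network vs. CC-Net-case1 (Table 2).
best = compression_factor(7.492, 5.436)

# CUMedVision on C2DL-PSC: base network vs. CC-Net-case1 (Table 2).
least = compression_factor(6.887, 6.341)

print(round(best, 1), round(least, 2))
```

The first ratio comes out just under 114 and the second just above 3.5, matching the 113x and 3.5x figures in the text.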

The overall reduction in trainable parameters (PR) and evaluation latency (LR) for all 7 datasets (for test case 1) is plotted in Fig. 9. Larger complexity results in less compression, indicating that more trainable parameters are required for extracting features. The parameter and latency reductions achieved by CC-Net vary across datasets, as shown in Fig. 9.

Approach         Dataset    Pre-training  Training (per epoch)  Post-training (pruning epochs)
U-Net + [9]      C2DH-U373  -             10781 ms              160
U-Net + [9]      C2DL-PSC   -             2348 ms               30
Ours (new)       C2DH-U373  O             4786 ms               -
Ours (new)       C2DL-PSC   O             1282 ms               -
Ours (existing)  C2DH-U373  Negligible    4786 ms               -
Ours (existing)  C2DL-PSC   Negligible    1282 ms               -
Table 3: Training time consideration (test case 1).

Table 3 shows the training time for [9] and CC-Net on U-Net for test case 1 (on a P100 GPU). The per-epoch training time (in ms) is provided along with the number of pruning epochs (column Post-training). We have used fewer fine-tuning iterations per pruning epoch; however, pruning is expensive and can exceed the original network training by a factor of 3 [8, 9]. The one-time determination (‘O’ in Table 3) for any CNN is a bottleneck for CC-Net. Yet, after this process, a significant reduction in training time can be achieved for any dataset trained on the same network. We estimate that ‘O’ can be computed in under 2x the training time of the base architecture, with a sufficient degree of accuracy, using 2 datasets with two points.

4 Conclusions

In this paper, we presented a new image complexity-guided network compression scheme, CC-Net, for biomedical image segmentation. Instead of compressing CNNs after training, we focused on pre-training network size reduction, exploiting image complexity of the training data. Our method is effective in quickly generating compressed networks with target accuracy, outperforming state-of-the-art network compression methods. Our scheme accommodates practical applied design constraints for compressing CNNs for biomedical image segmentation.

5 Acknowledgement

This work was supported in part by the National Science Foundation under Grants CNS-1629914, CCF-1640081, and CCF-1617735, and by the Nanoelectronics Research Corporation, a wholly-owned subsidiary of the Semiconductor Research Corporation, through Extremely Energy Efficient Collective Electronics, an SRC-NRI Nanoelectronics Research Initiative under Research Task ID 2698.004 and 2698.005.


  • [1] Jonathan Long, Evan Shelhamer, and Trevor Darrell, “Fully convolutional networks for semantic segmentation,” CoRR, vol. abs/1411.4038, 2014.
  • [2] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” ArXiv e-prints, May 2015.
  • [3] Hao Chen, Xiaojuan Qi, Jie-Zhi Cheng, and Pheng-Ann Heng, “Deep contextual networks for neuronal structure segmentation,” in AAAI, 2016, pp. 1167–1173.
  • [4] L. Yang, Y. Zhang, J. Chen, S. Zhang, and D. Z. Chen, “Suggestive annotation: A deep active learning framework for biomedical image segmentation,” in 20th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2017, vol. III, pp. 399–407.
  • [5] L. Wu, Y. Xin, S. Li, T. Wang, P. A. Heng, and D. Ni, “Cascaded fully convolutional networks for automatic prenatal ultrasound image segmentation,” in 14th IEEE International Symposium on Biomedical Imaging (ISBI), April 2017, pp. 663–666.
  • [6] F. N. Iandola, S. Han, et al., “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and 0.5MB model size,” ArXiv e-prints, Feb. 2016.
  • [7] A. G. Howard, M. Zhu, et al., “MobileNets: Efficient convolutional neural networks for mobile vision applications,” ArXiv e-prints, Apr. 2017.
  • [8] S. Han, H. Mao, et al., “Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding,” ArXiv e-prints, Oct. 2015.
  • [9] P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz, “Pruning convolutional neural networks for resource efficient inference,” ICLR, June 2017.
  • [10] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool, “Surf: Speeded up robust features,” in ECCV, 2006, pp. 404–417.
  • [11] Ruth Rosenholtz, Yuanzhen Li, and Lisa Nakano, “Measuring visual clutter,” Journal of Vision, vol. 7, no. 2, pp. 17, 2007.
  • [12] Honghai Yu and Stefan Winkler, “Image complexity and spatial information,” 5th International Workshop on Quality of Multimedia Experience (QoMEX), pp. 12–17, 2013.
  • [13] K. Sirinukunwattana, J. P. W. Pluim, et al., “Gland segmentation in colon histology images: The GlaS challenge contest,” ArXiv e-prints, Mar. 2016.
  • [14] N. C. F. Codella, D. Gutman, et al., “Skin lesion analysis toward melanoma detection: ISBI 2017,” ArXiv e-prints, Oct. 2017.
  • [15] V. Ulman, M. Maška, et al., “An objective comparison of cell-tracking algorithms,” Nature Methods, 2017.
  • [16] Ariel Gordon, Elad Eban, et al., “MorphNet: Fast and simple resource-constrained structure learning of deep networks,” arXiv, 2017.