CNNs are often very large, resulting in high memory requirements and high operation latency, and are thus unsuitable for resource-constrained applications (e.g., edge computing). To find a good compromise between network size and performance, a series of time-consuming training/validation experiments is often conducted for a specific imaging application. To address this challenge, we propose a new network compression scheme targeting biomedical image segmentation in resource-constrained application settings (e.g., low-cost and easy-to-carry imaging devices for disaster/emergency response and military rescue).
Since the inception of FCNs [1], various improved segmentation networks [2, 3, 4, 5] have been developed. To compress CNNs, various pre-training [6, 7] and post-training [8, 9] compression schemes have been suggested. In these techniques, compression thresholds often need to be set manually over multiple pruning iterations.
In contrast with natural scene images, in biomedical or healthcare application settings, images often target a specific type of disease/injury and are captured by specific imaging devices; hence, their objects and settings are quite “stable”, making the image characteristics and complexity much more amenable to analysis. In this paper, we leverage this observation to introduce CC-Net.
Based on the image complexity measure, the target CNN, and user constraints (e.g., desired accuracy or available memory), CC-Net determines for the given dataset the most suitable multiplicative factor to compress the original CNN. The resulting compressed network is then trained with much less effort and memory compared to the original network. Experiments using 5 public and 2 in-house datasets and 3 commonly-used CNN segmentation models as representative networks show that CC-Net is effective for compressing segmentation networks, retaining nearly all of the base network segmentation accuracy while utilizing only a small fraction of the trainable parameters of the full-sized networks in the best case.
Feature-map (filter output) energy is a good indicator of a filter’s feature extraction capability. We have conducted a large set of experiments to study the relationship between feature-map energy and training datasets. Fig. 1 depicts three example energy distributions for the first convolution layer of U-Net [2]. One can observe that (i) a significant number of filter outputs have very low energy, and (ii) less “complex” (to be defined more precisely later) datasets have more low-energy filter outputs. These observations suggest that U-Net [2] may be unnecessarily large for some biomedical datasets, in which cases filters can be pruned without significantly deteriorating the accuracy.
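As a concrete illustration of this analysis, here is a minimal sketch (helper names are ours; plain NumPy stands in for the Torch setup used in our experiments) that scores each filter’s output by its energy and flags low-energy filters as pruning candidates:

```python
import numpy as np

def feature_map_energy(fmaps):
    """Energy of each feature map, i.e., the sum of squared activations.

    fmaps: array of shape (num_filters, H, W) holding one layer's outputs.
    Returns a 1-D array with one energy value per filter.
    """
    return np.sum(fmaps.astype(np.float64) ** 2, axis=(1, 2))

def low_energy_filters(fmaps, rel_threshold=0.01):
    """Indices of filters whose output energy falls below a fraction of
    the layer's maximum energy -- candidates for pruning."""
    e = feature_map_energy(fmaps)
    return np.where(e < rel_threshold * e.max())[0]
```

On a less complex dataset, more filters fall below the threshold, matching observation (ii) above.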
Based on the above observations, we develop CC-Net, depicted in Fig. 2. Inputs and internal operations of CC-Net are shown in parallelograms and rectangles, respectively. The existing architectures are the 3 CNNs studied and parameterized in our work. Colored boxes highlight the key contributions of this paper. We elaborate on the major components of CC-Net below.
2.1 Image Complexity Computation
We seek an image complexity metric that can (i) indicate the trends of segmentation accuracy and (ii) be easily computed. Our work examined the following candidate metrics: (i) signal energy, (ii) edge information (Sobel and Scharr filters along with image pyramid), (iii) local key-point detection using SURF [10], (iv) visual clutter information [11], (v) JPEG complexity [12], and (vi) blob density. To obtain a single complexity value for an entire dataset, we take the average of the complexity values over all the images in the dataset.
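The JPEG complexity measure [12] is essentially a compressibility score. The following hedged sketch (function names are ours) uses zlib’s lossless compression ratio as a simple stand-in for the JPEG-based measure, and averages it over a dataset as described above:

```python
import zlib
import numpy as np

def compression_complexity(img):
    """Complexity of one image as (compressed size) / (raw size).

    img: 2-D uint8 array. More texture/structure compresses less,
    yielding a higher ratio. zlib is a stand-in for the JPEG encoder.
    """
    raw = np.ascontiguousarray(img, dtype=np.uint8).tobytes()
    return len(zlib.compress(raw, level=9)) / len(raw)

def dataset_complexity(images):
    """Single complexity value for a dataset: average over all images."""
    return float(np.mean([compression_complexity(im) for im in images]))
```

A flat, featureless image compresses almost entirely away, while a highly textured image barely compresses, so the ratio tracks visual complexity.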
Table 1 (excerpt): dataset characteristics.

| Dataset | #Images | Modality | J | B | Source |
|---|---|---|---|---|---|
| Lymph Nodes (LN) | 74 | Ultrasound | 0.2445 | 0.0715 | in-house |
| Wing Discs (WD) | 20 | Gray | 0.0925 | 0.1348 | in-house |
Out of the 7 datasets shown in Table 1, 5 datasets (train-set, top 5 rows) are used to formulate the methodology, while the remaining 2 datasets (test-set) are used for blind evaluation. Fig. 4 plots the average complexities (normalized to the range [0, 1]) against the train-set datasets arranged by their F1 and IU score degradation (the two most popular segmentation accuracy metrics). Among these complexity measures, the JPEG complexity $J$ best follows the trend of F1 score degradation (i.e., higher complexity leads to lower F1). Since IU is related to both feature variety and quantity, to represent it we linearly combine the JPEG complexity and the blob density $B$ (see Table 1) as $JB = \lambda J + (1-\lambda)B$, where $\lambda$ is a value in $[0, 1]$. The value of $\lambda$ is determined by inspecting the optimal regression fitting on the training datasets in our experiments. We consider $J$ and $JB$ for the multiplier determination explained as follows.
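Assuming the combination takes the form $JB = \lambda J + (1-\lambda)B$, the regression-based choice of $\lambda$ can be sketched as follows (hypothetical helper names; a grid search over $\lambda$ picking the best linear fit against the observed degradation):

```python
import numpy as np

def combined_complexity(J, B, lam):
    """Assumed form of the combined measure: JB = lam*J + (1 - lam)*B."""
    return lam * np.asarray(J, dtype=float) + (1 - lam) * np.asarray(B, dtype=float)

def fit_lambda(J, B, degradation, grid=None):
    """Choose lam in [0, 1] whose JB gives the best linear regression fit
    (highest R^2) against the observed accuracy degradation."""
    d = np.asarray(degradation, dtype=float)
    grid = np.linspace(0.0, 1.0, 101) if grid is None else grid
    best_lam, best_r2 = 0.0, -np.inf
    for lam in grid:
        jb = combined_complexity(J, B, lam)
        m, c = np.polyfit(jb, d, 1)          # 1-D least-squares line fit
        resid = d - (m * jb + c)
        r2 = 1.0 - np.sum(resid ** 2) / np.sum((d - d.mean()) ** 2)
        if r2 > best_r2:
            best_lam, best_r2 = float(lam), r2
    return best_lam
```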
2.2 Multiplier Determination and Network Compression
Keeping all other variables unchanged, we can express the relationship between the segmentation accuracy ($A$) and the data complexity ($C$) as $A = f(C, P)$, where $P$ is the number of trainable parameters in a CNN. For general networks, the function $f$ can be rather complicated. But in general, segmentation accuracy is monotonically non-decreasing with respect to $P$ and non-increasing with respect to $C$, i.e., $\partial A/\partial P \ge 0$ and $\partial A/\partial C \le 0$.
For CNNs (see Fig. 3), we observe (as discussed in Section 3) that $A$ can be approximated by a linear function of $\log(P)$. That is, $\Delta A = m \cdot \Delta\log(P)$ for a constant $m$ that reflects the degree of degradation. Given the linear dependency, if $m$ is known, then it is straightforward to compute the change in accuracy or in the number of parameters when the other is provided. The value of $m$ is network-dependent, and can be obtained by performing systematic analysis of network compression and tracking the change in accuracy.
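The slope $m$ can be estimated by ordinary least squares over (parameter count, accuracy) pairs gathered by training the network at several compression levels; a minimal sketch under this linear-model assumption (function name is ours):

```python
import numpy as np

def degradation_slope(param_counts, accuracies):
    """Least-squares estimate of m in the assumed linear model
    A ~ m * log10(P) + c, from accuracy measurements taken at
    several compression levels of the same network."""
    log_p = np.log10(np.asarray(param_counts, dtype=float))
    m, _c = np.polyfit(log_p, np.asarray(accuracies, dtype=float), 1)
    return m
```

With $m$ in hand, a known change in $\log(P)$ immediately predicts the change in accuracy, and vice versa.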
A simple way of compression is to uniformly scale down the number of feature maps in every convolution layer using a single multiplier ($\alpha$). Existing work has shown that this performs very well [7, 16]. The number of trainable parameters after scaling becomes $P(\alpha) = \sum_l (\alpha \cdot n_l)(\alpha \cdot m_l)\, k_l w_l = \alpha^2 P$, where $n_l$ and $m_l$ are the numbers of input and output feature maps of layer $l$, and $k_l$ and $w_l$ are the filter dimensions. However, finding a good $\alpha$ is challenging. We employ complexity measures to determine $\alpha$.
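The $\alpha^2$ effect of uniform scaling can be verified with a small parameter-counting sketch (the layer description and rounding policy are our assumptions; biases and non-convolution layers are ignored):

```python
def scaled_param_count(conv_layers, alpha):
    """Trainable parameters of the convolution layers after scaling the
    feature-map counts by alpha (rounded, at least one filter per layer).

    conv_layers: list of (n_in, n_out, k_h, k_w) tuples.
    """
    total = 0
    for n_in, n_out, k_h, k_w in conv_layers:
        total += max(1, round(alpha * n_in)) * max(1, round(alpha * n_out)) * k_h * k_w
    return total
```

For $\alpha = 1/2$ the count drops by roughly $\alpha^2$, i.e., 4x, as in the formula above.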
When producing compressed networks, we consider two practical scenarios: (1) memory-constrained best possible accuracy, and (2) accuracy-guided least memory usage. For (1), two sub-cases are: (1.a) a disk space budget and (1.b) a main memory budget. For case (1.a), given a disk space budget in MB, we first determine the corresponding parameter budget $P_b$, based on the number of bits for each parameter. Then $\alpha$ can be computed as $\alpha = \sqrt{P_b / P}$. For case (1.b), the sizes of the feature maps are considered along with the number of bits per parameter, and the value of $\alpha$ can be determined analogously. For (2), given the lowest acceptable accuracy $A_{min}$ and the original base network accuracy $A_{base}$, using the linear model we have $\Delta\log(P) = (A_{min} - A_{base})/m$, and so $P(\alpha)$ as well as $\alpha$ can be readily computed. Using $\alpha$, a compressed network is produced, which can then be trained.
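Both $\alpha$ computations reduce to a few lines; a hedged sketch of the two scenarios (symbol and function names are ours):

```python
import math

def alpha_from_param_budget(p_budget, p_full):
    """Scenario (1): since P(alpha) ~ alpha^2 * P, a parameter budget
    (derived from the disk/memory budget and the bits per parameter)
    gives alpha = sqrt(P_budget / P)."""
    return math.sqrt(p_budget / p_full)

def alpha_from_accuracy(a_min, a_base, m):
    """Scenario (2): with the linear model A ~ m * log10(P) + c,
    delta_log10(P) = (a_min - a_base) / m, and since P(alpha)/P = alpha^2,
    alpha = 10 ** (delta_log10(P) / 2)."""
    return 10.0 ** ((a_min - a_base) / (2.0 * m))
```

For example, with a base accuracy of 0.90, a lowest acceptable accuracy of 0.85, and $m = 0.05$, the model allows shrinking $P$ by one decade, i.e., $\alpha = \sqrt{0.1} \approx 0.316$.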
3 Experimental Evaluation
The 5 train-set datasets (Glands, Lymph Nodes, Melanoma, C2DH-HeLa, Wing Discs) are used to determine $m$ for the 3 CNN models (Fig. 3), which is then mapped to $J$ & $JB$ to determine $\alpha$. For simple calculations maintaining integer filter counts, a small set of $\alpha$ values is considered (Fig. 6 & Fig. 7 (a), (c), X-axis). The 2 test-set datasets (C2DH-U373, C2DL-PSC) are used to validate our method. We use standard back-propagation with the Adam optimizer (learning rate = 0.00005) and the cross-entropy loss function, along with data augmentation. Experiments are performed on NVIDIA TITAN and Tesla P100 GPUs, using the Torch framework.
Fig. 5 shows some segmentation outputs. Fig. 6 and Fig. 7 show the calculated degree of degradation ($m$) for the FCN [1], U-Net [2], and CUMedVision [3] networks. In these figures, (a) and (c) give the degradation in the relative F1 and IU accuracy with respect to changes in the number of parameters expressed in logarithmic values. The slopes of the regression lines for each dataset in (a) and (c) are plotted against the respective complexities in (b) and (d).
Test case 1 (accuracy-guided least memory usage). We consider an example accuracy constraint. The multiplier $\alpha$ is estimated using $m$ and the dataset complexity (Table 1). Using the ceiling values of $\alpha$, compressed networks are trained and analyzed. As shown in Table 2, a significant compression is achieved (best 113x for C2DH-U373 on U-Net and least 3.5x for C2DL-PSC on CUMedVision) with much better accuracy compared to the compression achieved using squeezing or pruning alone. To validate the effectiveness in estimating $\alpha$, we introduce a small further reduction in the $\alpha$ value (the smallest possible keeping integer filter counts); the accuracy then degrades below the constraint (Table 2, row CC-Net-case1). CC-Net compression does not show much improvement when pruned further, indicating that few ineffective filters remain.
Test case 2 (memory-constrained best possible accuracy). We consider a disk space budget of 1 MB. Using the ceiling of $\alpha$, compressed networks are produced as shown in Table 2, whose accuracies satisfy the accuracy prediction made by our method (Fig. 8).
Table 2 (excerpt): for each network, the three columns are F1, IU, and log₁₀ of the trainable-parameter count.

| Method | Dataset | U-Net [2] F1 | IU | log₁₀(P) | CUMedVision [3] F1 | IU | log₁₀(P) | FCN [1] F1 | IU | log₁₀(P) |
|---|---|---|---|---|---|---|---|---|---|---|
| Base Network + Squeeze | C2DH-U373 | 0.819 | 0.854 | 7.049 | 0.832 | 0.863 | 6.669 | 0.844 | 0.875 | 7.369 |
| Base Network + Prune | C2DH-U373 | 0.858 | 0.867 | 7.491 | 0.848 | 0.861 | 6.886 | 0.809 | 0.837 | 7.551 |
| CC-Net-case1 + Squeeze | C2DH-U373 | 0.806 | 0.840 | 5.243 | 0.820 | 0.853 | 5.245 | 0.824 | 0.860 | 5.915 |
| CC-Net-case1 + Prune | C2DH-U373 | 0.834 | 0.847 | 5.435 | 0.834 | 0.847 | 5.377 | 0.830 | 0.843 | 5.938 |
The overall reductions in trainable parameters (PR) and evaluation latency (LR) for all 7 datasets (for test case 1) are plotted in Fig. 9. Larger complexity results in less compression, indicating a higher requirement of trainable parameters for extracting features. CC-Net achieves considerable parameter and latency reductions across the different datasets.
Table 3 (excerpt): training-time comparison.

| Method | Dataset | One-time $m$ determination | Per-epoch training time | Post-training |
|---|---|---|---|---|
| Ours (new) | C2DH-U373 | O | 4786 ms | - |
| Ours (existing) | C2DH-U373 | Negligible | 4786 ms | - |
Table 3 compares the training times of post-training pruning [8, 9] and CC-Net on U-Net for test case 1 (on a P100 GPU). The per-epoch training time (in ms) is provided, along with the number of pruning epochs (column Post-training). We have used fewer fine-tuning iterations per pruning epoch; even so, pruning is expensive and can exceed the original network training time by a factor of 3 [8, 9]. The one-time determination of $m$ (‘O’ in Table 3) for any CNN is a bottleneck for CC-Net. Yet, after this process, a significant reduction in training time can be achieved for any dataset trained on the same network. We estimate that ‘O’ can be computed in under 2x the training time of the base architecture, with a sufficient degree of accuracy, using 2 datasets with two $\alpha$ points each.
In this paper, we presented a new image complexity-guided network compression scheme, CC-Net, for biomedical image segmentation. Instead of compressing CNNs after training, we focused on pre-training network size reduction, exploiting image complexity of the training data. Our method is effective in quickly generating compressed networks with target accuracy, outperforming state-of-the-art network compression methods. Our scheme accommodates practical applied design constraints for compressing CNNs for biomedical image segmentation.
This work was supported in part by the National Science Foundation under Grants CNS-1629914, CCF-1640081, and CCF-1617735, and by the Nanoelectronics Research Corporation, a wholly-owned subsidiary of the Semiconductor Research Corporation, through Extremely Energy Efficient Collective Electronics, an SRC-NRI Nanoelectronics Research Initiative under Research Task ID 2698.004 and 2698.005.
-  Jonathan Long, Evan Shelhamer, and Trevor Darrell, “Fully convolutional networks for semantic segmentation,” CoRR, vol. abs/1411.4038, 2014.
-  O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” ArXiv e-prints, May 2015.
-  Hao Chen, Xiaojuan Qi, Jie-Zhi Cheng, and Pheng-Ann Heng, “Deep contextual networks for neuronal structure segmentation,” in AAAI, 2016, pp. 1167–1173.
-  L. Yang, Y. Zhang, J. Chen, S. Zhang, and D. Z. Chen, “Suggestive annotation: A deep active learning framework for biomedical image segmentation,” in 20th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2017, vol. III, pp. 399–407.
-  L. Wu, Y. Xin, S. Li, T. Wang, P. A. Heng, and D. Ni, “Cascaded fully convolutional networks for automatic prenatal ultrasound image segmentation,” in 14th IEEE International Symposium on Biomedical Imaging (ISBI), April 2017, pp. 663–666.
-  F. N. Iandola, S. Han, et al., “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and 0.5MB model size,” ArXiv e-prints, Feb. 2016.
-  A. G. Howard, M. Zhu, et al., “MobileNets: Efficient convolutional neural networks for mobile vision applications,” ArXiv e-prints, Apr. 2017.
-  S. Han, H. Mao, et al., “Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding,” ArXiv e-prints, Oct. 2015.
-  P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz, “Pruning convolutional neural networks for resource efficient inference,” ICLR, June 2017.
-  Herbert Bay, Tinne Tuytelaars, and Luc Van Gool, “Surf: Speeded up robust features,” in ECCV, 2006, pp. 404–417.
-  Ruth Rosenholtz, Yuanzhen Li, and Lisa Nakano, “Measuring visual clutter,” Journal of Vision, vol. 7, no. 2, pp. 17, 2007.
-  Honghai Yu and Stefan Winkler, “Image complexity and spatial information,” 5th International Workshop on Quality of Multimedia Experience (QoMEX), pp. 12–17, 2013.
-  K. Sirinukunwattana, J. P. W. Pluim, et al., “Gland segmentation in colon histology images: The GlaS challenge contest,” ArXiv e-prints, Mar. 2016.
-  N. C. F. Codella, D. Gutman, et al., “Skin lesion analysis toward melanoma detection: ISBI 2017,” ArXiv e-prints, Oct. 2017.
-  V. Ulman, M. Maška, et al., “An objective comparison of cell-tracking algorithms,” Nature Methods, 2017.
-  Ariel Gordon, Elad Eban, et al., “MorphNet: Fast and simple resource-constrained structure learning of deep networks,” arXiv, 2017.