Biomedical image segmentation plays a key role in disease diagnosis and treatment. Recently, by outperforming traditional approaches, convolutional neural networks (CNNs) have become powerful tools for biomedical image segmentation. In one such early CNN-based work, Ronneberger et al. (Ronneberger et al., 2015) achieved state-of-the-art accuracy in segmenting neuronal structures by proposing U-Net. Since its inception, U-Net has become one of the most popular CNN models for biomedical image segmentation. Networks like CUMedVision (Chen et al., 2016), coarse-to-fine stacked networks (Zhang et al., 2016), cascaded networks (Wu et al., 2017), U-Net++ (Zhou et al., 2018), and UCU-Net (Mishra et al., 2020) were also designed to improve biomedical image segmentation accuracy. Such networks outperform traditional methods and are currently considered state-of-the-art for many tasks, such as melanoma segmentation (Yuan and Lo, 2019; Li and Shen, 2018; Perez et al., 2019), lymph node segmentation (Zhang et al., 2019), and retinal vessel segmentation (Wu et al., 2020; Mishra et al., 2020; Mou et al., 2020; Zhang and Chung, 2018; Mishra et al., 2021). However, CNNs are often very large, resulting in high memory requirements and high latency, and thus may not be suitable for resource-constrained applications (e.g., edge computing).
Nowadays, low-cost and easy-to-carry (e.g., handheld) imaging devices are widely used in edge-computing biomedical and healthcare applications (e.g., disaster/emergency response, pandemic management, and military rescue), where it is desirable to apply the most effective image analysis techniques, including deep learning methods. However, in many edge computing scenarios (e.g., in remote or resource-constrained areas, battlefields, etc.), computing resources may be severely limited and cannot run ordinary (full) deep learning models. Hence, compressed versions of deep learning models, subject to local computing resource constraints, should be deployed to achieve the best possible performance.
Neural network compression is an important aspect of neural network design. Benefits of compression include faster training, faster inference, and lower resource requirements, which enable more energy-efficient applications. Post-training compression techniques such as pruning (removing less important filters) and quantization (using lower-precision representations for weights) have been proposed (Han et al., 2016; Molchanov et al., 2017; Zafrir et al., 2019; Zhou et al., 2017; Zhao et al., 2019). Pre-training compression approaches focus on designing smaller networks to begin with (Iandola et al., 2016; Howard et al., 2017). Although these techniques are quite effective in finding smaller networks with acceptable accuracy, they require some parameters to be set manually and use multiple pruning–fine-tuning iterations. In most cases, one standardized large network for segmentation is used regardless of the input data. Hence, compression often starts from the same large initial network and incurs substantial computational overhead. Howard et al. (Howard et al., 2017) proposed to reduce network size using a uniform multiplicative factor for each convolutional layer, which can quickly produce a smaller network. However, no systematic approach was provided to determine the value of the multiplicative factor. Hence, searching for a compressed CNN architecture for a specific imaging application using the approach of (Howard et al., 2017) typically involves a series of time-consuming training/validation experiments on the training data to find a good compromise between network size and accuracy. Further, a uniform multiplier is not effective, because different convolutional layers in a CNN do not contribute equally to feature extraction (Raghu et al., 2017). To address these challenges, in this paper we propose a layer-wise multiplier based network compression framework targeting biomedical image segmentation in resource-constrained settings, which quickly estimates a compressed model by exploiting properties inherent to the target application datasets.
For biomedical image segmentation, depending on the specific diseases or biological targets, the application datasets often exhibit distinctive properties that may shed light on how large a network is needed to segment the corresponding images. In contrast to natural scene images, images in biomedical/healthcare (or other application-specific) settings often target a specific type of disease or injury and are captured by specific imaging devices; hence, their objects and settings are quite “stable”, making the image characteristics and complexity much easier to analyze. We leverage this property of biomedical images and propose to use image complexity as a guide to analyze the segmentation accuracy degradation caused by compression.
Compressing a CNN by removing network weights generally results in accuracy degradation. Intuitively, a compressed network may not capture robust image features well with fewer resources (i.e., fewer trainable network weights). We hypothesize that the drop in segmentation accuracy of a CNN caused by compression follows a pattern that can be linked to the target dataset complexity. This hypothesis is consistent with an information-theoretic view: ‘less’ complex images contain fewer features and hence can be captured by fewer network weights (i.e., the network can be compressed more), while ‘more’ complex images require more network weights to be captured successfully. Hence, pruning the same network will cause different accuracy degradation on two image datasets of different complexity. We seek to map this relation between dataset complexity and network accuracy degradation, which we call the degree of degradation. We believe that the degree of degradation is a constant for a given network architecture and can be estimated by tracking the accuracy degradation under network compression. Once calculated, the degree of degradation can be combined with the dataset complexity to predict the accuracy degradation that compression will cause on any target dataset.
In this paper, we introduce a new framework for efficiently producing low-latency, compressed deep learning networks for biomedical image segmentation without repeated training. We exploit the concept of training data complexity to guide the design of the compressed networks. Specifically, we quantify the complexity of the training image dataset and use it as an indicator of the target network’s trainable weight requirements. We propose several complexity metrics for this purpose, which are much less computationally demanding than CNN training. Then, we map the calculated image complexity to the accuracy degradation of the CNN caused by compression to extract the degree of degradation. Using the computed image complexity of the training dataset and the degree of degradation of the target architecture, we predict the accuracy for different network sizes without conducting network training. Thus, one may choose a solution that meets both the size and accuracy requirements. Based on the complexity measure, the target network architecture, and specified network constraints (e.g., accuracy or available memory), we determine the most suitable layer-wise multiplicative factors for the given dataset, which translate into a compressed network. The resulting compressed network is then trained from scratch, with much less effort and memory than a full network for image segmentation. Our approach complements post-training network reduction techniques by focusing on the pre-training stage to quickly generate a size-reduced network structure for training. We conduct experiments using 3 publicly available and 2 in-house datasets, employing 3 commonly used CNN architectures for biomedical image segmentation as representative networks to highlight the efficacy of our proposed framework.
Our main contributions are as follows:
Introducing a novel approach for compressing target CNNs for biomedical image segmentation based on image complexity, network architecture, and design constraints.
Analyzing various measures for representing image complexity and their suitability for guiding network compression.
Validating our approach on 3 representative biomedical image segmentation networks to generate corresponding compressed network architectures.
Our proposed framework (shown in Fig. 1) has three major components: (1) image complexity calculation, (2) network degree of degradation calculation, and (3) design constraint inclusion. In Section 2, we provide the details of image complexity calculation. In Section 3, the degree of degradation calculation for neural networks is presented. Section 4 describes how user-specified constraints are used to explore the design space. Experimental evaluations and discussions are provided in Section 5 and Section 6, respectively. Section 7 concludes the paper.
2. Image Complexity Computation
In this section, we first explore various candidates for measuring image complexity in Section 2.1. Then we present our approach to select the target complexity measure in Section 2.2. Finally, in Section 2.3, we propose a method to compute layer-wise image complexity, which enables fine-grained layer-wise pruning.
2.1. Complexity Candidates
Our goal of exploring various image complexity measures is to identify an indicator that represents the information content of data samples. We seek an image complexity metric that can (i) indicate the trend of segmentation accuracy and (ii) be easily computed. Our work examines the following candidate metrics for image complexity estimation.
Signal Energy: The sum of all squared coefficients of the frequency spectrum of a signal is taken as the signal energy. A CNN essentially filters the various spatial frequency components present in an image. Hence, higher energy can be attributed to a richer frequency spectrum, which may indicate that a larger number of filter kernels is needed in the CNN to extract valuable information from the data. Similarly, lower image energy can be translated into a need for fewer filters. To compute the energy of a single image, we calculate the sum of the squared absolute values of its Fourier coefficients.
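A minimal NumPy sketch of the signal-energy measure, assuming grayscale images stored as 2-D arrays; `signal_energy` and `dataset_energy` are hypothetical helper names. Because `np.fft.fft2` is unnormalized, Parseval's theorem gives an energy equal to the pixel count times the sum of squared pixel values, so the same quantity could also be computed in the spatial domain.

```python
import numpy as np


def signal_energy(image: np.ndarray) -> float:
    """Sum of squared magnitudes of the 2-D Fourier coefficients of one image."""
    spectrum = np.fft.fft2(image.astype(np.float64))
    return float(np.sum(np.abs(spectrum) ** 2))


def dataset_energy(images) -> float:
    """Dataset-level value: the mean per-image energy (as in Section 2.2)."""
    return float(np.mean([signal_energy(img) for img in images]))


rng = np.random.default_rng(0)
flat = np.full((64, 64), 0.5)   # only a DC component
noisy = rng.random((64, 64))    # broadband content
print(signal_energy(flat), signal_energy(noisy))
```

Note that the DC term dominates for non-negative images; subtracting the image mean first would isolate the non-DC spectrum, but that is a design choice not stated in the text.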
Edge Information: Since segmentation focuses on detecting the boundaries of the objects of interest, we consider edge information as an important component of image complexity estimation. Yu and Winkler (Yu and Winkler, 2013) used edge information to calculate image complexity. Spatial information at the pixel level is calculated by summing the squared horizontal and vertical edge information extracted using horizontal and vertical Sobel or Scharr kernels, respectively. We compute edge information at different scales to imitate CNN-based fine-to-coarse feature extraction. The mean value of the edge information at different levels is used as an indicator of the image complexity.
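The multi-scale edge measure can be sketched in NumPy as follows. This is an illustration under assumptions: it uses only 3×3 Sobel kernels (Scharr kernels could be swapped in), keeps the valid region, takes the per-pixel value as the sum of squared horizontal and vertical responses as described above, and uses 2×2 average pooling between scales; all function names are hypothetical.

```python
import numpy as np


def sobel_edge_energy(img: np.ndarray) -> np.ndarray:
    """Per-pixel Gx**2 + Gy**2 from 3x3 Sobel kernels (valid region only)."""
    p = img.astype(np.float64)
    # Horizontal gradient: right column minus left column, weights (1, 2, 1).
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical gradient: bottom row minus top row, weights (1, 2, 1).
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return gx ** 2 + gy ** 2


def downsample2(img: np.ndarray) -> np.ndarray:
    """2x2 average pooling; an odd trailing row or column is dropped."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def edge_information(img: np.ndarray, levels: int = 3) -> float:
    """Mean edge information over `levels` fine-to-coarse scales."""
    x = img.astype(np.float64)
    per_level = []
    for _ in range(levels):
        per_level.append(sobel_edge_energy(x).mean())
        x = downsample2(x)
    return float(np.mean(per_level))


step = np.zeros((16, 16))
step[:, 8:] = 1.0               # a single vertical edge
print(sobel_edge_energy(step).max())
```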
Keypoints: Keypoint detectors and descriptors, such as Speeded-Up Robust Features (SURF), are widely used for computer vision tasks. We consider the number of extracted SURF keypoints, along with their strengths, as another estimate of image complexity.
Visual Clutter: Rosenholtz et al. (Rosenholtz et al., 2007) studied visual clutter and its effect on feature extraction. Clutter affects visual tasks because it makes feature extraction more complicated; hence, clutter can serve as a candidate image complexity estimate. We consider the feature congestion and sub-band entropy clutter measures for complexity computation. Feature congestion represents a subjective interpretation of visual clutter, while sub-band entropy is related to the visual information on the display.
JPEG Compression: JPEG-based complexity utilizes a JPEG image compressor. The JPEG-based complexity is defined as the inverse of the compression ratio, i.e.,

$$J = \frac{S_{\mathrm{compressed}}}{S_{\mathrm{original}}} \tag{1}$$

where $S_{\mathrm{compressed}}$ and $S_{\mathrm{original}}$ are the storage sizes of the compressed and original images, respectively. The compressed image is generated using JPEG compression at 25% quality (Yu and Winkler, 2013). A higher JPEG complexity indicates an image that compresses less because it contains less redundant information; a lower JPEG complexity signifies the presence of redundant information and a higher compression ratio.
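A minimal sketch of the JPEG complexity using Pillow as the JPEG encoder. Two assumptions are made here and are not taken from the text: the "original" storage size is the raw 8-bit pixel buffer (height × width × channels bytes) rather than an on-disk file size, and `jpeg_complexity` is a hypothetical helper name.

```python
import io

import numpy as np
from PIL import Image


def jpeg_complexity(image: np.ndarray, quality: int = 25) -> float:
    """JPEG-compressed size divided by the raw 8-bit size (assumed reference)."""
    pixels = np.clip(image, 0, 255).astype(np.uint8)
    buffer = io.BytesIO()
    Image.fromarray(pixels).save(buffer, format="JPEG", quality=quality)
    return len(buffer.getvalue()) / pixels.nbytes


flat = np.full((128, 128), 128, dtype=np.uint8)          # highly redundant
noise = np.random.default_rng(0).integers(0, 256, size=(128, 128)).astype(np.uint8)
print(jpeg_complexity(flat), jpeg_complexity(noise))      # flat is far smaller
```

As in Section 2.2, a dataset-level value would be the mean of this quantity over all training images.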
Foreground Density: The foreground density reflects the relative proportion of foreground and background pixels and can be easily computed as the ratio of the number of foreground pixels to the total number of pixels in an image, i.e., $B = N_{\mathrm{fg}} / N_{\mathrm{total}}$.
2.2. Candidate Selection
Given that there are multiple training images in a training dataset, we use the mean of the complexity values of all the training images for a specific measure as the corresponding complexity value. Since biomedical images for a specific application (e.g., a specific disease or injury) are often captured by the same imaging modality and contain fixed types of objects, it is reasonable to expect a relatively small variation among the complexity values of different image samples in the same dataset (if an appropriate complexity measure is used). The average complexity value of the training data can then be considered as the representative complexity of the image data for that application.
To determine which of the above complexity measures is most suitable as a guide for network compression, we map these measures to the segmentation accuracy drop during network compression (explained in Section 3). Since the F1 score is well suited to class-imbalanced problems and is one of the most widely used accuracy metrics, we use it to measure segmentation accuracy in our framework. In Fig. 2, the different complexity measures (min-max normalized) are plotted against the F1 score degradation. Most complexity measures do not follow the trend of the F1 score degradation. In contrast, the JPEG complexity clearly follows it, i.e., higher JPEG complexity values lead to higher F1 score degradation with compression, as shown in Fig. 2.
Along with the F1 score, meanIU or IU (class-wise mean of Intersection over Union) is another commonly used metric (Zhang et al., 2016) for segmentation accuracy. Since IU relates to both feature variety and quantity, in addition to JPEG complexity we introduce a new complexity measure, denoted JB, that combines the JPEG complexity and the foreground density. Specifically, JB is defined as a linear function of the JPEG complexity and foreground density, i.e., $JB = \gamma J + (1 - \gamma) B$, where $J$ is the JPEG complexity, $B$ is the foreground density, and $\gamma$ is a value in $[0, 1]$. The value of $\gamma$ is determined by inspecting the optimal regression fit on the training datasets in our experiments.
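The foreground density and JB can be sketched as below. The convex-combination form `gamma * J + (1 - gamma) * B` is an assumption about how the single coefficient in [0, 1] enters the linear function; the helper names are hypothetical, and the foreground density is computed from a binary ground-truth mask.

```python
import numpy as np


def foreground_density(mask: np.ndarray) -> float:
    """Fraction of foreground pixels in a binary ground-truth mask."""
    m = np.asarray(mask).astype(bool)
    return float(m.sum() / m.size)


def jb_complexity(j: float, b: float, gamma: float) -> float:
    """Assumed JB form: convex combination gamma * J + (1 - gamma) * B."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return gamma * j + (1.0 - gamma) * b


mask = np.zeros((8, 8), dtype=np.uint8)
mask[:4, :4] = 1                                       # a quarter of the pixels
b = foreground_density(mask)
print(b, jb_complexity(j=0.4, b=b, gamma=0.7))
```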
2.3. Layer-wise Complexity
In CNNs, convolutional layers are stacked with intermediate sub-sampling operations in order to extract rich contextual features. Each sub-sampling operation reduces the scale of the feature map that is forwarded as input to the subsequent convolutional layers. Since every convolutional layer in a specific stage of a CNN (between two sub-sampling operations) extracts features from a specific feature-map scale, we explore complexity from an image scale perspective. Such a scale-based complexity enables us to understand the relative information content at a specific image scale, which is helpful for layer-wise (fine-grained) pruning of a network.
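In order to obtain layer-wise JPEG complexity, we extend the approach of Eq. (1) by reformulating the JPEG complexity at image scale $s$ as

$$J^{(s)} = \frac{S\big(\mathrm{JPEG}(\mathcal{U}_s(\mathcal{D}_s(I)))\big)}{S(I)} \tag{2}$$

where $I$ is the input image, $\mathcal{D}_s$ subsamples it to scale $s$, $\mathcal{U}_s$ upsamples the result back to the input resolution, and $S(\cdot)$ denotes storage size.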
Instead of using the ratio of the storage sizes of the compressed and original images at a specific scale, we upsample every subsampled image back to the input resolution before computing the complexity metric. This provides a consistent, information-theoretic frame of reference: the input image to the network is the base case against which every calculation is performed. An example of information content reduction with subsampling is shown in Fig. 3. A similar extension is performed for the layer-wise foreground density calculation.
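A sketch of the scale-wise JPEG complexity described above, using Pillow. Assumptions not stated in the text: bilinear resampling for both subsampling and upsampling, power-of-two scales matching the pooling stages, and the raw 8-bit input buffer as the reference size; the function names are hypothetical.

```python
import io

import numpy as np
from PIL import Image

# Pillow >= 9.1 exposes resampling filters under Image.Resampling.
_BILINEAR = getattr(Image, "Resampling", Image).BILINEAR


def _jpeg_bytes(img: Image.Image, quality: int) -> int:
    buffer = io.BytesIO()
    img.save(buffer, format="JPEG", quality=quality)
    return len(buffer.getvalue())


def layerwise_jpeg_complexity(image: np.ndarray, num_scales: int = 4, quality: int = 25) -> list:
    """JPEG complexity at scales 1, 1/2, 1/4, ... measured at input resolution."""
    pixels = np.clip(image, 0, 255).astype(np.uint8)
    base = Image.fromarray(pixels)
    width, height = base.size
    values = []
    for s in range(num_scales):
        factor = 2 ** s
        small = base.resize((max(1, width // factor), max(1, height // factor)), _BILINEAR)
        restored = small.resize((width, height), _BILINEAR)  # back to the input size
        values.append(_jpeg_bytes(restored, quality) / pixels.nbytes)
    return values


noise = np.random.default_rng(0).integers(0, 256, size=(128, 128)).astype(np.uint8)
print(layerwise_jpeg_complexity(noise))  # values shrink as the scale gets coarser
```

Keeping the input image as the common denominator is what makes the per-scale values comparable, which is the point of the upsampling step.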
3. Network Parameter Calculation
The segmentation accuracy (e.g., F1 and IU scores) depends on many factors, such as the number of network weights, the arrangement of network weights (network architecture), the training method, and certainly the training data. From the discussion in Section 2.2, it is evident that accuracy is also closely related to the complexity of the input dataset. Keeping all other variables (e.g., the network architecture and training method) unchanged, we can express the relationship between segmentation accuracy and data complexity as $A = f(W, C)$, where $A$, $W$, and $C$ represent the segmentation accuracy, the number of trainable weights, and the training data complexity, respectively. For general networks, the function $f$ can be rather complicated. In general, however, segmentation accuracy is monotonically non-decreasing with respect to $W$ and $C$, i.e., $\partial A / \partial W \ge 0$ and $\partial A / \partial C \ge 0$.
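For CNNs, which are widely used for biomedical image segmentation, we observe (as discussed in Section 5.3) that the change in accuracy per change in the logarithm of the number of trainable weights, $\Delta A / \Delta \log W$, can be approximated by a linear function of $C$. That is,

$$\frac{\Delta A}{\Delta \log W} = \lambda C + \beta \tag{3}$$

for a constant $\lambda$, which reflects the degree of degradation, and an intercept $\beta$. Given the linear dependency of $\Delta A / \Delta \log W$ on $C$, if $\lambda$, $\beta$, and $C$ are known, it is straightforward to compute the change in accuracy or the change in the number of trainable network weights when the other quantity is provided. The value of $\lambda$ (together with $\beta$) is network-dependent and can be obtained by performing systematic network compression and tracking the corresponding change in accuracy.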
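As a worked illustration of the linear model of Eq. (3), the sketch below predicts the accuracy change for a uniform multiplier α. It assumes natural logarithms and that uniform thinning scales the weight count by α², so Δlog W = 2 ln α; the function name and the λ, β, and C values are hypothetical and are not taken from Table 3.

```python
import math


def predicted_accuracy_change(lam: float, beta: float, complexity: float, alpha: float) -> float:
    """Delta A for a uniform multiplier alpha under the linear degradation model.

    Uniform thinning gives W' ~= alpha**2 * W, so Delta log W = 2 * ln(alpha).
    lam and beta must have been fitted with the same logarithm base.
    """
    slope = lam * complexity + beta       # Delta A / Delta log W
    return slope * 2.0 * math.log(alpha)


# Hypothetical numbers: halving every channel count on a dataset with C = 0.3.
print(predicted_accuracy_change(lam=0.05, beta=0.01, complexity=0.3, alpha=0.5))  # about -0.0347
```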
To obtain a compressed network, a widely used method is to reduce the number of channels in the feature maps. Since uniformly reducing the number of feature maps with a channel multiplier is simple and performs very well (Gordon et al., 2018), we use it for network compression. For CNNs, the stored weights (which determine memory usage) are the filter weights of each convolutional layer; ignoring biases, their number is

$$W = \sum_{l} k_h^{(l)}\, k_w^{(l)}\, c_{\mathrm{in}}^{(l)}\, c_{\mathrm{out}}^{(l)} \tag{4}$$

where $c_{\mathrm{in}}^{(l)}$ and $c_{\mathrm{out}}^{(l)}$ are the numbers of channels in the input and output feature maps of layer $l$, and $k_h^{(l)}$ and $k_w^{(l)}$ are the filter dimensions. With a multiplier $\alpha$, the number of network weights is reduced to

$$W_\alpha = \sum_{l} k_h^{(l)}\, k_w^{(l)}\, \big(\alpha c_{\mathrm{in}}^{(l)}\big)\big(\alpha c_{\mathrm{out}}^{(l)}\big) = \alpha^2 W \tag{5}$$

Note that for a given $\alpha$, the number of weights is thus scaled by $\alpha^2$ (Mishra et al., 2019).
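To make the weight count and the α² scaling concrete, here is a small sketch that counts convolutional weights (biases ignored) and applies a channel multiplier. The helper names and the toy layer list are hypothetical; the sketch keeps the first layer's input channels (the image channels) fixed, which is why the reduction is only approximately α².

```python
def conv_weight_count(layers):
    """Sum of k_h * k_w * c_in * c_out over convolutional layers (biases ignored)."""
    return sum(kh * kw * cin * cout for kh, kw, cin, cout in layers)


def thin_network(layers, alpha):
    """Scale channel counts by alpha, keeping the image's input channels fixed.

    Assumes a plain chain of layers where each c_in equals the previous c_out.
    """
    thinned = []
    for i, (kh, kw, cin, cout) in enumerate(layers):
        new_cin = cin if i == 0 else max(1, round(alpha * cin))
        thinned.append((kh, kw, new_cin, max(1, round(alpha * cout))))
    return thinned


# Hypothetical three-layer encoder stem: (k_h, k_w, c_in, c_out), grayscale input.
layers = [(3, 3, 1, 64), (3, 3, 64, 64), (3, 3, 64, 128)]
full = conv_weight_count(layers)                      # 111168
half = conv_weight_count(thin_network(layers, 0.5))   # 27936, about 0.25 * full
print(full, half, half / full)
```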
Our proposed framework for the degree of degradation calculation is shown in Fig. 4. Using a specific $\alpha$ value, a thinner network architecture is generated. This thinner architecture is trained, and its output accuracy is recorded. To obtain robust $\lambda$ and $\beta$ values, we repeat this procedure multiple times with different $\alpha$ values. This calibration has to be performed only once for every network, since we assume $\lambda$ and $\beta$ to be specific to a fixed network architecture.
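The calibration can be sketched as two least-squares line fits with `numpy.polyfit`: first the slope of relative accuracy versus log(weights) for each dataset, then a line through the (complexity, slope) pairs whose slope and intercept give λ and β. The data below are synthetic (generated from λ = 0.05 and β = 0.01 for illustration only), natural logarithms are assumed, and the helper names are hypothetical.

```python
import numpy as np


def degradation_slope(num_weights, relative_accuracy) -> float:
    """Slope of relative accuracy versus ln(number of trainable weights) for one dataset."""
    x = np.log(np.asarray(num_weights, dtype=float))
    y = np.asarray(relative_accuracy, dtype=float)
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)


def fit_degree_of_degradation(complexities, slopes):
    """Regress per-dataset slopes on dataset complexity: slope = lam * C + beta."""
    c = np.asarray(complexities, dtype=float)
    lam, beta = np.polyfit(c, np.asarray(slopes, dtype=float), 1)
    return float(lam), float(beta)


# Synthetic alpha sweep on three datasets (illustrative only).
full_w = 1.0e6
alphas = [1.0, 0.75, 0.5, 0.25]
weights = [full_w * a ** 2 for a in alphas]
complexities = [0.1, 0.2, 0.4]
slopes = []
for c in complexities:
    s = 0.05 * c + 0.01
    acc = [1.0 + s * np.log(w / full_w) for w in weights]
    slopes.append(degradation_slope(weights, acc))
lam, beta = fit_degree_of_degradation(complexities, slopes)
print(lam, beta)  # recovers approximately 0.05 and 0.01
```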
4. Design Constraints Inclusion
When producing compressed networks for biomedical image segmentation, we consider two practical design scenarios: (1) memory-constrained best possible accuracy, and (2) accuracy-guided least memory usage. Case 1, memory-constrained best possible accuracy, represents scenarios in many embedded devices where there is a memory budget. The budget can be given either as main memory usage or as disk storage, and the objective is to design a network that achieves the maximum possible accuracy under the memory budget. Case 2, accuracy-guided least memory usage, represents scenarios where multiple processes share a single resource. In such a setup, some processes can be considered auxiliary to certain higher-priority main processes, and it is acceptable to compromise the accuracy of the auxiliary processes as long as it does not fall below a certain threshold. The budget is then given as the accuracy threshold, and the objective is to achieve the least memory usage in order to free up resources for the main processes.
For each user constraint we explore two directions to compress the network: (a) using a uniform multiplier, and (b) using a nonuniform layer-wise multiplier.
4.1. Memory Constrained Best Possible Accuracy
The memory budget can be provided either as a disk space budget or as a main memory budget. The disk space budget sets an upper bound on the total number of trainable weights that the compressed network can have. The main memory budget similarly sets an upper bound on the total number of trainable weights in the compressed network. However, in this case, besides the number of bits for each weight, one must also take into consideration the sizes of the intermediate feature maps, since they also occupy main memory when performing convolution operations. We provide a detailed formulation for the disk space budget constraint and only highlight the modifications necessary for the main memory budget.
Uniform Multiplier: Given a disk space budget in MB, we first determine the number of trainable network weights, $W'$, for the compressed network, based on the number of bits for each weight. Then a uniform multiplier can be computed as

$$\alpha = \sqrt{\frac{W'}{W}} \tag{6}$$

where $W$ is the number of network weights of the uncompressed original network model. Similarly, $W'$ can be calculated for a main memory budget. However, when the intermediate feature maps are taken into consideration, the uniform multiplier in this case will be:
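The disk-budget case of Eq. (6) can be sketched as follows, assuming 1 MB = 2²⁰ bytes and a fixed bit width per weight. The main-memory function is a hypothetical variant based on our own assumption (not the paper's equation): it models memory as bits × (α²W + αF), where F counts intermediate feature-map elements, which scale linearly with the channel multiplier, and solves the resulting quadratic for its positive root. All function names are hypothetical.

```python
import math


def weights_from_disk_budget(budget_mb: float, bits_per_weight: int = 32) -> float:
    """Largest number of weights fitting in the budget (1 MB taken as 2**20 bytes)."""
    return budget_mb * 2 ** 20 * 8 / bits_per_weight


def uniform_multiplier(budget_weights: float, full_weights: float) -> float:
    """alpha = sqrt(W' / W), capped at 1 so the network is never widened."""
    return min(1.0, math.sqrt(budget_weights / full_weights))


def uniform_multiplier_main_memory(budget_mb, full_weights, full_feature_map_elems, bits=32):
    """Hypothetical main-memory variant (assumed model, not the paper's formula).

    Solves alpha**2 * W + alpha * F = M, where M is the budget in stored values.
    """
    m = budget_mb * 2 ** 20 * 8 / bits
    w, f = full_weights, full_feature_map_elems
    alpha = (-f + math.sqrt(f * f + 4.0 * w * m)) / (2.0 * w)
    return min(1.0, alpha)


# 1 MB of 32-bit weights for a hypothetical network with 2**20 weights.
print(uniform_multiplier(weights_from_disk_budget(1.0), 2 ** 20))  # 0.5
```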
Nonuniform Multiplier: We want to formulate an approach for nonuniform layer-wise multipliers for effective pruning, where each convolutional layer is pruned based on the layer-wise complexity of the image scale from which it extracts features. To simplify notation, we consider a CNN with two convolutional layers, but similar results can be derived for any other CNN.
Consider a CNN with two convolutional layers and a sub-sampling operation in between. The first convolutional layer is associated with image complexity $C_1$, while the second convolutional layer is associated with image complexity $C_2$. As segmentation accuracy is defined for the whole network, for different degrees of pruning of layer 1 and layer 2, we can rewrite Eq. (3) as
Further, we can rewrite Eq. (6) for the two-layer CNN as

$$W' = \alpha^2 (W_1 + W_2) \tag{12}$$

where $W_1$ and $W_2$ are the numbers of weights associated with the first and second convolutional layers, respectively. However, with nonuniform multipliers associated with each individual layer (i.e., $\alpha_1$ and $\alpha_2$ for the first and second layers, respectively), Eq. (12) results in

$$W' = \alpha_1^2 W_1 + \alpha_2^2 W_2 \tag{13}$$
Using Eq. (13) and Eq. (11), $\alpha_1$ and $\alpha_2$ can be determined when $\lambda$, $\beta$, $C_1$, $C_2$, $W_1$, $W_2$, and $W'$ are known. For a main memory budget, a similar approach can be used, as Eq. (11) is unchanged in both cases. The only modification is to Eq. (13), which becomes
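The exact form of Eq. (11) is not reproduced here, so the following is only a hypothetical sketch of how layer-wise multipliers could be chosen under a weight budget. It assumes (our assumption, not the paper's Eq. (11)) that each layer's contribution to the accuracy change is proportional to (λC_i + β)·ln(α_i²). Maximizing the sum subject to Σ α_i² W_i = W' then keeps weights in proportion to λC_i + β, capped at each layer's original size (a water-filling allocation), so higher-complexity layers are compressed less. The function name is hypothetical.

```python
import math


def allocate_layer_multipliers(layer_weights, layer_complexity, lam, beta, budget_weights):
    """Hypothetical layer-wise multipliers under a total weight budget W'.

    Kept weights alpha_i**2 * W_i are proportional to lam * C_i + beta,
    and any layer whose share would exceed W_i keeps all of its weights.
    """
    s = [lam * c + beta for c in layer_complexity]
    if min(s) <= 0:
        raise ValueError("lam * C_i + beta must be positive for every layer")
    alphas = [0.0] * len(layer_weights)
    free = set(range(len(layer_weights)))
    remaining = float(budget_weights)
    while free:
        total = sum(s[i] for i in free)
        capped = [i for i in free if remaining * s[i] / total >= layer_weights[i]]
        if not capped:
            for i in free:
                alphas[i] = math.sqrt(remaining * s[i] / total / layer_weights[i])
            break
        for i in capped:              # these layers keep all of their weights
            alphas[i] = 1.0
            remaining -= layer_weights[i]
            free.discard(i)
    return alphas


# Two hypothetical layers of 1000 weights each; layer 1 sees the more complex scale.
print(allocate_layer_multipliers([1000, 1000], [0.4, 0.2], lam=1.0, beta=0.0, budget_weights=900))
```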
4.2. Accuracy Guided Least Memory Usage
Given the lowest acceptable accuracy, $A_{\mathrm{thr}}$, expressed as a percentage of the best possible accuracy, $A_{\mathrm{best}}$, our objective is to generate a model with the least memory usage. We consider both uniform and nonuniform layer-wise multipliers for this case and provide implementation details as follows.
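Uniform Multiplier: For a given accuracy threshold, the allowed accuracy change $\Delta A$ relative to the best possible accuracy can be computed. Using the complexity $C$ and the network's $\lambda$ and $\beta$, the change in the number of trainable network weights can be computed as

$$\Delta \log W = \frac{\Delta A}{\lambda C + \beta} \tag{16}$$

from which the uniform multiplier follows as $\alpha = \sqrt{W'/W}$, with $\log(W'/W) = \Delta \log W$.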
Layer-wise Multiplier: To determine nonuniform layer-wise multipliers, we formulate the problem using the two-layer CNN explained in Section 4.1. We divide the layer-wise multiplier determination task into two sub-problems, each associated with one CNN layer. Each sub-problem represents a network extracting features from an image with associated complexity $C_i$. Using $C_i$, $\lambda$, and $\beta$ (the latter two are unchanged, as the network structure is the same), we can determine $\Delta \log W_i$ as in Eq. (16), i.e.,

$$\Delta \log W_i = \frac{\Delta A}{\lambda C_i + \beta} \tag{17}$$

Essentially, a system of equations is generated, associating each convolutional layer with its respective layer-wise complexity. From the resulting $\Delta \log W_i$ values, the multiplier $\alpha_i$ can be calculated and used to compress layer $i$ specifically. Intuitively, layers dealing with images of higher complexity are compressed less, while layers extracting features from less complex images are compressed more.
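Both accuracy-guided cases reduce to inverting the linear degradation model: Δlog W = ΔA / (λC + β), then α = √(W'/W). The sketch below assumes natural logarithms and accuracies normalized to the best possible accuracy, so a 95% threshold corresponds to ΔA = −0.05; the λ, β, and complexity values and the function name are hypothetical.

```python
import math


def multiplier_for_accuracy(delta_a: float, lam: float, beta: float, complexity: float) -> float:
    """Invert the linear model: Delta log W = Delta A / (lam * C + beta), alpha = sqrt(W'/W).

    delta_a is the allowed change in relative accuracy (negative for a drop).
    """
    delta_log_w = delta_a / (lam * complexity + beta)
    return min(1.0, math.exp(delta_log_w / 2.0))


delta_a = 0.95 - 1.0                                   # keep 95% of the best accuracy
uniform = multiplier_for_accuracy(delta_a, lam=0.05, beta=0.01, complexity=0.3)
# Layer-wise: the same relation, applied with each layer's own complexity C_i.
layerwise = [multiplier_for_accuracy(delta_a, 0.05, 0.01, c) for c in (0.3, 0.1)]
print(uniform, layerwise)  # the lower-complexity layer gets the smaller multiplier
```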
5. Experimental Evaluation
We first provide the details of the datasets used in our experiments in Section 5.1. Network architectures are described in Section 5.2. The degree of degradation calculation is shown in Section 5.3. Finally, user constraint based network compression is explained in Section 5.4.
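Table 1. JPEG complexity of the five datasets at different image scales (columns: Scale, Wing disc, DRIVE, Melanoma, Lymph node, CHASE_DB1).
Table 2. JB values for the lymph node dataset at different image scales (columns: Scale, J, B, JB (U-Net), JB (CUMedVision), JB (UCU-Net)).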
5.1. Datasets and Complexities
We experiment with five biomedical image datasets of different modalities. The DRIVE dataset (Staal et al., 2004) provides 40 fundus images for retinal vessel segmentation; 20 images are used for training and the other 20 for evaluation. The CHASE_DB1 dataset (Fraz et al., 2012) provides 28 fundus images for retinal vessel segmentation without a specific train-test split. Following (Wu et al., 2018; Mishra et al., 2020), we use 20 images for training and the remaining 8 images for evaluation. The ISIC 2017 skin lesion dataset (Codella et al., 2018) for melanoma segmentation contains 2000 training, 150 validation, and 600 test RGB images. Because the validation set is small, we merge the training and validation sets and randomly select 20% of the merged set for validation, as in (Perez et al., 2019). The lymph node dataset contains ultrasound images of the lymph node areas of 237 patients. Following (Zhang et al., 2019), we use 137 images for training (20% for validation) and the rest for testing, ensuring no patient overlap. Wing disc pouches of fruit flies are used to study organ development (Liang et al., 2018; Mishra et al., 2019). We use 996 grayscale wing disc pouch images: 889 for training (20% for validation) and 107 for testing.
Table 1 shows the JPEG complexity values calculated for these five biomedical image datasets. CHASE_DB1 has the highest JPEG complexity among all the datasets, while the wing disc dataset is the least complex dataset in our experiments. Complexity values for different image scales are also provided. Observe that JPEG complexity decreases with subsampling, indicating a reduction in information content.
As discussed in Section 2.2, for the JB calculation, $\gamma$ is determined by examining the regression fit between $\Delta A / \Delta \log W$ and JB, where the accuracy $A$ is the IU score. The $\gamma$ value resulting in the best regression fit (the highest $R^2$) is used for the JB calculation. For U-Net, CUMedVision, and UCU-Net, the $\gamma$ values thus found are 0.7, 0.775, and 0.95, respectively. Table 2 shows the JB values for the lymph node dataset. Observe that with scaling, the foreground (blob) density $B$ remains relatively constant.
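The γ selection can be sketched as a grid search that keeps the γ whose JB values best explain the per-dataset slopes ΔA/Δlog W by a straight line (highest R²). The JB form γJ + (1 − γ)B, the grid step of 0.025, and the helper names are assumptions; the five-dataset inputs below are synthetic, generated with γ = 0.7, and are not the values from Table 1 or Table 2.

```python
import numpy as np


def r_squared(x: np.ndarray, y: np.ndarray) -> float:
    """Coefficient of determination of the least-squares line y ~ a * x + b."""
    a, b = np.polyfit(x, y, 1)
    residual = np.sum((y - (a * x + b)) ** 2)
    total = np.sum((y - y.mean()) ** 2)
    return float(1.0 - residual / total)


def select_gamma(j, b, slopes, grid=None) -> float:
    """Pick gamma whose JB = gamma * J + (1 - gamma) * B best fits the slopes linearly."""
    j, b, slopes = (np.asarray(v, dtype=float) for v in (j, b, slopes))
    grid = np.linspace(0.0, 1.0, 41) if grid is None else grid
    return float(max(grid, key=lambda g: r_squared(g * j + (1.0 - g) * b, slopes)))


# Synthetic per-dataset values (illustrative only).
j = np.array([0.10, 0.25, 0.18, 0.40, 0.32])
b = np.array([0.30, 0.05, 0.22, 0.12, 0.08])
slopes = 2.0 * (0.7 * j + 0.3 * b) + 0.1
best = select_gamma(j, b, slopes)
print(best)  # approximately 0.7
```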
5.2. Network Architecture and Setup
Three common networks with an encoder-decoder architecture for biomedical image segmentation (shown in Fig. 5) are used in our experiments. U-Net (Ronneberger et al., 2015) and CUMedVision (Chen et al., 2016) are shown in Fig. 5(a) and Fig. 5(b), respectively. Fig. 5(c) shows the UCU-Net (Mishra et al., 2020) architecture, whose encoder is similar to that of U-Net. For the decoder, UCU-Net combines the U-Net and CUMedVision decoders to generate an architecture with superior contextual information flow (Lin et al., 2020, 2017; Mishra et al., 2021).
The experiments use the PyTorch framework with He initialization (He et al., 2015). To limit overfitting on small training sets, data augmentation is performed using random flipping and rotation. Training uses the Adam optimizer (Kingma and Ba, 2017) with a fixed learning rate of 0.00002 and a cross-entropy based loss function. Experiments are performed on NVIDIA TITAN and Tesla P100 GPUs for a dataset-dependent number of epochs (CHASE_DB1: 5000, DRIVE: 5000, Melanoma: 3000, lymph node: 5000, wing disc: 3000). The images are resized (CHASE_DB1: 976×976 (Mishra et al., 2020), DRIVE: 512×512 (Mishra et al., 2020), Melanoma: 320×320 (Li and Shen, 2018), lymph node: 224×224, wing disc: 320×320), and training uses 128×128 patches. The batch size in each case is the largest that fits in GPU memory.
5.3. Degree of Degradation Calculation
As explained in Section 3, to calculate the degree of degradation, we systematically compress a given network architecture and track the accuracy degradation caused by the compression. We then map the dataset complexity ($C$) to the accuracy degradation caused by compression ($\Delta A / \Delta \log W$) to determine the degree of degradation (i.e., $\lambda$ and $\beta$).
To keep the filter (channel) counts integral and the calculations simple, a small set of $\alpha$ values is used for network compression. The compressed network generated by applying $\alpha$ uniformly across all the convolutional layers is trained, and its accuracy is recorded. In Fig. 6(a), the drop in F1 score is plotted against the logarithm of the number of trainable weights of the U-Net architecture. Each data point corresponds to the relative F1 score (i.e., the F1 score of the compressed network relative to that of the uncompressed network) for a specific number of network weights (i.e., a specific $\alpha$). We repeat this procedure for all five datasets. For each dataset, the slope of the linear trend line that best fits its data points (i.e., slope $= \Delta A / \Delta \log W$) is calculated. In Fig. 6(b), the calculated slope for each dataset is plotted against the complexity of that dataset. The straight line best fitting these points represents the degree of degradation for the network architecture, as the equation of the regressed line is $\Delta A / \Delta \log W = \lambda C + \beta$, where $\lambda$ and $\beta$ are the slope and the y-intercept of the regressed line. Similar calculations for the IU score are shown in Fig. 6(c) (drop in IU score) and Fig. 6(d) (degree of degradation for IU). Experiments on CUMedVision and UCU-Net are shown in Fig. 7 and Fig. 8, respectively. For all three examined networks, the calculated $\lambda$ and $\beta$ values associated with the F1 and IU scores are tabulated in Table 3.
Table 4. Accuracy metrics of U-Net for different $\alpha$ settings (column groups: CHASE_DB1 (Fraz et al., 2012), DRIVE (Staal et al., 2004)).
Accuracies obtained on the CHASE_DB1 and DRIVE datasets for different $\alpha$ settings on the U-Net architecture are reported in Table 4. Observe that across the $\alpha$ settings, specificity (Spe) and accuracy (Acc) do not change. This behavior can be attributed to the highly imbalanced nature of these datasets: the large number of background pixels dominates the small number of foreground pixels, so foreground errors are barely reflected as a drop in Acc. However, sensitivity (Sen) decreases significantly with network compression, as also shown in Fig. 9 and Fig. 10. The background segmentation quality does not degrade significantly, resulting in high specificity, while the foreground segmentation quality degrades significantly, resulting in poor sensitivity for these two datasets. Since both the F1 and IU metrics take both background and foreground into consideration (as explained in Section 3), the drop in accuracy with compression is correctly captured by these two metrics.
Accuracies obtained on the melanoma dataset for different $\alpha$ settings on the U-Net architecture are reported in Table 5. Similar results are obtained for the lymph node and wing disc datasets. Example cases showing qualitative results for different $\alpha$ settings are shown in Fig. 11.
5.4. Design Constraints Consideration
Using the design constraints as explained in Section 4, we formulate two test cases to evaluate the effectiveness of our proposed framework.
Test case 1 (memory-constrained best possible accuracy). We consider a disk space budget of 1 MB for a U-Net architecture on the lymph node dataset. Our objective is to obtain a U-Net type architecture that achieves the best possible accuracy within this 1 MB budget. Following the framework provided in Section 4.1, we examine both uniform and layer-wise multipliers. The uniform multiplier ($\alpha$) is determined using Eq. (6), while the layer-wise multipliers are determined using Eq. (11) and Eq. (13) (modified for the U-Net architecture). The complexity ($C$), $\lambda$, and $\beta$ values are taken from Table 1 and Table 3. The layer-wise U-Net filter arrangements for both the uniform and layer-wise multiplier cases are shown in Table 6, and the results of both cases are given in Table 7. For the same disk space budget, the layer-wise multiplier achieves a better F1 score than the uniform multiplier based approach, and likewise a better IU score. Observe that, compared to F1, IU shows relatively lower degradation with compression. This can be attributed to the degree of degradation associated with each accuracy metric: as shown in Table 3, the $\lambda$ (slope) associated with F1 is higher than that associated with the IU score.
|Test case 1||Test case 2|
|Test case 1: Objective – higher F1||Uniform multiplier||0.7739||0.8157||5.089|
Test case 2 (accuracy-guided least memory usage). We consider an example constraint of for a U-Net architecture on the lymph node dataset. Our objective is to obtain a U-Net architecture with the least disk-space usage whose accuracy does not drop below 95% of that of the full-sized network. Following the framework provided in Section 4.2, we examine both uniform and layer-wise multipliers. The uniform multiplier is determined using Eq. (16), while the layer-wise multipliers are determined using Eq. (17) (modified for the specific architecture). Complexity (C), and values are used as shown in Table 1 and Table 3. The layer-wise filter arrangements for both the uniform and layer-wise multiplier cases are shown in Table 6, and the results of both cases are shown in Table 8. For the same accuracy threshold, the layer-wise multiplier significantly outperforms the uniform multiplier by compressing the network further. As in test case 1, the relatively lower degradation in IU score can be attributed to its lower λ value.
|Test case 2: Objective – lower||Uniform multiplier||0.8278||0.8641||6.834|
The degree of degradation for the F1 score (as shown in Table 3) reveals that smaller networks can be pruned less than larger networks, since accuracy degrades quickly for smaller networks (e.g., CUMedVision). However, it is interesting to note that, for U-Net, the IU score degrades at a rate similar to that of the smaller CUMedVision architecture. We believe this is caused by the decoder structure of U-Net, in which scale-wise information is not fused to generate the output. This suggests that a scale-wise decoder (as in CUMedVision) is more efficient than the decoder arrangement of U-Net.
We perform random pruning of the trainable weights in each layer of a U-Net trained on the lymph node dataset; the results are shown in Fig. 12(a). Pruning 30% of the trainable weights in the initial layers of the network (e.g., L1) causes a significant accuracy reduction. In comparison, the deeper layers are more robust: pruning 30% of their weights (a significantly larger number of weights than for the initial layers) does not adversely affect accuracy. Fig. 12(b) highlights the weight reduction achieved by our proposed framework for both test cases. Our approach also penalizes deeper layers more, which is intuitively consistent with the observation that the initial layers of the network are more vital than the deeper layers.
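A minimal NumPy sketch of random pruning for a single layer follows, assuming unstructured pruning (individual weights are zeroed at random, with no fine-tuning). The kernel shapes are hypothetical stand-ins for a first-level and a deeper U-Net convolution; they illustrate the point above that 30% of a deep layer is a far larger number of weights than 30% of L1. Evaluating the pruned network on the lymph node data, as in Fig. 12(a), is outside the sketch.

```python
import numpy as np


def random_prune(weights: np.ndarray, fraction: float, rng: np.random.Generator) -> np.ndarray:
    """Copy of `weights` with a randomly chosen `fraction` of entries set to zero."""
    pruned = weights.copy()
    flat = pruned.reshape(-1)  # view into the copy, so edits land in `pruned`
    n_prune = int(round(fraction * flat.size))
    idx = rng.choice(flat.size, size=n_prune, replace=False)  # distinct positions
    flat[idx] = 0.0
    return pruned


rng = np.random.default_rng(0)
# Hypothetical kernels shaped like U-Net layers: (out_channels, in_channels, kH, kW).
layers = {
    "L1 (first conv)": rng.standard_normal((64, 1, 3, 3)),
    "level-4 conv": rng.standard_normal((512, 512, 3, 3)),
}
for name, w in layers.items():
    p = random_prune(w, 0.30, rng)
    print(f"{name}: {np.count_nonzero(p == 0):,} of {w.size:,} weights pruned")
```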
Additional experiments to verify the efficacy of our proposed framework are provided as follows.
| Method | Multiplier | Predicted F1 | Achieved F1 |
|---|---|---|---|
| Test case 2 (Ours) | Uniform multiplier | 0.8212 | 0.8278 |
| Test case 2 (Ours – ) | Uniform multiplier | 0.8212 | 0.8201 |
| Test case 2 (Ours + (Iandola et al., 2016)) | Uniform multiplier | 0.8212 | 0.8073 |
| Test case 2 (Ours + (Zafrir et al., 2019)) | Uniform multiplier | 0.8212 | 0.8194 |
6.1. Prediction Accuracy
Table 9 highlights the predicted and achieved F1 scores for test case 2. For both the uniform and layer-wise multiplier based compressions, the close agreement between the achieved and predicted F1 scores highlights the efficacy of our framework. Observe that uniform multiplier based compression achieves higher F1 scores than layer-wise multiplier based compression. This is expected, since layer-wise multiplier based compression removes more trainable weights by pruning more filters (which is its objective) while still adhering to the accuracy constraint.
To further verify the precision of our scheme, we intentionally reduce the multiplier value by a small amount (denoted by in Table 9). This reduction prunes one or two convolutional filters beyond the predicted amount in each convolutional layer. Table 9 shows that, with the further reduced multiplier, the network no longer satisfies the accuracy constraint, validating the precision of our approach. For the layer-wise multiplier case, the accuracy drop is larger than for the uniform multiplier case. We believe this is because the layer-wise multiplier network contains fewer redundant or ineffective convolutional filters (and fewer trainable weights), so removing additional filters hurts more. Similar behavior is observed when the network is further compressed with SqueezeNet-type compression (Iandola et al., 2016) or dynamic quantization (Zafrir et al., 2019).
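The predicted F1 of 0.8212 in Table 9 appears to equal 95% of the full-sized U-Net's F1 of 0.8644 (Table 10); that is, the accuracy constraint seems to be relative to the full network. Under that reading, the Python sketch below recomputes the threshold and checks which of the uniform-multiplier rows in Table 9 satisfy it; only the row without further compression does.

```python
BASE_F1 = 0.8644        # full-sized U-Net F1, Table 10
THRESHOLD_RATIO = 0.95  # test case 2: keep at least 95% of the full network's F1

# Achieved F1 scores of the uniform-multiplier rows in Table 9.
achieved = {
    "Ours": 0.8278,
    "Ours -": 0.8201,
    "Ours + (Iandola et al., 2016)": 0.8073,
    "Ours + (Zafrir et al., 2019)": 0.8194,
}

# 0.95 * 0.8644 = 0.82118, which rounds to the predicted F1 listed in Table 9.
target = round(THRESHOLD_RATIO * BASE_F1, 4)
for name, f1 in achieved.items():
    status = "meets" if f1 >= target else "violates"
    print(f"{name:32s} F1={f1:.4f}  {status} the constraint (target {target:.4f})")
```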
We apply SqueezeNet-type compression (Iandola et al., 2016) to the U-Net architecture. Our experiments show that this arrangement degrades accuracy. We think the method in (Iandola et al., 2016) may not be well suited to biomedical image segmentation, since squeeze-type architectures do not extract robust dense features well. Using Taylor pruning (Molchanov et al., 2017), we prune a few filters and fine-tune the network; after several iterations of pruning and fine-tuning, accuracy degrades significantly, as shown in Table 10. Results obtained using dynamic quantization (Zafrir et al., 2019) are also highlighted in Table 10. Experiments are also performed on U-Net (Ronneberger et al., 2015) using a uniform multiplier inspired by (Mishra et al., 2019). With an , the method in (Mishra et al., 2019) achieves an F1 score of 0.8561; however, it requires more trainable weights than our proposed method. In (Molchanov et al., 2019), channel pruning was explored by estimating the contribution of each filter to the final loss and iteratively pruning filters with smaller scores. Applying (Molchanov et al., 2019) to the best U-Net (Ronneberger et al., 2015) model yields an F1 score of 0.8383. Neuron merging was explored in (Kim et al., 2020) to compensate for the information loss caused by filter pruning. As shown in Table 10, neuron merging can improve the segmentation accuracy of the compressed networks generated by our proposed method. Compared to the other techniques, our proposed framework generates a better compressed network while adhering to the accuracy constraint of test case 2. Using a pruning factor of 10% with the -norm as the pruning criterion, neuron merging applied to the compressed model generated by our layer-wise multiplier attains an improved F1 score of 0.8354.
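Table 10 lists the (Molchanov et al., 2017) baseline as Taylor pruning. As a sketch of that criterion (not the authors' experimental code), the NumPy function below scores each feature map by the absolute mean of activation times gradient, averaged over a mini-batch and L2-normalized within the layer, and returns the lowest-scoring filters as pruning candidates. The activation and gradient arrays are assumed to come from a training framework; here they are random placeholders, and the prune-then-fine-tune loop is omitted.

```python
import numpy as np


def taylor_importance(activations: np.ndarray, grads: np.ndarray) -> np.ndarray:
    """First-order Taylor importance of each feature map.

    activations, grads: arrays of shape (N, C, H, W) with a layer's output
    feature maps and the gradient of the loss with respect to those maps.
    Returns one score per channel, L2-normalized within the layer.
    """
    # |mean over spatial positions of activation * gradient|, per example
    per_example = np.abs((activations * grads).mean(axis=(2, 3)))  # (N, C)
    scores = per_example.mean(axis=0)  # average over the mini-batch -> (C,)
    return scores / np.linalg.norm(scores)


def least_important(scores: np.ndarray, n: int) -> np.ndarray:
    """Indices of the n filters with the smallest importance scores."""
    return np.argsort(scores)[:n]


rng = np.random.default_rng(0)
acts = rng.standard_normal((4, 8, 16, 16))   # placeholder activations
grads = rng.standard_normal((4, 8, 16, 16))  # placeholder gradients
acts[:, 3] = 0.0  # a "dead" filter whose output cannot change the loss
scores = taylor_importance(acts, grads)
print("prune first:", least_important(scores, 2))
```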
|Base (Ronneberger et al., 2015)||0.8644||7.492|
|Test case 2||Uniform multiplier||0.8278||6.834|
|Pre-training compression||SqueezeNet (Iandola et al., 2016)||0.8267||7.049|
|Post-training compression||Taylor Pruning (Molchanov et al., 2017)||0.8205||7.491|
|Dynamic Quantization (Zafrir et al., 2019)||0.8249||7.492|
|CC-Net (Mishra et al., 2019)||0.8561||6.889|
|Importance Estimation Pruning (Molchanov et al., 2019)||0.8383||7.492|
|Test case 2 + Neuron Merging (Kim et al., 2020)||Uniform multiplier||0.8405||6.834|
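Q8BERT (Zafrir et al., 2019) uses symmetric linear 8-bit quantization, in which a tensor x is scaled by S_x = M / max|x| with M = 127, rounded, and clamped to [-M, M]. In dynamic quantization, weights are quantized once and the scale of each input is computed at inference time. The NumPy sketch below applies this to a hypothetical fully connected layer for brevity; a convolution would apply the same scaling to its unfolded input. It is illustrative only and not the configuration behind the Table 10 result.

```python
import numpy as np

M = 2 ** (8 - 1) - 1  # 127, largest magnitude used for signed 8-bit values


def quantize(x: np.ndarray):
    """Symmetric linear quantization to int8: returns (values, scale S_x)."""
    scale = M / np.max(np.abs(x))
    q = np.clip(np.rint(x * scale), -M, M).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale) -> np.ndarray:
    """Map int8 values back to floating point."""
    return q.astype(np.float64) / scale


def dynamic_int8_linear(x: np.ndarray, w_q: np.ndarray, w_scale) -> np.ndarray:
    """y = x @ w.T with int8 weights quantized once offline and the input
    quantized on the fly (its scale is recomputed for every call)."""
    x_q, x_scale = quantize(x)
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32).T  # integer accumulation
    return acc / (x_scale * w_scale)  # rescale back to floating point


rng = np.random.default_rng(0)
w = rng.standard_normal((4, 16))  # hypothetical layer weights
w_q, w_scale = quantize(w)        # done once, before deployment
x = rng.standard_normal((2, 16))  # an input seen at inference time
print("max abs error:", np.max(np.abs(dynamic_int8_linear(x, w_q, w_scale) - x @ w.T)))
```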
6.3. Overall Gain
The overall reduction (R = ) in trainable weights and evaluation latency for all five datasets at a 95% accuracy threshold, approximated using the uniform multiplier (as shown in Fig. 6, Fig. 7, and Fig. 8), is plotted in Fig. 13(a) and Fig. 13(b), respectively. Higher complexity results in less compression, indicating that more trainable weights are required to extract features. Our framework achieves the best weight reduction ( on U-Net) and evaluation latency reduction ( on UCU-Net) on the wing disc dataset. The least weight reduction ( on CUMedVision) and evaluation latency reduction ( on CUMedVision) are obtained on the most complex dataset, CHASE_DB1.
6.4. Bottleneck Consideration
The one-time determination of and (i.e., the degree of degradation) for each CNN architecture is the bottleneck of our approach. Yet, once the degree of degradation is determined, significant reductions in training and evaluation time can be achieved for any dataset trained and evaluated on the compressed network. We propose that this one-time degree-of-degradation calculation can be performed with an acceptable level of accuracy using two datasets with three values (). We verify this by performing experiments on the CUMedVision network using the DRIVE and CHASE_DB1 datasets and tracking the F1 score degradation with compression. The results are shown in Fig. 14. Using fewer data points to determine the degree of degradation causes only a 2.4% change in the value (new = 0.525, new = 0.0116).
In this paper, we presented a new image complexity-guided network compression approach for deep learning based biomedical image segmentation. Instead of the usual practice of compressing CNN architectures after training, we focus on pre-training network compression that exploits the image complexity of the training data. Using the network's degree of degradation information, we showed that our approach quickly predicts the compressed network's accuracy without training and is effective in generating compressed networks. Our scheme accommodates practical design constraints for compressing CNNs for biomedical image segmentation by proposing fine-grained layer-wise multipliers. Such fine-grained control achieves better compression and better accuracy than uniform multiplier based compression techniques. Using five biomedical image segmentation datasets, we verified that our framework generates compressed networks that retain up to of the full-sized network's segmentation accuracy while using significantly fewer trainable weights (in the range of to less).
This work was supported in part by the National Science Foundation under Grants CNS-1629914, CCF-1640081, and CCF-1617735, and by the Nanoelectronics Research Corporation, a wholly-owned subsidiary of the Semiconductor Research Corporation, through Extremely Energy Efficient Collective Electronics, an SRC-NRI Nanoelectronics Research Initiative under Research Task ID 2698.004 and 2698.005.
- SURF: speeded up robust features. In Computer Vision – ECCV 2006, A. Leonardis, H. Bischof, and A. Pinz (Eds.), Berlin, Heidelberg, pp. 404–417.
- Deep contextual networks for neuronal structure segmentation. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16, Phoenix, Arizona, pp. 1167–1173.
- Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging (isbi), hosted by the international skin imaging collaboration (ISIC). In 15th IEEE International Symposium on Biomedical Imaging, ISBI 2018, Washington, DC, USA, April 4-7, 2018, USA, pp. 168–172.
- An ensemble classification-based approach applied to retinal blood vessel segmentation. IEEE Transactions on Biomedical Engineering 59 (9), pp. 2538–2548.
- MorphNet: fast & simple resource-constrained structure learning of deep networks. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, USA, pp. 1586–1595.
- Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding.
- Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), ICCV ’15, USA.
- MobileNets: efficient convolutional neural networks for mobile vision applications. Vol. abs/1704.04861.
- SqueezeNet: alexnet-level accuracy with 50x fewer parameters and <1mb model size. Vol. abs/1602.07360.
- Neuron merging: compensating for pruned neurons.
- Adam: a method for stochastic optimization.
- Skin lesion analysis towards melanoma detection using deep learning network. Sensors 18 (2), pp. 556.
- A new registration approach for dynamic analysis of calcium signals in organs. In 15th IEEE International Symposium on Biomedical Imaging, ISBI 2018, Washington, DC, USA, April 4-7, 2018, USA, pp. 934–937.
- Feature pyramid networks for object detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, USA, pp. 936–944.
- Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 42 (2), pp. 318–327.
- Object recognition from local scale-invariant features. In Proceedings of the International Conference on Computer Vision, Kerkyra, Corfu, Greece, September 20-25, 1999, Greece, pp. 1150–1157.
- Objective-dependent uncertainty driven retinal vessel segmentation. pp. 453–457.
- A data-aware deep supervised method for retinal vessel segmentation. USA, pp. 1254–1257.
- CC-NET: image complexity guided network compression for biomedical image segmentation. In 16th IEEE International Symposium on Biomedical Imaging, ISBI 2019, Venice, Italy, April 8-11, 2019, Italy, pp. 57–60.
- Importance estimation for neural network pruning. pp. 11256–11264.
- Pruning convolutional neural networks for resource efficient inference. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, France.
- Dense dilated network with probability regularized walk for vessel detection. IEEE Trans. Medical Imaging 39 (5), pp. 1392–1403.
- Solo or ensemble? choosing a CNN architecture for melanoma classification. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2019, Long Beach, CA, USA, June 16-20, 2019, USA, pp. 2775–2783.
- On the expressive power of deep neural networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, Proceedings of Machine Learning Research, Vol. 70, Australia, pp. 2847–2854.
- U-net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi (Eds.), Cham, pp. 234–241.
- Measuring visual clutter. Journal of Vision 7 (2), pp. 17.
- Ridge-based vessel segmentation in color images of the retina. IEEE Trans. Medical Imaging 23 (4), pp. 501–509.
- Cascaded fully convolutional networks for automatic prenatal ultrasound image segmentation. In 14th IEEE International Symposium on Biomedical Imaging, ISBI 2017, Melbourne, Australia, April 18-21, 2017, Australia, pp. 663–666.
- Multiscale network followed network model for retinal vessel segmentation. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, A. F. Frangi, J. A. Schnabel, C. Davatzikos, C. Alberola-López, and G. Fichtinger (Eds.), Cham, pp. 119–126.
- NFN+: A novel network followed network for retinal vessel segmentation. Neural Networks 126, pp. 153–162.
- Image complexity and spatial information. In Fourth International Workshop on Quality of Multimedia Experience, QoMEX 2012, Melbourne, Australia, July 5-7, 2012, I. S. Burnett (Ed.), Australia, pp. 12–17.
- Improving dermoscopic image segmentation with enhanced convolutional-deconvolutional networks. IEEE J. Biomed. Health Informatics 23 (2), pp. 519–526.
- Q8BERT: quantized 8bit BERT. Vol. abs/1910.06188.
- Deep supervision with additional labels for retinal vessel segmentation task. Cham, pp. 83–91.
- Decompose-and-integrate learning for multi-class segmentation in medical images. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, D. Shen, T. Liu, T. M. Peters, L. H. Staib, C. Essert, S. Zhou, P. Yap, and A. Khan (Eds.), Cham, pp. 641–650.
- Coarse-to-fine stacked fully convolutional nets for lymph node segmentation in ultrasound images. In IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2016, Shenzhen, China, December 15-18, 2016, China, pp. 443–448.
- Improving neural network quantization without retraining using outlier channel splitting. Vol. abs/1901.09504.
- Incremental network quantization: towards lossless cnns with low-precision weights. Vol. abs/1702.03044.
- UNet++: A nested u-net architecture for medical image segmentation. Spain, pp. 3–11.