FilterNet: A Neighborhood Relationship Enhanced Fully Convolutional Network for Calf Muscle Compartment Segmentation

06/21/2020 ∙ by Zhihui Guo, et al. ∙ The University of Iowa 5

Automated segmentation of individual calf muscle compartments from 3D magnetic resonance (MR) images is essential for developing quantitative biomarkers for muscular disease progression and its prediction. Achieving clinically acceptable results is a challenging task due to large variations in muscle shape and MR appearance. Although deep convolutional neural networks (DCNNs) achieved improved accuracy in various image segmentation tasks, certain problems such as utilizing long-range information and incorporating high-level constraints remain unsolved. We present a novel fully convolutional network (FCN), called FilterNet, that utilizes contextual information in a large neighborhood and embeds edge-aware constraints for individual calf muscle compartment segmentations. An encoder-decoder architecture with flexible backbone blocks is used to systematically enlarge convolution receptive field and preserve information at all resolutions. Edge positions derived from the FCN output muscle probability maps are explicitly regularized using kernel-based edge detection in an end-to-end optimization framework. Our FilterNet was evaluated on 40 T1-weighted MR images of 10 healthy and 30 diseased subjects by 4-fold cross-validation. Mean DICE coefficients of 88.00 were achieved for the five 3D muscle compartments.



There are no comments yet.


page 1

page 2

page 4

page 5

page 7

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Calf muscle is a skeletal muscle group in the lower leg between the knee joint and the ankle, primarily supporting weight-bearing activities such as walking, running, and jumping. Anatomically, the group can be divided into five individual muscle compartments: Tibialis Anterior (TA), Tibialis Posterior (TP), Soleus (Sol), Gastrocnemius (Gas), and Peroneus Longus (PL) [38], as shown in Fig. 1(a). Volumetric and structural changes of the muscle are important for evaluating muscular disease severity and progression. For example, myotonic dystrophy type 1 (DM1), the most common form of inherited muscular dystrophy in adults, causes severe fatty degeneration of calf muscle in most of the patients [35]. Magnetic resonance (MR) imaging has been widely used in the clinic for muscular disease diagnosis and follow-up evaluation due to its high sensitivity to dystrophic changes [35, 31]. Changes in MR images also correlate with clinical outcome measures potentially serving as imaging biomarkers for clinical research [36]. Current analysis approaches invariably include hand-tracing of individual compartments that is time consuming and less than ideal for clinical trials. Automated segmentation of calf muscle is therefore essential for developing quantitative biomarkers of muscular disease progression and can contribute to its prediction.

A plethora of methods has been developed to separate calf muscle region, subcutaneous adipose tissue (SAT), and intermuscular adipose tissue (IMAT) from the lower leg tissue region, based on which a muscle fat percentage can be obtained as a measurement of fatty degeneration/infiltration that has been shown to have correlation with disease progression [37]. For example, Valentinitsch et al. [33]

applied multi-stage K-means clustering

[21] to segment calf muscle, SAT and IMAT. Amer et al. [3]

first used a fully convolutional network (FCN) to segment the whole muscle mask and then classified healthy muscle and IMAT from the segmented mask by deep convolutional auto-encoder. These whole muscle segmentation approaches rely on the intra-object homogeneity to separate muscle and adipose tissues. Afterward, an overall muscle fat percentage can be obtained. However, specific muscle compartments may be more affected in different neuromuscular diseases (e.g., the posterior compartment shows initial changes in this population

[16]). Assessing the changes in individual muscle compartments may be more sensitive than measuring change in the whole calf. In order to study the disease progression in individual muscle compartments [13, 2, 8], segmentation of individual muscle compartments is critical.

Fig. 1: Examples of T1-weighted MR images of calf muscle cross sections. Each panel shows the original image and the corresponding expert segmentations for TA, TP, Sol, Gas and PL. (a–c) Normal subjects. (d–e) Patients with severe DM1.

Different from the entire muscle-region segmentation, automated segmentation of individual muscle compartments is a challenging task due to the unique characteristics of MR muscle images as shown in Fig. 1. All non-diseased muscle compartments have similar appearances while the MR bias field exists across the whole image region. Besides, muscular dystrophy can introduce substantial appearance changes to a part or the whole compartment (Fig. 1d-e). Therefore, identifying individual compartments using only local characteristics is unrealistic. On the other hand, shape model based approaches are unsuitable due to large shape variations and deformations caused by the disease as well as by patient’s positioning in the scanner.

Attempts to segment individual muscle compartments on MR images are rare. Essafi et al. [11] used a landmark-based approach with diffusion wavelets to represent shape variations for 3D segmentation of TA and PL and achieved a mean DICE coefficient of 0.55. Wang et al. [34] encoded shape prior by a point distribution model in a higher-order Markov Random Field framework to segment the medial Gas compartment on the same dataset used in [11] and obtained an averaged landmark error around 7 mm. Commean et al. [8] presented a semi-automated method to segment five individual muscle compartments by thresholding and edge detection to study MR imaging measurement reproducibility. Troter et al. [19] used a multi-atlas registration approach for individual muscle segmentation in quadriceps femoris and achieved an averaged DICE of 0.87 0.11 on an MRI dataset of healthy young adults. Rodrigues et al. [26] adopted a two-stage mechanism to segment individual muscle compartments by first identifying all muscle voxels using Adaboost classifier and then registering the muscle mask to a reference atlas for muscle compartment labeling. However, the individual muscle compartment segmentation results were not visually promising and the accuracies were not reported.

Compared with traditional techniques, FCN has a large model capacity to learn complex representations and enables pixel-to-pixel training, which makes it suitable for this application. To overcome difficulties mentioned above, an FCN with strengthened neighborhood relationship is desired. Many methods have been reported that imposed high-level neighborhood-aware or edge-aware relationships to either refine FCN outputs or directly change FCN internal architectures. Bauer et al. [4] adopted a Conditional Random Field strategy to regularize classification results for brain tumor segmentation. Similarly, Guo et al. [14] applied a topology-wise graph to refine FCN output for pancreatic tumor segmentation. Both Chen et al. [6] and Shen et al. [28] used a multi-task FCN to predict object region and edge maps simultaneously for histological object segmentation and brain tumor segmentation respectively, with each task regularized by a cross entropy loss term. Recently, Kampffmeyer et al. [17] proposed a single-task FCN to predict pixel level connectivity maps based on -neighborhood (=4, 8) relationships, and reverted the prediction to segmentation masks using pixel-pair connectivity agreement.

In this paper, we propose a novel neighborhood relationship-aware FCN, called FilterNet, for 3D segmentation of all five calf muscle compartments. We enhance neighborhood relationships in two ways: efficiently enlarge convolution receptive field and explicitly derive object boundaries directly from object prediction maps in an end-to-end training optimization framework. Specifically, by enlarging the convolution receptive field, information in a larger neighborhood is taken into consideration when generating the prediction for each central voxel, increasing the model robustness. Additionally, we use kernel-based edge detectors on the prediction maps to regularize the voxel-level probability dissimilarity inside a neighborhood region defined by the kernel size. Motivations behind such kernel edge detector-based constraints are 3-fold. First, the sizes of medical datasets are often small, which makes it not favorable to learn edge regularization from scratch. Edge detectors assess pre-defined neighbor relations (often using derivative formulas) and can be regarded as high-level initialization of the edge regularization module. Second, kernel edge detectors played an important role in medical image segmentation approaches over the past several decades, and they are compatible with CNN end-to-end training. Third, this mechanism provides flexibility to further fine-tune hyper-parameters of the kernel, by making the parameters of interest trainable.

Compared with previously reported approaches, our work has several contributions: a) we report a fully automated approach for 3D segmentation of five calf muscle compartments simultaneously; b) by considering the similar textures shared by individual muscles, we are able to efficiently enlarge the receptive field and impose edge constrains in an end-to-end training manner; c) our method is robust in MR images from both healthy subjects and patients with DM1.

To the best of our knowledge, this is the first automated approach for five calf muscle compartments segmentation. Methodologically, our work is the first attempt to regularize neighborhood relationships in the form of kernel-based edge detection on prediction maps that allows direct back-propagation.

Ii Methods

Ii-a Pre-processing

In the pre-processing step, a bias field correction method described in [32]

is first applied to reduce image intensity inhomogeneity by estimating bias fields as Gaussian distributions and maximizing the high-frequency content of the estimated unbiased image. Each image is further normalized to zero mean and unit variance to reduce inter-subject variations.

Afterwards, to remove large portions of background and reduce model complexity, we utilize Otsu’s thresholding [24] and K-means clustering (k=2) to localize and separate left and right leg-areas. All right legs are mirrored to conform to left legs.

The workflow and dimensional change of images are shown in Fig. 2. Note that the pre-processing step is completely unsupervised.

Fig. 2: Workflow of the pre-processing step. After Bias field correction, Otsu’s thresholding, and K-means clustering, each leg-area is localized on the original MR images. The leg-areas are then extracted and resized to get a uniform dimensionality. Right leg is mirrored to left.

Ii-B FilterNet

A typical FCN often consists of an encoder and a decoder. In the encoding phase, multiple levels of features are extracted from the raw input by down-sampling operations to obtain deep and compact representations. The deep representations are then up-sampled to the full resolution in the decoding phase. During the down-sampling and up-sampling steps, a significant problem is the loss of resolution and the associated loss of fine details. UNet [27] and RetinaNet [20] include long skip connections to concatenate encoding features to decoding features at the same feature scale to restore information lost during down-sampling. FC-ResNet [9] further uses residual blocks with identity mapping introduced by ResNet [15] to allow direct gradient back-propagation to earlier layers.

Considering the numerous successes that UNet based neural networks achieved in medical image segmentation problems [14, 3], we based our FilterNet on the UNet architecture. The network details of both the UNet we used and our FilterNet are shown in Fig. 3. There are mainly two differences between the two networks. First, in order to preserve fine details and enlarge receptive field during the encoding phase, block B is used in the FilterNet. The unique block B in the FilterNet utilizes short skip connections to enable gradient identity mapping and therefore preserve fine details. It also provides a portal to increase convolution kernel size that allows the network to take account of a broadened view and utilize contextual information from a large neighborhood. However, a large kernel size significantly increases the number of parameters. To avoid this, FilterNet reduces feature channels in block B at down-sampled scales by a factor of 2 compared with block A used in UNet. Second, the FilterNet further employs an edge gate to extract localized neighborhood relationships in the form of edge detection that allows gradient back-propagation. Different from other works that predict object edges as an extra task to be fused with predictions of object regions [6, 28], FilterNet derives object edge information directly from the object region probability maps through the edge gate and regularizes the derived edge relationship using true edges. In this way, the neighborhood relationship is directly encoded into the region-based probability maps within a single-task FCN, without introducing a number of extra training parameters. In addition, FilterNet implements edge constraints in an end-to-end training manner, it avoids designing and fine-tuning an additional framework for post-processing purpose as [4] and [14]

did. Details of block A, block B, edge gate, and loss function are described as follows.

Ii-B1 Backbone blocks

Both blocks A and B work as backbone blocks for the FilterNet, as shown in Fig. 3(c). Block A consists of double layers of convolution () with kernel size


, stride

), batch normalization (BN), and rectified active function (ReLU). Block B adds one more

layer with undecided kernel size at the beginning, and a short skip connection of addition to achieve identity mapping. During the encoding phase, the kernel sizes in block B are set as 7, 5, and 3 for three scales of features to enlarge convolution receptive field. As a result [10], FiterNet increases receptive filed size by 22 voxels along each dimension. Note that due to padding and size restriction, the actual increased receptive field size along axis is less than 22 voxels.

Fig. 3: UNet (a) vs. FilterNet (b) for multi-class calf muscle segmentation. The two networks have the same input and output. [C, W, H, D]: channel, width, height, depth. “BG” represents background. (c) Details of block A, block B, and edge gate in FilterNet. An example is shown along with edge gate flow chart.

Ii-B2 Edge gate

Instead of stopping at FCN output of object pixel-wise probability maps as UNet does, an extra step is used in the FilterNet for the edge gate to directly and dynamically derive the true and predicted edge information, respectively, from the ground truth map and the output probability map and to impose constraints on the predicted edges. For the edge gate to support end-to-end training, kernel-based edge detections are used as convolutions with designed kernels inside the network. In this study, Laplacian of Gaussian (LoG) [30] is used. Suppose an input image patch is denoted as , the function of LoG is defined as


where and represent the Laplacian kernel and the Gaussian smoothing kernel and is the convolution operation. Since finds double edges, while the boundary of muscle compartments in this application is not sufficiently clear to define inside and outside edges, we add an active function to remove the negative edges, and obtain a new function

. To increase the flexibility of our edge gate and reduce the dependency on handcrafted parameters, the standard deviation

of the Gaussian smoothing function is made trainable. Thus, under the condition of a , the edge gate respond function can be written as


where is a variant of the non-linear hard tanh function that restricts the input value into range such that

Fig. 4: Cross-sectional segmentation examples overlaid with MR images. (a) Original scan. (b) Ground truth. (c) UNet. (d) Boundary-Aware FCN. (e) FilterNet. Each row represents a 2D cross-sectional example from a different image.

The MR images used in this study have in-plane (on - plane) resolution of 0.7 mm and slice thickness (along direction) of 7 mm. Therefore, convolution kernels and are defined as below and applied to - slices.


where is a constant adjusting factor under a given that ensures . Element in is defined as,


In Eq. 6, .

The gradient back-propagation keeps the fixed kernel unchanged and updates kernel with trainable . For the edge gate, an explicit neighborhood relationship is derived from the probability maps.

Ii-B3 Loss function

For an input image , the FilterNet predicted output map can be denoted as,


where is the encoder that uses block B’s and is the decoder that consists of block A’s.

The loss function of FilterNet consists of two terms, for region learning amd for edge constraining, balanced by a weighting factor such that


where is a multi-class cross-entropy loss defined as


where ,

is the one-hot encoded region label for class

, and is the corresponding predicted map for class . represents the least absolute error (L1-norm) loss between the true edge maps and the derived edge maps. is a weighting array.


During gradient back-propagation, the partial derivative of loss in terms of is,


According to the derivative chain rule,


where can be obtained from the differentiation of the Gaussian smoothing layer, and is calculated according to the definition of in Eq. 6.

The final multi-class classification output can be defined as the indices of the maximum values on probability maps along the channel dimension.


is optimized by stochastic gradient descent

[5]. The initial learning rate is

, which is divided by 5 every 10 epochs. In order to increase the robustness and generalization of the network, the input training patches are sub-regions sized

, cropped from the localized leg-areas (described in Fig. 2) with a step size of 20 voxels along x and y directions, which results in 9 times as large the number of the training patches as that of the leg-areas. The batch size is set as 2. We train the network with 30 epochs. Data augmentation is performed, where 3 more patches are generated for each training patch. Namely, a rotation value is randomly chosen between to , two scaling factors are randomly chosen between 0.8 and 1.2 along x and y directions. The initial value of is 0.001, multiplied by 10 every 10 epochs. is for the 6 classes of background, TA, TP, Sol, Gas, and PL, respectively. The initial value of is 1.

Iii Experiments and Results

Iii-a Experimental Setting

40 lower leg T1-weighted MR images of 40 subjects (10 were healthy, 30 with DM1) were included in this work. The original image size is and the voxel size is  mm. The acquisition of these images used the first echo of a 3-point Dixon gradient echo sequence with repetition time (TR) 150 ms, echo time (TE) 3.5 ms, field of view (FOV) 36 cm, bandwidth 224 Hz/pixel, and scan time 156 s. Expert-traced muscle compartment segmentations served as the independent standard.

Besides FilterNet, we also designed several other neural networks for performance evaluation and comparison purposes. In Section III-B, based on the differences between UNet and FilterNet, an ablation study was conducted to show the effectiveness of block B and edge gate. Then in Section III-C, thorough performance comparisons were presented among the UNet, a multi-task FCN that aggregates region and edge predictions, and FilterNet to demonstrate the superiority and efficiency of the FilterNet

. All neural networks were implemented using the PyTorch platform

[25] and applied to the same dataset. The training parameters were identical for these methods. The models were trained on Nvidia GeForce GTX 1080 Ti GPU with 8 GB of memory.

Given a limited-size dataset, 4-fold cross-validation (with randomization in subject scale) was used to evaluate the performance of each method. DICE Similarity Coefficient (DSC) and absolute surface-to-surface distance (ASSD, in mm) from the automated surface to the manual surface were used as evaluation metrics. For each subject, the performance was averaged for left and right legs.

Iii-B Ablation Study

Fig. 5: Mean values of DICE and ASSD (in mm) from UNet, BUNet, UNet- and FilterNet.
Fig. 6: 3D demonstration of muscle compartment segmentations from the same leg. Each row represents a method while each column represents a view category. (a) Ground truth. (b) UNet. (c) Boundary-Aware FCN. (d) FilterNet. The first three columns are three orthogonal views of the five muscle compartment segmentations, followed by five columns showing individual segmentations. Note the extra objects generated by UNet and Boundary-Aware FCN for TA, TP, Gas and PL.

There are two differences between FilterNet and UNet, the usage of block B and edge gate. In order to reveal the performance improvement introduced by block B and edge gate respectively, ablation experiments that included UNet, UNet with block B, UNet with edge gate, and FilterNet were conducted. Therefore, in addition to FilterNet, the other three were described as follows.

  • UNet: The details of UNet architecture used in this application are displayed in Fig. 3(a). UNet output is


    where is the encoder consisting of block A’s. The loss function is a multi-class cross-entropy loss the same as in Eq. 9.

  • BUNet: BUNet utilizes block B’s in the encoding path of UNet to enlarge convolution receptive field. BUNet output is the same as FilterNet output in Eq. 7. However, compared with FilterNet, BUNet does not have edge gate. Therefore, in Eq. 8 is set as 0, such that the loss function here is

  • UNet-: Similarly, UNet- and UNet share the same network architecture. The prediction output of UNet- is the same as in Eq. 13. However, different from UNet, edge gate is added onto in UNet-. Thus the loss function of UNet- is the same as in Eq. 8.

Fig. 5 shows mean values of DICE and ASSD for five individual muscle compartments obtained from the aforementioned approaches. Overall, FilterNet achieved the best accuracies in terms of DICE and ASSD among the four methods. Both BUNet and UNet- outperformed UNet. However, UNet- had better surface positioning accuracy than BUNet.

Iii-C Performance Comparison

Methods UNet Boundary-Aware FCN FilterNet
MeanSTD value* MeanSTD value* MeanSTD value*
TA DSC 89.8611.07 0.033 90.5113.22 0.333 91.2910.11 /
ASSD 2.172.00 0.001 1.801.97 0.001 1.040.81 /
TP DSC 87.206.89 0.007 88.116.16 0.043 89.464.19 /
ASSD 2.151.39 0.001 2.101.56 0.002 1.420.66 /
Sol DSC 86.058.94 0.001 87.098.50 0.041 88.007.92 /
ASSD 2.641.87 0.001 2.351.87 0.001 1.660.82 /
Gas DSC 87.8110.19 0.017 88.339.84 0.075 89.507.99 /
ASSD 2.441.99 0.001 2.382.17 0.001 1.601.29 /
PL DSC 86.4512.36 0.001 89.2211.12 0.568 89.4912.44 /
ASSD 2.933.37 0.001 1.641.60 0.007 1.141.02 /

Paired t-test with FilterNet (significance level

TABLE I: DSC and ASSD (meanstd) for five calf muscle compartments from UNet, Boundary-Aware FCN and FilterNet. The unit for ASSD is mm. Statistical significance in bold.

In this section, the performance of FilterNet was thoroughly compared with the performance of UNet and a multi-task FCN, called Boundary-Aware FCN.

  • Boundary-Aware FCN: Boundary-Aware FCN follows the basic idea of the kind of FCN proposed in [28], where the network integrates the predictions for region and edge maps explicitly. The schematic of Boundary-Aware FCN used in this application is shown in Fig. 7. The input patches, encoder and decoders are the same as those in UNet. One decoder attempts to predict region maps while the other decoder learns the corresponding edge maps. Then the predicted region and edge maps are concatenated and fed into several , , and layers (denoted as ) to get the final region prediction . Thus, can be described as,


    where is the concatenation operation, is the predicted region map, and is the predicted edge map.

    The loss function includes three cross-entropy loss terms, region loss , edge loss , and a final combined loss .


    where each is the corresponding label map. Note that our implementation of Boundary-Aware FCN uses the same encoder and decoder as described in our baseline UNet design, which is different from the original description in [28].

Fig. 7: The schematic of Boundary-Aware FCN.

As a result, Table I summarizes DSC and ASSD between the automated segmentations and the independent standard on five calf muscle compartments for the UNet, Boundary-Aware FCN and FilterNet. Compared with UNet, FilterNet generated significantly better results for each compartment in terms of both DICE and ASSD. FilterNet was also shown to have significant differences from Boundary-Aware FCN in DICE for TP and Sol and in ASSD for each muscle compartment.

From top to bottom, Fig. 4 displays four 2D segmentation examples from images of four patients, with each representing a unique situation. The first example shows a normal subject with calf muscle surrounded by a thick SAT layer. In this case, Gas segmented by UNet had leakage into Sol and SAT, while Gas and Sol segmented by Boundary-Aware FCN also leaked into the SAT layer. The second example is from an MR image of a patient with severe DM1, where TA segmentation obtained from UNet spread into TP and PL, as well as holes existing in Sol. TP segmentation from Boundary-Aware FCN had false positives in true PL, and a hole appeared in Gas. The third example shows notable intensity inhomogeneity around the boundary of TP and Sol. Both UNet and Boundary-Aware FCN were sensitive to the inhomogeneity. The last example presents a 2D cross-sectional slice that is near to one end of the lower leg, where the muscle compartment boundaries are complicated and tough to identify. UNet and Boundary-Aware FCN generated strangely shaped Sol and some voxels inside the true TA were misclassified as TP. In contrast to the situations happened to UNet and Boundary-Aware FCN, though the segmentations from FilterNet were not always perfect, FilterNet was able to relief these problems and appeared to be more robust to image inhomogeneity and object shape maintenance on these examples.

Fig. 6 lists the 3D shapes of the muscle segmentations from the same leg of a patient. From the - view, FilterNet generated smoother and topologically superior segmentations. When observing the individual muscle compartment models, UNet and Boundary-Aware FCN showed obvious region leakages of TA, TP, and PL, while FilterNet segmentations were free from such leakages.

Table II compared the number of model parameters, memory usage and averaged training time per epoch. FilterNet has the lowest number of parameters and memory usage, and UNet ran the fastest during training.

# parameters
Training time
UNet 1.97 4.67 11
Boundary-Aware FCN 2.39 6.01 25
FilterNet 1.44 3.76 16
TABLE II: Number of model parameters, memory usage, and averaged training time per epoch for UNet, Boundary-Aware FCN and FilterNet. Best performance in bold.

Iv Discussion

Iv-a Ablation Study

The superiority of BUNet against UNet as shown in Fig. 5 indicates the effectiveness of block B. By using B-blocks in the BUNet, the convolution receptive field was enlarged notably. However, the calculated ASSD index for Gas segmentation by BUNet became worse compared with UNet. Gas has a long and flexible shape on the - plane and also shares similar appearance patterns with nearby objects, when neighborhood was broadened while proper edge-aware regulations were lacking, false positives might have been increased across the whole scope and reflected in worsened surface positioning accuracy.

When edge gate was added, UNet- generated segmentations with apparent improvement in ASSD over UNet. The loss term associated with edge gate in Eq. 10 implies that a) when a pixel is not close to an object boundary, either inside or outside of the object, the probability values of all voxels located inside the neighborhood that belongs to the same class should be similar while b) when a pixel is close to an object boundary, probability value differences between such voxels may show dissimilarity. Therefore, with edge gate, UNet- was able to impose neighborhood-wise constraints to the predicted output and achieved more topologically correct results.

Compared with UNet, FilterNet incorporated with both block B and edge gate, and reached a new level of accuracy. As we mentioned earlier, when neighborhood is broadened, edge-aware regulations are needed to reduce false positives and strengthen boundary information. Likewise, when edge-aware regulations are presented, broadened neighborhood helps reduce sensitivity to local noise and results in improved segmentation accuracy. Therefore, the two mechanisms can benefit from each other and lead to the best performance of FilterNet in the ablation study.

Iv-B FilterNet vs. UNet, Boundary-Aware FCN

UNet is a single-task FCN that includes only one decoder to learn object region maps, while Boundary-Aware FCN is a multi-task FCN with two decoders to learn region and edge maps respectively and extra layers to fuse the output of the two tasks, with the cost of substantially increased numbers of parameters and computational time, as shown in Table II. With emphasized boundary information and integration of region and edge learning, Boundary-Aware FCN achieved superior performance in terms of DICE and ASSD compared with UNet (Table I).

Both FilterNet and Boundary-Aware FCN intended to take advantage of object boundary to improve object segmentation accuracy. Boundary-Aware FCN learns object boundary from scratch and regards predicted boundary maps as extra feature channels into region learning. While FilterNet derives object boundary directly from the predicted region maps and encodes the neighborhood-wise relationship by designed edge detectors with little cost. Instead of learning from scratch, FilterNet avoided introducing a lot of extra parameters to learn the desired edge pattern. The edge detectors turned out to be more efficient than an edge-learning decoder, from the numerical and visual comparisons in Table I, Fig. 4, and Fig. 6. There are two potential reasons behind such phenomena. First, given a small training dataset, a model with a huge amount of parameters may result in over-fitting, and thus the performance may be compromised. Second, calf muscle compartment boundaries are hard to learn, due to the fact that all muscle compartments share similar appearance patterns. Besides, edge-like appearances caused by intramuscular nerves and vessels [12, 29] can be misleading.

Fig. 4 and Fig. 6 further demonstrated that when multiple objects are close to each other and share very similar textures, optimization that is based on pure pixel-level classification loss may cause disjoint regions or holes inside the true object. When neighborhood-wise dissimilarity penalty were added for voxels away from the boundary, as edge gate did in FilterNet, the situations of disjoint regions and holes were mitigated.

Iv-C Approach and Future Work

The application of calf muscle compartment segmentation represents a category of multi-class segmentation problems, where nearby objects are next to each other and have very similar textures. In order to segment each object accurately and at the same time maintain object shape topology, we proposed FilterNet as a neighborhood relationship enhanced FCN that has broadened convolution receptive field and an edge detector based gate to apply constraints directly to the probability maps in an end-to-end training manner.

Besides increasing convolution kernel size as we did with block B, there are other ways to broaden the receptive field or integrate context information from a larger scope. For example, dilated convolutions [39] can be applied to enlarged ranges without increasing filter sizes. Attention [7] has the potential to take into account of multi-scale features simultaneously by trainable weights. Exploration of more efficient ways to enlarge feature neighborhood will remain as future work.

For the edge gate used in our FilterNet, Laplacian edge detector and Gaussian smoothing kernel with trainable were applied to derive object edges. After optimization, we obtained a of 0.89, 0.85, 0.92 and 0.90 pixel for each fold, respectively. We have also explored a sole use of the Laplacian edge detector in the edge gate, with fixed () while leaving largest connected component for each label of the results to be generated by as a post processing step. Note that only keeping the largest connected component is only feasible when the desired object is known to be a single region while our method does not have this restriction. It turns out that though the performance differences in terms of DSC are small, the performance differences in terms of ASSD are notable. As the ASSD values shown in Table III, our current FilterNet outperforms the other tested methods. This means that the usage of Gaussian smoothing makes the edge gate more robust to noise and the neural network has the ability to fine-tune hyper-parameters like to further improve the performance. Finding advanced convolution-based detectors with trainable hyper-parameters is also worth exploring in the future.

Method TA TP Sol Gas PL
1.57 1.85 2.28 2.24 2.02
1.48 1.84 2.11 2.15 1.85
& LLC 1.01 1.56 1.69 1.65 1.21
1.04 1.42 1.66 1.60 1.14
TABLE III: Averaged ASSD for each muscle compartment. : FilterNet with only Laplacian kernel used in the edge gate. : FilterNet with fixed =1 in the Gaussian kernel. & LLC: leaving largest connected component for each label is applied to the results of . : FilterNet with trainable . Best performance in bold.

In addition to reducing the sensitivity to noise, Gaussian smoothing also increases edge response area sizes, given extremely small portion of edge voxels in a volume. Due to the sparsity of edge response areas and the optimization efficiency of L1-norm in this case [22], the L1-norm was used in Eq. 10 instead of the L2-norm.

Use of edge constraints in a light-weight manner was previously considered, e.g., by Ronneberger et al. [27] who computed a weight map that highlighted the edge areas and calculated the weighted cross-entropy loss. This approach increases the importance of edge areas, but the weight map calculation is based on ground truth only. Our edge gate is applied to prediction maps during each back-propagation, thus has a high penalty in holes and undesired disjoint objects that may otherwise appear during prediction. In [1]

, deep features are used to directly learn shape parameters that reflect shape differences from a mean shape representation for cervical vertebrae on X-ray images. In this way, disjoint regions are eliminated and the edge is smooth. Cervical vertebrae are rigid with stable (position-independent) shapes, while muscle compartments consist of soft tissues that may change their shapes dramatically in response to changes in position, large deviations from the mean shape representation result. A boundary loss term based on the summation of nearest distances from segmentation pixels to the true boundary in level set representation was proposed in

[18] with pixels given penalties according to the distance from the true boundary. However, since the calculation is only carried out for pixels segmented as the target, regions corresponding to holes inside the prediction do not have any penalty associated with them. As an highly relevant improvement, our method penalizes both holes and regions away from the true edges while being computationally efficient.

Different from muscle segmentation by atlas-based methods applied to MR images from homogeneous populations [23], the dataset of 40 images we used in this study was quite diverse and included unaffected patients (10/40) and patients with DM1 (30/40). For patients with DM1, the posterior compartment muscles, gastrocnemius and soleus, are usually the earliest and eventually most severely affected muscles. The changes are not limited to these muscles. However, for the purpose of measuring changes over time for eventual clinical trials, being able to track muscles that are less severely affected may be equal or more important than those that are affected earliest and most severely in DM1. In this work, we have shown the feasibility and robustness of our approach in segmenting individual calf muscle compartments in normals and when DM1 was presented. In the future, for each muscle compartment segmentation, disease-affected regions can be easily clustered from the healthy muscle structure to develop compartment-based biomarkers guiding a disease progression study in the clinic.

V Conclusion

A neighborhood relationship enhanced FCN was reported and applied to individual calf muscle compartment segmentation on T1-weighted MR images. With an increased convolution receptive field, resolution-preserving skip connections, and explicitly edge-aware regulations by a kernel-base edge gate to constrain pixel-level probability values inside a neighborhood, our FilterNet

outperformed all other state-of-the-art deep-learning approaches tested here in both pixel-level label prediction and surface positioning. When focusing on the calf-muscle segmentation task, our novel

FilterNet approach delivered a striking performance improvement (DSC0.88, ASSD1.66 mm) over previously-reported methods (DSC0.55, ASSD7 mm), suggesting clinical-use feasibility of automated calf-compartment segmentation.


We would like to thank Eric Axelson for preparing the data and Ashley Cochran for performing the manual tracings.


  • [1] S. M. R. Al Arif, K. Knapp, and G. Slabaugh (2018) Spnet: shape prediction using a fully convolutional neural network. In MICCAI, pp. 430–439. Cited by: §IV-C.
  • [2] H. Alizai, L. Nardo, D. C. Karampinos, G. B. Joseph, S. P. Yap, T. Baum, R. Krug, S. Majumdar, and T. M. Link (2012) Comparison of clinical semi-quantitative assessment of muscle fat infiltration with quantitative assessment using chemical shift-based water/fat separation in MR studies of the calf of post-menopausal women. European radiology 22 (7), pp. 1592–1600. Cited by: §I.
  • [3] R. Amer, J. Nassar, D. Bendahan, H. Greenspan, and N. Ben-Eliezer (2019) Automatic segmentation of muscle tissue and inter-muscular fat in thigh and calf MRI images. In MICCAI, pp. 219–227. Cited by: §I, §II-B.
  • [4] S. Bauer, L. Nolte, and M. Reyes (2011)

    Fully automatic segmentation of brain tumor images using support vector machine classification in combination with hierarchical conditional random field regularization

    In MICCAI, pp. 354–361. Cited by: §I, §II-B.
  • [5] L. Bottou (2010)

    Large-scale machine learning with stochastic gradient descent

    In Proceedings of COMPSTAT’2010, pp. 177–186. Cited by: §II-B3.
  • [6] H. Chen, X. Qi, L. Yu, Q. Dou, J. Qin, and P. Heng (2017) DCAN: Deep contour-aware networks for object instance segmentation from histology images. Medical Image Analysis 36, pp. 135–146. Cited by: §I, §II-B.
  • [7] L. Chen, Y. Yang, J. Wang, W. Xu, and A. L. Yuille (2016) Attention to scale: scale-aware semantic image segmentation. In CVPR, pp. 3640–3649. Cited by: §IV-C.
  • [8] P. K. Commean, L. J. Tuttle, M. K. Hastings, M. J. Strube, and M. J. Mueller (2011) Magnetic resonance imaging measurement reproducibility for calf muscle and adipose tissue volume. J Magn Reson Imaging 34 (6), pp. 1285–1294. Cited by: §I, §I.
  • [9] M. Drozdzal, G. Chartrand, E. Vorontsov, M. Shakeri, L. Di Jorio, A. Tang, A. Romero, Y. Bengio, C. Pal, and S. Kadoury (2018) Learning normalized inputs for iterative estimation in medical image segmentation. Medical Image Analysis 44, pp. 1–13. Cited by: §II-B.
  • [10] V. Dumoulin and F. Visin (2016) A guide to convolution arithmetic for deep learning. arXiv preprint arXiv:1603.07285. Cited by: §II-B1.
  • [11] Essafi, Salma and Langs, Georg and Deux, Jean-François and Rahmouni, Alain and Bassez, Guillaume and Paragios, Nikos (2009) Wavelet-driven knowledge-based MRI calf muscle segmentation. In ISBI, pp. 225–228. Cited by: §I.
  • [12] K. FUXE and G. SEDVALL (1965) The distribution of adrenergic nerve fibres to the blood vessels in skeletal muscle. Acta physiologica Scandinavica 64 (1-2), pp. 75–86. Cited by: §IV-B.
  • [13] S. Gourgiotis, C. Villias, S. Germanos, A. Foukas, and M. P. Ridolfini (2007) Acute limb compartment syndrome: a review. Journal of Surgical Education 64 (3), pp. 178–186. Cited by: §I.
  • [14] Z. Guo, L. Zhang, L. Lu, M. Bagheri, R. M. Summers, M. Sonka, and J. Yao (2018) Deep LOGISMOS: deep learning graph-based 3D segmentation of pancreatic tumors on CT scans. In ISBI, pp. 1230–1233. Cited by: §I, §II-B.
  • [15] K. He, X. Zhang, S. Ren, and J. Sun (2016) Identity mappings in deep residual networks. In ECCV, pp. 630–645. Cited by: §II-B.
  • [16] L. Heskamp, M. van Nimwegen, M. J. Ploegmakers, and et al. Lower extremity muscle pathology in myotonic dystrophy type 1 assessed by quantitative MRI. Neurology 92 (24), pp. e2803––e2814. Cited by: §I.
  • [17] M. Kampffmeyer, N. Dong, X. Liang, Y. Zhang, and E. P. Xing (2019) ConnNet: A long-range relation-aware pixel-connectivity network for salient segmentation. TIP 28 (5), pp. 2518–2529. Cited by: §I.
  • [18] H. Kervadec, J. Bouchtiba, C. Desrosiers, E. Granger, J. Dolz, and I. B. Ayed (2019) Boundary loss for highly unbalanced segmentation. In MIDL, pp. 285–296. Cited by: §IV-C.
  • [19] A. Le Troter, A. Fouré, M. Guye, S. Confort-Gouny, J. Mattei, J. Gondin, E. Salort-Campana, and D. Bendahan (2016) Volume measurements of individual muscles in human quadriceps femoris using atlas-based segmentation approaches. Magnetic Resonance Materials in Physics, Biology and Medicine 29 (2), pp. 245–257. Cited by: §I.
  • [20] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár (2017) Focal loss for dense object detection. In ICCV, pp. 2980–2988. Cited by: §II-B.
  • [21] J. MacQueen et al. (1967) Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 281–297. Cited by: §I.
  • [22] L. Melkumova and S. Y. Shatskikh (2017) Comparing ridge and lasso estimators for data analysis. Procedia engineering 201, pp. 746–755. Cited by: §IV-C.
  • [23] H. Nguyen, P. Croisille, M. Viallon, S. Leclerc, S. Grange, R. Grange, O. Bernard, and T. Grenier (2019) Robustly segmenting quadriceps muscles of ultra-endurance athletes with weakly supervised U-Net. Cited by: §IV-C.
  • [24] N. Otsu (1979) A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics 9 (1), pp. 62–66. Cited by: §II-A.
  • [25] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017) Automatic differentiation in PyTorch. In NIPS Autodiff Workshop, Cited by: §III-A.
  • [26] R. Rodrigues and A. M. Pinheiro (2019) Segmentation of skeletal muscle in thigh Dixon MRI based on texture analysis. arXiv preprint arXiv:1904.04747. Cited by: §I.
  • [27] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: Convolutional networks for biomedical image segmentation. In MICCAI, pp. 234–241. Cited by: §II-B, §IV-C.
  • [28] H. Shen, R. Wang, J. Zhang, and S. J. McKenna (2017) Boundary-aware fully convolutional network for brain tumor segmentation. In MICCAI, pp. 433–441. Cited by: §I, §II-B, 1st item, 1st item.
  • [29] S. Sinha, U. Sinha, and V. R. Edgerton (2006)

    In vivo diffusion tensor imaging of the human calf muscle

    Journal of Magnetic Resonance Imaging: An Official Journal of the International Society for Magnetic Resonance in Medicine 24 (1), pp. 182–190. Cited by: §IV-B.
  • [30] M. Sonka, V. Hlavac, and R. Boyle (2014) Image processing, analysis, and machine vision. Cengage Learning. Cited by: §II-B2.
  • [31] R. Stramare, V. Beltrame, R. Dal Borgo, L. Gallimberti, A. Frigo, E. Pegoraro, C. Angelini, L. Rubaltelli, and G. Feltrin (2010) MRI in the assessment of muscular pathology: a comparison between limb-girdle muscular dystrophies, hyaline body myopathies and myotonic dystrophies. La Radiologia Medica 115 (4), pp. 585–599. Cited by: §I.
  • [32] N. Tustison and J. Gee (2009) N4ITK: Nick’s N3 ITK implementation for MRI bias field correction. Insight Journal 9. Cited by: §II-A.
  • [33] A. Valentinitsch, D. C. Karampinos, H. Alizai, K. Subburaj, D. Kumar, T. M. Link, and S. Majumdar (2013) Automated unsupervised multi-parametric classification of adipose tissue depots in skeletal muscle. Journal of Magnetic Resonance Imaging 37 (4), pp. 917–927. Cited by: §I.
  • [34] C. Wang, O. Teboul, F. Michel, S. Essafi, and N. Paragios (2010) 3D knowledge-based segmentation using pose-invariant higher-order graphs. In MICCAI, pp. 189–196. Cited by: §I.
  • [35] M. P. Wattjes, R. A. Kley, and D. Fischer (2010) Neuromuscular imaging in inherited muscle diseases. European Radiology 20 (10), pp. 2447–2460. Cited by: §I.
  • [36] R. J. Willcocks, W. D. Rooney, W. T. Triplett, S. C. Forbes, D. J. Lott, C. R. Senesac, M. J. Daniels, D. J. Wang, A. T. Harrington, G. I. Tennekoon, B. S. Russman, E. L. Finanger, B. J. Byrne, R. S. Finkel, G. A. Walter, H. L. Sweeney, and K. Vandenborne (2016)

    Multicenter prospective longitudinal study of magnetic resonance biomarkers in a large duchenne muscular dystrophy cohort

    Ann Neurol 79 (4), pp. 535–547. Cited by: §I.
  • [37] T. A. Wren, S. Bluml, L. Tseng-Ong, and V. Gilsanz (2008) Three-point technique of fat quantification of muscle tissue as a marker of disease progression in duchenne muscular dystrophy: preliminary study. American Journal of Roentgenology 190 (1), pp. W8–W12. Cited by: §I.
  • [38] A. Yaman, C. Ozturk, P. A. Huijing, and C. A. Yucesoy (2013) Magnetic resonance imaging assessment of mechanical interactions between human lower leg muscles in vivo. Journal of Biomechanical Engineering 135 (9), pp. 091003. Cited by: §I.
  • [39] F. Yu and V. Koltun (2015) Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122. Cited by: §IV-C.