U-Net has provided state-of-the-art performance in many medical image segmentation problems. Many modifications of U-Net have been proposed, such as attention U-Net, recurrent residual convolutional U-Net (R2-UNet), and U-Net with residual blocks or blocks with dense connections. However, all these variants retain the encoder-decoder structure, and the number of paths for information flow is limited. In this paper we propose LadderNet, which can be viewed as a chain of multiple U-Nets. Instead of the single encoder-decoder pair in U-Net, a LadderNet has multiple pairs of encoder-decoder branches, with skip connections between every pair of adjacent encoder and decoder branches at each level. Inspired by the success of ResNet and R2-UNet, we use modified residual blocks in which the two convolutional layers of a block share the same weights. A LadderNet has more paths for information flow because of the skip connections and residual blocks, and can be viewed as an ensemble of fully convolutional networks (FCN). The equivalence to an ensemble of FCNs improves segmentation accuracy, while the shared weights within each residual block reduce the number of parameters. Semantic segmentation is essential for retinal disease detection. We tested LadderNet on two benchmark datasets for blood vessel segmentation in retinal images, and achieved superior performance over methods in the literature.
Deep learning has achieved state-of-the-art performance in many computer vision tasks, such as image classification, semantic segmentation, object recognition, motion tracking and image captioning. Convolutional neural networks have become popular since the success of AlexNet on the ImageNet classification task, mainly for the following reasons: first, large datasets and powerful computational resources are now available; second, the convolutional operation is translation-equivariant and enables weight sharing for feature extraction; third, activation functions such as the rectified linear unit (ReLU) ease training; fourth, efficient optimization algorithms such as stochastic gradient descent (SGD) and the Adam optimizer are available.
Deep convolutional neural networks (CNN) have achieved near-radiologist performance in many semantic segmentation tasks in medical image analysis. Fully convolutional networks (FCN) have succeeded in semantic segmentation on the Pascal VOC dataset, and U-Net achieved the top accuracy in the segmentation of neuronal structures in electron microscopic stacks. Other CNN variants, such as PSPNet and DeepLab, achieve state-of-the-art performance on benchmark semantic segmentation tasks. Among all these variants, U-Net is the most widely used structure in medical image analysis, mainly because its encoder-decoder structure with skip connections allows efficient information flow, and it performs well when sufficiently large datasets are not available.
Many variants of U-Net have been proposed: Alom et al. proposed to use recurrent convolution in U-Net; Oktay et al. proposed an attention module in U-Net to determine where to look; Jégou et al. proposed Tiramisu, where the convolutional layers in a U-Net are replaced with dense blocks. However, all these variants still fall into the encoder-decoder structure, where the number of paths for information flow is limited. We propose LadderNet, a convolutional network for semantic segmentation with more paths for information flow. We demonstrate that LadderNet can be viewed as an ensemble of FCNs, and validate its superior performance on the blood vessel segmentation task in retinal images.
U-Net and its variants in the literature all have an encoder-decoder structure, in which the number of paths for information flow is limited. We propose LadderNet, a multi-branch convolutional neural network for semantic segmentation as shown in Fig. 1, which has more paths for information flow. Features at different spatial scales are named with letters A to E, and columns are named with numbers 1 to 4. We name columns 1 and 3 encoder branches, and columns 2 and 4 decoder branches. We use convolution with a stride of 2 to go from small-receptive-field features to large-receptive-field features (e.g., A to B), and use transposed convolution with a stride of 2 to go from large-receptive-field features back to small-receptive-field features (e.g., B to A). The number of channels is doubled from one level to the next (e.g., A to B).
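The down- and up-sampling transitions between two adjacent levels can be sketched in PyTorch as follows. This is a minimal sketch: the channel counts and spatial size below are illustrative, not the exact configuration from the paper.

```python
import torch
import torch.nn as nn

# Transition between two adjacent levels, e.g. A (10 channels) and B (20 channels).
# Strided convolution halves the spatial size while doubling the channels;
# transposed convolution reverses both.
down_A_to_B = nn.Conv2d(10, 20, kernel_size=3, stride=2, padding=1)
up_B_to_A = nn.ConvTranspose2d(20, 10, kernel_size=2, stride=2)

x_a = torch.randn(1, 10, 48, 48)   # a feature map at level A
x_b = down_A_to_B(x_a)             # level B: (1, 20, 24, 24)
x_a_back = up_B_to_A(x_b)          # back to level A: (1, 10, 48, 48)
```

With kernel size 3, stride 2 and padding 1, the strided convolution maps 48×48 to 24×24; the transposed convolution with kernel size 2 and stride 2 exactly inverts the spatial change.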
LadderNet can be viewed as a chain of U-Nets. Columns 1 and 2 form one U-Net, and columns 3 and 4 form another. Between the two U-Nets, there are skip connections at levels A-D. Unlike U-Net, where features from encoder branches are concatenated with features from decoder branches, we sum the features from the two branches. We demonstrate a LadderNet composed of 2 U-Nets here, but more U-Nets can be chained to form more complex network structures.
LadderNet can also be viewed as an ensemble of multiple FCNs. Veit et al. proposed that ResNet behaves like an ensemble of shallow networks, because the residual connections provide multiple paths for information flow. Similarly, LadderNet provides multiple paths for information flow, and each path can be viewed as a variant of FCN. The total number of paths grows exponentially with the number of encoder-decoder pairs and the number of spatial levels (e.g., A to E in Fig. 1). Therefore, LadderNet has the potential to capture more complicated features and produce higher accuracy.
More encoder-decoder branches increase the number of parameters and the difficulty of training. To solve this problem, we propose shared-weights residual blocks as shown in Fig. 1. Unlike the standard residual block proposed by He et al., the two convolutional layers in one block share the same weights. Similar to the recurrent convolutional neural network (RCNN), the two convolutional layers in a block can be viewed as one recurrent layer, except that the two batch normalization layers are different. A drop-out layer is added between the two convolutional layers to avoid overfitting. The shared-weights residual block combines the strengths of skip connection, recurrent convolution and drop-out regularization, and has far fewer parameters than a standard residual block.
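A minimal PyTorch sketch of such a shared-weights residual block is given below. The exact ordering of convolution, normalization and activation is our assumption, not necessarily the authors' implementation; the channel count is illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedWeightResBlock(nn.Module):
    """One conv layer applied twice (weights shared), two separate
    BatchNorms, and dropout in between, wrapped in a residual connection."""
    def __init__(self, channels, drop_rate=0.25):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)   # the two BN layers are NOT shared
        self.bn2 = nn.BatchNorm2d(channels)
        self.drop = nn.Dropout2d(drop_rate)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv(x)))
        out = self.drop(out)
        out = F.relu(self.bn2(self.conv(out)))   # same self.conv -> shared weights
        return x + out                           # residual connection

block = SharedWeightResBlock(10)
y = block(torch.randn(2, 10, 48, 48))

# One shared conv (10*10*3*3 weights + 10 biases) plus two BNs (20 params each)
n_params = sum(p.numel() for p in block.parameters())
```

Because the convolution is stored once but applied twice, the block carries roughly half the convolutional parameters of a standard two-layer residual block.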
We evaluated the proposed LadderNet on two popular datasets for retinal blood vessel segmentation: the DRIVE dataset and the CHASE_DB1 dataset. The DRIVE dataset consists of 40 color images of the retina, 20 of which were used for training and the remaining 20 for testing. To increase the number of training samples, we randomly sampled 190,000 patches from the training images, and held out part of the training samples as validation data.
The CHASE_DB1 dataset was collected from both the left and right eyes of 14 school children. It has 28 color images of the retina, 20 of which were used for training and the remaining 8 images (from 4 children) for testing. We randomly sampled 760,000 patches from the training images, and held out part of the training samples as validation data.
All patches were converted to gray-scale for the experiments. A field of view (FOV) mask is provided for the DRIVE dataset but not for the CHASE_DB1 dataset. We applied similar techniques to generate FOV masks for CHASE_DB1, and sampled patches over the entire image, including regions outside the FOV.
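Random patch extraction of this kind can be sketched as follows. The helper function, image size and patch size below are illustrative assumptions for the sketch, not values taken from the paper.

```python
import numpy as np

def sample_patches(image, patch_size, n_patches, rng=None):
    """Randomly sample square patches from a 2-D gray-scale image."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = image.shape
    patches = np.empty((n_patches, patch_size, patch_size), image.dtype)
    for i in range(n_patches):
        top = rng.integers(0, h - patch_size + 1)
        left = rng.integers(0, w - patch_size + 1)
        patches[i] = image[top:top + patch_size, left:left + patch_size]
    return patches

# Illustrative: a retina-sized gray-scale image and 100 sampled patches
img = np.zeros((584, 565), dtype=np.float32)
patches = sample_patches(img, patch_size=48, n_patches=100)
```

Sampling positions uniformly over the whole image, as here, matches the paper's choice of including regions outside the FOV.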
We chose a LadderNet with 5 levels (A-E) and a drop-out rate of 0.25, and set the number of channels at the first level (level A) to 10, resulting in a LadderNet with 1.5M parameters. We used the cross-entropy loss for semantic segmentation, and applied the Adam optimizer with default parameters and a batch size of 1024. We used a "reduce learning rate on plateau" strategy, setting the learning rate to 0.01, 0.001 and 0.0001 at epochs 0, 20 and 150 respectively, and trained for 250 epochs in total.
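The stepwise schedule above can be written as a simple epoch-to-learning-rate mapping; in PyTorch the same effect could be obtained with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20, 150], gamma=0.1)`.

```python
def learning_rate(epoch):
    """Piecewise-constant schedule from the paper: 0.01 from epoch 0,
    0.001 from epoch 20, 0.0001 from epoch 150 (250 epochs total)."""
    if epoch < 20:
        return 0.01
    if epoch < 150:
        return 0.001
    return 0.0001
```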
We used several metrics to evaluate the performance of LadderNet: accuracy (AC), sensitivity (SE), specificity (SP) and F1-score. We first counted True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN). The metrics are calculated as follows:

AC = (TP + TN) / (TP + TN + FP + FN)
SE = TP / (TP + FN)
SP = TN / (TN + FP)

The F1-score is calculated as follows:

F1 = 2TP / (2TP + FP + FN)
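These standard definitions translate directly into code; a small sketch with made-up confusion counts:

```python
def metrics(tp, tn, fp, fn):
    """Compute AC, SE, SP and F1 from confusion-matrix counts."""
    ac = (tp + tn) / (tp + tn + fp + fn)   # accuracy
    se = tp / (tp + fn)                    # sensitivity (recall)
    sp = tn / (tn + fp)                    # specificity
    f1 = 2 * tp / (2 * tp + fp + fn)       # F1-score
    return ac, se, sp, f1

# Illustrative counts, not from the paper
ac, se, sp, f1 = metrics(tp=80, tn=90, fp=10, fn=20)
```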
To further evaluate the performance of different neural networks, we calculated the receiver operating characteristic (ROC) curve and the area under the curve (AUC).
DRIVE dataset:
|Method|Year|F1|SE|SP|AC|AUC|
|Qiaoliang Li|2016|-|0.7569|0.9816|0.9527|0.9738|
|Residual U-Net|2018|0.8149|0.7726|0.9820|0.9553|0.9779|
|Recurrent U-Net|2018|0.8155|0.7751|0.9816|0.9556|0.9782|
CHASE_DB1 dataset:
|Method|Year|F1|SE|SP|AC|AUC|
|Qiaoliang Li|2016|-|0.7507|0.9793|0.9581|0.9793|
|Residual U-Net|2018|0.7800|0.7726|0.9820|0.9553|0.9779|
|Recurrent U-Net|2018|0.7810|0.7459|0.9836|0.9622|0.9803|
Results of LadderNet are shown in Figs. 2-5. LadderNet generates predictions that are visually very close to the ground truth; for both tasks, the areas under the ROC curves are above 0.97 and the areas under the precision-recall curves are above 0.88.
Table 1 presents the quantitative results of different methods. LadderNet achieves the highest F1-score, accuracy and AUC on both tasks, and also high SE and SP. It is easy to obtain a high SE or SP alone by biasing predictions toward one category; in the extreme case, predicting the entire image as blood vessel yields an SE of 1 but an SP of 0. SE and SP each focus on only one category, while metrics such as AC, AUC and F1-score evaluate a model on both categories; therefore, higher AC, AUC and F1-score are more convincing than a higher SE or SP alone. LadderNet achieves the highest AC, AUC and F1-score on both tasks, and therefore outperforms previous models.
We propose LadderNet for semantic segmentation in this paper. Compared to U-Net, LadderNet has more encoder-decoder pairs. The skip connections give LadderNet multiple paths for information flow, and the number of paths increases exponentially with the number of encoder-decoder pairs. Another innovation is the shared-weights residual block, which combines the strengths of residual connection, recurrent convolution and drop-out regularization, and greatly reduces the number of parameters. LadderNet outperforms previous methods in the literature on two public datasets, improving performance on a key problem in automatic retinal disease detection. LadderNet can also be applied to other semantic segmentation tasks such as tumor segmentation or brain lesion detection.