A Macro-Micro Weakly-supervised Framework for AS-OCT Tissue Segmentation

by   Munan Ning, et al.

Primary angle closure glaucoma (PACG) is the leading cause of irreversible blindness among Asian people. Early detection of PACG is essential so that timely treatment can be provided and vision loss minimized. In clinical practice, PACG is diagnosed by analyzing the angle between the cornea and iris with anterior segment optical coherence tomography (AS-OCT). The rapid development of deep learning technologies makes it feasible to build a computer-aided system for fast and accurate segmentation of cornea and iris tissues. However, the application of deep learning methods in the medical imaging field is still restricted by the lack of sufficient fully-annotated samples. In this paper, we propose a novel framework to accurately segment the target tissues in AS-OCT images by combining weakly-annotated images (the majority) with fully-annotated images (the minority). The proposed framework consists of two models which provide reliable guidance for each other. In addition, uncertainty-guided strategies are adopted to increase the accuracy and stability of the guidance. Detailed experiments on the publicly available AGE dataset demonstrate that the proposed framework outperforms state-of-the-art semi-/weakly-supervised methods and achieves performance comparable to the fully-supervised method. The proposed method is therefore effective in exploiting the information contained in weakly-annotated images and can substantially relieve the annotation workload.


1 Introduction

Glaucoma is the leading cause of irreversible vision loss worldwide and is predicted to affect more than 100 million people by 2040 [19]. Primary angle closure glaucoma (PACG), a major subtype of glaucoma, develops when the angle between the iris and cornea is closed or narrowed, resulting in the blockage of drainage canals and a sudden rise in intraocular pressure [16]. In clinical practice, anterior segment optical coherence tomography (AS-OCT) [14] is widely utilized to obtain both quantitative and qualitative information on the anatomical structures of the cornea and iris for PACG diagnosis [6, 10, 11, 12]. However, manual analysis of each image is laborious and requires professional knowledge. Although the rapid development of deep learning technologies has shown that fully automatic segmentation of anatomical structures with high accuracy is feasible [5], such methods still require a large quantity of images with pixel-wise annotations of the related structures, and producing these annotations is time-consuming and demands expertise.

To alleviate the intensive annotation workload of clinicians, considerable effort has been devoted to semi-/weakly-supervised segmentation [3, 8, 9, 13, 17, 18, 20]. Semi-supervised methods aim to extract information from a large number of unlabeled images with the assistance of a few fully-annotated images. For example, Perone et al. [13] proposed a semi-supervised teacher-student framework, which leveraged the supervised knowledge learned from the teacher model to improve the segmentation performance of the student model. Yu et al. [20] further introduced uncertainty information into the teacher-student model to fully exploit the unlabeled data by enforcing prediction consistency under different perturbations. Hung et al. [8] proposed an adversarial strategy, which introduced a new discriminator to predict a confidence map for utilizing the information of unlabeled images. However, current semi-supervised methods still require a considerable quantity of fully-annotated images for satisfactory performance. Another strategy is to improve the annotation efficiency by adopting weak annotations for training. (For the rest of the paper, full annotation refers to a manual label for each pixel, while weak annotation refers to circles, dots, or scribbles denoting the region of interest.) For example, Kervadec et al. [9] introduced a differentiable term into the loss function to impose soft size constraints, extracted from the weak annotations, on the target region. Tang et al. [17, 18] proposed to attain better performance by jointly optimizing the normalized cut with a deep learning model and CRFs for the weakly-supervised task. Although these weakly-supervised methods might relieve the annotation workload to some extent, their segmentation could be error-prone due to the lack of sufficient pixel-wise annotation information. In clinical practice, apart from a large number of weakly-annotated samples, a small number of full annotations is usually also available, and the two can be combined to improve the model's performance.

To address the above issues of semi-/weakly-supervised learning, an intuitive solution is to integrate both the fully-annotated images and the weakly-labeled samples into the training process, so that the former images can provide accurate pixel-wise tutorial while the latter ones offer more high-level region proposals for segmentation.

In this paper, we propose an uncertainty-aware macro-micro (UAMM) framework for the segmentation of the cornea and iris with a few fully-annotated images and a relatively large number of weakly-labeled samples. The network of the proposed UAMM approach consists of two main components connected by two flows: the macro model, which is the source of the microscopic flow, and the micro model, which is the source of the macroscopic flow. Unlike the teacher-student framework, in which only the teacher model provides guidance to the student model, the macro model and the micro model in the proposed framework offer information to each other to achieve better segmentation performance. Specifically, the macro model utilizes the weakly-labeled samples to learn segmentation proposals that provide semantic clues for the optimization of the micro model (i.e., the microscopic flow), while the micro model employs fully-annotated images to provide a pixel-wise tutorial that guides the learning process of the macro model (i.e., the macroscopic flow). The main contributions of this study are fourfold:

  • We propose a novel weakly-supervised methodology for the segmentation of the cornea and iris in AS-OCT images, which outperforms state-of-the-art semi-/weakly-supervised methods and achieves performance comparable to that of the fully-supervised network.

  • Besides the informative features distilled from the weakly-labeled samples, we propose to add the macroscopic flow from the micro model to provide pixel-wise guidance for the optimization of the macro model.

  • Other than pixel-wise annotation information learned from the fully-annotated images, the microscopic flow from the macro model is designed to offer more high-level semantic information for the training of the micro model.

  • We propose to introduce uncertainty guidance strategies into the microscopic flow and macroscopic flow for more accurate and stable guidance.

2 Method

Fig. 1 displays the diagram of the proposed UAMM framework, which consists of the micro model and the macro model. Both models have the same network architecture, i.e., DeepLabV3+ [2], with different parameters. The proposed framework is optimized via a two-stage training strategy. In the first stage, i.e., the individual training stage (marked as ① and ② in Fig. 1), the two models are trained individually, the micro model on the fully-annotated images and the macro model on the weakly-labeled samples. In the second stage, i.e., the joint training stage (marked as ③ in Fig. 1), the two models are trained jointly using only the weakly-labeled samples and provide guidance for each other (the macroscopic and microscopic flows, marked as ④ and ⑤) to achieve better segmentation performance. To prevent incorrect information from misleading the training, uncertainty guidance strategies are proposed to provide more accurate and stable guidance. To clarify notations, $X \in \mathbb{R}^{H \times W \times 3}$ denotes the input image, where $H$, $W$ and $3$ represent the height, width and three channels of the input RGB image, respectively; $Y_f$ and $Y_w$ stand for the $C$-way full and weak annotations, respectively; $f(\cdot)$, $\theta$ and $P$ indicate the non-learning transformation of DeepLabV3+, the model parameters and the model output, respectively.

Figure 1: Overview of the proposed uncertainty-aware macro-micro (UAMM) framework. Full annotations are used only in stage ②, while weakly-labeled images are used in the other stages.

2.1 Loss Functions for the Macro and Micro Model

In the first stage, the macro model and the micro model are trained separately: the macro model is optimized with the weakly-labeled samples, while the micro model is trained with the fully-annotated images. Specifically, suppose there are $M$ fully-annotated images denoted as $\{(X_i, Y_{f,i})\}_{i=1}^{M}$, and $N$ weakly-labeled samples represented by $\{(X_i, Y_{w,i})\}_{i=1}^{N}$. The loss function for each model in the individual training stage is defined as:

$$\mathcal{L}_{ce} = -\sum_{j}\sum_{c} Y_f^{(j,c)} \log P_{mi}^{(j,c)} \qquad (1)$$

$$\mathcal{L}_{pce} = -\sum_{j:\, m^{(j)} = 1}\sum_{c} Y_w^{(j,c)} \log P_{ma}^{(j,c)} \qquad (2)$$

where $m$ is the binary indicator denoting the weakly-annotated pixels; $j$ iterates over all pixel locations (in Eq. 2, only those with $m^{(j)} = 1$) and $c$ iterates over the $C$ classes; $P_{mi}$ and $P_{ma}$ represent the outputs of the micro model and the macro model, respectively.

Eq. 1 represents the vanilla cross-entropy loss [17] for the micro model, while Eq. 2 denotes the partial cross-entropy (pCE) loss [17] for the macro model. The pCE loss only considers the weak label proposals and the relevant regions during training, and thus reduces the risk of mistakenly classifying the unlabeled pixels as background.
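The difference between the two first-stage losses can be illustrated with a minimal NumPy sketch. The function names, the `(H, W, C)` array layout and the sum reduction are assumptions made for illustration only; the paper's implementation is in PyTorch and is not reproduced here.

```python
import numpy as np

def cross_entropy(probs, onehot, eps=1e-8):
    """Cross-entropy summed over all pixels and classes (Eq. 1 style).

    probs, onehot: arrays of shape (H, W, C); onehot holds the full annotation.
    """
    return float(-np.sum(onehot * np.log(probs + eps)))

def partial_cross_entropy(probs, weak_onehot, eps=1e-8):
    """Cross-entropy restricted to weakly-annotated pixels (Eq. 2 style).

    A pixel counts as annotated when its one-hot vector has a nonzero entry;
    all other pixels contribute nothing to the loss.
    """
    mask = weak_onehot.sum(axis=-1) > 0                      # (H, W) binary indicator
    per_pixel = -np.sum(weak_onehot * np.log(probs + eps), axis=-1)
    return float(np.sum(per_pixel[mask]))

# Toy example: a 2x2 image, 3 classes, uniform predictions.
probs = np.full((2, 2, 3), 1.0 / 3.0)
weak = np.zeros((2, 2, 3))
weak[0, 0, 1] = 1.0                                          # one scribbled pixel
print(partial_cross_entropy(probs, weak))                    # only that pixel is penalized
```

Because unlabeled pixels are excluded rather than treated as background, a scribble never pushes the macro model towards predicting background outside the stroke.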

2.2 Uncertainty-aware KL Loss for the Macroscopic Flow

Because pixel-wise labels are not available for the weakly-labeled images, the macro model trained on them can hardly deliver satisfactory segmentation performance. In the second stage, to further improve the accuracy, we utilize the output of the micro model to guide the optimization of the macro model.

Specifically, we adopt the KL-divergence loss between the outputs of the two models to fine-tune the macro model. Although the KL-divergence can align the output distributions of the two models, mistakes made by the micro model can result in inaccurate tutorials and mislead the optimization of the macro model. Therefore, we propose to use an uncertainty map to select the reliable pixels for guidance. The uncertainty map, which serves as an indicator of the reliability of the model's prediction, can be easily inferred with the Monte Carlo dropout (MCD) method [7]. Specifically, we modify the micro network with several dropout layers, and then repeat the forward pass $T$ times to obtain the Monte Carlo samples $\{P_t^{c}\}_{t=1}^{T}$, where $P_t^{c}$ denotes the softmax probability map of the $c$-th class at the $t$-th forward pass. Because the variance of the Monte Carlo samples can be treated as an approximation of the epistemic uncertainty [15], the uncertainty map of the micro model can be formulated as:

$$\mu^{c} = \frac{1}{T}\sum_{t=1}^{T} P_t^{c}, \qquad U = \frac{1}{C}\sum_{c=1}^{C}\frac{1}{T}\sum_{t=1}^{T}\left(P_t^{c} - \mu^{c}\right)^{2} \qquad (3)$$
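As a rough illustration of the Monte Carlo estimate, the sketch below computes a per-pixel uncertainty map from a stack of softmax outputs. It assumes the stochastic forward passes have already been collected into a `(T, H, W, C)` array; the function name and the averaging of the variance over classes are assumptions of this sketch, not details confirmed by the text.

```python
import numpy as np

def uncertainty_map(mc_probs):
    """Per-pixel uncertainty from Monte Carlo dropout samples.

    mc_probs: array of shape (T, H, W, C) holding the softmax outputs of
    T stochastic forward passes (dropout kept active at inference time).
    Returns an (H, W) map: variance over the T passes, averaged over classes
    (the class averaging is an assumption of this sketch).
    """
    var_per_class = mc_probs.var(axis=0)      # (H, W, C), population variance
    return var_per_class.mean(axis=-1)        # (H, W)

# Toy example: two passes over a 1x2 image with two classes.
mc = np.array([
    [[[0.9, 0.1], [0.9, 0.1]]],               # pass 1
    [[[0.9, 0.1], [0.1, 0.9]]],               # pass 2: second pixel flips
])
U = uncertainty_map(mc)                        # first pixel stable, second pixel uncertain
print(U)
```

Pixels whose prediction changes between passes receive a high value, which is what later allows them to be excluded from the guidance.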


Furthermore, an empirical threshold is applied to the uncertainty map to obtain a binary indicator map, in which the positive values represent the reliable pixels. Then, element-wise multiplication is performed between the KL loss and the binary indicator map to select the reliable loss for back-propagation. Therefore, for the macroscopic flow in the joint training stage, the macro model can be updated via the uncertainty-guided KL loss, as defined below:

$$\mathcal{L}_{ukl} = \sum_{j} B^{(j)}\, D_{KL}\!\left(P_{mi}^{(j)} \,\|\, P_{ma}^{(j)}\right), \qquad B = \mathbb{1}(U < \tau) \qquad (4)$$

where $P_{mi}$ and $P_{ma}$ are the outputs of the micro and macro models, $j$ indexes the pixel locations, $U$ denotes the uncertainty map, $B$ represents the binary map, and the threshold $\tau$ is kept fixed for all the experiments. Note that only weakly-labeled images are used in this step, because the micro model already has extremely high confidence on the fully-annotated images, which have been used to train it in the first stage.
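The masking step can be sketched as follows. This is an illustrative NumPy version under stated assumptions: the micro output is taken as the target distribution, i.e. KL(micro || macro), the loss is summed over reliable pixels, and the threshold values in the example are hypothetical since the value used in the experiments is not recoverable from the text.

```python
import numpy as np

def masked_kl_loss(p_micro, p_macro, uncertainty, tau, eps=1e-8):
    """Uncertainty-guided KL loss for updating the macro model.

    p_micro, p_macro: (H, W, C) softmax maps; uncertainty: (H, W) map.
    Pixels whose uncertainty is below tau are treated as reliable; the
    per-pixel KL(p_micro || p_macro) is summed over those pixels only.
    The KL direction and the sum reduction are assumptions of this sketch.
    """
    kl = np.sum(p_micro * (np.log(p_micro + eps) - np.log(p_macro + eps)), axis=-1)
    reliable = (uncertainty < tau).astype(kl.dtype)   # binary indicator map
    return float(np.sum(reliable * kl)), reliable

# Toy example: the second pixel disagrees but is also uncertain.
p_micro = np.array([[[0.5, 0.5], [0.9, 0.1]]])
p_macro = np.array([[[0.5, 0.5], [0.5, 0.5]]])
unc = np.array([[0.0, 0.2]])
loss_strict, _ = masked_kl_loss(p_micro, p_macro, unc, tau=0.1)  # second pixel masked out
loss_loose, _ = masked_kl_loss(p_micro, p_macro, unc, tau=0.5)   # second pixel kept
print(loss_strict, loss_loose)
```

With the stricter threshold the disagreement at the uncertain pixel is ignored, so an unreliable tutorial from the micro model does not propagate to the macro model.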

2.3 Uncertainty-aware EMA as the Microscopic Flow

As previously stated, the micro model is first trained with the fully-annotated images. Despite the fact that the fully-annotated images contain informative pixel-wise annotation, optimization with a limited number of samples can easily result in overfitting and deteriorate the generalization capability of the model. Therefore, in the second stage, we use the segmentation proposals learned from the macro model to induce the semantic clues for the micro model.

Unlike in the macroscopic flow, where the output of the micro model can be directly used as the tutorial, the output of the macro model trained with weakly-labeled samples may not be accurate enough to be used for guidance. Yu et al. [20] proposed an asynchronous updating solution for two collaborative models, i.e., the exponential moving average (EMA) mechanism, based on the idea that the weights of a model contain implicit information about its inference evidence. In this work, the weights of the macro model contain critical information learnt from the weakly-labeled regions and could be useful for the training of the micro model. However, adopting the classic EMA strategy to partially update the micro model with the weights of the macro model requires a predefined updating rate, which may not be optimal. Instead, we propose an uncertainty-aware exponential moving average (UEMA) mechanism for the microscopic flow. Let $\theta_{mi}$ and $\theta_{ma}$ represent the weight parameters of the micro and macro model, respectively. The proposed UEMA in the joint training stage can be summarized as:

$$\alpha = \frac{\sum B}{\sum \mathbf{1}}, \qquad \theta_{mi} \leftarrow \alpha\,\theta_{mi} + (1 - \alpha)\,\theta_{ma} \qquad (5)$$

where $\mathbf{1}$ denotes the unit map with the same shape as $U$. Note that $U$ represents the uncertainty map, the same as in Eq. 4, and $B$ is the binary map obtained by thresholding it, in which the reliable pixels are set to one. The updating rate $\alpha$ is calculated by dividing the sum of the uncertainty binary map $B$ by the sum of $\mathbf{1}$. The less certain the micro model is, the more its parameters are affected by the macro model. Through this asynchronous updating strategy, the segmentation proposals learnt by the macro model can effectively guide the micro model towards better generalization ability with adaptive updating rates.
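A minimal sketch of the adaptive update is given below. It assumes the model weights are available as dictionaries of NumPy arrays with matching keys and that the binary reliability map marks certain pixels with 1; both the function name and this data layout are illustrative choices, not the paper's implementation.

```python
import numpy as np

def uema_update(theta_micro, theta_macro, reliable):
    """Uncertainty-aware EMA step for the micro model (sketch).

    theta_micro, theta_macro: dicts mapping parameter names to arrays of
    identical shapes. reliable: (H, W) binary map, 1 where the micro model
    is certain. The rate alpha is the fraction of reliable pixels, so a less
    certain micro model is pulled further towards the macro weights.
    """
    alpha = float(reliable.sum()) / reliable.size
    updated = {
        name: alpha * theta_micro[name] + (1.0 - alpha) * theta_macro[name]
        for name in theta_micro
    }
    return updated, alpha

# Toy example: three of four pixels are reliable, so alpha = 0.75.
reliable = np.array([[1, 1], [1, 0]])
micro = {"w": np.array([1.0, 2.0])}
macro = {"w": np.array([3.0, 6.0])}
new_micro, alpha = uema_update(micro, macro, reliable)
print(alpha, new_micro["w"])
```

Compared with a classic EMA, no updating rate has to be chosen in advance: when every pixel is reliable the micro weights are left unchanged, and when none is reliable they are replaced by the macro weights.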

3 Experiment

Experimental setup

The proposed method is evaluated on a publicly available dataset, the Angle closure Glaucoma Evaluation (AGE) Challenge [4], which provides 3200 AS-OCT images. The original challenge dataset provides annotations for the angle closure classification label and the location of the scleral spur. To enable quantitative analysis of the iris and cornea, we had the two key tissues manually re-annotated by experienced ophthalmologists with two types of annotations, i.e., the full annotation and the weak annotation. The full annotation provides pixel-wise masks of the iris and cornea, whereas the weak annotation marks line strokes inside the tissues. It is worth mentioning that the original PACG classification problem is reformulated as a tissue segmentation problem; therefore, we do not use the original annotations in this work.

We randomly split the images into training, evaluation and test sets (only full annotations are used for evaluation and test). All the images and the corresponding annotations are resized to 240 × 512 pixels, and the image intensities are normalized into the range of [-1, 1]. The framework is implemented with PyTorch on an NVIDIA Tesla P40 GPU. We utilize the SGD optimizer to update the network parameters. The batch size is set to 4 for both the micro and macro models. The Dice coefficient (Dice, reported as a percentage) and the average distance of boundaries (ADB, reported in millimeters) [1] are used as the evaluation criteria. Higher Dice and lower ADB imply better segmentation performance. For convenience, we denote by Dice1/ADB1 and Dice2/ADB2 the evaluation metrics of the cornea and the peripheral iris, respectively.
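For reference, the Dice coefficient for a single tissue can be computed as in the following NumPy sketch. The function name and the convention for two empty masks are assumptions made for illustration; ADB is not covered here.

```python
import numpy as np

def dice_percent(pred, target):
    """Dice coefficient (in percent) between two binary masks of one tissue.

    pred, target: arrays of shape (H, W); nonzero entries mark the tissue.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 100.0  # both masks empty: treated as perfect agreement (a convention)
    return float(100.0 * 2.0 * np.logical_and(pred, target).sum() / denom)

# Toy example: one overlapping pixel, mask sizes 2 and 1.
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
print(dice_percent(pred, target))
```

Averaging this score over the cornea and iris masks gives an average Dice of the kind reported in the tables.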

| Group | Micro | Macro | Macro Flow | Micro Flow | Full Annotation | Weak Annotation | Ave Dice | Ave ADB |
|---|---|---|---|---|---|---|---|---|
| Module ablations | ✓ | | | | 1% | 99% | 82.13 | 2.85 |
| | ✓ | ✓ | | | 1% | 99% | 83.55 | 1.87 |
| | ✓ | ✓ | ✓* | | 1% | 99% | 86.70 | 1.15 |
| | ✓ | ✓ | ✓ | | 1% | 99% | 89.35 | 0.66 |
| | ✓ | ✓ | ✓ | ✓* | 1% | 99% | 90.12 | 0.43 |
| | ✓ | ✓ | ✓ | ✓ | 1% | 99% | 91.64 | 0.30 |
| Annotation compositions | ✓ | ✓ | ✓ | ✓ | 5% | 95% | 92.48 | 0.27 |
| | ✓ | ✓ | ✓ | ✓ | 10% | 90% | 92.60 | 0.26 |
| | ✓ | ✓ | ✓ | ✓ | 25% | 75% | 93.06 | 0.22 |
| | ✓ | ✓ | ✓ | ✓ | 50% | 50% | 93.42 | 0.20 |

*: The same flow as adopted in this study, but without uncertainty assistance.
Macro Flow: adding the macroscopic flow with our uncertainty-aware KL loss.
Micro Flow: adding the microscopic flow with our uncertainty-aware EMA mechanism.
Table 1: Ablation studies on the proposed modules and the annotation composition.

Ablation study.

To demonstrate the effectiveness of the proposed modules, we conduct ablation studies as well as experiments with different annotation compositions. As shown in Table 1, the average Dice improves by about 5.80% and 2.29% when the macroscopic and microscopic flows are added, respectively. To evaluate the effect of the proposed uncertainty strategies, the results of the flows without uncertainty are presented as well, marked by the asterisk symbol. More specifically, we directly use the plain KL loss for the macroscopic flow and the conventional EMA for the microscopic flow. As expected, the results without uncertainty are inferior (2.65% lower for the macroscopic flow and 1.52% lower for the microscopic flow), demonstrating that the proposed uncertainty strategies improve the effectiveness of the tutorials. To evaluate the stability of the proposed method, we conducted additional experiments with different percentages of fully-annotated images. As expected, the more fully-annotated images are utilized, the better the method performs, indicating that the proposed method can also exploit the information from full annotations.
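The gains quoted above follow from the average Dice column of Table 1. The short check below reproduces the arithmetic; the dictionary keys are labels inferred from the text for illustration and are not names used in the paper.

```python
# Average Dice values (percent) taken from the module-ablation rows of Table 1.
dice = {
    "no_flows": 83.55,
    "macro_flow_no_uncertainty": 86.70,
    "macro_flow": 89.35,
    "both_flows_micro_no_uncertainty": 90.12,
    "both_flows": 91.64,
}

gain_macro_flow = round(dice["macro_flow"] - dice["no_flows"], 2)
gain_micro_flow = round(dice["both_flows"] - dice["macro_flow"], 2)
drop_macro_no_unc = round(dice["macro_flow"] - dice["macro_flow_no_uncertainty"], 2)
drop_micro_no_unc = round(dice["both_flows"] - dice["both_flows_micro_no_uncertainty"], 2)

print(gain_macro_flow, gain_micro_flow, drop_macro_no_unc, drop_micro_no_unc)
```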

Figure 2: Visualization of the segmentation results by different methods and ours.

Comparison with State-of-the-art

In Table 2, the two columns under the annotation composition give the percentages of fully-annotated and weakly-labeled images used for training. The results of state-of-the-art semi-/weakly-supervised methods, including WACT [13], UAMT [20], AdvSemi [8], and CRF-rloss [18], are presented for comparison. In the training set for the proposed UAMM method, only 1% of the images are fully annotated, while the remaining 99% are weakly labeled. The semi-supervised methods, i.e., WACT, UAMT and AdvSemi, generally do not utilize weakly-annotated samples in their original studies. Similarly, fully-annotated samples are not used in the weakly-supervised study, i.e., CRF-rloss. For a fair comparison, we also integrate both the fully- and weakly-annotated samples into the training procedure of each method and report two versions of results, so that all models are compared under the same setting. Oracle indicates using only the micro model, i.e., a single DeepLabV3+ network [2]. As the baseline method, Oracle is trained on four training sets with different percentages of fully-annotated images and weakly-labeled samples, as listed in the Oracle rows of Table 2. With the same training data setup, the proposed UAMM method achieves the best performance among these methods, with an average Dice score of 91.64% and an average ADB of 0.30. Furthermore, the evaluation metrics of UAMM are close to those of the Oracle trained with full annotations only (2.01% lower in average Dice), demonstrating that the proposed method can exploit segmentation guidance from the weak annotations. Representative examples are visualized in Fig. 2.

| Method | Full Annotation | Weak Annotation | Dice1 | Dice2 | Ave Dice | ADB1 | ADB2 | Ave ADB |
|---|---|---|---|---|---|---|---|---|
| Oracle | 100% | 0% | 95.71 | 91.59 | 93.65 | 0.13 | 0.21 | 0.17 |
| | 0% | 100% | 55.14 | 35.03 | 45.09 | 9.30 | 13.62 | 11.46 |
| | 1% | 0% | 78.83 | 68.64 | 73.73 | 5.79 | 6.29 | 6.04 |
| | 1% | 99% | 83.71 | 80.55 | 82.13 | 2.58 | 3.11 | 2.85 |
| WACT [13] | 1% | 0% | 51.63 | 25.89 | 38.76 | 9.74 | 19.97 | 14.86 |
| | 1% | 99% | 84.74 | 83.13 | 83.94 | 1.19 | 0.77 | 0.98 |
| UAMT [20] | 1% | 0% | 86.59 | 64.23 | 75.41 | 2.53 | 5.79 | 4.16 |
| | 1% | 99% | 88.58 | 85.06 | 86.82 | 0.47 | 0.73 | 0.60 |
| AdvSemi [8] | 1% | 0% | 84.02 | 69.49 | 76.75 | 2.88 | 6.11 | 4.50 |
| | 1% | 99% | 88.36 | 83.33 | 85.85 | 3.19 | 2.02 | 2.60 |
| CRF-rloss [18] | 0% | 100% | 86.37 | 83.97 | 85.17 | 1.09 | 0.87 | 0.98 |
| | 1% | 99% | 93.44 | 83.26 | 88.35 | 0.32 | 0.92 | 0.62 |
| UAMM | 1% | 99% | 93.68 | 89.60 | 91.64 | 0.27 | 0.32 | 0.30 |

Table 2: Quantitative comparison with the state-of-the-art semi-/weakly-supervised learning algorithms.

4 Conclusion

In this work, we proposed a macro-micro weakly-supervised framework to tackle the problem of cornea and iris segmentation in AS-OCT images. Specifically, an uncertainty-aware KL loss is designed for the macroscopic flow, which assists the training of the macro model with the prediction priors from the micro model. The microscopic flow is realized with an uncertainty-aware moving average mechanism, which updates the micro model by gradually incorporating the weights of the macro model. Our approach outperformed state-of-the-art semi-/weakly-supervised methods on the cornea and iris segmentation task for AS-OCT images. In addition, using only 1% fully-annotated data, it achieved performance comparable to that of DeepLabV3+ trained with all images fully annotated.

5 Acknowledgment

This work was funded by the Key Area Research and Development Program of Guangdong Province, China (No. 2018B010111001), National Key Research and Development Project (No. 2018YFC2000702) and Science and Technology Program of Shenzhen, China (No. ZDSYS201802021814180).


  • [1] C. Bian, X. Yang, J. Ma, S. Zheng, Y. Liu, R. Nezafat, P. Heng, and Y. Zheng (2018) Pyramid network with online hard example mining for accurate left atrium segmentation. In International Workshop on Statistical Atlases and Computational Models of the Heart, pp. 237–245. Cited by: §3.
  • [2] L. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 801–818. Cited by: §2, §3.
  • [3] W. Cui, Y. Liu, Y. Li, M. Guo, Y. Li, X. Li, T. Wang, X. Zeng, and C. Ye (2019) Semi-supervised brain lesion segmentation with an adapted mean teacher model. In Information Processing in Medical Imaging, pp. 554–565. Cited by: §1.
  • [4] Cited by: §3.
  • [5] H. Fu, Y. Xu, S. Lin, D. W. K. Wong, B. Mani, M. Mahesh, T. Aung, and J. Liu (2018) Multi-context deep network for angle-closure glaucoma screening in anterior segment OCT. In International Conference on Medical Image Computing and Computer Assisted Intervention, pp. 356–363. Cited by: §1.
  • [6] H. Fu, Y. Xu, S. Lin, X. Zhang, D. W. K. Wong, J. Liu, A. F. Frangi, M. Baskaran, and T. Aung (2017) Segmentation and quantification for angle-closure glaucoma assessment in anterior segment OCT. IEEE Transactions on Medical Imaging 36 (9), pp. 1930–1938. Cited by: §1.
  • [7] Y. Gal and Z. Ghahramani (2016) Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In International Conference on Machine Learning, pp. 1050–1059. Cited by: §2.2.
  • [8] W. Hung, Y. Tsai, Y. Liou, Y. Lin, and M. Yang (2018) Adversarial learning for semi-supervised semantic segmentation. arXiv preprint arXiv:1802.07934. Cited by: §1, §3, Table 2.
  • [9] H. Kervadec, J. Dolz, M. Tang, E. Granger, Y. Boykov, and I. B. Ayed (2019) Constrained-CNN losses for weakly supervised segmentation. Medical Image Analysis 54, pp. 88–99. Cited by: §1.
  • [10] H. Li, V. Jhanji, S. Dorairaj, A. Liu, D. S. Lam, and C. K. Leung (2012) Anterior segment optical coherence tomography and its clinical applications in glaucoma. Journal of Current Glaucoma Practice 6 (2), pp. 68. Cited by: §1.
  • [11] S. I. Niwas, W. Lin, X. Bai, C. K. Kwoh, C. J. Kuo, C. C. Sng, M. C. Aquino, and P. T. Chew (2016) Automated anterior segment OCT image analysis for Angle Closure Glaucoma mechanisms classification. Computer Methods and Programs in Biomedicine 130, pp. 65–75. Cited by: §1.
  • [12] W. P. Nolan, J. L. See, P. T. Chew, D. S. Friedman, S. D. Smith, S. Radhakrishnan, C. Zheng, P. J. Foster, and T. Aung (2007) Detection of primary angle closure using anterior segment optical coherence tomography in Asian eyes. Ophthalmology 114 (1), pp. 33–39. Cited by: §1.
  • [13] C. S. Perone and J. Cohen-Adad (2018) Deep semi-supervised segmentation with weight-averaged consistency targets. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 12–19. Cited by: §1, §3, Table 2.
  • [14] S. Radhakrishnan, A. M. Rollins, J. E. Roth, S. Yazdanfar, V. Westphal, D. S. Bardenstein, and J. A. Izatt (2001) Real-time optical coherence tomography of the anterior segment at 1310 nm. Archives of Ophthalmology 119 (8), pp. 1179–1185. Cited by: §1.
  • [15] L. Smith and Y. Gal (2018) Understanding measures of uncertainty for adversarial example detection. arXiv preprint arXiv:1803.08533. Cited by: §2.2.
  • [16] X. Sun, Y. Dai, Y. Chen, D. Yu, S. J. Cringle, J. Chen, X. Kong, X. Wang, and C. Jiang (2017) Primary angle closure glaucoma: What we know and what we don’t know. Progress in Retinal and Eye Research 57, pp. 26–45. Cited by: §1.
  • [17] M. Tang, A. Djelouah, F. Perazzi, Y. Boykov, and C. Schroers (2018) Normalized cut loss for weakly-supervised CNN segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1818–1827. Cited by: §1, §2.1.
  • [18] M. Tang, F. Perazzi, A. Djelouah, I. Ben Ayed, C. Schroers, and Y. Boykov (2018) On regularized losses for weakly-supervised CNN segmentation. In European Conference on Computer Vision, pp. 507–522. Cited by: §1, §3, Table 2.
  • [19] Y. Tham, X. Li, T. Y. Wong, H. A. Quigley, T. Aung, and C. Cheng (2014) Global prevalence of glaucoma and projections of glaucoma burden through 2040: A systematic review and meta-analysis. Ophthalmology 121 (11), pp. 2081–2090. Cited by: §1.
  • [20] L. Yu, S. Wang, X. Li, C. Fu, and P. Heng (2019) Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation. In International Conference on Medical Image Computing and Computer Assisted Intervention, pp. 605–613. Cited by: §1, §2.3, §3, Table 2.