Cardiac Adipose Tissue Segmentation via Image-Level Annotations

Ziyi Huang, et al.
Columbia University

Automatically identifying the structural substrates underlying cardiac abnormalities can potentially provide real-time guidance for interventional procedures. With knowledge of cardiac tissue substrates, the treatment of complex arrhythmias such as atrial fibrillation and ventricular tachycardia can be further optimized by detecting arrhythmia substrates to target for treatment (i.e., adipose) and identifying critical structures to avoid. Optical coherence tomography (OCT) is a real-time imaging modality that aids in addressing this need. Existing approaches for cardiac image analysis rely mainly on fully supervised learning techniques, which suffer from the heavy workload of labor-intensive pixel-wise annotation. To lessen the need for pixel-wise labeling, we develop a two-stage deep learning framework for cardiac adipose tissue segmentation using image-level annotations on OCT images of human cardiac substrates. In particular, we integrate class activation mapping with superpixel segmentation to solve the sparse tissue seed challenge raised in cardiac tissue segmentation. Our study bridges the gap between the demand for automated tissue analysis and the lack of high-quality pixel-wise annotations. To the best of our knowledge, this is the first study to address cardiac tissue segmentation on OCT images via weakly supervised learning techniques. On an in-vitro human cardiac OCT dataset, we demonstrate that our weakly supervised approach trained on image-level annotations achieves performance comparable to fully supervised methods trained on pixel-wise annotations.





1 Introduction

Cardiovascular disease is the leading cause of death in the United States, with atrial fibrillation alone affecting at least 2.3 million people [19]. Complex arrhythmias such as atrial fibrillation and ventricular tachycardia are treated with catheter ablation, which directly destroys the cardiac substrates that cause irregular impulse propagation. However, this treatment remains sub-optimal because optimal ablation targets cannot be accurately identified. With knowledge of a patient's heart structure, the ablation strategy can be further optimized by avoiding critical structures and identifying arrhythmia substrates, such as areas with increased amounts of adipose tissue. Recent work has shown that an increased amount of adipose tissue within the myocardium is a substrate for cardiac arrhythmias [34, 5, 30, 6, 35].

Optical coherence tomography (OCT) is a non-destructive optical imaging modality that can capture myocardial structures such as the Purkinje network [38], atrioventricular nodes [14], sinoatrial nodes [3], and myofiber organization [12]. In addition, it can resolve critical tissue substrates of arrhythmias, such as fibrosis and adipose tissue [26]. With the development of OCT-integrated catheters [8], OCT can image the heart wall in real time through percutaneous access [36], which holds promise to aid catheter ablation.

To benefit from the real-time capacity of OCT imaging, analysis of OCT images must be automated for timely decision making. Evaluating adipose tissue distribution within a human atrial sample requires pixel-wise analysis of large volumetric datasets [9]. Manually annotating adipose tissues within a single OCT volume can take a well-trained annotator over 10 hours. Therefore, automated identification of cardiac tissues, especially adipose tissue, in OCT images is greatly needed.

Current automated analysis of cardiac OCT images is mostly based on fully supervised learning models [24, 16, 15]. These models are limited by the manual workload of the labeling process: to avoid overfitting, a large amount of data is required to support model training, and for segmentation tasks the labeling process is extremely time-consuming and of limited accuracy. Moreover, OCT images are volumetric, adding a further challenge to labeling. Thus, automatic analysis with weakly supervised learning models is of great interest.

Figure 1: Representative OCT images from cardiac dataset. Sample (A) is obtained from right ventricle. Sample (B) and sample (C) are obtained from right atrium. Sample (D) is obtained from left atrium submerged in PBS solution. The features of adipose tissue present great variations among different locations and imaging conditions. The unclear boundary and irregular shape of adipose tissues add unique challenges for automated segmentation. Scale bar: .

Although recent studies have investigated weakly supervised analysis of retinal OCT images [37, 39], transferring retinal OCT segmentation techniques to weakly supervised cardiac OCT segmentation remains elusive for three reasons. First, cardiac adipose and fibrotic tissues can appear in multiple sub-regions with irregular shapes and infiltrating patterns; cardiac OCT images are therefore more complicated than retinal images, which have a rather regular layered structure. Second, the boundaries between cardiac substrates are blurrier than those between retinal layers. Third, cardiac substrates vary more across patients than retinal tissues do.

In this study, we present a weakly supervised learning framework for cardiac tissue segmentation using image-level labels. Our training approach has two stages, namely pseudo label generation and segmentation network training. We first use the class activation map (CAM) results obtained from a binary classification network to generate adipose location seeds. Then, we develop a superpixel-based segmentation algorithm to generate pseudo labels followed by segmentation training. Our contributions are as follows:
(1) We propose a weakly supervised learning framework for cardiac tissue segmentation. Our model is trained without the need for pre-training or domain adaptive learning.
(2) We combine CAM with superpixel segmentation to effectively address the sparse seed challenge caused by irregular shape and unclear boundary of adipose tissues.
(3) We evaluate our approach on a human cardiac dataset and demonstrate that our weakly supervised model achieves comparable performance with fully supervised algorithms.

Figure 2: Algorithm training flow of the proposed weakly supervised segmentation approach. The framework consists of two separate modules, namely pseudo label generation and segmentation network training. In the pseudo label generation module, pixel-wise pseudo annotations are generated by integrating CAM and superpixel methods. In the segmentation network training module, a segmentation network is trained on the pseudo labels with a novel loss function.

2 Related Work

Regarding tissue analysis on cardiac OCT images, [10] imaged and analyzed features of dense collagen, loose collagen, fibrotic myocardium, normal myocardium, and adipose tissue for automatic classification. In [28], segmentation was obtained from the variance map through compressive sensing reconstruction. In [26], the distributions of adipose tissues and fiber orientations were extracted and mapped throughout the human left atrium, while in [4], the visualization of cardiac fibers in the atrium, ventricle, atrioventricular node, and sinoatrial node was presented. Overall, conventional cardiac OCT image analysis relies on handcrafted features for tissue characterization or on fiber orientation-based methods that focus on myofibers.

Deep learning approaches have achieved great success in OCT image segmentation tasks [17, 29, 13, 27, 32, 25, 22]. [31] developed a fully convolutional network with Gaussian-process-based post-processing for retinal OCT segmentation. [7] proposed a novel framework that combines a hybrid convolutional neural network with a graph search method for retinal layer boundary detection.

[2] developed a fully convolutional AV-Net for artery-vein classification. Their model used a multi-modal training process involving both en-face OCT and optical coherence tomography angiography (OCTA) to provide intensity and geometric profiles. [16] trained a fully supervised segmentation network for cardiac tissue segmentation and used model uncertainty to estimate tissue heterogeneity. Existing work mainly relies on fully supervised learning.

In contrast to fully supervised methods, weakly supervised approaches use higher level labels to guide the pixel-level segmentation training process. [37] successfully segmented lesions by calculating the differences between the input abnormal images and normal-like retinal OCT images from a CycleGAN model. [39] employed a few shot learning technique for retinal disease classification and applied a GAN to enrich normal OCT images with OCT images of rare diseases. [18] proposed a Noise2Noise [23] based weakly supervised learning model for OCTA image reconstruction task.

3 Problem Analysis

Our study is conducted on a cardiac dataset acquired from 44 human hearts with a median donor age of 62 years. The dataset contains healthy hearts as well as hearts with end-stage heart failure, atrial fibrillation, coronary heart disease, cardiomyopathy, and myocardial infarction. Detailed clinical characteristics are presented in Section 5.1. These varied disease conditions can alter the visual features of cardiac substrates, raising the following unique challenges for algorithm design:

Various features and irregular shapes. As shown in Fig. 1, adipose regions present great variations among cardiac OCT images from human donors with cardiovascular disease due to heterogeneous heart remodeling. In Fig. 1 (A), intra-scan inconsistency can be clearly observed in the two sub-regions. Meanwhile, in comparison with Fig. 1 (B), the size of fat cells in Fig. 1 (A) is much smaller and the number of fat cells is larger, as indicated in the histology images. In addition, the distance to the endocardium tissue can also affect tissue appearance. Adipose tissues in Fig. 1 (C) are deeper in the myocardium and appear darker and blurrier than in Fig. 1 (B). Finally, the features and shapes can be further impacted by experimental conditions. In Fig. 1 (D), the OCT image was obtained from tissues submerged in phosphate buffered saline (PBS). In this sample, the adipose tissues have very low contrast with the surrounding normal tissues.

Similar pattern among adipose tissue and noise. Image noise and artifacts are inevitable during the acquisition process. Features of adipose tissue are very similar to those of speckle noise and artifacts.

Data imbalance and limited training data. In human cardiac samples, the majority of regions are normal tissues, such as myocardium and endocardium, rather than the targeted adipose tissues. In our dataset, only a fraction of the OCT images show large clusters of adipose tissues, and at the pixel level, adipose pixels account for only 2.6% of the total pixels to label. Even for images that contain adipose tissues, the ratio of adipose-related pixels to total pixels is very small. Hence, the samples that are informative for model training are very limited.

4 Methodology

In this paper, we propose a weakly supervised learning framework for the cardiac tissue segmentation task. We denote the space of images by $\mathcal{X}$. For any image $x \in \mathcal{X}$, the image-level annotation $y \in \{0, 1\}$ indicates whether $x$ contains adipose tissue. Figure 2 shows the pipeline of our proposed framework. As shown, our training approach consists of two major stages: pseudo label generation and segmentation network training. The pseudo label generation module has two components: we first apply the CAM approach to generate initial adipose seeds, and we then use a superpixel-based segmentation method to propagate the adipose seeds into pseudo pixel-wise labels. A detailed pseudo algorithm for the pseudo label generation module is listed in Algorithm 1. In the segmentation module, we introduce a novel loss function with a special focus on the adipose seed regions to increase the detection performance of our segmentation network.

4.1 Pseudo Label Generation

4.1.1 CAM-based seed localization

Image-level labels do not include any location clue for the target tissue and thus cannot be used directly to train the segmentation network. The first stage of our model is therefore to find reliable adipose seeds that indicate the location of adipose tissues. We use the class activation map to generate initial adipose seeds. Since adipose tissues do not have regular shapes and may appear in multiple cluster regions, we employ a global average pooling (GAP) layer to generate the class activation maps, as it has advantages over the global max pooling layer in identifying the full extent of target tissue regions.


The class activation maps have strong responses in regions with artifacts and high-intensity noise. To increase the reliability of pseudo label generation, we apply a boundary masking algorithm on the class activation maps to filter out adipose seeds located in the background regions (false positives caused by noise) and in regions close to the tissue-background boundary (false positives caused by artifacts). We adapt the cardiac layer segmentation algorithm from [10] and use the boundary of the topmost generated layer as the tissue-background interface. After obtaining the tissue surface, we remove adipose seeds above or close to the tissue-background boundary.
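The seed localization and boundary masking steps can be sketched as follows; the function names, the threshold, and the margin are illustrative assumptions, not the paper's exact network or settings.

```python
import numpy as np

def class_activation_map(features, class_weights):
    """CAM as the class-weighted sum of the final conv feature maps,
    rescaled to [0, 1]. Shapes and names are illustrative placeholders."""
    cam = np.tensordot(class_weights, features, axes=([0], [0]))  # (H, W)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()
    return cam

def mask_boundary_seeds(cam, surface_rows, threshold=0.6, margin=5):
    """Threshold the CAM into adipose seeds, then drop seeds at or above the
    detected tissue-background boundary (surface_rows[j] = surface row of
    column j); threshold and margin are assumed hyper-parameters."""
    seeds = cam >= threshold
    rows = np.arange(cam.shape[0])[:, None]      # (H, 1) row index grid
    near_surface = rows <= surface_rows[None, :] + margin
    seeds[near_surface] = False                  # false positives from noise/artifacts
    return seeds
```

In this sketch, a seed survives only if it is both a high-confidence CAM response and safely below the detected tissue surface, mirroring the two filtering criteria described above.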

4.1.2 Superpixel-based seed propagation

Superpixels are generated as in [1]. An entire superpixel is labeled as adipose tissue if any of its inner pixels is labeled as adipose tissue. After superpixel generation, the initial segmentation pseudo labels can be further improved by eliminating two kinds of misclassification: 1) the adipose seeds may omit some adipose regions, and 2) the adipose seeds may incorrectly mark normal regions as adipose regions due to artifacts and intensity noise. To remove these noisy annotations, we apply a Markov spatial regularisation strategy that adds the omitted regions and removes noisy adipose superpixels that contain only normal tissue. Since adipose cells are clustered in cardiac tissue, the neighbors of an adipose region are more likely to belong to the adipose class, while small isolated adipose superpixels are more likely to be normal regions corrupted by noise. Based on these criteria, we develop a simple yet effective spatial regularisation strategy: the label of a superpixel is updated if most of its neighbors belong to the other class.
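A minimal sketch of the seed propagation and neighbor-majority regularisation, assuming a precomputed superpixel id map; the majority threshold is a placeholder, since the paper does not state its value.

```python
import numpy as np

def propagate_seeds(sp_map, seed_mask):
    """Label a whole superpixel as adipose if any inner pixel is a seed."""
    return {int(sp) for sp in np.unique(sp_map) if seed_mask[sp_map == sp].any()}

def superpixel_neighbors(sp_map):
    """4-adjacency between superpixel ids on the pixel grid."""
    nbrs = {int(sp): set() for sp in np.unique(sp_map)}
    for axis in (0, 1):
        a = sp_map.take(range(sp_map.shape[axis] - 1), axis=axis)
        b = sp_map.take(range(1, sp_map.shape[axis]), axis=axis)
        for u, v in zip(a.ravel(), b.ravel()):
            if u != v:
                nbrs[int(u)].add(int(v))
                nbrs[int(v)].add(int(u))
    return nbrs

def spatial_regularise(sp_map, adipose, majority=0.5):
    """Flip a superpixel's label when most neighbors hold the other class:
    fills gaps inside adipose clusters and drops isolated adipose islands."""
    nbrs = superpixel_neighbors(sp_map)
    updated = set(adipose)
    for sp, nb in nbrs.items():
        if not nb:
            continue
        frac = sum(n in adipose for n in nb) / len(nb)
        if sp in adipose and 1 - frac > majority:
            updated.discard(sp)   # isolated adipose superpixel: likely noise
        elif sp not in adipose and frac > majority:
            updated.add(sp)       # gap inside an adipose cluster: add it
    return updated
```

Note that all updates are computed against the input labeling rather than sequentially, so the result does not depend on superpixel traversal order.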

Input: Training dataset $X$ with image-level labels $Y$;
Output: Pixel-wise pseudo labels.
Step 1: Train the localization network from $X$ and $Y$.
Step 2: Apply the CAM method to generate the initial tissue seeds $S_0$.
Step 3: Apply boundary masking on $S_0$ and get the updated tissue seeds $S_1$.
Step 4: Apply the superpixel-based propagation method on $S_1$ to generate the initial pseudo segmentation labels $P_0$.
Step 5: Update $P_0$ with the spatial regularisation strategy and get the final pseudo segmentation labels $P$.
Algorithm 1 Algorithm Framework for Pseudo Label Generation Module

4.2 Segmentation Network Training

Adipose tissues are sparse in comparison with normal tissues such as myocardium and endocardium. Without special consideration, segmentation performance might be severely limited by this data imbalance. A traditional way to address data imbalance is to add special weights to the minority classes. However, the initial pseudo segmentation maps generated by CAM-superpixels are not precise enough to support such a class-weighting strategy. To overcome this challenge, we use a seed loss, inspired by [21], to optimize our segmentation network.

First, we denote $p_c(i)$ as the predicted probability for class $c$ at pixel position $i \in \Omega$, and $y_c(i)$ as the one-hot encoding of the ground-truth annotation for class $c$, where $c \in \{1, \dots, C\}$ and $C$ is the number of classes. In this work, $C = 2$. The cross entropy loss (CEL) and seed loss (SL) are defined as follows:

$$\mathrm{CEL} = -\frac{1}{|\Omega|} \sum_{i \in \Omega} \sum_{c=1}^{C} y_c(i) \log p_c(i), \tag{1}$$

$$\mathrm{SL} = -\frac{1}{|S_c|} \sum_{i \in S_c} \log p_c(i), \tag{2}$$

where $S_c$ is the set of locations that are labeled with class $c$ (i.e., the adipose class). Compared with the CEL (Eq. 1), the SL (Eq. 2) focuses only on the regions of adipose tissue, and thus it helps to reduce the impact of false negatives in the pseudo segmentation labels.

We also use the Dice loss (DL) in our loss function to learn context information. The DL is defined as:

$$\mathrm{DL} = 1 - \frac{2 \sum_{i \in \Omega} p_c(i)\, y_c(i)}{\sum_{i \in \Omega} p_c(i) + \sum_{i \in \Omega} y_c(i)}. \tag{3}$$

Finally, our segmentation network is jointly optimized by the combination of CEL, SL, and DL:

$$\mathcal{L} = \mathrm{CEL} + \lambda_1 \mathrm{SL} + \lambda_2 \mathrm{DL}, \tag{4}$$

where $\lambda_1$ and $\lambda_2$ are weight hyper-parameters.
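The three loss terms described in this section can be sketched in NumPy as below; the adipose class index and the λ weights are illustrative assumptions, since the paper's tuned values are not reproduced here.

```python
import numpy as np

def cross_entropy_loss(probs, onehot, eps=1e-8):
    """CEL: cross entropy averaged over every pixel location.
    probs, onehot: (C, H, W) predicted probabilities / one-hot labels."""
    n_pixels = probs.shape[1] * probs.shape[2]
    return -np.sum(onehot * np.log(probs + eps)) / n_pixels

def seed_loss(probs, seed_mask, cls=1, eps=1e-8):
    """SL: evaluated only at the adipose seed locations, so false
    negatives elsewhere in the pseudo labels contribute nothing."""
    if not seed_mask.any():
        return 0.0
    return -np.mean(np.log(probs[cls][seed_mask] + eps))

def dice_loss(probs, onehot, cls=1, eps=1e-8):
    """DL: soft Dice loss for the adipose class."""
    p, y = probs[cls], onehot[cls]
    return 1.0 - 2.0 * np.sum(p * y) / (np.sum(p) + np.sum(y) + eps)

def total_loss(probs, onehot, seed_mask, lam1=1.0, lam2=1.0):
    """Joint objective CEL + lam1*SL + lam2*DL; lam1, lam2 are placeholders."""
    return (cross_entropy_loss(probs, onehot)
            + lam1 * seed_loss(probs, seed_mask)
            + lam2 * dice_loss(probs, onehot))
```

Because the seed loss averages only over seed pixels, a confident prediction on the seeds drives it to zero regardless of how the pseudo labels mis-mark the remaining regions.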

Characteristic Value
N 44
Demographic profile
  Age in years, median (average) 62 (62.2)
  Female, n (%) 20 (45.5)
Medical history, n (%)
  Heart failure 10 (22.7)
  Cardiomyopathy 8 (18.2)
  Coronary artery disease 11 (25.0)
  Myocardial infarction 10 (22.7)
  Atrial fibrillation 3 (6.8)
  Chronic obstructive pulmonary disease 16 (36.4)
  Diabetes 17 (38.6)
  Hypertension 27 (61.4)
Cause of death, n (%)
  Cardiac arrest 18 (40.9)
  Cardiopulmonary arrest 2 (4.5)
  Respiratory failure 5 (11.4)
  Chronic obstructive pulmonary disease 1 (2.27)
  Congestive heart failure 1 (2.27)
  Others, cardiac related 11 (25.0)
  Others, not cardiac related 6 (13.6)
Table 1: Clinical characteristics of heart donors
Figure 3: Comparison of tissue seeds before and after the boundary masking algorithm. Red: the detected tissue-background boundaries; blue: accurately annotated adipose seeds; green: false positives. As shown, the boundary masking algorithm can effectively remove the false positive adipose seeds caused by the artifacts and noise. Benefiting from it, the adipose seeds are more precise to be propagated for segmentation guidance. Scale bar: .
Figure 4: Comparison of pseudo labels with and without the spatial regularisation strategy. Blue: accurately annotated adipose pixels; red: false negatives. The spatial regularisation strategy helps to correct the mis-labeled pseudo labels by using the context information from nearby regions. After applying it, the false negatives have been significantly reduced. Scale bar: .
Metric Before After
Accuracy 80.79 ± 1.15 83.91 ± 2.45
Precision 56.43 ± 11.95 75.90 ± 8.18
Table 2: Evaluation metrics (%) of adipose tissue seeds before and after the boundary masking algorithm.
Method True Positive Rate False Positive Rate Dice Coefficient
Superpixel 71.17 ± 6.36 10.03 ± 2.87 67.09 ± 2.96
Superpixel + Spatial regularisation 71.77 ± 6.72 8.33 ± 2.90 69.70 ± 3.34
Table 3: Evaluation metrics (%) on tissue pseudo labels before and after the Markov spatial regularisation.
Method True Positive Rate False Positive Rate Dice Coefficient
U-Net (Fully Supervised) 83.73 ± 3.930 3.62 ± 1.05 80.53 ± 8.09
Proposed 86.05 ± 5.534 6.73 ± 2.63 79.67 ± 6.98
w/o boundary masking 72.63 ± 14.97 4.55 ± 2.07 73.56 ± 11.31
w/o Markov spatial regularisation 78.02 ± 7.90 4.47 ± 1.62 77.24 ± 6.31
Seed loss + Dice loss 88.38 ± 8.79 9.41 ± 2.07 73.35 ± 9.24
CE loss + Dice loss 74.63 ± 15.77 3.47 ± 0.71 75.94 ± 10.74
Table 4: Evaluation metrics (%) of different models on the whole dataset.
Figure 5: Representative segmentation results from human atrium and ventricle samples. Our proposed approach accurately identifies the adipose tissues located at different regions with various sizes and shapes. All prediction results are highly consistent with the ground truth labels. Scale bar:
Figure 6: The prediction results of images obtained from nearby regions. Our approach successfully pinpoints the adipose tissues from other tissue types, showing its strong identification ability on adipose tissues. Scale bar: 500 .
Figure 7: 3D visualization of adipose tissue segmentation. (A): the original OCT volume; (B): the original volume overlaid with segmented adipose regions; (C): the segmented adipose regions from proposed approach. The segmented boundaries accurately delineate the morphological changes in adipose shape.

5 Experiment Evaluation

5.1 Dataset

We evaluate the performance of our proposed model on the human cardiac dataset previously used in [9]. It consists of an in-vitro cohort of 385 images taken from 44 human atria and ventricles using a Thorlabs OCT system. The samples were acquired through a National Disease Research Interchange approved protocol from Columbia University. All specimens were de-identified and considered not human subjects research, according to the Columbia University Institutional Review Board under 45 CFR 46. Table 1 presents the detailed clinical characteristics of the human donor hearts. Each OCT image is of size 512 × 800 pixels with a field of view of 2.51 mm × 4 mm. Three experts, blinded to the algorithm design, annotated the OCT images under the guidance of a pathologist. All images were carefully annotated at the pixel level with visual cross-check against corresponding histology images. Our evaluation uses a five-fold cross validation strategy with validation sets randomly divided over human subjects.
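The subject-wise split (all images from one heart assigned to the same fold) can be sketched as follows; the function and variable names are illustrative, not the authors' code.

```python
import numpy as np

def subject_level_folds(subject_ids, n_folds=5, seed=0):
    """Partition image indices into folds so that all images from the same
    donor heart land in one fold, preventing subject leakage across splits."""
    rng = np.random.default_rng(seed)
    subjects = np.unique(subject_ids)
    rng.shuffle(subjects)                       # random assignment of hearts
    groups = np.array_split(subjects, n_folds)  # roughly equal-sized folds
    return [np.flatnonzero(np.isin(subject_ids, g)) for g in groups]
```

Splitting by subject rather than by image is what makes the cross-validation estimate reflect generalization to unseen hearts.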

5.2 Implementation Details

Seed localization network.

To avoid overfitting, we train only a small localization network with three hidden layers for adipose tissue seed generation. We use the ReLU activation function, and the numbers of neurons in the hidden layers are 32, 32, and 64. We use a GAP layer as the final layer of the localization network to learn the cluster pattern of adipose tissues. The networks are optimized with the cross entropy loss via the Adam optimizer [20] with random Glorot uniform initialization [11]. Over the cross validation sets, all networks converged within 300 epochs.


Segmentation network. We employ the classic medical segmentation network U-Net [33] as the baseline of our learning framework. The hyper-parameters in the loss function (Eq. 4) were determined according to the proportion of adipose tissues in the training set. All segmentation networks were randomly initialized and converged within 200 epochs.

5.3 Evaluation Metrics

In our experiments, we use accuracy and precision to evaluate the overall quality and detection performance of the adipose seed results. For pseudo label generation and segmentation evaluation, we use true positive rate (detection rate), false positive rate, and Dice coefficient (F1 score) to evaluate tissue segmentation performance.
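For boolean prediction and ground-truth masks, the segmentation metrics named above reduce to confusion-matrix counts; a straightforward sketch:

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Per-image TPR (detection rate), FPR, and Dice coefficient
    computed from boolean segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    tn = np.sum(~pred & ~gt)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    return tpr, fpr, dice
```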

5.4 Evaluation of Pseudo Label Generation

Adipose tissue seed localization. The binary accuracy of our proposed localization network is very stable across all validation sets. Figure 3 presents two representative adipose seed results generated from our localization network. In Fig. 3, the detected tissue-background boundary is delineated in red, the accurately located adipose seeds are marked in blue, and the mis-classified adipose seeds are marked in green. As seen, the boundary masking algorithm effectively removes the misclassified edges and background noise from the original adipose seed results, and the adipose tissues are successfully marked with seeds. In Table 2, we report the accuracy and precision of adipose seeds before and after boundary masking. Both accuracy and precision improve after applying the boundary masking algorithm. In particular, the precision of adipose seeds increases by approximately 20 percentage points, which clearly demonstrates the effectiveness of boundary masking.

Pseudo label generation. Figure 4 shows two superpixel segmentation results, with accurately segmented pixels marked in blue and false negatives marked in red. In Table 3, we provide a quantitative evaluation of our spatial regularisation strategy: the Dice coefficient increases by 2.61% after applying it.

5.5 Comparison with Fully Supervised Segmentation

5.5.1 Cross Validation Experiments

We use fully supervised models trained on accurate pixel-wise segmentation masks as the baseline for comparison. Table 4 summarizes the averaged results and standard deviations of our proposed weakly supervised approach and the fully supervised baselines.

Weakly supervised learning vs fully supervised learning. Our weakly supervised model, trained from image-level labels, achieves results of comparable quality to the fully supervised model trained on pixel-wise labels. In addition, our dataset was acquired over a time frame spanning more than five years, during which the imaging setup, including sample freshness, imaging conditions, and tissue preparation, varied among experiments. Thus, our results also demonstrate the generalization ability of our model against imaging condition variance, showing its strong potential for real-world clinical applications.

Ablation study. Performance drops are observed in Table 4 when the boundary masking algorithm or the spatial regularisation strategy is removed. In particular, after removing the boundary masking algorithm, the true positive rate decreases severely and its standard deviation increases. This result indicates the necessity of the boundary masking algorithm for improving adipose seed quality at the early stage of pseudo label generation. Compared with the false positive rate, the true positive rate changes notably after applying the spatial regularisation, which shows its effectiveness at correcting false negatives.

We further conduct experiments to assess the influence of different loss functions in our proposed model. As shown in Table 4, the use of the seed loss notably increases the model's detection performance but meanwhile worsens the false positive rate. In contrast, the cross entropy loss is more effective at controlling false alarms. These results show that the seed loss efficiently reduces the impact of false negatives in the pseudo labels. This trade-off between detection rate and false alarms can be balanced by adjusting the weights of the seed loss and cross entropy loss.

5.5.2 Representative Segmentation Results

In this section, we present the visual output of our proposed weakly supervised model in overall performance, small adipose tissue region detection, and 3D segmentation.

Overall performance. Figure 5 shows the predicted tissue maps on four human cardiac samples. In Fig. 5 (A) and (B), our model accurately localizes adipose tissue regions of arbitrary shape. Meanwhile, in Fig. 5 (A), we can also observe over-segmented regions (regions at the left corner) in the ground truth. Human annotators tend to over-segment regions below the penetration depth, while the network may identify these regions as non-adipose tissue because of the low signal-to-noise ratio. This over-segmentation tendency can lower the evaluation metrics. In Fig. 5 (C), our model successfully identifies adipose tissues in multiple regions with different penetration depths. In Fig. 5 (D), we show a human atrium sample that is slightly out of focus. As in the previous results, our model still accurately differentiates the adipose tissues from other tissues, showing its robustness over different image qualities. In all cases, the predicted results are highly consistent with the ground truth labels. These results demonstrate the learning ability of our model via image-level labels, showing its effectiveness for clinical tissue identification.

Identifying small adipose tissue regions. Figure 6 presents two images obtained from nearby regions within the same human heart. As shown, Fig. 6 (A) and (B) are very similar, and both contain large regions of fibrotic tissue. However, in Fig. 6 (A) there is a small cluster of adipose tissue surrounded by the fibrotic tissue, while in Fig. 6 (B) there is no adipose tissue. This is a very challenging segmentation task due to the small size of the adipose tissue and the blurry boundaries between tissue types. Our model accurately delineates the adipose tissue region in Fig. 6 (A) and produces no false alarms in Fig. 6 (B). In both cases, our model successfully distinguishes adipose tissues from other cardiac substrates. These results further demonstrate the strong learning ability of our model: it learns the most discriminative features via image-level labels rather than simply memorizing the training samples.

Visualization of 3D segmentation. Figure 7 shows a typical 3D visualization of adipose tissue segmentation. We sequentially apply the trained network to segment consecutive B-scans and stack the segmented B-scans in 3D space. As shown, the segmented boundaries accurately delineate the morphological changes in the adipose tissues. Even though our model is trained on a small quantity of training data with image-level labels, it still successfully segments adipose regions of various sizes. These results indicate that our model has great potential for assessing adipose tissue regions in catheter-based ablation operations.
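The B-scan stacking step can be sketched as a one-liner; `segment_fn` is a stand-in for the trained network's forward pass, not the authors' implementation.

```python
import numpy as np

def segment_volume(bscans, segment_fn):
    """Apply a trained 2D segmentation function to consecutive B-scans and
    stack the per-scan boolean masks into a 3D volume (scan, row, col)."""
    masks = [np.asarray(segment_fn(b), dtype=bool) for b in bscans]
    return np.stack(masks, axis=0)
```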

6 Discussion

In this study, we propose a weakly supervised learning framework for cardiac adipose tissue segmentation on OCT images. Our approach contains two modules: pseudo label generation and segmentation network training. In the pseudo label generation module, we use a superpixel-based propagation algorithm to address the sparse location seed challenge arising from the CAM results. Benefiting from our boundary masking algorithm and spatial regularisation strategy, the quality of the pseudo labels is significantly improved for training guidance. In the segmentation network training module, we introduce a novel loss function to increase adipose tissue detection performance. Evaluated on a human cardiac dataset with a cross validation strategy, our model achieves results comparable to the fully supervised baseline, showing its effectiveness for tissue characterization. Our study bridges the gap between the demand for automated tissue analysis and the lack of high-quality pixel-wise annotations. To the best of our knowledge, this is the first study to address cardiac tissue segmentation via weakly supervised learning techniques.

One limitation of this study is that the results are evaluated on a benchtop OCT system. To aid ablation procedures, a catheter-based OCT system is needed, as it can help optimize the treatment strategy by providing real-time cardiac substrate information. In the future, we will extend our current work to catheter-based in-vivo OCT images. Such an extension will require further investigation of challenges such as image quality degradation and motion disturbance. Compared with benchtop OCT images, catheter-based OCT images have lower image quality, suffering from lower contrast and motion effects. Without special consideration, the decreased image contrast may hinder model performance. Motion disturbance caused by breathing and heartbeat is another important factor that could lead to performance degradation; these disturbances could be partially corrected by applying low-pass filters. In the future, we will also extend our proposed weakly supervised framework to other OCT segmentation tasks, such as breast and retinal images.

7 Conclusion

In this paper, we propose the first weakly supervised learning framework for adipose tissue segmentation in human cardiac OCT images. We design a novel CAM-superpixel segmentation approach that converts sparse CAM results into pseudo pixel-wise labels for training. In addition, we analyze the necessity and effectiveness of the proposed steps and loss functions. Experimental results on the human cardiac dataset demonstrate that our model achieves performance comparable to models trained with full masks, showing the learning capability of our proposed model using only image-level labels.
