JAS-GAN: Generative Adversarial Network Based Joint Atrium and Scar Segmentations on Unbalanced Atrial Targets

05/01/2021 ∙ by Jun Chen, et al. ∙ MSN Imperial College London SUN YAT-SEN UNIVERSITY 11

Automated and accurate segmentations of left atrium (LA) and atrial scars from late gadolinium-enhanced cardiac magnetic resonance (LGE CMR) images are in high demand for quantifying atrial scars. The previous quantification of atrial scars relies on a two-phase segmentation for LA and atrial scars due to their large volume difference (unbalanced atrial targets). In this paper, we propose an inter-cascade generative adversarial network, namely JAS-GAN, to segment the unbalanced atrial targets from LGE CMR images automatically and accurately in an end-to-end way. Firstly, JAS-GAN investigates an adaptive attention cascade to automatically correlate the segmentation tasks of the unbalanced atrial targets. The adaptive attention cascade mainly models the inclusion relationship of the two unbalanced atrial targets, where the estimated LA acts as the attention map to adaptively focus on the small atrial scars roughly. Then, an adversarial regularization is applied to the segmentation tasks of the unbalanced atrial targets for making a consistent optimization. It mainly forces the estimated joint distribution of LA and atrial scars to match the real ones. We evaluated the performance of our JAS-GAN on a 3D LGE CMR dataset with 192 scans. Compared with the state-of-the-art methods, our proposed approach yielded better segmentation performance (Average Dice Similarity Coefficient (DSC) values of 0.946 and 0.821 for LA and atrial scars, respectively), which indicated the effectiveness of our proposed approach for segmenting unbalanced atrial targets.


I Introduction

Automated and accurate segmentations of left atrium (LA) and atrial scars from late gadolinium enhanced cardiac magnetic resonance (LGE CMR) are crucial for the quantification of atrial scars. The quantification of atrial scars usually requires the segmentations of the LA and atrial scars to obtain an accurate estimation of the scar percentage [ravanelli2014novel], helping the treatment stratification of patients with atrial fibrillation (AF) before and after radio-frequency catheter ablation [Karim2013Evaluation, vergara2011tailored]. Clinically, LGE CMR imaging allows the visualization of scar tissues through the amount of contrast agent left due to differences in interstitial cell structures [siebermair2017assessment]. Thus, LGE CMR has emerged as a promising technique to non-invasively detect and locate atrial scars to further provide the accurate quantification of atrial scars [siebermair2017assessment]. In clinical practice, this generally relies on manual segmentations of both the LA and atrial scars [khurram2016left], which is time-consuming. Automated segmentations of the LA and atrial scars from LGE CMR images would facilitate the rapid and reproducible quantification of atrial scars.

However, automated and accurate segmentation of the LA and atrial scars from LGE CMR images is very challenging because the two targets are unbalanced, with a large difference in volume, as shown in Fig. 1. Firstly, for the segmentation task of the LA, LGE CMR imaging is generally used to visualize scar tissue by enhancing its signal intensity. This attenuates the contrast in non-diseased tissue [xiong2020global]. The attenuated contrast in the healthy LA reduces the visibility of the LA boundaries, which limits the use of edge- and region-based methods for automated and accurate segmentation of the LA. Secondly, for the segmentation task of atrial scars, atrial scars occupy only a very small portion of the LA volume and are therefore highly susceptible to noise interference. Besides, compared with the voxels in the background, the amount of information available on the small atrial scars is very limited, which results in a severe class-imbalance problem that hinders the automated and accurate segmentation of atrial scars. Furthermore, many nearby tissues (e.g., the aortic wall and oesophagus) are enhanced by LGE CMR imaging along with atrial scars, which can also interfere with the accurate recognition of the atrial scars. To tackle these difficulties, most work in this field has focused on a separated two-phase segmentation framework, where the LA is obtained first, followed by the delineation of the small atrial scars. This two-phase segmentation framework suffers from inefficiency and error accumulation.

Fig. 1: Examples of transverse LGE CMR slices (left) together with manual segmentations (middle) and 3D visualization (right) in a pre-ablation scan (top row) and a post-ablation scan (bottom row) from two patients. The red regions denote the LA while the green ones denote the atrial scars.

In order to overcome the issues mentioned above, we investigate end-to-end joint learning for two semantic segmentation tasks of the unbalanced targets of LA and atrial scars. Because the atrial scars are located in the LA wall, there exists an inclusion relationship between the large LA and small atrial scars. We can make full use of the inclusion relationship to mine the dependent correlation between the segmentation tasks of LA and atrial scars for their joint learning. However, the LA and atrial scars are unbalanced targets as shown in Fig. 1, which can bring the problem of inconsistent target learning. Hence, we further investigate the adversarial learning for consistent target learning [isola2017image, hung2019adversarial].

In this paper, we propose a Joint Atrium (i.e., LA) and Scar (i.e., atrial scars) segmentation framework based on an inter-cascade Generative Adversarial Network, namely JAS-GAN, from LGE CMR images. In our proposed JAS-GAN, the inclusion relationship between the unbalanced targets of large LA and small atrial scars is effectively mined for their accurate and joint segmentations. Our proposed JAS-GAN consists of an adaptive attention cascade network and a joint discriminative network: (1) The adaptive attention cascade network contains an encoder-decoder module for LA segmentation and a residual network for atrial scars segmentation. The two modules are cascaded through an adaptive attention connection to model the spatial correlation of the LA and atrial scars. The adaptive attention connection makes full use of the segmented LA as an attention map to further roughly focus on the small atrial scars in an end-to-end way. (2) The joint discriminative network further transforms the segmentation problem of pixel-level classification for the unbalanced targets of LA and atrial scars into a problem of pixel-level identification, that is, whether the pixels at the same position in the LA and atrial scar segmentation maps are produced by the adaptive attention cascade network or from the ground truth label maps. It mainly employs an adversarial regularization to force the estimated joint distribution of LA and atrial scars to match the real ones, which can provide a consistent optimization for the segmentation task learning of unbalanced atrial targets.

Finally, the contributions of our framework can be summarized as follows:

  1. We propose an end-to-end segmentation framework for the LA and atrial scars to facilitate the rapid and reproducible quantification of atrial scars. The framework can further provide the essential guidance for clinicians to analyze the structures of LA and atrial scars directly from 3D LGE CMR images.

  2. We propose an inter-cascade adversarial learning paradigm to mine the relationship of unbalanced targets automatically by modelling their position and joint distribution.

  3. We conducted comprehensive experiments on a 3D LGE CMR dataset with 192 scans to validate our proposed JAS-GAN. The results demonstrated that JAS-GAN outperformed state-of-the-art and traditional methods, which indicated the feasibility of the segmentation framework for unbalanced atrial targets.

II Related Work

II-A Two-Phase Segmentation Methods for Quantifying Atrial Scars

Currently, the methods most closely related to the quantification of atrial scars rely on a two-phase sequential segmentation of the LA and atrial scars [Karim2013Evaluation, Yang2018Fully]. These methods are inadequate for accurate quantification because the segmentations of the LA and atrial scars are handled separately. No feedback loop exists between them during model learning, which leads to the error accumulation problem.

In these methods, segmenting the LA cavity or LA wall is usually the first step to further locate the atrial scars. Furthermore, instead of directly segmenting the LA cavity or LA wall from LGE CMR scans, some methods rely on a separately acquired breath-hold magnetic resonance angiogram (MRA) study or on a respiratory and cardiac gated 3D Roadmap acquisition for LA segmentation. Then, they registered the segmented LA to the LGE CMR acquisition for the delineation of atrial scars. Previously proposed methods for LA wall segmentation include (1) manual segmentation [Karim2013Evaluation, Perry2015Automatic, ravanelli2014novel], which is tedious and inefficient, (2) segmentation of the LA cavity followed by some morphological dilations for LA wall extraction [karim2014method], and (3) automated or semi-automatic LA wall segmentation, e.g., active contour based segmentation [Karim2013Evaluation]. Furthermore, many automated methods have been proposed for segmenting the LA [Mortazi2017CardiacNET, Tobon2015Benchmark, XiongFully, chen2019discriminative, yu2019uncertainty, zhuang2019evaluation]. However, they have not yet been further applied to the quantification of the atrial scars.

Based on the segmented LA wall, histogram analysis, thresholding, k-means clustering, and graph-cuts based unsupervised methods have been applied to segment atrial scars [Karim2013Evaluation]. However, these unsupervised learning methods are susceptible to various image quality and noise conditions. Yang et al. [Yang2018Fully] proposed deep learning and support vector machines based supervised classification methods to segment atrial scars and achieved better results. However, it still relies on a two-phase segmentation for LA and atrial scars.

Fig. 2: The architecture of our proposed JAS-GAN for the joint segmentations of unbalanced atrial targets. Each 3D input volume is sliced at axial plane then fed into the JAS-GAN. An adaptive attention cascade network correlates the spatial location of LA and small atrial scars. A joint discriminative network is designed to regularize the adaptive attention cascade network to produce matched joint distribution of unbalanced atrial targets.

II-B Cascade Learning

Cascading is an efficient structure for improving the performance of deep learning based single-task or multi-task solvers [li2017not, murthy2016deep, cai2018cascade, dai2016instance, ouyang2017chained, lin2017cascaded, chen2019hybrid], and has been widely used in various applications including classification, detection and segmentation. For single-task learning, the cascade can be divided into multiple learning stages. The later stages can focus on more accurate learning to improve the performance stage by stage and achieve faster inference. For multi-task problems, the tasks are arranged in a cascade such that the tasks at a later stage depend on the output of an earlier stage.

III Methodology

Goal: Learn a segmentation model for unbalanced atrial targets from LGE CMR images.

Notation: The input image has a given spatial size and number of channels. Subscripts distinguish quantities for the LA from those for the atrial scars. The model produces an estimated LA map and an estimated atrial scar map, which are also considered jointly as a pair; each has a corresponding ground truth map, and the ground truth maps are likewise considered jointly. Two weight maps adjust the attention operation, and A represents the enhanced map. The real confidence map and the fake confidence map share a common spatial size and have one channel. The adaptive attention cascade network is the combination of the encoder-decoder network (EDN), the residual network (RN) and the convolutional long short-term memory (convLSTM) based adaptive attention connection module (AC). T represents the joint discriminative network.

III-A An Overview of JAS-GAN

Fig. 2 displays our proposed JAS-GAN. JAS-GAN mainly comprises an adaptive attention cascade network and a joint discriminative network. Specifically, the adaptive attention cascade network is designed for the joint segmentations of LA and atrial scars. It consists of an encoder-decoder network for LA segmentation, a residual network for atrial scars segmentation, and an adaptive attention connection module that cascades the two. The joint discriminative network, conditioned on the input image, is designed to force the adaptive attention cascade network to produce a correct joint distribution of LA and atrial scars.

III-B Adaptive Attention Cascade Network for Unbalanced Atrial Targets Simultaneous Estimation

We model the inclusion relationship of LA and atrial scars to build a cascade segmentation network for their joint segmentation. Firstly, an encoder-decoder network establishes a mapping that estimates the LA directly from the input image. Because the small atrial scars are located in the LA wall, the LA can be taken as prior knowledge to constrain the learnable area of atrial scars, which reduces the interference of noise outside the LA with atrial scar identification. Therefore, we model their inclusion relationship to leverage the LA to focus roughly on the small atrial scars. Because the voxel values of the predicted LA lie between 0 and 1, the predicted LA can be used as an attention map on the input image, where 1 represents full attention and 0 represents no attention, to focus roughly on the atrial scars. However, atrial scars are distributed along the border of the LA, as shown in Fig. 1. If the encoder-decoder network produces an under-segmented LA, the general attention operation may weaken the atrial scars partially or completely. Therefore, we further investigate an adaptive attention module that estimates an enhanced map A for scar identification by mining the relationship between the input image and the corresponding estimated LA to adaptively adjust the attention operation. In detail, the adaptive attention connection module first establishes a mapping based on a convLSTM that learns the relationship between the input image and the estimated LA and estimates two weight maps. The two weight maps are then used to adaptively adjust the attention operation:

(1)

where the product is taken element-wise. As shown in Equation (1), the two weight maps adaptively adjust the general attention operation between the estimated LA and the input image. Based on the obtained enhanced map A, we can separate atrial scars from the surrounding enhanced tissues and organs whose intensities are highly similar to those of scars. We then use a residual network to establish another mapping that estimates the atrial scars from A. The residual network discards the downsampling operation to avoid information loss for the small atrial targets.
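The effect described above can be illustrated with a small sketch. The blend used here, `w1 * (la_prob * image) + w2 * image`, is an assumed form chosen for illustration and is not claimed to be the paper's Equation (1); the function names, the toy slice and the weight values are likewise hypothetical.

```python
# Illustrative only: the blend below is an assumed form, not the paper's Equation (1).

def general_attention(image, la_prob):
    """Plain attention: weight each pixel of the image by the predicted LA probability."""
    return [[x * p for x, p in zip(img_row, la_row)]
            for img_row, la_row in zip(image, la_prob)]

def adaptive_attention(image, la_prob, w1, w2):
    """Assumed adaptive form: w1 * (la_prob * image) + w2 * image, all element-wise."""
    attended = general_attention(image, la_prob)
    return [[a * att + b * x
             for att, x, a, b in zip(att_row, img_row, w1_row, w2_row)]
            for att_row, img_row, w1_row, w2_row in zip(attended, image, w1, w2)]

# A 1x4 toy slice: the last pixel is a bright scar on the LA border that an
# under-segmented LA map misses (probability 0.0).
image   = [[0.2, 0.4, 0.4, 0.9]]
la_prob = [[0.0, 1.0, 1.0, 0.0]]
w1      = [[1.0, 1.0, 1.0, 1.0]]
w2      = [[0.0, 0.0, 0.0, 0.5]]   # hypothetical weights that restore the border pixel

plain = general_attention(image, la_prob)          # border scar is zeroed out
enhanced = adaptive_attention(image, la_prob, w1, w2)  # border scar is partly kept
```

With plain attention the border pixel is suppressed entirely, whereas the second weight map lets part of the original intensity through, which is the behaviour the adaptive connection is meant to learn.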

Therefore, to achieve the joint estimation of the LA and atrial scars, we directly chain the three modules into a single function, defined as the composition of the encoder-decoder network, the adaptive attention connection module and the residual network. In this case, the encoder-decoder network first estimates the LA from the input image. Then the adaptive attention connection module produces an adaptive attention map from the estimated LA and the input image. Finally, the residual network estimates the atrial scars from this map. The adaptive attention cascade network integrates the segmentations of LA and atrial scars into one step through this seamless cascade connection. Such a connection lets model learning leverage the large LA to capture the small atrial scars. It can further relieve error accumulation and noise interference automatically for the accurate segmentation of small atrial scars.
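The three-stage chaining can be sketched as a function composition. This is a minimal illustration, assuming each stage is simply a callable; the `cascade` helper and the threshold-based stand-ins for the trained networks are hypothetical.

```python
from typing import Callable, List, Tuple

Slice = List[List[float]]

def cascade(edn: Callable[[Slice], Slice],
            ac: Callable[[Slice, Slice], Slice],
            rn: Callable[[Slice], Slice]) -> Callable[[Slice], Tuple[Slice, Slice]]:
    """Chain the three stages so one call yields both the LA and the scar estimate."""
    def forward(image: Slice) -> Tuple[Slice, Slice]:
        la = edn(image)            # stage 1: LA estimate from the image
        attention = ac(la, image)  # stage 2: attention map from LA estimate and image
        scars = rn(attention)      # stage 3: scar estimate from the attention map
        return la, scars
    return forward

# Hypothetical stand-ins for trained networks (simple thresholds, illustration only).
toy_edn = lambda img: [[1.0 if v > 0.3 else 0.0 for v in row] for row in img]
toy_ac = lambda la, img: [[p * v for p, v in zip(lr, ir)] for lr, ir in zip(la, img)]
toy_rn = lambda att: [[1.0 if v > 0.8 else 0.0 for v in row] for row in att]

model = cascade(toy_edn, toy_ac, toy_rn)
la_map, scar_map = model([[0.1, 0.5, 0.9]])
```

Because the scar stage only ever sees the output of the attention stage, gradients from the scar loss can flow back into the LA stage when the stages are differentiable, which is what distinguishes this from a separated two-phase pipeline.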

III-C Joint Discriminative Network for Adversarial Regularization

The adaptive attention cascade network is optimized to produce the right class label at each voxel location independently. We further investigate adversarial learning to transform the unbalanced target segmentation, which classifies the large LA and small atrial scars, into the task of identifying whether the pixels at the same position in the LA and atrial scar segmentation maps are produced by the adaptive attention cascade network or come from the ground truth. The adversarial learning regularizes the adaptive attention cascade network to force the estimated joint distribution of LA and atrial scars to match the real one. Specifically, a joint discriminative network conditioned on the input image maps the ground truth pair and the estimated pair of LA and atrial scar maps to a real confidence map and a fake confidence map, respectively. Each pixel of a confidence map, indexed by its spatial position, represents whether the pixels at the same position in the LA and atrial scar segmentation maps are sampled from the ground truth label or produced by the adaptive attention cascade network. Given prior distributions on the two atrial targets and on the image, we consider the following objective:

(2)

where the objective uses the sigmoid function and is evaluated on the real confidence map and the fake confidence map at each spatial location. Equation (2) makes the learning of the adaptive attention cascade network and the joint discriminative network a dynamic adversarial process that regularizes the former so that the estimated joint distribution of LA and atrial scars matches the real one. Specifically, on the one hand, the joint discriminative network tries to distinguish the estimated LA and atrial scars from the ground truth by

(3)

On the other hand, the mismatches between the estimated joint distribution and the real joint distribution can be penalized by

(4)

Beyond the optimization that encourages the model to estimate the right class label at each voxel location independently, this term acts as a regularizer that drives the cascade segmentation network to approximate the real joint distribution of the unbalanced atrial targets.
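A pixel-wise adversarial objective of this kind can be sketched as follows. This is a minimal illustration using the standard binary cross-entropy form of a conditional GAN, which is an assumption here and not necessarily the exact form of Equations (2) to (4); the function names are hypothetical, and confidence maps are given as flattened lists of values already passed through a sigmoid.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def discriminator_loss(real_conf, fake_conf, eps=1e-7):
    """Pixel-wise binary cross-entropy: real confidences pushed to 1, fake ones to 0."""
    real_term = _mean([-math.log(max(p, eps)) for p in real_conf])
    fake_term = _mean([-math.log(max(1.0 - p, eps)) for p in fake_conf])
    return real_term + fake_term

def adversarial_regularizer(fake_conf, eps=1e-7):
    """Penalty on the segmenter: large when the discriminator spots its output as fake."""
    return _mean([-math.log(max(p, eps)) for p in fake_conf])

# A confident discriminator has a low loss; a fooled one has a high loss.
confident = discriminator_loss([0.9, 0.9], [0.1, 0.1])
undecided = discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

Since every pixel of the confidence map contributes one term, a mismatch between the LA map and the scar map at a single location is penalized even when each map looks plausible on its own.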

III-D Objective Function for Model Learning

The objective function of JAS-GAN is designed to produce reliable results in both the segmentation process and the adversarial training process. Beyond the adversarial training, the adaptive attention cascade network is optimized independently by minimizing the following objective:

(5)

where one term is the voxel-wise cross-entropy function and the other is the Dice-like loss function [Milletari2016V], which addresses the segmentation of small atrial scars. Two weight parameters balance the segmentation losses of the LA and atrial scars.
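The two loss terms can be sketched as below. This is a minimal illustration assuming flattened probability maps and 0/1 labels; the pairing of cross-entropy with the LA and the Dice-like loss with the scars, as well as the names `w_la` and `w_scar`, are assumptions, and the Dice term follows the squared-denominator style of the V-Net formulation.

```python
import math

def cross_entropy(pred, target, eps=1e-7):
    """Mean voxel-wise binary cross-entropy between probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep the logarithm finite
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)

def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss with squared terms in the denominator."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(p * p for p in pred) + sum(t * t for t in target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def segmentation_loss(la_pred, la_true, scar_pred, scar_true, w_la=1.0, w_scar=1.0):
    """Weighted sum of the LA and scar losses; the loss-to-target pairing is assumed."""
    return (w_la * cross_entropy(la_pred, la_true)
            + w_scar * dice_loss(scar_pred, scar_true))
```

The Dice term is normalized by the size of the prediction and the target rather than by the number of voxels, so a handful of scar voxels is not drowned out by the background the way it would be under plain cross-entropy.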

Then, adversarial learning is applied to the adaptive attention cascade network and the joint discriminative network to further regularize the segmentations of LA and atrial scars. We integrate the adversarial learning into the voxel-wise estimations of LA and atrial scars. In this case, the adaptive attention cascade network can be learned by minimizing the following objective:

(6)

where a weight parameter balances the segmentation loss and the adversarial loss. The joint discriminative network can be learned by directly maximizing the adversarial objective.

III-E Network Configuration

The framework of our proposed JAS-GAN mainly consists of a cascade segmentation network and a joint discriminative network (please see the detailed structure of our proposed network architecture in the Appendix). The cascade segmentation network contains three modules: the encoder-decoder network, the adaptive attention connection module and the residual network. The encoder-decoder network is based on the 2D U-Net [Ronneberger2015U] but uses bilinear interpolation for upsampling. The adaptive attention connection module is based on a convolutional LSTM network with 32 kernels. The residual network is based on the residual structure [he2016deep]. It comprises three residual blocks. Each residual block contains three convolution blocks, and each convolution block consists of a convolution layer, a batch normalization layer and a ReLU layer. Finally, a convolutional layer with a sigmoid function is used to predict atrial scars.

The structure of the joint discriminative network is similar to [hung2019adversarial]. It comprises 4 convolution layers with {32, 64, 128, 256} channels and a stride of two. Each convolution layer is followed by a batch normalization layer and a ReLU layer. Then, a sub-pixel convolution with 3072 kernels rescales the output of the last convolution layer to the target size. Finally, a convolutional layer with a sigmoid function is used to predict the confidence maps.
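The shape bookkeeping of the joint discriminative network can be checked with a short calculation. This is a rough sketch under stated assumptions: 'same' padding so that each stride-two layer halves the spatial size, a sub-pixel step that restores the input resolution, and a hypothetical 256 x 256 input; none of these are confirmed by the text, and `discriminator_shapes` is an illustrative helper.

```python
def discriminator_shapes(height, width, channels=(32, 64, 128, 256), out_channels=3072):
    """Track (H, W, C) through four stride-2 convolutions and a sub-pixel rescaling.

    Assumes 'same' padding, so each stride-2 layer halves H and W (rounding up).
    """
    shapes = []
    h, w = height, width
    for c in channels:
        h, w = (h + 1) // 2, (w + 1) // 2
        shapes.append((h, w, c))
    # Sub-pixel convolution: a conv producing out_channels maps, then a pixel shuffle
    # that trades r*r channels for an r-fold increase in each spatial dimension.
    r = height // h
    shapes.append((h, w, out_channels))
    shapes.append((h * r, w * r, out_channels // (r * r)))
    return shapes

shapes = discriminator_shapes(256, 256)  # 256 x 256 is a hypothetical input size
```

Under these assumptions the four stride-two layers shrink the map by a factor of 16, so 3072 kernels correspond to 3072 / 16² = 12 feature maps at full resolution, which the final convolution would then reduce to a single confidence channel.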

IV Experiments and Results

IV-A Data Description

CMR data were acquired in patients with longstanding persistent AF on a Siemens Magnetom Avanto 1.5T scanner (Siemens Medical Systems, Erlangen, Germany). Transverse navigator-gated 3D LGE CMR [peters2009recurrence] was performed using an inversion prepared segmented gradient echo sequence (TE/TR 2.2ms/5.2ms) 15 minutes after gadolinium administration (Gadovist, gadobutrol, 0.1mmol/kg body weight, Bayer-Schering, Berlin, Germany) [haissaguerre1998spontaneous]. The inversion time (TI) was set to null the signal from normal myocardium and varied on a beat-by-beat basis, dependent on the cardiac cycle length [keegan2015dynamic]. For the LGE CMR acquisition, 30-34 slices were acquired and reconstructed to 60-68 slices. For each patient, prior to contrast agent administration, coronal navigator-gated 3D Roadmap (TE/TR 1ms/2.3ms) data were acquired with 72-80 slices, reconstructed to 144-160 slices. LGE CMR was acquired during free-breathing using a crossed-pairs navigator positioned over the dome of the right hemi-diaphragm, with a navigator acceptance window and CLAWS respiratory motion control [keegan2014improved, keegan2014navigator]. LGE CMR data were collected from 2011-2018 as a retrospective study. In total, 192 scans from 115 subjects, including 97 pre-ablation and 95 post-ablation scans, were used in this study. All subjects gave their informed consent for inclusion before they participated in the study, with approval from the local institutional review board in accordance with the Declaration of Helsinki (Ethics approval reference number: 10/H0701/112, CMR Unit, Royal Brompton Hospital). Manual segmentations of the LA, proximal PVs and atrial scars were performed by a physician with 3 years of experience who specialized in LGE CMR. A second senior radiologist (25 years of experience, specialized in cardiac MRI) confirmed the manual segmentations. The results confirmed by the radiologist were chosen as the ground truth for experiments. The LA label is the LA epicardium (LA wall and LA cavity). Because the atrial scars are located in the LA wall, the atrial scars label is encapsulated in the LA label.

Target Pre-/Post-ablation DSC JI ASD (mm) NMI
LA and PVs Pre-ablation
Post-ablation
Pre-&Post-ablations
Atrial scars Pre-ablation
Post-ablation
Pre-&Post-ablations
TABLE I: Quantitative results of unbalanced atrial targets segmentation on pre-ablation, post-ablation and pre-&post-ablations in terms of DSC, JI, ASD and NMI. Results are presented in the form of mean ± standard deviation. Abbreviations: DSC, Dice Similarity Coefficient; JI, Jaccard Index; ASD, Average Surface Distance; NMI, Normalized Mutual Information.

IV-B Implementation Details

We performed data normalization on the whole 3D volume for all experiments. In addition, because of the small proportion of positive pixels per axial slice, it is inefficient to train the segmentation model directly on the entire LGE CMR data. To relieve this, smaller patches centered on the raw LGE CMR image, which contained both positive and negative pixels, were generated as inputs.
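The preprocessing can be sketched as follows. This is a minimal illustration assuming zero-mean, unit-variance normalization and square patches; the type of normalization and the patch size are not stated here, so both are assumptions, and the function names are hypothetical.

```python
import math

def zscore_normalize(volume):
    """Normalize a 3D volume (slices x rows x cols, nested lists) to zero mean, unit variance."""
    voxels = [v for sl in volume for row in sl for v in row]
    mean = sum(voxels) / len(voxels)
    # Fall back to 1.0 for a constant volume to avoid dividing by zero.
    std = math.sqrt(sum((v - mean) ** 2 for v in voxels) / len(voxels)) or 1.0
    return [[[(v - mean) / std for v in row] for row in sl] for sl in volume]

def center_crop(slice_2d, size):
    """Cut a size x size patch from the middle of one axial slice."""
    rows, cols = len(slice_2d), len(slice_2d[0])
    top, left = (rows - size) // 2, (cols - size) // 2
    return [row[left:left + size] for row in slice_2d[top:top + size]]

normalized = zscore_normalize([[[0.0, 2.0], [0.0, 2.0]]])
patch = center_crop([[r * 4 + c for c in range(4)] for r in range(4)], 2)
```

Statistics are computed over the whole volume rather than per slice, so relative intensities between slices of the same scan are preserved.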

We randomly divided our dataset into a training set (116 scans from 77 patients) and a testing set (76 scans from 38 patients with 38 pre-ablation and 38 post-ablation scans) for all experiments. The dataset was split such that all scans from a given patient appeared in only one of the training or testing sets.
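A patient-level split of this kind can be sketched as below. This is a minimal illustration, assuming scans are given as (patient, scan) pairs; the function name, the seed and the toy cohort are hypothetical.

```python
import random
from collections import defaultdict

def split_by_patient(scans, n_test_patients, seed=0):
    """Split (patient_id, scan_id) pairs so no patient appears in both sets."""
    by_patient = defaultdict(list)
    for patient_id, scan_id in scans:
        by_patient[patient_id].append(scan_id)
    patients = sorted(by_patient)           # fixed order before shuffling
    random.Random(seed).shuffle(patients)   # reproducible random assignment
    test = [(p, s) for p in patients[:n_test_patients] for s in by_patient[p]]
    train = [(p, s) for p in patients[n_test_patients:] for s in by_patient[p]]
    return train, test

# Hypothetical toy cohort: three patients, two with pre- and post-ablation scans.
scans = [("p1", "pre"), ("p1", "post"), ("p2", "pre"), ("p3", "pre"), ("p3", "post")]
train_set, test_set = split_by_patient(scans, n_test_patients=1)
```

Shuffling patients rather than scans is what prevents a pre-ablation scan in the training set from leaking anatomy into the evaluation of the same patient's post-ablation scan.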

We used the Adam method to optimize the cascade segmentation network with a decayed learning rate (the initial learning rate was set to 0.001). The optimizer used for the joint discriminative network was Adam with a fixed learning rate of 0.0001. We used the current batch statistics for batch normalization in both training and testing. In addition, to stabilize the GAN training, we used feature matching [Salimans2016Improved] for the adversarial loss. The two coefficients used to balance the two segmentation losses were learned automatically based on the uncertainty strategy [kendall2018multi]. The coefficient used to balance the segmentation loss and the adversarial loss was set to a fixed value of 0.1.
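Uncertainty-based weighting of two task losses can be sketched as follows. This uses a commonly seen simplification in which each task has a learnable log-variance `s` and contributes `exp(-s) * loss + s`; whether this exact form matches the one used here is an assumption, and the function and parameter names are hypothetical.

```python
import math

def uncertainty_weighted_loss(loss_la, loss_scar, log_var_la, log_var_scar):
    """Combine two task losses with learnable log-variances s: exp(-s) * loss + s."""
    return (math.exp(-log_var_la) * loss_la + log_var_la
            + math.exp(-log_var_scar) * loss_scar + log_var_scar)

# With both log-variances at zero the tasks are weighted equally.
equal = uncertainty_weighted_loss(1.0, 2.0, 0.0, 0.0)
```

In training, the log-variances would be optimized together with the network weights: raising `s` down-weights a noisy task, while the additive `s` term keeps it from growing without bound.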

Our deep learning model was implemented using TensorFlow on an Ubuntu machine and was trained and tested using an Nvidia RTX 8000 GPU (48GB GPU memory).

Fig. 3: Qualitative visualization of the segmentation for the LA in pre-ablation and post-ablation cases. Each estimated segmentation is represented as a dashed green contour while the red contour denotes its corresponding manually delineated ground truth.
Fig. 4: Qualitative visualization of the segmentation for the atrial scars in pre-ablation and post-ablation cases. Each estimated segmentation is represented as a dashed green contour while the red contour denotes its corresponding manually delineated ground truth.
Fig. 5: Qualitative visualization of the segmentation for LA and atrial scars in worst cases. Each estimated segmentation is represented as a dashed green contour (bottom row) while the red contour denotes its corresponding manually delineated ground truth (top row). (a) Qualitative visualization of the segmentation for atrial scars. (b) Qualitative visualization of the segmentation for LA.

IV-C Evaluation Criteria

To evaluate the segmentation performance of our proposed JAS-GAN, we used region-based metrics [dice1945measures, taha2015metrics], e.g., the Dice Similarity Coefficient (DSC) and the Jaccard Index (JI), to validate the predicted segmentation map against the manually defined ground truth. We also used a surface-based metric, the Average Surface Distance (ASD), which gives the distance in mm between the predicted mesh and the ground truth mesh [taha2015metrics]. We further adopted the Normalized Mutual Information (NMI) to measure the similarity between the estimated segmentation maps and the ground truth [taha2015metrics]. In addition, the segmentation performance of our proposed JAS-GAN was further evaluated by the over-segmentation rate (OSR) and the under-segmentation rate (USR), which are defined in terms of TP, FP and FN following [miao2018image], where TP, FP and FN denote the True Positive, the False Positive and the False Negative, respectively. They are defined as the number of voxels correctly identified as positive for the target, the number of voxels incorrectly identified as positive for the target, and the number of voxels incorrectly identified as negative for the target, respectively. TP, FP and FN were calculated while considering all voxels in a 3D volume.
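The region-based metrics follow directly from the voxel counts. The sketch below, which assumes flattened binary volumes and uses hypothetical function names, computes TP, FP and FN and derives DSC and JI from them using their standard definitions; OSR and USR are left out because their exact normalization is not reproduced here.

```python
def confusion_counts(pred, truth):
    """Count TP, FP and FN over all voxels of a flattened binary 3D volume."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return tp, fp, fn

def dice(tp, fp, fn):
    """Dice Similarity Coefficient: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

def jaccard(tp, fp, fn):
    """Jaccard Index: TP / (TP + FP + FN)."""
    return tp / (tp + fp + fn)

pred  = [1, 1, 1, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0]
tp, fp, fn = confusion_counts(pred, truth)
dsc, ji = dice(tp, fp, fn), jaccard(tp, fp, fn)
```

Both functions divide by zero when prediction and ground truth are both empty, so a real evaluation would need to decide how to score such volumes. The two metrics are monotonically related through JI = DSC / (2 - DSC).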

Target Methods DSC JI ASD (mm) NMI
LA and PVs EDN
EDN + AC
EDN + AC + T (JAS-GAN)
Atrial scars RN
RN + LA
RN + AC
RN + AC + T (JAS-GAN)
TABLE II: Ablation results comparison for JAS-GAN in terms of DSC, JI, ASD and NMI. The results are presented in the form of mean ± standard deviation. Abbreviations: EDN, baseline based on encoder-decoder network for LA segmentation; RN, baseline based on residual network for atrial scars segmentation; AC, adaptive attention cascade; T, joint discriminative network; DSC, Dice Similarity Coefficient; JI, Jaccard Index; ASD, Average Surface Distance; NMI, Normalized Mutual Information.

IV-D Segmentation Performance of JAS-GAN

Quantitative analysis: Table I summarizes the quantitative segmentation results of JAS-GAN grouped by pre-ablation, post-ablation and pre-&post-ablations. Despite the challenges in segmenting the LA and atrial scars from LGE CMR scans, our proposed JAS-GAN still achieved high segmentation accuracy in terms of DSC, JI, ASD and NMI for both LA and atrial scars in pre-ablation, post-ablation and pre-&post-ablations. We further performed statistical tests (t-test) to assess the differences between pre-ablation and post-ablation. The lowest P-values in terms of DSC, JI, ASD and NMI for LA and atrial scars demonstrated that there were no significant differences between pre-ablation and post-ablation. Furthermore, we investigated the inter-observer variability and the inter-observer agreement from two manual segmentations of LA and atrial scars. We provided 12 cases selected from the testing data (6 pre-ablation scans and 6 post-ablation scans) for two experts to manually label the LA and atrial scars independently. We followed [joskowicz2019inter] and used the mean (1-DSC) and mean DSC with ranges to measure the inter-observer variability and the inter-observer agreement, based on the mean volume overlap variability values and the mean volume overlap values, respectively. The inter-observer variabilities for the LA and atrial scars were 0.082 [-0.009, 0.007] and 0.291 [-0.072, 0.063], respectively. The inter-observer agreements for the LA and atrial scars were 0.918 [-0.007, 0.009] and 0.709 [-0.063, 0.072], respectively. Compared with the inter-observer agreements, JAS-GAN achieved higher segmentation accuracies for the unbalanced atrial targets. These results indicated the ability of JAS-GAN to handle the automated and accurate segmentations of LA and atrial scars.
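The agreement and variability summaries can be sketched as below. This is a minimal illustration that reads the bracketed values as offsets of the minimum and maximum from the mean, which is an interpretation rather than something stated explicitly; the function names and the DSC values are hypothetical and are not data from the study.

```python
def mean_with_range(values):
    """Return the mean plus the offsets of the minimum and maximum from that mean."""
    mean = sum(values) / len(values)
    return mean, min(values) - mean, max(values) - mean

def observer_agreement_and_variability(dsc_per_case):
    """Agreement is summarized from DSC, variability from 1 - DSC, for the same cases."""
    agreement = mean_with_range(dsc_per_case)
    variability = mean_with_range([1.0 - d for d in dsc_per_case])
    return agreement, variability

# Hypothetical DSC values between two observers (not data from the study).
agreement, variability = observer_agreement_and_variability([0.90, 0.92, 0.94])
```

Under this reading the two summaries mirror each other: the means sum to one and the range offsets swap sign and position, which is the pattern seen in the reported pairs 0.918 [-0.007, 0.009] and 0.082 [-0.009, 0.007].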

Qualitative analysis: Fig. 3 and Fig. 4 show the qualitative segmentation results of JAS-GAN compared to the ground truth for selected pre-ablation and post-ablation slices. One can see that our proposed JAS-GAN is able to handle the shape and size variations of the LA and small atrial scars. We also provide the qualitative segmentation results of LA and atrial scars for the worst cases. The studied cohort is a difficult AF cohort in which patients had severe arrhythmia during the MRI scanning; the blurry LA boundary and indistinguishable atrial scars are the major reasons for the worst segmentation results, as shown in Fig. 5 (a) and (b). However, one can also see that our proposed JAS-GAN still produces visually acceptable segmentation results in these cases. We will collect more LGE CMR data for model learning to overcome the blurry LA boundaries and indistinguishable atrial scars.

Fig. 6: Analysis of over-segmentation rate (OSR) and under-segmentation rate (USR) for LA and atrial scars (black bars denote standard deviation). (a) and (b) denote under-segmentation rate and over-segmentation rate comparison for the LA segmentation with/without using cascade connection. (c) and (d) denote under-segmentation rate and over-segmentation rate comparison for the atrial scars segmentation with/without using cascade connection. Abbreviations: EDN, baseline based on an encoder-decoder for LA segmentation; RN, baseline based on a residual network for atrial scars segmentation; AC, adaptive attention cascade.

IV-E Ablation Analysis of JAS-GAN

The effectiveness of the adaptive attention cascade network and the joint discriminative network was extensively analysed with ablation experiments. Firstly, the baselines based on the encoder-decoder network (EDN) and the residual network (RN) were performed for the LA and atrial scars segmentations, respectively. Then, we used the LA segmentation results of EDN to further define a region of interest (ROI) in the input image for further RN-based scar segmentation (RN+LA). Next, we constructed a cascade network that EDN and RN were cascaded by the adaptive attention cascade (AC) to perform the joint segmentations of LA and atrial scars in an end-to-end manner (EDN + AC for LA, RN + AC for atrial scars). Finally, based on the cascade network, we added the joint discriminative network T for adversarial regularization (EDN + AC + T for LA, RN + AC + T for atrial scars).

1) Effectiveness of adaptive attention cascade network: The adaptive attention cascade network leverages an adaptive attention cascade to automatically correlate the segmentation tasks of LA and atrial scars for their joint segmentations. The adaptive attention cascade encourages the segmentation model to produce an over-segmented LA rather than an under-segmented LA, so that it roughly focuses on the small atrial scars. As shown in Table II, the adaptive attention cascade improves the segmentation performance for both LA and atrial scars (EDN + AC vs. EDN, RN + AC vs. RN). One can also see that the improvement in segmentation accuracy for atrial scars was limited when the LA segmentation output was used to define an ROI in the image for subsequent scar segmentation (RN + LA vs. RN). The reason is that, in two-stage segmentation, an under-segmented LA can partially or completely remove the atrial scars from the ROI. The improvements of the adaptive attention cascade were statistically significant based on t-tests (P-values < 0.05). Fig. 6 summarizes the over-segmentation and under-segmentation results for the estimated LA and atrial scars. Fig. 6 (a) and (b) show that EDN with AC achieved a lower USR and a higher OSR than EDN for the estimated LA, which illustrates that EDN with AC tends to produce an over-segmented LA in order to attend to the small atrial scars. Fig. 6 (c) and (d) show that RN with AC achieved a lower USR and a lower OSR than RN for the segmentation of small atrial scars, which indicates that the adaptive attention cascade leverages the estimated LA to constrain the learnable area of small atrial scars for their accurate identification.

Fig. 7: Analysis of different cascade information for correlating the segmentation tasks of unbalanced atrial targets. (a) Pairwise tournament matrix measuring the superiority of each kind of cascade information over the others in correlating the segmentation tasks of LA and atrial scars. The LA probability map obtains the best superiority. (b) First-order task affinity matrix measuring the correlation between the two segmentation tasks for the different cascade information. The LA probability map achieves the best correlation between the two segmentation tasks. The cascade information compared is the LA probability map and the information output by the encoder and by the first, second, third, and fourth up-sampling blocks of the decoder in the LA segmentation network. Abbreviation: TLS, segmentation tasks of LA and atrial scars.

2) Effectiveness of the joint discriminative network: As shown in Table II, compared with EDN + AC and RN + AC, JAS-GAN achieved better segmentation results for LA and atrial scars across all evaluation metrics, which indicates that the adversarial regularization provided by the joint discriminative network effectively improves the segmentation performance of the adaptive attention cascade network. In addition, the improvements of the joint discriminative network were statistically significant (P-values < 0.05) based on t-tests.

Methods DSC JI ASD (mm) NMI
TABLE III: Performance comparison of different cascade operations for atrial scar segmentation in terms of DSC, JI, ASD, and NMI. The results are presented in the form of mean ± standard deviation. Abbreviations: DSC, Dice Similarity Coefficient; JI, Jaccard Index; ASD, Average Surface Distance; NMI, Normalized Mutual Information. The cascade operations compared are the element-wise add operation, the element-wise product operation, the concatenation operation, and the adaptive attention operation used in JAS-GAN.

IV-F Analysis of Adaptive Attention Cascade Connection.

To analyse the feasibility of the adaptive attention cascade connection, we performed extra experiments to validate the effectiveness of the cascade information (LA probability map) and the cascade operation (adaptive attention) that we used. In our proposed cascade framework, we used the estimated LA probability map, which represents the complete decoding of the LA feature information, as the cascade information to correlate the segmentation tasks of LA and atrial scars. To demonstrate the superiority of the LA probability map, we further investigated the influence of LA information at different decoding levels on correlating the two segmentation tasks. In our experiment, in addition to the LA probability map, we also used five extra kinds of LA information at different decoding levels from the LA segmentation network (encoder-decoder) as feedforward information to correlate the segmentation tasks of LA and atrial scars. They were the information output by the encoder and by the first, second, third, and fourth up-sampling blocks of the decoder. We followed [zamir2018taskonomy] to construct a pairwise tournament matrix that measures the superiority of each kind of information over the others in correlating the segmentation tasks of LA and atrial scars. In the pairwise tournament matrix shown in Fig. 7 (a), the element at row i and column j is the percentage of data in the test set on which the i-th cascade information correlates the segmentation tasks of LA and atrial scars (TLS) better than the j-th cascade information did. Based on the pairwise tournament matrix, we further obtained the affinity matrix shown in Fig. 7 (b), where each value represents the correlation between the two segmentation tasks achieved by the corresponding cascade information. As shown in Fig. 7 (b), the estimated LA probability map obtained the best correlation between the two segmentation tasks.
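The tournament construction described above can be illustrated with a short sketch. It assumes that each kind of cascade information has a per-scan quality score (for example a scar DSC, higher is better); the function names, the `eps` clipping value, and the eigenvector-based normalisation are assumptions modelled on the procedure described in the Taskonomy paper, not a statement of the exact computation used for Fig. 7.

```python
import numpy as np


def pairwise_tournament(scores):
    """Build a pairwise tournament matrix.

    scores[i, n] is an assumed per-scan quality score (higher is better)
    obtained with cascade information i on test scan n.
    Entry (i, j) is the fraction of test scans on which i beats j.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[0]
    wins = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i != j:
                wins[i, j] = np.mean(scores[i] > scores[j])
    return wins


def affinity_from_tournament(wins, eps=1e-3):
    """Assumed Taskonomy-style scoring: principal eigenvector of the win-ratio matrix."""
    w = np.clip(wins, eps, 1.0 - eps)       # avoid division by zero
    ratio = w / w.T                          # how many times more often i beats j
    np.fill_diagonal(ratio, 1.0)
    vals, vecs = np.linalg.eig(ratio)
    principal = np.real(vecs[:, np.argmax(np.real(vals))])
    principal = np.abs(principal)
    return principal / principal.sum()       # normalised affinity per cascade information
```

Under these assumptions, the cascade information with the largest affinity value is the one that most often outperforms the others across the test scans.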

To demonstrate that the adaptive attention cascade operation is effective for segmenting small atrial scars, we further compared it with three alternatives: the pixel-wise add operation, which uses the LA segmentation output to enhance the LA region of the image for subsequent segmentation of atrial scars; the general attention operation with pixel-wise product, which uses the LA segmentation output to define an ROI in the input image for atrial scar segmentation with end-to-end model optimization; and the direct concatenation of the estimated LA and the input image. As summarized in Table III, the adaptive attention cascade achieved better segmentation results. Furthermore, the improvements were statistically significant based on t-tests (P-values < 0.05). These results indicate the effectiveness of adaptive attention for the segmentation of small atrial scars.
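The three baseline cascade operations compared above can be sketched as follows. This is a minimal illustration on 2D arrays with hypothetical function names; the adaptive attention operation itself is defined in the method section of the paper and is not reproduced here.

```python
import numpy as np


def cascade_add(image, la_prob):
    """Pixel-wise add: enhance the LA region of the input image."""
    return image + la_prob


def cascade_product(image, la_prob):
    """Pixel-wise product: use the estimated LA as an ROI mask on the image."""
    return image * la_prob


def cascade_concat(image, la_prob):
    """Concatenation: stack the image and the estimated LA as two channels."""
    return np.stack([image, la_prob], axis=0)
```

With the product operation, any scar voxel that falls outside an under-segmented LA is zeroed out before the scar network sees it, which is the failure mode of ROI-based cascades discussed in the ablation analysis.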

Fig. 8: Principal components analysis (PCA) based visualization of the joint distribution for LA and atrial scars. The x-axis represents the direction with the largest variance of the data before the PCA operation, while the y-axis represents the direction orthogonal to the x-axis with the largest variance of the data before the PCA operation. (a) The joint distribution estimated by JAS-GAN with the joint discriminative network and the real one. (b) The joint distribution estimated by JAS-GAN without the joint discriminative network and the real one. Abbreviations: EJD, estimated joint distribution; RJD, real joint distribution.

IV-G Analysis of Joint Discriminative Network.

In our proposed JAS-GAN, the joint discriminative network is used to transform the semantic segmentation of the unbalanced targets from pixel-level classification into joint pixel-level identification. It mainly uses adversarial regularization to force the estimated joint distribution of LA and atrial scars produced by the cascade segmentation network to match the real one. To demonstrate that the joint discriminative network is able to improve the consistency of the joint distribution for LA and atrial scars, we visualized the estimated joint distribution (EJD) and the real joint distribution (RJD) using principal components analysis (PCA) to visually assess how well the EJD matches the RJD, as shown in Fig. 8, where the data points in the EJD and the data points in the RJD are in one-to-one correspondence. We also provide a quantitative distance between the EJD and the RJD based on the mean Euclidean distance of the corresponding points in the 2-dimensional coordinate system (Fig. 8). The calculated distance between the EJD and the RJD was smaller for JAS-GAN with the joint discriminative network than for JAS-GAN without it (0.304). The qualitative visualization and the quantitative distance both indicate that JAS-GAN with the joint discriminative network achieved a more consistent joint distribution between the estimated results and the real ones than JAS-GAN without the joint discriminative network.
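The distance described above can be sketched as a PCA projection followed by a mean Euclidean distance over paired points. The sketch assumes that each estimated or real joint segmentation has been flattened into one feature vector and that the PCA is fitted on the estimated and real points together; both choices, and the function names, are assumptions rather than details given in the text.

```python
import numpy as np


def pca_2d(points):
    """Project row vectors onto their first two principal components."""
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def mean_pair_distance(estimated, real):
    """Mean Euclidean distance between corresponding EJD and RJD points in 2D."""
    estimated = np.asarray(estimated, dtype=float)
    real = np.asarray(real, dtype=float)
    n = estimated.shape[0]
    projected = pca_2d(np.vstack([estimated, real]))
    gaps = projected[:n] - projected[n:]     # row k of each half is the same scan
    return float(np.mean(np.linalg.norm(gaps, axis=1)))
```

A smaller value means the estimated points sit closer to their real counterparts in the projected plane, which is how the comparison in Fig. 8 is read.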

Methods DSC JI ASD (mm) NMI
Segnet
3D Densenet
2D U-Net
3D U-Net
MTL
Tversky Loss
Surface Loss
MVTT
JAS-GAN
TABLE IV: LA segmentation performance comparison between the different architectures (Segnet, 3D Densenet, 2D U-Net, 3D U-Net, MTL, MVTT and JAS-GAN) and the methods aiming to tackle the imbalance issue (Tversky loss and surface loss) in terms of DSC, JI, ASD, and NMI. The results are presented in the form of mean ± standard deviation. Abbreviations: DSC, Dice Similarity Coefficient; JI, Jaccard Index; ASD, Average Surface Distance; NMI, Normalized Mutual Information.
Methods DSC JI ASD (mm) NMI
2SD
Otsu
Segnet
3D Densenet
2D U-Net
3D U-Net
Tversky Loss
Surface Loss
MVTT
JAS-GAN
TABLE V: Scar segmentation performance comparison between the different architectures (Segnet, 3D Densenet, 2D U-Net, 3D U-Net, MVTT and JAS-GAN), the two-phase methods (2SD and Otsu) and the methods aiming to tackle the imbalance issue (Tversky loss and surface loss) in terms of DSC, JI, ASD, and NMI. The results are presented in the form of mean ± standard deviation. Abbreviations: DSC, Dice Similarity Coefficient; JI, Jaccard Index; ASD, Average Surface Distance; NMI, Normalized Mutual Information.

IV-H Performance Comparison with Other Methods.

The performance of JAS-GAN was further demonstrated by comparing it with widely used methods and state-of-the-art methods. For the segmentations of LA and atrial scars, we compared the segmentation performance of JAS-GAN to the 2D U-Net [Ronneberger2015U], 3D U-Net [cciccek20163d], 3D DenseNet [bui20173d], SegNet [badrinarayanan2017segnet], the method (MVTT) proposed by Yang et al. [yang2020simultaneous], and two methods aiming to tackle the imbalance issue (Tversky loss [salehi2017tversky] and surface loss [kervadec2019boundary]). We also compared the LA segmentation performance of JAS-GAN to the method (MTL) proposed by Chen et al. [chen2018multi]. Table IV and Table V summarize the experiment results.

For the LA segmentation, as shown in Table IV, JAS-GAN achieved higher segmentation performance, with improved DSC, JI, and NMI and reduced ASD compared to the other methods. The reason is that we make full use of the adaptive attention cascade connection and adversarial regularization to improve the performance of the encoder-decoder structure.

For the segmentation of atrial scars, as shown in Table V, our proposed JAS-GAN outperformed the compared methods in terms of DSC, JI, ASD, and NMI. The segmentation accuracy obtained by the widely used deep learning methods was limited because they have no suitable mechanism for segmenting very small atrial scars. The Tversky loss and the surface loss can deal effectively with imbalance problems. However, relying only on the loss function to handle unbalanced target segmentation still yields limited improvement in segmentation accuracy. The evaluation metrics also show that JAS-GAN provides a more effective architecture for atrial scar segmentation than MVTT.
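For reference, the overlap measures used in these comparisons can be sketched on binary masks as below. DSC and JI follow their standard definitions; the Tversky index is written in its usual form, with `alpha` weighting false positives and `beta` weighting false negatives, and the default weights are illustrative assumptions rather than the settings used in the experiments.

```python
import numpy as np


def confusion_counts(pred, truth):
    """Count true positives, false positives and false negatives of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    return float(tp), float(fp), float(fn)


def dice(pred, truth):
    """Dice Similarity Coefficient; assumes at least one foreground voxel overall."""
    tp, fp, fn = confusion_counts(pred, truth)
    return 2.0 * tp / (2.0 * tp + fp + fn)


def jaccard(pred, truth):
    """Jaccard Index; assumes at least one foreground voxel overall."""
    tp, fp, fn = confusion_counts(pred, truth)
    return tp / (tp + fp + fn)


def tversky_index(pred, truth, alpha=0.5, beta=0.5):
    """Tversky index; a Tversky loss is commonly taken as 1 minus this value."""
    tp, fp, fn = confusion_counts(pred, truth)
    return tp / (tp + alpha * fp + beta * fn)
```

With `alpha = beta = 0.5` the Tversky index reduces to the DSC; raising `beta` penalises missed foreground voxels more heavily, which is why such losses are applied to small targets like atrial scars.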

Fig. 9: The correlation analysis for JAS-GAN. (a) and (b) The high correlations between estimated scar volume and ground truth for pre-ablation and post-ablation, respectively. (c) and (d) The high correlations between estimated scar percentage and manual segmentation for pre-ablation and post-ablation, respectively. Abbreviations: r, Pearson correlation coefficient; EV, estimated scar volume; MV, ground truth for scar volume; ESP, estimated scar percentage; MSP, ground truth for scar percentage.

IV-I Analysis of Atrial Scars Quantification.

The quantification of atrial scars is based on the scar percentage, which is defined as the ratio of the scar volume to the LA wall volume. To assess the quantification results, we first report scatter plots of the estimated scar volume against the ground truth. As the linear regression results in Fig. 9 (a) and (b) show, the Pearson correlation coefficients indicate excellent correlation between the ground truth and our estimated results for both pre-ablation and post-ablation. In addition, as the Bland-Altman plots in Fig. 10 (a) and (b) show, JAS-GAN was capable of estimating the scar volume with consistently low error. We then report scatter plots of the estimated scar percentage against the ground truth. As the linear regression results in Fig. 9 (c) and (d) show, the Pearson correlation coefficients also indicate excellent correlation between the ground truth and our estimated results for both pre-ablation and post-ablation. Furthermore, Fig. 10 (c) and (d) show the difference between the calculated scar percentage and the scar percentage obtained by manual segmentation. The calculated scar percentage agreed closely with manual delineation. These results indicate the ability of JAS-GAN to quantify atrial scars.
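The quantities used in this analysis can be sketched as follows. The sketch assumes binary masks with a uniform voxel size, so that voxel counts stand in for volumes, and uses the conventional 1.96 standard deviation limits of agreement for the Bland-Altman summary; the function names are hypothetical.

```python
import numpy as np


def scar_percentage(scar_mask, wall_mask):
    """Scar percentage: scar volume divided by LA wall volume, in percent."""
    return 100.0 * np.count_nonzero(scar_mask) / np.count_nonzero(wall_mask)


def pearson_r(estimated, manual):
    """Pearson correlation coefficient between estimated and manual values."""
    return float(np.corrcoef(estimated, manual)[0, 1])


def bland_altman(estimated, manual):
    """Return the bias and the lower and upper limits of agreement."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(manual, dtype=float)
    bias = diff.mean()
    spread = 1.96 * diff.std(ddof=1)
    return float(bias), float(bias - spread), float(bias + spread)
```

A high correlation alone does not rule out a systematic offset between estimated and manual values, which is why the Bland-Altman bias and limits of agreement are reported alongside it.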

Fig. 10: The agreement analysis based on Bland-Altman plots for JAS-GAN. (a) and (b) The high agreement between estimated scar volume and ground truth for pre-ablation and post-ablation, respectively. (c) and (d) The high agreement between estimated scar percentage and manual segmentation for pre-ablation and post-ablation, respectively. Abbreviations: EV, estimated scar volume; MV, ground truth for scar volume; ESP, estimated scar percentage; MSP, ground truth for scar percentage.
Methods LA and PVs Atrial scars
DSC JI ASD (mm) NMI DSC JI ASD (mm) NMI
2D U-Net
3D U-Net
Tversky loss
Surface loss
MVTT
JAS-GAN
TABLE VI: Comparison of LA segmentation and scar quantification on the MICCAI 2018 Atrial Segmentation Challenge dataset. The results are presented in the form of mean ± standard deviation. Abbreviations: DSC, Dice Similarity Coefficient; JI, Jaccard Index; ASD, Average Surface Distance; NMI, Normalized Mutual Information.

V Discussion

In this study, we have developed the JAS-GAN framework for the joint segmentations of the unbalanced atrial targets, LA and atrial scars. The JAS-GAN framework consists of an adaptive attention cascade network and a joint discriminative network. In addition to the improvements reported in the ablation studies in Table II, extra analysis experiments were performed to further justify the rationale and effectiveness of our architecture, as described in Section IV-F and Section IV-G. It is of note that the joint discriminative network only participates in model training, and thus does not increase the complexity of the final model in the testing phase or in practical applications.

Our proposed JAS-GAN framework is trained in an end-to-end manner based on the cascade connection with full supervision for the segmentations of unbalanced atrial targets, which provides an effective way of learning the small atrial scars. We performed comprehensive experiments in the current study, comparing the segmentation results of JAS-GAN with a two-phase segmentation for atrial scars with supervised learning (the automatically segmented LA was used to define the ROI for the automated scar segmentation) and with two two-phase methods with unsupervised learning (the manually segmented LA wall was used to define the ROI for the scar segmentation based on a classical method of standard deviation thresholding (2SD) [Karim2013Evaluation] and a state-of-the-art method of Otsu [ravanelli2014novel]). As shown in Table I and Table V, the two-phase segmentation for atrial scars with supervised learning (RN + LA) improved the segmentation accuracy compared to the two-phase methods with unsupervised learning (2SD and Otsu), while our proposed JAS-GAN framework, trained in an end-to-end manner with full supervision, achieved the best segmentation accuracy. This is because the thresholding-based 2SD and Otsu are unsupervised methods, which are susceptible to noise. Because the atrial scars are very small, the noise hinders 2SD and Otsu from accurately recognizing them. Compared with unsupervised learning, deep learning based supervised learning can extract high-level features that reduce the interference of noise in atrial scar identification. Furthermore, compared with the two-phase segmentation, end-to-end learning alleviates the problem that an inaccurate LA segmentation leads to inaccurate identification of atrial scars.

One limitation of our work is that our proposed method may not be directly applicable to external data if there are significant differences between our training data and the external testing data. This is a common issue when applying deep learning algorithms to medical images in a real clinical environment, because domain gaps widely exist between the training data and external testing data that come from different scanners or centres [perone2019unsupervised, cheplygina2017transfer]. This problem may be more severe for MRI-based studies because routinely used structural MRI (e.g., LGE MRI) is not a quantitative acquisition. Standardisation and normalisation of LGE MRI data can be problematic and remain an open research question that is beyond the scope of our current study. To demonstrate that our proposed segmentation framework can be generalized to data from different centres, we performed experiments on the data from the MICCAI 2018 Atrial Segmentation Challenge with a re-trained model. The MICCAI 2018 Atrial Segmentation Challenge provided 100 scans with labels of the LA wall and LA endocardium [xiong2020global]. We directly combined the labels of the LA wall and LA endocardium to obtain the label of the LA epicardium, and then automatically obtained the labels of atrial scars based on the protocol of the Cardiac MRI Toolkit Slicer Extension from the National Alliance for Medical Image Computing. We randomly divided the data into a training set with 60 scans and a testing set with 40 scans. We then compared our proposed JAS-GAN with the widely used methods and the state-of-the-art methods (2D U-Net, 3D U-Net, Tversky loss, surface loss and MVTT). As summarized in Table VI, our proposed JAS-GAN still achieved better segmentation accuracy for the LA and atrial scars, which indicates the applicability of JAS-GAN to data from other centres after re-training.
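The data preparation for this experiment can be sketched as below. The sketch assumes binary label volumes and treats "combining" the wall and endocardium labels as a voxel-wise union; the function names and the random seed are hypothetical, and the scar labelling step performed with the Slicer extension is not reproduced.

```python
import numpy as np


def epicardium_label(wall_mask, endo_mask):
    """Combine LA wall and LA endocardium labels into one LA epicardium label."""
    return np.logical_or(wall_mask, endo_mask)


def split_scans(n_scans=100, n_train=60, seed=0):
    """Randomly split scan indices into training and testing sets."""
    order = np.random.default_rng(seed).permutation(n_scans)
    return order[:n_train], order[n_train:]
```

Splitting by scan index keeps all slices of one scan on the same side of the split, so that no patient contributes to both training and testing.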

VI Conclusion

Automated and accurate segmentations of LA and atrial scars from LGE CMR images are of great clinical significance for quantifying atrial scars. In this study, we proposed the JAS-GAN model for the automated and accurate segmentations of the LA and atrial scars directly from LGE CMR images, based on an adaptive attention cascade network and a joint discriminative network. The adaptive attention cascade network automatically captures the correlation of the two segmentation tasks by modelling the relationship between LA and atrial scars. The joint discriminative network employs adversarial regularization to force the estimated joint distribution of LA and atrial scars to match the real one. The experimental results demonstrated that our proposed JAS-GAN enabled accurate simultaneous segmentations of the LA and atrial scars. Therefore, JAS-GAN can provide an effective way to quantify atrial scars in clinical practice for patients with AF.

References