1 Introduction
Atrial fibrillation (AF) is the most common cardiac arrhythmia and increases the risk of stroke, heart failure and death [2]. Radiofrequency ablation is a promising treatment for AF, and both patient selection and outcome prediction for this therapy can be improved by localizing and quantifying left atrial (LA) scars. Because atrial scars lie on the LA wall, scar analysis normally requires segmenting the LA or the LA wall first, to exclude confounding enhanced tissue from other substructures of the heart. Late gadolinium enhanced magnetic resonance imaging (LGE MRI) is an important tool for scar visualization and quantification, but manual delineation of LGE MRI is subjective and labor-intensive. Automating this segmentation remains challenging, mainly because of the varied LA shapes, the thin LA wall, poor image quality and enhanced noise from surrounding tissues.
Few studies have reported automatic algorithms for LA segmentation and scar quantification. For LA segmentation, Xiong et al. proposed a dual fully convolutional neural network (CNN) [11]. In an LA segmentation challenge [14], Chen et al. presented a two-task network for atrial segmentation and pre-/post-ablation classification, which incorporates prior information about the patient category [1]. Nunez et al. achieved LA segmentation by combining multi-atlas segmentation and shape modeling of the LA [9]. Recently, Yu et al. designed an uncertainty-aware semi-supervised framework for LA segmentation [12]. For scar quantification, most current works adopt threshold-based methods that rely on manual LA wall segmentation [6]. Some other conventional algorithms, such as the Gaussian mixture model (GMM) [4], also require an accurate initialization of the LA or LA wall. However, automatic LA wall segmentation is complex and challenging because the wall is inherently thin ( mm) [5]. Recent studies show that the thickness can be ignored, as clinical studies mainly focus on the location and extent of scars [10, 7]. For example, Li et al. proposed a graph-cuts framework for scar quantification on the LA surface mesh, where the weights of the graph were learned via a multiscale CNN [7]. However, they did not achieve end-to-end training, i.e., the multiscale CNN and graph-cuts were separated into two subtasks.
Recently, deep learning (DL)-based methods have achieved promising performance for cardiac image segmentation. However, most DL-based segmentation methods are trained with a loss that only considers a label mask in a discrete space. Lacking spatial information, their predictions tend to be blurry at boundaries, which leads to noisy segmentations with large outliers. To solve this problem, several strategies have been employed, such as graph-cuts/CRF regularization [7, 3] and deformation combined with shape priors [13].
In this work, we present an end-to-end multitask learning network for joint LA segmentation and scar quantification. The proposed method incorporates spatial information in the pipeline to eliminate outliers for LA segmentation, with additional benefits for scar quantification. This is achieved by introducing a spatially encoded loss based on the distance transform map, without any modification of the network. To utilize the spatial relationship between LA and scars, we adopt the LA boundary as an attention mask on the scar map, namely surface projection, to achieve shape attention. Therefore, an end-to-end learning framework is created for simultaneous LA segmentation, scar projection and quantification via the multitask learning (MTL) network embedding the spatial encoding (SE) and boundary shape attention (SA), namely the MTL-SESA network.
2 Method
Fig. 1 provides an overview of the proposed framework. The proposed network is a modified U-Net consisting of two decoders for LA segmentation and scar quantification, respectively. In Section 2.1, an SE loss based on the distance transform map is introduced as a regularization term for LA segmentation. For scar segmentation, an SE loss based on the distance probability map is employed, followed by a spatial projection (see Section 2.2). Section 2.3 presents the specific SA scheme embedded in the MTL network for the predictions of LA and LA scars in an end-to-end style.
2.1 Spatially Encoded Constraint for LA Segmentation
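An SE loss based on the signed distance transform map (DTM) is employed as a regularization term to represent spatial vicinity to the target label. Given a target label, the signed DTM for each pixel $x_i$ can be defined as:
$$\phi(x_i) = \begin{cases} -d^{\beta}, & x_i \in \Omega_{in} \\ 0, & x_i \in S \\ d^{\beta}, & x_i \in \Omega_{out} \end{cases} \tag{1}$$
where $\Omega_{in}$ and $\Omega_{out}$ respectively indicate the region inside and outside the target label, $S$ denotes the surface boundary, $d$ represents the distance from pixel $x_i$ to the nearest point on $S$, and $\beta$ is a hyperparameter. The binary cross-entropy (BCE) loss and the additional SE loss for LA segmentation can be defined as:
$$\mathcal{L}^{BCE}_{LA} = -\sum_{i=1}^{N} \left[ y_i \cdot \log \hat{y}(x_i) + (1 - y_i) \cdot \log\left(1 - \hat{y}(x_i)\right) \right] \tag{2}$$
$$\mathcal{L}^{SE}_{LA} = \sum_{i=1}^{N} \left( \hat{y}(x_i) - 0.5 \right) \cdot \phi(x_i) \tag{3}$$
where $\hat{y}$ and $y$ ($y \in \{0, 1\}$) are the prediction of LA and its ground truth, respectively, $N$ is the number of pixels, and $\cdot$ denotes elementwise product.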
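The interplay between the signed DTM and the SE loss can be illustrated with a small sketch. It is not the authors' implementation: it uses a brute-force 2D distance transform in plain Python, assumes the hyperparameter acts as an exponent on the distance, and assumes the SE loss multiplies the prediction (shifted by 0.5) with the signed distance. All function names are hypothetical.

```python
import math


def signed_dtm(mask, beta=1.0):
    """Brute-force signed distance transform map of a binary 2D mask.

    Foreground pixels touching background (4-neighbourhood) or the image
    edge form the boundary and get 0; other foreground pixels get -d**beta
    and background pixels get +d**beta, with d the Euclidean distance to
    the nearest boundary pixel. Only suitable for small grids.
    """
    h, w = len(mask), len(mask[0])

    def on_boundary(r, c):
        if not mask[r][c]:
            return False
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < h and 0 <= cc < w) or not mask[rr][cc]:
                return True
        return False

    boundary = [(r, c) for r in range(h) for c in range(w) if on_boundary(r, c)]
    if not boundary:
        raise ValueError("mask has no foreground pixels")
    phi = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            d = min(math.hypot(r - br, c - bc) for br, bc in boundary)
            if d > 0:
                phi[r][c] = (-1.0 if mask[r][c] else 1.0) * d ** beta
    return phi


def bce_loss(pred, mask, eps=1e-7):
    """Binary cross-entropy summed over all pixels."""
    total = 0.0
    for p_row, y_row in zip(pred, mask):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def se_loss(pred, phi):
    """Sum of (prediction - 0.5) * signed distance over all pixels."""
    return sum((p - 0.5) * f
               for p_row, f_row in zip(pred, phi)
               for p, f in zip(p_row, f_row))


# 5x5 image with a 3x3 square target in the centre.
mask = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)]
        for r in range(5)]
phi = signed_dtm(mask, beta=1.0)

clean = [[float(v) for v in row] for row in mask]
outlier = [row[:] for row in clean]
outlier[0][0] = 1.0  # a false positive away from the target

print(se_loss(clean, phi), se_loss(outlier, phi))
```

Under these assumptions a false positive is penalized in proportion to its distance from the target boundary, which is the mechanism by which such a term discourages distant outliers.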
2.2 Spatially Encoded Constraint with an Explicit Projection for Scar Quantification
For scar quantification, we encode the spatial information by adopting the distance probability map of normal wall and scar region as the ground truth instead of the binary scar label. This is because the scar region can be very small and discrete, so its detection presents significant challenges to current DL-based methods due to the class-imbalance problem. In contrast to traditional DL-based algorithms optimizing in a discrete space, the distance probability map considers the continuous spatial information of scars. Specifically, we separately obtain the DTM of the scar and normal wall from a manual scar label, and convert both into probability maps $p = [p_{normal}, p_{scar}]$. Here $p = e^{-d}$ and $d$ is the nearest distance to the boundary of normal wall or scar for pixel $x_i$. Then, the SE loss for scar quantification can be defined as:
$$\mathcal{L}^{SE}_{scar} = \sum_{i=1}^{N} \left\| \hat{p}(x_i) - p(x_i) \right\|_2^2 \tag{4}$$
where $N$ is the number of pixels and $\hat{p}$ ($\hat{p} = [\hat{p}_{normal}, \hat{p}_{scar}]$) is the predicted distance probability map of both normal wall and scar region. Note that the situation of $\hat{p}_{normal} + \hat{p}_{scar} > 1$ sometimes exists. One can compare these two probabilities to extract scars instead of employing a fixed threshold.
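A minimal sketch of the distance probability maps follows. It rests on assumptions that are not confirmed by the text: the probability decays exponentially with distance, and the distance is measured to the nearest labelled pixel of the class, so the probability is 1 on the class itself. The label convention (0 background, 1 normal wall, 2 scar) and all function names are hypothetical.

```python
import math


def distance_probability(label, target):
    """exp(-d) per pixel, d = Euclidean distance to the nearest `target` pixel."""
    h, w = len(label), len(label[0])
    pts = [(r, c) for r in range(h) for c in range(w) if label[r][c] == target]
    if not pts:
        raise ValueError("target class is absent from the label")
    return [[math.exp(-min(math.hypot(r - pr, c - pc) for pr, pc in pts))
             for c in range(w)]
            for r in range(h)]


def se_loss_scar(pred, target):
    """Squared error summed over both maps; each argument is (normal, scar)."""
    total = 0.0
    for pred_map, target_map in zip(pred, target):
        for p_row, t_row in zip(pred_map, target_map):
            for p, t in zip(p_row, t_row):
                total += (p - t) ** 2
    return total


def extract_scar(p_normal, p_scar):
    """Mark a pixel as scar when its scar probability exceeds the normal one."""
    return [[1 if s > n else 0 for n, s in zip(n_row, s_row)]
            for n_row, s_row in zip(p_normal, p_scar)]


# Assumed label convention: 0 = background, 1 = normal wall, 2 = scar.
label = [
    [0, 0, 0, 0, 0],
    [1, 1, 2, 1, 1],
    [0, 0, 0, 0, 0],
]
p_normal = distance_probability(label, 1)
p_scar = distance_probability(label, 2)
scar = extract_scar(p_normal, p_scar)

# The two maps are not normalized against each other, so their sum can exceed 1.
print(p_normal[1][2] + p_scar[1][2])
print(scar[1])
```

The comparison is applied to every pixel here; in the described pipeline it would only matter on the LA surface, after projection.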
To ignore the wall thickness, which varies across positions and patients [5], the extracted scars are explicitly projected onto the LA surface. The volume-based scar segmentation is thereby converted into a surface-based scar quantification through the spatially explicit projection. However, the pixel-based classification in the surface-based quantification task includes only very limited information, i.e., the intensity value of one pixel. In contrast to extracting multiscale patches along the LA surface [7], we employ the SE loss to learn the spatial features near the LA surface. Similar to [7], the SE loss can also improve the robustness of the framework against LA segmentation errors.
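2.3 Multitask Learning with an End-to-end Trainable Shape Attention
To employ the spatial relationship between LA and atrial scars, we design an MTL network including two decoders, i.e., one for LA and the other for scar segmentation. As Fig. 1 shows, Decoder$_{LA}$ is supervised by the BCE and SE losses for LA, $\mathcal{L}^{BCE}_{LA}$ and $\mathcal{L}^{SE}_{LA}$, and Decoder$_{scar}$ is supervised by the SE loss for scars, $\mathcal{L}^{SE}_{scar}$. To explicitly learn the relationship between the two tasks, we extract the LA boundary from the predicted LA as an attention mask for the training of Decoder$_{scar}$, namely the explicit projection mentioned in Section 2.2. An SA loss is introduced to enforce the attention of Decoder$_{scar}$ on the LA boundary:
$$\mathcal{L}^{SA}_{scar} = \sum_{i=1}^{N} \left( M \cdot \left( \nabla\hat{p}(x_i) - \nabla p(x_i) \right) \right)^2 \tag{5}$$
where $\nabla\hat{p} = \hat{p}_{normal} - \hat{p}_{scar}$ and $\nabla p = p_{normal} - p_{scar}$ are the differences between the normal-wall and scar distance probability maps of the prediction and the ground truth, respectively, and $M$ is the boundary attention mask, which can be generated from the gold standard segmentation of LA ($M_1$) as well as the predicted LA ($M_2$). Hence, the total loss of the framework is defined by combining all the losses mentioned above:
$$\mathcal{L} = \mathcal{L}^{BCE}_{LA} + \lambda_{LA}\mathcal{L}^{SE}_{LA} + \lambda_{scar}\mathcal{L}^{SE}_{scar} + \lambda_{M_1}\mathcal{L}^{SA}_{scar_{M_1}} + \lambda_{M_2}\mathcal{L}^{SA}_{scar_{M_2}} \tag{6}$$
where $\lambda_{LA}$, $\lambda_{scar}$, $\lambda_{M_1}$ and $\lambda_{M_2}$ are balancing parameters.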
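The boundary attention can be sketched as follows, assuming the SA loss is a squared error on the difference between the normal-wall and scar probability maps, restricted to LA boundary pixels. The 4-neighbourhood boundary rule, the function names and the weights passed to `total_loss` are illustrative assumptions, not the settings used in the experiments.

```python
def boundary_mask(la):
    """1 for LA pixels with a 4-neighbour outside the LA (or off-grid), else 0."""
    h, w = len(la), len(la[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if not la[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < h and 0 <= cc < w) or not la[rr][cc]:
                    out[r][c] = 1
                    break
    return out


def sa_loss(mask, pred_normal, pred_scar, gt_normal, gt_scar):
    """Squared error of (normal - scar) maps, counted only where mask is 1."""
    total = 0.0
    for r in range(len(mask)):
        for c in range(len(mask[0])):
            diff_pred = pred_normal[r][c] - pred_scar[r][c]
            diff_gt = gt_normal[r][c] - gt_scar[r][c]
            total += (mask[r][c] * (diff_pred - diff_gt)) ** 2
    return total


def total_loss(bce_la, se_la, se_scar, sa_m1, sa_m2,
               lam_la, lam_scar, lam_m1, lam_m2):
    """Weighted sum of the individual loss terms."""
    return (bce_la + lam_la * se_la + lam_scar * se_scar
            + lam_m1 * sa_m1 + lam_m2 * sa_m2)


# 3x3 LA inside a 5x5 image; the attention mask is its one-pixel ring.
la = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)]
      for r in range(5)]
M = boundary_mask(la)

gt_normal = [[1.0] * 5 for _ in range(5)]
gt_scar = [[0.0] * 5 for _ in range(5)]

# Same scar error placed once inside the LA and once on its boundary.
interior_err = [row[:] for row in gt_scar]
interior_err[2][2] = 1.0
boundary_err = [row[:] for row in gt_scar]
boundary_err[1][1] = 1.0

print(sa_loss(M, gt_normal, interior_err, gt_normal, gt_scar))  # ignored by the mask
print(sa_loss(M, gt_normal, boundary_err, gt_normal, gt_scar))  # penalized
```

Because the mask zeroes every pixel off the boundary, only errors on the LA surface contribute, which is what ties the scar decoder to the LA prediction when the mask comes from the predicted LA.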
3 Experiments
3.1 Materials
3.1.1 Data Acquisition and Preprocessing.
The data are from the MICCAI 2018 LA challenge [14]. The 100 LGE MRI training scans, provided with manual LA segmentations, comprise 60 post-ablation and 40 pre-ablation scans. In this work, we chose the 60 post-ablation scans for manual segmentation of the LA scars and used them for the experiments. The LGE MRIs were acquired with a resolution of mm and reconstructed to mm. All images were cropped to a unified size centered on the heart region and were normalized using Z-score. We split the images into two sets, i.e., 40 images for training and 20 for testing.
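The two preprocessing steps can be sketched on a toy 2D image. The crop size and centre below are illustrative values, not the ones used in the experiments, and the function names are hypothetical.

```python
import statistics


def center_crop(image, center, size):
    """Crop a (rows, cols) window around `center`, shifted to stay in bounds."""
    (cr, cc), (sh, sw) = center, size
    h, w = len(image), len(image[0])
    if sh > h or sw > w:
        raise ValueError("crop size exceeds image size")
    r0 = min(max(cr - sh // 2, 0), h - sh)
    c0 = min(max(cc - sw // 2, 0), w - sw)
    return [row[c0:c0 + sw] for row in image[r0:r0 + sh]]


def zscore(image):
    """Subtract the mean and divide by the (population) standard deviation."""
    values = [v for row in image for v in row]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("constant image cannot be Z-score normalized")
    return [[(v - mean) / std for v in row] for row in image]


# Toy 6x6 "image" whose intensity encodes the pixel position.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
crop = center_crop(image, center=(3, 3), size=(4, 4))
norm = zscore(crop)

flat = [v for row in norm for v in row]
print(len(crop), len(crop[0]), statistics.fmean(flat), statistics.pstdev(flat))
```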
3.1.2 Gold Standard and Evaluation.
The challenge provides manual LA segmentations for the training data, and the scars of the 60 post-ablation data were manually delineated by a well-trained expert. These manual segmentations were considered as the gold standard. For LA segmentation evaluation, Dice volume overlap, average surface distance (ASD) and Hausdorff distance (HD) were applied. For scar quantification evaluation, the manual and (semi-)automatic segmentation results were first projected onto the manually segmented LA surface. Then, the Accuracy measurement of the two areas in the projected surface, the Dice of scars and the generalized Dice score were used as indicators of the accuracy of scar quantification.
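The three scar metrics can be sketched on projected surface labels stored as flat lists (0 for normal wall, 1 for scar). The generalized Dice below uses inverse squared class-size weights, a common convention that is assumed here and may differ from the definition used in the evaluation. Function names are hypothetical.

```python
def accuracy(pred, gt):
    """Fraction of surface points with matching labels."""
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)


def dice(pred, gt, label=1):
    """Dice overlap of one class; defined as 1.0 when the class is absent in both."""
    inter = sum(1 for p, g in zip(pred, gt) if p == label and g == label)
    size = sum(1 for p in pred if p == label) + sum(1 for g in gt if g == label)
    return 1.0 if size == 0 else 2.0 * inter / size


def generalized_dice(pred, gt, labels=(0, 1)):
    """Multi-class Dice with weights 1 / (class size in gt)^2 (assumed weighting)."""
    num = den = 0.0
    for k in labels:
        n_gt = sum(1 for g in gt if g == k)
        if n_gt == 0:
            continue  # a class absent from the ground truth gets no weight
        weight = 1.0 / n_gt ** 2
        inter = sum(1 for p, g in zip(pred, gt) if p == k and g == k)
        n_pred = sum(1 for p in pred if p == k)
        num += weight * inter
        den += weight * (n_pred + n_gt)
    return 1.0 if den == 0 else 2.0 * num / den


# Eight surface points: two true scar points, one of which is found.
gt = [0, 0, 0, 0, 0, 0, 1, 1]
pred = [0, 0, 0, 0, 0, 1, 1, 0]

print(accuracy(pred, gt), dice(pred, gt), generalized_dice(pred, gt))
```

With a small scar class, Accuracy stays high even when half of the scar is missed, while the scar Dice drops sharply, which is why the metrics can rank methods differently.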
3.1.3 Implementation.
The framework was implemented in PyTorch, running on a computer with a 1.90 GHz Intel(R) Xeon(R) E5-2620 CPU and an NVIDIA TITAN X GPU. We used the SGD optimizer to update the network parameters (weight decay=0.0001, momentum=0.9). The initial learning rate was set to 0.001 and divided by 10 every 4000 iterations. The balancing parameters in Section 2.3 were set as follows, , , and , where and was multiplied by 1.1 every 200 iterations. Inference required about 8 seconds per test image.
3.2 Result
Method  Dice  ASD (mm)  HD (mm)
U-Net-BCE
U-Net-SE
MTL-BCE
MTL-SE
MTL-SESA
Table 1: Quantitative results of LA segmentation. MTL denotes the architecture with two decoders; BCE, SE, SA and SESA refer to the different loss functions. The proposed method is denoted as MTL-SESA.
Method  Accuracy  Dice (scar)  Generalized Dice
LA+Otsu [10]
LA+MGMM [8]
LA+LearnGC [7]
LA+Otsu
LA+MGMM
U-Net-BCE
U-Net-Dice
U-Net-SE
MTL-BCE
MTL-SE
MTL-SESA
Inter-Ob
3.2.1 Parameter Study.
To explore the effectiveness of the SE loss, we compared the results of the proposed scheme for LA segmentation using different values of the hyperparameter of the DTM in Eq. (1). Fig. 2 (a) provides the results in terms of Dice and HD, and Fig. 2 (b) visualizes three examples illustrating the difference between the results with and without the SE loss. With the SE loss, U-Net-SE evidently reduced clutter and disconnected parts in the segmentation compared to U-Net-BCE, and significantly improved the HD of the resulting segmentation (), though the Dice score may not differ much. U-Net-SE also showed stable performance across different values of this hyperparameter, except for extreme values. In the following experiments, the hyperparameter was set to 1.
3.2.2 Ablation Study.
Table 1 and Table 2 present the quantitative results of different methods for LA segmentation and scar quantification, respectively. For LA segmentation, adding the proposed SE loss performed better than using the BCE loss alone. For scar quantification, the SE loss also showed promising performance compared to the conventional losses in terms of Dice. Both LA segmentation and scar quantification benefited from the proposed MTL scheme compared with performing the two tasks separately. The results were further improved after introducing the newly designed SE and SA losses in terms of scar Dice (), but with a slightly worse Accuracy () and generalized Dice (). Fig. 3 visualizes an example illustrating the scar segmentation and quantification results of the methods listed in Table 2. Compared to U-Net-BCE and U-Net-Dice, MTL-BCE improved the performance, thanks to the MTL network architecture. When the proposed SE and SA losses were included, some small and discrete scars were also detected, and end-to-end scar quantification and projection was achieved.
3.2.3 Comparisons with Literature.
Table 2 and Fig. 3 also present the scar quantification results of several state-of-the-art algorithms, i.e., Otsu [10], multi-component GMM (MGMM) [8], LearnGC [7] and U-Net with different loss functions. The three (semi-)automatic methods generally obtained acceptable results, but relied on an accurate initialization of LA. LearnGC obtained a scar Dice similar to that of MGMM based on LA, but its Accuracy and generalized Dice were higher. The proposed method performed much better than all the automatic methods in terms of Dice with statistical significance (). In Fig. 3, one can see that Otsu and U-Net tended to under-segment the scars. Although including the Dice loss could alleviate the class-imbalance problem, the SE loss was evidently more effective, which is consistent with the quantitative results in Table 2. MGMM and LearnGC both detected most of the scars, but LearnGC has a potential advantage in detecting small scars. The proposed method could also detect small scars and obtained a smoother segmentation result.
4 Conclusion
In this work, we have proposed an end-to-end learning framework for simultaneous LA segmentation and scar quantification by combining the SE and SA losses. The proposed algorithm has been applied to 60 image volumes acquired from AF patients and obtained results comparable to inter-observer variations. The results have demonstrated the effectiveness of the proposed SE and SA losses, and showed the superiority of the segmentation performance over the conventional schemes. In particular, the proposed SE loss substantially reduced the outliers that frequently occur in the predictions of DL-based methods. Our technique can be easily extended to other segmentation tasks, especially for discrete and small targets such as lesions. A limitation of this work is that the gold standard was constructed from the manual delineation of only one expert. In addition, this study included only post-ablation AF patients. In future work, we will combine multiple experts to construct the gold standard, and consider both pre- and post-ablation data.
4.0.1 Acknowledgement.
This work was supported by the National Natural Science Foundation of China (61971142), and L. Li was partially supported by the CSC Scholarship.
References
[1] (2018) Multitask learning for left atrial segmentation on GE-MRI. In International Workshop on Statistical Atlases and Computational Models of the Heart, pp. 292–301.
[2] (2014) Worldwide epidemiology of atrial fibrillation: a global burden of disease 2010 study. Circulation 129 (8), pp. 837–847.
[3] (2017) Efficient multiscale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Medical Image Analysis 36, pp. 61–78.
[4] (2014) A method to standardize quantification of left atrial scar from delayed-enhancement MR images. IEEE Journal of Translational Engineering in Health and Medicine 2, pp. 1–15.
[5] (2018) Algorithms for left atrial wall segmentation and thickness – evaluation on an open-source CT and MRI image database. Medical Image Analysis 50, pp. 36–53.
[6] (2013) Evaluation of current algorithms for segmentation of scar tissue from late gadolinium enhancement cardiovascular magnetic resonance of the left atrium: an open-access grand challenge. Journal of Cardiovascular Magnetic Resonance 15 (1), pp. 105.
[7] (2020) Atrial scar quantification via multiscale CNN in the graph-cuts framework. Medical Image Analysis 60, pp. 101595.
[8] (2017) Myocardium segmentation from DE MRI using multi-component Gaussian mixture model and coupled level set. IEEE Transactions on Biomedical Engineering 64 (11), pp. 2650–2661.
[9] (2018) Left atrial segmentation combining multi-atlas whole heart labeling and shape-based atlas selection. In International Workshop on Statistical Atlases and Computational Models of the Heart, pp. 302–310.
[10] (2013) A novel skeleton based quantification and 3D volumetric visualization of left atrium fibrosis using late gadolinium enhancement magnetic resonance imaging. IEEE Transactions on Medical Imaging 33 (2), pp. 566–576.
[11] (2018) Fully automatic left atrium segmentation from late gadolinium enhanced magnetic resonance imaging using a dual fully convolutional neural network. IEEE Transactions on Medical Imaging 38 (2), pp. 515–524.
[12] (2019) Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 605–613.
[13] (2019) Liver segmentation in magnetic resonance imaging via mean shape fitting with fully convolutional neural networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 246–254.
[14] (2018) 2018 atrial segmentation challenge. Note: http://atriaseg2018.cardiacatlas.org/