Abdominal Aortic Aneurysm (AAA) is an enlargement of the abdominal aorta. It is usually asymptomatic but carries high mortality once it ruptures. Rupture is usually prevented by placing a stent graft in the aneurysm to re-establish blood flow and thereby shield the aneurysm wall from blood pressure. Fenestrated Endovascular Aortic Repair (FEVAR), which is specific to aneurysms near or involving the renal and visceral vessels, uses fenestrated stent grafts with fenestrations, scallops and branch stent grafts to perfuse the main and branch vessels. A real main fenestrated stent graft and its model are shown in fig. 1a and b respectively. One main challenge in FEVAR is the cannulation of a branch stent graft from the fenestration or scallop into the corresponding branch vessel. Robot-assisted systems have been developed to facilitate this task, e.g. the Magellan system (Hansen Medical, CA, USA). However, only 2D fluoroscopy images are currently used for navigation, whereas cannulation needs precise 3D-3D geometrical alignment.
Various methods have been developed to improve navigation. The stent graft delivery device was detected and tracked with Frangi filtering and Robust Principal Component Analysis (RPCA). The 3D stent shape was recovered from one X-ray image by registration and semi-simultaneous optimization. Aortic and iliac deformations caused by device insertion were corrected with the skeleton-based As-Rigid-As-Possible (ARAP) method. The 3D shape of a fenestrated stent graft after deployment was instantiated semi-automatically from one fluoroscopy image of its compressed state using markers and the RP5P method. The 3D shape of a deployed fenestrated stent graft was instantiated semi-automatically from one fluoroscopy image of its deployed state using the RP5P method, graft gap interpolation and semi-automatic marker center determination. In this previous work, markers could only be segmented into a single class, so manual classification was essential for 3D shape instantiation.
In this paper, automatic 3D shape instantiation becomes possible because the markers are designed in five different shapes and segmented into multiple classes automatically. One experimental fluoroscopy image with the customized markers labeled in different colors is shown in fig. 1c. We use marker segmentation rather than marker detection to determine the marker center positions, because segmentation is a pixel-level classification and is therefore more precise. Segmenting these customized markers into multiple classes poses two challenges: 1) the markers are very small (for the reason explained in section II-A), which causes class imbalance; 2) the markers have similar appearances (for the reason explained in section II-B).
Compared with conventional segmentation methods, deep convolutional neural networks, which extract and classify features automatically using multiple non-linear modules, have achieved significantly better performance in semantic segmentation. The Fully Convolutional Neural Network (FCNN) was the first network to extend the image-level classification of Convolutional Neural Networks (CNNs) to pixel-level classification, using fully convolutional layers, deconvolutional layers and skip architectures. Ronneberger et al. first introduced FCNN into biomedical segmentation and proposed U-Net for neuronal structure segmentation and cell segmentation. The DeepLab series, including DeepLabv1, DeepLabv2, DeepLabv3, and DeepLabv3+, with atrous convolution, Atrous Spatial Pyramid Pooling (ASPP), and encoder-decoder modules, are also popular networks for semantic segmentation.
Class imbalance, where background pixels far outnumber foreground pixels, is a common challenge in semantic segmentation. A usual remedy is to assign large weights to foreground pixels and small weights to background pixels, so that training concentrates on the foreground. Applying weighted loss in our application has three shortcomings (demonstrated in section III-A): 1) the weight must be set manually; 2) when the weight is too small, weighted loss cannot distinguish between different foreground classes, whereas when the weight is too large, background is mis-classified as foreground; 3) its performance is insufficient.
Two-stage networks have also been widely explored in both the biomedical and natural image communities to improve performance on small-object or class-imbalanced segmentation. The Cascaded Fully Convolutional Network (CFCN) first segments the liver as a Region of Interest (RoI), and then a second FCN is trained to segment the small lesions inside the liver RoI. In Zhou et al.’s work, the pancreas was segmented first, and then the cyst inside the pancreas was segmented to improve the accuracy of small cyst segmentation. For natural images, Mask Region-CNN (Mask R-CNN) was developed, where an object bounding box is first regressed and classified and an FCN is then applied inside this bounding box.
Apart from improving the network structure and using two-stage networks, research has also been carried out on the loss function. A topology-aware FCN was proposed that incorporates multi-region topological relationships and smooth boundaries into the loss function for histology gland segmentation. A Convolutional AutoEncoder (CAE) was added to the loss function to incorporate a shape prior for semantic segmentation, which showed improved results in kidney ultrasound image segmentation. Recently, focal loss was introduced in the object detection domain; it adds scaling factors automatically to focus training on hard examples. However, directly applying this focal loss to our application poses three challenges: 1) the performance is insufficient (demonstrated in section III-B); 2) it needs careful parameter initialization; 3) the weight it uses would introduce the same problems as stated above for weighted loss.
In this paper, Equally-weighted Focal U-Net is proposed. “Equally-weighted” means that an equal weight of 1 is applied to the foreground and the background. “Focal” means that focal loss is used. The proposed method is a one-stage network with two-step training, as shown in fig. 2. First, U-Net with an equally-weighted loss function is trained to produce a preliminary segmentation. Second, U-Net with equally-weighted focal loss is trained to improve the preliminary segmentation. It has three advantages over the original focal loss and Weighted U-Net: 1) the model trained with equally-weighted loss serves as the initialization for the later equally-weighted focal loss, which avoids careful manual parameter initialization; 2) equally-weighted loss avoids the problems caused by weighted loss and removes one hyper-parameter, the weight; 3) even though equally-weighted loss under-performs weighted loss, the later equally-weighted focal loss improves the preliminary segmentation and outperforms weighted loss. U-Net was selected as the network structure because it is easy to train from scratch with limited training data (80 images in this paper). The proposed Equally-weighted Focal U-Net and the resulting 3D shape instantiation were validated on 78 testing images; the 3D shape instantiation error was comparable to that obtained with manual and semi-automatic marker center determination.
Section II describes the methodology, including marker design, image collection, Equally-weighted Focal U-Net, a brief introduction to 3D shape instantiation, and the experimental setup. In section III, the impact of block number, data augmentation, and image enhancement is explored, different methods are compared, and the performance of segmentation and 3D shape instantiation is reported. The proposed method is discussed in section IV and conclusions are drawn in section V.
The design of the stent graft markers is described in section II-A. Section II-B describes the image collection process. Section II-C explains the data representation, network structure and loss function used in the proposed Equally-weighted Focal U-Net. 3D shape instantiation is briefly introduced in section II-D to aid understanding. In section II-E, the parameters of the experimental setup are described and explained.
II-A Marker Design
Stent graft markers were designed in five different shapes, based on commercially used gold markers (shown in fig. 1a), and were placed at five non-planar positions on each stent segment. The marker parameters are shown in table I. The lengths were designed to be similar to those of commercial markers, which are around . The thicknesses were determined empirically to minimize thickness while retaining good imaging quality under lowest-radiation fluoroscopy. The shapes were designed to be maximally distinguishable and easy to sew onto the stents. Due to the high price of gold, the markers for the experiment were printed on an Mlab Cusing R machine (ConceptLaser, Lichtenfels, Germany) from SS316L stainless steel powder. The printed markers are shown in fig. 1d. The small marker size caused class imbalance. The five marker classes occupied , , , , of the total pixels of the fluoroscopy image.
| Hole Radius (mm) | 0.5 | 0.2 | 0.2 | - | 0.63 |
II-B Image Collection
To simulate the intra-operative fluoroscopy images in FEVAR, the five newly designed markers were sewn onto each stent segment of three stent grafts (iliac, fenestrated, and thoracic) at non-planar positions, as shown in fig. 1c. The modified stent grafts were inserted, delivered and deployed into five 3D-printed patient aneurysm phantoms. For more details of the 3D-printed phantoms, please see . Fourteen matching positions, or setups, were selected and each setup was scanned by a GE Innova 4100 (GE Healthcare, Bucks, UK) at 13 view angles from to . Varying the view angle is necessary to show that the 3D shape instantiation works for any view angle. It also caused the 2D marker shapes to appear similar in the fluoroscopy images, even though the markers were designed to be distinguishable in 3D. During the experiment, one marker fell off, so that setup was abandoned. In addition, 11 fluoroscopy images were not stored, resulting in 158 2D fluoroscopy images in total. The 78 images from the 6 setups with all 13 view angles were used for testing, while the remaining 80 were used for training. Due to the limited number of available images, no validation set was split off. More details about the experimental setup and image collection can be found in .
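The image counts quoted above can be cross-checked with a few lines of arithmetic. The variable names below are illustrative only; the numbers are taken from the text.

```python
# Bookkeeping for the image collection described in the text.
setups_selected = 14
views_per_setup = 13
abandoned_setups = 1   # the setup in which a marker fell off
unsaved_images = 11    # images that were not stored

scanned = (setups_selected - abandoned_setups) * views_per_setup
collected = scanned - unsaved_images
testing = 6 * views_per_setup      # six setups with all 13 view angles
training = collected - testing

print(collected, testing, training)  # prints: 158 78 80
```

The totals agree with the 158 collected, 78 testing and 80 training images reported in the paper.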
II-C Equally-weighted Focal U-Net
II-C1 Data representation
Given a training or testing data set , where is one image example with width and height , in this paper, is the total number of images in the training or testing data set. The intensity of each pixel in is normalized into by: . The segmentation ground truth of in the training data set is labelled as a labelling cube: , where is the number of marker classes, in this paper (fig. 2), has the same width and height , is the background labelling layer with background pixels labelled as and other pixels labelled as , is the class foreground or marker labelling layer with the class marker pixels labelled as and other pixels labelled as . Because the markers are very small, they rarely overlap one another fully as the fluoroscopy view angle varies. Hence, it is reasonable to treat multiple-class marker segmentation as a non-overlapping problem, in which each pixel belongs to exactly one class.
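As a minimal sketch of this data representation, the snippet below normalizes an image and builds a one-hot labelling cube with one background layer and five marker layers. The min-max normalization to [0, 1], the 0/1 label values, the channel ordering and the function names `normalize` and `to_label_cube` are assumptions made for illustration, not details confirmed by the text.

```python
import numpy as np

def normalize(image):
    """Min-max normalize intensities to [0, 1] (assumed range and formula)."""
    image = image.astype(np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)

def to_label_cube(class_map, num_markers=5):
    """Convert an (H, W) integer class map into an (H, W, num_markers + 1) cube.

    Class 0 is taken to be background and 1..num_markers the marker classes
    (assumed ordering). Each pixel is 1 in exactly one layer, which encodes
    the non-overlapping assumption.
    """
    classes = np.arange(num_markers + 1)
    return (class_map[..., None] == classes).astype(np.uint8)
```

Because every pixel belongs to exactly one class, the layers of the cube sum to 1 at each pixel.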
II-C2 U-Net structure
Following the U-Net structure , a normalized image is passed into the proposed network as input and a probability map cube is calculated, where has the same width and height . The value of each pixel in is the probability that the pixel belongs to the class and lies between 0 and 1. The network used in this paper consists of convolutional layers, max-pooling layers and deconvolutional layers, as illustrated in fig. 3. It has two paths: a contracting path (left) and an expansive path (right). For convenience, we term the layers that operate on images of the same size a block. In the contracting path, each block consists of two convolutional layers followed by a max-pooling layer. In the expansive path, each block consists of two convolutional layers followed by a deconvolutional layer. The last block consists of two convolutional layers, a convolutional layer, a pixel-wise softmax layer, and an argmax layer. The network in fig. 3 is defined as a 3-block U-Net, as three max-pooling/deconvolutional layers are used in total. In this paper, the stride of the convolutional layers is always while that of the max-pooling layers is always .
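To make the block definition concrete, the sketch below lists the spatial size handled by each block of an N-block U-Net. It assumes size-preserving convolutions, pooling and deconvolution that change the size by a factor of 2, and a 512-pixel input; none of these values is confirmed by the text, and `unet_feature_sizes` is a hypothetical helper.

```python
def unet_feature_sizes(input_size, num_blocks, pool_stride=2):
    """Spatial sizes seen along an N-block U-Net, in processing order.

    Assumes convolutions keep the size and each max-pooling (deconvolution)
    divides (multiplies) it by pool_stride. These are assumed values.
    """
    factor = pool_stride ** num_blocks
    if input_size % factor != 0:
        raise ValueError("input_size must be divisible by pool_stride ** num_blocks")
    # Sizes going down: one entry per contracting block, plus the bottom level.
    contracting = [input_size // pool_stride ** i for i in range(num_blocks + 1)]
    # Sizes coming back up, ending at the input resolution (the last block).
    expansive = contracting[-2::-1]
    return contracting + expansive

print(unet_feature_sizes(512, 3))  # prints: [512, 256, 128, 64, 128, 256, 512]
```

Under these assumptions a 3-block network pools three times and deconvolves three times, so the output returns to the input resolution.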
II-C3 Loss function
After passing through the U-Net, each pixel will have a U-Net-predicted value for the classes: . Pixel-wise softmax is used to transform into the probability by:
Cross-entropy loss is calculated across the labelling and predicted probability cube to measure the difference between the predicted probability P and the ground truth L:
Usually, weighted loss is applied to address the class-imbalance problem:
Here, while . In this paper, equally-weighted loss is applied in the first training step: . Once this loss has converged to a minimum, equally-weighted focal loss is applied to improve the preliminary segmentation results:
The scaling factor of heavily suppresses the loss contribution of correctly-segmented pixels (when ), but only lightly suppresses that of wrongly-segmented pixels (when ). Thus the focal loss concentrates training on wrongly-segmented, or hard, pixels.
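The losses in this subsection can be sketched in NumPy as a single function whose per-pixel term is -w (1 - p)^gamma log(p), where p is the softmax probability of the pixel's true class. The notation, the function names, the choice of channel 0 as background and the default values are assumptions for illustration; gamma = 2 is the value recommended in the original focal loss paper and is not necessarily the one used here.

```python
import numpy as np

def softmax(logits):
    """Pixel-wise softmax over the last (class) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def segmentation_loss(logits, labels, fg_weight=1.0, gamma=0.0, eps=1e-12):
    """Mean pixel-wise loss for one-hot labels of shape (H, W, C).

    fg_weight = 1, gamma = 0 : equally-weighted cross-entropy (first step)
    fg_weight > 1, gamma = 0 : weighted cross-entropy
    fg_weight = 1, gamma > 0 : equally-weighted focal loss (second step)
    Channel 0 is assumed to be the background layer.
    """
    p = softmax(logits)
    p_true = (p * labels).sum(axis=-1)              # probability of the true class
    weights = np.where(labels[..., 0] == 1, 1.0, fg_weight)
    per_pixel = -weights * (1.0 - p_true) ** gamma * np.log(p_true + eps)
    return per_pixel.mean()
```

With gamma = 2, a pixel predicted correctly with p = 0.9 is scaled by 0.01, whereas a pixel predicted wrongly with p = 0.1 is scaled by 0.81, so the hard pixel dominates the loss.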
II-D 3D Shape Instantiation
The marker center positions segmented by the proposed Equally-weighted Focal U-Net were used as the input to the RP5P method to recover the 3D pose of each stent segment. The whole stent graft shape was then recovered by graft gap interpolation. Details of the 3D shape instantiation can be found in . Its code is also available online.
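One plausible way to turn a segmentation into marker center positions is to take the centroid of each connected region of a given class. The text does not state how centers are derived from the segmented pixels, so the centroid rule, the 8-connectivity and the name `marker_centers` are all assumptions.

```python
from collections import deque
import numpy as np

def marker_centers(class_map, marker_class):
    """Centroids (row, col) of the 8-connected regions of one marker class."""
    mask = class_map == marker_class
    seen = np.zeros(mask.shape, dtype=bool)
    height, width = mask.shape
    centers = []
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        # Breadth-first flood fill of one connected region.
        queue = deque([(r0, c0)])
        seen[r0, c0] = True
        pixels = []
        while queue:
            r, c = queue.popleft()
            pixels.append((r, c))
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < height and 0 <= cc < width
                            and mask[rr, cc] and not seen[rr, cc]):
                        seen[rr, cc] = True
                        queue.append((rr, cc))
        centers.append(tuple(np.mean(pixels, axis=0)))
    return centers
```

Each returned center is a sub-pixel (row, column) pair, one per marker of the requested class in the image.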
II-E Experimental Setup
II-E1 Data augmentation
To evaluate the sensitivity of the proposed network to data augmentation, two data augmentation methods were compared: 1) rotating the training images from to with as the interval; 2) rotating the training images from to with as the interval and flipping each rotated image along the horizontal and vertical directions respectively. Both methods augmented the training images by a factor of , resulting in training images.
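The two augmentation schemes can be sketched as follows. To keep the example dependency-free it only rotates by multiples of 90 degrees with `np.rot90`; the rotation range and interval used in the experiments are not given here, so the angles, like the function name `augment`, are placeholders. The key point is that each image and its labelling cube must be transformed identically.

```python
import numpy as np

def augment(image, label, quarter_turns=(0, 1, 2, 3), with_flips=False):
    """Return a list of (image, label) pairs.

    image: (H, W) array; label: (H, W, C) labelling cube.
    Rotations are restricted to multiples of 90 degrees (placeholder angles).
    With with_flips=True, each rotated pair is also flipped horizontally
    and vertically.
    """
    pairs = []
    for k in quarter_turns:
        img_r = np.rot90(image, k, axes=(0, 1))
        lab_r = np.rot90(label, k, axes=(0, 1))
        pairs.append((img_r, lab_r))
        if with_flips:
            pairs.append((np.flip(img_r, axis=1), np.flip(lab_r, axis=1)))  # horizontal
            pairs.append((np.flip(img_r, axis=0), np.flip(lab_r, axis=0)))  # vertical
    return pairs
```

For arbitrary angles an interpolating rotation would be needed, with nearest-neighbour interpolation for the labels so that they stay binary.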
II-E2 Image enhancement
To evaluate the effect of image enhancement on the proposed network, image intensity adjustment and contrast-limited adaptive histogram equalization were applied with function:
II-E3 Ground truth labelling
The markers were labelled in Analyze (AnalyzeDirect Inc, Overland Park, KS, USA) by first magnifying the image from to and then, after labelling, shrinking the image from back to . Hence, a 1-pixel labelling error in the resolution image would shrink to pixel in the resolution image.
The learning rate was set step-wise and divided by two or five when the loss stopped decreasing. The dropout rate was set to 0.75. The weights of the neural network were initialized from a truncated normal distribution with and , while the biases were initialized to the constant . The optimizer was the momentum optimizer in TensorFlow with the momentum set to 0.95. The batch size was 1. The loss function was implemented with tf.nn.softmax, tf.log, and tf.reduce_mean, which may be slightly less numerically stable than the default tf.nn.softmax_cross_entropy_with_logits. The mean Intersection over Union (mIoU), i.e. the overlap of the ground truth and the prediction divided by their union, was calculated to evaluate segmentation performance. Except in section III-A, all training used data augmented with image rotation and without image enhancement.
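The IoU metric can be sketched as below for integer class maps. How the paper averages IoUs into an mIoU (per class over the test images, or otherwise) is not spelled out here, so the function computes per-class IoUs for one image and leaves the averaging to the caller; the name `class_iou` and the NaN convention for absent classes are assumptions.

```python
import numpy as np

def class_iou(pred, truth, num_classes):
    """Per-class IoU between two (H, W) integer class maps.

    Classes absent from both maps get NaN so that they can be skipped
    when averaging, e.g. with np.nanmean.
    """
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        p, t = pred == c, truth == c
        union = np.logical_or(p, t).sum()
        if union > 0:
            ious[c] = np.logical_and(p, t).sum() / union
    return ious
```

Stacking the result over all test images and calling `np.nanmean(..., axis=0)` would give one mean IoU per class under this reading.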
The characteristics of the proposed network with respect to the number of U-Net blocks, image enhancement, data augmentation, and weight are presented in section III-A. The comparison between different methods is presented in section III-B. Detailed multiple-class marker segmentation results are shown in section III-C. The accuracy of 3D shape instantiation based on the marker segmentation in this paper is presented in section III-D.
III-A Network Characteristics
The mIoUs achieved with different setups are shown in table II, where the highest mIoU is emphasized in bold font.
III-A1 Number of U-Net blocks
Equally-weighted Focal U-Nets with block numbers from were trained to segment the multiple-class markers in fig. 2; the mIoUs are listed in Row in table II. The 1-block and 6-block U-Nets slightly under-performed the others. Moreover, the training time increased from 36 hours for the 1-block U-Net to 120 hours for the 6-block U-Net. Based on this comparison, the 2-block U-Net was chosen as a trade-off between efficiency and performance in the following validations.
III-A2 Data augmentation
Equally-weighted Focal U-Net with 2 blocks was trained on data augmented with image rotation only and with image rotation plus flipping respectively. The mIoUs for the six classes on the 78 testing images are summarized in Row 2 and Row 7 of table II. The mIoUs achieved with rotation only are higher than those with rotation plus flipping for most classes, the exceptions being Marker 3 and Marker 4. Hence, image rotation alone was used as data augmentation in this paper.
III-A3 Image enhancement
Equally-weighted Focal U-Net with 2 blocks was trained on the training data with and without image enhancement respectively. The mIoUs of the six classes on the 78 testing images are summarized in Row 2 and Row 8 of table II. The mIoUs decreased substantially when the training data was pre-processed with image enhancement. Therefore, the training images are processed by normalization only in the following training.
2-block U-Nets with weights of 1, 20, 50, 100 and 500 were trained. The mIoUs of the six classes on the 78 testing images are listed in Rows 9-13 of table II. The 2-block U-Net with a weight of performed best compared with the small weights (weight = 1, 20) and the large weight (weight = 500). Thus, the 2-block U-Net with a weight of 50 was used in the following work.
The segmentation results of the 2-block U-Net with different weights are illustrated in fig. 4. The five foreground, or marker, classes could not be clearly distinguished from one another with a small weight, i.e. . However, if the weight is too large, i.e. , background is mis-classified as foreground, because this mis-classification contributes too little to the total loss. For example, a wrongly-segmented background pixel () contributes to the total loss, while a wrongly-segmented foreground pixel () contributes to the total loss. The mIoU of the background decreased as the weight increased (Rows 9-13 in table II), which confirms this trend.
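The imbalance described above can be illustrated with a small calculation of per-pixel weighted cross-entropy contributions. The predicted probability of 0.1 for the true class is an assumed value chosen for illustration, not the one used in the paper's example, and `pixel_loss` is a hypothetical helper.

```python
import math

def pixel_loss(p_true, weight):
    """Weighted cross-entropy contribution of one pixel."""
    return -weight * math.log(p_true)

p_wrong = 0.1  # assumed probability assigned to the true class of a mis-segmented pixel
background = pixel_loss(p_wrong, 1)  # background pixels carry a weight of 1
for w in (1, 20, 50, 100, 500):
    foreground = pixel_loss(p_wrong, w)
    print(f"weight={w:3d}  background={background:.3f}  foreground={foreground:.3f}")
```

For equally confident mistakes, a foreground error costs exactly `w` times as much as a background error, so with a very large weight the network can afford to label background pixels as foreground.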
III-B Comparison between different methods
The performance of the 2-block U-Net with five different methods is compared in fig. 5: 1) Equally-weighted Focal U-Net (the proposed method); 2) Weighted U-Net with a weight of 50 for the foreground and 1 for the background; 3) U-Net with Equally-weighted Focal Loss, which used an equally-weighted focal loss from the beginning of training; 4) Equally-weighted U-Net with a weight of 1 for both the foreground and the background; 5) Weighted Focal U-Net with a weight of 50 for the first training step, followed by focal loss with a weight of 50 for the second training step. The performance of these methods is shown as the mean and standard deviation of the IoUs. Fig. 5 shows that the proposed method outperforms the other methods on every marker class.
III-C Multiple-class Marker Segmentation
Equally-weighted Focal U-Net with 3 blocks (Row 3 in table II) was applied to segment each testing image. The results are illustrated in fig. 6, which shows that the proposed network segmented most of the images well, except for a few markers in images No.10, No.13, No.59 and No.71. In addition, fig. 7 presents the segmentation details of image No.21 using the proposed method, where each marker class was segmented with a high overlap between the ground truth and the prediction.
III-D 3D Shape Instantiation
The 78 images contain 2470 markers; of them were segmented with a center position error , which corresponds to 2 pixels on the fluoroscopy image. Marker center positions determined with error were corrected manually. With these marker center positions, the angular error and 3D distance error of 3D shape instantiation are illustrated in fig. 8, which shows that the proposed method performs comparably to 3D shape instantiation with both manual and semi-automatic marker center determination. More 3D shape instantiation results can be found in .
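The center position error and the share of markers within tolerance could be computed as sketched below. The sketch assumes that predicted and ground-truth centers are already matched one to one and that the error is the Euclidean distance in pixels; whether the 2-pixel threshold is inclusive is also an assumption, as are the function names.

```python
import numpy as np

def center_errors(pred_centers, true_centers):
    """Euclidean distance in pixels between matched (row, col) centers."""
    pred = np.asarray(pred_centers, dtype=float)
    true = np.asarray(true_centers, dtype=float)
    return np.linalg.norm(pred - true, axis=1)

def fraction_within(errors, tolerance_px=2.0):
    """Fraction of markers whose center error does not exceed the tolerance."""
    return float(np.mean(errors <= tolerance_px))
```

Markers falling outside the tolerance are the ones that would need manual correction before 3D shape instantiation.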
All training was performed on an NVIDIA TITAN Xp GPU. Segmenting one image took less than . The implementation is based on the released code of .
Equally-weighted Focal U-Net was proposed to segment the customized stent graft markers into multiple classes. The segmented marker center positions are used by the RP5P method, which makes automatic 3D stent graft shape instantiation possible. Focal loss was successfully applied to semantic segmentation for the first time, with convincing improvements.
In section III-A1, the performance of U-Net with different block numbers was explored. The results showed that Equally-weighted Focal U-Net did not achieve a higher mIoU as the block number increased. One possible reason is network degradation. In the future, the network structure will be explored in detail.
In section III-A4, different weights were explored. Usually, weighted loss outperforms equally-weighted loss for class-imbalanced segmentation, as it gives the foreground more importance by assigning it a higher weight. However, in this paper we consider the background to be as important as the foreground, because mis-classified background pixels also decrease the foreground IoU. Equally-weighted loss was therefore applied.
The proposed method is capable of multiple-class marker segmentation: it obtained an overall mIoU of 0.6943 and detected markers with center position error . A comparable 3D shape instantiation error () was achieved with the near-automatic marker center determination in this paper, relative to 3D shape instantiation with semi-automatic () and manual () marker center determination in .
In this paper, Equally-weighted Focal U-Net was proposed for multiple-class marker segmentation, enabling automatic 3D stent graft shape instantiation. The performance of the proposed network was explored and discussed with respect to the number of blocks, the data augmentation method, image enhancement, and different weights. Based on these results, the 3-block Equally-weighted Focal U-Net showed optimal accuracy in multiple-class marker segmentation. In the future, the proposed network will be further improved and extended into a general framework for wider applications.
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research. This work was supported by EPSRC project grant EP/L020688/1.
-  K. C. Kent, “Abdominal aortic aneurysms,” New England Journal of Medicine, vol. 371, no. 22, pp. 2101–2108, 2014.
-  J. Cross, K. Gurusamy, V. Gadhvi, D. Simring, P. Harris, K. Ivancev, and T. Richards, “Fenestrated endovascular aneurysm repair,” British Journal of Surgery, vol. 99, no. 2, pp. 152–159, 2012.
-  D. Volpi, M. H. Sarhan, R. Ghotbi, et al., “Online tracking of interventional devices for endovascular aortic repair,” IJCARS, vol. 10, no. 6, pp. 773–781, 2015.
-  S. Demirci, A. Bigdelou, L. Wang, et al., “3D stent recovery from one x-ray projection,” in MICCAI 2011. Springer, 2011, pp. 178–185.
-  D. Toth, M. Pfister, A. Maier, M. Kowarschik, and J. Hornegger, “Adaption of 3D models to 2D x-ray images during endovascular abdominal aneurysm repair,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 339–346.
-  X. Zhou, G. Yang, C. Riga, and S. Lee, “Stent graft shape instantiation for fenestrated endovascular aortic repair.” The Hamlyn Symposium on Medical Robotics.
-  X.-Y. Zhou, J. Lin, C. Riga, G.-Z. Yang, and S.-L. Lee, “Real-time 3-d shape instantiation from single fluoroscopy projection for fenestrated stent graft deployment,” IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 1314–1321, 2018.
-  J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in
-  O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
-  L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Semantic image segmentation with deep convolutional nets and fully connected crfs,” arXiv preprint arXiv:1412.7062, 2014.
-  L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 4, pp. 834–848, 2018.
-  L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,” arXiv preprint arXiv:1706.05587, 2017.
-  L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” arXiv preprint arXiv:1802.02611, 2018.
-  P. F. Christ, M. E. A. Elshaer, F. Ettlinger, S. Tatavarty, M. Bickel, P. Bilic, M. Rempfler, M. Armbruster, F. Hofmann, M. D’Anastasi, et al., “Automatic liver and lesion segmentation in ct using cascaded fully convolutional neural networks and 3d conditional random fields,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2016, pp. 415–423.
-  Y. Zhou, L. Xie, E. K. Fishman, and A. L. Yuille, “Deep supervision for pancreatic cyst segmentation in abdominal ct scans,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 222–230.
-  K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask r-cnn,” in Computer Vision (ICCV), 2017 IEEE International Conference on. IEEE, 2017, pp. 2980–2988.
-  A. BenTaieb and G. Hamarneh, “Topology aware fully convolutional networks for histology gland segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2016, pp. 460–468.
-  H. Ravishankar, R. Venkataramani, S. Thiruvenkadam, P. Sudhakar, and V. Vaidya, “Learning and incorporating shape models for semantic segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 203–211.
-  T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, “Focal loss for dense object detection,” in 2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017, pp. 2999–3007.
-  J. Akeret, C. Chang, A. Lucchi, and A. Refregier, “Radio frequency interference mitigation using deep convolutional neural networks,” Astronomy and computing, vol. 18, pp. 35–39, 2017.