Weakly-supervised 3D coronary artery reconstruction from two-view angiographic images

03/26/2020 ∙ by Lu Wang, et al. ∙ Tsinghua University

The reconstruction of three-dimensional models of coronary arteries is of great significance for the localization, evaluation and diagnosis of stenosis and plaque in the arteries, as well as for assisted navigation in interventional surgery. In clinical practice, physicians capture arterial images from a few angles of coronary angiography, so it is of great practical value to perform 3D reconstruction directly from coronary angiography images. However, this is a very difficult computer vision task because of the complex shape of coronary blood vessels and the lack of datasets and key point labels. With the rise of deep learning, more and more work is being done to reconstruct 3D models of human organs from medical images using deep neural networks. We propose a generative adversarial approach to reconstruct three-dimensional coronary artery models from two different views of angiographic images of coronary arteries. With 3D fully supervised learning and 2D weakly supervised learning schemes, we obtained reconstruction accuracies that outperform state-of-the-art techniques.




I Introduction

Cardiovascular and cerebrovascular diseases have been among the leading causes of death for many years, and a large number of skilled physicians are urgently needed to combat them. Interventional cardiac stent surgery, also known as coronary stent implantation, is an effective technique for treating insufficient myocardial blood supply and cardiac artery obstruction caused by coronary heart disease. In an interventional surgery, the physician inserts a catheter into the cardiovascular blood vessels through an artery. Guided by angiographic images taken from several angles, the physician performs complicated operations such as catheter movement, contrast agent release, and stent installation. The whole process is very delicate and requires high-level training. Generally speaking, only physicians with many years of practical experience can perform such complex operations skillfully and safely. The ultimate goal of assisted navigation in interventional surgery is to reconstruct a three-dimensional (3D) model of the cardiovascular blood vessels before or during surgery, and to accurately register and fuse it with the intra-operative real-time angiographic image sequences, so as to guide physicians in locating areas of artery stenosis and assessing the degree of stenosis. This would greatly reduce the difficulty of interventional surgery. It is also of great economic and social significance for the teaching and training of novice physicians, enabling relatively complex surgeries to be completed without extensive clinical experience.

3D reconstruction of coronary arteries from coronary computed tomography angiography (CCTA) is a well-developed technology and has been integrated into CT machines produced by different manufacturers. However, reconstruction from CCTA is affected by artifacts and heartbeats, which reduces the sensitivity (true positive rate) of detecting and assessing stenosis[1]. As a consequence, CCTA is not widely adopted as an essential step in interventional surgery. Interventional coronary angiography is the most effective way to locate and assess stenosis, so it is known as the gold standard and is widely applied in clinical situations. Compared with 3D reconstruction from CCTA images, it makes more sense to reconstruct 3D models directly from coronary angiography images taken from different angles of view. Traditional stereo vision has developed methods to recover the 3D shape of an object from multi-view 2D images. However, these methods require intensive work on calibration, feature extraction, matching and fusion. The advent and prosperity of deep learning make 3D reconstruction an easier task, and allow a 3D shape to be reconstructed from a single view or a sparse set of views.

II Related Work

There have been several previous works on general 3D object reconstruction using deep learning techniques. Joshua B. Tenenbaum’s group from the Department of Brain and Cognitive Sciences of MIT introduced a 3D-GAN to perform the reconstruction via a generator network. The benefits of their model were that it captured object structure implicitly and synthesized high-quality 3D objects, and that it mapped from a low-dimensional probabilistic space to the space of 3D objects, allowing the 3D object manifold to be explored[2]. Silvio Savarese’s group from the Computational Vision and Geometry Laboratory of Stanford University has made great contributions to 3D object and scene reconstruction and interpretation in the past few years. They proposed 3D-R2N2 as a 3D fully supervised learning method[3], and introduced a 2D weakly supervised learning method using a unique raytrace pooling layer[4], which inspired us to implement the angiographic imaging process in the generator network. Abhinav Gupta’s group from the Robotics Institute of Carnegie Mellon University studied 3D shape attributes, generic properties that capture curvature, contact and occupied space in depth, to infer these attributes from a single 2D image, as well as to infer a low-dimensional vector representing the 3D shape from a single 2D image[5].


Attempts at 3D reconstruction via deep learning in the medical imaging field have also been thriving in recent years. Guang-Zhong Yang’s group from the Institute of Global Health Innovation of Imperial College London published their work on right ventricle point cloud reconstruction through a convolutional neural network, which they called PointOutNet[6]. Lei Xing’s group from the Department of Radiation Oncology of Stanford University did excellent work on the patient-specific reconstruction of the 3D volumetric anatomy of the upper abdomen, lung, or head and neck from a single CT scan image[7]. However, the human organs reconstructed with the above methods are simple in shape compared with the complex branches and twists of the coronary arteries. To the best of our knowledge, existing methods to reconstruct 3D coronary arteries from 2D angiographic projections were mainly based on conventional stereo vision algorithms, which required careful calibration of imaging systems and laborious, dense key point labeling and registration[8] [9] [10] [11] [12]. In this paper, we fused the deep learning methods of general 3D object reconstruction with the special process and details of coronary angiography. By modifying the layers of the network, we automated the 3D reconstruction of coronary arteries.

III Methods

Because CCTA is susceptible to artifacts and cardiac motion errors, it has not been widely used in the field of cardiovascular disease diagnosis. The number of coronary angiography images (2D) is much larger than the number of cardiovascular 3D models reconstructed by CCTA, which makes it difficult to realize the goal of using 3D models as labels for fully supervised learning. Therefore, it is necessary to study an unsupervised or semi-supervised method to complete the work of reconstructing a 3D blood vessel model from 2D angiographic images.

In this paper, we implemented a weakly supervised 3D reconstruction method by generative-adversarial deep learning. First, we used deep neural networks to reconstruct a 3D volumetric coronary artery model from 2D angiography images of two different angles of view. Then the reconstructed 3D model was reprojected to render two 2D images. We inserted a raytrace pooling layer into the generator network to accomplish this, according to the angiographic imaging law and the given camera parameters. Weak supervision was realized by setting the objective function of the generator to be the cross entropy between the generated 2D images and the input images.

III-A The Projection Transformation

In normal photography, rays of light emitted from or reflected by an object pass through a group of lenses and are captured by film or digital image sensors. In X-ray angiographic imaging, however, the setting is slightly different. The object, usually the patient, is located between the X-ray source and the imaging surface. The radiation emitted by the X-ray source is partially blocked by the object, forming a weakened image on the imaging surface. The contrast of the image, i.e., the degree of lightness and darkness of each pixel, is inversely related to the X-ray absorption capacity of the object. We first briefly formulate the imaging process of the angiographic system.

III-A1 Angiographic Projection

First, we introduce several coordinate systems used in the imaging process. It should be noted that a coordinate system is only a mathematical tool that helps us understand spatial relationships; it is virtual, not physical. For simplicity, we suppose that the object, i.e., the patient, lies at the origin of the world coordinate system, the X-ray source is placed below the patient, and the camera is placed above the patient. The world coordinate system is fixed in space and is a right-handed system, whose -axis points along the left shoulder of the patient, -axis along the head, and -axis along the frontal direction.

Considering the different orientations of the X-ray beam, we introduce a ’perspective’ coordinate system (), in which the X-ray always points along the positive orientation of -axis. The origin of the perspective coordinate system coincides with that of the world coordinate system. Different orientations of fluoroscopic projection are produced by rotating the perspective system in the world system. We first rotate the world system around its -axis by , then around its -axis by . If , the view is denoted as left anterior oblique (LAO), otherwise right anterior oblique (RAO). If , the view is denoted as cranial (CRA), otherwise caudal (CAU)[16]. The overall rotational matrix is


The camera coordinate system () is a left hand system, whose origin is shifted with a distance of along the -axis of the perspective system.

Fig. 1: Perspective and camera coordinate systems.

The transformation matrix between the perspective and camera systems is easily obtained,


A point in the coordinate system is in the system:


Thus, the two coordinate systems are bound together. We alter them simultaneously.
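As a rough illustration of how two view angles can define the perspective system, the sketch below composes two elementary rotations and then shifts the origin to obtain camera coordinates. The names `alpha_deg` (LAO/RAO angle), `beta_deg` (CRA/CAU angle) and `distance`, the choice of rotation axes (head-foot axis first, then left-right axis, both taken as fixed world axes), the sign conventions, and the omission of the left-handed axis flip are all assumptions made for this example. They are not taken from the equations of this paper.

```python
import numpy as np

def rot_x(angle_rad):
    """Right-handed rotation about the x-axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(angle_rad):
    """Right-handed rotation about the y-axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def view_rotation(alpha_deg, beta_deg):
    """Assumed composition: rotate about y by alpha, then about x by beta."""
    alpha, beta = np.deg2rad(alpha_deg), np.deg2rad(beta_deg)
    return rot_x(beta) @ rot_y(alpha)

def world_to_perspective(points_world, alpha_deg, beta_deg):
    """Express world points (N x 3) in the rotated perspective frame.

    The columns of R are the perspective axes written in world coordinates,
    so each point p becomes R.T @ p, which is `points @ R` for row vectors.
    """
    rotation = view_rotation(alpha_deg, beta_deg)
    return np.asarray(points_world, dtype=float) @ rotation

def perspective_to_camera(points_persp, distance):
    """Assumed shift: the camera origin sits at z = -distance in the perspective frame."""
    return np.asarray(points_persp, dtype=float) + np.array([0.0, 0.0, distance])
```

With both angles set to zero, the perspective frame coincides with the world frame, and the rotation stays orthonormal for any pair of angles.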

III-A2 Voxel Coordinate System

We refer to the organ to be angiographically imaged as the object. The outer contour of the object is represented as a cube with a minimum side length , and is fit into a voxel coordinate system () to digitize the object. Although the object center coincides with the origin of the coordinate system, the origin of the voxel system is translated to one of the vertices of the cube for simplicity of computation. First we normalize the cube by its side length into an auxiliary coordinate system (),


We further divide the cube into three-dimensional voxels, with a spatial resolution in each dimension. The spatial resolution of the cube is . The voxel coordinate system is related to the system as


Eventually, we have derived the transformation from perspective coordinate system to voxel coordinate system,
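The normalization and discretization steps described in this subsection can be sketched as follows. This is a minimal example under stated assumptions: the cube is centred at the origin, and the names `side_length` and `resolution` are placeholders introduced here rather than the paper's symbols.

```python
import numpy as np

def perspective_to_voxel(points, side_length, resolution):
    """Map points inside a cube centred at the origin to integer voxel indices.

    Step 1: divide by the side length and shift by 0.5, so the cube becomes
            the unit cube [0, 1]^3 with its origin at one vertex.
    Step 2: scale by the number of voxels per dimension and take the floor.
    """
    points = np.asarray(points, dtype=float)
    normalized = points / side_length + 0.5
    indices = np.floor(normalized * resolution).astype(int)
    # Points exactly on the far faces would fall one past the last voxel.
    return np.clip(indices, 0, resolution - 1)
```

For a cube of side 2 sampled at 32 voxels per dimension, the object center lands in voxel (16, 16, 16) and the vertex at (-1, -1, -1) lands in voxel (0, 0, 0).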


III-A3 Image Coordinate System

The imaging sensor is placed at a distance from the object, as illustrated in Fig. 1. The transformation from the three-dimensional camera coordinate system to the two-dimensional image coordinate system squashes the -axis, expressed as


where is the focal length, , the dimensions of a pixel on the sensor, , the central positions of the sensor, , the two-dimensional size of the sensor, and , the width and height of the image. The transformation from voxel coordinate system to image system combines rotation, translation and squashing, expressed as


This transformation, Eq. (8), was used to project a 3D voxel model onto a 2D plane and render an image.
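The projection from camera coordinates to pixel coordinates can be illustrated with a generic pinhole-style model. This sketch assumes the depth axis is z and that `focal_length`, `pixel_size` and `center` play the roles of the focal length, pixel dimensions and sensor center mentioned above. It is not a transcription of Eq. (8).

```python
import numpy as np

def project_to_image(points_cam, focal_length, pixel_size, center):
    """Project camera-frame points (N x 3, depth along z) to pixel coordinates.

    pixel_size = (width, height) of one pixel, in the same length unit as
    focal_length; center = (u0, v0) is the pixel hit by the optical axis.
    """
    points_cam = np.asarray(points_cam, dtype=float)
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    u = focal_length * x / (z * pixel_size[0]) + center[0]
    v = focal_length * y / (z * pixel_size[1]) + center[1]
    return np.stack([u, v], axis=1)

# Two points on the same ray through the camera origin share one pixel.
same_ray = project_to_image([[1.0, 2.0, 10.0], [2.0, 4.0, 20.0]],
                            focal_length=1000.0,
                            pixel_size=(0.5, 0.5),
                            center=(256.0, 256.0))
```

Both rows of `same_ray` are identical, which shows in miniature that a single projection discards depth.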

However, a naive inversion of the projection transformation fails to map a pixel on the 2D image to a specific voxel in 3D space, because the inversion is a severely underdetermined problem. There has been plenty of work using conventional stereo vision algorithms to resolve this problem [8] [9] [10] [11] [12], but all these methods required dense calibration and laborious manual work.

III-B GAN and WGAN nets

The recent development of deep learning has extended its applications to diverse areas, including 3D reconstruction and interpretation of objects or scenes. In this paper, we follow the logic of the work of J. Gwak, et al. [4]. In the original work, the authors proposed a novel method of weakly supervised 3D reconstruction from a single view or a sparse set of views of 2D silhouette images, with a generative adversarial network (GAN) [13]. The authors used a set of realistic 3D shapes to train the discriminator. They further introduced an innovative raytrace pooling layer that reprojects the reconstructed 3D shapes to 2D images for training the generator, taking the camera parameters into account. Despite the improvements in performance over previous work, the loss function they used in the discriminator was equivalent to the Jensen–Shannon divergence (JS–div). M. Arjovsky, et al., showed that a GAN with the JS–div loss function was problematic and ineffective in training, and almost always caused vanishing gradients and mode collapse [14]. Instead of the traditional JS–div loss, we used the Wasserstein distance as the loss function to train the discriminator, forming a Wasserstein GAN [15].

Wasserstein distance (WD) is also known as Earth–Mover distance (EMD),

$$ W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma} \left[ \lVert x - y \rVert \right], $$

where $P_r$ is the real data distribution, $P_g$ is the synthetic data distribution, and $\Pi(P_r, P_g)$ is the set of all possible joint distributions whose marginals are $P_r$ and $P_g$. WD or EMD is defined as the infimum of the expectation of the distance between a real sample $x$ and a generated sample $y$. In most cases, the overlap between the supports of $P_r$ and $P_g$ has measure zero. This implies that the JS–div has a constant value, so its derivative is zero, which hinders training [14]. The great advantage of WD is that even in this case it is still continuous and provides meaningful gradients for optimization [15]. In real applications, WD is approximated by a regression neural network, to avoid the complexity of direct computation. Wasserstein GAN (WGAN) has outperformed the original GAN and its derivatives by a large margin in tasks such as fine image generation.
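A one-dimensional toy example may help to show why WD still carries a useful signal when two distributions do not overlap. The sketch below is only an illustration and is not part of the network described in this paper: it relies on the fact that, for two equal-size 1D samples, the optimal transport plan pairs sorted values, and it compares the result with the JS divergence of two discrete distributions. The function names are placeholders introduced here.

```python
import numpy as np

def wasserstein_1d(real, fake):
    """Wasserstein-1 distance between two equal-size 1D samples.

    In one dimension the optimal coupling matches sorted values, so the
    distance is the mean absolute gap between the sorted samples.
    """
    real, fake = np.sort(np.asarray(real, float)), np.sort(np.asarray(fake, float))
    return float(np.mean(np.abs(real - fake)))

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (natural log)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Disjoint supports: JS stays at log(2) however far apart the masses are,
# while the Wasserstein distance grows with the separation.
for shift in (1.0, 2.0, 4.0):
    wd = wasserstein_1d(np.zeros(100), np.full(100, shift))
    print(shift, wd)
```

The JS divergence between `[1, 0, 0]` and `[0, 1, 0]` equals that between `[1, 0, 0]` and `[0, 0, 1]`, whereas the Wasserstein distance distinguishes the two cases.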

III-C Our network

In this work, we used a WGAN to generate 3D coronary artery models from 2D angiographic images. The WGAN contained a generator and a discriminator network.

Two types of supervision were used to train the generator network: 3D full supervision and 2D weak supervision. During coronary interventional surgeries, physicians usually take coronary angiography from two different angles. To be consistent with this fact, the generator network accepted two 2D images of different angles of view as input. The 2D images were obtained by projecting the same 3D coronary artery model in two perspectives. The output of the generator network was a 3D voxel model. In the case of 3D full supervision, the cross-entropy of the generated 3D voxel and the 3D voxel label was calculated, and used as the objective function for training the generator. The purpose of using 3D full supervision is to verify the performance improvement of WGAN over the original GAN. In the case of 2D weak supervision, based on the angiographic imaging process described in equation (8) and the angle parameters of the input images, the generated 3D voxel model was reprojected through a ray-tracing pooling layer to generate two 2D images. Then the cross-entropy of the rendered 2D images and the input 2D images was calculated.
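The 2D weak supervision signal can be sketched in a simplified form. The example below replaces the perspective ray-tracing of the actual layer with parallel rays along the axes of the voxel grid, and aggregates occupancy along each ray with a maximum. Both choices, along with the function names and the `axes` parameter, are assumptions made to keep the example short.

```python
import numpy as np

def reproject_parallel(occupancy, axis):
    """Render a 2D occupancy image by taking the maximum along parallel rays."""
    return occupancy.max(axis=axis)

def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and a 0/1 mask."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))

def weak_supervision_loss(occupancy, masks, axes=(0, 1)):
    """Sum the cross-entropy between each reprojected view and its input mask."""
    return sum(binary_cross_entropy(reproject_parallel(occupancy, axis), mask)
               for axis, mask in zip(axes, masks))

# Toy volume: a short vessel segment inside an 8 x 8 x 8 grid.
truth = np.zeros((8, 8, 8))
truth[2:5, 3, 4] = 1.0
views = [truth.max(axis=0), truth.max(axis=1)]  # the two "input images"

confident = np.clip(truth, 0.05, 0.95)  # prediction close to the truth
inverted = 1.0 - confident              # prediction far from the truth
loss_good = weak_supervision_loss(confident, views)
loss_bad = weak_supervision_loss(inverted, views)
```

Here `loss_good` is much smaller than `loss_bad`, so minimizing this loss pushes the generated volume toward one whose two projections match the input images.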

In training the discriminator, the Wasserstein distance (WD) between the generated 3D voxel and the 3D voxel label was used as the loss function. In theory, the generator and the discriminator are trained alternately, in an adversarial manner. In our experiment, however, we updated the discriminator only after the generator had been trained several times. The training process terminated when the WD was no greater than the threshold or the number of iterations exceeded the upper limit.
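The training schedule and stopping rule described above can be outlined as a small loop. This is a schematic sketch only: `generator_step` and `discriminator_step` are hypothetical callables standing in for the real update routines, and `gen_steps_per_disc`, `wd_threshold` and `max_iters` are placeholder names whose values are not given in the text.

```python
def train(generator_step, discriminator_step, wd_threshold, max_iters,
          gen_steps_per_disc=5):
    """Run several generator updates per discriminator update.

    generator_step():     performs one generator update (hypothetical).
    discriminator_step(): performs one discriminator update and returns the
                          current Wasserstein distance estimate (hypothetical).
    Stops when the estimate is no greater than wd_threshold or after
    max_iters discriminator updates.
    """
    iteration, wd = 0, float("inf")
    while iteration < max_iters:
        iteration += 1
        for _ in range(gen_steps_per_disc):
            generator_step()
        wd = discriminator_step()
        if wd <= wd_threshold:
            break
    return iteration, wd
```

The function returns the number of discriminator updates performed and the last distance estimate, which makes it easy to tell which of the two stopping conditions was reached.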

The network structures of the generator and the discriminator, with 2D weak supervision, are illustrated in Fig. 2 and Fig. 3.

Fig. 2: Network structure of the generator.
Fig. 3: Network structure of the discriminator.

IV Experiments

Two experiments were conducted on our coronary artery dataset under different types of supervision: 3D full supervision and 2D weak supervision.

IV-A Dataset

We collected 3D coronary artery models reconstructed directly from CCTA images, with a high accuracy and a nearly smooth surface. Each model contained the left anterior descending artery (LAD) and the right coronary artery (RCA). An example is shown in Fig. 4.

Fig. 4: 3D coronary artery model reconstructed from CCTA images.

We augmented the dataset by randomly rotating the original 3D models within the range of 5-15 degrees about the three axes, and by resizing the models with a random factor between 3% and 10%, a treatment similar to that adopted by Mateusz Buda, et al. [17]. Every original model was rotated by 20 different angles and resized by 10 different factors. A dataset containing 8800 samples was thus created, which is referred to as CADSetA and was used in the reconstruction experiment under 3D supervision.
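The augmentation can be sketched on a model represented as an array of vertex coordinates. Several details below are assumptions, since the text does not specify them: the rotation angles are drawn independently for each axis and are always positive, resizing is taken to mean enlargement by 3% to 10%, and every rotation is combined with every resizing factor. The function names are introduced here for illustration.

```python
import numpy as np

def random_rotation(rng, min_deg=5.0, max_deg=15.0):
    """Compose rotations about x, y and z with angles drawn from [min_deg, max_deg]."""
    ax, ay, az = np.deg2rad(rng.uniform(min_deg, max_deg, size=3))
    rx = np.array([[1, 0, 0], [0, np.cos(ax), -np.sin(ax)], [0, np.sin(ax), np.cos(ax)]])
    ry = np.array([[np.cos(ay), 0, np.sin(ay)], [0, 1, 0], [-np.sin(ay), 0, np.cos(ay)]])
    rz = np.array([[np.cos(az), -np.sin(az), 0], [np.sin(az), np.cos(az), 0], [0, 0, 1]])
    return rz @ ry @ rx

def augment(vertices, rng, n_rotations=20, n_scales=10):
    """Return rotated and resized copies of one model (vertices: N x 3)."""
    variants = []
    for _ in range(n_rotations):
        rotation = random_rotation(rng)
        for _ in range(n_scales):
            scale = 1.0 + rng.uniform(0.03, 0.10)  # assumed: enlarge by 3%-10%
            variants.append(scale * (vertices @ rotation.T))
    return variants

rng = np.random.default_rng(0)
model = rng.normal(size=(50, 3))  # stand-in for the vertices of one artery model
augmented = augment(model, rng)
```

Under these assumptions each original model yields 20 x 10 = 200 variants.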

In the 2D weakly supervised reconstruction experiment, in order to capture the intrinsic shapes of the LAD and RCA, we carefully decomposed each original model by hand into an LAD model and an RCA model. We performed the random rotation and resizing treatments mentioned above on the LAD and RCA models independently, to produce two datasets each containing 8800 samples: CADSetL and CADSetR. Two examples of LAD and RCA corresponding to one patient, under different sets of imaging angles, are illustrated in Fig. 5.

Fig. 5: Separate models of LAD (left column) and RCA (right column).

IV-A1 Details of processes of the dataset

Abiding by the imaging transformation principle described in (8), we digitally projected a 3D model into two 2D images as the input of the generator network, each with a resolution of . The camera parameters were also used in the later reprojection process. In the 3D full supervision case, the output voxel from the generator had a resolution of , so the original smooth 3D models were voxelized into the same resolution as ground truths. In the 2D weak supervision case, the generated voxels had a lower resolution of . Thus the original 3D models were voxelized into as ground truths. The input images were accordingly resized to images as the 2D supervisions.

IV-B Results

IV-B1 3D fully supervised experiment

In the experiment with 3D full supervision, we projected the entire 3D model to 2D images, including LAD and RCA. Four best examples of generated 3D voxels and the ground truths are presented in Fig. 6.

Fig. 6: The best results of 3D full supervised reconstruction of coronary artery. Top row: reconstructed voxels from our network. Bottom row: ground truths.

Except for minor displacements in a few voxels, the reconstructed models looked nearly identical to the label models. We also present four examples of the least accurate results in Fig. 7.

Fig. 7: The least accurate results of 3D full supervised reconstruction of coronary artery. Top row: reconstructed voxels from our network. Bottom row: ground truths.

These reconstructed models exhibited missing parts, artifacts, and deformations.

The performance metric we used is IoU, i.e., the ratio of the intersection between the generated voxel and the ground truth over their union. Both the reconstructions and ground truths had a resolution of . We obtained a mean IoU of 0.718 on CADSetA. As a comparison, J. Gwak, et al. presented their results of 3D fully supervised generation (3D-R2N2) in Table 1 of their paper [4]. They obtained a mean IoU of on the ShapeNet dataset, and regarded it as the upper limit of performance. On careful analysis, we found that their model performed well on objects with simple shapes, for instance, an IoU of 0.8338 for the car category and 0.6784 for the sofa category. However, their model did not perform well on objects with complex shapes, which might have multiple branches or legs. For instance, they obtained an IoU of 0.5174 for the chair category and 0.4950 for the bench category. Coronary arteries have a much more complex shape than the categories they tested, with several branches twisting at multiple angles. On this basis, we consider our 3D fully supervised reconstruction results to be better than those of previous work. The result also meets the expectation that WGAN improves on the performance of the original GAN.
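The IoU metric for voxel grids can be computed as in the following sketch. The binarization threshold of 0.5 applied to the predicted occupancy probabilities, and the convention of returning 1.0 when both grids are empty, are assumptions of this example.

```python
import numpy as np

def voxel_iou(pred, target, threshold=0.5):
    """Intersection over union between a predicted and a ground-truth voxel grid."""
    pred_bin = np.asarray(pred) > threshold
    target_bin = np.asarray(target) > 0.5
    union = np.logical_or(pred_bin, target_bin).sum()
    if union == 0:
        return 1.0  # both grids empty: treated as a perfect match
    return float(np.logical_and(pred_bin, target_bin).sum() / union)

# Two slabs of 32 voxels each that share 16 voxels: IoU = 16 / 48.
a = np.zeros((4, 4, 4))
a[:2] = 1.0
b = np.zeros((4, 4, 4))
b[1:3] = 1.0
iou = voxel_iou(a, b)
```

A mean IoU over a dataset is then the average of `voxel_iou` over all reconstruction and ground-truth pairs.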

IV-B2 2D weakly supervised experiment

In the 2D weak supervision experiment, we decomposed the original 3D coronary artery model into an LAD model and an RCA model. This treatment is in accordance with real surgical situations: in an interventional surgery, the physician inspects the LAD and RCA in different stages, not concurrently. We conducted independent experiments on the LAD model and on the RCA model, each with two different projection angles of and , and present two examples of results from each category in Fig. 8 and Fig. 9. Each result contains two rows and three columns. The bottom row shows the ground truths, and the top row shows the reconstructions. The left column shows the reprojected images from the first angle, the middle column the reprojected images from the second angle, and the right column the 3D voxels. Because of the difficulties in training with 2D weak supervision, the reconstructions and the ground truths had a resolution of . The color of each pixel in the reprojected images represents the probability of occupancy [4], with red and yellow representing higher confidence, and blue and green representing lower confidence.

Fig. 8: 2D weakly supervised reconstructions of left anterior descending arteries (LAD). Two examples are illustrated. For each example, top row: reconstructed voxels and projected images; bottom row: ground truths; left column: 2D projections of the first view; middle column: 2D projections of the second view; right column: 3D voxels.
Fig. 9: 2D weakly supervised reconstructions of right coronary arteries (RCA). Two examples are illustrated. For each example, top row: reconstructed voxels and projected images; bottom row: ground truths; left column: 2D projections of the first view; middle column: 2D projections of the second view; right column: 3D voxels.

Despite some artifacts in the reconstructed voxels, we obtained an IoU of 0.407 on our combined dataset of CADSetL and CADSetR. Again, our results outperformed the five-view reconstructions of the chair category (0.3717) and the bench category (0.321) in the previous work [4].

V Conclusion

By changing the 3D reconstruction network from the original GAN to WGAN, we improved the accuracy of 3D reconstruction for complex-shaped objects like the coronary arteries by a large margin over previous work. Based on the principle of angiographic imaging, we introduced a reprojection layer into the generator network, and obtained better accuracy in reconstructing 3D coronary artery models with 2D weakly supervised learning.


  • [1] K. Samarzija, P. Milosevic, Z. Jurjevic, and E. Erdeljac, ”Grading of carotid artery stenosis with computed tomography angiography: whether to use the narrowest diameter or the cross-sectional area,”  Insights into Imaging vol. 9, no. 4, pp. 527-534, Aug. 2018.
  • [2] J. Wu, C. Zhang, T. Xue, W. T. Freeman, and J. B. Tenenbaum, ”Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling,” in  Proc. 30th Conf. on Neural Information Processing Systems (NIPS), Barcelona, Spain, 2016, pp. 82-90.
  • [3] C. B. Choy, D. Xu, J. Gwak, K. Chen, and S. Savarese, ”3D-R2N2: a unified approach for single and multi-view 3D object reconstruction,” in  Proc. 14th European Conf. on Computer Vision (ECCV), Amsterdam, The Netherlands, 2016, vol. 8, pp. 628-634.
  • [4] J. Gwak, C. B. Choy, M. Chandraker, A. Garg, and S. Savarese, ”Weakly supervised 3D reconstruction with adversarial constraint,” in  Proc. 2017 Int. Conf. on 3D Vision, Qingdao, China, 2017, pp. 263-272.
  • [5] D. F. Fouhey, A. Gupta, and A. Zisserman, ”From images to 3D shape attributes,”  IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 1, pp. 93-106, 2019.
  • [6] X. Zhou, Z. Wang, P. Li, J. Zheng, and G. Yang, ”One-stage shape instantiation from a single 2D image to 3D point cloud,” in  Proc. Medical Image Computing and Computer Assisted Intervention (MICCAI), Shenzhen, China, 2019, vol. 4, pp. 30-38.
  • [7] L. Shen, W. Zhao, and L. Xing, ”Patient-specific reconstruction of volumetric computed tomography images from a single projection view via deep learning,”  Nature Biomedical Engineering, vol. 3, pp. 880–888, Nov. 2019.
  • [8] C. J. Slager, J. J. Wentzel, J. C. H. Schuurbiers, J. A. F. Oomen, J. Kloet, R. Krams, et al., ”True 3-dimensional reconstruction of coronary arteries in patients by fusion of angiography and IVUS (ANGUS) and its quantitative validation,”  Circulation, vol.102, no. 5, pp. 511-516, Aug. 2000.
  • [9] C. V. Bourantas, I. C. Kourtis, M. E. Plissiti, D. I. Fotiadis, C. S. Katsouras, M. I. Papafaklis, et al., ”A method for 3D reconstruction of coronary arteries using biplane angiography and intravascular ultrasound images,”  Computerized Medical Imaging and Graphics, vol. 29, no. 8, pp. 597-606, Dec. 2005.
  • [10] N. Sang, W. Peng, H. Li, Z. Zhang, and T. Zhang, ”3D reconstruction of the coronary tree from two x-ray angiographic views,” in  Proc. Medical Imaging 2006: Image Processing, San Diego, California, USA, 2006, vol. 6144.
  • [11] R. Liao, D. Luc, Y. Sun, and K. Kirchberg, ”3-D reconstruction of the coronary artery tree from multiple views of a rotational X-ray angiography,”  The International Journal of Cardiovascular Imaging, vol. 26, no 7, pp. 733-749, Oct. 2010.
  • [12] R. M Tayebi, R. Wirza, P. S B Sulaiman, M. Z. Dimon, F. Khalid, A. Al-Surmi, et al., ”3D multimodal cardiac data reconstruction using angiography and computerized tomographic angiography registration,”  Journal of Cardiothoracic Surgery, vol. 10 58, 22 Apr. 2015.
  • [13] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, et al., ”Generative adversarial nets,” in  Proc. 28th Conf. on Neural Information Processing Systems (NIPS), Montreal, Canada, 2014, pp. 2672-2680.
  • [14] M. Arjovsky, and L. Bottou, (2017, April). ”Towards principled methods for training generative adversarial networks,” presented at  Proc. Int. Conf. on Learning Representations (ICLR), Toulon, France, 2017. [Online]. Available: https://openreview.net/forum?id=Hk4_qw5xe
  • [15] M. Arjovsky, S. Chintala, and L. Bottou, ”Wasserstein generative adversarial networks,” in  Proc. 34th Int. Conf. on Machine Learning, PMLR, Sydney, Australia, 2017, vol. 70, pp. 214-223.
  • [16] M. Kern, ”Angiographic projections made simple: an easy guide to understanding oblique views,”  Cath Lab Digest, vol. 19, no. 8, 2011.
  • [17] M. Buda, A. Saha, and M. A. Mazurowski, ”Association of genomic subtypes of lower-grade gliomas with shape features automatically extracted by a deep learning algorithm,”  Computers in Biology and Medicine, vol. 109, pp. 218-225, 2019.