Since its introduction in the 1970s, computed tomography (CT) has become widely used in medical imaging to obtain a comprehensive and non-invasive view of internal structures and has revolutionized diagnostic decision-making. Especially in the field of surgery, CT imaging decreased the need for emergency procedures from 13% to 5% and has significantly decreased the need for exploratory surgical procedures [Power2016]. Furthermore, its incorporation into clinical practice has optimized hospital workflow by decreasing the number of patients requiring inpatient care [Rosen2000, Rosen2003]. In the NHS alone, 6 million CT scans were performed in 2018-2019 [Baker2020]. The basic principle of CT technology involves transmitting ionizing radiation, in the form of x-rays, through a region-of-interest (ROI). The transmitted rays are then incident on an electronic detector to create a ‘cut’ through the object. Both the radiation source and the detector rotate around the object to obtain multiple ‘slices’ or ‘cuts’. These projections are then used to reconstruct a 3-D representation of the ROI. Recent advancements have allowed for faster acquisition times, fewer intrinsic movement artefacts and the ability to capture a greater area at a higher resolution during a single acquisition period [Foley, Sun2012].
As the x-rays pass through the patient, they are attenuated depending on the density of the tissue they pass through. Variations in the physical density of different objects translate to differences in attenuation and, in turn, to different radiodensities (measured in Hounsfield Units, HU) on a CT scan [Foley]. The higher the attenuation, the brighter the CT image (e.g., bone and calcification); the lower the attenuation, the darker the CT image (e.g., air). The intrinsic contrast of the image is therefore generated by the differences in attenuation between adjacent tissues [Sun2012].
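For reference, the Hounsfield scale is conventionally defined as a linear rescaling of the measured linear attenuation coefficient, such that water maps to 0 HU and air to -1000 HU. The short sketch below illustrates that rescaling only; the function name and the numerical attenuation coefficients are illustrative assumptions and are not values taken from this study.

```python
def attenuation_to_hu(mu, mu_water, mu_air=0.0):
    """Convert a linear attenuation coefficient to Hounsfield units.

    HU = 1000 * (mu - mu_water) / (mu_water - mu_air)
    All three coefficients must share the same units (e.g. cm^-1).
    """
    return 1000.0 * ((mu - mu_water) / (mu_water - mu_air))


# Illustrative (assumed) coefficients in cm^-1; real values depend on beam energy.
MU_WATER = 0.19
for name, mu in [("air", 0.0), ("water", MU_WATER), ("denser tissue", 0.21)]:
    print(f"{name}: {attenuation_to_hu(mu, MU_WATER):.0f} HU")
```

Because the mapping is linear, two tissues with nearly equal attenuation coefficients end up only a few HU apart, which is why they are hard to separate visually.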
Figure 1 gives an example of an axial CT slice through the abdomen. The bony spine (orange arrow) is distinctly more radio-opaque than the psoas muscle (blue arrow) or the abdominal aorta (red arrow). However, the densities of the abdominal aorta and the psoas muscle appear similar here.
Where treatment of an artery is being considered, a detailed view of the arterial anatomy is required. In the example of abdominal aortic aneurysms (AAA, abnormal ballooning of the abdominal aorta), an intra-luminal thrombus (ILT) adherent to the aortic wall within the enlarging aneurysmal sac is present in 95% of cases [Aggarwal2011a]. Given the similarities in density between the blood lumen, the ILT and the complex blood-thrombus interface, these regions cannot be readily distinguished on a conventional CT image. Clear visualization of these regions can only be achieved by the injection of an intravenous (IV) contrast agent. The primary purpose of IV contrast is to enhance luminal density and increase both the attenuation and the intrinsic contrast between the vascular tree and surrounding soft tissues. This optimizes the visualization of the vasculature and generates a CT angiogram (CTA) [Foley, Sun2012].
Although CTAs may provide unique insight into the structure of the vascular tree, they are associated with several disadvantages [Sun2012, Hinson2017]. CTAs are contraindicated in patients with iodine allergies, as most agents are iodine-based. Furthermore, administration of contrast agents requires needle insertion. This causes additional discomfort and has been associated with complications including inadvertent arterial puncture by the needle and contrast leak from veins causing skin irritation or damage [Hinson2017]. Additionally, contrast agents are nephrotoxic and are associated with an incidence of acute kidney injury (contrast-induced nephropathy) of up to 12% following use [Hinson2017]. This is especially a problem in the elderly population, who have either decreasing baseline renal function or concomitant chronic kidney disease. In these high-risk patients, there is a recognised risk of complete kidney failure, which may lead to renal dialysis.
We hypothesise that the raw data acquired from a non-contrast CT contains sufficient information to differentiate blood from other soft tissue components. Blood, thrombus, and artery wall are made up of different components: blood is predominantly fluid, with red and white blood cells; thrombus is predominantly fibrinous and collagenous, with red cells and platelets; artery wall predominantly contains smooth muscle cells with collagen. These individual components vary in physical density, which should be reflected in different (albeit subtly different) HU values on a CT scan. We further hypothesise that, using deep learning approaches, the subtle differences between the various components of the soft tissue can be defined and amplified to enable simulation of contrast-enhanced CT images without the need to inject contrast agents.
II-A Obtaining CT Images from a Clinical Cohort
Computerised tomography scans of the chest and abdomen were acquired through the Oxford Abdominal Aortic Aneurysm (OxAAA) study. The study received full regulatory and ethics approval from both Oxford University and Oxford University Hospitals (OUH) National Health Service (NHS) Foundation Trust (Ethics Ref 13/SC/0250). As part of the routine pre-operative assessment for aortic aneurysmal disease, a non-contrast CT of the chest/abdomen/pelvis and an arterial phase CT angiogram (CTA) were performed. CTA images were obtained following contrast injection in helical mode with a pre-defined slice thickness of 1.25 mm, whereas non-contrast CT images were obtained with a pre-defined slice thickness of 2.5 mm. Paired contrast and non-contrast CT images were anonymised within the OUH PACS system before being downloaded onto the secure study drive. Twenty-six patients with paired non-contrast and CTA images of the abdominal region were randomly selected.
II-B Manual Segmentation of CT Images (Training Dataset)
Thirteen (of the 26) cases were randomly selected as the training dataset. Manual segmentation of the aortic inner lumen (in contrast-enhanced CTAs) and aortic outer lumen (in both non-contrast CTs and contrast-enhanced CTAs) was performed using the open-source ITK-SNAP software [Yushkevich2006]. Segmentation of the aorta was performed from the aortic root to the iliac bifurcation.
II-C Reorientation of Contrast-Enhanced Scans
To account for voluntary and involuntary movement by the patient between scans, it was necessary to re-orientate the contrast-enhanced images and their corresponding segmentation masks to the non-contrast image plane. Registration of the axial slices was performed using the Image Position Patient (X, Y, Z coordinates) and Image Orientation Patient (patient’s relative rotation) attributes found within the DICOM header. Minimal variation in the orientation of the bowel and air bubbles was observed between the contrast and non-contrast slices. Appropriate registration of 100 randomly selected axial images was confirmed by 2 blinded reviewers (NS and PL) prior to subsequent analysis.
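As an illustration of how these two header attributes place a slice in patient space, the sketch below maps a pixel index to patient coordinates and pairs each 2.5 mm non-contrast slice with the nearest 1.25 mm contrast slice along the slice normal. It is a minimal sketch under stated assumptions, not the registration procedure used in this study: it assumes both series share a frame of reference and the same orientation, and the function names and header values are hypothetical.

```python
def pixel_to_patient(row, col, image_position, image_orientation, pixel_spacing):
    """Map a pixel index to patient coordinates (mm) for a single DICOM slice.

    image_position    -- Image Position (Patient): x, y, z of the first pixel
    image_orientation -- Image Orientation (Patient): six direction cosines
    pixel_spacing     -- Pixel Spacing: (spacing between rows, spacing between columns)
    """
    along_row = image_orientation[:3]  # direction in which the column index grows
    along_col = image_orientation[3:]  # direction in which the row index grows
    row_spacing, col_spacing = pixel_spacing
    return tuple(
        image_position[k]
        + along_row[k] * col_spacing * col
        + along_col[k] * row_spacing * row
        for k in range(3)
    )


def slice_normal(image_orientation):
    """Normal of the slice plane: cross product of the two direction cosines."""
    rx, ry, rz = image_orientation[:3]
    cx, cy, cz = image_orientation[3:]
    return (ry * cz - rz * cy, rz * cx - rx * cz, rx * cy - ry * cx)


def match_nearest_slices(target_positions, source_positions, image_orientation):
    """For each target slice, return the index of the closest source slice,
    with distance measured along the slice normal."""
    normal = slice_normal(image_orientation)

    def depth(position):
        return sum(p * n for p, n in zip(position, normal))

    source_depths = [depth(p) for p in source_positions]
    return [
        min(range(len(source_depths)), key=lambda i: abs(source_depths[i] - depth(p)))
        for p in target_positions
    ]


# Hypothetical header values for an axial acquisition.
axial = [1, 0, 0, 0, 1, 0]
non_contrast = [(-100.0, -100.0, z) for z in (0.0, 2.5, 5.0)]          # 2.5 mm slices
contrast = [(-100.0, -100.0, z) for z in (0.0, 1.25, 2.5, 3.75, 5.0)]  # 1.25 mm slices
print(pixel_to_patient(2, 4, non_contrast[0], axial, (0.5, 0.5)))
print(match_nearest_slices(non_contrast, contrast, axial))
```

Matching along the slice normal rather than on the raw Z value keeps the pairing valid if the acquisition plane is tilted.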
II-D Hounsfield Unit Sampling
To investigate regional differences, 10 consecutive non-contrast axial slices within the aneurysmal region were taken from each of 10 patients (total = 100 slices) and sampled for the underlying Hounsfield unit (HU) distribution at the lumen, intra-luminal thrombus and interface locations (Figure 2). These regions, which are visually indistinct on the non-contrast CT slice, were identified from their paired contrast counterpart. The reoriented binary masks were used to establish the boundary locations in the non-contrast image. To account for slight inaccuracies in the image reorientation process and to minimize sampling errors, the thrombus (blue) and lumen (yellow) areas were reduced by 20% and the zone between the two regions was demarcated as the interface (red). This delineation is indicated in Figure 2.
One-way ANOVA was used to compare the average HU intensity of the regions in individual patients, both across and within slices. The negative control consisted of concentric sampling within the lumen region. This control was chosen because blood is, on a macroscopic level, randomly distributed and relatively homogeneous and should therefore produce similar HU intensities.
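The comparison can be sketched as follows. This is a minimal, dependency-free computation of the one-way ANOVA F statistic on made-up HU samples; the sample values and the function name are illustrative assumptions and do not reproduce the study data. In practice, a statistics package such as SciPy (`scipy.stats.f_oneway`) would also return the associated p-value.

```python
def one_way_anova_f(*groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the samples around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up HU samples for the three regions of one slice (illustration only).
lumen = [44.0, 46.0, 45.0, 47.0]
interface = [41.0, 40.0, 42.0, 43.0]
thrombus = [36.0, 38.0, 37.0, 35.0]

f_stat, df_b, df_w = one_way_anova_f(lumen, interface, thrombus)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The same function applies to the negative control by passing the concentric lumen samples as the groups; there, a small F statistic is the expected outcome.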
II-E Image Pre-Processing
Of the 26 patients, 13 were randomly allocated to the training cohort (ntrain = 13). Following segmentation, the original pre-aligned CT images and their corresponding image masks for patients in the training cohort were augmented using divergence transformations. These transformations apply non-linear warping to each axial slice, manipulating the aorta in certain predefined locations. In this instance, both congruent and divergent local transformations were utilized to diversify the training dataset. These augmentation methods have been previously validated. Each patient’s scan in the training cohort was augmented in a ratio of 10:1 to obtain a total of 23,551 2-D axial images.
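To give a sense of what a local non-linear warp applied jointly to an image and its mask might look like, the sketch below implements a simple radial warp with nearest-neighbour resampling using NumPy. It is a stand-in under stated assumptions, not the divergence transformation used in this study: the function name, the linear falloff and the parameter values are all hypothetical.

```python
import numpy as np


def local_radial_warp(image, mask, center, radius, strength):
    """Warp `image` and `mask` identically around `center` (row, col).

    strength > 0 pushes content outwards (a divergent warp);
    strength < 0 pulls content inwards (a convergent warp).
    Pixels at `radius` or further from the centre are left unchanged.
    Keep |strength| < 1 so that the mapping does not fold over itself.
    """
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    dy, dx = yy - center[0], xx - center[1]
    dist = np.hypot(dy, dx)
    # Linear falloff: 1 at the centre, 0 at `radius` and beyond.
    falloff = np.clip(1.0 - dist / radius, 0.0, 1.0)
    # Backward mapping: each output pixel looks up a source location.
    scale = 1.0 - strength * falloff
    src_y = np.clip(np.rint(center[0] + dy * scale), 0, h - 1).astype(int)
    src_x = np.clip(np.rint(center[1] + dx * scale), 0, w - 1).astype(int)
    return image[src_y, src_x], mask[src_y, src_x]


# Toy slice: a bright disc standing in for the aorta (hypothetical values).
yy, xx = np.mgrid[0:64, 0:64]
aorta_mask = np.hypot(yy - 32, xx - 32) <= 5
slice_hu = np.where(aorta_mask, 45.0, 0.0)

warped_hu, warped_mask = local_radial_warp(slice_hu, aorta_mask, (32, 32), 20.0, 0.3)
print(int(aorta_mask.sum()), int(warped_mask.sum()))
```

Applying the same lookup to the image and the mask keeps the segmentation aligned with the warped anatomy, which is the property any such augmentation needs for paired training data.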
II-F Cycle-GAN Architecture
In this study, we utilized a cycle-GAN, which is a variation of the popular generative adversarial network (GAN). These networks are a class of deep learning architecture in which two neural networks train simultaneously, with one network focused on data generation (generator) and the other focused on data discrimination (discriminator). The two neural networks ‘compete’ against each other, learning the statistical distribution of the training data, which in turn allows new examples to be generated from the same distribution. GANs have been applied to generate imaginary portraits, landscapes, and artworks based on real examples [Zhu2017a]. Many variations of the original GAN concept have been developed, including conditional GANs (cGANs), which can learn the transformation between two paired distributions, including the transformation between images using the pixel-to-pixel (Pix2Pix) approach [Isola2016]. The primary benefit of cycleGANs is that they can learn transformations between two distributions without the need for direct pairings between specific samples [Zhu2017a].
The generator and discriminator components in the NonContrast-to-Contrast (NC2C) model architecture (Figure 3) were explicitly defined as a least-squares GAN and a 70 x 70 PatchGAN, respectively. The former incorporates an additional least-squares loss function for the discriminator, which in turn improves the training of the generative model. The discriminator, on the other hand, goes through the image pairs in 70 x 70 patches and is trained to classify whether the image in question is “real” or “fake”.
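The two ingredients named above can be illustrated with a short sketch. The first function computes the receptive field of a convolutional stack and shows why the commonly used PatchGAN layer configuration judges 70 x 70 patches; the remaining functions write out the usual least-squares GAN objectives with target labels of 1 (real) and 0 (fake). The layer list and the 0.5 scaling are assumptions based on the standard formulations and are not confirmed details of the NC2C implementation.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit of a conv stack.

    `layers` lists (kernel_size, stride) pairs from the input side to the output side.
    """
    size = 1
    for kernel, stride in reversed(layers):
        size = (size - 1) * stride + kernel
    return size


# Layer configuration commonly used for a 70 x 70 PatchGAN (assumed here):
# three stride-2 and two stride-1 convolutions, all with 4 x 4 kernels.
PATCHGAN_70 = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]


def lsgan_discriminator_loss(scores_real, scores_fake):
    """Least-squares discriminator loss: push real scores to 1 and fake scores to 0."""
    real_term = sum((s - 1.0) ** 2 for s in scores_real) / len(scores_real)
    fake_term = sum(s ** 2 for s in scores_fake) / len(scores_fake)
    return 0.5 * (real_term + fake_term)


def lsgan_generator_loss(scores_fake):
    """Least-squares generator loss: push the discriminator's fake scores to 1."""
    return 0.5 * sum((s - 1.0) ** 2 for s in scores_fake) / len(scores_fake)


print(receptive_field(PATCHGAN_70))
print(lsgan_discriminator_loss([0.9, 0.8], [0.2, 0.1]))
print(lsgan_generator_loss([0.2, 0.1]))
```

Each element of the discriminator's output map therefore scores one overlapping patch of the input, and the per-patch scores are averaged by the loss.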
II-G Model Training and Evaluation
The NC2C-cycleGAN (Figure 3) was trained with a learning rate of 2.0 × 10^-4 for 200 epochs on 256 x 256 images centred around the aorta. Four networks (2 generators + 2 discriminators) were trained simultaneously and various loss functions were evaluated at each iteration to document model training. In addition to the loss metrics inherent to the networks, both identity mapping and cycle consistency loss functions were included to ensure appropriate style transfer and regularization of the generator to allow for image translation, respectively. The accuracy of the cycleGAN output in generating the lumen and thrombus interface was assessed by comparison to the contrast image, which serves as the gold standard.
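The two auxiliary losses can be written out as mean absolute (L1) differences, as in the original cycleGAN formulation [Zhu2017a]. The sketch below uses toy stand-ins for the two generators so that the arithmetic can be followed by hand; the function names, the toy generators and the pixel values are hypothetical, and any weighting between the terms is left out because it is not stated in the text.

```python
def l1(a, b):
    """Mean absolute difference between two equally sized pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_and_identity_losses(g_nc2c, g_c2nc, non_contrast, contrast):
    """Return (cycle consistency loss, identity mapping loss) for one image pair.

    g_nc2c -- generator translating non-contrast to contrast
    g_c2nc -- generator translating contrast to non-contrast
    """
    # Cycle consistency: translating there and back should recover the input.
    cycle = l1(g_c2nc(g_nc2c(non_contrast)), non_contrast) + l1(
        g_nc2c(g_c2nc(contrast)), contrast
    )
    # Identity mapping: a generator fed an image that is already in its
    # target domain should leave that image unchanged.
    identity = l1(g_nc2c(contrast), contrast) + l1(g_c2nc(non_contrast), non_contrast)
    return cycle, identity


# Toy generators that shift every pixel by a fixed amount (illustration only).
def toy_nc2c(img):
    return [p + 100.0 for p in img]


def toy_c2nc(img):
    return [p - 100.0 for p in img]


non_contrast_pixels = [40.0, 42.0, 38.0]
contrast_pixels = [140.0, 150.0, 45.0]
print(cycle_and_identity_losses(toy_nc2c, toy_c2nc, non_contrast_pixels, contrast_pixels))
```

The toy generators are exact inverses, so the cycle term vanishes, while the identity term is large because neither generator leaves an in-domain image untouched; a trained model is pushed to reduce both.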
Average HU intensity in the non-contrast images was significantly different between all three regions (Lumen vs. Thrombus, Lumen vs. Interface and Interface vs. Thrombus) for all patients assessed. When assessed on a slice-by-slice basis, the average HU intensity of the thrombus was significantly different from that of the lumen 94% of the time. Histograms of the HU frequencies for each region showed considerable overlap; however, a shift in the frequency distribution is also apparent (Figure 4B). Figure 4B further displays the differences in HU intensities for 8 axial slices obtained from one of the ten patients analyzed. The average HU intensity of the thrombus was significantly lower than that of the lumen for all axial slices sampled. Furthermore, the interface was also significantly different from the other two regions, indicating a gradual change in pixel intensity from the central lumen to the peripheral thrombus. No significant differences in HU intensity were noted following concentric sampling of the aortic blood flow lumen (Figure 4C,D).
Figure 5 illustrates the loss functions during the training of the NC2C-cycleGAN. The inherent generator and discriminator loss functions converge during model training. Additionally, the identity mapping and cycle consistency loss functions for both directions (1. Contrast-to-Non-Contrast, 2. Non-Contrast-to-Contrast) plateau within the first 50 epochs. Model performance was evaluated on the testing cohort (ntest = 13). This generative model is able to simulate the aortic lumen throughout the length of the aorta and to differentiate between the lumen and intra-luminal thrombus of aneurysmal sections with strong resemblance to the ground truth. In addition to the aorta, this algorithm is able to transform other structures, including the small mesenteric arteries (orange arrows), the pulmonary arteries (yellow arrow) and the kidneys (blue arrow), as seen in Figure 6.
In the realm of medical image synthesis, traditional image transformation tasks have relied on patch-based regressions. These methods sample a patch of an image or volume from one modality and predict its intensity in the target modality [Jog2013, Torrado-Carvajal2016]. In addition to patch-based regression models, sparse representation [Huang2017] and atlas-based models [Miller1993] have been used. The latter utilize paired image atlases from the source and target image modalities. Here, the atlas-to-image transformation is calculated for the source modality and a similar transformation is performed for the target-modality atlas [Miller1993, Roy2013, Burgos2014].
In recent years, deep learning (DL) methods have been applied extensively in this domain and have achieved promising results. Dong et al. proposed a convolutional neural network approach to increase the resolution of medical images [Dong2014]. Additionally, GAN-based cross-modal image synthesis methods have gained significant traction and achieved success [Nie2016]. These methods rely on adversarial training between a generator and a discriminator. The fundamental task of the generator is to produce images that are realistic enough to fool the discriminator, which is simultaneously trained to differentiate real from generated images [Goodfellow2014]. Variations of the GAN architecture include the Pix2Pix [Isola2016], conditional [Mirza2014] and cycle-GAN [Zhu2017a] networks. As its name suggests, the Pix2Pix network uses paired data to learn the appropriate pixel-to-pixel image transformations. This network strives to maintain pixel-to-pixel similarity between the real and the generated images [Isola2016]. The CycleGAN, on the other hand, attempts to capture structural information by learning the translation mapping between unpaired images [Zhu2017a]. GAN-based methods have shown great promise in synthesizing various types of medical images, including CT (from MR images [Nie2016], low-dose to routine-dose [Yang2017]), retinal [Costa2017, Costa2018], MR, and ultrasound images [Tom2017]. This study explores, for the first time, the possibility of transforming non-contrast CT images into contrast-enhanced CT angiograms without the need for intravenous (IV) contrast.
Currently, CT angiography relies on IV contrast injection to enhance the intrinsic contrast between the vascular tree and surrounding tissues. This method is therefore able to provide a unique view of the patient’s vasculature, which is essential for diagnosis and surgical planning. For example, in the case of AAA disease, CTAs are the current gold standard for visualizing the intra-luminal thrombus (ILT) within the enlarging aneurysmal sac. The orientation, location and extent of the ILT, as well as the complex blood-ILT interface, are vital information prior to surgical intervention [Aggarwal2011a]. Although CT angiography may provide unique insight into aneurysm morphology and the structure of the vascular tree, it is not without its disadvantages, as discussed above.
Additionally, this method of transforming non-contrast CT images into CTA images can be used to incorporate historic non-contrast CT scans for research purposes. A key priority for AAA research is to discover novel indices for AAA growth prediction. Currently, growth is documented by measuring the maximum antero-posterior diameter of the abdominal aorta during serial duplex ultrasound scans [Schlosser2008]. However, this uses a 1-dimensional measurement that is susceptible to inter-observer error to characterize a diverse and complex 3-dimensional growth process [Kitagawa2013]. There is emerging evidence that patient-specific geometric and volumetric measurements more readily influence AAA growth [Shum2011]. As small AAAs enlarge, a variety of geometrical changes have been observed. Many of these changes result in a unique non-uniform distribution of wall stress and have been hypothesized to either favour AAA growth deceleration or increase rupture risk [Shum2011, Martufi2013]. Isolating and deciphering these changes will allow us to predict AAA growth and progression in each patient. Current work by our group has focused on generating a DL algorithm for the automatic segmentation of the aortic volume alongside feature extraction [Chandrashekar2020]. Integration of this image synthesis model with our volume extraction methods will allow us to utilize historic non-contrast CT scans across multiple centers for this complex geometric and morphological analysis.
This image synthesis pipeline can be extended to cover other anatomical structures (veins, solid organs, etc.). Our preliminary work has shown that it is capable of generating reliable CTAs of the large arteries; in addition, there is evidence that this model is able to account for small vessels that branch from the aorta (e.g., renal arteries, vertebral arteries, etc.) and can be extended to other structures (e.g., venograms, solid organ contrast CTs).
This study describes, for the first time, the ability to differentiate between visually indistinct soft tissue regions in non-contrast CT images using deep learning methods. Ultimately, refinement of this methodology may obviate the need for intravenous contrast and prevent related complications.
We acknowledge the support from the following: Medical Sciences Division, University of Oxford Medical Research Fund; John Fell Fund, University of Oxford; Academy of Medical Sciences Starter Grant to RL (AMS-SGL013 \1015); Oxford University Clarendon Scholarship to AC. The methods described in this manuscript are subject to a patent filing (UK priority filing, P276235GB).