Metal objects in a patient, such as dental fillings, artificial hips, spine implants, and surgical clips, can significantly degrade the quality of computed tomography (CT) images. The main reason for such metal artifacts is that metal objects in the field of view strongly attenuate or even completely block x-rays, so that images reconstructed from the compromised or incomplete data are corrupted in various ways, typically appearing as bright or dark streaks. As a result, metal artifacts significantly affect medical image analysis and subsequent clinical treatment. In particular, metal artifacts degrade the contours of tumors and organs at risk, posing great challenges for radiotherapy planning [Gian2017, Maerz].
Over the past decades, extensive research efforts [Gjesteby2016] have been devoted to CT metal artifact reduction (MAR), leading to a variety of MAR methods. Traditionally, projection domain methods [Kalender, nmar] focus on projection data inside a metal trace and replace them with estimated data. The artifact-reduced image is then reconstructed from the refined projection data using a reconstruction algorithm such as filtered backprojection (FBP). However, projection domain methods tend to produce secondary artifacts, as it is difficult for the estimated projection values to perfectly match the ground truth. Moreover, in practice the original projection data and the corresponding reconstruction algorithm are not publicly accessible. To apply projection based methods in the absence of the original sinogram, the authors of [Bal] proposed a post-processing scheme that first generates a sinogram by forward projecting a CT image and then applies a projection based method to the reprojected sinogram. However, this second round of projection and reconstruction may introduce extra errors.
To overcome the limitations of projection domain methods, researchers have worked extensively on reducing metal artifacts directly in the image domain [Hamid, karimi]. With deep learning techniques [deepimaging], data-driven MAR methods based on deep neural networks have recently been developed [Huang2018MetalAR, Wang2018, Gjesteby_2019, zhang2018, dudonet, adn], demonstrating superiority over traditional MAR algorithms. However, most existing deep learning based methods are fully supervised, requiring a large number of paired training images, i.e., artifact-affected images and co-registered artifact-free images. In clinical scenarios, it is infeasible to acquire a large number of such pairs. Therefore, these methods rely on simulating artifact-affected images by inserting metal objects into artifact-free images. However, because of the complex physical mechanism of metal artifacts and the many technical factors of the imaging system, simulated images cannot reflect all real conditions, which degrades the performance of fully supervised models. To avoid synthesized data, the recently proposed artifact disentanglement network (ADN) [adn] uses only unpaired clinical metal-affected and metal-free CT images to train a disentanglement network with adversarial losses; it gives promising results on clinical datasets and outperforms fully supervised methods trained on synthesized data. However, without accurate supervision, ADN is far from perfect and fails to preserve structural details in many challenging cases.
In this study, we improve MAR performance on clinical datasets in two respects. First, we formulate MAR as artifact disentanglement while leveraging the low-dimensional manifold (LDM) of image patches to help recover structural details. Specifically, we train a disentanglement network with the ADN losses and simultaneously constrain the reconstructed artifact-free images to have a low-dimensional patch manifold. This idea is inspired by the low-dimensional manifold model (LDMM) [LDMM] for image processing and CT image reconstruction [cong2019]. However, applying the iterative LDMM algorithm to train a disentanglement network is not trivial. To this end, we design an LDM-DN algorithm that simultaneously optimizes the objective functions of the disentanglement network and the LDM constraint. Second, we improve the MAR performance of the disentanglement network by integrating both unpaired and paired supervision. The unpaired supervision is the same as that used in ADN, where unpaired images come from the artifact-free and artifact-affected groups. The paired supervision relies on synthesized paired images to train the model in a pixel-to-pixel manner. Although synthesized data cannot perfectly simulate clinical scenarios, they still provide helpful information for recovering artifact-free images from artifact-affected ones. Finally, we design a hybrid training scheme that combines paired and unpaired supervision to further improve MAR performance on clinical datasets.
The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 describes the proposed method, including the problem formulation, the construction of a patch set, the dimension of a patch manifold, the corresponding optimization algorithm, and the hybrid training scheme. Section 4 evaluates the proposed LDM-DN algorithm for MAR on synthesized and clinical datasets; extensive experiments show that our method consistently outperforms the state-of-the-art ADN model and other competing methods. Finally, Section 5 concludes the paper.
2 Related work
2.1 Metal Artifact Reduction
CT metal artifact reduction methods can be classified into three categories: projection domain methods, image domain methods, and dual domain methods.
Projection based methods aim to correct the projection data for MAR. Some methods of this type [park2016, jiang2000, meyer2010] directly correct corrupted data by modeling the underlying physical processes, such as beam hardening and scattering. However, the results of these methods are unsatisfactory when metals with high atomic numbers are present. A more sophisticated approach is to treat the metal-affected data within the metal traces as unreliable and replace them with surrogates estimated from reliable data. Linear interpolation (LI) [Kalender] is a basic and simple method for estimating metal-corrupted data. However, LI is prone to generating new artifacts and distorting structures because of mismatched values and geometry between the linearly interpolated data and the unaffected data. To address this problem, prior information has been employed in various ways to generate more accurate surrogates [Bal, nmar, wj2013]. Among these methods, the state-of-the-art normalized MAR (NMAR) method [nmar] is widely used due to its simplicity and accuracy. NMAR uses a prior image obtained by tissue classification to normalize the projection data before LI is applied. With the normalized projection data, the data mismatch caused by LI is effectively reduced, giving better results than generic LI. However, the performance of NMAR largely depends on accurate tissue classification. In practice, tissue classification is not always accurate, so NMAR also tends to produce secondary artifacts. Recently, deep neural networks [Bernhard2017, liao2019, Ghani2020] have been applied to projection correction with promising results. However, such learning based methods require a large number of paired projection data.
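Both LI and the NMAR normalization operate view by view on the sinogram. The following minimal NumPy sketch illustrates them, assuming the sinogram is stored as a (views × detector bins) array and that the metal-trace mask and, for NMAR, the prior sinogram (the forward projection of the tissue-classified prior image) are already available; the function names are hypothetical, and a real implementation would also compute the metal trace and the prior image itself.

```python
import numpy as np


def li_inpaint(sinogram, metal_trace):
    """Linear interpolation (LI): in every view, replace the samples inside the
    metal trace by interpolating the unaffected detector bins of that view."""
    corrected = np.asarray(sinogram, dtype=float).copy()
    mask = np.asarray(metal_trace, dtype=bool)
    bins = np.arange(corrected.shape[1])
    for view in range(corrected.shape[0]):
        bad = mask[view]
        if bad.any() and not bad.all():
            # np.interp holds the edge values constant outside the unaffected range
            corrected[view, bad] = np.interp(bins[bad], bins[~bad], corrected[view, ~bad])
    return corrected


def nmar_inpaint(sinogram, metal_trace, prior_sinogram, eps=1e-6):
    """NMAR-style correction: normalize by the (given) prior sinogram,
    apply LI to the normalized data, then de-normalize."""
    prior = np.maximum(np.asarray(prior_sinogram, dtype=float), eps)
    normalized = np.asarray(sinogram, dtype=float) / prior
    return li_inpaint(normalized, metal_trace) * prior
```

Dividing by the prior sinogram flattens the projection profile across the metal trace, so the straight-line interpolation causes a smaller mismatch than interpolating the raw data, which is the intuition behind NMAR described above.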
Image domain methods directly reduce metal artifacts by post-processing CT images. Conventionally, some methods [Hamid, karimi] leverage image processing techniques to estimate and remove streak artifacts from the original artifact-affected images, but these hand-crafted methods have limited performance. Data-driven deep learning methods have shown superiority over such traditional approaches. For example, RL-ARCNN [Huang2018MetalAR] is a convolutional neural network with residual learning for MAR, achieving better results than a plain CNN [vgg]. cGANMAR [Wang2018] regards MAR as an image-to-image translation problem and adapts the Pix2pix [Isola_2017_CVPR] model to improve MAR performance.
To benefit from both the projection and image domains, various dual domain methods have also been proposed. DestreakNet [Gjesteby_2019] feeds an image corrected by the state-of-the-art NMAR method and a detail map derived from the original image into a dual-stream network, giving better results than are achievable in a single domain. In the CNN-MAR method [zhang2018], a CNN takes the original image and the images corrected by BHC [Verburg2012CTMA] and LI [Kalender] as inputs and produces an output image, which is used to generate a prior image. The projection data of the prior image are then used to correct the original projection data, and the final image is reconstructed with FBP. DuDoNet [dudonet] introduces an end-to-end dual domain network that simultaneously corrects sinogram data and CT images.
All the above deep learning methods for MAR require a large number of synthesized paired projection data and/or CT images for training. A recent study [adn] has shown that models [zhang2018, Wang2018] trained on synthesized data do not generalize well to clinical datasets. ADN was therefore designed and tested on unpaired clinical data, achieving promising results. However, without strong supervision, ADN can hardly recover structural details in challenging cases.
In this study, we introduce a novel image prior, i.e., the low-dimensional manifold, together with different levels of supervision, to train the disentanglement network and improve MAR performance on clinical datasets. Our LDM prior guided disentanglement framework and synergistic supervision scheme also have the potential to benefit image domain and dual domain methods.
2.2 Low-dimensional manifold
The patch set of natural images has been shown to sample a low-dimensional manifold [Lee2003, carlsson2008, peyre2008, peyre2009]. Based on this low dimensionality of the patch manifold, LDMM first computes the dimension of the patch manifold using differential geometry and then uses this dimension to regularize image recovery problems, including image inpainting, super-resolution, and denoising. Building on LDMM, LDMNet [ldmnet] regularizes the combination of input data and output features to lie on a low-dimensional manifold for classification tasks, showing competitive performance against popular regularizers such as low-rank and DropOut. Recently, Cong et al. [cong2019] used LDMM to regularize CT image reconstruction, demonstrating that LDMM has a strong ability to recover detailed structures in CT images. Inspired by these results, we propose an LDM constrained disentanglement network with both paired and unpaired supervision to improve MAR performance on clinical datasets.
3.1 Problem formulation
Before formulating the problem, let us introduce the general neural network based method in the image domain. In the supervised learning mode, we have paired data $\mathcal{S}_p = \{(x_i, y_i)\}_{i=1}^{N_p}$ available, where each artifact-affected image $x_i$ has a corresponding artifact-free image $y_i$, and $N_p$ is the number of paired images. A deep neural network based model can then be trained on this dataset with the loss function:
$$\min_{\theta}\ \mathcal{L}_{sup}(\theta), \qquad \mathcal{L}_{sup}(\theta) = \frac{1}{N_p}\sum_{i=1}^{N_p} \ell\big(f_{\theta}(x_i), y_i\big), \tag{1}$$
where $\ell$ is a loss function, such as an $L_p$-distance function, and $f_{\theta}(x_i)$ represents the image predicted from $x_i$ by the neural network with parameter vector $\theta$. In practice, a large number of paired data are synthesized for training the model, as clinical datasets only contain unpaired images.
To improve MAR performance on clinical datasets, ADN adapts a generative adversarial learning based disentanglement network for MAR, requiring only an unpaired dataset $\mathcal{S}_u = \{x_i\}_{i=1}^{N_a} \cup \{y_j\}_{j=1}^{N_f}$, where $x_i$ represents an artifact-affected image and $y_j$ represents an artifact-free image, with no correspondence between them. ADN consists of several encoders and decoders, which are trained with several loss functions, including two adversarial losses, a reconstruction loss, a cycle-consistent loss, and an artifact-consistent loss. For simplicity, we denote the ADN objective as:
$$\min_{\theta}\ \mathcal{L}_{adn}(\theta), \tag{2}$$
where $\mathcal{L}_{adn}$ represents the combination of all ADN loss functions.
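In this work, we introduce a general image property known as the low-dimensional manifold (LDM) to improve MAR performance on clinical datasets. Specifically, we assume that the patch set of artifact-free images samples a low-dimensional manifold. Therefore, we formulate the MAR problem as follows:
$$\min_{\theta,\,\mathcal{M}}\ \mathcal{L}(\theta) + \lambda \dim(\mathcal{M}) \quad \text{s.t.} \quad \mathcal{P}(\theta) \subset \mathcal{M}, \tag{3}$$
where $\mathcal{P}(\theta)$ denotes the patch set of artifact-free or artifact-corrected images, $\mathcal{M}$ is a smooth manifold isometrically embedded in the patch space, $\lambda > 0$ weights the manifold dimension term, and $\mathcal{L}$ can be any network loss function, such as $\mathcal{L}_{sup}$ for paired learning or $\mathcal{L}_{adn}$ for unpaired learning.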
To solve the above optimization problem, we need to specify the construction of a patch set, the computation of the patch manifold dimension, and the learning algorithm that simultaneously optimizes the network loss functions and the dimension of the patch manifold. We describe each of them in the following sections.
3.2 Construction of a patch set
In this work, we incorporate the state-of-the-art disentanglement network into our LDMM-based optimization framework under different levels of supervision. For such a disentanglement network, we leverage two of its branches to construct a patch set. As shown in Fig. 1, one is the artifact-corrected branch that maps artifact-affected images to artifact-corrected images, and the other is the artifact-free branch that maps artifact-free images to themselves. Considering the spatial correspondence between an input/output image and its convolutional feature maps, we represent each patch by concatenating the feature vector at a spatial location of the feature maps with its corresponding image patch. For the artifact-corrected branch, we take patches from the artifact-corrected images, denoted as $\mathcal{P}_c(\theta)$. For the artifact-free branch, we take patches from the original images, denoted as $\mathcal{P}_f(\theta)$. As we assume that the patch set of images without artifacts samples a low-dimensional manifold, the final patch set is the concatenation of these two patch sets, denoted by $\mathcal{P}(\theta) = \mathcal{P}_c(\theta) \cup \mathcal{P}_f(\theta)$.
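In our implementation, for an input image of size $H \times W$ and a down-sampling step size $s$ of the encoder features, each spatial location of the feature maps corresponds to an $s \times s$ image patch, so each image contributes $HW/s^2$ patches to $\mathcal{P}_c(\theta)$ or $\mathcal{P}_f(\theta)$, and each patch $p \in \mathbb{R}^d$, where $d$ equals the number of feature channels plus $s^2$ ($s = 8$ and $d = 128$ in our experiments; see Section 4.2).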
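The patch-set construction can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions: single-channel images, non-overlapping patches aligned with the feature-map grid, and features already produced by the extra compression layers. The function names are hypothetical, and the 64 feature channels used in the check below are chosen only so that $d = 64 + 8^2 = 128$ matches Section 4.2.

```python
import numpy as np


def branch_patches(image, features, step):
    """Pair each spatial feature vector with its step x step image patch.

    image:    (H, W) array
    features: (C, H // step, W // step) encoder features
    returns:  (H // step * W // step, C + step * step) array, one row per patch
    """
    channels, h, w = features.shape
    if image.shape != (h * step, w * step):
        raise ValueError("image and feature-map sizes do not match the step size")
    # (h*step, w*step) -> (h, w, step, step), rows ordered row-major over locations (i, j)
    tiles = image.reshape(h, step, w, step).transpose(0, 2, 1, 3).reshape(h * w, step * step)
    vectors = features.reshape(channels, h * w).T  # same (i, j) ordering as the tiles
    return np.concatenate([vectors, tiles], axis=1)


def patch_set(corrected, free, step):
    """Union of the artifact-corrected and artifact-free patch sets.

    corrected, free: lists of (image, features) tuples from the two branches.
    """
    rows = [branch_patches(img, feat, step) for img, feat in list(corrected) + list(free)]
    return np.concatenate(rows, axis=0)
```

Keeping the feature vector and the raw pixels in the same point lets the manifold constraint act on both the network representation and the image content at that location.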
3.3 Dimension of patch manifold
In this work, we adopt the definition introduced in LDMM [LDMM] for computing the dimension of a patch manifold. Specifically, we have the following theorem.
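Let $\mathcal{M}$ be a smooth submanifold isometrically embedded in $\mathbb{R}^d$. For any patch $x \in \mathcal{M}$,
$$\dim(\mathcal{M}) = \sum_{j=1}^{d} \big\|\nabla_{\mathcal{M}} \alpha_j(x)\big\|^2,$$
where $\alpha_j(x) = x_j$, $j = 1, \dots, d$, is the $j$-th coordinate function, and $\nabla_{\mathcal{M}} \alpha_j(x)$ denotes the gradient of the function $\alpha_j$ on $\mathcal{M}$. More details on the definition of $\nabla_{\mathcal{M}}$ on $\mathcal{M}$ can be found in [LDMM]. In our implementation, $x \in \mathcal{P}(\theta)$, where the patch is parameterized by the neural network parameter vector $\theta$, as introduced in Section 3.2.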
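For intuition, the identity follows from a short calculation. Because $\mathcal{M}$ is isometrically embedded, the manifold gradient of $\alpha_j$ is the orthogonal projection $P_x$ of its Euclidean gradient $e_j$ onto the tangent space $T_x\mathcal{M}$, and $P_x^{\top} = P_x = P_x^2$:

$$\sum_{j=1}^{d}\big\|\nabla_{\mathcal{M}}\alpha_j(x)\big\|^2 = \sum_{j=1}^{d} e_j^{\top} P_x^{\top} P_x\, e_j = \operatorname{tr}\big(P_x^{\top} P_x\big) = \operatorname{tr}(P_x) = \dim T_x\mathcal{M} = \dim(\mathcal{M}).$$

For example, for the unit sphere $S^2 \subset \mathbb{R}^3$, $P_x = I - x x^{\top}$, so $\operatorname{tr}(P_x) = 3 - \|x\|^2 = 2$.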
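According to the construction of a patch set and the definition of the patch manifold dimension, and integrating the pointwise identity over $\mathcal{M}$ as in [LDMM], we can reformulate Eq. (3) as:
$$\min_{\theta,\,\mathcal{M}}\ \mathcal{L}(\theta) + \lambda \sum_{j=1}^{d} \big\|\nabla_{\mathcal{M}}\alpha_j\big\|^2_{L^2(\mathcal{M})} \quad \text{s.t.} \quad \mathcal{P}(\theta) \subset \mathcal{M}, \qquad \big\|\nabla_{\mathcal{M}}\alpha_j\big\|^2_{L^2(\mathcal{M})} = \int_{\mathcal{M}} \big\|\nabla_{\mathcal{M}}\alpha_j(x)\big\|^2\, dx.$$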
To solve this problem, we design an iterative algorithm, named LDM-DN, for optimizing the LDM constrained disentanglement network, based on the algorithm for image processing introduced in [LDMM]. Specifically, given $\theta^n$ and $\mathcal{M}^n$ at step $n$ satisfying $\mathcal{P}(\theta^n) \subset \mathcal{M}^n$, step $n+1$ consists of the following sub-steps:
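Update $\theta^{n+1}$ and the perturbed coordinate functions $\alpha^{n+1} = (\alpha_1^{n+1}, \dots, \alpha_d^{n+1})$ as the minimizers of (7) with the manifold $\mathcal{M}^n$ fixed:
$$(\theta^{n+1}, \alpha^{n+1}) = \arg\min_{\theta,\ \alpha \in H^1(\mathcal{M}^n)}\ \mathcal{L}(\theta) + \lambda \sum_{j=1}^{d} \big\|\nabla_{\mathcal{M}^n}\alpha_j\big\|^2_{L^2(\mathcal{M}^n)} \quad \text{s.t.} \quad \alpha\big(\mathcal{P}(\theta^n)\big) = \mathcal{P}(\theta). \tag{7}$$
Update the manifold:
$$\mathcal{M}^{n+1} = \big\{\alpha^{n+1}(x) : x \in \mathcal{M}^n\big\}. \tag{8}$$
Repeat the above two sub-steps until convergence.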
Note that if the iteration converges to a fixed point, $\alpha^{n+1}$ will be very close to the coordinate functions, and $\mathcal{M}^{n+1}$ and $\mathcal{M}^n$ will be very close to each other.
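Eq. (7) is an optimization problem with a linear constraint. We use the alternating direction method of multipliers (ADMM) to solve it with the following iterations over $k$, where $b^k$ is the scaled dual variable and $\mu > 0$ is a penalty parameter.
Update $\alpha^{k+1}$ with $\theta^k$ fixed:
$$\alpha^{k+1} = \arg\min_{\alpha \in H^1(\mathcal{M}^n)}\ \sum_{j=1}^{d} \big\|\nabla_{\mathcal{M}^n}\alpha_j\big\|^2_{L^2(\mathcal{M}^n)} + \mu \big\|\alpha\big(\mathcal{P}(\theta^n)\big) - \mathcal{P}(\theta^k) + b^k\big\|^2. \tag{9}$$
Update $\theta^{k+1}$ with $\alpha^{k+1}$ fixed:
$$\theta^{k+1} = \arg\min_{\theta}\ \mathcal{L}(\theta) + \lambda\mu \big\|\alpha^{k+1}\big(\mathcal{P}(\theta^n)\big) - \mathcal{P}(\theta) + b^k\big\|^2. \tag{10}$$
Update the dual variable:
$$b^{k+1} = b^k + \alpha^{k+1}\big(\mathcal{P}(\theta^n)\big) - \mathcal{P}(\theta^{k+1}). \tag{11}$$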
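Using a standard variational approach, the solution of (9) can be obtained coordinate by coordinate: for each $j$, $u = \alpha_j^{k+1}$ solves the following PDE
$$\begin{cases} -\Delta_{\mathcal{M}} u(x) + \mu \displaystyle\sum_{y \in \mathcal{P}} \delta(x - y)\big(u(y) - v(y)\big) = 0, & x \in \mathcal{M}, \\[4pt] \dfrac{\partial u}{\partial \mathbf{n}}(x) = 0, & x \in \partial\mathcal{M}, \end{cases} \tag{12}$$
where $\mathcal{M} = \mathcal{M}^n$, $\mathcal{P} = \mathcal{P}(\theta^n)$, $v(y)$ is the $j$-th coordinate of the patch in $\mathcal{P}(\theta^k) - b^k$ corresponding to $y$, $\Delta_{\mathcal{M}}$ is the Laplace-Beltrami operator, $\partial\mathcal{M}$ is the boundary of $\mathcal{M}$, and $\mathbf{n}$ is the outward normal of $\partial\mathcal{M}$.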
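Eq. (12) can be solved with the point integral method. For the Laplace-Beltrami equation, the key observation is the following integral approximation:
$$-\int_{\mathcal{M}} \Delta_{\mathcal{M}} u(y)\, \bar{R}_t(x, y)\, dy \approx \frac{1}{t}\int_{\mathcal{M}} \big(u(x) - u(y)\big) R_t(x, y)\, dy - 2\int_{\partial\mathcal{M}} \frac{\partial u(y)}{\partial \mathbf{n}}\, \bar{R}_t(x, y)\, d\tau_y,$$
where $t > 0$ is a hyperparameter, $R_t(x, y) = C_t R\big(\frac{\|x - y\|^2}{4t}\big)$ and $\bar{R}_t(x, y) = C_t \bar{R}\big(\frac{\|x - y\|^2}{4t}\big)$ with $\bar{R}(r) = \int_r^{+\infty} R(s)\, ds$, $R: \mathbb{R}^+ \to \mathbb{R}^+$ is a positive function which is integrable over $[0, +\infty)$, and $C_t$ is the normalizing factor
$$C_t = \frac{1}{(4\pi t)^{\dim(\mathcal{M})/2}}.$$
We usually set $R(r) = e^{-r}$; then $\bar{R}(r) = \int_r^{+\infty} e^{-s}\, ds = e^{-r}$, so $\bar{R}_t(x, y) = R_t(x, y) = C_t \exp\big(-\frac{\|x - y\|^2}{4t}\big)$ is Gaussian.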
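To make the approximation concrete, the following NumPy sketch (the function name is hypothetical) discretizes the first right-hand term with the Gaussian choice $R(r) = e^{-r}$ on uniformly spaced samples of the unit circle, a closed curve, so the boundary term vanishes. For $u = \cos\phi$ we have $-\Delta_{\mathcal{M}} u = \cos\phi$, and because $\bar{R}_t = R_t$ integrates to approximately one for small $t$, the left-hand side is approximately $\cos\phi$; the discrete right-hand side should agree with it up to an error of order $t$.

```python
import numpy as np


def point_integral_term(points, u, t, volume, manifold_dim):
    """Approximate (1/t) * int_M (u(x_i) - u(y)) R_t(x_i, y) dy at every sample x_i.

    Uses R_t(x, y) = C_t * exp(-|x - y|^2 / (4t)) with C_t = (4*pi*t)^(-manifold_dim/2)
    and replaces the integral by (volume / N) * sum_j, assuming uniform samples.
    """
    sq_dist = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    r_t = (4.0 * np.pi * t) ** (-manifold_dim / 2.0) * np.exp(-sq_dist / (4.0 * t))
    n = points.shape[0]
    # sum_j R_t(x_i, x_j) (u_i - u_j) = (row sum of R_t)_i * u_i - (R_t @ u)_i
    return (volume / n) * (r_t.sum(axis=1) * u - r_t @ u) / t


# Unit circle: a closed 1-D manifold of length 2*pi, so there is no boundary term.
phi = 2.0 * np.pi * np.arange(1000) / 1000
circle = np.stack([np.cos(phi), np.sin(phi)], axis=1)
u_vals = np.cos(phi)  # Delta_M cos(phi) = -cos(phi), so -Delta_M u = cos(phi)
approx = point_integral_term(circle, u_vals, t=0.005, volume=2.0 * np.pi, manifold_dim=1)
max_error = np.max(np.abs(approx - u_vals))  # expected to be small, of order t
```

The Laplace-Beltrami operator never has to be evaluated explicitly: only pairwise kernel weights between samples are needed, which is what makes the method applicable to a patch point cloud.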
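Based on the above integral approximation and the Neumann boundary condition in (12), we approximate the original Laplace-Beltrami equation as:
$$\frac{1}{t}\int_{\mathcal{M}} \big(u(x) - u(y)\big) R_t(x, y)\, dy + \mu \sum_{y \in \mathcal{P}} \bar{R}_t(x, y)\big(u(y) - v(y)\big) = 0.$$
This integral equation is easy to discretize over the point cloud.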
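To simplify the notation, in the $n$-th iteration we denote the patch set by $\mathcal{P}(\theta^n) = \{x_i\}_{i=1}^{N}$, where $N$ is the number of patches. We assume that the patch set samples the submanifold $\mathcal{M}$ and is uniformly distributed. Then, the integral equation can be discretized as
$$\frac{|\mathcal{M}|}{N}\sum_{j=1}^{N} R_t(x_i, x_j)\,(u_i - u_j) + \mu t \sum_{j=1}^{N} \bar{R}_t(x_i, x_j)\,(u_j - v_j) = 0, \quad i = 1, \dots, N, \tag{17}$$
where $u_i = u(x_i)$, $v_j = v(x_j)$, and $|\mathcal{M}|$ is the volume of the manifold $\mathcal{M}$.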
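With the Gaussian kernel ($\bar{R}_t = R_t$), we rewrite Eq. (17) in the matrix form:
$$(L + \bar{\mu} W)\, u = \bar{\mu} W v,$$
where $u = (u_1, \dots, u_N)^{\top}$, $v = (v_1, \dots, v_N)^{\top}$, $\bar{\mu} = \mu t N / |\mathcal{M}|$, and $L = D - W$ is an $N \times N$ matrix, $W = (w_{ij})_{N \times N}$ is the weight matrix with $w_{ij} = R_t(x_i, x_j)$, and $D$ is the diagonal matrix with $D_{ii} = \sum_{j=1}^{N} w_{ij}$.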
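Because $L + \bar{\mu} W$ is shared by all $d$ coordinate functions, the $\alpha$-update can be computed with a single linear solve whose right-hand side has $d$ columns. Below is a minimal dense NumPy sketch, assuming the Gaussian kernel and a small patch set; the function names are hypothetical, and a practical implementation would more likely use a sparse nearest-neighbor weight matrix and an iterative solver.

```python
import numpy as np


def gaussian_weights(points, t):
    """w_ij = exp(-|x_i - x_j|^2 / (4t)). The constant C_t is dropped because it
    multiplies both sides of (L + mu_bar W) u = mu_bar W v and cancels."""
    sq_dist = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (4.0 * t))


def update_coordinates(points, targets, t, mu_bar):
    """Solve (L + mu_bar W) U = mu_bar W V for all d coordinate functions at once.

    points:  (N, d) patch set P(theta^n), i.e. samples of M^n
    targets: (N, d) matrix V whose rows are the patches of P(theta^k) - b^k
    returns: (N, d) matrix U; column j holds alpha_j evaluated at the samples
    """
    w = gaussian_weights(points, t)
    lap = np.diag(w.sum(axis=1)) - w  # graph Laplacian L = D - W
    return np.linalg.solve(lap + mu_bar * w, mu_bar * (w @ targets))
```

The linear system is the optimality condition of $u^{\top} L u + \bar{\mu}(u - v)^{\top} W (u - v)$, so each column of the result balances smoothness over the patch graph against fidelity to the corresponding column of $V$; since $L\mathbf{1} = 0$, a constant target column is reproduced exactly.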
The final LDM-DN learning algorithm is described in Algorithm 1, where we assume that the patch set of all images samples a low-dimensional manifold. However, it is impractical to solve the LDM problem when the number of patches is very large. Therefore, in each iteration we randomly select a batch of images to construct the patch set, estimate the coordinate functions $\alpha$, and update the network parameters and the dual variables. Thus, in our implementation, the number of iterations for training the network is the same as that for the LDM optimization. In practice, when the dual variables are updated as in the original LDMM algorithm [LDMM], the values of $b^k$ usually grow as the number of iterations increases. Since the number of iterations is usually very large, the value of the LDM term in step 6 of Algorithm 1 becomes increasingly large, leading to a poor solution. To overcome this problem, the dual variables are normalized in step 7 of Algorithm 1.
3.5 Combination of paired and unpaired learning
ADN only requires unpaired clinical images for training, which avoids the performance degradation that occurs when a model is trained on a synthesized dataset and then transferred to clinical applications. However, GAN loss based unpaired supervision is not strong enough to recover full image details. On the other hand, although synthesized data may not perfectly simulate real scenarios, they do provide helpful information through accurate supervision. To benefit from both paired and unpaired learning, we design a hybrid training scheme. Specifically, during training, both unpaired clinical images and paired synthetic images are selected to construct a mini-batch, and they are fed to the corresponding branches to optimize the objective functions simultaneously, as shown in Fig. 2.
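The mini-batch construction of the hybrid scheme could look like the following Python sketch. It is an illustration under assumptions: the sample counts, the dictionary layout, and the weighted sum in `hybrid_loss` (with hypothetical weights `w_sup` and `w_ldm`) are not specified in the text, and all names are hypothetical.

```python
import random


def hybrid_batches(affected, free, synthetic_pairs, n_unpaired, n_paired, seed=0):
    """Yield mini-batches that mix unpaired clinical images with paired synthetic ones.

    affected, free:  lists of clinical artifact-affected / artifact-free images (unpaired)
    synthetic_pairs: list of (artifact-affected, artifact-free) synthetic pairs
    """
    rng = random.Random(seed)
    while True:
        yield {
            # inputs for the unpaired branches, optimized with the ADN losses
            "affected": rng.sample(affected, n_unpaired),
            "free": rng.sample(free, n_unpaired),
            # inputs for the paired branch, optimized with the pixel-wise supervised loss
            "pairs": rng.sample(synthetic_pairs, n_paired),
        }


def hybrid_loss(loss_adn, loss_sup, loss_ldm, w_sup=1.0, w_ldm=1.0):
    """Hypothetical weighted sum of the objectives optimized on one mini-batch."""
    return loss_adn + w_sup * loss_sup + w_ldm * loss_ldm
```

Drawing both kinds of data into every mini-batch, rather than alternating epochs of paired and unpaired training, lets each parameter update see both supervision signals at once.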
4 Experimental design and results
In our experiments, we evaluated the proposed method on one synthesized dataset from DeepLesion [yan2018] and one clinical dataset from Spineweb (spineweb.digitalimaginggroup.ca), which are the same as those used in ADN [adn].
For the synthesized dataset, 4,118 artifact-free CT images were randomly selected from DeepLesion. Paired images with and without metal artifacts were then synthesized using the method introduced in CNNMAR [zhang2018]. Finally, 3,918 pairs of images were used for training and 200 pairs for testing. For a fair comparison, the training and testing images and all pre-processing steps are the same as those used for ADN [adn].
For the clinical dataset, 6,170 images with metal artifacts and 21,190 images without metal artifacts were selected for training, and an additional 100 images with metal artifacts were selected for evaluation. The criteria for selecting these images are the same as those in the ADN study. Specifically, if an image contains more than 400 pixels with HU values greater than 2,500, it is grouped into the artifact-affected class; images whose largest HU value is less than 2,000 are grouped into the artifact-free class. Furthermore, to study the effectiveness of combining paired and unpaired supervision, we randomly selected 6,170 images from the artifact-free group. We then extracted 6,170 metal objects from the images in the artifact-affected group and used CatSim [catsim] to simulate paired images by inserting each extracted metal shape into a selected artifact-free image. Finally, 6,170 synthesized image pairs were obtained.
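The selection rule can be written as a small function. This is a minimal sketch, assuming each slice is available as an array of HU values; the function name is hypothetical, and slices that satisfy neither condition are returned as `None` on the assumption that they are excluded from both groups.

```python
import numpy as np


def classify_slice(hu, metal_hu=2500.0, min_metal_pixels=400, free_max_hu=2000.0):
    """Group a CT slice (array of HU values) by the selection rule described above.

    Returns "affected", "free", or None when the slice fits neither group.
    """
    hu = np.asarray(hu)
    # artifact-affected: more than 400 pixels brighter than 2,500 HU
    if np.count_nonzero(hu > metal_hu) > min_metal_pixels:
        return "affected"
    # artifact-free: the brightest pixel stays below 2,000 HU
    if hu.max() < free_max_hu:
        return "free"
    return None
```

The gap between the two thresholds leaves a margin of ambiguous slices, such as those with a few bright pixels, out of both groups.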
4.2 Implementation details
We implemented different network architecture variants for different learning paradigms, as shown in Fig. 3. For unpaired learning, we use the same architecture as ADN [adn], shown in Fig. 3 (a); the architectures of all other learning paradigms are variants of ADN. In Fig. 3 (b), to construct the patch set for the LDM constraint, we add two convolutional layers on top of the encoders in the artifact-corrected and artifact-free branches, respectively, as described in Section 3.2. For paired learning only, we simply use the encoder-decoder in the artifact-corrected branch, as shown in Fig. 3 (c). For paired learning combined with the LDM constraint, we keep both encoder-decoder branches, as shown in Fig. 3 (d).
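In Fig. 1 and Fig. 3 (b) and (d), the extra convolutional layers are used to compress the channels of the latent code. Specifically, for an input image of size $H \times W$ and a down-sampling rate of 8, the compressed latent code and the image are each reshaped into a matrix with $HW/64$ rows, one per spatial location; the patch size is $8 \times 8$, and each point in the patch set, formed by concatenating a latent-code row with its flattened image patch, has dimension 128.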
4.3 Results on synthesized dataset
4.3.1 Reimplementation of ADN
To simulate unpaired learning, the synthesized paired images were divided into two groups; the artifact-affected images were selected from one group and the artifact-free images from the other. In [adn], the ratio between the numbers of images in these two groups was simply set to 1:1. However, in clinical scenarios, the number of artifact-affected images is much smaller than the number of artifact-free images. Therefore, we evaluated the effect of this ratio on MAR performance in the unpaired learning setting. In Table 1, ADN-0.85, ADN-0.50, and ADN-0.15 denote ratios of artifact-affected images to all images of 0.85, 0.50, and 0.15, respectively. Table 1 shows that there is little difference between the models trained with different ratios. In addition, we found that the MAR performance metrics did not strictly converge in the unpaired learning setting. Therefore, we selected the best-performing model as the final result, which is better than the results reported in [adn]. In practice, it is also reasonable to select the best-performing model on synthesized images first and then apply it to clinical images. As the ratio most representative of clinical conditions, ADN-0.15 serves as the baseline in all following experiments.
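One plausible way to realize this split is sketched below with NumPy. It is an illustration, not the exact procedure: the random permutation, the rounding, and the function name are assumptions. Drawing artifact-affected and artifact-free images from disjoint groups guarantees that no training image has its own counterpart available.

```python
import numpy as np


def split_unpaired(n_pairs, ratio_affected, seed=0):
    """Split synthesized pair indices into two disjoint groups for unpaired training.

    Artifact-affected images are drawn from group A and artifact-free images from
    group B, so no image is seen together with its own paired counterpart.
    ratio_affected = |A| / n_pairs, e.g. 0.85, 0.50 or 0.15.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_pairs)
    n_affected = int(round(ratio_affected * n_pairs))
    return order[:n_affected], order[n_affected:]
```

For example, with the 3,918 training pairs and a ratio of 0.15, group A holds 588 indices and group B the remaining 3,330.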
Table 2: PSNR and SSIM results on the synthesized dataset (column groups: Paired learning | Unpaired learning).
4.3.2 Comparative results
On the synthesized dataset, we evaluated the quantitative and qualitative performance of the proposed method and the compared methods. For quantitative evaluation, we used the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM). Table 2 compares the proposed method with competing methods in the paired and unpaired learning settings. In Table 2, ADN* denotes our improved reimplementation of ADN (see Section 4.3.1). The results show that the proposed LDM-DN method outperforms ADN* in terms of both PSNR and SSIM in the unpaired learning setting. For the paired learning part of Table 2, Sup corresponds to the network architecture in Fig. 3 (c) trained with the paired data. Note that this encoder-decoder architecture contains skip connections between the encoder and the decoder, the same as in ADN. LDM-Sup adds the LDM constraint to Sup during paired training, corresponding to the architecture in Fig. 3 (d). Although the paired images provide accurate pixel-to-pixel supervision, the LDM based learning algorithm further improves performance. These results demonstrate that the proposed LDM-DN algorithm consistently improves existing models in both the paired and unpaired learning settings.
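PSNR follows the standard definition $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$. A minimal NumPy sketch is given below; the intensity range passed as `data_range` is an assumption, since the normalization of the CT images is not stated here. SSIM is more involved, and an existing implementation such as `skimage.metrics.structural_similarity` in scikit-image could be used instead of writing it from scratch.

```python
import numpy as np


def psnr(prediction, reference, data_range):
    """Peak signal-to-noise ratio in dB; data_range is the assumed intensity span (MAX)."""
    mse = np.mean((np.asarray(prediction, dtype=float) - np.asarray(reference, dtype=float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)
```

Because PSNR depends on `data_range`, values are only comparable across methods when all of them are evaluated with the same intensity window.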
We also compared the results visually, as shown in Fig. 4. The visual impressions are consistent with the numerical results. In the unpaired learning setting, although ADN* removes most metal artifacts, local details are not well preserved. Comparing the results of ADN* and LDM-DN shows an evident improvement in these details. Compared with unpaired learning (Sup vs. ADN*), ideal paired learning gives better results on the synthesized test dataset. In this case, the proposed LDM-DN learning algorithm obtains further improvements, with visually sharper structural boundaries (LDM-Sup vs. Sup). These results show the effectiveness of the proposed LDM-DN algorithm.
4.4 Results on clinical dataset
In this subsection, we evaluate the proposed networks on the clinical dataset. As no ground truth images are available for evaluating the models, we can only show visual results for two examples in Fig. 5. On the clinical dataset, the proposed LDM-DN algorithm shows its superiority in preserving (green boxes, LDM-DN vs. ADN* in the first example) and recovering (blue boxes, LDM-DN vs. ADN* in the second example) local details. We also evaluated, on the same clinical dataset, the model trained with synthesized images in a supervised manner (Sup). As shown in Section 4.3, Sup outperforms unpaired learning on the synthesized dataset. However, the results in Fig. 5 show that Sup performs clearly worse than the unpaired learning models trained on the clinical dataset, as the synthesized data do not reflect real conditions. This is consistent with the observation in [adn]. Nevertheless, although its performance is degraded, Sup still shows some merits over the unpaired learning methods: some structures are sharper (green boxes, Sup vs. ADN* and LDM-DN, in both examples) and some regions are better recovered (blue boxes, Sup vs. ADN* and LDM-DN, in the second example). Combining ADN and Sup leads to over-correction, as shown in Fig. 5, where some structures are over-sharpened (green boxes, ADN-Sup vs. others, in both examples) and some regions are too dark (blue boxes, ADN-Sup vs. others, in both examples). We attribute this to the fact that ADN helps reduce the under-correction of Sup in some regions (red boxes, ADN-Sup vs. Sup, in the second example) but simultaneously over-corrects other regions (blue boxes, ADN-Sup vs. Sup, in the second example). Therefore, we propose to combine the merits of unpaired learning, paired learning, and LDM through the hybrid learning scheme. As the LDM-DN-Sup results in Fig. 5 show, the hybrid scheme inherits all the strengths analyzed above. In particular, compared with ADN-Sup, the LDM term in LDM-DN-Sup constrains structurally similar patches, especially adjacent ones, to be coherent, preventing abrupt changes that make regions too dark or too bright. These results on the clinical dataset demonstrate the effectiveness of LDM and the superiority of the hybrid training scheme.
We have proposed an LDM constrained disentanglement network for MAR. Specifically, we have designed an LDM-DN learning algorithm that simultaneously optimizes the objective functions of deep neural networks and constrains the recovered images to have a low-dimensional patch manifold. The LDM-DN algorithm effectively helps preserve and recover structural details in CT images. Moreover, we have investigated both paired and unpaired learning based models for MAR and shown their relative advantages. Finally, we have designed a hybrid optimization scheme that combines paired learning, unpaired learning, and the LDM-DN learning algorithm to integrate their advantages. Experimental results on synthesized and clinical datasets have demonstrated the superiority of the proposed method. We believe that the proposed LDM-DN algorithm has great potential for solving various CT MAR problems.