1 Introduction
Metal objects in a patient, such as dental fillings, artificial hips, spine implants, and surgical clips, significantly degrade the quality of computed tomography (CT) images. The main reason for such metal artifacts is that metal objects in the field of view strongly attenuate x-rays or even completely block them, so that images reconstructed from the compromised/incomplete data are corrupted in various ways, usually observed as bright or dark streaks. As a result, metal artifacts significantly affect medical image analysis and subsequent clinical treatment. In particular, metal artifacts degrade the contours of the tumor and organs at risk, posing great challenges in determining a radiotherapeutic plan [Gian2017, Maerz].
Over the past decades, extensive research efforts [Gjesteby2016] have been devoted to CT metal artifact reduction (MAR), leading to a variety of MAR methods. Traditionally, projection domain methods [Kalender, nmar] focus on the projection data inside a metal trace and replace them with estimated data. An artifact-reduced image can then be reconstructed from the refined projection data using a reconstruction algorithm, such as filtered backprojection (FBP). However, projection domain methods tend to produce secondary artifacts, as it is difficult for the estimated projection values to perfectly match the ground truth. In practice, the original projection data and the corresponding reconstruction algorithm are often not publicly accessible. To apply projection based methods in the absence of the original sinogram data, researchers [Bal] proposed a post-processing scheme that first generates a sinogram through forward projection of a CT image and then applies a projection based method to the re-projected sinogram. However, the second-round projection and reconstruction may introduce extra errors. To overcome the limitations of projection domain methods, researchers have worked extensively to reduce metal artifacts directly in the image domain [Hamid, karimi]
. With deep learning techniques [deepimaging], data-driven MAR methods were recently developed based on deep neural networks [Huang2018MetalAR, Wang2018, Gjesteby_2019, zhang2018, dudonet, adn], demonstrating superiority over the traditional algorithms for metal artifact reduction. However, most existing deep learning based methods are fully supervised, requiring a large number of paired training images, i.e., artifact-affected images and the co-registered artifact-free images. In clinical scenarios, it is infeasible to acquire a large number of such paired images. Therefore, a prerequisite of these methods is to simulate artifact-affected images by inserting metal objects into artifact-free images to obtain paired data. However, simulated images cannot reflect all real conditions, due to the complex physical mechanisms of metal artifacts and many technical factors of the imaging system, which degrades the performance of fully supervised models. To avoid synthesized data, the recently proposed ADN [adn] uses only clinical unpaired metal-affected and metal-free CT images to train a disentanglement network with adversarial losses, giving promising results on clinical datasets and outperforming the fully supervised methods trained on synthesized data. However, without accurate supervision, ADN is far from perfect and cannot preserve structural details in many challenging cases. In this study, we improve the MAR performance on clinical datasets from two aspects. First, we formulate MAR as artifact disentanglement while leveraging the low-dimensional patch manifold of image patches to help recover structural details. Specifically, we train a disentanglement network with the ADN losses and simultaneously constrain a reconstructed artifact-free image to have a low-dimensional patch manifold. This idea is inspired by the low-dimensional manifold model (LDMM) [LDMM] for image processing and CT image reconstruction [cong2019].
However, applying the iterative LDMM algorithm to train the disentanglement network is not trivial. To this end, we carefully design an LDMDN algorithm for simultaneously optimizing the objective functions of the disentanglement network and the LDM constraint. Second, we improve the MAR performance of the disentanglement network by integrating both unpaired and paired supervision. Specifically, the unpaired supervision is the same as that used in ADN, where unpaired images come from artifact-free and artifact-affected groups. The paired supervision relies on synthesized paired images to train the model in a pixel-to-pixel manner. Although the synthesized data cannot perfectly simulate clinical scenarios, they still provide helpful information for recovering artifact-free images from artifact-affected ones. Finally, we design a hybrid training scheme that combines both the paired and unpaired supervision to further improve the MAR performance on clinical datasets.
The rest of this paper is organized as follows. In the next section, we review the related work. In Section 3, we describe the proposed method, including the problem formulation, the construction of a patch set, the dimension of a patch manifold, the corresponding optimization algorithm, and the hybrid training scheme. In Section 4, we evaluate the proposed LDMDN algorithm for MAR on synthesized and clinical datasets. Extensive experiments show that our proposed method consistently outperforms the state-of-the-art ADN model and other competing methods. Finally, we conclude the paper in Section 5.
2 Related work
2.1 Metal Artifact Reduction
CT metal artifact reduction methods can be classified into three categories: projection domain methods, image domain methods, and dual domain methods.
Projection based methods aim to correct projection data for MAR. Some methods of this type [park2016, jiang2000, meyer2010] directly correct the corrupted data by modeling the underlying physical process, such as beam hardening and scattering. However, the results of these methods are not satisfactory when high-atomic-number metals are present. Thus, a more sophisticated way is to treat the metal-affected data within the metal traces as unreliable and replace them with surrogates estimated from the reliable data. Linear interpolation (LI) [Kalender] is a basic and simple method for estimating the metal-corrupted data. The LI method is prone to generating new artifacts and distorting structures due to mismatched values and geometry between the linearly interpolated data and the unaffected data. To address this problem, prior information has been employed to generate more accurate surrogates in various ways [Bal, nmar, wj2013]. Among these methods, the state-of-the-art normalized MAR (NMAR) method [nmar] is widely used due to its simplicity and accuracy. NMAR introduces a prior image of tissue classification to normalize the projection data before LI is applied. With the normalized projection data, the data mismatch caused by LI can be effectively reduced, giving better results than the generic LI method. However, the performance of NMAR largely depends on accurate tissue classification. In practice, tissue classification is not always accurate, so NMAR also tends to produce secondary artifacts. Recently, deep neural networks [Bernhard2017, liao2019, Ghani2020] were applied for projection correction and achieved promising results. However, such learning based methods require a large number of paired projection data.
The image domain methods directly reduce metal artifacts via CT image post-processing. Conventionally, some methods [Hamid, karimi] leverage image processing techniques to estimate and remove the streak artifacts from the original artifact-affected images, but these handcrafted methods have limited performance. From a data-driven perspective, deep learning methods for MAR are superior to traditional approaches. For example, RLARCNN [Huang2018MetalAR]
is a convolutional neural network with residual learning for MAR, achieving better results than a plain CNN [vgg]. cGANMAR [Wang2018] regards MAR as an image-to-image translation task and adapts the Pix2pix [Isola_2017_CVPR] model to improve MAR performance.
To benefit from both the projection and image domains, various dual domain methods have also been proposed. DestreakNet [Gjesteby_2019] takes an image corrected by the state-of-the-art NMAR method and a detail map derived from the original image as the inputs to a dual-stream network, giving better results than those achievable in a single domain. In the CNNMAR method [zhang2018], a CNN first takes the original image and the images corrected by BHC [Verburg2012CTMA] and LI [Kalender] as inputs and produces a CNN output image, which is used to generate a prior image. Then, the projection data of the prior image are used to correct the original projection data, and the final image is reconstructed with FBP. DuDoNet [dudonet] introduces an end-to-end dual domain network to simultaneously correct sinogram data and CT images.
All of the above deep learning methods for MAR require a large number of synthesized paired projection datasets and/or CT images for training. A recent study [adn] has shown that models [zhang2018, Wang2018] trained on synthesized data cannot generalize well to clinical datasets. ADN was then designed and tested on clinical unpaired data, achieving promising results. However, without strong supervision, ADN can hardly recover structural details in challenging cases.
In this study, we introduce a novel image prior, i.e., the low-dimensional manifold (LDM), and different levels of supervision to train the disentanglement network for improving the MAR performance on clinical datasets. Our proposed LDM prior guided disentanglement framework and synergistic supervision scheme have the potential to empower both image domain and dual domain methods, as further detailed below.
2.2 Low-dimensional manifold
The patch set of natural images has been shown to lie on a low-dimensional manifold [Lee2003, carlsson2008, peyre2008, peyre2009]. Based on this low dimensionality of the patch manifold, LDMM first computes the dimension of the patch manifold using differential geometry and then uses this dimension to regularize image recovery problems, including image inpainting, super-resolution, and denoising. Building on LDMM, LDMNet [ldmnet] regularizes the combination of input data and output features to lie on a low-dimensional manifold in the context of classification, showing competitive performance over popular regularizers such as low-rank constraints and Dropout. Recently, Cong et al. [cong2019] used LDMM to regularize CT image reconstruction, demonstrating that LDMM has a strong ability to recover detailed structures in CT images. Inspired by these recent results with LDMM, here we propose an LDM constrained disentanglement network with both paired and unpaired supervision for improving the MAR performance on clinical datasets.
3 Method
3.1 Problem formulation
Before formulating the problem, let us introduce the general neural network based method in the image domain. In the supervised learning mode, we have paired data {(x_i^a, x_i^f)}_{i=1}^N available, where each artifact-affected image x_i^a has a corresponding artifact-free image x_i^f, and N is the number of paired images. Then, a deep neural network based model can be trained on this dataset with the loss function:
(1)  min_θ Σ_{i=1}^N L(f_θ(x_i^a), x_i^f)
where L is a loss function, such as the L1 distance, and f_θ(x_i^a) represents the image predicted from x_i^a by the neural network with parameter vector θ. In practice, a large number of paired data are synthesized for training the model, as clinical datasets only contain unpaired images. To improve the MAR performance on clinical datasets, ADN adapts a generative adversarial learning based disentanglement network for MAR, requiring only an unpaired dataset {x_i^a} ∪ {x_j^f}, where x_j^f represents an artifact-free image. ADN consists of several encoders and decoders, which are trained with several loss functions, including two adversarial losses, a reconstruction loss, a cycle-consistent loss, and an artifact-consistent loss. For simplicity, we denote the ADN loss functions as:
(2)  L_adn(θ)
where L_adn represents the combination of all the loss functions of ADN.
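For concreteness, the supervised objective of Eq. (1) can be sketched as follows. This is an illustrative sketch only: the L1 distance and the function name are our assumptions, since the text only specifies "a distance function".

```python
import numpy as np

def supervised_mar_loss(predictions, targets):
    """Eq. (1) as an average L1 distance between predicted artifact-corrected
    images and their co-registered artifact-free ground truths. L1 is an
    assumption; the paper only specifies a generic distance function."""
    return float(np.mean([np.abs(p - t).mean() for p, t in zip(predictions, targets)]))

# Toy batch of two 4x4 "images": per-pair L1 means are 0.5 and 0.0.
preds = [np.zeros((4, 4)), np.ones((4, 4))]
gts = [np.full((4, 4), 0.5), np.ones((4, 4))]
loss = supervised_mar_loss(preds, gts)  # -> 0.25
```

In the actual training, the predictions would come from the disentanglement network and the gradient of this loss would be backpropagated to the network parameters.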
In this work, we introduce a general property of images, known as the LDM, to improve the MAR performance on clinical datasets. Specifically, we assume that a patch set of artifact-free images samples a low-dimensional manifold. Therefore, we formulate the MAR problem as follows:
(3)  min_{θ, M} L_net(θ) + λ dim(M), subject to P(θ) ⊂ M
where P(θ) denotes the patch set of artifact-free or artifact-corrected images, M is a smooth manifold isometrically embedded in the patch space, and L_net can be any network loss function, such as Eq. (1) for paired learning or Eq. (2) for unpaired learning.
To solve the above optimization problem, we need to specify the construction of a patch set, the computation of the patch manifold dimension, and the learning algorithm for simultaneously optimizing the network loss functions and the dimensionality of the patch manifold. In the following sections, we describe each of them.
3.2 Construction of a patch set
In this work, we adapt the state-of-the-art disentanglement network in our proposed LDMM based optimization framework under different levels of supervision. For such a disentanglement network, we leverage its two branches to construct a patch set. As shown in Fig. 1, one is the artifact-corrected branch that maps artifact-affected images to artifact-corrected images, and the other is the artifact-free branch that maps artifact-free images to themselves. Considering the spatial correspondence between the input/output image and its convolutional feature maps, we concatenate each feature vector along the spatial axes with its corresponding image patch to represent a patch. For the artifact-corrected branch, we take patches from the artifact-corrected images, giving a patch set P_c. For the artifact-free branch, we take patches from the original images, giving a patch set P_f. As we assume that the patch set of images without artifacts samples a low-dimensional manifold, the final patch set is the union of these two patch sets, denoted by P = P_c ∪ P_f.
In our implementation, the encoder downsamples the input image by a fixed step size, and each point in the resulting patch set is 128-dimensional (the detailed sizes are given in Section 4.2).
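The patch construction above can be sketched as follows. All sizes here are illustrative assumptions (a 64×64 input, a 64-channel feature map downsampled by 8, and 8×8 image patches); only the 8× downsampling and the 128-dimensional patch points are stated in the paper.

```python
import numpy as np

def build_patch_set(image, features, patch=8):
    """Concatenate each non-overlapping patch x patch image patch with the
    feature vector at the corresponding spatial location of the downsampled
    feature map, exploiting their spatial correspondence."""
    H, W = image.shape
    C, h, w = features.shape
    assert (h, w) == (H // patch, W // patch)
    points = []
    for i in range(h):
        for j in range(w):
            img_patch = image[i * patch:(i + 1) * patch,
                              j * patch:(j + 1) * patch].ravel()
            points.append(np.concatenate([features[:, i, j], img_patch]))
    return np.stack(points)  # shape: (h * w, C + patch * patch)

rng = np.random.default_rng(0)
image = rng.random((64, 64))       # illustrative input image
features = rng.random((64, 8, 8))  # illustrative 8x-downsampled features
P = build_patch_set(image, features)  # shape (64, 128): 64 + 8*8 = 128
```

With these assumed sizes, each point is 64 feature channels plus 64 pixel values, matching the 128-dimensional patch points mentioned in Section 4.2.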
3.3 Dimension of patch manifold
In this work, we adopt the definition introduced in LDMM [LDMM] for computing the dimension of a patch manifold. Specifically, we have the following theorem.
Theorem 1.
Let M be a smooth submanifold isometrically embedded in R^d. For any patch p ∈ M,
(4)  dim(M) = Σ_{j=1}^d ||∇_M α_j(p)||²
where α_j(p) = p_j is the j-th coordinate function, and ∇_M denotes the gradient of a function on M. More details on the definition of ∇_M can be found in [LDMM]. In our implementation, p = p(θ), where the patch is parameterized by the neural network parameter vector θ, as introduced in Section 3.2.
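The theorem can be understood through the following standard differential-geometry argument, paraphrased here for completeness:

```latex
% For p \in \mathcal{M} \subset \mathbb{R}^d, the manifold gradient of the
% j-th coordinate function \alpha_j(p) = p_j is the orthogonal projection
% of the Euclidean basis vector e_j onto the tangent space T_p\mathcal{M}:
\nabla_{\mathcal{M}} \alpha_j(p) = \mathrm{Proj}_{T_p\mathcal{M}}(e_j),
% so summing the squared norms over all d coordinates yields the trace of
% the projection operator, which equals the tangent-space dimension:
\sum_{j=1}^{d} \left\| \nabla_{\mathcal{M}} \alpha_j(p) \right\|^2
  = \operatorname{tr}\!\left( \mathrm{Proj}_{T_p\mathcal{M}} \right)
  = \dim(\mathcal{M}).
```

Intuitively, the fewer directions the manifold extends in, the smaller the sum of squared coordinate-function gradients, which is what the LDM term penalizes.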
3.4 Optimization
According to the construction of a patch set and the definition of the patch manifold dimension, we can reformulate Eq. (3) as:
(5) 
where
(6) 
To solve this problem, we design an iterative algorithm, named LDMDN, for optimizing the LDM constrained disentanglement network, based on the algorithm for image processing introduced in [LDMM]. Specifically, given the manifold M^k at step k satisfying P(θ^k) ⊂ M^k, step k+1 consists of the following substeps:

Update θ^{k+1} and the perturbed coordinate functions α^{k+1} as the minimizers of Eq. (7) with the manifold M^k fixed:
(7) 
Update the manifold M^{k+1}:
(8) 
Repeat the above two substeps until convergence.
It is noted that if the iteration converges to a fixed point, α will be very close to the coordinate functions, and M^k and M^{k+1} will be very close to each other.
Eq. (7) is a constrained linear optimization problem. We can use the alternating direction method of multipliers (ADMM) to simplify the above algorithm as follows:

Update the coordinate functions α, with the network parameters θ fixed,
(9) 
Update the network parameters θ,
(10) 
Update the dual variables,
(11)
Using a standard variational approach, the solution of the objective function (9) can be obtained by solving the following PDE:
(12)  
where ∂M is the boundary of M, and n is the outward normal of ∂M.
Eq. (12) can be solved with the point integral method. For the Laplace–Beltrami equation, the key observation is the following integral approximation:
(13) 
where t is a hyper-parameter and
(14)  R_t(x, y) = C_t R(|x − y|² / (4t))
where R is a positive function that is integrable over [0, +∞), and C_t is the normalizing factor
(15) 
We usually set R(r) = e^{−r}, so that R_t is Gaussian.
Based on the above integral approximation, we approximate the original Laplace–Beltrami equation as:
(16) 
This integral equation is easy to discretize over the point cloud.
To simplify the notation, we denote the patch set in the k-th iteration as P = {p_1, p_2, ..., p_n}, where n is the number of patches. We assume that the patch set samples the submanifold M and is uniformly distributed. Then, the integral equation can be discretized as
(17) 
where u_i = u(p_i), and |M| is the volume of the manifold M.
We rewrite Eq. (17) in the matrix form:
(18) 
where U = (u_1, ..., u_n)^T, V = (v_1, ..., v_n)^T, and L is an n × n matrix,
(19)  L = D − W
where W = (w_ij) is the weight matrix, with w_ij = R(|p_i − p_j|² / (4t)), and D = diag(d_1, ..., d_n) is the diagonal degree matrix with
(20)  d_i = Σ_{j=1}^n w_ij
The final LDMDN learning algorithm is described in Algorithm 1, where we assume that the patch set of all images samples a low-dimensional manifold. However, it is impractical to optimize the LDM problem when the number of patches is very large. To this end, we randomly select a batch of images to construct the patch set, and then estimate the coordinate functions, update the network parameters, and update the dual variables in each iteration. Thus, in our implementation, the number of iterations in training the network is the same as that in the LDM optimization. When the dual variables are updated as in the original LDMM algorithm [LDMM], their values usually increase as the number of iterations increases. As the number of iterations is usually very large, the value of the LDM term in step 6 of Algorithm 1 would become increasingly large, leading to a bad solution. To overcome this problem, the dual variables are normalized in step 7 of Algorithm 1.
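The discretized coordinate-function update can be sketched as follows. This is a sketch only: it assumes the Gaussian choice R(r) = e^{-r} for the weights, a graph-Laplacian system of the form (L + μW)U = μWV for the right-hand side, and hypothetical hyper-parameters μ and t; the exact scaling in the paper's Eq. (18) may differ.

```python
import numpy as np

def gaussian_weights(P, t=1.0):
    """Weight matrix w_ij = exp(-|p_i - p_j|^2 / (4t)) over a patch set P of
    shape (n, d), i.e. the Gaussian choice R(r) = e^{-r} in Eqs. (19)-(20)."""
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (4.0 * t))

def solve_coordinate_functions(P, V, mu=1.0, t=1.0):
    """Sketch of the discretized update: solve (L + mu*W) U = mu*W V with the
    graph Laplacian L = D - W. mu and t are hypothetical hyper-parameters."""
    W = gaussian_weights(P, t)
    L = np.diag(W.sum(axis=1)) - W          # L = D - W, Eq. (19)
    return np.linalg.solve(L + mu * W, mu * W @ V)

rng = np.random.default_rng(0)
P = rng.random((50, 8))                      # toy patch set: 50 points in R^8
U = solve_coordinate_functions(P, P)         # coordinate functions evaluated on P
```

In the full algorithm, this linear solve would be carried out once per mini-batch, with P built from the network features as in Section 3.2.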
3.5 Combination of paired and unpaired learning
ADN requires only unpaired clinical images for training, so that the performance degradation that occurs when a model is first trained on a synthesized dataset and then transferred to a clinical application can be avoided. However, the GAN loss based unpaired supervision is not strong enough to recover all structural details. On the other hand, although the synthesized data may not perfectly simulate real scenarios, they do provide helpful information via accurate supervision. To benefit from both paired and unpaired learning, here we design a hybrid training scheme. Specifically, during training, both unpaired clinical images and paired synthetic images are selected to construct a mini-batch, which is then fed to the corresponding branches to optimize the objective functions simultaneously, as shown in Fig. 2.
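A minimal sketch of this mini-batch construction is given below. The function name and the 2+2 split are illustrative assumptions; the paper does not specify the per-batch proportions.

```python
import numpy as np

def hybrid_minibatch(clin_art, clin_free, synth_pairs,
                     n_unpaired=2, n_paired=2, seed=None):
    """Draw one mini-batch mixing unpaired clinical images (fed to the
    ADN-style adversarial branches) with synthesized paired images (fed to
    the pixel-to-pixel branch)."""
    rng = np.random.default_rng(seed)
    ua = [clin_art[i] for i in rng.choice(len(clin_art), n_unpaired, replace=False)]
    uf = [clin_free[i] for i in rng.choice(len(clin_free), n_unpaired, replace=False)]
    pairs = [synth_pairs[i] for i in rng.choice(len(synth_pairs), n_paired, replace=False)]
    pa = [p[0] for p in pairs]   # synthesized artifact-affected images
    pf = [p[1] for p in pairs]   # their co-registered artifact-free targets
    return ua, uf, pa, pf
```

The unpaired images ua/uf and the paired images pa/pf would then drive the ADN losses of Eq. (2) and the supervised loss of Eq. (1), respectively, in one optimization step.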
4 Experimental design and results
4.1 Datasets
In our experiments, we evaluated the proposed method on one dataset synthesized from DeepLesion [yan2018] and one clinical dataset from SpineWeb (spineweb.digitalimaginggroup.ca), which are the same as those used in ADN [adn].
For the synthesized dataset, 4,118 artifact-free CT images were randomly selected from DeepLesion. The paired images with and without metal artifacts were then synthesized using the method introduced in CNNMAR [zhang2018]. Finally, 3,918 pairs of images were used for training and 200 pairs for testing. For a fair comparison, the images used for training and testing and all pre-processing steps are the same as those used for ADN [adn].
For the clinical dataset, 6,170 images with metal artifacts and 21,190 images without metal artifacts were selected for training, and an additional 100 images with metal artifacts were selected for evaluation. The criteria for selecting these images are the same as those in the ADN study. Specifically, if an image contains pixels with HU values greater than 2,500 and the number of such pixels is larger than 400, the image is assigned to the artifact-affected group. Images whose largest HU values are less than 2,000 are assigned to the artifact-free group. Furthermore, to study the effectiveness of combining both paired and unpaired supervision, we randomly selected 6,170 images from the artifact-free group. We then extracted 6,170 metal objects from the images in the artifact-affected group and used CatSim [catsim] to simulate paired images by inserting each extracted metal shape into a selected artifact-free image. Finally, 6,170 synthesized image pairs were obtained.
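The grouping rule above can be expressed compactly as follows; the thresholds are taken from the text, while the function name and the "excluded" label for in-between images are ours.

```python
import numpy as np

def classify_by_hu(image_hu, high_thresh=2500, min_pixels=400, free_thresh=2000):
    """Grouping rule from the ADN study: an image with more than min_pixels
    pixels above high_thresh HU is artifact-affected; an image whose maximum
    HU is below free_thresh is artifact-free; anything in between is left out."""
    if (image_hu > high_thresh).sum() > min_pixels:
        return "artifact-affected"
    if image_hu.max() < free_thresh:
        return "artifact-free"
    return "excluded"
```

For example, an image containing a 25×25 block of 3,000 HU pixels (625 pixels above the threshold) falls into the artifact-affected group, while an image capped at 500 HU is artifact-free.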
4.2 Implementation details
We implemented different network architecture variants for different learning paradigms, as shown in Fig. 3. For unpaired learning, we use the same architecture as ADN [adn], as shown in Fig. 3 (a); the architectures of all other learning paradigms are variants of ADN. In Fig. 3 (b), to construct the patch set for the LDM constraint, we add two convolutional layers on top of the encoders in the artifact-corrected and artifact-free branches, respectively, as described in Section 3.2. For paired learning only, we simply use the encoder-decoder in the artifact-corrected branch, as shown in Fig. 3 (c). When combining paired learning with the LDM constraint, we keep two encoder-decoder branches, as shown in Fig. 3 (d).
In Fig. 1 and Fig. 3 (b) and (d), the extra convolutional layers are used to compress the channels of the latent code. Specifically, the downsampling rate is 8, and the dimension of each point in the patch set is 128.
We implemented the proposed method in PyTorch (https://pytorch.org/). For a fair comparison, we keep all hyper-parameters the same as those in ADN [adn]. In Algorithm 1, the batch size and the remaining hyper-parameters were set empirically.
4.3 Results on synthesized dataset
4.3.1 Reimplementation of ADN
Table 1
        ADN-0.85  ADN-0.50  ADN-0.15
PSNR    34.1      34.0      34.0
SSIM    92.8      92.8      92.9
To simulate unpaired learning, the synthesized paired images were divided into two groups; the artifact-affected images were selected from one group and the artifact-free images from the other. In [adn], the ratio of the numbers of images in these two groups was simply set to 1:1. However, in clinical scenarios, the number of artifact-affected images is much smaller than the number of artifact-free images. Therefore, we evaluated the effect of this ratio on the MAR performance in the unpaired learning setting. In Table 1, ADN-0.85, ADN-0.50, and ADN-0.15 denote different ratios of artifact-affected images to all images. Table 1 shows that there is little difference between the models trained with different ratios of artifact-affected to artifact-free images. In addition, we found that the MAR performance metrics do not strictly converge in the unpaired learning setting. Therefore, we selected the best model as the final result, which is better than the results reported in [adn]. In practice, it is also reasonable to first select the best performing model on the synthesized images and then apply the selected model to clinical images. As a representative ratio between artifact-affected and artifact-free images under clinical conditions, ADN-0.15 serves as the baseline in all following experiments.
Table 2
        Paired learning                          Unpaired learning
        CNNMAE  UNet  cGANMAR  Sup   LDMSup     CycleGAN  DIP   MUNIT  DRIT  ADN   ADN*  LDMDN
PSNR    32.5    34.8  34.1     37.6  38.0       30.8      26.4  14.9   25.6  33.6  34.0  35.0
SSIM    91.4    93.1  93.4     96.1  96.3       72.9      75.9  7.5    79.7  92.4  92.9  94.2
4.3.2 Comparative results
On the synthesized dataset, we evaluated the quantitative and qualitative performance of our proposed method and the compared methods. For quantitative results, we used the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) metrics. Table 2 compares the proposed method with the competing methods in the paired and unpaired learning settings. In Table 2, ADN* is our improved ADN; see Section 4.3.1 for details. The results show that the proposed LDMDN method outperforms ADN in terms of both PSNR and SSIM in the unpaired learning setting. For the paired learning part of Table 2, Sup corresponds to the network architecture in Fig. 3 (c), which was trained with paired data. It is noted that this encoder-decoder architecture contains skip connections between the encoder and the decoder, the same as in ADN. LDMSup adds the LDM constraint to Sup during paired training, corresponding to the architecture in Fig. 3 (d). Although the paired images provide accurate pixel-to-pixel supervision, the LDM based learning algorithm further improves the performance. These results strongly demonstrate that our proposed LDMDN algorithm can consistently improve the existing models in both the paired and unpaired learning settings.
We also compared the results visually, as shown in Fig. 4. The visual impressions are consistent with the numerical results. In the unpaired learning setting, although ADN removes a majority of the metal artifacts, the local details are not well preserved. Comparing the results of ADN* and LDMDN, an evident improvement is made on these details. Compared with unpaired learning (Sup vs. ADN*), the ideal paired learning gives better results on the synthesized test dataset. In this case, our proposed LDMDN learning algorithm obtains further improvements, with visually sharper structure boundaries (LDMSup vs. Sup). These results strongly show the effectiveness of the proposed LDMDN algorithm.
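For reference, the two metrics can be computed as follows. The PSNR is standard; the SSIM shown here is a simplified single-window (global) variant, whereas reported results typically use the windowed implementation (e.g., scikit-image's structural_similarity).

```python
import numpy as np

def psnr(x, y, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two images x and y."""
    mse = np.mean((x - y) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over the whole image with the usual stabilizing
    constants; a simplification of the standard locally windowed SSIM."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

For instance, two images differing by a constant 0.1 on a [0, 1] scale have an MSE of 0.01 and hence a PSNR of 20 dB, while identical images have an SSIM of 1.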
4.4 Results on clinical dataset
In this subsection, we evaluated the proposed networks on the clinical dataset. As there are no ground truth images for evaluating the performance of the models, we can only show the visual results of two examples in Fig. 5. On the clinical dataset, we see the superiority of the proposed LDMDN algorithm in preserving (the green boxes for LDMDN vs. ADN* in the first example) and recovering (the blue boxes for LDMDN vs. ADN* in the second example) local details. We also evaluated, on the same dataset, the performance of the model trained on synthesized images in a supervised manner. As shown in Section 4.3, the results of Sup are better than those of unpaired learning on the synthesized dataset. However, the results in Fig. 5 show that the performance of Sup is clearly worse than that of the unpaired learning model trained on the clinical dataset, as the synthesized data do not reflect the real conditions. This is consistent with the observation in [adn]. Nevertheless, although the performance of Sup is degraded, it still shows some merits over the unpaired learning methods: some structures are sharper (green boxes of Sup vs. ADN* and LDMDN for both examples) and some regions are better recovered (blue boxes of Sup vs. ADN* and LDMDN in the second example). Combining ADN and Sup leads to overcorrection, as shown in Fig. 5, where some structures are overly sharpened (green boxes of ADNSup vs. others for both examples) and some regions are overly dark (blue boxes of ADNSup vs. others for both examples). We attribute these results to ADN reducing the undercorrection of Sup in some regions (red boxes of ADNSup vs. Sup in the second example) while simultaneously overcorrecting other regions (blue boxes of ADNSup vs. Sup in the second example). Therefore, we propose to combine the merits of unpaired learning, paired learning, and the LDM through a hybrid learning scheme. As the results of LDMDNSup in Fig. 5 show, it inherits the strengths analyzed above. In particular, compared with ADNSup, the LDM in LDMDNSup constrains structurally similar patches, especially adjacent patches, to be coherent, without drastic changes toward being too dark or too bright. The above results on the clinical dataset strongly demonstrate the effectiveness of the LDM and the superiority of the hybrid training scheme.
5 Conclusion
We have proposed an LDM constrained disentanglement network for MAR. Specifically, we have designed an LDMDN learning algorithm to simultaneously optimize the objective functions of deep neural networks and constrain the recovered images to have a low-dimensional patch manifold representation. The LDMDN algorithm can effectively help preserve and recover structural details in CT images. Moreover, we have investigated both paired and unpaired learning based models for MAR, showing their relative advantages. Finally, we have designed a hybrid optimization scheme that combines paired learning, unpaired learning, and the LDMDN learning algorithm to integrate their advantages. The experimental results on synthesized and clinical datasets strongly demonstrate the superiority of the proposed method. We believe that the proposed LDMDN algorithm has great potential for solving various CT MAR problems.