Metal artifacts are among the most prominent artifacts impeding reliable interpretation of computed tomography (CT) and cone beam CT (CBCT) images. They are commonly addressed in the sinogram domain, where the metal-affected regions in the sinograms are segmented and replaced with synthesized values so that, ideally, metal-free CT images can be reconstructed from the corrected sinograms. Early sinogram domain approaches fill the metal-affected regions by interpolation or from prior images [4, 6]. These methods can effectively reduce metal artifacts, but secondary artifacts are often introduced due to the loss of structural information in the corrected sinograms. Recent works propose to leverage deep neural networks (DNNs) to directly learn the sinogram correction. Park et al. [7] apply U-Net [10] to correct metal-inserted sinograms, and Gjesteby et al. [1] propose to refine NMAR-corrected sinograms [6] using a convolutional neural network (CNN). Although better sinogram completions are achieved, the results are still subject to secondary artifacts due to the imperfect completion.
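To make the interpolation-based completion mentioned above concrete, the following minimal sketch fills the metal trace in a single detector row by linear interpolation between the nearest unaffected neighbours. The function name `interpolate_row` and the toy readings are illustrative assumptions; this is not code from any of the cited methods.

```python
def interpolate_row(row, metal):
    """Linearly interpolate the entries of `row` flagged in `metal`.

    row   -- list of detector readings for one view angle
    metal -- list of booleans, True where the reading is metal-affected
    """
    out = list(row)
    n = len(row)
    i = 0
    while i < n:
        if not metal[i]:
            i += 1
            continue
        # Find the end of this run of metal-affected samples.
        j = i
        while j < n and metal[j]:
            j += 1
        left = i - 1   # last clean sample before the run (may be -1)
        right = j      # first clean sample after the run (may be n)
        for k in range(i, j):
            if left < 0 and right >= n:
                out[k] = 0.0              # no clean data in this row
            elif left < 0:
                out[k] = row[right]       # run touches the left border
            elif right >= n:
                out[k] = row[left]        # run touches the right border
            else:
                t = (k - left) / (right - left)
                out[k] = (1 - t) * row[left] + t * row[right]
        i = j
    return out


row = [1.0, 2.0, 9.0, 9.0, 9.0, 6.0]
metal = [False, False, True, True, True, False]
filled = interpolate_row(row, metal)
```

The interpolated values vary smoothly across the trace, which is why fine structure inside the metal-affected region is lost and secondary artifacts can appear after reconstruction.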
The development of DNNs in recent years also enables an image domain approach that directly reduces metal artifacts in CT/CBCT images. Specifically, the existing methods [2, 13, 12, 8] train image-to-image CNNs to transform metal-affected CT images into metal-free CT images. Gjesteby et al. [2] propose to include the NMAR-corrected CT as an input to a two-stream CNN. Zhang et al. [13] fuse beam hardening corrected and linearly interpolated CT images for better correction. All the current image domain approaches use synthesized data to generate the metal-affected and metal-free image pairs for training. However, the synthesized data may not fully simulate CT imaging under clinical conditions, which makes the image domain approaches less robust in clinical applications.
In this work, we propose a novel learning-based sinogram domain approach to metal artifact reduction (MAR). Unlike the existing image domain methods, the proposed method does not require synthesized metal artifacts during training. Instead, we treat MAR as an image inpainting problem, i.e., we apply random metal traces to mask out artifact-free sinograms, and train a DNN to recover the data within the metal traces. Since metal-affected regions are viewed as missing, factors such as the X-ray spectrum and the material of the metal implants will not affect the generalizability of the proposed method. Unlike the existing learning-based sinogram domain approaches, our method delivers high-quality sinogram completion with three designs. First, we propose a two-stage projection-sinogram completion scheme to achieve more contextually consistent correction results. (We denote the X-ray data captured at the same view angle as a “projection” and a stack of projections corresponding to the same CT slice as a “sinogram”.) Second, we introduce adversarial learning into the projection-sinogram completion so that more structural and anatomically plausible information can be recovered from the metal regions. Third, to make the learning more robust to the various shapes of metallic implants, we introduce a novel mask pyramid network (MPN) to distill the geometry information of different scales and a mask fusion loss to penalize early saturation. Our extensive experiments on both synthetic and clinical datasets demonstrate that the proposed method is effective and performs better than the state-of-the-art MAR approaches.
An overview of the proposed method is shown in Fig. 2. Our method consists of two major modules: a projection completion module (blue) and a sinogram correction module (green). The projection completion module is an image-to-image translation model enhanced with a novel mask pyramid network. Given an input projection image and a pre-segmented metal mask, the projection completion module generates anatomically plausible and structurally consistent surrogates within the metal-affected regions. The sinogram correction module predicts a residual map to refine the projection-corrected sinograms. This joint projection-sinogram correction approach enforces inter-projection consistency and makes use of the context information between different viewing angles. Note that we perform projection completion first due to the observation that the projection images contain better structural information that facilitates the learning of an image inpainting model.
2.0.1 Base Framework
Inspired by recent advances in deep generative models [9, 3], we formulate the projection and sinogram correction problems under a generative image-to-image translation framework. The structure of the proposed model is illustrated in Fig. 2. It consists of two individual networks: a generator and a discriminator. The generator takes a metal-segmented projection as the input and generates a metal-free projection. The discriminator is a patch-based classifier that predicts whether a metal-free projection, either the generator output or the ground truth, is real or not. Similar to the PatchGAN [3] design, the discriminator is constructed as a CNN without fully-connected layers at the end to enable the patch-wise prediction. The detailed structures of the generator and the discriminator are presented in the supplementary material. The two networks are trained adversarially with the LSGAN [5] objective.
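For reference, the least-squares adversarial objective can be sketched as follows. This is the standard LSGAN formulation with the common label coding (1 for real, 0 for generated), applied to patch-wise score maps; the function names and the toy scores are assumptions made for illustration, and plain Python lists stand in for tensors.

```python
def mean_sq(scores, target):
    """Mean squared distance between patch scores and a target label."""
    flat = [s for row in scores for s in row]
    return sum((s - target) ** 2 for s in flat) / len(flat)


def lsgan_discriminator_loss(real_scores, fake_scores):
    # Real patches are pushed towards 1, generated patches towards 0.
    return 0.5 * mean_sq(real_scores, 1.0) + 0.5 * mean_sq(fake_scores, 0.0)


def lsgan_generator_loss(fake_scores):
    # The generator tries to make its patches score as real (1).
    return 0.5 * mean_sq(fake_scores, 1.0)


# Toy 2x2 patch score maps, as a patch-based discriminator would output.
real_scores = [[1.0, 0.5], [1.0, 1.0]]
fake_scores = [[0.0, 0.5], [0.0, 0.0]]
d_loss = lsgan_discriminator_loss(real_scores, fake_scores)
g_loss = lsgan_generator_loss(fake_scores)
```

Because each entry of the score map corresponds to one image patch, the loss averages over patches rather than over a single image-level decision.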
In addition, we also expect the generator output to be close to its metal-free counterpart. Therefore, we add a content loss to ensure pixel-wise consistency between the generator output and the metal-free ground truth.
2.0.2 Mask Pyramid Network
Metallic implants have various shapes and sizes, such as metallic balls, bars, screws, and wires. When X-ray projections are acquired at different angles, the projected implants exhibit complicated geometries. Hence, unlike typical image inpainting problems, where the shape of the mask is usually simple and fixed, projection completion is more challenging since the network has to learn how to fuse such diversified mask information of the metallic implants. Directly using a metal-masked image as the input requires the metal mask information to be encoded by each layer and passed along to the later layers. For unseen masks, this encoding may not work well, and hence the mask information may be lost. To retain a sufficient amount of mask information, we introduce a mask pyramid network (MPN) into the generator to feed the mask information into each layer explicitly.
The architecture of the generator with this design is illustrated in Fig. 4. The MPN takes a metal mask as the input, and each block (in yellow) of the MPN is coupled with an encoding block (in grey) in the generator. When an MPN block and an encoding block are coupled, the output of the MPN block is concatenated to the output of the encoding block. In this way, the mask information is used by the subsequent encoding block, and a recall of the mask is achieved. The metal mask output by each MPN block not only has the same size as the feature maps from the coupled encoding block, but also takes into account the receptive field of the convolution operation in that block.
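The coupling described above can be illustrated with a small sketch. It assumes, purely for illustration, that each MPN block is a 2x2 average pooling with stride 2 (matching a hypothetical stride-2 encoder block), so that each mask level has the same spatial size as the feature maps it is concatenated with. The names `avg_pool2d`, `mask_pyramid`, and `concat_mask` are hypothetical, and nested lists stand in for tensors.

```python
def avg_pool2d(mask, kernel=2, stride=2):
    """Average-pool a 2D list; each output is the metal fraction in a window."""
    h, w = len(mask), len(mask[0])
    out = []
    for r in range(0, h - kernel + 1, stride):
        out_row = []
        for c in range(0, w - kernel + 1, stride):
            window = [mask[r + dr][c + dc]
                      for dr in range(kernel) for dc in range(kernel)]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


def mask_pyramid(mask, levels):
    """Return the mask at `levels` successively coarser scales."""
    pyramid = []
    current = mask
    for _ in range(levels):
        current = avg_pool2d(current)
        pyramid.append(current)
    return pyramid


def concat_mask(features, mask_level):
    """Append a mask level as an extra channel to a list of feature maps."""
    assert len(features[0]) == len(mask_level)
    assert len(features[0][0]) == len(mask_level[0])
    return features + [mask_level]


mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
pyramid = mask_pyramid(mask, levels=2)
# One 2x2 feature map, as produced by a hypothetical first encoder block.
features = [[[0.3, 0.1], [0.2, 0.4]]]
fused = concat_mask(features, pyramid[0])
```

Each later block thus receives the mask at its own resolution instead of relying on earlier layers to carry that information forward.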
2.0.3 Mask Fusion Loss
In a conventional image-to-image framework, the loss is usually computed on the entire image. On the one hand, this makes the generation less efficient, as a significant portion of the generator’s computation will be spent on recovering the already known information. On the other hand, this also introduces early saturation during adversarial training, in which the generator stops improving in the masked regions, since the generator does not have information about the mask. We address this issue with two strategies. First, when computing the loss function, we only consider the content within the metal mask. That is, the content loss is computed only over the pixels inside the metal mask.
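A minimal sketch of a mask-restricted content loss is given below. The choice of an L1 (absolute) distance and the averaging over masked pixels are assumptions made for illustration; the function name `masked_l1` is hypothetical.

```python
def masked_l1(pred, target, mask):
    """Mean absolute error over the pixels where mask is non-zero."""
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0


pred = [[1.0, 5.0], [2.0, 0.0]]
target = [[1.0, 3.0], [9.0, 1.0]]
mask = [[0, 1], [0, 1]]          # 1 marks metal-affected pixels
loss = masked_l1(pred, target, mask)
```

In this toy case the large error at the unmasked pixel (2.0 versus 9.0) does not contribute, so the loss reflects only the quality of the completion inside the metal trace.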
Second, we modulate the output score matrix from the discriminator by the metal mask so that the discriminator can selectively ignore the unmasked regions. As shown in Fig. 4, we implement this design using another MPN. But this time, we do not feed the intermediate outputs from this MPN to the coupled blocks in the discriminator, since the metal mask will, in the end, be applied to the loss. The adversarial part of the mask fusion loss is the LSGAN loss computed on the mask-modulated discriminator scores, and the total mask fusion loss combines the adversarial part and the masked content loss, with a weighting factor that balances the importance between the two.
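One way to write the losses described above is sketched here. All symbols are assumed notation introduced for illustration: $G$ and $D$ are the generator and discriminator, $x$ is the metal-masked input, $y$ the metal-free target, $M$ the metal mask, $\tilde{M}$ the mask after it has been downscaled to the size of the discriminator's score matrix, $\odot$ the element-wise product, and $\lambda$ the weighting factor. The L1 norm in the content term and the placement of $\lambda$ on the content term are also assumptions.

```latex
\begin{aligned}
\mathcal{L}_{c}   &= \mathbb{E}_{x,y}\!\left[\, \lVert M \odot (G(x, M) - y) \rVert_{1} \,\right], \\
\mathcal{L}_{adv} &= \mathbb{E}_{y}\!\left[\, \lVert \tilde{M} \odot (D(y) - 1) \rVert_{2}^{2} \,\right]
                   + \mathbb{E}_{x}\!\left[\, \lVert \tilde{M} \odot D(G(x, M)) \rVert_{2}^{2} \,\right], \\
\mathcal{L}       &= \mathcal{L}_{adv} + \lambda\, \mathcal{L}_{c}.
\end{aligned}
```

Under this reading, both terms vanish outside the metal trace, so neither network is rewarded or penalized for regions whose content is already known.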
2.0.4 Sinogram Correction with Residual Map Learning
Although the proposed projection completion framework in the previous sections can produce an anatomically plausible result, it only considers the contextual information within a projection. Observing that a stack of consecutive projections forms a set of sinograms, we use a simple yet effective model to enforce inter-projection consistency by making the completion results look like sinograms.
Consider a sinogram formed from the previous projection completion step. A generator, as shown in Fig. 4, predicts a residual map, which is then added to this sinogram to correct the projection completion results. Here, we use the same generator structure as the one introduced in Fig. 4. For the objective function, we apply the same one as used in Eq. 7, except that the generator output is the sum of the input sinogram and the predicted residual map.
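The two data operations in this step can be sketched as follows: rearranging a stack of completed projections into sinograms, and adding a predicted residual map. The sketch assumes that each detector row across all views forms one sinogram, which is a simplification; the function names and the toy residual are hypothetical, and the residual here is hand-written rather than predicted by a network.

```python
def projections_to_sinograms(projections):
    """Rearrange [view][row][col] projections into [row][view][col] sinograms."""
    n_views = len(projections)
    n_rows = len(projections[0])
    return [[list(projections[v][r]) for v in range(n_views)]
            for r in range(n_rows)]


def apply_residual(sinogram, residual):
    """Add a residual map to a projection-completed sinogram."""
    return [[s + d for s, d in zip(s_row, d_row)]
            for s_row, d_row in zip(sinogram, residual)]


# Three views, each a 2x2 projection with easily traceable values.
projections = [[[v * 10 + r * 2 + c for c in range(2)] for r in range(2)]
               for v in range(3)]
sinograms = projections_to_sinograms(projections)

# Stand-in for the residual map a sinogram correction network would predict.
residual = [[0.5, 0.0], [0.0, 0.0], [0.0, -1.0]]
corrected = apply_residual(sinograms[1], residual)
```

Predicting a residual rather than the full sinogram means the second stage only has to learn the inter-view corrections on top of the projection completion result.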
3 Experimental Evaluations
3.0.1 Implementation Details and Baselines
We implement the proposed model using PyTorch and train the model with the Adam optimization method. For the hyper-parameters, we set learning rate, , , and batch size . We compare our projection completion (PC) model and joint projection-sinogram correction (PC+SC) model with the following baseline MAR approaches: 1) LI, sinogram correction by linear interpolation [4]; 2) BHC, beam hardening correction for MAR [11]; 3) NMAR, a state-of-the-art MAR model [6] that produces a prior CT image to correct metal artifacts; and 4) CNNMAR, the state-of-the-art deep learning based method [13] that uses a CNN to output the prior image for MAR.
3.0.2 Datasets and Simulation Details
For the synthesized dataset, we use the images collected from a CBCT scanner that is dedicated for lower extremities. The size of the CBCT projections is and the projections contain no metal objects. We randomly apply masks to the projections to obtain masked and unmasked projection pairs. In total, there are 27 CBCT scans, each with 600 projections. Projections from 24 of the CBCT scans are used for training, and the rest are held out for testing.
Two types of object masks are collected for the experiments: metal masks and blob masks. For the metal masks, we collect 3D binary metal implant volumes from clinical records and forward project them to obtain 2D metal projection masks. In total, we obtain 18,000 projection masks from 30 binary metal implant volumes. During training, we simulate the metal implant insertion process by randomly placing metal segmentation masks on the metal-free projections. For the blob masks, we adopt the method from [9] by drawing randomly shaped blobs on the image. Results for projection and sinogram completion with the metal and blob masks are provided in the supplementary material.
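The random placement of a metal mask on a metal-free projection can be sketched as below. The function name `place_mask`, the zero fill value, and the toy sizes are assumptions made for illustration; the unmasked projection serves as the training target.

```python
import random


def place_mask(projection, mask, rng):
    """Zero out a randomly positioned copy of `mask` inside `projection`.

    Returns the masked projection and the full-size binary mask.
    """
    h, w = len(projection), len(projection[0])
    mh, mw = len(mask), len(mask[0])
    top = rng.randint(0, h - mh)      # randint bounds are inclusive
    left = rng.randint(0, w - mw)
    full_mask = [[0] * w for _ in range(h)]
    for r in range(mh):
        for c in range(mw):
            full_mask[top + r][left + c] = mask[r][c]
    masked = [[0.0 if full_mask[r][c] else projection[r][c] for c in range(w)]
              for r in range(h)]
    return masked, full_mask


rng = random.Random(0)                        # fixed seed for repeatability
projection = [[1.0] * 6 for _ in range(5)]    # toy metal-free projection
metal = [[1, 1], [0, 1]]                      # toy 2D metal projection mask
masked, full_mask = place_mask(projection, metal, rng)
```

Each call yields one (masked input, mask, target) training triple, so a single metal-free projection can be reused with many different masks.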
For a fair comparison, we adopt the same procedures as in [13] to synthesize metal-affected CBCT volumes. We assume a 120 kVp X-ray source with photons. The distance from the X-ray source to the rotation center is set to 59.5 cm, and 416 projection views are uniformly spaced between 0 and 360 degrees. The size of the reconstructed volume is . During simulation, we set the material to iron for all the metal masks. Note that since the metal masks are from clinical records, the geometries and intensities of the metal artifacts are extremely diverse, which makes MAR highly challenging.
For the clinical dataset, we use the vertebrae localization and identification dataset from SpineWeb (spineweb.digitalimaginggroup.ca). We first define regions with HU values greater than 2,500 as metal regions. Then, we select images whose largest connected metal region is greater than 400 pixels as metal-affected images and images whose largest HU value is smaller than 2,000 as metal-free images. The metal masks for the projections and sinograms are obtained by forward projecting the metal regions in the CT image domain. The training for this dataset is performed on the metal-free images with metal masks obtained from the metal-affected images.
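The selection rule above can be sketched as follows, using the thresholds stated in the text. The use of 4-connectivity for the connected regions, the label "unused" for slices that satisfy neither criterion, and the function names are assumptions made for illustration.

```python
from collections import deque

METAL_HU = 2500            # HU threshold that defines a metal region
MIN_METAL_PIXELS = 400     # minimum size of the largest connected region
METAL_FREE_MAX_HU = 2000   # a metal-free slice must stay below this value


def largest_component(binary):
    """Size of the largest 4-connected region of True pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    best = 0
    for r in range(h):
        for c in range(w):
            if not binary[r][c] or seen[r][c]:
                continue
            size, queue = 0, deque([(r, c)])
            seen[r][c] = True
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best


def classify_slice(hu_image):
    """Return 'metal-affected', 'metal-free', or 'unused' for one CT slice."""
    metal = [[v > METAL_HU for v in row] for row in hu_image]
    if largest_component(metal) > MIN_METAL_PIXELS:
        return "metal-affected"
    if max(max(row) for row in hu_image) < METAL_FREE_MAX_HU:
        return "metal-free"
    return "unused"


clean = [[500] * 30 for _ in range(30)]
implant = [[3000 if 4 <= r < 25 and 4 <= c < 25 else 500 for c in range(30)]
           for r in range(30)]               # 21 x 21 = 441 metal pixels
```

The gap between the two thresholds leaves some slices in neither group, which keeps small or borderline high-density regions out of both the metal-free training set and the metal mask pool.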
3.0.3 Quantitative Comparisons
We use two metrics for quantitative evaluation: the root mean square error (RMSE) and the structural similarity index (SSIM). We conduct a thorough study by evaluating RMSE and SSIM for a wide range of mask sizes. The results are summarized in Fig. 5. We observe that the proposed method achieves superior performance over the other methods. For example, the RMSE of the second-best method, CNNMAR [13], is almost double that of the proposed method when the implant size is large. In addition, by further refining in the sinogram domain, improved performance can be achieved, especially in terms of the SSIM metric. From Fig. 5, we also observe that methods which require tissue segmentation (e.g., NMAR and CNNMAR) perform well when the metallic object is smaller than pixels. However, when the size of the metallic implants becomes larger, these methods deteriorate significantly due to erroneous segmentation. The proposed joint correction approach, which does not rely on tissue segmentation, exhibits less degradation.
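For reference, the RMSE between a corrected image and its metal-free reference can be computed as in the following minimal sketch. The function name and the toy values are illustrative, and SSIM is omitted here because it involves local window statistics.

```python
import math


def rmse(pred, target):
    """Root mean square error between two equally sized 2D lists."""
    diffs = [(p - t) ** 2
             for p_row, t_row in zip(pred, target)
             for p, t in zip(p_row, t_row)]
    return math.sqrt(sum(diffs) / len(diffs))


reference = [[1.0, 2.0], [3.0, 4.0]]
corrected = [[1.0, 2.0], [3.0, 6.0]]
error = rmse(corrected, reference)
```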
3.0.4 Qualitative Comparisons
Fig. 6 shows MAR results on synthesized metal-affected images. It is clear that the proposed method successfully removes the streaking artifacts caused by metallic implants. Unlike other approaches that generate erroneous surrogates, our method fills in contextually consistent values through generative modeling and joint correction. For the results with clinical data (Fig. 7), we also observe that our method produces qualitatively better results. BHC and NMAR cannot fully remove the metal artifacts. LI and CNNMAR can recover most of the metal-affected regions. However, they also produce secondary artifacts. We notice a performance degradation for CNNMAR on the clinical data compared to the synthesized data, which suggests that image domain approaches relying on synthesized metal artifacts generalize less well.
We present a novel MAR approach based on a generative adversarial framework with joint projection-sinogram correction and a mask pyramid network. From the experimental evaluations, we show that existing MAR methods do not effectively reduce metal artifacts. By contrast, the proposed approach leverages the extra contextual information from the sinogram and achieves superior performance over other MAR methods on both the synthesized and clinical datasets.
Acknowledgement. This work was supported in part by NSF award #1722847, the Morris K. Udall Center of Excellence in Parkinson’s Disease Research by NIH, and the corporate sponsor Carestream.
[1] Gjesteby, L., Yang, Q., Xi, Y., Zhou, Y., Zhang, J., Wang, G.: Deep learning methods to guide CT image reconstruction and reduce metal artifacts. In: SPIE Medical Imaging (2017)
[2] Gjesteby, L., Yang, Q., Xi, Y., Claus, B., Jin, Y., De Man, B., Wang, G.: Reducing metal streak artifacts in CT images via deep learning: Pilot results. In: The 14th International Meeting on Fully Three-Dimensional Image Reconstruction in Radiology and Nuclear Medicine. pp. 611–614 (2017)
[3] Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: Proc. CVPR. pp. 1125–1134 (2017)
[4] Kalender, W.A., Hebel, R., Ebersberger, J.: Reduction of CT artifacts caused by metallic implants. Radiology 164(2), 576–577 (1987)
[5] Mao, X., Li, Q., Xie, H., Lau, R.Y., Wang, Z., Paul Smolley, S.: Least squares generative adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2794–2802 (2017)
[6] Meyer, E., Raupach, R., Lell, M., Schmidt, B., Kachelrieß, M.: Normalized metal artifact reduction in computed tomography. Medical Physics 37(10) (2010)
[7] Park, H.S., Chung, Y.E., Lee, S.M., Kim, H.P., Seo, J.K.: Sinogram-consistency learning in CT for metal artifact reduction. arXiv preprint arXiv:1708.00607 (2017)
[8] Park, H.S., Lee, S.M., Kim, H.P., Seo, J.K.: Machine-learning-based nonlinear decomposition of CT images for metal artifact reduction. arXiv preprint arXiv:1708.00244 (2017)
[9] Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T., Efros, A.A.: Context encoders: Feature learning by inpainting. arXiv preprint arXiv:1604.07379 (2016)
[10] Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Proc. MICCAI. pp. 234–241. Springer (2015)
[11] Verburg, J.M., Seco, J.: CT metal artifact reduction method correcting for beam hardening and missing projections. Physics in Medicine & Biology 57(9) (2012)
[12] Xu, S., Dang, H.: Deep residual learning enabled metal artifact reduction in CT. In: Medical Imaging 2018: Physics of Medical Imaging. vol. 10573, p. 105733O. International Society for Optics and Photonics (2018)
[13] Zhang, Y., Yu, H.: Convolutional neural network based metal artifact reduction in x-ray computed tomography. IEEE Transactions on Medical Imaging (2018)