I Introduction
X-ray Computed Tomography (CT) is widely applied in clinical diagnosis, industrial non-destructive testing, and safety inspection [34, 2]. In recent years, CT has played a critical role in the auxiliary diagnosis and disease-course monitoring of COVID-19 pneumonia [16]. However, the high radiation exposure caused by frequent longitudinal CT scans may increase the lifetime risk of cancer, especially for patients whose diseases, such as pneumonia and cancer, are monitored via CT scans [6, 1]. Therefore, reducing the radiation exposure of CT imaging is an urgent need for the current state of public health.
| Methods | Key Factors for SVCT | Input | Output | Learning Target | Main Objective | Encoding Strategy |
|---|---|---|---|---|---|---|
| GRFF [32] | INR Implicit Prior | SV sinogram | CT image | CT image | — | Fourier encoding |
| IntroTomo [38] | — | SV sinogram | CT image | CT image | — | Fourier encoding |
| NeRP [27] | INR Implicit Prior | — | CT image | CT image | — | Fourier encoding |
| CoIL [31] | DV Sinogram Generation | SV sinogram | DV sinogram | Sinogram | — | Linear encoding |
| SCOPE (Ours) | — | SV sinogram | DV sinogram | CT image | — | Hash encoding |
Mathematically, the CT acquisition process can be formulated as a linear forward model:

$\mathbf{y} = \mathbf{A}\mathbf{x} + \boldsymbol{\epsilon}$  (1)

where $\mathbf{y}$ is the measurement data (also known as the sinogram), $\mathbf{x}$ denotes the CT image to be reconstructed, $\mathbf{A}$ represents the CT system forward imaging model (e.g., the Radon transform operator for parallel X-ray beam CT), and $\boldsymbol{\epsilon}$ is the system noise. To reduce the imaging radiation dose, one can decrease the dimension of the measurement data, denoted $\mathbf{y}_s$, an under-sampling of the sinogram $\mathbf{y}$. Reconstructing a CT image from the under-sampled sinogram is referred to as Sparse-View (SV) CT reconstruction, a highly ill-posed, under-determined inverse imaging problem. Applying analytical reconstruction algorithms, such as Filtered Back-Projection (FBP) [10], to the SV sinogram $\mathbf{y}_s$ results in severe streaking artifacts in the reconstructed CT image.
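The streaking behavior is easy to reproduce numerically. Below is a minimal sketch (assuming `numpy` and `scikit-image` are available; the phantom and the view counts are illustrative choices, not taken from this paper) that projects a phantom at dense and sparse view sets and reconstructs both with FBP:

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, resize

# Phantom image (ground truth), resized for speed.
image = resize(shepp_logan_phantom(), (128, 128))

def fbp_recon(img, n_views):
    """Project at n_views angles over 180 degrees, then apply FBP (iradon)."""
    theta = np.linspace(0.0, 180.0, n_views, endpoint=False)
    sinogram = radon(img, theta=theta)          # forward model A x
    return iradon(sinogram, theta=theta, filter_name="ramp")

dense = fbp_recon(image, 720)    # dense-view reference
sparse = fbp_recon(image, 60)    # sparse-view: streaking artifacts appear

# Fewer views -> larger reconstruction error (streaks).
err_dense = np.mean((dense - image) ** 2)
err_sparse = np.mean((sparse - image) ** 2)
```

Visualizing `sparse` shows the characteristic radial streaks that SVCT methods aim to remove.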
To eliminate the streaking artifacts, conventional machine learning methods [29, 11, 30, 25] formulate the under-determined inverse imaging task as a regularized optimization problem. Explicit image prior assumptions (e.g., minimal Total Variation (TV) [25] for inducing smoothness in the CT image) are adopted as a regularization term to restrict the search space and promote desirable, consistent image solutions [15]. Recently, supervised Deep Learning (DL) methods [9, 7, 41, 13, 28, 14, 5] have shown great potential for SVCT reconstruction. Instead of directly solving the inverse imaging problem, a supervised DL reconstruction typically employs a Convolutional Neural Network (CNN) to learn an end-to-end mapping from low-quality images to their high-quality reconstructions over a large dataset. For example,
[9] proposed FBPConvNet, which trains a U-Net [24] to learn a residual mapping from artifact-corrupted inputs to artifact-free outputs. It is known that the performance of supervised DL methods mostly depends on the scale and data distribution of the image pairs in the training dataset (i.e., a large-scale training dataset that covers more types of variation generally yields better performance). However, it is very challenging to build a comprehensive training dataset that covers all factors influencing SVCT images, such as different levels of SV under-sampling, different beam types for measurement projection, imaging of different body tissues, and other unconstrained conditions. In real clinical applications, the performance of supervised DL methods might therefore be quite limited, and they may even fail in extreme cases (e.g., image patterns caused by rare diseases) not covered by the training dataset.

Implicit Neural Representation (INR) has recently been proposed to model and represent 3D scenes from a sparse set of 2D views using coordinate-based deep neural networks in a self-supervised fashion. The core component of INR is a continuous implicit function parameterized by a Multi-Layer Perceptron (MLP). Benefiting from the image continuity prior imposed by the implicit function and the neural network architecture, INR has achieved superior performance in various vision problems (e.g., surface reconstruction [20, 4, 17], view synthesis [18, 39, 22], and image super-resolution [3, 33]).

For SVCT imaging, an early attempt by [32] indicated that INR could be applied to recover the CT image from a single SV sinogram without using any external data. Since then, more INR-based works [31, 27, 38, 23] have emerged. We summarize recent works that solve the tomographic inverse problem with INR-based methods in Table I to compare their design ideas and characteristics more clearly. [31] proposed CoIL, which trains an INR to represent the SV sinogram and predicts the corresponding Dense-View (DV) sinogram based on the continuous nature of INR. The CT image is then reconstructed by applying FBP [10] to the DV sinogram. However, the coordinate space of the sinogram does not follow the intuitive orthogonality assumption of Fourier spatial encoding in the INR model; thus the CT reconstruction performance of CoIL is limited and not comparable with supervised methods. NeRP [27] proposed to utilize a series of longitudinal CT scans of the same subject to build the CT image from the SV sinogram. The INR is first trained on a prior CT scan, and then used as an image prior to recover high-quality CT images from the remaining SV sinograms. However, longitudinal CT scans are not always available, which may limit the application scenarios of the model. [38]
proposed IntroTomo, which consists of a sinogram prediction module and a geometry refinement module, applied iteratively. The former estimates the CT image from the SV sinogram, while the latter incorporates explicit priors (TV and Non-local Means) into an optimization framework to refine the CT image. The iterative training strategy elevates CT image quality but severely prolongs the reconstruction time.
Compared with the works in Table I, the proposed method is most related to [32, 31]. However, two major limitations remain unsolved in those works: (i) the INR estimates the desired CT image by minimizing the loss between the network-predicted sinogram and the real measured sinogram. This paradigm is thus more effective for sinogram generation than for CT image reconstruction: due to the highly sparse sinogram, the MLP tends to approach an implicit function that over-fits the SV sinogram, which manifests as noise in the INR-represented CT images; (ii) due to the heavy computation of the coordinate-based deep MLP, image-specific INR-based CT reconstructions generally perform poorly in time efficiency.
In this paper, we propose a Self-supervised COordinate Projection nEtwork (SCOPE) to reconstruct an artifact-free CT image from a single SV sinogram by solving the inverse tomographic imaging problem. Compared with existing related works [32, 38], one of our key contributions is a simple and effective reprojection strategy that significantly improves the reconstruction quality of tomographic images. This strategy is inspired by the basic relationship between linear algebra and inverse problems. We consider the SVCT inverse imaging problem as an under-determined system of linear equations. The total number of X-rays involved in the sinogram is equivalent to the number of independent linear equations (i.e., the rank of matrix $\mathbf{A}$ in Equation 1). Thus the number of free variables in the linear system increases substantially as the rank of $\mathbf{A}$ decreases for an SV sinogram. By introducing INR, the solution space of the image is efficiently constrained to a continuous space, inducing a satisfactory inverse CT reconstruction from a highly sparse sinogram. However, this reconstruction is one unstable solution among the infinitely many solutions that satisfy the acquired SV sinogram $\mathbf{y}_s$, and it can easily be affected by network over-fitting to the SV sinogram. To achieve a more stable solution, we propose the novel reprojection strategy, which builds a DV sinogram from this initial CT reconstruction. This process is equivalent to generating a higher-rank linear equation system for the inverse imaging task, which helps us find a more stable solution with far fewer free variables in the CT image. Our experimental results demonstrate that, through the reprojection strategy, we can further suppress image noise while preserving image details in the resulting CT images, which significantly improves reconstruction quality (at least +3 dB in PSNR). In addition, learning high-frequency signals via a simple MLP is practically very difficult due to the spectral bias problem [21, 36].
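The rank argument can be illustrated on a toy system. The sketch below (numpy only; the 8×8 image and the axis-aligned/diagonal "rays" are illustrative stand-ins for real CT projections, not the paper's geometry) shows that each added view direction raises the rank of the system matrix and thus removes free variables:

```python
import numpy as np

n = 8                       # toy image: n*n = 64 unknown pixels
N = n * n
idx = np.arange(N).reshape(n, n)

def view_rows(groups):
    """One linear equation (row of A) per ray: the sum of the pixels it crosses."""
    A = np.zeros((len(groups), N))
    for r, g in enumerate(groups):
        A[r, g] = 1.0
    return A

rows_h = view_rows([idx[k, :] for k in range(n)])                # horizontal view
rows_v = view_rows([idx[:, k] for k in range(n)])                # vertical view
rows_d = view_rows([np.diag(idx, k) for k in range(-n + 1, n)])  # diagonal view

A2 = np.vstack([rows_h, rows_v])          # 2 views: 16 equations
A3 = np.vstack([rows_h, rows_v, rows_d])  # 3 views: 31 more equations

r2 = np.linalg.matrix_rank(A2)   # 2n - 1 = 15 << 64 unknowns: under-determined
r3 = np.linalg.matrix_rank(A3)   # adding a view strictly raises the rank
```

More views mean a higher-rank system and fewer free variables; in SCOPE the reprojected DV sinogram plays the role of the added views.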
Existing INR-based methods mostly combine pre-defined encoding modules (e.g., Fourier encoding in [32]) with a deep MLP to learn the implicit function, which results in heavy computational cost. To accelerate model training, we integrate the recent hash encoding [19] into our SCOPE model, enabling a shallow (three-layer) MLP to achieve superior fitting ability (training in about 1 minute). We conduct extensive experiments on two public datasets (AAPM and COVID-19) for model evaluation. Both qualitative and quantitative results indicate that SCOPE outperforms the two most recent INR-based methods (CoIL and GRFF [32]) and two well-known supervised CNN-based models (FBPConvNet [9] and TF U-Net [7]). To the best of our knowledge, the proposed SCOPE is the first self-supervised method that outperforms supervised DL models for SVCT reconstruction. The main contributions of this work are summarized below:

- We propose SCOPE, which recovers a high-quality CT image from a single SV sinogram without involving any external data.
- We propose a simple and effective reprojection reconstruction strategy that significantly improves the resulting CT image quality.
- We integrate the hash encoding [19] into our SCOPE, which greatly accelerates model training and thus improves the model's practicability.
- We conduct extensive experiments, and the results indicate that our SCOPE outperforms two of the latest INR-based methods and two well-known supervised DL methods, both quantitatively and qualitatively.
II Methodology
II-A Model Overview
In the proposed SCOPE model, we represent the desired CT image as a continuous function parameterized by a neural network:

$f_\theta: \mathbf{p} = (x, y) \rightarrow v$  (2)

where $\theta$ denotes the trainable parameters (weights and biases) of the network, $\mathbf{p} = (x, y)$ is any 2D spatial coordinate in the imaging plane, and $v$ is the corresponding image intensity at position $\mathbf{p}$ in the image $\mathbf{x}$. Based on the acquired SV sinogram $\mathbf{y}_s$, we then optimize the network to approximate the implicit function using a back-propagation gradient descent algorithm to minimize the objective below:
$\theta^{\ast} = \arg\min_{\theta}\, \mathcal{L}(\hat{\mathbf{y}}_s, \mathbf{y}_s)$  (3)

where $\hat{\mathbf{y}}_s$ represents the predicted SV sinogram and $\mathcal{L}$ is the loss function that measures the discrepancy between the predicted SV sinogram $\hat{\mathbf{y}}_s$ and the acquired SV sinogram $\mathbf{y}_s$. The key insight behind Equation 3 is to use the image continuity prior imposed by the implicit function and the neural network architecture to regularize the inverse imaging problem of SVCT and thus obtain the desired solution. After network training, the optimal image is theoretically $\mathbf{x}^{\ast} = f_{\theta^{\ast}}$. However, because the inverse imaging problem is highly under-determined, the network tends to approach an implicit function that over-fits the SV sinogram and thus fails to approximate the desired implicit function well, which manifests as substantial noise in the resulting CT image.
To this end, we propose a reprojection reconstruction strategy, in which the learned function $f_{\theta^{\ast}}$ is used to generate a DV sinogram. The final high-quality CT image is then reconstructed by applying FBP [10] to this DV sinogram. An essential insight is that INR network over-fitting on the SV sinogram results in unexpected pixel intensity mutations in the reconstructed CT image. Figure 2 illustrates a toy example with different types of sample points in an SV reconstructed CT. For example, the black sample points are scanned by multiple X-rays and can thus be considered as constrained by multiple linear equations, so the INR network can accurately recover their image intensities through the constraints of the crossing projections. For the gray and white sample points, scanned by only a few or even no X-rays, the pixel intensities are not tightly constrained in the inverse problem. Their pixel intensities are mostly approximated via the image continuity prior imposed by the implicit function and are easily affected by over-fitting towards the sparse sinogram measurements. Therefore, the learned function may output pixel intensity mutations at these free-variable positions. Although such mutations look similar to image noise, they do not follow any typical distribution, so the benefit of inserting common denoising regularization terms is limited [38]. The most effective strategy to suppress free-variable mutations is thus to generate a higher-rank linear equation system that tightly constrains the pixel intensities in the CT image while sharing the same solution space as the SV sinogram. We therefore propose generating a DV sinogram from $f_{\theta^{\ast}}$. The workflow of the proposed SCOPE model is shown in Figure 1.
II-B Learning Implicit Function
Figure 1 A demonstrates the pipeline of learning the implicit function by a neural network. Given an SV sinogram $\mathbf{y}_s \in \mathbb{R}^{K \times M}$, where $K$ and $M$ are the number of projection views and the number of X-rays per view respectively, we first build a total of $K \times M$ X-rays from the $K$ sparse projection views (i.e., $M$ X-rays per view). Next, we feed the spatial coordinates $\mathbf{p}$ of the sample points along the SV X-rays into the implicit function to produce the corresponding image intensities $v$. Finally, we compute the predicted projection of each of the X-rays by a summation operator as below:

$\hat{\mathbf{y}}_s(\beta_i, t_j) = \sum_{\mathbf{p} \in \ell(\beta_i, t_j)} f_\theta(\mathbf{p})$  (4)

where $\beta_i$ ($i = 1, \dots, K$) are the sparse projection views, $t_j$ ($j = 1, \dots, M$) are the positions of the X-rays in the detector, and $\ell(\beta_i, t_j)$ denotes the set of sample points along the corresponding X-ray.
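As a sanity check on the summation operator, the sketch below (numpy/scipy only; the disk phantom, the unit sampling step, and bilinear interpolation are illustrative assumptions, not the paper's exact discretization) approximates one vertical ray integral by sampling points along the ray and summing the interpolated intensities:

```python
import numpy as np
from scipy.ndimage import map_coordinates

# Phantom: unit-intensity disk of radius 40 centered in a 128x128 image.
n, r = 128, 40
yy, xx = np.mgrid[0:n, 0:n]
image = ((yy - n / 2) ** 2 + (xx - n / 2) ** 2 <= r ** 2).astype(float)

def ray_projection(img, x_pos, step=1.0):
    """Discrete line integral of a vertical ray at detector position x_pos:
    sample points along the ray, interpolate intensities, and sum them up."""
    ys = np.arange(0, img.shape[0], step)
    xs = np.full_like(ys, x_pos, dtype=float)
    vals = map_coordinates(img, np.vstack([ys, xs]), order=1)  # bilinear
    return vals.sum() * step

# A ray through the disk center integrates to roughly the diameter (2r = 80).
center_proj = ray_projection(image, n / 2)
```

In SCOPE the sampled intensities come from the network $f_\theta$ instead of a stored image, which is what makes the projection differentiable with respect to $\theta$.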
Since the summation operator (Equation 4) is differentiable, the neural network used to parameterize the implicit function can be optimized by a back-propagation gradient descent algorithm that minimizes the loss between the predicted projections and the real projections from the SV sinogram $\mathbf{y}_s$. In this work, we employ a norm of the residual as the loss function, defined as below:

$\mathcal{L} = \frac{1}{K' M'} \sum_{i=1}^{K'} \sum_{j=1}^{M'} \big\| \hat{\mathbf{y}}_s(\beta_i, t_j) - \mathbf{y}_s(\beta_i, t_j) \big\|$  (5)

where $K'$ and $M'$ are respectively the number of sampled projection views and the number of sampled X-rays per view at each training iteration.
II-C Reprojection Reconstruction

Figure 1 B shows the workflow of the proposed reprojection reconstruction strategy, in which the learned implicit function is used to generate the DV sinogram, and the final high-quality CT image is then reconstructed from that DV sinogram. More specifically, we first build X-rays from dense projection views (i.e., $M$ X-rays per view over a dense set of view angles). Then, the spatial coordinates of the sample points along the DV X-rays are fed into the learned function $f_{\theta^{\ast}}$ to predict the corresponding image intensities. Similarly, the projections of the X-rays are calculated by the summation operator (Equation 4), and the DV sinogram is thus generated. Inspired by data consistency in accelerated MRI reconstruction [37], we combine the estimated DV sinogram with the acquired SV sinogram $\mathbf{y}_s$ to generate the final DV sinogram. In particular, we replace the projection profiles at the corresponding views of the DV sinogram with those of the SV sinogram $\mathbf{y}_s$. Finally, we apply FBP [10] to the final DV sinogram to reconstruct the artifact-free CT image.
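The data-consistency step reduces to array slicing once the dense view grid is fixed. A minimal numpy sketch (the 720/90 view counts and the even-interleaving assumption are illustrative; random arrays stand in for real sinograms):

```python
import numpy as np

n_dense, n_sparse, n_det = 720, 90, 128
rng = np.random.default_rng(0)

dv_sino = rng.random((n_dense, n_det))   # network-predicted DV sinogram
sv_sino = rng.random((n_sparse, n_det))  # acquired SV sinogram (measured)

# Assume the 90 measured views are evenly interleaved among the 720 dense
# views (every 8th view).
stride = n_dense // n_sparse
measured = np.arange(0, n_dense, stride)

# Data consistency: trust the real measurements wherever they exist.
final_sino = dv_sino.copy()
final_sino[measured] = sv_sino
```

Unmeasured views keep the predicted values; FBP (e.g., `skimage.transform.iradon`) is then applied to `final_sino` to obtain the final image.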
II-D Network Architecture

As shown in Figure 3, the network used for learning the implicit function consists of an encoding module (via hash encoding [19]) and a three-layer MLP. The network maps the input coordinate $\mathbf{p}$ to a feature vector $\gamma(\mathbf{p})$ and then converts the feature vector to the image intensity $v$. Formally, this process can be expressed as below:

$v = \mathrm{MLP}_{\theta_m}\big(\gamma_{\theta_e}(\mathbf{p})\big)$  (6)

where $\theta_m$ and $\theta_e$ represent respectively the trainable parameters of the MLP and of the hash encoding. They are optimized simultaneously to estimate the implicit function $f_\theta$.
II-D1 Hash Encoding

The universal approximation theorem [8] proved that a pure MLP can theoretically approximate any complicated function. However, fitting high-frequency signals with a pure MLP is practically very difficult due to the spectral bias problem [21, 36]. To alleviate this issue, many encoding strategies [19, 31, 32, 18] have been proposed to map low-dimensional inputs into high-dimensional feature vectors, which allows the subsequent MLP to capture high-frequency components easily and thus reduces approximation error. In SCOPE, we adopt the recent hash encoding. Unlike pre-defined encoding rules (e.g., position encoding [18]), hash encoding assigns a trainable feature to each input coordinate. This adaptive encoding strategy is task-specific, which makes it possible to use a shallow MLP while retaining powerful fitting ability. For a given coordinate grid, hash encoding first builds multi-resolution feature maps at $L$ levels. Each element of the feature map at the $i$-th level is a trainable feature vector of length $F$. Then, each feature map is mapped into a hash table of size $T$ to reduce the memory footprint. After the hash table construction, given an input coordinate $\mathbf{p}$, we compute its feature vector at the $i$-th level via bilinear interpolation. Then, we concatenate the $L$ per-level feature vectors to produce the final feature vector $\gamma(\mathbf{p})$. More details about the hash encoding can be found in [19]. Table II lists the hyper-parameters of the hash encoding used in our SCOPE model.

| Hyper-parameter | Symbol | Value |
|---|---|---|
| Number of levels | $L$ | — |
| Hash table size | $T$ | — |
| Number of feature dimensions per entry | $F$ | — |
| Coarsest resolution | $N_{\min}$ | — |
| Finest resolution | $N_{\max}$ | — |
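A simplified single-level 2D version of the encoding can be sketched as follows (numpy only; the hash primes follow [19], but the table size, feature width, and grid resolution here are illustrative, and a real implementation uses $L$ such levels with trainable features and gradients):

```python
import numpy as np

T, F, N = 2 ** 14, 2, 16              # table size, feature dim, grid resolution
rng = np.random.default_rng(0)
table = rng.normal(0, 1e-4, (T, F))   # trainable feature table (one level)

PRIMES = np.array([1, 2654435761], dtype=np.uint64)  # spatial-hash primes [19]

def hash_idx(ix, iy):
    """Spatial hash of an integer grid corner into the feature table."""
    h = (np.uint64(ix) * PRIMES[0]) ^ (np.uint64(iy) * PRIMES[1])
    return int(h % np.uint64(T))

def encode(p):
    """Bilinearly interpolate the features of the 4 surrounding grid corners."""
    x, y = p[0] * (N - 1), p[1] * (N - 1)   # scale a [0,1]^2 coord to the grid
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    tx, ty = x - x0, y - y0
    corners = [(x0, y0, (1 - tx) * (1 - ty)), (x0 + 1, y0, tx * (1 - ty)),
               (x0, y0 + 1, (1 - tx) * ty),   (x0 + 1, y0 + 1, tx * ty)]
    return sum(w * table[hash_idx(cx, cy)] for cx, cy, w in corners)

feat = encode((0.37, 0.82))   # F-dimensional feature for one 2D coordinate
```

Concatenating the outputs of $L$ such levels (each at a different resolution) yields the final feature vector of length $L \cdot F$ fed to the MLP.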
II-D2 Three-Layer MLP

After the hash encoding, the 2D input coordinate $\mathbf{p}$ is encoded into the high-dimensional feature vector $\gamma(\mathbf{p})$. Then, a three-layer MLP is used to convert the feature vector to the image intensity $v$. The two hidden layers of the MLP have 64 neurons each and are followed by ReLU activations, while the output layer is followed by a Sigmoid activation.
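For concreteness, the forward pass of such a head can be sketched in a few lines (numpy only; the input feature width of 32 is an illustrative placeholder for the encoder's output size, and the weights here are random rather than trained):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 32, 64   # assumed encoder output width, hidden width

# Three layers: feature -> 64 -> 64 -> 1 intensity.
W1, b1 = rng.normal(0, 0.1, (d_in, d_hidden)), np.zeros(d_hidden)
W2, b2 = rng.normal(0, 0.1, (d_hidden, d_hidden)), np.zeros(d_hidden)
W3, b3 = rng.normal(0, 0.1, (d_hidden, 1)), np.zeros(1)

relu = lambda z: np.maximum(z, 0.0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def mlp(features):
    """Two ReLU hidden layers of 64 neurons, Sigmoid output in (0, 1)."""
    h = relu(features @ W1 + b1)
    h = relu(h @ W2 + b2)
    return sigmoid(h @ W3 + b3)

v = mlp(rng.random((5, d_in)))   # intensities for a batch of 5 coordinates
```

The Sigmoid keeps predicted intensities in (0, 1), matching images normalized to that range.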
II-E Training Parameters

During training of the proposed SCOPE model, at each iteration we first randomly sample 3 views (i.e., $K' = 3$ in Equation 5) from the sparse projection views and then randomly sample 10 X-rays per view (i.e., $M' = 10$ in Equation 5). We adopt the Adam optimizer [12] to minimize the loss function. The initial learning rate decays by a factor of 0.5 every 500 epochs. The total number of training epochs is 5000, which takes only about 5 minutes on a single NVIDIA RTX 3060 GPU. It is worth noting that all the training parameters above are kept the same across cases, such as different types of X-ray beam and different numbers of input views.
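The step-decay schedule is straightforward to express; a minimal sketch (the initial rate of 1e-3 is an illustrative assumption, not the paper's value):

```python
def learning_rate(epoch, lr0=1e-3, decay=0.5, step=500):
    """Step decay: multiply the rate by `decay` every `step` epochs."""
    return lr0 * decay ** (epoch // step)

# Learning rate at a few points of the 5000-epoch schedule.
rates = [learning_rate(e) for e in (0, 499, 500, 4999)]
```

The rate stays constant within each 500-epoch window and halves at its boundary.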
III Experiments

To evaluate the proposed SCOPE model, we perform the following three experiments: (i) we investigate the effectiveness of the reprojection reconstruction strategy; (ii) we validate the effectiveness of the hash encoding [19]; (iii) we compare our SCOPE with five other reconstruction methods quantitatively and qualitatively.
| Function | Hyper-parameter | Value |
|---|---|---|
| radon | theta | — |
| iradon | theta | — |
| iradon | output_size | — |
| fanbeam | D | — |
| fanbeam | FanRotationIncrement | — |
| fanbeam | FanSensorSpacing | — |
| ifanbeam | D | — |
| ifanbeam | FanRotationIncrement | — |
| ifanbeam | FanSensorSpacing | — |
| ifanbeam | OutputSize | — |

The number of projection views and the size of the raw slice parameterize these values.
III-A Dataset & Preprocessing

III-A1 AAPM Dataset

The AAPM dataset used in our experiments is built from the normal-dose part of the 2016 low-dose CT challenge AAPM dataset^{1} (https://www.aapm.org/GrandChallenge/LowDoseCT/), which consists of twelve 3D CT volumes acquired from twelve subjects. Specifically, we extract 1171 2D slices from the 3D CT volumes in the axial view and then split these slices into three parts: 1069 slices from ten subjects as the training set, 98 slices from one subject as the validation set, and 4 slices from one subject as the test set. The training and validation sets are prepared only for optimizing the two supervised CNN-based baselines (FBPConvNet [9] and TF U-Net [7]), while the other methods (FBP [10], CoIL [31], GRFF [32], and our SCOPE) directly recover the corresponding high-quality CT image from a single SV sinogram.
III-A2 COVID-19 Dataset

The COVID-19 dataset [26] is a large-scale CT dataset consisting of 3D CT volumes from more than 1000 patients with confirmed COVID-19 infections. One 3D CT volume from the COVID-19 dataset is employed as additional test data: we select 4 slices from the volume in the axial view as 4 test samples.
III-A3 Dataset Simulation

For parallel and fan X-ray beam SVCT reconstruction, we follow the strategies in [9, 7, 28] to simulate pairs of low-quality and high-quality CT images. Specifically, we first generate sinograms of different view counts (720, 120, 90, and 60) by projecting the raw slices using the built-in MATLAB functions radon and fanbeam, respectively. Then, we transfer the sinograms back to CT images using the built-in MATLAB functions iradon and ifanbeam, respectively. Detailed hyper-parameters of the four functions are given in Table III. The images reconstructed from 720 views serve as Ground Truth (GT), while the images reconstructed from 120, 90, and 60 views serve as input images, corresponding to the three under-sampling factors ×6, ×8, and ×12. Note that parallel and fan X-ray beam SVCT are treated as two independent reconstruction tasks, so all training and test processes are conducted separately.
III-B Compared Methods & Evaluation Metrics

III-B1 Compared Methods

We compare the proposed SCOPE model with five SVCT reconstruction methods: (i) FBP [10], a classical analytical reconstruction algorithm; (ii) CoIL [31], an INR-based method; since the output of CoIL is the DV sinogram, we apply FBP to the generated DV sinogram to reconstruct the CT image; (iii) GRFF [32], an INR-based method with a Gaussian random Fourier feature encoding strategy; (iv) FBPConvNet [9], a supervised DL method based on U-Net [24]; (v) TF U-Net [7], a supervised DL method based on Tight Frame U-Net. We train FBPConvNet and TF U-Net on the training set of the AAPM dataset with the Adam optimizer [12] and a mini-batch size of 8. The learning rate gradually decreases over the training epochs. The total number of training epochs is set to 500, and the best model is saved via checkpointing during training. The two INR-based methods (CoIL and GRFF) are implemented following the original papers.
III-B2 Evaluation Metrics

To quantitatively measure the performance of the compared methods, we calculate Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) [35]. They are the two most widely used objective image-quality metrics in low-level vision tasks. PSNR is defined via pixel-by-pixel distance, while SSIM measures structural similarity using the means and variances of images. Moreover, we also compute LPIPS
[40], a DL-based objective perceptual similarity metric.

III-C Effectiveness of Reprojection Reconstruction
First, we investigate the effectiveness of the proposed reprojection strategy. After network training, we adopt the following two strategies to recover the final CT image: (i) No Reprojection: we directly feed all image coordinates into the MLP to produce the corresponding image intensities; (ii) Reprojection: we employ the MLP to generate DV sinograms (360, 480, 640, 720, and 1440 views) and then apply the FBP algorithm [10] to each DV sinogram to reconstruct the CT images.
Figure 4 shows the quantitative results on the COVID-19 dataset for parallel and fan X-ray beam SVCT of 60, 90, and 120 views. Overall, the reprojection reconstruction strategy significantly improves performance in all cases. For example, PSNR improves by about 3 dB for fan X-ray beam SVCT reconstruction with 60 views. More importantly, there is a common trend across all cases: the model performance gradually increases as the reprojection views increase from 360 to 720, but slightly decreases as they increase from 720 to 1440. Our explanation is: (i) projections from fewer than 720 views are not dense enough; although the highest-frequency intensity mutations are completely removed, image details at sub-high frequencies are also partially lost; (ii) projections from more than 720 views are over-dense, which results in incomplete removal of the highest-frequency intensity mutations and thus sub-optimal performance. Therefore, we set the number of reprojection views to 720 in this paper, though it is worth noting that this parameter may need to be adjusted for specific cases. Figure 5 demonstrates the qualitative results on a test sample for fan X-ray beam SVCT of 90 views. We observe that the image from the direct reconstruction (i.e., No Re.) contains substantial noise, while the results from our reprojection strategy are clean and closer to the GT images.
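Gains such as the ~3 dB above are measured with PSNR; a minimal sketch of the computation (assuming images normalized to [0, 1]):

```python
import numpy as np

def psnr(x, ref, data_range=1.0):
    """Peak Signal-to-Noise Ratio in dB for images scaled to [0, data_range]."""
    mse = np.mean((np.asarray(x, float) - np.asarray(ref, float)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(data_range ** 2 / mse)

ref = np.zeros((8, 8))
noisy = ref + 0.1               # uniform error of 0.1 -> MSE = 0.01
value = psnr(noisy, ref)        # 10 * log10(1 / 0.01) = 20.0 dB
```

Because PSNR is logarithmic in MSE, a +3 dB gain corresponds to roughly halving the mean squared error.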
III-D Effectiveness of Hash Encoding

Next, we validate the effectiveness of the hash encoding [19]. The proposed SCOPE model is compared with three different encoding configurations: (i) No Encoding, a pure nine-layer MLP without any encoding module; (ii) Position Encoding, a nine-layer MLP with the position encoding [18]; (iii) Hash Encoding, a three-layer MLP with the hash encoding.
| X-ray | Views | No En. | Pos. En. | Ha. En. |
|---|---|---|---|---|
| Parallel | 60 / 90 / 120 | — | — | — |
| Fan | 60 / 90 / 120 | — | — | — |
Table IV shows the quantitative results on the COVID-19 dataset for parallel and fan X-ray beam SVCT of 60, 90, and 120 views. From the results, we see that, compared with no encoding, both the position encoding and the hash encoding significantly improve the model performance in terms of all three metrics in all cases. For example, PSNR improves by 13.59 dB (39.56 vs. 25.97) and 15.79 dB (41.76 vs. 25.97), respectively, for fan X-ray beam SVCT reconstruction of 90 views. This is due to the spectral bias problem [21, 36] (i.e., a pure MLP is biased toward learning low-frequency signals during training); encoding modules are thus critical for improving the MLP's ability to learn high-frequency signals. Besides, we observe that the hash encoding slightly outperforms the position encoding in most cases. For instance, PSNR improves by 1.66 dB (37.93 vs. 36.27) for fan X-ray beam SVCT reconstruction of 60 views. Figure 6 shows the qualitative results on a test sample (95) for fan X-ray beam SVCT reconstruction of 90 views. Overall, the hash encoding achieves the best image quality and the fastest reconstruction speed. Benefiting from the shallower MLP (3 layers vs. 9), the hash encoding takes only about 1 minute to reach the performance for which the position encoding needs about 12 minutes, i.e., roughly a 12× acceleration. We also show the performance curves of the SCOPE model with the three encoding modules over the training epochs in Figure 7. Clearly, the hash encoding produces the best performance.
Table V:
| Dataset | Views | FBP | CoIL | GRFF | FBPConvNet | TF U-Net | SCOPE (Ours) |
|---|---|---|---|---|---|---|---|
| AAPM | 60 / 90 / 120 | — | — | — | — | — | — |
| COVID-19 | 60 / 90 / 120 | — | — | — | — | — | — |

Table VI:
| Dataset | Views | FBP | CoIL | GRFF | FBPConvNet | TF U-Net | SCOPE (Ours) |
|---|---|---|---|---|---|---|---|
| AAPM | 60 / 90 / 120 | — | — | — | — | — | — |
| COVID-19 | 60 / 90 / 120 | — | — | — | — | — | — |
III-E Comparison with Other Methods

We compare the proposed SCOPE model with the five baselines on the AAPM and COVID-19 datasets for parallel and fan X-ray beam SVCT reconstruction. Since FBPConvNet [9] and TF U-Net [7] are supervised DL methods, we train them on the training set of the AAPM dataset. The other four methods (FBP [10], CoIL [31], GRFF [32], and our SCOPE model) are image-specific and thus directly reconstruct the corresponding high-quality CT image from each SV sinogram. Note that parallel and fan X-ray beam SVCT are treated as two independent reconstruction tasks, so all training and test processes are conducted separately.
III-E1 Parallel X-ray Beam SVCT

Table V shows the quantitative results of the compared methods on the two datasets for parallel X-ray beam SVCT of 60, 90, and 120 views. On the AAPM dataset, our SCOPE produces the best performance in most cases. Compared with the two supervised DL methods (FBPConvNet [9] and TF U-Net [7]), SCOPE also obtains minor performance improvements. For instance, PSNR improves by 0.27 dB (42.18 vs. 41.95) and 0.44 dB (42.18 vs. 41.74), respectively, with 90 input views. On the COVID-19 dataset, however, we observe that FBPConvNet and TF U-Net suffer severe performance drops. This is mainly due to the domain shift problem (i.e., the training and test data do not share the same distribution). In comparison, our SCOPE model still produces excellent reconstruction results on the COVID-19 data because it is image-specific. For example, the difference in PSNR between SCOPE and FBPConvNet is up to +3.92 dB (40.57 vs. 36.65) with 90 input views. Figures 8 and 9 show the qualitative results on two test samples (109 and 90) from the two datasets for parallel X-ray beam SVCT of 90 views. On test sample 109 from the AAPM dataset, neither FBP [10] nor CoIL [31] produces satisfactory results, which still include many streaking artifacts. GRFF [32] yields an overly smooth result that loses some image details. In comparison, FBPConvNet, TF U-Net, and SCOPE all recover desirable images that are hardly distinguishable from the GT image. On test sample 90 from the COVID-19 dataset, the two supervised models obtain sub-optimal results with moderate streaking artifacts, while our SCOPE model still produces a high-quality image that is closest to the GT image.
III-E2 Fan X-ray Beam SVCT

Table VI reports the quantitative results of the compared methods on the two datasets for fan X-ray beam SVCT of 60, 90, and 120 views. We observe that the proposed SCOPE and GRFF [32] respectively produce the best and second-best performance in terms of all three metrics in all cases. For example, on the AAPM dataset with 90 input views, SCOPE and GRFF respectively achieve 40.92 dB and 37.54 dB PSNR, while TF U-Net [7] obtains only 32.47 dB. It is notable that FBPConvNet [9] and TF U-Net cannot produce satisfactory performance on the AAPM dataset even though they are trained on it. We conjecture that, for learning an end-to-end mapping as in supervised DL methods, fan X-ray beam CT is a more difficult task than parallel X-ray beam CT given the same number of input views. In our experiments, for sinograms with the same number of projection views, the fan X-ray CT results include more severe streaking artifacts than the parallel X-ray CT results after applying the FBP algorithm [10], and FBPConvNet [9] and TF U-Net [7] directly learn the inverse mapping from such artifact-corrupted inputs to artifact-free outputs. Therefore, they are not expected to perform as well for fan X-ray CT as for parallel X-ray CT. In contrast, GRFF [32] and SCOPE train a neural network to learn the implicit function of the unknown CT image by computing the loss on the SV sinogram (i.e., they do not manipulate image information directly). Thus, they both work well for different types of X-ray beams. Figures 10 and 11 show the qualitative results on two test samples (104 and 95) from the two datasets for fan X-ray beam SVCT reconstruction of 90 views. We see that the four compared methods fail to recover good results: the results of the FBP algorithm [10] and CoIL [31] include severe streaking artifacts, FBPConvNet [9] and TF U-Net [7] produce overly smooth results, and GRFF [32] obtains the second-best results, which still lose some image details. Only the proposed SCOPE removes the streaking artifacts substantially while preserving fine image details.
IV Conclusion

In this work, we propose SCOPE, a self-supervised INR-based method for SVCT reconstruction. Like previous INR works [32], SCOPE represents the desired CT image as an implicit continuous function and trains a neural network to learn the implicit function by minimizing prediction errors on the acquired SV sinogram. Benefiting from the image continuity prior imposed by the implicit function and the neural network architecture, the function can be estimated. However, the solution is not optimal due to the over-fitting problem. To this end, we propose a simple and effective reprojection strategy that greatly improves the resulting CT image quality. Besides, we adopt the recent hash encoding [19] in SCOPE to greatly accelerate model training. Experimental results on two publicly available datasets indicate that the proposed SCOPE model is not only superior to two of the latest INR-based methods but also outperforms two well-known supervised CNN-based methods, both qualitatively and quantitatively.
References
 [1] (2007) Computed tomography—an increasing source of radiation exposure. New England Journal of Medicine 357 (22), pp. 2277–2284. Cited by: §I.
 [2] (2017) Low-dose CT with a residual encoder-decoder convolutional neural network. IEEE Transactions on Medical Imaging 36 (12), pp. 2524–2535. Cited by: §I.
 [3] (2021) Learning continuous image representation with local implicit image function. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8628–8638. Cited by: §I.
 [4] (2019) Learning implicit fields for generative shape modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5939–5948. Cited by: §I.
 [5] (2021) Learnable multi-scale Fourier interpolation for sparse view CT image reconstruction. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 286–295. Cited by: §I.
 [6] (2008) Cancer risks from diagnostic radiology. The British Journal of Radiology 81 (965), pp. 362–378. Cited by: §I.
 [7] (2018) Framing U-Net via deep convolutional framelets: application to sparse-view CT. IEEE Transactions on Medical Imaging 37 (6), pp. 1418–1429. Cited by: §I, §I, §IIIA1, §IIIA3, §IIIB1, §IIIE1, §IIIE2, §IIIE.
 [8] (1989) Multilayer feedforward networks are universal approximators. Neural Networks 2 (5), pp. 359–366. Cited by: §IID1.
 [9] (2017) Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing 26 (9), pp. 4509–4522. Cited by: §I, §I, §IIIA1, §IIIA3, §IIIB1, §IIIE1, §IIIE2, §IIIE.
 [10] (2001) Principles of computerized tomographic imaging. SIAM. Cited by: §I, §I, §IIA, §IIC, §IIIA1, §IIIB1, §IIIC, §IIIE1, §IIIE2, §IIIE.
 [11] (2014) Sparse-view spectral CT reconstruction using spectral patch-based low-rank penalty. IEEE Transactions on Medical Imaging 34 (3), pp. 748–760. Cited by: §I.
 [12] (2015) Adam: a method for stochastic optimization. CoRR abs/1412.6980. Cited by: §IIE, §IIIB1.
 [13] (2018) Deep-neural-network-based sinogram synthesis for sparse-view CT image reconstruction. IEEE Transactions on Radiation and Plasma Medical Sciences 3 (2), pp. 109–119. Cited by: §I.
 [14] (2019) Learning to reconstruct computed tomography images directly from sinogram data under a variety of data acquisition conditions. IEEE Transactions on Medical Imaging 38 (10), pp. 2469–2481. Cited by: §I.
 [15] (2021) Zero-shot learning of continuous 3D refractive index maps from discrete intensity-only measurements. arXiv preprint arXiv:2112.00002. Cited by: §I.
 [16] (2020) Diagnosis of the coronavirus disease (COVID-19): rRT-PCR or CT?. European Journal of Radiology 126, pp. 108961. Cited by: §I.
 [17] (2019) Occupancy networks: learning 3D reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4460–4470. Cited by: §I.
 [18] (2020) NeRF: representing scenes as neural radiance fields for view synthesis. In European Conference on Computer Vision, pp. 405–421. Cited by: §I, §IID1, §IIID.
 [19] (2022) Instant neural graphics primitives with a multiresolution hash encoding. arXiv preprint arXiv:2201.05989. Cited by: item 3, §I, Fig. 3, §IID1, §IID, TABLE II, §IIID, §III, §IV.
 [20] (2019) DeepSDF: learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 165–174. Cited by: §I.
 [21] (2019) On the spectral bias of neural networks. In International Conference on Machine Learning, pp. 5301–5310. Cited by: §I, §IID1, §IIID.
 [22] (2021) DeRF: decomposed radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14153–14161. Cited by: §I.
 [23] (2021) Dynamic CT reconstruction from limited views with implicit neural representations and parametric motion fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2258–2268. Cited by: §I.
 [24] (2015) U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Cited by: §I, §IIIB1.
 [25] (1992) Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena 60 (1–4), pp. 259–268. Cited by: §I.
 [26] (2021) COVID-19-CT-dataset: an open-access chest CT image repository of 1000+ patients with confirmed COVID-19 diagnosis. BMC Research Notes 14 (1), pp. 1–3. Cited by: §IIIA2.
 [27] (2022) NeRP: implicit neural representation learning with prior embedding for sparsely sampled image reconstruction. IEEE Transactions on Neural Networks and Learning Systems, pp. 1–13. Cited by: TABLE I, §I.
 [28] (2019) R-Net: recurrent and recursive network for sparse-view CT artifacts removal. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 319–327. Cited by: §I, §IIIA3.
 [29] (2006) Accurate image reconstruction from few-views and limited-angle data in divergent-beam CT. Journal of X-ray Science and Technology 14 (2), pp. 119–139. Cited by: §I.
 [30] (2008) Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization. Physics in Medicine & Biology 53 (17), pp. 4777. Cited by: §I.
 [31] (2021) CoIL: coordinate-based internal learning for tomographic imaging. IEEE Transactions on Computational Imaging 7, pp. 1400–1412. Cited by: TABLE I, §I, §I, §IID1, §IIIA1, §IIIB1, §IIIE1, §IIIE2, §IIIE.
 [32] (2020) Fourier features let networks learn high frequency functions in low dimensional domains. Advances in Neural Information Processing Systems 33, pp. 7537–7547. Cited by: TABLE I, §I, §I, §I, §IID1, §IIIA1, §IIIB1, §IIIE1, §IIIE2, §IIIE, §IV.
 [33] (2021) Joint implicit image function for guided depth super-resolution. In Proceedings of the 29th ACM International Conference on Multimedia, pp. 4390–4399. Cited by: §I.
 [34] (2008) An outlook on X-ray CT research and development. Medical Physics 35 (3), pp. 1051–1064. Cited by: §I.
 [35] (2004) Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612. Cited by: §IIIB2.
 [36] (2019) Frequency principle: Fourier analysis sheds light on deep neural networks. arXiv preprint arXiv:1901.06523. Cited by: §I, §IID1, §IIID.
 [37] (2020) Self-supervised learning of physics-guided reconstruction neural networks without fully sampled reference data. Magnetic Resonance in Medicine 84 (6), pp. 3172–3191. Cited by: §IIC.
 [38] (2021) IntraTomo: self-supervised learning-based tomography via sinogram synthesis and prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1960–1970. Cited by: TABLE I, §I, §I, §IIA.
 [39] (2020) NeRF++: analyzing and improving neural radiance fields. arXiv preprint arXiv:2010.07492. Cited by: §I.
 [40] (2018) The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Cited by: §IIIB2.
 [41] (2018) A sparse-view CT reconstruction method based on combination of DenseNet and deconvolution. IEEE Transactions on Medical Imaging 37 (6), pp. 1407–1417. Cited by: §I.