I Introduction
Improvements in imaging sensors and computing power have made it possible to solve increasingly difficult reconstruction problems. In particular, the dimensionality of reconstruction problems has increased from the traditional 2D and 3D problems representing space to more difficult 4D or even 5D problems representing space-time and, for example, heart or respiratory phase [1, 2].
These higher-dimensional reconstruction problems pose surprisingly difficult challenges, both computationally and, perhaps more importantly, in terms of algorithmic design and training, due to the curse of dimensionality [3]. However, the high dimensionality of the reconstruction also presents important opportunities to improve reconstruction quality by exploiting regularity. In particular, it has been shown that the temporal resolution of 4D CT imaging can be increased by an order of magnitude by exploiting the time-space regularity of the objects being imaged [2, 4, 5, 6]. These approaches use model-based iterative reconstruction (MBIR) [7, 8] to enforce regularity using simple space-time prior models.
Recently, it has been demonstrated that Plug-and-Play (P&P) priors [9, 10, 11, 12] can dramatically improve reconstruction quality by enabling the use of state-of-the-art denoisers as prior models in MBIR. Consequently, P&P has great potential to improve reconstruction quality in 4D CT imaging problems. However, state-of-the-art denoisers such as deep convolutional neural networks (CNNs) and BM4D are primarily available for 2D and sometimes 3D images, and it is difficult to extend them to higher dimensions [13, 14, 3]. In particular, extending CNNs to 4D requires computationally and memory intensive 4D convolutions applied to 5D feature tensors. This problem is further compounded by the lack of GPU-accelerated routines for 4D convolution in major deep-learning frameworks such as TensorFlow, Keras, and PyTorch; currently, only 1D, 2D, and 3D convolutions are supported with GPU acceleration. Furthermore, 4D CNNs require 4D ground truth data for training, which may be difficult or impossible to obtain.
In this paper, we present a novel 4D X-ray CT reconstruction algorithm that combines multiple low-dimensional CNN denoisers to implement a highly effective 4D prior model. Our approach, which we call Multi-Slice Fusion, integrates the multiple low-dimensional priors using multi-agent consensus equilibrium (MACE) [15]. MACE is an extension of the P&P framework that formulates the inversion problem using an equilibrium equation, as opposed to an optimization, and allows for the use of multiple prior models and agents.
Figure 1 illustrates the basic concept of our approach. Multi-Slice Fusion integrates three distinct denoisers, each of which is designed to remove noise along lower-dimensional slices of the 4D object. When MACE fuses the denoisers, it simultaneously enforces the constraints of each denoising agent, so the reconstruction is constrained to be smooth in all four dimensions. Consequently, Multi-Slice Fusion produces high-quality reconstructions that are practical to train and compute even when the dimensionality of the reconstruction is high. In our implementation, one MACE agent performs the cone-beam tomographic inversion, while the remaining three agents are CNN denoisers trained to remove additive white Gaussian noise (AWGN) along 2.5D slices of the reconstruction [3, 16, 17].
The MACE solution may be computed using a variety of algorithms, including variants of the Plug-and-Play algorithm based on ADMM or other approaches [18, 19, 10, 9]. We implement our method for the problem of 4D cone-beam X-ray CT reconstruction [20] of moving parts, and the MACE solution is computed on a heterogeneous cluster in which different agent updates are computed on different cluster nodes. In particular, the algorithm runs in parallel on heterogeneous cluster nodes, with the inversion portion running on CPU nodes and the prior portion running on GPU nodes.
Experimental results on both simulated data and real data from a North Star Imaging X50 X-ray cone-beam scanner indicate that Multi-Slice Fusion reconstruction using CNN priors can substantially improve reconstructed image quality compared to MBIR reconstruction using traditional 4D MRF priors.
II Problem Formulation
In X-ray CT imaging, an object is rotated and several 2D projections (radiographs) of the object are measured at different angles. The problem is then to reconstruct the 4D array of X-ray attenuation coefficients from these measurements, where three dimensions correspond to space and the fourth dimension corresponds to time.
Let $N_t$ be the number of time-points and $N_s$ be the number of voxels at each time-point of the 4D image. For each time-point, $t \in \{1, \dots, N_t\}$, define $y_t$ to be the vector of sinogram measurements at time $t$, and $x_t \in \mathbb{R}^{N_s}$ to be the 3D volume of X-ray attenuation coefficients for that time-point. Let us stack all the measurements to form a vector $y = [y_1^\top, \dots, y_{N_t}^\top]^\top$ and stack all the 3D volumes to form a vector $x = [x_1^\top, \dots, x_{N_t}^\top]^\top$. The 4D reconstruction problem then becomes the task of recovering the 4D image of attenuation coefficients, $x$, from the series of sinogram measurements, $y$.
The image can be reconstructed as a maximum a posteriori (MAP) estimate as

$$ \hat{x} = \arg\min_x \left\{ f(x) + h(x) \right\} \quad (1) $$

where $f(x)$ is the data-fidelity or negative log-likelihood term and $h(x)$ is the 4D regularizer or prior model for the image, $x$. The data-fidelity term of the 4D image, $f(x)$, can be written in a separable fashion as

$$ f(x) = \sum_{t=1}^{N_t} f_t(x_t) \quad (2) $$

where

$$ f_t(x_t) = \frac{1}{2\sigma^2} \left\| y_t - A_t x_t \right\|^2_{\Lambda_t} \quad (3) $$

is the data-fidelity term for time $t$, where $A_t$ is the system matrix, $\Lambda_t$ is the weight matrix, and $\sigma$ is a normalizing weight scalar [20]. The values in the diagonal weight matrix, $\Lambda_t$, account for the non-uniform noise variance of each measurement.
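As a concrete illustration of this separable structure, the following NumPy sketch evaluates the per-time-point weighted least-squares term and sums it over time-points. The function names, the representation of the diagonal weight matrix as a 1D vector of per-measurement weights, and the default value of sigma are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def data_fidelity(y_t, A_t, x_t, Lambda_t, sigma=1.0):
    """Weighted least-squares data-fidelity term for one time-point (equation (3)).

    Lambda_t is the diagonal weight matrix, passed here as a 1D vector of
    per-measurement weights; sigma is the normalizing scalar.
    """
    r = y_t - A_t @ x_t                       # sinogram residual at time t
    return float(r @ (Lambda_t * r)) / (2.0 * sigma**2)

def total_fidelity(ys, As, xs, Lambdas, sigma=1.0):
    """Separable 4D data fidelity: sum of per-time-point terms (equation (2))."""
    return sum(data_fidelity(y, A, x, L, sigma)
               for y, A, x, L in zip(ys, As, xs, Lambdas))
```

Because the sum decouples over time-points, each term can be evaluated (and minimized) in parallel, which is what enables the distributed cone-beam inversion described later.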
If the prior model, $h(x)$, can be expressed analytically, as with the 4D Markov Random Field (MRF) in [2], then the expression in Equation (1) can be minimized iteratively. However, in practice it can be difficult to represent an advanced prior model in the form of a simple cost function that can be minimized. Consequently, P&P methods were created as a way to represent prior models as denoising operations [9, 10]. More recently, P&P methods have been generalized to the consensus equilibrium framework as a way to integrate multiple criteria [15].
III Fusing Denoisers via Consensus Equilibrium
We use the MACE framework to fuse multiple denoisers to form a prior model for reconstruction. This allows us to construct a prior model from denoisers that are not optimization-based [15].
Let $F_i$ for $i \in \{1, \dots, N-1\}$ be a lower-dimensional denoiser that denoises a full 4D volume by processing the volume "slice by slice". Furthermore, each denoiser, $F_i$, is trained to remove additive white Gaussian noise (AWGN) of standard deviation $\sigma_n$ from the image. The denoisers serve as prior models even though a corresponding regularization term, $h_i(x)$, does not necessarily exist. The design of these denoisers is described in more detail in Section IV. Each of the denoisers is a separate agent in the MACE framework.
The data-fidelity is enforced using a proximal map operation, $F_0$, as another agent. The operator $F_0$ is defined as the proximal map of $f$:

$$ F_0(x) = \arg\min_{z} \left\{ f(z) + \frac{\| z - x \|^2}{2 \beta \sigma_n^2} \right\} \quad (4) $$

where $f$ is the same as above and the unitless parameter $\beta$ is used to vary the relative weight of the operator $F_0$ with respect to the denoisers; in turn, this adjusts the amount of regularization in the reconstruction.
The operator $F_0$ enforces fidelity to the measurements, while each of the denoisers $F_i$, $i \in \{1, \dots, N-1\}$, enforces regularity of the image in orthogonal image planes. Our method imposes a consensus between the operators $F_0, F_1, \dots, F_{N-1}$ to achieve a balanced reconstruction that lies at the intersection of the solution spaces of the measurement model and each of the prior models.
In order to formally state the consensus equilibrium criteria and present the algorithm for computing the solution, we will need to introduce some additional notation. First, define the tall column vector $\mathbf{w} = [w_0^\top, \dots, w_{N-1}^\top]^\top$ as the concatenation of $N$ state vectors, where each state vector $w_i$ represents the input to one of the agents. Furthermore, let $\mathbf{F}$ denote the operator formed by the application of all agents on their corresponding state vectors. More precisely, we define the agent operator, $\mathbf{F}$, as

$$ \mathbf{F}(\mathbf{w}) = \begin{bmatrix} F_0(w_0) \\ \vdots \\ F_{N-1}(w_{N-1}) \end{bmatrix} \quad (5) $$

So $\mathbf{F}$ simply denotes the parallel application of each agent to its corresponding state vector.
Next we define the averaging operator, $\mathbf{G}$, as

$$ \mathbf{G}(\mathbf{w}) = \begin{bmatrix} \bar{\mathbf{w}} \\ \vdots \\ \bar{\mathbf{w}} \end{bmatrix} \quad (6) $$

where $\bar{\mathbf{w}}$ denotes the weighted average of the state vectors for each agent given by

$$ \bar{\mathbf{w}} = \frac{1}{2} w_0 + \frac{1}{2(N-1)} \sum_{i=1}^{N-1} w_i \quad (7) $$

Notice that we choose the weighting so that there is equal weighting of the data-fidelity (i.e., agent $F_0$) and the regularization (i.e., agents $F_1$ to $F_{N-1}$), irrespective of how many denoisers we use to form our prior model.
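The stacked agent operator and the weighted averaging operator can be sketched in a few lines of NumPy. This is a hypothetical illustration of equations (5)-(7), with the state vectors stored as rows of a 2D array; the function names are not from the paper.

```python
import numpy as np

def apply_agents(agents, W):
    """Parallel application of each agent to its own state vector (equation (5)).

    `agents` is a list of N callables; `W` is an (N, n) array whose row i is
    the state vector w_i (here a flattened volume).
    """
    return np.stack([F(w) for F, w in zip(agents, W)])

def average_states(W):
    """Weighted averaging operator (equations (6)-(7)).

    The data-fidelity agent (row 0) gets weight 1/2 and the N-1 denoisers
    share the remaining 1/2, so the amount of regularization does not depend
    on how many denoisers form the prior.
    """
    N = W.shape[0]
    w_bar = 0.5 * W[0] + (0.5 / (N - 1)) * W[1:].sum(axis=0)
    return np.tile(w_bar, (N, 1))
```

Every row of the output of `average_states` is the same weighted average, mirroring how every agent receives the consensus state in equation (6).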
Using the above notation, it can be shown [15, 21] that when the agents are at consensus, the equilibrium equation

$$ \mathbf{F}(\mathbf{w}^*) = \mathbf{G}(\mathbf{w}^*) \quad (8) $$

holds. The equilibrium condition of equation (8) implies that at equilibrium each agent, despite receiving a different input, produces the same output. This output is then the consensus solution, representing a balance point between the forces of each agent.
From equation (8) it follows that the consensus solution is a fixed point of the map $\mathbf{T} = (2\mathbf{G} - I)(2\mathbf{F} - I)$ and can be found efficiently using partial update Mann iterations [15, 21, 22]. Partial update Mann iterations use approximate updates for the proximal map operator, $F_0$. We define $\tilde{\mathbf{F}}$ to be the corresponding partial update approximation of $\mathbf{F}$, giving the iteration

$$ \mathbf{w} \leftarrow \rho \, (2\mathbf{G} - I)(2\tilde{\mathbf{F}} - I)(\mathbf{w}) + (1 - \rho) \, \mathbf{w} \quad (9) $$

where $\tilde{F}_0$ is the same as $F_0$ but uses only an approximate proximal mapping formed by a single update of an iterative optimization method initialized with the current state. That is, the iterative optimization for the operator in equation (4) is initialized with the current reconstruction and terminated early.
The partial update Mann iterations for fusing the denoisers can be simplified to form Algorithm 1. The weighting parameter $\rho$ is used to control the speed of convergence. For faster convergence, the initial reconstruction should be close to the final solution; the output of a fast conventional algorithm can be used for this purpose. We continue the partial update Mann iterations until the differences between the state vectors fall below a fixed threshold. We use three passes of Iterative Coordinate Descent (ICD) to compute the approximate proximal mapping, $\tilde{F}_0$.
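To make the iteration concrete, here is a toy NumPy sketch of Mann iterations that fuse an exact proximal map with a single stand-in smoothing "denoiser" on a 1D signal under an identity forward model. All names and parameter values (beta, rho, the smoothing filter) are illustrative assumptions, and unlike the paper the proximal map is computed exactly rather than by a partial ICD update.

```python
import numpy as np

# Toy 1D consensus problem: identity forward model, flat ground truth.
rng = np.random.default_rng(0)
y = 1.0 + 0.3 * rng.standard_normal(32)       # noisy "measurements"
beta = 1.0                                    # regularization weight (assumed)

def F0(w):
    # Exact proximal map of f(z) = ||z - y||^2 / 2 for the identity model;
    # in the paper this step is only approximated (partial update via ICD).
    return (beta * y + w) / (beta + 1.0)

def F1(w):
    # Stand-in "denoiser": a [1, 2, 1]/4 smoothing filter, edge replication.
    p = np.pad(w, 1, mode='edge')
    return (p[:-2] + 2.0 * p[1:-1] + p[2:]) / 4.0

agents = [F0, F1]
W = np.stack([y, y])                          # one state vector per agent
rho = 0.8                                     # Mann weighting parameter

for _ in range(200):
    FW = np.stack([F(w) for F, w in zip(agents, W)])   # F(w)
    V = 2.0 * FW - W                                   # (2F - I)(w)
    GV = np.tile(V.mean(axis=0), (len(agents), 1))     # G; with one denoiser the
                                                       # half/half weighting is a plain mean
    W = rho * (2.0 * GV - V) + (1.0 - rho) * W         # Mann update, equation (9)

x_hat = F0(W[0])                              # consensus reconstruction
```

At convergence the two agents agree, i.e. F0(w0) equals F1(w1), which is exactly the equilibrium condition of equation (8) for this two-agent case.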
IV CNN Space-Time Denoiser
Since we iteratively apply multiple denoisers to multiple 4D volumes to compute the reconstruction, it is essential for each denoiser to be fast. We use deep convolutional neural networks since they provide state-of-the-art denoising quality while also being fast, thanks to optimized GPU libraries.
The architecture of the CNN denoiser is shown in Figure 2. Each of the CNN denoisers used in Multi-Slice Fusion is designed to perform 2.5D denoising of a 3D volume using a 2D moving window. By 2.5D we mean that the neural network takes as input a thin 3D patch consisting of five consecutive 2D slices, while the output is the denoised center slice. The remaining slices are denoised by shifting the 5-slice moving window.
The 2D slices are processed using 2D convolution, with the adjacent slices treated as separate input channels. The neural network architecture is modeled after the residual network architecture presented in [17], but extended to work on 3D rather than 2D images. In [3, 16] it has been shown that this type of 2.5D processing is a computationally efficient way to perform effective 3D denoising with CNNs. We note that it is similar to the method presented in [17], where the 5 adjacent slices are analogous to the 3 color channels used in the denoising of color images.
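The 2.5D sliding-window scheme can be sketched independently of any particular network. Below, the stand-in `denoise_center` plays the role of the CNN, mapping five adjacent slices (treated as channels) to one denoised center slice; the boundary handling by edge replication is an assumption, since the paper does not specify its padding.

```python
import numpy as np

def denoise_volume_2p5d(volume, denoise_center, half_window=2):
    """Denoise a 3D volume slice by slice along axis 0 with a 2.5D model.

    `denoise_center` maps a (5, H, W) stack of adjacent slices (the input
    channels) to the denoised (H, W) center slice. Boundary slices are
    handled by replicating the edge slice (an assumption).
    """
    T = volume.shape[0]
    out = np.empty_like(volume)
    for t in range(T):
        # Indices of the 5-slice window centered at t, clipped at the edges.
        idx = np.clip(np.arange(t - half_window, t + half_window + 1), 0, T - 1)
        out[t] = denoise_center(volume[idx])
    return out

# Trivial stand-in "CNN": average the five input channels.
mean_denoiser = lambda stack: stack.mean(axis=0)
```

Sliding the window one slice at a time means every output slice sees two slices of context on each side, which is what gives the 2.5D model its 3D awareness at 2D cost.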
Our 3D CNN denoiser could in principle be trained with space-time data to obtain different time and space regularization. However, in the absence of space-time training data, we trained the network on 3D spatial volumes. For training, we used a low-noise CT reconstruction of a bottle and screw cap made from different plastics. A slice from the training volume is illustrated in Figure 3. While our training and testing volumes are different, all of them are representative of a variety of NDE problems in which the objects to be imaged are constructed from a relatively small number of distinct materials.
We extracted partially overlapping 3D patches from our training volume. Random rotation, flipping, and mirroring were applied during training as data augmentation. Pseudo-random additive white Gaussian noise (AWGN) was applied to the patches, and the network was trained to remove this noise. The AWGN does not represent CT measurement noise; rather, it arises from the use of the MACE framework. More specifically, the AWGN that the denoisers remove in MACE is an 'artificial' noise induced by the quadratic norm used in equation (4).
V 4D Prior as Multi-Slice Fusion
In this section we describe how we fuse the space-time denoisers described in Section IV using the approach in Section III to form a 4D reconstruction.
Although the approach described in Section III can fuse an arbitrary number of denoisers, we fuse three 3D CNN denoisers to form our 4D prior. We refer to the denoisers in equation (5) as $F_{xy,t}$, $F_{yz,t}$, and $F_{zx,t}$, respectively. The denoiser $F_{xy,t}$ is a CNN space-time denoiser that performs convolution in the xy plane and uses five input channels to take in slices from neighboring time-points. The denoisers $F_{yz,t}$ and $F_{zx,t}$ are analogous to $F_{xy,t}$ but are applied along the yz and zx planes, respectively. Each of the denoisers, despite being a 3D denoiser, denoises the 4D volume by processing it slice by slice.
Each denoiser incorporates information along one spatial slice orientation and uses a moving temporal window of 5 slices to denoise the single slice at its center. Multi-Slice Fusion, however, fuses the denoisers $F_{xy,t}$, $F_{yz,t}$, and $F_{zx,t}$ to incorporate information along all four dimensions.
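The three plane orientations can be obtained from a single slice-wise denoiser by permuting axes of the 4D array. The sketch below assumes a (t, z, y, x) axis layout and illustrative plane names; `denoiser_3d` stands in for one of the CNN space-time denoisers, mapping a (t, H, W) sub-volume to its denoised version.

```python
import numpy as np

def apply_along_plane(x4d, denoiser_3d, plane):
    """Apply a slice-wise 3D space-time denoiser along a chosen spatial plane
    of a 4D (t, z, y, x) volume by rotating axes so the target plane is
    innermost. Axis layout and plane naming are illustrative assumptions.
    """
    axes = {'xy': (0, 1, 2, 3),   # slices indexed by z
            'yz': (0, 3, 1, 2),   # slices indexed by x
            'zx': (0, 2, 3, 1)}[plane]
    xr = np.transpose(x4d, axes)
    # Denoise each (t, H, W) sub-volume; the slice index runs over axis 1.
    out = np.stack([denoiser_3d(xr[:, s]) for s in range(xr.shape[1])], axis=1)
    # Undo the permutation so the result is back in (t, z, y, x) order.
    return np.transpose(out, np.argsort(axes))
```

Because each call only re-orients the array, the same trained network weights serve all three orientations; the fusion itself happens through the MACE averaging, not inside any single denoiser.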
VI Experimental Results
To demonstrate the improved reconstruction quality of our method, we present results on both real and simulated 4D X-ray CT data for NDE applications. In both cases we compare Multi-Slice Fusion with 4D Markov Random Field based MBIR reconstruction (MBIR+4DMRF) and conventional 3D Filtered Back Projection (FBP) reconstruction. The MBIR+4DMRF method used was a variant of the QGGMRF model described in [2] with 26 spatial and 2 temporal neighbors. We also compare with reconstructions using single CNN denoisers to illustrate the effect of model fusion. We refer to reconstructions using the single CNN denoisers $F_{xy,t}$, $F_{yz,t}$, and $F_{zx,t}$ as MBIR+$F_{xy,t}$, MBIR+$F_{yz,t}$, and MBIR+$F_{zx,t}$, respectively.
In all our experiments, the cone-beam inversion within Multi-Slice Fusion was executed on two CPU cluster nodes, each with 20 Kaby Lake CPU cores. It was parallelized trivially over time-points and, within each time-point, using multiple threads, similar to [20]. We perform three equits per Mann iteration for the cone-beam inversion. The CNN denoisers run in parallel with the cone-beam inversion on a separate cluster node with an Nvidia Tesla P100 GPU.
VI-A Real Dataset
Table I: Experimental setup for the real dataset.

Scanner Model  North Star Imaging X50
Source-Detector Distance  839
Magnification  5.57
Number of Views per Rotation  150
Cropped Detector Array
Voxel Size
Reconstruction Size
In this section we present results on real data to evaluate our method. The data come from a dynamic cone-beam X-ray scan of a glass vial, with an elastomeric stopper and an aluminum crimp-seal, using a North Star Imaging (Rogers, MN) X50 X-ray CT system. The vial undergoes dynamic compression during the scan, to capture the mechanical response of the components. Of particular interest is the moment when the aluminum seal is no longer in contact with the underside of the glass neck finish. This indicates the moment when the applied force exceeds that exerted by the rubber on the glass; this is known as the "residual seal force" [23]. During the scan the vial was held in place by fixtures moving in and out of the field of view, resulting in artifacts. All our 4D results incorporate preprocessing to correct these artifacts. The experimental setup is summarized in Table I.
Figure 6 compares Multi-Slice Fusion with several other methods. Each image is a slice through the reconstructed vial for one time-point along the spatial xy plane. Both FBP and MBIR+4DMRF suffer from obvious windmill artifacts and higher noise, and have blurred contours. In contrast, the Multi-Slice Fusion reconstruction has smooth and uniform textures while preserving edge definition. Figure 6 also illustrates the effect of model fusion by comparing Multi-Slice Fusion with MBIR+$F_{xy,t}$, MBIR+$F_{yz,t}$, and MBIR+$F_{zx,t}$. MBIR+$F_{yz,t}$ and MBIR+$F_{zx,t}$ suffer from horizontal and vertical streaks, respectively, since those denoisers were applied in the corresponding planes. MBIR+$F_{xy,t}$ does not suffer from streaks in the figure, since we are viewing a slice along the xy plane, but it suffers from other artifacts: it cannot reconstruct the outer ring, since the displayed slice is at the edge of the aluminum seal and the xy plane does not contain sufficient information. In contrast, Multi-Slice Fusion resolves the edges of the rings better than any of MBIR+$F_{xy,t}$, MBIR+$F_{yz,t}$, and MBIR+$F_{zx,t}$, since it uses information from all the spatial coordinates.
Next, in Figure 4 we plot a cross-section through the object for Multi-Slice Fusion, MBIR+4DMRF, and FBP. For this, we choose a time-point at which we know the aluminum and glass have separated spatially. The Multi-Slice Fusion reconstruction has a steeper and more defined slope at the junction of aluminum and glass. This supports the claim that Multi-Slice Fusion preserves fine detail while still producing a smooth regularized image.
Finally, in Figure 5 we plot a cross-section through the object over time to show the improved space-time resolution of our method. We do this for both 4D methods, MBIR+4DMRF and Multi-Slice Fusion. Multi-Slice Fusion yields improved space-time resolution of the separation of aluminum and glass.
VI-B Simulated Data
Table II: Experimental setup for the simulated dataset.

Source-Detector Distance  839
Magnification  5.57
Number of Views per Rotation  75
Cropped Detector Array
Voxel Size
Reconstruction Size
In this section we present results on simulated data to evaluate our method. We take a low-noise CT reconstruction of a bottle and screw cap and denoise it further using BM4D [14] to generate a clean 3D phantom. We then translate the 3D volume vertically at each time-point to generate a 4D phantom. We forward-project the phantom to generate sinogram data and use that data to reconstruct the object. The simulated experimental setup is summarized in Table II.
Figure 8 compares Multi-Slice Fusion with several other methods. Each image is a slice through the reconstructed object for one time-point along the spatial xy plane. Both FBP and MBIR+4DMRF suffer from high noise and jagged edges, and fail to recover the small hole at the bottom of the image. MBIR+$F_{yz,t}$ and MBIR+$F_{zx,t}$ suffer from horizontal and vertical streaks, respectively, since those denoisers were applied in the corresponding planes. MBIR+$F_{xy,t}$ does not suffer from streaks in the figure, since we are viewing a slice along the xy plane, but it suffers from other artifacts: it cannot reconstruct the small hole at the bottom of the image, since the xy plane does not contain sufficient information.
Next, in Figure 7, we plot a cross-section through the object for Multi-Slice Fusion, MBIR+4DMRF, FBP, and the phantom. Multi-Slice Fusion results in the most accurate reconstruction of the gap between the materials.
Finally, to objectively measure image quality, we report the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) with respect to the phantom for each method in Table III. We define the PSNR of a reconstruction $\hat{x}$ with respect to a phantom $x^*$ as

$$ \text{PSNR} = 20 \log_{10} \left( \frac{\text{range}}{\| \hat{x} - x^* \|_2 / \sqrt{N_s N_t}} \right) \quad (10) $$

where range is computed from the upper and lower percentiles of the phantom. As can be seen from Table III, Multi-Slice Fusion results in the highest PSNR and SSIM scores.
Table III: PSNR and SSIM for each method on the simulated dataset.

Method  PSNR (dB)  SSIM
FBP  19.690  0.609
MBIR+4DMRF  25.837  0.787
Multi-Slice Fusion  29.071  0.943
MBIR+$F_{xy,t}$  29.026  0.922
MBIR+$F_{yz,t}$  28.040  0.932
MBIR+$F_{zx,t}$  28.312  0.926
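For reference, the percentile-based PSNR of equation (10) can be sketched as follows; the specific percentile values are illustrative defaults, since the ones used in the paper are not recoverable from the text here.

```python
import numpy as np

def psnr(recon, phantom, lo_pct=1.0, hi_pct=99.0):
    """PSNR with a percentile-based dynamic range (equation (10)).

    The range is computed from percentiles of the phantom so that a few
    outlier voxels do not inflate the score; lo_pct/hi_pct are assumed
    defaults, not the paper's values.
    """
    rng = np.percentile(phantom, hi_pct) - np.percentile(phantom, lo_pct)
    rmse = np.sqrt(np.mean((recon - phantom) ** 2))   # ||x_hat - x*|| / sqrt(N)
    return 20.0 * np.log10(rng / rmse)
```

Using percentiles rather than the raw min/max makes the metric robust to isolated hot or cold voxels in the phantom.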
VII Conclusion
In this paper, we proposed a novel 4D X-ray CT reconstruction algorithm, Multi-Slice Fusion, that combines multiple low-dimensional denoisers to form a 4D prior. Our method allows the formation of an advanced 4D prior using state-of-the-art CNN denoisers without needing to train on 4D data. Furthermore, it allows for multiple levels of parallelism, thus enabling reconstruction of large volumes in a reasonable time. Although we focused on 4D X-ray CT reconstruction for Non-Destructive Evaluation (NDE) applications, our method can be used for any reconstruction problem involving multiple dimensions.
Acknowledgment
The authors would like to acknowledge support from Eli Lilly and Company under research project funding agreement 17099289. Charles A. Bouman and Gregery T. Buzzard were supported in part by NSF grant CCF-1763896. We also thank M. Cory Victor and Dr. Coralie Richard from Eli Lilly for their assistance and guidance in setting up the residual seal force test experiment.
References
 [1] C. Huang, J. L. Ackerman, Y. Petibon, T. J. Brady, G. El Fakhri, and J. Ouyang, "MR-based motion correction for PET imaging using wired active MR microcoils in simultaneous PET-MR: Phantom study," Medical Physics, vol. 41, no. 4, 2014.
 [2] K. A. Mohan, S. Venkatakrishnan, J. W. Gibbs, E. B. Gulsoy, X. Xiao, M. De Graef, P. W. Voorhees, and C. A. Bouman, "TIMBIR: A method for time-space reconstruction from interlaced views," IEEE Trans. Computational Imaging, vol. 1, no. 2, pp. 96–111, 2015.
 [3] A. Ziabari, D. H. Ye, K. D. Sauer, J. Thibault, and C. A. Bouman, "2.5D deep learning for CT image reconstruction using a multi-GPU implementation," in Signals, Systems, and Computers, 2018 52nd Asilomar Conference on. IEEE, 2018.
 [4] J. Gibbs, K. A. Mohan, E. Gulsoy, A. Shahani, X. Xiao, C. Bouman, M. De Graef, and P. Voorhees, "The three-dimensional morphology of growing dendrites," Scientific Reports, vol. 5, p. 11824, 2015.
 [5] K. A. Mohan, S. Venkatakrishnan, J. W. Gibbs, E. B. Gulsoy, X. Xiao, M. De Graef, P. W. Voorhees, and C. A. Bouman, "4D model-based iterative reconstruction from interlaced views," in ICASSP, 2015, pp. 783–787.
 [6] X. Wang, K. A. Mohan, S. J. Kisner, C. Bouman, and S. Midkiff, "Fast voxel line update for time-space image reconstruction," in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016, pp. 1209–1213.
 [7] S. J. Kisner, E. Haneda, C. A. Bouman, S. Skatter, M. Kourinny, and S. Bedford, "Model-based CT reconstruction from sparse views," in Second International Conference on Image Formation in X-Ray Computed Tomography, 2012, pp. 444–447.
 [8] K. Sauer and C. Bouman, “A local update strategy for iterative reconstruction from projections,” IEEE Transactions on Signal Processing, vol. 41, no. 2, pp. 534–548, 1993.

 [9] S. Sreehari, S. V. Venkatakrishnan, B. Wohlberg, G. T. Buzzard, L. F. Drummy, J. P. Simmons, and C. A. Bouman, "Plug-and-play priors for bright field electron tomography and sparse interpolation," IEEE Transactions on Computational Imaging, vol. 2, no. 4, pp. 408–423, 2016.
 [10] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, "Plug-and-play priors for model based reconstruction," in Global Conference on Signal and Information Processing (GlobalSIP), 2013 IEEE. IEEE, 2013, pp. 945–948.
 [11] Y. Sun, B. Wohlberg, and U. S. Kamilov, "An online plug-and-play algorithm for regularized image reconstruction," arXiv preprint arXiv:1809.04693, 2018.
 [12] U. S. Kamilov, H. Mansour, and B. Wohlberg, "A plug-and-play priors approach for solving nonlinear imaging inverse problems," IEEE Signal Processing Letters, vol. 24, no. 12, pp. 1872–1876, 2017.
 [13] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, "Image denoising by sparse 3-D transform-domain collaborative filtering," IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.
 [14] M. Maggioni, G. Boracchi, A. Foi, and K. Egiazarian, “Video denoising using separable 4D nonlocal spatiotemporal transforms,” in Image Processing: Algorithms and Systems IX, vol. 7870. International Society for Optics and Photonics, 2011, p. 787003.
 [15] G. T. Buzzard, S. H. Chan, S. Sreehari, and C. A. Bouman, "Plug-and-play unplugged: Optimization-free reconstruction using consensus equilibrium," SIAM Journal on Imaging Sciences, vol. 11, no. 3, pp. 2001–2020, 2018.
 [16] D. Jiang, W. Dou, L. Vosters, X. Xu, Y. Sun, and T. Tan, "Denoising of 3D magnetic resonance images with multi-channel residual learning of convolutional neural network," Japanese Journal of Radiology, vol. 36, no. 9, pp. 566–574, 2018.
 [17] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, "Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising," IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, 2017.
 [18] Y. Sun, B. Wohlberg, and U. S. Kamilov, "Plug-in stochastic gradient method," arXiv preprint arXiv:1811.03659, 2018.
 [19] Y. Sun, S. Xu, Y. Li, L. Tian, B. Wohlberg, and U. S. Kamilov, "Regularized Fourier ptychography using an online plug-and-play algorithm," arXiv preprint arXiv:1811.00120, 2018.
 [20] T. Balke, S. Majee, G. T. Buzzard, S. Poveromo, P. Howard, M. A. Groeber, J. McClure, and C. A. Bouman, "Separable models for cone-beam MBIR reconstruction," Electronic Imaging, vol. 2018, no. 15, pp. 181-1, 2018.
 [21] V. Sridhar, G. T. Buzzard, and C. A. Bouman, "Distributed framework for fast iterative CT reconstruction from view-subsets," Electronic Imaging, vol. 2018, no. 15, pp. 102-1, 2018.
 [22] V. Sridhar, X. Wang, G. T. Buzzard, and C. A. Bouman, "Distributed memory framework for fast iterative CT reconstruction from view-subsets using multi-agent consensus equilibrium," manuscript in preparation for IEEE Transactions on Computational Imaging, 2018.
 [23] R. Mathaes, H.-C. Mahler, J.-P. Buettiker, H. Roehl, P. Lam, H. Brown, J. Luemkemann, M. Adler, J. Huwyler, A. Streubel et al., "The pharmaceutical vial capping process: Container closure systems, capping equipment, regulatory framework, and seal quality tests," European Journal of Pharmaceutics and Biopharmaceutics, vol. 99, pp. 54–64, 2016.