Improvements in imaging sensors and computing power have made it possible to solve increasingly difficult reconstruction problems. In particular, the dimensionality of reconstruction problems has increased from the traditional 2D and 3D problems representing space to more difficult 4D or even 5D problems representing space-time and, for example, heart or respiratory phase [1, 2].
These higher-dimensional reconstruction problems pose surprisingly difficult challenges both computationally and perhaps more importantly, in terms of algorithmic design and training due to the curse of dimensionality. However, the high dimensionality of the reconstruction also presents important opportunities to improve reconstruction quality by exploiting regularity. In particular, it has been shown that the temporal resolution of 4D CT imaging can be increased by an order of magnitude by exploiting the time space regularity of objects being imaged [2, 4, 5, 6]. These approaches use model-based iterative reconstruction (MBIR) [7, 8] to enforce regularity using simple space-time prior models.
Recently, it has been demonstrated that Plug-and-Play (P&P) priors [9, 10, 11, 12] can dramatically improve reconstruction quality by enabling the use of state-of-the-art denoisers as prior models in MBIR. So P&P has great potential to improve reconstruction quality in 4D CT imaging problems. However, state-of-the-art denoisers such as deep Convolutional Neural Networks (CNNs) and BM4D are primarily available for 2D and sometimes 3D images, and it is difficult to extend them to higher dimensions [13, 14, 3]
. In particular, extending CNNs to 4D requires very computationally, and memory intensive 4D convolution applied to 5D feature tensor structures. This problem is further compounded by the lack of GPU accelerated routines for 4D convolution from major Deep-Learning frameworks such as Tensorflow, Keras, PyTorch. Currently only 1D, 2D, and 3D convolutions are supported with GPU acceleration. Furthermore, 4D CNNs require 4D ground truth data to train the P&P denoisers, which might be difficult or impossible to obtain.
In this paper, we present a novel 4D X-ray CT reconstruction algorithm that combines multiple low-dimensional CNN denoisers to implement a highly effective 4D prior model. Our approach, which we call Multi-Slice Fusion, integrates the multiple low-dimensional priors using multi-agent consensus equilibrium (MACE) . MACE is an extension of the P&P framework that formulates the inversion problem using an equilibrium equation—as opposed to an optimization—and allows for the use of multiple prior models and agents.
Figure 1 illustrates the basic concept of our approach. Multi-Slice Fusion integrates together three distinct denoisers each of which is designed to remove noise along lower dimensional slices of the 4D object. When MACE fuses the denoisers it simultaneously
enforces the constraints of each denoising agent, so the reconstructions are constrained to be smooth in all four dimensions. Consequently, Multi-Slice Fusion results in high-quality reconstructions that are practical to train and compute even when the dimensionality of the reconstruction is high. In our implementation, one MACE agent estimates the cone-beam tomographic inversion. The remaining 3 agents are CNN denoisers trained to remove additive white Gaussian noise (AWGN) along 2.5D slices of the reconstruction[3, 16, 17].
The MACE solution may be computed using a variety of algorithms, including variants of the Plug-and-Play algorithm based on ADMM or other approaches [18, 19, 10, 9]. We implement our method for the problem of 4D cone-beam X-ray CT reconstruction  of moving parts, and the MACE solution is computed on a heterogeneous cluster in which different agent updates are computed on different cluster nodes. In particular, the algorithm runs in parallel on heterogeneous cluster nodes, with the inversion portion running on CPU nodes and the prior portion running on GPU nodes.
Experimental results on both simulated data and real data obtained from an North Star Imaging X50 X-ray cone-beam scanner indicate that the Multi-Slice Fusion reconstruction using CNN priors can substantially improve reconstructed image quality as compared to MBIR reconstruction using traditional 4D MRF priors.
Ii Problem Formulation
In X-Ray CT imaging, an object is rotated and several 2D projections (radiographs) of the object are measured for different angles. The problem is then to reconstruct the 4D array of X-ray attenuation coefficients from these measurements, where 3 dimensions correspond to the spatial dimensions and the fourth dimension corresponds to time.
Let be the number of time-points, and be the number of voxels at each time point of the 4D image. For each time-point, , define
to be the vector of sinogram measurements at time, and to be the 3D volume of X-ray attenuation coefficients for that time-point. Let us stack all the measurements to form a vector and stack all the 3D volumes to form a vector , where and . The 4D reconstruction problem then becomes the task of recovering the 4D image of attenuation coefficients, , from the series of sinogram measurements, .
The image can be reconstructed as a maximum a posteriori (MAP) estimate as
where is the data-fidelity or log-likelihood term and is the 4D regularizer or prior model for the image, . The data-fidelity term of the 4D image, , can be written in a separable fashion as
is the data fidelity term for time , where is the system matrix, and is the weight matrix, and is a normalizing weight scalar . The values in the diagonal weight matrix,
, account for the non-uniform noise variance for each measurement.
If the prior model, , can be expressed analytically like a 4D Markov Random Field (MRF) as in , then the expression in Equation (1) can be minimized iteratively. However, in practice, it can be difficult to represent an advanced prior model in the form of a simple cost function that can be minimized. Consequently, P&P methods have been created as a method for representing prior models as denoising operations[9, 10]. More recently, P&P methods have been generalized to the consesus equilibrium framework as a way to integrate multiple criteria .
Iii Fusing Denoisers via Consensus Equilibrium
We use the MACE framework to fuse multiple denoisers to form a prior model for reconstruction. This allows us to make a prior model out of denoisers that are not optimization based .
Let for be a lower-dimensional denoiser that denoises a full 4D volume by processing the volume “slice by slice”. Furthermore, each denoiser,
, is trained to remove additive white Gaussian noise (AWGN) of standard deviationfrom the image. The denoisers serve as prior models even though a corresponding regularization term, , does not necessarily exist. The design of these denoisers is described in more detail in section IV. Each of the denoisers is a separate agent in the MACE framework.
The data-fidelity is achieved using a proximal map operation, , as another agent. The -operator is defined as the proximal map of as
where is the same as above and the unit-less parameter is used to vary the relative weight of the -operator with with respect to the denoisers and in turn the amount of regularization in the reconstruction can be adjusted.
The operator enforces fidelity to the measurements while each of the denoisers , , enforces regularity of the image in orthogonal image planes. Our method imposes a consensus between the operators , , , to achieve a balanced reconstruction that lies at the intersection of the solution space of the measurement model and each of the prior models.
In order to formally state the consensus equilibrium criteria and present the algorithm for computing the solution, we will need to introduce some additional notation. First, define the column vector denotes a tall vector as the concatination of state vectors where each state vector represents the input to one of the agents. Furthermore, let denote the operator formed by the application of all agents on each corresponding state vector. More precisely, we define the agent operator, , as
So simply denotes the parallel application of each agent to its corresponding state vector.
Next we define the averaging operator, , as
where denotes the weighted average of the state vectors for each agent given by
Notice that we choose the weighting so that there is equal weighting of the data-fidelity (i.e., agent ) and regularization (i.e., agents to ), irrespective of how many denoiser we use to form our prior model.
holds. The equilibrium condition of equation (8) implies that at equilibrium each agent with different inputs produces the same output. This output is then the consensus solution that represents a balance point between the forces of each agent.
From equation (8) it follows that the consensus solution is the fixed point of the map and can found efficiently by using partial update Mann iterations [15, 21, 22]. Partial update Mann iterations use approximate updates for the proximal map operator, . We define to be the corresponding partial update approximation of
where is the same as but uses only an approximation proximal mapping formed by a single update of a iterative optimization method initialized by . That is, the iterative optimization in the -operator in equation (4) is initialized by and terminated early.
The the partial update Mann iterations for fusing denoisers can be simplified to form Algorithm 1. The weighing parameter is used to control speed of convergence. The initial reconstruction used should be close to the final solution for faster convergence. Output of a fast conventional algorithm can be used as the initial reconstruction. We continue the partial update Mann iterations until the differences between state vectors become smaller than a fixed threshold. We use three passes of Iterative Coordinate Descent (ICD) for computing the approximate proximal mapping, .
Iv CNN Space-Time Denoiser
Since we iteratively apply multiple denoisers to multiple 4D volumes to compute the reconstruction, it is essential for each denoiser to be fast. We use Deep Convolutional Neural Networks since they provide state-of-the-art quality in denoising while at the same time being fast due to optimized GPU libraries.
The architecture of the CNN denoiser is shown in Figure 2. Each of the CNN denoisers used in the Multi-Slice Fusion is designed to perform 2.5D denoising of a 3D volume using a 2D moving window. By 2.5D we mean that as an input the neural network takes a thin 3D patch consisting of five consecutive 2D slices while the output is the denoised center slice. The other slices are being denoised by shifting the 5-slice moving window.
The 2D slices are processed using 2D convolution while the adjacent slices are treated as separate input channels. The neural network architecture is modeled after the residual network architecture presented in  but extended so that it works on 3D rather than 2D images. In [3, 16], it has been shown that this type of 2.5D processing is a computationally efficient way of performing effective 3D denoising with CNNs. We note that it is similar to the method presented in  where the 5 adjacent slices are analogous to the 3 color channels used in denoising of color images.
Our 3D CNN denoiser can theoretically be trained with space-time data to get different time and space regularization. However, in the absence of space-time training data, we trained the network on 3D-space volumes. For training, we used a CT reconstruction of a bottle and screw cap made from different plastics that has low noise. A slice from the training volume is illustrated in Figure 3. While our training and testing volumes are different, all of them are representative of a variety of NDE problems in which objects to be imaged are constructed from a relatively small number of distinct materials.
We extracted partially overlapping 3D patches of size from our training volume of size . As a data augmentation technique random rotation, flipping, and mirroring were applied during training. Pseudo-random additive white Gaussian noise (AWGN) was applied to the patches and the network was trained to remove the noise. The AWGN noise used does not represent CT measurement noise but is a result of using the MACE framework. More specifically, the AWGN noise that the denoisers removes in MACE is an ’artificial’ noise induced from the quadratic norm used in equation (4).
V 4D Prior as Multi-Slice Fusion
Although, the approach described in section III can fuse arbitrary number of denoisers, we fuse three 3D CNN denoisers to form our 4D prior. We refer to denoisers in equation (5) as , , and , respectively. The denoiser is a CNN space-time denoiser that does convolution in the xy plane and uses five input channels to input slices from neighboring time-points. The denoisers and are analogous to but are applied along the yz and zx plane, respectively. Each of the denoisers, despite being 3D denoisers, denoise the 4D volume by processing “slice by slice”.
Each denoiser incorporates information along a spatial slice and uses a moving temporal window of 5 slices to denoise a single slice that is in the center. However, our Multi-Slice Fusion fuses all the denoisers , , and to incorporate information along all four dimensions.
Vi Experimental Results
To demonstrate the improved reconstruction quality of our method, we present results on both real and simulated 4D X-ray CT data for NDE applications. In both cases we compare Multi-Slice Fusion reconstruction with 4D Markov Random Field based MBIR reconstruction (MBIR+4D-MRF) and conventional 3D Filtered Back Projection reconstruction (FBP). The MBIR+4D-MRF used was a variant of the QGGMRF model described in  with 26 spatial and 2 temporal neighbors. We also compare with reconstructions using single CNNs to illustrate the effect of model fusion. We refer to reconstructions using single CNN denoisers , , and as MBIR+, MBIR+, and MBIR+, respectively.
In all our experiments the cone-beam inversion within Multi-Slice Fusion was executed on two CPU cluster nodes, each with 20 Kaby Lake CPU cores. It was parallelized over time-points trivially and within time points using multiple threads similar to . We perform three equits per each Mann iteration for the cone-beam inversion. The CNN denoisers ran in parallel with the cone-beam inversion in a separate cluster with a Nvidia Tesla P100 GPU.
Vi-a Real Dataset
|Number of Views per Rotation||150|
|Cropped Detector Array||,|
In this section we present results on real data to evaluate our method. The data is from a dynamic cone-beam X-ray scan of a glass vial, with elastomeric stopper and aluminum crimp-seal, using a North Star Imaging (Rogers, MN) X50 X-ray CT system. The vial is undergoing dynamic compression during the scan, to capture the mechanical response of the components. Of particular interest is the moment when the aluminum seal is no longer in contact with the underside of the glass neck finish. This indicates the moment when the force applied exceeds that exerted by the rubber on the glass; this is known as the “residual seal force”. During the scan the vial was held in place by fixtures going in and out of the Field of View thus resulting in artifacts. All our 4D results incorporate preprocessing to correct these artifacts. The experimental setup is summarized in Table I.
Figure 6 compares Multi-Slice Fusion with several other methods. Each image is a slice through the reconstructed vial for one time point along the spatial xy plane. Both FBP and MBIR+4D-MRF suffer from obvious windmill artifacts, higher noise and have blurred contours. In contrast to that, the Multi-Slice Fusion reconstruction has smooth and uniform textures while preserving edge definition. Figure 6 also illustrates the effect of model fusion by comparing Multi-Slice Fusion with MBIR+, MBIR+, and MBIR+. MBIR+ and MBIR+ suffer from horizontal and vertical streaks respectively since the denoisers were applied in those planes. MBIR+ does not suffer from streaks in the figure since we are viewing a slice along the xy plane, but it suffers from other artifacts. MBIR+ cannot reconstruct the outer ring since the slice displayed is at the edge of the aluminum seal and the xy plane does not contain sufficient information. In contrast, Multi-Slice Fusion can resolve the edges of the rings better than either of MBIR+, MBIR+, and MBIR+ since it uses information from all the spatial coordinates.
Next, we plot a cross-section through the object for Multi-Slice Fusion, MBIR+4D-MRF and FBP in Figure 4. For this, we choose a time-point where we know the aluminum and glass have separated spatially. The Multi-Slice Fusion reconstruction has a steeper and more defined slope in the junction of aluminum and glass. This supports that Multi-Slice Fusion is able to preserve fine details in spite of producing a smooth regularized image.
Finally in Figure 5 we plot a cross-section through the object with time to show the improved space-time resolution of our method. We do this for both the 4D methods: MBIR+4D-MRF and Multi-Slice Fusion. Multi-Slice Fusion results in improved space-time resolution of the separation of aluminum and glass.
Vi-B Simulated Data
|Number of Views per Rotation||75|
|Cropped Detector Array||,|
In this section we present results on simulated data to evaluate our method. We take a low-noise CT reconstruction of a bottle and screw cap and denoise it further using BM4D  to generate a clean 3D phantom. We then move the 3D volume vertically for each time point to generate a 4D phantom. We forward project the phantom to generate sinogram data and use that to reconstruct the object. The simulated experimental setup is summarized in Table II.
Figure 8 compares Multi-Slice Fusion with several other methods. Each image is a slice through the reconstructed object for one time point along the spatial xy plane. Both FBP and MBIR+4D-MRF suffer from high noise and jagged edges and fail to recover the small hole in the bottom of the image. MBIR+ and MBIR+ suffer from horizontal and vertical streaks respectively since the denoisers were applied in those planes. MBIR+ does not suffer from streaks in the figure since we are viewing a slice along the xy plane, but it suffers from other artifacts. MBIR+ cannot reconstruct the small hole in the bottom of the image since the xy plane does not contain sufficient information.
Next we plot a cross-section through the object for Multi-Slice Fusion, MBIR+4D-MRF, FBP, and the phantom Figure 7. Multi-Slice Fusion results in the most accurate reconstruction of the gap between materials.
Finally we report the Peak Signal to Noise Ratio (PSNR) and Structural Similarity Index (SSIM) with respect to the phantom for each method in Table III to objectively measure image quality. We define the PSNR for a given reconstruction with a phantom as
where range is computed from the and percentiles of the phantom. As can be seen from Table III, Multi-Slice Fusion results in the highest PSNR and SSIM scores.
In this paper, we proposed a novel 4D X-Ray CT reconstruction algorithm, Multi-Slice Fusion, that combines multiple low dimensional denoisers to form a 4D prior. Our method allows the formation of an advanced 4D prior using state-of-the-art CNN denoisers without needing to train on 4D data. Furthermore, it allows for multiple levels of parallelism; thus enabling reconstruction of large volumes in a reasonable time. Although we focused on 4D X-Ray CT reconstruction for Non Destructive Evaluation (NDE) applications, our method can be used for any reconstruction problem involving multiple dimensions.
The authors would like to acknowledge support from Eli Lilly and Company under research project funding agreement 17099289. Charles A. Bouman and Gregery T. Buzzard were supported in part by NSF grant CCF-1763896. We also thank M. Cory Victor and Dr. Coralie Richard from Eli Lilly for their assistance and guidance in setting up the residual seal force test experiment.
-  C. Huang, J. L. Ackerman, Y. Petibon, T. J. Brady, G. El Fakhri, and J. Ouyang, “MR-based motion correction for PET imaging using wired active MR microcoils in simultaneous PET-MR: Phantom study,” Medical physics, vol. 41, no. 4, 2014.
-  K. A. Mohan, S. Venkatakrishnan, J. W. Gibbs, E. B. Gulsoy, X. Xiao, M. De Graef, P. W. Voorhees, and C. A. Bouman, “TIMBIR: A method for time-space reconstruction from interlaced views.” IEEE Trans. Computational Imaging, vol. 1, no. 2, pp. 96–111, 2015.
-  A. Ziabari, D. H. Ye, K. D. Sauer, J. Thibault, and C. A. Bouman, “2.5D deep learning for CT image reconstruction using a multi-GPU implementation,” in Signals, Systems, and Computers, 2018 52nd Asilomar Conference on. IEEE, 2018.
-  J. Gibbs, K. A. Mohan, E. Gulsoy, A. Shahani, X. Xiao, C. Bouman, M. De Graef, and P. Voorhees, “The three-dimensional morphology of growing dendrites,” Scientific reports, vol. 5, p. 11824, 2015.
-  K. A. Mohan, S. Venkatakrishnan, J. W. Gibbs, E. B. Gulsoy, X. Xiao, M. De Graef, P. W. Voorhees, and C. A. Bouman, “4D model-based iterative reconstruction from interlaced views.” in ICASSP, 2015, pp. 783–787.
-  X. Wang, K. A. Mohan, S. J. Kisner, C. Bouman, and S. Midkiff, “Fast voxel line update for time-space image reconstruction,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016, pp. 1209–1213.
-  S. J. Kisner, E. Haneda, C. A. Bouman, S. Skatter, M. Kourinny, and S. Bedford, “Model-based CT reconstruction from sparse views,” in Second International Conference on Image Formation in X-Ray Computed Tomography, 2012, pp. 444–447.
-  K. Sauer and C. Bouman, “A local update strategy for iterative reconstruction from projections,” IEEE Transactions on Signal Processing, vol. 41, no. 2, pp. 534–548, 1993.
S. Sreehari, S. V. Venkatakrishnan, B. Wohlberg, G. T. Buzzard, L. F. Drummy, J. P. Simmons, and C. A. Bouman, “Plug-and-play priors for bright field electron tomography and sparse interpolation,”IEEE Transactions on Computational Imaging, vol. 2, no. 4, pp. 408–423, 2016.
-  S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, “Plug-and-play priors for model based reconstruction,” in Global Conference on Signal and Information Processing (GlobalSIP), 2013 IEEE. IEEE, 2013, pp. 945–948.
-  Y. Sun, B. Wohlberg, and U. S. Kamilov, “An online plug-and-play algorithm for regularized image reconstruction,” arXiv preprint arXiv:1809.04693, 2018.
-  U. S. Kamilov, H. Mansour, and B. Wohlberg, “A plug-and-play priors approach for solving nonlinear imaging inverse problems,” IEEE Signal Processing Letters, vol. 24, no. 12, pp. 1872–1876, 2017.
-  K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3D transform-domain collaborative filtering,” IEEE Transactions on image processing, vol. 16, no. 8, pp. 2080–2095, 2007.
-  M. Maggioni, G. Boracchi, A. Foi, and K. Egiazarian, “Video denoising using separable 4D nonlocal spatiotemporal transforms,” in Image Processing: Algorithms and Systems IX, vol. 7870. International Society for Optics and Photonics, 2011, p. 787003.
-  G. T. Buzzard, S. H. Chan, S. Sreehari, and C. A. Bouman, “Plug-and-play unplugged: Optimization-free reconstruction using consensus equilibrium,” SIAM Journal on Imaging Sciences, vol. 11, no. 3, pp. 2001–2020, 2018.
-  D. Jiang, W. Dou, L. Vosters, X. Xu, Y. Sun, and T. Tan, “Denoising of 3D magnetic resonance images with multi-channel residual learning of convolutional neural network,” Japanese journal of radiology, vol. 36, no. 9, pp. 566–574, 2018.
-  K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, 2017.
-  Y. Sun, B. Wohlberg, and U. S. Kamilov, “Plug-in stochastic gradient method,” arXiv preprint arXiv:1811.03659, 2018.
-  Y. Sun, S. Xu, Y. Li, L. Tian, B. Wohlberg, and U. S. Kamilov, “Regularized fourier ptychography using an online plug-and-play algorithm,” arXiv preprint arXiv:1811.00120, 2018.
-  T. Balke, S. Majee, G. T. Buzzard, S. Poveromo, P. Howard, M. A. Groeber, J. McClure, and C. A. Bouman, “Separable models for cone-beam MBIR reconstruction,” Electronic Imaging, vol. 2018, no. 15, pp. 181–1, 2018.
-  V. Sridhar, G. T. Buzzard, and C. A. Bouman, “Distributed framework for fast iterative CT reconstruction from view-subsets,” Electronic Imaging, vol. 2018, no. 15, pp. 102–1, 2018.
-  V. Sridhar, X. Wang, G. T. Buzzard, and C. A. Bouman, “Distributed memory framework for fast iterative CT reconstruction from view-subsets using multi-agent consensus equilibrium,” Manuscript in preparation for IEEE Transactions on Computational Imaging, 2018.
-  R. Mathaes, H.-C. Mahler, J.-P. Buettiker, H. Roehl, P. Lam, H. Brown, J. Luemkemann, M. Adler, J. Huwyler, A. Streubel et al., “The pharmaceutical vial capping process: Container closure systems, capping equipment, regulatory framework, and seal quality tests,” European Journal of Pharmaceutics and Biopharmaceutics, vol. 99, pp. 54–64, 2016.