Natural signals such as images, audio and video often demonstrate sparsity in some representation domain such as wavelets pati:93:omp
, Fourier transform, etc. Dictionary learning methods can extract features from natural signal data and represent the natural signal in a sparse format. Such sparse representation methods have been used in many areas like image denoisingelad:06:idvyang:10:isr, inpainting julien:08:src, medical image reconstruction zhang:17:tbd, etc.
There are two typical approaches to find a sparse representation of a signal: the synthesis model and the analysis model. Let be the measurement outcome of signal with measurement operator . For synthesis models, a signal
is encoded as a combination of column vectors in a synthesis dictionaryby assuming , where contains only a few nonzero elements, and then the data consistency is enforced simultaneously in the form of . For analysis models, a signal is assumed to be sparse in an (overcomplete) transform domain with , where is referred to as the analysis dictionary, and meanwhile one in parallel enforces the data consistency .
The sparse coding step is NP-hard, and the algorithms to learn synthesis or analysis dictionaries are computationally expensive such as K-SVD aharon:06:ksa, Analysis K-SVD rubinstein:13:aka, NAAOL yaghoobi:13:coa, etc. The sparsifying transform model manifests advantage in terms of low computational cost for computing the sparse coding of signals ravishankar:15:lst. By modeling where indicates an error term, is approximately sparse in the transform domain. Such a sparsifying transform model ravishankar:13:lst can be regarded as a generalized analysis model. For instance, Ravishankar et al. proposed to learn doubly sparse transforms for signals ravishankar:13:lds, where the sparsifying transform is a product of two different transforms, a sparse transform and a fast, analytical transform. Other works also developed efficient blind compressed sensing algorithms sabsam for joint sparsifying transform learning and image reconstruction for MRI.
Recent years have witnessed the growing development of deep learning methods. A typical method is to employ deep neural networks (DNN) and train an end-to-end model between the input and the output. For example, Jin et al. proposed the FBPConvNet based on the U-net frameworkjin:17:dcn
, which constructs a mapping from noisy low-dose CT reconstruction images to high-quality images. Chen et al. combined the autoencoder and deconvolution network and proposed the RED-CNN framework for low-dose CT imagingchen:2017:redCNN. Despite that DNN-based methods show potential for capturing underlying features of big data, a large amount of (often paired) training data is indispensable to train a feasible network. In addition, an explicit understanding of the modeling or representation captured in the network is currently lacking.
In the past few years, researchers imitated the nested network structure and created more complex sparse signal models. For example, Tang et al. proposed a Deep Micro-Dictionary Learning and Coding Network, where classic convolutional layers are replaced with dictionary learning and feature encoding operations tang:19:dmd. Singhal et al. Singhal:18:mmt presented a deep dictionary learning network (DDL) for image classification by minimizing the reconstruction error after going through multiple DL feature encoding steps. Ravishankar et al. exploited multi-layer extension for sparsifying transform model, which enables sparsifying an input image layer by layer ravishankar:18:lml. Other works proposed to improve signal models by combining multiple signal modules in a parallel structure wen:14:sos, zheng:18:pua. Wen et al. extended the single sparsifying transform model to a union of sparsifying transforms model, where different inputs are assigned to different clusters and transforms are learned with respect to each cluster wen:14:sos.
Several applications of sparsifying transform (ST) learning and various extensions have been demonstrated in the field of medical image reconstruction such as magnetic resonance image reconstruction and computed tomography (CT) image reconstruction zheng:18:pua, yang:20:mars. The conventional method for regular-dose CT image reconstruction is the analytical filtered back-projection (FBP) feldkamp:84:pcb. However, with reduced/low X-ray dose, severe artifacts and noise can degrade the quality of the reconstructed image. A class of reconstruction methods takes the physics model of the imaging system into account along with statistical models of measurements and noise and often simple object priors. These methods are referred to as model-based image reconstruction methods (MBIR) MBIR_review_21.
The penalized weighted least squares (PWLS) approach is a common MBIR method, whose loss function includes a weighted quadratic data-fidelity term and a regularizer term. The regularizer is conventionally based on hand-crafted models and priorssauer:93:alu, thibault:06:arf, thibault:07:atd, cho:15:rdf. Previous studies manifested a promising result by introducing learned sparsifying transform learning into the PWLS scheme pfister:14:mbi, pfister:14:trw, pfister:14:ast, chun:17:esv. The sparsifying transform learned in an unsupervised manner is incorporated into a prior term, which is equivalent to involving effective information extracted from the training slices into the process of image reconstruction. Moreover, using multiple sparsifying transforms increases the number of obtained features zheng:18:pua, yang:20:mars and further boosts the performance of reconstruction algorithms.
In this study, we propose a network-structured sparsifying transform learning module, which we refer to as the Multi-layer Clustering-based residual Sparsifying Transform (MCST) model. We extended the original sparsifying transform model in two aspects. On the one hand, the multi-layer structure of MCST model enables gaining potential features by exploiting residual maps layer by layer. On the other hand, the clustering operation in each layer implements an attention-like mechanism by grouping input patches. We propose a pipeline for training the MCST model and show how to update the variables in this pipeline. In addition, we have applied the proposed learned MCST model to LDCT reconstruction through a PWLS optimization scheme. We named the new LDCT reconstruction algorithm PWLS-MCST. Experimental results demonstrate that PWLS-MCST not only provides better image reconstruction quality than conventional methods like FBP and PWLS method with the non-adaptive edge-preserving (EP) regularizer, but also outperforms several advanced methods such as PWLS-ULTRA zheng:18:pua and PWLS-MARS yang:20:mars in terms of RMSE and SSIM.
2.1 Model description
In this section, we give a mathematical description of the MCST model and introduce its framework. Figure 1 shows the structure of the proposed multi-layer clustering-based residual sparsifying transform (MCST) model. Table 1 in Appendix gives a detailed description of the notation used in our methodology. Matrix denotes the transform in class in layer . and represent residual map and sparse code submatrices respectively, in layer with column indices , where is a set containing indices of clustered variables of class in the -th layer. In particular, each column in
is a vectorized overlapping image patch extracted from the training dataset (of images) with an appropriate patch stride. Figure2 gives an illustration of the patch extraction and patch clustering processes on an image across multiple layers. For patches with the same index, even though they are in different layers, the location of the included pixels in the image is the same. Each patch is vectorized for further clustering processing after extraction. The proposed MCST model extends the original sparsifying transform model in two dimensions to strengthen the model’s representation ability. The first type of extension is increasing the depth of the model. We borrow the design idea of the recent MARS yang:20:mars model. Multi-layer structure enforces the filtered residuals (the difference between the transformed input signals and their sparse representation) to be further sparsified layer by layer. Each layer keeps partial information or captures effective features from the input. The second type of improvement is changing the width of the MCST model. Similar to the popular attention mechanism, for different parts of the input signal in each layer, we learn sparsifying transform modules separately. Since our method is patch-based, a clustering operation is necessary to assign each patch into a specific class. Moreover, the relative order of the patches cannot be disrupted by the clustering operation, which requires us to remember the index of each patch at each layer and combine them in the same order before the next clustering.
2.2 Algorithm for model training and image reconstruction
Figure 3 gives an overview of the algorithms for model training and low-dose CT reconstruction. The whole process is divided into two stages. In the process of model training, (2.1) is solved by employing a block coordinate descent (BCD) method. We propose an iterative algorithm involving cluster assignment update, transform update, and sparse coding steps. Note that we use unlabelled images to train the proposed MCST signal model. A similar cyclical updating strategy is applied to the image reconstruction stage as well. Instead of updating the transform in each iteration, for the reconstruction stage, an image update step is included into the BCD algorithm. We designed the prior function (regularizer) incorporating the learned MCST model to harness the effective information obtained in the training stage.
2.2.1 Algorithm for MCST model training
The formulation for training the MCST model is given as in (2.1). Since (2.1) is nonconvex, similar to the recent MARS work yang:20:mars, we apply the BCD algorithm to solve this problem, which takes an iterative updating strategy among different variables to be optimized. Algorithm 1 shows the full MCST learning pipeline. We update variables in MCST by looping over layers. The specific solving procedure is presented in the following steps:
Cluster assignment update in -th layer
In this step, we find the cluster assignment for every patch in the th layer by solving the following optimization problem (1). Random initialization is employed for cluster assignment in the first iteration. When updating the cluster assignment in the -th layer, the patch assignments in the deeper layers are fixed.
The objective function of optimization problem (1) consists of two parts: sparse encoding residual and an penalty to enforce sparsity on the encoded representation. When computing the cluster assignment of a patch in the -th layer, which is denoted as , the transforms in later layers ( where ) still play a role. This is because the residuals of the patch at later layers () depend on transforms () and the residual () in the -th layer, which is reflected in the constraint.
To solve problem (1), we start at layer and proceed to later layers. To determine , one only needs to enumerate the values of the objective function with , and choose the index that gives the minimal objective function value. We repeat the solution process across all layers until the cluster assignment no longer changes.
Sparse coding step for
The objective to update in each iteration is shown in (2). With the cluster assignment for every patch and all transforms and sparse coefficient maps except the th layer fixed, the objective function in (2.1) becomes
To solve (3), we fix the base layer index as . Recall that the residuals in later layers depends on the residual and the encoding vector in layer . Therefore, we need to peel off this dependence to find the solution for the encoding vector in layer . We use the unitary property of the transforms to disentangle the dependence. Consider the encoding discrepancy term in a layer where :
where we define the back-propagation information vector from the -th to the -th layer for the patch as
Consequently, the objective function in the optimization problem (3) can be re-written as
The solution of optimal to the optimization problem above is thus
The solution form depends on the specific layer , where optimization is done in (2).
Transform update for
To update the transforms in the -th layer, we leave out quadratic terms with layer depth less than and all terms which are unrelated to the target optimization variables. By assembling the column vectors , into and respectively where , (2.1) can be rewritten as the following optimization problem. With cluster assignment and sparse code fixed, we solve (7) to update .
Define as the matrix consisting of back-propagation information vectors from the -th layer to the -th layer for all patches in layer belonging to cluster . We need to handle the same entanglement between transforms in the former layers and later layers. With the same argument to untwine such dependence as in equation (2.2.1), the objective function in problem (7) can be re-written as
where the symbol means equal up to some additive terms that do not depend on the variables being optimized for.
From the rewritten objective function, the unitary constraint on transforms reduces the optimization over each to the orthogonal procrustes problem, which thus can be solved separately. Define as
2.2.2 PWLS-MCST image reconstruction algorithm
We adopt the Penalized Weighted Least Squares (PWLS) approach to reconstruct an image from its noisy sinogram. The learned MCST model is incorporated into the cost function as the regularizer term. The specific PWLS problem is shown in (P1), where denotes the reconstructed image and is the number of pixels in the image. Vector represents the noisy sinogram data and is the system matrix of the CT scan. is the diagonal weighting matrix with
indicating the inverse variance of. Matrix is the diagonal majorizing matrix of . Operator extracts the -th overlapping patch from the original image with patch stride of pixel. Parameter controls the trade-off between the data-fidelity term and the regularizer term. are tunable thresholds which control the sparsity of the MCST model in each layer.
where is defined as
To solve (P1), we decompose (P1) into several subproblems (image update for , cluster assignment update for , sparse coding for ) and solve these subproblems sequentially in each iteration of the proposed algorithm. The complete algorithm corresponding to the image reconstruction stage is shown in Algorithm 2.
Image update step
Here, we fix the cluster assignments and sparse representations . The resulting subproblem can be rewritten as follows.
The solving process for (11) is identical to the image update step in Ref. yang:20:mars For each outer iteration, we update the image times with parameter decreasing as in (12). The formulae to compute and the Hessian matrix for image update are given in (13) and (14), respectively.
(12) (13) (14)
Cluster assignment update for
In the process of image reconstruction, the cluster assignment for every reconstructed image patch is updated dynamically in the same manner as in the learning stage. We find the optimal class for the -th patch in the -th layer by solving the following problem as in Section 2.2.1.
Sparse coding step for
3.1 Experiment setup
In this study, we evaluate the performance of the proposed MCST sparse signal model and the PWLS-MCST algorithm for LDCT reconstruction. We investigate the PWLS-MCST algorithm with different numbers of layers (2, 3 layers) with each layer containing 5 clusters to classify input patches into various groups. We compare the proposed methods against several baseline methods including FBPfeldkamp:84:pcb, PWLS with EP regularization cho:15:rdf, and the recent MARS yang:20:mars and ULTRA zheng:18:pua regularizations in PWLS, respectively. The details of these methods are described as follows.
FBP: We used a Hanning window with width to accomplish the back-projection reconstruction.
PWLS-EP: An edge-preserving (EP) regularizer is employed in the PWLS reconstruction scheme. The mathematical representation of the EP regularizer is , where counts the overall number of pixels, denotes the neighborhood of the th pixel , and and denote the analytically determined parameters to encourage uniform resolution. is a potential function.
PWLS-MARS: A multi-layer residual sparsifying transform (no clustering involved) is adopted as the prior in PWLS reconstruction. For a fair comparison, we choose MARS models with , layers (MARS2, MARS3), which have the same depth as the MCST models in our experiments.
PWLS-ULTRA: PWLS with a union of transforms model, which is equivalent to the MCST signal model with a single layer.
We used two metrics, RMSE and SSIM xu:12:ldx, zhang:17:tbd, to evalute the quality of reconstructions. We compute the root mean square error (RMSE) as RMSE , where and denote the reconstructed image and ground truth image, respectively, and calculates the pixel number inside the region of interest (ROI), which is a circular region containing all structures and tissues.
Our experiments include two main parts: learning transforms for constructing regularizers, and using them for low-dose CT image reconstruction. We work with two datasets: XCAT phantom simulated data Segars:08:rcs and Mayo clinic data Mayo:16:data. We generate the low-dose measurements with the “Poisson + Gaussian” noise model, i.e., ding:16:mmp, where denotes the incident photon intensity of the X-ray beam and we set per ray and with no scatter. is the variance of electronic noise. For XCAT phantom data, we use 5 slices to train the model. We simulate the low-dose measurements with GE 2D LightSpeed fan-beam geometry. The size of measurements is . The pixel size for the XCAT phantom is mm. For the Mayo Clinic data, 7 regular-dose images collected from three patients were used to train the MCST model. We synthesized the low-dose measurements using a fan-beam CT geometry with a monoenergetic source. The sinograms are of size . Specifically, the width of the detector column is mm. The distances from the source to detector and to rotation center are mm and mm, respectively. The size of the reconstructed images is with mm.
3.2 Low-dose experiments with XCAT phantom dataset
3.2.1 Behavior of the learned MCST models
First, we display the learned transforms from the XCAT phantom training dataset. Figure 4 shows the image set with all training slices. The number below each subfigure indicates its location in the volume. We extracted overlapping patches with a patch stride . The total number of training patches is approximately .
We learn the transforms in the MCST method with different model depth: ULTRA (single-layer MCST), MCST2 (2-layer), and MCST3 (3-layer), where each layer contains 5 clusters to enable learning rich features. To verify the effectiveness of the multi-layer and multi-cluster extension, MARS2 (2-layer) and MARS3 (3-layer) are included for comparison. We ran 1000 iterations to train each model and vary the parameters as for the MARS model in our previous work yang:20:mars. Specifically, we set parameters for ULTRA, for MCST2 and MARS2, for MARS3, for MCST3. We initialize the transforms for MCST in the first layer with 2D DCT matrices and the rest of the transforms are initialized with random matrices. Figure 5 shows the pre-learned transforms for ULTRA with 5 clusters (shown in the orange box), MARS with two layers (shown in the gray box), and MCST2 with 5 clusters in both layers (shown in the blue box). Each row of the learned transforms is reorganized as an square matrix for display purposes. The results show that the MCST model integrates the advantages of ULTRA model and MARS model. In particular, the transforms in the first layer of MCST show rich features (like ULTRA), while the transforms in the second layer capture finer features by further sparsifying the representation of residuals.
3.2.2 LDCT reconstruction results with PWLS-MCST algorithm
The reconstructed slice and slice of the XCAT phantom dataset are illustrated in Figures 6 and 7, respectively. We compare the results of PWLS-MCST with the conventional method FBP, PWLS-EP, PWLS-ULTRA, and PWLS-MARS. We used the FBP reconstruction as initialization for PWLS-EP and set the regularization parameter as . We ran 1000 iterations of the relaxed LALM algorithm for PWLS-EP and ran 1500 outer iterations for other ST-based iterative methods to ensure their convergence. We take the same training dataset and validate image slice (slice 48 of the XCAT phantom) as in Ref. yang:20:mars. The reconstruction parameters are varied based on their values in Ref. yang:20:mars. Specifically, we set the regularization parameter and sparsity parameters as for ULTRA, for MARS2 and MCST2, for MARS3, and for MCST3, respectively.
The reconstruction results show that PWLS-MCST attains better performance than all baseline methods. Compared with PWLS-MARS, the images reconstructed by PWLS-MCST demonstrate greater contrast due to flexible image modeling with multi-class transforms learned from different groups of patches. The performance difference is exemplified in the zoomed-in portions and details pointed by red arrows. Also, using multiple layers enables PWLS-MCST to preserve more key features compared to PWLS-ULTRA. For example, in the reconstructed images by PWLS-ULTRA, some subtle bone structures are missing (zoomed-in region at the bottom-left corner of subfigures), whereas PWLS-MCST mitigates this effect; the zoom-in regions located at the bottom-right corner of subfigures for PWLS-MCST are closer to the ground truth than those with PWLS-ULTRA. We point out that PWLS-MCST provides the best performance in terms of RMSE and SSIM.
However, compared with the 2-layer MCST model, the improvement with deeper MCST model (MCST3) is limited, which is in accordance with the performance of PWLS-MARS. Since XCAT phantom images have relatively simple structures, the complexity of MCST3 models may be too high to demonstrate significant usefulness with this dataset. A larger number of tunable parameters in deeper MCST models might also create potential sub-optimality in the training process.
3.3 Low-dose experiments with Mayo Clinic dataset
3.3.1 MCST model training
Next, we train the MCST model on the Mayo Clinic dataset. We use 7 regular-dose slices collected from three different patients (L096, L067, L143) for training. Figure 8 displays all training slices used in the experiment. Similar to the XCAT phantom experiment, we extracted overlapping patches from these training slices. The total number of patches is .
We trained the ST-based signal model with similar parameters as for the XCAT phantom experiment. We set iteration number for training as 1000 and set for ULTRA, for MARS2 and MCST2, for MARS3, and for MCST3. The initialization of the transforms in the MCST model are DCT matrices for the first layer and random matrices for successive layers, respectively.
As Figure 9 shows, the learned transforms from the ULTRA learning algorithm contain some compound features (e.g. the third transform in the first row). Similar to the XCAT phantom experiment, MCST2 blends the advantages of MARS with the ULTRA model and captures richer features from the training patch set. Furthermore, considering the complexity of the Mayo Clinic dataset, MCST2 is able to learn effective features in every transform of the first layer.
3.3.2 Numerical and visual results on Mayo Clinic data
After obtaining a series of transforms, we incorporate those pre-learned transforms into the PWLS scheme and compare various reconstruction methods. For PWLS-EP, we iterate the relaxed LALM algorithm with 1000 iterations with the FBP reconstruction as initialization. The hyperparameteris set as . For PWLS-ULTRA, PWLS-MARS2, and PWLS-MARS3, we set the maximum outer iterations as 1500 and inner image update times . Similar to the MARS work yang:20:mars, we tuned the parameters on one validation image (slice 100 of patient L506) and tested the performance of our proposed algorithm on other slices (slice 140 of patient L333, slice 90 of patient L109, and slice 120 of patient L067). We varied the reconstruction parameters based on the fine-tuned parameters in the PWLS-MARS work. Specifically, we chose for PWLS-ULTRA, for PWLS-MARS2 and PWLS-MCST2, and for PWLS-MARS3 and PWLS-MCST3, respectively.
give the image reconstruction results of slice 140 of patient L333, slice 90 of patient L109, and slice 120 of patient L067, respectively. The proposed PWLS-MCST algorithm gives better quality of image reconstructions compared to other recent unsupervised learning-based methods (PWLS-ULTRA and PWLS-MARS), especially for recovering subtle details. Although the result of PWLS-ULTRA is clearer than the initial PWLS-EP reconstruction, it can be improved by incorporating the rich multi-layer structure from MCST. Numerical results indicate that PWLS-MCST performs the best in terms of RMSE and SSIM criteria.
In this work, we proposed a multi-layer clustering-based residual sparsifying transform (MCST) network comprised of multiple sparsifying transform modules to extract rich information from training images. The MCST model is different from the previous single sparsifying transform model in two aspects. First, learning multiple transforms (in each layer) by clustering the data allows capturing rich features from patches. Second, the multi-layer structure aims to better capture the sparsity of the signal representation across several layers. We applied the learned MCST model to LDCT reconstruction and presented a novel image reconstruction algorithm dubbed PWLS-MCST. Experimental results indicate that PWLS-MCST provides better image reconstruction quality and image features than conventional methods such as FBP and PWLS-EP. Furthermore, PWLS-MCST shows advantages over several recent sophisticated algorithms for LDCT reconstruction including PWLS-MARS and PWLS-ULTRA, especially for displaying clearer edges and preserving subtle details. In the future, we plan to involve more sophisticated architectures like skipped connections among different layers and multi-scale transformation through down-sampling and up-sampling within the MCST model and formulations. Additional imaging applications can be explored to validate the general applicability of the proposed model.
We use lower case bold font to denote vectors and upper case bold font to denote matrices. is the 2-norm for vectors, is the Frobenius norm for matrices, and counts the non-zero entries. Operator is the hard-thresholding function which zeros out values less than . We list the notation used in the description of our methodology in Table 1.
|Number of classes in layer|
|A set containing indices of patches in class in the -th layer|
|-th column of the matrix and|
|Submatrix of and respectively, with column indices in the set|
|Transform of the class in -th layer|
|Cluster assignment of the patch in -th layer|
|information back-propagation vector for the patch from -th layer to -th layer|