DeepAI

# Regularization by Denoising Sub-sampled Newton Method for Spectral CT Multi-Material Decomposition

Spectral Computed Tomography (CT) is an emerging technology that enables to estimate the concentration of basis materials within a scanned object by exploiting different photon energy spectra. In this work, we aim at efficiently solving a model-based maximum-a-posterior problem to reconstruct multi-materials images with application to spectral CT. In particular, we propose to solve a regularized optimization problem based on a plug-in image-denoising function using a randomized second order method. By approximating the Newton step using a sketching of the Hessian of the likelihood function, it is possible to reduce the complexity while retaining the complex prior structure given by the data-driven regularizer. We exploit a non-uniform block sub-sampling of the Hessian with inexact but efficient Conjugate gradient updates that require only Jacobian-vector products for denoising term. Finally, we show numerical and experimental results for spectral CT materials decomposition.

• 3 publications
• 4 publications
01/18/2016

### Sub-Sampled Newton Methods I: Globally Convergent Algorithms

Large scale optimization problems are ubiquitous in machine learning and...
12/21/2021

05/06/2019

### DLIMD: Dictionary Learning based Image-domain Material Decomposition for spectral CT

The potential huge advantage of spectral computed tomography (CT) is its...
01/06/2020

### Uniqueness criteria in multi-energy CT

Multi-Energy Computed Tomography (ME-CT) is a medical imaging modality a...
10/28/2018

### Multi-Spectral Imaging via Computed Tomography (MUSIC) - Comparing Unsupervised Spectral Segmentations for Material Differentiation

Multi-spectral computed tomography is an emerging technology for the non...
07/02/2016

### Sub-sampled Newton Methods with Non-uniform Sampling

We consider the problem of finding the minimizer of a convex function F:...
03/31/2022

### Iterative Reconstruction of the Electron Density and Effective Atomic Number using a Non-Linear Forward Model

For material identification, characterization, and quantification, it is...

## 1 Introduction

Spectral Computed Tomography (CT) is a developing technique for diagnostic and therapy in medical imaging which can gather enhanced physical parameters information compared to conventional CT by exploiting the dependence of the attenuation coefficients of an object with distinct photon energy spectra. This information leads to the estimation of images of basis materials [Alvarez1976], which can be used for quantitative imaging [Alessio2013], reducing beam hardening artifacts and improving the contrast-to-noise ratio [Yu2011]. Recently, spectral CT with photon counting detectors has attracted increasing interest.

In contrast to conventional transmission CT, photon counting detectors can resolve the X-ray photon energy and allow to acquire registered energy selective images [Roessl2007]

with effective reduction of the electronic readout noise, which is of crucial importance for low-dose applications. However, material decomposition algorithms typically lead to a degradation of the signal-to-noise ratio (SNR) and noise amplification

[kapper2010] compared to the unprocessed spectral images. This is still a fundamental problem of spectral CT since it limits the usability of material selective images.

Most spectral CT algorithms separate the process of material decomposition and image reconstruction. Material decomposition is either performed in the projection space prior to image reconstruction [Alvarez2011, Vilches2017] or an image based decomposition algorithm is applied after reconstructing the images corresponding to different photon energy spectra [Maab2009, Gao2011, Kim2015, Kazantsev2018]

by using tensor factorization

[wu2020] or dictionary learning [wu2020image]

. Separating these steps is sub-optimal because the full statistical information contained in the spectral tomographic measurements cannot be exploited. Statistical iterative reconstruction (SIR) techniques provide a data consistent approach to obtaining material selective images using a statistical model of the projection measurements and incorporating prior knowledge about the reconstructed material images. Unfortunately, the loss function induced by the forward model which directly maps the expected spectral projection measurements and the material images is non-convex and difficult to minimize; convex approximations of the loss function such as convex-concave techniques

[Long2014, Barber2016, Schmidt2017] or semi-empirical models [Weidinger2016, Mechlem2018] have been proposed but suffer of high computational cost because of the double-loop nature of the solver.

The main challenges which are attracting considerable interest are related to the fact that the image to be recovered lies in an high-dimensional space and to leverage more sophisticated or learned data-dependent priors which are needed since the estimation problem is generally ill-posed. Regarding the issue of reducing the per-iteration computational cost to solve the reconstruction in the high-dimensional setting, numerous first order solvers which leverage random numerical linear algebra have been deployed in different machine learning and imaging applications. Well established algorithms like Stochastic Gradient Descent (SGD)

[needell2014, konevcny2015], order subsets [kim2014] and Alternating Direction Method of Multiplier (ADMM) have gained considerably attention in spectral CT [Sawatzky2014, ramani2011]. Unfortunately, the iteration complexity of first order methods relies significantly on the condition number of the problem like SGD which has a sub-linear convergence rate to only a neighborhood of the solution [gower2019]. Instead, the information related to the curvature is decisive to improve the convergence by reducing the condition number and to obtain physical meaningful and accurate quantitative estimation. Therefore, accounting for a greedy selection of parameters and step-size is crucial to design first-order algorithms for imaging like CT image reconstruction [kim2014, he2018]. Deterministic second order methods for solving regularized optimization problems with Generalized Linear Models (GLM) have been widely studied [schmidt2011, lee2014] but despite the superior, linear or super-linear, convergence rate of Newton methods compared to first order methods one weakness relies on the high computational cost of calculating the Hessian matrix in high dimensional imaging application [schmidt2011, lee2014]. The Newton method requires at each iteration to solving exactly a linear system with complexity , with the dimension of the measurement vector and the dimension of the input vector, and therefore it is practical only when the dimension

is low or if the Hessian can be factorized in time linear to the input dimension, like by the fast Fourier Transform (FFT)

[nocedal2006]. Instead inexact methods solve approximately the linear system by iterative algorithms like Conjugate Gradient (CG) which does not require to calculate the full Hessian matrix but only operate through vectors products [gene1996]; this is particularly appealing in imaging systems like CT where there are fast implementations of the operator that represents the physical forward model.

In this work, a new second-order algorithm for statistical iterative material decomposition (Denoising-IHS) is developed which enjoys an improved accuracy and computation trade-off. We propose a computational efficient quantitative estimation with second order and parameters-free methods based on randomized linear algebra and regularization induced by denoising. We exploit the recent development of randomized sketches [woodruff2014] and sub-sampling methods [bollapragada2019] for the Hessian matrix to improve upon the computational cost-per-iteration. Unfortunately, these methods are efficient in the regime where the measurement dimension is much larger than the input dimension , i.e. [berahas2020], since the theoretical bound on the embedding of the sketched Hessian matrix requires a number of samples of order . Furthermore, these methods have been mainly analyzed for minimizing smooth and convex cost function and the interesting case of using denoiser-based regularizers has not been deeply studied. Therefore, we aim at developing an algorithm for spectral CT model where the dimension of the measurements is of the same order as the dimension of the image to be reconstructed, i.e. and exploit denoising function to capture the complex structure of the material images to be reconstructed.

### 1.1 Related works

Important contributions related to sketching methods for solving optimization problems refer to the Newton-Sketch algorithm [pilanci2017]

where the Newton equations are solved using an unbiased estimator of the true Hessian obtained by performing a dimensionality reduction using random sketching matrices. Different types of sketching such as Sub-Gaussian matrices, Hadamard or Fourier transforms, sparse Count Sketch matrices can be employed and have been analyzed

[pilanci2017]. Furthermore, sub-sampling Newton methods [byrd2011, erdogdu2015, roosta2016, xu2016]

are based on random row sub-sampling according to a uniform or non-uniform probability distribution

. It is interesting to note that random matrix sub-sampling schemes can be considered a particular type of sketching and they are more efficient especially in imaging problems where the linear forward model has a precise, deterministic structure, like the Random transform for CT imaging. Furthermore, since the Radon matrix

is an high dimensional tall and sparse matrix, the count sketch might be efficient because of multiplication complexity, where stands for the number of non-zero elements of the matrix . Unfortunately, the count sketch requires a bound on the sketch size of order , with the dimension of the vectorized image input, which is practically useless in CT scenarios where is of the same order of the number of measurements .

Our main interest is to develop a non-uniform sub-sampling Newton algorithm for CT imaging by incorporating data-dependent regularizer which is becoming state-of-the art in model-based iterative reconstruction. In imaging applications where the input prior is difficult to model, powerful regularization techniques are based on data-driven models or by exploiting denoising functions, like the Plug-and-play (PnP) approach [venkatakrishnan2013] or the regularization by denoising (RED) [romano2017, reehorst2019, sun2019]. It is interesting how both approaches allow to integrate in the iterative solver either deterministic denoisers, like BM3D [dabov2007]

, NLM, TV or deep learning CNN-based denoisers

[zhang2017]. The RED approach is based on an explicit regularization term [romano2017] and scalable first-order solvers for the RED problem in high dimensions have been proposed [sun2019block]. As far as the authors are aware, the RED approach has been investigated with first order solvers but not with Newton methods mainly because by default it would require to explicitly generate the Jacobian matrix of the denoiser which is infeasible in high-dimensions.

### 1.2 Main Contribution and Relation to Existing Works

In this work, we develop a second order method (Denoising-IHS) which combine the iterative sketching [pilanci2017] for the log-likelihood function and the RED regularization which can be implemented by a generic denoiser through the score matching formulation with application to spectral CT material decomposition. We propose an effective choice for both the sketching scheme and the solver in order to reduce the computational complexity for imaging problems characterized by physical-based convolutional operators.

The overall algorithmic time complexity is where is the time needed at each iteration to construct the sketching, is the time to solve the linear sub-problem inexactly, and is the total number of iterations which is affected by the convergence rate. Considering , taking inspiration from [xu2016]

, we develop a random non-uniform sub-sampling of the measurement where the probabilities are calculated according to a penalized block-leverage score metric. While penalizing the ridge leverage scores allows to reduce the sketch size with provable convergence, this metric exploits the CT measurements block structure which is defined as a product of the number of views and the number of detectors. This means that we randomly sub-sample over the acquired angles and the ridge leverage score associated with each view is calculated over a block constituted by the number of detectors. We take advantage of the convolutional structure of the Radon transform to estimate the leverage scores through the Fourier decomposition and reduce the complexity to

. Regarding the computational time to solve the sub-problem , we employ the CG algorithm since it does not require to store any gradient or Jacobian matrices and all operations can be performed using implicitly operators instead of matrix form. We develop a method to estimate the denoiser’s Jacobian product through automatic differentiation for CNN-based denoisers. Finally, for the term , we provide convergence analysis of the proposed method.

### 1.3 Notation

Matrices or discrete operators and column vectors are written respectively in capital and normal boldface type, i.e. and to distinguish from scalars and real-valued variables written in normal weight. refers to the transpose of a matrix and

refers to a vector of ones. Non-random quantities and random realizations are not distinguished typographically while random variable are written with capital letters. The conditional probability density function of

given is denoted by . A Gaussian random variable with vector mean

and isotropic variance

is denoted by . refers to the vector inner product.

### 1.4 Structure of the Paper

The remainder of this paper is structured as follows: in Section 2, the spectral CT mathematical model with the Gaussian approximation of the noise is described and Section 3 analyses the optimization problem together with a review of the Hessian sketching Newton method and the formulation of the regularization by denoising approach. The proposed algorithms is presented in Section 4 which contains the details of the denoising-based regularizer, the spectral CT data loss and an analysis of the CG inner solver. The further sections describe the crucial blocks of the Denoising-IHS algorithm: the approach to estimating the denoiser directional derivatives needed in the CG solver is described in Section 5 while the sketching strategy based on block leverage scores sub-sampling of the Hessian matrix of the data loss function is presented in Section 6. Finally, in Sections 7 and 8 we show the numerical simulation and real acquisition on different sets of images and we assess the algorithm in terms of accuracy and computation.

## 2 Spectral X-ray CT Model

In this Section, we describe the CT physical measurement process in the continuous domain with the spectrum of the X-ray source beams composed by different energy intervals. The spectral X-ray mathematical model is based on the Beer’s law which provides the X-ray intensity after transmission, of the -th ray at the -th energy interval (bin) as follows

 pki=∫ESk(E)e−∫l∈Liμ(→κ(l),E)dldE+ηki,{i=1,2,…,Nd×Np,k=1,2,…,Nb, (1)

where is the X-ray intensity of the -th detector pixel in the -th detector bin, is the photon flux density, is the number of detector pixels, is the number of projections, denotes the linear attenuation coefficient that is related to the position function and the energy level , is the Gaussian error term for the -th element in the -th energy bin and represents the photon flux density for the -th detector bin, which is the number of incident photons at the energy in the -th energy window.

In the discrete domain, we map the continuous spatially and energy dependent distribution by using a basis material decomposition for representing the object

 μ(→κ(l),E)≈Nm∑m=1cm(E)Nv∑j=1ϕj(→κ)xj,m (2)

where indicates the linear attenuation coefficient of the -th material at the energy level , defines the -th among basis functions associated with a discrete sampling on a Cartesian grid, represents the number of basis materials and is the weight fraction of the -th material in the -th image voxel/pixel. According to the basis material parameterization in Eq. (2), the line integral in Eq. (1) becomes a summation:

 ∫l∈Liμ(→κ(l),E)dl≈Nm∑m=1cm(E)Nv∑j=1xj,m∫l∈Liϕj(→κ(l))dl=Nm∑m=1Nv∑j=1ri,jxj,mcm,e (3)

where represents the element of the system matrix describing the discrete summation along the -path from source through object at pixel position onto each detector. Repeating this over all lines defines the full view linear tomographic system matrix ; by physical design, is a sparse and non-negative matrix with dimensions . For a discrete set of energies of dimension , the discrete model (1) can be written as

 pki=Ne∑e=1ske∑Nmm=1∑Nvj=1ri,jxj,mcm,e+ηki (4)

The source spectrum is modeled as a vector with non-negative entries and the detector response is a positive matrix whose dimension is the product of the number of bins and the number of discrete energies . Therefore, the spectrum for all the energy bins is the matrix obtained as Hadamard point-wise product . Both the source spectrum and the detectors response matrix are assumed to be known.

In matrix notation, denotes the matrix obtained as columns concatenation of the vectorized image of the basis materials. Therefore, , and are collected in a matrix form and , , are concatenated with respect to their energy windows. The corresponding matrix equation of (4) can be represented as

 P=exp(−RXCT)S+N (5)

with a matrix of the size where each entry corresponds to , the linear attenuation coefficient of the energy , and the -th material. We vectorize the nonlinear matrix equation (5) on both sides and linearize to avoid a tensor form for computing second order derivatives and we follow the mathematical model derived in [Hu2019]. In the forward problem, we use the full spectrum, and the matrix is rectangular while for solving the inverse problem, we choose the average in each energy window to represent the corresponding energy spectrum, so and the matrix in is a nonsingular diagonal matrix. Therefore, we multiply and vectorize on both sides of (5) and using the properties of Kronecker products we obtain

 (S−T⊗I)p=exp(−(C⊗R)x)+(S−T⊗I)n (6)

where , and . Denoting and and taking the logarithm, we have .

Using a first order Taylor expansion at , , we obtain the following approximation where . Assuming Gaussian noise , we have

 y|x∼N((C⊗R)x,Σ) (7)

where the inverse covariance matrix is expressed as

 Σ−1=diag(˜p)(S⊗I)diag(p)−1(ST⊗I)diag(˜p) (8)

is a matrix which represents the number of photons of each energy window in the corresponding column and each element of is a positive integer. From expression (8), we can see that the is diagonal since we have assumed that is diagonal.

We consider the MAP estimation problem for the spectral X-ray CT model with Gaussian approximation. Given the measurement vector of dimension , we consider the data-fit loss function as a measure of the distance between the input parameter vector , of dimension , and the measurement vector , with . The loss function is given by the following empirical negative log-likelihood (NLL)

 f(x,y)=−m∑i=1logp(yi|x)≈12m∑i=1σ−1i(aix−yi)2=12∥Ax−y∥2Σ−1 (9)

where . Given the likelihood function in Eq. (9) with , it results that

 ∇xf(x,y)=ATΣ−1(Ax−y),∇2xf(x,y)=ATΣ−1A (10)

It is important to note that in the region of physical interest, i.e. , is bounded.

## 3 Optimization problem

The MAP estimator can be formulated as the following optimization problem

 ^x=argminx∈Rn+f(x,y)+ρσ,ν(x) (11)

where represents the likelihood loss function as described in Eq. (9) and is the regularization term which defines a prior on the vector to be estimated, with the noise level. In the following, we will consider the constrained minimization over the set of non-negative vectors . We focus on a particular non-convex regularizer which can be explicitly defined according to a denoising (RED) function dependent on a set of parameters . The regularization term and the gradient can be expressed as

 ρσ,ν(x)=12νxT(x−Dσ(x)),∇ρσ,ν(x)=1ν(x−Dσ(x)) (12)

if the denoiser is locally homogeneous, i.e. and the its Jacobian is symmetric, . When the denoiser does not follows the above conditions, the relationship between and the gradient expression in Eq. (12) does not hold. In the following Section 3.1, an interpretation of RED for generic denoisers is summarized which is based on the score-matching by denoising framework [reehorst2019].

### 3.1 Denoising Score Matching

In medical imaging applications, the true prior is not available and therefore it is not possible to compute directly the MMSE estimator . Since calculating is infeasible in high dimensions, the main idea is to approximate the optimal MMSE estimator by a generic denoiser, parametrized over , even for denoisers not locally homogeneous or with non-symmetric Jacobian. By utilizing as prior of for the regularization term in (11), it is necessary to evaluate the expression for and . Instead, using an efficient denoiser on the class , although an analytic expression for is either not available or complex to handle, from an algorithmic perspective the prior is not needed but only the gradient and its Jacobian. Therefore, we exploit the approach based on approximating the score instead of the prior directly by invoking the connection between denoising and the score, i.e. the gradient of the log-prior, and its Hessian [hyvarinen2005]. It was shown in [vincent2011] that if we choose a function such that

 θσ(x)=Dσ(x)−xν,∂θσ(x)∂x=1ν(J[Dσ(x)]−I) (13)

where is the Jacobian matrix of the denoising function of dimension , then for any and , the score-matching error is connected to the denoiser approximation error as follows

 ∥∥Dσ(x)−DMMSE,ν(x)∥∥2=∥∥θσ(x)−∇xlogp(x,ν)∥∥2=∥∥∥∂θσ(x)∂x−∇2xlogp(x,ν)∥∥∥2 (14)

It is important to stress that the equivalence between Eq. (13) and (14) holds also in the case where the parameter vector of the denoiser is not optimal. The Jacobian matrix of dimension is computationally prohibitive to calculate and store at each iteration of the solver in high dimensions. In Section 5 we propose a way to approximate the Jacobian-vector products required within the proposed iterative sub-sampled Newton solver.

### 3.2 Review of the Newton Method

In this section we build the analysis of the sub-sampled Newton method by first summarizing the Newton method. Let be a closed, convex, and twice-differentiable function that is bounded below. Given the current iterate , the Newton method generates a new iterate by performing a constrained minimization of the second-order Taylor expansion

 xt+1=argminx∈C[12⟨x−xt,∇2xtg(xt,y)(x−xt)⟩+⟨∇xtg(xt,y),x−xt⟩] (15)

where for the traditional Newton’s method represents the full Hessian matrix. In the following derivations, we simplify the notation of the gradient and the Hessian with and respectively, hiding the dependence over . By setting the gradient respect to equal to zero, it follows the fixed-point update for the exact full Newton’s method

 ∇2g(xt)(x−xt)=−∇g(xt)⇒xt+1=xt−αt[∇2g(xt)]−1∇g(xt) (16)

The two forms of updates presented in (16) highlight that 2 different types of solvers can be employed; in the first case, we solve a linear system in order to estimate the vector and this is generally obtained by using the CG algorithm. Alternatively, it is possible to directly invert the square Hessian matrix. The step-size in (16) can be assumed constant, .

In the following Section, we describe the proposed Denoising-IHS algorithm which uses a stochastic approximation of the Hessian in order to compute with reduced computational cost, instead of calculating an Hessian matrix and solving the linear system in (16) exactly which would be remarkably expensive in terms of time. We assume that the gradient is computed exactly and the Hessian is approximated by random projection to lower dimension by applying a sketching matrix such that .

## 4 Proposed Denoising-based Sub-sampled Newton Method

The Denoising-IHS algorithm minimizes the original MAP-driven regularized composite cost function of the form in Eq. (11) and we utilize a partially sketched Newton update which means that a sketch of the Hessian is performed while retaining the exact form of the Hessian associated with the regularizer, i.e. . The reason behind this choice is while it is possible to reduce the computational complexity related to the loss function by using fast non-uniform rows sampling, for generic black-box denoising function the computational cost of applying the sketching to the regularizer is higher than using the original function. The algorithm assumes that the square root of the Hessian matrix of dimension , indicated as , can be computed for the CT model in (10), it follows that

being a diagonal matrix. The sketched Hessian of the loss function is constructed as

 Htf(xt)=(St[∇2f(xt)]12)TSt[∇2f(xt)]12=G(xt)TG(xt),St∈Rs×n (17)

with , while the full Hessian of the denoising-based regularizer is denoted by with a positive semi-definite matrix. Eq. (17) represents an unbiased estimator for the full Hessian of the loss

 Et[([∇2f(xt)]12)T(St)TSt[∇2f(xt)]12]=∇2f(xt) (18)

where the expectation is taken over the iteration index conditioned over the random sketches selected at previous iterations . The splitting leads to the partially sketched update

 xt+1 = argminx∈C[12(x−xt)THt(xt)(x−xt)+⟨∇g(xt),x−xt⟩] (19) Ht(xt) =

By substituting into (19) the definition for the loss in (9) and the denoising-based regularizer in (13), the expressions for the gradient and the Hessian become

 ∇g(xt) = ∇f(xt,y)+∇ρν,σ(xt)=ATΣ−1(Axt−y)+1ν(Dσ(xt)−xt) G(xt) = St[∇2f(xt)]12=St[Σ−12A] Ht(xt) = G(xt)TG(xt)+1ν(J[Dσ(xt)]−I) (20)

The update is where is obtained as solution of the system of equations

 Ht(xt)pt=−∇g(xt) (21)

We consider to solve the system (21) approximately using the Conjugate gradient method. We focus on the analysis of the accuracy of the solution and the complexity. The inner update is

 xt+1=xt+ptr (22)

where is the approximate solution after

inner iterations of the CG solver. We consider the case where the gradient is fully computed which is a realistic scenario in case of separable loss function, and the Hessian is non-uniformly sampled. To perform an analysis of the complexity of the Sketch Newton-CG method, we remind that for a symmetric positive definite linear system the convergence of CG is not uniform but depends on the difference between the minimum and maximum eigenvalues. In detail, the convergence rate of the CG method varies at every iteration depending on the spectrum

of the positive definite matrix . After steps of the CG method applied to the linear system in (21), the iterate satisfies

 ∥ptr−pt∗∥2C≤(λn−r−λ1λn−r+λ1)2∥pt0−pt∗∥2C,C=Ht(xt)

where indicates the exact solution and . The linear CG algorithm allows to solve the system (21) using only matrix/operator-vector products with . therefore, we use CG as an Hessian-free inner solver which does not require to store the full Hessian matrix. It is possible to obtain sketched Hessian-vector products by performing two matrix-vector products with respect to the sketched square-root Hessian. In particular, we compute for each inner iteration of the CG solver the following variable vector

 vi=Ht(xt)pi=(G(xt)TG(xt)+1ν(J[Dσ(xt)]−I))pi (23)

using an auxiliary vector and performing two steps as follows

 zi=G(xt)pi,vi=G(xt)Tz+1νJ[Dσ(xt)]pi−1νpi (24)

Furthermore, by using the CG solver the matrix is fixed through the inner iterations (being the index of the outer iteration as in Algorithm LABEL:Algo:SN_PnP and the index of the CG solver). In the following, we will not explicitly indicate the index associated with the vector for notation simplicity. Since depends by the Jacobian of the denoiser, as shown in (4), it would be unfeasible to use gradient based solvers which requires to calculate the Jacobian at each inner iteration. Instead, with CG only directional derivatives of the denoiser, i.e. , are needed to be computed and not the full Jacobian; in Section 5 we present fast methods to estimate accurately the directional derivative of learned CNN-based denoisers.

algocf[!h]

Although in the worst case, CG will require iterations to converge (thus requiring the evaluation of products ), the behavior CG can achieve significant progress in the minimization of the objective function after a reduced number of iterations . The following Theorem outlines that the proposed Denoising-IHS algorithm LABEL:Algo:SN_PnP can achieve quadratic convergence, with constants dependent on the minimum and maximum eigenvalues of the full Hessian and the condition number of the problem as defined in the proof in Appendix A

###### Theorem 4.1.

Let be the iterates generated by the non-uniform sampling Newton-CG algorithm with CG iterations are performed. Then,

 Et∥xt+1−x∗∥≤C1∥xt−x∗∥2+(C2+C3)∥xt−x∗∥2 (25)

## 5 Denoiser directional derivative estimators

The major computational cost of the inexact Newton solver relies on the calculation at each outer iteration of the Jacobian matrix, of dimensions , of the denoising function in Eq. (4) whose computation is unfeasible in high dimensions. On a closed look, CG requires instead to calculate explicitly the direction derivative that for the -dimensional vector ,can be computed using finite differences at the cost of a single denoising evaluation as in Eq. (13) through the following expression

 J[Dσ(xt)]pt=limϵ→0Dσ(xt+ϵpt)−Dσ(xt)ϵ (26)

Although the finite differences approach suffers from numerical instabilities and also requires the computationally expensive evaluation of non-linear functions, there is an efficient procedure for computing exactly the Jacobian-vector product for neural networks.

### 5.1 Automatic differentiation for CNN denoisers

The procedure to calculate is based on deriving the right multiplication of Jacobian matrices. We denote the denoiser and let be the right multiplication vector and

be a dummy variable. The left multiplication of the Jacobian matrix, i.e.

, can be calculated by back propagation, which is common in reverse mode automatic differentiation packages

 uT∂D∂xT=uTJ[Dσ(xt)] (27)

We exploit (27) to compute by defining . Since is a vector in , the mapping of can be defined as the function . We can take derivative of with respect to , while providing the left multiplying vector :

 pt∂g∂u=pt∂(uTJ[Dσ(xt)])∂u=ptJ[Dσ(xt)]T=(J[Dσ(xt)]pt)T (28)

## 6 Block Leverage Scores Sketching

We focus on sketches based on non-uniform random block rows sampling which can be easily and efficiently implemented on GPU-based imaging operators such as CT [van2016]. Given a probability distribution , we randomly sample the rows of the matrix a total of times with replacement from the given probability distribution. The rows of are independent and take the values with probability for , where is the -th canonical basis vector. The discrete probability distribution is chosen according to the ridge leverage scores.

###### Definition 6.1.

(Ridge Leverage Scores) Given and a scalar , then for , the -th leverage scores of is defined as .

We consider to estimate the ridge leverage scores of the square root matrix of the Hessian matrix of the loss function, i.e. as defined in (10). The idea is to adaptively penalize the leverage scores of the Hessian-related matrix according to the noise level estimated through the trace of the Jacobian of the denoiser based regularizer . Defining , we estimate the trace

 Hdρ=Trace[Hρ]=1ν(Trace[J[Dσ(xt)]]−1) (29)

We exploit the property that if is independent of (where the expectation is taken over

). Therefore, given a noise realization normally distributed

, we propose to compute the vector instead of the matrix . The expectation over is calculated using a Monte-Carlo method, i.e. generate i.i.d. samples vectors, estimate the divergence for each vector and obtain the global divergence by averaging:

 Trace[J[Dσ(xt)]]=1mK∑j=1nTj(Dσ(xt+ϵnj)−Dσ(xt)ϵ) (30)

Given the vectorized image lying in a high dimensional space, it has been empirically observed [ramani2008] that we can accurately approximate the expected value using only a single random sample, i.e. . In all the simulations we have used the Monte Carlo method with .

###### Definition 6.2.

(Block partial leverage scores) Given and the scalar defined in (29), let the leverage scores of the matrix . The block partial leverage score for the -th block is defined as

 lHdρi(B)=iNd∑j=(i−1)Nd+1lj,i=1,…,Np (31)

and the non-uniform sampling distribution is

###### Theorem 6.1 ([xu2016]).

Given , and , let be the block partial leverage scores and Let construct the sketch , as in (17), by sampling the -th block of with probability and rescaling it by . If with probability at least , then the following condition holds

 ∥∥(GTSTSG+Hdρ)−(GTG+Hdρ)∥∥2≤ϵ∥∥GTG+Hdρ∥∥2 (32)
###### Corollary 6.1.

Let focus on the case of one-row sampling, and

a scaled identity matrix,

. Let define

as the singular value decomposition. Then, the leverage score is

 lγi=gi(GTG+mγI)†gi=n∑j=1σ2jσ2j+mγu2ij,i=1,…,m (33)

and therefore, the effective -dimension, , is much smaller than

 neff(G)=m∑i=1lγi=n∑j=1σ2jσ2j+mγ≪n (34)

if

is larger than the majority of singular values of

.

It is worth noting we have used the notion of SVD to show the property of the dimensionality reduction but using the SVD is not practical for calculating the leverage scores. Instead, we will show how to use fast matrix decomposition through the FFT.

### 6.1 Convolutional operators

In principle, computing the leverage scores for a generic matrix has a time complexity equivalent to doing an SVD which is computationally demanding in high dimensions. We focus our attention on convolutional operator which can be used to describe the structure of the physical CT operator. We show that the convolutional structure allows to compute efficiently the leverage scores which determines the non-uniform random sub-sampling of the Hessian.

###### Definition 6.3.

(CT Convolutional operator) The continuous 2D X-ray Transform is a convolutional linear operator which computes the line integral of a function in the 2D input space. The Fourier central slice Theorem states that where is the 2D Fourier transform (FT), is the coordinate transform operator from Cartesian to polar coordinates, and is the inverse 1D FT with respect to .

The output of the linear operator is the sinogram that is a function of and the polar space variable . Both and

are normal-convolutional operators since in the frequency domain

 RTR=FH(Ωω−1)T(F−1γ)HF−1γΩω−1F(a)=FH|detJω|F(b)=FHD(1|ρ|)F (35)

where (a) follows from being an unitary operator and (b) derives from the back-projection CT filter formulation where defines the Jacobian of and is the diagonal polar Fourier space operator. From the continuous to the discrete domain, the Fourier-based Radon transform can be written as [matej2004iterative, o2006fourier]:

 R=F−1γΩω−1F (36)

where is the 2D unitary discrete Fourier transform operator which takes as input the image of dimensions , with . The operator performs a discretized version of the continuous coordinate transform in (6.3) which outputs a matrix of polar coordinate samples that are equally-spaced along at the discrete locations for . The non-uniform FFT operator takes as input the input image matrix and output a matrix of dimensions ; applies the 1D unitary discrete Fourier transform (DFT) matrix separately to each of the radial lines vectors of dimension and it is defined as the Kronecker product between a 1D DFT matrix and the identity matrix, i.e. . Therefore, the final output is a vector of dimensions where is the number of detectors and is the number of angles (or number of projections) in agreement with (3).

## 7 Numerical simulations

We have preliminary tested the Denoising-IHS algorithm using the following parameters and geometry; the X-ray spectrum is obtained by using the Spektr [siewerdsen2004] with tube potential keV and initial average photon flux of . Figure 1 shows the spectrum in the energy interval keV of the five discrete selected energy bins. In Figure 1(a), the mass attenuation coefficients of the materials to be estimated are shown; it is interesting to note that the two distinct K-edges associated with the iodine and gadolinium appear at energies corresponding to two distinct energy bins as shown in Figure 1(b).

To generate the multichannel projections, we use the ASTRA-toolbox [van2016] to implement the forward operator associated with a fan-beam CT geometry; the distance from the source to the detector is 45 cm, the source to object distance is 31 cm and the detector cell size is 50 . The pixel resolution of the reconstructed image is and the data is simulated on a double grid of

to avoid the inverse crime, the number of projection angles is set to 360 and the Poisson distributed noise has been added to the projection data. Figure

2(a) shows the plot of the probabilities of sampling each projection according to the leverage score metric. This plot is obtained by calculating exactly the leverage score of the ASTRA operator offline using the power method. Many projections have a very low score and this shows that a random one-by-one sampling would lose an important quantity of information or it would require to run the algorithm on a number of samples of the order of the projection’s dimension.

Instead, the block sampling procedure takes advantage, by averaging, of this block structure of the leverage scores induced by the CT operator as depicted in Figure 2(b) where each vertical red line corresponds to an adjacent angle and the inner point coincides with the detector array elements. Although this plot represents the exact sampling probabilities, for computational constraint the algorithm estimates these probabilities by exploiting the spectral decomposition of the ASTRA operator through (36).

We have tested the proposed algorithm on a synthetic phantom composed of three basis materials, water, iodine and gadolinium. The geometric shape of the 2D phantom is build used the model number 4 of the TomoPhantom Matlab software [kazantsev2018Tomo] containing three materials, water, iodine and gadolinium. The phantom is constituted of different circles with increasing diameter as shown in Figure 3(a). As denoiser-based regularizer , we have used the implementation of the U-Net [ronneberger2015]

trained on a large set of 1000 images obtained by rotating and placing in different position the circles that are included in the test image. Furthermore, we have used the Autograd software in pytorch to compute the Jacobian-vector product of the U-Net denoiser

[paszke2017]. The Hessian block sub-sampling for the Denoising-IHS was set to . Figure 3(b) shows the quantitative reconstruction of the three materials while Figure 3(c) reports the results obtained using the multi-material framework (OS-PWSQS) proposed in [Long2014] which considers a model-based optimization with an edge-preserving hyperbola regularizer. By comparing the images for each materials, it is possible to clearly note how the the Denoising-IHS approach is able to separate the each material and to reduces the noise in the reconstruction, as can be seen in particular for the water and iodine. The iodine image in figure 3(c) indicates that algorithm [Long2014] does not manage to fully separate the iodine and gadolinium materials.

Figure 4(a) shows the comparison between the root-mean-square error (RMSE), i.e. of the vector containing the concatenation of the estimated materials respect to the ground truth between the OS-PWSQS algorithm and the proposed algorithm. The solution achieved by the two algorithms is different because the cost function, both data fidelity term and regularizer, that OS-PWSQS [Long2014] aims at minimizing is different from (11). By analyzing Figure 4(a), it is possible to highlight that the Denoising-IHS algorithm achieves a more accurate solution. Looking in detail at the blue curve, it is possible to note some local plateaus with sudden reduction of the error and this is in agreement with the fact that in the outer loop the divergence of the denoiser is calculated to update the ridge leverage scores (29) which has a cost equivalent of applying the denoiser as stated in (30). Furthermore, Figure 4(b) shows the comparison in objective function between the convergence plot (semi-log) in dB of the Denoising-IHS algorithm and the solver which consider the full Hessian of the data loss at each iteration. This plot highlights how the strategy for block sub-sampling of the Hessian leads to a noticeable reduction in computation (almost 2.2 times) while both reach a similar minimum value in the cost function. We can analyze that at early iterations when the estimate is inaccurate both algorithms have similar accuracy/computation trade-off, while going further the Newton solver with full Hessian requires higher computation per iteration and overall it converges close to the Denoising-IHS solution but with increased computational time. From Figure 4(b) the time reduction is slightly less compared to the measurement sub-sampling rate and this could be anticipated since the ASTRA operator is not exactly linear in the number of measurements and the Denoising-IHS requires an additional denoising step at each outer iteration for the sampling probabilities update.

The following numerical test considers a more structured dataset, derived from the Adult Reference Computational Phantom (ICRP Publication 110, 2009), which is a segmented image of defined chemical composition to represent real tissues [mason2017]. Figure 5(a) shows two slices selected for testing through the pelvis with image resolution with a display window of .

Figure 5(b) reports the qualitative estimation of the bone component in the testing images; it is possible to note how the Denoising-IHS algorithm is able to tackle the contours of the bone precisely in both images. Figure 5(c) shows the qualitative estimation of the fat component in the testing images; in this case the Denoising-IHS algorithm manages to estimate correctly most of the complicated structure in the pelvis with some spurious pixels missing.

To evaluate the quantitative estimation, Figure 6 reports the normalized oracle mass attenuation of the fat and the estimated one with Denoising-IHS related to longitudinal section at pixel of the first test image.

This plot confirms that the proposed algorithm is able to accurately estimate the mass attenuation with and error which is less than 3%. In particular, Table 1 reports the root mean square error (RMSE) for different materials compared to the oracle ground truth.

## 8 Preliminary Experimental Results

We have assessed the algorithm with a real acquisition using a photon counting detector CT scanner depicted in Figure 7(a) with a specimen in (b) containing cylinders of iodine and PMMA at different concentrations. The geometry is cone beam with source to axis of rotation distance of 8.1 cm and source to origin distance of 11.4 cm.

Figure 8(a) show the decrease of normalized cost function for the Denoising-IHS subsamplig case and full Hessian; it is worth noting the reduction in computation although is slower compared to the results achieved in the numerical simulations. Table (2) and Figures 8(b-c) show the quantitative and qualitative error in concentration of the iodine.

## 9 Conclusion

We propose a preliminary study for an efficient second order optimization method, Denoising-IHS, for multi-material decomposition in spectral CT which combines dimensionality reduction through Hessian matrix non-uniform random sub-sampling and regularization by denoising. A practical procedure for efficiently calculating the non-uniform sampling probabilities is shown which involves Fourier spectral approximation for estimating the ridge leverage scores. Furthermore, we analyze the CG solver which does not require the computation of the full Jacobian of the denoiser but only directional derivatives with a substantial reduction in computation. This problem is tested on numerical spectral CT simulations with Poisson noise model. We evaluate the convergence of the algorithm and we are able to show improvements both in terms of qualitative and quantitative accuracy compared to state-of-the-art algorithms. Furthermore, we show reduction in the computational complexity respect to the Newton method involving the full Hessian matrix of the loss function. Further analysis to assess the proposed framework on more complex phantom with experimental acquisitions will be conducted together with a tighter analysis on the lower bounds for the dimensional reduction that it is possible to theoretically achieve and an analysis of the robustness of the algorithm to training sample dimension.

## Acknowledgments

The research leading to these results has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement no. 713683 (COFUNDfellowsDTU). We would like to sincerely thank Jan Kehres for providing the experimental spectral CT dataset. The experimental spectral CT dataset used in Section 8 is available in [Perelli2021data].

## Appendix A Convergence Analysis

Recalling the definition of the function that we aim at minimizing as in (11) where we hide for simplicity the dependency over , we use the denotation