X-ray computed tomography (CT) is an important application of inverse problems: it reconstructs the attenuation coefficients of an object from the damping of X-rays. Since different materials have different attenuation coefficients, the X-ray CT technique allows us to see the interior of the object. As X-ray CT is widely applied in many fields, numerous reconstruction methods have been proposed and used in industry, such as the filtered back projection algorithm [kuchment2014radon] and algebraic reconstruction techniques [kak2001principles, hansen2012air].
After reconstruction, image segmentation techniques are very often applied to distinguish regions or objects of interest. A main drawback of separating reconstruction and segmentation is error propagation, i.e., errors in the reconstruction carry over into the segmentation. To overcome this drawback, several simultaneous reconstruction and segmentation (SRS) methods have been proposed in recent years. As far as we know, the first SRS method in CT was proposed in [ramlau2007mumford], based on a Mumford-Shah level-set approach. In [van2008simultaneous], another SRS method based on the hidden Markov measure field model (HMMFM) [marroquin2003hidden] was proposed, where the segmentation result is obtained through a probability map. For CT with shadowed data, an SRS method based on the Potts model [potts1952some] was proposed in [10.1007/978-3-319-58771-4_25]. In order to segment objects that have different patterns during CT reconstruction, in , the dictionary learning technique was introduced into the SRS method.
Following the idea from [marroquin2003hidden], in [doi:10.1080/17415977.2015.1124428] the means and variances of the segmentation classes were used as priors and a new SRS method was proposed. Numerical results show that this SRS method can significantly improve the accuracy of both reconstruction and segmentation. Further, this method was extended by dropping the prior information on the variances, see [10.1007/978-3-319-58771-4_21]. In [doi:10.1080/17415977.2015.1124428], by using Bayes' rule and the maximum a posteriori (MAP) estimate with proper prior information on the means and variances of the classes, the authors proposed the following variational model:
where $A$ is the system matrix of the CT scanner, $x$ is the image of attenuation coefficients, and $b$ contains all measurements. Here, we assume that the image consists of $K$ classes, and define $\delta$ with $\delta_{jk}$ as the probability of pixel $j$ belonging to the $k$th class. Then we have the constraints $\sum_{k=1}^{K}\delta_{jk}=1$ and $\delta_{jk}\ge 0$ for all $j$ and $k$. Furthermore, $\mu_{k}$ and $\sigma_{k}$ are the mean and standard deviation of the $k$th class, respectively. In addition, $\alpha$ collects the regularization parameters, and $R(\delta)$ denotes the regularization according to the prior information. In [doi:10.1080/17415977.2015.1124428], Tikhonov regularization [tikhonov1943stability] and total variation (TV) regularization have been tested. The last term of (1), which we call the log-sum term, makes the constrained minimization problem (1) very difficult to solve, because the logarithm and summation operators are non-separable. When solving (1) in [doi:10.1080/17415977.2015.1124428], the authors had to use an approximation of the log-sum term. Moreover, their algorithm is time-consuming, which limits its application to large-scale CT problems.
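To fix ideas, the model can be sketched in the following general form, reconstructed from the surrounding description; the symbols $A$ (system matrix), $x$ (attenuation image), $b$ (measurements), $\delta_{jk}$ (class probabilities), $\mu_k,\sigma_k$ (class mean and standard deviation), and $\alpha, R$ (regularization) are our notational choices, not a quotation from [doi:10.1080/17415977.2015.1124428]:

```latex
\min_{x,\,\delta}\;
  \tfrac{1}{2}\,\|Ax-b\|_2^2
  \;+\; \alpha\,R(\delta)
  \;-\; \sum_{j}\log\!\Big(\sum_{k=1}^{K}
      \frac{\delta_{jk}}{\sqrt{2\pi}\,\sigma_k}
      \exp\Big(-\frac{(x_j-\mu_k)^2}{2\sigma_k^2}\Big)\Big)
\quad\text{s.t.}\quad
  \sum_{k=1}^{K}\delta_{jk}=1,\;\; \delta_{jk}\ge 0\;\;\forall j,k.
```

The last term is the log-sum term discussed in the text: a sum over pixels of the logarithm of a $\delta$-weighted Gaussian mixture.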
In this paper, we propose a new method to solve the model (1). Inspired by the work in [5, 2], the log-sum term can be transformed into a sum-log term by introducing an auxiliary variable $w$; the sub-problems then become much easier to solve. Numerical results show that our method provides results comparable to those in [doi:10.1080/17415977.2015.1124428] with much less CPU time.
The rest of the paper is organized as follows. In Section 2, the method in [doi:10.1080/17415977.2015.1124428] is briefly reviewed. In Section 3, by transforming the log-sum term, we introduce a new method that can solve the minimization problem in the model (1) without simplification. Furthermore, an improved model is proposed and some convergence results are given. In Section 4, numerical results are presented. Finally, conclusions are drawn in Section 5.
2 The method in [doi:10.1080/17415977.2015.1124428]
In [doi:10.1080/17415977.2015.1124428], the authors use the alternating minimization method [csiszar1984information] to solve the model (1) as follows:
The last term in (1) is a log-sum term, which is non-separable with respect to the classification variable $\delta$; hence the sub-problem with respect to $\delta$ is very difficult to minimize. In order to solve the sub-problem (2), i.e.,
the authors applied a two-step approximation to simplify the log-sum term. Denote
In the first step, this density is approximated by a "flat" Gaussian distribution
Clearly, after using a single Gaussian distribution to approximate the sum of several Gaussian distributions, the summation operator of the log-sum term in (1) is eliminated, and the reconstruction is obtained by solving a simple least-squares problem.
The sub-problem (3) is
with the constraints $\sum_{k=1}^{K}\delta_{jk}=1$ and $\delta_{jk}\ge 0$ for all $j$ and $k$. Because of the constraints and the non-differentiable TV term, a general constrained gradient descent method such as the Frank-Wolfe algorithm [bertsekas1999nonlinear] could be applied here. However, due to the non-smoothness of the TV term, gradient descent algorithms are well known to be very inefficient.
Obviously, solving the minimization problem in (1) is challenging due to the log-sum term. In this paper, we take advantage of the commutativity of the log-sum operations studied in [5, 2] and transform the model (1) into several easy-to-solve convex sub-problems. Moreover, we provide some convergence results that were missing in [doi:10.1080/17415977.2015.1124428].
3 Our method and convergence results
In this section, we propose a new method that solves the model (1), and we provide some convergence results for it.
3.1 Proposed method
Due to the log-sum term in (1), the sub-problem with respect to $\delta$ is non-convex and very difficult to solve. Inspired by the transformation of the log-sum operations introduced in [5, 2], we can transform the log-sum term into a much simpler form. To do so, we need the following proposition.
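The key transformation can be stated via the following well-known variational form of the logarithm, which underlies the sum-log rewriting in [5, 2]; here $a_k>0$ and $\Delta$ denotes the probability simplex:

```latex
\log\sum_{k=1}^{K} a_k
  \;=\; \max_{w\in\Delta}\;\sum_{k=1}^{K} w_k\big(\log a_k-\log w_k\big),
\qquad
\Delta=\Big\{w\in\mathbb{R}^{K} : w_k\ge 0,\ \textstyle\sum_{k=1}^{K}w_k=1\Big\},
```

with maximizer $w_k^{\star}=a_k/\sum_{l}a_l$; the inequality direction follows from Jensen's inequality applied to the concave logarithm. Negating both sides turns the log-sum term into a separable sum-log term that is minimized jointly over the original variables and the auxiliary variable $w$.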
According to the log-sum term in (1), we define
where $w$ is the introduced auxiliary variable and $\varepsilon>0$ is a small constant. Note that here we let $w_{jk}\ge\varepsilon$ to avoid $w_{jk}$ being 0. Furthermore, in our method we use TV regularization on each column of $\delta$, i.e.,
where $j_{r}$ and $j_{d}$ represent the neighboring pixels of pixel $j$ in the horizontal and vertical directions, respectively. At the boundary of the image domain, the value of the nearest pixel is repeated.
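The column-wise TV term can be illustrated by the following Python/NumPy sketch for a single 2-D image, with the replicated-pixel (Neumann) boundary handling described above; the function name and the anisotropic form (sum of absolute horizontal and vertical differences) are our assumptions, and the paper's experiments use MATLAB:

```python
import numpy as np

def tv_aniso(img):
    """Anisotropic TV with replicated (Neumann) boundary:
    sum over pixels of |right-neighbor difference| + |bottom-neighbor
    difference|. At the boundary the nearest pixel value is repeated,
    so the one-sided differences there vanish."""
    # Replicate the last column/row before differencing (Neumann BC).
    dx = np.abs(np.diff(img, axis=1, append=img[:, -1:]))
    dy = np.abs(np.diff(img, axis=0, append=img[-1:, :]))
    return float(dx.sum() + dy.sum())
```

For a constant image the value is zero, and each unit jump between horizontal or vertical neighbors contributes one unit.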
In order to solve the minimization problem (8), we apply the alternating minimization method [csiszar1984information], i.e., we solve the sub-problems with respect to $x$, $\delta$ and $w$ alternately. The detailed algorithm is given in Algorithm 1.
In Algorithm 1, the objective function of the sub-problem (9) with respect to $x$ is quadratic, so it can be solved efficiently by the CGLS method [bjorck1996numerical]. Since the sub-problem (11) is separable, we can solve it element-wise. From the first-order optimality condition together with the Lagrange multiplier technique, we easily obtain the closed-form solution:
In the following subsection, we will focus on solving the sub-problem (10).
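As a concrete illustration of the CGLS solver used for the quadratic sub-problem, here is a minimal dense-matrix sketch in Python/NumPy (the experiments in this paper use MATLAB; the function name, defaults, and stopping test are our own choices):

```python
import numpy as np

def cgls(A, b, max_iter=100, tol=1e-6):
    """Conjugate gradient for least squares: iteratively minimizes
    ||A x - b||_2 without forming A^T A explicitly (Bjorck, 1996)."""
    x = np.zeros(A.shape[1])
    r = b - A @ x              # residual in the data space
    s = A.T @ r                # gradient direction A^T r
    p = s.copy()
    norm_s0 = np.linalg.norm(s)
    gamma = s @ s
    for _ in range(max_iter):
        q = A @ p
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        if np.sqrt(gamma_new) <= tol * norm_s0:
            break                       # normal-equation residual small
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x
```

In practice `A` would be the (sparse) CT system matrix, for which only matrix-vector products are needed.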
3.1.1 The sub-problem with respect to $\delta$
The difficulties in solving the minimization problem (10) come mainly from the non-differentiable TV term, the highly nonlinear and non-quadratic logarithmic term, and the simplex constraint on $\delta$. In order to split them apart, we introduce two auxiliary variables, each set equal to $\delta$, so that $\delta$ and the two auxiliary variables each carry one of the three difficulties. Now (10) becomes
Then, we apply the alternating direction method of multipliers (ADMM) [boyd2011distributed] to solve it.
By introducing two Lagrangian multipliers for the two linear constraints, we obtain the augmented Lagrangian:
where $\rho_{1}$ and $\rho_{2}$ are positive penalty parameters. According to ADMM, we need to solve
in order to obtain the updated $\delta$ in Algorithm 1, and the iterates in ADMM are generated as follows:
Sub-problem (14) can be solved column-by-column with respect to $\delta$, where each column is handled by the split Bregman method with the Neumann boundary condition. In sub-problem (15), each element can be solved separately. Based on the first-order optimality condition, we obtain that the minimizer should satisfy the following equation:
Due to the logarithmic operation, the solution needs to be positive, and it has the closed form:
Sub-problem (16) is a least-squares problem with a constraint on a convex set, and one approximate solution can be obtained by
where $\varepsilon$ is a pre-defined small positive number.
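One plausible realization of this approximate constraint handling, with the small lower bound $\varepsilon$, is the clip-and-renormalize sketch below (Python/NumPy; the paper's exact update formula is not reproduced here, so this is an assumption of its general shape):

```python
import numpy as np

def approx_simplex(V, eps=1e-4):
    """Approximate row-wise projection onto the probability simplex with
    entries bounded below by eps: clip, then renormalize each row.
    A sketch of one plausible realization, not the paper's formula."""
    W = np.maximum(V, eps)               # enforce the lower bound eps
    W = W / W.sum(axis=1, keepdims=True)  # make each row sum to 1
    return W
```

The result is only an approximate projection (after renormalization an entry can dip slightly below $\varepsilon$), which matches the "approximate solution" wording above.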
3.2 An improved model based on (8)
Looking back at the model (8), we see that the only regularization term in the model is on $\delta$. In [doi:10.1080/17415977.2015.1124428], due to the log-sum term in (1), the segmentation information represented in $\delta$ is utilized to regularize the reconstruction, and there was no smoothing requirement on $x$. In the sub-problem (9) of the proposed method, however, $x$ is updated by using the unregularized $w$, which brings a risk: isolated points in the unregularized $w$ might lead to isolated points in the reconstruction result $x$. This phenomenon is shown in Figure 3 in our numerical results. In order to further improve the reconstruction, we add Tikhonov regularization on $x$ and modify the model (8) as follows:
where $\nabla$ denotes the discrete gradient operator using the forward finite difference scheme with the Neumann boundary condition, and $\beta$ is a regularization parameter.
We list the detailed algorithm in Algorithm 2. The only difference between Algorithm 1 and Algorithm 2 is the added Tikhonov regularization term in the sub-problem (20) with respect to $x$. With Tikhonov regularization the objective function is still quadratic, so we can apply the CGLS method [bjorck1996numerical] to solve (20) efficiently.
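The structure of the Tikhonov-regularized sub-problem can be made concrete: minimizing $\|Ax-b\|_2^2+\beta\|\nabla x\|_2^2$ is equivalent to an ordinary least-squares problem on the stacked operator $[A;\sqrt{\beta}\,\nabla]$ with data $[b;0]$, which is exactly the form a CGLS-type solver accepts. A 1-D Python/NumPy sketch, with our own helper names and a dense solver standing in for CGLS:

```python
import numpy as np

def grad_1d(n):
    """Forward-difference operator with Neumann boundary (last row zero)."""
    D = np.zeros((n, n))
    for i in range(n - 1):
        D[i, i], D[i, i + 1] = -1.0, 1.0
    return D

def tikhonov_ls(A, b, beta):
    """Solve min_x ||A x - b||^2 + beta * ||D x||^2 by stacking:
    an ordinary least-squares problem on [A; sqrt(beta) D]."""
    D = grad_1d(A.shape[1])
    K = np.vstack([A, np.sqrt(beta) * D])
    rhs = np.concatenate([b, np.zeros(D.shape[0])])
    x, *_ = np.linalg.lstsq(K, rhs, rcond=None)
    return x
```

As $\beta$ grows, the gradient penalty dominates and the solution is pushed toward a constant image, which illustrates the smoothing effect discussed above.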
3.3 Convergence results
For the sequence generated by Algorithm 2, every cluster point is a coordinatewise minimum point of the objective function in (19).
We use Theorem 5.1 in [Tseng2001] to prove our convergence result. To keep the paper self-contained, we list Theorem 5.1 in the appendix; for more details, see [Tseng2001]. Here we verify that our algorithm satisfies the assumptions of the theorem.
According to (19), we set
We have the following statements:
Assumption B1 (in Definition 6.3) is satisfied, because $f_{0}$ is continuous on its domain.
Assumption B3 (in Definition 6.3) is satisfied, because $f_{0}$ is continuous on its domain and $f_{1},\dots,f_{N}$ are lsc.
Assumption C2 (in Definition 6.3) is satisfied, because the domain of $f_{0}$ is a Cartesian product of the coordinate domains.
4 Numerical experiments
In this section, we present numerical results to demonstrate the performance of our methods. We compare them with the method introduced in [doi:10.1080/17415977.2015.1124428], where a simplified model is solved instead of (1). All numerical tests are performed on a Linux server with a 2.30 GHz CPU and MATLAB R2018a. In the tests, when we use the CGLS method to solve the sub-problem with respect to $x$ in (8) or (19), we set the maximum number of iterations to 100 and the stopping rule as
where $i$ is the inner iteration index. When we solve the sub-problem with respect to $\delta$, we stop ADMM after 50 iterations or when the condition
is satisfied. In the split Bregman method for solving (14), the stopping rule is
where $l$ is the iteration index inside the split Bregman method. The tolerance parameter is set to $10^{-4}$ for all experiments.
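The relative-change stopping criteria above all share one pattern; a generic, hypothetical helper illustrating it (Python/NumPy, our own naming, not code from the paper):

```python
import numpy as np

def run_with_stopping(step, x0, tol=1e-4, max_iter=100):
    """Generic fixed-point loop with the relative-change stopping rule
    ||x_new - x_old|| / ||x_old|| <= tol, mirroring the criteria used
    for the CGLS / ADMM / split Bregman inner loops."""
    x = x0
    for k in range(1, max_iter + 1):
        x_new = step(x)
        # Guard against division by zero at a zero iterate.
        if np.linalg.norm(x_new - x) <= tol * max(np.linalg.norm(x), 1e-30):
            return x_new, k
        x = x_new
    return x, max_iter
```

For example, the contraction `step = lambda x: 0.5 * (x + 1.0)` drives the iterate toward its fixed point 1 and the loop exits once the relative change drops below `tol`.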
4.1 Experimental settings
In our numerical experiments, the phantoms are generated by the AIR Tools package [hansen2012air] with the command phantomgallery, and the system matrix of the CT scanner is obtained by calling paralleltomo. Unless otherwise mentioned, the size of the phantoms is 64-by-64, the number of detector pixels is 91, and the projection angles are equally spaced. The underdetermined rate of our inverse problem, i.e., the ratio between the number of measurements and the number of unknowns, is then 0.667. In our tests, we assume that the measurements are corrupted by additive white Gaussian noise with mean 0 and a standard deviation determined by the noise level $\eta$ and the noise-free data from the true attenuation coefficients.
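The noise generation can be sketched as follows, writing $\eta$ for the noise level and $b^{\mathrm{exact}}$ for the noise-free data; scaling the standard deviation as $\eta\,\|b^{\mathrm{exact}}\|_2/\sqrt{m}$ is our assumption of a common convention (used, e.g., in AIR Tools demos), not a quotation of the paper's exact formula:

```python
import numpy as np

def add_noise(b_exact, eta, rng=None):
    """Additive white Gaussian noise at relative level eta.
    Assumed convention: std = eta * ||b_exact||_2 / sqrt(m), which makes
    the expected relative data error ||e||/||b|| approximately eta."""
    rng = np.random.default_rng(rng)
    m = b_exact.size
    sigma = eta * np.linalg.norm(b_exact) / np.sqrt(m)
    return b_exact + sigma * rng.standard_normal(m)
```

With this scaling, the realized relative error concentrates tightly around $\eta$ for large $m$.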
In order to illustrate the performance of the methods, we define the reconstruction error and segmentation error as
where $x^{*}$ is the reconstruction result, the Dirac function is used to compare labels, and $c_{j}$ is the label of pixel $j$ in the segmentation result, which is given by
and $c_{j}^{\mathrm{true}}$ denotes the true label for pixel $j$. The regularization parameters in our method, as well as in the method proposed in [doi:10.1080/17415977.2015.1124428], are chosen as the ones that give the smallest error value.
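The two error measures and the argmax labeling can be sketched in Python/NumPy as follows; the relative-norm normalization of the reconstruction error is one common convention and an assumption on our part, since the exact form of (23) is not reproduced in the text:

```python
import numpy as np

def seg_labels(delta):
    """Label each pixel by its class of highest probability (argmax),
    as in the labeling rule (24)."""
    return np.argmax(delta, axis=1)

def rec_error(x, x_true):
    """Relative 2-norm reconstruction error (assumed normalization)."""
    return np.linalg.norm(x - x_true) / np.linalg.norm(x_true)

def seg_error(labels, labels_true):
    """Fraction of mislabelled pixels."""
    return float(np.mean(labels != labels_true))
```

Here `delta` is an $n\times K$ array of per-pixel class probabilities, so each row yields one label.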
4.2 Comparison on a piecewise constant phantom
We first compare our methods, solving (8) and (19), with the one proposed in [doi:10.1080/17415977.2015.1124428] on an 8-class piecewise constant phantom. The noise level is set to 0.05, the prior mean and standard deviation are specified for each of the 8 classes, and the remaining parameters in our methods are set accordingly.
In Table 1, we list the reconstruction error and segmentation error defined in (23), together with the CPU times in seconds, for all three methods. All results shown in Table 1 are averages over 50 different noise realizations. It can be seen that the methods solving (8) and (19) achieve almost the same segmentation results, and both are better than the one from [doi:10.1080/17415977.2015.1124428]. Comparing the reconstruction errors, we see that the method solving (8) gives the largest error, due to insufficient regularization on the reconstruction. By adding Tikhonov regularization in the model (19), we obtain results comparable to the simplified model in [doi:10.1080/17415977.2015.1124428]. In addition, both our methods cost similar CPU time, which is less than that of [doi:10.1080/17415977.2015.1124428] by around a factor of 5.
| Method | Reconstruction error | Segmentation error | CPU time (s) |
|---|---|---|---|
| Method in [doi:10.1080/17415977.2015.1124428] | 0.086 | 0.033 | 469.7 |
| Method by solving (8) | 0.106 | 0.027 | 93.2 |
| Method by solving (19) | 0.088 | 0.026 | 93.9 |
In order to compare the results visually, Figures 1 and 2 show the reconstruction and segmentation results for the best and worst cases among these 50 tests, i.e., the results with the smallest and largest errors, respectively. It is clear that our methods provide more accurate segmentation results, which can be seen from the dark gray class in the middle red region; see the third row for zoomed-in images. For the reconstruction results, the method in [doi:10.1080/17415977.2015.1124428] gives much smoother results than both proposed methods.
In Figure 3, we use one example from the 50 tests to show the difference between $w$ in (8) and (19) and $\delta$ in [doi:10.1080/17415977.2015.1124428]. We treat $w$ as a segmentation using (24) and show it in the third row of Figure 3. Note that in the method proposed in [doi:10.1080/17415977.2015.1124428] there is no variable $w$. As can be seen, in the method solving (8) the isolated points in $w$, marked by red squares, lead to isolated points in the reconstruction. After adding regularization on $x$ in (19), the number of isolated points is clearly reduced, and the smoothness of the reconstruction result from (19) is plainly visible. This indicates that the added regularization term helps improve the reconstruction. Regarding the reconstruction from [doi:10.1080/17415977.2015.1124428], although there is no regularization on $x$, the reconstruction is still rather smooth without isolated points. This suggests that the simplification of the model in [doi:10.1080/17415977.2015.1124428] implicitly imposes smoothing regularization on the reconstruction through the segmentation $\delta$.
4.3 Comparison on a smooth phantom
In this experiment, we compare the methods on a smooth phantom with 3 different classes, which is much more difficult to segment than a piecewise constant phantom. The noise level and the prior information on the mean and standard deviation of each of the three classes are specified analogously, and the other parameters are set accordingly.
In Table 2, we list the averages of the reconstruction errors, segmentation errors and CPU times over 50 tests with different noise realizations. We can see that our methods solving the models (8) and (19) achieve similar segmentation results, while the method proposed in [doi:10.1080/17415977.2015.1124428] provides the smallest segmentation error. Comparing the reconstruction results, our method with the model (19) gives the best reconstruction, followed by [doi:10.1080/17415977.2015.1124428]. It is obvious that the Tikhonov regularization in the model (19) plays an important role, reducing the reconstruction error significantly. In addition, our methods cost less than one tenth of the CPU time of the method with the simplified model.
| Method | Reconstruction error | Segmentation error | CPU time (s) |
|---|---|---|---|
| Method in [doi:10.1080/17415977.2015.1124428] | 0.203 | 0.166 | 374.9 |
| Method by solving (8) | 0.215 | 0.177 | 28.8 |
| Method by solving (19) | 0.195 | 0.172 | 33.7 |
4.4 Comparison of CPU times on different resolutions
In this section, we compare the CPU times of our methods with those of the method in [doi:10.1080/17415977.2015.1124428]. To do so, we generate the piecewise constant phantom at different resolutions and adjust the number of projection angles such that the underdetermined rate is kept at 0.667. The reported results are averages over 3 different noise realizations.
| Method | 64-by-64 | 128-by-128 | 256-by-256 | 512-by-512 |
|---|---|---|---|---|
| Method in [doi:10.1080/17415977.2015.1124428] | 469.7 | 1981.4 | 9447.3 | – |
| Method with (8) | 93.2 | 161.2 | 783.5 | 3224.2 |
| Method with (19) | 93.9 | 138.0 | 560.4 | 2928.7 |
In Table 3, we list the CPU times in seconds for all three methods. Note that for the 512-by-512 phantom we could not apply the method in [doi:10.1080/17415977.2015.1124428] due to its heavy computational cost and limited memory. It is obvious that both our methods cost much less CPU time than the one in [doi:10.1080/17415977.2015.1124428]. As the resolution increases, the method with Tikhonov regularization, i.e., the one solving the model (19), uses less and less computing time relative to the one in [doi:10.1080/17415977.2015.1124428], by an increasing factor, and also less time than the method solving (8). The latter is because the sub-problem with respect to $x$ is better conditioned, so fewer iterations are needed to reach the stopping rule.
All the above comparisons indicate that the proposed algorithm is faster than the method in [doi:10.1080/17415977.2015.1124428]. There are three likely reasons: (a) the objective function in the Frank-Wolfe algorithm is non-smooth due to the TV norm; (b) the dimension of $\delta$ is large; (c) although there are more variables to solve for in the proposed method, the problem is separable and many sub-problems have closed-form solutions.
5 Conclusions
In this paper, we propose two new methods for simultaneous reconstruction and segmentation in X-ray CT. By using the commutativity of the log-sum operations, the original model proposed in [doi:10.1080/17415977.2015.1124428] can be solved more efficiently. Although one more variable is introduced, the energy function becomes separable and each sub-problem is convex and easy to solve. We also show some convergence results and numerically examine the role of the simplification steps in [doi:10.1080/17415977.2015.1124428]. Numerical results show that the proposed methods provide comparable reconstruction and segmentation results with much less CPU time.
Acknowledgments
We thank Hans Martin Kjer from the Technical University of Denmark for providing us with the code for the method introduced in [doi:10.1080/17415977.2015.1124428]. Y. Dong acknowledges the support of the National Natural Science Foundation of China under Grant 11701388 and the Villum Foundation under Grant 25893. C. Wu was supported by the National Natural Science Foundation of China under Grants 11871035 and 11531013, and the Recruitment Program of Global Young Experts.
-  (2009) The split Bregman method for L1-regularized problems. SIAM Journal on Imaging Sciences 2 (2), pp. 323–343. Cited by: §3.1.1.
-  (2013) A weighted dictionary learning model for denoising images corrupted by mixed noise. IEEE Transactions on Image Processing 22 (3), pp. 1108–1120. Cited by: §1, §2, §3.1, Proposition 3.1.
-  (1989) Optimal approximations by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics 42 (5), pp. 577–685. Cited by: §1.
-  (1992) Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena 60 (1), pp. 259–268. Cited by: §1.
-  (2007) A unified continuous optimization framework for center-based clustering methods. Journal of Machine Learning Research 8 (Jan), pp. 65–102. Cited by: §1, §2, §3.1, Proposition 3.1.
-  (1996) Iterative methods for total variation denoising. SIAM Journal on Scientific Computing 17 (1), pp. 227–238. Cited by: §2.
-  (2010) Augmented Lagrangian method, dual methods, and split Bregman iteration for ROF, vectorial TV, and high order models. SIAM Journal on Imaging Sciences 3 (3), pp. 300–339. Cited by: §3.1.1.
Appendix
All the following definitions and the theorem come from [Tseng2001].
Let $f$ have the following form:
$f(x_{1},\dots,x_{N}) = f_{0}(x_{1},\dots,x_{N}) + \sum_{k=1}^{N} f_{k}(x_{k})$,
where $x_{k}\in\mathbb{R}^{n_{k}}$ for $k=1,\dots,N$, and $f_{0}$ is proper.
Block coordinate descent (BCD) method:
Initialization. Choose any $x^{0}=(x_{1}^{0},\dots,x_{N}^{0})\in\operatorname{dom} f$.
Iteration $r+1$, $r\ge 0$. Given $x^{r}=(x_{1}^{r},\dots,x_{N}^{r})$, choose an index $s\in\{1,\dots,N\}$ and compute a new iterate
[Essentially cyclic rule] There exists a constant $T\ge N$ such that every index $s\in\{1,\dots,N\}$ is chosen at least once between the $r$th iteration and the $(r+T-1)$th iteration, for all $r$.
More assumptions about $f$:
(B1) $f_{0}$ is continuous on $\operatorname{dom} f_{0}$.
(B2) For each $k$ and each fixed choice of the other blocks, the function $x_{k}\mapsto f(x_{1},\dots,x_{k},\dots,x_{N})$ is quasiconvex and hemivariate.
(B3) $f_{0}, f_{1},\dots,f_{N}$ are lsc.
(C1) $\operatorname{dom} f_{0}$ is open and $f_{0}$ tends to $\infty$ at every boundary point of $\operatorname{dom} f_{0}$.
(C2) $\operatorname{dom} f_{0}=Y_{1}\times\cdots\times Y_{N}$, for some $Y_{k}\subseteq\mathbb{R}^{n_{k}}$.
A function $h$ is quasiconvex on $X$ if $h(\theta x+(1-\theta)y)\le\max\{h(x),h(y)\}$ for all $x,y\in X$ and $\theta\in[0,1]$.
A function $h$ is hemivariate on $X$ if $h$ is not constant on any line segment belonging to $X$.
A point $z$ is a coordinatewise minimum point of $f$ if $z\in\operatorname{dom} f$ and $f(z+(0,\dots,d_{k},\dots,0))\ge f(z)$ for all $d_{k}\in\mathbb{R}^{n_{k}}$ and $k=1,\dots,N$.
If $h$ is strictly convex, then $h$ is quasiconvex and hemivariate.
Theorem 6.4 (Theorem 5.1 in [Tseng2001]).
Suppose that $f_{0},f_{1},\dots,f_{N}$ satisfy Assumptions B1, B2, B3, and that $f_{0}$ satisfies either Assumption C1 or C2. Also, assume that the sequence $\{x^{r}\}$ generated by the BCD method using the essentially cyclic rule is defined. Then, either $f(x^{r})\downarrow-\infty$, or else every cluster point of $\{x^{r}\}$ is a coordinatewise minimum point of $f$.