1 Introduction
For sparse signals, Donoho [1] proposed the compressive sensing theory, which enables efficient data sampling at a rate much lower than conventional requirements; in its standard formulation, it can be modeled as follows.

Notation: in this paper, matrices are represented by capital letters. For a matrix $A$, $A_i$, $A^j$ and $a_{ij}$ denote the $i$th column, the $j$th row and the $(i,j)$th element of $A$, respectively; $\|\cdot\|_p$ represents the $\ell_p$ norm of a vector. All vectors are column vectors unless transposed to a row vector by a prime superscript $'$. Compressive sensing can be formulated as:
\[
y = Ax + \varepsilon \tag{1}
\]
where $x \in \mathbb{R}^n$ is an unknown vector, $y \in \mathbb{R}^m$ is an observed vector, $A \in \mathbb{R}^{m \times n}$ is called the compressive sensing matrix (usually $m \ll n$), and $\varepsilon$ is the unknown disturbance term or noise. Obviously, this is an underdetermined system of equations that does not have a unique solution. The least-squares method is usually used to solve the problem:
\[
\min_x \frac{1}{2}\|y - Ax\|_2^2 \tag{2}
\]
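As a concrete illustration, the following NumPy sketch (problem sizes and the sparse signal are hypothetical) shows that the plain least-squares solution of such an underdetermined system fits the observations exactly but is dense, which motivates the sparsity-promoting regularizers introduced below:

```python
import numpy as np

# Hypothetical small instance: 10 measurements of a 50-dimensional, 3-sparse signal.
rng = np.random.default_rng(0)
m, n = 10, 50
A = rng.standard_normal((m, n))           # sensing matrix (m << n)
x_true = np.zeros(n)
x_true[[3, 17, 40]] = [1.0, -2.0, 0.5]    # sparse ground truth
y = A @ x_true                            # noiseless observations

# Plain least squares returns the minimum-norm solution among the
# infinitely many vectors that fit the data exactly.
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
residual = np.linalg.norm(y - A @ x_ls)             # essentially zero
density = np.count_nonzero(np.abs(x_ls) > 1e-6)     # nearly all entries nonzero
```

The recovered vector fits the data but bears no resemblance to the 3-sparse ground truth, since least squares alone carries no sparsity prior.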
To suppress overfitting, some scholars [2, 3, 4, 5] added the $\ell_0$ norm regularizer to introduce sparse prior information:
\[
\min_x \frac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_0 \tag{3}
\]
where $\|x\|_0$ denotes the number of nonzero components of $x$ and $\lambda > 0$ is a hyperparameter that controls the tradeoff between accuracy and sparsity. Many methods have been developed to solve this problem, such as the penalty decomposition method [6], the iterative hard thresholding method [7], the fixed-point continuation method (FPC) [9], the proximal gradient homotopy method (PGH) [10] and the reweighted $\ell_1$ minimization method [11, 12]. However, Eq. (3) is an NP-hard optimization problem [13], which is so highly discrete that it is difficult to solve with an exact algorithm. Thus, we need to seek an effective approximate solution to this problem. The $\ell_1$ norm regularizer is commonly introduced as a substitute for the $\ell_0$ norm. Such an approximation can be traced back to a wide range of fields, such as seismic traces [14], sparse signal recovery [15], sparse model selection in statistics (LASSO) [16], and image processing [17]. Many scholars have attempted to find the optimal solution of the following problem:
\[
\min_x \frac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1 \tag{4}
\]
It is a convex continuous optimization problem with a single nondifferentiable point ($x_i = 0$), which can usually be transformed into a second-order cone programming problem and then solved by methods such as interior-point methods. However, for large-scale problems, the interior-point method is very time-consuming due to its high algorithmic complexity. For this reason, many researchers have solved the problem with simple gradient-based methods. Among them, the iterative shrinkage-thresholding algorithm (ISTA) proposed by Chambolle [18, 19] has attracted much attention. ISTA updates $x$ through a shrinkage/soft-threshold operation in each iteration:
\[
x^{k+1} = \mathrm{soft}_{\lambda t}\bigl(x^k - tA'(Ax^k - y)\bigr) \tag{5}
\]
where $k$ represents the $k$th iteration, $t$ is an appropriate stepsize and $\mathrm{soft}_{\theta}(\cdot)$ is the soft-threshold operation function:
\[
\mathrm{soft}_{\theta}(x_i) = \mathrm{sign}(x_i)\max\{|x_i| - \theta,\ 0\} \tag{6}
\]
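The iteration of Eqs. (5) and (6) can be sketched as follows (a minimal NumPy implementation; the stepsize choice $t = 1/L$ with $L$ the largest eigenvalue of $A'A$ and the iteration count are illustrative, not specific settings from this paper):

```python
import numpy as np

def soft(v, theta):
    """Soft-threshold operator of Eq. (6): sign(v) * max(|v| - theta, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def ista(A, y, lam, n_iter=1000):
    """Plain ISTA iteration of Eq. (5) for min_x 0.5*||y - Ax||^2 + lam*||x||_1."""
    t = 1.0 / np.linalg.norm(A, 2) ** 2   # stepsize 1/L, L = largest eigenvalue of A'A
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft(x - t * (A.T @ (A @ x - y)), lam * t)
    return x
```

A small sparse-recovery run (20 Gaussian measurements of a 3-sparse, 40-dimensional signal) recovers the correct support.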
Recently, the iteratively weighted shrinkage-thresholding algorithm (IWSTA) has attracted much interest; it outperforms its unweighted counterpart in most cases. In these methods, the decision variables and weights are optimized alternately, or the decision variables are optimized under heuristically chosen weights. IWSTA can be written as:
\[
\min_x \frac{1}{2}\|y - Ax\|_2^2 + \lambda\sum_{i=1}^{n} w_i|x_i| \tag{7}
\]
This method assigns a different weight to each component of $x$ during the iterative process and then updates $x$. In this way, each subproblem is convex and easy to solve. Many algorithms have been developed along this line. For example, the iterative support detection (ISD) method [20] assigns a weight of 0 to components in the support set and a weight of 1 to the other components at each iteration, where the support set consists of all components whose absolute value is greater than a threshold. Zhao [12] proposed a method to compute the optimal weights $w$ by the duality of linear programming, based on a property of the weighted range space. It alternately solves the weighted original problem with fixed weights to obtain a new solution $x$, and then solves the dual problem to obtain a new weight vector $w$. More variants are available in [11, 21] and their references. The details of some examples are listed in Tab. 1.

Author | Termed | Weights | Min. | Max. | Regularizer
Chambolle [18] | ISTA | $w_i^k = 1$ | 1 | 1 | $\lambda\sum_i |x_i|$
Candes [22] | IRL1 | $w_i^k = 1/(|x_i^k| + \epsilon)$ | 0 | $1/\epsilon$ | $\lambda\sum_i \ln(|x_i| + \epsilon)$
Foucart [23] | WLP | $w_i^k = (|x_i^k| + \epsilon)^{p-1}$, $0 < p < 1$ | 0 | $\epsilon^{p-1}$ | $\frac{\lambda}{p}\sum_i (|x_i| + \epsilon)^p$
Wipf [24] | NW4 | see [24] | 0 | 1 | see [24]
There is a drawback to the above methods: the weights do not meet the usual definition of weights, whose sum should be one. Instead, they can be distributed over a very large range (see Tab. 1). Such weights are difficult to interpret and can lead to inaccurate results.
This paper proposes a new IWSTA variant, called entropy-regularized IWSTA (ERIWSTA), which computes easily interpretable weights. The weights automatically fall in the range [0, 1] and sum to one, so that they can be considered as probabilities of the contribution of each attribute to the model. This is achieved by adding an entropy regularizer to the cost function and then using the Lagrange multiplier method to solve the resulting problem. Experiments are conducted on CT image restoration, and the results show that the proposed algorithm converges faster and achieves better restoration accuracy than some state-of-the-art methods.
2 Methodology
The main idea of IWSTA-type algorithms is to define a weight for each attribute based on the current iterate $x^k$ and then use these weights to obtain a new iterate $x^{k+1}$. In this section, we introduce an entropy regularizer into the cost function and obtain the following optimization model:
\[
\min_{x, w} L(x, w) = \frac{1}{2}\|y - Ax\|_2^2 + \lambda\sum_{i=1}^{n} w_i|x_i| + \gamma\sum_{i=1}^{n} w_i\ln w_i \quad \text{s.t.}\ \sum_{i=1}^{n} w_i = 1,\ w_i \ge 0 \tag{8}
\]
where $\gamma > 0$ is a given hyperparameter.
As can be seen, if we do not use the entropy regularizer, $w$ can easily be solved as $w_i = 1$ if $|x_i| = \min_j |x_j|$, and $w_i = 0$ otherwise.^1 This shows a simple fact: only one element of $w$ is 1 and the others are 0, which is grossly incompatible with the actual problem. Therefore, we add the negative entropy of the weights to measure their uncertainty and to encourage more attributes to help signal reconstruction, because it is well known in information theory that $\sum_{i=1}^{n} w_i\ln w_i$ is minimized when

\[
w_1 = w_2 = \cdots = w_n = \frac{1}{n} \tag{9}
\]

^1 The update rule can be easily explained by an example: $w = (0, 1, 0)'$ corresponds to the minimum of $\{4, 1, 5\}$. It is very similar to the computation of the weights in the $k$-means algorithm.
In the following, we alternately solve for $w$ and $x$ in Eq. (8).
2.1 Update rule for $w$
To solve for $w$, we introduce a Lagrange multiplier $\alpha$ for the constraint $\sum_{i=1}^{n} w_i = 1$ and then obtain the following Lagrange function. Note that $\frac{1}{2}\|y - Ax\|_2^2$ is a constant with respect to $w$, so we only construct the Lagrange function on the $w$-dependent terms:
\[
L(w, \alpha) = \lambda\sum_{i=1}^{n} w_i|x_i| + \gamma\sum_{i=1}^{n} w_i\ln w_i + \alpha\Bigl(\sum_{i=1}^{n} w_i - 1\Bigr) \tag{10}
\]
Setting the partial derivatives of $L(w, \alpha)$ with respect to $w_i$ and $\alpha$ to zero, we obtain the following two equations:
\[
\frac{\partial L}{\partial w_i} = \lambda|x_i| + \gamma(\ln w_i + 1) + \alpha = 0 \tag{11}
\]

\[
\frac{\partial L}{\partial \alpha} = \sum_{i=1}^{n} w_i - 1 = 0 \tag{12}
\]
From Eq. (11), we know that
\[
w_i = \exp\Bigl(-\frac{\lambda|x_i| + \gamma + \alpha}{\gamma}\Bigr) = \exp\Bigl(-\frac{\lambda|x_i|}{\gamma}\Bigr)\exp\Bigl(-\frac{\gamma + \alpha}{\gamma}\Bigr) \tag{13}
\]
Substituting Eq. (13) into Eq. (12), we have
\[
\exp\Bigl(-\frac{\gamma + \alpha}{\gamma}\Bigr)\sum_{j=1}^{n}\exp\Bigl(-\frac{\lambda|x_j|}{\gamma}\Bigr) = 1 \tag{14}
\]
It follows that
\[
\exp\Bigl(-\frac{\gamma + \alpha}{\gamma}\Bigr) = \frac{1}{\sum_{j=1}^{n}\exp\bigl(-\lambda|x_j|/\gamma\bigr)} \tag{15}
\]
Substituting this expression into Eq. (13), we obtain
\[
w_i = \frac{\exp\bigl(-\lambda|x_i|/\gamma\bigr)}{\sum_{j=1}^{n}\exp\bigl(-\lambda|x_j|/\gamma\bigr)} \tag{16}
\]
Such weights certainly satisfy the constraints $w_i \ge 0$ and $\sum_{i=1}^{n} w_i = 1$.
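The weight update of Eq. (16) is a softmax over $-\lambda|x_i|/\gamma$ and can be computed in a numerically stable way as follows (a minimal NumPy sketch; the stability shift is an implementation detail, not part of the derivation):

```python
import numpy as np

def entropy_weights(x, lam, gamma):
    """Closed-form weight update of Eq. (16): softmax over -lam*|x_i|/gamma."""
    z = -lam * np.abs(np.asarray(x, dtype=float)) / gamma
    z -= z.max()            # shift for numerical stability; softmax is shift-invariant
    w = np.exp(z)
    return w / w.sum()
```

As $\gamma \to \infty$ the weights tend to the uniform distribution of Eq. (9), and as $\gamma \to 0$ they concentrate on the component with the smallest $|x_i|$, recovering the one-hot behavior of the no-entropy case.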
2.2 Update rule for $x$
Inspired by ISTA [8], a similar approach is adopted for the iterative update of $x$. The construction of a majorization is an important step in obtaining the update rule.
Definition 2.1
(Majorization) $G(x|x^k)$ is called a majorization of $F(x)$ at $x^k$ (fixed) if $F(x) \le G(x|x^k)$ for all $x$ and $F(x^k) = G(x^k|x^k)$.
Clearly, $F(x)$ is nonincreasing under the update rule $x^{k+1} = \arg\min_x G(x|x^k)$ because

\[
F(x^{k+1}) \le G(x^{k+1}|x^k) \le G(x^k|x^k) = F(x^k) \tag{17}
\]
Then, we can construct a majorization for $f(x) = \frac{1}{2}\|y - Ax\|_2^2$.
Proposition 2.1
The function $f(x) = \frac{1}{2}\|y - Ax\|_2^2$ is a differentiable convex function with a Lipschitz continuous gradient, which has a majorization at the fixed current iterate $x^k$ given by

\[
g(x|x^k) = f(x^k) + (x - x^k)'\nabla f(x^k) + \frac{L}{2}\|x - x^k\|_2^2 \tag{18}
\]

where $\nabla f(x^k) = A'(Ax^k - y)$ and $L$ is the largest eigenvalue of $A'A$.
Proof 2.1
It is well known that the quadratic function $f$ satisfies

\[
f(x) = f(x^k) + (x - x^k)'\nabla f(x^k) + \frac{1}{2}(x - x^k)'A'A(x - x^k) \tag{19}
\]
Comparing $f(x)$ and $g(x|x^k)$, we find that only the last terms are different. By the eigendecomposition (equivalently, the SVD) of the symmetric positive semidefinite matrix $A'A$, we know that $A'A = U\Sigma U'$, in which $U$ is an orthogonal matrix consisting of all eigenvectors and $\Sigma$ is a diagonal matrix consisting of all eigenvalues. Let $z = U'(x - x^k)$; then

\[
(x - x^k)'A'A(x - x^k) = z'\Sigma z = \sum_i \sigma_i z_i^2 \le L\sum_i z_i^2 = L\|x - x^k\|_2^2 \tag{20}
\]

so $f(x) \le g(x|x^k)$ for all $x$. It is also certain that $f(x) = g(x|x^k)$ if $x = x^k$. Thus, the proof is established.
Now, we obtain the majorization of the whole cost function on $x$ (with $w$ fixed):
\[
G(x|x^k) = f(x^k) + (x - x^k)'\nabla f(x^k) + \frac{L}{2}\|x - x^k\|_2^2 + \lambda\sum_{i=1}^{n} w_i|x_i| \tag{21}
\]
which can be reorganized as
\[
G(x|x^k) = \frac{L}{2}\Bigl\|x - \Bigl(x^k - \frac{1}{L}\nabla f(x^k)\Bigr)\Bigr\|_2^2 + \lambda\sum_{i=1}^{n} w_i|x_i| + \mathrm{const} \tag{22}
\]
We find that the variables of the majorization are separable, so that the minimization can be carried out on each $x_i$ independently, as follows:
\[
x_i^{k+1} = \mathrm{soft}_{\lambda w_i / L}\Bigl(x_i^k - \frac{1}{L}\bigl[A'(Ax^k - y)\bigr]_i\Bigr) \tag{23}
\]
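Putting the two update rules together, one possible ERIWSTA loop reads as follows. This is a hedged NumPy sketch, not the exact implementation used in the experiments: the uniform weight initialization, the update order and the fixed iteration count are assumptions.

```python
import numpy as np

def soft(v, theta):
    """Soft-threshold operator of Eq. (6), applied elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def eriwsta(A, y, lam, gamma, n_iter=1000):
    """Sketch of an ERIWSTA loop: alternate the weighted soft-threshold
    step of Eq. (23) with the closed-form weight update of Eq. (16)."""
    L = np.linalg.eigvalsh(A.T @ A).max()   # largest eigenvalue of A'A
    n = A.shape[1]
    x = np.zeros(n)
    w = np.full(n, 1.0 / n)                 # assumed uniform initialization
    for _ in range(n_iter):
        # x-step (Eq. 23): per-component threshold lam * w_i / L
        x = soft(x - A.T @ (A @ x - y) / L, lam * w / L)
        # w-step (Eq. 16): stabilized softmax over -lam*|x_i|/gamma
        z = -lam * np.abs(x) / gamma
        z -= z.max()
        w = np.exp(z) / np.exp(z).sum()
    return x, w
```

Components with large $|x_i|$ receive small weights, hence small thresholds and little shrinkage, while components near zero are penalized more heavily; the weights remain a valid probability vector at every iteration.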
Tab. 2: Optimal MAE of ISTA, WLP, NW4, IRL1 and ERIWSTA at the high noise level.

Tab. 3: Optimal MAE of ISTA, WLP, NW4, IRL1 and ERIWSTA at the low noise level.
3 Numerical experiments
Numerical experiments are provided to evaluate the performance of the proposed ERIWSTA compared with ISTA, WLP, NW4 and IRL1 on the denoising problem of computed tomography (CT) images. All experiments were performed on an HP computer with a 2.5 GHz Intel(R) Core(TM) i7-4710MQ CPU and 12 GB of memory, using MATLAB R2019a. A simulated Shepp–Logan phantom, which is widely used in CT image analysis, was used to evaluate algorithm performance. There are many advantages to using simulated phantoms, including prior knowledge of the pixel values and the ability to control noise. We blurred the image with a uniform kernel (applied by the MATLAB function "fspecial") and then added Gaussian noise by the following formula. Two noise levels are selected as examples of high and low noise for the following experiments.
\[
y = Hx_{\mathrm{true}} + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I) \tag{24}
\]
Fig. 1 shows the original and the blurred-and-noisy images. Owing to the good time–frequency localization characteristics of the wavelet transform, it can effectively distinguish high-frequency noise from low-frequency information. Therefore, the wavelet transform is used to reduce noise. The introduction of the wavelet matrix also ensures the sparsity of the whole optimization problem. Without loss of generality, let $A = HW'$, where $H$ is the predetermined system matrix indicating the blurring information and $W$ represents the second-order Haar wavelet matrix.
The mean absolute error (MAE) was used to measure the similarity to the true image. The MAE is calculated by taking the average of the absolute differences between the restored pixel values and the true pixel values:
\[
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\bigl|x_i^{\mathrm{restored}} - x_i^{\mathrm{true}}\bigr| \tag{25}
\]
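Eq. (25) can be implemented directly (a minimal NumPy sketch):

```python
import numpy as np

def mae(x_restored, x_true):
    """Mean absolute error of Eq. (25): average absolute pixel difference."""
    x_restored = np.asarray(x_restored, dtype=float)
    x_true = np.asarray(x_true, dtype=float)
    return float(np.mean(np.abs(x_restored - x_true)))
```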
3.1 Hyperparameter selection
To select the penalty hyperparameter $\lambda$ and the entropy hyperparameter $\gamma$, we compare the MAE values after 100 iterations over a wide range of both. The results are shown in Fig. 2, demonstrating that ERIWSTA achieves a consistently low MAE over a wide range of $\lambda$ and $\gamma$, which displays its robustness.
We also report the optimal MAE and the corresponding hyperparameters of the algorithms in Tabs. 2 and 3. An interesting observation is that, regardless of the noise level, the restoration accuracy of our algorithm is always better than that of the others. These optimal hyperparameters are also used in the following experiments.
3.2 Algorithmic performance
Fig. 3 displays the cost function curves of the algorithms. As can be seen, the proposed algorithm always has the fastest convergence speed, arriving at a stable state earliest.
Fig. 4 shows the MAE curves of the algorithms with respect to the number of iterations. The proposed ERIWSTA always has superior performance to the other algorithms, rapidly obtaining the minimum MAE value.
Figs. 5 and 6 show the denoising results at the given noise levels. As can be seen, all of the algorithms achieve visually similar images. However, Figs. 7 and 8 quantitatively compare the vertical profiles of the restored images with those of the true phantom in the central row and column. We can see that ERIWSTA follows the outline of the phantom more accurately than the other algorithms.
4 Conclusions
In this paper, a new IWSTA variant, called ERIWSTA, is proposed to solve the linear inverse problem. An entropy regularizer is introduced to measure the uncertainty of the weights, and the Lagrange multiplier method then yields a simple closed-form solution. Experimental results on image restoration of a synthetic CT head phantom show that ERIWSTA converges faster, with fewer iterations and better restoration accuracy, than the other algorithms.
However, as with many existing algorithms, our algorithm involves two main hyperparameters ($\lambda$ and $\gamma$). In the future, we will focus on designing an automatic method to adjust these hyperparameters.
Acknowledgments
This work was supported by the Fundamental Research Funds for the Central Universities (N2019006 and N180719020).
References
[1] E. Candes, J. Romberg, T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Commun. Pure Appl. Math. 59 (8) (2006) 1207–1223.
[2] M. Elad, M. A. Figueiredo, Y. Ma, On the role of sparse and redundant representations in image processing, Proc. IEEE 98 (6) (2010) 972–982.
[3] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. S. Huang, S. Yan, Sparse representation for computer vision and pattern recognition, Proc. IEEE 98 (6) (2010) 1031–1044.
[4] M. H. Alsuwaiyel, Sparse Coding and its Applications in Computer Vision, World Scientific, 1999.
[5] M. Gong, J. Liu, H. Li, Q. Cai, L. Su, A multiobjective sparse feature learning model for deep neural networks, IEEE Trans. Neural Netw. Learn. Syst. 26 (12) (2015) 3263–3277.
[6] Z. Lu, Y. Zhang, Sparse approximation via penalty decomposition methods, SIAM J. Optimiz. 23 (4) (2013) 2448–2478.
[7] T. Blumensath, M. E. Davies, Iterative thresholding for sparse approximations, J. Fourier Anal. Appl. 14 (5) (2008) 629–654.
[8] A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imag. Sci. 2 (1) (2009) 183–202.
[9] E. T. Hale, W. Yin, Y. Zhang, Fixed-point continuation for $\ell_1$-minimization: Methodology and convergence, SIAM J. Optimiz. 19 (3) (2008) 1107–1130.
[10] L. Xiao, T. Zhang, A proximal-gradient homotopy method for the sparse least-squares problem, SIAM J. Optimiz. 23 (2) (2013) 1062–1091.
[11] Z. Xie, J. Hu, Reweighted $\ell_1$-minimization for sparse solutions to underdetermined linear systems, 6th Int. Congress on Image and Signal Processing (CISP), 2013, pp. 1065–1088.
[12] Y.-B. Zhao, M. Kočvara, A new computational method for the sparsest solutions to systems of linear equations, SIAM J. Optimiz. 25 (2) (2015) 1110–1134.
[13] B. K. Natarajan, Sparse approximate solutions to linear systems, SIAM J. Comput. 24 (2) (1995) 227–234.
[14] H. L. Taylor, S. C. Banks, J. F. McCoy, Deconvolution with the $\ell_1$ norm, Geophysics 44 (1) (1979) 39–52.
[15] S. S. Chen, D. L. Donoho, M. A. Saunders, Atomic decomposition by basis pursuit, SIAM Rev. 43 (1) (2001) 129–159.
[16] R. J. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. B 58 (1) (1996) 267–288.
[17] P. Blomgren, T. F. Chan, Color TV: total variation methods for restoration of vector-valued images, IEEE Trans. Image Process. 7 (3) (1998) 304–309.
[18] A. Chambolle, R. De Vore, N.-Y. Lee, B. Lucier, Nonlinear wavelet image processing: Variational problems, compression, and noise removal through wavelet shrinkage, IEEE Trans. Image Process. 7 (3) (1998) 319–335.
[19] M. Figueiredo, R. D. Nowak, An EM algorithm for wavelet-based image restoration, IEEE Trans. Image Process. 12 (8) (2003) 906–916.
[20] Y. Wang, W. Yin, Sparse signal reconstruction via iterative support detection, SIAM J. Imag. Sci. 3 (3) (2010) 462–491.
[21] D. L. Donoho, For most large underdetermined systems of linear equations the minimal $\ell_1$-norm solution is also the sparsest solution, Commun. Pure Appl. Math. 59 (6) (2006) 797–829.
[22] E. Candès, M. B. Wakin, S. P. Boyd, Enhancing sparsity by reweighted $\ell_1$ minimization, J. Fourier Anal. Appl. 14 (5–6) (2008) 877–905.
[23] S. Foucart, M.-J. Lai, Sparsest solutions of underdetermined linear systems via $\ell_q$-minimization for $0 < q \le 1$, Appl. Comput. Harmon. Anal. 26 (3) (2009) 395–407.
[24] D. Wipf, S. Nagarajan, Iterative reweighted $\ell_1$ and $\ell_2$ methods for finding sparse solutions, IEEE J. Sel. Topics Signal Process. 4 (2) (2010) 317–329.