Entropy Regularized Iterative Weighted Shrinkage-Thresholding Algorithm (ERIWSTA): An Application to CT Image Restoration

by   Bingxue Wu, et al.
Northeastern University

The iterative weighted shrinkage-thresholding algorithm (IWSTA) has shown superiority to the classic unweighted iterative shrinkage-thresholding algorithm (ISTA) for solving linear inverse problems, which address the attributes differently. This paper proposes a new entropy regularized IWSTA (ERIWSTA) that adds an entropy regularizer to the cost function to measure the uncertainty of the weights to stimulate attributes to participate in problem solving. Then, the weights are solved with a Lagrange multiplier method to obtain a simple iterative update. The weights can be explained as the probability of the contribution of an attribute to the problem solution. Experimental results on CT image restoration show that the proposed method has better performance in terms of convergence speed and restoration accuracy than the existing methods.



There are no comments yet.


page 5

page 7


Learning optimal nonlinearities for iterative thresholding algorithms

Iterative shrinkage/thresholding algorithm (ISTA) is a well-studied meth...

An Iterative Shrinkage Approach to Total-Variation Image Restoration

The problem of restoration of digital images from their degraded measure...

Learned Block Iterative Shrinkage Thresholding Algorithm for Photothermal Super Resolution Imaging

Block-sparse regularization is already well-known in active thermal imag...

Robust lEarned Shrinkage-Thresholding (REST): Robust unrolling for sparse recover

In this paper, we consider deep neural networks for solving inverse prob...

An Online Plug-and-Play Algorithm for Regularized Image Reconstruction

Plug-and-play priors (PnP) is a powerful framework for regularizing imag...

A Trained Regularization Approach Based on Born Iterative Method for Electromagnetic Imaging

A trained-based Born iterative method (TBIM) is developed for electromag...

Iterative Shrinkage Approach to Restoration of Optical Imagery

The problem of reconstruction of digital images from their degraded meas...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

For sparse compressed signals, Donoho [1] proposed a compressive sensing theory that enables efficient data sampling at a much lower rate than the requirements, which can be modeled as follows in its standard formulation.

: In this paper, the matrices are represented in capital letters. For a matrix , , and denote the column, the row and element of , respectively; the represents the

-norm of a vector. All the vectors are column vectors unless transposed to a row vector by a prime superscript


Compressive sensing can be formulated as:


where is an unknown vector, is an observed vector, is called the compressive sensing matrix (usually ), and is the unknown disturbance term or noise. Obviously, this is an underdetermined system of equations that does not have a sole solution. The least-squares method is usually used to solve the problem.


To suppress overfitting, some scholars [2, 3, 4, 5] added the -norm regularizer to introduce sparse prior information.


where denotes the number of nonzero components of and

is a hyperparameter to control the tradeoff between accuracy and sparsity. Many methods have been developed to solve this problem, such as the penalty decomposition method

[6], iterative hard threshold method [7], fixed-point continuation method (FPC) [9], approximate gradient homotopy method (PGH) [10] and reweighted minimization method [11, 12].

However, Eq. (3) is an NP-hard optimization problem [13], which is highly discrete so that it is difficult to solve using a precise algorithm. Thus, we need to seek an effective approximation solution for this problem. The -norm regularizer is introduced as a substitute for that of the -norm. Such an approximation can be traced back to a wide range of fields, such as seismic traces [14], sparse signal recovery [15], sparse model selection in statistics (LASSO) [16], and image processing [17]. Many scholars have attempted to find the optimal solution to the following problem:


It is a convex continuous optimization problem with a sole nondifferentiable point (), which can usually be transformed into a second-order cone programming problem and then solved by methods such as interior-point methods. However, in large-scale problems, due to the high algorithmic complexity, the interior-point method is very time-consuming. Based on this, many researchers have solved the problem through simple gradient-based methods. Among them, the iterative shrinkage-thresholding algorithm (ISTA) proposed by Chambolle [18, 19] has attracted much attention. ISTA updates through a shrinkage/soft threshold operation in each iteration.


where represents the -th iteration, is an appropriate stepsize and is the soft threshold operation function.


Recently, the iteratively weighted shrinkage-thresholding algorithm (IWSTA) has attracted much interest compared with ISTA, which outperforms their unweighted counterparts in most cases. In these methods, decision variables and weights are optimized alternatingly, or decision variables are optimized under heuristically chosen weights. It can be written as:


The method assigns different weights to each component of in the iterative process and then updates . In this way, each subproblem is convex and easy to solve. Many algorithms have been developed to solve it. For example, the iterative support detection (ISD) method [20] assigns a weight of 0 to components in the support set and a weight of 1 to the other components during iteration, in which the support set at each iteration consists of all components whose absolute value is greater than the threshold. Zhao [12] proposed a new method to calculate the optimal

by the duality of linear programming based on the property of weighted range space. It alternately solves the weighted original problem with fixed weights to obtain a new solution

, and then it solves the duality problem to obtain a new weight . More variants are available in [11, 21] and its references. The details of some examples are listed in Tab. 1.

Author Termed Weights Min. Max. Regularizer
Chambolle [18] ISTA 1 1 1
Candes [22] IRL1 0
Foucart [23] WLP 0
Wipf [24] NW4 0 1
Table 1: Variants of weighted method.

There is a drawback for the above methods: the weights do not meet the usual definition of weights, and their sum is one, which leads them to be distributed in a very large range (see Tab. 1). Such weights are difficult to explain and can lead to an inaccurate result.

This paper proposes a new IWSTA type, called entropy regularized IWSTA (ERIWSTA), which obtains easily computable and interpretable weights. The weights automatically fall in the range of [0, 1], and the summation is one so that they can be considered a probability of the contribution of each attribute to the model. This is achieved by adding an entropy regularizer to the cost function and then using the Lagrange multiplier method to solve the problem. Experiments are executed for CT image restoration, and the results show that the proposed algorithm performs better in terms of both convergence speed and restoration accuracy compared with some state-of-the-art methods.

2 Methodology

The main idea of the IWSTA type algorithms is to define a weight for each attribute based on the current iteration and then use them to obtain a new . In this section, we introduce an entropy regularizer to the cost function and obtain the following optimization model:


where is a given hyperparameter.

As can be seen, while we do not use the entropy regularizer, can easily be solved as if , or 0 otherwise111The update rule can be easily explained by an example as

The solution is , and , in which

corresponds to the minimum value of {4, -1, 5}. It is very similar to the computation of the weights in the k-means algorithm.

. It shows a simple fact that only one element of is 1, and the others are 0, which is grossly incompatible with the actual problem. Then, we add the negative entropy of the weights to measure the uncertainty of weights and stimulate more attributes to help signal reconstruction because it is well known that is minimized in information theory when


As follows, we will alternatively solve and in Eq. (2).

2.1 Update rule for

To solve , we introduce the Lagrange multipliers and then obtain the following Lagrange function. Note that is a constant with respect to , so we only construct the Lagrange function on .


Set the partial derivative of with respect to and to zero and then obtain the following two equations.


From Eq. (11), we know that


Substituting Eq. (13) into Eq. (12), we have


It follows that


Substituting this expression to Eq. (13), we obtain that


Such weights certainly meet the constraints that and .

2.2 Update rule for

Inspired by the work of ISTA [8], a similar approach was adopted for the iterative update of . The construction of a majorization is an important step in obtaining the updating rule.

Definition 2.1

(Majorization) Denote as a majorization for at (fixed) if and .

Clearly, is nonincreasing under the updating rule because


Then, we can construct the majorization for .

Proposition 2.1

Obviously, is a Lipschitz continuous and differentiable convex function, which has a majorization function at fixed current iteration as



is larger than or equal to the maximum eigenvalue of


Proof 2.1

It is well-known that


We compare and

and find that only the last terms are different. By singular value decomposition (SVD) of a symmetric definite matrix, we know that

, in which

is an orthogonal matrix consisting of all eigenvectors and

is diagonal consisting of all eigenvalues. Let , then


And it is also certain that if . Thus, the proof is established.

Now, we obtain the majorization for the cost function on .


which can be reorganized as


We find that the variables of the majorization are separable such that their minimizations can be easily obtained on each , respectively, as follows:

Figure 1: The original and noisy head phantom images. head phantom with 256×256 pixels; and blurred image with a 55 uniform kernel and additive Gaussian noise with and .
Figure 2: 3D profile of and on MAE with different Gaussian noise levels: (a) and (b) .
Termed MAE
Table 2: The optimal MAE value and corresponding hyperparameter (Gaussian noise with ).
Termed MAE
Table 3: The optimal MAE value and corresponding hyperparameter (Gaussian noise with ).
Figure 3: Cost function versus iteration number for different Gaussian noise levels: (a) and (b) .
Figure 4: MAE versus iteration number for different Gaussian noise levels: (a) and (b) .
Figure 5: After 30 iterations, the denoising results of ISTA, WLP, NW1, IRL1 and ERIWSTA with Gaussian noise with .
Figure 6: After 30 iterations, the denoising results of ISTA, WLP, NW1, IRL1 and ERIWSTA with Gaussian noise with .
Figure 7: Horizontal central profiles of the restored images with different Gaussian noise levels: (a) and (b) ..
Figure 8: Vertical central profiles of the restored images with different Gaussian noise levels: (a) and (b) .

3 Numerical experiments

Numerical experiments are provided to evaluate the performance of the proposed ERWISTA compared with ISTA, WLP, NW4 and IRL1 on the denoising problem of computed tomography (CT) images. All experiments are performed on an HP computer with a 2.5 GHz Intel(R) Core(TM) i7-4710MQ CPU with 12 GB of memory using MATLAB R2019a for coding. A simulated Shepp-Logan phantom with pixels was used to evaluate the algorithm performance, which is usually used in CT image analysis. There are many advantages to using simulated phantoms, including prior knowledge of the pixel values and the ability to control noise. We blurred the image by using a uniform kernel (applied by the MATLAB function "fspecial") and then added Gaussian noise by the following formula. We select and as examples of high and low noise levels for the following experiments.


Fig. 1 shows the original and blurred-and-noisy images. Based on the good time-frequency localization characteristics of the wavelet transform, it can effectively distinguish high-frequency noise from low-frequency information. Therefore, the wavelet transform is used to reduce noise. The introduction of the wavelet matrix can also ensure the sparsity of the whole optimization algorithm. Without losing generality, let , where is the predetermined system matrix indicating the blurring information and represents the second-order Haar wavelet matrix.

Mean absolute error (MAE) was used to measure the similarity to the true image. The value of the MAE was calculated by taking the average of the squared differences between the restored pixel values and the true pixel values.


3.1 Hyperparameter selection

To select the penalty hyperparameter and the entropy weighted hyperparameter , we compare the MAE value after 100 iterations with respect to them from to . The results are shown in Fig. 2, demonstrating that ERIWSTA can achieve a consistently low MAE value over a wide range of and , which displays its robustness.

We also quantitatively display the optimal MAE and corresponding hyperparameters of the algorithms in Tabs. 2 and 3. An interesting observation is that, regardless of low or high noise levels, the restoration accuracy of our algorithm is always better than the others. These optimal hyperparameters are also used in the following experiments.

3.2 Algorithmic performance

Fig. 3 displays the cost function of the algorithms. As can be seen, the proposed algorithm always have the fast convergence speed, which arrive at the stable status early.

Fig. 4 shows the MAE curves of the algorithms with respect to the number of iterations. The proposed ERIWSTA always has superior performance to the other algorithms, rapidly obtaining the minimum MAE value.

Figs. 5 and 6 indicate the denoising results with the given noise level. As can be seen, all of the algorithms achieve a similar image. Howver, Figs. 7 and 8 quantitatively compares the vertical profiles of the restored images with that of the true phantom in the central row and column. We can see that ERIWSTA follows the outline of the phantom more accurately than the other algorithms.

4 Conclusions

In this paper, a new IWSTA type, called ERIWSTA, is proposed to solve the linear inverse problem. An entropy weighted term is introduced to measure the certainty of the weights, and then the Lagrange multiplier method is used to obtain a simple solution. The experimental results on image restoration of a synthetic CT head phantom show that ERIWSTA can achieve convergence faster with fewer iterations and better restoration accuracy than the other algorithms.

However, as with many existing algorithms, our algorithm also involves two main hyperparameters ( and ). In the future, we will focus on designing an automatic method to adjust these hyperparameters.


This work was supported by the Fundamental Research Funds for the Central Universities (N2019006 and N180719020).


  • [1] E. Candes, J. Romberg, T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Commun. Pure Appl. Math. 59 (8) (2005) 1207-1223.
  • [2] M. Elad, M. A. Figueiredo, Y. Ma, On the role of sparse and redundant representations in image processing, Proc. IEEE 1998, 6 (2010) 972-982.
  • [3]

    J. Wright, M. Yi, J. Mairal, G. Sapiro, T. S. Huang, S. Yan, Sparse representation for computer vision and pattern recognition, Proc. IEEE 1998, 6 (2010) 1031-1044.

  • [4] M. H. Alsuwaiyel, Sparse Coding and its Applications in Computer Vision, World Scientific, 1999.
  • [5]

    M. Gong, J. Liu, H. Li, Q. Cai, L. Su, A multiobjective sparse feature learning model for deep neural networks, IEEE Trans. Neural Netw. Learn. Syst. 26 (12) (2015) 3263-3277.

  • [6] Z. Lu, Y. Zhang, Sparse approximation via penalty decomposition methods, SIAM J. Optimiz. 23 (4) (2012) 2448-2478.
  • [7] T. Blumensath, M. E. Davies, Iterative thresholding for sparse approximations, J. Fourier Anal. Appl. 14 (5) (2008) 629-654.
  • [8] A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imag. Sci 2 (1) (2009) 183-202.
  • [9] E. T. Hale, W. Yin, Y. Zhang, Fixed-point continuation for -minimization: Methodology and convergence, SIAM J. Optimiz. 19 (3) (2008) 1107-1130.
  • [10] L. Xiao, T. Zhang, A proximal-gradient homotopy method for the sparse least-squares problem, SIAM J. Optimiz. 23 (2) (2012) 1062-1091.
  • [11] Z. Xie, J. Hu, Rewighted -minimization for sparse solutions to underdetermined linear systems, 2013 6th Int. Congress Image Sig. Process. (CISP) 22 (2012) 1065–1088.
  • [12] M. K. Y.B.Zhao, A new computational method for the sparsest solutions to systems of linear equations, SIAM J. Optimiz. 25 (2) (2015) 1110-1134.
  • [13] B. K. Natarajan, Sparse approximate solutions to linear systems, SIAM J. Comput. 24 (2) (1995) 227–234.
  • [14] H. L. Taylor, S. C. Banks, J. F. Mccoy, Deconvolution with the -norm, Geophysics 44 (1) (1979) 39.
  • [15] S. S. Chen, D. Saunders, Atomic decomposition by basis pursuit, SIAM Rev. 43 (1) (2001) 129-159.
  • [16] R. J. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. B 58 (1) (1996) 267-288.
  • [17] P. V. Blomgren, J. D. Villasenor, B. Engquist, S. Osher, V. B. Bo, Total variation methods for restoration of vector valued images, SIAM J. Optimiz. 25 (2) 1110-1134.
  • [18] A. Chambolle, R. De Vore, N.-Y. Lee, B. Lucier, Nonlinear wavelet image processing: Variational problems, compression, and noise removal through wavelet shrinkage, IEEE Trans. Image Process. 7 (3) (1998) 319-335.
  • [19] M. Figueiredo, R. D. Nowak, An EM algorthim for wavelet-besed image restoration, IEEE Trans. Image Process. 12 (8) (2003) 906-916.
  • [20] Y. Wang, W. Yin, Sparse signal reconstruction via iterative support detection, SIAM J. Imag. Sci. 3 (3) (2009) 462-491.
  • [21] D. L., Donoho, For most large underdetermined systems of linear equations the minimal 1-norm solution is also the sparsest solution, Commun. Pure Appl. Math. 59 (6) (2006) 797-829.
  • [22] E. Candès, M. B. Wakin, S. P. Boyd, Enhancing sparsity by reweighted minimization, J. Fourier. Anal. Appl. 14 (5-6) (2008) 877-905.
  • [23] S.Foucart, M.Lai, Sparsest solutions of undetermined linear systems via -minimization for , IEEE Trans. Image Process. 26 (3) (2009) 395-407.
  • [24] D. Wipf, S. Nagarajan, Iterative reweighted and methods for finding sparse solutions, IEEE J. Sel. Topics Signal Process. 4 (2) (2010) 317-329.