BSGD-TV: A parallel algorithm solving total variation constrained image reconstruction problems

by   Yushan Gao, et al.

We propose a parallel reconstruction algorithm to solve large scale TV constrained linear inverse problems. We provide a convergence proof and show numerically that our method is significantly faster than the main competitor, block ADMM.



There are no comments yet.


page 1

page 2

page 3


Enhanced total variation minimization for stable image reconstruction

The total variation (TV) regularization has phenomenally boosted various...

Higher-order total variation approaches and generalisations

Over the last decades, the total variation (TV) evolved to one of the mo...

Block stochastic gradient descent for large-scale tomographic reconstruction in a parallel network

Iterative algorithms have many advantages for linear tomographic image r...

Compressed MRI Reconstruction Exploiting a Rotation-Invariant Total Variation Discretization

Inspired by the first-order method of Malitsky and Pock, we propose a no...

Hessian Schatten-Norm Regularization for Linear Inverse Problems

We introduce a novel family of invariant, convex, and non-quadratic func...

Image reconstruction from few views by L0-norm optimization

The L1-norm of the gradient-magnitude images (GMI), which is the well-kn...

On and beyond Total Variation regularisation in imaging: the role of space variance

Over the last 30 years a plethora of variational regularisation models f...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Our algorithm is inspired by applications in computed tomography (CT), where the efficient inversion of large sparse linear systems is required [1]: , where is the vectorised version of a 3D image that is to be reconstructed and is an X-ray projection model. are the vectorised noisy projections. We are interested in minimizing , where is quadratic and is convex but non-smooth [2]. For example:


where is a relaxation parameter and is the total variation (TV) of the image . For 2D images, it is defined as:


where is the intensity of image pixel in row and column .

We recently introduced a parallel reconstruction algorithm called coordinate-reduced stochastic gradient descent (CSGD) to minimize quadratic objective function

[3]. We here introduce a slight modification by simplifying the step length calculation and show that the modified version converges to the least squares solution of . We will call this modified algorithm block stochastic gradient descend (BSGD). We combine BSGD with an iterative shrinkage/thresholding (ISTA-type) step [4] to solve Eq.1. The new algorithm, called BSGD-TV, is compared with block ADMM-TV [5]

, an algorithm sharing the same parallel architecture and the same communication cost. Simulation results show that BSGD-TV is significantly faster as it requires significantly fewer matrix vector products compared to block ADMM-TV.

2 BSGD-TV Algorithm

2.1 Algorithm description

BSGD works on blocks of and . We assume that is divided into row blocks and column blocks. Let and be sub-vectors of and and let be the associated block of matrix so that . Our algorithm splits the optimization into blocks, so that each parallel process only computes using a single block and for some and

. Each process also requires an estimate of the current residual

and computes a vector , both of which are of the same size as . The main steps (ignoring initialisation) are described in Algo.1.

1:  for  do
2:     for  pairs drawn randomly without replacement in parallel  do
5:     end for
8:      (constant )
10:  end for
Algorithm 1 BSGD-TV algorithm

To effectively solve line 9, we here adopt method proposed in [6].

2.2 BSGD Convergence

BSGD without the proximal operator (=0 in Eq.1), and with parallelization over all subsets can be shown to converge to the least squares solution. To see this, we write the update of as


In this form, BSGD is similar to gradient descent but uses an old gradient. Assume that there is a fixed point defined by . Note that the fixed point condition implies that, if is full column rank, then . Thus the fixed point is the least squares solution. Theorem 2.1 states the conditions on parameter for convergence when all subsets and

are selected within one epoch.

Theorem 2.1.

If , where

is the maximum eigenvalue of

and assume is full column rank, then BSGD without the TV operator (=0 in Eq.1), and with parallelization over all subsets converges to the least squares solution .

Proof of Theorem 2.1.

The iteration in Eq.3 can be written as


Standard convergence results for iterative method of this type with fixed require the spectral radius of to be less than 1 [7]. Let be any (possibly complex valued) eigenvalue of , i.e. satisfies det. It is straightforward to obtain:


By Eq.5, we see that eigenvalues of correspond to


Eigenvalues of are then given by


As the spectral radius of corresponds to the largest magnitude of the eigenvalues of , we require and to ensure the convergence of the algorithm. is a positive definite matrix and thus has only positive, real valued eigenvalues . Thus and are real valued if and complex valued if is . In the complex case, it is easy to see that and if , implying that the acceptable range of is . ∎

Theorem 2.2 gives a general convergence condition when applying BSGD-TV to solve Eq.1.

Theorem 2.2.

If the constant step length satisfies


where is defined in Eq.1, then BSGD-TV converges to the optimal solution of Eq.1.

Proof of Theorem 2.2.

With parallelization over all subsets, BSGD-TV computes


We define a function as


where is the squared norm. The fact that means that:


Finally, the definition of and the requirement on the step length in Eq.8, mean that holds, so that BSGD-TV converges to the fixed point of Eq.1. ∎

3 Simulations

We show experimentally that the method also converges when only a fraction and of subsets of and are randomly selected to calculate the at each iteration. The simulation geometry is shown in Fig.1.

Figure 1: Using a fan-beam x-rays geometry to scan a 2D Shepp-Logan phantom. Projections are taken at intervals. The image is partitioned into 4 subsets and the total projections are also partitioned into 4 subsets.

We add Gaussian noise to the projections so that the SNR of is 17.7 dB. We define the relative error as , where is the norm of the difference between reconstructed image vector and the original vector . Convergence is shown in Fig.2a. We plot relative error against epochs, where an epoch is a normalised iteration count that corrects for the fact that the stochastic version of our algorithm only updates a subset of elements at each iteration.

BSGD-TV and ADMM-TV are faster than ISTA in terms of epochs. However, ADMM-TV is significantly slower than BSGD-TV, because ADMM-TV requires matrix inversions at each iteration, while BSGD does not. Even when implementing ADMM-TV using as few conjugate gradient iterations per step as possible, as shown in Fig.2b, BSGD-TV is more computationally efficient in terms of the number of required matrix vector multiplications [3]. Compared to ISTA and GD, our block method allows these computations to be fully parallelised which would enable to reconstruct large scale CT reconstructions while the computation node have limited storage capacity.

Figure 2: The step length for ISTA, gradient descent (GD, which solves in Eq.1 ) and BSGD is and in Eq.1 is . For BSGD and ADMM, is divided into sub-matrices while ISTA processes as a whole. (a): Relative error vs. epoch. The high relative error of GD suggests the necessity of incorporating the TV norm. (b): BSGD-TV uses significantly fewer matrix-vector multiplications compared to ADMM-TV.

4 Conclusion

BSGD-TV is a parallel algorithm for large scale TV constrained CT reconstruction. It is similar to the popular ISTA algorithm but is specially designed for optimisation in distributed networks. The advantage is that individual compute nodes only operate on subsets of and , which means they can operate with less internal memory. The method converges significantly faster than block-ADMM methods.


  • [1] X. Guo, “Convergence studies on block iterative algorithms for image reconstruction”, Applied Mathematics and Computation, 273: 525–534, 2016.
  • [2] E. Sidky and X.Pan, “Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization”, Physics in Medicine Biology, 53(17):4777–4807, 2008.
  • [3] Y. Gao and T.Blumensath, “A Joint Row and Column Action Method for Cone-Beam Computed Tomography”, IEEE Transactions on Computational Imaging, 2018.
  • [4] A. Beck and M.Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems”, SIAM journal on imaging sciences, 2(1):183–202, 2009.
  • [5] N. Parikh and S. Boyd, “Block splitting for distributed optimization”, Mathematical Programming Computation, 6(1):77–102, 2014.
  • [6] A. Beck and M.Teboulle, “Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems”, IEEE Transactions on Image Processing, 18(11):2419–2434, 2009.
  • [7] Y. Saad, “Iterative methods for sparse linear systems”, siam,82, 2003.