Our algorithm is inspired by applications in computed tomography (CT), where the efficient inversion of large sparse linear systems is required : , where is the vectorised version of a 3D image that is to be reconstructed and is an X-ray projection model. are the vectorised noisy projections. We are interested in minimizing , where is quadratic and is convex but non-smooth . For example:
where is a relaxation parameter and is the total variation (TV) of the image . For 2D images, it is defined as:
where is the intensity of image pixel in row and column .
We recently introduced a parallel reconstruction algorithm called coordinate-reduced stochastic gradient descent (CSGD) to minimize quadratic objective function. We here introduce a slight modification by simplifying the step length calculation and show that the modified version converges to the least squares solution of . We will call this modified algorithm block stochastic gradient descend (BSGD). We combine BSGD with an iterative shrinkage/thresholding (ISTA-type) step  to solve Eq.1. The new algorithm, called BSGD-TV, is compared with block ADMM-TV 
, an algorithm sharing the same parallel architecture and the same communication cost. Simulation results show that BSGD-TV is significantly faster as it requires significantly fewer matrix vector products compared to block ADMM-TV.
2 BSGD-TV Algorithm
2.1 Algorithm description
BSGD works on blocks of and . We assume that is divided into row blocks and column blocks. Let and be sub-vectors of and and let be the associated block of matrix so that . Our algorithm splits the optimization into blocks, so that each parallel process only computes using a single block and for some and
. Each process also requires an estimate of the current residualand computes a vector , both of which are of the same size as . The main steps (ignoring initialisation) are described in Algo.1.
To effectively solve line 9, we here adopt method proposed in .
2.2 BSGD Convergence
BSGD without the proximal operator (=0 in Eq.1), and with parallelization over all subsets can be shown to converge to the least squares solution. To see this, we write the update of as
In this form, BSGD is similar to gradient descent but uses an old gradient. Assume that there is a fixed point defined by . Note that the fixed point condition implies that, if is full column rank, then . Thus the fixed point is the least squares solution. Theorem 2.1 states the conditions on parameter for convergence when all subsets and
are selected within one epoch.
Proof of Theorem 2.1.
The iteration in Eq.3 can be written as
Standard convergence results for iterative method of this type with fixed require the spectral radius of to be less than 1 . Let be any (possibly complex valued) eigenvalue of , i.e. satisfies det. It is straightforward to obtain:
By Eq.5, we see that eigenvalues of correspond to
Eigenvalues of are then given by
As the spectral radius of corresponds to the largest magnitude of the eigenvalues of , we require and to ensure the convergence of the algorithm. is a positive definite matrix and thus has only positive, real valued eigenvalues . Thus and are real valued if and complex valued if is . In the complex case, it is easy to see that and if , implying that the acceptable range of is . ∎
Proof of Theorem 2.2.
We show experimentally that the method also converges when only a fraction and of subsets of and are randomly selected to calculate the at each iteration. The simulation geometry is shown in Fig.1.
We add Gaussian noise to the projections so that the SNR of is 17.7 dB. We define the relative error as , where is the norm of the difference between reconstructed image vector and the original vector . Convergence is shown in Fig.2a. We plot relative error against epochs, where an epoch is a normalised iteration count that corrects for the fact that the stochastic version of our algorithm only updates a subset of elements at each iteration.
BSGD-TV and ADMM-TV are faster than ISTA in terms of epochs. However, ADMM-TV is significantly slower than BSGD-TV, because ADMM-TV requires matrix inversions at each iteration, while BSGD does not. Even when implementing ADMM-TV using as few conjugate gradient iterations per step as possible, as shown in Fig.2b, BSGD-TV is more computationally efficient in terms of the number of required matrix vector multiplications . Compared to ISTA and GD, our block method allows these computations to be fully parallelised which would enable to reconstruct large scale CT reconstructions while the computation node have limited storage capacity.
BSGD-TV is a parallel algorithm for large scale TV constrained CT reconstruction. It is similar to the popular ISTA algorithm but is specially designed for optimisation in distributed networks. The advantage is that individual compute nodes only operate on subsets of and , which means they can operate with less internal memory. The method converges significantly faster than block-ADMM methods.
-  X. Guo, “Convergence studies on block iterative algorithms for image reconstruction”, Applied Mathematics and Computation, 273: 525–534, 2016.
-  E. Sidky and X.Pan, “Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization”, Physics in Medicine Biology, 53(17):4777–4807, 2008.
-  Y. Gao and T.Blumensath, “A Joint Row and Column Action Method for Cone-Beam Computed Tomography”, IEEE Transactions on Computational Imaging, 2018.
-  A. Beck and M.Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems”, SIAM journal on imaging sciences, 2(1):183–202, 2009.
-  N. Parikh and S. Boyd, “Block splitting for distributed optimization”, Mathematical Programming Computation, 6(1):77–102, 2014.
-  A. Beck and M.Teboulle, “Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems”, IEEE Transactions on Image Processing, 18(11):2419–2434, 2009.
-  Y. Saad, “Iterative methods for sparse linear systems”, siam,82, 2003.