## 1 Introduction

Since the 1930s, radio astronomy has transitioned from the first observations to a data-rich era. Due to rapid technological developments, radio astronomy will transition from a data-rich era to the so-called big-data era in coming years. Radio interferometric (RI) telescopes allow us to explore the Universe by detecting radio waves emitted from a wide range of objects in the sky. They observe the radio sky with high angular resolution and sensitivity, providing valuable information for astrophysics and cosmology (Ryle & Vonberg, 1946; Ryle & Hewish, 1960; Thompson et al., 2008). Next-generation RI telescopes are being built to achieve science goals ranging from probing cosmic magnetic fields (Johnston-Hollitt et al., 2015)

to the detection of the epoch of re-ionisation (Koopmans et al., 2015), to name just a few.

Representative next-generation RI telescopes include: the LOw Frequency ARray (LOFAR, http://www.lofar.org; van Haarlem et al., 2013), the Extended Very Large Array (EVLA, http://www.aoc.nrao.edu/evla), the Australian Square Kilometre Array Pathfinder (ASKAP, http://www.atnf.csiro.au/projects/askap; Hotan et al., 2014), the Murchison Widefield Array (MWA, http://www.mwatelescope.org/telescope; Tingay et al., 2013), and the Square Kilometre Array (SKA, http://www.skatelescope.org; Dewdney et al., 2013). These new telescopes will acquire large volumes of data, and achieve significantly higher dynamic range and angular resolution than previous generations. The SKA itself, for instance, will provide a considerable step in dynamic range – six or seven orders of magnitude beyond prior telescopes – and in angular resolution, while producing massive volumes of data. For example, data rate estimates for SKA phase I are around five terabits per second for both SKA1-low (a low-frequency aperture array) and SKA1-mid (a mid-frequency array of reflector dishes)

(Broekema et al., 2015). Moreover, the data volume will be greater still in SKA phase II.

Briefly speaking, radio interferometers sample Fourier coefficients (visibilities) of the radio brightness distribution of the sky. Due to limited sampling in the Fourier plane, imaging an observation requires solving an ill-posed linear inverse problem (Thompson et al., 2008), which is an important first step in many subsequent scientific analyses. The emerging era of big-data ushered in by the new generation of radio telescopes will, inevitably, bring further challenges in imaging and scientific analysis. The enormous data rates will create practical challenges in both storage and computation.

Classical image reconstruction methods, such as CLEAN-based methods (Högbom, 1974; Bhatnagar & Cornwell, 2004; Cornwell, 2008; Stewart et al., 2011; Offringa et al., 2014; Pratley & Johnston-Hollitt, 2016) and the maximum entropy method (MEM) (Ables, 1974; Gull & Daniell, 1978; Cornwell & Evans, 1985), have served the community well but do not exploit modern image reconstruction techniques. They suffer from cumbersome parameter tuning and/or slow computation speed and require post-processing due to their limited image models. Furthermore, they struggle to confront the upcoming big-data era. Recently, compressive sensing (CS) techniques were ushered into RI imaging for image reconstruction (Wiaux et al., 2009a, b; McEwen & Wiaux, 2011; Li et al., 2011a, b; Carrillo et al., 2012, 2014; Wolz et al., 2013; Dabbech et al., 2015, 2017a, 2017b; Garsden et al., 2015; Onose et al., 2016; Onose et al., 2017; Pratley et al., 2018; Kartik et al., 2017a, b). CS-based methods exploit sparse regularisation techniques and have shown promising results compared to traditional approaches such as CLEAN (e.g. Pratley et al. 2018; Dabbech et al. 2017b; Kartik et al. 2017a). Furthermore, several algorithms (Carrillo et al., 2014; Onose et al., 2016; Onose et al., 2017; Kartik et al., 2017b) have been developed to scale such approaches to big-data, as anticipated from the SKA, using, e.g., distribution, parallelisation, dimensionality reduction, and/or stochastic strategies. In Cai et al. (2017b, a)

, Bayesian inference techniques for sparsity-promoting priors were presented to quantify the uncertainties associated with reconstructed images, e.g. to estimate local credible intervals (cf. error bars) on recovered pixels. In particular, in Cai et al. (2017a), maximum-a-posteriori (MAP) estimation techniques were presented to scale uncertainty quantification to massive data sizes, i.e. to big-data.

All of these reconstruction methods (e.g., CLEAN, MEM and CS-based methods), however, store the entire set of observed visibilities for subsequent processing once data acquisition is completed (i.e., after the full observation is made). In other words, they can be categorised as offline methods. In this article we develop online methods for RI imaging. Online methods process data piece-by-piece as they are acquired, without having the entire data-set available from the start (Shalev-Shwartz, 2011; Hazan, 2015). Processing is thus performed while the data are acquired. In particular, online methods generally outperform offline methods in terms of computational efficiency, and memory and storage costs. In RI imaging, since the visibility acquisition process can take a reasonably long time (often 10 hours or longer) and the observed visibilities require huge storage, particularly in the big-data era, online methods can provide considerable advantages.

In this article we propose a special type of online imaging method for RI. The proposed online method is based on iterative convex optimisation algorithms (e.g. Combettes & Pesquet, 2010), which have been applied to RI imaging problems already (e.g. Cai et al., 2017a)

. Compared with standard (offline) algorithms, which use the visibilities from an entire observation at every iteration, our online method needs to deal only with the single visibility block acquired at the latest time slot at each iteration. Most importantly, a visibility block is discarded as soon as the online method assimilates it. The corresponding storage space is released at the same time and can then be reused for the next data block. Consequently, the storage requirements of our online algorithm are limited to the size of a visibility block, rather than the size of the full set of visibilities acquired over an observation; they are thus a very small fraction of the storage requirements of offline methods. Once the last data block arrives (at which point all visibilities have been observed) and is processed, the image from the observation will have been reconstructed by our online method. At the moment the final visibilities are acquired, our online method is close to completing its reconstruction, whereas standard methods are only able to start theirs. Moreover, we verify the convergence properties of our online method and show that the quality of the images it recovers is essentially the same as that of the equivalent offline method. Although our proposed online method is customised for the application of RI imaging, the concept of the method itself is generic and can therefore be applied directly to many other applications, such as medical imaging. Furthermore, our online framework supports the uncertainty quantification methodology proposed in

Cai et al. (2017a). Both techniques can thus be combined to target image reconstruction and uncertainty analysis for the upcoming big-data era.

The remainder of this article is organised as follows. In Section 2

we review the RI imaging inverse problem, MAP (maximum a posteriori) estimation, related state-of-the-art optimisation algorithms and some classical online methods. Our general online optimisation algorithm is proposed in Section

3, with a discussion of its convergence properties. Section 4 focuses on sparse image reconstruction for RI imaging using the proposed online method, including an analysis of visibility storage requirements and computational cost. Numerical results evaluating the performance of our proposed method and comparisons with related methods are reported in Section 5. Finally, in Section 6, we conclude with a brief description of our main contributions and future work.

## 2 Radio interferometric imaging and related methods

In this section the inverse reconstruction problem of RI imaging is first reviewed. Then, we review MAP estimation techniques to address the RI imaging problem efficiently. Finally, some representative online optimisation methods are reviewed.

### 2.1 Radio interferometry

In the following, we briefly recall the background of the inverse problem of RI imaging (for further details see, e.g., Cai et al. 2017b and references therein). A radio interferometer consists of an array of radio antennae, where each pair of antennae forms a baseline. When the field of view is narrow and the baselines are co-planar, the telescope measures visibilities, $y(\boldsymbol{u})$, by correlating the signals from the pairs of antennae of each baseline, with the baseline vector $\boldsymbol{u} = (u, v)$ being defined as the separation of the antennae.

Let $x(\boldsymbol{l})$ represent the sky brightness distribution, described in coordinates $\boldsymbol{l} = (l, m)$, and let $A(\boldsymbol{l})$ represent the primary beam of the telescope. The RI measurement equation for obtaining $y(\boldsymbol{u})$ can be represented as (Thompson et al., 2008)

$$y(\boldsymbol{u}) = \int A(\boldsymbol{l})\, x(\boldsymbol{l})\, {\rm e}^{-2\pi {\rm i}\, \boldsymbol{u} \cdot \boldsymbol{l}}\, {\rm d}^2 \boldsymbol{l} . \qquad (1)$$

Recovering the sky intensity signal $x$ from the measured visibilities $y(\boldsymbol{u})$, acquired according to equation (1), forms a linear inverse problem (Rau et al., 2009).

In the discrete setting, let $x \in \mathbb{R}^{N}$ represent the sampled intensity signal (the sky brightness distribution) which, under a basis or dictionary (e.g., a wavelet basis or an over-complete frame) $\Psi \in \mathbb{C}^{N \times L}$, can be represented as

$$x = \Psi a, \qquad (2)$$

where the vector $a \in \mathbb{C}^{L}$ represents the synthesis coefficients of $x$ under $\Psi$. In particular, $x$ is said to be sparse if $a$ contains only $K$ non-zero coefficients, with $K \ll N$. Similarly, $x$ is called compressible if many coefficients of $a$ are nearly zero, i.e., its sorted coefficients satisfy a power-law decay. In practice, natural images are ubiquitously sparse or compressible. Let $y \in \mathbb{C}^{M}$ be the visibilities observed under a linear measurement operator $\Phi \in \mathbb{C}^{M \times N}$ modelling the realistic acquisition of the sky brightness components. Then, we have

$$y = \Phi x + n, \qquad (3)$$

where $n \in \mathbb{C}^{M}$ represents additive noise. Without loss of generality, we subsequently consider independent and identically distributed (i.i.d.) Gaussian noise. In practice, the Fourier plane is not sufficiently sampled, which results in an ill-posed inverse problem from the perspective of reconstructing the image $x$ from the data $y$ in (3).

For the upcoming big-data era, the number of data points in $y$ could be much larger than the size of the image $x$, i.e., $M \gg N$, presenting additional challenges in processing speed, memory load and data storage requirements. These new challenges call for new efficient solvers, in addition to the distributed, parallel and stochastic computation approaches already developed (e.g. Carrillo et al. 2014; Onose et al. 2016).
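To make the discrete model concrete, the following minimal sketch simulates $y = \Phi x + n$ with $\Phi$ modelled, purely as an illustrative assumption, by a random mask of Fourier coefficients (mimicking incomplete sampling of the Fourier plane); the image size, mask fraction and noise level are arbitrary:

```python
import numpy as np

# Toy illustration of the discrete measurement model y = Phi x + n. Here Phi is
# modelled -- purely as an illustrative assumption -- by a random mask of 2D
# Fourier coefficients, mimicking incomplete sampling of the Fourier plane.
rng = np.random.default_rng(1)
n = 32                                     # image side length (N = n*n pixels)
x = np.zeros((n, n)); x[8:12, 8:12] = 1.0  # toy sky brightness distribution

F = np.fft.fft2(x, norm="ortho")           # full Fourier plane
mask = rng.random((n, n)) < 0.3            # keep roughly 30% of coefficients
M = int(mask.sum())                        # number of "visibilities", M < N

sigma = 0.01
noise = sigma * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
y = F[mask] + noise                        # measured visibilities

# Naive inversion (zero-filling the unmeasured coefficients) yields only a
# crude "dirty"-like image: the inverse problem is ill-posed.
F_fill = np.zeros((n, n), dtype=complex)
F_fill[mask] = y
x_dirty = np.real(np.fft.ifft2(F_fill, norm="ortho"))
print(M, x.size)
```

Zero-filling the unmeasured coefficients recovers only a crude approximation of $x$, which is why the regularised reconstruction techniques reviewed next are needed.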

### 2.2 Maximum a posteriori (MAP) estimation

One very effective way to address the ill-posed inverse problem in (3) is by using Bayes' theorem to infer the posterior distribution of the image $x$ given data $y$, i.e.,

$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}, \qquad (4)$$

where $p(y \mid x)$ is the likelihood and $p(x)$ is the prior. The normalising constant $p(y)$ in the denominator of (4) is the marginal likelihood or Bayesian evidence, which need not be computed for parameter inference. The MAP estimator, a Bayesian point estimator, can be obtained as a solution of problem (3) by

$$x_{\rm map} = \mathop{\rm argmax}_{x}\, p(x \mid y). \qquad (5)$$

In the following we derive a common objective functional used to produce $x_{\rm map}$. For the RI imaging problem of (3), in the case of i.i.d. Gaussian noise, the likelihood function reads

$$p(y \mid x) \propto \exp\!\left( -\frac{\|y - \Phi x\|_2^2}{2\sigma^2} \right), \qquad (6)$$

and let the prior distribution of $x$ be

$$p(x) \propto \exp\!\big( -\phi({\mathsf B} x) \big), \qquad (7)$$

where $\sigma$ represents the standard deviation of the noise, $\Phi$ and ${\mathsf B}$ are problem-related linear operators, and $\phi$ encodes prior information about the image (acting as a regularising penalty). Refer to Cai et al. (2017b) for more detailed discussion regarding choices of the prior function $\phi$. Let

$$F(x) = \phi({\mathsf B} x) + \frac{\|y - \Phi x\|_2^2}{2\sigma^2}. \qquad (8)$$

Consider $\phi$ convex, from which it follows that $F$ is convex. Then the inverse problem in (3) can be solved by the MAP estimator given by

$$x_{\rm map} = \mathop{\rm argmin}_{x} \Big\{ \phi({\mathsf B} x) + \frac{\|y - \Phi x\|_2^2}{2\sigma^2} \Big\}. \qquad (9)$$

Refer to Cai et al. (2017b) for further details about Bayesian inference in the context of RI imaging.

Let ${\mathsf B} = \Psi^{\dagger}$ (the adjoint of $\Psi$) and $\bar{x} = x$ for the analysis setting, and ${\mathsf B} = {\mathsf I}$ (the identity operator) and $\bar{x} = a$ for the synthesis setting. After equipping $\phi$ with the $\ell_1$ norm, $\phi(\cdot) = \mu\|\cdot\|_1$, to promote sparseness (Wiaux et al., 2009a, b; McEwen & Wiaux, 2011; Onose et al., 2016; Pratley et al., 2018; Cai et al., 2017a, b), the MAP estimation model in (9) reads

$$x_{\rm map} = \mathop{\rm argmin}_{x} \Big\{ \mu \|\Psi^{\dagger} x\|_1 + \frac{\|y - \Phi x\|_2^2}{2\sigma^2} \Big\} \qquad (10)$$

or

$$a_{\rm map} = \mathop{\rm argmin}_{a} \Big\{ \mu \|a\|_1 + \frac{\|y - \Phi \Psi a\|_2^2}{2\sigma^2} \Big\}, \qquad (11)$$

where $\mu$ is the regularisation parameter used to balance the trade-off between sparsity and data fidelity; for the synthesis model, the image is then recovered as $x_{\rm map} = \Psi a_{\rm map}$. Models (10) and (11) are generally coined analysis and synthesis unconstrained frameworks, respectively. Further discussion of the analysis and synthesis forms can be found in, e.g., Maisinger et al. (2004); Elad et al. (2007); Cleju et al. (2012); Cai et al. (2017b, a).

### 2.3 Convex optimisation methods for MAP estimation

MAP estimation models like the analysis model (10) and the synthesis model (11) can be solved by convex optimisation methods.

Consider a general problem represented as

$$\min_{\bar{x}} \big\{ F(\bar{x}) \equiv f(\bar{x}) + h(\bar{x}, y) \big\}, \qquad (12)$$

where $f$ is proper, convex and lower semi-continuous, and $h(\cdot, y)$, which is usually abbreviated as $h(\cdot)$ afterwards (when it is associated with all of $y$), is convex and differentiable with a $\beta$-Lipschitz gradient, i.e.,

$$\|\nabla h(z_1) - \nabla h(z_2)\|_2 \le \beta \|z_1 - z_2\|_2, \quad \forall z_1, z_2. \qquad (13)$$

Define, $\forall \lambda > 0$, the proximity operator of convex function $f$ at $z$ as (Moreau, 1965)

$${\rm prox}_f^{\lambda}(z) := \mathop{\rm argmin}_{\bar{x}} \Big\{ f(\bar{x}) + \frac{\|\bar{x} - z\|_2^2}{2\lambda} \Big\}, \qquad (14)$$

and, when no confusion arises, represent ${\rm prox}_f^{\lambda}$ by ${\rm prox}_f$ for simplification.
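As a quick numerical sanity check of definition (14): for the $\ell_1$ norm, the proximity operator has the well-known closed form given by soft-thresholding. The sketch below (with arbitrary test values) verifies that this closed form attains a lower objective in (14) than randomly perturbed candidates:

```python
import numpy as np

# Numerical sanity check of the proximity operator definition (14) for the
# l1 norm f = ||.||_1, whose prox is the soft-thresholding map. Values below
# are arbitrary test data.
rng = np.random.default_rng(2)
z = rng.standard_normal(10)
lam = 0.3

def obj(xbar):
    """The functional inside the argmin of definition (14)."""
    return np.sum(np.abs(xbar)) + np.sum((xbar - z) ** 2) / (2 * lam)

prox = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)  # closed-form prox

# The prox point is the global minimiser of (14), so it must beat any
# randomly perturbed candidate.
candidates = [prox + 0.1 * rng.standard_normal(10) for _ in range(100)]
assert all(obj(prox) <= obj(c) for c in candidates)
print(obj(prox))
```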

The minimisation problem with form (12) can be solved by many convex optimisation methods, e.g., the forward-backward splitting algorithm, the Douglas-Rachford splitting algorithm, the alternating direction method of multipliers (ADMM), or the simultaneous direction method of multipliers (SDMM) (see Combettes & Pesquet 2010 and references therein). In the following, we briefly recall the forward-backward algorithm, due to its simplicity, efficiency, and pertinence to the objective functionals considered in this article, i.e., the analysis model (10) and synthesis model (11).

In general, from the fixed point equation

$$\bar{x}^{*} = {\rm prox}_f^{\lambda}\big( \bar{x}^{*} - \lambda \nabla h(\bar{x}^{*}) \big), \qquad (15)$$

the iteration formula of the forward-backward algorithm can be written as (Combettes & Pesquet, 2010)

$$\bar{x}^{(k+1)} = {\rm prox}_f^{\lambda_k}\big( \bar{x}^{(k)} - \lambda_k \nabla h(\bar{x}^{(k)}) \big), \qquad (16)$$

where $\lambda_k$ is the step size, lying in a suitable bounded interval (e.g. $\lambda_k \in (0, 2/\beta)$). After sufficient iterations, the sequence $\{\bar{x}^{(k)}\}$ generated by (16) converges to a minimiser of problem (12). There are several variants of (16) that achieve more efficient performance; refer to Combettes & Wajs (2005), Bauschke & Combettes (2011) and Cai et al. (2017a) for more detailed discussions.
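The forward-backward iteration (16) can be sketched in a few lines. The toy problem below is a small synthesis-type objective, $\min_a \mu\|a\|_1 + \|y - \Phi a\|_2^2/(2\sigma^2)$, with an assumed random Gaussian $\Phi$ and illustrative parameter values; the proximal step for the $\ell_1$ term is soft-thresholding:

```python
import numpy as np

# Forward-backward splitting (cf. (16)) on a toy sparse reconstruction problem.
# Phi, sizes and parameter values below are illustrative assumptions only.
rng = np.random.default_rng(0)
N, M = 64, 32
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # toy measurement operator
a_true = np.zeros(N); a_true[[3, 17, 40]] = [1.0, -0.7, 0.5]
sigma = 0.1
y = Phi @ a_true + sigma * rng.standard_normal(M)

mu = 5.0                                          # regularisation parameter
beta = np.linalg.norm(Phi, 2) ** 2 / sigma ** 2   # Lipschitz constant of grad h
lam = 1.0 / beta                                  # step size within (0, 2/beta)

def soft(v, t):
    """Soft-thresholding: the proximity operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(a):
    return mu * np.sum(np.abs(a)) + np.sum((y - Phi @ a) ** 2) / (2 * sigma ** 2)

a = np.zeros(N)
for k in range(500):
    grad = Phi.T @ (Phi @ a - y) / sigma ** 2     # forward (gradient) step
    a = soft(a - lam * grad, lam * mu)            # backward (proximal) step

print(objective(a))  # decreased monotonically from objective(0)
```

With a step size in $(0, 2/\beta)$ the iterates converge to a minimiser of the objective; accelerated variants (e.g. FISTA-type schemes) follow the same forward/backward pattern.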

### 2.4 Online optimisation methods

Nowadays, various online problems have emerged in different research areas and application domains, such as computer science and machine learning (e.g. online regression, pattern recognition); for more details refer to Shalev-Shwartz (2011) and Hazan (2015), and references therein. Even though these problems differ from our focus in this article, their intrinsic ideas, i.e., their online manner, can help to design appropriate methods in new applications such as RI imaging. We therefore, in the following, briefly recall some classic online optimisation methods.

The general online optimisation protocol can be described as follows. In general, consider a set of possible actions $X$, a set of incoming observations $Y$, and a given loss function $\ell: X \times Y \to \mathbb{R}$. Here the set $Y$ has a general meaning and is not restricted to RI observations. At each time slot $t$: (i) choose an action $x_t \in X$ and simultaneously receive an observation $y_t \in Y$; (ii) incur the loss $\ell(x_t, y_t)$. Then, the objective of the online optimisation protocol is to select actions that minimise the total loss

$$\sum_{t=1}^{T} \ell(x_t, y_t). \qquad (17)$$

The above problem (17) can be solved by many methods, such as online mirror descent, the online Newton step algorithm, or online gradient methods (see Shalev-Shwartz 2011; Hazan 2015 for more detail). As an example, starting at $x_1 \in X$, the online gradient method, for $t = 1, \ldots, T$, updates $x_t$ iteratively by

$$\tilde{x}_{t+1} = x_t - \lambda_t \nabla_x \ell(x_t, y_t), \qquad (18)$$

$$x_{t+1} = P_X\big( \tilde{x}_{t+1} \big), \qquad (19)$$

where $\lambda_t$ is the step size and $P_X$ denotes the projection onto the action set $X$.

Note that in the iteration formula (16) all acquired observations are needed at every iteration. On the contrary, the online iteration formula (18) shows that, at iteration $t$, only the action $x_t$ (computed at the last iteration) and the latest incoming observation $y_t$ (not all observations) are used to derive a new action. In the online setting, the complete set of observations is only known at the final stage. Seeing the visibility acquisition in RI imaging in this manner, i.e., the visibilities being observed one-by-one (or bunch-by-bunch), the online approach mentioned above can be adapted to tackle the RI imaging problem. In the next section we propose our generic online method for minimisation problems such as (12), based on convex optimisation methods applicable for MAP estimation.
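A minimal sketch of the online gradient update (18)-(19) on a toy streaming least-squares problem follows; the loss, dimensions and the constant step size are illustrative assumptions, and the projection step (19) is trivial here because the action set is all of $\mathbb{R}^d$:

```python
import numpy as np

# Sketch of the online gradient update (18)-(19) on a toy streaming
# least-squares problem with loss l(x, (a_t, y_t)) = (a_t . x - y_t)^2 / 2.
# The pairs (a_t, y_t) arrive one at a time; dimensions and the constant step
# size are illustrative assumptions. The projection step (19) is trivial here
# because the action set is all of R^d.
rng = np.random.default_rng(3)
d, T = 5, 2000
x_true = rng.standard_normal(d)

x = np.zeros(d)                       # initial action x_1
for t in range(T):
    a_t = rng.standard_normal(d)      # latest incoming observation ...
    y_t = a_t @ x_true                # ... and its (noise-free) response
    grad = (a_t @ x - y_t) * a_t      # gradient of the loss at (x_t, y_t)
    x = x - 0.1 * grad                # gradient step; no projection needed
print(np.linalg.norm(x - x_true))
```

Each update uses only the current action and the latest observation, which is exactly the property exploited in the online RI imaging method proposed in the next section.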

## 3 Proposed online imaging method

In practice, in RI the time of acquiring the measurements can be long (often 10 hours or longer), and the space needed to store the data can be extremely large, particularly in the big-data era. Waiting to obtain and store all measurements is not efficient and imposes large costs that may be avoided. Furthermore, popular RI imaging methods (e.g. CLEAN and CS-based methods) can require relatively long computation times to recover images. There is an urgent need for online processing of the incoming data to recover images, which is the main focus of this article. In this section we present our general online reconstruction method for solving inverse imaging problems, i.e., the analysis model (10) and synthesis model (11), with its convergence analysis. This new technique can significantly improve data processing efficiency in both computation time and data storage, and is essential to cope with the challenges of next-generation radio telescopes in the big-data era.

The diagram in Figure 1 shows the methodology of our proposed online method. As is shown, firstly, the algorithm checks whether the data observation stage has completed. If yes, no new data block will be observed and thus the online method stops. Otherwise, the algorithm: loads the new observed data block; assimilates it; releases the data block; updates the intermediate reconstructed image (using the newly assimilated data); and then sets the current reconstructed image as the starting point for the next iteration. The above steps are repeated until the data observation stage completes and then the final reconstructed image is set as the output – the reconstruction result of the online method.

### 3.1 Data blocks

Without loss of generality, we split the measurements $y$ into $B$ blocks and assume these blocks are received at different but consecutive time slots, i.e.,

$$y = \big( y_1^{\top}, \cdots, y_B^{\top} \big)^{\top}, \qquad (20)$$

where block $y_i \in \mathbb{C}^{M_i}$ is received earlier than $y_{i+1}$ and $\sum_{i=1}^{B} M_i = M$. Accordingly, we split the measurement operator $\Phi$ into $B$ blocks, i.e.,

$$\Phi = \big( \Phi_1^{\top}, \cdots, \Phi_B^{\top} \big)^{\top}, \quad \Phi_i \in \mathbb{C}^{M_i \times N}. \qquad (21)$$

Note that, for each data block $y_i$, problem (3) can be rewritten as

$$y_i = \Phi_i x + n_i, \qquad (22)$$

where $n_i$ represents the additive noise associated with measurement block $y_i$.
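The row-wise partition (20)-(21) and the per-block relation (22) can be checked directly; the block count and sizes below are arbitrary:

```python
import numpy as np

# Direct check of the row-wise partition (20)-(21) and the per-block
# relation (22); block count and sizes below are arbitrary.
rng = np.random.default_rng(4)
M, N, B = 12, 6, 3
Phi = rng.standard_normal((M, N))
x = rng.standard_normal(N)
y = Phi @ x                            # noise-free for simplicity

y_blocks = np.array_split(y, B)        # y_1, ..., y_B
Phi_blocks = np.array_split(Phi, B)    # Phi_1, ..., Phi_B

# Each block satisfies its own sub-problem y_i = Phi_i x, as in (22).
for y_i, Phi_i in zip(y_blocks, Phi_blocks):
    assert np.allclose(y_i, Phi_i @ x)
print(len(y_blocks))
```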

### 3.2 Online algorithm

For the blocks of $y$ in (20), which are assumed to be obtained at different times, we suggest using the measurement blocks as they are received for early-stage reconstruction, rather than waiting and starting reconstruction only once all measurement blocks are received. Then, upon receiving the last measurement block, a complete reconstruction can be achieved immediately, or at least much faster than with an offline approach, since a close estimate of the underlying image is already available from the early-stage reconstructions. In the following, we present our online reconstruction strategy based on the standard forward-backward algorithm reviewed in the previous section.

We assume $h$ (the second term in (12)) is separable according to the blocks of $y$ in (20), i.e.,

$$h(\bar{x}, y) = \sum_{i=1}^{B} h(\bar{x}, y_i), \qquad (23)$$

where $h(\bar{x}, y_i)$ represents the function $h$ corresponding to data block $y_i$ in $y$. For simplicity, we use $h_i(\bar{x})$ to denote $h(\bar{x}, y_i)$ in the following. Note that the data fidelity term in the analysis form (10) or synthesis form (11) is indeed separable in this way. Then, using (23), the general minimisation problem (12) can be rewritten as

$$\min_{\bar{x}} \Big\{ F(\bar{x}) \equiv f(\bar{x}) + \sum_{i=1}^{B} h_i(\bar{x}) \Big\}. \qquad (24)$$

In the online setting, the blocks of $y$ are received one by one at different time slots, so the objective functional in (24) can only be formed exactly once all the blocks are acquired. For the first $s$ blocks available at some time slot, where $s \le B$, problem (24) can be written as

$$\min_{\bar{x}} \Big\{ f(\bar{x}) + \sum_{i=1}^{s} h_i(\bar{x}) \Big\}. \qquad (25)$$

Clearly, all the methods and techniques mentioned in Section 2.3 that are applicable to solve problem (12) can also be applied to solve problem (25). For example, applying the forward-backward iterative formula (16) to problem (25), we obtain

$$\bar{x}^{(k+1)} = {\rm prox}_f^{\lambda_k}\Big( \bar{x}^{(k)} - \lambda_k \sum_{i=1}^{s} \nabla h_i(\bar{x}^{(k)}) \Big), \qquad (26)$$

where $\lambda_k \in (0, 2/\beta_s)$, with $\beta_s$ the Lipschitz constant of the gradient of $\sum_{i=1}^{s} h_i$.

With a given starting point, the iterative strategy of the proposed online algorithm is implemented as follows. Starting with $s = 1$ (corresponding to the first data block), execute formula (26) associated with objective functional (25) for a few iterations (a single iteration can be applied in practice), and denote the current result by $\bar{x}_1$. Then, move to $s = 2$ (corresponding to the second data block) at the appropriate time by executing formula (26) using $\bar{x}_1$ as the starting point. Continue this procedure until $s = B$ (corresponding to the final data block), and denote the final result by $\bar{x}_B$ – the reconstruction result of the online method. In order to solve problem (24), the online algorithm tackles a series of subproblems (25) according to different $s$ ($s = 1, \ldots, B$), where the reconstruction result corresponding to the previous subproblem is used as a starting point to process the next one, until the final subproblem amounts to the full problem.
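The strategy above can be sketched as follows. The example uses a toy synthesis-type objective (soft-thresholding as the proximal step) with small explicit matrices and, as a simplifying assumption, maintains the running sums $\sum_i \Phi_i^{\top}\Phi_i$ and $\sum_i \Phi_i^{\top} y_i$ so that the gradient over all seen blocks can be formed even after each block is discarded; sizes and parameters are illustrative only:

```python
import numpy as np

# Sketch of the online strategy: one forward-backward iteration (cf. (26)) per
# incoming data block, on a toy synthesis-type objective with f = mu*||.||_1
# (soft-thresholding as the proximal step). As a simplifying assumption, the
# gradient over all seen blocks is maintained through the running sums
# G = sum_i Phi_i^T Phi_i and b = sum_i Phi_i^T y_i, so each visibility block
# can be discarded immediately after assimilation.
rng = np.random.default_rng(5)
N, M, B = 40, 80, 8
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N); x_true[[2, 9, 25]] = [1.0, -0.8, 0.6]
sigma, mu = 0.05, 1.0
y = Phi @ x_true + sigma * rng.standard_normal(M)

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

G = np.zeros((N, N))
b = np.zeros(N)
x = np.zeros(N)
for Phi_i, y_i in zip(np.array_split(Phi, B), np.array_split(y, B)):
    G += Phi_i.T @ Phi_i                     # assimilate the new block ...
    b += Phi_i.T @ y_i
    del Phi_i, y_i                           # ... then discard it (storage freed)
    lam = sigma ** 2 / np.linalg.norm(G, 2)  # step size within (0, 2/beta_s)
    grad = (G @ x - b) / sigma ** 2          # gradient over all seen blocks
    x = soft(x - lam * grad, lam * mu)       # one iteration per block
print(np.linalg.norm(x - x_true))
```

When the last block is assimilated, an estimate of the full-problem solution is already available; in this toy case only $B$ iterations in total are executed.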

We summarise our online algorithm in Algorithm 1. While we present an overview of the general form of the algorithm, it is possible to separate the data assimilation and imaging stages and discard each data block as soon as its data are assimilated (as discussed in Section 4). The stopping criteria used in Algorithm 1 are specified as type I and type II, and are defined as follows: (i) type I can be set using the maximum number of data blocks (if known in advance) or feedback indicating that no new data blocks are available; (ii) type II can be set using a chosen maximum iteration number and, in practice, we set this value to 1, which requires the least computational cost. Apart from the maximum iteration number, the relative error between the solutions at two consecutive iterations can also be adopted as a stopping criterion for type II.

###### Remark 3.1

The forward-backward algorithm presented in Algorithm 1 is just a specific example of our proposed online methodology. The online iterative form, however, can be very general. Any iterative methods which are applicable for minimising problem (24) are likely to be compatible with the online strategy proposed here. In other words, these methods can be simply extended to online versions in the same manner.

### 3.3 Convergence analysis

Since the convergence properties of standard splitting algorithms have been verified (e.g. see Combettes & Pesquet 2010 and references therein), the convergence of our online algorithm is self-evident if the stopping criterion of type II in Algorithm 1 is proper, i.e., the maximum iteration number is set large enough. Nevertheless, it is still important to consider the convergence performance when executing as few iterations as possible, e.g. a single iteration in stopping criterion type II.

For simplicity, represent the iterative form (26) as

$$\bar{x}^{(k+1)} = \mathcal{T}_s\big( \bar{x}^{(k)} \big). \qquad (27)$$

In particular, the symbol $\mathcal{T}_s$ here is a general operator which represents any applicable type of iterative formula. Let $\bar{x}^{*}$ be a minimiser of problem (24). We want to show that $\bar{x}^{(k)}$ goes to $\bar{x}^{*}$ and $F(\bar{x}^{(k)})$ goes to $F(\bar{x}^{*})$. Obviously, when $s = B$ and $k \to \infty$, we have

$$F\big( \bar{x}^{(k)} \big) \to F(\bar{x}^{*}), \qquad (28)$$

due to the standard convergence results of splitting methods.

For the online algorithm, the computed $\bar{x}^{(k)}$ are associated with just part of the data blocks before all of them are acquired. As the $\bar{x}^{(k)}$ are finally expected to minimise problem (24), there is an important relationship between $\bar{x}^{(k)}$ and the objective functional $F$, which considers all visibility blocks. To address this point, the following Theorem 3.2 concludes that the sequence $\{F(\bar{x}^{(k)})\}$ is monotonically decreasing to $F(\bar{x}^{*})$, under a mild assumption: let $\bar{x}^{(k)}$ be obtained with $s$ data blocks, $s < B$; then assume

$$\sum_{i=s+1}^{B} h_i\big( \bar{x}^{(k+1)} \big) \le \sum_{i=s+1}^{B} h_i\big( \bar{x}^{(k)} \big). \qquad (29)$$

In words, inequality (29) means that the intermediate reconstruction corresponding to a later iteration fits the unknown data blocks better than the reconstruction corresponding to an earlier iteration. This is reasonable, since the more data that is received and used, the better the intermediate reconstruction should fit the whole data.

###### Theorem 3.2

Under assumption (29), the sequence $\{F(\bar{x}^{(k)})\}$ generated by the online algorithm (27) is monotonically decreasing.

###### Proof

We just need to verify, $\forall k \in \mathbb{N}$ (the set of natural numbers),

$$F\big( \bar{x}^{(k+1)} \big) \le F\big( \bar{x}^{(k)} \big). \qquad (30)$$

Obviously, the above inequality (30) is correct if $\bar{x}^{(k+1)}$ is produced using all the visibility blocks. Otherwise, if $\bar{x}^{(k+1)}$ is associated with $s$ visibility blocks, $s < B$, we have

$$f\big( \bar{x}^{(k+1)} \big) + \sum_{i=1}^{s} h_i\big( \bar{x}^{(k+1)} \big) \le f\big( \bar{x}^{(k)} \big) + \sum_{i=1}^{s} h_i\big( \bar{x}^{(k)} \big). \qquad (31)$$

The above inequality (31) is obtained using the convergence property of splitting methods: the total energy of the objective functional with respect to fixed visibilities (input) is monotone decreasing. Then, using (29) and (31), and the fact that

$$F(\bar{x}) = f(\bar{x}) + \sum_{i=1}^{s} h_i(\bar{x}) + \sum_{i=s+1}^{B} h_i(\bar{x}), \qquad (32)$$

we have

$$F\big( \bar{x}^{(k+1)} \big) = f\big( \bar{x}^{(k+1)} \big) + \sum_{i=1}^{s} h_i\big( \bar{x}^{(k+1)} \big) + \sum_{i=s+1}^{B} h_i\big( \bar{x}^{(k+1)} \big) \qquad (33)$$

$$\le f\big( \bar{x}^{(k)} \big) + \sum_{i=1}^{s} h_i\big( \bar{x}^{(k)} \big) + \sum_{i=s+1}^{B} h_i\big( \bar{x}^{(k)} \big) \qquad (34)$$

$$= F\big( \bar{x}^{(k)} \big). \qquad (35)$$

Thus inequality (30) still holds if $\bar{x}^{(k+1)}$ is associated with $s$ visibility blocks, $s < B$. This completes the proof.

## 4 Online RI Imaging

In this section we present the explicit procedures for applying the general online method proposed in the previous section (i.e., Algorithm 1) to find MAP estimators for the RI imaging problem, using both the analysis form (10) and synthesis form (11). Moreover, we also discuss the visibility storage requirements and computational costs of our online method. Throughout, we distinguish the symbols related to the analysis model from those related to the synthesis model (following Cai et al. 2017b, a).

### 4.1 MAP estimation by online convex optimisation

#### 4.1.1 Analysis

Set $f(x) = \mu\|\Psi^{\dagger}x\|_1$ and $h_i(x) = \|y_i - \Phi_i x\|_2^2/(2\sigma^2)$, $i = 1, \ldots, B$. Then the reconstruction problem for the analysis form (10), restricted to the first $s$ data blocks, i.e.,

$$\min_{x} \Big\{ \mu \|\Psi^{\dagger} x\|_1 + \sum_{i=1}^{s} \frac{\|y_i - \Phi_i x\|_2^2}{2\sigma^2} \Big\}, \qquad (36)$$

can be solved by the forward-backward iteration formula given in (26), i.e.,

$$x^{(k+1)} = {\rm prox}_f^{\lambda_k}\Big( x^{(k)} - \lambda_k \sum_{i=1}^{s} \nabla h_i\big( x^{(k)} \big) \Big). \qquad (37)$$

Assume $\Psi\Psi^{\dagger} = {\mathsf I}$, where ${\mathsf I}$ is the identity matrix. We have, $\forall v$,

$${\rm prox}_f^{\lambda}(v) = v + \Psi\big( {\rm soft}_{\lambda\mu}(\Psi^{\dagger}v) - \Psi^{\dagger}v \big), \qquad (38)$$

and

$$\nabla h_i(x) = \Phi_i^{\dagger}\big( \Phi_i x - y_i \big)/\sigma^2, \qquad (39)$$

where ${\rm soft}_{\lambda}(\cdot)$ is the pointwise soft-thresholding operator of a vector, defined componentwise by

$${\rm soft}_{\lambda}(v)_i = {\rm sign}(v_i) \max\big( |v_i| - \lambda, 0 \big). \qquad (40)$$

Refer to Cai et al. (2017b) for the derivation of (38). Substituting (38) and (39) into (37), the analysis problem (36) can be solved iteratively by

$$v^{(k+1)} = x^{(k)} - \frac{\lambda_k}{\sigma^2} \sum_{i=1}^{s} \Phi_i^{\dagger}\big( \Phi_i x^{(k)} - y_i \big), \qquad (41)$$

$$x^{(k+1)} = v^{(k+1)} + \Psi\big( {\rm soft}_{\lambda_k \mu}(\Psi^{\dagger} v^{(k+1)}) - \Psi^{\dagger} v^{(k+1)} \big), \qquad (42)$$

with the initialisation set to, e.g., $x^{(0)} = \Phi_1^{\dagger} y_1$, i.e. the dirty image corresponding to the first data block. Note, importantly, that the term related to $y_i$ in (41), i.e., $\Phi_i^{\dagger} y_i$, can be computed once in advance.
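The tight-frame prox formula (38) can be sanity-checked numerically. Below, $\Psi$ is taken, purely as an assumption for illustration, to be a random orthogonal matrix standing in for a wavelet basis (so $\Psi\Psi^{\dagger} = {\mathsf I}$ holds exactly), and (38) is compared against thresholding directly in the coefficient domain:

```python
import numpy as np

# Numerical check of the prox formula (38) for f = mu*||Psi^t . ||_1 when
# Psi Psi^t = I. As an illustrative assumption, Psi is a random orthogonal
# matrix standing in for a wavelet basis.
rng = np.random.default_rng(6)
N = 16
Psi, _ = np.linalg.qr(rng.standard_normal((N, N)))  # Psi @ Psi.T = identity
v = rng.standard_normal(N)
thr = 0.4                                           # threshold lambda * mu

soft = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

p = v + Psi @ (soft(Psi.T @ v, thr) - Psi.T @ v)    # formula (38)

# For an orthogonal Psi the prox can equally be computed in the coefficient
# domain: threshold the coefficients, then synthesise.
p_ref = Psi @ soft(Psi.T @ v, thr)
assert np.allclose(p, p_ref)
print(p[:3])
```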

###### Remark 4.1

In the analysis form (10), when choosing a $\Psi$ such that $\Psi\Psi^{\dagger} \ne {\mathsf I}$, i.e. an over-complete basis $\Psi$, then ${\rm prox}_f^{\lambda}(v)$ can be computed in an iterative manner:

$$u^{(j+1)} = P_{\lambda\mu}\big( u^{(j)} + \delta\, \Psi^{\dagger} t^{(j)} \big), \qquad (43)$$

$$t^{(j+1)} = v - \Psi u^{(j+1)}, \qquad (44)$$

where $\delta \in (0, 2/\eta)$ ($\eta$ is a constant satisfying $\eta \ge \|\Psi\|_{\rm S}^2$, with $\|\cdot\|_{\rm S}$ the spectral norm) is a predefined step size, $P_{\lambda\mu}$ denotes the projection onto the $\ell_{\infty}$-ball of radius $\lambda\mu$, and $t^{(j)} \to {\rm prox}_f^{\lambda}(v)$ as $j \to \infty$; refer to Fadili & Starck (2009) and Jacques et al. (2011) and references therein for details. Here, we repeat the remark given in Cai et al. (2017b, a) for ease of reference.

#### 4.1.2 Synthesis

Set $f(a) = \mu\|a\|_1$ and $h_i(a) = \|y_i - \Phi_i \Psi a\|_2^2/(2\sigma^2)$. Then, similar to (36), the reconstruction problem for the synthesis form (11), restricted to the first $s$ data blocks, i.e.,

$$\min_{a} \Big\{ \mu \|a\|_1 + \sum_{i=1}^{s} \frac{\|y_i - \Phi_i \Psi a\|_2^2}{2\sigma^2} \Big\}, \qquad (45)$$

can be solved by the forward-backward iteration formula given in (26), i.e.,

$$a^{(k+1)} = {\rm prox}_f^{\lambda_k}\Big( a^{(k)} - \lambda_k \sum_{i=1}^{s} \nabla h_i\big( a^{(k)} \big) \Big). \qquad (46)$$

We have, $\forall v$,

$${\rm prox}_f^{\lambda}(v) = {\rm soft}_{\lambda\mu}(v), \qquad (47)$$

and

$$\nabla h_i(a) = \Psi^{\dagger}\Phi_i^{\dagger}\big( \Phi_i \Psi a - y_i \big)/\sigma^2. \qquad (48)$$

Substituting (47) and (48) into (46), we get the iterative formula solving the synthesis problem (45), i.e.,

$$a^{(k+1)} = {\rm soft}_{\lambda_k \mu}\Big( a^{(k)} - \frac{\lambda_k}{\sigma^2} \sum_{i=1}^{s} \Psi^{\dagger}\Phi_i^{\dagger}\big( \Phi_i \Psi a^{(k)} - y_i \big) \Big). \qquad (49)$$

As with (41), the term related to $y_i$ in (49), i.e., $\Psi^{\dagger}\Phi_i^{\dagger} y_i$, can be computed once in advance.

###### Remark 4.2

We summarise the online forward-backward splitting algorithms corresponding to the analysis and synthesis reconstruction forms in Algorithms 2 and 3, in which the stopping criteria mentioned in Algorithm 1 are specified explicitly. In particular, when processing individual visibility blocks, for simplicity and efficiency, we execute one iteration (cf. stopping criterion type II in Algorithm 1) per block. Furthermore, when the number of visibility blocks $B$ is small (less than the number of iterations necessary for a standard forward-backward algorithm to converge), a few optional additional iterations can be applied after the algorithms process the last visibility block. However, in practice $B$ is generally very large in order to reduce storage costs, especially with the trend towards big-data, in which case these optional iterations are not necessary.


### 4.2 Storage requirements

We discuss visibility storage requirements of the proposed online method in RI imaging. In the forthcoming big-data era storing the visibilities will be challenging. Standard offline methods, which need all visibilities to be acquired and stored in advance, have extremely large storage requirements. The proposed online method for RI imaging, as illustrated in Algorithms 2 and 3, can dramatically reduce storage requirements.

The online method only needs to deal with a single block of visibilities (i.e., a subset of the visibilities) at one time. The size of each block can be controlled as required: when a large storage volume is available, a large visibility block can be considered; otherwise, any arbitrarily small block can be considered, to the extreme case of just a single visibility in each block (see lines 7 and 8 respectively in Algorithms 2 and 3 about the visibility block loading and assimilation). Note that after loading and assimilating a block by the online method, the storage used to store that block will be released for storing another block (see line 9 in Algorithms 2 and 3 about the visibility block storage releasing). The ratio of visibility storage required for the online method relative to the offline method, which must store all visibilities, is therefore

$$\frac{\max_i M_i}{M}. \qquad (50)$$

When all blocks are the same size, the storage requirement is $1/B$ of the total visibilities, which means less than 1 per cent of the visibilities need to be stored when $B > 100$. Figure 2 (the blue solid curve) shows the ratio of visibility storage requirements between the online algorithm and the standard algorithm for different numbers of visibility blocks $B$.

Another important advantage of the online method in terms of storage is that, since its storage requirement is independent of the number of visibility blocks, it is able to tackle RI imaging problems with an arbitrarily large number of visibilities – simply divide the entire set of visibilities into individual visibility blocks and then conquer them one-by-one online.

Finally, since the standard offline methods can only deal with a complete set of visibilities, when new visibilities are available it is not possible for standard methods to use the new input to improve their reconstruction quality in a principled manner (unless the computation is restarted). The online method, on the contrary, is able to immediately process any new observed visibilities – just treat the new input as a normal visibility block and assimilate it to update the reconstruction.

It should be noted that storage during the image reconstruction process is burdened not only by the measurements, but also by storing the baseline coordinates and weights, which are comparable in size. Furthermore, the interpolation kernel for performing a two-dimensional non-uniform fast Fourier transform can take up to 16 or more times the amount of storage of the measurements alone (Fessler & Sutton, 2003; Offringa et al., 2014; Pratley et al., 2018). However, methods exist to reduce this storage cost. For example, it is possible for the interpolation kernel to be calculated on-the-fly, or to prune the interpolation kernel. Furthermore, alternative efficient methods can be developed to reduce these storage costs, e.g. by precomputing quantities such as $\Phi_i^{\dagger} y_i$.

### 4.3 Computational cost

We now compare the computational cost of the online method and the standard method. Compared to the standard method, in addition to dramatically reducing storage costs, the online method can provide considerable computational savings when the number of visibility blocks considered is not much larger than the number of iterations necessary for the standard method.

For both the online and standard methods, at each iteration, the most computationally demanding part is applying the measurement operator to an image (refer to line 10 in Algorithms 2 and 3), in that the rest of the computations are dominated by this step. In particular, for this step the standard method needs all $B$ blocks, i.e., $\{\Phi_i, y_i\}_{i=1}^{B}$, to be involved in the computation at each iteration, whereas only the first $s$ blocks, $\{\Phi_i, y_i\}_{i=1}^{s}$, are used at the $s$-th iteration of the online method. In other words, for the online method at iteration $s$, just an $s/B$ portion of $\Phi$ and $y$ is involved in the computation (again, as an example, refer to line 10 in Algorithm 2), whereas the standard method needs the whole operators. When we consider our online method with $B$ iterations (one per block, which suffices for convergence in practice), and where $K$ denotes the maximum iteration number of the standard method, the ratio of the computational cost between the online method and the standard method can be approximated by

(51)  ratio ≈ (∑_{i=1}^{B} i) / (B K_max) = (B + 1) / (2 K_max),

where the numerator and denominator represent the computational costs of the online method and the standard method, respectively. Figure 2 (brown dashed lines) shows the approximate ratio of the computational cost between the proposed online method (for different numbers of visibility blocks, B) and the standard method (for different maximum iteration numbers, K_max).

From (51), we conclude that when B is large enough (i.e., B ≫ 1) and both methods execute a similar number of iterations (i.e., B ≈ K_max), the online method requires only approximately half of the computations of the standard method, i.e.,

(52)  ratio ≈ 1/2.

Furthermore, ratio (51) also reveals the following twofold fact: if B ≪ K_max, the online method will be much more economical; if B ≫ K_max, the online method will be computationally expensive, but its visibility storage requirement is then extremely low – just approximately 1/B of the storage space needed by the standard method (see (50)). On the whole, the online method offers the choice of using a large or small number of visibility blocks B. This choice depends on whether the application prioritises saving storage space or reducing computational cost. Again, see Figure 2 for a pictorial comparison of the online and standard methods in terms of visibility storage requirements and computational cost.
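This cost accounting can be illustrated with a short Python sketch. The symbols are our own notation (B for the number of visibility blocks, K for the maximum iteration number of the standard method), and we assume the cost of one iteration is proportional to the number of measurement-operator blocks it applies:

```python
# Illustrative cost accounting (a sketch under our assumptions, not the
# paper's implementation): per-iteration cost is proportional to the
# number of measurement-operator blocks applied.

def online_cost(n_blocks):
    # iteration i applies only the first i blocks
    return sum(range(1, n_blocks + 1))  # = B (B + 1) / 2

def standard_cost(n_blocks, n_iter):
    # every iteration applies all B blocks
    return n_blocks * n_iter

B, K = 50, 50
ratio = online_cost(B) / standard_cost(B, K)
print(ratio)  # 0.51: roughly half the cost when B ~ K

# Many more blocks than standard iterations: costlier, but storage ~ 1/B
print(online_cost(500) / standard_cost(500, K))  # 5.01
```

This makes the trade-off concrete: choosing B comparable to K halves the computation, while a much larger B trades computation for storage.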

Finally, it is worth emphasising that the online method delivers a reconstruction as soon as no more visibility blocks are available. In other words, whatever the computational costs, the online method incurs them before the data acquisition stage finishes. On the contrary, all of the computational costs of the standard method have to be carried out after the data acquisition stage. In this sense, the online method has essentially finished by the time the offline method can begin.

## 5 Experimental results

In this section we investigate the performance of the proposed online method using representative RI test images and compare to the standard (offline) method.

Figure 3 panels: (a) M31, (b) Cygnus A, (c) W28, (d) 3C288.

### 5.1 Simulations

Figure 3 shows the four images used for the tests: an HI region of the M31 galaxy, the Cygnus A radio galaxy, the W28 supernova remnant, and the 3C288 radio galaxy. Simulations and subsequent numerical experiments were performed on a MacBook laptop with a 2.2 GHz quad-core Intel Core i7 processor and 16 GB of memory, running Matlab R2015b.

The measurement (sensing) operator Φ used for the simulations is, for simplicity, a fast Fourier transform (FFT) operator F followed by a masking operation M, i.e., Φ = M F. In principle, this can easily be extended to the more realistic case of measurements that lie off the Fourier grid, where M is replaced with a degridding matrix, and zero padding and a degridding correction are applied before the FFT (see Pratley et al. 2018 for more details).

The visibilities are generated randomly through the variable density sampling profile (Puy et al., 2011) in half the Fourier plane, retaining 10% of the discrete Fourier coefficients of each ground-truth image. The visibilities are then corrupted by complex Gaussian noise with zero mean and standard deviation set according to the infinity norm of the clean visibilities (the maximum absolute value of their components), with the SNR (signal-to-noise ratio) set to 30 dB for all simulations. The definition of SNR is given by

(53)  SNR = 20 log_10 ( ‖x‖_2 / ‖x − x̄‖_2 ),

where x is the ground truth and x̄ its reconstruction. The basis Ψ in the analysis and synthesis models (10) and (11) is set to Daubechies 8 wavelets, which can be applied using the Matlab built-in function wavedec2. Note that no appreciable difference between the results of the analysis and synthesis models is expected, since this basis is orthogonal. Nevertheless, using other choices (e.g. overcomplete bases) is straightforward and likely to improve reconstruction fidelity. Since the focus of this article is the online method rather than optimising imaging performance, we leave the discussion of other choices of Ψ for future investigation.
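As a concrete illustration, the simulation setup and the SNR metric of (53) can be sketched in Python/NumPy. This is a toy stand-in under our own assumptions: a random ground-truth image and a uniformly random mask replace the astronomical images and variable density profile of the paper, and we take the noise standard deviation as the infinity norm of the clean visibilities scaled by 10^(−SNR/20):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy measurement setup: Phi x = M F x, keeping ~10% of Fourier coefficients.
N = 64
x = rng.random((N, N))                  # placeholder ground-truth image
mask = rng.random(N * N) < 0.10         # uniformly random mask (assumption)
y = (np.fft.fft2(x).ravel() / N)[mask]  # visibilities: y = M F x

# Complex Gaussian noise at 30 dB input SNR (sigma convention is an assumption)
isnr = 30.0
sigma = np.max(np.abs(y)) * 10 ** (-isnr / 20)
y_noisy = y + sigma / np.sqrt(2) * (
    rng.standard_normal(y.size) + 1j * rng.standard_normal(y.size))

def snr_db(x_true, x_rec):
    # Reconstruction SNR: 20 log10(||x||_2 / ||x - x_rec||_2)
    return 20 * np.log10(np.linalg.norm(x_true)
                         / np.linalg.norm(x_true - x_rec))

print(snr_db(np.ones(100), np.ones(100) + 0.01))  # 40.0 dB for 1% uniform error
```

The same `snr_db` helper serves as the evaluation metric throughout the experiments sketched here.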

In practice, visibilities will be provided as they are acquired by a telescope, and so are likely to be observed along tracks. One simple way to capture this approximately in our simplified experimental setup is to order visibilities by distance from the origin of the Fourier plane. Nevertheless, to demonstrate the robustness of the online method with respect to different visibility splitting settings, we also consider uniformly random visibility selection (from the variable density sampled profile) as a splitting rule. The performance of the online method with respect to different numbers of visibility blocks is investigated as well.
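The two splitting rules can be sketched as follows (function names, toy coordinates and the block count are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def split_by_radius(uv, n_blocks):
    # Order visibilities by distance from the Fourier-plane origin,
    # approximating acquisition along tracks, then cut into blocks.
    order = np.argsort(np.hypot(uv[:, 0], uv[:, 1]))
    return np.array_split(order, n_blocks)

def split_randomly(uv, n_blocks, rng):
    # Uniformly random assignment of visibilities to blocks.
    return np.array_split(rng.permutation(len(uv)), n_blocks)

uv = rng.standard_normal((1000, 2))  # toy (u, v) coordinates
blocks = split_by_radius(uv, 50)     # 50 blocks of 20 visibilities each
```

Either function returns index blocks that can be fed to the online algorithm one at a time as the "acquired" visibility stream.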

For simplicity, the regularisation parameter is fixed by trial-and-error inspection. For the standard algorithm, the maximum iteration number is used as the stopping criterion and is set to 50, which is sufficient for a reasonably good reconstruction. Without loss of generality, we report results for the analysis model and do not show results for the synthesis model, since, as anticipated, the performance difference between the two models is negligible under an orthogonal basis.

Figure 4 panels: (a) offline algorithm (storage: 100% of visibilities); (b) offline algorithm (storage: 2% of visibilities); (c) online algorithm (our work; storage: 2% of visibilities).

### 5.2 Algorithm performance

We present the results of the tested methods – the online method (our work) and the standard method – and compare them in terms of reconstruction quality, visibility storage requirements and computational cost. Moreover, the quantitative performance of the methods is analysed with respect to different numbers of visibility blocks.

#### 5.2.1 Reconstruction

Figure 4 shows the reconstruction results of the tested methods for the analysis model (10) on the four test images. For the online method, all the acquired visibilities (10% of the Fourier coefficients) are partitioned into 50 blocks, each containing the same number of visibilities (2% of the acquired visibilities, or 0.2% of the total number of discrete Fourier coefficients). Figure 4 (a) shows the results of the standard method using the entire set of observed visibilities. Similarly, Figure 4 (b) shows the results of the standard method using just 0.2% of the Fourier coefficients, uniformly randomly selected from the variable density samples (rather than by distance to the origin), which equals the number of Fourier coefficients contained in a single visibility block used by the online method. This tests both methods under limited storage – storage too small to hold the entire set of observed visibilities. Recall that the online method in principle works for arbitrarily small storage. Clearly, a good reconstruction is obtained by the standard method using all visibilities; this is not the case when using just 0.2% of the Fourier coefficients, which are too few to produce a reasonable reconstruction. Figure 4 (c) shows the results of our online method, which are as good as those of the standard method (cf. Figure 4 (a)) under visual validation and quantitative comparison in SNR. For instance, both methods achieve the same SNR, 14.2946 dB, for test image M31. A more detailed quantitative comparison of SNR is deferred to the next subsection. On the whole, the results for the test images are consistent with each other, demonstrating the excellent performance of the online method, which provides approximately the same reconstruction quality as the standard method.

We also verify that the online method is insensitive to the visibility splitting strategy, i.e., that different strategies produce consistent results. As mentioned before, the alternative strategy is to sample the full set of variable density sampled visibilities uniformly at random. As an example, the result for M31 shown in Figure 4 (c, top) achieves an SNR of 14.2946 dB, and an almost identical result with SNR 14.2943 dB is obtained using the alternative splitting strategy. To avoid repetition, we do not show the reconstructed images for the alternative splitting strategy, since under visual validation they are very similar to those in Figure 4 (c) based on distance from the origin. Accordingly, the splitting strategy based on distance from the origin is adopted in the following experiments.

Figure 5 panels: (a) M31, (b) Cygnus A, (c) W28, (d) 3C288; (e)–(h) zoomed views of (a)–(d), respectively.

#### 5.2.2 SNR analysis

Table 1: Relative difference in SNR (54) between the standard and online methods. Column headers give the number of visibility blocks; "(5)" denotes five extra iterations of the online method.

| Images | 50 | 50 (5) | 100 | 100 (5) | 200 | 200 (5) | 300 | 300 (5) | 500 | 500 (5) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M31 | | | | | | | | | | |
| Cygnus A | | | | | | | | | | |
| W28 | | | | | | | | | | |
| 3C288 | | | | | | | | | | |

For the online method, as discussed, splitting the entire set of visibilities into different numbers of visibility blocks impacts both storage requirements and computational cost. Firstly, when the number of blocks is relatively small, i.e., similar to the number of iterations of the standard (offline) method, the computational cost is reduced compared to the standard method. Secondly, the larger the number of blocks, the lower the visibility storage requirements. Even though a large number of blocks requires many iterations, the first iterations are very fast due to the small amount of data involved. Moreover, no matter how large the number of blocks, since almost all of the computation is done before data acquisition finishes, the online method always completes its reconstruction task much sooner than the standard method. Recall Figure 2 for a pictorial inspection.

To show the influence of the number of visibility blocks on reconstruction quality, simulation results are reported in Figure 5 with, as examples, the number of visibility blocks set to 50, 100, 200, 300 and 500. The iteration histories (SNR against iterations) of the standard method and the online method are shown by blue (solid) and red (dot-dashed) lines, respectively. Moreover, for the online method, five extra iterations are executed when the obtained SNR is lower than that of the standard method; these are shown by the magenta lines with cross marks. For all the test images, Figure 5 demonstrates that the online method, for different numbers of visibility blocks, provides reconstructions with SNR as good as that achieved by the standard method, i.e., slightly higher or lower with roughly equal frequency, but with no substantial difference.

In particular, from Figure 5 we can also see that the larger the visibility block size, the more quickly (in terms of number of iterations) the highest SNR is reached. Nevertheless, after processing the last visibility block, all settings, regardless of the number of blocks, yield reconstructions with very similar SNR. This suggests that the online method converges stably and is robust to the number of visibility blocks.

Accompanying Figure 5, Table 1 gives the relative difference in SNR between the results of the standard method (SNR_standard) and the online method (SNR_online) for different numbers of visibility blocks, i.e.,

(54)  ΔSNR = (SNR_standard − SNR_online) / SNR_standard.

Positive and negative signs of the relative difference correspond to better performance of the standard method and the online method, respectively. In the previous subsection, we concluded via visual validation that both methods achieve comparable reconstruction results. This agrees with Table 1, which shows that the two methods also perform similarly quantitatively: sometimes the standard method is slightly better and sometimes the online method is, but there are no substantial differences in quality.
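Assuming (54) is the SNR gap normalised by the standard method's SNR (our reading of the sign convention just described), the metric can be computed as:

```python
# Relative SNR difference between the standard and online methods.
# The normalisation by the standard method's SNR is our assumption:
# a positive value favours the standard method, a negative value the
# online method.

def relative_snr_difference(snr_standard, snr_online):
    return (snr_standard - snr_online) / snr_standard

print(relative_snr_difference(2.0, 1.0))          # 0.5
print(relative_snr_difference(14.2946, 14.2943))  # small positive value
```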

Figure 5 and Table 1 tell us that the online method already provides very good reconstructions after processing the last visibility block. After that, a few extra iterations can optionally be executed to improve the SNR of the reconstruction, as shown by the magenta lines with cross marks in Figure 5. However, the improvement is not dramatic; the standard number of iterations essentially ensures excellent reconstructions already.

## 6 Conclusions

The work in this article has been motivated by critical computational problems in scaling RI imaging to the big-data era of radio astronomy that will be ushered in by the SKA and precursor telescopes. In particular, we addressed the extremely high storage requirements and computational costs of standard (offline) methods of recovering images from the raw data that will be acquired by forthcoming telescopes. We presented an online imaging methodology by extending standard sparse regularisation methods.

Generally speaking, our online method starts the reco
