 # Sparse Randomized Kaczmarz for Support Recovery of Jointly Sparse Corrupted Multiple Measurement Vectors

While single measurement vector (SMV) models have been widely studied in signal processing, there is a surging interest in addressing the multiple measurement vectors (MMV) problem. In the MMV setting, more than one measurement vector is available and the multiple signals to be recovered share some commonalities such as a common support. Applications in which MMV is a naturally occurring phenomenon include online streaming, medical imaging, and video recovery. This work presents a stochastic iterative algorithm for the support recovery of jointly sparse corrupted MMV. We present a variant of the Sparse Randomized Kaczmarz algorithm for corrupted MMV and compare our proposed method with an existing Kaczmarz type algorithm for MMV problems. We also showcase the usefulness of our approach in the online (streaming) setting and provide empirical evidence that suggests the robustness of the proposed method to the distribution of the corruption and the number of corruptions occurring.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

In recent years, there has been a drastic increase in the amount of available data. This so-called “data deluge” has created a demand for fast, iterative algorithms that can be used to process large-scale data. Stochastic iterative algorithms, such as the Randomized Kaczmarz Algorithm or Stochastic Gradient Descent, have become an increasingly popular option for processing large-scale data

[3, 10]. These methods recover signals given a vector of measurements and a measurement matrix such that:

 Y=ΦX, (1.1)

without accessing the full measurement matrix in a single iteration. We refer to (1.1) as a Single Measurement Vector (SMV) model. In the Multiple Measurement Vector (MMV) setting, one may have thousands of measurement vectors pouring in overtime. Each measurement vector corresponds to a signal where signals typically share a common property such as sparsity, smoothness, etc. For simplicity, let and

. Since high-dimensional data is typically sparse in nature, a commonality of particular interest is

joint sparsity, or when the support of all signals are the same. In particular, the support of a vector v is defined to be the set of indexes for which v is nonzero i.e., .

Many algorithms have been developed for the MMV setting, especially in applications such as line spectral estimation

[14, 20] and modal analysis . In particular, the authors in these works extend the previous SMV-based algorithms as well as theoretical analysis in [4, 19, 8] to the MMV case. The theoretical bound in  also indicates that MMV settings could make compressed signal recovery much easier than in the SMV setting. In particular, the number of measurements needed for perfect recovery in each signal decreases as the number of signals increases reducing the sample complexity per signal.

As a motivating example, consider diffuse optical tomography (DOT) where the goal is to find small areas of high contrast corresponding to the location of cancerous cells . Since cancerous cells have a much larger absorption coefficient than healthy cells, the two-dimensional medical image can be interpreted as a sparse signal where each entry of the signal represents the absorption coefficient of a given pixel and the nonzero entries correspond to tumor locations. In a hyperspectral DOT setting, hundreds of different wavelengths are used to acquire a variety of images of the same tissue, allowing practitioners to obtain a more accurate location of tumors . The result of the hyperspectral imaging process is a jointly sparse MMV, where each wavelength produces a different image (or signal), and the joint support across all images represents the locations of cancerous cells.

Signals may share support but it is improbable for them to be perfectly accurate. Since sensing mechanisms are not impervious to error, signals can contain corruptions. Other sources of corruption in signal processing include spikes in power supply, defective hardware, and adversarial agents . Going back to the hyperspectral imaging example, “corruptions” in each signal may be caused by noncancerous cells that absorb more light at a given wavelength than their neighbors. For example, if a cell contains an anomalous amount of melanin, then it absorbs more light at shorter wavelengths in the visible spectrum (i.e., violet or blue light) compared to a typical noncancerous cell [15, 5]. This produces a large nonzero absorption coefficient in the location of a healthy cell, i.e., a corruption. These corrupt entries erroneously indicate the presence of cancerous cells in a location with healthy cells.

Corruptions cause support recovery algorithms such as the MMV Sparse Randomized Kaczmarz (MMV-SRK) algorithm, which we describe in detail in Section 2, to fail due to the algorithmic dependence on the row norms of the signal approximation to estimate the support . Thus, large corruptions in a signal with comparatively small entries may erroneously be included in the support estimate given by these algorithms. In the corrupt MMV setting, the availability of multiple measurement vectors becomes vital to the estimate of the true support. Clearly, if only a single measurement vector is available, there would be no way to distinguish a corrupt nonzero entry without any additional assumptions on the signal or corruption. Corrupt measurement signals have been studied in the context of the SMV model. In  and , additive noise in the measurement scheme is assumed to be sparse. Both works focus on the compressive sensing setting where .

The primary objective of this work is to design an algorithm for recovering the support of jointly sparse, corrupt signals from large-scale MMV. We propose a new online algorithm called Sparse Randomized Kaczmarz for Corrupted MMV (cMMV-SRK) for support recovery. Note that the proposed algorithm can recover the signals very well, but we mainly focus on support recovery in this work. Our experiments show that the proposed algorithm outperforms the previously proposed Kaczmarz type algorithm in recovering the joint support from MMV when the signals are corrupted.

### 1.1 Problem Formulation

The mathematical formulation of the problem can be stated as follows. Suppose one is given a set of linear measurements and a measurement matrix such that:

 Y(⋅,j)=ΦX(⋅,j) for j=1,…J, (1.2)

with . We assume that the data is large-scale, meaning we cannot access all of at once ( and/or are too large) and must only operate on one row of at a time. We allow the system to be overdetermined () or underdetermined () and assume ’s are jointly sparse such that and . For an -dimensional vector , let return with zeros in the smallest (in magnitude) entries. We also assume that each column of contains one or more corruptions. In other words, instead of , the joint support set, the support of is:

 supp(X(⋅,j))=S∪Cj,Cj⊂{1,…,n},

where is the “corrupt index set” and are not necessarily the same for every . In this work, our goal is to recover the joint support from the given linear measurements .

The remainder of this manuscript is organized in the following way. Section 2 discusses the Sparse Randomized Kaczmarz method and the MMV-SRK algorithm. Section 3 provides a discussion on how corruptions can negatively impact the performance of MMV-SRK. Section 3 also presents our method, cMMV-SRK, a variant of SRK which works in the corrupted signal setting. Numerical experiments using this method are presented in Section 4 and we conclude with a summary of our contributions and future directions in Section 5.

## 2 Related and Existing Work

### 2.1 Sparse Randomized Kaczmarz

In this work, we utilize the Sparse Randomized Kaczmarz algorithm to recover the support of each column of . The original Kaczmarz algorithm was first introduced in the early 1900s by Kaczmarz himself and was revitalized as the Algebraic Reconstruction Technique in the early 1980s [10, 6]. The Randomized Kaczmarz Algorithm (RK) used here was first introduced by Strohmer and Vershynin and enjoys an expected linear convergence rate to the solution of a consistent linear system . The Sparse Randomized Kaczmarz (SRK) algorithm is another variant designed specifically for overdetermined systems with sparse solutions. SRK has also been empirically shown to solve underdetermined systems with sparse solutions as well .

Algorithm 1 outlines the SRK algorithm. Note that ties are broken lexicographically in Step 5 of Algorithm 1 and all algorithms presented in this work. The estimated support size is a parameter of the algorithm and is typically chosen to be larger than the true support size . In this variant, the algorithm runs for a specified number of iterations (up to ). However, any stopping criteria one would use for an iterative algorithm e.g. terminating after the residual meets a certain criteria, after the updates become small, etc. can be used. Algorithm 1 also differs from the original presented algorithm by  in that at every iteration the support estimate has size instead of starting with and shrinking the size to . We find that these modifications do not significantly affect the behavior of SRK.

Algorithm 1 has been shown empirically to find the solution to overdetermined, consistent (i.e., a solution exists) linear systems but there are no theoretical results supporting this. One can make a few observations about the behavior of SRK for support recovery. Concerning the support size estimate , it is clear that if then the probability that the true support is contained in the support of the approximation is 0, i.e., . Additionally, if , then . In regards to the choice of weighting, as , so that row elements inside the support estimate contribute mostly to the approximation. If one has a weighting function that decreases too rapidly, the true support may not be captured in causing the algorithm to fail. If the weighting function decreases too slowly, then the algorithm will converge more slowly.

Although Algorithm 1 and the following algorithms require the Frobenious norm of the matrix, , for row selection, practically speaking row selections can be done uniformly at random to avoid using the full measurement matrix in a single iteration. Indeed, it is advantageous to select the rows at random to avoid introducing bias from rows with larger norms.

### 2.2 SRK for MMV

Here we present a previous SRK-based approach to the MMV setting proposed by . Because we are assuming joint sparsity in the MMV model, the estimated support of a signal reveals information about the support of all signals. The authors of  present Algorithm 2 to leverage this idea. There are a few key aspects to note about this version of the SRK algorithm. First, the algorithm is running one iteration of SRK for every signal in the MMV model then updating the support estimate based on the row norms of the estimate . Due to this, the algorithm does not lend itself well to being extended for an online variant which only receives a small number (possibly 1) of signals at a time. Second, the algorithm uses the same selected row for each signal. It has been well observed that a random selection scheme reduces the possibility of a poor choice of row ordering and it may be advantageous to allow each signal to be projected onto a different randomly selected row [7, 9]

## 3 Main Results

### 3.1 Corrupted MMV

To review, we are interested in constructing an algorithm that recovers the support of jointly sparse corrupted high-dimensional MMV, that is, where we can only access one row of the measurement matrix at a time. To this end, we propose Algorithm 3, which we refer to as cMMV-SRK. We first note that the base of this algorithm is the SRK algorithm (Algorithm 1), which is an effective algorithm for large-scale problems due to its low memory footprint, requiring only one row of the measurement matrix to be used at a time. cMMV-SRK has been adapted to the MMV case using the intuition that the individual signals give us information about the common support between all signals. We keep track of a bin or tally vector that estimates the true support of the signals. In particular, we use the nonzeros in to indicate the estimated joint support. This binning process allows the algorithm to be robust in the face of corruptions in the signal, as the corruptions will receive a low number of tallies compared to the entries in the true support because the corruptions occur in random positions for every signal. Note that in the corrupted MMV case, we expect Algorithm 2 to fail as the support estimate step relies on the -norm of the rows to be large if an index is in the support and small otherwise. The corruptions may be so large that a single corruption in a row could lead to mis-identification of the corrupt entry being in the joint support.

Finally, in Algorithm 3 we account for signals being processed one at a time, as they are in an online or “streaming” setting.

For each signal, let be the number of SRK projections performed on and let . In the online setting, one can imagine that the amount of time before the next signal is acquired may vary due to, for example, stalls in the measurement process. The varying amount of time that the system has to process each signal is one of the major challenges of support recovery in the online setting. In order to improve the joint support estimate when varies, we weight the binning based on . In other words, we let where is the -th entry of and is the maximum number of inner iterations of SRK for the signal. This reweighting scheme places a larger importance on support estimates which have had more time (iterations) to improve. In the online setting where s is not known a priori, can be set manually. We adopt the following notation for cMMV-SRK: is the estimated support at the SRK iteration for , and denotes the joint support estimate.

If the number of inner iterations is large enough, the support estimate should be such that it contains the joint support (along with the corruption index). Because we are tallying the support estimate after every iterations, it is clear that the entries in the joint support will have an overwhelming number of tallies compared to all other entries. The experimental results in the next section support these claims and we leave the analytical study of these algorithms for future work.

## 4 Experiments

In this section, we compare Algorithm 2 and Algorithm 3 under a variety of settings, specifically comparing the robustness of each algorithm in the presence of corruptions in the signal. To test this robustness, we vary the number of corrupt entries, the distribution from which the corruptions are drawn, the number of columns in , and the number of projection computations made for each signal. In what follows, we will refer to as the number of SRK iterations. These experiments are summarized in Table 1. In all experiments, the results are averaged over 40 trials and the nonzero entries of are drawn independently and identically distributed (i.i.d.) from . Note that on the -axis we plot the number of “iterations” where a single iteration is defined by a projection. In other words, the -axis represents every time Step 9 is performed in Algorithm 3 and Algorithm 2.

Figure 1 compares Algorithm 2 and Algorithm 3 with , , and . The support size estimate is . To create , we uniformly at random select of indexes to be joint support and set . The corrupt entries are drawn uniformly at random from . To start off, each signal has one corrupt entry. We choose corruptions i.i.d. from to simulate corruptions being large spikes in the signal (possibly caused by system malfunction or an adversarial agent). The maximum number of SRK iterations for each signal is . In Figure 0(a), we create and . In Figure 0(b), the entries of

are drawn from a uniform distribution ranging from 0 to 1. We note that, in both cases, Algorithm

3 is able to recover the full support after a sufficient number of iterations, whereas Algorithm 2 is only able to recover at most about 20% of the support, regardless of the number of iterations. Since Algorithm 2 relies on row norms to estimate the joint support, it is to be expected that the relatively large value of the corruption would cause it to often be erroneously chosen to be part of the joint support estimate. As a result, this experiment highlights the advantage of the binning in Algorithm 3 in the presence of a single corruption with a high magnitude.

In Figure 2, we experiment further with the magnitude of the corruption. Here we have that , , and

but instead of the corrupt entries being drawn from a mean 7 and standard deviation 1 distribution, it is drawn from a standard normal distribution, as are the entries in the support. This allows us to test the robustness of our method to the choice of distribution. Note that Algorithm

2 is able to find an increasingly accurate approximation for the support in this case, and will be able to recover the full support after a sufficiently large number of iterations. Because the magnitudes of the corruptions are smaller, the algorithm still has a chance of detecting the correct support using row norms to estimate . However, Algorithm 3 is able to obtain an accurate support estimate much faster than Algorithm 2.

In the next two experiments, we test the robustness of our proposed algorithm against multiple corruptions. In Figure 3, we allow for each signal to have multiple corruptions. For each signal, or column of , we uniformly at random select an integer from 1-3 to be the number of corruptions. The value of the corruptions are drawn i.i.d. from and an example of the resulting matrix can be seen in Figure 2(a). The performance of the methods can be seen in Figure 2(b). We note that the results of this experiment are very similar to those of the experiment in Figure 1 since the corruptions are drawn from the same distribution. As we would expect, again due to the use of row norms, in the presence of multiple corruptions Algorithm 2 gives a less accurate estimate than in the presence of only one corruption drawn from this distribution, recovering no more than about 15% of the support.

Figure 3(a) shows the performance results for Algorithm 3 in a simulated online setting. Instead of allowing the algorithm to loop for a fixed number of projections for each signal, we have 90% of the signals with and the other 10% of signals with . The purpose of this is to simulate a variation in the amount of time a system has to work with a signal. The longer runs represent stalls in the online setting. For each signal, we first draw a random Bernoulli variable with probability of success . If , then we choose an integer in uniformly at random. If , then an integer in is chosen uniformly at random. Algorithm 2 cannot be investigated under this setting as the support estimate relies on processing all signals in every iteration. We note that with respect to other parameters, the only difference between this experiment and that in Figure 3 is the size of . We choose to be large enough such that the maximal number of projections made is 15000 (as in Figure 3).

The following experiment is motivated by compressed sensing and utilizes an underdetermined linear systems as opposed to an overdetermined system. We repeat the the parameters as in Figure 3(a) with the exception of the measurement matrix, which has rows and columns, and the total number of signals . The results can be found in Figure 3(b). In the underdetermined case, the proposed algorithm is still successful in recovering the support of the signal.

Finally, we tested the robustness of our algorithm on the hyperspectral diffuse optical tomography motivating problem discussed in the introduction. For this experiment, we simulated absorption coefficient values for a two-dimensional circular sample of tissue of radius 25 centimeters, centered at the origin, with a circular tumor of radius 5 centimeters centered at the point (-15,-10). See Figure 4(a). Each signal was thus representing a reconstruction of the absorption coefficient value at each point in a mesh of size 541 over the sample area. The number of measurements corresponded to the number of source-detector pairs in the imaging process. We used a random Gaussian measurement matrix with and , with total signals, each corresponding to a different wavelength at which the tissue was imaged. We note that this is also an underdetermined system. As in previous experiments, 1 to 3 corruptions for each signal were drawn from a normal distribution with mean the average value of the absorption coefficient for the cancerous cells at each wavelength, and standard deviation a quarter of the distance between that value and the value of the absorption coefficient for the healthy cells. The online setting was not used for this experiment. The results can be found in Figure 6. We see that the proposed algorithm is still successful in recovering the support of the signal. (a) Geometry of the real-world example. Figure 6: Investigating the robustness of cMMV-SRK in a simulated real-world setting (hyperspectral diffuse optical tomography) when random multiple corruptions occur in each signal. In this setting, the measurement matrix is underdetermined (m=248, n=541).

The experiments shown in this section highlights the usefulness of Algorithm 3 for support recovery of jointly sparse MMVs, especially in the presence of (even quite large) corruptions. In each comparison between Algorithm 2 and Algorithm 3, our proposed method outperforms the previously proposed method for support recovery. Additionally, the proposed method lends itself to the online setting.

## 5 Conclusion

In this work, we construct an algorithm for the support recovery of corrupted jointly sparse MMV. Our empirical results demonstrate that the proposed algorithm outperforms the previously proposed method, MMV-SRK, for support recovery of jointly sparse MMV, specifically when corruptions are present in the signal. Furthermore, empirical evidence indicates that our method is robust to the magnitude of corruptions and the number of corruptions. This improvement is due to the fact that the support estimate in Algorithm 2, as many other signal recovery approaches for the jointly sparse MMV problem, depends on the row norms of the signals, which in this case would be dominated by the corruption In comparison, the estimate for Algorithm 3 only depends on the number of times an index appears in the support estimate of a signal. Lastly, our method lends itself well into the online setting when measurement vectors are streaming in continuously. We leave the analytical study of our method for future work.

## Acknowledgements

The initial research for this effort was conducted at the Research Collaboration Workshop for Women in Data Science and Mathematics, July 17-21 held at ICERM. Funding for the workshop was provided by ICERM, AWM and DIMACS (NSF grant CCF-1144502). SL was supported by NSF CAREER grant CCF

. DN was partially supported by the Alfred P. Sloan Foundation, NSF CAREER , and NSF BIGDATA . JQ was supported by the faculty start-up fund of Montana State University.

## References

•  H. K. Aggarwal and A. Majumdar. Extension of sparse randomized kaczmarz algorithm for multiple measurement vectors. arXiv preprint arXiv:1401.2288, 2014.
•  S. R. Arridge and J. C. Schotland. Optical tomography: forward and inverse problems. Inverse Problems, 25(12), July 2009.
•  L. Bottou.

Large-scale machine learning with stochastic gradient descent.

In Proceedings of COMPSTAT’2010, pages 177–186. Springer, 2010.
•  E. J. Candès and C. Fernandez-Granda.

Towards a mathematical theory of super-resolution.

Communications on Pure and Applied Mathematics, 67(6):906–956, 2014.
•  L. Fodor, M. Elman, and Y. Ullmann. Aesthetic Applications of Intense Pulsed Light. Springer-Verlag London, 2011.
•  R. Gordon, R. Bender, and G. T. Herman. Algebraic reconstruction techniques (ART) for three-dimensional electron microscopy and X-ray photography. J. Theoret. Biol., 29:471–481, 1970.
•  C. Hamaker and D. C. Solmon. The angles between the null spaces of X-rays. J. Math. Anal. Appl., 62(1):1–23, 1978.
•  R. Heckel and M. Soltanolkotabi. Generalized line spectral estimation via convex optimization. IEEE Transactions on Information Theory, 2017.
•  G. Herman and L. Meyer. Algebraic reconstruction techniques can be made computationally efficient. IEEE Trans. Medical Imaging, 12(3):600–609, 1993.
•  S. Kaczmarz. Angenäherte auflösung von systemen linearer gleichungen. Bull. Int. Acad. Polon. Sci. Lett. Ser. A, pages 335–357, 1937.
•  F. Larusson, S. Fantini, and E. L. Miller. Hyperspectral image reconstruction for diffuse optical tomography. Biomedical Optics Express, 2(4):946–965, April 2011.
•  J. N. Laska, M. A. Davenport, and R. G. Baraniuk. Exact signal recovery from sparsely corrupted measurements through the pursuit of justice. In Signals, Systems and Computers, 2009 Conference Record of the Forty-Third Asilomar Conference on, pages 1556–1560. IEEE, 2009.
•  S. Li, D. Yang, G. Tang, and M. B. Wakin. Atomic norm minimization for modal analysis from random and compressed samples. arXiv preprint arXiv:1703.00938, 2017.
•  Y. Li and Y. Chi. Off-the-grid line spectrum denoising and estimation with multiple measurement vectors. IEEE Transactions on Signal Processing, 64(5):1257–1269, 2016.
•  G. Lu and B. Fei. Medical hyperspectral imaging: a review. Journal of Biomedical Optics, 19(1), January 2014.
•  H. Mansour and Ö. Yilmaz. A fast randomized kaczmarz algorithm for sparse solutions of consistent linear systems. CoRR, abs/1305.3803, 2013.
•  T. Strohmer and R. Vershynin. Comments on the randomized Kaczmarz method. J. Fourier Anal. Appl., 15(4):437–440, 2009.
•  C. Studer, P. Kuppinger, G. Pope, and H. Bolcskei. Recovery of sparsely corrupted signals. IEEE Transactions on Information Theory, 58(5):3115–3130, 2012.
•  G. Tang, B. N. Bhaskar, P. Shah, and B. Recht. Compressed sensing off the grid. IEEE transactions on information theory, 59(11):7465–7490, 2013.
•  Z. Yang and L. Xie. Exact joint sparse frequency recovery via optimization methods. IEEE Transactions on Signal Processing, 64(19):5145–5157, 2014.