Compressive sensing(CS) is a novel sampling theory, which provides a new signal sampling (encoding) and reconstruction (decoding) approach [1, 2, 3, 4, 5]. In detail, given a compressible signal where is the transform basis and is a sparse signal, signal can be measured by a nonadaptive linear projections (namely sensing matrix) , i.e. where
is the measurement vector andis the measurement matrix. Then, at the decoder, (or ) can be recovered from using reconstruction algorithms. Since CS framework can provide far less sampling rate than Nyquist as well as high recovery accuracy, it has been widely used in many applications such as medical imaging  and radar imaging .
As an important procedure in CS, recovering a sparse signal from insufficient number of measurement data has drawn much attention in recent years. In the last decade, many algorithms have been proposed to show accurate reconstruction performance [8, 9, 10, 11, 12, 13]. An important theoretical guarantee that behind CS reconstruction is the restricted isometry property (RIP) . It has been proved that if obeys RIP, the sparse signal can be recovered from small number of measurements
. It has also been shown that random matrices such as Gaussian matrix and Bernoulli matrix can satisfy the RIP condition with a high probability[15, 16].
Although CS Reconstruction problem has been intensively studied, applying CS reconstruction to large-scale data (such as image data) is still a challenging work. In 
, the authors proposed a conjugate gradient orthogonal matching pursuit (CG-OMP) algorithm. CG-OMP utilizes Structurally Random Matrix (SRM) as the sensing matrix , which can speed up the signal recovery process as well as reduce the storage requirement. In particular, SRM is related to operator-based approaches, and can improve all greedy algorithms and several iterative shrinkage/threshold (IST) methods such as gradient projection for sparse reconstruction algorithm (GPSR)  and sparse reconstruction by separable approximation (SpaRSA) . However, although can be fast computed, the transform basis may not always be fast computable. Thus or may still need to be stored, which needs high requirement of storage for large-scale data. To reduce the storage of , a block based compressive sensing (BCS) method was proposed [21, 22]. In BCS, the input signal is separated into several small block signals, each signal is individually sensed and recovered. BCS essentially utilizes a block diagonal matrix as the sensing matrix, which, however, is lack of theoretical guarantee. Moreover, BCS needs to modify the sampling strategy at the sensing procedure, and the applicability is limited due to unclear structure.
An efficient way to solve the storage problem for large scale data is to store the data in a distributed network, and then optimize the problem in a distributed way. So far, several distributed optimization algorithms have been proposed including distributed stochastic dual coordinate ascent (DisDCA) , communication-efficient distributed dual coordinate ascent (CoCoA)  and distributed stochastic variance reduced gradient (DSVRG)
and distributed stochastic variance reduced gradient (DSVRG). These algorithms focus on solving the optimization problem in parallel, and need to collect the updated information from all nodes at each iteration. However, in many distributed network such as wireless sensor network (WSN), it is hard to gather the whole information across the network at each iteration.
In this paper we propose a novel diffusion adaptation framework for CS reconstruction. The measurement matrix are partitioned and stored in a decentralized manner, i.e. each node of the network stores only a small part of . Therefore, the whole storage load is distributed to each node in the network. Further, inspired by diffusion adaptation strategies [26, 27, 28, 29, 30], a simple yet efficient diffusion -LMS algorithm (D
-LMS) is proposed. Each node utilizes the finite number of data recursively. The estimation are shared among neighbours at each iteration. Therefore, taking advantages of diffusion adaptation, the D-LMS algorithm can collaboratively recover the sparse signals across the network. Utilizing -norm as the regularization term can guarantee the sparse estimation. Information exchange within network also provide ability of fast convergence speed, thus greatly increases the computational efficiency. Moreover, a mini-batch based D-LMS (MB-D-LMS) is also proposed in this study. MB-D-LMS utilizes the mini-batch gradient descend (MBGD) method and can further improve the convergence speed.
The proposed D-LMS is a variant of traditional sparse diffusion LMS algorithm [31, 32], which have shown abilities in learning the sparse structure over distributed networks. Actually, the formulation of the D-LMS algorithm for CS is similar to traditional sparse diffusion algorithm except the data is recursively used. However, applying sparse diffusion algorithm to specific CS task brings more features on both experimental results and theoretical analysis. In particular, diffusion adaptation framework with sparsity constrain gives ability to collaboratively estimate the sparse signal, even though each node cannot individually reconstruct the signal due to insufficient numbers of data. Moreover, in CS all data has been already collected before process, thus the finite known data offers convenience for theoretical analysis, and also gives access to use mini-batch method in gradient estimation. Besides, since the reconstruction speed is a critical issue in evaluating the CS reconstruction algorithm, a larger step size is preferred to achieve faster convergence speed. Obviously the small step size assumption in analysis of traditional diffusion algorithm [26, 27, 28, 29, 30] is not suitable for analysis for CS, thus in this paper carry out a new theoretical analysis on the step size condition for convergence without small step size assumption.
The proposed D-LMS is also related to -LMS for CS . -LMS can be seen as a special case of D-LMS where the network contains only 1 node. By introducing traditional sparse LMS algorithm to CS, -LMS algorithm has shown great performance improvement compared with other algorithms. In particular, -LMS demands less requirement in memory, and achieves better reconstruction performance than other existing algorithms when dealing with large-scale CS reconstruction problem. In D-LMS, each node actually performs the same weight update process with -LMS, thus the computational complexity of each node in each iteration are the same as -LMS. Moreover, the diffusion algorithm gives ability to allow much larger step size for convergence condition, and is confirmed by experiments that the convergence speed is much faster than -LMS. Futher, simulations also show that D-LMS can achieve similar reconstruction accuracy with -LMS.
One should also distinguish our work from distributed compressive sensing (DCS) [34, 35, 36, 37]. In DCS, a number of measurement data are recovered by a group of sensors. The measurement data at each node are assumed to be individually sparse in some basis and are correlated from sensor to sensor. The DCS aims to solve the jointly sparse ensemble reconstruction problem, which is not the topic of our work. Another related work is the Distributed Compressed Estimation(DCE) scheme . The DCE incorporates compression and decompression modules into the distributed estimation procedure. The compressed estimator is estimated across the network using diffusion adaptation strategy, and then the reconstruction algorithms are employed to recover the sparse signal from compressed estimator. In DCE, each node still need to store the whole sensing matrix. Moreover, the reconstruction procedure is independent of diffusion adaptation procedure, which may still suffer from the same problem of typical reconstruction methods when dealing with large scale CS reconstruction problem.
The paper is organized as follows. In section II we briefly review the concept of compressive sensing and propose the diffusion adaptation framework for CS reconstruction. The gradient based and the mini batch based algorithms for diffusion CS reconstruction are then proposed in Section III. The stability analysis of D-LMS is carried out in Section IV. In Section V, simulation results are presented to verify the reconstruction performance. Finally, the conclusion is given in Section VI.
Ii Diffusion Adaptation Framework for Compressive Sensing Reconstruction
Suppose a real valued discrete signal is compressible, i.e. can be represented as where is a transform basis matrix and is a sparse signal with sparsity . In the theory of CS, the sparse signal can be measured by
where is the measurement matrix, is the sensing matrix and is the measurement vector (). In practice, the observed measurement vector may be noisy, thus the observed measurement vector can be described as
where is the additive noise vector. The CS reconstruction task is to recover the sparse signal from the measurement matrix and the corresponding noisy measurement . To successfully recover , the measurement matrix should obey the restricted isometry property (RIP).
In practice, the CS reconstruction problem can be viewed as solving a sparse constrained least squares problem with the cost function
where is the regularization term and is regularization parameter.
To apply CS reconstruction in a decentralized manner, one can modify the above cost function. In particular, considering a connected network with nodes (i.e. the network size is ), we can obtain the estimation of by minimizing the following global cost function
Fig.1 shows an example of diffusion adaptation framework for CS reconstruction. The connected network includes 7 nodes. The whole is partitioned into small parts . Then, node only stores a small part of the measurement matrix and receives the corresponding measurement data . The information of a node can be transmitted within its neighbourhood (denoted as red links). Although each node has insufficient numbers of measurements and can only exchange information within local neighbours, the information diffusion across the whole network provides the ability to access the whole information of .
Iii Proposed Algorithms For Compressive sensing
Iii-a Gradient descent D-LMS for CS
The diffusion adaptation algorithms for stream data has been intensively studied [28, 32, 39, 40]. Given the temporal sparse input data sequence and the corresponding output data sequence at node , the sparse diffusion LMS adaptation algorithm  obtains the estimation by minimizing the following global cost function 
Intuitively, we can define and as
where and denote the transpose of the th row of and the -th scalar of , respectively. Thus the solution to CS reconstruction problem in Eq.(4) can be formulated based on the the diffusion adaptation algorithm with cost function in Eq.(5).
In traditional diffusion algorithm, the data size is always much larger than input dimension . However, in CS is much smaller than . When directly apply the sparse diffusion algorithm to CS, the adaptation process may not converge to the steady state due to insufficient number of data. To solve this problem, we follow the method described in  and use the data recursively. In particular, the data used at -th iteration in node are
Therefore, combining diffusion adaptation strategy and modified data sequence in Eq.(6), we can derive two gradient-descend based diffusion adaptive algorithms for CS, namely, the Adapt-then-Combine (ATC) diffusion -LMS (ATC-D-LMS) algorithm
and the Combine-then-Adapt (CTA) diffusion -LMS (CTA-D-LMS) algorithm
where and are intermediate vectors of node at time , is the derivation of and
is the instantaneous gradient vector. is the neighbourhood of node and is the corresponding step size. are non-negative weights assigned to link between and for adaptation and combination step, respectively. Further, can be seen as the -th entries of matrices and respectively. Specifically, and should satisfy
where denotes the all one vector.
An important problem left is to calculate . Since is non-differentiable, one should use an approximation instead. There are several approximations of norm  which can work well for sparse identification [33, 32]. In this paper we use the zero attraction term similar to -LMS, which is defined as 
and is the zero attraction parameter.
The pesudo code of ATC-D-LMS is summarized in Algorithm 1. At iteration , each node sends the data pair to its neighbours. Then the adaptation is performed at each node. After adaptation, the estimation in each node is transferred to its neighbours for combination. The process of CTA-D-LMS is similar with ATC-D-LMS except that the order of adaptation step and combination step are reversed.
which is the typical -LMS for CS . Therefore, -LMS can be viewed as a special case of ATC-D-LMS and CTA-D-LMS.
Remark 2: In a strict sense, the name D-LMS is nonstandard since the algorithm actually minimize an approximation of norm. In this paper, we just use the D-LMS to keep the name consistent with -LMS for CS .
Remark 3: For large scale data, to reduce the amount of network transmission, one can put away the adaptation information exchange, i.e. . That is, each node only utilize its own data to perform adaptation, and then share its estimation to neighbours for combination. Simulation results in Section V will verify the feasibility of this strategy.
Iii-B Mini-batch D-LMS for CS
The proposed gradient-based D-LMS is a typical extension of traditional sparse diffusion algorithm. We should notice that unlike traditional diffusion adaptation, in CS all the data is already known. Therefore, one can utilize more information during each iteration. In , the regularized exponetially forgetting windows LMS (-EFWLMS) algorithm is proposed to improve the convergence speed of -LMS. Extended from affine projection algorithm(APA) , -EFWLMS utilizes a sliding window approach to use more data to improve the gradient estimation. However, -EFWLMS still follows the traditional adaptive filtering method.
In data regression problem, mini-batch gradient descent (MBGD) method has been widely used. MBGD selects a small part of the sample data, computes gradient for each data, and then calculates the average gradient as the gradient estimation. For diffusion CS, the input data for node at each iteration can be chosen as
where is the index vector whose elements are uniformly and randomly chosen from . Then, according to MBGD method, the average gradient is defined as
Then, the weight update of corresponding ATC mini-batch diffusion -LMS algorithm (ATC-MB-D-LMS) can be represented as
In real application, utilizing mini-batch method gives faster convergence speed than D-LMS, but may also cause instability when the step size is large. To alleviate the negative impact caused by MBGD method, we optimize the iterative process by constraining the sparsity variance during the convergence process. In particular, the sparsity of the estimation in at -th iteration is defined as
and is a small positive threshold. Then, after a small number of iterations , if the sparsity of -th iteration is larger than 1.5 times of the sparsity at -th iteration, the estimation will not update.
The pesudo code of ATC-MB-D-LMS is summarized in Algorithm 2. Unlike Algorithm 1 where is transmitted within neighbours, to alleviate the load of network transmission, here the estimation is transmitted to neighbours. The gradient is computed at neighbour nodes and then sent back. Similar to ATC-D-LMS, one can also set to put away the adaptation step to save the amount of network transmission.
Iii-C Data Partition Strategy and Stop criterion
For a connected network, using diffusion based algorithm allows each node to observe all the information of the data, thus in theory for any partition strategy the sparse signal can always be recovered. However, as discussed later in Part E, Section IV, the data correlation may influence the convergence condition on step size which is directly related to reconstruction speed. Since the data is recursively used, the data correlation may occur every iterations. Therefor, to avoid high correlation of data, the partition strategy should selected so that for each node are as large as possible. In practice, uniformly assign the data to each node is a proper choice.
Although one can use the maximum iteration number to stop the iteration, one would like a more practical stop criterion. In , the author utilizes the squared error between adjacent estimation as the index of stop condition. However, it is always not operational in real applications. Observing the fact that the sparsity of the estimation will maintain as the algorithm converges to the steady state, here we propose a new stop criterion based on the sparsity of the estimation: given the window length and the threshold , by defining the count at iteration
and , if , then the algorithm will stop.
Iv Convergence Analysis
In this section, we carry out the convergence analysis of D-LMS algorithm for CS reconstruction. We first derive the recursion form of the algorithm, and then analyze the sufficient condition for convergence on step size under different parameter settings. Finally the influence of regularization term and data reuse is also discussed.
Iv-a Recursion form derivation
where can be seen as the -th entries of matrices and and respectively. ATC-D-LMS and CTA-D-LMS can be viewed as two special cases by setting and , respectively.
Iv-B Sufficient condition for convergence under deterministic measurement matrix
Based on the recursion in Eq.(22) and property that the data is recursively used, one can further derive the following theorem.
Given a certain measurement matrix and matrices , , , for any finite initial vector and finite noise vector , define the product
where is the least common multiple of , and define the sequence as the eigenvalues of
as the eigenvalues ofarranged from large to small according to the modulus. For arbitrary initial condition, in the diffusion algorithm in Eq.(22) will converge if the step sizes are selected so that the modulus of the -th eigenvalue is less than 1, i.e.
See Appendix A.
The above result can be used in practical since the generated measurement matrix is always fixed during the specific CS task. Different from traditional diffusion process, once the data has been collected, the process is deterministic and there is no more randomness in the operation of the reconstruction.
Iv-C Sufficient condition for convergence on random measurement matrices
In many situations, the measurement matrix may not always fixed during all reconstruction process. However, since the measurement matrix is always selected as the random matrix, we can follow the method of traditional diffusion algorithm [26, 27, 28, 29, 30] and analyze the convergence of the proposed algorithm in mean and mean squares sense.
Similar to traditional diffusion algorithms, for tractability of the analysis we use the following assumptions:
A1. The elements of noise vector are i.i.d processes and independent of measurement matrix .
A2. of each node is sufficient large so that at arbitrary node is independent of .
A3. The noise has finite variance.
We should remark that in CS reconstruction, due to recursively use of data, Assumption A2 may not be satisfied in practice. Nevertheless, if is sufficient large so that the correlation is sufficient small, the independent assumption can almost reach. This fact has been proved by simulation results of -LMS for CS . Moreover, one should also notice that the small step size assumption in analysis of traditional diffusion algorithms is removed. In [26, 27, 28, 29, 30], the analysis is based on the assumption that the step sizes are sufficient small so that the higher-order powers of step sizes can be ignored. However, in CS a larger step size is preferred to achieve faster convergence speed. Thus in this paper we eliminate this assumption and propose a new analysis on mean and mean squares performance. In the analysis, without loss of generality, the variances for each element of are all set to .
Postmultipling both sides of Eq.(22) with their respective transposes, and then utilizing trace operation, we can obtain the following weighted mean square relation
Using the relationship of vectorization operator, matrix trace and Kronecker product
by defining , Eq.(25) can be derived as
where and denotes the same quantity, and
It is easy to prove that and are always bounded. Moreover, for one can obtain
Therefore, to ensure the bounded property of , should be always bounded. This recalls the mean recursion relation, which is obtained by taking expectation of both sides of Eq.(22)
It is known that will be bounded for all if converges as . Moreover, it has been proven that the stability of and can ensure the convergence of Eq.(25) and Eq.(33), respectively . Thus, for arbitrary the sufficient condition for both mean and mean square stability of Eq.(25) should be
where denotes the spectral radius of matrix .
To further simplify the condition in Eq.(35), we propose the following theorem:
For arbitrary real square matrices sequence , the following inequality will always hold
See Appendix B.
According to Theorem.1, if we set and for , we can obtain
Moreover, according to eigenvalue relationship of kronecker product, we can obtain . Therefore, we have the following conditions
Therefore, we can conclude that under Assumptions A1-A3, the condition will guarantee both mean and mean-square stability of the proposed algorithm.
For large scale data, computing the eigenvalue of is hard since the complexity grows significantly as increases. To simplify the computation of , we further propose the following theorem:
Given sum of arbitrary real square matrices sequence
where is the arbitrary identity matrix. Then, we will have
is the arbitrary identity matrix. Then, we will have
See Appendix C.
Rewriting Eq.(31) we can obtain
Thus, according to Theorem.2, one can compute by simply set . Specifically, by defining
we will have . Therefore, instead of calculating the spectrum radius of with dimensions, we can simplify obtain from which has only dimensions. In practical, the is typically a sparse matrix. Thus we can use an computationally-efficient search technique [44, 45] to find , which is easy to implement.
Iv-D Further analysis under general parameter settings
Under Assumption A1-A3, gives the sufficient condition for the convergence of the proposed algorithm. Moreover, in practical diffusion adaptation, the step size of all nodes are always set to the same value, and
is always set to doubly stochastic matrix[26, 27, 28, 29, 30, 46]. Therefore, in the following analysis we set where is the identical step size for all nodes. The is set to doubly stochastic such that . Without loss of generality, here we only analysis ATC strategy, such that . Thus Eq.(43) can be further simplified as