1 Introduction
In recent years, there has been growing interest in developing signal processing tasks and learning methods that work directly on compressive versions of data [1]. The main idea behind this approach is to maintain informative sketches of data in the form of low-dimensional random projections, known as compressive measurements. Learning is then performed on the compressive measurements instead of the full data. As a result, these techniques can reduce the computational, storage, and communication burden in today's high-dimensional data regimes [2].

There are several lines of work that focus on performing signal processing tasks and information retrieval using only the partial information embedded in compressive measurements. For example, the authors in [3] studied the performance limits of classification based on compressive measurements. In [4], Atia studied the problem of change detection based on random projections that satisfy the restricted isometry property (RIP) [5]. Moreover, data-adaptive dictionary learning from compressive versions of data has been studied in several works, including [6, 7, 8, 9].
In this paper, we focus on the estimation of the sample covariance matrix from compressive measurements. Covariance matrix estimation is a fundamental problem in signal processing and data analysis. For example, the eigendecomposition of the data's covariance matrix can be used to perform principal component analysis (PCA), which is frequently used for feature extraction [10].

In this work, we propose and analyze an unbiased estimator for the sample covariance matrix, where the data samples are observed only through low-dimensional random projections. In particular, we consider a general class of random projections whose entries are drawn independently and identically distributed (i.i.d.) from a zero-mean distribution with finite first four moments. Thus, one can use sparse random projections [11]
with independent Rademacher entries (a Rademacher random variable takes the two values +1 and -1 with equal probability) to reduce the data acquisition cost and computational complexity. Therefore, our covariance estimator can be utilized on power-constrained mobile devices and wearables such as cell phones and surveillance cameras [2, 12]. Our estimator can also be used in the distributed covariance estimation problem, e.g., environmental/atmospheric monitoring, where the data samples are observed at distributed sensors. In such settings, we would like to reveal the underlying structure with minimal measurement and communication overhead [13, 14, 15].

The key contributions that differentiate this paper from prior work are: (i) a fixed-data setting is considered, without distributional assumptions on the set of data samples; (ii) we do not impose structural assumptions on the covariance matrix, such as being low-rank; and (iii) our theoretical analysis does not require random projections that satisfy the distributional Johnson-Lindenstrauss property [16]. Instead, our theoretical result applies to a wide range of random projections with entries drawn from a zero-mean distribution.
The problem setup and a brief review of prior work are presented in Section 2. In Section 3, we introduce our unbiased estimator for the sample covariance matrix, with theoretical guarantees deferred to Section 4. In Section 5, we present experimental results on three real-world data sets from different application domains to verify the performance of our covariance estimator in the compressive domain.
2 Preliminaries
2.1 Notation
We denote column vectors by lowercase bold letters and matrices by uppercase bold letters. The matrix $\mathbf{I}_n$ is the identity matrix of size $n$. For a symmetric matrix $\mathbf{A}$, $\|\mathbf{A}\|_2$ represents the spectral norm, defined as $\|\mathbf{A}\|_2 = \max_{\|\mathbf{v}\|_2 = 1} \|\mathbf{A}\mathbf{v}\|_2$. Moreover, let $\operatorname{tr}(\mathbf{A})$ denote the trace of the matrix $\mathbf{A}$, and let $\operatorname{diag}(\mathbf{A})$ represent the matrix formed by zeroing all but the diagonal entries of $\mathbf{A}$.
2.2 Problem formulation
Consider a collection of $N$ data samples $\{\mathbf{x}_i\}_{i=1}^{N}$ in $\mathbb{R}^n$ and let $\mathbf{X} \in \mathbb{R}^{n \times N}$ be the matrix whose $i$th column is the data sample $\mathbf{x}_i$. In this work, we do not make any distributional assumptions on the data, and our goal is to recover the sample covariance matrix from a single pass over the data:

(1) $\quad \mathbf{C}_N = \frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_i \mathbf{x}_i^T = \frac{1}{N} \mathbf{X} \mathbf{X}^T.$
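For concreteness, the sample covariance in (1) can be accumulated in a single pass over the columns of the data matrix. Below is a minimal NumPy sketch (the function name and data layout are illustrative):

```python
import numpy as np

def sample_covariance(X):
    """Sample covariance C_N = (1/N) * sum_i x_i x_i^T, accumulated in
    a single pass over the columns of X (one n-dimensional sample each)."""
    n, N = X.shape
    C = np.zeros((n, n))
    for i in range(N):
        C += np.outer(X[:, i], X[:, i])  # rank-one update x_i x_i^T
    return C / N
```

The loop form emphasizes that (1) requires only one pass over the samples; in a batch setting, `X @ X.T / N` is equivalent.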
In modern data settings, the high dimensionality of massive data sets and limited resources in the data acquisition process make full data access infeasible [17]. To address this problem, we aim to recover $\mathbf{C}_N$ from compressive measurements of the data. This can be viewed as forming a random matrix $\mathbf{R}_i \in \mathbb{R}^{n \times m}$, with $m < n$, for each data sample $\mathbf{x}_i$, $i = 1, \ldots, N$. The entries of each $\mathbf{R}_i$ are drawn i.i.d. from a zero-mean distribution with finite first four moments [18]. We are interested in estimating the sample covariance matrix from the low-dimensional random projections $\mathbf{y}_i = \mathbf{R}_i^T \mathbf{x}_i \in \mathbb{R}^m$, $i = 1, \ldots, N$.

As we will see, the shape of the distribution used to generate the random matrices $\mathbf{R}_i$ is an important factor in finding an unbiased estimator for the sample covariance matrix. The kurtosis is a measure of the heaviness of the tail of a distribution. It is defined as $\kappa = \mu_4/\mu_2^2 - 3$, where $\mu_2$ and $\mu_4$ are the second and fourth moments of the distribution. The value of the kurtosis can also be used as a measure of deviation from the Gaussian distribution, since the kurtosis of a zero-mean Gaussian distribution is zero.
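As a quick illustration, the kurtosis of a zero-mean distribution follows directly from its moments. The small helper below (names are illustrative) assumes the standard definition of excess kurtosis, $\kappa = \mu_4/\mu_2^2 - 3$:

```python
def excess_kurtosis(mu2, mu4):
    """Excess kurtosis kappa = mu4 / mu2**2 - 3 of a zero-mean
    distribution with second moment mu2 and fourth moment mu4."""
    return mu4 / mu2**2 - 3.0

# Standard Gaussian: mu2 = 1, mu4 = 3, so kappa = 0.
# Rademacher (+1/-1 with equal probability): mu2 = 1, mu4 = 1, so kappa = -2.
```

Heavy-tailed distributions give positive values, while bounded distributions such as the Rademacher give negative values.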
2.3 Relation to prior work
Our work is most closely related to [19, 20], where the estimation of covariance matrices from low-dimensional random projections is studied. It was shown that the sample covariance matrix of the back-projected measurements $\{\mathbf{R}_i \mathbf{y}_i\}_{i=1}^{N}$ (scaled by a known factor) can be used to estimate the covariance structure of the full data. The motivation behind this technique is that $\mathbf{R}_i \mathbf{R}_i^T \mathbf{x}_i$ approximates the projection of $\mathbf{x}_i$ onto the column space of $\mathbf{R}_i$, where the expensive matrix inverse $(\mathbf{R}_i^T \mathbf{R}_i)^{-1}$ is eliminated.

A major drawback of the prior work is that the results only hold for data samples drawn from a specific probabilistic model known as the spiked covariance model. This simple statistical model is popular in studies of the covariance estimation problem, but unfortunately many real-world data sets do not follow it.

Another disadvantage of the previous work is that the proposed estimators are biased: one must have prior knowledge of the parameters of the probabilistic model to correct these estimators, and in practice it is rarely feasible to estimate these parameters accurately.
3 The proposed unbiased covariance estimator
In this section, we present an unbiased estimator for the sample covariance matrix $\mathbf{C}_N$ of the full data, computed from the compressive measurements $\{\mathbf{y}_i = \mathbf{R}_i^T \mathbf{x}_i\}_{i=1}^{N}$. Recall that the entries of each $\mathbf{R}_i \in \mathbb{R}^{n \times m}$ are drawn i.i.d. from a zero-mean distribution with finite second and fourth moments $\mu_2$ and $\mu_4$, and kurtosis $\kappa = \mu_4/\mu_2^2 - 3$. One advantage of our approach compared to the prior work is that it makes much weaker assumptions on the data. Therefore, our analysis can be used to develop a better understanding of extracting the covariance structure in the compressive domain.
3.1 Compressive covariance estimation
To begin, let us consider a rescaled version of the sample covariance matrix of the projected data $\{\mathbf{R}_i \mathbf{y}_i\}_{i=1}^{N}$:

(2) $\quad \hat{\boldsymbol{\Sigma}}_N = \frac{1}{N m (m+1) \mu_2^2} \sum_{i=1}^{N} \mathbf{R}_i \mathbf{y}_i \mathbf{y}_i^T \mathbf{R}_i^T.$

We compute the expectation of $\hat{\boldsymbol{\Sigma}}_N$ based on Theorem 1 in Section 4:

(3) $\quad \mathbb{E}[\hat{\boldsymbol{\Sigma}}_N] = \mathbf{C}_N + \frac{\kappa}{m+1} \operatorname{diag}(\mathbf{C}_N) + \frac{\operatorname{tr}(\mathbf{C}_N)}{m+1} \mathbf{I}_n,$

where $\operatorname{diag}(\mathbf{C}_N)$ represents the matrix formed by zeroing all but the diagonal entries of $\mathbf{C}_N$.

We observe that the expectation of this estimator has three components: the sample covariance matrix $\mathbf{C}_N$ of the full data, which we wish to extract, and two additional bias terms. The bias term $\frac{\kappa}{m+1}\operatorname{diag}(\mathbf{C}_N)$ can be viewed as representing a bias of the estimator towards the nearest canonical basis vectors. The value of this term depends on both the kurtosis $\kappa$ of the distribution used to generate the random matrices and the structure of the underlying covariance matrix $\mathbf{C}_N$. In contrast, the bias term $\frac{\operatorname{tr}(\mathbf{C}_N)}{m+1}\mathbf{I}_n$ in (3) is independent of the shape of the distribution. This term shows that the energy of the compressed data is somewhat scattered into different directions in $\mathbb{R}^n$, as $\operatorname{tr}(\mathbf{C}_N) = \frac{1}{N}\sum_{i=1}^{N}\|\mathbf{x}_i\|_2^2$ represents the energy of the input data. Note that both bias terms are decreasing functions of $m$, the number of linear measurements per sample.
The challenge here is to remove these two bias terms by modifying the sample covariance matrix in the compressive domain. To do this, we first find the expectation of $\operatorname{diag}(\hat{\boldsymbol{\Sigma}}_N)$ by using the linearity of expectation:

(4) $\quad \mathbb{E}[\operatorname{diag}(\hat{\boldsymbol{\Sigma}}_N)] = \frac{m+1+\kappa}{m+1} \operatorname{diag}(\mathbf{C}_N) + \frac{\operatorname{tr}(\mathbf{C}_N)}{m+1} \mathbf{I}_n.$

We also need to compute the expectation of $\operatorname{tr}(\hat{\boldsymbol{\Sigma}}_N)$:

(5) $\quad \mathbb{E}[\operatorname{tr}(\hat{\boldsymbol{\Sigma}}_N)] = \frac{m+1+\kappa+n}{m+1} \operatorname{tr}(\mathbf{C}_N),$

since the trace is a linear operator. Hence, we can modify $\hat{\boldsymbol{\Sigma}}_N$ to obtain an unbiased estimator for the sample covariance matrix of the full data:

(6) $\quad \hat{\mathbf{C}}_N = \hat{\boldsymbol{\Sigma}}_N - \beta_1 \operatorname{diag}(\hat{\boldsymbol{\Sigma}}_N) - \beta_2 \operatorname{tr}(\hat{\boldsymbol{\Sigma}}_N)\, \mathbf{I}_n,$

where

(7) $\quad \beta_1 = \frac{\kappa}{m+1+\kappa}$

and

(8) $\quad \beta_2 = \frac{m+1}{(m+1+\kappa)(m+1+\kappa+n)}.$

We immediately see that $\hat{\mathbf{C}}_N$ is an unbiased estimator for the sample covariance matrix of the full data:

(9) $\quad \mathbb{E}[\hat{\mathbf{C}}_N] = \mathbf{C}_N.$

Note that $\beta_1$ and $\beta_2$ are two known constants that are used to correct the biased estimator $\hat{\boldsymbol{\Sigma}}_N$ based on (4) and (5).
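The resulting estimator is simple to implement. Below is a minimal NumPy sketch, assuming the rescaling $1/(N m(m+1)\mu_2^2)$ and the constants $\beta_1 = \kappa/(m+1+\kappa)$ and $\beta_2 = (m+1)/\big((m+1+\kappa)(m+1+\kappa+n)\big)$; function and variable names are illustrative, not from the paper:

```python
import numpy as np

def compressive_covariance(Rs, ys, kappa, mu2=1.0):
    """Unbiased estimate of the sample covariance C_N from compressive
    measurements y_i = R_i^T x_i.

    Rs : list of n x m random matrices R_i (i.i.d. zero-mean entries
         with second moment mu2 and excess kurtosis kappa)
    ys : list of m-dimensional measurement vectors y_i
    """
    n, m = Rs[0].shape
    N = len(Rs)
    # Rescaled sample covariance of the back-projected measurements
    S = np.zeros((n, n))
    for R, y in zip(Rs, ys):
        z = R @ y                          # R_i y_i = R_i R_i^T x_i
        S += np.outer(z, z)
    S /= N * m * (m + 1) * mu2**2
    # Remove the two bias terms with the known constants beta1, beta2
    beta1 = kappa / (m + 1 + kappa)
    beta2 = (m + 1) / ((m + 1 + kappa) * (m + 1 + kappa + n))
    D = np.diag(np.diag(S))
    return S - beta1 * D - beta2 * np.trace(S) * np.eye(n)
```

For Gaussian projections, $\kappa = 0$, so $\beta_1 = 0$ and only the trace correction remains.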
3.2 Covariance estimation via sparse random projections
Throughout our analysis, the entries of the random matrices $\mathbf{R}_i$ are drawn from a zero-mean distribution with finite first four moments. A common choice is a random matrix with i.i.d. Gaussian entries, as in [19]. In this case, the kurtosis of the Gaussian distribution is zero, which means that the bias term $\frac{\kappa}{m+1}\operatorname{diag}(\mathbf{C}_N)$ vanishes regardless of the structure of $\mathbf{C}_N$. In fact, this is due to the rotational invariance of the Gaussian distribution, which eliminates the dependence on the orientation of the data. However, as a random Gaussian matrix is dense, the memory and computational loads are high for large-scale data sets.
In this paper, we adopt the sparse random projections that were originally proposed to reduce the memory and computational burden in the estimation of pairwise distances [11]. In this setting, the entries of $\mathbf{R}_i$ are distributed on $\{+\sqrt{s},\, 0,\, -\sqrt{s}\}$ with probabilities $\{\frac{1}{2s},\, 1-\frac{1}{s},\, \frac{1}{2s}\}$. Therefore, the parameter $s$ controls the sparsity of the random matrices, such that each column of $\mathbf{R}_i$ has $n/s$ nonzero entries on average. It is easy to verify that the second and fourth moments are $\mu_2 = 1$ and $\mu_4 = s$, and thus the kurtosis is

(10) $\quad \kappa = s - 3.$

Hence, as the sparsity of $\mathbf{R}_i$ increases, the spectral norm of the bias term $\frac{\kappa}{m+1}\operatorname{diag}(\mathbf{C}_N)$ grows proportionally to the sparsity parameter $s$. This means that the correction of the covariance estimator becomes more important as the sparsity parameter increases.
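Sampling such a sparse random projection is straightforward. An illustrative NumPy sketch, assuming entries $+\sqrt{s}$, $0$, $-\sqrt{s}$ with probabilities $\frac{1}{2s}$, $1-\frac{1}{s}$, $\frac{1}{2s}$:

```python
import numpy as np

def sparse_projection(n, m, s, rng=None):
    """Sample an n x m sparse random projection whose entries are
    +sqrt(s), 0, -sqrt(s) with probabilities 1/(2s), 1 - 1/s, 1/(2s).
    The entries then have second moment mu2 = 1, fourth moment
    mu4 = s, and excess kurtosis s - 3."""
    rng = np.random.default_rng() if rng is None else rng
    vals = np.array([np.sqrt(s), 0.0, -np.sqrt(s)])
    probs = [1.0 / (2 * s), 1.0 - 1.0 / s, 1.0 / (2 * s)]
    return vals[rng.choice(3, size=(n, m), p=probs)]
```

For large $s$, most entries are zero, so the matrix can be stored and applied in a sparse format (e.g., with `scipy.sparse`).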
In this work, we are specifically interested in choosing the parameters $m$ and $s$ such that the compression factor satisfies

(11) $\quad \gamma := \frac{m}{s} < 1.$

The benefits of this choice are twofold. First, the average computation cost to acquire or access each data sample is $\mathcal{O}(\gamma n)$, $\gamma < 1$, compared to the $\mathcal{O}(n)$ cost for acquiring the full data sample $\mathbf{x}_i \in \mathbb{R}^n$. Hence, we can achieve a substantial cost reduction in the data acquisition process when the number of samples is large.

Second, the expected number of nonzero entries in each of the $m$ columns of $\mathbf{R}_i$ is $n/s$. Therefore, each back-projected sample $\mathbf{R}_i \mathbf{y}_i$ has at most $\gamma n$ nonzero entries on average. Thus, the cost to form $\mathbf{R}_i \mathbf{y}_i \mathbf{y}_i^T \mathbf{R}_i^T$ will be $\mathcal{O}(\gamma^2 n^2)$, in contrast to the $\mathcal{O}(n^2)$ cost for computing $\mathbf{x}_i \mathbf{x}_i^T$. Hence, the computational complexity to form the sample covariance matrix in the compressive domain is reduced by a factor of $\gamma^2$ for $\gamma < 1$. Empirical results will be provided in Section 5 to reveal the trade-offs between computational savings and accuracy on various data sets.
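The sparsity argument can be checked numerically: the back-projected vector $\mathbf{R}_i \mathbf{y}_i$ is supported only on the rows of $\mathbf{R}_i$ that contain a nonzero entry. A small self-contained sketch (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 5000, 10, 100            # ambient dim, sketch dim, sparsity

# entries: +sqrt(s), 0, -sqrt(s) with probabilities 1/(2s), 1-1/s, 1/(2s)
R = rng.choice([np.sqrt(s), 0.0, -np.sqrt(s)],
               size=(n, m), p=[1 / (2 * s), 1 - 1 / s, 1 / (2 * s)])
x = rng.standard_normal(n)

z = R @ (R.T @ x)                  # back-projected measurement R_i y_i
frac = np.count_nonzero(z) / n     # fraction of nonzero entries of z
# z is supported on the rows of R containing a nonzero entry, which
# happens with probability 1 - (1 - 1/s)**m, at most m/s per row
```

With these parameters, `frac` concentrates near $1 - (1 - 1/s)^m \approx m/s$, illustrating the quadratic savings when forming the outer products.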
4 Theoretical results
In this section, we provide theoretical guarantees on the expectation of $\hat{\boldsymbol{\Sigma}}_N$, which was introduced in the previous section.
Theorem 1
Consider a rescaled version of the sample covariance matrix in the compressive domain:

(12) $\quad \hat{\boldsymbol{\Sigma}}_N = \frac{1}{N m (m+1) \mu_2^2} \sum_{i=1}^{N} \mathbf{R}_i \mathbf{y}_i \mathbf{y}_i^T \mathbf{R}_i^T,$

where $\{\mathbf{x}_i\}_{i=1}^{N}$ are data samples in $\mathbb{R}^n$, $\mathbf{y}_i = \mathbf{R}_i^T \mathbf{x}_i$, and the entries of each $\mathbf{R}_i \in \mathbb{R}^{n \times m}$ are drawn i.i.d. from a zero-mean distribution with finite second and fourth moments $\mu_2$ and $\mu_4$, and kurtosis $\kappa = \mu_4/\mu_2^2 - 3$. Then, the expectation of $\hat{\boldsymbol{\Sigma}}_N$ over the randomness in $\{\mathbf{R}_i\}_{i=1}^{N}$ is

(13) $\quad \mathbb{E}[\hat{\boldsymbol{\Sigma}}_N] = \mathbf{C}_N + \frac{\kappa}{m+1} \operatorname{diag}(\mathbf{C}_N) + \frac{\operatorname{tr}(\mathbf{C}_N)}{m+1} \mathbf{I}_n,$

where $\mathbf{C}_N = \frac{1}{N}\sum_{i=1}^{N}\mathbf{x}_i\mathbf{x}_i^T$ and $\operatorname{diag}(\mathbf{C}_N)$ denotes the matrix formed by zeroing all but the diagonal entries of $\mathbf{C}_N$.
Proof 1
First, we compute the expectation of each summand in (12). Note that the authors in [20] computed this expectation when each data sample is drawn i.i.d. from the spiked covariance model, a statistical model in which the data samples are drawn from a low-rank Gaussian distribution. Here, a new analysis is given in which we make no distributional assumptions on the data samples, providing a better understanding of covariance estimation from compressive measurements.
Consider any $\mathbf{R}_i$, which we call $\mathbf{R}$ to simplify the notation, and fix the corresponding data sample $\mathbf{x}$. The entry in the $k$th row and $j$th column of $\mathbf{R}$ is denoted by $r_{kj}$, where $1 \leq k \leq n$ and $1 \leq j \leq m$. Thus, the $j$th column of the matrix $\mathbf{R}$ has the following form: $\mathbf{r}_j = [r_{1j}, \ldots, r_{nj}]^T$.

Define $\mathbf{M}_k := \mathbf{r}_j \mathbf{r}_j^T \mathbf{e}_k \mathbf{e}_k^T \mathbf{r}_j \mathbf{r}_j^T$, where $\mathbf{e}_k$ denotes the $k$th vector of the canonical basis in $\mathbb{R}^n$, so that the entry of $\mathbf{M}_k$ in the $p$th row and $q$th column is $r_{pj} r_{kj}^2 r_{qj}$. We calculate the entries of the matrix $\mathbb{E}[\mathbf{M}_k]$ using the fact that the entries of $\mathbf{R}$ are drawn i.i.d. from a zero-mean distribution:

(1) The $k$th diagonal entry of $\mathbb{E}[\mathbf{M}_k]$: $\mathbb{E}[r_{kj}^4] = \mu_4$.

(2) The other diagonal entries of $\mathbb{E}[\mathbf{M}_k]$, e.g., the $l$th diagonal entry for $l \neq k$: $\mathbb{E}[r_{lj}^2 r_{kj}^2] = \mu_2^2$.

(3) The off-diagonal entries of $\mathbb{E}[\mathbf{M}_k]$, e.g., the entry in the $p$th row and the $q$th column when $p \neq q$: $\mathbb{E}[r_{pj} r_{kj}^2 r_{qj}] = 0$,

since these entries have at least one term with degree $1$. Therefore, we see that:

(14) $\quad \mathbb{E}[\mathbf{r}_j \mathbf{r}_j^T \mathbf{e}_k \mathbf{e}_k^T \mathbf{r}_j \mathbf{r}_j^T] = \mu_2^2\, \mathbf{I}_n + (\mu_4 - \mu_2^2)\, \mathbf{e}_k \mathbf{e}_k^T,$

where the entries of $\mathbf{e}_k \mathbf{e}_k^T$ are all zero except for the $k$th diagonal one, which is $1$.

We can compute the entries of the matrix $\mathbb{E}[\mathbf{r}_j \mathbf{r}_j^T \mathbf{e}_k \mathbf{e}_l^T \mathbf{r}_j \mathbf{r}_j^T]$ for $k \neq l$ using the same strategy. In this case, each entry has at least one term with degree $1$, so it will be zero, except the following two terms:

(1) The entry in the $k$th row and $l$th column: $\mathbb{E}[r_{kj}^2 r_{lj}^2] = \mu_2^2$, where this follows from the fact that the entries of $\mathbf{R}$ are i.i.d.

(2) The entry in the $l$th row and $k$th column: $\mathbb{E}[r_{lj} r_{kj} r_{lj} r_{kj}] = \mu_2^2$.

Hence, we get:

(15) $\quad \mathbb{E}[\mathbf{r}_j \mathbf{r}_j^T \mathbf{e}_k \mathbf{e}_l^T \mathbf{r}_j \mathbf{r}_j^T] = \mu_2^2\, (\mathbf{e}_k \mathbf{e}_l^T + \mathbf{e}_l \mathbf{e}_k^T), \quad k \neq l.$
Now, we compute the expectation over the randomness in $\mathbf{R}$ for a fixed data sample $\mathbf{x}$. Writing $\mathbf{R}\mathbf{R}^T = \sum_{j=1}^{m} \mathbf{r}_j \mathbf{r}_j^T$ and $\mathbf{x}\mathbf{x}^T = \sum_{k,l} x_k x_l \mathbf{e}_k \mathbf{e}_l^T$, and using the independence of distinct columns together with (14) and (15):

(16) $\quad \mathbb{E}[\mathbf{R}\mathbf{R}^T \mathbf{x}\mathbf{x}^T \mathbf{R}\mathbf{R}^T] = \mu_2^2 \left[ m(m+1)\, \mathbf{x}\mathbf{x}^T + m\kappa \operatorname{diag}(\mathbf{x}\mathbf{x}^T) + m \|\mathbf{x}\|_2^2\, \mathbf{I}_n \right],$

where we used $\mu_4 = (\kappa + 3)\mu_2^2$. We rescale the expectation:

(17) $\quad \mathbb{E}\!\left[\frac{1}{m(m+1)\mu_2^2}\, \mathbf{R}\mathbf{R}^T \mathbf{x}\mathbf{x}^T \mathbf{R}\mathbf{R}^T\right] = \mathbf{x}\mathbf{x}^T + \frac{\kappa}{m+1} \operatorname{diag}(\mathbf{x}\mathbf{x}^T) + \frac{\|\mathbf{x}\|_2^2}{m+1}\, \mathbf{I}_n.$

Next, we rewrite the sample covariance matrix as $\mathbf{C}_N = \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}_i \mathbf{x}_i^T$, with $\operatorname{tr}(\mathbf{C}_N) = \frac{1}{N}\sum_{i=1}^{N}\|\mathbf{x}_i\|_2^2$. Using the linearity of expectation and $\mathbf{R}_i \mathbf{y}_i = \mathbf{R}_i \mathbf{R}_i^T \mathbf{x}_i$, it is straightforward to see that:

(18) $\quad \mathbb{E}[\hat{\boldsymbol{\Sigma}}_N] = \mathbf{C}_N + \frac{\kappa}{m+1}\operatorname{diag}(\mathbf{C}_N) + \frac{\operatorname{tr}(\mathbf{C}_N)}{m+1}\, \mathbf{I}_n,$

and this completes the proof.
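Theorem 1 can also be sanity-checked by Monte Carlo simulation. The sketch below, assuming the single-sample identity $\mathbb{E}\big[\frac{1}{m(m+1)\mu_2^2}\mathbf{R}\mathbf{R}^T\mathbf{x}\mathbf{x}^T\mathbf{R}\mathbf{R}^T\big] = \mathbf{x}\mathbf{x}^T + \frac{\kappa}{m+1}\operatorname{diag}(\mathbf{x}\mathbf{x}^T) + \frac{\|\mathbf{x}\|_2^2}{m+1}\mathbf{I}$, draws Rademacher matrices ($\mu_2 = 1$, $\kappa = -2$) and compares the empirical mean with the claimed expectation (parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, T = 4, 3, 100_000            # ambient dim, sketch dim, trials
x = rng.standard_normal(n)
x /= np.linalg.norm(x)             # unit norm keeps Monte Carlo noise small
mu2, kappa = 1.0, -2.0             # Rademacher entries: mu2 = 1, kappa = -2

R = rng.choice([-1.0, 1.0], size=(T, n, m))
y = np.einsum('tij,i->tj', R, x)   # y_t = R_t^T x
z = np.einsum('tij,tj->ti', R, y)  # z_t = R_t R_t^T x
emp = (z.T @ z) / (T * m * (m + 1) * mu2**2)

xx = np.outer(x, x)
theory = xx + kappa / (m + 1) * np.diag(np.diag(xx)) \
            + (x @ x) / (m + 1) * np.eye(n)
err = np.max(np.abs(emp - theory))  # Monte Carlo error, O(1/sqrt(T))
```

This is only an illustrative check, not a substitute for the proof; the residual `err` shrinks at the usual $1/\sqrt{T}$ Monte Carlo rate.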
5 Experimental results
In this section, we examine the performance of our unbiased covariance estimator on three real-world data sets. We compare our proposed estimator $\hat{\mathbf{C}}_N$ with the biased estimator introduced in [20]. To evaluate the estimation accuracy, we use the normalized covariance estimation error, defined as $\|\hat{\mathbf{C}}_N - \mathbf{C}_N\|_2 / \|\mathbf{C}_N\|_2$.

Therefore, this criterion measures the closeness, in spectral norm, of the covariance matrix estimated from compressive measurements to the underlying sample covariance matrix. Since the estimation of the covariance matrix from compressive measurements is stochastic, we rerun each of the following experiments multiple times and report the mean.
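Computing this criterion is a one-liner with the spectral norm; a small helper (the function name is illustrative):

```python
import numpy as np

def normalized_error(C_hat, C):
    """Normalized covariance estimation error ||C_hat - C||_2 / ||C||_2,
    where ||.||_2 is the spectral norm (largest singular value)."""
    return np.linalg.norm(C_hat - C, 2) / np.linalg.norm(C, 2)
```

Note that `np.linalg.norm(A, 2)` returns the largest singular value when `A` is a matrix.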
In all experiments, the parameter $m$ is fixed, and we analyze the accuracy of the covariance matrix estimated from the sparse random projections introduced in Section 3.2 for various values of the compression factor $\gamma$. Note that we are interested in choosing the parameter $s$ such that $\gamma < 1$. In fact, a value of $\gamma$ closer to zero indicates the case where we achieve a substantial reduction of the acquisition and computation cost by increasing the parameter $s$ in the distribution of the random matrices.
5.1 MNIST data set
In the first experiment, we examine the MNIST data set of handwritten digits. This data set consists of centered versions of handwritten digits that are stored as pixel images. We use the images of the digit "0", which are vectorized to form the data samples. Some examples are shown in Fig. 1.
The performance of our unbiased estimator is compared with that of the biased estimator [20] in Fig. 2. We see that our proposed estimator results in more accurate estimates for all values of the compression factor $\gamma$; in particular, it decreases the estimation error by a substantial factor.
To examine the trade-offs between computational savings and accuracy, the running times to form our unbiased covariance estimator $\hat{\mathbf{C}}_N$ and the biased estimator are reported in Fig. 2. The running times are normalized by the time spent forming the sample covariance matrix $\mathbf{C}_N$ of the full data. Both the unbiased and biased estimators from compressive measurements show a significant speedup over the full-data computation, as expected from the discussion in Section 3.2. Moreover, we observe that our estimator has roughly the same running time as the biased one. Hence, the additional computation cost of removing the bias terms in equation (6) is negligible. Consequently, our unbiased estimator yields more accurate estimates than the biased estimator at approximately the same computation cost.
We also show a sample visual comparison of the first eigenvector of the underlying covariance matrix $\mathbf{C}_N$ and that estimated by our approach for several values of the compression factor. Note that the eigenvectors of a covariance matrix encode an orthonormal basis that captures as much of the data's variability as possible. Therefore, we visualize the first eigenvector of $\mathbf{C}_N$ and that estimated by our approach from sparse random projections as a measure of information extraction. As we see in Fig. 3, even for small values of the compression factor, the resulting eigenvectors from the full data and the compressed data are almost identical. In fact, the first eigenvector of our estimator in Fig. 3 reveals the underlying structure and pattern in the data set consisting of handwritten digits "0".

5.2 Gen4 data set
Our second experiment is with the gen4 data matrix, which arises from linear programming problems. This data set is available from the University of Florida Sparse Matrix Collection.
In Fig. 4, the accuracy of our proposed covariance estimator is compared with the biased estimator in [20]. We observe that covariance matrices recovered using our estimator are significantly more accurate than those returned by [20]. In fact, our proposed estimator decreases the estimation error by an order of magnitude.
Moreover, we examine the trade-offs between computational savings and accuracy on this data set. The running times to compute the unbiased and biased estimators for various values of $\gamma$ are reported in Fig. 4. Recall that the running times are normalized by the time spent forming the sample covariance matrix of the full data. We see that both estimators show a significant speedup over the full-data computation. Similar to the previous example, the additional computation cost of removing the bias terms in equation (6) is negligible. Therefore, our unbiased estimator yields more accurate estimates with roughly the same computation cost.
5.3 Traffic data set
In the last experiment, we consider the traffic data set, which contains video surveillance of traffic from a stationary camera. Some examples of this data set are shown in Fig. 5. Considering individual frames of the video as data samples, each frame is a pixel image that is vectorized to form a data sample.
Our proposed approach is used to estimate the sample covariance matrix of the full data from sparse random projections. As we see in Fig. 6, the unbiased estimator yields accurate estimates on this data set, with a small normalized estimation error for all values of the compression factor $\gamma$. The biased estimator has approximately the same accuracy as ours on this data set. The reason is that the bias term $\frac{\kappa}{m+1}\operatorname{diag}(\mathbf{C}_N)$ in equation (3) depends on both the kurtosis $\kappa$ and the structure of the underlying covariance matrix $\mathbf{C}_N$. Hence, the value of this bias term can be relatively small for some special covariance structures. However, our proposed estimator provides an unbiased estimate of $\mathbf{C}_N$ without any restrictive assumptions on the data's structure.
Moreover, we examine the trade-offs between computational savings and accuracy on the traffic data set. We report the running times to compute the unbiased and biased estimators, normalized by the time spent forming the sample covariance matrix of the full data. As we see in Fig. 6, the computational savings are proportional to $\gamma^2$. We also observe that the unbiased estimator has roughly the same running time as the biased one.
In Fig. 7, we provide a sample visual comparison of the first eigenvector of the underlying covariance matrix and that estimated by our approach for several values of the compression factor. We see that the two first eigenvectors are almost identical, even for small values of the compression factor. The first eigenvector of our covariance estimator reveals the underlying structure from sparse random projections, which appears to be a traffic trend along the roadway.
This experiment typifies a real-world application of our approach. We showed that our proposed estimator can be used to extract the covariance structure of the data from sparse random projections. At the same time, we achieve a significant reduction of the computation cost, proportional to $\gamma^2$, where $\gamma$ is the compression factor.
As one final note, we observe that our proposed covariance estimator results in more accurate estimates on the traffic data set compared to the MNIST and gen4 data sets. To explain this, let us consider the stable rank $\rho = \|\mathbf{C}_N\|_F^2 / \|\mathbf{C}_N\|_2^2$, where $\|\mathbf{C}_N\|_F$ represents the Frobenius norm of the sample covariance matrix. The parameter $\rho$ gives important information about how spread out the eigenvalues of $\mathbf{C}_N$ are; a value of $\rho$ closer to $1$ indicates a faster rate of decay of the eigenvalues. Among the three data sets, the traffic data set has the stable rank closest to $1$, so its sample covariance matrix has one dominant eigenvector. Hence, the experimental results demonstrate that our proposed estimator leads to more accurate estimates of $\mathbf{C}_N$ when the eigenvalues of $\mathbf{C}_N$ decay faster, i.e., when $\rho$ is closer to $1$.

6 Conclusions
This paper presents an unbiased estimator to extract the covariance structure from low-dimensional random projections of data. As its main theoretical ingredient, our analysis holds for any random projection matrix with i.i.d. zero-mean entries and finite first four moments. In addition, our analysis does not require restrictive assumptions on the sample covariance matrix. Therefore, the present work extends prior work on compressive covariance estimation and provides a new analysis of covariance estimation from compressive measurements.
Furthermore, we have employed very sparse random projections to reduce the computational burden in high-dimensional data regimes. We showed that our approach can be used to estimate the covariance matrix while achieving significant savings in computation cost. We have presented experimental results on several real-world data sets to demonstrate the accuracy of our proposed approach for varying values of the compression factor.
Acknowledgements
I would like to thank Stephen Becker for his helpful comments and suggestions.
References
 [1] M. Davenport, P. Boufounos, M. Wakin, and R. Baraniuk, “Signal processing with compressive measurements,” IEEE Journal of Selected Topics in Signal Processing, vol. 4, pp. 445–460, 2010.
 [2] K. Slavakis, G. Giannakis, and G. Mateos, “Modeling and optimization for big data analytics: (statistical) learning tools for our era of data deluge,” IEEE Signal Process. Mag., vol. 31, no. 5, pp. 18–31, 2014.
 [3] T. Wimalajeewa, H. Chen, and P. Varshney, "Performance limits of compressive sensing-based signal classification," IEEE Transactions on Signal Processing, vol. 60, pp. 2758–2770, 2012.
 [4] G. Atia, “Change detection with compressive measurements,” IEEE Signal Processing Letters, vol. 22, no. 2, pp. 182–186, 2015.
 [5] E. Candès and M. Wakin, “An introduction to compressive sampling,” IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 21–30, 2008.
 [6] S. Gleichman and Y. Eldar, “Blind compressed sensing,” IEEE Transactions on Information Theory, vol. 57, no. 10, pp. 6958–6975, 2011.
 [7] C. Studer and R. Baraniuk, “Dictionary learning from sparsely corrupted or compressed signals,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3341–3344, 2012.
 [8] F. Pourkamali-Anaraki and S. Hughes, "Compressive K-SVD," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5469–5473, 2013.
 [9] F. Pourkamali-Anaraki, S. Becker, and S. Hughes, "Efficient dictionary learning via very sparse random projections," in Sampling Theory and Applications (SampTA), pp. 478–482, 2015.

 [10] G. Yu, G. Sapiro, and S. Mallat, "Solving inverse problems with piecewise linear estimators: from Gaussian mixture models to structured sparsity," IEEE Transactions on Image Processing, vol. 21, no. 5, pp. 2481–2499, 2012.
 [11] P. Li, T. Hastie, and K. Church, "Very sparse random projections," in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 287–296, 2006.

 [12] V. Sindhwani, T. Sainath, and S. Kumar, "Structured transforms for small-footprint deep learning," in Advances in Neural Information Processing Systems (NIPS), pp. 3088–3096, 2015.
 [13] Y. Chi, Y. Eldar, and R. Calderbank, "PETRELS: Parallel subspace estimation and tracking by recursive least squares from partial observations," IEEE Transactions on Signal Processing, vol. 61, no. 23, pp. 5947–5959, 2013.
 [14] K. Kumar, J. Liu, Y. Lu, and B. Bhargava, “A survey of computation offloading for mobile systems,” Mobile Networks and Applications, vol. 18, no. 1, pp. 129–140, 2013.
 [15] M. Azizyan, A. Krishnamurthy, and A. Singh, “Subspace learning from extremely compressed measurements,” in Asilomar Conference on Signals, Systems and Computers, pp. 311–315, 2014.
 [16] J. Park, M. Wakin, and A. Gilbert, “Modal analysis with compressive measurements,” IEEE Transactions on Signal Processing, vol. 62, no. 7, pp. 1655–1670, 2014.
 [17] J. Fowler, "Compressive-projection principal component analysis," IEEE Transactions on Image Processing, vol. 18, no. 10, pp. 2230–2242, 2009.

 [18] A. Kabán, "New bounds on compressive linear least squares regression," in The 17th International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 448–456, 2014.
 [19] F. Pourkamali-Anaraki and S. Hughes, "Efficient recovery of principal components from compressive measurements with application to Gaussian mixture model estimation," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2332–2336, 2014.

 [20] F. Pourkamali-Anaraki and S. Hughes, "Memory and computation efficient PCA via very sparse random projections," in Proceedings of the 31st International Conference on Machine Learning (ICML), pp. 1341–1349, 2014.