We consider the problem of uncovering the intrinsic low rank component of a high-dimensional dataset. We further focus on data that resides on a certain graph and the data changes smoothly between the connected vertices . In many problems, the underlying graph is unknown or inexact [2, 3, 4]. For example, the graph may be estimated from the input data which is grossly corrupted. We propose an algorithm to estimate the low-rank component of the data, using the graph smoothness assumption to assist the estimation. Our algorithm also simultaneously and iteratively refines the graph to improve the effectiveness of the graph smoothness constraint, thereby increasing the quality of low-rank component estimation.
High-dimensional data is common in many engineering areas such as image / video processing, biomedical imaging, computer networks and transportation networks. We are specifically interested in automatic analysis of brain imaging data. In many cases, the task is to find the spatiotemporal neural signature of a task, by performing classification on cortical activations evoked by different stimuli [5, 6]
. Common brain imaging techniques are Electroencephalography (EEG) and Magnetoencephalography (MEG). These measurements are high-dimensional spatiotemporal data. For instance, in our experiments, we use a recumbent Elekta MEG scanner with 306 sensors to record the brain activity for 1100 milliseconds. Furthermore, the measurements are degraded by various types of noise (e.g., sensor noise, ambient magnetic field noise, etc.) and the noise model is complicated (potentially non-Gaussian). The high-dimensionality and noise limit both the speed and accuracy of the signal analysis, that may result in unreliable signature modeling for classification. The high-dimensionality of these signals also increases the complexity of the classifier.
Note that it has been recognized that there are patterns of anatomical links, or statistical dependencies or causal interactions between distinct units within a nervous system . Some techniques have also been developed to estimate this brain connectivity graph [8, 9]. However, this task is complicated and in many cases the estimated graph may not be accurate. Our contribution is to develop a robust algorithm to determine the reduced dimensionality components that include task-related information, under the assumption that the brain imaging data is graph-smooth but the knowledge of the graph is imperfect. Specifically, our contributions are: (i) based on a model with a low-rank component plus a sparse perturbation, and an initial graph estimation, we propose an algorithm to simultaneously learn the low-rank component and the graph; (ii) we derive the learning steps using ADMM ; (iii) we evaluate the algorithm using synthetic and real brain imaging data in a supervised classification task.
1.1 Related Work
This work is inspired by . While the focus of  is to learn the connectivity graph topology, their algorithm also estimates some noise-free version of the input data as by-product. Gaussian noise model and Frobenius norm optimization are employed in . Therefore, their work is suitable for problem when noise is small. In our work, starting from a model with a low-rank component plus a sparse perturbation, and an initial graph estimation, we adopt the idea of  to incrementally refine the underlying connectivity graph, thereby better low-rank estimation of the data can be obtained. As will be shown in our experiment, our method can perform better with high-dimensional graph data grossly corrupted by complicated noise, such as brain imaging signals. In addition to , learning a graph from smooth signals has attracted a fair amount of interests recently [3, 4]. These works focus on learning the graph, and advanced formulations (e.g., matrix optimization problem) have been derived. Estimation of the brain connectivity graph using a Gaussian noise model has been proposed in . On the other hand, focusing on low-rank estimation, some works have proposed to incorporate spectral graph regularization [12, 13, 14]. Their graphs are fixed in their algorithms, pre-computed from the noisy input data. On the contrary, our algorithm uses the improved low-rank estimation to refine the graph, which in turn is used to improve the quality of the low-rank estimation. Besides, graph signal processing has been applied to a few different brain imaging tasks. In 
, graph Fourier transform is applied to decompose brain signals into low, medium, and high frequency components for analysis of functional brain networks properties.
uses the eigenvectors of the graph Laplacian to approximate the intrinsic subspace of high-dimensional brain imaging data. They experimented different brain connectivity estimations to compute the graph. presents a graph based framework for fMRI brain activation mapping. Graph signal processing has also been shown to be useful in image compression , temperature data , wireless sensor data . A few signal features motivated by graph signal processing have also been proposed [20, 21]. Moreover, several linear / nonlinear dimensionality methods have been proposed that make use of the graph Laplacian of the sample affinity graphs [22, 23, 24] These methods are geometrically motivated, aim to preserve the local structures of the data, and involve different algorithms compared to our work.
2 Simultaneous Low Rank and Graph Estimation
We consider , the high dimensional data matrix that consists of -dimensional data points. For our brain imaging data, are the measurements by the sensors at the time instants, i.e., time series. We assume the data points have low intrinsic dimensionality and lie near some low-dimensional subspace. We assume the following mathematical model for the data:
is the low-rank component of the data matrix that is of primary interest in this paper, and is a perturbation matrix. We assume that can have arbitrarily large magnitude but its support is sparse.
Principal component analysis (PCA) is the most popular technique for determining the low-rank component with application domains ranging from image, video, signal, web content, to network. The classical PCA finds the projection of in a -dimensional () linear space characterized by an orthogonal basis , by solving the following optimization:
The and matrices are known as principal components and projected data points, respectively. is the approximation of the low-rank component. The classical PCA suffers from a few disadvantages. First, it is susceptible to grossly corrupted data in . Second, it does not consider the implicit data manifold information.
Candes et al.  addressed the first issue by designing Robust PCA
, which is robust to outliers by directly recovering the low-rank matrixfrom the grossly corrupted :
denotes the nuclear norm which is used as a convex surrogate of rank.
In this work we propose to extend (3) with an additional graph smoothness regularization, while the underlying graph topology that captures the data correlation could be unknown or inexact (thus some refinement is needed):
Here is the graph Laplacian of the feature graph describing the correlation between individual features: consists of a finite set of vertices , with , a set of edges , and a weighted adjacency matrix , with quantifying the similarity between the -th and -th features of the
-dimensional measurement vectors., with being the diagonal degree matrix. is the set of all valid graph Laplacian :
given : For a given (even a rough estimate), imposes an additional constraint on the underlying (unknown) low-rank data . Specifically,
where is the -th row vector of . Therefore, in (4) forces the row and of to have similar values if is large. Note that in our brain imaging data, individual rows represent the time series captured by sensors. Thus, forces the low-rank representations of the time series to be similar for highly correlated sensors. Prior information regarding measurement correlation (such as the physical distance between the capturing sensors) can be incorporated as the initial to bootstrap the estimation of .
given : On the other hand, for a given estimate of the low-rank data , guides the refinement of and hence the underlying connectivity graph . In particular, a graph that is consistent with the signal variation in is favored: large if row and of have similar values. In many problems, the given graph for a problem can be noisy (e.g., the graph is estimated from the noisy data itself [13, 14]). The proposed formulation iteratively improves using the refined low-rank data. The improved in turn facilitates the low-rank data estimation.
3 Learning Algorithm
We propose to solve the problem in Eq (4) with alternating minimization scheme where, at each step, we fix one or two variables and update the other variable.
At the first step, for a given , we solve the following optimization problem using ADMM with respect to and . It means given a graph, it estimates the low rank matrix:
At the second step, and are fixed and we solve the following optimization problem with respect to , which means that based on the low rank matrix we got in the previous step, it updates the graph.
For equation (8), it can be written as:
We can form the augmented Lagrangian of (9) as:
Then we can get the following formula to update for , and :
where is the Lagrangian parameter and is the Euclidean projection onto set .
4.1 Synthetic Experiment
In this synthetic experiment, we generate low-rank, graph-smooth and grossly-corrupted data. We generate synthetic data with the following model:
where is a low rank matrix with rank and is the sparse matrix.
We generate as a product where and . is also graph-smooth and generated as follows. The (ground-truth) graph consists of
nodes, with each pair of nodes having a probability ofto be connected together. The edge weights between different nodes are drew uniformly from 0 to 1 and presented in a symmetric adjacency matrix . We calculate the Laplacian matrix from
and compute the eigenvectors and eigenvalues of. The eigenvectors, corresponding to top eigenvalues, are selected as the columns of . For matrix , the entries are independently sampled from a distribution. Therefore, is low-rank and graph-smooth. We introduce errors in the matrix
from an i.i.d Bernoulli distribution. Each corrupted entry takes a valuewith a probability .
We compare the proposed method, GL-SigRep, RPCAG, RPCA, and PCA on the data to estimate the low rank matrix and the graph matrix. The estimation accuracies are evaluated by the reconstruction errors: and . All the methods are initialized with the same feature similarity graph (consider each row of as a node) computed using the procedure in  with a K-nearest neighbor strategy ().
Table 1 shows the results on synthetic data generated by the Eq (12) with . With cross-validation, we set . The low rank approximation of proposed method achieves the smallest error. For the estimate graph matrix, the proposed method also achieves the smallest estimation error. Synthetic experiment results show that the proposed method can achieve good performance on extracting low rank approximation and the underlying graph from non-Gaussian noisy data.
|Methods||Low Rank Matrix||Graph Matrix|
4.2 Brain Imaging Data Experiment
We also apply our proposed method on a high-dimensional brain imaging dataset to extract the brain connectivity graph and the low rank approximation from the high dimensionality. This is practically useful for brain imaging studies: due to the high dimensionality of data, low signal-to-noise ratio, and small number of available samples, it is challenging to estimate the low rank approximation in these studies.
The brain imaging dataset used here is a set of magnetoencephalography (MEG) signal recordings of brain activities, in response to two categories of visual stimuli: 320 face images and 192 non-face images. These images were randomly selected and displayed passively with no task, and 16 subjects were asked to simply fixate at the center of the screen. All images were normalized for size and brightness among other factors, and were each displayed once for 300 ms with random inter-stimulus delay intervals of 1000 ms to 1500 ms. We used a recumbent Elekta MEG scanner with 306 sensors to record the brain activity for 1100 milliseconds (100 milliseconds before and 1000 milliseconds after the presentation) for each stimuli. The classification task in this experiment is to distinguish signals evoked by face images from signals evoked by non-face images.
4.2.1 Initial graph matrix
In the proposed method, a suitable starting point is important for solving the optimization problem. We therefore initialize with the brain connectivity matrix generated with the resting state measurements. The resting state in our experiment is 100ms of signal recording before the stimuli presentation. Note that our method and all other methods are initialized with the same connectivity matrix.
Three different types of brain connectivity graphs are commonly used in the literature: structural connectivity, functional connectivity and effective connectivity. Structural connectivity shows the anatomical structure in the brain; functional connectivity quantifies functional dependencies between different brain regions; and effective connectivity shows directed or causal relationship between different brain regions .
In this paper we use a coherence connectivity, a functional connectivity, quantifying oscillatory interdependency between different brain regions . It is the frequency domain analog of the cross-correlation coefficient. Given two series of signals , and a frequency , the first step is to spectrally decompose the signal at target to obtain the instantaneous phase at each time point . After band-pass filtering each signal between , the convolution of with a Morlet wavelet centered at frequency provides the instantaneous phase at time . Thus, the two signals can be represented as: and , where and are amplitudes. and are the phase for and at time t. Then the coherence connectivity edge is calculated as below:
After we get the adjacency matrix , we can calculate the Laplacian matrix as the initialization.
4.2.2 Brain Imaging Data Experiment Results
To evaluate the performance of proposed method, we use a supervised classifier, SVM, to classify the low-rank outputs into face and non-face classes. We compare these methods based on their classification accuracies as well as compatibility of their connectivity graph matrix to the related neuroscience findings on suggested cortical regions involving face processing.
MEG and EEG components corresponding to the face/non-face distinction have been reported at latencies of approximately 100 ms, and more reliably at 170 ms (also known as N170 marker, reported at about 145ms in MEG studies), after visual stimulus onset (e.g. see [28, 29, 30, 31]). In this experiment, we therefore choose the data from two time slots, namely 96ms to 105ms and 141ms to 150ms after the stimuli presentation, to be able to compare our automate assessment with the related neuroscience literature.
We applied commonly employed mean subtraction on the time-locked data. In addition to this step, we refrained from providing any prior information to our method. This is due to the goal of this paper to showcase the capabilities of our proposed method, and therefore chose a pure data driven approach to the problem of estimation of underlying connectivity graph. For example, we did not enforce constraints in the optimization step on the sensor sensitivity to field spread (i.e. nearby electrodes record similar brain activities). Similarly, we did not differentiate between the magnetometers and gradiometers. With no prior information fed to the method along-side the data, the estimated underlying graph can be easily compared with the results of other methods. This is while more specific experiments can be conducted in the future to reveal the effects of prior information on the results.
Given the input data and the initial graph, our proposed method outputs some low rank estimation . We decompose the low rank matrix using SVD, select the components according to the rank of , project the data onto the components to obtain low-dimensionality representations, and classify the reduced dimension data. We compare the proposed method with GL-SigRep, RPCAG, RPCA and PCA. Using cross-validation with , , (for step 2, only the ratio of and matters the results), we set for the first time slot and for the second time slot.Table 2 shows the classification results for different methods on the two time slots. The proposed method gives the best results for both time slots.
For another comparison, in Figure 1, we visualize the estimated graph matrix by our method as well as the GL-SigRep method, by projecting the graph connectivity weights on the MEG sensor locations. The first row of Figure 1 shows the initialization graph used for both method, the coherence connectivity graph obtained from the resting state data. The second row visualizes the output of the two methods for data at 100ms and 145ms time point.
Comparing the graph visualization results from GL-SigRep (Figure 1, b and d), one can see that using the data from 96-105ms, GL-SigRep indicates connectivities at the left temporal and middle and inferior frontal gyri. The estimated graph by GL-SigRep does not significantly change using the data from 141-150ms either. None of these regions high-lighted by GL-SigRep has apparent link with early visual processing described in neuroscientific literature (note that GL-SigRep assumes Gaussian noise and would not be appropriate for MEG data). At 100ms after projection of a visual stimuli like a face image, neuroscientific literature seem to report the visual information is still being processed at early visual cortex at occipital and occipitotemporal regions (e.g. see [29, 31]). Unlike GL-SigRep results, at 96-105ms, the graph connectivity estimation by our method (Figure 1, a) tends to span more on the occipital and left occipitotemporal regions, and therefore seem to have reached a more successful estimation of the true underlying connectivity at this time-point.
The connectivity graph estimated by our method during 141ms to 150ms (Figure 1, c) converges on connections on the right occipitotemporal region. This graph connectivity is comparable to the neuroscientific findings on face perception, and specifically the N170 marker. In several studies such as [28, 30], the fusiform gyrus (at the occipitotemporal region) are suggested for processing face perception during about 145ms after presentation of a face image stimuli, also known as N170 marker (named after its first discovery at 170ms in EEG studies). In this work, our technique reveals almost the same regions as important graph connections for face perception. The compatibility of our estimated graph connectivity with neuroscientific literature further supports our proposed method over others.
We propose an algorithm in learning the low rank component and graph simultaneously. It is suitable for cases where the perturbations on the low rank components are grossly but sparse. We showed that the proposed method on both synthetic data and brain imaging data is competitive. Our method achieves good performance on both low rank approximation and graph estimation. In addition, when applying to the brain imaging data, our method could recover a connectivity graph that is more compatible to the neuroscientific literature, indicating its better estimation of the true underlying graph. Future work applies the proposed algorithm for other high-dimensional data .
-  David Shuman, Sunil K Narang, Pascal Frossard, Antonio Ortega, Pierre Vandergheynst, et al., “The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains,” Signal Processing Magazine, IEEE, vol. 30, no. 3, pp. 83–98, 2013.
-  Xiaowen Dong, Dorina Thanou, Pascal Frossard, and Pierre Vandergheynst, “Laplacian matrix learning for smooth graph signal representation,” in Proc. ICASSP, 2015.
-  Eduardo Pavez and Antonio Ortega, “Generalized laplacian precision matrix estimation for graph signal processing,” in Proc. ICASSP, 2016.
-  Vassilis Kalofolias, “How to learn a graph from smooth signals,” in AISTATS 2016, 2016.
-  Kleovoulos Tsourides, Shahriar Shariat, Hossein Nejati, Tapan K Gandhi, Annie Cardinaux, Christopher T Simons, Ngai-Man Cheung, Vladimir Pavlovic, and Pawan Sinha, “Neural correlates of the food/non-food visual distinction,” Biological Psychology, 2016.
-  H. Nejati, K. Tsourides, V. Pomponiu, E.C. Ehrenberg, Ngai-Man Cheung, and P. Sinha, “Towards perception awareness: Perceptual event detection for brain computer interfaces,” in EMBC2015, Aug 2015, pp. 1480–1483.
-  Ed Bullmore and Olaf Sporns, “Complex brain networks: graph theoretical analysis of structural and functional systems,” Nature Reviews Neuroscience, vol. 10, no. 3, pp. 186–198, 2009.
-  James S Hyde and Andrzej Jesmanowicz, “Cross-correlation: an fmri signal-processing strategy,” NeuroImage, vol. 62, no. 2, pp. 848–851, 2012.
-  Andrea Brovelli, Mingzhou Ding, Anders Ledberg, Yonghong Chen, Richard Nakamura, and Steven L Bressler, “Beta oscillations in a large-scale sensorimotor cortical network: directional influences revealed by granger causality,” Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 26, pp. 9849–9854, 2004.
Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein,
“Distributed optimization and statistical learning via the
alternating direction method of multipliers,”
Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
-  Chenhui Hu, Lin Cheng, Jorge Sepulcre, Georges El Fakhri, Yue M. Lu, and Quanzheng Li, “A graph theoretical regression model for brain connectivity learning of alzheimer’s disease,” in ISBI2013, San Francisco, CA, 7-11 Apr. 2013.
-  Bo Jiang, Chris Ding, Bin Luo, and Jin Tang, “Graph-Laplacian PCA: Closed-form solution and robustness,” in CVPR, 2013, pp. 3490–34968.
Nauman Shahid, Vassilis Kalofolias, Xavier Bresson, Michael Bronstein, and
“Robust principal component analysis on graphs,”
Proceedings of International Conference on Computer Vision, Santiago, Chile, 2015, pp. 2812–2820.
-  Nauman Shahid, Nathanael Perraudin, Vassilis Kalofolias, Gilles Puy, and Pierre Vandergheynst, “Fast robust PCA on graphs,” arXiv:1507.08173v2 [cs.CV] 25 Jan 2016, pp. 1–17, 2016.
-  Weiyu Huang, Leah Goldsberry, Nicholas F Wymbs, Scott T Grafton, Danielle S Bassett, and Alejandro Ribeiro, “Graph frequency analysis of brain signals,” arXiv preprint arXiv:1512.00037v2, 2016.
-  Liu Rui, Hossein Nejati, and Ngai-Man Cheung, “Dimensionality reduction of brain imaging data using graph signal processing,” in ICIP. IEEE, 2016, pp. 1329–1333.
-  Hamid Behjat, Nora Leonardi, Leif Sörnmo, and Dimitri Van De Ville, “Anatomically-adapted graph wavelets for improved group-level fmri activation mapping,” NeuroImage, vol. 123, pp. 185–199, 2015.
-  Wei Hu, Gene Cheung, Antonio Ortega, and Oscar C Au, “Multiresolution graph fourier transform for compression of piecewise smooth images,” Image Processing, IEEE Transactions on, vol. 24, no. 1, pp. 419–433, 2015.
Hilmi E Egilmez and Antonio Ortega,
“Spectral anomaly detection using graph-based filtering for wireless sensor networks,”in ICASSP. IEEE, 2014, pp. 1085–1089.
-  Xiaowen Dong, Antonio Ortega, Pascal Frossard, and Pierre Vandergheynst, “Inference of mobility patterns via spectral graph wavelets,” in ICASSP. IEEE, 2013, pp. 3118–3122.
-  Jieqi Kang, Shan Lu, Weibo Gong, and Patrick A Kelly, in ICIP. IEEE, 2014, pp. 2051–2055.
-  M. Belkin and P. Niyogi, “Laplacian eigenmaps for dimensionality reduction and data representation,” Neural Computation, vol. 15, no. 6, pp. 1373–1396, 2003.
-  Xiaofei He and Partha Niyogi, “Locality preserving projection,” in Proceedings of NIPS, 2003.
-  Shuicheng Yan, Dong Xu, Benyu Zhang, and Stephen Lin, “Graph embedding and extensions: a general framework for dimensionality reduction,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007.
-  Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright, “Robust principal component analysis?,” Journal of the ACM, vol. 58, no. 3, pp. 11:1–11:37, May 2011.
-  Xiaowen Dong, Dorina Thanou, Pascal Frossard, and Pierre Vandergheynst, “Learning laplacian matrix in smooth graph signal representations,” arXiv preprint arXiv:1406.7842v3, 2014.
-  Benjamin T Schmidt, Avniel S Ghuman, and Theodore J Huppert, “Whole brain functional connectivity using phase locking measures of resting state magnetoencephalography,” Front. Neurosci, vol. 8, no. 141, pp. 10–3389, 2014.
David I. Perrett, Amanda J. Mistlin, and Andrew J. Chitty,
“Visual neurones responsive to faces,”Trends in Neurosciences, vol. 10, no. 9, pp. 358 – 364, 1987.
-  S. Thorpe, D. Fize, and C. Marlot, “Speed of processing in the human visual system,” Nature, vol. 381, no. 6582, pp. 520–2, 1996.
-  Jia Liu, Alison Harris, and Nancy Kanwisher, “Stages of processing in face perception: an MEG study.,” Nature neuroscience, vol. 5, no. 9, pp. 910–916, Sept. 2002.
-  R. Desimone, “Face-selective cells in the temporal cortex of monkeys,” Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 1–8, Jan. 2006.
-  Lu Fang, Ngai-Man Cheung, D Tian, A Vetro, H Sun, and O Au, “An analytical model for synthesis distortion estimation in 3d video,” IEEE Transactions on Image Processing, vol. 23, no. 1, pp. 185–199, 2014.