Graph filtering for data reduction and reconstruction

09/25/2018 ∙ by Ioannis D. Schizas, et al.

A novel approach is put forth that uses data similarity, quantified on a graph, to improve upon the reconstruction performance of principal component analysis. The tasks of data dimensionality reduction and reconstruction are formulated as graph filtering operations that exploit data node connectivity in a graph via the adjacency matrix. The unknown reducing and reconstruction filters are determined by optimizing a mean-square error cost that involves the data as well as their graph adjacency matrix. Working in the graph spectral domain enables the derivation of simple gradient descent recursions used to update the matrix filter taps. Numerical tests on real image datasets demonstrate that the novel method achieves better reconstruction performance than standard principal component analysis.


1 Introduction

Data dimensionality reduction and reconstruction have been studied extensively, with the workhorse approach being the principal component analysis (PCA) framework, which determines compression and reconstruction matrices that minimize the mean-square error (MSE); see, e.g., [4]. Standard PCA relies on correlations within each data vector to find an MSE-optimal data representation in a reduced-dimensional space. Our goal here is to exploit similarity among different data vectors, manifested as edge weights on a graph, when performing dimensionality reduction, in order to improve data reconstruction performance.

Graph signal processing is an emerging field in which similarity among the available data is exploited via shift operators to improve performance in a variety of tasks, including filtering, clustering, and sampling/reconstruction [2, 12, 16]. Sampling a graph signal on a subset of nodes and reconstructing it wherever it is not available has been extensively explored [2, 6, 11, 13, 17]. In these works, the notion of bandlimited signals is extended to the graph spectral domain, and techniques exploiting the Laplacian eigenspace are devised to reconstruct the signal values at every node of the graph from a subset of nodes.

Graph-aware dimensionality reduction has been proposed by augmenting the PCA or nonnegative matrix factorization formulations with a Laplacian regularization term that accounts for similarity among single-hop neighboring data entities in a graph [8, 5, 7, 15, 14]. In this line of work, dimensionality reduction is performed to improve data clustering performance. In contrast, our goal here is data dimensionality reduction and reconstruction that exploits data similarity, quantified by the graph adjacency matrix.

The tasks of data dimensionality reduction and reconstruction are carried out via graph filtering, where the order of the matrix filters determines the size of the neighborhood used to form the compressed and reconstructed data. The novel formulation seeks MSE-optimal filter matrices that minimize the reconstruction MSE on the graph. A computationally efficient gradient descent approach is proposed to recursively determine the filters. For zero-order filters, the novel framework reduces to standard PCA. Numerical tests using real image datasets demonstrate that the novel graph-based dimensionality reduction and reconstruction framework attains lower reconstruction MSE than standard PCA.

2 Problem Setting and Preliminaries

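Consider a collection of data $\mathbf X = [\mathbf x_1,\ldots,\mathbf x_N]$, where each data vector $\mathbf x_n$ has $p$ scalar entries. Columns of $\mathbf X$ could correspond to a collection of images, sensor measurements, and so on [9, 5]. In many practical applications, the data vectors lie in a low-dimensional vector subspace $\mathcal S \subset \mathbb R^p$, where $\dim(\mathcal S) < p$.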

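One of the most effective ways to reduce data dimensionality is principal component analysis (PCA); see, e.g., [4]. PCA, the dimensionality reduction workhorse, extracts the principal components by projecting the data onto the low-dimensional vector subspace in which the data exhibit the largest variability. Specifically, PCA determines a dimensionality-reducing matrix $\mathbf C$ of size $r\times p$, with $r < p$, and a reconstruction matrix $\mathbf B$ of size $p\times r$, which are found by minimizing the reconstruction MSE

$$\min_{\mathbf B,\mathbf C}\ \big\|\check{\mathbf X} - \mathbf B\mathbf C\check{\mathbf X}\big\|_F^2, \qquad (1)$$

where $\check{\mathbf X} = [\check{\mathbf x}_1,\ldots,\check{\mathbf x}_N]$ corresponds to a centered version of the data, with $\check{\mathbf x}_n = \mathbf x_n - \mathbf m$ and $\mathbf m = N^{-1}\sum_{n=1}^{N}\mathbf x_n$ for $n = 1,\ldots,N$, and $\|\cdot\|_F$ denotes the Frobenius norm. It turns out that $\mathbf B = \mathbf U_r$ and $\mathbf C = \mathbf U_r^T$, where $\mathbf U_r$ contains in its columns the $r$ principal eigenvectors of the sample-average covariance matrix $\boldsymbol\Sigma_x = N^{-1}\check{\mathbf X}\check{\mathbf X}^T$.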
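
A minimal NumPy sketch of the PCA solution to (1), assuming the data are stored column-wise in a p × N array. The helper names `pca_fit` and `pca_mse` are hypothetical, and the toy data are random; the sketch only illustrates that the optimal cost equals N times the sum of the discarded covariance eigenvalues.

```python
import numpy as np

def pca_fit(X, r):
    """PCA on a p x N matrix X whose columns are the data vectors x_n.

    Returns (B, C, m): reconstruction matrix B = U_r (p x r), reducing
    matrix C = U_r^T (r x p), and the sample mean m (p x 1)."""
    m = X.mean(axis=1, keepdims=True)          # sample mean
    Xc = X - m                                 # centered data, columns x_n - m
    Sigma = Xc @ Xc.T / X.shape[1]             # sample-average covariance
    _, evecs = np.linalg.eigh(Sigma)           # eigenvalues in ascending order
    U_r = evecs[:, ::-1][:, :r]                # r principal eigenvectors
    return U_r, U_r.T, m

def pca_mse(X, B, C, m):
    """Reconstruction cost ||Xc - B C Xc||_F^2 from (1)."""
    Xc = X - m
    return np.linalg.norm(Xc - B @ C @ Xc, "fro") ** 2

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))              # toy data: p = 20, N = 50
B, C, m = pca_fit(X, r=5)
err = pca_mse(X, B, C, m)
```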


PCA is designed to estimate the low-dimensional subspace $\mathcal S$ using $\mathbf U_r$, without taking into account similarity among different data vectors. However, the dataset may contain groups of data vectors that are similar in some sense, e.g., images depicting a similar object or having similar texture. Standard PCA ignores such similarity information, which could identify structurally similar data and lead to better reconstruction.

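Data similarity measures, if available, can be summarized on a graph. Specifically, let the scalar $w_{nn'}$ quantify the similarity between data vectors $\mathbf x_n$ and $\mathbf x_{n'}$ for $n, n' = 1,\ldots,N$. Then, an undirected graph $\mathcal G = (\mathcal V,\mathcal E)$ with $N$ nodes in the set $\mathcal V$ and edges in $\mathcal E$ can summarize the similarity among the different data in $\mathbf X$. Since the graph is undirected, $w_{nn'} = w_{n'n}$. The similarity weights are collected in the so-called adjacency matrix $\mathbf W$, an $N\times N$ symmetric matrix with $(n,n')$th entry $w_{nn'}$, whose eigenvalue decomposition can be written as $\mathbf W = \mathbf V\boldsymbol\Lambda\mathbf V^T$, where $\boldsymbol\Lambda = \mathrm{diag}(\lambda_1,\ldots,\lambda_N)$ is a diagonal matrix that contains the eigenvalues, while $\mathbf V$ is a unitary matrix containing the eigenvectors of $\mathbf W$.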
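
The decomposition can be checked numerically. The sketch below builds a hypothetical random symmetric adjacency matrix with zero diagonal (no real similarity data), eigendecomposes it with `numpy.linalg.eigh`, and confirms that powers of the adjacency matrix share its eigenvectors, i.e., $\mathbf W^l = \mathbf V\boldsymbol\Lambda^l\mathbf V^T$, the property exploited in the spectral reformulation of Section 3.1.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
R = rng.random((N, N))
W = np.triu(R, k=1)
W = W + W.T                                    # symmetric weights w_nn' = w_n'n, no self-loops

lam, V = np.linalg.eigh(W)                     # W = V diag(lam) V^T, V orthogonal
W_cubed = np.linalg.matrix_power(W, 3)
W_cubed_spectral = V @ np.diag(lam ** 3) @ V.T # W^l = V Lambda^l V^T with l = 3
```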


In this work, PCA is redesigned to exploit the data similarities summarized in the adjacency matrix $\mathbf W$ via graph filtering, in order to improve reconstruction performance.

3 Data Reduction and Reconstruction via Graph Filtering

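To exploit the similarity weights on the graph edges, we use graph filtering (GF) [12, 13, 16]. A scalar linear shift-invariant graph filter of order $L$ is given as (see, e.g., [12, 13, 16])

$$\mathbf H = \sum_{l=0}^{L} h_l\,\mathbf S^l, \qquad (2)$$

where $\mathbf S$ denotes a graph shift operator, which in this paper is the adjacency matrix $\mathbf W$. Building upon (2), we define the following data-reducing graph matrix filtering operation

$$\mathbf z = \sum_{l=0}^{L}(\mathbf I_N\otimes\mathbf C_l)(\mathbf W^l\otimes\mathbf I_p)\,\check{\mathbf x}, \qquad (3)$$

where $\check{\mathbf x} = \mathrm{vec}(\check{\mathbf X})$ is obtained by stacking the columns of $\check{\mathbf X}$ on top of each other, $\mathbf I_N$ and $\mathbf I_p$ refer to identity matrices of size $N\times N$ and $p\times p$, and $\otimes$ is the Kronecker product.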
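
As a concrete instance of the scalar filter (2), the sketch below evaluates $\mathbf H = \sum_l h_l\mathbf S^l$ by accumulating powers of the shift operator. The taps `h` and the random symmetric shift `S` are hypothetical values chosen for illustration; the sketch also shows that in the eigenbasis of $\mathbf S$ the filter acts as the scalar polynomial $\sum_l h_l\lambda^l$ at each eigenvalue.

```python
import numpy as np

def graph_filter(S, h):
    """Return H = sum_{l=0}^{L} h[l] * S^l for shift operator S and taps h."""
    N = S.shape[0]
    H = np.zeros((N, N))
    S_pow = np.eye(N)                          # S^0
    for h_l in h:
        H += h_l * S_pow
        S_pow = S_pow @ S                      # advance to the next power of S
    return H

rng = np.random.default_rng(2)
N = 6
R = rng.random((N, N))
S = np.triu(R, k=1) + np.triu(R, k=1).T        # symmetric toy adjacency used as S
h = [0.5, -1.0, 0.25]                          # hypothetical taps, order L = 2
H = graph_filter(S, h)
y = H @ rng.standard_normal(N)                 # filtered graph signal
```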

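Vector $\mathbf z = [\mathbf z_1^T,\ldots,\mathbf z_N^T]^T$ contains the reduced-dimensionality data vectors $\mathbf z_n\in\mathbb R^r$, one for each node $n$, where each $\mathbf z_n = \sum_{l=0}^{L}\mathbf C_l\sum_{n'=1}^{N}[\mathbf W^l]_{nn'}\check{\mathbf x}_{n'}$ is produced by compressing and linearly combining data vectors from neighboring nodes (up to $L$ hops away from node $n$) using the dimensionality-reducing matrices $\mathbf C_l$ for $l = 0,\ldots,L$. The motivation behind this reducing filtering step is that data vectors within a neighborhood of a few hops are expected to be highly similar, so they can be used jointly to better reduce the contents of $\check{\mathbf x}_n$. Note that for $L = 0$, (3) boils down to $\mathbf z_n = \mathbf C_0\check{\mathbf x}_n$, which pertains to standard PCA.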
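
A minimal sketch of the reducing filter (3), assuming column-stacking vectorization (NumPy `order="F"`). It compares the literal Kronecker form with the equivalent matrix form $\mathbf Z = \sum_l \mathbf C_l\check{\mathbf X}\mathbf W^l$, which holds because $\mathbf W$ is symmetric; the function names, sizes, and random taps are hypothetical.

```python
import numpy as np

def reduce_kron(Xc, W, C_taps):
    """Reducing filter (3): sum_l (I_N kron C_l)(W^l kron I_p) vec(Xc)."""
    p, N = Xc.shape
    x = Xc.reshape(-1, order="F")              # vec(Xc): stack the columns x_1, ..., x_N
    z = np.zeros(N * C_taps[0].shape[0])
    W_pow = np.eye(N)
    for C_l in C_taps:
        z += np.kron(np.eye(N), C_l) @ np.kron(W_pow, np.eye(p)) @ x
        W_pow = W_pow @ W
    return z

def reduce_matrix(Xc, W, C_taps):
    """Same operation without Kronecker products: Z = sum_l C_l Xc W^l (r x N)."""
    Z = np.zeros((C_taps[0].shape[0], Xc.shape[1]))
    W_pow = np.eye(W.shape[0])
    for C_l in C_taps:
        Z += C_l @ Xc @ W_pow                  # column n mixes x_n' up to l hops away
        W_pow = W_pow @ W
    return Z

rng = np.random.default_rng(3)
p, N, r, L = 6, 5, 2, 2
Xc = rng.standard_normal((p, N))
R = rng.random((N, N))
W = np.triu(R, k=1) + np.triu(R, k=1).T
C_taps = [rng.standard_normal((r, p)) for _ in range(L + 1)]
z = reduce_kron(Xc, W, C_taps)
Z = reduce_matrix(Xc, W, C_taps)
```

The matrix form avoids building $Np\times Np$ Kronecker matrices, which matters for image-sized $p$.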

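Similarly, graph filtering can be used as in (3) to reconstruct the data vectors from the reduced vectors $\mathbf z$, the adjacency matrix $\mathbf W$, and reconstruction matrices $\mathbf B_l$ of size $p\times r$, as follows

$$\hat{\mathbf x} = \sum_{l=0}^{L}(\mathbf I_N\otimes\mathbf B_l)(\mathbf W^l\otimes\mathbf I_r)\,\mathbf z, \qquad (4)$$

where $\mathbf I_r$ is the $r\times r$ identity matrix. The dimensionality-reducing and reconstruction matrices are determined such that the reconstruction MSE resulting from applying (3) and (4) is minimized, i.e.,

$$\min_{\{\mathbf B_l,\mathbf C_l\}_{l=0}^{L}}\ \big\|\check{\mathbf x} - \hat{\mathbf x}\big\|_2^2, \quad \text{with } \hat{\mathbf x} \text{ given by (4) and } \mathbf z \text{ by (3)}. \qquad (5)$$

For simplicity, the reducing and reconstruction filters are assumed to have the same order $L$; nonetheless, the proposed framework allows for different orders. Note that for $L = 0$ the cost function in (5) boils down to $\|\check{\mathbf X} - \mathbf B_0\mathbf C_0\check{\mathbf X}\|_F^2$, which corresponds to the standard PCA formulation (1) that does not take into account data similarity information.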
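
The cost (5) can be evaluated in matrix form as $\|\check{\mathbf X} - \hat{\mathbf X}\|_F^2$ with $\mathbf Z = \sum_l\mathbf C_l\check{\mathbf X}\mathbf W^l$ and $\hat{\mathbf X} = \sum_l\mathbf B_l\mathbf Z\mathbf W^l$. The sketch below (hypothetical helper names, random data) computes it and illustrates the statement that $L = 0$, or zero higher-order taps, gives back the PCA cost.

```python
import numpy as np

def matrix_graph_filter(M, W, taps):
    """Return sum_l taps[l] @ M @ W^l, a matrix-tap graph filter in matrix form."""
    out = np.zeros((taps[0].shape[0], M.shape[1]))
    W_pow = np.eye(W.shape[0])
    for T in taps:
        out += T @ M @ W_pow
        W_pow = W_pow @ W
    return out

def graph_reconstruction_mse(Xc, W, B_taps, C_taps):
    """Cost (5) in matrix form: ||Xc - Xhat||_F^2."""
    Z = matrix_graph_filter(Xc, W, C_taps)       # reducing filter, cf. (3)
    X_hat = matrix_graph_filter(Z, W, B_taps)    # reconstruction filter, cf. (4)
    return np.linalg.norm(Xc - X_hat, "fro") ** 2

rng = np.random.default_rng(4)
p, N, r = 8, 6, 3
Xc = rng.standard_normal((p, N))
R = rng.random((N, N))
W = np.triu(R, k=1) + np.triu(R, k=1).T
B0, C0 = rng.standard_normal((p, r)), rng.standard_normal((r, p))
mse_L0 = graph_reconstruction_mse(Xc, W, [B0], [C0])
```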

3.1 Graph Spectrum MSE Reformulation

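The cost function in (5) is reformulated next to facilitate the determination of the matrix filter taps $\{\mathbf B_l,\mathbf C_l\}_{l=0}^{L}$. Multiplying $\check{\mathbf x}$ and $\hat{\mathbf x}$ in (5) by the unitary matrix $\mathbf V^T\otimes\mathbf I_p$ has no effect on the cost, i.e., $\|\check{\mathbf x}-\hat{\mathbf x}\|_2^2 = \|(\mathbf V^T\otimes\mathbf I_p)(\check{\mathbf x}-\hat{\mathbf x})\|_2^2$. Let $\tilde{\mathbf X} = \check{\mathbf X}\mathbf V = [\tilde{\mathbf x}_1,\ldots,\tilde{\mathbf x}_N]$ denote the graph Fourier transform (GFT) of the data $\check{\mathbf X}$ with respect to the adjacency matrix $\mathbf W$. In detail, the GFT at the $i$th frequency (the $i$th eigenvalue $\lambda_i$ of $\mathbf W$) is given as $\tilde{\mathbf x}_i = \sum_{n=1}^{N}v_{n,i}\check{\mathbf x}_n$, where $v_{n,i}$ corresponds to the $(n,i)$th entry of $\mathbf V$. After the unitary transformation of the reconstruction MSE, and using the property $\mathbf W^l = \mathbf V\boldsymbol\Lambda^l\mathbf V^T$ for $l = 0,1,\ldots$, the minimization problem in (5) can be rewritten as

$$\min_{\{\mathbf B_l,\mathbf C_l\}_{l=0}^{L}}\ \sum_{i=1}^{N}\big\|\tilde{\mathbf x}_i - \mathbf H_B(\lambda_i)\mathbf H_C(\lambda_i)\tilde{\mathbf x}_i\big\|_2^2, \qquad (6)$$

where $\mathbf H_B(\lambda_i) = \sum_{l=0}^{L}\lambda_i^l\mathbf B_l$, while $\mathbf H_C(\lambda_i) = \sum_{l=0}^{L}\lambda_i^l\mathbf C_l$. Thus, (6) can be viewed as a spectral version of (5), in which graph convolution has been transformed into a multiplication between the filters' spectral responses and the GFT of the data vectors. Note that $\mathbf H_B(\lambda_i)$ can be viewed as the spectral response of the reconstruction matrix filter at eigenvalue $\lambda_i$; similarly, $\mathbf H_C(\lambda_i)$ corresponds to the spectral response of the reducing matrix filter at $\lambda_i$.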
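
A numerical sanity check of the spectral reformulation, assuming a symmetric $\mathbf W$ so that `numpy.linalg.eigh` returns an orthonormal eigenbasis: the vertex-domain cost (5) and the per-frequency cost (6) coincide for arbitrary taps. The helper names and random taps are hypothetical.

```python
import numpy as np

def spatial_cost(Xc, W, B_taps, C_taps):
    """Cost (5): Xhat = sum_l B_l (sum_l' C_l' Xc W^l') W^l in the vertex domain."""
    def filt(M, taps):
        out, W_pow = 0.0, np.eye(W.shape[0])
        for T in taps:
            out = out + T @ M @ W_pow
            W_pow = W_pow @ W
        return out
    return np.linalg.norm(Xc - filt(filt(Xc, C_taps), B_taps), "fro") ** 2

def spectral_cost(Xc, W, B_taps, C_taps):
    """Cost (6): H_B(lambda_i) H_C(lambda_i) acting on each GFT vector."""
    lam, V = np.linalg.eigh(W)
    Xt = Xc @ V                                    # column i is the GFT vector x~_i
    total = 0.0
    for i, lam_i in enumerate(lam):
        HB = sum(lam_i ** l * B_l for l, B_l in enumerate(B_taps))
        HC = sum(lam_i ** l * C_l for l, C_l in enumerate(C_taps))
        e = Xt[:, i] - HB @ HC @ Xt[:, i]
        total += e @ e
    return total

rng = np.random.default_rng(5)
p, N, r, L = 7, 6, 2, 2
Xc = rng.standard_normal((p, N))
R = rng.random((N, N))
W = np.triu(R, k=1) + np.triu(R, k=1).T
B_taps = [rng.standard_normal((p, r)) for _ in range(L + 1)]
C_taps = [rng.standard_normal((r, p)) for _ in range(L + 1)]
```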

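The cost function in (6) can be rewritten as follows

$$J(\mathbf B,\mathbf C) = \sum_{i=1}^{N}\big\|\tilde{\mathbf x}_i - \mathbf B\,\mathbf D_i\,\mathbf C\,\tilde{\mathbf x}_i\big\|_2^2, \qquad (7)$$

where $\mathbf B = [\mathbf B_0,\ldots,\mathbf B_L]$ and $\mathbf C = [\mathbf C_0^T,\ldots,\mathbf C_L^T]^T$, while $\mathbf D_i = (\boldsymbol\ell_i\boldsymbol\ell_i^T)\otimes\mathbf I_r$ with $\boldsymbol\ell_i = [1,\lambda_i,\ldots,\lambda_i^L]^T$.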

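Taking first-order derivatives of (7) with respect to (wrt) $\mathbf B$ and $\mathbf C$ and setting them equal to zero, we obtain the following first-order optimality conditions [3]

$$\sum_{i=1}^{N}\mathbf e_i\tilde{\mathbf x}_i^T\mathbf C^T\mathbf D_i = \mathbf 0, \qquad \sum_{i=1}^{N}\mathbf D_i\mathbf B^T\mathbf e_i\tilde{\mathbf x}_i^T = \mathbf 0, \qquad (8)$$

where $\mathbf e_i = \tilde{\mathbf x}_i - \mathbf B\mathbf D_i\mathbf C\tilde{\mathbf x}_i$ denotes the reconstruction residual at graph frequency $\lambda_i$.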
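
The gradients behind (8) can be checked numerically. Writing $\mathbf D_i = (\boldsymbol\ell_i\boldsymbol\ell_i^T)\otimes\mathbf I_r$ with $\boldsymbol\ell_i = [1,\lambda_i,\ldots,\lambda_i^L]^T$ and residuals $\mathbf e_i = \tilde{\mathbf x}_i - \mathbf B\mathbf D_i\mathbf C\tilde{\mathbf x}_i$, differentiating (7) gives $\nabla_{\mathbf B}J = -2\sum_i\mathbf e_i\tilde{\mathbf x}_i^T\mathbf C^T\mathbf D_i$ and $\nabla_{\mathbf C}J = -2\sum_i\mathbf D_i\mathbf B^T\mathbf e_i\tilde{\mathbf x}_i^T$. The sketch below (hypothetical helper names; random placeholders for the GFT vectors and eigenvalues) computes both.

```python
import numpy as np

def spectral_D(lam, L, r):
    """D_i = (ell_i ell_i^T) kron I_r, with ell_i = [1, lam_i, ..., lam_i^L]."""
    Ds = []
    for lam_i in lam:
        ell = lam_i ** np.arange(L + 1)
        Ds.append(np.kron(np.outer(ell, ell), np.eye(r)))
    return Ds

def cost_and_grads(B, C, Xt, Ds):
    """Cost (7) with stacked taps B (p x r(L+1)), C (r(L+1) x p), and its gradients."""
    J = 0.0
    gB = np.zeros_like(B)
    gC = np.zeros_like(C)
    for i, D in enumerate(Ds):
        x = Xt[:, i]
        e = x - B @ D @ C @ x                      # residual e_i at graph frequency i
        J += e @ e
        gB -= 2.0 * np.outer(e, D @ C @ x)         # -2 e_i x_i^T C^T D_i
        gC -= 2.0 * D @ np.outer(B.T @ e, x)       # -2 D_i B^T e_i x_i^T
    return J, gB, gC

rng = np.random.default_rng(6)
p, N, r, L = 7, 5, 2, 1
Xt = rng.standard_normal((p, N))                   # placeholder GFT vectors x~_i
lam = rng.uniform(-1.0, 1.0, N)                    # placeholder eigenvalues of W
Ds = spectral_D(lam, L, r)
B = rng.standard_normal((p, r * (L + 1)))
C = rng.standard_normal((r * (L + 1), p))
J, gB, gC = cost_and_grads(B, C, Xt, Ds)
```

Setting `gB` and `gC` to zero recovers the two conditions in (8).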


The equalities in (8) can be used to show the following result (the proof is omitted due to space considerations).

Corollary 1

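The reducing matrix filter taps in $\mathbf C$ can be written as a linear combination of the transformed data vectors $\{\tilde{\mathbf x}_i\}_{i=1}^{N}$, i.e.,

$$\mathbf C = \mathbf A\tilde{\mathbf X}^T, \qquad (9)$$

where $\tilde{\mathbf X} = [\tilde{\mathbf x}_1,\ldots,\tilde{\mathbf x}_N]$, while $\mathbf A$ is an $r(L+1)\times N$ coefficient matrix.

The result of Corollary 1 can be used to replace $\mathbf C$ with $\mathbf A\tilde{\mathbf X}^T$ in (6), thereby reducing the number of primary optimization variables. Note that $\mathbf C$ contains $r(L+1)p$ entries that need to be found, whereas $\mathbf A$ has $r(L+1)N$ entries. For applications where $N < p$, Corollary 1 can therefore introduce computational savings when solving (6).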
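
The following sketch illustrates why the substitution in Corollary 1 loses nothing; it is an illustration, not the omitted proof. The cost (7) depends on $\mathbf C$ only through the products $\mathbf C\tilde{\mathbf x}_i$, so projecting the rows of $\mathbf C$ onto $\mathrm{span}\{\tilde{\mathbf x}_i\}$, which yields a matrix of the form $\mathbf A\tilde{\mathbf X}^T$, leaves the cost unchanged. It assumes $\tilde{\mathbf X}$ has full column rank (random data with $N < p$); centered data would call for a pseudo-inverse instead.

```python
import numpy as np

rng = np.random.default_rng(7)
p, N, r, L = 12, 6, 2, 2
Xt = rng.standard_normal((p, N))           # toy GFT vectors x~_i (full column rank, N < p)
lam = rng.uniform(-1.0, 1.0, N)            # toy eigenvalues lambda_i

def cost(B, C):
    """Spectral cost (7) with stacked taps B = [B_0,...,B_L], C = [C_0; ...; C_L]."""
    total = 0.0
    for i, lam_i in enumerate(lam):
        ell = lam_i ** np.arange(L + 1)
        D = np.kron(np.outer(ell, ell), np.eye(r))
        e = Xt[:, i] - B @ D @ C @ Xt[:, i]
        total += e @ e
    return total

B = rng.standard_normal((p, r * (L + 1)))
C = rng.standard_normal((r * (L + 1), p))

# A = C Xt (Xt^T Xt)^{-1}, so that A Xt^T projects each row of C onto span{x~_i}.
A = np.linalg.solve(Xt.T @ Xt, Xt.T @ C.T).T
C_span = A @ Xt.T

params_C, params_A = C.size, A.size        # r(L+1)p versus r(L+1)N unknowns
```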

3.2 Gradient Descent Based Algorithm

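We resort to gradient descent to devise a computationally simpler method for minimizing the cost in (7), after substituting $\mathbf C = \mathbf A\tilde{\mathbf X}^T$ from Corollary 1. Specifically, during iteration $k$ the gradient descent updates [3] for $\mathbf B$ and $\mathbf A$ are given as

$$\mathbf B^{(k+1)} = \mathbf B^{(k)} - \mu_B^{(k)}\nabla_{\mathbf B}J^{(k)}, \qquad \mathbf A^{(k+1)} = \mathbf A^{(k)} - \mu_A^{(k)}\nabla_{\mathbf A}J^{(k)}, \qquad (10)$$

where $\mu_B^{(k)},\mu_A^{(k)}\ge 0$ are step sizes to be determined by line search later on, and $\nabla_{\mathbf B}J^{(k)}$, $\nabla_{\mathbf A}J^{(k)}$ are the gradients of the cost function in (7) wrt $\mathbf B$ and $\mathbf A$, respectively, evaluated at the current iterates. Differentiating (7) wrt $\mathbf B$ gives

$$\nabla_{\mathbf B}J = -2\sum_{i=1}^{N}\mathbf e_i\,\mathbf g_i^T\mathbf A^T\mathbf D_i, \qquad (11)$$

where $\mathbf e_i = \tilde{\mathbf x}_i - \mathbf B\mathbf D_i\mathbf A\mathbf g_i$ and $\mathbf g_i$ denotes the $i$th column of $\mathbf G = \tilde{\mathbf X}^T\tilde{\mathbf X}$, i.e., $\mathbf g_i = \tilde{\mathbf X}^T\tilde{\mathbf x}_i$, so that $\mathbf C\tilde{\mathbf x}_i = \mathbf A\mathbf g_i$.

Similarly, the gradient wrt $\mathbf A$ can be calculated as

$$\nabla_{\mathbf A}J = -2\sum_{i=1}^{N}\mathbf D_i\mathbf B^T\mathbf e_i\,\mathbf g_i^T.$$

From (10) and the gradient expressions above, each submatrix $\mathbf B_l$ in $\mathbf B$ for $l = 0,\ldots,L$ can be updated as

$$\mathbf B_l^{(k+1)} = \mathbf B_l^{(k)} + 2\mu_B^{(k)}\sum_{i=1}^{N}\lambda_i^l\,\mathbf e_i\tilde{\mathbf x}_i^T\mathbf H_C^T(\lambda_i), \qquad (12)$$

whereas $\mathbf A$ is found as

$$\mathbf A^{(k+1)} = \mathbf A^{(k)} + 2\mu_A^{(k)}\sum_{i=1}^{N}\mathbf D_i\mathbf B^T\mathbf e_i\,\mathbf g_i^T. \qquad (13)$$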
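
A sketch that checks the two gradient pieces used in the recursions: the gradient wrt $\mathbf A$ after substituting $\mathbf C = \mathbf A\tilde{\mathbf X}^T$ (so $\mathbf C\tilde{\mathbf x}_i = \mathbf A\mathbf g_i$ with $\mathbf g_i$ the $i$th column of the Gram matrix $\tilde{\mathbf X}^T\tilde{\mathbf X}$), and the per-submatrix form $-2\sum_i\lambda_i^l\mathbf e_i\tilde{\mathbf x}_i^T\mathbf H_C^T(\lambda_i)$ of the gradient wrt $\mathbf B_l$. All data are random placeholders and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(8)
p, N, r, L = 9, 5, 2, 2
Xt = rng.standard_normal((p, N))           # toy GFT vectors, columns x~_i
lam = rng.uniform(-1.0, 1.0, N)            # toy eigenvalues lambda_i
G = Xt.T @ Xt                              # Gram matrix; column i is g_i = Xt^T x~_i
ells = [lam_i ** np.arange(L + 1) for lam_i in lam]
Ds = [np.kron(np.outer(ell, ell), np.eye(r)) for ell in ells]

def cost_A(B, A):
    """Cost (7) after substituting C = A Xt^T, so that C x~_i = A g_i."""
    return sum(np.sum((Xt[:, i] - B @ Ds[i] @ A @ G[:, i]) ** 2) for i in range(N))

def grads_A(B, A):
    """Gradients wrt B and A, with residuals e_i = x~_i - B D_i A g_i."""
    gB, gA = np.zeros_like(B), np.zeros_like(A)
    for i in range(N):
        e = Xt[:, i] - B @ Ds[i] @ A @ G[:, i]
        gB -= 2.0 * np.outer(e, Ds[i] @ A @ G[:, i])    # -2 e_i g_i^T A^T D_i
        gA -= 2.0 * Ds[i] @ np.outer(B.T @ e, G[:, i])  # -2 D_i B^T e_i g_i^T
    return gB, gA

B = rng.standard_normal((p, r * (L + 1)))
A = 0.1 * rng.standard_normal((r * (L + 1), N))
gB, gA = grads_A(B, A)

# Block l of the B-gradient written per submatrix, as used in the B_l update
C = A @ Xt.T
l = 1
gB_l = np.zeros((p, r))
for i in range(N):
    e = Xt[:, i] - B @ Ds[i] @ C @ Xt[:, i]
    HC = sum(lam[i] ** k * C[k * r:(k + 1) * r, :] for k in range(L + 1))  # H_C(lambda_i)
    gB_l -= 2.0 * lam[i] ** l * np.outer(e, HC @ Xt[:, i])
```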


The computational complexity (number of additions and multiplications) of the gradient descent recursions in (12) and (13) is proportional to the dimensionality $p$ of the data vectors and the filter order $L$, and quadratic in the number of data vectors $N$.

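Optimal step-size selection: We resort to line search (see, e.g., [3]), where the step sizes $\mu_B^{(k)}$ and $\mu_A^{(k)}$ are chosen to minimize the cost function in (6) after substituting $\mathbf B$ and $\mathbf A$ with the update recursions in (12) and (13), respectively. We demonstrate the process for $\mu_B^{(k)}$. After substituting $\mathbf B$ in (7) with the right-hand side of (12), it turns out that the optimal choice for $\mu_B$ during iteration $k$ can be obtained as

$$\mu_B^{(k)} = \arg\min_{\mu\ge 0}\ \sum_{i=1}^{N}\big\|\mathbf e_i + \mu\,\mathbf q_i\big\|_2^2, \qquad (14)$$

where $\mathbf q_i = \nabla_{\mathbf B}J^{(k)}\mathbf D_i\mathbf A^{(k)}\mathbf g_i$ with $\mathbf e_i$ as in (11). Further, the quantities $a$ and $b$ are

$$a = \sum_{i=1}^{N}\|\mathbf q_i\|_2^2, \qquad (15)$$

$$b = \sum_{i=1}^{N}\mathbf e_i^T\mathbf q_i. \qquad (16)$$

Then, it follows readily that the optimal step size in (12) is equal to $\mu_B^{(k)} = -b/a$, which is nonnegative because $b = -\tfrac12\|\nabla_{\mathbf B}J^{(k)}\|_F^2$.

Using a similar approach, where we substitute $\mathbf C$ with $\mathbf A\tilde{\mathbf X}^T$ and then replace $\mathbf A$ with the right-hand side of (13) in (7), we find the optimal step size as $\mu_A^{(k)} = -b'/a'$, where

$$a' = \sum_{i=1}^{N}\|\mathbf q_i'\|_2^2, \qquad b' = \sum_{i=1}^{N}\mathbf e_i^T\mathbf q_i', \qquad \mathbf q_i' = \mathbf B\mathbf D_i\nabla_{\mathbf A}J^{(k)}\mathbf g_i. \qquad (17)$$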
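
Since the residuals are affine in the step size, the line search reduces to minimizing a scalar quadratic: with $a = \sum_i\|\mathbf q_i\|^2$ and $b = \sum_i\mathbf e_i^T\mathbf q_i$, the minimizer of $\sum_i\|\mathbf e_i + \mu\mathbf q_i\|^2$ is $\mu = -b/a$. The sketch below performs one such exact step for $\mathbf B$; as a simplification it keeps the stacked $\mathbf C$ directly (equivalent to using $\mathbf A\mathbf g_i$, since $\mathbf C\tilde{\mathbf x}_i = \mathbf A\mathbf g_i$), and all data and names are hypothetical.

```python
import numpy as np

def exact_step(E, Q):
    """argmin_mu sum_i ||e_i + mu q_i||^2, with e_i, q_i the columns of E, Q.

    With a = sum ||q_i||^2 and b = sum e_i^T q_i the minimizer is -b / a."""
    a = np.sum(Q * Q)
    b = np.sum(E * Q)
    return 0.0 if a == 0.0 else -b / a

rng = np.random.default_rng(9)
p, N, r, L = 6, 5, 2, 1
Xt = rng.standard_normal((p, N))
lam = rng.uniform(-1.0, 1.0, N)
Ds = [np.kron(np.outer(li ** np.arange(L + 1), li ** np.arange(L + 1)), np.eye(r)) for li in lam]
B = rng.standard_normal((p, r * (L + 1)))
C = rng.standard_normal((r * (L + 1), p))

def residuals(Bm):
    return np.column_stack([Xt[:, i] - Bm @ Ds[i] @ C @ Xt[:, i] for i in range(N)])

E = residuals(B)
grad_B = -2.0 * sum(np.outer(E[:, i], Ds[i] @ C @ Xt[:, i]) for i in range(N))
Q = np.column_stack([grad_B @ Ds[i] @ C @ Xt[:, i] for i in range(N)])   # q_i
mu = exact_step(E, Q)
B_new = B - mu * grad_B                    # one gradient step with the optimal step size
```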


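Initialization: $\mathbf B$ and $\mathbf A$ can be initialized using the solution of standard PCA, to which our framework reduces when $L = 0$. Let the standard PCA compression and reconstruction matrices be denoted as $\mathbf C_{\rm PCA} = \mathbf U_r^T$ and $\mathbf B_{\rm PCA} = \mathbf U_r$. Then, we can initialize $\mathbf B$ as $\mathbf B^{(0)} = [\mathbf B_{\rm PCA},\mathbf 0,\ldots,\mathbf 0]$. From Corollary 1 it holds that $\mathbf C_{\rm PCA} = \mathbf A_{\rm PCA}\tilde{\mathbf X}^T$ (when $L = 0$), from which $\mathbf A^{(0)}$ can be obtained. The gradient descent based approach is tabulated as Alg. 1; $\mathbf B$ and $\mathbf A$ are updated until the norm of the difference between successive iterates drops below a desired threshold $\epsilon$.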

Remark: Note that the original data consist of $pN$ scalars, which can be prohibitively many. When applying the dimensionality-reducing matrix filter, each data vector is described by $r$ scalars corresponding to the entries of $\mathbf z_n$. Thus, a total of $rN$ scalars are used to characterize the dimensionality-reduced data. Notice that to form the reconstructed data in (4), the entries of $\mathbf z$ and $\{\mathbf B_l\}_{l=0}^{L}$, as well as the different entries of the symmetric adjacency matrix $\mathbf W$ and the scalars in , are needed. The cost of storing the entries of $\mathbf z$, $\{\mathbf B_l\}$ and $\mathbf W$ for the graph-based data reduction scheme is higher than that of storing the scalars required in standard PCA for $\mathbf z$ and $\mathbf B_{\rm PCA}$. Nonetheless, the graph-based approach achieves better reconstruction accuracy, as detailed next. Here, compression occurs as long as


Thus, for high-dimensional data (such as images) and a limited number of data vectors, the right-hand side in (18) can be approximated as . Hence, as long as there is meaningful data reduction.
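
A rough storage tally in the spirit of the Remark, under a loudly stated assumption: the graph-based scheme is taken to store only $\mathbf z$ ($rN$ scalars), the $L+1$ taps $\mathbf B_l$ ($pr$ each), and the $N(N+1)/2$ distinct entries of the symmetric $\mathbf W$; any further scalars the Remark mentions are ignored, so this is not the paper's condition (18). The sizes in the usage line are hypothetical.

```python
def storage_counts(p, N, r, L):
    """Scalar counts for raw data, standard PCA, and the graph-filtered scheme.

    Assumption: graph scheme stores z (rN), taps B_0..B_L (p*r each), and the
    N(N+1)/2 distinct entries of symmetric W; other stored scalars are ignored."""
    raw = p * N
    pca = r * N + p * r                          # z_n's plus one p x r matrix
    graph = r * N + p * r * (L + 1) + N * (N + 1) // 2
    return raw, pca, graph

raw, pca, graph = storage_counts(p=784, N=500, r=20, L=3)   # hypothetical sizes
```

With these sizes the graph scheme stores more than PCA but still well under the raw $pN$ scalars.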

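1:  Initialize $\mathbf B^{(0)}$ and $\mathbf A^{(0)}$ using standard PCA.
2:  for $k = 0, 1, \ldots$ do
3:     Determine optimal step sizes $\mu_B^{(k)}$ and $\mu_A^{(k)}$.
4:     Update $\mathbf B$ and $\mathbf A$ via (12) and (13), respectively.
5:     If $\|\mathbf B^{(k+1)}-\mathbf B^{(k)}\|_F < \epsilon$ and $\|\mathbf A^{(k+1)}-\mathbf A^{(k)}\|_F < \epsilon$ then stop.
6:  end for
Algorithm 1 Gradient Based Matrix Filter Determination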
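
A compact end-to-end sketch in the spirit of Alg. 1: PCA initialization with zero higher-order taps, then alternating gradient steps with exact line search on the stacked taps. Two deviations are assumptions made for brevity: $\mathbf C$ is updated directly rather than through $\mathbf A$, and $\mathbf W$ is rescaled so that $|\lambda_i|\le 1$ to keep the powers $\lambda_i^l$ well conditioned. The data are random placeholders.

```python
import numpy as np

def graph_pca_gd(Xc, W, r, L, max_iter=200, tol=1e-8):
    """Gradient descent with exact line search on stacked taps B and C of cost (7)."""
    p, N = Xc.shape
    lam, V = np.linalg.eigh(W)
    Xt = Xc @ V                                            # GFT of the centered data
    Ds = [np.kron(np.outer(li ** np.arange(L + 1), li ** np.arange(L + 1)), np.eye(r))
          for li in lam]

    def residuals(B, C):
        return np.column_stack([Xt[:, i] - B @ D @ C @ Xt[:, i] for i, D in enumerate(Ds)])

    def optimal_step(E, Q):
        # argmin_mu sum_i ||e_i + mu q_i||^2 = -b / a
        a = np.sum(Q * Q)
        return 0.0 if a == 0.0 else -np.sum(E * Q) / a

    # PCA initialization: zero-order taps from PCA, higher-order taps set to zero
    U_r = np.linalg.eigh(Xc @ Xc.T)[1][:, ::-1][:, :r]
    B = np.hstack([U_r] + [np.zeros((p, r))] * L)
    C = np.vstack([U_r.T] + [np.zeros((r, p))] * L)
    costs = [np.sum(residuals(B, C) ** 2)]

    for _ in range(max_iter):
        B_old, C_old = B, C
        # B step: gradient, search directions q_i, exact step size
        E = residuals(B, C)
        gB = -2.0 * sum(np.outer(E[:, i], D @ C @ Xt[:, i]) for i, D in enumerate(Ds))
        Q = np.column_stack([gB @ D @ C @ Xt[:, i] for i, D in enumerate(Ds)])
        B = B - optimal_step(E, Q) * gB
        # C step, using the freshly updated B
        E = residuals(B, C)
        gC = -2.0 * sum(D @ np.outer(B.T @ E[:, i], Xt[:, i]) for i, D in enumerate(Ds))
        Q = np.column_stack([B @ D @ gC @ Xt[:, i] for i, D in enumerate(Ds)])
        C = C - optimal_step(E, Q) * gC
        costs.append(np.sum(residuals(B, C) ** 2))
        if np.linalg.norm(B - B_old) + np.linalg.norm(C - C_old) < tol:
            break
    return B, C, costs

rng = np.random.default_rng(10)
p, N, r, L = 10, 8, 2, 2
X = rng.standard_normal((p, N))
Xc = X - X.mean(axis=1, keepdims=True)
R = rng.random((N, N))
W = np.triu(R, k=1) + np.triu(R, k=1).T
W = W / np.abs(np.linalg.eigvalsh(W)).max()    # assumption: scale so that |lambda_i| <= 1
B, C, costs = graph_pca_gd(Xc, W, r, L)
```

Because each exact line search can only lower a quadratic in the step size, the recorded costs never increase, and the first entry equals the PCA cost.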

4 Numerical Simulations

We test and compare the performance of the graph-based reduction and reconstruction approach against standard PCA (where $L = 0$) on the MNIST database of handwritten digits and the Extended Yale-B (EYB) face image dataset [10, 9]. The MNIST dataset consists of grayscale images of handwritten digits. The EYB database contains frontal colored images of size of individuals. Using the MNIST dataset, we randomly pick images of randomly selected digits, giving rise to a graph with nodes, each associated with a data vector of size . The procedure is repeated times and the results are averaged. In a similar fashion, EYB is used to randomly pick roughly images of randomly chosen individuals, giving rise to a graph with nodes. Each facial image is rescaled to a size of and converted to grayscale; thus, here each data vector has entries.

For the MNIST dataset, the adjacency matrix is built such that its $(n,n')$th entry is given as , whereas for EYB a Gaussian similarity kernel is employed, where and . A $k$-nearest-neighbor rule is then applied, whereby each node keeps its connections only to its $k$ most similar neighbors.
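
A minimal sketch of a Gaussian-kernel adjacency matrix sparsified by a $k$-nearest-neighbor rule. The exact kernel parameters used in the experiments are not recoverable here, so the form $\exp(-\|\mathbf x_n-\mathbf x_{n'}\|^2/\sigma^2)$, the value of `sigma`, and the max-based symmetrization (keep an edge if either endpoint selects it) are all assumptions for illustration.

```python
import numpy as np

def knn_gaussian_adjacency(X, k, sigma):
    """Symmetric kNN graph with Gaussian weights exp(-||x_n - x_n'||^2 / sigma^2).

    Columns of X are the data vectors; kernel form, sigma, and the symmetrization
    rule are illustrative assumptions."""
    sq = np.sum(X * X, axis=0)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)   # squared distances
    S = np.exp(-D2 / sigma ** 2)               # Gaussian similarity
    np.fill_diagonal(S, -np.inf)               # exclude self-similarity from the ranking
    N = S.shape[0]
    nbrs = np.argsort(-S, axis=1)[:, :k]       # k most similar nodes for each node
    rows = np.repeat(np.arange(N), k)
    W = np.zeros((N, N))
    W[rows, nbrs.ravel()] = S[rows, nbrs.ravel()]
    return np.maximum(W, W.T)                  # keep an edge if either endpoint selected it

rng = np.random.default_rng(11)
X = rng.standard_normal((5, 10))               # toy data: p = 5, N = 10
W = knn_gaussian_adjacency(X, k=3, sigma=3.0)
```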

Fig. 1 depicts the reconstruction MSE on the MNIST-derived dataset versus the reduced dimension $r$ for standard PCA ($L = 0$), as well as for different graph matrix filter orders $L$. Clearly, the introduction of graph filtering leads to much lower reconstruction MSE, which improves as $L$ increases. However, beyond a certain filter order the MSE reduction becomes negligible. Similar conclusions can be drawn from Fig. 2, which depicts the reconstruction MSE for the EYB-derived dataset. Using the similarity information in the adjacency matrix of the graph boosts reconstruction performance over PCA ($L = 0$).

Figure 1: Reconstruction MSE versus $r$ in MNIST.
Figure 2: Reconstruction MSE versus $r$ in EYB.

5 Conclusion

A novel graph-filtering-based data reduction and reconstruction scheme was proposed. The novel formulation incorporates graph filtering, which takes into account data vector similarities, into the reconstruction MSE. Working in the graph spectral domain enables the derivation of computationally efficient gradient descent techniques to determine the reducing and reconstruction matrix filters. Numerical tests on the EYB and MNIST image datasets demonstrate improved reconstruction quality relative to standard PCA.


  • [2] A. Anis, A. Gadde, and A. Ortega, “Towards a Sampling Theorem for Signals on Arbitrary Graphs,” Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), pp. 3864–3868, May 2014.
  • [3] D. P. Bertsekas, Nonlinear Programming, Second Edition, Athena Scientific, 2003.
  • [4] D. R. Brillinger, Time Series: Data Analysis and Theory. Expanded Edition, Holden Day, 1981.
  • [5] D. Cai, X. He, J. Han, and T. S. Huang, “Graph Regularized Nonnegative Matrix Factorization for Data Representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 8, pp. 1548–1560, Aug. 2011.
  • [6] S. Chen, R. Varma, A. Sandryhaila, and J. Kovacevic, “Discrete Signal Processing on Graphs: Sampling Theory,” IEEE Trans. Signal Process., vol. 63, no. 24, pp. 6510–6523, Dec. 2015.
  • [7] B. Jiang, C. Ding, and J. Tang, “Graph-Laplacian PCA: Closed-form Solution and Robustness,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2013, pp. 3492–3498.
  • [8] T. Jin, Z. Yu, L. Li, and C. Li, “Multiple Graph Regularized Sparse Coding and Multiple Hypergraph Regularized Sparse Coding for Image Representation,” Elsevier Neurocomputing, vol. 154, pp. 245–256, 2015.
  • [9] K. C. Lee, J. Ho, and D. J. Kriegman, “Acquiring Linear Subspaces for Face Recognition Under Variable Lighting,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 5, pp. 684–698, May 2005.
  • [10] MNIST Dataset: Available:
  • [11] K. Qiu, X. Mao, X. Shen, X. Wang, T. Li, and Y. Gu, “Time-Varying Graph Signal Reconstruction,” IEEE J. Sel. Topics Signal Process., vol. 11, no. 6, pp. 870–883, Sept. 2017.
  • [12] A. Sandryhaila and J. M. F. Moura, “Discrete Signal Processing on Graphs,” IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, 2013.
  • [13] S. Segarra, A. G. Marques, G. Leus, and A. Ribeiro, “Reconstruction of Graph Signals Through Percolation from Seeding Nodes,” IEEE Trans. Signal Process., vol. 64, no. 16, pp. 4363–4378, Aug. 2016.
  • [14] N. Shahid, N. Perraudin, V. Kalofolias, G. Puy, and P. Vandergheynst, “Fast Robust PCA on Graphs,” IEEE J. Sel. Topics Signal Process., vol. 10, no. 4, pp. 740–756, Feb. 2016.
  • [15] Y. Shen, P. A. Traganitis, and G. B. Giannakis, “Nonlinear Dimensionality Reduction on Graphs,” in Proc. IEEE 7th Int. Workshop Comput. Adv. Multi-Sensor Adaptive Process. (CAMSAP), Curacao, 2017, pp. 1–5.
  • [16] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, “The Emerging Field of Signal Processing on Graphs: Extending High-Dimensional Data Analysis to Networks and Other Irregular Domains,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, May 2013.
  • [17] X. Wang, P. Liu, and Y. Gu, “Local-Set-Based Graph Signal Reconstruction,” IEEE Trans. Signal Process., vol. 63, no. 9, pp. 2432–2444, 2015.