Graph signal processing aims to extend basic concepts of classical signal processing developed for signals defined on Euclidean domains to signals that reside on irregular domains with a network structure . Oftentimes, large scale graphs appear as a product of several smaller graphs. For example, time series on a sensor network, can be factorized using a cycle graph to represent time, and a spatial map to represent the network . In genomics, the graph that relates the different phenotypes of a population is a product of the graphs that relate the different character phenotypes . And, the movie ratings on a platform like Netflix can be viewed as signals living on the product of the social network of the users and the graph of relations among movies . More formally, we say that a graph is a product graph when its node set can be decomposed as a cartesian product of the nodes of two smaller factor graphs, and its edges are related with a known connection to the edges of the factors .
As the number of nodes increases within each factor, the total number of nodes in the product graph grows exponentially. When this happens, the tools that were designed to process data on small networks start to fail. In literature, this issue is commonly known as the curse of dimensionality. To circumvent this issue, some authors have proposed exploiting the product structure of large graphs to parameterize graph signals with a reduced dimensionality .
In this paper, we focus on the reconstruction of graph signals that reside on the vertices of a product graph by just observing a small subset of its vertices. In particular, we propose using a structured sampling scheme with which we select a sparse subset of nodes from each factor, thereby observing the signal at a few specific nodes of the product graph. This approach contrasts with traditional graph signal processing methods which do not take into account any underlying graph factorization when designing the sampling set [7, 8, 9, 10, 11, 12, 13]. When the underlying graph factorization is not accounted for, the complexity of designing the sampling set scales with the total number of vertices in the graph, and therefore, the applicability of such methods to large graphs is very limited.
Our proposed scheme circumvents this issue by reducing the original product search space into the union of two much smaller spaces. Hence, avoiding the curse of dimensionality. An illustration of this is shown in Fig. 1, where we see that selecting nodes from the factors reduces the possible candidate locations from 20 to 9. In essence, the aim is to reconstruct a signal on the product graph (rightmost in Fig. 1) by observing a subset of nodes of the factor graphs (on the left of Fig. 1) that generate the product graph.
In this paper, we present a low-complexity near-optimal greedy algorithm to design such a structured subsampling scheme, and demonstrate its performance on real datasets related to dynamic 3D point clouds and recommender systems.
2 Background and modeling
Throughout this paper we will use upper (lower) case bold face letters to denote matrices (column vectors), and we will denote sets using calligraphic letters.
2.1 Graph signals
A graph signal consists of a collection of values that can be associated to the nodes of a known graph , with a vertex set , and an edge set that reveals the connections between the nodes. For the sake of exposition, we will focus on undirected graphs.
Using the graph structure, we can construct an adjacency matrix that stores the strength of the connection between edges and in its th and th entries, . The degree of the th node of a graph is defined as . Related to the adjacency matrix, we can also define an alternative matrix representation known as the graph Laplacian , where . Both the adjacency matrix and the graph Laplacian belong to the class of matrices that represent a valid graph-shift operator , i.e., a matrix with a sparsity pattern defined by the graph connectivity.
Since for undirected graphs
is symmetric, it admits an eigenvalue decomposition
where the eigenvectorsand the eigenvalues provide a notion of frequency for the graph setting . If working with directed graphs, one can simply replace (1) by a Jordan decomposition .
The vectors provide a Fourier-like basis for graph signals, allowing to decompose any signal into its spectral components . In this sense, we say that a graph signal is bandlimited  when has non-zero entries.
2.2 Product graphs
Let and be two111For the sake of exposition, we restrict ourselves to the product of two graphs. Extensions to higher-order product graphs can be found in the journal version of this paper . undirected graphs with and vertices, respectively. The product graph , of and denoted with the symbol , is the graph given by
where denotes the cartesian product of the vertex sets, and defines a valid edge set for according to the rules of the graph product. Depending on the set of rules that detemine , three different product graphs are usually defined: the Cartesian product, the Kronecker product, and the strong product. For details of these rules we refer the reader to . The eigenvalue decomposition of the graph-shift operator of a product graph, denoted by , is related to the eigenvalue decompositions of its factors through 
where denotes the Kronecker product between matrices, and are the eigenvectors of the graph-shift operators for and , respectively, and is some diagonal matrix that depends on , and the type of product.
Because every node in a product graph can be indexed using a pair of vertices of the factor graphs, we can rearrange any graph signal in a matrix form such that . The spectral decomposition of then takes the form
Using this formulation, we can say that a product graph signal is bandlimited when it is simultaneously bandlimited in both domains, with a sparse having columns and rows different than zero with known support. In such cases, admits a low-dimensional representation as
where and are obtained by removing the columns of and corresponding to the indices of the rows and columns of that are zero, respectively; and and are the non-zero spectral components in and .
3 Sampler design
The low-dimensional representation (i.e., the notion of joint bandlimitedness) allows one to subsample graph signals by selecting just a few elements from each graph domain. Indeed, (3) defines an overdetermined system of equations with unknowns and equations. Since
we know that if and have full column rank we can recover from a subsampled version of .
Mathematically, sampling a subset of nodes from the graph factors is equivalent to selecting a subset of rows and columns from . Let and be two subsets of vertices from and , respectively, with and . To mathematically represent the proposed sampling scheme, we introduce two selection matrices and . Then, the subsampled observations can be related to using the following linear model
which can equivalently be expressed in the vectorized form as
In the following, and in order to simplify the notation, whenever it will be clear, we will drop the explicit dependency of and on the sets of selected nodes and , and we simply use and .
One can estimatefrom [cf. (4)] using least-squares as
where denotes the Moore-Penrose left pseudo-inverse of a matrix, which due to the Kronecker structure can efficiently be computed separately for each domain. From one can obtain using (3).
In the presence of additive white Gaussian noise, the performance of the least-squares solution depends on the proximity of the eigenvalues of the Fisher information matrix 
where ; and and are the Fisher information matrices of the sampled factor graphs. To design the sampling sets, we solve the following optimization problem
where possible candidates for are [17, 18], , or . In this paper, we use , also known as frame potential, as a figure of merit, since it can be showed that minimizing the frame potential is directly related to minimizing the mean squared error of the reconstruction. Furthermore, we show that the use of the frame potential as objective function allows to develop a low-complexity algorithm that has a multiplicative near-optimality guarantee.
To prove that our algorithm achieves near-optimality we rely on the concept of submodularity 
, a notion based on the property of diminishing returns that is useful for solving discrete combinatorial optimization problems of the form (7). Submodularity can formally be defined as follows.
Definition 1 (Submodular function ).
A function defined over the subsets of is submodular if it satisfies that for every , and we have
If a submodular function is also monotone non-decreasing, i.e., , for all ; and normalized, i.e., , then one can show that the solution of the greedy maximization Algorithm 1, , is 1/2 near-optimal with respect to the solution of , where is a matroid . We will show next that under some minor modifications (7) satisfies the aforementioned conditions.
Notice that we can express the frame potential of a sampled product graph as
where we see that this function can be factorized as a product of the frame potential of its factors. With a slight modification we can obtain a normalized, monotone non-decreasing, submodular function that can be employed as a surrogate cost function.
The set function given by is a monotone non-decreasing, normalized, and submodular function. Here, , and , with and .
See Appendix. ∎
Under the change of variables , maximizing is the same as maximizing , and thus can be used as submodular cost function in (7). Furthermore, the constraints in (7) form a truncated partition matroid  with and , where . Therefore, the solution to the greedy algorithm that solves (7), , is 1/2 near-optimal , i.e. , where is the optimal solution to (7).
4 Numerical results
In this section, we test the performance of the proposed sampling scheme for two different applications: subsampling a dynamic point cloud of a dancer, and active querying via subsampling for a recommender system.
4.1 Dynamic 3D point cloud
The moving dancer dataset consists of frames in which the 3D coordinates of markers, placed on the body of a dancer, were recorded. To represent the data as a graph signal we build a spatial 5-nearest-neighbor graph with the time-averaged position of the markers as suggested in ; and consider time as a cycle graph with vertices. The resulting product graph consists of more than nodes, and the dynamic signal can be represented using three matrices .
A visual inspection of the spectral decomposition of these signals shows that most of the energy is confined in the first few eigenmodes of the temporal and spatial graphs. In particular, in our simulations we limit the support of the signals to the first and spatial and temporal frequencies, respectively.
Using this representation, we run our greedy method with to select which nodes to sample. An illustration of the quality of the results for two different frames of the point cloud video is shown in Fig. 2. Even with this amount of compression on such a highly dimensional signal, the reconstruction is very accurate. The point cloud’s shape is hardly distorted and the point deviations are very small. In addition, we compute the relative error between the original and the estimated graph signals and we obtain a relative error of 3.44%. We emphasize here that the estimates are obtained by observing only 525 vertices from and from , i.e., 4.57% of all the vertices in the product graph. We also draw 1000 different random subsets of 600 vertices from with at least and vertices selected from and , respectively. However, all these random sampling sets lead to a singular system of equations, and hence the results are not shown here.
4.2 Active learning for recommender system
Most of the current recommendation algorithms solve the following estimation problem: given the past recorded preferences of a set of users for a set of products, what is the rating that these users would give to some other set of products? In this paper, in contrast, we focus on the data acquisition phase of the recommender system. In particular, we claim that by carefully designing which users to poll and on which items, we can obtain an estimation performance on par with the state-of-the-art methods using only a fraction of the data that current methods require, and using a simple least-squares estimator. Thus, expensive and random querying can be avoided.
We showcase this idea on the MovieLens dataset  that contains partial ratings of users over movies which are stored in a matrix . This dataset also provides different features that can be used to build 10-nearest-neighbors graphs for the user and movie relations. This way, we can regard as a signal living on the product of these two graphs.
The bandlimitidness of
has already been exploited to impute its missing entries[25, 26]. In our experiments, we use . Using this representation, we run our greedy algorithm with , resulting in a selection of user and movie vertices, i.e., 1875 vertices in the product graph. We reconstruct our signals using (5) and (3) and compute the RMSE of the estimation using the test mask provided by the dataset. Nevertheless, since our active query method requires access to ground truth data which is not provided in the dataset, we use GRALS  to complete the matrix, and use its estimates when required.
Fig. 3, shows the proposed active query sample resulting from the application of Algorithm 1 to solve (7). The user graph [cf. Fig. 2(a)] is made out of small clusters connected in a chain-like structure, resulting in a uniformly spread distribution of observed vertices. On the other hand, the movie graph [cf. Fig. 2(b)] is made out of a few big and small clusters. Hence, the proposed active query sample assigns more observations to the bigger clusters and fewer observations to the smaller ones. We also compare the performance of the state-of-the-art methods to that of our algorithm [cf. Table 1]. In light of these results, it is clear that a proper design of the sampling set allows to obtain the best performance with significantly fewer observed values, and using a much simpler non-iterative estimator.
In this paper, we have investigated the design of sparse samplers for the estimation of signals that reside on the vertices of a product graph. We have shown that by designing the sampling set using a combination of vertices from the graph factors we can overcome the curse of dimensionality, and design efficient subsampling schemes that guarantee a good performance for the reconstruction of graph signals. We have also proposed, a low-complexity greedy algorithm to select which vertices to sample, and provided a bound for its near-optimality with respect to the optimal subsampling set.
Appendix A Proof of Theorem 1
In order to simplify the derivations, let us introduce the notation
so that can also be written
We start by proving normalization, i.e.,
Now, recall that in  they proved that the frame potential set function is a monotone non-decreasing supermodular function. The complementary function preserves the supermodular property, but changes the monotonicity to non-increasing. Thus, when we invert its sign to obtain we obtain a monotone non-decreasing submdoular function.
Furthermore, since the multiplication of two monotone non-decreasing functions results in a monotone non-decreasing function, and the addition of a constant preserves monotonicity, is a monotone non-decreasing set function.
Let be a partition of , with for . To prove submodularity of we use Definition 1. However, since the ground set is now partitioned into the sets and , there are two possible ways the elements and can be selected: either they both belong to the same set, or they belong to different sets. We prove that (8) is satisfied for both cases.
1. If , then (8) can be developed as
Multiplying both sides of the inequality by we get
which is always satisfied since is supermodular. The same proof holds if .
-  A. Ortega, P. Frossard, J. Kovačević, J. M. F. Moura, and P. Vandergheynst, “Graph Signal Processing: Overview, Challenges, and Applications,” Proc. IEEE, vol. 106, no. 5, pp. 808–828, May 2018.
-  F. Grassi, A. Loukas, N. Perraudin, and B. Ricaud, “A Time-Vertex Signal Processing Framework: Scalable Processing and Meaningful Representations for Time-Series on Graphs,” IEEE Trans. Signal Process., vol. 66, no. 33, pp. 817–829, Feb 2018.
-  G. P. Wagner and P. F. Stadler, “Quasi-Independence, Homology and the Unity of Type: A Topological Theory of Characters,” J. Theor. Biol., vol. 220, no. 4, pp. 505 – 527, Feb 2003.
-  H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King, “Recommender systems with social regularization,” in Proc. ACM Int. Conf. Web Search and Data Mining, 2011, pp. 287–296.
-  W. Imrich and S. Klavzar, Product graphs, structure and recognition. New York : Wiley, 2000.
-  A. Sandryhaila and J. M. F. Moura, “Big Data Analysis with Signal Processing on Graphs: Representation and processing of massive data sets with irregular structure,” IEEE Signal Process. Mag., vol. 31, no. 5, pp. 80–90, 2014.
-  M. Tsitsvero, S. Barbarossa, and P. Di Lorenzo, “Signals on graphs: Uncertainty principle and sampling,” IEEE Trans. Signal Process., vol. 64, no. 18, pp. 4845–4860, Sep.
-  X. Zhu and M. Rabbat, “Approximating signals supported on graphs,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Kyoto, Japan, March 2012, pp. 3921–3924.
-  A. Anis, A. Gadde, and A. Ortega, “Towards a sampling theorem for signals on arbitrary graphs,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Florence, Italy, May 2014, pp. 3864–3868.
-  S. Chen, R. Varma, A. Sandryhaila, and J. Kovačević, “Discrete Signal Processing on Graphs: Sampling Theory,” IEEE Trans. Signal Process., vol. 63, no. 24, pp. 6510–6523, Dec 2015.
-  S. P. Chepuri and G. Leus, “Graph Sampling for Covariance Estimation,” IEEE Trans. Signal Inf. Process. Netw., vol. 3, no. 3, pp. 451–466, Sept 2017.
-  S. Chepuri, Y. Eldar, and G. Leus, “Graph Sampling With and Without Input Priors,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Calgari, Canada, April 2018, pp. 4564–4568.
-  D. Romero, M. Ma, and G. B. Giannakis, “Kernel-based reconstruction of graph signals,” IEEE Trans. Signal Process., vol. 65, no. 3, pp. 764–778, Feb.
-  A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs: Frequency analysis,” IEEE Trans. Signal Process., vol. 62, no. 12, pp. 3042–3054, Jun 2014.
-  G. Ortiz-Jiménez, M. Coutino, S. P. Chepuri, and G. Leus, “Sparse sampling for inverse problems with tensors,” arXiv preprint arXiv:1806.10976, 2018.
-  S. P. Chepuri and G. Leus, “Sparse sensing for statistical inference,” Foundations and Trends in Signal Processing, vol. 9, no. 3-4, pp. 233–368, 2016.
-  S. Joshi and S. Boyd, “Sensor selection via convex optimization,” IEEE Trans. Signal Process., vol. 57, no. 2, pp. 451–462, Feb 2009.
-  M. Shamaiah, S. Banerjee, and H. Vikalo, “Greedy sensor selection: Leveraging submodularity,” in Proc. 49th IEEE Conf. Decis. Control, Atlanta, GA, USA, Dec 2010, pp. 2572–2577.
-  S. Rao, S. P. Chepuri, and G. Leus, “Greedy sensor selection for non-linear models,” in IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, Cancun, Mexico, Dec 2015, pp. 241–244.
-  J. Rainieri, A. Chebira, and M. Vetterli, “Near-Optimal Sensor Placement for Linear Inverse Problems,” IEEE Trans. Signal Process., vol. 62, no. 5, pp. 1135–1146, March 2014.
-  “Submodular functions and optimization,” ser. Annals of Discrete Mathematics, S. Fujishige, Ed. Elsevier, 2005, vol. 58.
-  M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey, “An analysis of approximations for maximizing submodular set functions—II,” in Polyhedral combinatorics. Springer, Dec 1978, pp. 73–87.
-  R. G. Parker and R. L. Rardin, “Polynomial Algorithms–Matroids,” in Discrete Optimization, ser. Computer Science and Scientific Computing, R. G. Parker and R. L. Rardin, Eds. San Diego: Academic Press, 1988, pp. 57–106.
-  F. M. Harper and J. A. Konstan, “The movielens datasets: History and context,” ACM Trans. Interact. Intell. Syst., vol. 5, no. 4, p. 19, Jan 2016.
-  W. Huang, A. G. Marques, and A. Ribeiro, “Collaborative filtering via graph signal processing,” in Proc. Eur. Signal Process. Conf., Kos, Greece, Aug 2017, pp. 1094–1098.
-  V. Kalofolias, X. Bresson, M. Bronstein, and P. Vandergheynst, “Matrix completion on graphs,” in Proc. Neural Inf. Process. Systems, Workshop ”Out of the Box: Robustness in High Dimension”, Montreal, Canada, Dec 2014.
-  N. Rao, H.-F. Yu, P. K. Ravikumar, and I. S. Dhillon, “Collaborative filtering with graph information: Consistency and scalable methods,” in Proc. Neural Inf. Process. Systems,, Montreal, Canada, Dec 2015, pp. 2107–2115.
-  M. Bastian, S. Heymann, and M. Jacomy, “Gephi: An open source software for exploring and manipulating networks,” in Int. AAAI Conf. Weblogs and Social Media, San Jose, CA, USA, May 2009.
F. Monti, M. Bronstein, and X. Bresson, “Geometric matrix completion with recurrent multi-graph neural networks,” inProc. Advances Neural Inf. Process. Systems, Montreal, Canada, Dec 2017, pp. 3700–3710.
-  R. v. d. Berg, T. N. Kipf, and M. Welling, “Graph convolutional matrix completion,” arXiv:1706.02263, 2017.