1 Introduction
Matrix completion deals with the recovery of the missing values of a matrix from a subset of its entries,

(1)  find $X$ such that $X \circ M = Y \circ M$.

Here $X$ stands for the unknown matrix, $Y$ for the ground truth matrix, $M$ is a binary mask representing the input support, and $\circ$ denotes the Hadamard product. Since problem (1) is ill-posed, it is common to assume that $X$ belongs to some low-dimensional subspace. Under this assumption, the matrix completion problem can be cast via the least-squares variant,

(2)  $\min_X \; \tfrac{1}{2}\,\|(X - Y) \circ M\|_F^2 + \lambda\,\mathrm{rank}(X)$.
Relaxing the intractable rank penalty to its convex envelope, namely the nuclear norm, leads to a convex problem whose solution coincides with that of (2) under some technical conditions (Candès and Recht, 2009). Another way to enforce low rank is by explicitly parametrizing $X$ in factorized form, $X = AB^\top$. The rank of $X$ is then upper-bounded by the minimal dimension of $A$ and $B$. Further developing this idea, $X$ can be parametrized as a product of several matrices, $X = W_N W_{N-1} \cdots W_1$, a model we denote as deep matrix factorization (DMF). Gunasekar et al. (2017); Arora et al. (2019) investigated the minimization of overparametrized DMF models using gradient descent, and came to the following conclusion (which we will formally state in Section 2): whereas in some restrictive settings minimizing DMF using gradient descent is equivalent to nuclear norm minimization (i.e., the convex relaxation of (2)), in general these two models produce different results, with the former enforcing a stronger regularization on the rank of $X$. This regularization gets stronger as the depth $N$ increases. In light of these results, we shall henceforth refer by "DMF" to the aforementioned model coupled with the specific algorithm used for its minimization, namely, gradient descent.
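A minimal NumPy sketch of this DMF scheme: gradient descent on the masked least-squares loss with the product matrix factorized into several layers. The depth, step size, number of steps, and scaled-identity initialization below are illustrative choices, not the settings used in the paper, and square matrices are assumed for simplicity.

```python
import numpy as np

def dmf_complete(Y, mask, depth=2, steps=500, lr=0.05, init_scale=0.1):
    """Gradient descent on 0.5*||(W_1...W_N - Y) o mask||_F^2 over the factors W_j."""
    n, m = Y.shape
    assert n == m, "square case for simplicity"
    Ws = [init_scale * np.eye(n) for _ in range(depth)]

    def prod(mats):
        out = np.eye(n)
        for M in mats:
            out = out @ M
        return out

    for _ in range(steps):
        X = prod(Ws)
        G = mask * (X - Y)  # gradient of the loss w.r.t. the product matrix X
        # dL/dW_i = (W_1...W_{i-1})^T G (W_{i+1}...W_N)^T; compute all before updating
        grads = [prod(Ws[:i]).T @ G @ prod(Ws[i + 1:]).T for i in range(depth)]
        for i in range(depth):
            Ws[i] = Ws[i] - lr * grads[i]
    return prod(Ws)
```

With a low-rank target and a dense mask, the residual shrinks substantially within a few hundred steps; with few observed entries, the recovered matrix is biased toward low effective rank.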
Oftentimes, additional information is available in the form of a graph that neatly encodes structural (geometric) information about $X$. For example, we can constrain $X$ to belong to a subspace spanned by eigenvectors of some graph Laplacian, i.e., to be bandlimited on the graph. Such information is generally overlooked by purely algebraic quantities (e.g., rank), and becomes invaluable in the data-poor regime, where the theorems governing reconstruction guarantees (e.g., Candès and Recht, 2009) do not hold. Our work leverages the recent advances in DMF theory to marry the two concepts: a framework for matrix completion that is explicitly motivated by geometric considerations, while implicitly promoting low rank via its DMF structure.

Contributions.
Our contributions are as follows:

- We propose task-specific DMF models that follow from geometric considerations, and study their dynamics.

- We show that with our proposed models it is possible to obtain state-of-the-art results on various recommendation systems datasets, making this one of the first successful applications of deep linear networks to real problems.

- Our findings challenge the quality of the side information available in various recommendation systems datasets, and the ability of contemporary methods to utilize it in a meaningful and efficient way.
2 Preliminaries
Spectral graph theory.
Let $G = (V, E)$ be a (weighted) graph specified by its vertex set $V$ and edge set $E$, with its adjacency matrix denoted by $W$. Given a function $f : V \to \mathbb{R}$ on the vertices, we define the following quadratic form (also known as the Dirichlet energy) measuring the variability of the function on the graph,

(3)  $E_{dir}(f) = f^\top L f = \tfrac{1}{2}\sum_{i,j} w_{ij}\,(f_i - f_j)^2$.
The matrix $L$ is called the (combinatorial) graph Laplacian, and is given by $L = D - W$, where $D = \mathrm{diag}(W\mathbf{1})$ is the degree matrix. $L$ is symmetric and positive semidefinite and therefore admits a spectral decomposition $L = \Phi \Lambda \Phi^\top$. Since $L\mathbf{1} = 0$, $\lambda = 0$ is always an eigenvalue of $L$. The graph Laplacian is a discrete generalization of the continuous Laplace-Beltrami operator, and therefore has similar properties. One can think of the eigenpairs $(\phi_i, \lambda_i)$ as the graph analogues of "harmonic" and "frequency". A function on the vertices of the graph whose spectral coefficients $\langle f, \phi_i \rangle$ are small for large $\lambda_i$ demonstrates a "smooth" behaviour on the graph, in the sense that the function values on nearby nodes will be similar. A standard approach to promoting such smooth functions on graphs is by using the Dirichlet energy (3) to regularize some loss term. For example, this approach gives rise to the popular bilateral and non-local means filters (Gadde et al., 2013). Structural information about the graph is encoded in the spectrum of the Laplacian. For example, the number of connected components in the graph is given by the multiplicity of the zero eigenvalue, and the second eigenvalue (counting multiple eigenvalues separately) is a measure of the connectivity of the graph (Spielman, 2009).
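The constructions above can be sketched in a few lines of NumPy: the combinatorial Laplacian built from an adjacency matrix, and the Dirichlet energy of eq. (3), which vanishes for constant signals and is small for smooth ones.

```python
import numpy as np

def laplacian(W):
    """Combinatorial graph Laplacian L = D - W for a symmetric adjacency W."""
    return np.diag(W.sum(axis=1)) - W

def dirichlet_energy(f, W):
    """f^T L f, which equals 0.5 * sum_ij w_ij * (f_i - f_j)^2."""
    return float(f @ laplacian(W) @ f)
```

On a 3-node path graph, a constant signal has zero energy, while the ramp $[0, 1, 2]$ has energy $(0-1)^2 + (1-2)^2 = 2$.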
Product graphs and functional maps.
Let $G_r$, $G_c$ be two graphs, with $L_r$, $L_c$ being their corresponding graph Laplacians. The bases $\Phi_r$, $\Phi_c$ can be used to represent functions on these graphs. We define the Cartesian product of $G_r$ and $G_c$, denoted by $G_r \,\square\, G_c$, as the graph with vertex set $V_r \times V_c$, on which two nodes $(u_1, v_1)$ and $(u_2, v_2)$ are adjacent if either $u_1 = u_2$ and $(v_1, v_2) \in E_c$, or $v_1 = v_2$ and $(u_1, u_2) \in E_r$. The Laplacian of $G_r \,\square\, G_c$ is given by the tensor sum of $L_r$ and $L_c$,

(4)  $L = L_r \oplus L_c = L_r \otimes I + I \otimes L_c$,

and its eigenvalues are given by the Cartesian sum of the eigenvalues of $L_r$ and $L_c$, i.e., all combinations $\lambda_i^r + \lambda_j^c$ where $\lambda_i^r$ is an eigenvalue of $L_r$ and $\lambda_j^c$ is an eigenvalue of $L_c$. Let $X$ be a function defined on $G_r \,\square\, G_c$. Then it can be represented using the bases of the individual Laplacians, $X = \Phi_r C \Phi_c^\top$. In the shape processing community, such a $C$ is called a functional map, as it is used to map between the functional spaces of $G_r$ and $G_c$. For example, given two functions, $f$ on $G_r$ and $g$ on $G_c$, one can use $C$ to map between their spectral representations $a = \Phi_r^\top f$ and $b = \Phi_c^\top g$. We shall henceforth interchangeably switch between the terms "signal on the product graph" and "functional map".
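The tensor-sum structure of eq. (4) is easy to verify numerically: the Kronecker sum of the factor Laplacians has exactly the Cartesian sums of their eigenvalues as its spectrum.

```python
import numpy as np

def kron_sum(Lr, Lc):
    """Laplacian of the Cartesian product graph: L_r (+) L_c = L_r x I + I x L_c."""
    m, n = Lr.shape[0], Lc.shape[0]
    return np.kron(Lr, np.eye(n)) + np.kron(np.eye(m), Lc)
```

For an edge graph (eigenvalues 0, 2) and a 3-node path (eigenvalues 0, 1, 3), the product Laplacian's spectrum is all six pairwise sums.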
We will call a functional map smooth if it maps close points on one graph to close points on the other. A simple way to construct a smooth map is via a linear combination of eigenvectors of $L_r \oplus L_c$ corresponding to small eigenvalues ("low frequencies"). Notice that while these eigenvectors are outer products of the columns of $\Phi_r$ and $\Phi_c$, their ordering with respect to the eigenvalues of $L_r \oplus L_c$ might be different than their lexicographic order.

Implicit regularization of DMF.
Let $X = W_N W_{N-1} \cdots W_1$ be a matrix parametrized as a product of $N$ matrices (which can be interpreted as $N$ linear layers of a neural network), and let $\mathcal{L}(X)$ be an analytic loss function. Arora et al. (2018, 2019) analyzed the evolution of the singular values and singular vectors of $X$ throughout the gradient flow $\dot{W}_j(t) = -\frac{\partial}{\partial W_j}\,\mathcal{L}(X(t))$, i.e., gradient descent with an infinitesimal step size, with balanced initialization,

(5)  $W_{j+1}^\top(0)\, W_{j+1}(0) = W_j(0)\, W_j^\top(0), \qquad j = 1, \dots, N-1$.
As a first step, we state that the product matrix $X(t)$ admits an analytic singular value decomposition.
Lemma 1.
(Lemma 1 in Arora et al. (2019)) The product matrix $X(t)$ can be expressed as:

(6)  $X(t) = U(t)\, S(t)\, V^\top(t)$,

where $U(t)$, $S(t)$ and $V(t)$ are analytic functions of $t$; and for every $t$, the matrices $U(t)$ and $V(t)$ have orthonormal columns, while $S(t)$ is diagonal (elements on its diagonal may be negative and may appear in any order).
The diagonal elements of $S(t)$, which we denote by $\sigma_r(t)$, are the signed singular values of $X(t)$; the columns of $U(t)$ and $V(t)$, denoted $u_r(t)$ and $v_r(t)$, are the corresponding left and right singular vectors (respectively). Using the above lemma, Arora et al. (2019) characterized the evolution of the singular values as follows:
Theorem 1.
(Theorem 3 in (Arora et al., 2019)) The signed singular values of the product matrix $X(t)$ evolve by:

(7)  $\dot{\sigma}_r(t) = -N\,\big(\sigma_r^2(t)\big)^{1 - 1/N}\,\big\langle \nabla \mathcal{L}(X(t)),\; u_r(t)\, v_r^\top(t) \big\rangle$.

If the matrix factorization is non-degenerate, i.e., has depth $N \geq 2$, the singular values need not be signed (we may assume $\sigma_r(t) \geq 0$ for all $t$).
The above theorem implies that the evolution rates of the singular values depend on their size exponentiated by $2 - 2/N$. As $N$ increases, the gap between their convergence rates grows, thereby inducing an implicit regularization on the effective rank of $X$.
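To see the rate gap concretely, one can Euler-integrate the 1-D ODE of eq. (7), $\dot{\sigma}_r = -N\,\sigma_r^{2-2/N}\,\langle \nabla\mathcal{L}, u_r v_r^\top\rangle$, for a fixed (hypothetical) negative alignment term. All values below are illustrative, not data from the paper.

```python
import numpy as np

def evolve_sigma(sigma0, N, g=-1.0, dt=1e-3, steps=800):
    """Euler integration of d(sigma)/dt = -N * sigma^(2 - 2/N) * g for constant g."""
    s = sigma0
    for _ in range(steps):
        s = s - dt * N * s ** (2 - 2.0 / N) * g
    return s
```

For depth $N = 1$ the exponent vanishes and all singular values grow at the same rate, while for $N = 3$ the growth is proportional to $\sigma^{4/3}$, so an initially larger singular value pulls far ahead of a smaller one: the rich-get-richer dynamics behind the implicit rank regularization.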
3 DMF with spectral geometric regularization
We assume that we are given a set of samples from the unknown matrix $Y$, encoded by a binary mask $M$, and two graphs $G_r$, $G_c$ encoding relations between the rows and the columns, respectively. Denote the Laplacians of these graphs and their spectral decompositions by $L_r = \Phi_r \Lambda_r \Phi_r^\top$, $L_c = \Phi_c \Lambda_c \Phi_c^\top$. We denote the Cartesian product between $G_r$ and $G_c$ by $G = G_r \,\square\, G_c$, and will henceforth refer to it as our reference graph. Our approach relies on a minimization problem of the form

(8)  $\min_X \; E_{data}(X) + E_{dir}(X) \quad \text{s.t.} \quad \mathrm{rank}(X) \leq r$,

with $E_{data}$ denoting a data term of the form

(9)  $E_{data}(X) = \tfrac{1}{2}\,\|(X - Y) \circ M\|_F^2$,

and $E_{dir}$ is the Dirichlet energy of $X$ on $G$, given by (see (4))¹

(10)  $E_{dir}(X) = \mathrm{tr}(X^\top L_r X) + \mathrm{tr}(X L_c X^\top)$.

¹ Note that it is possible to weigh the two terms differently, as we do in some of our experiments.

To that end, we parametrize $X$ via a matrix product and eliminate the rank constraint,

(11)  $\min_{W_1, \dots, W_N} \; E_{data}(W_N \cdots W_1) + E_{dir}(W_N \cdots W_1)$.

Since (11) is now a DMF model, this parametrization renders the rank constraint redundant: according to Theorem 1, it is captured by the implicit regularization induced by gradient descent.
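The objective of eqs. (8)–(10) can be sketched directly: the masked data term plus the Dirichlet energy on the product graph, which splits into a row part and a column part. The relative weighting `mu` below is an illustrative hyperparameter (the paper's footnote notes the two terms may be weighted differently).

```python
import numpy as np

def sgmc_loss(X, Y, mask, Lr, Lc, mu=1.0):
    """Masked least-squares data term plus Dirichlet energy of X on the product graph."""
    data = 0.5 * np.linalg.norm(mask * (X - Y)) ** 2
    dirichlet = np.trace(X.T @ Lr @ X) + np.trace(X @ Lc @ X.T)
    return float(data + mu * dirichlet)
```

In practice `X` would be a product of factors updated by gradient descent; here a constant matrix on connected graphs attains zero Dirichlet energy, while a rougher matrix is penalized.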
To interpret this matrix factorization geometrically, we interpret the innermost factor as a signal living on a latent product graph $\tilde{G}$. Via the linear transformation defined by the remaining factors, this signal is transported onto the reference graph $G$, where it is assumed to be both low-rank and smooth (see Figure 1). Notice that the latent graph is used only for the purpose of illustrating the geometric interpretation, and there is no need to find it explicitly. Nevertheless, it is possible to promote particular properties of it via spectral constraints that can sometimes improve the performance. We demonstrate these extensions in the sequel.

To give a concrete example, suppose $Y$ is a permuted version of some low-rank matrix, and the reference Laplacian is the 2D Euclidean Laplacian. Then, via an appropriate ordering of the rows and columns of $Y$, it is possible to obtain a signal which is both smooth on $G$ and low-rank.²

² On a side note, that is exactly the goal of the well-known and closely related seriation problem (Recanati, 2018).
For later reference, let us rewrite (11) in the spectral domain. We will denote the Laplacians of the latent graph factors comprising $\tilde{G}$ by $\tilde{L}_r$, $\tilde{L}_c$ and their eigenbases by $\tilde{\Phi}_r$, $\tilde{\Phi}_c$. Using those eigenbases and the eigenbases of the reference Laplacians, $\Phi_r$, $\Phi_c$, we can write a three-layer factorization $X = W_r Z W_c^\top$ with

(12)  $W_r = \Phi_r A \tilde{\Phi}_r^\top$,

(13)  $W_c = \Phi_c B \tilde{\Phi}_c^\top$,

(14)  $Z = \tilde{\Phi}_r C \tilde{\Phi}_c^\top$.

Under this reparametrization we get

(15)  $X = \Phi_r A C B^\top \Phi_c^\top$.

With some abuse of notation, (11) becomes

(16)  $\min_{A, B, C} \; E_{data}(ACB^\top) + E_{dir}(ACB^\top)$,

with

(17)  $E_{data}(ACB^\top) = \tfrac{1}{2}\,\big\|\big(\Phi_r A C B^\top \Phi_c^\top - Y\big) \circ M\big\|_F^2$,

and

(18)  $E_{dir}(ACB^\top) = \mathrm{tr}\big((ACB^\top)^\top \Lambda_r (ACB^\top)\big) + \mathrm{tr}\big((ACB^\top)\, \Lambda_c\, (ACB^\top)^\top\big)$.
3.1 Extensions
Additional regularization via spectral filtering.
We propose a stronger explicit regularization by demanding that both $W_r$ and $W_c$ be smooth on their respective graphs. Since we do not know the Laplacians of the latent graph, we smooth $Z$ via spectral filtering, i.e., through direct manipulation of its spectral representation $C$. To that end, we pass $C$ through a bank of pre-chosen spectral filters $F_k$, $\bar{F}_k$, i.e., diagonal positive semidefinite matrices, and transport the filtered signals to $G$ according to

(19)  $X_k = \Phi_r A F_k C \bar{F}_k B^\top \Phi_c^\top$.

In particular, we use the ideal low-pass filters,

(20)  $F_k = \mathrm{diag}(\mathbf{1}_k, \mathbf{0})$,

where $\mathbf{1}_k$ denotes a vector with $k$ ones followed by zeros. For these manipulations to take effect, we replace $E_{data}$ in (16) with the following loss function,

(21)  $\sum_k E_{data}(X_k)$.

Despite the fact that we used separable filters in (19), these filters are coupled through the loss (21). This results in an overall inseparable spectral filter that still retains a DMF structure, since (19) is a multi-layer DMF with two fixed layers. While the theory developed by Arora et al. (2019) does not cover the case of a multilayer DMF where only a subset of the layers is trainable, our empirical evaluations encourage us to conjecture that the implicit rank regularization is still in place. This additional regularization allows us to get decent reconstruction errors even when the number of measurements is extremely small, as we show in Figure 5.
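The ideal low-pass filter of eq. (20) acts on a signal by zeroing all but its first $k$ graph Fourier coefficients. A minimal sketch on a single graph (the full model applies such filters on both sides of the spectral representation):

```python
import numpy as np

def lowpass(f, L, k):
    """Keep only the k lowest-frequency Laplacian components of the signal f."""
    lam, Phi = np.linalg.eigh(L)  # eigenvalues ascending, columns are eigenvectors
    coeffs = Phi.T @ f            # graph Fourier transform of f
    coeffs[k:] = 0.0              # apply the filter diag(1,...,1, 0,...,0)
    return Phi @ coeffs           # back to the vertex domain
```

Filtering with $k$ equal to the number of vertices is the identity, while smaller $k$ can only decrease the Dirichlet energy of the signal.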
Regularization of the individual layers.
Another extension we explore is imposing further regularization on the individual layers. For example, one could ask $W_r$ and $W_c$ to be diagonalized by the corresponding reference and latent eigenbases. Using (12)-(13) we get,

(22)  $\Phi_r^\top W_r \tilde{\Phi}_r = A$.

Thus, we can approximately enforce this constraint with the following penalty term,

(23)  $\|\mathrm{off}(A)\|_F^2$,

where $\mathrm{off}(\cdot)$ denotes the off-diagonal elements. A similar treatment for the columns graph gives,

(24)  $\|\mathrm{off}(B)\|_F^2$.
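The penalties of eqs. (23)-(24) reduce to the squared Frobenius norm of a matrix's off-diagonal part, which is straightforward to compute:

```python
import numpy as np

def offdiag_penalty(M):
    """Squared Frobenius norm of the off-diagonal elements of M."""
    off = M - np.diag(np.diag(M))
    return float(np.sum(off ** 2))
```

The penalty vanishes exactly when the matrix is diagonal, so adding it to the loss softly pushes a layer toward being diagonalized by the chosen eigenbases.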
We again emphasize that while these penalty terms are not a function of the product matrix, we are encouraged by our experimental results and by the results of Arora et al. (2018), to think that Theorem 1 can be extended to account for these terms as well. We leave these extensions to future work.
4 Experimental study on synthetic data
The goal of this section is to compare our approach with vanilla DMF on a simple example of a community-structured graph. We exhaustively compare the following distinct methods:

Deep matrix factorization (DMF):

(25)  $\min_{W_1, \dots, W_N} \; E_{data}(W_N \cdots W_1)$
Spectral geometric matrix completion (SGMC): The proposed approach defined by the optimization problem (16).

Functional Maps (FM, SGMC1): This method is like SGMC with a single layer, i.e., we optimize only for $C$, while $A$ and $B$ are set to the identity.
We use the graphs taken from the synthetic Netflix dataset. Synthetic Netflix is a small synthetic dataset constructed by Kalofolias et al. (2014) and Monti et al. (2017), in which the user and item graphs have strong community structure. See Figure 10 in Appendix A for a visualization of the user/item graphs. It is useful for conducting controlled experiments to understand the behavior of geometry-exploiting algorithms. In all our tests we use a randomly generated bandlimited matrix on the product graph $G$. For the complete details please refer to the captions of the relevant figures.
Performance evaluation.
To evaluate the performance of the algorithms in this section, we report the root mean squared error,

(26)  $\mathrm{RMSE} = \sqrt{\dfrac{\sum_{ij} (M_t)_{ij}\, (\hat{X}_{ij} - Y_{ij})^2}{\sum_{ij} (M_t)_{ij}}}$,

computed on the complement of the training set. Here $\hat{X}$ is the recovered matrix and $M_t$ is the binary mask representing the support of the set on which the RMSE is computed.
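Eq. (26) amounts to an RMSE restricted to the entries flagged by the test mask:

```python
import numpy as np

def masked_rmse(X, Y, test_mask):
    """Root mean squared error over the entries where test_mask == 1."""
    n = test_mask.sum()
    return float(np.sqrt(np.sum(test_mask * (X - Y) ** 2) / n))
```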
We explore the following aspects:
[Figure 3 caption] Robustness of SGMC in the presence of noisy graphs. We perturbed the edges of the graphs by adding random Gaussian noise with zero mean and tunable standard deviation to the adjacency matrix; edges that became negative as a result of the noise were discarded, and the adjacency matrix was symmetrized. SGMC1/SGMC2/SGMC3 stand for SGMC with 1 layer (training only $C$), 2 layers, and 3 layers, respectively. Left: with clean graphs, all SGMC methods perform well. As the noise increases, the regularization induced by the depth kicks in and there is a clear advantage for SGMC3. For large noise, SGMC3 and DMF achieve practically the same performance. Middle & Right: eigenvalues of the perturbed Laplacians for different noise levels. Notice the steps in the spectra reflecting the community structure of the graphs. Even for moderately large amounts of noise, the structure of the lower part of the spectrum is preserved, and the effect on the low-frequency (smooth) signal remains small.
Sampling density.
We investigate the effect of the number of samples on the reconstruction error and the effective rank of the recovered matrix (Roy and Vetterli, 2007). We demonstrate that in the data-poor regime the implicit regularization of DMF is too strong, resulting in poor recovery, compared to the superior performance achieved by incorporating geometric regularization through SGMC. These experiments are summarized in Figure 2.
Initialization.
In all of our experiments we use the balanced initialization (5), with scaled identity matrices $W_j(0) = \alpha I$. We explore the effect of initialization in Figure 11 (in Appendix A).
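Scaled-identity initialization trivially satisfies the balancedness condition of eq. (5), $W_{j+1}^\top(0) W_{j+1}(0) = W_j(0) W_j^\top(0)$, which can be checked directly (function names below are illustrative):

```python
import numpy as np

def identity_init(n, depth, alpha):
    """Initialize every factor as alpha * I (balanced by construction)."""
    return [alpha * np.eye(n) for _ in range(depth)]

def is_balanced(Ws, tol=1e-10):
    """Check W_{j+1}^T W_{j+1} == W_j W_j^T for all consecutive pairs."""
    return all(np.allclose(Ws[j + 1].T @ Ws[j + 1], Ws[j] @ Ws[j].T, atol=tol)
               for j in range(len(Ws) - 1))
```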
Rank of the underlying matrix.
We explore the effect of the rank of the underlying matrix, showing that as the rank increases it becomes harder for both SGMC and DMF to recover the matrix. A remarkable property of SGMC is that it is able to get a decent approximation of the effective rank of the matrix even with an extremely low number of samples. These experiments are summarized in Figure 2.
Noisy graphs.
We study the effect of noisy graphs on the performance of SGMC. Figure 3 demonstrates that SGMC is able to utilize graphs with substantial amounts of noise before its performance drops to the level of vanilla DMF (which does not rely on any knowledge of the row/column graphs).
Dynamics.
5 Results on recommender systems datasets
We demonstrate the effectiveness of our approach on the following datasets: Synthetic Netflix, Flixster, Douban, MovieLens (ML100K) and MovieLens1M (ML1M), as referenced in Table 1. The datasets include user ratings for items (such as movies) and additional features. For all the datasets we use the user and item graphs taken from Monti et al. (2017). The ML1M dataset was taken from Berg et al. (2017); for it, we constructed 10-nearest-neighbor graphs for users/items from the features, and used a Gaussian kernel for the edge weights. See Table 4 in Appendix A for a summary of the dataset statistics. For all the datasets, we report the results for the same test splits as those of Monti et al. (2017) and Berg et al. (2017). The compared methods are referenced in Table 1.
Proposed baselines.
We report the results obtained using the methods discussed above, with the addition of the following method:

SGMCZ: a variant of SGMC that uses (21) as a data term. For this method we chose a maximal cutoff value for the filters and a skip determining the spectral resolution.
In addition, we add the diagonalization terms (23), (24), suitably weighted, to the SGMC/SGMCZ objectives. The optimization is carried out using gradient descent with a fixed step size (i.e., a fixed learning rate), which is provided for each experiment alongside all the other hyperparameters in Table 5.
Initialization.
All our methods are deterministic and do not require multiple runs to account for initialization. We always initialize the matrices with scaled identities $\alpha I$. In Figure 11 we report results on the Synthetic Netflix and ML100K datasets for different values of $\alpha$. According to Gunasekar et al. (2017) and Li et al. (2017), DMF requires a careful choice of the initialization scale to decrease the generalization error; we chose the values of $\alpha$ for SGMC, SGMCZ and DMF on Synthetic Netflix and on the real-world datasets in accordance with Figure 11 and our experimentation. In the cases where only one of the bases was available, such as in the Douban and Flixster-users-only benchmarks, we set the basis corresponding to the absent graph to the identity.
Stopping condition.
Our stopping condition for the gradient descent iterations is based on a validation set: we hold out a small random fraction of the available entries for validation and use the rest for training (i.e., to construct the mask $M$). We stop the iterations when the RMSE (26), evaluated on the validation set, does not change by more than a small tolerance between two consecutive iterations. Since we did not apply any optimization to the choice of the validation set, we also report the best RMSE achieved on the test set via early stopping. In this regard, the number of iterations is yet another hyperparameter that has to be tuned for best performance.
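The stopping rule above can be sketched as follows; `step_fn` is a hypothetical callback performing one gradient step and returning the current estimate, and the tolerance is illustrative.

```python
import numpy as np

def train_with_early_stop(step_fn, Y, val_mask, tol=1e-6, max_iters=10000):
    """Iterate until the validation RMSE changes by less than tol between steps."""
    prev = np.inf
    X = None
    for _ in range(max_iters):
        X = step_fn()
        err = np.sqrt(np.sum(val_mask * (X - Y) ** 2) / val_mask.sum())
        if abs(prev - err) < tol:
            break
        prev = err
    return X
```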
5.1 Cold start analysis
A particularly interesting scenario in the context of recommender systems is the presence of cold-start users, i.e., users who have not rated enough movies yet. We analyze the performance of our method in the presence of such cold-start users on the ML100K dataset. In order to generate a dataset consisting of cold-start users, we sort the users according to the number of ratings provided by each user, and retain at most a few randomly chosen ratings of the bottom users (i.e., the users who provided the least ratings). We run our algorithms DMF, SGMC and SGMCZ for several values of these parameters, with the same hyperparameter settings used for obtaining Table 1. We use the official ML100K test set for evaluation. As before, we use a fraction of the training samples as a validation set for determining the stopping condition. The results presented in Figure 5 suggest that SGMC and SGMCZ outperform DMF significantly, indicating the importance of the geometry as data becomes scarcer. As expected, the performance drops as the number of ratings per user decreases. Furthermore, we observe that SGMCZ consistently outperforms SGMC by a small margin. We note that SGMCZ, even in the presence of cold-start users with few ratings, is still able to outperform the full-data performance of Monti et al. (2017), demonstrating the strength of the geometry and the implicit low rank induced by SGMCZ.

Scalability.
All the experiments presented in the paper were conducted on a machine with 64GB of CPU memory and an NVIDIA GTX 2080Ti GPU. Most of our large-scale experiments take 10-30 minutes until convergence, and are therefore rather quick. In this work we focused on the conceptual idea of solving matrix completion via the framework of deep matrix factorization with geometric regularization, paying little attention to the issue of scalability. The dependence of our method on an eigenvalue decomposition might hinder its scalability prospects. While this did not pose a problem for the small datasets we used in this report, it is nonetheless an issue that we intend to address in future work. We believe that our approach can be carefully transformed into the spatial domain, where the sparse structure of the Laplacian matrix can be exploited to tackle scalability issues.
5.2 Discussion
A few remarkable observations can be extracted from Table 1. First, on the Douban and ML100K datasets, vanilla DMF shows competitive performance with all the other methods. This suggests that the geometric information is not very useful for these datasets. Second, the proposed SGMC algorithms outperform the other methods, despite their simple and fully linear architecture. This suggests that the other geometric methods do not exploit the geometry properly, and that this fact is obscured by their cumbersome architectures. Third, while some of the experiments reported in Table 1 showed only slight margins in favor of SGMC/SGMCZ compared to DMF, the results in the Synthetic Netflix column, the ones reported on Synthetic MovieLens100K (Table 3 in Appendix A) and the ones reported in Figure 2 suggest that when the geometric model is accurate our methods demonstrate superior results. Table 2 in Appendix A presents the results on MovieLens1M. First, we can deduce that the vanilla DMF model is able to match the performance of complex alternatives. Furthermore, using graphs produces slight improvements over the DMF baseline and overall provides competitive performance compared to heavily engineered methods. On Synthetic Netflix, we notice that by using SGMC we outperform Monti et al. (2017) by a significant margin, reducing the test RMSE by half. Furthermore, it can be observed that DMF performs poorly on both synthetic datasets compared to SGMC/SGMCZ, raising a question as to the quality of the graphs provided with those datasets on which DMF performed comparably.
A compelling argument for this behaviour is given by Table 4 in Appendix A: in the real datasets we tested on, the number of available samples is well below the density required by DMF to achieve good performance, in accordance with our findings in Section 4. With high-quality graphs, we would have expected SGMC to outperform DMF by a large margin. Our conclusion is that while geometric matrix completion algorithms may seem like gold, in the absence of enough data and good geometric priors, they are just fool's gold.
6 Related work
Geometric matrix completion.
There is a vast literature on classical approaches to matrix completion, and covering it is beyond the scope of this paper. In recent years, the advent of deep learning platforms equipped with efficient automatic differentiation tools has allowed the exploration of sophisticated models that incorporate intricate regularizations. Some of these contemporary approaches to matrix completion fall under the umbrella term of geometric deep learning, which generalizes standard (Euclidean) deep learning to domains such as graphs and manifolds. For example, graph convolutional neural networks (GCNNs) follow the architecture of standard CNNs, but replace the Euclidean convolution operator with linear filters constructed using the graph Laplacian. We distinguish between graph-based approaches which make use of the bipartite graph structure of the rating matrix (e.g., Berg et al. (2017)), and geometric matrix completion techniques which make use of side information in the form of graphs encoding relations between rows/columns (Kovnatsky et al., 2014; Kalofolias et al., 2014; Monti et al., 2017). More recently, it has been demonstrated that some graph CNN architectures can be greatly simplified and still perform competitively on several graph analysis tasks (Wu et al., 2019). Such simple techniques have the advantage of being easier to analyze and reproduce. One of the simplest notable approaches is deep linear networks, i.e., networks comprising only linear layers. While these networks are still mostly used for theoretical investigations, we note the recent results of Bell-Kligler et al. (2019), who successfully employed such a network for the task of blind image deblurring.
Product manifold filter & Zoomout.
The inspiration for our paper stems from techniques for finding shape correspondence, in particular the functional maps framework and its variants (Ovsjanikov et al., 2012, 2016). Most notable are the work of Litany et al. (2017), who combined functional maps with joint diagonalization to solve partial shape matching problems, and the product manifold filter (PMF) (Vestner et al., 2017a, b) and zoomout (Melzi et al., 2019), two greedy algorithms for correspondence refinement via the gradual introduction of high frequencies.
7 Conclusion
In this work we have proposed a simple spectral technique for matrix completion, building upon recent practical and theoretical results in geometry processing and deep linear networks. We have shown, through extensive experimentation on real and synthetic datasets, that combining the implicit regularization of DMF with explicit, and possibly noisy, geometric priors can be extremely useful in data-poor regimes. Our work is a step towards building interpretable models that are grounded in theory, and shows that such simple models need not only be considered for theoretical study. With the proper glasses, they can be made useful.
References
Arora, S., Cohen, N., and Hazan, E. (2018). On the optimization of deep networks: implicit acceleration by overparameterization. arXiv:1802.06509.

Arora, S., Cohen, N., Hu, W., and Luo, Y. (2019). Implicit regularization in deep matrix factorization. arXiv:1905.13655.

Bell-Kligler, S., Shocher, A., and Irani, M. (2019). Blind super-resolution kernel estimation using an internal-GAN. In Advances in Neural Information Processing Systems 32, pp. 284-293.

Berg, R. van den, Kipf, T. N., and Welling, M. (2017). Graph convolutional matrix completion. arXiv:1706.02263.

Candès, E. J., and Recht, B. (2009). Exact matrix completion via convex optimization. Foundations of Computational Mathematics 9(6), p. 717.

Dziugaite, G. K., and Roy, D. M. (2015). Neural network matrix factorization. arXiv:1511.06443.

Gadde, A., Narang, S. K., and Ortega, A. (2013). Bilateral filter: graph spectral interpretation and extensions. In 2013 IEEE International Conference on Image Processing, pp. 1222-1226.

Gunasekar, S., Woodworth, B., Bhojanapalli, S., Neyshabur, B., and Srebro, N. (2017). Implicit regularization in matrix factorization. In Advances in Neural Information Processing Systems, pp. 6151-6159.

Harper, F. M., and Konstan, J. A. (2015). The MovieLens datasets: history and context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5(4), p. 19.

Jamali, M., and Ester, M. (2010). A matrix factorization technique with trust propagation for recommendation in social networks. In Proceedings of the Fourth ACM Conference on Recommender Systems, pp. 135-142.

Kalofolias, V., Bresson, X., Bronstein, M., and Vandergheynst, P. (2014). Matrix completion on graphs. arXiv:1408.1717.

Koren, Y., Bell, R., and Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer 42(8), pp. 30-37.

Kovnatsky, A., Bronstein, M. M., Bresson, X., and Vandergheynst, P. (2014). Functional correspondence by matrix completion. arXiv:1412.8070.

Lee, J., Kim, S., Lebanon, G., Singer, Y., and Bengio, S. (2016). LLORMA: local low-rank matrix approximation. Journal of Machine Learning Research 17(15), pp. 1-24.

Li, Y., Ma, T., and Zhang, H. (2017). Algorithmic regularization in over-parameterized matrix sensing and neural networks with quadratic activations. arXiv:1712.09203.

Litany, O., Rodolà, E., Bronstein, A. M., and Bronstein, M. M. (2017). Fully spectral partial shape matching. Computer Graphics Forum 36, pp. 247-258.

Ma, H., Zhou, D., Liu, C., Lyu, M. R., and King, I. (2011). Recommender systems with social regularization. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, pp. 287-296.

Melzi, S., Ren, J., Rodolà, E., Sharma, A., Wonka, P., and Ovsjanikov, M. (2019). ZoomOut: spectral upsampling for efficient shape correspondence. arXiv:1904.07865.

Monti, F., Bronstein, M., and Bresson, X. (2017). Geometric matrix completion with recurrent multi-graph neural networks. In Advances in Neural Information Processing Systems, pp. 3697-3707.

Ovsjanikov, M., Ben-Chen, M., Solomon, J., Butscher, A., and Guibas, L. (2012). Functional maps: a flexible representation of maps between shapes. ACM Transactions on Graphics (TOG) 31(4), p. 30.

Ovsjanikov, M., et al. (2016). Computing and processing correspondences with functional maps. In SIGGRAPH ASIA 2016 Courses, p. 9.

Rao, N., Yu, H.-F., Ravikumar, P. K., and Dhillon, I. S. (2015). Collaborative filtering with graph information: consistency and scalable methods. In Advances in Neural Information Processing Systems, pp. 2107-2115.

Recanati, A. (2018). Relaxations of the seriation problem and applications to de novo genome assembly. Ph.D. thesis.

Roy, O., and Vetterli, M. (2007). The effective rank: a measure of effective dimensionality. In 2007 15th European Signal Processing Conference, pp. 606-610.

Salakhutdinov, R., Mnih, A., and Hinton, G. (2007). Restricted Boltzmann machines for collaborative filtering. In Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 791-798.

Salakhutdinov, R., and Mnih, A. (2007). Probabilistic matrix factorization. In Proceedings of the 20th International Conference on Neural Information Processing Systems (NIPS '07), pp. 1257-1264.

Sedhain, S., Menon, A. K., Sanner, S., and Xie, L. (2015). AutoRec: autoencoders meet collaborative filtering. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion), pp. 111-112.

Spielman, D. (2009). Spectral graph theory. Lecture notes, Yale University.

Vestner, M., et al. (2017a). Efficient deformable shape correspondence via kernel matching. In 2017 International Conference on 3D Vision (3DV), pp. 517-526.

Vestner, M., Litman, R., Rodolà, E., Bronstein, A., and Cremers, D. (2017b). Product manifold filter: non-rigid shape correspondence via kernel density estimation in the product space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3327-3336.

Wu, F., Zhang, T., Souza, A. H. de, Fifty, C., Yu, T., and Weinberger, K. Q. (2019). Simplifying graph convolutional networks. arXiv:1902.07153.

Zheng, Y., Tang, B., Ding, W., and Zhou, H. (2016). A neural autoregressive approach to collaborative filtering. In Proceedings of the 33rd International Conference on Machine Learning, pp. 764-773.
Appendix A Appendix
Ablation study.
We study the effects of different hyperparameters on the final reconstruction of the matrix, performing an ablation study of the effect of overparametrization on DMF, SGMC and SGMCZ. The results are summarized in Figures 6, 7, 8. It is interesting to note that in the case of DMF and SGMC, overparametrizing consistently improves the performance (see Figure 8), but only up to a certain point, beyond which the overparametrization does not seem to affect the reconstruction error. Notice that in Table 5, separate hyperparameters control the Dirichlet energy of the rows and columns, while others govern the weights of the row/column diagonalization energy.
Synthetic MovieLens100K.
While the experiments reported in Table 1 showed slight margins in favor of methods using geometry, we further experimented with a synthetic model generated from the ML100K dataset. The purpose of this experiment is to investigate whether the results are due to the DMF model or due to the geometry as incorporated by SGMC/SGMCZ. The synthetic model was generated by projecting the rating matrix on the first 50 eigenvectors of the reference Laplacians, and then matching the ratings histogram with that of the original ML100K dataset. This nonlinear operation increased the rank of the matrix beyond 50. See Figure 9 in the Appendix for a visualization of the full matrix, the singular value distribution and the user/item graphs. The test and training sets were generated randomly and are of the same size as those of the original dataset. The results reported in Table 3 and those in the Synthetic Netflix column of Table 1 clearly indicate that SGMC/SGMCZ outperform DMF, suggesting that when the geometric model is accurate it is possible to use it to improve the results.
Table 2: Results on ML1M.

Model | ML1M
PMF (Salakhutdinov and Mnih, 2007) |
IRBM (Salakhutdinov et al., 2007) |
BiasMF (Koren et al., 2009) |
NNMF (Dziugaite and Roy, 2015) |
LLORMA-Local (Lee et al., 2016) |
I-AUTOREC (Sedhain et al., 2015) |
CF-NADE (Zheng et al., 2016) |
GCMC (Berg et al., 2017) |
DMF (Arora et al., 2019), (ours) |
SGMC (ours) |
Table 3: Results on Synthetic ML100K.

Model | Synthetic ML100K
DMF |
SGMC |
SGMCZ |
Table 1: Results of the compared methods, grouped per dataset.

Dataset | Method
Synthetic Netflix | DMF
Synthetic Netflix | FM
Synthetic Netflix | SGMC
Synthetic Netflix | SGMCZ
Flixster | DMF
Flixster | SGMC
Flixster | SGMCZ
Flixster (users only) | SGMC
Flixster (users only) | SGMCZ
Douban | DMF
Douban | SGMC
Douban | SGMCZ
ML100K | DMF
ML100K | SGMC
ML100K | SGMCZ
ML1M | DMF
ML1M | SGMC
ML1M | DMF
ML1M | SGMC
ML1M | SGMCZ