 # Subspace Least Squares Multidimensional Scaling

Multidimensional Scaling (MDS) is one of the most popular methods for dimensionality reduction and visualization of high dimensional data. Apart from these tasks, it also found applications in the field of geometry processing for the analysis and reconstruction of non-rigid shapes. In this regard, MDS can be thought of as a shape from metric algorithm, consisting of finding a configuration of points in the Euclidean space that realize, as isometrically as possible, some given distance structure. In the present work we cast the least squares variant of MDS (LS-MDS) in the spectral domain. This uncovers a multiresolution property of distance scaling which speeds up the optimization by a significant amount, while producing comparable, and sometimes even better, embeddings.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Multidimensional Scaling (MDS) is one of the most popular techniques for dimensionality reduction, whose purpose is to represent dissimilarities between objects as distances between points in a low dimensional space. Since its invention in the 50’s by Torgerson  within the field of psychometrics, it has been used in diverse fields including sociology, marketing, and colorimetry, to name a few [5, 22]. More recently, MDS has found applications in the field of geometry processing, in such tasks as analysis and synthesis of non-rigid shapes [8, 6]. In this regard, MDS can be thought of as a shape from metric algorithm, consisting of finding a configuration of points in the Euclidean space that realize, as isometrically as possible, some given distance structure. Least squares Multidimensional Scaling (LS-MDS) is a particular MDS model which seeks a low dimensional Euclidean embedding by minimizing a geometrically meaningful criterion, namely the Kruskal stress

 σ(X)=∑i

Here is a (symmetric) measure of dissimilarity between two sample points and ; is a (symmetric) weight assigned to the pairwise term between those samples, and are the coordinates of the low-dimensional Euclidean embedding. Since (1) is a nonlinear non-convex function, a variety of non-linear optimization techniques have been employed for its minimization, often resulting in local minima whose quality crucially depends on the initialization. Of these techniques, one of the most popular approaches is the SMACOF algorithm discovered by Jan de-Leeuw , which stands for Scaling by Majorizing a Complicated Function. SMACOF is essentially a gradient based method, which suffers from the typical slow convergence associated with first order optimization methods. To supplement on this, a single iteration of SMACOF requires the computation of the Euclidean pairwise distances between all points participating in the optimization at their current configuration, a time consuming task on its own, which limits its application to small size data.

A related family of methods with the same goal in mind, confusingly called classical (multidimensional) scaling, aims at finding a low-dimensional Euclidean embedding by minimizing the following algebraic criterion called strain

 (2)

where is a centering matrix transforming a Euclidean squared-distance matrix into a Gram matrix of inner products by fixing the origin, and

denotes the vector of ones. If

is a Euclidean squared-distance matrix between points in , is a Gram matrix of rank (at most) , and can be faithfully approximated by a Gram matrix of lower rank. However, if is not a Euclidean squared-distance matrix, one can only hope that it is indeed well approximated by a low-rank Gram matrix after applying the double centering transformation.

Though (2) is not convex, it can be minimized globally by eigendecomposition techniques. On top of being expensive to apply, these techniques require the availability of the complete matrix , which may be time consuming in the best case, or impossible to obtain in the worst case. Iterative methods can also be used for strain minimization , which allows the incorporation of weights and missing distances, at the expense of losing the global convergence guarantees. Due to the lack of geometric intuition behind it, strain minimization results in inferior embeddings 

and has some other drawbacks. For example, classical scaling exhibits large sensitivity to outliers since the double centering transformation spreads the effect of a single outlier to the rest of the points in the same row/column.

##### Contribution.

Our work leverages the recent trend in geometry processing of using the truncated eigenbasis of the Laplace-Beltrami operator (LBO) to approximate smooth functions on manifolds. This favorable representation has been shown to improve classical scaling methods by orders of magnitude , as well as other key problems in geometry processing, e.g. non-rigid shape correspondence 

. By constraining the embedding to an affine subspace spanned by the leading eigenvectors of the Laplace-Beltrami operator, we show that LS-MDS exhibits a multi-resolution property, and can be solved significantly faster, making it applicable to larger size problems and to new applications.

The rest of the article is arranged as follows: In Section 2 we review previous works and their relation to our work. In Section 3 we discuss the problem of stress minimization and describe in detail the SMACOF algorithm, a minimization-majorization type algorithm suited for this purpose. In Section 4 we give a short overview of the Laplace-Beltrami operator in the continuous and discrete settings, and the representation of band-limited signals on manifolds. In Section 5 we describe the core numerical scheme of our subspace LS-MDS approach, an algorithm we call spectral SMACOF and a multiresolution approach. Finally, in Section 6 we present our results, followed by a concluding discussion in Section 7.

## 2 Related work

In this section we review previous work related to MDS, stress minimization techniques and subspace parametrization. The oldest version of MDS, called classical scaling, is due to Torgerson 

. The formulation of MDS as a minimization over a loss function is due to Shepard

 and Kruskal [20, 21]. The SMACOF algorithm we describe in the next sections is due to de Leeuw , but in fact was previously developed by Guttman . For a comprehensive overview of MDS problems and algorithms we refer the reader to . A variety of methods have been proposed to accelerate classical scaling over the years, trying to reduce the size of the input pairwise distance matrix.  and 

propose to embed only a subset of the points and find the rest by interpolation.

 uses the Nyström method to perform out of sample extension to an existing embedding. However, it does not make use of the geometry of the data.  interpolate the distance matrix from a subset of pairwise distances by representing it in the Laplace-Beltrami eigenbasis and forcing it to be smooth on the manifold.  also uses the LBO and the Nyström method to interpolate the distance matrix from a set of known rows. This time the interpolation is formulated in the spatial domain so there is no need to compute a basis.

Regarding stress based scaling,  proposed a multi-grid approach. They rely heavily on the spatial representation of the shapes in switching between resolutions, which makes their scheme hard to generalize.  used vector extrapolation techniques to accelerate the convergence rate of the SMACOF algorithm. This acceleration technique is a meta-algorithm that can be put on top of our algorithm as well.  considers MDS problems with linear restrictions on the configurations, essentially constraining the embedding to belong to a subspace.  uses a subspace method for LS-MDS, but their subspace is an ad-hoc construction used for multi-channel image visualization.

Linear subspace methods are common in machine learning and pattern recognition applications where they are used to restrict a signal to some low-rank subspace, usually an eigenspace of some matrix, which preserves its important details. Examples include principle component analysis (PCA), locally linear embedding (LLE), linear discriminant analysis (LDA) and many more

. One of those methods which is related to our algorithm is the Laplacian eigenmaps method . It is essentially an MDS problem using only local distances. Its outcome is an embedding of the distance matrix into the first eigenvectors of the Laplace-Beltrami operator. In some sense, our proposed approach is a merge between Laplacian eigenmaps and LS-MDS. A related line of works uses Laplacian (spectral) regularization in the context of low-rank subspace constructions on graphs [28, 18].

Subspace methods are also common in numerical optimization to accelerate and stabilize optimization algorithms. A review can be found here . It is also worth mentioning the work of  which uses the cotangent Laplacian as a proxy for the Hessian in various geometric optimization problems. Ours is a similar but different approach. We employ a subspace method which effectively projects the Hessian on the subspace of the leading eigenvectors of the Laplacian, thus solving a series of small dimensional problems, rather than approximate the Hessian using the Laplacian while retaining the dimension of the original problem.

## 3 LS-MDS and stress minimization

Given an symmetric matrix of dissimilarities, least squares MDS (LS-MDS) is an MDS model that looks for an Euclidean embedding by minimizing the stress function (1). The latter can be minimized by nonlinear optimization techniques, e.g. gradient descent111Note also that (1) is not differentiable everywhere.. One particularly elegant and popular algorithm is SMACOF [12, 8] – a majorization-minimization type algorithm which can be shown equivalent to gradient descent with a specific choice of the step size . In light of that, we consider it as a representative for the class of first order methods for the minimization of (1). The idea is to use the Cauchy-Schwarz inequality to construct a majorizing function for the stress, obeying for every and and . To do so, one should first note that (1) can be written alternatively as

 σ(X)=tr(XTVX)−2tr(XTB(X)X)+∑i

where and are symmetric row-and column-centered matrices given by

 vij={−wiji≠j∑k≠iwiki=j (4)
 bij=⎧⎪⎨⎪⎩−wijdij∥xi−xj∥−1i≠j,xi≠xj0i≠j,xi=xj−∑k≠ibiki=j, (5)

and define

 h(X,Z)=tr(XTVX)−2tr(ZTB(Z)X)+∑i

which is a convex (quadratic) function in . Therefore, if we use the following iteration

 Xk+1 =argminXh(X,Xk)=V†BkXk=(V+1N11T)−1BkXk, (7)

where and denotes the pseudo-inverse of the rank deficient matrix , we obtain a monotonically decreasing sequence of stress values

 σ(Xk+1)≤h(Xk+1,Xk)≤h(Xk,Xk)=σ(Xk). (8)

• Equation (8) follows directly from (6) and (7); in the last passage, we used the fact that

 V†=(V+1N11T)−1−1N11T (9)

and that . The multiplicative update (7) suggests that at each iteration, each coordinate in the current embedding is a weighted mean of the coordinates of the previous embedding, starting with an initial embedding , where the weights are given by the ratio of the pairwise distances, . In what follows, we refer to (8) as the majorization inequality.

Each iteration of SMACOF requires the computation of the pairwise Euclidean distances between all points in the current embedding, a task of complexity , and the solution of a linear system involving the pseudo-inverse of the matrix . Despite being rank deficient (in fact, is the graph Laplacian of the graph encoded by the adjacency matrix , i.e., the pairwise weights), is not necessarily sparse, and computing its pseudo-inverse is of order , except when has a special form. For example, in case all the weights are equal to , we get

 V†=1N(I−1N11T), (10)

• and the SMACOF iteration reduces to

 Xk+1=1NBkXk. (11)

To summarize, a single iteration of SMACOF consist of two steps: Majorization – the construction of (5), which requires non-linear operations; and Minimization – the solution of (7), which in general requires for the factorization of in the first iteration, and operations for solving the linear system using forward/backward substitution in the following iterations.

## 4 Laplace-Beltrami operator and band-limited signals

The Laplace-Beltrami operator is an analogue of the Euclidean Laplacian defined on a manifold, which has become a standard tool in the field of geometry processing. For a closed compact manifold surface , let denote its Laplace-Beltrami differential operator. Consider the equation . The pair are an eigenpair of . Note that

is always an eigenvalue, the corresponding eigenfunctions are constant functions. The eigenvalues of the Laplace-Beltrami operator are non-negative and constitute a discrete set. We will assume that the eigenvalues are distinct, so we can put them into ascending order,

. The appropriately normalized eigenfunction corresponding to will be denoted by . The normalization is achieved using the inner product. Given two functions and on the surface, their inner product is denoted by , and is defined as the surface integral

 ⟨f,g⟩=∫Sfgda. (12)

Since the Laplace-Beltrami operator is Hermitian, the eigenfunctions corresponding to its different eigenvalues are orthogonal. Given a function on the surface, one can expand it in terms of the eigenfunctions , where the coefficients are . We refer to signals for which as -bandlimited. This intuitively suggests that to smooth a function one should simply discard the coefficients corresponding to the larger eigenvalues, i.e. truncate the infinite expansion above. A variety of discretizations exist for the Laplacian, each with its own pros and cons. We use the cotangent Laplacian of . Instead of using the usual trigonometric construction, it can also be expressed in terms of the edge lengths and Heron’s formula. See  for example. The result is a generalized eigenvalue problem where is a symmetric positive definite matrix called stiffness matrix and is a diagonal matrix called the mass matrix. Notice that in the discrete setting, this basis is orthonormal with respect to the inner product on the manifold .

## 5 Subspace LS-MDS

The dependence of a single iteration of SMACOF on (or ) severely limits its applicability in embedding a large number of points. Deviating from the LS-stress model (1) makes things even worse. Different stress models like the stress, which could in theory be solved as a series of weighted MDS (WMDS) problems using an iteratively reweighted least squares technique, require very long time to converge if one uses the regular SMACOF iterations. Moreover, adding linear constraints on the configuration would require the solution of a large quadratic program in each iteration, making it practically impossible to use.

In order to reduce the dependence on , we restrict the embedding to lie within a low-dimensional subspace of . Low-rank representation of signals is a very popular approach in a variety of machine learning applications, and have also been applied previously to stress majorization . Our goal is to show that for multidimensional scaling problems arising in geometry processing, a particular choice of subspace, namely the subspace of band-limited signals on a manifold, enables to accelerate stress majorization by a significant amount. Unlike the ad-hoc construction in , this subspace is a geometric construction. We also draw a connection between the number of samples, the band-limit, and the quality of the approximation. Although only an observational result at this stage, similar results have been obtained for the case of reconstruction of band-limited signals on graphs .

We assume that an initial manifold embedded in that encodes the local relationship between points is easily obtainable, and denote the discrete samples of this manifold by . The (discrete) embedding can be written as , and we can reformulate MDS in terms of the displacement field . Our observation is that since the embedding found by LS-MDS maintains the local relationship between samples, as encouraged by the stress term (1), can be modeled as a band-limited signal (i.e. ”smooth”) on this manifold. We do that by explicitly representing the displacement field as a linear combination of the first eigenvectors of the Laplace-Beltrami operator on the manifold . We denote the concatenation of the first eigenvectors by , and the problem therefore becomes

 minα∈Rp×mσ(X0+Φα) (13)

The new update step in terms of is

 αk+1 =argminαh(X0+Φα,Xk) (14) (15)

where . Notice that the majorization inequality (8) still holds, since the requirement is true for any and , so it must also be true for restricted to an affine subspace

 σ(Xk+1)≤h(X0+Φαk+1,Xk)≤σ(Xk). (16)

Restricting the embedding to lie within a rank subspace reduces the cost of solving the linear system (7) to (amortized ). This may have a profound effect on the minimization step of SMACOF for weighted MDS problems or when additional terms are added to the stress. However, the main bottleneck remains in the majorization step, which requires the computation of all pairwise Euclidean distances in the current embedding. Our main observation is that when the displacement field is -bandlimited, i.e., can be written as a linear combination of the first Laplace-Beltrami eigenvectors, it is enough to use points to get an embedding which is almost as good, in terms of stress value, as the embedding achieved by using all the points. The quantity is a sampling ratio which can be found by experimentation. In practice, we set in all our experiments. Denoting the set of sampled points by we define a sampling matrix with if and otherwise, and instead of minimizing the objective of (13), we minimize . Each iteration of SMACOF now becomes

 αk+1 =argminαh(SX0+SΦα,SXk) (17) (18)

• where are the matrices defined in (4),(5) constructed only from the sampled points. We denote the solution and the final embedding by and , respectively, referring to the latter as spectral interpolation. We summarize this algorithm, dubbed spectral SMACOF, in Alg. 1.

Notice that the pairwise distances are only required between the sampled points. In order to sample the points, we employ the farthest point sampling strategy , computing the distances as part of the sampling process. Though the algorithm requires the computation of the truncated eigenbasis of the LBO, in order to apply (18) we do not need the full basis , but only its sampled version . The full basis is only needed for the final interpolation of the displacement field , which can also be reformulated as an optimization problem

 minδ∥Sδ−SΦα∗∥2F+λ⟨δ,Lδ⟩, (19)

where is the Laplacian matrix. This opens up the possibility of reducing the amount of time required to construct the basis, and is an issue we leave for future research.

To summarize, a single iteration of spectral SMACOF consist of two steps (excluding the interpolation which can be deferred to the final step): Majorization – the construction of , which requires operations; and Minimization – the solution of (18), which in general requires for the computation of in the first iteration, and operations in the succeeding iterations.

### 5.1 Multi-resolution scheme for stress minimization

To control the trade off between convergence speed and approximation quality, we employ a multi-resolution scheme. Multi-resolution and multi-grid methods have been previously employed for distance scaling with limited success . Since those methods relied solely on the spatial representation of the embedding, the switch between resolution levels required cumbersome decimation and interpolation operators which were hard to generalize. The spectral formulation provides those operators practically ”for free”, where the decimation simply requires sampling the basis functions at the sampling points, and the interpolation is carried out using the full subspace basis.

The scheme consists of two distinct hierarchies, one for the spatial domain and one for the spectral domain. The two hierarchies are quantized and stored in the vectors and , where the last level for both could possibly be , i.e. using the full set of vertices, in which case we get a ”pure” subspace MDS, or using the complete basis, in which case we get an equivalent of the landmark MDS method , or both, which reduces to applying a regular SMACOF iteration. It is important that each pair complies with the ”sampling criterion”, .

We highlight two properties of this scheme: First, for a fixed and varying , it follows from the majorization inequality that if we intialize each resolution level with the embedding from the previous resolution, we get a monotonic decrease in stress values. i.e., in the ”pure” subspace setting , spectral SMACOF produces a monotonically decreasing series of stress values, just like the regular SMACOF.

Second, for a fixed with varying , we observe that there is a jump in the stress value around , consistent with our observation in the derivation of the sampling criterion, and from there the approximation error and stress value decrease, though not monotonically.

## 6 Results

In this section we compare the spectral SMACOF algorithm to other LS-MDS algorithms in terms of running time and quality of the embedding. We implemented the algorithms in Matlab® using fast vector arithmetic. All the experiments were conducted on an Intel NUC6i7KYK equipped with Intel® CoreTM i7-6770HQ CPU 2.6GHz and 32 GB RAM. In all experiments we set , and use three resolution levels and . For the spectral SMACOF algorithm we use the following parameters: (see Alg. (1)), except for the full-resolution for which we set . For the SMACOF algorithm we use . We run it until it reaches the same value of stress achieved by the spectral SMACOF algorithm or lower. We also compare to the algorithm proposed in  which uses vector extrapolation methods (RRE) to accelerate SMACOF. Every iterations of regular SMACOF it tries to decrease the stress further by using a linear combination the previous descent directions. This acceleration technique is a meta-algorithm that can be put on top of the spectral SMACOF algorithm as well, and we only put it here for illustration purposes. In the following examples we set .

##### Ls-Mds.

We apply spectral SMACOF to compute canonical forms of several shapes from the low-res TOSCA dataset , which have 3400 vertices. The canonical form is an isometric embedding of a surface equipped with a metric , e.g. the geodesic metric, into . This embedding allows us to reduce the problem of non-rigid shape correspondence into a much simpler problem of rigid shape correspondence. For more information see . Figure ((a)a) shows running time for computing a canonical form using the unweighted stress. Figure ((b)b) shows running time for a canonical form obtained with . This kind of stress is commonly known as relative stress. In both experiments we initialized with the original embedding. The convergence rate of spectral SMACOF, measured in CPU time, is faster by two orders of magnitudes than the convergence rate obtained by regular SMACOF, even though the iteration count is much higher. Since in the unweighted case the minimization step admits an especially simple form (see (11)), this effect is more pronounced in the weighted case. Additional results and applications will appear in a longer version of this article.

##### Linearly constrained MDS.

As another illustration, we use spectral SMACOF to perform image retargetting, the task of re-scaling an image in a non-uniform way such that the distortion will concentrate in non-salient regions. The saliency map is provided interactively by the user. We limit ourselves to horizontal re-scaling only, . To do that we solve the following optimization problem

 minα1σ(X0+Φα)+μ⟨α,Lα⟩s.t.∂xX0+∂xΦα1≥ϵ, (20)

i.e., we restrict the embedding to be monotonic horizontally, and add a Dirichlet energy expressed in the Laplace-Beltrami basis. The linear constraints require the solution of a quadratic program in each iteration, instead of the update (14). Although the constraint has to be enforced at all points on the grid, we observed that enforcing the constraint only at the sampled points is enough, provided that the sampling is dense enough and that is large enough. The resulting quadratic program is pretty small, consisting of a few hundreds of variables and constraints, and we use Matlab’s quadprog to solve it. We point out that even without the linear constraints, solving MDS problem of this size, consisting of millions of variables, is practically impossible with standard techniques.

Since the grid is Euclidean, the LBO basis is just the Fourier series, which can be computed in closed form. The initial manifold on which we construct the basis is the isotropically re-scaled image. We use sliding boundary conditions (i.e., Dirichlet on the horizontal boundaries and Neumann on the vertical). The results are presented in Figure (2). For this application we used the following parameters: , , the image size is , the deformation grid size is . The size of the downscaled image is , = half the horizontal spacing between deformation grid-cells. Figure 2: Left: Original panoramic image with salient regions circled in red . Right: Retargetted panorama, downscaled horizontally to half the size. Notice how the distortion is concentrated primarily in the non-salient regions.

## 7 Conclusion

In this paper we showed that using a spectral representation for the embedding coordinates in LS-MDS problems arising in geometry processing uncovers a multi-resolution property of the numerical optimization scheme, which improves the convergence rate by orders of magnitude compared to other approaches, enabling its application to larger size problems and to more complex MDS models. The acceleration comes from two factors: First, we show experimentally that for various geometric optimization problems the displacement field between the final embedding and the initial embedding can be faithfully modeled as a bandlimited signal on the manifold, using only the first eigenvectors of the Laplacian, reducing the number of variables by a factor of . Second, we observed that for such signals we can sample the spatial data by a rate of and still converge to a local minimum of the full stress as if all the data was used. The quality of the embedding depends on the number of eigenvectors used and the number of points sampled, and changes from shape to shape. The relationship between , and the approximation quality is only an observational result at this stage, and should be made more precise in future research.

In order to apply the method, the leading eigenvectors of the discrete Laplace-Beltrami operator have to be computed in advance. Though this is possible for moderately sized meshes and for specific larger size meshes with special structure for which a closed form expression for the eigenvectors exist, it remains the bottleneck in applying this method to very large meshes. A potential resolution to this bottleneck may come from the fact that the full basis vectors are only needed for the post process interpolation, which can be reformulated as an optimization problem that does not require the computation of the basis explicitly. We intend to explore this issue in future research.

#### 7.0.1 Acknowledgements

This research was supported by the ERC StG grant no. 335491 and the ERC StG grant no. 307048 (COMET). Part of this research was carried out during a stay with Intel Perceptual Computing Group, Haifa, Israel.

## References

•  Toolbox for surface comparison and analysis. http://tosca.cs.technion.ac.il/book/resources_data.html
•  Aflalo, Y., Kimmel, R.: Spectral multidimensional scaling. Proceedings of the National Academy of Sciences 110(45), 18052–18057 (2013)
•  Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation 15(6), 1373–1396 (2003)
• 

Bengio, Y., Paiement, J.f., Vincent, P., Delalleau, O., Roux, N.L., Ouimet, M.: Out-of-sample extensions for lle, isomap, mds, eigenmaps, and spectral clustering. In: Advances in Neural Information Processing Systems. pp. 177–184 (2004)

•  Borg, I., Groenen, P.J.: Modern multidimensional scaling: Theory and applications. Springer Science & Business Media (2005)
•  Boscaini, D., Eynard, D., Bronstein, M.M.: Shape-from-intrinsic operator. arXiv preprint arXiv:1406.1925 (2014)
•  Brandes, U., Pich, C.: Eigensolver methods for progressive multidimensional scaling of large data. In: International Symposium on Graph Drawing. pp. 42–53. Springer (2006)
•  Bronstein, A.M., Bronstein, M.M., Kimmel, R.: Numerical geometry of non-rigid shapes. Springer Science & Business Media (2008)
•  Bronstein, M.M., Bronstein, A.M., Kimmel, R., Yavneh, I.: Multigrid multidimensional scaling. Numerical linear algebra with applications 13(2-3), 149–171 (2006)
• 

Buja, A., Swayne, D.F., Littman, M.L., Dean, N., Hofmann, H., Chen, L.: Data visualization with multidimensional scaling. Journal of Computational and Graphical Statistics 17(2), 444–472 (2008)

•  Cox, T.F., Cox, M.A.: Multidimensional scaling. CRC press (2000)
•  De Leeuw, J., Barra, I.J., Brodeau, F., Romier, G., Van Cutsem, B., et al.: Applications of convex analysis to multidimensional scaling. In: Recent Developments in Statistics. Citeseer (1977)
• 

De Leeuw, J., Heiser, W.J.: Multidimensional scaling with restrictions on the configuration. Multivariate analysis 5, 501–522 (1980)

•  De Silva, V., Tenenbaum, J.B.: Global versus local methods in nonlinear dimensionality reduction. Advances in neural information processing systems pp. 721–728 (2003)
•  Elad, A., Kimmel, R.: On bending invariant signatures for surfaces. IEEE Transactions on pattern analysis and machine intelligence 25(10), 1285–1295 (2003)
•  Eldar, Y., Lindenbaum, M., Porat, M., Zeevi, Y.Y.: The farthest point strategy for progressive image sampling. IEEE Transactions on Image Processing 6(9), 1305–1315 (1997)
•  Guttman, L.: A general nonmetric technique for finding the smallest coordinate space for a configuration of points. Psychometrika 33(4), 469–506 (1968)
•  Kalofolias, V., Bresson, X., Bronstein, M., Vandergheynst, P.: Matrix completion on graphs. arXiv preprint arXiv:1408.1717 (2014)
•  Kovalsky, S.Z., Galun, M., Lipman, Y.: Accelerated quadratic proxy for geometric optimization. ACM Transactions on Graphics (TOG) 35(4), 134 (2016)
•  Kruskal, J.B.: Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29(1), 1–27 (1964)
•  Kruskal, J.B.: Nonmetric multidimensional scaling: a numerical method. Psychometrika 29(2), 115–129 (1964)
•  Lawrence, J., Arietta, S., Kazhdan, M., Lepage, D., O’Hagan, C.: A user-assisted approach to visualizing multidimensional images. IEEE transactions on visualization and computer graphics 17(10), 1487–1498 (2011)
•  Ovsjanikov, M., Ben-Chen, M., Solomon, J., Butscher, A., Guibas, L.: Functional maps: a flexible representation of maps between shapes. ACM Transactions on Graphics (TOG) 31(4),  30 (2012)
•  Pinkall, U., Polthier, K.: Computing discrete minimal surfaces and their conjugates. Experimental mathematics 2(1), 15–36 (1993)
•  Puy, G., Tremblay, N., Gribonval, R., Vandergheynst, P.: Random sampling of bandlimited signals on graphs. Applied and Computational Harmonic Analysis (2016)
•  Rosman, G., Bronstein, A.M., Bronstein, M.M., Sidi, A., Kimmel, R.: Fast multidimensional scaling using vector extrapolation. SIAM J. Sci. Comput 2 (2008)
•  Saul, L.K., Weinberger, K.Q., Ham, J.H., Sha, F., Lee, D.D.: Spectral methods for dimensionality reduction. Semisupervised learning pp. 293–308 (2006)
• 

Shahid, N., Kalofolias, V., Bresson, X., Bronstein, M., Vandergheynst, P.: Robust principal component analysis on graphs. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2812–2820 (2015)

•  Shamai, G., Aflalo, Y., Zibulevsky, M., Kimmel, R.: Classical scaling revisited. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2255–2263 (2015)
•  Shepard, R.N.: The analysis of proximities: multidimensional scaling with an unknown distance function. i. Psychometrika 27(2), 125–140 (1962)
•  Torgerson, W.S.: Multidimensional scaling: I. theory and method. Psychometrika 17(4), 401–419 (1952)
•  Yuan, Y.x.: A review on subspace methods for nonlinear optimization. In: Proceedings of the International Congress of Mathematics. pp. 807–827 (2014)