1 Problem formulation
This paper focuses on solving a generic inverse problem of recovering a causal factor from observations such that . Here the function is a generic loss function which aligns the observations with the variable (possibly via other causal factors, e.g. or in Sections 3.2 and 3.3).
If is ill-conditioned (for example when ), we want to recover the matrix under the assumption that its columns lie near a low-dimensional non-linear manifold. This can be done by solving a constrained optimization problem of the following form:
where is the non-linear mapping of matrix from the input space to the feature space (also commonly referred to as the Reproducing Kernel Hilbert Space), via a non-linear mapping function associated with a Mercer kernel such that .
In this paper we present a novel energy minimization framework to solve problems of the general form (1).
As our first contribution, we relax the problem (1) by using the trace norm of , the convex surrogate of the rank function, as a penalization function. The trace norm of a matrix is the sum of its singular values and was proposed as a tight convex relaxation of the rank (Footnote 1: more precisely, was shown to be the tight convex envelope of , where represents the spectral norm of ). It is used in many vision problems as a rank regularizer. Although rank minimization via trace norm relaxation does not lead to a convex problem in the presence of a non-linear kernel function, we show in Section 2.2 that it leads to a closed-form solution for denoising a kernel matrix by penalizing the rank of the recovered data () directly in the feature space.
With these changes we can rewrite (1) as:
where is a regularization strength. (Footnote 2: can also be viewed as a Lagrange multiplier for the constraints in (1).)
It is important to notice that although the rank of the kernel matrix is equal to the rank of , is merely . Thus, directly penalizing the sum of the singular values of will not encourage low rank in the feature space. (Footnote 3: Although it is clear that relaxing the rank of the kernel matrix to is suboptimal, works like [17, 7], which use a variational definition of the nuclear norm, allude to the possibility of kernelization. Further investigation is required to compare this counterpart to our tighter relaxation.)
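The distinction between penalizing the kernel matrix and penalizing the feature matrix can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a hypothetical explicit feature matrix `Phi` (so the linear-kernel case, where the feature map is available), builds `K = Phi.T @ Phi`, and compares the two quantities using the standard identities that the trace norm of a positive semidefinite matrix equals its trace and that the singular values of `Phi` are the square roots of the eigenvalues of `K`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical explicit feature matrix: 5-dimensional features for 8 points,
# constructed to have rank 2 (assumption made purely for illustration).
Phi = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 8))
K = Phi.T @ Phi  # kernel (Gram) matrix of the 8 points

sv_phi = np.linalg.svd(Phi, compute_uv=False)
eig_K = np.clip(np.linalg.eigvalsh(K), 0.0, None)  # clip round-off negatives

trace_norm_phi = sv_phi.sum()   # sum of singular values of Phi
trace_norm_K = eig_K.sum()      # sum of eigenvalues of K (= trace of K)
fro_sq_phi = np.sum(Phi ** 2)   # squared Frobenius norm of Phi

print("rank(Phi), rank(K):", np.linalg.matrix_rank(Phi), np.linalg.matrix_rank(K))
print("trace norm of K    :", trace_norm_K)
print("||Phi||_F^2        :", fro_sq_phi)
print("trace norm of Phi  :", trace_norm_phi)
```

The trace norm of `K` coincides with the squared Frobenius norm of `Phi`, which measures total energy rather than rank, whereas the sum of the square roots of the eigenvalues of `K` recovers the trace norm of `Phi`.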
Although we have relaxed the non-convex rank function, (2) is in general difficult to minimize due to the implicitness of the feature space. Most widely used kernel functions, like RBF, do not have an explicit definition of the function . Moreover, the feature space for many kernels is high- (possibly infinite-) dimensional, leading to intractability. These issues are identified as the main barriers to robust KPCA and pre-image estimation. Thus, we have to reformulate (2) by applying the kernel trick, so that the cost function (2) can be expressed in terms of the kernel function alone.
The key insight here is that, under the assumption that the kernel matrix is positive semidefinite, we can factorize it as: . Although this factorization is non-unique, it is trivial to show the following:
where is the function mapping the input matrix to its largest eigenvalue.
The row space of matrix in (3) can be seen to span the eigenvectors associated with the kernel matrix, hence the principal components of the non-linear manifold we want to estimate.
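A minimal sketch of this factorization follows, assuming an RBF kernel of the form exp(-gamma * ||x - y||^2) and a factor obtained from the eigendecomposition of the kernel matrix. The helper names `rbf_kernel` and `factorize_psd` and the value of `gamma` are illustrative assumptions, not part of the original formulation. It shows that the trace norm of any such factor depends only on the eigenvalues of the kernel matrix, so it can be evaluated without an explicit feature map.

```python
import numpy as np

def rbf_kernel(S, gamma):
    """Kernel matrix for the columns of S, assuming k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = np.sum(S ** 2, axis=0)
    d2 = np.clip(sq[:, None] + sq[None, :] - 2.0 * S.T @ S, 0.0, None)
    return np.exp(-gamma * d2)

def factorize_psd(K):
    """Return one factor C with C.T @ C == K (the factorization is not unique)."""
    lam, U = np.linalg.eigh(K)
    lam = np.clip(lam, 0.0, None)        # guard against round-off negatives
    return np.sqrt(lam)[:, None] * U.T   # C = diag(sqrt(lam)) @ U.T

rng = np.random.default_rng(1)
S = rng.standard_normal((3, 10))         # 10 points in a 3-dimensional input space
K = rbf_kernel(S, gamma=0.5)
C = factorize_psd(K)

# Trace norm of the factor versus the sum of square-rooted eigenvalues of K.
trace_norm_C = np.linalg.svd(C, compute_uv=False).sum()
trace_norm_from_K = np.sqrt(np.clip(np.linalg.eigvalsh(K), 0.0, None)).sum()

# Any orthogonal transform of C is an equally valid factor with the same trace norm.
Q, _ = np.linalg.qr(rng.standard_normal((10, 10)))
C_rot = Q @ C
```

Because `C_rot.T @ C_rot` equals `K` as well, the non-uniqueness of the factor does not affect the value of the regularizer.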
The above minimization can be solved with a soft relaxation of the manifold constraint by assuming that the columns of lie near the non-linear manifold.
As , the optimum of (5) approaches the optimum of (4). A local optimum of (4) can be achieved using the penalty method, by optimizing (5) while iteratively increasing as explained in Section 2.
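The behaviour of a quadratic penalty method can be illustrated on a toy problem that is unrelated to (4) and (5) except in structure. The sketch below assumes the hypothetical problem of minimizing x^2 + y^2 subject to x + y = 1, replaces the constraint by a penalty term weighted by `rho`, and increases `rho` geometrically. The constraint violation shrinks as the penalty weight grows, which mirrors how the soft formulation approaches the constrained one.

```python
import numpy as np

def penalized_minimizer(rho):
    """Minimize x^2 + y^2 + (rho / 2) * (x + y - 1)^2 in closed form."""
    # Stationarity condition: (2 I + rho * a a^T) z = rho * a, with a = [1, 1].
    a = np.ones(2)
    H = 2.0 * np.eye(2) + rho * np.outer(a, a)
    return np.linalg.solve(H, rho * a)

rho = 1.0
history = []  # (penalty weight, constraint violation) pairs
for _ in range(8):
    z = penalized_minimizer(rho)
    history.append((rho, abs(z.sum() - 1.0)))
    rho *= 10.0  # tighten the soft constraint

for weight, violation in history:
    print(f"rho = {weight:.0e}, |x + y - 1| = {violation:.2e}")
```

In a non-convex setting each inner solve would be warm-started from the previous solution and would only reach a local optimum; the closed-form inner solve here is a convenience of the toy problem.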
Before moving on, we would like to discuss some alternative interpretations of (5) and its relationship to previous work, in particular LVMs. Intuitively, we can also interpret (5) from the probabilistic viewpoint commonly used in latent variable model based approaches to define the kernel function . For example, an RBF kernel with additive Gaussian noise and inverse width can be defined as: , where . In other words, with a finite , our model allows the data points to lie near a non-linear low-rank manifold instead of on it. It is worth noting here that, like LVMs, our energy formulation also attempts to maximize the likelihood of regenerating the training data (by choosing to be a simple least squares cost) while doing dimensionality reduction.
Note that in closely related work, continuous rank penalization (with a logarithmic prior) has also been used for robust probabilistic non-linear dimensionality reduction and model selection in the LVM framework. However, unlike [15, 20], where the non-linearities are modeled in a latent space (of predefined dimensionality), our approach directly penalizes the non-linear dimensionality of the data in a KPCA framework and is applicable to solving inverse problems without pre-training.
We approach the optimization of (5) by solving the following two sub-problems in alternation:
2.1 Pre-image estimation to solve inverse problem.
Subproblem (6) can be seen as a generalized pre-image estimation problem: we seek the factor , which is the pre-image of the projection of onto the principal subspace of the RKHS stored in , which best explains the observation . Here (6) is generally a non-convex problem, unless the Mercer kernel is linear, and must therefore be solved using non-linear optimization techniques. In this work, we use the Levenberg-Marquardt algorithm for optimizing (6).
Notice that (6) only computes the pre-image for the feature space projections of the data points with which the non-linear manifold (matrix ) is learned. An extension to our formulation is desirable if one wants to use the learned non-linear manifold for denoising test data in a classic pre-image estimation framework. Although a valuable direction to pursue, it is out of scope of the present paper.
2.2 Robust dimensionality reduction
Although (7) is non-convex, we can solve it in closed form via singular value decomposition. Our closed-form solution is outlined in Algorithm 2 and is based on the following theorem:
With let denote its singular value decomposition. Then
A minimizer of (8) is given by
with , , where denotes the depressed cubic . is the set of n-by-n diagonal matrices with non-negative entries.
As the closed-form solution to (7) is a key contribution of this work, an extended proof of the above theorem is included in Appendix 0.A. Theorem 2.1 shows that each eigenvalue of the minimizer of (7) can be obtained by solving a depressed cubic whose coefficients are determined by the corresponding eigenvalue of the kernel matrix and the regularization strength . The roots of each cubic, together with zero, comprise a set of candidates for the corresponding eigenvalue of . The best one from this set is obtained by choosing the value which minimizes (9) (see Algorithm 2).
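The candidate-selection procedure described above can be sketched as follows. This is an illustration under an explicit assumption about the objective: it takes the subproblem to be the minimization of (rho/2) * ||K - L^T L||_F^2 + tau * ||L||_* over L, so that each per-eigenvalue cost is (rho/2) * (lam - g^2)^2 + tau * g and its stationary points are roots of the depressed cubic g^3 - lam * g + tau / (2 * rho). The function name `robust_kernel_factor` and the parameter names `rho` and `tau` are hypothetical.

```python
import numpy as np

def robust_kernel_factor(K, rho, tau):
    """Sketch of a closed-form solve for the assumed objective
    (rho / 2) * ||K - L.T @ L||_F^2 + tau * ||L||_*  (K symmetric PSD)."""
    lam, U = np.linalg.eigh(K)
    lam = np.clip(lam, 0.0, None)
    gammas = np.zeros_like(lam)
    for i, l in enumerate(lam):
        # Stationary points of the scalar cost solve g^3 - l*g + tau/(2*rho) = 0.
        roots = np.roots([1.0, 0.0, -l, tau / (2.0 * rho)])
        real = roots[np.abs(roots.imag) < 1e-9].real
        # Candidates: zero plus every positive real root of the cubic.
        cands = np.concatenate(([0.0], real[real > 0.0]))
        costs = 0.5 * rho * (l - cands ** 2) ** 2 + tau * cands
        gammas[i] = cands[np.argmin(costs)]
    # Rows of L are eigenvectors of K scaled by the selected values.
    return gammas[:, None] * U.T, gammas

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
K_demo = A @ A.T                      # a hypothetical PSD "kernel" matrix
L_demo, g_demo = robust_kernel_factor(K_demo, rho=1.0, tau=0.5)
print("selected values:", np.round(g_demo, 3))
```

Under this assumed objective, small eigenvalues of the kernel matrix are mapped to exactly zero once `tau` is large enough relative to `rho`, which is how the rank reduction arises.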
In this section we demonstrate the utility of the proposed algorithm. The aims of our experiments are twofold: (i) to compare our dimensionality reduction technique favorably with KPCA and its robust variants; and (ii) to demonstrate that the proposed non-linear dimensionality regularizer consistently outperforms its linear counterpart (a.k.a. nuclear norm) in solving inverse problems.
3.1 Validating the closed form solution
Given the relaxations proposed in Section 1, our assertion that the novel trace regularization based non-linear dimensionality reduction is robust needs to be substantiated. To that end, we evaluate our closed-form solution of Algorithm 2 on the standard oil flow dataset.
This dataset comprises 1000 training and 1000 testing data samples, each of which has 12 dimensions and is categorized into one of three different classes. We add zero-mean Gaussian noise with variance to the training data (Footnote 4: note that our formulation assumes Gaussian noise in , whereas for this evaluation we add noise to directly) and recover the low-dimensional manifold for this noisy training data with KPCA, contrasting this with the results from Algorithm 2. An inverse width of the Gaussian kernel is used for all the experiments on the oil flow dataset.
It is important to note that in this experiment, we only estimate the principal components (and their variances) that explain the estimated non-linear manifold, i.e. matrix by Algorithm 2, without reconstructing the denoised version of the corrupted data samples.
Both KPCA and our solution require model selection (choice of rank and respectively), which is beyond the scope of this paper. Here we resort to evaluating the performance of both methods under different parameter settings. To quantify the accuracy of the recovered manifold () we use the following criteria:
Manifold Error: A good manifold should preserve maximum variance of the data, i.e. it should be able to generate a denoised version of the noisy kernel matrix . We define the manifold estimation error as , where is the kernel matrix derived using noise-free data. Figure 3 shows the manifold estimation error for KPCA and our method for different rank and parameter respectively. (Footnote 5: errors from the non-noisy kernel matrix can be replaced by cross-validating the entries of the kernel matrix for model selection in a more realistic experiment.)
Classification error: The accuracy of a non-linear manifold is often also tested by the nearest neighbor classification accuracy. We select the estimated manifold which gives the minimum Manifold Error for both methods and report the 1NN classification error (percentage of misclassified examples) of the 1000 test points by projecting them onto the estimated manifolds.
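The 1NN classification error used as the second criterion can be computed as in the following sketch. It assumes that the training and test points have already been projected onto the estimated manifold and are given as rows of low-dimensional coordinates; the projection step itself is not shown, and the function name `one_nn_error` is illustrative.

```python
import numpy as np

def one_nn_error(train_X, train_y, test_X, test_y):
    """Percentage of test rows whose nearest training row has a different label."""
    # Squared Euclidean distances between every test row and every training row.
    d2 = (np.sum(test_X ** 2, axis=1)[:, None]
          + np.sum(train_X ** 2, axis=1)[None, :]
          - 2.0 * test_X @ train_X.T)
    nearest = np.argmin(d2, axis=1)
    return 100.0 * np.mean(train_y[nearest] != test_y)

# Tiny hypothetical example: two training points, three test points.
train_X = np.array([[0.0, 0.0], [10.0, 10.0]])
train_y = np.array([0, 1])
test_X = np.array([[1.0, 1.0], [9.0, 9.0], [0.5, 0.0]])
test_y = np.array([0, 1, 1])   # the last point is deliberately mislabeled
error_percent = one_nn_error(train_X, train_y, test_X, test_y)
```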
Table 1 shows that the proposed method outperforms KPCA, generating less noisy manifold representations at different ranks and giving better classification results than KPCA on test data. The differences become more significant as the amount of noise increases. This simple experiment evaluates our Robust KPCA solution in isolation and indicates that our closed-form solution itself can be beneficial for problems where pre-image estimation is not required. Note that we have used no loss function in this experiment; however, further investigation into more suitable classification loss functions should lead to better results.
|Manifold Error||Classification Error|
|STD||KPCA||Our CFS||KPCA||Our CFS|
Table 1: Robust dimensionality reduction accuracy by KPCA versus our closed-form solution (CFS) on the full oil flow dataset. Columns from left to right represent: (1) standard deviation of the noise in the training samples; (2-3) error in the estimated low-dimensional kernel matrix by (2) KPCA and (3) our closed-form solution; (4-5) nearest neighbor classification error of test data using (4) KPCA and (5) our closed-form solution.
3.2 Matrix completion
The nuclear norm has been introduced as a low rank prior originally for solving the matrix completion problem. Thus, it is natural to evaluate its non-linear extensions on the same task. Assuming to be the input matrix and a binary matrix specifying the availability of the observations in , Algorithm 1 can be used for recovering a complete matrix with the following choice of :
where represents Hadamard product.
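A masked least-squares data term of this kind can be sketched as below. The argument names `observed`, `mask` and `estimate` are illustrative stand-ins for the input matrix, the binary availability matrix and the recovered matrix, and the squared Frobenius norm is an assumption consistent with the least-squares costs used in the experiments.

```python
import numpy as np

def masked_ls_loss(observed, mask, estimate):
    """Squared Frobenius norm of the residual, counted on available entries only."""
    # The elementwise (Hadamard) product with the binary mask zeroes out
    # residuals at positions where no observation is available.
    return np.sum((mask * (observed - estimate)) ** 2)

observed = np.array([[1.0, 2.0], [3.0, 4.0]])
mask = np.array([[1.0, 0.0], [1.0, 1.0]])        # entry (0, 1) is missing
estimate = np.array([[1.0, 100.0], [2.0, 4.0]])
loss = masked_ls_loss(observed, mask, estimate)   # only the (1, 0) residual counts
```

The large disagreement at the missing entry does not contribute to the loss, so that entry is determined entirely by the regularizer.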
|mean||13 ± 4||28 ± 4||70 ± 9||139 ± 7|
|1-NN||5 ± 3||14 ± 5||90 ± 20||NA|
|PPCA||3.7 ± .6||9 ± 2||50 ± 10||140 ± 30|
|PKPCA||5 ± 1||12 ± 3||32 ± 6||100 ± 20|
|RKPCA||3.2 ± 1.9||8 ± 4||27 ± 8||83 ± 15|
To demonstrate the robustness of our algorithm on the matrix completion problem, we choose 100 training samples from the oil flow dataset described in Section 3.1 and randomly remove elements from the data with a varying range of probabilities to test the performance of the proposed algorithm against various baselines. Following an existing experimental setup, we repeat the experiments with 50 different samples of . We report the mean and standard deviation of the root mean square reconstruction error for our method with the choice of , alongside five different methods in Table 2. Our method significantly improves the performance of missing data completion compared to other robust extensions of KPCA [29, 27, 22], for every probability of missing data.
Although we restrict our experiments to least-squares cost functions, it is vital to restate here that our framework could trivially incorporate robust functions like the norm instead of the Frobenius norm (as a robust data term) to generalize algorithms like Robust PCA to their non-linear counterparts.
3.3 Kernel non-rigid structure from motion
Non-rigid structure from motion under orthography is an ill-posed problem where the goal is to estimate the camera locations and 3D structure of a deformable object from a collection of 2D images which are labeled with landmark correspondences.
Assuming to be the 3D location of point on the deformable object in the image, its orthographic projection can be written as , where is an orthographic projection matrix. Notice that as the object deforms, even with given camera poses, reconstructing the sequence by least-squares reprojection error minimization is an ill-posed problem. A seminal work proposed to solve this problem with the additional assumption that the reconstructed shapes lie on a low-dimensional linear subspace and can be parameterized as linear combinations of a relatively low number of basis shapes. NRSfM was then cast as the low-rank factorization problem of estimating these basis shapes and corresponding coefficients.
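The ill-posedness under orthography can be made concrete with a short sketch. It assumes that the orthographic projection matrix consists of the first two rows of a 3x3 rotation matrix, which is a common convention; the helper names `rotation_y` and `orthographic_projection` are hypothetical. Two different 3D shapes that differ only along the viewing direction produce identical 2D measurements.

```python
import numpy as np

def rotation_y(theta):
    """Rotation by theta radians about the y axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def orthographic_projection(R, X):
    """Project 3xN points X using the first two rows of the rotation R."""
    P = R[:2, :]          # 2x3 orthographic projection matrix
    return P @ X

rng = np.random.default_rng(0)
R = rotation_y(0.3)
X = rng.standard_normal((3, 5))                  # 5 points of one 3D shape
w = orthographic_projection(R, X)

# Move every point by an arbitrary amount along the viewing direction R[2].
X_alt = X + np.outer(R[2], rng.standard_normal(5))
w_alt = orthographic_projection(R, X_alt)
print("same projections:", np.allclose(w, w_alt))
```

Since the per-point depth is unobservable in a single frame, additional priors on the shapes, such as the low-rank assumption discussed here, are needed to make the reconstruction well defined.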
Recent works like [10, 13] have shown that the trace norm regularizer can be used as a convex envelope of the low-rank prior to robustly address the ill-posed nature of the problem. A good solution to NRSfM can be achieved by optimizing:
where is the shape matrix whose columns are dimensional vectors storing the 3D coordinates of the shapes and is a binary variable indicating whether the projection of point is available in image .
Assuming the projection matrices to be fixed, this problem is convex and can be solved exactly with standard convex optimization methods. Additionally, if the 2D projections are noise free, optimizing (12) with very small corresponds to selecting, out of the many solutions with (almost) zero projection error, the one which has minimum trace norm. Henceforth, optimization of (12) is referred to as the trace norm heuristic (TNH). We solve this problem with a first order primal-dual variant of an existing algorithm, which can handle missing data. The algorithm is detailed and compared favorably with other NRSfM methods in the supplementary material. (Footnote 6: TNH is used as a strong baseline and has been validated on the full length CMU mocap sequences. It marginally outperforms the approach known to be the state of the art in NRSfM without missing data.)
A simple kernel extension of the above optimization problem is:
where is the non-linear mapping of to the feature space using an RBF kernel.
With fixed projection matrices , (13) is of the general form (2), for which a local optimum can be found using Algorithm 1. To solve the NRSfM problem with unknown projection matrices, we parameterize each with quaternions and alternate between refining the 3D shapes and the projection matrices using LM.
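A quaternion parameterization of an orthographic projection matrix can be sketched as follows. The scalar-first ordering (w, x, y, z), the normalization inside the function, and the name `quat_to_rotation` are assumptions made for illustration; the refinement with LM is not shown.

```python
import numpy as np

def quat_to_rotation(q):
    """3x3 rotation matrix from a quaternion given as (w, x, y, z)."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def quat_to_projection(q):
    """2x3 orthographic projection matrix: first two rows of the rotation."""
    return quat_to_rotation(q)[:2, :]

# A 90 degree rotation about the z axis, written as a unit quaternion.
q_z90 = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
R_z90 = quat_to_rotation(q_z90)
P_z90 = quat_to_projection(q_z90)
```

Normalizing inside the function keeps every parameter vector a valid rotation, so an unconstrained optimizer can update the four numbers freely.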
3.3.1 Results on the CMU dataset
|Dataset||No Missing Data||Missing Data|
We use a sub-sampled version of the CMU mocap dataset by selecting every frame of the smoothly deforming human body consisting of 41 mocap points. (Footnote 7: since our main goal is to validate the usefulness of the proposed non-linear dimensionality regularizer, we opt for a reduced size dataset for more rapid and flexible evaluation.)
In our first set of experiments we use ground truth camera projection matrices to compare our algorithm against TNH. The advantage of this setup is that with ground-truth rotation and no noise, we can avoid model selection (finding the optimal regularization strength ) by setting it low enough. We run the TNH with and use this reconstruction as initialization for Algorithm 1. For the proposed method, we set and use the following RBF kernel width selection approaches:
Maximum distance criterion (): we set the maximum distance in the feature space to be . Thus, the kernel matrix entry corresponding to the shape pair obtained by TNH with maximum Euclidean distance becomes .
Median distance criterion (): the kernel matrix entry corresponding to the median Euclidean distance is set to 0.5.
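Both criteria amount to choosing the kernel width so that a reference distance maps to a prescribed kernel value. A minimal sketch follows, assuming an RBF kernel of the form exp(-gamma * d^2) and taking the median over distinct shape pairs. The function names are hypothetical, and the target value `0.01` used for the maximum distance criterion is only a placeholder, since the actual value is not reproduced here.

```python
import numpy as np

def rbf_gamma_for_target(distance, target_value):
    """gamma such that exp(-gamma * distance**2) == target_value."""
    return -np.log(target_value) / distance ** 2

def pairwise_distances(S):
    """Euclidean distances between the columns of S."""
    sq = np.sum(S ** 2, axis=0)
    d2 = np.clip(sq[:, None] + sq[None, :] - 2.0 * S.T @ S, 0.0, None)
    return np.sqrt(d2)

rng = np.random.default_rng(0)
S_init = rng.standard_normal((6, 20))     # stand-in for the initial shapes
D = pairwise_distances(S_init)
off_diag = D[np.triu_indices_from(D, k=1)]  # distances between distinct pairs

# Median distance criterion: the median distance maps to a kernel value of 0.5.
gamma_med = rbf_gamma_for_target(np.median(off_diag), 0.5)

# Maximum distance criterion with a placeholder target value (assumption).
gamma_max = rbf_gamma_for_target(off_diag.max(), 0.01)
```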
Following the standard protocol in [10, 2], we quantify the reconstruction results with normalized mean 3D errors , where is the Euclidean distance of a reconstructed point in frame from the ground truth, is the mean of the standard deviations of the 3 coordinates of the ground truth 3D structures, and are the number of input images and the number of points reconstructed.
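The error measure described above can be sketched as follows, under the assumption that it is the sum of point-wise Euclidean errors divided by the product of the normalizer, the number of frames and the number of points, and that the normalizer averages the per-frame standard deviations of the x, y and z coordinates. Array shapes and the function name are illustrative, and any alignment between reconstruction and ground truth is assumed to have been done beforehand.

```python
import numpy as np

def normalized_mean_3d_error(recon, gt):
    """recon, gt: arrays of shape (F, 3, N) holding F frames of N 3D points."""
    F, _, N = gt.shape
    e_fn = np.linalg.norm(recon - gt, axis=1)    # (F, N) per-point distances
    # Mean over frames and coordinates of the per-frame coordinate std.
    sigma = np.mean(np.std(gt, axis=2))
    return e_fn.sum() / (sigma * F * N)

rng = np.random.default_rng(0)
gt = rng.standard_normal((4, 3, 7))
shifted = gt.copy()
shifted[:, 0, :] += 2.0                           # every point is off by 2 along x
err = normalized_mean_3d_error(shifted, gt)
```

Dividing by the coordinate spread makes the measure invariant to a global rescaling of the scene, so sequences of different physical size can be compared.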
Table 3 shows the results of the TNH and non-linear dimensionality regularization based methods using the experimental setup explained above, both without missing data and after randomly removing 50% of the image measurements. Our method consistently beats the TNH baseline and improves the mean reconstruction error by with full data and by when used with 50% missing data. Figure 1 shows a qualitative comparison of the 3D reconstructions obtained using TNH and the proposed non-linear dimensionality regularization technique for some sample frames from various sequences. We refer readers to the supplementary material for more visualizations.
Table 4 shows the reconstruction performance on a more realistic experimental setup, with the modification that the camera projection matrices are initialized with rigid factorization and refined with the shapes by optimizing (3). The regularization strength was selected for the TNH method by golden section search and parabolic interpolation for every test case independently. This ensures the best possible performance for the baseline. For our proposed approach, was kept to for all sequences for both missing data and full data NRSfM. This experimental protocol somewhat disadvantages the non-linear method, since its performance can be further improved by a judicious choice of the regularization strength.
However, our purpose is primarily to show that the non-linear method adds value even without time-consuming per-sequence tuning. To that end, note that despite large errors in the camera pose estimations by TNH and missing measurements, the proposed method shows significant () improvements in terms of reconstruction errors, supporting our broader claims that non-linear representations are better suited for modeling real data, and that our robust dimensionality regularizer can improve inference for ill-posed problems.
|Dataset||No Missing Data||Missing Data|
As suggested in prior work, robust camera pose initialization is beneficial for structure estimation. We have used rigid factorization for initializing camera poses here, but this can be trivially changed. We hope that further improvements can be made by choosing better kernel functions, with cross-validation based model selection (value of ) and with a more appropriate tuning of the kernel width. Selecting a suitable kernel and its parameters is crucial for the success of kernelized algorithms. It becomes more challenging when no training data is available. We hope to explore other kernel functions and parameter selection criteria in our future work.
We would also like to contrast our work with the only work we are aware of in which non-linear dimensionality reduction is attempted for NRSfM. While estimating the shapes lying on a two dimensional non-linear manifold, that method additionally assumes smooth 3D trajectories (parametrized with a low frequency DCT basis) and a pre-defined hard linear rank constraint on 3D shapes. The method relies on a sparse approximation of the kernel matrix as a proxy for dimensionality reduction. The reported results were hard to replicate under our experimental setup for a fair comparison due to non-smooth deformations. However, in contrast to that method, our algorithm is applicable in a more general setup, can be modified to incorporate smoothness priors and robust data terms and, more importantly, is flexible enough to integrate with a wide range of energy minimization formulations, leading to a larger applicability beyond NRSfM.
In this paper we have introduced a novel non-linear dimensionality regularizer which can be incorporated into an energy minimization framework, while solving an inverse problem. The proposed algorithm for penalizing the rank of the data in the feature space has been shown to be robust to noise and missing observations. We have picked NRSfM as an application to substantiate our arguments and have shown that despite missing data and model noise (such as erroneous camera poses) our algorithm significantly outperforms state-of-the-art linear counterparts.
Although our algorithm currently uses slow solvers such as the penalty method and is not directly scalable to very large problems like dense non-rigid reconstruction, we are actively considering alternatives to overcome these limitations. An extension to estimate pre-images with a problem-specific loss function is possible, and this will be useful for online inference with pre-learned low-dimensional manifolds.
Given the success of non-linear dimensionality reduction in modeling real data and overwhelming use of the linear dimensionality regularizers in solving real world problems, we expect that proposed non-linear dimensionality regularizer will be applicable to a wide variety of unsupervised inference problems: recommender systems; 3D reconstruction; denoising; shape prior based object segmentation; and tracking are all possible applications.
Appendix 0.A Proof of Theorem 2.1
We will prove Theorem 2.1 by first establishing a lower bound for (8) and subsequently showing that this lower bound is attained at given by (10). The rotational invariance of the entering norms allows us to write (8) as:
Expanding (14) we obtain
Next, with in (8) we have
Finally, since the subproblems in (18) are separable in , their minimizers must be KKT points of the individual subproblems. As the constraints are simple non-negativity constraints, these KKT points are either (positive) stationary points of the objective functions or . It is simple to verify that the stationary points are given by the roots of the cubic function . Hence it follows that there exists a such that , which completes the proof.
[1] Abrahamsen, T.J., Hansen, L.K.: Input space regularization stabilizes pre-images for kernel PCA de-noising. In: IEEE International Workshop on Machine Learning for Signal Processing. pp. 1–6 (2009)
[2] Akhter, I., Sheikh, Y., Khan, S., Kanade, T.: Nonrigid structure from motion in trajectory space. In: Advances in Neural Information Processing Systems. pp. 41–48 (2009)
[3] Bakir, G.H., Weston, J., Schölkopf, B.: Learning to find pre-images. Advances in Neural Information Processing Systems 16(7), 449–456 (2004)
[4] Bishop, C.M., James, G.D.: Analysis of multiphase flows using dual-energy gamma densitometry and neural networks. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 327(2), 580–593 (1993)
[5] Blanz, V., Vetter, T.: A morphable model for the synthesis of 3D faces. In: 26th Annual Conference on Computer Graphics and Interactive Techniques. pp. 187–194 (1999)
[6] Bregler, C., Hertzmann, A., Biermann, H.: Recovering non-rigid 3D shape from image streams. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 690–696 (2000)
[7] Cabral, R., De la Torre, F., Costeira, J.P., Bernardino, A.: Unifying nuclear norm and bilinear factorization approaches for low-rank matrix decomposition. In: International Conference on Computer Vision (ICCV) (2013)
[8] Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Foundations of Computational Mathematics 9(6), 717–772 (2009)
[9] Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(6), 681–685 (2001)
[10] Dai, Y., Li, H., He, M.: A simple prior-free method for non-rigid structure-from-motion factorization. International Journal of Computer Vision 107(2), 101–122 (2014)
[11] Dame, A., Prisacariu, V.A., Ren, C.Y., Reid, I.: Dense reconstruction using 3D object shape priors. In: Computer Vision and Pattern Recognition. pp. 1288–1295. IEEE (2013)
[12] Fazel, M.: Matrix rank minimization with applications. Ph.D. thesis, Stanford University (2002)
[13] Garg, R., Roussos, A., Agapito, L.: Dense variational reconstruction of non-rigid surfaces from monocular video. In: Computer Vision and Pattern Recognition. pp. 1272–1279 (2013)
[14] Garg, R., Roussos, A., Agapito, L.: A variational approach to video registration with subspace constraints. International Journal of Computer Vision 104(3), 286–314 (2013)
[15] Geiger, A., Urtasun, R., Darrell, T.: Rank priors for continuous non-linear dimensionality reduction. In: Computer Vision and Pattern Recognition. pp. 880–887. IEEE (2009)
[16] Gotardo, P.F., Martinez, A.M.: Kernel non-rigid structure from motion. In: IEEE International Conference on Computer Vision. pp. 802–809 (2011)
[17] Huang, D., Cabral, R.S., De la Torre, F.: Robust regression. In: European Conference on Computer Vision (ECCV) (2012)
[18] Jolliffe, I.: Principal component analysis. Wiley Online Library (2002)
[19] Kwok, J.Y., Tsang, I.W.: The pre-image problem in kernel methods. IEEE Transactions on Neural Networks 15(6), 1517–1525 (2004)
[20] Lawrence, N.D.: Probabilistic non-linear principal component analysis with Gaussian process latent variable models. The Journal of Machine Learning Research 6, 1783–1816 (2005)
[21] Mika, S., Schölkopf, B., Smola, A.J., Müller, K.R., Scholz, M., Rätsch, G.: Kernel PCA and de-noising in feature spaces. In: NIPS. vol. 4, p. 7 (1998)
[22] Nguyen, M.H., De la Torre, F.: Robust kernel principal component analysis. In: Advances in Neural Information Processing Systems (2009)
[23] Nocedal, J., Wright, S.: Numerical optimization. Springer, New York (2006)
[24] Poling, B., Lerman, G., Szlam, A.: Better feature tracking through subspace constraints. In: Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. pp. 3454–3461. IEEE (2014)
[25] Prisacariu, V.A., Reid, I.: Nonlinear shape manifolds as shape priors in level set segmentation and tracking. In: Computer Vision and Pattern Recognition. pp. 2185–2192. IEEE (2011)
[26] Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review 52(3), 471–501 (2010)
[27] Sanguinetti, G., Lawrence, N.D.: Missing data in kernel PCA. In: Machine Learning: ECML 2006, pp. 751–758. Springer (2006)
[28] Schölkopf, B., Smola, A., Müller, K.R.: Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation 10(5), 1299–1319 (1998)
[29] Tipping, M.E., Bishop, C.M.: Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61(3), 611–622 (1999)
[30] Wright, J., Ganesh, A., Rao, S., Peng, Y., Ma, Y.: Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In: Advances in Neural Information Processing Systems, pp. 2080–2088 (2009)
[31] Zhou, X., Yang, C., Zhao, H., Yu, W.: Low-rank modeling and its applications in image analysis. ACM Computing Surveys (CSUR) 47(2), 36 (2014)