
Probabilistic Auto-Associative Models and Semi-Linear PCA

09/20/2012
by   Serge Iovleff, et al.

Auto-Associative models cover a large class of methods used in data analysis. In this paper, we describe the general properties of these models when the projection component is linear, and we propose and test an easy-to-implement Probabilistic Semi-Linear Auto-Associative model in a Gaussian setting. We show that it is a generalization of the PCA model to the semi-linear case. Numerical experiments on simulated datasets and a real astronomical application highlight the interest of this approach.


1. Introduction

Principal component analysis (PCA) [29, 20, 23] is a well-established tool for dimension reduction in multivariate data analysis. It benefits from a simple geometrical interpretation. Given a set of points x_1, ..., x_n in R^p and an integer d < p, PCA builds the d-dimensional affine subspace minimizing the Euclidean distance to the scatter-plot [29]. The application of principal component analysis implicitly postulates some form of linearity. More precisely, one assumes that the data cloud is directed, and that the data points can be well approximated by their projections onto the affine hyperplane corresponding to the first d principal components.
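
This geometric reading of PCA can be made concrete with a few lines of R; the following sketch (illustrative only, with our own variable names) reconstructs a cloud from its projection onto the first d principal axes and evaluates the residual distance that PCA minimizes.

    # Rank-d PCA reconstruction of a cloud X as the orthogonal projection
    # of the points onto the first d principal axes (illustrative sketch).
    set.seed(1)
    X  <- matrix(rnorm(200 * 3), ncol = 3)      # n = 200 points in R^3
    d  <- 2                                     # target dimension
    mu <- colMeans(X)
    pc <- prcomp(X, center = TRUE, scale. = FALSE)
    V  <- pc$rotation[, 1:d]                    # p x d matrix of principal axes
    Y  <- sweep(X, 2, mu) %*% V                 # projected coordinates (scores)
    Xhat <- sweep(Y %*% t(V), 2, mu, "+")       # restoration in R^p
    mean(rowSums((X - Xhat)^2))                 # mean squared distance, minimized by PCA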

Starting from this point of view, many authors have proposed nonlinear extensions of this technique. Principal curves or principal surfaces methods [16, 17, 6] belong to this family of approaches, as do non-linear transformations of the original data set [7, 3]. Auto-associative neural networks can also be viewed as a non-linear PCA model [2, 27, 4, 19]. In [13] we proposed auto-associative models (AAM) as candidates for the generalization of PCA, using a projection pursuit regression algorithm [9, 25] adapted to the auto-associative case. A common feature of these approaches is that they aim to estimate an auto-associative model, whose definition is given hereafter.

Definition 1.1.

A function F is an auto-associative function of dimension d if it is a map from R^p to R^p that can be written F = R ∘ P, where P (the "Projection") is a map from R^p to R^d (generally d < p) and R (the "Restoration" or the "Regression") is a map from R^d to R^p.

An auto-associative model (AAM) of dimension d is a manifold of the form

M = { x ∈ R^p : F(x) = x },

where F is an auto-associative function of dimension d.

For example, PCA constructs an auto-associative model using as auto-associative function an orthogonal projector onto an affine subspace of dimension d. More precisely, we have

F(x) = x̄ + Σ_{j=1}^{d} ⟨x − x̄, v_j⟩ v_j,

with x̄ the mean of the data and the orthonormal vectors v_1, ..., v_d chosen in order to maximize the projected variance. F can be written F = R ∘ P with

P(x) = (⟨x − x̄, v_1⟩, ..., ⟨x − x̄, v_d⟩)

and

R(y) = x̄ + Σ_{j=1}^{d} y_j v_j,

with y = (y_1, ..., y_d) ∈ R^d. The AAM is then the affine subspace given by the following equation

M = { x̄ + Σ_{j=1}^{d} y_j v_j : y ∈ R^d }.

The interested reader can check that principal curves, principal surfaces, auto-associative neural networks, kernel PCA [31], ISOMAP [36] and local linear embedding [30] also aim to estimate an AAM.

In the PCA approach the projection and the restoration functions are both linear. It is thus natural to say that PCA is a Linear Auto-Associative Model. In the general case, the manifold set can be empty (i.e. the auto-associative function has no fixed point) or very complicated to describe. Our aim in this paper is to study, from a theoretical and practical point of view, the properties of some auto-associative models in an intermediate situation between the PCA model and the general case: we will assume that the projection function is linear and let the regression function be arbitrary. We call the resulting AAMs Semi-Linear Auto-Associative Models (SLAAM).
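
To fix ideas, here is a small R sketch of a semi-linear auto-associative function in R^3 with a one-dimensional linear projection and a non-linear restoration; the names proj, rest and aaf are ours and simply stand for P, R and F = R ∘ P.

    # A toy semi-linear auto-associative function: linear projection P,
    # non-linear restoration R, and F = R o P (illustrative sketch).
    A    <- rbind(c(1, 0, 0))                  # 1 x 3 projection matrix (d = 1)
    proj <- function(x) as.numeric(A %*% x)    # "P": linear map R^3 -> R
    rest <- function(y) c(y, sin(y), y^2)      # "R": non-linear map R -> R^3
    aaf  <- function(x) rest(proj(x))          # "F = R o P"
    x0   <- c(0.3, sin(0.3), 0.3^2)            # a point on the associated manifold ...
    all.equal(aaf(x0), x0)                     # ... is a fixed point of F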

Having restricted our study to SLAAMs, we have to choose a criterion to optimize. As said previously, PCA maximizes the projected variance or, equivalently, minimizes the residual variance. Common AAM approaches also use the squared reconstruction error as a criterion or, more recently, a penalized criterion [17]. However, as pointed out by M. E. Tipping and C. M. Bishop [37], one limiting disadvantage of this approach is the absence of a probability density model and associated likelihood measure. The presence of a probabilistic model is desirable as

  • the definition of a likelihood measure permits comparison between competing models and facilitates statistical testing,

  • a single AAM may be extended to a mixture of such models,

  • if a probabilistic AAM is used to model the class-conditional densities in a classification problem, the posterior probabilities of class membership may be computed.

We thus propose a Gaussian generative model for the SLAAM and estimate it using a maximum likelihood approach. In the general case we are faced with a difficult optimization problem and cannot go further without additional assumptions. It will appear clearly that if the projection is known, then the estimation problem of a SLAAM is very close to an estimation problem in a regression context. There are, however, some differences that we will highlight. In particular, it will appear that in order to get tractable maximum-likelihood estimates, we have to impose some restrictions on the noise. We call the model resulting from all these assumptions/simplifications a Semi-Linear Principal Component Analysis. It does not seem possible to add non-linearity to the PCA and still get a tractable likelihood estimate of the projection. But clearly, the assumption that the projection is known is too strong in practice. We thus propose to estimate it in a separate step using either the PCA or a contiguity analysis [26], by extending our previous work on auto-associative models [13]. Finally, even if the projection is assumed known, it remains to estimate the regression function, which is a non-linear function of d variables. If d and p are even moderately large, this task becomes very complicated. Thus we simplify the model once more and assume that the regression function is additive, inspired by the Generalized Additive Model (GAM) approach [18].

In view of the experiments we have performed and present here, we obtain a practical and simple model which generalizes, in an understandable way, the PCA model to the non-linear case.

The paper is organized as follows. Section 2 introduces the Probabilistic Semi-Linear Auto-Associative Models (PSLAAM) and relates them to the PCA and Probabilistic PCA models. In Section 3 we present the Probabilistic Semi-Linear PCA models and the estimation of their parameters conditionally on the knowledge of the projection matrix. Section 4 is devoted to the determination of the projection matrix using contiguity analysis. Data sets and experiments, including a real astronomical data set, are detailed in Section 5. Finally, some concluding remarks are proposed in Section 6.

2. Semi-Linear Auto-Associative Models (SLAAM)

2.1. Geometrical properties of the SLAAM

Let us first consider a general auto-associative model as given in Definition 1.1. We have the following evident property.

Proposition 2.1.

Let F = R ∘ P be an auto-associative function and let M be the associated AAM. On P(M), the projection function P and the regression function R verify

(1)   P ∘ R = Id,

where Id denotes the identity function of R^d.

Proof.

Let y ∈ P(M) and let x ∈ M be such that y = P(x); then P(R(y)) = P(R(P(x))) = P(F(x)) = P(x) = y. ∎

As a consequence, we have the following "orthogonality" property, verified by an AAM when the projection P is an additive function.

Proposition 2.2.

Let F = R ∘ P be an auto-associative function and assume that the property (1) extends to R^d. Let x ∈ R^p and set ε = x − F(x). If P is additive, i.e. P(u + v) = P(u) + P(v), then P(ε) = 0.

Proof.

Using the property (1), we have on one hand P(F(x)) = P(R(P(x))) = P(x). While on the other hand P(ε) = P(x − F(x)) = P(x) − P(F(x)) = 0, giving the announced result. ∎

Clearly we have x = F(x) + ε, and the assumption given in this proposition seems quite natural. We focus now on the semi-linear case and we assume that

(2)   P(x) = A x,

with A a matrix of size d × p.

Proposition 2.3.

Let F = R ∘ P be an auto-associative function, with P given in (2) and verifying the property (1), and assume that the rows of A are orthonormal. Let B = (b_1, ..., b_p) be an orthonormal basis of R^p whose first d vectors are the rows of A, the remaining vectors b_{d+1}, ..., b_p being chosen arbitrarily. Let Q = (b_1, ..., b_p), and let x̃ = Q'x and F̃ denote respectively the vector x and the auto-associative function F expressed in the basis B; then

(3)   F̃(x̃) = (x̃_1, ..., x̃_d, R̃_{d+1}(x̃_1, ..., x̃_d), ..., R̃_p(x̃_1, ..., x̃_d)).

Proof.

It suffices to notice that the change of basis matrix is given by Q; thus the left multiplication of x and F by Q', using (1), gives (3). ∎

From this last proposition we can see that Semi-Linear Auto-Associative models have a relatively simple geometrical structure, and that we cannot expect to model highly non-linear structures with them.

2.2. Probabilistic Semi-Linear Auto-Associative Models (PSLAAM)

In the sequel, we will denote by E the subspace of R^p spanned by the rows of A, and we fix an arbitrary orthonormal basis of its orthogonal complement, denoted by (b_{d+1}, ..., b_p). We will denote by Q_1 the p × d matrix whose columns are the rows of A and by Q_2 the p × (p − d) matrix whose columns are b_{d+1}, ..., b_p. As in Proposition 2.3, Q = [Q_1 Q_2] represents the unitary matrix of the change of basis.

2.2.1. General Gaussian Setting

Definition 2.1.

Let Y be a d-dimensional Gaussian random vector:

(4)   Y ~ N_d(μ, Σ),

and let ε be a p-dimensional centered Gaussian random vector with a diagonal covariance matrix D, independent of Y.

The p-dimensional vector X is a Probabilistic Semi-Linear Auto-Associative Model (PSLAAM) if it can be written

(5)   X = Q [ (Y, r_{d+1}(Y), ..., r_p(Y))' + ε ],

where the r_j, j = d+1, ..., p, are arbitrary real functions from R^d to R.
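
The generative mechanism can be sketched in a few lines of R. The sketch below follows our reading of (4)-(5), in which the noise ε is added to the coordinates expressed in the basis B before applying Q; the functions r2 and r3 and all numerical values are arbitrary choices, not taken from the paper.

    # Simulating n points from a toy PSLAAM in R^3 with d = 1 (illustrative).
    set.seed(3)
    n  <- 500; p <- 3
    Q  <- qr.Q(qr(matrix(rnorm(p * p), p, p)))     # an arbitrary orthonormal basis
    Y  <- rnorm(n, mean = 1, sd = 2)               # latent Gaussian coordinate (d = 1)
    r2 <- function(y) sin(y)                       # arbitrary regression functions
    r3 <- function(y) y^2 / 4
    eps <- cbind(rnorm(n, 0, 0.05), rnorm(n, 0, 0.1), rnorm(n, 0, 0.1))  # diagonal covariance
    X  <- t(Q %*% t(cbind(Y, r2(Y), r3(Y)) + eps)) # n x p simulated sample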

2.2.2. Link with the Principal Component Analysis

Assume that:

  1. r_j = 0 for all j = d+1, ..., p,

  2. the covariance matrix Σ of Y is diagonal, with σ_1² ≥ ... ≥ σ_d²,

  3. the Gaussian noise ε has covariance matrix σ² I_p with σ² > 0.

Then the vector X is a Gaussian random vector with mean Q_1 μ and covariance matrix Q_1 Σ Q_1' + σ² I_p, and the columns of Q_1 are the d first eigenvectors given by the PCA.

2.2.3. Link with the Probabilistic Principal Component Analysis

The probabilistic PCA [37] is a model of the form

(6)   X = W Y + μ + ε,

with W a p × d matrix, Y a d-dimensional isotropic Gaussian vector, i.e. Y ~ N_d(0, I_d), and ε a p-dimensional centered Gaussian random vector with covariance matrix σ² I_p. The law of X is not modified if W is right-multiplied by a unitary matrix; it is thus possible to impose that the columns of W be orthogonal (assuming that W is of full rank).

The following proposition is then straightforward

Proposition 2.4.

Assume that ε (and thus Q ε) is an isotropic Gaussian noise, i.e. its covariance matrix is σ² I_p, take r_j = 0 for all j = d+1, ..., p, and set W = Q_1 Σ^{1/2}.

The resulting Probabilistic Semi-Linear Auto-Associative Model is a Probabilistic Principal Component Analysis.

For this simple model there exists a closed form for the posterior probability of Y given X and for the maximum likelihood estimates of the parameters of the model. In particular, the matrix W can be estimated up to a rotation and spans the principal subspace of the data.
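
For reference, the closed-form maximum likelihood estimates of [37] can be written in a few lines of R; this is an illustrative sketch with our own function name, not code from the paper.

    # Closed-form PPCA maximum-likelihood estimates (Tipping and Bishop [37]).
    ppca_ml <- function(X, d) {
      n  <- nrow(X); p <- ncol(X)
      mu <- colMeans(X)
      S  <- cov(X) * (n - 1) / n                  # ML estimate of the covariance
      e  <- eigen(S, symmetric = TRUE)
      sigma2 <- mean(e$values[(d + 1):p])         # average of the discarded eigenvalues
      W  <- e$vectors[, 1:d, drop = FALSE] %*%
            diag(sqrt(pmax(e$values[1:d] - sigma2, 0)), nrow = d)
      list(mu = mu, W = W, sigma2 = sigma2)       # W is defined up to a rotation
    }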

3. Semi-Linear PCA

Our aim is now to generalize the PCA model presented in Section 2.2.2 to the semi-linear case. We observe that in the PCA model, if the projection matrix is known, then we are able to recover the random variable Y. This observation leads us to formulate the following hypothesis about the noise:

N:

the Gaussian noise ε has a covariance matrix of the form D = diag(0, ..., 0, σ²_{d+1}, ..., σ²_p), i.e. its first d components vanish.

Expressing X in the basis B (Definition 2.1), we get the following expression for X̃ = Q'X:

(7)   X̃ = (Y, r_{d+1}(Y) + ε_{d+1}, ..., r_p(Y) + ε_p)'.

In other words, the coordinates of X̃ can be split into two sets. The first d coordinates are the Gaussian random vector Y, while the remaining p − d coordinates form a random vector Z which is, conditionally on Y, a Gaussian random vector with mean R(Y) = (r_{d+1}(Y), ..., r_p(Y))'. Observe that the regression functions r_j depend on the choice of the vectors b_{d+1}, ..., b_p and that, as the noise lives in the orthogonal of E, we have A X = Y.
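
Under this structure (and still under our reconstruction of (5) and of hypothesis N), the log-likelihood of an observed sample separates into a Gaussian term for the projected coordinates and independent univariate regression terms, which is what makes the estimates of the next section tractable. A sketch of this decomposition:

    log L(μ, Σ, R, σ²) = Σ_{i=1}^{n} log φ_d(y_i; μ, Σ) + Σ_{j=d+1}^{p} Σ_{i=1}^{n} log φ(z_{i,j}; r_j(y_i), σ_j²),

where φ_d denotes the d-dimensional Gaussian density, φ the univariate one, and (y_i, z_i) the coordinates of the i-th observation in the basis B.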

3.1. Maximum Likelihood Estimates

The parameters we have to estimate are the position and correlation parameters μ and Σ for the Gaussian part, and the regression functions r_j together with the noise variances σ_j², j = d+1, ..., p, for the non-linear part. Given a set of points x_1, ..., x_n in R^p, we get by projection two sets of points: y_1, ..., y_n in R^d and z_1, ..., z_n in R^(p−d).

Standard calculation gives the maximum likelihood estimates of μ and Σ:

(8)   μ̂ = (1/n) Σ_{i=1}^{n} y_i

and

(9)   Σ̂ = (1/n) Σ_{i=1}^{n} (y_i − μ̂)(y_i − μ̂)'.

The maximum likelihood estimate of σ_j² is given by

(10)   σ̂_j² = (1/n) Σ_{i=1}^{n} (z_{i,j} − r_j(y_i))²,   j = d+1, ..., p.

It remains to estimate the regression function R. We consider two cases: R is a linear function, and R is a linear combination of the elements of a B-Spline function basis. The linear case is just a toy example that we will use for comparison with the additive B-Spline case. In the non-linear case, we have to estimate a function from R^d to R^(p−d). As we said in the introduction, this is a difficult task, and we restrict ourselves to a generalized additive model (GAM) by assuming that the function R is additive, i.e.

(11)   r_j(y) = Σ_{k=1}^{d} r_{j,k}(y_k),   j = d+1, ..., p,

where each r_{j,k} is a map from R into R.

3.2. Linear Auto-Associative Models

In the linear case, we are looking for a vector b in R^(p−d) and a (p − d) × d matrix B minimizing the least-squares criterion

Σ_{i=1}^{n} || z_i − b − B y_i ||².

Setting z̄ = (1/n) Σ_i z_i and ȳ = (1/n) Σ_i y_i, and assuming that the empirical covariance matrix of the y_i is invertible, standard calculus shows that

b̂ = z̄ − B̂ ȳ   and   B̂ = Ĉ_{zy} Ĉ_{yy}^{-1},

where Ĉ_{yy} denotes the empirical covariance matrix of the y_i and Ĉ_{zy} the empirical cross-covariance between the z_i and the y_i. Finally, using the eigenvalue decomposition of the covariance matrix of X, it is straightforward to verify the following theorem.

Theorem 3.1.

If the orthonormal vectors defining the projection are the eigenvectors associated with the d first eigenvalues of the covariance matrix of X, then the estimated auto-associative model is the one obtained by the PCA.
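
Theorem 3.1 is easy to check numerically; the following R sketch (illustrative, with our own variable names) uses the first d PCA axes as projection matrix, fits the linear restoration by least squares, and recovers the rank-d PCA reconstruction.

    # Numerical check of Theorem 3.1 (illustrative sketch).
    set.seed(2)
    X  <- matrix(rnorm(300 * 4), ncol = 4) %*% matrix(rnorm(16), 4, 4)
    d  <- 2
    pc <- prcomp(X, center = TRUE)
    Q1 <- pc$rotation[, 1:d]                    # projection axes (PCA)
    Q2 <- pc$rotation[, (d + 1):4]              # orthonormal complement
    Xc <- scale(X, center = TRUE, scale = FALSE)
    Y  <- Xc %*% Q1                             # projected coordinates
    Z  <- Xc %*% Q2                             # remaining coordinates
    B  <- coef(lm(Z ~ Y))                       # linear restoration (intercept and slopes)
    fit_linear <- Y %*% t(Q1) + cbind(1, Y) %*% B %*% t(Q2)   # semi-linear model, linear R
    fit_pca    <- Y %*% t(Q1)                                 # rank-d PCA reconstruction
    max(abs(fit_linear - fit_pca))              # ~ 0: the fitted slopes and intercept vanish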

3.3. Additive Semi-Linear Auto-Associative Models

In order to estimate the regression functions r_{j,k}, we express them as linear combinations of a B-Spline function basis (B_1, ..., B_m), where m is a number chosen by the user. We thus have to estimate the set of coefficients α_{j,k,l}, for j = d+1, ..., p, k = 1, ..., d and l = 1, ..., m, by minimizing

Σ_{i=1}^{n} Σ_{j=d+1}^{p} ( z_{i,j} − Σ_{k=1}^{d} Σ_{l=1}^{m} α_{j,k,l} B_l(y_{i,k}) )².

Standard regression techniques then give the estimates

α̂ = (W'W)^{-1} W'Z,

where W is the design matrix, which depends on the knot positions, the degree of the B-Spline and the number of control points chosen by the user [14].

The estimated regression functions r̂_{j,k}, for j = d+1, ..., p and k = 1, ..., d, are then given by the formula

r̂_{j,k}(t) = Σ_{l=1}^{m} α̂_{j,k,l} B_l(t).
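
In R this regression step can be carried out with the base splines package; the sketch below (illustrative, not the paper's implementation) builds the additive cubic B-spline design matrix and fits all remaining coordinates at once.

    # Additive B-spline regression of Z on Y (illustrative sketch).
    library(splines)
    fit_additive <- function(Y, Z, df = 7) {
      # Y: n x d matrix of projected coordinates, Z: n x (p - d) remaining coordinates
      basis <- lapply(seq_len(ncol(Y)),
                      function(k) bs(Y[, k], df = df, degree = 3))
      W <- do.call(cbind, basis)       # additive design matrix (d blocks of df columns)
      lm(Z ~ W)                        # one least-squares fit per column of Z
    }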

3.4. Estimation in practice

The drawback of the previous maximum likelihood equations is that, given the projection matrix, we have to perform a rotation of the original data set and then an inverse rotation of the estimated model. In practice, we avoid such computations by estimating the model using the following steps:

  • (C) Center and (optionally) standardize the data set,

  • (P) Compute the projected data set (which is centered),

  • (R) Compute the regression of the centered data set on the projected one (without intercept),

  • (S) Compute the log-likelihood and the BIC criterion.

As we can see, the main difference is in the regression step: we estimate directly a function from the projection space to R^p. In practice, as the non-linear part of the model lies in the orthogonal of E, the regression function we obtain numerically gives the identity function on the projection subspace.
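
These four steps can be sketched in R as follows. The function below is illustrative only (it is not the paper's aam program): the name fit_slpca, the parameter counting and the use of the splines package are our own choices; the number of degrees of freedom df plays the role of the number of control points, and the rows of the projection matrix A are assumed orthonormal.

    # (C)-(P)-(R)-(S) estimation sketch for a given d x p projection matrix A.
    library(splines)
    fit_slpca <- function(X, A, df = 7) {
      n <- nrow(X); p <- ncol(X); d <- nrow(A)
      Xc <- scale(X, center = TRUE, scale = FALSE)            # (C) center
      Y  <- Xc %*% t(A)                                       # (P) project on the d axes
      W  <- do.call(cbind, lapply(seq_len(d),                 # additive cubic B-spline basis
                    function(k) bs(Y[, k], degree = 3, df = df)))
      fit <- lm(Xc ~ W - 1)                                   # (R) regression without intercept
      Q2  <- qr.Q(qr(t(A)), complete = TRUE)[, (d + 1):p]     # orthonormal complement of the axes
      res <- residuals(fit) %*% Q2                            # noise lives orthogonally to the axes
      sig2 <- colMeans(res^2)                                 # ML residual variances
      ld   <- as.numeric(determinant(cov(Y) * (n - 1) / n)$modulus)
      loglik <- -0.5 * n * (d * log(2 * pi) + ld + d) -       # Gaussian part (projected data)
                 0.5 * n * sum(log(2 * pi * sig2) + 1)        # (S) noise part
      npar <- (p - d) * ncol(W) + d + d * (d + 1) / 2 + (p - d)  # rough parameter count; the
                                                                 # paper's exact count may differ
      bic <- -2 * loglik + npar * log(n)
      list(fit = fit, residual.variance = mean(res^2), bic = bic)
    }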

3.5. Model Selection

Since a Semi-Linear PCA model depends highly on the projection matrix, model selection allows us to select the best projection among various candidates. Several criteria for model selection have been proposed in the literature; the most widely used are penalized likelihood criteria. Classical tools for model selection include the AIC [1] and BIC [33] criteria. The Bayesian Information Criterion (BIC) is certainly the most popular and consists in selecting the model which penalizes the maximized log-likelihood by (ν/2) log(n), where ν is the number of parameters of the model and n is the number of observations.

In practice we will fix a set of vectors, given either by the contiguity analysis (Section 4) or by the PCA, and select the dimension of the model using the BIC criterion because of its popularity. The projection matrices we compare are thus built from the first axis, the first two axes, and so on.
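
Continuing the sketch given after Section 3.4, dimension selection can then be expressed as a small loop over nested projection matrices; fit_slpca is the hypothetical helper defined above, and the candidate axes may come from the PCA or from the contiguity analysis of Section 4.

    # BIC-based selection of the model dimension (illustrative sketch).
    select_dimension <- function(X, dmax = 3, df = 7) {
      axes <- prcomp(X, center = TRUE)$rotation    # candidate axes (dmax must stay below ncol(X))
      bics <- sapply(seq_len(dmax), function(d)
                     fit_slpca(X, A = t(axes[, 1:d, drop = FALSE]), df = df)$bic)
      list(bic = bics, d = which.min(bics))
    }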

4. Contiguity Analysis

Given an orthonormal set of vectors in R^p, an index is a functional measuring the interest of the projection of the data set onto these vectors by a non-negative real number. A widely used choice of index is the projected variance. This is the criterion maximized in the usual PCA method [23].

The choice of the index is crucial in order to find "good" parametrization directions for the manifold to be estimated. We refer to [21] and [24] for a review of this topic in a projection pursuit setting. The meaning of the word "good" depends on the considered data analysis problem. For instance, Friedman et al. [10, 8], and more recently Hall [15], have proposed indices which measure the deviation from normality in order to reveal more complex structures of the scatter plot. An alternative approach can be found in [5], where a particular metric is introduced in PCA in order to detect clusters. We can also mention indices dedicated to outlier detection [28].

Our approach generalizes the one we presented in [13] and consists in defining a contiguity coefficient similar to Lebart's [26], whose maximization allows us to unfold nonlinear structures.

A contiguity matrix M is a boolean matrix whose entry m_{ij} is 1 if data points x_i and x_j are "neighbors" and 0 otherwise. Lebart proposes to apply a threshold to the set of pairwise distances in order to construct this matrix, but the choice of this threshold can be delicate. In [13] we proposed to use a first-order contiguity matrix, i.e. m_{ij} = 1 iff x_j is the nearest neighbor of x_i, in order to construct the proximity graph. In order to get a more robust estimate of the neighborhood structure, it is possible to generalize this approach and to use a k-contiguity matrix, i.e. m_{ij} = 1 iff x_j is one of the k nearest neighbors of x_i.
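
Such a k-contiguity matrix is straightforward to build with base R; the sketch below is illustrative and uses our own function name.

    # k-contiguity matrix: m[i, j] = 1 when x_j is one of the k nearest
    # neighbors of x_i (illustrative sketch, Euclidean distances).
    contiguity_matrix <- function(X, k = 3) {
      D <- as.matrix(dist(X))                  # pairwise distances
      diag(D) <- Inf                           # a point is not its own neighbor
      t(apply(D, 1, function(d) {
        m <- numeric(length(d))
        m[order(d)[1:k]] <- 1                  # flag the k nearest neighbors
        m
      }))
    }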

The contiguity matrix being chosen, we compute the local covariance matrix

(12)   V* = (1 / (2 Σ_{i,j} m_{ij})) Σ_{i,j} m_{ij} (x_i − x_j)(x_i − x_j)'

and the total variance matrix

(13)   V = (1/n) Σ_{i=1}^{n} (x_i − x̄)(x_i − x̄)'.

The axes of projection are then estimated by maximizing the contiguity index

(14)   I(a) = (a' V a) / (a' V* a).

Using standard optimization techniques, it can be shown that the resulting axes are the eigenvectors associated with the largest eigenvalues of the matrix V*^{-1} V.
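
Assuming this reading of (12)-(14) (the exact form of the index is our reconstruction), the axes can be computed in R as follows, reusing the contiguity_matrix sketch above; the eigenvectors may be re-orthonormalized afterwards if orthonormal axes are required.

    # Contiguity axes as eigenvectors of V*^{-1} V (illustrative sketch).
    contiguity_axes <- function(X, k = 3, d = 2) {
      M   <- contiguity_matrix(X, k)
      Xc  <- scale(X, center = TRUE, scale = FALSE)
      V   <- crossprod(Xc) / nrow(X)                      # total variance matrix (13)
      idx <- which(M == 1, arr.ind = TRUE)
      Dif <- X[idx[, 1], , drop = FALSE] - X[idx[, 2], , drop = FALSE]
      Vstar <- crossprod(Dif) / (2 * nrow(Dif))           # local covariance matrix (12)
      e <- eigen(solve(Vstar, V))                         # eigenvectors of V*^{-1} V
      Re(e$vectors[, 1:d, drop = FALSE])                  # axes for the d largest eigenvalues
    }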

5. Examples

We first present two illustrations of the estimation principle of PSLAAM on low-dimensional data (Sections 5.1 and 5.2). These two simulated examples are very similar to the ones we used in our previous article with S. Girard [13]. Second, PSLAAM is applied to an astronomical analysis problem in Section 5.3.

In all the examples we use an additive B-Spline regression model for the estimation of the regression function (Section 3.3). The B-Splines are of degree 3 and we select the number of control points using the BIC criterion.

5.1. First example on simulated data

The data are simulated using a one-dimensional regression function in R^3. The equation of the AA model is given by

(15)

The first coordinate of the random vector is sampled one thousand times from a centered Gaussian distribution. An independent noise has then been added to the Y and Z coordinates.

The axis of projection has been computed thanks to the contiguity analysis (Section 4), using the 3 nearest neighbors for the proximity graph. The correlations between the projected data set and the original data set are

        X             Y              Z
Proj    0.9999680850  -0.0005794581  0.0089238830

which shows that the first axis given by the contiguity analysis is very close to the X-axis, as expected. The result of the contiguity analysis can be visualized in Figure 1.

Figure 1. Correlation of the first axis, obtained using a contiguity analysis, with the original variables, and representation of the scatter-plot and regression function in the main PCA plane. This graphic has been obtained with R using the plot command of the aam library.

The projected variance on the first axis is also very close to the variance of the first coordinate, as expected. We use the BIC criterion in order to select the dimension of the model and the number of control points. A summary of the tested models is given in Table 1.

                     Contiguity Analysis               PCA
Control Points  dim  BIC            Residual Variance  BIC        Residual Variance
linear          1    9987.13(5)     1.439              11495.8    1.43864
                2    10609.7(10)    1.40815
9               1    11247.6(29)    1.06557            11058.4    1.06407
                2    11621(58)      1.1086
10              1    11060.7(32)    0.986316           10926.3    0.985777
                2    11469(64)      0.913605
11              1    10853.7(35)    0.941064           10855      0.92711
                2    11503(70)      0.90682
12              1    10844.6(38)    0.92717            10845(38)  0.941661
                2    11525(76)      0.892669
13              1    10871.4(41)    0.929965           10871.6    0.927976
                2    11568.4(82)    0.891119
14              1    108887.5(44)   0.927834           10888.3    0.927976
                2    11604(88)      0.885939

Table 1. Values of the BIC criteria for d = 1 and d = 2 and for various numbers of control points (given in the first column). The number of free parameters of each model is given in parentheses. The BIC criterion selects the model of dimension 1 with 11 control points. The axis of projection can be either the one obtained by contiguity analysis or the one obtained using the PCA.

Finally, the result of the regression is drawn in Figure 2.

Figure 2. The simulated scatter-plot (blue), the estimated AAM (orange) and the true AAM (grey). This graphic is obtained with R using the draw3d command of the aam library.

5.2. Second example on simulated data

In our second example the AAM is given by

(16)

The first two coordinates of the random vector are sampled from a centered Gaussian distribution with a given covariance matrix. An independent noise has then been added to the third coordinate.

The correlations between the projected data set and the original data are

          X            Y            Z
Proj 1    0.99924737   -0.1488330   0.0179811
Proj 2    -0.14239437  0.98532966   0.01396622

which show that the (X, Y)-plane is essentially the plane chosen by the contiguity analysis. The result of the contiguity analysis is displayed in Figure 3.

Figure 3. The AAM components and the 2-neighbors graph, the correlation circle of the AAM components with the variables and representation of the scatter-plot in the PCA plan. These pictures have been obtained using the plot command of the aam library.

The BIC criterion selects 7 control points. A summary of the tested models is given in Table 2. The selected model over-estimates the residual variance by a factor of 2. This is not a surprising result, as the original model is not additive and we cannot expect to reconstruct it exactly. We do not show the results obtained with the PCA, as the first axis selected by this method clearly gives the wrong parametrization.

Control Points  dim  BIC           Residual Variance
linear          1    11024.7(5)    1.78335
                2    10832.7(10)   1.20314
5               2    10654.2(34)   0.852753
6               1    11072.6(20)   1.68675
                2    10716.1(40)   0.870381
7               1    11014.2(23)   1.55841
                2    10213.6(46)   0.505186
8               1    11031.6(26)   1.55334
                2    10216.6(52)   0.486137
9               1    11051.7(29)   1.55229
                2    10255(58)     0.484644

Table 2. Values of the BIC criteria for d = 1 and d = 2 and for various numbers of control points (given in the first column). The number of free parameters of each model is given in parentheses. The BIC criterion selects the model of dimension 2 with 7 control points.

The true model and the estimated model obtained with an additive B-Spline regression are given in Figure 4.

Figure 4. (a) the original data set (blue) and the individual regression functions. (b)-(c) two views of the original manifold (grey) and of the extrapolated manifold using an additive B-Spline regression (yellow-red). These images have been obtained using the draw3d function of the aam library.

5.3. Example in spectrometry analysis

Finally, we illustrate the performance of the semi-linear PCA on a real data set. The data consist of 19-dimensional spectral information for 487 stars [34, 12, 35, 11], which have been classified into 6 groups. They have been modeled in [32] using an auto-associative neural network based on a 19-30-10-30-19 architecture. Using the terminology of this article, the model proposed by M. Scholz and his co-authors is an auto-associative model of dimension 10. We select the model using the BIC criterion. The main results are the following:

  1. The axes of projection given by the PCA largely outperform the results we obtain with the contiguity analysis, for any choice of the number of control points.

  2. The BIC criterion retains a model of dimension 5 with 9 control points (871 parameters) when we use a non-linear regression step. The residual variance is 0.0080763, while the total variance (inertia) of the data was 26.59832.

  3. The BIC criterion retains a model of dimension 12 (307 parameters) when we use a linear regression step. Observe that in this case we are performing a usual PCA (Theorem 3.1).

The data cloud in the main PCA space, together with the values predicted by the model, is displayed in Figure 5.

Figure 5. Data cloud in the main PCA space with the predicted values (green).

A summary of some tested models is given in Table 3, and some projection planes are displayed in Figure 6. We visualize each component of the regression functions by setting all predictors except one to zero, and we represent the evolution of the regression functions in Figure 7.

PCA
Control Points  BIC value      dim  Residual variance
Linear          1829.95(307)   12   0.0049702
7               1187.26(820)   6    0.00727521
8               1147.62(776)   5    0.00975073
9               453.387(871)   5    0.0080763
10              701.769(966)   5    0.00768342
11              1333.45(1061)  5    0.00773327

Table 3. Values of the BIC criteria for various numbers of control points (given in the first column). The BIC criterion selects the model of dimension 5 with 9 control points, using as projection matrix the axes given by the PCA. The total variance (inertia) of the data set was 26.59832.
Figure 6. Some projection planes of the PCA with the 2-neighbors graph (left) and with the single regression functions (right). As we use the PCA for the projection matrix, the AAM components are the PCA components. The colors of the points represent the classification of the stars.
Figure 7. The individual regression functions from the 5-dimensional projection space to the remaining coordinates. In each row the non-zero predictor is sampled over its range, and each column shows the evolution of the functions in dimensions 6, ..., 19. The system of coordinates is the one given by the PCA.

6. Conclusion

We have presented a class of auto-associative models for data modeling and visualization, called semi-linear auto-associative models. We provided theoretical groundings for these models by proving that principal component analysis and probabilistic principal component analysis are special cases. Our model allows us to fit data sets with a simple non-linear component and is truly generative, with an underlying probabilistic interpretation. However, it does not allow us to model data with a strong non-linear component, and it depends highly on the choice of the projection matrix.

The Semi-Linear PCA has been implemented in C++ using the stk++ library [22] and is available at: https://sourcesup.renater.fr/projects/aam/.

The program is accompanied by a set of R scripts which allow the user to simulate data and display the results of the aam program.

References

  • [1] H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19:716–723, 1974.
  • [2] Pierre Baldi and Kurt Hornik. Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 2(1):53–58, 1989.
  • [3] P. Besse and F. Ferraty. A fixed effect curvilinear model. Computational Statistics, 10(4):339–351, 1995.
  • [4] C. M. Bishop. Pattern recognition and machine learning, information science and statistics. Springer, Berlin, 2006.
  • [5] H. Caussinus and A. Ruiz-Gazen. Metrics for finding typical structures by means of Principal Component Analysis. Data science and its Applications, Harcourt Brace Japan, pages 177–192, 1995.
  • [6] P. Delicado. Another look at Principal curves and surfaces. Journal of Multivariate Analysis, 77:84–116, 2001.
  • [7] J. F. Durand. Generalized principal component analysis with respect to instrumental variables via univariate spline transformations. Computational Statistics and Data Analysis, 16:423–440, 1993.
  • [8] J. H. Friedman. Exploratory Projection Pursuit. Journal of the American Statistical Association, 82(397):249–266, 1987.
  • [9] J. H. Friedman and W. Stuetzle. Projection Pursuit Regression. Journal of the American Statistical Association, 76(376):817–823, 1981.
  • [10] J. H. Friedman and J. W. Tukey. A Projection Pursuit algorithm for exploratory data analysis. IEEE Trans. on computers, 23(9):881–890, 1974.
  • [11] J. Garcia, N. Sanchez, and R. Velasquez. Quantitative Stellar Spectral Classification. IV. Application to the Open Cluster IC 2391. Rev.Mex.Astron.Astrofis. 45 (2009) 13-24, September 2008.
  • [12] J. Garcia, J. Stock, M. J. Stock, and N. Sanchez. Quantitative Stellar Spectral Classification. III. Spectral Resolution. Rev.Mex.Astron.Astrofis. 41 (2005) 31-40, October 2004.
  • [13] S. Girard and S. Iovleff. Auto-Associative models and generalized principal component analysis. Journal of Multivariate Analysis, 93:21–39, 2005.
  • [14] H. Prautzsch, W. Boehm, and M. Paluszny. Bézier and B-Spline Techniques. Mathematics and Visualization. Springer, 2002.
  • [15] P. Hall. On polynomial-based projection indices for exploratory projection pursuit. The Annals of Statistics, 17(2):589–605, 1990.
  • [16] T. Hastie and W. Stuetzle. Principal curves. Journal of the American Statistical Association, 84(406):502–516, 1989.
  • [17] T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning. Springer Series in Statistics, Springer, second edition, 2001.
  • [18] T. J. Hastie and R. J. Tibshirani. Generalized Additive Models. Monographs on Statistics and Applied Probability, 43, 1990.
  • [19] G. E. Hinton, P. Dayan, and M. Revow. Modeling the manifolds of images of handwritten digits. IEEE Transactions on Neural Networks, 8(1):65–74, 1997.
  • [20] H. Hotelling. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24:417–441, 1933.
  • [21] P. J. Huber. Projection Pursuit. The Annals of Statistics, 13(2):435–475, 1985.
  • [22] Serge Iovleff. The Statistical ToolKit. http://www.stkpp.org/, 2012.
  • [23] I. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 1986.
  • [24] M. C. Jones and R. Sibson. What is projection pursuit? Journal of the Royal Statistical Society, Ser. A, 150:1–36, 1987.
  • [25] S. Klinke and J. Grassmann. Projection pursuit regression. Wiley Series in Probability and Statistics, pages 471–496, 2000.
  • [26] L. Lebart. Contiguity analysis and classification. In W. Gaul, O. Opitz, and M. Schader, editors, Data Analysis, pages 233–244. Springer-Verlag, 2000.
  • [27] Bei-Wei Lu and Lionel Pandolfo. Quasi-objective nonlinear principal component analysis. Neural Networks, 24(2):159–170, 2010.
  • [28] J-X. Pan, W-K. Fung, and K-T. Fang. Multiple outlier detection in multivariate data using projection pursuit techniques. Journal of Statistical Planning and Inference, 83(1):153–167, 2000.
  • [29] K. Pearson. On lines and planes of closest fit to systems of points in space. The London, Edinburgh and Dublin philosophical magazine and journal of science, Sixth Series(2):559–572, 1901.
  • [30] Sam T. Roweis and Lawrence K. Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, 290(5500):2323–2326, 2000.
  • [31] B. Schölkopf, A. Smola, and K.-R. Müller. Kernel Principal Component Analysis. In Advances in Kernel Methods—Support Vector Learning, pages 327–352, 1999.
  • [32] M. Scholz, M. Fraunholz, and J. Selbig. Nonlinear Principal Component Analysis: Neural Network Models and Applications. In Principal Manifolds for Data Visualization and Dimension Reduction, volume 28, pages 205–222. Springer-Verlag, 2007.
  • [33] G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6:461–464, 1978.
  • [34] J. Stock and M. Stock. Quantitative stellar spectral classification. Revista Mexicana de Astronomia y Astrofisica, 34:143–156, 1999.
  • [35] M. J. Stock, J. Stock, J. Garcia, and N. Sanchez. Quantitative Stellar Spectral Classification. II. Early Type Stars. Rev.Mex.Astron.Astrofis. 38 (2002) 127-140, May 2002.
  • [36] J. B. Tenenbaum. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, 290(5500):2319–2323, December 2000.
  • [37] M. E. Tipping and C. M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Ser. B, 61(3):611–622, 1999.