1 Introduction
There has been a lot of recent interest in the analysis of data lying on a differentiable, locally Euclidean, manifold, rather than in standard Euclidean space. Such data can arise in lots of application areas, but the analysis of shape has been a particular interest (Pennec, 2006; Fletcher et al., 2003; Sommer et al., 2010). The particular challenge of statistics on manifolds is that the space is not uniformly flat, meaning that while many of the standard techniques of Euclidean data analysis have analogues, they may not be uniquely defined, and they are local, that is only defined within an infinitesimal neighbourhood of each datapoint.
This last point is crucial. The particular differentiable manifold that is considered is the Riemannian manifold and the ‘tools’ of data analysis on such a manifold (the Riemannian metric, and the connection) are built at the scale of infinitesimal neighbourhoods, through the computation of tangent vectors and so tangent spaces at each point of the manifold. The metric compares pairs of tangent vectors defined at the same point, defining the inner product between them, which means that the length of vectors can be computed, and integrated to compute distances along paths in the manifold. Then, in order to compare the tangent spaces at different points on the manifold, a connection is required that describes how to transport vectors defined at one point to another.
The curvature of a manifold means that parallel transport is not pathindependent, so that transporting the same vector between two places by different paths will generally produce different vectors at the endpoint. For example, consider the case of parallel transporting a vector on the shell of a sphere in normal three dimensional space, such as (approximately) the surface of the earth. We start at the north pole, move down to the equator, around the equator for one quarter of the earth’s circumference, and then back to the north pole. Comparing the resulting vector with one that did not move at all, we can see that the transported one is now at to the original version. Importantly for data analysis, the lack of path independence means that it is impossible to define a data basis and thus a global coordinate system on the manifold.
In addition to curvature, there is another geometric invariant of differentiable manifolds, which is torsion. Far less commonly considered, torsion of a manifold means that parallel transporting around an infinitesimal parallelogram does not return exactly to the original point, so that loops are not closed; the additional translation required to close a loop gives a measure of the torsion of that region. The reason why torsion is not used is that there exists a particular choice of connection, the LeviCivita connection, that has curvature but everywhere zero torsion, thus simplifying things. In this paper we will consider a differentiable, locally Euclidean, manifold endowed not with the LeviCivita connection, but with the Weitzenböck connection, which has torsion, but where the curvature is uniformly zero.
The Weitzenböck connection can be defined in terms of Cartan’s moving frames (Cartan, 1922)
, which could be used explicitly as a basis for data if they were known. Unfortunately, they are data dependent, and the point of this paper is to derive an algorithm that constructs an estimate of the Weitzenböck frame at each point of the manifold. We introduce the idea of working on data spaces endowed with the Weitzenböck connection, and derive an algorithm that generates autoparallels in a datadriven fashion on the dataspace, enabling the mapping of the data manifold. An autoparallel is the ‘straightest’ path on the manifold according to the directions of the moving frame at each point. Our algorithm is, of necessity, iterative, because we can only make local estimates of the moving frame, and hence the autoparallels, and need to remove each principal autoparallel to find the next most common direction in the data.
Weitzenböck space has also been used to describe an autoparallel version of General Relativity (Hammond, 2002), the shears that sometimes occur in crystals, when points where the regular structure of the atoms of the crystal change (Kleinert & Shabanov, 1998), and nonholonomic mechanics (Fernandez & Bloch, 2011).
2 Mathematical Formulation
We here present a brief overview of the concepts required to understand the geometry of Weitzenböck space. For further details, see for example (Weitzenböck, 1923; Hehl & Obukhov, 2007; Guo et al., 2010); for an alternative description of torsion based on the soldering form, see Michor (2008).
2.1 Torsion
Given a differentiable, locally Euclidean, manifold with an affine connection, torsion can be defined as the amount by which parallel transport parallelograms do not close. Consider starting with a vector at a point , and transporting it in a direction , followed by another direction , and also by transporting it in direction followed by . In a Riemannian manifold with curvature the parallelogram formed by following both paths is closed by construction, but the resultant vector will not point in the same direction if the space is curved. In a manifold with torsion, but not curvature, the parallelogram will not be closed, but the resultant vectors will point in the same direction; see Figure 1 of Hehl & Obukhov (2007) for a picture of this. The general case of a differentiable manifold with both curvature and torsion is known as a RiemannCartan manifold.
In terms of the covariant derivatives of affine connection of vector fields and the torsion can be defined as (where is the Lie bracket of and ):
(1) 
In a coordinate basis, the Lie Bracket of the basis vectors vanishes by definition, and so the torsion reduces to (where are the connection in component form):
(2) 
which is the antisymmetric part of the connection in the coordinate basis.
2.2 The Weitzenböck Frame
Consider standard Euclidean space , which we associate with a Weitzenböck space . Since we are working in Euclidean space, there is a natural coordinate frame, which is global, each direction of which corresponds to a unitlength tangent vector with only one nonzero coordinate component.
At any point we can define a local basis using orthonormal basis vectors , with coordinate components that depend upon their position (where there are datapoints):
(3) 
These basis vector fields are the positiondependent Weitzenböck frame, a particular example of a moving frame (Cartan, 1922), also known as a vielbein, or specifically in 2D, a zweibein, and 3D a dreibein). The frame is orthonormal at every point by construction, but anholonomic, since the Lie derivative does not vanish. This means that the order in which the basis vectors are followed matters: consider starting at a point on the manifold and following the two paths given by going in direction then direction , and vice versa:
(4) 
For an arbitrary vector there is a simple relationship between the components in the coordinate frame and Weitzenböck frames:
(5) 
We can also define torsion in this frame basis:
(6) 
A vector field is everywhere parallel if and only if in the Weitzenböck frame:
(7) 
where is a reference point and is a scalar function.
We can now start to consider how a tangent vector at describes an infinitesimal change in the position of that point. By using the Euclidean metric on space we induce a metric on :
(8) 
where is the Kronecker delta.
The change in coordinate components of any arbitrary vector as it moves from to under parallel transport in a differentiable manifold is described via the connection (although note that with the addition of torsion, the connection does not retain the indexsymmetry of the Christoffel symbols familiar to Riemannian geometry with the LeviCivita connection):
(9) 
Combining (8) and (9) we see that we can define an everywhereparallel vector field by taking the value of the field at a single point, and then parallel transporting it to every point. This gives the relation between the Weitzenböck frame and the Weitzenböck connection defined by that frame:
(10) 
Clearly, the parallel transport is independent of the path taken, and hence the curvature is zero everywhere by construction, and the space possesses absolute parallelism. To see that the space does indeed have torsion, consider the construction of an infinitesimal parallelogram using the frame basis vectors:
(11) 
which means the (infinitesimal) parallelogram will not close.
2.3 AutoParallels
In Riemannian geometry, geodesics are paths that are local extrema of the metric distance (the shortest paths between points). This definition carries over to Weitzenböck space, with the geodesic equation being given by:
(12) 
where are the Christoffel symbols of the metric.
However, there are other curves that we can define in Weitzenböck space, which are curves that are as ‘straight’ as possible, rather than as short as possible, which means following a constant direction with respect to the frame basis:
(13) 
where the are constants, and the curve has been parameterised to be constant speed. Figure 1 shows the difference between geodesics and autoparallels for a simple two dimensional example.
If we now differentiate this a second time, and note that:
then it is simple to show that this path is a solution of the autoparallel equation:
This is an analogue of the geodesic equation with a RiemannCartan connection, rather than just the symmetric LeviCivita connection. In certain cases autoparallels and geodesics can coincide, but with the Euclidean metric this can only occur if the frames do not twist at all, which is when the torsion is identically zero.
Autoparallels in the Weitzenböck space map to straight lines in the Euclidean space. While in general the dimension of the Euclidean space will be higher (just as embedding a curved 2D surface in Euclidean space requires 3 (or more) dimensions, by the Nash Embedding Theorem) the dimensionality is the same for the flat Weitzenböck space. To construct the mapping, we take as Cartesian coordinates in and use the vielbein to generate a map between tangent spaces of Weitzenböck space with torsion, and tangent spaces of Euclidean space, which are all equivalent. Then the mapping to Euclidean coordinates is given by:
Now two paths of the form:
end up at the same place in the Weitzenböck space, but not in the Euclidean space, where we have the path:
The difference between the paths is given by:
which is the mapping coefficients that have a nonvanishing curl, i.e., precisely the torsion. This construction is known as a nonholonomic mapping, for further details see (Kleinert & Shabanov, 1998; Kleinert, 2000; Guo et al., 2010).
The construction we have given is based on the moving frame: a set of linearly independent vector fields are derived and used to construct a derivative. However, for a dataset in an unknown manifold, this is not known a priori, rather, we need to build it from the data. Unfortunately, this is not possible. In the next section we describe an algorithm that iteratively generates this set in a way that is analagous to PCA, and give examples of its use on standard datasets.
3 PAPA: Principal AutoParallel Analysis
The idea behind the algorithm is that autoparallels describe fibres (in the differential geometric sense of elements of the fibre bundle). The autoparallel gives a single direction along the vector field, which can be followed. It is therefore possible to choose a base space, and to identify the points where each fibre intersects that base space. This means that each datapoint can be projected onto the base space by constructing the autoparallel through it, and following until it intersects the base space, choosing the closest one if it meets it several times.
The first direction has then been projected out, leaving data in a lower dimensional space. The same method can then be used to find the autoparallels in the new space, and the method can be iterated as many times as is required. A useful way to understand the algorithm is by analogy with the standard description (rather than implementation) of normal Principal Component Analysis (PCA): the first step is to find the direction with maximum variance as the first principal component, and then to successfully find components that maximise the remaining variance.
PCA can be used for dimensionality reduction by retaining only a subset of the new components, with the data being projected into the subspace that they span. The choice of an appropriate number of components is often based on some percentage of the variance, or a similar consideration. For our method an appropriate choice is not so clear, but given that the noise is often isotropic, one appropriate time to stop is when no consistent autoparallel directions are found, but all directions seem to be equally preferable.
Note that while at each stage we are approximating one vector field of the connection, it is not possible to construct the full connection from them, since the spaces that each vector field live in are different: they are not defined on the same space.
Computationally, there are two tasks: to approximate the autoparallel, and to perform the projection onto the base space. We use standard PCA in a local neighbourhood to define a single direction along the vector field. This describes the local direction of the autoparallel, and we can therefore construct a numerical approximation to an autoparallel by iterating this approach: at a given point on the data manifold we compute the first local principal component, move a step of size in that direction, and then repeat. It is also possible to move in the negative direction, so that the entire data manifold can be traversed.
With any form of local PCA the size of the neighbourhood is critical, and that is particularly true here, since if too few datapoints are used the estimate of the direction of the autoparallel is noisy, while if too many datapoints are used the direction can be compromised by elements that are close by in Euclidean space, but not on the manifold; we will show an example of this shortly. We therefore estimate an appropriate size for the neighbourhood by computing the pointtopoint distance (with a randomly datapoint selected as the origin). By plotting a histogram of these distances (an example is shown on the right of Figure 4) it is possible to choose an appropriate value for the maximum distance allowable to be included in the neighbourhood.
3.1 Demonstration
As a first demonstration of our algorithm we use two standard twodimensional datasets that are commonly used in manifold and surface learning. Figure 2 shows the datasets, each of which consists of 1000 datapoints drawn uniformly at random from the data distribution, together with a set of 50 autoparallels, each running for 200 steps () both forwards and backwards, and starting from randomly chosen datapoints. Where there is good support in the data the autoparallels track the data well, but where the data stops, the autoparallels continue, as the local PCA allows overshooting, before the autoparallels bend round, back towards the dataset. In the dataset on the right, it can be seen that this effect of the local PCA means that the autoparallels can ‘fall off’ the end of one curve and head towards the other portion of the data.
The most famous dataset of manifold learning is the threedimensional swiss roll, and we demonstrate that next. On the left of Figure 3 are drawn a set of 200 autoparallels following the data. It can be seen that they match the data well, and that there is sufficient data at the ends of the dataset for the autoparallels to curve around and run back along the dataset. Also visible in that figure are a set of points marking the intersections between a plane running at to the axis, which intersects with the dataset twice. Having identified these points it is possible to compute the signed distance from each datapoint to the nearest point of intersection, and the histogram of these distances are shown on the right of Figure 3. The data can then be projected onto these two onedimensional data spaces to complete the analysis.
In order to demonstrate the difficulty of using local PCA for the construction of autoparallels, on the left of Figure 4 we demonstrate a case where the neighbourhood was set too large, meaning that datapoints from the next layer of the swiss roll were also selected, pulling the the autoparallel off the manifold. In order to choose an appropriate value, we computed the distribution of pointtopoint distances to a random datapoint, and used this histogram to identify a suitable cutoff by looking for the first point where there was a clear dropoff in the number of points. This lead to the choice of a maximum of 0.5 being chosen in this case.
As another demonstration of our algorithm, we used the optdigits
dataset from the UCI Machine Learning Repository
(Asuncion & Newman, 2007). Figure 5 shows points along four autoparallels starting from randomly chosen examples of a 1, a 3, a 7, and an 8. The points along the autoparallels do seem to make some sense in terms of how they transition between the digits, although there are clearly some problems with the local PCA parameters, since the autoparallel starting at a 1 moves through 4 to 7, back to 4, and then back to 7, and the autoparallel starting at a 7 appears to do similar cycling between 0 and 5. Overall, these results suggest that the autoparallels follow expected paths through the data.4 Related Algorithms
Conceptually, the algorithm that we find most similar to ours is the Principal Curve algorithm, particularly the locally defined variant described in Ozertem & Erdogmus (2011). A principal curve is a curve through the data where each point on the curve is the mean of the data points that have it as the closest point on the curve (Hastie & Stuetzle, 1989). In Ozertem & Erdogmus (2011)
they construct principal curves and surfaces based an estimate of the underlying continuum data density. This leads to an algorithm based on computing the eigenvectors of the Hessian (i.e., principal curvature directions where the curve is flat), with the constraint that the curve is orthogonal to the gradient, so that it sits on a ridge in the data. In order to project data onto the ‘nearest’ principal curve they use the local covariance to identify the direction to travel. This local covariance is rather different to what we have considered in this paper. We are using the standard definitions of covariance and sample covariance, whereas in their paper they use the fact that for a Gaussian distribution they can find a combination of the Hessian and gradient that is not spatially varying, and that this can define a version of covariance. While the two versions are related, ours requires only a reasonable estimate of a single direction, whereas they need a sufficiently good density estimate to construct the local covariance matrix at each point.
Of course, our approach is also related to local PCA in its implementation. However, we are using locality to refer to points within the dataset, rather than treating the problem as a variant of means clustering. In local PCA it is common to approximate the whole data distribution by some set of PCA subspaces, and then has to decide which subspace to assign a point to (i.e., which cluster), then doing PCA again for each cluster (Gassenbauer et al., 2011). The method also differs from neighbourhoodpreserving methods such as Local Linear Embedding (Roweis & Saul, 2000), where locally flat patches of the manifold are identified and linked together, since we make no linear approximation.
Another relevant dimensionality reduction algorithm is Independent Component Analysis (ICA)
(Hyvärinen & Oja, 2000), of which PAPA can be seen as a nonlinear variant, in the sense that sequential unwrapping of the global manifold identifies a series of nonlinear coordinate transforms. However, our approach has a strong ordering of the components, and does not require that the data is nonGaussian.5 Conclusion
The analysis of data lying on differentiable manifolds provides many challenges not seen in the normal Euclidean case. Of particular importance is the local nature of all computations that are made, which is caused by the curvature of the space, meaning that parallel transport between points on the manifold are path dependent. In this paper we have introduced the idea of performing data analysis in Weitzenböck space, which is a differentiable manifold endowed with a connection that has torsion, but not curvature. The benefit of this space is that it is flat by construction; the price that is paid for this is that infinitesimal parallelograms do not close, the signature of torsion.
Construction of the Weitzenböck space requires a positiondependent orthonormal frame to be defined on the manifold. We have described a constructive method to approximate autoparallels on the intrinsic data manifold by using the first local principal component at each datapoint, and then projecting the data into the subspace where that dimension is not included. We have provided experimental demonstrations that this algorithm works well on a variety of standard nonlinear manifolds from the literature.
We will extend this approach to more complicated datasets, and also to the case of shape spaces, which has driven much of the creation of statistical methods for Riemannian manifolds.
Acknowledgments
This work was supported by the Royal Society of New Zealand’s Marsden Fund, and part of the work was done while Stephen Marsland was based at the Erwin Schrödinger Institute for Mathematical Physics in Vienna, Austria.
References
 Asuncion & Newman (2007) Asuncion, A and Newman, D J. UCI machine learning repository, 2007. URL http://www.ics.uci.edu/~mlearn/MLRepository.html.
 Cartan (1922) Cartan, Élie. Sur une généralisation de la notion de courbure de Riemann et les espaces à torsion. C.R. Acad. Sci., 174:593–595, 1922.
 Fernandez & Bloch (2011) Fernandez, O. E. and Bloch, A. M. The Weitzenböck connection and time reparameterization in nonholonomic mechanics. Journal of Mathematical Physics, 52(1):012901, 2011. ISSN 00222488. doi: 10.1063/1.3525798. URL http://link.aip.org/link/JMAPAQ/v52/i1/p012901/s1&Agg=doi.

Fletcher et al. (2003)
Fletcher, P.T., Lu, Conglin, and Joshi, S.
Statistics of shape via principal geodesic analysis on lie groups.
IEEE Computer Society Conference on Computer Vision and Pattern Recognition
, 2003. ISSN 10636919. doi: 10.1109/CVPR.2003.1211342.  Gassenbauer et al. (2011) Gassenbauer, Vaclav, Krivanek, Jaroslav, Bouatouch, Kadi, Bouville, Christian, and Ribardiere, Mickael. Improving performance and accuracy of local PCA. Pacific Graphics, 30(7), 2011.
 Guo et al. (2010) Guo, YongXin, Liu, Chang, Wang, Yong, Liu, ShiXing, and Chang, Peng. Nonholonomic mapping theory of autoparallel motions in RiemannCartan space. Science China  Physics, Mechanics, & Astronomy, 53(9):1707–1715, 2010.
 Hammond (2002) Hammond, Richard T. Torsion gravity. Reports on Progress in Physics, 65:599–649, 2002.
 Hastie & Stuetzle (1989) Hastie, Trevor and Stuetzle, Werner. Principal Curves. Journal of the American Statistical Association, 84(406):502–516, June 1989. ISSN 01621459. doi: 10.2307/2289936. URL http://www.jstor.org/stable/2289936?origin=crossref.
 Hehl & Obukhov (2007) Hehl, Friedrich W. and Obukhov, Yuri N. Élie Cartan’s torsion in geometry and field theory, an essay. Annales de la Foundation Louis de Broglie, 32(2–3):157–194, 2007.
 Hyvärinen & Oja (2000) Hyvärinen, A and Oja, E. Independent component analysis: algorithms and applications. Neural networks, 13(1):411–30, 2000. ISSN 08936080. doi: 10.1016/S08936080(00)000265. URL http://www.ncbi.nlm.nih.gov/pubmed/10946390.
 Kleinert (2000) Kleinert, H. Nonholonomic mapping principle for classical and quantum mechanics in spaces with curvature and torsion. General Relativity and Gravitation, 32(5):769–839, 2000.
 Kleinert & Shabanov (1998) Kleinert, H. and Shabanov, S. V. Spaces with torsion from embedding and the special role of autoparallel trajectories. Physics Letters B, 428:315–321, 1998.
 Michor (2008) Michor, Peter W. Topics in differential geometry, volume 93 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, USA, 2008.
 Ozertem & Erdogmus (2011) Ozertem, Umut and Erdogmus, Deniz. Locally defined principal curves and surfaces. Journal of Machine Learning Research, 12:1249–1286, 2011.
 Pennec (2006) Pennec, Xavier. Intrinsic Statistics on Riemannian Manifolds: Basic Tools for Geometric Measurements. Journal of Mathematical Imaging and Vision, 25(1):127–154, July 2006. ISSN 09249907. doi: 10.1007/s1085100662284. URL http://www.springerlink.com/index/10.1007/s1085100662284.
 Roweis & Saul (2000) Roweis, Sam T and Saul, Lawrence K. Nonlinear dimensionality reduction by locally linear embedding. Science (New York, N.Y.), 290(5500):2323–6, December 2000. ISSN 00368075. doi: 10.1126/science.290.5500.2323. URL http://www.ncbi.nlm.nih.gov/pubmed/11125150.
 Sommer et al. (2010) Sommer, Stefan, Hauberg, Sø ren, and Nielsen, Mads. Manifold Valued Statistics, Exact Principal Geodesic Analysis and the Effect of Linear Approximations. In European Conference on Computer Vision, 2010.
 Weitzenböck (1923) Weitzenböck, R. Invariantentheorie. P. Noordhoff, 1923.