Classification problems arise in many real-life situations. A new observation has to be classified on the basis of a training set, which is described by a set of features whose class memberships are known. Supervised learning techniques have been widely studied when the features lie on a vector space(Hastie et al., 2009). When features do not form a vector space, well-known supervised learning techniques are not well suited to the classifiers Tuzel et al. (2008). However, features can also take values on a Riemannian manifold. This is common in fields such as astronomy, geology, meteorology, etc., which include natural distributions on spheres, tangent bundles and Lie groups (González-Manteiga et al., 2012)
. Another example, this time in the field of computer vision, would be the space of non-singular covariance matrices(Tuzel et al., 2008). One discipline that certainly offers many examples in different fields of applications (biology, medicine, chemistry, etc.) is statistical shape analysis (Dryden and Mardia, 2016)
. Many problems involve predicting a categorical variable as a function of the shape of an object that lies in a Riemannian manifold.
Although different approaches can be identified in shape analysis based on how the object is treated in mathematical terms (Stoyan and Stoyan, 1995), the majority of research has been restricted to landmark-based analysis, where objects are represented using labeled points in the Euclidean space . These landmarks are required to appear in each data object, and to correspond to each other in a physical sense. Seminal papers on this topic are Bookstein (1978), Kendall (1984), and Goodall (1991). The main references are Dryden and Mardia (2016) and Kendall et al. (2009). In this paper we concentrate on this approach.
In a formal way, shape can be defined as the geometrical information about the object that is invariant under a Euclidean similarity transformation, i. e., location, orientation, and scale. The shape space is the resulting quotient space. When the landmark-based approach is used, the corresponding shape space is a finite-dimensional Riemannian manifold, and statistical methodologies on manifolds must be used. There are several difficulties in generalizing probability distributions and statistical procedures to measurements in a non-vectorial space like a Riemannian manifold, but fortunately, there has been a significant amount of research and activity in this area over recent years. An excellent review can be found inPennec (2006).
The most immediate approach for solving the classification problem when predictive variables take values on a Riemmanian manifold would be to map the manifold to a Euclidean space, i.e. to flatten the manifold. But, in a general case, mapping that globally preserves the distances between the points on the manifold is not available. As a consequence, the flattened space would not represent the global structure of the points appropriately. Although statistical analysis of manifold-valued data has gained a great deal of attention in recent years, there is little literature on classification. In fact, to our knowledge the only reference is Tuzel et al. (2008), and it is restricted to a binary classification problem. A LogitBoost (Friedman et al., 2000) on Riemannian manifolds is proposed in Tuzel et al. (2008). It is similar to the original LogitBoost, except for differences at the level of weak learners (the regression functions are learned on the tangent space at the weighted mean of the points).
Another related work is González-Manteiga et al. (2012)
, but they studied a regression problem (the predicted variable is real valued) rather than a classification problem. They introduced partially linear models on Riemannian manifolds (robust estimators can be found inHenry and Rodríguez (2014)). Partially linear models were proposed by Engle et al. (1986). Since then, partially linear models have been used in the study of complex nonlinear problems, some recent examples are Zhang et al. (2017), Qian and Wang (2017), Cui et al. (2017) and Hilafu and Wu (2017). In this semiparametric regression method, the dependent variable is modeled with a parametric linear part and a nonparametric part. In González-Manteiga et al. (2012)
the variable to be non-parametrically modeled is in a manifold, and they proposed a kernel-type estimator.
Based on this idea, we can generalize partially linear models (GPLM) to solve the classification problem with features in a Riemmanian manifold. We benefit from the flexibility of partially linear models and at the same time we can include features in a Riemmanian manifold. To our knowledge, this is the first time GPLMs have been defined on Riemmanian manifolds. At the same time, we also propose a solution for the classification problem for more than two classes, the only case studied to date. In particular, we also introduce a solution for when the dependent variable is ordinal. Furthermore, unlike the method proposed in Tuzel et al. (2008) where features in a Riemmanian manifold were the only predictive variables, other predictive variables together with those in a Riemmanian manifold are managed jointly by our proposal.
This paper addresses an important current application: size fitting for online garment shops, in particular children’s garment size matching. Customers face a challenge when they have to choose the right size of garment without try it on when buying these items both in store and, especially, in online clothes shops (Ding et al., 2011). Although users can base their decision on their previous experience (their virtual closet), children are constantly growing, so this not suitable strategy (Sindicich and Black, 2011). Not only that, but each company also has its own sizing, and what is more, this can change over time (Schofield and LaBat, 2005). As a consequence, size matching in children should be based on their current form.
There is usually a sizing chart that corresponds to several anthropometric measurements, together with their ranges to show the size assignation. Nevertheless, customer’s measurements can lead to different size assignations depending on which measurements are considered. Therefore, customers cannot know which size will fit them best (Labat, 2007). As a result, size fitting problems lead to a high percentage of returns, which represents one of the main costs of this sales channel for distributors and manufacturers. The return rates of some e-commerce businesses are between 20 and 50% (Eneh, 2015). This also decreases customer satisfaction (Otieno et al., 2005) and the likelihood that the customer will buy again. Moreover, concern about poor fit is the main obstacle to purchasing clothes online.
To address the child garment size matching problem, a fit assessment study was carried out by the Biomechanics Institute of Valencia. A sample of Spanish children aged between 3 and 12 years were scanned using the Vitus Smart 3D body scanner from Human Solutions. This has a non-intrusive laser system consisting of four columns that house the optic system, which moves from head to feet in ten seconds, performing a sweep of the body. The body shape of each child in our data set was represented by 3D landmarks. Although a 3D body scanner is not usually available for customers, nowadays customers can obtain their detailed body shapes using their own digital cameras or other measuring technologies (Cordier et al., 2003; Ballester et al., 2015). Recently, 3D bodies have been reconstructed from images captured with a smartphone or tablet in Ballester et al. (2016). Furthermore, a subsample of these children tested different garments of different sizes, and their fit was assessed by an expert. This expert labeled the fit as 2 (correct), 1 (if the garment was small for the child) or 3 (if the garment was large for the child) in an ordered factor called Size-fit.
Therefore, finding the garment size that best fits the user is a statistical classification problem(Meunier, 2000). In this problem, the children’s body shapes represented by landmarks are predictive variables in a Riemmanian manifold. The proposed method has been applied to the aforementioned database of children with excellent results.
To our knowledge, the only previous reference about the child garment size matching problem is Pierola et al. (2016). However, they used multivariate features, not the complete information about the child’s form. In particular, they used the differences between the reference mannequin of the evaluated size and the child for several anthropometric measurements. If the reference mannequin is not available, that methodology cannot be used.
As regards other works that also use variables in a Riemmanian manifold in the context of the apparel industry, in Vinué et al. (2016)
women’s body shapes represented by landmarks were used to define a new sizing system by adapting clustering algorithms to the shape space. Unlike our supervised learning problem, they dealt with an unsupervised learning problem. Another unsupervised learning problem is faced inEpifanio et al. (2017), where archetypal shapes of children are discovered.
The R language (R Core Team, 2017) was employed in our implementations. We used the shapes package by Ian Dryden (Dryden, 2017). This is a very powerful and complete package for the statistical analysis of shapes.
The article is organized as follows: Section 2 reviews the partially linear models on Riemmanian manifolds, which are generalized in Section 3 to Riemmanian manifolds. Algorithms for their estimation are also given. Their R (R Core Team (2017)) code is available at www3.uji.es/~epifanio/RESEARCH/partly.rar. Section 4 describes the basic concepts of statistical shape analysis, and explains how to estimate generalized partially linear models on the Kendall’s 3D Shape Space. The use of the logistic partly linear model on the Kendall’s 3D Shape Space is illustrated by a well-known data set in Section 5, while the ordered partially linear model is applied to solve the child garment size matching problem in Section 6. Finally, conclusions are discussed in Section 7.
2 Partially linear models on Riemannian manifolds
Partially linear models (PLM) Engle et al. (1986)
are regression models in which the response depends on some covariates linearly but on other covariates nonparametrically. PLMs generalize standard linear regression techniques and are special cases of additive models(Hastie and Tibshirani, 1990; Stone, 1985), which makes it easier to interpret the effect of each variable.
Partially linear models when one of the predictive variables takes values on a Riemannian manifold were introduced in González-Manteiga et al. (2012). In this work, they consider a sample , where the response variable, , is a real valued scalar variable, is a real valued -dimensional vector and is a point of a Riemannian manifold, , of dimension . They assume the partially linear model:
with , where and ; and with independent errors and . Therefore, , and are the parameters to estimate.
Manteiga et al. González-Manteiga et al. (2012) suggest estimating and using non-parametric kernel-type estimators on Riemannian manifolds (see section 2.1) and then estimating the parameter considering the least-squares estimator obtained by minimizing:
2.1 Non-parametric estimators on Riemannian manifolds
Let be iid random vectors that take values on . Due to the curvature of , kernel-type estimators of must be adapted to this space.
Pelletier Pelletier (2006) proposes the following non parametric estimator:
where is the volume density function of ; is the Riemannian distance on and is a univariate kernel function with bandwidth with and , being the injectivity radius of . In Pelletier (2006) we can find some good properties of this estimator.
3 Generalized Partially Linear Model on Riemannian manifolds
As stated in the introduction, the aim of this paper is to generalize the partially linear model on Riemannian manifolds to the generalized linear model introduced by Nelder and Wedderburn (1972) and to apply it to the particular and important case of the Kendall’s 3D shape space.
Although our proposal can be extended to generalized linear models in general, we will focus on two particular important models that we will use in our applications: logistic and ordered logistic models.
3.1 Logistic Partially Linear Model on Riemannian manifolds
Let be a set, where
are binary variables,real valued -dimensional vectors and are points in , a Riemannian manifold of dimension .
Defining , we can assume the logistic partially linear model:
with ; ; and where , and are the parameters to estimate.
As in González-Manteiga et al. (2012), because is in a Riemannian manifold, the estimation of must be obtained using equation 3. In the next section the expression of this estimator will be given for the particular and difficult case of Kendall’s 3D shape space.
The algorithm that we propose follows the ideas of additive generalized linear models (Hastie and Tibshirani, 1990; Hastie et al., 2009): a partially linear model is applied to the adjusted dependent variable at each step of the iteratively reweighted least squares (IRLS) algorithm. It is as follows (the superindex () indicates the estimation in the -th iteration):
With respect to the initializations, and , which would correspond to equiprobability, provided good results in our experiments.
3.2 Ordered Partially Linear Model on Riemannian manifolds
The above algorithm can be modified to model an ordinal response, in particular we will assume the cumulative logistic model or proportional odds modelMcCullagh (1980); Agresti (2010).
Let be a set, where are response variables, real valued -dimensional vectors and are points in , a Riemannian manifold of dimension . Suppose that the response variable has ordered categories and , . Assume:
Following Walker and Duncan (1967); McCullagh (1980) and Thompson and Baker (1981), we treat the cumulative link model as a multivariate generalized linear model Fahrmeir and Tutz (2013) defining as if and otherwise as zero. In the multivariate case one merely has to substitute vectors and matrices for the multivariate versions.
We define the total design matrix ; and
Let be the derivative of the link function and the weight matrix , with (which can be considered an approximation of the inverse of the covariance matrix of the transformed response).
Algorithm 1 is modified as follows:
Initialize , , ,
Calculate using equation 3
While ( specified threshold) and ( specified maximum number of steps) do
Construct the working target variable
Apply partly linear model to the targets with weight matrix to :
Calculate using equation 3 replacing by
In the particular case of the ordinal model with three categories of our application:
4 Kendall’s 3D Shape Space
In the previous section the logistic and ordered logistic partially linear models were given for a general Riemannian manifold. In this section we give the expressions that we need in order to apply them in the particular and important case of the Kendall ’s 3D Shape Space. This manifold has a complicated structure and the calculus of the expressions needed in equation 3 is not trivial.
We begin by introducing some basic concepts, a complete introduction to which can be found in Dryden and Mardia (2016).
In our approach to shape analysis each object is identified by a set of landmarks, i.e. a set of points in the Euclidean space that identifies each object and match between and within populations.
A configuration matrix is a matrix with the Cartesian coordinates of the landmarks of an object.
The shape of an object is all the geometric information that remains invariant with translations, rotations and changes of scale. Thus:
The shape space is the set of equivalence classes of configuration matrices under the action of Euclidean similarity transformations.
As mentioned above, the shape space admits a Riemannian manifold structure. The complexity of this Riemannian structure depends on and . For example, is the well-known complex projective space. For , which is the case of our application, they are not familiar spaces and may have singularities.
A representative of each equivalence class can be obtained by removing the similarity transformations one at a time. There are different ways to do that.
Let be a configuration matrix. One way to remove the location effect consists of multiplying it by the Helmert submatrix, , i. e., .
To filter scale, we can divide by the centroid size, which is given by
is the Frobenius norm.
is called the pre-shape of the configuration matrix because all information about location and scale is removed, but rotation information remains.
The pre-shape space is the set of all possible pre-shapes.
is a hypersphere of unit radius in (a Riemannian manifold that is widely studied and known). is the quotient space of under rotations.
As a result, a shape is an orbit associated with the action of the rotation group on the pre-shape.
From now on, in order to simplify the notation, we will use to denote both, a configuration matrix and its shape, provided that it is understood by context.
For , this quotient space is isometric with the complex projective space , a familiar Riemannian manifold without singularities. For , is not a familiar space, and it has singularities. The singularities are shapes whose preshapes have rank or less. With real world applications we can usually assume that our data are almost certainly in the non-singular part of the shape space and, fortunately, the Riemannian structure of the non-singular part of can be obtained taking into account that is a Riemannian submersion (Kendall et al., 2009), for any the tangent space can be identified with the horizontal space of .
4.1 Riemannian distance
The induced Riemannian distance in the shape space is given by the Procrustes distance defined as follows.
Given two configuration matrices , the Procrustes distance of its corresponding shapes, , is the closest great-circle distance between and on the pre-shape hypersphere , where . The minimization is carried out over rotations.
The solution for this optimization problem is:
are the square roots of the eigenvalues of, and the smallest value is the negative square root if and only if Dryden and Mardia (2016).
Note that the range of this distance is .
4.2 Volume density function
The volume density function can be obtained taking into account that the mapping that assigns the corresponding element on the shape space to each preshape :
is a Riemannian submersion. Then the volume density function is (see A)
when , and if . We must stress here the importance of the volume density function in the case of a large number of landmarks , because in the limit , the definition formula (10) becomes
4.3 Generalized Partially Linear Models on the Kendall’s 3D Shape Space
Once the necessary concepts have been introduced, we turn to the algorithm to fit a generalized partially linear model on Kendall’s Shape Space. We focus on the particular case of the ordered partially linear model (for the logistic partially linear model, it is analogous but instead we apply the algorithm 1).
5 Application to anatomical data
In an investigation into sex differences in the crania of a species of macaque, random samples of 9 male and 9 female skulls were obtained by Paul O Higgins (Hull-York Medical School) (Dryden and Mardia, 2016, 1993). A subset of seven anatomical landmarks was located on each cranium and the three-dimensional (3D) coordinates of each point were recorded. The aim of the study was to assess whether there were any size and shape differences between sexes. The data are available in the R shapes package (Dryden, 2017).
Fig. 1 (a) shows the distribution of all the landmarks of the 18 subjects. From the configuration matrices with the coordinates of the landmarks of the 18 macaques, the full Procrustes mean shapes are computed for males and females separately (see Fig. 1 (b)), and their preshapes, , (eq. 9) and sizes (eq. 8), are computed (see Fig. 2 and Table 1).
If we define if the cranium belongs to a female and if it belongs to a male, then we can model:
and algorithm 1 can be used to fit a logistic partially linear model.
In the smoothing procedure, we considered a Gaussian kernel and the bandwidth parameter was fixed as by using a leave-one-out cross-validation (CV) procedure. With this value, a CV error was obtained. With threshold=0.0002, the algorithm stops at iterations. (see Table 2).
|( of correct classifications)|
The estimation procedure provides a , signaling that the probability of being female decreases as skull size increases, and the values presented in Table 3 were obtained for the nonparametric part of the model.
6 Application to children’s body shapes
The aim of this section is to show how the aforementioned algorithm can be used to predict the goodness of fit of a given garment size, i.e. small (), good fit () or large (), as a function of the garment size, the size of the child and his/her shape.
There are multiple ways to choose the most suitable size in a potential online sales application, and all of them depend on the manufacturer.
A randomly selected sample of Spanish children aged to was scanned using a Vitus Smart 3D body scanner from Human Solutions.
The children were asked to wear a standard white garment in order to standardize the measurements. Several cameras capture images and associated software
provided by the scanner manufacturers detects the brightest points and
uses them to create a triangulation that provides information about the 3D
spatial location of points on the body’s surface.
The 3D scan data are processed to create of posture-harmonized homologous models to obtain a database of individual 3D homologous avatars with one-to-one anatomical vertex correspondence between them. As a result, each child’s body shape is represented by 3D landmarks. Because the children’s head, hands, legs and feet are not involved in the shirt size selection, these parts were discarded from the scans, and a total of 1423 3D landmarks per child were considered, i.e. each child’s the body shape was represented by a configuration matrix. Two of them are shown in Figure 3.
Seventy eight of these children performed an additional fit test. All of them tried on the same shirt model in different sizes: the supposedly correct size, the size above and the size below. Then, an expert in clothing and design qualitatively evaluated the fit in each case (as small, correct fit or large). Due to lack of cooperation by some of the children, not all the children tried on all the three sizes, but only two sizes or even one. In 24 cases, only two sizes were evaluated, and 9 children tested just one shirt size. There were 7 shirt sizes available, supposedly corresponding to ages 3, 4, 5, 6, 8, 10 and 12.
Two subsamples are considered, the sample consisting of the boys and that consisting of the girls in the data base.
Algorithm 2 is applied to fit the model:
to each subsample, being the body shape of the -th child, his/her body size and the size evaluated.
We consider a Gaussian kernel again and in order to choose the value of the bandwidth parameter and, at the same time, perform a quantitative analysis of the effectiveness of the method, a leave-one-out cross-validation study is conducted. At each step of this study, a child is left out, and his/her fit predicted for the supposedly correct size, the size above and the size below. In Table 4 we can see the percentage of correct classifications in each case.
|CV (%) accuracy||boys|
The estimation procedure using the full data set provides with for boys and with for girls. With threshold=0.0002, the algorithm stops at 783 and 618 iterations respectively.
6.1 Comparison with other methods
In Pierola et al. (2016)
, the authors used ordered logistic regression and random forest methodologies to predict a garment’s goodness of fit from the differences between the measurements of the reference mannequin for the evaluated size and the child’s anthropometric measurements. We could also have used different children’s anthropometric measurements to fit a classical proportional odds modelMcCullagh (1980). So given the response variable with ordered categories, and given a vector of explicative variables formed by the garment size to evaluate and the 27 children’s anthropometric measurements considered by Pierola et al. (2016), we could have fitted:
Performing a leave-one-out cross-validation study, choosing the model on each step by a forward stepwise model selection based on likelihood ratio tests Christensen (2015), we obtain worse results than those obtained with our methodology. The percentages of correct classifications are now for boys and for girls.
On the other hand, as stated in the introduction, the shape space and size-and-shape space are not flat Euclidean spaces, so classical statistical methods cannot be directly applied to the manifold valued data. However, if the sample has little variability, the problem can be transferred to a tangent space (at the Procrustes mean of these shapes or size-and-shapes, for example) and then standard multivariate procedures can be performed in this space Dryden and Mardia (2016)
, such as Principal Component Analysis (PCA). With this approach, in order to reduce the dimensionality of the data set, the firstPC scores, which summarize most of the variability in the tangent plane data, are usually chosen.
The tangent space is defined from a point called pole, so the distance from the shape to the pole is preserved, i.e. the distance from a point in the manifold to the pole is equal to the Euclidean distance between its projections in the tangent space. As one moves away from the pole, the Euclidean distances between some pairs of points in the tangent space are smaller than their corresponding shape distances. This distortion becomes larger as one considers points further from it. For this reason, the pole should be taken close to all of the points and the mean of the observed shapes is the best choice (Dryden and Mardia, 2016).
So, given the configuration matrices , the size of each child is obtained and the full Procrustes mean shapes are computed for boys and girls separately. Then, the coordinates of the projection of onto the tangent plane defined at their corresponding mean shape are obtained. The first PC scores of these coordinates are calculated and they will be used as covariates in our predictive model. The first PC components that explain of variability are considered.
So, given the response variable with ordered categories and given a vector with the garment size to evaluate, the child’s size and the first PC scores of his/her coordinates in his/her corresponding tangent space, we can fit the model given by Eq. 11.
Once again, performing a leave-one-out cross-validation study using this model, we obtain worse results than those obtained with our methodology. The percentages of correct classifications are now for boys and for girls.
We define GPLMs on Riemmanian manifolds for the first time. Due the application that we address, our GPLMs have focused on Kendall’s 3D Shape Space. Although it is an important and common problem in real applications, this problem has not been addressed until now, to the best of our knowledge. We have developed and illustrated the algorithms for estimating the GPLM in two different applications. We have also compared the results with other simpler approximations in the case of the children’s garment size matching problem.
Although we have focused on children’s shapes in the application, the methodology can also be used to select the right size for adults, men and women. Furthermore, as pointed out in Sect. 1, the proposed methodologies have great potential in all the fields where statistical shape analysis is used, including biological and medical applications.
Besides opening the door to applications in different fields, other future work could focus on other Riemannian manifolds. Moreover, all the work carried out on GPLMs for multivariate data could be extended to the case where variables also take values on Riemannian manifolds.
This paper has been partially supported by the grant from the Spanish Ministry of Economy and Competitiveness with FEDER funds and the grant UJI-B2017-13 from Universitat Jaume I. We would also like to thank the Biomechanics Institute of Valencia for providing us with the data set.
Appendix A Volume density function in the shape space
In this section we recall the notion of the volume density function. We need to study the volume density function when we work in a curved space. Our first step is to introduce the definition of the volume density function in its most general sense. This is attained when the underlying space is a Riemannian manifold , namely, a smooth manifold
endowed with a metric tensor. After that, we will particularize it to the explicit formula for the volume density function in the shape space , which is the relevant space in this paper. However, since it is easier to work with the pre-shape sphere than in shape space , our objective will be to make use of the submersion to compute the volume density function in explicitly.
The definition of the volume density function is as follows.
Definition 5 (see Henry and Rodríguez (2009) for instance).
Let be a Riemannian manifold, let be a point of , let be the tangent space at , let be an open geodesic ball of radius centered at , let be an open ball of radius centered at , and let be the injectivity radius at , (i.e. the maximum radius such that the exponential map is a diffeomorphism). The volume density function is a function defined for any point of the normal ball by
where is the pullback of by the exponential map, is the canonical metric induced by in , and is any chart of that contains .
In this work, following Kendall et al. (2009), we identify a point in the pre-shape sphere as a matrix. The explicit formula of the volume density function in the shape space is given in the following theorem.
Let and be two points of . Let be a Riemannian submersion from to . Suppose that (i.e. is not a singular point), then the volume density function is
where here is the distance in from to .
Since is a Riemannian submersion, for any , the tangent space can be identified with the horizontal space of . Moreover since is an isometry, for any , it is the case that . On the other hand, if , then . Therefore, in order to compute we can make use of the restriction of the exponential map to the horizontal space .
In our particular setting, given , the tangent space is
the horizontal space is
and the exponential map is given by
We are now going to obtain using the properties of the exponential map in . Given , for some , suppose and suppose moreover that is in a normal ball of . Let us now choose an orthonormal basis of with (and with ). Then
Hence, since is in a normal ball of , there is a minimal and horizontal geodesic segment starting at and ending in with initial velocity such that . Taking into account that since for any () and that because and are tangent vectors to we conclude that
where is the dimension of . Now, we are going to compute the dimension of for any . Since the tangent space can be decomposed as , then
where is the dimension of the fiber . The dimension of the fiber