## The strange geometry of shape space

Although a planar shape can be represented as a curve drawn in two-dimensional space, the space of point positions representing that curve has many more dimensions: for each of the *n* points we have *x* and *y* coordinates, so we need 2*n* numbers to describe each shape. Point configurations that differ only in position, scale, reflection, rotation, or some combination of these, describe the same shape, which means that the set of different *shapes* is not the whole of ℝ²ⁿ, but a subspace of it. And that, in turn, means that shapes live not in ordinary Euclidean space but in a non-linear space embedded within it (Kendall77, Kendall1984). For the basic case of triangles – three points in ℝ² – the shape space of allowable shapes is a hemisphere (Small, Klingenberg). The points describing more complex shapes exist in unknown, but certainly much more complex, shape spaces (Figure 1). And curves, which are continuous, exist in infinite-dimensional shape spaces (Younes).
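The counting argument can be illustrated with a minimal numpy sketch for a single triangle (an illustration, not the paper's code): centring and scaling send the 2*n* = 6 coordinates to a "pre-shape" on a unit sphere; rotation and reflection still remain to be quotiented out before reaching Kendall's shape space.

```python
import numpy as np

# A triangle: n = 3 points in the plane, so 2n = 6 numbers per configuration.
triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])

# Quotienting out translation (centre at the origin) and scale (unit centroid
# size) leaves the "pre-shape", a point on a unit sphere; rotation and
# reflection still remain to be removed to reach the shape space proper.
centred = triangle - triangle.mean(axis=0)
pre_shape = centred / np.linalg.norm(centred)

print(np.linalg.norm(pre_shape))  # ~1.0, whatever the original position or size
```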

Geometric Morphometric methods treat outline shapes as sets of points. As such, they are *linear* methods that operate in Euclidean space (Bookstein, Zelditch12, Macleod2018). Known variously as eigenshape analysis (Lohmann83, MacLeod_EA, Macleod1999) or Statistical Shape Models (ASM), they begin by extracting shapes from forms by standardizing position, scale, and rotation using a procedure called Procrustes alignment (Gower). The dimensionality of the space of coordinate points is then reduced by a method such as Principal Components Analysis, and the distances between shapes are computed among the vectors of derived variables.
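This pipeline — alignment, PCA, Euclidean distances — can be sketched as follows, using scipy's pairwise Procrustes fit against a fixed reference rather than full generalized Procrustes analysis; the random data and the choice of five components are illustrative only.

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(0)
reference = rng.standard_normal((50, 2))          # 50 semi-landmarks per outline
shapes = [reference + 0.1 * rng.standard_normal((50, 2)) for _ in range(20)]

# Procrustes-align every shape to the reference, removing position, scale,
# and rotation, then flatten each aligned point set into one vector.
aligned = np.array([procrustes(reference, s)[1].ravel() for s in shapes])

# Reduce dimensionality with PCA (via SVD) and take Euclidean distances
# among the leading principal-component scores.
centred = aligned - aligned.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
scores = centred @ vt[:5].T                       # first 5 PCs, an arbitrary cut
d01 = np.linalg.norm(scores[0] - scores[1])       # distance: shapes 0 and 1
```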

The strange geometry of shape space means, however, that distances between sets of point coordinates may not be very good estimators of the true distances between shapes. An alternative is to consider the shape as a piece of elastic and ask how much ‘energy’ — stretching and bending — is required to transform one shape into another. Under certain restrictions, one curve can be continuously deformed into another using a smooth, invertible function called a *diffeomorphism*. Large Deformation Diffeomorphic Metric Mapping (LDDMM) algorithms transform shape curves into each other and estimate a distance directly in the space of diffeomorphisms, which is an infinite-dimensional manifold that can be equipped with a Riemannian metric (Beg05, Younes) (Figure 1). As with Geometric Morphometrics, they deal with form rather than shape, and so require that Procrustes alignment be applied first.

It can be beneficial to simplify the shape space by transforming the shapes before analysis. The Square-Root Velocity Function (SRVF) method maps shapes in such a way that the shape space is a sphere, so the distances among shapes can easily be computed as great circles (Joshi2007a, Kurtek2011, Srivastava16) (Figure 1). Geometric Currents (GC) does something conceptually similar, transforming each shape into a mathematical function that can be represented as a point in a standard Euclidean vector space (Figure 1), based on Geometric Measure Theory (Federer60). Since this linear space is equipped with a Euclidean metric, it is very easy to compute distances among shapes, and other standard statistical techniques such as PCA can also be used (currents, Durrleman2009). However, unlike SRVFs, it is not possible to transform the points in the new space back into the original shapes. Although the distances are now Euclidean, the GC transformation preserves much of the information present in the original shape space, whereas linear Geometric Morphometric methods simply ignore it (currents). Neither SRVF nor GC requires Procrustes alignment.
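The core of the SRVF map is simple to state: q(t) = c′(t)/√|c′(t)|. The sketch below computes this for a sampled curve, together with a great-circle distance between unit-normalised SRVFs. Full implementations additionally optimise over rotation and reparameterisation, which is omitted here; function names are ours, not any library's.

```python
import numpy as np

def srvf(curve):
    """Square-Root Velocity Function of a discretely sampled curve.

    q(t) = c'(t) / sqrt(|c'(t)|). Under this map, unit-normalised curves
    lie on a sphere, so distances become lengths of great-circle arcs.
    """
    velocity = np.gradient(curve, axis=0)
    speed = np.linalg.norm(velocity, axis=1)
    return velocity / np.sqrt(np.maximum(speed, 1e-12))[:, None]

def srvf_distance(q1, q2):
    """Great-circle distance between unit-normalised SRVFs."""
    q1 = q1 / np.linalg.norm(q1)
    q2 = q2 / np.linalg.norm(q2)
    return np.arccos(np.clip(np.sum(q1 * q2), -1.0, 1.0))
```

Because scaling a curve merely rescales its SRVF, the normalised distance between a circle and the same circle doubled in size is zero — scale has been quotiented out.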

Geometric Morphometric methods have been used for many years in fields such as evolutionary biology (Ferson1985, Bookstein), medical image analysis (Cootes94, Heimann09), and archaeology (Grosman2016, Macleod2018, Johaczi2018). Diffeomorphic methods, by contrast, have been applied only recently (e.g., (dino_2011, bio_tut, RNA_2011, handwriting_2014, tree_2014, kimia_2015, animation_2017markus, tumour_2018, biomed_2020)). As far as we know, these two very different approaches to shape analysis have not been tested against each other on real objects. Diffeomorphic methods should, in principle, provide better estimates of the true distances among real objects, but whether they do so in fact — and whether any gain in accuracy justifies their greater computational cost — is unclear. Given the range of shape variation among real objects, the assumption that shape space is linear may even be reasonable (Klingenberg).

Here, then, we test one flavour of Geometric Morphometrics, semi-landmark eigenshape analysis, and three diffeomorphic methods — LDDMM, SRVF and GC — against each other in order to find out which of them performs best when classifying the shapes of real objects. The objects belong to three very different classes — ancient Greek vases, the leaves of Swedish trees and gastropod shells — chosen so that our results would be useful to archaeologists, botanists and zoologists, all of whom describe the shapes of the things that they study.

Each of our datasets is divided into classes, for example, genera of shells. Our test, then, rests on the ability of a statistical classifier, trained on distances computed by our various methods, to identify those classes. We show that, for all datasets, one diffeomorphic method — SRVFs — is superior to all other methods, including eigenshapes, which nevertheless usually works impressively well. We also wanted to know what a good classification — the kind that a trained human might make — looks like, so we asked experts to undertake the same test. We find that most of our algorithmic methods beat the human experts. We conclude that such methods, particularly those that operate on curves rather than points, are very effective when applied to many shape classification problems, and can even be superior to humans. Finally, in homage to the grandfather of shape analysis, D’Arcy Wentworth Thompson, we show that some of these methods provide an answer to the problem that he posed in Chapter XVII of *On Growth & Form* (OGF): how mathematics might be used to transform one shape into another.

## Results

We studied the two-dimensional outline shapes of three very different sets of objects: vases, leaves and shells. The vase outlines are based on 716 images of Athenian black- or red-figure vases classified into 24 classes: the shape categories used by vase scholars; the leaf outlines are based on 440 images of Swedish leaves classified into 15 Linnaean species; the shell outlines are based on 235 images of gastropod shells classified into 10 Linnaean genera. Figure 2 shows, for each dataset, one of the original images from which outlines were extracted, as well as the outline of a randomly chosen member of each class. Our examples embrace a great variety of shapes. Where the outlines of Greek vases are mostly smooth, those of shells and leaves are often very jagged; and where our shells have quite similar aspect ratios, some leaves are needles, others are pancake-like, while others are something in-between. The individual objects within each class are unique, and the objects are distributed more-or-less evenly among the classes.

### Square-Root Velocity Functions are superior to other shape description methods

To test our four shape description methods — eigenshapes, LDDMM, SRVFs, and GC — we first calculated the pairwise distances between all objects of each set — vases, leaves and shells — using each method. We then trained a statistical classifier on the distances among a training set of 51–67% of the objects, and asked the classifier to assign the remaining test objects to a class. To ensure that our results did not depend on the chance allocation of individuals between training and test sets, we constructed a hundred different sets by random stratified sampling and ran the classifier on each. Since the shape analysis methods compute distances between shapes, the obvious classifier is one that uses such distances directly, here the *k*-nearest neighbour (*k*-NN) classifier. We measured classification success as the F₁-score, the harmonic mean of precision and recall of the obtained classification relative to ground-truth (MarslandBook2).

Even though the training sets were small — a few hundred individuals divided among 10–24 classes — the *k*-NN proved remarkably good at classifying outline shapes. Its ability to do so, however, depended on the shape description method used. Figure 3 shows the ranked performance of each method over the object samples. The Square-Root Velocity Function method was the top-ranked method in all cases, being able to classify vases into their classes with 97% accuracy, leaves with 92% and shells with 84% (F₁-scores); Geometric Currents performed next best overall, followed by eigenshapes.
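A distance-based classifier of this kind can be sketched with scikit-learn's precomputed-metric *k*-NN and its weighted F₁-score. The toy Euclidean distances below merely stand in for the shape-space distances; the data and parameter values are illustrative, not the paper's.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Toy stand-ins: 2-D points whose class is the sign of the first coordinate.
train, test = rng.normal(size=(60, 2)), rng.normal(size=(30, 2))
y_train, y_test = (train[:, 0] > 0).astype(int), (test[:, 0] > 0).astype(int)

# Precomputed pairwise distances play the role of the shape-space distances:
# an (n_train, n_train) matrix for fitting, (n_test, n_train) for prediction.
d_train = np.linalg.norm(train[:, None] - train[None, :], axis=-1)
d_test = np.linalg.norm(test[:, None] - train[None, :], axis=-1)

knn = KNeighborsClassifier(n_neighbors=3, metric="precomputed")
knn.fit(d_train, y_train)
score = f1_score(y_test, knn.predict(d_test), average="weighted")
```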

In order to show each method to its best advantage we varied their parameters (see Materials & Methods); Figure 3 reports the best result for each method. The variation in performance that comes from tweaking parameters can be instructive. When trying eigenshapes, for example, we varied the number of principal components that went into the distances and found that, in all cases, the winner used at least 90% of the total variance and, for vases, 99.9%, which suggests that some of the shape differences between classes are very subtle indeed.

The best method, SRVFs, improves shape classification accuracy over the linear eigenshapes method by 5–10% depending on the object class. However, the superiority of diffeomorphic methods is also evident when we plot the positions of the objects in the relevant shape space. Eigenshapes, Geometric Currents and SRVFs all yield principal components and, in general, the classes are better separated in GC and SRVF PC-space than they are in eigenshape space (Figure 4A).

We can also compute the average shape of each class using our various methods. For SRVFs this is the Karcher (Fréchet) mean, an average shape estimated in Riemannian space (Grove1973). Figure 4B shows that, where eigenshape means are rather amorphous, even blob-like, Karcher means retain more detail and so resemble the original objects much more closely (compare the objects in Figure 4B with those in Figure 2). Thus our results show that a diffeomorphic shape description method, SRVFs, is better than the standard linear method, eigenshapes, at classifying the shapes of real objects and also at producing accurate averages of groups of objects. We note, however, that eigenshapes actually work surprisingly well and beat at least one diffeomorphic method, LDDMM, at least in our implementation.

### A machine shape-based classifier is superior to human experts

We would like a machine classifier that classifies at least as well as humans do. But not all humans are equally adept at classifying all things. To find out how well our shape-based *k*-NNs perform, we therefore formed a panel of experts composed of three classical vase scholars, three botanists, and three malacologists. We then asked each expert to classify a single set of objects into classes. Each expert only classified objects about which they were expert (i.e., the malacologists only got shells). Each expert was given outlines that resemble those in Figure 2; they had no direct information about the test set that the *k*-NN did not. All three experts in each group were given the same set of object outlines to classify and told how many classes to make, but did not have to name them. Thus their task was the same as that given to the *k*-NN except that, instead of being trained on a training set, they had to rely on what they already knew.

For these particular test sets, the SRVF-based *k*-NN classifier achieved F₁-scores of 0.971, 0.908, and 0.848, for vases, leaves, and shells, respectively: comparable to the scores we found on our hundred-replicate data sets (Figure 3). Our experts weren’t as good: the mean scores of the three (± one standard deviation) were 0.847 ± 0.087, 0.799 ± 0.039, and 0.574 ± 0.044 for the same objects. The best that any expert did on any dataset was 0.95 (for vases), but even that expert was beaten by the machine. Interestingly, the rank order of the average abilities of our expert groups — vase-scholars > botanists > malacologists — is the same as that of the machine classifiers, which suggests that the *a priori* taxonomies of vases, leaves and shells that we used embody successively less shape information. Moreover, as the confusion matrices show, experts and algorithms tend to make the same kinds of mistakes (Figure 5). Where our experts tended to confuse kyathoi and skyphoi vases, the three species of *Ulmus* leaves, and shells belonging to the muricid genera *Hexaplex* and *Chicoreus*, so did the algorithms. There are some differences: the SRVF-based *k*-NN correctly classified most *Conus* and *Conasprella* shells even though they have very similar cone-shaped shells, whereas our experts all failed to do so. In general, however, our results suggest that, when classifying shapes, human experts and machine classifiers based on distances in shape space do much the same thing. It’s just that algorithms do it better.

### Finding the shortest paths in shape space

SRVF and LDDMM work by transforming shapes into each other. In doing so, they find a geodesic — the shortest path in shape space. Any point along this path can be back-transformed into a shape in the original space to produce a transformational series. To illustrate this we transformed the outline of a plausible ancestor, or at least ancient relative, into that of one of our modern objects and inferred some intermediates. Figures 6A–C show the transformation of a Proto-Attic Neck Amphora (725–675 BCE) into an Athenian Red Figure Neck Amphora (525–475 BCE) (Cook1997); an early Miocene (20–18 Ma) maple, *Acer palaeosaccharinum* (Denk2017), into the recent *A. platanoides*; and the first known conid gastropod, the late Paleocene *Hemiconus leroyi* (59.2–56 Ma), into the recent *Conus furvus* (Leroy2014, Tracey2017). These examples are only illustrative: we do not claim that the earlier objects are true ancestors of the more recent ones. Indeed, the transformed objects need not be linked by evolutionary descent at all. In 1995 the New Zealand Pop artist, Dick Frizzell, transformed an American icon, Mickey Mouse, into a Māori one, the Tiki (Figures 6D & E). The SRVF geodesic path from Mickey to Tiki is slightly different from the artist’s — and 23% more efficient (Figure 6F).

## Discussion

Although linear Geometric Morphometric methods have been widely used, they only approximate distances in the complex geometry of shape space (Kendall1984, Klingenberg). For biological objects this approximation largely suffices (Klingenberg) and, consistent with this claim, we find that eigenshape analysis generally performs quite well at our classification task. However, two diffeomorphic methods, SRVF and GC, are even better at distinguishing and classifying objects of different shape, and this is true for objects as disparate as vases, leaves and shells. In addition, the mean shapes of groups of objects in these shape spaces clearly preserve more detail than linear shape means do. These results imply that diffeomorphic methods, until now mostly studied by mathematicians, belong in the scientist’s toolbox. All the implementations that we used are publicly available (see Materials & Methods).

The three diffeomorphic methods do not perform equally well at the classification task. Since LDDMM does not simplify the shape space, it might be expected to give the most accurate distances. In fact, of the three, it performs the worst. This is because its metric trades off the precision of the transformation against the length of the path between the shapes. For this reason it sometimes finds a transformation that is only close to the true target shape, and so may miss some of the finer distinctions among our classes. Since SRVF and GC work in spaces with much simpler geometries, they should be able to match curves exactly. However, implementation in a computer requires additional constraints. The GC algorithm first discretizes the shapes and, in doing so, sacrifices some information about them, while the SRVF algorithm avoids this at the cost of simplifying the path between the shapes. *A priori* it is not clear which of these approaches would be more effective, but empirically SRVFs are, for these datasets.

Our machine shape classifier worked best on vases, slightly less well on leaves and only moderately well on shells. It might be supposed that these differences in performance depend on the shapes themselves; however, since the performance of the experts showed exactly the same rank order, it is much more likely that they depend on the quality of the classes. While our ground-truth classes were imposed by humans and chosen by us in the expectation that their members have, on average, different shapes, their natures vary. The vase classes are based on a scholarly taxonomy that largely depends on their gross shapes, but the leaf and shell classes are not; for modern biological genera and species are distinguished not only by gross shape and by the shape and positioning of constituent parts (e.g., spiral ribs, varices, leaf veins), but also by microscopic, ecological, behavioural and genetic traits, or by abstract properties such as the ability to interbreed. Even the differences among vase classes are not all visible from their outlines, depending, in part, on constructional details. This means that the amount of information about class identity that is visible from shape outlines varies greatly among the three datasets.

Our classifiers used only shape rather than the many other features that might distinguish these groups. Furthermore, the classifier that we used — a *k*-NN — requires very small training sets compared to the large ones required by more sophisticated machine-learning methods such as Convolutional Neural Networks (CNNs). However, the *k*-NN is the natural choice since our analysis gives distances among shapes rather than features. Even so, the success of our shape-based classification is remarkable. We imagine that such classifiers might be useful for the automatic classification of the innumerable objects that differ in shape: not only those we have studied here, but even things as diverse as protein structures, the spectrograms of bird songs or the melodies of pop songs (e.g., (Urbano2011, Imai2016, Srivastava2016, Cope2012)). Given suitable data our methods could also be applied to three-dimensional shapes (e.g., (Koutsoudis2011, Johaczi2018)). That would be useful for rotationally asymmetrical objects, but would also require much more computational effort.

Our classifier was more accurate than the judgements of experts, almost regardless of the shape-analysis method under the hood. Why is this? We asked our experts and found that they were often led astray by prior knowledge. Where the machine classifier was trained to distinguish the groups actually present, the experts sometimes sought the groups that they thought *should* have been there. For example, all three malacologists failed to distinguish between the closely related genera *Conus* and *Conasprella*. They did so because the classification of the Conidae remains unsettled (Puillandre15), and the relationship between shell shape and genera unclear. Indeed, one of our experts had second thoughts about the cones, gave us a revised classification before being told the ground-truth, and got the best F₁-score among the malacologists, 0.716. Our intention, however, is not to diminish experts, who, after all, usually have much more information about the objects that they classify, but rather to show how effective machine shape-classifiers can be, even when based on very small training sets.

Our study revealed some limitations of the diffeomorphic methods as currently implemented. The first is that, compared to linear methods, they are computationally expensive. In the implementations we used, an SRVF or LDDMM registration for a single pair of shape outlines takes, on average, 1–3 seconds to process on a modern laptop. Computing all 255,970 pairwise distances for our Greek vase data set of 716 objects therefore takes 85 hours if performed sequentially. The Geometric Currents algorithm is much faster, taking about 10 seconds to complete the same task, although it does suffer from some memory issues. The whole eigenshapes pipeline — Procrustes alignment, PCA, and distance calculations — takes, on average, only 1.5 seconds.
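The cost arithmetic is easy to check; the per-pair time below is an assumed value within the quoted 1–3 s range, not a measurement from the paper.

```python
n = 716                      # vases in the dataset
pairs = n * (n - 1) // 2     # unordered pairs of outlines
seconds_per_pair = 1.2       # assumption: within the quoted 1-3 s range
hours = pairs * seconds_per_pair / 3600

print(pairs)                 # 255970
print(round(hours))          # 85
```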

A second limitation is more profound. The curves that we have used are closed — without a start or end — and the algorithms can rotate one shape relative to the other to find lower energy paths from one to the other. In general, this should result in homologous parts being aligned to each other, but it need not. Instead, the spire of one shell might be aligned to another’s siphonal canal or the neck of one vase to another’s base. Indeed, when using SRVFs to transform various shapes into each other we came across some instances of just this phenomenon. That may not matter for the purposes of mere classification, but any evolutionary interpretation of the distances would be incorrect, for the inferred path would be one that evolution could not possibly have taken. Geometric morphometric methods, which depend on the correspondence of explicitly homologous points — landmarks — are not vulnerable to this error. Constraining the curve transformations by adding some landmarks may solve this problem, and we will consider this in the future.

More than a hundred years ago D’Arcy Wentworth Thompson posited his “theory of transformations” which held that species closely related by evolutionary descent should also be related by “simple” shape transformations; and that “small” transformations indicate particularly close evolutionary affinities (MandM, OGF). To demonstrate this Thompson relied on outline drawings of an animal, adding a rectilinear grid that was deformed using a regular transformation, with the image of the animal deformed along with it until it more closely matched another animal. Our modern equivalents dispense with the grid and match the curves more accurately. However, in spirit they are the same, and our transformations illustrate how the evolution of shape in Riemannian space can be modelled so that it might be mapped onto a phylogeny or even used to infer one (Gavryushkina2014, Parins2017).

## Materials & Methods

### Datasets

The vase images were obtained from the Beazley Archive Pottery Database (BAPD) at Oxford University; their taxonomy, which was modified slightly from the standard shape taxonomy given in the BAPD, was checked by two experts, TM and DRP. The leaf images are based on the Swedish Leaf Dataset previously used in the image analysis and shape literature (swedish_leaf); the images came with species labels which were checked by an expert, TER. The shell images were obtained from Gastropods.com; the images came with species labels whose taxonomy was standardized to the World Register of Marine Species (WoRMs) and checked by AML. Each image represents a unique object and was checked to ensure that it was complete and in standard orientation. The sources of the original images are given in a datafile on this repository [url].

### Data Preparation

Shape methods require an outline of the object, and it is often necessary to extract this from a digital photograph. While this has long been an area of research interest in computer vision — and is something that humans do easily — completely reliable methods do not yet exist (De16). We used a common contour-extraction algorithm, the Marching Squares method (maple), on a binarized version of each image, with the threshold chosen experimentally. For the leaves and shells no other pre-processing was performed, but for the vases the handles were removed using a spline fit, which was verified and, if necessary, corrected manually. The vases were also made to have reflective symmetry through a central vertical axis by computing the outline contour of each side, taking the shorter of the pair, and reflecting it to make the full shape. This removes structures such as the spouts of pouring vessels.
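Marching Squares, as implemented in scikit-image for example, traces such an outline from a binarized image; here a synthetic disc stands in for a thresholded photograph (this is an illustration of the algorithm, not the paper's pipeline).

```python
import numpy as np
from skimage import measure

# Synthetic binary image: a filled disc stands in for a thresholded photograph.
yy, xx = np.mgrid[0:100, 0:100]
binary = ((xx - 50) ** 2 + (yy - 50) ** 2 < 30 ** 2).astype(float)

# Marching Squares traces iso-contours of the image at the 0.5 level.
contours = measure.find_contours(binary, level=0.5)

# Keep the longest contour as the object outline (row, col coordinates).
outline = max(contours, key=len)
```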

Each outline curve was resampled to an identical number of equally spaced points — 139 for the vases, 150 for the shells, and 200 for the leaves — by sampling a cubic spline fitted to the curve. Preliminary experiments showed that, at these resolutions, no difference between the interpolated curve and the original shape was visible to the naked eye. The point sets were aligned using Procrustes alignment to remove the global transformations of scale, rotation, and translation from the curves. This is necessary for the linear method and LDDMM, but not for SRVF or GC. Examples of the resulting shape outlines with filled interiors are shown in Figure 2. The same datasets of shape outlines were used when testing all methods. The shape outline data are available at the following DOIs: Vases: 10.6084/m9.figshare.14551002, Leaves: 10.6084/m9.figshare.14551005, Shells: 10.6084/m9.figshare.14551044.

### Estimating distances

Parameters for each method were chosen experimentally based on the training data, and the upper-triangular distance matrix between all pairs of shapes was computed for each method.

*Eigenshapes*: We used the points that parameterise the curve as semi-landmarks. We experimented with optimising the positions of these landmarks, but it was computationally expensive and did not improve the results. We computed the principal components of the point coordinates of all shapes and, from these, the Euclidean distances among them using the first *d* dimensions, where *d* was chosen according to the amount of variance explained, tested over a range of values.

*LDDMM*: We used the implementation described in (langevin_2017) available here, running for 20 timesteps.

*SRVF*: We used the implementation available here. The Path-Straightening algorithm is described in (path_st_2011) and available *here*. The algorithm transforms one shape into another in a user-chosen number of steps; the output is the geodesic distance, which is the inner product in SRVF space between the first shape and the final shape in the transformation. To compute our distance matrix, we used a fixed number of steps.

*Geometric Currents*: We used the method described by (currents) available here. This implementation takes three parameters: a non-negative integer determining the size of the matrix representation, the mesh-size, and a scaling parameter. We tested three options for each parameter.

### Machine classification

Most machine learning algorithms take as input features of the elements of the dataset (or their complete representation), rather than distances. We, however, used our various shape analysis methods to compute distances among objects and wished to classify on those. For this reason we implemented our own *k*-nearest neighbour (*k*-NN) classifier that takes a distance matrix as its input. Our *k*-NN assigns each element of the test set to the class of the majority of the *k* closest points in the training set, where *k* is a user-selected parameter. We tested values of *k* between 3 and 12 for each method and object class and report the *k* that results in the highest F₁-score. We ran the *k*-NN on 100 randomly selected samples from the training sets of each dataset and computed the F₁-scores, where the samples were selected with a pseudo-random number generator. For vases the ratio of training:test set was 480:236, for leaves 300:140, and for shells 120:115. To ensure that training-set size was not the reason for the better performance on vases, we also reduced the size of that training set (to 10 in each class), leaving the test set alone, without significantly changing the results. Interestingly, even when the training set was reduced further, to 2 in each class, the classifier still did well. We used the sklearn implementation of the F₁-score with the average parameter set to “weighted”.

### Expert classification

Each expert was given a standard test set of shape outlines as individual images and asked to partition them into *g* groups, where *g* is the number of ground-truth classes, by sorting them into folders. The objects were anonymized so that no expert had any information about them that the machine classifier did not. The experts were not asked to identify the groups that they formed. Each expert’s classification was then compared to the ground-truth classification using the F₁-score.

### Transformations

To create the transformation plots shown in Figure 6, we used the SRVF Path-Straightening algorithm. Note that the transformations are not necessarily symmetric even if the shapes themselves are, as Mickey and Tiki are. Therefore, to display a symmetric transformation between Mickey and Tiki, we split the outlines in half and transformed these halves into one another; the transformations were then reflected and attached. Furthermore, to compare the efficiency of our transformation with the artist’s, albeit in a metaphorical sense, we computed the sum of the distances between consecutive outlines, i.e., the energy needed to deform one shape into the other.

We thank Thomas Denk, Swedish Museum of Natural History, for images of fossil leaves; Steven Tracey, Natural History Museum, London, for images of fossil cones; Keith Kirby, Plant Sciences, University of Oxford, for help with the plant classification task; and Dick Frizzell for the use of his art work *Mickey to Tiki Tu Meke*. AS-J was supported by an EPSRC studentship, awarded to Brunel University London.