Position paper: Towards an observer-oriented theory of shape comparison

03/07/2016 ∙ by Patrizio Frosini, et al. ∙ 0

In this position paper we suggest a possible metric approach to shape comparison that is based on a mathematical formalization of the concept of observer, seen as a collection of suitable operators acting on a metric space of functions. These functions represent the set of data that are accessible to the observer, while the operators describe the way the observer elaborates the data and encode the invariance that he/she associates with them. We present this model and illustrate some theoretical reasons that justify its possible use for shape comparison.




1 Introduction

The poem “The Blind Men and the Elephant” by John Godfrey Saxe begins with these verses:

“It was six men of Indostan
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind.”

This famous Indian story describes a group of blind men who touch an elephant and totally disagree about what it is like, because each one touches a different part of the body. In literature, art and philosophy the theme of multiperspectivity (i.e., the existence of many different interpretations of the same perceptual phenomenon) is a well-known issue [Mau11].

Beautiful examples of this fact can also be found in artworks. The fascinating sculptures by Guido Moretti displayed in Figures 1 and 2 show in a direct and clear way how the concept of shape cannot be defined and treated independently from the choice of an observer. Shape is indeed in the beholder’s eyes, and phenomena such as camouflage and optical illusions depend on this basic principle (cf., e.g., [Koe90, Fro09, GFSF10]).

It is indeed well-known that in many contexts the concept of shape is not a property of objects but of the pairs (object, observer) that are involved in perception, since changing the observer can drastically transform the perception of reality.

In the past these observations were mostly confined to the philosophical and epistemological debate, but nowadays they are becoming quite relevant also in several scientific applications involving Information Technology [Sta07]. In particular, geometrical shape comparison often requires approaches that take into account the role of the observer. Trying to avoid this problem by considering shapes just as subsets, topological subspaces or submanifolds of a Euclidean space contributes to the semantic gap between the geometrical descriptions and their perceptual meanings (cf. [HF07, SWS00]). This semantic gap cannot be closed by focusing just on the objects and disregarding the chosen observer.

In this position paper we intend to consider this issue and propose a possible solution, framing it in the emerging field of Topological Data Analysis [Car09].

We are interested in these questions:

  • Is there a general metric model to compare data in TDA?

  • What is the role of the observer in this comparison?

  • How could we approximate the observer’s judgement by means of a computable metric?

Figure 1: Different observers can perceive different shapes in the presence of the same object. This image depicts three views of the bronze sculpture “Impossible Ring and Parallelepipeds” by Guido Moretti.
Figure 2: Three views of the bronze sculpture “Impossible Ring and Pillars” by Guido Moretti.

2 Our theoretical model

In the next two sections we will describe both the principles accounting for our model and its mathematical formalization.

2.1 An informal description of our model

The model we propose to consider is based on these general assumptions:

  1. No object can be studied in a direct and absolute way. Any object is only knowable through acts of measurement made by an observer.

  2. Any act of measurement can be represented as a function defined on a topological space.

  3. The observer usually acquires measurement data by applying operators to the functions describing them.

  4. Only the observer is entitled to decide about shape similarity.

Assumption 1 is justified by the fact that according to the scientific paradigm, we cannot refer to properties of reality that are intrinsically impossible to detect by measurement processes.

Assumption 2 is based on the fact that when we make a measurement, we usually obtain a function as a result. For example, a grayscale image can be considered as a function from a rectangle to the real numbers (or, in the discrete case, from the set of cells of a matrix to a set of integers). The result of a CT scan can be seen as a function from $S^1 \times \mathbb{R}$ (or, more precisely, a helix going around a body) to the real numbers, where $S^1$ represents the topological space of all directions that are orthogonal to a given axis and the real numbers are the metric space of all possible quantities of matter encountered by the X-ray beam in the considered direction. A weight measurement is just a function from a singleton to the real numbers, taking the only available point in the domain to the weight of the object we are examining. We also observe that many kinds of data that are not usually represented as functions can in fact be described by means of functions. For example, every cloud of points $C$ in a metric space $M$ is equivalent to the function that takes each point of $M$ to its distance from $C$.
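The last example can be made concrete with a minimal sketch. It assumes a finite cloud of points in a Euclidean space with the usual Euclidean distance; the names `distance_function` and `d_C` are hypothetical and introduced only for illustration.

```python
import numpy as np

def distance_function(cloud):
    """Return the function x -> min over c in cloud of ||x - c|| (Euclidean norm)."""
    cloud = np.asarray(cloud, dtype=float)

    def d_C(x):
        x = np.asarray(x, dtype=float)
        # Distance from x to the cloud = distance to its nearest point.
        return float(np.min(np.linalg.norm(cloud - x, axis=1)))

    return d_C

cloud = [(0.0, 0.0), (3.0, 4.0)]
d_C = distance_function(cloud)
print(d_C((0.0, 0.0)))  # a point of the cloud lies at distance 0 from it
print(d_C((3.0, 0.0)))  # the nearest cloud point is (0, 0), at distance 3
```

The cloud is recovered as the zero set of this function, which is why the two descriptions carry the same information.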

The requirement that the domain of the function be a topological space is important in applications where we need to assume that our data are continuous. However, in the presence of discontinuities, it is usually required that they be localized at “small” subsets of the domain, so that we still need to use a topology on it. For example, we usually assume that the color of the points of an object changes continuously, possibly apart from a set of null measure. The formalization of this assumption cannot be made without the use of a topology.

Assumption 3 is supported by the fact that in most experiments data are not used directly, but after an elaboration that makes their analysis easier (or simply feasible). This elaboration is usually done by means of suitable operators, which are sometimes embedded in the measurement process. Building the body of a simplicial -complex from a cloud of points or blurring an image are two examples of such operators among many possible others. These operators transform functions into other functions that are usually simpler to manage.

It is important to underline that the observer cannot usually choose the functions representing the measurement data, but can often choose the operators that will be applied to those functions. Moreover, the choice of the operators reflects the invariances that are relevant for the observer.

Assumption 4 is based on what we previously said about multiperspectivity.

According to this model, instead of directly focusing on the objects we are interested in, we should focus on the functions describing the measurements we make on the objects, and on the “glasses” that we use to “observe” the functions. In our approach, these “glasses” are invariant operators which act on the functions.

These operators represent the observer’s perspective and, in some sense, define the observer.

Figure 3: In the proposed model, each observer can be represented as a collection of (suitable) operators $F_i$, which act on the functions that represent the measurement data and are endowed with the invariance that the observer has chosen.

2.2 A mathematical framework to formalize our model

The previous epistemological model has led to the mathematical framework that will be described in this section (cf. [FJ16]).

Let us consider a compact space $X$ and a subset $\Phi$ of the set $C^0(X,\mathbb{R})$ of all continuous functions from $X$ to $\mathbb{R}$. The space $\Phi$ represents the space of the functions that the observer considers as acceptable data. In our previous example about CT scanning, $\Phi$ would be the set of all functions that we can obtain by associating each X-ray beam with the quantity of matter it can encounter.

The observer usually takes some invariance into account. This invariance can be usually represented by a group.

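Let $G$ be a subgroup of the group $\mathrm{Homeo}(X)$ of all homeomorphisms $g: X \to X$. We assume that the group $G$ acts on $\Phi$ by composition on the right, i.e. by taking every $\varphi \in \Phi$ to the function $\varphi \circ g \in \Phi$, for each $g \in G$.

We can define a pseudo-distance $d_G$ on $\Phi$ by setting

$$d_G(\varphi_1, \varphi_2) = \inf_{g \in G} \|\varphi_1 - \varphi_2 \circ g\|_\infty .$$

The function $d_G$ is called the natural pseudo-distance associated with the group $G$. We recall that a pseudo-distance is just a distance without the property $d(x,y) = 0 \Rightarrow x = y$.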

In plain words, the definition of $d_G$ is based on the attempt of finding the best correspondence between the functions $\varphi_1, \varphi_2$ (i.e. observations) by means of homeomorphisms in $G$. If $d_G(\varphi_1, \varphi_2) = 0$, $\varphi_1$ and $\varphi_2$ are equivalent w.r.t. $G$. For example, if $\Phi$ is the space of all normalized grayscale images, represented as the set of all compact-supported functions from the real plane $\mathbb{R}^2$ to the interval $[0,1]$, we can choose $G$ to be the group of rigid motions of the plane. In this case the equality $d_G(\varphi_1, \varphi_2) = 0$ means that there is a rigid motion taking the image $\varphi_1$ to the image $\varphi_2$. The choice of the invariance group is assigned to the observer, who is the only judge of similarity in shape comparison.
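As a minimal sketch of the natural pseudo-distance, assume that the signals are sampled at $n$ points of a circle and that the invariance group consists of the $n$ cyclic shifts of the samples. Both choices are hypothetical and made only for illustration. In this finite setting the infimum over the group is a minimum and can be computed by brute force.

```python
import numpy as np

def natural_pseudo_distance(phi1, phi2):
    """d_G for G = cyclic shifts: smallest sup-norm distance over all shifts of phi2."""
    phi1 = np.asarray(phi1, dtype=float)
    phi2 = np.asarray(phi2, dtype=float)
    n = len(phi1)
    # np.roll(phi2, s) plays the role of phi2 composed with a group element g.
    return min(float(np.max(np.abs(phi1 - np.roll(phi2, s)))) for s in range(n))

phi = np.array([0.0, 1.0, 0.5, 0.2])
shifted = np.roll(phi, 2)                      # the same signal after a shift
other = np.array([0.0, 1.0, 0.5, 0.6])         # a genuinely different signal

print(natural_pseudo_distance(phi, shifted))   # 0.0: equivalent for this observer
print(natural_pseudo_distance(phi, other))     # positive: not equivalent
```

The cost of this brute-force search grows with the size of the group, which is one way to see why a different strategy is needed when the group is infinite.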

The natural pseudo-distance $d_G$ represents our ground truth. When the observer has chosen the set $\Phi$ of signals he/she can perceive and the invariance group $G$ he/she uses to define which signals are considered equivalent, $d_G$ endows $\Phi$ with a pseudo-metric structure. Unfortunately, in many cases $d_G$ is difficult to compute. This is also a consequence of the fact that we can easily find subgroups $G$ of $\mathrm{Homeo}(X)$ that cannot be approximated with arbitrary precision by smaller finite subgroups of $G$ (e.g. the group of rigid motions of $\mathbb{R}^3$).

Nevertheless, $d_G$ can be approximated with arbitrary precision by means of a dual approach based on persistent homology and $G$-invariant non-expansive operators.

We recall that persistent homology is a theory describing the $k$-dimensional holes (components, tunnels, voids, …) of the sublevel sets of a topological space $X$ endowed with a continuous function $\varphi: X \to \mathbb{R}^n$. In the case $n = 1$, persistent homology is described by suitable collections of points called persistence diagrams [EH10]. These diagrams can be compared by a suitable metric $d_{\mathrm{match}}$, called bottleneck (or matching) distance. The simplest version of this theory counts the components of the sublevel sets of $\varphi$ [VUFF93].

For the sake of simplicity, in the rest of this paper we will assume that $n = 1$. For technical reasons, let us also assume that the topological space $X$ is finitely triangulable and has nontrivial homology in degree $k$, and that the set $\Phi$ contains the set of all constant functions.

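Now, let us consider the set $\mathcal{F}(\Phi, G)$ of all $G$-invariant non-expansive operators (GINOs) from $\Phi$ to $\Phi$.

In plain words, $F \in \mathcal{F}(\Phi, G)$ means that

  1. $F(\varphi \circ g) = F(\varphi) \circ g$ for every $\varphi \in \Phi$ and every $g \in G$. ($F$ is a $G$-operator)

  2. $\|F(\varphi_1) - F(\varphi_2)\|_\infty \le \|\varphi_1 - \varphi_2\|_\infty$ for every $\varphi_1, \varphi_2 \in \Phi$. ($F$ is non-expansive)

The symbol $\|\cdot\|_\infty$ denotes the sup-norm.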

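In the example where $\Phi$ is the space of all normalized grayscale images and $G$ is the group of rigid motions of the plane, a simple example of operator $F \in \mathcal{F}(\Phi, G)$ is given by the Gaussian blurring filter, i.e. the operator $F$ taking $\varphi \in \Phi$ to the function $\psi(x) = \frac{1}{2\pi\sigma^2} \int_{\mathbb{R}^2} \varphi(y)\, e^{-\frac{\|x - y\|^2}{2\sigma^2}}\, dy$.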
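The two defining properties of a GINO can be checked numerically in a discrete analogue of this setting. The sketch below assumes signals sampled on a circle, the group of cyclic shifts, and a circular moving average as a stand-in for blurring. The name `circular_mean` is hypothetical, and the check is a test on random samples rather than a proof.

```python
import numpy as np

def circular_mean(phi, width=3):
    """Average each sample with its neighbours on the cycle (width must be odd)."""
    half = width // 2
    return sum(np.roll(phi, s) for s in range(-half, half + 1)) / width

rng = np.random.default_rng(0)
phi1, phi2 = rng.random(16), rng.random(16)

# Property 1: F(phi o g) = F(phi) o g for every cyclic shift g.
equivariant = all(
    np.allclose(circular_mean(np.roll(phi1, s)), np.roll(circular_mean(phi1), s))
    for s in range(16)
)

# Property 2: ||F(phi1) - F(phi2)||_inf <= ||phi1 - phi2||_inf.
lhs = np.max(np.abs(circular_mean(phi1) - circular_mean(phi2)))
rhs = np.max(np.abs(phi1 - phi2))
non_expansive = bool(lhs <= rhs + 1e-12)

print(equivariant, non_expansive)
```

The moving average is a convex combination of shifted copies of the signal, so both properties follow from the triangle inequality. The same argument explains why a normalized Gaussian kernel yields a non-expansive operator.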

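Now, let us assume that $\mathcal{F}$ is a subset of $\mathcal{F}(\Phi, G)$. For every $\varphi_1, \varphi_2 \in \Phi$ we can consider the supremum $D^{\mathcal{F}}_{\mathrm{match}}(\varphi_1, \varphi_2)$ of the bottleneck distances between the persistence diagrams (in the fixed degree $k$) of the functions $F(\varphi_1), F(\varphi_2)$, when $F$ varies in $\mathcal{F}$.

Since $D^{\mathcal{F}}_{\mathrm{match}}$ is the supremum of a set of pseudo-metrics, it is itself a pseudo-metric. Furthermore, for every $\varphi_1, \varphi_2 \in \Phi$ and every $g \in G$ the equalities $D^{\mathcal{F}}_{\mathrm{match}}(\varphi_1, \varphi_2 \circ g) = D^{\mathcal{F}}_{\mathrm{match}}(\varphi_1 \circ g, \varphi_2) = D^{\mathcal{F}}_{\mathrm{match}}(\varphi_1, \varphi_2)$ hold.

We remark that the pseudo-distance $D^{\mathcal{F}}_{\mathrm{match}}$ and the natural pseudo-distance $d_G$ are defined in quite different ways. In spite of this, the following result can be proved [FJ16].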

Theorem 2.1

If $\mathcal{F} = \mathcal{F}(\Phi, G)$, then the pseudo-distance $D^{\mathcal{F}}_{\mathrm{match}}$ coincides with the natural pseudo-distance $d_G$.
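One half of this statement, namely that the pseudo-distance built from persistence diagrams never exceeds the natural pseudo-distance, can be sketched from the definitions alone. In the sketch we write $\mathrm{Dgm}_k(\psi)$ for the persistence diagram in degree $k$ of a function $\psi$ (a notation introduced here only for illustration), $d_{\mathrm{match}}$ for the bottleneck distance, and we rely on two standard facts: persistence diagrams do not change under composition with a homeomorphism, and the bottleneck distance between diagrams is bounded by the sup-norm distance between the functions. For every operator $F$ in the chosen family and every $g \in G$:

```latex
\begin{align*}
d_{\mathrm{match}}\big(\mathrm{Dgm}_k(F(\varphi_1)), \mathrm{Dgm}_k(F(\varphi_2))\big)
  &= d_{\mathrm{match}}\big(\mathrm{Dgm}_k(F(\varphi_1)), \mathrm{Dgm}_k(F(\varphi_2) \circ g)\big)
     && \text{(invariance of diagrams)} \\
  &= d_{\mathrm{match}}\big(\mathrm{Dgm}_k(F(\varphi_1)), \mathrm{Dgm}_k(F(\varphi_2 \circ g))\big)
     && \text{($F$ is a $G$-operator)} \\
  &\le \|F(\varphi_1) - F(\varphi_2 \circ g)\|_\infty
     && \text{(stability)} \\
  &\le \|\varphi_1 - \varphi_2 \circ g\|_\infty
     && \text{($F$ is non-expansive)}
\end{align*}
```

Taking the supremum over $F$ on the left and the infimum over $g \in G$ on the right gives $D^{\mathcal{F}}_{\mathrm{match}}(\varphi_1, \varphi_2) \le d_G(\varphi_1, \varphi_2)$ for every family $\mathcal{F}$ of GINOs. The reverse inequality, which needs the whole set of GINOs, is the harder part and is proved in [FJ16].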

This fact suggests studying $D^{\mathcal{F}}_{\mathrm{match}}$ instead of $d_G$.

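We can prove that if $\Phi$ is a compact metric space with respect to the sup-norm, then $\mathcal{F}(\Phi, G)$ is a compact metric space with respect to the distance $d$ defined by setting

$$d(F_1, F_2) = \max_{\varphi \in \Phi} \|F_1(\varphi) - F_2(\varphi)\|_\infty$$

for every $F_1, F_2 \in \mathcal{F}(\Phi, G)$ [FJ16].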

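As a consequence, we can also prove that if the metric space $\Phi$ is compact with respect to the sup-norm and $\mathcal{F}$ is a subset of $\mathcal{F}(\Phi, G)$, then for every $\varepsilon > 0$ a finite subset $\mathcal{F}^*$ of $\mathcal{F}$ exists, such that

$$\left| D^{\mathcal{F}^*}_{\mathrm{match}}(\varphi_1, \varphi_2) - D^{\mathcal{F}}_{\mathrm{match}}(\varphi_1, \varphi_2) \right| \le \varepsilon$$

for every $\varphi_1, \varphi_2 \in \Phi$.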

This statement implies that the pseudo-distance $D^{\mathcal{F}}_{\mathrm{match}}$ (and hence also $d_G$) can be approximated computationally, at least in the case that $\Phi$ is compact. As we have just seen, this is done by means of a collection of suitable operators, which takes the place of the observer in our model.
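To illustrate how a finite collection of operators can stand in for the observer, here is a rough sketch in a toy discrete setting. It assumes signals sampled on a cycle graph, cyclic shifts as the invariance group, persistent homology in degree 0 only, and a hypothetical three-operator family (identity and two circular moving averages). All function names are ours, the bottleneck distance is computed by brute force, and nothing here reproduces the implementation of an existing tool.

```python
import numpy as np

def persistence0(values):
    """Degree-0 sublevel-set persistence of samples on a cycle graph.
    Returns the finite (birth, death) pairs and the global minimum (essential class)."""
    n = len(values)
    parent, birth, pairs = {}, {}, []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for v in sorted(range(n), key=lambda i: values[i]):
        parent[v], birth[v] = v, values[v]
        for u in ((v - 1) % n, (v + 1) % n):
            if u in parent:
                ru, rv = find(u), find(v)
                if ru != rv:
                    # Elder rule: the younger component dies at the current level.
                    young, old = (ru, rv) if birth[ru] > birth[rv] else (rv, ru)
                    if birth[young] < values[v]:
                        pairs.append((birth[young], values[v]))
                    parent[young] = old
    return pairs, min(values)

def bottleneck(dgm_a, dgm_b):
    """Brute-force bottleneck distance between two outputs of persistence0."""
    (a, min_a), (b, min_b) = dgm_a, dgm_b
    size = len(a) + len(b)
    cost = np.zeros((size, size))        # rows: points of a, then diagonal slots
    for i, (p, q) in enumerate(a):
        cost[i, len(b):] = (q - p) / 2   # point of a matched to the diagonal
        for j, (r, s) in enumerate(b):
            cost[i, j] = max(abs(p - r), abs(q - s))
    for j, (r, s) in enumerate(b):
        cost[len(a):, j] = (s - r) / 2   # point of b matched to the diagonal

    def perfect(t):
        """True if a perfect matching exists using only pairs of cost <= t."""
        match = [-1] * size
        def augment(r, seen):
            for c in range(size):
                if cost[r, c] <= t and not seen[c]:
                    seen[c] = True
                    if match[c] < 0 or augment(match[c], seen):
                        match[c] = r
                        return True
            return False
        return all(augment(r, [False] * size) for r in range(size))

    finite = next((t for t in np.unique(cost) if perfect(t)), 0.0)
    return max(abs(min_a - min_b), float(finite))

def circular_mean(phi, width):
    half = width // 2
    return sum(np.roll(phi, s) for s in range(-half, half + 1)) / width

# Hypothetical finite family of shift-equivariant, non-expansive operators.
FAMILY = [lambda phi: phi,
          lambda phi: circular_mean(phi, 3),
          lambda phi: circular_mean(phi, 5)]

def d_match_family(phi1, phi2, family=FAMILY):
    """Largest bottleneck distance between diagrams of F(phi1), F(phi2), F in family."""
    return max(bottleneck(persistence0(F(phi1)), persistence0(F(phi2))) for F in family)

def natural_pseudo_distance(phi1, phi2):
    return min(np.max(np.abs(phi1 - np.roll(phi2, s))) for s in range(len(phi1)))

rng = np.random.default_rng(1)
phi1, phi2 = rng.random(12), rng.random(12)
lower = d_match_family(phi1, phi2)
exact = natural_pseudo_distance(phi1, phi2)
print(lower <= exact + 1e-12)                                     # lower bound for d_G
print(np.isclose(d_match_family(phi1, np.roll(phi2, 5)), lower))  # shift invariance
```

With such a small family the value is only a lower bound for the natural pseudo-distance. How close it gets depends on the choice of operators, which is exactly the design question left to the observer.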

It is important to highlight that in the framework we have described the invariance group $G$ is a variable of our problem, and that its choice is completely assigned to the observer, according to the statement that only the observer is entitled to decide about shape similarity.

Many interesting questions remain open. The most important is probably the one of devising methods to build families $\mathcal{F}$ of $G$-invariant non-expansive operators that are small and simple to compute, but still able to guarantee that the associated pseudo-metric $D^{\mathcal{F}}_{\mathrm{match}}$ is a good approximation of the natural pseudo-distance $d_G$.

2.3 A simple case study in this model

In order to show the use of our approach, we have realized (jointly with Grzegorz Jabłoński and Marc Ethier) a simple demonstrator that illustrates how our model based on collections of $G$-invariant non-expansive operators (GINOs) could make available new methods for image comparison. The demonstrator (named GIPHOD, $G$-Invariant Persistent HOmology Demonstrator) is available at the web page http://giphod.ii.uj.edu.pl/. The program asks the user to choose an invariance group in a list and a query image in a dataset of quite simple synthetic images obtained by adding a small number of bell-like functions. After that, GIPHOD provides ten images that are judged to be the most similar to the proposed query image with respect to the chosen invariance group. In this case study, the dataset $\Phi$ is a subset of the set of all continuous functions from the square $[0,1] \times [0,1]$ to the interval $[0,1]$. Each of them represents a grayscale image on the square ($1$=white, $0$=black).

GIPHOD works by using a collection of GINOs for each invariance group $G$. This demonstrator tries to approximate $d_G$ by means of the previously described technique, based on the persistent homology of the functions $F(\varphi)$ for $\varphi \in \Phi$ and $F$ varying in our set of operators.

2.4 Conclusions

The model that we have proposed to study is based on the idea that, from the mathematical point of view, a shape should not be considered as a subset or a submanifold of a Euclidean space, but as a quotient of the space $\Phi$ of the signals that can be perceived by the chosen observer with respect to the action of a given invariance group $G$. According to this model, each observer should be represented by a collection of group-invariant non-expansive operators acting on $\Phi$. This idea is supported by some formal results showing how the emerging theory of persistent homology could be used to study the approach to shape comparison that we have proposed in this position paper. We suggest that this approach could contribute to bridging the semantic gap by means of the framework of topological data analysis.

Developments of the proposed model are presently the object of research. In particular, the extension of the model to the case of operators taking measurement data belonging to a space $\Phi$ to functions belonging to a different space $\Psi$ is under study. This extension seems promising for applications.

Another present research project concerns the study of the algebraic and topological properties of the spaces of GINOs.


The author thanks Sergio Rajsbaum for his valuable suggestions and advice concerning multiperspectivity. Special thanks to Marc Ethier, Massimo Ferri and Grzegorz Jabłoński for their precious help. The research described in this article has been partially supported by GNSAGA-INdAM (Italy), and is based on the work realized by the author within the ESF-PESC Networking Programme “Applied and Computational Algebraic Topology”.


  • [Car09] Carlsson G.: Topology and data. Bull. Amer. Math. Soc. 46, 2 (2009), 255–308. doi:10.1090/S0273-0979-09-01249-X.
  • [EH10] Edelsbrunner H., Harer J. L.: Computational Topology. An introduction. American Mathematical Society, Providence, RI, USA, 2010.
  • [FJ16] Frosini P., Jabłoński G.: Combining persistent homology and invariance groups for shape comparison. Discrete & Computational Geometry 55, 2 (2016), 373–409. doi:10.1007/s00454-016-9761-y.
  • [Fro09] Frosini P.: Does intelligence imply contradiction? Cognitive Systems Research 10, 4 (2009), 297–315. doi:10.1016/j.cogsys.2007.07.009.
  • [GFSF10] Giorgi D., Frosini P., Spagnuolo M., Falcidieno B.: 3D relevance feedback via multilevel relevance judgements. The Visual Computer 26, 10 (2010), 1321–1338. doi:10.1007/s00371-010-0524-0.
  • [HF07] Havemann S., Fellner D. W.: Seven research challenges of generalized 3D documents. IEEE Computer Graphics and Applications 27, 3 (2007), 70–76. doi:10.1109/MCG.2007.67.
  • [Koe90] Koenderink J. J.: Solid shape. MIT Press, Cambridge, MA, USA, 1990.
  • [Mau11] Mausfeld R.: Intrinsic multiperspectivity: Conceptual forms and the functional architecture of the perceptual system. In Interdisciplinary Anthropology: Continuing Evolution of Man, Welsch W., et al. (Eds.). Springer-Verlag, Berlin Heidelberg, 2011, ch. 2, pp. 19–54. doi:10.1007/978-3-642-11668-1.
  • [Sta07] Stahl B. C.: Positivism or non-positivism - tertium non datur. A critique of ontological syncretism in IS research. In Ontologies - A Handbook of Principles, Concepts and Applications in Information Systems, Kishore R., Ramesh R., (Eds.), Integrated Series in Information Systems. Springer US, 2007, ch. 5, pp. 115–142. doi:10.1007/978-0-387-37022-4_5.
  • [SWS00] Smeulders A. W., Worring M., Santini S., Gupta A., Jain R.: Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 12 (2000), 1349–1380. doi:10.1109/34.895972.
  • [VUFF93] Verri A., Uras C., Frosini P., Ferri M.: On the use of size functions for shape analysis. Biological Cybernetics 70, 2 (1993), 99–107. doi:10.1007/BF00200823.