 # A Distance Function for Comparing Straight-Edge Geometric Figures

This paper defines a distance function that measures the dissimilarity between planar geometric figures formed with straight lines. This function can in turn be used in partial matching of different geometric figures. For a given pair of geometric figures whose graphs are isomorphic, one function measures the angular dissimilarity and another measures the edge-length disproportionality. The distance function is then defined as the convex sum of these two functions. The novelty of the presented function is that it satisfies all properties of a distance function, and that it is computed by projecting appropriate features onto a Cartesian plane. To compute the deviation from the angular similarity property, the Euclidean distance between each given angle pair and the corresponding point on the line y = x is measured. To compute the deviation from the edge-length proportionality property, the best-fit line through the origin is found for the set of edge-length pairs, and the Euclidean distance between each given edge-length pair and the corresponding point on the line y = mx is calculated. The Iterative Proportional Fitting Procedure (IPFP) is used to find this best-fit line. We demonstrate the behavior of the defined function for some sample pairs of figures.

## 1. Introduction

Two geometric figures can be said to be similar if one can be obtained from the other by either shrinking or enlarging it. This implies that the figures need to have an equal number of vertices and edges, equal corresponding angles, and a fixed proportionality between corresponding edges. This concept of similarity can be used for partial matching of different geometric figures.

It is well known that geometric shapes and structures are important in determining the behavior of chemical compounds. This is true of smaller molecules as well as larger macromolecules such as DNA and RNA that are studied in bioinformatics. Molecular geometry is thus an important aspect of physical and structural chemistry. However, while it is also known that similarity in structure often implies similar observed chemical properties, there is as yet no well-defined mathematical approach for comparing geometric shapes, and comparisons are made on an ad hoc basis [12, 13]. An approach such as the one proposed here would thus allow a rigorous evaluation of such properties, based on the similarity of a molecule's shape to the shapes of molecules with known properties. Similarity in general has wide-ranging applications in many domains.

Image similarity and comparisons also play an important role in other domains, such as in models of visual perception and object recognition in humans as well as animals [16, 1], finance and economics, and video analyses. In such contexts also there is much scope for application of this work.

Existing theory in this matter is far from complete. There are heuristic approaches to morphological similarity [7, 8], but no sound mathematical basis for the detection of geometric similarity. Geometric similarity is particularly important in engineering, in comparing a model and its prototype [6, 10], but there does not seem to be a proper universal measure of geometric similarity. The measure in common use in engineering is merely scale-free identity, that is, all corresponding lengths should be in the same ratio. There is thus no way to properly measure inexact similarity, or to state quantitatively that one figure is more similar to a reference figure than another figure is.

Using subgraph isomorphism, alike constituent geometric figures of the original geometric figures can be found and checked for similarity. A simple similarity function can return a boolean value of 1 for similar geometric figures and 0 otherwise. However, such a function would have limited applications. In this paper, we define instead a distance function that returns a value between 0 (inclusive) and 1. The returned value reflects the dissimilarity between alike planar geometric figures connected with straight lines.

Therefore, the distance function is defined only when the graphs representing the given geometric figures are isomorphic. The crux of the function is in the measurement of deviations from angular similarity and edge-length proportionality.

The function $d$ is the convex sum of functions $\alpha$ and $\rho$:

• The function $\alpha$, which we may call angular dissimilarity, measures the deviation from the angular identity between two geometric figures. In order to compute this, angles are projected onto a Cartesian plane, where the angles of the first geometric figure make up one axis and the angles of the second geometric figure make up the other axis. A cluster of points in this plane therefore represents the corresponding angles of the given geometric figures. If the figures are similar (identical up to scale), the angular similarity property may be said to be satisfied, the corresponding angle points lie on the line $y = x$, and the value returned by $\alpha$ is zero. If not, the deviation from the property is computed as the distance from the original point to the corresponding point on the line $y = x$.

• The function $\rho$, which we may call edge-length disproportionality, measures the deviation from edge-length proportionality between geometric figures. In order to compute this, the edge lengths are similarly projected onto a Cartesian plane, where the edge lengths of the first geometric figure make up one axis and the edge lengths of the second geometric figure make up the other axis. The corresponding edge lengths of the given geometric figures are represented as points in this plane. If two figures are proportional (identical up to scale), all corresponding edge lengths are in a fixed proportion $m$, all points lie on a line $y = mx$, and the value returned by $\rho$ is zero. If the edge lengths are not perfectly proportional, the calculation of $\rho$ comes down to finding the best-fit line passing through the origin, and measuring the deviation from that line.

The method chosen to find the best-fit line must account for the fact that the line should pass through the origin. Using the least-squares method of fitting and adding $(0,0)$ as one of the corresponding edge-length pairs does not give a proper line passing through the origin. For this reason, the Iterative Proportional Fitting Procedure (IPFP) is used instead. IPFP tries to find a fixed proportion among a set of pairs, thereby giving points on a line passing through the origin.
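The remark about least squares can be checked numerically. The following is a minimal sketch in plain Python; the helper name `ols_line` and the sample edge-length pairs are illustrative assumptions, not taken from the paper. It fits an ordinary least-squares line to a few pairs with the origin appended, and shows that the fitted intercept need not be zero.

```python
def ols_line(points):
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical edge-length pairs that are not exactly proportional.
pairs = [(1.0, 3.0), (2.0, 4.0), (3.0, 5.0)]

# Appending the origin as an extra pair does not force the line through it.
slope, intercept = ols_line(pairs + [(0.0, 0.0)])
print(slope, intercept)
```

For these sample pairs the intercept is nonzero, so the fitted line misses the origin even though the origin was one of the fitted points.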

There are many variants of IPFP, of which the one used in this paper is the classical IPFP, owing to its simplicity. Once the required points are obtained from IPFP, the ratio of the two coordinates of any fitted point gives the value of $m$, as IPFP creates a fixed proportionality among a set of edge-length pairs. Appendix D explains step by step the IPFP technique used in this paper. Further, to compute the deviation from edge-length proportionality, we calculate the Euclidean distance between each original point and the corresponding point on the line $y = mx$. These Euclidean distances are summed over all edge-length pairs, and $\rho$ is computed using this sum and a scaling factor. As the considered geometric figures are alike, the scaling factor is the number of edges in either of these figures. The scaling factor accounts for the fact that in a large figure, with a large number of edges, a minor change is less significant in determining overall dissimilarity than a corresponding change in a smaller figure.
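As an illustration only, the sketch below shows one way the classical IPFP (alternating row and column rescaling) could be applied to the 2 × n table of edge lengths. The paper's own step-by-step procedure is not reproduced here, so the all-ones seed table, the choice of the observed row and column totals as targets, and the function name `ipfp_proportional_fit` are all assumptions.

```python
def ipfp_proportional_fit(li, lj, iterations=50, tol=1e-12):
    """Fit a 2 x n table with the same row and column totals as [li, lj].

    Assumption: the seed table is all ones, so the two fitted rows are
    proportional. Returns (li_fit, lj_fit, m), where m = lj_fit[h] / li_fit[h].
    """
    observed = [list(li), list(lj)]
    n = len(li)
    row_targets = [sum(row) for row in observed]
    col_targets = [observed[0][h] + observed[1][h] for h in range(n)]
    table = [[1.0] * n for _ in range(2)]  # assumed seed table

    for _ in range(iterations):
        # Row step: rescale each row to match its target total.
        for r in range(2):
            factor = row_targets[r] / sum(table[r])
            table[r] = [v * factor for v in table[r]]
        # Column step: rescale each column to match its target total.
        for h in range(n):
            factor = col_targets[h] / (table[0][h] + table[1][h])
            table[0][h] *= factor
            table[1][h] *= factor
        # Stop once the row totals still match after the column step.
        if all(abs(sum(table[r]) - row_targets[r]) < tol for r in range(2)):
            break

    m = table[1][0] / table[0][0]
    return table[0], table[1], m


# Hypothetical edge lengths of two figures with three edges each.
li_fit, lj_fit, m = ipfp_proportional_fit([3.0, 4.0, 5.0], [6.0, 8.0, 11.0])
print(m, list(zip(li_fit, lj_fit)))
```

Under these assumptions every fitted pair lies on one line through the origin, and already proportional input is returned unchanged.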

The function $d$ is shown to be a distance function, as it satisfies the three properties required: $d$ satisfies the commutativity (Theorem 3.7) and triangle inequality (Theorem 3.8) properties defined over single geometric figures. However, the coincidence axiom is defined over equivalence classes of geometric figures (figures that are alike up to scale). The proofs of these properties are given later in this paper.

## 2. The Distance Function

The distance function, represented by $d$, reflects the degree of dissimilarity between figures.

Let $\Gamma$ be the set of straight-edge figures for which the distance function is defined. Then

$$\gamma_i = (V_i, E_i, L_i, \Theta_i) \in \Gamma$$

where $V_i$ denotes the set of vertices, $E_i$ is the set of edges, $L_i$ represents the set of corresponding edge lengths, and $\Theta_i$ denotes the set of angles between adjacent edges, measured in radians.
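The four-part description of a figure maps naturally onto a small record type. The sketch below is one possible representation; the class name `Figure`, its field names, and the 3-4-5 triangle are illustrative assumptions, not notation from the paper.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Figure:
    """A straight-edge figure gamma = (V, E, L, Theta)."""
    vertices: Tuple[int, ...]            # V: vertex labels
    edges: Tuple[Tuple[int, int], ...]   # E: pairs of adjacent vertices
    lengths: Tuple[float, ...]           # L: lengths[h] is the length of edges[h]
    angles: Tuple[float, ...]            # Theta: angles between adjacent edges, in radians


# A 3-4-5 right triangle; angles are listed in vertex order 0, 1, 2.
triangle = Figure(
    vertices=(0, 1, 2),
    edges=((0, 1), (1, 2), (2, 0)),
    lengths=(3.0, 4.0, 5.0),
    angles=(math.atan2(4, 3), math.pi / 2, math.atan2(3, 4)),
)
```

Keeping `lengths` and `angles` as ordered tuples makes "corresponding" edges and angles of two figures a matter of matching positions, once a vertex correspondence has been fixed.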

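Further, if $\gamma_i$ and $\gamma_j$ are said to be “similar”, then $\gamma_i$ and $\gamma_j$ satisfy the following conditions:

1. If $g_i$ is a graph that represents the adjacency of figure $\gamma_i$ and $g_j$ is a graph that represents the adjacency of figure $\gamma_j$, then graphs $g_i$ and $g_j$ are isomorphic.

2. All the corresponding angles of $\gamma_i$ and $\gamma_j$ are equal, i.e., if $\theta_i(1), \theta_i(2), \ldots, \theta_i(z)$ represent the set of angles of $\gamma_i$ and $\theta_j(1), \theta_j(2), \ldots, \theta_j(z)$ represent the set of corresponding angles of $\gamma_j$, then

$$\theta_i(1) = \theta_j(1),\quad \theta_i(2) = \theta_j(2),\quad \ldots,\quad \theta_i(z) = \theta_j(z). \tag{2.1}$$

3. All the corresponding edge lengths of $\gamma_i$ and $\gamma_j$ are proportional, i.e., if $l_i(1), l_i(2), \ldots, l_i(z)$ represent the set of edge lengths of $\gamma_i$ and $l_j(1), l_j(2), \ldots, l_j(z)$ represent the set of corresponding edge lengths of $\gamma_j$, then

$$\frac{l_j(1)}{l_i(1)} = \frac{l_j(2)}{l_i(2)} = \ldots = \frac{l_j(z)}{l_i(z)} = m, \text{ a constant.} \tag{2.2}$$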

In view of this, the distance function tries to find the extent to which the considered figures deviate from conditions 2 and 3, provided condition 1 is satisfied.
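Conditions 2 and 3 can be checked directly once condition 1 holds. The sketch below is a minimal boolean check; the function name `is_similar`, the tolerance parameter, and the assumption that both figures list corresponding angles and edges in the same order are illustrative choices rather than part of the paper.

```python
import math


def is_similar(angles_i, angles_j, lengths_i, lengths_j, tol=1e-9):
    """Check conditions 2 and 3, assuming condition 1 (isomorphism) already holds
    and that corresponding angles and edges appear at the same positions."""
    if len(angles_i) != len(angles_j) or len(lengths_i) != len(lengths_j):
        return False
    # Condition 2: corresponding angles are equal, as in (2.1).
    angles_equal = all(
        math.isclose(a, b, abs_tol=tol) for a, b in zip(angles_i, angles_j)
    )
    # Condition 3: every ratio l_j(h) / l_i(h) equals one constant m, as in (2.2).
    ratios = [lj / li for li, lj in zip(lengths_i, lengths_j)]
    proportional = all(math.isclose(r, ratios[0], rel_tol=tol) for r in ratios)
    return angles_equal and proportional


# Hypothetical example: a 3-4-5 triangle and the same triangle scaled by 2.
angles = [math.atan2(4, 3), math.pi / 2, math.atan2(3, 4)]
print(is_similar(angles, angles, [3.0, 4.0, 5.0], [6.0, 8.0, 10.0]))
```

A yes/no check of this kind cannot say how far two figures are from being similar, which is the gap the distance function is meant to fill.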

###### Remark 2.1.

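A few properties of the function $d$:

1. $d(\gamma_i, \gamma_j) = 0$ if and only if $\gamma_i \approx \gamma_j$, where $\gamma_i \approx \gamma_j$ denotes that $\gamma_i$ and $\gamma_j$ belong to the same equivalence class of figures, i.e., $\gamma_i$ and $\gamma_j$ are figures that are identical up to scale.

2. $d$ satisfies the following:

$$d(\gamma_i, \gamma_j) = \begin{cases} 0 & \text{if } \gamma_i \approx \gamma_j, \\ \lambda \in (0,1) & \text{otherwise.} \end{cases} \tag{2.3}$$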

## 3. Components of the Distance Function

### 3.1. Angular Dissimilarity

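Let $\alpha$ represent the angular dissimilarity function. Then the function is defined as:

$$\alpha : \Gamma \times \Gamma \to [0,1) \tag{3.1a}$$

$$\alpha(\gamma_i, \gamma_j) = \begin{cases} \text{undefined} & \text{if } \delta(g_i, g_j) = 0, \\ \varphi \in [0,1) & \text{otherwise,} \end{cases} \tag{3.1b}$$

where $\delta$ represents the graph isomorphism function

$$\delta : G \times G \to \{0, 1\}$$

with $G$ the set of all graphs:

$$\delta(g_i, g_j) = \begin{cases} 1 & \text{if } g_i \approx g_j, \\ 0 & \text{otherwise.} \end{cases} \tag{3.2}$$

In (3.2), the symbol $\approx$ denotes that $g_i$ and $g_j$ satisfy all properties of graph isomorphism.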
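For small figures, the isomorphism indicator in (3.2) can be sketched by brute force. The code below tries every relabeling of the vertices, so it runs in factorial time and is meant only as an illustration for figures with a handful of vertices; the function name `delta` mirrors the symbol in the text, while the argument layout and the restriction to simple undirected graphs are assumptions.

```python
from itertools import permutations


def delta(vertices_i, edges_i, vertices_j, edges_j):
    """Return 1 if the two simple undirected graphs are isomorphic, else 0."""
    if len(vertices_i) != len(vertices_j) or len(edges_i) != len(edges_j):
        return 0
    target = {frozenset(e) for e in edges_j}
    for perm in permutations(vertices_j):
        # Candidate vertex correspondence from the first graph to the second.
        mapping = dict(zip(vertices_i, perm))
        mapped = {frozenset((mapping[a], mapping[b])) for a, b in edges_i}
        if mapped == target:
            return 1
    return 0


# Hypothetical example: a 4-cycle under two different labelings.
square_a = ([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)])
square_b = (["a", "b", "c", "d"], [("a", "c"), ("c", "b"), ("b", "d"), ("d", "a")])
print(delta(*square_a, *square_b))
```

A successful `mapping` also fixes which angles and edges of the two figures correspond, which the later computations rely on.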

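Assuming $\delta(g_i, g_j) = 1$, $\alpha$ is computed as follows:

Project each corresponding pair $(\theta_i(u), \theta_j(u))$ onto a Cartesian plane, wherein the $x$-axis represents the set $\Theta_i$, while the $y$-axis represents the set $\Theta_j$. The function $\alpha$ computes the deviation from (2.1). In this Cartesian plane, according to (2.1), all corresponding pairs must lie on the line:

$$y = x \tag{3.3}$$

For each point $(\theta_i(u), \theta_j(u))$, calculate the Euclidean distance from its corresponding point on the line (3.3), i.e., $(\theta_i(u), \theta_i(u))$:

$$\begin{aligned} \Lambda_{i,j}(u) &= \sqrt{(\theta_i(u) - \theta_i(u))^2 + (\theta_j(u) - \theta_i(u))^2} \\ &= \sqrt{(\theta_j(u) - \theta_i(u))^2} \\ &= |\theta_j(u) - \theta_i(u)| \end{aligned} \tag{3.4}$$

Therefore,

$$\alpha(\gamma_i, \gamma_j) = \frac{\sum_{u=1}^{n} \Lambda_{i,j}(u)}{1 + \sum_{u=1}^{n} \Lambda_{i,j}(u)} \tag{3.5}$$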
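
Equations (3.4) and (3.5) translate into a few lines of code. The sketch below assumes the two angle lists are already in corresponding order and given in radians; the function name `angular_dissimilarity` and the square and rhombus example are illustrative.

```python
import math


def angular_dissimilarity(angles_i, angles_j):
    """alpha = S / (1 + S), where S sums |theta_j(u) - theta_i(u)| as in (3.4)-(3.5)."""
    if len(angles_i) != len(angles_j):
        raise ValueError("figures must have the same number of corresponding angles")
    total = sum(abs(tj - ti) for ti, tj in zip(angles_i, angles_j))
    return total / (1.0 + total)


# Hypothetical example: a square against a rhombus with 60 and 120 degree angles.
square = [math.pi / 2] * 4
rhombus = [math.pi / 3, 2 * math.pi / 3, math.pi / 3, 2 * math.pi / 3]
print(angular_dissimilarity(square, rhombus))
```

Because the sum is divided by one plus itself, the result always stays in $[0, 1)$ and equals zero exactly when every pair of corresponding angles agrees.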
###### Remark 3.1.

A few properties of the function $\alpha$:

1. (which can be equal to 0), where

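###### Theorem 3.2.

The function $\alpha$ is commutative, i.e., $\alpha(\gamma_i, \gamma_j) = \alpha(\gamma_j, \gamma_i)$.

###### Proof.

We see that the constituents of the function $\alpha$ are commutative:

$$\Lambda_{i,j}(u) = |\theta_j(u) - \theta_i(u)| = |\theta_i(u) - \theta_j(u)| = \Lambda_{j,i}(u), \quad \text{as } |a - b| = |b - a|.$$

It follows that $\sum_{u=1}^{n} \Lambda_{i,j}(u) = \sum_{u=1}^{n} \Lambda_{j,i}(u) = e$, a constant.

Hence,

$$\alpha(\gamma_i, \gamma_j) = \frac{\sum_{u=1}^{n} \Lambda_{i,j}(u)}{1 + \sum_{u=1}^{n} \Lambda_{i,j}(u)} = \frac{e}{1 + e} = \frac{\sum_{u=1}^{n} \Lambda_{j,i}(u)}{1 + \sum_{u=1}^{n} \Lambda_{j,i}(u)} = \alpha(\gamma_j, \gamma_i).$$

∎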

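###### Theorem 3.3.

The function $\alpha$ satisfies the triangle inequality, i.e., $\alpha(\gamma_i, \gamma_k) \le \alpha(\gamma_i, \gamma_j) + \alpha(\gamma_j, \gamma_k)$.

###### Proof.

For each $u$,

$$\Lambda_{i,k}(u) = |\theta_k(u) - \theta_i(u)| \le |\theta_j(u) - \theta_i(u)| + |\theta_k(u) - \theta_j(u)| = \Lambda_{i,j}(u) + \Lambda_{j,k}(u), \quad \text{since } |c - a| \le |b - a| + |c - b|.$$

Summing the above inequality for $u = 1, \ldots, n$, it follows that

$$\sum_{u=1}^{n} \Lambda_{i,k}(u) \le \sum_{u=1}^{n} \Lambda_{i,j}(u) + \sum_{u=1}^{n} \Lambda_{j,k}(u) \tag{3.6}$$

Let $e$, $f$ and $g$ denote $\sum_{u=1}^{n} \Lambda_{i,k}(u)$, $\sum_{u=1}^{n} \Lambda_{i,j}(u)$ and $\sum_{u=1}^{n} \Lambda_{j,k}(u)$ respectively.

The inequality (3.6) now translates to

$$e \le f + g \tag{3.7}$$

Assume, for contradiction, that Theorem 3.3 is false, i.e.,

$$\alpha(\gamma_i, \gamma_k) > \alpha(\gamma_i, \gamma_j) + \alpha(\gamma_j, \gamma_k), \quad \text{that is,} \quad \frac{e}{1 + e} > \frac{f}{1 + f} + \frac{g}{1 + g} \tag{3.8}$$

On simplification (multiplying both sides by $(1 + e)(1 + f)(1 + g) > 0$ and cancelling the common terms),

$$e > (f + g) + (2fg + efg) \tag{3.9}$$

As the quantity $2fg + efg \ge 0$, inequality (3.9) implies $e > f + g$, which contradicts the already proved inequality (3.7). Hence, (3.8) does not hold, thereby proving Theorem 3.3. ∎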
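The key step of this argument, that $e \le f + g$ rules out $\frac{e}{1+e} > \frac{f}{1+f} + \frac{g}{1+g}$ for non-negative sums, can be spot-checked numerically. The sketch below is a sanity check and not a proof; the function names, the sampling range, and the small floating-point tolerance are assumptions.

```python
import random


def squash(x):
    """The map x -> x / (1 + x) used to bound the summed deviations."""
    return x / (1.0 + x)


def triangle_violations(trials=10_000, seed=0):
    """Count sampled triples with e <= f + g where the inequality on squash fails."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        f = rng.uniform(0.0, 10.0)
        g = rng.uniform(0.0, 10.0)
        e = rng.uniform(0.0, f + g)  # any non-negative e with e <= f + g
        if squash(e) > squash(f) + squash(g) + 1e-12:
            violations += 1
    return violations


print(triangle_violations())
```

No violations are expected, since $x \mapsto x/(1+x)$ is increasing and subadditive on the non-negative reals.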

### 3.2. Edge-Length Disproportionality

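Let $\rho$ represent the edge-length disproportionality function. Then the function is defined as:

$$\rho : \Gamma \times \Gamma \to [0,1) \tag{3.10a}$$

$$\rho(\gamma_i, \gamma_j) = \begin{cases} \text{undefined} & \text{if } \delta(g_i, g_j) = 0, \\ \tau \in [0,1) & \text{otherwise.} \end{cases} \tag{3.10b}$$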

Assuming $\delta(g_i, g_j) = 1$, $\rho$ is computed as follows:

Project each corresponding pair $(l_i(h), l_j(h))$ onto a Cartesian plane, wherein the $x$-axis represents the set $L_i$, while the $y$-axis represents the set $L_j$. The function $\rho$ computes the deviation from (2.2). Consider a part of the same equation:

$$\frac{l_j(h)}{l_i(h)} = m, \text{ a constant} \tag{3.11}$$

In the context of the plane, (3.11) gives the slope of a line that passes through $(0, 0)$ and $(l_i(h), l_j(h))$:

$$m = \frac{y_2 - y_1}{x_2 - x_1} = \frac{l_j(h) - 0}{l_i(h) - 0}$$

Extending this concept, it can be seen that in order to satisfy (2.2), all points should lie on the same line. Therefore, finding edge-length proportionality boils down to finding, for the set of corresponding edge-length pairs, the best-fit line that passes through the origin.

Let the equation of the required line be:

$$y = mx, \text{ as the line passes through the origin.} \tag{3.12}$$

Using IPFP, each point $(l_i(h), l_j(h))$ is transformed to $(l'_i(h), l'_j(h))$, which is a point on the line (3.12).

On finding the desired line, the Euclidean distance between $(l_i(h), l_j(h))$ and $(l'_i(h), l'_j(h))$ is computed:

$$\Delta_{i,j}(h) = \sqrt{(l_i(h) - l'_i(h))^2 + (l_j(h) - l'_j(h))^2} \tag{3.13}$$

Therefore,

$$\rho(\gamma_i, \gamma_j) = \frac{\sum_{h=1}^{n} \Delta_{i,j}(h)}{n + \sum_{h=1}^{n} \Delta_{i,j}(h)} \tag{3.14}$$
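
Given the fitted points, (3.13) and (3.14) reduce to a short computation. The sketch below takes the fitted points as inputs, so it does not depend on how the line through the origin was obtained; the function name `edge_disproportionality` and the sample values, including the fitted points on the line $y = 2x$, are hypothetical.

```python
import math


def edge_disproportionality(li, lj, li_fit, lj_fit):
    """rho = S / (n + S), with S the sum of the distances (3.13) and n the edge count."""
    n = len(li)
    if n == 0 or not (len(lj) == len(li_fit) == len(lj_fit) == n):
        raise ValueError("all four sequences must have the same nonzero length")
    total = sum(
        math.hypot(a - a_fit, b - b_fit)
        for a, b, a_fit, b_fit in zip(li, lj, li_fit, lj_fit)
    )
    return total / (n + total)


# Hypothetical pairs (1, 1) and (1, 3) with assumed fitted points on y = 2x.
rho = edge_disproportionality([1.0, 1.0], [1.0, 3.0], [2 / 3, 4 / 3], [4 / 3, 8 / 3])
print(rho)
```

Dividing by $n$ plus the sum, rather than one plus the sum, is what makes the same total deviation count for less in a figure with more edges.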
###### Remark 3.4.

A few properties of the function $\rho$:

1. $\rho(\gamma_i, \gamma_j)$ can be equal to $0$.

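###### Theorem 3.5.

The function $\rho$ is commutative, i.e., $\rho(\gamma_i, \gamma_j) = \rho(\gamma_j, \gamma_i)$.

###### Proof.

The proof is similar to that of Theorem 3.2. ∎

###### Theorem 3.6.

The function $\rho$ satisfies the triangle inequality, i.e., $\rho(\gamma_i, \gamma_k) \le \rho(\gamma_i, \gamma_j) + \rho(\gamma_j, \gamma_k)$.

###### Proof.

The proof is similar to that of Theorem 3.3. ∎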

### 3.3. Deriving the Function

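The function $d$ is the convex sum of $\alpha$ and $\rho$.

$$d : \Gamma \times \Gamma \nrightarrow [0,1) \tag{3.15a}$$

$$d(\gamma_i, \gamma_j) = \begin{cases} \text{undefined} & \text{if } \delta(g_i, g_j) = 0, \\ \beta\,\alpha(\gamma_i, \gamma_j) + (1 - \beta)\,\rho(\gamma_i, \gamma_j), \text{ where } \beta \in [0,1] & \text{otherwise.} \end{cases} \tag{3.15b}$$

While computing $d$ using (3.15b) in Appendices A, B, C and D, the value of $\beta$ is set to $0.5$, to weight the $\alpha$ and $\rho$ functions equally. However, other values of $\beta$ can be used, resulting in similar outcomes for the function $d$.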
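The convex sum in (3.15b) can be sketched as a small function. Here the two component values are assumed to have been computed already, `None` stands in for the undefined case of non-isomorphic graphs, and the default `beta=0.5` corresponds to equal weighting; the function name and signature are illustrative.

```python
from typing import Optional


def distance(alpha: float, rho: float, isomorphic: bool = True,
             beta: float = 0.5) -> Optional[float]:
    """Convex sum of alpha and rho; None when the graphs are not isomorphic."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if not isomorphic:
        return None  # d is undefined for non-isomorphic figures
    return beta * alpha + (1.0 - beta) * rho


# Hypothetical component values for one pair of figures.
print(distance(0.2, 0.4))
print(distance(0.2, 0.4, beta=1.0))
```

Since both components lie in $[0, 1)$ and the weights sum to one, the result also lies in $[0, 1)$.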

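###### Theorem 3.7.

The function $d$ is commutative, i.e., $d(\gamma_i, \gamma_j) = d(\gamma_j, \gamma_i)$.

###### Proof.

According to (3.15b),

$$\begin{aligned} d(\gamma_i, \gamma_j) &= \beta\,\alpha(\gamma_i, \gamma_j) + (1 - \beta)\,\rho(\gamma_i, \gamma_j), \text{ where } \beta \in [0,1] \\ &= \beta\,\alpha(\gamma_j, \gamma_i) + (1 - \beta)\,\rho(\gamma_j, \gamma_i), \text{ from Theorem 3.2 and Theorem 3.5} \\ &= d(\gamma_j, \gamma_i). \end{aligned}$$

∎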

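###### Theorem 3.8.

The function $d$ satisfies the triangle inequality, i.e., $d(\gamma_i, \gamma_k) \le d(\gamma_i, \gamma_j) + d(\gamma_j, \gamma_k)$.

###### Proof.

According to (3.15b), $d(\gamma_i, \gamma_j) = \beta\,\alpha(\gamma_i, \gamma_j) + (1 - \beta)\,\rho(\gamma_i, \gamma_j)$, where $\beta \in [0,1]$.

Multiplying both sides of the inequality in Theorem 3.3 by $\beta \ge 0$, we get:

$$\beta\,\alpha(\gamma_i, \gamma_k) \le \beta\,\alpha(\gamma_i, \gamma_j) + \beta\,\alpha(\gamma_j, \gamma_k) \tag{3.16}$$

Multiplying both sides of the inequality in Theorem 3.6 by $(1 - \beta) \ge 0$, we get:

$$(1 - \beta)\,\rho(\gamma_i, \gamma_k) \le (1 - \beta)\,\rho(\gamma_i, \gamma_j) + (1 - \beta)\,\rho(\gamma_j, \gamma_k) \tag{3.17}$$

Summing up inequalities (3.16) and (3.17), it follows that:

$$\beta\,\alpha(\gamma_i, \gamma_k) + (1 - \beta)\,\rho(\gamma_i, \gamma_k) \le \beta\,\alpha(\gamma_i, \gamma_j) + (1 - \beta)\,\rho(\gamma_i, \gamma_j) + \beta\,\alpha(\gamma_j, \gamma_k) + (1 - \beta)\,\rho(\gamma_j, \gamma_k) \tag{3.18}$$

By (3.15b), this is $d(\gamma_i, \gamma_k) \le d(\gamma_i, \gamma_j) + d(\gamma_j, \gamma_k)$. ∎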

## 4. Results

Using the method discussed above to compute the distance function $d$, this section tabulates the results for a few pairs of figures. The values of $d$ in Table 1 reflect the dissimilarity of the considered figures. The same can be said for the $\alpha$ and $\rho$ values.

## References

• [1] Donald S. Blough, The perception of similarity, Avian Visual Cognition (Robert G. Cook, ed.), September 2001.
• [2] P. E. Bourne and J. Gu, Structural bioinformatics, 2 ed., Wiley, 2009.
• [3] W. Edwards Deming and Frederick F. Stephan, On a Least Squares Adjustment of a Sampled Frequency Table When the Expected Marginal Totals are Known, The Annals of Mathematical Statistics 11 (1940), no. 4, 427–444.
• [4] Ronald J. Gillespie and Edward A. Robinson, Models of molecular geometry, Chemical Society Reviews 34 (2005), 396–407.
• [5] B. S. Grewal and J. S. Grewal, Higher Engineering Mathematics, 40 ed., Khanna Publishers, New Delhi, 2007.
• [6] Valentin Heller, Scale effects in physical hydraulic engineering models, Journal of Hydraulic Research 49 (2011), no. 3, 293–306.
• [7] Maciej Komosinski, Grzegorz Koczyk, and Marek Kubiak, On estimating similarity in artificial and real organisms, Theory in Biosciences 120 (2001), no. 3–4, 271–286.
• [8] Maciej Komosinski and Marek Kubiak, Quantitative measure of structural and geometric similarity of 3D morphologies, Complexity 16 (2011), no. 6, 40–52.
• [9] Michael Lahr and Louis de Mesnard, Biproportional Techniques in Input-Output Analysis: Table Updating and Structural Analysis, Economic Systems Research 16 (2004), no. 2, 115–134.
• [10] G. Pallett, Geometric similarity—some applications in fluid mechanics, Education + Training 3 (1961), no. 2, 36–37.
• [11] Walter Rudin, Principles of Mathematical Analysis, McGraw-Hill Inc., New York, 1976.
• [12] Ayusman Sen, Venkatasuryanarayana Chebolu, and Arnold L. Rheingold, First structurally characterized geometric isomers of an eight-coordinate complex. Structural comparison between cis- and trans-diiodobis(2,5,8-trioxanonane)samarium, Inorganic Chemistry 26 (1987), no. 11, 1821–1823.
• [13] A. Stashans, G. Chamba, and H. Pinto, Electronic structure, chemical bonding, and geometry of pure and Sr-doped CaCO3, Journal of Computational Chemistry 29 (2008), no. 3, 343–349.
• [14] H. Stephen Stoker, General, organic, and biological chemistry, Cengage Learning, 2009.
• [15] Chris Sweeney, Laurent Kneip, Tobias Höllerer, and Matthew Turk, Computing similarity transformations from only image correspondences, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), June 2015, doi:10.1109/CVPR.2015.7298951, pp. 3305–3313.
• [16] Michael J. Tarr and Heinrich H. Bülthoff, Image-based object recognition in man, monkey and machine, Cognition 67 (1998), no. 1–2, 1–20, doi:10.1016/S0010-0277(98)00026-2.
• [17] J. R. Ullmann, An Algorithm for Subgraph Isomorphism, Journal of the ACM 23 (1976), no. 1, 31–42.
• [18] Maximilian Vermorken, Ariane Szafarz, and Hugues Pirotte, Sector classification through non-gaussian similarity, Applied Financial Economics 20 (2008), no. 11, doi:10.1080/09603101003636238.
• [19] David W. S. Wong, The Reliability of Using the Iterative Proportional Fitting Procedure, 1992, pp. 340–348.
• [20] Shiuh-Sheng Yu, Jinn-Rong Liou, and Wen-Chin Shen, Computational similarity based on chromatic barycenter algorithm, IEEE Transactions on Consumer Electronics 42 (1996), no. 2, 216–220.
• [21] Bilal Zaka, Theory and applications of similarity detection techniques, Ph.D. thesis, Institute for Information Systems and Computer Media (IICM), Graz University of Technology, Graz, Austria, 2 2009.