# Towards Automated Discovery of Geometrical Theorems in GeoGebra

We describe a prototype of a new experimental GeoGebra command and tool Discover that analyzes geometric figures for salient patterns, properties, and theorems. This tool is a basic implementation of automated discovery in elementary planar geometry. The paper focuses on the mathematical background of the implementation, as well as methods to avoid combinatorial explosion when storing the interesting properties of a geometric figure.



## 1 Introduction

In this technical paper we introduce a new GeoGebra command and tool Discover that is available in a development GitHub repository [1]. This research is closely related to a former project [2] (see [3, 4, 5] for further details).

Given a Euclidean geometry construction drawn in GeoGebra, suppose a user wants to know if a given object has some “interesting features,” such as relevant theorems or properties. This object can be a point, a line, a circle, or something else, although in the current implementation it will always be a point. Without any further user input, the Discover command then analyzes the object for its interesting and relevant features, and presents them to the user as both a list of formulas and graphical output.

For example, let an arbitrary triangle, and let and be the midpoints of and , respectively (Fig. 1). Has some interesting features? Yes: is parallel to , independent of the position of , and . Indeed, the command Discover() confirms this observation with the output shown in Fig. 2; GeoGebra adds lines and in the same color (Fig. 3). (Note, however, that the current implementation of GeoGebra does not report that .) Also, the software reports the somewhat trivial finding that the segments and are congruent, with and highlighted in the same color. This output can be obtained by selecting the Discover tool in GeoGebra’s toolbox:

and then clicking on the point . This functionality is implemented in both GeoGebra Classic 5 and 6, available as an experimental software package called GeoGebra Discovery, at http://github.com/kovzol/geogebra-discovery.

Figure 2: Output window of the Discover command that reports the Midline theorem

Figure 3: Further output of the Discover command

What strategy is used in the background? First, all points are analyzed to determine whether they are the same as another point. Then, all possible point triplets are examined for collinearity. Next, all possible subsets containing four points of the figure are checked for concyclicity. With knowledge of the collinear points, separate lines can be uniquely defined, in order to find whether they are parallel. Finally, by considering all pairs of point pairs, congruent segments can be identified. This strategy combines numerical and symbolic processes.
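The numerical half of this strategy can be illustrated with a short sketch. It is not GeoGebra's code: the function names, the tolerance `EPS` and the sample figure are assumptions made for illustration, and the later symbolic confirmation is left out.

```python
from itertools import combinations

EPS = 1e-9  # assumed tolerance for this sketch, not GeoGebra's actual value


def collinear(p, q, r, eps=EPS):
    """Numerical test: twice the signed area of triangle pqr is close to zero."""
    area2 = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return abs(area2) < eps


def concyclic(p, q, r, s, eps=EPS):
    """Numerical test: the in-circle determinant of p, q, r relative to s vanishes."""
    rows = []
    for a in (p, q, r):
        dx, dy = a[0] - s[0], a[1] - s[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, b1, c1), (a2, b2, c2), (a3, b3, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - b1 * (a2 * c3 - a3 * c2)
           + c1 * (a2 * b3 - a3 * b2))
    return abs(det) < eps


def numeric_conjectures(points):
    """Return collinear triplets and concyclic quadruples of the named points."""
    names = sorted(points)
    triplets = [t for t in combinations(names, 3)
                if collinear(*(points[n] for n in t))]
    # The determinant also vanishes for four collinear points, so skip
    # quadruples that contain a collinear triplet.
    quadruples = [q for q in combinations(names, 4)
                  if not any(set(t) <= set(q) for t in triplets)
                  and concyclic(*(points[n] for n in q))]
    return triplets, quadruples


# A square with its center M (hypothetical input).
square = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (2.0, 2.0),
          "D": (0.0, 2.0), "M": (1.0, 1.0)}
triplets, quadruples = numeric_conjectures(square)
print(triplets)    # [('A', 'C', 'M'), ('B', 'D', 'M')]
print(quadruples)  # [('A', 'B', 'C', 'D')]
```

An absolute tolerance is the simplest choice for a sketch; it is not scale-invariant, which is one reason a numerical result can only serve as a conjecture.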

Our second example shows a more complicated setup. A regular hexagon is given in Fig. 4. Point is defined as the intersection of and , and, in addition, , . The points , and may have trivial differences in their numerical representations, but in the geometrical sense they should be equal. In the figure rounding was set to 13 digits to emphasize that GeoGebra computes objects numerically by default. Note that while the -coordinates of and numerically differ, the final calculations to prove that they are identical will be symbolic and exact.

Now we are about to learn if point has some interesting features, so the command Discover() will be issued. GeoGebra reports a set of properties in a message box (Fig. 5) and adds some additional outputs to the initial setup (Fig. 6).

Here, we see that concyclic points are reported as a single item and not as separate data. Also, parallel lines are classified into five different sets. Finally, there are three sets of congruent segments. This approach in computation and reporting helps avoid combinatorial explosion.

## 2 Mathematical background

The strategies mentioned above have some similarities to the ones introduced in [6], but here we focus on minimizing the number of objects that have to be compared in a process that, in principle, compares every object with every other object.

Our current implementation deals with points, lines, circles, parallel lines (or directions), and congruent segments.

A geometric point is a GeoGebra object, described by the GeoPoint class (see GeoGebra’s source code at github.com/geogebra/geogebra for more details). While we will not provide a detailed definition of a geometric point, generally speaking it is an object with a very complex structure containing two real coordinates, several style settings (including size and color, for example) and other technical details that are used in the application. Some geometric points depend on other geometric points or other geometric objects; this hierarchy is stored in the set of GeoPoints, too.

Independent of the detailed definition of a geometric point, we can still define the notion of point in our context.

###### Definition 1

A set of geometric points is called a point if any two different geometric points in the set are identical in general.

Henceforth, unless otherwise mentioned, we will consider points according to the definition above, not as geometric points.

Here, we do not precisely define when two points are identical in general. Instead, we will illustrate the concept of point identicality with the following example. Consider geometric points , , and that form a parallelogram. Now define and as the midpoint of and , and and , respectively. This setting implies that and are identical, because the diagonals of a parallelogram always bisect each other. In a dynamic geometry setting like GeoGebra, this simply means that by changing some points of the set , the points and will still share the same position in the plane. (See Fig. 7. Here the construction is controlled by the points , and only: they can be freely chosen, and based on them, the point is already dependent and uniquely defined as the intersection of the two parallel lines to and , respectively, through and .)

Figure 7: Points P5 and P6 are defined as midpoints of opposite vertices of parallelogram P1P2P3P4
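The identity in this example can be verified by a short computation. The sketch below uses the labels from the caption of Fig. 7 and treats points as coordinate vectors. It assumes that the vertices are listed in order around the parallelogram, that P4 is the dependent vertex, and that P5 is the midpoint of P1 and P3; these are labeling assumptions, not taken from the text.

```latex
P_4 = P_1 + P_3 - P_2, \qquad
P_5 = \frac{P_1 + P_3}{2}, \qquad
P_6 = \frac{P_2 + P_4}{2}
    = \frac{P_2 + (P_1 + P_3 - P_2)}{2}
    = \frac{P_1 + P_3}{2} = P_5 .
```

The equality holds for every choice of the free points, which is what "identical in general" expresses in this simple case.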

In fact, general truth includes statements that are not always true, but only “in most cases”: here we can think of degeneracies that occur in a construction when some of its objects are degenerate. For example, the altitudes of a triangle generally meet at a point, but not always, since a degenerate triangle “usually” has three parallel “altitudes,” unless two (or even three!) vertices of the triangle coincide. (See [7] for more details on the concept of general truth and degeneracies.)

###### Definition 2

A set of points is called a line if any three different points in the set are collinear in general.

For example, the set in Fig. 6 forms a line.

###### Definition 3

A set of points is called a circle if any four different points in the set are concyclic in general.

###### Definition 4

A set of lines is called parallel lines (or a direction) if any two different lines in the set are parallel in general.

###### Definition 5

A set of two points is called a segment.

###### Definition 6

A set of segments is called equal length segments (or congruent segments) if any two different segments in the set are equally long in general.

In fact, GeoGebra Discovery uses a more general concept of being identical: it also allows two points (or two objects) to be related if the relationship is true just on parts (see [8] for more details).

The main idea of storing the objects is that points, lines, circles, directions and equally long segments designate equivalence classes, that is:

###### Theorem 2.1

Let two lines be given. If two different points belong to both lines, then the two lines coincide.

###### Proof

In Euclidean geometry two points always designate a unique line.

###### Theorem 2.2

Let two circles be given. If three different points belong to both circles, then the two circles coincide.

###### Proof

In Euclidean geometry three non-collinear points always designate a unique circle.

###### Theorem 2.3

Let two directions be given, and take one line from each. If these two lines are parallel in general, then the two directions coincide.

###### Proof

This follows immediately from the transitive property of parallelism.

###### Theorem 2.4

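Let two sets of equal length segments be given, and take one segment from each. If these two segments are equally long in general, then the two sets coincide.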

###### Proof

This is an immediate consequence of the transitive property of equality of lengths.

By using these theorems we can maintain a minimal set of objects during discovery.
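A minimal sketch of how such a store of lines might be kept minimal is given below. It assumes that every triple passed in has already been confirmed collinear, and it merges stored lines whenever they share two points, as Theorem 2.1 permits. The function name `add_collinear` and the list-of-sets representation are illustrative assumptions, not GeoGebra's data structure.

```python
from itertools import combinations


def add_collinear(lines, triple):
    """Add a collinear triple to the store, merging lines that share two points."""
    merged = set(triple)
    changed = True
    while changed:  # repeat, since a grown line may now overlap earlier ones
        changed = False
        for line in lines[:]:
            if len(line & merged) >= 2:
                merged |= line
                lines.remove(line)
                changed = True
    lines.append(merged)
    return lines


# Five collinear points yield ten collinear triples but a single stored line.
store = []
for triple in combinations("ABCDE", 3):
    add_collinear(store, triple)
print(len(store), sorted(store[0]))  # 1 ['A', 'B', 'C', 'D', 'E']
```

The same merging pattern applies to circles (three shared points), directions and sets of congruent segments (one shared member).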

Fig. 8 shows the objects identified during the command Discover() for the input in Fig. 4. The set of lines is not listed in the figure separately, but as a single entry at the bottom list of equally long segments. Also, some of the outputs are not particularly interesting features, such as lines with only two points, circles with only three points, directions with only one line, or isolated equal length segments.

Figure 8: The list of objects as shown in IntelliJ IDEA, a popular integrated development environment for Java

## 3 Examples of Discover with selected theorems

GeoGebra is a well-known and widely used software tool in education, with meaningful potential for using geometric discovery and exploration to teach elementary geometry. Even so, the range of mathematical knowledge is broad, including secondary school topics, international math competitions, and higher level mathematics. Below we examine selected theorems confirmed in the current implementation of Discover.

### 3.1 The diagonals of a parallelogram bisect each other

We already mentioned this simple theorem. The problem is shown in Fig. 7. With discovery on point , the applicable theorems are reported in Fig. 9.

### 3.2 Euler line

The Euler line is a line determined from any triangle that is not regular. It passes through the orthocenter, the circumcenter and the centroid. The problem is shown in Fig. 10.

With discovery on point , the relevant theorems are listed in Fig. 11.

The Euler line theorem implicitly includes several simple theorems, including concurrency of the medians of a triangle (, the generated points being the pairwise intersections of the medians), concurrency of the altitudes (, these points being the pairwise intersections of the altitudes), and concurrency of the perpendicular bisectors of the sides (, pairwise intersections as above).

### 3.3 Nine-point circle

The nine-point circle passes through nine significant points of an arbitrary triangle, namely:

• the midpoint of each side of the triangle,

• the foot point of each altitude,

• the midpoint of the line segment from each vertex of the triangle to the orthocenter.

The problem setting is shown in Fig. 12.

With discovery on point , the appropriate theorems are reported in Fig. 13.

The nine-point circle theorem implicitly includes several other simple theorems. In addition, the graphical result suggests further theorems: segments , , and are congruent and concurrent; these three segments are also diameters of the nine-point circle; and their intersection designates the center of the nine-point circle. By using another discovery this can be confirmed.

### 3.4 A contest problem

In 2010, at the 51st International Mathematical Olympiad in Astana, Kazakhstan, the following shortlisted problem was proposed by the United Kingdom:

Let ABC be an acute triangle with D, E, F the feet of the altitudes lying on BC, CA, AB, respectively. One of the intersection points of the line EF and the circumcircle is P. The lines BP and DF meet at point Q. Prove that AP = AQ.

After constructing the corresponding figure with GeoGebra Discovery (Fig. 14), we start discovery on point . The discovered theorems appear in Fig. 15. We learn a few unexpected properties: , and the concyclicity of points , , , , and , , , .

Figure 14: Problem setting for a shortlisted contest problem at IMO 2010

### 3.5 Pappus’s hexagon theorem

Consider two sets of collinear triplets , , ; and , , . The intersection points , , are created. Pappus’s hexagon theorem (Fig. 16) claims that the points , and are collinear (in general, after assuming certain non-degeneracy conditions).

With discovery on point , the theorem is reported in Fig. 17. This final example is more commonly discussed at the university level, rather than in secondary school.

*

As a final note we highlight that the user interface for the geometric discovery is designed to be easy for non-experts as well. One does not need to use anything else but the mouse pointer to obtain all the information.

## 4 Discussion

### 4.1 Trivial statements and theorems

In Fig. 2 the collinearity of points , and and of points , and were not reported. This is intentional: by defining as the midpoint of we implicitly assumed this collinearity, so it does not make any sense to reiterate this. Therefore, it seems useful to make a distinction between trivial statements and theorems.

The question of which properties are considered trivial or not is at some level a judgment call. In Fig. 2 most users may regard the information as trivial, with being the midpoint of . On the other hand, for beginners this information may still be useful.

At the moment GeoGebra Discovery maintains some background information on whether an obtained theorem is to be displayed or not. For example, in Fig. 9, the collinearity of , and is considered trivial and not displayed, but the fact that is presented as non-trivial. By considering both of these ideas, the collinearity of , and could be considered either trivial or non-trivial; currently it is considered trivial and not shown. The decision process for such questions should be clarified in the future.

### 4.2 Combinatorial explosion and computational complexity

Despite the large number of possible statements, the combinatorial complexity is still polynomial, because from a given set of input objects we need to select at most four objects (four points are required to confirm concyclicity). In addition, by using the classes of the equivalence relations, the number of statements to be checked can be decreased significantly.
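To give a feeling for the polynomial growth, the following sketch counts the candidate statements on points for a figure with n distinct points, before any equivalence classes are used. The function name and the choice of n = 20 are illustrative assumptions; parallelism and congruence candidates are not counted here.

```python
from math import comb


def candidate_counts(n):
    """Number of point pairs, triplets and quadruples among n points."""
    return {
        "identical point pairs": comb(n, 2),
        "collinear triplets": comb(n, 3),
        "concyclic quadruples": comb(n, 4),
    }


counts = candidate_counts(20)
print(counts)
# {'identical point pairs': 190, 'collinear triplets': 1140, 'concyclic quadruples': 4845}
```

The dominant term grows like n^4/24, which is polynomial but already large enough that merging objects into equivalence classes pays off.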

For each possible statement, a numerical check is first performed. We assume that this is always successful when a generally true statement is being checked. Unfortunately, in reality this is not always the case, because for some exotic coordinates, the numerical check can be completely misleading. For example, some very large numbers can result in numerically unstable computations. Regardless, if a numerical check is positive, then the statement is added to the list of conjectures, but if it is negative, no conjecture is registered. As a consequence, while our implementation may miss some true statements (due to numerical errors), it will not output false statements.

For each conjecture, a symbolic check will be performed. If the symbolic check is positive, then the statement will be saved as a theorem. If the symbolic check is negative, then the statement will be removed from the list of conjectures. If the symbolic check cannot decide if a conjecture is true or false, the conjecture is removed from the list.

A special case of a conjecture is for each two geometric points. If this conjecture cannot be proven or disproven symbolically, then the discovery process will be completely stopped and the user will be notified that the construction must be redrawn in a different way—otherwise no output can be produced. This exception is required to keep the internal data consistent.
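The decision rules described above can be summarized in a small sketch. The checkers are stubs that stand in for the real numerical and symbolic provers, and all names (`Verdict`, `DiscoveryAborted`, `run_checks`, the statement strings) are hypothetical and not part of GeoGebra's API.

```python
from enum import Enum


class Verdict(Enum):
    TRUE = "true"
    FALSE = "false"
    UNDECIDED = "undecided"


class DiscoveryAborted(Exception):
    """Raised when the identity of two geometric points cannot be decided."""


def run_checks(candidates, numeric_check, symbolic_check):
    """Keep only statements that pass the numerical and then the symbolic check."""
    theorems = []
    for kind, statement in candidates:
        if not numeric_check(statement):
            continue  # negative numerical check: no conjecture is registered
        verdict = symbolic_check(statement)
        if verdict is Verdict.TRUE:
            theorems.append((kind, statement))
        elif verdict is Verdict.UNDECIDED and kind == "identical":
            # Undecided point identity would leave the data inconsistent.
            raise DiscoveryAborted(statement)
        # FALSE, or UNDECIDED for other kinds: the conjecture is dropped.
    return theorems


# Stub checkers with made-up statements, for illustration only.
numeric = {"DE || AB": True, "D, E, F collinear": False,
           "G, H, I, J concyclic": True}
symbolic = {"DE || AB": Verdict.TRUE,
            "G, H, I, J concyclic": Verdict.UNDECIDED}
candidates = [("parallel", "DE || AB"),
              ("collinear", "D, E, F collinear"),
              ("concyclic", "G, H, I, J concyclic")]
theorems = run_checks(candidates, numeric.get, symbolic.get)
print(theorems)  # [('parallel', 'DE || AB')]
```

Dropping undecided conjectures keeps the output sound at the cost of completeness, while the exception models the one case in which the process stops entirely.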

Symbolic checks usually require more time than numerical verifications. The underlying computation uses Gröbner bases, which require at most double exponential time in the number of variables [9] of the given figure. Usually, the number of variables is double the number of geometric points in the figure (since there are two coordinates for each).

GeoGebra internally sets 5 seconds for the maximal execution time of each symbolic check. After timeout the result of the symbolic check will be undecided.

### 4.3 User interface enhancements

GeoGebra is designed with a straightforward user interface that asks the user no questions if possible. However, its usability could be improved for situations where the user wishes to limit the output by filtering or excluding certain relationships.

Currently only points can be investigated. In a future version a set of points, segments, lines, circles or a set of these should be permitted as input.

Currently the computation process cannot be interrupted by the user. Given a large number of points in the figure, the calculation can be time consuming. For example, investigating the relationships of a regular 20-gon may require about 4 minutes on a modern personal computer (in our test a Lenovo ThinkPad E480 with 8i7, 16 GB RAM, Ubuntu Linux 18.04, was used). See Fig. 18 for the output.

The version that is based on GeoGebra Classic 5 performs better than the one on Classic 6—the latter is a web implementation of the GeoGebra application and uses a WebAssembly compilation of the computer algebra system Giac. Even if the code is reasonably fast as embedded code in a web page, this latter version underperforms the native technology: the same hardware is unable to handle the input of the regular 20-gon, and the browser tab crashes after 12 minutes of computation. (Google Chrome 83 was used for testing.)

### 4.4 Colors

At the moment a limited set of colors is used to highlight parallelism and congruence. In the future a pre-defined sequence of distinguishable colors should be added to GeoGebra Discovery—for example, at the moment in Fig. 5 the same black color is used to highlight different sets of parallel lines.

### 4.5 Perpendicular lines

Perpendicular lines play an important role in elementary planar geometry. Their detection and presentation are not yet implemented in GeoGebra Discovery. Here we mention that the relationship of perpendicularity is not an equivalence, in contrast to the previous relationships defined in Section 2. On the other hand, if and are directions, if and , the relationship implies perpendicularity for all and , that is, .
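Since perpendicularity carries over from representatives to whole directions, a future implementation would only need to compare one line per direction. The sketch below illustrates this idea; representing each line by an exact direction vector and the name `perpendicular_direction_pairs` are assumptions made for illustration.

```python
def perpendicular_direction_pairs(directions):
    """Pair up directions whose representative lines are perpendicular.

    `directions` maps a name to a non-empty list of direction vectors, one per
    member line; only the first vector of each class is inspected.
    """
    names = sorted(directions)
    pairs = []
    for i, a in enumerate(names):
        ax, ay = directions[a][0]
        for b in names[i + 1:]:
            bx, by = directions[b][0]
            if ax * bx + ay * by == 0:  # exact for integer or Fraction input
                pairs.append((a, b))
    return pairs


# Hypothetical directions with several parallel member lines each.
directions = {
    "d1": [(1, 0), (2, 0)],
    "d2": [(0, 1), (0, 3)],
    "d3": [(1, 1)],
    "d4": [(1, -1), (2, -2)],
}
pairs = perpendicular_direction_pairs(directions)
print(pairs)  # [('d1', 'd2'), ('d3', 'd4')]
```

With k directions this takes k(k-1)/2 comparisons, regardless of how many lines each direction contains.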

It seems convenient to color perpendicular lines with the same color. So a rectangular grid can be observed for each pair of directions and whose representative lines are perpendicular, accordingly. Fig. 19 shows an example that includes four rectangular grids for the parallel diagonals of a regular octagon.

Figure 19: Four rectangular grids describing the parallel diagonals of a regular octagon

### 4.6 Angles

In a complex algebraic geometry setting, the study of angles is not as straightforward as investigating other objects. For a future version, however, this feature would be an important improvement.

By combining algebraic and pure geometric observations, however, simple theorems on angle equality could be easily detected. For example, Fig. 15 states that points , , , are concyclic. The inscribed angle theorem automatically implies , among others.

### 4.7 Stepwise suggestions

Prior research (see [10, p. 46]) proposed that collecting the interesting new objects in a figure could be done stepwise, similarly to GeoGebra’s former feature “special objects.” For our midline theorem example (Fig. 1), this meant that after constructing the triangle , and then midpoint , the segments and were automatically shown by the system. The user could then accept these newly generated segments or remove them from the system. Then, by creating midpoint , the system could show lines and to visualize parallelism.

Actually, the “special objects” feature was recently removed from GeoGebra after some negative feedback from the community—many users found this feature confusing. As a consequence, adding stepwise suggestions in GeoGebra Discovery remains a question for future research.

### 4.8 Benchmarks

There is no benchmarking suite for the Discover command yet. This should be addressed in the next phase of the development.

## 5 Related work

We now discuss several projects that share some similarity with GeoGebra Discovery but differ from it significantly.

First of all, GeoGebra Discovery is not the first tool that systematically displays confirmed theorems in a geometric figure. We refer the reader to

These systems are available free of charge, but without the source code. On the other hand, GeoGebra Discovery focuses on an intuitive user interface and on proofs in the strict mathematical sense.

Second, we highlight that there is a growing interest in creating algorithms related to the successful completion of secondary school or undergraduate mathematics entrance exams. (See [12], [13], [14], among others.) Sometimes these projects rely significantly on techniques used in the underlying computational methods. Also, these projects are often related to artificial intelligence and Big Data rather than to computational mathematics.

Third, we mention a theoretical issue. The idea to store a geometric point only once if it is identical to another one was previously described in Kortenkamp’s work [15, 9.3.1]. This concept is a main design element in the dynamic geometry software Cinderella, which never stores a geometric point twice if the two variants are identical in general.

GeoGebra follows a different design concept: it allows the user to define an arbitrary number of identical points. From the theorem prover’s point of view, GeoGebra’s concept is more difficult to handle, and a kind of translation into a different data structure is required, using the concepts from Section 2.

Also, we note that GeoGebra Discovery proves the truth in a different manner from Cinderella, with Cinderella using a probabilistic method, and GeoGebra Discovery literally proving all the deduced facts.

## 6 Conclusion

We described a prototype of the Discover command that is available in an experimental version of GeoGebra, called GeoGebra Discovery. Our current implementation can be directly downloaded from https://github.com/kovzol/geogebra/releases/tag/v5.0.591.0-2020Jul16.

Our work is still in progress, as noted with the issues listed in Section 4.

## 7 Acknowledgments

The Discover command is a result of a long collaboration of several researchers. The project was initiated by Tomás Recio in 2010, and several other researchers joined, including Francisco Botana and M. Pilar Vélez, to name just the most prominent collaborators. The development and research work was continuously monitored and supported by the GeoGebra Team. Special thanks to Markus Hohenwarter, project director of GeoGebra.

The work was partially supported by a grant MTM2017-88796-P from the Spanish MINECO (Ministerio de Economia y Competitividad) and the ERDF (European Regional Development Fund).

## References

• [1] Kovács, Z.: GeoGebra Discovery. A GitHub project (2020) https://github.com/kovzol/geogebra-discovery.
• [2] Botana, F., Kovács, Z., Recio, T.: Automated Geometer. A GitHub project (2018) https://github.com/kovzol/ag.
• [3] Botana, F., Kovács, Z., Recio, T.: Automated Geometer, a web-based discovery tool. Presentation at ADG-12, Nanning, China (2018)
• [4] Botana, F., Kovács, Z., Recio, T.: Towards an automated geometer. Presentation at AISC-13, Suzhou, China (2018)
• [5] Botana, F., Kovács, Z., Recio, T.: Towards an automated geometer. In Fleuriot, J., Wang, D., Calmet, J., eds.: Artificial Intelligence and Symbolic Computation. Volume 11110 of Lecture Notes in Artificial Intelligence, Springer International Publishing (2018) 215–220
• [6] Chen, X., Song, D., Wang, D.: Automated generation of geometric theorems from images of diagrams. Annals of Mathematics and Artificial Intelligence 74 (2015) 333–358
• [7] Chou, S.C.: Mechanical Geometry Theorem Proving. Springer Science Business Media (1987)
• [8] Kovács, Z., Recio, T., Vélez, M.P.: Detecting truth, just on parts. Revista Matemática Complutense 32 (2019) 451–474
• [9] Mayr, E., Meyer, A.: The complexity of the word problem for commutative semigroups and polynomial ideals. Advances in Mathematics 46 (1982) 305–329
• [10] Kovács, Z.: Towards a new GeoGebra Geometry App. Presentation at MatemaTech Seminar for teachers, České Budějovice, Czechia (2019)
• [11] Magajna, Z.: An observation tool as an aid for building proofs. The Electronic Journal of Mathematics and Technology 5 (2011) 251–260
• [12] Fu, H., Zhang, J., Zhong, X., Zha, M., Liu, L.: Robot for mathematics college entrance examination. In: Electronic Proceedings of the 24th Asian Technology Conference in Mathematics, Mathematics and Technology, LLC (2019)
• [13] Seo, M., Hajishirzi, H., Farhadi, A., Etzioni, O., Malcolm, C.: Solving geometry problems: Combining text and diagram interpretation. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. (2015) 1466–1476
• [14] Fujita, A., Kameda, A., Kawazoe, A., Miyao, Y.: Overview of Todai robot project and evaluation framework of its NLP-based problem solving. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14). (2014) 2590–2597
• [15] Kortenkamp, U.: Foundations of Dynamic Geometry. PhD thesis, ETH Zürich (1999)