George Polya argued that symmetry plays an important role in the inductive phase of complex problem solving by reducing and ordering the observable facts . Captivated by visual diagrams of Polya’s work in crystallography, M.C. Escher created a systematic organization of geometrical transformations and enshrined symmetry as the principal rule underlying his art . Without systematic knowledge of the mathematics governing patterns of symmetry, he created his own ”layman’s theory” of symmetry, duality, infinity, and paradoxes. Escher comes to the open gate of mathematics by exploring how concepts like repetition, rotation and reflection shape our interpretation of boundaries between shapes .
The art work Escher produced over his lifetime profoundly challenges our visual perception of the world. Equally interesting, most humans are capable of understanding and appreciating the beauty of Escher’s drawings even in the absence of previous experience with them. Consider, for example, just the two drawings shown in Figure 1. It is easy to see some of the concepts of symmetry – such as translation, rotation, and reflection – and this invites questions about the nature of cognitive processes when we perceive this kind of art.
Indeed, Gestalt psychology has long proposed symmetry as organizing principle of geometric intelligence ,. Gestalt theories suggest that human cognition uses repetition (translational symmetry), foreground/background (rotation and reflection symmetry) in creating and making sense of art . In sharp contrast, many AI techniques for geometric intelligence rely on object detection as a primary source of power ,. This raises the basic question motivating our work: Might it be possible to build artificial intelligence (AI) agents that use the principles of Gestalt psychology to make inferences about geometric patterns and transformations?
In an attempt to provide algorithmic answers to these questions, we start with Dehaene’s test of geometric intelligence  that shares several themes with Escher’s more intricate structures. Dehaene et. al. describes symmetry as a geometrical language that adults and children can comprehend regardless of their culture and background . In fact, Dehaene developed the test containing 45 problems to examine whether humans brought up in a technologically advanced civilization with the benefit of formal education including geometry performed better than subjects from a technologically primitive society with little formal education.
Dehaene’s test eschews the use of geometric objects such as triangles and instead relies on more abstract concepts such as closure. All 45 problems on the test explore various aspects of core geometry, such as Euclidean geometry, topology, symmetrical figures, metric properties, and geometric transformations . Each problem is an array of six images where one violates the displayed concept, and the test taker is tasked with identifying it as the one that breaks the structure. Figure 2 shows an example that highlights the above mentioned concept of closure. Although at first glance symmetry is not explicit in Figure 1, we will show below that specific representations of the drawings in Figure 1 derived from Euclidean transformations capture the latent symmetry and order in the drawings.
Although problems on the Dehaene’s test clearly are different from than Escher’s more intricate drawings, they nevertheless entail similar, if simpler, kinds of abstract reasoning. Abstract reasoning on Dehaene’s test require the ability to infer higher-level concepts such as relations, symmetries, and complex patterns from low-level pixel representations. According to Dehaene, using geometrical tests with perceptually accessible features such as shapes, positions, and between-object relations, a human capacity to reason abstractly can be measured independent of their culture, language, or experience. This brings us back to the research question motivating our research. If Gestalt principles underlie human perception, might the same principles form the basis for building AI agents capable of addressing problems on the Dehaene’test? Further, if AI agents based on Gestalt principles indeed can be built for Dehaene’s test, what may that tell us about human cognition?
Lovett & Forbus (2017) provide a detailed review of psychological models and studies: A common feature among many of these studies is focus on similarity and especially analogy. Carpenter et al. provide a detailed cognitive model of problem solving on the Raven’s Progressive Matrices Test of general human intelligence . Their model is based on the traditional production system architecture in which the agent has access to a variety of rules that capture the range of geometric patterns that occur in the Ravens test.
Lovett, Lockwood & Forbus (2008) view problems on the Ravens test as geometric analogy problems  and use the structure-mapping theory of analogy to address them. They describe a cognitively inspired approach that detects geometric shapes from an input drawing on the Ravens test, constructs spatial representations of relations among the objects, and then applies the structure-mapping technique for addressing the problem .
Kunda, McGreggor & Goel also view the Raven’s test as a set of visual analogy problems. However, in contrast to Carpenter et al. and Lovett et al., they use affine transformations, such as translation, rotation and reflection, directly on pixel-level representations to address the Raven’s test including the Standard, Color and Advanced Raven’s test . Given an input image, their ASTI method interprets the drawing in terms of linear combinations of affine transformations and completes the problem in terms of transformation combination. An interesting aspect of the ASTI computational model is that it does not have prior knowledge of geometric objects and does not need to detect objects, and yet its performance is comparable to that of earlier methods.
McGreggor, Kunda & Goel describe a method that makes analogies based on fractal representations. Given an input image, their technique called FAR first builds a fractal representation of the input and then uses similarity and analogy to address the problem. The FAR technique has been successfully used not only for the Ravens test of intelligence, but also for the Odd-One-Out test  and the Dehaene’s test . An interesting aspect of the self-similar fractal representations is that the FAR technique can automatically change the level of resolution to best match the given problem. Like ASTI, FAR too does not detect objects in the input images.
In our own earlier work, we have developed a method called Structural Affinity to address problems on the Raven’s test of intelligence 
. The method uses Markov Random Fields parameterized by affinity factors for first learning the underlying rules as described in Carpenter’s work, and subsequently recognizing the pattern in order to make a prediction. The strong emphasis is on discovering topologies that do not rely on object detection, but instead represent features for the type of relationship between images. Not to be confused with image similarity methods, the Structural Affinity captures thegeneration rule that represents the abstract reasoning ability.
As noted above, many previous computational models and techniques have viewed tests of geometric intelligence in terms of extant theories of similarity and analogy. In our current work, we affirm that symmetry plays a fundamental role in how an input image is perceived and forms the basis for analogy. Dehaene et. al. deliberately minimizes the cues of the concept by randomizing the perceived features. For example, by varying the orientation of the objects or modifying size, the concept becomes obscured. We conjecture that a suitable representation can undo the complexity and highlight the right context for the desired concept.
Structural Affinity for Core Geometry
In the current work, we build upon the idea of the agreement by constructing different types of affinity factors each representing a property of the geometrical concepts covered in Dehaene’s test . The AI agent asks a series of questions from each image and determines the one weird
image in two steps: 1) identify the most relevant properties that attribute to significant deviations 2) rank images by the largest contribution of variance.
A distinctive feature of the Dehaene’s test is that each problem with six images demonstrates a single concept with small variations. The problem is considered solved if a test taker identifies one out of six images that contains variation not explainable by the concept. Therefore, the aim of the proposed method is to identify the most likely concept by 1) unifying the representation, i.e. reducing superfluous features, and 2) ranking the remaining features by the strength of the signal towards a single pattern.
We observe that in general, Dehaene’s problems exhibit geometrical primitives that fall into specific types of symmetry classes, even in cases where the symmetry concept was not directly targeted. For example, problems shown in Figures 4 and 5 with intention to highlight the properties of Euclidean geometry, such as distance, can also be interpreted with symmetry - reflective and rotational. Thus, with the assumption of preserving the symmetry, our normalization method rotates and translates the original images re-mapping the pixels to the common axes. This normalization intensifies the most relevant features while removing variations intended to obscure the concept.
We observe that in some instances, the described feature ablation removes information critical to the recognition of the concept. For example, the concept of chirality is being erased by the transformation where pixels are reoriented around principal components (see Figure 6). This is a result of the underlying property of chirality - an object cannot be mapped into its mirror image by rotation and translation only. Therefore, there is no symmetry operation (in 2D) that would preserve object’s invariance.
Image Segmentation Phase
A Dehaene’s problem is represented as a 3x2 matrix of visual entities that capture geometrical shapes and transformations. To represent the problem in the structural affinity framework, we, first, identify the grid and segment the images into its six components. By applying the segmentation, the image is ready to be ingested by our computational model without additional image pre-processing steps.
The segmentation phase for one problem produces six panels that are subsequently transformed into an array where each cell takes on a binary value: 1 if the color intensity of the pixel is above a certain threshold , and 0, otherwise. Prior to reading the pixels into an array, the image is optionally cropped to remove the pixels associated with a plain text (typically at the top left of the first image).
As the Dehaene’s problems target geometry concepts, it makes sense to represent the images in the Cartesian coordinate system instead of amatrix where and are the height and the width of the given image. The binary values (0,1) are mapped to real numbers . This representation returns a set of points in the coordinate system generated by the Equation 2.
We use Principal Component Analysis (PCA) method tounrotate the figures and obtain the coordinate axes that contain the maximum variance . Figure 3(b) shows the result of a successful re-orientation that removes the randomness in the original axes and brings the symmetry feature into focus.
The design of the features is inspired by the classical principles in Gestalt perception - proximity, similarity, common fate, good continuation, closure, symmetry, parallelism 
. The set of simple heuristics is used for building the knowledge representation. The intentions is to design functions that can serve as cues for the underlying concept.
Color concept can be encapsulated with a density function, that counts number of pixels of varying intensity. During the representation phase, the pixels are transformed into coordinate points, therefore reducing the density function to a count of points.
Orientation concept must be captured before the PCA transformation is applied that rotates the figures to the simplest structure. By computing Pearson correlation coefficient, we obtain the amount of linear relationships between points .
For Topology concepts, such as inside/outside, closure, connectedness, and holes, we design several features - contour count and child-parent relationship between contours. For example, Figure 2 that highlights closure contains one figure that is inconsistent with the other figures with regards to the number of contours. The odd figure shows a disjointed curve whereas the consistent with the concept figures shows curves that join continuous points.
Symmetrical Figures after PCA transformation are automatically aligned along the axes of symmetry, thus computing the discrepancies between the alignment points in the upper and lower quadrants, will give a cue on how symmetrical is the figure.
First, we collapse
the points to a single vector by computing the average value per x-coordinate. In symmetrical figures, the vector should contain values close to zero.
where is the average value of the points where x=k.
To obtain a scalar measure of the symmetry feature, we subsequently compute the variance of the vector - that captures the overall adherence to the concept. Likewise, we compute an average vector and y-axis along with its scalar representation via variance - .
The similar features are relevant for Geometrical Figures concept. Certain figures can be mapped into itself in their lines of symmetry. For example, a square has four lines of symmetry, whereas a rectangle has only two. An equilateral triangle has three symmetry lines, whereas an isosceles triangle has only a bilateral symmetry. Designing several functions that capture the heuristics of symmetry nature (reflection, translation, rotation), in conjunction with other features can identify the most inconsistent with the concept figure.
Euclidean Geometry (e.g., line, points, parallelism, and right angle) requires a subset of features defined above. Figure 4 suggests that a feature that computes a variance along axes can easily identify the weird image that violates the consistent measure across the remaining figures.
Figure 4: Alignment of Points Concept. In (a) the concept is shown in its original raw form that include an orientation noise not relevant to the concept. In (b) the transformation applied highlights the single concept of alignment
Problems that assess the ability to detect Metric Properties, such as distance between objects, center and middle segments, are likewise dependent of the symmetry features. In problems where the center of an object is intentionally offset, a feature that creates a mapping between upper and lower quadrant points, exhibits a gap, i.e. a number of points from that upper quadrant that do not have a pair from the lower quadrant (see Figure 5).
Figure 5: Center of Circle. In (a), the concept is shown in its original raw form that includes circles of different sizes and positions. In (b), the transformation applied highlights the single concept of the location of the center point
Chiral Figures require an ability to perform mental rotation in order to align the figures in the same axes for comparison. Figure 6(a) shows an example of chiral figures rotated on oblique axes. In this scenario, applying PCA Figure 6(b) removes the information needed to distinguish between the figures making them identical.
Figure 6: Chirality on Oblique Axes. In (a), the concept is shown in its original raw form that includes chiral figures in random axes. In (b), the transformation applied removed the rotation which made the all the figures identical.
Geometrical Transformations involve translation, homothecy, symmetrical reflection, and rotation. Arguably, this is the most difficult mathematical concept, especially when given in static images . All previously described features are applicable here as well, although the more confident answers are achieved using a combination of several features.
Problem Solving Phase
The general process of our computational model is presented in Figure 7. The algorithm starts with the segmentation and visual encoding as described in the sections above. After the raw features are extracted and transformed, a filtering method is applied to select the most prominent feature(s) that might hold the cues for identifying the discrepant image.
where is a feature extracted from all six segmented and encoded images for the original problem ;
and - the standard score computed for the feature ; in our experiments
in order to select features where an element is at least two standard deviations from the mean.
The intuition behind applying a z-score to the raw features is that the core of the Dehaene’s test is to detect violations of some desired geometry concept. Therefore, we expect that most pertinent feature will contain an anomaly that can be captured with metrics specifically designed to detect outliers. Thus, if the set of figures contain aweird attribute, the feature will draw an attention to it with an anomalous z-score. If none of the extracted features exceed the threshold, the extraction process is deemed insufficient, and the problem is skipped.
To identify the discrepant image each of the filtered features votes for one of the six candidates, and the solution is selected by choosing the candidate with the highest number of votes.
Ideal are scenarios in which filtering yields a single feature, and no voting is required. The ambiguity only arises where more than one feature holds an attribute of the underlying geometry concept. For example, symmetry concept can be captured with different high-level heuristic functions, such one-to-one point mapping, or distance to the center line. For ties, voting is repeated with a feature preference weight (simpler features are preferred.)
Additionally, the algorithm applies aggressive rounding of the computations across all phases - from representation to transformation, and scoring. This reduction in precision helps with dealing with noise when working with raw images .
Table 1 summarizes the performance of the designed algorithm per metric as presented in the Dehaene et. al. experiment. The total accuracy measure is 89% with 40 out of 45 problems solved correctly. The last column in the table records the performance of Mundurukú participants. Problems involving basic concepts of geometry, such as topology, geometrical figures, lines and angles, have unambiguous features, and therefore are simpler (100% correct). Problems that involve transformations are more complex as they involve mental translation, reflections, scaling, and rotations. The lower performance on those problems is consistent with the findings reported by Dehaene et. al. Problems that exercise the concept of chirality (number of examples = 4) average to a 50% accuracy with a significant difference between figures shown in vertical vs. random (oblique) axes ( 85% and 23% correspondingly).
|Concept||true||total||ratio||human [head to column names]figures/dehaene_results_per_concept.csv|
Comparison to Other Computational Models
In this paper, we demonstrate that by leveraging Gestalt principles of perception in the design of algorithms for problem solving, we can approximate the humanlike thinking on core geometry tasks. Unlike models that rely on objects and borders detection for qualitative representations , , we describe a computational model that 1) transforms input images into more perceptually coherent variations, 2) scores the resulting representations against pre-defined properties (such as symmetry, rotation, and other geometrical concepts), and 3) identifies an instance where scores are in disagreement with the rest of images. Lovett at al.  and Lovett and Forbus  integrate four different systems for addressing visual oddity tasks, and solve 39 out of 45 problems correctly. The advantage of our approach is that it solves a similar number of problems correctly (40 out of 45), and it does so while satisfying the parsimony characteristic, i.e. striving for the simplest theory that can explain the intent in the discussed geometry problem. Parsimony in this context is concerned with the problem solving behavior, and it serves as a cognitive characteristic of gifted individuals .
In a computational model that employs fractal representation for reasoning , visual oddity tasks are addressed with a notion of visual similarity that operates on varied levels of image representation - from coarse to refined. In addition to providing solutions, the authors compute confidence measures (30 unambiguously correct, and 15 correct but ambiguous) and compare them with the human performance. An analysis of ambiguity allows tuning the levels of coarseness to determine the best strategy for representation. McGreggor et al. claim that their algorithm is parsimonious because it does not use additional mechanisms for problem solving phases . However, simplicity is not always a sufficient measure of parsimony; nor does simplicity always offer a robust explanation of the behavior. Our approach is based on scoring images by their explainability with concepts such as symmetry, topology, and Euclidean geometry properties. An image is considered odd if its characteristics do not match characteristic of the rest of the images. An analysis of agreements and violations helps highlight the most pertinent attributes of the problem, thus giving an explainable answer.
An algorithm of visual perception that is intuitive based on Gestalt principles can infer high level meaning between pixels and their relationships. This in turn increases the capability to understand and reason in the context of more complex images.
One of the main questions raised by Dehaene et. al. is whether or not core geometric intelligence is inherent in the human mind . This question is complex and it depends on how the geometrical concepts are represented, and what type of mental operations are invoked when assessing a concept. Therefore, the research presented here is motivated by two goals: 1) to understand what transformations are ubiquitous for exploring geometrical notions, i.e. what kind of abstract reasoning is most relevant for dissecting a geometry problem, and 2) is there an underlying organizing principle that governs our perception of shapes, and their ability to be transformed into similar, but more perceptually coherent objects.
To answer (1) we developed a computational model that transforms encoded images that hold geometry concepts using a mathematical mechanism helpful for discovering structure and relations among the variables. We found the applying PCA (principal component analysis) can be interpreted as performing mental transformations, such as rotations of the coordinate axes to align the studied shapes. In the Dehaene’s problems, finding the principal components is a viable explanation of how mind removes irrelevant differences between shapes in order to identify the central geometry concept. The resulting structural uniformity brings into focus those aspects that violate the desired geometrical concept.
To address (2), we observe that most Dehaene’s problems exhibit symmetry as a latent (unless directly specified) principle. Gestalt theory suggests that visual perception is frequently driven by the tendency to maximize the appeal of the shapes and their connection with other surrounding shapes . Therefore, our computational model benefits from viewing the geometry problems through lens of symmetry. For example, Figure 5 is probing the concept of metric properties, specifically a center of a circle. One way of reasoning about that in concordance with a metric concept is to consider the distance of the inside point from all of the circumference points. Alternatively, it may be reasoned with the concept of symmetry - the misalignment of the inner point from the center in one of the figures breaks the symmetry group of the circle object, i.e. it is not possible to map the figure on itself using rotations or reflections. Similar reasoning is applied to the other problems in the Dehaene’s set suggesting that symmetry may be the underlying organizing principle for basic geometry concepts.
M.C. Escher’s engagement in mathematical ideas through art brings diverse audiences towards a better understanding of the fundamental geometric rules governing complex shapes and transformations . Escher absorbs himself in the playfulness of the recognizable figures (motifs) by suppressing the distinction between foreground and background .
The self-organizing processes of the brain described by Gestalt principles enable spontaneous switching between the functions of the figure and ground . Thus, it is reasonably easy for a human to identify the motifs in Escher’s drawings without requiring substantial previous exposure and training in geometry concepts. In the next stage of our research, we will probe the ability of AI agent to infer the organizational functions in the figure/ground images.
The main focus of this work was to encode the general properties of grouping principles during perceptions of geometrical primitives in Dehaene’s images. In future work, we intend to expand the arsenal of geometrical concepts that can be organized with various types of symmetrical operations and groups. In addition to identifying strictly repeating motifs (gliding symmetry), we will explore strategies for extracting motifs for M.C. Escher’s tessellation works by analyzing each of the seventeen symmetry groups . We also hope to deepen the interpretative capabilities of AI solutions by building the bridge between Gestalt principles of perception and algorithmic object manipulations for visual reasoning.
-  (2017) The language of geometry: fast comprehension of geometrical primitives and rules in human adults and preschoolers. PLoS computational biology 13 (1), pp. e1005273. Cited by: §1.
-  (2009) Pearson correlation coefficient. In Noise reduction in speech processing, pp. 1–4. Cited by: 2nd item.
-  (1981) Perception of symmetry in infancy.. Developmental psychology 17 (1), pp. 82. Cited by: §1.
-  (1990) What one intelligence test measures: a theoretical account of the processing in the raven progressive matrices test.. Psychological review 97 (3), pp. 404. Cited by: Related Work.
-  (2006) Core knowledge of geometry in an amazonian indigene group. Science 311 (5759), pp. 381–384. Cited by: Symmetry as an Organizing Principle for Geometric Intelligence, §1, §1, 9th item, Discussion.
-  (2004) Visions of symmetry. Thames & Hudson. Cited by: §1, Future Directions.
-  (2011) CogSketch: sketch understanding for cognitive science research and for education. Topics in Cognitive Science 3 (4), pp. 648–666. Cited by: §1.
-  (1976) Transformation geometry and the artwork of M.C. Escher. Mathematics Teacher 69 (8), pp. 647–652. Cited by: §1, Future Directions.
-  (2008) On considerations of parsimony in mathematical problem solving. In Proceedings of the 32nd Conference of the International Group for the Psychology of Mathematics Education, Vol. 3, pp. 273–280. Cited by: Comparison to Other Computational Models.
-  (2013) A computational model for solving problems from the raven’s progressive matrices intelligence test using iconic visual representations. Cognitive Systems Research 22, pp. 47–66. Cited by: Related Work.
-  (2009) Interest points of general imbalance. IEEE Transactions on Image Processing 18 (11), pp. 2536–2546. Cited by: §1.
-  (2011) Cultural commonalities and differences in spatial problem-solving: a computational analysis. Cognition 121 (2), pp. 281–287. Cited by: Comparison to Other Computational Models.
-  (2008) Modeling cross-cultural performance on the visual oddity task. In International Conference on Spatial Cognition, pp. 378–393. Cited by: §1, Related Work, Comparison to Other Computational Models.
-  (2011) Finding the odd one out: a fractal analogical approach. In Proceedings of the 8th ACM conference on Creativity and cognition, pp. 289–298. Cited by: Related Work.
-  (2013) Fractal representations and core geometry. In Proceedings of the Second Annual Conference on Advances in Cognitive Systems ACS, Vol. 3, pp. 19. Cited by: Related Work, Comparison to Other Computational Models.
-  (1990) Mathematics and plausible reasoning: induction and analogy in mathematics. Vol. 1, Princeton University Press. Cited by: §1.
Measuring abstract reasoning in neural networks. In
International Conference on Machine Learning, pp. 4477–4486. Cited by: §1.
-  (2010) The mathematical side of m.c. escher. Notices of the AMS 57 (6), pp. 706–718. Cited by: Future Directions.
-  (2018) The structural affinity method for solving the raven’s progressive matrices test for intelligence. In Thirty-Second AAAI Conference on Artificial Intelligence, Cited by: Related Work, Structural Affinity for Core Geometry.
-  (1990) Principles of object perception. Cognitive science 14 (1), pp. 29–56. Cited by: Discussion.
-  (1999) Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61 (3), pp. 611–622. Cited by: Transformation Phase.
-  (1995) Empirical aspects of symmetry perception. Spatial Vision 9 (1), pp. 1–8. Cited by: §1.
-  (2012) A century of gestalt psychology in visual perception: i. perceptual grouping and figure–ground organization.. Psychological bulletin 138 (6), pp. 1172. Cited by: Transformation Phase, Future Directions.
-  (1984) Making computers think like people [fuzzy set theory]. IEEE spectrum 21 (8), pp. 26–32. Cited by: Problem Solving Phase.