
Visual-Imagery-Based Analogical Construction in Geometric Matrix Reasoning Task

Raven's Progressive Matrices is a family of classical intelligence tests that have been widely used in both research and clinical settings. There have been many exciting efforts in AI communities to computationally model various aspects of solving such figural analogical reasoning problems. In this paper, we present a series of computational models for solving Raven's Progressive Matrices using analogies and image transformations. We run our models following three different strategies usually adopted by human testees. These models are tested on the standard version of Raven's Progressive Matrices, on which they solve 57 out of 60 problems. These results show that analogy and image transformation are effective for solving RPM problems.



1 Introduction

(a) 2×2 RPM-like problem
(b) 3×3 RPM-like problem
Figure 1: Example problems: Real RPM problems are not shown to protect the secrecy of the test.

Raven’s Progressive Matrices (RPM) is a widely used intelligence test that contains geometric reasoning problems like those shown in Figure 1: a 2×2 problem (left) and a 3×3 problem (right). The task is to select, from the options provided below the matrix, the answer that best completes the matrix so that the relations in parallel rows and columns (and diagonals in some cases) form meaningful analogies.

How do you solve such problems? Your solution process is likely to involve constructing analogies from the problem elements: one row or column becomes the source, another row or column becomes the target, you find a mapping between them, and finally you transfer information from the source to the target to produce an answer. But there are many possible ways to construct analogies. For the 2×2 problem on the left, you might construct analogies based on rows or columns. For the 3×3 problem on the right, there are far more variations. Perhaps you just focus on the top and bottom rows, ignoring the middle row completely. Or maybe you look at the top row first, use the second row to “verify” your hypothesis, and then try to fill in the bottom row.

When taking the RPM test, no one tells you how to construct these various analogies to get to the answer. Some research suggests that the ability to construct abstract analogical relations is an innate capacity that distinguishes humans from other species (Hespos et al., 2020). The RPM was specifically designed to test a person’s eductive ability, the ability to extract information from and make sense of a complex situation (Raven et al., 1998), for which analogies are often indispensable. Previous computational models have explored many different dimensions of matrix reasoning, including the capacity for subgoaling (Carpenter et al., 1990), pattern matching (Cirillo and Ström, 2010), rule induction (Rasmussen and Eliasmith, 2011), and dynamically re-representing and re-organizing visual elements (Lovett and Forbus, 2017).

In this paper, we present a systematic examination of another dimension of matrix reasoning: how one constructs analogies from matrix elements. As our base model, we use the Affine and Set Transformation Induction (ASTI) model, which operates on scanned, pixel-based images from the RPM test booklet and uses affine transformations and set operations to reason about image differences (Kunda et al., 2013; Kunda, 2013). Our contributions include the following.

A three-level hierarchy for solving RPM problems. First, at the level of images, one can search across a set of image transformations to interpret relationships within a given pair or triple of images (e.g., to explain the variation across a row, column, or diagonal). Second, at the level of a problem matrix, one can search across different analogies to find transfers of relationships across different pairs or triples of images. Third, at the highest level, one can use alternative integration strategies that specify how to combine results from the lower levels to produce the final answer.

A finer taxonomy of option-usage strategies for solving RPM problems, which are traditionally categorized into constructive matching and response elimination (Bethell-Fox et al., 1984). We further divide constructive matching into option-free and option-informed constructive matching, which describe how options are used in practice more precisely.

A demonstration that a certain combination of transformations, analogies, and integration strategy solves 57/60 problems on the Raven’s Standard Progressive Matrices test, which shows that these representations and inference mechanisms are expressive and effective for building computational solutions to this type of task.

Systematic ablation experiments showing that test performance varies widely as a function of analogical construction and that this variation covers almost the entire range of human performance reported in normative studies of the RPM test.

2 Description of the ASTI+ Model

In this section, we present the ASTI+ model for solving RPM problems. We provide a detailed, formal description of its core dimensions: problem representations, similarity metrics, image transformations, matrix analogies, and integration strategies. We then compare the ASTI+ model to its predecessor, the ASTI model (Kunda et al., 2013; Kunda, 2013).

2.1 Problem Representations

Since the standard RPM is in black and white, we represent each problem as a binary (i.e., pure black and white) image. Note that this is equivalent to representing an image as a set of black pixels, with each pixel identified by its coordinates in the image. Throughout this paper, we use these two representations interchangeably. Binary images are generated from grayscale scans of the RPM test booklet, using a manually selected threshold to convert grayscale values to binary values. We use an RPM-specific automated image-processing pipeline (Kunda, 2013) to decompose each full test page into images of individual matrix entries and answer options, as shown in Figure 2. We then feed these individual images as inputs to the ASTI+ model.
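As a minimal sketch of this representation (not the authors’ pipeline), the snippet below thresholds a grayscale image given as a list of rows and returns the set of black-pixel coordinates. The function names, the convention that values below the threshold count as black, and the threshold of 128 are assumptions for illustration; the paper selects its threshold manually.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def binarize(gray: List[List[int]], threshold: int) -> Set[Pixel]:
    """Return the set of black pixels; grayscale values below threshold count as black (0 = black, 255 = white)."""
    return {
        (r, c)
        for r, row in enumerate(gray)
        for c, value in enumerate(row)
        if value < threshold
    }


def to_grid(pixels: Set[Pixel], height: int, width: int) -> List[List[int]]:
    """Convert the pixel-set view back to a binary grid (1 = black, 0 = white)."""
    grid = [[0] * width for _ in range(height)]
    for r, c in pixels:
        grid[r][c] = 1
    return grid


gray = [
    [250, 30, 240],
    [20, 10, 245],
    [255, 255, 40],
]
black = binarize(gray, threshold=128)   # hypothetical threshold
print(sorted(black))        # [(0, 1), (1, 0), (1, 1), (2, 2)]
print(to_grid(black, 3, 3))  # [[0, 1, 0], [1, 1, 0], [0, 0, 1]]
```

The set view makes the set operations used later (intersection, union, difference) one-liners, while the grid view is convenient for rotations and reflections.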

Figure 2: Illustration of input to our model for the 2×2 example problem: $M_{ij}$ is the matrix entry in Row $i$ and Column $j$, and $O_k$ is the $k$-th answer option.

2.2 Similarity Metrics

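The first element of our model specifies how to measure similarity between images. For this purpose, we use the Jaccard index and the asymmetric Jaccard index, as shown in Equations (1) and (2):

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{1}$$

$$J_{\mathrm{asym}}(A, B) = \frac{|A \cap B|}{|A|} \tag{2}$$

where $A$ and $B$ are two sets representing two binary images. Equation (2) is asymmetric because $J_{\mathrm{asym}}(A, B) \neq J_{\mathrm{asym}}(B, A)$ in general, and it measures the extent to which $A$ is inside (or a subset of) $B$.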
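The two indices translate directly to Python sets of pixel coordinates. This is a hypothetical sketch that assumes Equation (2) is |A ∩ B| / |A|, the reading under which it measures how much of A lies inside B; returning 1.0 for empty inputs is likewise an assumption.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]


def jaccard(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Symmetric Jaccard index |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        return 1.0  # assumption: two blank images count as identical
    return len(a & b) / len(union)


def asym_jaccard(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Asymmetric index |A ∩ B| / |A|: the fraction of A's black pixels that also lie in B."""
    if not a:
        return 1.0  # assumption: an empty image is trivially inside any image
    return len(a & b) / len(a)


top_bar = {(0, 0), (0, 1)}
full_block = {(0, 0), (0, 1), (1, 0), (1, 1)}
print(jaccard(top_bar, full_block))       # 0.5
print(asym_jaccard(top_bar, full_block))  # 1.0: the bar lies entirely inside the block
print(asym_jaccard(full_block, top_bar))  # 0.5: only half of the block lies inside the bar
```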

A problem with Equations (1) and (2) is that $A$ and $B$ must be properly aligned. That is, $A$ and $B$ should have the same shape and size, and pixels in $A$ and $B$ belonging to the same object should have the same coordinates in $A$ and $B$. However, the images of matrix entries and options come in various shapes and sizes. We take a simple but robust approach to this problem: slide one image over the other, calculate a similarity value at every relative position, and select the maximum. In the process of sliding, images are padded to have the same shape and size.


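As a result, similarity calculation in our model is defined by Equations (3) and (4), where $J$ and $J_{\mathrm{asym}}$ are the Jaccard and asymmetric Jaccard indices of Equations (1) and (2), $A^{\mathbf{x}}$ denotes image $A$ placed at relative position $\mathbf{x}$ to $B$ after padding, and $\mathrm{sim}$ and $\mathrm{sim}_{\mathrm{asym}}$ are the maximum¹ similarity values over all relative positions:

$$\mathrm{sim}(A, B) = \max_{\mathbf{x}} J(A^{\mathbf{x}}, B), \qquad \mathbf{x}_{AB} = \operatorname*{arg\,max}_{\mathbf{x}} J(A^{\mathbf{x}}, B) \tag{3}$$

$$\mathrm{sim}_{\mathrm{asym}}(A, B) = \max_{\mathbf{x}} J_{\mathrm{asym}}(A^{\mathbf{x}}, B), \qquad \mathbf{x}_{AB} = \operatorname*{arg\,max}_{\mathbf{x}} J_{\mathrm{asym}}(A^{\mathbf{x}}, B), \qquad D_{AB} = B \setminus A^{\mathbf{x}_{AB}} \tag{4}$$

In Equation (4), $D_{AB}$ is the difference between $B$ and $A$ when the maximum is reached, and $\mathbf{x}_{AB}$ is the relative position of $A$ to $B$ at which it is reached.

¹ When the maximum is achieved at multiple relative positions, we take the least shifted one. If multiple such least shifted positions exist, then the image must contain some symmetric structure. In this case, all of them are equally representative, and we only need to select one in a consistent manner, for example, always selecting the first one. Other methods to resolve this issue also exist.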
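A brute-force sketch of the sliding procedure, written for pixel sets: it tries every translation of A that can overlap B, keeps the best score, and breaks ties in favour of the least shifted position and then the first one found. Measuring shift magnitude as |dy| + |dx| from the images’ own coordinate origins, and returning the leftover pixels of B (B minus the aligned A) as the difference, are assumptions of this sketch; the helper names are hypothetical.

```python
from typing import Callable, Set, Tuple

Pixel = Tuple[int, int]
Image = Set[Pixel]
Offset = Tuple[int, int]


def jaccard(a: Image, b: Image) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def asym_jaccard(a: Image, b: Image) -> float:
    return len(a & b) / len(a) if a else 1.0


def translate(img: Image, offset: Offset) -> Image:
    dy, dx = offset
    return {(r + dy, c + dx) for r, c in img}


def max_similarity(a: Image, b: Image,
                   metric: Callable[[Image, Image], float] = jaccard) -> Tuple[float, Offset]:
    """Slide A over B; return the best score and the offset of A that achieves it."""
    if not a or not b:
        return metric(a, b), (0, 0)
    a_rows, a_cols = [r for r, _ in a], [c for _, c in a]
    b_rows, b_cols = [r for r, _ in b], [c for _, c in b]
    best_score, best_offset = -1.0, (0, 0)
    # Only offsets where the black content of A can overlap that of B are tried.
    for dy in range(min(b_rows) - max(a_rows), max(b_rows) - min(a_rows) + 1):
        for dx in range(min(b_cols) - max(a_cols), max(b_cols) - min(a_cols) + 1):
            score = metric(translate(a, (dy, dx)), b)
            shift = abs(dy) + abs(dx)
            best_shift = abs(best_offset[0]) + abs(best_offset[1])
            # Strict comparisons keep the first position among equally good, equally shifted ones.
            if score > best_score or (score == best_score and shift < best_shift):
                best_score, best_offset = score, (dy, dx)
    return best_score, best_offset


def max_asym_similarity(a: Image, b: Image) -> Tuple[float, Offset, Image]:
    """Asymmetric variant that also returns what B contains beyond the aligned A."""
    score, offset = max_similarity(a, b, asym_jaccard)
    return score, offset, b - translate(a, offset)


bar = {(0, 0), (0, 1)}
print(max_similarity(bar, {(2, 3), (2, 4)}))                # (1.0, (2, 3))
print(max_asym_similarity(bar, {(5, 5), (5, 6), (6, 5)}))  # (1.0, (5, 5), {(6, 5)})
```

The exhaustive search is quadratic in the image extent, which is acceptable for the small images of individual matrix entries.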

2.3 Transformations

The second element specifies low-level visuospatial knowledge about the domain. ASTI+ represents this knowledge as a discrete set of image transformations that map one or more input images to an output image. These functions operate on images at the pixel level, without re-representing visual information in terms of higher-order features. The functions were defined manually, based largely on inspection of the Raven’s test; important directions for future work include extending them to higher-order features and concepts, as well as learning them from perceptual experience (Michelson et al., 2019).

Figure 3: Illustrations of affine transformations used in our model.
Figure 4: Illustrations of set transformations used in our model: (a) Given an analogy A:B::C:? and a unary set transformation , the output image is , where is the input, and and are parameters of ; (b) Given an analogy A:B:C::D:E:? and a binary set transformation , the output image is when is applied to A:B:C, where and are the inputs, and is a parameter of , or when is applied to D:E:?, where is an option of the RPM problem.

ASTI+ includes two types of image transformations, unary and binary, which take one and two input images, respectively. All ASTI+ transformations are based on fundamental affine transformations and set operations, and they extend the original collections proposed in earlier ASTI research (Kunda et al., 2013; Kunda, 2013). ASTI+ includes nine unary affine transformations: eight rectilinear rotations/reflections, as shown in Figure 3, and a ninth scaling transformation that doubles the area of the input image. There are also 11 set transformations: five unary and five binary, as shown in Figure 4(a) and 4(b) respectively, and one hybrid unary/binary transformation. Table 1 gives details of each transformation. Unary transformations are defined relative to analogies between pairs of images, such as A:B::C:D for images A, B, C and D. Binary transformations are defined relative to analogies between trios of images, such as A:B:C::D:E:F for images A, B, C, D, E and F.
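The eight rectilinear rotations/reflections can be sketched as operations on a binary grid (a list of rows). This assumes they are the eight symmetries of the square (four rotations, including the identity, and four reflections); the dictionary keys are hypothetical names, and the area-doubling scaling transformation is omitted because its resampling scheme is not specified here.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]  # 1 = black pixel, 0 = white pixel


def rot90(g: Grid) -> Grid:
    """Rotate 90 degrees clockwise: reverse the row order, then transpose."""
    return [list(row) for row in zip(*g[::-1])]


def transpose(g: Grid) -> Grid:
    """Reflect about the main diagonal."""
    return [list(row) for row in zip(*g)]


TRANSFORMS: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: [row[:] for row in g],
    "rot90": rot90,
    "rot180": lambda g: rot90(rot90(g)),
    "rot270": lambda g: rot90(rot90(rot90(g))),
    "flip_left_right": lambda g: [row[::-1] for row in g],
    "flip_up_down": lambda g: [row[:] for row in g[::-1]],
    "transpose": transpose,
    "anti_transpose": lambda g: rot90(rot90(transpose(g))),  # reflect about the anti-diagonal
}

shape = [[1, 1, 1],
         [1, 0, 0],
         [0, 0, 0]]  # has no rotational or mirror symmetry, so all eight outputs differ
for name, f in TRANSFORMS.items():
    print(name, f(shape))
```

For a shape without symmetry the eight outputs are pairwise distinct; for symmetric shapes several transformations coincide, which is one reason tie-breaking rules are needed when selecting the best transformation.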


Calculate . Align and using . Output .

Calculate . Align and using and . Output .

Calculate . Align and using , and calculate and . Align and using . Output .

Let be an empty image of the same size as . Calculate and aligned by , and copy to the position of in . Repeat this until nothing is left in . Output .

Let be an empty image of the same size as . Decompose , and into connected components , and . If is false, output a value indicating failure. Otherwise, find a permutation of that maximizes by calculating for each and each . Find another permutation of that minimizes . Generate by copying to position of in for all .  

Calculate and . Align and with and . Output .

Calculate and . Align and with and . Output .

Calculate and . Align , and using and . Output image .

Calculate . Align and by . Output .

Let and be the shadows of and , where “shadow” is defined to be a copy of an image where any white area surrounded by black in the original image is colored black. Calculate . Align and using , and calculate . Align and using . Output .  

Given analogy , works as . But it requires that and , where is an option. Otherwise, output a value indicating failure. (This transformation is NOT shown in Figure 4.)  


Table 1: Details of unary, binary, and hybrid unary/binary transformations.
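The “shadow” used by one of the transformations in Table 1 amounts to hole filling: every white pixel that cannot reach the image border through white pixels is painted black. The sketch below assumes 4-connectivity for white regions and a pixel-set image with known height and width; `shadow` is a hypothetical helper name, not the ASTI+ code.

```python
from collections import deque
from typing import Deque, Set, Tuple

Pixel = Tuple[int, int]


def shadow(black: Set[Pixel], height: int, width: int) -> Set[Pixel]:
    """Return a copy of the image in which every white area enclosed by black is filled black."""
    outside: Set[Pixel] = set()
    queue: Deque[Pixel] = deque()
    # Seed the flood fill with all white pixels on the image border.
    for r in range(height):
        for c in range(width):
            on_border = r in (0, height - 1) or c in (0, width - 1)
            if on_border and (r, c) not in black:
                outside.add((r, c))
                queue.append((r, c))
    # Breadth-first search through 4-connected white pixels reachable from the border.
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < height and 0 <= nc < width
                    and (nr, nc) not in black and (nr, nc) not in outside):
                outside.add((nr, nc))
                queue.append((nr, nc))
    # Everything not reachable from the border (black pixels and enclosed holes) becomes black.
    return {(r, c) for r in range(height) for c in range(width) if (r, c) not in outside}


square = {(r, c) for r in range(1, 4) for c in range(1, 4)}
ring = square - {(2, 2)}               # hollow 3x3 square inside a 5x5 image
print(shadow(ring, 5, 5) == square)    # True: the hole in the middle is filled
```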

2.4 Analogies

The third element specifies how analogies are defined within a given RPM problem. ASTI+ posits that an RPM analogy is composed of relations between matrix entries and that all parallel relations should be instantiated by the same transformation. This assumption seems adequate for most problems on the Standard Raven’s test, but items on the Advanced test or other geometric analogy tests may require considering multiple transformations (Carpenter et al., 1990; Kunda, 2015).

Figure 5 illustrates simple analogies that one could draw in any given RPM problem, where the images are represented by letters. These analogies are either between rows (Figure 5(a) and 5(c)) or between columns (Figure 5(b) and 5(d)), implying that the rows or columns share the same underlying relation among entries.

Figure 5: Illustrations of simple analogies in RPM problems. Simple analogies reflect how a matrix layout is naturally perceived as rows or columns. In particular, given the 2×2 matrices in (a) and (b), the row analogy is A:B::C:? and the column analogy is A:C::B:?; similarly, given the 3×3 matrices in (c) and (d), the row analogies include A:B:C::G:H:? and D:E:F::G:H:?, and the column analogies include A:D:G::C:F:? and B:E:H::C:F:?.
Figure 6: Illustrations of recursive analogies in 3×3 RPM problems: (a) and (b) are trio analogies, and (c) through (j) are pair analogies.
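The simple analogies in Figure 5 can be enumerated mechanically for an n×n layout whose cells are labelled row by row with letters and whose last cell is ?. This hypothetical helper pairs every complete row (or column) with the incomplete last row (or column) and reproduces the analogies listed in the Figure 5 caption.

```python
from string import ascii_uppercase
from typing import List


def layout(n: int) -> List[List[str]]:
    """Label an n x n matrix row by row with letters; the missing last cell is '?'."""
    labels = list(ascii_uppercase[: n * n - 1]) + ["?"]
    return [labels[i * n:(i + 1) * n] for i in range(n)]


def simple_analogies(n: int) -> List[str]:
    """Row analogies first, then column analogies, each targeting the line that contains '?'."""
    grid = layout(n)
    rows = grid
    cols = [list(col) for col in zip(*grid)]
    analogies = []
    for lines in (rows, cols):
        target = ":".join(lines[-1])          # the incomplete row or column
        for source in lines[:-1]:             # every complete row or column
            analogies.append(":".join(source) + "::" + target)
    return analogies


print(simple_analogies(2))  # ['A:B::C:?', 'A:C::B:?']
print(simple_analogies(3))  # ['A:B:C::G:H:?', 'D:E:F::G:H:?', 'A:D:G::C:F:?', 'B:E:H::C:F:?']
```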

In addition to the simple analogies in Figure 5, the ASTI+ model also expands these analogies in two ways. First, for 3×3 matrices, the model further considers several subproblems, as shown in Figure 6. For example, consider the simple analogies in Figure 5(c), A:B:C::G:H:? and D:E:F::G:H:?, which use only two of the three rows. We combine them into a larger recursive² format, A:B:C::D:E:F:::D:E:F::G:H:?, as in Figure 6(a), which uses all rows. In this recursive analogy, two subproblems are created: the first subproblem is A:B:C::D:E:?, with F as the only option, and the second subproblem is D:E:F::G:H:?, with options from the original RPM problem. The correct transformation should solve all subproblems equally well.

² Recursive in that it is an analogy of analogies.
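Likewise, the recursive format and its subproblems can be generated by chaining consecutive rows: each row serves as the source for the next, and every subproblem except the last has the true entry as its only option. The function names and the option labels are hypothetical; the same code applies to columns if the layout is transposed first.

```python
from typing import List, Sequence, Tuple


def recursive_analogy(rows: List[List[str]]) -> str:
    """Chain consecutive rows into an analogy of analogies, e.g. A:B:C::D:E:F:::D:E:F::G:H:?."""
    pairs = [":".join(s) + "::" + ":".join(t) for s, t in zip(rows, rows[1:])]
    return ":::".join(pairs)


def recursive_subproblems(rows: List[List[str]],
                          options: Sequence[str]) -> List[Tuple[str, List[str]]]:
    """One subproblem per consecutive pair of rows, with its candidate answers."""
    subproblems = []
    for source, target in zip(rows, rows[1:]):
        answer = target[-1]
        query = ":".join(source) + "::" + ":".join(target[:-1] + ["?"])
        # A known entry is its subproblem's only option; the real '?' uses the test's options.
        candidates = list(options) if answer == "?" else [answer]
        subproblems.append((query, candidates))
    return subproblems


rows = [["A", "B", "C"], ["D", "E", "F"], ["G", "H", "?"]]
options = [str(k) for k in range(1, 9)]  # placeholder labels for eight answer options
print(recursive_analogy(rows))  # A:B:C::D:E:F:::D:E:F::G:H:?
for query, candidates in recursive_subproblems(rows, options):
    print(query, candidates)
```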

Second, ASTI+ captures more sophisticated spatial regularities by expanding the matrix in a way that preserves the adjacency between matrix entries everywhere in the expanded version. It then encloses different parts of the expanded matrix with quadrilaterals, as shown in Figure 7. The entries in each quadrilateral form a new matrix, whose rows and columns constitute analogies that cannot be systematically constructed from rows and columns in the original matrix. ASTI+ follows two heuristics to enclose these matrices: (1) the quadrilateral should contain a permutation of the original matrix, and (2) the quadrilateral should have a ? at one of its corners. We do not necessarily expect that humans use this strategy to search through this analogy space, but it provides a systematic and parsimonious way to capture regularities within a matrix that humans might perceive and reason about, albeit in different ways.

Figure 7: Expanded matrices to generate analogies: (a) through (c) are expanded from the 2×2 matrix in Figure 5, and (d) through (g) are expanded from the 3×3 matrix in Figure 5.

2.5 General Integration Strategy

The fourth element concerns the general strategy used to integrate transformations, analogies and similarity metrics to solve an RPM problem. The integration can be divided into three stages. In Stage 1, ASTI+ attempts to explain the variations in the incomplete matrix with some analogies and transformations. In Stage 2, it verifies the explanations by checking whether there exists an option that can be generated from the analogy and transformation. In Stage 3, it uses the best explanation (the best analogy and the best transformation) to select an answer option.

To quantify “how well” an analogy and a transformation explain the variations across matrix entries, we introduce three scores corresponding to the three stages, each realized by a different way of assembling Jaccard similarity measurements: (1) the MAT score measures how well an analogy and a transformation explain the variations in the matrix in Stage 1; (2) the O score measures how well an analogy and a transformation explain the variations involving the options in Stage 2; and (3) the MATO score, which is used as the final metric to select the answer, is computed from the MAT and O scores. For example, given the matrix in Figure 5(a), the analogy A:B::C:? and a transformation $T$, we have $\mathrm{MAT} = \mathrm{sim}(T(A), B)$, $\mathrm{O} = \mathrm{sim}(T(C), O_k)$ for an option $O_k$, and $\mathrm{MATO} = (\mathrm{MAT} + \mathrm{O})/2$, where $\mathrm{sim}$ is the similarity of Equation (3). Score calculation depends on what types of analogy and transformation are used, as described below.

MAT Scores. For transformations of the form  or  (without extra parameters), MAT scores are calculated in the same way as . For transformations with extra parameters, MAT scores cannot be computed in this way because the model does not know the extra parameters. For example, for  and A:B::C:?, the model cannot use  because it does not know  and , but it can use  to calculate the MAT score. In this case, the MAT score is calculated as  for . Although the model takes transformation-specific approaches to calculate MAT scores, they are simply different ways to assemble similarity measurements (symmetric and asymmetric Jaccard indices) of the same known matrix entries.

O Scores. For transformations whose MAT scores are calculated through the Jaccard index, so are their O scores. For transformations using the asymmetric Jaccard index, for example  and , the asymmetric Jaccard index is never lower than the Jaccard index given the same input (see Equations (1) and (2)). As a result, transformations measured by the asymmetric Jaccard index tend to have higher O scores even if their explanations are poor. To fix this issue, the model calculates multiple Jaccard and asymmetric Jaccard indices, each of which characterizes a distinct aspect of the transformation, and averages them to get an O score. For example, for  and A:B::C:?, three aspects of the transformation are considered: (1) how much  is a subset of , where  is an option; (2) how the difference between  and  compares to the difference between  and ; and (3) how similar the predicted image is to . This leads to , where  and  after , , and  are properly aligned.
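The ordering between the two indices follows in one step, assuming Equation (2) is the asymmetric index $|A \cap B| / |A|$ with $A$ non-empty:

$$|A| \le |A \cup B| \;\Longrightarrow\; J_{\mathrm{asym}}(A, B) = \frac{|A \cap B|}{|A|} \;\ge\; \frac{|A \cap B|}{|A \cup B|} = J(A, B).$$

Equality holds only when $A \cap B = \emptyset$ or $B \subseteq A$, so for overlapping images in which $B$ has pixels outside $A$, the asymmetric index is strictly larger. This is the bias that averaging several aspects into the O score is meant to counteract.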

MATO Scores. Finally, every combination of an analogy, a transformation and an option is evaluated by a weighted average of its MAT and O scores, where each weight is proportional to the number of variations that the corresponding score measures. For recursive analogies in 3×3 matrices, the scores of the original problem are derived from the scores of its subproblems. For instance, suppose that there are $n$ subproblems in a recursive analogy, and let $\mathrm{MAT}_i$ and $\mathrm{O}_i$ be the MAT score and O score of the $i$-th subproblem. The final MAT and O scores are then computed from $\mathrm{MAT}_1, \dots, \mathrm{MAT}_n$ and $\mathrm{O}_1, \dots, \mathrm{O}_n$.
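As a toy illustration of how the three scores fit together for the 2×2 row analogy A:B::C:?, the sketch below uses a parameter-free left-right mirror as the transformation, assumes the images are already aligned (so plain Jaccard replaces the sliding similarity), and weights MAT and O equally because each measures one variation in this analogy. The images, the transformation and `weighted_mato` are invented for illustration.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]
Image = Set[Pixel]


def jaccard(a: Image, b: Image) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def flip_lr(img: Image, width: int = 2) -> Image:
    """Hypothetical parameter-free transformation: mirror left-right within a fixed width."""
    return {(r, width - 1 - c) for r, c in img}


def weighted_mato(mat: float, n_mat: int, o: float, n_o: int) -> float:
    """Weighted average of MAT and O; weights = number of variations each score measures."""
    return (n_mat * mat + n_o * o) / (n_mat + n_o)


A = {(0, 0), (1, 0)}   # left bar
B = {(0, 1), (1, 1)}   # right bar
C = {(0, 0)}           # top-left dot
options = {1: {(0, 1)}, 2: {(1, 1)}}

mat = jaccard(flip_lr(A), B)           # how well T explains A -> B (one variation)
scores = {}
for k, option in options.items():
    o = jaccard(flip_lr(C), option)    # how well T(C) matches option k (one variation)
    scores[k] = weighted_mato(mat, 1, o, 1)
answer = max(scores, key=scores.get)
print(mat, scores, answer)  # 1.0 {1: 1.0, 2: 0.5} 1
```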

Figure 8: The dependencies of scores: The dashed lines denote partial dependence. Given the relations in an analogy, MAT relies on the entries that are not related to the missing entries, while O relies on the entries that are related to the missing entries.

2.6 Specific Integration Strategies: When and What to Maximize

ASTI+ implements the general integration strategy as several alternative specific strategies that systematically explore different design choices in each stage of the general strategy. Given the dependencies of scores in Figure 8, the general strategy boils down to an optimization in which the MATO score is maximized over the analogy $a$, the transformation $t$, and the option $o$ for a problem-specific matrix $M$. A heuristic for solving this optimization can be drawn from an observation of high-achieving human solvers: they often first form a good understanding of the incomplete matrix before attending to the options. Translated into our scoring system, this observation says that a good MAT score implies a good O score and thus a good MATO score. However, like most heuristics in intelligent systems, this one can fail. For example, it will not work if the system lacks the capability to fully “understand” or explain the incomplete matrix (e.g., lacking appropriate transformations or analogies), or if the matrix contains distracting, noisy features that cause the system to “over-explain” content that should have been ignored.

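For this reason, we introduced specific integration strategies (summarized in the first part of Table 2) that range from relying entirely on the heuristic to ignoring it. In particular, given an RPM matrix $M$, an analogy $a$, a transformation $t$ and an option $o$, the MAT score is a function $\mathrm{MAT}(M, a, t)$, the O score is a function $\mathrm{O}(M, a, t, o)$, and the MATO score is a function $\mathrm{MATO}(M, a, t, o)$. We formulate the three strategies as the following optimization processes:

$$(a^*, t^*) = \operatorname*{arg\,max}_{a,\,t} \mathrm{MAT}(M, a, t), \qquad o^* = \operatorname*{arg\,max}_{o} \mathrm{MATO}(M, a^*, t^*, o) \tag{M-confident}$$

$$o^* = \operatorname*{arg\,max}_{o} \max_{a,\,t} \mathrm{MATO}(M, a, t, o) \tag{M-neutral}$$

$$t^*(a) = \operatorname*{arg\,max}_{t} \mathrm{MAT}(M, a, t), \qquad o^* = \operatorname*{arg\,max}_{o} \max_{a} \mathrm{MATO}(M, a, t^*(a), o) \tag{M-prudent}$$

The M-confident optimization relies completely on the heuristic, M-neutral ignores it completely, and M-prudent lies in between.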

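Since the O score also depends on the option (Figure 8), it can also serve as the objective function for selecting an answer. ASTI+ therefore has three analogous integration strategies that maximize O instead of MATO, using the score functions defined above; we refer to them as the O-confident, O-neutral and O-prudent strategies:

$$o^* = \operatorname*{arg\,max}_{o} \max_{a,\,t} \mathrm{O}(M, a, t, o) \tag{O-neutral}$$

$$t^*(a) = \operatorname*{arg\,max}_{t} \mathrm{MAT}(M, a, t), \qquad o^* = \operatorname*{arg\,max}_{o} \max_{a} \mathrm{O}(M, a, t^*(a), o) \tag{O-prudent}$$

Note that MATO is simply a weighted average of MAT and O. Once the best analogy and transformation have been fixed, MAT no longer varies with the option, so maximizing O selects the same option as maximizing MATO. The O-confident strategy is therefore equivalent to the M-confident strategy, and we do not need a separate optimization for it.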
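The strategies can be compared on a toy score table. The sketch below is hypothetical throughout: the analogy names, transformation names, two options and all score values are invented for illustration, MATO is taken as an equal-weight average, and `confident`, `prudent` and `neutral` are illustrative names, not the ASTI+ code. Each function takes the scorer to maximize, so the same code yields the M and O variants.

```python
from typing import Callable, Dict, List, Tuple

MatScores = Dict[Tuple[str, str], float]        # (analogy, transformation) -> MAT
OScores = Dict[Tuple[str, str, int], float]     # (analogy, transformation, option) -> O
Scorer = Callable[[str, str, int], float]       # (analogy, transformation, option) -> score


def options_of(o_scores: OScores) -> List[int]:
    return sorted({k for (_, _, k) in o_scores})


def best_transformation_per_analogy(mat: MatScores) -> Dict[str, str]:
    """For each analogy a, the transformation t*(a) that maximizes MAT(a, t)."""
    best: Dict[str, str] = {}
    for (a, t), score in mat.items():
        if a not in best or score > mat[(a, best[a])]:
            best[a] = t
    return best


def confident(mat: MatScores, o_scores: OScores, score: Scorer) -> int:
    # Fix the single best (analogy, transformation) by MAT, then pick the option.
    a, t = max(mat, key=mat.get)
    return max(options_of(o_scores), key=lambda k: score(a, t, k))


def prudent(mat: MatScores, o_scores: OScores, score: Scorer) -> int:
    # Keep the best transformation for every analogy, then maximize over analogies and options.
    best = best_transformation_per_analogy(mat)
    return max(options_of(o_scores),
               key=lambda k: max(score(a, t, k) for a, t in best.items()))


def neutral(mat: MatScores, o_scores: OScores, score: Scorer) -> int:
    # Ignore MAT when choosing (a, t): maximize jointly over analogies, transformations, options.
    return max(options_of(o_scores),
               key=lambda k: max(score(a, t, k) for a, t in mat))


# Toy scores, invented for illustration only.
MAT = {("row", "t1"): 0.9, ("row", "t2"): 0.3, ("col", "t1"): 0.5, ("col", "t2"): 0.6}
O = {
    ("row", "t1", 1): 0.8, ("row", "t1", 2): 0.4,
    ("row", "t2", 1): 0.2, ("row", "t2", 2): 0.95,  # distractor: high O, low MAT
    ("col", "t1", 1): 0.3, ("col", "t1", 2): 0.5,
    ("col", "t2", 1): 0.6, ("col", "t2", 2): 0.7,
}


def mato_score(a: str, t: str, k: int) -> float:
    return 0.5 * (MAT[(a, t)] + O[(a, t, k)])  # equal weights: an assumption for this toy


def o_score(a: str, t: str, k: int) -> float:
    return O[(a, t, k)]


results = {
    "M-confident": confident(MAT, O, mato_score),
    "O-confident": confident(MAT, O, o_score),
    "M-prudent": prudent(MAT, O, mato_score),
    "O-prudent": prudent(MAT, O, o_score),
    "M-neutral": neutral(MAT, O, mato_score),
    "O-neutral": neutral(MAT, O, o_score),
}
print(results)
# {'M-confident': 1, 'O-confident': 1, 'M-prudent': 1, 'O-prudent': 1, 'M-neutral': 1, 'O-neutral': 2}
```

In this made-up table only O-neutral falls for the distractor, because it is the only strategy that never lets the MAT score constrain which analogy and transformation may vouch for an option.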

2.7 From ASTI to ASTI+

In this subsection, we compare ASTI+ to its predecessor ASTI. The ASTI model (Kunda et al., 2013; Kunda, 2013) introduced a visual-imagery framework for solving geometric reasoning problems that grounds analogical reasoning in a pixel-level representation, transformations, and similarity metrics. This framework remains unchanged in the ASTI+ model. From ASTI to ASTI+, we enhanced each core dimension of the framework, as described below.

Analogy

For 2×2 matrices, ASTI and ASTI+ share the same analogy set, which could be enumerated manually given the small size of the matrices. In contrast, 3×3 matrices provide many more choices of analogies. We thus developed the systematic approach in Section 2.4 to enumerate analogies, which yielded more analogies than ASTI supported. We adopted this approach because analogous elements in matrix reasoning tasks are usually arranged in spatially parallel positions. Another enhancement was the introduction of recursive analogies, which was inspired by the recursive and incremental nature of human problem solving reported in the literature (Carpenter et al., 1990; Kunda, 2015).

Transformation

ASTI+ inherits all the affine transformations of ASTI. In addition, ASTI+ has more complex set transformations, such as inverse unite and shadow mask unite, that combine the basic set operations of ASTI.

Integration Strategy

ASTI+ offers more choices of integration strategy than ASTI, representing different degrees of reliance on the heuristic described in Section 2.6. ASTI implements only one strategy, which roughly corresponds to the M-prudent strategy in ASTI+.

Option-Usage Strategy

Two general option-usage strategies for solving RPM problems and other multiple-choice reasoning problems have been reported in human studies: constructive matching and response elimination (Snow, 1981; Bethell-Fox et al., 1984). In constructive matching, a transformation $T$ is first inferred from the matrix, and an answer $D$ is constructed by applying $T$ and then compared with the options. In response elimination, each option $O$ is used to infer $T$; if $T$ fails, $O$ is eliminated. The strategy choice observed in human experiments was found to relate to the subject’s intellectual ability, item type and item difficulty. Cognitive models have been constructed based on both strategies (Evans, 1964; Sternberg, 1977; Mulholland et al., 1980).

ASTI strictly follows the constructive matching strategy, in which options are never used before generating the missing entry. We refer to this as option-free constructive matching. In contrast, ASTI+ adopts a slightly different approach that we refer to as option-informed constructive matching, which lies between constructive matching and response elimination: the transformation $T$ is still inferred from the matrix, but some of its parameters are inferred from an option $O$. For example, ASTI+ uses the options to calculate the alignment parameters of its transformations. This strategy gives ASTI+ the flexibility to represent relations that cannot be represented by single-direction transformations.

3 Experimental Studies of the ASTI+ Model

To study how different analogical constructions affect performance on the RPM test, we equip ASTI+ with different configurations of analogies, transformations, and integration strategies, and test it on the standard RPM test, which consists of five sets (A to E) of 12 problems each. We aggregated the analogies and transformations into the groups summarized in Table 2. Each configuration has one or more groups of analogies and transformations but only one integration strategy. We hypothesized that performance would change as the configuration varied.

To study how each dimension of the configuration affects performance, we conducted two experiments. In the first, we varied only the integration strategy and fixed the analogies and transformations (using the full sets of both). In the second, we selected the best integration strategy from the first experiment and varied the configurations of analogies and transformations.


Integration strategies:

M-confident: Find an analogy and a transformation that best explain the incomplete matrix; then select an option that best matches the analogy and the transformation.

O-confident: Mathematically equivalent to M-confident.

M-prudent: For each analogy, find a transformation that best explains the incomplete matrix; then select an option such that there exist an analogy and its best transformation that match the option well and explain the incomplete matrix well.

O-prudent: For each analogy, find a transformation that best explains the incomplete matrix; then select an option such that there exist an analogy and its best transformation that match the option well.

M-neutral: Select an option such that there exist an analogy and a transformation that match the option well and explain the incomplete matrix well.

O-neutral: Select an option such that there exist an analogy and a transformation that match the option well.

Transformation groups:

All the affine transformations.

, , , and .

and .

, , , and .

Analogy groups:

The analogies in Figure 7(a) and 7(d).

The analogies in Figure 7(b) and 7(e).

The analogies in Figure 7(c) and 7(f).

The analogies in Figure 7(g).

Table 2: Configurations of integration strategies, analogy groups and transformation groups.
Figure 9: Performance of each strategy on the standard RPM test: (a) and (b) show the numbers of problems correctly solved by each strategy in every set (A to E) and in the entire test; (c) and (d) visualize the MAT and O scores of each strategy’s answer to each problem as disks of various sizes and colors, where red disks indicate incorrect answers and blue disks indicate correct answers. Note that the “signed” score in (c) and (d) serves only to distinguish visually between correct and incorrect answers; the real scores always fall in [0, 1].
Figure 10: Scatter plots of each strategy’s answer to each problem in the standard RPM test, drawn with respect to the MAT and O scores.

The first experiment compares the integration strategies. Figure 9 shows the set-wise and problem-wise performance of each one. The M-neutral strategy ties with the M-prudent strategy in every set, both solving 57/60 problems, whereas the M-confident strategy performs slightly worse, solving 55/60 problems. The O strategies are far less capable, especially in the last three sets (C, D and E), whose problems are 3×3 (Sets A and B contain only 2×2 problems). Thus, the M strategies, by considering both MAT and O scores, are more robust to increases in the matrix dimension.

While M-confident comes last in Figure 9(a), where strategies maximize MATO, O-confident fares best in Figure 9(b), where strategies maximize O. Furthermore, the O-neutral and O-prudent strategies in Figure 9(b) contrast sharply with their M counterparts in Figure 9(a). In particular, the less a strategy relies on the heuristic from Section 2.6, the more its performance drops when switching from maximizing MATO to maximizing O. We surmise that this is because the RPM is designed to have distractors with high O and low MAT scores. In other words, these distractors work like traps for strategies that maximize only O scores, which is consistent with observations that people often make errors of “repetition” while solving RPM problems (Kunda et al., 2016).

Figure 9(c) and 9(d) depict the scores of each strategy’s answer to each problem as disks. The MAT and O scores are encoded as disk size and color intensity, while the correctness of the answer is denoted by color (blue for correct and red for incorrect). Figure 9(c) and 9(d) also reveal a subtle difference: different strategies can give the same correct answer to a problem, yet the answer may result from different analogies and transformations. Otherwise, the blue disks in any column would have the same size and color.

Figure 10(a) and 10(b) present the strategies’ answers to every problem in scatter plots drawn with respect to the MAT and O scores, which reveal more differences between strategies. Note that most data points in Figure 10(a), corresponding to the blue disks in Figure 9(c), denote problems that are correctly solved. Since these data points are mostly located near or below the diagonal, we hypothesize that, for a “naive” participant or computational model (with little prior knowledge about the RPM), a good explanation of the known matrix entries matters more than how well an option can be matched. Recall that MAT and O measure these two kinds of explanation. Conversely, many more points representing incorrect answers according to Figure 9(d) fall above the diagonal in Figure 10(b), which further supports this idea. The hypothesis is consistent with observations in previous human studies that high-achieving test takers usually take a more constructive approach, which requires a clear explanation of the matrix rather than perceptual matching of the options (Bethell-Fox et al., 1984; Carpenter et al., 1990; Lovett and Forbus, 2017).

Figure 11: Bar charts of the numbers of problems correctly solved by the M-prudent strategy using different analogy groups and transformation groups. (This figure should be viewed in color.)

In the second experiment, we compared different configurations of analogies and transformations while fixing the integration strategy to M-prudent. Figure 11 shows the performance of different combinations of analogy and transformation groups in this setting. In particular, each analogy group is combined with each transformation group in Figure 11(a), and analogy groups and transformation groups are combined incrementally in Figure 11(b). Figure 11(a) shows the strengths and weaknesses of each analogy group and each transformation group. S analogies plus Diff transformations are good at problems in Sets A, B, and C, whereas R analogies and Set transformations work well on Sets D and E but poorly on Sets A, B, and C. Figure 11(b) shows increases in both the vertical and horizontal directions, and the former are more substantial than the latter. This does not mean that transformations are more important than analogies because, as seen in Figure 11(a), the S group outperforms H, V, and R for every transformation group, and most problems in Sets A, B and C solved by H, V, and R can also be solved by S with a different transformation. We might expect more variation across analogy groups if they were defined at a finer-grained level.

Figure 12: Comparison between ASTI+ and human subjects. The blue horizontal lines denote the performance of the configurations used in our experiments. The green curves represent the percentiles of human data.

To conclude our analysis, we compare ASTI+’s performance with human performance (Raven et al., 1998), i.e., the normative data of the RPM. Figure 12 shows the 95th, 50th, and 5th percentiles of human performance (ages 6 to 19) as green curves and the performance of the different ASTI+ configurations used in our experiments as horizontal blue lines. (We did not use all possible configurations; doing so would have produced a wider and more even distribution of blue lines.) The ranges of ASTI+’s and human performance overlap substantially, suggesting that analogical construction can strongly affect performance on the RPM test and may have a similar effect on performance in other geometric reasoning tasks.

4 Related Work

In this section, we examine related research on reasoning in geometric matrix tasks. We first review the problem sets commonly used in this area and then discuss the problem representations adopted by different computational models. We further analyze how analogies among matrix entries are interpreted in different problem sets and computational models. Finally, we compare the protocols used to evaluate computational models with the integration strategies of ASTI+.

Problem Sets

The problem sets commonly used in geometric matrix reasoning can be classified into three categories according to their original purpose. The first contains problem sets that were handcrafted by psychologists and psychometricians and then used to measure human intellectual abilities. The best-known examples are the three versions of the RPM test (colored, standard, and advanced) (Raven et al., 1998) and Thurstone’s problem set (a.k.a. Evans’ problem set (Evans, 1964)), which was published in the 1942 edition of the Psychological Test for College Freshmen of the American Council on Education. The second category includes problem sets that are automatically generated by computer programs based on the designs in the first category; these are mostly used in large-scale and adaptive testing of human intellectual abilities (Yang et al., 2021). The third category includes problem sets that are also automatically generated but are used to train and evaluate machine learning models for abstract reasoning. These are usually large-scale data sets of homogeneous matrix problems whose psychometric quality, unlike that of the first two categories, is usually not guaranteed. Representative examples are Procedurally Generated Matrices (PGM) (Barrett et al., 2018) and Relational and Analogical Visual rEasoNing (RAVEN) (Zhang et al., 2019).

Problem Representations

Problem representations have generally been grouped into two categories, propositional and visual, in previous studies (Nersessian, 2010; Kunda et al., 2013). However, given the complex structure of RPM, different aspects of a problem can be represented differently, so it is inaccurate to say that a model simply uses a propositional or a visual representation. In particular, previous computational models represent geometric objects, relations among geometric objects, and analogies between groups of geometric objects in different ways. For example, the early models (Evans, 1964; Hunt, 1974; Carpenter et al., 1990; Lovett and Forbus, 2017) used propositional representations for both geometric objects and relations, and represented analogies implicitly through a procedural arrangement of computations that follows a specific analogy interpretation, as discussed below. The machine learning models (Barrett et al., 2018; Zhang et al., 2019) represent geometric objects visually, represent relations implicitly through learned parameters, and represent analogies implicitly through model structures that reflect their assumed analogy interpretations. Our model and its predecessor (Kunda et al., 2013; Kunda, 2013) take a different approach: geometric objects are represented visually, relations among them propositionally, and analogies explicitly in a proportional format. Another aspect to consider is whether geometric objects are represented hierarchically (Cirillo and Ström, 2010). Rather than representing each matrix entry separately, one could also represent the entire RPM matrix as a single image (Hua and Kunda, 2020), thus encoding every aspect visually and implicitly.
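The split described above (visual objects, propositional relations, explicit proportional analogies) can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in, not ASTI+'s actual data structures: images are sets of foreground-pixel coordinates, the relation vocabulary contains only `identity` and `mirror`, and a relation is accepted only if it maps A onto B exactly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple

# Visual layer: a matrix entry is the set of (row, col) coordinates of its foreground pixels.
Image = FrozenSet[Tuple[int, int]]

WIDTH = 5  # hypothetical canvas width used by the mirror relation


def mirror(img: Image) -> Image:
    """Reflect an image left-to-right inside a WIDTH-pixel-wide canvas."""
    return frozenset((r, WIDTH - 1 - c) for r, c in img)


# Propositional layer: relations are symbols bound to image-to-image functions.
RELATIONS: Dict[str, Callable[[Image], Image]] = {
    "identity": lambda img: img,
    "mirror": mirror,
}


@dataclass(frozen=True)
class Analogy:
    """Explicit proportional analogy A:B::C:D, where D is the entry to construct."""
    a: Image
    b: Image
    c: Image

    def relation(self) -> str:
        """Name the first relation that maps A exactly onto B (hypothetical matching rule)."""
        for name, fn in RELATIONS.items():
            if fn(self.a) == self.b:
                return name
        raise ValueError("no known relation maps A onto B")

    def construct_d(self) -> Image:
        """Apply the relation found between A and B to C."""
        return RELATIONS[self.relation()](self.c)


a = frozenset({(0, 0), (1, 0)})
b = mirror(a)
c = frozenset({(2, 1)})
analogy = Analogy(a, b, c)
print(analogy.relation(), sorted(analogy.construct_d()))
```

Keeping the analogy as an explicit four-slot object makes it easy to enumerate alternative analogies over the same entries, whereas the images themselves are never converted into symbolic descriptions.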

Analogy Interpretation

Although an analogy is usually verbalized as “A is to B as C is to D” (i.e., A:B::C:D), this form is interpreted quite differently in different tasks. As a result, computational models require specific assumptions about analogy interpretation. For example, RPM matrices have independent vertical and horizontal geometric variations and thus can be solved using either row or column analogies. In contrast, Thurstone’s problems have geometric variations in only one direction, say the row direction, and thus can only be solved using row analogies; in this case, the analogical relations between columns are naturally determined once a meaningful row analogy has been found. PGM and RAVEN problems are the same as the Thurstone set in this regard. This closely relates to the central permutation property of analogy, i.e., A:B::C:D if and only if A:C::B:D. The property holds once an analogy has been established in a specific context; for example, engine:fuel::human:food and engine:human::fuel:food are then equivalent. But when a human or a computational model is asked to construct an analogy from these four words, the reasoning process and its difficulty can be quite different for the two forms. Besides direction, A:B::C:D is also interpreted differently along other dimensions, such as “A is similar to B as C is similar to D” versus “A is different from B as C is different from D” (Prade and Richard, 2014).
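The central permutation property can be made concrete with a small sketch. It assumes that each matrix entry has been reduced to one numeric attribute (hypothetically, a count of shapes) and reads A:B::C:D as an arithmetic proportion, only one of the possible interpretations mentioned above; all function and variable names are illustrative.

```python
from itertools import product
from typing import List, Optional

# Each cell holds one hypothetical numeric attribute (e.g. the number of shapes);
# None marks the entry to be filled in.
Matrix = List[List[Optional[int]]]


def proportion(a: int, b: int, c: int, d: int) -> bool:
    """Read A:B::C:D as 'A differs from B as C differs from D'."""
    return a - b == c - d


def known(m: Matrix, r: int, c: int) -> int:
    """Return a given (non-missing) entry of the matrix."""
    value = m[r][c]
    if value is None:
        raise ValueError(f"entry ({r}, {c}) is the missing one")
    return value


def solve_by_row(m: Matrix) -> int:
    """Row analogy m[1][1]:m[1][2] :: m[2][1]:D, solved for D."""
    return known(m, 2, 1) - known(m, 1, 1) + known(m, 1, 2)


def solve_by_column(m: Matrix) -> int:
    """Column analogy m[1][1]:m[2][1] :: m[1][2]:D (the centrally permuted form)."""
    return known(m, 1, 2) - known(m, 1, 1) + known(m, 2, 1)


# Central permutation: for this reading, A:B::C:D holds exactly when A:C::B:D does.
assert all(proportion(a, b, c, d) == proportion(a, c, b, d)
           for a, b, c, d in product(range(5), repeat=4))

# Rows and columns both vary (RPM-like) versus rows only (Thurstone-like).
rpm_like: Matrix = [[1, 2, 3], [2, 3, 4], [3, 4, None]]
rows_only: Matrix = [[1, 2, 3], [1, 2, 3], [1, 2, None]]
for name, m in [("rpm_like", rpm_like), ("rows_only", rows_only)]:
    print(name, solve_by_row(m), solve_by_column(m))
```

Under this arithmetic reading both directions give the same answer, because a - b = c - d and a - c = b - d are rearrangements of one equation. In the row-only matrix the column analogy degenerates to 2:2::3:D, an identity that is fixed as soon as the row analogy is, which mirrors the Thurstone case. The equivalence says nothing about which direction a solver finds easier to construct.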

Transformations and Learning

Transformations of geometric elements in computational cognitive models for solving RPM are usually predefined based on observations of existing handcrafted problems (Hornke and Habon, 1986; Carpenter et al., 1990). These transformations are very limited and usually do not go beyond progressions, arithmetic operations, and set/logical operations, each defined on specific geometric elements. In addition, the diversity and complexity of the perceptual organization (Primi, 2001) of these transformations and geometric elements have been kept at a low level in current problem sets and computational models. For example, when multiple transformations are present in a problem, they are usually manifested by independent or separate geometric elements; problems in which one transformation depends on another are very rare. In machine learning models for solving RPM, transformations are learned from data while the models are trained to solve RPM. It has been shown that machine learning models can learn the above transformations when the transformations are bound to specific geometric elements (Barrett et al., 2018; Zhang et al., 2019). However, it remains unclear whether they learn the abstract concepts of transformation and analogy per se, which are conceptually invariant to geometric elements, because strong generalization across geometric elements has not been reported.
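The following minimal sketch illustrates predefined set/logical transformations of the kind described above. It assumes binary images stored as sets of foreground-pixel coordinates and uses Jaccard similarity as the fit measure; the transformation names, the similarity choice, and the row-completion setup are illustrative assumptions, not a description of any particular published model.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Image = FrozenSet[Tuple[int, int]]  # coordinates of foreground pixels

# Hypothetical predefined set/logical transformations: the third entry of a row
# is computed from the first two.
SET_TRANSFORMS: Dict[str, Callable[[Image, Image], Image]] = {
    "union": lambda x, y: x | y,
    "intersection": lambda x, y: x & y,
    "difference": lambda x, y: x - y,
    "xor": lambda x, y: x ^ y,
}


def jaccard(x: Image, y: Image) -> float:
    """Similarity of two pixel sets; two blank images count as identical."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def best_transform(row: Tuple[Image, Image, Image]) -> str:
    """Pick the predefined transformation that best reproduces the row's third entry."""
    a, b, c = row
    return max(SET_TRANSFORMS, key=lambda name: jaccard(SET_TRANSFORMS[name](a, b), c))


known_row = (frozenset({(0, 0), (0, 1)}),
             frozenset({(0, 1), (0, 2)}),
             frozenset({(0, 0), (0, 1), (0, 2)}))
name = best_transform(known_row)
incomplete = (frozenset({(1, 0)}), frozenset({(1, 0), (1, 1)}))
prediction = SET_TRANSFORMS[name](*incomplete)
print(name, sorted(prediction))
```

Every operation is hard-coded, so a problem whose rule is not in `SET_TRANSFORMS`, or whose rule depends on another rule, cannot be expressed at all; this is the limitation of predefined transformations noted above.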

Evaluation Protocols and Integration Strategies

Benny et al. (2021) identified two typical protocols for evaluating computational models that solve RPM: multiple choice and single choice. In single choice, the model scores each option independently and selects the option with the highest score; in multiple choice, the model may score the options by comparing them with one another. The integration strategies of ASTI+ can be considered extensions of single-choice evaluation. On the one hand, the option variable is always in the outermost layer of optimization in ASTI+ (as in single choice, the highest-scored option is selected). On the other hand, the integration strategies have more variables to arrange in the optimization, namely analogy and transformation variables, which extends the protocol with extra choices: each analogy and each transformation can be scored individually or by comparison, i.e., extra layers of single or multiple choice.
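The difference between the option layer and the inner layers can be sketched as nested maximization. In this hypothetical example the fit scores are invented numbers indexed by option, analogy, and transformation, and the two inner aggregations are illustrative arrangements, not the integration strategies used by ASTI+. Both keep the option variable outermost and pick the highest-scored option, as in single choice; they differ only in how analogy and transformation scores are combined beneath it.

```python
from typing import Callable, Dict

# option -> analogy -> transformation -> fit score (hypothetical numbers)
Scores = Dict[str, Dict[str, Dict[str, float]]]
Inner = Callable[[Dict[str, Dict[str, float]]], float]


def max_over_all(per_analogy: Dict[str, Dict[str, float]]) -> float:
    """Best single (analogy, transformation) pair for this option."""
    return max(max(per_t.values()) for per_t in per_analogy.values())


def mean_over_analogies(per_analogy: Dict[str, Dict[str, float]]) -> float:
    """Best transformation per analogy, then averaged so that the analogies must agree."""
    best = [max(per_t.values()) for per_t in per_analogy.values()]
    return sum(best) / len(best)


def choose_option(scores: Scores, inner: Inner) -> str:
    """Option stays outermost: each option is scored on its own and the argmax is taken."""
    return max(scores, key=lambda option: inner(scores[option]))


scores: Scores = {
    "1": {"row": {"union": 0.9, "xor": 0.2}, "column": {"union": 0.3, "xor": 0.1}},
    "2": {"row": {"union": 0.7, "xor": 0.6}, "column": {"union": 0.7, "xor": 0.5}},
}
print(choose_option(scores, max_over_all), choose_option(scores, mean_over_analogies))
```

With the same scores the two arrangements choose different answers (option 1 versus option 2), which is the sense in which performance depends on when and how the maximization is performed. A multiple-choice inner layer would additionally let each option's score depend on the scores of the other options.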

5 Concluding Remarks

This paper described a framework for solving geometric matrix reasoning tasks that includes variations in transformations, analogies, and integration strategies. We showed that this task-specific language of representations and inference mechanisms is expressive enough to solve 57 of the 60 problems on the Raven’s Standard Progressive Matrices test. We further demonstrated that test performance varies not only with the transformations and analogies used but also with the higher-level integration strategy: when and how, across analogies and transformations, the model performs its maximization calculations.

In tasks such as the RPM, where eductive ability (Spearman, 1923; Raven et al., 1998) is required to extract information from a new situation, redundant information often exists; without it, ambiguity could not be eliminated, because little prior knowledge is available. Methods for representing, identifying, and exploiting such redundancy are crucial to solving these problems, and analogy is often used for this purpose. By varying the configuration of the ASTI+ model, we alter its ability to identify and represent these redundancies and control the extent to which it can exploit them to solve the task.

Our work has two main implications. First, for artificial intelligence, analogical ability might be needed by systems that face new, unseen situations. Second, for human intelligence, understanding analogical ability helps us understand eductive ability. ASTI+ demonstrates that analogical reasoning can be implemented in AI systems as exhaustive search over a predefined analogy space. Humans’ analogical ability is far more sophisticated than explicit search: it adapts to different complexity levels and task domains (Bethell-Fox et al., 1984), it involves goal management and selective attention in working memory (Carpenter et al., 1990; Primi, 2001), and it requires synergy between perception and cognition that works in a bidirectional and recursive way (Barsalou et al., 1999; Hofstadter, 2001). These features present a huge challenge to any existing analogy-making AI system.

Our current models use only one analogy and one transformation to solve each problem in the standard Raven’s test. However, problems beyond the standard test require multiple analogies and transformations (Carpenter et al., 1990; Kunda, 2015); solving them will thus require adding methods that coordinate multiple reasoning pathways across different analogies and transformations.

Going one step further, virtually all extant computational RPM models, including ASTI+, employ a single strategy to solve every problem. However, there is ample evidence that people change strategies on Raven’s problems, sometimes within a single testing session. For example, studies have found behavioral (DeShon et al., 1995) and neural (Prabhakaran et al., 1997) differences across test items linked to visual versus verbal problem-solving strategies, and other dimensions of strategy may exist. How do people manage these strategies and, possibly metacognitively, select the ones appropriate for particular problems? And how might an intelligent agent benefit from similar flexibility during complex problem solving?

Finally, although analogies and strategies are predefined in this research, there remains the question of how humans learn such strategies; to our knowledge, no AI system has yet learned them for the Raven’s test (Hernández-Orallo et al., 2016). Even RPM models that use learning still require the system designer to define the function to be maximized. Research in program induction may provide one path toward this thorny question (Schmid and Kitzelmann, 2011), including how strategies might be learned in the first place and adapted to new problems.

We would like to thank Ashok Goel for contributions to earlier phases of this research, as well as the editor for his helpful feedback. The work was supported in part by NSF Award #1730044.


  • D. Barrett, F. Hill, A. Santoro, A. Morcos, and T. Lillicrap (2018) Measuring abstract reasoning in neural networks. In Proceedings of the Thirty-fifth International Conference on Machine Learning, J. Dy and A. Krause (Eds.), Proceedings of Machine Learning Research, Vol. 80, Cambridge, MA, pp. 511–520.
  • L. W. Barsalou et al. (1999) Perceptual symbol systems. Behavioral and Brain Sciences 22 (4), pp. 577–660.
  • Y. Benny, N. Pekar, and L. Wolf (2021) Scale-localized abstract reasoning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Los Alamitos, CA, pp. 12557–12565.
  • C. E. Bethell-Fox, D. F. Lohman, and R. E. Snow (1984) Adaptive reasoning: componential and eye movement analysis of geometric analogy performance. Intelligence 8 (3), pp. 205–238.
  • P. A. Carpenter, M. A. Just, and P. Shell (1990) What one intelligence test measures: a theoretical account of the processing in the Raven Progressive Matrices test. Psychological Review 97 (3), pp. 404.
  • S. Cirillo and V. Ström (2010) An anthropomorphic solver for Raven’s Progressive Matrices. Technical report, Department of Applied Information Technology, Chalmers University of Technology, Göteborg, Sweden.
  • R. P. DeShon, D. Chan, and D. A. Weissbein (1995) Verbal overshadowing effects on Raven’s Advanced Progressive Matrices: evidence for multidimensional performance determinants. Intelligence 21 (2), pp. 135–155.
  • T. G. Evans (1964) A heuristic program to solve geometric-analogy problems. In Proceedings of the Spring Joint Computer Conference, New York, NY, pp. 327–338.
  • J. Hernández-Orallo, F. Martínez-Plumed, U. Schmid, M. Siebers, and D. L. Dowe (2016) Computer models solving intelligence test problems: progress and implications. Artificial Intelligence 230, pp. 74–107.
  • S. J. Hespos, E. Anderson, and D. Gentner (2020) Structure-mapping processes enable infants’ learning across domains including language. In Language and Concept Acquisition from Infancy Through Childhood, J. B. Childers (Ed.), pp. 79–104.
  • D. R. Hofstadter (2001) Analogy as the core of cognition. In The Analogical Mind: Perspectives From Cognitive Science, D. Gentner, K. J. Holyoak, and B. N. Kokinov (Eds.), pp. 499–538.
  • L. F. Hornke and M. W. Habon (1986) Rule-based item bank construction and evaluation within the linear logistic framework. Applied Psychological Measurement 10 (4), pp. 369–380.
  • T. Hua and M. Kunda (2020) Modeling gestalt visual reasoning on Raven’s Progressive Matrices using generative image inpainting techniques. In Proceedings of the Eighth Annual Conference on Advances in Cognitive Systems.
  • E. Hunt (1974) Quote the Raven? Nevermore! In Knowledge and Cognition, L. W. Gregg (Ed.), pp. 129–158.
  • M. Kunda, K. McGreggor, and A. K. Goel (2013) A computational model for solving problems from the Raven’s Progressive Matrices intelligence test using iconic visual representations. Cognitive Systems Research 22, pp. 47–66.
  • M. Kunda, I. Soulières, A. Rozga, and A. K. Goel (2016) Error patterns on the Raven’s Standard Progressive Matrices test. Intelligence 59, pp. 181–198.
  • M. Kunda (2013) Visual problem solving in autism, psychometrics, and AI: the case of the Raven’s Progressive Matrices intelligence test. Ph.D. Thesis, Department of Computer Science, Georgia Institute of Technology, Atlanta, GA.
  • M. Kunda (2015) Computational mental imagery, and visual mechanisms for maintaining a goal-subgoal hierarchy. In Proceedings of the Third Annual Conference on Advances in Cognitive Systems, Atlanta, GA.
  • A. Lovett and K. Forbus (2017) Modeling visual problem solving as analogical reasoning. Psychological Review 124 (1), pp. 60.
  • J. Michelson, J. H. Palmer, A. Dasari, and M. Kunda (2019) Learning spatially structured image transformations using planar neural networks. arXiv preprint arXiv:1912.01553.
  • T. M. Mulholland, J. W. Pellegrino, and R. Glaser (1980) Components of geometric analogy solution. Cognitive Psychology 12 (2), pp. 252–284.
  • N. J. Nersessian (2010) Creating scientific concepts. The MIT Press, Cambridge, MA.
  • V. Prabhakaran, J. A. Smith, J. E. Desmond, G. H. Glover, and J. D. Gabrieli (1997) Neural substrates of fluid reasoning: an fMRI study of neocortical activation during performance of the Raven’s Progressive Matrices test. Cognitive Psychology 33 (1), pp. 43–63.
  • H. Prade and G. Richard (2014) Homogenous and heterogeneous logical proportions. Journal of Logic and Computation 1 (1), pp. 1–52.
  • R. Primi (2001) Complexity of geometric inductive reasoning tasks: contribution to the understanding of fluid intelligence. Intelligence 30 (1), pp. 41–70.
  • D. Rasmussen and C. Eliasmith (2011) A neural model of rule generation in inductive reasoning. Topics in Cognitive Science 3 (1), pp. 140–153.
  • J. Raven, J. C. Raven, and J. H. Court (1998) Manual for Raven’s Progressive Matrices and Vocabulary Scales. Harcourt Assessment, San Antonio, TX.
  • U. Schmid and E. Kitzelmann (2011) Inductive rule learning on the knowledge level. Cognitive Systems Research 12 (3–4), pp. 237–248.
  • R. E. Snow (1981) Aptitude processes. In Conference Proceedings: Aptitude, Learning, and Instruction, R. E. Snow, P. Federico, and W. E. Montague (Eds.), Vol. 1, London, UK, pp. 27–63.
  • C. Spearman (1923) The nature of "intelligence" and the principles of cognition. Macmillan, London, UK.
  • R. J. Sternberg (1977) Intelligence, information processing, and analogical reasoning: the componential analysis of human abilities. Lawrence Erlbaum Associates, Mahwah, NJ.
  • Y. Yang, D. Sanyal, J. Michelson, J. Ainooson, and M. Kunda (2021) Automatic item generation of figural analogy problems: a review and outlook. In Proceedings of the Ninth Annual Conference on Advances in Cognitive Systems.
  • C. Zhang, F. Gao, B. Jia, Y. Zhu, and S. Zhu (2019) RAVEN: a dataset for relational and analogical visual reasoning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Los Alamitos, CA, pp. 5317–5327.