Abstract Reasoning via Logic-guided Generation

07/22/2021
by   Sihyun Yu, et al.

Abstract reasoning, i.e., inferring complicated patterns from given observations, is a central building block of artificial general intelligence. While humans find the answer either by eliminating wrong candidates or by first constructing the answer, prior deep neural network (DNN)-based methods focus on the former, discriminative approach. This paper aims to design a framework for the latter approach and thereby bridge the gap between artificial and human intelligence. To this end, we propose logic-guided generation (LoGe), a novel generative DNN framework that reduces abstract reasoning to an optimization problem in propositional logic. LoGe is composed of three steps: extract propositional variables from images, reason out the answer variables with a logic layer, and reconstruct the answer image from those variables. We demonstrate that LoGe outperforms black-box DNN frameworks for generative abstract reasoning on the RAVEN benchmark, i.e., it reconstructs answers by capturing the correct rules over various attributes of the observations.



1 Introduction

Imitating the human ability to infer complicated patterns from observations has been a long-standing goal of artificial intelligence. To build models with this reasoning ability, recent works (zheng2019abstract; zhang2019learning; hahne2019attention; wang2020abstract; hu2020hierarchical; wu2020scattering) have focused on training deep neural networks (DNNs) to solve abstract reasoning problems that resemble an IQ test (Figure 1). In these problems, one must infer the common rules of the contexts from the context images alone, without any additional information, and select the correct answer from a candidate set. Accordingly, these DNN-based approaches operate in a setting where neither supervision on the rules of the problem nor explicit symbol labels for each image are provided. A couple of studies (santoro2018measuring; zhang2019raven) demonstrated that widely used neural architectures such as ResNet (he2016deep) and LSTM (hochreiter1997long) are unfit for learning reasoning capability, as they employ no priors that resemble the human reasoning procedure. Remarkably, recent works have shown that performance can be significantly improved, even beyond human level, through careful neural architecture design motivated by the human reasoning process (zhang2019learning; zheng2019abstract; wu2020scattering).

(a) Response elimination.
(b) Constructive matching.
(c) Connection to propositional logic.
Figure 1: Illustration of strategies to solve a Raven’s Progressive Matrices (RPM) problem. (a) In response elimination, one finds the answer by eliminating the candidates that are not aligned with the context images. (b) In constructive matching, one first imagines the answer from the inferred rules and then selects the matching candidate. (c) Interpretation of the problem in terms of propositional logic.

DNN-based methods for abstract reasoning have mostly resembled how humans reason via the response elimination strategy (Figure 1(a)), i.e., excluding candidate answers based on matching against the given context images. Intriguingly, cognitive science (bethell1984adaptive; carpenter1990one) reveals that humans use two types of abstract reasoning strategies, not only response elimination. Specifically, humans can also perform constructive matching (Figure 1(b)): they first imagine the answer from the context images, without seeing any candidates, and then select the candidate most similar to it. Several works (mitchum2010solve; becker2016preventing) have emphasized that the latter strategy better reflects the general intelligence of humans. However, the ability of neural networks to achieve constructive matching remains under-explored, even though such a direction is promising.

Contribution. We introduce a new end-to-end generative framework, coined logic-guided generation (LoGe), that learns a constructive matching strategy for abstract reasoning, as humans do. Our main idea is to reduce these reasoning problems to optimization problems in propositional logic. Leveraging this prior knowledge, LoGe learns to embed each image into discrete variables and to generate the answer image by incorporating a differentiable propositional logical reasoning layer. We note that both objectives are achieved without any supervision on the exact propositional variables of each image or the underlying rules of the problem. Specifically, we propose a three-step framework: abstraction, reasoning, and reconstruction.

We verify that LoGe effectively solves the proposed task on the RAVEN (zhang2019raven) dataset. Specifically, we show that our framework generates high-quality, correct images by capturing the underlying abstract rules and attributes. This result is remarkable, as widely used neural architectures perform poorly on this conditional generation task. We also verify that LoGe performs comparably to neural networks that rely on the response elimination strategy, even though our task is arguably harder and our model never accesses the wrong candidates during training.

2 Logic-guided Generation

In this section, we present logic-guided generation (LoGe), a framework that imitates a human’s constructive matching strategy for abstract reasoning.

2.1 Overview of logic-guided generation

Our problem setup is largely inspired by bethell1984adaptive, who evaluated the constructive matching ability of humans to measure their generative reasoning ability. In this perspective, we express reasoning as the task of inferring the rule of a given problem, where a problem is a pair of a context and an answer satisfying the rule. We especially focus on the generative strategy for solving this task: given a query context, we evaluate the ability of machines to infer the rule from the context and generate an answer image that matches the ground-truth image.

To teach models the generative strategy in abstract reasoning, we train them on a dataset of such problems. Specifically, we consider a dataset where each context is a tuple of images and the answer is a single image. Images are specified by a collection of abstract features such as shape, color, and size. For instance, Figure 1(b) visualizes the generative strategy for the Raven’s Progressive Matrices (RPM) structure: the context is given as eight images, and the goal is to generate an answer image for the remaining location, denoted by a question mark.

Our main idea is to connect abstract reasoning to an optimization problem in propositional logic, so that the generative reasoning strategy is built into the framework. For instance, in RPM problems, one can define propositional variables of the form “the image placed at the i-th row and j-th column contains attribute a” to represent the contexts of the problem, as shown in Figure 1(c). Here, attributes are sets of features of each context image, e.g., the sets of shapes, colors, and sizes. With those variables, the underlying rules can be written as propositional logical formulas, as in Figure 1(c). In this respect, one may interpret answer generation as a MAXSAT optimization problem in propositional logic: finding propositional variables representing the answer that satisfy as much of the underlying logical formula of the given RPM problem as possible. We provide a more detailed description of the MAXSAT problem in Appendix A.
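To make the reduction concrete, here is a hypothetical Python sketch of such a propositional encoding. The attribute vocabulary and variable-naming scheme are illustrative assumptions, not the paper's exact construction:

```python
# Hypothetical sketch: encode an RPM context as propositional variables of the
# form "the image at row i, column j has shape a". Vocabulary is made up.
from itertools import product

SHAPES = ["triangle", "square", "circle"]  # illustrative attribute values

def to_propositions(grid):
    """grid[i][j] is the shape in cell (i, j); None marks the missing answer cell.

    Returns a dict mapping variable names to truth values; variables of the
    unknown answer cell are omitted, since reasoning must infer them."""
    props = {}
    for i, j in product(range(3), range(3)):
        if grid[i][j] is None:
            continue
        for a in SHAPES:
            props[f"shape({i},{j})={a}"] = (grid[i][j] == a)
    return props

context = [["triangle", "square", "circle"],
           ["square", "circle", "triangle"],
           ["circle", "triangle", None]]  # bottom-right is the unknown answer
props = to_propositions(context)
```

The underlying row-wise rules would then be expressed as logical formulas over exactly these variables, and generating the answer amounts to choosing truth values for the omitted cell.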

Figure 2: Overall illustration of our framework, which consists of three steps: abstraction, reasoning, and reconstruction. We compute the propositional logical variables of each image and predict the variables of the answer with the logical reasoning layer.

As those propositional variables are not provided in the dataset, LoGe learns a propositional embedding and the rules of the problems in a self-supervised manner. In particular, we derive a three-step framework with an encoder network, a decoder network, and a logical reasoning layer, together with a latent codebook whose elements are trainable real-valued vectors:

  • (Abstraction.) The encoder network and the codebook embed the contexts into propositional variables.

  • (Reasoning.) The reasoning layer predicts the propositional variables of the answer image.

  • (Reconstruction.) The decoder network and the codebook generate the answer image from the inferred propositional variables.

We provide an illustration of our framework in Figure 2.

2.2 Detailed description of LoGe

In the rest of this section, we describe each step of our framework in detail.

Abstraction. We first compute the propositional logical variables of each context image using the encoder network and the latent codebook. To achieve this, we pass each image through the encoder network to obtain a collection of spatial feature vectors. We then quantize each feature vector to its nearest codebook entry, i.e., the codebook element with the smallest Euclidean distance. Finally, we one-hot encode the resulting codebook indices: each index is mapped to a categorical one-hot vector whose corresponding entry is 1. Consequently, we obtain a one-hot embedding of each image, which is regarded and utilized as the propositional variables of that image in further steps.
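This abstraction step can be sketched in numpy as follows. The shapes (codebook size, embedding dimension, number of spatial features) are assumed for illustration, not taken from the paper:

```python
# Minimal numpy sketch of the abstraction step: quantize each encoder feature
# to its nearest codebook entry, then one-hot encode the resulting indices.
import numpy as np

rng = np.random.default_rng(0)
K, D, N = 8, 4, 6                    # codebook size, embedding dim, features per image
codebook = rng.normal(size=(K, D))   # trainable codebook entries e_1..e_K
z_e = rng.normal(size=(N, D))        # encoder outputs for one image

# nearest-neighbour lookup: argmin_k ||z - e_k||^2 for each spatial feature
dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
indices = dists.argmin(axis=1)                                   # (N,)

one_hot = np.eye(K)[indices]   # (N, K): propositional variables of the image
z_q = codebook[indices]        # (N, D): quantized latents used downstream
```

Note that `z_q` equals `one_hot @ codebook`, which is exactly the lookup the reconstruction step later inverts.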

Reasoning. From the propositional variables of the context, we compute the propositional variables that correspond to the predicted answer image by passing the context variables through the reasoning layer. For the reasoning layer, we choose the SATNet layer (wang2019satnet), a differentiable MAXSAT solver that learns propositional logical formulas from data as its layer weights. We provide details of this reasoning layer in Appendix B.

Reconstruction. Finally, we infer the answer image from the predicted propositional variables. To achieve this, we first recover a latent vector from the predicted variables and the codebook, and then return the decoder’s output on this latent vector as the final answer image.

Training objective. To train LoGe, we propose three loss functions, one each for the abstraction, reasoning, and reconstruction steps. To formulate these objectives, we additionally consider the quantized vector and the propositional variables of the ground-truth answer obtained from the abstraction step. The abstraction and reconstruction losses resemble the objective of the vector-quantized variational autoencoder (van2017neural; razavi2019generating):

Figure 3: Comparison of generated and ground-truth images on the RAVEN dataset among different architecture variants. (a)-(c): autoencoder with an attention layer and two convolutional networks with different kernel sizes as the reasoning network. We set the kernel size to 3 and 1 for (b) and (c), respectively. (d)-(e): VQ-VAE with an attention layer and a convolutional network with a kernel size of 3. Ours: our LoGe framework, which is composed of VQ-VAE and SATNet. GT: ground-truth answer image.

where the barred terms indicate a stop-gradient operator. Moreover, we define the reasoning loss as follows:

Here, the reasoning loss is a binary cross-entropy between the predicted propositional variables and those of the ground-truth answer.
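Although the exact equations are not reproduced above, the standard VQ-VAE objective that the abstraction and reconstruction losses are stated to resemble has the following well-known form (our notation, as an assumption: E and D are the encoder and decoder, e the selected codebook entry, sg[·] the stop-gradient operator, and β a commitment coefficient):

```latex
\mathcal{L}_{\text{VQ-VAE}} =
  \underbrace{\lVert x - D(e) \rVert_2^2}_{\text{reconstruction}}
+ \underbrace{\lVert \mathrm{sg}[E(x)] - e \rVert_2^2}_{\text{codebook}}
+ \beta \, \underbrace{\lVert E(x) - \mathrm{sg}[e] \rVert_2^2}_{\text{commitment}}
```

The stop-gradient splits the quantization error into a term that moves the codebook toward the encoder outputs and a term that commits the encoder to the codebook.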

To sum up, we optimize the total loss, which is the sum of the above three loss functions:

Here, the total loss involves several discrete outputs, e.g., the outputs of the reasoning layer. We describe how we handle this non-differentiability during optimization in Appendix C.

3 Experiments

We verify the effectiveness of our framework on the RAVEN/i-RAVEN datasets (zhang2019raven; hu2020hierarchical). Our results demonstrate that the proposed logic-guided generation (LoGe) framework generates correct answer images for given abstract reasoning problems, while other neural architectures fail to do so. Moreover, we show that our framework can be employed for discriminative tasks, i.e., choosing the answer among candidates, where it improves upon existing discriminative methods (zhang2019raven; zheng2019abstract; zhang2019learning; hu2020hierarchical; wu2020scattering).

Method Center U-D L-R O-IC
LSTM (zhang2019raven) 12.3 10.3 12.7 12.9
WReN (santoro2018measuring) 23.3 15.2 16.5 16.8
LEN (zheng2019abstract) 42.5 28.1 27.6 32.9
CoPINet (zhang2019learning) 50.4 40.8 40.0 42.7
SRAN (hu2020hierarchical) 53.4 43.1 41.4 44.0
LoGe (Ours) 87.5 64.0 51.7 48.5
Table 1: Accuracy on discriminative tasks on the i-RAVEN dataset for existing approaches and LoGe. For LoGe, we select the candidate image with the smallest mean-squared error from the generated image.

3.1 Experimental setup

Datasets. To verify the effectiveness of LoGe, we choose RAVEN (zhang2019raven) and i-RAVEN (hu2020hierarchical). For more details of these datasets, see Appendix D.

Baselines. LoGe utilizes VQ-VAE (van2017neural) and SATNet (wang2019satnet) to leverage propositional logic. To verify the effectiveness of this logical prior, we qualitatively compare the generated answers with those from other black-box neural network frameworks, i.e., with different encoders and reasoning networks in place of VQ-VAE and SATNet, respectively. For quantitative results, we compare performance with prior methods on discriminative abstract reasoning. We provide detailed descriptions of the baselines in Appendix E.

3.2 Main results

Qualitative results. Figure 3 summarizes the generated answers for different configurations of the RAVEN dataset. LoGe successfully generates the answer image across various configurations of abstract reasoning problems, while other black-box architectural choices fail. This indicates that the propositional logic prior is beneficial for achieving the objective. We provide more illustrations in Appendix F.

Quantitative results. Table 1 compares our framework with prior approaches on discriminative tasks.¹ To employ our framework for discriminative tasks, we select the candidate with the smallest mean-squared error from the generated answer image. Intriguingly, LoGe outperforms existing works, even though our method never accesses candidates other than the answer during training.

¹For hu2020hierarchical, we resized the images to 80×80 for a fair comparison with other baselines.
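The discriminative readout described above is a one-liner; a self-contained sketch with illustrative arrays:

```python
# Pick the candidate closest (in mean-squared error) to the generated answer.
import numpy as np

def select_answer(generated, candidates):
    """generated: (H, W) array; candidates: list of (H, W) arrays.
    Returns the index of the candidate with the smallest MSE."""
    errors = [np.mean((c - generated) ** 2) for c in candidates]
    return int(np.argmin(errors))

gen = np.ones((4, 4))                       # toy "generated" answer image
cands = [np.zeros((4, 4)),                  # far from gen
         np.full((4, 4), 0.9),              # closest to gen
         np.full((4, 4), 0.5)]
best = select_answer(gen, cands)
```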

4 Conclusion

We introduce a new generative deep neural network framework that resembles a human’s constructive matching strategy for abstract reasoning. Specifically, we derive a three-step procedure based on connecting abstract reasoning to an optimization problem in propositional logic. Experimental results demonstrate the effectiveness of our framework across various problem types in abstract reasoning: it generates the correct answer by capturing common patterns with the propositional logic prior.

5 Acknowledgements

This work was partially supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2019-0-00075, Artificial Intelligence Graduate School Program (KAIST)). This work was mainly supported by Samsung Research Funding & Incubation Center of Samsung Electronics under Project Number SRFCIT1902-06.

References

Appendix A Detailed description of the MAXSAT problem

CNF formula. A conjunctive normal form (CNF) formula is a conjunction of clauses, where each clause is a disjunction (OR) of propositional variables or their negations. For instance, the following is a CNF formula over two propositional variables x1 and x2:

(x1 ∨ x2) ∧ (x1 ∨ ¬x2) ∧ (¬x1 ∨ x2) ∧ (¬x1 ∨ ¬x2)    (1)

MAXSAT problem.

The MAXSAT problem aims to find values of the propositional variables that maximize the number of satisfied clauses of a given CNF formula. One may interpret the MAXSAT problem as a combinatorial optimization problem. Specifically, letting v1, …, vn denote the propositional variables and considering a CNF formula with m clauses, where l_ij is the j-th literal of the i-th clause, the problem is:

maximize, over assignments of v1, …, vn, the number of clauses i for which at least one literal l_ij evaluates to true.    (2)

We note that there may not exist any assignment of the propositional variables that satisfies all clauses of a CNF formula. For instance, consider the CNF formula (1) with propositional variables x1 and x2: no assignment of the two variables makes the formula satisfiable, while there exist assignments that simultaneously satisfy three of its four clauses.
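This behaviour can be checked by brute force. The sketch below uses the classic two-variable formula (x1 ∨ x2) ∧ (x1 ∨ ¬x2) ∧ (¬x1 ∨ x2) ∧ (¬x1 ∨ ¬x2), which is unsatisfiable yet three-fourths satisfiable as described above:

```python
# Brute-force MAXSAT over all assignments of two propositional variables.
from itertools import product

# Each clause is a list of (variable_index, polarity) literals; polarity True
# means the variable appears positively, False means it appears negated.
clauses = [[(0, True), (1, True)],    # ( x1 v  x2)
           [(0, True), (1, False)],   # ( x1 v ~x2)
           [(0, False), (1, True)],   # (~x1 v  x2)
           [(0, False), (1, False)]]  # (~x1 v ~x2)

def num_satisfied(assignment, clauses):
    """Count clauses with at least one true literal under the assignment."""
    return sum(any(assignment[i] == pol for i, pol in clause)
               for clause in clauses)

best = max(num_satisfied(v, clauses)
           for v in product([False, True], repeat=2))
```

Enumerating all four assignments shows each satisfies exactly three of the four clauses, so the MAXSAT optimum is 3 while the formula itself is unsatisfiable.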

Appendix B Detailed description of the reasoning layer

As the MAXSAT problem in Appendix A is a combinatorial optimization problem and hence non-differentiable, several approaches (wang2018low; goemans1995improved) relax it into a continuous optimization problem. Specifically, they convert the MAXSAT problem into a semidefinite program (SDP). In this relaxation, the MAXSAT problem in (2) is formulated as the following optimization problem:

(3)

where the optimization variables are unit-vector relaxations of the propositional variables, and the clauses are encoded by a real-valued coefficient matrix. The solution of the original MAXSAT problem can be recovered from the SDP solution in a probabilistic manner via randomized rounding against a random hyperplane (goemans1995improved; wang2019satnet). Moreover, barvinok1995problems and pataki1998rank show that the optimal solution of the original problem can be recovered from the relaxed SDP when the relaxation dimension is sufficiently large, and wang2017mixing proves that coordinate descent updates for this SDP converge to an optimal fixed point.

With this continuous relaxation of the MAXSAT problem, wang2019satnet proposes SATNet to bridge the relaxed problem and deep neural networks. Specifically, SATNet regards the clause coefficients in (3) as the layer weights of a neural network: it defines a differentiable forward operation that solves (3) given the current weights, and a backward operation that learns a relaxed logical formula from a given dataset by optimizing those weights.
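The randomized rounding step can be sketched in numpy as follows. This is a hedged toy with made-up dimensions, illustrating only the Goemans-Williamson-style hyperplane rounding, not SATNet's implementation:

```python
# Round unit-vector SDP solutions back to a boolean assignment:
# each variable's vector v_i maps to sign(v_i . r) for a random unit vector r.
import numpy as np

rng = np.random.default_rng(0)
k, n = 3, 5                                     # relaxation dim, num variables
V = rng.normal(size=(k, n))
V /= np.linalg.norm(V, axis=0, keepdims=True)   # columns become unit vectors

r = rng.normal(size=k)
r /= np.linalg.norm(r)                          # random unit "hyperplane" normal

assignment = np.sign(V.T @ r)                   # entries in {-1, +1}
```

Variables whose vectors fall on the same side of the random hyperplane receive the same truth value, which is what makes the rounding preserve pairwise angles in expectation.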

Appendix C Detailed description of the optimization scheme

We first note that the input of the reasoning layer is non-differentiable, as it is produced by an argmin operator. Consequently, optimizing the reasoning loss affects the reasoning layer’s parameters but not the other parameters, e.g., the codebook. To solve this problem, we propose to use a relaxed version of the propositional embeddings, defined as follows:

Moreover, we also note that no gradient flows through the quantization that produces the latents from each image, again due to the argmin operator. To compensate for this, we simply use the straight-through estimator (bengio2013estimating), i.e., we copy the gradients of the decoder input to the encoder output.
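A small numpy illustration of one plausible form of such a relaxation (our assumption, not the paper's exact definition): replacing the hard argmin over codebook distances with a softmax over negative distances yields a differentiable surrogate that approaches the one-hot embedding as the temperature goes to zero.

```python
# Softmax relaxation of the hard argmin used in quantization.
import numpy as np

rng = np.random.default_rng(0)
K, D = 8, 4
codebook = rng.normal(size=(K, D))
z = rng.normal(size=D)                 # one encoder feature vector

d = ((z - codebook) ** 2).sum(-1)      # squared distances to codebook entries
tau = 1.0                              # temperature; tau -> 0 recovers argmin
soft = np.exp(-d / tau) / np.exp(-d / tau).sum()   # differentiable surrogate

hard = np.eye(K)[d.argmin()]           # the original non-differentiable one-hot
```

Because the softmax is monotone in the negative distances, the surrogate's largest entry always coincides with the hard argmin index.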

Hyperparameters. LoGe contains the following hyperparameters: the size of the codebook, the size of the spatial features, the size of the reasoning layer (see Appendix B), and the coefficient in the abstraction loss. In all experiments, we use a single universal hyperparameter setup.

Appendix D Details of the RAVEN/i-RAVEN dataset

RAVEN. The RAVEN dataset (zhang2019raven) is a synthetic dataset for evaluating the abstract reasoning ability of machines, where each problem follows the Raven’s Progressive Matrices (RPM) format. The dataset comprises 7 problem types: Center-Single (Center), Left-Right (L-R), Up-Down (U-D), Out-InCenter (O-IC), Out-InGrid (O-ID), 2x2Grid, and 3x3Grid, where each configuration contains 10000 problems. We consider 4 of the 7 configurations: Center, L-R, U-D, and O-IC. Each image has five attributes: number of objects, position, shape, size, and color. The dataset contains 4 rules in total: an attribute is either constant, progressive, arithmetic, or distributed across each row of the problem. Figure 1 illustrates the Center configuration of the RAVEN dataset, and more illustrations are provided in Figure 3.

i-RAVEN. The i-RAVEN dataset (hu2020hierarchical) is a modified version of the RAVEN dataset with a different rule for generating the list of candidates in each problem. Specifically, hu2020hierarchical finds a shortcut bias in the RAVEN candidates: one can find the answer from the candidates alone, without accessing the context images of the problem. Accordingly, they propose a new version of the dataset in which this bias is removed from the candidate set, to better measure the reasoning ability of discriminative abstract reasoning frameworks. Moreover, they find that the accuracy of existing methods drops significantly once this shortcut bias is removed.

Appendix E Detailed description of baselines

Baselines for qualitative results. The neural architecture of LoGe is composed of a vector-quantized variational autoencoder (VQ-VAE) (van2017neural) and SATNet (wang2019satnet), chosen to employ a propositional logical prior for solving abstract reasoning problems. To verify the effectiveness of this prior, we qualitatively compare the generated answer images with those from widely used neural architectures without this assumption. As generative neural architectures for solving reasoning problems are under-explored, we compare against architecture variants in which VQ-VAE and SATNet are substituted with different neural architectures. Specifically, we compare combinations of different encoders (autoencoder and VQ-VAE) and reasoning networks (an attention layer, and 2-layer convolutional neural networks (CNNs) with kernel sizes 3 and 1).

Baselines for quantitative results. In the rest of this section, we briefly explain previous approaches to solving abstract reasoning problems with deep neural networks.


  • LSTM (zhang2019raven) utilizes an LSTM to validate the inefficiency of conventional deep neural network architectures for solving reasoning problems.

  • WReN (santoro2018measuring) proposes to solve the abstract reasoning problem via a relation network (NIPS2017_e6acf4b0).

  • LEN (zheng2019abstract) proposes a variant of the relation network, where the input of the network is a triplet of images rather than a pair. Furthermore, they empirically verify that performance can be further boosted with curriculum learning based on a reinforcement learning framework.

  • CoPINet (zhang2019learning) suggests a contrastive learning algorithm to learn underlying rules from given images.

  • SRAN (hu2020hierarchical) designs a hierarchical neural network framework that simultaneously considers images in the problem individually and also at the row and column level.

Appendix F More illustration of generated results

In this section, we provide additional comparisons of generated answers for different configurations of the RAVEN dataset.

Figure 4: Comparison of generated and ground truth image on Center configuration in RAVEN dataset among different architecture variants. (a)-(c): autoencoder with an attention layer and two convolutional networks with different kernel sizes as the reasoning network. We set the kernel size of 3 and 1 for (b) and (c), respectively. (d)-(e): VQ-VAE with an attention layer and a convolutional network with a kernel size of 3. Ours: our LoGe framework, which is composed of VQ-VAE and SATNet. GT: ground-truth answer image.
Figure 5: Comparison of generated and ground truth image on U-D configuration in RAVEN dataset among different architecture variants. (a)-(c): autoencoder with an attention layer and two convolutional networks with different kernel sizes as the reasoning network. We set the kernel size of 3 and 1 for (b) and (c), respectively. (d)-(e): VQ-VAE with an attention layer and a convolutional network with a kernel size of 3. Ours: our LoGe framework, which is composed of VQ-VAE and SATNet. GT: ground-truth answer image.
Figure 6: Comparison of generated and ground truth image on L-R configuration in RAVEN dataset among different architecture variants. (a)-(c): autoencoder with an attention layer and two convolutional networks with different kernel sizes as the reasoning network. We set the kernel size of 3 and 1 for (b) and (c), respectively. (d)-(e): VQ-VAE with an attention layer and a convolutional network with a kernel size of 3. Ours: our LoGe framework, which is composed of VQ-VAE and SATNet. GT: ground-truth answer image.
Figure 7: Comparison of generated and ground truth image on O-IC configuration in RAVEN dataset among different architecture variants. (a)-(c): autoencoder with an attention layer and two convolutional networks with different kernel sizes as the reasoning network. We set the kernel size of 3 and 1 for (b) and (c), respectively. (d)-(e): VQ-VAE with an attention layer and a convolutional network with a kernel size of 3. Ours: our LoGe framework, which is composed of VQ-VAE and SATNet. GT: ground-truth answer image.