
Natural language understanding for logical games

by Adrian Groza, et al.

We developed a system able to automatically solve logical puzzles in natural language. Our solution is composed of a parser and an inference module. The parser translates the text into first order logic (FOL), while the MACE4 model finder is used to compute the models of the given FOL theory. We also empower our software agent with the capability to provide Yes/No answers to natural language questions related to each puzzle. Moreover, in line with Explainable Artificial Intelligence (XAI), the agent can back its answer, providing a graphical representation of the proof. The advantage of using reasoning for Natural Language Understanding (NLU) instead of machine learning is that the user can obtain an explanation of the reasoning chain. We illustrate how the system performs on various types of natural language puzzles, including 382 knights and knaves puzzles. These features, together with the overall performance rate of 80.89%, make the proposed solution an improvement upon similar solvers for natural language understanding in the puzzles domain.



I Introduction

Here is a puzzle for you: There are three friends sitting on the couch in Central Perk: Rachel, Ross, and Monica. Monica is looking at Ross. Ross is looking at Rachel. Monica is married; Rachel is not. Is a married person looking at an unmarried person? [8] (Figure 1). The question may seem difficult since we do not know whether Ross is married or not. A fan of the “Friends” series would know that even Ross has some difficulty figuring out whether he is married or not.

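Yet, such puzzles are easy to solve for theorem provers, given the proper formalisation. Take for instance the given facts in first order logic (FOL):

$looks(monica, ross) \wedge looks(ross, rachel) \wedge married(monica) \wedge \neg married(rachel)$

The theorem to prove:

$\exists x\, \exists y\, (looks(x, y) \wedge married(x) \wedge \neg married(y))$

is easily demonstrated by resolution-based theorem provers. The challenge that remains is how to translate the text of the puzzle from natural language into first order logic.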
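The difficulty for a human is the case split on the unknown fact $married(ross)$, which a prover handles mechanically. As a minimal sketch (plain Python standing in for the resolution prover; the predicate names `looks` and `married` are our illustrative choice), we can complete the unknown fact both ways and check that the theorem holds in each case:

```python
# looks(x, y) facts stated in the puzzle text.
LOOKS = {("monica", "ross"), ("ross", "rachel")}


def theorem_holds(married: dict[str, bool]) -> bool:
    """Exists x, y: looks(x, y) & married(x) & -married(y)."""
    return any(married[x] and not married[y] for x, y in LOOKS)


def check_all_cases() -> list[tuple[bool, bool]]:
    """Complete the unknown fact married(ross) both ways and test the theorem."""
    results = []
    for ross_married in (True, False):
        married = {"monica": True, "rachel": False, "ross": ross_married}
        results.append((ross_married, theorem_holds(married)))
    return results


print(check_all_cases())  # [(True, True), (False, True)]
```

If Ross is married, he is the married person looking at Rachel; if he is not, Monica is the married person looking at him. Either way the answer is yes.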

Building a system that understands natural language as well as a human does is considered a “too hard” AI problem [17]; in puzzles, in particular, an exact understanding of all clues is necessary to figure out the solution. Barrière views language understanding as a transformation of text into a deeper representation, one on which reasoning can occur [1]. Along these lines, we argue that solving logical puzzles is a powerful and relevant test of natural language understanding by machines.

For a software agent, solving such puzzles raises two main challenges: i) translating the text from natural language into some logical representation; and ii) reasoning on that formalisation to infer the solution. Also, in most cases, background knowledge is required to solve the puzzle (e.g. the married relation is symmetric, $\forall x\, \forall y\, (married(x, y) \rightarrow married(y, x))$, and functional, $\forall x\, \forall y\, \forall z\, (married(x, y) \wedge married(x, z) \rightarrow y = z)$). What distinguishes puzzles from classical tasks in natural language understanding is a convenient property: each piece of information provided in the text is required. Hence, a good puzzle contains no useless text. Unlike tasks in the knowledge representation domain, which involve a lot of knowledge (e.g. large ontologies, big data) and little reasoning, puzzles require knowledge and reasoning in similar proportions.




Fig. 1: A warm-up puzzle: Is a married person looking at an unmarried one?

We present a solver for logical puzzles in natural language. We rely on grammar rules and lexical resources such as WordNet [13]. The text is automatically translated into FOL using NLTK [2] and then given to the theorem prover Prover9 and the model finder Mace4 [11].

II System overview

The input of the system is the puzzle to be solved, in natural language. The solver¹ follows four steps (Figure 2). First, we build the context-free grammar, which contains the lexicon and the grammar rules. Second, the text is translated into FOL. Third, we add the background knowledge for the given domain (i.e. knights and knaves puzzles). Fourth, a model builder (in our case, Mace4) is called to compute the solutions.

¹ The system is available at - Link

Fig. 2: The four modules of the puzzle solver

II-A Building the grammar

Consider the following small puzzle: Diana is taller than Maria. Ana is taller than Diana. Who is the shortest? The grammar for such puzzles contains both lexical and grammar rules:

Lexical rules hold the linguistic categories for each puzzle (Listing 1). A custom encoding is defined for every category (e.g. N for nouns, VT for transitive verbs, IV for intransitive verbs). Every word has to belong to such a category and can have one or more features [16]. Two of the most important features used by the solver are SEM, which specifies the semantics of the word, and NUM, which defines the number of a noun (i.e. it takes the value sg for singular or pl for plural). Thus, an invalid sentence such as “These dog barks” cannot be parsed. To assess the correctness of sentences, we also included features like PERS to distinguish grammatical person, or TENSE for verb tenses.

The most important feature is the one defining the semantics of the words. The values of the SEM feature are computed based on λ-calculus. For example, the expression $\lambda x.(walk(x) \wedge chew\_gum(x))$ represents the entities that both walk and chew gum. If we apply this expression to a person named Gerald, we obtain the expression $\lambda x.(walk(x) \wedge chew\_gum(x))(gerald)$, which is equivalent (by β-reduction) to $walk(gerald) \wedge chew\_gum(gerald)$, meaning that Gerald walks and chews gum. The lambda operator is essential in connecting different words.
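To make the idea concrete outside NLTK, a λ-term can be mimicked by a Python function that builds the FOL string, with function application playing the role of β-reduction. This is only an illustrative sketch (the function names are hypothetical); the system itself relies on NLTK's logic module. The second function mirrors the type-raised proper names of Listing 1, where a name such as Gerald denotes $\lambda P.P(gerald)$:

```python
# SEM of "walks and chews gum": \x.(walk(x) & chew_gum(x)).
# A Python function plays the role of the lambda term; calling it is beta-reduction.
def walk_and_chew_gum(x: str) -> str:
    return f"(walk({x}) & chew_gum({x}))"


# SEM of a proper name, type-raised as in Listing 1: \P.P(gerald).
def gerald(P):
    return P("gerald")


print(walk_and_chew_gum("gerald"))  # (walk(gerald) & chew_gum(gerald))
print(gerald(walk_and_chew_gum))    # (walk(gerald) & chew_gum(gerald))
```

Both calls yield the same formula: applying the property to the name, or the type-raised name to the property, gives the same result.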

PropN[-LOC,NUM=sg,SEM=<\P.P(kevin)>] -> 'Kevin'
PropN[-LOC,NUM=sg,SEM=<\P.P(diana)>] -> 'Diana'
PropN[-LOC,NUM=sg,SEM=<\P.P(maria)>] -> 'Maria'
Det[NUM=sg,SEM=<\P Q.exists x.((P(x) & Q(x)) & all y.(P(y) -> (x = y)))>] -> 'the' | 'The'
InterPron[-LOC, NUM=sg, SEM=<\P.P(x)>] -> 'who' | 'Who'
Aux[+COP,NUM=sg,SEM=<\P x.P(x)>,tns=pres] -> 'is'
Aux[+COP,NUM=pl,SEM=<\P x.P(x)>,tns=pres] -> 'are'
Adj[GRD=rel, SEM=<\X x.X(\y.(taller(x,y)))>] -> 'taller'
Adj[GRD=abs, SEM=<\x.(shortest(x))>] -> 'shortest'
P[+than] -> 'than'
Listing 1: Lexical rules

Grammar rules define, in a recursive way, how words from different parts of speech are connected (Listing 2). The rules in Listings 1 and 2 were manually crafted for the application domain. Sequences of words are grouped into chunks. An example of such a chunk is the noun phrase chunk, exemplified by expressions like “the man” or “the red ball”. Unlike the words in lexical rules, which can be considered terminals, the chunks are nonterminals. A grammar is context-free if the left-hand side of each rule is a single nonterminal, whereas the right-hand side can be any combination of terminals and nonterminals.

Important for consistency and generalization is the use of variables as feature values. Thus, using a variable such as ?n, we can define a rule like

S -> NP[NUM=?n] VP[NUM=?n]

which says that the NUM feature of both NP and VP must have the same value, so we can control the correctness of a sentence. Note that the context-free grammar in Listing 2 can parse sentences such as “Kevin is taller than Diana” or “Maria is the shortest”, but not “Maria are the shortest”.
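The agreement check can be sketched as matching flat feature patterns with shared variables. This is a deliberately simplified stand-in for NLTK's feature unification (which also handles nested structures and features absent from one side); the helper names `match` and `sentence_agrees` are hypothetical:

```python
from typing import Optional

Features = dict[str, str]


def match(pattern: Features, actual: Features,
          bindings: dict[str, str]) -> Optional[dict[str, str]]:
    """Match a rule's feature pattern against a constituent's features.

    Values starting with '?' are variables (like ?n in the grammar); a variable
    must bind to the same value everywhere it occurs in the rule.
    """
    result = dict(bindings)
    for feature, value in pattern.items():
        if feature not in actual:
            return None
        if value.startswith("?"):
            # Bind on first use; afterwards the bound value must agree.
            if result.setdefault(value, actual[feature]) != actual[feature]:
                return None
        elif value != actual[feature]:
            return None
    return result


def sentence_agrees(np: Features, vp: Features) -> bool:
    """Check S -> NP[NUM=?n] VP[NUM=?n]: subject and verb phrase share NUM."""
    bindings = match({"NUM": "?n"}, np, {})
    return bindings is not None and match({"NUM": "?n"}, vp, bindings) is not None


print(sentence_agrees({"NUM": "sg"}, {"NUM": "sg"}))  # True:  "Maria is the shortest"
print(sentence_agrees({"NUM": "sg"}, {"NUM": "pl"}))  # False: "Maria are the shortest"
```

Here the VP inherits NUM from the auxiliary ("is" is sg, "are" is pl), so the variable ?n bound by the subject rejects the plural verb phrase.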

S[SEM = <?subj(?vp)>] -> NP[NUM=?n,SEM=?subj] VP[NUM=?n,SEM=?vp]
NP[LOC=?l,NUM=?n,SEM=?np] -> PropN[LOC=?l,NUM=?n,SEM=?np] | InterPron[LOC=?l,NUM=?n,SEM=?np]
VP[NUM=?n,SEM=<?v(?prd)>] -> AuxP[+COP,NUM=?n,SEM=?v] Pred[SEM=?prd]
Pred[SEM=?prd] -> AdjP[GRD=?rel, SEM=?prd] | AdjP[GRD=?abs, SEM=?prd]
AdjP[GRD=?rel, SEM=<?adj(?pp)>] -> Adj[GRD=?rel, SEM=?adj] PP[+than,SEM=?pp]
AdjP[GRD=abs, SEM=?adj] -> Det[NUM=sg] Adj[GRD=abs, SEM=?adj]
PP[+than, SEM=?np] -> P[+than] NP[SEM=?np]
AuxP[COP=?c,NUM=?n,SEM=?aux] -> Aux[COP=?c,NUM=?n,SEM=?aux]
Listing 2: Grammar rules
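How the SEM values of Listings 1 and 2 combine can be traced by hand for “Kevin is taller than Diana”. The sketch below encodes each SEM value as a Python function (an illustrative stand-in for NLTK's λ-terms; all function names are hypothetical) and applies them in the order dictated by the grammar rules:

```python
from typing import Callable

Prop = Callable[[str], str]      # property: individual -> FOL formula
NPSem = Callable[[Prop], str]    # type-raised noun phrase: property -> formula


def proper_name(name: str) -> NPSem:
    # PropN, e.g. Kevin: \P.P(kevin)
    return lambda P: P(name)


def taller(np_sem: NPSem) -> Prop:
    # Adj 'taller': \X x.X(\y.(taller(x,y)))
    return lambda x: np_sem(lambda y: f"taller({x},{y})")


def copula(pred: Prop) -> Prop:
    # Aux 'is': \P x.P(x)
    return lambda x: pred(x)


def sentence(subj: NPSem, vp: Prop) -> str:
    # S -> NP VP with SEM = <?subj(?vp)>
    return subj(vp)


kevin, diana = proper_name("kevin"), proper_name("diana")
pp = diana              # PP -> P NP: the SEM of the NP is passed up
adjp = taller(pp)       # AdjP -> Adj PP: SEM = <?adj(?pp)>
vp = copula(adjp)       # VP -> AuxP Pred: SEM = <?v(?prd)>; Pred passes AdjP up
print(sentence(kevin, vp))  # taller(kevin,diana)
```

The preposition “than” contributes no semantics of its own; it only licenses the PP, whose SEM is that of the embedded noun phrase.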

II-B Translating to First-Order Logic

We start the translation mechanism with the coreference resolution, followed by parsing based on the grammars previously built.

Coreference resolution aims to connect different expressions in the text with the entities they refer to (e.g. the pronoun he with the person it refers to). A failure in coreference resolution leads to a failure in solving the puzzle, as some information would be omitted or misinterpreted. For this task we rely on the Neuralcoref module, based on the spaCy parser [15].

Next, we obtain a tree representation for every sentence in the updated puzzle. The FOL representation of every sentence is obtained from the previously defined SEM feature, which is stored in the parse trees. The lexicon is automatically extended with the synonyms offered by the WordNet lexical database, so that a word with no definition in the grammar file, but with a synonym defined there, can still be parsed. The “unseen” word takes all the attributes of the synonym found in the grammar file, so there is no difference between the parsing result of a sentence containing unseen words and that of one containing only predefined words.
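The fallback for unseen words can be sketched as a lexicon lookup that, on a miss, tries the word's synonyms and copies the entry of the first one defined in the grammar. In this sketch the lexicon entries and the `SYNONYMS` table are hypothetical stand-ins: the real system reads the grammar file and queries WordNet.

```python
# Hypothetical lexicon entries (category + SEM predicate) from a grammar file.
LEXICON = {
    "says": {"cat": "VT", "sem": "says"},
    "knave": {"cat": "N", "sem": "knave"},
}
# Hypothetical stand-in for WordNet synonym sets.
SYNONYMS = {
    "claims": {"says", "asserts", "states"},
    "rogue": {"knave", "scoundrel"},
}


def lookup(word, lexicon=LEXICON, synonyms=SYNONYMS):
    """Return the lexical entry for word, falling back to a defined synonym."""
    if word in lexicon:
        return lexicon[word]
    # Sorted for a deterministic choice when several synonyms are defined.
    for synonym in sorted(synonyms.get(word, ())):
        if synonym in lexicon:
            return dict(lexicon[synonym])  # the unseen word inherits all attributes
    return None


print(lookup("claims"))   # {'cat': 'VT', 'sem': 'says'}
print(lookup("unicorn"))  # None
```

Because “claims” inherits the entry of “says”, both words yield the same parse and the same FOL predicate.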

As an example, consider the following puzzle:

Puzzle 1

On the island where each inhabitant is either a knave or a knight, knights always tell the truth while knaves always lie. You meet two inhabitants: Marge and Homer. Marge says that Homer and she are both knights or both knaves. Homer claims that Marge and he are the same. Can you determine who is a knight and who is a knave? (Figure 3).



Fig. 3: Example of a puzzle with knights and knaves
Fig. 4: Parse tree for “You meet two inhabitants:…”

Figure 4 shows the parse tree for the sentence “You meet two inhabitants: Marge and Homer”. Note that the semantics of some words, like the pronoun “You” or the transitive verb “meet”, are ignored when the FOL representation is obtained. Using the grammar rules, this sentence is translated into FOL as three clauses:


The sentence “Marge says that Homer and Marge are both knights or both knaves” is formalised with:


The clue “Homer claims that Marge and Homer are the same” is formalised with:


One feature of the system, in contrast to other existing solvers, is the ability to automatically identify the named entities in the puzzle, such as the person names. Thus, users do not need to specify themselves the named entities in the puzzle they want to solve. Named entities are recognised with the spaCy library, using an existing trained pipeline that predicts a label for each discovered entity. The task in our example is to retrieve the persons in the puzzle. Since the unique name assumption does not hold by default in FOL, we explicitly state the distinctions between names (e.g. $marge \neq homer$).
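Once the person names are known, the domain size and the distinctness axioms follow mechanically. A minimal sketch, assuming the names have already been extracted by the NER step and assuming Mace4's `assign(domain_size, n).` directive for fixing the domain size; the helper `entity_axioms` is hypothetical and the `inhabitant` predicate follows Listing 3:

```python
from itertools import combinations


def entity_axioms(person_names: list[str]) -> list[str]:
    """Turn detected person names into Mace4/Prover9 input lines (sketch)."""
    constants = [name.lower() for name in person_names]
    # One domain element per detected person.
    lines = [f"assign(domain_size, {len(constants)})."]
    lines += [f"inhabitant({c})." for c in constants]
    # No unique name assumption in FOL: state pairwise distinctness explicitly.
    lines += [f"{a} != {b}." for a, b in combinations(constants, 2)]
    return lines


for line in entity_axioms(["Marge", "Homer"]):
    print(line)
```

For n persons this produces n(n-1)/2 distinctness axioms, e.g. 36 for the nine inhabitants of Puzzle 4.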

II-C Adding background knowledge

Figuring out the additional knowledge needed to reach a solution is an easy task for a human agent, but a challenging one for a software agent [7].

For each word in the puzzle, we extract its set of synonyms from WordNet. Then, we check whether any synonym from the set also occurs in the text. If so, we formalise in FOL the equivalence between the two words (e.g. between “says” and “claims”). When extracting the synonyms, we also consider the part of speech (POS). For example, the word “state” can be either a noun (e.g. country) or a verb (as a synonym for “say”). Both are valid synonyms, but in different contexts. Therefore, we determine the POS in the current context and consider only the synonyms with the same tag. To avoid building assumptions that refer to the same word in different forms (e.g. “says” and “say”), lemmatization is applied.
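The procedure above (lemmatize, keep the POS, link only co-occurring synonyms with the same tag) can be sketched as follows. The `SYNONYMS` and `LEMMAS` tables are hypothetical stand-ins for WordNet and a lemmatizer, the POS tags follow the universal tag set (VERB, NOUN, PROPN) used by spaCy, and the binary predicate shape `say(x,y)` (speaker, message) is an assumption made only for illustration:

```python
# Hypothetical stand-in for WordNet: (lemma, POS) -> synonymous lemmas.
SYNONYMS = {
    ("say", "VERB"): {"claim", "state", "tell"},
    ("claim", "VERB"): {"say", "state"},
    ("state", "NOUN"): {"country", "nation"},
}
# Hypothetical lemma table; a real pipeline would use a lemmatizer instead.
LEMMAS = {"says": "say", "said": "say", "claims": "claim", "states": "state"}


def synonym_axioms(tagged_tokens: list[tuple[str, str]]) -> list[str]:
    """Equivalence axioms for synonyms that co-occur in the text with the same POS."""
    present = {(LEMMAS.get(w.lower(), w.lower()), pos) for w, pos in tagged_tokens}
    axioms = set()
    for lemma, pos in present:
        for synonym in SYNONYMS.get((lemma, pos), set()):
            # Same POS required: "state" as a noun is not linked to the verb "say".
            if synonym != lemma and (synonym, pos) in present:
                first, second = sorted((lemma, synonym))
                # Assumed binary encoding: predicate(speaker, message).
                axioms.add(f"all x all y ({first}(x,y) <-> {second}(x,y)).")
    return sorted(axioms)


tokens = [("Sue", "PROPN"), ("claims", "VERB"), ("Alice", "PROPN"),
          ("says", "VERB"), ("state", "NOUN")]
print(synonym_axioms(tokens))  # ['all x all y (claim(x,y) <-> say(x,y)).']
```

Lemmatization ensures that “says” and “said” map to the same lemma, so no trivial self-equivalence is generated, and sorting each pair avoids emitting the same axiom twice.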

There is, however, some contextual meaning that depends on the puzzle domain. For instance, in knights and knaves puzzles, the sentence “Alice and Sam are the same” requires the proper formalisation of the predicate “same”:

$\forall x\, \forall y\, (same(x, y) \leftrightarrow (knight(x) \wedge knight(y)) \vee (knave(x) \wedge knave(y)))$

The domain knowledge for knights and knaves puzzles also states that every inhabitant is either a knight or a knave:

$\forall x\, (inhabitant(x) \rightarrow knight(x) \vee knave(x))$

This small body of background knowledge for the knights and knaves domain was also manually crafted.

II-D Computing the solution

In the final step, the Mace4 model builder computes the solution: it uses the FOL representation of the puzzle to compute a model of the given FOL theory. For example, Mace4's solution for Puzzle 1 appears in Figure 5.



Fig. 5: Computing models with MACE4: both Homer and Marge are knights

Using a model finder to answer the puzzle is not necessarily an obvious choice, since the constraints could have more than one model. However, a good puzzle should have sufficient constraints to ensure that the answer to the question does not depend on the choice of model.
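On a domain as small as Puzzle 1, what the model finder does can be imitated by brute force: enumerate every knight/knave assignment and keep those where each speaker's claim is true exactly when the speaker is a knight. This is a toy illustration, not how Mace4 works internally (Mace4 searches far more cleverly); encoding an inhabitant as a single boolean (True = knight) builds in the axiom that everyone is exactly one of the two, and the names below are hypothetical:

```python
from itertools import product


def knights_knaves_models(people, claims):
    """Enumerate all knight/knave assignments (True = knight) satisfying the puzzle.

    claims maps each speaker to a function of the assignment returning the truth
    value of what they said; knights tell the truth and knaves lie, so
    knight(speaker) must equal the truth value of the claim.
    """
    models = []
    for values in product((True, False), repeat=len(people)):
        knight = dict(zip(people, values))
        if all(knight[s] == claim(knight) for s, claim in claims.items()):
            models.append(knight)
    return models


def same(k, x, y):
    # Background knowledge: same(x,y) <-> both knights or both knaves.
    return k[x] == k[y]


# Puzzle 1: Marge and Homer each claim that the two of them are the same.
puzzle1 = {
    "marge": lambda k: same(k, "homer", "marge"),
    "homer": lambda k: same(k, "marge", "homer"),
}
print(knights_knaves_models(["marge", "homer"], puzzle1))
# [{'marge': True, 'homer': True}]
```

A single surviving model signals a well-posed puzzle, in line with the remark above; several models would mean the answer may depend on which one the model finder returns.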

II-E Question answering and explainability

We equip our solver with two features: i) question answering (QA), the capacity to answer various natural language questions about the puzzle, and ii) explainability (XAI), the capability to provide a graphical proof for each answer. For these features we use Prover9, which provides the Yes/No answers and the proofs that are then represented graphically.

To illustrate the question answering capability, consider the following puzzle:

Puzzle 2

On the island where each inhabitant is either a knave or a knight, knights always tell the truth while knaves always lie. You meet two inhabitants: Sue and Alice. Sue claims that Alice is a knave. Alice says that she and Sue are knights.

First, the solver uses named entity recognition to identify the two persons, Sue and Alice. Hence the domain size is set to two elements. Then coreference resolution figures out that the pronoun “she” refers to Alice in the sentence “Alice says that she and Sue are knights.” WordNet is used to identify that “claims” and “says” are synonyms, and this piece of knowledge is added to the FOL theory. The FOL theory built using the grammar file, the domain size, and the background knowledge is given to MACE4, which finds valid instantiations for each element in the theory. The user can then ask questions like:

Is Sue a knight?
Does Alice lie?
Is Alice a knave?
Are Alice and Sue different?
Are Alice and Sue the same?
Are Alice and Sue both knights?

The same grammar file is used to parse the queries and to translate them into theorems in first order logic. Each theorem is given to Prover9, which provides a Yes/No answer.
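Prover9 answers by refutation, but on a finite, fully named domain the same verdict can be illustrated semantically: a query is entailed exactly when it holds in every model of the puzzle. The sketch below is only such an illustration for Puzzle 2 (the helpers `models` and `answer` are hypothetical, with queries hand-translated into Python predicates); it is not how the system computes its answers:

```python
from itertools import product


def models(people, claims):
    """Knight/knave assignments (True = knight) where knight(x) <-> claim of x."""
    found = []
    for values in product((True, False), repeat=len(people)):
        knight = dict(zip(people, values))
        if all(knight[s] == claim(knight) for s, claim in claims.items()):
            found.append(knight)
    return found


def answer(people, claims, query):
    """'Yes' if the query holds in every model, 'No' if in none, else 'Unknown'."""
    verdicts = [query(k) for k in models(people, claims)]
    if not verdicts:
        return "Unknown"  # no model at all: the puzzle text is inconsistent
    if all(verdicts):
        return "Yes"
    if not any(verdicts):
        return "No"
    return "Unknown"


# Puzzle 2: Sue claims Alice is a knave; Alice says she and Sue are knights.
people = ["sue", "alice"]
claims = {
    "sue": lambda k: not k["alice"],
    "alice": lambda k: k["alice"] and k["sue"],
}
print(answer(people, claims, lambda k: k["sue"]))                # Is Sue a knight? Yes
print(answer(people, claims, lambda k: not k["alice"]))          # Does Alice lie? Yes
print(answer(people, claims, lambda k: k["alice"] == k["sue"]))  # Same? No
```

The only model makes Sue a knight and Alice a knave, so “Are Alice and Sue both knights?” would likewise be answered No.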

Figure 6 shows the proof for the theorem “Marge and Mel are knights”. Some assumptions come from the text (line 4) or from the synonymy between words (line 12), while the goal appears in line 14. Some clauses represent disjunctions of literals deduced from assumptions (line 42). Based on resolution, new clauses are inferred (e.g. line 68) until the empty clause is derived, which signals that the theorem is proved.

Fig. 6: Proof for the computed solution

For each computed Yes/No answer, the system outputs a graphical representation of the proof. To illustrate this feature, consider the following puzzle.

Puzzle 3

On the island of knights and knaves, knights always tell the truth, while knaves always lie. You are approached by two people. The first one says: “We are both knaves”.

Consider also the compound query: Is the first inhabitant a knave and the second one a knight? The corresponding FOL theory appears in Listing 3.

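First, the inhabitants are either knights or knaves:

$\forall x\, (inhabitant(x) \rightarrow knight(x) \vee knave(x))$

Second, one cannot be a knight and a knave at the same time:

$\forall x\, ((knight(x) \rightarrow \neg knave(x)) \wedge (knave(x) \rightarrow \neg knight(x)))$

Third, a message said by a knight is always true: $knight(x) \rightarrow m(x)$. Fourth, a message said by a knave is always false: $knave(x) \rightarrow \neg m(x)$. These pieces of knowledge represent the background knowledge for all knights and knaves puzzles. Next, we formalise the current puzzle. We learn that there are two inhabitants, let's say $a$ and $b$: $inhabitant(a) \wedge inhabitant(b)$. We need a domain size of two individuals. The message of inhabitant $a$ is: $m(a) \leftrightarrow knave(a) \wedge knave(b)$.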

formulas(assumptions).
all x (inhabitant(x) -> knight(x) | knave(x)).
all x ((knight(x) -> -knave(x)) & (knave(x) -> -knight(x))).
knight(x) -> m(x).
knave(x) -> -m(x).
inhabitant(a) & inhabitant(b).
m(a) <-> knave(a) & knave(b).
end_of_list.

formulas(goals).
knave(a) & knight(b).
end_of_list.
Listing 3: A FOL theory used to provide Yes/No answers

Given the theory in Listing 3, Prover9 finds the proof in Figure 7 for the goal $knave(a) \wedge knight(b)$. Here, the prover finds a contradiction based on three pieces of knowledge. First, the negated conjecture {12} is equivalent to clause {18}. Second, the agent deduces {17}: $a$ is a knave or the message $m(a)$ is true. Further, based on {15}, the system deduces that $a$ is a knave (clause {22}). Third, the prover infers {20} based on the input clause {5} (knaves are liars) and the given message {3}. Using resolution between {20} and {22}, it derives that $b$ is not a knave (clause {23}). By combining these three pieces of knowledge ({18}, {22}, {23}) through hyper-resolution, the software agent derives a contradiction. Hence, the theorem {6} is proved.



Fig. 7: Proof for clause {6}: $a$ is a knave and $b$ is a knight, based on clause {3}: “We are both knaves”

III Running experiments

We evaluate the solver on a dataset of 382 knights and knaves puzzles². The complexity of these puzzles varies, from puzzles with two individuals (such as Puzzle 1) to puzzles with more text and persons, such as Puzzle 4. As a measure of complexity, the puzzles can be classified by the number of inhabitants: there are 50 puzzles each with two, three, four, five, six, seven and eight inhabitants, and 32 puzzles with nine inhabitants.

² Puzzles are taken from the website

Puzzle 4

“On the island where each inhabitant is either a knave or a knight, knights always tell the truth while knaves always lie. You meet nine inhabitants: Carl, Betty, Ted, Dave, Marge, Alice, Rex, Bob and Sally. Carl claims that Ted would tell you that Alice is a knave. Betty says that at least one of the following is true: that Ted is a knave or that Dave is a knave. Ted says that Rex could say that Betty is a knave. Dave claims that Sally and Alice are different. Marge says that Rex is a knave. Alice claims that Carl is a knave or Sally is a knight. Rex says that he knows that he is a knight and that Bob is a knave. Bob says that he and Rex are both knights or both knaves. Sally tells you that Bob is a knave. Can you determine who is a knight and who is a knave?”

We started by building the context-free grammar file from an analysis of 43 puzzles of different complexities: 20 puzzles with two inhabitants, 5 puzzles with three inhabitants, and 3 puzzles each with four, five, six, seven, eight and nine inhabitants. The remaining 339 unseen puzzles were used to test the grammar.

Component | Tested puzzles | Solved puzzles | Performance
Grammar | 339 | 331 | 97.64%
Named Entity Recognition | 382 | 342 | 89.52%
Coreference Resolution | 382 | 352 | 92.14%
TABLE I: System performance on the knights and knaves puzzle set

In our experiments, we first aimed to check how many of the 339 unseen puzzles can be automatically solved. Second, we aimed to identify the impediments for the puzzles that could not be solved. Hence, we analysed three points of failure: i) the inability of the grammar to parse the text, ii) the failure to identify all the persons in the puzzles, and iii) the failure of the coreference resolution step. Table I reports the experimental results along these three dimensions. In row 1, since 43 puzzles were used to build the grammar, only the remaining 339 puzzles are tested. Named Entity Recognition (row 2) and Coreference Resolution (row 3) were tested on the entire set of 382 puzzles. The conclusions for each metric are:

  1. Grammar: The grammar, manually built by analysing 43 puzzles, manages to parse 331 of the remaining 339 puzzles (97.64%). The 8 unparsed puzzles need more grammar rules to be covered.

  2. Named Entity Recognition: The implementation offered by spaCy failed to find all the person names in some puzzles. For instance, names such as “Peggy” or “Bozo” were not recognised. In other puzzles, although all the person names were found, the trained pipeline made other mistakes (e.g. it identified “Ted and Bob” as a single person instead of “Ted” and “Bob” as different persons).

  3. Coreference Resolution: Two problems were noticed: i) some pronouns could not be bound to any person name; and ii) some pronouns were replaced with the wrong person name. Both cases prevented the system from solving the puzzle. Overall, coreference resolution succeeded on 92.14% of the puzzles.

We also analysed how performance is affected by puzzle complexity, measured here as the number of persons appearing in the text (see Table II).

Number of inhabitants | Puzzles | Solved puzzles | Performance
2 | 50 | 44 | 88%
3 | 50 | 43 | 86%
4 | 50 | 42 | 84%
5 | 50 | 41 | 82%
6 | 50 | 39 | 78%
7 | 50 | 39 | 78%
8 | 50 | 38 | 76%
9 | 32 | 23 | 71.87%
TABLE II: Performance based on puzzle complexity (i.e. number of inhabitants)

We focused here on knights and knaves puzzles. However, the system is extensible: to solve puzzles from a different domain, one has to provide the corresponding grammar file (see Figure 8). The public version of our system provides small grammars for other types of puzzles.

Fig. 8: Running system

Grammar parsing failed on 8 puzzles, named entity recognition on 40, and coreference resolution on 30. Since five puzzles failed both named entity recognition and coreference resolution, a total of 8 + 40 + 30 - 5 = 73 puzzles out of 382 could not be solved. Hence, on the proposed set of puzzles, the overall performance of our solver is 80.89% (309 solved puzzles).
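The overall figure can be checked with a short inclusion-exclusion computation over the counts in Table I, assuming (as the totals imply) that the 8 grammar failures do not overlap with the other two failure types; the result also agrees with the per-complexity counts of Table II:

```python
# Failures per component, derived from Table I.
grammar_failures = 339 - 331   # 8 unparsed puzzles
ner_failures = 382 - 342       # 40 puzzles with missed or wrong person names
coref_failures = 382 - 352     # 30 puzzles with unresolved or wrong pronouns
ner_and_coref = 5              # puzzles counted in both of the last two groups

unsolved = grammar_failures + ner_failures + coref_failures - ner_and_coref
solved = 382 - unsolved
table2_solved = sum([44, 43, 42, 41, 39, 39, 38, 23])  # Table II, 2 to 9 inhabitants

print(unsolved, solved, table2_solved, f"{solved / 382:.2%}")  # 73 309 309 80.89%
```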

IV Discussion and Related work

Solving logical word puzzles is considered a challenging task. Lev et al. have also proposed a solution based on grammar rules, FOL, and model builders [10]. Lev et al. focused on multiple-choice puzzles, so the inference only has to select the correct answer among the given options, not discover it. Milicevic et al. tackled the task using the Link Grammar general-purpose English parser, a semantic translator, and an automated logical analyzer [12]. The Alloy language is used as the formal representation of the puzzle, and its constraint solver computes the solution. The solver is designed around Zebra puzzles and tested on a dataset of 68 puzzles. Here, the user must rephrase ambiguous phrases, which is a disadvantage of using a general parser. Our architecture is similar to the work of Milicevic et al., as both solutions consist of a parser and an inference module. Our parser is based on NLTK and spaCy, while the inference module uses MACE4 to find the solution and Prover9 to produce the proofs that are represented graphically.

Bogaerts et al. have proposed a solver for logic grid puzzles that also makes use of QA and XAI [3]. De Cat et al. employ an extension of FOL [6]. A drawback of this approach is that the named entities (e.g. persons, colors) in the puzzle cannot be detected automatically and must be stated by the user. Also, the synonymy between verbs is detected by a semi-automated process. In contrast, our agent automatically detects synonymy between other parts of speech too, such as nouns.

Jabrayilzade and Tekir have proposed LGP Solver, a DistilBERT-based tool that automatically solves logic grid puzzles [9]. The clues are translated into Prolog and the reported accuracy is 100%. Zebra puzzles have different categories that need to be recognised within the text (e.g. person, name, occupation, color). One assumption of Jabrayilzade and Tekir is that these categories and their instances are clearly given before parsing the text. In contrast, we relaxed this assumption by performing named entity recognition. This task introduces some points of failure (named entity recognition succeeded on 89.52% of the puzzles in our experiments), but it is a mandatory step towards automatic generic solvers. A second assumption of Jabrayilzade and Tekir is that the domain is closed to only five predicates: is, either, all different, pair different, and comparison.

The solver is based on classifying the clues into one of these five predefined predicates, using a feed-forward neural network with a Softmax activation on top. 50 puzzles were used for training and 100 for testing. In our case, we analysed 43 puzzles to identify the recurrent predicates, and we tested the resulting grammar on the remaining 339 puzzles.

Mitra and Baral have also addressed Zebra puzzles, developing the Logicia system [14] on a set of 150 puzzles. The clues are classified using a maximum entropy model based on features like POS tags or dependency trees. The target language is answer set programming, with which 71 out of 100 puzzles have been solved.

Some work has been done on the particular “knights and knaves” puzzles. For instance, Chesani et al. proposed the following workflow: i) understanding the text, ii) identifying modeling (e.g. propositional logic) and solving techniques (e.g. truth tables, resolution), iii) identifying problem components and hidden knowledge (if any), iv) framing the model and solving the problem [4]. A completely different approach, via Logic Algebra, is the work of Ciraulo and Maschio [5]. This method relies on a system of equations for solving a puzzle: every inhabitant is encoded as a different unknown variable, a true proposition is assumed to be equal to 1, knights and knaves receive a binary encoding, and the solution is found by solving the equations using the properties of a Boolean ring. The disadvantage of this proposal is that it cannot solve other types of logical word puzzles, as it focuses only on knights and knaves puzzles and their variations.
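The Boolean-ring idea can be sketched in a few lines. This is our own illustration in the spirit of the approach, not the authors' exact formulation: in the ring $\mathbb{Z}/2$ (knight = 1, knave = 0), negation is $1 + p$, conjunction is $pq$, and a speaker $x$ making statement $S$ yields the equation $x + S = 0$. A system of equations $e_i = 0$ collapses into the single equation $\prod_i (1 + e_i) = 1$, which we solve here by enumeration; the helper names are hypothetical:

```python
from itertools import product


def NOT(p: int) -> int:
    return (1 + p) % 2   # negation in the Boolean ring Z/2


def AND(p: int, q: int) -> int:
    return (p * q) % 2   # conjunction is multiplication


def solve(names, equations):
    """All 0/1 assignments making every expression e_i equal to 0.

    The system is folded into one equation: prod(1 + e_i) == 1.
    """
    solutions = []
    for values in product((0, 1), repeat=len(names)):
        env = dict(zip(names, values))
        combined = 1
        for e in equations:
            combined = AND(combined, NOT(e(env)))
        if combined == 1:
            solutions.append(env)
    return solutions


# Puzzle 3: a says "we are both knaves", so a = NOT(a) AND NOT(b).
puzzle3 = [lambda e: (e["a"] + AND(NOT(e["a"]), NOT(e["b"]))) % 2]
print(solve(["a", "b"], puzzle3))  # [{'a': 0, 'b': 1}]

# Puzzle 1: same(x, y) is 1 + x + y; Marge (m) and Homer (h) both claim it.
puzzle1 = [lambda e: (e["m"] + 1 + e["m"] + e["h"]) % 2,
           lambda e: (e["h"] + 1 + e["m"] + e["h"]) % 2]
print(solve(["m", "h"], puzzle1))  # [{'m': 1, 'h': 1}]
```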

Our system is fairly successful on the benchmark of 382 problems, but this benchmark is limited to a single category of puzzles. In particular, the “knights and knaves” domain requires very little background knowledge: the main axioms are common to all problems, and the only additional knowledge used is the synonymy of some words. We plan to apply the method to other domains by automatically importing background knowledge.

Our approach belongs to the larger domains of Question Answering (QA) and Explainable AI (XAI). QA concerns the ability of a system to answer questions posed by humans in natural language, while XAI explains how the solution was found in a human-understandable way. Regarding QA, our software agent is able to provide Yes/No answers to natural language questions related to each puzzle. Moreover, in line with XAI, the agent can back its answer, providing a graphical representation of the proof.

V Conclusion

We described here a tool to automatically solve logical puzzles, in particular knights and knaves puzzles. The puzzles are given in natural language. The problem is parsed and translated into first-order logic using the Natural Language Toolkit with a hand-crafted lexicon and grammar. The MACE4 model finder constructs a solution. In addition, the system can process natural language yes/no questions by translating them into first-order logic and letting the theorem prover Prover9 find the answer. By combining natural language processing with theorem proving, the system can fully explain its answer in the form of a graphical proof. The evaluation experiments showed that 309 out of 382 problems could be solved; only 43 of them were used to build the grammar, and the rest were completely new to the system.

Our solution is based on manually created grammar rules for a closed domain: knights and knaves puzzles. The lexicon benefits from the WordNet lexical database, while the natural language processing pipeline, including coreference resolution, was developed with the spaCy library. The resulting FOL theory is given to the MACE4 model finder. Since MACE4 works on finite domains, it is important to compute the domain size of each problem. In our case, this is the exact number of persons that need to be instantiated as knights or knaves. Hence, named entity recognition was a necessary task to automatically compute the domain size. We also empower our software agent with the capability to provide Yes/No answers to natural language questions related to each puzzle. Moreover, in line with XAI, the agent can back its answer, providing a graphical representation of the proof. These features, together with the overall performance rate of 80.89%, make the proposed solution an improvement upon similar solvers for natural language understanding in the puzzles domain.


We thank the anonymous reviewers for their valuable comments.


  • [1] C. Barrière (2016) Natural language understanding in a semantic web context. Springer.
  • [2] S. Bird (2006) NLTK: the natural language toolkit. In Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions, pp. 69–72.
  • [3] B. Bogaerts, E. Gamba, and T. Guns (2020) A framework for step-wise explaining how to solve constraint satisfaction problems. arXiv preprint arXiv:2006.06343.
  • [4] F. Chesani, P. Mello, and M. Milano (2017) Solving mathematical puzzles: a challenging competition for AI. AI Magazine 38 (3), pp. 83–96.
  • [5] F. Ciraulo and S. Maschio (2020) Solving knights-and-knaves with one equation. The College Mathematics Journal 51 (2), pp. 82–89.
  • [6] B. De Cat, B. Bogaerts, M. Bruynooghe, G. Janssens, and M. Denecker (2018) Predicate logic as a modeling language: the IDP system. In Declarative Logic Programming: Theory, Systems, and Applications, pp. 279–323.
  • [7] A. Groza and L. Corde (2015) Information retrieval in folktales using natural language processing. In 2015 IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), pp. 59–66.
  • [8] A. Groza (2021) Modelling puzzles in first order logic. Springer International Publishing.
  • [9] E. Jabrayilzade and S. Tekir (2020) LGP Solver – Solving Logic Grid Puzzles Automatically. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pp. 1118–1123.
  • [10] I. Lev, B. MacCartney, C. D. Manning, and R. Levy (2004) Solving logic puzzles: from robust processing to precise semantics. In Proceedings of the 2nd Workshop on Text Meaning and Interpretation, pp. 9–16.
  • [11] W. McCune (2005) Prover9. University of New Mexico.
  • [12] A. Milicevic, J. P. Near, and R. Singh (2012) Puzzler: an automated logic puzzle solver.
  • [13] G. A. Miller (1995) WordNet: a lexical database for English. Communications of the ACM 38 (11), pp. 39–41.
  • [14] A. Mitra and C. Baral (2015) Learning to automatically solve logic grid puzzles. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1023–1033.
  • [15] Y. Vasiliev (2020) Natural language processing with Python and spaCy: a practical introduction. No Starch Press.
  • [16] W. Wagner (2010) Steven Bird, Ewan Klein and Edward Loper: Natural Language Processing with Python, Analyzing Text with the Natural Language Toolkit. Language Resources and Evaluation 44 (4), pp. 421–424.
  • [17] R. V. Yampolskiy (2013) Turing test as a defining feature of AI-completeness. In Artificial Intelligence, Evolutionary Computing and Metaheuristics, pp. 3–17.