Logical Neural Networks (LNNs), a recent neural network architecture proposed by Riegel et al., have shown promising results as an architecture that combines the learning ability of neural networks with the symbolic reasoning ability of formal logic. LNNs work by associating with each neuron in the network a subexpression of a formula in a real-valued logic, such as weighted Łukasiewicz logic. Because of this one-to-one association of neurons with logical formulae, LNNs can represent their decisions in terms of logic; therefore, unlike with other neural architectures, we can easily interpret the decisions of LNNs while still retaining the robust learning ability of neural networks. In summary, LNNs allow for a neural architecture with explainable decisions.
Moreover, because of LNNs’ inherent “two-sided” nature – i.e., their one-to-one correspondence of formulae to neurons – LNNs give programmers the ability to modify the underlying structure of the neural network without needing to work with the neurons themselves. In other words, LNNs allow programmers to work with neural networks at an abstracted level via concise, easy-to-understand logical formulae. Thus, LNNs can be seen to have two sides: a “logic side” and a “neuron side”. This notion of abstraction is akin to that inherent in the design of the Internet, which was built so that one could design and use Internet applications without needing to worry about the low-level details and protocols used to transmit data. In this paper, we argue that this abstraction allows for the robust design of more complex extensions of LNNs without needing to work with “low-level” neurons.
LNNs operate by taking as input a knowledge base where each entry is a logical formula. Currently, LNNs support formulae expressed in first-order logic (FOL) – a very powerful language with respect to what one can express in it. FOL languages typically include predicate, variable, constant, and function symbols, along with an equality operator represented as a special predicate or logical constant symbol. The current architectural design of LNNs, however, only supports expressions in FOL without equality and function symbols. While equality and functions are not needed to obtain the full expressive ability of FOL, they nevertheless allow one to write and reason with formulae in a much easier and more natural way. Therefore, in this work, in order to demonstrate the robustness of LNNs, we extend LNNs to support equality and function symbols.
Furthermore, in doing so, we restrict ourselves to working at the logic side of LNNs, demonstrating the power of abstraction inherent in LNNs. Specifically, we introduce both equality and functions as first-order theories, i.e., additional axioms expressed in and reasoned about with FOL. Therefore, we need only introduce these axioms into the network via logical formulae during the construction of the network. In this work, we explain how the introduction of first-order theories increases the domain of problems LNNs can reason about. We additionally describe what the introduction of these theories corresponds to on the low-level neuron side of LNNs. Finally, as a proof of concept, we introduce support for equality into IBM’s LNN Python library (https://github.com/IBM/LNN), thereby allowing IBM’s system to reason about the equality symbol.
In this paper, we first provide a background on LNNs in Section 2; we focus primarily on the basic structure of an LNN, i.e., how symbols and neurons interact and the connections between real-valued logic and activation functions. We also briefly discuss how inference and learning work in LNNs. In Section 3, we introduce the theoretical framework used to add equality and functions to LNNs and discuss how the addition of equality and functions using this framework affects the neural structure of LNNs. In Section 4, we discuss the details of our implementation of equality in IBM’s LNN library and provide an example of how this increases the domain of problems LNNs can handle. In Section 5, we conclude and discuss future work.
In this section, we provide a brief introduction to the LNN framework created by Riegel et al.
Logical Neural Networks
Historically, the field of artificial intelligence has focused on either statistical AI or symbolic AI. Statistical AI includes, for example, the study of neural networks, while symbolic AI has included the study of “good old-fashioned” AI, i.e., deductive systems. Statistical AI allows for inductive reasoning, which enables a model to generalize inferences from a set of given data; symbolic AI allows for deductive reasoning, which enables a model to draw certain conclusions using formal systems of logic. Moreover, symbolic AI provides a clear, explainable sequence of reasoning steps, whereas statistical AI tends to be a black box with no way to understand why the model made the decision it did. LNNs aim to bridge these two distinct approaches and, therefore, allow a model to perform both inductive and deductive reasoning, taking advantage of the benefits of both the statistical and symbolic approaches.
Basic Structure of LNNs
Currently, the framework of LNNs proposed by Riegel et al. combines the capabilities of neural networks and FOL by associating neurons with subformulae of a real-valued logic, specifically Łukasiewicz logic. Overall, the structure of the neural network associated with a formula is exactly the formula’s syntax tree; an example is shown in Figure 1.
In Łukasiewicz logic, instead of evaluating to simply true or false, logical expressions evaluate to a real number between 0 and 1; we call this number the truth value. We define a threshold of truth, $\alpha$, such that we consider a statement true if its truth value is above $\alpha$ and false if it is below $1 - \alpha$.
Each neuron returns a pair of numbers between 0 and 1 called the lower and upper bounds. These numbers determine which primary state the neuron is in; these states are displayed in Figure 2.
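As a rough illustration of how a pair of bounds can be mapped to a state, the following Python sketch classifies bounds into four coarse states. The state names, the default threshold value, the use of `1 - alpha` as the falsity cutoff, and the inclusive comparisons are all assumptions made for this sketch; the exact states are those shown in Figure 2.

```python
def primary_state(lower: float, upper: float, alpha: float = 0.75) -> str:
    """Classify truth bounds into a coarse state (hypothetical helper).

    Assumes a threshold of truth `alpha` with 0.5 < alpha <= 1; a statement is
    treated as true when its lower bound reaches alpha and as false when its
    upper bound drops to 1 - alpha.
    """
    if not 0.5 < alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in (0.5, 1]")
    if not (0.0 <= lower <= 1.0 and 0.0 <= upper <= 1.0):
        raise ValueError("bounds must lie in [0, 1]")
    if lower > upper:
        # The bounds have crossed: no truth value satisfies both.
        return "contradiction"
    if lower >= alpha:
        return "true"
    if upper <= 1.0 - alpha:
        return "false"
    return "unknown"
```

For example, the uninformative bounds `(0.0, 1.0)` are classified as unknown, while crossed bounds such as `(0.8, 0.3)` are reported as a contradiction.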
Essentially, each binary logical connective of the input formula, such as conjunction ($\wedge$ or $\otimes$) or disjunction ($\vee$ or $\oplus$), is associated with a neuron; the neuron’s activation function is the corresponding real-valued logical operation, which computes the truth value of the connective. For Łukasiewicz logic, these operations are:
$$p \otimes q = \max(0,\; p + q - 1), \qquad p \oplus q = \min(1,\; p + q).$$
These activation functions are then generalized to a “weighted real-valued logic”, which allows one to express the importance of a subformula; these weights are updated via backpropagation during learning. The unary negation connective ($\neg$) and the existential ($\exists$) and universal ($\forall$) quantifiers are represented as pass-through nodes, which are neurons with basic unweighted activation functions. For example, negation is simply $\neg p = 1 - p$.
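The unweighted Łukasiewicz operations can be written directly as small Python functions. This is a minimal sketch of the standard, unweighted definitions only; the function names are chosen for illustration, and the weighted generalization used by LNNs is not shown.

```python
def luk_and(p: float, q: float) -> float:
    """Łukasiewicz conjunction: max(0, p + q - 1)."""
    return max(0.0, p + q - 1.0)


def luk_or(p: float, q: float) -> float:
    """Łukasiewicz disjunction: min(1, p + q)."""
    return min(1.0, p + q)


def luk_not(p: float) -> float:
    """Łukasiewicz negation: 1 - p."""
    return 1.0 - p


def luk_implies(p: float, q: float) -> float:
    """Łukasiewicz implication: min(1, 1 - p + q)."""
    return min(1.0, 1.0 - p + q)
```

On the classical truth values 0 and 1 these functions agree with the usual Boolean connectives; for intermediate values they interpolate, e.g. `luk_and(0.5, 0.5)` evaluates to `0.0`.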
FOL predicates are represented as input neurons, where each predicate has an associated table of groundings, i.e., our known input data for what the predicate should evaluate to on various inputs. The inputs to the predicates in the table are the FOL constants, which are symbols associated with objects within our domain of discourse, i.e., the things we are talking about.
To perform inference, LNNs perform an upward and downward pass through the network. The upward pass propagates the truth values from the predicates through the network, calculating the truth value of the entire formula itself. The downward pass works using the believed truth value of the entire formula to compute the truth values of its subformulae and predicates. Specifically, the upward pass will compute truth bounds for each subformula and, ultimately, the entire formula using the truth bounds of the predicates; the downward pass will then tighten these bounds for each subformula and, ultimately, each predicate until convergence. Convergence is guaranteed for propositional logic (i.e., no quantifiers and only nullary predicates) but is not guaranteed for FOL due to FOL’s undecidability.
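To make the upward pass concrete, the sketch below propagates truth bounds from atoms to the root of a small propositional syntax tree using unweighted Łukasiewicz operations. The class names and the `upward` function are hypothetical, the downward pass and quantifiers are omitted, and this is not how IBM's library is organised internally.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

Bounds = Tuple[float, float]  # (lower, upper)


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


Formula = Union[Atom, Not, And, Or]


def upward(formula: Formula, facts: Dict[str, Bounds]) -> Bounds:
    """Compute truth bounds for `formula` from the bounds of its atoms."""
    if isinstance(formula, Atom):
        # Atoms without data are completely unknown.
        return facts.get(formula.name, (0.0, 1.0))
    if isinstance(formula, Not):
        lo, hi = upward(formula.arg, facts)
        # Negation is decreasing, so the bounds swap roles.
        return (1.0 - hi, 1.0 - lo)
    l_lo, l_hi = upward(formula.left, facts)
    r_lo, r_hi = upward(formula.right, facts)
    if isinstance(formula, And):
        # Conjunction is increasing in both arguments.
        return (max(0.0, l_lo + r_lo - 1.0), max(0.0, l_hi + r_hi - 1.0))
    if isinstance(formula, Or):
        return (min(1.0, l_lo + r_lo), min(1.0, l_hi + r_hi))
    raise TypeError(f"unsupported formula node: {formula!r}")
```

Because conjunction and disjunction are monotone in each argument, the lower bound of a connective depends only on the lower bounds of its operands, and likewise for the upper bound.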
Because we are working with real-valued formulae, the equations used are differentiable and, therefore, we are able to use backpropagation to update the parameters, such as the weights of each connective or the truth value bounds, of the formulae. The loss function for the optimization via backpropagation aims to minimize the amount of contradiction present in the model; in other words, it aims to remove as many contradictory conclusions as possible. This allows LNNs to learn from noisy and conflicting data sets.
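One simple way to quantify "the amount of contradiction" is to sum, over all neurons, how far each lower bound exceeds its upper bound. The sketch below assumes this hinge-style form; the function name is hypothetical, and the actual loss used in training may include additional terms.

```python
from typing import Iterable, Tuple


def contradiction_loss(bounds: Iterable[Tuple[float, float]]) -> float:
    """Sum of max(0, lower - upper) over all (lower, upper) pairs.

    A pair contributes only when its bounds have crossed, i.e. when the
    neuron is in a contradictory state.
    """
    return sum(max(0.0, lower - upper) for lower, upper in bounds)
```

A model whose bounds are all consistent has a loss of zero under this measure, so minimizing it pushes parameters toward settings that avoid crossed bounds.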
3 Theoretical Results: Extending LNNs via First-Order Theories
In this section, we describe how to formalize equality and functions as first-order theories and how adding these theories into an LNN will affect its underlying structure.
Informally, a first-order theory is a set of symbols which we may include in our FOL formulae and which carry some additional meaning. A set of axioms included with the theory specifies the additional meaning placed on these symbols. Essentially, first-order theories allow us to formalize more complex structures and concepts that we would like to discuss, e.g., equality, functions, lists, arrays, and trees. In our case, we wish to formalize equality and functions.
Equality in FOL
We first wish to include a theory of equality to formalize the notion of the equality operation. The theory of equality is a common first-order theory and is a popular way to introduce an equality symbol to FOL (cf. calccomp). To do so, we begin by introducing the equality symbol itself as a binary predicate: “=”. Specifically, we add a sort of “universal predicate” to the LNN which is defined from the beginning of the LNN model and is, therefore, always accessible for use in any formula; we discuss the detailed implementation of this in Section 4. This predicate is supposed to capture our notion of what it means for two things to be equal. Specifically, in FOL, the equality operator compares two terms. A term in FOL is either a constant, a variable, or the application of a function to other terms. Because we do not yet have functions, we restrict our discussion in this subsection to terms which are simply constants or variables.
In FOL, formulae are evaluated under a specific interpretation or model. Terms are then said to refer to some object in the domain of the interpretation; under different interpretations, different terms may refer to different objects. Within an interpretation, we use the equality symbol to say that two terms refer to the same object. For example, if we had two constants, $a$ and $b$, and we then said that $a = b$, this means that both $a$ and $b$ refer to the same object within our domain.
Using this knowledge, we then add axioms which formalize the intended meaning of the equality symbol. Because we want the equality symbol to say that two terms are equal if they refer to the same object, we are essentially claiming that equality is an equivalence relation: it partitions our terms into classes whose members all refer to the same object. Thus, the predicate we use to represent equality should be reflexive, symmetric, and transitive. We thus have our first three axioms:
$$\forall x.\; x = x \qquad \text{(reflexivity)}$$
$$\forall x, y.\; x = y \rightarrow y = x \qquad \text{(symmetry)}$$
$$\forall x, y, z.\; (x = y \wedge y = z) \rightarrow x = z \qquad \text{(transitivity)}$$
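We also desire that two equal terms can be replaced by one another; i.e., we can substitute any term for one to which it is equal. For example, say we had some unary predicate, $P$, and terms $t_1$ and $t_2$ such that $t_1 = t_2$. Then we would want to conclude that if $P(t_1)$ holds, then $P(t_2)$ holds too, since $t_1 = t_2$. Here, because $t_1 = t_2$, we should be able to replace any occurrence of $t_1$ with $t_2$.
In the case of our unary predicate, $P$, we may state this with the following axiom:
$$\forall x, y.\; x = y \rightarrow (P(x) \leftrightarrow P(y))$$
This now ensures that we can replace equal terms with each other in our expressions for $P$.
More generally, say that $P$ is an $n$-ary predicate. We would then include the following axiom:
$$\forall x_1, \ldots, x_n, y_1, \ldots, y_n.\; (x_1 = y_1 \wedge \cdots \wedge x_n = y_n) \rightarrow (P(x_1, \ldots, x_n) \leftrightarrow P(y_1, \ldots, y_n))$$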
Effect on Neural Structure
In order to add these axioms to our LNN model, we may add our first three axioms (reflexivity, symmetry, and transitivity) during the instantiation of our model since they only make a claim about our equality predicate, which is also introduced from the beginning; i.e., we add the three axioms as formulae of our knowledge base from the very start. This will lead our neural network to contain neurons corresponding to each axiom’s formula.
In the case of our fourth axiom (congruence), however, we must handle this for every predicate; therefore, whenever a new predicate is introduced into our knowledge base, we also add the appropriate congruence axiom for this predicate, which also leads to the introduction of neurons for each of the congruence axioms.
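The bookkeeping described here (three fixed axioms up front, plus one congruence axiom per predicate) can be sketched as a small generator of axiom strings. The function names, the textual formula syntax, and the biconditional form of the congruence axiom are assumptions made for illustration; this is not the representation used inside IBM's library.

```python
from typing import Dict, List

# Axioms that mention only the equality predicate; added once, at model creation.
EQUALITY_AXIOMS: List[str] = [
    "∀x. x = x",
    "∀x, y. (x = y) → (y = x)",
    "∀x, y, z. (x = y ∧ y = z) → (x = z)",
]


def congruence_axiom(predicate: str, arity: int) -> str:
    """Build the congruence axiom for one predicate of the given arity."""
    if arity < 1:
        raise ValueError("congruence only applies to predicates with arguments")
    xs = [f"x{i}" for i in range(1, arity + 1)]
    ys = [f"y{i}" for i in range(1, arity + 1)]
    equalities = " ∧ ".join(f"{x} = {y}" for x, y in zip(xs, ys))
    quantified = ", ".join(xs + ys)
    left = f"{predicate}({', '.join(xs)})"
    right = f"{predicate}({', '.join(ys)})"
    return f"∀{quantified}. ({equalities}) → ({left} ↔ {right})"


def equality_theory(predicates: Dict[str, int]) -> List[str]:
    """All axioms needed for a knowledge base with the given predicate arities."""
    axioms = list(EQUALITY_AXIOMS)
    for name, arity in predicates.items():
        axioms.append(congruence_axiom(name, arity))
    return axioms
```

For a knowledge base with a unary predicate `dog` and a binary predicate `owns`, `equality_theory({"dog": 1, "owns": 2})` yields five formulae: the three equivalence axioms and one congruence axiom per predicate.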
We include a model schematic displaying the effect on the network from adding these axioms in Figure 3.
For the incorporation of functions as a first-order theory, we had much less prior work to draw on. Introducing equality as something added “later on” to FOL is a much more common practice than doing so for functions. For example, in the proof of Gödel’s Completeness Theorem, we first prove the claim for FOL without equality and then prove that introducing equality does not change anything. Functions, however, are normally assumed to be included in FOL from the start, unlike equality. We describe our results for formalizing functions as a first-order theory below.
Functions as “Functional Relations”
To formalize functions as a first-order theory, we must understand what functions are fundamentally. An $n$-ary function maps $n$ inputs to an output such that no tuple of inputs maps to two different outputs. Moreover, both functions and predicates are fundamentally relations; the only difference is that the relation for a function is constrained so that each tuple of inputs maps to only one output. Formally, we say that an $(n+1)$-ary relation, $R$, is functional if and only if
$$\forall x_1, \ldots, x_n, y, z.\; \big(R(x_1, \ldots, x_n, y) \wedge R(x_1, \ldots, x_n, z)\big) \rightarrow y = z.$$
This condition ensures that if the first $n$ inputs to our $(n+1)$-ary relation are the same, then so is the $(n+1)$-th input. In other words, the first $n$ inputs determine the value of the $(n+1)$-th input, as we desire for functions. Thus, for any $n$-ary function, $f$, we can construct an $(n+1)$-ary functional relation, $R_f$, such that for every term $t$ and terms $t_1$ through $t_n$,
$$f(t_1, \ldots, t_n) = t \;\leftrightarrow\; R_f(t_1, \ldots, t_n, t).$$
Since predicates are simply relations, we can then rewrite each function as a predicate.
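For a finite relation stored as a set of tuples, the functional condition can be checked directly. The sketch below treats the last component of each tuple as the output; the helper names are hypothetical, and the check is over a finite set rather than an FOL interpretation.

```python
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple


def is_functional(relation: Set[Tuple[Hashable, ...]]) -> bool:
    """Return True if no input tuple is related to two different outputs."""
    seen: Dict[Tuple[Hashable, ...], Hashable] = {}
    for *inputs, output in relation:
        key = tuple(inputs)
        if key in seen and seen[key] != output:
            return False
        seen[key] = output
    return True


def relation_of(
    f: Callable[..., Hashable], inputs: Iterable[Tuple[Hashable, ...]]
) -> Set[Tuple[Hashable, ...]]:
    """Build the relation R_f = {(t1, ..., tn, f(t1, ..., tn))} over the given inputs."""
    return {(*args, f(*args)) for args in inputs}
```

Any relation built with `relation_of` is functional by construction, whereas a relation such as `{(1, 2), (1, 3)}` is not, because the input `1` is related to two outputs.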
Rewriting Formulae to use Functional Relations instead of Functions
Furthermore, when it comes to how functions may be used in FOL formulae, function applications are another type of term, like constants and variables. Notice that for any predicate $P$ and term $t$, $P(t)$ is logically equivalent to $\exists x.\,(x = t \wedge P(x))$; see Appendix A for a proof. In other words, we may extract the term out of our predicate and introduce an existentially quantified variable which must be equal to our term.
Therefore, by the correspondence between a function $f$ and its functional relation $R_f$, and in the case of $t$ being a function application $f(t_1, \ldots, t_n)$, we may say that $P(f(t_1, \ldots, t_n))$ is logically equivalent to $\exists x.\,(R_f(t_1, \ldots, t_n, x) \wedge P(x))$. Since the relation $R_f$ may be represented simply as a predicate, we have now rewritten an FOL formula which contained functions as an FOL formula without any functions. Since function applications may themselves have function applications as arguments, this procedure repeats until all the functions are removed.
Recursively Extracting Functions
To explain the recursive extraction of functions, we provide a basic example. Say we have a unary predicate $P$, two unary functions $f$ and $g$, and a constant $c$. Let $R_f$ and $R_g$ be the functional relations associated with $f$ and $g$, respectively.
Now say we wish to remove all functions from the formula $P(f(g(c)))$. Since $f(g(c))$ is a term of $P$, we first extract it so that $P$ does not contain any terms which are function applications:
$$\exists x.\; R_f(g(c), x) \wedge P(x)$$
Note that after rewriting, however, we are still left with a predicate containing a term which is a function application. While $P(x)$ is function-free, $R_f(g(c), x)$ is not. Therefore, we again apply the rewriting rule to our new formula in order to extract any functions from the terms of our newly introduced predicate $R_f$:
$$\exists x.\; \big(\exists y.\; R_g(c, y) \wedge R_f(y, x)\big) \wedge P(x)$$
Thus, we now have an equivalent function-free formula for $P(f(g(c)))$.
Therefore, in order to add functions to LNNs, we may simply rewrite the formulae of an LNN’s knowledge base to remove functions via the rewriting procedure outlined above and then add axioms ensuring that the predicates we introduced for each function during the rewriting steps are functional.
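The rewriting procedure can be sketched over a tiny abstract syntax tree. All class and function names below are hypothetical, fresh variables `v1, v2, ...` are assumed not to clash with variables already in the formula, and only atomic formulae are handled; a full implementation would also recurse through connectives and quantifiers.

```python
import itertools
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple, Union


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Func:
    name: str
    args: Tuple["Term", ...]


Term = Union[Const, Var, Func]


@dataclass(frozen=True)
class Pred:
    name: str
    args: Tuple[Term, ...]


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Exists:
    var: Var
    body: "Formula"


Formula = Union[Pred, And, Exists]


def eliminate_functions(atom: Pred, fresh: Optional[Iterator[Var]] = None) -> Formula:
    """Rewrite an atomic formula into an equivalent function-free formula."""
    if fresh is None:
        fresh = (Var(f"v{i}") for i in itertools.count(1))
    for i, arg in enumerate(atom.args):
        if isinstance(arg, Func):
            v = next(fresh)
            # P(..., f(t1..tn), ...)  ~>  ∃v. R_f(t1..tn, v) ∧ P(..., v, ...)
            relation = Pred(f"R_{arg.name}", arg.args + (v,))
            rest = Pred(atom.name, atom.args[:i] + (v,) + atom.args[i + 1:])
            return Exists(
                v,
                And(eliminate_functions(relation, fresh), eliminate_functions(rest, fresh)),
            )
    return atom  # no function applications left among the arguments


def show(node: Union[Term, Formula]) -> str:
    """Render a term or formula as text."""
    if isinstance(node, (Const, Var)):
        return node.name
    if isinstance(node, (Func, Pred)):
        return f"{node.name}({', '.join(show(a) for a in node.args)})"
    if isinstance(node, And):
        return f"({show(node.left)} ∧ {show(node.right)})"
    if isinstance(node, Exists):
        return f"∃{node.var.name}. {show(node.body)}"
    raise TypeError(f"unsupported node: {node!r}")
```

Applied to `Pred("P", (Func("f", (Func("g", (Const("c"),)),)),))`, the procedure first extracts `f` and then, inside the newly introduced `R_f` atom, extracts `g`, mirroring the two rewriting steps of the worked example.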
Effect on Neural Structure
Similar to equality, we will add axioms for each of the predicates associated with a function; specifically, we will do so for those predicates we wish to be defined as functional. This will involve introducing several new neurons to the network to handle the axiom for each function. We present a model schematic displaying the effect on the network from adding these axioms in Figure 4.
We will also modify the neuronal structure corresponding to our input formulae. Specifically, for each predicate which contains a function application as an argument, we add neurons for the predicates associated with each of the functions, as well as neurons for existential quantification and conjunction, since those operators are introduced by the rewriting process.
4 Proof of Concept: Implementation of Equality
In this section, we detail our extension of IBM’s LNN library to include equality and provide an example demonstrating how the introduction of equality expands the domain of problems LNNs can represent. Our implementation of equality is available in an online repository (https://github.com/nsnave/LNN).
Description of Implementation
First, to use equality within a model, we specify that an LNN model instance should have support for equality during its instantiation:
This ensures that equality’s required axioms are added from the start.
In IBM’s LNN library, predicates are instances of a Predicate class. Since we are treating equality as a predicate, to add an equality operator to IBM’s LNN library, we define a new variable, Equals, which is an instance of the Predicate class (“variable” here refers to a variable in Python, not to be confused with our earlier discussion of variables within FOL); this variable is then made available as an additional import like the other operators. Therefore, users have access to the variable globally.
This also gives the Model class access to the equality variable, so we can add the axioms needed for equality within the Model class. During the instantiation step above, this introduces the axioms for reflexivity, symmetry, and transitivity. We add the congruence axiom for each predicate when that predicate is added with the add_predicates function. We now discuss the implementation via an example.
The full example is presented in Listing 1 and in the online repository (https://github.com/nsnave/LNN/tests/reasoning/logic/fol/test_same_name.py). We now walk through it step by step.
Say we wish to introduce a unary predicate dog which tells us whether the input argument refers to a dog. We would then write the following:
At this point, the model now has axioms for reflexivity, symmetry, and transitivity (via line 3) along with congruence for the dog predicate (via line 4).
If we then wish to say that there is a dog named “Aggie” and that Aggie’s nickname is “Fruton”, we’d introduce the following facts to the model:
This establishes that dog(‘Aggie’) and ‘Aggie’ = ‘Fruton’ are true statements. We can then demonstrate our model’s ability to reason about equality by having it prove that dog(‘Fruton’) must also be true. To do so, we introduce as an axiom that Not(dog(‘Fruton’)) is true; if this results in our model reporting that we have a contradiction, then we know that Not(dog(‘Fruton’)) must in reality be false and, therefore, dog(‘Fruton’) is true.
To accomplish this, we begin by first declaring that Not(dog(‘Fruton’)) is an axiom of our model:
Then, we’ll have the model deduce what it can about its knowledge base via the infer function and print the resulting state of our query, which ultimately reports a contradiction:
Currently, LNNs necessarily make the unique-names assumption, which assumes that every constant refers to a unique object in the domain. Therefore, in the example above, there would otherwise be no way to declare that the names “Aggie” and “Fruton” both refer to the same dog. Now that we have introduced equality, we need not make this assumption, as the above example demonstrates. Because of our introduction of equality, we were able to state that the constants “Aggie” and “Fruton” do in fact refer to the same object. Thus, we were able to deduce that whatever “Fruton” referenced was a dog because we knew that what “Aggie” referenced was a dog. Therefore, the introduction of equality as a first-order theory expanded what LNNs are able to reason about without changing how the underlying architecture’s neurons work; we only had to add or modify the logical formulae of the network.
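The effect of dropping the unique-names assumption can be illustrated with a classical, two-valued sketch that is independent of the LNN library: constants are grouped into equivalence classes with a union-find structure, and a unary predicate holds for a constant if it is known to hold for any member of its class. The class and function names are hypothetical, and this is only an analogy for what the equality axioms let the model derive, not a description of LNN inference.

```python
from typing import Dict, Iterable


class EqualityClasses:
    """Union-find over constant names; equal constants share a representative."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def find(self, constant: str) -> str:
        self._parent.setdefault(constant, constant)  # reflexivity: c = c
        while self._parent[constant] != constant:
            # Path halving keeps the trees shallow.
            self._parent[constant] = self._parent[self._parent[constant]]
            constant = self._parent[constant]
        return constant

    def declare_equal(self, a: str, b: str) -> None:
        # Merging classes gives symmetry and transitivity for free.
        self._parent[self.find(a)] = self.find(b)

    def same(self, a: str, b: str) -> bool:
        return self.find(a) == self.find(b)


def holds(known_true: Iterable[str], eq: EqualityClasses, constant: str) -> bool:
    """Congruence for a unary predicate: true if true of any equal constant."""
    return any(eq.same(constant, other) for other in known_true)


eq = EqualityClasses()
dog_facts = {"Aggie"}                # dog('Aggie') is known to be true
eq.declare_equal("Aggie", "Fruton")  # 'Aggie' = 'Fruton'
print(holds(dog_facts, eq, "Fruton"))
```

Without the call to `declare_equal`, each name stays in its own class, which corresponds to the unique-names assumption.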
A Note on Functions
In general, the implementation of functions would be significantly more complicated since it involves rewriting the input formulae. In order to add support for functions to IBM’s library, we would need to rewrite the formulae added to the model; however, IBM’s current implementation of LNNs does not lend itself to rewriting formulae. A better approach for introducing theories which involve rewriting the input formulae would be to modularize the rewriting process by introducing a separate parser. This parser would take as input a simple abstract syntax tree for each input formula, apply the rewriting rules, and then pass the rewritten formulae to IBM’s LNN library as usual.
In this project, we focused on developing the theory needed to incorporate equality and functions into the LNN model. We proposed a way to incorporate these elements by implicitly changing the LNN architecture using first-order theories. This allows us to use equality and functions in the LNN model proposed by Riegel et al. The addition of equality and functions allows us to use LNNs to reason about statements in FOL that are much more interesting and natural because we have a larger domain of representable problems. For instance, the introduction of equality alone frees us from necessarily working within the unique-names assumption. Additionally, we explained how adding such elements affects the underlying structure of LNNs. With this, we have demonstrated that the reasoning ability of LNNs can be extended to other domains via first-order theories; moreover, the inclusion of other first-order theories beyond equality and functions would allow LNNs to reason about even more complicated objects.
Future work includes looking into the implementation of additional first-order theories, such as theories of arrays, binary search trees, multisets, or even arithmetic. This would allow one to represent and reason about these structures using a neural network-based architecture. Moreover, future work includes finding more empirical support for the reasoning and learning ability of LNNs by testing their ability to prove theorems on a benchmark set such as TPTP (https://www.tptp.org/); the current implementation of LNNs by Riegel et al. was limited to only a small portion of the available theorem-proving tasks because it lacked the ability to represent equality and functions. Additionally, one could look into LNNs’ ability to recognize logical entailment as described by entail, something we think LNNs would be naturally well-suited for.
Appendix A Proof of Function Rewriting Rule
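Theorem: For a given $n$-ary predicate, $P$, and a chosen term $t$, $P(\ldots, t, \ldots)$ is logically equivalent to $\exists x.\,(x = t \wedge P(\ldots, x, \ldots))$.
Let $P$ and $t$ be arbitrary. As depicted in Figure 5, we may deduce that $P(\ldots, t, \ldots) \leftrightarrow \exists x.\,(x = t \wedge P(\ldots, x, \ldots))$.
From the Soundness Theorem, it follows that $P(\ldots, t, \ldots)$ is logically equivalent to $\exists x.\,(x = t \wedge P(\ldots, x, \ldots))$.∎
Corollary: If $t$ is the application of a $k$-ary function, $f(t_1, \ldots, t_k)$, then $P(\ldots, f(t_1, \ldots, t_k), \ldots)$ is logically equivalent to $\exists x.\,(R_f(t_1, \ldots, t_k, x) \wedge P(\ldots, x, \ldots))$, where $R_f$ is the $(k+1)$-ary functional predicate associated with $f$.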