Neural-Symbolic (NeSy) computing seeks an effective integration between connectionist learning and symbolic reasoning, taking advantage of statistical methods that can be applied to the features of perceived data or to the logical structure of symbolic information [dagstuhl17]. Moreover, the NeSy computing community has sought to integrate views from AI, cognitive science, machine learning, ANNs, computational vision and natural language processing, and to point out the main lines NeSy should follow to meet Human-Like Computing (HLC), and in particular the Human-Like AI (HLAI) initiative. Among the points addressed, two are of particular interest here, because our experiment falls, somehow, within both.
Statistical Relational Learning, to integrate explanation and computation of symbolic knowledge in deep networks. Recent results have already shown that this is a possible path [donaserafgarcezLTN]. In the same way that the statistics of elements of a propositional (or relational) logic program can be used to induce Artificial Neural Networks (ANNs) that behave like them, e.g. C-ILP [adg-gzCILP], self-organization can be exploited to produce meaningful maps of concepts in order to ease the induction of hypotheses.
Explainable AI (XAI) aims to develop AI models that are intelligible to humans, unlike “black box” models, which are efficient but from which it is difficult to extract knowledge, e.g. deep (or similar) learning without symbolic interpretation. In this direction, patterns of concepts can be used to justify (and explain) “shortcuts” that generate recursive hypotheses from very large sets of relations without the need to compute the entire path justifying them. This is critical when the background knowledge holds huge amounts of data, which could be adequately handled as regions of concepts and categories, similar to the brain-map organization of humans. This would allow symbolic deduction to be performed as matching, and inductive reasoning to use weights to prune the search space of candidate hypotheses.
The Shared Neural Multi-Space (Shared NeMuS) is a Smarandache multispace [mao:smultispace], i.e. the union of n spaces S_1, ..., S_n. As such, each S_i represents the space of a characteristic distinct from the total space, and so it is suitable to represent the different concepts of a logical language. Such a structure resembles the self-organization of a brain-like map, as it is enhanced with adjustable weights of importance.
NeMuS was proposed in [nemus4NeSy16] as a hierarchy of weighted vectors of logical elements pointing to their occurrences within a set of logical formulae in disjunctive normal form. This structure of weights was used to generate patterns of refutation, so that deduction is treated as a matching problem. The knowledge base, known as background knowledge (BK), is used to compute complementary similarities among literals and to form activation regions. The present work brings another proof of concept that NeMuS, we claim, is suitable for complex learning tasks meeting at least two goals of the human-like AI endeavour:
the self-organizing tendency of information in our brain, occurring at all levels, is easily mapped into its hierarchy.
the sort of segregation of information into distinct parts needed to learn about objects, relations and rules composing them can be obtained through the adaptation of its (neural) network of weights to form regions of concepts (like refutation, similarities, etc.).
Thus, NeMuS lends itself to the self-organization of maps or regions of importance over the logical structure that are relevant to what one wants to learn about symbolic knowledge. This can be used, for example, to choose better deduction strategies that help reduce the search space. This paper makes the following contributions: it shows how patterns of similarities among (mostly relational) logical formulae can be determined; it points out how the formation of such patterns can be used, initially, as heuristics to guide the search for consistent hypotheses in inductive learning; and it brings a self-organizing perspective to learning about large sets of relational knowledge.
The remainder of this paper is organized as follows: section 2.1 gives some brief background on the main NeMuS concepts and their applications to patterns of reasoning and inductive clause learning; section 3 presents the self-organizing models we used in our experiments; section 4 describes the experimental set-up and the preliminary results found; section 5 briefly contextualizes this work in relation to similar ones; and section 6 concludes the work presented.
2.1 Fundamentals of NeMuS structure
For FOL purposes, NeMuS can be defined with five spaces, one for each component of the first-order language, as depicted in Figure 1. The upward arrows illustrate the distribution of weights from the bottom up. The hierarchical composition of FOL is top-down, i.e. the semantics of a first-order expression is the composition of the semantics of its subexpressions.
Note that this structure is not limited to these spaces, since clauses may influence possible worlds, possible worlds may define other higher concepts, and so on. For our purposes, we have: space 0 for variables, 1 for atomic constants of the Herbrand universe, 2 for functions (suppressed here), 3 for predicates, with literal instances in their own space, and 4 for clauses. In what follows, vectors are written in boldface, e.g. v, and v[i] is used to refer to the element of vector v at position i.
The building block of NeMuS is a 4-place vector, called T-Node, used to describe logical elements. Each element is uniquely identified by an integer code (an index) within its space. In addition, a T-Node identifies the lexicographic occurrence of the element, and (when appropriate) an attribute position.
Definition 1 (T-Node)
Let h, c, i, a ∈ ℕ. A T-Node (target node) is a quadruple ⟨h, c, i, a⟩ that identifies an object at space h, with code c and occurrence i, at attribute position a (when it applies, otherwise 1). 𝕋 is the set of all T-Nodes.
- NeMuS Binding
is an indexed pair ⟨t, w⟩, with t ∈ 𝕋 and w ∈ ℝ, such that t = ⟨h, c, i, a⟩. It represents the importance w of the indexing object over occurrence i of object c at space h in position a.
- Constant Space (1)
is a vector C = ⟨c_1, ..., c_n⟩, in which every c_j is a vector of bindings. A mapping function takes each constant to the vector of its bindings, as above.
Functions, predicates and clauses are compounds forming higher spaces. Their logical components are vectors of T-Nodes (one for each argument), and a vector of NeMuS bindings (simply bindings) to represent their instances.
An attribute vector in NeMuS is a vector of T-Nodes, i.e. a = ⟨t_1, ..., t_k⟩, so that each t_j ∈ 𝕋, and it represents the attributes of a compound logical expression.
- Instance Space
The instance space (I-Space) of a compound is a pair ⟨a, b⟩ in which a is its vector of attribute T-Nodes and b is a vector of bindings. A vector of I-Spaces is a NeMuS Compound Space (C-Space).
A literal (predicate instance) is an element of an I-Space, and so the predicate space is simply a C-Space. Seen as compounds, clauses’ attributes are the literals composing them.
- Predicate Space (3)
is a pair ⟨P⁺, P⁻⟩ in which P⁺ and P⁻ are vectors of C-Spaces, for positive and negative literal occurrences respectively.
- Clause Space (4)
is a vector of C-Spaces such that every I-Space pair in the vector has its T-Nodes pointing to the literals composing the clause.
The above description is not the only way of building a NeMuS structure. For the purposes of this work we assume no function terms, and so only four spaces are needed.
Definition 2 (Shared NeMuS)
A Shared NeMuS for a set of coded first-order expressions is an ordered 4-tuple ⟨V, C, P, Cl⟩, in which V is the variable space, C is the constant space, P is the predicate space and Cl is the clause space.
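The definitions above can be made concrete with a minimal sketch in Python. The class and field names (and the use of plain lists for spaces) are illustrative assumptions, not the actual Amao implementation:

```python
# A minimal sketch of the Shared NeMuS hierarchy (Definitions 1 and 2).
# Space codes, field names and the weight type are assumptions made for
# illustration; the actual Amao data structures may differ.
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class TNode:
    """Identifies one occurrence of a logical element (Definition 1)."""
    space: int        # 0=variables, 1=constants, 2=functions, 3=predicates, 4=clauses
    code: int         # unique integer code within its space
    occurrence: int   # lexicographic occurrence of the element
    position: int = 1 # attribute position; 1 when it does not apply

@dataclass
class Binding:
    """A weighted reference to a T-Node: the 'importance' of an object."""
    target: TNode
    weight: float = 0.0

@dataclass
class ISpace:
    """Instance of a compound: its attribute T-Nodes plus its bindings."""
    attributes: List[TNode]
    bindings: List[Binding] = field(default_factory=list)

@dataclass
class SharedNeMuS:
    """Ordered 4-tuple of spaces (Definition 2), assuming no function terms."""
    variables: List[List[Binding]]
    constants: List[List[Binding]]
    predicates: List[List[ISpace]]  # one C-Space per predicate symbol
    clauses: List[List[ISpace]]
```

A constant's vector of bindings then simply lists, with weights, every compound occurrence in which that constant appears.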
2.2 Inductive Clause Learning (ICL) with NeMuS
The main focus of ILP is the symbolic learning of a generic logic formula H, called a hypothesis, which describes a concept not yet defined in the BK, given the BK data and examples that characterize instances (positive examples e⁺) as well as negative examples e⁻. In other words, a learned formula H is a valid hypothesis if, and only if, its union with the BK yields only the positive deductions of the concept and not the negative ones, or formally: BK ∪ H ⊨ e⁺ and BK ∪ H ⊭ e⁻.
In [ilSNeMuS4NeSy17], Mota, Howe and Garcez showed how to use NeMuS to perform the same task as the inductive logic programming systems presented in the literature. ICL was performed in a system called Amao; it did not explicitly use the weights, but their “intuitive use” was explored to define linkage patterns between predicates of the Herbrand base and the occurrences of ground terms in positive examples. Meaningless hypotheses are pruned away as a result of inductive momentum between predicates connected to positive and negative examples.
The results on inductive learning in Amao show that using its shared structure leads to reliable hypothesis generation, in the sense that the minimally correct hypotheses are generated.
More recent ILP solutions proposed the use of meta-programming to define which predicates should appear in a hypothesis, as in [muggleton15meta], like a bias in traditional ILP. However, this and other ILP approaches have to generate multiple hypothesis candidates, pair each with a copy of the BK, and test positive and negative deductions. A valid-hypothesis search engine will be efficient if there is a partial ordering on the substitutions of hypotheses that subsume others, i.e. are more generic [nienhuys1997foundations].
Amao’s use of NeMuS takes a different approach: rather than generating all possible candidates for a hypothesis, it considers only those whose premise predicates contain terms from the positive examples, while excluding those from the negative ones. This is achieved by walking through the Herbrand base, starting from the terms of the examples, using inverse unification [idestam93generalization].
However, when it comes to generating recursive hypotheses that demand many examples, it behaves like most approaches. This is where the importance of this experiment comes about: by indicating that (relational) literals without a direct chain of attributes can be induced as recursive if the respective regions of their predicates show that they are connected. A parallel work to ours, [motaEfficientPINemus19], is actually making use of the results presented here, for now as a heuristic to provide predicate invention in recursive hypotheses.
In order for Amao to meet the desired self-organized behavior when learning (or reasoning), we took an approach that adapts itself to new information, which will be very important for future neuro-symbolic applications such as non-monotonic reasoning.
3 Self-Organizing Models
In this section, we present a brief explanation of Self-Organizing Maps and the variant used in this experiment, the Associative Self-Organizing Map.
3.1 Self-Organizing Maps
The Self-Organizing Map (SOM) is an artificial neural network which transforms a given n-dimensional pattern of data into a 1- or 2-dimensional map or grid. This transformation follows a topological ordering, where patterns of data (synaptic weight vectors) with similar statistical features are gathered in regions close to each other on the grid. The learning process is competitive, because neurons compete against each other to respond to each input, and only one wins. It is also unsupervised, because the network learns only from input patterns, reorganizing itself after the first trained data and adjusting its weights as new data arrive.
A detailed description of the complete SOM algorithm is presented in [kohonenBook]. In what follows, we provide a summary of the main steps of how the SOM’s learning process works.
Initialization: at the beginning of the process, all neuron vectors have their synaptic weights randomly generated. These vectors must have the same dimension as the input pattern space.
Sampling: a single sample is chosen from the input pattern space and fed to the neuron grid.
Competition: based on the minimum Euclidean distance criterion, the winning neuron i(x) for input x is found as follows:
i(x) = argmin_j ‖x − w_j‖, j = 1, ..., N,
where N is the number of neurons in the grid.
Synaptic adaptation: after finding the winning neuron (Best-Matching Unit, or BMU), the synaptic weights of every neuron vector are adjusted:
w_j(t+1) = w_j(t) + η(t) h_{j,i(x)}(t) (x − w_j(t)),
where t represents the current instant, η(t) is the learning rate, which gradually decreases with time t, and h_{j,i(x)}(t) is the neighborhood function, which determines the degree of learning of a neuron according to its relative distance to the winning neuron.
Repetition: steps 2 to 4 are repeated until no significant change happens in the topological map or the predefined number of epochs is reached.
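The steps above can be sketched compactly in NumPy. The grid size, exponential decay schedule and Gaussian neighborhood below are illustrative choices, not the exact settings used in the experiment:

```python
# A compact NumPy sketch of the SOM loop summarized above (steps 1-5).
import numpy as np

def train_som(data, grid=(10, 10), epochs=100, lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    h, w = grid
    dim = data.shape[1]
    # 1. Initialization: random synaptic weights, same dimension as the input.
    weights = rng.random((h, w, dim))
    # Grid coordinates, used by the neighborhood function.
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    for t in range(n_steps):
        # 2. Sampling: one pattern from the input space.
        x = data[rng.integers(len(data))]
        # 3. Competition: winner = minimum Euclidean distance (BMU).
        d = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(np.argmin(d), d.shape)
        # 4. Adaptation: learning rate and neighborhood width decay with time.
        lr = lr0 * np.exp(-t / n_steps)
        sigma = sigma0 * np.exp(-t / n_steps)
        g = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=2) / (2 * sigma ** 2))
        weights += lr * g[..., None] * (x - weights)
    return weights
```

After training, plotting the BMU of each input vector on the grid reveals the topologically ordered regions discussed in the experiments.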
3.2 Associative Self-Organizing Maps
The A-SOM is described in [johnsson2009associative] and can be considered a SOM which learns to associate its activity with ancillary inputs coming from a number of additional SOMs. It consists of a grid of neurons with a fixed number of neurons and a fixed topology. Each neuron is associated with r+1 weight vectors: one for the main input and one for each of the r ancillary inputs.
The synaptic adaptation of the main and ancillary weight vectors follows the SOM rule above, with a learning rate that decreases after each iteration, a neighbourhood function centred on the BMU, and the main and ancillary neuron activities gating the ancillary updates; the full equations are given in [johnsson2009associative]. At the end of each training epoch, all weight vectors are normalized.
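One ancillary adaptation step can be illustrated as follows. This is a hedged sketch only: the exact update in [johnsson2009associative] differs in detail, and the gating by the neuron's own activity and the normalization scheme below are simplifying assumptions:

```python
# Illustrative sketch of one A-SOM ancillary adaptation step: each neuron
# keeps one extra weight vector per ancillary input and pulls it toward
# the ancillary activity, gated by the neuron's own (main) activity.
# This is NOT the exact update of [johnsson2009associative].
import numpy as np

def associate_step(anc_weights, main_activity, anc_activity, lr=0.1):
    """anc_weights: (n_neurons, anc_dim); main_activity: (n_neurons,);
    anc_activity: (anc_dim,). Returns the updated ancillary weights."""
    gate = main_activity[:, None]                 # each neuron learns in
    delta = anc_activity[None, :] - anc_weights   # proportion to its activity
    new_w = anc_weights + lr * gate * delta
    # Normalize each weight vector at the end of the step (as the text notes).
    norms = np.linalg.norm(new_w, axis=1, keepdims=True)
    return new_w / np.where(norms == 0, 1.0, norms)
```

Neurons that are strongly active for the main input thus come to encode the ancillary activity that co-occurs with it, which is the associative behaviour exploited in the experiment.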
4 Setting Up the Experiment
The experiment has two parts. The first consists of training a SOM map of concepts to learn about a predicate base that describes a family genealogy, using only the NeMuS constant space. The second consists of modeling an Associative SOM, training it with the NeMuS constant and predicate spaces provided by the knowledge base (KB), and comparing the two approaches. Some predicates present in the KB are listed below.
father(Jake, Bill)      mother(Alice, Ted)
father(Jake, John)      mother(Alice, Megan)
mother(Matilda, John)   father(John, Harry)
mother(Matilda, Bill)   father(John, Susan)
father(Bill, Ted)       mother(Mary, Harry)
father(Bill, Megan)     mother(Mary, Susan)
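To make the set-up concrete, a minimal sketch of how family facts like these could be turned into per-constant feature vectors for the SOM is shown below. The bag-of-slots encoding (counting occurrences per predicate/argument position) is an illustrative assumption; the real NeMuS compiler builds richer weighted bindings:

```python
# Illustrative encoding of the family facts into per-constant vectors:
# one feature per (predicate, argument-position) slot, counting how
# often each constant fills that slot. An assumption for illustration,
# not the actual NeMuS constant-space construction.
facts = [
    ("father", "Jake", "Bill"), ("mother", "Alice", "Ted"),
    ("father", "Jake", "John"), ("mother", "Alice", "Megan"),
    ("mother", "Matilda", "John"), ("father", "John", "Harry"),
    ("mother", "Matilda", "Bill"), ("father", "John", "Susan"),
    ("father", "Bill", "Ted"), ("mother", "Mary", "Harry"),
    ("father", "Bill", "Megan"), ("mother", "Mary", "Susan"),
]

def constant_vectors(facts):
    slots = sorted({(p, i) for p, *args in facts for i in range(len(args))})
    consts = sorted({c for _, *args in facts for c in args})
    vecs = {c: [0.0] * len(slots) for c in consts}
    for p, *args in facts:
        for i, c in enumerate(args):
            vecs[c][slots.index((p, i))] += 1.0
    return vecs

vecs = constant_vectors(facts)
```

Constants that play similar roles (e.g. the fathers Jake, Bill and John) end up with similar vectors, which is what lets the map cluster them into regions.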
4.1 SOM training and induction
The SOM is generated and trained using the NeMuS constant space as input. The entire NeMuS structure is generated when a knowledge base is compiled. The codes for each logical element are inserted in an efficient hashed corpus, so that they are uniquely identified by the space, instance and attribute (when it is the case) they belong to. For a detailed description, we refer to [nemus4NeSy16]. Our script simply selects the NeMuS constant space to feed the SOM training phase, which yields a map as shown in Figure 2. The circles represent the instances of one predicate and the ×’s the instances of the other.
After training, we used the resulting map to induce rules that define target predicates such as ancestor. In this experiment, we used the Python language, the Jupyter environment and the MiniSom library. The following is a step-by-step description of how self-organized induction works, given a NeMuS space instance, positive and negative examples, and a vector of examples.
Normalize the NeMuS constant space.
Instantiate the SOM and train it with the normalized space.
Verify the partition of predicate regions (plot).
Perform induction over the positive or negative examples of the target predicate:
generate a random NeMuS vector for each example;
for each example, do:
if it is a positive example, then:
let the training set be a copy of all bindings in which its first constant occurs as 1st argument and its second constant occurs as 2nd;
train the map with the BMU weights of those bindings;
select the BMU for the induction vector;
if it is a negative example:
let the training set be a copy that ignores all bindings in which its first constant occurs as 1st argument and those in which its second constant occurs as 2nd;
train the map with the BMU weights of the remaining bindings;
select the BMU for the induction vector.
Extract the new rule based on the neurons closest to the positive induction vectors:
if an induction vector is close to neurons representing the same predicate, then it assumes this characteristic as well;
if two induction vectors represent the same rule target and, at the end of training, they are located in different regions, then the rule is characterized by the union of these different concepts.
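The induction loop can be sketched as follows, assuming a trained map is available as a `weights` array and each example carries the constant vectors of its two arguments. The helper names, the random induction vector, and the pull-toward/push-away updates are simplifying assumptions for illustration, not the exact Amao procedure:

```python
# Simplified sketch of the self-organized induction loop above.
# `weights` has shape (grid_h, grid_w, dim); each example is
# (vec_arg1, vec_arg2, is_positive). An illustration only.
import numpy as np

def bmu(weights, x):
    """Best-Matching Unit: grid position of the closest neuron."""
    d = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(d), d.shape)

def induce(weights, examples, lr=0.2, seed=0):
    rng = np.random.default_rng(seed)
    induction_bmus = []
    for v1, v2, positive in examples:
        iv = rng.random(weights.shape[2])  # random induction vector
        if positive:
            # pull the induction vector toward the BMU weights of both arguments
            for v in (v1, v2):
                iv += lr * (weights[bmu(weights, v)] - iv)
        else:
            # negative example: move away from the argument BMUs instead
            for v in (v1, v2):
                iv -= lr * (weights[bmu(weights, v)] - iv)
        induction_bmus.append(bmu(weights, iv))  # select the BMU for the vector
    return induction_bmus
```

The grid positions returned for the positive induction vectors are then compared with the predicate regions to extract the rule, as described in the last step above.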
For example, the two induction vectors are located in the mother and father regions respectively. Therefore, we can say that an ancestor can be a father or a mother. Figure 3 shows the SOM after the induction of ancestor(X,Y) knowing ancestor(Jake,John) and ancestor(John,Harry). The triangle represents the induction vector of ancestor(Jake,John), and another marker represents the example ancestor(John,Harry). From the organization of the map, we see that both vectors are near father instances, so we can assume that Jake is the father of somebody that is an ancestor of Harry.
We have run other vector examples and the results were similar to this one; for lack of space they are not shown. However, the full potential of the NeMuS structure of weights lies in their combination, and both maps considered only the bindings of constants from the given vector examples. Next, we present an approach to do that.
4.2 A-SOM training
In this part, we modeled a 20×20 A-SOM using the NeMuS constant space as main input and the NeMuS predicate space as ancillary input. Both spaces were generated from the knowledge base before the experiment. We chose the A-SOM to combine the different views and notations of the same base.
We trained the map with two approaches. The first used only the NeMuS constant space, as in the first part of the experiment, to compare the final results. Then we trained it using the two spaces described in the last paragraph. The comparison of these maps is shown in Figures 4 and 5. As in the first part, the circles represent the instances of one predicate and the ×’s the instances of the other.
The arrangement of regions in the associative maps turned out a little different from that presented by the SOM. This was due to the smaller number of A-SOM training iterations (1,000) compared to the SOM (10,000).
Because it is a more complex model, which deals with main and ancillary inputs, the A-SOM has a much larger number of vectors for synaptic adaptation (one main plus one ancillary weight vector per neuron, against a single weight vector per neuron in the SOM). It would therefore take much longer to train the two models with the same number of iterations, and we did not have enough equipment for the deeper analysis that we expect to carry out in the near future.
4.3 Dealing with Positive and negative examples
What differentiates the training of the induction vectors of positive and negative examples is the selection of the instances used for training. If the example is positive, we select only the instances where the constants present in the example appear, respecting their order. Otherwise, we select all other instances. In the end, positive and even negative induction vectors will be located close to some region of similarity, and this is a limitation of our model: interpreting the location of the induction vectors of negative examples, since it cannot be done as described in section 4.1.
We have presented and experimented with a model for reasoning over a knowledge base using the Self-Organizing Map (SOM) and one of its variants, the Associative Self-Organizing Map (A-SOM). The A-SOM develops a representation of its input space, but also learns to associate its activity with the activities of an arbitrary number of ancillary inputs. In our experiments, we connected an A-SOM to a list of main and ancillary inputs provided by the knowledge base using the NeMuS space notation.
As a result, we have presented an embryonic way to recognize patterns among predicates and induce rules over them using Self-Organizing Maps. However, the existence of these regions is yet to be formally proven, although it can be clearly seen in the map plots.
5 Related Work
Inductive reasoning has been the building block for the successful development of ILP approaches [Muggleton2012], as well as for the establishment of NeSy computing as an effective methodology for the integration of machine learning and reasoning [2019arXiv190506088D]. Both have the benefit of having a logic language as the framework for generating human-interpretable explanations, absent from other ML approaches and artificial neural network (ANN) models of learning.
However, the recent advances in AI brought by the groundbreaking achievements of deep learning [leCunBengioDeepLearn] and its applications, e.g. [silver2017mastering], have drawn attention from the ANN side to unveiling “black-box” computations, even though these models have surpassed human abilities in some application domains. An avalanche of works endowing deep ANNs with ILP and NeSy-like features has emerged to meet the XAI challenges, e.g. [dILP18].
Our work brings a new contribution to XAI, although we have not had the time and resources for massive data-set experimentation or for mathematical proofs of the existence of such regions before any computation starts. Inductive reasoning endowed with a self-organized learning feature points toward a direction in which XAI systems will not only be able to explain their computation, but also give intuitive justification for reasoning with shortcuts like the one presented here, and evolve their learning and reasoning mechanisms as expected of a human-like AI system.
6 Conclusion and Future Works
This paper has shown some preliminary results of exploring the NeMuS weighted structure to generate patterns of similarities from a small set of relational logical formulae. Those patterns can be used as a strategy to find recursive rules more efficiently. Although not yet integrated within the Amao platform, the results presented here do support Amao’s learning strategy when dealing with potentially recursive hypotheses. The patterns of similarities among relational logical formulae indicate how they can be used for inductive learning purposes.
For lack of time, it was not possible to experiment with the use of similar regions of concepts to guide predicate invention. Furthermore, the self-organizing approach brought interesting aspects that may be exploited in experiments with dynamic knowledge bases and with non-monotonic learning and reasoning based on maps of concepts.
Future work will focus on such aspects, as well as on making more efficient use of the weighted structures of concepts within Amao and interacting more directly with its learning components. This will help to investigate its use in learning and reasoning over complex formulae, as well as in dealing with noise, uncertainty and possible worlds. We then aim to incorporate deep learning-like mechanisms for training on massive structured datasets.