Logic Tensor Networks: Deep Learning and Logical Reasoning from Data and Knowledge

06/14/2016 ∙ by Luciano Serafini, et al. ∙ City, University of London ∙ Fondazione Bruno Kessler

We propose Logic Tensor Networks: a uniform framework for integrating automatic learning and reasoning. A logic formalism called Real Logic is defined on a first-order language whereby formulas have truth-value in the interval [0,1] and semantics defined concretely on the domain of real numbers. Logical constants are interpreted as feature vectors of real numbers. Real Logic promotes a well-founded integration of deductive reasoning on a knowledge-base and efficient data-driven relational machine learning. We show how Real Logic can be implemented in deep Tensor Neural Networks with the use of Google's tensorflow primitives. The paper concludes with experiments applying Logic Tensor Networks on a simple but representative example of knowledge completion.


1 Introduction

The recent availability of large-scale data combining multiple data modalities, such as image, text, audio and sensor data, has opened up various research and commercial opportunities, underpinned by machine learning methods and techniques [Bengio:2009:LDA:1658423.1658424, 44806, Kephart:2003:VAC:642194.642200, kiela-bottou-2014]. In particular, recent work in machine learning has sought to combine logical services, such as knowledge completion, approximate inference, and goal-directed reasoning, with data-driven statistical and neural network-based approaches. We argue that there are great possibilities for improving the current state of the art in machine learning and artificial intelligence (AI) through the principled combination of knowledge representation, reasoning and learning. Guha's recent position paper [towards-a-model-theory-for-distributed-representations-guha2015] is a case in point, as it advocates a new model theory for real-valued numbers. In this paper, we take inspiration from such recent work in AI, but also less recent work in the area of neural-symbolic integration [tr-bottou-2011, DBLP:series/cogtech/GarcezLG2009, DBLP:journals/ml/DiligentiGMR12] and in semantic attachment and symbol grounding [DBLP:journals/neco/BarrettFD08] to achieve a vector-based representation which can be shown adequate for integrating machine learning and reasoning in a principled way.

This paper proposes a framework called Logic Tensor Networks (LTN) which integrates learning based on tensor networks [SocherChenManningNg2013] with reasoning using first-order many-valued logic [bergmann2008introduction], all implemented in TensorFlow [tensorflow2015-whitepaper]. This enables, for the first time, a range of knowledge-based tasks using rich knowledge representation in first-order logic (FOL) to be combined with efficient data-driven machine learning based on the manipulation of real-valued vectors. (In practice, FOL reasoning including function symbols is approximated through the usual iterative deepening of clause depth.) Given data available in the form of real-valued vectors, logical soft and hard constraints and relations which apply to certain subsets of the vectors can be specified compactly in first-order logic. Reasoning about such constraints can help improve learning, and learning from new data can revise such constraints, thus modifying reasoning. An adequate vector-based representation of the logic, first proposed in this paper, enables the above integration of learning and reasoning, as detailed in what follows.

We are interested in providing a computationally adequate approach to implementing learning and reasoning [Valiant:1999:RL:301250.301425] in an integrated way within an idealized agent. This agent has to manage knowledge about an unbounded, possibly infinite, set of objects $O = \{o_1, o_2, \dots\}$. Some of the objects are associated with a set of quantitative attributes, represented by an $n$-tuple of real values $\mathcal{G}(o_i) \in \mathbb{R}^n$, which we call grounding. For example, a person may have a grounding into a 4-tuple containing some numerical representation of the person's name, her height, weight, and number of friends in some social network. Object tuples can participate in a set of relations $\mathcal{R} = \{R_1, \dots, R_k\}$, with $R_i \subseteq O^{\alpha(R_i)}$, where $\alpha(R_i)$ denotes the arity of relation $R_i$. We presuppose the existence of a latent (unknown) relation between the above numerical properties, i.e. groundings, and partial relational structure $\mathcal{R}$ on $O$. Starting from this partial knowledge, an agent is required to: (i) infer new knowledge about the relational structure on the objects of $O$; (ii) predict the numerical properties or the class of the objects in $O$.

Classes and relations are not normally independent. For example, it may be the case that if an object $x$ is of class $C$, $C(x)$, and it is related to another object $y$ through relation $R(x,y)$ then this other object $y$ should be in the same class $C(y)$. In logic: $(C(x) \wedge R(x,y)) \rightarrow C(y)$. Whether or not $C(y)$ holds will depend on the application: through reasoning, one may derive $C(y)$ where otherwise there might not have been evidence of $C(y)$ from training examples only; through learning, one may need to revise such a conclusion once examples to the contrary become available. The vectorial representation proposed in this paper permits both reasoning and learning as exemplified above and detailed in the next section.

The above forms of reasoning and learning are integrated in a unifying framework, implemented within tensor networks, and exemplified in relational domains combining data and relational knowledge about the objects. It is expected that, through an adequate integration of numerical properties and relational knowledge, differently from the immediate related literature [DBLP:journals/dagstuhl-reports/GarcezGHL14, AAAISpring, COCONIPS], the framework introduced in this paper will be capable of combining in an effective way first-order logical inference on open domains with efficient relational multi-class learning using tensor networks.

The main contribution of this paper is two-fold. It introduces a novel framework for the integration of learning and reasoning which can take advantage of the representational power of (multi-valued) first-order logic, and it instantiates the framework using tensor networks into an efficient implementation which shows that the proposed vector-based representation of the logic offers an adequate mapping between symbols and their real-world manifestations, which is appropriate for both rich inference and learning from examples.

The paper is organized as follows. In Section 2, we define Real Logic. In Section 3, we propose the Learning-as-Inference framework. In Section 4, we instantiate the framework by showing how Real Logic can be implemented in deep Tensor Neural Networks leading to Logic Tensor Networks (LTN). Section 5 contains an example of how LTN handles knowledge completion using (possibly inconsistent) data and knowledge from the well-known smokers and friends experiment. Section 6 discusses related work. Section 7 concludes the paper and discusses directions for future work.

2 Real Logic

We start from a first order language $\mathcal{L}$, whose signature contains a set $\mathcal{C}$ of constant symbols, a set $\mathcal{F}$ of functional symbols, and a set $\mathcal{P}$ of predicate symbols. The sentences of $\mathcal{L}$ are used to express relational knowledge, e.g. the atomic formula $R(o_1, o_2)$ states that objects $o_1$ and $o_2$ are related to each other through binary relation $R$; $\forall x y.(R(x,y) \rightarrow R(y,x))$ states that $R$ is a symmetric relation, where $x$ and $y$ are variables; $\exists y.R(o_1, y)$ states that there is an (unknown) object which is related to object $o_1$ through $R$. For simplicity, without loss of generality, we assume that all logical sentences of $\mathcal{L}$ are in prenex conjunctive, skolemised normal form [Huth:2004:LCS:975331], e.g. a sentence $\forall x (A(x) \rightarrow \exists y R(x,y))$ is transformed into an equivalent clause $\neg A(x) \vee R(x, f(x))$, where $f$ is a new function symbol.

As for the semantics of $\mathcal{L}$, we deviate from the standard abstract semantics of FOL, and we propose a concrete semantics with sentences interpreted as tuples of real numbers. To emphasise the fact that $\mathcal{L}$ is interpreted in a “real” world, we use the term (semantic) grounding, denoted by $\mathcal{G}$, instead of the more standard interpretation. (In logic, the term “grounding” indicates the operation of replacing the variables of a term/formula with constants. To avoid confusion, we use the term “instantiation” for this.)

  • $\mathcal{G}$ associates an $n$-tuple of real numbers $\mathcal{G}(t)$ to any closed term $t$ of $\mathcal{L}$; intuitively $\mathcal{G}(t)$ is the set of numeric features of the object denoted by $t$.

  • $\mathcal{G}$ associates a real number in the interval $[0,1]$ to each clause $\phi$ of $\mathcal{L}$. Intuitively, $\mathcal{G}(\phi)$ represents one’s confidence in the truth of $\phi$; the higher the value, the higher the confidence.

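A grounding is specified only for the elements of the signature of $\mathcal{L}$. The grounding of terms and clauses is defined inductively, as follows.

Definition 1

A grounding $\mathcal{G}$ for a first order language $\mathcal{L}$ is a function from the signature of $\mathcal{L}$ to the real numbers that satisfies the following conditions:

  1. $\mathcal{G}(c) \in \mathbb{R}^n$ for every constant symbol $c \in \mathcal{C}$;

  2. $\mathcal{G}(f) : \mathbb{R}^{n \cdot \alpha(f)} \rightarrow \mathbb{R}^n$ for every $f \in \mathcal{F}$;

  3. $\mathcal{G}(P) : \mathbb{R}^{n \cdot \alpha(P)} \rightarrow [0,1]$ for every $P \in \mathcal{P}$.

A grounding $\mathcal{G}$ is inductively extended to all the closed terms and clauses, as follows:

$$\mathcal{G}(f(t_1, \dots, t_m)) = \mathcal{G}(f)(\mathcal{G}(t_1), \dots, \mathcal{G}(t_m))$$
$$\mathcal{G}(P(t_1, \dots, t_m)) = \mathcal{G}(P)(\mathcal{G}(t_1), \dots, \mathcal{G}(t_m))$$
$$\mathcal{G}(\neg P(t_1, \dots, t_m)) = 1 - \mathcal{G}(P(t_1, \dots, t_m))$$
$$\mathcal{G}(\phi_1 \vee \dots \vee \phi_k) = \mu(\mathcal{G}(\phi_1), \dots, \mathcal{G}(\phi_k))$$

where $\mu$ is an s-norm operator, also known as a t-co-norm operator (i.e. the dual of some t-norm operator). Examples of t-norms which can be chosen here are Łukasiewicz, product, and Gödel. The Łukasiewicz s-norm is defined as $\mu_{Luk}(x,y) = \min(x+y, 1)$; the product s-norm is defined as $\mu_{Pr}(x,y) = x + y - x \cdot y$; the Gödel s-norm is defined as $\mu_{max}(x,y) = \max(x,y)$.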
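The inductive extension of a grounding to clauses can be sketched in a few lines of Python. The following is a minimal illustration, assuming the standard definitions of the three s-norms and the usual reading of negation as $1 - x$; the function names and the truth-values 0.9 and 0.3 are hypothetical and chosen only for illustration.

```python
from functools import reduce
from typing import Callable, Sequence

SNorm = Callable[[float, float], float]


def s_lukasiewicz(x: float, y: float) -> float:
    """Lukasiewicz s-norm: min(x + y, 1)."""
    return min(x + y, 1.0)


def s_product(x: float, y: float) -> float:
    """Product s-norm: x + y - x * y."""
    return x + y - x * y


def s_goedel(x: float, y: float) -> float:
    """Goedel s-norm: max(x, y)."""
    return max(x, y)


def ground_negation(truth: float) -> float:
    """Grounding of a negated atom, assumed here to be 1 - truth."""
    return 1.0 - truth


def ground_clause(literal_values: Sequence[float], s_norm: SNorm) -> float:
    """Grounding of a disjunction: fold the s-norm over the literal values."""
    return reduce(s_norm, literal_values)


# Clause  not A(x) or R(x, f(x))  with illustrative truth-values for the atoms.
g_A, g_R = 0.9, 0.3
literals = [ground_negation(g_A), g_R]

for name, s_norm in [("Lukasiewicz", s_lukasiewicz),
                     ("product", s_product),
                     ("Goedel", s_goedel)]:
    print(name, round(ground_clause(literals, s_norm), 2))
```

The three s-norms agree on crisp values (0 and 1) and differ only on intermediate ones, which is why the choice of s-norm is left as a parameter of the logic.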

Example 1

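Suppose that $O = \{o_1, o_2, o_3\}$ is a set of documents defined on a finite dictionary $D = \{w_1, \dots, w_n\}$ of $n$ words. Let $\mathcal{L}$ be the language that contains the binary function symbol $concat(x,y)$ denoting the document resulting from the concatenation of documents $x$ with $y$. Let $\mathcal{L}$ contain also the binary predicate $Sim(x,y)$ which is supposed to be true if document $x$ is deemed to be similar to document $y$. An example of grounding is the one that associates to each document its bag-of-words vector [Blei:2003:LDA:944919.944937]. As a consequence, a natural grounding of the concat function would be the sum of the vectors, and of the Sim predicate, the cosine similarity between the vectors. More formally:

  • $\mathcal{G}(o_i) = \langle n^{o_i}_{w_1}, \dots, n^{o_i}_{w_n} \rangle$, where $n^{d}_{w}$ is the number of occurrences of word $w$ in document $d$;

  • if $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$, $\mathcal{G}(concat)(\mathbf{u}, \mathbf{v}) = \mathbf{u} + \mathbf{v}$;

  • if $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$, $\mathcal{G}(Sim)(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \, \|\mathbf{v}\|}$.

For instance, if the three documents are $o_1$ = “John studies logic and plays football”, $o_2$ = “Mary plays football and logic games”, $o_3$ = “John and Mary play football and study logic together”, and $D$ = {John, Mary, and, football, game, logic, play, study, together} then the following are examples of the grounding of terms, atomic formulas and clauses (the last one using the Gödel s-norm).

$$\mathcal{G}(o_1) = \langle 1, 0, 1, 1, 0, 1, 1, 1, 0 \rangle$$
$$\mathcal{G}(o_2) = \langle 0, 1, 1, 1, 1, 1, 1, 0, 0 \rangle$$
$$\mathcal{G}(o_3) = \langle 1, 1, 2, 1, 0, 1, 1, 1, 1 \rangle$$
$$\mathcal{G}(concat(o_1, o_2)) = \mathcal{G}(o_1) + \mathcal{G}(o_2) = \langle 1, 1, 2, 2, 1, 2, 2, 1, 0 \rangle$$
$$\mathcal{G}(Sim(concat(o_1, o_2), o_3)) = \frac{13}{\sqrt{20} \cdot \sqrt{11}} \approx 0.88$$
$$\mathcal{G}(Sim(o_1, o_3) \vee Sim(o_2, o_3)) = \max\left(\tfrac{7}{\sqrt{66}}, \tfrac{6}{\sqrt{66}}\right) \approx 0.86$$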
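The groundings in this example can be checked with a short script. This is a minimal sketch: the tiny lemma table (mapping "studies" to "study", and so on) is a hypothetical stand-in for a real lemmatizer, and the Gödel s-norm (max) is assumed for the disjunction.

```python
import math

# Dictionary of the example, in the order given in the text.
DICTIONARY = ["John", "Mary", "and", "football", "game",
              "logic", "play", "study", "together"]

# Hypothetical hand-written lemma table, just enough for the three documents.
LEMMAS = {"studies": "study", "plays": "play", "games": "game"}


def bag_of_words(text):
    """Grounding of a document constant: word counts over DICTIONARY."""
    tokens = [LEMMAS.get(tok, tok) for tok in text.split()]
    return [tokens.count(word) for word in DICTIONARY]


def concat(u, v):
    """Grounding of concat: element-wise sum of the two vectors."""
    return [a + b for a, b in zip(u, v)]


def sim(u, v):
    """Grounding of Sim: cosine similarity of the two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


o1 = bag_of_words("John studies logic and plays football")
o2 = bag_of_words("Mary plays football and logic games")
o3 = bag_of_words("John and Mary play football and study logic together")

print(o1, o2, o3)
print(concat(o1, o2))
print(round(sim(concat(o1, o2), o3), 2))          # atomic formula
print(round(max(sim(o1, o3), sim(o2, o3)), 2))    # clause, Goedel s-norm
```

Since bag-of-words vectors are non-negative, the cosine similarity always falls in $[0,1]$, which is what makes it usable as a truth-value here.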

3 Learning as approximate satisfiability

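We start by defining grounded theories and their satisfiability.

Definition 2 (Satisfiability)

Let $\phi$ be a closed clause in $\mathcal{L}$, $\mathcal{G}$ a grounding, and $v \leq w \in [0,1]$. We say that $\mathcal{G}$ satisfies $\phi$ in the confidence interval $[v,w]$, written $\mathcal{G} \models_v^w \phi$, if $v \leq \mathcal{G}(\phi) \leq w$.

A partial grounding, denoted by $\hat{\mathcal{G}}$, is a grounding that is defined on a subset of the signature of $\mathcal{L}$. A grounded theory is a set of clauses in the language of $\mathcal{L}$ and partial grounding $\hat{\mathcal{G}}$.

Definition 3 (Grounded Theory)

A grounded theory (GT) is a pair $\langle \mathcal{K}, \hat{\mathcal{G}} \rangle$ where $\mathcal{K}$ is a set of pairs $\langle [v,w], \phi(\mathbf{x}) \rangle$, where $\phi(\mathbf{x})$ is a clause of $\mathcal{L}$ containing the set $\mathbf{x}$ of free variables, and $[v,w]$ is an interval contained in $[0,1]$, and $\hat{\mathcal{G}}$ is a partial grounding.

Definition 4 (Satisfiability of a Grounded Theory)

A GT $\langle \mathcal{K}, \hat{\mathcal{G}} \rangle$ is satisfiable if there exists a grounding $\mathcal{G}$, which extends $\hat{\mathcal{G}}$ such that for all $\langle [v,w], \phi(\mathbf{x}) \rangle \in \mathcal{K}$ and any tuple $\mathbf{t}$ of closed terms, $\mathcal{G} \models_v^w \phi(\mathbf{t})$.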

From the previous definition it follows that checking if a GT $\langle \mathcal{K}, \hat{\mathcal{G}} \rangle$ is satisfiable amounts to searching for an extension of the partial grounding $\hat{\mathcal{G}}$ in the space of all possible groundings, such that all the instantiations of the clauses in $\mathcal{K}$ are satisfied w.r.t. the specified interval. Clearly this is infeasible from a practical point of view. As is usual, we must restrict both the space of groundings and clause instantiations. Let us consider each in turn. To check satisfiability on a subset of all the functions on real numbers, recall that a grounding should capture a latent correlation between the quantitative attributes of an object and its relational properties. For example, whether a document is classified as from the field of Artificial Intelligence (AI) depends on its bag-of-words grounding. If the language $\mathcal{L}$ contains the unary predicate $AI(x)$ standing for “$x$ is a paper about AI” then the grounding of $AI$, which is a function from bag-of-words vectors to [0,1], should assign values close to 1 to the vectors which are close semantically to AI. Furthermore, if two vectors are similar (e.g. according to the cosine similarity measure) then their grounding should be similar. In particular, we are interested in searching within a specific class of functions, in this paper based on tensor networks, although other families of functions can be considered. To limit the number of clause instantiations, which in general might be infinite since $\mathcal{L}$ admits function symbols, the usual approach is to consider the instantiations of each clause up to a certain depth [DBLP:series/faia/Achlioptas09].

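When a grounded theory $\langle \mathcal{K}, \hat{\mathcal{G}} \rangle$ is inconsistent, that is, there is no grounding $\mathcal{G}$ that satisfies it, we are interested in finding a grounding which satisfies as much as possible of $\langle \mathcal{K}, \hat{\mathcal{G}} \rangle$. For any $\langle [v,w], \phi \rangle \in \mathcal{K}$ we want to find a grounding $\mathcal{G}$ that minimizes the satisfiability error. An error occurs when a grounding $\mathcal{G}$ assigns a value $\mathcal{G}(\phi)$ to a clause $\phi$ which is outside the interval $[v,w]$ prescribed by $\mathcal{K}$. The measure of this error can be defined as the minimal distance between the points in the interval $[v,w]$ and $\mathcal{G}(\phi)$:

$$\mathrm{Loss}(\mathcal{G}, \langle [v,w], \phi \rangle) = \min_{v \leq x \leq w} |x - \mathcal{G}(\phi)| \qquad (1)$$

Notice that if $\mathcal{G} \models_v^w \phi$, $\mathrm{Loss}(\mathcal{G}, \langle [v,w], \phi \rangle) = 0$.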
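The satisfiability check and the error measure of Eq. (1) reduce to simple interval arithmetic. A minimal sketch follows; the function names `satisfies` and `loss` are illustrative and not part of any library.

```python
def satisfies(value: float, v: float, w: float) -> bool:
    """True if the truth-value lies in the confidence interval [v, w]."""
    return v <= value <= w


def loss(value: float, v: float, w: float) -> float:
    """Minimal distance between the truth-value and the interval [v, w]."""
    if value < v:
        return v - value
    if value > w:
        return value - w
    return 0.0


# A clause required to hold with confidence in [0.8, 1.0].
for truth in (0.3, 0.8, 0.95):
    print(truth, satisfies(truth, 0.8, 1.0), loss(truth, 0.8, 1.0))
```

The loss is zero exactly when the grounding satisfies the clause in the given interval, and grows linearly with the distance to the nearest endpoint otherwise.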

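The above gives rise to the following definition of approximate satisfiability w.r.t. a family $\mathbb{G}$ of grounding functions on the language $\mathcal{L}$.

Definition 5 (Approximate satisfiability)

Let $\langle \mathcal{K}, \hat{\mathcal{G}} \rangle$ be a grounded theory and $\mathcal{K}_0$ a finite subset of the instantiations of the clauses in $\mathcal{K}$, i.e.

$$\mathcal{K}_0 \subseteq \{ \langle [v,w], \phi(\mathbf{t}) \rangle \mid \langle [v,w], \phi(\mathbf{x}) \rangle \in \mathcal{K} \text{ and } \mathbf{t} \text{ is any tuple of closed terms} \}.$$

Let $\mathbb{G}$ be a family of grounding functions. We define the best satisfiability problem as the problem of finding an extension $\mathcal{G}^*$ of $\hat{\mathcal{G}}$ in $\mathbb{G}$ that minimizes the satisfiability error on the set $\mathcal{K}_0$, that is:

$$\mathcal{G}^* = \operatorname*{argmin}_{\hat{\mathcal{G}} \subseteq \mathcal{G} \in \mathbb{G}} \sum_{\langle [v,w], \phi(\mathbf{t}) \rangle \in \mathcal{K}_0} \mathrm{Loss}(\mathcal{G}, \langle [v,w], \phi(\mathbf{t}) \rangle)$$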
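To make the best satisfiability problem concrete, the sketch below searches a deliberately tiny family of groundings: truth-values for two ground atoms taken from a five-point grid. This exhaustive search is an assumption made for illustration only; the paper instead searches a space of tensor networks by gradient-based optimization. The atom names, the two clauses, and the choice of the Łukasiewicz s-norm are likewise illustrative.

```python
from itertools import product


def s_lukasiewicz(x, y):
    return min(x + y, 1.0)


def loss(value, v, w):
    """Distance between a truth-value and the interval [v, w]."""
    if value < v:
        return v - value
    if value > w:
        return value - w
    return 0.0


# Finite set K0 of instantiated clauses: (interval, evaluator of the clause).
# Clause 1: S(a)               required in [1, 1]
# Clause 2: not S(a) or C(a)   required in [1, 1]
K0 = [
    ((1.0, 1.0), lambda g: g["S(a)"]),
    ((1.0, 1.0), lambda g: s_lukasiewicz(1.0 - g["S(a)"], g["C(a)"])),
]

# The restricted family of groundings: all grid assignments to the two atoms.
GRID = [0.0, 0.25, 0.5, 0.75, 1.0]


def total_loss(grounding):
    return sum(loss(clause(grounding), v, w) for (v, w), clause in K0)


def best_grounding():
    candidates = ({"S(a)": s, "C(a)": c} for s, c in product(GRID, GRID))
    return min(candidates, key=total_loss)


best = best_grounding()
print(best, total_loss(best))
```

Here the two clauses are jointly satisfiable, so the minimum error is zero; with conflicting clauses the same search would return a grounding with the smallest positive error.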

4 Implementing Real Logic in Tensor Networks

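Specific instances of Real Logic can be obtained by selecting the space $\mathbb{G}$ of groundings and the specific s-norm for the interpretation of disjunction. In this section, we describe a realization of Real Logic where $\mathbb{G}$ is the space of real tensor transformations of order $k$ (where $k$ is a parameter). In this space, function symbols are interpreted as linear transformations. More precisely, if $f$ is a function symbol of arity $m$ and $\mathbf{v}_1, \dots, \mathbf{v}_m \in \mathbb{R}^n$ are real vectors corresponding to the grounding of $m$ terms then $\mathcal{G}(f)(\mathbf{v}_1, \dots, \mathbf{v}_m)$ can be written as:

$$\mathcal{G}(f)(\mathbf{v}_1, \dots, \mathbf{v}_m) = M_f \mathbf{v} + N_f$$

for some $n \times mn$ matrix $M_f$ and $n$-vector $N_f$, where $\mathbf{v} = \langle \mathbf{v}_1, \dots, \mathbf{v}_m \rangle$.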
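As an illustration of function symbols grounded as linear transformations, the sketch below stacks the argument vectors and applies a matrix and an offset vector. The helper name `ground_function` is hypothetical; the example instantiates it so that a binary function is grounded as vector addition, i.e. the matrix is two identity blocks side by side and the offset is zero.

```python
def ground_function(M, N, *args):
    """Apply the linear grounding M v + N to the stacked argument vectors.

    M is an n x (m*n) matrix given as a list of rows, N is a list of length n,
    and args are m vectors of length n.
    """
    v = [x for arg in args for x in arg]  # stack the m argument vectors
    return [sum(m_ij * x for m_ij, x in zip(row, v)) + b
            for row, b in zip(M, N)]


n = 3
# Two n x n identity blocks side by side: M v = v1 + v2.
M_sum = [[1 if j % n == i else 0 for j in range(2 * n)] for i in range(n)]
N_sum = [0] * n

print(ground_function(M_sum, N_sum, [1, 0, 2], [0, 1, 1]))
```

In an actual network the entries of the matrix and the offset would be learned rather than fixed by hand.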

The grounding of an $m$-ary predicate $P$, $\mathcal{G}(P)$, is defined as a generalization of the neural tensor network [SocherChenManningNg2013] (which has been shown effective at knowledge compilation in the presence of simple logical constraints), as a function from $\mathbb{R}^{mn}$ to $[0,1]$, as follows:

$$\mathcal{G}(P)(\mathbf{v}) = \sigma\left( u_P^{\top} \tanh\left( \mathbf{v}^{\top} W_P^{[1:k]} \mathbf{v} + V_P \mathbf{v} + B_P \right) \right) \qquad (2)$$

where $W_P^{[1:k]}$ is a 3-D tensor in $\mathbb{R}^{mn \times mn \times k}$, $V_P$ is a matrix in $\mathbb{R}^{k \times mn}$, and $B_P$ is a vector in $\mathbb{R}^k$, and $\sigma$ is the sigmoid function. With this encoding, the grounding (i.e. truth-value) of a clause can be determined by a neural network which first computes the grounding of the literals contained in the clause, and then combines them using the specific s-norm. An example of tensor network for $P(x,y) \rightarrow A(y)$ is shown in Figure 1.

Figure 1: Tensor net for $P(x,y) \rightarrow A(y)$, with $\mathcal{G}(x) = \mathbf{v}$ and $\mathcal{G}(y) = \mathbf{u}$ and $k = 2$.

This architecture is a generalization of the structure proposed in [SocherChenManningNg2013], which has been shown to be rather effective for the task of knowledge compilation, also in the presence of simple logical constraints. In the above tensor network formulation, $W_*$, $V_*$, $B_*$ and $u_*$ with $* \in \{P, A\}$ are parameters to be learned by minimizing the loss function or, equivalently, by maximizing the satisfiability of the clause $P(x,y) \rightarrow A(y)$.
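The forward pass of a predicate grounding can be sketched without any deep learning library. The code below assumes the neural-tensor-network form $\sigma(u^{\top} \tanh(\mathbf{v}^{\top} W^{[1:k]} \mathbf{v} + V \mathbf{v} + B))$, with one bilinear slice per layer; parameter names, shapes and the random initialisation are illustrative, and no training is performed.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ground_predicate(v, W, V, B, u):
    """Truth-value in (0, 1) of a predicate applied to the stacked vector v.

    W: k slices, each a dim x dim matrix (bilinear term)
    V: k x dim matrix (linear term), B: k biases, u: k output weights
    """
    dim = len(v)
    hidden = []
    for W_i, V_i, b_i in zip(W, V, B):
        bilinear = sum(v[a] * W_i[a][b] * v[b]
                       for a in range(dim) for b in range(dim))
        linear = sum(V_i[a] * v[a] for a in range(dim))
        hidden.append(math.tanh(bilinear + linear + b_i))
    return sigmoid(sum(u_i * h_i for u_i, h_i in zip(u, hidden)))


def random_params(dim, k, rng):
    """Illustrative random parameters; a real model would learn these."""
    W = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
         for _ in range(k)]
    V = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(k)]
    B = [rng.uniform(-1, 1) for _ in range(k)]
    u = [rng.uniform(-1, 1) for _ in range(k)]
    return W, V, B, u


rng = random.Random(0)
x, y = [0.2, -0.4], [1.0, 0.5]       # groundings of two terms, n = 2
v = x + y                            # stacked input for a binary predicate
W, V, B, u = random_params(dim=len(v), k=2, rng=rng)
print(ground_predicate(v, W, V, B, u))
```

Because every operation is differentiable, the same computation expressed in an automatic-differentiation framework lets the parameters be fitted by minimizing the satisfiability error.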

5 An Example of Knowledge Completion

Logic Tensor Networks have been implemented as a Python library called ltn using Google’s TensorFlow. To test our idea, in this section we use the well-known friends and smokers example [Richardson-and-domingos-MLN-2006] to illustrate the task of knowledge completion in ltn. (Normally, a probabilistic approach is taken to solve this problem, and one that requires instantiating all clauses to remove variables, essentially turning the problem into a propositional one; ltn takes a different approach.) There are 14 people divided into two groups $\{a, b, \dots, h\}$ and $\{i, j, \dots, n\}$. Within each group of people we have complete knowledge of their smoking habits. In the first group, we have complete knowledge of who has and does not have cancer. In the second group, this is not known for any of the persons. Knowledge about the friendship relation is complete within each group only if symmetry of friendship is assumed. Otherwise, it is incomplete in that it may be known that, e.g., $x$ is a friend of $y$, but not known whether $y$ is a friend of $x$. Finally, there is also general knowledge about smoking, friendship and cancer, namely, that smoking causes cancer, friendship is normally a symmetric and anti-reflexive relation, everyone has a friend, and that smoking propagates (either actively or passively) among friends. All this knowledge can be represented by the knowledge-bases shown in Figure 2.

Figure 2: Knowledge-bases for the friends-and-smokers example.

The facts contained in the knowledge-bases should have different degrees of truth, and this is not known. Otherwise, the combined knowledge-base would be inconsistent (it would entail both a fact and its negation). Our main task is to complete the knowledge-base (KB), that is: (i) find the degree of truth of the facts contained in KB, (ii) find a truth-value for all the missing facts, e.g. whether a person in the second group has cancer, (iii) find the grounding of each constant symbol $a, \dots, n$. (Notice how no grounding is provided about the signature of the knowledge-base.) To answer (i)-(iii), we use ltn to find a grounding that best approximates the complete KB. We start by assuming that all the facts contained in the knowledge-base are true (i.e. have degree of truth 1). To show the role of background knowledge in the learning-inference process, we run two experiments. In the first experiment, we seek to complete a KB consisting of only factual knowledge, i.e. the facts known about the two groups. In the second experiment, we also include background knowledge, i.e. the general knowledge about smoking, friendship and cancer.

We configure the network as follows: each constant (i.e. person) can have up to 30 real-valued features. We set the number of layers $k$ in the tensor network to 10, and use a regularization parameter $\lambda$ (a smoothing factor added to the loss function to create a preference for learned parameters with a lower absolute value). For the purpose of illustration, we use the Łukasiewicz t-norm with s-norm $\mu(a,b) = \min(1, a+b)$, and use the harmonic mean as aggregation operator. An estimation of the optimal grounding is obtained after 5,000 runs of the RMSProp learning algorithm [rmsprop-tieleman-hinton-2012] available in TensorFlow.
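The combination of the Łukasiewicz s-norm with harmonic-mean aggregation can be illustrated as follows. The sketch assumes that aggregation is applied to the truth-values of the instantiations of one clause, which the text does not spell out; the per-person truth-values are invented for illustration and are not taken from the experiments.

```python
from statistics import harmonic_mean, mean


def s_lukasiewicz(a, b):
    """Lukasiewicz s-norm: min(1, a + b)."""
    return min(1.0, a + b)


# Illustrative truth-values (smokes, has cancer) for three people.
people = {
    "a": (1.0, 0.9),
    "b": (0.1, 0.2),
    "f": (1.0, 0.5),
}

# Instantiate the clause  not S(x) or C(x)  once per person.
instances = [s_lukasiewicz(1.0 - smokes, cancer)
             for smokes, cancer in people.values()]

aggregated = harmonic_mean(instances)
print(instances)
print(round(aggregated, 3), round(mean(instances), 3))
```

Compared with the arithmetic mean, the harmonic mean is pulled more strongly towards the least satisfied instantiation, so a single poorly satisfied instance lowers the aggregate noticeably.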

The results of the two experiments are reported in Table 1. For readability, we use boldface for truth-values greater than 0.5. The truth-values of the facts listed in a knowledge-base are highlighted with the same background color of the knowledge-base in Figure 2. The values with white background are the result of the knowledge completion produced by the LTN learning-inference procedure. To evaluate the quality of the results, one has to check whether (i) the truth-values of the facts listed in a KB are indeed close to 1.0, and (ii) the truth-values associated with knowledge completion correspond to expectation. An initial analysis shows that the LTN associated with the KB of the first experiment produces the same facts as the KB itself. In other words, the LTN fits the data. However, the LTN also learns to infer additional positive and negative facts about friendship and cancer not derivable from the KB by pure logical reasoning; for example, writing $F(x,y)$ for “$x$ is a friend of $y$”: $F(c,b)$, $F(g,b)$ and $\neg F(b,a)$. These facts are derived by exploiting similarities between the groundings of the constants generated by the LTN. For instance, $\mathcal{G}(c)$ and $\mathcal{G}(g)$ happen to present a high cosine similarity measure. As a result, facts about the friendship relations of $c$ affect the friendship relations of $g$ and vice-versa, for instance $F(c,b)$ and $F(g,b)$. The level of satisfiability associated with the KB of the first experiment is approximately 1, which indicates that this KB is classically satisfiable.

        
Table 1: Truth-values learned in the two experiments. Rows are persons; columns S and C give the truth-values for smoking and cancer; columns a to h (respectively i to n) give the truth-value of the row person being a friend of the column person.

Learning and reasoning on the knowledge-base of the first experiment

        S     C     a     b     c     d     e     f     g     h
  a   1.00  1.00  0.00  1.00  0.00  0.00  1.00  1.00  1.00  0.00
  b   0.00  0.00  0.00  0.00  1.00  0.00  0.00  0.00  0.00  0.00
  c   0.00  0.00  0.00  0.82  0.00  1.00  0.00  0.00  0.00  0.00
  d   0.00  0.00  0.00  0.00  0.06  0.00  0.00  0.00  0.00  0.00
  e   1.00  1.00  0.00  0.33  0.21  0.00  0.00  1.00  0.00  0.00
  f   1.00  0.00  0.00  0.00  0.05  0.00  0.00  0.00  0.00  0.00
  g   1.00  0.00  0.03  1.00  1.00  1.00  0.11  1.00  0.00  1.00
  h   0.00  0.00  0.00  0.23  0.01  0.14  0.00  0.02  0.00  0.00

        S     C     i     j     k     l     m     n
  i   1.00  0.00  0.00  1.00  0.00  0.00  1.00  0.00
  j   0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  k   0.00  0.00  0.10  1.00  0.00  1.00  0.00  0.00
  l   0.00  0.00  0.00  0.02  0.00  0.00  0.00  0.00
  m   0.00  0.03  1.00  1.00  0.12  1.00  0.00  1.00
  n   1.00  0.01  0.00  0.98  0.00  0.01  0.02  0.00

Learning and reasoning on the knowledge-base of the second experiment

        S     C     a     b     c     d     e     f     g     h
  a   0.84  0.87  0.02  0.95  0.01  0.03  0.93  0.97  0.98  0.01
  b   0.13  0.16  0.45  0.01  0.97  0.04  0.02  0.03  0.06  0.03
  c   0.13  0.15  0.02  0.94  0.11  0.99  0.03  0.16  0.15  0.15
  d   0.14  0.15  0.01  0.06  0.88  0.08  0.01  0.03  0.07  0.02
  e   0.84  0.85  0.32  0.06  0.05  0.03  0.04  0.97  0.07  0.06
  f   0.81  0.19  0.34  0.11  0.08  0.04  0.42  0.08  0.06  0.05
  g   0.82  0.19  0.81  0.26  0.19  0.30  0.06  0.28  0.00  0.94
  h   0.14  0.17  0.05  0.25  0.26  0.16  0.20  0.14  0.72  0.01

        S     C     i     j     k     l     m     n
  i   0.83  0.86  0.02  0.91  0.01  0.03  0.97  0.01
  j   0.19  0.22  0.73  0.03  0.00  0.04  0.02  0.05
  k   0.14  0.34  0.17  0.07  0.04  0.97  0.04  0.02
  l   0.16  0.19  0.11  0.12  0.15  0.06  0.05  0.03
  m   0.14  0.17  0.96  0.07  0.02  0.11  0.00  0.92
  n   0.84  0.86  0.13  0.28  0.01  0.24  0.69  0.02

Degrees of satisfiability of the background-knowledge axioms in the second experiment, in source order: 0.98; 0.90, 0.90; 0.77 (smoking causes cancer); 0.96, 0.92; 1.0

The results of the second experiment show that more facts can be learned with the inclusion of background knowledge. For example, the LTN now predicts that $i$ and $n$ have cancer. Similarly, from the symmetry of the friendship relation, the LTN concludes that friendships stated in one direction also hold in the other, as expected. In fact, all the axioms in the generic background knowledge are satisfied with a degree of satisfiability higher than 90%, apart from the smoking causes cancer axiom, which has a degree of satisfiability of 77%. This axiom is responsible for the classical inconsistency, since in the data $f$ and $g$ smoke and do not have cancer.

6 Related work

In his recent note [towards-a-model-theory-for-distributed-representations-guha2015], Guha advocates the need for a new model theory for distributed representations (such as those based on embeddings). The note sketches a proposal, where terms and (binary) predicates are all interpreted as points/vectors in an $n$-dimensional real space. The computation of the truth-value of the atomic formulae $P(t_1, \dots, t_n)$ is obtained by comparing the projections of the vector associated to each $t_i$ with that associated to the predicate. Real Logic shares with [towards-a-model-theory-for-distributed-representations-guha2015] the idea that terms must be interpreted in a geometric space. It has, however, a different (and more general) interpretation of functions and predicate symbols. Real Logic is more general because the semantics proposed in [towards-a-model-theory-for-distributed-representations-guha2015] can be implemented within an ltn with a single layer ($k = 1$), since the operation of projection and comparison necessary to compute the truth-value of an atomic formula can be encoded within a matrix subject to a constraint that can be encoded easily in ltn.

Real Logic is orthogonal to the approach taken by (Hybrid) Markov Logic Networks (MLNs) and its variations [Richardson-and-domingos-MLN-2006, DBLP:conf/aaai/WangD08, DBLP:conf/aaai/NathD15]. In MLNs, the level of truth of a formula is determined by the number of models that satisfy the formula: the more models, the higher the degree of truth. Hybrid MLNs introduce a dependency on the real features associated to constants, which are given, and not learned. In Real Logic, instead, the level of truth of a complex formula is determined by (fuzzy) logical reasoning, and the relations between the features of different objects are learned through error minimization. Another difference is that MLNs work under the closed world assumption, while Real Logic is open domain. Much work has been done also on neuro-fuzzy approaches [Kosko:1992:NNF:129386]. These are essentially propositional while Real Logic is first-order.

Bayesian logic (BLOG) [DBLP:conf/ijcai/MilchMRSOK05] is open domain, and in this respect similar to Real Logic and LTNs. But, instead of taking an explicit probabilistic approach, LTNs draw from the efficient approach used by tensor networks for knowledge graphs, as already discussed. LTNs can have a probabilistic interpretation but this is not a requirement. Other statistical AI and probabilistic approaches such as lifted inference fall into this category, including probabilistic variations of inductive logic programming (ILP) [DBLP:series/synthesis/2016Raedt], which are normally restricted to Horn clauses. Metainterpretive ILP [DBLP:journals/ml/MuggletonLT15], together with BLOG, seem closer to LTNs in what concerns the knowledge representation language, but do not explore the benefits of tensor networks for computational efficiency.

An approach for embedding logical knowledge onto data for the purpose of relational learning, similar to Real Logic, is presented in [rocktaschel2015injecting]. Real Logic and [rocktaschel2015injecting] share the idea of interpreting a logical alphabet in an $n$-dimensional real space. Terminologically, the term “grounding” in Real Logic corresponds to “embeddings” in [rocktaschel2015injecting]. However, there are several differences. First, [rocktaschel2015injecting] uses function-free languages, while we provide also groundings for functional symbols. Second, the model used to compute the truth-values of atomic formulas adopted in [rocktaschel2015injecting] is a special case of the more general model proposed in this paper (as described in Eq. (2)). Finally, the semantics of the universal and existential quantifiers adopted in [rocktaschel2015injecting] is based on the closed-world assumption (CWA), i.e. universally (respectively, existentially) quantified formulas are reduced to the finite conjunctions (respectively, disjunctions) of all of their possible instantiations; Real Logic does not make the CWA. Furthermore, Real Logic does not assume a specific t-norm.

As in [DBLP:journals/ml/DiligentiGMR12], LTN is a framework for learning in the presence of logical constraints. LTNs share with [DBLP:journals/ml/DiligentiGMR12] the idea that logical constraints and training examples can be treated uniformly as supervisions of a learning algorithm. LTN introduces two novelties. First, in LTN existential quantifiers are not grounded into a finite disjunction, but are skolemized. In other words, CWA is not required, and existentially quantified formulas can be satisfied by “new individuals”. Second, LTN allows one to generate data for prediction. For instance, if a grounded theory contains the formula $\forall x \exists y R(x,y)$, LTN generates a real function (corresponding to the grounding of the Skolem function introduced by the formula) which for every vector $\mathbf{v}$ returns the feature vector $f(\mathbf{v})$, which can be intuitively interpreted as being the set of features of a typical object which takes part in relation $R$ with the object having features equal to $\mathbf{v}$.

Finally, related work in the domain of neural-symbolic computing and neural network fibring [DBLP:series/cogtech/GarcezLG2009] has sought to combine neural networks with ILP to gain efficiency [DBLP:journals/ml/FrancaZG14] and other forms of knowledge representation, such as propositional modal logic and logic programming. The above are more tightly-coupled approaches. In contrast, LTNs use a richer FOL language, exploit the benefits of knowledge compilation and tensor networks within a more loosely-coupled approach, and might even offer an adequate representation of equality in logic. Experimental evaluations and comparison with other neural-symbolic approaches are desirable though, including the latest developments in the field, a good snapshot of which can be found in [COCONIPS].

7 Conclusion and future work

We have proposed Real Logic: a uniform framework for learning and reasoning. Approximate satisfiability is defined as a learning task with both knowledge and data being mapped onto real-valued vectors. With an inference-as-learning approach, relational knowledge constraints and state-of-the-art data-driven approaches can be integrated. We showed how real logic can be implemented in deep tensor networks, which we call Logic Tensor Networks (LTNs), and applied efficiently to knowledge completion and data prediction tasks. As future work, we will make the implementation of LTN available in TensorFlow and apply it to large-scale experiments and relational learning benchmarks for comparison with statistical relational learning, neural-symbolic computing, and (probabilistic) inductive logic programming approaches.

References