1 Introduction
One of the challenging topics in AI is to reason with huge-scale knowledge bases. Linear algebraic computation has the potential to make symbolic reasoning scalable to real-life datasets, and several studies aim at integrating linear algebraic computation and symbolic computation. For instance, Grefenstette (2013) introduces a tensor-based predicate calculus that realizes logical operations. Yang et al. (2015) introduce a method of mining Horn clauses from relational facts represented in a vector space. Serafini and Garcez (2016) introduce logic tensor networks that integrate logical deductive reasoning and data-driven relational learning. Sato (2017a) formalizes Tarskian semantics of first-order logic in vector spaces, and Sato (2017b) shows that tensorization realizes efficient computation of Datalog. Lin (2013) introduces linear algebraic computation of SAT for clausal theories. To realize linear algebraic computation of logic programming, Sakama (2017) introduces encodings of Horn, disjunctive, and normal logic programs in vector spaces, and shows that least models of Horn programs, minimal models of disjunctive programs, and stable models of normal programs are computed by algebraic manipulation of third-order tensors. The study builds a new theory of logic programming, while implementation and evaluation are left open.
In this paper, we first reformulate the framework of Sakama (2017) and present an algorithm for computing least models of definite programs in vector spaces. We next introduce two optimization techniques: the first is based on column reduction of matrices, and the second is based on partial evaluation. We perform experimental testing and compare algorithms for computing fixpoints of definite programs. The rest of this paper is organized as follows. Section 2 reviews basic notions and Section 3 provides a linear algebraic characterization of logic programming. Section 4 presents partial evaluation of logic programs in vector spaces. Section 5 provides experimental results and Section 6 summarizes the paper. Due to the space limit, we omit proofs of propositions and theorems.
2 Preliminaries
We consider a language $\mathcal{L}$ that contains a finite set of propositional variables. Given a logic program $P$, the set of all propositional variables appearing in $P$ is called the Herbrand base of $P$ (written $B_P$). A definite program is a finite set of rules of the form:
$$h \leftarrow b_1 \wedge \cdots \wedge b_n \quad (n \geq 0) \qquad (1)$$
where $h$ and $b_1, \ldots, b_n$ are propositional variables (atoms) in $\mathcal{L}$. A rule $r$ is called a d-rule if $r$ is of the form:
$$h \leftarrow b_1 \vee \cdots \vee b_m \quad (m \geq 0) \qquad (2)$$
where $h$ and $b_1, \ldots, b_m$ are propositional variables in $\mathcal{L}$. A d-program is a finite set of rules that are either (1) or (2). Note that the rule (2) is a shorthand of $m$ rules: $h \leftarrow b_1$, $h \leftarrow b_2$, $\ldots$, $h \leftarrow b_m$, so a d-program is considered a definite program.¹ For each rule $r$ of the form (1) or (2), define $head(r) = h$ and $body(r) = \{b_1, \ldots, b_n\}$ (resp. $\{b_1, \ldots, b_m\}$).² A rule $r$ is called a fact if $body(r) = \emptyset$.

¹The notion of d-programs is useful when we consider a program such that each atom is defined by a single rule in Section 3.
²We assume that the body of (1) is $\top$ if $n = 0$.
A set $I \subseteq B_P$ is an interpretation of $P$. An interpretation $I$ is a model of a d-program $P$ if $\{b_1, \ldots, b_n\} \subseteq I$ implies $h \in I$ for every rule (1) in $P$, and $\{b_1, \ldots, b_m\} \cap I \neq \emptyset$ implies $h \in I$ for every rule (2) in $P$. A model $I$ is the least model of $P$ if $I \subseteq J$ for any model $J$ of $P$. A mapping $T_P : 2^{B_P} \to 2^{B_P}$ (called a $T_P$-operator) is defined as:
$$T_P(I) = \{\, h \mid h \leftarrow b_1 \wedge \cdots \wedge b_n \in P \text{ and } \{b_1, \ldots, b_n\} \subseteq I \,\} \cup \{\, h \mid h \leftarrow b_1 \vee \cdots \vee b_m \in P \text{ and } \{b_1, \ldots, b_m\} \cap I \neq \emptyset \,\}.$$
The powers of $T_P$ are defined as: $T_P^{k+1}(I) = T_P(T_P^k(I))$ $(k \geq 0)$ and $T_P^0(I) = I$. Given $I = \emptyset$, there is a fixpoint $T_P^{n+1}(\emptyset) = T_P^n(\emptyset)$ for some $n \geq 0$. For a definite program $P$, the fixpoint $T_P^n(\emptyset)$ coincides with the least model of $P$ (van Emden & Kowalski 1976).
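The $T_P$ fixpoint iteration above can be sketched directly in code. The following is a minimal Python illustration (the encoding of rules as (head, body) pairs is our own, not from the paper):

```python
# A definite rule is a pair (head, body): head is an atom (a string)
# and body is a frozenset of atoms; a fact has an empty body.
def t_p(program, interpretation):
    """One application of the T_P operator."""
    return {head for (head, body) in program if body <= interpretation}

def least_model(program):
    """Iterate T_P from the empty interpretation up to the fixpoint."""
    current = set()
    while True:
        nxt = t_p(program, current)
        if nxt == current:
            return current
        current = nxt

program = [("r", frozenset()),            # r <-
           ("q", frozenset({"r"})),       # q <- r
           ("p", frozenset({"q", "r"}))]  # p <- q /\ r
print(sorted(least_model(program)))  # ['p', 'q', 'r']
```

Here the iteration stabilizes after three steps, mirroring the $T_P^n(\emptyset)$ construction.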
3 Logic Programming in Linear Algebra
3.1 SD programs
We first consider a subclass of definite programs, called SD programs.
Definition 1 (SD program)
A definite program $P$ is called singly defined ($SD$ program, for short) if $head(r_1) \neq head(r_2)$ for any two rules $r_1$ and $r_2$ $(r_1 \neq r_2)$ in $P$.
Interpretations and programs are represented in a vector space as follows.
Definition 2 (interpretation vector; Sakama (2017))
Let $P$ be a definite program and $B_P = \{p_1, \ldots, p_n\}$. Then an interpretation $I \subseteq B_P$ is represented by a vector $v = (a_1, \ldots, a_n)^{\mathsf T}$ where each element $a_i$ $(1 \leq i \leq n)$ represents the truth value of the proposition $p_i$ such that $a_i = 1$ if $p_i \in I$; otherwise, $a_i = 0$. We write $v = rep(I)$. Given $v \in \mathbb{R}^n$, define $v[i] = a_i$ $(1 \leq i \leq n)$; conversely, $v \in \mathbb{R}^n$ represents the interpretation $\{\, p_i \in B_P \mid v[i] = 1 \,\}$.
Definition 3 (matrix representation of SD programs)
Let $P$ be an SD program and $B_P = \{p_1, \ldots, p_n\}$. Then $P$ is represented by a matrix $M_P \in \mathbb{R}^{n \times n}$ such that for each element $a_{ij}$ $(1 \leq i, j \leq n)$ in $M_P$:

• $a_{ij_k} = \frac{1}{m}$ $(1 \leq k \leq m)$ if $p_i \leftarrow p_{j_1} \wedge \cdots \wedge p_{j_m}$ $(m \geq 1)$ is in $P$;

• $a_{ii} = 1$ if a fact $p_i \leftarrow$ is in $P$;

• $a_{ij} = 0$, otherwise.

$M_P$ is called a program matrix. We write $row_i(M_P) = p_i$ and $col_j(M_P) = p_j$.
In $M_P$ the $i$th row corresponds to the atom $p_i$ appearing in the head of a rule, and the $j$th column corresponds to the atom $p_j$ appearing in the body of a rule. On the other hand, every fact in $P$ is represented as a tautology $p_i \leftarrow p_i$ in $M_P$.
Example 1
Consider $P = \{\, p \leftarrow q \wedge r,\ q \leftarrow r,\ r \leftarrow \,\}$ with $B_P = \{p, q, r\}$. Then $M_P$ becomes
$$M_P = \begin{pmatrix} 0 & \tfrac12 & \tfrac12 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}$$
where $row_1 = p$, $row_2 = q$, $row_3 = r$ and $col_1 = p$, $col_2 = q$, $col_3 = r$.
Definition 4 (initial vector)
Let $P$ be a definite program and $B_P = \{p_1, \ldots, p_n\}$. Then the initial vector of $P$ is an interpretation vector $v_0 = (a_1, \ldots, a_n)^{\mathsf T}$ such that $a_i = 1$ $(1 \leq i \leq n)$ if a fact $p_i \leftarrow$ is in $P$; otherwise, $a_i = 0$.
Definition 5 (thresholding)
Given a vector $v = (a_1, \ldots, a_n)^{\mathsf T}$, define $\theta(v) = (a'_1, \ldots, a'_n)^{\mathsf T}$ where $a'_i = 1$ if $a_i \geq 1$; otherwise, $a'_i = 0$.³ We call $\theta(v)$ the thresholding of $v$.

³$a_i$ can be greater than 1 only later, when d-rules come into play.
Given a program matrix $M_P \in \mathbb{R}^{n \times n}$ and an initial vector $v_0 \in \mathbb{R}^n$, define
$$v_{k+1} = \theta(M_P\, v_k) \quad (k \geq 0).$$
It is shown that $v_{k+1} = v_k$ for some $k \geq 0$. When $v_{k+1} = v_k$, we write $v_k = \mathsf{FP}(M_P\, v_0)$.
Theorem 1
Let $P$ be an SD program and $M_P \in \mathbb{R}^{n \times n}$ its program matrix. Then $v \in \mathbb{R}^n$ is a vector representing the least model of $P$ iff $v = \mathsf{FP}(M_P\, v_0)$ where $v_0$ is the initial vector of $P$.
Example 2
Consider the program $P$ of Example 1 and its program matrix $M_P$. The initial vector of $P$ is $v_0 = (0, 0, 1)^{\mathsf T}$. Then
$$M_P\, v_0 = \begin{pmatrix} \tfrac12 \\ 1 \\ 1 \end{pmatrix}$$
and $v_1 = \theta(M_P v_0) = (0, 1, 1)^{\mathsf T}$. Next,
$$M_P\, v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$
and $v_2 = \theta(M_P v_1) = (1, 1, 1)^{\mathsf T} = \theta(M_P v_2)$. Hence, $v_2$ represents the least model $\{p, q, r\}$ of $P$.
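The fixpoint computation of Theorem 1 amounts to repeated matrix–vector products followed by thresholding. A minimal numpy sketch, using our own encoding of the three-rule SD program $\{p \leftarrow q \wedge r,\ q \leftarrow r,\ r \leftarrow\}$ (an illustration, not the authors' implementation):

```python
import numpy as np

# SD program: p <- q /\ r,  q <- r,  r <-   (atom order: p, q, r)
M = np.array([[0.0, 0.5, 0.5],   # p <- q /\ r: entries 1/m with m = 2
              [0.0, 0.0, 1.0],   # q <- r
              [0.0, 0.0, 1.0]])  # fact r <- encoded as tautology r <- r
v = np.array([0.0, 0.0, 1.0])    # initial vector: only the fact r is true

while True:
    nxt = (M @ v >= 1.0).astype(float)  # thresholding theta
    if np.array_equal(nxt, v):
        break                            # fixpoint FP(M v0) reached
    v = nxt
print(v)  # [1. 1. 1.] -- the least model {p, q, r}
```

A row sums to 1 exactly when all atoms in its rule body are true, which is why thresholding at 1 implements conjunction.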
Remark: The current study differs from the previous work Sakama (2017) in the matrix representation of programs, as follows.
• In Sakama (2017) a fact $p \leftarrow$ is represented as a rule "$p \leftarrow \top$" and is encoded in a matrix by $a_{ij} = 1$ where $row_i = p$ and $col_j = \top$. In contrast to the current study, the previous study sets the empty set as the initial vector and computes fixpoints. In this study, we start with the initial vector representing facts, instead of representing facts as rules in $M_P$. This has the effect of increasing zero elements in matrices and reducing the number of iterations required in fixpoint computation. Representing matrices in sparse form also brings storage advantages with a good matrix library.
• In Sakama (2017) a constraint is represented as a rule "$\bot \leftarrow b_1 \wedge \cdots \wedge b_n$" and is encoded in a matrix by $a_{ij_k} = \frac{1}{n}$ $(1 \leq k \leq n)$ where $row_i = \bot$ and $col_{j_k} = b_k$. In the current study, we do not include constraints in a program as they cause a problem in partial evaluation. Still, we can handle constraints separately from a program as follows. Given a program $P$ and a set $C$ of constraints, encode them by matrices $M_P \in \mathbb{R}^{n \times n}$ and $M_C \in \mathbb{R}^{|C| \times n}$, respectively, where $M_C$ has the element $\frac{1}{|body(c)|}$ in its row for each constraint $c \in C$ and each atom in $body(c)$. After computing the fixpoint $v = \mathsf{FP}(M_P v_0) \in \mathbb{R}^n$ as in Theorem 1, compute $w = \theta(M_C v) \in \mathbb{R}^{|C|}$. If $w$ contains an element $1$, then $P \cup C$ is inconsistent; otherwise, $v$ represents the least model of $P$.
3.2 Non-SD programs
When a definite program $P$ contains two rules $r_1: h \leftarrow B_1$ and $r_2: h \leftarrow B_2$ (with the same head $h$ and conjunctive bodies $B_1$ and $B_2$), $P$ is transformed to a d-program $P' = (P \setminus \{r_1, r_2\}) \cup \{\, h \leftarrow h_1 \vee h_2,\ h_1 \leftarrow B_1,\ h_2 \leftarrow B_2 \,\}$. Here, $h_1$ and $h_2$ are new propositional variables associated with $r_1$ and $r_2$, respectively.
Generally, a non-SD program is transformed to a d-program as follows.
Definition 6 (transformation)
Let $P$ be a definite program and $B_P$ its Herbrand base. For each $p \in B_P$, put $P_p = \{\, r \in P \mid head(r) = p \,\}$ and $m_p = |P_p|$. Then define
$$D_p = \{\, p \leftarrow p^1 \vee \cdots \vee p^{m_p} \,\} \quad \text{and} \quad Q_p = \{\, p^i \leftarrow body(r_i) \mid r_i \in P_p,\ 1 \leq i \leq m_p \,\}$$
where $p^i$ is a new propositional variable such that $p^i \notin B_P$, and $D_p = \emptyset$, $Q_p = P_p$ if $m_p \leq 1$. Then, build a d-program
$$P' = Q \cup D \quad \text{with} \quad Q = \bigcup_{p \in B_P} Q_p \ \text{ and } \ D = \bigcup_{p \in B_P} D_p,$$
where $Q$ is an SD program and $D$ is a set of d-rules.
The transformation introduces additional propositional variables, so $B_P \subseteq B_{P'}$ holds. By definition, the next result holds.
Proposition 1
Let $P$ be a definite program and $P'$ its transformed d-program. Suppose that $P$ and $P'$ have the least models $M$ and $M'$, respectively. Then $M = M' \cap B_P$ holds.
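The transformation into a d-program can be sketched as follows (a Python illustration under our own rule encoding; the fresh-atom naming scheme `p#i` is hypothetical, not from the paper):

```python
from collections import defaultdict

def to_d_program(program):
    """Split a definite program into an SD part and d-rules.

    Rules are (head, body) pairs with body a tuple of atoms.
    Returns (sd_rules, d_rules); d-rule bodies are read disjunctively.
    """
    by_head = defaultdict(list)
    for head, body in program:
        by_head[head].append(body)
    sd_rules, d_rules = [], []
    for head, bodies in by_head.items():
        if len(bodies) == 1:                    # already singly defined
            sd_rules.append((head, bodies[0]))
        else:                                   # introduce fresh atoms
            new_atoms = [f"{head}#{i}" for i in range(1, len(bodies) + 1)]
            d_rules.append((head, tuple(new_atoms)))  # head <- a1 \/ a2 ...
            sd_rules.extend(zip(new_atoms, bodies))
    return sd_rules, d_rules

P = [("p", ("q",)), ("p", ("r",)), ("q", ("r",)), ("r", ())]
sd, d = to_d_program(P)
print(sd)  # [('p#1', ('q',)), ('p#2', ('r',)), ('q', ('r',)), ('r', ())]
print(d)   # [('p', ('p#1', 'p#2'))]
```

The SD part is singly defined by construction, and the d-rules collect the alternative definitions of each multiply defined atom.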
In this way, any definite program $P$ is transformed to a semantically equivalent d-program $P' = Q \cup D$ where $Q$ is an SD program and $D$ is a set of d-rules. A d-program is represented by a matrix as follows.
Definition 7 (program matrix for dprograms)
Let $P'$ be a d-program such that $P' = Q \cup D$, where $Q$ is an SD program and $D$ is a set of d-rules, and $B_{P'} = \{p_1, \ldots, p_n\}$ the Herbrand base of $P'$. Then $P'$ is represented by a matrix $M_{P'} \in \mathbb{R}^{n \times n}$ such that for each element $a_{ij}$ $(1 \leq i, j \leq n)$ in $M_{P'}$:

• $a_{ij_k} = 1$ $(1 \leq k \leq l)$ if $p_i \leftarrow p_{j_1} \vee \cdots \vee p_{j_l}$ $(l \geq 1)$ is in $D$;

• otherwise, every rule in $Q$ is encoded as in Def. 3.
Given a program matrix $M_{P'}$ and the initial vector $v_0$ representing the facts in $P'$, the fixpoint $\mathsf{FP}(M_{P'} v_0)$ is computed as before. The fixpoint represents the least model of $P'$.
Theorem 2
Let $P'$ be a d-program and $M_{P'} \in \mathbb{R}^{n \times n}$ its program matrix. Then $v \in \mathbb{R}^n$ is a vector representing the least model of $P'$ iff $v = \mathsf{FP}(M_{P'} v_0)$ where $v_0$ is the initial vector of $P'$.
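The iteration of Theorem 2 works exactly as in Theorem 1; a d-rule row carries the entry 1 for every disjunct, so a single true disjunct already reaches the threshold. A numpy sketch with an illustrative d-program of our own (not an example from the paper):

```python
import numpy as np

# d-program: p <- p1 \/ p2,  p1 <- q,  p2 <- r,  q <- r,  r <-
# atom order: p, q, r, p1, p2
M = np.array([[0, 0, 0, 1, 1],   # d-rule row: entry 1 for each disjunct
              [0, 0, 1, 0, 0],   # q <- r
              [0, 0, 1, 0, 0],   # fact r <- as tautology r <- r
              [0, 1, 0, 0, 0],   # p1 <- q
              [0, 0, 1, 0, 0]],  # p2 <- r
             dtype=float)
v = np.array([0, 0, 1, 0, 0], dtype=float)  # facts: {r}

while True:
    nxt = (M @ v >= 1.0).astype(float)  # theta; d-rule rows may sum above 1
    if np.array_equal(nxt, v):
        break
    v = nxt
print(v)  # [1. 1. 1. 1. 1.] -- least model of P' contains p, q, r
```

When both disjuncts become true, the d-rule row evaluates to 2; thresholding clips it back to 1, which is why $\theta$ tests $a_i \geq 1$ rather than equality.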
Example 3
Consider the program $P = \{\, p \leftarrow q,\ q \leftarrow r,\ q \leftarrow s,\ r \leftarrow \,\}$. As $P$ is a non-SD program, it is transformed to the d-program
$$P' = \{\, p \leftarrow q,\ \ q \leftarrow q^1 \vee q^2,\ \ q^1 \leftarrow r,\ \ q^2 \leftarrow s,\ \ r \leftarrow \,\}$$
where $q^1$ and $q^2$ are new propositional variables. With the atom ordering $(p, q, r, s, q^1, q^2)$, $M_{P'} \in \mathbb{R}^{6 \times 6}$ becomes
$$M_{P'} = \begin{pmatrix} 0&1&0&0&0&0 \\ 0&0&0&0&1&1 \\ 0&0&1&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&1&0&0&0 \\ 0&0&0&1&0&0 \end{pmatrix}.$$
The initial vector of $P'$ is $v_0 = (0,0,1,0,0,0)^{\mathsf T}$. Then,
$v_1 = \theta(M_{P'} v_0) = (0,0,1,0,1,0)^{\mathsf T}$,
$v_2 = \theta(M_{P'} v_1) = (0,1,1,0,1,0)^{\mathsf T}$,
$v_3 = \theta(M_{P'} v_2) = (1,1,1,0,1,0)^{\mathsf T}$, and
$v_4 = \theta(M_{P'} v_3) = v_3$.
Then $v_4 = \mathsf{FP}(M_{P'} v_0)$ represents the least model $\{p, q, r, q^1\}$ of $P'$, hence $\{p, q, r\}$ is the least model of $P$.
An algorithm for computing the least model of a definite program is shown in Figure 1. In the algorithm, the complexity of computing $M_{P'} v$ is $O(n^2)$ and that of computing $\theta(M_{P'} v)$ is $O(n)$, where $n = |B_{P'}|$. The iteration $v_{k+1} = \theta(M_{P'} v_k)$ is repeated at most $n$ times. So the complexity of Step 3 is $O(n^3)$ in the worst case.
3.3 Column Reduction
To decrease the complexity of computing the fixpoint $\mathsf{FP}(M_{P'} v_0)$, we introduce a technique of column reduction of program matrices.
Definition 8 (submatrix representation of dprograms)
Let $P$ be a definite program such that $|B_P| = n$. Suppose that $P$ is transformed to a d-program $P' = Q \cup D$ such that $|B_{P'}| = n + \mu$, where $Q$ is an SD program, $D$ is a set of d-rules, and $B_{P'} = \{p_1, \ldots, p_n, p_{n+1}, \ldots, p_{n+\mu}\}$ with $B_P = \{p_1, \ldots, p_n\}$. Then $P'$ is represented by a matrix $M' \in \mathbb{R}^{(n+\mu) \times n}$ such that each element $a_{ij}$ $(1 \leq i \leq n+\mu$, $1 \leq j \leq n)$ in $M'$ is equivalent to the corresponding element in $M_{P'}$ of Def. 7. $M'$ is called a submatrix of $M_{P'}$.
Note that the size of ℝ of Def. 7 is reduced to ℝ in Def. 8 by . In the columns do not include values of newly introduced propositions and derivation of propositions in via drules is checked by the following thresholding.
Definition 9 ($\theta'$-thresholding)
Given a vector $v = (a_1, \ldots, a_{n+\mu})^{\mathsf T}$, define a vector $\theta'(v) = (a'_1, \ldots, a'_{n+\mu})^{\mathsf T}$ such that (i) $a'_i = 1$ if $a_i \geq 1$, (ii) $a'_i = 1$ if $a_i < 1$ and there is a d-rule $p_i \leftarrow p_{j_1} \vee \cdots \vee p_{j_l}$ in $D$ such that $a_{j_k} \geq 1$ for some $k$ $(1 \leq k \leq l)$, and (iii) $a'_i = 0$ otherwise. We call $\theta'(v)$ the $\theta'$-thresholding of $v$.
Intuitively, $\theta'$-thresholding adds condition (ii) of Def. 9 to $\theta$-thresholding, which means that "if an element in the body of a d-rule is 1, then the element in the head of the d-rule is set to 1". $\theta'(v)$ is computed by checking the value of $a_i$ for $1 \leq i \leq n+\mu$ and checking all d-rules for condition (ii). Since the number of d-rules is at most $n$ and their disjuncts number $\mu$ in total, the complexity of computing $\theta'(v)$ is $O(n+\mu)$. By definition, it holds that $\theta(v) \leq \theta'(v)$ (elementwise).
Proposition 2
Let $P$ be a definite program with $|B_P| = n$, and $P'$ a transformed d-program with $|B_{P'}| = n+\mu$. Let $M' \in \mathbb{R}^{(n+\mu) \times n}$ be a submatrix of $M_{P'}$. Given a vector $v \in \mathbb{R}^n$ representing an interpretation of $P$, let $w = \theta'(M' v) \in \mathbb{R}^{n+\mu}$. Then $w$ is a vector representing an interpretation of $P'$ such that $\theta'$ derives the head of each d-rule as soon as one of its disjuncts is derived.
Given a program matrix $M' \in \mathbb{R}^{(n+\mu) \times n}$ and the initial vector $v_0 \in \mathbb{R}^n$ of $P$, define
$$v_{k+1} = \theta'(M'\, v_k|_n) \quad (k \geq 0)$$
where $v_k|_n$ represents the vector of the first $n$ elements of $v_k$ (with $v_0|_n = v_0$), and $M'\, v_k|_n$ the product of $M'$ and $v_k|_n$. Then it is shown that $v_{k+1} = v_k$ for some $k \geq 0$. When $v_{k+1} = v_k$, we write $v_k = \mathsf{FP}'(M' v_0)$.
Theorem 3
Let $P$ be a definite program with $|B_P| = n$, and $P'$ a transformed d-program with $|B_{P'}| = n+\mu$. Then $v \in \mathbb{R}^{n+\mu}$ is a vector representing the least model of $P'$ iff $v = \mathsf{FP}'(M' v_0)$ where $v_0 \in \mathbb{R}^n$ is the initial vector of $P$.
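The column-reduction iteration can be sketched as follows: the submatrix drops the $\mu$ columns for fresh atoms, and the d-rules are applied inside $\theta'$. The example d-program is our own; the snippet is a sketch, not the authors' implementation:

```python
import numpy as np

# d-program P' = Q + D; atom order: p, q, r, p1, p2  (n = 3, mu = 2)
# Q: p1 <- q, p2 <- r, q <- r, r <- ;  D: p <- p1 \/ p2
Msub = np.array([[0, 0, 0],   # p: defined only by the d-rule
                 [0, 0, 1],   # q <- r
                 [0, 0, 1],   # fact r <- as tautology r <- r
                 [0, 1, 0],   # p1 <- q
                 [0, 0, 1]],  # p2 <- r
                dtype=float)
d_rules = [(0, [3, 4])]       # (head index, disjunct indices)
n = 3

v = np.array([0, 0, 1, 0, 0], dtype=float)  # facts: {r}
while True:
    w = Msub @ v[:n]                 # product with the first n elements only
    nxt = (w >= 1.0).astype(float)   # theta part of theta'
    for head, body in d_rules:       # d-rule part of theta' (condition ii)
        if any(w[j] >= 1.0 for j in body):
            nxt[head] = 1.0
    if np.array_equal(nxt, v):
        break
    v = nxt
print(v[:n])  # [1. 1. 1.] -- least model of P restricted to B_P
```

Each iteration multiplies a $(n+\mu) \times n$ matrix with an $n$-vector instead of a full square product, which is where the complexity saving of Section 3.3 comes from.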
Generally, given a d-program $P'$, the value of $k$ with $v_k = \mathsf{FP}'(M' v_0)$ is not greater than the value of $k$ in the iteration of Section 3.1.
Example 4
Consider again the d-program $P' = \{\, p \leftarrow q,\ q \leftarrow q^1 \vee q^2,\ q^1 \leftarrow r,\ q^2 \leftarrow s,\ r \leftarrow \,\}$ of Example 3, where $n = 4$ and $\mu = 2$. The submatrix $M' \in \mathbb{R}^{6 \times 4}$ keeps only the columns for $p, q, r, s$:
$$M' = \begin{pmatrix} 0&1&0&0 \\ 0&0&0&0 \\ 0&0&1&0 \\ 0&0&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{pmatrix}.$$
Starting from $v_0 = (0,0,1,0)^{\mathsf T}$ and multiplying $M'$ with the first four elements of each vector, we get
$v_1 = \theta'(M' v_0) = (0,1,1,0,1,0)^{\mathsf T}$,
$v_2 = (1,1,1,0,1,0)^{\mathsf T}$, and
$v_3 = v_2$,
where $\theta'$ sets the element for $q$ to $1$ as soon as the element for $q^1$ or $q^2$ reaches $1$. The fixpoint is reached one step earlier than in Example 3.
By Proposition 2, we can replace the computation $\theta(M_{P'} v)$ in Step 3 of Algorithm 1 in Figure 1 by $\theta'(M'\, v|_n)$, where $v|_n$ is the vector of the first $n$ elements of $v$. In the column reduction method, the complexity of computing $M'\, v|_n$ is $O((n+\mu)n)$ and that of computing $\theta'(v)$ is $O(n+\mu)$. The iteration is repeated at most $n+\mu$ times. So the complexity of computing the fixpoint is $O((n+\mu)^2 n)$. Compared with the complexity $O((n+\mu)^3)$ of Step 3 in Algorithm 1, the column reduction reduces the complexity to $O((n+\mu)^2 n)$, as $n < n+\mu$ in general.
4 Partial Evaluation
Partial evaluation is known as an optimization technique in logic programming (Lloyd & Shepherdson 1991). In this section, we provide a method of computing partial evaluation of definite programs in vector spaces.
Definition 10 (partial evaluation)
Let $P$ be an SD program. For any rule $r$ in $P$ with $body(r) = \{b_1, \ldots, b_m\}$, let $r_i$ be the rule in $P$ (if any) such that $head(r_i) = b_i$ $(1 \leq i \leq m)$. Then construct a rule $r'$ such that

• $head(r') = head(r)$, and

• $body(r') = body(r_1) \cup \cdots \cup body(r_m)$.

Define
$$peval(P) = \{\, r' \mid r \in P \ \text{and every atom in}\ body(r)\ \text{is the head of some rule in}\ P \,\}$$
where a fact $r$ (with $body(r) = \emptyset$) yields $r' = r$. $peval(P)$ is called the partial evaluation of $P$.
Example 5
Consider $P = \{\, p \leftarrow q,\ q \leftarrow r,\ r \leftarrow s,\ s \leftarrow \,\}$. Put $r_1: p \leftarrow q$, $r_2: q \leftarrow r$, $r_3: r \leftarrow s$, and $r_4: s \leftarrow$. Unfolding rules produces: $r_1': p \leftarrow r$, $r_2': q \leftarrow s$, $r_3': r \leftarrow$, and $r_4': s \leftarrow$. Then it becomes $peval(P) = \{\, p \leftarrow r,\ q \leftarrow s,\ r \leftarrow,\ s \leftarrow \,\}$.
By definition, $peval(P)$ is obtained from $P$ by unfolding the propositional variables appearing in the body of every rule in $P$ in parallel. If $body(r)$ contains an atom that is unfolded by no rule in $P$, then $r$ is simply removed from $peval(P)$. Partial evaluation preserves the least model of the original program (Lloyd & Shepherdson 1991).
Proposition 3
Let $P$ be an SD program. Then $P$ and $peval(P)$ have the same least model.
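One parallel unfolding step can be sketched symbolically (a Python illustration under our own encoding of an SD program as a head-to-body dictionary, which is possible precisely because each atom has at most one rule):

```python
def peval(program):
    """One parallel unfolding step for an SD program.

    program: dict mapping each head to its (unique) body, a frozenset.
    A rule is dropped when some body atom has no defining rule;
    facts (empty bodies) are kept as they are.
    """
    result = {}
    for head, body in program.items():
        if all(b in program for b in body):
            result[head] = frozenset().union(*(program[b] for b in body))
    return result

# P: p <- q, q <- r, r <- s, s <-
P = {"p": frozenset({"q"}), "q": frozenset({"r"}),
     "r": frozenset({"s"}), "s": frozenset()}
print(peval(P))
# {'p': frozenset({'r'}), 'q': frozenset({'s'}), 'r': frozenset(), 's': frozenset()}
```

Note that unfolding $r \leftarrow s$ by the fact $s \leftarrow$ turns it into the fact $r \leftarrow$, so repeated application pushes facts forward through the program.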
Partial evaluation is computed by matrix products in vector spaces.
Example 6
The program $P$ of Example 5 is represented by the matrix
$$M_P = \begin{pmatrix} 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \\ 0&0&0&1 \end{pmatrix},$$
with the atom ordering $(p, q, r, s)$, and $M_P^2$ becomes
$$M_P^2 = \begin{pmatrix} 0&0&1&0 \\ 0&0&0&1 \\ 0&0&0&1 \\ 0&0&0&1 \end{pmatrix}.$$
Intuitively speaking, non-zero elements in $M_P^2$ represent the conjuncts appearing in each rule. So the first row represents the rule $p \leftarrow r$ and the second row represents the rule $q \leftarrow s$. $M_P^2$ then represents the program $\{\, p \leftarrow r,\ q \leftarrow s,\ r \leftarrow s,\ s \leftarrow s \,\}$. It is different from $peval(P)$ in the representation of the rule $r \leftarrow$. This is because the fact $s \leftarrow$ is represented as the tautology $s \leftarrow s$ in $M_P$, so that unfolding $r \leftarrow s$ by $s \leftarrow s$ becomes $r \leftarrow s$. Thus, $M_P^2$ does not represent the result of unfolding rules by facts precisely, while this does not affect the result of computing the least model of $P$. In fact, applying the vector $v_0 = (0,0,0,1)^{\mathsf T}$ representing the facts in $P$ and applying thresholding iteratively, we obtain $\mathsf{FP}(M_P^2 v_0) = (1,1,1,1)^{\mathsf T}$, which represents the least model $\{p, q, r, s\}$ of $P$. We say that $M_P^2$ represents the rule by rule (shortly, rr) partial evaluation, and often say just partial evaluation when no confusion arises. Formally, we have the next result.
Proposition 4
Let $P$ be an SD program and $v_0$ the initial vector representing the facts of $P$. Then $\mathsf{FP}(M_P^2\, v_0) = \mathsf{FP}(M_P\, v_0)$.
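Proposition 4 can be checked numerically: squaring the program matrix before iterating reaches the same fixpoint in fewer thresholding steps (a numpy sketch with our own four-rule program, matching the chain-shaped program discussed above):

```python
import numpy as np

def fp(M, v):
    """Fixpoint FP(M v0): iterate v <- theta(M v) until stable."""
    steps = 0
    while True:
        nxt = (M @ v >= 1.0).astype(float)
        steps += 1
        if np.array_equal(nxt, v):
            return v, steps
        v = nxt

# P: p <- q, q <- r, r <- s, s <-   (atom order p, q, r, s)
M = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1]], dtype=float)  # fact s <- as tautology s <- s
v0 = np.array([0, 0, 0, 1], dtype=float)

m1, s1 = fp(M, v0)
m2, s2 = fp(M @ M, v0)        # one rr partial evaluation step
print(m1, m2)  # [1. 1. 1. 1.] [1. 1. 1. 1.] -- same least model
print(s1, s2)  # 4 3 -- squaring needs fewer iterations
```

Each squaring roughly halves the derivation depth, so precomputing powers of $M_P$ trades matrix products for iterations.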
Partial evaluation has the effect of reducing deduction steps by unfolding rules in advance. Proposition 4 realizes this effect by computing matrix products in advance. Partial evaluation is performed iteratively as
$$peval^{k+1}(P) = peval(peval^k(P)) \quad (k \geq 0)\ \ \text{with}\ peval^0(P) = P.$$
Iterative partial evaluation is computed by matrix products as follows. Let $P$ be an SD program and $M_P \in \mathbb{R}^{n \times n}$ its program matrix. Define $M_P^{(0)} = M_P$ and $M_P^{(k+1)} = (M_P^{(k)})^2$ $(k \geq 0)$. Then $M_P^{(k)}$ is a matrix representing a program that is obtained by the $k$th iteration of (rr) partial evaluation.
Theorem 4
Let $P$ be an SD program and $v_0 \in \mathbb{R}^n$ the initial vector of $P$. Then $\mathsf{FP}(M_P^{(k)} v_0) = \mathsf{FP}(M_P\, v_0)$ where $k \geq 0$.
When $P$ is a non-SD program, first transform $P$ to a d-program $P' = Q \cup D$ where $Q$ is an SD program and $D$ is a set of d-rules (Section 3.2). Next, define $peval(P') = peval(Q) \cup D$. We then compute (rr) partial evaluation of $P'$ as (rr) partial evaluation of the SD program $Q$ plus the d-rules $D$.