# Partial Evaluation of Logic Programs in Vector Spaces

In this paper, we introduce methods of encoding propositional logic programs in vector spaces. Interpretations are represented by vectors and programs are represented by matrices. The least model of a definite program is computed by multiplying an interpretation vector and a program matrix. To optimize computation in vector spaces, we provide a method of partial evaluation of programs using linear algebra. Partial evaluation is done by unfolding rules in a program, and it is realized in a vector space by multiplying program matrices. We perform experiments using randomly generated programs and show that partial evaluation has the potential to realize efficient computation of huge-scale programs.


## 1 Introduction

One of the challenging topics in AI is reasoning with huge-scale knowledge bases. Linear algebraic computation has the potential to make symbolic reasoning scalable to real-life datasets, and several studies aim at integrating linear algebraic computation and symbolic computation. For instance, Grefenstette (2013) introduces tensor-based predicate calculus that realizes logical operations. Yang et al. (2015) introduce a method of mining Horn clauses from relational facts represented in a vector space. Serafini and Garcez (2016) introduce logic tensor networks that integrate logical deductive reasoning and data-driven relational learning. Sato (2017a) formalizes Tarskian semantics of first-order logic in vector spaces, and Sato (2017b) shows that tensorization realizes efficient computation of Datalog. Lin (2013) introduces linear algebraic computation of SAT for clausal theories.

To realize linear algebraic computation of logic programming, Sakama et al. (2017) introduce encodings of Horn, disjunctive and normal logic programs in vector spaces. They show that least models of Horn programs, minimal models of disjunctive programs, and stable models of normal programs are computed by algebraic manipulation of third-order tensors. The study builds a new theory of logic programming, while implementation and evaluation are left open.

In this paper, we first reformulate the framework of Sakama et al. (2017) and present an algorithm for computing least models of definite programs in vector spaces. We next introduce two optimization techniques: the first is based on column reduction of matrices, and the second on partial evaluation. We perform experimental testing and compare algorithms for computing fixpoints of definite programs. The rest of this paper is organized as follows. Section 2 reviews basic notions and Section 3 provides a linear algebraic characterization of logic programming. Section 4 presents partial evaluation of logic programs in vector spaces. Section 5 provides experimental results and Section 6 summarizes the paper. Due to space limitations, we omit proofs of propositions and theorems.

## 2 Preliminaries

We consider a language L that contains a finite set of propositional variables. Given a logic program P, the set of all propositional variables appearing in P is called the Herbrand base of P (written B_P). A definite program is a finite set of rules of the form:

  h ← b1 ∧ ⋯ ∧ bm  (m ≥ 0)   (1)

where h and bi are propositional variables (atoms) in L. A rule is called a d-rule if it is of the form:

  h ← b1 ∨ ⋯ ∨ bm  (m ≥ 0)   (2)

where h and bi are propositional variables in L. A d-program is a finite set of rules that are either (1) or (2). Note that the rule (2) is a shorthand of the m rules h ← b1, …, h ← bm, so a d-program is considered a definite program (the notion of d-programs is useful when we consider, in Section 3, programs in which each atom is defined by a single rule). For each rule r of the form (1) or (2), define head(r) = {h} and body(r) = {b1, …, bm}; we assume body(r) = ∅ if m = 0. A rule r is called a fact if body(r) = ∅.

A set I ⊆ B_P is an interpretation of P. An interpretation I is a model of a d-program P if {b1, …, bm} ⊆ I implies h ∈ I for every rule (1) in P, and {b1, …, bm} ∩ I ≠ ∅ implies h ∈ I for every rule (2) in P. A model I is the least model of P if I ⊆ J for any model J of P. A mapping T_P : 2^{B_P} → 2^{B_P} (called a T_P-operator) is defined as:

  T_P(I) = {h ∣ h ← b1 ∧ ⋯ ∧ bm ∈ P and {b1, …, bm} ⊆ I}
         ∪ {h ∣ h ← b1 ∨ ⋯ ∨ bn ∈ P and {b1, …, bn} ∩ I ≠ ∅}.

The powers of T_P are defined as T_P^{k+1}(I) = T_P(T_P^k(I)) (k ≥ 0) and T_P^0(I) = I. Given I ⊆ B_P, there is a fixpoint T_P^{n+1}(I) = T_P^n(I). For a definite program P, the fixpoint T_P^n(∅) coincides with the least model of P (van Emden & Kowalski 1976).
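The fixpoint iteration of the T_P operator can be sketched directly on sets. The following minimal Python sketch encodes each rule h ← b1 ∧ ⋯ ∧ bm as a pair (h, {b1, …, bm}); the encoding and the sample program are our own illustration.

```python
# A minimal sketch of the T_P operator and its fixpoint (Section 2).
# Facts are rules with an empty body.

def t_p(program, interp):
    """One application of T_P: derive every head whose body holds in interp."""
    return {h for (h, body) in program if body <= interp}

def least_model(program):
    """Iterate T_P from the empty interpretation until a fixpoint is reached."""
    interp = set()
    while True:
        new = t_p(program, interp)
        if new == interp:
            return interp
        interp = new

# P = {p <- q, q <- p ^ r, r <- s, s <-}
P = [("p", {"q"}), ("q", {"p", "r"}), ("r", {"s"}), ("s", set())]
print(least_model(P))  # the least model {'r', 's'}
```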

## 3 Logic Programming in Linear Algebra

### 3.1 SD programs

We first consider a subclass of definite programs, called SD programs.

###### Definition 1 (SD program)

A definite program P is called singly defined (SD program, for short) if head(r1) ≠ head(r2) for any two rules r1 and r2 (r1 ≠ r2) in P.

Interpretations and programs are represented in a vector space as follows.

###### Definition 2 (interpretation vector Sakama etal. (2017))

Let P be a definite program and B_P = {p1, …, pn}. Then an interpretation I is represented by a vector v = (a1, …, an)^T where each element ai represents the truth value of the proposition pi such that ai = 1 if pi ∈ I; otherwise, ai = 0. We write v = rep(I). Given v = (a1, …, an)^T, define row_i(v) = pi and v[i] = ai.

###### Definition 3 (matrix representation of SD programs)

Let P be an SD program and B_P = {p1, …, pn}. Then P is represented by a matrix M_P ∈ R^{n×n} such that for each element a_ij (1 ≤ i, j ≤ n) in M_P,

1. a_{ij_k} = 1/m (1 ≤ k ≤ m) if p_i ← p_{j_1} ∧ ⋯ ∧ p_{j_m} (m ≥ 1) is in P;

2. a_ii = 1 if a fact p_i ← is in P;

3. a_ij = 0, otherwise.

M_P is called a program matrix. We write row_i(M_P) = p_i and col_j(M_P) = p_j.

In M_P the i-th row corresponds to the atom p_i appearing in the head of a rule, and the j-th column corresponds to the atom p_j appearing in the body of a rule. Every fact p_i ← in P is represented as the tautology p_i ← p_i in M_P.

###### Example 1

Consider P = {p ← q, q ← p ∧ r, r ← s, s ←} with B_P = {p, q, r, s}. Then M_P becomes

         p    q    r    s
   p  (  0    1    0    0  )
   q  ( 1/2   0   1/2   0  )
   r  (  0    0    0    1  )
   s  (  0    0    0    1  )

where rows and columns are ordered as p, q, r, s, i.e., row_1(M_P) = col_1(M_P) = p, …, row_4(M_P) = col_4(M_P) = s.

###### Definition 4 (initial vector)

Let P be a definite program and B_P = {p1, …, pn}. Then the initial vector of P is an interpretation vector v0 = (a1, …, an)^T such that ai = 1 if row_i(v0) = pi and a fact pi ← is in P; otherwise, ai = 0.

###### Definition 5 (θ-thresholding)

Given a vector v = (a1, …, an)^T, define θ(v) = (a1′, …, an′)^T where ai′ = 1 if ai ≥ 1; otherwise, ai′ = 0 (an element can exceed 1 only later, when d-rules come into play). We call θ(v) the θ-thresholding of v.

Given a program matrix M_P and an initial vector v0, define

  v1 = θ(M_P v0)  and  v_{k+1} = θ(M_P v_k)  (k ≥ 1).

It is shown that v_{m+1} = v_m for some m ≥ 1. When v_{m+1} = v_m, we write v_m = FP(M_P v0).

###### Theorem 1

Let P be an SD program and M_P its program matrix. Then a vector u represents the least model of P iff u = FP(M_P v0) where v0 is the initial vector of P.

###### Example 2

Consider the program P of Example 1 and its program matrix M_P. The initial vector of P is v0 = (0, 0, 0, 1)^T. Then

  M_P v0 = (  0    1    0    0  ) ( 0 )   ( 0 )
           ( 1/2   0   1/2   0  ) ( 0 ) = ( 0 )
           (  0    0    0    1  ) ( 0 )   ( 1 )
           (  0    0    0    1  ) ( 1 )   ( 1 )

and v1 = θ(M_P v0) = (0, 0, 1, 1)^T. Next,

  M_P v1 = (  0    1    0    0  ) ( 0 )   (  0  )
           ( 1/2   0   1/2   0  ) ( 0 ) = ( 1/2 )
           (  0    0    0    1  ) ( 1 )   (  1  )
           (  0    0    0    1  ) ( 1 )   (  1  )

and v2 = θ(M_P v1) = (0, 0, 1, 1)^T = v1. Hence, v1 represents the least model {r, s} of P.
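Example 2 can be reproduced with a short NumPy sketch of Definitions 3–5 and Theorem 1. The index order (p, q, r, s) and the function names are our own choices.

```python
# Matrix encoding and theta-thresholded fixpoint iteration (Section 3.1).
import numpy as np

# M_P for P = {p <- q, q <- p ^ r, r <- s, s <-}: each body atom of an
# m-atom body contributes 1/m; the fact s <- is the tautology s <- s.
M = np.array([
    [0.0, 1.0, 0.0, 0.0],   # p <- q
    [0.5, 0.0, 0.5, 0.0],   # q <- p ^ r
    [0.0, 0.0, 0.0, 1.0],   # r <- s
    [0.0, 0.0, 0.0, 1.0],   # s <- s (the fact s <-)
])

def theta(v):
    # theta-thresholding: an element becomes 1 iff it is >= 1
    return (v >= 1.0).astype(float)

def fixpoint(M, v0):
    # iterate v_{k+1} = theta(M v_k) until a fixpoint
    v = theta(M @ v0)
    while True:
        w = theta(M @ v)
        if np.array_equal(w, v):
            return v
        v = w

v0 = np.array([0.0, 0.0, 0.0, 1.0])   # initial vector: only the fact s
print(fixpoint(M, v0))                # [0. 0. 1. 1.] -- the least model {r, s}
```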

Remark: The current study differs from the previous work Sakama et al. (2017) in the matrix representation of programs, as follows.

• In Sakama et al. (2017) a fact p ← is represented as a rule "p ← ⊤" and is encoded in a matrix by a_ij = 1 where row_i = p and col_j = ⊤. In contrast to the current study, the previous study sets the empty set as the initial vector and computes fixpoints. In this study, we start with the initial vector representing facts, instead of representing facts as rules in the matrix. This has the effect of increasing zero elements in matrices and reducing the number of required iterations in fixpoint computation. Representing matrices in sparse forms also brings storage advantages with a good matrix library.

• In Sakama et al. (2017) a constraint is represented as a rule "⊥ ← b1 ∧ ⋯ ∧ bm" and is encoded in a matrix by a_ij = 1/m where row_i = ⊥ and col_j = b_k (1 ≤ k ≤ m). In the current study, we do not include constraints in a program as they cause a problem in partial evaluation. Still, we can handle constraints separately from a program as follows. Given a program P and constraints C, encode them by matrices M_P and M_C, respectively, where M_C has the element ⊥ in its row. After computing the fixpoint u = FP(M_P v0) as in Theorem 1, compute w = θ(M_C u). If row_i(w) = ⊥ and w[i] = 1, then P ∪ C is inconsistent; otherwise, u represents the least model of P ∪ C.

### 3.2 Non-SD programs

When a definite program P contains two rules r1: h ← B1 and r2: h ← B2 with the same head, P is transformed to a d-program (P ∖ {r1, r2}) ∪ {h ← h1 ∨ h2, h1 ← B1, h2 ← B2}. Here, h1 and h2 are new propositional variables associated with r1 and r2, respectively.

Generally, a non-SD program P is transformed to a d-program as follows.

###### Definition 6 (transformation)

Let P be a definite program and B_P its Herbrand base. For each p ∈ B_P, put R_p = {r ∈ P ∣ head(r) = {p}} and k_p = |R_p|. If k_p ≤ 1, define S_p = R_p and D_p = ∅. If k_p ≥ 2, suppose R_p = {r_1, …, r_{k_p}}, and define S_p = {p_i ← body(r_i) ∣ 1 ≤ i ≤ k_p} and D_p = {p ← p_1 ∨ ⋯ ∨ p_{k_p}}, where each p_i is a new propositional variable such that p_i ∉ B_P and p_i ≠ p_j if i ≠ j. Then, build a d-program

  P^δ = (P ∖ ⋃_{p ∈ B_P} R_p) ∪ ⋃_{p ∈ B_P} (S_p ∪ D_p)

where ⋃_{p ∈ B_P} S_p is an SD program and ⋃_{p ∈ B_P} D_p is a set of d-rules.

P^δ introduces additional propositional variables, and B_P ⊆ B_{P^δ} holds. By definition, the next result holds.

###### Proposition 1

Let P be a definite program and P^δ its transformed d-program. Suppose that P and P^δ have the least models M and M^δ, respectively. Then M = M^δ ∩ B_P holds.

In this way, any definite program P is transformed to a semantically equivalent d-program P^δ = Q ∪ D where Q is an SD program and D is a set of d-rules. A d-program is represented by a matrix as follows.

###### Definition 7 (program matrix for d-programs)

Let P be a d-program such that P = Q ∪ D where Q is an SD program and D is a set of d-rules, and let B_P = {p1, …, pn} be the Herbrand base of P. Then P is represented by a matrix M_P ∈ R^{n×n} such that for each element a_ij (1 ≤ i, j ≤ n) in M_P,

1. a_{ij_k} = 1 (1 ≤ k ≤ l) if p_i ← p_{j_1} ∨ ⋯ ∨ p_{j_l} (l ≥ 1) is in D;

2. otherwise, every rule in Q is encoded as in Def. 3.

Given a program matrix M_P and the initial vector v0 representing the facts in P, the fixpoint FP(M_P v0) is computed as before. The fixpoint represents the least model of P.

###### Theorem 2

Let P be a d-program and M_P its program matrix. Then a vector u represents the least model of P iff u = FP(M_P v0) where v0 is the initial vector of P.

By Proposition 1 and Theorem 2, we can compute the least model of any definite program.

###### Example 3

Consider a program P in which some atom is defined by two rules. As P is a non-SD program,
it is transformed to a d-program P^δ where new propositional variables p1 and p2 are introduced for the two defining rules. Then P^δ is represented by the matrix M_{P^δ}. Starting from the initial vector v0 of P^δ and iterating v_{k+1} = θ(M_{P^δ} v_k), a fixpoint v_{m+1} = v_m is reached. Then v_m represents the least model of P^δ, and by Proposition 1 its restriction to B_P is the least model of P.

An algorithm for computing the least model of a definite program is shown in Figure 1. In the algorithm, the complexity of computing M_P v_k is O(n^2) and that of computing θ(M_P v_k) is O(n), where n = |B_P|. The number of iterations is at most n. So the complexity of Step 3 is O(n^3) in the worst case.

### 3.3 Column Reduction

To decrease the complexity of computing fixpoints, we introduce a technique of column reduction of program matrices.

###### Definition 8 (submatrix representation of d-programs)

Let P be a definite program such that B_P = {p1, …, pn}. Suppose that P is transformed to a d-program P^δ such that P^δ = Q ∪ D where Q is an SD program and D is a set of d-rules, and B_{P^δ} = {p1, …, pn, p_{n+1}, …, p_m}. Then P^δ is represented by a matrix N_{P^δ} ∈ R^{m×n} such that each element a_ij (1 ≤ i ≤ m, 1 ≤ j ≤ n) in N_{P^δ} is equivalent to the corresponding element in M_{P^δ} of Def. 7. N_{P^δ} is called a submatrix of M_{P^δ}.

Note that the size m × m of M_{P^δ} of Def. 7 is reduced to m × n in Def. 8, since n ≤ m. In N_{P^δ} the columns do not include values of the newly introduced propositions, and the derivation of propositions via d-rules is checked by the following θ_D-thresholding.

###### Definition 9 (θD-thresholding)

Given a vector v = (a1, …, am)^T, define a vector θ_D(v) = (a1′, …, am′)^T such that (i) ai′ = 1 if ai ≥ 1, (ii) ai′ = 1 if ai = 0 and there is a d-rule p_i ← p_{j_1} ∨ ⋯ ∨ p_{j_l} such that a_{j_k} ≥ 1 for some k (1 ≤ k ≤ l), and (iii) ai′ = 0 otherwise. We call θ_D(v) the θ_D-thresholding of v.

Intuitively, θ_D-thresholding introduces the additional condition Def. 9(ii) to θ-thresholding, which means that "if an element in the body of a d-rule is 1, then the element in the head of the d-rule is set to 1". θ_D(v) is computed by checking the value of ai for 1 ≤ i ≤ m and checking all d-rules for the elements with ai = 0. Since the number of d-rules is at most n, the complexity of computing θ_D(v) is O(mn). By definition, θ_D(v) = θ(v) holds if D = ∅.

###### Proposition 2

Let P be a definite program with B_P = {p1, …, pn}, and P^δ a transformed d-program with B_{P^δ} = {p1, …, pm} (n ≤ m). Let N_{P^δ} be a submatrix of M_{P^δ}. Given a vector v representing an interpretation I of P^δ, let w = θ_D(N_{P^δ} v[1…n]). Then w is a vector representing an interpretation J of P^δ such that J = T_{P^δ}(I).

Given a program matrix N_{P^δ} and the initial vector v0 of P^δ, define

  v1 = θ_D(N_{P^δ} v0[1…n])  and  v_{k+1} = θ_D(N_{P^δ} v_k[1…n])  (k ≥ 1)

where v_k[1…n] is the vector of the first n elements of v_k and N_{P^δ} v_k[1…n] represents the product of N_{P^δ} and v_k[1…n]. Then it is shown that v_{m′+1} = v_{m′} for some m′ ≥ 1. When v_{m′+1} = v_{m′}, we write v_{m′} = FP(N_{P^δ} v0[1…n]).

###### Theorem 3

Let P be a definite program with B_P, and P^δ a transformed d-program with submatrix N_{P^δ}. Then a vector u represents the least model of P^δ iff u = FP(N_{P^δ} v0[1…n]) where v0 is the initial vector of P^δ.

Generally, given a d-program P^δ, the number of iterations needed to reach the fixpoint in this way is not greater than the corresponding number in the method of Section 3.1.

###### Example 4

For the d-program P^δ of Example 3, we have the submatrix N_{P^δ} representing P^δ.
Given the initial vector v0 of P^δ, iterating v_{k+1} = θ_D(N_{P^δ} v_k[1…n]) reaches a fixpoint v_{m+1} = v_m. Then v_m is a vector representing the least model of P^δ, and its first n elements represent the least model of P. Note that the element corresponding to the head of the d-rule becomes 1 by Def. 9(ii).

By Proposition 2, we can replace the computation θ(M_{P^δ} v_k) in Step 3 of Algorithm 1 in Figure 1 by θ_D(N_{P^δ} v_k[1…n]). In the column reduction method, the complexity of computing N_{P^δ} v_k[1…n] is O(mn) and that of computing θ_D(v) is O(mn). The number of iterations is at most m. So the complexity of computing the fixpoint is O(m^2 n). Compared with the complexity of Step 3 in Algorithm 1 applied to P^δ, the column reduction reduces the complexity from O(m^3) to O(m^2 n), as n ≤ m in general.
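The column-reduction computation can be sketched on a small hypothetical non-SD program P = {p ← q, p ← r, r ←}, whose d-program P^δ = {p ← p1 ∨ p2, p1 ← q, p2 ← r, r ←} introduces the new atoms p1 and p2. The index order (p, q, r, p1, p2) and the encoding of d-rules as a Python dict are our own choices.

```python
# Sketch of Defs. 8-9 and Theorem 3: submatrix N plus theta_D-thresholding.
import numpy as np

n, m = 3, 5   # |B_P| = 3 original atoms, |B_{P^delta}| = 5 atoms in total
# Submatrix N: m rows, but only the n columns for the original atoms.
N = np.array([
    [0.0, 0.0, 0.0],   # p: defined only by the d-rule, handled in theta_D
    [0.0, 0.0, 0.0],   # q: no defining rule
    [0.0, 0.0, 1.0],   # r <- r (the fact r <-)
    [0.0, 1.0, 0.0],   # p1 <- q
    [0.0, 0.0, 1.0],   # p2 <- r
])
d_rules = {0: [3, 4]}  # p <- p1 v p2, as head index -> body indices

def theta_D(v):
    out = (v >= 1.0).astype(float)
    for head, body in d_rules.items():
        if any(v[j] >= 1.0 for j in body):
            out[head] = 1.0    # Def. 9(ii): some disjunct is derived
    return out

def fixpoint(N, v0):
    # iterate v_{k+1} = theta_D(N v_k[1..n]) until a fixpoint
    v = theta_D(N @ v0[:n])
    while True:
        w = theta_D(N @ v[:n])
        if np.array_equal(w, v):
            return v
        v = w

v0 = np.zeros(m); v0[2] = 1.0   # initial vector: only the fact r
print(fixpoint(N, v0))          # p, r (and p2) true: least model of P is {p, r}
```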

## 4 Partial Evaluation

Partial evaluation is known as an optimization technique in logic programming Lloyd & Shepherdson (1991). In this section, we provide a method of computing partial evaluation of definite programs in vector spaces.

###### Definition 10 (partial evaluation)

Let P be an SD program. For any rule r: h ← b1 ∧ ⋯ ∧ bm in P, let r_i be the rule of P with head(r_i) = {b_i} (1 ≤ i ≤ m), which is unique if it exists since P is SD. Then construct a rule unfold(r) such that

• head(unfold(r)) = head(r), and

• body(unfold(r)) = body(r_1) ∪ ⋯ ∪ body(r_m).

Define

  peval(P) = (⋃_{r ∈ P} unfold(r)) ∖ R

where ⋃_{r ∈ P} unfold(r) is the set of unfolded rules and R is the set of rules whose bodies contain an atom that is the head of no rule in P. peval(P) is called partial evaluation of P.

###### Example 5

Consider P = {p ← q ∧ s ∧ t, q ← p ∧ t, s ← t, t ←}. Put r1 = (p ← q ∧ s ∧ t), r2 = (q ← p ∧ t), r3 = (s ← t), and r4 = (t ←). Unfolding rules produces: unfold(r1) = (p ← p ∧ t), unfold(r2) = (q ← q ∧ s ∧ t), unfold(r3) = (s ←), and unfold(r4) = (t ←). Then it becomes peval(P) = {p ← p ∧ t, q ← q ∧ s ∧ t, s ←, t ←}.

By definition, peval(P) is obtained from P by unfolding the propositional variables appearing in the body of every rule in P in parallel. If body(r) contains an atom unfolded by no rule in P, then r is just removed from peval(P). Partial evaluation preserves the least model of the original program (Lloyd & Shepherdson 1991).

###### Proposition 3

Let P be an SD program. Then P and peval(P) have the same least model.
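Definition 10 can be checked on the program of Example 5 with a small sketch; encoding each rule as a (head, frozenset(body)) pair is our own choice, and the sketch assumes an SD program, so every body atom has at most one defining rule.

```python
# Sketch of one step of partial evaluation (Definition 10).

def peval(program):
    defs = {h: body for (h, body) in program}   # SD: unique rule per head
    out = []
    for h, body in program:
        if any(b not in defs for b in body):
            continue           # an atom unfolded by no rule: drop the rule
        # parallel unfolding: replace each body atom by its defining body
        new_body = frozenset().union(*[defs[b] for b in body])
        out.append((h, new_body))
    return set(out)

# P = {p <- q ^ s ^ t, q <- p ^ t, s <- t, t <-}
P = [("p", frozenset({"q", "s", "t"})),
     ("q", frozenset({"p", "t"})),
     ("s", frozenset({"t"})),
     ("t", frozenset())]
print(sorted((h, sorted(b)) for h, b in peval(P)))
# [('p', ['p', 't']), ('q', ['q', 's', 't']), ('s', []), ('t', [])]
```

This reproduces peval(P) = {p ← p ∧ t, q ← q ∧ s ∧ t, s ←, t ←} from Example 5.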

Partial evaluation is computed by matrix products in vector spaces.

###### Example 6

The program P of Example 5 is represented by the matrix M_P, and (M_P)^2 becomes

  M_P =
         p    q    s    t
   p  (  0   1/3  1/3  1/3 )
   q  ( 1/2   0    0   1/2 )
   s  (  0    0    0    1  )
   t  (  0    0    0    1  )

  (M_P)^2 =
         p    q    s    t
   p  ( 1/6   0    0   5/6 )
   q  (  0   1/6  1/6  2/3 )
   s  (  0    0    0    1  )
   t  (  0    0    0    1  )

Intuitively speaking, the non-zero elements in (M_P)^2 represent the conjuncts appearing in each rule. So the first row represents the rule p ← p ∧ t and the second row represents the rule q ← q ∧ s ∧ t. (M_P)^2 then represents peval(P). (M_P)^2 is different from peval(P) in the representation of the rule s ←. This is because t ← is represented as t ← t in M_P, so that unfolding s ← t by t ← t becomes s ← t. Thus, (M_P)^2 does not represent the result of unfolding rules by facts precisely, while this does not affect the result of computing the least model of P. In fact, applying the initial vector v0 = (0, 0, 0, 1)^T representing the fact in P and applying θ-thresholding, we obtain the fixpoint (0, 0, 1, 1)^T that represents the least model {s, t} of P. We say that (M_P)^2 represents the rule-by-rule (shortly, r-r) partial evaluation, and often say just partial evaluation when no confusion arises. Formally, we have the next result.

###### Proposition 4

Let P be an SD program and v0 the initial vector representing the facts of P. Then FP((M_P)^2 v0) = FP(M_P v0).

Partial evaluation has the effect of reducing deduction steps by unfolding rules in advance. Proposition 4 realizes this effect by computing matrix products in advance. Partial evaluation is performed iteratively as

  peval^k(P) = peval(peval^{k−1}(P))  (k ≥ 1)  and  peval^0(P) = P.

Iterative partial evaluation is computed by matrix products as follows.

Let P be an SD program and M_P its program matrix. Define M_P^0 = M_P and M_P^k = (M_P^{k−1})^2 (k ≥ 1). Then M_P^k is a matrix representing a program that is obtained by the k-th iteration of (r-r) partial evaluation.

###### Theorem 4

Let P be an SD program and k ≥ 1. Then FP(M_P^k v0) = FP(M_P v0) where v0 is the initial vector of P.
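The effect of Theorem 4 can be illustrated by comparing fixpoints computed with M_P and with its square, here on the program of Examples 5 and 6; the index order (p, q, s, t) is our own choice.

```python
# Squaring M_P performs r-r partial evaluation in advance: the fixpoint
# computed with (M_P)^2 coincides with the one computed with M_P.
import numpy as np

M = np.array([
    [0.0, 1/3, 1/3, 1/3],   # p <- q ^ s ^ t
    [0.5, 0.0, 0.0, 0.5],   # q <- p ^ t
    [0.0, 0.0, 0.0, 1.0],   # s <- t
    [0.0, 0.0, 0.0, 1.0],   # t <- t (the fact t <-)
])
v0 = np.array([0.0, 0.0, 0.0, 1.0])   # only the fact t holds initially

def theta(v):
    return (v >= 1.0).astype(float)

def fixpoint(M, v0):
    v = theta(M @ v0)
    while True:
        w = theta(M @ v)
        if np.array_equal(w, v):
            return v
        v = w

M2 = M @ M                            # one step of r-r partial evaluation
print(fixpoint(M, v0), fixpoint(M2, v0))   # both represent {s, t}
```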

When P is a non-SD program, first transform P to a d-program P^δ = Q ∪ D where Q is an SD program and D is a set of d-rules (Section 3.2). Next, define M_Q as in Def. 3. We then compute (r-r) partial evaluation of P^δ as (r-r) partial evaluation of the SD program Q plus the d-rules D.

An algorithm for computing the least model of a definite program by (r-r) partial evaluation is shown in Figure 2. We can combine partial evaluation and column reduction of Section 3.3 by slightly changing Step 3 of Algorithm 2. We evaluate this hybrid method in the next section.