As in the case of combinatorial structures, the study of randomly generated instances of NP-complete problems in artificial intelligence has received significant attention in the last two decades. These problems include the satisfiability of boolean formulas (SAT) and constraint satisfaction problems (CSP) [Achlioptas et al. (1997), Achlioptas et al. (2005), Cheeseman et al. (1991), Gent and Walsh (1994), Huberman and Hogg (1987), Mitchell et al. (1992), Monasson et al. (1999)]. In turn, results on the properties of random SAT and random CSP have significantly helped researchers to better understand SAT and CSP and to develop fast solvers for them.
On the other hand, it is well known that reasoning in propositional logic and in most constraint languages is monotonic in the sense that conclusions obtained before new information is added cannot be withdrawn. However, commonsense knowledge is nonmonotonic. In artificial intelligence, significant effort has been devoted to developing fundamental problem solving paradigms that allow users to conveniently represent and reason about commonsense knowledge and to solve problems in a declarative way. Answer set programming (ASP) is currently one of the most widely used nonmonotonic reasoning systems due to its simple syntax, its precise semantics and, importantly, the availability of ASP solvers such as clasp [Gebser et al. (2009)], dlv [Leone et al. (2006)], and smodels [Syrjänen and Niemelä (2001)]. However, the theoretical study of random ASP has not made much progress so far [Namasivayam and Truszczynski (2009), Namasivayam (2009), Schlipf et al. (2005), Zhao and Lin (2003)].
[Zhao and Lin (2003)] first conducted an experimental study of phase transitions for randomly generated ASP programs whose rules can have three or more literals. [Schlipf et al. (2005)] reported on their experimental work on determining the distribution of randomly generated normal logic programs at the Dagstuhl Seminar.
To study statistical properties of random programs, [Namasivayam and Truszczynski (2009), Namasivayam (2009)] considered the class of randomly generated ASP programs in which each rule has exactly two literals, called simple random programs. Their method exploits a close connection between simple random programs and random graphs: a random program is transformed into a random graph, so that known statistical properties of random graphs carry over to simple random programs. As the authors have commented, the classes of random programs that correspond to classes of random graphs are too restricted to be useful. Their effort further confirms that it is challenging to recast statistical properties of SAT/CSP in nonmonotonic formalisms such as ASP.
In fact, monotonicity plays an important role in the proofs of major results for random SAT/CSP. Specifically, the major statistical properties of SAT/CSP are based on a simple but important property: an interpretation is a model of a set of clauses/constraints if and only if it is a model of each clause/constraint. Due to the lack of monotonicity in ASP, this property fails to hold for ASP and other major nonmonotonic formalisms.
For this reason, it might make sense to first focus on some relatively simple but expressive classes of ASP programs (i.e., still NP-complete). We argue that the class of negative two-literal programs (i.e., normal logic programs in which a rule body has exactly one negative literal) is a good start for studying random logic programs under answer set semantics for several reasons (see Footnote 1): (1) The problem of deciding if a negative two-literal program has an answer set is still NP-complete. In fact, the class of negative two-literal programs is used to show the NP-hardness of answer set semantics for normal logic programs in [Marek and Truszczynski (1991)] (Theorem 6.4 and its proof, where a negative two-literal program corresponds to a simple -theory). (2) Many important NP-complete problems can be easily encoded as (negative) two-literal programs [Huang et al. (2002)]. (3) Negative two-literal programs allow us to conduct large scale experiments with existing ASP solvers, such as smodels, dlv and clasp.
Footnote 1: Our definition of negative two-literal programs here is slightly different from that used by some other authors. But these definitions are essentially equivalent if we notice that a fact rule $a \leftarrow$ can be expressed as a rule $a \leftarrow not\ b$ where $b$ is a new atom. Details can be found in Section 2.
In this paper we introduce a new model for generating and studying random negative two-literal programs, called the linear model. A random program generated under the linear model has size about $cn$, where $c$ is a constant and $n$ is the total number of atoms. We choose such a model of randomly generating negative two-literal programs for two reasons. First, if we generate programs randomly in the natural way, as has been done for SAT and CSP, we come up with two possible models in terms of program sizes (i.e., linear in $n$ and quadratic in $n$), since only $n^2$ negative two-literal rules in total can be generated from a set of $n$ atoms. We study statistical properties of such random programs and have obtained both theoretical and experimental results for random programs generated under the linear model, in particular Theorem 1. These properties include the average number of answer sets, the size distribution of answer sets, and the distribution of consistent programs under the linear model. Second, such results can be used in practical applications. For instance, it is important to compute all answer sets of a program in applications such as diagnosis and query answering in P-log [Baral et al. (2009)]. In such cases, the number of answer sets for a program is certainly relevant. If we know the number of answer sets and the average size of the answer sets for a logic program, such information can provide useful heuristics for finding all answer sets of a given program. Also, the linear model of random programs may be useful in application domains such as ontology engineering, where most large practical ontologies are sparse in the sense that the ratio of terminological axioms to concepts/roles is relatively small [Staab and Studer (2004)].
The contributions of this work can be summarised as follows:
A model for generating random logic programs, called the linear model, is established. Our model generates random logic programs in a way similar to random SAT and CSP, but we distinguish the probabilities for picking pure rules and contradiction rules. [Namasivayam and Truszczynski (2009)] discusses some program classes of two-literal programs that may not be negative. However, as their major results are inherited from the corresponding ones in random graph theory, such results hold only for very special classes of two-literal programs. For instance, in regard to the result on negative two-literal programs without contradiction rules (Theorem 2, page 228), the authors pointed out that the theorem “concerns only a narrow class of dense programs, its applicability being limited by the specific number of rules programs are to have” (, is a fixed number, the number of rules and ).
Footnote 2: There may be an error here as ..
We mathematically show that the average number of answer sets for a random program converges to a constant when the number of atoms approaches infinity. We note that the proofs of statistical properties, such as phase transitions, for random SAT and random CSP are usually obtained through the independence of certain probabilistic events, which in turn is based on a form of the monotonicity of classical logics (specifically, given a set of formulas with , it holds that when denotes the set of all models of a formula or a set of formulas). However, it is well known that ASP is nonmonotonic. In our view, this is why many proof techniques for random SAT cannot be immediately adapted to random ASP. In order to provide a formal proof for Theorem 1, we resort to some techniques from mathematical analysis such as Stirling’s Approximation and Taylor series. As a result, our proof is both mathematically involved and technically novel. We look into the application of our main result in predicting the consistency of random programs (Proposition 5 and Section 4.3).
We have conducted extensive experiments on statistical properties of random programs generated under the linear model. These properties include the average number of answer sets, the size distribution of answer sets, and the distribution of consistent programs under the linear model. For the average number of answer sets, our experimental results closely match the theoretical results obtained in Section 3. Also, the experimental results corroborate the conjecture that under the linear model, the size distribution of answer sets for random programs obeys a normal distribution when $n$ is large. The experimental results show that our theoretical results can be used to predict practical situations. As explained above, we need to find all answer sets in some applications. For large logic programs, it may be infeasible to find all answer sets, but we could develop algorithms for finding most of the answer sets. If we know the average size of answer sets, we might need to examine only those sets of atoms whose sizes are close to the average size.
The rest of the paper is arranged as follows. In Section 2, we briefly review answer set semantics of logic programs and some properties of two-literal programs that will be used in the subsequent sections. In Section 3, we first introduce the linear model for random logic programs (negative two-literal programs), study mathematical properties of random programs, and then present the main result in a theorem. In Section 4 we describe some of our experimental results and compare them with related theoretical results obtained in the paper. We conclude the work in Section 5. For the convenience of readers, some mathematical basics required for the proofs are included in the Appendix at the end of the paper.
2 Answer Set Semantics and Two-Literal Programs
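We briefly review some basic definitions and notation of answer set programming (ASP). We restrict our discussion to finite propositional logic programs on a finite set $A_n$ of $n$ atoms ($n > 0$).
A normal logic program (simply, logic program) $P$ is a finite set of rules of the form
$a \leftarrow b_1, \ldots, b_s, not\ c_1, \ldots, not\ c_t$ (1)
where $not$ is for the default negation, $s, t \geq 0$, and $a$, $b_i$ and $c_j$ are atoms in $A_n$ ($1 \leq i \leq s$, $1 \leq j \leq t$).
We assume that all atoms appearing in the body of a rule are pairwise distinct.
A literal is an atom $a$ or its default negation $not\ a$. The latter is called a negative literal. An atom $a$ and its default negation $not\ a$ are said to be complementary.
Given a rule $r$ of form (1), its head is defined as $head(r) = a$ and its body is $body(r) = body^+(r) \cup not\ body^-(r)$ where $body^+(r) = \{b_1, \ldots, b_s\}$, $body^-(r) = \{c_1, \ldots, c_t\}$, and $not\ body^-(r) = \{not\ c \mid c \in body^-(r)\}$.
A rule $r$ of form (1) is positive, if $t = 0$; negative, if $s = 0$. A logic program $P$ is called positive (resp. negative), if every rule in $P$ is positive (resp. negative).
An interpretation for a logic program $P$ is a set of atoms $S \subseteq A_n$. A rule $r$ is satisfied by $S$, denoted $S \models r$, if $head(r) \in S$ whenever $body^+(r) \subseteq S$ and $body^-(r) \cap S = \emptyset$. Furthermore, $S$ is a model of $P$, denoted $S \models P$, if $S \models r$ for every rule $r \in P$. A model $S$ of $P$ is a minimal model of $P$ if for any model $S'$ of $P$, $S' \subseteq S$ implies $S' = S$.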
The semantics of a logic program is defined in terms of its answer sets (or equivalently, stable models) [Gelfond and Lifschitz (1988), Gelfond and Lifschitz (1990)] as follows. Given an interpretation , the reduct of on is defined as . Note that is a positive logic program and every (normal) positive program has a unique least model. Then we say is an answer set of , if is the least model of . By we denote the collection of all answer sets of . For an integer , denotes the set of answer sets of size for .
A logic program may have zero, one or multiple answer sets. is said to be consistent, if it has at least one answer set. It is well-known that the answer sets of a logic program are incomparable: for any and in , implies .
Two logic programs and are equivalent under answer set semantics, denoted , if , i.e., and have the same answer sets. We can slightly generalise the equivalence of two programs as follows. Let be a logic program on and a logic program on , where is a set of new (auxiliary) atoms. We say and are equivalent if the following two conditions are satisfied:
if , then there exists such that and .
if , then is in .
From the next section on, we will focus on a special class of logic programs, called negative two-literal programs.
A negative two-literal rule is a rule of the form $a \leftarrow not\ b$ where $a$ and $b$ are atoms. These two atoms do not have to be distinct. If $a \neq b$, it is a pure rule; if $a = b$, it is a contradiction rule. A negative two-literal program is a finite set of negative two-literal rules.
We note that our definition is slightly different from that of some other authors, such as [Janhunen (2006), Lonc and Truszczynski (2002)], in that fact rules are not allowed in our definition. This may not be an issue since a fact rule of the form $a \leftarrow$ can be expressed as a negative two-literal rule $a \leftarrow not\ b$, where $b$ is a new atom that does not appear in the program.
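To make these definitions concrete, the following minimal sketch enumerates the answer sets of a small negative two-literal program by brute force. The representation is an assumption made for illustration and is not taken from the paper: a rule `a <- not b` is stored as the pair `(a, b)`, and the function names are hypothetical. For such programs the reduct on a set `S` consists of the facts `a` for which some rule `a <- not b` has `b` outside `S`, so `S` is an answer set exactly when it equals that set of heads.

```python
from itertools import combinations

def reduct_least_model(program, s):
    """Least model of the reduct of `program` on `s`.

    Each rule is a pair (a, b) standing for `a <- not b` (an assumed
    encoding). The reduct keeps the fact `a` exactly when b is not in s.
    """
    return frozenset(a for (a, b) in program if b not in s)

def answer_sets(program, atoms):
    """Enumerate all answer sets by checking every subset of `atoms`."""
    atoms = sorted(atoms)
    found = []
    for k in range(len(atoms) + 1):
        for subset in combinations(atoms, k):
            s = frozenset(subset)
            if reduct_least_model(program, s) == s:
                found.append(s)
    return found

# Two pure rules: a <- not b and b <- not a (answer sets {a} and {b}).
even_loop = {("a", "b"), ("b", "a")}
# Adding the contradiction rule a <- not a removes the answer set {b}.
with_contradiction = even_loop | {("a", "a")}

if __name__ == "__main__":
    print(answer_sets(even_loop, {"a", "b"}))
    print(answer_sets(with_contradiction, {"a", "b"}))
```

The exhaustive loop is exponential in the number of atoms, so it is only meant for checking small examples by hand, not as a substitute for an ASP solver.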
It is shown in [Marek and Truszczynski (1991)] that the problem of deciding the existence of answer sets for a negative two-literal program is NP-complete. This result confirms that the class of negative two-literal programs is computationally powerful and it makes sense to study the randomness for such a class of logic programs.
We remark that, by allowing the contradiction rules, constraints of the form () can be expressed in the class of negative two-literal programs. A contradiction rule is strongly equivalent to the constraint under answer set semantics: for any logic program , is equivalent to under answer set semantics. Notice also that a constraint of the form is strongly equivalent to the two constraints and , and a constraint of the form is strongly equivalent to two rules and where is a fresh atom.
In the rest of this section we present three properties of negative two-literal programs. While Proposition 1 is to demonstrate the expressive power of negative two-literal programs, Propositions 2 and 3 will be used to prove our main theorem in the next section. These properties are already known in the literature and we do not claim their originality here.
First, each logic program can be equivalently transformed into a negative two-literal program under answer set semantics. This result is mentioned in [Blair et al. (1999)] but no proof is provided there. For completeness, we provide a proof of this proposition in the appendix at the end of the paper.
Proposition 1. Each normal logic program is equivalent to a negative two-literal program under answer set semantics.
The next result provides an alternative characterization for the answer sets of a negative two-literal program, which is a special case of Theorem 6.81, Section 6.8 in [Marek and Truszczynski (1993)].
Proposition 2. Let $P$ be a negative two-literal program on $A_n$ containing at least one rule. Then $S \subseteq A_n$ is an answer set of $P$ iff the following two conditions are satisfied:
(1) If $a, b \in A_n \setminus S$, then $a \leftarrow not\ b$ is not a rule in $P$.
(2) If $a \in S$, then there exists $b \in A_n \setminus S$ such that $a \leftarrow not\ b$ is a rule in $P$.
We note that in condition 1 above, it can be the case that $a = b$.
We note that if the empty set is an answer set of a negative two-literal program, the program must be empty. Also, $A_n$ is not an answer set for any negative two-literal program on $A_n$.
Proposition 3. Let $P$ be a negative two-literal program on $A_n$ containing at least one rule. If $S$ is an answer set of $P$, then $0 < |S| < n$. Here $|S|$ is the number of elements in $S$.
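The characterisation and the size bound can be checked mechanically on small instances. The sketch below reads the two conditions as follows, which is an assumption wherever the extracted text is incomplete: (1) no rule `a <- not b` has both `a` and `b` outside `S` (including the case `a = b`), and (2) every atom of `S` is the head of some rule whose body atom lies outside `S`. It compares this test with the reduct-based definition of answer sets over every non-empty program on a given set of atoms. The pair encoding `(a, b)` and all function names are illustrative.

```python
from itertools import combinations, product

def is_answer_set_by_reduct(program, s):
    """Definition: s equals the heads of rules (a, b) with b outside s."""
    return {a for (a, b) in program if b not in s} == set(s)

def is_answer_set_by_conditions(program, s):
    """Two-condition test (assumed reading of the characterisation)."""
    # Condition 1: no rule has both its head and its body atom outside s.
    cond1 = all(a in s or b in s for (a, b) in program)
    # Condition 2: every atom in s has a rule whose body atom is outside s.
    cond2 = all(any(h == a and b not in s for (h, b) in program) for a in s)
    return cond1 and cond2

def all_subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def check_all_programs(atoms):
    """True if both tests agree on every non-empty program and every S,
    and every answer set S satisfies 0 < |S| < n."""
    rules = list(product(atoms, repeat=2))  # all n*n rules a <- not b
    for program in all_subsets(rules):
        if not program:
            continue  # the text assumes at least one rule
        for s in all_subsets(atoms):
            by_reduct = is_answer_set_by_reduct(program, s)
            if by_reduct != is_answer_set_by_conditions(program, s):
                return False
            if by_reduct and not (0 < len(s) < len(atoms)):
                return False
    return True

if __name__ == "__main__":
    print(check_all_programs(["a", "b", "c"]))
```

With three atoms there are 9 possible rules and therefore 511 non-empty programs, so the exhaustive check finishes quickly; it grows doubly exponentially and is not meant for larger atom sets.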
3 Random Programs and Their Properties
In this section we first introduce a model for randomly generating negative two-literal programs and then present some statistical properties of such random programs. The main result in this section (Theorem 1) shows that the expected number of answer sets for a random program on generated under our model converges to a constant when the number of atoms approaches infinity. As the proof of Theorem 1 is lengthy and mathematically involved, some technical details, as well as necessary basics of mathematical analysis, are included in the appendix at the end of the paper.
In this section, we assume that each negative two-literal program contains at least one rule.
Definition 1 (Linear Model )
Let and be two non-negative real numbers with . Given a set of atoms with , a random program on is a negative two-literal program that is generated as follows:
For any two different atoms , the probability of the pure rule being in is .
For any atom , the probability of the constraint being in is .
Each rule is selected randomly and independently based on the given probability.
In the above notation, ‘’ is for ‘negative two-literal programs’. For simplicity, we assume that a random program is non-empty. If , then a random program generated under does not contain any contradiction rules.
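A generator for this model can be sketched in a few lines. The exact probabilities are missing from the extracted definition, so the sketch assumes that a pure rule is included with probability `c1 / n` and a contradiction rule with probability `c2 / n`, which is consistent with the expected program size being linear in the number of atoms. The parameter names `c1` and `c2`, the integer atom labels, and the pair encoding `(a, b)` for `a <- not b` are all illustrative choices.

```python
import random

def random_program(n, c1, c2, rng=None):
    """Sample a negative two-literal program on atoms 0..n-1.

    Assumption (not confirmed by the extracted text): each pure rule
    a <- not b (a != b) is included with probability c1 / n, and each
    contradiction rule a <- not a with probability c2 / n, independently.
    A rule a <- not b is returned as the pair (a, b).
    """
    if c1 < 0 or c2 < 0 or n <= max(c1, c2):
        raise ValueError("need c1, c2 >= 0 and n > max(c1, c2)")
    rng = rng or random.Random()
    p, q = c1 / n, c2 / n
    program = set()
    for a in range(n):
        for b in range(n):
            threshold = q if a == b else p
            if rng.random() < threshold:
                program.add((a, b))
    return program

if __name__ == "__main__":
    sample = random_program(n=50, c1=5.0, c2=1.0, rng=random.Random(0))
    print(len(sample), "rules")
```

Since the text assumes that a random program is non-empty, a caller could simply resample whenever the returned set is empty. Passing an explicit `random.Random` instance keeps experiments reproducible.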
In probability theory, the expected value (or mathematical expectation) of a random variable is the weighted average of all possible values that this random variable can take on. Suppose a random variable $X$ can take $m$ possible values $x_1, \ldots, x_m$ and each $x_i$ has the probability $p_i$ for $1 \leq i \leq m$. Then the expected value of the random variable $X$ is defined as
$E[X] = \sum_{i=1}^{m} x_i p_i$.
Also, if a random variable $X$ is the sum of a finite number of other variables $X_i$ ($1 \leq i \leq s$), i.e., $X = \sum_{i=1}^{s} X_i$, then $E[X] = \sum_{i=1}^{s} E[X_i]$.
The number of rules in random program (i.e., the size of ) is a random variable. As there are possible pure rules, each of which has probability , and possible constraints, each of which has the probability . Thus, the expected value of , also called the expected number of rules for random program , is the sum of expected number of pure rules and the expected number of constraints:
This means that the average size of random programs generated under the model is a linear function of . This is the reason why we refer to our model for random programs as the linear model of random programs under answer sets.
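As a worked instance of this computation, assume (the exact expressions are missing from the extracted text, so the symbols $c_1$ and $c_2$ are illustrative) that a pure rule is included with probability $c_1/n$ and a constraint with probability $c_2/n$. There are $n(n-1)$ ordered pairs of distinct atoms and $n$ atoms, so linearity of expectation gives:

```latex
E[|P|] \;=\; n(n-1)\cdot\frac{c_1}{n} \;+\; n\cdot\frac{c_2}{n} \;=\; c_1\,(n-1) + c_2 .
```

Under these assumptions, for example, $c_1 = 5$, $c_2 = 0$ and $n = 100$ give $5 \cdot 99 = 495$ rules on average.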
For with (), the probability of being an answer set of random program , denoted , can be easily figured out as the next result shows. We remark that, by Proposition 3, for negative two-literal program , neither the empty set nor can be an answer set of . So we do not need to consider the case of or .
Let be a random program on a set of atoms, generated under , with . Then
Recall that and . If we denote , then Eq.(2) can be simplified into
Let be a subset of with and . We can split the first condition in Proposition 2 into two sub-conditions. is an answer set of negative two-literal program iff the following two conditions are satisfied:
for each pair with , the rule is not in .
for each , the rule is not in .
for each , there exists an atom such that is in .
Let us figure out the probabilities that the above conditions (1.1), (1.2) and (2) hold, respectively.
We say that an atom is supported w. r. t. in (or just, supported) if there exists a rule of the form in such that . In this case, the rule is referred to as a supporting rule for .
First, since contains elements, there are possible pure rules of the form with and . By the definition of , the probability that a pure rule does not belong to is . Thus, the probability that none of the pure rules with and belongs to is . That is, the condition (1.1) will hold with the probability .
Next, by the definition of , the probability that a constraint rule of the form does not belong to is . Since contains atoms, the probability that none of the constraint rules of the form with is . That is, the condition (1.2) will hold with the probability .
Last, we consider the condition (2). For each , if a pure rule supports , then it must be of the form for some . There are possible such pure rules. Also, is not supported by such pure rules only if does not contain such rules at all. Thus, the probability that is not supported (by one of such pure rules) is . That is, the probability that is supported is . As there are atoms in , the probability that every atom in is supported by a pure rule in is .
Combining the above three conditions, we know that the probability that is an answer set of random program is as follows.
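The product of the three factors can be sanity-checked numerically. The sketch below is an illustration under stated assumptions rather than the paper's own formula: it writes `p` and `q` for the inclusion probabilities of a pure rule and of a contradiction rule, takes `|S| = k` so that `n - k` atoms lie outside `S`, and multiplies the probabilities of conditions (1.1), (1.2) and (2). For a tiny atom set it then enumerates every possible program, weights it by its probability, and compares the exact expected number of answer sets with the value obtained from the closed form. All function names are hypothetical.

```python
from itertools import combinations, product
from math import comb

def prob_answer_set(n, k, p, q):
    """Probability that a fixed set S with |S| = k (0 < k < n) is an answer set.

    p and q are the inclusion probabilities of a pure rule and of a
    contradiction rule (assumed symbols).
    """
    m = n - k  # number of atoms outside S
    no_pure_outside = (1 - p) ** (m * (m - 1))   # condition (1.1)
    no_contradiction_outside = (1 - q) ** m      # condition (1.2)
    all_supported = (1 - (1 - p) ** m) ** k      # condition (2)
    return no_pure_outside * no_contradiction_outside * all_supported

def exact_expected_count(n, p, q):
    """Expected number of answer sets of size 1..n-1, by enumerating all programs."""
    atoms = range(n)
    rules = list(product(atoms, repeat=2))
    total = 0.0
    for mask in product([False, True], repeat=len(rules)):
        program = [r for r, keep in zip(rules, mask) if keep]
        weight = 1.0
        for (a, b), keep in zip(rules, mask):
            r = q if a == b else p
            weight *= r if keep else 1 - r
        count = 0
        for k in range(1, n):
            for subset in combinations(atoms, k):
                s = set(subset)
                # Reduct-based test: s equals the heads of rules with body atom outside s.
                if {a for (a, b) in program if b not in s} == s:
                    count += 1
        total += weight * count
    return total

def formula_expected_count(n, p, q):
    """Sum of C(n, k) * prob_answer_set(n, k, p, q) over 0 < k < n."""
    return sum(comb(n, k) * prob_answer_set(n, k, p, q) for k in range(1, n))

if __name__ == "__main__":
    n, p, q = 3, 0.4, 0.2
    print(exact_expected_count(n, p, q), formula_expected_count(n, p, q))
```

The enumeration visits `2 ** (n * n)` programs, so it is only practical for `n` up to about 4; its purpose is to confirm that the three conditions involve disjoint sets of rules and can therefore be multiplied.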
Now we are ready to present the main result in this section, which shows that the average number of answer sets for random logic programs generated under the linear model converges to a constant when the number of atoms approaches infinity. This constant is determined by and , e. g., when and , the constant is around .
Let denote a random program generated under the linear model and be the expected number of answer sets for random program . Then
where is the unique solution of the equation .
This result gives an estimate of the average number of answer sets for a random program. Before we prove Theorem 1, let us look at its application in predicting the consistency of a random program.
For a random program and a set of atoms , by we denote the (probabilistic) event that a given set of atoms is an answer set for . We introduce the following property for random programs:
(ASI) Given a random program , for any two sets and of atoms.
The ‘I’ in (ASI) is for ‘Independence’. Informally, the above property says that for any two sets of atoms and , the events and are independent of each other. We remark that this property does not hold in general. For example, suppose . If is an answer set of , then must not be an answer set of . This implies that and are actually not independent. However, when the set of atoms is sufficiently large, by Theorem 1, the average number of answer sets will be relatively small compared to the number of all subsets of . As a result, there will be a relatively small number of pairs and with such that and are not independent. Thus, when is sufficiently large, the impact of this dependency among answer sets will not be severe. Under the (ASI) assumption, we are able to derive an estimate of the probability that a random program has an answer set.
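To illustrate what an independence assumption of this kind buys, the following sketch computes the probability that at least one candidate set is an answer set when all the events are treated as mutually independent. This is one possible reading and is not claimed to coincide with the paper's Eq.(5): the inclusion probabilities `c1 / n` and `c2 / n`, the per-set probability, and the function names are assumptions made for illustration. Under independence, the probability that no set of size `k` with `0 < k < n` is an answer set is the product of the individual failure probabilities.

```python
from math import comb, expm1, log1p

def prob_answer_set(n, k, c1, c2):
    """Probability that a fixed set of size k is an answer set
    (assumed rule probabilities p = c1 / n and q = c2 / n)."""
    p, q = c1 / n, c2 / n
    m = n - k  # atoms outside the candidate set
    return (1 - p) ** (m * (m - 1)) * (1 - q) ** m * (1 - (1 - p) ** m) ** k

def consistency_estimate(n, c1, c2):
    """Estimate Pr(program has an answer set) under mutual independence.

    log Pr(no answer set) = sum_k C(n, k) * log(1 - Pr_k); the result is
    1 - exp(that sum), computed with expm1 for accuracy near zero.
    """
    if n <= max(c1, c2):
        raise ValueError("need n > max(c1, c2)")
    log_none = sum(
        comb(n, k) * log1p(-prob_answer_set(n, k, c1, c2))
        for k in range(1, n)
    )
    return -expm1(log_none)

if __name__ == "__main__":
    for c2 in (0.0, 1.0, 2.0):
        print(c2, consistency_estimate(30, 5.0, c2))
```

Because the events are in fact dependent, the value should be treated as a rough estimate only; the source itself reports a shift between the estimated and the observed probability.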
Let be a random program on a set of atoms, generated under , with . If (ASI) holds and is sufficiently large, then
As explained, (ASI) does not hold in realistic situations. Our experiments indeed show that there is a shift between the estimated probability determined by Eq.(5) and the actual probability. However, the experimental results suggest that this shift can be remedied by applying a factor of around to in Eq.(5); see Section 4 for details. So, combining Theorem 1 and Proposition 5, we will be able to estimate the probability for the consistency of random programs. Let be the event that is an answer set of size for random program . We first observe that by Eq.(3), . Recall that is the set of answer sets of size for logic program .
If is sufficiently large, then
In the rest of this section, we will present a formal proof of Theorem 1. Let us first outline a sketch for the proof. In order to prove Eq.(4), our first goal will be to show that is the sum of ’s for .
For an integer with , we use to denote the collection of answer sets of size for program , i.e., . Then the number is a random variable. It is easy to see that the expected number of answer sets of size for random program is
So the expected (total) number of answer sets for , denoted , can be expressed as
Note that by Proposition 3, a random program generated under the linear model has neither answer sets of size nor . So, we can ignore the cases of and .
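The sum over sizes can be evaluated numerically for fairly large atom sets if it is computed in log space. The sketch below is a hedged illustration: it assumes that a pure rule is included with probability `c1 / n` and a contradiction rule with probability `c2 / n`, and that a fixed set of size `k` is an answer set with the probability obtained by multiplying the three conditions of the preceding derivation. The function names are hypothetical, and `c1 > 0` is required so that the logarithms are defined.

```python
from math import exp, lgamma, log1p

def log_prob_answer_set(n, k, c1, c2):
    """Log-probability that a fixed set of size k is an answer set
    (assumed rule probabilities p = c1 / n and q = c2 / n)."""
    p, q = c1 / n, c2 / n
    m = n - k                          # atoms outside the candidate set
    log_unsupported = m * log1p(-p)    # log of (1 - p)^m
    return (m * (m - 1) * log1p(-p)    # no pure rule among outside atoms
            + m * log1p(-q)            # no contradiction rule on outside atoms
            + k * log1p(-exp(log_unsupported)))  # every inside atom supported

def log_binomial(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def expected_answer_sets(n, c1, c2):
    """Expected number of answer sets, summed over sizes 0 < k < n."""
    if c1 <= 0 or c2 < 0 or n <= max(c1, c2):
        raise ValueError("need c1 > 0, c2 >= 0 and n > max(c1, c2)")
    return sum(
        exp(log_binomial(n, k) + log_prob_answer_set(n, k, c1, c2))
        for k in range(1, n)
    )

if __name__ == "__main__":
    for n in (50, 100, 200, 400):
        print(n, expected_answer_sets(n, 5.0, 0.0))
```

Working with logarithms avoids overflow in the binomial coefficients and underflow in the probabilities, both of which occur quickly when the terms are multiplied directly. Printing the values for growing `n` gives a quick way to observe whether the sum stabilises.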
Our next goal is, based on Eq.(7), to show that
where the function is defined by
At the same time, we are going to show that
where the function defined below is a normal distribution function multiplied by a constant.
As the above integral of is , which can be figured out easily, the conclusion of Theorem 1 will be proven.
Here is the unique solution of the equation and is the normal distribution function
multiplied by a constant :
while and are defined, respectively, as follows.
Some remarks are in order. As , if for some , it must be the case that . On the other hand, if , the function is monotonically increasing and thus the equation must have a unique solution.
Moreover, we define
Before providing the proof of Theorem 1, we first prove some technical results.
The following result shows that , as defined in Eq.(9), is indeed a tight approximation to .
Let be a random program on a set of atoms, generated under , with . Let be the expected number of answer sets of size for (). Then ,
By Proposition 4,
Let . Then
By Proposition 6, we can show the following result.
Based on Eq.(17), and the fact that both and are non-negative,
As for , we have that
The next result shows that the integral of can be obtained through the integral of , which is useful as the integral of can be easily figured out.
4 Experimental Results
In this section, we describe some experimental results about the average number of answer sets, the size distribution of answer sets, and the probability of consistency for random programs under the linear model. For the average number of answer sets, our experimental results closely match the theoretical results obtained in Section 3.
To conduct the experiments, we have developed a software tool that randomly generates logic programs based on user-input parameters, such as the type of programs, the number of atoms, the number of literals in a rule, the number of rules in a program, and the number of programs in a test set. After a set of random programs is generated, the tool invokes an ASP solver to compute the answer sets of the random programs, records the test results in a file, and analyses them. The experimental results in this section were obtained with the ASP solver clasp [Gebser et al. (2009)], but the same patterns were obtained for test cases on which dlv [Leone et al. (2006)] and smodels [Syrjänen and Niemelä (2001)] were also used.
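One step of such a pipeline, writing a generated program in a form a solver front end can read, can be sketched as follows. How the authors' tool passes programs to the solvers is not described, so this is only an assumption: the sketch emits rules in the usual ASP text syntax `head :- not body.`, which is accepted by front ends such as clingo (which combines the grounder gringo with clasp) and by dlv. The pair encoding `(a, b)` for `a <- not b`, the atom prefix, and the function name are hypothetical.

```python
def to_asp(program, prefix="a"):
    """Render a negative two-literal program as ASP text.

    `program` is an iterable of pairs (h, b), each standing for the rule
    h <- not b (an assumed encoding). Atoms are printed as prefix + label.
    """
    lines = [f"{prefix}{h} :- not {prefix}{b}." for (h, b) in sorted(program)]
    return "\n".join(lines)

if __name__ == "__main__":
    # A pure rule a1 <- not a2 and a contradiction rule a2 <- not a2.
    print(to_asp({(1, 2), (2, 2)}))
```

Sorting the rules makes the output deterministic, which helps when the generated files are stored alongside the recorded test results.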
We have conducted a significant number of experiments to corroborate the theoretical results obtained in Section 3, including Theorem 1. In order to get a feel for how quickly the experimental distribution converges to the theoretical one, we tested the difference rate of these two values for varied numbers of atoms. The experimental results show that the theorem can be used to predict practical situations. Some other statistical properties of random programs generated under the linear model, such as the size distribution of answer sets, were also examined experimentally. Positive results were obtained for nearly all of our experiments. In this section, we report the results from two of our experiments. In the first experiment, we set , which means there are no contradiction rules in the programs. In the second experiment, we set from to 20 to test the impact of contradiction rules on the random programs.
4.1 Experiment 1: Random Programs without contradiction rules
In this experiment, , , and varies with values , respectively. For each of these values of , logic programs were randomly and independently generated under the linear model.
Given that and is determined by , we have that . Thus, by Eq.(4), it follows that .
We use to denote the average number of answer sets for the programs in each test generated under the linear model. The (experimental) values for and their corresponding theoretical values (i.e., the expected number of answer sets for random programs determined by Eq.(4)) are listed in Table 1. The experimental and theoretical results are visualized in Figure 1. We can see that these two values are very close even if is relatively small.
Another important result obtained from this experiment is about the size distribution of answer sets for random programs. Specifically, the experiment supports a conjecture that the distribution of the average size of answer sets for random programs obeys a normal distribution.
Average number of answer sets for the programs randomly generated in each test (referred to as ‘Experiment Result’ in Figure 2 and Figure 3): We took , respectively, and for each of these values of , we randomly generated programs under the linear model. For each (), we calculated the average number of answer sets of size for these programs, i.e., the ratio of the total number of answer sets of size for all these programs divided by .
Expected number of answer sets for random programs under the linear model (referred to as ‘The Model’ in Figure 2 and Figure 3): In order to compare the experimental values with their theoretical counterparts, for each , we calculated the expected number of answer sets of size for random programs under the linear model.
Normal Distribution function: The above two types of values were also compared with the function defined by Eq.(11), which is actually the normal distribution function multiplied by a constant.
, the theoretical results are still very close to the experimental results. In order to see how quickly the experimental distribution converges to the theoretical one, we consider the rate variance function: For two discrete functions and on the interval with (), we define
Clearly, the closer and , the smaller , and vice versa. The function is often used in measuring the gap between two discrete functions and . If we take as the normal distribution function and as the experimental distribution function (i.e., the average size of answer sets based on the programs randomly generated in each test), the resulting rate variance function is as depicted in Figure 4. This diagram shows that, as increases, the rate variance gradually decreases. It also shows that the rate variance is very small even when . This experimental result further supports the conjecture that the size distribution of answer sets obeys a normal distribution.
4.2 Experiment 2: Random Programs with contradiction rules
In this experiment, we tested random programs that may contain contradiction rules and obtained similar experimental results as in the first experiment. We set , , and , respectively. For each value of , programs were independently generated under the linear model.
On the other hand, based on Eq.(4), we can figure out the expected number of answer sets for each .
These two types of values are visualized as two curves in Figure 5. It shows that these two curves are very close to each other, which means our theoretical result on size distribution of answer sets is corroborated by the experimental result.
Similar to the first experiment, the size distribution of answer sets was also investigated experimentally. In this case, we took and three types of values were obtained (shown in Figure 6). There is a slight shift between the linear model and the normal distribution. We expect that when the number is sufficiently large, this shift will become narrower. For example, when increases from to , the shift is significantly reduced.
4.3 Experiment 3: Approximating the probability for consistency of random programs
In this subsection we present our experimental results on verifying the formula for predicting consistency of random programs (discussed in Section 3):
Here is a constant around (i.e. independent of ). We tested various pairs of and . For each such pair, we took . Then for each value of , we computed the value determined by Eq.(20). For each value of , we generated programs randomly and computed the ratio of consistent programs to all programs.
Our experimental results corroborate the estimation in Eq.(20). So this formula can be used to predict the consistency of random progr