1 Introduction
Inductive Logic Programming (ILP) [Muggleton (1991)] addresses the task of learning a logic program, called a hypothesis, that explains a set of examples using some background knowledge. Although ILP has traditionally addressed learning (monotonic) definite logic programs, several new systems have recently been proposed for learning under the (nonmonotonic) answer set semantics (e.g. [Ray (2009)], [Corapi et al. (2012)], [Athakravi et al. (2014)], [Law et al. (2014)] and [Law et al. (2015a)]). Among these, ILASP2 [Law et al. (2015a)] extended ILP to learning from ordered answer sets (ILP_LOAS), a computational task that learns answer set programs containing normal rules, choice rules and both hard and weak constraints.
Common to all ILP systems is the underlying assumption that hypotheses should cover the examples with respect to one fixed given background knowledge. But, in practice, some examples may be context-dependent – different examples may need to be covered using different background knowledges. For instance, within the problem domain of urban mobility, the task of learning journey preferences of people in a city may require a general background knowledge that describes the different modes of transport available to a user (walk, drive, etc.), and examples of which modes of transport users choose for particular journeys. In this case, the context of an example would be the attributes (e.g. the distance) of the journey. It is infeasible to assume that every possible journey could be encoded in the background knowledge – attributes, such as journey distances, may take too many possible values. But, encoding the attributes of observed journeys as contexts of the observations restricts the computation to those attribute values that are in the contexts.
In this paper, we present a generalisation of ILP_LOAS, called context-dependent learning from ordered answer sets (ILP_LOAS^context), which uses context-dependent examples. We show that any ILP_LOAS^context task can be translated into an ILP_LOAS task, and can therefore be solved by ILASP2. Furthermore, to improve the scalability of ILASP2, we present a new iterative reformulation of this learning algorithm, called ILASP2i. This iterative approach differs from existing nonmonotonic learning systems, which tend to be batch learners, meaning that they consider all examples at once. Nonmonotonic systems cannot use a traditional cover loop (e.g., [Muggleton (1995)]), as examples that were covered in previous iterations are not guaranteed to be covered in later iterations. However, ILASP2i iteratively computes a hypothesis by constructing a set of examples that are relevant to the search, without the need to consider all examples at once. Relevant examples are essentially counterexamples for the hypotheses found in previous iterations. This approach is a middle ground between batch learning and the cover loop: it avoids using the whole set of examples, but works in the nonmonotonic case, as the relevant examples persist through the iterations. We show that ILASP2i performs significantly better than ILASP2 in solving learning from ordered answer sets tasks with large numbers of examples, and better still when learning with context-dependent examples, as in each iteration it only considers the contexts of relevant examples, rather than the full set.
To demonstrate the increase in scalability we compare ILASP2i to ILASP2 on a variety of tasks from different problem domains. The results show that ILASP2i is up to 2 orders of magnitude faster and uses up to 2 orders of magnitude less memory than ILASP2. We have also applied both algorithms to the real-world problem domain of urban mobility, and explored in greater depth the task of learning a user's journey preferences from pairwise examples of which journeys are preferred to others. As we learn ASP, these user preferences can very naturally be represented as weak constraints, which give an ordering over the journeys. Our results show that ILASP2i achieves high accuracy with around 40 examples. We also show that, by further extending ILP_LOAS^context with ordering examples that express equal preference, in addition to strict ordering, the accuracy increases further.
The rest of the paper is structured as follows. In Section 2 we review the relevant background. In Section 3 we present our new context-dependent learning from ordered answer sets task, and in Section 4 we introduce our new ILASP2i algorithm. In Section 5 we compare ILASP2i to ILASP2 on a range of different learning tasks, and give a detailed evaluation of the accuracy of ILASP2i and of its scalability relative to ILASP2 in the context of the journey planning problem. Finally, we conclude the paper with a discussion of related and future work.
2 Background
Let h, h_1, ..., h_k, b_1, ..., b_n and c_1, ..., c_m be atoms and let l and u be integers. The ASP programs we consider contain normal rules, of the form h :- b_1, ..., b_n, not c_1, ..., not c_m; constraints, which are rules of the form :- b_1, ..., b_n, not c_1, ..., not c_m; and choice rules, of the form l{h_1, ..., h_k}u :- b_1, ..., b_n, not c_1, ..., not c_m. We refer to the part of the rule before the ":-" as the head, and the part after the ":-" as the body. The meaning of a rule is that if the body is true, then the head must be true. The empty head of a constraint means false, and constraints are used to rule out answer sets. The head of a choice rule is true if between l and u atoms from {h_1, ..., h_k} are true. The solutions of an ASP program P form a subset of the Herbrand models of P, called the answer sets of P and denoted AS(P).
ASP also allows optimisation over the answer sets according to weak constraints, which are rules of the form :~ b_1, ..., b_n, not c_1, ..., not c_m.[w@l, t_1, ..., t_k], where b_1, ..., b_n and c_1, ..., c_m are atoms forming (collectively) the body of the rule, and w, l, t_1, ..., t_k are all terms, with w called the weight and l the priority level. We will refer to t_1, ..., t_k as the tail of the weak constraint. A ground instance of a weak constraint W is obtained by replacing all variables in W (including those in the tail of W) with ground terms. In this paper, it is assumed that the weights and levels of all ground instances of weak constraints are integers.
Given a program P and an interpretation I, we can construct the set weak(P, I) of tuples (w, l, t_1, ..., t_k) such that there is a ground instance of a weak constraint in P whose body is satisfied by I, whose weight and level are w and l, and whose (ground) tail is t_1, ..., t_k. At each level l, the score of I is the sum of the weights of the tuples in weak(P, I) with level l. An interpretation I_1 dominates another interpretation I_2 if there is a level l for which I_1 has a lower score than I_2, and no level higher than l for which the scores of I_1 and I_2 are unequal. We write I_1 ≻_P I_2 to denote that, given the weak constraints in P, I_1 dominates I_2.
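The dominance check just defined can be sketched as follows (a minimal Python sketch; the tuple representation and helper names are ours, not ILASP2's). Each satisfied ground weak constraint contributes a (weight, level, tail...) tuple, levels are compared from the highest priority down, and an interpretation dominates another if it scores strictly lower at some level and ties at every higher level:

```python
def score(tuples, level):
    """Sum of the weights of the penalty tuples at the given priority level."""
    return sum(w for (w, l, *t) in tuples if l == level)

def dominates(t1, t2):
    """True iff the interpretation with penalty tuples t1 dominates the one
    with t2: at some level it scores strictly lower, and at every higher
    level the two scores are equal."""
    levels = sorted({l for (w, l, *t) in t1} | {l for (w, l, *t) in t2},
                    reverse=True)
    for level in levels:
        s1, s2 = score(t1, level), score(t2, level)
        if s1 < s2:
            return True
        if s1 > s2:
            return False
    return False  # all levels tie: neither dominates
```

For instance, an interpretation whose only penalty tuple is (1, 2, 'a') dominates one with tuples (1, 2, 'b') and (3, 1, 'c'): the two tie at level 2, and the first scores lower at level 1.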
Example 1
Consider a set P of three weak constraints over the legs of a journey, at priority levels 3, 2 and 1.
The first weak constraint in P, at priority 3, means "minimise the number of legs in our journey in which we have to walk through an area with a crime rating above a given threshold". As this has the highest priority, answer sets are evaluated over this weak constraint first. The remaining weak constraints are considered only for those answer sets that have an equal number of legs where we have to walk through an area with such a crime rating. The second weak constraint, at priority 2, means "minimise the number of buses we have to take". Finally, the last weak constraint, at priority 1, means "minimise the distance walked". This is the case because for each leg where we have to walk, we pay a penalty equal to the distance of that leg (so the total penalty at level 1 is the sum of the distances of the walking legs).
We now briefly summarise the key properties of Learning from Ordered Answer Sets (ILP_LOAS) and ILASP2, which we extend in this paper to Context-dependent Learning from Ordered Answer Sets (ILP_LOAS^context) and ILASP2i. ILP_LOAS makes use of two types of examples: partial interpretations and ordering examples. A partial interpretation e is a pair of sets of atoms ⟨e^inc, e^exc⟩, called the inclusions and the exclusions. An answer set A extends e if e^inc ⊆ A and e^exc ∩ A = ∅. An ordering example is a pair of partial interpretations. A program P bravely (resp. cautiously) respects an ordering example ⟨e_1, e_2⟩ if for at least one (resp. every) pair of answer sets ⟨A_1, A_2⟩ of P such that A_1 extends e_1 and A_2 extends e_2, it is the case that A_1 ≻_P A_2.
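These coverage notions can be sketched as follows (a Python sketch with our own helper names; partial interpretations are pairs of sets, and the dominance relation is passed in as a function):

```python
from itertools import product

def extends(answer_set, partial):
    """An answer set A extends a partial interpretation <inc, exc>
    iff inc is a subset of A and exc is disjoint from A."""
    inc, exc = partial
    return inc <= answer_set and not (exc & answer_set)

def respects(answer_sets, ordering, dominates, mode="brave"):
    """An ordering example <e1, e2> is bravely (resp. cautiously)
    respected iff at least one (resp. every) pair of answer sets
    extending e1 and e2 is ordered by the dominance relation."""
    e1, e2 = ordering
    pairs = [(a1, a2) for a1, a2 in product(answer_sets, repeat=2)
             if extends(a1, e1) and extends(a2, e2)]
    check = any if mode == "brave" else all
    return check(dominates(a1, a2) for a1, a2 in pairs)
```

Note that a cautious ordering with no extending pairs is vacuously respected, matching the "for every pair" reading of the definition.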
Definition 1
[Law et al. (2015a)] A Learning from Ordered Answer Sets (ILP_LOAS) task is a tuple T = ⟨B, S_M, E⟩ where B is an ASP program, called the background knowledge, S_M is the set of rules allowed in hypotheses (the hypothesis space) and E is a tuple ⟨E^+, E^-, O^b, O^c⟩. E^+ and E^- are finite sets of partial interpretations called, respectively, positive and negative examples. O^b and O^c are finite sets of ordering examples over E^+ called, respectively, brave and cautious orderings. A hypothesis H is an inductive solution of T (written H ∈ ILP_LOAS(T)) iff: H ⊆ S_M; ∀e ∈ E^+, ∃A ∈ AS(B ∪ H) s.t. A extends e; ∀e ∈ E^-, ∄A ∈ AS(B ∪ H) s.t. A extends e; ∀o ∈ O^b, B ∪ H bravely respects o; and, ∀o ∈ O^c, B ∪ H cautiously respects o.
In [Law et al. (2015a)], we proposed a learning algorithm, called ILASP2, and proved that it is sound and complete with respect to ILP_LOAS tasks. We use the notation ILASP2(T) to denote a function that uses ILASP2 to return an optimal (shortest in terms of number of literals) solution of the ILP_LOAS task T. ILASP2 terminates for any task T = ⟨B, S_M, E⟩ such that B ∪ S_M grounds finitely (or equivalently, for every H ⊆ S_M, B ∪ H grounds finitely). We call any such task well defined.
3 Context-dependent Learning from Ordered Answer Sets
In this section, we present an extension to the ILP_LOAS framework called Context-dependent Learning from Ordered Answer Sets (written ILP_LOAS^context). In this new learning framework, examples can be given with extra background knowledge, called the context of the example. The idea is that each context applies only to its own example, giving more structure to the background knowledge.
Definition 2
A context-dependent partial interpretation (CDPI) is a pair ⟨e, C⟩, where e is a partial interpretation and C is an ASP program with no weak constraints, called a context. A context-dependent ordering example (CDOE) o is a pair of CDPIs ⟨⟨e_1, C_1⟩, ⟨e_2, C_2⟩⟩. A program P is said to bravely (resp. cautiously) respect o if for at least one (resp. every) pair ⟨A_1, A_2⟩ such that A_1 ∈ AS(P ∪ C_1), A_2 ∈ AS(P ∪ C_2), A_1 extends e_1 and A_2 extends e_2, it is the case that A_1 ≻_P A_2.
Example 2
Consider a program P = B ∪ H, for some background knowledge B and hypothesis H, and a CDOE o = ⟨⟨e_1, C_1⟩, ⟨e_2, C_2⟩⟩. If some answer set of P ∪ C_1 that extends e_1 is preferred to an answer set of P ∪ C_2 that extends e_2, then P bravely respects o; but if another such pair of answer sets is not ordered in this way, then P does not cautiously respect o.
Examples with empty contexts are equivalent to examples in ILP_LOAS. Note that contexts do not contain weak constraints. The ≻_P operator defines the ordering over two answer sets based on the weak constraints in a single program P. So, given a CDOE ⟨⟨e_1, C_1⟩, ⟨e_2, C_2⟩⟩ in which C_1 and C_2 contain different weak constraints, it would not be clear whether the ordering should be checked using the weak constraints in P, in P ∪ C_1 or in P ∪ C_2. We now present the ILP_LOAS^context framework.
Definition 3
A Context-dependent Learning from Ordered Answer Sets (ILP_LOAS^context) task is a tuple T = ⟨B, S_M, E⟩ where B is an ASP program called the background knowledge, S_M is the set of rules allowed in the hypotheses (the hypothesis space) and E is a tuple ⟨E^+, E^-, O^b, O^c⟩ called the examples. E^+ and E^- are finite sets of CDPIs called, respectively, positive and negative examples, and O^b and O^c are finite sets of CDOEs over E^+ called, respectively, brave and cautious orderings. A hypothesis H is an inductive solution of T (written H ∈ ILP_LOAS^context(T)) if and only if:

(1) H ⊆ S_M;

(2) ∀⟨e, C⟩ ∈ E^+, ∃A ∈ AS(B ∪ C ∪ H) s.t. A extends e;

(3) ∀⟨e, C⟩ ∈ E^-, ∄A ∈ AS(B ∪ C ∪ H) s.t. A extends e;

(4) ∀o ∈ O^b, B ∪ H bravely respects o; and finally,

(5) ∀o ∈ O^c, B ∪ H cautiously respects o.
In this paper we will say a hypothesis covers an example iff it satisfies the appropriate condition among (2)–(5); e.g. a brave CDOE is covered iff it is bravely respected.
Example 3
In general, it is not the case that an ILP_LOAS^context task can be translated into an ILP_LOAS task simply by moving all the contexts into the background knowledge (i.e., replacing B with the union of B and all the contexts of the examples). Consider, for instance, the task defined as follows:

. . .


This task has exactly one solution. But, if we were to add all the contexts to the background knowledge, we would get a background knowledge containing a single fact. So, there would be no way of explaining both examples, as every hypothesis would, in this case, lead to a single answer set, and therefore cover only one of the examples.
To capture, instead, the meaning of context-dependent examples accurately, we could augment the background knowledge with a choice rule and define the examples as pairs of partial interpretations with their contexts. In this way, answer sets of the inductive solution would exclude the atom in question in the context of raining, and include it otherwise, which is the correct meaning of the given context-dependent examples.
Definition 4 gives a general translation of ILP_LOAS^context to ILP_LOAS, which enables the use of ILASP2 to solve ILP_LOAS^context tasks. The translation assumes that each example has a unique (constant) identifier and, for any CDPI, adds a new atom constructed from this identifier (using a new predicate) to the inclusions of its partial interpretation. Also, for any program P and any atom a, the translation uses the program constructed by appending a to the body of every rule in P.
Definition 4
For any ILP_LOAS^context task T = ⟨B, S_M, E⟩, its translation T' is the ILP_LOAS task whose background knowledge combines B with every example's context, each context's rules guarded by that example's unique atom, together with a choice rule selecting such atoms; and whose examples are the corresponding context-dependent examples with their unique atoms added to the inclusions.
We say that an ILP_LOAS^context task T is well defined if and only if its translation T' is a well defined ILP_LOAS task. Before proving that this translation is correct, it is useful to introduce a lemma (which is proven in Appendix A). Given a program P and a set of contexts C_1, ..., C_n, Lemma 3 gives a way of combining the alternative contexts into the same program. Each rule of each context C_i is appended with a new atom a_i, unique to C_i, and a choice rule states that exactly one of the new atoms is true in each answer set. This means that the answer sets of P ∪ C_i, for each i, are the answer sets of the combined program that contain a_i (with the extra atom a_i).
For any program P (consisting of normal rules, choice rules and constraints) and any set of pairs ⟨C_i, a_i⟩ such that none of the atoms a_i appear in P (or in any of the C_i's) and each atom a_i is unique, the answer sets of each P ∪ C_i are exactly the answer sets of the combined program that contain a_i.
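The context-combination construction behind Lemma 3 can be sketched as a simple program transformation (ASP rules represented as strings; the guard predicate name ctx is our own choice, and the rule syntax handled here is deliberately simplified to facts and ":-" rules):

```python
def combine_contexts(program, contexts):
    """Combine alternative contexts C_1..C_n into one program: append a
    fresh atom ctx(i) to the body of every rule of C_i, and add a choice
    rule forcing exactly one ctx(i) per answer set."""
    combined = list(program)
    for i, ctx in enumerate(contexts, start=1):
        guard = f"ctx({i})"
        for rule in ctx:
            rule = rule.rstrip().rstrip('.')
            if ':-' in rule:
                # rule with a body: conjoin the guard atom to the body
                combined.append(f"{rule}, {guard}.")
            else:
                # fact: give it a body consisting of just the guard atom
                combined.append(f"{rule} :- {guard}.")
    combined.append(f"1 {{ ctx(1..{len(contexts)}) }} 1.")
    return combined
```

For example, combining the contexts {p.} and {q :- r.} with an empty program yields the rules p :- ctx(1). and q :- r, ctx(2). plus the choice rule 1 { ctx(1..2) } 1., so each answer set of the combined program selects exactly one context.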
For any ILP_LOAS^context learning task T, ILP_LOAS^context(T) = ILP_LOAS(T').
Let T = ⟨B, S_M, ⟨E^+, E^-, O^b, O^c⟩⟩ and let T' be its translation. The proof proceeds by showing that each coverage condition of Definition 3 on T holds if and only if the corresponding condition of Definition 1 holds on T': by Lemma 3, an answer set of B ∪ C ∪ H extending a context-dependent example corresponds exactly to an answer set of the translated program that contains the example's unique atom and extends the translated example; the brave and cautious ordering conditions are preserved in the same way.
Theorem 3 shows that, by using an automatic translation, ILASP2 can be used to solve ILP_LOAS^context tasks. Although this means that any ILP_LOAS^context task can be translated to an ILP_LOAS task, context-dependent examples are useful for two reasons: firstly, they simplify the representation of some learning tasks; and secondly, the added structure gives more information about which parts of the background knowledge apply to particular examples. In Section 4 we present a new algorithm that is able to take advantage of this extra information.
Deciding whether an ILP_LOAS^context task is satisfiable is complete for the same complexity class as deciding satisfiability of an ILP_LOAS task.
This theorem (proven in Appendix A) shows that context-dependent examples do not increase the complexity of deciding the satisfiability of a task. Note that, as with Theorem 2 in [Law et al. (2015a)], this result is for propositional tasks.
4 Iterative Algorithm: ILASP2i
In the previous section, we showed that our new ILP_LOAS^context task can be translated into ILP_LOAS, and therefore solved using the ILASP2 algorithm [Law et al. (2015a)]. However, ILASP2 may suffer from scalability issues, due to the number of examples or to the size and complexity of the grounding of the hypothesis space when combined with the background knowledge. In this paper, we address the first scalability issue by introducing a new algorithm, ILASP2i, for solving (context-dependent) learning from ordered answer sets tasks. The idea of the algorithm is to incrementally build, during the computation, a set of relevant examples (essentially counterexamples for incorrect hypotheses) and, at each iterative step, to learn hypotheses with respect only to this set of relevant examples instead of the full set of given examples. Although we do not directly address the second issue of large and complicated hypothesis spaces, it is worth noting that by using the notion of context-dependent examples, the size of the background knowledge (and therefore the grounding of the hypothesis space) in a particular iteration of our algorithm may be much smaller. In fact, in Section 5 we show that the background knowledge of one learning task (learning the definition of a Hamiltonian graph) can be eliminated altogether by using contexts.
Definition 5
Consider an ILP_LOAS^context learning task T with examples E and a hypothesis H. A (context-dependent) example e is relevant to H given T if e ∈ E and H does not cover e.
The intuition of ILASP2i (Algorithm 1) is that we start with an empty set of relevant examples and an empty hypothesis. At each step of the search we look for an example which is relevant to our current hypothesis (i.e. an example that the current hypothesis does not cover). If no such example exists, then we return our current hypothesis as an optimal inductive solution; otherwise, we add the example to our set of relevant examples and use ILASP2 to compute a new hypothesis.
The notation in line 5 of Algorithm 1 denotes adding the relevant example to the correct set of the example tuple (the first set if it is a positive example, etc.).
The relevant example search returns a (context-dependent) example in E which is not covered by the current hypothesis H, or nothing if no such example exists. It works by encoding H and E into a meta program whose answer sets can be used to determine which examples in E are covered. This meta program contains a choice rule, which specifies that each answer set of the program tests the coverage of a single CDPI or CDOE example e. For a positive or negative example e, if there is an answer set of the meta program corresponding to e then there must be at least one answer set of the background knowledge, context and hypothesis combined that extends e. This means that positive (resp. negative) examples are covered iff there is at least one (resp. no) answer set of the meta program that corresponds to e. Similarly, CDOEs are encoded such that each brave (resp. cautious) ordering o is respected iff there is at least one (resp. no) answer set corresponding to o. The search then uses the answer sets of the meta program to determine which examples are not covered. Details of the meta program are in Appendix B, including a proof of its correctness.
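The overall loop can be sketched as follows (a minimal Python sketch; solve and covers are stand-ins for the ILASP2 search and the meta-program coverage check described above, and the function names are ours):

```python
def ilasp2i(examples, solve, covers):
    """Iteratively learn a hypothesis: find an example the current
    hypothesis does not cover, add it to the relevant set, and re-learn
    from the relevant set alone until every example is covered."""
    relevant, hypothesis = [], []
    while True:
        counterexample = next(
            (e for e in examples if not covers(hypothesis, e)), None)
        if counterexample is None:
            return hypothesis            # covers everything: return it
        relevant.append(counterexample)
        hypothesis = solve(relevant)     # learn from relevant examples only
        if hypothesis is None:           # no inductive solution exists
            return None
```

Note that the relevant set only grows, so previously added counterexamples persist through the iterations, which is what makes the approach sound in the nonmonotonic setting.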
It should be noted that, in the worst case, our set of relevant examples is equal to the entire set of examples. In this case, ILASP2i is slower than ILASP2. In real settings, however, as examples are not carefully constructed, there is likely to be overlap between examples, so the relevant set will be much smaller than the whole set. The following theorem shows that ILASP2i has the same condition for termination as ILASP2.
ILASP2i terminates for any well defined task.
Note that although the algorithm is sound, it is complete only in the sense that it always returns an optimal solution if one exists (rather than returning the full set of solutions).
ILASP2i is sound for any well defined task, and returns an optimal solution if one exists.
Note that in Algorithm 1 the translation of Definition 4 is applied to the context-dependent task generated incrementally at each step of the iteration (see line 6), instead of pre-translating the full initial task. This has the advantage that the background knowledge of the translated task only contains the contexts of the relevant examples, rather than the full set. In Section 5 we compare the efficiency of ILASP2i on tasks that have been pre-translated with corresponding tasks that have not been pre-translated, and demonstrate that in the latter case ILASP2i can be up to one order of magnitude faster. We refer to the application of ILASP2i with an automatic pre-translation to ILP_LOAS as ILASP2i_pt.
5 Evaluation
In this section, we demonstrate the improvement in performance of ILASP2i over ILASP2, both in terms of running time and memory usage. Although there are benchmarks for ASP solvers, ILP systems for ASP are relatively new and solve different computational tasks, so there are no benchmarks for learning ASP programs. We therefore investigate new problems. To demonstrate the increased performance of ILASP2i over ILASP2, we chose tasks with large numbers of examples. We compare the algorithms in four problem settings, each including tasks requiring different components of the ILP_LOAS^context framework. We also investigate how the performance and accuracy vary with the number of examples, for the task of learning user journey preferences. All learning tasks were run with ILASP2, ILASP2i and ILASP2i_pt (for details of the tasks discussed in this section and how to download and run the three systems, see https://www.doc.ic.ac.uk/~ml1909/ILASP).
Learning      #examples             time/s                  Memory/kB
task          E+   E-   Ob   Oc     2       2i_pt   2i      2      2i_pt  2i
Hamilton A    100  100  0    0      10.3    4.2     4.3     9.7    1.2    1.2
Hamilton B    100  100  0    0      32.0    84.9    3.6     3.6    2.7    1.4
Scheduling A  400  0    110  90     291.9   64.2    63.4    2.7    1.7    1.7
Scheduling B  400  0    128  72     347.2   40.1    40.3    5.2    2.6    2.6
Scheduling C  400  0    133  67     1141.8  123.6   124.2   8.4    4.9    5.0
Agent A       200  0    0    0      444.5   56.7    39.1    4.7    3.7    9.8
Agent B       50   0    0    0      TO      212.3   9.4     TO     1.1    1.8
Agent C       80   120  0    0      808.7   132.3   60.1    2.9    3.5    8.4
Agent D       172  228  390  0      OOM     863.3   408.4   OOM    2.4    8.0

(TO = timeout; OOM = out of memory.)
Our first problem setting is learning the definition of whether a graph is Hamiltonian (i.e. whether it contains a Hamilton cycle). Hamilton A is a (non-context-dependent) ILP_LOAS task. The background knowledge consists of two choice rules over the node and edge atoms, meaning that its answer sets correspond to the graphs of size 1 to 4. Each example then corresponds to exactly one graph, by specifying which node and edge atoms should be true. Positive examples correspond to Hamiltonian graphs, and negative examples correspond to non-Hamiltonian graphs. Hamilton B is an ILP_LOAS^context encoding of the same problem. The background knowledge is empty, and each example has a context consisting of the node and edge atoms representing a single graph. ILASP2i performs significantly better than ILASP2 in both cases. Although ILASP2i is slightly faster at solving Hamilton B than Hamilton A, one interesting result is that ILASP2 and ILASP2i_pt perform better on Hamilton A. This is because the non-context-dependent encoding in Hamilton A is more efficient than the automatic translation (using Definition 4) of Hamilton B.
To test how the size of the contexts affects the performance of the three algorithms, we re-ran the Hamilton A and B experiments with the maximum size of the graphs varying from 4 to 10. Each experiment was run 100 times with randomly generated sets of positive and negative examples (100 of each in each experiment). The results (Figure 1) show that ILASP2i performs best in both cases. Interestingly, on average, there is no difference between Hamilton A (non-context-dependent) and Hamilton B (context-dependent) at first, but as the maximum graph size increases, the domain of the background knowledge in Hamilton A increases and so ILASP2i performs better on Hamilton B. Although ILASP2i_pt is much slower on Hamilton B than on Hamilton A, it uses significantly less memory on the former. As the performance of ILASP2i and ILASP2i_pt is the same on any non-context-dependent task, we do not show the results for ILASP2i_pt on Hamilton A.
We also reconsider the problem of learning scheduling preferences, first presented in [Law et al. (2015a)]. In this setting, the goal is to learn an academic's preferences about interview scheduling, encoded as weak constraints. Tasks A–C in this case are over examples with 3x3, 4x3 and 5x3 timetables, respectively. As this setting contains no contexts for the examples, the performance of ILASP2i and ILASP2i_pt is relatively similar; however, for larger timetables both are over an order of magnitude faster and use over an order of magnitude less memory than ILASP2. Interestingly, although ILASP2i does not directly attempt to scale up the size of possible problem domains (in this case, the dimensions of the timetables), this experiment demonstrates that ILASP2i does (indirectly) improve the performance on larger problem domains. One unexpected observation is that ILASP2i runs faster on task B than on task A. This is caused by the algorithm choosing "better" relevant examples for task B, and therefore needing a smaller set of relevant examples. On average, the time for 4x3 timetables would be expected to be higher than that for 3x3 timetables.
Our third setting is taken from [Law et al. (2014)] and is based on an agent learning the rules of how it is allowed to move within a grid. Agent A requires a hypothesis describing the concept of which moves are valid, given a history of where an agent has been. Agent B requires a similar hypothesis to be learned, but with the added complexity that an additional concept must be invented. While Agent A and Agent B are similar to scenarios 1 and 2 in [Law et al. (2014)], the key difference is that different examples contain different histories of where the agent has been. These histories are encoded as contexts, whereas in [Law et al. (2014)], one single history was encoded in the background knowledge. There are also many more examples in these experiments. In Agent C, the hypothesis from Agent A must be learned along with a constraint ruling out histories in which the agent visits a cell twice (without changing the definition of a valid move). This requires negative examples to be given, in addition to positive examples. In Agent D, weak constraints must be learned to explain why some traces through the grid are preferred to others. This uses positive, negative and brave ordering examples. In each case, ILASP2i performs significantly better than ILASP2i_pt, which in turn performs significantly better than ILASP2 (ILASP2 times out in one experiment, and runs out of memory in another).
In our final setting, we investigate the problem of learning a user’s preferences over alternative journeys, in order to demonstrate how the performance of the three algorithms varies with the number of examples. We also investigate how the accuracy of ILASP2i varies with the number of examples. In this scenario, a user makes requests to a journey planner to get from one location to another. The user then chooses a journey from the alternatives returned by the planner. A journey consists of one or more legs, in each of which the user uses a single mode of transport.
We used a simulation environment [Poxrucker et al. (2014)] to generate realistic examples of journeys. In our experiment, we ran the simulator for one (simulated) day to generate a set of journey requests, along with the attributes of each possible journey. The attributes provided by the simulation data include the mode of transport of each leg (e.g. walk, bus or drive), the distance of the leg (an integer) and the crime rating of the area. As the crime ratings were not readily available from the simulator, we used a randomly generated value.
For our experiments, we assume that the user's preferences can be represented by a set of weak constraints based on the attributes of a leg. We constructed a set of possible weak constraints, each including at most 3 literals. Most of these literals capture the leg's attributes (if an attribute's values range over integers, it is represented by a variable; otherwise each possible value is used as a constant). For the crime rating, we also allow comparisons of the form "crime rating greater than k", where k is an integer from 1 to 4. The weight of each weak constraint is either a variable representing the distance of the leg in the rule, or 1, and the priority is 1, 2 or 3. One possible set of preferences is the set of weak constraints in Example 1. We denote the set of possible weak constraints by S_M.
We now describe how to represent the journey preferences scenario in ILP_LOAS^context. We assume a journey is encoded as a set of attributes of the legs of the journey; for example, a journey may have two legs such that, in the first leg, the person must take a bus for 2000m and, in the second, he/she must walk 100m. Given a set of such journeys and a partial ordering over them, the learning task is the ILP_LOAS^context task whose background knowledge is empty, whose hypothesis space is S_M, whose positive examples are the journeys (each with its attribute encoding as the context) and whose brave orderings are given by the partial ordering. Each solution of this task is a set of weak constraints representing preferences which explain the ordering of the journeys. Note that the positive examples are automatically satisfied, as the (empty) background knowledge combined with the context already covers them. Also, as the background knowledge together with each context has exactly one answer set, the notions of brave and cautious orderings coincide; hence, we do not need cautious ordering examples for this task. Furthermore, since we are only learning weak constraints, and not hard constraints, the task also has no negative examples (a negative example would correspond to an invalid journey).
In each experiment we randomly selected 100 test hypotheses, each consisting of between 1 and 3 weak constraints from S_M. For each test hypothesis, we then used the simulated journeys to generate a set of ordering examples such that the first journey of each pair was one of the optimal journeys, given the hypothesis, and the second was a non-optimal alternative. We then tested the algorithms on tasks with varying numbers of ordering examples by taking a random sample of the complete set of ordering examples.
The accuracy of ILASP2i for different numbers of examples is shown in Figure 2. The average accuracy converges after roughly 20 examples. As we only gave examples of journeys in which one was strictly preferred to the other, the learned hypotheses were often incorrect at predicting that two journeys were equally preferred. We therefore introduced a new type of brave ordering example to ILASP2i, which enables us to specify that two answer sets should be equally optimal. We ran the same experiment with half of the ordering examples given as the new "equality" orderings. The average accuracy increased after around 40 examples. Note that as ILASP2 and ILASP2i return an arbitrary optimal solution of a task, their accuracy results, on average, are the same. We therefore only present the results for ILASP2i.
Figures 3(a) and (b) show the running times and memory usage (respectively) for up to 500 examples for ILASP2, ILASP2i and ILASP2i_pt. For experiments with more than 200 examples, ILASP2 ran out of memory. By 200 examples, ILASP2i is already over 2 orders of magnitude faster and uses over 2 orders of magnitude less memory than ILASP2, showing a significant improvement in scalability. The fact that by 500 examples ILASP2i is an order of magnitude faster without the pre-translation shows that, in this problem domain, the context is a large factor in this improvement; however, ILASP2i_pt's significantly improved performance over ILASP2 shows that the iterative nature of ILASP2i is also a large factor.
6 Related Work
Most approaches to ILP address the learning of definite programs [Srinivasan (2001), Muggleton et al. (2014)], usually aiming to learn Prolog programs. As the language features of Prolog and ASP are different (e.g. ASP lacks lists, Prolog lacks choice), a comparison is difficult. On the shared language of ASP and the fragment of Prolog learned by these systems (definite rules), a traditional ILP task can be represented using a single positive example (where the inclusions (resp. exclusions) of this example correspond to the positive (resp. negative) examples in the original task).
The idea of context-dependent examples has similarities with the concept of learning from interpretation transitions (LFIT) [Inoue et al. (2014)], where examples are pairs ⟨I, J⟩ of sets of atoms such that J must be the set of immediate consequences of I with respect to the learned program P. LFIT technically learns under the supported model semantics and uses a far smaller language than that supported by ILP_LOAS^context (not supporting choice rules or hard or weak constraints), but can be simply represented in ILP_LOAS^context: the head of each rule in the background knowledge and hypothesis space should be replaced by a renamed "next state" version of the atom, and each body literal by its "current state" version; each example ⟨I, J⟩ should then be mapped to a context-dependent positive example whose context encodes I and whose partial interpretation specifies exactly the atoms of J.
Other than our own frameworks, the two main ILP frameworks under the answer set semantics are brave and cautious induction [Sakama and Inoue (2009)]. As ILP_LOAS^context subsumes ILP_LOAS, it inherits the ability to perform both brave and cautious induction. ILASP2i is therefore more general than systems such as [Ray (2009), Corapi et al. (2012), Athakravi et al. (2014)], which can only perform brave induction. In ILP, learners can be divided into batch learners (those which consider all examples simultaneously), such as [Ray (2009), Corapi et al. (2012), Athakravi et al. (2014), Law et al. (2014)], and learners which consider each example in turn (using a cover loop), such as [Srinivasan (2001), Muggleton (1995), Ray et al. (2003)]. Under the answer set semantics, most learners are batch learners due to the nonmonotonicity. In fact, it is worth noting that although the HAIL [Ray et al. (2003)] algorithm for learning definite clauses employs a cover loop, the later XHAIL algorithm is a batch learner, as it learns nonmonotonic programs [Ray (2009)]. One approach which did attempt to utilise a cover loop is [Sakama (2005)]. That approach, however, was only sound for a small (monotonic) fragment of ASP when the task had multiple examples, as otherwise later examples could cause earlier examples to become uncovered.
The ILED system [Katzouris et al. (2015)] extended the ideas behind XHAIL in order to allow incremental learning of event definitions. This system takes as input multiple “windows” of examples and incrementally learns a hypothesis. As the approach is based on theory revision (at each step, revising the hypothesis from the previous step), ILED is not guaranteed to learn an optimal solution. In contrast, ILASP2i learns a new hypothesis in each iteration and incrementally builds the set of relevant examples.
7 Conclusion
In this paper, we have presented an extension to our learning from ordered answer sets framework which allows examples to be given with extra background knowledge, called the context of the example. We have shown that these contexts can be used to give structure to the background knowledge, showing which parts apply to which examples. We have also presented a new algorithm, ILASP2i, which makes use of this added structure to improve efficiency over the previous ILASP2 algorithm. In Section 5, we demonstrated that our new approach is considerably faster for tasks with large numbers of examples.
Unlike previous systems for learning under the answer set semantics, ILASP2i is not a batch learner and does not need to consider all examples at the same time, but instead iteratively builds a set of relevant examples. This combination of relevant examples and the added structure given by contexts means that ILASP2i can be up to 2 orders of magnitude better than ILASP2, both in terms of time and memory usage. In future work, we intend to investigate how to improve the scalability of ILASP2i with larger hypothesis spaces and with noisy examples.
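The iterative scheme can be sketched abstractly as a counterexample-driven loop (a minimal sketch; `solve_task` and `find_uncovered` are hypothetical stand-ins for ILASP2's optimal-hypothesis search and the relevant-example check, not actual ILASP APIs):

```python
def iterative_learning(examples, solve_task, find_uncovered):
    """Counterexample-driven loop in the style of ILASP2i (sketch).

    solve_task(relevant)            -> hypothesis covering `relevant`, or None.
    find_uncovered(hyp, examples)   -> an example not covered by hyp, or None.
    """
    relevant = []  # relevant examples persist through the iterations
    while True:
        # Solve a (much smaller) task over only the relevant examples.
        hypothesis = solve_task(relevant)
        if hypothesis is None:
            return None                 # even the restricted task is unsatisfiable
        counterexample = find_uncovered(hypothesis, examples)
        if counterexample is None:
            return hypothesis           # every example is covered
        relevant.append(counterexample)  # grow the relevant set and iterate
```

As a toy instance, taking examples {1, 2, 3}, a "solver" that returns exactly the relevant set, and a coverage check that reports the first missing example, the loop terminates after three iterations with all examples covered.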
References
 Athakravi et al. (2014) Athakravi, D., Corapi, D., Broda, K., and Russo, A. 2014. Learning through hypothesis refinement using answer set programming. In Inductive Logic Programming. Springer, 31–46.
 Corapi et al. (2012) Corapi, D., Russo, A., and Lupu, E. 2012. Inductive logic programming in answer set programming. In Inductive Logic Programming. Springer, 91–97.
 Inoue et al. (2014) Inoue, K., Ribeiro, T., and Sakama, C. 2014. Learning from interpretation transition. Machine Learning 94, 1, 51–79.
 Katzouris et al. (2015) Katzouris, N., Artikis, A., and Paliouras, G. 2015. Incremental learning of event definitions with inductive logic programming. Machine Learning 100, 2-3, 555–585.

 Law et al. (2014) Law, M., Russo, A., and Broda, K. 2014. Inductive learning of answer set programs. In Logics in Artificial Intelligence (JELIA 2014). Springer.
 Law et al. (2015a) Law, M., Russo, A., and Broda, K. 2015a. Learning weak constraints in answer set programming. Theory and Practice of Logic Programming 15, 4-5, 511–525.
 Law et al. (2015b) Law, M., Russo, A., and Broda, K. 2015b. Proof of the soundness and completeness of ILASP2. https://www.doc.ic.ac.uk/~ml1909/Proofs_for_ILASP2.pdf.
 Lifschitz and Turner (1994) Lifschitz, V. and Turner, H. 1994. Splitting a logic program. In ICLP. Vol. 94. 23–37.
 Muggleton (1991) Muggleton, S. 1991. Inductive logic programming. New Generation Computing 8, 4, 295–318.
 Muggleton (1995) Muggleton, S. 1995. Inverse entailment and Progol. New Generation Computing 13, 3-4, 245–286.
 Muggleton et al. (2014) Muggleton, S. H., Lin, D., Pahlavi, N., and Tamaddoni-Nezhad, A. 2014. Meta-interpretive learning: application to grammatical inference. Machine Learning 94, 1, 25–49.
 Poxrucker et al. (2014) Poxrucker, A., Bahle, G., and Lukowicz, P. 2014. Towards a real-world simulator for collaborative distributed learning in the scenario of urban mobility. In Proceedings of the Eighth IEEE International Conference on Self-Adaptive and Self-Organizing Systems Workshops. IEEE Computer Society, 44–48.
 Ray (2009) Ray, O. 2009. Nonmonotonic abductive inductive learning. Journal of Applied Logic 7, 3, 329–340.
 Ray et al. (2003) Ray, O., Broda, K., and Russo, A. 2003. Hybrid abductive inductive learning: A generalisation of progol. In Inductive Logic Programming. Springer, 311–328.
 Sakama (2005) Sakama, C. 2005. Induction from answer sets in nonmonotonic logic programs. ACM Transactions on Computational Logic (TOCL) 6, 2, 203–231.
 Sakama and Inoue (2009) Sakama, C. and Inoue, K. 2009. Brave induction: a logical framework for learning from incomplete information. Machine Learning 76, 1, 3–35.
 Srinivasan (2001) Srinivasan, A. 2001. The aleph manual. Machine Learning at the Computing Laboratory, Oxford University.
Appendix A Proofs
In this section, we give the proofs of the theorems in the main paper. First, we prove the preliminary lemma (Lemma 3); in essence, it is a corollary of the splitting set theorem [Lifschitz and Turner (1994)], and it makes use of the partial evaluation of a program with respect to a set of atoms, as described in [Lifschitz and Turner (1994)].
For any program (consisting of normal rules, choice rules and constraints) and any set of pairs such that none of the atoms appear in (or in any of the ’s) and each atom is unique:
The answer sets of are , hence by the splitting set theorem (using as a splitting set):
.
Deciding whether a context-dependent learning from ordered answer sets task is satisfiable is Σ^P_2-complete.
Deciding satisfiability for learning from ordered answer sets is Σ^P_2-complete [Law et al. (2015a)]. It is therefore sufficient to show that there are polynomial mappings between learning from ordered answer sets tasks and context-dependent tasks in both directions. One direction is trivial (any learning from ordered answer sets task can be mapped to the same task in the context-dependent framework with empty contexts). The other direction follows from Theorem 3.
ILASP2i terminates for any well defined task.
Assume that the given task is well defined. This means that its translation to a learning from ordered answer sets task is also well defined (every possible hypothesis has a finite grounding when combined with the background knowledge of the translated task). Note that this also means that the task solved in each iteration is well defined, as the size of the grounding of its background knowledge combined with each hypothesis is no larger than the size of the corresponding grounding in the translated task (its background knowledge is almost a subset of the translated background, other than the extra choice rule, which is smaller).
The soundness of ILASP2 [Law et al. (2015a)] can be used to show that the hypothesis computed in each iteration covers every relevant example found so far; hence, each new relevant example must be an example of the task which is not yet in the relevant set. As there are a finite number of examples in the task, there can only be a finite number of iterations; hence, it remains to show that each iteration terminates. This is the case because, as the task solved in each iteration is well defined, the call to ILASP2 terminates [Law et al. (2015a)], and the search for a relevant example terminates (Appendix B).
ILASP2i is sound for any well defined task, and returns an optimal solution if one exists.
If the ILASP2i algorithm returns a hypothesis, then the while loop must terminate. For this to happen, the search for a relevant example must return nil, which means that the returned hypothesis covers every example of the task; hence ILASP2i is sound. As the algorithm terminates (see Theorem A), the only way for a solution not to be returned is when the call to ILASP2 returns nil. Since ILASP2 is complete [Law et al. (2015a)], this is only possible when the task restricted to the relevant examples is unsatisfiable; but if this restricted task is unsatisfiable, then so is the full task.
It remains to show that when a solution is returned, it is an optimal solution. Any solution returned must be an optimal solution of the task restricted to the relevant examples (as ILASP2 returns an optimal solution). As it must also be a solution of the full task, it must be an optimal solution of the full task (any shorter solution of the full task would also be a solution of the restricted task, contradicting the optimality of the returned hypothesis for the restricted task).
Appendix B
In this section, we describe (and prove the correctness of) the