A Multi-Engine Approach to Answer Set Programming

06/20/2013 ∙ by Marco Maratea, et al. ∙ University of Calabria ∙ Università di Genova

Answer Set Programming (ASP) is a truly-declarative programming paradigm proposed in the area of non-monotonic reasoning and logic programming, that has been recently employed in many applications. The development of efficient ASP systems is, thus, crucial. Having in mind the task of improving the solving methods for ASP, there are two usual ways to reach this goal: (i) extending state-of-the-art techniques and ASP solvers, or (ii) designing a new ASP solver from scratch. An alternative to these trends is to build on top of state-of-the-art solvers, and to apply machine learning techniques for choosing automatically the "best" available solver on a per-instance basis. In this paper we pursue this latter direction. We first define a set of cheap-to-compute syntactic features that characterize several aspects of ASP programs. Then, we apply classification methods that, given the features of the instances in a training set and the solvers' performance on these instances, inductively learn algorithm selection strategies to be applied to a test set. We report the results of a number of experiments considering solvers and different training and test sets of instances taken from the ones submitted to the "System Track" of the 3rd ASP Competition. Our analysis shows that, by applying machine learning techniques to ASP solving, it is possible to obtain very robust performance: our approach can solve more instances compared with any solver that entered the 3rd ASP Competition. (To appear in Theory and Practice of Logic Programming (TPLP).)




1 Introduction

Answer Set Programming [Gelfond and Lifschitz (1988), Eiter et al. (1997), Marek and Truszczyński (1998), Niemelä (1998), Lifschitz (1999), Gelfond and Lifschitz (1991), Baral (2003)] (ASP) is a truly-declarative programming paradigm proposed in the area of non-monotonic reasoning and logic programming. The idea of ASP is to represent a given computational problem by a logic program whose answer sets correspond to solutions, and then use a solver to find such solutions [Lifschitz (1999)]. The language of ASP is very expressive: all problems in the second level of the polynomial hierarchy can be expressed in ASP [Eiter et al. (1997)]. Moreover, in recent years ASP has been employed in many applications, see, e.g., [Nogueira et al. (2001), Baral (2003), Brooks et al. (2007), Friedrich and Ivanchenko (2008), Gebser et al. (2011), Balduccini and Lierler (2012)], and even in industry [Ricca et al. (2009), Ricca et al. (2010), Rullo et al. (2009), Ricca et al. (2012), Marczak et al. (2010), Smaragdakis et al. (2011)]. The development of efficient ASP systems is thus a crucial task, made even more challenging by existing and emerging applications.

With the aim of improving both the robustness (i.e., the ability to perform well across a wide set of problem domains) and the efficiency (i.e., the ability to solve a high number of instances) of solving methods for ASP, it is possible to extend existing state-of-the-art techniques implemented in ASP solvers, or to design from scratch a new ASP system with powerful techniques and heuristics. An alternative to these trends is to build on top of state-of-the-art solvers, leveraging a number of efficient ASP systems, e.g., [Simons et al. (2002), Leone et al. (2006), Giunchiglia et al. (2006), Gebser et al. (2007), Mariën et al. (2008), Janhunen et al. (2009)], and applying machine learning techniques for inductively choosing, among a set of available ones, the “best” solver on the basis of the characteristics, called features, of the input program. This approach falls in the framework of the algorithm selection problem [Rice (1976)]. Related approaches, following this per-instance selection, have been exploited for solving propositional satisfiability (SAT) problems, e.g., [Xu et al. (2008)], and Quantified SAT (QSAT) problems, e.g., [Pulina and Tacchella (2007)]. In ASP, an approach for selecting the “best” clasp internal configuration is followed in [Gebser et al. (2011)], while another approach, which imposes a learned heuristic ordering on smodels, is [Balduccini (2011)].

In this paper we pursue this direction, and propose a multi-engine approach to ASP solving. We first define a set of cheap-to-compute syntactic features that describe several characteristics of ASP programs, paying particular attention to ASP peculiarities. We then compute such features for the grounded version of all benchmarks submitted to the “System Track” of the 3rd ASP Competition [Calimeri et al. (2012)] falling in the “NP” and “Beyond NP” categories of the competition. This track is well suited for our study, given that it contains many ASP instances and that its language specification, ASP-Core, is a common ASP fragment that many ASP systems can deal with.

Then, we apply classification methods that, starting from the features of the instances in a training set and the solvers’ performance on these instances, inductively learn general algorithm selection strategies to be applied to a test set. We consider six well-known multinomial classification methods, some of them considered in [Pulina and Tacchella (2007)]. We perform a number of analyses considering different training and test sets. Our experiments show that it is possible to obtain a very robust performance, by solving many more instances than all the solvers that entered the 3rd ASP Competition and DLV [Leone et al. (2006)].

The paper is structured as follows. Section 2 contains preliminaries about ASP and classification methods. Section 3 then describes our benchmark setting, in terms of dataset and solvers employed. Section 4 defines how features and solvers have been selected, and presents the classification methods employed. Section 5 is dedicated to the performance analysis, while Sections 6 and 7 end the paper with a discussion of related work and conclusions, respectively.

2 Preliminaries

In this section we recall some preliminary notions concerning Answer Set Programming and machine learning techniques for algorithm selection.

2.1 Answer Set Programming

In the following, we recall both the syntax and semantics of ASP. The presented constructs are included in ASP-Core [Calimeri et al. (2012)], which is the language specification that was originally introduced in the 3rd ASP Competition [Calimeri et al. (2012)] as well as the one employed in our experiments (see Section 3). Hereafter, we assume the reader is familiar with logic programming conventions, and refer the reader to [Gelfond and Lifschitz (1991), Baral (2003), Gelfond and Leone (2002)] for complementary introductory material on ASP, and to [Calimeri et al. (2011)] for obtaining the full specification of ASP-Core.


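A variable or a constant is a term. An atom is $p(t_1, \ldots, t_n)$, where $p$ is a predicate of arity $n$ and $t_1, \ldots, t_n$ are terms. A literal is either a positive literal $p$ or a negative literal $\mathtt{not}\ p$, where $p$ is an atom. A (disjunctive) rule $r$ is of the form:

$$a_1 \vee \cdots \vee a_n \ \texttt{:-}\ b_1, \ldots, b_k, \mathtt{not}\ b_{k+1}, \ldots, \mathtt{not}\ b_m.$$

where $a_1, \ldots, a_n, b_1, \ldots, b_m$ are atoms. The disjunction $a_1 \vee \cdots \vee a_n$ is the head of $r$, while the conjunction $b_1, \ldots, b_k, \mathtt{not}\ b_{k+1}, \ldots, \mathtt{not}\ b_m$ is the body of $r$. We denote by $H(r)$ the set of atoms occurring in the head of $r$, and we denote by $B(r)$ the set of body literals. A rule s.t. $|H(r)| = 1$ (i.e., $n = 1$) is called a normal rule; if the body is empty (i.e., $k = m = 0$) it is called a fact (and the $\texttt{:-}$ sign is omitted); if $|H(r)| = 0$ (i.e., $n = 0$) it is called a constraint. A rule $r$ is safe if each variable appearing in $r$ appears also in some positive body literal of $r$.

An ASP program $P$ is a finite set of safe rules. A $\mathtt{not}$-free (resp., $\vee$-free) program is called positive (resp., normal). A term, an atom, a literal, a rule, or a program is ground if no variable appears in it.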


Given a program $P$, the Herbrand Universe $U_P$ is the set of all constants appearing in $P$, and the Herbrand Base $B_P$ is the set of all possible ground atoms which can be constructed from the predicates appearing in $P$ with the constants of $U_P$. Given a rule $r$, $Ground(r)$ denotes the set of rules obtained by applying all possible substitutions from the variables in $r$ to elements of $U_P$. Similarly, given a program $P$, the ground instantiation of $P$ is $Ground(P) = \bigcup_{r \in P} Ground(r)$.

An interpretation for a program $P$ is a subset $I$ of $B_P$. A ground positive literal $a$ is true (resp., false) w.r.t. $I$ if $a \in I$ (resp., $a \notin I$). A ground negative literal $\mathtt{not}\ a$ is true w.r.t. $I$ if $a$ is false w.r.t. $I$; otherwise $\mathtt{not}\ a$ is false w.r.t. $I$.
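The ground instantiation described above can be illustrated with a few lines of Python. This is only a minimal sketch: the encoding of atoms as `(predicate, args)` pairs, the convention that variables start with an uppercase letter, and the helper names `ground_rule` and `ground_program` are assumptions made for the example, and real grounders such as GrinGo are far more selective about which instances they generate.

```python
from itertools import product

def is_variable(term):
    # Assumption: variables start with an uppercase letter, constants do not.
    return term[:1].isupper()

def ground_rule(rule, universe):
    """Return every ground instance of `rule` over the constants in `universe`.

    A rule is given as a list of atoms (predicate, args); head and body are
    not distinguished because a substitution treats all atoms alike.
    """
    variables = sorted({t for _, args in rule for t in args if is_variable(t)})
    instances = []
    for values in product(sorted(universe), repeat=len(variables)):
        theta = dict(zip(variables, values))  # one substitution
        instances.append([(pred, tuple(theta.get(t, t) for t in args))
                          for pred, args in rule])
    return instances

def ground_program(program, universe):
    # The ground instantiation is the union of the groundings of all rules.
    return [g for rule in program for g in ground_rule(rule, universe)]

# Hypothetical rule: reach(X,Y) :- edge(X,Y).  over the constants {a, b}
rule = [("reach", ("X", "Y")), ("edge", ("X", "Y"))]
universe = {"a", "b"}
grounding = ground_program([rule], universe)
print(len(grounding))  # two variables over two constants give 4 instances
```

The number of instances grows as the number of constants raised to the number of variables, which is why ground programs can become very large.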

The answer sets of a program are defined in two steps using its ground instantiation: first the answer sets of positive disjunctive programs are defined; then the answer sets of general programs are defined by a reduction to positive ones and a stability condition.

Let $r$ be a ground rule. The head of $r$ is true w.r.t. $I$ if $H(r) \cap I \neq \emptyset$. The body of $r$ is true w.r.t. $I$ if all body literals of $r$ are true w.r.t. $I$; otherwise the body of $r$ is false w.r.t. $I$. The rule $r$ is satisfied (or true) w.r.t. $I$ if its head is true w.r.t. $I$ or its body is false w.r.t. $I$.

Given a ground positive program $P_g$, an answer set for $P_g$ is a subset-minimal interpretation $A$ for $P_g$ such that every rule $r \in P_g$ is true w.r.t. $A$ (i.e., there is no other interpretation $I \subset A$ that satisfies all the rules of $P_g$).

Given a ground program $P_g$ and an interpretation $I$, the (Gelfond-Lifschitz) reduct [Gelfond and Lifschitz (1991)] of $P_g$ w.r.t. $I$ is the positive program $P_g^I$, obtained from $P_g$ by deleting all rules $r \in P_g$ whose negative body is false w.r.t. $I$, and deleting the negative body from the remaining rules of $P_g$.

An answer set (or stable model) of a general program $P$ is an interpretation $I$ of $P$ such that $I$ is an answer set of $Ground(P)^I$.

As an example consider the program $P = \{ a \vee b \ \texttt{:-}\ c.,\ \ b \ \texttt{:-}\ \mathtt{not}\ a, \mathtt{not}\ c.,\ \ a \vee c \ \texttt{:-}\ \mathtt{not}\ b.,\ \ k \ \texttt{:-}\ a.,\ \ k \ \texttt{:-}\ b. \}$ and $I = \{b, k\}$. The reduct $P^I$ is $\{ a \vee b \ \texttt{:-}\ c.,\ \ b.,\ \ k \ \texttt{:-}\ a.,\ \ k \ \texttt{:-}\ b. \}$. $I$ is an answer set of $P^I$, and for this reason it is also an answer set of $P$.
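The two-step definition (reduct, then subset-minimality) can be checked mechanically on small ground programs. The following brute-force sketch is meant only as an illustration: the encoding of a rule as a triple of sets and the helper names are assumptions of the example, the small program is chosen for illustration, and the exponential minimality check is not how actual ASP solvers work.

```python
from itertools import combinations

def rule(head, pos=(), neg=()):
    # A ground rule: head atoms, positive body atoms, negated body atoms.
    return (frozenset(head), frozenset(pos), frozenset(neg))

def reduct(program, interp):
    """Gelfond-Lifschitz reduct: delete rules whose negative body is false
    w.r.t. interp, then delete the negative body of the remaining rules."""
    return [(h, p, frozenset()) for h, p, n in program if not (n & interp)]

def satisfies(interp, positive_program):
    # A rule is satisfied if its head is true or its body is false.
    return all((h & interp) or not (p <= interp) for h, p, _ in positive_program)

def is_answer_set(program, interp):
    interp = frozenset(interp)
    red = reduct(program, interp)
    if not satisfies(interp, red):
        return False
    # Subset-minimality: no proper subset of interp may satisfy the reduct.
    atoms = sorted(interp)
    for size in range(len(atoms)):
        for subset in combinations(atoms, size):
            if satisfies(frozenset(subset), red):
                return False
    return True

# Illustrative program:
#   a v b :- c.   b :- not a, not c.   a v c :- not b.   k :- a.   k :- b.
program = [
    rule("ab", pos="c"),
    rule("b", neg="ac"),
    rule("ac", neg="b"),
    rule("k", pos="a"),
    rule("k", pos="b"),
]
print(is_answer_set(program, {"b", "k"}))
```

For the interpretation {b, k} the reduct keeps four rules (the third rule is deleted because b is in the interpretation), and no proper subset of {b, k} satisfies them.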

2.2 Multinomial Classification for Algorithm Selection

With regard to empirically hard problems, there is rarely a best algorithm to solve a given combinatorial problem, while it is often the case that different algorithms perform well on different problem instances. In this work we rely on a per-instance selection algorithm in which, given a set of features –i.e., numeric values that represent particular characteristics of a given instance– it is possible to choose the best (or a good) algorithm among a pool of them –in our case, ASP solvers. In order to make such a selection in an automatic way, we model the problem using multinomial classification algorithms, i.e., machine learning techniques that allow automatic classification of a set of instances, given some instance features.

In more detail, in multinomial classification we are given a set of patterns, i.e., input vectors $X = \{x_1, \ldots, x_k\}$ with $x_i \in \mathbb{R}^n$, and a corresponding set of labels, i.e., output values $Y \in \{1, \ldots, m\}$, where $Y$ is composed of values representing the $m$ classes of the multinomial classification problem. In our modeling, the $m$ classes are $m$ ASP solvers. We think of the labels as generated by some unknown function $f : \mathbb{R}^n \to \{1, \ldots, m\}$ applied to the patterns, i.e., $f(x_i) = y_i$ for $i \in \{1, \ldots, k\}$ and $y_i \in \{1, \ldots, m\}$. Given a set of patterns $X$ and a corresponding set of labels $Y$, the task of a multinomial classifier $c$ is to extrapolate $f$ given $X$ and $Y$, i.e., construct $c$ from $X$ and $Y$ so that when we are given some $x^* \in X$ we should ensure that $c(x^*)$ is equal to $f(x^*)$. This task is called training, and the pair $(X, Y)$ is called the training set.
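To make the roles of patterns, labels and classifier concrete, the sketch below trains a nearest-neighbor classifier, one of the simplest ways to build a selector from a training set. The feature vectors and the labels are made-up toy values, and the function name `train_nearest_neighbor` is hypothetical; it is not the implementation used in the experiments.

```python
import math

def train_nearest_neighbor(patterns, labels):
    """Build a classifier c from the training set (patterns, labels).

    c maps a feature vector to the label of the closest training pattern,
    where closeness is measured by Euclidean distance.
    """
    training = list(zip(patterns, labels))

    def classify(x):
        nearest = min(training, key=lambda pair: math.dist(pair[0], x))
        return nearest[1]

    return classify

# Toy training set: two features per instance, labels are solver names.
X = [(0.1, 0.9), (0.2, 0.8), (0.9, 0.1), (0.8, 0.3)]
Y = ["clasp", "clasp", "dlv", "dlv"]
c = train_nearest_neighbor(X, Y)
print(c((0.15, 0.85)))  # closest training patterns are labelled "clasp"
```

By construction this classifier reproduces the label of every training pattern; how well it behaves on unseen patterns is a separate question.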

Problem                       Class        #Instances
DisjunctiveScheduling         NP                   10
GraphColouring                NP                   60
HanoiTower                    NP                   59
KnightTour                    NP                   10
MazeGeneration                NP                   50
Labyrinth                     NP                  261
MultiContextSystemQuerying    NP                   73
Numberlink                    NP                  150
PackingProblem                NP                   50
SokobanDecision               NP                   50
Solitaire                     NP                   25
WeightAssignmentTree          NP                   62
MinimalDiagnosis              Beyond NP           551
StrategicCompanies            Beyond NP            51
Total                                            1462
Table 1: Problems and instances.

3 Benchmark Data and Settings

In this section we report the benchmark settings employed in this work, which are needed to properly introduce the techniques described in the remainder of the paper. In particular, we report some data concerning benchmark problems, instances and ASP solvers employed, as well as the hardware platform and the execution settings, for reproducibility of the experiments.

3.1 Dataset

The benchmarks considered for the experiments belong to the suite of the 3rd ASP Competition [Calimeri et al. (2011)]. This is a large and heterogeneous suite of hard benchmarks encoded in ASP-Core, which was already employed for evaluating the performance of state-of-the-art ASP solvers. That suite includes planning domains, temporal and spatial scheduling problems, combinatorial puzzles, graph problems, and a number of application domains, namely databases, information extraction and molecular biology (an exhaustive description of the benchmark problems can be found in [Calimeri et al. (2011)]). In more detail, we have employed the encodings used in the System Track of the competition, and all the problem instances made available (in the form of facts) by the contributors during the problem submission stage of the competition, which are available from the competition website [Calimeri et al. (2011)]. Note that this is a superset of the instances actually selected for running (and thus evaluated) in the competition itself. Hereafter, with instance we refer to the complete input program (i.e., encoding+facts) to be fed to a solver for each instance of the problem to be solved.

The techniques presented in this paper are conceived for dealing with propositional programs, thus we have grounded all the mentioned instances by using GrinGo (v.3.0.3) [Gebser et al. (2007)] to obtain a setup very close to the one of the competition. We considered only computationally-hard benchmarks, corresponding to all problems belonging to the categories NP and Beyond NP of the competition. The dataset is summarized in Table 1, which also reports the complexity classification and the number of available instances for each problem.

3.2 Executables and Hardware Settings

We have run all the ASP solvers that entered the System Track of the 3rd ASP Competition [Calimeri et al. (2011)] with the addition of DLV [Leone et al. (2006)] (which did not participate in the competition since it is developed by the organizers of the event). In this way we have covered, to the best of our knowledge, all the state-of-the-art solutions fitting the benchmark settings. In detail, we have run: clasp [Gebser et al. (2007)], claspD [Drescher et al. (2008)], claspfolio [Gebser et al. (2011)], idp [Wittocx et al. (2008)], cmodels [Lierler (2005)], sup [Lierler (2008)], Smodels [Simons et al. (2002)], and several solvers from both the lp2sat [Janhunen (2006)] and lp2diff [Janhunen et al. (2009)] families, namely: lp2gminisat, lp2lminisat, lp2lgminisat, lp2minisat, lp2diffgz3, lp2difflgz3, lp2difflz3, and lp2diffz3. In more detail, clasp is a native ASP solver relying on conflict-driven nogood learning; claspD is an extension of clasp that is able to deal with disjunctive logic programs, while claspfolio exploits machine learning techniques in order to choose the best-suited execution options of clasp; idp is a finite model generator for extended first-order logic theories, which is based on MiniSatID [Mariën et al. (2008)]; Smodels is one of the first robust native ASP solvers that have been made available to the community; DLV [Leone et al. (2006)] is one of the first systems able to cope with disjunctive programs; cmodels exploits a SAT solver as a search engine for enumerating models, and also verifies model minimality with SAT, whenever needed; sup exploits nonclausal constraints, and can be seen as a combination of the computational ideas behind cmodels and Smodels; the lp2sat family employs several variants (indicated by the trailing g, l and lg) of a translation strategy to SAT and resorts to MiniSat [Eén and Sörensson (2003)] for actually computing the answer sets; the lp2diff family translates programs into difference logic over integers [smt-lib-web (2011)] and exploits Z3 [de Moura and Bjørner (2008)] as the underlying solver (again, g, l and lg indicate different translation strategies). DLV was run with default settings, while the remaining solvers were run with the same configuration (i.e., parameter settings) as in the competition.

Concerning the hardware employed and the execution settings, all the experiments were carried out on CyberSAR [Masoni et al. (2009)], a cluster comprised of 50 Intel Xeon E5420 blades equipped with 64 bit GNU Scientific Linux 5.5. Unless otherwise specified, the resources granted to the solvers are 600s of CPU time and 2GB of memory. Time measurements were carried out using the time command shipped with GNU Scientific Linux 5.5.

4 Designing a Multi-Engine ASP Solver

The design of a multi-engine solver involves several steps: design of (syntactic) features that are both significant for classifying the instances and cheap-to-compute (so that the classifier can be fast and accurate); selection of solvers that are representative of the state of the art (to be able to possibly obtain the best performance in any considered instance); and selection of the classification algorithm, and fair design of training and test sets, to obtain a robust and unbiased classifier.

In the following, we describe the choices we have made for designing me-asp, which is our multi-engine solver for ground ASP programs.

4.1 Features

Our features selection process started by considering a very wide set of candidate features that correspond, in our view, to several characteristics of an ASP program that, in principle, should be taken into account.

The features that we compute for each ground program are divided into four groups (such a categorization is borrowed from [Nudelman et al. (2004)]):

  • Problem size features: number of rules $r$, number of atoms $a$, ratios $r/a$, $(r/a)^2$, $(r/a)^3$ and reciprocal ratios $a/r$, $(a/r)^2$ and $(a/r)^3$. Features of this type give an idea of the size of the ground program.

  • Balance features: ratio of positive and negative atoms in each body, and ratio of positive and negative occurrences of each variable; fraction of unary, binary and ternary rules. This type of features can help to understand what is the “structure” of the analyzed program.

  • “Proximity to Horn” features: fraction of Horn rules and number of atom occurrences in Horn rules. These features indicate how close a program is to being Horn: this can be helpful, since some solvers may take advantage of this setting (e.g., minimum or no impact of completion [Clark (1978)] when applied).

  • ASP peculiar features: number of true and disjunctive facts, fraction of normal rules and constraints, head sizes, occurrences of each atom in heads, bodies and rules, occurrences of true negated atoms in heads, bodies and rules; Strongly Connected Components (SCC) sizes, number of Head-Cycle Free (HCF) and non-HCF components, degree of support for non-HCF components.

For the features implying distributions, e.g., ratio of positive and negative atoms in each body, atom occurrences in Horn rules, and head sizes, five numbers are considered: minimum, 25th percentile, median, 75th percentile and maximum. These five numbers are used because we cannot assume a priori that the distributions are Gaussian, in which case mean and variance would not be very informative.
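A five-number summary of this kind can be computed with the Python standard library, as in the sketch below. The function name and the sample values are made up for illustration, and the choice of the "inclusive" interpolation method is an assumption, since the text does not say which percentile definition was used.

```python
from statistics import quantiles

def five_number_summary(values):
    """Minimum, 25th percentile, median, 75th percentile and maximum.

    Requires at least two values; percentiles are interpolated with the
    'inclusive' method, which is one of several common conventions.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])

# Made-up head sizes of the rules of a small program.
head_sizes = [1, 1, 1, 2, 3]
print(five_number_summary(head_sizes))
```

Unlike mean and variance, these five values make no assumption about the shape of the distribution and are not dominated by a few very large values.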

The set of features reported above seems to be adequate for describing an ASP program (observations concerning existing proposals are reported in Section 6). On the other hand, we have to consider that the time spent computing the features is an integral part of our solving process: the risk is to spend too much time in calculating the features of a program. This component of the solving process could result in a significant overhead in the solving time for instances that are easily solved by (some of) the engines, or could even cause a time out on programs otherwise solved by (some of) the engines within the time limit.

Given these considerations, our final choice is to consider syntactic features that are cheap-to-compute, i.e., computable in linear time in the size of the input, also given that in previous work (e.g., [Pulina and Tacchella (2007)]) syntactic features have been profitably used for characterizing (inherently) ground instances. To this end, we implemented a tool able to compute the above-reported set of features and conducted some preliminary experiments on all the benchmarks we were able to ground with GrinGo in less than 600s: 1425 instances out of a total of 1462, of which 823 out of 860 NP instances (the exceptions are 10 and 27 instances of DisjunctiveScheduling and PackingProblem, respectively). On the one hand, the results confirmed the need for avoiding the computation of “expensive” features (e.g., SCCs): indeed, in this setting we could compute the whole set of features only for 665 NP instances within 600s. On the other hand, the results helped us in selecting a set of “cheap” features that are sufficient for obtaining a robust and efficient multi-engine system. In particular, the features that we selected are a subset of the ones reported above:

  • Problem size features: number of rules $r$, number of atoms $a$, ratios $r/a$, $(r/a)^2$, $(r/a)^3$ and reciprocal ratios $a/r$, $(a/r)^2$ and $(a/r)^3$;

  • Balance features: fraction of unary, binary and ternary rules;

  • “Proximity to Horn” features: fraction of Horn rules;

  • ASP peculiar features: number of true and disjunctive facts, fraction of normal rules and constraints $c$.

This final choice of features, together with some of their combinations (e.g., $c/r$, relating constraints to rules), amounts to a total of 52 features. Our tool for extracting features from ground programs can then compute all these features (in less than 600s) for 1371 programs out of 1462. The distribution of the CPU times for extracting features is characterized by the following five numbers: 0.24s, 1.74s, 2.40s, 4.37s, 541.92s. Note that the high CPU times correspond to extracting features for ground programs whose size is in the order of gigabytes. Our set of chosen features is relevant, as will be shown in Section 5.
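The cheap features listed above only require counting, so they can be gathered in a number of steps linear in the size of the ground program. The sketch below illustrates the idea on a rule encoding chosen for the example; it is not the authors' tool. Two definitions are assumptions, because the text does not spell them out: the size of a rule is taken to be its total number of literals (head plus body), and a Horn rule is taken to be one with at most one head atom and no negated body literal.

```python
def cheap_features(rules):
    """rules: non-empty list of (head, positive_body, negative_body) tuples,
    each component being a tuple of atom names."""
    r = len(rules)
    atoms = {x for h, p, n in rules for x in (*h, *p, *n)}
    a = len(atoms)
    features = {"r": r, "a": a}
    for k in (1, 2, 3):
        features[f"(r/a)^{k}"] = (r / a) ** k
        features[f"(a/r)^{k}"] = (a / r) ** k
    # Assumption: rule size = number of head atoms + number of body literals.
    sizes = [len(h) + len(p) + len(n) for h, p, n in rules]
    for name, k in (("unary", 1), ("binary", 2), ("ternary", 3)):
        features[f"frac_{name}"] = sizes.count(k) / r
    # Assumption: Horn = at most one head atom and no negation in the body.
    features["frac_horn"] = sum(len(h) <= 1 and not n for h, p, n in rules) / r
    features["frac_normal"] = sum(len(h) == 1 for h, p, n in rules) / r
    features["frac_constraints"] = sum(len(h) == 0 for h, p, n in rules) / r
    features["facts"] = sum(len(h) == 1 and not p and not n for h, p, n in rules)
    features["disjunctive_facts"] = sum(len(h) > 1 and not p and not n
                                        for h, p, n in rules)
    return features

# Small illustrative program:
#   a v b :- c.   b :- not a, not c.   a v c :- not b.   k :- a.   k :- b.
example = [
    (("a", "b"), ("c",), ()),
    (("b",), (), ("a", "c")),
    (("a", "c"), (), ("b",)),
    (("k",), ("a",), ()),
    (("k",), ("b",), ()),
]
features = cheap_features(example)
print(features["r"], features["a"])  # 5 rules over 4 atoms
```

Features such as SCC sizes would instead require building and traversing the dependency graph, which is the kind of cost the final selection avoids.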

4.2 Solvers Selection

Solver         Solved   Unique     Solver             Solved   Unique
clasp             445       26     lp2diffz3             307        -
cmodels           333        6     lp2sat2gminisat       328        -
dlv               241       37     lp2sat2lgminisat      322        -
idp               419       15     lp2sat2lminisat       324        -
lp2diffgz3        254        -     lp2sat2minisat        336        -
lp2difflgz3       242        -     smodels               134        -
lp2difflz3        248        -     sup                   311        1
Table 2: Results of a pool of ASP solvers on the NP instances of the 3rd ASP Competition. Column “Solver” reports the solver name, column “Solved” reports the total number of instances solved within a time limit of 600 seconds, and column “Unique” reports the total number of instances uniquely solved by the corresponding solver.

The target of our selection is to collect a pool of solvers that is representative of the state-of-the-art solver (sota), i.e., the oracle that, for each problem instance, always fares the best among the available solvers. Note that, in our settings, the various engines available employ (often substantially) different evaluation strategies, and it is likely that different engines behave better in different domains; in other words, the engines’ performance is “orthogonal”. As a consequence, there are solvers that solve a significant number of instances uniquely (i.e., instances solved by only one solver), have a characteristic performance, and are a fundamental component of the sota. Thus, a pragmatic and reasonable choice, given that we want to solve as many instances as possible, is to consider a solver only if it solves a reasonable number of instances uniquely, since such a solver cannot be, in a sense, subsumed performance-wise by another one behaving similarly.
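Both notions used here, uniquely solved instances and the sota oracle, reduce to simple set operations once the set of instances solved by each engine is known. The sketch below uses made-up solver names and instance identifiers; the function names are hypothetical.

```python
def uniquely_solved(solved_by):
    """solved_by: dict mapping a solver name to the set of instances it solves.
    Returns, for each solver, the instances that no other solver solves."""
    unique = {}
    for solver, instances in solved_by.items():
        others = set().union(*(s for name, s in solved_by.items()
                               if name != solver))
        unique[solver] = instances - others
    return unique

def sota_solved(solved_by):
    # The oracle solves an instance as soon as at least one solver does.
    return set().union(*solved_by.values())

# Made-up results: three solvers on instances numbered 1 to 4.
solved_by = {"s1": {1, 2, 3}, "s2": {2, 3, 4}, "s3": {3}}
unique = uniquely_solved(solved_by)
print({name: len(instances) for name, instances in unique.items()})
print(len(sota_solved(solved_by)))
```

In this toy setting s3 contributes nothing to the oracle that s1 and s2 do not already cover, which is the sense in which a solver with no uniquely solved instances is subsumed.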

In order to select the engines we ran preliminary experiments, and we report the results (regarding the NP class) in Table 2. Looking at the table, first we notice that we do not report results related to both claspD and claspfolio. Concerning the results of claspD, we report that –considering the NP class– its performance, in terms of solved instances, is subsumed by the performance of clasp. Considering the performance of claspfolio, we exclude such system from this analysis because we consider it as a yardstick system, i.e., we will compare its performance against the performance of me-asp.

Looking at Table 2, we can see that only 4 solvers out of 16 are able to solve a noticeable number of instances uniquely, namely clasp, cmodels, DLV, and idp (the picture of uniquely solved instances does not change even considering the entire family of lp2sat, resp. lp2diff, as a single engine that has the best performance among its variants). Concerning Beyond NP instances, we report that only three solvers are able to cope with such class of problems, namely claspD, cmodels, and DLV. Considering that both cmodels and DLV are involved in the previous selection, and that claspD has a performance that does not overlap with the other two in Beyond NP instances, the pool of engines used in me-asp will be composed of 5 solvers, namely clasp, claspD, cmodels, DLV, and idp.

The experiments reported in Section 5 confirmed that this engine selection policy is effective in practice considering the ASP state of the art. Nonetheless, it is easy to see that in scenarios where the performance of most of the available solvers is very similar on a common pool of instances, i.e., their performance is not “orthogonal”, choosing a solver for the only reason that it solves a reasonable number of instances uniquely may not be an effective policy. Indeed, the straightforward application of that policy to “overlapping” engines could result in discarding the best ones, since it is likely that several of them can solve the same instances. A possible extension of the selection policy presented above to deal with overlapping engines is to remove dominated solvers, where a solver $s$ dominates a solver $s'$ if the set of instances solved by $s$ is a superset of the instances solved by $s'$. Ties are broken by choosing the solver that spends the smaller amount of CPU time. If the resulting pool of engines is still not reasonably distinguishable, i.e., there are not enough uniquely solved instances by each engine of the pool, then one may compute such a pool, say $E$, as follows: starting from the empty set ($E = \emptyset$), try iteratively to add engine candidates to $E$, from the one that solves more instances, and faster, to the less efficient. At each iteration, an engine $e$ is added to $E$ if both the set of uniquely solved instances by the engines in $E \cup \{e\}$ is larger than in $E$, and the resulting set is reasonably distinguishable.
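The extended policy can be sketched as two small procedures. This is an illustrative reading of the description, not the authors' code: in particular, "reasonably distinguishable" is not defined numerically in the text, so the threshold `min_unique` (each engine of the pool must solve at least that many instances uniquely) is an assumption, and the solver names, solved sets and CPU times are made up.

```python
def remove_dominated(solved_by, cpu_time):
    """Keep the solvers whose solved set is not contained in another one's.
    If two solvers solve exactly the same instances, keep the faster one."""
    kept = []
    for s in solved_by:
        dominated = False
        for t in solved_by:
            if t == s:
                continue
            if solved_by[s] < solved_by[t]:          # proper subset
                dominated = True
            elif (solved_by[s] == solved_by[t]
                  and (cpu_time[t], t) < (cpu_time[s], s)):
                dominated = True                      # tie: t is faster
        if not dominated:
            kept.append(s)
    return kept

def unique_counts(pool, solved_by):
    # Number of instances each engine solves that no other pool member solves.
    counts = {}
    for s in pool:
        others = set().union(*(solved_by[t] for t in pool if t != s))
        counts[s] = len(solved_by[s] - others)
    return counts

def greedy_pool(solved_by, cpu_time, min_unique=1):
    """Add engines from the most to the least efficient, keeping an engine
    only if the pool gains uniquely solved instances and stays
    distinguishable (assumed here: every member has >= min_unique of them)."""
    order = sorted(solved_by, key=lambda s: (-len(solved_by[s]), cpu_time[s]))
    pool = []
    for e in order:
        before = sum(unique_counts(pool, solved_by).values())
        after = unique_counts(pool + [e], solved_by)
        if sum(after.values()) > before and min(after.values()) >= min_unique:
            pool.append(e)
    return pool

# Made-up data: four overlapping solvers.
solved_by = {"s1": {1, 2, 3, 4}, "s2": {1, 2}, "s3": {4, 5, 6}, "s4": {1, 2}}
cpu_time = {"s1": 10.0, "s2": 5.0, "s3": 7.0, "s4": 3.0}
print(remove_dominated(solved_by, cpu_time))
print(greedy_pool(solved_by, cpu_time))
```

On this toy data both procedures keep s1 and s3: s2 and s4 solve only instances that s1 also solves, so adding them would reduce, rather than increase, the number of uniquely solved instances.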

We have applied the above extended policy, that is to be considered as a pragmatic strategy more than a general solution, obtaining good results in a specific experiment with overlapping engines; more details will be found in Section 5.4.

4.3 Classification Algorithms and Training

In the following, we briefly review the classifiers that we use in our empirical analysis. Considering the wide range of multinomial classifiers described in the scientific literature, we test a subset of algorithms, some of them considered in [Pulina and Tacchella (2007)]. In particular, we limit our choice to the classifiers able to deal with numerical attributes (the features) and multinomial class labels (the engines). Furthermore, in order to make our approach as general as possible, our desideratum is to choose classifiers that allow us to avoid “stringent” assumptions on the feature distributions, e.g., hypotheses of normality or independence among the features. Finally, we also prefer classifiers that do not require complex parameter tuning, e.g., procedures that are more elaborate than a standard parameter grid search. The selected classifiers are listed in the following:

  • Aggregation Pheromone density based pattern Classification (apc): it is a pattern classification algorithm modeled on ant colony behavior and distributed adaptive organization in nature. Each data pattern is considered as an ant, and the training patterns (ants) form several groups or colonies depending on the number of classes present in the data set. A new test pattern (ant) will move along the direction where the average aggregation pheromone density (at the location of the new ant) formed due to each colony of ants is higher and, hence, eventually it will join that colony. We refer the reader to [Halder et al. (2009)] for further details.

  • Decision rules (furia): a classifier providing a set of rules that generally take the form of a Horn clause wherein the class label is implied by a conjunction of some attributes; we use furia [Hühn and Hüllermeier (2009)] to induce decision rules.

  • Decision trees (j48): a classifier arranged in a tree structure, and used to discover decision rules. Each inner node contains a test on some attributes, and each leaf node contains a label; we use j48, an optimized implementation of c4.5 [Quinlan (1993)].

  • Multinomial Logistic Regression (mlr): a classifier providing a hyperplane of the hypersurfaces that separate the class labels in the feature space; we use the inducer described in [Le Cessie and Van Houwelingen (1992)].

  • Nearest-neighbor (nn): it is a classifier yielding the label of the training instance which is closest to the given test instance, whereby closeness is evaluated using, e.g., Euclidean distance [Aha et al. (1991)].

  • Support Vector Machine (svm): it is a supervised learning algorithm used for both classification and regression tasks. Roughly speaking, the basic training principle of svms is finding an optimal linear hyperplane such that the expected classification error for (unseen) test patterns is minimized. We refer the reader to [Cortes and Vapnik (1995)] for further details.

The rationale of our choice is twofold. On the one hand, the selected classifiers are “orthogonal”, i.e., they build on different inductive biases in the computation of their classification hypotheses, since their classification algorithms are based on very different approaches. On the other hand, building me-asp on top of different classifiers allows us to draw conclusions about both the robustness of our approach and the proper design of our testing set. Indeed, as shown in Section 5, performance is positive for each classification method.

As mentioned in Section 2.2, in order to train the classifiers, we have to select a pool of instances for training purpose, called the training set. Concerning such selection, our aim is twofold. On the one hand, we want to compose a training set in order to get a robust model; while, on the other hand, we want to test the generalization performance of me-asp also on instances belonging to benchmarks not “covered” by the training set.

Figure 1: Training set coverage: two-dimensional space projection of (a) the whole dataset, (b) ts1, (c) ts2, and (d) ts3.

As a result of the considerations above, we designed three training sets. The first one (ts1 in the following) is composed of the 320 instances uniquely solved by the pool of engines selected in Section 4.2, i.e., such that only one engine, among the ones selected, solves each instance (without taking into account the instances involved in the competition). The rationale of this choice is to try to “mask” noisy information during model training to obtain a robust model. The remaining training sets are subsets of ts1, and they are composed of instances uniquely solved considering only the ones belonging to the problems listed in the following:

  • ts2: 297 instances uniquely solved considering:
    GraphColouring, Numberlink, Labyrinth, MinimalDiagnosis.

  • ts3: 59 instances uniquely solved considering:
    SokobanDecision, HanoiTower, Labyrinth, StrategicCompanies.

Note that both ts2 and ts3 contain one distinct Beyond NP problem to ensure a minimum coverage of this class of problems. The rationale of these additional training sets is thus to test our method on “unseen” problems, i.e., on instances coming from domains that were not used for training: a “good” machine learning method should generalize (to some degree) and obtain good results also in such a setting. In this view, both training sets are composed of instances coming from a limited number of problems, i.e., 4 out of 14. Moreover, ts3 is also composed of a very limited number of instances. Such a setting will further challenge me-asp, helping us to understand the point at which performance starts to degrade: we will see that, while ts3 is indeed a challenging situation in which performance decreases, even in this setting me-asp has reasonable performance and performs better than its engines and rival systems.

In order to give an idea of the coverage of our training sets and outline differences among them, we depict in Figure 1 the coverage of: the whole available dataset (Fig. 1(a)), ts (Fig. 1(b)), and its subsets ts (Fig. 1(c)) and ts (Fig. 1(d)). In particular, the plots report a two-dimensional projection obtained by means of a principal components analysis (PCA), and considering only the first two principal components (PC). The x-axis and the y-axis in the plots are the first and the second PCs, respectively. Each point in the plots is labeled by the best solver on the related instance. In Figure 1(a) we add a label denoting the benchmark name of the depicted instances, in order to give a hint about the “location” of each benchmark. From the picture it is clear that ts (Fig. 1(c)) covers less space than ts (Fig. 1(b)), which in turn covers a subset of the whole set of instances. Clearly, ts, which is the smallest set of instances, has a very limited coverage (see Fig. 1(d)).
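A projection of this kind can be sketched in a few lines. The paper does not say how the PCA was computed; the version below is an assumption that uses power iteration with deflation on the covariance matrix, and the feature vectors are made up for illustration.

```python
import math
import random


def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenvector(matrix, iterations=500, seed=0):
    """Power iteration: unit-norm dominant eigenvector of a symmetric PSD matrix."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def first_two_pcs(rows):
    """Project rows onto the first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    pc1 = leading_eigenvector(cov)
    lam1 = sum(pc1[i] * cov[i][j] * pc1[j] for i in range(d) for j in range(d))
    # Deflation: remove the variance explained by pc1, then repeat.
    deflated = [[cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    pc2 = leading_eigenvector(deflated, seed=1)
    points = [(sum(a * b for a, b in zip(r, pc1)),
               sum(a * b for a, b in zip(r, pc2))) for r in centered]
    return points, pc1, pc2


# Made-up feature vectors (one row per instance, three features each).
features = [
    [2.0, 0.0, 0.1], [-2.0, 0.0, -0.1], [0.0, 1.0, 0.1],
    [0.0, -1.0, -0.1], [4.0, 0.5, 0.0], [-4.0, -0.5, 0.0],
]
points, pc1, pc2 = first_two_pcs(features)
for x, y in points:
    print(round(x, 3), round(y, 3))
```

Each printed pair corresponds to one point of a scatter plot like those in Figure 1; labelling the point with the best engine on that instance is then a matter of plotting.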

Considering the classification algorithms listed above, we trained the classifiers and we assessed their accuracy. (For all algorithms but apc, we use the tool rapidminer [Mierswa et al. (2006)].) Referring to the setting introduced in Section 2.2, even assuming that a training set is sufficient to learn the target function, it is still the case that different training sets may yield different trained classifiers. The problem is that the resulting trained classifier may underfit the unknown pattern, i.e., its prediction is wrong, or overfit, i.e., be very accurate only when the input pattern is in the training set. Both underfitting and overfitting lead to poor generalization performance, i.e., the classifier fails to predict the correct label when the input pattern is not in the training set. However, statistical techniques can provide reasonable estimates of the generalization error. In order to test the generalization performance, we use a technique known as stratified 10-times 10-fold cross validation to estimate the generalization in terms of accuracy, i.e., the total amount of correct predictions with respect to the total amount of patterns. Given a training set, we partition its patterns into 10 pairwise disjoint subsets whose union is the whole set; for each subset, we then train the classifier on the patterns, and corresponding labels, that lie outside that subset. We repeat the process 10 times, to yield 10 different trained classifiers, and we obtain the global accuracy estimate.
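The cross-validation procedure can be sketched as follows. This is an illustration under stated assumptions rather than the rapidminer setup used in the paper: the round-robin stratification, the toy 1-nearest-neighbour learner, and the synthetic patterns are all choices made here for the example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, rng):
    """Split indices into k disjoint folds, spreading each label evenly."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


def train_1nn(train_patterns, train_labels):
    """Toy learner: 1-nearest neighbour with squared Euclidean distance."""
    def predict(pattern):
        distances = [sum((a - b) ** 2 for a, b in zip(pattern, p))
                     for p in train_patterns]
        return train_labels[distances.index(min(distances))]
    return predict


def repeated_cv_accuracy(patterns, labels, train, k=10, repeats=10, seed=0):
    """Accuracy = correct predictions / total predictions over all repetitions."""
    rng = random.Random(seed)
    correct = total = 0
    for _ in range(repeats):
        for fold in stratified_folds(labels, k, rng):
            held_out = set(fold)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            model = train([patterns[i] for i in train_idx],
                          [labels[i] for i in train_idx])
            correct += sum(model(patterns[i]) == labels[i] for i in fold)
            total += len(fold)
    return correct / total


# Synthetic, well-separated patterns labelled with the (hypothetical) best engine.
patterns = [(i * 0.1, i * 0.05) for i in range(20)]
patterns += [(10 + i * 0.1, 10 + i * 0.05) for i in range(20)]
labels = ["clasp"] * 20 + ["DLV"] * 20

folds = stratified_folds(labels, 10, random.Random(0))
accuracy = repeated_cv_accuracy(patterns, labels, train_1nn)
print(accuracy)
```

The folds are pairwise disjoint and cover every index exactly once, which is the partition property described in the text; stratification additionally keeps the label proportions of each fold close to those of the whole training set.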

We report an accuracy greater than 92% for each classification algorithm trained on ts, while concerning the remaining training sets, just for the sake of completeness we report an average 85% as accuracy result. The main reason for this result is that the training sets different from ts are composed of a smaller number of instances with respect to ts, thus the classification algorithms are not able to generalize with the same accuracy. This result is not surprising, also considering the plots in Figure 1 and, as we will see in the experimental section, this will influence the performance of me-asp.

5 Performance Analysis

In this section we present the results of the analysis we have performed. We consider different combinations of training and test sets, where the training sets are the ones introduced in Section 4, and the test set ranges over the 3rd ASP Competition ground instances. In particular, the first (resp. second) experiment has ts as training set, and the successfully grounded instances evaluated at (resp. submitted to) the 3rd ASP Competition as test set: the goal of this analysis is to test the efficiency of our approach on all the evaluated (resp. submitted) instances when the model is trained on the whole space of the uniquely solved instances. The third experiment considers ts and ts as training sets, and all the successfully grounded instances submitted to the competition as test set: in this case, given that the models are trained only on a portion of the space of the uniquely solved instances, and that the test set contains “unseen” problems (i.e., belonging to domains that were left unknown during training), the goal is to test, in particular, the robustness of our approach. We devote one subsection to each of these experiments, where we compare me-asp to its component engines. In detail, for each experiment the results are reported in a table structured as follows: the first column reports the name of the solver and (when needed) its inductive model in a subcolumn, where the considered inductive models are denoted by mod, mod and mod, corresponding to the training sets ts, ts and ts introduced before, respectively; the second and third columns report the result of each solver on NP and Beyond NP classes, respectively, in terms of the number of solved instances within the time limit and the sum of their solving times (a sub-column is devoted to each of these numbers, which are “–” if the related solver was not among the selected engines).
We report the results obtained by running me-asp with the six classification methods introduced in Section 4.3, and their related inductive models. In particular, me-asp (c) indicates me-asp employing the classification method c ∈ {apc, furia, j48, mlr, nn, svm}. We also report the component engines employed by me-asp on each class as explained in Section 4.2, and as reference sota, which is the ideal multi-engine solver (considering the engines employed).

An additional subsection summarizes results and compares me-asp with state-of-the-art solvers that won the 3rd ASP Competition.

We remind the reader that the compared engines were run on all the 1425 instances grounded in less than 600s, whereas the instances on which me-asp was run are limited to the ones for which we were able to compute all features (i.e., 1371 instances), and the timings for multi-engine systems include both the time spent for extracting the features from the ground instances, and the time spent by the classifier.
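The quantities reported in the tables can be computed along the following lines. This is a sketch under assumptions: sota is read here as the solver that, on each instance, matches the fastest engine that solves it; the 600-second solving limit, the data layout, and all timings are hypothetical.

```python
TIME_LIMIT = 600.0  # seconds; assumed value, for illustration only


def solved_and_time(times):
    """Number of solved instances and sum of their solving times."""
    solved = [t for t in times if t is not None and t <= TIME_LIMIT]
    return len(solved), sum(solved)


def sota_times(engine_times):
    """Ideal multi-engine solver: per instance, the best time among the engines."""
    instances = list(next(iter(engine_times.values())))
    best = {}
    for inst in instances:
        candidates = [times[inst] for times in engine_times.values()
                      if times[inst] is not None]
        best[inst] = min(candidates) if candidates else None
    return best


def multi_engine_times(engine_times, chosen, overhead):
    """Multi-engine run: feature extraction and classification time plus the chosen engine."""
    result = {}
    for inst, engine in chosen.items():
        t = engine_times[engine][inst]
        result[inst] = None if t is None else overhead[inst] + t
    return result


# Made-up timings: engine -> {instance: seconds, or None if unsolved}.
engine_times = {
    "clasp": {"i1": 10.0, "i2": None, "i3": 300.0},
    "idp": {"i1": 25.0, "i2": 40.0, "i3": None},
}
chosen = {"i1": "clasp", "i2": "idp", "i3": "idp"}   # hypothetical predictions
overhead = {"i1": 1.5, "i2": 1.5, "i3": 1.5}         # features + classification

sota = solved_and_time(sota_times(engine_times).values())
me = solved_and_time(multi_engine_times(engine_times, chosen, overhead).values())
print("sota:", sota)
print("multi-engine:", me)
```

In this toy run the multi-engine solver loses one instance because of a wrong prediction on `i3`, and pays the overhead on the instances it does solve, which are the two sources of distance from sota discussed in the experiments.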

| Solver | Ind. Model | NP #Solved | NP Time | Beyond NP #Solved | Beyond NP Time |
|---|---|---|---|---|---|
| clasp | | 60 | 5132.45 | – | – |
| claspD | | – | – | 13 | 2344.00 |
| cmodels | | 56 | 5092.43 | 9 | 2079.79 |
| DLV | | 37 | 1682.76 | 15 | 1359.71 |
| idp | | 61 | 5010.79 | – | – |
| me-asp (apc) | mod | 63 | 5531.68 | 15 | 3286.28 |
| me-asp (furia) | mod | 63 | 5244.73 | 15 | 3187.73 |
| me-asp (j48) | mod | 68 | 5873.25 | 15 | 3187.73 |
| me-asp (mlr) | mod | 65 | 5738.79 | 15 | 3187.57 |
| me-asp (nn) | mod | 66 | 4854.78 | 15 | 3187.31 |
| me-asp (svm) | mod | 60 | 4830.70 | 15 | 2308.60 |
| sota | | 71 | 5403.54 | 15 | 1221.01 |

Table 3: Results of the various solvers on the grounded instances evaluated at the 3rd ASP Competition. me-asp has been trained on the ts training set.

5.1 Efficiency on Instances Evaluated at the Competition

In the first experiment we consider ts introduced in Section 4 as training set, and as test set all the instances evaluated at the 3rd ASP Competition (a total of 88 instances). Results are shown in Table 3. We can see that, on problems of the NP class, me-asp (j48) solves the highest number of instances, 7 more than idp and 8 more than clasp. Note also that me-asp (svm) (our worst performing version) is on par with clasp (with 60 solved instances) and is very close to idp (with 61 solved instances). Nonetheless, 5 out of 6 classification methods lead me-asp to have better performance than each of its engines. On the Beyond NP problems, instead, all versions of me-asp and DLV solve 15 instances (DLV having the best mean CPU time), followed by claspD and cmodels, which solve 13 and 9 instances, respectively. Among the me-asp versions, me-asp (j48) is, in sum, the one that solves the highest number of instances: here it is very interesting to note that its performance is very close to that of the sota solver (it solves only 3 fewer instances) which, we recall, has the ideal performance that we could expect on these instances with these engines.

5.2 Efficiency on Instances Submitted to the Competition

In the second experiment we consider the ts training set (as for the previous experiment), and the test set is composed of all successfully grounded instances submitted to the 3rd ASP Competition. The results are shown in Table 4. Note that in both NP and Beyond NP classes, all me-asp versions solve more instances than the component engines (or the same number in shorter time, in one case): in particular, in the NP class, me-asp (apc) solves the highest number of instances, 52 more than clasp, which is the best engine in this class, while in the Beyond NP class me-asp (mlr) solves 519 instances and three me-asp versions solve 518 instances, i.e., 86 and 85 more instances than claspD, respectively, which is the engine that solves the most instances in the Beyond NP class. Also in this case me-asp (svm) solves fewer instances than other me-asp versions; nonetheless, me-asp (svm) can solve as many NP instances as clasp, and is effective on Beyond NP, where it is one of the versions that can solve 518 instances.

As far as the comparison with the sota solver is concerned, the best me-asp version, i.e., me-asp (apc), solves, in sum, only 23 fewer instances than the sota solver (1013 versus 1036), mostly from the NP class.

| Solver | Ind. Model | NP #Solved | NP Time | Beyond NP #Solved | Beyond NP Time |
|---|---|---|---|---|---|
| clasp | | 445 | 47096.14 | – | – |
| claspD | | – | – | 433 | 52029.74 |
| cmodels | | 333 | 40357.30 | 270 | 38654.29 |
| DLV | | 241 | 21678.46 | 364 | 9150.47 |
| idp | | 419 | 37582.47 | – | – |
| me-asp (apc) | mod | 497 | 55334.15 | 516 | 60537.67 |
| me-asp (furia) | mod | 480 | 48563.26 | 518 | 60009.23 |
| me-asp (j48) | mod | 490 | 49564.19 | 510 | 59922.86 |
| me-asp (mlr) | mod | 489 | 49569.77 | 519 | 58287.31 |
| me-asp (nn) | mod | 490 | 46780.31 | 518 | 55043.39 |
| me-asp (svm) | mod | 445 | 40917.70 | 518 | 52553.84 |
| sota | | 516 | 39857.76 | 520 | 24300.82 |

Table 4: Results of the various solvers on the grounded instances submitted to the 3rd ASP Competition. me-asp has been trained on the ts training set.

In order to give a different look at the magnitude of improvements of our approach in this experiment, whose test set, we recall, is a superset of the one in Section 5.1, in Figure 2 we present the results of me-asp (apc), its engines, claspfolio and sota on NP instances in a cumulative way, as customary in, e.g., Max-SAT and ASP Competitions. The x-axis reports the CPU time, while the y-axis indicates the number of instances solved within a certain CPU time.
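The data behind a cumulative plot of this kind is simple to derive from per-instance solving times. The following sketch uses made-up timings and thresholds; `None` marks an unsolved instance.

```python
from bisect import bisect_right


def cumulative_curve(times, time_points):
    """For each CPU-time threshold, count the instances solved within it."""
    solved = sorted(t for t in times if t is not None)
    return [bisect_right(solved, limit) for limit in time_points]


# Hypothetical solving times in seconds, one entry per instance.
me_asp = [0.8, 12.0, 45.5, None, 130.0, 410.0]
engine = [0.5, 20.0, None, None, 125.0, None]
thresholds = [1, 10, 100, 600]

curve_me_asp = cumulative_curve(me_asp, thresholds)
curve_engine = cumulative_curve(engine, thresholds)
print(curve_me_asp)  # [1, 1, 3, 5]
print(curve_engine)  # [1, 1, 2, 3]
```

Plotting one such curve per solver gives a picture in which a higher curve at the time limit means more instances solved overall, while the horizontal gap between two curves reflects a difference in the time needed to reach the same number of solved instances.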

Results clearly show that me-asp (apc) performs better, in terms of total number of instances solved, than its engines clasp, claspD and claspfolio; also, me-asp (apc) is very close to the sota. Looking at the figure in more detail, we can note that, along the CPU-time axis, the distance of me-asp (apc) w.r.t. the sota decreases: this distance is due, for a small portion of instances (given that we have seen that these two steps are efficient), to the time spent on computing features and on classification, and to the fact that we may not always predict the best engine to run. The convergence of me-asp (apc) toward sota confirms that, even if we may sometimes fail to predict the best engine, most of the time we predict an engine that allows us to solve the instance within the time limit.

Figure 2: Results of claspfolio, me-asp engines, me-asp (apc) (trained on ts) and sota on the NP instances submitted to the competition.

5.3 Robustness on Instances Submitted to the Competition

In this experiment, we use the two smaller training sets ts and ts introduced in Section 4, and the same test set as in the previous experiment. The rationale of this last experiment is to test the robustness of our approach on “unseen” problems, i.e., in a situation where the training set does not contain any instance from some problems. Note that ts contains 297 uniquely solved instances, covering 4 domains out of 14; and ts is very small, since it contains only 59 instances belonging to 4 domains. We can thus expect this experiment to be particularly challenging for our multi-engine approach. Results are presented in Table 5, from which it is clear that me-asp (apc) trained on ts performs better than the other alternatives and solves 46 instances more than clasp in the NP class, and 72 instances more than claspD in the Beyond NP class (505 versus 433 in Table 5; clasp and claspD being the best engines in NP and Beyond NP classes, respectively). As expected, if we compare the results with the ones obtained with the larger training set ts, we note a general performance degradation. In particular, the performance now is less close to the sota solver, which solves in total 40 more instances than the best me-asp version trained on ts, with additional unsolved instances coming mainly from the Beyond NP class in this case. This can be explained considering that ts does not contain instances from the StrategicCompanies problem and, thus, the resulting model is not always able to select DLV on these instances, where DLV is often a better choice than claspD. However, also in this case me-asp can solve far more instances than all the engines, demonstrating a robust performance.

These findings are confirmed when the very small training set ts is considered. In this very challenging setting there are still me-asp versions that can solve more instances than the component engines.

| Solver | Ind. Model | NP #Solved | NP Time | Beyond NP #Solved | Beyond NP Time |
|---|---|---|---|---|---|
| clasp | | 445 | 47096.14 | – | – |
| claspD | | – | – | 433 | 52029.74 |
| cmodels | | 333 | 40357.30 | 270 | 38654.29 |
| DLV | | 241 | 21678.46 | 364 | 9150.47 |
| idp | | 419 | 37582.47 | – | – |
| me-asp (apc) | mod | 491 | 54126.87 | 505 | 56250.96 |
| me-asp (furia) | mod | 479 | 49226.42 | 507 | 55777.67 |
| me-asp (j48) | mod | 477 | 46746.65 | 507 | 55777.67 |
| me-asp (mlr) | mod | 471 | 48404.11 | 507 | 52499.83 |
| me-asp (nn) | mod | 476 | 47627.06 | 507 | 49418.67 |
| me-asp (svm) | mod | 459 | 38686.16 | 507 | 51462.13 |
| me-asp (apc) | mod | 445 | 48290.97 | 433 | 53268.62 |
| me-asp (furia) | mod | 414 | 37902.37 | 363 | 10542.85 |
| me-asp (j48) | mod | 487 | 51187.66 | 431 | 57393.61 |
| me-asp (mlr) | mod | 460 | 42385.66 | 363 | 10542.01 |
| me-asp (nn) | mod | 487 | 48889.21 | 363 | 10547.81 |
| me-asp (svm) | mod | 319 | 32162.37 | 364 | 10543.00 |
| sota | | 516 | 39857.76 | 520 | 24300.82 |

Table 5: Results of the various solvers on the grounded instances submitted to the 3rd ASP Competition. me-asp has been trained on training sets ts and ts.
Figure 3: Number of calls to the component engines of the various versions of me-asp on the instances submitted to the 3rd ASP Competition, for the inductive models (a) mod, (b) mod, and (c) mod.

5.4 Discussion and Comparison to the State of the Art

Summing up the three experiments, it is clear that me-asp has a very robust and efficient performance: it often can solve (many) more instances than its engines, even considering the single NP and Beyond NP classes.

We also report that all versions of me-asp have reasonable performance, so, from a machine learning point of view, we can conclude that, on the one hand, the set of cheap-to-compute features that we selected is representative (i.e., the features allow us both to analyze a significant number of instances and to drive the selection of an appropriate engine) independently of the classification method employed. On the other hand, the robustness of our inductive models lets us conclude that we made an appropriate design of our training set ts.

Additional observations can be drawn by looking at Figure 3, where three plots are depicted, one for each inductive model, showing the number of calls to the internal engines for each variant of me-asp. In particular, by looking at Figure 3(a), we can conclude that the selection of the engines was also fair. Indeed, all of them were employed in a significant number of cases and, as one would expect, the engines that solved a larger number of instances in the 3rd ASP Competition (i.e., clasp and claspD) are called more often. Nonetheless, the ability to exploit all solvers from the pool made a difference in performance, e.g., looking at Figure 3(a) one can note that our best version me-asp (apc) exploits all engines, and it is very close to the ideal performance of sota. It is worth noting that the me-asp versions that select DLV more often (note that DLV uniquely solves a high number of StrategicCompanies instances) performed better on Beyond NP. Note also that Figure 3 allows us to explain the performance of me-asp (svm), which often differs from the other methods; indeed, this version often prefers DLV over the other engines also on NP instances. Although choosing DLV is often decisive on Beyond NP, it is not always a good choice on NP. As a consequence, me-asp (svm) is always very fast on Beyond NP but does not show overall the same performance as me-asp equipped with other methods.

Figure 3 also gives some additional insight concerning the differences among our inductive models. In particular, the me-asp versions trained with ts (containing only StrategicCompanies in Beyond NP) prefer DLV more often (see Fig. 3(c)), thus the performance is good on this class but deteriorates a bit on NP. Concerning ts (see Fig. 3(b)), we note that idp is less exploited than in the other cases, even by me-asp (mlr), which is the alternative that chooses idp more often: this is probably due to the lower coverage of this training set on NP. Overall, as we would expect, the number of calls for me-asp trained with ts is more balanced among the various engines than for me-asp trained with the smaller training sets.
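The per-engine call counts plotted in Figure 3 amount to a simple tally of the engine selected on each instance. A minimal sketch, with hypothetical selections for two me-asp versions:

```python
from collections import Counter


def engine_calls(selections):
    """Count, for each me-asp version, how often each engine is selected."""
    return {method: Counter(chosen.values())
            for method, chosen in selections.items()}


# Hypothetical selections: classification method -> {instance: chosen engine}.
selections = {
    "apc": {"i1": "clasp", "i2": "idp", "i3": "DLV", "i4": "clasp"},
    "svm": {"i1": "DLV", "i2": "DLV", "i3": "DLV", "i4": "clasp"},
}

calls = engine_calls(selections)
for method, counter in calls.items():
    print(method, dict(counter))
```

A strongly skewed tally, as for the made-up `svm` row here, is the kind of pattern the text refers to when it notes that one version prefers DLV over the other engines.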

We have seen that me-asp can almost always solve more instances than its component engines. One might wonder how it compares with the state-of-the-art ASP implementations. Table 6 summarizes the performance of claspD and claspfolio (the overall winner, and the fastest solver in the NP class that entered the System Track of the competition, respectively), in terms of number of solved instances on both instance sets, i.e., evaluated and submitted, and of the various versions of me-asp exploiting our inductive model of choice, obtained from the training set ts.

We observe that all me-asp versions outperform yardstick state-of-the-art solvers considering all submitted instances. (Recall that claspfolio can deal with NP instances only.)

Concerning the comparison on the instances evaluated at the 3rd ASP Competition, we note that all me-asp versions outperform the winner of the System Track of the competition claspD that could solve 65 instances, whereas me-asp (j48) (i.e., the best solver in this class) solves 83 instances (and is very close to the ideal sota solver). Even me-asp (svm) (i.e., the worst performing version of our system) could solve 10 instances more than claspD; moreover, also me-asp (apc) is very effective here, solving 78 instances.

Concerning the comparison on the larger set of instances submitted to the 3rd ASP Competition, the picture is similar. All me-asp versions outperform claspD, which solves 835 instances, whereas the worst performing version of our system, me-asp (svm), solves 963 instances, and the best version overall, me-asp (apc), solves 1013 instances, i.e., 178 instances more than the winner of the 3rd ASP System competition. We remind the reader that this holds even considering the most challenging settings, when me-asp is trained with ts and ts (see Tab. 5).

If we limit our attention to the instances belonging to the NP class, the yardstick for comparing me-asp with the state of the art is clearly claspfolio. Indeed, claspfolio was the solver that could solve more NP instances at the 3rd ASP Competition, and also claspfolio is the state of the art portfolio system for ASP, selecting from a pool of different clasp configurations.

The picture that emerges from Table 6 shows that all versions of me-asp could solve more instances than claspfolio, especially considering the instances submitted to the competition. In particular, me-asp (apc) solves 497 NP instances, while claspfolio solves 431. Concerning the comparison on the instances evaluated at the 3rd ASP Competition, we note that claspfolio could solve 62 instances and performs similarly to, e.g., me-asp (svm) (with 60 instances), and me-asp (furia) (with 63 instances); our best performing version (i.e., me-asp (j48)) could solve 68 instances, i.e., 6 instances more than claspfolio (about 10% more).

Up to now we have compared the raw performance of me-asp with out-of-the-box alternatives. A more precise picture of the comparison between the two machine learning based approaches (me-asp and claspfolio) can be obtained by performing some additional analysis.

First of all, note that the above comparison was made considering as reference the claspfolio version (trained by the Potassco team) that entered the 3rd ASP Competition. One might wonder what the performance of claspfolio is when trained on our training set ts. As will be discussed in detail in Section 6, claspfolio exploits a different method for algorithm selection, thus this datum is reported here only for the sake of completeness. We have trained claspfolio on ts with the help of the Potassco team. (Following the suggestion of the Potassco team, we have run claspfolio ver. 1.0.1, Aug 19th, 2011, since the feature extraction tool claspre has been recently updated and integrated in claspfolio.) As a result, the performance of claspfolio trained on ts is analogous to the one obtained by the claspfolio trained for the competition (i.e., it solves 59 instances from the evaluated set, and 433 of the submitted set).

On the other hand, one might want to analyze what would be the result of applying the approach to algorithm selection implemented in me-asp to the setting of claspfolio. As discussed in Section 6, the multi-engine approach that we have followed in me-asp is very flexible, and we could easily develop an ad-hoc version of our system, which we called me-clasp, that is based on the same “algorithms” portfolio of claspfolio. In practice, we considered as a separate engine each of the 25 clasp versions employed in claspfolio, and we applied the same steps described in Section 4 to build me-clasp. Concerning the selection of the engines, as one might expect, many engines overlap and the number of uniquely solved instances considering all the available engines was very low (we get only ten uniquely solved instances). Thus, we applied the extended engine selection policy and selected 5 engines, we trained me-clasp on ts, and we selected a classification algorithm, in this case nn. (We also tried other settings with different combinations, with both more and fewer engines, still obtaining similar overall results.)

The goal of this final experiment is to confirm the prediction power of our approach. The resulting picture is that me-clasp (nn) solves 458 NP instances, where the ideal limit that one can reach considering all the 25 heuristics in the portfolio is 484. This is substantially more than claspfolio, which solves 431 instances. Nonetheless, me-asp (nn) (which solves 490) outperforms me-clasp (nn).

All in all, one can conclude that the approach introduced in this paper, combining cheap-to-compute features and multinomial classification, works well also when applied to a portfolio of heuristics. On the other hand, as one might expect, the possibility to select among several different engines featuring (often radically different) evaluation strategies with non-overlapping performance gives additional advantages w.r.t. a single-engine portfolio. Indeed, even in the presence of an ideal prediction strategy, a portfolio approach based on variants of the same algorithm cannot achieve the same performance as an ideal multi-engine approach. This is clear observing that the sota solver on NP can solve 516 instances, whereas the ideal performance for both me-clasp and claspfolio tops at 484 instances. The comparison of me-clasp and me-asp seems to confirm that me-asp can exploit this ideal advantage also in practice.

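| Solver | Ind. Model | Evaluated NP | Evaluated Beyond NP | Evaluated Tot. | Submitted NP | Submitted Beyond NP | Submitted Tot. |
|---|---|---|---|---|---|---|---|
| claspD | | 52 | 13 | 65 | 402 | 433 | 835 |
| claspfolio | Competition | 62 | – | – | 431 | – | – |
| me-asp (apc) | mod | 63 | 15 | 78 | 497 | 516 | 1013 |
| me-asp (furia) | mod | 63 | 15 | 78 | 480 | 518 | 998 |
| me-asp (j48) | mod | 68 | 15 | 83 | 490 | 510 | 1000 |
| me-asp (mlr) | mod | 65 | 15 | 80 | 489 | 519 | 1008 |
| me-asp (nn) | mod | 66 | 15 | 81 | 490 | 518 | 1008 |
| me-asp (svm) | mod | 60 | 15 | 75 | 445 | 518 | 963 |

Table 6: Comparison to the state of the art. me-asp trained on training set ts.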

6 Related Work

Starting from the consideration that, on empirically hard problems, there is rarely a “global” best algorithm, while it is often the case that different algorithms perform well on different problem instances, Rice (1976) defined the algorithm selection problem as the problem of finding an effective algorithm based on an abstract model of the problem at hand. Along this line, several works have tackled combinatorial problems efficiently. Gomes and Selman (2001) and Leyton-Brown et al. (2003) describe the concept of “algorithm portfolio” as a general method for combining existing algorithms into new ones that are unequivocally preferable to any of the component algorithms. The papers most related to our work are Xu et al. (2008) and Pulina and Tacchella (2007), for solving SAT and QSAT problems, respectively. Both Xu et al. (2008) and Pulina and Tacchella (2007) rely on a per-instance analysis, like the one we have performed in this paper: in Pulina and Tacchella (2007), which is the work closest to ours, the goal is to design a multi-engine solver, i.e., a tool that can choose among its engines the one which is more likely to yield optimal results. Pulina and Tacchella (2009) extends Pulina and Tacchella (2007) by introducing a self-adaptation of the learned selection policies when the approach fails to give a good prediction. The approach by Xu et al. (2008) also has the ability to compute features on-line, e.g., by running a solver for an allotted amount of time and looking “internally” at solver statistics, with the option of changing the solver on-line: this is a per-instance algorithm portfolio approach. The related solver, satzilla, can also combine portfolio and multi-engine approaches. The algorithm portfolio approach is employed also in: Gomes and Selman (2001) on Constraint Satisfaction and MIP, Samulowitz and Memisevic (2007) on QSAT and Gerevini et al. (2009) on planning problems.
If we consider “pure” approaches, the advantage of the algorithm portfolio over a multi-engine is that it is possible, by combining algorithms, to reach a performance that is better than the one of the best engine, which is instead an upper bound for a multi-engine solver. On the other hand, multi-engine treats the engines as a black-box, and this is a fundamental assumption to have a flexible and modular system: to add a new engine one just needs to update the inductive model. Other approaches (an overview can be found in Hoos (2012)) work by designing methods for automatically tuning and configuring the solver parameters: e.g., Hutter et al. (2009) and Hutter et al. (2010) for solving SAT and MIP problems, and Vallati et al. (2011) for planning problems.

About the other approaches in ASP, the one implemented in claspfolio Gebser et al. (2011) mixes characteristics of the algorithm portfolio approach with others more similar to this second trend: it works by selecting the most promising clasp internal configuration on the basis of both “static” and “dynamic” features of the input program, the latter obtained by running clasp for a given amount of time. Thus, like the algorithm portfolio approaches, it can compute both static and dynamic features, while trying to automatically configure the “best” clasp configuration on the basis of the computed features.

The work presented here differs from claspfolio for a number of reasons. First, from a machine learning point of view, the inductive models of me-asp are based on classification algorithms, while the inductive models of claspfolio are mainly based on regression techniques, as in satzilla, with the exception of a “preliminary” stage, in which a classifier is invoked in order to predict the satisfiability result of the input instance. Regression-based techniques usually need many training instances to have a good prediction while, as shown in our paper, this is not required for our method, which is based on classification. To support this consideration about prediction power, in Section 5.4 we have applied our approach to the claspfolio portfolio, showing that relying on classification instead of regression can lead to better results. Second, as mentioned before, in our approach we consider the engines as a black-box: the me-asp architecture is designed to be independent of the engines' internals. me-asp, being a multi-engine solver, thus has higher modularity/flexibility w.r.t. claspfolio: adding a new solver to me-asp is immediate, while this is problematic in claspfolio, and would likely boil down to implementing the new strategy in clasp. Third, as a consequence of the previous point, we use only static features: dynamic features, as in the case of claspfolio, usually are both strongly related to a given engine and possibly costly to compute, and we avoided this kind of feature. For instance, one of the claspfolio dynamic features is related to the number of “learnt constraints”, which could be a significant feature for clasp but not for other systems, e.g., DLV, which does not adopt learning and is based on look-ahead. Lastly, as described in Section 4.1, we use only cheap-to-compute features, while claspfolio relies on some quite “costly” features, e.g., number of SCCs and loops.
This was confirmed by some preliminary experiments: it turned out that the claspfolio feature extractor could compute, in 600s, all its features for 573 out of 823 NP ground instances.
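The difference between the two families of inductive models can be illustrated with a toy example. This is only a sketch of the general idea and not the implementation of me-asp, claspfolio or satzilla: both selectors below use a 1-nearest-neighbour predictor, and the patterns, labels and runtimes are made up.

```python
def nearest(pattern, patterns):
    """Index of the training pattern closest (squared Euclidean) to `pattern`."""
    distances = [sum((a - b) ** 2 for a, b in zip(pattern, p)) for p in patterns]
    return distances.index(min(distances))


def select_by_classification(pattern, train_patterns, train_labels):
    """Classification: predict the engine label directly."""
    return train_labels[nearest(pattern, train_patterns)]


def select_by_regression(pattern, train_patterns, train_runtimes):
    """Regression: predict a runtime per engine, then pick the lowest prediction."""
    i = nearest(pattern, train_patterns)
    predicted = {engine: times[i] for engine, times in train_runtimes.items()}
    return min(predicted, key=predicted.get)


# Made-up training data: feature vectors, best-engine labels, per-engine runtimes.
train_patterns = [(1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_labels = ["clasp", "idp", "DLV"]
train_runtimes = {
    "clasp": [2.0, 50.0, 400.0],
    "idp": [30.0, 4.0, 90.0],
    "DLV": [100.0, 200.0, 9.0],
}

new_instance = (4.5, 5.2)
by_class = select_by_classification(new_instance, train_patterns, train_labels)
by_regr = select_by_regression(new_instance, train_patterns, train_runtimes)
print(by_class, by_regr)
```

Note the different data requirements: the classification-based selector needs only one label per training instance, whereas the regression-based one needs a runtime for every engine on every training instance.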

An alternative approach in ASP is followed in the dors framework of Balduccini (2011), where in the off-line learning phase, carried out on representative programs from a given domain, a heuristic ordering is selected to be then used in smodels when solving other programs from the same domain. The target of this work seems to be real-world problem domains where instances have similar structures, and heuristic ordering learned in some (possibly small) instances in the domain can help to improve the performance on other (possibly big) instances. According to its author (personal communication with Marcello Balduccini), the solving method behind dors can be considered “complementary” more than alternative w.r.t. the one of me-asp, i.e., they could in principle be combined. An idea can be the following: while computing features, one can (in parallel) run one or more engines in order to learn a (possibly partial) heuristic ordering. Then, in the solving phase, engines can take advantage of the learned heuristic (but, of course, assuming minimal changes in the engines). This would amount to having two “sources” of knowledge: the “most promising” engine, learned with the multi-engine approach, and the learned heuristic ordering.

Finally, we remark that this work is an extended and revised version of Maratea et al. (2012a), the main improvements include:

  • the adoption of six classification methods (instead of the only one, i.e., nn, employed in Maratea et al. (2012a));

  • a more detailed analysis of the dataset and the test sets;

  • a wider experimental analysis, including more systems, i.e., different versions of me-asp and claspfolio, and more investigations on training and test sets, and

  • an improved related work, in particular w.r.t. the comparison with claspfolio.

7 Conclusion

In this paper we have applied machine learning techniques to ASP solving with the goal of developing a fast and robust multi-engine ASP solver. To this end, we have: specified a number of cheap-to-compute syntactic features that allow for accurate classification of ground ASP programs; applied six multinomial classification methods to learning algorithm selection strategies; and implemented these techniques in our multi-engine solver me-asp, which is available for download at http://www.mat.unical.it/ricca/me-asp.

The performance of me-asp was assessed on three experiments, which were conceived for checking efficiency and robustness of our approach, involving different training and test sets of instances taken from the ones submitted to the System Track of the 3rd ASP Competition. Our analysis shows that our multi-engine solver me-asp is very robust and efficient, and outperforms both its component engines and state-of-the-art solvers.


The authors would like to thank Marcello Balduccini for useful discussions (by email and in person) about the solving algorithm underlying his system dors, and all the members of the claspfolio team, in particular Torsten Schaub and Thomas Marius Schneider, for clarifications and the valuable support to train claspfolio in the most proper way.


  • Aha et al. (1991) Aha, D., Kibler, D., and Albert, M. 1991. Instance-based learning algorithms. Machine learning 6, 1, 37–66.
  • Balduccini (2011) Balduccini, M. 2011. Learning and using domain-specific heuristics in ASP solvers. AI Communications – The European Journal on Artificial Intelligence 24, 2, 147–164.
  • Balduccini and Lierler (2012) Balduccini, M. and Lierler, Y. 2012. Practical and methodological aspects of the use of cutting-edge asp tools. In Proc. of the 14th International Symposium on Practical Aspects of Declarative Languages (PADL 2012), C. V. Russo and N.-F. Zhou, Eds. Lecture Notes in Computer Science, vol. 7149. Springer, 78–92.
  • Baral (2003) Baral, C. 2003. Knowledge Representation, Reasoning and Declarative Problem Solving. Cambridge University Press, Tempe, Arizona.
  • Brooks et al. (2007) Brooks, D. R., Erdem, E., Erdogan, S. T., Minett, J. W., and Ringe, D. 2007. Inferring phylogenetic trees using answer set programming. Journal of Automated Reasoning 39, 4, 471–511.
  • Calimeri et al. (2011) Calimeri, F., Ianni, G., and Ricca, F. since 2011. The third answer set programming system competition. https://www.mat.unical.it/aspcomp2011/.
  • Calimeri et al. (2011) Calimeri, F., Ianni, G., Ricca, F., Alviano, M., Bria, A., Catalano, G., Cozza, S., Faber, W., Febbraro, O., Leone, N., Manna, M., Martello, A., Panetta, C., Perri, S., Reale, K., Santoro, M. C., Sirianni, M., Terracina, G., and Veltri, P. 2011. The Third Answer Set Programming Competition: Preliminary Report of the System Competition Track. In Proc. of LPNMR11. LNCS Springer, Vancouver, Canada, 388–403.
  • Calimeri et al. (2012) Calimeri, F., Ianni, G., and Ricca, F. 2012. The third open answer set programming competition. Theory and Practice of Logic Programming. Available online. DOI:http://dx.doi.org/10.1017/S1471068412000105.
  • Clark (1978) Clark, K. L. 1978. Negation as Failure. In Logic and Data Bases, H. Gallaire and J. Minker, Eds. Plenum Press, New York, 293–322.
  • Cortes and Vapnik (1995) Cortes, C. and Vapnik, V. 1995. Support-vector networks. Machine learning 20, 3, 273–297.
  • de Moura and Bjørner (2008) de Moura, L. M. and Bjørner, N. 2008. Z3: An Efficient SMT Solver. In Proceedings of the 14th International Conference on Tools and Algorithms for Construction and Analysis of Systems, TACAS 2008. 337–340.
  • Drescher et al. (2008) Drescher, C., Gebser, M., Grote, T., Kaufmann, B., König, A., Ostrowski, M., and Schaub, T. 2008. Conflict-Driven Disjunctive Answer Set Solving. In Proceedings of the Eleventh International Conference on Principles of Knowledge Representation and Reasoning (KR 2008), G. Brewka and J. Lang, Eds. AAAI Press, Sydney, Australia, 422–432.
  • Eén and Sörensson (2003) Eén, N. and Sörensson, N. 2003. An Extensible SAT-solver. In Theory and Applications of Satisfiability Testing, 6th International Conference, SAT 2003. LNCS Springer, 502–518.
  • Eiter et al. (1997) Eiter, T., Gottlob, G., and Mannila, H. 1997. Disjunctive Datalog. ACM Transactions on Database Systems 22, 3 (Sept.), 364–418.
  • Friedrich and Ivanchenko (2008) Friedrich, G. and Ivanchenko, V. 2008. Diagnosis from first principles for workflow executions. Tech. rep., Alpen Adria University, Applied Informatics, Klagenfurt, Austria. http://proserver3-iwas.uni-klu.ac.at/download_area/Technical-Reports/technical_report_2008_02.pdf.
  • Gebser et al. (2011) Gebser, M., Kaminski, R., Kaufmann, B., Schaub, T., Schneider, M. T., and Ziller, S. 2011. A portfolio solver for answer set programming: Preliminary report. In Proc. of the 11th International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR), J. P. Delgrande and W. Faber, Eds. LNCS, vol. 6645. Springer, Vancouver, Canada, 352–357.
  • Gebser et al. (2007) Gebser, M., Kaufmann, B., Neumann, A., and Schaub, T. 2007. Conflict-driven answer set solving. In Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07). Morgan Kaufmann Publishers, Hyderabad, India, 386–392.
  • Gebser et al. (2007) Gebser, M., Schaub, T., and Thiele, S. 2007. GrinGo: A New Grounder for Answer Set Programming. In Logic Programming and Nonmonotonic Reasoning, 9th International Conference, LPNMR 2007. Lecture Notes in Computer Science, vol. 4483. Springer, Tempe, Arizona, 266–271.
  • Gebser et al. (2011) Gebser, M., Schaub, T., Thiele, S., and Veber, P. 2011. Detecting inconsistencies in large biological networks with answer set programming. Theory and Practice of Logic Programming 11, 2-3, 323–360.
  • Gelfond and Leone (2002) Gelfond, M. and Leone, N. 2002. Logic Programming and Knowledge Representation – the A-Prolog perspective. Artificial Intelligence 138, 1–2, 3–38.
  • Gelfond and Lifschitz (1988) Gelfond, M. and Lifschitz, V. 1988. The Stable Model Semantics for Logic Programming. In Logic Programming: Proceedings Fifth Intl Conference and Symposium. MIT Press, Cambridge, Mass., 1070–1080.
  • Gelfond and Lifschitz (1991) Gelfond, M. and Lifschitz, V. 1991. Classical Negation in Logic Programs and Disjunctive Databases. New Generation Computing 9, 365–385.
  • Gerevini et al. (2009) Gerevini, A., Saetti, A., and Vallati, M. 2009. An automatically configurable portfolio-based planner with macro-actions: Pbp. In Proc. of the 19th International Conference on Automated Planning and Scheduling, A. Gerevini, A. E. Howe, A. Cesta, and I. Refanidis, Eds. AAAI, Thessaloniki, Greece.
  • Giunchiglia et al. (2006) Giunchiglia, E., Lierler, Y., and Maratea, M. 2006. Answer set programming based on propositional satisfiability. Journal of Automated Reasoning 36, 4, 345–377.
  • Gomes and Selman (2001) Gomes, C. P. and Selman, B. 2001. Algorithm portfolios. Artificial Intelligence 126, 1-2, 43–62.
  • Halder et al. (2009) Halder, A., Ghosh, A., and Ghosh, S. 2009. Aggregation pheromone density based pattern classification. Fundamenta Informaticae 92, 4, 345–362.
  • Hoos (2012) Hoos, H. H. 2012. Programming by optimization. Communications of the ACM 55, 2, 70–80.
  • Hühn and Hüllermeier (2009) Hühn, J. and Hüllermeier, E. 2009. Furia: an algorithm for unordered fuzzy rule induction. Data Mining and Knowledge Discovery 19, 3, 293–319.
  • Hutter et al. (2010) Hutter, F., Hoos, H. H., and Leyton-Brown, K. 2010. Automated configuration of mixed integer programming solvers. In Proc. of the 7th International Conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, A. Lodi, M. Milano, and P. Toth, Eds. LNCS, vol. 6140. Springer, Bologna, Italy, 186–202.
  • Hutter et al. (2009) Hutter, F., Hoos, H. H., Leyton-Brown, K., and Stützle, T. 2009. ParamILS: An automatic algorithm configuration framework. Journal of Artificial Intelligence Research 36, 267–306.
  • Janhunen (2006) Janhunen, T. 2006. Some (in)translatability results for normal logic programs and propositional theories. Journal of Applied Non-Classical Logics 16, 35–86.
  • Janhunen et al. (2009) Janhunen, T., Niemelä, I., and Sevalnev, M. 2009. Computing stable models via reductions to difference logic. In Proceedings of the 10th International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR). LNCS. Springer, Potsdam, Germany, 142–154.
  • Leone et al. (2006) Leone, N., Pfeifer, G., Faber, W., Eiter, T., Gottlob, G., Perri, S., and Scarcello, F. 2006. The DLV System for Knowledge Representation and Reasoning. ACM Transactions on Computational Logic 7, 3 (July), 499–562.
  • Leyton-Brown et al. (2003) Leyton-Brown, K., Nudelman, E., Andrew, G., Mcfadden, J., and Shoham, Y. 2003. A portfolio approach to algorithm selection. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, IJCAI-03.
  • Lierler (2005) Lierler, Y. 2005. Disjunctive Answer Set Programming via Satisfiability. In Logic Programming and Nonmonotonic Reasoning — 8th International Conference, LPNMR’05, Diamante, Italy, September 2005, Proceedings, C. Baral, G. Greco, N. Leone, and G. Terracina, Eds. Lecture Notes in Computer Science, vol. 3662. Springer Verlag, 447–451.
  • Lierler (2008) Lierler, Y. 2008. Abstract Answer Set Solvers. In Logic Programming, 24th International Conference (ICLP 2008). Lecture Notes in Computer Science, vol. 5366. Springer, 377–391.
  • Lifschitz (1999) Lifschitz, V. 1999. Answer Set Planning. In Proceedings of the 16th International Conference on Logic Programming (ICLP’99), D. D. Schreye, Ed. The MIT Press, Las Cruces, New Mexico, USA, 23–37.
  • Maratea et al. (2012a) Maratea, M., Pulina, L., and Ricca, F. 2012a. Applying Machine Learning Techniques to ASP Solving. In Technical Communications of the 28th International Conference on Logic Programming (ICLP 2012). LIPIcs, vol. 17. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 37–48.
  • Maratea et al. (2012b) Maratea, M., Pulina, L., and Ricca, F. 2012b. The multi-engine asp solver me-asp. In Proceedings of Logics in Artificial Intelligence, JELIA 2012. LNCS, vol. 7519. Springer, 484–487.
  • Marczak et al. (2010) Marczak, W. R., Huang, S. S., Bravenboer, M., Sherr, M., Loo, B. T., and Aref, M. 2010. Secureblox: customizable secure distributed data processing. In SIGMOD Conference. 723–734.
  • Marek and Truszczyński (1998) Marek, V. W. and Truszczyński, M. 1998. Stable models and an alternative logic programming paradigm. CoRR cs.LO/9809032.
  • Mariën et al. (2008) Mariën, M., Wittocx, J., Denecker, M., and Bruynooghe, M. 2008. Sat(id): Satisfiability of propositional logic extended with inductive definitions. In Proc. of the 11th International Conference on Theory and Applications of Satisfiability Testing, SAT 2008. LNCS. Springer, Guangzhou, China, 211–224.
  • Masoni et al. (2009) Masoni, A., Carpinelli, M., Fenu, G., Bosin, A., Mura, D., Porceddu, I., and Zanetti, G. 2009. Cybersar: A lambda grid computing infrastructure for advanced applications. In Nuclear Science Symposium Conference Record (NSS/MIC), 2009 IEEE. IEEE, 481–483.
  • Mierswa et al. (2006) Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M., and Euler, T. 2006. Yale: Rapid prototyping for complex data mining tasks. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 935–940.
  • Niemelä (1998) Niemelä, I. 1998. Logic Programs with Stable Model Semantics as a Constraint Programming Paradigm. In Proceedings of the Workshop on Computational Aspects of Nonmonotonic Reasoning, I. Niemelä and T. Schaub, Eds. Trento, Italy, 72–79.
  • Le Cessie and Van Houwelingen (1992) Le Cessie, S., and Van Houwelingen, J.C. 1992. Ridge estimators in logistic regression. Applied statistics. JSTOR, 191–201.
  • Nogueira et al. (2001) Nogueira, M., Balduccini, M., Gelfond, M., Watson, R., and Barry, M. 2001. An A-Prolog Decision Support System for the Space Shuttle. In Practical Aspects of Declarative Languages, Third International Symposium (PADL 2001), I. Ramakrishnan, Ed. Lecture Notes in Computer Science, vol. 1990. Springer, 169–183.
  • Nudelman et al. (2004) Nudelman, E., Leyton-Brown, K., Hoos, H. H., Devkar, A., and Shoham, Y. 2004. Understanding random SAT: Beyond the clauses-to-variables ratio. In Proc. of the 10th International Conference on Principles and Practice of Constraint Programming (CP), M. Wallace, Ed. Lecture Notes in Computer Science. Springer, Toronto, Canada, 438–452.
  • Pulina and Tacchella (2007) Pulina, L. and Tacchella, A. 2007. A multi-engine solver for quantified boolean formulas. In Proc. of the 13th International Conference on Principles and Practice of Constraint Programming (CP), C. Bessiere, Ed. Lecture Notes in Computer Science. Springer, Providence, Rhode Island, 574–589.
  • Pulina and Tacchella (2009) Pulina, L. and Tacchella, A. 2009. A self-adaptive multi-engine solver for quantified boolean formulas. Constraints 14, 1, 80–116.
  • Quinlan (1993) Quinlan, J. 1993. C4.5: programs for machine learning. Morgan Kaufmann.
  • Ricca et al. (2009) Ricca, F., Gallucci, L., Schindlauer, R., Dell’Armi, T., Grasso, G., and Leone, N. 2009. OntoDLV: an ASP-based system for enterprise ontologies. Journal of Logic and Computation 19, 4, 643–670.
  • Ricca et al. (2012) Ricca, F., Grasso, G., Alviano, M., Manna, M., Lio, V., Iiritano, S., and Leone, N. 2012. Team-building with Answer Set Programming in the Gioia-Tauro Seaport. Theory and Practice of Logic Programming 12, 3, 361–381.
  • Ricca et al. (2010) Ricca, F., Dimasi, A., Grasso, G., Ielpa, S. M., Iiritano, S., Manna, M., and Leone, N. 2010. A Logic-Based System for e-Tourism. Fundamenta Informaticae 105, 35–55.
  • Rice (1976) Rice, J. R. 1976. The algorithm selection problem. Advances in Computers 15, 65–118.
  • Rullo et al. (2009) Rullo, P., Policicchio, V. L., Cumbo, C., and Iiritano, S. 2009. Olex: Effective rule learning for text categorization. IEEE Transactions on Knowledge and Data Engineering 21, 8, 1118–1132.
  • Samulowitz and Memisevic (2007) Samulowitz, H. and Memisevic, R. 2007. Learning to solve QBF. In Proceedings of the 22nd AAAI Conference on Artificial Intelligence. AAAI Press, Vancouver, Canada, 255–260.
  • Simons et al. (2002) Simons, P., Niemelä, I., and Soininen, T. 2002. Extending and Implementing the Stable Model Semantics. Artificial Intelligence 138, 181–234.
  • Smaragdakis et al. (2011) Smaragdakis, Y., Bravenboer, M., and Lhoták, O. 2011. Pick your contexts well: understanding object-sensitivity. In Proceedings of the 38th Symposium on Principles of Programming Languages, POPL 2011. 17–30.
  • smt-lib-web (2011) smt-lib-web. 2011. The Satisfiability Modulo Theories Library. http://www.smtlib.org/.
  • Vallati et al. (2011) Vallati, M., Fawcett, C., Gerevini, A., Hoos, H., and Saetti, A. 2011. Generating fast domain-specific planners by automatically configuring a generic parameterised planner. In Working notes of 21st International Conference on Automated Planning and Scheduling (ICAPS-11) Workshop on Planning and Learning.
  • Wittocx et al. (2008) Wittocx, J., Mariën, M., and Denecker, M. 2008. The idp system: a model expansion system for an extension of classical logic. In Logic and Search, Computation of Structures from Declarative Descriptions (LaSh 2008). Leuven, Belgium, 153–165.
  • Xu et al. (2008) Xu, L., Hutter, F., Hoos, H. H., and Leyton-Brown, K. 2008. SATzilla: Portfolio-based algorithm selection for SAT. Journal of Artificial Intelligence Research 32, 565–606.