1 Introduction
Answer Set Programming (ASP) [Gelfond and Lifschitz (1988), Eiter et al. (1997), Marek and Truszczyński (1998), Niemelä (1998), Lifschitz (1999), Gelfond and Lifschitz (1991), Baral (2003)] is a truly declarative programming paradigm proposed in the area of nonmonotonic reasoning and logic programming. The idea of ASP is to represent a given computational problem by a logic program whose answer sets correspond to solutions, and then use a solver to find such solutions [Lifschitz (1999)]. The language of ASP is very expressive: all problems in the second level of the polynomial hierarchy can be expressed in ASP [Eiter et al. (1997)]. Moreover, in recent years ASP has been employed in many applications, see, e.g., [Nogueira et al. (2001), Baral (2003), Brooks et al. (2007), Friedrich and Ivanchenko (2008), Gebser et al. (2011), Balduccini and Lierler (2012)], and even in industry [Ricca et al. (2009), Ricca et al. (2010), Rullo et al. (2009), Ricca et al. (2012), Marczak et al. (2010), Smaragdakis et al. (2011)]. The development of efficient ASP systems is thus a crucial task, made even more challenging by existing and new applications.
To improve the robustness (i.e., the ability to perform well across a wide set of problem domains) and the efficiency (i.e., the quality of solving a high number of instances) of solving methods for ASP, it is possible to extend existing state-of-the-art techniques implemented in ASP solvers, or to design from scratch a new ASP system with powerful techniques and heuristics. An alternative to these trends is to build on top of state-of-the-art solvers, leveraging a number of efficient ASP systems, e.g., [Simons et al. (2002), Leone et al. (2006), Giunchiglia et al. (2006), Gebser et al. (2007), Mariën et al. (2008), Janhunen et al. (2009)], and applying machine learning techniques for inductively choosing, among a set of available ones, the “best” solver on the basis of the characteristics, called features, of the input program. This approach falls in the framework of the algorithm selection problem [Rice (1976)]. Related approaches, following this per-instance selection, have been exploited for solving propositional satisfiability (SAT), e.g., [Xu et al. (2008)], and Quantified SAT (QSAT), e.g., [Pulina and Tacchella (2007)], problems. In ASP, an approach for selecting the “best” clasp internal configuration is followed in [Gebser et al. (2011)], while another approach that imposes learned heuristics ordering to smodels is [Balduccini (2011)].
In this paper we pursue this direction, and propose a multi-engine approach to ASP solving. We first define a set of cheap-to-compute syntactic features that describe several characteristics of ASP programs, paying particular attention to ASP peculiarities. We then compute such features for the grounded version of all benchmarks submitted to the “System Track” of the 3rd ASP Competition [Calimeri et al. (2012)] falling in the “NP” and “Beyond NP” categories of the competition. This track is well suited for our study, given that it contains many ASP instances and that its language specification, ASP-Core, is a common ASP fragment that many ASP systems can deal with.
Then, we apply classification methods that, starting from the features of the instances in a training set, and the solvers’ performance on these instances, inductively learn general algorithm selection strategies to be applied to a test set. We consider six well-known multinomial classification methods, some of them considered in [Pulina and Tacchella (2007)]. We perform a number of analyses considering different training and test sets. Our experiments show that it is possible to obtain a very robust performance, by solving many more instances than all the solvers that entered the 3rd ASP Competition and DLV [Leone et al. (2006)].
The paper is structured as follows. Section 2 contains preliminaries about ASP and classification methods. Section 3 then describes our benchmark setting, in terms of dataset and solvers employed. Section 4 defines how features and solvers have been selected, and presents the classification methods employed. Section 5 is dedicated to the performance analysis, while Sections 6 and 7 end the paper with a discussion of related work and conclusions, respectively.
2 Preliminaries
In this section we recall some preliminary notions concerning Answer Set Programming and machine learning techniques for algorithm selection.
2.1 Answer Set Programming
In the following, we recall both the syntax and semantics of ASP. The presented constructs are included in ASP-Core [Calimeri et al. (2012)], which is the language specification that was originally introduced in the 3rd ASP Competition [Calimeri et al. (2012)] as well as the one employed in our experiments (see Section 3). Hereafter, we assume the reader is familiar with logic programming conventions, and refer the reader to [Gelfond and Lifschitz (1991), Baral (2003), Gelfond and Leone (2002)] for complementary introductory material on ASP, and to [Calimeri et al. (2011)] for the full specification of ASP-Core.
Syntax.
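A variable or a constant is a term. An atom is $a(t_1, \ldots, t_n)$, where $a$ is a predicate of arity $n$ and $t_1, \ldots, t_n$ are terms. A literal is either a positive literal $p$ or a negative literal $\mathtt{not}\ p$, where $p$ is an atom. A (disjunctive) rule $r$ is of the form:
$a_1 \vee \cdots \vee a_n \leftarrow b_1, \ldots, b_k, \mathtt{not}\ b_{k+1}, \ldots, \mathtt{not}\ b_m.$
where $a_1, \ldots, a_n, b_1, \ldots, b_m$ are atoms. The disjunction $a_1 \vee \cdots \vee a_n$ is the head of $r$, while the conjunction $b_1, \ldots, b_k, \mathtt{not}\ b_{k+1}, \ldots, \mathtt{not}\ b_m$ is the body of $r$. We denote by $H(r)$ the set of atoms occurring in the head of $r$, and we denote by $B(r)$ the set of body literals. A rule s.t. $|H(r)| = 1$ (i.e., $n = 1$) is called a normal rule; if the body is empty (i.e., $k = m = 0$) it is called a fact (and the $\leftarrow$ sign is omitted); if $|H(r)| = 0$ (i.e., $n = 0$) it is called a constraint. A rule $r$ is safe if each variable appearing in $r$ appears also in some positive body literal of $r$.
An ASP program $P$ is a finite set of safe rules. A $\mathtt{not}$-free (resp., $\vee$-free) program is called positive (resp., normal). A term, an atom, a literal, a rule, or a program is ground if no variable appears in it.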
Semantics.
Given a program $P$, the Herbrand Universe $U_P$ is the set of all constants appearing in $P$, and the Herbrand Base $B_P$ is the set of all possible ground atoms which can be constructed from the predicates appearing in $P$ with the constants of $U_P$. Given a rule $r$, $Ground(r)$ denotes the set of rules obtained by applying all possible substitutions from the variables in $r$ to elements of $U_P$. Similarly, given a program $P$, the ground instantiation of $P$ is $Ground(P) = \bigcup_{r \in P} Ground(r)$.
An interpretation for a program $P$ is a subset $I$ of $B_P$. A ground positive literal $A$ is true (resp., false) w.r.t. $I$ if $A \in I$ (resp., $A \notin I$). A ground negative literal $\mathtt{not}\ A$ is true w.r.t. $I$ if $A$ is false w.r.t. $I$; otherwise $\mathtt{not}\ A$ is false w.r.t. $I$.
The answer sets of a program are defined in two steps using its ground instantiation: first the answer sets of positive disjunctive programs are defined; then the answer sets of general programs are defined by a reduction to positive ones and a stability condition.
Let $r$ be a ground rule; the head of $r$ is true w.r.t. $I$ if $H(r) \cap I \neq \emptyset$. The body of $r$ is true w.r.t. $I$ if all body literals of $r$ are true w.r.t. $I$, otherwise the body of $r$ is false w.r.t. $I$. The rule $r$ is satisfied (or true) w.r.t. $I$ if its head is true w.r.t. $I$ or its body is false w.r.t. $I$.
Given a ground positive program $P_g$, an answer set for $P_g$ is a subset-minimal interpretation $A$ for $P_g$ such that every rule $r \in P_g$ is true w.r.t. $A$ (i.e., there is no other interpretation $I \subset A$ that satisfies all the rules of $P_g$).
Given a ground program $P_g$ and an interpretation $I$, the (Gelfond-Lifschitz) reduct [Gelfond and Lifschitz (1991)] of $P_g$ w.r.t. $I$ is the positive program $P_g^I$, obtained from $P_g$ by deleting all rules whose negative body is false w.r.t. $I$, and deleting the negative body from the remaining rules of $P_g$.
An answer set (or stable model) of a general program $P$ is an interpretation $I$ of $P$ such that $I$ is an answer set of $Ground(P)^I$.
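As an example consider the program $P = \{\, a \vee b \leftarrow c.,\ b \leftarrow \mathtt{not}\ a, \mathtt{not}\ c.,\ a \vee c \leftarrow \mathtt{not}\ b.,\ k \leftarrow a.,\ k \leftarrow b. \,\}$ and $I = \{b, k\}$. The reduct $P^I$ is $\{\, a \vee b \leftarrow c.,\ b.,\ k \leftarrow a.,\ k \leftarrow b. \,\}$. $I$ is an answer set of $P^I$, and for this reason it is also an answer set of $P$.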
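The reduct construction and the stability check can be illustrated with a short Python sketch. It is a brute-force illustration only (it enumerates every interpretation of a small ground program) and not a description of how ASP solvers work. The rule encoding, the helper names, and the five-rule disjunctive program `P` are assumptions made for this example.

```python
from itertools import chain, combinations

def rule(head=(), pos=(), neg=()):
    """Hypothetical encoding of a ground rule: (head atoms, positive body, negative body)."""
    return (frozenset(head), frozenset(pos), frozenset(neg))

def reduct(program, interp):
    """Gelfond-Lifschitz reduct: drop rules whose negative body is false
    w.r.t. interp, then strip the negative body from the remaining rules."""
    return [(h, p, frozenset()) for (h, p, n) in program if not (n & interp)]

def satisfies(interp, positive_program):
    """A rule is satisfied if its head is true or its body is false."""
    return all((h & interp) or not (p <= interp) for (h, p, _) in positive_program)

def subsets(atoms):
    atoms = sorted(atoms)
    sized = (combinations(atoms, k) for k in range(len(atoms) + 1))
    return (frozenset(c) for c in chain.from_iterable(sized))

def is_answer_set(program, interp):
    """interp must be a subset-minimal model of the reduct w.r.t. interp."""
    red = reduct(program, interp)
    if not satisfies(interp, red):
        return False
    return not any(s != interp and satisfies(s, red) for s in subsets(interp))

def answer_sets(program):
    atoms = set().union(*(h | p | n for (h, p, n) in program))
    return [i for i in subsets(atoms) if is_answer_set(program, i)]

# Illustrative program: a v b <- c.  b <- not a, not c.  a v c <- not b.  k <- a.  k <- b.
P = [
    rule(head="ab", pos="c"),
    rule(head="b", neg="ac"),
    rule(head="ac", neg="b"),
    rule(head="k", pos="a"),
    rule(head="k", pos="b"),
]

for model in answer_sets(P):
    print(sorted(model))
```

The enumeration is exponential in the number of atoms, so the sketch is only meant to make the two-step definition concrete on toy programs.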
2.2 Multinomial Classification for Algorithm Selection
With regard to empirically hard problems, there is rarely a best algorithm to solve a given combinatorial problem, while it is often the case that different algorithms perform well on different problem instances. In this work we rely on a per-instance selection algorithm in which, given a set of features (i.e., numeric values that represent particular characteristics of a given instance), it is possible to choose the best (or a good) algorithm among a pool of them (in our case, ASP solvers). In order to make such a selection in an automatic way, we model the problem using multinomial classification algorithms, i.e., machine learning techniques that allow automatic classification of a set of instances, given some instance features.
In more detail, in multinomial classification we are given a set of patterns, i.e., input vectors $X = \{x_1, \ldots, x_k\}$ with $x_i \in \mathbb{R}^n$, and a corresponding set of labels, i.e., output values $Y \in \{1, \ldots, m\}$, where $Y$ is composed of values representing the $m$ classes of the multinomial classification problem. In our modeling, the $m$ classes are $m$ ASP solvers. We think of the labels as generated by some unknown function $f : \mathbb{R}^n \to \{1, \ldots, m\}$ applied to the patterns, i.e., $f(x_i) = y_i$ for $i \in \{1, \ldots, k\}$ and $y_i \in \{1, \ldots, m\}$. Given a set of patterns $X$ and a corresponding set of labels $Y$, the task of a multinomial classifier $c$ is to extrapolate $f$ given $X$ and $Y$, i.e., construct $c$ from $X$ and $Y$ so that when we are given some $x^*$ we should ensure that $c(x^*)$ is equal to $f(x^*)$. This task is called training, and the pair $(X, Y)$ is called the training set.
Table 1. Benchmark problems, with their complexity class and number of available instances.
Problem                      Class      #Instances
DisjunctiveScheduling        NP         10
GraphColouring               NP         60
HanoiTower                   NP         59
KnightTour                   NP         10
MazeGeneration               NP         50
Labyrinth                    NP         261
MultiContextSystemQuerying   NP         73
Numberlink                   NP         150
PackingProblem               NP         50
SokobanDecision              NP         50
Solitaire                    NP         25
WeightAssignmentTree         NP         62
MinimalDiagnosis             Beyond NP  551
StrategicCompanies           Beyond NP  51
Total                                   1462
3 Benchmark Data and Settings
In this section we report the benchmark settings employed in this work, which are needed for properly introducing the techniques described in the remainder of the paper. In particular, we report some data concerning benchmark problems, instances and ASP solvers employed, as well as the hardware platform and the execution settings, for reproducibility of experiments.
3.1 Dataset
The benchmarks considered for the experiments belong to the suite of the 3rd ASP Competition [Calimeri et al. (2011)]. This is a large and heterogeneous suite of hard benchmarks encoded in ASP-Core, which was already employed for evaluating the performance of state-of-the-art ASP solvers. That suite includes planning domains, temporal and spatial scheduling problems, combinatorial puzzles, graph problems, and a number of application domains, i.e., databases, information extraction and molecular biology. (An exhaustive description of the benchmark problems can be found in [Calimeri et al. (2011)].) In more detail, we have employed the encodings used in the System Track of the competition, and all the problem instances made available (in the form of facts) by the contributors of the problem submission stage of the competition, which are available from the competition website [Calimeri et al. (2011)]. Note that this is a superset of the instances actually selected for running (and thus evaluated in) the competition itself. Hereafter, with instance we refer to the complete input program (i.e., encoding+facts) to be fed to a solver for each instance of the problem to be solved.
The techniques presented in this paper are conceived for dealing with propositional programs, thus we have grounded all the mentioned instances by using GrinGo (v.3.0.3) [Gebser et al. (2007)] to obtain a setup very close to the one of the competition. We considered only computationally hard benchmarks, corresponding to all problems belonging to the categories NP and Beyond NP of the competition. The dataset is summarized in Table 1, which also reports the complexity classification and the number of available instances for each problem.
3.2 Executables and Hardware Settings
We have run all the ASP solvers that entered the System Track of the 3rd ASP Competition [Calimeri et al. (2011)] with the addition of DLV [Leone et al. (2006)] (which did not participate in the competition since it is developed by the organizers of the event). In this way we have covered, to the best of our knowledge, all the state-of-the-art solutions fitting the benchmark settings. In detail, we have run: clasp [Gebser et al. (2007)], claspD [Drescher et al. (2008)], claspfolio [Gebser et al. (2011)], idp [Wittocx et al. (2008)], cmodels [Lierler (2005)], sup [Lierler (2008)], Smodels [Simons et al. (2002)], and several solvers from both the lp2sat [Janhunen (2006)] and lp2diff [Janhunen et al. (2009)] families, namely: lp2gminisat, lp2lminisat, lp2lgminisat, lp2minisat, lp2diffgz3, lp2difflgz3, lp2difflz3, and lp2diffz3. In more detail, clasp is a native ASP solver relying on conflict-driven nogood learning; claspD is an extension of clasp that is able to deal with disjunctive logic programs, while claspfolio exploits machine learning techniques in order to choose the best-suited execution options of clasp; idp is a finite model generator for extended first-order logic theories, which is based on MiniSatID [Mariën et al. (2008)]; Smodels is one of the first robust native ASP solvers that have been made available to the community; DLV [Leone et al. (2006)] is one of the first systems able to cope with disjunctive programs; cmodels exploits a SAT solver as a search engine for enumerating models, and also verifies model minimality with SAT, whenever needed; sup exploits non-clausal constraints, and can be seen as a combination of the computational ideas behind cmodels and Smodels; the lp2sat family employs several variants (indicated by the trailing g, l and lg) of a translation strategy to SAT and resorts to MiniSat [Eén and Sörensson (2003)] for actually computing the answer sets; the lp2diff family translates programs into difference logic over integers [smtlibweb (2011)] and exploits Z3 [de Moura and Bjørner (2008)] as underlying solver (again, g, l and lg indicate different translation strategies). DLV was run with default settings, while the remaining solvers were run with the same configuration (i.e., parameter settings) as in the competition.
Concerning the hardware employed and the execution settings, all the experiments were carried out on CyberSAR [Masoni et al. (2009)], a cluster comprised of 50 Intel Xeon E5420 blades equipped with 64 bit GNU Scientific Linux 5.5. Unless otherwise specified, the resources granted to the solvers are 600s of CPU time and 2GB of memory. Time measurements were carried out using the time command shipped with GNU Scientific Linux 5.5.
4 Designing a Multi-Engine ASP Solver
The design of a multi-engine solver involves several steps: design of (syntactic) features that are both significant for classifying the instances and cheap-to-compute (so that the classifier can be fast and accurate); selection of solvers that are representative of the state of the art (to be able to possibly obtain the best performance in any considered instance); and selection of the classification algorithm, and fair design of training and test sets, to obtain a robust and unbiased classifier.
In the following, we describe the choices we have made for designing measp, which is our multi-engine solver for ground ASP programs.
4.1 Features
Our features selection process started by considering a very wide set of candidate features that correspond, in our view, to several characteristics of an ASP program that, in principle, should be taken into account.
The features that we compute for each ground program are divided into four groups (such a categorization is borrowed from [Nudelman et al. (2004)]):

Problem size features: number of rules $r$, number of atoms $a$, ratios $r/a$, $(r/a)^2$, $(r/a)^3$ and reciprocal ratios $a/r$, $(a/r)^2$ and $(a/r)^3$. These features are meant to give an idea of the size of the ground program.

Balance features: ratio of positive and negative atoms in each body, and ratio of positive and negative occurrences of each variable; fraction of unary, binary and ternary rules. These features can help to understand the “structure” of the analyzed program.

“Proximity to Horn” features: fraction of Horn rules and number of atom occurrences in Horn rules. These features can give an indication of “how close” a program is to being Horn: this can be helpful, since some solvers may take advantage of this setting (e.g., minimum or no impact of completion [Clark (1978)] when applied).

ASP peculiar features: number of true and disjunctive facts, fraction of normal rules and constraints, head sizes, occurrences of each atom in heads, bodies and rules, occurrences of true negated atoms in heads, bodies and rules; Strongly Connected Components (SCC) sizes, number of Head-Cycle Free (HCF) and non-HCF components, degree of support for non-HCF components.
For the features implying distributions, e.g., ratio of positive and negative atoms in each body, atom occurrences in Horn rules, and head sizes, five numbers are considered: minimum, 25th percentile, median, 75th percentile and maximum. The five numbers are used because we cannot assume a priori that the distributions are Gaussian, so mean and variance are not that informative.
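As a minimal sketch of this five-number summary, the following Python function uses only the standard library. The function name and the sample data are hypothetical, and the choice of the `inclusive` quantile method is an assumption, since the text does not say how percentiles are interpolated.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, 25th percentile, median, 75th percentile, maximum)."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("at least two values are needed")
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], q1, q2, q3, data[-1])

# Hypothetical distribution, e.g. the head sizes of the rules of a ground program.
head_sizes = [1, 1, 1, 2, 3]
print(five_number_summary(head_sizes))
```

Unlike mean and variance, these five values make no assumption about the shape of the distribution, which matches the motivation given above.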
The set of features reported above seems to be adequate for describing an ASP program. (Observations concerning existing proposals are reported in Section 6.) On the other hand, we have to consider that the time spent computing the features will be an integral part of our solving process: the risk is to spend too much time in calculating the features of a program. This component of the solving process could result in a significant overhead in the solving time in case of instances that are easily solved by (some of) the engines, or can even cause a time out on programs otherwise solved by (some of) the engines within the time limit.
Given these considerations, our final choice is to consider syntactic features that are cheap-to-compute, i.e., computable in linear time in the size of the input, also given that in previous work (e.g., [Pulina and Tacchella (2007)]) syntactic features have been profitably used for characterizing (inherently) ground instances. To this end, we implemented a tool able to compute the above-reported set of features and conducted some preliminary experiments on all the benchmarks we were able to ground with GrinGo in less than 600s: 1425 instances out of a total of 1462, of which 823 out of 860 NP instances. (The exceptions are 10 and 27 instances of DisjunctiveScheduling and PackingProblem, respectively.) On the one hand, the results confirmed the need for avoiding the computation of “expensive” features (e.g., SCCs): indeed, in this setting we could compute the whole set of features only for 665 NP instances within 600s; on the other hand, the results helped us in selecting a set of “cheap” features that are sufficient for obtaining a robust and efficient multi-engine system. In particular, the features that we selected are a subset of the ones reported above:

Problem size features: number of rules $r$, number of atoms $a$, ratios $r/a$, $(r/a)^2$, $(r/a)^3$ and reciprocal ratios $a/r$, $(a/r)^2$ and $(a/r)^3$;

Balance features: fraction of unary, binary and ternary rules;

“Proximity to Horn” features: fraction of Horn rules;

ASP peculiar features: number of true and disjunctive facts, fraction of normal rules and constraints.
This final choice of features, together with some of their combinations (e.g., ), amounts to a total of 52 features. Our tool for extracting features from ground programs can then compute all these features (in less than 600s) for 1371 programs out of 1462. The distribution of the CPU times for extracting features is characterized by the following five numbers: 0.24s, 1.74s, 2.40s, 4.37s, 541.92s. Note that high CPU times correspond to extracting features for ground programs whose size is in the order of gigabytes. Our set of chosen features is relevant, as will be shown in Section 5.
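To make the notion of cheap, linear-time syntactic features concrete, the sketch below computes a handful of them in a single pass over a ground program. It is not the authors' tool: the `Rule` encoding and the feature names are hypothetical, and several definitions are assumptions stated in the comments (rule size counted as head atoms plus body literals, a Horn rule taken to have at most one head atom and no negative literal, powers of the ratios taken as squares and cubes).

```python
from collections import namedtuple

# Hypothetical encoding of a ground rule: tuples of atom names.
Rule = namedtuple("Rule", ["head", "pos", "neg"])

def extract_features(rules):
    """Compute a few syntactic features in one pass over the rules."""
    atoms = set()
    count = {"unary": 0, "binary": 0, "ternary": 0, "horn": 0,
             "facts": 0, "disjunctive_facts": 0, "normal": 0, "constraints": 0}
    size_name = {1: "unary", 2: "binary", 3: "ternary"}
    for rule in rules:
        atoms.update(rule.head, rule.pos, rule.neg)
        # Assumption: the size of a rule is its total number of literals.
        size = len(rule.head) + len(rule.pos) + len(rule.neg)
        if size in size_name:
            count[size_name[size]] += 1
        # Assumption: Horn means at most one head atom and no negation.
        if len(rule.head) <= 1 and not rule.neg:
            count["horn"] += 1
        if rule.head and not rule.pos and not rule.neg:
            kind = "facts" if len(rule.head) == 1 else "disjunctive_facts"
            count[kind] += 1
        if len(rule.head) == 1:
            count["normal"] += 1
        elif not rule.head:
            count["constraints"] += 1
    if not rules or not atoms:
        raise ValueError("the program has no rules or no atoms")
    r, a = len(rules), len(atoms)
    features = {"rules": r, "atoms": a}
    for power in (1, 2, 3):
        features[f"r/a^{power}"] = (r / a) ** power
        features[f"a/r^{power}"] = (a / r) ** power
    features["facts"] = count["facts"]
    features["disjunctive_facts"] = count["disjunctive_facts"]
    for name in ("unary", "binary", "ternary", "horn", "normal", "constraints"):
        features["frac_" + name] = count[name] / r
    return features

# Illustrative program: a v b <- c.  b <- not a, not c.  a v c <- not b.  k <- a.  k <- b.
program = [
    Rule(("a", "b"), ("c",), ()),
    Rule(("b",), (), ("a", "c")),
    Rule(("a", "c"), (), ("b",)),
    Rule(("k",), ("a",), ()),
    Rule(("k",), ("b",), ()),
]
for name, value in extract_features(program).items():
    print(name, value)
```

Each rule is visited once and only counters are updated, so the cost grows linearly with the size of the ground program, in contrast with features such as SCC sizes that need a graph traversal.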
4.2 Solvers Selection
Table 2. Results of the preliminary experiments on the NP class: number of instances solved and uniquely solved by each solver.
Solver             Solved  Unique
clasp              445     26
cmodels            333     6
dlv                241     37
idp                419     15
lp2diffgz3         254     –
lp2difflgz3        242     –
lp2difflz3         248     –
lp2diffz3          307     –
lp2sat2gminisat    328     –
lp2sat2lgminisat   322     –
lp2sat2lminisat    324     –
lp2sat2minisat     336     –
smodels            134     –
sup                311     1
The target of our selection is to collect a pool of solvers that is representative of the state-of-the-art solver (sota), i.e., considering a problem instance, the oracle that always fares the best among available solvers. Note that, in our settings, the various engines available employ (often substantially) different evaluation strategies, and (it is likely that) different engines behave better in different domains, or in other words, the engines’ performance is “orthogonal”. As a consequence one can find that there are solvers that solve a significant number of instances uniquely (i.e., instances solved by only one solver), that have a characteristic performance and are a fundamental component of the sota. Thus, a pragmatic and reasonable choice, given that we want to solve as many instances as possible, is to consider a solver only if it solves a reasonable number of instances uniquely, since this solver cannot be, in a sense, subsumed performance-wise by another behaving similarly.
In order to select the engines we ran preliminary experiments, and we report the results (regarding the NP class) in Table 2. Looking at the table, first we notice that we do not report results related to either claspD or claspfolio. Concerning the results of claspD, we report that, considering the NP class, its performance, in terms of solved instances, is subsumed by the performance of clasp. Considering the performance of claspfolio, we exclude this system from the analysis because we consider it as a yardstick system, i.e., we will compare its performance against the performance of measp.
Looking at Table 2, we can see that only 4 solvers out of 16 are able to solve a noticeable number of instances uniquely, namely clasp, cmodels, DLV, and idp. (The picture of uniquely solved instances does not change even considering the entire family of lp2sat (resp. lp2diff) as a single engine that has the best performance among its variants.) Concerning Beyond NP instances, we report that only three solvers are able to cope with such class of problems, namely claspD, cmodels, and DLV. Considering that both cmodels and DLV are involved in the previous selection, and that claspD has a performance that does not overlap with the other two in Beyond NP instances, the pool of engines used in measp will be composed of 5 solvers, namely clasp, claspD, cmodels, DLV, and idp.
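The selection criterion based on uniquely solved instances can be sketched as follows. The function names, the toy data and the threshold `min_unique` are hypothetical: the text only speaks of a "noticeable" number of uniquely solved instances and does not fix a value.

```python
from collections import Counter

def uniquely_solved(solved_by):
    """Map each solver to the set of instances that it alone solves."""
    times_solved = Counter(i for solved in solved_by.values() for i in set(solved))
    return {solver: {i for i in solved if times_solved[i] == 1}
            for solver, solved in solved_by.items()}

def select_engines(solved_by, min_unique):
    """Keep the solvers with at least min_unique uniquely solved instances."""
    unique = uniquely_solved(solved_by)
    return sorted(s for s, inst in unique.items() if len(inst) >= min_unique)

# Toy data (not taken from the experiments): solver -> solved instance ids.
solved_by = {
    "solver_a": {1, 2, 3, 4, 5},
    "solver_b": {4, 5, 6, 7},
    "solver_c": {1, 2, 4},
}
print(uniquely_solved(solved_by))
print(select_engines(solved_by, min_unique=1))
```

In the toy data `solver_c` solves nothing that another solver does not also solve, so it would not be kept, mirroring the argument that such a solver is subsumed performance-wise.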
The experiments reported in Section 5 confirmed that this engine selection policy is effective in practice considering the ASP state of the art. Nonetheless, it is easy to see that in scenarios where the performance of most of the available solvers is very similar on a common pool of instances, i.e., their performance is not “orthogonal”, choosing a solver for the only reason that it solves a reasonable number of instances uniquely may not be an effective policy. Indeed, the straightforward application of that policy to “overlapping” engines could result in discarding the best ones, since it is likely that several of them can solve the same instances. A possible effective extension of the selection policy presented above to deal with overlapping engines is to remove dominated solvers, where a solver $s$ dominates a solver $s'$ if the set of instances solved by $s$ is a superset of the instances solved by $s'$. Ties are broken by choosing the solver that spends the smaller amount of CPU time. If the resulting pool of engines is still not reasonably distinguishable, i.e., there are not enough uniquely solved instances by each engine of the pool, then one may compute such a pool, say $E$, as follows: start from the empty set ($E = \emptyset$) and iteratively try to add engine candidates to $E$, from the one that solves more instances, and faster, to the least efficient. At each iteration, an engine $e$ is added to $E$ if both the set of instances uniquely solved by the engines in $E \cup \{e\}$ is larger than in $E$, and the resulting set is reasonably distinguishable.
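The first step of the extended policy, the removal of dominated solvers, can be sketched as below. This covers only that step (not the subsequent iterative construction of the pool), and the function name, the data layout and the use of the solver name as a last-resort tie-breaker are assumptions made for the example.

```python
def remove_dominated(solved_by, cpu_time):
    """Return the solvers that are not dominated by any other solver.

    A solver is dominated if another solver solves a strict superset of its
    instances, or the same instances with a smaller total CPU time (equal
    times are broken by name, so that exactly one of the tied solvers stays).
    """
    kept = []
    for s, solved in solved_by.items():
        dominated = False
        for t, other in solved_by.items():
            if t == s:
                continue
            if solved < other:
                dominated = True
            elif solved == other and (cpu_time[t], t) < (cpu_time[s], s):
                dominated = True
        if not dominated:
            kept.append(s)
    return kept

# Toy data (not taken from the experiments).
solved_by = {
    "solver_a": {1, 2, 3},
    "solver_b": {1, 2},
    "solver_c": {1, 2, 3},
    "solver_d": {4},
}
cpu_time = {"solver_a": 10.0, "solver_b": 1.0, "solver_c": 5.0, "solver_d": 2.0}
print(remove_dominated(solved_by, cpu_time))
```

Here `solver_b` is dropped because `solver_a` solves a superset of its instances, and `solver_a` is dropped in favour of `solver_c`, which solves the same instances in less CPU time.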
We have applied the above extended policy, that is to be considered as a pragmatic strategy more than a general solution, obtaining good results in a specific experiment with overlapping engines; more details will be found in Section 5.4.
4.3 Classification Algorithms and Training
In the following, we briefly review the classifiers that we use in our empirical analysis. Considering the wide range of multinomial classifiers described in the scientific literature, we test a subset of algorithms, some of them considered in [Pulina and Tacchella (2007)]. In particular, we limit our choice to the classifiers able to deal with numerical attributes (the features) and multinomial class labels (the engines). Furthermore, in order to make our approach as general as possible, our desideratum is to choose classifiers that allow us to avoid “stringent” assumptions on the feature distributions, e.g., hypotheses of normality or independence among the features. Finally, we also prefer classifiers that do not require complex parameter tuning, e.g., procedures that are more elaborate than a standard parameter grid search. The selected classifiers are listed in the following:

Aggregation Pheromone density based pattern Classification (apc): it is a pattern classification algorithm modeled on the ant colony behavior and distributed adaptive organization in nature. Each data pattern is considered as an ant, and the training patterns (ants) form several groups or colonies depending on the number of classes present in the data set. A new test pattern (ant) will move along the direction where the average aggregation pheromone density (at the location of the new ant) formed due to each colony of ants is higher and, hence, eventually it will join that colony. We refer the reader to [Halder et al. (2009)] for further details.

Decision rules (furia): a classifier providing a set of rules that generally takes the form of a Horn clause wherein the class label is implied by a conjunction of some attributes; we use furia [Hühn and Hüllermeier (2009)] to induce decision rules.

Decision trees (j48): a classifier arranged in a tree structure, and used to discover decision rules. Each inner node contains a test on some attributes, and each leaf node contains a label; we use j48, an optimized implementation of c4.5 [Quinlan (1993)].

Multinomial Logistic Regression (mlr): a classifier providing a hyperplane of the hypersurfaces that separate the class labels in the feature space; we use the inducer described in [Le Cessie and Van Houwelingen (1992)].

Nearest-neighbor (nn): it is a classifier yielding the label of the training instance which is closest to the given test instance, whereby closeness is evaluated using, e.g., Euclidean distance [Aha et al. (1991)].

Support Vector Machine (svm): it is a supervised learning algorithm used for both classification and regression tasks. Roughly speaking, the basic training principle of svms is finding an optimal linear hyperplane such that the expected classification error for (unseen) test patterns is minimized. We refer the reader to [Cortes and Vapnik (1995)] for further details.
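Among the classifiers listed above, nearest-neighbor is the simplest to illustrate. The sketch below shows how such a classifier maps a feature vector to an engine; it is a plain 1-nearest-neighbor with Euclidean distance written for illustration, not the implementation used in the experiments, and the feature vectors and engine labels are made up.

```python
import math

def train_nn(patterns, labels):
    """Training a nearest-neighbor classifier just stores the training set."""
    if not patterns or len(patterns) != len(labels):
        raise ValueError("patterns and labels must be non-empty and aligned")
    return list(zip(patterns, labels))

def predict_nn(model, x):
    """Return the label of the stored pattern closest to x (Euclidean distance)."""
    _, label = min(model, key=lambda pair: math.dist(pair[0], x))
    return label

# Made-up 2-dimensional feature vectors, each labeled with the engine
# assumed to perform best on the corresponding instance.
patterns = [(0.1, 0.9), (0.2, 0.8), (0.9, 0.1), (0.8, 0.3)]
labels = ["clasp", "clasp", "dlv", "dlv"]

model = train_nn(patterns, labels)
print(predict_nn(model, (0.15, 0.85)))
print(predict_nn(model, (0.7, 0.2)))
```

In practice features with very different ranges (for instance, the number of rules versus a fraction) would normally be rescaled before computing distances; the sketch omits this step.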
The rationale of our choice is twofold. On the one hand, the selected classifiers are “orthogonal”, i.e., they build on different inductive biases in the computation of their classification hypotheses, since their classification algorithms are based on very different approaches. On the other hand, building measp on top of different classifiers allows us to draw conclusions about both the robustness of our approach and the proper design of our testing set. Indeed, as shown in Section 5, performance is positive for each classification method.
As mentioned in Section 2.2, in order to train the classifiers, we have to select a pool of instances for training purpose, called the training set. Concerning such selection, our aim is twofold. On the one hand, we want to compose a training set in order to get a robust model; while, on the other hand, we want to test the generalization performance of measp also on instances belonging to benchmarks not “covered” by the training set.
As a result of the considerations above, we designed three training sets. The first one (ts$_1$ in the following) is composed of the 320 instances uniquely solved by the pool of engines selected in Section 4.2, i.e., such that only one engine, among the ones selected, solves each instance (without taking into account the instances involved in the competition). The rationale of this choice is to try to “mask” noisy information during model training to obtain a robust model. The remaining training sets are subsets of ts$_1$, and they are composed of the uniquely solved instances belonging to the problems listed in the following:

ts$_2$: 297 instances uniquely solved considering: GraphColouring, Numberlink, Labyrinth, MinimalDiagnosis.

ts$_3$: 59 instances uniquely solved considering: SokobanDecision, HanoiTower, Labyrinth, StrategicCompanies.
Note that both ts$_2$ and ts$_3$ contain one distinct Beyond NP problem to ensure a minimum coverage of this class of problems. The rationale of these additional training sets is thus to test our method on “unseen” problems, i.e., on instances coming from domains that were not used for training: a “good” machine learning method should generalize (to some degree) and obtain good results also in such a setting. In this view, both training sets are composed of instances coming from a limited number, i.e., 4 out of 14, of problems. Moreover, ts$_3$ is also composed of a very limited number of instances. Such a setting will further challenge measp, to understand at what point performance starts to degrade: we will see that, while it is true that ts$_3$ is a challenging situation in which performance decreases, even in this setting measp has reasonable performance and performs better than its engines and rival systems.
In order to give an idea of the coverage of our training sets and outline the differences among them, we depict in Figure 1 the coverage of: the whole available dataset (Fig. 1(a)), ts (Fig. 1(b)), and its subsets ts (Fig. 1(c)) and ts (Fig. 1(d)). In particular, the plots report a two-dimensional projection obtained by means of a principal components analysis (PCA), considering only the first two principal components (PCs). The x-axis and the y-axis in the plots are the first and the second PC, respectively. Each point in the plots is labeled by the best solver on the related instance. In Figure 1(a) we add a label denoting the benchmark name of the depicted instances, in order to give a hint about the “location” of each benchmark. From the picture it is clear that ts covers less space than ts, which in turn covers a subset of the whole set of instances. Clearly, ts, which is the smallest set of instances, has a very limited coverage (see Fig. 1(d)).

Considering the classification algorithms listed above (for all algorithms but apc, we use the tool rapidminer [Mierswa et al. (2006)]), we trained the classifiers and assessed their accuracy. Referring to the setting introduced in Section 2.2, even assuming that a training set is sufficient to learn the target function, it is still the case that different sets may yield a different classifier. The problem is that the resulting trained classifier may underfit the unknown pattern, i.e., its prediction is wrong, or overfit, i.e., be very accurate only when the input pattern is in the training set. Both underfitting and overfitting lead to poor generalization performance, i.e., the classifier fails to predict the correct label when the input pattern is not in the training set. However, statistical techniques can provide reasonable estimates of the generalization error. In order to test the generalization performance, we use a technique known as stratified 10-times 10-fold cross validation to estimate the generalization in terms of accuracy, i.e., the total amount of correct predictions with respect to the total amount of patterns. Given a training set $(X, Y)$, we partition $X$ in subsets $X_i$ with $i \in \{1, \ldots, 10\}$ such that $X = \bigcup_{i=1}^{10} X_i$ and $X_i \cap X_j = \emptyset$ whenever $i \neq j$; we then train a classifier on the patterns $X \setminus X_i$ and the corresponding labels. We repeat the process 10 times, to yield 10 different classifiers, and we obtain the global accuracy estimate.

We report an accuracy greater than 92% for each classification algorithm trained on ts, while for the remaining training sets, just for the sake of completeness, we report an average accuracy of 85%. The main reason for this result is that the training sets different from ts are composed of a smaller number of instances with respect to ts, thus the classification algorithms are not able to generalize with the same accuracy. This result is not surprising, also considering the plots in Figure 1 and, as we will see in the experimental section, this will influence the performance of measp.
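The cross-validation procedure described above can be illustrated with a minimal sketch. It is not the rapidminer setup used for the experiments: the function names, the round-robin way of stratifying, and the toy 1-nearest-neighbour learner are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, rng):
    """Split instance indices into k folds with similar class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the instances of each class round-robin over the folds.
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % k].append(idx)
        offset += len(indices)
    return folds


def repeated_cv_accuracy(patterns, labels, train, k=10, repeats=10, seed=0):
    """Accuracy (correct predictions / total predictions) over repeated k-fold CV.

    `train(X, y)` is a hypothetical learner returning a predict(pattern) function.
    """
    rng = random.Random(seed)
    correct = total = 0
    for _ in range(repeats):
        for fold in stratified_folds(labels, k, rng):
            held_out = set(fold)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            predict = train([patterns[i] for i in train_idx],
                            [labels[i] for i in train_idx])
            correct += sum(predict(patterns[i]) == labels[i] for i in fold)
            total += len(fold)
    return correct / total


def train_1nn(X, y):
    """Toy 1-nearest-neighbour learner, a stand-in for the real classifiers."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in X]
        return y[dists.index(min(dists))]
    return predict
```

Calling `repeated_cv_accuracy(features, best_engines, train_1nn)` on a list of feature vectors and the corresponding best-engine labels would return a single accuracy estimate in the interval [0, 1].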
5 Performance Analysis
In this section we present the results of the analysis we have performed. We consider different combinations of training and test sets, where the training sets are the ones introduced in Section 4, and the test set ranges over the 3rd ASP Competition ground instances. In particular, the first (resp. second) experiment has ts as training set, and the successfully grounded instances evaluated at (resp. submitted to) the 3rd ASP Competition as test set: the goal of this analysis is to test the efficiency of our approach on all the evaluated (resp. submitted) instances when the model is trained on the whole space of the uniquely solved instances. The third experiment considers ts and ts as training sets, and all the successfully grounded instances submitted to the competition as test set: in this case, given that the models are trained not on the whole space of the uniquely solved instances but only on a portion of it, and that the test set contains “unseen” problems (i.e., belonging to domains that were left unknown during training), the goal is to test, in particular, the robustness of our approach. We devote one subsection to each of these experiments, where we compare measp to its component engines. In detail, for each experiment the results are reported in a table structured as follows: the first column reports the name of the solver and (when needed) its inductive model in a subcolumn, where the considered inductive models are denoted by mod, mod and mod, corresponding to the training sets ts, ts and ts introduced before, respectively; the second and third columns report the results of each solver on the NP and Beyond NP classes, respectively, in terms of the number of instances solved within the time limit and the sum of their solving times (a subcolumn is devoted to each of these numbers, which are “–” if the related solver was not among the selected engines).
We report the results obtained by running measp with the six classification methods introduced in Section 4.3, and their related inductive models. In particular, measp (c) indicates measp employing the classification method c ∈ {apc, furia, j48, mlr, nn, svm}. We also report the component engines employed by measp on each class, as explained in Section 4.2, and, as a reference, sota, which is the ideal multi-engine solver (considering the engines employed).
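As an illustration of what the sota reference means, the following sketch computes the result of an ideal selector that always runs the fastest engine on each instance. The data layout (a dictionary of per-engine CPU times, with `None` for unsolved instances) and the function name are assumptions of this sketch, not the format used in the experiments.

```python
def ideal_multi_engine(results, time_limit):
    """Return (#solved, total time) of an ideal per-instance engine selector.

    results: {engine: {instance: cpu_time, or None if unsolved}}
    time_limit: per-instance limit in seconds (a parameter of this sketch).
    """
    instances = set()
    for times in results.values():
        instances.update(times)
    solved, total = 0, 0.0
    for inst in instances:
        # Best time among the engines that solve the instance within the limit.
        best = min(
            (times[inst] for times in results.values()
             if times.get(inst) is not None and times[inst] <= time_limit),
            default=None,
        )
        if best is not None:
            solved += 1
            total += best
    return solved, total
```

By construction, no selector restricted to the same engines can solve more instances than this ideal one, which is why it serves as the upper bound in the tables.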
An additional subsection summarizes the results and compares measp with the state-of-the-art solvers that won the 3rd ASP Competition.
We remind the reader that the compared engines were run on all the 1425 instances grounded in less than 600s, whereas the instances on which measp was run are limited to the ones for which we were able to compute all features (i.e., 1371 instances), and the timings for multiengine systems include both the time spent for extracting the features from the ground instances, and the time spent by the classifier.
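The way timings are accounted for multi-engine systems can be sketched as follows. This is only a schematic outline, not the measp implementation: `extract_features`, `classify` and the `engines` mapping are hypothetical callables, and the sketch measures wall-clock time with `time.perf_counter`, whereas the tables report CPU time.

```python
import time


def run_multi_engine(program, extract_features, classify, engines):
    """Select an engine for `program`, run it, and return the overall time.

    The measured time covers feature extraction, classification and solving.
    """
    start = time.perf_counter()
    features = extract_features(program)      # cheap syntactic features
    engine_name = classify(features)          # inductive model picks an engine
    answer = engines[engine_name](program)    # engine is used as a black box
    elapsed = time.perf_counter() - start
    return engine_name, answer, elapsed
```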
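Table 3: Results on the instances evaluated at the 3rd ASP Competition.

| Solver | Ind. Model | NP #Solved | NP Time | Beyond NP #Solved | Beyond NP Time |
|---|---|---|---|---|---|
| clasp | | 60 | 5132.45 | – | – |
| claspD | | – | – | 13 | 2344.00 |
| cmodels | | 56 | 5092.43 | 9 | 2079.79 |
| DLV | | 37 | 1682.76 | 15 | 1359.71 |
| idp | | 61 | 5010.79 | – | – |
| measp (apc) | mod | 63 | 5531.68 | 15 | 3286.28 |
| measp (furia) | mod | 63 | 5244.73 | 15 | 3187.73 |
| measp (j48) | mod | 68 | 5873.25 | 15 | 3187.73 |
| measp (mlr) | mod | 65 | 5738.79 | 15 | 3187.57 |
| measp (nn) | mod | 66 | 4854.78 | 15 | 3187.31 |
| measp (svm) | mod | 60 | 4830.70 | 15 | 2308.60 |
| sota | | 71 | 5403.54 | 15 | 1221.01 |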
5.1 Efficiency on Instances Evaluated at the Competition
In the first experiment we consider ts, introduced in Section 4, as training set, and as test set all the instances evaluated at the 3rd ASP Competition (a total of 88 instances). Results are shown in Table 3. We can see that, on problems of the NP class, measp (j48) solves the highest number of instances, 7 more than idp and 8 more than clasp. Note also that measp (svm) (our worst performing version) is basically on par with clasp (with 60 solved instances) and is very close to idp (with 61 solved instances). Nonetheless, 5 out of 6 classification methods lead measp to perform better than each of its engines. On the Beyond NP problems, instead, all versions of measp and DLV solve 15 instances (DLV having the best mean CPU time), followed by claspD and cmodels, which solve 13 and 9 instances, respectively. Among the measp versions, measp (j48) is, in sum, the one that solves the highest number of instances: here it is very interesting to note that its performance is very close to that of the sota solver (solving only 3 fewer instances) which, we recall, has the ideal performance that we could expect on these instances with these engines.
5.2 Efficiency on Instances Submitted to the Competition
In the second experiment we consider the ts training set (as in the previous experiment), and the test set is composed of all successfully grounded instances submitted to the 3rd ASP Competition. The results are shown in Table 4. Note that in both the NP and Beyond NP classes, all measp versions solve more instances (or the same number in shorter time, in one case) than the component engines: in particular, in the NP class, measp (apc) solves the highest number of instances, 52 more than clasp, which is the best engine in this class, while in the Beyond NP class measp (mlr) solves 519 instances and three measp versions solve 518 instances, i.e., 86 and 85 more instances than claspD, respectively, which is the engine that solves the most instances in the Beyond NP class. Also in this case measp (svm) solves fewer instances than the other measp versions; nonetheless, measp (svm) can solve as many NP instances as clasp, and is effective on Beyond NP, where it is one of the versions that can solve 518 instances.
As far as the comparison with the sota solver is concerned, the best measp version, i.e., measp (apc), solves, in sum, only 23 out of 1036 instances fewer than the sota solver, mostly from the NP class.
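Table 4: Results on the instances submitted to the 3rd ASP Competition.

| Solver | Ind. Model | NP #Solved | NP Time | Beyond NP #Solved | Beyond NP Time |
|---|---|---|---|---|---|
| clasp | | 445 | 47096.14 | – | – |
| claspD | | – | – | 433 | 52029.74 |
| cmodels | | 333 | 40357.30 | 270 | 38654.29 |
| DLV | | 241 | 21678.46 | 364 | 9150.47 |
| idp | | 419 | 37582.47 | – | – |
| measp (apc) | mod | 497 | 55334.15 | 516 | 60537.67 |
| measp (furia) | mod | 480 | 48563.26 | 518 | 60009.23 |
| measp (j48) | mod | 490 | 49564.19 | 510 | 59922.86 |
| measp (mlr) | mod | 489 | 49569.77 | 519 | 58287.31 |
| measp (nn) | mod | 490 | 46780.31 | 518 | 55043.39 |
| measp (svm) | mod | 445 | 40917.70 | 518 | 52553.84 |
| sota | | 516 | 39857.76 | 520 | 24300.82 |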
In order to give a different look at the magnitude of the improvements of our approach in this experiment, whose test set, we recall, is a superset of the one in Section 5.1, in Fig. 2 we present the results of measp (apc), its engines, claspfolio and sota on NP instances in a cumulative way, as customary in, e.g., the MaxSAT and ASP Competitions. One axis reports CPU time, while the other indicates the number of instances solved within that CPU time.
Results clearly show that measp (apc) performs better, in terms of total number of instances solved, than its engines clasp, claspD and claspfolio; also, measp (apc) is very close to the sota. Looking at the figure in more detail, we can note that, as CPU time grows, the distance of measp (apc) w.r.t. the sota decreases: this distance is due, for a small portion of instances (given that we have seen that these two steps are efficient), to the time spent on computing features and on classification, and to the fact that we may not always predict the best engine to run. The convergence of measp (apc) toward sota confirms that, even if we may sometimes fail to predict the best engine, most of the time we predict an engine that solves the instance within the time limit.
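The cumulative presentation used in this kind of plot can be computed with a few lines of code. The sketch below is illustrative only; the function name and the choice of checkpoints are assumptions.

```python
from bisect import bisect_right


def cumulative_solved(times, checkpoints):
    """For each checkpoint t, count the instances solved within t seconds.

    `times` holds the CPU time of every solved instance; unsolved
    instances are simply left out of the list.
    """
    ordered = sorted(times)
    return [bisect_right(ordered, t) for t in checkpoints]
```

Plotting the returned counts against the checkpoints for each solver yields one curve per solver; a curve lying above another one means more instances solved within the same CPU time.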
5.3 Robustness on Instances Submitted to the Competition
In this experiment, we use the two smaller training sets ts and ts introduced in Section 4, while the test set is the same as in the previous experiment. The rationale of this last experiment is to test the robustness of our approach on “unseen” problems, i.e., in a situation where the training set does not contain any instance from some problems. Note that ts contains 297 uniquely solved instances, covering 4 domains out of 14; and ts is very small, since it contains only 59 instances belonging to 4 domains. We can thus expect this experiment to be particularly challenging for our multi-engine approach. Results are presented in Table 5, from which it is clear that measp (apc) trained on ts performs better than the other alternatives and solves 46 instances more than clasp in the NP class, and 72 instances more than claspD in the Beyond NP class (505 against 433; clasp and claspD being the best engines in the NP and Beyond NP classes, respectively). As expected, if we compare the results with the ones obtained with the larger training set ts, we note a general performance degradation. In particular, the performance is now less close to that of the sota solver, which solves in total 40 more instances than the best measp version trained on ts, with the additional unsolved instances coming mainly from the Beyond NP class in this case. This can be explained considering that ts does not contain instances from the Strategic Companies problem and, thus, measp is not always able to select DLV on these instances, where DLV is often a better choice than claspD. However, also in this case measp can solve far more instances than all the engines, demonstrating a robust performance.
These findings are confirmed when the very small training set ts is considered. In this very challenging setting there are still measp versions that can solve more instances than the component engines.
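Table 5: Results on the instances submitted to the 3rd ASP Competition with the two smaller training sets. The first six measp rows refer to the model trained on the 297-instance training set, the last six to the model trained on the 59-instance one.

| Solver | Ind. Model | NP #Solved | NP Time | Beyond NP #Solved | Beyond NP Time |
|---|---|---|---|---|---|
| clasp | | 445 | 47096.14 | – | – |
| claspD | | – | – | 433 | 52029.74 |
| cmodels | | 333 | 40357.30 | 270 | 38654.29 |
| DLV | | 241 | 21678.46 | 364 | 9150.47 |
| idp | | 419 | 37582.47 | – | – |
| measp (apc) | mod | 491 | 54126.87 | 505 | 56250.96 |
| measp (furia) | mod | 479 | 49226.42 | 507 | 55777.67 |
| measp (j48) | mod | 477 | 46746.65 | 507 | 55777.67 |
| measp (mlr) | mod | 471 | 48404.11 | 507 | 52499.83 |
| measp (nn) | mod | 476 | 47627.06 | 507 | 49418.67 |
| measp (svm) | mod | 459 | 38686.16 | 507 | 51462.13 |
| measp (apc) | mod | 445 | 48290.97 | 433 | 53268.62 |
| measp (furia) | mod | 414 | 37902.37 | 363 | 10542.85 |
| measp (j48) | mod | 487 | 51187.66 | 431 | 57393.61 |
| measp (mlr) | mod | 460 | 42385.66 | 363 | 10542.01 |
| measp (nn) | mod | 487 | 48889.21 | 363 | 10547.81 |
| measp (svm) | mod | 319 | 32162.37 | 364 | 10543.00 |
| sota | | 516 | 39857.76 | 520 | 24300.82 |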
5.4 Discussion and Comparison to the State of the Art
Summing up the three experiments, it is clear that measp has a very robust and efficient performance: it often can solve (many) more instances than its engines, even considering the single NP and Beyond NP classes.
We also report that all versions of measp have reasonable performance, so, from a machine learning point of view, we can conclude that, on the one hand, the set of cheap-to-compute features that we selected is representative (i.e., it allows us both to analyze a significant number of instances and to drive the selection of an appropriate engine) independently of the classification method employed. On the other hand, the robustness of our inductive models lets us conclude that we made an appropriate design of our training set ts.
Additional observations can be drawn by looking at Figure 3, where three plots are depicted, one for each inductive model, showing the number of calls to the internal engines for each variant of measp. In particular, by looking at Figure 3(a), we can conclude that the selection of the engines was also fair. Indeed, all of them were employed in a significant number of cases and, as one would expect, the engines that solved a larger number of instances in the 3rd ASP Competition (i.e., clasp and claspD) are called more often. Nonetheless, the ability to exploit all solvers from the pool made a difference in performance, e.g., looking at Figure 3(a) one can note that our best version measp (apc) exploits all engines, and it is very close to the ideal performance of sota. It is worth noting that the measp versions that select DLV more often (note that DLV uniquely solves a high number of StrategicCompanies instances) performed better on Beyond NP. Note also that Figure 3 helps explain the performance of measp (svm), which often differs from the other methods; indeed, this version often prefers DLV over the other engines also on NP instances. Although choosing DLV is often decisive on Beyond NP, it is not always a good choice on NP as well. As a consequence, measp (svm) is always very fast on Beyond NP but does not show overall the same performance as measp equipped with other methods.
Figure 3 also gives some additional insight concerning the differences among our inductive models. In particular, the measp versions trained with ts (containing only StrategicCompanies in Beyond NP) prefer DLV more often (see Fig. 3(c)), thus the performance is good on this class but deteriorates a bit on NP. Concerning ts (see Fig. 3(b)), we note that idp is less exploited than in the other cases, even by measp (mlr), which is the alternative that chooses idp more often: this is probably due to the smaller coverage of this training set on NP. Overall, as we would expect, the number of calls for measp trained with ts is more balanced among the various engines than for measp trained with the smaller training sets.

We have seen that measp almost always can solve more instances than its component engines. One might wonder how it compares with the state-of-the-art ASP implementations. Table 6 summarizes the performance of claspD and claspfolio (the overall winner, and the fastest solver in the NP class that entered the System Track of the competition, respectively), in terms of number of solved instances on both instance sets, i.e., evaluated and submitted, and of the various versions of measp exploiting our inductive model of choice, obtained from the training set ts.
We observe that all measp versions outperform yardstick state-of-the-art solvers considering all submitted instances. (Recall that claspfolio can deal with NP instances only.)
Concerning the comparison on the instances evaluated at the 3rd ASP Competition, we note that all measp versions outperform the winner of the System Track of the competition claspD that could solve 65 instances, whereas measp (j48) (i.e., the best solver in this class) solves 83 instances (and is very close to the ideal sota solver). Even measp (svm) (i.e., the worst performing version of our system) could solve 10 instances more than claspD; moreover, also measp (apc) is very effective here, solving 78 instances.
Concerning the comparison on the larger set of instances submitted to the 3rd ASP Competition, the picture is similar. All measp versions outperform claspD, which solves 835 instances, whereas the worst performing version of our system, measp (svm), solves 963 instances, and the best version overall, measp (apc), solves 1013 instances, i.e., 178 instances more than the winner of the 3rd ASP System competition. We recall that this holds even considering the most challenging settings, when measp is trained with ts and ts (see Tab. 5).
If we limit our attention to the instances belonging to the NP class, the yardstick for comparing measp with the state of the art is clearly claspfolio. Indeed, claspfolio was the solver that could solve more NP instances at the 3rd ASP Competition, and also claspfolio is the state of the art portfolio system for ASP, selecting from a pool of different clasp configurations.
The picture that comes out from Table 6 shows that all versions of measp could solve more instances than claspfolio, especially considering the instances submitted to the competition. In particular, measp (apc) solves 497 NP instances, while claspfolio solves 431. Concerning the comparison on the instances evaluated at the 3rd ASP Competition, we note that claspfolio could solve 62 instances and performs similarly to, e.g., measp (svm) (with 60 instances), and measp (furia) (with 63 instances); our best performing version (i.e., measp (j48)) could solve 68 instances, i.e., 6 instances more than claspfolio (about 10% more).
Up to now we have compared the raw performance of measp with out-of-the-box alternatives. A more precise picture of the comparison between the two machine learning based approaches (measp and claspfolio) can be obtained by performing some additional analysis.
First of all, note that the above comparison was made considering as reference the claspfolio version (trained by the Potassco team) that entered the 3rd ASP Competition. One might wonder what the performance of claspfolio is when trained on our training set ts. As will be discussed in detail in Section 6, claspfolio exploits a different method for algorithm selection, thus this datum is reported here only for the sake of completeness. We have trained claspfolio on ts with the help of the Potassco team. (Following the suggestion of the Potassco team, we have run claspfolio ver. 1.0.1, Aug. 19th, 2011, since the feature extraction tool claspre has been recently updated and integrated in claspfolio.) As a result, the performance of claspfolio trained on ts is analogous to the one obtained by the claspfolio trained for the competition (i.e., it solves 59 instances from the evaluated set, and 433 of the submitted set).

On the other hand, one might want to analyze what would be the result of applying the approach to algorithm selection implemented in measp to the setting of claspfolio. As pointed out in Section 6, the multi-engine approach that we have followed in measp is very flexible, and we could easily develop an ad hoc version of our system, which we called meclasp, that is based on the same “algorithms” portfolio as claspfolio. In practice, we considered as a separate engine each of the 25 clasp versions employed in claspfolio, and we applied the same steps described in Section 4 to build meclasp. Concerning the selection of the engines, as one might expect, many engines overlap and the number of uniquely solved instances considering all the available engines was very low (we get only ten uniquely solved instances). Thus, we applied the extended engine selection policy and selected 5 engines, we trained meclasp on ts, and selected a classification algorithm, in this case nn. (We also tried other settings with different combinations, with both more and fewer engines, still obtaining similar overall results.)
The goal of this final experiment is to confirm the prediction power of our approach. The resulting picture is that meclasp (nn) solves 458 NP instances, whereas the ideal limit that one can reach considering all the 25 heuristics in the portfolio is 484. This is substantially more than claspfolio, which solves 431 instances. Nonetheless, measp (nn) (which solves 490) outperforms meclasp (nn).
All in all, one can conclude that the approach introduced in this paper, combining cheap-to-compute features and multinomial classification, works well also when applied to a portfolio of heuristics. On the other hand, as one might expect, the possibility to select among several different engines featuring (often radically different) evaluation strategies with non-overlapping performance gives additional advantages w.r.t. a single-engine portfolio. Indeed, even in the presence of an ideal prediction strategy, a portfolio approach based on variants of the same algorithm cannot achieve the same performance as an ideal multi-engine approach. This is clear observing that the sota solver on NP can solve 516 instances, whereas the ideal performance for both meclasp and claspfolio tops at 484 instances. The comparison of meclasp and measp seems to confirm that measp can exploit this ideal advantage also in practice.
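Table 6: Comparison with the state of the art, in terms of number of solved instances on the evaluated and submitted instance sets.

| Solver | Ind. Model | Evaluated NP | Evaluated Beyond NP | Evaluated Tot. | Submitted NP | Submitted Beyond NP | Submitted Tot. |
|---|---|---|---|---|---|---|---|
| claspD | – | 52 | 13 | 65 | 402 | 433 | 835 |
| claspfolio | Competition | 62 | – | – | 431 | – | – |
| measp (apc) | mod | 63 | 15 | 78 | 497 | 516 | 1013 |
| measp (furia) | mod | 63 | 15 | 78 | 480 | 518 | 998 |
| measp (j48) | mod | 68 | 15 | 83 | 490 | 510 | 1000 |
| measp (mlr) | mod | 65 | 15 | 80 | 489 | 519 | 1008 |
| measp (nn) | mod | 66 | 15 | 81 | 490 | 518 | 1008 |
| measp (svm) | mod | 60 | 15 | 75 | 445 | 518 | 963 |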
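The totals and margins quoted in Section 5.4 for the submitted instances can be recomputed from the table above. The short sketch below copies the NP and Beyond NP counts from Table 6; the function names are chosen for illustration only.

```python
# (NP, Beyond NP) instances solved on the submitted set, copied from Table 6.
SUBMITTED = {
    "claspD": (402, 433),
    "measp (apc)": (497, 516),
    "measp (furia)": (480, 518),
    "measp (j48)": (490, 510),
    "measp (mlr)": (489, 519),
    "measp (nn)": (490, 518),
    "measp (svm)": (445, 518),
}


def totals(table):
    """Total number of solved instances per solver."""
    return {solver: np_solved + bnp_solved
            for solver, (np_solved, bnp_solved) in table.items()}


def margin_over(table, baseline):
    """Difference in solved instances between each solver and the baseline."""
    tot = totals(table)
    return {s: t - tot[baseline] for s, t in tot.items() if s != baseline}


if __name__ == "__main__":
    for solver, gain in sorted(margin_over(SUBMITTED, "claspD").items()):
        print(f"{solver}: +{gain} instances w.r.t. claspD")
```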
6 Related Work
Starting from the consideration that, on empirically hard problems, there is rarely a “global” best algorithm, while it is often the case that different algorithms perform well on different problem instances, Rice (1976) defined the algorithm selection problem as the problem of finding an effective algorithm based on an abstract model of the problem at hand. Along this line, several works have tackled combinatorial problems efficiently. Gomes and Selman (2001) and Leyton-Brown et al. (2003) describe the concept of “algorithm portfolio” as a general method for combining existing algorithms into new ones that are unequivocally preferable to any of the component algorithms. The papers most related to our work are Xu et al. (2008) and Pulina and Tacchella (2007), for solving SAT and QSAT problems, respectively. Both rely on a per-instance analysis, like the one we have performed in this paper: in Pulina and Tacchella (2007), which is the work closest to ours, the goal is to design a multi-engine solver, i.e., a tool that can choose among its engines the one which is more likely to yield optimal results. Pulina and Tacchella (2009) extends Pulina and Tacchella (2007) by introducing a self-adaptation of the learned selection policies when the approach fails to give a good prediction. The approach by Xu et al. (2008) also has the ability to compute features online, e.g., by running a solver for an allotted amount of time and looking “internally” at solver statistics, with the option of changing the solver online: this is a per-instance algorithm portfolio approach. The related solver, satzilla, can also combine portfolio and multi-engine approaches. The algorithm portfolio approach is also employed in Gomes and Selman (2001) on Constraint Satisfaction and MIP, in Samulowitz and Memisevic (2007) on QSAT, and in Gerevini et al. (2009) on planning problems.
If we consider “pure” approaches, the advantage of the algorithm portfolio over a multi-engine is that it is possible, by combining algorithms, to reach a performance that is better than the one of the best engine, which is instead an upper bound for a multi-engine solver. On the other hand, a multi-engine treats the engines as black boxes, and this is a fundamental assumption for a flexible and modular system: to add a new engine one just needs to update the inductive model. Other approaches (an overview can be found in Hoos (2012)) work by designing methods for automatically tuning and configuring the solver parameters: e.g., Hutter et al. (2009) and Hutter et al. (2010) for solving SAT and MIP problems, and Vallati et al. (2011) for planning problems.
About the other approaches in ASP, the one implemented in claspfolio Gebser et al. (2011) mixes characteristics of the algorithm portfolio approach with others more similar to this second trend: it works by selecting the most promising clasp internal configuration on the basis of both “static” and “dynamic” features of the input program, the latter obtained by running clasp for a given amount of time. Thus, like the algorithm portfolio approaches, it can compute both static and dynamic features, while trying to automatically configure the “best” clasp configuration on the basis of the computed features.
The work presented here differs from claspfolio for a number of reasons. First, from a machine learning point of view, the inductive models of measp are based on classification algorithms, while the inductive models of claspfolio are mainly based on regression techniques, as in satzilla, with the exception of a “preliminary” stage, in which a classifier is invoked in order to predict the satisfiability result of the input instance. Regression-based techniques usually need many training instances to have a good prediction while, as shown in our paper, this is not required for our method, which is based on classification. To highlight the prediction power of our approach, in Section 5.4 we have applied it to the portfolio of claspfolio, showing that relying on classification instead of regression can lead to better results. Second, as mentioned before, in our approach we consider the engines as black boxes: the measp architecture is designed to be independent of the engines' internals. measp, being a multi-engine solver, thus has higher modularity and flexibility w.r.t. claspfolio: adding a new solver to measp is immediate, while this is problematic in claspfolio, and would likely boil down to implementing the new strategy in clasp. Third, as a consequence of the previous point, we use only static features: dynamic features, as in the case of claspfolio, are usually both strongly related to a given engine and possibly costly to compute, and we avoided such kinds of features. For instance, one of the claspfolio dynamic features is related to the number of “learnt constraints”, which could be a significant feature for clasp but not for other systems, e.g., DLV, which does not adopt learning and is based on look-ahead. Lastly, as described in Section 4.1, we use only cheap-to-compute features, while claspfolio relies on some quite “costly” features, e.g., the number of SCCs and loops. This was confirmed by some preliminary experiments: it turned out that the claspfolio feature extractor could compute, in 600s, all its features for 573 out of 823 NP ground instances.
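The difference between the two selection schemes discussed above can be made concrete with a schematic sketch. It is not the implementation of measp, claspfolio or satzilla: the models are passed in as plain callables, and all names are hypothetical.

```python
def select_by_classification(classifier, features):
    """Classification-based selection: the model outputs an engine name directly."""
    return classifier(features)


def select_by_regression(cost_models, features):
    """Regression-based selection: one model per engine predicts a cost
    (e.g., an expected runtime) and the engine with the lowest prediction wins."""
    return min(cost_models, key=lambda engine: cost_models[engine](features))
```

In the first scheme a single multinomial model is trained on (features, best engine) pairs; in the second, one cost model per engine has to be trained, and adding an engine means training an additional predictor.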
An alternative approach in ASP is followed in the dors framework of Balduccini (2011), where in the off-line learning phase, carried out on representative programs from a given domain, a heuristic ordering is selected to be then used in smodels when solving other programs from the same domain. The target of this work seems to be real-world problem domains where instances have similar structures, and a heuristic ordering learned on some (possibly small) instances in the domain can help to improve the performance on other (possibly big) instances. According to its author (personal communication with Marcello Balduccini), the solving method behind dors can be considered “complementary” more than alternative w.r.t. the one of measp, i.e., they could in principle be combined. An idea can be the following: while computing features, one can (in parallel) run one or more engines in order to learn a (possibly partial) heuristic ordering. Then, in the solving phase, engines can take advantage of the learned heuristic (but, of course, assuming minimal changes in the engines). This would amount to having two “sources” of knowledge: the “most promising” engine, learned with the multi-engine approach, and the learned heuristic ordering.
Finally, we remark that this work is an extended and revised version of Maratea et al. (2012a). The main improvements include:
(i) the adoption of six classification methods (instead of the only one, i.e., nn, employed in Maratea et al. (2012a));
(ii) a more detailed analysis of the dataset and the test sets;
(iii) a wider experimental analysis, including more systems, i.e., different versions of measp and claspfolio, and more investigations on training and test sets; and
(iv) an improved related work section, in particular w.r.t. the comparison with claspfolio.
7 Conclusion
In this paper we have applied machine learning techniques to ASP solving with the goal of developing a fast and robust multi-engine ASP solver. To this end, we have: specified a number of cheap-to-compute syntactic features that allow for accurate classification of ground ASP programs; applied six multinomial classification methods to learning algorithm selection strategies; and implemented these techniques in our multi-engine solver measp, which is available for download at http://www.mat.unical.it/ricca/measp.
The performance of measp was assessed on three experiments, which were conceived for checking the efficiency and robustness of our approach, involving different training and test sets of instances taken from the ones submitted to the System Track of the 3rd ASP Competition. Our analysis shows that our multi-engine solver measp is very robust and efficient, and outperforms both its component engines and state-of-the-art solvers.
Acknowledgments.
The authors would like to thank Marcello Balduccini for useful discussions (by email and in person) about the solving algorithm underlying his system dors, and all the members of the claspfolio team, in particular Torsten Schaub and Thomas Marius Schneider, for clarifications and the valuable support to train claspfolio in the most proper way.
References
 Aha et al. (1991) Aha, D., Kibler, D., and Albert, M. 1991. Instancebased learning algorithms. Machine learning 6, 1, 37–66.

Balduccini (2011)
Balduccini, M. 2011.
Learning and using domainspecific heuristics in ASP solvers.
AI Communications – The European Journal on Artificial Intelligence
24, 2, 147–164.  Balduccini and Lierler (2012) Balduccini, M. and Lierler, Y. 2012. Practical and methodological aspects of the use of cuttingedge asp tools. In Proc. of the 14th International Symposium on Practical Aspects of Declarative Languages (PADL 2012), C. V. Russo and N.F. Zhou, Eds. Lecture Notes in Computer Science, vol. 7149. Springer, 78–92.
 Baral (2003) Baral, C. 2003. Knowledge Representation, Reasoning and Declarative Problem Solving. Cambridge University Press, Tempe, Arizona.

Brooks et al. (2007)
Brooks, D. R., Erdem, E., Erdogan, S. T., Minett,
J. W., and Ringe, D. 2007.
Inferring phylogenetic trees using answer set programming.
Journal of Automated Reasoning
39, 4, 471–511.  Calimeri et al. (2011) Calimeri, F., Ianni, G., and Ricca, F. since 2011. The third answer set programming system competition. https://www.mat.unical.it/aspcomp2011/.
 Calimeri et al. (2011) Calimeri, F., Ianni, G., Ricca, F., Alviano, M., Bria, A., Catalano, G., Cozza, S., Faber, W., Febbraro, O., Leone, N., Manna, M., Martello, A., Panetta, C., Perri, S., Reale, K., Santoro, M. C., Sirianni, M., Terracina, G., and Veltri, P. 2011. The Third Answer Set Programming Competition: Preliminary Report of the System Competition Track. In Proc. of LPNMR11. LNCS Springer, Vancouver, Canada, 388–403.
 Calimeri et al. (2012) Calimeri, F., Ianni, G., Ricca, F. 2012. The third open answer set programming competition. Theory and Practice of Logic Programming. Available online. DOI:http://dx.doi.org/10.1017/S1471068412000105.
 Clark (1978) Clark, K. L. 1978. Negation as Failure. In Logic and Data Bases, H. Gallaire and J. Minker, Eds. Plenum Press, New York, 293–322.
 Cortes and Vapnik (1995) Cortes, C. and Vapnik, V. 1995. Supportvector networks. Machine learning 20, 3, 273–297.
 de Moura and Bjørner (2008) de Moura, L. M. and Bjørner, N. 2008. Z3: An Efficient SMT Solver. In Proceedings of the 14th International Conference on Tools and Algorithms for Construction and Analysis of Systems, TACAS 2008. 337–340.
 Drescher et al. (2008) Drescher, C., Gebser, M., Grote, T., Kaufmann, B., König, A., Ostrowski, M., and Schaub, T. 2008. Conflict-Driven Disjunctive Answer Set Solving. In Proceedings of the Eleventh International Conference on Principles of Knowledge Representation and Reasoning (KR 2008), G. Brewka and J. Lang, Eds. AAAI Press, Sydney, Australia, 422–432.
 Eén and Sörensson (2003) Eén, N. and Sörensson, N. 2003. An Extensible SAT-solver. In Theory and Applications of Satisfiability Testing, 6th International Conference, SAT 2003. LNCS Springer, 502–518.
 Eiter et al. (1997) Eiter, T., Gottlob, G., and Mannila, H. 1997. Disjunctive Datalog. ACM Transactions on Database Systems 22, 3 (Sept.), 364–418.
 Friedrich and Ivanchenko (2008) Friedrich, G. and Ivanchenko, V. 2008. Diagnosis from first principles for workflow executions. Tech. rep., Alpen Adria University, Applied Informatics, Klagenfurt, Austria. http://proserver3iwas.uniklu.ac.at/download_area/TechnicalReports/technical_report_2008_02.pdf.
 Gebser et al. (2011) Gebser, M., Kaminski, R., Kaufmann, B., Schaub, T., Schneider, M. T., and Ziller, S. 2011. A portfolio solver for answer set programming: Preliminary report. In Proc. of the 11th International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR), J. P. Delgrande and W. Faber, Eds. LNCS, vol. 6645. Springer, Vancouver, Canada, 352–357.
 Gebser et al. (2007) Gebser, M., Kaufmann, B., Neumann, A., and Schaub, T. 2007. Conflict-driven answer set solving. In Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07). Morgan Kaufmann Publishers, Hyderabad, India, 386–392.
 Gebser et al. (2007) Gebser, M., Schaub, T., and Thiele, S. 2007. GrinGo: A New Grounder for Answer Set Programming. In Logic Programming and Nonmonotonic Reasoning, 9th International Conference, LPNMR 2007. Lecture Notes in Computer Science, vol. 4483. Springer, Tempe, Arizona, 266–271.
 Gebser et al. (2011) Gebser, M., Schaub, T., Thiele, S., and Veber, P. 2011. Detecting inconsistencies in large biological networks with answer set programming. Theory and Practice of Logic Programming 11, 2–3, 323–360.
 Gelfond and Leone (2002) Gelfond, M. and Leone, N. 2002. Logic Programming and Knowledge Representation – the A-Prolog perspective. Artificial Intelligence 138, 1–2, 3–38.
 Gelfond and Lifschitz (1988) Gelfond, M. and Lifschitz, V. 1988. The Stable Model Semantics for Logic Programming. In Logic Programming: Proceedings Fifth Intl Conference and Symposium. MIT Press, Cambridge, Mass., 1070–1080.
 Gelfond and Lifschitz (1991) Gelfond, M. and Lifschitz, V. 1991. Classical Negation in Logic Programs and Disjunctive Databases. New Generation Computing 9, 365–385.
 Gerevini et al. (2009) Gerevini, A., Saetti, A., and Vallati, M. 2009. An automatically configurable portfolio-based planner with macro-actions: PbP. In Proc. of the 19th International Conference on Automated Planning and Scheduling, A. Gerevini, A. E. Howe, A. Cesta, and I. Refanidis, Eds. AAAI, Thessaloniki, Greece.
 Giunchiglia et al. (2006) Giunchiglia, E., Lierler, Y., and Maratea, M. 2006. Answer set programming based on propositional satisfiability. Journal of Automated Reasoning 36, 4, 345–377.
 Gomes and Selman (2001) Gomes, C. P. and Selman, B. 2001. Algorithm portfolios. Artificial Intelligence 126, 1–2, 43–62.
 Halder et al. (2009) Halder, A., Ghosh, A., and Ghosh, S. 2009. Aggregation pheromone density based pattern classification. Fundamenta Informaticae 92, 4, 345–362.
 Hoos (2012) Hoos, H. H. 2012. Programming by optimization. Communications of the ACM 55, 2, 70–80.
 Hühn and Hüllermeier (2009) Hühn, J. and Hüllermeier, E. 2009. Furia: an algorithm for unordered fuzzy rule induction. Data Mining and Knowledge Discovery 19, 3, 293–319.

 Hutter et al. (2010) Hutter, F., Hoos, H. H., and Leyton-Brown, K. 2010. Automated configuration of mixed integer programming solvers. In Proc. of the 7th International Conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, A. Lodi, M. Milano, and P. Toth, Eds. LNCS, vol. 6140. Springer, Bologna, Italy, 186–202.
 Hutter et al. (2009) Hutter, F., Hoos, H. H., Leyton-Brown, K., and Stützle, T. 2009. ParamILS: An automatic algorithm configuration framework. Journal of Artificial Intelligence Research 36, 267–306.
 Janhunen (2006) Janhunen, T. 2006. Some (in)translatability results for normal logic programs and propositional theories. Journal of Applied Non-Classical Logics 16, 35–86.
 Janhunen et al. (2009) Janhunen, T., Niemelä, I., and Sevalnev, M. 2009. Computing stable models via reductions to difference logic. In Proceedings of the 10th International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR). LNCS. Springer, Potsdam, Germany, 142–154.
 Leone et al. (2006) Leone, N., Pfeifer, G., Faber, W., Eiter, T., Gottlob, G., Perri, S., and Scarcello, F. 2006. The DLV System for Knowledge Representation and Reasoning. ACM Transactions on Computational Logic 7, 3 (July), 499–562.
 Leyton-Brown et al. (2003) Leyton-Brown, K., Nudelman, E., Andrew, G., Mcfadden, J., and Shoham, Y. 2003. A portfolio approach to algorithm selection. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, IJCAI-03.
 Lierler (2005) Lierler, Y. 2005. Disjunctive Answer Set Programming via Satisfiability. In Logic Programming and Nonmonotonic Reasoning — 8th International Conference, LPNMR’05, Diamante, Italy, September 2005, Proceedings, C. Baral, G. Greco, N. Leone, and G. Terracina, Eds. Lecture Notes in Computer Science, vol. 3662. Springer Verlag, 447–451.
 Lierler (2008) Lierler, Y. 2008. Abstract Answer Set Solvers. In Logic Programming, 24th International Conference (ICLP 2008). Lecture Notes in Computer Science, vol. 5366. Springer, 377–391.
 Lifschitz (1999) Lifschitz, V. 1999. Answer Set Planning. In Proceedings of the 16th International Conference on Logic Programming (ICLP’99), D. D. Schreye, Ed. The MIT Press, Las Cruces, New Mexico, USA, 23–37.
 Maratea et al. (2012a) Maratea, M., Pulina, L., and Ricca, F. 2012a. Applying Machine Learning Techniques to ASP Solving. In Technical Communications of the 28th International Conference on Logic Programming (ICLP 2012). LIPIcs, vol. 17. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 37–48.
 Maratea et al. (2012b) Maratea, M., Pulina, L., and Ricca, F. 2012b. The multi-engine ASP solver ME-ASP. In Proceedings of Logics in Artificial Intelligence, JELIA 2012. LNCS, vol. 7519. Springer, 484–487.
 Marczak et al. (2010) Marczak, W. R., Huang, S. S., Bravenboer, M., Sherr, M., Loo, B. T., and Aref, M. 2010. Secureblox: customizable secure distributed data processing. In SIGMOD Conference. 723–734.
 Marek and Truszczyński (1998) Marek, V. W. and Truszczyński, M. 1998. Stable models and an alternative logic programming paradigm. CoRR cs.LO/9809032.
 Mariën et al. (2008) Mariën, M., Wittocx, J., Denecker, M., and Bruynooghe, M. 2008. Sat(id): Satisfiability of propositional logic extended with inductive definitions. In Proc. of the 11th International Conference on Theory and Applications of Satisfiability Testing, SAT 2008. LNCS. Springer, Guangzhou, China, 211–224.
 Masoni et al. (2009) Masoni, A., Carpinelli, M., Fenu, G., Bosin, A., Mura, D., Porceddu, I., and Zanetti, G. 2009. Cybersar: A lambda grid computing infrastructure for advanced applications. In Nuclear Science Symposium Conference Record (NSS/MIC), 2009 IEEE. IEEE, 481–483.
 Mierswa et al. (2006) Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M., and Euler, T. 2006. Yale: Rapid prototyping for complex data mining tasks. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 935–940.
 Niemelä (1998) Niemelä, I. 1998. Logic Programs with Stable Model Semantics as a Constraint Programming Paradigm. In Proceedings of the Workshop on Computational Aspects of Nonmonotonic Reasoning, I. Niemelä and T. Schaub, Eds. Trento, Italy, 72–79.
 Le Cessie and Van Houwelingen (1992) Le Cessie, S. and Van Houwelingen, J. C. 1992. Ridge estimators in logistic regression. Applied Statistics. JSTOR, 191–201.
 Nogueira et al. (2001) Nogueira, M., Balduccini, M., Gelfond, M., Watson, R., and Barry, M. 2001. An A-Prolog Decision Support System for the Space Shuttle. In Practical Aspects of Declarative Languages, Third International Symposium (PADL 2001), I. Ramakrishnan, Ed. Lecture Notes in Computer Science, vol. 1990. Springer, 169–183.
 Nudelman et al. (2004) Nudelman, E., Leyton-Brown, K., Hoos, H. H., Devkar, A., and Shoham, Y. 2004. Understanding random SAT: Beyond the clauses-to-variables ratio. In Proc. of the 10th International Conference on Principles and Practice of Constraint Programming (CP), M. Wallace, Ed. Lecture Notes in Computer Science. Springer, Toronto, Canada, 438–452.
 Pulina and Tacchella (2007) Pulina, L. and Tacchella, A. 2007. A multi-engine solver for quantified boolean formulas. In Proc. of the 13th International Conference on Principles and Practice of Constraint Programming (CP), C. Bessiere, Ed. Lecture Notes in Computer Science. Springer, Providence, Rhode Island, 574–589.
 Pulina and Tacchella (2009) Pulina, L. and Tacchella, A. 2009. A self-adaptive multi-engine solver for quantified boolean formulas. Constraints 14, 1, 80–116.
 Quinlan (1993) Quinlan, J. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann.
 Ricca et al. (2009) Ricca, F., Gallucci, L., Schindlauer, R., Dell’Armi, T., Grasso, G., and Leone, N. 2009. OntoDLV: an ASP-based system for enterprise ontologies. Journal of Logic and Computation 19, 4, 643–670.
 Ricca et al. (2012) Ricca, F., Grasso, G., Alviano, M., Manna, M., Lio, V., Iiritano, S., and Leone, N. 2012. Team-building with Answer Set Programming in the Gioia-Tauro Seaport. Theory and Practice of Logic Programming 12, 3, 361–381.
 Ricca et al. (2010) Ricca, F., Dimasi, A., Grasso, G., Ielpa, S. M., Iiritano, S., Manna, M., and Leone, N. 2010. A Logic-Based System for e-Tourism. Fundamenta Informaticae 105, 35–55.
 Rice (1976) Rice, J. R. 1976. The algorithm selection problem. Advances in Computers 15, 65–118.
 Rullo et al. (2009) Rullo, P., Policicchio, V. L., Cumbo, C., and Iiritano, S. 2009. Olex: Effective rule learning for text categorization. IEEE Transactions on Knowledge and Data Engineering 21, 8, 1118–1132.
 Samulowitz and Memisevic (2007) Samulowitz, H. and Memisevic, R. 2007. Learning to solve QBF. In Proceedings of the 22nd AAAI Conference on Artificial Intelligence. AAAI Press, Vancouver, Canada, 255–260.
 Simons et al. (2002) Simons, P., Niemelä, I., and Soininen, T. 2002. Extending and Implementing the Stable Model Semantics. Artificial Intelligence 138, 181–234.
 Smaragdakis et al. (2011) Smaragdakis, Y., Bravenboer, M., and Lhoták, O. 2011. Pick your contexts well: understanding object-sensitivity. In Proceedings of the 38th Symposium on Principles of Programming Languages, POPL 2011. 17–30.
 smtlibweb (2011) smtlibweb. 2011. The Satisfiability Modulo Theories Library. http://www.smtlib.org/.
 Vallati et al. (2011) Vallati, M., Fawcett, C., Gerevini, A., Hoos, H., and Saetti, A. 2011. Generating fast domain-specific planners by automatically configuring a generic parameterised planner. In Working notes of 21st International Conference on Automated Planning and Scheduling (ICAPS-11) Workshop on Planning and Learning.
 Wittocx et al. (2008) Wittocx, J., Mariën, M., and Denecker, M. 2008. The idp system: a model expansion system for an extension of classical logic. In Logic and Search, Computation of Structures from Declarative Descriptions (LaSh 2008). Leuven, Belgium, 153–165.
 Xu et al. (2008) Xu, L., Hutter, F., Hoos, H. H., and Leyton-Brown, K. 2008. SATzilla: Portfolio-based algorithm selection for SAT. Journal of Artificial Intelligence Research 32, 565–606.