1 Introduction
Significant strides have recently been made in the application of portfolio-based algorithms in the fields of constraint satisfaction [15], quantified Boolean formulae [17], and most notably SAT [23, 6]. Given a collection of solvers, these approaches compute a set of representative features of a problem instance and then use this information to decide which solver is the most effective one to employ. These decisions can be based on regression techniques [23], in which a model is trained to predict the expected runtime of each solver and the solver with the best predicted performance is chosen. Alternatively, a ranking algorithm can be trained to directly predict the best solver for each instance [5]. The features can also be used for clustering [7], where the best solver is chosen for each cluster of instances. In practice, regardless of the approach, portfolio algorithms have been shown to be dramatically better than using a single solver.
Algorithm portfolios also rely on a good set of features to describe the problem instance being solved. This can be seen as a major drawback, since one needs to use specific features for each problem at hand or, worse, has to come up with a set of features if none exists. If there are not enough informative features, it is impossible to train a classifier to differentiate between classes of instances. On the other hand, if there are too many features, the classifier may overfit the training data. Furthermore, a large feature set is likely to contain noisy features, which can be detrimental to the quality of the learned classifier. In the SAT domain, the features used by the solvers dominating the competitions have been thoroughly analyzed and studied over the last decade. Unfortunately, many other fields do not have such a well-established feature set. Even in the case of constraint satisfaction problems, where a feature set has been proposed, careful filtering can dramatically improve the quality of portfolios [9].
However, while there might not be an existing feature set, for NP-complete problems there exist polynomial-time transformations to any other NP-complete problem. In this paper we propose to take advantage of this by transforming CSP instances to SAT as a preprocessing step before computing their features. We show, on constraint satisfaction problems, that such a transformation retains the information needed to differentiate classes of instances. We choose the CSP domain for two reasons. First, it has a large number of solvers that can be used to build a diversified portfolio. Second, because a feature set exists for CSPs, we can compare the quality of a portfolio trained on SAT features to one trained on the domain-specific CSP features.
There has been a lot of work exploring the effect of transforming CSP instances into SAT. Perhaps the most relevant is the work of Ansótegui and Manyà, who evaluated the performance of SAT solvers on six SAT encodings of graph colouring, random binary CSPs, pigeon hole, and all-interval series problems [1]. Solvers such as sugar [19], azucar [20], and CSP2SAT4J [11] have similarly tackled CSPs by encoding them into SAT and then solving them with a predefined SAT solver. However, as far as we are aware, this paper represents the first time that a portfolio has been created using features computed after transforming a problem from one domain to another.
2 Encodings
There are a number of known polynomial-time transformations, or encodings, from constraint satisfaction problems to SAT [16]. In this paper we focus on three commonly used encodings: the direct, order and support encodings.
2.1 Direct Encoding
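In the direct encoding [22], for each CSP variable $X$ with domain $D(X) = \{1, \ldots, d\}$, a SAT variable is created for each domain value, i.e. $x_1, x_2, \ldots, x_d$. If $x_v$ is true in the resulting SAT formula, then the CSP variable $X$ is assigned the value $v$ in the CSP solution. Therefore, in order to represent a solution to the CSP, exactly one of $x_1, \ldots, x_d$ must be assigned true. We add an at-least-one clause and at-most-one clauses to the SAT formula for each CSP variable $X$:
$$\bigvee_{v=1}^{d} x_v \qquad \text{and} \qquad \neg x_v \vee \neg x_w \quad \text{for all } 1 \le v < w \le d.$$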
Constraints between CSP variables are represented in the direct encoding by enumerating the conflicting tuples. For a binary constraint between the pair of variables $X$ and $Y$, if the tuple $\langle X = v, Y = w \rangle$ is forbidden, then we add the conflict clause $(\neg x_v \vee \neg y_w)$.
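The direct encoding can be sketched in a few lines of Python. This is a minimal illustration, not the translator used in our experiments: it assumes binary constraints are given as explicit lists of forbidden tuples, and the function name `direct_encode` and the DIMACS-style signed-integer clause representation are our own choices.

```python
from itertools import combinations


def direct_encode(domains, forbidden):
    """Direct encoding of a binary CSP into CNF (illustrative sketch).

    domains:   dict mapping each CSP variable X to its list of domain values
    forbidden: list of ((X, v), (Y, w)) pairs, one per forbidden tuple <X=v, Y=w>
    Returns (var_map, clauses), where var_map[(X, v)] is the SAT variable x_v and
    each clause is a list of signed integers (negative = negated literal).
    """
    var_map = {}
    for x, dom in domains.items():
        for v in dom:
            var_map[(x, v)] = len(var_map) + 1

    clauses = []
    for x, dom in domains.items():
        lits = [var_map[(x, v)] for v in dom]
        clauses.append(lits)  # at-least-one clause: x_1 or ... or x_d
        for a, b in combinations(lits, 2):
            clauses.append([-a, -b])  # at-most-one clauses: not x_v or not x_w

    for (x, v), (y, w) in forbidden:
        clauses.append([-var_map[(x, v)], -var_map[(y, w)]])  # conflict clause
    return var_map, clauses


# Two variables with domain {1, 2} and the forbidden tuple <X=1, Y=1>.
var_map, clauses = direct_encode({"X": [1, 2], "Y": [1, 2]}, [(("X", 1), ("Y", 1))])
print(clauses)  # [[1, 2], [-1, -2], [3, 4], [-3, -4], [-1, -3]]
```

The number of at-most-one clauses grows quadratically with the domain size, which is one reason large domains produce large direct encodings.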
2.2 Support Encoding
The support encoding [8, 3] uses the same mechanism as the direct encoding to translate a CSP variable's domain into SAT. However, the support encoding differs in how the constraints between variables are encoded. Given a constraint between two variables $X$ and $Y$, for each value $v$ in the domain of $X$, let $S_{Y,X=v} \subseteq D(Y)$ be the subset of the values in the domain of $Y$ which are consistent with assigning $X = v$. Either $x_v$ is false or one of the consistent assignments from $S_{Y,X=v}$ must be true, represented by the clause:
$$\neg x_v \vee \bigvee_{w \in S_{Y,X=v}} y_w$$
This must be repeated symmetrically by adding a clause for each value $w$ in the domain of $Y$, listing the values in the domain of $X$ which are consistent with the assignment $Y = w$.
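A minimal sketch of the support clauses for a single binary constraint, assuming the constraint is given as a set of allowed value pairs; `make_var_map` and `support_clauses` are hypothetical helper names. The domain clauses are the same as in the direct encoding and are not repeated here.

```python
def make_var_map(domains):
    """Number the SAT variables x_v for every CSP variable X and value v."""
    var_map = {}
    for x, dom in domains.items():
        for v in dom:
            var_map[(x, v)] = len(var_map) + 1
    return var_map


def support_clauses(var_map, domains, x, y, allowed):
    """Support clauses for one binary constraint between CSP variables x and y.

    allowed: set of (v, w) pairs for which <x=v, y=w> satisfies the constraint.
    """
    clauses = []
    for v in domains[x]:
        # not x_v, or one of the supports of x=v in the domain of y
        supports = [var_map[(y, w)] for w in domains[y] if (v, w) in allowed]
        clauses.append([-var_map[(x, v)]] + supports)
    for w in domains[y]:
        # symmetric clauses: not y_w, or one of its supports in the domain of x
        supports = [var_map[(x, v)] for v in domains[x] if (v, w) in allowed]
        clauses.append([-var_map[(y, w)]] + supports)
    return clauses


domains = {"X": [1, 2, 3], "Y": [1, 2, 3]}
less_than = {(v, w) for v in domains["X"] for w in domains["Y"] if v < w}  # X < Y
print(support_clauses(make_var_map(domains), domains, "X", "Y", less_than))
# [[-1, 5, 6], [-2, 6], [-3], [-4], [-5, 1], [-6, 1, 2]]
```

A value with no support, such as $X = 3$ under $X < Y$, yields a unit clause ($\neg x_3$), so it is ruled out directly by the formula.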
2.3 Order Encoding
Unlike the direct and support encodings, which model $X = v$ as a SAT variable, the order encoding creates SAT variables $x_{\le v}$ to represent $X \le v$. If $X$ is less than or equal to $v$, then $X$ must also be less than or equal to $v + 1$. To enforce this across the domain we add the clauses:
$$\neg x_{\le v} \vee x_{\le v+1} \quad \text{for } v = 1, \ldots, d - 1$$
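The ladder clauses above can be sketched as follows, assuming a domain $\{1, \ldots, d\}$; `order_domain` is a hypothetical name, and the sketch treats $X \le d$ as a constant (always true) rather than a SAT variable. The enumeration at the end checks that the clauses admit exactly one model per domain value.

```python
from itertools import product


def order_domain(d, first_var=1):
    """Order-encoding variables and ladder clauses for X with domain {1, ..., d}.

    le[v] is the SAT variable for X <= v, for v = 1, ..., d - 1; X <= d is always
    true, so it needs no variable and the clause that would mention it is omitted.
    """
    le = {v: first_var + v - 1 for v in range(1, d)}
    # X <= v implies X <= v + 1: the clause (not x_{<=v} or x_{<=v+1})
    clauses = [[-le[v], le[v + 1]] for v in range(1, d - 1)]
    return le, clauses


le, clauses = order_domain(4)
print(clauses)  # [[-1, 2], [-2, 3]]

# Each model of the ladder clauses corresponds to exactly one value of X,
# namely the smallest v with X <= v true (or d if none is true).
models = [bits for bits in product([False, True], repeat=len(le))
          if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses)]
print(len(models))  # 4
```

Unlike the direct encoding, no at-most-one clauses are needed: the ladder alone guarantees a single value.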
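The order encoding is naturally suited to modelling inequality constraints. To state $X \le v$, we would just post the unit clause $(x_{\le v})$. If we want to model the constraint $X = v$, we could rewrite it as $(X \le v) \wedge (X \ge v)$. $X \ge v$ can then be rewritten as $\neg(X \le v - 1)$. To state $X = v$ under the order encoding, we would therefore encode $x_{\le v} \wedge \neg x_{\le v-1}$. A conflicting tuple between two variables, for example $\langle X = v, Y = w \rangle$, can be written in propositional logic and simplified to a CNF clause using De Morgan's Law:
$$\neg\big((x_{\le v} \wedge \neg x_{\le v-1}) \wedge (y_{\le w} \wedge \neg y_{\le w-1})\big) \equiv \neg x_{\le v} \vee x_{\le v-1} \vee \neg y_{\le w} \vee y_{\le w-1}$$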
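A sketch of the conflict clause derived above, with a brute-force check that it forbids exactly the intended tuple. It assumes domains $\{1, \ldots, d\}$ and drops constant literals ($X \le 0$ is always false, $X \le d$ always true); all function names are hypothetical.

```python
def order_vars(d, first_var):
    """SAT variables for X <= u, u = 1, ..., d - 1 (X <= d is always true)."""
    return {u: first_var + u - 1 for u in range(1, d)}


def order_conflict_clause(x_le, dx, v, y_le, dy, w):
    """Clause forbidding <X=v, Y=w>: not x_{<=v} or x_{<=v-1} or not y_{<=w} or y_{<=w-1}.

    Constant literals are dropped: X <= dx is always true (so its negation is
    false) and X <= 0 is always false; likewise for Y.
    """
    clause = []
    for le, d, val in ((x_le, dx, v), (y_le, dy, w)):
        if val < d:
            clause.append(-le[val])
        if val > 1:
            clause.append(le[val - 1])
    return clause


def order_assignment(le, value):
    """Truth value of each order variable when the CSP variable takes `value`."""
    return {var: value <= u for u, var in le.items()}


def satisfied(clause, assignment):
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


x_le, y_le = order_vars(3, 1), order_vars(3, 3)  # X, Y in {1, 2, 3}
clause = order_conflict_clause(x_le, 3, 2, y_le, 3, 3)  # forbid <X=2, Y=3>
print(clause)  # [-2, 1, 4]
violations = [(a, b) for a in range(1, 4) for b in range(1, 4)
              if not satisfied(clause, {**order_assignment(x_le, a), **order_assignment(y_le, b)})]
print(violations)  # [(2, 3)]
```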
3 Feature Computation
In addition to the pure direct, support and order encodings discussed in the previous section, we also consider variants of these encodings in which the clauses that encode the domains of the variables are not included. We omit the domain clauses in order to test whether the constraints of a CSP alone are enough to differentiate the instances. We now briefly describe the features used for CSP and SAT.
CSP Features. We compute features for each of the original CSP instances, as well as SAT features for each of the six encodings (direct, support and order, each with and without domain clauses). We record 36 features directly from the CSP instance using mistral [4]. These include static features, such as statistics about the types of constraints used and the average and maximum domain size, and dynamic statistics recorded by running mistral for 2 seconds: the average and standard deviation of variable weights, the number of nodes, the number of propagations, and a few others.
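To make the static features concrete, the following toy function computes a few statistics of the kind listed above. It is an assumption-laden illustration, not mistral's feature extractor: constraint "kinds" are hypothetical labels, and the dynamic features (which require running a solver) are not shown.

```python
from collections import Counter
from statistics import mean


def static_csp_features(domains, constraint_kinds):
    """A few static CSP features of the kind listed above (illustrative only).

    domains:          dict mapping each CSP variable to its list of values
    constraint_kinds: one label per constraint, e.g. "extensional" or "inequality"
    """
    sizes = [len(dom) for dom in domains.values()]
    features = {
        "num_variables": len(domains),
        "num_constraints": len(constraint_kinds),
        "avg_domain_size": mean(sizes),
        "max_domain_size": max(sizes),
    }
    # fraction of constraints of each kind
    for kind, count in Counter(constraint_kinds).items():
        features["frac_" + kind] = count / len(constraint_kinds)
    return features


print(static_csp_features({"X": [1, 2, 3], "Y": [1, 2], "Z": [1, 2, 3, 4]},
                          ["extensional", "extensional", "inequality", "extensional"]))
```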
SAT Features. We use the 54 features computed by the newest feature computation tool from UBC [13]. These include problem size features, graph-based features, balance features, proximity-to-Horn-formula features, DPLL probing features, and local search probing features.
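For intuition, a few simple CNF statistics in the spirit of the size, balance and Horn-proximity features can be computed as below. These are simplified stand-ins written for illustration; the UBC tool's exact feature definitions may differ, and `basic_sat_features` is a hypothetical name.

```python
def basic_sat_features(num_vars, clauses):
    """Simple CNF statistics; clauses are lists of signed integers (illustrative)."""
    n = len(clauses)
    # A clause is Horn if it contains at most one positive literal.
    horn = sum(1 for c in clauses if sum(1 for lit in c if lit > 0) <= 1)
    positive = sum(1 for c in clauses for lit in c if lit > 0)
    total = sum(len(c) for c in clauses)
    return {
        "num_vars": num_vars,
        "num_clauses": n,
        "clause_var_ratio": n / num_vars,
        "frac_horn_clauses": horn / n,
        "frac_positive_literals": positive / total,  # a crude balance measure
    }


# The CNF of a small direct encoding (two variables, one conflict).
print(basic_sat_features(4, [[1, 2], [-1, -2], [3, 4], [-3, -4], [-1, -3]]))
```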
4 Numerical Results
We implemented a tool to translate a CSP instance specified in the XCSP format [18] into SAT (CNF). At present, it can encode inequality and binary extensional constraints using the direct, support and order encodings.
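The output side of such a translator can be illustrated by serializing clauses in DIMACS CNF, the plain-text format read by most SAT solvers. This sketch assumes that output format and does not cover XCSP parsing; `to_dimacs` is a hypothetical name.

```python
def to_dimacs(num_vars, clauses, comment=None):
    """Serialize clauses (lists of signed ints) in DIMACS CNF format."""
    lines = [] if comment is None else ["c " + comment]
    lines.append(f"p cnf {num_vars} {len(clauses)}")  # header: variables, clauses
    lines.extend(" ".join(str(lit) for lit in clause) + " 0" for clause in clauses)
    return "\n".join(lines) + "\n"


print(to_dimacs(4, [[1, 2], [-1, -2], [3, 4], [-3, -4], [-1, -3]], "direct encoding"), end="")
# c direct encoding
# p cnf 4 5
# 1 2 0
# -1 -2 0
# 3 4 0
# -3 -4 0
# -1 -3 0
```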
Benchmarks. For our evaluation, we consider CSP problem instances from the CSP solver competition (instances available at http://www.cril.univ-artois.fr/~lecoutre/benchmarks.html). Of these, we consider the instances that contain either inequality or binary extensional constraints. This yields a pool of 2,433 instances, containing Graph Colouring, Random, Quasi-random, Black Hole, Quasigroup Completion, Quasigroup With Holes, Langford, Towers of Hanoi and Pigeon Hole problems.
Portfolio Approach. To train our portfolios we used the ISAC methodology [7], which has been shown to work better than regression-based approaches [14]. ISAC uses the computed features to cluster the instances. Then, for each cluster, the best solver in the portfolio is selected. When a new instance needs to be solved, its features are computed, it is assigned to the nearest cluster, and it is subsequently solved using that cluster's solver.
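The cluster-then-select idea can be sketched as follows. This is not ISAC's exact procedure: features are scaled to [-1, 1] with training minima and maxima, plain k-means with a fixed k and naive initialisation stands in for ISAC's clustering step, and runtimes are assumed to be already penalized. All function names are hypothetical.

```python
import math


def fit_scaler(rows):
    """Return a function scaling each feature to [-1, 1] using the training min/max."""
    lo = [min(col) for col in zip(*rows)]
    hi = [max(col) for col in zip(*rows)]

    def scale(row):
        return [2 * (x - a) / (b - a) - 1 if b > a else 0.0
                for x, a, b in zip(row, lo, hi)]
    return scale


def nearest(centers, point):
    return min(range(len(centers)), key=lambda i: math.dist(centers[i], point))


def kmeans(points, k, iters=20):
    centers = [list(p) for p in points[:k]]  # naive deterministic initialisation
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(centers, p)].append(p)
        # move each center to the mean of its group (keep it if the group is empty)
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers


def train(features, runtimes, k):
    """runtimes[i][s]: penalized runtime of solver s on training instance i."""
    scale = fit_scaler(features)
    points = [scale(f) for f in features]
    centers = kmeans(points, k)
    solvers = range(len(runtimes[0]))
    best = []
    for c in range(len(centers)):
        members = [i for i, p in enumerate(points) if nearest(centers, p) == c]
        members = members or list(range(len(points)))  # empty cluster: use all
        best.append(min(solvers, key=lambda s: sum(runtimes[i][s] for i in members)))
    return scale, centers, best


def select_solver(model, feature_vector):
    """Assign a new instance to its nearest cluster and return that cluster's solver."""
    scale, centers, best = model
    return best[nearest(centers, scale(feature_vector))]


model = train([[0, 0], [0, 1], [10, 10], [10, 11]],
              [[1, 50], [2, 40], [60, 3], [70, 4]], k=2)
print(select_solver(model, [1, 0.5]), select_solver(model, [9, 10]))  # 0 1
```

Scaling matters because raw features live on very different ranges; without it, a single large-valued feature would dominate the distances.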
For our CSP solver portfolio we used abscon [12], csp4j [11], sat4j [10], pcs [21], gecode [2], and sugar [19]. Each solver was run on each instance with a time limit of 3,600 seconds. It is important to note that we include the time required for encoding the instances and computing the features as part of the computation time.
Table 1: PAR 10 and number of solved instances for each approach and problem representation (ND: SAT encoding without domain clauses).

PAR 10
Approach         CSP    Direct  Direct ND  Order  Order ND  Support  Support ND
VBS              1792   1887    1793       1806   1806      1810     1811
Portfolio        2066   3312    3221       2689   2077      2084     2022
Random Cluster   3806   3705    3424       3725   3797      3867     3902
Best Single      4776   4870    4777       4789   4789      4792     4792

Number Solved
Approach         CSP    Direct  Direct ND  Order  Order ND  Support  Support ND
VBS              2315   2310    2315       2315   2315      2315     2315
Portfolio        2297   2215    2220       2256   2297      2297     2301
Random Cluster   2180   2188    2206       2187   2182      2177     2175
Best Single      2115   2110    2115       2115   2115      2115     2115
We perform our experiments using stratified 10-fold cross-validation. In Table 1, we present, for each problem representation, both the number of solved instances and the penalized average runtime (PAR 10), which counts each timeout as taking 10 times the time limit. The SAT encodings without the variable domain clauses are marked with ND. We compare the portfolio performance to the best single solver as well as to the oracle Virtual Best Solver (VBS), which always selects the fastest solver for every instance. As we can see, using a portfolio approach for CSP instances is always preferable to just choosing to run a single solver. We also compare to a random clustering approach, which randomly groups the instances of the test set into the same number of clusters as the portfolio method and finds the best solver to run on each group. Note that the random clustering is trained on the same data it is evaluated on, and further that in practice one would not know to which cluster to assign a new instance. The random clustering approach is included to show that the clusters found by ISAC indeed capture important information about the instances; this is confirmed by the portfolio outperforming random clustering in all cases.
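The evaluation measures can be sketched as below, assuming a runtime of `None` (or one at or beyond the time limit) marks an unsolved run; the function names are hypothetical.

```python
def par10(runtimes, timeout=3600.0):
    """Penalized average runtime: an unsolved run (None, or >= timeout) counts as 10 * timeout."""
    penalized = [10 * timeout if t is None or t >= timeout else t for t in runtimes]
    return sum(penalized) / len(penalized)


def num_solved(runtimes, timeout=3600.0):
    return sum(1 for t in runtimes if t is not None and t < timeout)


def vbs(per_instance):
    """Virtual best solver: per instance, the fastest runtime over all solvers (None if all fail)."""
    return [min((t for t in row if t is not None), default=None) for row in per_instance]


times = [10.0, 20.0, None]  # hypothetical runtimes; None marks a timeout
print(par10(times), num_solved(times))  # 12010.0 2
```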
Table 1 also shows that, regardless of the encoding we use, we can always close at least 50% of the performance gap between the best single solver and the virtual best solver. Furthermore, if we use a particularly informative encoding, which in our case is the support encoding without domain clauses, we can even achieve slightly better performance than with features that have been specifically designed for the problem domain (a PAR 10 of 2022 versus 2066, and 2301 versus 2297 solved instances).
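The "gap closed" figure can be checked directly from the PAR 10 rows of Table 1: it is the improvement of the portfolio over the best single solver divided by the improvement of the VBS over the best single solver.

```python
# PAR 10 values from Table 1: (best single solver, portfolio, VBS)
par10_table = {
    "CSP": (4776, 2066, 1792),
    "Direct": (4870, 3312, 1887),
    "Direct ND": (4777, 3221, 1793),
    "Order": (4789, 2689, 1806),
    "Order ND": (4789, 2077, 1806),
    "Support": (4792, 2084, 1810),
    "Support ND": (4792, 2022, 1811),
}


def gap_closed(best_single, portfolio, vbs):
    """Fraction of the best-single-solver-to-VBS gap closed by the portfolio."""
    return (best_single - portfolio) / (best_single - vbs)


for name, (bs, pf, vb) in par10_table.items():
    print(f"{name:<11} {gap_closed(bs, pf, vb):.1%}")
```

The fraction ranges from about 52% for the two direct encodings to about 93% for the support encoding without domain clauses.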
5 Conclusion
In this paper we show that it is possible to encode an instance from one problem domain into another as a preprocessing step for feature computation. In particular, we show that even with the overhead of converting CSP instances to SAT, a CSP portfolio trained on well-established SAT features can perform just as well as one trained on CSP-specific features. These findings show that encoding techniques can retain enough information about the original instance to accurately differentiate classes of instances. Our results serve as a proof of concept for an automated feature generation approach for NP-complete problems that do not have a well-studied feature set. We consider this a step toward problem-independent feature computation for algorithm portfolios, and we plan to analyze it further and extend its applications in the future.
Acknowledgements
The second author was supported by a Paris Kanellakis fellowship at Brown University while conducting the work contained in this document. This document reflects his opinions only and should not be interpreted, either expressed or implied, as those of his current employer.
References
[1] Ansótegui, C., Manyà, F.: Mapping Problems with Finite-Domain Variables into Problems with Boolean Variables. In: The 7th International Conference on Theory and Applications of Satisfiability Testing, SAT 2004 (2004)
[2] Gecode Team: Gecode: Generic Constraint Development Environment (2006), http://www.gecode.org
[3] Gent, I.P.: Arc Consistency in SAT. In: Proceedings of the 15th European Conference on Artificial Intelligence, ECAI 2002. pp. 121–125 (2002)
[4] Hebrard, E.: Mistral, a Constraint Satisfaction Library. In: Proceedings of the Third International CSP Solver Competition (2009)
[5] Hurley, B., O'Sullivan, B.: Adaptation in a CBR-Based Solver Portfolio for the Satisfiability Problem. In: Case-Based Reasoning Research and Development, 20th International Conference, ICCBR 2012. pp. 152–166 (2012)
[6] Kadioglu, S., Malitsky, Y., Sabharwal, A., Samulowitz, H., Sellmann, M.: Algorithm Selection and Scheduling. In: Proceedings of the 17th International Conference on Principles and Practice of Constraint Programming, CP'11. pp. 454–469. Springer-Verlag, Berlin, Heidelberg (2011)
[7] Kadioglu, S., Malitsky, Y., Sellmann, M., Tierney, K.: ISAC: Instance-Specific Algorithm Configuration. In: Coelho, H., Studer, R., Wooldridge, M. (eds.) ECAI. Frontiers in Artificial Intelligence and Applications, vol. 215, pp. 751–756. IOS Press (2010)
[8] Kasif, S.: On the Parallel Complexity of Discrete Relaxation in Constraint Satisfaction Networks. Artificial Intelligence 45(3), 275–286 (Oct 1990)
[9] Kroer, C., Malitsky, Y.: Feature Filtering for Instance-Specific Algorithm Configuration. In: IEEE 23rd International Conference on Tools with Artificial Intelligence, ICTAI 2011. pp. 849–855 (2011)
[10] Le Berre, D., Parrain, A.: The Sat4j Library, Release 2.2, System Description. Journal on Satisfiability, Boolean Modeling and Computation 7, 59–64 (2010)
[11] Le Berre, D., Lynce, I.: CSP2SAT4J: A Simple CSP to SAT Translator. In: Proceedings of the 2nd International CSP Solver Competition (2008)
[12] Lecoutre, C., Tabary, S.: Abscon 112, Toward more Robustness. In: Proceedings of the Third International CSP Solver Competition (2009)
[13] Xu, L., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Features for SAT (2012), http://www.cs.ubc.ca/labs/beta/Projects/SATzilla/Report_SAT_features.pdf
[14] Malitsky, Y., Sabharwal, A., Samulowitz, H., Sellmann, M.: Non-Model-Based Algorithm Portfolios for SAT. In: Proceedings of the 14th International Conference on Theory and Application of Satisfiability Testing, SAT'11. pp. 369–370. Springer-Verlag, Berlin, Heidelberg (2011), http://dl.acm.org/citation.cfm?id=2023474.2023517
[15] O'Mahony, E., Hebrard, E., Holland, A., Nugent, C., O'Sullivan, B.: Using Case-Based Reasoning in an Algorithm Portfolio for Constraint Solving. In: Proceedings of the 19th Irish Conference on Artificial Intelligence and Cognitive Science (2008)
[16] Prestwich, S.D.: CNF Encodings. In: Handbook of Satisfiability, pp. 75–97. IOS Press (2009)
[17] Pulina, L., Tacchella, A.: A Multi-Engine Solver for Quantified Boolean Formulas. In: Proceedings of the 13th International Conference on Principles and Practice of Constraint Programming, CP'07. pp. 574–589. Springer-Verlag, Berlin, Heidelberg (2007)
[18] Roussel, O., Lecoutre, C.: XML Representation of Constraint Networks: Format XCSP 2.1. CoRR abs/0902.2362 (2009)
[19] Tamura, N., Tanjo, T., Banbara, M.: System Description of a SAT-Based CSP Solver Sugar. In: Proceedings of the 3rd International CSP Solver Competition. pp. 71–75 (2009)
[20] Tanjo, T., Tamura, N., Banbara, M.: Azucar: A SAT-Based CSP Solver Using Compact Order Encoding (Tool Presentation). In: Proceedings of the 15th International Conference on Theory and Applications of Satisfiability Testing, SAT 2012. LNCS, vol. 7317, pp. 456–462. Springer (2012)
[21] Veksler, M., Strichman, O.: A Proof-Producing CSP Solver. In: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010 (2010)
[22] Walsh, T.: SAT v CSP. In: Principles and Practice of Constraint Programming, CP 2000. LNCS, vol. 1894, pp. 441–456. Springer-Verlag (2000)
[23] Xu, L., Hutter, F., Hoos, H.H., Leyton-Brown, K.: SATzilla: Portfolio-Based Algorithm Selection for SAT. Journal of Artificial Intelligence Research pp. 565–606 (June 2008)