1 Introduction
Large search spaces are common in artificial intelligence, and heuristics are of major importance in limiting search effort. The role of a heuristic, depending on the type of search algorithm, is to decrease the number of nodes expanded (e.g. in A* search), the number of candidate actions considered (planning), or the number of backtracks in constraint satisfaction problem (CSP) solvers. Nevertheless, some sophisticated heuristics have considerable computational overhead, which significantly decreases their overall effect [Horsch and Havens2000, Kask et al.2004] and can even increase total runtime in pathological cases. It has been recognized that control of this overhead can be essential to improve search performance, e.g. by selecting which heuristics to evaluate in a manner dependent on the state of the search [Wallace and Freuder1992, Domshlak et al.2010].
We propose a rational metareasoning approach [Russell and Wefald1991] to decide when and how to deploy heuristics, using CSP backtracking search as a case study. The heuristics examined are various solution count estimate heuristics for value ordering [Meisels et al.1997, Horsch and Havens2000], which are expensive to compute, but can significantly decrease the number of backtracks. These heuristics make a good case study, as their overall utility, taking computational overhead into account, is sometimes detrimental; and yet, by employing these heuristics adaptively, it may still be possible to achieve an overall runtime improvement, even in these pathological cases. Following the metareasoning approach, the value of information (VOI) of a heuristic is defined in terms of total search time saved, and the heuristic is computed only when its expected net VOI is positive.
We begin with background on metareasoning and CSP (Section 2), followed by a restatement of value ordering in terms of rational metareasoning (Section 3), allowing a definition of the VOI of value-ordering heuristics, which is a contribution of this paper. This scheme is then instantiated to handle our case study of backtracking search in CSP (Section 4), with parameters specific to value-ordering heuristics based on solution-count estimates, the main contribution of this paper. Empirical results (Section 5) show that the proposed mechanism successfully balances the tradeoff between decreasing backtracking and heuristic computational overhead, resulting in a significant overall search time reduction. Other aspects of such tradeoffs are also analyzed empirically. Finally, possible future extensions of the proposed mechanism are discussed (Section 6), together with an examination of related work.
2 Background
2.1 Rational metareasoning
In rational metareasoning [Russell and Wefald1991], a problem-solving agent can perform base-level actions from a known set $\{A_i\}$. Before committing to an action, the agent may perform a sequence of meta-level “deliberation” actions from a set $\{S_j\}$. At any given time there is an “optimal” base-level action, $A_\alpha$, that maximizes the agent’s expected utility:
$$A_\alpha = \arg\max_{A_i} \sum_k P(W_k)\, U(A_i, W_k) \qquad (1)$$
where $\{W_k\}$ is the set of possible world states, $U(A_i, W_k)$ is the utility of performing action $A_i$ in state $W_k$, and $P(W_k)$ is the probability that the current world state is $W_k$.
A meta-level action $S_j$ provides information and affects the choice of the base-level action $A_\alpha$. The value of information (VOI) of a meta-level action $S_j$ is the expected difference between the expected utility of $S_j$ and the expected utility of the current $A_\alpha$, where $P$ is the current belief distribution about the state of the world, and $P^j$ is the belief-state distribution of the agent after the computational action $S_j$ is performed, given the outcome of $S_j$:
$$V(S_j) = \mathbb{E}_P\left[\mathbb{E}_{P^j}[U(S_j)] - \mathbb{E}_{P^j}[U(A_\alpha)]\right] \qquad (2)$$
Under certain assumptions, it is possible to capture the dependence of utility on time in a separate notion of time cost $C$. Then, Equation (2) can be rewritten as:
$$V(S_j) = \Lambda(S_j) - C(S_j) \qquad (3)$$
where the intrinsic value of information
$$\Lambda(S_j) = \mathbb{E}_P\left[\mathbb{E}_{P^j}[U(A_\alpha^j)] - \mathbb{E}_{P^j}[U(A_\alpha)]\right] \qquad (4)$$
is the expected difference between the intrinsic expected utilities of the new base-level action $A_\alpha^j$ (selected after $S_j$) and the old selected base-level action $A_\alpha$, computed after the meta-level action is taken.
2.2 Constraint satisfaction
A constraint satisfaction problem (CSP) is defined by a set of variables $\mathcal{X} = \{X_1, X_2, \ldots\}$ and a set of constraints $\mathcal{C} = \{C_1, C_2, \ldots\}$. Each variable $X_i$ has a non-empty domain $D_i$ of possible values. Each constraint $C_i$ involves some subset of the variables (the scope of the constraint) and specifies the allowable combinations of values for that subset. An assignment that does not violate any constraints is called consistent (or a solution). There are numerous variants of CSP settings and algorithmic paradigms. This paper focuses on binary CSPs over discrete-valued variables, and on backtracking search algorithms [Tsang1993].
A basic method used in numerous CSP search algorithms is that of maintaining arc consistency (MAC) [Sabin and Freuder1997]. There are several versions of MAC; all share the common notion of arc consistency. A variable $X_i$ is arc-consistent with $X_j$ if for every value $a$ of $X_i$ from the domain $D_i$ there is a value $b$ of $X_j$ from the domain $D_j$ satisfying the constraint between $X_i$ and $X_j$. MAC maintains arc consistency for all pairs of variables, and speeds up backtracking search by pruning many inconsistent branches.
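The notion of arc consistency can be illustrated with a minimal Python sketch of the textbook AC-3 procedure. This is not the modified MAC implementation used in the experiments; the names `revise`, `ac3`, `domains`, and `allowed` (a map from an ordered pair of variables to its set of permitted value pairs) are assumptions made for illustration.

```python
from collections import deque

def revise(domains, allowed, xi, xj):
    """Drop every value of xi that has no supporting value in the domain of xj."""
    removed = False
    for a in list(domains[xi]):
        if not any((a, b) in allowed[(xi, xj)] for b in domains[xj]):
            domains[xi].remove(a)
            removed = True
    return removed

def ac3(domains, allowed):
    """Enforce arc consistency in place; return False if a domain is wiped out."""
    queue = deque(allowed.keys())
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, allowed, xi, xj):
            if not domains[xi]:
                return False
            # The domain of xi shrank, so arcs pointing at xi must be re-checked.
            queue.extend((xk, xl) for (xk, xl) in allowed if xl == xi and xk != xj)
    return True

# Toy binary CSP (illustrative): X < Y with both domains {1, 2, 3}.
pairs = {(a, b) for a in (1, 2, 3) for b in (1, 2, 3) if a < b}
domains = {"X": {1, 2, 3}, "Y": {1, 2, 3}}
allowed = {("X", "Y"): pairs, ("Y", "X"): {(b, a) for (a, b) in pairs}}
consistent = ac3(domains, allowed)
```

In this toy instance the value 3 is pruned from the domain of X and the value 1 from the domain of Y, since neither has a supporting value in the other variable.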
CSP backtracking search algorithms typically employ both variable ordering [Tsang1993] and value ordering heuristics. The latter include minimum conflicts [Tsang1993], which orders values by the number of conflicts they cause with unassigned variables; Geelen’s promise [Geelen1992], which orders values by the product of domain sizes; and minimum impact [Refalo2004], which orders values by the relative impact of the value assignment on the product of the domain sizes.
Some value-ordering heuristics are based on solution count estimates [Meisels et al.1997, Horsch and Havens2000, Kask et al.2004]: solution counts for each value assignment of the current variable are estimated, and assignments (branches) with the greatest solution count are searched first. The heuristics are based on the assumption that the estimates are correlated with the true number of solutions, and thus a greater solution count estimate means a higher probability that a solution will be found in a branch, as well as a shorter search time to find the first solution if one exists in that branch. [Meisels et al.1997] estimate solution counts by approximating marginal probabilities in a Bayesian network derived from the constraint graph; [Horsch and Havens2000] propose the probabilistic arc consistency heuristic (pAC) based on iterative belief propagation for better accuracy of relative solution count estimates; [Kask et al.2004] adapt Iterative Join-Graph Propagation to solution counting, allowing a tradeoff between accuracy and complexity. These methods vary in computation time and precision, although all are rather computationally heavy. Principles of rational metareasoning can be applied independently of the choice of implementation, to decide when to deploy these heuristics.
3 Rational Value-Ordering
The role of (dynamic) value-ordering is to determine the order of values to assign to a variable $X_k$ from its domain $D_k$, at a search state where values have already been assigned to $X_1, \ldots, X_{k-1}$. We make the standard assumption that the ordering may depend on the search state, but is not recomputed as a result of backtracking from the initial value assignments to $X_k$: a new ordering is considered only after backtracking up the search tree above $X_k$.
Value ordering heuristics provide information on future search efforts, which can be summarized by two parameters:
• $T_i$: the expected time to find a solution containing the assignment $X_k = y_{ki}$ or verify that there are no such solutions;
• $p_i$: the “backtracking probability”, that there will be no solution consistent with $X_k = y_{ki}$.
These are treated as the algorithm’s subjective probabilities about future search in the current problem instance, rather than actual distributions over problem instances. Assuming correct values of these parameters, and independence of backtracks, the expected remaining search time in the subtree under $X_k$ for ordering $\omega$ is given by:
$$T^{s|\omega} = T_{\omega(1)} + \sum_{i=2}^{|D_k|} T_{\omega(i)} \prod_{j=1}^{i-1} p_{\omega(j)} \qquad (5)$$
In terms of rational metareasoning, the “current” optimal base-level action is picking the $\omega$ which optimizes $T^{s|\omega}$. Based on a general property of functions on sequences [Monma and Sidney1979], it can be shown that $T^{s|\omega}$ is minimal if the values are sorted by increasing order of $\frac{T_i}{1-p_i}$.
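As a quick sanity check of the ordering rule, the following Python sketch evaluates the expected remaining search time of an ordering (the time of each value weighted by the probability that all earlier values backtracked) and compares the ordering sorted by the ratio of expected time to one minus the backtracking probability against brute-force enumeration. The function names and the numeric times and probabilities are illustrative assumptions, not data from the experiments; backtracking probabilities are assumed to be strictly below 1.

```python
from itertools import permutations

def expected_search_time(order, T, p):
    """Expected time when values are tried in `order`.

    T[i] is the expected time spent on value i and p[i] the probability
    of backtracking from it (independence of backtracks is assumed).
    """
    total, reach = 0.0, 1.0
    for i in order:
        total += reach * T[i]   # value i is tried only if all earlier values failed
        reach *= p[i]
    return total

def rational_order(T, p):
    """Sort value indices by increasing T[i] / (1 - p[i])."""
    return sorted(range(len(T)), key=lambda i: T[i] / (1.0 - p[i]))

# Illustrative parameters for a four-value domain.
T = [4.0, 1.0, 3.0, 2.0]
p = [0.2, 0.9, 0.5, 0.5]

best = rational_order(T, p)
brute = min(permutations(range(len(T))),
            key=lambda order: expected_search_time(order, T, p))
```

For small domains the brute-force minimum can be used this way to validate an implementation of the sorting rule before it is trusted on larger domains, where enumeration is infeasible.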
A candidate heuristic $H$ (with computation time $T^H$) generates an ordering by providing an updated (hopefully more precise) value of the parameters $T_i$, $p_i$ for value assignments $X_k = y_{ki}$, which may lead to a new optimal ordering $\omega_H$, corresponding to a new base-level action. The total expected remaining search time is given by:
$$T = T^H + \mathbb{E}\left[T^{s|\omega_H}\right] \qquad (6)$$
Since both $T^H$ (the “time cost” of $H$ in metareasoning terms) and $T^{s|\omega_H}$ contribute to $T$, even a heuristic that improves the estimates and ordering may not be useful. It may be better not to deploy $H$ at all, or to update $T_i$, $p_i$ only for some of the assignments. According to the rational metareasoning approach (Section 2.1), the intrinsic VOI $\Lambda_i$ of estimating $T_i$, $p_i$ for the $i$th assignment is the expected decrease in the expected search time:
$$\Lambda_i = \mathbb{E}\left[T^{s|\omega_-} - T^{s|\omega_{+i}}\right] \qquad (7)$$
where $\omega_-$ is the optimal ordering based on priors, and $\omega_{+i}$ on values after updating $T_i$, $p_i$. Computing new estimates (with overhead $T^c$) for values $T_i$, $p_i$ is beneficial just when the net VOI is positive:
$$V_i = \Lambda_i - T^c > 0 \qquad (8)$$
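To simplify estimation of $\Lambda_i$, the expected search time of an ordering is estimated as though the parameters are computed only for $\omega_-(1)$ (essentially the metareasoning subtree independence assumption). Other value assignments are assumed to have the prior (“default”) parameters $T_{def}$, $p_{def}$. Assume w.l.o.g. that $\omega_-(1) = 1$:
$$T^{s|\omega_-} = T_1 + p_1 \sum_{i=2}^{|D_k|} T_{def}\, p_{def}^{\,i-2} \qquad (9)$$
and the intrinsic VOI of the $i$th deliberation action is:
$$\Lambda_i = \mathbb{E}\left[G(T_i, p_i)\, \mathbf{1}\!\left\{\frac{T_i}{1-p_i} < \frac{T_1}{1-p_1}\right\}\right] \qquad (10)$$
where $G(T_i, p_i)$ is the search time gain given the heuristically computed values $T_i$, $p_i$:
$$G(T_i, p_i) = T_1 - T_i + (p_1 - p_i) \sum_{i'=2}^{|D_k|} T_{def}\, p_{def}^{\,i'-2} \qquad (11)$$
In some cases, $H$ provides estimates only for the expected search time $T_i$. In such cases, the backtracking probability $p_i$ can be bounded by the Markov inequality as the probability for the given assignment that the time $t$ to find a solution or verify that no solution exists is at least the time $\tau$ to find all solutions under that assignment (taken to be roughly the same for every assignment): $p_i = P(t \ge \tau) \le \frac{T_i}{\tau}$, and the bound can be used as the probability estimate:
$$p_i \approx \frac{T_i}{\tau} \qquad (12)$$
Furthermore, note that in harder problems the probability of backtracking from variable $X_k$ is proportional to $p_{def}^{\,|D_k|-1}$, and it is reasonable to assume that backtracking probabilities above $X_k$ (trying values for $X_1, \ldots, X_{k-1}$) are still significantly greater than 0. Thus, the “default” backtracking probability $p_{def}$ is close to 1, and consequently:
$$T_{def} \approx \tau, \qquad \sum_{i=2}^{|D_k|} T_{def}\, p_{def}^{\,i-2} \approx (|D_k| - 1)\, \tau \qquad (13)$$
By substituting (12), (13) into (11), estimate (14) for $G(T_i, p_i)$ is obtained:
$$G(T_i, p_i) \approx T_1 - T_i + \frac{T_1 - T_i}{\tau}\, (|D_k| - 1)\, \tau = |D_k|\, (T_1 - T_i) \qquad (14)$$
Finally, since (12), (13) imply that $T_i < T_1 \Leftrightarrow \frac{T_i}{1-p_i} < \frac{T_1}{1-p_1}$,
$$\Lambda_i \approx \mathbb{E}\left[\,|D_k|\, (T_1 - T_i)\, \mathbf{1}\{T_i < T_1\}\right] \qquad (15)$$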
4 VOI of Solution Count Estimates
The estimated solution count for an assignment may be used to estimate the expected time to find a solution for the assignment under the following assumptions (we do not claim that this is a valid model of CSP search; rather, we argue that even with such a crude model one can get significant runtime improvements):
• Solutions are roughly evenly distributed in the search space, that is, the distribution of time to find a solution can be modeled by a Poisson process.
• Finding all solutions for an assignment $X_k = y_{ki}$ takes roughly the same time for all assignments to the variable $X_k$. Prior work [Meisels et al.1997, Kask et al.2004] demonstrates that ignoring the differences in subproblem sizes is justified.
• The expected time to find all solutions for an assignment divided by its solution count estimate is a reasonable estimate for the expected time to find a single solution.
Based on these assumptions, $T_i$ can be estimated as $\frac{T^{all}}{|D_k|\, n_i}$, where $T^{all}$ is the expected time to find all solutions for all values of $X_k$, and $n_i$ is the solution count estimate for $y_{ki}$; likewise, $T_1 = \frac{T^{all}}{|D_k|\, n_{max}}$, where $n_{max}$ is the currently greatest $n_i$. Substituting the expressions for $T_i$, $T_1$ into (15) gives the intrinsic VOI of computing $n_i$:
$$\Lambda_i = T^{all} \sum_{n=n_{max}}^{\infty} P(n, \nu) \left(\frac{1}{n_{max}} - \frac{1}{n}\right) \qquad (16)$$
where $P(n, \nu) = e^{-\nu} \frac{\nu^n}{n!}$ is the probability, according to the Poisson distribution, to find $n$ solutions for a particular assignment when the mean number of solutions per assignment is $\nu = \frac{N}{|D_k|}$, and $N$ is the estimated solution count for all values of $X_k$, computed at an earlier stage of the algorithm.
Neither $T^{all}$ nor $T^c$, the time to estimate the solution count for an assignment, is known. However, for relatively low solution counts, when an invocation of the heuristic has high intrinsic VOI, both $T^{all}$ and $T^c$ are mostly determined by the time spent eliminating non-solutions. Therefore, $T^c$ can be assumed approximately proportional to $\frac{T^{all}}{|D_k|}$, the average time to find all solutions for a single assignment, with an unknown factor $\gamma$:
$$T^c \approx \gamma\, \frac{T^{all}}{|D_k|} \qquad (17)$$
Then, $T^{all}$ can be eliminated from both $T^c$ and $\Lambda_i$. Following Equation (8), the solution count should be estimated whenever the net VOI is positive:
$$V(n_{max}) \propto |D_k|\, e^{-\nu} \sum_{n=n_{max}}^{\infty} \left(\frac{1}{n_{max}} - \frac{1}{n}\right) \frac{\nu^n}{n!} - \gamma \qquad (18)$$
The infinite series in (18) converges rapidly, and an approximation of the sum can be computed efficiently. As done in Section 5, $\gamma$ can be learned offline from a set of problem instances of a certain kind for the given implementation of the search algorithm and the solution counting heuristic.
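The deployment test can be sketched in Python as a truncated Poisson-weighted series: the sum over solution counts at least as large as the current maximum, each weighted by its Poisson probability and by the difference of reciprocals, scaled by the domain size, minus the overhead factor. The function names, the truncation tolerance, the use of the ceiling when the current maximum is not an integer, and the numeric arguments are all assumptions of this sketch rather than details confirmed by the paper.

```python
import math

def poisson_pmf(n, nu):
    """Poisson probability of n events with mean nu, computed in log space."""
    return math.exp(-nu + n * math.log(nu) - math.lgamma(n + 1))

def net_voi(n_max, nu, domain_size, gamma, tol=1e-12):
    """Quantity proportional to the net VOI of one more solution count estimate.

    n_max: currently greatest solution count estimate (must be positive)
    nu: mean number of solutions per assignment (must be positive)
    gamma: overhead factor of the heuristic (tuned offline)
    """
    if n_max <= 0 or nu <= 0:
        raise ValueError("n_max and nu must be positive")
    total = 0.0
    n = math.ceil(n_max)              # first integer count not below n_max
    while True:
        pmf = poisson_pmf(n, nu)
        total += pmf * (1.0 / n_max - 1.0 / n)
        if n > nu and pmf < tol:      # past the mode and negligible: stop
            break
        n += 1
    return domain_size * total - gamma

# Illustrative call: estimate only while the result is positive.
worth_estimating = net_voi(n_max=1.0, nu=2.0, domain_size=10, gamma=1e-3) > 0
```

Because the Poisson tail decays faster than geometrically beyond the mean, the loop stops after a handful of terms for moderate mean counts.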
Algorithm 1 implements rational value ordering. The procedure receives a problem instance $csp$ with assigned values for variables $X_1, \ldots, X_{k-1}$, the variable $X_k$ to be ordered, and an estimate $N$ of the number of solutions of the problem instance (line 1); $N$ is computed at the previous step of the backtracking algorithm as the solution count estimate for the chosen assignment for $X_{k-1}$, or, if $k = 1$, at the beginning of the search as the total solution count estimate for the instance. Solution count estimates for some of the assignments are recomputed (lines 4–9), and then the domain of $X_k$, ordered by non-increasing solution count estimates of value assignments, is returned (lines 11–12).
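One plausible reading of the procedure described above can be sketched in Python as follows. The original listing of Algorithm 1 is not reproduced here, so several details are assumptions: every value starts with the prior estimate (total count divided by domain size), values are examined in domain order, and estimation stops at the first non-positive net VOI. The callables `estimate_count` and `voi` are hypothetical placeholders for the solution counting heuristic and the net VOI test.

```python
def rational_value_ordering(values, total_count, estimate_count, voi):
    """Order `values` by non-increasing solution count estimates.

    values: domain of the variable being ordered
    total_count: estimated number of solutions over all values
    estimate_count(value): expensive solution count estimate (placeholder)
    voi(n_max, nu, domain_size): net VOI of one more estimate (placeholder)
    """
    nu = total_count / len(values)        # prior mean count per value
    counts = {v: nu for v in values}      # assumed prior for every value
    n_max = nu
    for v in values:
        if voi(n_max, nu, len(values)) <= 0:
            break                         # further estimates are not worth the cost
        counts[v] = estimate_count(v)
        n_max = max(n_max, counts[v])
    # sorted() is stable, so values with equal estimates keep domain order.
    return sorted(values, key=lambda v: counts[v], reverse=True)

# Illustrative stubs: a lookup table as the "heuristic" and a toy VOI rule.
true_counts = {"a": 1, "b": 9, "c": 2, "d": 7}
calls = []

def stub_estimate(value):
    calls.append(value)
    return true_counts[value]

order = rational_value_ordering(
    ["a", "b", "c", "d"], total_count=8,
    estimate_count=stub_estimate,
    voi=lambda n_max, nu, size: 5 - n_max)
```

With these stubs only the first two values are actually estimated; once a value with a large count is found, the toy VOI rule turns negative and the remaining values keep their prior estimate.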
5 Empirical Evaluation
Specifying the algorithm parameter $\gamma$ is the first issue. $\gamma$ should be a characteristic of the implementation of the search algorithm, rather than of the problem instance; it is also desirable that the performance of the algorithm not be too sensitive to fine tuning of this parameter.
Most of the experiments were conducted on sets of random problem instances generated according to Model RB [Xu and Li2000]. The empirical evaluation was performed in two stages. In the first stage, several benchmarks were solved for a wide range of values of $\gamma$, and an appropriate value for $\gamma$ was chosen. In the second stage, the search was run on two sets of problem instances with the chosen $\gamma$, as well as with exhaustive deployment, and with the minimum conflicts heuristic, and the search time distributions were compared for each of the value ordering heuristics.
The AC-3 version of MAC was used for the experiments, with some modifications [Sabin and Freuder1997]. Variables were ordered using the maximum degree variable ordering heuristic. (A dynamic variable ordering heuristic, such as dom/deg, may result in shorter search times in general, but gave no significant improvement in our experiments; on the other hand, static variable ordering simplifies the analysis.) The solution counting heuristic was based on the solution count estimate proposed in [Meisels et al.1997]. The source code is available from http://ftp.davidashen.net/vsc.tar.gz.
5.1 Benchmarks
CSP benchmarks from CSP Solver Competition 2005 [Boussemart et al.2005] were used. 14 out of 26 benchmarks solved by at least one of the solvers submitted for the competition could be solved with 30 minutes timeout by the solver used for this empirical study for all values of : and the exponential range , as well as with the minimum-conflicts heuristic and the pAC heuristic.
Figure 1.a shows the mean search time of VOI-driven solution count estimate deployment normalized by the search time of exhaustive deployment (), for the minimum conflicts heuristic , and for the pAC heuristic . The shortest search time on average is achieved by VSC for (shaded in the figure) and is much shorter than for SC (); the improvement is actually close to getting all the information provided by the heuristic without paying the overhead at all. For all but one of the 14 benchmarks the search time for VSC with is shorter than for MC. For most values of , VSC gives better results than MC (). pAC always results in the longest search time due to the computational overhead.
Figure 1.b shows the mean number of backtracks of VOI-driven deployment normalized by the number of backtracks of exhaustive deployment , the minimum conflicts heuristic , and for the pAC heuristic . VSC causes less backtracking than MC for (). pAC always causes less backtracking than other heuristics, but has overwhelming computational overhead.
Figure 1.c shows , the number of estimated solution counts of VOI-driven deployment, normalized by the number of estimated solution counts of exhaustive deployment . When and the best search time is achieved, the solution counts are estimated only in a relatively small number of search states: the average number of estimations is ten times smaller than in the exhaustive case (, ).
The results show that although the solution counting heuristic may provide significant improvement in the search time, further improvement is achieved when the solution count is estimated only in a small fraction of occasions selected using rational metareasoning.
5.2 Random instances
Based on the results on benchmarks, we fixed the value of $\gamma$ and applied it to two sets of 100 problem instances. Exhaustive deployment, rational deployment, the minimum conflicts heuristic, and probabilistic arc consistency were compared.
The first, easier, set was generated with 30 variables, 30 values per domain, 280 constraints, and 220 nogoods per constraint. Search time distributions are presented in Figure 2.a. The shortest mean search time is achieved for rational deployment, with exhaustive deployment next (), followed by the minimum conflicts heuristic () and probabilistic arc consistency (). Additionally, while the search time distributions for solution counting are sharp (, ), the distribution for the minimum conflicts heuristic has a long tail with a much longer worst case time ().
The second, harder, set was generated with 40 variables, 19 values, 410 constraints, 90 nogood pairs per constraint. Search time distributions are presented in Figure 2.b. As with the first set, the shortest mean search time is achieved for rational deployment: , while the relative mean search time for the minimum conflicts heuristic is much longer: . The probabilistic arc consistency heuristic resulted again in the longest search time due to the overhead of computing relative solution count estimates by loopy belief propagation: .
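For readers who want to produce instances of comparable size, the following Python sketch generates random binary CSPs from the four parameters quoted above (variables, values per domain, constraints, nogoods per constraint). It is written loosely in the spirit of Model RB; the exact sampling details used for the experiments are not given here, so drawing scopes independently (which allows repeated scopes), sampling nogoods without repetition, and the function name and seed are all assumptions.

```python
import random

def random_binary_csp(n_vars, n_values, n_constraints, n_nogoods, seed=0):
    """Return a list of (scope, nogoods) pairs for a random binary CSP.

    scope: a pair of distinct variable indices
    nogoods: a set of forbidden (value, value) pairs for that scope
    """
    rng = random.Random(seed)
    all_pairs = [(a, b) for a in range(n_values) for b in range(n_values)]
    constraints = []
    for _ in range(n_constraints):
        scope = tuple(rng.sample(range(n_vars), 2))      # two distinct variables
        nogoods = set(rng.sample(all_pairs, n_nogoods))  # distinct forbidden pairs
        constraints.append((scope, nogoods))
    return constraints

# Sizes of the two sets described in the text.
easy = random_binary_csp(30, 30, 280, 220)
hard = random_binary_csp(40, 19, 410, 90)
```

Fixing the seed makes a generated set reproducible, which matters when several value ordering heuristics are compared on the same instances.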
Thus, the value of $\gamma$ chosen based on a small set of hard instances gives good results on a set of instances with different parameters and of varying hardness.
5.3 Generalized Sudoku
Randomly generated problem instances have played a key role in the design and study of heuristics for CSP. However, one might argue that the benefits of our scheme are specific to Model RB. Indeed, real-world problem instances often have much more structure than random instances generated according to Model RB. Hence, we repeated the experiments on randomly generated Generalized Sudoku instances [Ansótegui et al.2006], since this domain is highly structured, and thus a better source of realistic problems with a controlled measure of hardness.
The search was run on two sets of 100 Generalized Sudoku instances, with 4x3 tiles and 90 holes and with 7x4 tiles and 357 holes, with holes punched using the doubly balanced method [Ansótegui et al.2006]. The search was repeated on each instance with the exhaustive solution-counting, VOI-driven solution counting (with the same value of $\gamma$ as for the Model RB problems), minimum conflicts, and probabilistic arc consistency value ordering heuristics. Results are summarized in Table 1 and show that the relative performance of the methods on Generalized Sudoku is similar to the performance on Model RB.
Instances | Mean time, exhaustive SC (sec) | VOI-driven SC / SC | MC / SC | pAC / SC
4x3, 90 holes | 1.809 | 0.755 | 1.278 | 22.421
7x4, 357 holes | 21.328 | 0.868 | 3.889 | 3.826
5.4 Deployment patterns
One might ask whether trivial methods for selective deployment would work. We examined deployment patterns of VOI-driven SC with () on several instances of different hardness. For all instances, the solution counts were estimated at varying rates during all stages of the search, and the deployment patterns differ between the instances, so a simple deployment scheme seems unlikely.
VOI-driven deployment also compares favorably to random deployment. Table 2 shows performance of VOI-driven deployment for and of uniform random deployment, with total number of solution count estimations equal to that of the VOI-driven deployment. For both schemes, the values for which solution counts were not estimated were ordered randomly, and the search was repeated 20 times. The mean search time for the random deployment is times longer than for the VOI-driven deployment, and has times greater standard deviation.
, sec  , sec  , sec  

VOIdriven  19.841  19.815  0.188 
random  31.421  42.085  20.038 
6 Discussion and related work
The principles of bounded rationality appear in [Horvitz1987]. [Russell and Wefald1991] provided a formal description of rational metareasoning and case studies of applications in several problem domains. A typical use of rational metareasoning in search is in finding which node to expand, or, in a CSP context, in determining a variable or value assignment. The approach taken in this paper adapts these methods to the decision of whether to spend the time to compute a heuristic.
Runtime selection of heuristics has lately been of interest, e.g. deploying heuristics for planning [Domshlak et al.2010]. The approach taken is usually that of learning which heuristics to deploy based on features of the search state. Although our approach can also benefit from learning, since we have a parameter that needs to be tuned, its value is mostly algorithm-dependent, rather than problem-instance-dependent. This simplifies learning considerably, as opposed to having to learn a classifier from scratch. Comparing metareasoning techniques to learning techniques (or possibly a combination of both, e.g. by learning more precise distribution models) is an interesting issue for future research.
Although rational metareasoning is applicable to other types of heuristics, solution-count estimation heuristics are natural candidates for the type of optimization suggested in this paper. [Dechter and Pearl1987] first suggested solution count estimates as a value-ordering heuristic (using propagation on trees) for constraint satisfaction problems, refined in [Meisels et al.1997] to multi-path propagation.
[Horsch and Havens2000] used a value-ordering heuristic that estimated relative solution counts to solve constraint satisfaction problems and demonstrated the efficiency of their algorithm (called pAC, probabilistic Arc Consistency). However, the computational overhead of the heuristic was large, and the relative solution counts were computed offline. [Kask et al.2004] introduced a CSP algorithm with a solution counting heuristic based on Iterative Join-Graph Propagation (IJGP-SC), and empirically showed performance advances over MAC in most cases. In several cases IJGP-SC was still slower than MAC due to the computational overhead.
Impact-based value ordering [Refalo2004] is another heavy informative heuristic. One way to decrease its overhead, suggested in [Refalo2004], is to learn the impact of an assignment by averaging the impact of earlier assignments of the same value to the same variable. Rational deployment of this heuristic by estimating the probability of backtracking based on the impact may be possible, and is an issue for future research. [Gomes et al.2007] propose a technique that adds random generalized XOR constraints and counts solutions with high precision, but at present it requires solving CSPs, and thus seems not to be immediately applicable as a search heuristic.
The work presented in this paper differs from the above related schemes in that it does not attempt to introduce new heuristics or solution-count estimates. Rather, an “off the shelf” heuristic is deployed selectively based on value of information, thereby significantly reducing the heuristic’s “effective” computational overhead, with an improvement in performance for problems of different size and hardness.
In summary, this paper suggests a model for adaptive deployment of value ordering heuristics in algorithms for constraint satisfaction problems. As a case study, the model was applied to a value ordering heuristic based on solution count estimates, and a steady improvement in the overall algorithm performance was achieved compared to always computing the estimates, as well as to other simple deployment tactics. The experiments showed that for many problem instances the optimum performance is achieved when solution counts are estimated only in a relatively small number of search states.
Acknowledgments
The research is partially supported by the IMG4 Consortium under the MAGNET program of the Israeli Ministry of Trade and Industry, by Israel Science Foundation grant 305/09, by the Lynne and William Frankel Center for Computer Sciences, and by the Paul Ivanier Center for Robotics Research and Production Management.
References
 [Ansótegui et al.2006] Carlos Ansótegui, Ramón Béjar, César Fernàndez, Carla Gomes, and Carles Mateu. The impact of balancing on problem hardness in a highly structured domain. In Proc. of 9th Int. Conf. on Theory and Applications of Satisfiability Testing (SAT ’06), 2006.
 [Boussemart et al.2005] Frédéric Boussemart, Fred Hemery, and Christophe Lecoutre. Description and representation of the problems selected for the first international constraint satisfaction solver competition. Technical report, Proc. of CPAI’05 workshop, 2005.
 [Dechter and Pearl1987] Rina Dechter and Judea Pearl. Network-based heuristics for constraint-satisfaction problems. Artif. Intell., 34:1–38, December 1987.
 [Domshlak et al.2010] Carmel Domshlak, Erez Karpas, and Shaul Markovitch. To max or not to max: Online learning for speeding up optimal planning. In AAAI, 2010.
 [Geelen1992] Pieter Andreas Geelen. Dual viewpoint heuristics for binary constraint satisfaction problems. In Proc. 10th European Conf. on AI, ECAI ’92, pages 31–35, New York, NY, USA, 1992. John Wiley & Sons, Inc.
 [Gomes et al.2007] Carla P. Gomes, Willem-Jan van Hoeve, Ashish Sabharwal, and Bart Selman. Counting CSP solutions using generalized XOR constraints. In AAAI, pages 204–209, 2007.
 [Horsch and Havens2000] Michael C. Horsch and William S. Havens. Probabilistic arc consistency: A connection between constraint reasoning and probabilistic reasoning. In UAI, pages 282–290, 2000.
 [Horvitz1987] Eric J. Horvitz. Reasoning about beliefs and actions under computational resource constraints. In Proceedings of the 1987 Workshop on Uncertainty in Artificial Intelligence, pages 429–444, 1987.
 [Kask et al.2004] Kalev Kask, Rina Dechter, and Vibhav Gogate. Counting-based look-ahead schemes for constraint satisfaction. In Proc. of 10th Int. Conf. on Constraint Programming (CP’04), pages 317–331, 2004.
 [Meisels et al.1997] Amnon Meisels, Solomon Eyal Shimony, and Gadi Solotorevsky. Bayes networks for estimating the number of solutions to a CSP. In Proc. of the 14th National Conference on AI, pages 179–184, 1997.
 [Monma and Sidney1979] Clyde L. Monma and Jeffrey B. Sidney. Sequencing with series-parallel precedence constraints. Mathematics of Operations Research, 4(3):215–224, August 1979.
 [Refalo2004] Philippe Refalo. Impact-based search strategies for constraint programming. In CP, pages 557–571. Springer, 2004.
 [Russell and Wefald1991] Stuart Russell and Eric Wefald. Do the right thing: studies in limited rationality. MIT Press, Cambridge, MA, USA, 1991.
 [Sabin and Freuder1997] Daniel Sabin and Eugene C. Freuder. Understanding and improving the MAC algorithm. In 3rd Int. Conf. on Principles and Practice of Constraint Programming, LNCS 1330, pages 167–181. Springer, 1997.
 [Tsang1993] Edward Tsang. Foundations of Constraint Satisfaction. Academic Press, London and San Diego, 1993.
 [Wallace and Freuder1992] Richard J. Wallace and Eugene C. Freuder. Ordering heuristics for arc consistency algorithms. In AI/GI/VI ’92, pages 163–169, 1992.

 [Xu and Li2000] Ke Xu and Wei Li. Exact phase transitions in random constraint satisfaction problems. Journal of Artificial Intelligence Research, 12:93–103, 2000.