aspcud: A Linux Package Configuration Tool Based on Answer Set Programming

09/01/2011 ∙ by Martin Gebser, et al. ∙ 0

We present the Linux package configuration tool aspcud based on Answer Set Programming. In particular, we detail aspcud's preprocessor turning a CUDF specification into a set of logical facts.




1 Introduction

Answer Set Programming (ASP; [4]) owes its increasing popularity as a tool for Knowledge Representation and Reasoning (KRR; [12]) to its attractive combination of a rich yet simple modeling language with high-performance solving capacities. The basic idea of ASP is to represent a given computational problem by a logic program whose answer sets correspond to solutions, and then use an ASP solver for finding answer sets of the program. This approach is closely related to the one pursued in propositional Satisfiability Testing (SAT; [5]), where a given problem is encoded as a propositional theory such that models represent solutions to the problem. Even though, syntactically, ASP programs resemble Prolog programs, they are treated by rather different computational mechanisms, based on advanced Boolean Constraint Satisfaction technology. Although SAT and ASP both focus on the generation of propositional models, they differ regarding the semantics of negation, which is classical in SAT and by default in ASP. The built-in completion of “negative knowledge” admits compact problem specifications in ASP, using rules to describe the formation of solution candidates and integrity constraints to deny unintended ones.








Figure 1: Workflow of aspcud.

Pioneering work on Linux package configuration was done by Tommi Syrjänen in [17], using ASP for representing and solving configuration problems for the Debian GNU/Linux system. Following this tradition, we developed the ASP-based Linux package configuration tool aspcud, leveraging modern ASP technology for solving package configuration problems posed in the context of the mancoosi project [14]. As shown in Figure 1, aspcud comprises four components, all of which are freely available at [3] (and via [16]). A given specification (in CUDF; [18]) is first preprocessed and mapped to a set of (logical) facts; this step is explained in Section 2. As detailed in Section 3, the facts are then combined with one or more (first-order) ASP encodings of the package configuration problem and jointly passed to the ASP grounder gringo [8]. (Our ASP encodings, which are also presented in a companion paper [7] detailing multi-criteria optimization capacities of the ASP solver clasp [9] and evaluating them on package configuration problems, are provided here for completeness.) The instantiation of first-order variables upon grounding results in a propositional logic program whose answer sets, representing problem solutions, are in turn computed by clasp. The impact of preprocessing on residual problem size as well as solving efficiency is empirically assessed in Section 4. (We do not vary solving strategies here; an experimental comparison between different solving strategies can be found in [6, 7].) Finally, in Section 5, we discuss and compare our methodology with related package configuration approaches.

2 Preprocessing

Our package configuration tool aspcud accepts input in Common Upgradability Description Format (CUDF), developed in the mancoosi project to specify interdependencies of packages belonging to large software distributions. The task of a package manager is to find admissible installations satisfying particular user requests, typically also taking into account soft criteria, such as minimal change of an existing installation. While CUDF admits arithmetic expressions, package formulae, and virtual packages (see below), aspcud’s preprocessor generates a flat representation of package interdependencies, so that they can be conveniently handled by the ASP components of aspcud taking over afterwards. Below, we give a quick overview of CUDF and optimization criteria, and then describe the generation of ASP facts.

2.1 Common Upgradability Description Format (CUDF)

The general schema of a “CUDF document” (with an optional preamble; cf. [18]) is as follows:

 package: name    package: name    package: name
 version: vers    version: vers    version: vers     request:
 description      description      description       description

The pairs (name_i, vers_i), one for each package stanza i, identify installable packages along with positive integer versions; they must be mutually distinct, that is, name_i ≠ name_j or vers_i ≠ vers_j must hold for all i ≠ j. Then, the universe described by a CUDF document is the set of these pairs, identifying installable versioned packages.

Each pair can be accompanied with (optional) properties provided in description. In the most general form, a statement in description looks as follows:

 property: expr | ... | expr, ..., expr | ... | expr

In such a statement, property determines a kind of package interdependency (conflicts, depends, recommends, or provides), ‘|’ and ‘,’ stand for disjunction and conjunction, respectively, and each expr is an expression of the form ‘name [op n]’, in which op denotes an (optional) arithmetic operation along with a positive integer n. Moreover, if ‘installed: true’ is provided in the description of a pair, it means that package name in version vers belongs to an existing installation, and the set of all such pairs forms the existing installation.

For a property (install, remove, or upgrade) in the description below the keyword ‘request:’, for uniformity, we assume the same syntax as with package property statements considered before. (Footnote 1: The specification of CUDF [18] is more restrictive by not allowing for disjunction in package formulae associated with some properties. Moreover, note that CUDF additionally admits keep as a property in package descriptions, which we omitted here because it is straightforward to map keep to install.) The requested properties describe goals that must be satisfied by a follow-up installation, where certain versioned packages might have to be installed, removed, or upgraded, respectively.

In order to abstract from arithmetic expressions admitted in CUDF, for ‘name [op n]’, we define:

We extend the notion of targets to package formulae associated with some property by defining the following multiset (footnote 2: multisets are needed to reflect optimization criteria dealing with (un)satisfied recommendations, which are collected in a dedicated set below):

Moreover, let be for and , where either a unique package formula is provided for property in description, or if property is not specified in description. Likewise, we let for if no corresponding statement is provided in the description below ‘request:’, while the package formula defining property must be unique otherwise.
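The abstraction from arithmetic expressions can be illustrated by a minimal Python sketch. It assumes the usual CUDF comparison operators (=, !=, <, <=, >, >=) and, as a simplification, matches an expression only against a finite set of candidate pairs; the names `parse_expr` and `targets` are hypothetical and not part of aspcud.

```python
import operator
import re

# Comparison operators that may appear in 'name [op n]' (assumed CUDF set).
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

# A name, optionally followed by an operator and a positive integer.
EXPR = re.compile(r"\s*([^\s<>=!]+)\s*(?:(!=|<=|>=|=|<|>)\s*(\d+))?\s*")


def parse_expr(text):
    """Split 'name [op n]' into (name, op, n); op and n are None if absent."""
    match = EXPR.fullmatch(text)
    if match is None:
        raise ValueError(f"not a package expression: {text!r}")
    name, op, n = match.groups()
    return name, op, (int(n) if n is not None else None)


def targets(text, candidates):
    """Return the (name, version) pairs among candidates matched by text."""
    name, op, n = parse_expr(text)
    return {(p, v) for (p, v) in candidates
            if p == name and (op is None or OPS[op](v, n))}


# Versions of dep and conf as they occur in the example document of Figure 2.
DEP = {("dep", 1), ("dep", 2), ("dep", 3)}
CONF = {("conf", 1), ("conf", 2)}
print(sorted(targets("dep < 2", DEP)))
print(sorted(targets("conf > 1", CONF)))
```

An expression without an operator, such as ‘dep’, matches every version of the named package in this sketch.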

 package:    inst
 version:    3
 conflicts:  conf < 3

 package:    inst
 version:    2
 depends:    dep < 2

 package:    inst
 version:    1
 depends:    dep

 package:    conf
 version:    2

 package:    conf
 version:    1
 installed:  true

 package:    feat
 version:    1
 provides:   conf = 3

 package:    dep
 version:    3
 conflicts:  dep
 recommends: recomm

 package:    dep
 version:    2
 conflicts:  dep < 2

 package:    dep
 version:    1
 installed:  true

 package:    recomm
 version:    1
 conflicts:  option

 package:    option
 version:    1
 depends:    avail

 package:    avail
 version:    1
 installed:  true

 request:
 install:    inst
 upgrade:    conf > 1
Figure 2: CUDF document specifying the (non-empty) interdependencies , , , , , , , , and ; (non-empty) request targets consist of and .

As an example, consider the CUDF document shown in Figure 2. The existing installation, marked via ‘installed: true’, is {(conf,1), (dep,1), (avail,1)}. The universe, including all versioned packages, is {(inst,3), (inst,2), (inst,1), (conf,2), (conf,1), (feat,1), (dep,3), (dep,2), (dep,1), (recomm,1), (option,1), (avail,1)}. The CUDF document further specifies the (non-empty) multisets of targets of package interdependencies and requests, respectively, provided in the caption of Figure 2; their particular meanings are described below in the context of ASP fact generation.
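How the universe and the existing installation can be read off a CUDF document may be sketched as follows. This is a simplified illustration, not aspcud's preprocessor: it assumes blank-line separated stanzas of ‘key: value’ lines, uses only an excerpt of the example document, and the helper names are hypothetical.

```python
# Excerpt of the example document (five package stanzas plus the request).
EXCERPT = """\
package: inst
version: 3
conflicts: conf < 3

package: conf
version: 2

package: conf
version: 1
installed: true

package: dep
version: 1
installed: true

package: avail
version: 1
installed: true

request:
install: inst
upgrade: conf > 1
"""


def parse_stanzas(text):
    """Split a document into stanzas, each a list of (key, value) pairs."""
    stanzas, current = [], []
    for line in text.splitlines():
        if not line.strip():          # a blank line closes the current stanza
            if current:
                stanzas.append(current)
                current = []
            continue
        key, _, value = line.partition(":")
        current.append((key.strip(), value.strip()))
    if current:
        stanzas.append(current)
    return stanzas


def universe_and_installed(stanzas):
    """Collect all (name, version) pairs and those marked 'installed: true'."""
    universe, installed = set(), set()
    for stanza in stanzas:
        props = dict(stanza)
        if "package" not in props:    # e.g. the request stanza
            continue
        pair = (props["package"], int(props["version"]))
        universe.add(pair)
        if props.get("installed") == "true":
            installed.add(pair)
    return universe, installed


UNIVERSE, INSTALLED = universe_and_installed(parse_stanzas(EXCERPT))
print(sorted(INSTALLED))
```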

2.2 Optimization Criteria

The preprocessor of aspcud takes optimization criteria evaluated in competitions by mancoosi [14] into account. Given a universe , an existing installation , and a follow-up installation , such criteria rely on the minimization or maximization of the following sets:

Here, is the collection of packages name such that some version vers belongs to , while  contains no pair ; that is, package name is new in the follow-up installation . Similarly, and collect packages name that are deleted or changed, respectively, where change means that some version vers of name is new or deleted in the transition from  to . The sets  and  investigate the follow-up installation  relative to the universe . A package name belongs to  if, for each pair in , there is some in  such that ; that is, the latest version of name is missing in . Finally, a triple in  points to a disjunction ‘|||’ in the recommends statement associated with such that neither contains nor provides any element of . In fact, by and , we refer to the union of  and the targets of its packages’ provides statements. This allows us to abstract from “virtual packages” that may not be installable themselves, but can be provided by other packages. Note that installable and virtual packages are not necessarily disjoint; e.g., the CUDF document in Figure 2 specifies version 1 and 2 of conf as installable, while version 3 is provided by . In the following, we indicate the objective of maximizing or minimizing the cardinality of any of the sets  defined above by writing or , respectively.
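The package-level criteria described above can be made concrete with a small Python sketch. It is an illustration under simplifying assumptions: virtual packages and unsatisfied recommendations are ignored, a package counts as not up to date only if some version of it is installed in the follow-up installation, and all function names are hypothetical.

```python
def names(pairs):
    """Project (name, version) pairs to package names."""
    return {name for name, _ in pairs}


def versions(pairs, name):
    """Versions of a given package name occurring in a set of pairs."""
    return {v for n, v in pairs if n == name}


def new_packages(old, new):
    """Names with some version in the follow-up but none in the existing installation."""
    return names(new) - names(old)


def removed_packages(old, new):
    """Names with some version in the existing but none in the follow-up installation."""
    return names(old) - names(new)


def changed_packages(old, new):
    """Names for which some version is new or deleted in the transition."""
    return {n for n in names(old) | names(new)
            if versions(old, n) != versions(new, n)}


def not_up_to_date(universe, new):
    """Installed names whose latest version in the universe is missing (assumes new is a subset of universe)."""
    return {n for n in names(new)
            if max(versions(universe, n)) not in versions(new, n)}


# Universe and existing installation of the example document in Figure 2.
UNIVERSE = {("inst", 3), ("inst", 2), ("inst", 1), ("conf", 2), ("conf", 1),
            ("feat", 1), ("dep", 3), ("dep", 2), ("dep", 1),
            ("recomm", 1), ("option", 1), ("avail", 1)}
OLD = {("conf", 1), ("dep", 1), ("avail", 1)}
# One possible follow-up installation, chosen here for illustration only.
NEW = {("inst", 2), ("conf", 2), ("dep", 1), ("avail", 1)}

print(sorted(new_packages(OLD, NEW)), sorted(changed_packages(OLD, NEW)))
```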

2.3 Generation of ASP Facts

We are now ready to specify the algorithm applied by aspcud’s preprocessor to compute the transitive closure  of versioned packages that may belong to a follow-up installation . The general idea is to include versioned packages by need, that is, if they are among the targets of some install or upgrade request, a depends statement, or may otherwise serve some user-specified objective. (E.g., describes the objective of installing as many new packages as possible, so that all pairs in  such that name does not occur in would be added to .) Given a universe , an existing installation , and a set of objectives, the transitive closure  is computed via Algorithm 1.

In Line 1 of Algorithm 1, “negative” requests given by remove and also upgrade are evaluated; packages that must not be installed are collected in a set of excluded packages to prevent their addition to the closure in the sequel. While exclusions due to remove statements are straightforward (any package fulfilling some remove target must not be installed), the issue becomes more involved with upgrade. On the one hand, any upgrade target resembles an install request because it must be served by some package (directly or via a provided virtual package) in a follow-up installation. On the other hand, there are three additional requirements, which can make the installation of particular packages prohibitive. First, the version number of packages subject to upgrade must not be smaller in a follow-up installation than in the existing installation (if some version is provided by the latter). Second, exactly one version must be available in a follow-up installation, so that packages providing several versions at once cannot belong to it. Third, the install request implied by an upgrade target along with the unique version requirement prohibits the installation of packages providing only non-matching versions. These three conditions are taken into account to reflect upgrade requests in the set of excluded packages. (Footnote 3: The CUDF specification [18] disallows disjunction in upgrade requests, and we here generalize upgrade targets to disjunction in an “arbitrary” way. However, in the case without disjunction, the packages excluded due to an upgrade target cannot belong to a follow-up installation according to the semantics given in [18].) (For the CUDF document in Figure 2, (conf,2) and (feat,1) can fulfill the target of the upgrade request ‘conf > 1’, while (conf,1) is excluded in view of its non-matching version.) Given the set of packages that must not belong to a follow-up installation, the test in Line 2 of Algorithm 1 identifies cases in which install or upgrade targets remain unsatisfiable, regardless of further preprocessing, so that the algorithm can return immediately.

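  1  collect the packages excluded by remove and upgrade requests
  2  if some install or upgrade target cannot be served then return
  3  initialize the closure with non-excluded packages serving install or upgrade targets
  4  if new packages are to be maximized then add versions of packages not occurring in the existing installation
  5  if removals are to be minimized then add versions of packages occurring in the existing installation
  6  if ... then ...
  7  if changes are to be minimized then add versions of packages occurring in the existing installation
  8  if ... then ...
  9  if ... then ...
 10  repeat
 11      collect non-excluded packages matching dependencies of packages in the closure
 12      if unsatisfied recommendations are to be minimized then collect packages serving recommends statements
 13      if latest versions are requested then collect latest versions
 14      add the collected packages to the closure
 15  until nothing new is added
 16  return the closure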
Algorithm 1 Compute transitive closure  wrt. universe , existing installation , and objectives .

Provided that the test in Line 2 failed, packages not in that may serve some install or upgrade target are used to initialize the transitive closure  in Line 3. In Line 4–9, is further extended in view of the objectives in . As already mentioned, it might be desirable to install any version of a package name not occurring in the existing installation  if belongs to , describing the objective of installing as many new packages as possible; if so, is extended accordingly in Line 4. Note that the objectives of the form are useless in practice, as they favor follow-up installations  that are as different from , or as suboptimal regarding latest versions or recommends targets as possible. However, such “anti-optimization” would in principle be allowed in the user track of competitions by mancoosi, and thus Algorithm 1 includes cases to extend  accordingly. The reasonable cases in Line 5 and 7 apply if package removals or changes, respectively, are to be minimized, so that it may help to add all (installed) versions of packages name occurring in  to . For instance, if , aiming at the minimization of package removals, belongs to , , , , , and are added to  in Line 5 for the CUDF document in Figure 2, given that , , and are installed in . Note that the installed pair is not added to , as belongs to .

After its initialization wrt. requests (Line 3) and objectives (Line 4–9), the transitive closure  is successively extended in the loop in Line 10–15 of Algorithm 1. To this end, packages matching some dependency of elements already in  are collected in Line 11, provided that the installation of is not excluded by . Similarly, packages serving recommends statements of elements in  are collected in Line 12, but only if the minimization of unsatisfied recommendations is requested via the objective . Finally, if packages ought to be installed in their latest versions, as it can be specified via , we also collect such latest versions in Line 13. The three cases justifying the addition of packages to  are applied until saturation, and the obtained fixpoint is returned in Line 16. Any package remaining in belongs to , meaning that it must not be installed, or is irrelevant regarding dependencies, requests, and objectives. Hence, packages outside  need not be reflected in ASP facts (described below), so that both instance and residual problem size can be reduced. For the CUDF document in Figure 2, assuming that the objective is provided in , is initialized with

  • (inst,3), (inst,2), and (inst,1) in view of the request ‘install: inst’,

  • (conf,2) and (feat,1) in order to serve ‘upgrade: conf > 1’, and additionally

  • (dep,3), (dep,2), (dep,1), and (avail,1) due to the objective.

While tracking the dependencies of these packages does not contribute any further elements to the closure, if the objective of minimizing unsatisfied recommendations is given, ‘recommends: recomm’ associated with (dep,3) justifies the addition of (recomm,1). The packages still outside the closure are (conf,1), which is excluded due to the provided upgrade request, and (option,1), as it does not support any element of the closure and could thus be included only if objectives rewarding new packages or changes, respectively, were given.
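The saturation loop of the closure computation can be sketched in Python as below. This is a simplified rendering rather than Algorithm 1 itself: the initial set and the excluded set are taken as given, the case for latest versions is omitted, and dependencies and recommendations are assumed to be already resolved to the sets of pairs that can serve them. The function name `closure` and the dictionaries are hypothetical.

```python
def closure(seed, excluded, depends, recommends=None):
    """Extend seed until no non-excluded package serving a dependency
    (or, if given, a recommendation) of a member is missing."""
    closed = set(seed) - set(excluded)
    while True:
        extra = set()
        for pkg in closed:
            extra |= depends.get(pkg, set())
            if recommends is not None:   # only if recommendations matter
                extra |= recommends.get(pkg, set())
        extra -= set(excluded)
        if extra <= closed:              # saturation reached
            return closed
        closed |= extra


# Data resolved by hand from the example document in Figure 2.
SEED = {("inst", 3), ("inst", 2), ("inst", 1), ("conf", 2), ("feat", 1),
        ("dep", 3), ("dep", 2), ("dep", 1), ("avail", 1)}
EXCLUDED = {("conf", 1)}
DEPENDS = {
    ("inst", 2): {("dep", 1)},                           # depends: dep < 2
    ("inst", 1): {("dep", 1), ("dep", 2), ("dep", 3)},   # depends: dep
    ("option", 1): {("avail", 1)},                       # depends: avail
}
RECOMMENDS = {("dep", 3): {("recomm", 1)}}               # recommends: recomm

WITHOUT = closure(SEED, EXCLUDED, DEPENDS)
WITH = closure(SEED, EXCLUDED, DEPENDS, RECOMMENDS)
print(sorted(WITH - WITHOUT))
```

In this sketch, tracking dependencies alone adds nothing to the initial set, whereas taking recommendations into account adds the pair for recomm; the pairs for conf in version 1 and for option stay outside in both cases.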

Given the transitive closure  of relevant packages, the final step of aspcud’s preprocessor is to generate a representation of package interdependencies, requests, and objectives in terms of ASP facts. Note that, in competitions by mancoosi, objectives are lexicographically ordered by significance; hence, we below identify  with a sequence of objectives, written as in increasing order of significance, where and for . We further associate some ASP constant with each (newpackage for , remove for , change for , uptodate for , and recommend for ). Moreover, for any set of packages, we write to refer to some ASP constant associated with the set , where if . Then, the facts obtained for a CUDF document (specifying a universe  and an existing installation ), a sequence  of objectives, and  are collected in  as shown in Figure 3.

Figure 3: ASP facts for a CUDF document, a sequence  of objectives, and a set  of packages.

In Figure 3, the subset  of  groups packages fulfilling targets of package interdependencies or requests in sets , and respective facts introduce constants referring to . While facts over the predicate depends in (1) simply link the targets of dependencies to packages that provide them, recommends in (2) introduces a counter  along with each set  of packages fulfilling a recommendation  because several elements of the multiset may share the same providers . Also note that (2) contributes facts to  (and ) only if for is among the objectives in . The packages  considered by conflict in (3) are obtained by joining all in  before collecting their providers in . Note that can by definition (cf. [18]) not be in conflict with itself, even if it fulfills some ; this situation arises with in Figure 2, where ‘conflicts: dep’ specifies a universal conflict with any version of dep (and packages including dep in their provides statements). Additional conflicts may be induced by upgrade requests in view of their unique version requirement, and thus packages providing different elements of some are marked as conflicting via (4); for instance, the upgrade request ‘conf > 1’ in Figure 2 is reflected by facts ‘conflict(conf,2,).’ and ‘conflict(feat,1,).’, obtained because provides (as a virtual package). Finally, facts over the predicate request in (5) group packages  fulfilling install or upgrade requests to express that some element of  must be included in a follow-up installation . Note that all packages referred to in facts of , via in arguments or belonging to associated with some constant , are elements of the transitive closure ; that is, the package interdependencies and requests specified by  are limited to .

Figure 4: ASP facts obtained for the CUDF document in Figure 2 along with .

The full ASP instance extracted from a CUDF document is obtained by joining the facts described above with further facts. The first group of them, given in (6)–(9) in Figure 3, links packages to the constants of the sets they belong to via the predicate satisfies, where these constants were introduced above. The second group of facts in (10)–(12) describes the transitive closure, the existing installation, and latest versions of packages via the predicates unit, installed, and newestversion. Moreover, facts over the predicate criterion in (13) represent objectives by an associated constant and the polarity along with the position in the sequence of objectives. E.g., the facts obtained for the CUDF document in Figure 2 and a sequence of two objectives (minimizing changes and, with greater significance, removals) are shown in Figure 4. Note that, in view of unspecified objectives regarding recommendations, the respective interdependency of package (dep,3) is not reflected in the facts. However, if the objective of minimizing unsatisfied recommendations were added, a corresponding recommends fact along with further facts describing (recomm,1) (then also included in the closure) would be obtained.
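The second group of facts and the criterion facts might be emitted along the following lines. The sketch is only indicative: the argument structure of installed and newestversion, the restriction of latest versions to the closure, and the use of a positive number for maximization are assumptions, and `asp_facts` is a hypothetical helper rather than part of aspcud.

```python
def asp_facts(closure, installed, objectives):
    """Return fact strings for unit, installed, newestversion, and criterion.

    objectives is a list of (constant, maximize) pairs in increasing
    order of significance.
    """
    latest = {}
    for name, version in closure:
        latest[name] = max(version, latest.get(name, version))
    lines = [f"unit({n},{v})." for n, v in sorted(closure)]
    lines += [f"installed({n},{v})." for n, v in sorted(installed)]
    lines += [f"newestversion({n},{v})." for n, v in sorted(latest.items())]
    for position, (constant, maximize) in enumerate(objectives, start=1):
        # Assumed convention: negative polarity marks minimization.
        polarity = position if maximize else -position
        lines.append(f"criterion({constant},{polarity}).")
    return lines


CLOSURE = {("inst", 3), ("inst", 2), ("inst", 1), ("conf", 2), ("feat", 1),
           ("dep", 3), ("dep", 2), ("dep", 1), ("avail", 1)}
INSTALLED = {("conf", 1), ("dep", 1), ("avail", 1)}
FACTS = asp_facts(CLOSURE, INSTALLED, [("change", False), ("remove", False)])
print("\n".join(FACTS))
```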

3 Grounding and Solving

The facts  generated by the preprocessor serve as problem-specific input to the ASP components of aspcud, viz., the grounder gringo [8] and the solver clasp [9], while general knowledge about package configuration problems is provided via encodings. For one, the encoding configuration.lp in Figure 5 specifies admissible follow-up installations ; for another, optimization.lp in Figure 6 encodes optimization criteria (violations) and corresponding penalties. The encodings are written in the first-order input language of gringo, which instantiates the contained variables wrt.  to produce a propositional representation suitable for clasp. For space reasons, we confine the presentation to the encodings that appeared to be most successful in our preliminary, systematic experiments and are thus used by default in aspcud. However, major strengths of ASP are its first-order input language and the availability of grounders to instantiate them; this enables rapid prototyping of alternative problem formulations, and we indeed tested several encoding variants before deciding for the ones provided next.

3.1 Hard Constraints

Hard requirements for follow-up installations are encoded in configuration.lp. Here, the rules in Line 3–10 are used to abstract from versions if a property applies to all (installable) versions of a package. Note that variables are universally quantified, where P stands for the name of a package, X for a version of P, and D is an identifier for a set of packages. In view of this, the auxiliary predicate pconflict defined in Line 3 projects out versions X from facts over conflict. The rule in Line 4 then lifts a conflict between some version of P (and packages fulfilling D) to the package name P, provided that all (installable) versions X conflict with D; in fact, the condition ‘conflict(P,X,D) : unit(P,X)’, evaluated wrt. values for P and D given through pconflict(P,D), refers to the conjunction of conflict(P,X,D) over all instances of X such that unit(P,X) holds. From the facts in Figure 4, lifted conflicts for conf and feat are derived via instances of the rules in Line 3 and 4, as the corresponding conflict facts are provided for the only (installable) versions 2 and 1 of conf and feat, respectively. The same approach to lift properties to package names P is applied to dependencies and satisfaction relationships (i.e., membership in a set referred to by some identifier, given via facts over the predicate satisfies).
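The effect of the conditional literal in Line 4 can be mimicked procedurally. The following Python sketch lifts a conflict to a package name exactly when every version listed via unit conflicts with the same identifier; the identifiers "c1" and "c2" and the function name `lift` are made up for illustration.

```python
def lift(conflicts, units):
    """Derive (P, D) from triples (P, X, D) if all units (P, X) conflict with D."""
    versions = {}
    for p, x in units:
        versions.setdefault(p, set()).add(x)
    # Counterpart of pconflict(P,D): project out the version.
    candidates = {(p, d) for p, _, d in conflicts}
    return {(p, d) for p, d in candidates
            if all((p, x, d) in conflicts for x in versions.get(p, ()))}


UNITS = {("conf", 2), ("feat", 1), ("dep", 1), ("dep", 2)}
CONFLICTS = {("conf", 2, "c1"), ("feat", 1, "c1"), ("dep", 2, "c2")}
LIFTED = lift(CONFLICTS, UNITS)
print(sorted(LIFTED))
```

Here dep is not lifted because only one of its two listed versions conflicts with "c2", whereas conf and feat each have a single version that does conflict.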

  1  % analyze package interdependencies

  3  pconflict(P,D) :-   conflict(P,X,D).
  4   conflict(P,D) :-  pconflict(P,  D),  conflict(P,X,D) : unit(P,X).

  6    pdepends(P,D) :-    depends(P,X,D).
  7     depends(P,D) :-   pdepends(P,  D),   depends(P,X,D) : unit(P,X).

  9  psatisfies(P,D) :-  satisfies(P,X,D).
 10   satisfies(P,D) :- psatisfies(P,  D), satisfies(P,X,D) : unit(P,X).

 12  % generate follow-up installation

 14  { in(P,X) }     :- unit(P,X).
 15    in(P)         :-   in(P,X).

 17  forbidden(D)    :-   in(P,X),  conflict(P,X,D).
 18  forbidden(D)    :-   in(P),    conflict(P,  D).

 20  requested(D)    :-   in(P,X),   depends(P,X,D).
 21  requested(D)    :-   in(P),     depends(P,  D).

 23  satisfied(D)    :-   in(P,X), satisfies(P,X,D).
 24  satisfied(D)    :-   in(P),   satisfies(P,  D).

 26   :-   request(D), not satisfied(D).
 27   :- requested(D), not satisfied(D).
 28   :- forbidden(D),     satisfied(D).

 30  % project output

 32  #hide.  #show in/2.
Figure 5: ASP encoding of follow-up installations  wrt. facts  (configuration.lp).

While the rules described so far derive deterministic properties from facts, the “choice” rule in Line 14 of configuration.lp allows for guessing a follow-up installation . It describes that, for any instance of specified by the predicate unit, one may freely choose whether to include in(P,X) in an answer set; and a follow-up installation  is given by the instances of in(P,X) belonging to an answer set. Hence, the rule in Line 14 opens up the candidate space for , which is however limited to the transitive closure  (determined via Algorithm 1) because facts over unit do not include packages outside . The rule in Line 15 again abstracts from the version X of a package P in  by projecting out X from in(P,X). Once guessed, it remains to check whether a follow-up installation  is admissible. To this end, the rules in Line 17–24 collect the identifiers of target sets  of package interdependencies, divided by forbidden and requested target sets in view of conflicts and dependencies, respectively, of packages in , and satisfied target sets are determined in turn. The actual checks are implemented via the “constraints” in Line 26–28, which deny follow-up installations  such that the target set of a request (due to some install or upgrade statement in the original CUDF document) or a requested package dependency is not satisfied; furthermore, a target set forbidden in view of some conflict must not be satisfied. For instance, the requirement expressed by ‘request().’ in Figure 4 along with the constraint in Line 26 deny follow-up installations  that do not include any of the packages , , and because satisfied() can be derived only if in(inst,) holds for some . If so, an instance of the rule in Line 23 as well as the rules in Line 15 and 24 apply, where the latter relies on satisfies(inst,), which abstracts from versions of inst. 
Note that such abstractions and the rules in Line 18, 21, and 24 exploiting them are in principle redundant, since analogous rules considering versions in Line 17, 20, and 23 achieve the same effect, once a version X of P is determined via in(P,X). However, our preliminary empirical comparisons between several encoding variants suggested configuration.lp in Figure 5 as the most “efficient” encoding. Finally, an admissible follow-up installation  can be read off from instances of in(P,X) belonging to an answer set, and so we confine its displayed part accordingly in Line 32.
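The checks performed by the constraints in Line 26–28 can be restated as a plain Python predicate over a guessed installation. The sketch only uses the versioned rules (Line 17, 20, and 23), and the facts below are a hand-made subset for the example document with made-up set identifiers such as "r_inst"; they are not the contents of Figure 4.

```python
def admissible(chosen, conflict, depends, satisfies, request):
    """Check a candidate installation against the three constraints."""
    forbidden = {d for p, x, d in conflict if (p, x) in chosen}
    requested = {d for p, x, d in depends if (p, x) in chosen}
    satisfied = {d for p, x, d in satisfies if (p, x) in chosen}
    return (request <= satisfied              # every request is served
            and requested <= satisfied        # every dependency is served
            and not (forbidden & satisfied))  # no conflict is triggered


SATISFIES = {("inst", 1, "r_inst"), ("inst", 2, "r_inst"), ("inst", 3, "r_inst"),
             ("conf", 2, "r_conf"), ("feat", 1, "r_conf"),
             ("dep", 1, "d_dep_lt2"), ("conf", 2, "c_conf_lt3"),
             ("conf", 2, "u_conf"), ("feat", 1, "u_feat")}
DEPENDS = {("inst", 2, "d_dep_lt2")}
CONFLICT = {("inst", 3, "c_conf_lt3"),
            ("conf", 2, "u_feat"), ("feat", 1, "u_conf")}
REQUEST = {"r_inst", "r_conf"}

GOOD = {("inst", 2), ("conf", 2), ("dep", 1)}
BAD = {("inst", 3), ("conf", 2)}
print(admissible(GOOD, CONFLICT, DEPENDS, SATISFIES, REQUEST),
      admissible(BAD, CONFLICT, DEPENDS, SATISFIES, REQUEST))
```

An ASP solver explores such candidates implicitly; the explicit predicate is meant only to make the three conditions tangible.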

3.2 Soft Constraints

The encoding optimization.lp in Figure 6 builds on top of the generated facts and configuration.lp to identify optimization criteria violations and to assign corresponding penalties. While the rule in Line 1 merely projects out versions X of packages P installed in the existing installation, the rules in Line 5–12 recognize changes, additions, and removals of packages P in the transition from the existing to the follow-up installation. Note that any such violated maintenance condition is considered only if associated objectives are specified via facts over the predicate criterion; for the facts in Figure 4, the rules in Line 5–8 and 11–12 of Figure 6 are applicable, given that the sequence of objectives is expressed via ‘criterion(change,-1).’ and ‘criterion(remove,-2).’ Objectives regarding latest versions of packages and recommendations are addressed by the rules in Line 13–14 and 15–16, respectively. Note that the latter uses a different format, r(P,X,D), to indicate an unserved recommendation D of a package P in version X, where D is an identifier for a target set; in addition, the multiplicity of recommendation targets served by this set is given in R. (Since violations of the other optimization criteria, identified in Line 5–14, are counted once per package name P, their corresponding instances of violated(C,P,1) use 1 as default weight.) The #minimize and #maximize statements in Line 20 and 21 associate penalties (or rewards) with violations of the objectives in a sequence, each reflected by a fact over criterion that combines the objective's constant with its polarity and position. Instances of violated(C,P,W) in an answer set, derived via the rules in Line 5–16, are then penalized (or rewarded) with a priority given by the position of the objective and weight W. Note that summation-based minimization applies (in Line 20) if the objective is to be minimized, or maximization (in Line 21) if it is to be maximized, while a later position in the sequence indicates greater significance than preceding ones.
For instance, the sequence represented by ‘criterion(change,-1).’ and ‘criterion(remove,-2).’ gives preference to the minimization of and then considers the cardinality of for breaking ties. As already mentioned, maximization