Interactive Learning of Acyclic Conditional Preference Networks

01/11/2018 ∙ by Eisa Alanazi, et al.

Learning of user preferences, as represented by, for example, Conditional Preference Networks (CP-nets), has become a core issue in AI research. Recent studies investigate learning of CP-nets from randomly chosen examples or from membership and equivalence queries. To assess the optimality of learning algorithms as well as to better understand the combinatorial structure of classes of CP-nets, it is helpful to calculate certain learning-theoretic information complexity parameters. This paper determines bounds on or exact values of some of the most central information complexity parameters, namely the VC dimension, the (recursive) teaching dimension, the self-directed learning complexity, and the optimal mistake bound, for classes of acyclic CP-nets. We further provide an algorithm that learns tree-structured CP-nets from membership queries. Using our results on complexity parameters, we assess the optimality of our algorithm as well as that of another query learning algorithm for acyclic CP-nets presented in the literature. Our algorithm is near-optimal and can, under certain assumptions, be adapted to the case in which the membership oracle is faulty.


1 Introduction

Preference learning has become a major branch of AI, with applications in decision support systems in general and in e-commerce in particular [1]. For instance, recommender systems based on collaborative filtering make predictions on a single user’s preferences by exploiting information about large groups of users. Other examples are intelligent tutoring systems, which learn a student’s preferences in order to deliver personalized content to the student.

To design and analyze algorithms for learning preferences of a single user, one needs an abstract model for representing user preferences. Some approaches model preferences quantitatively, thus allowing for expressing the relative magnitude of preferences between object pairs, while others are purely qualitative, expressing partial orders or rankings over objects [2, 3].

Most application domains are of multi-attribute form, meaning that the set of possible alternatives (i.e., objects, or outcomes) is defined on a set of attributes and every alternative corresponds to an assignment of values to the attributes. Such combinatorial domains require compact models to capture the preference information in a structured manner. In recent years, various models have been suggested, such as Generalized Additive Decomposable (GAI-net) utility functions [4], Lexicographic Preference Trees [5], and Conditional Preference Networks (CP-nets) [6].

CP-nets provide a compact qualitative preference representation for multi-attribute domains where the preference of one attribute may depend on the values of other attributes. The study of their learnability [7, 8, 9, 10, 11, 4] is an ongoing topic in research on preference elicitation.

For example, Koriche and Zanuttini [7] investigated query learning of k-bounded acyclic CP-nets (i.e., CP-nets with a bound k on the number of attributes on which the preferences for any attribute may depend). Their successful algorithms used both membership and equivalence queries, cf. [12], while they proved that equivalence queries alone are not sufficient for efficient learnability. CP-nets have also been studied in models of passive learning from examples, both for batch learning [8, 9, 10, 4] and for online learning [11, 13].

The focus of our work is on the design of methods for learning CP-nets through interaction with the user, and on an analysis of the complexity of such learning problems. In particular, we study the model of learning from membership queries, in which users are asked for information on their preference between two objects. To the best of our knowledge, algorithms for learning CP-nets from membership queries only have not been studied in the literature yet. We argue below (in Section 7) why algorithms using membership queries alone are of importance to research on preference learning. In a nutshell, membership queries seem to be more easily deployable in preference learning than equivalence queries and, from a theoretical point of view, it is interesting to see how powerful they are in comparison to equivalence queries. The latter alone are known to be insufficient for efficient learning of acyclic CP-nets [7]. Therefore, one major part of this article deals with learning CP-nets from membership queries only. A potential application of our approach is in recommender systems. Many recommender systems request item ratings from users, i.e., they pose queries to the user in the form of an item, and the user is expected to rate the item on a fixed scale. It has been shown, though, that replacing such queries with queries in which users are asked to express a preference between two items may be a more useful approach in recommender systems [14].

In every formal model of learning, a fundamental question in assessing learning algorithms is how many queries or examples would be needed by the best possible learning algorithm in the given model. For several models, lower bounds can be derived from the Vapnik-Chervonenkis dimension (VCD) [15]. This central parameter is one of several that, in addition to yielding bounds on the performance of learning algorithms, provide deep insights into the combinatorial structure of the studied concept class. Such insights can in turn help to design new learning algorithms.

A classical result states that the VC dimension of a concept class is a lower bound on the number of equivalence and membership queries required for learning that class [16]. Likewise, it is known that a parameter called the teaching dimension [17] is a lower bound on the number of membership queries required for learning [18]. Therefore, another major part of this article deals with calculating exact values or non-trivial bounds on a number of learning-theoretic complexity parameters, such as the VC dimension and the teaching dimension. All these complexity parameters are calculated under the assumption that information about user preferences is provided for so-called swaps, exclusively. A swap is a pair of objects that differ in the value of only a single attribute. Learning CP-nets over swap examples is an often studied scenario [7, 8, 19, 20, 13], which we adopt here for various reasons detailed in Section 4.

Our main contributions are the following:

(a) We provide the first study that exactly calculates the VC dimension for the class of unbounded acyclic CP-nets, and we give a lower bound for arbitrary bounds k. So far, the only existing studies present a lower bound [7], which we prove incorrect for large values of k, and asymptotic complexities [21]. The latter show that the VC dimension is Θ(2^n) for unbounded acyclic CP-nets over n binary attributes and Θ(2^k n) for k-bounded ones, in agreement with our result that the VC dimension is 2^n − 1 in the unbounded case, and is at least (2^k − 1) + (n − k)2^k for general values of k. It should be noted that both previous studies assume that CP-nets can be incomplete, i.e., for some attributes, preference relations may not be fully specified. In our study, we first investigate the (not uncommon) assumption that CP-nets are complete, but then we extend each of our results to the more general case that includes incomplete CP-nets as well. Further, some of our results are more general than existing ones in that they also cover the case of CP-nets with multi-valued attributes (as opposed to binary attributes).

As a byproduct of our study, we obtain that the VC dimension of the class of all consistent CP-nets (whether acyclic or cyclic) equals that of the class of all acyclic CP-nets. (A consistent CP-net is one that does not prefer an outcome o over another outcome o′ while at the same time preferring o′ over o. Acyclic CP-nets are always consistent, but cyclic ones are not necessarily so.) Hence, the class of acyclic CP-nets is less expressive than that of all consistent CP-nets, but may (at least in some models) be as hard to learn.

(b) We further provide exact values (or, in some cases, non-trivial bounds) for two other important information complexity parameters, namely the teaching dimension [17] and the recursive teaching dimension [22].

(c) Appendix B gives an in-depth study of structural properties of the class of all complete acyclic CP-nets that are of importance to learning-theoretic studies.

(d) We present a new algorithm that learns tree-structured CP-nets (i.e., the case k = 1) from membership queries only and use our results on the teaching dimension to show that our algorithm is close to optimal. We further extend our algorithm to deal with the general case of k-bounded acyclic CP-nets with bound k > 1.

(e) In most real-world scenarios, one would expect some degree of noise in the responses to membership queries, or that sometimes no response at all is obtained. To address this issue, we demonstrate how, under certain assumptions on the noise and the missing responses, our algorithm for learning tree CP-nets can be adapted to handle incomplete or incorrect answers to membership queries.

(f) We re-assess the degree of optimality of Koriche and Zanuttini’s algorithm for learning bounded acyclic CP-nets, using our result on the VC dimension.

This article extends a previous conference paper [23]. Theorem 4 in this conference paper included an incorrect claim about the so-called self-directed learning complexity of classes of acyclic CP-nets; the incorrect statement has been removed in this extended version.

2 Related Work

This section sets the problems studied in the present paper into the context of the existing literature, both in terms of methods for learning CP-nets and in terms of studies of learning-theoretic complexity parameters in general.

2.1 Learning CP-Nets

The problem of learning CP-nets has recently gained a substantial amount of attention [8, 24, 9, 7, 21, 10, 11, 25, 26, 27, 4].

Both in active and in passive learning, a sub-problem to be solved by many natural learning algorithms is the so-called consistency problem. This decision problem is defined as follows. A problem instance consists of a CP-net N and a set E of user preferences between objects, in the form of “object o is preferred over object o′” or “object o is not preferred over object o′”. The question to be answered is whether or not N is consistent with E, i.e., whether the partial order over objects that is induced by N satisfies o ≻ o′ whenever E states that o is preferred over o′, and satisfies o ⊁ o′ whenever E states that o is not preferred over o′. The consistency problem was shown to be NP-hard even if N is restricted to be an acyclic k-bounded CP-net for some fixed k and even when, for any object pair (o, o′) under consideration, the outcomes o and o′ differ in the values of at most two attributes [8]. Based on this result, Dimopoulos et al. [8] showed that complete acyclic CP-nets with bounded indegree are not efficiently PAC-learnable, i.e., learnable in polynomial time in the PAC model. The authors, however, then showed that such CP-nets are efficiently PAC-learnable from examples that are drawn exclusively from the set of so-called transparent entailments. Specifically, this implies that complete acyclic k-bounded CP-nets are efficiently PAC-learnable from swap examples. Michael and Papageorgiou [26] then provided a comprehensive experimental view on the performance of the algorithm proposed in [8]. Their work also proposed an efficient method for checking whether a given entailment is transparent or not. These studies focus on learning approximations of target CP-nets passively from randomly chosen data. By comparison, all algorithms we propose below learn target CP-nets exactly, and they actively pose queries in order to collect training data, following Angluin’s model of learning from membership queries [12].

Lang and Mengin [9] considered the complexity of learning binary separable CP-nets in various learning settings. (A CP-net is separable if it is 0-bounded, i.e., the preferences over the domain of any attribute are not conditioned on the values of other attributes.)

The literature also includes results on learning CP-nets from noisy examples, namely via statistical hypothesis testing [25], using evolutionary algorithms and metaheuristics [28, 4], or by learning the induced graph directly, which takes time exponential in the number of attributes [10]. These results cannot be compared to the ones presented in the present paper, as (i) they focus on approximating instead of exactly learning the target CP-net, and (ii) the noise models on which they build differ substantially from the settings we consider. In our first setting, there is no noise in the data whatsoever. The second setting we study is one in which the membership oracle may corrupt a certain number of query responses, but there is no randomness to the process. Instead, one analyzes learning under an adversarial assumption on the oracle’s choice of which answers to corrupt, and then investigates whether exact learning is still possible [29, 30, 31]. To the best of our knowledge, this setting has not been studied in the context of CP-nets so far.

As for active learning, Guerin et al. [11] proposed a heuristic online algorithm that is not limited to swap comparisons. The algorithm assumes the user to provide explicit answers of the form “object o is preferred over object o′”, “object o′ is preferred over object o”, or “neither of the two objects is preferred over the other” to any query (o, o′). Labernia et al. [13] proposed another online learning algorithm based on swap observations, where the latter can be noisy. It is assumed that the target CP-net represents the global preference for a group of users and that the noise is due to variations of a user’s preference compared to the global one. The authors formally proved that their algorithm produces a close approximation to the target CP-net and analyzed the algorithm empirically under random noise. Again, as in all the related literature discussed above, the most striking difference to our setting is that these works focus on approximating the target CP-net rather than learning it exactly.

To the best of our knowledge, the only studies of learning CP-nets in Angluin’s query model, where the target concept is identified exactly, are one by Koriche and Zanuttini [7] and one by Labernia et al. [19]. Koriche and Zanuttini assumed perfect oracles and investigated the problem of learning complete and incomplete bounded CP-nets from membership and equivalence queries over the swap instance space. They showed that complete acyclic CP-nets are not learnable from equivalence queries alone but are attribute-efficiently learnable from membership and equivalence queries. Attribute-efficiency means that the number of queries required is upper-bounded by a function that is polynomial in the size of the input, but only logarithmic in the number of attributes. In the case of tree CP-nets, their results hold true even when the equivalence queries may return non-swap examples. The setting considered in their work is more general than ours and exhibits the power of membership queries when it comes to learning CP-nets. Labernia et al. [19] investigated the problem of learning an average CP-net from multiple users using equivalence queries alone. However, neither study addresses the problem of learning complete acyclic CP-nets from membership queries alone, whether corrupted or uncorrupted. We provide a detailed comparison of our methods to those by Koriche and Zanuttini in Section 7. In a nutshell, our algorithm improves on theirs in that it does not require equivalence queries, but has the downside of not being attribute-efficient. The latter is not an artefact of our algorithm—we argue in Section 7 why attribute-efficient learning of CP-nets with membership queries alone is not possible.

2.2 Complexity Parameters in Computational Learning Theory

The Vapnik-Chervonenkis dimension, also called VC dimension [15], is one of the best studied complexity parameters in the computational learning theory literature. Upper and/or lower sample complexity bounds that are linear in the VC dimension are known for various popular models of concept learning, namely for PAC learning, which is a basic model of learning from randomly chosen examples [32], for exact learning from equivalence and membership queries, which is a model of learning from active queries [16], and, in some special cases, even for learning from teachers [33, 34].

Because of these bounds, knowledge of the VC dimension of a concept class C can help in assessing learning algorithms for C. For example, if the number of queries consumed by an algorithm learning C exceeds a known lower bound on the query complexity by only a constant factor, we know that the algorithm is within a constant factor of optimal. For this reason, the VC dimension of classes of CP-nets has been studied in the literature. Koriche and Zanuttini [7], for the purpose of analyzing their algorithms for learning from equivalence and membership queries, established a lower bound on the VC dimension of the class of complete and incomplete k-bounded binary CP-nets. Chevaleyre et al. [21] gave asymptotic estimates of the VC dimension of classes of CP-nets. They showed that the VC dimension of acyclic binary CP-nets is Θ(2^n) for arbitrary (unbounded) CP-nets and Θ(2^k n) for k-bounded CP-nets. Here n is the number of attributes in a CP-net. The results by Chevaleyre et al. are in agreement with our results, which state that the VC dimension is 2^n − 1 for arbitrary acyclic CP-nets and at least (2^k − 1) + (n − k)2^k for k-bounded ones.

Our work improves on both of these contributions. Firstly, we correct a mistake in the lower bound published by Koriche and Zanuttini. Secondly, compared to the asymptotic studies by Chevaleyre et al., we calculate exact values or explicit lower bounds on the VC dimension. Thirdly, we calculate the VC dimension also for the case that the attributes in a CP-net have more than two values.

To the best of our knowledge, there are no other studies that address learning-theoretic complexity parameters of CP-nets. The present paper computes two more parameters, namely the teaching dimension [17] and the recursive teaching dimension [22], both of which address the complexity of machine teaching. Recently, models of machine teaching have received increased attention in the machine learning community [35, 36], since they try to capture the idea of helpfully selected training data as would be expected in many human-centric applications. In our study, teaching complexity parameters are of relevance for two reasons.

First, the teaching dimension is a lower bound on the number of membership queries required for learning [17], and is thus a tool for evaluating the efficiency of our learning algorithms relative to the theoretical optimum.

Second, due to the increased interest in machine teaching, the machine learning community is looking for bounds on the efficiency of teaching in terms of other well-studied parameters. The recursive teaching dimension is the first teaching complexity parameter that was shown to be closely related to the VC dimension. It is known to be at most quadratic in the VC dimension [37], and under certain structural properties it is even equal to the VC dimension [33]. However, it remains open whether or not it is upper-bounded by a function linear in the VC dimension [34]. With the class of all unbounded acyclic CP-nets, we provide the first example of an “interesting” concept class for which the VC dimension and the recursive teaching dimension are equal, provably without satisfying any of the known structural properties that would imply such equality. Thus, our study of the recursive teaching dimension of classes of CP-nets may be of help to ongoing learning-theoretic studies of teaching complexity in general.

3 Background

This section introduces the terminology and notation used subsequently, and motivates the formal settings studied in the rest of the paper.

3.1 Conditional Preference Networks (CP-nets)

We largely follow the notation introduced by Boutilier et al. [6] in their seminal work on CP-nets; the reader is referred to Table 1 for a list of the most important notation used throughout our manuscript.

notation        meaning
n               number of variables
m               size of the domain of each variable
k               upper bound on the number of parents of a variable in a CP-net
V               set of n distinct variables, V = {v_1, ..., v_n}
v_i             variable in V, for 1 ≤ i ≤ n
D_i             domain of variable v_i, for 1 ≤ i ≤ n
O_S             set of vectors (outcomes over S ⊆ V) that assign each v_i ∈ S a value in D_i
O               set of all outcomes over the full variable set V; equal to O_V
o               outcome over the full variable set V, i.e., an element of O
o ≻ o′          outcome o is strictly preferred over outcome o′
o[S]            projection of outcome o onto a set S ⊆ V; o[v] is short for o[{v}]
(o, o′)         swap pair of outcomes
V(o, o′)        swapped variable of (o, o′)
x, x′           o[V(o, o′)] and o′[V(o, o′)], respectively, where (o, o′) is a swap
(o, o′)[S]      projection of o (and also of o′) onto a set S ⊆ V \ {V(o, o′)}
Pa(v)           set of the parent variables of v; note that |Pa(v)| ≤ k
≻_u             conditional preference relation of v in the context of u, where u ∈ O_{Pa(v)}
CPT(v)          conditional preference table of v
|CPT(v)|        size (number of preference statements) of CPT(v); note |CPT(v)| ≤ m^k
E               set of edges in a CP-net, where (v, v′) ∈ E iff v ∈ Pa(v′)
C               a concept class
c               a concept in a concept class
X               instance space over which a concept class is defined
X_swap          instance space of swap examples (without redundancies)
X̄_swap          instance space of swap examples (with redundancies)
c(x)            label that concept c assigns to instance x
VCD(C)          VC dimension of concept class C
TD(C)           teaching dimension of concept class C
TD(c, C)        teaching dimension of concept c with respect to concept class C
RTD(C)          recursive teaching dimension of concept class C
C_ac^k          class of all complete acyclic k-bounded CP-nets, over X_swap
C̄_ac^k          class of all complete or incomplete acyclic k-bounded CP-nets, over X̄_swap
e_max           k(k − 1)/2 + (n − k)k; maximum number of edges in a CP-net in C_ac^k or C̄_ac^k
s_max           (m^k − 1)/(m − 1) + (n − k)m^k; maximum number of statements in a CP-net in C_ac^k or C̄_ac^k
U               smallest possible size of a k-universal set of binary vectors of length n (cf. Definition 8)
LIM             strategy to combat a limited oracle
MAL             strategy to combat a malicious oracle
N(x)            set of all swap instances differing from x in exactly one non-swapped variable
Table 1: Summary of notation.

Let V = {v_1, ..., v_n} be a set of attributes or variables. Each variable v_i ∈ V has a set of possible values (its domain) D_i. We assume that every domain is of a fixed size m ≥ 2, independent of i. An assignment to a set of variables S ⊆ V is a mapping that assigns every variable v_i ∈ S a value from D_i. We denote the set of all assignments of S by O_S and drop the subscript when S = V. A preference is an irreflexive, transitive binary relation ≻ ⊆ O × O. For any o, o′ ∈ O, we write o ≻ o′ (resp. o ⊁ o′) to denote the fact that o is (resp. is not) strictly preferred to o′; here, o and o′ are incomparable w.r.t. ≻ if both o ⊁ o′ and o′ ⊁ o hold. We use o[S] to denote the projection of o ∈ O onto S ⊆ V and write o[v] instead of o[{v}].

The CP-net model captures complex qualitative preference statements in a graphical way. Informally, a CP-net is a set of statements of the form u : x ≻ x′ ≻ ⋯, each of which states that the preference over the domain D_v of a variable v is conditioned upon the assignment u of a set U of other variables, where x ≻ x′ ≻ ⋯ is some permutation of (i.e., total order over) D_v. In particular, when U has the value u and u : x ≻ x′ ≻ ⋯ is given, x is preferred to x′ as a value of v, ceteris paribus (all other things being equal). That is, for any two outcomes o, o′ with o[v] = x and o′[v] = x′, the preference o ≻ o′ holds when i) o[U] = o′[U] = u and ii) o[w] = o′[w] for all w ∈ V \ (U ∪ {v}). In such a case, we say o is preferred to o′ ceteris paribus. Clearly, there can be exponentially many pairs of outcomes (o, o′) that are affected by one such statement.

CP-nets provide a compact representation of preferences over O by providing such statements for every variable. For every v ∈ V, the decision maker (this can be any entity in charge of constructing the preference network, e.g., a computer agent, a person, or a group of people) chooses a set Pa(v) of parent variables that influence the preference order of v. For any u ∈ O_{Pa(v)}, the decision maker may choose to specify a total order ≻_u over D_v. We refer to u : ≻_u as the conditional preference statement of v in the context of u. A Conditional Preference Table for v, CPT(v), is a set of conditional preference statements {u : ≻_u | u ∈ O_{Pa(v)}}.

Definition 1 (CP-net [6]).

Given V, the domains D_i for 1 ≤ i ≤ n, and CPT(v) for every v ∈ V, a CP-net is a directed graph N = (V, E) where, for any v, v′ ∈ V, (v, v′) ∈ E iff v ∈ Pa(v′).

A CP-net is acyclic if it contains no (directed) cycles. We call a CP-net k-bounded, for some k ∈ {0, ..., n − 1}, if each vertex has indegree at most k, i.e., each variable has a parent set of size at most k. CP-nets that are 0-bounded are also called separable; those that are 1-bounded are called tree CP-nets. When speaking about the class of “unbounded” acyclic CP-nets, we refer to the case when no upper bound is given on the indegree of nodes in a CP-net, other than the trivial bound n − 1.

Definition 2.

CPT(v) is said to be complete if, for every element u ∈ O_{Pa(v)}, the preference relation ≻_u is defined, i.e., CPT(v) contains a statement that imposes a total order on D_v for every context u over the parent variables. By contrast, CPT(v) is incomplete if there exists some u ∈ O_{Pa(v)} for which the preference relation ≻_u is empty. Analogously, a CP-net is said to be complete if every CPT it possesses is complete; otherwise it is incomplete.

Note that we do not allow strictly partial orders as preference statements; a preference relation in a CPT must either be empty or impose a total order. In the case of binary CP-nets, which are the focus of the majority of the literature on CP-nets, this restriction is irrelevant, since every order on a domain of two elements is either empty or total. In the non-binary case, though, the requirement that every statement be either empty or a total order is a proper restriction.

It would be possible to also study non-binary CP-nets that are incomplete in the sense that some statements impose proper partial orders, but this extension is not discussed below.

Lastly, we assume that CP-nets are given in their minimal form, i.e., no CPT(v) contains a dummy parent that does not actually affect the preference relation of v.
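For concreteness, the following sketch shows one possible in-memory representation of a CP-net, together with checks for the properties just defined (k-boundedness, acyclicity, completeness). It is a minimal illustration in Python under the notation above; the class and method names are ours, not taken from the paper.

    from itertools import product

    class CPNet:
        def __init__(self, domains, parents, cpts):
            # domains: dict mapping each variable to a tuple of its values
            # parents: dict mapping each variable v to a tuple Pa(v) of its parents
            # cpts:    dict mapping each variable v to a dict from contexts u
            #          (tuples of parent values) to a total order over domains[v],
            #          most preferred value first; a missing context u means that
            #          the preference relation in the context u is empty
            self.domains, self.parents, self.cpts = domains, parents, cpts

        def is_k_bounded(self, k):
            return all(len(pa) <= k for pa in self.parents.values())

        def is_acyclic(self):
            # depth-first search for a cycle in the dependency graph
            visited, on_stack = set(), set()
            def has_cycle(v):
                visited.add(v); on_stack.add(v)
                for p in self.parents[v]:
                    if p in on_stack or (p not in visited and has_cycle(p)):
                        return True
                on_stack.discard(v)
                return False
            return not any(has_cycle(v) for v in self.domains if v not in visited)

        def is_complete(self):
            # complete iff every context over Pa(v) carries a total order on domains[v]
            for v in self.domains:
                for u in product(*(self.domains[p] for p in self.parents[v])):
                    order = self.cpts.get(v, {}).get(u)
                    if order is None or set(order) != set(self.domains[v]):
                        return False
            return True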

Example 1.

Figure 1(a) shows a complete acyclic CP-net over V = {A, B, C} with binary domains. Each variable is annotated with its CPT. For variable A, the user’s preference is unconditional. For C, the preference depends on the values of A and B, i.e., Pa(C) = {A, B}; hence CPT(C) contains one statement for each of the four contexts over {A, B}. Removing any of the four statements in CPT(C) would result in an incomplete CP-net.

Two outcomes o, o′ are swap outcomes (‘swaps’ for short) if they differ in the value of exactly one variable v; in this case, v is called the swapped variable [6].

The size of a preference table for a variable v, denoted by |CPT(v)|, is the number of preference statements it holds, which is |O_{Pa(v)}| = m^{|Pa(v)|} if CPT(v) is complete. The size of a CP-net is defined as the sum of its tables’ sizes. (It might seem more adequate to define the size of a CPT to be larger by a factor that accounts for the fact that each preference statement, being a total order, encodes pairwise preferences between domain values. In the binary case, i.e., when m = 2, this makes no difference. As this technical detail does not affect our results, we ignore it and define the size of a CPT and of a CP-net simply by the overall number of statements.)

Example 2.

In Figure 1, any two outcomes that differ exactly in the value of C form a swap over the swapped variable C. The size of the CP-net is the sum |CPT(A)| + |CPT(B)| + |CPT(C)| of its three tables’ sizes.

We will frequently use the notation s_max = s_max(n, k, m), which refers to the maximum number of statements in any k-bounded acyclic CP-net over n variables, each of domain size m. Note that a CP-net has to be complete in order to attain this maximum size. The following lemma determines s_max.

Lemma 1.

The maximum possible size of a k-bounded acyclic CP-net over n variables of domain size m is given by s_max(n, k, m) = (m^k − 1)/(m − 1) + (n − k)·m^k.

Proof.

We first make the following claim: any k-bounded acyclic CP-net of largest possible size has (i) exactly one variable of indegree i, for any i with 0 ≤ i ≤ k − 1, and (ii) exactly n − k variables of indegree k.

For k = 0, i.e., for separable CP-nets, there is no i with 0 ≤ i ≤ k − 1, so the claim states the existence of exactly n vertices of indegree 0, which is obviously correct. Consider any k-bounded acyclic CP-net N of largest possible size, where k ≥ 1. Since N is acyclic, it has a topological sort. W.l.o.g., suppose v_1, ..., v_n is the sequence of variables in N as they occur in a topological sort. Clearly, v_1 must have indegree 0. If v_2 were also of indegree 0, then N would not be of maximal size, since one could add v_1 as a parent of v_2 without violating the indegree bound k. The resulting CP-net would be of larger size than N, since the size of CPT(v_2) would grow by a factor of m without changing the sizes of the other CPTs. Hence v_2 has indegree 1 in N. With the same argument, one can prove that v_i has indegree i − 1 in N, for 1 ≤ i ≤ k. For the variables v_{k+1}, ..., v_n, one can apply the same argument but has to cap their indegrees at k because N is k-bounded. Hence all of v_{k+1}, ..., v_n have indegree k. This establishes the claim.

It remains to count the maximal number of statements in a CP-net of this specific structure. The maximal number of statements for a given CP-net graph is obviously obtained when the CP-net is complete, i.e., when the CPT of any variable of indegree i has m^i rules. Summing up, we obtain 1 + m + ⋯ + m^{k−1} = (m^k − 1)/(m − 1) statements for the first k variables in the topological sort, plus (n − k)·m^k statements for the remaining n − k variables. ∎

From this lemma, we also know that the maximum possible number of edges in a k-bounded acyclic CP-net is k(k − 1)/2 + (n − k)·k. We will use the notation e_max to refer to this quantity.

Definition 3.

For given n and k, let e_max = e_max(n, k) denote the maximum possible number of edges in a k-bounded acyclic CP-net over n variables.

Note that e_max(n, k) = k(k − 1)/2 + (n − k)·k.
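As a quick sanity check on these two counting formulas, here is a direct computation following the extremal structure established in the proof of Lemma 1 (an illustrative sketch; the function names are ours):

    def s_max(n, k, m):
        # one variable of indegree i for each i = 0, ..., k-1, plus n-k variables
        # of indegree k; a complete CPT of a variable with indegree i has m**i rules
        return sum(m**i for i in range(k)) + (n - k) * m**k

    def e_max(n, k):
        # number of edges of the same extremal structure
        return k * (k - 1) // 2 + (n - k) * k

    assert s_max(4, 2, 2) == (2**2 - 1) // (2 - 1) + (4 - 2) * 2**2   # = 11
    assert e_max(4, 2) == 5                                           # = 1 + 2*2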

Figure 1: An acyclic CP-net over the variables A, B, and C (panel (a), cf. Def. 1) and its induced preference graph (panel (b), cf. Def. 4).

The semantics of CP-nets is described in terms of improving flips. Let u be an assignment of the parents Pa(v) of a variable v, and let ≻_u be the preference order over D_v in the context of u. Then, all else being equal, changing the value of v from x′ to x is an improving flip over v whenever x ≻_u x′.

Example 3.

In Figure 1(a), changing the value of C from its less preferred to its more preferred value, in a fixed context over Pa(C) = {A, B}, is an improving flip with respect to the variable C.

For complete CP-nets, the improving flip notion makes every pair of swap outcomes comparable, i.e., either o ≻ o′ or o′ ≻ o holds [6]. The question “is o ≻ o′?” is then a special case of a so-called dominance query and can be answered directly from the preference table of the swapped variable. Let v be the swapped variable of a swap (o, o′), and let u be the context of Pa(v) in both o and o′. Then o ≻ o′ iff o[v] ≻_u o′[v]. A general dominance query is of the form: given two outcomes o, o′ ∈ O, is o ≻ o′? The answer is yes iff o dominates o′, i.e., there is a sequence of improving flips from o′ to o: a sequence o_1, ..., o_l of outcomes with o_1 = o′, o_l = o, and o_{i+1} obtained from o_i by an improving flip, for all 1 ≤ i < l [6].

Example 4.

In Figure 1(b), one outcome dominates another exactly if the induced preference graph contains a directed path between them, as witnessed by the corresponding sequence of improving flips.
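Both kinds of dominance queries translate directly into code. The following sketch (reusing the illustrative CPNet representation from above; again, all names are ours) answers a swap query by a single CPT lookup and a general dominance query by breadth-first search over improving flips; the search may visit exponentially many outcomes in the worst case, in line with the hardness results discussed below.

    from collections import deque

    def swap_preferred(net, o1, o2):
        # is o1 > o2 entailed, where (o1, o2) is a swap? o1, o2: dicts var -> value
        v = next(w for w in net.domains if o1[w] != o2[w])   # swapped variable
        u = tuple(o1[p] for p in net.parents[v])             # context; same in o1, o2
        order = net.cpts[v].get(u, ())                       # total order, or () if empty
        return (o1[v] in order and o2[v] in order
                and order.index(o1[v]) < order.index(o2[v]))

    def dominates(net, o1, o2):
        # is o1 > o2 entailed, for arbitrary outcomes? BFS over improving flips
        start, goal = tuple(sorted(o2.items())), tuple(sorted(o1.items()))
        queue, seen = deque([start]), {start}
        while queue:
            cur = dict(queue.popleft())
            for v in net.domains:
                u = tuple(cur[p] for p in net.parents[v])
                order = net.cpts[v].get(u, ())
                if cur[v] not in order:
                    continue
                for better in order[:order.index(cur[v])]:   # strictly better values
                    nxt = dict(cur); nxt[v] = better
                    key = tuple(sorted(nxt.items()))
                    if key == goal:
                        return True
                    if key not in seen:
                        seen.add(key); queue.append(key)
        return False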

Definition 4 (Induced Preference Graph [6]).

The induced preference graph of a CP-net N is a directed graph in which each vertex represents an outcome o ∈ O. An edge from o′ to o exists iff (o, o′) is a swap w.r.t. some v ∈ V and going from o′ to o is an improving flip over v, i.e., o[v] ≻_u o′[v] for the context u of Pa(v) in both outcomes.

Therefore, a CP-net N defines a partial order ≻_N over O that is given by the transitive closure of its induced preference graph. If o ≻_N o′, we say N entails o ≻ o′. N is consistent if there is no o ∈ O with o ≻_N o, i.e., if its induced preference graph is acyclic. Acyclic CP-nets are guaranteed to be consistent, while no such guarantee exists for cyclic CP-nets; the consistency of the latter depends on the actual values of the CPTs [6]. Lastly, the complexity of finding the best outcome of an acyclic CP-net has been shown to be linear [6], while the complexity of answering dominance queries depends on the structure of the CP-net: it is PSPACE-complete for arbitrary (cyclic and acyclic) consistent CP-nets [38] and linear in the case of trees [39].

Example 5.

Figure 2 shows an example of a cyclic CP-net that is consistent, while Figure 3 shows an inconsistent one. Note that both share the same CPTs except for one variable. The dotted edges in the induced preference graph of Figure 3 represent a cycle.

Figure 2: An example of a consistent cyclic CP-net over the variables A, B, and C; panel (a) shows the network, panel (b) its induced preference graph.

Figure 3: An example of an inconsistent cyclic CP-net over the variables A, B, and C; panel (a) shows the network, panel (b) its induced preference graph.

3.2 Concept Learning

The first part of our study is concerned with determining—for the case of acyclic CP-nets—the values of information complexity parameters that are typically studied in computational learning theory. By information complexity, we mean the complexity in terms of the amount of information a learning algorithm needs to identify a CP-net. Examples of such complexity notions will be introduced below.

A specific complexity notion corresponds to a specific formal model of machine learning. Each such learning model assumes that there is an information source that supplies the learning algorithm with information about a hidden target concept c*. The latter is a member of a concept class, which is simply the class of potential target concepts and, in the context of this paper, also the class of hypotheses that the learning algorithm can formulate in the attempt to identify the target concept c*.

Formally, one fixes a finite set X, called the instance space, which contains all possible instances (i.e., elements) of an underlying domain. A concept c is then defined as a mapping from X to {0, 1}. Equivalently, c can be seen as the set {x ∈ X | c(x) = 1}, i.e., a subset of the instance space. A concept class C is a set of concepts. Within the scope of our study, the information source (sometimes called oracle) supplies the learning algorithm in some way or another with a set of labeled examples for the target concept c*. A labeled example for c is a pair (x, c(x)), where x ∈ X and c(x) ∈ {0, 1}. Under the set interpretation of concepts, this means that c(x) = 1 if and only if the instance x belongs to the concept c. A concept c is consistent with a set E of labeled examples if and only if c(x) = ℓ for all (x, ℓ) ∈ E, i.e., if every element of E is an example for c.

In practice, a concept is usually encoded by a representation defined based on a representation class [40]. Thus, one usually has some fixed representation class R in mind, with a one-to-one correspondence between the concept class C and its representation class R. We will assume in what follows that the representation class is chosen in a way that minimizes the worst-case size of the representation of any concept in C. Generally, there may be various interpretations of the term “size”; since we focus on learning CP-nets, we use CP-nets as representations for concepts, and the size of a representation is simply the size of the corresponding CP-net as defined above.

At the onset of a learning process, both the oracle and the learning algorithm (often called learner for short) agree on the representation class R (and thus also on the concept class C), but only the oracle knows the target concept c*. After some period of communication with the oracle, the learner is required to identify the target concept either exactly or approximately.

Many learning models have been proposed to deal with different learning settings [40, 12, 41, 42]. These models typically differ in the constraints they impose on the oracle and in the learning goal. One also distinguishes between learners that actively query the oracle for specific information content and learners that passively receive a set of examples chosen solely by the information source. One of the best known passive learning models is the Probably Approximately Correct (PAC) model [42]. The PAC model is concerned with finding, with high probability, a close approximation to the target concept c* from randomly chosen examples. The examples are assumed to be sampled independently from an unknown distribution. On the other end of the spectrum, a model that requires exact identification of c* is Angluin’s model for learning from queries [12]. In this model, the learner actively poses queries of a certain type to the oracle.

In this paper, we consider specifically two types of queries introduced by Angluin [12], namely membership queries and equivalence queries. A membership query is specified by an element x of the instance space, and it represents the question whether or not c* contains x. The oracle supplies the learner with the correct answer, i.e., it provides the label c*(x) in response to the membership query for x. In an equivalence query, the learner specifies a hypothesis h ⊆ X. If h = c*, the learning process is completed, as the learner has then identified the target concept. If h ≠ c*, the learner is provided with a labeled example (x, c*(x)) that witnesses h ≠ c*. That means c*(x) ≠ h(x); note that in this case x can be any element in the symmetric difference of the sets associated with h and c*.
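Operationally, the query protocol is easy to state. The following sketch of the two oracles is ours and purely illustrative; concepts and hypotheses are modeled as frozensets of instances.

    class Oracle:
        # answers queries about a hidden target concept over a finite instance space
        def __init__(self, target, instance_space):
            self.target, self.X = frozenset(target), frozenset(instance_space)

        def membership(self, x):
            # membership query: return the label c*(x)
            return x in self.target

        def equivalence(self, h):
            # equivalence query: (True, None) if h = c*, else a counterexample
            diff = frozenset(h) ^ self.target      # symmetric difference
            if not diff:
                return True, None
            x = next(iter(diff))                   # any witness will do
            return False, (x, x in self.target)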

A class C over some instance space X is learnable from membership and/or equivalence queries via a representation class R for C if there is an algorithm A such that, for every target concept c* ∈ C, A asks polynomially many adaptive membership and/or equivalence queries and then outputs a hypothesis that is equivalent to c*. By adaptivity, we here mean that learning proceeds in rounds; in every round the learner asks a single query and receives an answer from the oracle before deciding on its subsequent query. For the number of queries to be polynomial means that it is upper-bounded by a polynomial in size(c*), where size(c*) is the size of the minimal representation of c* w.r.t. R.

The above definition is concerned only with the information or query complexity, i.e., the number of queries required to exactly identify any target concept. Moreover, C is said to be efficiently learnable from membership and/or equivalence queries if there exists an algorithm that exactly learns C, in the above sense, and runs in time polynomial in size(c*). Every one of the query strategies we describe in Section 7 yields an obvious polynomial-time algorithm in this regard, and thus we will not explicitly mention run-time efficiency of learning algorithms henceforth.

The combinatorial structure of a concept class C has implications for the complexity of learning C, in particular for the sample complexity (sometimes called information complexity), which refers to the number of labeled examples the learner needs in order to identify any target concept in the class under the constraints of a given learning model. One of the most important complexity parameters studied in machine learning is the Vapnik-Chervonenkis dimension (VCD). In what follows, let C be a concept class over the (finite) instance space X.

Definition 5.

[15] A subset X′ ⊆ X is shattered by C if the projection of C onto X′ has 2^{|X′|} concepts. The VC dimension of C, denoted by VCD(C), is the size of the largest subset of X that is shattered by C.

For example, if X contains 5 elements and C is the class of all subsets of X that have size at most 3, then the VC dimension of C is 3. Clearly, no subset X′ ⊆ X of size 4 can be shattered by C, since no concept in C contains all 4 elements of X′. That means one obtains only 15, not the full 16 possible concepts over X′ when projecting C onto X′. However, any subset X′ of size 3 is indeed shattered by C, as every subset of X′ is also a concept in C.
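For such small classes, the VC dimension can be verified by brute force; the following sketch (illustrative code, not from the paper) checks the example above.

    from itertools import combinations

    X = range(5)
    C = [frozenset(s) for r in range(4) for s in combinations(X, r)]  # |c| <= 3

    def shattered(S):
        # S is shattered iff projecting C onto S yields all 2**|S| label patterns
        return len({frozenset(S) & c for c in C}) == 2 ** len(S)

    print(max(len(S) for r in range(6) for S in combinations(X, r)
              if shattered(S)))   # 3: every 3-subset is shattered, no 4-subset is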

The number of randomly chosen examples needed to identify concepts from C in the PAC-learning model is linear in VCD(C) [43, 32]. By contrast to learning from random examples, in teaching models, the learner is provided with well-chosen labeled examples.

Definition 6.

[17, 44] A teaching set for a concept c ∈ C with respect to C is a set E of labeled examples such that c is the only concept in C that satisfies c(x) = ℓ for all (x, ℓ) ∈ E. The teaching dimension of c with respect to C, denoted by TD(c, C), is the size of the smallest teaching set for c with respect to C. The teaching dimension of C, denoted by TD(C), is given by TD(C) = max{TD(c, C) | c ∈ C}.

Consider again the class C of all subsets of size at most 3 over a 5-element instance space. Any concept c containing 3 instances has a teaching set of size 3 in this class: the three positively labeled examples referring to the elements contained in c uniquely determine c. However, concepts with fewer than 3 elements do not have teaching sets smaller than 5, since any set consisting of 2 positive and 2 negative examples agrees with at least two different concepts in C, and so does every set of 1 positive and 3 negative examples and every set of 4 negative examples.

TD_min(C) denotes the smallest TD of any c ∈ C. In the class C just discussed, the value of TD(C) is 5, while the value of TD_min(C) is 3.

A well-studied variation of teaching is called recursive teaching. Its complexity parameter, the recursive teaching dimension (RTD), is defined by recursively removing from C all the concepts with the smallest TD and then taking the maximum over the smallest TDs encountered in that process. For the corresponding definition of teachers, see [22].

Definition 7.

[22] Let C_0 = C and, for all j ≥ 1 such that C_{j−1} ≠ ∅, define C_j = C_{j−1} \ {c ∈ C_{j−1} | TD(c, C_{j−1}) = TD_min(C_{j−1})}. The recursive teaching dimension of C, denoted by RTD(C), is defined by RTD(C) = max_j TD_min(C_j).

As an example, consider the class C = {c_1, ..., c_5} of singletons defined over the instance space X = {x_1, ..., x_5}, where c_i = {x_i}, and let C′ = C ∪ {c_0}, where c_0 is the empty concept, i.e., c_0(x) = 0 for all x ∈ X. Table 2 displays this class along with TD(c, C′) for every c ∈ C′. Since distinguishing the concept c_0 from all other concepts in C′ requires 5 labeled examples, one obtains TD(C′) = 5. However, RTD(C′) = 1, as witnessed by TD_min(C′) = 1 (each singleton concept in C′ can be taught with a single example) and TD_min({c_0}) = 0 (the remaining concept c_0 has a teaching dimension of 0 with respect to the class containing only c_0). Note that also VCD(C′) = 1, since there is no set of two instances that is shattered by C′.

Similarly, one can verify that RTD(C) = VCD(C) = 1.

     x_1  x_2  x_3  x_4  x_5   TD(c, C′)
c_0   0    0    0    0    0        5
c_1   1    0    0    0    0        1
c_2   0    1    0    0    0        1
c_3   0    0    1    0    0        1
c_4   0    0    0    1    0        1
c_5   0    0    0    0    1        1
Table 2: The class of all singletons and the empty concept over a set of 5 instances, along with the teaching dimension value of each individual concept.
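Table 2 can be reproduced mechanically; the sketch below (illustrative code, not from the paper) computes each concept’s teaching dimension by exhaustive search over samples.

    from itertools import combinations

    X = range(5)
    concepts = [frozenset()] + [frozenset({x}) for x in X]   # c_0, c_1, ..., c_5

    def td(c, C):
        # size of a smallest sample consistent with c and with no other concept in C
        for r in range(6):                    # at most |X| = 5 examples are needed
            for sample in combinations(X, r):
                agree = [d for d in C if all((x in d) == (x in c) for x in sample)]
                if agree == [c]:
                    return r

    print([td(c, concepts) for c in concepts])   # [5, 1, 1, 1, 1, 1], as in Table 2
    print(td(frozenset(), [frozenset()]))        # 0, hence RTD(C') = max(1, 0) = 1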

As opposed to the TD, the RTD exhibits interesting relationships to the VCD. For example, if C is a maximum class, i.e., its size meets Sauer’s upper bound [45], and in addition can be “corner-peeled” (corner peeling is a sample compression procedure introduced by Rubinstein and Rubinstein [46]; the actual algorithm and its purpose are not of immediate relevance to our paper), then C fulfills RTD(C) = VCD(C) [33]. The same equality holds if C is intersection-closed or has VCD 1 [33]. In general, the RTD is upper-bounded by a function quadratic in the VCD [37].

4 Representing CP-Nets as Concepts

Assuming a user’s preferences are captured in a target CP-net N*, an interesting learning problem is to identify N* from a set of observations representing the user’s preferences, i.e., from labeled examples of the form o ≻ o′ or o ⊁ o′, where ≻ is the relation induced by N* [8, 7]. In order to study the complexity of learning CP-nets, we model a class of CP-nets as a concept class over a fixed instance space.

The first issue to address is how to define the instance space. A natural approach would be to consider any pair (o, o′) of outcomes as an instance. Such an instance would be contained in the concept corresponding to a CP-net N if and only if N entails o ≻ o′. In our study, however, we restrict the instance space to the set of all swaps.

On the one hand, note that our results, due to the restriction to swaps, do not apply to scenarios in which preferences are elicited over arbitrary outcome pairs, as is likely the case in many real-world applications.

On the other hand, for various reasons, the restriction to swaps is still of both theoretical and practical interest and therefore justified. First, CP-net semantics are completely determined by the preference relation over swaps, so that no information on the preference order is lost by restricting to swaps. In particular, the set of all swaps is the most compact instance space for representing the class of all CP-nets or the class of all acyclic CP-nets. Second, many studies in the literature address learning CP-nets from information on swaps, see [7, 8, 19, 20, 13], so that our results can be compared to existing ones on the swap instance space. Third, in the learning models that we consider (teaching and learning from membership queries), learning becomes harder when restricting the instance space. For example, a learning algorithm may potentially succeed faster when it is allowed to enquire about preferences over any pair of outcomes rather than just swaps. Since our study restricts the information presented to the learner to preferences over swaps, our complexity results therefore serve as upper bounds on the complexity of learning in more relaxed settings. Fourth, for the design of learning methods, it is often desirable that the learner can check whether its hypothesis (in our case a CP-net) is consistent with the labeled examples obtained. It is known that the time complexity of checking whether a CP-net is consistent with an example is linear in the case of swap examples but NP-hard for non-swap examples [6]. Fifth, there are indeed potential application scenarios in which the information presented to a learner may be in the form of preferences over swaps. This is due to the intuition that in many cases preferences over swaps would be much easier to elicit than preferences over two arbitrary outcomes. For example, a user may be overwhelmed with the question whether to prefer one laptop over another if each of them has a nice feature that the other does not have. It is likely much easier for the user to express a preference over two laptops that are identical except in a single feature like their color.

One may argue that one of the parameters that we study, namely the VC dimension, should be computed over arbitrary instances rather than just swap instances, since the VC dimension captures how difficult a concept class is to learn when the choice of instances is out of the learner’s (or teacher’s) control. However, for the following two reasons, computing the VC dimension over swap instances is of importance to our study:

  • We use our calculations of the VC dimension in order to assess the optimality of one of Koriche and Zanuttini’s algorithms [7] for learning acyclic CP-nets with nodes of bounded indegree from equivalence and membership queries. It is well known that the VC dimension of a concept class is a lower bound on the number of membership and equivalence queries required for learning that class [16]. Since Koriche and Zanuttini’s algorithm is designed over the swap instance space, an optimality assessment using the VC dimension necessarily requires that the VC dimension be computed over the swap instance space as well.

  • Although the VC dimension is best known for characterizing the sample complexity of learning from randomly chosen examples, namely in the model of PAC learning, existing results exhibit a broader scope of applicability of the VC dimension. Recently, it was shown that the number of examples needed for learning from benevolent teachers can be upper-bounded by a function quadratic in the VC dimension [37]. When studying the number of swap examples required for teaching CP-nets, the VC dimension over swap examples thus again becomes interesting.

We therefore consider the set X̄_swap = {(o, o′) | (o, o′) is a swap} as an instance space. The size of this instance space is n(m − 1)m^n: for every one of the n variables, there are m^{n−1} different assignments of the other variables and, fixing each such assignment, we obtain m(m − 1) ordered pairs of distinct values, i.e., m(m − 1) instances. For complete acyclic CP-nets, however, half of these instances are redundant: if o ≻ o′, then we know for certain that o′ ⊁ o, and vice versa. By contrast, in the case of incomplete CP-nets, o ⊁ o′ does not necessarily mean o′ ≻ o, as there could be no relation between the two outcomes, i.e., o and o′ are incomparable, corresponding to both o ⊁ o′ and o′ ⊁ o.

Consequently, the choice of instance space in our study will be as follows:

  • Whenever we study classes of CP-nets that contain incomplete CP-nets, we use the instance space X̄_swap = {(o, o′) | (o, o′) is a swap}.

  • Whenever we study classes of only complete CP-nets, we use an instance space X_swap that includes, for any two swap outcomes o and o′, exactly one of the two pairs (o, o′) and (o′, o). (All our results are independent of the mechanism choosing which of the two pairs to include in X_swap; we assume that the selection is prescribed in some arbitrary fashion.) Note that |X_swap| = |X̄_swap|/2 = n(m − 1)m^n/2. When we say that a learning algorithm, either passively or through active queries, is given information on the label of a swap (o, o′) under the target concept, we implicitly refer to information on either (o, o′) or (o′, o), depending on which of the two is actually contained in X_swap.

We sometimes refer to X_swap as the set of all swaps without “redundancies”, since for the case of complete CP-nets half the instances in X̄_swap are redundant. Of course, for incomplete CP-nets they are not redundant.
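The two instance spaces are easy to enumerate for small parameters; the sketch below confirms the counts for n = 3 binary variables (illustrative code; the lexicographic tie-breaking rule is just one admissible choice of X_swap):

    from itertools import product

    n, m = 3, 2
    outcomes = list(product(range(m), repeat=n))
    X_bar = [(o1, o2) for o1 in outcomes for o2 in outcomes
             if sum(a != b for a, b in zip(o1, o2)) == 1]     # all swaps
    X = [(o1, o2) for (o1, o2) in X_bar if o1 < o2]           # one pair per swap
    print(len(X_bar), len(X))   # 24 12, matching n(m-1)m^n and n(m-1)m^n / 2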

For x = (o, o′) ∈ X̄_swap, let V(x) denote the swapped variable of x. We refer to the first and second outcomes of an example x as o and o′, respectively. We use x[S], for S ⊆ V \ {V(x)}, to denote the projection of o (and also of o′) onto S. Note that this projection is guaranteed to be the same in o and o′, as otherwise (o, o′) would not form a swap instance.

Now, if N is any CP-net and N induces the (partial) order ≻_N over outcomes, then N corresponds to a concept c_N over X̄_swap (over X_swap, respectively), where c_N is defined as follows, for any x = (o, o′) ∈ X̄_swap (any x = (o, o′) ∈ X_swap, respectively): c_N(x) = 1 if o ≻_N o′, and c_N(x) = 0 otherwise.

In such a case, we say that c_N is represented by N. Since no two distinct CP-nets induce exactly the same set of swap entailments, a concept over the instance space X̄_swap cannot be represented by more than one CP-net, and a concept over the instance space X_swap cannot be represented by more than one complete CP-net. Therefore, in the context of a specific instance space, we identify a CP-net with the concept it represents and use the two notions interchangeably.

Consequently, we say that a concept c contains a swap pair (o, o′) iff the CP-net representing c entails o ≻ o′. By size(c), we refer to the size of the CP-net that represents c.

Table 3 shows two concepts c_1 and c_2 that correspond to the complete CP-nets shown in Figures 1 and 2, respectively, along with one choice of X_swap. It is important to restate that the label c(x) actually expresses a dominance relation between o and o′, i.e., x = (o, o′) is mapped to 1 (resp. to 0) if o ≻ o′ (resp. o ⊁ o′) holds. Thus, we sometimes talk about the value of c(x) in terms of the relation between o and o′ (o ≻ o′ or o ⊁ o′).

c_1: 1 1 1 1 1 1 0 0 1 0 0 1
c_2: 1 1 1 1 1 1 0 1 1 0 0 0

Table 3: The concepts c_1 and c_2 represent the CP-nets in Figures 1 and 2, respectively, over one choice of X_swap; the twelve columns correspond to the twelve swap instances in X_swap (for n = 3 and m = 2).

In the remainder of this article, we fix n ≥ 1, m ≥ 2, and k with 0 ≤ k ≤ n − 1, and consider the following two concept classes:

  • The class C_ac^k of all complete acyclic k-bounded CP-nets over n variables of domain size m. This class is represented over the instance space X_swap.

  • The class C̄_ac^k of all complete and all incomplete acyclic k-bounded CP-nets over n variables of domain size m. This class is represented over the instance space X̄_swap.

5 The Complexity of Learning Complete Acyclic CP-Nets

In this section, we study the information complexity parameters introduced above for the class C_ac^k of all complete acyclic k-bounded CP-nets over n variables of domain size m. In Section 6, we will extend our results on the VC dimension and the teaching dimension to the class of all complete and incomplete acyclic k-bounded CP-nets. It turns out, though, that studying the complete case first is easier.

Table 4 summarizes our complexity results for complete acyclic k-bounded CP-nets. The two extreme cases are unbounded acyclic CP-nets (k = n − 1) and separable CP-nets (k = 0).

To define the value U used in this table, we first introduce the notion of a universal set, which is a standard notion in combinatorics, cf. [47, 48].

Definition 8.

Let S be a set of binary vectors of length n and let k ≤ n. The set S is called k-universal if, for every index set I ⊆ {1, ..., n} with |I| = k, the projection of S onto the coordinates in I contains all 2^k possible binary vectors of length k.
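This property can be tested by brute force for small parameters; the following sketch (illustrative code, not from the paper) checks a classical example, the parity construction, which yields a 2-universal set of four vectors of length 3.

    from itertools import combinations

    def is_k_universal(S, n, k):
        # S: collection of length-n binary tuples
        for I in combinations(range(n), k):
            patterns = {tuple(v[i] for i in I) for v in S}
            if len(patterns) < 2 ** k:          # some pattern over I is missing
                return False
        return True

    S = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
    print(is_k_universal(S, 3, 2))   # True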