Partial Queries for Constraint Acquisition

03/14/2020 ∙ by Christian Bessiere, et al.

Learning constraint networks is known to require a number of membership queries exponential in the number of variables. In this paper, we learn constraint networks by asking the user partial queries. That is, we ask the user to classify assignments to subsets of the variables as positive or negative. We provide an algorithm, called QuAcq, that, given a negative example, focuses on a constraint of the target network in a number of queries logarithmic in the size of the example. The whole constraint network can then be learned with a polynomial number of partial queries. We give information-theoretic lower bounds for learning some simple classes of constraint networks and show that our generic algorithm is optimal in some cases.


1 Introduction

Constraint programming (CP) is increasingly used to solve combinatorial problems in industrial applications. One of the strengths of CP is that it is declarative, which means that the user specifies the problem as a CP model, and the solver finds solutions. However, specifying the CP model is not easy for non-specialists. Hence, the modeling phase constitutes a major bottleneck in the use of CP. Several techniques have been proposed to tackle this bottleneck. For example, the matchmaker agent [7] interactively asks the user to provide one of the constraints of the target problem each time the system proposes an incorrect solution. In Conacq.1 [5], the user provides examples of solutions and non-solutions. Based on these examples, the system learns a set of constraints that correctly classifies all examples given so far. This is a form of passive learning. In [9], a system based on inductive logic programming uses background knowledge on the structure of the problem to learn a representation of the problem correctly classifying the examples. A last passive learner is ModelSeeker [3]. Positive examples are provided by the user to the system, which arranges each of them as a matrix and identifies constraints in the global constraints catalog ([2]) that are satisfied by particular subsets of variables in all the examples. Such particular subsets are for instance rows, columns, diagonals, etc. An efficient ranking technique combined with a representation of solutions as matrices allows ModelSeeker to quickly find a good model when a problem has an underlying matrix structure.

By contrast, in an active learner like Conacq.2 [5], the system proposes examples to the user to classify as solutions or non-solutions. Such questions are called membership queries [1]. Such active learning has several advantages. It can decrease the number of examples necessary to converge to the target set of constraints. Another advantage is that the user need not be a human. It might be a previous system developed to solve the problem. For instance, the Normind company has hired a constraint programming specialist to transform their expert system for detecting failures in electric circuits in Airbus airplanes into a constraint model in order to make it more efficient and easier to maintain. As another example, active learning is used to build a constraint model that encodes non-atomic actions of a robot (e.g., catch a ball) by asking queries of the simulator of the robot in [10]. Such active learning introduces two computational challenges. First, how does the system generate a useful query? Second, how many queries are needed for the system to converge to the target set of constraints? It has been shown that the number of membership queries required to converge to the target set of constraints can be exponentially large [5].

In this paper, we propose QuAcq (for QuickAcquisition), an active learner that asks the user to classify partial queries. Given a negative example, QuAcq is able to learn a constraint of the target constraint network in a number of queries logarithmic in the number of variables. In fact, we identify information theoretic lower bounds on the complexity of learning constraint networks which show that QuAcq is optimal on some simple languages.

One application for QuAcq would be to learn a general purpose model. In constraint programming, a distinction is made between model and data. For example, in a sudoku puzzle, the model contains generic constraints such as "each subsquare contains a permutation of the numbers". The data, on the other hand, gives the pre-filled squares for a specific puzzle. As a second example, in a time-tabling problem, the model specifies generic constraints such as "no teacher can teach multiple classes at the same time". The data, on the other hand, specifies particular room sizes and teacher availability for a particular time-tabling problem instance. The cost of learning the model can then be amortized over the lifetime of the model. Another advantage of this approach is that it places less of a burden on the user. First, it often converges more quickly than other methods. Second, partial queries will be easier to answer than complete queries. Third, as opposed to existing techniques, the user does not need to give positive examples. This might be useful if the problem has not yet been solved, so there are no examples of past solutions.

The rest of the paper is organized as follows. Section 2 gives the necessary definitions to understand the technical presentation. Section 3 presents QuAcq, the algorithm that learns constraint networks by asking partial queries. In Section 4, we show how QuAcq behaves on some simple languages. Section 5 concludes the paper and gives some directions for future research.

2 Background

The learner and the user need to share some common knowledge to communicate. We suppose this common knowledge, called the vocabulary, is a (finite) set of $n$ variables $X$ and a domain $D = \{D(X_1), \ldots, D(X_n)\}$, where $D(X_i) \subset \mathbb{Z}$ is the finite set of values for $X_i$. A constraint $c$ is defined by the sequence of variables $var(c) \subseteq X$, called the constraint scope, and the relation $rel(c)$ over $\mathbb{Z}$ specifying which sequences of $|var(c)|$ values are allowed for the variables $var(c)$. A constraint network (or simply network) is a set $C$ of constraints on the vocabulary $(X, D)$. An assignment $e_Y \in D^Y$, where $Y \subseteq X$, is called a partial assignment when $Y \neq X$ and a complete assignment when $Y = X$. An assignment $e_Y$ on a set of variables $Y \subseteq X$ is rejected by a constraint $c$ (or $e_Y$ violates $c$) if $var(c) \subseteq Y$ and the projection $e_Y[var(c)]$ of $e_Y$ on the variables in $var(c)$ is not in $rel(c)$. If $e_Y$ does not violate $c$, it satisfies it. An assignment $e_Y$ on $Y$ is accepted by $C$ if and only if it satisfies all constraints in $C$. An assignment on $X$ that is accepted by $C$ is a solution of $C$. We write $sol(C)$ for the set of solutions of $C$, and $C[Y]$ for the set of constraints from $C$ whose scope is included in $Y$.

In addition to the vocabulary, the learner owns a language $\Gamma$ of bounded arity relations from which it can build constraints on specified sets of variables. Adapting terms from machine learning, the constraint basis, denoted by $B$, is a set of constraints built from the constraint language $\Gamma$ on the vocabulary $(X, D)$ from which the learner builds a constraint network.

The target network is a network $C_T$ such that $C_T \subseteq B$ and for any example $e \in D^X$, $e$ is a solution of $C_T$ if and only if $e$ is a solution of the problem that the user has in mind. A membership query $ASK(e)$ takes as input a complete assignment $e$ in $D^X$ and asks the user to classify it. The answer to $ASK(e)$ is yes if and only if $e \in sol(C_T)$. A partial query $ASK(e_Y)$, with $Y \subseteq X$, takes as input a partial assignment $e_Y$ in $D^Y$ and asks the user to classify it. The answer to $ASK(e_Y)$ is yes if and only if $e_Y$ does not violate any constraint in $C_T$. It is important to observe that "$ASK(e_Y) = yes$" does not mean that $e_Y$ extends to a solution of $C_T$, which would put an NP-complete problem on the shoulders of the user. For any assignment $e_Y$ on $Y$, $\kappa_B(e_Y)$ denotes the set of all constraints in $B$ rejecting $e_Y$. A classified assignment $e_Y$ is called a positive or negative example depending on whether $ASK(e_Y)$ is yes or no.
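The definitions above can be made concrete with a small sketch. It assumes a hypothetical binary language containing only equality and disequality, represents an assignment as a dictionary from variable names to values, and simulates the user with a known target network. The names `Constraint`, `kappa` and `make_ask` are illustrative choices, not part of the paper.

```python
from dataclasses import dataclass
from itertools import combinations
import operator

# Hypothetical language: only "equal" and "not equal" (an assumption of this sketch).
RELATIONS = {"eq": operator.eq, "ne": operator.ne}


@dataclass(frozen=True)
class Constraint:
    scope: tuple  # the constraint scope, as a tuple of variable names
    rel: str      # key into RELATIONS

    def rejects(self, e):
        """True iff the whole scope is assigned in e and the relation does not hold."""
        if not all(x in e for x in self.scope):
            return False  # scope not covered by e: the constraint cannot be violated
        return not RELATIONS[self.rel](*(e[x] for x in self.scope))


def kappa(constraints, e):
    """Set of constraints from `constraints` that reject the (partial) assignment e."""
    return {c for c in constraints if c.rejects(e)}


def make_ask(target):
    """Simulated user: answers yes iff e violates no constraint of the target network."""
    def ask(e):
        return not kappa(target, e)
    return ask


variables = ["X1", "X2", "X3"]
basis = {Constraint((x, y), r) for x, y in combinations(variables, 2) for r in RELATIONS}
target = {Constraint(("X1", "X3"), "eq")}
ask = make_ask(target)

print(ask({"X1": 1, "X2": 2}))  # no target constraint has its scope inside {X1, X2}
print(ask({"X1": 1, "X3": 2}))  # X1 = X3 is violated
```

Note that a yes answer on `{"X1": 1, "X2": 2}` only says that no constraint is violated on those two variables. It says nothing about whether the assignment extends to a solution.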

We now define convergence, which is the constraint acquisition problem we are interested in. Given a set $E$ of (partial) examples labeled by the user yes or no, we say that a network $C$ agrees with $E$ if $C$ accepts all examples labeled yes in $E$ and does not accept those labeled no. The learning process has converged on the network $C_L \subseteq B$ if $C_L$ agrees with $E$ and for every other network $C' \subseteq B$ agreeing with $E$, we have $sol(C') = sol(C_L)$. We are thus guaranteed that $sol(C_L) = sol(C_T)$.

In the algorithms presented in the rest of the paper we will use the join operation, denoted by $\bowtie$. Given two sets of constraints $S$ and $S'$, the join of $S$ with $S'$ is the set of non-empty constraints obtained by pairwise conjunction of a constraint in $S$ with a constraint in $S'$. That is, $S \bowtie S' = \{c \wedge c' \mid c \in S, c' \in S', c \wedge c' \not\models \bot\}$. A constraint belonging to the basis $B$ will sometimes be called an elementary constraint, in contrast to a constraint composed of the conjunction of several elementary constraints, which will be called a conjunction, or simply a constraint. A conjunction will also sometimes be referred to as a set of elementary constraints. Given a set $S$ of conjunctions, we will use the notation $S_p$ to refer to the subset of $S$ containing only the conjunctions composed of at most $p$ elementary constraints. Finally, a normalized network is a network that does not contain several elementary constraints with the same scope.
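As an illustration of the join, here is a minimal sketch under simplifying assumptions: all constraints share one binary scope, a conjunction is a frozenset of relation names, and a conjunction counts as non-empty when at least one pair of values from a small, hypothetical domain satisfies every relation in it. The names `RELATIONS`, `DOMAIN`, `allowed` and `join` are chosen for this example only.

```python
from itertools import product
import operator

# Hypothetical language and domain, chosen only for this illustration.
RELATIONS = {"<": operator.lt, ">": operator.gt, "=": operator.eq, "!=": operator.ne}
DOMAIN = range(1, 4)


def allowed(conjunction):
    """Pairs of values satisfying every elementary relation of the conjunction."""
    return {t for t in product(DOMAIN, repeat=2)
            if all(RELATIONS[r](*t) for r in conjunction)}


def join(s1, s2):
    """Pairwise conjunctions of s1 with s2, keeping only the non-empty ones."""
    result = set()
    for c1 in s1:
        for c2 in s2:
            conjunction = c1 | c2         # conjunction = union of elementary constraints
            if allowed(conjunction):      # drop conjunctions that accept no tuple
                result.add(conjunction)
    return result


left = {frozenset({"<"}), frozenset({"="})}
right = {frozenset({">"}), frozenset({"!="})}
result = join(left, right)
print(result)
```

Of the four pairwise conjunctions, only the conjunction of `<` with `!=` accepts at least one tuple, so it is the only one kept.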

3 Constraint Acquisition with Partial Queries

We propose QuAcq, a novel active learning algorithm. QuAcq takes as input a basis $B$ on a vocabulary $(X, D)$. It asks partial queries of the user until it has converged on a constraint network $C_L$ equivalent to the target network $C_T$. When a query is answered yes, constraints rejecting it are removed from $B$. When a query is answered no, QuAcq enters a loop (functions FindScope and FindC) that will end by the addition of a constraint to $C_L$.

3.1 Description of QuAcq

QuAcq (see Algorithm 1) initializes the network $C_L$ it will learn to the empty set (line 2). In line 4, QuAcq calls function GenerateExample that computes an assignment $e_Y$ on a subset of variables $Y$ satisfying the constraints of $C_L$ that have a scope included in $Y$, but violating at least one constraint from $B$. (For this task, the constraint solver needs to be able to express the negation of the constraints in $B$. This is not a problem as we have only bounded arity constraints in $B$.) We will see later that there are multiple ways to design function GenerateExample depending on the time we are ready to devote to its computation. In particular, while looking for a pair $(Y, e_Y)$, if GenerateExample finds a set $Y$ on which no assignment accepted by $C_L$ violates a given constraint of $B$, then it can safely remove that constraint from $B$. If there does not exist any pair $(Y, e_Y)$ accepted by $C_L$ and rejected by $B$, then all constraints in $B$ are implied by $C_L$, and we have converged (line 5). If we have not converged, we propose the example $e_Y$ to the user, who will answer by yes or no (line 6). If the answer is yes, we can remove from $B$ the set $\kappa_B(e_Y)$ of all constraints in $B$ that reject $e_Y$ (line 7). If the answer is no, we are sure that $e_Y$ violates at least one constraint of the target network $C_T$. We then call the function FindScope to discover the scope of one of these violated constraints, and the procedure FindC will learn (that is, put in $C_L$) at least one constraint of $C_T$ whose scope is in $Y$ (line 9).

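In : A basis $B$
Out : A learned network $C_L$
1 begin
2       $C_L \leftarrow \emptyset$;
3       while true do
4             $e_Y \leftarrow$ GenerateExample($X, C_L, B$);
5             if $e_Y = nil$ then return "convergence on $C_L$";
6             if $ASK(e_Y) = yes$ then
7                   $B \leftarrow B \setminus \kappa_B(e_Y)$;
9             else FindC($e_Y$, FindScope($e_Y, \emptyset, Y$));
Algorithm 1 QuAcq: asks partial queries and returns a network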
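One naive way to realize GenerateExample is exhaustive enumeration. The sketch below is an assumption-laden illustration, not the authors' implementation: it only considers complete assignments, handles only binary equality and disequality constraints encoded as `((x, y), rel)` pairs, and returns `None` when no example exists, which corresponds to convergence. A real system would delegate this search to a constraint solver.

```python
from itertools import product
import operator

RELATIONS = {"eq": operator.eq, "ne": operator.ne}  # hypothetical language


def rejects(c, e):
    """c = ((x, y), rel) rejects e iff x and y are assigned and rel does not hold."""
    (x, y), rel = c
    return x in e and y in e and not RELATIONS[rel](e[x], e[y])


def generate_example(variables, domain, learned, basis):
    """Brute-force search for a complete assignment that is accepted by the
    learned network but rejected by at least one constraint of the basis.
    Returns None when no such assignment exists (convergence)."""
    for values in product(domain, repeat=len(variables)):
        e = dict(zip(variables, values))
        if any(rejects(c, e) for c in learned):
            continue  # must satisfy everything learned so far
        if any(rejects(c, e) for c in basis):
            return e  # classifying e will either prune the basis or expose a constraint
    return None


variables = ["X1", "X2"]
domain = [1, 2]
learned = {(("X1", "X2"), "ne")}

example = generate_example(variables, domain, learned, {(("X1", "X2"), "eq")})
converged = generate_example(variables, domain, learned, {(("X1", "X2"), "ne")})
print(example, converged)
```

In the second call the only basis constraint is already implied by the learned network, so no example can be generated.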

The recursive function FindScope (see Algorithm 2) takes as parameters an example $e$ and two sets $R$ and $Y$ of variables. An invariant of FindScope is that $e$ violates at least one constraint whose scope is a subset of $R \cup Y$. A second invariant is that FindScope always returns a subset of $Y$ that is also the subset of the scope of a constraint violated by $e$. If there is at least one constraint in $B$ rejecting $e[R]$ (i.e., $\kappa_B(e[R]) \neq \emptyset$, line 2), we ask the user whether $e[R]$ is positive or not (line 3). If the answer is yes, we can remove all the constraints that reject $e[R]$ from $B$. If the answer is no, we are sure that $R$ itself contains the scope of a constraint of $C_T$ rejecting $e$. As $Y$ is not needed to cover that scope, we return the empty set (line 4). We reach line 6 only in case $e[R]$ does not violate any constraint. We know that $e[R \cup Y]$ violates a constraint. Hence, if $Y$ is a singleton, the variable it contains necessarily belongs to the scope of a constraint violated by $e[R \cup Y]$. The function returns $Y$. If none of the return conditions are satisfied, the set $Y$ is split in two balanced parts $Y_1$ and $Y_2$ (line 7) and we apply a technique similar to QuickXplain ([8]) to elucidate the variables of a constraint violated by $e[R \cup Y]$ in a logarithmic number of steps (lines 9 and 11). In the first recursive call, if $R \cup Y_1$ does not contain any scope $S$ of constraint rejecting $e$, FindScope returns a subset $S_1$ of such a scope such that $S_1 = S \cap Y_2$ and $S \subseteq R \cup Y$. In the second recursive call, the variables returned in $S_1$ are added to $R$. If $R \cup S_1$ does not contain any scope $S$ of constraint rejecting $e$, FindScope returns a subset $S_2$ of such a scope such that $S_2 = S \cap Y_1$ and $S \subseteq R \cup Y$. The rationale of lines 8 and 10 is to avoid entering a recursive call to FindScope when we know the answer to the query in line 3 of that call will necessarily be no. It happens when all the constraints rejecting $e[R \cup Y]$ have a scope included in the set of variables that will be $R$ inside that call (that is, $R \cup Y_1$ for the call in line 9, and $R$ union the output of line 9 for the call in line 11). Finally, line 12 of FindScope returns the union of the two subsets of variables returned by the two recursive calls, as we know they all belong to the same scope of a constraint of $C_T$ rejecting $e$.

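In : An example $e$; Two scopes $R$, $Y$
Out : The scope of a constraint in $C_T$
1 begin
2       if $\kappa_B(e[R]) \neq \emptyset$ then
3             if $ASK(e[R]) = yes$ then $B \leftarrow B \setminus \kappa_B(e[R])$;
4             else return $\emptyset$;
6       if $|Y| = 1$ then return Y;
7       split $Y$ into $\langle Y_1, Y_2 \rangle$ such that $|Y_1| = \lceil |Y|/2 \rceil$;
8       if $\kappa_B(e[R \cup Y_1]) = \kappa_B(e[R \cup Y])$ then $S_1 \leftarrow \emptyset$;
9       else $S_1 \leftarrow$ FindScope($e, R \cup Y_1, Y_2$);
10      if $\kappa_B(e[R \cup S_1]) = \kappa_B(e[R \cup Y])$ then $S_2 \leftarrow \emptyset$;
11      else $S_2 \leftarrow$ FindScope($e, R \cup S_1, Y_1$);
12      return $S_1 \cup S_2$;
Algorithm 2 Function FindScope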
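The following Python sketch mirrors the recursive structure described for FindScope. It rests on several assumptions that are not in the paper: constraints are binary and encoded as `((x, y), rel)` pairs over a hypothetical equality/disequality language, scopes are Python lists, the basis is a mutable set pruned in place, and the user is simulated from a known target network. The five-variable instance at the end is chosen for illustration.

```python
from itertools import combinations
import operator

RELATIONS = {"eq": operator.eq, "ne": operator.ne}  # hypothetical language


def rejects(c, e):
    """c = ((x, y), rel) rejects e iff x and y are assigned and rel does not hold."""
    (x, y), rel = c
    return x in e and y in e and not RELATIONS[rel](e[x], e[y])


def kappa(constraints, e):
    """Constraints of `constraints` rejecting the partial assignment e."""
    return {c for c in constraints if rejects(c, e)}


def project(e, scope):
    """Projection of assignment e on the variables in scope."""
    return {x: e[x] for x in scope}


def find_scope(e, R, Y, basis, ask):
    """Return the part (inside Y) of the scope of a constraint violated by e."""
    e_R = project(e, R)
    if kappa(basis, e_R):                 # a query on e[R] is informative
        if ask(e_R):
            basis -= kappa(basis, e_R)    # positive answer: prune the basis in place
        else:
            return []                     # R already covers a violated scope
    if len(Y) == 1:
        return list(Y)
    half = (len(Y) + 1) // 2              # |Y1| = ceil(|Y| / 2)
    Y1, Y2 = Y[:half], Y[half:]
    # Skip a recursive call whose query would necessarily be answered no.
    if kappa(basis, project(e, R + Y1)) == kappa(basis, project(e, R + Y)):
        S1 = []
    else:
        S1 = find_scope(e, R + Y1, Y2, basis, ask)
    if kappa(basis, project(e, R + S1)) == kappa(basis, project(e, R + Y)):
        S2 = []
    else:
        S2 = find_scope(e, R + S1, Y1, basis, ask)
    return S1 + S2


variables = ["X1", "X2", "X3", "X4", "X5"]
basis = {((x, y), r) for x, y in combinations(variables, 2) for r in RELATIONS}
target = {(("X1", "X5"), "eq"), (("X3", "X4"), "ne")}
answers = []


def ask(e):
    """Simulated user: yes iff e violates no target constraint; logs each answer."""
    ok = not kappa(target, e)
    answers.append(ok)
    return ok


e1 = {x: 1 for x in variables}            # violates X3 != X4
scope = find_scope(e1, [], variables, basis, ask)
print(sorted(scope), answers, len(basis))
```

On this instance the sketch isolates the scope of the violated disequality with three queries, and the two positive answers prune five constraints from the basis.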

The function FindC (see Algorithm 3) takes as parameter and , being the negative example that led FindScope to find that there is a constraint from the target network over the scope . The set is initialized to all candidate constraints, that is, all constraints from with scope exactly (line 3). As we know from FindScope that there will be a constraint with scope in , we join with the set of constraints of scope rejecting (line 3). In line 3, an example is chosen in such a way that contains both constraints satisfied by and constraints violated by . If no such example exists (line 3), this means that all constraints in are equivalent wrt . Any of them is added to (line 3). If a suitable example was found, it is proposed to the user for classification (line 3). If is classified positive, all constraints rejecting it are removed from and (line 3). Otherwise we call FindScope to seek constraints with scope strictly included in that violate (line 3). If FindScope returns the scope of such a constraint, we recursively call FindC to find that smaller arity constraint before the one having scope (line 3). If FindScope has not found such a scope (that is, it returned itself), we do the same join as in line 3 to keep in only constraints rejecting the example (line 3). Then, we continue the loop of line 3.

In : An example ; A scope
In Out : The network
1 begin
2       ;
3       ;
4       ;
5       while true do
6             choose in and , minimizing such that if possible, otherwise ;
7             if  then
8                   pick in with ;
9                   ; return ;
10             else
11                   if  then  ;
12                  else
13                         ;
14                         if  then  ;
15                         else  ;
16                        
17                  
18            
19      
20
Algorithm 3 Procedure FindC

3.2 Example

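call   $R$                     $Y$                          ASK   return
0      $\emptyset$             $X_1, X_2, X_3, X_4, X_5$    ×     $X_3, X_4$
1      $X_1, X_2, X_3$         $X_4, X_5$                   yes   $X_4$
1.1    $X_1, X_2, X_3, X_4$    $X_5$                        no    $\emptyset$
1.2    $X_1, X_2, X_3$         $X_4$                        ×     $X_4$
2      $X_4$                   $X_1, X_2, X_3$              ×     $X_3$
2.1    $X_4, X_1, X_2$         $X_3$                        yes   $X_3$
Table 1: FindScope on the example $e_1 = (1, 1, 1, 1, 1)$

We illustrate the behavior of QuAcq on a simple example. Consider the set of variables $X_1, \ldots, X_5$ with domains $\{1, \ldots, 5\}$, a language $\Gamma = \{=, \neq\}$, a basis $B = \{=_{ij}, \neq_{ij} \mid 1 \leq i < j \leq 5\}$ (where $=_{ij}$ denotes the constraint $X_i = X_j$ and $\neq_{ij}$ the constraint $X_i \neq X_j$), and a target network $C_T = \{=_{15}, \neq_{34}\}$. Suppose the first example generated in line 4 of QuAcq is $e_1 = (1, 1, 1, 1, 1)$. The trace of the execution of FindScope($e_1, \emptyset, X_1 \ldots X_5$) is displayed in Table 1. Each line corresponds to a call to FindScope. Queries are always on the variables in $R$. '×' in the column ASK means that the question is skipped because $\kappa_B(e_1[R]) = \emptyset$. This happens when $R$ is empty or because a (positive) query has already been asked on $e_1[R]$ and $\kappa_B(e_1[R])$ has been emptied. The initial call (call-0 in Table 1) does not ask the question because $R = \emptyset$. $Y$ is split in two sets $Y_1 = X_1, X_2, X_3$ and $Y_2 = X_4, X_5$. $\kappa_B(e_1[X_1, X_2, X_3])$ and $\kappa_B(e_1[X_1, \ldots, X_5])$ are different, so the recursive call-1 is performed with $R = X_1, X_2, X_3$ and $Y = X_4, X_5$. As $e_1[X_1, X_2, X_3]$ is classified as positive, line 3 of FindScope removes the constraints $\neq_{12}$, $\neq_{13}$ and $\neq_{23}$ from $B$. A new split of $Y$ leads to the call-1.1 with $R = X_1, X_2, X_3, X_4$ and $Y = X_5$ because $\kappa_B(e_1[X_1, X_2, X_3, X_4])$ and $\kappa_B(e_1[X_1, \ldots, X_5])$ are different. As $e_1[X_1, X_2, X_3, X_4]$ is negative, the empty set is returned in line 4. Call-1.2 (line 11) is performed with $R = X_1, X_2, X_3$ and $Y = X_4$. It merely detects that $Y$ is a singleton and returns $X_4$ (line 6). Call-1 finishes by returning $X_4$ one level above in the recursion (line 12). As $\kappa_B(e_1[X_4])$ and $\kappa_B(e_1[X_1, \ldots, X_5])$ are different, we go to call-2, which does not ask $e_1[X_4]$ because $\kappa_B(e_1[X_4])$ is empty. It goes down to call-2.1 with $R = X_4, X_1, X_2$ and $Y = X_3$ because $\kappa_B(e_1[X_1, X_2, X_4])$ and $\kappa_B(e_1[X_1, X_2, X_3, X_4])$ are different ($\neq_{34}$ is in the difference). In call-2.1, $e_1[X_1, X_2, X_4]$ is classified positive. FindScope thus removes constraints $\neq_{14}$ and $\neq_{24}$ from $B$ and returns the singleton $X_3$. As $\kappa_B(e_1[X_3, X_4]) = \kappa_B(e_1[X_1, X_2, X_3, X_4])$, we know that no remaining constraint of $B$ rejecting $e_1$ involves $X_1$ or $X_2$, which means that $(X_3, X_4)$ is the scope of a constraint rejecting $e_1$. Thus, call-2.2 is skipped and $\emptyset$ is added to $X_3$ in line 12. As a result, call-2 returns $X_3$, and call-0 returns $(X_3, X_4)$. Once the scope $(X_3, X_4)$ is returned, FindC returns $\neq_{34}$ and prunes $=_{34}$ from $B$. Suppose the next example generated by QuAcq is $e_2 = (1, 2, 3, 4, 5)$. FindScope will find the scope $(X_1, X_5)$ and FindC will return $=_{15}$ in a way similar to the processing of $e_1$. The constraints $=_{12}, =_{13}, =_{14}, =_{23}, =_{24}$ are removed from $B$ by a partial positive query on $X_1, \ldots, X_4$ and $\neq_{15}$ by FindC. Finally, examples $e_3 = (1, 1, 1, 2, 1)$ and $e_4 = (3, 2, 2, 3, 3)$, both positive, will prune $\neq_{25}, \neq_{35}, =_{45}$ and $=_{25}, =_{35}, \neq_{45}$ from $B$ respectively, leading to convergence.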

3.3 Analysis

We first show that QuAcq is a correct algorithm to learn a constraint network equivalent to a target network that can be specified within a given basis. We prove that QuAcq is sound, complete, and terminates.

Proposition 1 (Soundness)

Given a basis $B$ and a target network $C_T \subseteq B$, the network $C_L$ returned by QuAcq is such that $sol(C_T) \subseteq sol(C_L)$.

Proof. Suppose there exists . Hence there exists a scope on which has learned a conjunction of constraints rejecting tuples that are accepted by . Let us consider the first such conjunction learned by QuAcq. The only place where we add a conjunction of constraints to is line 3 of FindC. This conjunction has been built by join operations in lines 3 and 3 of FindC. By construction of FindScope, is rejected by a constraint of scope in and by none of the constraints on subscopes of in when the join operation of line 3 of FindC is executed. By construction of FindC, the join operations of line 3 of FindC are executed for and only for generated in this call to FindC that are rejected by a constraint of scope in and by none of the constraints on subscopes of . As a result, contains all minimal conjunctions of constraints from that reject and all generated in this call to FindC that are rejected by a constraint of scope in and by none of the constraints on subscopes of . Thus, at least one conjunction of constraints from must be put in . When in line 3 we put one of these conjunctions in , they are all equivalent wrt because line 3 could not produce an example violating some conjunctions from and satisfying the others. As by assumption scope is the first scope on which QuAcq learns a wrong conjunction of constraints, we deduce that all conjunctions in are equivalent wrt to . Therefore, adding one of them to cannot reject a tuple accepted by .

Proposition 2 (Completeness)

Given a basis $B$ and a target network $C_T \subseteq B$, the network $C_L$ returned by QuAcq is such that $sol(C_L) \subseteq sol(C_T)$.

Proof. Suppose there exists when QuAcq terminates. Hence, there exists an elementary constraint in that rejects , and belongs to , the conjunction of the constraints in with same scope as . The only way for QuAcq to terminate is line 1 of QuAcq. This means that in line 1, GenerateExample was not able to generate an example accepted by and rejected by . Thus, is not in when QuAcq terminates, otherwise the projection of on any containing would have been such an example. We know that , so was in before starting QuAcq. Constraints can be removed from in lines 1 and 1 of QuAcq, line 2 of FindScope, and line 3 of FindC. In line 1 of QuAcq, a constraint can be removed from by GenerateExample if there is a set on which no can be accepted by and violate . This removed constraint cannot be because violates and is accepted by for any . In line 1 of QuAcq and line 2 of FindScope, a constraint is removed from because it rejects a positive example. This removed constraint cannot be because belongs to , so it cannot reject a positive example. In line 3 of FindC, all (elementary) constraints with scope are moved from to . Let us see what happens to these constraints. Given an elementary constraint with scope that was moved from to , either is still appearing in one conjunction of when FindC terminates, or not. If is in one of the conjunctions of , this means that after the execution of line 3, the only line where FindC can terminate. Thus, cannot be because by assumption rejects , which itself is accepted by . If is not in any of the conjunctions of when FindC terminates, these conjunctions must have been removed in line 3 or in line 3, the two places where is modified. Line 3 is executed after a negative query . If rejects , all the conjunctions containing it remain in . If accepts , there necessarily exists a conjunction in which is a subset of the conjunction because QuAcq is sound (Proposition 1). is joined with this subset. 
Thus, contains a conjunction composed of and a subset of . Each time a negative example will be generated, this subset will either stay in or be joined with another subset of . As a result, line 3 cannot remove all conjunctions composed of and a subset of . These conjunctions must then have been removed in line 3 because they were rejecting the positive example generated in line 3. These conjunctions can be removed only if rejects because the rest of the conjunction is a subset of . Again cannot be because cannot reject positive examples. Therefore, cannot reject an example accepted by , which proves that .

Proposition 3 (Termination)

Given a basis $B$ and a target network $C_T \subseteq B$, QuAcq terminates.

Proof. Each execution of the loop in line 1 of QuAcq executes either line 1 of QuAcq or line 3 of FindC. By construction of in line 1 of QuAcq we know that is not empty. Hence, in line 1 of QuAcq strictly decreases in size. By definition of FindScope, the scope returned by FindScope is such that there exists a constraint in rejecting . Thus, is not empty. As a result, in line 3 of FindC, strictly decreases in size. Therefore, at each execution of the loop in line 1 of QuAcq, strictly decreases in size. As has finite size, we have termination.

Theorem 1 (Correctness)

Given a basis $B$ and a target network $C_T \subseteq B$, QuAcq returns a network $C_L$ such that $sol(C_L) = sol(C_T)$.

Proof. Correctness immediately follows from Propositions 1, 2, and 3.

We analyze the complexity of QuAcq in terms of the number of queries it can ask of the user. Queries are proposed to the user in line 6 of QuAcq, line 3 of FindScope and line 11 of FindC.

Proposition 4

Given a vocabulary $(X, D)$, a basis $B$, a target network $C_T$, and an example $e_Y \in D^Y$ rejected by $C_T$, FindScope uses $O(|S| \cdot \log |Y|)$ queries to return the scope $S$ of one of the constraints of $C_T$ violated by $e_Y$.

Proof. Let us first consider a version of FindScope that would execute lines 2 and 2 unconditionally. That is, a version without the tests in lines 2 and 2. FindScope is a recursive algorithm that asks at most one query per call (line 2). Hence, the number of queries is bounded above by the number of nodes of the tree of recursive calls to FindScope. We show that a leaf node is either on a branch that leads to the elucidation of a variable in the scope that will be returned, or is a child of a node of such a branch. By construction of FindScope, we observe that no answers to the query in line 2 always occur in leaf calls and that the only way for a leaf call to return the empty set is to have received a no answer to its query (line 2). Let be the values of the parameters and for a leaf call with a no answer, and be the values of the parameters and for its parent call in the recursive tree. We know that because the parent call necessarily received a yes answer. Furthermore, from the no answer to the query , we know that . Consider first the case where the leaf is the left child of the parent node. By construction, . As a result, intersects , and the parent node is on a branch that leads to the elucidation of a variable in . Consider now the case where the leaf is the right child of the parent node. As we are on a leaf, if the test of line 2 is false (i.e., ), we necessarily exit from FindScope through line 2, which means that this node is the end of a branch leading to a variable in . If the test of line 2 is true (i.e., ), we are guaranteed that the left child of the parent node returned a non-empty set, otherwise would be equal to and we know that has been emptied in line 2 as it received a yes answer. Thus, the parent node is on a branch to a leaf that elucidates a variable in .

We have proved that every leaf is either on a branch that elucidates a variable in $S$ or is a child of a node on such a branch. Hence the number of nodes in the tree is at most twice the number of nodes in branches that lead to the elucidation of a variable from $S$. Branches can be at most $\log |Y|$ long. Therefore the total number of queries FindScope can ask is at most $2 \cdot |S| \cdot \log |Y|$, which is in $O(|S| \cdot \log |Y|)$.

Let us come back to the complete version of FindScope, where lines 8 and 10 are active. The purpose of lines 8 and 10 is only to avoid useless calls to FindScope that would return $\emptyset$ anyway. These lines do not affect anything else in the algorithm. Hence, by adding lines 8 and 10, we can only decrease the number of recursive calls to FindScope. As a result, we cannot increase the number of queries.

Theorem 2

Given a basis of bounded arity constraints, and a target network , QuAcq uses queries to find the target network and queries to prove convergence, where and are respectively the number of variables and the number of constraints of the target network, and is the size of the basis.

Proof. Each time line 1 of QuAcq classifies an example as negative, the scope of a constraint from is found in queries (Proposition 4). As the basis only contains constraints of bounded arity, is found in queries. Finding with FindC has a number of queries in because the size of does not depend on the size of the network to learn. Hence, the number of queries necessary for finding is in , that is, . Convergence is obtained once is wiped out of all its constraints or those remaining are implied by the learned network . Each time an example is classified positive in line 1 of QuAcq or line 2 of FindScope, this leads to at least one constraint removal from because, by construction of QuAcq and FindScope, this example violates at least one constraint from . Concerning queries asked in FindC, their number is in at each call to FindC, and there are no more calls to FindC than constraints in because FindC always adds at least one constraint to during its execution (line 3). This gives a total number of queries required for convergence that is bounded above by the size of , that is, .

The complexities stated in Theorem 2 are based on the size of the target network and the size of the basis. The size of the language is not considered because it has a fixed size, independent of the number of variables in the target network. Nevertheless, we can wonder how the size of $\Gamma$ impacts the efficiency of FindC.

Proposition 5

Given a basis , a target network , and a scope , the number of queries required by FindC to learn a subset of equivalent to the conjunction of constraints of with scope in is in , where is the smallest such conjunction and .

Proof. We first compute the number of queries required to generate in , and then the number of queries required to remove all conjunctions of constraints not equivalent to from .

Let us first prove that line 3 of FindC will not stop generating examples before is one of the conjunctions in . Let us take as induction hypothesis that when entering a new execution of the loop in line 3, if is not in , then the set of the conjunctions in that are included in covers the whole set of elementary constraints from . That is, . The only way to modify is to ask a query . If is positive, this means that is satisfied and all its subsets remain in . If is negative, either this is due to a constraint of on a subscope of or not. If it is due to a constraint on a subscope, line 3 is executed and not line 3, so remains unchanged. If it is not due to a constraint on a subscope, this guarantees that at least one elementary constraint of is violated, and according to our induction hypothesis, at least one subset of , call it , is in . Hence, line 3 generates a conjunction of with each of the other subsets of that are in . As a result, every elementary constraint in belongs to at least one of these conjunctions with , which are uniquely composed of elementary constraints from . Furthermore, before entering the loop in line 3, by construction, all elementary constraints composing are in . As a consequence, our induction hypothesis is true. We prove now that as long as is not in , line 3 is always able to generate a query . By definition, we know that is the smallest conjunction equivalent to the constraint of with scope . Thus, no subset of can be implied by any other subset of . This guarantees that there exists an example such that one subset of is in and another subset, , is in . is a valid query to be generated in line 3 and to be asked in line 3. As a consequence, we cannot exit FindC as long as is not in .

We now prove that is in after a number of queries linear in . We first count the number of positive queries. Thanks to the condition in line 3 of FindC, we know that at least one elementary constraint of is violated by the query. Thus, all the conjunctions containing are removed from in line 3, and no conjunction containing will be able to come again in . As a result, the number of positive queries is bounded above by . Let us now count the number of negative queries. A query can be negative because of a constraint on a subscope of or because of . If because of a subscope we do not count it in the cost of learning . If because of , line 3 generates a conjunction of with each of the other subsets of that are in . Before the joining operation, either is included in the largest subset or not. If is included in , then also belongs to and it produces a larger subset by joining with any other non-included subset of . If is not included in , they are necessarily joined together, generating again a subset strictly larger than . Thus, the number of queries that are negative because of is bounded above by . Therefore, the number of queries necessary to have in is in .

Once has been generated, it will remain in until the end of this call to FindC because it can be removed neither by a positive query (it would not be in ) nor by a negative (it is either in the or a subconstraint is found and is not modified).

We now show that the number of queries required to remove all conjunctions of constraints not equivalent to from is in . We first have to prove that once a conjunction has been removed from , it will never come back in by some join operation. The conjunction can come back in if and only if there exist and in such that . If was removed due to a positive query , then was in and then, either or was in too. Thus, or has been removed from at the same time as , which contradicts the assumption that came back due to the join of and . If was removed due to a negative query , then was not in and then, none of and were in . and have thus both been joined with other elements of and have disappeared from at the same time as . This again contradicts the assumption.

We are now ready to show that all conjunctions not equivalent to are removed from in queries. For that, we first prove that all conjunctions not implied by are removed from in queries. As long as there exists a conjunction in such that , line 3 can generate a query with because if cannot be satisfied for any , there necessarily exists an with and because can always be satisfied ( is not inconsistent) and is not implied by . As a result, line 3 can never return a query with if there exists in such that . Suppose first that . By construction of , we know that at least one elementary constraint of is violated by . Thus, all the conjunctions containing are removed from and the number of positive queries is bounded above by . Suppose now that . By construction of and because is violated, we know that is not empty for some , and all these conjunctions in disappear from in line 3 because they are joined with other conjunctions of . Hence, the number of negative queries is bounded above by the number of possible conjunctions in , which is in .

Once all the conjunctions not implied by have been removed from , only contains and conjunctions included in the set of elementary constraints implied by . We show that removing from all conjunctions implied by is performed in queries. As all conjunctions remaining in are implied by , all queries will be negative. By construction of such a negative query , we know that is not empty. All these conjunctions in disappear from in line 3 because they are joined with other conjunctions of . Thus, each query removes at least one element from , which is a subset of . As a result, the number of such queries is in .

Corollary 1

Given a basis , a target network , and a scope such that contains a constraint equivalent to the conjunction of constraints of with scope and there does not exist any in such that , FindC returns in queries, which is included in .

The good news brought by Corollary 1 is that, despite the join operation required in FindC to deal with non-normalized networks, QuAcq is linear in the size of the language when does not contain constraints subsuming others and the target network is normalized.

4 Learning Simple Languages

In order to gain a theoretical insight into the “efficiency” of QuAcq, we look at some simple languages, and analyze the number of queries required to learn networks on these languages. In some cases, we show that QuAcq will learn problems of a given language with an asymptotically optimal number of queries. However, for some other languages, a suboptimal number of queries can be necessary in the worst case. Our analysis assumes that when generating a complete example in line 1 of QuAcq, the solution of maximizing the number of violated constraints in the basis is chosen.

4.1 Languages for which QuAcq is optimal

Theorem 3

QuAcq learns networks on the language in a number of queries in , which is asymptotically optimal.

Proof. First, we give a lower bound on the number of queries required to learn a constraint network in this language. In an instance of this language, all variables of a connected component must be equal. This is isomorphic to the set of partitions of objects, whose size is given by Bell’s Number:

B_{n+1} = \sum_{k=0}^{n} \binom{n}{k} B_k, \qquad B_0 = 1 \qquad (1)

By an information theoretic argument, at least queries are required to learn such a problem. This entails a lower bound of because (see [6] for the proof).
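The counting argument can be checked numerically. The following is a minimal sketch, assuming the standard Bell number recurrence and one bit of information per yes/no answer; the function names `bell_numbers` and `query_lower_bound` are hypothetical and do not come from the paper.

```python
import math


def bell_numbers(limit):
    """Return [B_0, ..., B_limit] via B_{m+1} = sum_k C(m, k) * B_k."""
    bell = [1]  # B_0 = 1: the empty set has exactly one partition
    for m in range(limit):
        bell.append(sum(math.comb(m, k) * bell[k] for k in range(m + 1)))
    return bell


def query_lower_bound(n):
    """Fewest yes/no answers that can distinguish all partitions of n objects.

    Assumption: each answer carries at most one bit, so at least
    ceil(log2(B_n)) answers are needed.
    """
    return math.ceil(math.log2(bell_numbers(n)[n]))


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        bound = query_lower_bound(n)
        # Compare the bound with n * log2(n) to see the growth rate.
        print(n, bound, round(n * math.log2(n), 1))
```

Printing the bound next to n log n for growing n gives an informal view of the growth rate that the cited proof establishes formally.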

Second, we show that QuAcq can learn networks of this language in queries, and is hence optimal. The key observation we use in this proof is that any constraint network in this language is equivalent to a constraint network that has a tree structure. This is because any constraint that creates a cycle in is redundant. Hence, any constraint network in this language contains at most non-redundant constraints. The condition we use in this proof is that function GenerateExample in line 1 of QuAcq generates assignments on that are solutions of and that maximize the number of violations in . We consider the query submitted to the user in line 1 of QuAcq and count how many times it can receive the answer yes and no.
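The redundancy observation can be illustrated with a union-find structure. This is a sketch under the assumption that every constraint is an equality between two variables indexed from 0; the helper name `non_redundant_equalities` is hypothetical and is not part of QuAcq.

```python
def non_redundant_equalities(n, equalities):
    """Keep only the equalities that merge two distinct components.

    An equality (i, j) whose endpoints are already connected closes a
    cycle and is therefore implied by the constraints kept so far.
    """
    parent = list(range(n))

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept = []
    for i, j in equalities:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            kept.append((i, j))
    return kept


if __name__ == "__main__":
    constraints = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 0)]
    print(non_redundant_equalities(4, constraints))
```

The kept constraints form a forest, so on n variables there can never be more than n − 1 of them.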

For each no answer in line 1 of QuAcq, a new constraint will eventually be added to . Only non-redundant constraints are discovered in this way because the query generated in line 1 of QuAcq must be accepted by . It follows that at most such queries are answered no, each one entailing more queries through the function FindScope and through the function FindC.

Now we bound the number of yes answers in line 1 of QuAcq. To simplify, let us consider that QuAcq generates queries using only two values. In such a case, the problem of computing a query maximizing the number of violations of constraints in while satisfying the constraints in corresponds to the problem of partitioning a set of numbers in two sets such that the product of pairs of numbers in different sets of the partition is maximized. The set of numbers is given by the set of sizes of components in , and the product corresponds to the number of constraints that can be removed from if components have not been assigned the same value. However, let be a set of numbers in decreasing order. The partition such that and is such that: . In other words, the query that QuAcq generates in line 1 violates at least constraints in . By using more than two values, the query would violate even more constraints in . Thus, each query answered yes at least halves the number of constraints in . It follows that the query submitted in line 1 of QuAcq cannot receive more than yes answers. The total number of queries is therefore bounded by , which is optimal.
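The halving step can be illustrated with a greedy two-way split of the component sizes. This is a sketch rather than the exact partition used in the proof: it assumes the basis still holds one candidate equality for every pair of variables lying in different components, and it substitutes the standard greedy max-cut rule (largest component first, always placed on the lighter side). The names `cross_pairs` and `greedy_two_value_split` are hypothetical.

```python
def cross_pairs(sizes):
    """Number of variable pairs that lie in two different components."""
    total = sum(sizes)
    return (total * total - sum(s * s for s in sizes)) // 2


def greedy_two_value_split(sizes):
    """Give each component the value 0 or 1; return (labels, separated pairs).

    Placing a component of size s on the lighter side separates it from
    the heavier side, i.e. from at least half of the variables already
    placed, so the final count is at least half of cross_pairs(sizes).
    """
    side_sum = [0, 0]
    labels = [0] * len(sizes)
    separated = 0
    for idx in sorted(range(len(sizes)), key=lambda i: -sizes[i]):
        side = 0 if side_sum[0] <= side_sum[1] else 1
        separated += sizes[idx] * side_sum[1 - side]
        side_sum[side] += sizes[idx]
        labels[idx] = side
    return labels, separated


if __name__ == "__main__":
    sizes = [5, 3, 2, 1]
    labels, separated = greedy_two_value_split(sizes)
    print(labels, separated, cross_pairs(sizes))
```

Under the stated assumption, every separated pair corresponds to a candidate equality violated by the query, which is the quantity a yes answer removes from the basis.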

Theorem 4

QuAcq learns networks on the language in a number of queries in , which is asymptotically optimal.

Proof. First, we give a lower bound on the number of queries required to learn a constraint network in this language. Consider the restriction to equalities only. In an instance of this language, all variables of a connected component must be equal. This is isomorphic to the set of partitions of objects, whose size is given by Bell’s Number:

B_{n+1} = \sum_{k=0}^{n} \binom{n}{k} B_k, \qquad B_0 = 1 \qquad (2)

By an information theoretic argument, at least queries are required to learn such a problem. This entails a lower bound of since (see [6] for the proof). The language is richer and thus requires at least as many queries.

Second, we consider the query submitted to the user in line 1 of QuAcq and count how many times it can receive the answer yes and no. The key observation is that an instance of this language contains at most non-redundant constraints. For each no answer in line 1 of QuAcq, a new constraint will eventually be added to . Only non-redundant constraints are discovered in this way because the query must be accepted by . It follows that at most such queries are answered no, each one entailing more queries through the procedure FindScope.

Now we bound the number of yes answers in line 1 of QuAcq. The same observation on the structure of this language is useful here as well. We show in the complete proof that a query maximizing the number of violations of constraints in the basis while satisfying the constraints in violates at least constraints in . Thus, each query answered yes at least halves the number of constraints in . It follows that the query submitted in line 1 of QuAcq cannot receive more than yes answers. The total number of queries is therefore bounded by .

The same argument holds for simpler languages ( and on Boolean domains). Moreover, this is still true for on arbitrary domains.

Corollary 2

QuAcq learns networks on the language in a number of queries in , which is asymptotically optimal.

4.2 Languages for which QuAcq is not optimal

First, we show that a Boolean constraint network on the language can be learned with queries. Then, we show that QuAcq requires queries.

Theorem 5

Boolean constraint networks on the language can be learned in queries.

Proof. Observe that in order to describe such a problem, the variables can be partitioned into three sets, one for variables that must take the value (i.e., on the left side of a constraint), a second for variables that must take the value (i.e., on the right side of a constraint), and the third for unconstrained variables. In the first phase, we greedily partition variables into three sets, initially empty and standing respectively for Left, Right and Unknown. During this phase, we have three invariants:

  1. There is no such that belongs to the target network

  2. iff there exists and a constraint in the target network

  3. iff there exists and a constraint in the target network

We go through all variables of the problem, one at a time. Let be the last variable picked. We query the user with an assignment where , as well as all variables in are set to , and all variables in are set to