# Modularity in Query-Based Concept Learning

We define and study the problem of modular concept learning, that is, learning a concept that is a cross product of component concepts. If an element's membership in a concept depends solely on its membership in the components, learning the concept as a whole can be reduced to learning the components. We analyze this problem with respect to different types of oracle interfaces, each defining a different set of available queries. If a given oracle interface cannot answer questions about the components, learning can be difficult, even when the components are easy to learn with the same type of oracle queries. While learning from superset queries is easy, learning from membership, equivalence, or subset queries is harder. However, we show that these problems become tractable when oracles are given a positive example and are allowed to ask membership queries.


## 1 Introduction

Inductive synthesis or inductive learning is the synthesis of programs (concepts) from examples or other observations. Inductive synthesis has found application in formal methods, program analysis, software engineering, and related areas, for problems such as invariant generation (e.g. [1]), program synthesis (e.g., [2]), and compositional reasoning (e.g. [3]). Most inductive synthesis follows the query-based learning model, where the learner is allowed to make queries about the target concept to an oracle. Using the correct set of oracles can result in the polynomial time learnability of otherwise unlearnable sets [4]. Using queries for software analysis is becoming increasingly popular (e.g., [5, 6]). The special nature of query-based learning for formal synthesis, where a program is automatically generated to fit a high-level specification through interaction with oracles, has also been formalized [7].

In spite of this progress, most algorithms for inductive learning/synthesis are monolithic; that is, even if the concept (program) is made up of components, the algorithms seek to learn the entire concept from interaction with an oracle. In contrast, in this paper, we study the setting of modular concept learning, where a learning problem is analyzed by breaking it into independent components. If an element's membership in a concept depends solely on its membership in the components that make up the concept, learning the concept as a whole can be reduced to learning the components. We study concepts that are the Cartesian products (i.e., cross-products) of their component concepts. Such concepts arise in several applications: (i) in invariant generation, an invariant that is the conjunction of other component invariants; (ii) in compositional reasoning, an automaton that is the product of individual automata encapsulating different aspects of an environment model; and (iii) in program synthesis, a product program whose state space is the product of the state spaces of individual component programs. Modular concept learning can improve the efficiency of learning, since the complexity of several query-based learning algorithms depends on the size of the concept (e.g., automaton) to be learned, and, as is well known, this size can grow exponentially with the number of components. Besides improving the efficiency of learning, modular concept learning also has, from a software engineering perspective, the advantage of reusing component learning algorithms.

We will focus on the oracle queries given in Table 1 and show several results, including both upper and lower bounds. We show that learning cross-products from superset queries is no more difficult than learning each individual concept. Learning cross-products from equivalence queries or subset queries is intractable, while learning from just membership queries is polynomial, though somewhat expensive. We show that when a learning algorithm is allowed to make membership queries and is given a single positive example, previously intractable problems become tractable. We show that learning disjoint unions of sets is easy. Finally, we discuss the computational complexity of PAC-learning and show how it can be improved when membership queries are allowed.

### 1.1 A Motivating Example

To illustrate the learning problem, consider the sketching problem given in Figure 1. Say we want to find the set of possible initial values for the two variables that can replace the holes so that the program satisfies its specification.

Looking at the structure of this program and specification, we can see that the correctness of these two variables is independent: correct values for one variable remain correct regardless of the value of the other, and vice versa. Therefore, the set of correct settings will be the cross product of the acceptable settings for each variable. If an oracle can answer queries about each variable separately, then the learner can simply learn the acceptable values separately and take their Cartesian product.

If the correct values form intervals, the correct settings will look something like the rectangle shown in Figure 1. An algorithm for learning this rectangle can try to simulate learning algorithms for each interval by acting as the oracle for each sublearner. For example, if both sublearners need a positive example, the learner can query the oracle for a positive example. Given a positive example as shown in the figure, the learner can then pass each of its coordinates to the corresponding sublearner as a positive example. However, this does not apply to negative examples, such as the one in the figure: one coordinate is in its target interval, but the other is not. The learner has no way of knowing which subconcept a negative element fails on. Handling negative counterexamples is one of the main challenges of this paper.
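As a concrete illustration, membership in the product concept decomposes coordinate-wise for positive examples but not for negative ones. A minimal sketch, with hypothetical target intervals standing in for the figure's rectangle (the actual values are not reproduced here):

```python
# Hypothetical target intervals, one per variable (illustrative values).
X_OK = range(3, 8)    # acceptable initial values for the first variable
Y_OK = range(10, 15)  # acceptable initial values for the second variable

def member(x, y):
    """Membership in the product concept: both coordinates must be acceptable."""
    return x in X_OK and y in Y_OK

# A positive example decomposes: each coordinate is a positive example
# for its own component concept.
assert member(5, 12) and 5 in X_OK and 12 in Y_OK

# A negative example does not decompose: (5, 20) is rejected, but the
# single bit returned by the product oracle does not reveal that only
# the second coordinate is at fault.
assert not member(5, 20)
assert 5 in X_OK and 20 not in Y_OK
```

The last two assertions capture the core difficulty: the product oracle's answer to a negative example carries no information about which coordinate failed.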

## 2 Notation

In the following proofs, we assume we are given concept classes C_1, …, C_k defined over sets X_1, …, X_k. Each concept c_i in each C_i is learnable by an algorithm A_i (called a sublearner) using queries to an oracle that can answer any queries in a set Q. This set contains the available types of queries and is a subset of the queries shown in Table 1. For example, if Q contains membership and equivalence queries, then each A_i can make membership and equivalence queries to its corresponding oracle.

For each query type in Q, we say algorithm A_i makes q_i many queries to the oracle in order to learn concept c_i, dropping the index when convenient. We replace the term q_i with a more specific term when the type of query is specified. For example, an algorithm might make mem_i many membership queries to learn c_i.

Unless otherwise stated, we will assume any index i or j ranges over the set {1, …, k}. We write ∏_i C_i or C_1 × ⋯ × C_k to refer to the k-ary Cartesian product (i.e., cross-product) of sets C_1, …, C_k. We use X to refer to ∏_i X_i.

We use vector notation x̄ to refer to a vector of elements (x_1, …, x_k), and we write x̄ with y substituted at position i for the vector obtained from x̄ by replacing the value at position i with y. We write x̄ ∈ ∏_i X_i for any element of the product space. The target concept will be represented as c, which equals c_1 × ⋯ × c_k = ∏_i c_i.


The results below answer the following question:

For different sets of queries Q, what is the bound on the number of queries needed to learn a concept in ∏_i C_i, as a function of the query complexity q_i of each C_i?

The proofs in this paper make use of the following simple observation:

###### Observation 1

For sets A_i ⊆ X_i and B_i ⊆ X_i, assume each A_i is nonempty. Then ∏_i A_i ⊆ ∏_i B_i if and only if A_i ⊆ B_i for all i.

## 3 Simple Lower Bounds

This section introduces some fairly simple lower bounds. We will start with a lower bound on learnability from positive queries.

###### Proposition 1

There exist concepts c_1 and c_2 that are each learnable from constantly many positive queries, such that c_1 × c_2 is not learnable from any number of positive queries.

###### Proof

Let C_1 and C_2 be chosen as follows. To learn the target set in C_1, pose two positive queries to the oracle, and return the larger candidate if and only if both of its distinguishing elements are given as positive examples. To learn the target set in C_2, pose one positive query to the oracle and return a candidate if and only if the positive example is in it. An adversarial oracle for C_1 × C_2 could give positive examples only from the common part of the candidate products. Each new example is technically distinct from previous examples, but there is no way to distinguish between the candidate target sets from these examples.

Now we will show lower bounds on learnability from equivalence (EQ), subset (Sub), and membership (Mem) queries. We will see later that this lower bound is tight when learning from membership queries, but not from equivalence or subset queries.

###### Proposition 2

There exist concept classes C_i, each learnable from m_i many queries posed to C_i, such that learning ∏_i C_i requires ∏_i m_i many queries.

###### Proof

Let C_i be the class of all singleton sets over a space X_i of size m_i.

We can learn a singleton c_i in at most m_i membership, subset, or equivalence queries by querying each candidate element (or its singleton) in turn.

However, a learning algorithm for ∏_i C_i requires ∏_i m_i many queries. To see this, note that ∏_i C_i contains all singletons in a space of size ∏_i m_i.

So for each subset query {x̄}, if {x̄} is not the target, the oracle will return x̄ as a counterexample, giving no new information beyond eliminating that one candidate. Likewise, for each equivalence query {x̄}, if {x̄} is not the target, the oracle can return x̄ as a counterexample, and a membership query on x̄ simply answers 'no' unless x̄ is the target element. Therefore, in the worst case, any learning algorithm must query x̄, {x̄}, or {x̄} for ∏_i m_i many values of x̄.

## 4 Learning From Superset Queries

This section introduces arguably the simplest positive result of the paper: when using superset queries, learning cross-products of concepts is as easy as learning the individual concepts.

Like all positive results in this paper, this is accomplished by an algorithm that takes an oracle for the cross-product concept and simulates the learning process for each sublearner by acting as an oracle for that sublearner.

###### Proposition 3

If Q consists of superset queries, then there is an algorithm that learns any concept in ∏_i C_i using at most ∑_i sup_i queries, where sup_i is the number of superset queries A_i needs to learn c_i.

###### Proof

Algorithm 1 learns c = ∏_i c_i by simulating the learning of each A_i on its respective class C_i. The algorithm asks each A_i for its superset query H_i, queries the product ∏_i H_i to the oracle, and then uses the answer to answer at least one query to some A_i. Since at least one A_i receives an answer for each oracle query, at most ∑_i sup_i queries are made in total.

We will now show that each oracle query results in at least one answer to an A_i query (and that the answer is correct). The algorithm first checks whether the target concept is empty and stops if so. If no concept class contains the empty concept, this check can be skipped. At each step, the algorithm poses the query ∏_i H_i to the oracle. If the oracle returns 'yes' (meaning ∏_i H_i ⊇ c), then H_i ⊇ c_i for each i by Observation 1, so the algorithm answers 'yes' to each A_i. If the oracle returns 'no', it will give a counterexample x̄ ∈ c ∖ ∏_i H_i. There must be at least one i with x_i ∉ H_i (otherwise, x̄ would be in ∏_i H_i). So the algorithm checks whether x_i ∈ H_i for each i until such an i is found. Since x̄ ∈ c, we know x_i ∈ c_i, so x_i ∈ c_i ∖ H_i, and the algorithm can pass x_i as a counterexample to A_i.

Note that once A_i has output a correct hypothesis H_i = c_i, coordinate i will always satisfy H_i ⊇ c_i, so counterexamples must be found for some other coordinate.
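The reduction in this proof can be sketched in runnable form. The sublearner below is a toy superset-query learner for an arbitrary finite set (it is not the paper's Algorithm 1; the generator protocol and all names are illustrative), and the coordinator answers at least one sublearner per simulated product query, exactly as the argument above describes:

```python
from itertools import product

def set_sublearner():
    """Toy superset-query learner for a finite set: start from the empty
    hypothesis and add each counterexample until the oracle says 'yes'."""
    hypothesis = set()
    while True:
        answer = yield hypothesis   # pose superset query: hypothesis superset of target?
        if answer is True:
            return                  # hypothesis now equals the target
        hypothesis.add(answer)      # counterexample: a target element we missed

def learn_product(targets):
    """Coordinator from Proposition 3, with the product-superset oracle
    simulated in-process.  All targets are assumed nonempty."""
    subs = [set_sublearner() for _ in targets]
    hyps = [next(s) for s in subs]
    finished = [False] * len(subs)
    while not all(finished):
        # Product superset query: find an element of the target product
        # missing from the queried product, if any.
        cex = next((x for x in product(*targets)
                    if any(xi not in h for xi, h in zip(x, hyps))), None)
        if cex is None:             # oracle answers 'yes': forward to every sublearner
            for i, s in enumerate(subs):
                if not finished[i]:
                    try:
                        s.send(True)
                    except StopIteration:
                        finished[i] = True
        else:                       # find a coordinate missing from its hypothesis
            i = next(i for i, (xi, h) in enumerate(zip(cex, hyps)) if xi not in h)
            hyps[i] = subs[i].send(cex[i])
    return hyps

assert learn_product([{1, 2}, {5}, {7, 8, 9}]) == [{1, 2}, {5}, {7, 8, 9}]
```

Each negative answer from the product oracle services exactly one sublearner, which is why the total query count is bounded by the sum of the sublearners' query counts.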

## 5 Learning From Membership Queries and One Positive Example

Ideally, learning the cross-product of concepts should be about as easy as learning all the individual concepts. The lower bounds above showed this is not the case when learning with equivalence, subset, or membership queries. However, when the learner is given a single positive example and allowed to make membership queries, the number of queries becomes tractable. This is due to the following simple observation.

###### Observation 2

Fix sets c_1, …, c_k, a point ȳ = (y_1, …, y_k), and an index i. If y_j ∈ c_j for all j ≠ i, then ȳ ∈ ∏_j c_j if and only if y_i ∈ c_i.

So, given a positive example p̄ ∈ c, substituting y at position i of p̄ yields an element of c if and only if y ∈ c_i. This fact is used to learn ∏_i C_i using subset or equivalence queries with the addition of membership queries and a positive example. The algorithm is fairly similar for equivalence and subset queries, and is shown as a single algorithm in Algorithm 2.
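Observation 2 turns one product membership query into an answer for a component membership query. A minimal sketch, assuming the product oracle is just a predicate and the names are illustrative:

```python
def component_membership(product_member, positive, i, x):
    """Answer the component query 'x in c_i?' using only the product oracle:
    substitute x into coordinate i of a known positive example (Observation 2)."""
    probe = list(positive)
    probe[i] = x
    return product_member(tuple(probe))

# Toy target: c = {1, 2} x {5, 6}, with known positive example (1, 5).
member = lambda p: p[0] in {1, 2} and p[1] in {5, 6}

assert component_membership(member, (1, 5), 0, 2)      # 2 is in c_1
assert not component_membership(member, (1, 5), 1, 9)  # 9 is not in c_2
```

Because every coordinate of the positive example other than i is known to be in its component, the product oracle's single bit answers the component query exactly.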

###### Proposition 4

If Q contains subset (resp. equivalence) queries together with membership queries, and a single positive example is given, then ∏_i C_i is learnable in ∑_i q_i subset (resp. equivalence) queries and at most k membership queries per counterexample, where q_i is the number of queries A_i needs to learn c_i.

###### Proof

The learning process for either subset or equivalence queries is described in Algorithm 2, with the differences marked in comments. In either case, once the correct hypothesis H_i = c_i is found for some i, coordinate i will agree with the target on all future queries, so any counterexample must fail on another coordinate.

We separately show for each type of query that a correct answer is given to at least one sublearner for each subset (resp. equivalence) query to the cross-product oracle. Moreover, at most k membership queries are made per subset (resp. equivalence) query, yielding the desired bound.

Subset Queries: For each subset query ∏_i H_i, the oracle either returns 'yes' or gives a counterexample x̄ ∈ ∏_i H_i. If the oracle returns 'yes', then H_i ⊆ c_i for all i by Observation 1, so the algorithm can return 'yes' to each A_i. Otherwise, x̄ ∉ c, so there is an i such that x_i ∉ c_i. By Observation 2, the algorithm can substitute each x_i into the positive example and pose membership queries until the failing i is found.

Equivalence Queries: For each equivalence query ∏_i H_i, the oracle either returns 'yes' or gives a counterexample x̄. If the oracle returns 'yes', then a valid target concept has been learned. Otherwise, x̄ is either a negative counterexample (in the hypothesis but not in c) or a positive one (in c but not in the hypothesis). In the first case, as with subset queries, the algorithm uses membership queries, substituting each coordinate of x̄ into the positive example until the failing i is found and x_i is given to A_i as a counterexample. In the second case, as with superset queries, the algorithm checks whether x_i ∈ H_i for each i until an i with x_i ∉ H_i is found and x_i is given to A_i.

Finally, learning from only membership queries and one positive example is fairly easy.

###### Proposition 5

If Q consists of membership queries and a single positive example is given, then ∏_i C_i is learnable in ∑_i mem_i membership queries, where mem_i is the number of membership queries A_i needs to learn c_i.

###### Proof

The algorithm learns c by simulating each A_i in sequence, moving on to A_{i+1} once A_i returns a hypothesis H_i. For any membership query y made by A_i, we have y ∈ c_i if and only if the positive example with y substituted at position i is in c, by Observation 2. Therefore the algorithm is able to simulate the oracle for each A_i, yielding a correct hypothesis ∏_i H_i.

## 6 Learning From Only Membership Queries

We have seen that learning with membership queries can be made significantly easier if a single positive example is given. In this section we describe a learning algorithm using membership queries when no positive example is given. This algorithm makes a number of queries matching the lower bound given in Section 3.

For this algorithm to work, we need to assume that the empty concept is not in C_i for any i. If not, there is no way to distinguish between an empty and a non-empty concept. For example, suppose some class contains the empty set as well as singletons over an infinite domain. It is easy to know when we have learned the correct concept in each component class using membership queries. However, learning from their cross-product is impossible: for any finite number of membership queries, there is no way to distinguish between the empty product and a product involving a singleton {x} for some x that has yet to be queried.

The main idea behind this algorithm is that learning from membership queries is easy once a single positive example is found. So the algorithm runs until a positive example is found from each concept or until all concepts are learned. If a positive example is found, the learner can then run the simple algorithm from Proposition 5 for learning from membership queries and a single positive example.

###### Proposition 6

Algorithm 3 will terminate after making at most ∏_i mem_i many queries.

###### Proof

The algorithm works by constructing sets S_i ⊆ X_i and querying all elements of ∏_i S_i. We obtain the bound of ∏_i mem_i by showing the algorithm will have found a positive example once |S_i| ≥ mem_i for all i. Since the algorithm queries all elements of ∏_i S_i, it is sufficient to prove that S_i will contain an element of c_i once |S_i| ≥ mem_i. We will now show this is true for each i.

Assume that the sublearner A_i eventually terminates with the correct answer c_i. Let T_i be the sequence of elements whose membership A_i would query assuming it received only negative answers from an oracle. If T_i is finite, then there is some set c_i⁰ that A_i outputs after querying all elements in T_i (and receiving negative answers). We will consider the cases c_i = c_i⁰ and c_i ≠ c_i⁰.

Assume c_i = c_i⁰: Then by our assumption that the empty concept is not in C_i, c_i⁰ contains some element x. Note that although sampling an element from a set might be expensive in general, this is only done for c_i⁰ and can therefore be hard-coded into the learning algorithm. The algorithm starts with x ∈ S_i, so S_i contains an element of c_i at the start of the algorithm.

Assume c_i ≠ c_i⁰: By our assumption that A_i eventually terminates, A_i must eventually query some element of c_i (otherwise, A_i would only receive negative answers and would output c_i⁰). So after at most mem_i steps, T_i contains some element of c_i. Since S_i collects the elements A_i queries, S_i contains a positive example once |S_i| ≥ mem_i, completing the proof.

## 7 Learning from Equivalence or Subset Queries is Hard

The previous section showed that learning cross-products from membership queries requires at most ∏_i mem_i membership queries. A natural next question is whether a similar bound can be achieved for equivalence and subset queries. In this section, we answer that question in the negative. We will construct a class C that can be learned from a linear number of equivalence or subset queries, but for which learning the cross-product C^k requires exponentially many queries.

We define C to be the set {c(s) ∣ s ∈ N*}, where c is defined as follows:

 c(λ) := {λ} × N
 c(s) := ({s} × N) ∪ csub(s)
 csub(sa) := ({s} × (N ∖ {a})) ∪ csub(s)

For example, c(12) = ({12} × N) ∪ ({1} × (N ∖ {2})) ∪ ({λ} × (N ∖ {1})).
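Unrolling the recursion gives a direct membership test for c(s). A sketch, with strings modeled as tuples of integers (the function name and encoding are illustrative, not from the paper):

```python
def in_concept(s, pair):
    """Does pair = (t, n) belong to c(s)?  By the definition above, c(s)
    contains {s} x N together with, for every proper prefix t of s,
    {t} x (N minus the symbol of s that immediately follows t)."""
    t, n = pair
    if t == s:
        return True                 # {s} x N
    if len(t) < len(s) and s[:len(t)] == t:
        return n != s[len(t)]       # the next symbol of s after prefix t is excluded
    return False

# Sanity checks against the worked example c(12):
assert in_concept((1, 2), ((1, 2), 99))   # {12} x N
assert in_concept((1, 2), ((1,), 3))      # prefix (1): any symbol except 2
assert not in_concept((1, 2), ((1,), 2))
assert not in_concept((1, 2), ((), 1))    # prefix (): any symbol except 1

# Key property behind the lower bound: for distinct strings, neither
# concept contains the other, e.g. ((1,), 2) separates c(1) from c(12).
assert in_concept((1,), ((1,), 2)) and not in_concept((1, 2), ((1,), 2))
```

The final assertion illustrates why a subset query can only succeed on the exact target string.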

An important part of this construction is that for any two distinct strings s and t, c(s) is not a subset of c(t). This implies that a subset query will return 'yes' if and only if the true concept has been queried. Moreover, an adversarial oracle can always give a negative counterexample for an equivalence query, meaning that this oracle can give the same counterexample it would give if a subset query were posed. So we will show that C is learnable from equivalence queries, implying that it is learnable from subset queries.

We will prove a lower bound on learning C^k from subset queries against an adversarial oracle. This will imply that C^k is hard to learn from equivalence queries, since an adversarial equivalence-query oracle can give the exact same answers and counterexamples as a subset-query oracle.

###### Proposition 7

There exist algorithms for learning C from equivalence queries or subset queries such that any concept c(s) can be learned from a number of queries linear in the length of s.

###### Proof

(sketch) Algorithm 4 shows the learning algorithm for equivalence queries, and Figure 2 shows the decision tree. When learning c(s) for any string s, the algorithm constructs s by learning at least one new element of s per query. Each new query to the oracle is constructed from a string that is a substring of s. If a positive counterexample is given, it can only yield a longer substring of s.

### 7.1 Showing C^k is Hard to Learn

It is easy to learn a single concept c(s), since each new counterexample gives one more element of the target string s. When learning a concept in the cross-product, however, it is not clear which dimension a given counterexample applies to. Specifically, a given counterexample could fail on exactly one coordinate i, but the learner cannot infer the value of this i. It must then proceed considering all possible values of i, requiring exponentially more queries for longer strings. This subsection will formalize this notion to prove an exponential lower bound on learning C^k. First, we need a couple of definitions.

A concept is justifiable if one of the following holds:

• For all ,

• There is an and an and such that , and the -ary cross-product was justifiably queried to the oracle and received a counterexample such that .

A concept is justifiably queried if it was queried to the oracle when it was justifiable.

For any strings s and t, we write s ⊑ t if s is a substring of t, and we write s ⊏ t if s ⊑ t and s ≠ t. We say that the sum of string lengths of a concept c(s_1) × ⋯ × c(s_k) is n if ∑_i |s_i| = n.

Proving that learning C^k is hard in the worst case can be thought of as a game between the learner and the oracle. The oracle can answer queries without first fixing the target concept. It will answer queries so that, until exponentially many queries have been made, there is a concept consistent with all given oracle answers that the learning algorithm will not have guessed. The specific behavior of the oracle is defined as follows:

• It will always answer the same query with the same counterexample.

• Given any query , the oracle will return a counterexample such that for all , , and has not been in any query or counterexample yet seen.

• The oracle never returns ‘yes’ on any query.

The remainder of this section assumes that queries are answered by the above oracle. An example of answers by the above oracle and the justifiable queries it yields is given below.

###### Example 1

Consider the following example when . First, the learner queries to the oracle and receives a counterexample . The justifiable concepts are now and . The learner queries and receives counterexample . The learner queries and receives counterexample . The justifiable concepts are now , , and . At this point, these are the only possible solutions whose sum of string lengths is . The graph of justifiable queries is given in Figure 3.


The following simple proposition can be proven by induction on sum of string lengths.

###### Proposition 8

Let be a justifiable concept. Then for all , , …, where for all , , has been queried to the oracle.

###### Proposition 9

If all justifiable concepts with sum of string lengths equal to n have been queried, then there are justifiable queries whose sum of string lengths equals n + 1.

###### Proof

This proof follows by induction on n. When n = 0, the all-empty-strings concept is justifiable. For the inductive step, assume that there are justifiable queries with sum of string lengths equal to n. By construction, the oracle will always choose counterexamples with as-yet unseen values. So querying each such concept will yield a counterexample whose coordinates carry new symbols. Then, for each i, this query creates a justifiable concept in which the i-th string has been extended by one symbol. Thus there are justifiable concepts with sum of string lengths equal to n + 1.

We are finally ready to prove the main theorem of this section.

###### Theorem 7.1

Any algorithm learning C^k from subset (or equivalence) queries requires a number of queries exponential in n to learn a concept c whose sum of string lengths is n.

###### Proof

Assume for contradiction that an algorithm can learn with fewer than the stated number of queries, and let this algorithm converge on some concept c after that many queries. Since too few queries were made, by Proposition 9 there must be some justifiable concept c′ with sum of string lengths less than or equal to n that has not yet been queried. By Proposition 8, we can assume without loss of generality that every justifiable predecessor of c′ has been queried to the oracle. We will show that c′ is consistent with all given oracle answers, contradicting the claim that c is the correct concept. Let c″ be any concept queried to the oracle, and let x̄ be the given counterexample. If every coordinate of c″ matches the corresponding coordinate of c′, then by construction the counterexample's fresh values place x̄ in c′ but not in c″, so x̄ is a valid counterexample. Otherwise, there is a coordinate on which c″ differs from c′, so x̄ ∉ c′, and x̄ is again a valid counterexample. Therefore, all counterexamples are consistent with c′ being the correct concept, contradicting the claim that the learner has learned c.

## 8 Disjoint Union

This section discusses learning disjoint unions of concept classes. This is generally much easier than learning cross-products of classes, since each counterexample belongs to a single dimension of the disjoint union. This problem uses the same notation as the cross-product case, but we denote the disjoint union of two sets as A ⊎ B and the disjoint union of many sets as ⨄_i A_i. We define the concept class of disjoint unions as ⨄_i C_i = {⨄_i c_i ∣ c_i ∈ C_i}.

The algorithm for learning from membership queries is very easy and won't be stated here. Algorithm 5 shows the learning procedure for equivalence and subset queries. The correctness of this algorithm follows from the following simple facts. Assume we have sets A_i and B_i. Then ⨄_i A_i ⊆ ⨄_i B_i if and only if A_i ⊆ B_i for all i. Likewise, ⨄_i A_i = ⨄_i B_i if and only if A_i = B_i for all i.
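Because every element of a disjoint union carries its dimension as a tag, routing a counterexample is trivial. A runnable sketch with toy equivalence-query sublearners and an in-process simulated union oracle (these "toggle the counterexample" learners are an illustration, not the paper's Algorithm 5):

```python
def eq_sublearner():
    """Toy equivalence-query learner for a finite set: each counterexample
    is toggled into or out of the hypothesis."""
    hypothesis = set()
    while True:
        cex = yield hypothesis                 # pose equivalence query
        hypothesis.symmetric_difference_update({cex})

def learn_disjoint_union(targets):
    """Learn a disjoint union by tagging elements with their dimension and
    forwarding each counterexample to the sublearner it belongs to."""
    subs = [eq_sublearner() for _ in targets]
    hyps = [next(s) for s in subs]
    tag = lambda sets: {(i, v) for i, s in enumerate(sets) for v in s}
    while True:
        diff = tag(hyps) ^ tag(targets)        # simulated union equivalence oracle
        if not diff:
            return hyps                        # oracle answers 'yes'
        i, v = next(iter(diff))                # counterexample names its dimension
        hyps[i] = subs[i].send(v)

assert learn_disjoint_union([{1, 2}, {2, 3}]) == [{1, 2}, {2, 3}]
```

Contrast this with the cross-product case: here the tag on each counterexample removes the "which dimension failed?" ambiguity entirely, which is why the query counts simply add up.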

We can summarize these results in the following proposition.

###### Proposition 10

Take any Q and assume each concept class C_i is learnable from q_i many queries. Then there exists an algorithm that can learn the disjoint union of the concept classes in ∑_i q_i many queries.

## 9 Efficient PAC-Learning

This section discusses the problem of PAC-learning the cross-products of concept classes.

Previously, van der Vaart and Wellner [8] have shown the following bound on the VC-dimension of cross-products of sets:

 VC(∏ Ci) ≤ a1 · log(k·a2) · ∑ VC(Ci)

Here a1 and a2 are absolute constants. As always, k is the number of concept classes included in the cross-product.

The VC-dimension gives a bound on the number of labelled examples needed to PAC-learn a concept, but says nothing of the computational complexity of the learning process. This complexity mostly comes from the problem of finding a concept in a concept class that is consistent with a set of labelled examples. We will show that the complexity of learning cross-products of concept classes is a polynomial function of the complexity of learning from each individual concept class.

First, we will describe some necessary background information on PAC-learning.

### 9.1 PAC-learning Background

###### Definition 1

Let C be a concept class over a space X. We say that C is efficiently PAC-learnable if there exists an algorithm A with the following property: for every distribution D on X, every target c ∈ C, and every ϵ, δ ∈ (0, 1), if algorithm A is given access to labelled examples drawn from D, then with probability at least 1 − δ, A will return an h ∈ C such that Pr_{x∼D}[h(x) ≠ c(x)] ≤ ϵ. A must run in time polynomial in 1/ϵ, 1/δ, and the size of c.

We will refer to ϵ as the 'accuracy' parameter and δ as the 'confidence' parameter. The value of ϵ bounds the probability that h misclassifies an x sampled from D. PAC-learners have a sample complexity function m_C(ϵ, δ): the number of samples an algorithm must see in order to probably approximately learn a concept with parameters ϵ and δ.

Given a set S of labelled examples, we will use A(S) to denote the concept the algorithm A returns after seeing S.

A learner A is an empirical risk minimizer if A(S) returns a concept in C that minimizes the number of examples in S it misclassifies.

Empirical risk minimizers are closely related to VC dimension and PAC-learnability, as shown in the following theorem (Theorem 6.7 from [9]).

###### Theorem 9.1

If the concept class C has VC dimension d, then there is a constant b such that applying an empirical risk minimizer to m_C(ϵ, δ) samples will PAC-learn C, where

 mC(ϵ, δ) ≤ b · (d·log(1/ϵ) + log(1/δ)) / ϵ
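As a quick numeric sketch of this bound (the constant b is unspecified in the theorem, so b = 1 below is a placeholder assumption, as is the function name):

```python
from math import ceil, log

def erm_sample_bound(d, eps, delta, b=1.0):
    """Theorem 9.1 upper bound on the ERM sample complexity m_C(eps, delta).
    The theorem's constant b is not specified; b=1 is a placeholder."""
    return ceil(b * (d * log(1 / eps) + log(1 / delta)) / eps)

# The bound grows linearly with the VC dimension d and, up to the log
# factor, with 1/eps.
assert erm_sample_bound(10, 0.1, 0.05) > erm_sample_bound(5, 0.1, 0.05)
assert erm_sample_bound(10, 0.01, 0.05) > erm_sample_bound(10, 0.1, 0.05)
```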

Finally, we will discuss the growth function. The growth function describes how many distinct labelings a concept class can induce on a given set of elements. More formally, for a concept class C and m ∈ N, the growth function G_C is defined by:

 GC(m) = max_{x1, x2, …, xm} ∣{(c(x1), c(x2), …, c(xm)) ∣ c ∈ C}∣

Each x_i in the above equation is taken over all possible elements of X. The VC-dimension of a class C is the largest number d such that G_C(d) = 2^d.

We will use the following bound, a corollary of the Perles-Sauer-Shelah Lemma, to bound the runtime of learning cross-products [9].

###### Lemma 1

For any concept class C with VC-dimension d and any m ≥ d:

 GC(m) ≤ (e·m/d)^d
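The lemma can be checked exhaustively for small m and d: the Sauer-Shelah lemma bounds the growth function by the binomial sum below, which in turn stays under (em/d)^d. A sanity check, not a proof:

```python
from math import comb, e

def growth_upper_bound(m, d):
    """Sauer-Shelah count: a class of VC dimension d realizes at most
    sum over i <= d of C(m, i) behaviors on m points."""
    return sum(comb(m, i) for i in range(d + 1))

# Verify the corollary G_C(m) <= (e*m/d)^d on a grid with m >= d >= 1.
for m in range(1, 30):
    for d in range(1, m + 1):
        assert growth_upper_bound(m, d) <= (e * m / d) ** d
```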

### 9.2 PAC-Learning Cross-Products

We now have enough background to describe the strategy for PAC-learning cross-products. We will describe only learning the cross-product of two concept classes. As above, assume concept classes C_1 and C_2 and PAC-learners A_1 and A_2 are given. We define T_i(ϵ, δ) as the runtime of sublearner A_i to PAC-learn with accuracy parameter ϵ and confidence parameter δ.

Assume that C_1 and C_2 have VC-dimension d_1 and d_2, respectively. We can use the bound from van der Vaart and Wellner to get an upper bound on the VC-dimension of their cross-product. Assume the algorithm is given ϵ and δ, and that there is a fixed target concept c_1 × c_2. Theorem 9.1 gives a bound on the sample complexity m(ϵ, δ), and the algorithm will take a labelled sample S of that size. Our goal is to construct an empirical risk minimizer for C_1 × C_2. In our case, the target concept lies in C_1 × C_2 and labels the sample, so an empirical risk minimizer will yield a concept in C_1 × C_2 that is consistent with S. This algorithm is shown in Algorithm 6.

So let S be any such sample the algorithm takes. This set can easily be split into positive examples S⁺ and negative examples S⁻. The algorithm works by maintaining a set of labelled samples for each dimension. For any (x, y) ∈ S⁺, it holds that x ∈ c_1 and y ∈ c_2, so x and y are added to the respective sets as positive examples. For any (x, y) ∈ S⁻, we know that x ∉ c_1 or y ∉ c_2 (or both), but it is not clear which is true. However, since the goal is only to create an empirical risk minimizer, it is enough to find any concepts h_1 and h_2 that are consistent with these samples. In other words, we need to find an h_1 ∈ C_1 and an h_2 ∈ C_2 such that for every (x, y) ∈ S⁺, x ∈ h_1 and y ∈ h_2, and for every (x, y) ∈ S⁻, either x ∉ h_1 or y ∉ h_2. One idea would be to try out all possible assignments of blame to the elements of S⁻ and check whether any such assignment fits a pair of concepts. This, however, would be exponential in |S⁻|.
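For intuition, the naive consistency search over finite classes looks like this; it is a brute-force stand-in for the empirical risk minimizer in the realizable case (names and the finite-class assumption are illustrative), and the growth-function argument that follows exists precisely to avoid this kind of exponential enumeration in general:

```python
from itertools import product

def consistent_product(C1, C2, sample):
    """Brute-force search over finite classes C1, C2 for a pair (h1, h2)
    whose product h1 x h2 is consistent with `sample`, a list of
    ((x, y), label) pairs.  Returns None if no consistent pair exists."""
    for h1, h2 in product(C1, C2):
        if all(((x in h1) and (y in h2)) == label for (x, y), label in sample):
            return h1, h2
    return None

sample = [((1, 5), True), ((2, 5), True), ((1, 6), False)]
assert consistent_product([{1}, {1, 2}], [{5}, {5, 6}], sample) == ({1, 2}, {5})
```

Note how the negative example ((1, 6), False) is handled implicitly: the search never has to decide which coordinate to blame, because consistency of the pair is checked as a whole.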

Bounding the growth function can narrow this search. Specifically, let m = |S⁻| and order the first coordinates of the elements of S⁻ as x_1, …, x_m. By the definition of the growth function and Lemma 1:

 ∣{(c(x1), c(x2), …, c(xm)) ∣ c ∈ C1}∣ ≤ GC1(m) ≤ (e·m/d1)^d1

In other words, there are fewer than (e·m/d1)^d1 assignments of truth values to the elements x_1, …, x_m that are consistent with some concept in C_1. If the algorithm can check every assignment consistent with C_1, it can then call A_2 to see if there is a concept in C_2 that is consistent with the complementary assignment on the second coordinates.

Finding these consistent assignments is made easier by the fact that we can check whether partial assignments are consistent with any concept in C_1. As mentioned above, the algorithm starts by creating the sets containing all samples in the first and second dimension of S⁺, respectively. It then iteratively adds labelled samples from S⁻. At each step, the algorithm chooses one element at a time and checks which possible assignments to it are consistent with C_1. If an assignment is consistent, the algorithm adds it and calls