 # Sampling and Learning for Boolean Function

In this article, we continue our study of the universal learning machine by introducing new tools. We first discuss boolean functions and boolean circuits, and we establish one set of tools, namely, fitting extremum and proper sampling set. We prove the fundamental relationship between proper sampling sets and the complexity of boolean circuits. Armed with this set of tools, we then introduce much more effective learning strategies. We show that with such learning strategies and learning dynamics, universal learning can be achieved, and that it requires much less data.

## 1 Introduction

In [1, 2, 4, 5], we studied the universal learning machine. There, we laid out a framework for discussion and proved some basic yet important results, such as: with sufficient data, a universal learning machine can be achieved. The core of a universal learning machine is the X-form, which turns out to be a form of boolean function. We showed that learning is actually equivalent to the dynamics of the X-form inside a learning machine. Thus, in order to study the universal learning machine well, we need to study thoroughly the X-form and its motion when driven by data.

Since the work of [2, 4, 5], we have constantly pursued effective learning dynamics and tried to understand the X-form and, more generally, boolean functions and boolean circuits. In the process, we eventually found that the very core of the problem is this: we need a powerful way to describe the properties of a boolean function. With such a tool, we can penetrate deep into boolean functions and do much better than before. But it is not easy to find such a tool, and it took us a long time. We recently invented a set of tools, namely, fitting extremum and proper sampling set. Our invention, i.e. fitting extremum and learning dynamics, is described in our patent applications [9, 10]. How to use fitting extremum and proper sampling set in a special case, namely a 1-dim real function, is treated in a separate work. In this article, we provide theoretical discussions of these tools and related studies.

We discuss boolean functions in section 2 and boolean circuits in section 3. We define a way to represent a boolean circuit, i.e. the connection matrix, and the decomposition of the connection matrix. In section 4, we introduce sampling set, fitting extremum, and proper sampling set (PSS). We show the deep connections between PSS and the size of a boolean circuit. In section 5, we discuss how to apply these tools to learning dynamics, and prove that a universal learning machine can be achieved by using them. Finally, in section 6, we make some comments. In the appendix, we give details of the relationship between PSS and the size of a boolean circuit.

## 2 Boolean Function

Boolean functions and boolean circuits are very important for learning machines. We first define boolean functions and related concepts.

$\mathbb{B}^N$ is the N-dim boolean space; it consists of all N-dim boolean vectors:

$$\mathbb{B}^N = \{(b_1, b_2, \ldots, b_N) \mid b_k = 0 \text{ or } 1,\ k = 1, 2, \ldots, N\}$$

We also call this space the base pattern space. $\mathbb{B}^N$ is the starting point for us. In particular, when $N = 1$, $\mathbb{B}^N$ becomes $\mathbb{B} = \{0, 1\}$. An N-dim boolean function is a function defined on $\mathbb{B}^N$:

###### Definition 2.1 (Boolean Function).

An N-dim boolean function is a function from $\mathbb{B}^N$ to $\mathbb{B}$. We can also write it as:

$$f: \mathbb{B}^N \to \mathbb{B}, \quad f(b_1, b_2, \ldots, b_N) = 0 \text{ or } 1$$

Here are some examples of boolean functions.

###### Example 2.1 (Some Simplest Boolean Functions).

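The constant function is the simplest:

$$f: \mathbb{B}^N \to \mathbb{B}, \quad f(b_1, b_2, \ldots, b_N) = 1$$

A function that depends on only one variable is also very simple:

$$f: \mathbb{B}^N \to \mathbb{B}, \quad f(b_1, b_2, \ldots, b_N) = b_1$$

Boolean functions formed by a single basic logic operation are also very simple. The logical operation OR forms one boolean function:

$$o: \mathbb{B}^2 \to \mathbb{B}, \quad o(b_1, b_2) = b_1 \vee b_2 = \begin{cases} 0 & \text{both are } 0 \\ 1 & \text{otherwise} \end{cases}$$

The logical operation AND also forms one boolean function:

$$a: \mathbb{B}^2 \to \mathbb{B}, \quad a(b_1, b_2) = b_1 \wedge b_2 = \begin{cases} 1 & \text{both are } 1 \\ 0 & \text{otherwise} \end{cases}$$

The logical operation Identity also forms one boolean function:

$$id: \mathbb{B} \to \mathbb{B}, \quad id(b) = b = \begin{cases} 1 & b = 1 \\ 0 & b = 0 \end{cases}$$

The logical operation Negation also forms one boolean function:

$$n: \mathbb{B} \to \mathbb{B}, \quad n(b) = \neg b = \begin{cases} 1 & b = 0 \\ 0 & b = 1 \end{cases}$$

The logical operation XOR also forms one boolean function:

$$x: \mathbb{B}^2 \to \mathbb{B}, \quad x(b_1, b_2) = b_1 \oplus b_2 = \begin{cases} 1 & \text{exactly one of } b_1, b_2 \text{ is } 1 \\ 0 & \text{otherwise} \end{cases}$$

It is worth noting that XOR can be written using OR, AND and Negation:

$$b_1 \oplus b_2 = (b_1 \wedge \neg b_2) \vee (\neg b_1 \wedge b_2) = (b_1 \vee b_2) \wedge \neg (b_1 \wedge b_2)$$

These simple logic operations actually form the foundation of boolean functions. But boolean functions can be defined and calculated in many ways, not just by logical operations.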

###### Example 2.2 (Boolean Function as Real Function).

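The logical operation OR can be written as a real function:

$$o: \mathbb{B}^2 \to \mathbb{B}, \quad o(b_1, b_2) = b_1 \vee b_2 = \mathrm{sign}(b_1 + b_2), \quad \text{where } \mathrm{sign}(x) = \begin{cases} 1 & x > 0 \\ 0 & x \le 0 \end{cases}$$

where $+$ is the addition of real numbers. The logical operation AND can be written:

$$a: \mathbb{B}^2 \to \mathbb{B}, \quad a(b_1, b_2) = b_1 \wedge b_2 = b_1 \cdot b_2$$

where $\cdot$ is the multiplication of real numbers. The logical operation Negation can be written:

$$n: \mathbb{B} \to \mathbb{B}, \quad n(b) = \neg b = -(b - 1)$$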
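The real-function forms above can be checked against the logical definitions on every input. The following is a minimal sketch; the function names (`sign`, `or_real`, `and_real`, `not_real`) are chosen here for illustration and do not come from the original text.

```python
from itertools import product

def sign(x):
    """Step function as defined in the text: 1 if x > 0, otherwise 0."""
    return 1 if x > 0 else 0

def or_real(b1, b2):
    # OR via real addition followed by sign
    return sign(b1 + b2)

def and_real(b1, b2):
    # AND via real multiplication
    return b1 * b2

def not_real(b):
    # Negation via -(b - 1), i.e. 1 - b
    return -(b - 1)

# Compare with Python's bitwise operators on every input in B^2 and B.
for b1, b2 in product((0, 1), repeat=2):
    assert or_real(b1, b2) == (b1 | b2)
    assert and_real(b1, b2) == (b1 & b2)
for b in (0, 1):
    assert not_real(b) == 1 - b
```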

More boolean functions can be defined by real functions.

###### Example 2.3 (More Boolean Functions Defined by Real Functions).

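We can define a boolean function as:

$$f: \mathbb{B}^2 \to \mathbb{B}, \quad f(b_1, b_2) = \mathrm{sign}(\mathrm{Oscil}(r_1 b_1 + r_2 b_2)), \quad \mathrm{sign}(x) = \begin{cases} 1 & x > 0 \\ 0 & x \le 0 \end{cases}$$

where $r_1, r_2$ are 2 real numbers, sign is the sign function, and Oscil is an oscillator function. An oscillator function is something like a sinusoid, which oscillates from negative to positive and back again. Generally, oscillator functions are very rich. They do not need to oscillate regularly like a sinusoid. They could oscillate irregularly and in a very complicated way.

Yet, another boolean function is more popular:

$$f: \mathbb{B}^N \to \mathbb{B}, \quad f(b_1, b_2, \ldots, b_N) = \mathrm{sign}(r_1 b_1 + r_2 b_2 + \ldots + r_N b_N)$$

where $r_1, r_2, \ldots, r_N$ are real numbers. This function is often called an artificial neuron. A small modification gives the linear threshold function:

$$f: \mathbb{B}^N \to \mathbb{B}, \quad f(b_1, b_2, \ldots, b_N) = \mathrm{sign}(r_1 b_1 + r_2 b_2 + \ldots + r_N b_N - \theta)$$

where $r_1, r_2, \ldots, r_N$ and $\theta$ are real numbers.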
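As a small illustration of the linear threshold function, the sketch below evaluates it on $\mathbb{B}^2$. The weights and thresholds are illustrative choices made here, not values from the original text; with weights $(1, 1)$, the threshold $1.5$ reproduces AND and the threshold $0.5$ reproduces OR.

```python
def sign(x):
    """Step function as defined in the text: 1 if x > 0, otherwise 0."""
    return 1 if x > 0 else 0

def linear_threshold(weights, theta, bits):
    """sign(r_1 b_1 + ... + r_N b_N - theta) for a boolean input vector."""
    total = sum(r * b for r, b in zip(weights, bits))
    return sign(total - theta)

# Inputs in the order (0,0), (0,1), (1,0), (1,1).
inputs = [(b1, b2) for b1 in (0, 1) for b2 in (0, 1)]

# Illustrative parameters (assumptions of this sketch).
and_values = [linear_threshold((1, 1), 1.5, v) for v in inputs]
or_values = [linear_threshold((1, 1), 0.5, v) for v in inputs]
```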

The parity function is an important boolean function, which helps us in many respects.

###### Example 2.4 (Parity Function).

The parity function is defined as below:

$$p(b_1, b_2, \ldots, b_N) = \begin{cases} 1 & \text{number of 1s is odd} \\ 0 & \text{number of 1s is even} \end{cases}$$

Parity can also be calculated with real numbers as below:

$$p(b_1, b_2, \ldots, b_N) = \left(\sum_{i=1}^{N} b_i\right) \pmod 2$$

Since a boolean function is defined on a finite set, it is possible to express it by a table of values. This table is called the truth table. For example, the parity function of 3 variables can be expressed as the table below:

| $b_1$ | $b_2$ | $b_3$ | $p(b_1, b_2, b_3)$ |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |

We have seen that a boolean function can be defined and calculated in many ways, such as logical operations, real functions, truth tables, etc. But any boolean function can be expressed by logical operations.

###### Lemma 2.1 (Expressed by Basic Logic Operation).

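Any boolean function can be expressed by basic logic operations: $\vee, \wedge, \neg$.

Proof: First, a boolean function can be expressed by its truth table. The truth table has $2^N$ entries, and each entry records the function value at one boolean vector. Each boolean vector $v \in \mathbb{B}^N$ can be singled out by basic logic operations: the AND of the literals $b_k$ (where $v_k = 1$) and $\neg b_k$ (where $v_k = 0$) equals 1 exactly at $v$. Taking the OR of these terms over all entries whose value is 1 expresses the boolean function (the constant 0 function can be written as $b_1 \wedge \neg b_1$).

For example, we can express the parity function of 3 variables as:

$$p(b_1, b_2, b_3) = (b_1 \oplus b_2) \oplus b_3$$

Note, $\oplus$ can be expressed by $\vee, \wedge, \neg$.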
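The truth-table argument behind Lemma 2.1 can be sketched in code: collect the input vectors where the function is 1, build one AND term of literals per vector, and OR the terms together. This is only an illustrative sketch; the helper names `dnf_terms`, `eval_dnf` and `parity3` are introduced here and are not part of the original text.

```python
from itertools import product

def dnf_terms(f, n):
    """Return the input vectors on which f equals 1 (one AND term per vector)."""
    return [v for v in product((0, 1), repeat=n) if f(*v) == 1]

def eval_dnf(terms, x):
    """Evaluate the OR over all terms; each term is an AND of literals."""
    result = 0
    for v in terms:
        term = 1
        for vk, xk in zip(v, x):
            literal = xk if vk == 1 else 1 - xk   # b_k, or NOT b_k
            term = term & literal
        result = result | term
    return result

def parity3(b1, b2, b3):
    return (b1 + b2 + b3) % 2

terms = dnf_terms(parity3, 3)

# The expression built from the truth table agrees with the function everywhere.
for x in product((0, 1), repeat=3):
    assert eval_dnf(terms, x) == parity3(*x)
```

The number of terms equals the number of 1s in the truth table, so this construction shows existence only; it makes no attempt to find a small expression.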

Here is another example of a boolean function.

###### Example 2.5 (Expressed By Polynomial Function).

Consider a polynomial function $P$ on the real numbers. Also, consider a way to embed a boolean vector into the real numbers. There are infinitely many such embeddings. We will consider the following:

$$\forall v = (b_1, b_2, \ldots, b_N) \in \mathbb{B}^N, \quad x = b_1 \left(\tfrac{1}{2}\right) + b_2 \left(\tfrac{1}{2}\right)^2 + \ldots + b_N \left(\tfrac{1}{2}\right)^N$$

Then, we define a boolean function as:

$$\forall v \in \mathbb{B}^N, \quad f(v) = \mathrm{sign}(P(x)), \quad \text{where } x \text{ is as above}$$

This defines a boolean function on $\mathbb{B}^N$. Defining boolean functions this way, through an embedding into the real numbers, is quite useful.
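A minimal sketch of this construction follows. The polynomial `P` used here is a hypothetical choice made only for illustration (the example polynomial of the original text is not available), and the names `embed` and `boolean_from_polynomial` are likewise introduced here.

```python
def embed(v):
    """Map (b_1, ..., b_N) to x = b_1 (1/2) + b_2 (1/2)^2 + ... + b_N (1/2)^N."""
    return sum(b * 0.5 ** (k + 1) for k, b in enumerate(v))

def boolean_from_polynomial(P, v):
    """f(v) = sign(P(x)), with sign(t) = 1 if t > 0 and 0 otherwise."""
    return 1 if P(embed(v)) > 0 else 0

def P(x):
    # Hypothetical example polynomial, chosen only for illustration.
    return (x - 0.25) * (x - 0.75)

values = {v: boolean_from_polynomial(P, v)
          for v in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

With this particular `P`, only the vector $(0, 0)$, embedded as $x = 0$, gives a positive value, so the resulting boolean function on $\mathbb{B}^2$ is 1 at $(0, 0)$ and 0 elsewhere.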

## 3 Boolean Circuit

We know a boolean function can be defined and calculated in many possible ways. But, no matter how it is defined and calculated, Lemma 2.1 tells us that it can be expressed by $\vee, \wedge, \neg$. We call such an expression a boolean expression.

###### Definition 3.1 (Boolean Expression).

A boolean function $f$ can be expressed by $\vee, \wedge, \neg$ and the input variables as one algebraic expression; we call this algebraic expression a boolean expression of $f$.

A boolean expression is also called a boolean formula. As one example, the parity function of 4 variables can be expressed as:

$$p(b_1, b_2, b_3, b_4) = (b_1 \oplus b_2) \oplus (b_3 \oplus b_4)$$

This is to say, we can realize a boolean function by one algebraic expression. Moreover, we can realize an algebraic expression by hardware, that is, a group of switches and connections, namely a circuit. Actually, we can make such a circuit as a direct translation of the boolean expression: use an AND switch to replace $\wedge$, an OR switch to replace $\vee$, and a negation connection to replace $\neg$. Thus, we have the definition:

###### Definition 3.2 (Boolean Circuit).

A boolean circuit is a directed acyclic graph. There are 2 types of nodes, AND nodes and OR nodes. A connection between nodes is either a direct connection (1 to 1 and 0 to 0) or a negation connection (1 to 0 and 0 to 1). The graph starts from the input nodes $b_1, b_2, \ldots, b_N$ and ends at the top node. We note that at each node, there are 2 and only 2 connections from below (this is called fan-in 2). But the number of connections going up could be any number.

Note, the definition here is slightly different from the boolean circuit defined in most of the literature. But the difference is superficial and is only for the convenience of our discussions. We can draw a boolean circuit as a diagram. See the diagram below for some examples. A boolean circuit and a boolean expression are actually identical. So, later we will use them interchangeably.

###### Example 3.1 (Some Simple Circuit).

Simplest circuit: . This is a special case. This circuit has no node, i.e. the number of node is 0.

Second simplest circuit: . See Fig. 1 C1 for diagram. This circuit has 1 node and 2 connections. Circuit: . See Fig. 1 C3 for diagram. This circuit has 1 node and 2 connections: one is a direct connection, the other is a negation connection.

Circuit for AND. See Fig. 1 C2 for diagram. This circuit has 1 node and 2 connections, both are direct connections.

Circuit for XOR. See Fig. 1 C5 for diagram. We can express it as: $(b_1 \wedge \neg b_2) \vee (\neg b_1 \wedge b_2)$. This circuit has 3 nodes, i.e. one OR node and 2 AND nodes, with 2 negation connections.

Circuit: $b_1 \vee (b_2 \wedge \neg b_3)$. See Fig. 1 C4 for diagram. This circuit has 2 nodes.

Fig1. Diagrams of Some Simple Circuits

For a given boolean circuit $C$ and a given input, i.e. $b_1, b_2, \ldots, b_N$ each taking the value 0 or 1, we can feed these values into $C$. The circuit then takes a value at each node accordingly. When the value at the top node is taken, the circuit takes this value as its own. This is how a boolean circuit executes a boolean function. We will denote this value as $C(b_1, b_2, \ldots, b_N)$.

Any boolean function $f$, no matter how it is defined and calculated, can be expressed by a boolean circuit $C$. That is to say, $f(x) = C(x)$ for all $x \in \mathbb{B}^N$.

Clearly, for a boolean function, the boolean circuit expressing the function is not unique. For example, the very simple boolean function XOR can be expressed in 2 ways: $(b_1 \wedge \neg b_2) \vee (\neg b_1 \wedge b_2)$ or $(b_1 \vee b_2) \wedge \neg (b_1 \wedge b_2)$. That is to say, XOR can be expressed by 2 different boolean circuits. For more complicated boolean functions, this is even more true.

A boolean circuit consists of a series of nodes and connections. One very important property of a boolean circuit is its number of nodes.

###### Definition 3.3 (Node Number).

For a boolean circuit $C$, we denote the number of nodes of $C$ as $d(C)$.

That is to say, we define a function $d$ on all circuits. This function is called the node number. It will play an important role in our discussions.

How can we write down a boolean circuit? We can write it as an algebraic expression like before. But, for easy manipulation, we need more ways to write it. First, we denote all nodes of a circuit $C$ as $g_1, g_2, \ldots, g_K$, where $K = d(C)$. These are the working nodes. The input variables $b_1, \ldots, b_N$ are also nodes, namely the nodes for inputs. So, $C$ is a graph with nodes $b_1, \ldots, b_N, g_1, \ldots, g_K$: $b_1, \ldots, b_N$ are the input nodes, $g_1, \ldots, g_K$ are the working nodes, and $g_K$ is the ending node (it is a working node as well).

At each working node $g_i$, there are 2 and only 2 incoming connections. Except for the ending node, each working node has 1 or more outgoing connections.

Thus, besides using diagram and boolean algebraic expression to express a boolean circuit, we can use matrix notation to express a circuit.

###### Definition 3.4 (Connection Matrix).

For a circuit $C$ on $\mathbb{B}^N$, suppose all working nodes of $C$ are $g_1, \ldots, g_K$, where $K = d(C)$. We define a $K \times (N + K - 1)$ matrix $M$ whose entries are the symbols $\wedge, \wedge\neg, \vee, \vee\neg$ or 0, with the following meaning:

$$\text{at } (i, j): \begin{cases} 0 & \text{no connection from the } j\text{-th node to the } i\text{-th working node} \\ \wedge & \text{direct connection from the } j\text{-th node to the } i\text{-th working node, and this working node is } \wedge \\ \wedge\neg & \text{negation connection from the } j\text{-th node to the } i\text{-th working node, and this working node is } \wedge \\ \vee & \text{direct connection from the } j\text{-th node to the } i\text{-th working node, and this working node is } \vee \\ \vee\neg & \text{negation connection from the } j\text{-th node to the } i\text{-th working node, and this working node is } \vee \end{cases}$$

We call such a matrix the connection matrix of $C$.

Clearly, for a given circuit, we can write down its connection matrix. Conversely, if we have such a matrix, it gives a circuit as well. So, we can identify a circuit with its connection matrix.

We can see some immediate properties of the connection matrix. Each row is for one working node, and each column is for the connections from one node (any node except the ending node) to the working nodes. Since each working node has 2 and only 2 incoming connections, each row has 2 and only 2 nonzero entries. Since each node (except the ending node) has 1 or more outgoing connections, each column has 1 or more nonzero entries.

###### Example 3.2 (Examples of Connection Matrix).

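Consider the circuit $f = b_1 \vee (b_2 \wedge \neg b_3)$. See Fig. 1 C4 for the diagram of this circuit. All nodes of $f$ are $b_1, b_2, b_3, g_1, g_2$, the working nodes are $g_1, g_2$, and the ending node is $g_2$. The connection matrix of $f$ is a $2 \times 4$ matrix as below:

$$M_f = \begin{bmatrix} 0 & \wedge & \wedge\neg & 0 \\ \vee & 0 & 0 & \vee \end{bmatrix}$$

As another example, consider XOR. The circuit is $(b_1 \vee b_2) \wedge \neg (b_1 \wedge b_2)$, all nodes are $b_1, b_2, g_1, g_2, g_3$, the working nodes are $g_1, g_2, g_3$, and the ending node is $g_3$. The connection matrix is a $3 \times 4$ matrix as below:

$$M_{xor} = \begin{bmatrix} \vee & \vee & 0 & 0 \\ \wedge & \wedge & 0 & 0 \\ 0 & 0 & \wedge & \wedge\neg \end{bmatrix}$$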
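To make the connection matrix concrete, the sketch below stores a matrix as a list of rows and evaluates the circuit row by row. The string encoding (`"0"`, `"and"`, `"and_not"`, `"or"`, `"or_not"`) and the function name `eval_circuit` are assumptions of this sketch, not notation from the original text. It also assumes the rows are listed so that every working node appears after the nodes it depends on.

```python
from itertools import product

def eval_circuit(matrix, inputs):
    """Evaluate a circuit from its connection matrix.

    Column j refers to node j in the order b_1..b_N, g_1..g_{K-1};
    row i describes the two incoming connections of working node g_{i+1}.
    """
    values = list(inputs)
    for row in matrix:
        kind, operands = None, []
        for j, entry in enumerate(row):
            if entry == "0":
                continue
            op, _, neg = entry.partition("_")      # "and_not" -> ("and", "_", "not")
            kind = op
            operands.append(1 - values[j] if neg == "not" else values[j])
        a, b = operands                            # exactly 2 incoming connections
        values.append(a & b if kind == "and" else a | b)
    return values[-1]                              # value at the ending node

# Connection matrix of f = b1 OR (b2 AND NOT b3), diagram C4.
M_F = [["0", "and", "and_not", "0"],
       ["or", "0", "0", "or"]]

# Connection matrix of XOR written as (b1 OR b2) AND NOT (b1 AND b2).
M_XOR = [["or", "or", "0", "0"],
         ["and", "and", "0", "0"],
         ["0", "0", "and", "and_not"]]

for b1, b2 in product((0, 1), repeat=2):
    assert eval_circuit(M_XOR, (b1, b2)) == b1 ^ b2
```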

In the above discussions, there is no order among the working nodes. Now we define an order among them. Let's see how the ending node gets its value. At the very beginning, only the input nodes have values; all working nodes are empty. When values propagate along the circuit, the working nodes that have both incoming connections from input nodes get their values first. So, these nodes should be put first in the order. But there could be more than one such node. Among these nodes, we define the order in this way: if two nodes both have 2 incoming connections from input nodes, say $g$ with inputs $b_i, b_j$ ($i \le j$), and $g'$ with inputs $b_{i'}, b_{j'}$ ($i' \le j'$), the order of $g, g'$ is determined by the so-called dictionary order, i.e. if $i < i'$, then $g$ comes before $g'$; if $i = i'$ and $j < j'$, then $g$ comes before $g'$. Yet, if $i = i'$ and $j = j'$, then $g$ and $g'$ must be of different types (otherwise, we could eliminate one), and the $\vee$ node comes first.

Now we have an order among the working nodes that have 2 incoming connections from input nodes. These nodes are evaluated. We then consider the working nodes whose 2 incoming connections come from nodes that already have values, and order them as before. Clearly, we can repeat this process, so eventually we have an order on all working nodes.

In short, the natural order of working nodes means: if one node is evaluated earlier, then it is earlier in the natural order. To demonstrate this order, we look at one example, the circuit for the parity function of 4 variables: $p(b_1, b_2, b_3, b_4) = (b_1 \oplus b_2) \oplus (b_3 \oplus b_4)$, where each $\oplus$ is realized as $x \oplus y = (x \vee y) \wedge \neg (x \wedge y)$. See the diagram below.

Fig2. Circuit of Parity of 4 Variables

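There are 9 working nodes. Thus, all nodes are $b_1, \ldots, b_4, g_1, \ldots, g_9$. According to the natural order, the nodes get values in this way: $b_1, b_2, b_3, b_4$ get the input values, then $g_1, g_2, g_3, g_4$ get values, then $g_5, g_6$, then $g_7, g_8$, and finally $g_9$. We can write the connection matrix below.

$$M_p = \left[\begin{array}{cccccccccccc}
\vee & \vee & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\wedge & \wedge & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \vee & \vee & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \wedge & \wedge & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \wedge & \wedge\neg & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \wedge & \wedge\neg & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \vee & \vee & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \wedge & \wedge & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \wedge & \wedge\neg
\end{array}\right]$$

Note, the connection matrix is written according to the natural order of working nodes. If the working nodes are ordered differently, the connection matrix will look different (but only by a permutation).

The natural order of working nodes is a useful tool. We use a lemma to describe it.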

###### Lemma 3.1 (Natural Order of Working Nodes).

For a boolean circuit $C$, suppose its working nodes are $g_1, \ldots, g_K$. We can define a natural order on the working nodes, so that the evaluation of each working node depends only on nodes in front of it, and does not depend on any working node behind it.

Proof: The proof is already given in the discussion above.

Using the natural order of working nodes, we can see that the working nodes fall into levels. For example, for the parity of 4 variables, we have 9 working nodes, and they are divided into 4 levels: level 1: $\{g_1, g_2, g_3, g_4\}$, level 2: $\{g_5, g_6\}$, level 3: $\{g_7, g_8\}$, and level 4: $\{g_9\}$. This can be seen clearly in the diagram. Nodes in level 1 get values first. Nodes in level 2 depend on level 1, etc. That is to say, in order to evaluate the nodes in level $j$, all nodes in all levels below $j$ must be evaluated first.

###### Definition 3.5 (Level of Nodes).

For a boolean circuit $C$, suppose its working nodes are $g_1, \ldots, g_K$. We can group the working nodes into a series of subsets $L_1, L_2, \ldots, L_J$, where $L_j$ consists of all working nodes whose incoming connections all come from previous subsets, i.e. from $L_0, L_1, \ldots, L_{j-1}$ (with $L_0$ the set of input nodes), and that do not already belong to a previous subset. We call each subset one level of working nodes, and we call the number $J$ the depth, or depth number, or height.

According to Lemma 3.1, we can indeed form such levels of working nodes. Clearly, the top level has only one node, i.e. the ending node $g_K$. As the above example of the parity of 4 variables demonstrates, the evaluation of a circuit must proceed level by level. In order to evaluate the nodes in level $j$, all nodes in the levels below $j$ must be evaluated first. This property indicates that we can decompose the circuit according to level.
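The grouping into levels can be computed directly from the connections. The sketch below is one possible way to do it, under the assumption that working nodes are listed in natural order and that each working node is described only by the indices of its two source nodes; the function name `node_levels` and this encoding are introduced here for illustration.

```python
def node_levels(n_inputs, sources):
    """Return the level of each working node.

    sources[i] = (j1, j2) gives the indices of the two nodes feeding working
    node i, with nodes indexed as b_1..b_N -> 0..N-1 followed by the working
    nodes in natural order. Input nodes are at level 0; a working node sits
    one level above its highest source.
    """
    level = [0] * n_inputs
    for j1, j2 in sources:
        level.append(1 + max(level[j1], level[j2]))
    return level[n_inputs:]

# Parity of 4 variables: g1..g4 read inputs, g5 reads (g1, g2), g6 reads (g3, g4),
# g7 and g8 read (g5, g6), and g9 reads (g7, g8).
parity4_sources = [(0, 1), (0, 1), (2, 3), (2, 3),
                   (4, 5), (6, 7), (8, 9), (8, 9), (10, 11)]
levels = node_levels(4, parity4_sources)
depth = max(levels)
```

For the parity circuit this gives 4 nodes at level 1, 2 at level 2, 2 at level 3 and the ending node at level 4, matching the grouping described above.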

That is to say, we can evaluate in this way: from the input nodes to level 1, then from level 1 to level 2, etc. The connection matrix of the parity of 4 variables shows this clearly. Thus, we can decompose the connection matrix according to levels. See below:

$$M_1 = \begin{bmatrix} \vee & \vee & 0 & 0 \\ \wedge & \wedge & 0 & 0 \\ 0 & 0 & \vee & \vee \\ 0 & 0 & \wedge & \wedge \end{bmatrix} \quad M_2 = \begin{bmatrix} \wedge & \wedge\neg & 0 & 0 \\ 0 & 0 & \wedge & \wedge\neg \end{bmatrix} \quad M_3 = \begin{bmatrix} \vee & \vee \\ \wedge & \wedge \end{bmatrix} \quad M_4 = \begin{bmatrix} \wedge & \wedge\neg \end{bmatrix}$$

Here, $M_1$ takes the values of the input nodes to the values of the nodes in level 1. For example, if $v = (b_1, b_2, b_3, b_4)^T$ is the input, then $M_1 v$ is a 4-dim vector that gives the values of all nodes in level 1. We can continue, using $M_2$ for the values of all nodes in level 2, $M_3$ for the values of all nodes in level 3, and finally $M_4$ for the value of the top node. We can write these operations in the following form:

$$C_p(v) = M_4 M_3 M_2 M_1 v, \quad v = (b_1, b_2, b_3, b_4)^T \in \mathbb{B}^4$$

Here, $C_p$ is the circuit of the parity of 4 variables, and $C_p(v)$ stands for the value of the top node, which is the output value of the circuit. In this way, we can operate on a circuit much more easily. It is still not as convenient as ordinary matrix calculation, but it is much better and clearer. We will use this notation consistently.

However, we need to be more careful. In the above example, level $j$ only depends on level $j-1$, not directly on any lower level. This is not always true. Consider the circuit $f = b_1 \vee (b_2 \wedge \neg b_3)$, which is diagram C4 in Fig. 1. All nodes of $f$ are $b_1, b_2, b_3, g_1, g_2$. The working nodes are $g_1, g_2$. The connection matrix of $f$ is a $2 \times 4$ matrix as below:

$$M_f = \begin{bmatrix} 0 & \wedge & \wedge\neg & 0 \\ \vee & 0 & 0 & \vee \end{bmatrix}$$

So, clearly, level 0 is $\{b_1, b_2, b_3\}$ (input nodes), level 1 is $\{g_1\}$, level 2 is $\{g_2\}$ (ending node). But we can see that the level 2 node has incoming connections from both level 1 and level 0. Thus, decomposing level by level seems to run into difficulty. Can we still do the decomposition as we did for $C_p$?

In order to make a neat decomposition, we need to introduce a new kind of node: the spurious node. We will use $s$ to denote a spurious node. A spurious node is a node added to a level just to pass a connection from a lower level to a higher level. After introducing spurious nodes, we are back in the situation where level $j$ only depends on level $j-1$, not on any earlier level. As an example, for $f$, we add one spurious node in level 1. This spurious node has 1 and only 1 incoming connection, and it does nothing but pass on the value of $b_1$, so its outgoing connections are exactly the same as the outgoing connections of $b_1$. After adding this node, $g_2$ has 2 incoming connections from level 1. So, we can write the following decomposition.

$$M_1 = \begin{bmatrix} s & 0 & 0 \\ 0 & \wedge & \wedge\neg \end{bmatrix} \quad M_2 = \begin{bmatrix} \vee & \vee \end{bmatrix}$$

And,

$$C_f(v) = M_2 M_1 v, \quad v = (b_1, b_2, b_3)^T \in \mathbb{B}^3$$

This decomposition will make our operation on circuit easier. For example, if input is , then, , , so .
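The level-by-level evaluation, including spurious nodes, can be sketched as follows. The string encoding of the entries (`"s"`, `"and"`, `"and_not"`, `"or"`, `"or_not"`, `"0"`) and the names `apply_level` and `eval_levels` are assumptions of this sketch. It applies one level matrix at a time, which is the operation written above as a product of matrices acting on the input vector.

```python
from itertools import product

def apply_level(matrix, v):
    """Apply one level connection matrix to the value vector of the level below."""
    out = []
    for row in matrix:
        entries = [(e, v[j]) for j, e in enumerate(row) if e != "0"]
        if len(entries) == 1 and entries[0][0] == "s":
            out.append(entries[0][1])             # spurious node: pass the value on
            continue
        (e1, x1), (e2, x2) = entries              # exactly 2 incoming connections
        a = 1 - x1 if e1.endswith("_not") else x1
        b = 1 - x2 if e2.endswith("_not") else x2
        out.append(a & b if e1.startswith("and") else a | b)
    return out

def eval_levels(levels, v):
    """Evaluate a circuit given as a list of level matrices M_1, M_2, ..."""
    for m in levels:
        v = apply_level(m, v)
    return v[0]                                   # the top level has a single node

# f = b1 OR (b2 AND NOT b3), with one spurious node passing b1 through level 1.
F_LEVELS = [[["s", "0", "0"], ["0", "and", "and_not"]],
            [["or", "or"]]]

# Parity of 4 variables, decomposed into 4 levels.
P_LEVELS = [[["or", "or", "0", "0"], ["and", "and", "0", "0"],
             ["0", "0", "or", "or"], ["0", "0", "and", "and"]],
            [["and", "and_not", "0", "0"], ["0", "0", "and", "and_not"]],
            [["or", "or"], ["and", "and"]],
            [["and", "and_not"]]]

for v in product((0, 1), repeat=4):
    assert eval_levels(P_LEVELS, v) == sum(v) % 2
```

For instance, with the input $(0, 1, 0)$ chosen here for illustration, level 1 of `F_LEVELS` evaluates to $(0, 1)$ and the top node to 1.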

###### Definition 3.6 (Spurious Nodes).

For a circuit $C$ on $\mathbb{B}^N$, suppose all working nodes of $C$ are $g_1, \ldots, g_K$, where $K = d(C)$, and the nodes are grouped into levels $L_1, \ldots, L_J$, where $J$ is the number of levels. If at level $j$ there are incoming connections that come not from level $j-1$ but from a level lower than $j-1$, we can add spurious nodes in the levels in between, so that these nodes only pass the value on. We use $s$ to denote such nodes. By adding spurious nodes, the evaluation of level $j$ depends only on level $j-1$.

We can state this decomposition as the following lemma.

###### Lemma 3.2 (Decomposition of Connection Matrix by Level).

For a boolean circuit $C$, suppose its working nodes are $g_1, \ldots, g_K$, and the nodes are grouped into levels $L_1, \ldots, L_J$, where $J$ is the number of levels. Then, adding spurious nodes if necessary, the evaluation of $C$ can be decomposed into a series of evaluations, each done from one level to the next level. And each evaluation can be achieved by a matrix operation.

Proof: The proof is already given in the discussion above.

Fig. 3 Circuit of 5 Levels

###### Example 3.3 (Example of Decomposition).

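We consider the boolean circuit $C$ shown in the left diagram of Fig. 3. $C$ has 9 working nodes: $g_1, \ldots, g_9$. The working nodes are ordered as we discussed before. We can write down the working nodes as: $g_1 = b_2 \wedge b_4$, $g_2 = b_3 \vee b_4$, $g_3 = b_3 \wedge b_4$, $g_4 = b_1 \vee g_1$, $g_5 = g_2 \wedge \neg g_3$, $g_6 = b_2 \vee g_4$, $g_7 = g_5 \vee g_6$, $g_8 = g_5 \wedge g_6$, $g_9 = g_7 \wedge \neg g_8$. The connection matrix is below:

$$M = \left[\begin{array}{cccccccccccc}
0 & \wedge & 0 & \wedge & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \vee & \vee & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \wedge & \wedge & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\vee & 0 & 0 & 0 & \vee & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \wedge & \wedge\neg & 0 & 0 & 0 & 0 & 0 \\
0 & \vee & 0 & 0 & 0 & 0 & 0 & \vee & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \vee & \vee & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \wedge & \wedge & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \wedge & \wedge\neg
\end{array}\right]$$

There are 5 levels in this circuit: level 0: $\{b_1, b_2, b_3, b_4\}$, level 1: $\{g_1, g_2, g_3\}$, level 2: $\{g_4, g_5\}$, level 3: $\{g_6\}$, level 4: $\{g_7, g_8\}$, level 5: $\{g_9\}$. These levels do not allow single-level evaluation. For example, at $g_4$, we need $g_1$ (level 1) and $b_1$ (level 0) to evaluate it. But we can add spurious nodes. See the right diagram in Fig. 3, where the nodes marked $s$ are spurious nodes. We can see clearly that, with spurious nodes, the circuit becomes a single-level evaluation. Then, we can do the decomposition by level. We have the following connection matrices between levels.

$$M_1 = \begin{bmatrix} s & 0 & 0 & 0 \\ 0 & \wedge & 0 & \wedge \\ 0 & s & 0 & 0 \\ 0 & 0 & \vee & \vee \\ 0 & 0 & \wedge & \wedge \end{bmatrix} \quad M_2 = \begin{bmatrix} \vee & \vee & 0 & 0 & 0 \\ 0 & 0 & s & 0 & 0 \\ 0 & 0 & 0 & \wedge & \wedge\neg \end{bmatrix} \quad M_3 = \begin{bmatrix} \vee & \vee & 0 \\ 0 & 0 & s \end{bmatrix}$$

$$M_4 = \begin{bmatrix} \vee & \vee \\ \wedge & \wedge \end{bmatrix} \quad M_5 = \begin{bmatrix} \wedge & \wedge\neg \end{bmatrix}$$

By using these connection matrices, we can follow the evaluation of the circuit as below:

First, the input value is $v = (b_1, b_2, b_3, b_4)^T$. We feed this into $M_1$ and get a 5-dim vector $v_1$. Then we feed $v_1$ into $M_2$ and get a 3-dim vector $v_2$. Then we feed $v_2$ into $M_3$ and get a 2-dim vector $v_3$. Then we feed $v_3$ into $M_4$ and get a 2-dim vector $v_4$. Finally, we feed $v_4$ into $M_5$ and get the value at the ending node.

Note the role that the spurious nodes are playing.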

This example shows that decomposition makes a boolean circuit much easier to analyze. After decomposition, we have several levels. Each level is a very simple boolean circuit: each node has only 2 incoming connections, and all nodes are in exactly the same level. We can use one matrix to record this one-level circuit. We call this matrix a one-level connection matrix. We can use the matrix to evaluate all nodes in the one-level circuit, and the evaluation is very simple and mechanical, almost like ordinary matrix-vector multiplication. Although the operation is not truly a matrix calculation, it is quite simple and easy to handle. So, the above notation is good enough to help us record boolean circuits, operate on them, and analyze them.

### About Size of Boolean Circuit

In most of the literature on boolean circuits, the size and depth of a boolean circuit are defined. They are closely related to, but different from, our definitions of node number and level number. We discuss them here.

In the literature, the size of a boolean circuit is defined as the number of gates used in the circuit. In contrast, we define the node number of a circuit as the number of working nodes $g_1, \ldots, g_K$, not including the input nodes $b_1, \ldots, b_N$. We will use the notation $s(C)$ for the size of a boolean circuit (as in most of the literature), and the notation $d(C)$ for the node number.

In most of the literature, the depth of a circuit is defined as the number of steps required from input to output. Our definition of depth is exactly the same. The depth equals the number of levels. So, if the depth of a circuit is $J$, we can decompose the connection matrix into $J$ connection matrices, each of which is for one level only, i.e. has depth 1.

###### Lemma 3.3 (Relationship of s(C) and d(C), K and Depth).

For a boolean circuit , suppose is the size of circuit (as most literature), and is node number, then . And, depth of a circuit equals number of levels.

Proof: The proof is clear.

Since circuit complexity in most of the literature is measured by $s(C)$, if we are interested in circuit complexity, using $d(C)$ is equivalent to using $s(C)$. However, for our purpose, $d(C)$ is more convenient. We will mostly use $d(C)$ to measure a circuit.

## 4 Fitting Extremum and Proper Sampling Set

In order to analyze a boolean function $f$, one way is to consider some examples, say, we feed some $x \in \mathbb{B}^N$ into $f$ and see its value. This is called sampling. More precisely, we obtain an input $x$ in some way, then get the value $f(x)$; the pair $[x, f(x)]$ forms one sample of $f$. If we repeat such sampling a number of times, we get a sampling set.

###### Definition 4.1 (Sampling Set).

A sampling set is a subset of $\mathbb{B}^N$, that is, if $S \subset \mathbb{B}^N$, we say $S$ is a sampling set (or just a sampling). Moreover, values can be assigned over a sampling set:

$$S_v = \{[x, b] \mid x \in S,\ b = 0 \text{ or } 1\}$$

We call such a set a sampling set with assigned values, or sampling with values, or just sampling. For a boolean function $f$, we have the sampling set of $f$ (or sampling set for $f$):

$$S_v = \{[x, f(x)] \mid x \in S\}$$

A sampling set of $f$ gives us information about this boolean function. We can think of a sampling set of $f$ as a subset of the truth table of $f$. Naturally, we want to ask: can we recover the whole truth table from a sampling set? Actually, under certain conditions, we can. See this simple example. Consider the simplest circuit: . The truth table is very simple as below:

If we only have a subset of this truth table, can we use a it to recover the whole truth table? Depends. If the subset is: , we could not, since there is another circuit satisfies this sampling set as well. But, if the subset is: , we could. Even though this subset is a true subset of truth table, we can see clearly, there is no any other simple circuit satisfies this set. But, there is indeed a circuit satisfies this set and it is not . But, this circuit is more complicated than , i.e. it has more nodes.

This simple fact, and of course many other facts as well, motivates us to consider this question: given a sampling set, if we look for the simplest boolean circuit that satisfies the sampling set, what happens? Can we recover the whole truth table in this way? This is the central question that we try to address. But first we define the circuit space.

###### Definition 4.2 (Circuit Space on BN).

The set of all boolean circuits on $\mathbb{B}^N$ is called the circuit space on $\mathbb{B}^N$. We use $\mathcal{C}$ to represent the circuit space.

$$\mathcal{C} = \{C \mid C \text{ is a boolean circuit on } \mathbb{B}^N\}$$

Note, $\mathcal{C}$ is a much bigger set than the set of all boolean functions. The number of boolean functions on $\mathbb{B}^N$ is finite, though very large: $2^{2^N}$. But one boolean function can have many boolean circuits expressing it. So, $\mathcal{C}$ is a much larger space.
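The count of boolean functions can be confirmed for small $N$ by listing every possible truth table. This is only a quick sketch; the helper name `all_boolean_functions` is introduced here.

```python
from itertools import product

def all_boolean_functions(n):
    """Return every boolean function on B^n as a truth table of 2^n output bits."""
    return list(product((0, 1), repeat=2 ** n))

# One truth table per function, so the counts should equal 2^(2^n).
counts = {n: len(all_boolean_functions(n)) for n in (1, 2, 3)}
for n, count in counts.items():
    assert count == 2 ** (2 ** n)
```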

We then define fitting extremum, a minimization problem that looks for the boolean circuit with the smallest node number among those fitting the sampling.

###### Definition 4.3 (Fitting Extremum).

For a sampling set $S_v$ with values, we define an extremum problem as follows:

$$\text{Min: } d(C), \quad C \in \mathcal{C} \ \ \&\ \ \forall [x, b] \in S_v,\ C(x) = b$$

We call this problem the fitting extremum on $S_v$.

In fitting extremum, we are looking for boolean circuit in that it has these properties: 1) fitting with sampling set and 2) with smallest node number. We can use one most simple case to illustrate the meaning of fitting extremum. Consider sampling set: . As discussed above, this could be a subset of truth table of some unknown circuit. We want to use this sampling set to recover the whole truth table. When we look circuit fitting with sampling, we find that 2 circuits and fitting with sampling. So, which circuit should we choose? Just sampling set itself is not good enough. But, if we add one more condition, i.e. to look for simplest circuit fitting with sampling, then, we know should be chosen, since . This simple example indeed tells us what fitting extremum is about.

In the definition of fitting extremum, we are given a sampling set with values. But what if we are given a subset of $\mathbb{B}^N$ and a boolean function? This surely gives a fitting extremum as well.

###### Definition 4.4 (Fitting Extremum of a Boolean Function).

For a boolean function $f: \mathbb{B}^N \to \mathbb{B}$ and a sampling set $S \subset \mathbb{B}^N$, we define an extremum problem as follows:

$$\text{Min: } d(C), \quad C \in \mathcal{C} \ \ \&\ \ \forall x \in S,\ C(x) = f(x)$$

We call this problem the fitting extremum on $S$ and $f$.

A circuit solving this problem is called a circuit generated by fitting extremum on the sampling $S$ and $f$. That is to say, given a sampling and a boolean function, we can generate a circuit from them.

###### Lemma 4.1 (Existence of Circuit Generated).

For any given boolean function $f$ and any given sampling $S$, the circuit generated by fitting extremum on $S$ and $f$ always exists. That is to say, there exists at least one circuit $C$ such that $C$ fits the sampling and $d(C)$ reaches the minimum.

Proof: For a given $S$, we denote the set of fitting circuits as $\mathcal{G}$: $\mathcal{G} = \{C \in \mathcal{C} \mid \forall x \in S,\ C(x) = f(x)\}$. Clearly $\mathcal{G}$ is not empty, since there is at least one circuit $C$ expressing $f$, and this $C$ fits $f$ on $S$. So, the set $\{d(C) \mid C \in \mathcal{G}\}$ is a nonempty set of nonnegative integers. Thus, there must be a $C \in \mathcal{G}$ such that $d(C)$ equals the minimum.

So, for any given $f$ and $S$, there is at least one circuit generated by fitting extremum from them. That is to say, if we have a boolean function $f$ and a sampling set $S$, we can put them into fitting extremum, and we get one or more boolean circuits $C$ fitting $f$ on $S$. Naturally, we ask: what is the relationship between $C$ and $f$? Could this circuit express $f$ exactly? We first look at a simple example.

For OR function , for sampling , if we put them into fitting extremum, it is easy to see circuit fitting with sampling and . So, circuit is a circuit generated by fitting extreme. But, the circuit does not express since . However, if we choose sampling , the circuit generated by fitting extremum from and is , which expresses exactly.

This simple example tells us: for a boolean function $f$, for some sampling $S$, the circuit generated by fitting extremum from $S$ and $f$ indeed expresses $f$, but for some other sampling, the circuit generated by fitting extremum does not express $f$. A sampling that makes fitting extremum produce a circuit expressing $f$ is special and deserves our attention. Thus, we define the proper sampling set.

###### Definition 4.5 (Proper Sampling Set).

For a given boolean function $f$ and a sampling set $S \subset \mathbb{B}^N$, if fitting extremum on $S$ and $f$ generates a boolean circuit $C$, i.e. $C$ fits $f$ on $S$ and $d(C)$ reaches the minimum, and if $C$ expresses $f$ exactly, i.e. $\forall x \in \mathbb{B}^N,\ C(x) = f(x)$, we say $S$ is a proper sampling set of $f$, or just a proper sampling.

In other words, when $S$ is a proper sampling set, the boolean circuit generated by fitting extremum on $S$ and $f$ will always express $f$. This is a crucial property.

We will use PSS to stand for proper sampling set. In the above simple example, for OR function , is not PSS, but is PSS.
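For very small $N$, fitting extremum and the PSS property can be explored by brute force. The sketch below enumerates circuits in order of increasing node number and returns the first ones that fit the samples. It is not an efficient algorithm and not the method of this article. Several points are assumptions of the sketch: a circuit with 0 nodes is taken to output one input variable, the two incoming connections of a node may come from the same source, the search is cut off at `max_nodes`, and the sampling sets used in the usage lines are illustrative choices. The names `eval_circuit`, `circuits`, `fitting_extremum` and `is_pss` are introduced here.

```python
from itertools import product

def eval_circuit(gates, out, x):
    """Evaluate a circuit; gates are (kind, i, neg_i, j, neg_j) in evaluation order."""
    values = list(x)
    for kind, i, ni, j, nj in gates:
        a, b = values[i] ^ ni, values[j] ^ nj   # XOR with 1 models a negation connection
        values.append(a & b if kind == "and" else a | b)
    return values[out]

def circuits(n, k):
    """Yield (gates, out) for every circuit on B^n with exactly k working nodes."""
    if k == 0:                                  # assumption: output one input variable
        for i in range(n):
            yield [], i
        return

    def build(gates):
        if len(gates) == k:
            yield list(gates), n + k - 1        # the last working node is the ending node
            return
        m = n + len(gates)                      # number of nodes available as sources
        for kind in ("and", "or"):
            for i in range(m):
                for j in range(i, m):
                    for ni, nj in product((0, 1), repeat=2):
                        gates.append((kind, i, ni, j, nj))
                        yield from build(gates)
                        gates.pop()

    yield from build([])

def fitting_extremum(n, samples, max_nodes=3):
    """Return (k, fits): the smallest node number k that fits, and all fitting circuits."""
    for k in range(max_nodes + 1):
        fits = [c for c in circuits(n, k)
                if all(eval_circuit(*c, x) == b for x, b in samples)]
        if fits:
            return k, fits
    return None

def is_pss(f, n, S, max_nodes=3):
    """True if every minimal circuit fitting f on S agrees with f on all of B^n."""
    result = fitting_extremum(n, [(x, f(*x)) for x in S], max_nodes)
    if result is None:
        return None
    _, fits = result
    return all(eval_circuit(*c, x) == f(*x)
               for c in fits for x in product((0, 1), repeat=n))

def or_fn(b1, b2):
    return b1 | b2

small = is_pss(or_fn, 2, [(0, 0), (1, 1)])             # a 0-node circuit already fits
larger = is_pss(or_fn, 2, [(0, 0), (0, 1), (1, 0)])    # only the OR circuit fits
```

Under these assumptions, the first sampling is fitted by the 0-node circuit that outputs $b_1$, which does not express OR, while the second sampling forces the 1-node OR circuit.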

###### Lemma 4.2 (Existence of PSS).

For any boolean function $f$, there is some subset $S \subset \mathbb{B}^N$ such that $S$ is a proper sampling set of $f$.

Proof: This is very clear. At least, the whole space $\mathbb{B}^N$ is a proper sampling.

That is to say, for any boolean function $f$, a PSS always exists. The trivial case is that the PSS equals the whole boolean space $\mathbb{B}^N$. We can think of it in this way: given a sampling $S$, if $S$ is not a PSS, we can add more elements to $S$, and eventually $S$ will become a PSS. Of course, we do not want the whole space, if possible. This is actually the major problem we will discuss here. First, we consider more examples.

###### Example 4.1 (Examples for Sampling and PSS).

Note: normally we write vectors as columns. For convenience, however, we write short (low-dimensional) vectors as rows.

For OR function , the sampling set is not PSS. It is easy to see that fitting extremum generates a constant circuit . But, the sampling set is PSS. Fitting extremum generates , which expresses exactly. Note, .

For AND function , the sampling set is not PSS. It is easy to see, fitting extremum generates a circuit . But, the sampling set is PSS, fitting extremum generates , which expresses . Also note .

For XOR function , the sampling set is not PSS. But,
is PSS. Here, .

See diagram C4 in Fig. 1. It is for a function . Sampling
is PSS. How do we know this? Let’s see some details. For node , this is a node with one negation connection. As we talked above, for node, the PSS should be: , but, since there is one negation connection, for node, the PSS become: . This is only for . But, we can add as 0, so, we have a set . But, we need sampling for . This is the sampling , as we set as 1, and as 0. So, we have . We then consider node . This is node. As above discussion, for this node, we need to have for . But, for this case, indeed will cause to have for . Thus, is a PSS. We can verify this by trying some circuits. But, the procedure we did here is generally true, which we will see in later discussions.

###### Example 4.2 (More Examples of PSS).

Consider a sampling with value, in , . This sampling set is not PSS. We can easily see that circuit fits with , and fits with as well. However, if we add one more sampling into , for example: , we can exclude out . Thus, is a PSS.

From the above discussion, we know that, given a boolean function, we can first sample it and then apply fitting extremum to the sampling. If the sampling is right, i.e. it is a PSS, we obtain a boolean circuit that expresses the function. This is a very useful outcome: with this procedure, we can understand the function better.

###### Theorem 4.3 (PSS implies Circuit).

If is a boolean function , and is a PSS for , and is the size of PSS, then there is a circuit expresses and .

The opposite direction is also true: given a circuit, we can construct a PSS from it.

###### Theorem 4.4 (Circuit implies PSS).

If is a boolean function , and is a boolean circuit to express , then there is a PSS for , and size of PSS is less than .

The PSS implies circuit theorem tells us that, given a PSS of a boolean function, we can construct a circuit expressing the function, and the size of the circuit is controlled by the size of the PSS. Since the size of the circuit is a good measure of the complexity of the function, the size of the PSS is also a good measure of that complexity.

The circuit implies PSS theorem tells us that, if we know a circuit expressing a boolean function, we can pick out a PSS by using that circuit.

Together, the two theorems tell us the following. Given a PSS of a boolean function, we can construct a circuit expressing the function, and the size of the circuit is controlled by the size of the sampling. Conversely, given a circuit expressing the function, we can find a PSS by using the circuit, and the size of the sampling is controlled by the size of the circuit. In this sense, the size of the circuit and the size of the PSS are equivalent. Since the size of the circuit is a good measure of the computational complexity of the function, so is the size of the PSS. This is a very important property.

The above two theorems are crucial. Their proofs are given in the Appendix.

A boolean function may have more than one PSS, possibly many. Among all of them, the PSS with the lowest number of nodes is of special interest.

###### Definition 4.6 (Minimal Proper Sampling Set).

For a given boolean function , if a sampling is a proper sampling set, and reaches the minimum, we call such a sampling set as minimal proper sampling set.

We use the abbreviation mPSS for minimal proper sampling set.

## 5 Learning Dynamics

We discussed the universal learning machine in [2, 4, 5]. It is a machine that can learn anything that is possible to learn, without human intervention. In our previous discussions, the learning dynamics of the universal learning machine was given special attention, and several methods and strategies were introduced. As a result, we proved that with sufficient data (sufficient to bound and sufficient to support), a universal learning machine can be realized. Of course, we are constantly looking for better learning methods. As a matter of fact, we invented Fitting Extremum and Proper Sampling Set (FE and PSS) for exactly this purpose. Without the effort to find better learning methods, FE and PSS would perhaps not have been invented. In this section, we discuss how to utilize FE and PSS for learning dynamics.

### Universal Learning Machine

We briefly recall the learning machine and learning dynamics. A universal learning machine is a system consisting of an input space, an output space, a conceiving space and a governing space. The input space has dimension, and output space has dimension. The conceiving space contains information processing units, which get information from the input space, process it, and put the results into the output space. The conceiving space is the container for information processing units, and it normally contains many of them. At any particular time, however, only one information processing unit is used to generate output. Learning is actually modifying or adapting the current information processing unit so that it becomes better. The governing space is the container for the methods that control how learning is conducted.

For convenience of discussion and without loss of generality, we often set the dimension of the output space to 1. The information processing unit then becomes a boolean function. Inside the conceiving space, there can be many boolean functions, and one of them is used as the current information processing unit.

The input space is dimension, thus input . We also call this space the base pattern space, and any vector in it a base pattern. The learning machine gets information from the input and forms a subjective view of it inside the machine. Such a subjective view is called a subjective pattern, which is handled inside the machine by something called an X-form. The information processing is actually done according to those subjective patterns, and hence according to X-forms. Inside the conceiving space, there are normally many X-forms.

The X-form plays a crucial role in the learning machine. For full details of X-forms, consult [2, 4, 5]. Here, we focus on the relationship between X-forms and boolean functions.

###### Definition 5.1 (X-form as Algebraic Expression).

If is an algebraic expression of 3 operators, (OR, AND, NOT), and is a group of base patterns, then we call the expression as an X-form upon , or simply X-form.

Note a small difference on the surface: in [2, 4, 5], we used for the OR, AND, NOT operators. In fact, for algebraic expressions, using is much better. Here, for consistency with this paper, we use , though it is not as good for algebraic expressions.

In other words, an X-form is an algebraic expression of some base patterns. This is one way to view an X-form. We can also view such an algebraic expression as a subjective pattern.

###### Definition 5.2 (X-form as Subjective Pattern).

Suppose is a set of subjective pattern, and is one X-form on (as algebraic expression). With necessary supports (i.e. the operations in the algebraic expression can be realized), this expression is a new subjective pattern.

Further, such algebraic expression can be viewed as information processing:

###### Definition 5.3 (X-form as Information Processor).

Assuming is a learning machine, is a set of subjective patterns subjectively perceived by , and is a X-form on (as algebraic expression), then is an information processing unit that processes information like this: when a basic pattern is put into , and perceives this pattern, then the subjective patterns forms a set of boolean variables, still written as: , and when this set of boolean variables is applied to , the value of is the output of the unit, and it is written as: .

Thus, an X-form is actually a boolean function. We now understand the meaning of X-form from several aspects. Why do we call it X-form? These expressions are mathematical forms with very rich meanings, yet many properties of such expressions are still unknown. Following tradition, we use X to name it.

The following theorem connects objective patterns, subjective patterns and X-forms.
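The view of an X-form as an information processor can be sketched in a few lines. The encoding below is an assumption made for illustration only: an X-form is written as nested tuples over the operators `"or"`, `"and"`, `"not"`, an integer leaf refers to the subjective pattern with that index, and each subjective pattern is modelled as a predicate on base patterns. The names `evaluate` and `process` are hypothetical and do not come from [2, 4, 5].

```python
def evaluate(xform, values):
    """Evaluate an X-form (nested tuples) on a list of boolean values.

    A leaf is an integer index into `values`; inner nodes are
    ("not", x), ("and", x, y) or ("or", x, y).
    """
    if isinstance(xform, int):
        return values[xform]
    op, *args = xform
    if op == "not":
        return 1 - evaluate(args[0], values)
    if op == "and":
        return evaluate(args[0], values) & evaluate(args[1], values)
    if op == "or":
        return evaluate(args[0], values) | evaluate(args[1], values)
    raise ValueError(f"unknown operator: {op}")

def process(xform, perceivers, base_pattern):
    """Perceive a base pattern, then apply the X-form to the perceived values."""
    # Each perceiver stands for one subjective pattern (hypothetical model).
    values = [int(perceive(base_pattern)) for perceive in perceivers]
    return evaluate(xform, values)

# Usage: two toy subjective patterns on 3-bit base patterns.
perceivers = [lambda b: b[0] == 1, lambda b: b[-1] == 1]
xform = ("and", 0, ("not", 1))   # first pattern holds and second does not
output = process(xform, perceivers, (1, 0, 0))
```

Here `process` maps every base pattern to a single bit, which is the sense in which an X-form together with its subjective patterns acts as a boolean function on the base pattern space.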

###### Theorem 5.1 (Objective and Subjective Pattern, and X-form).

Suppose is an learning machine. For any objective pattern (i.e. a subset in