Iterated belief revision (BR) deals with the problem of maintaining syntactic propositional knowledge representations flexible enough to accommodate reasoning about a stream of incoming observations in the form of propositional formulae (over a finite alphabet of atomic propositions), while allowing for the possibility that any such observation is inconsistent with the current state of the knowledge representation. It is not unreasonable, then, to argue that BR operators should be used for maintaining well-reasoned internal representations for autonomous learning agents (see, e.g. ). However, one needs merely to observe the high computational costs associated with revision operators [31, 30] to conclude that such representations are too expensive to implement on a mobile autonomous agent. Attempts at making the representations more palatable using prime forms [6, 33] have been made, but the fundamental complexity barriers remain.
We introduce a computationally cheap form of iterated propositional belief revision—the universal memory architecture (UMA)—which harnesses the geometry of model spaces in place of the model-theoretic techniques characteristic of this field.
The computational advantages come at the price of modifying the notion of an observation and restricting the syntactic form of the epistemic state maintained by the agent (understood in the broad sense of Darwiche and Pearl ) to a special type of default system in the sense of .
Most notably, observations are no longer allowed to take the form of arbitrary propositional formulae; rather, we restrict them to conjunctive monomials in the underlying propositional variables.
Equivalently, an observation is a partial truth-value assignment to the agent’s inputs.
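As an illustrative aside (not part of the formal development), the identification of a conjunctive monomial with a partial truth-value assignment can be sketched in a few lines of code; the names `as_monomial` and `is_consistent_with` are ours, chosen for exposition only.

```python
# Sketch: an observation restricted to a conjunctive monomial over atoms
# a, b, c is the same data as a partial truth assignment. We write the
# complement of a literal "a" as "a*". Illustrative names throughout.

def as_monomial(partial_assignment):
    """Render a partial truth assignment as a conjunctive monomial string."""
    lits = [v if val else v + "*" for v, val in sorted(partial_assignment.items())]
    return " & ".join(lits)

def is_consistent_with(observation, world):
    """A complete world (total assignment) models the observation iff it
    agrees with the observation on every assigned variable."""
    return all(world[v] == val for v, val in observation.items())

obs = {"a": True, "c": False}          # observes a and not-c; b unobserved
world = {"a": True, "b": False, "c": False}
print(as_monomial(obs))                # a & c*
print(is_consistent_with(obs, world))  # True
```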
In addition, each observation is accompanied by a value signal—a quantity indicating a notion of the value of the experience to the agent at that time.
The value signal should not be confused with the notion of reward, as used in Reinforcement Learning. One of our learning schemes (see Section 4.2) leads to a (partial) syntactic representation of the distribution from which observations are being drawn, and does not encode any preference of one state over another.
These alterations to the classical setting of iterated BR are motivated by the prospect of implementing iterated BR on mobile robotic platforms in real time. While the Boolean component of the observation corresponds to the robot’s raw sensory inputs, the value signal may correspond to an encoding of a task, or to feedback from a teacher. The limited form of the epistemic state maintained by an UMA instance reduces the space and time complexity costs of maintenance (applying the revision operator) and exploitation (e.g. inference) down to an absolute minimum, as we review next.
1.2. Contributions: Introduction and Analysis of UMAs.
Motivated by the problem of realizing iterated belief revision and update in a bounded resources setting, we seek a class of lightweight general-purpose representations. From a learning perspective, ours is a problem of learning from positive examples: an observer of an unknown, unmodeled system experiences some process—a sequence of transitions—in that system through an array of Boolean sensors, and is required to reason about regularities in the observed sequence of experiences, constructing a formal theory of what is possible for that system.
We assume that observations occur in discrete time steps. An observation at time t will consist of (1) a complete truth-value assignment—the observation at time t—to a fixed set—the sensorium—of Boolean queries of the agent's interactions with its environment; and of (2) a sample of a fixed value signal. (Henceforth, a symbol decorated with t should be read as "at time t".)
Little needs to be assumed about the sensorium: for the purpose of this paper, we allow any query expressible as a Boolean function of the state history (finite or infinite) of the system (an appropriate formalism is developed in Section 2.1.1); it is also assumed that truth-value assignments are consistent, in the sense that each agrees with the values of the available queries on the history that manifested at the corresponding time; finally, observations are assumed to be time shift-invariant, in the sense that observing the same histories at different times must yield the same Boolean observation vector. The value signal, for now, is assumed to be static, in the sense that it factors through a function of the observation (more detail in Section 3.1). The architecture itself does not rely on any of these assumptions, but the learning guarantees we provide in this paper do.
An UMA representation integrates its accumulated experiences by repeatedly revising two structural components, based on the incoming observations: (a) a relation , called a pointed complemented relation (PCR), representing a system of implications, or defaults, which the agent believes to hold true among the queries in ; and (b) a set , representing the agent’s belief regarding the current state of the system. The machinery for maintaining these data structures will be referred to as a snapshot. Briefly, our results about UMA representations are as follows.
Universality of Representation.
In our intended setting, the learner’s sensors realize the formal sensorium as a family of subsets of the space of histories, closed under complementation. The possible worlds actually witnessed by points of this space correspond to the learner’s perceptual equivalence classes (in the sense of, e.g. [13, 42]). Intuitively, an element of the PCR should be seen as correct if no history falsifies the formula , and, more generally, if histories falsifying are improbable, or insignificant according to the user’s formal model of these notions.
It turns out that a PCR supports a natural dual space, a set of possible worlds canonically associated with the PCR. Recall that a possible world over is a complete truth value assignment . We prove that, given a PCR over a set of literals , its dual space has the following universality property (Proposition 2.22): is the smallest set of possible worlds over which, for any realization of as a set of Boolean queries over a space not falsifying a relation listed in , contains every model for .
Returning to UMA learners, this means that the model space encoded by the PCR is a minimal envelope for the true space of possible worlds, provided just the information that all the relations recorded in are correct.
From a computational perspective, the maintenance costs of an UMA representation are roughly the same as those of maintaining a neural representation (=the cost of maintaining and using a matrix of weights), but with the added benefit of affording a formal understanding of the model space, its geometry, and its deficiencies. Here are some results, all of which are corollaries of the geometric properties of the class of model spaces defined by PCRs. Let n denote the cardinality of the sensorium. Then:
Maintaining an UMA snapshot structure requires space;
Update operations for learning the PCR structure require time;
Inference requires quadratic time, reducible further on fully parallel hardware. (We remark that our current implementation in fact utilizes matrix multiplication on a GPU; this makes it possible to multiply fairly large matrices very quickly, improving on the performance of the naïve quadratic algorithm we provide later in this paper.)
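The bullet points above can be illustrated with a deliberately naïve sketch (not the paper's implementation): if the learned implication record is stored as an n-by-n Boolean matrix, one inference step is a Boolean matrix-vector product, which is quadratic serially and parallelizes readily. All names below are ours.

```python
# Illustrative sketch: M[i][j] = True means "query i implies query j".
# One propagation step of inference is a boolean matrix-vector product,
# O(n^2) serially; on parallel hardware each entry can be computed
# independently.

def propagate(M, belief):
    """A query j becomes believed if it already was, or if some believed
    query i implies it."""
    n = len(M)
    return [belief[j] or any(belief[i] and M[i][j] for i in range(n))
            for j in range(n)]

# queries: 0 -> 1 -> 2
M = [[False, True, False],
     [False, False, True],
     [False, False, False]]
b = [True, False, False]
b = propagate(M, b)   # picks up query 1
b = propagate(M, b)   # picks up query 2
print(b)              # [True, True, True]
```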
Multiple Learning Paradigms.
The mathematical foundations for UMA provide sufficient flexibility to admit a variety of learning mechanisms and settings, spanning the range from probabilistic filtering, as proposed in , to a variation on [iterated] revision and update introduced in , while keeping maintenance costs down to the bare minimum (see preceding paragraph). Depending on the snapshot type, different learning scenarios and guarantees may be provided, while maintaining a uniform revision and update scheme at the symbolic level.
Flexibility of Representation.
A central feature of the UMA architecture is that the duality theory of PCRs allows one to interpret maps between PCRs as maps between the associated model spaces and vice versa. This makes it possible to formally introduce—as well as operate with—notions of approximate equivalence, of redundancy and negligibility of queries. This also enables the study of the impact on model space geometry of operations augmenting a sensorium with new queries (see, for example, Section A.2.4
) or removing existing ones. In particular, this opens a way to formal (and, possibly, automated) cost/benefit analysis of such extension and pruning operations—a topic of ongoing research at the moment, which we will touch upon briefly in our final discussion of the results presented in this paper.
1.3. Related Work.
Given the focus of this work on the representation of knowledge using defaults, we believe it is most tightly related to work in the field of propositional iterated belief revision. Early work in BR resulted in wide acceptance of the AGM framework [4, 3, 2] for maintaining a belief set—a deductively closed set of formulae representing the state of the observed system. Convenient, intuitive axioms for belief revision in the propositional setting, the KM axioms, were developed by Katsuno and Mendelzon in .
Pointing out some inadequacies of the KM axioms in the context of repeated application of revisions, Darwiche and Pearl (DP) argue in their seminal paper  that, to achieve the overarching goal of iterated revision, one must maintain a set of conditional statements—an epistemic state—which, upon revision by an incoming observation, always produces a belief set accommodating that observation (axiom of the DP system of axioms for iterated revision). Building on Spohn’s framework of ordinal conditional functions  and its implications for ranked default systems [39, 17] and revision of the associated belief sets , they propose to view ranking functions as epistemic states (interchangeable with the associated system of ranked defaults), as they construct appropriate revision operators. Consequent work by many authors [24, 12, 27, 22, 34, 28]—much of it very new—considers different weaknesses and benefits of the DP axioms, relating to the effect of the order in which observations are made and the manner of mutual dependence they present, and resulting in a variety of iterated revision methods, as well as in some proposals to apply belief revision methods to the control of general agents  based on varying computational approaches to belief revision operators (e.g. [6, 33] on the use of prime forms for this purpose).
Clearly, the problems tackled by this field generalize the representation problem we posed at the beginning of Section 1.2, but one needs merely to observe the high computational costs associated with revision operators [31, 30] (or with computing normal forms and prime forms ) to reach the conclusion that the existing computational approaches cannot be considered viable candidates for a solution of the representation problem in any setting where computational resources are limited.
Aiming to reduce the computational burden on the learner, we shift attention from precise syntactic computation with arbitrary propositional formulae to imposing radical simplifying assumptions on the allowed model spaces. The postulated mode of interaction between the agent and its environment—specifically, the fact that the agent is constrained to processing sequences of samples from the space of realizable models (rather than arbitrary propositional formulae)—suggests constructing successive upper approximations of , belonging to a restricted class satisfying the following intuitive properties:
Syntactic characterization of an element in is computationally inexpensive;
Each approximation is, in some sense, optimal/minimal among members of , given its predecessor and the last observation;
Reasoning (e.g., forming a belief set) over a member of is cheap.
We present results on what is, in essence, the simplest possible class of model spaces satisfying these three requirements: the class of finite median algebras. This class of spaces is well studied, in several different guises and in very disparate fields. These include: event structures in parallel computation ; median graphs in metric graph theory ; simply connected non-positively curved cubical complexes in formalizations of reconfiguration in robotic systems ; and cubulated groups in Geometric Group Theory , to which the spectacular recent achievements of Agol  in the topology of 3-dimensional manifolds owe much.
1.4. Structure of this Paper.
In Section 2, we extend Sageev-Roller duality (see  for a detailed development of that theory; chapters 6-7 of  for a brief intuitive review; and, here, Appendix A for background material and examples developed specifically to support this paper) to obtain all finite median algebras as duals (model spaces) of PCRs, viewed as systems of defaults. Further, we explain how to reason over model spaces in this class by leveraging their geometry to avoid satisfiability checks, or, for that matter, any kind of explicit search in model space. In Section 3 we then explain how UMA snapshot structures are used to perform a variant of iterated revision, in which the model-theoretic outlook on the problem is replaced by its geometric counterpart arising from Sageev-Roller duality. We discuss the necessity of relaxing the DP axiom , and show that there is a natural operator for computing a belief set, the coherent projection.
Section 4 presents two different classes of snapshot structures—mechanisms for learning PCR representations—one motivated by Goldszmidt and Pearl’s interpretation of default reasoning as qualitative probabilistic reasoning , and the other based on statistical integration of the observed value signal. Finally, Section 5 presents two kinds of simulation studies:
First, in a range of settings with a-priori known (or readily computable) implications in the sensorium, we consider the deviation of the learned PCR from the ground truth as a function of the number of samples. This is done for both snapshot types, and under different exploration paradigms: sampling and diffusion.
Next, we consider settings closer to the heart of a roboticist. We implement agents with a reactive control paradigm based entirely on their internal UMA representations and conduct comparative simulation studies of their performance given different domains for exploration, and snapshot types.
We close with a discussion of our results and of avenues for additional research in Section 6.
2. Model Spaces for Systems of Approximate Implications.
In this section we construct a representation for finite median algebras (see above) that is sufficiently flexible to be maintained dynamically, and we explain how to reason over these representations. We review and apply existing results about the geometry of model spaces of this class of representations, leading to complexity bounds on maintenance and exploitation.
Section 2.1 formally introduces the basic formal notions required for discussing our representations. Section 2.2 constructs the model spaces as dual spaces of pointed complemented relations (PCRs) and discusses their universal properties. Section 2.3 relates PCRs and their duals (the associated model spaces) to the earlier duality theory of poc sets that motivated our approach, showing that PCR duals are, in fact, poc set duals. Section 2.4 reviews known results about the geometry and topology of poc set duals. Finally, in Section 2.5 we discuss the connection between the geometry of PCR duals and algorithms enabling reasoning over PCRs.
2.1. Pointed Complemented Relations (PCR).
The nature of our application requires a generalization of the formal theory we are about to use, the Sageev-Roller duality theory of poc sets , prompting some changes in the language. We start with:
Definition 2.1 (pointed complemented set, PCS).
A pointed complemented set is a set P endowed with a self-map p ↦ p* satisfying p** = p and p* ≠ p for all p ∈ P, and containing a distinguished element, denoted 0. The element 0* will be denoted 1. Whenever possible and safe, we will abuse notation and use the symbols 0, 1, * in different PCSs. For any A ⊆ P we will denote by A* the set of all a*, a ∈ A.∎
Definition 2.2 (PCS morphism).
By a PCS morphism we mean a function f : P → Q between PCSs satisfying f(0) = 0 and f(p*) = f(p)* for all p ∈ P. The set of all PCS morphisms from P to Q will be denoted by .∎
Example 2.3 (set families, power sets).
Consider any collection F of subsets of a fixed non-empty set X satisfying (1) ∅ ∈ F, and (2) X∖A ∈ F whenever A ∈ F. Then F is a PCS with respect to the choices 0 = ∅ and A* = X∖A.
The power set of a singleton is, up to isomorphism, the smallest PCS, which we denote by , and identify with the set . Also, the power set will be routinely identified with the set of all functions .
Example 2.4 (PCS over an alphabet).
Suppose Σ is a finite collection of symbols, and think of them as atoms of the propositional calculus over Σ. The extended collection of literals over Σ may be thought of as a PCS when one declares 0 = ⊥, 0* = ⊤, and a** = a for all literals a. Hereafter, ⊤ and ⊥ stand for the truth values True and False, respectively.
The reason for considering PCSs is that -selections “live on them”:
Definition 2.5 (-selection, the Hamming cube).
Let P be a PCS. By a selection on P we mean a subset S ⊆ P such that S ∩ S* = ∅. In addition, a selection S on P is complete if S ∪ S* = P. The set of all selections will be denoted by , and referred to as the [combinatorial] Hamming cube on P. Its set of vertices, the complete selections, will be denoted by . ∎
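As an informal illustration of the definition just given, a selection and its completeness can be checked directly over the PCS of literals; the code below is a sketch under the convention that the complement of a literal "a" is written "a*" (the distinguished elements 0 and 1 are omitted for brevity), with illustrative names throughout.

```python
# Sketch of selections on a PCS of literals: a selection never contains a
# literal together with its complement; a complete selection picks exactly
# one of each complementary pair (a vertex of the Hamming cube; smaller
# selections correspond to higher-dimensional faces).

def star(x):
    """Complementation on literals: a <-> a*."""
    return x[:-1] if x.endswith("*") else x + "*"

def literals(atoms):
    """The PCS of literals over an alphabet of atoms (0 and 1 omitted)."""
    return {x for a in atoms for x in (a, star(a))}

def is_selection(S):
    return all(star(x) not in S for x in S)

def is_complete(S, P):
    return is_selection(S) and len(S) * 2 == len(P)

P = literals(["a", "b"])
print(is_selection({"a", "b*"}), is_complete({"a", "b*"}, P))  # True True
print(is_selection({"a", "a*"}))                               # False
```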
We now consider these notions in the context of our intended application.
2.1.1. Binary Sensing, Possible Worlds and Perceptual Classes.
Suppose is an observer of some system as it undergoes the transitions along a state trajectory , and suppose is a finite set of unique labels for the Boolean queries available to —this observer’s sensorium. We assume observations of by begin at . It will not matter for our discussion whether the trajectory of in any particular instance does indeed extend indefinitely into the past or future: if needed, one may set the value of to be eventually constant (in either direction).
By a history of we mean a sequence of the form , where is a state of for all , and represents the current state of the history ; represents the preceding state, and so on. Given a trajectory of observed by , at each time , the history that manifests at time is given by .
Henceforth, we let denote the space of histories possible for the system given the initial history manifested at time (as is the case in all physical systems, may have its own dynamics, disqualifying some histories from manifesting at any time , or making such events highly improbable). To say that ’s queries/sensors are time-shift invariant is to say that each query is represented by a fixed Boolean function of the manifested history. In other words, the sensorium is defined by a PCS morphism , , with a sensor reporting on history if and only if .
The mapping induces a partition on —its partition into perceptual classes—as follows. Construct a map by setting if and only if ; each point is mapped to the set of queries (including complements) which evaluate to on that point. Two points are sensory-equivalent if . The image are the possible perceptual states of in the system , given and the system’s initial history. We will also refer to a world/-selection as consistent, if, and only if , or, in other words, if and only if is witnessed (through ) by a point of .
2.1.2. Concept Presentation of Perceptual States.
Digging deeper into the formalism presented just now, observe that -selections are in one-to-one correspondence with vectors, as defined in concept learning . Recall that a vector is an assignment of values standing for , , and “undetermined”, respectively, to the alphabet . A vector is total if it has no values. The map is then a correspondence between vectors over and -selections on the PCS , mapping the set of total vectors onto the set of complete -selections. In more geometric terms, a complete -selection—which corresponds to a complete conjunctive monomial (aka complete term) over —defines a vertex of the cube , while a -selection with corresponds to a -dimensional face. We will refer to as the Hamming cube. The advantage of PCS terminology here is that -selections on enumerate the faces of the Hamming cube without us having to pick an origin for the cube.
Pushing the geometric viewpoint a bit further, we consider the notion of concepts. In , Valiant defines concepts as mappings of the space of vectors to , satisfying the requirement that on a vector if and only if for all total vectors which agree with on those where . In other words, concepts correspond to collections of faces of the Hamming cube, possibly of varying dimensions, satisfying the condition that a face belongs to if and only if every vertex of lies in . Such are precisely the sub-complexes of the Hamming cube obtainable from it by vertex deletions. (Similarly to the case of graphs, the operation of deleting a vertex from a cubical complex requires the removal of all the adjoining faces.)
Now we return to the observer and the system whose evolution it observes through the queries realized by , as discussed in the preceding section. Thinking of the space of perceptual classes as a concept gives rise to a cubical sub-complex, say , of the Hamming cube, whose faces correspond to those -selections on the PCS that are witnessed (via ) by a point in . Thus, precise reasoning and planning over depends on one’s ability to efficiently capture/encode: (1) the notion of consistency produced by the map ; (2) the topological properties (e.g. connectivity, contractibility) of ; and (3) the geometric properties (e.g. shortest paths, curvature, isoperimetric inequalities) of . The class of approximating model spaces we propose to use as proxies for is a result of weakening this notion of consistency to the extreme, all the way to the notion of coherence discussed in the next section.
2.1.3. PCRs, Implications and Coherence.
Definition 2.6 (pointed complemented relation, PCR).
Let P be a PCS. By a pointed complemented relation over P we mean a set R ⊆ P × P (to avoid a proliferation of parentheses, we write ab to denote the pair (a, b)) satisfying 0a ∈ R for all a ∈ P, and b*a* ∈ R whenever ab ∈ R.∎
In the context of the representation problem, one should think of a PCR over as a record of Boolean implications believed to be valid over , conditioned on the particular space of histories being observed. In this respect, a PCR is a restricted form of the notion of a system of defaults, as discussed, e.g. in . Some of these implications are specified directly ( to be read as “it is believed that follows from ”), while others are derived as their consequences, by transitive closure. Hence the following language:
Given a PCR over a PCS , for any , , one defines the following:
Write if lies in the reflexive and transitive closure of ;
The -equivalence class of , denoted , is the equivalence class of under the relation on ;
The forward (backward) closure, (resp. ), of with respect to is the set of all for which (respectively ) holds for some ;
Note that . One says that is forward-closed if ;
Finally, we observe that for all .
We will often drop the subscripts when no ambiguity can arise.∎
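The derived notions listed above can be made concrete in a short sketch (illustrative only, with our own names): the reflexive-transitive closure of the implication record is computed by a standard Warshall-style pass, and forward closures read off everything derivable from a given set.

```python
# Sketch of the closure operations attached to a PCR over literals:
# the reflexive-transitive closure of the recorded implications, and the
# forward closure of a set (everything implied by some element of it).
from itertools import product

def transitive_closure(pairs, P):
    """Reflexive-transitive closure of the implication record (Warshall:
    the intermediate element k is the outermost loop variable)."""
    reach = {(x, x) for x in P} | set(pairs)
    for k, i, j in product(P, P, P):
        if (i, k) in reach and (k, j) in reach:
            reach.add((i, j))
    return reach

def forward_closure(A, reach):
    """Everything implied by some element of A."""
    return {y for (x, y) in reach if x in A}

P = {"a", "a*", "b", "b*", "c", "c*"}
R = {("a", "b"), ("b", "c"), ("b*", "a*"), ("c*", "b*")}  # record + contrapositives
reach = transitive_closure(R, P)
print(("a", "c") in reach)                    # True: derived by transitivity
print(sorted(forward_closure({"a"}, reach)))  # ['a', 'b', 'c']
```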
Definition 2.8 (PCR morphism).
Let be PCRs over , respectively. A morphism of PCRs from to is a PCS morphism , additionally satisfying in whenever . The set of all morphisms from to will be denoted by .∎
The primary example of a PCR for this work derives from the view of a power set as a PCS (Example 2.3):
Example 2.9 (Set Families as PCRs).
Let be a set. Then any collection of subsets of that is closed under complementation and satisfies gives rise to the PCR of all pairs with , and . In what follows, will always be regarded as a PCR in this way, for any .∎
Another ‘canonical’ example of a PCR to keep in mind is:
Example 2.10 (Less classical PCRs).
Let be any set. Then may be endowed with the structure of a PCR by setting , , and, for any , setting , and if and only if , .∎
Our notion of model for a PCR rests on the following weak form of consistency:
Definition 2.11 (coherence).
Let be a PCR over . A subset is said to be -coherent if no pair satisfies .∎
Note that a -coherent set is always a -selection on . Furthermore:
so coherence is preserved by forward closure. Coherent, forward-closed sets may be thought of as the natural counterparts of the notion of a belief state in this setting. We now turn to studying the appropriate notion of model.
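The coherence test just defined admits a direct sketch (illustrative names; the complement of literal "a" is written "a*", and the implication record is assumed already transitively closed).

```python
# Sketch of the coherence check: a set S is coherent with respect to a
# closed implication record iff no two members a, b of S satisfy
# "a implies the complement of b".

def star(x):
    return x[:-1] if x.endswith("*") else x + "*"

def coherent(S, reach):
    """S is coherent iff no pair a, b in S has (a, b*) in the closed
    implication record 'reach'."""
    return not any((a, star(b)) in reach for a in S for b in S)

# record: a implies b (with contrapositive), reflexive pairs included
reach = {("a", "b"), ("b*", "a*"),
         ("a", "a"), ("a*", "a*"), ("b", "b"), ("b*", "b*")}
print(coherent({"a", "b"}, reach))   # True
print(coherent({"a", "b*"}, reach))  # False: a implies b, yet b* is selected
```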
2.2. Model Spaces as Dual Spaces
Definition 2.12 (duals).
Let be a PCR over . The set of maximal -coherent subsets of is the dual of . The set of all forward-closed -coherent subsets will be denoted .∎
A standard application of Zorn’s lemma shows that any -coherent subset of is contained in an element of . Note also that .
Example 2.13 (the orthogonal PCR and the Hamming cube).
The simplest example of a dual space is one where the PCR in question is as small as possible. Let be a PCS. The smallest PCR over contains only pairs of the forms and . We will denote this PCR by and refer to it as the orthogonal PCR over . It is clear that , the “Hamming cube” from Definition 2.5.∎
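The example above can be checked by brute force in a short sketch (illustrative names only): enumerating all complete selections over a small alphabet and discarding the incoherent ones recovers the full Hamming cube for the orthogonal PCR, while each recorded implication deletes the vertices violating it.

```python
# Brute-force sketch: the dual of a PCR over literals, computed as the set
# of complete selections that avoid every recorded incoherence. For the
# orthogonal PCR every vertex of the Hamming cube survives; recording
# "x implies y" deletes the vertices containing both x and y*.
from itertools import product

def star(x):
    return x[:-1] if x.endswith("*") else x + "*"

def dual(atoms, implications):
    worlds = []
    for bits in product([False, True], repeat=len(atoms)):
        w = frozenset(a if bit else star(a) for a, bit in zip(atoms, bits))
        # a vertex violates "x implies y" iff it contains x together with y*
        if not any(x in w and star(y) in w for (x, y) in implications):
            worlds.append(w)
    return worlds

print(len(dual(["a", "b"], set())))         # 4: the full Hamming cube
print(len(dual(["a", "b"], {("a", "b")})))  # 3: the vertex {a, b*} is deleted
```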
Example 2.14 (‘bad’ queries).
The definitions given above do not preclude one from considering, for example, the PCR . It is easy to see that . At the same time, the smaller has . More generally, for any , having precludes from belonging in any -coherent set. In particular, if both and hold, then no -coherent set is a complete selection on .∎
Following the last example, two definitions are in order:
Definition 2.15 (the trivial PCR).
The trivial PCR, henceforth also denoted by , is the PCR over containing only .∎
Definition 2.16 (negligible query, degenerate graph).
Let be a PCR over . An element is -negligible, if . Denote the set of negligible elements by . We say that is degenerate if contains a negligible element whose complement is also negligible. Note that .∎
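A quick sketch of the negligibility and degeneracy tests (illustrative names; the implication record is assumed transitively closed, and "a*" denotes the complement of "a"):

```python
# Sketch: a query is negligible when the closed implication record derives
# its own complement from it; the PCR is degenerate when some query and
# its complement are both negligible.

def star(x):
    return x[:-1] if x.endswith("*") else x + "*"

def negligible(x, reach):
    return (x, star(x)) in reach

def degenerate(P, reach):
    return any(negligible(x, reach) and negligible(star(x), reach) for x in P)

reach = {("a", "a*")}                  # the record derives a* from a
print(negligible("a", reach))          # True
print(degenerate({"a", "a*"}, reach))  # False: a* is not negligible
```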
Proposition 2.17.
For a PCR over , the following are equivalent:
Every element of is a complete selection on ;
Some element of is a complete selection on .
See Section B.1.∎∎
The impact of this result on our representation problem is twofold. First, it provides a clear and easily verifiable criterion for when the dual space of a PCR consists (only!) of possible worlds. Second, it introduces a new and consistent notion of a query of low import, not involving arbitrary choices such as thresholding.
Proposition 2.18.
Let be a non-degenerate PCR over the PCS . Then the mapping defined by is a bijection.
See Section B.2.∎∎
Note that the mapping is independent of the choice of .
The last proposition explains the sense in which may be thought of as a dual space of . As with other instances of duality, this is useful because it enables dual mappings:
Let be a PCR morphism. The dual mapping is defined by . Alternatively, upon applying the identification in Proposition 2.18, for any , one has to obtain an element of . ∎
We remark that, since morphisms are composable (meaning that the composition of two morphisms is a morphism as well), so are their dual mappings, producing the identity .
Let be a non-degenerate PCR over a PCS . Then it is clear that the identity mapping — that is: for all — is a morphism of PCRs. The dual mapping is then, clearly, an injection. This reflects the intuitive notion that the dual of any (non-degenerate) PCR may be “excavated” out of a standard Hamming cube by going over all -incoherent pairs, one by one, and successively deleting any vertices of which contain the given pair.
We further specialize the example to our representation problem, considering the effect of fixing a PCR structure on a given PCS:
Proposition 2.22 (Universality of Representation).
Let be a non-degenerate PCR over . Then, for any non-empty set and every PCS morphism , the set of all complete -selections witnessed (via ) by a point in (in the sense of Section 2.1.1) is contained in whenever is a PCR morphism. Moreover, is the smallest subset of having this property.
See Section B.3.∎∎
Thus, the dual of a non-degenerate serves as a minimal model of the state space of the system , and remains valid under any change to this system for as long as remains order-preserving. This is a form of robustness of the representation to changes in the coupling between the agent’s sensory equipment and the environment: changes leaving the implication record invariant provide no reason for the agent to alter its reasoning.
2.3. Reducing PCR Representations.
The universality of PCR duals motivates a deeper study of their properties, seeking a better understanding of the degree of redundancy in the description of by a PCR . This is not a mere technical issue: while non-degeneracy guarantees the adequacy of our notion of an associated “possible world”, it is not obvious that it also provides for sufficient control over the quality of inference. The intended application—inferring approximate
implications from partial observations—is well known to be problematic in the absence of simplifying assumptions (e.g. the ubiquitous restriction to directed acyclic graphs in the context of Bayesian networks). It is therefore crucial to clarify the precise formal sense in which a PCR may be viewed as encoding a “record of implications”, which is the purpose of this section. A crucial notion in any such discussion is that of what it means for a query, as well as for the difference of two queries, to be negligible, because negligible but non-zero differences tend to accumulate in the transitive closure into material ones.
Looking more closely at the setting of the last proposition, notice that, for a fixed , the assumption that is a morphism translates into the following. The property for all implies for any (because is the only negligible element of ); furthermore, must hold whenever and are -equivalent (recall Definition 2.7). These identifications lead us to recall Roller’s definition of a poc set from :
Definition 2.23 (poc set).
A poc set is a tuple (P, ≤, *) where (P, ≤) is a partially ordered set with a minimum element 0, endowed with an order-reversing involution * (that is, satisfying a** = a, and b* ≤ a* whenever a ≤ b, for all a, b ∈ P) such that a ≠ a* for all a, and a ≤ a* holds only for a = 0.∎
In other words, a poc set is a transitive and anti-symmetric PCR over whose only negligible element is .
Proposition 2.24 (canonical quotient).
For any non-degenerate PCR there exists a surjective PCR morphism of onto a poc set such that any PCR morphism gives rise to one and only one PCR morphism satisfying .
We defer the proof to Section B.4, but define the canonical quotient mapping here. We set:
and let , setting to hold in if and only if . It remains to verify that (1) is a well-defined PCS; (2) is a well-defined poc set structure over ; and (3) the assertions of the proposition hold.∎∎
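The construction of the canonical quotient can be sketched concretely (illustrative names only): queries that imply each other under the closed record are collapsed into equivalence classes, and the implication record descends to the classes.

```python
# Sketch of the canonical quotient: collapse mutually derivable queries
# into single elements; the induced relation on the classes is the
# quotient order. Assumes literals with complements written "a*".
from itertools import product

def transitive_closure(pairs, P):
    reach = {(x, x) for x in P} | set(pairs)
    for k, i, j in product(P, P, P):
        if (i, k) in reach and (k, j) in reach:
            reach.add((i, j))
    return reach

def canonical_quotient(P, pairs):
    reach = transitive_closure(pairs, P)
    # the class of x: everything mutually derivable with x
    cls = {x: frozenset(y for y in P if (x, y) in reach and (y, x) in reach)
           for x in P}
    order = {(cls[x], cls[y]) for (x, y) in reach}
    return set(cls.values()), order

P = {"a", "a*", "b", "b*"}
R = {("a", "b"), ("b", "a"), ("b*", "a*"), ("a*", "b*")}  # a and b equivalent
classes, order = canonical_quotient(P, R)
print(len(classes))   # 2: the classes {a, b} and {a*, b*}
```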
One should view this result as stating the precise conditions necessary for presenting a poc set in terms of a set of generators and a set of relations. However, the emphasis on what happens to morphisms leads to powerful realizations about dual spaces:
Corollary 2.25 (all duals are poc set duals).
If is a non-degenerate PCR then is a bijection.
See Section B.5.∎∎
Corollary 2.26 (naturality of canonical quotients).
Let be non-degenerate PCRs. Then, for every morphism there exists one and only one morphism satisfying .
See Section B.6.∎∎
A particular consequence of the last corollary is that one also has . This means the dual maps of and coincide up to the identifications between the pre- and post-projection duals. Thus, any results about poc set duals apply to duals of PCRs. In the next two sections we review these results, and then harness them in our construction of the universal memory architecture (UMA).
2.4. Convexity theory of PCR duals.
To discuss the geometry of PCR duals, we need to endow PCRs with more structure. From this point on, all PCRs we consider will be finite, with the sole possible exception of power sets.
Definition 2.27 (Hamming metric).
Let be a PCR over . The Hamming metric on is defined by , where is the canonical quotient map. We define to be the simple (that is: loopless, unoriented, with no multiple edges) graph with vertex set , and edges of the form for all with .∎
In the case when is already a poc set, two vertices form an edge if and only if is a singleton, that is: the perceptual classes represented by and differ by the truth value of a single query. The common edge they span in the Hamming cube corresponds to the -selection in the concept presentation. In the general case ( not necessarily a poc set), since both and are coherent, each is the union of with a number of -equivalence classes , (recall Definition 2.7 and Proposition 2.24). Thus and span an edge in if and only if for some . Intuitively, we think of the different as counting for a single Boolean query.
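This picture is easy to experiment with directly. The following minimal sketch (our own toy representation, not part of the architecture) models a vertex of the dual as a dict assigning a truth value to each base query; the Hamming distance then counts disagreements, and an edge of the dual graph is a pair of vertices at distance one:

```python
def hamming(u, v):
    """Number of queries on which two complete selections disagree."""
    assert u.keys() == v.keys()
    return sum(1 for q in u if u[q] != v[q])

def is_edge(u, v):
    """Two vertices span an edge of the dual graph iff they differ
    in the truth value of exactly one query."""
    return hamming(u, v) == 1
```

Over n queries, the full set of such assignments, with these edges, recovers the n-dimensional Hamming cube.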
We briefly recall the graph-theoretic notion of convexity:
Definition 2.28 (convexity in graphs).
Let be a graph and let . The hop distance is defined to be the minimum length of an edge-path in joining with . The interval is defined to be the set of all vertices satisfying the equality . A set is said to be convex in , if holds for all . A set is a half-space of , if both and are convex sets in . Finally, we denote by the poc set whose elements are the half-spaces of (note that is a half-space of ), ordered by inclusion, and with .∎
We refer the reader to , section 4, for the (very elegant and much more general) proofs of the following two lemmas (stated there for poc sets, but valid for finite non-degenerate PCRs as well, due to Proposition 2.24 and its two corollaries):
Let be a finite non-degenerate PCR. Then the hop metric on coincides with the metric .∎
Let be a finite non-degenerate PCR. Then the half-spaces of are precisely the subsets of of the form (note that for all , by Proposition 2.17)
In particular, subsets of of the form
are convex in , for any .∎
To simplify notation, we will abuse it in the following ways:
Writing , without specifying , will henceforth refer to the subsets of ; these are and , respectively.
When is explicitly known, , we will write instead of when convenient.
As a side note, observe that , where coincides with the vertex set of a face of the Hamming cube . In particular, presenting any subset of as a concept is equivalent to decomposing it as a union of convex subsets of .
A connected simple graph is said to be a median graph if the set contains exactly one vertex for each . This vertex is the median of the triple and is denoted by – see Figure 1. For median graphs , , a median morphism of to is a map which preserves medians: . ∎
A central result in Sageev-Roller duality, specialized here to the finite case, and reformulated for non-degenerate PCRs is:
The dual of a finite non-degenerate PCR is a finite median graph, with the median calculated according to the formula:
and with intervals in calculated according to the formula:
Conversely, if is a finite median graph then is naturally isomorphic to by sending every vertex to the -selection of all half-spaces of which contain .∎
This result is the consequence of a very strong convexity theory:
Theorem 2.34 (Properties of median graphs, , section 2).
Let be a finite median graph. Then:
Any family of pairwise intersecting convex sets has a common vertex;
Every convex set is an intersection of half-spaces;
For any convex subset , the subgraph of induced by is a median graph;
For any convex and any there is a unique vertex at minimum hop distance from ;
For any convex , the nearest point projection is a median preserving, distance non-increasing retraction of onto its subgraph induced by .
Property (1) is often referred to as the Helly property.∎
The Helly property is, perhaps, the most notable of the results stated above. In our setting of PCR duals, it may be interpreted as guaranteeing the satisfiability of any family of conjunctive monomials over in which every pair is separately satisfiable.
Given the central role of half-spaces in the convexity theory of median graphs, a notion of the set of half-spaces dual to a given set of vertices is useful:
For , its dual set of half-spaces, , is defined to be the set of all with .∎
An immediate corollary of Theorem 2.34(2) is:
Suppose is a non-degenerate PCR, and . Then is -coherent and forward-closed, and the convex hull of in coincides with .∎
Thus, every convex subset of may be written as for some . This representation is unique, by the last assertion of the following lemma:
Let be a non-degenerate PCR over . Then, for all :
if and only if is coherent;
For all one has ;
If then ;
If is coherent then ;
If , then
For all one has .
See Section C.1.∎
Another important result helps bound the distance from the points of one convex set to another:
Let for a poc set over . Then for all .
See Section C.4.2.∎
This motivates the following definition for the general case:
Let be a non-degenerate PCR over and let . The divergence of from is defined to be .∎
Note how seems independent of ; it is not, however, since it is only applied to upwards-closed coherent sets . We will use this notion of divergence in Section 5.3 to drive the decision-making mechanism of the binary UMA agents briefly introduced there.
More details about the convexity theory of a median graph will be discussed in the appendices, as we go about proving our algorithmic results.
2.5. Propagation: A Computational Workhorse.
We are now ready to present another central result of this paper: a low-complexity method for computing nearest point projections in , which we call propagation. This method obviates the need for maintaining an explicit representation of each vertex of in memory, reducing space requirements for this architecture from in the worst case to . The time complexity is, at worst, , coming down to sub-linear on a fully parallel architecture, as will become evident below.
Definition 2.40 (coherent projection).
Let be a PCR over a finite PCS . For any , the set is said to be the -coherent projection of .∎
Coherent projection itself plays an important role in obtaining an observer’s belief state from its epistemic state (the learned PCR structure) and the latest observation (see Section 3.3).
The promised formula for computing projections works as follows.
Let be a PCR over a finite PCS . Let and suppose is -coherent. Let and . Then:
where is the nearest-point projection to in defined in Theorem 2.34.
See Section C.4.∎
This description of nearest point projection is easy to visualize as being computed by an algorithm propagating excitation among nodes of a directed graph:
Let be a PCR over a finite PCS . Let . Denote by the graph with vertex set , edge set and with Boolean weights , attached to its vertices. We refer to it as being loaded with .∎
A propagation algorithm over is any algorithm which, for any -coherent load and any accepts and as input and produces as its output the loaded graph , where
Note that coherent closure is obtainable via .∎
Envisioning as describing a graph of ‘cells’ labeled by
and ‘synapses’ labeled by pairs, the loaded graph represents a state of the network indicating that the cells of are in an excited state. A propagation algorithm should be seen as additionally exciting the cells of and spreading this excitation along the directed connections, while inhibiting for each cell encountered along the way. Realized on a modern-day computer, this may be achieved in quadratic time in . For example, propagation could be implemented using a variant of depth-first search (DFS) on , while maintaining an expanding record of visited vertices—see Algorithm 1. On a fully parallel machine allowing the ‘cells’ to compute their own excitation, the time complexity is clearly of the order of the longest directed vertex path in the network, which is sub-linear in .
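The excitation-spreading picture may be rendered as follows. We stress that this is an illustrative toy, not Algorithm 1 itself: `implies` is a directed graph of implications between cells, `star` maps each cell to its complement, and a depth-first traversal from the newly excited cell spreads excitation forward while inhibiting the complement of every cell it reaches (cascading retractions are ignored here).

```python
def propagate(implies, star, loaded, a):
    """Toy sketch of propagation: spread excitation from cell `a`
    through the directed implication graph `implies` (dict:
    cell -> set of implied cells), inhibiting the complement
    star[c] of every cell c reached. `loaded` is the currently
    excited set; the exact procedure is given by Algorithm 1."""
    excited = set(loaded)
    stack = [a]
    while stack:
        c = stack.pop()
        if c in excited:
            continue
        excited.add(c)
        excited.discard(star[c])          # inhibit the complement
        stack.extend(implies.get(c, ()))  # spread along implications
    return excited
```

For example, with the single implication a ≤ b, exciting a in a network loaded with b* yields the excited set {a, b}: the contradictory cell b* is inhibited along the way.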
We now turn to a high-level description of the UMA architecture and its use of the results of this section.
3. Universal Memory Architecture (UMA): a High-Level View.
In this section we provide a high-level description of the basic UMA functionalities: PCR update/revision and maintaining a belief state.
3.1. Observation Model.
Recall from Section 2.1.1 that an observer is given a set of initial Boolean queries over the space of histories of the observed system. The system of queries and their complements is modeled as a PCS morphism , which is unknown to the observer. The observer is presented with a sequence of observations , and values , , one per update cycle. One must distinguish between two settings:
- Static signal: The value signal only depends on the raw observation ;
- Dynamic signal: The value signal may produce while .
While we are ultimately interested in covering the dynamic setting, this paper deals only with the static one. However, a static setting is by no means an unchanging one: we will see in Section 5 that instances of the static setting may nevertheless have rich and interesting dynamics. This happens, in part, as a result of introducing delayed queries. By these we mean the following: if denotes the operation of truncating the last state from a given history, then, for any conjunction of already available queries, it is possible to introduce a new query of the form (here and below we abuse notation, applying the symbol to denote both a delayed query and the history truncation/shift operator; which is which is clear from the context), where reports its value according to the rule , . Of course, implementing this operation requires that the UMA architecture retain the latest raw observation, but this seems a small price to pay for extending the range of application of the static setting.
The basic task of a UMA is to evolve a sequence , of non-degenerate PCRs over while aiming for the PCRs to eventually satisfy the following:
‘Completeness’: is a PCR morphism, ensuring that every perceptual class is represented;
‘Precision’: is as close as possible to the true model space .
These requirements should not be taken literally, however. For example, it stands to reason that in some contexts the observer could afford to misclassify a few perceptual classes of low import. We will see how—at least under some of the learning schemes we propose—these vague requirements can be stated precisely in terms of PAC learning.
3.2. Maintaining a PCR presentation: Snapshot Structures.
A rather restrictive notion of a snapshot structure—a method for learning a poc set structure from positive observations—was introduced by the authors in . Here we merely review the main ideas to provide intuition, while deferring the formal constructions to Section 4.
Motivated loosely by Hebbian ideas about learning , we consider maintaining an evolving symmetric system of weights , with quantifying in some prescribed way a notion of cumulative degree of relevance of the event to the observer, at time .
In addition, rules to maintain as time progresses must be provided: first, a completion rule, to insert missing values into when it undergoes an extension; second, an update rule, computing from and the incoming observation.
It is important for both rules to be as simple—and as local—as possible, so as not to sacrifice tractability. In our constructions, we constrain the update laws to ones where depends only on , the value signal , the truth value of the bit and possible global parameters (e.g. the system clock ).
PCRs from snapshot weights.
Inspired by the rough mechanism proposed in , we seek weight systems for which the loosely specified rule—
ranging over all with is guaranteed to define a non-degenerate PCR over . The motivation for the rule is, of course, the fact that is equivalent to , where is the PCS morphism defining the semantics of the queries in .
Finally, note how the properties of a PCR are guaranteed (to the extent that the rule is well-defined, of course), so that non-degeneracy is the only remaining question. The precise notion of ‘negligible’ adopted for the purpose of comparing weights is crucial, and is expected to greatly affect the quality and limitations of the emerging representations.
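A toy rendering of the idea, with illustrative update and threshold rules that are not the ones developed in Section 4: symmetric co-occurrence weights are accumulated over signed queries, and an implication is declared whenever the weight of the corresponding ‘forbidden’ event is negligible (here: below a fixed threshold).

```python
from collections import defaultdict

class Snapshot:
    """Toy snapshot structure: symmetric co-occurrence weights over
    signed queries, with a threshold standing in for 'negligible'.
    Both rules below are illustrative placeholders."""

    def __init__(self, eps=0.0):
        self.w = defaultdict(float)
        self.eps = eps  # weights <= eps count as negligible

    def update(self, observation, value=1.0):
        """observation: set of signed queries true at this cycle."""
        for a in observation:
            for b in observation:
                self.w[frozenset((a, b))] += value

    def implies(self, a, b_star):
        """Declare a <= b when the weight of the event (a and b*)
        is negligible, i.e. 'a and not-b' has never mattered."""
        return self.w[frozenset((a, b_star))] <= self.eps
```

For instance, after repeatedly observing a together with b, and never a together with b*, the structure records the implication a ≤ b.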
3.3. Maintaining a Belief State.
Since, for each time , we only get to observe states from , we face the problem of having to learn negative statements—that is, the list of -incoherent pairs—from the stream of positive examples . From what we have observed so far, we must reason about what we might never encounter. Since the implication record is inherently uncertain, providing no guarantee at any time that the completeness requirement from Section 3.1 will be met, it is quite possible for the observation to land outside the model space, despite its prior role in forming this model space during the snapshot update. In fact, its value may be too low to trigger a revision of into a for which becomes coherent.
Contrary to the approach adopted by modern iterated revision schemes based on Darwiche and Pearl’s , we do not insist on a revision forcing into . Instead, we apply to the raw observation with the aim of relaxing it, replacing it with a -coherent and forward-closed set:
in the role of the current state of record, or the belief state. This way, UMA naturally resolves possible contradictions at the price of introducing ambiguity into its record of the current state: instead of marking a single vertex of as the current state, any vertex of the convex set may turn out to be the correct current state from the observer’s point of view.
The choice of the coherent projection for the purpose of forming the belief state is motivated by its geometric and categorical properties. In our class of model spaces it is a canonical method of producing coherent sets, as witnessed by the following two results:
Proposition 3.1 (Coherent Approximation).
Let be a PCR over . Then, for any , if realizes the Hamming distance
—that is, if —then we must have .
See Section C.2.∎
Thus, the operation yields the “best approximation” of by a convex subset of , echoing the principle of minimal change as seen through Dalal’s way of quantifying the distance between theories. Moreover:
Proposition 3.2 (Coherent Projection).
Let be a PCR over . Then the following hold for all :
(a) is coherent and ;
(c) whenever is -coherent;
(d) if and only if is -coherent and .
In other words, as a self-map of , the operator is an idempotent whose image coincides with .
See Section C.3.∎
Note how properties (a) and (c) turn into a closure operator on the subspace of -coherent sets with respect to inference (implication). At the same time, (b) and (d) characterize the set of all terms that are closed under inference.
Overall, Equation 11 provides an intriguingly natural way of maintaining an internal model and belief state with a built-in degree of resilience to observations that fail to make immediate sense to the agent given its epistemic state. Finally, the complexity of this computation is the complexity of propagation over , by Proposition 2.41 and the discussion following Definition 2.43.