Universal Memory Architectures for Autonomous Machines

02/21/2015 ∙ by Dan P. Guralnik, et al.

We propose a self-organizing memory architecture for perceptual experience, capable of supporting autonomous learning and goal-directed problem solving in the absence of any prior information about the agent's environment. The architecture is simple enough to ensure (1) a quadratic bound (in the number of available sensors) on space requirements, and (2) a quadratic bound on the time-complexity of the update-execute cycle. At the same time, it is sufficiently complex to provide the agent with an internal representation which is (3) minimal among all representations of its class which account for every sensory equivalence class subject to the agent's belief state; (4) capable, in principle, of recovering the homotopy type of the system's state space; (5) learnable with arbitrary precision through a random application of the available actions. The provable properties of an effectively trained memory structure exploit a duality between weak poc sets -- a symbolic (discrete) representation of subset nesting relations -- and non-positively curved cubical complexes, whose rich convexity theory underlies the planning cycle of the proposed architecture.




I Introduction

I-A Motivation

A major obstacle to autonomous systems synthesis is the absence of a capacious but efficient memory architecture. In humans, memory influences behaviour over a wide range of time scales, leading to the emergence of what seems to be a functional hierarchy of sub-systems [1]: from non-declarative vs. declarative through the split of declarative memory into semantic and episodic [2]; and on to theories of attention and recall [3]. This variety of scales is mirrored in the collection of problems addressed by the synthetic sciences: from learning dependable actions/motion primitives [4, 5]; through learning objects and their affordances [6, 7] to demonstration-driven task execution [8, 9]; through exploring and mapping an unknown environment [10, 11, 12, 13] and motion planning [14, 15, 16]; and on to general problem solving [17] using artificial general intelligence (AGI) architectures [18, 19, 20].

One idea stands out as common to all these approaches, beginning with the formal notion of a problem space introduced by Newell and Simon [17, 21]: the purpose of a memory architecture is to learn the transition structure of the state space of the system comprised of the agent and its environment while processing the history of observations into a format facilitating improved future control.

It is often argued (e.g. [22, 23, 24]) that memory architectures for general agents should enjoy a high degree of domain- and task-independence. In general, however, clear definitions of notions such as ‘domain’ and ‘task’ are not readily forthcoming across the vast breadth of literatures discussing memory, agents and autonomy. Notions of ‘universal learners’ based on optimizing gain in estimators of predictive entropy have been proposed [25]; however, there is evidence to suggest that the resulting level of generality may be insufficient for some tasks.

Absent broadly recognized formal foundations, we advance an architecture provably satisfying intuitive universality properties, including, most centrally: (1) interactions with the environment are encoded in the most generic, yet minimal, manner possible, while requiring no prior semantic information; and (2) learning obtains from direct binary sensory input, automatically developing appropriate contextual links between sensations of arbitrary modality. A key outcome is that the architecture encodes its observation history in a model space that supports the agent’s problem solving as a form of reactive motion planning whereby atomic computations provably correspond to nearest point projection in the reachable set.

I-B Contribution

We consider a generic discrete binary agent (DBA): a machine sensing and interacting with its environment in discrete time, equipped with a finite collection of Boolean-valued sensors, some of which serve as triggers for actions/behaviors (switched on and off at will).

Given an instance of a DBA interacting with an environment , it is natural to view the set of sensory equivalence classes of the associated transition system as a subset of the power set . It is generally accepted [27, 28] that a memory architecture must be capable of supporting internal representations rich enough to account for the diversity [29] of the transition system : Exact problem solving, when construed as abstract motion planning, requires an internal representation capable, eventually, of accounting for all the classes in and the transitions between them. Unfortunately, as expressed forcefully in [29] and as we review at length below, the task of obtaining an exact description of becomes intractable in the absence of strong simplifying assumptions about X, as the number of sensors grows.

To circumvent this obstacle, rather than imposing any specific structure on , we propose to relax the requirement for precise reconstruction by introducing an approximation whose discrepancy from we characterize exactly and show to be the smallest possible in its (computationally effective) class of objects.

The new memory and control architecture we propose here consists of two layers:

  • A data structure, called a snapshot, that keeps track of the current state and summarizes the history of the agent’s observations in a collection of real-valued registers whose number is quadratic in the number of sensors.

  • A reactive planner, built on a weak poc set structure ([30, 31] and defn. A.1) constituting a record of pairwise implications among the atomic sensations as observed by the agent; this structure is computed from the snapshot in each control cycle.

A crucial property of our architecture is that and are formally reconstructible from each other. The model space takes the form of a CAT(0) cubical complex, or cubing [Footnote 1: for a good introduction to CAT(0) cubical complexes, see [32]; for a tutorial on cell complexes, see [33], chapter 0 and appendix], whose 0-skeleton is contained in . As the snapshot is updated by incoming observations, the space , as encoded by , is transformed along with it. We can state our main contributions, albeit necessarily informally at this point, in terms of provable properties of the architecture and its model spaces:

  • Universality of Representation. is the minimal model guaranteed to represent all the sensory equivalence classes of any sensorium satisfying the record (see A-E3). In particular, in the absence of information not already encoded in , it is impossible to distinguish the 0-skeleton of from the set of sensory equivalence classes, .

  • Topological Approximation. As a topological space, is always contractible [Footnote 2: the formal notion of being ‘hole-free’; see [33], chapter 0]. Provided the sensorium is sufficiently rich, the sub-complex of faces all of whose vertices lie in inherits from the topology [Footnote 3: up to homotopy equivalence; see the definition in [33], chapter 0] of the observed space (see appendix A-E3-A-E4).

  • Low-complexity, effective learning. The proposed architecture requires quadratic space (in the number of sensors) for storage, and no more than quadratic time for updating. Furthermore, an agent picking actions at random learns an approximation of the resulting walk’s limiting distribution on (see II-E2).

  • Efficiency of Planning. Planning the next action given a target sensation takes quadratic time in the number of sensors, while eliminating the need for searching in the model space. With sufficient parallel processing power, this bound may be reduced to a constant multiple of the height  —  the maximum length of a chain of implications of (see III-B).

To the best of our knowledge, this combination of provable properties has not previously appeared in the literature.

I-C Overview and Related Literature

To establish the novelty of our contribution, we now briefly review the copious literature bearing on these topics, which arises from three distinct traditions: robotics, connectionist computation, and artificial general intelligence. After presenting our technical ideas, we return at the end of the paper, in a more discursive form, to their relation to and implications for the broader field.

I-C1 Relation to Mapping and Navigation

Formulating navigation and mapping problems in terms of a point agent moving through a homotopically trivial ambient space while avoiding a collection of geometrically defined obstacle regions representing forbidden states is fundamental to motion planning [14, 15] and mapping [12, 34]. The ubiquity of obstacles in these settings introduces topological considerations whose primacy is well established in the algorithmic literature [34, 35, 36, 37, 38, 39, 40], governing the complexity of not only motion planning [41] but even set membership [42].

Our strategy is to reduce the general problem of memory storage and its use for motion planning in the underlying transition structure of a problem space (as sensed by a DBA) to the geometric problem of motion planning in the agent’s model space (playing the role traditionally assigned to Euclidean space). Generalizing the Euclidean setting, the model space has a very strong convexity theory [31, 43] enabling low-cost greedy navigation.

The topological point of view has been shown to be well warranted in the discrete setting as well. As was demonstrated by Pratt [44], oriented topological structures (cubical complexes, in fact) may be used to encode the causal relations among actions and states in symbolic transition systems. Approaches generalizing Pratt’s have since been used to formulate very general models of reconfigurability and self-assembly [45, 46].

I-C2 Mechanisms for Learning and Planning

Snapshots use an evolving estimate of pairwise intersections of sensor footprints to form a record of implications among the atomic sensations of the DBA. The necessity of such a record for planning goes back (at least) to [47], yet ideas about applying it as a way to encode context are fairly recent and specialized [4, 23, 48]. Our internal representation takes the additional step of applying this principle to all the sensations available to a DBA, including the control signals it uses to interact with its environment.

The resulting learning and control mechanisms may be realized in a highly simplified and idealized, yet highly plastic, network of neuron-like cells simulating the structure of the snapshot. This analogy with neural networks is not a coincidence: estimating arbitrary intersections from near-synchronous activation of sensors in a planar sensor field has been explored as a means for topological [39] as well as metric mapping by competitive attractor networks (RatSLAM [13, 24, 48]), while the study of the structure of stability properties vis-à-vis topology and plasticity in more general networks is just taking off [49, 50].

I-C3 Model Spaces

The necessity to maintain high-dimensional representations of the state space poses a major challenge for current approaches to learning [51, 52, 22] and general problem solving [53, 54]. The method closest to ours in its formalism seems to be that of [29], which even lends itself to learning by a connectionist network [55] but still requires an exponentially large representation for planning purposes. By contrast, in our case, the ability to translate action planning in the model space into what is essentially a flow problem in a network constructed from the underlying sensorium obviates the need for maintaining the model space in memory, allowing us to evade the curse of dimensionality. Nevertheless, we are still guaranteed a model space that is sufficiently rich to account for all sensory equivalence classes perceivable by the DBA [27].

The computational advantages of our approach come at a cost that is largely driven by topology, as expressed in (ii): the model space necessarily has trivial topology [Footnote 4: again, in the sense of being contractible] [56, 31, 57], and our own result [30] establishes formal conditions on under which the complex reproduces the “topological shape” [Footnote 5: in the sense of homotopy type; see [33], chapter 0] of (as discussed above), which may not be topologically trivial. The basic algorithm driving planning in our agents, however, achieves its efficiency by disregarding this mismatch. The introduction of auxiliary intrinsic motivation mechanisms [58, 59] as a means of steering the agent away from obstacle states in and towards desirable behaviours (not necessarily states!) seems to offer a possible way out of this predicament, as well as a step towards solving the problem of closing the control loop. At this early stage, as a feasibility study for the overall approach, we only consider very simple excitation mechanisms that cause the agent to choose actions so as to maximize the immediate excitation gain, to the extent that this gain can be sensed (and otherwise to choose random actions).

I-D Organization of the Paper

Having already given proofs of the formal results underlying (i) and (ii) in our previous paper [30], we defer the technical discussion of poc sets to appendix A. This appendix is intended as an introductory overview of the theory of weak poc sets and the geometry of their dual spaces (our agents’ model spaces), as well as a repository of proofs of technical results we require but could not find elsewhere in the literature.

Section II discusses (iii). We formally state the observation model for DBAs, describe snapshots and their learning mechanisms, and present our early numerical work illustrating the practical implications of the claims regarding learning.

Section III is dedicated to item (iv) in the list of contributions. Actions are introduced to the observation model, and control algorithms are defined and validated.

Finally, following an extended discussion of our results in relation to existing literature in section IV and the aforementioned appendix dealing with poc sets, a second appendix presents the proofs of technical results about snapshots.

Topic/Notation Ref.
DBA Model (general)
Environment (with points ) Sec.II-A1
State space of the experiment (with points ) Sec.II-A1
The position map Sec.II-A1
Time, the set of integers Sec.II-A2
Reads as: ”at time Eqn.(1)
DBA model (sensing)
Sensorium (elements are ), with involution Eqn.(6)
Realization map of the sensorium Eqn.(7)
Evaluation, e.g. of on Eqns.(4-5)
DBA computational model (at time )
Agent’s snapshot Sec.III-B1
The derived poc graph, Sec.III-B1
Derived (weak) poc set structure on , Sec.II-B1
The model space Sec.II-B3
The punctured model space Def.III.8
Raw observation Sec.II-B2
Recorded observation Sec.II-B4
Decision (action) following the observation
Contents/parameters of a snapshot
The complete graph on with all edges removed Def.II.2
State of the snapshot Def.II.3(a)
Weight on the edge Def.II.3(b)
Learning threshold for the pair Def.II.3(c)
Orientation cocycle of Prop.II.7
Dissimilarity measure of App.B-C

Objects derived from a snapshot
Derived poc graph Prop.II.8
Derived weak poc set structure Def.II.10
Weak poc sets and their duals
Poc sets (with and without indices) Def.A.1
The set dual of , the -skeleton of Def.A.8(b)
Dual graph of , the -skeleton of Def.A.8(c)
Dual cubing of the poc set Def.A.8(a)
The punctured dual with respect to a realization Def.A.28
The dual map of a poc morphism Defs.A.2,A.24
TABLE I: Table of Mathematical Symbols

II Snapshots: From Observation Sequences to a Memory Structure

We begin with a formal statement of what we mean by a DBA and its observation model. We then proceed to construct snapshots, their updating mechanisms and the derived weak poc set structures, and conclude the section with results on the learning capabilities of snapshots. Table I reviews notation that will persist throughout the paper.

II-A Observation Model for DBAs

II-A1 Environment and State

We place an agent in an environment . The state space of the system will be denoted by , where we assume there is a map producing the location of the agent in , given the state of the system as a whole. As it turns out, no further mathematical structure on is required for the results that follow; hence, with a mind toward inviting the broadest range of applications, we impose none, much in the spirit of McCarthy and Hayes’ discussion of the situation calculus [47].

II-A2 Time and Transitions

We model time as the set of integers (the subjective time of the agent), with corresponding to the initial time. The basic objects of study are then trajectories, or maps of the form


We define abstract transitions in as follows:

Definition II.1 (-transitions).

An element of the -fold Cartesian power will be referred to as an -transition. For any trajectory and we define the map


We refer to 1-transitions as states, and to 2-transitions simply as transitions.

Any setting where , and the transition structure of the system are specified (though, possibly, in an implicit fashion), implies constraints on the set of achievable trajectories. We will refer to such settings as experiments, within the framework of which each allowed trajectory will be referred to as a run of the experiment, while observations produced by sensors during a run (below) will be called experiences.

II-A3 Discrete Binary Agents

A discrete binary agent (DBA) is endowed with a collection of binary sensors indexed by a finite set . We will assume that each is assigned an order , and a realization . We then say that is a -sensor, or a sensor of degree . For example, a 1-sensor (or state sensor) responds to the system entering a certain subset of (a “macro-state”), while a 2-sensor responds to the system experiencing a transition of a particular kind.

Evaluation of sensors is best viewed in the context of trajectories: a -sensor is applied to a trajectory and assigned a value at time according to the rule


Here denotes the measurement provided by the sensor at time given the trajectory . To avoid a profusion of parentheses and subscripts we will generally use bracket notation to denote the evaluation of Boolean- and scalar-valued functions:


and so forth. The symbol will always denote the indicator function of a set with respect to the appropriate super-set.

We will assume that comes endowed with a map satisfying the following for all :


as well as


We also introduce the virtual sensors evaluating to


on any trajectory and at any time . For subsets we will always use the notation to denote the set of all , ranging over .
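To make the distinction between state sensors and transition sensors concrete, here is a minimal Python sketch. It assumes, as our reading rather than a formula quoted from the text, that an n-sensor is evaluated at time t on the n-transition ending at t, and it models a realization as a predicate on n-tuples of states. The names `Sensor` and `complement`, and the toy experiment, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Sensor:
    """Hypothetical n-sensor: an order n and a realization, modelled as a predicate on n-tuples."""
    name: str
    order: int
    realization: Callable[[Tuple], bool]

    def __call__(self, trajectory: Sequence, t: int) -> bool:
        # Assumed convention: read the n-transition (x_{t-n+1}, ..., x_t) ending at time t.
        if t < self.order - 1:
            raise ValueError("not enough history to evaluate this sensor at time t")
        window = tuple(trajectory[t - self.order + 1 : t + 1])
        return bool(self.realization(window))


def complement(s: Sensor) -> Sensor:
    """The complementary sensor s*, which fires exactly when s does not."""
    return Sensor(s.name + "*", s.order, lambda w: not s.realization(w))


# Toy experiment (hypothetical): states are integers on a line.
in_right_half = Sensor("right", 1, lambda w: w[0] >= 2)        # a state sensor
moved_right = Sensor("moved_right", 2, lambda w: w[1] > w[0])  # a transition sensor

trajectory = [0, 1, 2, 1]
print(in_right_half(trajectory, 2), moved_right(trajectory, 2))            # True True
print(moved_right(trajectory, 3), complement(moved_right)(trajectory, 3))  # False True
```

Keeping the order as a field lets one evaluation rule serve sensors of every degree, and the complement inherits the order of the sensor it negates.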

The database structure we will be using is designed to maintain an approximate record of the relations among sensors in believed by the agent to hold true throughout time. This record at time is encoded in a weak poc set structure over (definition A.1).

For two -sensors this requirement translates into in being treated (for planning purposes) as the inclusion in the space . Note how the equivalent containment is encoded by the contrapositive implication , which, by the definition of a weak poc set, holds if and only if does.

When have different orders we are forced to replace this requirement by a weaker one: at any time , our agents will interpret in as


holding for all . In other words, our agents assume that relations among sensors do not change over time [Footnote 6: this is not to say that our agents are not allowed to change their minds regarding which relations hold true and which do not; the purpose of keeping a dynamic record of relations is to eventually uncover the ‘correct’ relations].

For example, if where is a state sensor and is a transition sensor, consider the statements:

treated as identities over both and . We see that states all transitions of type must terminate in a state of type , while means that only transitions of type could produce state . It is clear that both kinds of statement are essential for planning.

II-B The Model Spaces

II-B1 A Record of Implications

Informally, a “record of implications in ” is a partial ordering on reflecting the standard interactions between Boolean complementation and Boolean implication. Formally, our DBA will maintain, at any time , a weak poc set structure on consisting of a partial order relation satisfying, for all :

  1. ;

  2. .

Note that by the construction of and compare with the definition in appendix A.1.

II-B2 Observations as vertices of a cube

From the agent’s viewpoint, the current state of the experiment at time is completely characterized by the measurements , where is the agent’s trajectory. Equivalently, the state may be encoded in a subset satisfying for all . Such subsets of are called complete -selections. An incomplete measurement of the state would then correspond to a subset satisfying for all , called an (incomplete) -selection (see definition A.6 in appendix A, along with remarks on the notation to follow).

Thus, one thinks of the collection of all complete -selections on as enumerating the possible sensory equivalence classes in the sensed space. However, some of the elements in this collection are redundant given the record : an implication means that no containing is expected by the agent to be witnessed by any observation (see fig. 9). Formally, a set is coherent (definition A.7) if no pair of elements satisfies .
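A minimal sketch of the coherence test follows. It assumes literals are encoded as pairs (sensor name, polarity), with polarity False standing for the complement a*, and that the record is a set of pairs (a, b) meaning a < b, closed under contrapositives. We read the coherence condition as “no a, b in the set with a < b*”; this reading, and all names below, are our assumptions.

```python
def star(x):
    """Complementation: (name, True) <-> (name, False)."""
    return (x[0], not x[1])


def is_coherent(selection, implies):
    """Assumed test: no a, b in `selection` with a < b* recorded in `implies`."""
    return not any((a, star(b)) in implies for a in selection for b in selection)


A, B = ("a", True), ("b", True)
record = {(A, B), (star(B), star(A))}  # a < b together with its contrapositive b* < a*

print(is_coherent({A, star(B)}, record))  # False: a fires while b does not
print(is_coherent({A, B}, record))        # True
```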

II-B3 The model spaces

The model space corresponding to the record takes the form of a cubical complex: a topological space constructed from a collection of vertices (the 0-skeleton), a set of edges (the 1-skeleton), and successively higher-dimensional connecting cells in the form of cubes [32]. We choose the vertex set of to coincide with the set of coherent -selections in . Edges are inserted to join any pair of vertices satisfying (this condition turns out to be symmetric). The hop-distance on the resulting graph may be seen as a variant of the crude, ‘information-motivated’ Hamming distance on . The 1-dimensional skeleton of is further enriched with higher-dimensional cubes to yield the space , as described in appendix A (definition A.8) for the interested reader. While a fairly detailed knowledge of the geometry and topology of spaces obtained in this way is essential for following our formal arguments regarding the modeling capabilities of this class, much of it is unnecessary for this section’s account of how the agent obtains its representation of , the record .
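For a tiny sensorium, the vertex and edge sets described above can be enumerated by brute force. This sketch assumes the coherence reading “no a, b with a < b*” and takes the edge condition to be “the two complete selections differ in exactly one sensor”, in line with the Hamming-distance remark; both are assumptions. The enumeration is exponential in the number of sensors, so it is only a toy illustration of the 1-skeleton, not a description of how the architecture operates.

```python
from itertools import combinations, product


def star(x):
    return (x[0], not x[1])


def is_coherent(selection, implies):
    return not any((a, star(b)) in implies for a in selection for b in selection)


def model_space_graph(names, implies):
    """Brute-force 1-skeleton: coherent complete selections, joined when they differ in one sensor."""
    vertices = []
    for bits in product([True, False], repeat=len(names)):
        selection = frozenset(zip(names, bits))  # exactly one of a, a* per sensor
        if is_coherent(selection, implies):
            vertices.append(selection)
    edges = [(u, v) for u, v in combinations(vertices, 2) if len(u - v) == 1]
    return vertices, edges


A, B = ("a", True), ("b", True)
V, E = model_space_graph(["a", "b"], {(A, B), (star(B), star(A))})
print(len(V), len(E))    # 3 2: the 4-cycle with the vertex {a, b*} deleted is a path
V0, E0 = model_space_graph(["a", "b"], set())
print(len(V0), len(E0))  # 4 4: with no implications, the 1-skeleton of the full square
```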

II-B4 Maintaining a record of the current state

Returning to the problem of representing the current state, observe that is expected to change as time progresses, possibly giving rise to observations that are incoherent with respect to , and therefore represent points ‘outside’ the model space. While the raw observation must be applied to the agent’s data structure in hopes of improving , the agent must also resolve the contradiction within the framework of its current model: as its record of the current state, it replaces the incoherent complete observation with a coherent but incomplete observation , obtained by the coherent projection (104), that satisfies certain naturality requirements (see appendix B-E2 for the complete technical discussion).

This means the agent resolves the contradiction at the price of introducing ambiguity into its record of the current state: instead of having a single vertex of representing the current state (“complete knowledge”), any vertex containing the set may turn out to be the correct current state.

The complexity of coherent projection (lemma III.4) and its role in the agent’s reasoning processes, its interplay with the convexity theory of the model space and its interpretation as the basis for viewing our architecture as a connectionist model (albeit a very limited one) of cognition will all be discussed in section III-B.

II-C Snapshots

In [30] we introduced the rather loose notion of a snapshot, aiming to outline a class of database structures for dynamically maintaining weak poc set structures from a sequence of observations made by an agent along a trajectory through . A rigorous treatment of this tool requires some careful definitions.

Definition II.2.

Denote by the graph obtained from the complete graph over the vertex set by removing all edges of the form , . Edges of will be referred to as proper pairs in . We will abuse notation and write for the edge of .

The graph is the scaffolding for snapshots:

Definition II.3 (Snapshot).

A snapshot over consists of the following:

  • State. Each vertex of is assigned a binary state . The set

    is called the state vector of the snapshot and is required to be a -selection on (definition A.6).

  • Edge weights. Each edge is assigned a non-negative real number denoted .

  • Learning Thresholds. Each edge carries a non-negative real number satisfying

For every , the restriction of to the subgraph induced by the vertices and will be denoted by and referred to as a square in .
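The scaffolding graph and the three snapshot fields can be sketched as a small Python container. This is an illustration only: literals are assumed to be (sensor name, polarity) pairs with polarity False for the complement, the class and field names are hypothetical, and the condition the learning thresholds must satisfy is not reproduced here, so a single placeholder value is used. The number of proper pairs, 2n(n-1) for n sensors, shows where the quadratic space bound comes from.

```python
from dataclasses import dataclass, field
from itertools import combinations


def star(x):
    return (x[0], not x[1])


def proper_pairs(names):
    """Edges of the complete graph on the sensorium, minus every pair {a, a*}."""
    literals = [(n, p) for n in names for p in (True, False)]
    return [frozenset((x, y)) for x, y in combinations(literals, 2) if y != star(x)]


@dataclass
class Snapshot:
    """Hypothetical container for the three ingredients of Definition II.3."""
    names: list
    tau: float = 0.0                                # placeholder learning threshold
    state: set = field(default_factory=set)        # active vertices (a selection)
    weight: dict = field(default_factory=dict)     # proper pair -> non-negative weight
    threshold: dict = field(default_factory=dict)  # proper pair -> learning threshold

    def __post_init__(self):
        for e in proper_pairs(self.names):
            self.weight.setdefault(e, 0.0)
            self.threshold.setdefault(e, self.tau)

    def square(self, a, b):
        """Weights on the four proper pairs spanned by a, a*, b, b*."""
        if a[0] == b[0]:
            raise ValueError("a square needs two distinct sensors")
        return {frozenset((x, y)): self.weight[frozenset((x, y))]
                for x in (a, star(a)) for y in (b, star(b))}


s = Snapshot(["a", "b", "c"])
print(len(s.weight))                            # 12 = 2n(n-1) proper pairs for n = 3
print(len(s.square(("a", True), ("b", True))))  # 4
```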

The original motivation of [30] for the notion of a snapshot is twofold:

  1. Maintaining a consistent representation of the current state. For this purpose we will generally store the coherent projection of the current state measurement in .

  2. Learning implications in the sensorium. To learn an estimate of the implication order on inherited from its realization in , it should suffice to maintain a system of weights on quantifying the relevance (e.g. frequency) of the event , allowing one to partially orient the snapshot according to the rule of thumb illustrated in figure 1.

Fig. 1: determining edge orientations in a snapshot by restricting attention to .
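Since the precise orientation rule (17) is not reproduced here, the following sketch implements only one plausible reading of the rule of thumb in figure 1, and it is an assumption throughout. Within the square spanned by a and b, if the pair {x, y*} carries strictly the smallest of the four weights and that weight is below the learning threshold, we read it as “x and y* (almost) never co-occur”, i.e. x < y, and record the contrapositive y* < x* along with it. Requiring a unique minimum ensures that each square yields at most one implication together with its contrapositive.

```python
def star(x):
    return (x[0], not x[1])


def orient_square(weight, a, b, tau):
    """Implications read off one square under an ASSUMED minimum-weight rule."""
    pairs = [(x, y) for x in (a, star(a)) for y in (b, star(b))]
    values = [weight[frozenset(p)] for p in pairs]
    smallest = min(values)
    if values.count(smallest) != 1 or smallest >= tau:
        return set()  # no clear winner, or too much evidence against every implication
    x, y_star = pairs[values.index(smallest)]
    y = star(y_star)
    return {(x, y), (star(y), star(x))}  # x < y and its contrapositive y* < x*


A, B = ("a", True), ("b", True)
w = {frozenset((A, B)): 0.4, frozenset((A, star(B))): 0.0,
     frozenset((star(A), B)): 0.3, frozenset((star(A), star(B))): 0.3}
print(orient_square(w, A, B, tau=0.05))  # a < b and b* < a*, as pairs of literals
```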

The graphical representation derived from a snapshot in this manner does not automatically define a weak poc set, but is nearly there:

Definition II.4 (poc graph).

A poc graph over is a subgraph of endowed with an orientation which satisfies, for every :

  • If then ;

  • If then .

By abuse of notation, we use the symbol to mean the directed edge emanating from and pointing to (if it exists in ).

In order for a poc graph to represent a weak poc set structure on one needs:

Lemma II.5 (derived poc set).

The transitive closure of the orientation relation on a poc graph over is a weak poc set structure on if and only if has no directed cycles.


Proof. This follows directly from the discussion in example A-A2. ∎
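Lemma II.5 reduces the extraction of a weak poc set from a poc graph to two graph computations: a transitive closure and a test for directed cycles. The sketch below works on an edge list of pairs (u, v), with complements written by appending “*” to a label; it checks only acyclicity and takes the symmetry under complementation for granted from the poc graph axioms. The function names are ours.

```python
def transitive_closure(edges):
    """All pairs (u, v) such that a directed path u -> ... -> v exists."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
    closure = set()
    for source in succ:
        stack, reached = list(succ[source]), set()
        while stack:  # depth-first search from `source`
            v = stack.pop()
            if v not in reached:
                reached.add(v)
                stack.extend(succ[v])
        closure |= {(source, v) for v in reached}
    return closure


def derived_order(edges):
    """Return the closure if it is a strict order, i.e. if the graph is acyclic."""
    closure = transitive_closure(edges)
    if any(u == v for u, v in closure):  # some vertex lies on a directed cycle
        raise ValueError("poc graph has a directed cycle")
    return closure


order = derived_order({("a", "b"), ("b", "c"), ("c*", "b*"), ("b*", "a*")})
print(("a", "c") in order, ("c*", "a*") in order)  # True True
```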

The rest of this section mainly deals with characterizing a large class of snapshots encoding acyclic directed poc graphs and with means of evolving snapshot representations of from trajectories. Given a trajectory of our agent through , the collection of coincidence indicators


may be used to evolve a sequence of snapshots representing, at any time , the cumulative influence of the agent’s observations on its perception of implications in the sensorium.

II-D Probabilistic Snapshots and Acyclicity

The following set-theoretic identities among the coincidence indicators are easily verified for all :


These identities motivate considering snapshots with weights obeying analogous constraints:

Definition II.6 (Probabilistic Snapshot).

We say that a snapshot is probabilistic if is a coherent -selection and the edge weights satisfy the following:

  • Consistency constraint. if then:

  • Normalization constraint. for any :

  • Orientation constraint. if then:


We denote the set of all probabilistic snapshots over by , or simply when there is no danger of confusion.
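As a sanity check on these constraints, the sketch below builds normalized coincidence frequencies from a list of complete observations and verifies two properties that we take to be the content of the normalization and consistency constraints: the four weights of every square sum to 1, and w(a, b) + w(a, b*) does not depend on b (it equals the frequency with which a fires). The exact form of the constraints is not reproduced in this text, so this reading is an assumption; the orientation constraint is not checked, and the helper names are hypothetical.

```python
from itertools import combinations
from math import isclose


def star(x):
    return (x[0], not x[1])


def normalized_weights(names, observations):
    """Fraction of observations in which both members of each proper pair fire."""
    literals = [(n, p) for n in names for p in (True, False)]
    pairs = [frozenset((x, y)) for x, y in combinations(literals, 2) if y != star(x)]
    T = len(observations)
    return {e: sum(e <= obs for obs in observations) / T for e in pairs}


def square_total(w, a, b):
    return sum(w[frozenset((x, y))] for x in (a, star(a)) for y in (b, star(b)))


def marginal(w, a, b):
    """w(a, b) + w(a, b*): should not depend on the choice of b."""
    return w[frozenset((a, b))] + w[frozenset((a, star(b)))]


# Three complete observations over the sensors a, b, c.
obs = [{("a", True), ("b", True), ("c", False)},
       {("a", False), ("b", True), ("c", True)},
       {("a", True), ("b", False), ("c", True)}]
w = normalized_weights(["a", "b", "c"], obs)
A, B, C = ("a", True), ("b", True), ("c", True)
print(isclose(square_total(w, A, B), 1.0))            # True
print(isclose(marginal(w, A, B), marginal(w, A, C)))  # True: both equal 2/3
```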

A fundamental observation regarding probabilistic snapshots is the following:

Proposition II.7 (Acyclicity Lemma).

Suppose is a probabilistic snapshot over and is a poc graph satisfying the orientation cocycle condition:


Then contains no directed cycles.


Proof. See appendix B-A. ∎

This proposition puts the informal rule of thumb from figure 1 for deriving implications from a snapshot on a firm footing:

Proposition II.8 (Poc graphs from snapshots).

Let be a probabilistic snapshot. Construct a poc graph by setting


Then is an acyclic poc graph.


Proof. The symmetries of and immediately imply iff . The strict inequality in (17) implies that the second condition of a poc graph holds as well. Since the orientation cocycle is positive on every edge of by definition, the acyclicity lemma applies. ∎

The element of thresholding present in (17) may also be used as a part of the updating procedure of a probabilistic snapshot, without affecting the derived poc set:

Proposition II.9 (Snapshot Truncation).

Let be a probabilistic snapshot. Define a new snapshot to have the same state as , while for every satisfying (17) the weights are updated as follows:


Then .


Proof. This amounts to a direct verification that the truncated snapshot is probabilistic, and that satisfies (17) in if and only if the same condition is satisfied by in . ∎

Following lemma II.5 we may now safely define:

Definition II.10 (Derived Poc Set).

Let be a probabilistic snapshot. Denote by the weak poc set structure obtained by setting iff there exists a directed path in from to .

We now proceed to introduce and study two possible snapshot constructions.

II-E Empirical Snapshots

The empirical snapshot structure maintains an empirical approximation of the relative frequencies of observations of the form , . For any trajectory of the agent through we could try setting


with a trivial snapshot for all .

Definition II.11.

A snapshot with is said to be trivial and denoted .

Properties (12) then imply that satisfies the consistency and cocycle constraints (defn. II.6) for all , and would satisfy the normalization constraint if we replace the weights by throughout.

II-E1 Construction and Properties

The formal construction is as follows:

Definition II.12 (Empirical Snapshot).

A snapshot over is an empirical snapshot if the following conditions are satisfied:

  • For all , ;

  • For all , the expression


    is independent of the choice of , and vanishes only if ;

  • The following expression does not depend on :


Denote the set of empirical snapshots over by (or just when justified).

The evolution of an empirical snapshot under a sequence of observations is then defined through:

Definition II.13 (Empirical Update).

Let be an empirical snapshot and let be a complete -selection. The snapshot is the empirical snapshot obtained from by setting


for all . The state of is set to (where, recall, this is the coherent projection (104) computed with respect to the weak poc set structure derived from the new weights).
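The weight update of Definition II.13 can be sketched as follows, under the assumption (consistent with the empirical construction above, but not a formula quoted from the text) that it adds the coincidence indicator of each proper pair to that pair’s weight. The state update by coherent projection (104) is omitted, and the helper names are hypothetical. Starting from the trivial snapshot (all weights zero), the weights after T updates are raw co-occurrence counts; dividing them by T gives normalized weights.

```python
from itertools import combinations


def star(x):
    return (x[0], not x[1])


def trivial_weights(names):
    """Weights of the trivial snapshot: zero on every proper pair."""
    literals = [(n, p) for n in names for p in (True, False)]
    return {frozenset((x, y)): 0 for x, y in combinations(literals, 2) if y != star(x)}


def empirical_update(weights, observation):
    """Assumed rule: w(x, y) += 1 exactly when x and y both belong to the observation."""
    return {e: w + (1 if e <= observation else 0) for e, w in weights.items()}


w = trivial_weights(["a", "b"])
for obs in [{("a", True), ("b", True)}, {("a", True), ("b", False)}]:
    w = empirical_update(w, obs)

A, B = ("a", True), ("b", True)
print(w[frozenset((A, B))], w[frozenset((A, star(B)))], w[frozenset((star(A), B))])  # 1 1 0
print(sum(w.values()))  # 2: each complete observation adds 1 to exactly one edge of the square
```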

Definition II.14 (Evolution).

We say that a snapshot over is an evolution of a snapshot , if either or there is a sequence of complete -selections in such that .

Empirical snapshots are characterized by their ancestry:

Lemma II.15.

A non-trivial snapshot over is empirical if and only if it is an evolution of the trivial snapshot.


Proof. See appendix B-B. ∎

Having characterized empirical snapshots as evolutions of the trivial one, we return to the observation that the weight on (see (21)) defines a probabilistic snapshot. We may thus define accordingly, by setting


and conclude that:

Proposition II.16 (empirical implies acyclic).

If is an empirical snapshot, then as defined in (23) is an acyclic poc graph, and as defined in defn. II.10 is a weak poc set structure on .

We will henceforth refer to DBAs endowed with empirical snapshots and utilizing the empirical update as empirical agents.

II-E2 Performance of Empirical Agents

In this paper we restrict attention to agents endowed with a fixed finite set of actions. An agent starting out at time with a trivial snapshot has no knowledge of its environment, and is therefore assumed to engage in random exploration for some time, until actionable information becomes available. This motivates the question of how well the memory structure of an empirical agent performs in this initial stage.

In the case where is finite and the agent’s actions are deterministic, this is easy to formulate. Let be the set of available actions, and consider the graph with vertex set , where a vertex is joined to a vertex by an edge labeled by an action if applying at results in . Thus, becomes endowed with the structure of a Markov chain, where we draw actions uniformly at random in every state. Focus on the case when all the actions available to the agent are reversible, in the sense that there is an edge from to if and only if there is an edge from to (loops are allowed as well). Then the corresponding Markov chain is a random walk, and its stationary distribution over , denoted by , is uniform [60] over each connected component of the resulting transition system. Thus, each normalized weight is nothing but the empirical estimate of the joint probability, given by , for to fire synchronously.
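The uniformity of the stationary distribution can be observed numerically. This minimal sketch assumes a cycle of 12 states and two reversible deterministic actions (one step clockwise, one step counter-clockwise) drawn uniformly at random. The state count, step count and seed are arbitrary choices. The fraction of time spent in each state approaches 1/12, which is what makes normalized counters consistent estimators of the corresponding probabilities.

```python
import random
from collections import Counter

def visit_frequencies(n_states: int, steps: int, seed: int = 0) -> list:
    """Random walk on a cycle of n_states states, driven by two reversible
    deterministic actions (one step clockwise, one step counter-clockwise),
    each drawn uniformly at random; returns the fraction of time per state."""
    rng = random.Random(seed)
    actions = (+1, -1)  # each action is undone by the other one
    x = rng.randrange(n_states)  # random starting position
    visits = Counter()
    for _ in range(steps):
        x = (x + rng.choice(actions)) % n_states
        visits[x] += 1
    return [visits[s] / steps for s in range(n_states)]

freqs = visit_frequencies(n_states=12, steps=200_000)
# Largest deviation from the uniform value 1/12; it shrinks as `steps` grows.
print(max(abs(f - 1 / 12) for f in freqs))
```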

Restricting to a reachability component, we may assume is connected. By abuse of notation, for denote


Let denote the matrix (recall that introduced in Prop. II.8 is a directed graph; the new notation is intended to connote a matrix representation of such a graph) with entries whenever is contained in up to a precision of , that is:


and set otherwise. This matrix represents the true poc set structure to be learned by the agent, as determined by the fixed learning thresholds. Analogously we let be the matrix with iff the directed edge is contained in the derived poc graph of (and otherwise). A good measure of the agent’s performance would be the behavior of the total error


over time (the matrices viewed simply as vectors in of the appropriate dimension). By Theorem 5.1 in [60], the agent’s random walk converges to at an exponential rate depending only on the transition structure of determined by the actions . We conclude:

Proposition II.17.

Suppose a DBA performs a random walk on a connected , at each moment in time performing one of a fixed finite set of reversible deterministic actions. Then converges to zero at an exponential rate.
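The total error can be read as the ℓ1 distance between the two 0/1 matrices, flattened into vectors. This reading is an assumption rather than something stated in the text. It counts the directed edges on which the learned poc graph and the ground truth disagree, which matches the "mean number of incorrect edges" plotted in Fig. 2. A minimal sketch with small hypothetical matrices:

```python
def total_error(true_matrix, learned_matrix) -> int:
    """l1 distance between two 0/1 matrices viewed as flat vectors,
    i.e. the number of directed edges on which they disagree."""
    return sum(
        abs(x - y)
        for true_row, learned_row in zip(true_matrix, learned_matrix)
        for x, y in zip(true_row, learned_row)
    )

# Hypothetical 3x3 example: one true edge missed, one false edge recorded.
truth = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
learned = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(total_error(truth, learned))  # 2
```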

Given such strong performance guarantees for a broad class of empirical agents, it remains to examine how performance varies with the geometry/topology of the environment (beyond the guarantees given by the preceding discussion). To this end, we have run simulations in the following settings:

  • (a) The agent performs a random walk along a path with edges, choosing between one step forward and one step back uniformly at random for every , and learning the poc structure of a sensorium consisting of ‘GPS’ sensors, as described in example A-D1;

  • (b) The agent performs a random walk along a cycle with edges, choosing between a clockwise and a counter-clockwise step uniformly at random for every , and learning a sensorium consisting of beacon sensors as described in example A-D3;

  • (c) The agent performs a random walk (up/down/left/right) on a square grid, with ‘GPS’ sensors along each of the two coordinate axes;

  • (d) The agent performs a random walk (forward/back) along a path with edges, but the sensors are chosen to have random activation fields (randomly chosen subsets of the set of vertices along the path); the sensor fields are drawn anew prior to each separate run.

The number of sensors is the same for each setting, and each agent carries out runs of a length that is cubic in the number of sensors, starting at a random position with an empty snapshot. We have tested different agents for each setting, corresponding to different values of the learning threshold, spread linearly in the interval from to the maximal meaningful value of (where one should not expect much useful learning to occur).

The results are summarized in figure 2 plotting , where we have replaced the matrix as defined in (25) by the -valued matrix


to render the effect of choosing different values for the learning threshold more visible in the graph of .

Fig. 2: Logarithmic plot of the mean number of incorrect edges in the derived poc graph of an empirical snapshot (20 sensors), for learning thresholds varying linearly between (cyan/light) and (blue/dark), averaged over 50 runs of random walks each.

The resulting plots show significant, though subtle, differences in performance between the four settings, illustrating the similarities and differences in the weak poc set structures being approximated, most notably:

  • The sharper initial decline in the mean deviation for (b) and (c) in comparison to (a) is expected, due to the relative abundance of crossings in the former, as opposed to complete nesting (see definition A.22) in the latter.

  • Performance in the random setting (d) seems to lag significantly behind performance in any of the structured settings.

  • Performance in the completely nested setting (a) seems to provide exponentially fast learning regardless of the learning threshold. By contrast, the other settings seem to experience a transition between two modes, depending on how small the learning threshold is:

    1. For large , the deviation plateaus.

    2. For small enough , the deviation decreases to zero in finite time.

    We expect the critical value of in any finite setting to be somewhere around the minimum probability of a state (under the stationary distribution of the random walk): in order for a relation to be put on record, it is necessary for the agent to have visited at a frequency below ; the smaller is, the fewer false relations will be recorded for posterity.

  • Recall that the poc set representing the ground truth in (c) is the direct sum (see A-D2) of two smaller copies of (a), having 10 sensors each. The crossing relations between the sensors along one axis and their counterparts along the other therefore account for 800 of the 1600 entries (two 20×20 null sub-matrices) in the adjacency matrix of the derived graph. Thus, the two experiments are not that different. Loosely speaking, the 10×10 square grid experiment projects onto a Cartesian product of two 10-path experiments, where the random walk on each 10-path becomes a lazy walk (the agent stays put along one axis whenever it moves along the other). In other words, the behavior of (c) may be inferred from the behavior of (a).

  • Not so when comparing (a) and (c) with (b). Note how the sub-critical values of the learning parameters in (a), (c) and (d) force the deviation plot to ‘plunge’ to zero, in contrast with the horizontal asymptote behavior of (b). In view of theorem A.33, our guess is that the environment (the circle) not being contractible has something to do with this qualitative change in behavior, but this requires further investigation.
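The entry count given above for setting (c) can be checked with a little arithmetic. This minimal sketch assumes that the poc set consists of the 20 sensors together with their 20 complements, so that the adjacency matrix of the derived graph is 40×40:

```python
sensors_per_axis = 10
elements_per_axis = 2 * sensors_per_axis     # each sensor plus its complement: 20
total_elements = 2 * elements_per_axis       # elements of both axes: 40
total_entries = total_elements ** 2          # 40 x 40 = 1600
# Pairs (x-element, y-element) and (y-element, x-element) form two 20 x 20
# blocks; the direct sum records no implication there, so both are null.
cross_entries = 2 * elements_per_axis ** 2   # 800
print(total_entries, cross_entries, cross_entries / total_entries)  # 1600 800 0.5
```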

II-F Discounted Snapshots

A notable weakness of empirical snapshots as a data structure is their potentially high cost in space, due to the need to maintain integer-valued counters indefinitely. In some sense, the entire history of the agent matters, and, in some sense, matters too much. This motivates the search for an alternative, more quantized, updating mechanism whose dependence on any given past observation weakens at a fixed rate.

II-F1 Construction and Properties

Definition II.18 (Discounted Update).

Let and let be a probabilistic snapshot over . For any complete -selection on we define the snapshot to be the snapshot with weights determined by


The state of is set to , the reduction being computed with respect to the weak poc set structure derived from the new weights. Finally, we define the -discounted update of to be the snapshot , and we refer to as the decay parameter.
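A minimal sketch of a discounted weight update. The exact formula of Definition II.18 is not reproduced here. Instead, we assume hypothetically that every pair weight is multiplied by (1 - rate) and that the coincidence indicator of the new observation is added with weight `rate`. This keeps each weight a convex combination of coincidence indicators, which is the property used later in this section. The parameter name `rate` and the `'a*'` naming of complements are illustrative only.

```python
from itertools import combinations_with_replacement

def discounted_update(weights: dict, selection, rate: float) -> dict:
    """Hypothetical discounted update: shrink every pair weight by (1 - rate),
    then add `rate` to every pair of sensations that are on together, so the
    weights stay convex combinations of coincidence indicators."""
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must lie strictly between 0 and 1")
    new = {pair: (1.0 - rate) * w for pair, w in weights.items()}
    for pair in combinations_with_replacement(sorted(selection), 2):
        new[pair] = new.get(pair, 0.0) + rate
    return new

# Start from the coincidence indicator of the observation {a, b}
# (keys are sorted pairs; 'a*' denotes the complement of 'a').
w = {("a", "a"): 1.0, ("a", "b"): 1.0, ("b", "b"): 1.0}
w = discounted_update(w, {"a", "b*"}, rate=0.1)
square = [w.get(p, 0.0) for p in [("a", "b"), ("a", "b*"), ("a*", "b"), ("a*", "b*")]]
print(square)  # [0.9, 0.1, 0.0, 0.0]
```

The four weights in the square of `a` and `b` still sum to 1, so in this toy case the update keeps the snapshot probabilistic.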

A significant advantage of the discounted update is its applicability to arbitrary probabilistic snapshots:

Lemma II.19.

The -discounted update of a probabilistic snapshot by a complete -selection is probabilistic.


It is clear that the discounted update preserves probabilisticity. Proposition II.9 finishes the proof. ∎

Consider the length of time (or the amount of evidence) it takes a discounted snapshot to acquire an implication, compared to the amount of evidence required for giving up an implication already on record.

Assuming a fixed value of the decay parameter over a considerable length of time, a lower bound on the amount of time required for to become small enough for a relation to be put on record is given by the situation when a long enough sequence of consecutive observations with not occurring is made:


On the other hand, once the relation has been put on record, the number of successive observations of required for replacing this relation with must satisfy:


This much is guaranteed by the truncation mechanism. Overall, it seems that choosing a value of with sufficiently small should produce meaningful learning: lower values of make it both harder to learn and easier to unlearn a false relation, while maintaining a qualitative difference between the necessary requirements for either process.
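The lower bound discussed above can be made concrete under a hypothetical model in which a weight is multiplied by (1 - rate) at every step in which its coincidence is not observed. This sketch counts the consecutive observations needed for a weight starting at `w0` to drop below a threshold. Here, `rate` and `threshold` are illustrative stand-ins for the decay parameter and the learning threshold, not the paper's exact quantities.

```python
import math

def steps_to_fall_below(w0: float, threshold: float, rate: float) -> int:
    """Consecutive observations without a coincidence needed for its weight,
    multiplied by (1 - rate) at each step, to drop strictly below threshold."""
    if not (threshold > 0 and 0 < rate < 1):
        raise ValueError("need threshold > 0 and 0 < rate < 1")
    n, w = 0, w0
    while w >= threshold:
        w *= 1.0 - rate
        n += 1
    return n

def closed_form_estimate(w0: float, threshold: float, rate: float) -> float:
    """Real-valued solution n of w0 * (1 - rate) ** n = threshold."""
    return math.log(threshold / w0) / math.log(1.0 - rate)

print(steps_to_fall_below(0.5, 0.01, 0.05))             # 77
print(round(closed_form_estimate(0.5, 0.01, 0.05), 2))  # 76.27
```

Since log(1 - rate) is close to -rate for small rates, the waiting time grows roughly in inverse proportion to the rate.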

Keeping fixed over long periods of time places an emphasis on the values of the learning thresholds . These values do not have to be chosen uniformly over the snapshot, so one might want to vary the learning thresholds individually to alter the flexibility of the learning process in the corresponding square. This opens the door to varying the learning thresholds and the decay parameter in ways analogous to [26] and [22], as a means of improving the quality/dependability of the model space. The simulation results below emphasize the need for this kind of control. They show that a discounted agent is much more susceptible than an empirical one to changes in the geometry and topology/combinatorics of the sensor fields.

II-F2 Performance Analysis

Figure 3 compares the mean performance of time-discounted snapshot learning from a random walk in the four settings described earlier in II-E2, for the values of the decay parameter given by , .

One immediately notices, in comparison with the empirical case, that the dependence of the learning process on the decay parameter is not monotone. A choice of seems to work best for all settings in terms of optimizing the eventual deviation (though it is hard to say what ‘best’ would even mean for (d)). A choice of , however, is more reasonable given the observed waiting time until meaningful learning occurs in the structured environments (a)-(c).

Fig. 3: Mean number of incorrect edges in the derived poc graph of a discounted snapshot in 4 environments (20 sensors each) for varying values of the decay parameter, , from (red/dark) to (yellow/light), averaged over 50 runs of a random walk.

Similar observations to those made for the empirical case (figure 2) regarding the interplay between ‘learning modes’ and geometry/topology can be made here as well, but are more subtle, as the comparison in figure 4 shows.

Fig. 4: A comparison of the mean number of incorrect edges in the derived poc graph as a function of time, for an empirical snapshot (blue) and a discounted one (red). Here and .

II-G Further Adjustments to the Weak Poc Structure

The implication record constructed from a probabilistic snapshot in the preceding section does not recognize possible equivalences among sensations: if, for whatever reason, a relation of the form


holds in a snapshot , it becomes reasonable to interpret it as the logical equivalence . Yet will not register any relations in the square , which prevents an agent equipped with from utilizing the currently observed equivalence.

Thus, an adjustment of is required if we are to allow our agents the advantage of reasoning about equivalences. The following extension of turns out to serve our purposes for a restricted class of probabilistic snapshots:

Definition II.20.

Let be a probabilistic snapshot. The poc graph is defined to be the poc graph obtained from by adding the directed edges for each satisfying (31).

It turns out that gives rise to an adequate weak poc set structure and model space, provided satisfies the additional requirement:

Definition II.21.

A snapshot is said to satisfy the triangle inequality if


holds whenever .

A class of examples of special significance in this work is that of snapshots whose edge weights are derived from a measure on a space by pulling back along a realization as follows:


The triangle inequality for is then an immediate consequence of the well-known (e.g. [61], chapter 3) triangle inequality for measures:


where are arbitrary measurable sets and denotes the symmetric difference .
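The measure-theoretic triangle inequality follows from the inclusion A △ C ⊆ (A △ B) ∪ (B △ C): if a point belongs to exactly one of A and C, then B disagrees with A or with C about that point. This sketch checks both the inclusion and the inequality numerically, for random finite measures (point masses on eight points) and random subsets. The sizes, trial count and seed are arbitrary.

```python
import random

def measure(masses: dict, subset: set) -> float:
    """Finite measure given by point masses: mu(S) is the total mass of S."""
    return sum(masses[p] for p in subset)

def triangle_inequality_holds(trials: int = 1000, seed: int = 1) -> bool:
    """Check mu(A ^ C) <= mu(A ^ B) + mu(B ^ C) on random examples,
    where ^ is Python's symmetric difference of sets."""
    rng = random.Random(seed)
    points = range(8)
    for _ in range(trials):
        masses = {p: rng.random() for p in points}
        A, B, C = [{p for p in points if rng.random() < 0.5} for _ in range(3)]
        # The inclusion behind the inequality: A ^ C is covered by (A ^ B) | (B ^ C).
        assert (A ^ C) <= (A ^ B) | (B ^ C)
        lhs = measure(masses, A ^ C)
        rhs = measure(masses, A ^ B) + measure(masses, B ^ C)
        if lhs > rhs + 1e-12:  # small tolerance for floating-point sums
            return False
    return True

print(triangle_inequality_holds())  # True
```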

The coincidence indicators of (11) are a special case of this example (where is an atomic measure), and so are empirical snapshots (as their weights are sums of coincidence indicators). Discounted snapshots fall into this class, too, as their weights are convex combinations of coincidence indicators.

Due to the technical nature of the interactions between and the extension , we postpone the formal discussion of these interactions to appendix B-C. The bottom line, however, is that for any probabilistic snapshot structure satisfying the triangle inequality, our agent may safely apply the control protocols of the next section to the extended poc graph derived from its current snapshot. The agent can thus arrive at action choices while taking advantage of the perceived equivalences within the sensorium.

Although technically we are obliged to distinguish between and , as well as between the weak poc set structures they correspond to, we will treat these objects as identical for the sake of simplifying the rest of the exposition.

III Control with Snapshots

This section introduces the basic control function of a snapshot. We begin by introducing a formalism designed to treat discrete actions as a sub-structure of the binary sensorium, and discuss the effect of this formalism on shaping the model space (III-A). We next turn to a discussion of the snapshot as a highly efficient computational mechanism for coherent state-updating and for decision making based on the geometry of the model space (III-B).

At the technical level, this section requires an understanding of the convexity theory of cubings: The classical results are covered in appendix A-D, while our new technical results underlying the use of snapshots for greedy navigation in cubings are covered in appendix B-E.

Building on these results, section III-B2 introduces the mechanism of signal propagation over a snapshot. This mechanism realizes the computation both of the coherent projection and of closest-point projections to prescribed convex subsets of the associated model space. It suggests viewing snapshot architectures as highly simplified connectionist architectures, and some related work in the literature is discussed.

At the heart of our proposed decision-making algorithm is an assumption that the sensorium is rich enough to detect direct causal relations between actions and other sensations. We provide a fairly broad formalization of this assumption in III-B3 (with an example in III-B4). We then prove that an agent can correctly ‘hallucinate’ the immediate consequences of taking an action, given sufficient exposure.

An algorithm using this tool to attempt greedy navigation over to a specified target state is proposed in III-B5, and some of its failure modes are discussed as a motivation for future research on judicious dynamical expansions of the sensorium which would allow the agent to overcome the navigational obstructions formed by states in having no witness in the situation space .

Finally, in III-C3, we explore the performance of some excitation-driven DBAs: agents endowed with an excitation level that changes depending on their distance from a target in the environment . The agents are capable of sensing an increase or a decrease in excitation, and seek instant gratification, in the sense that they always pick an action guaranteeing an increase in excitation (or else act randomly). We compare the performance of such agents in the domains considered in II-E2 and II-F2. In these domains it is easy to guarantee arrival at the target given a priori knowledge of the correct snapshot structure, but we are interested in the agents’ performance as they learn the problem “from scratch”.

III-A Defining Actions

We will now restrict attention to DBAs with a sensorium endowed only with state (degree ) and transition (degree ) sensors. As before, we denote the realization of a sensor by , where for a state sensor and for a transition sensor. Thus, state sensors and transition sensors may be viewed as Boolean and situational fluents over the situation space , which is sufficient for setting up a discussion of actions and competencies according to McCarthy and Hayes [47].

For our agents, we posit a set of transition sensors, each of which may be switched on and off at will, earning them the name of actions. To be precise, our requirements are:

  • Actions are binary. We assume , and we denote the poc subset by .

  • Every action has outcomes. For any and , the sets


    are non-empty subsets of .

In this we depart slightly from the accepted notion of actions in the literature on transition systems of various flavors (e.g. [16],[62]), where actions are attached to states and the collection of actions available at each state may differ, depending on that state. Instead, we consider actions as nothing more than control signals, sent by the agent’s ‘mind’ to the agent’s ‘body’ in order to invoke (or not) one or more of a fixed set of available behaviors. It is the purpose of the ‘mind’ to identify whether or not a control signal produces meaningful outcomes as those outcomes are being sensed.

III-A1 Invoking Actions Synchronously

Our sensor-centric approach to actions reflects the viewpoint that (1) an action taken at a state imposes a time-independent restriction on the set of states the system may enter in the following moment, and (2) the agent is capable  —  at least in principle  —  of observing its own decisions as they are being invoked. We must now discuss the precise extent to which these principles may or may not restrict our initial suggestion that the sensations in are controllable.

For example, consider the situation where the agent is not engaging in an action during a transition from state to state . This implies is on during this transition, which restricts the possible values of to . Hence, not invoking any of the available must then restrict to the intersection , the set of possible outcomes of the “no-action”.

More generally, allowing a number of actions to be taken at the same time (while not engaging in the rest) forces the following interpretation by our sensing model:

  1. A generalized action by the agent is a complete -selection on (recall definition A.6);

  2. The realization of a (generalized) action is defined to be


    or, equivalently, for every , the set of possible outcomes of an action equals


For this extended collection of actions one notices that the second requirement of an action  —  for all   —  does not necessarily hold: for example, moving forward along a rail contradicts any motion in the opposite direction. We will say that a generalized action is admissible at if , and that is admissible, if it is admissible at for all .
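A minimal sketch of generalized actions and their admissibility. It uses hypothetical dynamics for an agent on a path with vertices 0..4 and two actions, `left` and `right`, each of which moves the agent one step (staying put at the ends). The outcome set of an action that is off is taken to be everything the agent can do without invoking it. The outcome set of a generalized action is the intersection of the outcome sets of its parts, and the action is admissible when that set is non-empty at every state. Invoking both actions at once yields an empty outcome set, matching the rail example above, while invoking neither leaves the agent in place.

```python
from itertools import product

N = 4  # hypothetical path with vertices 0..N
ACTIONS = ("left", "right")

def move(x: int, action) -> int:
    """Hypothetical dynamics: one step along the path, staying put at the ends;
    action None means that no action is invoked."""
    if action == "right":
        return min(x + 1, N)
    if action == "left":
        return max(x - 1, 0)
    return x

def outcomes(x: int, action: str, on: bool) -> set:
    """Possible next states at x when `action` is switched on, or when it is off
    (doing nothing, or invoking some other action instead)."""
    if on:
        return {move(x, action)}
    alternatives = [None] + [b for b in ACTIONS if b != action]
    return {move(x, b) for b in alternatives}

def generalized_outcomes(x: int, selection: dict) -> set:
    """Outcomes of a generalized action: intersect the outcome sets of its parts."""
    result = set(range(N + 1))
    for action, on in selection.items():
        result &= outcomes(x, action, on)
    return result

def admissible(selection: dict) -> bool:
    """Admissible: non-empty outcome set at every state of the path."""
    return all(generalized_outcomes(x, selection) for x in range(N + 1))

for bits in product((False, True), repeat=len(ACTIONS)):
    selection = dict(zip(ACTIONS, bits))
    print(selection, admissible(selection))
# Only {'left': True, 'right': True} is inadmissible.
```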

Aside from setting natural bounds on the meaning of the initial statement that actions are available to the agent at will, the notion of admissibility of a generalized action explains how to interpret the poc set structure induced on from the realization : if


happens to hold for , then every generalized action admissible at a point defines a vertex of no matter the choice of . Similarly, generalized actions not showing up as vertices , where denotes the restriction of the poc set structure to , represent the agent’s belief at time regarding combinations of elementary actions it cannot achieve at that moment.

In the simple examples considered in this paper, all agents will be endowed with a collection of mutually exclusive atomic actions. By this we mean that holds for all (). Equivalently, only the “no-action”, , and the “pure” actions are admissible, and the resulting cubing takes the form of a starfish: a tree with only one vertex of degree given by the “no-action” and with a set of leaves in one-to-one correspondence with the set of “pure” actions (see example A-D1 and figure 11b).
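For mutually exclusive atomic actions, the admissible generalized actions and the edges between them can be enumerated directly. This minimal sketch represents a generalized action by the set of actions it switches on, and joins two vertices when they differ in exactly one action (the usual edge relation of a cube). The action names are hypothetical.

```python
from itertools import combinations

def starfish(actions):
    """Vertices: the 'no-action' (empty set) and the 'pure' actions {a}.
    Edges: pairs of vertices differing in exactly one action."""
    vertices = [frozenset()] + [frozenset({a}) for a in actions]
    edges = [(u, v) for u, v in combinations(vertices, 2) if len(u ^ v) == 1]
    return vertices, edges

vertices, edges = starfish(["a1", "a2", "a3"])
degree = {v: sum(v in edge for edge in edges) for v in vertices}
print(degree[frozenset()], len(edges))  # 3 3
```

Two distinct pure actions differ in two coordinates, so they are never adjacent. Every edge therefore passes through the no-action, which gives the star shape.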

III-A2 Observations

The following set is the set of observations in (note that it is closed under the -operator):


and stands for the set of “passive” sensations, as opposed to actions. Sections II-C to II-G explain how a trajectory , may be used to form an evolving sequence of weak poc-set structures over , with each representing the agent’s belief at time regarding which implications among the sensors in hold true throughout time. Two poc subsets of are formed by restricting its poc structure:

  • is the induced poc structure on ;

  • is the induced poc structure on .

We are interested in the interaction between these smaller poc sets and the full model space, . One has two surjections


defined, at the level of -skeleta, as follows: sends a coherent -selection on to the -selection , and similarly for . Hence, at the level of -skeleta, there is a map:


In fact, Sageev-Roller duality [31] implies a much more precise statement:

Proposition III.1.

The map above (41) is a median-preserving embedding of in the Cartesian product .


See proof in appendix B-D. ∎
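Part of the intuition behind Proposition III.1 can be checked in the ambient cube. The median of three vertices of {0,1}^n is their coordinatewise majority vote, and restricting to a subset of coordinates commutes with majority voting. This sketch verifies exhaustively, for a 4-dimensional cube split into two hypothetical "action" coordinates and two "passive" coordinates, that the map sending a vertex to its pair of restrictions preserves medians. It illustrates, but does not replace, the proof in appendix B-D. In particular, it ignores coherence constraints, which cut the model space down to a subcomplex of the cube.

```python
from itertools import product

def median(u, v, w):
    """Median of three cube vertices: coordinatewise majority vote."""
    return tuple(int(a + b + c >= 2) for a, b, c in zip(u, v, w))

def restrict(x, coords):
    """Restriction of a 0/1 selection to the given coordinates."""
    return tuple(x[i] for i in coords)

def preserves_medians(n=4, action_coords=(0, 1), passive_coords=(2, 3)) -> bool:
    cube = list(product((0, 1), repeat=n))
    for u, v, w in product(cube, repeat=3):
        m = median(u, v, w)
        image_of_median = (restrict(m, action_coords), restrict(m, passive_coords))
        median_of_images = (
            median(*(restrict(x, action_coords) for x in (u, v, w))),
            median(*(restrict(x, passive_coords) for x in (u, v, w))),
        )
        if image_of_median != median_of_images:
            return False
    return True

print(preserves_medians())  # True
```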

III-A3 Example: discrete path with motion

To illustrate the description of the model space provided by proposition III.1, consider an agent moving in steps of unit length along a path of integer length . Formally, the environment is given by and the agent has the actions defined by:


enabling motion from any vertex to the adjacent and , when they exist. We also endow the agent with sensors realized as:


Up to symmetry, the only relations holding in the existing scheme are


the ‘starfish’ relations for :


for the actions , and the two relations


indicating may not be reached by applying , while may not be reached by applying . No other relations hold universally. Let denote the poc set structure over recording these relations.

Fig. 5: Model space for a DBA placed in a discrete path, depicted together with its projections to (right) and to (below). This is the case of example III-A3.

is the result of forming the Cartesian product of a -pronged starfish with the path of length obtained as (compare with example A-D1 and figure 11), and then removing two squares as shown in figure 5, due to the relation in (46).

III-B Reactive Planning

III-B1 Statement of the planning problem

In this section we consider a DBA at time , equipped with a snapshot with a derived poc graph and associated weak poc set (but keep in mind the notational simplifications at the end of II-G). The agent’s tasks at hand are:


Predict the immediate outcome of any