Modular programming of computing media using spatial types, for artificial physics

04/11/2019, by Frederic Gruau et al.

Our long-term goal is to execute general-purpose computation on homogeneous computing media consisting of millions of small identical Processing Elements (PEs) communicating locally. We proceed by simulating the Self-Development of a Network (SDN) of membranes, and this implies a medium able to implement artificial physics laws that simulate simplified membrane-agents, dividing and homogenizing. This is a difficult challenge: our current version of the SDN-medium uses PEs with 77 bits of state and 13878 gates. This high level of complexity forced us to work out an efficient and expressive scheme for programming the medium; the goal of this paper is to present it. The PEs' communication graph has to be a maximal planar graph. Fields of bits are spread in 2D over three locus: the vertices, edges and faces of this planar graph. They constitute three data types, which abstract away the ensemble of PEs. The simplicial proximity between bits is used to define operations on fields, thus implementing "spatial types". Expressions combining operations can be translated into logical circuits. Efficiency is achieved because fields of different locus are combined using reduction operations, which allows computation to be factorized by exploiting the symmetries always present when simulating physics. Expressiveness is achieved by allowing modular, procedural programming: instead of directly focusing on a specific target update function, we develop a library of reusable functions mapping fields to other fields. We consider two kinds of maximal planar graphs: with an isotropic distribution of PEs, or with the hexagonal lattice structure. The first compares to an amorphous computing medium and has a better potential for hardware scalability; the second compares with a cellular automata computing medium and is more efficient.


1 Introduction

1.1 Physics on Computing Media for general-purpose Computing.

Computing Media.

Future computing platforms, whether very-large-scale integration (VLSI), nano, or bio, will probably consist of a vast number of Processing Elements (PEs) homogeneously embedded in 2D or 3D space, where the magnitudes involved force the programmer to incorporate the locality constraint: each PE has a specific location in space, communication is local in space, and communication time must be proportional to Euclidean distance, as in the VLSI complexity model [12]. This invariant enables unbounded scalability of hardware, and characterizes a family of computer architectures referred to as "computing media" [9]. This family includes regular classic models, such as Cellular Automata (CA) or systolic arrays, but also irregular models where the constraints of lattice tiling of space and synchronism in time are relaxed, as exemplified in the amorphous medium [1], which is a homogeneous and isotropic scattering of PEs in 2D or 3D space with nearest-neighbor communication.

Physics on Computing Media.

Because they are tightly bound to space, simulating physics is what computing media are naturally good at. It is a major application of CAs. Physical laws expressed as differential equations can be translated into simple local rules [3]. Physics can also be done without lattice discretization of space: Rauch [16] modeled wave propagation on an amorphous medium.

Towards General Purpose computing media.

Simulating physics is only a tiny fraction of the spectrum of computation as we know it. While computing media offer potentially unbounded hardware scalability, they cover a very narrow scope of applications. Our long-term project is to broaden this scope up to general-purpose computation. To achieve this level of generality, one more level of indirection is necessary: we implement a virtual machine on top of the computing medium, called a Self-Developing Network [5, 6].

Self-Developing Network (SDN).

The SDN virtual machine allows programming "space-agnostic" real algorithms. In [9] we showed how to program and execute matrix multiplication and sorting. Simulating this machine means implementing on a medium physical objects which are biological artifacts: simplified membranes modeled as connected blobs. Membranes allow structuring space into independent regions where distinct computations can take place. This is "artificial physics": the goal is not to model reality but to design an SDN-medium emulating the SDN virtual machine. In other words, we use physics to go beyond physics. For example, we set repulsive forces between membranes to homogenize their distribution (load balancing), and strangling forces to divide membranes (self-development). Execution resembles a much simplified biological developmental process. What is developed is not a multicellular organism, but a clean and deterministic virtual network of virtual processing elements delimited by membrane-blobs and the adherences between them. The connectivity is determined by the instructions.

The current status of the SDN-medium.

Our current version can interpret a flow of host-instructions dictating the self-development of a virtual 2D-grid network of membranes, in a time proportional to the diameter of the circuit. It is shown in short YouTube videos [8]. This demonstrates that efficient general-purpose execution on computing media is not a utopia.

Cellular Circuit, amorphous computers and Cellular Automata.

The SDN-medium is quite complex. Its programming and simulation were made possible thanks to a new scheme which allows a modular specification and an efficient execution. It is based on spatial types, which embed data and operations in 2D space. The first goal of this paper is to explain spatial types, and why they make it possible to tackle complexity thanks to improved efficiency and modularity. A program using spatial types is translated into a circuit of logic gates embedded in 2D space. Just like in CAs, the same computation goes on throughout space, justifying the denomination "cellular circuits". However, as in amorphous computers, the circuit's global structure is not constrained to be a lattice. If a lattice is used, though, simulation is much more efficient, and cellular circuits become CAs. We now contrast cellular circuits with amorphous computing and with CAs.

Figure 1: Two target architectures for the underlying maximal planar graph linking PEs: (a) homogeneous isotropic, (b) hexagonal lattice.

1.2 Isotropic cellular circuit and amorphous computing

A basic amorphous medium is rough: PEs do not know their spatial location; they receive a radio signal sent by nearby neighbors. Research has focused on computing low-level information such as a simple pair of approximate 2D coordinates [15]. (Coore [4] proposed a more generic technique with the potential of installing arbitrary patterns of blobs with a target topological arrangement; however, only small patterns were demonstrated.) In contrast, cellular circuits really enable the programming of complex behavior [8]. However, this is achieved at the cost of considering a "cleaner" medium endowed with a specific property: the PEs have to communicate between themselves following a network which must be a maximal planar graph (all the faces are triangles). It can be seen as a preliminary layer which has to be installed on a basic amorphous medium, enhancing its programmability. We know of two possible solutions: 1- if the PEs know their 2D coordinates, a Delaunay triangulation directly builds a maximal planar graph; 2- otherwise a "combinatorial Delaunay graph" can be built using only the hop counts between PEs. In [19], only a subset of the PEs in a sensor network are linked; the subset is computed iteratively so as to create a Centroidal Voronoi Tessellation, which means, in short, a homogeneous distribution as close as possible to the hexagonal lattice.

Synchronous versus asynchronous

In contrast to amorphous media, we simulate a synchronous framework of cycles, during which each PE receives all the messages of its neighbors and then updates its state. However, with a small uniform probability we choose not to update the state. This is a significant step towards the asynchronous framework. It generates random fluctuations in the SDN-medium, and is in fact beneficial: it is a natural way to solve conflicts.

Homogeneous and isotropic distribution.

For amorphous media, a simple and often used PE distribution in 2D is "Poisson-disk" sampling: the location of each PE is chosen with a uniform probability, but discarded if there are already other PEs nearby (within a disk of a given radius). We used the Farthest-Point Optimization (FPO) algorithm [17], which produces a more homogeneous and isotropic distribution. The resulting planar graph (produced with a Delaunay triangulation) is shown in fig. 1 (a). The improved quality causes the hop-count distance to become a good approximation of the geometric distance, and this, in turn, enables computing spatial features accurately. In the example of this paper, we will see that the hop-count discrete Voronoï Diagram (VD) approximates the real VD.

1.3 Contribution of hexagonal cellular circuits to CA.

The hexagonal lattice shown in fig. 1 (b) is a non-isotropic (it has 6 obvious preferred directions) but very regular maximal planar graph. This paper is an extended version of [10], which considered only the hexagonal network option. With this option, a cellular circuit can be translated into a CA, so spatial types can be understood as a different scheme for specifying CAs. What are the advantages?

Specifying rotation-invariant rule in a more expressive way.

The next state of a CA cell is programmed as a function of its neighborhood. In contrast, cellular circuits are inspired by *Lisp [11] and consider fields. Boolean fields are defined on sets of points in 2D space called "locus". Fields are combined using reductions, which apply a commutative associative operation to the values found in the immediate spatial neighborhood of each point. One great virtue of reduction is that the order in which the neighbors are processed does not matter. In particular, neighbors need not be distinguished, and this ensures rotation invariance. Totalistic CAs [18], which sum the immediate neighbors' states, are a CA illustration of this principle. With cellular circuits, we use any type of reduction, not just the sum. More fundamentally, by carefully constructing 9 locus, we are able to define 12 different types of neighborhoods on which to reduce. This increases expressiveness to the point that being forced to compute using reductions is not experienced as a constraint; on the contrary, it feels just natural. When doing physics, we always compute rotation-invariant quantities anyway, so embedding rotation invariance in the operations themselves incorporates useful domain-specific information which eases the task.

Generating and simulating rules with high radius.

A hexagonal cellular circuit can be translated into a classic hexagonal CA; however, this is mainly a theoretical statement. Usually, a CA next-state update rule consults only the immediate neighborhood, which is radius 1. Considering bigger neighborhoods with higher radius is not natural, the idea being to keep the rule simple. With spatial types, we do specify update rules which process large neighborhoods, while at the same time remaining very simple. For example, a simple sequential composition of $r$ reductions produces a field of radius $r$. This is because each time a reduction is applied, it creates a communication, thus incrementing the radius of the resulting field. As a result, fields of radius $r$ can be computed in time $O(r)$, which becomes $O(r^2)$ if translated into a CA. In practice, when using spatial types, fields having high radius are the norm, not the exception. In other words, spatial types allow exploring and simulating a different portion of the landscape of CAs. For the SDN-medium, we compute fields of radius up to 25, which renders the interactive simulation of a CA translation unfeasible.

Modularity.

This is the most important property. By modularity we mean encapsulating code into functions that can be reused, several times within the same CA, or from one CA to another distinct CA; we illustrate both cases in this paper. Modularity is obtained because we compute fields: we can compute them as the result of a function call taking other fields as parameters. Functions of generic interest naturally pop up, and by composing them one quickly obtains very complex behavior.

2 A 2D spatial type based on maximal planar graph.

Informally, a "2D spatial type" is a set of data embedded in 2D space, where proximity is used to define operations. Instead of using a lattice to define location and proximity, as is done in CAs, we will need only a planar graph, and use its faces and edges to locate data in 2D.

The simplicial graph.

A connected graph which can be drawn without any edges crossing is called planar. When a planar graph is drawn in this way, it divides the plane into regions called faces. A planar graph $G$ can be represented very naturally by another graph $G'$ [7], called the "simplicial graph": the set of vertices of $G'$ coincides with the set of all simplexes of $G$, which consist of three classes: vertices of dimension 0, edges of dimension 1, and faces of dimension 2. Two simplexes are connected in $G'$ if and only if one is included in the other in $G$. In other words, a vertex (resp. an edge) is adjacent to the edges and faces (resp. to the two faces) including it.

Maximal planar graph.

It is a planar graph where no edge can be added without breaking planarity, which implies that all the faces are triangles. We will consider exclusively maximal planar graphs, because this property is needed for defining some of the operations. Let the vertex count be $n_V$, the edge count $n_E$, and the face count $n_F$. Maximal planar graphs verify $2 n_E = 3 n_F$, as can be derived by summing, over every face, the number of edges in each face, which is 3, each edge being shared by two faces. We also have $n_V - n_E + n_F = 2$ (Euler's formula), hence $n_E = 3 n_V - 6$ and $n_F = 2 n_V - 4$. The arity is a number associated with each class of simplex; it represents the relative proportion of simplexes in each class: it is 1 for vertices, 2 for faces, and 3 for edges.

The V,E,F simplicial locus.

From an embedding in 2D of a maximal planar graph where edges are drawn with straight lines, the simplicial graph can also be embedded in 2D as another planar graph, by locating its vertices: the vertex-vertices map to the vertices of the original graph, the edge-vertices to each edge's middle, and the face-vertices to each face's barycenter. Bits of data will be conceptually associated with those 2D points, hence we call them "data-points". Those three sets of data-points are called respectively the V, E, F locus. We refer to them as the "simplicial locus", so as to distinguish them from other locus introduced later.

Figure 2: The VEF tiling associated to (a) the hexagonal lattice, (b) the homogeneous isotropic planar graph.

2.1 Computing blob features by reducing simplicial fields.

Simplicial fields.

Spatial types are boolean fields, more precisely functions from one locus to $\{0,1\}$. The type is called boolV (resp. boolE, boolF) for Vertices (resp. Edges, Faces). We also use integer fields with a small number of bits, usually 2 or 3; for two bits, the types are noted int2V, int2E, int2F. The bit density of a field is this number of bits multiplied by the arity of the simplex: an int2E costs 2*3 = 6 bits. The total memory needed for a field is its bit density multiplied by the number of vertices, minus a small constant.
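To make the bit-density bookkeeping concrete, here is a minimal sketch in Python (the names ARITY and bit_density are ours, not part of the spatial-type language), computing the memory cost of the field types just introduced from the arities stated above.

# Minimal sketch of the bit-density bookkeeping described above.
ARITY = {"V": 1, "E": 3, "F": 2}      # relative proportion of simplexes per locus

def bit_density(locus, nbits=1):
    """Bits stored per vertex of the medium for one field on this locus."""
    return nbits * ARITY[locus]

assert bit_density("V") == 1          # boolV
assert bit_density("E") == 3          # boolE
assert bit_density("F") == 2          # boolF
assert bit_density("E", 2) == 6       # int2E: 2 bits * arity 3 = 6 bits per vertex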

Representation of simplicial fields.

We draw the Voronoï Diagram of the three VEF locus taken together, shown in fig. 2. Fig. 2 shows the tiling obtained for the two planar graphs of fig. 1: for the hexagonal lattice in fig. 2 (a), the tiles of vertices (resp. edges, faces) are hexagons (resp. rectangles, triangles). This tiling is known as the "rhombitrihexagonal" tiling; it is a beautiful Archimedean tiling used in architecture. In order to represent a field, we color the tiles of the subset of data-points for which the field is true. For this reason, false and true are often called "empty" and "filled". Fig. 3 shows some examples of boolV, boolE and boolF fields using the rhombitrihexagonal tiling.

Figure 3: Boolean fields (black) encoding features of two $b$-blobs (gray): inside, frontier, neighborhood, outside. A boolV (resp. boolE, boolF) is a set of hexagons (resp. rectangles, triangles). From a boolV $b$ representing two $b$-blobs, generic features of $b$-blobs can be computed from $b$ using simplicial reductions.

Six simplicial reductions between simplicial locus.

Let $L_1$ and $L_2$ be two distinct simplicial locus among V, E, F. From a bool$L_1$, one can compute a bool$L_2$ as follows: for each point $p$ of the target locus $L_2$, we apply a bit reduction AND (resp. OR, XOR) over all the points in locus $L_1$ which are simplicial neighbors of $p$. We call the corresponding operation $\wedge^{L_2}$ (resp. $\vee^{L_2}$, $\oplus^{L_2}$); the superscript indicates the target locus $L_2$. The number of neighbors of $p$ is called the "co-arity". In the hexagonal case, for the three simplicial locus, we have arity $\times$ co-arity $= 6$. The co-arity minus one is the number of binary gates needed to do the reduction; we multiply it by the arity of $L_2$ to obtain the gate density, which is therefore arity $\times$ (co-arity $- 1$): for $L_2 =$ V (resp. E, F) it is 5 (resp. 3, 4). A simplicial reduction is overloaded, since it can be applied to any of the two other locus: for example $\vee^{E}$ produces a boolE from either a boolV or a boolF. Simplicial reductions also exist for integer fields, for example with min, max, or plus. A simplicial reduction is a "spatial operation", because it uses proximity in space. We also use non-spatial operations applying an operation separately on each data-point; for example, the negation $\neg$ can be applied to a boolV (resp. a boolE, a boolF) to produce a new boolV (resp. boolE, boolF). In this case, the gate density is the arity.
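The following sketch makes the reduction mechanism concrete on an explicit, tiny maximal planar graph (the tetrahedron drawn in the plane); the dictionaries and function names are ours, chosen only for illustration, and operate on whole fields rather than gate by gate.

from functools import reduce

# 4 vertices, 6 edges, 4 triangular faces; edges and faces are listed by the
# vertices they contain, which is enough to know simplicial adjacency.
edges = {"e01": (0, 1), "e02": (0, 2), "e03": (0, 3),
         "e12": (1, 2), "e13": (1, 3), "e23": (2, 3)}
faces = {"f012": (0, 1, 2), "f013": (0, 1, 3),
         "f023": (0, 2, 3), "f123": (1, 2, 3)}

def reduce_to_E(op, boolV):
    """Simplicial reduction from a boolV to a boolE: combine the two endpoint
    values of each edge with the commutative associative operation op."""
    return {e: reduce(op, (boolV[v] for v in vs)) for e, vs in edges.items()}

def reduce_to_F(op, boolV):
    """Same reduction targeted at faces (three vertices per face)."""
    return {f: reduce(op, (boolV[v] for v in vs)) for f, vs in faces.items()}

b = {0: True, 1: True, 2: False, 3: False}               # a boolV
frontier = reduce_to_E(lambda x, y: x ^ y, b)            # XOR: exactly one filled end
full_faces = reduce_to_F(lambda x, y: x and y, b)        # AND: all three ends filled
print(frontier)      # e02, e03, e12, e13 are true
print(full_faces)    # no face is entirely filled here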

Computing blob features using simplicial reductions.

An SDN-medium uses non-punctual agents whose support spans a set of vertices; a support is thus represented using a boolV. Supports are separated by considering connected components for vertex-adjacency:

Definition 1

Let $b$ be a boolV. $b$-blobs (resp. $b$-holes) are connected components of filled (resp. empty) vertices.

Arbitrarily many $b$-blobs can be encoded with a single boolV $b$, provided there is enough space. Using simplicial reductions, we can easily compute simple 2D features of $b$-blobs, shown in fig. 3 (a minimal sketch of these functions is given after the list below).

  • Function frontier$_E(b)$ (resp. inside$_E(b)$, outside$_E(b)$) is the set of edges adjacent to both an empty and a filled vertex (resp. to only filled, to only empty vertices). It costs 3 (resp. 3, 4) gates. Function inside$_F(b)$ is the set of faces adjacent to only filled vertices; it costs 4 gates.

  • Function inside$_V(b) = \wedge^{V}\,$inside$_E(b)$ (resp. outside$_V(b) = \wedge^{V}\,$outside$_E(b)$) is the set of filled (resp. empty) vertices surrounded by vertices in the same blob (resp. hole). It costs 3+5=8 (resp. 4+5=9) gates.

  • Function neighborhood$(b) = \vee^{V}\,\vee^{E}\, b$ costs 3+5=8 gates.
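Below is the minimal sketch of the blob-feature functions listed above, written on top of the reductions of the previous sketch; the names (frontier_E, inside_V, ...) are ours, and the code shows only the compositions, not the gate-level circuit.

def reduce_to_V(op, boolE, vertices=(0, 1, 2, 3)):
    """Reduce, for each vertex, the values of its adjacent edges."""
    out = {}
    for v in vertices:
        vals = [boolE[e] for e, vs in edges.items() if v in vs]
        acc = vals[0]
        for x in vals[1:]:
            acc = op(acc, x)
        out[v] = acc
    return out

OR, AND, XOR = (lambda x, y: x or y), (lambda x, y: x and y), (lambda x, y: x ^ y)

def frontier_E(b):   return reduce_to_E(XOR, b)                  # one end filled, one empty
def inside_E(b):     return reduce_to_E(AND, b)                  # both ends filled
def outside_E(b):    return reduce_to_E(AND, {v: not x for v, x in b.items()})
def inside_F(b):     return reduce_to_F(AND, b)                  # all three vertices filled
def inside_V(b):     return reduce_to_V(AND, inside_E(b))        # surrounded by its own blob
def outside_V(b):    return reduce_to_V(AND, outside_E(b))       # surrounded by its own hole
def neighborhood(b): return reduce_to_V(OR, reduce_to_E(OR, b))  # b or a neighbor of b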

The radius of an operation expression.

CAs use the notion of "radius" of the neighborhood to consider for computing the next state. It is an important concept which is also defined for spatial types, though at a finer granularity than vertices:

Definition 2

The radius of a function is the max distance ( hop-count between V,E,F locus) to data-points of parameters influencing the result.

For the preceding small functions, which all take a boolV as input, the radius is 1 for the boolE and boolF functions, and 2 for the boolV functions. Indeed, going from one vertex to a neighbor vertex takes two hops.

Figure 4: Transfer locus: (a) a pair of data-points is inserted on each edge between two simplicial points; (b) the corresponding tiles are a peripheral subdivision of the former simplicial tiles.
Figure 5: The six transfer locus grouped by pairs of communicating locus (in gray), with adjacent pairs of tiles: (a) eV and vE tiles, (b) fE and eF tiles, (c) fV and vF tiles.

2.2 The transfer-locus

From each pair of simplicial locus among V, E, F, we define two other locus (called "transfer locus") as follows: for each pair of adjacent points, one in each locus, we add two points dividing the segment joining them into three. The six transfer locus defined in this way are called eV, vE, eF, fE, vF, fV; the upper-case letter designates the nearest simplicial locus, also called the father. The two transfer locus with identical father are called brothers. Fig. 4 shows the tiling obtained by including the transfer locus among the seeds of the Voronoï Diagram. The former simplicial Voronoï cell is subdivided: the peripheral regions represent the transfer locus, while the portion allocated to the simplicial locus is now reduced to a central tile.

Decomposition of simplicial reductions into three steps.

Data traveling from one simplicial locus to another simplicial locus will now transit through the two intermediate transfer locus, which form a pair of communicating data-points, as shown in fig. 5. For example, bits move from V to E by passing through the transfer locus eV and then vE, inserted in-between the V and E locus. The simplicial reduction $\wedge^{E}$ is thus decomposed into three more elementary operations (fig. 6):

  1. the broadcast operation, noted broadcast$^{eV}$, copies the bit of each V-point onto its 6 adjacent eV-points;

  2. the transfer operation moves bits between the paired transfer locus eV and vE;

  3. the reduction operation, noted $/\!\wedge$, computes the conjunction of the bits on the two vE-points adjacent to each E-point.

The last step of pure reduction is noted using a slash followed by the reduction operation itself. So, $\wedge^{E}$ is now a function taking a boolV and producing a boolE, programmed by composing three operations:

$\wedge^{E}(b) \;=\; /\!\wedge(\mathrm{transfer}(\mathrm{broadcast}^{eV}(b))) \;=\; /\!\wedge \;\mathrm{transfer}\; \mathrm{broadcast}^{eV}\; b \qquad (1)$

In the second notation, we omit parentheses for unary operations, which saves a lot of parentheses. The superscript of broadcast indicates the target locus. Just like simplicial reductions, elementary operations are overloaded: the broadcast towards an e-locus can be applied to a boolV (resp. a boolF), broadcasting it to a booleV (resp. a booleF); the broadcasts towards v- and f-locus are defined similarly. Transfer and reduction apply to any of the six transfer locus.
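The three-step decomposition can itself be sketched in the same style (names ours); the point is only that the composition reduce, transfer, broadcast yields the same boolE as the direct simplicial reduction of the previous sketches.

def broadcast_eV(boolV):
    """Each V-point copies its bit onto its adjacent eV-points (one per incident edge)."""
    return {(e, v): boolV[v] for e, vs in edges.items() for v in vs}

def transfer(booleV):
    """Each eV-point hands its bit to the facing vE-point of the same edge."""
    return dict(booleV)                      # same keys, now read on the E side

def reduce_and_E(boolvE):
    """Each E-point combines the bits of its two vE-points with a conjunction."""
    return {e: all(bit for (e2, _v), bit in boolvE.items() if e2 == e) for e in edges}

b = {0: True, 1: True, 2: False, 3: False}
assert reduce_and_E(transfer(broadcast_eV(b))) == reduce_to_E(lambda x, y: x and y, b)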

Compilation into a circuit.

A simplicial reduction corresponds to a circuit part, shown in fig. 6 (e) for $\wedge^{E}$. Broadcast (resp. transfer, conjunction) is translated as a fan-out wiring (resp. a "trans-wire" crossing simplicial tiles, a logic gate). An operation-expression is compiled into a circuit by putting together the circuit parts associated with each of its operations.

Figure 6: Decomposition of $\wedge^{E}$. (a) Initial boolV. (b) Broadcast to a booleV. (c) Transfer to a boolvE. (d) Reduction to a boolE. (e) Corresponding logical circuit.

2.3 Internal one-to-one communication between transfer locus

Figure 7: One-to-one communication between transfer locus, based on geometric transformations, applied to an integer field where different values are shown as different gray tones. (a0, a1, a2) Clockwise/anticlockwise rotation. (b0, b1) Central symmetry.

The purpose of introducing transfer locus is to increase the expressiveness of spatial types. First of all, transfer fields such as boolvE are often used in the SDN-medium: a boolvE represents an edge together with an orientation, so that a vertex can compute whether it lies on the inside or on the outside component. Secondly, two new elementary communication-operations can be defined:

  1. Clockwise and anticlockwise rotations map each transfer locus to its brother.

  2. Central symmetry is available for Edge and Face transfer locus.

They are illustrated in fig. 7 (a, b). We adopt the convention that one-to-one communications are denoted with arrows; rotations and the central symmetry each have a dedicated arrow, as does the transfer operation already covered. Clockwise and anticlockwise rotations can be defined because two brother transfer locus are interleaved. The central symmetry is an involution: applying it twice gives back the identity. It exploits the fact that, for Edges and Faces, the data-points of the two transfer locus face each other. For faces, this is true because faces are always triangles; note that this is the place where we use the hypothesis that the planar graph is maximal. The central symmetry maps an eF field to a vF field and vice versa. For vertices, the central symmetry is not defined in the isotropic case because, as we have seen, the number of neighbors can vary between 5, 6 and 7.

Six supplementary reductions.

By applying a reduction to the fields produced by the two opposite rotations, we can program a second set of six reductions mapping one transfer locus to its brother. For example, when reducing with $\vee$, we obtain a new function:

$x \;\longmapsto\; \mathrm{clock}(x) \,\vee\, \mathrm{anticlock}(x) \qquad (2)$
Figure 8: Apex communication. (a) fV field. (b) Transfer to a vF field. (c) Central symmetry to an eF field. (d) Transfer to a fE field.

Apex neighbors.

The central symmetry on faces is used to implement a composite communication called "apex". On a maximal planar graph, each edge has two distant vertices called "apex-vertices", lying on the summits of the two triangles next to it. Each vertex also has distant edges (5, 6 or 7 of them), called apex-edges. The function apex implements a one-to-one composite communication from boolfV to boolfE, between a vertex and its apex-edges. The effect is illustrated in fig. 8: bits transit from vertex to face, move within each face (central symmetry), and then from face to edge. Because of overloading, this function also implements the reciprocal transformation from edges to apex-vertices: apex $\circ$ apex $=$ Id (it also implements a permutation between the eV-points of a vertex and those of the neighbor vertices):

$\mathrm{apex} \;=\; \mathrm{transfer} \,\circ\, \mathrm{sym} \,\circ\, \mathrm{transfer} \qquad (3)$
Figure 9: Meet-points in black. (a) Two $b$-blobs in gray; $b = 0$ beyond the border. (b) A merge-vertex and a merge-edge. (c) A div-vertex and a div-edge.

3 Computing the meet-point function meet

The vertex frontier, its inside and outside.

Let $b$ be a boolV representing agent supports. Before defining the meet-points meet($b$), we first need to compute the set of vertices adjacent to the frontier of $b$-blobs:

$\mathrm{Frontier}(b) \;=\; \vee^{V}\,\mathrm{frontier}_E(b) \qquad (4)$

Frontier is decomposed into an inside frontier, $\mathrm{Frontier}(b) \wedge b$, and an outside frontier, $\mathrm{Frontier}(b) \wedge \neg b$.

Global meet-points.

The blobs represent agents' supports. For illustrating cellular circuits, we program the function meet used in SDN-media for preserving the supports when agents move; we will use it in section 4.2 to compute the discrete Voronoï diagram. It has a boolV and a boolE component: meet $=$ (meet$_V$, meet$_E$). Each of these components is a disjunction of a div and a merge part: meet$_V$ $=$ merge$_V$ $\vee$ div$_V$ and meet$_E$ $=$ merge$_E$ $\vee$ div$_E$.

Definition 3

Let $b$ be a boolV. merge$_V$ is true for empty vertices (resp. merge$_E$ is true for edges) adjacent to two distinct $b$-blobs (resp. adjacent to the out-frontier of two distinct $b$-blobs). div$_V$ is true for filled vertices (resp. div$_E$ is true for edges) adjacent to two distinct $b$-holes (resp. adjacent to the in-frontier of two distinct $b$-holes).

Meet-points are needed to preserve connectedness.

In an SDN-medium, agents move by modifying their $b$-blob support, emptying (resp. filling) a given vertex on the inside (resp. outside) frontier. However, such a move can cause a division (resp. a merge) if the chosen vertex is a div-vertex (resp. a merge-vertex). The same holds if the two vertices adjacent to a div-edge (resp. a merge-edge) are simultaneously emptied (resp. filled). In summary, preserving $b$-blob supports implies not modifying meet-vertices, and not simultaneously modifying the two vertices on both sides of a meet-edge (see fig. 9).

Local meet-points.

A $b$-blob can be arbitrarily big, and have a non-convex shape. Computing whether two vertices belong to the same $b$-blob requires the exploration of a region which is not a priori bounded; it cannot be done with a fixed operation-expression, which can only explore a fixed-radius neighborhood. We propose an alternative definition of meet-points: local meet-points. In definition 3, instead of $b$-blobs, we take the $b$-blobs locally induced in the immediate neighborhood, by intersecting $b$ with a ball of a given radius centered on the meet-point: radius 2 for meet-vertices, radius 3 for meet-edges. Local meet-points are not necessarily global: although locally one may find two distinct components, those two components may meet if we look further. What matters is that global meet-points are also local meet-points, so detecting local meet-points is an overkill but works for our purpose of preserving $b$-blobs.

3.1 Computing the meet-vertex function meet$_V$.

Computing nbcc.

As shown in fig. 10, the radius-2 ball centered on a vertex includes a ring of 6 neighbor vertices (it could be 5 or 7 in the isotropic case). As a prerequisite, we need to compute the number nbcc of filled connected components in this ring. For example, in fig. 10 (a) and (b), nbcc $= 2$. Since the number of neighbors is at most 7, nbcc $\leq 3$, so nbcc is an int2V (encoded with 2 bits). Each component is delimited by two apex-edges in frontier$_E(b)$; we therefore only need to sum those frontier apex-edges and divide by two:

$\mathrm{nbcc}(v) \;=\; \tfrac{1}{2}\sum_{e \,\in\, \mathrm{apex\text{-}edges}(v)} \mathrm{frontier}_E(b)(e) \qquad (5)$
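Formula (5) amounts to counting, around the ring of neighbors, the apex-edges whose two endpoints differ, and halving the count; a minimal sketch for one vertex (hexagonal case, ring of 6 booleans, function name ours):

def nbcc(ring):
    """Count the frontier apex-edges of the ring (pairs of cyclically consecutive
    neighbors with different values) and divide by two; this is the number of
    filled arcs of the ring (a uniform ring gives 0, which is fine since only
    nbcc >= 2 matters below)."""
    n = len(ring)
    transitions = sum(ring[i] != ring[(i + 1) % n] for i in range(n))
    return transitions // 2        # the transition count is always even

assert nbcc([1, 1, 0, 1, 1, 0]) == 2    # two filled arcs, as in fig. 10
assert nbcc([1, 1, 1, 1, 1, 1]) == 0    # uniform ring: no frontier apex-edge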
Figure 10: Detecting (a) the merge-vertex and (b) the div-vertex of fig. 9 (b, c). Panels (a0, b0) show the two filled components of the ring; the added darker rectangles in (a1, b1) represent the edges in frontier$_E$ which are also apex-edges of the central vertex. There are four of them in both cases.

Computing meet-vertices.

If nbcc $= 2$, filling (resp. emptying) the central vertex merges those 2 components (resp. divides into those 2 components). If nbcc $= 3$ the same reasoning applies with 3 components. Finally, if nbcc $\leq 1$, no division nor merge happens, hence:

$\mathrm{merge}_V(b) \;=\; \neg b \,\wedge\, (\mathrm{nbcc} \geq 2), \qquad \mathrm{div}_V(b) \;=\; b \,\wedge\, (\mathrm{nbcc} \geq 2) \qquad (6)$

Computing frontier$_E$ costs 3 gates. For nbcc, knowing that the number of frontier apex-edges is even, the computation can be done with only 17 gates. Finally, meet$_V$ costs 3 + 17 = 20 gates, div$_V$ costs 21 gates, and merge$_V$ costs 22 gates.
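Combining nbcc with the central vertex's own value gives the local meet-vertex test of (6); a minimal sketch, reusing the nbcc function above (names ours):

def merge_vertex(center_filled, ring):
    """Empty vertex whose ring shows at least two filled components: filling it
    would merge two locally distinct blobs."""
    return (not center_filled) and nbcc(ring) >= 2

def div_vertex(center_filled, ring):
    """Filled vertex whose ring shows at least two filled components (hence at
    least two empty components): emptying it would locally split its blob."""
    return center_filled and nbcc(ring) >= 2

def meet_vertex(center_filled, ring):
    return merge_vertex(center_filled, ring) or div_vertex(center_filled, ring)

assert merge_vertex(False, [1, 1, 0, 1, 1, 0])   # the merge-vertex of fig. 9 (b)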

Figure 11: Detecting the two meet-edges of figure 9. (a) The radius-3 ball centered on an edge contains vertices at distance 1, 2 and 3. (b) Merging edge. (c) Dividing edge.

3.2 Computing the meet-edge function meet$_E$.

In order to compute a local version of meet$_E$ for one edge, we consider the radius-3 ball centered on that edge, shown in fig. 11. It includes three kinds of neighbor vertices: two immediate neighbors at distance 1, two apex neighbors at distance 2, and 6 remote neighbors at distance 3. The immediate and apex neighbors form a rhombus. We will need a function, noted rhombus below, which takes a boolV $b$ and computes a boolE which is true for an edge if $b$ is true somewhere within the rhombus centered on that edge. It can be computed with 4 + 3 = 7 gates by chaining two reductions. The formula can also be applied to a boolE.

$\mathrm{rhombus}(b) \;=\; \vee^{E}\,\vee^{F}\, b \qquad (7)$

An edge is locally merging two $b$-blobs (fig. 11 (b)) if there are two locally induced $b$-blobs, two vertices away from each other, one vertex away on each side of the edge. Equivalently: (i) the surrounding rhombus is empty (otherwise those two blobs would meet locally); (ii) on each side of the rhombus, some vertices at distance 3 must be full. Condition (ii) is checked if and only if both immediate neighbors of the edge belong to Frontier, which is computed as $\wedge^{E}\,\mathrm{Frontier}(b)$. Putting (i) and (ii) together we obtain:

$\mathrm{merge}_E(b) \;=\; \neg\,\mathrm{rhombus}(b) \;\wedge\; \wedge^{E}\,\mathrm{Frontier}(b) \qquad (8)$

Because a div-edge is a merge-edge of the complement we obtain:

$\mathrm{div}_E(b) \;=\; \mathrm{merge}_E(\neg b) \;=\; \neg\,\mathrm{rhombus}(\neg b) \;\wedge\; \wedge^{E}\,\mathrm{Frontier}(\neg b) \qquad (9)$

But $\mathrm{Frontier}(\neg b) = \mathrm{Frontier}(b)$. From this we can factorize, simplify and derive:

$\mathrm{meet}_E(b) \;=\; \wedge^{E}\,\mathrm{Frontier}(b) \;\wedge\; \big(\neg\,\mathrm{rhombus}(b) \,\vee\, \neg\,\mathrm{rhombus}(\neg b)\big) \qquad (10)$

The auxiliary field Frontier has already been computed for nbcc, so it is available for free. The field $\wedge^{E}\,\mathrm{Frontier}$ costs 5 gates, rhombus costs 7 gates, and the non-spatial conjunction applied to a boolE costs 3 gates; meet$_E$ thus costs 7 + 3 + 5 + 3 = 18 gates, with a radius of 3. Taken separately, div$_E$ costs 7 + 3 + 3 + 5 = 18 gates and merge$_E$ costs 19 gates. If meet$_E$ has already been computed, then one can also compute div$_E$ = meet$_E$ $\wedge$ inside$_E$ and merge$_E$ = meet$_E$ $\wedge$ outside$_E$, which adds only 6 gates.
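Formulas (8) to (10) can be transcribed directly at the level of one edge, with the rhombus values and the two endpoint Frontier bits passed in explicitly (a sketch; in the circuit these inputs come from the reductions described above, and the names are ours):

def merge_edge(rhombus_bits, endpoints_in_frontier):
    """(8): the rhombus around the edge is empty of b, and both immediate
    neighbors of the edge are Frontier vertices (some b lies at distance 3
    on each side)."""
    return (not any(rhombus_bits)) and all(endpoints_in_frontier)

def div_edge(rhombus_bits, endpoints_in_frontier):
    """(9): a div-edge is a merge-edge of the complement, and Frontier(not b)
    equals Frontier(b)."""
    return all(rhombus_bits) and all(endpoints_in_frontier)

def meet_edge(rhombus_bits, endpoints_in_frontier):
    """(10): rhombus entirely empty or entirely full, and both endpoints in Frontier."""
    return all(endpoints_in_frontier) and (not any(rhombus_bits) or all(rhombus_bits))

assert merge_edge([0, 0, 0, 0], [True, True])    # the merging edge of fig. 11 (b)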

4 Sequential cellular circuits

A set of fields is called a configuration. A sequential circuit is described by a function updating a configuration, i.e. with identical domain and co-domain.

Definition 4

Let $T_1, \ldots, T_k$ be some spatial types. A sequential cellular circuit is a mapping $f : T_1 \times \cdots \times T_k \rightarrow T_1 \times \cdots \times T_k$; each component $f_i$ is computed as an operation-expression.

We often call it simply a "cellular circuit". Starting from the initial configuration $c_0$, we iterate $t$ times and obtain the configuration at time $t$: $c_t = f(c_{t-1})$. The sequence $c_0, c_1, c_2, \ldots$ represents the circuit iteration. The $i$-th component of the configuration represents the successive values (of type $T_i$) of a stored field, which is called a "layer".
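The iteration of Definition 4 is plain repeated application of the update map to the configuration; a minimal generic sketch (names ours):

def iterate(update, configuration, steps):
    """Iterate a sequential cellular circuit: configuration is a tuple of layers,
    update maps it to a new tuple of the same spatial types."""
    for _ in range(steps):
        configuration = update(configuration)
    return configuration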

Circuit tiling

The set of gates can be partitioned with one vertex per tile, by choosing a vertex responsible for each edge and each face. In the hexagonal case, this can be done in a simple systematic way by selecting a direction, so that each class corresponds to an identical circuit tile. Fig. 4 shows 4 such tiles. In the isotropic case, a distributed algorithm must decide using a random tournament, and make sure that each PE gets approximately the same number of edges and faces.

Complexity of a cellular circuit

It is measured by

  1. the radius and gate density of the updating function,

  2. the memory density equal to the sum of the bit density of layers.

  3. the trans-wire count which is the number of wires crossing a circuit tile.

In this introductory paper, we consider only simple circuits with a single boolV layer. Therefore they have a minimal memory density of 1.

Figure 12: The 9-vertex 1D circuit obtained from: (a) neighborhood, (b) neighborhood(neighborhood). The rectangle delimits a tile. Squares are 1-bit registers storing $b$. Execution is illustrated, for $t = 0$ to 4 in (a) and $t = 0$ to 2 in (b), with an initial configuration having a single central true vertex.

4.1 Analysis of a simple cellular circuit for growing blobs.

The boolV circuit $b \mapsto \mathrm{neighborhood}(b)$ lets some initial $b$-blobs grow until they meet, merge, and fill the whole medium. Its gate density is 8. This simple circuit is helpful for understanding the different concepts involved. Consider a 1D graph consisting of a simple line of 9 vertices. For this "degenerated" planar graph, the gate density is 2 instead of 8. The compiled circuit is easily represented, as in fig. 12 (a); it is made of 9 copies of the same tile.

In fig. 12 we can see that each PE computes the boolE of its left edge in the lower row of OR gates; the upper row computes the OR reduction back to vertices. Zero values (the neutral element of OR) must be supplied to the OR gates of the border tiles.

The circuit for neighborhood has gate density 2, radius 2, and a trans-wire count of 2. The circuit neighborhood(neighborhood), shown in fig. 12 (b), goes two times faster: gate density, radius and trans-wire count are all doubled to 4. More generally, neighborhood$^n$ compiles into a cellular circuit with a gate density, radius and trans-wire count of $2n$, and goes $n$ times faster. Translated into a CA framework, each PE would need to explore a neighborhood of radius $n$, which contains $O(n^2)$ vertices for the 2D homogeneous planar graph. This circuit family is very particular because the same reduction (OR) is applied repetitively $n$ times. In the general case, different reductions are applied at each stage. If the radius is $r$, for each stage at height $h$, $h \leq r$, a CA simulation must entirely traverse the neighborhood of radius $h$. The complexity of the translated CA execution becomes $O(r^2)$ instead of $O(r)$ for cellular circuits. The improved complexity is due to a fine-grain interleaving of computation with communication: as soon as a field is computed by applying a reduction, it is communicated again; in this way, a reduction done for one vertex benefits the neighbor vertices. As a result, the circuit's complexity augments only linearly with the radius. When programming complex circuits, this feature allows handling large radii.
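The 1D circuit of fig. 12 (a) is easy to simulate directly; a minimal sketch of the 9-vertex line, with OR reductions and zero padding at the border (names ours):

def grow_1d(b):
    """One step of the 1D neighborhood circuit: each vertex ORs its own bit with
    the bits carried by its left and right edges (zero beyond the border)."""
    n = len(b)
    return [(b[i - 1] if i > 0 else 0) | b[i] | (b[i + 1] if i < n - 1 else 0)
            for i in range(n)]

b = [0, 0, 0, 0, 1, 0, 0, 0, 0]     # single central true vertex, as in fig. 12
for t in range(4):
    b = grow_1d(b)
print(b)                            # after 4 steps the blob fills the whole line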

4.2 A circuit for the discrete Voronoï Diagram (VD) of -blobs.

A definition of discrete VD encoded with fields.

Let $b$ be a boolV encoding seeds as $b$-blobs. The seeds are therefore possibly non-punctual. The discrete Voronoï cell of a $b$-blob is the set of vertices strictly nearer to it than to other blobs. Here, the distance is the same V, E, F hop count used to define the radius. Voronoï cells partition the set of vertices. The VD is usually defined as the partition itself. In the continuous case, this partition can be represented by polygons; in the discrete case it is less obvious, due to the following discrete artifact: if two nearest seeds are at even (resp. odd) distance, their Voronoï cells are separated by an edge (resp. a vertex). Spatial types can represent a set of vertices (boolV) as well as a set of edges (boolE). This allows defining the VD as a boolE and a boolV: VD($b$) = (VD$_V$, VD$_E$):

Definition 5

The discrete Voronoï Diagram (VD) of a set of $b$-blobs is the set of edges and vertices equidistant to at least two nearest $b$-blobs.

Let closure : boolV $\times$ boolE $\rightarrow$ boolV be the function which adds to a set of vertices the endpoints of a given set of edges; it needs 4 gates. We remark that closure(VD) encodes the VD using only vertices, with the additional important topological property of separating the seeds. This property is of high interest for SDN-media and makes it a legitimate representation of the VD; we call it the "vertex-VD".

Figure 13: Folded form of the VD circuit (successive stages: neighborhood, frontier, nbcc, merge, next state). Thick arrows represent multiple connections to all neighbors: the fan-out (resp. fan-in) is the co-arity of the sender (resp. the receiver). Thick gates represent simplicial reductions; they combine a number of inputs equal to the co-arity. Curved arrows represent a single connection linking two gates of distinct radius. Thin arrows (including curved ones) and thin gates represent non-spatial computation within the same tile. To unfold the circuit, thick arrows must be drawn towards every neighbor, and each gate must be copied in each tile.

The VD circuit.

Instead of marking the vertex-VD, we mark the complement, which consists of Voronoï cells deprived of the vertices adjacent to another distinct Voronoï cell. We call those "strict Voronoï cells". We reuse the growing circuit of the preceding subsection: starting from $b$-blobs, we grow everywhere except on closure(merge($b$)), where merge = (merge$_V$, merge$_E$). Since merge-points were defined precisely so as to avoid merging supports, the growth is canceled when two $b$-blobs come close (one or two vertices away). As a result, the $b$-blobs grow until they exactly fill their associated strict Voronoï cells. As shown in fig. 14, convergence happens in a time equal to half the diameter of the medium. We chose a set of seeds in order to illustrate a "multi-vertex" equidistant to three seeds or more. In the hexagonal case, multi-vertices can also be detected because they are in outside, i.e. they remain surrounded by unmarked vertices.
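The algorithmic essence of this circuit, growing all seeds synchronously and freezing the cells where two distinct seeds collide, can be sketched on a square grid with explicit seed labels; this is only an illustration of the wave idea, not the paper's circuit, which uses a single bit of state and the local merge detection described in section 3.

def discrete_voronoi(width, height, seeds):
    """seeds: dict mapping (x, y) -> label. Grow all seeds synchronously, one
    ring per step; a cell reached in the same step by two different labels is
    left unlabeled (None): it belongs to the discrete Voronoi diagram."""
    label = dict(seeds)
    frontier = set(seeds)
    while frontier:
        claims = {}                                  # cell -> labels reaching it this step
        for (x, y) in frontier:
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in label:
                    claims.setdefault((nx, ny), set()).add(label[(x, y)])
        frontier = set()
        for cell, labs in claims.items():
            if len(labs) == 1:                       # one seed reaches it: join its cell
                label[cell] = labs.pop()
                frontier.add(cell)
            else:                                    # collision: diagram cell, stays empty
                label[cell] = None
    return label

cells = discrete_voronoi(7, 5, {(1, 2): "a", (5, 2): "b"})
print(cells[(3, 2)])    # None: equidistant to both seeds, part of the diagram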

Theorem 4.1

The circuit $b \mapsto \mathrm{neighborhood}(b) \wedge \neg\,\mathrm{closure}(\mathrm{merge}(b))$ fills exactly all the strict Voronoï cells, using 55 gates and a radius of 4.

Figure 14: Iteration of the circuit computing the strict Voronoï cells in a boolV layer $b$ (light gray), for $t = 0$ to 3. At $t = 0$, $b$ contains the seeds; convergence happens at $t = 3$. Two auxiliary boolV and boolE fields (black) represent merge. There is also a unique multi-vertex (dark gray), detected as outside. Execution on the left uses the hexagonal medium, and on the right the isotropic medium of fig. 2.

Proof.

The circuit's radius is 4 because meet's radius is 3. The circuit is shown in fig. 13 in a folded representation, which also allows reading the radius of each computed field: it is the hop count from the gate computing it to the flip-flop, noted MEM, storing the boolV layer $b$. Function merge costs 22 + 19 = 41 gates. The total gate count is 8 + 1 + 1 + 4 + 41 = 55. Notice first that the vertex-VD is preserved from one iteration to the next:

  1. The part not adjacent to $b$ is preserved because growth is uniform.

  2. The part adjacent to $b$ is precisely closure(merge($b$)), which means it is directly detected as part of the vertex-VD at time $t$, and will remain empty.

The configuration $b$ is increasing, and will therefore converge at some time $t_0$. When this happens, $b$ is surrounded by closure(merge($b$)), otherwise it would keep growing. The connected components of vertices within the complement of closure(merge($b$)) are of two types: either totally marked or totally unmarked. The empty components contain some vertices equidistant to three seeds or more; we call those "multi-vertices". The examples in fig. 14 were chosen to illustrate a multi-vertex component consisting of a ball of radius 1, centered on a vertex equidistant to three seeds. As fig. 15 shows, a multi-vertex component can become arbitrarily big if we choose a set of seeds regularly spaced on a big discrete circle, so that the circle's center is equidistant to arbitrarily many seeds. For the discrete VD, such situations must be considered because they occur with a non-zero probability. In order to prove that $b$ identifies exactly all the strict Voronoï cells, we must still prove that the multi-vertex components do not intersect the strict Voronoï cells. Let $C$ be one such multi-vertex component. The vertices adjacent to $C$, outside $C$, are connected because the graph is planar and maximal; they form a closed curve around $C$, included in closure(merge($b$)), which is itself included in the vertex-VD. A strict Voronoï cell is connected and contains a seed, while $C$ does not contain seeds. If some strict Voronoï cell intersected $C$, let $v$ be a vertex in the intersection; we can apply the Jordan theorem: a path between the seed of the cell and $v$, within the cell, would have to intersect the closed curve, which is absurd because the curve is in the vertex-VD, which is disjoint from the strict Voronoï cells.

Figure 15: Large multi-vertex component. (a) Some seeds (gray) and meet-points (black); strict Voronoï cells are limited to the seeds themselves. (b) closure(merge) (black) separates vertices into strict Voronoï cells, plus (c) a zone including two multi-vertices (black) equidistant to six seeds.

5 Conclusion

Contrasting Cellular Circuits with CA.

The usual scheme for specifying cellular computation, Cellular Automata (CA), uses a lattice of Processing Elements (PE) exchanging their finite state between direct neighbors, and applying a Look-up Table (LUT) to find out the next state. In this paper, we present a new scheme using spatial types and producing cellular circuits. The goal is to explore the world of cellular computation beyond lattice networks to reach amorphous computing, and along the complexity axis of elaborate rules with high radius.

It is not automata that are mapped on a network's vertices, but bits and gates. The network can be any maximal planar graph. Bits and gates are distributed not only on vertices, but also on edges and faces of this graph, and also on secondary locus in-between those. No LUTs are used. Instead, fields of those bits are computed from other fields using functions programmed with spatial operations. Those operations let data travel between adjacent vertices, edges and faces, and interact through reductions. Operation-expressions can be directly translated into circuits of logical gates. The new scheme improves efficiency and programmability.

5.1 Improving efficiency

Exploiting symmetries to factorize computation.

Cellular circuits exploit the spatial symmetries always present when doing artificial physics. For example, let $b$ be a boolV (a boolean vertex field). Consider the computation of frontier$_E(b)$, which returns true for the edges on the frontier of a $b$-blob. It is symmetric with respect to the edge's two adjacent vertices: one must be empty and the other filled. In a PE+LUT scheme, the computation of frontier must be done on PEs assigned to vertices; it has to be computed two times, once for each adjacent vertex, and stored as a vector of 6 boolV (bit density of 6) with no obvious visualization. With spatial types, it is computed a single time, using a xor reduction, and stored as a unique boolean edge field (a boolE with bit density of 3). There is an automatic, nice visualization looking like a set of closed curves around each blob (fig. 3), which precisely evoke frontiers. The speed-up brought by exploiting symmetries with reductions really makes a difference when several reductions are applied in sequence, generating high-radius fields. It results in a change of time complexity with respect to the radius, as we now detail:

Translating cellular circuits into CAs.

If the planar graph is a lattice, the cellular circuit can be tiled with the exact same building-block circuit (as is done in fig. 12 for the degenerated 1D case). The cellular circuit can then be translated into a formal CA, by doing the computation of each tile on the PE of a CA. However, as fig. 12 shows, the tiles exchange not only the stored state, but also intermediate values that have already used the neighbor state in their computation. In contrast, in CAs, each PE receives only the stored state from its neighbors. In the CA translation, each PE must therefore redo part of the computation done by neighboring tiles. The analysis done in subsection 4.1 shows that for a computed field of radius $r$, this extra work causes the time complexity to jump from $O(r)$ to $O(r^2)$, making it unfeasible for large $r$ (our current SDN-medium has $r = 25$).

SIMD pipelined execution.

We used exclusively the hexagonal lattice with 64 columns for simulating the SDN-medium. Our simulator processes the lattice row by row, in a pipelined way. This has two advantages: 1- it exploits the SIMD capability of standard PCs: 64 logic gates can be evaluated in a single logic operation on a 64-bit integer, and 64 bits can be communicated with a single bit rotation (we measured more than 64 gate evaluations per clock cycle, due to the super-scalar capabilities present in standard laptops); 2- rows of generated intermediate fields are consumed at the same rate as they are created, so only $r$ rows have to be stored, where $r$ is the radius.
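The bitwise trick can be illustrated on one row of 64 cells packed into a single integer: one OR-neighborhood step over the whole row costs two shifts and two ORs (a sketch of the principle, not the simulator's actual code):

W = 64
MASK = (1 << W) - 1

def grow_row(row_bits):
    """One neighborhood step along a packed row: every cell ORs itself with its
    left and right neighbors, all 64 cells at once; the borders are padded with
    0, the neutral element of OR."""
    return (row_bits | (row_bits << 1) | (row_bits >> 1)) & MASK

row = 1 << 32                   # a single filled cell in the middle of the row
for _ in range(3):
    row = grow_row(row)
print(bin(row).count("1"))      # 7 cells are filled after 3 steps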

5.2 Improving programmability through procedural programming.

This is probably the most significant advantage of spatial types. When programming a cellular circuit, we do not directly focus on achieving a specific update function. Instead, we program and debug separately a library of functions that we reuse later, just as we do with standard procedural programming languages. For example, in this paper we first introduce low-level functions to compute blob features such as the inside, the outside, the frontier, and the neighborhood. We also implement a one-to-one communication-operation between apex neighbors as a function. Those functions have radius 1 or 2, and need fewer than 10 gates. Let $b$ be a boolV representing the support of agents. We then program the more complex radius-3 functions merge (41 gates) (resp. divide), used for moving agents without merging (resp. without dividing) their support. The disjunction meet = merge $\vee$ divide is a key building block reused dozens of times in the SDN-media in order to preserve agents' supports. In this paper we reuse the merge component, adding 14 gates, in order to obtain a radius-4, 55-gate cellular circuit computing something not related to the SDN-media: the discrete Voronoï Diagram. (An SDN-medium does compute a VD, but a dynamic one, using a much more complex circuit: it computes distances modulo 8 [13], and constantly updates the VD as the seeds move simultaneously with their VD computation.)

Using auxiliary fields.

CA transition rules tend to be simple. One factor limiting their complexity is that during an update cycle, there is only a limited set of data available as inputs: the local state plus the states of the other PEs in the immediate neighborhood. With spatial types, one update cycle includes an arbitrary number of exchange-compute steps. Each step is a reduction which produces a new auxiliary field, which can be exchanged again to become the input of other steps. The generated auxiliary fields increase the volume of data on which it is possible to compute. They are like auxiliary variables used in procedural programming: they store intermediate results to be reused several times for different purposes. For example, the auxiliary field Frontier appears in formula 10 and is reused for computing nbcc and the frontier; so it is reused three times in total for the computation of meet. Designing increasingly complex transitions naturally results from adding new steps, and is most often done without having to introduce more bits of state. An illustration of this fact is that in this paper, the discrete Voronoï diagram is computed with a single bit of state $b$.

5.3 Application: from the Voronoi Diagram to the SDN medium

The discrete Voronoï Diagram (VD).

We chose this example because 1- it is simple enough to fit in the paper, 2- it reuses a key function needed for SDN-media, allowing us to start presenting them, and 3- it has an intrinsic interest. Computing the VD on a CA is not new; we use a technique inspired from [2, 14]: waves propagate synchronously and define the VD where they collide. Spatial types can do it in the more general context of maximal planar graphs. The program captures the simple algorithmic essence of the wave technique, which is to grow the seeds uniformly as much as possible and stop just before they meet. The circuit uses 55 gates on each tile for the hexagonal case; we conjecture that this is the minimum. The number 55 measures the complexity of the computation in a more precise way than just the number of states, which is traditionally used in the CA community. There is only one bit of state, compared to two bits for [2] (four states), and, for [14], a number of bits that grows with the number of PEs. As in amorphous computing, the hypothesis of synchronism is not mandatory for the circuit's operation: if the unique bit of state is not updated with a small probability, uniformly on each PE, then the circuit still computes an approximation of the VD, with fluctuations due to variations in the propagation speed.

The SDN medium

Spatial types and cellular circuits were developed as a necessary tool for constructing, piece by piece, a quite complex computing medium which can simulate Self-Developing Networks (SDN). The goal of SDN-media is to broaden the scope of what can be computed on a computing medium, and reach general-purpose computing. The complexity of SDN-media is due to the simulation of many artificial physics laws, needed to achieve division and homogenization of membranes. Our current version uses 77 bits of state, 14,291 gates, 300 trans-wires between cells, and a radius of 25. A real-time execution, interpreting a flow of host-instructions dictating the development of a virtual 2D grid, can be viewed in the three videos [8]; it gives an idea of how complex computational behavior can be obtained thanks to the new scheme, while still doing bit-level local communication, the essence of CAs.

Acknowledgment

We thank Luidnel Maignan for his fruitful comments.

References

  • [1] Abelson, H., D.Allen, Coore, D., Hanson, C., Homsy, G., T. F. Knight, J., Nagpal, R., Rauch, E., Sussman, G.J., Weiss, R.: Amorphous computing. Commun. ACM 43(5), 74–82 (2000)
  • [2] Adamatzky, A.: Voronoi-like partition of lattice in cellular automata. Mathematical and Computer Modelling 23(4), 51 – 66 (1996)
  • [3] Chopard, B., Droz, M.: Cellular Automata Modeling of Physical Systems (1998)
  • [4] Coore, D.: Botanical computing: a developmental approach to generating interconnect topologies on an amorphous computer. Ph.D. thesis, MIT (1999)
  • [5] F.Gruau: Self developing networks, part 1: the formal system. Tech. Rep. 1549, LRI (2012), http://www.lri.fr/~bibli/Rapports-internes/2012/RR1549.pdf
  • [6] F.Gruau: Self developing networks, part 2: Universal machines. Tech. Rep. 1550, LRI (2012), http://www.lri.fr/~bibli/Rapports-internes/2012/RR1550.pdf
  • [7] Grigor’yan, A., Muranov, Y.V., Yau, S.T., et al.: Graphs associated with simplicial complexes. Homology, Homotopy and Applications 16(1), 295–311 (2014)
  • [8] Gruau, F.: Videos of development on the general purpose cellular automaton (2018), https://www.lri.fr/ gruau/development
  • [9] Gruau, F., Eisenbeis, C., Maignan, L.: The foundation of self-developing blob machines for spatial computing. physica D:Nonlinear Phenomena 237 (2008)
  • [10] Gruau, F., Maignan, L.: Spatial types: a scheme for specifying complex cellular automata to explore artificial physics. In: TPNC 2018. LNCS, vol. v, p. pp (2018)
  • [11] Hillis, W.D.: The connection machine. MIT press (1989)
  • [12] Lengauer, T.: VLSI theory. In: Handbook of theoretical computer science (vol. A): algorithms and complexity, pp. 835–866. MIT Press, Cambridge, MA, USA (1990)
  • [13] Maignan, L., Gruau, F.: Integer gradient for cellular automata: Principle and examples. In: SASO 2008. IEEE (2008)
  • [14] Maniatty, W.A., Szymanski, B.K.: Fine-grain discrete voronoi diagram algorithms in l1 and l∞ norms. Mathematical and Computer Modelling 26(4), 71–78 (1997)
  • [15] Nagpal, R., Shrobe, H., Bachrach, J.: Organizing a global coordinate system from local information on an ad hoc sensor network. Springer Berlin Heidelberg (2003)
  • [16] Rauch, E.: Discrete, amorphous physical models. International Journal of Theoretical Physics 42(2), 329–348 (2003)
  • [17] Schlömer, T., Heck, D., Deussen, O.: Farthest-point optimized point sets with maximized minimum distance. In: Proceedings of the ACM SIGGRAPH (2011)
  • [18] Wolfram, S.: Statistical mechanics of cellular automata. Reviews of Modern Physics 55(3), 601–644 (1983)
  • [19] Zhou, H., Jin, M., Wu, H.: A distributed delaunay triangulation algorithm based on centroidal voronoi tessellation for wireless sensor networks. In: MobiHoc ’13. ACM (2013)

Appendix

We here report more technical issues.

Figure 16: Finite state automaton for computing the radius: its states are V, plus perimeter-edges and perimeter-faces on one hand, and radial edges and radial faces on the other. Transitions correspond to transfer communications; thick transitions indicate when the radius is incremented.

Computing the radius of a circuit.

The radius of layers is 0, since they are directly read from memory. Usually the radius is incremented each time a transfer is done. However, fig. 13 shows a counter-example: the field nbcc undergoes three transfers, from vertex to edge to face to vertex again, but its radius is only 2 (it is clearly computed on the vertices around the starting vertex of reference). The second transfer, from edges to faces, did not increase the radius, because the faces adjacent to the edges at distance 1 from the starting vertex are themselves at distance 1. In the same way, edges (resp. vertices) which are adjacent to faces (resp. edges) at a given distance may remain at that same distance. We call edges (or faces) at distance 1 (resp. 2) "perimeter" (resp. "radial") edges (or faces). When computing the radius of an auxiliary field, by induction on the operator expression, we must remember, for edges and faces, whether the field is a perimeter or a radial one, and take this sub-type into account when a transfer occurs, to know whether the radius is incremented or not. Fig. 16 shows the finite state automaton that does this job. What matters for the computation is the parity of the transfers done since the last vertex: if communication is done only between edges and faces, the radius augments on average only one transfer out of two. If a binary non-spatial operation is applied to two fields, then the resulting radius is, of course, the max of the radii of the input fields, and if the locus is edges or faces, the sub-types of the two inputs must be combined to determine the resulting sub-type.

Processing of the border.

A circuit is a finite object. On a 2D plane, a difficulty occurs on the border of the planar graph: the unbounded face is not triangular; it is adjacent to all the vertices on the perimeter, whose number is of the order of $\sqrt{n}$, where $n$ is the total number of vertices. There are two solutions. 1- The simplest is to come back to a triangulated form by considering a 2D torus instead of a 2D plane. Simplicial proximity can also be defined on a toroidal graph, whose vertices are mapped on a 2D torus and whose edges do not cross. In our experiments, we use the perfect hexagonal toroidal graph shown in fig. 17 (a) (with 64 columns); the same construction can be done in the isotropic case. 2- Ultimately, the border will have to be instantiated, so as to model input and output to the cellular circuit. Notice that the vertices of the border form a 1D ring. This ring is not used for computation: when a reduction of a vertex field is done, the field is prolongated on the vertices of the ring by setting the value to the neutral element of the reduction. In the example of the Voronoï Diagram, frontier is computed using a xor reduction, and the neutral element for xor is zero; it means that the seeds on the border behave as if they were surrounded by empty vertices.
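Both border policies can be sketched in 1D: either wrap the indices around (toroidal medium) or pad the reduction with the neutral element of its operation, here 0 for OR; the function names are ours.

def or_neighborhood_torus(b):
    """Toroidal border: indices wrap around, so the two ends of the line are neighbors."""
    n = len(b)
    return [b[(i - 1) % n] | b[i] | b[(i + 1) % n] for i in range(n)]

def or_neighborhood_padded(b, neutral=0):
    """Bordered medium: out-of-range neighbors are replaced by the neutral element
    of the reduction (0 for OR and XOR, 1 for AND)."""
    n = len(b)
    get = lambda i: b[i] if 0 <= i < n else neutral
    return [get(i - 1) | get(i) | get(i + 1) for i in range(n)]

b = [1, 0, 0, 0, 0]
print(or_neighborhood_torus(b))     # [1, 1, 0, 0, 1]: the blob leaks across the wrap
print(or_neighborhood_padded(b))    # [1, 1, 0, 0, 0]: the border behaves as empty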

Figure 17: The 4 by 4 hexagonal lattice (vertices labeled 0 to F in hexadecimal): (a) the toroidal wrapping, (b) the underlying planar graph, (c) Voronoï polygons associated to the V, E, F data-points.