Spontaneous Emergence of Computation in Network Cascades

04/25/2022
by Galen Wilkerson, et al.

Neuronal network computation and computation by avalanche-supporting networks are of interest to the fields of physics, computer science (computation theory as well as statistical or machine learning), and neuroscience. Here we show that computation of complex Boolean functions arises spontaneously in threshold networks as a function of connectivity and antagonism (inhibition), computed by logic automata (motifs) in the form of computational cascades. We explain the emergent inverse relationship between the computational complexity of the motifs and their rank-ordering by function probability, and its relationship to symmetry in function space. We also show that the optimal fraction of inhibition observed here agrees with results in computational neuroscience relating to optimal information processing.


1 Introduction

The relationship between physical systems and information has been of increasing and compelling interest in the domains of physics [jaynes1957information, wheeler1992recent, ben2008farewell], neuroscience [sejnowski1988computational, lynn2019physics], computer science [shannon1941mathematical, turing1936computable, dubbey2004mathematical, wolfram2002new, conway1970game, lizier2014framework, rojas2013neural], quantum computing [lloyd2013universe, wheeler1992recent, perseguers2010quantum], and other fields such as computation in social networks [schelling1971dynamic, granovetter1978threshold, sakoda1971checkerboard] or biology [schrodinger1992life, brooks1988evolution], to the point where some consider information to be a fundamental phenomenon in the universe [lloyd2010computational, wheeler2018information, knuth2011information]. Often, the information processing of physical systems takes place on, or can be modeled by, networks [watts2002simple], since information is transmitted and processed through interactions between physical entities.

The principle of Occam's razor, together with the goal of a deeper understanding of these physical-information interactions, encourages us to find the simplest possible processes that achieve computation. Thus we may conduct basic research into the necessary and sufficient conditions for systems to perform information processing.

Cascades, particularly on networks, are such a simple and ubiquitous process. Cascades are found in a great number of systems – the brain, social networks, and chemical, physical, and biological systems – occurring as neuronal avalanches, information diffusion, influence spreading, chemical reactions, chain reactions, activity in granular media, forest fires, or metabolic activity, to name a few [watts2002simple, kempe2003maximizing, newman2018networks, easley2010networks, christensen2005complexity, jalili2017information]. The Linear Threshold Model (LTM) is among the simplest theoretical models to undergo cascades. As a simple threshold network, the LTM is also similar to artificial models of neural networks, without topology restrictions [rojas2013neural].

Since the work of Shannon [shannon1948mathematical], the bit has been considered the basic unit of information. Therefore, whatever we can learn about the processing of bits can be extended to information processing in non-Boolean systems. The tools of Boolean logic then allow us to begin to develop a formalism linking the LTM and other cascades to information processing in the theory of computing [von1956probabilistic]. In systems of computation or statistical learning, patterns of inputs are mapped to patterns of outputs by Boolean functions [savage1998models, rojas2013neural].

Another way to express this is that a bit is the simplest possible perturbation of a system. Bits can interact via some medium, these interactions can be represented by edges in a network, and Boolean functions describe the results of possible interaction patterns.

Since we aim to study this topic from first principles, we are interested in how the combinatorial space of possible networks interacts with the combinatorial space of possible Boolean functions, via cascades and the control parameters. Particularly, we would like to understand the phase space of Boolean functions computed by LTM nodes on the input (seed) nodes by the cascade action.

From a mathematical perspective, we can treat the brain or other natural systems having $N$ elements in the worst case as a random network, where there are $\binom{N}{2}$ possible connections, yielding $2^{\binom{N}{2}}$ possible networks. Meanwhile, the space of Boolean functions grows exceptionally quickly: there are $2^{2^k}$ unique Boolean functions on $k$ inputs. This immediately makes us ask how this space behaves, and how large networks such as the brain can navigate toward particular functions in this vast space. We also observe that, over all the functions available on $k$ inputs, the decision tree complexity (the depth of the decision tree computing them) appears exponentially distributed, meaning that the vast majority of available functions are complex as $k$ increases.
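
To make this double-exponential growth concrete, a two-line illustration (plain Python; the printed values are exact):

    # Number of unique Boolean functions on k inputs: 2^(2^k)
    for k in (1, 2, 3, 4, 5):
        print(k, 2 ** (2 ** k))   # 4, 16, 256, 65536, 4294967296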

A somewhat surprising initial result of this investigation is that complex functions of the inputs emerge spontaneously and seemingly inevitably as threshold networks are connected at random.

2 Linear Threshold Model (LTM), Boolean logic, and antagonism

The Linear Threshold Model (LTM) [watts2002simple] is defined as follows: A random (Erdős-Rényi-Gilbert) graph is constructed, having $N$ nodes and probability $p$ of an edge between each pair of nodes. Each node $v$ is then assigned a random threshold $\phi_v$ from a uniform distribution, $\phi_v \sim U[0,1]$. Nodes can be unlabelled or labelled, and all are initialized as unlabelled. To run the cascade, a small set of seed nodes is perturbed, i.e. marked as labelled. Then each unlabelled node $v$ is examined randomly and asynchronously, and the fraction $l_v / k_v$ of its graph neighbors that are labelled is determined, where $l_v$ is the number of $v$'s neighbors that are labelled, and $k_v$ is $v$'s degree. If $v$'s fraction reaches its threshold ($l_v / k_v \ge \phi_v$), $v$ is marked labelled. This process continues until no more nodes become labelled. Here we note that the LTM may be written in vector form, and bears some similarity to the artificial McCulloch-Pitts neuron [rojas2013neural].
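
As a concrete illustration of the definition above, here is a minimal Python sketch of an LTM cascade; the helper names (make_ltm, run_ltm_cascade) are ours, not from the paper:

    import random

    def make_ltm(n, p, seed=None):
        # Random (Erdos-Renyi-Gilbert) threshold network: each of the
        # C(n, 2) possible edges exists with probability p, and each
        # node draws a threshold uniformly from [0, 1).
        rng = random.Random(seed)
        neighbors = {v: set() for v in range(n)}
        for u in range(n):
            for v in range(u + 1, n):
                if rng.random() < p:
                    neighbors[u].add(v)
                    neighbors[v].add(u)
        thresholds = [rng.random() for _ in range(n)]
        return neighbors, thresholds

    def run_ltm_cascade(neighbors, thresholds, seeds):
        # Relabel nodes asynchronously, in random order, until no
        # unlabelled node's labelled-neighbor fraction reaches its
        # threshold.
        labelled = set(seeds)
        changed = True
        while changed:
            changed = False
            unlabelled = [v for v in neighbors if v not in labelled]
            random.shuffle(unlabelled)
            for v in unlabelled:
                k_v = len(neighbors[v])
                if k_v == 0:
                    continue
                l_v = sum(1 for u in neighbors[v] if u in labelled)
                if l_v / k_v >= thresholds[v]:  # fraction reaches threshold
                    labelled.add(v)
                    changed = True
        return labelled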

It has been shown that the LTM exhibits percolation, where a giant connected component (GCC) of easily-influenced vulnerable nodes (having $\phi_v \le 1 / k_v$, where $k_v$ is the degree of $v$) suddenly arises at the critical connectivity [watts2002simple].

We observe that cascades in the LTM compute monotone Boolean functions (the output cannot decrease when any input increases) at each node on input perturbation patterns [wilkerson2019universal]. In our numerical experiments, we create the LTM as above, but choose input seed nodes $a$ and $b$ (for $k = 2$ inputs) as the only possible loci of initial perturbation. In one trial, we create a network, freezing network edges and thresholds across all possible input patterns [Table 1, cols. a, b]. For each input pattern we reset non-seed nodes to unlabelled, set seeds according to the inputs, and run the cascade. We then identify the function computed by each node $v$ [Table 1, cols. 0-15].

inputs |                     functions
 a  b  |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
 0  0  |  0  0  0  0  0  0  0  0  1  1  1  1  1  1  1  1
 0  1  |  0  0  0  0  1  1  1  1  0  0  0  0  1  1  1  1
 1  0  |  0  0  1  1  0  0  1  1  0  0  1  1  0  0  1  1
 1  1  |  0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1
Table 1: Truth tables for binary functions. The truth tables of all possible unique binary Boolean functions are shown ($2^{2^2} = 16$ functions). The LTM can only compute monotonically-increasing Boolean functions (columns 0, 1, 3, 5, 7), where the first row equals zero, since the seed nodes are unlabelled. Thus, it cannot compute functions 2, 4, 6, or 8 to 15.
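
A sketch of the experiment just described, reusing run_ltm_cascade from the sketch above; the bit-packing follows Table 1's column indexing, and the function name is ours:

    from itertools import product

    def node_functions(neighbors, thresholds, a, b):
        # Map each node to the index (0-15) of the Boolean function it
        # computes on seeds a and b, with truth-table rows ordered
        # (0,0), (0,1), (1,0), (1,1) as in Table 1.
        functions = {v: 0 for v in neighbors}
        for row, (x_a, x_b) in enumerate(product([0, 1], repeat=2)):
            # Reset labels, perturb seeds per the input pattern, cascade.
            # (Unperturbed seeds participate as ordinary nodes here; a
            # simplification of the experiment in the text.)
            seeds = [s for s, x in ((a, x_a), (b, x_b)) if x]
            labelled = run_ltm_cascade(neighbors, thresholds, seeds)
            for v in neighbors:
                if v in labelled:
                    functions[v] |= 1 << (3 - row)  # Table 1 column index
        return functions

In an LTM realization, the returned indices should fall in the monotone columns {0, 1, 3, 5, 7} of Table 1.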

The zero function, $f_0$ (False), is computed by a simple sub-network, where node $v$ has no path to either seed node [Fig. 1]. Similarly, function $f_1$ (AND) is computed by $v$ with a sub-network having paths from both seed nodes $a$ and $b$, and a threshold $\phi_v$ high enough that both inputs are required. Similar sub-networks allow us to obtain nodes computing the remaining monotone functions [Fig. 1]. These sub-networks are therefore logical automata [von1951general, von1956probabilistic], and we note that they form functional logic motifs in the network [milo2002network].

Figure 1: Logic motifs compute Boolean functions. The simplest LTM sub-networks are logical automata (logic motifs) and compute the monotone functions for $k = 2$ inputs at node $v$ on perturbations of $a$ and $b$. Dashed lines are network paths.

We find that an LTM network cascade will yield a distribution of Boolean functions on its input nodes, and the possible functions computed by network nodes will partition the set of monotone Boolean functions [Fig. 2] (with the exception of the constant function True, which would require a labelled output for unlabelled inputs). Thus the LTM carries out computational cascades on input perturbation patterns.

Figure 2: LTM nodes compute Boolean functions in computational cascades. Iterating through all possible perturbations of input seed nodes $a$ and $b$, each network node must compute some Boolean function on the inputs.

We then obtain monotonically decreasing functions (negations of the LTM) by taking the logical complement of the original LTM labelling rule, so that a node $v$ is instead activated when its fraction of labelled neighbors is less than its threshold ($l_v / k_v < \phi_v$). We call such nodes antagonistic, from which we can construct an antagonistic linear threshold model (ALTM). For 2 inputs, replacing $v$ with an ALTM node, $v$ will compute $f_{14}$ (NAND) and $f_8$ (NOR) [Table 1], and the sub-networks are antagonistic versions of those for $f_1$ (AND) and $f_7$ (OR), respectively [Fig. 1].
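
In code, the ALTM changes only the activation test; a minimal sketch, with an antagonistic flag that is our own bookkeeping:

    def activates(fraction, threshold, antagonistic):
        # LTM nodes fire when their labelled-neighbor fraction reaches
        # the threshold; antagonistic (ALTM) nodes fire on the logical
        # complement of that rule.
        return (fraction < threshold) if antagonistic else (fraction >= threshold)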

Figure 3: Function frequency corresponds to probability of required paths in a rank-ordering. (a) Logarithmic frequency of non-zero functions computed by the ensemble of LTM cascades, for $N$ nodes, average degree $z$, and $k = 2$ inputs, over 500 realizations, reveals an apparent rank-ordering (solid line). Mean frequency is proportional to path probabilities, with a Pearson correlation of approximately 1.0, as predicted both by the probabilities derived from logic motifs ('+') (e.g. see (1)) and by complexity (large dot) via (4) (rescaled and overlaid, both dashed); thus (4) also well-predicts (1). Frequency therefore varies inversely with decision tree complexity. (b) The rank-ordering is more evident for $k = 4$ inputs, appearing as a decreasing exponential; here the Pearson correlation between predicted probability and mean frequency is 0.74. Shaded regions are one standard deviation. Probabilities have been centered and normalized.

A sufficiently large ALTM, by composing monotone decreasing functions (e.g. NAND, NOR), can undergo a cascade to compute any logical function on its nodes, forming a universal basis [savage1998models].

3 Statistics of attractors in the Boolean function space

We experiment first on the LTM, to investigate the observed frequency of Boolean functions in simulation. With a network of $N$ nodes at mean degree $z$, ensembled over 500 realizations, we observe that the frequency of functions is very skewed [Fig. 3(a)]. Experiments for $k = 4$ inputs, again for $N$ nodes at mean degree $z$ and similarly ensembled, also yield an approximate exponential decay of the rank-ordering function [Fig. 3(b)].

We investigate the skewed distribution of these functions by asking "What is the probability of obtaining the simplest network to compute each of these functions?". From Fig. 1, we can derive the probability of each monotone function. For example, if there is no path from seed nodes $a$ and $b$ to some node $v$, we obtain $f_0$, thus

$P(f_0) = (1 - p_{path})^2$,

where $p_{path}$ is the probability of a path between two randomly chosen nodes.

The function $f_1$ (AND) requires paths from both $a$ and $b$ to $v$, thus

$P(f_1) \propto p_{path}^2$.   (1)

However, with percolation in mind, we observe that for large graphs, the probability of paths between nodes approaches the probability that all of the nodes involved belong to the giant connected component (GCC) [newman2018networks].

This gives us, again from Fig. 1,

$P(f_1) \propto S^3$,

where $S$ is the probability for a random node to belong to the GCC.

From [newman2018networks], we have the recursive relation

$S = 1 - e^{-zS}$,   (2)

where $z$ is the mean degree.
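
Equation (2) has no closed-form solution, but fixed-point iteration converges quickly above the percolation threshold; a small sketch (gcc_fraction is our name):

    import math

    def gcc_fraction(z, tol=1e-12, max_iter=10000):
        # Iterate S <- 1 - exp(-z * S); S = 0 is the only solution for
        # z <= 1, and the nonzero fixed point emerges for z > 1.
        s = 1.0
        for _ in range(max_iter):
            s_next = 1.0 - math.exp(-z * s)
            if abs(s_next - s) < tol:
                return s_next
            s = s_next
        return s

For example, gcc_fraction(2.0) returns approximately 0.797: at mean degree 2, roughly 80% of nodes belong to the GCC.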

We subsequently observe that the number of required paths from the seed nodes to a node $v$ computing monotone function $f$ is equal to the decision tree complexity $D(f)$, the depth of the shortest decision tree computing $f$. In order for $v$ to decide the value of a seed node, the seed's perturbation information must be transmitted along a path to $v$.

Taking a Boolean function's Hamming cube representation, its decision tree complexity is complementary to the number of congruent axial reflections along each of its axes (details in supplemental information A.1). That is, if a Boolean function's Hamming cube is constant along an axis, it is independent of that axis, giving us

$D(f) = k - R(f)$,   (3)

where $R(f)$ counts the congruent axial reflections. For example, $f_3 = a$ is constant along the $b$ axis, so $R = 1$ and $D = 2 - 1 = 1$, while $f_1$ (AND) has no congruent reflections, so $D = 2$. In other words, the number of paths a monotone Boolean function requires is exactly the number of axial reflection asymmetries of its Hamming cube.

This allows us to relate function frequency to decision tree complexity. Recall that the critical percolation threshold in an arbitrarily large Erdős-Rényi-Gilbert graph occurs at mean degree $z_c = 1$, a very small connectivity. Near this connectivity the network will be tree-like, since the clustering coefficient vanishes for large $N$ [newman2018networks]. In a tree, the number of nodes is one more than the number of edges, so a motif of $D(f)$ required paths involves $D(f) + 1$ nodes, all of which must belong to the GCC. Thus, as $N \to \infty$,

$P(f) \propto S^{D(f) + 1}$.   (4)

Indeed, it appears that (4) is highly correlated with the probabilities derived from logic motifs (1), and that observed function frequency is proportional to (4) as well [Fig. 3(a)], having a Pearson correlation of approximately 1.0 for $k = 2$, and 0.74 for $k = 4$. Due to (4), this also shows an inverse rank-ordering relation between frequency and decision-tree complexity, appearing as a decreasing exponential in frequency. Given that, as mentioned in the introduction, decision tree complexity appears distributed as an increasing exponential over the set of all Boolean functions, this result is especially surprising.
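
As a toy check of (4) for $k = 2$, reusing gcc_fraction from the sketch above (the choice z = 2 is arbitrary):

    S = gcc_fraction(2.0)
    for name, D_f in [("f3 = a", 1), ("f5 = b", 1),
                      ("f1 = AND", 2), ("f7 = OR", 2)]:
        print(name, S ** (D_f + 1))  # relative frequency predicted by (4)

The single-path functions f3 and f5 should appear roughly 1/S times as often as the two-path functions f1 (AND) and f7 (OR).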

3.1 Function Distribution with Antagonism

A similar simulation, having $N$ nodes and $k$ inputs, ensembled over 500 realizations across a range of mean degree values $z$ and fractions of antagonistic nodes, reveals a sudden increase in the number of unique non-zero functions vs. both $z$ and the antagonism fraction [Fig. 4(a)]. The number of unique functions is maximized over several orders of magnitude of connectivity near criticality, at an intermediate antagonism fraction. Observing that antagonism and inhibition are interchangeable [rojas2013neural] (Supplemental section A.2), this lends support to optimal information processing at a comparable fraction of inhibition found in other research [capano2015optimal], and suggests why such a fraction of inhibitory neurons seems prevalent biologically.

Figure 4: Antagonism fraction agrees with biology; non-monotone functions are also predicted by path requirements. (a) For networks with $N$ nodes and $k = 2$ inputs, over 500 realizations, varying the mean degree $z$ and the fraction of antagonistic nodes, we observe that the mean number of unique functions per network is maximized over several orders of magnitude of $z$ by networks having an intermediate fraction of antagonistic nodes (triangles), coinciding with other findings [capano2015optimal]. (b) At this antagonism fraction we again observe a skewed frequency, and a proportional relationship between function frequency and the probability due to complexity (4), with a Pearson correlation of 0.91. Shaded region is one standard deviation. Probabilities have been centered and normalized. (The constant functions $f_0$ and $f_{15}$ have been removed, since in the ALTM they can occur outside of the GCC.)

Figure 5: Motifs for non-monotone functions. Simplest logic motifs to compute non-monotone Boolean functions, such as $f_6$ (XOR) [Table 1], in the ALTM at a random node $v$, on seed nodes $a$ and $b$. Dashed lines represent paths, and dashed nodes are antagonistic. The negations of these functions have very similar motifs, with each node negated.

For this mix of LTM and ALTM nodes, we again observe a similar rank-ordering of functions, and, as in the LTM, frequency is again proportional to the probability derived from function complexity [Fig. 4(b)], with a Pearson correlation of 0.91.

We note, however, that (3) under-estimates the number of paths required for non-monotone functions. For example, $f_6$ (XOR) requires 4 paths between 5 nodes, all of which must be in the GCC [Fig. 5], so that $P(f_6) \propto S^5$. However, this function's decision tree complexity is $D = 2$, predicting by (4) that $P(f_6) \propto S^3$. Therefore a more informative complexity measure is needed for non-monotone functions.

4 Discussion

As indicated in the title, we see the main result of interest as the spontaneous emergence of complex logic functions in minimally-constrained random threshold networks. This implies that many physical, biological, and other systems are able to perform such computation via their ubiquitous avalanches or cascades.

We note that this result also begins to give us an explanation of the criticality hypothesis vis-à-vis neuroscience [massobrio2015criticality, hesse2014self, shew2013functional]. That is, at the critical threshold, with the emergence of the giant component, the number of unique functions spontaneously increases. Along with that comes an increase in the number of complex functions. As neuronal networks need to compute integrative complex functions on sensory information, or on information passed between modular areas in the brain, the utility of this complexity is self-evident [sejnowski1988computational]. We note that in computational neuroscience, there is also discussion of the integration of information and complexity or consciousness [tononi1994measure, lynn2019physics]. These motifs therefore give us a starting point for the relationship between structure and function as well.

The present work also connects to machine learning and statistical learning, where, in classification, Boolean functions are computed on high-dimensional data. Until now, however, despite their ubiquity in nature, neither criticality nor cascades have played a large role in machine learning as a design paradigm or analytical framework [rojas2013neural]. We see this as a large potential opportunity to improve deep learning methods.

The spontaneous emergence of complex computation is an example of a symmetry-breaking phase transition, as the giant connected component (spanning cluster) comes into existence at the critical connectivity [landau1937broken, anderson1972more]. We conjecture that we are witnessing how complexity of functionality results from symmetry breaking in systems [anderson1972more]. This complexity takes on a distribution that reflects a hierarchy, in an exponential rank-ordering law.

We also see that, from a larger theoretical perspective, the confluence of cascades (percolation branching processes) and information processing by Boolean logic stands at the intersection of several very large and highly developed areas of research: percolation theory and computational automata theory [von1956probabilistic, christensen2005complexity].

The specific mechanism of the logical automata realized by logic motifs extends previous work about network motifs and their function, mainly in the genetic domain [milo2002network], into many other areas, again due to the ubiquity of cascades in threshold networks.

Observing logic motifs as automata also allows us to change our perspective on network percolation. In the past, we saw it perhaps only in terms of the connected-component size distribution. Now, however, we may view these components as a zoo or library of functions, available to the network by connection, much as a programming language imports a function. We note that the scale invariance at criticality may sit at the Pareto-optimal point between complexity and diversity: a small number of larger components compute complex functions, while a great number of very small, simple components offer a large variety of thresholds.

4.1 Future work

In developing this work, we inevitably stumbled across an overwhelming number of ideas and directions that we can take. We can only briefly list them.

We have seen above that other complexity measures could be found for non-monotone functions, to better predict their frequency in mixed LTM/ALTM networks. We suspect that Boolean Fourier analysis would be fruitful here. We also expect that, for larger inputs, these non-monotone functions will dominate the function space, and that the Hamming cube symmetries make it possible to write a partition function for them. Along with this, it should be possible to predict more exact probabilities of functions, which depend on the occurrence of cascades being blocked, and of nodes inheriting their neighbors’ complexity, among other factors.

We would also like to generalize these predictions to more inputs and much larger networks, while understanding mechanisms and heuristics for learning by re-wiring in these large combinatorial spaces. For example, we suspect that modularity develops as a network's capacity to extract complexity from its inputs is exhausted. We also suspect that the function distribution can be understood in terms of multiple network-density percolation thresholds, depending on function path requirements, which should be more evident for larger numbers of inputs.

Furthermore, we intend to study the relation between function and network symmetry in the context of symmetry breaking. We conjecture, for example, that there is a conservation law of complexity or information, meaning that what we call computation comes at the expense of lost information, rendering the network a kind of information engine [landauer1961irreversibility], whose output is computation, and that this lies at the heart of information creation.

Of course, it could also be fruitful to understand this work in terms of information processing, using measures such as transfer entropy, which is of increasing use in computational neuroscience and automata theory [lizier2014framework]. Along with this, we see an opportunity to formalize the criticality hypothesis in light of our results on computation. In the hypothesis, avalanche criticality (the kind of percolation seen here) and the so-called edge of chaos are qualitatively conflated, by saying that information processing is optimized 'near criticality' [beggs2008criticality, jensen2021critical].

We would like to research the effects of geographic energy constraints and other network topologies, found in real-world systems, on the function phase space. For example we conjecture that both modularity and layering will result from restricting geographic connection distance, with a result that complex functions appear at nodes on the surface (or interface) of networks, convenient for passing to subsequent networks.

Finally, although we have used the term computation here, it would be useful to carefully study the linear threshold model as a computing machine, especially when re-wiring, investigating its Turing completeness, run-time, and related phenomena.

5 Conclusion

Here we have shown that the Linear Threshold Model computes a distribution of monotone Boolean logic functions on perturbation inputs at each node in its network, and that with the introduction of antagonism (inhibition), any function can be computed. Notably, complex functions arise in an apparently exponentially decreasing rank-ordering, due to their requirements for perturbation information from seed nodes, and these requirements correspond to their functional asymmetries. These asymmetries yield each function's probability exponent, expressed in terms of the probability of belonging to the network's giant connected component. Finally, we observe that the number of unique functions computed by an LTM of mixed excitatory and antagonistic nodes is maximized at an intermediate antagonism fraction, over several orders of magnitude of connectivity, coinciding with other research.


A Supplementary Information

A.1 Algorithm for determining Decision Tree Complexity

Figure 6: Hamming cube representations of LTM-computable monotone Boolean functions of two variables. Line represents linear separator. Note that for monotone functions, true values must be below or to the right of false values.
1: create labelled Hamming cube $H$ from the function's inputs and outputs
2: $R \leftarrow 0$ (number of congruent reflections)
3: for each axis $i = 1, \ldots, k$ do (for each dimension)
4:     reflect $H$ about axis $i$ to obtain $H'$
5:     if $H' = H$ then $R \leftarrow R + 1$
6: $D \leftarrow k - R$ (complexity = dimensionality - congruent reflections)
Algorithm 1 Decision Tree Complexity using Hamming cube reflections

To determine a Boolean function's Decision Tree Complexity, we create the Hamming cube $H$ on the input values of the function. The number of axes of $H$ is equal to the number of inputs, $k$. We then label each corner of $H$ according to the function values [Fig. 6].

For each input variable, i.e. each of the cube's axes, we reflect the Hamming cube about that axis, obtaining the reflected Hamming cube $H'$. If $H' = H$, we add one to the count of congruent reflections, $R$.

The Decision Tree Complexity is then the dimensionality minus the number of congruent reflections, $D = k - R$. The intuition is that if the Hamming cube of a particular function is congruent to an axial reflection, the function is independent of that axis.
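
A direct Python rendering of Algorithm 1 (a sketch; f is any Boolean function supplied as a callable on k arguments):

    import itertools

    def decision_tree_complexity(f, k):
        # D = k minus the number of axes whose reflection leaves the
        # labelled Hamming cube unchanged (Algorithm 1).
        cube = {x: f(*x) for x in itertools.product((0, 1), repeat=k)}
        congruent = 0
        for axis in range(k):
            reflected = {x: cube[x[:axis] + (1 - x[axis],) + x[axis + 1:]]
                         for x in cube}
            if reflected == cube:  # f is independent of this axis
                congruent += 1
        return k - congruent

    # decision_tree_complexity(lambda a, b: a & b, 2) -> 2   (f1, AND)
    # decision_tree_complexity(lambda a, b: a, 2)     -> 1   (f3)
    # decision_tree_complexity(lambda a, b: 0, 2)     -> 0   (f0, False)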

Thus, paths and their resulting cascades break symmetry and create complexity in the network, realized in the function order parameters.

A.2 Interchangeability of Antagonism and Inhibition

Figure 7: Simplest sub-networks to convert between inhibition and antagonism. Both sub-networks have 2 internal nodes, which implies that there is a 1:1 ratio in the minimal number of nodes to perform either operation.