A characterization of the Edge of Criticality in Binary Echo State Networks

10/03/2018, by Pietro Verzelli, et al.

Echo State Networks (ESNs) are simplified recurrent neural network models composed of a reservoir and a linear, trainable readout layer. The reservoir is tunable by some hyper-parameters that control the network behaviour. ESNs are known to be effective in solving tasks when configured on a region in (hyper-)parameter space called Edge of Criticality (EoC), where the system is maximally sensitive to perturbations, which strongly affect its behaviour. In this paper, we propose binary ESNs, which are architecturally equivalent to standard ESNs but use binary activation functions and binary recurrent weights. For these networks, we derive a closed-form expression for the EoC in the autonomous case and perform simulations in order to assess their behavior in the case of noisy neurons and in the presence of a signal. We propose a theoretical explanation for the fact that the variance of the input plays a major role in characterizing the EoC.


1 Introduction

Acronyms used throughout the paper:

bESN: binary ESN
EoC: Edge of Chaos
ESN: Echo State Network
FIM: Fisher Information Matrix
LSM: Liquid State Machine
MFT: Mean Field Theory
ML: Machine Learning
PDF: Probability Density Function
RBN: Random Boolean Network
RNN: Recurrent Neural Network
RP: Recurrence Plot

Echo State Networks [jaeger2001echo] maximize predictive performance on the Edge of Criticality, or Edge of Chaos (EoC), which is a region in parameter space where the system is maximally sensitive to perturbations [legenstein2007edge]. To date, a complete theoretical understanding of the EoC without mean-field assumptions is missing for input-driven ESNs. This performance maximization at the EoC is known to be common to many dynamical systems, even very simple ones [langton1990computation]. This property has led many researchers to study such models in order to explore the main features associated with the transition to chaos. In this direction, Random Boolean Networks [kauffman1969metabolic] are well-studied models. Their chaotic behavior is well understood [derrida1986random, wang2011fisher, farkhooi2017complete], though their applicability is limited by the fact that they only accept binary input signals.

In this paper, we propose binary ESNs (bESNs). The architecture is equivalent to that of standard ESNs, but simplified in that it considers binary activation functions and binary weights for the recurrent connections. To the best of our knowledge, this architecture has never been investigated before. bESNs also share some similarities with a particular form of RBNs called Random Threshold Networks [rohlf2002criticality], which are based on the same idea of summing neuron inputs. We derive a closed-form expression to determine the EoC in autonomous bESNs (i.e., when the network is not driven by signals) and perform simulations to assess the behavior of bESNs in both the autonomous and the non-autonomous case. We experimentally assess the quality of our theoretical prediction regarding the onset of chaos in autonomous bESNs; results show perfect agreement with the theory. Then, in order to assess network stability, we analyze the impact of noise applied to the neuron outputs on the onset of chaos. Our findings suggest that the chaotic region expands linearly with the noise intensity when the mean degree (i.e., the average number of links a neuron has) is high enough. We also study the EoC for bESNs driven by (continuous) signals, discussing how our findings could be generalized by treating the signal gain as a hyperparameter. This work is also set in the context of reducing model complexity, in which binarization plays an important role from both a theoretical [baldassi2016learning] and an applied perspective [cheng2018model], since it considerably reduces the required hardware resources and speeds up training algorithms.

2 Background material

2.1 Echo State Networks

An ESN is a discrete-time non-linear system with feedback, whose model reads:

$$x[k] = f\!\left(W_r x[k-1] + W_i u[k] + W_o y[k-1]\right) \qquad (1)$$
$$y[k] = g\!\left(W_r^o x[k] + W_i^o u[k]\right) \qquad (2)$$

An ESN consists of a reservoir of $N_r$ neurons characterized by a non-linear transfer function $f(\cdot)$. At time $k$ the network is driven by the input $u[k] \in \mathbb{R}^{N_i}$ and produces an output $y[k] \in \mathbb{R}^{N_o}$, $N_i$ and $N_o$ being the dimensions of the input and output vectors, respectively.

The weight matrices $W_r \in \mathbb{R}^{N_r \times N_r}$ (reservoir internal connections), $W_i \in \mathbb{R}^{N_r \times N_i}$ (input-to-reservoir connections) and $W_o \in \mathbb{R}^{N_r \times N_o}$ (output-to-reservoir feedback connections) contain values in the $[-1, 1]$ interval drawn from a uniform (or sometimes Gaussian) distribution. The output weight matrices $W_r^o \in \mathbb{R}^{N_o \times N_r}$ and $W_i^o \in \mathbb{R}^{N_o \times N_i}$, connecting reservoir and input to the output, represent the readout layer of the network. Activation functions $f(\cdot)$ and $g(\cdot)$ (applied component-wise) are typically implemented as a sigmoid ($\tanh$) and the identity function, respectively. Training requires solving a regularized least-squares problem [jaeger2001echo].
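For illustration, the following minimal NumPy sketch implements one step of Eqs. (1) and (2) for an untrained network; the dimensions, the uniform initialization in $[-1, 1]$, and all variable names are our own illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N_r, N_i, N_o = 100, 1, 1                 # reservoir/input/output sizes (illustrative)

W_r = rng.uniform(-1, 1, (N_r, N_r))      # reservoir internal connections
W_i = rng.uniform(-1, 1, (N_r, N_i))      # input-to-reservoir connections
W_o = rng.uniform(-1, 1, (N_r, N_o))      # output-to-reservoir feedback
W_r_out = rng.uniform(-1, 1, (N_o, N_r))  # readout weights (trained in practice)
W_i_out = rng.uniform(-1, 1, (N_o, N_i))

def esn_step(x, u, y):
    """One step of Eqs. (1)-(2), with f = tanh and g = identity."""
    x_new = np.tanh(W_r @ x + W_i @ u + W_o @ y)
    y_new = W_r_out @ x_new + W_i_out @ u
    return x_new, y_new
```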

Various empirical results suggest that ESNs achieve the highest expressive power, i.e., the ability to provide optimal performances, exactly when configured on the edge of the transition between an ordered and a chaotic regime (e.g., see [jaeger2001echo, yildiz2012re, livi2017determination, rivkind2017local, legenstein2007edge, rajan2010stimulus]). Once the network operates on the edge, or in proximity to it, it achieves the highest memory capacity (storage of past information) and prediction accuracy compatible with the network architecture. For determining the edge of chaos, one usually resorts to computing the maximum Lyapunov exponent [gallicchio2017echo] or to identifying parameter configurations maximizing the Fisher information [livi2017determination].

2.2 Random Boolean Networks

RBNs were first proposed in [kauffman1969metabolic] as a model for the gene regulatory mechanism. The model consists of $N$ Boolean variables $\sigma_i \in \{0, 1\}$, sometimes called spins, whose time-evolution is given by $\sigma_i[t+1] = f_i(\sigma_{i_1}[t], \dots, \sigma_{i_K}[t])$, where each $f_i$ is a Boolean function of $K$ variables, representing the incoming links to the $i$-th element of the net. There exist $2^{2^K}$ possible Boolean functions of $K$ variables. The output of $f_i$ is randomly chosen to be $1$ with probability $p$ and $0$ with probability $1 - p$, so that usually one refers to $p$ as the bias of the network.

RBNs show two distinct regimes, depending on both the value of $K$ and $p$: a phase in which the network assumes a chaotic behaviour and a phase (sometimes called frozen) in which the network rapidly collapses to a stable state. In [derrida1986random], the authors justify this behavior by studying the evolution of the (Hamming) distance of two (initially different) configurations. There, they derive the following condition for the onset of chaos: $2Kp(1-p) > 1$.

3 Binary echo state network

In this section, we introduce bESNs and study the dynamics of a reservoir similar to (1), constituted of binary neurons $x_i \in \{-1, +1\}$ and binary weights $(W_r)_{ij} \in \{-1, 0, +1\}$ (the zero value accounts for the fact that two neurons may not be linked). For simplicity, we will not consider feedback connections (i.e., $W_o = 0$). The bESN system model simplifies as:

$$x[k] = \mathrm{sign}\!\left(W_r x[k-1] + u[k]\, w_i\right) \qquad (3)$$
$$x_i[k] = \mathrm{sign}\Big(\textstyle\sum_j (W_r)_{ij}\, x_j[k-1] + u[k]\,(w_i)_i\Big) \qquad (4)$$

Here, $u[k]$ is the input signal, which we consider to be unfiltered ($w_i = \mathbf{1}$, i.e., the all-ones vector) for simplicity. When $u[k] = 0$ for every $k$ (i.e., there is no input), we say that the system is autonomous. The study of the autonomous system plays an important role, since it allows us to investigate analytically the network dynamics and its properties. Reservoir connections are instantiated according to the Erdős–Rényi model, where each link is created with probability $p_c$. If the link is generated, the weight value is set to $+1$ with probability $p_+$ or to $-1$ with probability $p_- = 1 - p_+$.

The proposed bESN model is controlled by three hyperparameters: (1) $N$, the number of neurons in the network; (2) $\langle K\rangle = p_c N$, the mean degree of the network; (3) $a = p_+ - p_-$, the asymmetry in the weight values. These hyperparameters are related to $p_c$ and $p_+$, although they are easier to interpret: in fact, $\langle K\rangle$ has a natural interpretation in terms of mean neuron degree that does not depend on the network size $N$. The choice of using $a$ is due to the symmetry of the model around the zero value and to the fact that, with this choice, a positive (negative) value of the hyperparameter accounts for a majority of positive (negative) weights. Note that $a$ can vary continuously from $-1$ to $+1$ and $p_\pm = (1 \pm a)/2$. A similar model was proposed in [rohlf2002criticality], but in their work the weights assume a positive or negative value with equal probability, i.e., their model corresponds to ours in the $a = 0$ case.
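To make the model concrete, here is a minimal Python sketch that instantiates a reservoir with the three hyperparameters above and runs the autonomous dynamics; the helper names and parameter values are our own, and the tie-break $\mathrm{sign}(0) = +1$ is an assumption the model above leaves unspecified:

```python
import numpy as np

def make_reservoir(N, K, a, rng):
    """Erdős–Rényi reservoir: each link exists with probability p_c = K/N;
    existing links get weight +1 w.p. (1+a)/2 and -1 w.p. (1-a)/2."""
    p_c = K / N
    links = rng.random((N, N)) < p_c
    signs = np.where(rng.random((N, N)) < (1 + a) / 2, 1, -1)
    return links * signs                  # entries in {-1, 0, +1}

def besn_step(W, x, u=0.0):
    """Eqs. (3)-(4) with w_i = 1; ties of sign(0) broken to +1 (our choice)."""
    s = W @ x + u
    return np.where(s >= 0, 1, -1)

rng = np.random.default_rng(1)
N, K, a = 500, 10, 0.3                    # illustrative hyperparameter values
W = make_reservoir(N, K, a, rng)
x = rng.choice([-1, 1], size=N)           # random initial condition
for _ in range(100):                      # autonomous evolution (u = 0)
    x = besn_step(W, x)
```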

3.1 Edge of chaos in binary ESN

Here, we study two networks with the same weight matrix $W_r$ that are in the states $x$ and $\tilde{x}$; the latter refers to the perturbed network and differs only in one neuron, whose state is flipped. The goal here is to understand under which conditions the time evolution of the perturbed network differs from the original one, i.e., whether the perturbation will spread and significantly impact the network behavior or not.

For $N \to \infty$, the fraction of positive-valued neurons is equal to the probability for a neuron of being positive, namely $p_+$, while the fraction of negative neurons is $p_- = 1 - p_+$. By comparing the original network with the perturbed one, the probability $P$ that the flipped neuron will have an influence on a neuron connected to it is given by two terms: the probability that the neuron state is positive multiplied by the probability of switching from positive to negative, plus an analogous term accounting for the negative part ($p_+ p_{+\to-}$ and $p_- p_{-\to+}$, respectively). Assuming that $p_{+\to-} = p_-$ (i.e., that the probability of turning negative from positive is equal to the probability of being negative) and, analogously, that $p_{-\to+} = p_+$ (which can be seen as a formulation of the annealed approximation introduced in [derrida1986random]), one obtains:

$$P = 2\, p_+ p_- = \frac{1 - a^2}{2} \qquad (5)$$

We now define $\langle K_{out}\rangle$ and $\langle K_{in}\rangle$ as the mean out-degree and in-degree of a neuron, respectively. Since a single neuron has influence on $\langle K_{out}\rangle$ neurons, the expected number of changes is given by $P\langle K_{out}\rangle$, to which one has to add the fact that at least one neuron has changed due to the flip. Therefore, if this number is bigger than half of the mean incoming links of a neuron, i.e., $\langle K_{in}\rangle/2$, then the perturbation will dominate the network dynamics and will propagate. Since in an Erdős–Rényi graph $\langle K_{out}\rangle = \langle K_{in}\rangle = \langle K\rangle$, we obtain the following condition for the onset of chaos:

$$P\langle K\rangle + 1 > \frac{\langle K\rangle}{2} \qquad (6)$$

which, using $P = (1 - a^2)/2$, can be rewritten as:

$$a^2 < \frac{2}{\langle K\rangle} \qquad (7)$$

Note that the mean degree in (7) plays a "stabilizing" role (i.e., for a fixed asymmetry $a \neq 0$, increasing $\langle K\rangle$ eventually drives the network out of the chaotic region), as opposed to RBNs, where increasing the mean degree leads towards the chaotic region.
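For instance, with $\langle K\rangle = 8$, Eq. (7) predicts chaotic behavior only for $|a| < \sqrt{2/8} \approx 0.5$, while for $\langle K\rangle = 50$ the chaotic band narrows to $|a| < 0.2$.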

4 Experiments

4.1 Edge of chaos

In order to assess the agreement between the prediction given by Eq. (7) and experimental results, we conducted an exploration of the parameter space. Here, we exploit the fact that our neurons assume binary states only and consider their Shannon entropy as an indicator for the transition to chaos. The entropy was computed considering a time average over $T$ steps after a transient $t_0$,

$$\bar{H} = \frac{1}{T} \sum_{t=t_0}^{t_0+T} H[t] \qquad (8)$$

where the entropy of a configuration $x[t]$ is estimated as $H[t] = -\frac{N_+}{N}\log_2\frac{N_+}{N} - \frac{N_-}{N}\log_2\frac{N_-}{N}$, in which $N_+$ is the number of neurons whose state is $+1$ and $N_-$ the number of neurons with state $-1$. We expect Eq. (8) to be almost zero in the frozen regime and almost one in the chaotic one, with a sharp region of intermediate values that we consider to be the edge of chaos.

In order to explore the parameter space, we ran a series of simulations using networks of $N$ neurons with different random initial conditions and connection matrices $W_r$, generated using specific values of $\langle K\rangle$ and $a$. In Eq. (8), we used $T$ time-steps and a transient $t_0$ accounting for the initial relaxation from the initial state to a stationary condition. Results are shown in Fig. 1 and demonstrate almost perfect agreement with the theoretical result (7).

Figure 1: Values of $\bar{H}$ for different configurations of the asymmetry $a$ and the mean degree $\langle K\rangle$. The experiment shows a good agreement with the predicted EoC region (dashed line), where we observe an abrupt change of the entropy from $0$ to $1$.
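As a sketch, the indicator (8) can be computed as follows, reusing besn_step and make_reservoir from the snippet in Sec. 3; the transient and averaging lengths are placeholders, not the values used in our experiments:

```python
def config_entropy(x):
    """Entropy of one configuration: H[t] = -sum_s (N_s/N) log2(N_s/N), s in {+1, -1}."""
    p_plus = np.mean(x == 1)
    h = 0.0
    for p in (p_plus, 1.0 - p_plus):
        if p > 0:                        # 0 * log2(0) is taken as 0
            h -= p * np.log2(p)
    return h

def mean_entropy(W, x0, T=1000, t0=100):
    """Time-averaged entropy of Eq. (8); T and t0 are placeholder values."""
    x = x0.copy()
    for _ in range(t0):                  # discard the initial transient
        x = besn_step(W, x)
    acc = 0.0
    for _ in range(T):
        x = besn_step(W, x)
        acc += config_entropy(x)
    return acc / T
```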

4.2 Effects of perturbations on state evolution

In order to assess the effect of chaos on the network behavior, we compare the evolution of bESNs instantiated with the same weight matrix $W_r$ but different initial conditions. Starting from a random initial condition, we generated additional initial conditions by flipping the state of a single neuron (as in Sec. 3.1). Here, an initial condition is randomly generated with a biased probability for a neuron to assume a positive value. We let the original and perturbed networks evolve, and take into account the (normalized) Hamming distance $d(x[t], \tilde{x}[t])$ between trajectories.

Results are summarized in Fig. 2. We observe that, in the ordered phase, perturbations of the initial state have no effect on the network evolution and the Hamming distance of the perturbed trajectory from the original one is zero. As $a$ decreases (i.e., the networks approach the chaotic regime), we observe how the Hamming distance significantly increases, leading to chaos. Note that the maximum value achievable by the (normalized) Hamming distance is $0.5$, corresponding to the distance of two random binary vectors (a larger distance would imply a negative correlation). In the same set of figures, we show three additional indicators, called Energy, Activity, and Entropy. The mean Energy, defined as $E[t] = \frac{1}{N}\sum_i x_i[t]$, quantifies the balance between positive and negative neurons. In the frozen phase, the network almost instantly evolves towards values close to $1$ (cf. the role of $a$, discussed above); approaching the chaotic phase, the energy rapidly decreases to $0$, which is its expected value there. The mean Activity of the network at time-step $t$ is defined as the (normalized) Hamming distance of the current state w.r.t. the previous one, $A[t] = d(x[t], x[t-1])$, i.e., the fraction of neurons that changed their state in one step. As expected, networks operating in chaotic regimes are characterized by an elevated activity. Lastly, we plot the evolution of the Entropy (8) over time. As expected from the theory, transitioning to a chaotic regime is signaled by a sharp increase of entropy.

Figure 2: Mean values of the Hamming distance, Energy, Activity, and Entropy of the perturbed networks, with fixed $N$ and $\langle K\rangle$, for selected values of $a$ (panels (a)-(f); see Sec. 4.2). The $x$-axis represents time. The values of the quantities are plotted in blue, while the dashed red lines show the variance. According to the theoretical formula, the system should turn chaotic for $a^2 < 2/\langle K\rangle$.
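The four indicators of this section can be tracked with a sketch like the following, reusing the helpers above; the flipped neuron index and the number of steps are arbitrary choices:

```python
def hamming(x, y):
    """Normalized Hamming distance between two ±1 configurations."""
    return np.mean(x != y)

def perturbation_run(W, x0, steps=200):
    """Evolve an original and a one-flip perturbed copy, tracking the
    indicators of Sec. 4.2 (distance, energy, activity, entropy)."""
    x = x0.copy()
    x_p = x0.copy()
    x_p[0] *= -1                          # flip a single (arbitrary) neuron
    hist = {"dist": [], "energy": [], "activity": [], "entropy": []}
    for _ in range(steps):
        x_prev = x
        x, x_p = besn_step(W, x), besn_step(W, x_p)
        hist["dist"].append(hamming(x, x_p))
        hist["energy"].append(np.mean(x))         # E[t] = (1/N) sum_i x_i[t]
        hist["activity"].append(hamming(x, x_prev))
        hist["entropy"].append(config_entropy(x))
    return hist
```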

4.3 Impact of noise in bESN edge of chaos

Here, we study how the EoC is influenced when considering an independent noise term for each neuron, $x_i[k] = \mathrm{sign}\big(\sum_j (W_r)_{ij}\, x_j[k-1] + \langle K\rangle\,\nu\,\xi_i[k]\big)$, where $\nu$ is the noise gain, $\xi_i[k] \sim \mathcal{N}(0, 1)$, and $W_r$ is the same as in Eq. (4). The choice of scaling the noise with $\langle K\rangle$ was made to account for the fact that the network stability increases with it, as we discuss below.
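A sketch of this noisy update, under the same conventions as the previous snippets:

```python
def besn_step_noisy(W, x, K_mean, nu, rng):
    """Sec. 4.3 update: each neuron receives an independent Gaussian term
    scaled by <K> * nu, added to the recurrent input of Eq. (4)."""
    xi = rng.standard_normal(len(x))
    s = W @ x + K_mean * nu * xi
    return np.where(s >= 0, 1, -1)
```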

To explore the dependency on $\nu$, we ran an experiment where we fixed $\langle K\rangle$ and plotted $\bar{H}$ versus $a$ and $\nu$. Results are shown in Fig. 3. We can recognize three regimes: (1) for low noise values, the chaotic region remains almost constant; (2) for intermediate values, the chaotic region expands linearly with the noise intensity, up until (3) only chaos remains. We repeated the experiments with different values of $\langle K\rangle$ (not shown) and they all confirm the same linear expansion of the chaotic region (in units of $\langle K\rangle\nu$).

Figure 3: Values of $\bar{H}$ for different configurations of $a$ and the noise gain $\nu$. Note that $\nu$ is multiplied by the mean degree, which is here held fixed.

To verify this fact for a wider range of $\langle K\rangle$, we repeated the experiment in Fig. 1 adding the noise term. It is possible to observe in Fig. 4 how the EoC maintains its shape for lower values of $\langle K\rangle$, while for higher average degrees it deviates from the theoretical prediction and the boundary of the chaotic region depends on $\nu$ only.

Figure 4: The same experiment as in Fig. 1, but in the presence of a noise term with gain $\nu$ (multiplied by the mean degree, so that the noise intensity increases along the $x$-axis). Note how, for higher degrees, the chaotic region is constant and matches the noise-based prediction (dashed line), deviating from the autonomous-case prediction.

Figure 5: Network driven by white noise. Values of $\bar{H}$ for different configurations of $a$ and the signal gain $\varepsilon$. The mean degree was held fixed.

Figure 6: Network driven with the sum of three sinusoids. Values of $\bar{H}$ for different configurations of $a$ and $\varepsilon$. The mean degree was held fixed.

We explain this fact as follows. Neurons can only assume $+1$ or $-1$ values. The probability of a neuron having $n$ positive inputs is then $P(n) = \binom{K}{n}\, p_+^n\, p_-^{K-n}$, where $K$ is its in-degree. If we consider that $s = 2n - K$, where $s$ is the value of the sum of the positive and negative inputs (whose sign determines the value of the neuron), then we obtain $n = (s + K)/2$. The expectation of $n$ is $\mathbb{E}[n] = K p_+$, so that the expectation of $s$ and its variance are:

$$\mathbb{E}[s] = K(2p_+ - 1) = K a \qquad (9)$$
$$\mathrm{Var}[s] = 4 K p_+ p_- = K(1 - a^2) \qquad (10)$$

Note that these values refer to a single neuron. For a general understanding of the network behavior, one simply uses $\langle K\rangle$ instead of $K$ in the expressions above, so that it is possible to consider $\sigma_s^2 = \langle K\rangle(1 - a^2)$ as a mean-field variance of the total input to a neuron. The impact of the noise on the network can then be studied considering the ratio between the noise magnitude and $\sigma_s$. As previously discussed, the noise expands the chaotic region linearly with its standard deviation.
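Equations (9) and (10) are straightforward to check numerically; the sketch below samples the number of positive inputs from the binomial distribution above and compares the empirical moments of $s$ with $Ka$ and $K(1 - a^2)$ (parameter values are arbitrary):

```python
def input_sum_stats(K, a, trials=100000, rng=None):
    """Monte Carlo check of Eqs. (9)-(10): each of K inputs contributes
    +1 w.p. p+ = (1+a)/2 and -1 w.p. p-; s is their sum."""
    rng = rng or np.random.default_rng(2)
    p_plus = (1 + a) / 2
    n = rng.binomial(K, p_plus, size=trials)   # number of positive inputs
    s = 2 * n - K
    return s.mean(), s.var()

m, v = input_sum_stats(K=20, a=0.3)
# expected: mean ≈ 6.0 (= 20 * 0.3), variance ≈ 18.2 (= 20 * (1 - 0.09))
```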

The noise we are considering has standard deviation $\langle K\rangle\nu$. This leads us to a formula for the chaotic region which, for sufficiently large $\langle K\rangle$, takes the form $|a| < c\,\nu$ for a constant $c$. This relation, as shown in Fig. 4, does not depend on $\langle K\rangle$. As such, for Gaussian noise with standard deviation $\sigma_g$, the formula is $|a| < c\,\sigma_g / \langle K\rangle$; in terms of the gain $\nu = \sigma_g / \langle K\rangle$, we have $|a| < c\,\nu$. In our experiments, the constant $c$ was determined empirically.

4.4 Impact of a signal

As for the noise, the magnitude of the signal should play a major role in shaping the EoC, but this time the chaotic region should shrink instead of expanding, since the signal is known to suppress chaos under certain conditions [rajan2010stimulus]. The signal introduces correlations among neurons, which make the annealed approximation ineffective. We drive the network with the signal as in (4), but we scale $u[k]$ with a gain factor $\varepsilon$, since we are interested in its usage as a hyperparameter rather than in its relation to $\langle K\rangle$. We initially feed the network with white noise (note that this is different from what we did in Sec. 4.3, since in this case the noise realization is the same for each neuron): from Fig. 5 one can observe how the chaotic region rapidly shrinks as $\varepsilon$ increases, while a region with intermediate values of entropy expands (linearly). This is due to the fact that the signal prevents the system from collapsing into a stable state, keeping the entropy above zero.

In Fig. 6 we show the results obtained for the normalized sum of three sinusoids with incommensurable frequencies (the experiment was repeated with different numbers of sinusoids, not shown). Again, we note how the chaotic region shrinks as $\varepsilon$ increases, with the appearance of a region of intermediate entropy values which, instead, expands.
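A sketch of this driving scheme, using the helpers defined earlier; the frequencies below are illustrative choices, not those used in the experiments:

```python
def drive_with_sinusoids(W, x0, eps, steps=1000):
    """Drive the reservoir as in Eq. (4) with the normalized sum of three
    sinusoids of pairwise incommensurable frequencies, scaled by gain eps.
    The same scalar input is fed to every neuron (w_i = all-ones vector)."""
    freqs = np.array([1.0, np.sqrt(2), np.pi])   # illustrative frequencies
    x = x0.copy()
    entropies = []
    for t in range(steps):
        u = eps * np.sum(np.sin(freqs * t)) / 3.0
        x = besn_step(W, x, u)
        entropies.append(config_entropy(x))
    return entropies
```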

5 Conclusions

The binary ESN model introduced herein is in principle similar to regular ESNs. However, its simplicity permits a theoretical analysis of some important aspects of the transition to chaos. The expression we derived for the autonomous case perfectly matches the experimental results. Our analysis of noise applied to neuron activations showed how the network stability increases linearly with the mean degree of the recurrent connections. The effects of input signals on the network dynamics are more complex to understand, since they introduce correlations among neurons. Our analysis partially explained the role that the signal magnitude and the mean degree play in shaping the EoC in the non-autonomous case.
