Deep Time-Delay Reservoir Computing: Dynamics and Memory Capacity

06/11/2020 · by Mirko Goldmann et al.

The Deep Time-Delay Reservoir Computing concept utilizes unidirectionally connected systems with time delays for supervised learning. We present how the dynamical properties of a deep Ikeda-based reservoir are related to its memory capacity (MC) and how that relation can be used for optimization. In particular, we analyze bifurcations of the corresponding autonomous system and compute conditional Lyapunov exponents, which measure the generalized synchronization between the input and the layer dynamics. We show how the MC is related to the system's distance to bifurcations and to the magnitude of the conditional Lyapunov exponent. The interplay of different dynamical regimes leads to an adjustable distribution between linear and nonlinear MC. Furthermore, numerical simulations show resonances between the clock cycle and the delays of the layers in all degrees of the MC. Contrary to the MC losses these resonances cause in single-layer reservoirs, in deep reservoirs they can boost separate degrees of the MC and can be used, e.g., to design a system with maximum linear MC. Accordingly, we present two configurations that empower either high nonlinear MC or long-term linear MC.


I Introduction

The reservoir computing paradigm, introduced independently by Jaeger Jaeger2001 and Maass Maass2001, has gained considerable interest for supervised machine learning utilizing dynamical systems. The reservoir computing scheme contains three different parts: the input layer, the reservoir, and the output layer. The reservoir can be any dynamical system, e.g., an artificial neural network, but also a laser with self-feedback. The output layer is trained by a linear weighting of all accessible reservoir states, while the reservoir parameters are kept fixed. This simplification overcomes the main issues of the time-expensive training of recurrent neural networks, such as exploding gradients and high power consumption Werbos1990.

Appeltant et al. Appeltant2011 successfully implemented the RC scheme on a single nonlinear node with delayed self-feedback. In the input layer, time-multiplexing is used to create temporally separated virtual nodes. The reservoir dynamics are hereby given by a delay differential equation, which has been proven to exhibit rich high-dimensional dynamics Hale1993; Erneux2009; Erneux2017; Yanchuk2017. For the training, the temporally separated nodes are read out and weighted to solve a given task. The introduction of time-delay reservoir computing enabled simple optical and opto-electronic hardware implementations, which led to improved computation time scales for supervised learning Brunner2018; Larger2017. Delay-based reservoirs have been successfully applied to a wide range of tasks, such as chaotic time series forecasting and speech recognition.

The success of single-node delay-based reservoir computing has triggered interest in more complex network architectures, like coupled Stuart-Landau oscillators arranged in a ring topology Rohm2018, single nonlinear nodes with multiple self-feedback loops of various lengths Chen2019, parallel usage of multiple nodes Sugano2020, and multiple nonlinear nodes coupled in a chain topology Penkovsky2019a. Further, it was recently shown by Gallicchio et al. Gallicchio2018; Gallicchio2019; Gallicchio2017c that echo state networks with multiple unidirectionally coupled layers, called deepESN, provide a performance boost in comparison to their shallow counterparts. Penkovsky et al. Penkovsky2019a found that unidirectionally coupled delay systems are superior to bidirectionally coupled ones for certain symmetric parameter choices. In the following, we present a general deep time-delay reservoir computing model with asymmetric layers and discuss its dynamical and computational properties.

The paper is structured as follows: In Sec. II we present the deep time-delay reservoir computer model and show its performance at predicting the chaotic Mackey-Glass attractor. Afterward, in Sec. III, we study the dynamical properties of an autonomous two-layer system. The conditional Lyapunov exponent for the non-autonomous system is introduced in Sec. IV and then related, in Sec. V, to the linear and nonlinear memory capacity. In Sec. VI the resonances between the clock cycle and the delay times are presented for two- and three-layer systems.

II Deep Time-Delay Reservoir Computing

II.1 Model

A deep time-delay reservoir computer (deepTRC) consists of $L$ nonlinear nodes with states $x_\ell(t)$, $\ell = 1, \dots, L$. Each node features a self-feedback loop with delay length $\tau_\ell$.

Figure 1: Deep time-delay reservoir computer containing $L$ layers (blue) with delayed self-feedback, where the delay lengths $\tau_\ell$ can vary. The first layer is driven by the time-multiplexed input sequence $J(t)$, whereas every other layer is driven by its preceding layer, weighted with the coupling strength $\kappa_\ell$. The output is given by a linear weighting of all states.

The nodes are coupled unidirectionally, and the first node is the only one fed by the task-specific input sequence. The coupling between the nodes is instantaneous, i.e., without delay. Because of the unidirectional topology and their high-dimensional dynamics, the nodes with their corresponding feedback loops are referred to as layers in the following. The dynamical evolution of layer $\ell$ is given by

$\dot{x}_\ell(t) = F_\ell\big(x_\ell(t),\, x_\ell(t - \tau_\ell),\, v_\ell(t)\big),$   (1)

with $\ell = 1, \dots, L$ and $F_\ell$ a nonlinear delayed feedback function, and the layer-dependent input

$v_1(t) = \gamma J(t), \qquad v_\ell(t) = \kappa_\ell\, x_{\ell-1}(t) \quad (\ell = 2, \dots, L).$   (2)

Time-multiplexing is used to transform a layer into a high-dimensional network. The discrete input is given by the sequence $u_n$, $n = 1, 2, \dots$. This sequence is transformed into the time-continuous input $J(t)$ as follows:

$J(t) = m_k\, u_n \quad \text{for} \quad t \in \big[(n-1)T + (k-1)\theta,\ (n-1)T + k\theta\big),$   (3)

where $T$ is the clock cycle and the scaling factors $m_k$, $k = 1, \dots, N$, determine a mask, which is applied periodically with the period $T$. Such a preprocessing method generates virtual nodes which are distributed temporally with a separation of $\theta = T/N$; see more details in Appeltant2011; Rohm2018. The given deepTRC thus contains $L$ layers with $N$ virtual nodes each, resulting in a total reservoir size of $L \cdot N$. The virtual nodes within the layers correspond to the values $x_\ell\big((n-1)T + k\theta\big)$.
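As an illustration of Eq. (3), the following sketch builds the piecewise-constant drive $J(t)$ on the $\theta$-grid from a discrete input sequence; the function name and the choice of a random binary mask are our own assumptions and are not specified here.

```python
import numpy as np

def masked_input(u, N, rng=np.random.default_rng(0)):
    """Time-multiplex a discrete sequence u_n into a piecewise-constant J(t).

    Each input u_n is held for one clock cycle T = N * theta and multiplied by
    a mask m_1, ..., m_N that repeats with period T, cf. Eq. (3). The result is
    J(t) sampled on the theta-grid, i.e., one value per virtual-node interval.
    """
    m = rng.choice([-1.0, 1.0], size=N)   # random binary mask (assumption)
    return np.outer(u, m).ravel()         # row n holds u_n * m_k, flattened

# usage: J_grid = masked_input(np.random.uniform(-1, 1, size=1000), N=25)
```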

For the training of the deepTRC, all virtual nodes of each layer are read out. The virtual nodes are combined into the global state

$\mathbf{s}(n) = \big(x_1(t_{n,1}), \dots, x_1(t_{n,N}),\ x_2(t_{n,1}), \dots, x_L(t_{n,N})\big)^{\top}, \qquad t_{n,k} = (n-1)T + k\theta.$   (4)

In order to train for a given task with target values $\hat{y}_n$, the global state is weighted linearly,

$y_n = w_0 + \mathbf{w}^{\top}\mathbf{s}(n),$   (5)

where $w_0$ is a constant bias and the weights $\mathbf{w}$ are determined via a linear regression with an optional Tikhonov regularization.
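A minimal sketch of this training step, assuming the global states of Eq. (4) have been collected row-wise into a matrix; the regularization strength is a hypothetical choice.

```python
import numpy as np

def train_readout(S, y_target, reg=1e-8):
    """Tikhonov-regularized linear regression for the output weights, Eq. (5).

    S        : (samples, 1 + L*N) matrix of global states, first column all
               ones so that w[0] acts as the constant bias w_0
    y_target : (samples,) desired outputs
    """
    A = S.T @ S + reg * np.eye(S.shape[1])     # regularized normal equations
    return np.linalg.solve(A, S.T @ y_target)  # weights w, with w[0] = bias

# prediction on unseen data: y = S_test @ w
```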

In the following we will focus on the analysis of the recently introduced opto-electronic reservoir Penkovsky2019a, which is governed by the equations

$\dot{x}_\ell(t) = -x_\ell(t) - \delta_\ell \int_{t_0}^{t} x_\ell(s)\,\mathrm{d}s + \eta_\ell \sin^2\!\big(x_\ell(t-\tau_\ell) + v_\ell(t) + \phi_\ell\big),$   (6)

where $\gamma$ is the input gain entering via $v_1(t) = \gamma J(t)$, and $\kappa_\ell$ are the coupling strengths between the consecutive layers $\ell-1$ and $\ell$, see Eq. (2). Further, $\delta_\ell$ is a damping constant satisfying $0 \le \delta_\ell \ll 1$, $\eta_\ell$ is the feedback gain, and $\phi_\ell$ is a scalar phase shift of the nonlinearity. According to the $\sin^2$ nonlinearity, the system is referred to as an Ikeda time-delay system Penkovsky2019a; Gao2019; Ikeda1979.
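For completeness, a sketch of how system (6) can be integrated with Heun's method, which is also used for the bifurcation diagrams below. The equation form follows the reconstruction above; the function signature, step size, and zero history functions are our assumptions.

```python
import numpy as np

def simulate_deep_trc(J, taus, etas, phis, kappas, deltas, gamma,
                      h=0.01, t_max=1000.0):
    """Heun integration of the deep Ikeda reservoir (6), zero history (sketch).

    J : callable t -> J(t), the time-multiplexed input; only layer 1 sees it.
    Returns the trajectories x of all L layers on the grid of step h.
    """
    L, n_steps = len(taus), int(t_max / h)
    d = [max(1, int(round(tau / h))) for tau in taus]   # delays in grid steps
    x = np.zeros((L, n_steps + 1))                      # zero history functions
    y = np.zeros(L)                                     # integral damping terms

    def f(ell, x_now, x_del, y_ell, drive):
        return (-x_now - deltas[ell] * y_ell
                + etas[ell] * np.sin(x_del + drive + phis[ell]) ** 2)

    for n in range(n_steps):
        t = n * h
        for ell in range(L):
            drive = gamma * J(t) if ell == 0 else kappas[ell] * x[ell - 1, n]
            xd0 = x[ell, n - d[ell]] if n >= d[ell] else 0.0
            xd1 = x[ell, n + 1 - d[ell]] if n + 1 >= d[ell] else 0.0
            k1 = f(ell, x[ell, n], xd0, y[ell], drive)           # predictor
            k2 = f(ell, x[ell, n] + h * k1, xd1, y[ell], drive)  # corrector
            x[ell, n + 1] = x[ell, n] + 0.5 * h * (k1 + k2)
            y[ell] += h * x[ell, n]      # accumulate the damping integral
    return x
```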

II.2 Chaotic Time-Series Prediction

Figure 2: Performance comparison of deepTRCs with up to five layers on the Mackey-Glass prediction task. The black line represents the performance of a single-layer TRC with an increasing number of virtual nodes $N$. The number in each circle marks the number of layers, e.g., an orange circle with 3 means 3 layers with 200 nodes each.

In Fig. 2, we show the performance of the Ikeda-based deepTRC at the prediction of the chaotic Mackey-Glass attractor, i.e., the target is $\hat{y}_n = u_{n+p}$, where $u_n$ is the time series of the Mackey-Glass system and $p$ determines how far into the future it shall be predicted. The chosen prediction horizon corresponds to twice the delay time of the Mackey-Glass system. The parameters of the Mackey-Glass system are as in Jaeger et al. Jaeger2004a. We simulated deepTRCs with up to five layers and varied the number of nodes per layer. Hereby, the separation of nodes is set to $\theta = 1$ and is kept fixed for all following simulations. A Bayesian optimization approach Snoek2012 was used to optimize the feedback gains $\eta_\ell$, the delays $\tau_\ell$, the coupling gains $\kappa_\ell$, and the phase offsets $\phi_\ell$. The Bayesian optimization of the deepTRC becomes much harder for deeper systems due to the increased number of hyperparameters. In general, additional layers improve the performance of the deepTRC as compared to a single-layer TRC. As a remark, the deepTRC enables faster computation at a constant separation of nodes $\theta$: since the clock cycle $T = N\theta$ shrinks with the number of nodes per layer, a deepTRC with $L$ layers is $L$ times faster than a single-layer TRC with the same total number of nodes, e.g., five times faster for $L = 5$, in line with a five times shorter clock cycle.
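For reference, a sketch generating the Mackey-Glass benchmark series; we assume the standard parameter set used by Jaeger et al. Jaeger2004a ($\beta = 0.2$, $\gamma = 0.1$, exponent $10$, delay $17$), while the integration step and transient length are our own choices.

```python
import numpy as np

def mackey_glass(n_samples, tau=17.0, beta=0.2, gamma=0.1, n=10,
                 h=0.1, transient=10_000):
    """Euler integration of dx/dt = beta*x(t-tau)/(1 + x(t-tau)^n) - gamma*x(t),
    subsampled to one point per unit time (sketch)."""
    d = int(round(tau / h))
    steps = transient + n_samples * int(round(1 / h))
    x = np.full(steps + 1, 1.2)        # constant history x(t <= 0) = 1.2
    for k in range(d, steps):
        xd = x[k - d]
        x[k + 1] = x[k] + h * (beta * xd / (1 + xd ** n) - gamma * x[k])
    return x[transient::int(round(1 / h))][:n_samples]
```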

Layer | delay $\tau_\ell$ | damping $\delta_\ell$ | phase $\phi_\ell$ | feedback $\eta_\ell$ | coupling
Layer 1 | 230 | 0 | 0.2 | 0.68 | 4.0
Layer 2 | 457 | 0.01 | 0.2 | 0.8 | 1
Layer 3 | 199 | 0.01 | 1.5 | 0.97 | 0.01
Layer 4 | 27 | 0.01 | 1.28 | 0.83 | 0.13
Layer 5 | 40 | 0.01 | 1.9 | 0.2 | 0.01
Table 1: Parameters of the five-layer deepTRC with the best performance (lowest NRMSE) shown in Fig. 2.

III Dynamics of the Autonomous deepTRC

The dynamics of a delay-based RC play an essential role for its performance Appeltant2011; Dambre2012. In this section we consider an autonomous $L$-layer deepTRC by setting the input gain $\gamma = 0$.

The equilibria of Eq. (6) are given as solutions of the following nonlinear system of equations, obtained by setting all time derivatives to zero:

$x_\ell^{*} = \eta_\ell \sin^2\!\big(x_\ell^{*} + \kappa_\ell x_{\ell-1}^{*} + \phi_\ell\big) \ \text{for}\ \delta_\ell = 0, \qquad x_\ell^{*} = 0 \ \text{for}\ \delta_\ell > 0,$   (7)

with $x_0^{*} \equiv 0$.

System (6) can be linearised around such an equilibrium, which leads to

$\delta\dot{\mathbf{x}}(t) = \mathbf{A}\,\delta\mathbf{x}(t) + \sum_{\ell=1}^{L}\mathbf{B}_\ell\,\delta\mathbf{x}(t - \tau_\ell),$   (8)

where $\mathbf{A}$ contains the linearisation of the instantaneous terms of each layer and $\mathbf{B}_\ell$ the linearisation of the terms shifted by the delay $\tau_\ell$ of layer $\ell$.

The block matrix $\mathbf{A}$ is composed of sub-matrices describing the instantaneous dynamics of, and the coupling between, consecutive layers. According to the unidirectional topology of the Ikeda deepTRC, $\mathbf{A}$ becomes a lower triangular block matrix. Further, since each layer receives delayed feedback only from itself, the block matrix $\mathbf{B} = \sum_\ell \mathbf{B}_\ell$ becomes block diagonal. The sub-matrices of the autonomous Ikeda deepTRC (6) are determined by the derivative of the $\sin^2$ nonlinearity at the equilibrium, with the delayed feedback coefficients

$b_\ell = \eta_\ell \sin\!\big(2(x_\ell^{*} + \kappa_\ell x_{\ell-1}^{*} + \phi_\ell)\big)$

. The eigenvalues of the linearised system can therefore be calculated by solving the characteristic equation

$\det\!\Big(-\lambda\,\mathbb{1} + \mathbf{A} + \sum_{\ell=1}^{L}\mathbf{B}_\ell\, e^{-\lambda\tau_\ell}\Big) = 0,$   (12)

with $\lambda \in \mathbb{C}$. This equation can further be simplified by using that the determinant of a lower triangular block matrix is given by the product of the determinants of its diagonal blocks,

$\prod_{\ell=1}^{L} \Delta_\ell(\lambda) = 0,$   (13)

where $\Delta_\ell$ is the characteristic function of the single layer $\ell$. The characteristic equation of an $L$-layer Ikeda deepTRC (6) is therefore given by

$\prod_{\ell=1}^{L}\Big(\lambda + 1 + \frac{\delta_\ell}{\lambda} - b_\ell\, e^{-\lambda\tau_\ell}\Big) = 0.$   (14)

For the details of the derivation we refer to Appendix B. One can see that the eigenvalues of the linearised autonomous deepTRC are given by the combined set of the eigenvalues of the single layers.

Figure 3: (a) Bifurcation diagram of the first layer as a function of $\eta_1$, where the blue shaded area marks bistability; (b) bifurcation diagram of the second layer for constant $\eta_1$, where the green shaded area marks the coexistence of a stable equilibrium and a stable limit cycle; (c) two-parameter bifurcation diagram of the two-layer system in the $(\eta_1, \eta_2)$ plane, computed using DDE-Biftool. The used parameters are given in Table 2. Bifurcation legend: H (red) – supercritical Hopf, H (green) – subcritical Hopf, SN – saddle-node (blue), F – fold (orange), and PD – period doubling (magenta) bifurcation. For the periodic and chaotic parts of the bifurcation diagrams in (a) and (b), the maxima and minima of the solutions are plotted.

In the following we study the dynamics of the two-layer deepTRC depending on the feedback gains $\eta_1$ and $\eta_2$. The remaining parameters are kept fixed to the values shown in Table 2.

Description | Parameter | Layer 1 | Layer 2
delay time | $\tau_\ell$ | 30 | 30
phase offset | $\phi_\ell$ | 0.2 | 0.2
coupling | $\kappa_\ell$ | 0 | 1
damping | $\delta_\ell$ | 0 | 0.01
initial value | $x_\ell(t \le 0)$ | |
Table 2: Parameter values used for the bifurcation diagrams in Fig. 3.

In Fig. 3 (a) the equilibrium of the first layer is shown as a function of the feedback gain $\eta_1$. The layer exhibits two saddle-node bifurcations, SN$_1$ and SN$_2$; between these two points the first layer possesses two coexisting stable equilibria. Further, it reveals periodic solutions after the supercritical Hopf bifurcation H$_1$, and a period-doubling cascade resulting in chaotic dynamics for larger $\eta_1$. The numerical part of the bifurcation diagram, i.e., the periodic and chaotic solutions in Fig. 3 (a), is computed using Heun's method.

In Fig. 3 (b) the bifurcation diagram of the second layer is shown for a constant feedback gain $\eta_1$ of the first layer. The second layer reveals a subcritical Hopf bifurcation. The unstable limit cycle created there becomes stable in a fold bifurcation of limit cycles. Accordingly, a stable limit cycle and the stable equilibrium coexist in the range between these two bifurcations.

While the first layer can be considered separately, the second layer is driven by the first one. As a result, one has to study the whole system (22) for analyzing the dynamics of the second layer. In Fig. 3 (c) the bifurcation diagram of the full two-layer system is obtained using DDE-Biftool Sieber2016. The bifurcations of the first layer appear as vertical lines (saddle-node, Hopf, and period-doubling bifurcations in blue, red, and magenta), while the Hopf bifurcation of the second layer shows up as a loop in the $(\eta_1, \eta_2)$ plane (green and orange).

In the following sections, the obtained bifurcation diagrams will be compared with other characteristics of the reservoir that describe its memory capacity. In particular, the next section investigates conditional Lyapunov exponents and shows how they restrict the parameter set where the RC can function properly.

IV Conditional Lyapunov Exponent of Deep Time-Delay Reservoirs

To use the two-layer deep Ikeda reservoir, we enable the input into the first layer by setting the input gain $\gamma$ to a nonzero value and solve the delay differential equation system (6) with given initial history functions. The fading memory concept Dambre2012 states that the reservoir needs to become independent of those history functions after a certain time. Therefore, two identical reservoirs with different initial conditions need to approach each other asymptotically. From a dynamical perspective, the reservoir has to show generalized synchronization to its input.

In the following, we check for generalized synchronization of two unidirectionally coupled systems by estimating the maximal conditional Lyapunov exponent of the driven system. This is done by the auxiliary system method, where we initialize two identical systems with different initial conditions and drive both with the same input sequence. The conditional Lyapunov exponent then measures the convergence or divergence rate. If its maximal value is below zero, the state sequences approach each other asymptotically, and therefore the system shows generalized synchronization to the input system. If the exponent is positive, the systems diverge.

In order to calculate the conditional Lyapunov exponent we consider the distance between the solutions of two identical systems with different initial conditions, $\mathbf{x}(t)$ and $\tilde{\mathbf{x}}(t)$:

$\Delta\mathbf{x}(t) = \mathbf{x}(t) - \tilde{\mathbf{x}}(t),$   (15)

where the state of the system over one delay interval is denoted by

$\mathbf{x}^{t}(s) = \mathbf{x}(t + s), \qquad s \in [-\tau_{\max}, 0].$   (16)

As a remark, $x_\ell(t)$ gives the pointwise state of layer $\ell$, whereas $\mathbf{x}^{t}$ is a function of the $L$-layer system state defined over the interval given in (16). The evolution of the distance is again given by a delay differential equation. For small perturbations, we linearise the evolution of (15):

$\Delta\dot{\mathbf{x}}(t) = \mathbf{A}\,\Delta\mathbf{x}(t) + \mathbf{g}\big(t, \Delta\mathbf{x}^{t}\big),$   (17)

where the linear instantaneous part was summarized into a constant matrix $\mathbf{A}$, and the nonlinearity and the time-varying input are included in $\mathbf{g}$. According to Hale1977 the solution of (17) can be estimated as

$\|\Delta\mathbf{x}(t)\| \le C\, e^{\lambda t},$   (18)

where $\lambda$ is the conditional Lyapunov exponent of the non-autonomous reservoir.

For the numerical estimation of the conditional Lyapunov exponent, two identical non-autonomous systems with different initial conditions of the first layer were evaluated. The input sequence was drawn from the uniform distribution on $[-1, 1]$. The distances $d_\ell(n)$ of the state sequences were calculated using the maximum norm over each delay interval $\tau_\ell$:

$d_\ell(n) = \max_{t \in [n\tau_\ell,\ (n+1)\tau_\ell]} \big| x_\ell(t) - \tilde{x}_\ell(t) \big|.$   (19)

An exponential function was fitted according to

$d_\ell(n) \propto e^{\lambda_\ell\, n\, \tau_\ell},$   (20)

where $\lambda_\ell$ determines the numerically approximated conditional Lyapunov exponent for layer $\ell$.
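A sketch of this estimation procedure following Eqs. (19)–(20); the simulator interface is an assumption (it must run the driven reservoir twice with identical input but perturbed history), as is the perturbation size.

```python
import numpy as np

def conditional_lyapunov(simulate, taus, h, n_intervals, eps0=1e-9):
    """Auxiliary-system estimate of the conditional Lyapunov exponents.

    simulate(perturbation) -> array (L, steps): driven reservoir trajectories,
    identical input, history functions shifted by `perturbation`.
    Requires n_intervals * tau_ell / h <= steps for every layer.
    """
    xa, xb = simulate(0.0), simulate(eps0)   # reference and perturbed copy
    lams = []
    for l, tau in enumerate(taus):
        spi = int(round(tau / h))            # grid steps per delay interval
        # maximum norm of the distance on each delay interval, Eq. (19)
        d = np.array([np.abs(xa[l, i*spi:(i+1)*spi] -
                             xb[l, i*spi:(i+1)*spi]).max()
                      for i in range(n_intervals)])
        t = np.arange(n_intervals) * tau
        # slope of log d over time gives lambda_ell, Eq. (20)
        lams.append(np.polyfit(t, np.log(d + 1e-300), 1)[0])
    return np.array(lams)
```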

Figure 4: Numerical estimations of the conditional Lyapunov exponents (a) $\lambda_1$ of the first layer and (b) $\lambda_2$ of the second layer. Black dashed lines show the positions of the bifurcations shown in Fig. 3. In panel (a), two lines (solid and dotted) correspond to two different conditional Lyapunov exponents, which are obtained for two different sets of initial conditions of the first layer. They appear due to the bistability of the layer dynamics for $\eta_1$ between the saddle-node bifurcations. In (b) the initial conditions are set equal to those of the conditional Lyapunov exponent corresponding to the solid line in (a). In (c) the transients of the second layer are shown for two feedback gains $\eta_2$ (blue and red lines), which correspond to the colored crosses in (b). The latter reveals a periodic oscillation with the period of the delay $\tau_2$, leading to the loss of generalized synchronization shown in (b).

In Fig. 4 the numerical estimation of the conditional Lyapunov exponent is presented for a two-layer deepTRC. The parameters are as in Table 2, except the input gain $\gamma$, which was set to a lower value. (The lower input gain was chosen to create low-degree MC, because higher degrees are computationally expensive.) Here, the feedback gains of both layers were scanned systematically.

In Fig. 4 (a) the conditional Lyapunov exponent $\lambda_1$ of the first layer is shown, which depends only on $\eta_1$. The conditional Lyapunov exponent starts to increase with increasing $\eta_1$ and drops shortly after the second saddle-node bifurcation SN$_2$ shown in Fig. 3 (a). In the region between the two saddle-node bifurcations SN$_1$ and SN$_2$, two Lyapunov exponents were computed for different initial conditions, according to the bistability of the autonomous system. The dip shortly after the annihilation of the stable equilibrium with the unstable one reveals the most negative Lyapunov exponent. After the Hopf bifurcation H$_1$ in the first layer, the conditional Lyapunov exponent becomes positive and no generalized synchronization is possible. In other words, the system violates the fading memory condition beyond this point.

The conditional Lyapunov exponent $\lambda_2$ computed for the second layer reflects the dynamics of both layers. We observe from Fig. 4 that $\lambda_2$ has, in general, a smaller magnitude than $\lambda_1$. As a result, a perturbation of the first layer's initial condition stays longer in the system when a second layer is added. For larger values of $\eta_2$, the borders of the region where the conditional Lyapunov exponent is negative are determined by the Hopf bifurcations. The strongly negative conditional Lyapunov exponent in the intermediate range of $\eta_2$ is due to a so-called exceptional point Sweeney2019, where two negative real eigenvalues coalesce. In a parameter range below the subcritical Hopf bifurcation, the second layer loses generalized synchronization before reaching the bifurcation. We assume that, due to the ongoing drive of the system, the second layer is pushed into the basin of attraction of the coexisting limit cycle shown in Fig. 3 (b). Accordingly, the periodic oscillations shown in Fig. 4 (c) (red line) occur, which lead to a loss of generalized synchronization before reaching the subcritical Hopf bifurcation H.

V Conditional Lyapunov Exponents versus Memory Capacity

In this section, we systematically investigate the relation between the conditional Lyapunov exponent and the MC for the two-layer deepTRC. The MC measures how the reservoir memorizes and transforms previous inputs Dambre2012; Koester2020; Harkhoe2019. For more details about the MC and its definition, we refer to Appendix C.

Description | Parameter | Value
separation of nodes | $\theta$ | 1
clock cycle | $T$ | 25
virtual nodes per layer | $N$ | 25
initial steps | |
training steps | |
MC threshold (Fig. 5) | |
MC threshold (Fig. 6) | |
Table 3: Parameter values used for computation of the memory capacity in Fig. 5 and Fig. 6.
Figure 5: Dependence of the memory capacity on the parameters $\eta_1$ and $\eta_2$. (a)–(c) show the linear, quadratic, and cubic memory capacities, while (d) shows the total memory capacity. The black dashed lines show the positions of the bifurcations shown in Fig. 3. The parameters are as given in Table 2.

In Fig. 5, we show the linear, quadratic, cubic, and total MC for the parameter values given in Table 3. The maximal total memory capacity can be reached in a wide parameter range, as shown in Fig. 5 (d), except for the regions with periodic solutions or a strongly negative conditional Lyapunov exponent. In all four panels, a clear drop of the MC is visible shortly after the supercritical Hopf bifurcation in the first layer, due to the violation of fading memory. Also, the drop of the MC occurs before reaching the subcritical Hopf bifurcation in layer two, which is in agreement with the loss of generalized synchronization shown in Fig. 4. More specifically, we observe the following features:

  • A large memory capacity, which is necessary for a reservoir computer to perform its tasks, is observed in the regions where the conditional Lyapunov exponent is negative and the fading memory condition is satisfied.

  • The highest linear MC can be achieved close to the bifurcations where the conditional Lyapunov exponent is negative and small in absolute value. In such a case, the linear information of the input stays longer in the system.

  • With decreasing conditional Lyapunov exponent, the linear MC decreases and the quadratic MC starts dominating. With a further decrease of the conditional Lyapunov exponent, the third-order MC becomes dominant. We remark that there is always a trade-off between the MCs of different degrees, since the total MC bounds their sum.

In conclusion, different dynamical regimes of a deepTRC can boost different degrees of the MC.

VI Resonances Between Delays and Clock Cycle

As recently shown analytically and numerically by Stelzer et al. Stelzer2019, resonances between the delay time and the clock cycle lead to a degradation of the linear memory capacity due to a parallel alignment of eigenvectors of the underlying network. This effect was later shown for all degrees of the memory capacity by Harkhoe et al. Harkhoe2019 and Köster et al. Koester2020 independently. This loss of total memory capacity at resonances of the delay time and the clock cycle further results in reduced performance at certain tasks.

In the following, we analyze this effect for two- and three-layer deepTRCs by computing the single degrees of the memory capacity up to the cubic degree while scanning the delays $(\tau_1, \tau_2)$ and $(\tau_2, \tau_3)$, respectively. The system parameters are as in Table 2 and the simulation parameters are shown in Table 3.

VI.1 Two-Layer deepTRC

Figure 6: (a)–(c) MC of degrees 1, 2, 3 and (d) total MC of the two-layer deepTRC; the clock cycle is $T = 25$ with $N = 25$ virtual nodes per layer. The red triangles mark the delay setting used in Fig. 5. (e)–(g) MC of degrees 1, 2, 3 and (h) total MC of the three-layer deepTRC with $\tau_1$ kept fixed and the same $T$ and $N$.

In Figs. 6 (a)–(d) the numerically computed memory capacities of a two-layer deepTRC are shown as functions of the delays $\tau_1$ and $\tau_2$. Resonances between the delays and the clock cycle $T$ are present in all shown degrees of the MC. Further, resonances between the two delays occur as diagonals in the plots, with the main diagonal $\tau_1 = \tau_2$ being dominant. In contrast to the off-diagonal resonances, the main resonance broadens with higher delays. The total memory capacity exhibits weak degradations at the diagonal delay–delay and the clock cycle–delay resonances. The comparison between the linear and nonlinear MC reveals the trade-off between both, where the linear MC becomes dominant for a long first-layer delay combined with a short second-layer delay.

In contrast to the reported linear MC degradations at the delay–clock cycle resonances of single-layer TRCs, a new effect is visible in Fig. 6 (a), where a clock-cycle resonance crosses the main diagonal. We observe that, when $\tau_1$ increases along this resonance, the linear MC is degraded for $\tau_1 < \tau_2$, while it is boosted for $\tau_1 > \tau_2$.

Figure 7: (a) Linear recallability of past inputs for a single-layer TRC (blue) and with an additional second layer (red). Here, the second layer augments the intervals with low MC of the first layer. (b) Overall linear memory capacity for the single-layer (blue) and the strongly superior two-layer system (red-blue dashed).

In the following, we present an explanation for the boosted linear MC of a two-layer deepTRC with $\tau_1 > \tau_2$. For this, we compute the linear recallability $R(s)$ of inputs $s$ steps into the past and separate between the two layers Jaeger2004a. Hereby, $R(s) = 1$ means the reservoir can fully recover the input from $s$ clock cycles ago. As shown in Fig. 7, increasing $\tau_1$ for a single-layer TRC leads to an improved recallability of inputs farther in the past. For long delays, however, the linear MC splits up into small intervals with a high MC alternating with intervals of almost no MC, visible as blue rays in Fig. 7. The length of the intervals of vanishing MC further increases for longer delay $\tau_1$.

The addition of a second layer with a short delay $\tau_2$, while the delay of the first layer is long, increases the range of recallable inputs. Furthermore, the second layer can fill up the intervals of low MC. In Fig. 7 (a) the recallability of both layers is presented in different colours. Here, the areas with a high recallability in layer 1 (blue) are followed by areas of high recallability in layer 2 (red), which results in an overall higher linear MC. Again, resonance effects between the delays and the clock cycle occur, resulting in degradations.

VI.2 Three-Layer deepTRC

In Figs. 6 (e)–(h), we show the MC of a three-layer deepTRC with $N = 25$ virtual nodes per layer. The delay of the first layer, $\tau_1$, is fixed, and the $(\tau_2, \tau_3)$ plane is scanned. The linear and quadratic MC exhibit strong resonances at $\tau_2 = \tau_1$ and $\tau_3 = \tau_1$, which increase the linear MC. The resonances with multiples of the clock cycle occur as well, but they are less prominent. The resonances between $\tau_2$ and $\tau_3$ occur as the main diagonal in the plots. In comparison to the two-layer case, the regions of high linear and cubic MC are not separated by the main diagonal; here they are divided by the resonances with the first layer. The cubic MC becomes maximal when the delays of all three layers are equal. The total MC is almost equal for all $\tau_2$ and $\tau_3$, showing a small general loss and weak degradations at the resonances with the clock cycle.

VI.3 Memory Capacity Distribution of deepTRC

In this section, we show the role of multiple layers of the deepTRC and present ways to systematically create a certain memory distribution in a deepTRC, using the knowledge obtained in the previous sections. In particular, we present two configurations: the first one uses the deep layers to produce a large higher-order MC; the second one produces an increase of the linear MC with a growing number of layers.

We start with a single-layer TRC. Figure 8 (a) shows the MC distribution for an increasing number of virtual nodes $N$. The increase of the node number leads mainly to an increase of the quadratic MC and, for larger $N$, a slow increase of the cubic MC. As a remark, increasing the number of virtual nodes of a single-layer TRC changes the clock cycle $T = N\theta$, and we therefore adjusted the delay accordingly.

In Fig. 8 (b), we keep the number of nodes per layer fixed and add layers with the same number of nodes to the deepTRC. The delays of all layers are kept fixed and equal, as is the clock cycle. We observe that additional layers lead to an increase of the nonlinear MC of higher orders.

The third configuration, shown in Fig. 8 (c), is similar to (b), but now the delays were varied in order to boost only the linear MC. Here, the augmentation method described in Sec. VI.1 was extended to $L$ layers with descending delays $\tau_1 > \tau_2 > \dots > \tau_L$. The last layer has a delay close to the clock cycle, because there a single layer shows its highest linear MC before the MC splits into rays. As shown in Fig. 8 (c), this method produces a large linear MC while suppressing higher-order MC.

Possible combinations of the presented delay configurations can be a good method for adjusting a deepTRC's memory to specific tasks. The deepTRC configurations in Fig. 8 show small general losses of the total MC with an increasing number of layers, which are caused by linear dependences between different nodes.

Figure 8: Distribution of the memory capacities of different degrees for (a) a single-layer TRC, (b) a deepTRC with constant delays, and (c) a deepTRC with optimal linear MC using descending delays. The white number at the foot of each bar marks the number of layers $L$.

VII Conclusion

We analyzed a deep time-delay Ikeda system and presented a relation between the dynamics of the autonomous deepTRC and the numerically computed conditional Lyapunov exponent. We showed a correlation between the MC and the conditional Lyapunov exponents. A high linear MC is observed for small negative conditional Lyapunov exponents. With decreasing Lyapunov exponent, different degrees of the MC are sequentially activated. Further, we investigated the clock cycle–delay resonances in different layers as well as delay–delay resonances. The degradation of the total MC at clock cycle–delay resonances was shown numerically. We explained the boost of the linear MC at specific delay–clock cycle resonances by an augmentation effect. Additionally, we used the gained insights to present two general delay configurations, one with an increased nonlinear MC and one with a boosted linear MC. These configurations provide a variability superior to that of a single-layer TRC. They offer a potential for building general deepTRCs, which can be oriented toward different tasks due to a broad MC spectrum. We could verify that the deepTRC concept is a promising architecture for fast and versatile reservoir computing.

Acknowledgements

The authors would like to thank Florian Stelzer for fruitful discussions. M.G. acknowledges financial support provided by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through the IRTG 1740. K.L. and F.K. acknowledge support from the Deutsche Forschungsgemeinschaft in the framework of the CRC 910. S.Y. acknowledges financial support by the Deutsche Forschungsgemeinschaft, Project 411803875.

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Appendix A Normalised Root Mean Squared Error

The performance at a certain task is measured using the Normalized Root Mean Squared Error (NRMSE), given by

$\mathrm{NRMSE} = \sqrt{\dfrac{\|\mathbf{y} - \hat{\mathbf{y}}\|^2}{K\,\sigma^2(\hat{y})}},$   (21)

where $K$ is the length of the testing sequence, $\mathbf{y}$ is the reservoir output, and $\hat{\mathbf{y}}$ are the task-dependent desired outputs. Further, $\sigma^2(\hat{y})$ is the variance of the desired output and $\|\cdot\|$ is the Euclidean norm.
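A direct transcription of Eq. (21) (a sketch; the function and argument names are our own):

```python
import numpy as np

def nrmse(y_pred, y_true):
    """Normalized root mean squared error, cf. Eq. (21)."""
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2) / np.var(y_true)))
```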

Appendix B Stability and Hopf Bifurcation Analysis

This section gives a detailed analysis of the autonomous dynamics of the two-layer deep Ikeda time-delay system given in (6), which we repeat here for convenience for $\gamma = 0$ and $\delta_1 = 0$ (cf. Table 2):

$\dot{x}_1(t) = -x_1(t) + \eta_1 \sin^2\!\big(x_1(t-\tau_1) + \phi_1\big),$
$\dot{x}_2(t) = -x_2(t) - \delta_2 \int_{t_0}^{t} x_2(s)\,\mathrm{d}s + \eta_2 \sin^2\!\big(x_2(t-\tau_2) + \kappa_2 x_1(t) + \phi_2\big).$   (22)

By setting the derivatives to zero, the equations for the equilibrium are given by

$x_1^{*} = \eta_1 \sin^2\!\big(x_1^{*} + \phi_1\big), \qquad x_2^{*} = 0,$

where the equilibrium of the first layer cannot be given explicitly. In order to check the stability we linearise the system at its equilibrium. The linearisation can be rewritten into a lower triangular block matrix form as shown in section III, and therefore the resulting characteristic equation is given by

$\Delta_1(\lambda)\,\Delta_2(\lambda) = \Big(\lambda + 1 - b_1 e^{-\lambda\tau_1}\Big)\Big(\lambda + 1 + \frac{\delta_2}{\lambda} - b_2 e^{-\lambda\tau_2}\Big) = 0,$

with $b_1 = \eta_1 \sin\!\big(2(x_1^{*} + \phi_1)\big)$, $b_2 = \eta_2 \sin\!\big(2(\kappa_2 x_1^{*} + \phi_2)\big)$, and $\lambda$ being the eigenvalues of the autonomous deepTRC.

An equilibrium is asymptotically stable if the real part of every solution of the characteristic equation is negative,

$\mathrm{Re}(\lambda) < 0,$   (23)

which we check in the following for layer 1 and layer 2 separately.

Stability of Layer 1

The eigenvalues of the first layer are roots of the equation $\Delta_1(\lambda) = \lambda + 1 - b_1 e^{-\lambda\tau_1} = 0$, and they are given by the Lambert $W$ function:

$\lambda_k = \frac{1}{\tau_1} W_k\!\big(b_1 \tau_1\, e^{\tau_1}\big) - 1,$

with $W_k$ being the $k$-th branch of the Lambert $W$ function.

The stability of the equilibrium point changes if eigenvalues cross the imaginary axis. Therefore we set

$\lambda = i\omega,$

where $\omega \in \mathbb{R}$. Separating the real and imaginary parts, we can rewrite this into

$1 = b_1 \cos(\omega\tau_1), \qquad \omega = -b_1 \sin(\omega\tau_1);$

using the absolute values this leads to

$\omega^2 = b_1^2 - 1,$

with $|b_1| < 1$ being stable and $b_1 < -1$ becoming unstable due to an Andronov-Hopf bifurcation.
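The layer-1 spectrum can be evaluated numerically with SciPy's Lambert $W$ implementation (a sketch based on the reconstruction above; the coefficient $b_1$ must first be computed from the equilibrium):

```python
import numpy as np
from scipy.special import lambertw

def layer1_eigenvalues(b1, tau, n_branches=20):
    """Eigenvalues lambda_k = W_k(b1*tau*e^tau)/tau - 1 of the first layer,
    following from the characteristic equation lambda + 1 = b1*exp(-lambda*tau)."""
    z = b1 * tau * np.exp(tau)
    return np.array([lambertw(z, k=k) / tau - 1
                     for k in range(-n_branches, n_branches + 1)])

# stability check: the equilibrium is stable if all real parts are negative
# stable = np.all(layer1_eigenvalues(b1, tau).real < 0)
```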

Stability of Layer 2

For the second layer we analyse the second factor of the characteristic equation, given by

$\Delta_2(\lambda) = \lambda + 1 + \frac{\delta_2}{\lambda} - b_2\, e^{-\lambda\tau_2} = 0.$   (24)

By substituting $\lambda = i\omega$ it is straightforward to find the frequencies at which the eigenvalues can cross the imaginary axis:

$\omega_\pm = \frac{\sqrt{b_2^2 - 1 + 4\delta_2} \pm \sqrt{b_2^2 - 1}}{2}.$   (25)

The Hopf bifurcations of the second layer occur in regions where $|b_2| > 1$. Denoting by $\theta_\pm$ the phase of $\big(1 + i(\omega_\pm - \delta_2/\omega_\pm)\big)/b_2$, the delay values for the stabilizing and destabilizing Hopf bifurcations are given as

$\tau_2^{(k),+} = \frac{2\pi k - \theta_+}{\omega_+},$   (26)
$\tau_2^{(k),-} = \frac{2\pi k - \theta_-}{\omega_-}, \qquad k \in \mathbb{Z}.$   (27)

Appendix C Memory Capacity

The task-independent memory capacity (MC) introduced by Dambre et al. Dambre2012 determines how a dynamical system memorizes previous inputs and how it further transforms them. The total MC is given by the sum over all degrees $d$ of the MC:

$MC = \sum_{d=1}^{\infty} MC_d.$   (28)

In the following, $MC_1$ refers to the linear and $MC_{d \ge 2}$ to the nonlinear MC (quadratic, cubic, …). Hereby, the linear MC is given by a simple linear recall of past inputs, whereas the nonlinear MC gives evidence about which computations of past inputs are performed by the system. Further, it was proven that the read-out dimension of a system bounds the maximal reachable total MC Dambre2012. According to this fact, a trade-off between the linear and nonlinear MC is obtained.
The MC can be calculated via the correlations between transformed inputs and the reservoir states:

$MC_d = \sum_{\hat{y}:\ \deg\hat{y} = d} C[\hat{y}],$   (29)

$C[\hat{y}] = \frac{\langle \hat{y}\,\mathbf{s}^{\top}\rangle_t\, \langle \mathbf{s}\,\mathbf{s}^{\top}\rangle_t^{-1}\, \langle \mathbf{s}\,\hat{y}\rangle_t}{\langle \hat{y}^{2}\rangle_t},$   (30)

where $\langle\cdot\rangle_t$ is the average value over the time $t$, $(\cdot)^{-1}$ is the inverse, and $(\cdot)^{\top}$ the transpose of a matrix. The calculation of the MCs via the correlations given in (30) is biased due to finite statistics, and this bias strongly depends on the length of the evaluated sequence. Therefore we manually set a threshold, meaning that no capacities below this threshold are counted.
As suggested by Dambre et al. Dambre2012, the input values were drawn from a uniform distribution on $[-1, 1]$. In order to compute the different degrees of MC we used the set of Legendre polynomials $P_d$, which provide orthogonality over the given input range. For example, a target with total degree $2 + 1 + 1 = 4$ and three variables is given by:

$\hat{y}_n = P_2(u_{n-1})\, P_1(u_{n-3})\, P_1(u_{n-5}).$   (31)

In order to find all contributing capacities, a maximal step into the past was set.
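A sketch of this procedure: the capacity of Eq. (30) estimated by least squares, and Legendre product targets as in Eq. (31). The function names are ours; the state matrix must contain the reservoir readouts aligned with the input indices.

```python
import numpy as np
from scipy.special import eval_legendre

def capacity(S, y):
    """C[y] of Eq. (30): fraction of the target variance captured by a
    linear readout of the (centered) states."""
    S = S - S.mean(axis=0)
    y = y - y.mean()
    w = np.linalg.lstsq(S, y, rcond=None)[0]   # solves <s s^T> w = <s y>
    return float(y @ (S @ w)) / float(y @ y)

def legendre_target(u, degrees):
    """Product target prod_i P_{d_i}(u_{n-i}) for a dict {lag: degree},
    cf. Eq. (31); u must be drawn uniformly from [-1, 1]."""
    n0 = max(degrees)                          # largest lag
    y = np.ones(len(u) - n0)
    for lag, deg in degrees.items():
        y *= eval_legendre(deg, u[n0 - lag: len(u) - lag])
    return y

# example: the degree-4 target of Eq. (31), evaluated against states S
# aligned with inputs u[n0:]:
# C = capacity(S, legendre_target(u, {1: 2, 3: 1, 5: 1}))
# MC_d sums such capacities over all lag/degree combinations of total
# degree d, discarding values below the threshold of Table 3.
```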

References