Master Memory Function to calculate linear memory capacity of a time-delay based reservoir computer.
We show that many delay-based reservoir computers considered in the literature can be characterized by a universal master memory function (MMF). Once computed for two independent parameters, this function provides linear memory capacity for any delay-based single-variable reservoir with small inputs. Moreover, we propose an analytical description of the MMF that enables its efficient and fast computation. Our approach can be applied not only to reservoirs governed by known dynamical rules such as Mackey-Glass or Ikeda-like systems but also to reservoirs whose dynamical model is not available. We also present results comparing the performance of the reservoir computer and the memory capacity given by the MMF.
Reservoir computing is a neuromorphically inspired machine learning paradigm which enables high-speed training of recurrent neural networks and is capable of solving highly complex time-dependent tasks. First proposed by Jaeger and inspired by the human brain, it utilizes the inherent computational capabilities of dynamical systems. Very recently, the universal approximation property has also been shown for a wide range of reservoir computers, which solidifies the concept as a broadly applicable scheme. Bollt pointed out a connection between reservoir computers and VAR (vector autoregressive) and nonlinear VAR machines, which may be one of the reasons behind the surprising efficiency of reservoir computers for time-dependent tasks [4, 5]. Many different realizations [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] have shown the relevance of reservoir computing to practical applications, while analytical and numerical analyses [19, 20, 21, 22] help in building an understanding of its working principles and in improving its performance. Motivated by fast inference and low energy consumption, optoelectronic and optical hardware implementations of reservoir computers are often realized [23, 24, 25, 26, 27, 28, 29, 30, 31, 32], indicating a high future potential of such processing units.
Originally, reservoir computing was performed with a network of nonlinear nodes, which projects the input information into a high-dimensional phase space, allowing a linear regression to linearly separate features. In time-delayed reservoir computing, a single dynamical node with delayed feedback is employed as a reservoir instead of the network. The time-multiplexing procedure allows such a single-element system to implement a recurrent ring network [33, 34, 35], see Fig. 1. The absence of the need for a large number of nonlinear elements significantly reduces the complexity of hardware implementations of the reservoir. Existing experimental and numerical realizations show promising results in solving time-dependent tasks, such as speech recognition and time-series prediction [36, 31, 37, 38, 39, 40, 41, 42, 43], or equalization tasks on nonlinearly distorted signals. For a general overview, we refer to [45, 46, 47].
Often, reservoirs are optimized for a specific task by hyperparameter tuning, which defeats the purpose of reservoir computing as a fast trainable machine learning scheme. Dambre et al. introduced a task-independent quantification of a reservoir computer, building on the memory capacity notion introduced earlier by Jaeger, where a high memory capacity points to generally well-performing reservoirs.
In this paper, we provide an analytical tool for finding promising reservoir setups by introducing a master memory function (MMF) for delay-based reservoir computing with small inputs. The MMF allows fast computable predictions of the linear memory capacity, and it indicates that the linear memory capacity is similar for reservoirs with similar linearizations.
The main idea behind our method can be outlined as follows. Consider a delay-based reservoir described by a general nonlinear delay system x'(t) = f(x(t), x(t−τ), ηI(t)), where I(t) is an input signal that is "small" in a certain sense, η is the input strength, and x(t) determines the state of the reservoir. The response of the reservoir must be independent (at least to some extent) of its initial state, a property known as the echo state property. Such a situation occurs when the reservoir is operating near an equilibrium state x* that is stable in the absence of the input signal. Therefore, all reservoir dynamics takes place in a neighborhood of this equilibrium, and as a result, the reservoir linearization δx'(t) = a δx(t) + b δx(t−τ) + ηI(t) approximates these dynamics. Here δx(t) = x(t) − x* is the deviation from the equilibrium. In the considered case of a single-variable reservoir, the scalar parameters a and b are the only determining quantities. The relatively simple form of the linearized system allows us to obtain an analytical expression for the linear memory capacity, which depends on the parameters a and b and thus parametrically determines the linear memory capacity of any reservoir with the above properties. We call the obtained function the MMF due to its universal features, i.e., different reservoir computing setups that possess the same linearization (the same a and b) yield the same linear memory capacity given by the MMF.
The paper is structured as follows. First, we briefly revise the concept of time-delay-based reservoir computing and the concept of linear memory capacity. We then present our main analytical result, together with example code for an efficient evaluation of the obtained expression; the derivation is given in the appendix. Finally, comparisons of the numerically simulated reservoir computer performance with the semianalytical approach are provided. We also show in Sec. 4.6 the application of our results to reservoirs with an unknown dynamical model, where the parameters a and b are evaluated from the system response to an external stimulus.
Reservoir computing utilizes the intrinsic ability of dynamical systems to project the input information into a high-dimensional phase space. By linearly combining the responses of the dynamical reservoir to inputs, a specific task is approximated. In the classical reservoir computing scheme, often a so-called echo state network is used, feeding the input into a spatially extended network of nonlinear nodes. Linear regression is then applied to minimize the Euclidean distance between the output and a target. This approach is particularly resourceful for time-dependent tasks because the dynamical system that forms the reservoir acts as a memory kernel.
In the time-delay-based reservoir computing scheme, the spatially extended network is replaced by a single nonlinear node with a time-delayed feedback loop. A time-multiplexing procedure with a periodic mask function is applied to translate the input data into a temporal input signal. Similarly, the time-multiplexing procedure translates the single temporal high-dimensional reservoir response into the spatio-temporal responses of virtual nodes. The virtual nodes play the same role as the spatial nodes in echo state networks.
A sketch of the delay-based reservoir computing setup is shown in Fig. 1. In the following, we will give a short overview of the quantities and notations used in this paper. We also refer to our previous works [49, 50, 51] for a detailed explanation of how the reservoir setup is operated and task-independent memory capacities are computed.
Let us briefly recall the main ingredients of the time-multiplexed reservoir computing scheme [33, 50, 51, 49]. We apply an input vector u = (u_1, ..., u_K) componentwise at times kT, k = 1, ..., K, with K being the number of sample points. The administration time for different inputs is the same and is called the clock cycle T. To achieve a high-dimensional response to the same input, a T-periodic mask function multiplies the input, and the resulting signal enters the system (see Fig. 1 and Fig. 2). The mask is a piecewise-constant function on N_V intervals, each of length θ = T/N_V, corresponding to the N_V virtual nodes. The values w_j of the mask function play the same role as the input weights in spatially extended reservoirs, with the difference that time-multiplexing distributes the weights over time. The responses of the reservoir are collected in the state matrix S, see Fig. 3. The elements of the state matrix are s_{kj} = x(kT + jθ) − ⟨x(·T + jθ)⟩ with k = 1, ..., K and j = 1, ..., N_V, where x is the state of the dynamical element of the reservoir, shifted by the mean ⟨x(·T + jθ)⟩ over all clock cycles. The average can be understood as averaging over the row elements. For example, for an experimental or numerical realization of the reservoir with a semiconductor laser, x could be the laser intensity.
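As an illustration, the time-multiplexing step described above can be sketched in a few lines; the node count, node distance, and the random mask and input values below are placeholder choices for the sketch, not the parameters used in the paper.

```python
import random

N_V = 4          # number of virtual nodes
theta = 0.2      # virtual node distance
T = N_V * theta  # clock cycle
K = 3            # number of input samples

random.seed(1)
u = [random.uniform(-1.0, 1.0) for _ in range(K)]    # input sequence u_k
w = [random.uniform(-1.0, 1.0) for _ in range(N_V)]  # mask weights w_j

def masked_input(t):
    """Piecewise-constant signal fed into the reservoir: w_j * u_k,
    where k is the current clock cycle and j the current node interval."""
    k = int(t // T)             # which input sample
    j = int((t % T) // theta)   # which virtual node interval within the cycle
    return w[j] * u[k]

# within one clock cycle the input sample is fixed and only the mask changes
assert masked_input(0.1) == w[0] * u[0]
assert masked_input(0.3) == w[1] * u[0]
```

The same input sample u_k is thus held for a whole clock cycle while the mask sweeps over the virtual nodes, which is what produces the N_V distinct responses per input.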
A linear combination of the state matrix columns is given by ŷ = SW, where W is a vector of weights. Such a combination is trained by ridge regression, i.e., the least-squares approximation to some target vector ŷ_T:

W = argmin_W ( ||SW − ŷ_T||² + λ||W||² ),

where ||·|| is the Euclidean norm and λ is a Tikhonov regularization parameter. The solution to this problem is

W = (S^T S + λI)^{−1} S^T ŷ_T.

In the case of invertible S^T S, the matrix (S^T S)^{−1} S^T is the Moore-Penrose pseudoinverse. We set the regularization parameter λ in proportion to s_max, where s_max is the largest state response in the state matrix S. To quantify the system's performance, we use the capacity (see [48, 49]) to approximate a specific task, which is given by

C = 1 − NRMSE²,

where NRMSE is the normalized root mean square error between the approximation ŷ and the target ŷ_T,

NRMSE² = (1/(K σ²(ŷ_T))) Σ_k (ŷ_k − ŷ_{T,k})²,

where σ²(ŷ_T) is the variance of the target values.
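A minimal numerical sketch of this training and evaluation step follows. The two-column state matrix is a toy stand-in for reservoir responses (one uninformative node, one node carrying a noisy copy of the target), and the regularization value is a placeholder; the ridge solution is written out with a closed-form 2x2 inverse.

```python
import random

random.seed(0)
K = 2000
# toy state matrix: two "virtual nodes", the second is the target plus noise
target = [random.uniform(-1.0, 1.0) for _ in range(K)]
S = [[random.gauss(0.0, 1.0), t + 0.1 * random.gauss(0.0, 1.0)] for t in target]

lam = 1e-6  # placeholder Tikhonov regularization parameter

# normal equations W = (S^T S + lam*I)^{-1} S^T y, solved via a 2x2 inverse
StS = [[sum(r[i] * r[j] for r in S) for j in range(2)] for i in range(2)]
Sty = [sum(r[i] * t for r, t in zip(S, target)) for i in range(2)]
a11, a12 = StS[0][0] + lam, StS[0][1]
a21, a22 = StS[1][0], StS[1][1] + lam
det = a11 * a22 - a12 * a21
W = [(a22 * Sty[0] - a12 * Sty[1]) / det,
     (-a21 * Sty[0] + a11 * Sty[1]) / det]

# capacity C = 1 - NRMSE^2 of the trained readout
pred = [sum(wi * ri for wi, ri in zip(W, r)) for r in S]
var = sum(t * t for t in target) / K - (sum(target) / K) ** 2
nrmse2 = sum((p - t) ** 2 for p, t in zip(pred, target)) / (K * var)
capacity = 1.0 - nrmse2
# the informative node dominates the readout, so the capacity is close to 1
assert 0.9 < capacity <= 1.0
```

Because the second node carries the target up to small noise, the regression concentrates its weight there and the capacity approaches its maximum of 1.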
Here we introduce the linear memory capacity as a quantitative measure for the memory kernel of a dynamical system.
The central task-independent quantification was introduced by Jaeger and refined by Dambre et al., who showed that the computational capability of a reservoir system can be quantified via an orthonormal set of basis functions on a sequence of inputs.
Here we give a recap of the quantities introduced in [50, 49] and focus on the linear memory capacity.
In particular, the capacity to fulfill a specific task ŷ_T is given by

C[ŷ_T] = ŷ_T^T S (S^T S + λI)^{−1} S^T ŷ_T / ||ŷ_T||²,

which can be derived from Eq. (2) (see [48, 49]). The capacity equals 1 if the reservoir computer computes the task perfectly, thus NRMSE = 0, and it equals 0 if the prediction is not correlated with the target. It lies between 0 and 1 if the reservoir is partially capable of fulfilling the task. To quantify the system's capability of approximating linear recalls of inputs, an input sequence u is applied, where the u_k are uniformly distributed random numbers, independently and identically drawn. With this input sequence of random numbers, the reservoir response is collected in the state matrix S.
To describe a linear recall task of m steps into the past, the target vector is defined by its elements (ŷ_T)_k = u_{k−m}, which is the linear recall m steps into the past. Formally, one considers an infinitely long input sequence. To approximate it numerically, we use a finite number K of samples.
The linear memory capacity MC is defined as the sum over the capacities of all possible linear recall tasks,

MC = Σ_m C_m,

where C_m is the capacity of the m-th recall into the past. This quantification is task-independent, and thus direct implications for specific applications cannot be given; different tasks may need different specific capacities. The measure MC thus only gives a hint for well-performing reservoirs in the sense of using the full scope of the given reservoir, rather than a direct task-specific estimate. We point out that the linear-nonlinear trade-off is a well-known effect; a system with high linear memory capacity can thus yield a low nonlinear transformation capability. Nevertheless, we believe that predicting a reservoir with a well-performing linear memory kernel is beneficial for a general reservoir computer setup, as higher nonlinear memory transformation can be provided by adding additional reservoir systems with increased perturbations.
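The definition can be made concrete with a deliberately simple toy reservoir (not one of the systems studied in the paper): a delay line whose j-th node holds a faded copy of the input j steps back. Every recall up to the number of nodes can then be read out perfectly, and MC saturates at the node count.

```python
import random

random.seed(2)
K = 5000
N_V = 5  # number of "virtual nodes" of the toy delay line
u = [random.uniform(-1.0, 1.0) for _ in range(K)]

# toy reservoir: node j stores decay^j * u_{k-j}
decay = 0.8
S = [[decay ** j * u[k - j] if k - j >= 0 else 0.0 for j in range(N_V)]
     for k in range(K)]

def recall_capacity(m):
    """Capacity C_m = 1 - NRMSE^2 of the best linear readout for recall m."""
    y = [u[k - m] if k - m >= 0 else 0.0 for k in range(K)]
    if m < N_V:
        pred = [decay ** -m * S[k][m] for k in range(K)]  # exact readout
    else:
        pred = [0.0] * K  # no node stores inputs that far back
    var = sum(t * t for t in y) / K
    nrmse2 = sum((p - t) ** 2 for p, t in zip(pred, y)) / (K * var)
    return 1.0 - nrmse2

MC = sum(recall_capacity(m) for m in range(2 * N_V))
# the linear memory capacity saturates at the number of nodes
assert abs(MC - N_V) < 1e-9
```

This also illustrates the well-known upper bound: MC cannot exceed the number of readout dimensions, which for a delay-based reservoir is the number of virtual nodes.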
As one example system, we use a Stuart-Landau oscillator with delayed feedback, restricted to real values:

x'(t) = (μ + γ x²(t)) x(t) + κ x(t−τ) + η I(t).   (8)

Here, x describes the real-valued amplitude of the system, κ is the feedback strength, τ the delay time, μ is a parameter determining the dynamics (with γ < 0 fixed), and η the input strength of the information fed into the system.

For γ < 0 and μ + κ ≤ 0, system (8) has only the trivial equilibrium x* = 0, and for μ + κ > 0, additionally, the nontrivial equilibria x* = ±(−(μ + κ)/γ)^{1/2} exist, which appear in a pitchfork bifurcation at μ = −κ. The linearization at the nontrivial equilibria (taking into account the input term) reads

δx'(t) = a δx(t) + b δx(t−τ) + η I(t),   (9)

where a = μ + 3γx*² = −2μ − 3κ, b = κ, and δx(t) = x(t) − x* is the deviation from the equilibrium.
As the second example, we use the Mackey-Glass system

x'(t) = −γ x(t) + β x(t−τ)/(1 + x^p(t−τ)) + η I(t),   (10)

where x(t) is the dynamical variable, x(t−τ) is the delayed variable, and β, γ, and p are control parameters. The reservoir input is fed into the system via the term η I(t). We set β/γ > 1, for which the system possesses a stable nontrivial equilibrium x* = (β/γ − 1)^{1/p} (for η = 0). The corresponding linearization at this equilibrium is Eq. (9) with a = −γ and b = β(1 + (1 − p)x*^p)/(1 + x*^p)².
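The linearization parameters can be cross-checked numerically against finite differences; the sketch below assumes the Mackey-Glass right-hand side f(x, x_τ) = −γx + βx_τ/(1 + x_τ^p) in the reconstructed form above, and the parameter values are placeholders (chosen so that β/γ > 1), not the ones used in the simulations.

```python
# Mackey-Glass right-hand side f(x, x_tau) = -gamma*x + beta*x_tau/(1 + x_tau**p)
beta, gamma, p = 2.0, 1.0, 10.0   # placeholder control parameters, beta/gamma > 1

x_star = (beta / gamma - 1.0) ** (1.0 / p)   # nontrivial equilibrium

def f(x, x_tau):
    return -gamma * x + beta * x_tau / (1.0 + x_tau ** p)

# analytic linearization parameters a = df/dx, b = df/dx_tau at the equilibrium
a = -gamma
b = beta * (1.0 + (1.0 - p) * x_star ** p) / (1.0 + x_star ** p) ** 2

# cross-check against central finite differences
h = 1e-6
a_num = (f(x_star + h, x_star) - f(x_star - h, x_star)) / (2 * h)
b_num = (f(x_star, x_star + h) - f(x_star, x_star - h)) / (2 * h)
assert abs(f(x_star, x_star)) < 1e-12   # x_star is indeed an equilibrium
assert abs(a - a_num) < 1e-6
assert abs(b - b_num) < 1e-6
```

The same finite-difference check applies to any single-variable reservoir whose right-hand side is known, which is all the MMF needs as input.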
From Eq. (5), we see that the capacity to approximate a specific target is given by the inverse of the covariance matrix S^T S (corrected by λ), also called the concentration matrix, and the matrix product of the state matrix S with the target ŷ_T. Thus, it is necessary to derive the state matrix from the responses to the small perturbations of the system. This has already been done for 1-dimensional reservoirs by an Euler step scheme, and for specific differential equations. We would like to extend this knowledge by analyzing arbitrary single-variable systems with linearization parameters a and b. We assume the virtual node distance θ to be small, and T = N_V θ with N_V virtual nodes. We also assume the operation point of the reservoir to be a stable equilibrium. We will exemplarily validate our analysis on the two 1-dimensional nonlinear reservoirs given by Eqs. (8) and (10).
Our main result is a modified state matrix S̃ that can be used to determine the MC while being calculated solely from the linearized system. Its entries s̃_{mj} are sums of terms of the form

G_{il} A^i B^l C w_{j'},   (11)

where A = e^{aθ}, B = b(e^{aθ} − 1)/a, and C = η(e^{aθ} − 1)/a; the parameters a, b, and η are given by the linearization (9), w_{j'} are the weights of the time-multiplexing mask, and G_{il} is a binomial coefficient. The index m corresponds to the m-th clock cycle into the past, and j to the j-th virtual node. The rows of the modified state matrix contain the entries in the statistical direction of the m-th shifted input. As we show in App. A, the covariance matrix of the modified state matrix approximates that of the original state matrix,

S^T S ≈ K σ_u² S̃^T S̃,   (12)

where σ_u² is the variance of the input values. Moreover, we also show that the full linear memory capacity can be calculated by using solely the modified state matrix,

MC = Σ_m s̃_m (S̃^T S̃ + λI)^{−1} s̃_m^T,   (13)

and the capacity of the m-th recall is given by

C_m = s̃_m (S̃^T S̃ + λI)^{−1} s̃_m^T,   (14)

where s̃_m is the m-th row of S̃. Details of the derivations can be found in App. A.
We call the memory capacity given by Eq. (13) the master memory function (MMF). For given parameters a, b, and η of the linearization, as well as the mask coefficients w_j, this function can be evaluated much more efficiently than the direct evaluation of the linear memory capacity via stepwise integration of the differential equation. A speed comparison is given in App. B. The new approach does not require simulating the reservoir, and it does not involve the input sequence u.
The obtained approximations of the modified state matrix (11) and memory capacity function (13) allow for efficient numerical evaluation. For this, we propose the following scheme, which we also show as pseudocode in Alg. 1.
First, we iterate over all entries of the modified Pascal's triangle given in Fig. 6, which can be done by two nested loops over i and l. We do this until all entries in a row are below a given threshold ε (see Fig. 6). The threshold ensures that we cut unnecessary terms smaller than the regularization parameter λ. A third loop goes over all virtual nodes j, adding the result multiplied with the corresponding weight to all corresponding entries s̃_{mj} that lie in the same input interval m. See App. A for more information. The algorithm to compute the modified state matrix is given below, where ⌊·⌋ is the floor function rounding down to the greatest integer less than or equal to its argument, getBinomialTerm(i, l) returns the term G_{il} A^i B^l, and w is the mask weight vector of length N_V. The implemented C++ code can be found in the supplementary material.
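The scheme can be sketched compactly in Python; this is our own illustrative reimplementation under the reconstructed notation A = e^{aθ}, B = b(e^{aθ} − 1)/a, C = η(e^{aθ} − 1)/a and the simplifying assumptions a < 0 and τ = N_V θ, and its function and variable names are not those of the supplementary C++ code.

```python
from math import comb, exp

def modified_state_matrix(a, b, eta, theta, N_V, w, eps=1e-12, max_rows=100):
    """Threshold-based iteration over the modified Pascal's triangle.
    Row m of the result collects all terms G_il * A^i * B^l whose total lag
    i + l*N_V reaches m clock cycles into the past (assumes a < 0)."""
    A = exp(a * theta)
    B = b * (A - 1.0) / a
    C = eta * (A - 1.0) / a
    S = [[0.0] * N_V for _ in range(max_rows)]
    l = 0
    while True:
        row_max = 0.0
        peak = l * A / (1.0 - A)  # beyond this i, terms decrease monotonically
        i = 0
        while True:
            term = comb(i + l, l) * A ** i * B ** l
            if i > peak and abs(term) < eps:
                break
            row_max = max(row_max, abs(term))
            for j in range(N_V):
                m = l - (j - i) // N_V  # recall depth reached by this term
                if 0 <= m < max_rows:
                    S[m][j] += C * term * w[(j - i) % N_V]
            i += 1
        if row_max < eps:  # the entire triangle row is negligible: stop
            break
        l += 1
    return S

# example call with placeholder linearization parameters and mask weights
S_tilde = modified_state_matrix(a=-1.0, b=0.3, eta=1.0, theta=0.5, N_V=4,
                                w=[0.9, -0.4, 0.7, -0.2])
```

A useful sanity check of the index bookkeeping is to iterate the coupled map x_n = A x_{n−1} + B x_{n−N_V} + C w u directly and verify that its states equal the recall-weighted sum of past inputs produced by the rows of S̃.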
Simulations have been performed in standard C. For linear algebra calculations, the linear algebra library Armadillo was used. To numerically integrate the delay-differential equations, a Runge-Kutta fourth-order method was applied with a fixed integration step in dimensionless time units. First, the system is simulated without reservoir inputs, thus letting transients decay. After that, a buffer time of 10000 inputs was applied (this is excluded from the training process). In the training process, enough inputs were used to have sufficient statistics. Afterward, the memory capacities C_m of the linear recalls were calculated with Eq. (5), whereby a testing phase is not necessary. The linear memory capacity MC was calculated by summing the obtained capacities C_m. For the piecewise-constant T-periodic mask function, independent and identically distributed random numbers were used.

For all simulations, the input strength η was fixed to a small value to guarantee linear answers of the reservoir and, hence, the relevance of the approximation.
A program written in C++ to perform the semi-analytic calculations is given in the supplementary material.
In this section we illustrate the effectiveness of the MMF. First, we show that the MMF provides a very good approximation of the MC using the reservoir given by Eq. (8). The approximation works well as long as the virtual node distance θ is relatively small. This is fulfilled for typical reservoir computing setups, as one would otherwise lose computation speed. In the second part, we show how the MMF provides a universal, system-independent characteristic. For this, we compare the MMF with the memory capacities of different reservoirs; each particular reservoir realization is described by one parameter combination of the MMF. In the last part, we describe how the MMF can be computed for reservoirs with an unknown dynamical rule. For this, the parameters a and b of the linearization are measured from the system's response to a small periodic input.
Figure 4 shows the memory recall capacity obtained from direct simulations and compares it with the MMF for 4 different cases of the Stuart-Landau system given by Eq. (8). The exact parameters are given in the caption of Fig. 4. The directly simulated results are shown by blue solid lines and blue markers, whereas green dashed lines and green markers show the MMF. For a small virtual node distance θ in Fig. 4(a,c), the MMF predicts the linear memory capacity very accurately. For a higher value of θ (Fig. 4(b,d)), the accuracy drops, though the results are still accurate enough for qualitative predictions and describe the general trend of the system's memory capacity.

The scans in Fig. 4(c) and 4(d) were done with a higher delay time, which induces memory gaps. Even though the memory capacity has a complex dependence on the recall depth at these parameter values, the prediction for the two different virtual node distances is still accurate.
A 2-D parameter plane was simulated in App. E to show that the predictions of the MMF work for arbitrary parameter setups, thus the general predictability of the new scheme is very promising.
Comparing the computation speed of the classical numerical integration and the newly proposed scheme shows an improvement of 2 to 3 orders of magnitude, depending on the operation point, the number of training steps, and the value of the clock cycle T. A higher clock cycle and more training steps increase the simulation time for the direct numerical integration, whereas the newly proposed scheme is independent of both. If the operation point is close to a bifurcation, the convergence of the new scheme is slower, increasing the computation time needed. Still, even very close to the bifurcation line, the computation speed is significantly higher (by a factor of about 100), making the MMF a valuable tool. See App. B.
An exciting result that follows from the MMF concept is the possibility to generalize to arbitrary time-delay-based reservoirs. Every reservoir with a similar linearization should yield a similar linear memory capacity. To illustrate this, we compare the Stuart-Landau reservoir system given by Eq. (8) and the Mackey-Glass reservoir system given by Eq. (10).

The inset of Fig. 5 illustrates this fact. It shows the capacity C_m to recall the m-th step into the past as a function of m for the Stuart-Landau system (blue), the Mackey-Glass system (red), and the MMF given by Eq. (14) (green). Both systems are tuned such that their respective linearizations yield the same parameters a and b.

From this it follows that it is enough to compute the linearization parameters a and b to predict the MC of any arbitrary delay-based reservoir computer. The color plot in Fig. 5 shows the MMF given by Eq. (13) for different parameter values a and b. A well-performing operation point seems to be the edge of instability, agreeing with the known rule of thumb from the literature. Any reservoir yielding the same linearization parameters a and b in (9) must possess the corresponding memory capacity given by Fig. 5 for these parameter values, as soon as the input is sufficiently small.

It thus follows that analyzing the Jacobian (the linearization given by Eq. (9)) for fixed delay τ, virtual node distance θ, and number of virtual nodes N_V is sufficient to predict the linear memory capacity of any arbitrary time-delay-based reservoir computer, and this memory capacity is given by the MMF via Eqs. (13) and (14).
In this section, we show an experimentally accessible approach for measuring the parameters a and b for a delay system whose dynamical equations of motion are not known and which can be described by a single variable. The corresponding linearized dynamical system is given by

δx'(t) = a δx(t) + b δx(t−τ).   (15)

The goal is to measure a and b. This can be achieved by perturbing the system with a harmonic periodic signal of small amplitude ε and frequency ω. When this signal is small, we can consider the perturbed linearized system

δx'(t) = a δx(t) + b δx(t−τ) + ε e^{iωt},   (16)

where the complex form is chosen for simplicity. Due to linearity, the real solution is obtained simply by taking the real part. We consider the case of real a and b, which always holds when the reservoir variable is real.
Since the homogeneous solution decays to the stable equilibrium (we assume its exponential stability), the solution of Eq. (16) converges to the particular solution, given by

δx_p(t) = ε χ(ω) e^{iωt},   (17)

with χ(ω) = (iω − a − b e^{−iωτ})^{−1}. The ratio of the output amplitude to the input amplitude equals the modulus of the transfer function,

T(ω) = |χ(ω)| = |iω − a − b e^{−iωτ}|^{−1},   (18)

where T(ω) can be measured.
To determine the parameters a and b, it is sufficient to measure the transfer function at two frequencies, for example, at ω₁ = 2π/τ and ω₂ = π/τ. The first frequency is resonant with the delay, e^{−iω₁τ} = 1, while the second is in 'anti-phase' to the delay, e^{−iω₂τ} = −1. It holds that

T(ω₁)^{−2} = ω₁² + (a + b)²,  T(ω₂)^{−2} = ω₂² + (a − b)².   (19)

From the above we can obtain the values for a and b,

a + b = −(T(ω₁)^{−2} − ω₁²)^{1/2},  a − b = ±(T(ω₂)^{−2} − ω₂²)^{1/2},   (20)

where the sign of a + b is fixed by the stability of the equilibrium, and where the values of T(ω₁) and T(ω₂) can be obtained experimentally or numerically by perturbing and measuring the response of the reservoir.
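A numerical sketch of this measurement-and-inversion procedure follows; it assumes the reconstructed transfer-function magnitude T(ω) = |iω − a − b e^{−iωτ}|^{−1}, uses placeholder values for the "unknown" parameters, and assumes the sign of a − b is known (for the placeholder values it is negative; in practice this ambiguity can be resolved by probing further frequencies).

```python
import cmath
from math import pi, sqrt

tau = 1.0
a_true, b_true = -2.0, 0.5   # placeholder "unknown" parameters, a_true + b_true < 0

def T(omega):
    """Measured amplitude ratio |output| / |input| at drive frequency omega."""
    return 1.0 / abs(1j * omega - a_true - b_true * cmath.exp(-1j * omega * tau))

omega1 = 2.0 * pi / tau   # resonant to the delay:  exp(-i*omega1*tau) = +1
omega2 = pi / tau         # anti-phase:             exp(-i*omega2*tau) = -1

# T(omega1)^-2 = omega1^2 + (a+b)^2 and T(omega2)^-2 = omega2^2 + (a-b)^2
s = -sqrt(T(omega1) ** -2 - omega1 ** 2)   # a + b, negative by stability
d = -sqrt(T(omega2) ** -2 - omega2 ** 2)   # a - b, sign assumed known here
a_rec, b_rec = (s + d) / 2.0, (s - d) / 2.0

assert abs(a_rec - a_true) < 1e-9
assert abs(b_rec - b_true) < 1e-9
```

In an experiment, the two calls to T(ω) would be replaced by the measured amplitude ratios of the driven reservoir output.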
We remark that the choices of the resonant and anti-phase perturbation frequencies are convenient but not unique. Clearly, one can perturb at other frequencies to obtain a and b; probing additional frequencies also resolves the remaining sign ambiguity in Eq. (20). Moreover, the above idea can be generalized to the case of complex-valued parameters a and b, whereby more frequencies must be tested.

The measured values of the parameters a and b for a reservoir with unknown dynamics can then simply be inserted into the MMF to estimate the linear memory capacities.
We have developed a simple and fast method for calculating the linear memory capacity of time-delay-based reservoirs with single-variable dynamics. The method builds on a modified state matrix whose rows point in the directions of the linear recall steps.
Our results can be used to predict reservoir computing setups with high linear memory capacity. The nonlinear memory capacity, on the other hand, remains an open question. In this case, combined setups could be used, where a delay-based reservoir computer includes multiple uncoupled subsystems. The decoupling ensures that no highly complex dynamical responses destroy the computational performance of the reservoir. One time-delay-based reservoir computing subsystem can be tuned to low perturbations at the edge of instability to act as a high linear memory kernel. Increasing the perturbation strength for the other subsystems will ultimately increase the nonlinear responses and thus the nonlinear memory capacity, so that the subsystems with high input strengths take on the role of highly nonlinear transformation kernels.
A teamwork setup is thus recommended, where one or a few subsystems perturbed by small inputs and operated close to instability act as a linear memory kernel. In contrast, other nodes are perturbed more strongly and thus act as highly nonlinear transformation units. Such a setup should be capable of tackling a wide range of different tasks. It would be interesting to investigate this in future works.
One of the advantages of the delay-based reservoir, which allows the introduction of the MMF, is that it contains a small number of system parameters while the dynamics remains infinite-dimensional. In the case of a small input signal and single-variable dynamics, these are only the linearization parameters a and b. Thus, if the linear memory capacity is computed for all possible values of these two parameters, it covers the case of all possible reservoirs. This procedure could be difficult, if not impossible, for network-based reservoirs, where the system parameters may include, e.g., multiple coupling weights.
Consider a single-variable delay-differential equation which describes the dynamics of the reservoir,

x'(t) = f(x(t), x(t−τ), η I(t)),   (21)

where I(t) stands for an external input. We assume that x* is an equilibrium of this system, i.e., f(x*, x*, 0) = 0, and that η is small. In the case when the dynamics of (21) takes place near the equilibrium point x*, we can introduce the perturbative ansatz x(t) = x* + δx(t). Then the linearized system for the perturbation reads

δx'(t) = a δx(t) + b δx(t−τ) + η I(t),   (22)

where a = ∂₁f(x*, x*, 0) and b = ∂₂f(x*, x*, 0) are the partial derivatives with respect to the instantaneous and the delayed variable, and the input term is rescaled such that it enters with strength η.
Consider θ to be the temporal node spacing of the reservoir, which is typically small compared to the delay τ. Then equation (22) on any interval [nθ, (n+1)θ] can be considered as a simple scalar ODE with the inhomogeneity b δx(t−τ) + η I(t). Moreover, according to the reservoir setup, the input is constant on this interval and equals w_{j(n)} u_{k(n)}, where j(n) and k(n) denote the virtual node and the clock cycle to which the interval belongs. By the variation of constants formula, we obtain for the solution of (22) for t ∈ [nθ, (n+1)θ]

δx(t) = e^{a(t−nθ)} δx(nθ) + ∫_{nθ}^{t} e^{a(t−s)} ( b δx(s−τ) + η w_{j(n)} u_{k(n)} ) ds,   (A)

where nθ is the left endpoint of the interval.

Denoting by x_n(σ) = δx(nθ + σ) the function on the interval σ ∈ [0, θ], we rewrite equation (A) as

x_n(σ) = e^{aσ} x_n(0) + ∫_0^{σ} e^{a(σ−s)} ( b x_{n−N_V}(s) + η w_{j(n)} u_{k(n)} ) ds,

where we additionally used the resonant relation τ = N_V θ and the continuity condition x_n(0) = x_{n−1}(θ). By evaluating Eq. (A) at σ = θ, we obtain

x_{n+1}(0) = e^{aθ} x_n(0) + ∫_0^{θ} e^{a(θ−s)} ( b x_{n−N_V}(s) + η w_{j(n)} u_{k(n)} ) ds.
Further, we approximate the integral from Eq. (A) by assuming x_{n−N_V}(s) ≈ x_{n−N_V}(0) on the integration interval. The approximation holds, in particular, when θ is small. The obtained expression

x_{n+1} = A x_n + B x_{n−N_V} + C w_{j(n)} u_{k(n)},  A = e^{aθ},  B = b(e^{aθ} − 1)/a,  C = η(e^{aθ} − 1)/a,   (27)

with x_n = x_n(0), represents a discrete map (coupled map lattice) for approximating the state matrix S. If considering it as a corresponding network with the nodes x_n, see e.g. [35, 53, 34], we see that the node x_{n+1} is coupled with the two nodes x_n and x_{n−N_V} in a feed-forward manner with the coupling weights A and B, respectively. The schematic representation of such a coupling structure leads to the Pascal's triangle shown in Fig. 6. The first row of the Pascal's triangle from Fig. 6 shows the dependence on x_n, which is simply the multiplication by A. In the second row, the contributions of x_{n−1} and x_{n−N_V} are shown. To obtain these dependencies explicitly, we insert x_n and x_{n−N_V} recursively in (27):

x_{n+1} = A² x_{n−1} + 2AB x_{n−1−N_V} + B² x_{n−1−2N_V} + C w_{j(n)} u_{k(n)} + AC w_{j(n−1)} u_{k(n−1)} + BC w_{j(n−1−N_V)} u_{k(n−1−N_V)},

that is, we obtain the terms A², 2AB, and B², where the binomial factor 2 counts the two paths reaching the delayed node. Substituting the next level recursively in the same way produces the terms A³, A²B, AB², and B³ with their binomial multiplicities.

To obtain a general formula, we split the index of the appearing terms as n − i − lN_V, where l corresponds to the delayed ('right', weight B) and i to the instantaneous ('left', weight A) connections in the coupling network in Fig. 6:

x_n = C Σ_{i,l ≥ 0} G_{il} A^i B^l w_{j(n−i−lN_V)} u_{k(n−i−lN_V)},   (30)

where G_{il} = (i+l choose l) is a combinatorial constant depending only on i and l, counting the number of paths through the triangle. For an infinitely long input sequence, the sum in (30) runs over all i and l from 0 to ∞. Practically, the sum is restricted to the available data u_1, ..., u_K. As a result, the reservoir states are composed of a linear combination of the inputs with the corresponding coefficients given in Eq. (30).
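The claim that the coefficient in (30) depends only on i and l can be checked directly: writing the reconstructed term as T(i, l) = (i+l choose l) A^i B^l (with placeholder values for A and B), it satisfies the Pascal's-triangle recursion of the coupled map, one 'left' step with weight A plus one delayed step with weight B.

```python
from math import comb

A, B = 0.7, 0.2   # placeholder coupling weights of the discrete map

def term(i, l):
    """Coefficient of the input applied i single steps and l delay steps back."""
    return comb(i + l, l) * A ** i * B ** l

# each term is reached either by a 'left' step (weight A) from (i-1, l)
# or by a delayed step (weight B) from (i, l-1)
for i in range(1, 8):
    for l in range(1, 8):
        assert abs(term(i, l) - (A * term(i - 1, l) + B * term(i, l - 1))) < 1e-12
```

The recursion is exactly the binomial identity (i+l choose l) = (i−1+l choose l) + (i+l−1 choose l−1), weighted by A and B.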
The elements of the state matrix used in the reservoir computing setup are

s_{kj} = x_{kN_V+j} − ⟨x_{·N_V+j}⟩,

where ⟨x_{·N_V+j}⟩ is the average over the input intervals (clock cycles). Here and later, the dot denotes the index over which the averaging is performed. The input of the reservoir computer is given by the discrete input sequence multiplied by the input weights, i.e., the value w_j u_k is applied at virtual node j during clock cycle k. Taking into account Eq. (30), we therefore obtain

s_{kj} = C Σ_{i,l ≥ 0} G_{il} A^i B^l w_{(j−i) mod N_V} ( u_{k'} − ⟨u⟩ ),  k' = k − l + ⌊(j − i)/N_V⌋,

and since u has zero mean, the averaged term drops out. Hence, the elements of the state matrix are linear combinations of the past inputs u_{k'} with k' ≤ k.

Correspondingly, the elements of the covariance matrix S^T S from (5) are

(S^T S)_{jj'} = Σ_k s_{kj} s_{kj'},

and they describe the covariance of the virtual node j with the virtual node j' over all clock cycles k.
One can show that the terms in (S^T S)_{jj'} containing mixed products of the form u_{k'} u_{k''} with k' ≠ k'' can be approximated by zero, since the inputs are independently distributed with zero mean. The only nonzero second moment which matters is the mean square ⟨u²⟩ = σ_u². Hence, for further simplification of (S^T S)_{jj'}, we keep only terms of the form u_{k'}². The following expression formalizes these arguments:

(S^T S)_{jj'} ≈ K σ_u² C² Σ_m ( Σ_{(i,l) ∈ (*)} G_{il} A^i B^l w_{(j−i) mod N_V} ) ( Σ_{(i,l) ∈ (*)} G_{il} A^i B^l w_{(j'−i) mod N_V} ),   (37)

where the summation range (*) is taken over all values of i and l such that the corresponding input index lies m clock cycles into the past, i.e., l − ⌊(j − i)/N_V⌋ = m (and analogously for j').

The obtained expression (37) does not depend on the sequence u and hence provides a significant simplification for calculating the covariance matrix. We may further notice that the same covariance (37) can be obtained by defining the modified state matrix S̃ with elements s̃_{mj}, where m indexes the m-th interval of the shifted input (the m-th recall) and j the j-th virtual node. The element s̃_{mj} is given by the sum over all combinations (i, l) that fall into the same shifted input interval m, i.e.,

s̃_{mj} = C Σ_{(i,l): l − ⌊(j−i)/N_V⌋ = m} G_{il} A^i B^l w_{(j−i) mod N_V}.   (38)
This is our main result, because Eq. (38) defines the modified state matrix from which all capacities are derivable. More specifically, we have shown

S^T S ≈ K σ_u² S̃^T S̃.

Further, for the m-th recall, where the target is the shifted input sequence (ŷ_T)_k = u_{k−m}, we have

S^T ŷ_T ≈ K σ_u² s̃_m^T;

therefore, it holds that

C_m = s̃_m ( S̃^T S̃ + λI )^{−1} s̃_m^T,

where s̃_m is the m-th row of the modified state matrix S̃ and λ is the appropriately rescaled regularization parameter.