Master memory function for delay-based reservoir computers with single-variable dynamics

08/28/2021 · Felix Köster et al. · Technische Universität Berlin (Berlin Institute of Technology)

We show that many delay-based reservoir computers considered in the literature can be characterized by a universal master memory function (MMF). Once computed for two independent parameters, this function provides the linear memory capacity of any delay-based single-variable reservoir with small inputs. Moreover, we propose an analytical description of the MMF that enables its efficient and fast computation. Our approach can be applied not only to reservoirs governed by known dynamical rules, such as Mackey-Glass or Ikeda-like systems, but also to reservoirs whose dynamical model is not available. We also present results comparing the performance of the reservoir computer with the memory capacity given by the MMF.






1 Introduction

Reservoir computing is a neuromorphically inspired machine learning paradigm which enables high-speed training of recurrent neural networks and is capable of solving highly complex time-dependent tasks. First proposed by Jaeger [1] and inspired by the human brain [2], it utilizes the inherent computational capabilities of dynamical systems. Very recently, the universal approximation property has also been shown for a wide range of reservoir computers, which solidifies the concept as a broadly applicable scheme [3]. Bollt pointed out a connection between reservoir computers and VAR (vector autoregressive) and nonlinear VAR machines, which may be one of the reasons behind the surprising efficiency of reservoir computers for time-dependent tasks [4, 5]. Many different realizations [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] have shown the relevance of reservoir computing to practical applications, while analytical and numerical analyses [19, 20, 21, 22] help to build an understanding of its working principles and to improve its performance. Motivated by fast inference and low energy consumption, optoelectronic and optical hardware implementations of reservoir computers are often realized [23, 24, 25, 26, 27, 28, 29, 30, 31, 32], indicating a high future potential of such processing units.

Originally, reservoir computing was performed with a network of nonlinear nodes, which projects the input information into a high-dimensional phase space, allowing a linear regression to linearly separate features [1]. In time-delayed reservoir computing, a single dynamical node with delayed feedback is employed as a reservoir instead of a network [33]. A time-multiplexing procedure allows such a single-element system to implement a recurrent ring network [33, 34, 35], see Fig. 1. The absence of the need for a large number of nonlinear elements significantly reduces the complexity of hardware implementations of the reservoir. Existing experimental and numerical realizations show promising results in solving time-dependent tasks, such as speech recognition, time-series prediction [36, 31, 37, 38, 39, 40, 41, 42, 43] or equalization tasks on nonlinearly distorted signals [44]. For a general overview, we refer to [45, 46, 47].

Often reservoirs are optimized for a specific task by hyperparameter tuning, which defeats the purpose of reservoir computing as a fast-trainable machine learning scheme. Dambre et al. [48] introduced a task-independent quantification of a reservoir computer, building on the memory capacity notion already introduced in [1], whereby a high memory capacity points to generally well-performing reservoirs.

Figure 1: Scheme of delay-based reservoir computing. Important timescales are marked in green: input cycle, delay, virtual node separation time.

In this paper, we provide an analytical tool for finding promising reservoir setups by introducing a master memory function (MMF) for delay-based reservoir computing with small inputs. The MMF allows fast computable predictions of the linear memory capacity, and it indicates that the linear memory capacity of reservoirs is similar for systems with similar linearizations.

The main idea behind our method can be outlined as follows. Consider a delay-based reservoir described by a general nonlinear delay system ẋ(t) = f(x(t), x(t − τ), I(t)), where I(t) is an input signal that is "small" in a certain sense, and x(t) determines the state of the reservoir. The response of the reservoir must be independent (at least to some extent) of its initial state, a property known as the echo state property. Such a situation occurs when the reservoir operates near an equilibrium state x* that is stable in the absence of the input signal. Therefore, all reservoir dynamics take place in a neighborhood of this equilibrium, and as a result, the reservoir linearization δẋ(t) = a δx(t) + b δx(t − τ) + η I(t) approximates these dynamics. Here δx(t) = x(t) − x* is the deviation from the equilibrium. In the considered case of a single-variable reservoir, the scalar parameters a and b are the only determining quantities. The relatively simple form of the linearized system allows us to obtain an analytical expression for the linear memory capacity, which depends on the parameters a and b and thus parametrically determines the linear memory capacity of any reservoir with the above properties. We call the obtained function the MMF due to its universal features, i.e., different reservoir computing setups that possess the same linearizations yield the same linear memory capacity given by the MMF.

The paper is structured as follows. First, we briefly review the concept of time-delay-based reservoir computing and the concept of linear memory capacity. We then present our main analytical result, together with example code for an efficient evaluation of the obtained expression; the derivation is given in the appendix. Finally, comparisons of the numerically simulated reservoir computer performance with the semianalytic approach are provided. We also show in Sec. 4.6 the application of our results to reservoirs with an unknown dynamical model, where the parameters a and b of the linearization are evaluated using the system's response to external stimuli.

2 Time-Delay based Reservoir Computing

Reservoir computing utilizes the intrinsic ability of dynamical systems to project the input information into a high-dimensional phase space [1]. By linearly combining the responses of the dynamical reservoir to inputs, a specific task is approximated. In the classical reservoir computing scheme, a so-called echo state network is often used, feeding the input into a spatially extended network of nonlinear nodes. Linear regression is then applied to minimize the Euclidean distance between the output and a target. This approach is particularly resourceful for time-dependent tasks because the dynamical system that forms the reservoir acts as a memory kernel.

In the time-delay-based reservoir computing scheme [33], the spatially extended network is replaced by a single nonlinear node with a time-delayed feedback loop. The time-multiplexing procedure with a periodic mask function is applied to translate the input data to a temporal input signal. Similarly, the time-multiplexing procedure translates the single temporal high-dimensional reservoir response to the spatio-temporal responses of virtual nodes. The virtual nodes play the same role as the spatial nodes in echo state networks.

A sketch of the delay-based reservoir computing setup is shown in Fig. 1. In the following, we will give a short overview of the quantities and notations used in this paper. We also refer to our previous works [49, 50, 51] for a detailed explanation of how the reservoir setup is operated and task-independent memory capacities are computed.

Figure 2: Exemplary timeline sketch for time-delay-based reservoir computing. Three input intervals of length T are shown for three consecutive inputs in red, blue and green, respectively, together with the delay time expressed in multiples of the θ-intervals and the number of virtual nodes used in this example. Four system states are indicated in grey, one of which influences another directly via the delay time τ. The pink line indicates an example trajectory, with black dots showing the measured system states, i.e., the virtual nodes.

Let us briefly recall the main ingredients of the time-multiplexed reservoir computing scheme [33, 50, 51, 49]. An input vector u is applied componentwise at times kT, k = 1, …, K, with K being the number of sample points. The administration time for each input is the same; it is called the clock cycle T. To achieve a high-dimensional response to the same input, a T-periodic mask function multiplies the input, and the resulting signal enters the system (see Fig. 1 and Fig. 2). The mask is a piecewise-constant function on N_V intervals, each of length θ, corresponding to the N_V virtual nodes. The values of the mask function play the same role as the input weights in spatially extended reservoirs, with the difference that time-multiplexing distributes the weights over time. The responses of the reservoir are collected in the state matrix S, see Fig. 3. The elements of the state matrix are the states of the dynamical element of the reservoir at the virtual-node times, shifted by the mean over all clock cycles, see [48]. The average can be understood as averaging over the row elements. For example, for an experimental or numerical realization of the reservoir with a semiconductor laser, the measured state could be the laser intensity.
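A minimal sketch of this time-multiplexing step is given below. All names (u, mask, samples_per_node, multiplex) are illustrative and not taken from the paper's code; samples_per_node stands in for the sampling resolution of one θ-interval.

```cpp
#include <vector>

// Stretch each scalar input over one clock cycle T = Nv * theta and multiply it
// by a piecewise-constant mask of Nv weights, producing the drive J(t) that
// enters the reservoir. Hypothetical names; a sketch, not the paper's implementation.
std::vector<double> multiplex(const std::vector<double>& u,
                              const std::vector<double>& mask,
                              int samples_per_node) {
    std::vector<double> J;
    J.reserve(u.size() * mask.size() * samples_per_node);
    for (double uk : u)                      // one clock cycle T per input value
        for (double m : mask)                // one theta-interval per virtual node
            for (int s = 0; s < samples_per_node; ++s)
                J.push_back(m * uk);         // piecewise-constant drive m(t) * u_k
    return J;
}
```

For two inputs and a two-weight mask, `multiplex({1.0, 2.0}, {0.5, -1.0}, 1)` yields the drive sequence 0.5, -1.0, 1.0, -2.0, one value per virtual-node interval.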

A linear combination of the state matrix is given by ŷ = S w, where w is a vector of weights. Such a combination is trained by ridge regression, i.e., the least-squares approximation to some target vector ŷ^T,

min_w ( ‖S w − ŷ^T‖² + λ ‖w‖² ),    (1)

where ‖·‖ is the Euclidean norm and λ is a Tikhonov regularization parameter. The solution to this problem is

w = (Sᵀ S + λ 𝕀)⁻¹ Sᵀ ŷ^T,    (2)

with 𝕀 the identity matrix. In the case of invertible Sᵀ S and vanishing λ, the matrix (Sᵀ S)⁻¹ Sᵀ is the Moore-Penrose pseudoinverse. We set λ small relative to the largest state response in the state matrix S. To quantify the system's performance, we use the capacity (see [48, 49]) to approximate a specific task, which is given by
C = 1 − NRMSE²,    (3)

where NRMSE is the normalized root mean square error between the approximation ŷ and the target ŷ^T,

NRMSE² = Σ_{k=1}^{K} (ŷ_k − ŷ^T_k)² / (K σ²(ŷ^T)),    (4)

and σ²(ŷ^T) is the variance of the target values.
Figure 3: State matrix corresponding to the timeline shown in Fig. 2.
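The ridge-regression training step described above can be sketched in a few lines of plain C++. This is a hedged, self-contained illustration solving the regularized normal equations by Gaussian elimination; all names are ours, and a production setup would use a linear algebra library (the simulations in this paper use Armadillo).

```cpp
#include <cmath>
#include <utility>
#include <vector>

using Mat = std::vector<std::vector<double>>;
using Vec = std::vector<double>;

// Sketch: solve w = (S^T S + lambda * I)^{-1} S^T y for a state matrix S
// (rows = clock cycles, columns = virtual nodes) and target y.
Vec ridge_weights(const Mat& S, const Vec& y, double lambda) {
    const std::size_t K = S.size();      // number of training inputs
    const std::size_t N = S[0].size();   // number of virtual nodes
    Mat A(N, Vec(N, 0.0));               // normal matrix S^T S + lambda * I
    Vec r(N, 0.0);                       // right-hand side S^T y
    for (std::size_t k = 0; k < K; ++k)
        for (std::size_t i = 0; i < N; ++i) {
            r[i] += S[k][i] * y[k];
            for (std::size_t j = 0; j < N; ++j) A[i][j] += S[k][i] * S[k][j];
        }
    for (std::size_t i = 0; i < N; ++i) A[i][i] += lambda;
    // Gaussian elimination with partial pivoting on the small normal system.
    for (std::size_t c = 0; c < N; ++c) {
        std::size_t p = c;
        for (std::size_t i = c + 1; i < N; ++i)
            if (std::fabs(A[i][c]) > std::fabs(A[p][c])) p = i;
        std::swap(A[c], A[p]); std::swap(r[c], r[p]);
        for (std::size_t i = c + 1; i < N; ++i) {
            const double f = A[i][c] / A[c][c];
            for (std::size_t j = c; j < N; ++j) A[i][j] -= f * A[c][j];
            r[i] -= f * r[c];
        }
    }
    Vec w(N, 0.0);                       // back substitution
    for (std::size_t i = N; i-- > 0;) {
        double s = r[i];
        for (std::size_t j = i + 1; j < N; ++j) s -= A[i][j] * w[j];
        w[i] = s / A[i][i];
    }
    return w;
}
```

For the consistent system S = [[1,0],[0,1],[1,1]], y = [1,2,3] and λ = 0, the solver recovers the exact weights w = [1, 2].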

3 Reservoir Computation Quantification

Here we introduce the linear memory capacity as a quantitative measure for the memory kernel of a dynamical system.

3.1 Memory Capacity

The central task-independent quantification was introduced by Jaeger in [1] and refined by Dambre et al. in [48], showing that the computational capability of a reservoir system can be quantified via an orthonormal set of basis functions on a sequence of inputs. Here we give a recap of the quantities introduced in [50, 49] and focus on the linear memory capacity.
In particular, the capacity to fulfill a specific task is given by

C[ŷ^T] = (ŷ^T)ᵀ S (Sᵀ S + λ 𝕀)⁻¹ Sᵀ ŷ^T / ‖ŷ^T‖²,    (5)

which can be derived from Eq. (2) (see [48, 49]). The capacity equals 1 if the reservoir computer computes the task perfectly, i.e., NRMSE = 0; it equals 0 if the prediction is not correlated with the target; and it lies between 0 and 1 if the reservoir is partially capable of fulfilling the task. To quantify the system's capability of approximating linear recalls of inputs, an input sequence of independent, identically and uniformly distributed random numbers u_k is applied. With this input sequence, the reservoir response is collected in the state matrix S.

To describe a linear recall task of l steps into the past, the target vector is defined componentwise as

(ŷ^T_l)_k = u_{k − l},    (6)

which is the linear recall l steps into the past. Formally, one considers an infinitely long input sequence; to approximate it numerically, we use a long but finite sequence.

The linear memory capacity MC is defined as the sum over the capacities of all possible linear recall tasks,

MC = Σ_l C_l,    (7)

where C_l is the capacity of the l-th recall into the past. This quantification is task-independent, and thus implications for specific applications cannot be given; different tasks may need different specific capacities. The measure MC thus only gives a hint toward well-performing reservoirs in the sense of using the full scope of the given reservoir, rather than a direct task-specific estimate. We have to point out that the linear-nonlinear trade-off is a well-known effect [48]; thus a system with high linear memory capacity can have a low nonlinear transformation capability. Nevertheless, we believe that predicting a reservoir with a well-performing linear memory kernel is beneficial for a general reservoir computer setup, as higher nonlinear memory transformation can be utilized by adding additional reservoir systems with increased perturbations.
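As a hedged numerical illustration: at the optimal readout weights, the capacity of a single task reduces to the squared Pearson correlation between the reservoir output and the target (equivalently, 1 − NRMSE²). The sketch below assumes this reduced form; the function name is ours.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Capacity of one task as the squared Pearson correlation between the trained
// output yhat and the target y; returns a value in [0, 1]. Illustrative sketch.
double capacity(const std::vector<double>& yhat, const std::vector<double>& y) {
    const std::size_t K = y.size();
    double my = 0.0, myh = 0.0;
    for (std::size_t k = 0; k < K; ++k) { my += y[k]; myh += yhat[k]; }
    my /= K; myh /= K;
    double cov = 0.0, vy = 0.0, vyh = 0.0;
    for (std::size_t k = 0; k < K; ++k) {
        cov += (yhat[k] - myh) * (y[k] - my);
        vy  += (y[k] - my) * (y[k] - my);
        vyh += (yhat[k] - myh) * (yhat[k] - myh);
    }
    if (vy == 0.0 || vyh == 0.0) return 0.0;  // degenerate target or output
    return (cov * cov) / (vy * vyh);
}
```

A perfectly (linearly) correlated output yields capacity 1, while a constant output yields capacity 0; summing such values over all recall depths l gives the MC of Eq. (7).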

3.2 Reservoir Systems

As one example system, we use a Stuart-Landau oscillator with delayed feedback, restricted to its real-valued amplitude:

ẋ(t) = (p − x²(t)) x(t) + κ x(t − τ) + η I(t).    (8)

Here x describes the real-valued amplitude of the system, κ is the feedback strength, τ the delay time, p is a parameter determining the dynamics, and η the strength of the input I(t) fed into the system.

For I = 0 and p + κ < 0, system (8) has only the trivial equilibrium x = 0; for p + κ > 0, additionally, the nontrivial equilibria x* = ±√(p + κ) exist, which appear in a pitchfork bifurcation at p = −κ. The linearization at the nontrivial equilibria (taking into account the input term) reads

δẋ(t) = a δx(t) + b δx(t − τ) + η I(t),    (9)

where δx = x − x*, a = p − 3(x*)² = −2p − 3κ, and b = κ.

As the second example, we use the Mackey-Glass system

ẋ(t) = −γ x(t) + β x(t − τ) / (1 + xⁿ(t − τ)) + η I(t),    (10)

where x(t) is the dynamical variable, x(t − τ) is the delayed variable, and β, γ, and n are control parameters. The reservoir input is fed into the system via the term η I(t). We choose parameter values for which the system possesses a stable nontrivial equilibrium x* = (β/γ − 1)^{1/n} (for I = 0). The corresponding linearization at this equilibrium is Eq. (9) with a = −γ and b = β (1 + (1 − n)(x*)ⁿ) / (1 + (x*)ⁿ)².
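To make the linearization step concrete, the following sketch computes a and b for a common parameterization of the Mackey-Glass nonlinearity, ẋ = −γx + βx_τ/(1 + x_τⁿ). The paper's exact parameter values are not reproduced here, so β, γ, and n are placeholders.

```cpp
#include <cmath>

struct LinParams { double a, b; };

// Equilibrium and linearization parameters of the Mackey-Glass form
// x' = -gamma x + beta x_tau / (1 + x_tau^n). The nontrivial equilibrium
// x* = (beta/gamma - 1)^(1/n) exists for beta > gamma. Illustrative sketch.
LinParams mackey_glass_linearization(double beta, double gamma, double n) {
    const double xs = std::pow(beta / gamma - 1.0, 1.0 / n);  // equilibrium x*
    const double xn = std::pow(xs, n);                        // x*^n = beta/gamma - 1
    const double a = -gamma;                 // derivative of the instantaneous part
    const double b = beta * (1.0 + (1.0 - n) * xn)  // derivative of the delayed part
                     / ((1.0 + xn) * (1.0 + xn));
    return {a, b};
}
```

For example, β = 2, γ = 1, n = 1 gives x* = 1, a = −1 and b = 0.5, a stable combination since a + b < 0.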

4 Results

4.1 Analytic description of memory capacity

From Eq. (5), we see that the capacity to approximate a specific input is given by the inverse of the covariance matrix (corrected by λ), also called the concentration matrix, multiplied with the state matrix and the target. Thus, it is necessary to derive the state matrix from the responses of the system to small perturbations. This has already been done for 1-dimensional reservoirs by an Euler step scheme [22], and for 1-dimensional reservoirs for specific differential equations [51]. We would like to extend this knowledge by analyzing arbitrary single-variable systems. We assume the virtual node distance θ to be small and the delay time to be an integer multiple of θ. We also assume the operation point of the reservoir to be a stable equilibrium. We will exemplarily validate our analysis on the two 1-dimensional nonlinear reservoirs given by Eqs. (8) and (10).

Our main result is the modified state matrix Ŝ, which can be used to determine the MC and which can be calculated solely from the linearized system. Its entries are given by


where the parameters a and b are given by the linearization (9) and the mask values are the weights of the time-multiplexing; the first index corresponds to the clock cycle and the second to the virtual node. The rows of the modified state matrix contain the entries in the statistical direction of the l-th shifted input. As we show in App. A, the covariance matrix of the modified state matrix approximates that of the original state matrix,


Moreover, we also show that the full linear memory capacity can be calculated by using solely the modified state matrix


and the capacity of the -th recall is given by


where the l-th row of Ŝ enters. Details of the derivations can be found in App. A.

We call the memory capacity given by Eq. (13) the master memory function (MMF). For given parameters a and b of the linearization, as well as the mask coefficients, this function can be evaluated much more efficiently than the direct evaluation of the linear memory capacity via stepwise integration of the differential equation; a speed comparison is given in App. B. The new approach does not require simulating the reservoir, and it does not involve the input sequence.

4.2 Efficient numerical evaluation of the memory capacity and the modified state matrix

The obtained approximations of the modified state matrix (11) and memory capacity function (13) allow for efficient numerical evaluation. For this, we propose the following scheme, which we also show as pseudocode in Alg. 1.

First, we iterate over all entries of the modified Pascal's triangle given in Fig. 6, which can be done by two nested loops over rows and columns. We do this until all entries in a row are below a given threshold (see Fig. 6). The threshold ensures that we cut unnecessary terms smaller than the regularization parameter λ. A third loop goes over all virtual nodes, adding the result multiplied by the corresponding mask weight to all entries that lie in the same input interval. See App. A for more information. The algorithm to compute the modified state matrix is given below, where ⌊·⌋ is the floor function rounding down to the greatest integer less than or equal to its argument, getBinomialTerm(i, j, p) returns the corresponding binomial series term, and w is the mask weight vector of length N_V. The implemented C++ code can be found in the supplementary material.

State ← 0;
for each row i = 0, 1, 2, … of the Pascal's triangle do
       for each column k = 0, …, i in row i do
             for each virtual node n = 1, …, N_V do
                   State(·, n) += getBinomialTerm(i, k, ·) · w(n);
             end for
       end for
       if all terms in row i are below the threshold then
             return State;
end for
Algorithm 1 Calculate the modified state matrix Ŝ
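A runnable reading of the truncation loop above is sketched below. We interpret the triangle entries as path weights binomial(i, j) · Λ^(i−j) · Φ^j with Λ = e^(aθ) and Φ = (b/a)(e^(aθ) − 1), which is our assumption based on the coupling weights derived in App. A; the actual supplementary code may organize the terms differently. A stable operation point (a + b < 0) is assumed so that the rows decay and the loop terminates.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Build the path weights of the feed-forward network row by row and stop once
// every entry of a row falls below the threshold, mirroring Alg. 1's cutoff.
std::vector<std::vector<double>> pascal_weights(double a, double b,
                                                double theta, double threshold) {
    const double Lambda = std::exp(a * theta);                 // left-neighbour weight
    const double Phi = (b / a) * (std::exp(a * theta) - 1.0);  // delayed-node weight
    std::vector<std::vector<double>> rows;
    std::vector<double> binom{1.0};  // current row of the ordinary Pascal's triangle
    for (int i = 0;; ++i) {
        std::vector<double> row(i + 1);
        double rowMax = 0.0;
        for (int j = 0; j <= i; ++j) {
            row[j] = binom[j] * std::pow(Lambda, i - j) * std::pow(std::fabs(Phi), j);
            rowMax = std::max(rowMax, std::fabs(row[j]));
        }
        rows.push_back(row);
        if (rowMax < threshold) break;  // every path weight in this row is negligible
        std::vector<double> next(i + 2, 1.0);  // binomial row i+1
        for (int j = 1; j <= i; ++j) next[j] = binom[j - 1] + binom[j];
        binom = next;
    }
    return rows;
}
```

For a stable linearization the row maxima are bounded by (Λ + |Φ|)^i ≈ (1 + (a + b)θ)^i, so the cutoff row count grows only logarithmically with the inverse threshold.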

4.3 Direct simulation of the reservoir and memory capacity

Simulations were performed in standard C. For linear algebra calculations, the linear algebra library Armadillo [52] was used. To numerically integrate the delay-differential equations, a fourth-order Runge-Kutta method was applied with a fixed integration step in dimensionless time units. First, the system is simulated without reservoir inputs, letting transients decay. After that, a buffer time of 10000 inputs was applied, which is excluded from the training process. In the training process, enough inputs were used to have sufficient statistics. Afterward, the memory capacities of the linear recalls were calculated with Eq. (5), whereby a testing phase is not necessary. The linear memory capacity MC was calculated by summing the obtained capacities C_l. For the piecewise-constant T-periodic mask function, independent and identically distributed random numbers were used.

For all simulations, the input strength was fixed to a small value; this guarantees linear responses of the reservoir and, hence, the relevance of the approximation.

A program written in C++ to perform the semi-analytic calculations is given in the supplementary material.

Figure 4: Linear memory capacity computed directly (blue dots) and via the MMF by Eq. (14) (green crosses) for the Stuart-Landau reservoir computer, plotted as a function of the recall steps. Panels (a,c) and (b,d) differ in the virtual node distance θ; panels (a,b) and (c,d) differ in the delay time τ. The values are averaged over 100 different masks; the remaining parameters are identical for all four setups.

4.4 Comparison of MMF and direct numeric calculations of the memory capacity

In this section we illustrate the effectiveness of the MMF. First, we show that the MMF provides a very good approximation of MC using the reservoir given by Eq. (8). The approximation works well as long as the virtual node distance θ is relatively small, which is fulfilled for typical reservoir computing setups, as one would otherwise lose computation speed. In the second part, we show how the MMF provides a universal, system-independent characteristic. For this, we compare the MMF with the memory capacities of different reservoirs; each particular reservoir realization is described by one parameter combination of the MMF. In the last part, we describe how the MMF can be computed for reservoirs with an unknown dynamical rule. For this, the parameters a and b of the linearization are measured from the system's response to a small periodic input.

Figure 4 shows the memory recall capacity obtained from direct simulations and compares it with the MMF for four different cases of the Stuart-Landau system given by Eq. (8). The parameters are given in the caption of Fig. 4. The directly simulated results are shown by blue solid lines and blue markers, while green dashed lines and green markers show the MMF. For a small virtual node distance θ in Fig. 4(a,c), the MMF predicts the linear memory capacity very accurately. For a higher value of θ (Fig. 4(b,d)), the accuracy drops, though the results are still accurate enough for qualitative predictions and describe the general trend of the system's memory capacity.

The scans in Fig. 4(c) and 4(d) were done with a higher delay time, which induces memory gaps [50]. Even though the memory capacity has a complex dependency on the recall step at these parameter values, the prediction for the two different virtual node distances is still accurate.

A 2-D parameter plane is scanned in App. E to show that the predictions of the MMF work for arbitrary parameter setups; the general predictive power of the new scheme is thus very promising.

Comparing the computation speed of the classical numerical integration and the newly proposed scheme shows a speedup of 2 to 3 orders of magnitude, depending on the operation point, the number of training steps, and the value of the clock cycle T. A higher clock cycle and more training steps increase the simulation time for the direct numerical integration, whereas the newly proposed scheme is independent of both. If the operation point is close to a bifurcation, the convergence of the new scheme is slower, increasing the computation time needed. Still, even very close to the bifurcation line, the computation speed is significantly higher (by a factor of about 100), making the MMF a valuable tool. See App. B.

Figure 5: MC computed by the MMF in the 2-dimensional parameter plane of the linearization parameters a and b. At the edge of instability the performance is highest. The inset shows the capacity over the recall steps for the MMF, the Stuart-Landau system, and the Mackey-Glass system at the parameter point indicated by the red cross.

4.5 Universality

An exciting result that follows from the MMF concept is the possibility of generalizing to arbitrary time-delay-based reservoirs: every reservoir with a similar linearization should yield a similar linear memory capacity. To illustrate this, we compare the Stuart-Landau reservoir system given by Eq. (8) and the Mackey-Glass reservoir system given by Eq. (10).

The inset of Fig. 5 illustrates this fact. It shows the capacity to recall the l-th step into the past as a function of l for the Stuart-Landau system (blue), the Mackey-Glass system (red), and the MMF given by Eq. (14) (green). Both systems are tuned such that their respective linearizations yield the same parameters a and b.

From this it follows that it is enough to compute the linearization parameters a and b to predict the MC of any delay-based reservoir computer. The color plot in Fig. 5 shows the MMF given by Eq. (13) for different parameter values a and b. A well-performing operation point seems to be at the edge of instability, agreeing with the known rule of thumb from the literature. Any reservoir yielding the same linearization parameters a and b in (9) must possess the corresponding memory capacity given by Fig. 5 for these parameter values, as soon as the input is sufficiently small.

It thus follows that analyzing the Jacobian (the linearization given by Eq. (9)) for fixed delay τ, virtual node distance θ, and number of virtual nodes N_V is sufficient to predict the linear memory capacity of any time-delay-based reservoir computer, and this memory capacity is given by the MMF via Eqs. (13) and (14).

4.6 Systems with unknown dynamics; small signal response approach

In this section, we show an experimentally accessible approach for measuring the parameters a and b of a delay system whose dynamical equations of motion are not known and which can be described by a single variable. The corresponding linearized dynamical system is given by

δẋ(t) = a δx(t) + b δx(t − τ).    (15)
The goal is to measure a and b. This can be achieved by perturbing the system with a harmonic periodic signal ε e^{iωt}. When this signal is small, we can consider the perturbed linearized system

δẋ(t) = a δx(t) + b δx(t − τ) + ε e^{iωt},    (16)

where the complex form is chosen for simplicity; due to linearity, the real solution is obtained simply by taking the real part. We consider the case of real a and b, which holds always when the reservoir variable is real.

Since the homogeneous solution decays to the stable equilibrium (we assume its exponential stability), the solution of Eq. (16) converges to the particular solution, given by

δx_p(t) = A e^{iωt},    (17)

with A = ε / (iω − a − b e^{−iωτ}). The ratio of the output to the input amplitude equals the absolute value of the transfer function,

|H(ω)| = |A| / ε = 1 / |iω − a − b e^{−iωτ}|,    (18)

which can be measured.

To determine the parameters a and b, it is sufficient to measure the transfer function at two frequencies, for example, at ω_r = 2π/τ and ω_a = π/τ. The first frequency is resonant with the delay, e^{−iω_r τ} = 1, while the second is in 'anti-phase' with the delay, e^{−iω_a τ} = −1. It holds

1/|H(ω_r)|² = ω_r² + (a + b)²,   1/|H(ω_a)|² = ω_a² + (a − b)².    (19)

From the above we can obtain the values of a and b,

a + b = ±√(1/|H(ω_r)|² − ω_r²),   a − b = ±√(1/|H(ω_a)|² − ω_a²),    (20)

where the values of |H(ω_r)| and |H(ω_a)| can be obtained experimentally or numerically by perturbing and measuring the response of the reservoir; the stability of the equilibrium fixes the sign a + b < 0.

We remark that the choices of the resonant and anti-phase perturbation frequencies are convenient but not unique; clearly, one can perturb at other frequencies to obtain a and b. Moreover, the above idea can be generalized to the case of complex-valued parameters a and b, whereby more frequencies must be probed.

The measured values of the parameters a and b for a reservoir with unknown dynamics can then simply be used in the MMF to estimate the linear memory capacities.
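A compact numerical sketch of this identification procedure is given below. The sign choices a + b < 0 (stability of the equilibrium) and a − b < 0 are assumptions that hold for the examples considered here but would have to be checked in an experiment; all function names are ours.

```cpp
#include <cmath>
#include <complex>

struct ABParams { double a, b; };

// Amplitude transfer function of the linearization, |H(w)| = 1/|i w - a - b e^{-i w tau}|,
// used here to stand in for the measured small-signal response.
double transfer_amplitude(double a, double b, double tau, double w) {
    const std::complex<double> i(0.0, 1.0);
    return 1.0 / std::abs(i * w - a - b * std::exp(-i * w * tau));
}

// Invert two measured amplitudes at the resonant (w_r = 2 pi / tau) and
// anti-phase (w_a = pi / tau) frequencies for a and b. The chosen branch signs
// (a + b < 0 and a - b < 0) are assumptions, not guaranteed in general.
ABParams estimate_from_response(double Hr, double Ha, double tau) {
    const double pi = std::acos(-1.0);
    const double wr = 2.0 * pi / tau, wa = pi / tau;
    const double apb = -std::sqrt(1.0 / (Hr * Hr) - wr * wr);  // a + b
    const double amb = -std::sqrt(1.0 / (Ha * Ha) - wa * wa);  // a - b
    return {0.5 * (apb + amb), 0.5 * (apb - amb)};
}
```

A round trip with, e.g., a = −1, b = 0.5 and τ = 10 reproduces the linearization parameters from the two simulated amplitude measurements.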

5 Conclusions and discussion

We have developed a simple and fast method for calculating the linear memory capacity of time-delay-based reservoirs with single-variable dynamics. The method builds on a modified state matrix whose rows point in the direction of the linear recall steps.

Our results can be used to predict reservoir computing setups with high linear memory capacity. The nonlinear memory capacity, on the other hand, remains an open question. Here, combined setups could be used, where a delay-based reservoir computer includes multiple uncoupled subsystems. The decoupling ensures that no highly complex dynamical responses destroy the computational performance of the reservoir. One time-delay-based reservoir computing subsystem can be tuned to low perturbations at the edge of instability to act as a high linear memory kernel. Increasing the perturbation strength for the other subsystems will ultimately increase the nonlinear responses and thus the nonlinear memory capacity, so that the subsystems with high input strengths take on the role of highly nonlinear transformation kernels.

A teamwork setup is thus recommended, in which one or a few subsystems, perturbed by small inputs and operated close to instability, act as a linear memory kernel, while other nodes are perturbed more strongly and thus act as highly nonlinear transformation units. Such a setup should be capable of tackling a wide range of different tasks. It would be interesting to investigate this in future works.

One of the advantages of the delay-based reservoir, which allows the introduction of the MMF, is that it contains a small number of system parameters while the dynamics remain infinite-dimensional. In the case of a small input signal and single-variable dynamics, these are only the linearization parameters a and b. Thus, if the linear memory capacity is computed for all possible values of these two parameters, it covers the case of all possible reservoirs. This procedure would be difficult, if not impossible, for network-based reservoirs, where the system parameters may include, e.g., multiple coupling weights.

Appendix A Derivation of the modified state matrix and reduced formula for memory capacity

Consider a single-variable delay-differential equation describing the dynamics of the reservoir,

ẋ(t) = f(x(t), x(t − τ)) + I(t),    (21)

where I(t) stands for an external input. We assume that x* is an equilibrium of this system, i.e., f(x*, x*) = 0, and that I(t) is small. In the case when the dynamics of (21) take place near the equilibrium point x*, we can introduce the perturbative ansatz x(t) = x* + δx(t). Then the linearized system for the perturbation reads

δẋ(t) = a δx(t) + b δx(t − τ) + I(t),    (22)

where a = ∂₁f(x*, x*), b = ∂₂f(x*, x*), and δx(t) is the deviation from the equilibrium.

Consider θ to be the temporal node-spacing of the reservoir, which is typically small. Then, equation (22) on any interval of length θ can be considered as a simple scalar ODE with the inhomogeneity b δx(t − τ) + I(t). Moreover, according to the reservoir setup, the input is constant on this interval and equals the corresponding masked input value. By the variation of constants formula, we obtain for the solution of (22)

δx(t) = e^{a(t − t₀)} δx(t₀) + ∫_{t₀}^{t} e^{a(t − s)} ( b δx(s − τ) + I(s) ) ds,

where t₀ is the left endpoint of the interval.

Denoting the restriction of the solution to the corresponding interval, we rewrite equation (A) as


where we additionally used the corresponding relations between the interval endpoints. By evaluating Eq. (A) at the node times, we obtain


Denote the approximation for the state of the reservoir (21) at the virtual nodes. From (A), we obtain


Further, we approximate the integral from Eq. (A) by assuming the delayed term and the input to be constant on the integration interval. The approximation holds, in particular, when θ is small. The obtained expression


represents a discrete map (a coupled map lattice) for approximating the state matrix, with coefficients determined by a, b, and θ. If considering it as a corresponding network of nodes, see e.g. [35, 53, 34], we see that each node is coupled with two nodes, its immediate temporal predecessor and the node one delay time in the past, in a feed-forward manner with the coupling weights e^{aθ} and (b/a)(e^{aθ} − 1), respectively. The schematic representation of such a coupling structure leads to the Pascal's triangle shown in Fig. 6. The first row of the Pascal's triangle from Fig. 6 shows the dependence on the immediate predecessor, which is simply the multiplication by e^{aθ}. In the second row, the contributions of the two coupled nodes are shown. To obtain these dependencies explicitly, we insert the corresponding states recursively in (27):


that is, we obtain the corresponding second-level terms. To build up further intuition about the dependence of the state matrix on the input, we show here the third level by substituting the states recursively into Eq. (A):


To obtain a general recursive formula, we need to split the index of each appearing term into a part counting the delayed ('right') connections and a part counting the 'left' connections of the coupling network in Fig. 6:


where the prefactor is a constant depending only on the numbers of left and right steps. For an infinitely long input sequence, the sum in (30) runs from 0 to infinity; practically, the sum is taken over the available data. As a result, the reservoir states are composed of a linear combination of the inputs with the corresponding coefficients given in Eq. (30).
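The feed-forward structure described above can be made concrete with a few lines of code. The following sketch iterates the homogeneous part of the coupled map (input terms omitted), with weight Λ = e^(aθ) for the left neighbour and Φ = (b/a)(e^(aθ) − 1) for the delayed node, which is our reading of the derivation; it merely illustrates the propagation structure, not a full reservoir simulation.

```cpp
#include <cmath>
#include <vector>

// Iterate x_k = Lambda * x_{k-1} + Phi * x_{k-d} starting from a single
// perturbed history value; d = tau / theta is the delay in units of the
// node spacing. Illustrative sketch with hypothetical names.
std::vector<double> iterate_map(double a, double b, double theta,
                                int d, int steps, double x0) {
    const double Lambda = std::exp(a * theta);                 // coupling to node k-1
    const double Phi = (b / a) * (std::exp(a * theta) - 1.0);  // coupling to node k-d
    std::vector<double> x(steps + d, 0.0);
    x[d - 1] = x0;  // one perturbed history value; the rest of the history is zero
    for (int k = d; k < steps + d; ++k)
        x[k] = Lambda * x[k - 1] + Phi * x[k - d];
    return x;
}
```

For a stable linearization (a + b < 0), e.g. a = −1, b = 0.5, the iterated perturbation decays toward the equilibrium, consistent with the echo state requirement discussed in the main text.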

Figure 6: Pascal's triangle showing the series contributions, given by all participating paths through the equilibria. Blue boxes show participating time steps, green boxes show multiplications by time-propagating factors, and red boxes give the equilibrium contributions for the specific time steps. For convenience, shorthand symbols denote the two coupling weights and the fixed-point factor contributions for a specific time step.

The elements of the state matrix used in the reservoir computing setup are

where the overbar denotes the average over the input intervals. Here and later, the dot denotes the index over which the averaging is performed. Taking into account Eq. (30), we obtain


The input of the reservoir computer is given by the discrete input sequence multiplied by the input weights:

Therefore, we obtain


since the input has zero mean. Hence, we obtain for the elements of the state matrix


Correspondingly, the elements of the covariance matrix from (5) are


and they describe the covariance of one virtual node with another over all clock cycles.

By substituting (33) into (34), we obtain


One can show that the elements of the covariance matrix containing mixed terms of different shifted inputs can be approximated by zero, since the inputs are independently distributed with zero mean. The only nonzero second moment that matters is the mean square of the inputs. Hence, for further simplification of the covariance matrix, we keep only the squared terms. The following calculations formalize these arguments.


where the second summation range (*) is taken over all values of such that and .

The obtained expression (37) does not depend on the input sequence and hence provides a significant simplification for calculating the covariance matrix. We may further notice that the same covariance (37) can be obtained by defining the modified state matrix Ŝ, whose first index refers to the l-th interval of the shifted input (the l-th recall) and whose second index to the n-th virtual node. Ŝ is given by the sum over all combinations of contributions that fall into the same shifted input interval, i.e.


This is our main result: Eq. (38) defines the modified state matrix from which all capacities are derivable. More specifically, we have shown


Further, for the l-th recall, where the target is the shifted input sequence, we have


therefore, it holds

where is the