In the past decade, there has been a proliferation of machine learning techniques applied in various fields, from spam filtering  to self-driving cars , including the more recent physical applications in fluid dynamics [6, 4]. However, a major hurdle in applying machine learning to complex physical systems, such as those in fluid dynamics, is the high cost of generating data for training . Nevertheless, this can be mitigated by leveraging prior knowledge (e.g. physical laws). Physical knowledge can compensate for the small amount of training data. These approaches, called physics-informed machine learning, have been applied to various problems in fluid dynamics [6, 4]. For example, [13, 5] improve the predictability horizon of echo state networks by leveraging physical knowledge, which is enforced as a hard constraint in 
, without needing more data or neurons. In this study, we use a hybrid echo state network (hESN), originally proposed to time-accurately forecast the evolution of chaotic dynamical systems, to predict the long-term time averaged quantities, i.e., the ergodic averages. This is motivated by recent research in optimization of chaotic multi-physics fluid dynamics problems with applications to thermoacoustic instabilities . The hESN is based on reservoir computing 
, in particular, conventional Echo State Networks (ESNs). ESNs have shown to predict nonlinear and chaotic dynamics more accurately and for a longer time horizon than other deep learning algorithms. However, we stress that the present study is not focused on the accurate prediction of the time evolution of the system, but rather of its ergodic averages, which are obtained by the time averaging of a long time series (we implicitly assume that the system is ergodic, thus, the infinite time average is equal to the ergodic average .). Here, the physical system under study is a prototypical time-delayed thermoacoustic system, whose chaotic dynamics have been analysed and optimized in .
2 Echo State Networks
The ESN approach presented in  is used here. The ESN is given an input signal , from which it produces a prediction signal that should match the target signal , where is the discrete time index. The ESN is composed of a reservoir, which can be represented as a directed weighted graph with nodes, called neurons, whose state at time
is given by the vector. The reservoir is coupled to the input via an input-to-reservoir matrix, , such that its state evolves according to
where is the weighted adjacency matrix of the reservoir, i.e. is the weight of the edge from node to node . In Eq. 1
, the hyperbolic tangent is used as the activation function. Finally, the prediction is produced by a linear combination of the states of the neurons
where . In this work, we are interested in dynamical system prediction. Thus, the target at time step is the input at time step , i.e. . We wish to learn ergodic averages, given by
where is a cost functional, of a dynamical system governed by
where is the state vector and is a nonlinear operator. The training data is obtained via numerical integration of Eq. 4, resulting in the time series , where the different samples are taken at equally spaced time intervals , and is the length of the training data set. In the conventional ESN approach, and are generated once and fixed to satisfy the Echo State Property . Only is trained to minimize the mean-squared-error
To avoid overfitting, we use ridge regularization, so the optimization problem is , where is the regularization factor. Because the prediction is a linear combination of the reservoir state , the optimal can be explicitly obtained with , where
is the identity matrix andand are the column-concatenation of the various time instants of the output data, , and corresponding reservoir states, , respectively. After the optimal is found, the ESN can be used to predict the time evolution of the system. This is done by looping back its output to its input, i.e. , which, on substitution into Eq. 1, results in
with . Interestingly, Eq. 6 shows that if the reservoir follows an evolution of states , where is the number of prediction steps, then is also possible, because is a solution of Eq. 6. This implies that either the attractor of the ESN (if any) is symmetric, i.e. if some is in the ESN’s attractor, then so is ; or the ESN has a co-existing symmetric attractor. While this seemed not to have been an issue in short-term prediction, such as in [13, 5], it does pose a problem in the long-term prediction of statistical quantities. This is because the ESN, in its present form, can not generate non-symmetric attractors. This symmetry needs to be broken to work with a general non-symmetric dynamical system. This can be done by including biases . However, the addition of a bias can make the reservoir prone to saturation (results not shown), i.e.
, and thus great care needs to be taken in the choice of hyperparameters. In this paper, we break the symmetry by exploiting prior knowledge on the physics of the problem under investigation with a hybrid ESN.
3 Physics-informed and hybrid Echo State Network
The ESN’s performance can be increased by incorporating physical knowledge during training  and/or prediction . This physical knowledge is usually present in the form of a reduced-order model (ROM) that can generate (imperfect) predictions. The authors of  introduced a physics-informed ESN (PI-ESN), which constrains the physics as a hard constraint in a physics loss term. The prediction is consistent with the physics, but the training requires nonlinear optimization. The authors of 
introduced a hybrid echo state network (hESN), which incorporates incomplete physical knowledge by feeding the prediction of the physical model into the reservoir and into the output. This requires ridge regression. Here, we use a hESN (Figure1) because we are not interested in constraining the physics as a hard constraint for an accurate short-term prediction .
4 Learning the ergodic average of an energy
We use a prototypical time-delayed thermoacoustic system composed of a longitudinal acoustic cavity and a heat source modelled with a nonlinear time-delayed model [14, 8], which has been used to optimize ergodic averages in  with a dynamical systems approach. The non-dimensional governing equations are
where , , and are the non-dimensionalized acoustic velocity, pressure, damping and heat-release rate, respectively. is the Dirac delta. These equations are discretized by using Galerkin modes
which results in a system of oscillators, which are nonlinearly coupled through the heat released by the heat source
where is the heat source location and is the modal damping . The heat release rate, , is given by the modified King’s law , , where and are the heat release intensity parameter and the time delay, respectively. With the nomenclature of Section 2, . Using 10 Galerkin modes (), and results in a chaotic motion (Fig. 2), with the leading Lyapunov exponent being . (The leading Lyapunov exponent measures the rate of (exponential) separation of two close initial conditions, i.e. an initial separation grows asymptotically like .) However, for the same choice of parameter values, the solution with is a limit cycle (i.e. a periodic solution).
The echo state network is trained on data generated with , while the physical knowledge (ROM in Fig. 1) is generated with only. As relevant to optimization of chaotic acoustic oscillations , we wish to predict the time average of the instantaneous acoustic energy, . The reservoir is composed of 100 units, a modest size, half of which receives their input from , while the other half receives it from the output of the ROM, . The entries of
are randomly generated from the uniform distribution. The matrix is highly sparse, with only 3% of non-zero entries from a uniform distribution between -1 and 1. Finally, is scaled such that its spectral radius is 0.1 and 0.3 for the ESN and the hESN, respectively. The time step is . The network is trained for units, which corresponds to 6 Lyapunov times, i.e. . The data is generated by integrating Eq. 9 in time with , resulting in . In the hESN, the ROM is obtained by integrating the same equations, but with (one Galerkin mode only) unless otherwise stated. Ridge regression is performed with . The hyperparameters’ values are taken from the literature [13, 5] and a grid search.
On the one hand, Fig. (a)a shows the instantaneous error of the first modes of the acoustic velocity and pressure for the ESN, hESN and ROM. None of these can accurately predict the instantaneous state of the system. On the other hand, Fig. (b)b shows the error of the prediction of the average acoustic energy. Once again, the ROM alone does a poor job at predicting the statistics of the system, with an error of 50%. This should not come at a surprise since, as discussed previously, the ROM does not even produce a chaotic solution. The ESN, trained on data only, performs marginally better, with an error of 48%. In contrast, the hESN predicts the time-averaged acoustic energy satisfactorily, with an error of about 7%. This is remarkable, since both the ESN and the ROM do a poor job at predicting the average acoustic energy. However, when the ESN is combined with prior knowledge from the ROM, the prediction becomes significantly better. Moreover, while the hESN’s error still decreases at the end of the prediction period,
, which is 5 times the training data time, the ESN and the ROM stabilize much earlier, at a time similar to that of the training data. This result shows that complementing the ESN with a cheap physical model (only 10% the number of degrees of freedom of the full system) can greatly improve the accuracy of the predictions, with no need for more data or neurons. With a slightly more accurate ROM (), the error further decreases to 3% (result not shown).
The performance of the network is sensitive to the hyperparameters and the values that yield good performance for a certain may perform poorly for another set. Figure 4 shows the phase plots of the full model and of the hESN when the physical parameters are varied. The hyperparameters we selected perform well with the system under investigation (, ) (Fig. 4, middle panel). However, if no further fine tuning is carried out, the same hyperparameters perform poorly for a different set of physical parameters (Fig. 4, left and right panels). This well-known drawback in machine learning techniques  calls for robust methods for the automatic selection of the optimal hyperparameters. This is the scope of other current studies.
5 Conclusion and future directions
We propose the use of echo state networks informed with incomplete prior physical knowledge for the prediction of time averaged cost functionals in chaotic dynamical systems. We apply this to chaotic acoustic oscillations, which is relevant to aeronautical propulsion. The inclusion of physical knowledge comes at a low cost and significantly improves the performance of conventional echo state networks from a 48% error to 7%, without requiring additional data or neurons. The ability of the proposed ESN can be exploited in the optimization of chaotic systems by accelerating computationally expensive shadowing methods . For future work, (i) the performance of the hybrid echo state network should be compared against those of other physics-informed machine learning techniques; (ii) robust methods for hyperparameters’ search are needed; and (iii) this technique is currently being applied to larger scale problems. In summary, the proposed framework is able to learn the ergodic average of a fluid dynamics system, which opens up new possibilities for the optimization of highly unsteady problems.
-  (2012) Practical recommendations for gradient-based training of deep architectures. In Neural Networks: Tricks of the Trade: Second Edition, pp. 437–478. External Links: Cited by: §4.
-  (1931) Proof of the ergodic theorem. Proceedings of the National Academy of Sciences 17 (12), pp. 656–660. External Links: Cited by: §1.
-  (2016-04) End to End Learning for Self-Driving Cars. arXiv e-prints, pp. arXiv:1604.07316. External Links: Cited by: §1.
-  (2020) Machine learning for fluid mechanics. Annual Review of Fluid Mechanics 52 (1). External Links: Cited by: §1.
-  (2019) Physics-informed echo state networks for chaotic systems forecasting. In Computational Science – ICCS 2019, Cham, pp. 192–198. External Links: Cited by: §1, §2, §3, §4.
-  (2019) Turbulence modeling in the age of data. Annual Review of Fluid Mechanics 51 (1), pp. 357–377. External Links: Cited by: §1.
-  (2009) A review of machine learning approaches to spam filtering. Expert Systems with Applications 36 (7), pp. 10206 – 10222. External Links: Cited by: §1.
-  (2020) Stability, sensitivity and optimisation of chaotic acoustic oscillations. Journal of Fluid Mechanics 882, pp. A24. External Links: Cited by: §1, §4, §4.
-  (2004) Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science 304 (5667), pp. 78–80. External Links: Cited by: §2.
Reservoir computing approaches to recurrent neural network training. Computer Science Review 3 (3), pp. 127 – 149. External Links: Cited by: §1.
-  (2012) A practical guide to applying echo state networks. pp. 659–686. External Links: Cited by: §2.
-  (2017) Sensitivity analysis on chaotic dynamical systems by non-intrusive least squares shadowing (NILSS). Journal of Computational Physics 347, pp. 56 – 77. External Links: Cited by: §5.
-  (2018) Hybrid forecasting of chaotic processes: using machine learning in conjunction with a knowledge-based model. Chaos: An Interdisciplinary Journal of Nonlinear Science 28 (4), pp. 041101. External Links: Cited by: §1, §2, §3, §4.
-  (2019) Data assimilation in a nonlinear time-delayed dynamical system with lagrangian optimization. In Computational Science – ICCS 2019, Cham, pp. 156–168. External Links: Cited by: §4.