Chemically reacting flows are common in many fields of engineering, such as hypersonic flow, combustion, explosions, manufacturing processes, and environmental assessments [28, 26, 3, 16]. For hydrocarbon combustion and explosions, the numbers of species and reactions can reach into the hundreds and thousands, respectively. Even with so-called reduced models [29, 11, 19, 14, 17, 27], which try to retain the main species and reactions while neglecting those that are unimportant, typically over 100 reactions need to be updated. An expedient (and widely used) way to compute flows with chemical reactions is to separate the advection and diffusion of species from the actual reactions. In this way, the vastly different timescales of the reactants can be treated in a separate, stiff ODE solver. Such chemical reaction solvers take the species at the current time step, together with the desired timestep, and update the species to the next time step. In terms of a ‘black box’ solver this implies either (1) a map that directly advances the species over the timestep, or (2) an evaluation of the right-hand side of the system followed by an explicit update. The map in (1) stands for the ODE integrator of the chemical reactions; formulation (2) corresponds to solving the system of ODEs numerically with the standard (forward) Euler method. Compared to a typical ‘cold’ flow case, the presence of these chemical reactions can increase computing requirements by factors exceeding 1:100, i.e., two orders of magnitude. This makes many of these flow simulations so expensive that entire classes of problems have been sidelined, waiting for faster computers to be developed in years to come. The goal here is to replace the ‘black box’ solvers given in (1) and (2) by novel, robust Deep Neural Networks (DNNs) without sacrificing accuracy.
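The contrast between the two formulations can be sketched on a toy stiff problem. The following Python sketch is illustrative only (the paper's system, solver, and notation are not reproduced): a single-species decay equation stands in for the chemistry, a sub-stepped integrator plays the role of the black-box solver in (1), and one forward Euler step plays the role of formulation (2).

```python
import numpy as np

# Toy single-species analogue of (1)-(2): dy/dt = f(y) with stiff linear
# decay f(y) = -K*y. All names and constants here are illustrative.
K = 1.0e3  # large rate constant makes the ODE stiff

def rhs(y):
    """Right-hand side of the chemistry ODE (the role of the function in (2))."""
    return -K * y

def integrator(y, dt, substeps=1000):
    """Black-box integrator advancing y by dt (the role of the map in (1)).
    Here: many small explicit substeps; a production chemistry solver
    would use an implicit stiff method instead."""
    h = dt / substeps
    for _ in range(substeps):
        y = y + h * rhs(y)
    return y

y0, dt = 1.0, 1.0e-3
y_via_integrator = integrator(y0, dt)   # formulation (1)
y_via_euler = y0 + dt * rhs(y0)         # formulation (2): one Euler step
exact = y0 * np.exp(-K * dt)
```

A single Euler step at this timestep is wildly inaccurate (it lands at 0 while the exact value is about 0.37), which is exactly why the chemistry is delegated to a separate stiff solver.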
The list of references on using DNNs in computational fluid dynamics (CFD) is growing fast; see, for example, [18, 8]. However, results on using DNNs in chemical kinetics are scarce. A popular approach to solve PDEs and ODEs is through the use of so-called Physics-Informed Neural Networks (PINNs) [23, 5]. The goal of PINNs is to minimize the PDE/ODE residual by using a neural network as a PDE/ODE solution Ansatz. The inputs to the network are space-time variables, and all derivatives are computed using automatic differentiation. PINNs have also been applied to stiff ODE systems where the only input is time.
The approach presented here fundamentally differs from the aforementioned approaches. Instead of Physics-Informed Neural Networks, the goal is to pursue Learn-from-Physics/Chemistry: in (1) and (2), DNNs are used to learn the integrator and the right-hand side from a given dataset coming from physics/chemistry simulations. Such an approach has also been considered recently in [21, 25, 20], where the authors employ an architecture motivated by standard feed-forward networks. A related work considers a similar problem but uses an autoencoder, a type of DNN used to reduce the dimension of the system. Notice that the proposed approach allows the chemical reactions described by (1) and (2) to start at any point in time without knowing a reference time. The latter is crucial for the application under consideration.
The DNNs used in this paper have been motivated by the Residual Neural Network (ResNet) architecture. ResNets were introduced in [9, 24, 2] in the context of data/image classification; they have also been applied to parameterized PDEs, and the (related) so-called Neural ODE Nets have been used to solve stiff ODEs. The ResNet architecture is known to overcome the vanishing gradient problem, which has been further analyzed using fractional order derivatives. The key feature of a ResNet is that, in the continuum limit, it becomes an optimization problem constrained by an ODE. Such a continuous representation further enables the analysis of the stability of the network using nonlinear ODE theory. In addition, standard approaches from optimization, such as the Lagrangian approach, can be used to derive the sensitivities with respect to the unknown weights and biases.
The main novelties in the DNNs presented in this work are the following:
Motivated by chemically reacting flows, the goal is to create networks that can learn multiple reactions propagating multiple species. To accomplish this task, parallel ResNets are constructed, where data corresponding to multiple quantities is used as input for each network, but the output of each network is a single species. Similar approaches for chemical kinetics can be found in [21, 25], where the authors use standard feed-forward networks.
The proposed DNNs are applied to non-trivial chemically reacting flows. Experimental results show that it is helpful to know the underlying properties of the species while designing the networks.
The remainder of the paper is organized as follows: In Section 2, the DNNs used to approximate systems of type (1) and (2) are introduced, and training strategies are discussed. Section 3 provides some information on the implementation of the method. Several experimental results are discussed in Section 4. Additional information on the data being used in the experiments is provided in Appendix A.
2. Deep Residual Neural Networks
2.1. Problem Formulation
Consider an input-to-output map, where the map could be either the integrator in (1) or the right-hand side in (2). The goal is to learn an approximation of this map using Deep Neural Networks (DNNs). In particular, the proposed DNNs are motivated by Deep Residual Neural Networks (ResNets). See [24, 2] for examples of ResNets for classification problems and applications to parameterized PDEs.
Given a training dataset, the DNN approximation is given by the output of the DNN (ResNet), defined layer by layer by

y_{l+1} = y_l + τ σ(W_l y_l + b_l), l = 0, …, L − 1, (3)

with the network output taken from the final layer. Here τ > 0 is a given parameter called the skip connection parameter, and the nonlinear activation σ is a smooth approximation of the ReLU function. Figure 1 (left) shows that σ is a good approximation of ReLU; here, and in the experiments below, the smoothing parameter is taken to be 0.1. The number of layers (depth of the network) is denoted by L. Layer 0 is referred to as the input layer, layers 1 through L − 1 as hidden layers, and layer L as the output layer.
The quantities W_l (the weights) and b_l (the biases) are the parameters that need to be determined. In this setting W_l is a matrix and b_l is a vector; together they define an affine transformation. The number of neurons in a layer is referred to as the width of that layer. Notice that in (3) the layer widths are assumed to be equal, but this can be easily generalized. Nevertheless, in the current setup the dimension of the input and output can be different.
An example of a typical DNN is shown in Figure 1 (right), with depth 3, width 8, input dimension 10, and output dimension 1. The values of the weights are given by the color of the lines connecting the neurons, the values of the biases are given by the color of the neurons, and the biases of the input layer have been set to zero.
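The layer-by-layer update can be made concrete in a few lines. The following Python sketch is illustrative: the smoothing of ReLU shown is one common choice and is an assumption (the paper's exact smoothed activation is not reproduced), and the input is assumed to have the same width as the hidden layers, as in the text.

```python
import numpy as np

def smooth_relu(x, eps=0.1):
    # One common smooth approximation of ReLU (an assumption; the paper's
    # exact smoothing is not reproduced). eps = 0.1 matches the text.
    return 0.5 * (x + np.sqrt(x * x + eps * eps))

def resnet_forward(y, weights, biases, tau):
    """Hidden layers follow the skip-connection update
    y <- y + tau * sigma(W y + b); the final layer is a plain affine map."""
    for W, b in zip(weights[:-1], biases[:-1]):
        y = y + tau * smooth_relu(W @ y + b)
    return weights[-1] @ y + biases[-1]

# Tiny example: depth 3, width 8, output dimension 1, random parameters.
rng = np.random.default_rng(0)
width, depth = 8, 3
weights = [rng.standard_normal((width, width)) for _ in range(depth)]
weights.append(rng.standard_normal((1, width)))  # output layer: width -> 1
biases = [rng.standard_normal(width) for _ in range(depth)]
biases.append(rng.standard_normal(1))
out = resnet_forward(rng.standard_normal(width), weights, biases, tau=0.5)
```

Note how the smooth activation is essentially zero for large negative arguments and essentially the identity for large positive ones, matching ReLU away from the origin.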
To solve the resulting minimization problem, the gradient of the loss function with respect to the weights and biases needs to be evaluated. This requires introducing the so-called adjoint equation (also known as back-propagation in the machine learning literature). See [24, 2, 1] for complete details. Having obtained the gradient, an approximate Hessian is computed via the BFGS routine. Both the gradient and the approximate Hessian are used to solve the minimization problem.
2.2. Proposed Approach and Applications
The simplest time discretization of (5) is the forward Euler scheme (6), which advances the state over one time increment using the right-hand side evaluated at the current step, for each time step in the data. In the first approach, the right-hand side of (6) is approximated; the training data (input and output) then consists of states paired with the corresponding values of the right-hand side. In the second approach, the map to be approximated is the one-step update itself, i.e., the setting of (1), which marches the state forward in time with the given time increments. The training data for this approach consists of pairs of states at consecutive time steps.
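Given a sampled trajectory, the two kinds of training pairs can be assembled as follows. This Python sketch is illustrative; in particular, approximating the right-hand-side targets by finite differences of the trajectory is an assumption (the targets could equally come directly from the chemistry solver).

```python
import numpy as np

def training_pairs_first(u, dt):
    """First approach: learn the right-hand side of the ODE.
    Targets are finite-difference approximations of the RHS (an
    assumption); inputs are the states at each time step."""
    inputs = u[:-1]
    targets = (u[1:] - u[:-1]) / dt
    return inputs, targets

def training_pairs_second(u):
    """Second approach: learn the one-step update map, as in (1).
    Targets are simply the states at the next time step."""
    return u[:-1], u[1:]

# Example on a short trajectory sampled with increment dt.
u = np.array([0.0, 1.0, 3.0])
x1, y1 = training_pairs_first(u, dt=0.5)   # RHS targets
x2, y2 = training_pairs_second(u)          # next-step targets
```

Note that both constructions lose one sample per trajectory, since there is no next step (or difference) available at the final time.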
The training data in the examples below has been computed using CHEMKIN. Before training the networks, the data is scaled in the following manner: for a data set corresponding to a single quantity (e.g., temperature), the mean of the set is computed and each entry is centered and normalized as in (7). Using this new centered and normalized set, the training data is obtained by the affine map in (8), so that the resulting data set lies in a fixed interval. Given that chemical reactions follow an exponential Arrhenius-type law, a logarithmic scaling was also tried; this amounts to performing the above scaling on the logarithm of the data set instead of the data itself.
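A minimal version of this preprocessing can be sketched as follows. The interval endpoints and the exact normalization are assumptions (the paper's constants in (7) and (8) are not reproduced); only the overall structure — center and normalize, then map affinely into a fixed interval, optionally after taking logarithms — follows the text.

```python
import numpy as np

def scale(data, lo=0.1, hi=0.9, log=False):
    """Center/normalize a 1-D data set, then map it affinely into
    [lo, hi] (endpoints illustrative). With log=True the same scaling
    is applied to log(data), per the Arrhenius motivation."""
    x = np.log(data) if log else np.asarray(data, dtype=float)
    x = (x - x.mean()) / (x.max() - x.min())   # centered and normalized
    x = (x - x.min()) / (x.max() - x.min())    # mapped to [0, 1]
    return lo + (hi - lo) * x                  # mapped to [lo, hi]

temps = np.array([1.0, 2.0, 3.0, 4.0])
scaled = scale(temps)   # lies in [0.1, 0.9], order preserved
```

The map is monotone, so the ordering (and hence the shape of each profile) is preserved; only the range changes.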
For the DNNs implemented below, the input dimension is always given by the number of quantities in the data, i.e., temperature plus the number of species; the output dimension likewise corresponds to temperature plus the number of species. Rather than using a single DNN to approximate the entire solution map, in many cases a parallel ResNet architecture is implemented in which a separate ResNet is created for each desired output. With this structure the output dimension of each ResNet is 1, but there are as many ResNets, implemented in parallel, as there are output quantities. The inputs to all of the parallel ResNets are the same, and so the parallel architecture can also be thought of as a single large ResNet that is not fully connected.
Loss Function and Training.
In the case of a parallel ResNet architecture, each parallel network is associated with a separate loss function; the same form of the loss function is used for each network. Letting the concatenation of all weight and bias parameters associated to a given parallel network be the optimization variable, the loss function (9) consists of the mean squared error between the network output and the training data, plus l1 and l2 regularization terms that penalize the size of the parameters. In other words, training each network amounts to finding the parameters which minimize this regularized mean squared error. As indicated in Section 2, a gradient-based method (BFGS in particular) is used to solve the constrained optimization problem.
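A loss of the form described in (9) can be sketched as follows. The regularization weights are illustrative placeholders (the paper's values are not reproduced).

```python
import numpy as np

def loss(params, outputs, targets, lam1=1e-4, lam2=1e-4):
    """Mean squared error plus l1 and l2 penalties on the parameters,
    in the spirit of (9). lam1, lam2 are illustrative values."""
    mse = np.mean((outputs - targets) ** 2)
    l1 = lam1 * np.sum(np.abs(params))   # l1 penalty: promotes sparsity
    l2 = lam2 * np.sum(params ** 2)      # l2 penalty: shrinks magnitudes
    return mse + l1 + l2

# With a perfect fit, only the penalty terms remain.
value = loss(np.array([1.0, -2.0]), np.array([1.0]), np.array([1.0]))
```

With outputs equal to targets, the value reduces to lam1·(|1| + |−2|) + lam2·(1 + 4) = 8e−4 for the placeholder weights above.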
DNNs are trained with validation data in order to overcome the overfitting issue. The validation data is a subset of the training data that is not used to compute the weights and biases. Instead, a separate loss function is formed that computes the mean squared error between the ResNet output and the validation data. This is a way to test how the ResNet performs on unseen data during the training process itself. If the validation error increases, the training continues for a certain number of additional iterations, called the patience. During this time, if the validation error decreases to a new minimum, the patience resets to zero and training continues. If the validation error does not attain a new minimum and the full number of patience iterations is reached, then the training process is terminated.
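The patience mechanism described above can be sketched as a generic early-stopping loop. The function names and the stand-in callbacks are illustrative, not the paper's implementation.

```python
def train_with_patience(step, val_error, max_iters=10000, patience=250):
    """Run training iterations until the validation error has not
    reached a new minimum for 'patience' consecutive iterations.
    'step' performs one training iteration; 'val_error' returns the
    current validation error (both illustrative callbacks)."""
    best, since_best = float("inf"), 0
    for _ in range(max_iters):
        step()
        err = val_error()
        if err < best:
            best, since_best = err, 0    # new minimum: patience resets
        else:
            since_best += 1
            if since_best >= patience:   # patience exhausted: stop
                break
    return best

# Example: validation error improves, then stagnates; patience = 2.
errs = iter([5.0, 4.0, 3.0, 3.5, 3.2, 3.4, 3.3])
best = train_with_patience(lambda: None, lambda: next(errs),
                           max_iters=7, patience=2)
```

Here training stops two iterations after the minimum of 3.0 is reached, since neither subsequent error improves on it.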
After training and validation comes the testing phase, in which the DNN approximations are used to carry out time marching. Using the first approach (see above), this involves implementing the Euler time-stepping (10) commonly used to numerically solve ODEs, with the usual right-hand side of the ODE replaced by the DNN output; only the initial condition is taken from known data. For the second approach, results are tested by repeatedly applying the learned update map, where again only the initial condition is known.
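The two testing procedures can be sketched as follows. Plain functions stand in for the trained networks; all names are illustrative.

```python
import numpy as np

def march_first_approach(net_rhs, u0, dt, nsteps):
    """Euler stepping as in (10), with the ODE right-hand side replaced
    by the (surrogate) network output; only u0 is known data."""
    u = [np.asarray(u0, dtype=float)]
    for _ in range(nsteps):
        u.append(u[-1] + dt * net_rhs(u[-1]))
    return np.array(u)

def march_second_approach(net_step, u0, nsteps):
    """Second approach: repeatedly apply the learned one-step update map."""
    u = [np.asarray(u0, dtype=float)]
    for _ in range(nsteps):
        u.append(net_step(u[-1]))
    return np.array(u)

# Stand-ins for trained networks (illustrative):
traj1 = march_first_approach(lambda u: -u, [1.0], dt=0.1, nsteps=2)
traj2 = march_second_approach(lambda u: 0.5 * u, [1.0], nsteps=3)
```

Because each step feeds on the previous prediction, any per-step error compounds over the trajectory, which is the error-accumulation effect discussed for the marched results below.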
The above methods are applied to a system of ODEs that models the hydrogen-oxygen reaction. In particular, the reduced hydrogen-air reaction model with 8 species and 18 reactions is used. This model is simpler than the hydrocarbon reaction models mentioned in Section 1, yet it still captures several essential key features. More complicated DNNs which can handle over 100 reactions will be part of future work.
3. Implementation

To start, the second approach was implemented in Keras using stochastic gradient descent. Libraries such as Keras have proven to be highly successful for classification problems, but their success in physics-based modeling is still limited. Initially, the results produced using Keras were unable to capture the ‘pulse’ in the chemical reactions. Even after modifying some of the Keras code, which yielded some improvement, the results remained highly inconsistent: running the same experiment multiple times with the same hyperparameters produced different solutions. Such inconsistency may not be critical for classification problems (for example [13, 15]), but it is critical for ODE/PDE problems. Implementation through Keras was abandoned, as further customization of the code proved to be a daunting task.
The DNN implementation used in this paper was carried out in MATLAB, and a BFGS routine with an Armijo line search was used to solve the optimization problem. Unlike many neural network software packages, this code follows an optimal control framework and closely resembles code that is used to solve optimal control problems.
4. Results

In this section a variety of results are presented, representing different approaches and architectures. The results are grouped by the type of data used to train the networks; further details on the training data are provided in Appendix A. The loss function is as given in (9); the skip connection parameter in (3) is chosen in terms of the number of hidden layers, and a constant time increment is used. Unless otherwise specified, all training data was scaled by the method outlined in (7) and (8). All of the results presented below were achieved using the MATLAB implementation of the ResNets described above. The blue curves in the plots below represent the ‘true solution’ (data generated using CHEMKIN), while the dashed red curves represent the DNN output.
4.1. Training with a single data set.
Using the first approach, a single ResNet was trained on data corresponding to temperature and the 8 species. In Figure 2, the results from a ResNet with depth 9 and width 15 are shown. For this experiment only one of the regularization terms has been used in the loss function. After being trained, the training data was used again to test the network, and these results can be seen in the left plots of Figure 2. The initial condition from the training data was marched forward in time as described in (10), and the results are shown on the right of Figure 2, compared with the actual data.
4.2. Training data sets with varying equivalence ratio and fixed initial temperature.
Experimental results showed that while a single ResNet was able to handle data corresponding to a single initial condition, it struggled with data containing multiple initial conditions. To improve results, the parallel ResNet structure described in Section 2.2 was created. The parallel DNNs were trained on a data set containing multiple initial conditions with the same initial temperature but varying equivalence ratio (see Appendix A). Figure 3 shows the results for ResNets trained on this data without the use of validation data. Each ResNet has depth 4 and width 10. For this result, the testing data was a subset of the training data, and so these plots do not represent a generalization of the networks to unseen data. Figure 4 shows the result of an experiment trained with a configuration of data subsets from the same data set as Figure 3 (see Appendix A for specific details). Additionally, validation data was used during training with a patience of 250 iterations. The ResNets used for this experiment have varying depths (3 to 5) but a fixed width of 10. The plots in Figure 4 are the result of testing the DNNs on data that was not used in training. For these results mass conservation is also enforced, in the sense that the sum of the densities of the 8 species is adjusted at the end of each update step (10) to equal that of the previous step. This is the only experiment presented in this work in which mass conservation is enforced.
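The mass-conservation adjustment described above can be sketched as a simple rescaling. This is an assumption about the form of the adjustment (the paper specifies only that the sum of the species densities is matched to the previous step); a proportional rescaling is the most direct way to do so.

```python
import numpy as np

def enforce_mass_conservation(species_new, species_old):
    """Rescale the updated species densities so that their sum equals
    the sum at the previous step (a sketch; the paper's exact
    adjustment is not specified)."""
    total_old = np.sum(species_old)
    total_new = np.sum(species_new)
    return species_new * (total_old / total_new)

# Example: the raw update drifted from total density 2.0 to 4.0.
adjusted = enforce_mass_conservation(np.array([1.0, 3.0]),
                                     np.array([1.0, 1.0]))
```

The rescaling preserves the relative proportions of the species while pinning the total density to its previous value.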
The next result uses the second approach described in Section 2.2, in which the ResNets are trained to learn the next time step rather than the right-hand side of the ODE. For the results shown in Figure 5, parallel ResNets of depth 7 and width 50 were used. These ResNets were trained using validation data with a patience of 400. For the nine subplots on the left side of Figure 5, the ResNets are used to predict only a single timestep: the red dashed curves are produced by applying the networks to the known data at every time step, so that each prediction starts from known data. The right nine subplots show the result of marching in time, where only the initial condition is taken from known data. The plots on the left are included as evidence that the networks are close to learning the correct curves. The accumulation and propagation of errors when the quantities are marched in time can significantly affect the accuracy of the results, as shown in the right plots. The results shown in Figure 5 come from data on which the ResNets were not trained.
In Figure 6, results are displayed in which the training data was log-scaled (i.e., the logarithm of the data is used in (7) rather than the data itself). The ResNets used for these results all have depth 9 and width 20 and were trained using validation data with a patience of 400 iterations. The plots in Figure 6 were created using the same data set that was used to create the curves in Figure 5. From these results, and many others not shown here, it is unclear whether log-scaling provides any improvement in accuracy.
4.3. Grouping training data based on equivalence ratio.
For the next results the training data is split into three groups: fuel lean (equivalence ratio below 0.1), fuel balanced (equivalence ratio between 0.1 and 2), and fuel rich (equivalence ratio above 2). See Appendix A for more details on these data sets. The results of an experiment using the second approach, where a different group of parallel ResNets is trained for each of the fuel-based data sets, are presented in Figure 7. In other words, there are 9 plots corresponding to results from ResNets trained only on fuel lean data sets, 9 plots for ResNets trained only on fuel balanced data sets, and 9 plots for ResNets trained only on fuel rich data sets. All ResNets have a depth of 6 and width of 30 and were trained using validation data with a patience of 400 iterations. Furthermore, these plots are presented with a logarithmically scaled time axis, as for some of the data the reactions happen very quickly. All of the plots in Figure 7 show results of marching in time given only an initial condition from known data that was not seen by the networks during training. These results show that the ResNets trained with the fuel-based grouping are more successful than the previous attempts.
4.4. Training data sets with varying initial temperatures, but fixed equivalence ratio.
For all of the results shown in Figures 3 through 7, the ResNets were trained on data sets where the equivalence ratio was varied and there were at most two different initial temperatures (see Appendix A for more details). For this final result, parallel ResNets were trained on data sets whose initial conditions had the same equivalence ratio but different initial temperatures. The results of an experiment using DNNs trained on this data can be seen in Figure 8. The 9 plots on the left show results from ResNets with a depth of 6 and width of 30. On the right, the results came from ResNets with different depths (3 to 8), all with width 30. All of the ResNets were trained with validation data with a patience of 400 iterations.
5. Conclusions and Future Directions
A number of ResNet based approaches to approximate the stiff ODEs arising in chemical kinetics were developed. The experiments presented indicate that when the equivalence ratio is fixed and initial temperature is varied (this case has been considered in [25, 20]), the proposed DNNs can (almost) perfectly generalize to unseen data. This capacity for generalization deteriorates, however, when the initial temperature is kept fixed and the equivalence ratio is varied. In order to overcome this issue, the training data was separated into fuel lean, fuel balanced, and fuel rich based on the equivalence ratio. This approach has led to encouraging results.
There are several questions that are currently being investigated:
All DNNs are dependent on the quality of training data. The data used for experiments reported here were generated using CHEMKIN. In view of the high dimensionality of the data, and the nature of chemical reactions (large periods of inactivity followed by a sudden reaction followed by a convergence to a steady state), quality criteria need to be developed to make sure redundant data is avoided.
It will be interesting to see if further dividing the training data based on equivalence ratio is helpful.
The usefulness of training with noisy data is also being explored. For some classification problems, this approach is known to increase the robustness of ResNets.
It is of interest to see how the proposed approach generalizes to more reactions and species.
This work was supported by the Defense Threat
Reduction Agency (DTRA) under contract HDTRA1-15-1-0068.
Jacqueline Bell served as the technical monitor.
It is a pleasure to acknowledge a number of fruitful discussions with Drs. Adam Moses, Alisha J. Sharma and Ryan F. Johnson from the Naval Research Laboratory that led to several improvements in the techniques discussed in this paper.
References

- Novel deep neural networks for solving Bayesian statistical inverse. arXiv preprint arXiv:2102.03974.
- (2020) Fractional deep neural network via constrained optimization. Machine Learning: Science and Technology.
- (2004) Assessing maximum possible damage for contaminant release events. Engineering Computations 21 (7), pp. 748–760.
- (2018) In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31.
- (2021) Deep learning method based on physics informed neural network with ResNet block for solving fluid flow problems. Water 13 (4).
- (2015) Keras. https://keras.io
- (2020) STEER: Simple temporal regularization for neural ODEs. arXiv preprint arXiv:2006.10711.
- Hyperreduction of CFD models of turbulent flows using a machine learning approach. In AIAA Scitech 2020 Forum.
- (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778.
- (2020) Stiff-PINN: Physics-informed neural network for stiff chemical kinetics. arXiv preprint arXiv:2011.04520.
- (1990) Rate-controlled constrained-equilibrium theory of chemical reactions in complex systems. Progress in Energy and Combustion Science 16 (2), pp. 125–154.
- (2000) CHEMKIN collection, release 3.6. Reaction Design, Inc., San Diego, CA.
- (2017) ImageNet classification with deep convolutional neural networks. Commun. ACM 60 (6), pp. 84–90.
- (1994) The CSP method for simplifying kinetics. International Journal of Chemical Kinetics 26 (4), pp. 461–486.
- (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324.
- (2005) Optimal placement of sensors for contaminant detection based on detailed 3D CFD simulations. Engineering Computations 22 (3), pp. 260–273.
- (2005) A directed relation graph method for mechanism reduction. Proceedings of the Combustion Institute 30, pp. 1333–1341.
- (2020) Deep learning observables in computational fluid dynamics. Journal of Computational Physics 410, 109339.
- (1992) Simplifying chemical kinetics: intrinsic low-dimensional manifolds in composition space. Combustion and Flame 88 (3), pp. 239–264.
- (2021) ChemNODE: A neural ordinary differential equations approach for chemical kinetics solvers. arXiv preprint arXiv:2101.04749.
- (2017) Efficient and accurate time-integration of combustion chemical kinetics using artificial neural networks.
- (1999) Reduced kinetics mechanisms for ram accelerator combustion. Journal of Propulsion and Power 15 (4), pp. 591–600.
- Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, pp. 686–707.
- (2020) Deep neural networks motivated by partial differential equations. J. Math. Imaging Vision 62 (3), pp. 352–364.
- (2020) Deep learning for scalable chemical kinetics. In AIAA Scitech 2020 Forum.
- (2010) Adjoint-based design of shock mitigation devices. International Journal for Numerical Methods in Fluids 64 (4), pp. 443–472.
- (2010) A path flux analysis method for the reduction of detailed chemical kinetic mechanisms. Combustion and Flame 157 (7), pp. 1298–1307.
- (2006) Numerical simulation of H2/air detonation using detailed reaction models. In 44th AIAA Aerospace Sciences Meeting and Exhibit.
- (2004) Principal component analysis of kinetic models. International Journal of Chemical Kinetics 17, pp. 55–81.
- Reduced models for chemical kinetics derived from parallel ensemble simulations of stirred reactors. In AIAA Scitech 2020 Forum.
Appendix A Description of the Training Data Sets
In this appendix the data sets that were used to train the DNNs and create the plots presented above are described in more detail. Specifically, information about the number of points, the equivalence ratio, and initial temperature of the individual data sets is provided. For the results in Figure 2, a single ResNet was trained with a set containing data from a single initial condition. This set has 4,999 points, and the initial condition has equivalence ratio 0.1 and initial temperature 1,200K. Once the network was trained on this data, it was tested with the same initial point from the set on which it was trained.
The parallel ResNets used to produce the plots in Figures 3 through 6 were trained on a data set with subsets that correspond to 9 different initial conditions, all with initial temperature 1,200K. The data is described further in Table 1, where the subsets are numbered for convenient reference. This set also contains the data used to create Figure 2 (set 3). After being trained on this entire data set, the ResNets that were used for the results in Figure 3 were then tested on set 5, and therefore these results do not reflect the ResNets generalizing to unseen data.
The subsets of data described in Table 1 are of different sizes. For the results in Figure 4, the differences in size were compensated for by training with copies of the smaller sets. Specifically, the ResNets were trained with set 1, two copies of sets 3 and 9, and eight copies of sets 5 and 7. The ResNets were then tested using the initial condition from set 6, and these are the results shown in Figure 4. To train the ResNets that produced the results in Figures 5 and 6, a single copy of sets 1, 3, 5, 7, and 9 from Table 1 was used. The trained networks were then tested on set 8. The differences between these two experiments are the architecture of the ResNets and the way the training data was scaled prior to training. Also, as opposed to the experiments represented by Figures 3 and 4, the experiments that resulted in Figures 5 and 6 use the second approach, where the ResNets are trained to learn the next timestep of the data. To create the training data in this situation, the same data is used for both input and output, shifted by one timestep. This reduces the size of each subset by one, because there is no next timestep to learn at the final time.
Table 1. Number of points in sets 1 through 9: 8,000; 6,000; 4,999; 1,999; 999; 999; 999; 1,999; 3,999.
In Table 2, the data sets that have been separated into fuel lean, balanced, and rich sets are described. Note that this data contains the data sets from Table 1, where here the number of points is one less for the reason described above. This data was used to create the results shown in Figure 7. The ResNets trained on fuel lean sets (top left set of 9 plots) were trained with subsets 1, 3, 4, 6, and 9, and then tested with the initial condition from set 8. The ResNets trained on the fuel balanced sets were trained on subsets 1, 3, 5, 7, 9, and 12, and subsequently tested with the initial condition from set 11. Finally, the networks trained on fuel rich sets were trained with subsets 1, 2, 3, 5, and 7, and then tested with the initial condition from set 4.
Table 2. Number of points in each subset.
Fuel lean sets (1–9): 7,999; 5,999; 4,998; 4,999; 4,999; 4,999; 4,999; 4,999; 4,999.
Fuel balanced sets (1–12): 1,998; 998; 998; 998; 4,999; 4,999; 4,999; 4,999; 4,999; 4,999; 4,999; 4,999.
Fuel rich sets (1–7): 1,998; 3,998; 4,999; 4,999; 4,999; 4,999; 4,999.
For the experiments that produced the plots in Figure 8, the training data consists of 13 subsets. Each subset has 499 points and an equivalence ratio of 1. Furthermore, each subset has a different initial temperature beginning with 1,200K and increasing by increments of 100K to 2,400K. For the experiment shown in Figure 8, the ResNets were trained with the sets corresponding to initial temperatures 1,200K, 1,500K, 1,800K, 2,100K, and 2,400K. The trained ResNets were then tested using the initial condition with temperature 2,200K.