Novel DNNs for Stiff ODEs with Applications to Chemically Reacting Flows

04/01/2021 ∙ by Thomas S. Brown, et al. ∙ George Mason University

Chemically reacting flows are common in engineering, such as hypersonic flow, combustion, explosions, manufacturing processes and environmental assessments. For combustion, the number of reactions can be significant (over 100) and, due to the very large CPU requirements of chemical reactions (often over 99% of the total CPU time), a number of flow and combustion problems are presently beyond the capabilities of even the largest supercomputers. Motivated by this, novel Deep Neural Networks (DNNs) are introduced to approximate stiff ODEs. Two approaches are compared, i.e., either learn the solution or the derivative of the solution to these ODEs. These DNNs are applied to multiple species and reactions common in chemically reacting flows. Experimental results show that it is helpful to account for the physical properties of species while designing DNNs. The proposed approach is shown to generalize well.




1. Introduction

Chemically reacting flows are common in many fields of engineering, such as hypersonic flow, combustion, explosions, manufacturing processes, and environmental assessments [28, 26, 3, 16]. For hydrocarbon combustion and explosions the numbers of species and reactions can reach into the hundreds and thousands, respectively. Even with so-called reduced models [29, 11, 19, 14, 17, 27], which try to keep the main species and reactions while neglecting those that are not important, typically over 100 reactions need to be updated. An expedient (and widely used) way to compute flows with chemical reactions is to separate the advection and diffusion of species from the actual reactions. In this way, the vastly different timescales of the reactants can be treated in a separate, stiff ODE solver. Such chemical reaction solvers take the given species u_n at time t_n and the desired timestep Δt and update the species to u_{n+1}. In terms of a 'black box solver' this implies either:

    u_{n+1} = F(u_n, Δt),   (1)

    u_{n+1} = u_n + Δt R(u_n),   (2)
where F stands for the ODE integrator of chemical reactions and R is the right-hand side of the system. Formulation (2) corresponds to solving the system of ODEs numerically using the standard (forward) Euler method. Compared to a typical 'cold' flow case, the presence of these chemical reactions implies an increase in computing requirements that can exceed a factor of 1:100, i.e., two orders of magnitude. This makes many of these flow simulations so expensive that entire classes of problems have been sidelined, waiting for faster computers to be developed in years to come. The goal here is to replace the 'black box' solvers (equivalently, the functions F and R) given in (1) and (2) by novel, robust Deep Neural Networks (DNNs) without sacrificing accuracy.
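The distinction between the two formulations can be illustrated on a scalar toy problem. The sketch below (with a hypothetical decay rate K standing in for a stiff reaction, and the exact exponential solution standing in for a black-box stiff integrator) contrasts the integrator interface of (1) with the right-hand-side-plus-Euler interface of (2):

```python
import math

K = 50.0  # hypothetical stiff decay rate for the toy reaction u' = -K*u

def integrator_step(u, dt):
    # interface (1): a 'black box' returning the updated state directly
    # (here the exact solution of linear decay stands in for a stiff ODE solver)
    return u * math.exp(-K * dt)

def rhs(u):
    # interface (2): the right-hand side R(u), to be combined with an Euler update
    return -K * u

def euler_step(u, dt):
    return u + dt * rhs(u)

u0, dt = 1.0, 1.0e-3
u_exact = integrator_step(u0, dt)  # u0 * exp(-0.05)
u_euler = euler_step(u0, dt)       # u0 * (1 - 0.05)
```

For a small enough Δt the two agree to leading order; the point is only that a surrogate can target either interface.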

The list of references on using DNNs in computational fluid dynamics (CFD) is growing fast, see for example, [18, 8]. However, the results on using DNNs in chemical kinetics are scarce. A popular approach to solve PDEs and ODEs is through the use of so-called Physics-Informed Neural Networks (PINNs) [23, 5]. The goal of PINNs is to minimize the PDE/ODE residual by using a neural network as a PDE/ODE solution Ansatz. The inputs to the network are space-time variables and all the derivatives are computed using automatic differentiation. See [10] for an example of a PINN for stiff ODE systems where the only input is time.

The approach presented here fundamentally differs from the aforementioned approaches. Instead of Physics-Informed Neural Networks, the goal is to pursue Learn-from-Physics/Chemistry. For instance, in (1) and (2), DNNs will be used to learn F and R from a given dataset coming from physics/chemistry simulations. Such a learning approach has also been considered recently in [21, 25, 20], where the authors employ an architecture that is motivated by standard feed-forward networks. The authors of [30] consider a similar problem, but use an autoencoder, which is a type of DNN used to reduce the dimension of the system. Notice that the proposed approach will allow for the chemical reactions described by (1) and (2) to start at any point in time without knowing a reference time. The latter is crucial for the application under consideration.

The DNNs used in this paper have been motivated by the Residual Neural Network (ResNet) architecture. ResNets have been introduced in [9, 24, 2] in the context of data/image classification; see also [1] for parameterized PDEs and [7], where the (related) so-called Neural ODE Nets [4] have been used to solve stiff ODEs. The ResNet architecture is known to overcome the vanishing gradient problem, which has been further analyzed using fractional order derivatives in [2]. The key feature of a ResNet is that, in the continuum limit, it becomes an optimization problem constrained by an ODE. Such a continuous representation further enables the analysis of the stability of the network using nonlinear ODE theory. In addition, standard approaches from optimization, such as the Lagrangian approach, can be used to derive the sensitivities with respect to the unknown weights and biases.

The main novelties in the DNNs presented in this work are the following:

  • These networks allow for learning both the solution (see (1)) and difference quotients (see (2)). A similar approach to learn the solution in the context of parameterized PDEs has been recently considered in [1].

  • Motivated by chemically reacting flows, the goal is to create networks that can learn multiple reactions propagating multiple species. To accomplish this task, parallel ResNets are constructed where the data corresponding to multiple quantities is used as input for each network but the output is only a single species. Similar approaches for chemical kinetics can be found in [21, 25], where the authors use standard feed forward networks.

  • The proposed DNNs are applied to non-trivial chemically reacting flows. Experimental results show that it is helpful to know the underlying properties of the species while designing the networks.

The remainder of the paper is organized as follows: In Section 2, the DNNs used to approximate systems of type (1) and (2) are introduced, and training strategies are discussed. Section 3 provides some information on the implementation of the method. Several experimental results are discussed in Section 4. Additional information on the data being used in the experiments is provided in Appendix A.

2. Deep Residual Neural Networks

2.1. Problem Formulation

Consider an input-to-output map G : x ↦ G(x), where G could be F or R in (1) and (2), respectively. The goal is to learn an approximation G̃ of G using Deep Neural Networks (DNNs). In particular, the proposed DNNs are motivated by Deep Residual Neural Networks (ResNets). See [24, 2] for examples of ResNets for classification problems and [1] for an application to parameterized PDEs.

Given a training dataset {(x_i, y_i)}, i = 1, …, N, the DNN approximation G̃ of G is given by the output of the DNN (ResNet)

    y_1 = σ(W_0 x + b_0),
    y_{ℓ+1} = τ y_ℓ + σ(W_ℓ y_ℓ + b_ℓ),  ℓ = 1, …, L−1,   (3)

i.e., G̃(x) = W_L y_L + b_L. Here τ > 0 is a given parameter called the skip connection parameter. The nonlinear function σ denotes an activation function, applied componentwise. In this study, the activation function is taken to be a smooth quadratic approximation of the ReLU function, i.e.,

    σ_ε(x) = 0 for x ≤ 0,   σ_ε(x) = x²/(2ε) for 0 < x < ε,   σ_ε(x) = x − ε/2 for x ≥ ε.

Figure 1 (left) shows that σ_ε is a good approximation of ReLU. Here, and in the experiments below, ε is taken to be 0.1. The number of layers (depth of the network) is denoted by L.
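A C¹ piecewise-quadratic smoothing of ReLU of this kind, with smoothing parameter ε = 0.1, can be sketched as follows (this is one standard choice of quadratic smoothing; the precise formula used in the experiments is an assumption of this sketch):

```python
def smooth_relu(x, eps=0.1):
    # C^1 quadratic smoothing of max(x, 0): zero for x <= 0, a parabola on
    # (0, eps), and the shifted line x - eps/2 for x >= eps.
    if x <= 0.0:
        return 0.0
    if x >= eps:
        return x - eps / 2.0
    return x * x / (2.0 * eps)

def smooth_relu_prime(x, eps=0.1):
    # derivative is continuous: 0 at x = 0 and 1 at x = eps
    if x <= 0.0:
        return 0.0
    if x >= eps:
        return 1.0
    return x / eps
```

Both pieces match in value and slope at x = 0 and x = ε, which is what makes the approximation differentiable everywhere.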

Figure 1. Left: Smooth ReLU σ_ε and its derivative for ε = 0.1. Right: Example of a typical Deep ResNet used in the experiments.

Layer 0 is referred to as the input layer, layers 1 through L−1 as hidden layers, and layer L as the output layer. The quantities W_ℓ and b_ℓ denote the weights and biases, i.e., the parameters that need to be determined. In this setting W_ℓ is a matrix and b_ℓ is a vector; together they introduce an affine transformation. If y_ℓ has dimension d_ℓ for ℓ = 0, …, L, then W_ℓ is a d_{ℓ+1} × d_ℓ matrix and b_ℓ is a vector of length d_{ℓ+1}. The dimension d_ℓ is also referred to as the width of the ℓ-th layer. Notice that for the hidden layers in (3), it is assumed that all widths are equal, but this can be easily generalized, see [24]. Nevertheless, in the current setup the dimension of the input and output can be different.
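Under the constant-width assumption, the forward pass of a ResNet of this form can be sketched as below (NumPy; the exact placement of the skip connection parameter τ on the identity term is an assumption of this sketch):

```python
import numpy as np

def sigma(x, eps=0.1):
    # smooth quadratic approximation of ReLU, vectorized over an array
    return np.where(x <= 0, 0.0, np.where(x >= eps, x - eps / 2, x**2 / (2 * eps)))

def resnet_forward(x, weights, biases, tau=1.0):
    """Forward pass sketch of the ResNet (3):
    affine+activation input layer, residual hidden layers
    y_{l+1} = tau*y_l + sigma(W_l y_l + b_l), and an affine output layer."""
    y = sigma(weights[0] @ x + biases[0])            # input layer
    for W, b in zip(weights[1:-1], biases[1:-1]):
        y = tau * y + sigma(W @ y + b)               # residual (skip) update
    return weights[-1] @ y + biases[-1]              # output layer
```

With zero hidden weights the hidden layers reduce to multiplication by τ, which makes the sketch easy to sanity-check by hand.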

An example of a typical DNN is shown in Figure 1 (right) with depth 3, width 8, input dimension 10, and output dimension 1. The values of the weights are given by the color of the lines connecting the neurons, the values of the biases are given by the color of the neurons, and the biases of the input layer have been set to zero.

The question remains of how to compute the weights and biases in order to obtain a good approximation of G. Following [24, 2], these weights are computed by solving

    min over {W_ℓ, b_ℓ} of J := (1/2N) Σ_{i=1}^{N} ‖G̃(x_i) − y_i‖²  subject to (3),   (4)

where x = x_i, that is, the input in (3) is taken to be the training data. The problem (4) is a constrained optimization problem. To solve this problem, the gradient of J with respect to the weights W_ℓ and biases b_ℓ needs to be evaluated. This requires introducing the so-called adjoint equation (also known as back-propagation in the machine learning literature). See [24, 2, 1] for complete details. Having obtained the gradient, an approximate Hessian is computed via the BFGS routine. Both the gradient and the approximate Hessian are used to solve the minimization problem.

2.2. Proposed Approach and Applications

As stated earlier, the goal is to replace F and R in (1) and (2), respectively, by DNNs. The continuous ODE is given by

    du/dt = R(u(t)).   (5)

The simplest time discretization of (5) is given by

    u_{n+1} = u_n + Δt R(u_n),   (6)

for n = 0, 1, 2, …. At first, the right-hand side of (6) is approximated. This is referred to as the first approach. In this case, the training data (input and output) is {(u_n, (u_{n+1} − u_n)/Δt)}. In the second approach, the map to be approximated is u_n ↦ u_{n+1}, i.e., the setting of (1). Here the approximation marches forward in time with time increments given by Δt. The training data for this approach is given by {(u_n, u_{n+1})}.
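The two ways of assembling training pairs from a trajectory sampled with constant Δt can be sketched as follows (function and argument names are illustrative):

```python
import numpy as np

def make_training_pairs(u, dt, approach="rhs"):
    """Build (input, target) pairs from a trajectory u[0..N] with constant dt.
    approach="rhs":  inputs u_n, targets (u_{n+1} - u_n)/dt  (first approach)
    approach="step": inputs u_n, targets u_{n+1}             (second approach)"""
    u = np.asarray(u, dtype=float)
    inputs = u[:-1]                      # the final sample has no successor
    if approach == "rhs":
        targets = (u[1:] - u[:-1]) / dt  # difference quotients
    else:
        targets = u[1:]                  # next states
    return inputs, targets
```

Note that either way the final sample is dropped as an input, since there is no next timestep to learn at the final time (consistent with the data-set sizes discussed in Appendix A).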


The training data in the examples below has been computed using CHEMKIN [12]. Before training the networks, the data is scaled in the following manner: for a data set corresponding to a single quantity (e.g., temperature), the mean μ and standard deviation s of the set are computed. Each entry x is scaled as

    x̂ = (x − μ)/s.   (7)

Using this new centered and normalized set, the training data is obtained by defining

    x̃ = x̂ / max |x̂|,   (8)

so that the resulting data set lies in the interval [−1, 1]. Given that chemical reactions follow an exponential Arrhenius-type law, a logarithmic scaling was also tried. This implies performing the above scaling on the dataset log(x) instead of x.


For the DNNs implemented below, the input dimension will always be given by n_s + 1. The output dimension is also n_s + 1, which corresponds to temperature plus the number n_s of species. Rather than using a single DNN to approximate the entire solution map, in many cases a parallel ResNet architecture is implemented in which a separate ResNet is created to correspond to each desired output. With this structure the output dimension for each ResNet is 1, but there are n_s + 1 ResNets implemented in parallel. The inputs to all of the parallel ResNets are the same, and so the parallel architecture can also be thought of as a single large ResNet (with output dimension n_s + 1) that is not fully connected.

Loss Function and Training.

In the case of a parallel ResNet architecture, each parallel network is associated with a separate loss function; the same form of the loss function is used for each network. Letting θ_j represent the concatenation of all weight and bias parameters associated with the j-th parallel network, the loss function takes the form

    J(θ_j) = (1/2N) Σ_{i=1}^{N} |G̃_j(x_i; θ_j) − y_{j,i}|² + λ_1 ‖θ_j‖_1 + (λ_2/2) ‖θ_j‖²_2.   (9)

In other words, the process of training each network is the process of finding the parameters which minimize the mean squared error between the network output and the training data, while also using both ℓ1 and ℓ2 regularizations to penalize the size of the parameters. As indicated in Section 2.1, a gradient based method (BFGS in particular) is used to solve the constrained optimization problem.
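For a single parallel network, this loss can be sketched as follows (the regularization weights lam1 and lam2 are illustrative; their values are not reported in this section):

```python
import numpy as np

def loss(theta, outputs, targets, lam1=1e-4, lam2=1e-4):
    """Mean squared error between network outputs and training targets,
    plus l1 and l2 penalties on the parameters theta of one network."""
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    theta = np.asarray(theta, dtype=float)
    mse = 0.5 * np.mean((outputs - targets) ** 2)
    return mse + lam1 * np.sum(np.abs(theta)) + 0.5 * lam2 * np.sum(theta**2)
```

The ℓ1 term promotes sparsity in the weights while the ℓ2 term controls their magnitude; both enter the gradient used by the BFGS routine.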


DNNs are trained with validation data in order to overcome the overfitting issue. The validation data is a subset of the training data that is not used to compute the weights and biases. Instead, a separate loss function is formed that computes the mean squared error between the ResNet output and the validation data. This is a way to test how the ResNet performs on unseen data during the training process itself. If the validation error increases, the training continues for a certain number of additional iterations, called the patience. During this time, if the validation error decreases to a new minimum, the patience resets to zero and training continues. If the validation error does not attain a new minimum and the full number of patience iterations is reached, then the training process is terminated.
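The patience logic described above can be sketched as a stopping rule on the sequence of validation errors (a sketch; the actual implementation tracks the error during the BFGS iterations):

```python
def stop_iteration(val_errors, patience):
    """Return the iteration at which training stops: once `patience`
    iterations pass without a new minimum of the validation error."""
    best, best_iter = float("inf"), 0
    for i, err in enumerate(val_errors):
        if err < best:
            best, best_iter = err, i     # new minimum: patience counter resets
        elif i - best_iter >= patience:
            return i                     # patience exhausted: terminate
    return len(val_errors) - 1           # error kept improving to the end
```

For example, with errors [5, 4, 3, 4, 5, 6] and a patience of 2, training stops two iterations after the minimum at iteration 2.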


After training and validation comes the testing phase, in which the DNN approximations are used to carry out time marching. Using the first approach (see above), this involves implementing the Euler time-stepping commonly used to numerically solve ODEs, with the usual right-hand side of the ODE replaced by the DNN output, that is,

    u_{n+1} = u_n + Δt R̃(u_n),  n = 0, 1, 2, …,   (10)

where u_0 is known data. For the second approach, results are tested by using the following update step for n = 0, 1, 2, …, where again u_0 is known:

    u_{n+1} = F̃(u_n).   (11)
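Both testing procedures amount to rolling the trained surrogate forward from a known initial state. A sketch (rhs_net and step_net are placeholders for the trained DNNs of the first and second approach, respectively):

```python
def march_in_time(u0, n_steps, dt, rhs_net=None, step_net=None):
    """Roll a trained surrogate forward from a known initial state u0.
    First approach (rhs_net):  u_{n+1} = u_n + dt * rhs_net(u_n).
    Second approach (step_net): u_{n+1} = step_net(u_n)."""
    traj = [u0]
    for _ in range(n_steps):
        u = traj[-1]
        traj.append(u + dt * rhs_net(u) if rhs_net else step_net(u))
    return traj
```

Only u_0 comes from data; every later state feeds on the previous network output, which is why errors can accumulate over the rollout.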
The above methods are applied to a system of ODEs that models hydrogen-oxygen reactions. In particular, the reduced hydrogen-air reaction model with 8 species and 18 reactions [22] is used. This model is simpler than the hydrocarbon reaction model mentioned in Section 1; however, it can still capture several essential key features. More complicated DNNs which can handle over 100 reactions will be part of future work.

3. Implementation

To start, the second approach was implemented in Keras [6] using stochastic gradient descent. Libraries such as Keras have proven to be highly successful for classification problems, but their success in physics-based modeling is still limited. Initially, it was observed that the results produced by using Keras were unable to capture the 'pulse' in chemical reactions. After gaining access to some of the Keras code, and seeing some improvement, the results were still highly inconsistent. For example, when running the same experiment multiple times, with the same hyperparameters, different solutions were obtained. Such an inconsistency may not be critical for classification problems (see, for example, [13, 15]), but it is critical for ODE/PDE problems. Implementation through Keras was abandoned as further customization of the code proved to be a daunting task.

The DNN implementation used in this paper was carried out in MATLAB, and a BFGS routine with an Armijo line search was used to solve the optimization problem. Unlike many neural network software packages, this code uses the constrained optimization framework of Section 2.1 and closely resembles code that is used to solve optimal control problems.

4. Results

In this section a variety of results are presented that represent different approaches and architectures. The results are grouped together by the type of data that was used to train the networks. Further details on the training data are provided in Appendix A. The loss function is as given in (9); the skip connection parameter τ in (3) is chosen in terms of the number of hidden layers; and a constant time increment Δt is used throughout. Unless otherwise specified, all training data was scaled by the method outlined in (7) and (8). All of the results presented below were achieved by using the MATLAB implementation of the ResNets described above. The blue curves in the plots below represent the 'true solution' (data generated using CHEMKIN), while the dashed red curves represent DNN output.

4.1. Training with a single data set.

Using the first approach, a single ResNet was trained on data corresponding to temperature and the 8 species. In Figure 2, the results from a ResNet with depth 9 and width 15 are shown. For this experiment only a single regularization term has been used in the loss function. After being trained, the training data was used again to test the network, and these results can be seen in the left plots of Figure 2. The initial condition from the training data was marched forward in time as described in (10), and the results are shown on the right of Figure 2, compared with the actual data.

Figure 2. A comparison between the actual and the learned difference quotient (left 9 plots) and the results of marching in time as in (10) (right 9 plots) using a Deep ResNet of depth 9 and width 15.

4.2. Training data sets with varying equivalence ratio and fixed initial temperature.

Experimental results showed that while a single ResNet was able to handle data corresponding to a single initial condition, it struggled with data containing multiple initial conditions. To improve results, the parallel ResNet structure described in Section 2.2 was created. The parallel DNNs were trained on a data set containing multiple initial conditions with the same initial temperature but varying equivalence ratio (see Appendix A). Figure 3 shows the results for ResNets trained on this data without the use of validation data. Each ResNet has depth 4 and width 10. For this result, the testing data was a subset of the training data, and so these plots do not represent a generalization of the networks to unseen data. Figure 4 shows the result of an experiment trained with a configuration of data subsets from the same data set as Figure 3 (see Appendix A for specific details). Additionally, validation data was used during training with a patience of 250 iterations. The ResNets used for this experiment have varying depths (3 to 5) but a fixed width of 10. The plots in Figure 4 are the result of testing the DNNs on data that was not used in training. For these results, mass conservation is also enforced, in the sense that the sum of the densities of the 8 species is adjusted at the end of each update step (10) to equal that of the previous step. This is the only experiment presented in this work in which mass conservation is enforced.

Figure 3. A comparison between the actual and the learned difference quotient (left 9 plots), and the results of marching in time as in (10) (right 9 plots) using parallel Deep ResNets of depth 4 and width 10.
Figure 4. A comparison between the actual and the learned difference quotient (left 9 plots), and the results of marching in time as in (10) (right 9 plots) using parallel Deep ResNets of different depths and width 10, trained with validation data with patience of 250 iterations.

The next result uses the second approach described in Section 2.2, in which the ResNets are trained to learn the next time step rather than the right-hand side of the ODE. For the results shown in Figure 5, parallel ResNets of depth 7 and width 50 were used. These ResNets were trained using validation data with a patience of 400. For the nine subplots on the left side of Figure 5, the ResNets are being used to predict only a single timestep. In other words, the red dashed curves are produced by predicting u_{n+1} from u_n for all n, where each u_n comes from known data. The right nine subplots show the result of marching forward in time, where only the initial condition u_0 is taken from known data. The plots on the left are included as evidence that the networks are close to learning the correct curves. The accumulation and propagation of errors when the quantities are marched in time can significantly affect the accuracy of the results, as shown in the right plots. The results shown in Figure 5 come from data on which the ResNets were not trained.

Figure 5. Results from using parallel ResNets to predict a single step (left 9 plots) and marching in time from an initial condition (9 right plots) using DNNs with a depth of 7 and a width of 50.

In Figure 6, results are displayed in which the training data was log-scaled (i.e., log(x) is used rather than x in (7)). The ResNets used for these results all have depth 9 and width 20 and were trained using validation data with a patience of 400 iterations. The plots in Figure 6 were created using the same data set that was used to create the curves in Figure 5. From these results, and many others not shown here, it is unclear if log-scaling provides any improvement in accuracy.

Figure 6. Predicting a single timestep (left 9 plots) and marching in time (right 9 plots) from ResNets trained on log-scaled data. The ResNets used here have a depth of 9 and a width of 20, and were trained with validation data with patience of 400 iterations.

4.3. Grouping training data based on equivalence ratio.

For the next results, the training data is split into three groups: fuel lean (equivalence ratio below 0.1), fuel balanced (equivalence ratio between 0.1 and 2), and fuel rich (equivalence ratio above 2). See Appendix A for more details on these data sets. The results of an experiment using the second approach, where a different group of parallel ResNets is trained for each of the fuel-based data sets, are presented in Figure 7. In other words, there are 9 plots corresponding to results from ResNets trained only on fuel lean data sets, 9 plots for ResNets trained only on fuel balanced data sets, and 9 plots for ResNets trained only on fuel rich data sets. All ResNets have a depth of 6 and width of 30 and were trained using validation data with a patience of 400 iterations. Furthermore, these plots are presented with a logarithmically scaled time axis, as for some of the data the reactions happen very quickly. All of the plots in Figure 7 show results of marching in time given only an initial condition from known data that was not seen by the networks during training. These results show that training ResNets on this fuel-based grouping of the data is more successful than the previous attempts.

Figure 7. Results from marching initial conditions forward in time using ResNets trained on fuel lean data (top left), fuel balanced (top right), and fuel rich (bottom) data sets. The ResNets used have a depth of 6 and a width of 30, and were trained with validation data with patience of 400 iterations.

4.4. Training data sets with varying initial temperatures, but fixed equivalence ratio.

For all of the results shown in Figures 3 through 7 the ResNets were trained on data sets where the equivalence ratio was varied and there were at most two different initial temperatures (see Appendix A for more details). For this final result, parallel ResNets were trained on data sets that had initial conditions with the same equivalence ratio, but different initial temperatures. The results of an experiment using DNNs trained on this data can be seen in Figure 8. The 9 plots on the left show results from ResNets with a depth of 6 and width of 30. On the right, the results came from ResNets with different depths (3 to 8) all with width 30. All of the ResNets were trained with validation data with patience of 400 iterations.

Figure 8. Results from ResNets trained on data that varies the initial temperature. The plots on the left were produced from ResNets with a depth of 6 and a width of 30, and were trained with validation data with patience of 400 iterations. The plots on the right came from ResNets with varying depths all with a width of 30, and were trained with validation data with patience of 400 iterations. All plots show the results of marching in time (known data in blue, ResNet results in red).

5. Conclusions and Future Directions

A number of ResNet based approaches to approximate the stiff ODEs arising in chemical kinetics were developed. The experiments presented indicate that when the equivalence ratio is fixed and initial temperature is varied (this case has been considered in [25, 20]), the proposed DNNs can (almost) perfectly generalize to unseen data. This capacity for generalization deteriorates, however, when the initial temperature is kept fixed and the equivalence ratio is varied. In order to overcome this issue, the training data was separated into fuel lean, fuel balanced, and fuel rich based on the equivalence ratio. This approach has led to encouraging results.

There are several questions that are currently being investigated:

  • All DNNs are dependent on the quality of training data. The data used for experiments reported here were generated using CHEMKIN. In view of the high dimensionality of the data, and the nature of chemical reactions (large periods of inactivity followed by a sudden reaction followed by a convergence to a steady state), quality criteria need to be developed to make sure redundant data is avoided.

  • It will be interesting to see if further dividing the training data based on equivalence ratio is helpful.

  • The usefulness of training with noisy data is also being explored. For some classification problems, this approach is known to increase the robustness of ResNets.

  • It is of interest to see how the proposed approach generalizes to more reactions and species.

6. Acknowledgements

This work was supported by the Defense Threat Reduction Agency (DTRA) under contract HDTRA1-15-1-0068. Jacqueline Bell served as the technical monitor.
It is a pleasure to acknowledge a number of fruitful discussions with Drs. Adam Moses, Alisha J. Sharma and Ryan F. Johnson from the Naval Research Laboratory that led to several improvements in the techniques discussed in this paper.


  • [1] H. Antil, H. C. Elman, A. Onwunta, and D. Verma (2021) Novel deep neural networks for solving Bayesian statistical inverse problems. arXiv preprint arXiv:2102.03974. Cited by: 1st item, §1, §2.1, §2.1.
  • [2] H. Antil, R. Khatri, R. L. Lohner, and D. Verma (2020) Fractional deep neural network via constrained optimization. Machine Learning: Science and Technology. External Links: Link Cited by: §1, §2.1, §2.1.
  • [3] F. Camelli and R. Löhner (2004) Assessing maximum possible damage for contaminant release events. Engineering computations 21 (7), pp. 748–760. Cited by: §1.
  • [4] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud (2018) Neural ordinary differential equations. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31. External Links: Link Cited by: §1.
  • [5] C. Cheng and G. Zhang (2021) Deep learning method based on physics informed neural network with resnet block for solving fluid flow problems. Water 13 (4). External Links: Link, ISSN 2073-4441, Document Cited by: §1.
  • [6] F. Chollet et al. (2015) Keras. Cited by: §3.
  • [7] A. Ghosh, H. S. Behl, E. Dupont, P. H. S. Torr, and V. Namboodiri (2020) STEER: simple temporal regularization for neural odes. arXiv preprint arXiv:2006.10711. External Links: 2006.10711 Cited by: §1.
  • [8] S. J. Grimberg and C. Farhat Hyperreduction of cfd models of turbulent flows using a machine learning approach. In AIAA Scitech 2020 Forum, pp. . External Links: Document, Link, Cited by: §1.
  • [9] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778. External Links: Document Cited by: §1.
  • [10] W. Ji, W. Qiu, Z. Shi, S. Pan, and S. Deng (2020) Stiff-pinn: physics-informed neural network for stiff chemical kinetics. arXiv preprint arXiv:2011.04520. External Links: 2011.04520 Cited by: §1.
  • [11] J. C. Keck (1990) Rate-controlled constrained-equilibrium theory of chemical reactions in complex systems. Progress in Energy and Combustion Science 16 (2), pp. 125–154. External Links: ISSN 0360-1285, Document, Link Cited by: §1.
  • [12] R. J. Kee, F. M. Rupley, J.A. Miller, M.E. Coltrin, J.F. Grcar, E. Meeks, H.K. Moffat, A.E. Lutz, G. Dixon-Lewis, M.D. Smooke, J. Warnatz, G.H. Evans, R. S. Larson, R.E. Mitchell, L.R. Petzold, W.C. Reynolds, M. Caracotsios, W.E. Stewart, P. Glarborg, C. Wang, and O. Adigun (2000) CHEMKIN collection, release 3.6. Reaction Design, Inc., San Diego, CA. Cited by: §2.2.
  • [13] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2017-05) ImageNet classification with deep convolutional neural networks. Commun. ACM 60 (6), pp. 84–90. External Links: ISSN 0001-0782, Link, Document Cited by: §3.
  • [14] S. H. Lam and D. A. Goussis (1994) The csp method for simplifying kinetics. International Journal of Chemical Kinetics 26 (4), pp. 461–486. External Links: Document, Link, Cited by: §1.
  • [15] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. External Links: Document Cited by: §3.
  • [16] R. Löhner and F. Camelli (2005) Optimal placement of sensors for contaminant detection based on detailed 3d cfd simulations. Engineering computations 22 (3), pp. 260–273. Cited by: §1.
  • [17] T. Lu and C. Law (2005-01) A directed relation graph method for mechanism reduction. Proceedings of the Combustion Institute 30, pp. 1333–1341. External Links: Document Cited by: §1.
  • [18] K. O. Lye, S. Mishra, and D. Ray (2020) Deep learning observables in computational fluid dynamics. Journal of Computational Physics 410, pp. 109339. External Links: ISSN 0021-9991, Document, Link Cited by: §1.
  • [19] U. Maas and S.B. Pope (1992) Simplifying chemical kinetics: intrinsic low-dimensional manifolds in composition space. Combustion and Flame 88 (3), pp. 239–264. External Links: ISSN 0010-2180, Document, Link Cited by: §1.
  • [20] O. Owoyele and P. Pal (2021) ChemNODE: a neural ordinary differential equations approach for chemical kinetics solvers. arXiv preprint arXiv:2101.04749. External Links: 2101.04749 Cited by: §1, §5.
  • [21] W. Y. Peng and N. H. Pinkowski (2017) Efficient and accurate time-integration of combustion chemical kinetics using artificial neural networks. Cited by: 2nd item, §1.
  • [22] E. L. Petersen and R. K. Hanson (1999) Reduced kinetics mechanisms for ram accelerator combustion. Journal of Propulsion and Power 15 (4), pp. 591–600. External Links: Document, Link, Cited by: §2.2.
  • [23] M. Raissi, P. Perdikaris, and G. E. Karniadakis (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, pp. 686–707. External Links: ISSN 0021-9991, Document, Link, MathReview Entry Cited by: §1.
  • [24] L. Ruthotto and E. Haber (2020) Deep neural networks motivated by partial differential equations. J. Math. Imaging Vision 62 (3), pp. 352–364. External Links: ISSN 0924-9907, Document, Link, MathReview Entry Cited by: §1, §2.1, §2.1, §2.1.
  • [25] A. J. Sharma, R. F. Johnson, D. A. Kessler, and A. Moses (2020-0181) Deep learning for scalable chemical kinetics. In AIAA Scitech 2020 Forum, pp. . External Links: Document, Link, Cited by: 2nd item, §1, §5.
  • [26] A. Stück, F. F. Camelli, and R. Löhner (2010) Adjoint-based design of shock mitigation devices. International Journal for Numerical Methods in Fluids 64 (4), pp. 443–472. Cited by: §1.
  • [27] W. Sun, Z. Chen, X. Gou, and Y. Ju (2010) A path flux analysis method for the reduction of detailed chemical kinetic mechanisms. Combustion and Flame 157 (7), pp. 1298–1307. External Links: ISSN 0010-2180, Document, Link Cited by: §1.
  • [28] F. Togashi, R. Lohner, and N. Tsuboi (2006-954) Numerical simulation of h2/air detonation using detailed reaction models. In 44th AIAA Aerospace Sciences Meeting and Exhibit, pp. . External Links: Document, Link, Cited by: §1.
  • [29] S. Vajda, P. Valko, and T. Turányi (2004-10) Principal component analysis of kinetic models. International Journal of Chemical Kinetics 17, pp. 55 – 81. External Links: Document Cited by: §1.
  • [30] P. Zhang, R. Sankaran, M. Stoyanov, D. Lebrun-Grandie, and C. E. Finney Reduced models for chemical kinetics derived from parallel ensemble simulations of stirred reactors. In AIAA Scitech 2020 Forum, pp. . External Links: Document, Link, Cited by: §1.

Appendix A Description of the Training Data Sets

In this appendix the data sets that were used to train the DNNs and create the plots presented above are described in more detail. Specifically, information about the number of points, the equivalence ratio, and initial temperature of the individual data sets is provided. For the results in Figure 2, a single ResNet was trained with a set containing data from a single initial condition. This set has 4,999 points, and the initial condition has equivalence ratio 0.1 and initial temperature 1,200K. Once the network was trained on this data, it was tested with the same initial point from the set on which it was trained.

The parallel ResNets used to produce the plots in Figures 3 through 6 were trained on a data set whose subsets correspond to 9 different initial conditions, all with initial temperature 1,200K. The data is described further in Table 1, where the subsets are numbered for convenient reference. This set also contains the data used to create Figure 2 (set 3). After being trained on this entire data set, the ResNets used for the results in Figure 3 were tested on set 5; these results therefore do not reflect the ResNets generalizing to unseen data.

The subsets of data described in Table 1 are of different sizes. For the results in Figure 4, the difference in sizes was compensated for by training with copies of the smaller sets: the ResNets were trained with set 1, two copies of sets 3 and 9, and eight copies of sets 5 and 7. The ResNets were then tested using the initial condition from set 6, and these are the results shown in Figure 4. To train the ResNets that produced the results in Figures 5 and 6, a single copy of sets 1, 3, 5, 7, and 9 from Table 1 was used. The trained networks were then tested on set 8. The differences between these two experiments are the architecture of the ResNets and the way the training data was scaled prior to training. Also, as opposed to the experiments represented by Figures 3 and 4, the experiments that resulted in Figures 5 and 6 use the second approach, where the ResNets are trained to learn the next timestep of the data. To create the training data in this situation, the same data is used for both components (input and output), shifted by one timestep. This reduces the size of each subset by one, because there is no next timestep to learn at the final time.
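The two data-handling steps above, building (input, next-timestep) pairs from a trajectory and balancing unequal subset sizes by replicating the smaller sets, can be sketched as follows. The function names and the replication counts are illustrative, not taken from the authors' code.

```python
def next_step_pairs(trajectory):
    """Build (input, target) training pairs from one trajectory of states.

    Each state at time t is paired with the state at time t+1, so a
    trajectory of n points yields n-1 pairs (there is no next timestep
    to learn at the final time).
    """
    return [(trajectory[t], trajectory[t + 1]) for t in range(len(trajectory) - 1)]


def balanced_training_set(subsets, copies):
    """Concatenate subsets, replicating the smaller ones.

    copies[i] gives how many times subset i is repeated, e.g. eight
    copies of a 999-point set to weight it against an 8,000-point set.
    """
    data = []
    for subset, k in zip(subsets, copies):
        data.extend(next_step_pairs(subset) * k)
    return data


# A trajectory of 5 scalar states yields 4 training pairs.
pairs = next_step_pairs([0.0, 1.0, 2.0, 3.0, 4.0])
```

In practice each state would be a vector of species mass fractions and temperature rather than a scalar, but the bookkeeping is the same.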

Set                       1      2      3      4      5      6      7      8      9
Number of points          8,000  6,000  4,999  1,999  999    999    999    1,999  3,999
Equivalence ratio         0.01   0.05   0.1    0.25   0.5    1      2      5      10
Initial temperature (K)   1,200  1,200  1,200  1,200  1,200  1,200  1,200  1,200  1,200

Table 1. Description of the training data used to produce Figures 3 through 6.

In Table 2, the data sets that have been separated into fuel lean, balanced, and rich sets are described. Note that this data contains the data sets from Table 1, where here the number of points is one less for the same reason described above. This data was used to create the results shown in Figure 7. The ResNets trained on fuel lean sets (top left set of 9 plots) were trained with subsets 1, 3, 4, 6, and 9, and then tested with the initial condition from set 8. The ResNets trained on the fuel balanced sets were trained on subsets 1, 3, 5, 7, 9, and 12, and subsequently tested with the initial condition from set 11. Finally, the networks trained on fuel rich sets were trained with subsets 1, 2, 3, 5, and 7, and then tested with the initial condition from set 4.

Fuel Lean Sets
Set                       1      2      3      4      5      6      7      8      9
Number of points          7,999  5,999  4,998  4,999  4,999  4,999  4,999  4,999  4,999
Equivalence ratio         0.01   0.05   0.1    0.01   0.02   0.04   0.06   0.08   0.1
Initial temperature (K)   1,200  1,200  1,200  1,500  1,500  1,500  1,500  1,500  1,500

Fuel Balanced Sets
Set                       1      2      3      4      5      6      7      8      9      10     11     12
Number of points          1,998  998    998    998    4,999  4,999  4,999  4,999  4,999  4,999  4,999  4,999
Equivalence ratio         0.25   0.5    1      2      0.2    0.4    0.5    0.75   0.9    1      1.5    2
Initial temperature (K)   1,200  1,200  1,200  1,200  1,500  1,500  1,500  1,500  1,500  1,500  1,500  1,500

Fuel Rich Sets
Set                       1      2      3      4      5      6      7
Number of points          1,998  3,998  4,999  4,999  4,999  4,999  4,999
Equivalence ratio         5      10     2.5    3      3.5    4.5    5
Initial temperature (K)   1,200  1,200  1,500  1,500  1,500  1,500  1,500
Table 2. Description of the training data with fuel-based grouping that was used for the results in Figure 7.
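The lean/balanced/rich grouping in Table 2 can be approximated by simple thresholds on the equivalence ratio. The cutoffs below are illustrative assumptions inferred from the ranges in Table 2 (lean sets have ratio at most 0.1, rich sets at least 2.5); the paper does not state an explicit criterion.

```python
def fuel_group(phi):
    """Classify a data set by its equivalence ratio phi.

    Thresholds are assumptions chosen to match the ranges in Table 2,
    not a criterion stated by the authors.
    """
    if phi <= 0.1:
        return "lean"
    if phi >= 2.5:
        return "rich"
    return "balanced"
```

For example, every equivalence ratio listed in the Fuel Lean Sets block of Table 2 falls in the "lean" branch under these thresholds.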

For the experiments that produced the plots in Figure 8, the training data consists of 13 subsets. Each subset has 499 points and an equivalence ratio of 1. Furthermore, each subset has a different initial temperature beginning with 1,200K and increasing by increments of 100K to 2,400K. For the experiment shown in Figure 8, the ResNets were trained with the sets corresponding to initial temperatures 1,200K, 1,500K, 1,800K, 2,100K, and 2,400K. The trained ResNets were then tested using the initial condition with temperature 2,200K.
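The temperature-based train/test split described above amounts to training on every third point of a uniform 100K grid and holding out the rest; a minimal sketch:

```python
# All 13 subsets: initial temperatures from 1,200K to 2,400K in 100K steps.
all_temps = list(range(1200, 2401, 100))

# Training subsets: every 300K, as described in the text.
train_temps = list(range(1200, 2401, 300))  # 1200, 1500, 1800, 2100, 2400

# The test initial condition (2,200K) lies on the grid but was not
# seen during training, so this experiment probes interpolation in
# initial temperature.
test_temp = 2200
held_out = test_temp in all_temps and test_temp not in train_temps
```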