During reactive transport simulations, three degrees of chemical reactivity behavior may be observed across space and over time: weak, moderate, and intense. For example, at locations relatively distant from where a fluid enters a medium, little or no reactivity may occur, potentially over long periods of time, if those regions were initially in chemical, thermal, and mechanical equilibrium. Eventually, these equilibrium conditions are gradually disrupted by fluid flow, heat transfer, and chemical transport, causing moderate chemical reactivity. Later, the introduced perturbations reach those once calm locations and cause relatively intense local chemical changes.
Interestingly, after such a wave of perturbations has passed, local chemical reactivity very often becomes weak once again. This implies that only a relatively small percentage of the entire medium experiences fast and substantial chemical changes at any moment in time. As a result, most chemical reaction calculations are performed for low-reactivity regions at every time step of the simulation. Even though reaction calculations are faster in low-reactivity regions than in high-reactivity regions, their overall computational cost is typically 10–100 times greater than that of calculations of physical processes (e.g., mass transport, heat transfer). Thus, reactive transport simulations spend most of their computation time calculating chemical reactions rather than transport processes, and typically require very long compute times. Hence, speeding up chemical calculations, without compromising accuracy, is crucial for significant performance gains in reactive transport simulations.
Whenever chemical reaction rates are considerably faster than the rates of physical processes, the local chemical equilibrium assumption is a plausible and sufficient rate model for chemical reactions. In general, however, a combination of fast and slow reaction rates is present, so that the fast reactions are reasonably modeled using the chemical equilibrium assumption and the slow reactions are modeled using chemical kinetics. This approach is termed the local partial chemical equilibrium assumption [1, 2, 3, 4, 5, 6, 7]. Note, however, that chemical equilibrium calculations are needed to model chemical processes in either case. In fact, such equilibrium calculations need to be performed at least once at every mesh node/cell during every transport time step. Thus, millions to billions of chemical equilibrium calculations tend to accumulate over the course of massive numerical reactive transport simulations, particularly when using fine-resolution meshes, large three-dimensional domains, and/or long simulation times. Hence, speeding up reactive transport modeling by a significant factor can only be accomplished by accelerating chemical reaction calculations in general, and chemical equilibrium calculations in particular. Here, we focus on accelerating chemical equilibrium calculations, whereas a companion paper (Leal et al. ) introduces a similar strategy to accelerate chemical kinetics calculations.
Chemical equilibrium calculations are computationally expensive because (i) they involve the iterative solution of a system of non-linear algebraic equations, requiring at every iteration (ii) the evaluation of thermodynamic properties, such as activity coefficients, concentrations, activities, and chemical potentials (e.g., using the Pitzer model for aqueous solutions and the Peng and Robinson model for gaseous solutions), and (iii) the solution of a system of linear equations with dimension on the order of either the number of chemical species or the number of chemical elements. Over several decades, major advances have been made in developing fast, accurate, and robust methods for chemical equilibrium calculations, based either on Gibbs energy minimization (GEM) or on law of mass action (LMA) formulations [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 11, 34, 35, 36, 37, 38, 39, 40, 41]. However, even if one could devise an algorithm that always converged in a single iteration, instead of the typical few to dozens of iterations, the computational cost of chemical equilibrium calculations would still dominate all other calculations. The question of utmost importance is thus: Is it possible to calculate chemical equilibrium without actually solving the non-linear equations governing equilibrium, which requires several evaluations of thermodynamic models and as many solutions of systems of linear equations? In this paper, we demonstrate that this can, indeed, be achieved with exceptional speed and accuracy.
Our here-introduced smart chemical equilibrium algorithm operates much like a human being. A human begins life with few to no problem-solving skills. At an early age, the child is exposed to a variety of challenges and learns how to solve them, often with assistance from others. However, once the child has learned how to solve a specific problem, similar problems can be solved without requiring assistance. Solving a problem with an already acquired skill is much faster than first having to learn how to solve that problem, since learning is a difficult and time-consuming process. As the child grows, more and more problems can be solved until, eventually, as an adult, external help is needed only occasionally, under exceptional circumstances, i.e., in situations that have rarely or never been encountered before. Importantly, in contrast to going to school, this way of “learning by doing” may be viewed as on-demand learning or training, a highly efficient way of learning, as one only learns what is actually needed at least once.
The above metaphor illustrates some key features of our new, on-demand machine learning algorithm for fast chemical equilibrium calculations. Initially, the algorithm has no recollection of any past calculation. Thus, the first time the algorithm faces a new chemical equilibrium problem, it fully solves the problem and saves the inputs and outputs describing the calculated equilibrium state. This is, as discussed previously, a computationally expensive process, now regarded as an act of learning. The next time the algorithm is employed, it recalls that a calculation has previously been performed and attempts to quickly predict the new chemical equilibrium state, using sensitivity derivatives to project the previously recorded equilibrium state onto the new one. The algorithm then checks whether the prediction is acceptably accurate within some given tolerance. If it is, the predicted equilibrium state is accepted as a solution; otherwise, a full chemical equilibrium calculation is performed and the corresponding inputs and outputs are recorded to describe the newly learned equilibrium state. This way, the memory of past, fully solved equilibrium problems grows, permitting searches among all saved and learned problems to determine the one that is closest to the new equilibrium problem.
In contrast to traditional machine learning algorithms, which first require a time-consuming training phase, in which potentially many outcomes are calculated for problems that may later never occur, our new machine learning algorithm is only trained with problems that actually occur, relying on the recurrence of subsequent similar problems to quickly respond with an accurate prediction. Hence, traditional machine learning is equivalent to going to school, while our machine learning algorithm is more akin to on-demand learning or “learning by doing.” Furthermore, there is no clear criterion to decide when the training phase of traditional machine learning algorithms is ideally terminated, i.e., when it is time to “leave school.”
The here-introduced smart chemical equilibrium algorithm is particularly useful for applications that require repetitive calculations, such as chemical equilibrium calculations during reactive transport modeling. For such applications, the algorithm can eventually achieve a knowledgeable state, so that, after some time steps during a reactive transport simulation, all equilibrium calculations can be rapidly and accurately performed using previously learned key chemical equilibrium states. Even if the smart algorithm occasionally faces some exceptional circumstances that require additional on-demand training (e.g., a sudden change in the chemistry or temperature of the fluid entering a region of the medium), one can still reasonably expect significant speedups with this machine-learning-accelerated algorithm, if the occasions during which learning is needed are considerably less frequent than the occasions during which the algorithm can make smart predictions.
The “intelligence” of the here-introduced algorithm is, thus, a combination of both the memory of already solved equilibrium problems, which needed to be solved anyway, and its ability to “learn” new ones on demand. Physiologically speaking, the algorithm is like a brain that not only records its experiences in solving chemical equilibrium problems, but also makes decisions about when it needs additional training, following its judgment on the accuracy/acceptability of an estimated chemical equilibrium outcome. This on-demand training is key to:
Performance: learning and keeping only what is needed results in compact storage and thus fast search operations when finding the closest past equilibrium problem already solved;
Accuracy: finer control on minimizing errors resulting from predicted equilibrium states by performing additional training whenever needed;
Convenience: neither a dedicated prior training stage nor anticipation of possible chemical conditions during the simulation are required;
Simplicity: users immediately benefit from high-performance reactive transport simulations without having to prepare any prior configuration and training of the smart chemical equilibrium solver.
Our on-demand training philosophy differs radically from most previous efforts to speed up chemical reaction calculations in reactive transport simulations, which use conventional machine learning methods. Jatnieks et al. , for example, present the steps necessary to construct a so-called surrogate model for fast speciation calculations. The construction of this surrogate model requires a prior training phase, during which many random input conditions are fed to a speciation solver (PHREEQC ) and the resulting outputs are collected for statistical learning. In their numerical experiment, 32 different statistical and machine learning methods were tried to identify the potentially best one. For a specific reactive transport modeling problem, they collected all possible input-output combinations in speciation calculations, and from these, 7880 random inputs, representing 80% of all collected input conditions, were used to train each statistical model. The various constructed surrogate models, each built from a different set of 7880 input-output samples, were subsequently used in a reactive transport simulation.
Our new on-demand machine learning approach has clear advantages over conventional statistics-based machine learning methods. First, the use of sensitivity derivatives of the calculated equilibrium states results in a machine learning method that better understands how a chemical system reacts to changes in the input equilibrium conditions. Second, the use of these sensitivity derivatives permits extrapolating and predicting new equilibrium states with much higher accuracy and confidence. Third, it requires no statistical training before it can be applied in a reactive transport simulation, because its on-demand training characteristic allows it to spontaneously learn only what is needed during the simulation to keep the predictions accurate enough. Furthermore, our approach is not only simpler from a user point of view, but also potentially faster, as it saves only a few input conditions, in contrast to a statistical approach, which can only become more accurate the more it knows a priori. Finally, our method produces outputs that always satisfy the mass conservation conditions of chemical elements and electric charge, since these constraints are incorporated into the calculation of the sensitivity derivatives (see Leal et al. ); our method thus contrasts with statistics-based machine learning methods, which can fail to predict equilibrium states that satisfy given mass balance conditions.
This communication is organized as follows. In Section 2, we introduce definitions and notation that are needed to describe the new algorithm rigorously. In Section 3, we formulate the smart chemical equilibrium method, providing details on the prediction of new equilibrium states from previously saved and learned states as well as on the acceptance criteria for the predicted equilibrium states. In Section 4, we compare the performance and accuracy of a reactive transport simulation using the here-proposed smart equilibrium algorithm against that using a conventional Newton-based equilibrium algorithm. Finally, we discuss in Section 5 the implications and conclusions of this study together with a planned road-map for further research efforts in this direction.
The here-introduced smart chemical equilibrium algorithm is implemented in Reaktoro [Leal2015b], a unified open-source framework for modeling chemically reactive systems (reaktoro.org).
2 Definitions and Notation
A chemical system is a collection of chemical species composed of one or more components and distributed among one or more phases. The species can be substances such as aqueous ions (e.g., Na+(aq), Cl-(aq), HCO3-(aq)), neutral aqueous species (e.g., SiO2(aq), CO2(aq), H2O(l)), gases (e.g., CO2(g), CH4(g), N2(g)), and pure condensed phases (e.g., CaCO3(s, calcite), SiO2(s, quartz), Al2Si2O5(OH)4(s, kaolinite)). The phases are each composed of one or more chemical species with homogeneous properties within their boundaries; a phase can be an aqueous, gaseous, or liquid solution, a solid solution, a pure mineral, a plasma, etc. Note that substances with the same chemical formula, but in different phases, are distinct species (e.g., CO2(aq) and CO2(g) are different species). The components are typically chemical elements (e.g., H, O, C, Na, Cl, Ca, Si) and electrical charge (Z), but they can also be linear combinations of these, commonly known as primary species (e.g., H+(aq), H2O(l), CO2(aq)). We shall use from now on the words elements and components interchangeably, with elements denoting not only chemical elements but also electrical charge.
A chemical system can exist in infinitely many chemical states. A chemical state is defined here as the triplet $(T, P, n)$, where $T$ is temperature, $P$ is pressure, and $n = (n_1, \ldots, n_N)$ is the vector of molar amounts of the species, with $n_i$ denoting the mole amount of the $i$th species and $N$ the number of species. If we denote by $b = (b_1, \ldots, b_E)$ the vector of molar amounts of the elements, with $b_j$ denoting the amount of the $j$th element and $E$ the number of elements, then $n$ and $b$ are related through the following mass conservation equation:

$$b = An,$$

where $A$ is the formula matrix of the chemical system, with $A_{ji}$ denoting the coefficient of the $j$th element in the $i$th species, and thus a matrix with dimensions $E \times N$. As an example, consider the formula matrix of a simple chemical system distributed among two phases, namely an aqueous and a gaseous phase, with the names of aqueous species suffixed with (aq), except H2O(l), and gaseous species with (g): each of its rows lists the coefficients of one element (or the electric charge) in every species.
In the mathematical presentation that follows, we assume, for convenience, that the formula matrix $A$ of the corresponding chemical system is full-rank (i.e., the rows of $A$ are linearly independent), so that $\operatorname{rank}(A) = E$.
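To make the notation concrete, the following minimal NumPy sketch assembles the formula matrix of a small hypothetical H2O-CO2 system (seven species, four components, counting electric charge Z), verifies the mass conservation relation $b = An$, and checks the full-rank assumption; the species and their amounts are invented for illustration only:

```python
import numpy as np

# Hypothetical system: 7 species across an aqueous and a gaseous phase,
# 4 components (elements H, O, C and electric charge Z).
species = ["H2O(l)", "H+(aq)", "OH-(aq)", "CO2(aq)", "HCO3-(aq)", "CO2(g)", "H2O(g)"]
A = np.array([
    [2, 1, 1, 0, 1, 0, 2],    # H
    [1, 0, 1, 2, 3, 1, 1],    # O
    [0, 0, 0, 1, 1, 1, 0],    # C
    [0, 1, -1, 0, -1, 0, 0],  # Z (electric charge)
])  # A[j, i] = coefficient of component j in species i; dimensions E x N

n = np.array([55.5, 1e-4, 1e-4, 1e-2, 1e-4, 0.5, 1e-2])  # species amounts (mol)
b = A @ n  # element/charge amounts follow from mass conservation, b = A n

# Full-rank assumption: rank(A) = E, i.e., the rows of A are linearly independent.
assert np.linalg.matrix_rank(A) == A.shape[0]
```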
Let a chemical equilibrium calculation be represented as:

$$n = \varphi(T, P, b).$$

The chemical equilibrium function, $\varphi$, is an abstraction of the operations needed to solve the fundamental Gibbs energy minimization problem:

$$\min_{n} \; G(T, P, n) \quad \text{subject to} \quad An = b \;\; \text{and} \;\; n \geq 0, \qquad (4)$$

at prescribed conditions of temperature, $T$, pressure, $P$, and element amounts, $b$. In this problem, one seeks the vector of species amounts $n$ that minimizes the total Gibbs energy function of the system:

$$G = \sum_{i=1}^{N} n_i \mu_i,$$

subject to the elemental mass conservation constraint equations, $An = b$, and the non-negativity constraints for the species amounts, $n \geq 0$. The chemical potential of the $i$th species, $\mu_i$, is defined as:

$$\mu_i = \mu_i^{\circ} + RT \ln a_i,$$

with $R$ denoting the universal gas constant, $\mu_i^{\circ}$ the standard chemical potential of the $i$th species, and $a_i$ the activity of the $i$th species. More details on how this chemical equilibrium problem can be solved, using either Gibbs energy minimization (GEM) or law of mass action (LMA) methods, are presented in Section A or in more detail in Leal et al. [32, 33, 11].
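Evaluating the chemical potential expression above for a single species is straightforward; in the sketch below, the standard chemical potential and the activity are invented values, not data for any particular species:

```python
import math

R = 8.314462618  # universal gas constant, J/(mol K)

def chemical_potential(mu0, T, a):
    """Evaluate mu_i = mu_i^0 + R*T*ln(a_i) for one species: the standard
    chemical potential mu0 (J/mol) plus the activity contribution."""
    return mu0 + R * T * math.log(a)

# Invented values: mu0 in J/mol, T in K, and a dilute-species activity a < 1,
# which makes the chemical potential lower than the standard value.
mu = chemical_potential(mu0=-237140.0, T=298.15, a=1e-3)
```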
3.1 First-Order Taylor Approximation
Assume a chemical equilibrium calculation has been performed previously with inputs $(T_0, P_0, b_0)$, and a new chemical equilibrium calculation needs to be performed with inputs $(T, P, b)$. Rather than computing $n = \varphi(T, P, b)$ using the computationally expensive chemical equilibrium function $\varphi$:

$$n = \varphi(T, P, b),$$

we want to try first to estimate $n$ using the following first-order Taylor approximation:

$$n \approx n_0 + \frac{\partial n}{\partial T}(T - T_0) + \frac{\partial n}{\partial P}(P - P_0) + \frac{\partial n}{\partial b}(b - b_0), \qquad (8)$$

where $\partial n/\partial T$, $\partial n/\partial P$, and $\partial n/\partial b$ are sensitivity derivatives (with dimensions $N \times 1$, $N \times 1$, and $N \times E$, respectively) of the previous chemical equilibrium state. These sensitivity derivatives are measures of how the species amounts change when the previous chemical equilibrium state is perturbed by infinitesimal changes in temperature, pressure, and the amounts of elements. They can be equivalently written as $\partial_T n$, $\partial_P n$, and $\partial_b n$.
By using these sensitivity derivatives, we can quickly and accurately estimate all chemical equilibrium states in the vicinity of some previous, and fully known, chemical equilibrium state.
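A minimal sketch of such a Taylor prediction, with invented reference amounts and sensitivity derivatives (in a real solver these would come from a full equilibrium calculation and its differentiation), could look as follows:

```python
import numpy as np

def predict_equilibrium(n0, dndT, dndP, dndb, T0, P0, b0, T, P, b):
    """First-order Taylor prediction of the species amounts n around a
    previously computed equilibrium state (T0, P0, b0) with amounts n0.
    dndT and dndP have shape (N,), dndb has shape (N, E), all evaluated
    at the reference state."""
    return n0 + dndT * (T - T0) + dndP * (P - P0) + dndb @ (b - b0)

# Invented reference state and sensitivities (N = 3 species, E = 2 elements):
n0   = np.array([55.0, 0.10, 0.02])
dndT = np.array([0.0, 1e-4, -1e-4])
dndP = np.zeros(3)
dndb = np.array([[1.0, 0.0],
                 [0.0, 0.5],
                 [0.0, 0.5]])
b0 = np.array([110.0, 0.12])

# Predict the state for a slightly warmer, element-enriched input:
n_new = predict_equilibrium(n0, dndT, dndP, dndb,
                            T0=333.15, P0=100.0, b0=b0,
                            T=334.15, P=100.0, b=b0 + np.array([0.0, 0.01]))
```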
3.2 Acceptance Testing
Once a predicted chemical equilibrium state has been calculated, one needs to test whether it can be accepted. Because we are using a first-order Taylor approximation, we need to ensure that the newly estimated chemical equilibrium state is not too far from the previous, fully calculated equilibrium state used as a reference point. This could be done naively by checking how much the species amounts changed from one state to the other, using the following test condition for all species $i$:

$$|n_i - n_{i,0}| \leq \varepsilon_{\mathrm{abs}} + \varepsilon_{\mathrm{rel}}\, n_{i,0},$$

which controls how much the absolute and relative changes in the newly estimated species amounts, $n_i$, can be tolerated for given absolute and relative tolerance parameters $\varepsilon_{\mathrm{abs}}$ and $\varepsilon_{\mathrm{rel}}$, respectively.
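The naive amounts-based test described above can be sketched as follows (the tolerance values are illustrative assumptions, not the ones used in our simulations):

```python
import numpy as np

def accept_amounts(n_pred, n_ref, abstol=1e-8, reltol=0.1):
    """Naive acceptance test on species amounts: accept the prediction if each
    amount deviates from the reference state by at most abstol + reltol*n_ref."""
    return bool(np.all(np.abs(n_pred - n_ref) <= abstol + reltol * n_ref))

n_ref = np.array([55.0, 0.10, 0.02])
ok = accept_amounts(np.array([55.0, 0.105, 0.021]), n_ref)       # small changes
too_far = accept_amounts(np.array([55.0, 0.30, 0.02]), n_ref)    # 200% change
```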
However, the previous tolerance test has a major disadvantage: it does not “understand” the thermodynamic behavior of stable phases. Consider a chemical system with two stable phases: an aqueous solution saturated with a mineral. Clearly, adding more of that mineral to the system will not alter the composition of the fluid; it will merely accumulate into a larger total amount of solids. Under such conditions, the acceptance test based on species amounts would fail, even though the equilibrium state estimated using sensitivity derivatives would be accurate for arbitrarily large amounts of added mineral. For example, if the initial mineral amount is 1 mol and 1000 moles of the mineral are added, while enforcing the constraint that no species amount may vary by more than, say, 10% from the previous state, that acceptance test would trigger hundreds of unnecessary training equilibrium calculations.
For the previous example, assuming the system is perturbed only by adding the mineral, i.e., without changing temperature or pressure, there is one thermodynamic quantity that remains constant: the chemical potentials of the species, $\mu_i$. Because of this behavior, we use instead the following acceptance test in terms of chemical potentials:

$$|\hat{\mu}_i - \mu_{i,0}| \leq \varepsilon_{\mathrm{abs}} + \varepsilon_{\mathrm{rel}}\, |\mu_{i,0}|, \qquad (10)$$

where $\hat{\mu}_i$ is the estimated, not evaluated, chemical potential of the $i$th species at the new chemical equilibrium state:

$$\hat{\mu} = \mu_0 + \frac{\partial \mu}{\partial T}(T - T_0) + \frac{\partial \mu}{\partial P}(P - P_0) + \frac{\partial \mu}{\partial n}(n - n_0), \qquad (11)$$

where $\partial\mu/\partial T$, $\partial\mu/\partial P$, and $\partial\mu/\partial n$ are chemical potential derivatives (with dimensions $N \times 1$, $N \times 1$, and $N \times N$, respectively) evaluated at the previous equilibrium state.
Remark: For LMA methods, which in some cases have no access to standard chemical potentials, $\mu_i^{\circ}$ (e.g., when the thermodynamic database only contains equilibrium constants of reactions), a similar, but not equivalent, test can be formulated in terms of activities:

$$|\ln \hat{a}_i - \ln a_{i,0}| \leq \varepsilon_{\mathrm{abs}} + \varepsilon_{\mathrm{rel}}\, |\ln a_{i,0}|,$$

where $\hat{a}_i$ is the activity of the $i$th species estimated by a first-order Taylor approximation analogous to equation (11).
Note, however, that activities are in general less sensitive to temperature variations than standard chemical potentials and, thus, the above alternative acceptance test would be less rigorous than equation (10) and more indifferent towards temperature changes. To fix this problem, one could use the conversion approach detailed in Leal et al. , which permits apparent standard chemical potentials of the species to be calculated using equilibrium constants of reactions.
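Returning to the chemical-potential-based acceptance test, a sketch of the estimate-then-test step might read as follows; the function names, the purely relative tolerance, and all numerical inputs are illustrative assumptions:

```python
import numpy as np

def estimate_potentials(mu0, dmudT, dmudP, dmudn, T0, P0, n0, T, P, n):
    """First-order Taylor estimate of the chemical potentials at the new
    state, using derivatives evaluated at the previous equilibrium state."""
    return mu0 + dmudT * (T - T0) + dmudP * (P - P0) + dmudn @ (n - n0)

def accept_potentials(mu_hat, mu_ref, reltol=1e-3):
    """Accept the prediction if every chemical potential changed by at most
    reltol in relative terms with respect to the reference state."""
    return bool(np.all(np.abs(mu_hat - mu_ref) <= reltol * np.abs(mu_ref)))

# Invented reference potentials (J/mol) and temperature derivatives (J/(mol K)):
mu_ref = np.array([-237140.0, -527980.0])
dmudT = np.array([-70.0, -100.0])

# A 1 K temperature increase changes the potentials only slightly, so the
# prediction is accepted:
mu_hat = estimate_potentials(mu_ref, dmudT, np.zeros(2), np.zeros((2, 2)),
                             T0=333.15, P0=100.0, n0=np.zeros(2),
                             T=334.15, P=100.0, n=np.zeros(2))
accepted = accept_potentials(mu_hat, mu_ref)
```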
3.3 Machine Learning Chemical Equilibrium Algorithm
The machine learning chemical equilibrium algorithm (or, alternatively, smart chemical equilibrium algorithm) proposed here is capable of “remembering” past calculations and can make use of these calculations to quickly and accurately estimate new equilibrium states. Figure 1 is a flowchart that illustrates the main steps of the algorithm.
During its first chemical equilibrium calculation, with given input conditions $(T, P, b)$, the algorithm solves the full Gibbs energy minimization problem (4). This is represented here using the computationally expensive equilibrium function, $\varphi$:

$$n = \varphi(T, P, b).$$

Once this is done, we need to save not only the input conditions $(T, P, b)$, but also the corresponding calculated equilibrium amounts of species, $n$; the sensitivity derivatives of this equilibrium state, $\partial n/\partial T$, $\partial n/\partial P$, and $\partial n/\partial b$, needed by equation (8); and the derivatives of the chemical potentials, $\partial\mu/\partial T$, $\partial\mu/\partial P$, and $\partial\mu/\partial n$, needed by equation (11). See Leal et al.  for a description of how to calculate these derivatives. By saving these state values, we will later be able to make fast and accurate predictions of all equilibrium states in the vicinity of the one just recorded.
The next time the algorithm is asked to solve a new equilibrium problem, it does so by first searching, among all saved equilibrium input conditions:

$$(T_k, P_k, b_k), \quad k = 1, \ldots, K,$$

for the input that is closest to the given one, $(T, P, b)$, where “closest” is measured here in the sense of the Euclidean norm (other norms can be implemented as well):

$$d_k = \left\| (T, P, b) - (T_k, P_k, b_k) \right\|_2,$$

where $d_k$ is a scalar that measures the difference between $(T, P, b)$ and $(T_k, P_k, b_k)$. Once the search is concluded, the previously calculated equilibrium state corresponding to the input conditions with minimal distance $d_k$ can be used to estimate the equilibrium state corresponding to the input conditions $(T, P, b)$ using equation (8).
Finally, it remains to check whether the estimated equilibrium state is accurate enough, using the acceptance test defined by equation (10). If the test succeeds, the equilibrium calculation ends. Otherwise, a complete chemical equilibrium calculation at conditions $(T, P, b)$ is performed, $n = \varphi(T, P, b)$, and the corresponding sensitivity derivatives of the fully computed equilibrium state ($\partial n/\partial T$, $\partial n/\partial P$, and $\partial n/\partial b$) are evaluated and saved for possible future use when estimating equilibrium states under conditions in the vicinity of the just recorded ones.
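The complete predict-check-learn cycle described above can be condensed into a minimal sketch; the class and helper names below are our own illustration (not Reaktoro's API), and the toy "expensive" solver is a simple linear function so that the derivative-based prediction is exact near a saved state:

```python
import numpy as np

class SmartEquilibriumSolver:
    """Minimal sketch of the on-demand learning strategy: predict from the
    closest saved state when possible, otherwise solve fully and learn."""

    def __init__(self, solve_full, predict, accept):
        self.solve_full = solve_full  # expensive solver: x -> saved record
        self.predict = predict        # cheap prediction from a saved record
        self.accept = accept          # acceptance test for a prediction
        self.records = []             # memory of learned equilibrium states

    def solve(self, x):
        # x bundles the input conditions (T, P, b) into one vector
        if self.records:
            # Naive nearest-input search in the Euclidean sense, O(K):
            ref = min(self.records, key=lambda r: np.linalg.norm(r["x"] - x))
            n = self.predict(ref, x)
            if self.accept(ref, x, n):
                return n, "predicted"
        record = self.solve_full(x)   # on-demand training: full calculation
        self.records.append(record)
        return record["n"], "learned"

# Toy problem: the "equilibrium state" is n = 2x, with derivative dn/dx = 2.
solve_full = lambda x: {"x": x.copy(), "n": 2.0 * x, "dndx": 2.0}
predict = lambda ref, x: ref["n"] + ref["dndx"] * (x - ref["x"])
accept = lambda ref, x, n: np.linalg.norm(x - ref["x"]) < 1.0

solver = SmartEquilibriumSolver(solve_full, predict, accept)
n1, how1 = solver.solve(np.array([1.0]))  # no memory yet -> full solve
n2, how2 = solver.solve(np.array([1.2]))  # close to a saved state -> predicted
n3, how3 = solver.solve(np.array([5.0]))  # too far -> on-demand training
```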
We now present the use of the machine learning method for smart chemical equilibrium calculations in a reactive transport simulation and show how its performance compares with the use of a conventional chemical equilibrium algorithm. For details on how we solve the reactive transport equations, see Appendix B.
Figure 2 illustrates the reactive transport modeling carried out in this work to investigate the efficiency of the proposed smart chemical equilibrium calculations for sequential calculations. It shows the injection of an aqueous fluid resulting from the mixture of 1 kg of water with 0.90 moles of NaCl, 0.05 moles of MgCl2, 0.01 moles of CaCl2, and 0.75 moles of CO2, in a state very close to CO2 saturation, and thus in an acidic state with a calculated pH of 4.65. The initial rock composition is 98% SiO2 (quartz) and 2% CaCO3 (calcite), with an initial porosity of 10%. The resident fluid is a 0.70 molal NaCl brine in equilibrium with the rock minerals, with a calculated pH of 10.0. The temperature and pressure of the fluids are 60 °C and 100 bar, respectively. The reactive transport modeling procedure assumes a constant fluid velocity and the same diffusion coefficient for all fluid species, without dispersivity.
The activity coefficients of the aqueous species are calculated using the HKF extended Debye-Hückel model [45, 46, 47, 48] for solvent water and ionic species, except for the aqueous species CO2(aq), for which the Drummond model is used. The standard chemical potentials of the species are calculated using the equations of state of Helgeson and Kirkham, Helgeson et al., Tanger and Helgeson, Shock and Helgeson, and Shock et al. The database file slop98.dat from the software SUPCRT92 is used to obtain the parameters for the equations of state. The equation of state of Wagner and Pruss is used to calculate the density of water and its temperature and pressure derivatives. The kinetics of dissolution and precipitation of both calcite and dolomite is neglected in this particular example (i.e., the local equilibrium assumption is employed).
Figure 3 shows the volumes of the minerals calcite, CaCO3, and dolomite, CaMg(CO3)2, as well as the concentrations of the aqueous species Ca2+(aq), Mg2+(aq), HCO3-(aq), CO2(aq), and H+(aq) along the rock core at three different simulation times: 10 minutes, 1 hour, and 20 hours. As calcite dissolves, Ca2+(aq) ions are released into the aqueous solution, where they react with the incoming Mg2+(aq) ions from the left boundary to precipitate dolomite. After 10 minutes of injecting the CO2-saturated brine, one observes a slight dissolution of calcite and a corresponding precipitation of dolomite. The injected CO2-saturated brine increases the local concentrations of the carbonic species CO2(aq) and HCO3-(aq). The local concentration of Ca2+(aq) ions also increases as a result of CaCO3 dissolution. The precipitated dolomite, however, is gradually dissolved as the injection of the acidic CO2-saturated fluid continues. This can be seen in Figure 3, in the left region of the rock core, where neither calcite nor dolomite is present after 20 hours of continuous fluid injection. Also after 20 hours of fluid injection, the Mg2+(aq) concentration drops sharply between core distances 0.1 m and 0.2 m, which is exactly where dolomite is currently precipitating.
Common to all sub-figures in Figure 3 is the use of solid curves to denote the reactive transport results obtained using conventional, and thus computationally expensive, chemical equilibrium calculations throughout the simulation. These curves serve as the reference case for the results obtained using the here-proposed machine learning algorithm for smart chemical equilibrium calculations applied to the same reactive transport problem (see Section 3.3). Both filled and empty circles mark the mineral volumes and species concentrations in Figure 3 calculated using the smart equilibrium algorithm accelerated with machine learning. The filled circles represent occasions where the machine learning algorithm was able to quickly and successfully predict an accurate equilibrium state, whereas the empty circles represent occasions in which the smart algorithm required on-demand training to learn a new chemical equilibrium problem. These on-demand learning occasions happen at different mesh cells, either at the same time step or at different ones. The on-demand training operation is needed whenever the machine learning algorithm faces an equilibrium problem for which no accurate-enough prediction can be made. We remark that our proposed machine learning algorithm for smart chemical equilibrium calculations does not need any spatial or temporal information, although this could be exploited for faster search operations in the future.
The machine learning algorithm starts without any recollection of previous chemical equilibrium calculations. It is initially trained on two occasions: when calculating the initial equilibrium state between the resident fluid and the rock minerals, and when calculating the equilibrium state of the injected fluid at the left boundary. We can see in Figure 3 that, during the first time step of the simulation, from time 0 to 10 minutes, the smart equilibrium algorithm was able to accurately estimate the equilibrium states in most mesh cells (see the filled circles). All these successful quick estimates were made using only one of the two initially saved equilibrium states: the initial equilibrium state of the fluid species and rock minerals.
As the reactive fluid is injected into the rock core, it promotes strong compositional changes in both the resident fluid and the rock minerals. Because of this, the machine learning algorithm required on-demand training in the first 12 cells during the first time step, at 10 minutes (see the empty circles). As the perturbation fronts move down the rock core, additional training is performed as needed to fulfill the given accuracy criterion. As seen in Figure 3, at 1 h, or after 6 time steps, when strong concentration changes are occurring between 0.1 m and 0.2 m, there were 4 mesh cells that required full and expensive chemical equilibrium calculations for both accuracy and on-demand learning purposes. Eventually, the smart equilibrium algorithm gains enough knowledge about the recurring equilibrium states during the simulation that it is able to quickly and accurately perform all equilibrium calculations without further training, as seen after 20 h of fluid injection, or after 120 time steps (when only filled circles are present).
Figure 4 shows how many on-demand training occasions were needed at each time step. During each of these training operations, a full chemical equilibrium calculation, using a conventional Newton-based algorithm, as presented in Leal et al. , is performed, and the sensitivity derivatives of the equilibrium state are calculated for learning purposes. Figure 4 shows relatively intense learning activity during the first 1000 minutes of simulated time (or 100 time steps), as expected, since the machine learning algorithm is constantly facing new equilibrium challenges at the beginning of its life. During the first time step, 12 on-demand training operations are needed, exactly in the first 12 mesh cells, in a mesh containing 100 cells (see empty circles in Figure 4 at 10 minutes). This implies that, during the first time step, the machine learning algorithm was not yet trained enough to accurately predict the equilibrium states in those cells. During the second time step, 7 additional on-demand training operations are carried out and, as the simulation continues, only 1–2 additional training cases occur per time step, until the smart chemical equilibrium algorithm no longer requires any further learning (after about 258 time steps of 10 minutes of simulated time each). At this point, the machine learning algorithm is able to quickly and accurately predict all subsequent equilibrium states in every mesh cell, at every time step. For the reactive transport modeling problem chosen here, of the total of 60,000 equilibrium calculations needed, only 181 were actually performed using the expensive Newton-based chemical equilibrium algorithm, so that 99.7% of all such calculations were quickly performed using the proposed machine learning algorithm for fast chemical equilibrium calculations.
Figure 5 compares the computational cost per time step for conventional and smart chemical equilibrium calculations with the cost for transport calculations. The computational costs are measured as CPU time (in seconds). For the equilibrium calculations, this is the CPU time taken to calculate the equilibrium states in all mesh cells in a time step. For the transport calculations, it is the CPU time taken to solve the algebraic transport equations (see Appendix B). Figure 5 shows that the cost of full, conventional chemical equilibrium calculations is about two orders of magnitude greater than the cost of transport calculations. It also shows that the computational cost of the machine learning algorithm for smart equilibrium calculations is greater than that of transport calculations at the beginning, but less than that of conventional Newton-based equilibrium calculations. After the initial on-demand training phase of the machine-learning-accelerated chemical equilibrium algorithm, it can be seen in Figure 5 that its computational cost stabilizes and remains below the cost of transport calculations. Note that, typically, full chemical equilibrium calculations require by far the most computing time during reactive transport simulations, as also seen in Figure 5 and as stated in the introduction.
Figure 6 compares the cumulative computational costs of conventional and machine-learning-accelerated chemical equilibrium calculations with the cost of transport calculations. After several time steps (about 200, or at about 2000 minutes of simulated time in the figure), the cumulative computational costs of all these calculations increase at a more or less constant rate. However, the growth rate of the cumulative cost of the conventional equilibrium calculations, using a Newton-based method, is considerably higher than that of transport calculations (about 45 times higher) and than that of smart, machine-learning-accelerated equilibrium calculations (about 63 times higher).
Figure 7 shows the speedup of chemical equilibrium calculations, at each time step of the simulation, achieved by the smart chemical equilibrium algorithm. The speedup at a given time step is calculated as the ratio of the CPU time needed for conventional equilibrium calculations to the CPU time for smart equilibrium calculations, performed in all mesh cells at that time step. At the beginning, during the more active on-demand training phase of the smart chemical equilibrium solver, the speedup oscillates, reaching a maximum of about 125. The oscillation results from a combination of factors, such as the need to occasionally perform full equilibrium calculations for learning purposes and the fact that the strongest compositional changes occur during the first time steps, when the injected fluid perturbs the initial equilibrium. As the machine learning chemical equilibrium algorithm continues to learn how to solve different equilibrium problems, it reaches a stage at which it is able to quickly predict all new equilibrium states. When this happens, the speedup stabilizes at about 63, as shown in Figure 7, at later time steps.
We remark that this speedup is strongly dependent on how fast we can search for the previously solved equilibrium problem closest to the one being solved. Currently, we use a naive approach, in which all past equilibrium inputs are saved in a linked-list data structure. This data structure is not optimal for nearest-neighbor searches, a common procedure in computer science, such as in machine learning and computer graphics applications, because it requires each previous input to be compared against the new equilibrium input. As a result, this particular search algorithm has complexity O(K), where K is the number of saved inputs. A more suitable data structure is a kd-tree, which splits the input space into several orthogonal partitions, permitting a search algorithm that succeeds, on average, after about log₂(K) comparisons. For example, instead of 256 comparisons using a linked list, we would need only about 8 with a balanced kd-tree. Future optimization of our machine learning equilibrium method will make use of kd-trees or other fast search algorithms.
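The contrast between the two lookup strategies can be illustrated with a minimal sketch. The O(K) linear scan below mirrors the linked-list approach described above; all names and the (temperature, pressure, element amount) input layout are illustrative assumptions, not taken from the paper's code:

```python
import math

def nearest_linear(saved_inputs, new_input):
    """O(K) linear scan: compare the new equilibrium input against every
    saved one, as in the linked-list approach described in the text."""
    return min(saved_inputs, key=lambda s: math.dist(s, new_input))

# Illustrative inputs (temperature, pressure, one element amount) -- these
# names and values are assumptions, not the paper's simulation data.
saved = [(293.15 + i, 1.0e5, 0.1 * i) for i in range(256)]
query = (350.0, 1.0e5, 5.0)
closest = nearest_linear(saved, query)  # 256 distance comparisons here;
# a balanced kd-tree would need on average about log2(256) = 8 of them.
```

A kd-tree implementation (e.g., `scipy.spatial.cKDTree`) would replace the `min` scan with a tree descent, which is where the O(K) versus O(log K) difference quoted above comes from.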
5 Conclusions and Future Work
We present a straightforward, smart, machine-learning chemical equilibrium algorithm that calculates equilibrium states in reactive transport simulations 60–125 times faster than a conventional Newton-based algorithm. The reason for this speedup is that the smart equilibrium algorithm employs an unconventional, supervised machine learning method with an on-demand training strategy, rather than the more common train-in-advance approach. Our new algorithm is capable of “remembering” past equilibrium calculations and rapidly predicting new equilibrium states whenever the new equilibrium problem is sufficiently close to a previously solved one. These machine-learning-accelerated chemical equilibrium calculations can dramatically speed up reactive transport simulations, in which chemical reaction calculations are otherwise typically responsible for over 90% of all computation costs.
Our proposed machine learning algorithm differs from previous machine learning strategies for speeding up chemical equilibrium calculations because, here, learning is carried out on demand, i.e., during the actual simulation, rather than during an initial, time-consuming training phase. Consequently, our method requires no a priori insight into the chemical conditions that may occur during the actual simulation, a problem inherent to standard, i.e., statistics-based, machine learning algorithms. The initial training sessions of standard machine learning algorithms require large datasets of input–output information, which are sometimes randomly generated. Furthermore, there are no clear criteria for determining when such training sessions are sufficient and can be terminated so that the actual simulation of interest can begin. Finally, traditional, statistics-based machine learning algorithms applied to chemical equilibrium calculations neither capture the thermodynamic behavior of stable phases in chemical systems nor can they straightforwardly predict chemical equilibrium states that satisfy given mass conservation constraints.
The presented smart equilibrium algorithm contains only three steps, each computationally cheap to perform. The first step performs a lookup of the past equilibrium problem closest to the new one. The second step is a simple algebraic calculation to estimate the new equilibrium state from the previous one; more specifically, two vector additions (of dimensions E and N) and a matrix–vector multiplication (the matrix with dimension N×E, the vector with dimension E×1), where N and E are the number of species and elements, respectively. The third step checks the acceptance criterion, which is also a relatively cheap operation. Hence, the smart approximation quickly bypasses the extremely expensive operations involved in solving chemical equilibrium problems, such as (i) the evaluation of thermodynamic properties of all species (e.g., activities) and (ii) the solution of systems of linear equations.
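A minimal sketch of the second and third steps, assuming the estimate is a first-order expansion using the stored sensitivity derivatives; the acceptance test shown is illustrative only, not the paper's actual criterion, and all names are assumptions:

```python
def smart_estimate(n0, sensitivity, b0, b_new):
    """First-order estimate of the new equilibrium species amounts:
    n ~ n0 + (dn/db) (b_new - b0), where `sensitivity` is the N-by-E
    matrix of derivatives stored during a previous full calculation."""
    db = [x - y for x, y in zip(b_new, b0)]                       # length E
    dn = [sum(row[j] * db[j] for j in range(len(db))) for row in sensitivity]
    return [n + d for n, d in zip(n0, dn)]                        # length N

def acceptable(n_est, n0, tol=0.1):
    """Illustrative acceptance test: reject the estimate if any species
    amount changed too strongly or became non-positive."""
    return all(ne > 0 and abs(ne - n) <= tol * max(n, 1e-16)
               for ne, n in zip(n_est, n0))
```

The cost here is exactly the vector additions and the N×E matrix–vector product described above, with no evaluation of activities and no linear-system solve.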
Since equilibrium calculations are carried out iteratively, these computationally expensive operations are performed once per iteration and, thus, several times per calculation. Even if convergence could always be achieved in a single iteration, this would still not substantially improve the performance of reactive transport simulations, given the high cost of those operations. In contrast, the proposed smart equilibrium algorithm skips all these iterations, and their intrinsically expensive operations, and is instead able to immediately predict an accurate equilibrium state for similar problems. The new algorithm thus has the potential to substantially accelerate reactive transport simulations, as demonstrated for the modeling problem in Section 4.
Potential future work and improvements of our machine learning algorithm for rapid chemical equilibrium calculations could include:
the use of kd-trees for faster lookup operations;
the investigation of the smart chemical equilibrium algorithm for more complex chemical systems in more heterogeneous and complex reactive transport problems;
the use of alternative acceptance criteria to further avoid unnecessary on-demand training operations;
the use of GPUs for massively parallel smart predictions of thousands to millions of equilibrium states;
the extension of the machine learning strategy, presented here, to chemical kinetics calculations and other problems in science and engineering;
the application of garbage collector ideas, used in programming languages, such as Python and Java, to eliminate all previously saved equilibrium states that have not been used for some time;
the implementation of strategies that rely not only on the nearest previously saved equilibrium state, but on up to a certain number of them (e.g., the two or three closest previously solved equilibrium problems), to further avoid triggering on-demand training operations and to improve accuracy by combining the sensitivity derivatives of different nearby states;
the incorporation of spatial and temporal information into the machine learning algorithm, to test whether the computational cost of search operations can thereby be decreased.
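As a purely hypothetical illustration of the multi-neighbor idea in the list above, the prediction could be an inverse-distance-weighted combination of the first-order estimates from the k closest previously saved states; the record layout, weighting scheme, and all names below are assumptions, not the paper's method:

```python
import math

def weighted_prediction(records, b_new, k=3):
    """Combine first-order estimates from the k closest saved states.
    Each record is (b0, n0, sensitivity): past input, past equilibrium
    species amounts, and the stored sensitivity matrix dn/db."""
    nearest = sorted(records, key=lambda r: math.dist(r[0], b_new))[:k]
    weights, estimates = [], []
    for b0, n0, S in nearest:
        d = math.dist(b0, b_new)
        weights.append(1.0 / (d + 1e-12))        # inverse-distance weight
        db = [x - y for x, y in zip(b_new, b0)]
        estimates.append([n + sum(row[j] * db[j] for j in range(len(db)))
                          for n, row in zip(n0, S)])
    wsum = sum(weights)
    return [sum(w * e[i] for w, e in zip(weights, estimates)) / wsum
            for i in range(len(estimates[0]))]
```

With a single saved record this reduces to the plain one-neighbor estimate; with several records it blends nearby sensitivity information as suggested in the list.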
The above roadmap of future investigations shows how much our new machine learning algorithm for rapid chemical reaction calculations can still be improved beyond what we have already shown here. We believe that the use of machine-learning-accelerated algorithms for chemical reaction calculations is crucial for a significant acceleration of complex reactive transport simulations, and essential for a substantial decrease of the overall computational costs of chemical calculations, which so far have been responsible for 90–99% of all computing costs in large-scale numerical simulations with intricate chemistry representations.
- Ramshaw, J. D. (1980). Partial Chemical Equilibrium in Fluid Dynamics. Physics of Fluids, 23(4):675–680.
- Ramshaw, J. D. and Cloutman, L. D. (1981). Numerical method for partial equilibrium flow. Journal of Computational Physics, 39(2):405–417.
- Ramshaw, J. D. and Amsden, A. A. (1985). Improved iteration scheme for partial equilibrium flow. Journal of Computational Physics, 59(3):484–489.
- Ramshaw, J. and Chang, C. (1995). Iteration Scheme for Implicit Calculations of Kinetic and Equilibrium Chemical Reactions in Fluid Dynamics. Journal of Computational Physics, 116(2):359–364.
- Lichtner, P. C. (1985). Continuum model for simultaneous chemical reactions and mass transport in hydrothermal systems. Geochimica et Cosmochimica Acta, 49(3):779–800.
- Steefel, C. I. and Lasaga, A. C. (1994). A coupled model for transport of multiple chemical species and kinetic precipitation/dissolution reactions with applications to reactive flow in single phase hydrothermal systems. American Journal of Science, 294(5):529–592.
- Steefel, C. I. and MacQuarrie, K. T. B. (1996). Approaches to modeling of reactive transport in porous media. Reviews in Mineralogy and Geochemistry, 34(1):83–129.
- Leal, A. M., Kulik, D. A., and Saar, M. O. (2017). Ultra-Fast Reactive Transport Simulations When Chemical Reactions Meet Machine Learning: Chemical Kinetics.
- Pitzer, K. S. (1973). Thermodynamics of electrolytes. I. Theoretical basis and general equations. The Journal of Physical Chemistry, 77(2):268–277.
- Peng, D.-Y. and Robinson, D. B. (1976). A New Two-Constant Equation of State. Industrial & Engineering Chemistry Fundamentals, 15(1):59–64.
- Leal, A. M. M., Kulik, D. A., Smith, W. R., and Saar, M. O. (2017). An overview of computational methods for chemical equilibrium and kinetic calculations for geochemical and reactive transport modeling. Pure and Applied Chemistry, 89(5):597–643.
- White, W. B., Johnson, S. M., and Dantzig, G. B. (1958). Chemical Equilibrium in Complex Mixtures. J. Chem. Phys., 28:751.
- Smith, W. R. (1980). The Computation of Chemical Equilibria in Complex Systems. Industrial & Engineering Chemistry Fundamentals, 19(1):1–10.
- Smith, W. and Missen, R. (1982). Chemical reaction equilibrium analysis: theory and algorithms. Wiley-Interscience, New York.
- Alberty, R. A. (1992). Equilibrium calculations on systems of biochemical reactions at specified pH and pMg. Biophysical Chemistry, 42(2):117–131.
- Crerar, D. A. (1975). A method for computing multicomponent chemical equilibria based on equilibrium constants. Geochimica et Cosmochimica Acta, 39(10):1375–1384.
- de Capitani, C. and Brown, T. H. (1987). The computation of chemical equilibrium in complex systems containing non-ideal solutions. Geochimica et Cosmochimica Acta, 51(10):2639–2652.
- Eriksson, G. and Thompson, W. (1989). A procedure to estimate equilibrium concentrations in multicomponent systems and related applications. Calphad, 13(4):389–400.
- Ghiorso, M. S. (1994). Algorithms for the estimation of phase stability in heterogeneous thermodynamic systems. Geochimica et Cosmochimica Acta, 58(24):5489–5501.
- McBride, B. J. and Gordon, S. (1971). Computer program for calculation of complex chemical equilibrium compositions, rocket performance, incident and reflected shocks, and Chapman-Jouguet detonations. Technical report, NASA.
- McBride, B. J. and Gordon, S. (1994). Computer Program for Calculation of Complex Chemical Equilibrium Compositions and Applications: I. Analysis. Technical report, NASA.
- McBride, B. J. and Gordon, S. (1996). Computer Program for Calculation of Complex Chemical Equilibrium Compositions and Applications: II. User Manual and Program Description. Technical report, NASA.
- Harvey, J. P., Eriksson, G., Orban, D., and Chartrand, P. (2013). Global minimization of the Gibbs energy of multicomponent systems involving the presence of order/disorder phase transitions. American Journal of Science, 313(3):199–241.
- Harvie, C. E., Greenberg, J. P., and Weare, J. H. (1987). A chemical equilibrium algorithm for highly non-ideal multiphase systems: Free energy minimization. Geochimica et Cosmochimica Acta, 51(5):1045–1057.
- Karpov, I. K., Chudnenko, K. V., and Kulik, D. A. (1997). Modeling chemical mass transfer in geochemical processes: Thermodynamic relations, conditions of equilibria, and numerical algorithms. American Journal of Science, 297(8):767–806.
- Karpov, I. K., Chudnenko, K. V., Kulik, D. A., Avchenko, O. V., and Bychinski, V. A. (2001). Minimization of Gibbs free energy in geochemical systems by convex programming. Geochemistry International, 39(11):1108–1119.
- Karpov, I. K., Chudnenko, K. V., Kulik, D. A., and Bychinskii, V. A. (2002). The convex programming minimization of five thermodynamic potentials other than Gibbs energy in geochemical modeling. American Journal of Science, 302(4):281–311.
- Koukkari, P. and Pajarre, R. (2011). A Gibbs energy minimization method for constrained and partial equilibria. Pure and Applied Chemistry, 83(6):1243–1254.
- Kulik, D. A., Wagner, T., Dmytrieva, S. V., Kosakowski, G., Hingerl, F. F., Chudnenko, K. V., and Berner, U. R. (2013). GEM-Selektor geochemical modeling package: revised algorithm and GEMS3K numerical kernel for coupled simulation codes. Computational Geosciences, 17(1):1–24.
- Leal, A. M. M., Blunt, M. J., and LaForce, T. C. (2013). A robust and efficient numerical method for multiphase equilibrium calculations: Application to CO2-brine-rock systems at high temperatures, pressures and salinities. Advances in Water Resources, 62(Part C):409–430.
- Leal, A. M., Blunt, M. J., and LaForce, T. C. (2014). Efficient chemical equilibrium calculations for geochemical speciation and reactive transport modelling. Geochimica et Cosmochimica Acta, 131:301–322.
- Leal, A. M., Kulik, D. A., and Kosakowski, G. (2016a). Computational methods for reactive transport modeling: A Gibbs energy minimization approach for multiphase equilibrium calculations. Advances in Water Resources, 88:231–240.
- Leal, A. M., Kulik, D. A., Kosakowski, G., and Saar, M. O. (2016b). Computational methods for reactive transport modeling: An extended law of mass-action, xLMA, method for multiphase equilibrium calculations. Advances in Water Resources, 96:405–422.
- Morel, F. and Morgan, J. (1972). Numerical method for computing equilibriums in aqueous chemical systems. Environmental Science & Technology, 6(1):58–67.
- Néron, A., Lantagne, G., and Marcos, B. (2012). Computation of complex and constrained equilibria by minimization of the Gibbs free energy. Chemical Engineering Science, 82:260–271.
- Nordstrom, D., Plummer, L. N., Wigley, T., Wolery, T., Ball, J., Jenne, E., Bassett, R., Crerar, D., Florence, T., Fritz, B., Hoffman, M., Holdren, G., Jr., Lafon, G., Mattigod, S., McDuff, R., Morel, F., Reddy, M., and Sposito, G. (1979). A comparison of computerized chemical models for equilibrium calculations in aqueous systems. American Chemical Society Symposium Series, 93:857–892.
- Trangenstein, J. A. (1987). Customized minimization techniques for phase equilibrium computations in reservoir simulation. Chemical Engineering Science, 42(12):2847–2863.
- van Zeggeren, F. and Storey, S. H. (1970). The Computation of Chemical Equilibria. Cambridge University Press, London, England.
- Voňka, P. and Leitner, J. (1995). Calculation of chemical equilibria in heterogeneous multicomponent systems. Calphad, 19(1):25–36.
- Wolery, T. J. and Walters, L. J. (1975). Calculation of equilibrium distributions of chemical species in aqueous solutions by means of monotone sequences. Journal of the International Association for Mathematical Geology, 7(2):99–115.
- Zeleznik, F. J. and Gordon, S. (1960). An analytical investigation of the general methods of calculating chemical equilibrium compositions. Technical Note NASA-TN-D-473, NASA.
- Jatnieks, J., De Lucia, M., Dransch, D., and Sips, M. (2016). Data-driven Surrogate Model Approach for Improving the Performance of Reactive Transport Simulations. Energy Procedia, 97:447–453.
- Parkhurst, D. and Appelo, C. (2013). Description of input and examples for PHREEQC version 3—A computer program for speciation, batch-reaction, one-dimensional transport, and inverse geochemical calculations. In Groundwater Book 6, Modeling Techniques, chapter A43, page 497. U.S. Geological Survey Techniques and Methods.
- Leal, A. M., Kulik, D. A., and Saar, M. O. (2016). Enabling Gibbs energy minimization algorithms to use equilibrium constants of reactions in multiphase equilibrium calculations. Chemical Geology, 437:170–181.
- Helgeson, H. C. and Kirkham, D. H. (1974a). Theoretical prediction of the thermodynamic behavior of aqueous electrolytes at high pressures and temperatures: I. Summary of the thermodynamic/electrostatic properties of the solvent. American Journal of Science, 274(10):1089–1198.
- Helgeson, H. C. and Kirkham, D. H. (1974b). Theoretical prediction of the thermodynamic behavior of aqueous electrolytes at high pressures and temperatures: II. Debye-Hückel parameters for activity coefficients and relative partial molal properties. American Journal of Science, 274(10):1199–1261.
- Helgeson, H. C. and Kirkham, D. H. (1976). Theoretical prediction of the thermodynamic properties of aqueous electrolytes at high pressures and temperatures: III. Equation of state for aqueous species at infinite dilution. American Journal of Science, 276(2):97–240.
- Helgeson, H. C., Kirkham, D. H., and Flowers, G. C. (1981). Theoretical prediction of the thermodynamic behavior of aqueous electrolytes at high pressures and temperatures: IV. Calculation of activity coefficients, osmotic coefficients, and apparent molal and standard and relative partial molal properties to 600 °C. American Journal of Science, 281(10):1249–1516.
- Drummond, S. (1981). Boiling and mixing of hydrothermal fluids: chemical effects on mineral precipitation. PhD thesis, Pennsylvania State University.
- Helgeson, H. C., Delany, J. M., Nesbitt, H. W., and Bird, D. K. (1978). Summary and critique of the thermodynamic properties of rock-forming minerals. American Journal of Science, 278-A:1–229.
- Tanger, J. C. and Helgeson, H. C. (1988). Calculation of the thermodynamic and transport properties of aqueous species at high pressures and temperatures; revised equations of state for the standard partial molal properties of ions and electrolytes. American Journal of Science, 288(1):19–98.
- Shock, E. and Helgeson, H. C. (1988). Calculation of the thermodynamic and transport properties of aqueous species at high pressures and temperatures: Correlation algorithms for ionic species and equation of state predictions to 5 kb and 1000 °C. Geochimica et Cosmochimica Acta, 52(8):2009–2036.
- Shock, E. L., Oelkers, E. H., Johnson, J. W., Sverjensky, D. A., and Helgeson, H. C. (1992). Calculation of the thermodynamic properties of aqueous species at high pressures and temperatures. Effective electrostatic radii, dissociation constants and standard partial molal properties to 1000 °C and 5 kbar. Journal of the Chemical Society, Faraday Transactions, 88(6):803.
- Johnson, J. W., Oelkers, E. H., and Helgeson, H. C. (1992). SUPCRT92: A software package for calculating the standard molal thermodynamic properties of minerals, gases, aqueous species, and reactions from 1 to 5000 bar and 0 to 1000 °C. Computers & Geosciences, 18(7):899–947.
- Wagner, W. and Pruss, A. (2002). The IAPWS Formulation 1995 for the Thermodynamic Properties of Ordinary Water Substance for General and Scientific Use. Journal of Physical and Chemical Reference Data, 31(2):387.
- Nocedal, J. and Wright, S. J. (1999). Numerical Optimization. Springer.
- Fletcher, R. (2000). Practical Methods of Optimization. Wiley, 2nd edition.
Appendix A Chemical Equilibrium Equations
The solution of the Gibbs energy minimization problem in equation (4) needs to satisfy the following Karush–Kuhn–Tucker (KKT) conditions, or first-order optimality conditions, for a local minimum of the Gibbs energy function [56, 57]:
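For the Gibbs energy minimization problem min G(n) subject to An = b and n ≥ 0, these KKT conditions take the following form in common notation (a reconstruction, with n the species amounts, A the formula matrix, b the element amounts, μ the chemical potentials, and y, z the Lagrange multipliers; the original's exact symbols may differ):

```latex
\mu(n) - A^{\mathrm{T}} y - z = 0, \qquad
A n = b, \qquad
n \geq 0, \quad z \geq 0, \quad n_i z_i = 0 \quad (i = 1, \dots, N).
```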
where two Lagrange multiplier vectors are introduced that need to be solved for along with the species amounts. For more details about these Lagrange multipliers and their interpretation, as well as instructions on how to efficiently solve these equations, see Leal et al. [32, 11].
The previous chemical equilibrium equations can be written in an extended law of mass action (xLMA) formulation as:
following the use of the extended law of mass action equations:
associated with the M linearly independent chemical reactions among the N chemical species in equilibrium:
where the equilibrium constants of the reactions form a vector, with one entry per reaction, and the stoichiometric matrix of these chemical reactions collects the stoichiometric coefficients of the species, with the convention that a coefficient is positive if the corresponding species is a product in the reaction and negative if it is a reactant. Moreover, a vector of species stability factors needs to be solved for along with the species amounts. These factors are introduced to ensure that the extended law of mass action equations (26) remain valid even when some species in their corresponding reactions are unstable at equilibrium (i.e., when a species belongs to a phase that is absent from equilibrium). When all species are stable at equilibrium, the stability factors vanish and the xLMA equations reduce to the conventional LMA equations:
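In common notation, with K_m the equilibrium constant of the mth reaction, a_i the activity of the ith species, and ν_mi the stoichiometric coefficients, the conventional LMA equations read (a reconstruction; the original display equation is not reproduced here):

```latex
K_m = \prod_{i=1}^{N} a_i^{\nu_{mi}}, \qquad m = 1, \dots, M.
```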
The number of linearly independent chemical reactions among the N species in equilibrium is M = N − rank(A), where A denotes the formula matrix, and thus M = N − E whenever the formula matrix has full rank. For more information on how to solve these equations and how they are related to the conventional law of mass action equations, see Leal et al. [33, 11].
Appendix B Reactive Transport Equations
The fundamental mass conservation equations for both fluid and solid species are:
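In common notation, with c_{f,i} and c_{s,i} the bulk concentrations of the ith fluid and solid species, v the fluid pore velocity, D the diffusion coefficient, and q_{f,i}, q_{s,i} the reaction rates, these equations (numbered (29) and (30) in the original) can be reconstructed as:

```latex
\frac{\partial c_{f,i}}{\partial t}
  + \nabla \cdot \left(v\, c_{f,i}\right)
  - \nabla \cdot \left(D\, \nabla c_{f,i}\right) = q_{f,i},
  \qquad i = 1, \dots, N_f, \\[4pt]
\frac{\partial c_{s,i}}{\partial t} = q_{s,i},
  \qquad i = 1, \dots, N_s.
```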
where, respectively, the bulk concentrations of the ith fluid and solid species are given in mol/m³; the fluid pore velocity in m/s; the diffusion coefficient of the fluid species in m²/s; and the rates of production/consumption of the ith fluid and solid species, due to chemical reactions, in mol/(m³·s). The numbers of fluid and solid species are denoted N_f and N_s. Note that the above equations assume a single fluid phase and a common diffusion coefficient for all fluid species.
By partitioning the species into fluid and solid species, the formula matrix can be conveniently represented as:
where the two blocks are the formula matrices of the fluid and solid partitions (i.e., the matrices constructed from the columns of the formula matrix corresponding to fluid and solid species). The concentrations of elements in the fluid and solid partitions can then be calculated from the species concentrations in the same partition using:
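With A_f and A_s denoting these sub-matrices and c_f, c_s the species concentration vectors of the two partitions, the two relations above can be sketched in illustrative notation as:

```latex
A = \begin{bmatrix} A_f & A_s \end{bmatrix}, \qquad
b_f = A_f\, c_f, \qquad b_s = A_s\, c_s.
```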
Recall that the rates of production of the species are exclusively due to chemical reactions. Consider the rate of production/consumption of each species in the system as a whole, and not in a fluid/solid partition. From the mass conservation condition for the elements (chemical elements and electrical charge), it follows that:
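With q_f and q_s the reaction rate vectors of the fluid and solid partitions, this condition can be written in illustrative notation as:

```latex
A_f\, q_f + A_s\, q_s = 0.
```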
which is the mathematical statement of the fact that elements are neither created nor destroyed during chemical reactions. We can combine this result with equations (29) and (30) to derive the following conservation equations for the elements:
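Multiplying the species conservation equations by the partition formula matrices and adding them, using the condition that reactions conserve elements, yields element conservation equations of the following form (a reconstruction in illustrative notation, with b_f and b_s the element concentrations in the fluid and solid partitions):

```latex
\frac{\partial \left(b_f + b_s\right)}{\partial t}
  + \nabla \cdot \left(v\, b_f\right)
  - \nabla \cdot \left(D\, \nabla b_f\right) = 0.
```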
Assume that all species, fluid and solid, are in local chemical equilibrium everywhere, at all times. One can then perform operator splitting steps to solve the fundamental mass conservation equations (29) and (30) and calculate the concentrations of the species over time. Let k denote the current time step and Δt the time step length used in the discretization of the time derivative terms. The operator splitting steps at the kth time step are:
1. update the concentrations of the elements in the fluid partition by solving the discretized transport equations, in which the concentration of each element in the fluid partition at time step k is known and its concentration at the next time step is the unknown; note that an implicit scheme is assumed for both advection and diffusion rates;
2. update the total concentrations of the elements (fluid plus solid partitions);
3. calculate the concentrations of the species in each mesh cell, using a conventional or the proposed smart chemical equilibrium algorithm, with the local temperature and pressure values and the updated local concentrations of elements as inputs.
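The operator-splitting steps above can be sketched as follows, for a deliberately simplified 1D mesh with a single element. All names are illustrative: `equilibrate` is a placeholder for either the conventional or the smart equilibrium solver, and explicit upwind advection replaces the implicit scheme assumed in the text purely for brevity:

```python
def reactive_transport_step(bf, bs, v, dx, dt, equilibrate):
    """One operator-splitting step (illustrative sketch only):
    bf, bs -- per-cell element amounts in the fluid and solid partitions;
    v, dx, dt -- pore velocity, cell size, and time step length;
    equilibrate -- solver returning updated (fluid, solid) element amounts."""
    ncells = len(bf)
    # Step 1: transport the element amounts in the fluid partition
    # (explicit upwind for v >= 0; the inflow cell j = 0 is held fixed).
    bf_new = bf[:]
    for j in range(1, ncells):
        bf_new[j] = bf[j] - v * dt / dx * (bf[j] - bf[j - 1])
    # Step 2: update the total element amounts per cell.
    b_total = [bfj + bsj for bfj, bsj in zip(bf_new, bs)]
    # Step 3: re-equilibrate each cell, splitting the totals back into
    # fluid and solid partitions.
    return [equilibrate(btj) for btj in b_total]
```

In a real simulation the transport step would solve the implicit linear systems of Appendix B, and `equilibrate` would be the place where the smart prediction replaces the Newton-based solve whenever its acceptance criterion is met.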