Physics-based models of dynamical systems are widely used to study engineering and environmental systems. Despite their extensive use, these models have well-known limitations that stem from incomplete or inaccurate representations of the physical processes being modeled. Given the rapid growth of data due to advances in sensor technologies, there is a tremendous opportunity to systematically advance modeling in these domains by using machine learning (ML) methods. However, direct application of black-box ML models to a scientific problem encounters several major challenges. First, in the absence of adequate information about the physical mechanisms of real-world processes, ML approaches are prone to false discoveries and can exhibit serious inconsistencies with known physics. This is because scientific problems often involve complex hypothesis spaces with non-stationary relationships among the variables that are difficult to capture from the data alone. Second, black-box ML models suffer from poor interpretability, since they are not explicitly designed to represent physical relationships or to provide mechanistic insights. Third, the data available for several scientific problems are far smaller than what is needed to effectively train advanced ML models. Leveraging physics will be key to constraining hypothesis spaces so that ML can be applied in such small-sample regimes. Hence, neither an ML-only nor a physics-only approach can be considered sufficient for knowledge discovery in complex scientific and engineering applications. Instead, there is a need to explore the continuum between physics-based and ML models, where both physics and data are integrated in a synergistic manner. Next, we outline the issues involved in building such a hybrid model, an approach that is already beginning to show great promise.
In science and engineering applications, a physical model often predicts values of many different variables. Machine learning models can also generate predictions for many different variables (e.g., by having multiple nodes in the output layer of a neural network). Most ML algorithms use a loss function that captures the difference between predicted and actual (i.e., observed) values to guide the search for parameter values that minimize this loss. Although such empirical models are used in many scientific communities as alternatives to physical models, they fail to take into account many physical aspects of modeling. In the following, we list some of these.
In science and engineering applications, not all errors (i.e., differences between predicted and observed values) are equally important. For example, in the lake temperature monitoring application, accuracy at the surface and at great depth can be more important than accuracy at the middle levels of the lake.
Instead of minimizing the difference between predicted and observed values, it may be more important to optimize the prediction of a different physical quantity that can be computed from the observed or predicted values. For example, for certain lake temperature monitoring applications, the ability to correctly predict the depth of the thermocline (i.e., the depth at which the temperature gradient is maximum) can be more important than correctly predicting the temperature profile at all depths.
Values of different variables predicted by a science and engineering model may have certain relationships (guided by physical laws) across space and time. For example, in the lake temperature monitoring application, the predicted temperatures at different depths should be such that denser water lies deeper (note that water is densest at about 4 °C). As another example, changes in the temperature profile over time involve the transfer of energy and mass across different layers of a lake, and these quantities must be conserved according to physical laws.
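As a rough illustration of the first two points above, the sketch below defines a depth-weighted error and a thermocline-depth error in plain Python. The function names, the weighting scheme, and the convention of reporting the midpoint of the layer with the steepest gradient are assumptions made for this example; they are not part of the framework described in this paper.

```python
import math


def weighted_rmse(predicted, observed, weights):
    """Root-mean-square error in which each depth has its own weight.

    Larger weights (e.g., at the surface and the bottom) make errors at
    those depths count more. The weights are a modeling choice.
    """
    num = sum(w * (p - o) ** 2 for p, o, w in zip(predicted, observed, weights))
    return math.sqrt(num / sum(weights))


def thermocline_depth(depths, temps):
    """Depth at which the vertical temperature gradient is steepest.

    depths: strictly increasing sample depths (surface first).
    temps:  temperatures at those depths, same length.
    The gradient between adjacent samples is assigned to the layer midpoint.
    """
    if len(depths) != len(temps) or len(depths) < 2:
        raise ValueError("need at least two matching depth/temperature samples")
    best_grad, best_depth = -1.0, None
    for i in range(len(depths) - 1):
        dz = depths[i + 1] - depths[i]
        grad = abs(temps[i + 1] - temps[i]) / dz
        if grad > best_grad:
            best_grad = grad
            best_depth = 0.5 * (depths[i] + depths[i + 1])
    return best_depth


def thermocline_error(depths, predicted, observed):
    """Absolute difference between predicted and observed thermocline depths."""
    return abs(thermocline_depth(depths, predicted)
               - thermocline_depth(depths, observed))
```

A profile can have a small pointwise error yet place the steepest gradient in the wrong layer, which is why a derived quantity such as `thermocline_error` may be the more relevant objective in some applications.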
In this paper, we propose a novel framework, Physics-Guided Recurrent Neural Networks (PGRNN), that can incorporate many of these physical aspects through non-standard loss functions and new architectures. We motivate and illustrate these ideas in the context of monitoring temperature and water quality in lakes, but they are applicable to a broad range of science and engineering problems.
II Physics-Guided Recurrent Neural Networks
II-A Long Short-Term Memory
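We first briefly describe the structure of the Long Short-Term Memory (LSTM) model. Given the input $x^t$ at every time step $t$, the LSTM model generates a hidden representation/embedding $h^t$ at every time step, which is then used for prediction. In essence, the LSTM model defines a transition relationship for the hidden representation $h^t$ through an LSTM cell, which takes as input the features $x^t$ at the current time step together with the information inherited from previous time steps.
Each LSTM cell contains a cell state $c^t$, which serves as a memory and allows the hidden units to retain information from the past. The cell state $c^t$ is generated by combining $c^{t-1}$, $h^{t-1}$, and the input features $x^t$ at time $t$. Hence, the transition of the cell state over time forms a memory flow, which enables the modeling of long-term dependencies. Specifically, we first generate a new candidate cell state $\tilde{c}^t$ by combining $x^t$ and $h^{t-1}$ in a $\tanh(\cdot)$ function, as follows:
$$\tilde{c}^t = \tanh\left(W_h^c h^{t-1} + W_x^c x^t\right), \tag{1}$$
where $W_h^c$ and $W_x^c$ denote the weight parameters used to generate the candidate cell state. Hereinafter we omit the bias terms, as they can be absorbed into the weight matrices. Then we generate a forget gate layer $f^t$, an input gate layer $g^t$, and an output gate layer $o^t$, as:
$$\begin{aligned}
f^t &= \sigma\left(W_h^f h^{t-1} + W_x^f x^t\right),\\
g^t &= \sigma\left(W_h^g h^{t-1} + W_x^g x^t\right),\\
o^t &= \sigma\left(W_h^o h^{t-1} + W_x^o x^t\right),
\end{aligned} \tag{2}$$
where $\sigma(\cdot)$ denotes the sigmoid function. Then we compute the new cell state and the hidden representation as:
$$\begin{aligned}
c^t &= f^t \otimes c^{t-1} + g^t \otimes \tilde{c}^t,\\
h^t &= o^t \otimes \tanh\left(c^t\right),
\end{aligned} \tag{3}$$
where $\otimes$ denotes the entry-wise product.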
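The transition described above can be sketched as a single function that maps the current input, the previous hidden representation, and the previous cell state to their updated values. This is a minimal sketch assuming the standard LSTM formulation (a tanh candidate state, sigmoid gates, entry-wise products, no bias terms); the names `lstm_step` and `params` and the list-of-rows weight layout are hypothetical, and plain Python is used only to keep the example self-contained.

```python
import math


def _sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def _affine(W_h, W_x, h_prev, x):
    """Compute W_h @ h_prev + W_x @ x for matrices stored as lists of rows."""
    return [
        sum(w * h for w, h in zip(row_h, h_prev))
        + sum(w * xi for w, xi in zip(row_x, x))
        for row_h, row_x in zip(W_h, W_x)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM transition with bias terms omitted.

    params maps each of 'c', 'f', 'g', 'o' to a pair (W_h, W_x):
    candidate state, forget gate, input gate, and output gate weights.
    Returns the new hidden representation and the new cell state.
    """
    # Candidate cell state from the previous hidden state and current input.
    c_tilde = [math.tanh(v) for v in _affine(*params["c"], h_prev, x)]
    # Gates take values in (0, 1) and control what is kept, added, and exposed.
    f = [_sigmoid(v) for v in _affine(*params["f"], h_prev, x)]
    g = [_sigmoid(v) for v in _affine(*params["g"], h_prev, x)]
    o = [_sigmoid(v) for v in _affine(*params["o"], h_prev, x)]
    # New cell state: keep part of the old memory, add part of the candidate.
    c = [fi * cp + gi * ct for fi, cp, gi, ct in zip(f, c_prev, g, c_tilde)]
    # New hidden representation: gated, squashed view of the cell state.
    h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h, c
```

Calling `lstm_step` repeatedly over a sequence, feeding each returned `h` and `c` into the next call, yields one hidden representation per time step.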
II-B Hybrid-physics-data Model
We then construct a hybrid model in two steps. First, we propose to integrate the predicted outputs $Y_{Phy}$ from physics-based models as input to the LSTM model. If the goal of the traditional data science model is to learn a mapping $f : X \rightarrow Y$, the hybrid model can be represented as $f : [X, Y_{Phy}] \rightarrow Y$, where $Y_{Phy}$ is the output from the physics-based model. In physics-based models, the use of $Y_{Phy}$ by itself may provide an incomplete representation of the target variable due to simplified or missing physics. By including $Y_{Phy}$ as part of the input for the data science model, we aim to fill in the gap between $Y_{Phy}$ and true observations while maintaining the physical knowledge in $Y_{Phy}$.
Second, we use $Y_{Phy}$ to refine the training loss for the time steps with missing observations. The effective learning of LSTM requires frequently collected data. However, real-world observations can be missing or noisy on certain dates. Therefore, the use of $Y_{Phy}$ on those missing dates can provide a complete temporal trajectory for training the LSTM.
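The two steps can be sketched as simple data-preparation helpers. This is one plausible reading of the description above, not a definitive implementation: the function names are hypothetical, missing observations are assumed to be encoded as `None`, and "refining the training loss" is interpreted here as substituting the physics-based output for the missing target.

```python
def build_hybrid_inputs(drivers, y_phy):
    """Append the physics-based prediction to each time step's input vector.

    drivers: list of per-time-step feature lists (the original inputs).
    y_phy:   list of physics-based model outputs, one per time step.
    """
    if len(drivers) != len(y_phy):
        raise ValueError("drivers and y_phy must cover the same time steps")
    return [list(x_t) + [y_t] for x_t, y_t in zip(drivers, y_phy)]


def fill_missing_targets(y_obs, y_phy):
    """Use the physics-based output as the target where no observation exists.

    y_obs: observed values, with None marking a missing observation (assumed
           encoding). The result is a complete trajectory for training.
    """
    if len(y_obs) != len(y_phy):
        raise ValueError("y_obs and y_phy must cover the same time steps")
    return [p if o is None else o for o, p in zip(y_obs, y_phy)]
```

In practice one might also down-weight the loss on filled-in time steps, since the physics-based output is itself only an approximation of the true value; that choice is left open here.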
II-C Physical Constraints
Having described the hybrid model, we now add constraints to the training of this model so that the predictions are physically consistent. To illustrate this, we consider the example of lake temperature monitoring. We introduce two constraints, one along the depth dimension and one along the time dimension.
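Density-depth relationship: It is known that the density of water monotonically increases with depth. Also, the temperature, $T$, and density, $\rho$, of water are related to each other according to the following known physical equation:
$$\rho = 1000 \times \left(1 - \frac{(T + 288.9414)\,(T - 3.9863)^2}{508929.2\,(T + 68.12963)}\right) \tag{4}$$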
We first transform the predicted temperature values into density values according to Eq. 4. Then, we add an extra penalty for violations of the density-depth relationship.
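A minimal sketch of this penalty follows. The temperature-to-density constants are taken from the empirical relation used in the physics-guided neural network literature for lake modeling and should be treated as an assumption here, as should the function names and the choice of averaging the positive part of each violation over adjacent depth pairs.

```python
def water_density(temp_c):
    """Approximate density of water (kg/m^3) at temperature temp_c (deg C).

    Empirical relation with its maximum near 4 deg C; constants are assumed.
    """
    return 1000.0 * (
        1.0
        - (temp_c + 288.9414) * (temp_c - 3.9863) ** 2
        / (508929.2 * (temp_c + 68.12963))
    )


def density_depth_penalty(temps_by_depth):
    """Mean violation of 'density must not decrease with depth'.

    temps_by_depth: predicted temperatures ordered from the surface downward.
    For each adjacent pair, a positive value of rho[shallow] - rho[deep]
    is a physical inconsistency; consistent pairs contribute zero.
    """
    rho = [water_density(t) for t in temps_by_depth]
    violations = [max(0.0, rho[i] - rho[i + 1]) for i in range(len(rho) - 1)]
    return sum(violations) / len(violations) if violations else 0.0
```

Added to the usual prediction error with a weighting coefficient, a term of this kind is zero for physically consistent profiles and grows with the size of each density inversion, so it needs no observed temperatures and can also be evaluated on unlabeled time steps.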
In Table II, we report some preliminary results. Our dataset comprises 13,543 observations from 30 April 1980 to 2 November 2015. We use two-thirds of the data for training and test on the remaining third. For each observation, we use a set of 11 meteorological drivers as input variables, including wind speed, rain, freezing conditions, long-wave and short-wave radiation, etc.
It can be seen that PGRNN (with the density-depth constraint) outperforms PGRNN0 (without the density-depth constraint) in terms of both RMSE and Phy-inconsistency. Also, the comparison between RNN (LSTM networks) and ANN shows that modeling the temporal transition helps to better capture the temperature change over time. From a temporal perspective, we observe from Fig. 2 that PGRNN better captures the changes at certain depths where the physics-based model and the traditional RNN cannot achieve reasonable accuracy.
Energy conservation: The temperature change in lake water is caused by the energy flow over time. The lake energy budget is a balance between incoming energy fluxes and heat losses from the lake. A mismatch between losses and gains results in a temperature change: more gains than losses will warm the lake, and more losses than gains will cool the lake.
Given the temporal modeling structure of the LSTM model, we add a constraint on the predicted temperature over time such that the change in volume-averaged temperature is consistent with the energy gain or loss.
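The text does not give the exact form of this constraint, so the following is only a hedged sketch of how such a penalty could be written. It assumes that the lake is discretized into layers of known volume, that the net energy flux per unit surface area is available for each time interval, and that stored heat is proportional to the volume-averaged temperature. All names and default values (`layer_volumes`, `net_flux`, `surface_area`, `dt`, `rho`, `c_w`, `tolerance`) are hypothetical.

```python
def volume_average_temperature(temps, layer_volumes):
    """Volume-weighted mean temperature of one predicted depth profile."""
    total = sum(layer_volumes)
    return sum(t * v for t, v in zip(temps, layer_volumes)) / total


def energy_conservation_penalty(temp_profiles, layer_volumes, net_flux,
                                surface_area, dt=86400.0, rho=1000.0,
                                c_w=4186.0, tolerance=0.0):
    """Mean mismatch between the predicted heat change and the net energy flux.

    temp_profiles: one list of per-layer temperatures per time step.
    net_flux[t]:   incoming minus outgoing flux (W/m^2) between steps t and t+1.
    dt:            seconds between consecutive time steps (assumed daily).
    rho, c_w:      assumed water density (kg/m^3) and specific heat (J/kg/K).
    Mismatches smaller than `tolerance` are not penalized.
    """
    volume = sum(layer_volumes)
    avg = [volume_average_temperature(p, layer_volumes) for p in temp_profiles]
    penalties = []
    for t in range(len(avg) - 1):
        # Rate of change of stored heat per unit surface area (W/m^2).
        d_energy = rho * c_w * volume * (avg[t + 1] - avg[t]) / (surface_area * dt)
        penalties.append(max(0.0, abs(d_energy - net_flux[t]) - tolerance))
    return sum(penalties) / len(penalties) if penalties else 0.0
```

The tolerance parameter reflects the fact that flux estimates are themselves uncertain; penalizing only mismatches beyond a threshold avoids forcing the network to reproduce noise in the energy budget.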
In Fig. 3, we show the thermocline depth detected in Lake Mendota. It can be seen that the detected thermocline position evolves more smoothly over time after we integrate the energy conservation constraint.
- A. Karpatne, W. Watkins, J. Read, and V. Kumar, “Physics-guided neural networks (PGNN): An application in lake temperature modeling,” arXiv preprint arXiv:1710.11431, 2017.
- J. L. Martin and S. C. McCutcheon, Hydrodynamics and Transport for Water Quality Modeling. CRC Press, 2018.