1. Introduction
Physics-based models have been widely used to study engineering and environmental systems in domains such as hydrology, climate science, materials science, agriculture, and computational chemistry. Despite their extensive use, these models have several well-known limitations due to simplified representations of the physical processes being modeled or challenges in selecting appropriate parameters. There is a tremendous opportunity to systematically advance modeling in these domains by using machine learning (ML) methods. However, capturing this opportunity is contingent on a paradigm shift in data-intensive scientific discovery, since the "black box" use of ML often leads to serious false discoveries in scientific applications (Lazer and others, 2014; Karpatne et al., 2017a). In this paper, we present a novel methodology for combining physics-based models with state-of-the-art deep learning methods to leverage their complementary strengths.
Even though physics-based models are based on known physical laws that govern relationships between input and output variables, the majority of physics-based models are necessarily approximations of reality due to incomplete knowledge of certain processes, which introduces bias. In addition, they often contain a large number of parameters whose values must be estimated with the help of limited observed data. A standard approach for calibrating these parameters is to exhaustively search the space of parameter combinations and choose the combination that results in the best performance on training data. Besides its computational cost, this approach is also prone to overfitting due to heterogeneity in the underlying processes in both space and time. The limitations of physics-based models cut across discipline boundaries and are well known in the scientific community; e.g., see a series of debate papers in hydrology (Lall, 2014; Gupta and others, 2014; McDonnell and Beven, 2014).

ML models, given their tremendous success in several commercial applications (e.g., computer vision and natural language processing), are increasingly being considered as promising alternatives to physics-based models by the scientific community. State-of-the-art (SOA) ML models (e.g., Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) networks), given enough data, can often perform better than traditional empirical models (e.g., regression-based models) used by science communities as an alternative to physics-based models
(Graham-Rowe et al., 2008; Goh et al., 2017). However, direct application of black-box ML models to a scientific problem encounters three major challenges:

1. They require a lot of training data, which is scarce in most practical settings.

2. Empirical models (including the SOA ML models) simply identify statistical relations between inputs and the system variables of interest (e.g., the temperature profile of a lake) without taking into account any physical laws (e.g., conservation of energy or mass), and thus can produce results that are inconsistent with physical laws.

3. Relationships produced by empirical models are at best valid only for the set of variable combinations present in the training data and are unable to generalize to scenarios unseen in the training data. For example, an ML model trained for today's climate may not be accurate for future warmer climate scenarios.

The goal of this work is to improve the modeling of engineering and environmental systems. Effective representation of physical processes in such systems will require the development of novel abstractions and architectures. In addition, the optimization process used to produce an ML model will have to consider not just accuracy (i.e., how well the output matches the observations) but also its ability to provide physically consistent results. In particular, we present Physics-Guided Recurrent Neural Network models (PGRNN) as a general framework for modeling physical phenomena, with potential applications for many disciplines. The PGRNN model has a number of novel aspects:
1. Many temporal processes in environmental/engineering systems involve complex long-term temporal dependencies that cannot be captured by a plain neural network or a simple temporal model such as a standard RNN. In contrast, PGRNN uses advanced ML models such as LSTM, which have the potential to mimic complex temporal processes in physical systems.
2. The proposed PGRNN can incorporate explicit physical laws such as energy conservation or mass conservation. This is done by introducing additional variables in the recurrent structure to keep track of physical states that can be used to check for consistency with physical laws. In addition, we generalize the loss function to include a physics-based penalty (Karpatne et al., 2017a). Thus, the overall training loss is

Loss = L_supervised + λ · L_physics,

where the first term on the right-hand side represents the supervised training loss between the predicted outputs and the observed outputs (e.g., RMSE in regression or cross-entropy in classification), and the second term represents the physical-consistency-based penalty. In addition to favoring physically consistent solutions, another major benefit of including a physics-based penalty in the loss function is that it can be applied even to instances for which output (observed) data is not available, since the physics-based penalty can be computed as long as input (driver) data is available. Note that in the absence of the physics-based penalty, the training loss can be computed only on those time steps where observed output is available. Inclusion of the physics-based loss term thus allows much more robust training, especially in situations where observed output is available for only a small number of time steps.

3. Physics-based/mechanistic models contain a lot of domain knowledge that goes well beyond what can be captured by constraints such as conservation laws. To leverage this knowledge, we generate a large amount of "synthetic" observation data by executing physics-based models for a variety of input drivers (which are easily available) and use these synthetic observations to pre-train the ML model. The idea here is that training on synthetic data generated by imperfect physical models may allow the ML model to get close enough to the target solution that only a small amount of observed data (ground-truth labels) is needed to further refine the model. In addition, the synthetic data is guaranteed to be physically consistent because the process model is founded on physical principles.
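For concreteness, the combined objective can be sketched as follows. This is an illustrative sketch, not the authors' code: the RMSE term and the sub-freezing-temperature penalty are stand-ins (the actual physics penalty used later is based on energy conservation), and `lam` plays the role of the trade-off weight λ.

```python
import numpy as np

def supervised_loss(y_pred, y_obs, mask):
    """RMSE computed only over entries where observations exist (mask == 1)."""
    sq_err = (y_pred - y_obs) ** 2 * mask
    return np.sqrt(sq_err.sum() / mask.sum())

def physics_penalty(y_pred):
    """Illustrative physics penalty: it needs only predictions (and drivers),
    not labels -- here it penalizes sub-freezing water temperatures."""
    return np.maximum(0.0, -y_pred).mean()

def total_loss(y_pred, y_obs, mask, lam=0.1):
    # overall loss = supervised loss + lam * physics-based penalty
    return supervised_loss(y_pred, y_obs, mask) + lam * physics_penalty(y_pred)
```

Note that `physics_penalty` is evaluated over all time steps, including those with no observed label, which is exactly why the physics term enables training on unlabeled periods.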
Our proposed Physics-Guided Recurrent Neural Network model (PGRNN) is developed for the purpose of predicting lake water temperatures at various depths at the daily scale. The temperature of water in a lake is known to be an ecological "master factor" (Magnuson and others, 1979) that controls the growth, survival, and reproduction of fish (Roberts and others, 2013). Warming water temperatures can increase the occurrence of aquatic invasive species (Rahel and Olden, 2008; Roberts and others, 2017), which may displace fish and native aquatic organisms, and result in more harmful algal blooms (HABs) (Harris and Graham, 2017; Paerl and Huisman, 2008). Understanding temperature change and the resulting biotic "winners and losers" is timely science that can also be directly applied to inform priority action for natural resources. Given the importance of this problem, the aquatic science community has developed numerous models for the simulation of temperature, including the General Lake Model (GLM) (Hipsey et al., 2019), which simulates the physical processes (e.g., vertical mixing, and the warming or cooling of water via energy lost or gained from fluxes such as solar radiation and evaporation). As is typical for any such model, GLM is only an approximation of the physical reality, and has a number of parameters (e.g., water clarity, mixing efficiency, and wind sheltering) that often need to be calibrated using observations.
We evaluate the proposed PGRNN method on a real-world system, Lake Mendota (Wisconsin), which is one of the most extensively studied lake systems in the world. We chose this lake because it has plenty of observed data that can be used to evaluate the performance of any new approach. In particular, we can measure the performance of different algorithms by varying the amount of observations used for training. This helps test the effectiveness of the proposed methods in data-scarce scenarios, which is important since most real-world lakes have very few observations or are not observed at all (they usually have less than 1% of the observations that are available for Mendota). In addition, Lake Mendota is large and deep enough that it shows a variety of temperature patterns (e.g., stratified temperature patterns in warmer seasons and well-mixed patterns in colder seasons). This allows us to test the capacity of ML models in capturing such complex temperature patterns.
Our main contributions are as follows. We show that it is possible to effectively model the temporal dynamics of temperature in lakes using LSTMs, provided that enough observed data is available for training. We show that traditional LSTMs can be augmented to take energy conservation into account and track the balance of energy loss and gain relative to temperature change (a physical law of thermodynamics). Including such components in models to make the output consistent with physical laws can make them more acceptable for use by scientists and may also improve prediction performance. We also study the benefit of pre-training this model using synthetic data (i.e., the output of an uncalibrated physics-based model) and then refining it using only a small amount of observed data. The results show that such pre-trained models can easily outperform the state-of-the-art physics-based model by using a small amount of observed data. Moreover, we show that such pre-training is useful even if it uses simulated data from lakes that are very different in geometry, clarity, or climate than the lake being studied. These results confirm that the PGRNN can leverage the strengths of physics-based models while also filling in knowledge gaps by overlaying features learned from data.
The organization of the paper is as follows. In Section 2, we describe the preliminary knowledge and the setting of our problem. Section 3 presents the proposed PGRNN model. In Section 4, we extensively evaluate the proposed method on a real-world dataset. We then recapitulate related existing work in Section 5 before concluding in Section 6. A preliminary version of this work appeared in (Jia et al., 2019).
2. Preliminaries
2.1. Problem Formulation
Our goal is to simulate the temperature of water in the lake at each depth d and on each date t, given physical variables governing the dynamics of lake temperature. This problem is referred to as 1D modeling of temperature (depth being the single dimension). Specifically, x_{d,t} represents the input physical variables at depth d on a specific date t, which include meteorological recordings at the surface of the water such as the amount of solar radiation (in W/m², for shortwave and longwave), wind speed (in m/s), air temperature (in °C), relative humidity (0-100%), rain (in cm), a snow indicator (True or False), as well as the value of depth (in m) and the day of year (1-366). These chosen features are known to be the primary drivers of lake thermodynamics (Hipsey et al., 2019). Given these input drivers and a depth level d, we aim to predict the water temperature y_{d,t} at this depth over the entire study period. For simplicity, we use x and y to represent x_{d,t} and y_{d,t} in the paper when it causes no ambiguity. During the training process, we are given sparse ground-truth observed temperature profiles on certain dates and at certain depths, captured by in-water sensors (more dataset description is provided in Section 4.1).
2.2. General Lake Model (GLM)
The physics-based GLM captures a variety of physical processes governing the dynamics of water temperature in a lake, including the heating of the water surface due to incoming shortwave radiation, the attenuation of radiation beneath the water surface, the mixing of layers with varying thermal energy at different depths, and the loss of heat from the surface of the lake via evaporation or outgoing longwave radiation (shown in Fig. 1). We use GLM as our preferred physics-based model for lake temperature modeling due to its model performance and wide use among the lake modeling community.

The GLM has a number of parameters (e.g., parameters related to vertical mixing, wind sheltering, and water clarity) that are often calibrated specifically to individual lakes if training data are available. The basic calibration method (common to a wide range of scientific and engineering problems) is to run the model for combinations of parameter values and select the parameter set that minimizes model error. This calibration process can be both labor- and computationally intensive. Furthermore, the calibration process, even when applied in the presence of ample training data, is still limited by simplifications and rigid formulations in these physics-based models.
2.3. Sequential Machine Learning Model
Sequential machine learning models for dynamical systems aim to learn a black-box transformation from the input series x_{1:t} to the target variables y_{1:t}. In this work, we train a single sequential model that applies to different depths. This allows utilizing observation data from different depths in the training process. Later, in Section 4, we will show that the model trained using the data from all depths can still capture the temporal dynamics at each depth separately very well.
We also use the area-depth profile as additional information to compute energy constraints (see Section 3.2). Since we train machine learning models that are specific to a target lake, the area-depth profile remains the same on different days and thus we do not include it in the input features.
3. Method
In this section, we will discuss the proposed PGRNN model in detail. First, we describe how to train an LSTM to model temperature dynamics using sparse observed data. Second, we describe how to combine the energy conservation law and the standard recurrent neural networks model. Then, we further utilize a pretraining method to improve the learning performance even with limited training data.
3.1. Recurrent Neural Networks and Long Short-Term Memory Networks
Recent advances in deep learning models enable automatic extraction of representative patterns from multivariate temporal input data to better predict the target variable. As one of the most popular temporal deep learning models, the RNN has shown success in a broad range of applications. The power of the RNN model lies in its ability to combine the input data at the current and previous time steps to extract an informative hidden representation h^t. In an RNN, the hidden representation is generated using the following equation:

(1)  h^t = tanh(W_h h^{t-1} + W_x x^t)

where W_h and W_x represent the weight matrices that connect h^{t-1} and x^t, respectively. Here the bias terms are omitted, as they can be absorbed into the weight matrices.
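As a minimal sketch (with biases absorbed as in the text, and names of our choosing), the transition of Eq. 1 is:

```python
import numpy as np

def rnn_step(h_prev, x_t, W_h, W_x):
    # Eq. (1): h_t = tanh(W_h h_{t-1} + W_x x_t), bias omitted
    return np.tanh(W_h @ h_prev + W_x @ x_t)
```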
While RNN models can model transitions across time, they gradually lose connections to long histories as time progresses (Bengio et al., 1994). Therefore, the RNN-based method may fail to grasp long-term patterns that are common in scientific applications. For example, the seasonal and yearly patterns that commonly exist in environmental systems can last for many time steps if we use data at a daily scale. The standard RNN fails to memorize long-term temporal patterns because it does not explicitly generate a long-term memory to store previous information, but only captures the transition patterns between consecutive time steps. It is well known (Chen and Billings, 1992; Pan and Duraisamy, 2018) that this memory issue is a major difficulty in the study of dynamical systems.
As an extended version of the RNN, the LSTM is better at modeling long-term dependencies in which each time step needs more contextual information from the past. The difference between the LSTM and the RNN lies in the generation of the hidden representation h^t. In essence, the LSTM model defines a transition relationship for the hidden representation through an LSTM cell. Each LSTM cell contains a cell state c^t, which serves as a memory and forces the hidden variables h^t to preserve information from the past.
Specifically, the LSTM first generates a candidate cell state c̄^t by combining x^t and h^{t-1}, as:

(2)  c̄^t = tanh(W_c h^{t-1} + U_c x^t)

The forget gate f^t = σ(W_f h^{t-1} + U_f x^t) is used to filter the information inherited from c^{t-1}, and the input gate g^t = σ(W_g h^{t-1} + U_g x^t) is used to filter the candidate cell state at time t. Then we compute the new cell state c^t and the hidden representation h^t as:

(4)  c^t = f^t ⊗ c^{t-1} + g^t ⊗ c̄^t,   h^t = o^t ⊗ tanh(c^t)

where o^t = σ(W_o h^{t-1} + U_o x^t) is the output gate and ⊗ denotes the entry-wise product.
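The cell update above can be written out directly. The gate names and the dictionary layout below are our own choices, and biases are omitted as in the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x_t, W, U):
    """One LSTM transition. W[k]/U[k] are the recurrent/input weight
    matrices for candidate 'c', forget 'f', input 'g', and output 'o'."""
    c_cand = np.tanh(W['c'] @ h_prev + U['c'] @ x_t)  # candidate cell state
    f = sigmoid(W['f'] @ h_prev + U['f'] @ x_t)       # forget gate
    g = sigmoid(W['g'] @ h_prev + U['g'] @ x_t)       # input gate
    o = sigmoid(W['o'] @ h_prev + U['o'] @ x_t)       # output gate
    c = f * c_prev + g * c_cand                       # new cell state (memory)
    h = o * np.tanh(c)                                # new hidden representation
    return h, c
```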
As we wish to conduct regression for continuous values, we generate the predicted temperature at each time step via a linear combination of the hidden units, as:

(5)  ŷ^t = W_y h^t + b_y
We apply the LSTM model to each depth separately to generate predictions ŷ_{d,t} for every depth d and every date t. Then, given the true observations y_{d,t} for the dates and depths where sparse observed data is available, i.e., (d,t) ∈ T_obs, our training loss is defined as:

(6)  L_sup = (1 / |T_obs|) Σ_{(d,t) ∈ T_obs} (y_{d,t} − ŷ_{d,t})²
It is noteworthy that even though the training loss is only defined on the time steps where observed data is available, the transition modeling (Eqs. 2-5) can be applied to all time steps. Hence, the time steps without observed data can still contribute to learning temporal patterns through their input drivers.
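The readout and the sparse-observation loss can be sketched as follows; `step_fn` stands for any transition such as the LSTM step above, and all names are illustrative:

```python
import numpy as np

def predict_series(xs, W_y, b_y, h0, step_fn):
    """Unroll a recurrent transition over every time step and read out a
    scalar temperature per step via a linear layer (as in Eq. 5)."""
    h, ys = h0, []
    for x_t in xs:
        h = step_fn(h, x_t)
        ys.append(float(W_y @ h + b_y))
    return np.array(ys)

def training_loss(y_pred, y_obs, observed):
    """Squared error averaged over observed entries only (as in Eq. 6);
    steps without observations still shape the hidden states via drivers."""
    m = observed.astype(float)
    return (((y_pred - y_obs) ** 2) * m).sum() / m.sum()
```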
3.2. Energy conservation over time
The law of energy conservation states that the change of thermal energy of a lake system over time is equivalent to the net gain of heat energy fluxes, which is the difference between incoming energy fluxes and any energy losses from the lake (see Fig. 3). The explicit modeling of energy conservation is critical for capturing temperature dynamics since a mismatch in losses and gains results in a temperature change. Specifically, more incoming heat fluxes than outgoing heat fluxes will warm the lake, and more outgoing heat fluxes than incoming heat fluxes will cool the lake.
The total thermal energy of the lake at time t can be computed as follows:

(7)  U_t = Σ_d ρ_{d,t} · c_w · T_{d,t} · a_d · Δz

where T_{d,t} is the temperature at depth d at time t, c_w the specific heat of water (4186 J kg⁻¹ °C⁻¹), a_d the cross-sectional area of the water column (m²) at depth d, ρ_{d,t} the water density (kg/m³) at depth d at time t, and Δz the thickness (m) of the layer at depth d. In this work, we simulate water temperature every 0.5 m and thus set Δz = 0.5. The computation of U_t requires the output of temperature through a feedforward process for all depths, as well as the cross-sectional area a_d, which is available as input.
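Eq. 7 translates directly into code; the array names here are our own:

```python
import numpy as np

def lake_thermal_energy(T, area, rho, dz=0.5, c_w=4186.0):
    """Total thermal energy U_t (Eq. 7): sum over 0.5 m layers of
    density * specific heat * temperature * cross-sectional area * thickness."""
    return float(np.sum(rho * c_w * T * area * dz))
```

For instance, a single 0.5 m layer at 10 °C with cross-sectional area 2 m² and density 1000 kg/m³ holds 1000 · 4186 · 10 · 2 · 0.5 ≈ 4.19e7 J.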
The balance between incoming heat fluxes (F_in,t) and outgoing heat fluxes (F_out,t) results in a change in the thermal energy of the lake (ΔU_t). The consistency between lake energy and energy fluxes can be expressed as:

(8)  ΔU_t = F_in,t − F_out,t

where ΔU_t = U_{t+1} − U_t. More details about computing heat fluxes are described in the appendix. All the involved energy components are expressed in W/m².
In Fig. 2, we show the flow of the proposed PGRNN model, which integrates an energy conservation flow into the recurrent process. While the recurrent flow in the standard RNN can capture data dependencies across time, the modeling of energy flow ensures that the change of the lake environment and the predicted temperature conform to the law of energy conservation. Traditional LSTM models utilize the LSTM cell to implicitly encode useful information at each time step and pass it to the next time step. In contrast, the energy flow in PGRNN explicitly captures the key factor that leads to temperature change in dynamical systems: the heat energy fluxes that are transferred from one time step to the next. Further, even though the input drivers and temperature profiles in different years and different seasons have different distributions, they all conform to the universal law of energy conservation. Therefore, by complying with this universal law, PGRNN has a better chance of learning patterns that generalize to unseen scenarios (Read et al., 2019).
We define a loss term for energy conservation and combine it with the training objective of the standard LSTM model in the following equation:

(9)  Loss = L_sup + λ_EC · (1 / T_fl) Σ_t ReLU(|ΔU_t − (F_in,t − F_out,t)| − τ_EC)

where T_fl represents the length of the ice-free period. Here we consider energy conservation only for ice-free periods, since the lake exhibits drastically different reflectance and energy-loss dynamics when covered in ice and snow, and the modeling of ice and snow was considered out of scope for this study. We provide more details about how to compute the energy fluxes F_in,t and F_out,t from input data in the appendix. The value τ_EC is a threshold for the energy-conservation loss. This threshold is introduced because physical processes can be affected by unknown, less important factors that are not included in the model, or by observation errors in the meteorological data. The ReLU function is adopted so that only differences larger than the threshold count towards the penalty. In our implementation, the threshold is set to the largest value of |ΔU_t − (F_in,t − F_out,t)| in the GLM model for daily averages. The hyper-parameter λ_EC controls the balance between the loss of the standard RNN and the energy-conservation loss.
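The thresholded penalty term can be sketched as follows (the function name and the per-day arrays are illustrative):

```python
import numpy as np

def energy_conservation_penalty(delta_U, F_in, F_out, tau):
    """Hinge-style energy-balance penalty over the ice-free period:
    only imbalances exceeding the threshold tau contribute."""
    imbalance = np.abs(delta_U - (F_in - F_out))
    return float(np.maximum(0.0, imbalance - tau).mean())
```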
Note that the modeling of energy flow using the procedure described above does not require any input of true labels/observations. According to Eqs. 11-13, the heat fluxes and lake energy are computed using only the input drivers and the predicted temperature. In light of these observations, we can apply this model for semi-supervised training for lake systems that have only a few labeled data points.
3.3. Pretraining using physical simulations
In real-world environmental systems, observed data is limited. For example, among the lakes being studied by the USGS, less than 1% of lakes have 100 or more days of temperature observations and less than 5% of lakes have 10 or more days of temperature observations (Read and others, 2017). Given their complexity, RNN-based models trained with limited observed data can exhibit poor performance.
To address this issue, we propose to pre-train the PGRNN model using simulated data produced by a generic (uncalibrated) GLM that uses default values for its parameters. In particular, given the input drivers, we run the generic GLM to predict temperature at every depth and on every day. These simulated temperature data from the generic GLM are imperfect, but they provide a synthetic realization of the physical responses of a lake to a given set of meteorological drivers. Hence, pre-training a neural network using simulations from the generic GLM allows the network to emulate a synthetic but physically realistic phenomenon. This process provides a more accurate and physically consistent initialization for the learning model. When applying the pre-trained model to a real system, we fine-tune the model using true observations. Our hypothesis is that the pre-trained model is much closer to the optimal solution and thus requires less observed data to train a good-quality model. In our experiments, we show that such pre-trained models can achieve high accuracy given only a few observed data points.
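The two-stage idea can be illustrated with a toy model: pre-train on plentiful but biased "simulator" labels, then fine-tune on a handful of real observations starting from the pre-trained weights. Everything below (the linear model, data sizes, learning rates) is a hypothetical stand-in for the PGRNN/GLM pipeline:

```python
import numpy as np

def fit_linear(X, y, w0=None, lr=0.01, steps=500):
    """Toy gradient-descent fit of y ~ X @ w, optionally warm-started at w0."""
    w = np.zeros(X.shape[1]) if w0 is None else w0.copy()
    for _ in range(steps):
        w -= lr * 2.0 * X.T @ (X @ w - y) / len(y)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y_synth = X @ (w_true + 0.3)   # biased labels from an "uncalibrated simulator"
y_real = X @ w_true            # scarce ground-truth observations

w_pre = fit_linear(X, y_synth)                               # stage 1: pre-train
w_ft = fit_linear(X[:10], y_real[:10], w0=w_pre, steps=100)  # stage 2: fine-tune
```

Warm-starting from the biased pre-trained solution lets a very small labeled set move the model toward the truth, mirroring the hypothesis above.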
4. Experiment
In this section, we conduct extensive evaluations of the proposed method. We first show that the RNN model with an LSTM cell can capture the dynamics of lake systems. Then we build the RNN model incorporating energy conservation, and demonstrate its effectiveness in maintaining physical consistency while also reducing prediction error. Moreover, we show that the pre-training method can leverage complex knowledge hidden in a physics-based model. In particular, pre-training the RNN model even using simulated data from a lake that is very different from the target lake (in terms of geometry, clarity, and climate conditions) is able to reduce the number of observations needed to train a good-quality model.
4.1. Dataset
Our dataset was collected from Lake Mendota in Wisconsin, USA. This lake system is reasonably large (40 km² in area) and exhibits large changes in water temperatures in response to seasonal and subseasonal weather patterns. Observations of lake temperature were collected by the North Temperate Lakes Long-Term Ecological Research Program. These temperature observations vary in their distribution across depths and time: on certain days observations are available at multiple depths, while only a few or no observations are available on other days.
The input drivers that describe prevailing meteorological conditions are available on a continuous daily basis from April 02, 1980 to December 30, 2014. Specifically, we used a set of seven drivers as input variables, which include shortwave and longwave radiation, air temperature, relative humidity, wind speed, frozen and snowing indicators. In contrast, observed data for training and testing the models is not uniform, as measurements were made at varying temporal and spatial (depth) resolutions. In total, 13,158 observations were used for the study period, as shown in Fig. 4.
We use the observed data from April 02, 1980 to October 31, 1991 and the data from June 01, 2003 to December 30, 2014 as training data (in total 8,037 observations). Then we applied the trained model to predict the temperature at different depths for the period from November 01, 1991 to May 31, 2003 (in total 5,121 observations).
4.2. Model setup
We implement the proposed method using TensorFlow on a Tesla P100 GPU. The recurrent modeling structure uses 21 hidden units. The threshold value τ_EC is set to 24, which is equivalent to the largest value of the daily-averaged energy imbalance in the GLM model. The hyper-parameter λ_EC is set to 0.01. The value of λ_EC is selected to balance the supervised training loss and the conservation of energy: a smaller value of λ_EC results in a lower training loss at the expense of energy conservation, and vice versa. Note that when λ_EC > 0 (and thus energy conservation is part of the loss function), the model has a better chance of learning general patterns that reduce the test error (compared with the test error using λ_EC = 0). Also note that the energy-conservation term is not fully accurate, since certain minor physical processes are not captured by the energy-conservation loss. Hence, a much larger value of λ_EC can also result in suboptimal performance by forcing the model to conform to approximate physical relationships. The model is trained with a learning rate of 0.005.

4.3. Performance: prediction accuracy and energy consistency
First, we aim to evaluate how energy conservation helps improve prediction accuracy and maintain energy consistency. In our experiments, we use RNN to denote the RNN model with the LSTM cell, and RNN_EC to denote the LSTM-based RNN after incorporating energy conservation over the entire study period. We assess the performance of each model based on prediction accuracy (see Section 4.3.1) and physical consistency (see Section 4.3.2). Sensitivity tests regarding hyper-parameters can be found in our previous work (Jia et al., 2019).
4.3.1. Prediction accuracy
Here we compare RNN, RNN_EC, and GLM in terms of their prediction RMSE. (We do not include the basic neural network or the standard RNN model without the LSTM cell, since the basic neural network produces an RMSE of 1.88 and the standard RNN an RMSE of 1.60 using 100% of the observed data, which is far higher than the models under discussion.) To test whether each model can perform well using reduced observed data, we randomly select different proportions of data from the training period. For example, to select 20% of the training data, we remove every observation in our training period with probability 0.8. The test data stays the same regardless of the training-data selection. We repeat each test 10 times and report the mean RMSE and standard deviation.
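The subsampling scheme can be sketched as follows (the function name and seed are our own):

```python
import numpy as np

def subsample_observations(obs_mask, keep_frac, seed=0):
    """Keep each observed entry independently with probability keep_frac,
    i.e., selecting 20% removes each observation with probability 0.8."""
    rng = np.random.default_rng(seed)
    return obs_mask & (rng.random(obs_mask.shape) < keep_frac)
```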
From Table 1, we make several observations: 1) RNN_EC consistently outperforms RNN. The gap is especially obvious when using smaller subsets of the observed data (e.g., 0.2% or 2%). However, given plenty of observed data, the RNN model can achieve performance similar to that of the RNN_EC model. 2) Both RNN and RNN_EC get close to their best performance using over 20% of the observed data. 3) RNN_EC using 20% of the observed data outperforms the fully calibrated GLM (using 100% of the observed data).
Method | 0% | 0.2% | 2% | 20% | 100%
GLM | 2.950 (NA) | 2.616 (0.499) | 2.422 (0.423) | 2.318 (0.368) | 1.836 (NA)
RNN | - | 4.615 (0.173) | 2.311 (0.240) | 1.531 (0.083) | 1.489 (0.091)
RNN_EC | - | 4.107 (0.181) | 2.149 (0.163) | 1.489 (0.115) | 1.471 (0.077)
4.3.2. Energy consistency
To visualize how RNN_EC contributes to a physically consistent solution, we wish to verify whether the gap between incoming and outgoing heat energy fluxes matches the lake energy change over time. Specifically, we train RNN and RNN_EC using observed data from the first ten years. Then, we show the curves for the gap between incoming and outgoing heat fluxes and for the change of lake energy over time for a certain period (Fig. 5). These two curves should be well aligned (in the ice-free period) if the learning model follows the law of energy conservation. We also summarize the average gap between these two curves on the test data (in the ice-free period) as the energy inconsistency. In Fig. 6, we show the RMSE and the energy inconsistency of RNN, RNN_EC, and the calibrated GLM over the entire test period. Here each model is trained using 100% of the observed data (the last column in Table 1).
From Fig. 5, we observe that RNN_EC produces a better match between energy fluxes and lake energy change, while RNN leads to a large difference between the two curves. This confirms that the addition of the energy-conservation term to the loss function used for RNN_EC during training results in a model that preserves energy conservation on the test data. Note that the match between the blue and yellow curves for RNN_EC is not as good as that for the calibrated GLM. RNN_EC could obtain a closer match between these two curves simply by using a larger value of λ_EC during the training phase. However, the energy-conservation formula used in Eqs. 9 and 10 (in the Appendix) captures only a subset of physical processes and ignores certain minor processes that can be challenging to model precisely (Read et al., 2019); thus, strict compliance with the simplified energy-conservation term used in the loss function of RNN_EC can reduce the prediction accuracy on unseen data. Finally, from Fig. 6 (and also from Table 1), we can see that RNN_EC has even lower RMSE than RNN (which focuses only on reducing RMSE during the training phase). This shows that a more physically realistic model can also be more generalizable.
4.4. Leveraging the knowledge hidden in the physics-based model via pre-training
Here we show the power of pre-training to improve the prediction accuracy of the model even with small amounts of training data. A basic premise of pre-training our models is that GLM simulations, though imperfect, provide a synthetic realization of the physical responses of a lake to a given set of meteorological drivers. Hence, pre-training a neural network using GLM simulations allows the network to emulate a synthetic realization of physical phenomena. Our hypothesis is that such a pre-trained model requires fewer labeled samples to achieve good generalization performance, even if the GLM simulations do not match the observations. To test this hypothesis, we conduct an experiment where we generate GLM simulations with input drivers from Lake Mendota. These simulations have been created using a GLM with generic parameter values that are not calibrated for Lake Mendota, resulting in large errors in the modeled temperature profiles with respect to the real observations on Lake Mendota (RMSE = 2.950). Nevertheless, these simulated data are physically consistent, and by using them for pre-training we can demonstrate the power of our ML models to work with limited observed data while leveraging the complex physical knowledge inherent in the physical models.
We fine-tune the pretrained models with different amounts of observed data and report the performance in Table 2. Here the pretrained PGRNN refers to the PGRNN model with energy conservation that is first pretrained using simulation data from 1981–2013 and then fine-tuned using observed data from the training period. The comparison between the PGRNN with and without pretraining shows that pretraining can significantly improve performance. The improvement is relatively much larger given a small amount of observed data. For example, even with 0.2% of observed data (16 observations), the pretrained PGRNN achieves an RMSE of 2.056, which is much smaller than that obtained by the RNN or PGRNN without pretraining when using ten times the amount of observed data. Moreover, we find that training the models without pretraining commonly takes 150–200 epochs to converge, while training the pretrained models takes only 30–50 epochs. These improvements demonstrate that pretraining can indeed provide a better initialized state for learning a good-quality model.
Now we wish to better understand how the fine-tuning improves performance using only limited observations. In Fig. 7, we show the predictions at 10 m depth by the generic GLM (GLM_gnr), the pretrained PGRNN without fine-tuning, and the pretrained PGRNN fine-tuned using 2% of the data. We include the distribution of the randomly selected 2% training data in the appendix. We have the following observations: 1) the generic GLM has a large bias with respect to the true observations; 2) the pretrained PGRNN without fine-tuning makes predictions similar to the generic GLM, since it is pretrained to emulate the generic GLM; note that it roughly captures the temperature dynamics even without using any observed data; 3) after fine-tuning with just 2% of the observed data, the PGRNN largely closes the gap between the pretrained model and the true observations.
Table 2. RMSE (with standard deviation in parentheses) when fine-tuning with different fractions of observed data.

| Method | 0% | 0.2% | 2% | 20% | 100% |
|---|---|---|---|---|---|
| GLM | 2.950 | 2.616 (0.499) | 2.422 (0.423) | 2.318 (0.368) | 1.836 (NA) |
| RNN | – | 4.615 (0.173) | 2.311 (0.240) | 1.531 (0.083) | 1.489 (0.091) |
| PGRNN | – | 4.107 (0.181) | 2.149 (0.163) | 1.489 (0.115) | 1.471 (0.077) |
| PGRNN (pretrained) | 2.455 (0.169) | 2.056 (0.180) | 1.590 (0.162) | 1.402 (0.106) | 1.380 (0.078) |
4.5. The RMSE profile across depths and seasons
Here we further analyze the prediction results to understand the limitations of physics-based GLM models and how our proposed method can overcome them. Specifically, we conduct the analysis from two different perspectives: across depths and across seasons. Each provides unique insight into the underlying differences between GLM and the proposed method in modeling lake temperature dynamics.
4.5.1. Error across depths:
In Fig. 8, we show the error of the PGRNN (pretrained and fine-tuned with 100% of the data) and of the GLM models (the generic GLM and the GLM calibrated using 100% of the data) across different depths.
It can be seen that at shallow depths (< 6 m), the PGRNN achieves performance similar to the generic GLM but has larger errors than the calibrated GLM. This is because a single PGRNN model is trained to optimize performance across all depths. If we separately train a model only for shallow depths, its performance can approach that of the calibrated GLM.
The generic GLM has much larger errors than the PGRNN at depths greater than 6 m, especially at intermediate depths (between 6 m and 16 m). The reason for such depth-dependent differences between GLM and the PGRNN is that GLM includes complex processes to model the dynamics of thermal stratification, including the density-based separation of the surface and bottom waters. Specifically, the GLM is designed to capture the location of this temperature transition and the strength of the gradient. However, predicting the dynamics of stratification from the underlying processes is very challenging for any model, including the GLM (Hipsey et al., 2019), and thus we observe an increase in the errors of the generic GLM at depths below 6 m.
The calibrated GLM has much smaller errors than the generic GLM at middle depths. This shows that the generic GLM simulates complex processes that cannot easily generalize to specific lake systems without calibration. After GLM is calibrated using true observations, it can better locate the temperature transition in this specific lake and consequently reduce errors at middle depths. Note that the calibrated GLM still has larger errors than the PGRNN at lower depths, potentially a result of the challenges of a physics-based formulation of stratification dynamics. In contrast, the ML models approach the prediction problem without making assumptions about the stratification processes, and are able to perform much better at intermediate and lower depths by learning patterns from the training data.
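A depth-wise error profile like the one in Fig. 8 is simply an RMSE computed separately for each depth. A minimal sketch:

```python
import numpy as np

def rmse_by_depth(depths, y_pred, y_true):
    """Return {depth: RMSE} so error can be profiled across the water column."""
    depths = np.asarray(depths)
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    profile = {}
    for d in np.unique(depths):
        mask = depths == d
        profile[d] = float(np.sqrt(np.mean((y_pred[mask] - y_true[mask]) ** 2)))
    return profile
```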
4.5.2. Error across seasons:
We show the overall error in each season in Fig. 9. We can observe that in spring the PGRNN and the calibrated GLM have similar errors, while in summer and fall the PGRNN outperforms the calibrated GLM by a considerable margin, with the calibrated GLM offering an improvement over the PGRNN during winter. This implies a bias in GLM's modeling of certain physical processes that are active during the warmer seasons.
To better understand the difference between our proposed method and GLM across seasons, we separately plot the error-depth relation for each season (see Fig. 10). The error-depth profiles in summer and fall are similar to those in Fig. 8. The gap between the PGRNN and the calibrated GLM is especially large in summer and fall because these seasons are dominated by stronger stratification and/or rapid changes in stratification as the lake cools. The influence of stratification on model performance is weaker in spring and winter. Hence, the difficulty of modeling stratification, together with the increased range of temperatures, is likely responsible for GLM's worse performance relative to the PGRNN in the warmer seasons.
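A seasonal error breakdown like Fig. 9 can be computed by mapping each observation's month to a season and aggregating squared errors. The sketch below assumes standard meteorological season boundaries.

```python
import numpy as np

SEASON = {12: "winter", 1: "winter", 2: "winter",
          3: "spring", 4: "spring", 5: "spring",
          6: "summer", 7: "summer", 8: "summer",
          9: "fall", 10: "fall", 11: "fall"}

def rmse_by_season(months, y_pred, y_true):
    """Group squared errors by the season of each observation, then take RMSE."""
    err2 = (np.asarray(y_pred) - np.asarray(y_true)) ** 2
    grouped = {}
    for m, e in zip(months, err2):
        grouped.setdefault(SEASON[m], []).append(e)
    return {s: float(np.sqrt(np.mean(v))) for s, v in grouped.items()}
```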
4.6. Can a pretrained ML model perform better than its teacher?
As observed from Table 2, the performance of the pretrained RNN-based models with no fine-tuning is better than the accuracy of the outputs from the generic GLM (RMSE = 2.950) on which they were pretrained. GLM tracks temperature at various depth layers that grow, shrink, split, or combine based on prevailing conditions (this is referred to as a Lagrangian layer model, since the vertical layers are not fixed in time). As adjacent layers split or combine, prediction artifacts that are not representative of the real-world lake system are introduced, which often results in additional variability at lower depths. This artificial temperature variability can be too strong for Lake Mendota and can increase GLM error. In contrast, the pretrained RNN, as an imperfect emulator of GLM, does not fully capture such complexity, and instead predicts smoother and often more accurate temperature dynamics than the simulated data. To verify that GLM can introduce unnecessary variability or temperature-change artifacts at lower depths that are comparatively muted in the pretrained model, in Fig. 11 we show the error profiles of GLM and the pretrained model at different depths when no observations are used for refinement. We can observe that the pretrained RNN and GLM achieve similar performance near the surface, but the pretrained RNN has much lower RMSE than GLM at lower depths.
To illustrate this further, we pretrain the RNN using data from two different depth layers: the surface (0 m) and 9 m. We then measure the error of each model with respect to the GLM simulated data and to the true observations at the depth where the model is trained (Table 3). We can observe that the error with respect to the GLM outputs is much higher at 9 m than at the surface. This shows that the ML models cannot fully mimic the complexity of GLM at lower depths. However, since these complex processes are not necessarily good representations of Lake Mendota's temperature dynamics, the ML models achieve a lower RMSE with respect to true observations than GLM (4.752 versus 5.333) at 9 m by learning a simpler temporal process that is closer to reality.
Table 3. Error with respect to GLM simulations and true observations for models pretrained at a single depth.

| Method | Simulation error (surface) | Observation error (surface) | Simulation error (9 m) | Observation error (9 m) |
|---|---|---|---|---|
| GLM_gnr | – | 1.875 | – | 5.333 |
| Pretrained RNN | 0.854 | 1.932 | 1.498 | 4.752 |
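The effect described above, where a capacity-limited student trained only on a noisy teacher lands closer to reality than the teacher itself, can be reproduced in a toy setting. The signal shape, noise level, and smoothing window below are arbitrary illustrative choices; a moving average stands in for the smoothing that a limited-capacity emulator performs.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 400)
truth = 10 + 5 * np.sin(t)                     # smooth "real" temperature dynamics
teacher = truth + rng.normal(0, 1.5, t.size)   # physics model with layer artifacts

# A capacity-limited student fit to the teacher comes out smoother than the
# teacher; a moving average stands in for that smoothing effect here.
k = 15
student = np.convolve(teacher, np.ones(k) / k, mode="same")

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

interior = slice(k, -k)  # ignore moving-average edge effects
student_err = rmse(student[interior], truth[interior])
teacher_err = rmse(teacher[interior], truth[interior])
```

The student never sees the truth, yet its error against the truth is lower than the teacher's, because the averaging suppresses the teacher's non-physical high-frequency artifacts.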
4.7. Ability to pretrain using lakes that are very different from the target lake
In practice, the GLM may not have access to the true values of parameters (e.g., lake geometry, water clarity, and climate conditions), and therefore can only generate simulations based on default, possibly inaccurate assumptions about the parameters that influence lake temperature dynamics. Here we show the power of pretraining using simulated data from a physics-based model built with different lake geometries, water clarities, and climate conditions. Our assumption is that the simulations produced by physics-based models still represent physical responses that strictly follow known physical laws. Hence, the pretrained model should be able to capture these physical relationships and reach a physically consistent initialized state. In our experiments, we show that pretraining even with a wrong set of lake parameters, or with weather drivers very different from those of the target lake, can still significantly reduce the number of observations required to train a good-quality model.
Specifically, we pretrain the PGRNN using data simulated by GLM under specific conditions (geometry, clarity, and climate). We then verify whether these pretrained models still have superior performance after they are fine-tuned with a small amount of observations.
Lake geometry:
We generate GLM simulations for three synthetic lakes with three different geometric structures: cone, barrel, and martini. The cone shape is closest to the true geometry of Lake Mendota (see Fig. 12), while both the barrel and martini shapes are very different from the true geometry. We first pretrain using the GLM outputs for each geometric structure, and then fine-tune using true observations. The performance is shown in Table 4.
It can be seen that when adapted to Lake Mendota, the model learned from the cone shape works well even with no observed data. In contrast, the models learned from the barrel and martini shapes have much larger errors when directly applied to Lake Mendota. However, these errors are significantly reduced after fine-tuning with only 2% of the data. This shows that a model learned from a specific geometric structure can still capture certain temporal patterns that are physically consistent and applicable to the target system.
Table 4. RMSE after pretraining on synthetic lakes with different geometries and fine-tuning with different fractions of observed data.

| Method | 0% | 0.2% | 2% | 20% | 100% |
|---|---|---|---|---|---|
| PGRNN (no pretraining) | – | 4.107 (0.181) | 2.149 (0.163) | 1.489 (0.115) | 1.471 (0.077) |
| PGRNN (pretrained, Lake Mendota drivers) | 2.455 (0.169) | 2.056 (0.180) | 1.590 (0.162) | 1.402 (0.106) | 1.380 (0.078) |
| PGRNN (pretrained, cone) | 2.469 (0.168) | 2.056 (0.184) | 1.595 (0.097) | 1.452 (0.113) | 1.374 (0.074) |
| PGRNN (pretrained, barrel) | 3.239 (0.098) | 2.060 (0.144) | 1.617 (0.090) | 1.401 (0.098) | 1.383 (0.078) |
| PGRNN (pretrained, martini) | 5.340 (0.110) | 3.033 (0.104) | 2.216 (0.141) | 1.485 (0.092) | 1.459 (0.059) |
When comparing the performance of the different pretrained geometric structures, we notice that the model pretrained with the martini shape has a much larger error (RMSE 5.340) than the other two shapes, and the cone shape has the smallest error (see the first column in Table 4). This result agrees with the assumption that the cone shape is closest to the true geometry of Lake Mendota; consequently, the GLM simulations using the cone shape should be closest to reality, while the simulations using the martini shape should be farthest from the true observations. We verify this by measuring the RMSE of the GLM simulations with respect to the true observations: {cone simulation = 2.792, martini simulation = 5.950, barrel simulation = 3.864}. Even though the GLM simulations can have large errors when assuming the wrong geometric structure, the pretrained models obtain lower errors than their teachers (see the first column in Table 4: {cone 2.469, martini 5.340, barrel 3.239}). This shows that the machine learning models are less sensitive to changes in geometric structure. Moreover, even though the models pretrained using the wrong geometric structures have relatively large errors after pretraining, they quickly recover to reasonable performance when fine-tuned with a small amount of observed data (e.g., 2%).
Lake clarity:
Similarly, we generate GLM simulations for three synthetic lakes with different levels of clarity: normal (Kw=0.45), dark (Kw=1.20), and clear (Kw=0.25). Here we fix the lake geometry to the cone shape. The clarity level affects the penetration of radiation into deeper water. We wish to verify how well a model learned from a different clarity level can be fine-tuned to fit Lake Mendota. The performance is shown in Table 5.
We can observe that even though Lake Mendota's clarity is close to the normal level (Kw=0.45), the models pretrained with both "dark" and "clear" clarity can be adapted well to Lake Mendota after fine-tuning. We also note that the performance of fine-tuned models from different clarity levels is similar given even 0.2% of the observations. This suggests that the clarity level has less impact than lake geometry on learning an accurate predictive model for lake systems.
Table 5. RMSE after pretraining on synthetic lakes (cone geometry) with different clarity levels and fine-tuning with different fractions of observed data.

| Method | 0% | 0.2% | 2% | 20% | 100% |
|---|---|---|---|---|---|
| PGRNN (no pretraining) | – | 4.107 (0.181) | 2.149 (0.163) | 1.489 (0.115) | 1.471 (0.077) |
| PGRNN (pretrained, Lake Mendota drivers) | 2.455 (0.169) | 2.056 (0.180) | 1.590 (0.162) | 1.402 (0.106) | 1.380 (0.078) |
| PGRNN (pretrained, normal, Kw=0.45) | 2.469 (0.168) | 2.056 (0.184) | 1.595 (0.097) | 1.452 (0.113) | 1.374 (0.074) |
| PGRNN (pretrained, dark, Kw=1.20) | 2.776 (0.124) | 2.067 (0.155) | 1.601 (0.078) | 1.393 (0.091) | 1.380 (0.068) |
| PGRNN (pretrained, clear, Kw=0.25) | 2.518 (0.135) | 2.050 (0.120) | 1.648 (0.128) | 1.399 (0.088) | 1.371 (0.076) |
The water clarity mainly determines how rapidly sunlight is attenuated with water depth. This parameter therefore affects the gradient of the temperature transition and the warming rates of deeper waters. To further analyze this impact, we measure the error across different depths for models pretrained under different clarity levels, as shown in Fig. 13 (a). It can be seen that the model pretrained under "dark" clarity has much higher error at depths between 6 m and 12 m, where the temperature changes most rapidly. This confirms that a different clarity level can negatively impact water temperature modeling across depths. However, when we fine-tune the models with a small amount of observed data, e.g., 2%, they quickly recover to reasonable performance, as shown in Fig. 13 (b). Here it can be seen that the model pretrained under "dark" clarity achieves performance similar to the models pretrained under the other clarity levels across all depths.
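The role of Kw can be made concrete with a Beer-Lambert-style attenuation curve, a common simplification of how shortwave radiation decays with depth (GLM's actual light model has more detail):

```python
import math

def light_fraction(depth_m, kw):
    """Fraction of surface shortwave radiation remaining at a given depth,
    under Beer-Lambert attenuation with extinction coefficient kw (1/m)."""
    return math.exp(-kw * depth_m)

# Fraction of surface light reaching 10 m for the three clarity levels above.
fractions = {label: light_fraction(10, kw)
             for label, kw in [("clear", 0.25), ("normal", 0.45), ("dark", 1.20)]}
```

Under this simplification, a "dark" lake (Kw=1.20) passes essentially no shortwave energy to 10 m, while a "clear" lake (Kw=0.25) still passes several percent, which is why the clarity assumption chiefly shifts errors at intermediate depths.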
Climate conditions:
Next, we generate GLM simulations for a synthetic lake with input drivers from Florida (very different from the typically much colder conditions in Wisconsin) and then pretrain the model using the resulting simulated data. We show the performance of the pretrained models in Table 6. Note that the model trained using these Florida input drivers and simulated data has very poor performance when directly applied to Lake Mendota (RMSE of 9.106). This is not surprising given the huge temperature difference between Wisconsin (where Lake Mendota is located) and Florida. It is more interesting that even with just 2% of the observations, the learned model becomes much better after fine-tuning.
Table 6. RMSE after pretraining on GLM simulations driven by Florida weather and fine-tuning with different fractions of observed data.

| Method | 0% | 0.2% | 2% | 20% | 100% |
|---|---|---|---|---|---|
| PGRNN (no pretraining) | – | 4.107 (0.181) | 2.149 (0.163) | 1.489 (0.115) | 1.471 (0.077) |
| PGRNN (pretrained, Lake Mendota drivers) | 2.455 (0.169) | 2.056 (0.180) | 1.590 (0.162) | 1.402 (0.106) | 1.380 (0.078) |
| PGRNN (pretrained, Florida drivers) | 9.106 (0.172) | 2.601 (0.177) | 1.759 (0.147) | 1.470 (0.091) | 1.394 (0.071) |
5. Related Work
Various components proposed in this work, including generalizing the loss function to include physical constraints, addressing the imperfection of existing physical models, and training ML models using the outputs from physical models, have been studied in different contexts.
As discussed in (Karpatne et al., 2017a), the idea of including an additional term in the loss function to prefer solutions consistent with domain-specific knowledge is beginning to find extensive use in many applications. In addition to favoring solutions that are physically consistent, this also allows training in the absence of labels, since the physics-based loss can be computed even without class labels. Some recent applications of this approach to combining physical knowledge with machine learning can be found in computer vision (Sturmfels et al., 2018; Shrivastava et al., 2012), natural language processing (Kotzias et al., 2015), object tracking (Stewart and Ermon, 2017; Ren and others, 2018), and image restoration (Pan et al., 2018; Li et al., 2019). To the best of our knowledge, our work demonstrates for the first time that an ML framework can be adapted to incorporate an energy conservation constraint, which is a universal law that applies to many dynamical systems.
In the context of directly addressing the imperfection of physical models, which is the focus of this paper, the most common approach is residual modeling, where an ML model is learned to predict the errors made by a physics-based model. This ML model can be learned using standard supervised learning techniques as long as some observations are available (which can be used to compute the errors made by the physics model). Once learned, the ML model is used to make corrections to the output of the physics model. Most of the work on residual modeling, going back several decades, has used plain regression models (Forssell and Lindskog, 1997; Xu and Valocchi, 2015), although some recent works (Wan et al., 2018) have used LSTMs. A key limitation of such approaches is that they cannot enforce physics-based constraints, because they model the error made by a physics model rather than predicting a physical quantity. Recently, Karpatne et al. introduced a novel hybrid ML-physics model in which the output of a physics model is fed into an ML model along with the inputs used to drive the physics model (Karpatne et al., 2017b). This hybrid model learns to use the output of the physics model as the final output for input drivers where the physics model does well, and to make corrections where it makes mistakes. Since the output of this hybrid model is a physical quantity, physics-based constraints can now be enforced, allowing for label-free learning. However, such approaches cannot be used to initialize the ML model using just synthetic outputs from the physics model (which are essentially free to obtain), since they require observations to be available during training.
Machine learning models are increasingly being used to emulate physics-based models, since an ML model is typically much faster to execute once trained (Butler et al., 2018; Ojika et al., 2017; McGregor et al.). Since these ML models are trained on synthetic outputs generated by physics-based models, the availability of training data is not a limitation, which makes it possible to train even highly complex ML models. However, these emulators (if well trained) can, in general, be expected to do only as well as the physics models used to generate the training data. In particular, they cannot correct the errors that physics-based models make due to missing physics or incorrect parameterization.
However, the PGRNN approach presented in this paper can be used to develop emulators that are physically consistent and thus likely to be more robust and generalizable to out-of-sample scenarios.
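The residual modeling baseline discussed above can be sketched in a few lines: a learner is fit to the physics model's error, and the corrected prediction adds that learned residual back. The toy system and the linear residual learner below are illustrative stand-ins for a real physics model and a real supervised learner.

```python
import numpy as np

# The "physics model" captures the oscillation but misses a linear trend.
x = np.linspace(0.0, 1.0, 50)
truth = np.sin(2 * np.pi * x) + 0.5 * x   # real system
physics = np.sin(2 * np.pi * x)           # imperfect physics model output

# Residual modeling: fit a learner to the physics model's error...
residual = truth - physics
A = np.vstack([x, np.ones_like(x)]).T     # linear learner as a stand-in
coef, *_ = np.linalg.lstsq(A, residual, rcond=None)

# ...and add the predicted residual back to correct the physics output.
corrected = physics + A @ coef
```

Note the limitation named above: `corrected - truth` is an error quantity, not a physical quantity, so a physics-based constraint cannot be imposed on the residual learner itself.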
Another technique for fusing physical models with machine learning is to replace parts of the physical model that are costly or inaccurate with a data-driven solution (Yao et al., 2018; Tartakovsky et al., 2018). In (Hamilton et al., 2017), a subset of the mechanistic model's equations is replaced with data-driven nonparametric methods to improve prediction beyond the baseline process model. As another example, from the domain of fluid dynamics, (Raissi et al., 2018) uses neural networks to approximate latent quantities of interest, such as velocity and pressure, in the Navier-Stokes equations. This creates a much more generalizable fluid dynamics framework that does not depend as heavily on careful specification of the geometry or of the initial and boundary conditions. Such approaches are orthogonal to the ones discussed in our work, as the ML models used as surrogates can be made "physics-guided" using the framework described in this paper.
There also exists extensive literature on the data-driven discovery of governing equations or mathematical forms that underlie complex dynamical systems (P. Crutchfield and S. McNamara, 1987; Bongard and Lipson, 2007; J. Majda and Harlim, 2012; Sugihara et al., 2012; Brunton et al., 2016; Raissi et al., 2017, 2018), and even on discovering the underlying physical laws expressed by partial differential equations from data (Raissi, 2018). For example, Rudy et al. (Rudy et al., 2017) present a sparse regression method for identifying governing PDEs from a large library of candidate functions and spatio-temporal measurements of a model dynamical system. Such approaches can be very valuable for analyzing and understanding complex systems for which analytical descriptions are not available (e.g., in epidemiology, finance, or neuroscience). In contrast, the focus of our work is on systems where the dominant governing equations and laws are already known, but physics-based models contain inherent biases, as they are necessarily approximations of reality.
6. Conclusion
The PGRNN approach presented in this paper is unique in that it provides a powerful framework for modeling spatial and temporal physical processes while incorporating energy conservation. We also studied the ability to pretrain these models using simulated data to deal with the scarcity of observed data. Using simulated data from a poorly parameterized physics-based model, the PGRNN obtains high prediction performance with far less observed data used for refinement than a physics-based model calibrated using a large number of observations. Thus, the PGRNN can leverage the strengths of physics-based models while filling in knowledge gaps by employing state-of-the-art predictive frameworks that learn from data.
The PGRNN framework incorporates energy conservation by adding additional states whose values are computed from physical equations. This allows the use of a rich set of constraints beyond those that can be enforced by considering only the output of the model. In particular, it can be used to model other important physical laws in dynamical systems, such as the law of mass conservation. The PGRNN framework can also be viewed as a transfer learning method that transfers knowledge from physical processes to ML models. Future research needs to determine the types of dynamical system models for which such an approach will be effective. It is entirely possible that new architectural enhancements will need to be made to the traditional LSTM framework to incorporate different types of physical laws and to model underlying physical processes that may interact at different spatial and temporal scales. The proposed framework can then be applied to a variety of scientific problems, such as nutrient exchange in lake systems and analysis of crop field production, as well as engineering problems such as auto-vehicle refueling design. We therefore anticipate this work to be an important stepping stone toward applying machine learning to problems traditionally solved by physics-based models.
Acknowledgements.
This work was supported by NSF and USGS. We thank the North Temperate Lakes Long-Term Ecological Research program (NSF DEB-1440297) for temperature and lake metadata. Access to computing facilities was provided by the Minnesota Supercomputing Institute.
References
- Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5 (2), pp. 157–166.
- Automated reverse engineering of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 104 (24), pp. 9943–9948.
- A multi-lake comparative analysis of the General Lake Model (GLM): stress-testing across a global observatory network. Environmental Modelling & Software 102, pp. 274–291.
- Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113 (15), pp. 3932–3937.
- Machine learning for molecular and materials science. Nature 559 (7715), pp. 547.
- Neural networks for nonlinear dynamic system modelling and identification. International Journal of Control 56 (2), pp. 319–346.
- Combining semi-physical and neural network modeling: an example of its usefulness. IFAC Proceedings Volumes.
- Deep learning for computational chemistry. Journal of Computational Chemistry 38 (16), pp. 1291–1307.
- Big data: science in the petabyte era. Nature 455 (7209), pp. 8–9.
- Debates—the future of hydrological sciences: a (common) path forward? Using models and data to learn: a systems theoretic perspective on the future of hydrological science. WRR.
- Hybrid modeling and prediction of dynamical systems. PLoS Computational Biology 13 (7), pp. e1005655.
- Predicting cyanobacterial abundance, microcystin, and geosmin in a eutrophic drinking-water reservoir using a 14-year dataset. Lake and Reservoir Management.
- Some evaluations of drag and bulk transfer coefficients over water bodies of different sizes. Boundary-Layer Meteorology 3 (2), pp. 201–213.
- A General Lake Model (GLM 3.0) for linking with high-frequency sensor data from the Global Lake Ecological Observatory Network (GLEON).
- GLM (General Lake Model): model overview and user information.
- Physics constrained nonlinear regression models for time series. Nonlinearity 26, pp. 201.
- Physics guided RNNs for modeling dynamical systems: a case study in simulating lake temperature profiles. In Proceedings of the 2019 SIAM International Conference on Data Mining, pp. 558–566.
- Theory-guided data science: a new paradigm for scientific discovery from data. IEEE Transactions on Knowledge and Data Engineering 29 (10), pp. 2318–2331.
- Physics-guided neural networks (PGNN): an application in lake temperature modeling. arXiv preprint arXiv:1710.11431.
- From group to individual labels using deep features. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 597–606.
- Debates—the future of hydrological sciences: a (common) path forward? One water. One world. Many climes. Many souls. WRR.
- The parable of Google Flu: traps in big data analysis. Science.
- Heavy rain image restoration: integrating physics model and conditional adversarial learning. arXiv preprint arXiv:1904.05050.
- Temperature as an ecological resource. American Zoologist 19 (1), pp. 331–343.
- Debates—the future of hydrological sciences: a (common) path forward? A call to action aimed at understanding velocities, celerities and residence time distributions of the headwater hydrograph. WRR.
- FlareNet: a deep learning framework for solar phenomena prediction. In Workshop on Deep Learning for Physical Sciences (DLPS 2017), NIPS 2017.
- Accelerating high-energy physics exploration with deep learning. In Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact, pp. 37.
- Equations of motion from a data series. Complex Systems 1.
- Blooms like it hot. Science 320 (5872), pp. 57–58.
- Physics-based generative adversarial models for image restoration and beyond. arXiv preprint arXiv:1808.00605.
- Long-time predictive modeling of nonlinear dynamical systems using neural networks. Complexity 2018, pp. 1–26.
- Assessing the effects of climate change on aquatic invasive species. Conservation Biology 22 (3), pp. 521–533.
- Inferring solutions of differential equations using noisy multi-fidelity data. Journal of Computational Physics 335, pp. 736–746.
- Multistep neural networks for data-driven discovery of nonlinear dynamical systems. arXiv preprint arXiv:1801.01236.
- Hidden fluid mechanics: a Navier-Stokes informed deep learning framework for assimilating flow visualization data. arXiv preprint arXiv:1808.04327.
- Deep hidden physics models: deep learning of nonlinear partial differential equations. arXiv preprint arXiv:1801.06637.
- Water quality data for national-scale aquatic research: the Water Quality Portal. Water Resources Research.
- Process-guided deep learning predictions of lake water temperature. Water Resources Research.
- Learning with weak supervision from physics and data-driven constraints. AI Magazine.
- Fragmentation and thermal risks from climate change interact to affect persistence of native trout in the Colorado River basin. Global Change Biology.
- Non-native trout invasions combined with climate change threaten persistence of isolated cutthroat trout populations in the southern Rocky Mountains. North American Journal of Fisheries Management.
- Data-driven discovery of partial differential equations. Science Advances 3 (4), pp. e1602614.
- Constrained semi-supervised learning using attributes and comparative attributes. In Proceedings of the 12th European Conference on Computer Vision, Part III, ECCV'12, pp. 369–383.
- Label-free supervision of neural networks with physics and domain knowledge. In AAAI, Vol. 1, pp. 1–7.
- A domain guided CNN architecture for predicting age from structural brain images. arXiv preprint arXiv:1808.04362.
- Detecting causality in complex ecosystems. Science 338 (6106), pp. 496–500.
- A simple but accurate formula for the saturation vapor pressure over liquid water. Journal of Applied Meteorology 12 (8), pp. 1410–1411.
- Learning parameters and constitutive relationships with physics informed deep neural networks. arXiv preprint arXiv:1808.03398.
- Data-assisted reduced-order modeling of extreme events in complex dynamical systems. PLoS ONE 13 (5), pp. e0197704.
- Data-driven methods to improve baseflow prediction of a regional groundwater model. Computers & Geosciences.
- The TensorMol-0.1 model chemistry: a neural network augmented with long-range physics. Royal Society of Chemistry.
Appendix A Energy conservation
In Fig. 14, we show the major incoming and outgoing heat fluxes that impact the lake energy. The incoming heat fluxes include terrestrial long-wave radiation and incoming short-wave radiation. The lake loses heat mainly through the outward fluxes of back radiation, sensible heat fluxes, and latent evaporative heat fluxes. (Here the latent heat fluxes are related to changes in phase between liquids, gases, and solids, while the sensible heat fluxes are related to changes in temperature with no change in phase (Bruce et al., 2018).)
We now expand Eq. 10 with more detailed energy fluxes. The consistency between lake energy and detailed energy fluxes can be expressed as:
(10) $\Delta U_t \approx R_{SW}\,(1-\alpha_{SW}) + R_{LW}\,(1-\alpha_{LW}) - R_{LW_{out}} - H - E$
where $\Delta U_t$ is the change in lake thermal energy, $\alpha_{SW}$ is the shortwave albedo (the fraction of shortwave energy reflected by the lake surface), and $\alpha_{LW}$ is the longwave albedo. In our implementation, we set $\alpha_{SW}$ to 0.07 and $\alpha_{LW}$ to 0.03, which are generally accepted values for lakes from previous scientific studies (Hipsey et al., 2019). All energy components are in W/m$^2$. By comparing this with Eq. 10, we can see that the incoming energy corresponds to $R_{SW}\,(1-\alpha_{SW}) + R_{LW}\,(1-\alpha_{LW})$ and the outgoing energy to $R_{LW_{out}} + H + E$. In this work, we ignore the smaller flux terms such as the sediment heat flux and advected energy from surface inflows and groundwater.
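To make the bookkeeping concrete, the balance above can be sketched as a few lines of Python. This is a minimal sketch, not our implementation: all fluxes are assumed to be given in W/m$^2$, and the function name is ours.

```python
ALPHA_SW = 0.07  # shortwave albedo of the lake surface
ALPHA_LW = 0.03  # longwave albedo of the lake surface

def net_heat_flux(r_sw, r_lw, r_lw_out, h, e):
    """Net heat flux into the lake (W/m^2): absorbed shortwave and longwave
    radiation minus back radiation, sensible, and latent evaporative fluxes."""
    absorbed = r_sw * (1.0 - ALPHA_SW) + r_lw * (1.0 - ALPHA_LW)
    return absorbed - (r_lw_out + h + e)
```

A positive value corresponds to the lake gaining thermal energy over the time step; a negative value, to a net loss.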
Estimation of Heat Fluxes and Lake Thermal Energy: We now describe how the energy fluxes are estimated in our implementation.
Terrestrial longwave radiation ($R_{LW}$) is emitted from the atmosphere and depends on prevailing local conditions such as air temperature and cloud cover. Incoming shortwave radiation ($R_{SW}$) is affected mainly by latitude (solar angle), time of year, and cloud cover. Both fluxes are included in the input drivers.
As for the outgoing energy fluxes, we estimate $R_{LW_{out}}$, $H$, and $E$ separately using the input drivers and the modeled surface temperature.
The sensible heat flux $H$ and the latent evaporative heat flux $E$ can be computed following the previous study (Hipsey et al., 2019):
(11) $H = -\rho_a \, c_a \, C_H \, U_{10} \, (T_s - T_a)$, $\quad E = -\rho_a \, l_E \, C_E \, U_{10} \, \frac{\omega}{p} \, (e_s - e_a)$
where $C_H$ is the bulk aerodynamic coefficient for sensible heat transfer, and $C_E$ the bulk aerodynamic coefficient for latent heat transfer. Both coefficients are estimated from Hicks' collection of ocean and lake data (Hicks, 1972). The coefficient $\omega$ is the ratio of the molecular mass of water to the molecular mass of dry air ($\omega$ = 0.622), $l_E$ the latent heat of vaporization ($l_E$ = 2.453$\times$10$^6$ J/kg), and $c_a$ the specific heat capacity of air ($c_a$ = 1005 J/(kg·K)). The variable $T_a$ is the air temperature, and $U_{10}$ the wind speed (m/s) above the lake referenced to 10 m height. Both of these variables are included in, or can be derived from, the input drivers. $T_s$ is the surface water temperature in degrees Kelvin obtained through the feedforward process. The air density $\rho_a$ is computed from the air pressure $p$ (hPa) and the water vapour mixing ratio (both derived from input drivers). The vapour pressures ($e_s$ and $e_a$) are calculated by the linear formula from Tabata (Tabata, 1973):
(12) $e_s = e_{sat}(T_s)$, $\quad e_a = \psi \, RH \, e_{sat}(T_a)$
where $e_{sat}(\cdot)$ denotes Tabata's saturation vapour-pressure formula, $\psi$ is the relative humidity scaling factor ($\psi$ = 1, obtained through calibrating the GLM model), and $RH$ is the relative humidity (included in the input drivers).
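Since the exact coefficients of the Tabata formula are not reproduced here, the sketch below substitutes the standard Magnus approximation for the saturation vapour pressure; treat it as a stand-in, not the formula used in our implementation. `PSI` plays the role of the calibrated scaling factor $\psi$, temperatures are taken in Celsius and relative humidity in percent (both assumptions of this sketch).

```python
import math

PSI = 1.0  # relative humidity scaling factor, calibrated against the GLM model

def e_sat(t_celsius):
    """Saturation vapour pressure (hPa) via the Magnus approximation,
    standing in for the Tabata (1973) formula."""
    return 6.1094 * math.exp(17.625 * t_celsius / (t_celsius + 243.04))

def vapour_pressures(t_s, t_a, rh):
    """e_s at the surface temperature t_s and e_a at the air temperature t_a
    (both in Celsius); rh is the relative humidity in percent."""
    return e_sat(t_s), PSI * (rh / 100.0) * e_sat(t_a)
```

At 20 °C the approximation gives a saturation vapour pressure of roughly 23 hPa, in line with standard meteorological tables.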
The back radiation $R_{LW_{out}}$ is estimated as:
(13) $R_{LW_{out}} = \epsilon \, \sigma \, T_s^4$
where $\epsilon$ is the emissivity of the water surface ($\epsilon$ = 0.97), and $\sigma$ is the Stefan–Boltzmann constant ($\sigma$ = 5.6697$\times$10$^{-8}$ W m$^{-2}$ K$^{-4}$).
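The outgoing fluxes above can be sketched together as follows. This is a minimal sketch under the standard bulk aerodynamic forms, not our implementation: the transfer coefficients `c_h` and `c_e`, the air density `rho_a`, and the function names are assumptions of this sketch.

```python
C_A = 1005.0       # specific heat capacity of air, J/(kg K)
L_E = 2.453e6      # latent heat of vaporization, J/kg
OMEGA = 0.622      # molecular mass ratio, water vapour to dry air
EPSILON = 0.97     # emissivity of the water surface
SIGMA = 5.6697e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def sensible_heat_flux(rho_a, c_h, u10, t_s, t_a):
    """H in W/m^2; negative when the lake surface is warmer than the air."""
    return -rho_a * C_A * c_h * u10 * (t_s - t_a)

def latent_heat_flux(rho_a, c_e, u10, e_s, e_a, p):
    """E in W/m^2; vapour pressures e_s, e_a and air pressure p in hPa."""
    return -rho_a * L_E * c_e * u10 * (OMEGA / p) * (e_s - e_a)

def back_radiation(t_s):
    """R_LWout in W/m^2; t_s is the surface temperature in Kelvin."""
    return EPSILON * SIGMA * t_s ** 4
```

With the sign convention used here, $H$ and $E$ come out negative when the lake is losing heat to the atmosphere, matching the minus signs in Eq. 11.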
Appendix B Distribution of training data
In Fig. 15, we show the distribution of the randomly selected 2% of observed data across different depths and different dates.