1 Introduction
Given the success of deep learning in commercial domains such as computer vision, speech, and natural language processing, there is growing interest in the scientific community in unlocking the power of deep learning for advancing scientific discovery [2, 8, 14, 31]. Among the many reasons fueling this interest, a primary factor is the rich ecosystem of advanced deep learning frameworks such as convolutional neural networks [17] and long short-term memory (LSTM) models
[11] that can handle complex structures in the data common to many scientific applications. Another reason is that algorithmic innovations such as Dropout [32] are moving us not only toward more robust deep learning but also toward better approaches for uncertainty quantification, e.g., the Monte Carlo (MC) Dropout method [7]. This is especially important in scientific problems where we need to produce uncertainty bounds in addition to point estimates, e.g., in climate change applications [25].

Despite dramatic advances in many commercial fields, current standards of deep learning have seen limited success in scientific applications (e.g., [3, 18, 22]), sometimes even leading to spectacular failures (e.g., [18]). This is primarily because of the black-box nature of conventional deep learning frameworks, which are learned solely from data and are agnostic to the underlying scientific principles driving real-world phenomena. Since a black-box model can only be as good as the data it is fed during training, it can easily produce spurious and physically inconsistent solutions in applications suffering from a paucity of labeled data. Furthermore, dropout methods strain physical consistency further by randomly withholding predictors and hidden states from the computation at each iteration. Despite their value in estimating uncertainty, methods such as dropout are of limited use if they depart from physical realism. As a first step in moving beyond black-box applications of deep learning, there is an emerging field of research combining scientific knowledge (or theories) with data science methods, termed
theory-guided data science [15]. A promising line of research in this field is to guide the learning of neural network models using physics-based loss functions [16, 13, 34], which measure violations of physical principles in the neural network outputs. We refer to this paradigm as physics-guided learning (PGL) of neural networks.

While PGL formulations have been shown to improve generalization performance and generate more physically consistent predictions, adding a loss function to the learning objective still does not circumvent the black-box nature of neural network architectures, which involve arbitrary design choices (e.g., number of layers and nodes per layer). As a result, black-box architectures are susceptible to producing physically inconsistent solutions under minor perturbations of the network weights, even after being trained with physics-based loss functions. This is a major concern when using uncertainty quantification methods such as MC dropout, where network edges are randomly dropped with a small probability at test time to produce a distribution of sample predictions for every test instance. Indeed, our results demonstrate that the randomness injected by MC dropout in the network weights easily breaks the ability of the PGL paradigm to preserve physical consistency in the sample predictions, leading to physically non-meaningful uncertainty estimates.
This paper presents innovations in the emerging field of theory-guided data science where, instead of using black-box architectures, we principally embed well-known physical principles in the neural network design. We refer to this paradigm as physics-guided architecture (PGA) of neural networks. Specifically, this paper offers two key innovations in the PGA paradigm for the illustrative problem of lake temperature modeling, shown in Figure 1. First, we introduce novel physics-informed connections among neurons in the network to capture physics-based relationships of lake temperature. Second, we associate physical meaning with some of the neurons in the network by computing physical intermediate variables in the neural pathway from inputs to outputs. By hard-wiring physics into the model architecture, the PGA paradigm ensures physical consistency of results regardless of small perturbations in the network weights, e.g., due to MC dropout. We compare the efficacy of our proposed approach with baseline methods on data collected from two lakes with differing physical characteristics and climatic regimes: Lake Mendota in Wisconsin, U.S.A., and Falling Creek Reservoir in Virginia, U.S.A.
The remainder of the paper is organized as follows. Section 2 provides a brief background on the problem of lake temperature modeling and relevant related work. Section 3 describes our proposed PGA-LSTM framework. Section 4 discusses our evaluation procedure, while Section 5 presents results. Section 6 provides a detailed analysis of our results, and Section 7 provides concluding remarks and directions for future research.
2 Background and Related Work
2.1 Lake Temperature Modeling:
Modeling the temperature of water in a lake is important from both economic and ecological perspectives. Water temperature is known to be a principal driver of the growth, survival, and reproduction of economically viable fish [30, 21] (see Appendix for more details). Increases in water temperature are also linked to the occurrence of aquatic invasive species [28, 29], which may displace fish and native aquatic organisms, and further result in harmful algal blooms [9, 26]. Hence, accurate and timely information about water temperature is necessary to monitor the ecological health of lakes and forecast future populations of fish and other aquatic taxa.
Since observations of water temperature are incomplete at broad spatial scales (or nonexistent for most lakes), physics-based models of lake temperature, e.g., the General Lake Model (GLM) [10], are commonly used for studying lake processes. A standard formulation in these models is to assume that horizontal heterogeneity is limited and that the most relevant dynamics are captured in the vertical dimension of the lake, thereby modeling the lake as a series of vertical layers. These modeling studies often use the temperature of water at the center of a lake at varying depth values¹ and time points for model validation. (¹Depth is measured in the direction from the lake surface to the lake bottom.) We adopt the same formulation to model the temperature of water in a lake at depth d and time t. In particular, we leverage two key physical principles of our problem to guide neural network approaches, briefly described in the following.
a) Temperature–Density Physics: The temperature T and density ρ of water are non-linearly related according to the following known physical equation [24]:

ρ = 1000 × ( 1 − (T + 288.9414)(T − 3.9863)² / (508929.2 × (T + 68.12963)) ),    (1)

where T is the water temperature in °C and ρ is the density in kg/m³.
Figure 2(a) shows a plot of this relationship, where we can see that water is maximally dense at 4°C. We can use this physics to directly map temperature to density.
b) Density–Depth Physics: The density of water monotonically increases with depth, as shown in the example plot of Figure 2(b), since denser water is heavier and sinks to the bottom of the lake. Formally,

ρ(d₁) ≤ ρ(d₂)  whenever  d₁ < d₂,    (2)

where d₁ and d₂ are depths measured from the lake surface.
These two physical relationships serve as the basis of the PGA innovations proposed in this paper for lake temperature modeling.
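As a concrete illustration, Equation (1) can be evaluated directly in code. The snippet below is a minimal sketch (the function name is ours) that also verifies numerically that density peaks at 3.9863°C, the temperature of maximum density:

```python
def water_density(temp_c):
    """Density of water (kg/m^3) at temperature temp_c (deg C), per Eq. (1)."""
    return 1000.0 * (1.0 - (temp_c + 288.9414) * (temp_c - 3.9863) ** 2
                     / (508929.2 * (temp_c + 68.12963)))

# The squared term vanishes at 3.9863 deg C, so density attains its
# maximum of 1000 kg/m^3 exactly there; a grid scan confirms this.
temps = [i / 100.0 for i in range(0, 3001)]   # 0 to 30 deg C in 0.01 steps
t_max = max(temps, key=water_density)
```

Because the relationship is a downward-opening curve around 3.9863°C, the forward map from temperature to density is one-to-one only on each side of the maximum, a fact that becomes important when mapping density back to temperature.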
2.2 Physics-guided Machine Learning:
Physics-guided Learning (PGL) is a recent paradigm for learning neural networks [16, 13, 34] in which, along with the prediction loss between the observed targets Y and the model outputs Ŷ, we also measure violations of physical principles in the model outputs, represented as a physics-based loss in the PGL objective:

Loss = Loss_Empirical(Y, Ŷ) + λ_R R(W) + λ_PHY Loss_PHY(Ŷ),    (3)
where λ_PHY is a trade-off hyperparameter that decides the relative importance of minimizing the physical inconsistency compared to the empirical loss and the model complexity R(W). By using a physics-based loss, PGL restricts the search space of neural network weights to physically consistent options, thereby aiming to achieve more generalizable and physically relevant predictions. For the problem of lake temperature modeling, Karpatne et al. [16] developed a PGL framework to measure violations of the two physical relationships introduced in Section 2.1. Jia et al. [13] extended this to work with time-based LSTM architectures and implemented an additional physics-based loss term to incorporate energy conservation. Among other recent works, Xu et al. [36] integrated probabilistic logic with neural networks using a semantic loss for classification tasks, Pathak et al. [27] leveraged a set of linear constraints as loss functions for weakly supervised segmentation, and Marquez et al. [23] imposed hard constraints on neural networks using constrained optimization formulations.
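A minimal sketch of the PGL objective in Eq. (3), assuming the physics-based loss penalizes decreases of predicted density with depth (function names and hyperparameter values are illustrative, not the cited authors' code):

```python
def physics_loss(rho_hat):
    """Mean violation of the density-depth constraint: positive wherever
    predicted density decreases from one depth to the next (ReLU form)."""
    return sum(max(a - b, 0.0) for a, b in zip(rho_hat, rho_hat[1:])) / (len(rho_hat) - 1)

def pgl_objective(y, y_hat, rho_hat, weights, lam_r=1e-4, lam_phy=0.1):
    """Empirical MSE + L2 regularization + weighted physics-based loss, as in Eq. (3)."""
    empirical = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)
    reg = lam_r * sum(w * w for w in weights)
    return empirical + reg + lam_phy * physics_loss(rho_hat)
```

Crucially, this term only shapes the training objective; nothing in the architecture itself prevents a perturbed network from violating the constraint at test time.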
A major limitation of the PGL paradigm is that the choice of the neural network architecture is still black-box and not informed by physics. Even though minimizing a physics-based loss helps in physically constraining the search space of neural network weights during training, there are no architectural constraints in the neural network design that guarantee physically consistent predictions on unseen test instances.
Physics-guided architecture of neural networks has recently gained popularity in several domains. Leibo et al. [19] proposed network connections to incorporate the Hebbian rule of learning from neuroscience for view-tolerant facial detection. Another line of work has explored ways of embedding various forms of invariance in neural networks for problems in molecular dynamics [1] and turbulence modeling [20]. However, none of these developments are directly applicable to our problem of temperature prediction, where we need to encode physics available in the form of monotonic relationships and the presence of intermediate variables.
2.3 Uncertainty Quantification:
Uncertainty quantification (UQ) is critical for model evaluation in a number of scientific applications, where rather than producing point estimates of the target variable, it is preferable to have a distribution over its possible values. In our problem of lake temperature modeling, we wish to perform UQ to ascertain the amount of confidence we can place in our temperature predictions and their estimated impact on the populations of fish species and other ecological variables.
A standard approach for performing UQ in neural networks is to apply dropout [32] to the trained neural network weights in the testing phase to produce Monte Carlo samples of the target variable for every test instance, a technique called Monte Carlo (MC) dropout [7]. While there are other methods in Bayesian deep learning for UQ that directly estimate posterior probabilities using priors on network weights [6], they are generally slower than MC dropout. We use MC dropout in our approach to perform UQ for lake temperature modeling, although our proposed PGA innovations are generic and can be coupled with any other method for UQ in deep learning.

Note that every dropout network represents a slightly perturbed version of the trained ANN model. Ideally, we want every dropout network to produce physically consistent simulations of the target variable, so that the UQ analysis is physically meaningful. However, with black-box architectures we can easily obtain dropout networks that produce physically inconsistent solutions, because even the small amount of randomness injected by the dropout procedure may be sufficient to unlearn the physical consistency learned during training by the PGL paradigm. In contrast, by infusing physics directly into the neural network architecture, our proposed PGA paradigm has a better chance of ensuring physical consistency in every MC dropout sample.
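The MC dropout procedure described above can be sketched as follows. This is a generic toy, not the paper's implementation; the "model" here is a stand-in that simply averages its masked inputs:

```python
import random

random.seed(0)

def mc_dropout_samples(predict, x, n_samples=100, p_drop=0.2):
    """Keep dropout active at test time: for each sample, randomly zero
    units with probability p_drop (with inverted-dropout rescaling)
    and record the resulting prediction."""
    samples = []
    for _ in range(n_samples):
        masked = [v / (1.0 - p_drop) if random.random() > p_drop else 0.0 for v in x]
        samples.append(predict(masked))
    return samples

# Toy "network": the mean of its (masked) 50 input features.
x = [1.0] * 50
samples = mc_dropout_samples(lambda v: sum(v) / len(v), x)
mean = sum(samples) / len(samples)
```

The spread of `samples` is the uncertainty estimate; the PGA argument is that every one of these perturbed networks, not just their mean, should remain physically consistent.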
3 Proposed Framework
3.1 Overview of PGA-LSTM:
Figure 3 provides an overview of our proposed physics-guided architecture of LSTM (PGA-LSTM) for lake temperature modeling. It comprises three basic components: (i) an LSTM-based autoencoder that extracts temporal features from the data at a given time, (ii) a monotonicity-preserving LSTM that uses these temporal features along with additional depth-based features to predict an intermediate physical quantity, density, while ensuring that the predicted density never decreases with depth, and (iii) a multi-layer perceptron model that combines the density predictions with the input drivers to finally predict the temperatures. In the following, we describe these three components in detail and present an end-to-end learning procedure for the complete PGA-LSTM framework.

3.2 Temporal Feature Extraction:
The problem of lake temperature modeling can be viewed as a spatiotemporal sequential prediction problem. In order to develop a model which addresses both of these aspects simultaneously, we propose a simple yet effective method to incorporate spatiotemporal relationships into our model.
The autoencoder consists of two recurrent neural networks (RNNs), an encoder LSTM and a decoder LSTM, as shown in Figure 4. We construct an input sequence by augmenting the feature vectors of the 7 days preceding the target date with the target date's own feature vector. This input is then fed into the encoder, which generates a hidden representation for the target date. The decoder is then asked to reconstruct the entire input sequence from just this hidden representation. In order to do so, the representation must retain information about the sequential nature of the input data corresponding to the last week. Note that the dimensionality of the hidden representation is intentionally kept smaller than the input dimensionality. This design of the autoencoder is inspired by the earlier work of Srivastava et al. [33].

3.3 Monotonicity-preserving LSTM:
We build upon the basic LSTM architecture [11], which is designed to capture long-term and short-term memory effects in predicting a target sequence from an input sequence using a recurrent neural network (RNN) framework. The basic idea of LSTM is to remember information over arbitrarily long intervals by maintaining memory cell states and hidden states at every index of the sequence. The cell state is operated on by two neural network modules (or gates): the input gate and the forget gate, which can add or delete information in the cell state, respectively, and are learnable functions of the input features and the hidden state at the previous index. The cell state in turn affects the hidden state through a learnable output gate. Finally, the hidden state is mapped to estimates of the target variable using a stack of fully connected dense layers.
While LSTM explicitly captures recurrence relationships between hidden states at consecutive indices, and thus offers some smoothness in its predictions, the choice of the recurrence forms is quite arbitrary and not informed by physics. In our PGA-LSTM formulation, the target of the recurrence corresponds to a physically meaningful intermediate quantity, the density of water, which we know from Section 2.1 can only increase or remain constant with depth. To incorporate this knowledge of density–depth physics, we introduce novel physics-informed connections in the LSTM that enforce a monotonic recurrence relationship between the densities at consecutive depths. The proposed monotonicity-preserving LSTM architecture is shown in Figure 5 and described in the following.
The main innovation in our proposed architecture (marked in red in Figure 5) is to keep track not only of the hidden state h_d and cell state c_d at depth d, but also of the physical intermediate variable, density ρ̂_d, which we know can only increase with depth. Hence, we consider the problem of predicting only the positive increment in density, δ_d, as a function of h_d, which when added to ρ̂_{d−1} yields ρ̂_d. In particular, we apply a stack of dense hidden layers on h_d and pass the outputs through a ReLU activation function to predict positive values of δ_d. The complete set of equations for the forward pass of the monotonicity-preserving LSTM is given by:

  f_d = σ(W_f · [x_d, h_{d−1}] + b_f),
  i_d = σ(W_i · [x_d, h_{d−1}] + b_i),
  o_d = σ(W_o · [x_d, h_{d−1}] + b_o),
  c_d = f_d ∘ c_{d−1} + i_d ∘ tanh(W_c · [x_d, h_{d−1}] + b_c),
  h_d = o_d ∘ tanh(c_d),
  δ_d = ReLU(g(h_d)),
  ρ̂_d = ρ̂_{d−1} + δ_d,

where the last two equations represent the novel physics-informed innovations introduced in our proposed architecture compared to conventional LSTMs (highlighted in red in Figure 5), and g(·) denotes the stack of dense hidden layers applied to h_d. Here, σ denotes the sigmoid activation, ∘ denotes the Hadamard product, [x_d, h_{d−1}] denotes the concatenation of x_d and h_{d−1}, and W_k and b_k denote learnable weight and bias terms for all values of k.
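The physics-informed part of the forward pass, predicting a non-negative increment and accumulating it, can be sketched as follows (a one-layer stand-in for the dense stack; the weights and hidden states here are random, purely for illustration):

```python
import random

random.seed(1)

def relu(z):
    return max(z, 0.0)

def monotonic_density_pass(h_states, w, b, rho_surface=998.0):
    """Map each hidden state h_d to delta_d = ReLU(w . h_d + b) >= 0 and
    accumulate rho_d = rho_{d-1} + delta_d, so the predicted density
    profile can never decrease with depth, by construction."""
    rho, profile = rho_surface, []
    for h in h_states:
        delta = relu(sum(wi * hi for wi, hi in zip(w, h)) + b)
        rho += delta
        profile.append(rho)
    return profile

# 28 depths, 8 hidden units per depth.
h_states = [[random.gauss(0, 1) for _ in range(8)] for _ in range(28)]
profile = monotonic_density_pass(h_states, w=[random.gauss(0, 1) for _ in range(8)], b=0.0)
```

No matter how the weights are perturbed (e.g., by dropout), the ReLU guarantees δ_d ≥ 0, so monotonicity holds for every sampled network.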
While the physics-informed innovations in our PGA-LSTM model were specifically motivated by the density–depth physics of our target application, the idea of preserving monotonicity in LSTM outputs is useful in many other scientific applications. In general, our monotonicity-preserving LSTM framework can be used in any application where a target variable obeys monotonic constraints.
3.4 Mapping Density to Temperature:
Having computed density as an intermediate variable in our PGA-LSTM framework, mapping estimates of density at a given depth to estimates of temperature at that depth appears quite straightforward. Ideally, one could invert the temperature–density physics introduced in Section 2.1. However, the physical mapping from density to temperature is one-to-many and thus non-unique (see Figure 2(a)). In particular, a given value of density can be mapped to two possible values of temperature, one corresponding to the freezing phase (temperatures below 4°C) and the other corresponding to the warming phase (temperatures above 4°C). To address this, we learn the mapping from density to temperature directly from the data, by concatenating the predicted density with the input features and feeding the concatenated values to a stack of fully connected dense layers with a single output node predicting the target temperature. Since density is already a strong physical predictor of temperature, we do not need a deep architecture for this mapping and thus use a small number of hidden layers.
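The non-uniqueness of the inverse map can be checked numerically: scanning Eq. (1) over 0–30°C for a fixed density finds two matching temperatures, one on each side of 4°C (a small sketch; the scan step and tolerance are illustrative):

```python
def water_density(t):
    """Density of water (kg/m^3) at temperature t (deg C), per Eq. (1)."""
    return 1000.0 * (1.0 - (t + 288.9414) * (t - 3.9863) ** 2
                     / (508929.2 * (t + 68.12963)))

def temps_matching(rho, lo=0.0, hi=30.0, step=0.001, tol=1e-4):
    """All temperatures in [lo, hi] whose density is within tol of rho,
    keeping one representative per contiguous run of grid hits."""
    hits = []
    n = int((hi - lo) / step)
    for i in range(n + 1):
        t = lo + i * step
        if abs(water_density(t) - rho) < tol:
            if not hits or t - hits[-1] > 0.5:
                hits.append(t)
    return hits

# The density of water at 2 deg C is also attained near 6 deg C.
hits = temps_matching(water_density(2.0))
```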
3.5 End-to-end Learning Procedure:
One of the benefits of our PGA-LSTM framework is that along with predicting the target variable, temperature, it also produces estimates of a physical intermediate variable, density, as ancillary outputs. Further, during the training stage, ground-truth observations of temperature can be converted to ground-truth estimates of density using the one-to-one physical mapping from temperature to density in Equation (1). Hence, we perform end-to-end training of the complete PGA-LSTM model by minimizing the empirical loss over both temperature and density in the following learning objective:
Loss = (1/N) Σ_{d,t} (Y_{d,t} − Ŷ_{d,t})² + λ_ρ (1/N) Σ_{d,t} (ρ_{d,t} − ρ̂_{d,t})² + λ_R R(W),    (4)

where Y_{d,t} and ρ_{d,t} are the observed temperature and density values, respectively, at depth d and time t, N is the total number of observations, W is the combined set of weight and bias terms across all components of PGA-LSTM, and λ_ρ and λ_R are the trade-off parameters for the density prediction loss and the regularization loss, respectively.
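The end-to-end objective of Eq. (4) amounts to a weighted sum of two MSE terms plus regularization; a minimal sketch (the hyperparameter values are illustrative, not the paper's):

```python
def pga_lstm_loss(y, y_hat, rho, rho_hat, weights, lam_rho=0.5, lam_r=1e-4):
    """Temperature MSE + lam_rho * density MSE + lam_r * L2 penalty, as in Eq. (4)."""
    mse = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)
    return mse(y, y_hat) + lam_rho * mse(rho, rho_hat) + lam_r * sum(w * w for w in weights)
```

Because ground-truth density is derived from observed temperature via Eq. (1), the second term supervises the intermediate density neurons at no extra labeling cost.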
4 Evaluation Setup
4.1 Data and Experiment Design:
Our proposed PGA-LSTM model was trained and tested on two lakes that differ in depth, size, and climatic conditions. The first lake, Lake Mendota in Wisconsin, USA, is approximately 40 km² in surface area with a maximum depth of approximately 25 m. Lake Mendota is a dimictic lake with seasonal variation in water temperatures from 0°C in the winter to nearly 30°C in the summer. Lake Mendota thermally stratifies each spring and mixes in the fall before ice cover appears, typically from late December or early January until March or April. Thermal stratification is the process by which lake surface warming generates differences in temperature between waters closer to the surface and the colder waters below [35]. Temperature data for Lake Mendota were collected from an instrumented buoy situated near the deepest part of the lake. The buoy collected temperature observations every 0.5 m from the surface to 2 m and every 1 m from 2 m to 20 m (for a total of 23 depth locations). The overall data for Lake Mendota consisted of 35,213 observations. This buoy was removed from the lake in the fall and replaced in the spring to avoid ice damage. To supplement the dataset with additional temperature measurements during periods when the buoy was not operational, we added manual temperature observations from the NTL-LTER sampling program to generate a dataset spanning April 2009 to December 2017. As such, the temperature measurements varied both across depth and over time.
Falling Creek Reservoir (FCR) in Virginia, USA, is approximately 0.119 km² in surface area with a maximum depth of 9.3 m [4]. Similar to Mendota, FCR is also dimictic and has seasonal variation in water temperatures from 0°C in the winter to nearly 30°C in the summer. FCR thermally stratifies each spring and mixes in the fall. Ice cover occurs at FCR each winter, but the duration of ice varies substantially from year to year, from a few weeks to multiple months, depending on winter weather. Water temperature data for FCR were collected between 2013 and 2018 using manual casts of a CTD (Conductivity, Temperature, and Depth) Sea-Bird profiler at the deepest site of the reservoir. CTD profiles were generally collected weekly to sub-weekly from April to October, monthly from October to December, and then intermittently in the winter months. The CTD profiler collects observations at 4 Hz as it is lowered through the water column, resulting in 1 cm resolution data. To standardize data among CTD profiles over time, the water temperature data were discretized to every 0.33 m for a total of 28 measurement depths. The overall data for FCR consisted of 7,588 observations [5].
For both lakes, the features consisted of: day of year, depth, air temperature, shortwave radiation, longwave radiation, relative humidity, wind speed, rain, growing degree days, whether the lake was frozen, and whether it was snowing. With the exception of depth, all driver data were measured or calculated from meteorological datasets, and thus remained constant across all depths for a particular time. Simulated water temperature output from a physics-based model (GLM), using the features above as inputs, was also used as an additional feature.
We partitioned the data into two contiguous time windows to be used for training and testing, such that there is no temporal autocorrelation between the training and test sets. The first four years were used for training for both lakes and the remaining years for testing. For the training subset selection, we randomly selected dates and cumulatively added the number of observations for each date until we reached the required number of observations for a given training fraction. Both the input features and the density outputs were normalized to zero mean and unit standard deviation. The main temperature output was not normalized.² (²The code for PGA-LSTM is available on GitHub: https://github.com/arkadaw9/PGA_LSTM.)

4.2 Model Specifications:
We used the previous 7 days of data to extract a 5-dimensional temporal embedding for each date. An LSTM keeps updating its memory from the previous cell state to the next, i.e., it accumulates history and uses that information to predict more accurately. Thus, an LSTM needs a few initial cells to build up its memory before it can predict outputs for the full sequence. For our proposed monotonicity-preserving LSTM, we handle this by padding the sequence, copying the features at the surface of the lake as padding values. A padding of size 10 was used for both lakes, and Lake Mendota and FCR were discretized into 50 and 28 depth intervals, respectively. The number of recurrent units in the monotonicity-preserving LSTM was set to 8, followed by two dense layers, each with 5 hidden neurons and ELU activations, and a final single-neuron layer to predict the density increment δ. The δ values are then used to update the intermediate physical variable, density. The second component of the PGA-LSTM, which maps density to temperature, comprises another set of two dense layers, again each with 5 hidden neurons and ELU activations. A dense layer with a single neuron was used to predict the final temperature. To obtain uncertainty estimates, the dropout method was used during the testing phase [7]. We used a dropout probability of 0.2, and for each input we randomly created 100 different dropout networks to obtain a distribution over the model outputs.

4.3 Evaluation Metrics:
We consider the root mean square error (RMSE) of a model on the test set as a metric of the generalizability of the model. We also consider Physical Inconsistency as another evaluation metric, defined as the fraction of times the MC sample predictions at consecutive depths are physically inconsistent, i.e., they violate the density–depth relationship. We used a small tolerance value (in kg/m³) to decide whether a difference in density across consecutive depths is physically inconsistent.

4.4 Baselines:
We chose the following baselines for comparison. (1) An LSTM with an architecture similar to PGA-LSTM: the black-box LSTM has 8 memory units followed by four dense layers with 5 hidden units each, followed by a final dense layer with one unit. (2) A PGL-LSTM with an architecture similar to the LSTM but trained with physics-guided learning in the form of loss functions; the PGL-LSTM minimizes a physics-based loss that is evaluated by computing the physical inconsistency of the outputs at consecutive depths. Note that we did not consider the PGRNN approach [12] for lake temperature modeling using physics-guided learning in RNNs as a baseline in this paper for two main reasons. First, this paper only considers density–depth relationships, while PGRNN also considers energy conservation. Second, PGRNN builds RNNs in the time dimension, whereas we build the LSTM in the depth dimension to make use of the density–depth relationship directly in the physics-guided architecture.
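The Physical Inconsistency metric of Section 4.3 can be sketched as the fraction of consecutive-depth pairs, pooled over all MC sample profiles, where density decreases by more than a tolerance (the tolerance value below is illustrative):

```python
def physical_inconsistency(profiles, tol=1e-5):
    """Fraction of consecutive-depth pairs across all sample density
    profiles where density drops by more than tol kg/m^3."""
    violations = total = 0
    for profile in profiles:
        for shallow, deep in zip(profile, profile[1:]):
            total += 1
            violations += (shallow - deep) > tol
    return violations / total

consistent = [[998.0, 998.5, 999.1], [997.9, 998.0, 998.0]]
mixed = [[998.0, 997.0, 999.0]]
```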
5 Results
5.1 Comparing PGA-LSTM with Baselines:
Table 1: Test RMSE (in °C) and Physical Inconsistency of LSTM, PGL-LSTM, and PGA-LSTM on Lake Mendota, evaluated both per MC sample (Per Sample) and on the mean of the MC samples (Mean).
Table 2: Test RMSE (in °C) and Physical Inconsistency of LSTM, PGL-LSTM, and PGA-LSTM on Falling Creek Reservoir, evaluated both per MC sample (Per Sample) and on the mean of the MC samples (Mean).
Tables 1 and 2 compare the performance of PGA-LSTM with baseline methods on Lake Mendota and FCR, respectively. We are interested in two evaluation metrics: Test RMSE and Physical Inconsistency of the predicted temperature profiles. Both metrics can be evaluated either on the individual MC samples (referred to as Per Sample) or on the mean of the MC samples (referred to as Mean). (Note that we are interested in Per Sample evaluation because every MC sample must be accurate and physically consistent for the Mean results to be meaningful in scientific applications.) We can see from Table 1 that on Lake Mendota, LSTM has a fairly high Per Sample Test RMSE, illustrating the limitations of black-box models in achieving good generalizability. Further, the Per Sample Physical Inconsistency of LSTM is 0.32, which indicates that the MC samples generated from LSTM are physically inconsistent 32% of the time. Such samples, even if they match the test labels, represent physically non-meaningful estimates that cannot be used for subsequent scientific studies, including uncertainty quantification. If we consider the mean of the MC samples generated from LSTM, we obtain a lower Mean Test RMSE, due to the cancellation of noise through aggregation. However, the Mean Physical Inconsistency of LSTM is still fairly high.
If we employ the PGL paradigm, we see that PGL-LSTM shows little to no improvement in RMSE over LSTM. A more serious concern is that PGL-LSTM also provides little to no improvement in the physical consistency of the prediction samples obtained via MC dropout. Note that every dropout network represents a slightly perturbed version of the trained neural network model. Ideally, we want every dropout network to produce physically consistent simulations of the target variable, so that the UQ analysis is physically meaningful. However, with black-box architectures it is highly likely to obtain dropout networks that produce physically inconsistent solutions even after using the PGL paradigm. This is because the dropout procedure effectively injects a small amount of randomness into the neural network weights, which may be sufficient to unlearn the physical consistency introduced during training by the PGL paradigm. In contrast to the baseline methods, our proposed PGA-LSTM model shows the smallest Per Sample Test RMSE while always preserving physical consistency, even after performing MC dropout. Note that the number of learnable parameters in PGA-LSTM is very similar to that of all baseline models. For example, LSTM and PGA-LSTM use the same number of features in the hidden states and have the same number of dense hidden layers. However, in the PGA-LSTM architecture, one of the hidden neurons in the network is explicitly trained to express a physically meaningful quantity, density, which is known to be a key intermediate variable in the mapping from input drivers to temperature. Further, PGA-LSTM explicitly encodes physics-informed connections in the LSTM to maintain the monotonic recurrence between density and depth.
These physics-based architectural changes ensure that the learned neural network is physically consistent and robust to minor perturbations in the network weights, thus demonstrating better generalization power. Table 2 shows similar trends in the results of PGA-LSTM w.r.t. baseline methods on FCR. We can see that PGA-LSTM reduces the Per Sample Test RMSE from 2.96°C for LSTM to 2.19°C, a significant improvement in the accuracy of the MC samples generated by our proposed method. Further, since the Physical Inconsistency of PGA-LSTM is close to 0, our proposed method produces meaningful samples that can be employed by lake scientists for subsequent analyses of lake processes that are affected by lake temperature, such as the growth and survival of fish species. By taking the mean of the MC samples, the Mean Test RMSE of PGA-LSTM further reduces to 1.88°C.
5.2 Effect of Varying Training Size:
To demonstrate the effect of reducing training size on the accuracy of the comparative models, Figure 6 shows the Per Sample Test RMSE of PGA-LSTM, PGL-LSTM, and LSTM at varying training fractions. The test RMSE of all methods increases as we reduce the amount of data available for training on both Lake Mendota and FCR. However, PGA-LSTM shows the lowest Test RMSE at all training fractions on both lakes. While Lake Mendota and FCR are heavily studied water bodies, the majority of lakes in the USA (and the world) suffer from a limited number of observations. Hence, by comparing models under scarcity of training data on these lakes, we intend to simulate real-life scenarios on other unseen lakes where temperature models have to be deployed. Note that on FCR, the rate of increase in RMSE as training size is reduced is lowest for PGA-LSTM among all methods. This resonates with the fact that the need for introducing physics to achieve better test RMSE is greater at smaller training sizes, when black-box models have higher risks of overfitting and learning spurious solutions.
6 Analysis of Results
6.1 Visualizing Temperature Profiles:
Beyond analyzing the performance of PGA-LSTM in terms of the two evaluation metrics, here we visualize the sample temperature profiles predicted by our proposed method in comparison with the baselines, to assess the physical validity of our results. Figures 7(a), 7(b), and 7(c) show plots of 15 sample temperature profiles generated by the comparative models on a representative test date, October 15, 2013, in Lake Mendota, when trained on 40% of the data. These 15 samples were selected at random from the pool of dropout MC samples generated on this test date across all 10 random runs of training. We can observe that the sample profiles of LSTM and PGL-LSTM are highly physically inconsistent (i.e., the temperature profiles show no monotonic behavior with depth), even if they appear close to the ground-truth observations on this date. Hence, despite their RMSE values, a lake scientist will have lower confidence in trusting their results and using them in subsequent scientific analyses. In contrast, PGA-LSTM produces sample profiles that are always physically consistent, and thus are useful from a domain perspective.
To analyze the validity of the MC profiles in capturing the uncertainty around temperature predictions, Figures 7(d), 7(e), and 7(f) show the mean and variance of the comparative models for the complete pool of dropout samples generated on this test date across all 10 random runs of training. The error bars in these plots have been generated using two standard deviations around the mean, thus capturing approximately 95% of the samples under a Gaussian assumption. We can see that the mean profiles of LSTM and PGL-LSTM are close to the ground-truth, and the error bars engulf the ground-truth in the shallower portion of the lake. However, as we move to the deeper portion of the lake, the ground-truth observations start to depart from the distribution of samples generated by LSTM and PGL-LSTM and escape outside the error bars. On the other hand, the distribution of samples generated using PGA-LSTM accurately envelops the ground-truth observations at every portion of the lake, irrespective of depth. (While Figure 7 provides results over a single test date in Lake Mendota, videos of the results for both lakes over all test dates are available at the following link: click here.)
Notice that in contrast to the baseline methods, the variance of PGA-LSTM samples gradually increases with depth. The challenge in predicting temperatures at greater depths is that the model's set of input features is not adequate. For example, improved water clarity in some years makes the bottom of the lake warm faster. Since we do not have water clarity as an input feature, it is harder for the model to predict the dynamics of the lake at greater depths in unseen test years. Hence, by reporting higher uncertainty at greater depths, PGA-LSTM is exhibiting physically meaningful behavior that can be explained using domain understanding.
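The mean-and-error-bar construction used in Figures 7(d)-7(f) can be sketched as follows, here illustrated with synthetic samples whose spread grows with depth (all names and the synthetic data are ours):

```python
import numpy as np

def mc_dropout_bands(mc_samples):
    """Given MC dropout samples of shape (n_samples, n_depths), return the
    mean profile and +/- 2 standard-deviation bounds (roughly a 95%
    interval under a Gaussian assumption)."""
    mean = mc_samples.mean(axis=0)
    std = mc_samples.std(axis=0)
    return mean, mean - 2.0 * std, mean + 2.0 * std

# Synthetic pool: 100 samples over 6 depths, with more spread at depth,
# mimicking the depth-dependent uncertainty discussed above.
rng = np.random.default_rng(1)
spread = np.linspace(0.2, 1.0, 6)                  # std grows with depth
samples = 15.0 + rng.normal(0.0, 1.0, size=(100, 6)) * spread
mean, lo, hi = mc_dropout_bands(samples)
print(np.all(lo < mean) and np.all(mean < hi))     # -> True
```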
6.2 Assessing Uncertainty Estimates:
While Figure 7(f) shows that the ground-truth observations are contained within the samples of PGA-LSTM on a given date in Lake Mendota, we attempt to quantitatively assess the validity of our uncertainty estimates as compared to baseline methods across all dates. Ideally, if the uncertainty estimates produced by a model are valid, we should expect the distribution of its samples to accurately match the distribution of ground-truth observations on test points. In other words, if we look at the p-th percentile of samples generated by an ideal model, then we should expect p% of ground-truth test points to fall within it. To capture this idea, we first fit a Gaussian distribution on the complete pool of samples generated by a model at a test point, and then estimate the two-tailed percentile of the ground-truth observed at that point. Figure 8 plots the cumulative percentage of ground-truth observations (y-axis) that fall within a certain percentile of samples generated by the comparative models (x-axis). The ideal model is represented by the diagonal line y = x, where the percentage of ground-truth points within a percentile is equal to the percentile value. Models that are overconfident would have fewer ground-truth points within a certain percentile and hence would lie below the diagonal. Conversely, models that are underconfident would reside above the diagonal. We can see from Figure 8 that the baseline models, LSTM and PGL-LSTM, lie just below the diagonal and hence produce slightly overconfident uncertainty estimates, i.e., the distribution of ground-truth points sometimes falls outside the distribution of MC samples. On the other hand, PGA-LSTM lies just above the diagonal and hence produces slightly larger uncertainty estimates than what is ideally expected. Note that in the absence of any information about the ground-truth on test dates, it is generally desirable to be slightly underconfident and produce wider uncertainty bounds than to be overconfident with narrower bounds. Further, even though the uncertainty bounds of PGA-LSTM are wider, it produces lower Per Sample Test RMSE than all other baseline methods. This illustrates that PGA-LSTM produces reasonably good uncertainty estimates that capture the distribution of the ground-truth observations.
7 Conclusions and Future Work
This paper explored an emerging direction in theory-guided data science to move beyond black-box neural network architectures and design physics-guided architectures (PGA) of neural networks that are informed by physics. We specifically developed a novel PGA-LSTM model for the problem of lake temperature modeling, where we designed a monotonicity-preserving LSTM module to predict physically consistent densities. We compared our PGA-LSTM model with baseline methods to demonstrate its ability to produce generalizable and physically consistent solutions, even after making minor perturbations in the network weights with the Monte Carlo (MC) dropout method for uncertainty quantification.
Future work will explore applications of the proposed PGA-LSTM in other scientific problems that show monotonic recurrence relationships, as well as the effect of other state-of-the-art uncertainty quantification methods on physics-guided architecture models. Extensions of the PGA framework for capturing more complex forms of physical relationships in space and time will be explored as well. Future work can also study the impact of PGA-LSTM on the physical interpretability of neural networks, since the features extracted at the hidden layers of the network correspond to physically meaningful concepts.
References
 [1] Brandon Anderson, Truong-Son Hy, and Risi Kondor. Cormorant: Covariant molecular neural networks. arXiv preprint arXiv:1906.04015, 2019.
 [2] Tim Appenzeller. The scientists’ apprentice. Science, 357(6346):16–17, 2017.
 [3] Peter M Caldwell, Christopher S Bretherton, Mark D Zelinka, Stephen A Klein, Benjamin D Santer, and Benjamin M Sanderson. Statistical significance of climate sensitivity predictors obtained by data mining. Geophysical Research Letters, 41(5):1803–1808, 2014.
 [4] CC Carey, JP Doubek, RP McClure, and PC Hanson. Oxygen dynamics control the burial of organic carbon in a eutrophic reservoir. Limnology and Oceanography Letters, 3:293–301, 2018.
 [5] CC Carey, RP McClure, AB Gerling, JP Doubek, S Chen, ME Lofton, and KD Hamre. Time series of high-frequency profiles of depth, temperature, dissolved oxygen, conductivity, specific conductivity, chlorophyll a, turbidity, pH, and oxidation-reduction potential for Beaverdam Reservoir, Carvins Cove Reservoir, Falling Creek Reservoir, Gatewood Reservoir, and Spring Hollow Reservoir in southwestern Virginia, USA, 2013–2018. Environmental Data Initiative, link to dataset, 2019.
 [6] Meire Fortunato, Charles Blundell, and Oriol Vinyals. Bayesian recurrent neural networks. arXiv:1704.02798, 2017.
 [7] Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In ICML, pages 1050–1059, 2016.
 [8] D GrahamRowe, D Goldston, C Doctorow, M Waldrop, C Lynch, F Frankel, R Reid, S Nelson, D Howe, SY Rhee, et al. Big data: science in the petabyte era. Nature, 455(7209):8–9, 2008.
 [9] Ted D Harris and Jennifer L Graham. Predicting cyanobacterial abundance, microcystin, and geosmin in a eutrophic drinking-water reservoir using a 14-year dataset. Lake and Reservoir Management, 33(1):32–48, 2017.
 [10] Matthew R Hipsey, Louise C Bruce, Casper Boon, Brendan Busch, Cayelan C Carey, David P Hamilton, Paul C Hanson, Jordan S Read, Eduardo de Sousa, Michael Weber, et al. A general lake model (glm 3.0) for linking with highfrequency sensor data from the global lake ecological observatory network (gleon). Geoscientific Model Development, 12(1):473–523, 2019.
 [11] Sepp Hochreiter and Jürgen Schmidhuber. Long shortterm memory. Neural computation, 9(8):1735–1780, 1997.
 [12] Xiaowei Jia, Ankush Khandelwal, Guruprasad Nayak, James Gerber, Kimberly Carlson, Paul West, and Vipin Kumar. Predict land covers with transition modeling and incremental learning. In SDM, 2017.
 [13] Xiaowei Jia, Jared Willard, Anuj Karpatne, Jordan Read, Jacob Zwart, Michael Steinbach, and Vipin Kumar. Physics guided rnns for modeling dynamical systems: A case study in simulating lake temperature profiles. In SDM, pages 558–566. SIAM, 2019.
 [14] TO Jonathan, AM Gerald, et al. Special issue: dealing with data. Science, 331(6018):639–806, 2011.
 [15] Anuj Karpatne, Gowtham Atluri, James H Faghmous, Michael Steinbach, Arindam Banerjee, Auroop Ganguly, Shashi Shekhar, Nagiza Samatova, and Vipin Kumar. Theoryguided data science: A new paradigm for scientific discovery from data. IEEE Transactions on Knowledge and Data Engineering, 29(10):2318–2331, 2017.
 [16] Anuj Karpatne, William Watkins, Jordan Read, and Vipin Kumar. Physics-guided neural networks (PGNN): An application in lake temperature modeling. arXiv preprint arXiv:1710.11431, 2017.
 [17] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, pages 1097–1105, 2012.
 [18] David Lazer, Ryan Kennedy, Gary King, and Alessandro Vespignani. The Parable of Google Flu: Traps in Big Data Analysis. Science, 343(6176):1203–1205, March 2014.

 [19] Joel Z Leibo, Qianli Liao, Fabio Anselmi, Winrich A Freiwald, and Tomaso Poggio. View-tolerant face recognition and Hebbian learning imply mirror-symmetric neural tuning to head orientation. Current Biology, 27(1):62–67, 2017.
 [20] Julia Ling, Andrew Kurzawski, and Jeremy Templeton. Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. Journal of Fluid Mechanics, 807:155–166, 2016.
 [21] John J Magnuson, Larry B Crowder, and Patricia A Medvick. Temperature as an ecological resource. American Zoologist, 19(1):331–343, 1979.
 [22] Gary Marcus and Ernest Davis. Eight (no, nine!) problems with big data. The New York Times, 6(04):2014, 2014.
 [23] Pablo MárquezNeila, Mathieu Salzmann, and Pascal Fua. Imposing hard constraints on deep networks: Promises and limitations. arXiv:1706.02025, 2017.
 [24] James L Martin and Steven C McCutcheon. Hydrodynamics and transport for water quality modeling. CRC Press, 1998.
 [25] James M Murphy, David MH Sexton, David N Barnett, Gareth S Jones, Mark J Webb, Matthew Collins, and David A Stainforth. Quantification of modelling uncertainties in a large ensemble of climate change simulations. Nature, 430(7001):768, 2004.
 [26] Hans W Paerl and Jef Huisman. Blooms like it hot. Science, 320(5872):57–58, 2008.
 [27] Deepak Pathak, Philipp Krahenbuhl, and Trevor Darrell. Constrained convolutional neural networks for weakly supervised segmentation. In ICCV, pages 1796–1804, 2015.
 [28] Frank J Rahel and Julian D Olden. Assessing the effects of climate change on aquatic invasive species. Conservation biology, 22(3):521–533, 2008.
 [29] James J Roberts, Kurt D Fausch, Mevin B Hooten, and Douglas P Peterson. Nonnative trout invasions combined with climate change threaten persistence of isolated cutthroat trout populations in the southern rocky mountains. North American Journal of Fisheries Management, 37(2):314–325, 2017.
 [30] James J Roberts, Kurt D Fausch, Douglas P Peterson, and Mevin B Hooten. Fragmentation and thermal risks from climate change interact to affect persistence of native trout in the colorado river basin. Global Change Biology, 19(5):1383–1398, 2013.
 [31] Terrence J Sejnowski, Patricia S Churchland, and J Anthony Movshon. Putting big data to good use in neuroscience. Nature neuroscience, 17(11):1440–1441, 2014.
 [32] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. JMLR, 15(1):1929–1958, 2014.
 [33] Nitish Srivastava, Elman Mansimov, and Ruslan Salakhutdinov. Unsupervised learning of video representations using lstms. In ICML, pages 843–852, 2015.
 [34] Russell Stewart and Stefano Ermon. Label-free supervision of neural networks with physics and domain knowledge. In AAAI, 2017.
 [35] Robert G Wetzel. Limnology: Lake and river ecosystems. Academic Press, 2001.
 [36] Jingyi Xu, Zilu Zhang, Tal Friedman, Yitao Liang, and Guy Van den Broeck. A semantic loss function for deep learning with symbolic knowledge. arXiv preprint arXiv:1711.11157, 2017.