1 Introduction
Machine Learning (ML) has the potential to significantly speedup scientific discoveries and transform all branches of science and engineering [lee2018basic]. To employ machine learning in physical problems, theory and domain knowledge can be leveraged to setup domainaware models. This approach which is recently coined as Scientific Machine Learning (SciML) [lee2018basic], Theoryguided Machine learning (TGML) [karpatne2017theory, wagner2016theory, zobeiry2019TGML1, zobeiry2019TGML2] or Physicsinformed Deep Learning [raissi2017physics1, raissi2017physics2], offers many advantages over traditional blackbox machine learning methods. This includes capability to train models with small and fragmented data (which is often the case in physical problems) [karpatne2017theory], setup interpretable models with physicsbased architecture, and ensure wellposedness and physical consistency in solution [lee2018basic]
. A variety of approaches have been recently explored in the scientific literature to incorporate theory and domain knowledge in machine learning including imposing physical constraints or symmetries in models, enriching loss functions using physicsbased correlations, and feature transformation based on governing physical laws
[wagstaff2001constrained, wang2017physics, karpatne2017physics, bird2007transport].Any physical phenomenon such as heat transfer, turbulence flow, wave propagation or radiation damping, can be described using a set of partial differential equations usually on the basis of conservation laws such as energy, mass or momentum
[feynman2011feynman]. In some specific cases, these equations can be solved to obtain closedform solutions representing underlying correlations and physics of the problem (e.g. steadystate heat transfer) [bergman2011fundamentals]. Generally speaking, one might also express equivalent solutions using phenomenological models that are consistent with the underlying theory but are obtained through empirical observations and measurements.Considering a specific physical phenomenon, instead of setting up partial differential equations or developing empirical solutions, one might employ machine leaning algorithms to train neural networks based on measurable inputs (i.e. features) and outputs (e.g. temperature). When the training is performed using transformed inputs that describe the mathematics underlying the physical phenomenon (e.g. a combination of the variables that is present in the mathematical expression) [karpatne2017theory], the neural network trains more quickly and the errors in predicting outputs are decreased. Given this understanding, it must be possible to identify combination of variables that are present in the underlying methodical expression: train a neural network with the input data augmented by an expression that is a combination of inputs, and the rate at which the each neural network trains should be a measure of how well the combined variable represents components of the mathematical expression. This can be done in the absence of any knowledge of the meanings of the variables and without any knowledge of the components of the mathematical expression (i.e. theoryagnostic). What we are therefore seeking to do is to introduce a new method for discovering theories underlying physical phenomena, by employing an iterative and physicsbased feature transformation to find components of the mathematical expression of the physical problem. This is done on the basis of commonly observed functional forms of physical phenomena. To explore the applicability of the method, in a blind approach, three demonstrations from different physical domains are considered in this paper. Results showed that the proposed method is a promising tool to uncover underlying expressions, leading to constructing closedform solutions.
2 Distilling Functional Forms of Physical Phenomena
Considering the immense body of theory in various fields of science and engineering, it becomes a daunting task to distill functional forms of available expressions and closedform solutions. However, families of similar forms can be observed:

Crosses: in many theoretical fields, such as heat transfer and fluid mechanics, specific unitless crosses or nondimensional numbers are shown to highly influence the physics of the problem. Not surprisingly, these numbers appear in many closedform solutions [bergman2011fundamentals]. Mathematically speaking, often times these are nothing but crosses of original features. Prime example is the Reynolds number, Re, in fluid mechanics that is used to identify fluid regimes (i.e. laminar or turbulence), and appears in countless closedform solutions [bird2007transport]:
(1) Here Re is expressed as the cross of three original features: velocity, , characteristic length, , and kinematic viscosity, .

Power functions: This is also a very common functional form in physical problems. Often times, crosses and power functions appear together. In solid mechanics, for example, deformation of a simply supported beam, , under a center load, , is obtained on the basis of the Timoshenko beam theory [lubliner2016introduction]:
(2) Here E and G are related to beam material properties, I, A and L are related to beam dimensions, and k is a constant (i.e. shear correction factor).

Exponential forms: This is a widely observed functional form in various fields including statistical mechanics, heat and mass transfer, solid mechanics and polymer engineering. Prime example is exponential decay such as Arrhenius equation that describes the dependency of rates of a chemical reactions, , on temperature, T [ladd1998introduction]:
(3) In this equation, and are empirically measured constants and R is the universal gas constant. Other examples include Newton’s law of cooling in heat transfer [bergman2011fundamentals] and Boltzmann’s law in statistical mechanics [feynman2011feynman].

Others: Other common types of functional forms include hyperbolic, trigonometric and logarithmic. These are listed in Table 1.
Category  Examples 

Crosses  Reynold number in fluid mechanics [bird2007transport]: 
Power  Beam bending in solid mechanics [lubliner2016introduction]: 
Exponential  Exponential decay in chemical reactions [ladd1998introduction]: 
Hyperbolic  Fin convective heat transfer rate [bergman2011fundamentals]: 
Trigonometric  Capillary adhesion of two flat plates [de2013capillarity]: 
Logarithmic  Isothermal entropy change of an ideal gas [bent1965second]: 
3 Method
We sought to explore the idea described above through a singleblind study in which one researcher created data corresponding to a mathematical expression describing a physical phenomenon (i.e. a known theory result), flatten that data to remove all description of what the variables indicated (i.e. substituted generic variable names and removed all units) and provided the set of inputs and outputs to the second researcher. The second researcher trained a series of neural networks utilizing the inputs and outputs and one additional combination of the variables as a unique input to each neural network. It was quickly determined that this approach would not be successful in any reasonable amount of time. The computational effort required to train a neural network repeatedly, in order to determine the rate at which the training occurs, and to do this for each possible combination of input variables was far too much. A lower cost method of screening potential combinations of variables was needed.
That method was a correlation analysis. Using Python and sklearn library, a first program was written to iteratively develop functional expressions of input variables. A second program took these functional expressions of input variables and determined the degree to which the outputs were correlated to each of those functional expressions. The most correlated functional expressions were then used individually to augment the inputs into a neural network which was then trained for an increasing size of training data in order to determine the rate at which the neural network was trained with that additional input. These steps are described below:

For a given problem, initially the original n features () were mathematically transformed to create a library of base functions, , as listed in Table 2:
Category Initial Base Functions (Z) Original Cross/Power , Exponential , Hyperbolic Trigonometric ) ,) ,) ,) ,) ,) ) ,) ,) ,) ,) ,) Table 2: List of base functions, Z, created from original features 
All base functions and outputs (i.e. vectors) were normalized using their respective mean,
, and standard deviation,
, values:(4) 
The covariance of each output vector and base function was calculated to obtain their correlations:
(5) 
For some cases, top 1020 highest correlated base functions were used to calculate new base functions:
(6) In these functions, constants are obtained using linear regression:
(7) For some cases, this process of creating new bases functions from previous set of functions was repeated couple of times.

From the list of highest correlated base functions, random combination of functions (2 to 6) were selected as transformed inputs to train a neural network with Fully Connected (FC) layers to predict the output. A highLevel API, estimator, in TensorFlow was used for all training. For a given dataset, size of training dataset was incrementally increased in consecutive training to study the effect of dataset size. List of all hyperparameters are given in Table
3.Parameter Value Neurons 3 hidden layers, 5 nodes per layer Activation Optimization Proximal Adaptive Gradient Learning rate Bathsize 50 Steps 25000 Table 3: Hyper parameters for training of Neural Networks 
After all training, base functions that were creating the biggest hits in the output (i.e. lowest value of error, RMSE) were chosen and the process was repeated to find all related base functions.
4 Demonstrations
4.1 Demonstration 1
A set of 2000 data, comprising six inputs ( and one output () consistent with an unidentified theory, were provided by researcher one. To preserve the anonymity of the data, units for the inputs and outputs were not provided.
An observation from the correlation analysis was that two of the original inputs, , were only randomly correlated with the output. Reducing the number of inputs from 6 to 4, greatly reduced the number of functions that would be built by the function generator, allowing to explore the iterative process more effectively. The correlation analysis was fast enough that large number of base functions could be analyzed quickly on a quadcore Intel i7 processor. From this first analysis, three functions showed the highest correlations to output: , , with as the top base function with 99.4% correlation to output. In Figure 1 and Figure 2, results from the Principle Component Analysis (PCA) for two different functions, one highly correlated, , and one randomly correlated to output, , are shown. From this first analysis, it was observed that results from correlation analysis was dominated by various derivations of functions. To reduce the dependency of output, , to base function, new output was created using the following equation:
(8) 
With this new output, PCA analysis was performed again and different sets of highly correlated functions were obtained. This process was repeated for the third time. Results from these analysis to identify top correlated functions in each step are shown in Table 4.
First PCA trial  Second PCA trial  Third PCA trial 

To establish a baseline, the recipient of the data trained a neural network using the six original provided inputs. This model was trained several times using successively more data points ranging from 100 to 2000. The Root Square Mean Error (RSME) of the predictions of each trained neural network were recorded from each training. Random combinations of functions were then selected from the list in Table 4 and NN training with different dataset sizes were performed again. RMSEs of several trainings for 4 combinations of inputs are compared in Figure 3. This shows that a combination of base functions gives the lowest error in NN training.
Even though NN training was used in this demonstration, it was not greatly advantageous to do so. If the intention is to discover a combination of variables included in the mathematics that describes a physical phenomenon, finding a base function that is 99.4% correlated to the output meets the objective without relying on neural networks. This was suspected, however, to be related to the simplicity of the mathematical expression in the first demonstration. The equation used for demonstration one describes the onedimensional problem of motion under constant acceleration:
(9) 
Here, is initial position of a movable object, is the initial velocity of the object, is the constant acceleration and is time. The function generator achieved a correlation better than 99% but not 100% due to the limitations of the function generator itself. Later an iterative capability was built into the function generator it was able to produce a function that had precisely 100% correlation to the data as a result of precisely matching the equation used without relying on NN training.
5 Demonstration 2
Similar to previous demonstration, a dataset of 6 inputs ( and 1 output () with 2000 data points were provided by the first researcher. Based on the success of the function generator for demonstration one, it was expanded to include more functional forms of the input variables. Specifically, the following base function was added to the list in Table 2:
(10) 
Using correlation analysis, the function generator created a ranked list of expressions all highly correlated to output. Top correlated functions are listed here:
(11) 
This shows a common form in all the highly correlated functions. A second iteration of correlation analysis revealed an expression with 99.5% correlation to the output:
(12) 
The first researcher with knowledge of the mathematical expression used to generate the data, found that the solution did bear resemblance to the desired expression, but had some problems. In particular there was unit mismatch, and also a component of the equation missing from the suggested expression. The equation used for demonstration two was an expression for the Nusselt number describing the heat transfer coefficient for forced convection over a circular cylinder. This equation was described by Churchill and Bernstein [churchill1977correlating] as:
(13) 
In which following nondimensional numbers are used:
(14) 
In above equations, is the heat transfer coefficient, is the cylinder diameter, is fluid thermal conductivity, is the fluid velocity, is the fluid density and is the specific heat capacity. The best correlated solution from the function generator (Equation 12) was:
(15) 
Which can be rewritten as:
(16) 
Unit mismatch is possible to prevent. Given that the intention of this research is to show a new method for discovering theories based on data, it is not necessary that researchers strip away the unit information from the data. Unit information can be used to establish a set of constraints within the function generator such that unit mismatch will not occur. Evaluation of the second demonstration was not repeated with constraints on the function generator based on units, but this approach was carried forward to the later demonstrations.
The equation used in the second demonstration was an empirical equation determined as a fit to experimental data. This likely makes the equation inappropriate for inclusion in this study. The data generated for the second demonstration, as a simulated experiment, precisely matches the empirical equation which is only an approximate fit to actual experimental data. It is very possible that the empirical equation used in the second demonstration includes terms to help it match the data well, but that are not actually representative of the underlying physical phenomenon. Rather than attempting to improve the results for the second demonstration, given its potential inapplicability to this research, the research moved to a third demonstration.
6 Demonstration 3
For the third demonstration, the researchers included knowledge of the units of the variables. Similar to previous demonstrations, a dataset of 6 inputs ( and 1 output () with 2000 data points were provided by the first researcher. The output function was unitless but was known to be the cotangent of an angle, one input variable was an angle (), one input variable was a normalized speed (
, and all other input variables were dummy variables which were immediately removed from the study.
This resulted in a number of constraints to the function generator, allowing to generate a small number of base functions as listed in Table 5.
Category  Initial Base Functions (Z) 

Cross/Power  
Trigonometric  ) , ) , ) ,) , ) ,) 
Returning to the original research concept in this paper, a neural network was trained repeatedly with combinations of candidate variables. In each instance, the neural network was trained using one function of and one function of . The models were ranked according to their RMSEs. Considering the 40 models that resulted in predictions with the lowest error, a set of six functions (two of the angle variable and four of the speed variable) were identified as most likely contributing to the efficacy of the neural network:
(17) 
These six functions were then compared with the mathematical expression used to generate the data for demonstration three, which is an expression of the angle of deflection of supersonic air of a given velocity affected by a shockwave at a given shock angle [potter2016mechanics]:
(18) 
For which is the Mach number, is the oblique shock angle, is the deflection angle and is a constant (heat capacity ratio). At first glance, the results were unsatisfying; only one of the six functions was explicitly used in the equation, the other five were not, and one of the functions of the input variables used repeatedly in the equation was not captured in the set of six. However it was noted that the mathematical expression used to generate the data could be manipulated in a number of ways, through a process we generally call “algebra”. For example, it turned out that the mathematical expression in Equation 18 could be equivalently written using three of the six identified functions:
(19) 
Further, given the capability of a neural network to determine power functions of a variable, it was not surprising that the other three functions also showed signals. After all,
(20) 
This leads to a bit of a conundrum. A neural network can be used to determine which functions may be directly included in the expression used to generate the data, but there can be many such functions given that there can be many equivalent expressions that precisely describe the same phenomenon.
7 Discussion
In this study, one researcher used governing equations for three different physical phenomena to create datasets of six inputs and one output in each case. These were equation of motion (demonstration 1), equation for Nusselt number over a circular cylinder in convective heat transfer (demonstration 2), and angle of deflection in oblique shock waves (demonstration 3). In a blind study, the second researcher attempted different iterative machine learning methods to extract the underlying expressions. Summary of the results and methods are listed under Table 6.
Underlying Physical Equation  Obtained Expressions  Method 
Iterative Correlation Analysis and NN training  
Iterative Correlation Analysis  
20cm), )  
,, ,  Iterative NN training 
One observation from these demonstrations was that for each problem, a certain variation of the approach was more effective to uncover underlying expressions. For demonstration 1, a combination of correlation analysis and NN training was used. This demonstration showed that by expanding the list of base functions for correlation analysis (i.e. Table 2), better results are obtained. For demonstration 2, only the correlation analysis with expanded base functions were used. The final expression had several components similar to the ones in the closedform solution. However, there were some differences. This demonstration showed that the knowledge of input units can be effective in guiding the proposed approach. For demonstration 3, units of the inputs were provided to the second researcher. This reduced the number of base functions to the point that NN training could be performed with many combinations of base functions. Although several underlying expressions were detected, some of them were only derivations of exact expressions in the closedform solution.
8 Summary and Future Work
In summary, we proposed an iterative machine learning approach enriched with common functional forms of physical phenomena for discovering theories and underlying expressions. This showed to be a promising approach in some cases leading to uncovering explicit closedform solutions. For some cases, however, challenges were observed including limitation on the number of base functions, and false positives for derivations of exact expressions. However, the proposed method can be effective as an starting point to analyze physical phenomena and extract underlying expressions.
Aside from feature transformation, we would like to consider other established SciML approaches in the future. Specifically, we would like to consider the effects of optimization approach, activation function and loss function enriched with domain knowledge. For example, for heat transfer problems, exponentialbased activation functions correlate better with the physics of the problem. Moreover, we would like to consider the effect of NN architecture in terms of both hidden layers and neurons on the sensitivity of results on base functions. This can be done in a systematic way to observed changes in correlation of base functions to output. Finally, we would like to employ data kpartitioning to analyze the effect of based functions to train within one region and predict trends in other data regions.
Comments
There are no comments yet.