I Introduction
Performing predictions based on observed data is a general problem of interest in a wide range of scientific disciplines. A traditional approach is to construct problem-specific mathematical models that relate observed data features or inputs to predictions or outputs. For many practical problems, however, such relationships can be too complex to establish by manually analyzing data (Dhar, 2013).
In recent years, there has been an explosion of alternative, purely data-driven and understanding-agnostic approaches involving Artificial Intelligence (AI) / Machine Learning (ML) techniques, affecting numerous areas of social and physical sciences and engineering (Shalev-Shwartz and Ben-David, 2014; LeCun, Bengio, and Hinton, 2015; Davies et al., 2021; Carleo et al., 2019; Mater and Coote, 2019; Hamet and Tremblay, 2017; Baldi and Brunak, 2001; Brunton and Kutz, 2022). Compared to models derived on the basis of physical intuition and understanding, these data-driven AI/ML models are in principle agnostic to our understanding of the system. This is both the strength and the shortcoming of this latter class of methods. By construction, these AI methods learn models for which the reasons behind predictions are difficult for a human to understand; such models are known as black-box models (Loyola-Gonzalez, 2019). Typically, black-box models are capable of learning very complicated relationships between inputs and can generate excellent predictions. However, it is natural to feel suspicious, especially when designing further actionable policies on the basis of such opaque models. One way to trust AI could be to not just make predictions on its basis, but also be able to explain why those specific predictions were made. One could then at least rule out misleading AI models whose explanations show that a prediction was made for an invalid reason. Such interpretations could also help to better understand the domain of applicability of the AI and whether there are certain types of data for which it stops being reliable. Thus, it is important to interpret AI models to establish trust before accepting them as accurate, or even to refine them further if necessary.
In this work we view the problem of interpretation through the lens of classical thermodynamics (Callen, 1985). One of the key postulates in thermodynamics states that there exists an entropy function $S$ for any system, which is a continuously differentiable and monotonically increasing function of the system's energy $E$. In the absence of any constraints, equilibrium is characterized by the entropy being maximized. In the presence of constraints, equilibrium is instead characterized by minimizing so-called free energies. For instance, for a closed system with a fixed number of particles $N$ at constant temperature $T$ and volume $V$, the equilibrium state is characterized by the Helmholtz Free Energy $F = E - TS$ attaining its minimum value. Furthermore, due to the monotonicity postulate $\partial S / \partial E > 0$ (Callen, 1985), minimizing $F$, when viewed as a function of $E$ and $S$, is a convex optimization problem. At equilibrium, a tradeoff is achieved between minimizing the energy $E$ and maximizing the entropy $S$. At any given temperature $T$, there exists exactly one pair $(E, S)$ that minimizes the free energy at that temperature. All microstate configurations with this pair of $(E, S)$ are then equally probable.
In the same vein as thermodynamics, we set up a formalism where the interpretation or representation of any complex model can be expressed as a tradeoff between its simplicity and its unfaithfulness to the underlying ground truth. Just as in thermodynamics the entropy increases with energy, i.e., higher energy states have higher entropy, in our framework, when appropriately defined, the unfaithfulness of an interpretation increases with its simplicity. More technically, we introduce a Simplicity function $\mathcal{S}$ and an Unfaithfulness function $\mathcal{U}$ (see Sec. II for details), which depend monotonically on each other. We define the best interpretation as the simplest model that also minimizes unfaithfulness to the ground truth model being explained. This is expressed through an interpretation free energy $\zeta = \mathcal{U} - \theta \mathcal{S}$, where $\theta$ is a tunable parameter, analogous to temperature in thermodynamics. For any choice of $\theta$, $\zeta$ is then guaranteed to have exactly one minimum, characterized by a pair of values $(\mathcal{S}, \mathcal{U})$. All interpretations corresponding to these values of $\mathcal{S}$ and $\mathcal{U}$ are then equally valid interpretations. By systematically decreasing $\theta$ we can then increase the complexity of the interpretation.
We call this approach Thermodynamically Explainable Representations of AI and other black-box Paradigms (TERP). In Sec. II we clarify the details of these functions, as well as other crucial aspects of our approach. TERP has the following salient features:

It is locally valid, i.e., interpretations are produced not for the entire dataset at once but in a tunable vicinity of any specific data point.

It can be model-agnostic or model-dependent, i.e., it works even without assuming anything about the model being explained, while still being capable of using any model-specific information if available.

It uses a surrogate model generation scheme implemented through a forward feature selection (Kumar and Minz, 2014) Monte Carlo algorithm.
TERP is a general protocol suitable for a wide variety of black-box models and datasets coming from simulations and real-life data. We demonstrate this generality by interpreting the widely used XGBoost (Chen and Guestrin, 2016) and MobileNets (Howard et al., 2017) models, trained to predict heart disease and classify images respectively. In addition, we have applied it to a domain of great current interest and of relevance to our own research (Frenkel and Smit, 2001; Doerr et al., 2021; Han et al., 2017; Gao et al., 2020), namely the use of AI-augmented models for analyzing molecular dynamics (MD) simulations (Ma and Dinner, 2005; Wang, Ribeiro, and Tiwary, 2020). The aim of these methods is to learn and even accelerate the underlying physics governing the system (Wang, Ribeiro, and Tiwary, 2020; Ribeiro et al., 2018). Applying an interpretation scheme would be very useful for deriving direct mechanistic insight from these simulations and for ensuring that these models are working as intended. For instance, a crucial topic of interest in this field is the behavior of the system near the so-called transition state (Vanden-Eijnden, 2014), where the system goes from one metastable state to another (Smith et al., 2020). TERP directly addresses this topic by simply post-processing a trained AI-augmented MD model. In this regard, we have applied TERP to find interpretable representations of two deep neural network based approaches to enhance MD. They are the recently developed VAMPnets (Mardt et al., 2018) and SPIB (Wang and Tiwary, 2021; Beyerle, Mehdi, and Tiwary, 2022) methods, applied to the prototypical biophysical systems alanine dipeptide in vacuum (Bolhuis, Dellago, and Chandler, 2000) and the chirally symmetric peptide aminoisobutyric acid (Aib) in water (Mehdi et al., 2022).
TERP attributes feature contributions for a specific black-box prediction based on the nonzero weights of an approximate, linear model. Interpreting black-box models by building a local surrogate model is not new, and many other post-hoc analysis schemes for interpreting black-box models already exist, such as LIME (Ribeiro, Singh, and Guestrin, 2016), permutation feature importance (Fisher, Rudin, and Dominici, 2019), SHAP (Lundberg and Lee, 2017; Gupta, Kulkarni, and Mukherjee, 2021), integrated gradients (Sundararajan, Taly, and Yan, 2017), and counterfactual explanations (Wachter, Mittelstadt, and Russell, 2017; Wellawatte, Seshadri, and White, 2022). The LIME approach of Ribeiro, Singh, and Guestrin (2016) in particular closely inspires our work. However, TERP advances such methods by introducing the connection with thermodynamics, which puts the optimization procedure on a more rigorous and also more intuitive footing and opens up research directions for further improvement, especially from the perspective of applying AI methods to problems in chemical and biological physics. We summarize some such avenues in Sec. IV.
II Theory
II.1 Simplicity, Unfaithfulness and Interpretation Free Energy
Our starting point is some given dataset $\mathcal{X}$ coming from an unknown ground truth $g$. For a particular element $x \in \mathcal{X}$, we seek linear, approximate interpretations or representations that are as simple as possible while also being as faithful as possible to $g$ in the vicinity of $x$. We restrict ourselves to linear interpretations $F_j$ with $j$ nonzero coefficients, expressed as a linear combination of the corresponding features $s_k$ and defined as:
$$F_j = \sum_{k=0}^{j} f_k s_k \qquad (1)$$
where $f_k$ denotes the weight for feature $s_k$, and $s_0$ denotes the identity or null feature. A $j$-coefficient model has $j$ nonzero coefficients out of a possible $n$, with the other $n - j$ equaling 0. For such a $j$-coefficient model, we define a Simplicity function $\mathcal{S}_j$ as:
(2) 
Such a functional form penalizes representations with higher $j$ as being less interpretable, and encourages the construction of a sparse linear model. We tested other definitions of $\mathcal{S}_j$ and empirically found the logarithmic definition to be the most stable. As per Eq. 2, $\mathcal{S}_j$ decreases monotonically with $j$ and has the property that a linear surrogate model with only the intercept term has maximal simplicity, evaluated as zero. The Simplicity function so defined is a functional of the interpretation $F_j$ and is denoted $\mathcal{S}[F_j]$.
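As an illustration only, one logarithmic form that satisfies the properties stated for Eq. 2 (monotonically decreasing in the number of nonzero coefficients $j$, and equal to zero for the intercept-only model) can be written as follows. The symbols $\mathcal{S}_j$ and $j$, and the convention that the intercept-only model corresponds to $j = 0$, are assumptions made here for concreteness rather than the definition used in TERP.

```latex
\mathcal{S}_j = -\ln(j + 1), \qquad
\mathcal{S}_0 = 0, \qquad
\mathcal{S}_{j+1} - \mathcal{S}_j = -\ln\!\left(\frac{j + 2}{j + 1}\right) < 0 .
```

Under this assumed form the penalty for adding one more coefficient shrinks as $j$ grows, so the first few features are the most costly in terms of simplicity.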
At the same time, we introduce an Unfaithfulness function $\mathcal{U}[F, d]$, where $d$ represents an appropriate distance metric between the data instance $x$ and some other data point in $\mathcal{X}$. We define $\mathcal{U}$ more rigorously before the end of this subsection. Intuitively, $\mathcal{U}$ captures the deviation from black-box model behaviour within the neighborhood of interest, where different points in the neighborhood carry a weight that depends inversely on the distance from the instance being explained.
Given the Simplicity and Unfaithfulness functions $\mathcal{S}$ and $\mathcal{U}$, we define the Interpretation Free Energy $\zeta$ as:
$$\zeta = \mathcal{U} - \theta \mathcal{S} \qquad (3)$$
Here $\theta$ is a tradeoff parameter that plays a similar role to temperature in thermodynamics. Directly inspired by Ribeiro, Singh, and Guestrin (2016), we then postulate that an ideal interpretation model valid within a local neighborhood should be as simple as possible while being as faithful as possible. Such a model can be obtained by minimizing the Interpretation Free Energy in Eq. 3. As we show in the next paragraph, when we visit a precise construction of $\mathcal{U}$, the Unfaithfulness $\mathcal{U}$ increases monotonically with the Simplicity $\mathcal{S}$ as we vary the interpretation $F$, thereby giving the same fundamental convexity property as the Helmholtz Free Energy described in the introduction. In other words, there exists a unique set of values for $(\mathcal{S}, \mathcal{U})$ that minimizes the Interpretation Free Energy. All interpretations consistent with this pair of values are equally valid interpretations of the ground truth in the vicinity of the data point being explained.
We now describe the construction of the Unfaithfulness function $\mathcal{U}$ that guarantees the crucial monotonic relation central to TERP. Consider a specific problem where $x$ is a high-dimensional instance for which an explanation is needed for the black-box model prediction $g(x)$. We first generate a neighborhood of samples $\{x_i\}$ and the associated black-box predictions $\{g(x_i)\}$. Afterwards, a linear, local surrogate model with $j$ nonzero coefficients corresponding to observed features (Eq. 1) is built by minimizing the weighted squares of the residuals between the ground truth $g$ and all possible $j$-coefficient representations/interpretations $F_j$:
$$\mathcal{U}_j = \min_{F_j} \sum_{i} \Pi_i \left[ g(x_i) - F_j(x_i) \right]^2 \qquad (4)$$
Here $\Pi_i \in [0, 1]$ is a Gaussian similarity measure, where the distance $d_i$ between a neighborhood sample $x_i$ and the instance $x$ to be explained is defined by considering the sum of differences across all the features (see Sec. II.2 for details). The kernel width $\sigma$ can be used to tune the distribution of $\Pi_i$. Too high or too low a $\sigma$ will result in a narrow distribution with peaks close to 1 or 0, respectively. TERP implements a simple grid search algorithm (Lerman, 1980) to find a $\sigma$ that produces a spread-out distribution.
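The similarity weights and the kernel-width search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the TERP implementation: the kernel form `exp(-d**2 / sigma**2)`, the use of summed absolute differences for the distance, and the choice of the standard deviation of the weights as the "spread" criterion are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def similarity_weights(neighborhood, instance, sigma):
    """Gaussian similarity of each neighborhood sample to the instance.

    Assumed form: Pi_i = exp(-d_i**2 / sigma**2), where d_i is the summed
    absolute feature-wise difference (an illustrative choice).
    """
    d = np.abs(np.asarray(neighborhood) - np.asarray(instance)).sum(axis=1)
    return np.exp(-(d ** 2) / sigma ** 2)

def pick_kernel_width(neighborhood, instance, sigma_grid):
    """Grid search for the sigma whose weights are most spread out,
    measured here (as an assumption) by their standard deviation."""
    spreads = [similarity_weights(neighborhood, instance, s).std() for s in sigma_grid]
    return float(sigma_grid[int(np.argmax(spreads))])

# Synthetic neighborhood around the origin in four dimensions.
rng = np.random.default_rng(0)
x0 = np.zeros(4)
samples = x0 + rng.normal(size=(500, 4))
sigma = pick_kernel_width(samples, x0, np.logspace(-2, 2, 41))
weights = similarity_weights(samples, x0, sigma)
```

With a very small kernel width all weights collapse towards 0, and with a very large one they collapse towards 1, which is why the search looks for an intermediate value.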
The minimized quantity also serves as the Unfaithfulness measure $\mathcal{U}_j$ attainable with $j$-coefficient models. With $\mathcal{U}_j$ so defined, it can be seen that increasing $j$ cannot increase $\mathcal{U}_j$, since a model with $j+1$ nonzero coefficients will be less, or at best equally, unfaithful compared to a model with $j$ nonzero coefficients defined in Eq. 1.
Thus, both $\mathcal{S}_j$ and $\mathcal{U}_j$ decrease with increasing $j$, giving us the sought-after monotonicity. This then gives a unique minimum at a critical $j$ that balances maximal simplicity against minimum unfaithfulness, as illustrated in Fig. 1. With this definition of $\mathcal{U}_j$, we write down the final expression for the Interpretation Free Energy as a function of the number of nonzero coefficients $j$ in the interpretation:
$$\zeta_j = \mathcal{U}_j - \theta \mathcal{S}_j \qquad (5)$$
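A minimal numerical sketch of how Eq. 5 could be evaluated is given below, anticipating the Monte Carlo forward feature selection of Sec. II.3. It is an illustration under stated assumptions and not the TERP code: the Simplicity is taken as `-ln(j + 1)`, only the newly added feature is re-drawn in each Metropolis move, and the names `weighted_unfaithfulness`, `forward_select`, `free_energy`, `n_steps`, and `mc_temp` are hypothetical.

```python
import numpy as np

def weighted_unfaithfulness(X, y, w, active):
    """Weighted least-squares residual of an intercept plus the `active` features."""
    A = np.column_stack([np.ones(len(X))] + [X[:, k] for k in active])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return float(np.sum(w * (y - A @ coef) ** 2))

def forward_select(X, y, w, n_steps=200, mc_temp=1e-3, seed=0):
    """Estimate U_j for j = 0..n by Metropolis sampling of the feature added at size j."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    subsets = [[]]                                  # best feature subset found per j
    U = [weighted_unfaithfulness(X, y, w, [])]      # baseline: intercept only
    for j in range(1, n + 1):
        inherited = subsets[-1]                     # best (j-1)-feature model
        unused = [k for k in range(n) if k not in inherited]
        current = inherited + [int(rng.choice(unused))]
        u_cur = weighted_unfaithfulness(X, y, w, current)
        best, u_best = list(current), u_cur
        for _ in range(n_steps):
            unused = [k for k in range(n) if k not in current]
            if not unused:
                break
            trial = current[:-1] + [int(rng.choice(unused))]  # re-draw the new feature
            u_try = weighted_unfaithfulness(X, y, w, trial)
            # Metropolis criterion: accept improvements, occasionally accept worse moves
            if u_try <= u_cur or rng.random() < np.exp(-(u_try - u_cur) / mc_temp):
                current, u_cur = trial, u_try
                if u_cur < u_best:
                    best, u_best = list(current), u_cur
        subsets.append(best)
        U.append(min(u_best, U[-1]))                # guard against round-off
    return subsets, np.array(U)

def free_energy(U, theta):
    """zeta_j = U_j - theta * S_j, with the assumed S_j = -ln(j + 1)."""
    j = np.arange(len(U))
    return np.asarray(U) + theta * np.log(j + 1.0)

# Synthetic example in which only features 1 and 3 matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 1] - 2.0 * X[:, 3] + 0.01 * rng.normal(size=300)
w = np.ones(300)
subsets, U = forward_select(X, y, w)
j_star = int(np.argmin(free_energy(U, theta=1.0)))
```

In this sketch, lowering `theta` reduces the penalty on complexity, so the minimum of `free_energy` moves to larger `j`, in line with the role of the tradeoff parameter described in the text.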
With this setup, we now describe a complete protocol for implementing TERP as shown in Fig. 2. It begins by obtaining the trained black-box model, which will be used later to generate predictions for neighborhood data. Afterwards, a particular black-box prediction $g(x)$ in one-hot encoded form, corresponding to a high-dimensional instance $x$, is chosen for TERP explanation.
II.2 Sampling data neighborhoods in model-agnostic and model-dependent manners
As can be seen from the discussion in Sec. II.1, the Interpretation Free Energy is a functional of the interpretation $F$ as well as of a distance measure $d$, which quantifies distance from the specific instance of data being explained. We want the interpretation to be valid in the vicinity of this data point, i.e., for data points deemed similar to the specific data being explained, and $d$ helps us quantify this vicinity. A key question now is how to appropriately calculate this distance metric $d$, which is crucial for evaluating the similarity measure $\Pi$. As discussed below, $d$ can be calculated by using the input feature space or by using an abstract, improved representation of the features.
Methods in the local surrogate model family typically generate new neighborhood data by randomly perturbing the high-dimensional input space. The primary reason for generating new data, rather than reusing the data that was used to train the black-box model, is that practical high-dimensional input data is typically sparse. Existing data might therefore not provide good samples from the local neighborhood of the data instance being explained. Another, more practical, concern could be that the training data used to set up the model is no longer available. We call this a model-agnostic approach for generating new neighborhood data for any given data instance $x$ and the corresponding predictions $g(x)$, which can be directly employed in TERP (Fig. 2).
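The model-agnostic neighborhood generation described above can be sketched as below. This is a hedged illustration, not the TERP implementation: Gaussian noise scaled by a per-feature standard deviation is an assumed perturbation scheme, and `generate_neighborhood`, `feature_std`, `scale`, and the toy `black_box` function are hypothetical names introduced only for this example.

```python
import numpy as np

def generate_neighborhood(instance, feature_std, predict, n_samples=1000, scale=1.0, seed=0):
    """Perturb `instance` with Gaussian noise and query the black-box model.

    `predict` is any callable mapping an (n_samples, n_features) array to
    predictions; nothing else about the model is assumed.
    """
    rng = np.random.default_rng(seed)
    instance = np.asarray(instance, dtype=float)
    noise = rng.normal(size=(n_samples, instance.size))
    samples = instance + scale * np.asarray(feature_std, dtype=float) * noise
    return samples, predict(samples)

# Toy black box: a logistic function of the feature sum (purely illustrative).
def black_box(Z):
    return 1.0 / (1.0 + np.exp(-Z.sum(axis=1)))

x0 = np.array([0.2, -1.0, 3.0])
samples, preds = generate_neighborhood(x0, feature_std=[0.5, 1.0, 2.0], predict=black_box)
```

Because only the `predict` callable is used, the same routine applies whether or not the original training data is still available.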
However, certain classes of black-box models (e.g., convolutional neural networks (Gu et al., 2018), information bottleneck based approaches (Tishby, Pereira, and Bialek, 2000; Alemi et al., 2016), and many others) work by mapping the high-dimensional input space into a low-dimensional latent space representation. This allows us to appropriately assign similarity measures in the vicinity of any data point sampled from the high-dimensional input space, helping with the issue of sparsity. A subtle assumption being made in this approach is that Euclidean distance measures are applicable in the latent space. Developing better distance measures for the latent space will be the subject of future investigations. We call this a model-dependent approach. We demonstrate the use of both methods in our numerical results in Sec. III.
II.3 Monte Carlo procedure for calculating Unfaithfulness with forward feature selection
After generating neighborhood data, TERP standardizes all the input and latent variables (in the model-agnostic and model-dependent schemes respectively) by subtracting the mean and dividing by the standard deviation. As a result, the feature contributions can be directly extracted as the local, surrogate model weights. Once the neighborhood data ($x_i$) around a specific instance ($x$), the corresponding one-hot encoded black-box predictions $g(x_i)$, and the similarity measures ($\Pi_i$) are obtained, this local surrogate model can be constructed. Since calculating $\mathcal{S}_j$ is trivial for any $j$ using Eq. 2, $\zeta_j$ can be evaluated by following Eq. 5. We establish a baseline unfaithfulness ($\mathcal{U}_0$) by employing a linear model with $j = 0$, i.e., including only the intercept term $f_0$ in the minimization of Eq. 4. We then employ a Monte Carlo forward feature selection algorithm using a Metropolis criterion (Metropolis et al., 1953) for calculating $\mathcal{U}_j$, as summarized in Algorithm 1. The central idea in this algorithm is that introducing stochasticity in determining which nonzero parameters are being scanned leads to much more rapid convergence compared to brute-force calculations that test all possible interpretations with $j$ nonzero coefficients in Eq. 1. Additionally, in this forward feature selection implementation, weights for a $j$-coefficient model are initialized by inheriting weights from the best $(j-1)$-coefficient model, resulting in faster convergence. If the addition of a feature does not decrease $\mathcal{U}$, then the coefficient corresponding to that feature is assigned a trivial weight of zero. Every step of the algorithm guarantees that we are moving in the right direction, i.e., minimizing the Unfaithfulness by increasing $j$. However, it does not guarantee that we have obtained the lowest possible $\mathcal{U}_j$ for any particular $j$, which is a common limitation of any global optimization procedure.
III Applications to different domains
In this section, we look at domains that have seen rapid adoption of AI-driven methods and apply TERP to explain predictions coming from widely used black-box models. We focus on AI and ML methods for image classification, tabular data analysis, and, more recently, analyzing and enhancing molecular dynamics (MD) simulations (Wang, Ribeiro, and Tiwary, 2020). Although these methods are becoming increasingly popular, interpretability approaches have not been systematically applied, particularly for the last class of problems, to rationalize the deep neural networks at the heart of such methods.
III.1 Image classification: MobileNets
A convolutional neural network (CNN) is a very popular class of AI model, constructed from a deep, non-fully connected feed-forward artificial neural network (ANN). Because of their unique architecture, CNNs are efficient in analyzing data with local correlations and have numerous applications in computer vision and in other fields (Traore, Kamsu-Foguem, and Tangara, 2018; Giménez, Palanca, and Botti, 2020; Pelletier, Webb, and Petitjean, 2019). By construction, CNNs are black-box models, and because of their practical usage it is desirable to employ an interpretation scheme to validate their predictions before deploying them.
In this work, we examine MobileNets (Howard et al., 2017), a particular CNN implementation for image recognition that is suitable for mobile devices because it is architecturally lightweight. We trained a MobileNet model using the publicly available Large-scale CelebFaces Attributes (CelebA) dataset (Liu et al., 2015) to learn features from human facial images. Details of the architecture and training procedure are provided in the Supporting Information (SI).
Fig. 3 shows results from employing TERP to explain feature predictions for four images that were not present in the training data. For this purpose, every image was divided into superpixels by using the SLIC image segmentation algorithm (Achanta et al., 2010). These superpixels are then perturbed to generate neighborhood data, based on which a linear, interpretable model is constructed by minimizing the Interpretation Free Energy in Eq. 5. We can see from Fig. 3 (a), (b), and (c), respectively for the attributes ‘smiling’, ‘goatee’, and ‘necktie’, that the black-box model made predictions based on reasons that a human reader of this manuscript would perceive as justified. However, for Fig. 3 (d), the black-box model predicted the attribute ‘blonde hair’, which is clearly wrong as identified by TERP, which shows that the features leading to this classification have nothing to do with hair or its color. For these four instances, TERP finds the Interpretation Free Energy minima at four different numbers of nonzero coefficients $j$, as shown in Fig. 3 (e). TERP parameters for these four explanations are provided in the SI.
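The superpixel perturbation step can be sketched as follows. This is an illustrative sketch under assumptions, not the procedure used for Fig. 3: the segmentation is supplied as an integer label map with labels `0..n-1` (in practice it could come from a SLIC implementation), switched-off superpixels are filled with the mean image color, and `mask_superpixels`, `perturb_image`, and `p_keep` are hypothetical names.

```python
import numpy as np

def mask_superpixels(image, segments, keep):
    """Copy `image`, replacing superpixels with keep[label] == False by the mean color."""
    image = np.asarray(image, dtype=float)
    out = image.copy()
    fill = image.reshape(-1, image.shape[-1]).mean(axis=0)
    for label in np.unique(segments):
        if not keep[label]:
            out[segments == label] = fill
    return out

def perturb_image(image, segments, n_samples=100, p_keep=0.5, seed=0):
    """Draw random on/off masks over superpixels and build the perturbed images."""
    rng = np.random.default_rng(seed)
    n_seg = int(segments.max()) + 1
    masks = rng.random((n_samples, n_seg)) < p_keep
    masks[0] = True                      # first sample is the unperturbed image
    images = np.stack([mask_superpixels(image, segments, m) for m in masks])
    return masks.astype(int), images

# Toy 8x8 RGB image split into four square "superpixels".
rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))
segments = (np.arange(8)[:, None] // 4) * 2 + (np.arange(8)[None, :] // 4)
masks, images = perturb_image(image, segments)
```

The binary rows of `masks` then play the role of interpretable features for the linear surrogate, while `images` are passed to the black-box model to obtain the corresponding predictions.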
III.2 Heart disease prediction: XGBoost
XGBoost (Extreme Gradient Boosting) is a powerful ML library that has become very popular in practical applications (Zoabi, Deri-Rozov, and Shomron, 2021; Nobre and Neves, 2019; Dhaliwal, Nahid, and Abbas, 2018) due to its excellent performance, flexibility, and ease of implementation (Chen et al., 2019). This library is capable of analyzing both numerical and categorical data and is typically used to train gradient boosted decision trees. Here, we train an XGBoost classifier on the Heart Disease Dataset from the UC Irvine Machine Learning Repository (Dua and Graff, 2017; Detrano et al., 1989; Aha and Kibler, 1988; Gennari, Langley, and Fisher, 1989). The dataset contains different feature reports from patients collected by four different hospitals. The features are: age, sex, chest pain type (cp), resting blood pressure (trestbps), serum cholesterol in mg/dl (chol), fasting blood sugar > 120 mg/dl (fbs), resting electrocardiographic results (restecg), maximum heart rate achieved (thalach), exercise induced angina (exang), ST depression induced by exercise relative to rest (oldpeak), slope of the peak exercise ST segment (slope), number of major vessels (0–3) colored by fluoroscopy (ca), and thalassemia (thal). The dataset includes both categorical and numerical data, and missing features for instances were populated using a dummy value (a negative integer). A subset of the total data was used to train the classifier, and the rest of the data was used for validation purposes. The XGBoost parameters used to train the model are reported in the SI.
We used TERP to explain a specific prediction of positive heart disease. Detailed discussions for all interpretations are provided in the SI. Interestingly, TERP identifies that, for one of the interpretations, the feature ‘sex’ played the largest role in the black-box XGBoost model prediction. This is possibly due to a bias in the training data, where male patients outnumbered female patients and the fraction of patients with heart disease was much higher for males, as shown in Fig. 4 (b). This demonstrates a key advantage of TERP, since such a bias will not be obvious when using a global feature attribution scheme such as SHAP, as shown in the SI (Figure S3b). For a different interpretation, fasting blood sugar (fbs) and chest pain type (cp) were given the highest importance by the black-box model when predicting heart disease for this instance (Fig. 4 (c), (d)).
Additionally, the instance was deliberately populated with two missing fields, for the slope and the number of major vessels colored by fluoroscopy (ca) features. In this regard, the XGBoost classifier correctly learnt that these features are not relevant for this prediction by assigning almost zero weight for all models, as discussed in the SI (Figure S4). Fig. 4 (a) shows the minimum of the Interpretation Free Energy.
Thus, this example shows how TERP successfully checked for training data bias and for the effects of missing values on the black-box model prediction, both of which are commonly found in practical problems.
III.3 AI-augmented MD method: VAMPnets
The variational approach for Markov processes (VAMPnets) is a popular technique for analyzing molecular dynamics (MD) trajectories. VAMPnets can be used to featurize, transform inputs to a lower dimensional representation, and construct a Markov state model (Bowman, Pande, and Noé, 2013) in an automated manner by maximizing the so-called VAMP score. A detailed discussion of VAMPnets theory and parameters is provided in the SI.
In this work, we trained a VAMPnet model on a standard toy system: alanine dipeptide in vacuum. The system was parametrized using the CHARMM36m (Huang et al., 2017) force field, and an MD simulation at constant temperature and pressure was performed in GROMACS (Van Der Spoel et al., 2005). Afterwards, an 8-dimensional input space with sines and cosines of all the dihedral angles was constructed and passed to VAMPnet. For the chosen parameters, VAMPnet was able to identify three metastable states I, II, and III, as shown in Fig. 5 (a).
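The sine and cosine featurization mentioned above can be illustrated with a short sketch. It assumes the dihedral angles are available in radians as an array of shape `(n_frames, n_angles)`; with four angles this yields the 8-dimensional input described in the text. The function name `featurize_dihedrals` and the random angles are hypothetical and only serve the example.

```python
import numpy as np

def featurize_dihedrals(angles):
    """Map dihedral angles in radians, shape (n_frames, n_angles),
    to [sin, cos] features of shape (n_frames, 2 * n_angles)."""
    angles = np.asarray(angles, dtype=float)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

# Four dihedral angles per frame give an 8-dimensional input.
rng = np.random.default_rng(0)
angles = rng.uniform(-np.pi, np.pi, size=(10, 4))
features = featurize_dihedrals(angles)
```

Using sines and cosines rather than raw angles removes the artificial discontinuity at the periodic boundary, so configurations near +180 and -180 degrees are mapped to nearby feature values.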
To interpret the VAMPnet model using TERP, we picked three configurations A, B, and C, corresponding to three data points at the boundaries between the three pairs of states (I, II), (II, III), and (III, I) respectively, each thus likely to be a configuration from the transition state ensemble for moving between the corresponding pair of states. These three instances were chosen for TERP analysis, with the goal of understanding the reasons behind their classification under the respective metastable states. First, neighborhoods around each of these instances were generated by randomly perturbing the input space based on the standard deviation of the respective feature. The generated neighborhood data was then used to construct linear, local interpretable models using Eq. 5. As shown in Fig. 5 (c), TERP identified an Interpretation Free Energy minimum for each of these configurations. The relative feature importance for each of these models can be used to explain the black-box VAMPnet model predictions. Fig. 5 (e) shows that VAMPnet classified A at the boundary between two specific metastable states by considering a single dihedral angle, while for configurations B and C two dihedral angles were taken into account. The feature attributions learned by TERP to explain VAMPnet predictions for this system are in agreement with previous literature (Bolhuis, Dellago, and Chandler, 2000), thereby showing that VAMPnet worked here for the right reasons and thus can be trusted.
III.4 AI-augmented MD method: SPIB
The second AI-augmented MD method we explain using TERP is the State Predictive Information Bottleneck (SPIB) method (Wang and Tiwary, 2021). SPIB is an information bottleneck based framework that takes MD trajectory order parameters (OPs) as inputs and constructs a low-dimensional latent space representation by predicting the metastable state of the molecular system after a short time delay. This is implemented through an optimal encoder and decoder combination. The decoder ensures that the model retains as much predictive power as possible, while the encoder ensures that as little information as possible is used for that prediction. It has been shown in previous works (Wang et al., 2022; Wang and Tiwary, 2021) that this latent space approximates the reaction coordinate describing system behaviour.
In this work, we ran an MD simulation of a small peptide of aminoisobutyric acid (Aib) at constant temperature and pressure, maintained through a Nosé-Hoover thermostat and a Parrinello-Rahman barostat (Nosé, 1984; Parrinello and Rahman, 1980), in GROMACS. The peptide was solvated in TIP3P water, and the CHARMM36m force field was used to parametrize the peptide, which was prepared using CHARMM-GUI (Lee et al., 2016).
To analyze the resultant MD trajectory, a deep nonlinear SPIB artificial neural network with two encoder and two decoder layers was constructed. An input space with sines and cosines of the dihedral angles was passed as input to SPIB, which detected 10 converged metastable states as shown in Fig. 6 (c). Here the two most populated states correspond to right- (R) and left- (L) handed chiral structures respectively. Interestingly, we can see from Fig. 6 (d) that SPIB places these states as far from each other as possible. This indicates that the use of the model-dependent scheme from Sec. II.2 to compute similarity measures on the basis of the latent space could be justified here. We have added a detailed discussion of the dihedral angles and the SPIB training process in the SI.
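The model-dependent scheme of Sec. II.2, in which similarity is measured in the latent space, can be sketched as below. This is a hedged illustration and not the SPIB or TERP code: `encode` stands for any callable mapping inputs to latent coordinates, the toy `tanh` projection is a hypothetical stand-in for a trained encoder, and the Gaussian kernel on Euclidean latent distance is an assumed form.

```python
import numpy as np

def latent_similarity(encode, neighborhood, instance, sigma):
    """Gaussian similarity based on Euclidean distance in the latent space."""
    z = np.asarray(encode(np.asarray(neighborhood)))
    z0 = np.asarray(encode(np.asarray(instance)[None, :]))[0]
    d = np.linalg.norm(z - z0, axis=1)
    return np.exp(-(d ** 2) / sigma ** 2)

# Hypothetical stand-in for a trained encoder: a fixed nonlinear 4 -> 2 projection.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))

def toy_encoder(X):
    return np.tanh(X @ W)

x0 = rng.normal(size=4)
neighborhood = np.vstack([x0, x0 + 0.5 * rng.normal(size=(200, 4))])
weights = latent_similarity(toy_encoder, neighborhood, x0, sigma=1.0)
```

Because distances are computed after encoding, two inputs that differ in many raw features but map to nearby latent points receive a high similarity, which is the sense in which the latent space helps with sparsity.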
To achieve an improved understanding of the transition process, we employed TERP to probe regions near the SPIB-learnt transition states. Using TERP, we learnt the most important features among all the dihedral angles for the different regions A, B, C, D, E, F, G, H, I, and J shown in Fig. 6 (c).
We can see from Fig. 6 (b) and (c) that the transition between the SPIB states corresponding to the fully right- (R) and left- (L) handed configurations of the peptide can follow either the top pathway, highlighted by representative instances A, B, C, D, and E, or the bottom pathway, highlighted by F, G, H, I, and J. Fig. 6 (c) also highlights the relevant features corresponding to nonzero coefficients of the interpretation obtained using TERP. By performing TERP for the different states lying between these two end configurations, we see that, starting from one of these SPIB converged states, the molecule may reach the other as adjacent residues undergo chiral transitions. From Fig. 6 (c), for instance A, SPIB considered a single dihedral angle for assigning a metastable state. Similarly, for instances B, C, D, and E the adjacent dihedral angles of increasing order were considered. After reaching the second end state, the molecule can transition back to the starting state if the dihedral angles undergo right-handed transitions starting from the end residues to the initial residues. However, if the initial residues undergo right-handed transitions before the end residues, one possibility is that the molecule will go back to the starting state by following the bottom pathway through instances F, G, H, I, and J, as learnt by TERP. This result matches previous literature (Biswas, Lickert, and Stock, 2018) and validates SPIB model behavior for these representative instances.
IV Discussion
The use of AI-based black-box models has now become a staple feature across domains, as they can be deployed without any need for a fundamental understanding of the governing processes at work. This however leads to questions about whether an AI model can be trusted and how one should go about deriving the meaning of AI-based models. Numerous approaches have been proposed to tackle this problem (Ribeiro, Singh, and Guestrin, 2016; Fisher, Rudin, and Dominici, 2019; Lundberg and Lee, 2017; Sundararajan, Taly, and Yan, 2017; Wachter, Mittelstadt, and Russell, 2017); however, very few, with the notable exceptions of Wellawatte, Seshadri, and White (2022) and Kikutsuji et al. (2022), have been used in molecular simulations. In this work, we established a thermodynamic framework for generating interpretable representations of complex black-box models, wherein the optimal representation was expressed as one that minimizes unfaithfulness to the ground truth model while still staying as simple as possible. This tradeoff was quantified through the concept of an Interpretation Free Energy, which has simple but useful mathematical properties guaranteeing the existence of a unique minimum. The minimum is found using a Monte Carlo forward feature selection scheme. We demonstrated the use of this approach on different problems using AI, such as classifying images, predicting heart disease, and labeling biomolecular conformations. We believe this is arguably one of the first applications of interpretability schemes to AI-augmented molecular dynamics, which is a rapidly burgeoning sub-discipline in its own right. In TERP, as well as in other local surrogate model based schemes such as LIME (Ribeiro, Singh, and Guestrin, 2016), interpretations are generated that are valid locally, in the vicinity of the data instance being explained. This raises a key question: how does one define locality? Here, for biomolecular systems, we applied TERP to methods such as VAMPnet (Mardt et al., 2018) and SPIB (Wang and Tiwary, 2021). Especially for the latter, we were able to exploit the low-dimensional latent space, which captures attributes of the reaction coordinate of the system (Wang and Tiwary, 2021; Wang et al., 2022). In future work we would like to more carefully visit the question of introducing kinetically relevant distance metrics on the low-dimensional manifold (Tsai and Tiwary, 2021), including for the case when the data is being generated from biased importance sampling (Tsai, Smith, and Tiwary, 2021). A second direction in future work will involve exploring whether systematically varying the tradeoff parameter qualitatively changes the interpretation so obtained. This could help develop strategies for picking a range of values of this parameter over which minimizing the Interpretation Free Energy does not lead to drastically different interpretations. However, we believe that even in its current version, TERP should be useful to the community for generating optimally interpretable representations of complex AI-driven models in molecular sciences and beyond. Code for TERP is available at github.com/tiwarylab/TERP.
V Acknowledgments
This work was supported by the National Science Foundation, grant no. CHE-2044165. The authors also thank Deepthought2, MARCC, and XSEDE (projects CHE180007P and CHE180027P) for the computational resources used in this work.
References

 Dhar (2013) V. Dhar, “Data science and prediction,” Communications of the ACM 56, 64–73 (2013).
 Shalev-Shwartz and Ben-David (2014) S. Shalev-Shwartz and S. Ben-David, Understanding machine learning: From theory to algorithms (Cambridge university press, 2014).
 LeCun, Bengio, and Hinton (2015) Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521, 436–444 (2015).
 Davies et al. (2021) A. Davies, P. Veličković, L. Buesing, S. Blackwell, D. Zheng, N. Tomašev, R. Tanburn, P. Battaglia, C. Blundell, A. Juhász, et al., “Advancing mathematics by guiding human intuition with ai,” Nature 600, 70–74 (2021).
 Carleo et al. (2019) G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, “Machine learning and the physical sciences,” Reviews of Modern Physics 91, 045002 (2019).
 Mater and Coote (2019) A. C. Mater and M. L. Coote, “Deep learning in chemistry,” Journal of chemical information and modeling 59, 2545–2559 (2019).
 Hamet and Tremblay (2017) P. Hamet and J. Tremblay, “Artificial intelligence in medicine,” Metabolism 69, S36–S40 (2017).
 Baldi and Brunak (2001) P. Baldi and S. Brunak, Bioinformatics: the machine learning approach (MIT press, 2001).
 Brunton and Kutz (2022) S. L. Brunton and J. N. Kutz, Data-driven science and engineering: Machine learning, dynamical systems, and control (Cambridge University Press, 2022).
 Loyola-Gonzalez (2019) O. Loyola-Gonzalez, “Black-box vs. white-box: Understanding their advantages and weaknesses from a practical point of view,” IEEE Access 7, 154096–154113 (2019).
 Callen (1985) H. B. Callen, “Thermodynamics and an introduction to thermostatistics,” (1985).
 Kumar and Minz (2014) V. Kumar and S. Minz, “Feature selection: a literature review,” SmartCR 4, 211–229 (2014).
 Chen and Guestrin (2016) T. Chen and C. Guestrin, “XGBoost,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, 2016).
 Howard et al. (2017) A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” (2017).
 Frenkel and Smit (2001) D. Frenkel and B. Smit, Understanding molecular simulation: from algorithms to applications, Vol. 1 (Elsevier, 2001).
 Doerr et al. (2021) S. Doerr, M. Majewski, A. Pérez, A. Kramer, C. Clementi, F. Noe, T. Giorgino, and G. De Fabritiis, “Torchmd: A deep learning framework for molecular simulations,” Journal of chemical theory and computation 17, 2355–2363 (2021).
 Han et al. (2017) J. Han, L. Zhang, R. Car, et al., “Deep potential: A general representation of a many-body potential energy surface,” arXiv preprint arXiv:1707.01478 (2017).
 Gao et al. (2020) X. Gao, F. Ramezanghorbani, O. Isayev, J. S. Smith, and A. E. Roitberg, “Torchani: a free and open source pytorch-based deep learning implementation of the ani neural network potentials,” Journal of chemical information and modeling 60, 3408–3415 (2020).
 Ma and Dinner (2005) A. Ma and A. R. Dinner, “Automatic method for identifying reaction coordinates in complex systems,” The Journal of Physical Chemistry B 109, 6769–6779 (2005).
 Wang, Ribeiro, and Tiwary (2020) Y. Wang, J. M. L. Ribeiro, and P. Tiwary, “Machine learning approaches for analyzing and enhancing molecular dynamics simulations,” Current opinion in structural biology 61, 139–145 (2020).
 Ribeiro et al. (2018) J. M. L. Ribeiro, P. Bravo, Y. Wang, and P. Tiwary, “Reweighted autoencoded variational bayes for enhanced sampling (rave),” The Journal of chemical physics 149, 072301 (2018).
 Vanden-Eijnden (2014) E. Vanden-Eijnden, “Transition path theory,” An introduction to Markov state models and their application to long timescale molecular simulation, 91–100 (2014).
 Smith et al. (2020) Z. Smith, P. Ravindra, Y. Wang, R. Cooley, and P. Tiwary, “Discovering protein conformational flexibility through artificial-intelligence-aided molecular dynamics,” The Journal of Physical Chemistry B 124, 8221–8229 (2020).
 Mardt et al. (2018) A. Mardt, L. Pasquali, H. Wu, and F. Noé, “Vampnets for deep learning of molecular kinetics,” Nature communications 9, 1–11 (2018).
 Wang and Tiwary (2021) D. Wang and P. Tiwary, “State predictive information bottleneck,” The Journal of Chemical Physics 154, 134111 (2021).
 Beyerle, Mehdi, and Tiwary (2022) E. R. Beyerle, S. Mehdi, and P. Tiwary, “Quantifying energetic and entropic pathways in molecular systems,” The Journal of Physical Chemistry B (2022).
 Bolhuis, Dellago, and Chandler (2000) P. G. Bolhuis, C. Dellago, and D. Chandler, “Reaction coordinates of biomolecular isomerization,” Proceedings of the National Academy of Sciences 97, 5877–5882 (2000).
 Mehdi et al. (2022) S. Mehdi, D. Wang, S. Pant, and P. Tiwary, “Accelerating all-atom simulations and gaining mechanistic understanding of biophysical systems through state predictive information bottleneck,” Journal of Chemical Theory and Computation 18, 3231–3238 (2022).
 Ribeiro, Singh, and Guestrin (2016) M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why should I trust you?’ Explaining the predictions of any classifier,” in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (2016) pp. 1135–1144.
 Fisher, Rudin, and Dominici (2019) A. Fisher, C. Rudin, and F. Dominici, “All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously.” J. Mach. Learn. Res. 20, 1–81 (2019).
 Lundberg and Lee (2017) S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” Advances in neural information processing systems 30 (2017).
 Gupta, Kulkarni, and Mukherjee (2021) A. Gupta, M. Kulkarni, and A. Mukherjee, “Accurate prediction of b-form/a-form dna conformation propensity from primary sequence: A machine learning and free energy handshake,” Patterns 2, 100329 (2021).
 Sundararajan, Taly, and Yan (2017) M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep networks,” in International conference on machine learning (PMLR, 2017) pp. 3319–3328.
 Wachter, Mittelstadt, and Russell (2017) S. Wachter, B. Mittelstadt, and C. Russell, “Counterfactual explanations without opening the black box: Automated decisions and the gdpr,” Harv. JL & Tech. 31, 841 (2017).
 Wellawatte, Seshadri, and White (2022) G. P. Wellawatte, A. Seshadri, and A. D. White, “Model agnostic generation of counterfactual explanations for molecules,” Chemical science 13, 3697–3705 (2022).
 Lerman (1980) P. Lerman, “Fitting segmented regression models by grid search,” Journal of the Royal Statistical Society: Series C (Applied Statistics) 29, 77–84 (1980).
 Gu et al. (2018) J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, and T. Chen, “Recent advances in convolutional neural networks,” Pattern Recognition 77, 354–377 (2018).
 Tishby, Pereira, and Bialek (2000) N. Tishby, F. C. Pereira, and W. Bialek, “The information bottleneck method,” (2000).
 Alemi et al. (2016) A. A. Alemi, I. Fischer, J. V. Dillon, and K. Murphy, “Deep variational information bottleneck,” (2016).
 Metropolis et al. (1953) N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, “Equation of state calculations by fast computing machines,” The journal of chemical physics 21, 1087–1092 (1953).
 Traore, Kamsu-Foguem, and Tangara (2018) B. B. Traore, B. Kamsu-Foguem, and F. Tangara, “Deep convolution neural network for image recognition,” Ecological Informatics 48, 257–268 (2018).
 Giménez, Palanca, and Botti (2020) M. Giménez, J. Palanca, and V. Botti, “Semantic-based padding in convolutional neural networks for improving the performance in natural language processing. a case of study in sentiment analysis,” Neurocomputing 378, 315–323 (2020).
 Pelletier, Webb, and Petitjean (2019) C. Pelletier, G. I. Webb, and F. Petitjean, “Temporal convolutional neural network for the classification of satellite image time series,” Remote Sensing 11, 523 (2019).
 Liu et al. (2015) Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep learning face attributes in the wild,” in Proceedings of International Conference on Computer Vision (ICCV) (2015).
 Achanta et al. (2010) R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, “Slic superpixels,” Tech. Rep. (2010).
 Zoabi, Deri-Rozov, and Shomron (2021) Y. Zoabi, S. Deri-Rozov, and N. Shomron, “Machine learning-based prediction of COVID-19 diagnosis based on symptoms,” npj Digital Medicine 4 (2021).
 Nobre and Neves (2019) J. Nobre and R. F. Neves, “Combining principal component analysis, discrete wavelet transform and xgboost to trade in the financial markets,” Expert Systems with Applications 125, 181–194 (2019).
 Dhaliwal, Nahid, and Abbas (2018) S. S. Dhaliwal, A.-A. Nahid, and R. Abbas, “Effective intrusion detection system using xgboost,” Information 9, 149 (2018).
 Chen et al. (2019) T. Chen, T. He, M. Benesty, and V. Khotilovich, “Package ‘xgboost’,” R version 90, 1–66 (2019).
 Dua and Graff (2017) D. Dua and C. Graff, “UCI machine learning repository,” (2017).
 Detrano et al. (1989) R. Detrano, A. Janosi, W. Steinbrunn, M. Pfisterer, J.-J. Schmid, S. Sandhu, K. H. Guppy, S. Lee, and V. Froelicher, “International application of a new probability algorithm for the diagnosis of coronary artery disease,” The American journal of cardiology 64, 304–310 (1989).
 Aha and Kibler (1988) D. Aha and D. Kibler, “Instance-based prediction of heart-disease presence with the cleveland database,” University of California 3, 3–2 (1988).
 Gennari, Langley, and Fisher (1989) J. H. Gennari, P. Langley, and D. Fisher, “Models of incremental concept formation,” Artificial intelligence 40, 11–61 (1989).
 Bowman, Pande, and Noé (2013) G. R. Bowman, V. S. Pande, and F. Noé, An introduction to Markov state models and their application to long timescale molecular simulation, Vol. 797 (Springer Science & Business Media, 2013).
 Huang et al. (2017) J. Huang, S. Rauscher, G. Nawrocki, T. Ran, M. Feig, B. L. De Groot, H. Grubmüller, and A. D. MacKerell, “Charmm36m: an improved force field for folded and intrinsically disordered proteins,” Nature methods 14, 71–73 (2017).
 Van Der Spoel et al. (2005) D. Van Der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark, and H. J. Berendsen, “Gromacs: fast, flexible, and free,” Journal of computational chemistry 26, 1701–1718 (2005).
 Wang et al. (2022) D. Wang, R. Zhao, J. D. Weeks, and P. Tiwary, “Influence of long-range forces on the transition states and dynamics of nacl ion-pair dissociation in water,” The Journal of Physical Chemistry B 126, 545–551 (2022).
 Nosé (1984) S. Nosé, “A unified formulation of the constant temperature molecular dynamics methods,” The Journal of chemical physics 81, 511–519 (1984).
 Parrinello and Rahman (1980) M. Parrinello and A. Rahman, “Crystal structure and pair potentials: A molecular-dynamics study,” Physical review letters 45, 1196 (1980).
 Lee et al. (2016) J. Lee, X. Cheng, J. M. Swails, M. S. Yeom, P. K. Eastman, J. A. Lemkul, S. Wei, J. Buckner, J. C. Jeong, Y. Qi, et al., “Charmm-gui input generator for namd, gromacs, amber, openmm, and charmm/openmm simulations using the charmm36 additive force field,” Journal of chemical theory and computation 12, 405–413 (2016).
 Biswas, Lickert, and Stock (2018) M. Biswas, B. Lickert, and G. Stock, “Metadynamics enhanced markov modeling of protein dynamics,” The Journal of Physical Chemistry B 122, 5508–5514 (2018).
 Kikutsuji et al. (2022) T. Kikutsuji, Y. Mori, K.-i. Okazaki, T. Mori, K. Kim, and N. Matubayasi, “Explaining reaction coordinates of alanine dipeptide isomerization obtained from deep neural networks using explainable artificial intelligence (xai),” The Journal of Chemical Physics 156, 154108 (2022).
 Tsai and Tiwary (2021) S.-T. Tsai and P. Tiwary, “On the distance between a and b in molecular configuration space,” Molecular Simulation 47, 449–456 (2021).
 Tsai, Smith, and Tiwary (2021) S.-T. Tsai, Z. Smith, and P. Tiwary, “Sgoop-d: Estimating kinetic distances and reaction coordinate dimensionality for rare event systems from biased/unbiased simulations,” Journal of Chemical Theory and Computation 17, 6757–6765 (2021).