Thermodynamics of Interpretation

by Shams Mehdi, et al.
University of Maryland

Over the past few years, different types of data-driven Artificial Intelligence (AI) techniques have been widely adopted in various domains of science for generating predictive black-box models. However, because of their black-box nature, it is crucial to establish trust in these models before accepting them as accurate. One way of achieving this goal is through the implementation of a post-hoc interpretation scheme that can put forward the reasons behind a black-box model prediction. In this work, we propose a classical thermodynamics inspired approach for this purpose: Thermodynamically Explainable Representations of AI and other black-box Paradigms (TERP). TERP works by constructing a linear, local surrogate model that approximates the behaviour of the black-box model within a small neighborhood around the instance being explained. By employing a simple forward feature selection Monte Carlo algorithm, TERP assigns an interpretability free energy score to all the possible surrogate models in order to choose an optimal interpretation. Additionally, we validate TERP as a generally applicable method by successfully interpreting four different classes of black-box models trained on datasets coming from relevant domains, including classifying images, predicting heart disease and classifying biomolecular conformations.



I Introduction

Performing predictions based on observed data is a general problem of interest in a wide range of scientific disciplines. A traditional approach is to construct problem-specific mathematical models that relate observed data features or inputs to predictions or outputs. For many practical problems, however, such relationships can be too complex to establish by manually analyzing data (Dhar, 2013).

In recent years, there has been an explosion of alternative, purely data-driven and understanding-agnostic approaches involving Artificial Intelligence (AI) / Machine Learning (ML) techniques, affecting numerous areas of social and physical sciences and engineering (Shalev-Shwartz and Ben-David, 2014; LeCun, Bengio, and Hinton, 2015; Davies et al., 2021; Carleo et al., 2019; Mater and Coote, 2019; Hamet and Tremblay, 2017; Baldi and Brunak, 2001; Brunton and Kutz, 2022). Compared to models derived on the basis of physical intuition and understanding, these data-driven AI/ML models are in principle agnostic to our understanding of the system. This is both the strength and the shortcoming of this latter class of methods.

By construction, these AI methods learn models for which the reasons behind predictions are difficult for a human to understand, and which are therefore known as black-box models (Loyola-Gonzalez, 2019). Typically, such black-box models are capable of learning very complicated relationships between inputs and can generate excellent predictions. However, it is natural to feel suspicious, especially when designing further actionable policies on the basis of such opaque black-box models. One way to trust AI could be to not just make predictions on its basis, but also to be able to explain why those specific predictions were made. On this basis, one could then at least rule out misleading AI models if a prediction was being made for some invalid reason. Such interpretations could also help to better understand the domain of applicability of the AI and whether there are certain types of data where it stops being reliable. Thus, it is important to interpret AI models to establish trust before accepting them as accurate, or to refine them further if necessary.

In this work we view the problem of interpretation through the lens of classical thermodynamics (Callen, 1985). One of the key postulates of thermodynamics states that there exists an entropy function $S$ for any system, which is a continuously differentiable and monotonically increasing function of the system’s energy $E$. In the absence of any constraints, equilibrium is characterized by the entropy being maximized. In the presence of constraints, equilibrium is instead characterized by minimizing so-called free energies. For instance, for a closed system with a fixed number of particles $N$ at constant temperature $T$ and volume $V$, the equilibrium state is characterized by the Helmholtz Free Energy $F = E - TS$ attaining its minimum value. Furthermore, due to the monotonicity postulate (Callen, 1985), minimizing $F$ when viewed as a function of $E$ and $S$ is a convex optimization problem. At equilibrium, a trade-off is achieved between minimizing the energy $E$ and maximizing the entropy $S$. At any given temperature $T$, there exists only one pair $(E, S)$ that minimizes the free energy $F$ at that temperature. All microstate configurations with this pair of $(E, S)$ are then equally probable.

In the same vein as thermodynamics, we set up a formalism where interpretation or representation of any complex model can be expressed as a trade-off between its simplicity and its unfaithfulness to the underlying ground truth. Just as in thermodynamics the entropy increases with energy, i.e. higher energy states have higher entropy, in our framework, when appropriately defined, the unfaithfulness of an interpretation increases with its simplicity. More technically, we introduce a Simplicity function $\mathcal{S}$ and an Unfaithfulness function $\mathcal{U}$ (see Sec. II for details) which depend monotonically on each other. We define the best interpretation as the simplest model that also minimizes unfaithfulness to the ground truth model being explained. This is expressed through an interpretation free energy $\zeta = \mathcal{U} - \theta \mathcal{S}$, where $\theta$ is a tunable parameter, analogous to temperature in thermodynamics. For any choice of $\theta$, $\zeta$ is then guaranteed to have exactly one minimum characterized by a pair of values $(\mathcal{U}, \mathcal{S})$. All interpretations corresponding to these values of $\mathcal{U}$ and $\mathcal{S}$ are then equally valid interpretations. By systematically decreasing $\theta$ we can then increase the complexity of the interpretation.

We call this approach Thermodynamically Explainable Representations of AI and other black-box Paradigms (TERP). In Sec. II we clarify the details of the Simplicity function, the Unfaithfulness function and the interpretation free energy, as well as other crucial aspects of our approach. TERP has the following salient features:

  1. It is locally valid, i.e. interpretations are produced not for the entire dataset at the same time but in a tunable vicinity of any specific datapoint.

  2. It can be model-agnostic or model-dependent, i.e. it works even without assuming anything about the model being explained, while still being capable of using any model-specific information if available.

  3. It uses a surrogate model generation scheme which is implemented through a forward feature selection (Kumar and Minz, 2014) Monte Carlo algorithm.

TERP is a general protocol suitable for a wide variety of black-box models and datasets coming from simulations and real-life data. We demonstrate this generality by interpreting the widely used XGBoost (Chen and Guestrin, 2016) and MobileNets (Howard et al., 2017) models trained to predict heart disease and classify images respectively. In addition, we have applied it to a domain of great current interest and of relevance to our own research (Frenkel and Smit, 2001; Doerr et al., 2021; Han et al., 2017; Gao et al., 2020), namely the use of AI-augmented models for analyzing molecular dynamics (MD) simulations (Ma and Dinner, 2005; Wang, Ribeiro, and Tiwary, 2020). The aim of these methods is to learn and even accelerate the underlying physics governing the system (Wang, Ribeiro, and Tiwary, 2020; Ribeiro et al., 2018). Applying an interpretation scheme would be very useful for deriving direct mechanistic insight from these simulations and for ensuring that these models are working as intended. For instance, a crucial topic of interest in this field is the behavior of the system near the so-called transition state (Vanden-Eijnden, 2014), where the system goes from one metastable state to another (Smith et al., 2020). TERP directly addresses this question by simply post-processing a trained AI-augmented MD model.

In this regard, we have applied TERP to find interpretable representations of two deep neural network based approaches to enhance MD. They are the recently developed VAMPnets (Mardt et al., 2018) and SPIB (Wang and Tiwary, 2021; Beyerle, Mehdi, and Tiwary, 2022) methods, applied to two prototypical biophysical systems: alanine dipeptide in vacuum (Bolhuis, Dellago, and Chandler, 2000) and the chirally symmetric peptide α-aminoisobutyric acid (Aib) in water (Mehdi et al., 2022).

TERP attributes feature contributions for a specific black-box prediction based on the non-zero weights of an approximate, linear model. Interpreting black-box models by building a local surrogate model is not new, and many other post-hoc analysis schemes for interpreting black-box models already exist, such as LIME (Ribeiro, Singh, and Guestrin, 2016), permutation feature importance (Fisher, Rudin, and Dominici, 2019), SHAP (Lundberg and Lee, 2017; Gupta, Kulkarni, and Mukherjee, 2021), integrated gradients (Sundararajan, Taly, and Yan, 2017), and counterfactual explanations (Wachter, Mittelstadt, and Russell, 2017; Wellawatte, Seshadri, and White, 2022). The LIME approach of Ribeiro, Singh, and Guestrin (2016) in particular closely inspires our work. However, TERP advances such methods by introducing the connection with thermodynamics, which puts the optimization procedure on a more rigorous and also more intuitive footing and opens up research directions for further improvement, especially from the perspective of applying AI methods to problems in chemical and biological physics. We summarize some such avenues in Sec. IV.

II Theory

II.1 Simplicity, Unfaithfulness and Interpretation Free Energy

Our starting point is some given dataset $\mathcal{X}$ coming from an unknown ground truth $g$. For a particular element $x \in \mathcal{X}$, we seek linear, approximate interpretations or representations $F$ that are as simple as possible while also being as faithful as possible to $g$ in the vicinity of $x$. We restrict ourselves to linear interpretations $F_k$ with $k$ non-zero coefficients, expressed as a linear combination of the corresponding features $s_1, \dots, s_k$ and defined as:

$$F_k = f_0 s_0 + \sum_{j=1}^{k} f_j s_j \qquad (1)$$

where $f_j$ denotes the weight for feature $s_j$, and $s_0$ denotes the identity or null feature. A $k$-coefficient model has $k$ non-zero coefficients out of a possible $n$, with the other $n - k$ equaling 0. For such a $k$-coefficient model, we define a Simplicity function $\mathcal{S}_k$ as:

$$\mathcal{S}_k = -\ln(k + 1) \qquad (2)$$

Such a functional form penalizes higher-$k$ representations as being less interpretable, and encourages the construction of a sparse linear model. We tested other definitions of $\mathcal{S}$ and empirically found the logarithmic definition to be the most stable. As per Eq. 2, $\mathcal{S}_k$ decreases monotonically with $k$ and has the property $\mathcal{S}_0 = 0$, i.e., a linear surrogate model with only the intercept term has maximal simplicity, evaluated as zero, and $k = n$ gives the minimal value $\mathcal{S}_n = -\ln(n + 1)$. The Simplicity function so defined is a functional of the interpretation $F$ and is denoted $\mathcal{S}[F]$.

At the same time, we introduce an Unfaithfulness function $\mathcal{U}[F, d]$, where $d$ represents an appropriate distance metric between the data instance $x$ and some other data point in $\mathcal{X}$. We define $\mathcal{U}$ more rigorously before the end of this subsection. Intuitively, $\mathcal{U}$ captures the deviation of the interpretation from the black-box model behaviour within the neighborhood of interest, where different points in the neighborhood carry a weight that depends inversely on the distance $d$ from the instance being explained.

Given the Simplicity and Unfaithfulness functions $\mathcal{S}$ and $\mathcal{U}$, we define the Interpretation Free Energy $\zeta$ as:

$$\zeta[F, d] = \mathcal{U}[F, d] - \theta\, \mathcal{S}[F] \qquad (3)$$

Here $\theta$ is a trade-off parameter that plays a role similar to temperature in thermodynamics. Directly inspired by Ribeiro, Singh, and Guestrin (2016), we then postulate that an ideal interpretation model valid within a local neighborhood should be as simple as possible while being as faithful as possible. Such a model can be obtained by minimizing the Interpretation Free Energy in Eq. 3. As we show in the next paragraph when we visit a precise construction of $\mathcal{U}$, $\mathcal{U}$ increases monotonically with $\mathcal{S}$ as we vary the interpretation $F$, thereby giving the same fundamental convexity property as the Helmholtz Free Energy described in the introduction. In other words, there exists a unique set of values for $(\mathcal{U}, \mathcal{S})$ that minimizes the Interpretation Free Energy. All interpretations consistent with this pair of values are equally valid interpretations of the ground truth in the vicinity of the data point being explained.

We now describe the construction of the Unfaithfulness function $\mathcal{U}$ that guarantees the crucial monotonic relation central to TERP. Consider a specific problem where $x$ is a high-dimensional instance for which an explanation is needed for the black-box model prediction $g(x)$. We first generate a neighborhood of $N$ samples $\{x_1, \dots, x_N\}$ and the associated black-box predictions $\{g(x_1), \dots, g(x_N)\}$. Afterwards, a linear, local surrogate model $F_k$ with $k$ non-zero coefficients corresponding to the observed features (Eq. 1) is built by minimizing the weighted squares of the residuals between the ground truth $g$ and all possible $k$-coefficient representations/interpretations $F_k$:

$$\mathcal{U}_k = \min_{F_k} \sum_{i=1}^{N} \Pi_i \left[ g(x_i) - F_k(x_i) \right]^2 \qquad (4)$$

Here $\Pi_i$ is a Gaussian similarity measure with $\Pi_i \in [0, 1]$, where the distance $d$ between a neighborhood sample $x_i$ and the instance $x$ to be explained is defined by considering the sum of differences across all the features (see Sec. II.2 for details). The kernel width $\sigma$ can be used to tune the distribution of $\Pi_i$. Too high or too low a $\sigma$ will result in a narrow distribution with peaks close to 1 or 0 respectively. TERP implements a simple grid search algorithm (Lerman, 1980) to find a $\sigma$ that produces a spread-out distribution.

The minimized quantity also serves as the Unfaithfulness measure $\mathcal{U}_k$ possible with $k$-coefficient models. With $\mathcal{U}_k$ so defined, it can be seen that increasing $k$ cannot increase $\mathcal{U}_k$, since a model with $k+1$ non-zero coefficients will be less or at best equally unfaithful as a model with $k$ non-zero coefficients defined in Eq. 1.
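The weighted least-squares construction described above can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the TERP implementation: the kernel form exp(-d^2 / sigma^2), the use of Euclidean distance, the helper names `gaussian_similarity`, `solve_linear` and `weighted_fit`, and the toy data. The point is only to show that the minimized weighted residual cannot increase when a feature is added.

```python
import math

def gaussian_similarity(distances, sigma):
    """Similarity in (0, 1]; the exp(-d^2 / sigma^2) form is an assumption."""
    return [math.exp(-(d / sigma) ** 2) for d in distances]

def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def weighted_fit(X, y, weights, active):
    """Fit intercept + the features listed in `active` by weighted least squares.

    Returns (coefficients, unfaithfulness), where unfaithfulness is the
    minimised weighted sum of squared residuals.
    """
    rows = [[1.0] + [x[j] for j in active] for x in X]
    p = len(active) + 1
    # Normal equations: (R^T W R) coef = R^T W y
    A = [[sum(w * r[a] * r[b] for w, r in zip(weights, rows)) for b in range(p)]
         for a in range(p)]
    rhs = [sum(w * r[a] * t for w, r, t in zip(weights, rows, y)) for a in range(p)]
    coef = solve_linear(A, rhs)
    residuals = [t - sum(c * v for c, v in zip(coef, r)) for r, t in zip(rows, y)]
    return coef, sum(w * e * e for w, e in zip(weights, residuals))

# Toy neighborhood around the instance (1, 1). The "black box" is a known
# linear function here purely so that the result is easy to check.
instance = (1.0, 1.0)
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
y = [1.0 + 2.0 * a + 0.1 * b for a, b in X]
dist = [math.dist(instance, x) for x in X]  # Euclidean distance (assumption)
Pi = gaussian_similarity(dist, sigma=1.0)

_, U0 = weighted_fit(X, y, Pi, active=[])        # intercept only
_, U1 = weighted_fit(X, y, Pi, active=[0])       # intercept + feature 0
coef2, U2 = weighted_fit(X, y, Pi, active=[0, 1])
print(U0, U1, U2)
```

Because the three models are nested, the printed values are non-increasing, which is the monotonicity the construction relies on.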

Thus, both $\mathcal{U}_k$ and $\mathcal{S}_k$ decrease with increasing $k$, giving us the sought-after monotonicity. This then gives a unique minimum of $\zeta$ at a critical $k$ with maximal simplicity and minimum unfaithfulness, as illustrated in Fig. 1. With this definition of $\mathcal{U}_k$ we write down the final expression for the Interpretation Free Energy as a function of the number $k$ of non-zero coefficients in the interpretation:

$$\zeta_k = \mathcal{U}_k - \theta\, \mathcal{S}_k \qquad (5)$$

With this set-up, we now describe a complete protocol for implementing TERP, as shown in Fig. 2. It begins by obtaining the trained black-box model, which will be used later to generate predictions for the neighborhood data. Afterwards, a particular black-box prediction $g(x)$ in one-hot encoded form, corresponding to a high-dimensional instance $x$, is chosen for TERP explanation.
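The selection step implied by the free energy can be sketched as follows. This is only an illustration under stated assumptions: the free energy is taken as unfaithfulness minus the trade-off parameter times simplicity, the simplicity is taken as -ln(k + 1) so that the intercept-only model scores zero, and the unfaithfulness values are made-up, non-increasing numbers. The function names are hypothetical.

```python
import math

def simplicity(k):
    """Assumed logarithmic Simplicity: 0 for the intercept-only model (k = 0)."""
    return -math.log(k + 1)

def free_energy(unfaithfulness, theta):
    """zeta_k = U_k - theta * S_k for k = 0 .. len(unfaithfulness) - 1."""
    return [u - theta * simplicity(k) for k, u in enumerate(unfaithfulness)]

def optimal_k(unfaithfulness, theta):
    """Number of non-zero coefficients that minimises the free energy."""
    zeta = free_energy(unfaithfulness, theta)
    return min(range(len(zeta)), key=zeta.__getitem__)

# Hypothetical, non-increasing unfaithfulness values U_0 .. U_5.
U = [1.00, 0.40, 0.15, 0.10, 0.08, 0.07]

for theta in (0.5, 0.1, 0.01):
    print(theta, optimal_k(U, theta))
```

With these toy numbers, lowering the trade-off parameter moves the minimum to models with more non-zero coefficients, in line with the statement that decreasing the temperature-like parameter increases the complexity of the interpretation.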

Figure 1: Illustrative example showing the Interpretation Free Energy $\zeta$, Unfaithfulness $\mathcal{U}$, and Simplicity $\mathcal{S}$ corresponding to interpretations with $k$ non-zero coefficients. The strength of the $\mathcal{S}$ contribution to $\zeta$ can be tuned by the trade-off parameter $\theta$. Here, two different values of $\theta$ result in minima for two different interpretations for the same $\mathcal{U}$, at different critical numbers of non-zero coefficients $k$.
Figure 2: Flowchart illustrating the complete TERP protocol. TERP constructs a local surrogate model with $k$ non-zero coefficients by implementing a Monte Carlo forward feature selection algorithm.

II.2 Sampling data neighborhoods in model-agnostic and model-dependent manners

As can be seen from the discussion in Sec. II.1, the Interpretation Free Energy $\zeta$ is a functional of the interpretation $F$ as well as of a distance measure $d$, which quantifies the distance from the specific instance of data being explained. We want the interpretation to be valid in the vicinity of this data point, i.e. for data points deemed similar to the specific data being explained, and $d$ helps us quantify this vicinity. A key question now is how to appropriately calculate this distance metric $d$, which is crucial for evaluating the similarity measure $\Pi$. As discussed below, $d$ can be calculated by using the input feature space or by using an abstract, improved representation of the features.

Methods in the local surrogate model family typically generate new neighborhood data by randomly perturbing the high-dimensional input space. The primary reason for generating new data, rather than reusing the data that was used to train the black-box model, is that practical high-dimensional input data is typically sparse. Existing data may therefore contain too few samples from the local neighborhood of the data instance being explained. Another, more practical concern is that the training data used to set up the model may no longer be available. We call this a model-agnostic approach for generating new neighborhood data and the corresponding black-box predictions for any given data instance, which can be directly employed in TERP as shown in Fig. 2.
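A minimal sketch of such model-agnostic neighborhood generation is given below. The Gaussian form of the perturbation, the `scale` parameter, and the names `sample_neighborhood` and `toy_black_box` are assumptions for illustration only; the black box is a hypothetical stand-in for a trained model's prediction function.

```python
import random

def sample_neighborhood(instance, feature_std, n_samples, scale=1.0, seed=0):
    """Draw perturbed copies of `instance`.

    Each feature is perturbed independently with zero-mean Gaussian noise
    whose width is proportional to that feature's standard deviation
    (the Gaussian form is an assumption).
    """
    rng = random.Random(seed)
    return [
        [value + rng.gauss(0.0, scale * std)
         for value, std in zip(instance, feature_std)]
        for _ in range(n_samples)
    ]

def toy_black_box(x):
    """Hypothetical stand-in for a trained model's prediction function."""
    return 1.0 if x[0] + 0.5 * x[1] > 1.0 else 0.0

instance = [0.8, 0.6, -1.2]
neighborhood = sample_neighborhood(instance, feature_std=[0.5, 0.5, 2.0],
                                   n_samples=200)
# The black box is only queried; nothing about its internals is used.
predictions = [toy_black_box(x) for x in neighborhood]
print(len(neighborhood), sum(predictions))
```

Only the prediction function is called, which is what makes the scheme independent of the model's internals and of the original training data.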

However, certain classes of black-box models (e.g., convolutional neural networks (Gu et al., 2018), information bottleneck based approaches (Tishby, Pereira, and Bialek, 2000; Alemi et al., 2016), and many others) work by mapping the high-dimensional input space into a low-dimensional latent space representation. Computing distances in this latent space allows us to appropriately assign similarity measures in the vicinity of any data point sampled from the high-dimensional input space, helping with the issue of sparsity. A subtle assumption being made in this approach is that Euclidean distance measures are applicable in the latent space. Developing better distance measures for the latent space will be the subject of future investigations. We call this a model-dependent approach. We demonstrate the use of both methods in our numerical results in Sec. III.
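The model-dependent scheme can be sketched as below, under the stated assumption that Euclidean distance is meaningful in the latent space. The Gaussian kernel form and the names `latent_similarity` and `toy_encoder` are hypothetical; `toy_encoder` merely stands in for the encoder of a trained model.

```python
import math

def latent_similarity(instance, neighbors, encoder, sigma):
    """Gaussian similarity computed from Euclidean distances in latent space.

    `encoder` maps an input to its latent representation; the kernel form
    exp(-d^2 / sigma^2) is an assumption.
    """
    z0 = encoder(instance)
    similarities = []
    for x in neighbors:
        d = math.dist(z0, encoder(x))
        similarities.append(math.exp(-(d / sigma) ** 2))
    return similarities

def toy_encoder(x):
    """Hypothetical 2-D latent map standing in for a trained encoder."""
    return (x[0] + x[1], x[2] - x[3])

instance = (1.0, 2.0, 0.5, 0.25)
neighbors = [
    (1.0, 2.0, 0.5, 0.25),   # the instance itself
    (1.5, 1.5, 0.5, 0.25),   # differs in input space, identical in latent space
    (2.0, 2.0, 0.5, 0.25),   # latent distance 1 from the instance
]
sims = latent_similarity(instance, neighbors, toy_encoder, sigma=1.0)
print(sims)
```

The second neighbor shows why the two schemes can differ: it is far from the instance in the input space but is assigned full similarity because the encoder maps both to the same latent point.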

II.3 Monte Carlo procedure for calculating Unfaithfulness with forward feature selection

After generating neighborhood data, TERP standardizes all the input and latent variables (in the model-agnostic and model-dependent schemes respectively) by subtracting the mean and dividing by the standard deviation. As a result, the feature contributions can be directly extracted as the local surrogate model weights. Once the neighborhood data ($\{x_i\}$) around a specific instance ($x$), the corresponding one-hot encoded black-box predictions ($\{g(x_i)\}$), and the similarity measures ($\Pi_i$) are obtained, this local surrogate model can be constructed. Since calculating $\mathcal{S}_k$ is trivial for any $k$ using Eq. 2, $\zeta_k$ can be evaluated by following Eq. 5 once $\mathcal{U}_k$ is known. We establish a baseline unfaithfulness ($\mathcal{U}_0$) by employing a linear model with $k = 0$, i.e., including only the intercept term $f_0$ in Eq. 4. We then employ a Monte Carlo forward feature selection algorithm using a Metropolis criterion (Metropolis et al., 1953) for calculating $\mathcal{U}_k$, as summarized in Algorithm 1. The central idea in this algorithm is that introducing stochasticity in determining which non-zero parameters are being scanned leads to much more rapid convergence than brute-force calculations that test all possible interpretations with $k$ non-zero coefficients in Eq. 1. Additionally, in this forward feature selection implementation, the weights for a $k$-coefficient model are initialized by inheriting the weights from the best $(k-1)$-coefficient model, resulting in faster convergence. If the addition of a feature does not decrease $\mathcal{U}$, then the coefficient corresponding to that feature is assigned a trivial weight of zero. Every step of the algorithm guarantees that we are moving in the right direction, i.e. decreasing the Unfaithfulness $\mathcal{U}$ with increasing $k$. However, it does not guarantee that we have obtained the lowest possible $\mathcal{U}$ for any particular $k$, which is a common limitation of any global optimization procedure.
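The standardization step mentioned at the start of this subsection can be sketched as follows. The use of the population standard deviation and the handling of constant features are assumptions, and `standardize` is a hypothetical helper name.

```python
import statistics

def standardize(samples):
    """Subtract the per-feature mean and divide by the per-feature std.

    Uses the population standard deviation (an assumption); a constant
    feature is mapped to zeros to avoid division by zero.
    """
    columns = list(zip(*samples))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    scaled = [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in samples
    ]
    return scaled, means, stds

samples = [[1.0, 10.0, 5.0], [2.0, 20.0, 5.0], [3.0, 30.0, 5.0]]
scaled, means, stds = standardize(samples)
print(scaled)
```

After this step all features share the same scale, so the magnitudes of the surrogate model weights can be compared directly as feature contributions.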

Input: Number of features ($n$), number of Monte Carlo iterations ($m$), Metropolis parameter ($\tau$), neighborhood datapoints ($\{x_i\}$), associated weights ($\Pi_i$), and black-box predictions ($\{g(x_i)\}$)
Output: linear, local surrogate model with $k$ non-zero coefficients ($F_k$).

Create variables for updating coefficients ($f$), intercept ($I$); storing best coefficients ($f^{best}$), best intercept ($I^{best}$), and minimum unfaithfulness ($\mathcal{U}^{min}$) respectively. Initialize all values with zero.
for $k = 1$ to $n$ do
     if $k = 1$ then
         Set any one coefficient in $f$ to 1, e.g., $f_1 = 1$. Set $I$ to 1.
     else
         Inherit coefficient and intercept values from the linear model learned for $k - 1$.
     end if
     Calculate unfaithfulness $\mathcal{U}$
     Set $\mathcal{U}^{min} = \mathcal{U}$
     repeat
         Pick $k$ small random numbers from a uniform distribution with mean 0 and add them to the $k$ non-zero coefficients respectively. Similarly, pick one more and add it to $I$
         Randomly swap the position of one of the $k$ non-zero coefficients with any coefficient in $f$ and obtain $f'$
         Calculate $\mathcal{U}'$
         if $\mathcal{U}' < \mathcal{U}$ then
              Accept coefficients, i.e., set $f = f'$ and $\mathcal{U} = \mathcal{U}'$
         else if $r < e^{-(\mathcal{U}' - \mathcal{U})/\tau}$, where $r \in [0, 1]$ is a uniform random number, then
              Accept coefficients, i.e., set $f = f'$ and $\mathcal{U} = \mathcal{U}'$
         else
              Reject coefficients
         end if
         if $\mathcal{U} < \mathcal{U}^{min}$ then
              Set $\mathcal{U}^{min} = \mathcal{U}$
              Set $f^{best} = f$
              Set $I^{best} = I$
         end if
     until $m$ iterations are completed
end for
Algorithm 1 Monte Carlo algorithm for $\mathcal{U}_k$
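A compact Python sketch in the spirit of Algorithm 1 is given below. It is not the TERP implementation, and several details are assumptions made to keep the example short and well defined: the new feature added at each $k$ is chosen at random and starts with weight zero, the swap move is attempted only with probability `p_swap` and only towards currently inactive features, and the parameter names (`n_iter`, `tau`, `step`, `p_swap`) and the toy data are hypothetical.

```python
import math
import random

def unfaithfulness(coef, intercept, X, y, weights):
    """Weighted sum of squared residuals of the linear surrogate."""
    total = 0.0
    for row, target, w in zip(X, y, weights):
        pred = intercept + sum(c * v for c, v in zip(coef, row))
        total += w * (target - pred) ** 2
    return total

def mc_forward_selection(X, y, weights, n_iter=2000, tau=0.01,
                         step=0.1, p_swap=0.2, seed=0):
    """Return [(k, U_min, active_features, coefficients, intercept), ...]."""
    rng = random.Random(seed)
    n = len(X[0])
    best_coef, best_int, best_active = [0.0] * n, 0.0, []
    results = []
    for k in range(1, n + 1):
        # Inherit the best (k-1)-coefficient model and activate one new feature.
        coef, intercept = best_coef[:], best_int
        new_feature = rng.choice([j for j in range(n) if j not in best_active])
        active = best_active + [new_feature]
        u = unfaithfulness(coef, intercept, X, y, weights)
        u_min, best_coef, best_int, best_active = u, coef[:], intercept, active[:]
        for _ in range(n_iter):
            trial, trial_active = coef[:], active[:]
            trial_int = intercept + rng.uniform(-step, step)
            for j in trial_active:
                trial[j] += rng.uniform(-step, step)
            if rng.random() < p_swap:
                # Move one active coefficient onto a currently inactive feature.
                inactive = [j for j in range(n) if j not in trial_active]
                if inactive:
                    pos, new = rng.randrange(k), rng.choice(inactive)
                    old = trial_active[pos]
                    trial[new], trial[old] = trial[old], 0.0
                    trial_active[pos] = new
            u_trial = unfaithfulness(trial, trial_int, X, y, weights)
            # Metropolis criterion: always accept improvements and
            # occasionally accept a worse trial.
            if u_trial < u or rng.random() < math.exp(-(u_trial - u) / tau):
                coef, intercept, active, u = trial, trial_int, trial_active, u_trial
            if u < u_min:
                u_min, best_coef, best_int, best_active = u, coef[:], intercept, active[:]
        results.append((k, u_min, sorted(best_active), best_coef[:], best_int))
    return results

# Toy data: the target depends on features 0 and 1 only.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(50)]
y = [2.0 * a - 1.0 * b + 0.5 for a, b, _ in X]
weights = [1.0] * len(X)

results = mc_forward_selection(X, y, weights)
for k, u_min, active, _, _ in results:
    print(k, round(u_min, 3), active)
```

Because each $k$ starts from the best model found for $k - 1$, the recorded minimum unfaithfulness cannot increase with $k$ in this sketch. As in the text, nothing guarantees that the value found for a given $k$ is the global minimum.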

III Applications to different domains

In this section, we look at domains that have seen rapid adoption of AI-driven methods and apply TERP to explain predictions coming from widely used black-box models. We focus on AI and ML methods for image classification, tabular data analysis, and, more recently, analyzing and enhancing molecular dynamics (MD) simulations (Wang, Ribeiro, and Tiwary, 2020). Although these methods are becoming increasingly popular, interpretability approaches have not been systematically applied, particularly for the last class of problems, to rationalize the deep neural networks at the heart of such methods.

III.1 Image classification: MobileNets

Convolutional neural networks (CNNs) are a very popular class of AI models constructed from a deep, non-fully connected feed-forward artificial neural network (ANN). Because of their unique architecture, CNNs are efficient in analyzing data with local correlations and have numerous applications in computer vision and in other fields (Traore, Kamsu-Foguem, and Tangara, 2018; Giménez, Palanca, and Botti, 2020; Pelletier, Webb, and Petitjean, 2019). By construction, CNNs are black-box models, and because of their practical usage it is desirable to employ an interpretation scheme to validate their predictions before deploying them.

In this work, we examine MobileNets (Howard et al., 2017), a particular CNN implementation for image recognition that is suitable for mobile devices because it is architecturally light-weight. We trained a MobileNet model using the publicly available Large-scale CelebFaces Attributes (CelebA) dataset (Liu et al., 2015) to learn features from human facial images. Details of the architecture and training procedure are provided in the Supporting Information (SI).

Fig. 3 shows results from employing TERP to explain feature predictions for four images that were not present in the training data. For this purpose, every image was divided into superpixels using the SLIC image segmentation algorithm (Achanta et al., 2010). These superpixels are then perturbed to generate neighborhood data, based on which a linear, interpretable model is constructed by minimizing the Interpretation Free Energy in Eq. 5. We can see from Fig. 3 (a), (b), and (c), respectively for the attributes ‘smiling’, ‘goatee’, and ‘necktie’, that the black-box model made predictions based on reasons that a human reader of this manuscript would perceive as justified. However, for Fig. 3 (d), the black-box model predicted the attribute ‘blonde hair’, which is clearly wrong as identified by TERP: the features leading to this classification have nothing to do with hair or its color. For these four images, TERP generates the interpretation free energy minima at four different as shown in Fig. 3 (e). TERP parameters for these four explanations are provided in the SI.
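The superpixel perturbation described above can be sketched as below. The specific perturbation used here, replacing switched-off superpixels with a constant `baseline` value, is an assumption borrowed from common practice in surrogate-model explainers and is not stated in the text. The segmentation is a hand-written label map standing in for the output of a SLIC implementation, and `perturb_superpixels` is a hypothetical helper.

```python
import random

def perturb_superpixels(image, segments, n_samples, baseline=0.0,
                        keep_prob=0.5, seed=0):
    """Return (masks, perturbed_images).

    image, segments: 2-D lists of equal shape; `segments` holds integer
    superpixel labels. Each mask has one 0/1 entry per superpixel, in
    sorted label order; pixels of switched-off superpixels are replaced
    by `baseline`.
    """
    rng = random.Random(seed)
    labels = sorted({lab for row in segments for lab in row})
    masks, images = [], []
    for _ in range(n_samples):
        on = {lab: rng.random() < keep_prob for lab in labels}
        masks.append([1 if on[lab] else 0 for lab in labels])
        images.append([
            [pix if on[lab] else baseline for pix, lab in zip(img_row, seg_row)]
            for img_row, seg_row in zip(image, segments)
        ])
    return masks, images

# A 4x4 grayscale "image" split into four 2x2 superpixels labelled 0..3.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
segments = [[0, 0, 1, 1],
            [0, 0, 1, 1],
            [2, 2, 3, 3],
            [2, 2, 3, 3]]
masks, images = perturb_superpixels(image, segments, n_samples=5, baseline=-1.0)
print(masks)
```

The binary masks then play the role of the interpretable features: the surrogate model assigns one weight per superpixel, and the perturbed images are passed to the black box to obtain the corresponding predictions.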

Figure 3: Using TERP to interpret and check the reliability of a MobileNet model trained on the CelebA dataset for images not included in the training set. Top rows in (a), (b), (c), and (d) show images classified by MobileNet as ‘smiling’, ‘goatee’, ‘necktie’ and ‘blonde hair’ respectively. Bottom rows show the corresponding free energy optimized explanations generated by TERP. By comparing the figures in the bottom row with the corresponding figures in the top row, it can be seen that the black-box MobileNet model had the right reasons to label the images as ‘smiling’, ‘goatee’, and ‘necktie’ in (a), (b), and (c) respectively. However, for (d) the black-box model prediction of ‘blonde hair’ is clearly wrong. (e) Interpretation free energy versus number of features for these four explanations, showing the respective minima.

III.2 Heart disease prediction: XGBoost

XGBoost (Extreme Gradient Boosting) is a powerful ML library that has become very popular in practical applications (Zoabi, Deri-Rozov, and Shomron, 2021; Nobre and Neves, 2019; Dhaliwal, Nahid, and Abbas, 2018) due to its excellent performance, flexibility, and ease of implementation (Chen et al., 2019). This library is capable of analyzing both numerical and categorical data and is typically used to train gradient boosted decision trees. Here, we train an XGBoost classifier on the Heart Disease Dataset from the UC Irvine Machine Learning Repository (Dua and Graff, 2017; Detrano et al., 1989; Aha and Kibler, 1988; Gennari, Langley, and Fisher, 1989). The dataset contains different feature reports from patients collected by four different hospitals. The features are: age, sex, chest pain type (cp), resting blood pressure (trestbps), serum cholesterol in mg/dl (chol), fasting blood sugar mg/dl (fbs), resting electrocardiographic results (restecg), maximum heart rate achieved (thalach), exercise induced angina (exang), ST depression induced by exercise relative to rest (oldpeak), slope of the peak exercise ST segment (slope), number of major vessels (0-3) colored by fluoroscopy (ca), and thalassemia (thal). The dataset includes both categorical and numerical data, and missing features for instances were populated using a dummy value (a negative integer). We employed of the total data, i.e, instances to train the classifier, and the rest of the data was used for validation purposes. The XGBoost parameters used to train the model are reported in the SI.

We used TERP to explain a specific positive heart disease prediction. Detailed discussions for all interpretations are provided in the SI. Interestingly, TERP identifies that for an interpretation with , the feature ‘sex’ played the largest role in the black-box XGBoost model prediction. This is possibly due to a bias in the training data, where male patients outnumbered female patients by a factor of , and the fraction of patients with heart disease was much higher for males, as shown in Fig. 4 (b). This demonstrates a key advantage of TERP, since such a bias would not be obvious when using a global feature attribution scheme such as SHAP, as shown in the SI (Figure S3b). For interpretation, fasting blood sugar mg/dl (fbs) and chest pain (cp) were given the highest importance by the black-box model when predicting heart disease for this instance (Fig. 4 (c), (d)). Additionally, the instance was deliberately populated with two missing fields, for the slope and the number of major vessels colored by fluoroscopy (ca) features. In this regard, the XGBoost classifier correctly learnt that these features are not relevant for this prediction by assigning almost zero weight for all models, as discussed in the SI (Figure S4). Fig. 4 (a) shows a minimum for at .

Thus, this example shows how TERP successfully checked for training data bias, and the effects of missing values in the black-box model prediction that can be commonly found in practical problems.

Figure 4: Using TERP to interpret the XGBoost classifier for predicting heart disease. (a) Interpretation free energy vs. non-zero coefficients plot showing a minimum for at , (b) positive heart disease values in the training dataset for the feature ‘sex’. Subfigures (c) and (d) show relative feature importance for and interpretations respectively.

III.3 AI-augmented MD method: VAMPnets

Variational approach for Markov processes (VAMPnets) is a popular technique for analyzing molecular dynamics (MD) trajectories. VAMPnets can be used to featurize, transform inputs to a lower dimensional representation, and construct a Markov state model (Bowman, Pande, and Noé, 2013) in an automated manner by maximizing the so-called VAMP score. A detailed discussion of VAMPnets theory and parameters is provided in the SI.

In this work, we trained a VAMPnet model on a standard toy system: alanine dipeptide in vacuum. The system was parametrized using the CHARMM36m (Huang et al., 2017) force field, and a ns MD simulation at K temperature and atm pressure was performed in GROMACS (Van Der Spoel et al., 2005). Afterwards, an 8-dimensional input space with sines and cosines of all the dihedral angles was constructed and passed to VAMPnet. For the chosen parameters, VAMPnet was able to identify three metastable states I, II, and III as shown in Fig. 5 (a).
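The sine and cosine featurization mentioned above can be sketched as follows. Four dihedral angles give the 8-dimensional input; which four angles are used is not specified here, so the values below are hypothetical numbers in radians and `featurize_dihedrals` is an illustrative helper name.

```python
import math

def featurize_dihedrals(angles):
    """Map each dihedral angle (radians) to its (sin, cos) pair."""
    features = []
    for angle in angles:
        features.extend((math.sin(angle), math.cos(angle)))
    return features

frame = [-2.5, 2.8, 0.1, 3.1]  # hypothetical dihedral values for one MD frame
features = featurize_dihedrals(frame)
print(len(features))

# Angles that differ by a full turn map to the same features.
wrapped = featurize_dihedrals([math.pi])
unwrapped = featurize_dihedrals([-math.pi])
```

Using the sine and cosine rather than the raw angle removes the artificial discontinuity at the periodic boundary, so that configurations that are physically close also remain close in the input space.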

To interpret the VAMPnet model using TERP, we picked three configurations A, B, and C corresponding to three datapoints at the boundaries between the three pairs of states (I, II), (II, III), and (III, I) respectively, each thus likely to be a configuration from the transition state ensemble for moving between the corresponding pair of states. These three instances were chosen for TERP analysis, with the goal of understanding the reasons behind their classification at the boundaries between the respective metastable states. First, neighborhoods around each of these instances were generated by randomly perturbing the input space based on the standard deviation of the respective feature. The generated neighborhood data was then used to construct linear, local interpretable models using Eq. 5. As shown in Fig. 5 (c), TERP identified minima at and for these configurations respectively. The relative feature importance for each of these models can be used to explain the black-box VAMPnet model predictions. Fig. 5 (e) shows that VAMPnet classified A at the boundary between two specific metastable states by considering the dihedral angle, while for configurations B, and C both , and dihedral angles were taken into account. The feature attributions learned by TERP to explain VAMPnet predictions for this system are in agreement with previous literature (Bolhuis, Dellago, and Chandler, 2000), thereby showing that VAMPnet worked here for the right reasons and thus can be trusted.

Figure 5: Using TERP to interpret VAMPnet for alanine dipeptide in vacuum. Projected converged states highlighted in three different colors as obtained by VAMPnets along (a) (), and (b) () dihedral angles. VAMPnets learns the transition states in this system, three of which were probed using TERP, denoted by A, B, and C. (c) Interpretation free energy corresponding to these three states at . (d) Molecular structure of alanine dipeptide for I, II, III. (e), (f), (g) show the relative feature importance at for these three states.

III.4 AI-augmented MD method: SPIB

The second AI-augmented MD method we explain using TERP is the State Predictive Information Bottleneck (SPIB) method (Wang and Tiwary, 2021). SPIB is an information bottleneck based framework that takes MD trajectory order parameters (OPs) as inputs and constructs a low-dimensional latent space representation by predicting the metastable state of the molecular system after a short time delay. This is implemented through an optimal encoder and decoder combination. The decoder ensures that the model retains as much predictive power as possible, while the encoder ensures that as little information as possible has been used for that prediction. It has been shown in previous works (Wang et al., 2022; Wang and Tiwary, 2021) that this latent space approximates the reaction coordinate describing system behaviour.

In this work, we ran an MD simulation of a small α-aminoisobutyric acid peptide (Aib) at constant temperature and pressure, maintained by a Nosé-Hoover thermostat and a Parrinello-Rahman barostat Nosé (1984); Parrinello and Rahman (1980) in GROMACS. The peptide was solvated in TIP3P water molecules, and the CHARMM36m force field was used to parametrize (Aib), which was prepared using CHARMM-GUI Lee et al. (2016).

To analyze the resultant MD trajectory, a deep non-linear SPIB artificial neural network with two encoder and two decoder layers was constructed. The sines and cosines of all the peptide dihedral angles were passed as input to SPIB, which detected 10 converged metastable states as shown in Fig. 6 (c). Here the two most populated states correspond to right (R)- and left (L)-handed chiral structures respectively. Interestingly, we can see from Fig. 6 (d) that SPIB places these states as far from each other as possible. This indicates that using the model-dependent scheme from Sec. II.2 to compute similarity measures on the basis of the latent space could be justified here. A detailed discussion of the dihedral angles and the SPIB training process is provided in the SI.
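The sine and cosine encoding of dihedral angles mentioned above can be sketched as follows. Angles are periodic, so feeding raw values to a network would place configurations near +π and −π far apart even though they are physically close; mapping each angle to a (sin, cos) pair removes that discontinuity. The function names and the interleaved feature ordering below are assumptions for illustration and are not taken from the SPIB or TERP code.

```python
import math


def featurize_dihedrals(angles_rad):
    """Map each dihedral angle (in radians) to a (sin, cos) pair.

    The output is flattened as [sin(a0), cos(a0), sin(a1), cos(a1), ...].
    This interleaved ordering is an arbitrary choice made for the sketch.
    """
    features = []
    for angle in angles_rad:
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features


def recover_angle(sin_value, cos_value):
    """Invert the encoding; the returned angle lies in [-pi, pi]."""
    return math.atan2(sin_value, cos_value)


# Two angles on either side of the +/- pi boundary map to nearby features.
near_plus_pi = featurize_dihedrals([math.pi - 0.01])
near_minus_pi = featurize_dihedrals([-math.pi + 0.01])
```

With this encoding, an input of N dihedral angles becomes a 2N-dimensional feature vector, and a feature attribution for an angle can be read off from the pair of coefficients belonging to its sine and cosine.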

To better understand the transition process, we employed TERP to probe regions near the transition states learnt by SPIB. Using TERP, we learnt the most important features among all the dihedral angles for the different regions A, B, C, D, E, F, G, H, I, and J shown in Fig. 6 (c).

We can see from Fig. 6 (b) and (c) that the transition between the two SPIB states that correspond to the fully right (R)- and left (L)-handed configurations of the peptide can follow either the top pathway, highlighted by representative instances A, B, C, D, and E, or the bottom pathway, highlighted by F, G, H, I, and J. Fig. 6 (c) also highlights the relevant features, corresponding to non-zero coefficients of the interpretation obtained using TERP. By performing TERP for the different states lying between these two end configurations, we see that, starting from one of these SPIB converged end states, the molecule may reach the other as adjacent residues undergo chiral transitions. From Fig. 6 (c), for instance A, SPIB considered the highlighted dihedral angle for assigning a metastable state. Similarly, for instances B, C, D, and E, the adjacent dihedral angles of increasing order were considered. After reaching the second end state, the molecule can transition back to the initial state if the dihedral angles undergo right-handed transitions starting from the end residues and proceeding to the initial residues. However, if the initial residues undergo right-handed transitions before the end residues, one possibility is that the molecule will go back to the initial state by following the bottom pathway through instances F, G, H, I, and J, as learnt by TERP. This result matches previous literature Biswas, Lickert, and Stock (2018) and validates the SPIB model behavior for these representative instances.

Figure 6: Using TERP to interpret the SPIB deep neural network for (Aib) in water. (a) Left (L)- and right (R)-handed molecular structures of (Aib). (b) Free energy surface for the 2-dimensional SPIB reaction coordinate. (c) SPIB converged state labels projected on the free energy surface for the SPIB reaction coordinate.

IV Discussion

The use of AI-based black-box models has now become a staple across domains, as they can be deployed without any need for a fundamental understanding of the governing processes at work. This, however, raises the questions of whether an AI model can be trusted and how one should go about deriving meaning from AI-based models. Numerous approaches have been proposed to tackle this problem Ribeiro, Singh, and Guestrin (2016); Fisher, Rudin, and Dominici (2019); Lundberg and Lee (2017); Sundararajan, Taly, and Yan (2017); Wachter, Mittelstadt, and Russell (2017); however, very few, with the notable exceptions of Refs. Wellawatte, Seshadri, and White (2022); Kikutsuji et al. (2022), have been used in molecular simulations. In this work, we established a thermodynamic framework for generating interpretable representations of complex black-box models, wherein the optimal representation is the one that minimizes unfaithfulness to the ground truth model while staying as simple as possible. This trade-off was quantified through the concept of an Interpretation Free Energy, which has simple but useful mathematical properties guaranteeing the existence of a unique minimum. This minimum is found using a Monte Carlo forward feature selection scheme. We demonstrated the use of this approach on different problems using AI, such as classifying images, predicting heart disease, and labeling biomolecular conformations. We believe this is arguably one of the first applications of interpretability schemes to AI-augmented molecular dynamics, which is a rapidly growing sub-discipline in its own right.

In TERP, as well as in other local surrogate model based schemes such as LIME Ribeiro, Singh, and Guestrin (2016), interpretations are generated that are valid locally, in the vicinity of the data instance being explained. This raises a key question: how does one define locality? Here, for biomolecular systems, we applied TERP to methods such as VAMPnet Mardt et al. (2018) and SPIB Wang and Tiwary (2021). Especially for the latter, we were able to exploit the low-dimensional latent space, which captures attributes of the reaction coordinate of the system Wang and Tiwary (2021); Wang et al. (2022). In future work, we would like to revisit more carefully the question of introducing kinetically relevant distance metrics on the low-dimensional manifold Tsai and Tiwary (2021), including for the case when the data is generated from biased importance sampling Tsai, Smith, and Tiwary (2021). A second direction for future work will involve exploring whether systematically varying the parameter changes the resulting interpretation qualitatively. This could help develop strategies for picking a range of parameter values over which minimizing the Interpretation Free Energy does not lead to drastically different interpretations. However, we believe that even in its current version, TERP should be useful to the community for generating optimally interpretable representations of complex AI-driven models in molecular sciences and beyond. Code for TERP is available at

V Acknowledgments

This work was supported by the National Science Foundation, grant no. CHE-2044165. The authors also thank Deepthought2, MARCC, and XSEDE (projects CHE180007P and CHE180027P) for the computational resources used in this work.



  • Dhar (2013) V. Dhar, “Data science and prediction,” Communications of the ACM 56, 64–73 (2013).
  • Shalev-Shwartz and Ben-David (2014) S. Shalev-Shwartz and S. Ben-David, Understanding machine learning: From theory to algorithms (Cambridge university press, 2014).
  • LeCun, Bengio, and Hinton (2015) Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521, 436–444 (2015).
  • Davies et al. (2021) A. Davies, P. Veličković, L. Buesing, S. Blackwell, D. Zheng, N. Tomašev, R. Tanburn, P. Battaglia, C. Blundell, A. Juhász, et al., “Advancing mathematics by guiding human intuition with ai,” Nature 600, 70–74 (2021).
  • Carleo et al. (2019) G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto,  and L. Zdeborová, “Machine learning and the physical sciences,” Reviews of Modern Physics 91, 045002 (2019).
  • Mater and Coote (2019) A. C. Mater and M. L. Coote, “Deep learning in chemistry,” Journal of chemical information and modeling 59, 2545–2559 (2019).
  • Hamet and Tremblay (2017) P. Hamet and J. Tremblay, “Artificial intelligence in medicine,” Metabolism 69, S36–S40 (2017).
  • Baldi and Brunak (2001) P. Baldi and S. Brunak, Bioinformatics: the machine learning approach (MIT press, 2001).
  • Brunton and Kutz (2022) S. L. Brunton and J. N. Kutz, Data-driven science and engineering: Machine learning, dynamical systems, and control (Cambridge University Press, 2022).
  • Loyola-Gonzalez (2019) O. Loyola-Gonzalez, “Black-box vs. white-box: Understanding their advantages and weaknesses from a practical point of view,” IEEE Access 7, 154096–154113 (2019).
  • Callen (1985) H. B. Callen, “Thermodynamics and an introduction to thermostatistics,”  (1985).
  • Kumar and Minz (2014) V. Kumar and S. Minz, “Feature selection: a literature review,” SmartCR 4, 211–229 (2014).
  • Chen and Guestrin (2016) T. Chen and C. Guestrin, “XGBoost,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, 2016).
  • Howard et al. (2017) A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto,  and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,”  (2017).
  • Frenkel and Smit (2001) D. Frenkel and B. Smit, Understanding molecular simulation: from algorithms to applications, Vol. 1 (Elsevier, 2001).
  • Doerr et al. (2021) S. Doerr, M. Majewski, A. Pérez, A. Kramer, C. Clementi, F. Noe, T. Giorgino,  and G. De Fabritiis, “Torchmd: A deep learning framework for molecular simulations,” Journal of chemical theory and computation 17, 2355–2363 (2021).
  • Han et al. (2017) J. Han, L. Zhang, R. Car, et al., “Deep potential: A general representation of a many-body potential energy surface,” arXiv preprint arXiv:1707.01478  (2017).
  • Gao et al. (2020) X. Gao, F. Ramezanghorbani, O. Isayev, J. S. Smith, and A. E. Roitberg, “Torchani: a free and open source pytorch-based deep learning implementation of the ani neural network potentials,” Journal of chemical information and modeling 60, 3408–3415 (2020).
  • Ma and Dinner (2005) A. Ma and A. R. Dinner, “Automatic method for identifying reaction coordinates in complex systems,” The Journal of Physical Chemistry B 109, 6769–6779 (2005).
  • Wang, Ribeiro, and Tiwary (2020) Y. Wang, J. M. L. Ribeiro,  and P. Tiwary, “Machine learning approaches for analyzing and enhancing molecular dynamics simulations,” Current opinion in structural biology 61, 139–145 (2020).
  • Ribeiro et al. (2018) J. M. L. Ribeiro, P. Bravo, Y. Wang, and P. Tiwary, “Reweighted autoencoded variational bayes for enhanced sampling (rave),” The Journal of chemical physics 149, 072301 (2018).
  • Vanden-Eijnden (2014) E. Vanden-Eijnden, “Transition path theory,” An introduction to Markov state models and their application to long timescale molecular simulation , 91–100 (2014).
  • Smith et al. (2020) Z. Smith, P. Ravindra, Y. Wang, R. Cooley,  and P. Tiwary, “Discovering protein conformational flexibility through artificial-intelligence-aided molecular dynamics,” The Journal of Physical Chemistry B 124, 8221–8229 (2020).
  • Mardt et al. (2018) A. Mardt, L. Pasquali, H. Wu,  and F. Noé, “Vampnets for deep learning of molecular kinetics,” Nature communications 9, 1–11 (2018).
  • Wang and Tiwary (2021) D. Wang and P. Tiwary, “State predictive information bottleneck,” The Journal of Chemical Physics 154, 134111 (2021).
  • Beyerle, Mehdi, and Tiwary (2022) E. R. Beyerle, S. Mehdi,  and P. Tiwary, “Quantifying energetic and entropic pathways in molecular systems,” The Journal of Physical Chemistry B  (2022).
  • Bolhuis, Dellago, and Chandler (2000) P. G. Bolhuis, C. Dellago,  and D. Chandler, “Reaction coordinates of biomolecular isomerization,” Proceedings of the National Academy of Sciences 97, 5877–5882 (2000).
  • Mehdi et al. (2022) S. Mehdi, D. Wang, S. Pant,  and P. Tiwary, “Accelerating all-atom simulations and gaining mechanistic understanding of biophysical systems through state predictive information bottleneck,” Journal of Chemical Theory and Computation 18, 3231–3238 (2022).
  • Ribeiro, Singh, and Guestrin (2016) M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why should I trust you?’ Explaining the predictions of any classifier,” in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (2016) pp. 1135–1144.
  • Fisher, Rudin, and Dominici (2019) A. Fisher, C. Rudin,  and F. Dominici, “All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously.” J. Mach. Learn. Res. 20, 1–81 (2019).
  • Lundberg and Lee (2017) S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” Advances in neural information processing systems 30 (2017).
  • Gupta, Kulkarni, and Mukherjee (2021) A. Gupta, M. Kulkarni,  and A. Mukherjee, “Accurate prediction of b-form/a-form dna conformation propensity from primary sequence: A machine learning and free energy handshake,” Patterns 2, 100329 (2021).
  • Sundararajan, Taly, and Yan (2017) M. Sundararajan, A. Taly,  and Q. Yan, “Axiomatic attribution for deep networks,” in International conference on machine learning (PMLR, 2017) pp. 3319–3328.
  • Wachter, Mittelstadt, and Russell (2017) S. Wachter, B. Mittelstadt,  and C. Russell, “Counterfactual explanations without opening the black box: Automated decisions and the gdpr,” Harv. JL & Tech. 31, 841 (2017).
  • Wellawatte, Seshadri, and White (2022) G. P. Wellawatte, A. Seshadri,  and A. D. White, “Model agnostic generation of counterfactual explanations for molecules,” Chemical science 13, 3697–3705 (2022).
  • Lerman (1980) P. Lerman, “Fitting segmented regression models by grid search,” Journal of the Royal Statistical Society: Series C (Applied Statistics) 29, 77–84 (1980).
  • Gu et al. (2018) J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, and T. Chen, “Recent advances in convolutional neural networks,” Pattern Recognition 77, 354–377 (2018).
  • Tishby, Pereira, and Bialek (2000) N. Tishby, F. C. Pereira,  and W. Bialek, “The information bottleneck method,”  (2000).
  • Alemi et al. (2016) A. A. Alemi, I. Fischer, J. V. Dillon,  and K. Murphy, “Deep variational information bottleneck,”  (2016).
  • Metropolis et al. (1953) N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller,  and E. Teller, “Equation of state calculations by fast computing machines,” The journal of chemical physics 21, 1087–1092 (1953).
  • Traore, Kamsu-Foguem, and Tangara (2018) B. B. Traore, B. Kamsu-Foguem,  and F. Tangara, “Deep convolution neural network for image recognition,” Ecological Informatics 48, 257–268 (2018).
  • Giménez, Palanca, and Botti (2020) M. Giménez, J. Palanca, and V. Botti, “Semantic-based padding in convolutional neural networks for improving the performance in natural language processing. a case of study in sentiment analysis,” Neurocomputing 378, 315–323 (2020).
  • Pelletier, Webb, and Petitjean (2019) C. Pelletier, G. I. Webb,  and F. Petitjean, “Temporal convolutional neural network for the classification of satellite image time series,” Remote Sensing 11, 523 (2019).
  • Liu et al. (2015) Z. Liu, P. Luo, X. Wang,  and X. Tang, “Deep learning face attributes in the wild,” in Proceedings of International Conference on Computer Vision (ICCV) (2015).
  • Achanta et al. (2010) R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua,  and S. Süsstrunk, “Slic superpixels,” Tech. Rep. (2010).
  • Zoabi, Deri-Rozov, and Shomron (2021) Y. Zoabi, S. Deri-Rozov,  and N. Shomron, “Machine learning-based prediction of COVID-19 diagnosis based on symptoms,” npj Digital Medicine 4 (2021).
  • Nobre and Neves (2019) J. Nobre and R. F. Neves, “Combining principal component analysis, discrete wavelet transform and xgboost to trade in the financial markets,” Expert Systems with Applications 125, 181–194 (2019).
  • Dhaliwal, Nahid, and Abbas (2018) S. S. Dhaliwal, A.-A. Nahid,  and R. Abbas, “Effective intrusion detection system using xgboost,” Information 9, 149 (2018).
  • Chen et al. (2019) T. Chen, T. He, M. Benesty, and V. Khotilovich, “Package ‘xgboost’,” R version 90, 1–66 (2019).
  • Dua and Graff (2017) D. Dua and C. Graff, “UCI machine learning repository,”  (2017).
  • Detrano et al. (1989) R. Detrano, A. Janosi, W. Steinbrunn, M. Pfisterer, J.-J. Schmid, S. Sandhu, K. H. Guppy, S. Lee,  and V. Froelicher, “International application of a new probability algorithm for the diagnosis of coronary artery disease,” The American journal of cardiology 64, 304–310 (1989).
  • Aha and Kibler (1988) D. Aha and D. Kibler, “Instance-based prediction of heart-disease presence with the cleveland database,” University of California 3, 3–2 (1988).
  • Gennari, Langley, and Fisher (1989) J. H. Gennari, P. Langley,  and D. Fisher, “Models of incremental concept formation,” Artificial intelligence 40, 11–61 (1989).
  • Bowman, Pande, and Noé (2013) G. R. Bowman, V. S. Pande,  and F. Noé, An introduction to Markov state models and their application to long timescale molecular simulation, Vol. 797 (Springer Science & Business Media, 2013).
  • Huang et al. (2017) J. Huang, S. Rauscher, G. Nawrocki, T. Ran, M. Feig, B. L. De Groot, H. Grubmüller,  and A. D. MacKerell, “Charmm36m: an improved force field for folded and intrinsically disordered proteins,” Nature methods 14, 71–73 (2017).
  • Van Der Spoel et al. (2005) D. Van Der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark,  and H. J. Berendsen, “Gromacs: fast, flexible, and free,” Journal of computational chemistry 26, 1701–1718 (2005).
  • Wang et al. (2022) D. Wang, R. Zhao, J. D. Weeks,  and P. Tiwary, “Influence of long-range forces on the transition states and dynamics of nacl ion-pair dissociation in water,” The Journal of Physical Chemistry B 126, 545–551 (2022).
  • Nosé (1984) S. Nosé, “A unified formulation of the constant temperature molecular dynamics methods,” The Journal of chemical physics 81, 511–519 (1984).
  • Parrinello and Rahman (1980) M. Parrinello and A. Rahman, “Crystal structure and pair potentials: A molecular-dynamics study,” Physical review letters 45, 1196 (1980).
  • Lee et al. (2016) J. Lee, X. Cheng, J. M. Swails, M. S. Yeom, P. K. Eastman, J. A. Lemkul, S. Wei, J. Buckner, J. C. Jeong, Y. Qi, et al., “Charmm-gui input generator for namd, gromacs, amber, openmm, and charmm/openmm simulations using the charmm36 additive force field,” Journal of chemical theory and computation 12, 405–413 (2016).
  • Biswas, Lickert, and Stock (2018) M. Biswas, B. Lickert, and G. Stock, “Metadynamics enhanced markov modeling of protein dynamics,” The Journal of Physical Chemistry B 122, 5508–5514 (2018).
  • Kikutsuji et al. (2022) T. Kikutsuji, Y. Mori, K.-i. Okazaki, T. Mori, K. Kim,  and N. Matubayasi, “Explaining reaction coordinates of alanine dipeptide isomerization obtained from deep neural networks using explainable artificial intelligence (xai),” The Journal of Chemical Physics 156, 154108 (2022).
  • Tsai and Tiwary (2021) S.-T. Tsai and P. Tiwary, “On the distance between a and b in molecular configuration space,” Molecular Simulation 47, 449–456 (2021).
  • Tsai, Smith, and Tiwary (2021) S.-T. Tsai, Z. Smith, and P. Tiwary, “Sgoop-d: Estimating kinetic distances and reaction coordinate dimensionality for rare event systems from biased/unbiased simulations,” Journal of Chemical Theory and Computation 17, 6757–6765 (2021).