1 Introduction
Competing enthalpic and entropic contributions to solvation free energies give rise to the hydrophobic effect^{1}, which is vital for protein function and folding^{2, 3, 4}. Despite extensive theoretical work^{1, 5}, a quantitative understanding of the hydrophobic effect, particularly at heterogeneous surfaces such as those of proteins and mixed bilayers, remains elusive.
Because surface water shows significantly altered behaviour compared to bulk water^{6, 7}, it is essential for our understanding of the thermodynamics and energetics of protein solvation to better characterize, e.g., the relative contributions of different solvation shells or the effect of individual protein side chains on the solvent. Molecular dynamics (MD) simulations describe the hydrophobic effect at an atomic level^{8, 9}, but a deeper understanding of the molecular driving forces requires a quantitative and spatially resolved picture of the solvation shell thermodynamics, which poses considerable challenges.
Methods like thermodynamic integration (TI)^{10, 11} allow for the calculation of solvation entropies from MD simulations, but their lack of spatial resolution precludes a detailed analysis of how local features of the solvent-surface interface contribute and interact. Various order parameters^{12, 13, 14, 15, 16} assess both the local translational and the local rotational order of water molecules, but yield only a qualitative picture of the thermodynamic entropy.
Here, we limit our analysis to absolute rotational water entropies and present a method to achieve spatial resolution from atomistic simulations or Monte Carlo ensembles. Our method employs a mutual information expansion (MIE) to calculate the total entropy of the water molecules from the contributions of each molecule individually and the entropy losses due to correlations between molecule pairs and triples. A similar approach is taken, e.g., by grid inhomogeneous solvation theory (GIST)^{17, 18, 19, 20, 21, 22}, which, like 3D-2PT^{23, 24, 25}, considers contributions by voxels rather than entropic contributions by the correlations between individual molecules directly. Convergence of these correlation terms is challenging, as they require sampling and density estimates in high-dimensional configuration spaces.
MIE terms were calculated using a k-nearest-neighbor (kNN) density estimator, typically used in Euclidean spaces^{26, 27, 28}, which we modified and optimized for SO(3)^N, the Cartesian product of copies of the group of rotations. We considered different metrics for the nearest-neighbor search in SO(3)^N, determined an optimal value of k, and provide a computationally efficient framework for rotational entropy calculation. For easier notation, we will develop our method for water molecules, although it is general and applicable to any system with rotational degrees of freedom.
In the following sections, we will first provide the conceptual foundation and then describe our rotation entropy approach. Subsequently, we will apply it to analytical test distributions, as well as to MD water boxes.
2 Theory
2.1 Absolute entropy
When separating the entropy of water into rotational and translational contributions, correlations between translation and rotation give rise to a mutual information (MI) term. Here, we focus on the rotational contribution.
Let the rotation of the water molecules of the simulation system be described by a Hamiltonian with angular momenta and orientations as variables, comprising the kinetic energy and the potential energy, the latter typically described by a molecular dynamics force field. The total entropy is
with the Boltzmann constant, Planck’s constant, and the phase space density. Because the phase space density factorizes, the entropy can be split into a kinetic and a configurational term
where the constants are the eigenvalues of the moment-of-inertia tensor of a water molecule.
Because the kinetic term can be solved analytically, the challenge is to estimate the configurational term.
2.2 Entropy estimation
Because the rotational entropy integral in high dimensions usually cannot be computed directly, we use a truncated mutual information expansion^{29, 30, 31, 32} (see section 2.2.1) to expand the full high-dimensional integral into multiple low-dimensional integrals over marginal distributions, which can be calculated numerically. To obtain these marginal entropies, a k-nearest-neighbor estimator was used (see section 2.2.2), which estimates the density at each sample point by finding the k closest neighboring sample points and dividing by the volume of a ball that encloses them. Here, the orientations of water molecules in different samples, e.g., frames of a computer simulation trajectory, were represented by series of quaternions (see section 2.2.3). We then defined suitable distance metrics, as required by the kNN algorithm, which are not trivial in the curved spaces of rotations (see section 2.2.4), and calculated the volumes of the balls induced by these metrics (see section 2.2.5). We will finally present a computationally efficient framework that allows finding the nearest neighbors of each sample point (see section 3.1).
2.2.1 Mutual information expansion
Figure 1A shows an example of an entropy expansion into mutual information (MI) terms for a system containing three subsystems, such as three water molecules, in a Venn diagram: the full entropy is expanded into MI terms, of which the first-order terms represent the entropies of each molecule individually, and the further terms are correlation terms of 2nd and 3rd order, respectively,
(1a)   S_{123} = S_1 + S_2 + S_3 - I_2(1,2) - I_2(1,3) - I_2(2,3) + I_3(1,2,3)
(1b)   I_2(i,j) = S_i + S_j - S_{ij}
(1c)   I_3(i,j,k) = S_i + S_j + S_k - S_{ij} - S_{ik} - S_{jk} + S_{ijk}
In this notation, S_{i...j} is the entropy of the marginal distribution with respect to the molecules with indices i, ..., j.
For N water molecules, the expansion consists of N MI orders, of which the n-th order involves 3n-dimensional integrals and takes all possible n-molecule correlations into account. Approximating the full entropy by a truncated expansion thus leads to lower-dimensional integrals, which can be better sampled. Although there is no guarantee that the truncated orders are small and can be neglected, it has been shown that a truncated expansion provides accurate entropy estimates if the physical interactions are short-ranged^{33}, as is the case for water.
Here, we take correlations of up to three molecules into account by truncating after the 3rd order, hence
(2)   S ≈ Σ_i S_i - Σ_{i<j} I_2(i,j) + Σ_{i<j<k} I_3(i,j,k)
where the first order includes the kinetic entropy contribution and a correction due to the twofold symmetry of the water molecule.
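To make the bookkeeping of the truncated expansion concrete, it can be sketched as follows. This is an illustrative sketch, not the paper's implementation; `truncated_mie` and the dictionary layout of precomputed marginal entropies are our own naming, and the marginal entropies would in practice come from the kNN estimator described in section 2.2.2.

```python
from itertools import combinations

def truncated_mie(S1, S2, S3, n):
    """Truncated mutual information expansion up to 3rd order.

    S1, S2, S3 map sorted index tuples of 1, 2, and 3 molecules to
    precomputed marginal entropy estimates.
    """
    total = sum(S1[(i,)] for i in range(n))
    # 2nd order: subtract I2(i,j) = S_i + S_j - S_ij for all pairs
    for i, j in combinations(range(n), 2):
        total -= S1[(i,)] + S1[(j,)] - S2[(i, j)]
    # 3rd order: add I3(i,j,k) for all triples
    for i, j, k in combinations(range(n), 3):
        total += (S1[(i,)] + S1[(j,)] + S1[(k,)]
                  - S2[(i, j)] - S2[(i, k)] - S2[(j, k)]
                  + S3[(i, j, k)])
    return total
```

For statistically independent molecules all MI terms vanish and the total reduces to the sum of the single-molecule entropies.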
2.2.2 kNN entropy estimation
To evaluate eq 2 from a given sample of orientations, the marginal entropies from eq 1 are calculated using a kNN entropy estimator^{26, 34, 27, 28, 35}. For each sample point, the k-th nearest neighbor is defined by a metric (see Figure 1B), and the local density is estimated from the fixed integer k divided by the volume of the ball whose radius is the distance between the sample point and its k-th neighbor, together with a normalization constant. Results for pairs and triples of molecules are obtained by generalizing the metric and the volume to higher dimensions. The choice of metrics, on which the results may depend for finite sampling, and their corresponding ball volumes will be discussed in section 2.2.4 and section 2.2.5. The entropy is
where the last term is a correction which accounts for the bias introduced by the k-th neighbors lying, by definition, on the edges of the balls^{28}, and ψ denotes the digamma function.
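For intuition, a minimal Euclidean analogue of such a kNN entropy estimator, in the Kozachenko-Leonenko form with the digamma bias correction, can be written as below. This sketch assumes Euclidean balls; the paper's estimator instead uses ball volumes induced by metrics on the rotation group (sections 2.2.4 and 2.2.5), so the function here only illustrates the principle.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gamma

def knn_entropy_euclidean(x, k=1):
    """kNN (Kozachenko-Leonenko style) entropy estimate in nats for
    samples x of shape (n_samples, dim) in Euclidean space."""
    x = np.atleast_2d(x)
    n, d = x.shape
    tree = cKDTree(x)
    # query k+1 neighbors: the closest hit is the sample point itself
    r = tree.query(x, k=k + 1)[0][:, k]
    unit_ball = np.pi ** (d / 2) / gamma(d / 2 + 1)  # volume of the unit d-ball
    # digamma terms correct the bias of the k-th neighbor lying
    # exactly on the ball's surface
    return digamma(n) - digamma(k) + np.log(unit_ball) + d * np.mean(np.log(r))
```

For a standard normal distribution in one dimension, the estimate converges to the analytical value 0.5 ln(2πe) ≈ 1.419 nats.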
Because eqs 1b and 1c are sums and differences of integrals of different dimensionalities, biases are introduced: with increasing dimensionality, and thus decreasing sampling density, the kNN estimator yields increasingly smoothed versions of the underlying true distributions. The estimator therefore overestimates entropies of distributions with higher-dimensional supports more strongly than those defined in lower-dimensional spaces, resulting in biases if entropies of different dimensionality are added or subtracted. To overcome this problem, the sampling space is expanded to equal dimensionality by using fill modes^{36, 32}. The pair MI, defined in eq 1b via integrals of different dimensionality, can be rewritten in terms of two integrals of equal dimensionality
if the corresponding joint distribution
factorizes into the product of its marginals. To achieve statistical independence, the sample points of the second molecule were subjected to a random permutation, which decorrelates the two molecules but leaves the marginal distributions unchanged, as sketched in Figure 1C. The joint entropy is thus the sum of the initial marginal entropies. Similarly, the 3rd order MI term reads
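The permutation trick itself is estimator-agnostic and can be sketched as follows; `mi_fill_mode` and `entropy_fn` are our own names, with `entropy_fn` a placeholder for any joint entropy estimator. The Gaussian closed-form entropy in the test below is chosen only because its mutual information is known analytically.

```python
import numpy as np

def mi_fill_mode(entropy_fn, xa, xb, rng):
    """Second-order MI via the permutation ('fill mode') trick.

    Both entropies are estimated in the full joint dimensionality, so
    dimensionality-dependent estimator biases cancel in the difference.
    xa, xb: (n_samples, d) sample arrays for the two subsystems.
    """
    joint = np.concatenate([xa, xb], axis=1)
    # permuting xb decorrelates it from xa but keeps its marginal
    decorrelated = np.concatenate([xa, rng.permutation(xb)], axis=1)
    # I2 = S(a) + S(b) - S(a,b) = S_decorrelated - S_joint
    return entropy_fn(decorrelated) - entropy_fn(joint)
```

For correlated Gaussians with correlation coefficient rho, the exact MI is -0.5 ln(1 - rho^2), which this construction reproduces when combined with a consistent entropy estimator.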
2.2.3 Parametrization of orientations
Among different parametrizations of orientations in 3D space, such as Euler angles, Tait-Bryan angles, Hopf coordinates^{37, 38}, and spherical coordinates, we used quaternions^{39}, which, contrary to most other charts of SO(3), do not suffer from gimbal lock. They are defined via a normalized rotation axis and a rotation angle and can be interpreted as elements of the 3-sphere. Because there is a one-to-one mapping of the 3-sphere to the special unitary group SU(2), which in turn provides a double covering of SO(3), each orientation is described by two equivalent quaternions, which differ only by a sign^{40}.
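The axis-angle construction and the sign ambiguity can be made concrete in a few lines (our own helper, following the standard definition q = (cos(θ/2), n sin(θ/2))):

```python
import numpy as np

def axis_angle_to_quaternion(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation by `angle` (rad)
    about the direction `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    return np.concatenate([[np.cos(angle / 2.0)], np.sin(angle / 2.0) * axis])
```

A rotation by 2π maps to the negated identity quaternion, illustrating that q and -q describe the same orientation.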
2.2.4 Choice of metrics in SO(3)^N
We next considered the proper choice of metrics in SO(3). At first sight, one might think that, of the many possible metrics in SO(3)^{40, 41}, only one, e.g., the geodesic metric shown in Figure 1B, yields the correct entropy. However, in the limit of infinite sampling, kNN entropy estimation is possible with any metric if used together with its induced ball volumes (see section 2.2.5)^{42}. Our choice was therefore guided by the speed of convergence and computational efficiency.
We chose the quaternion metric^{43, 40} sketched in Figure 1B, which defines the distance between two rotations as the minimum Euclidean distance between their unit quaternions, taking the sign ambiguity into account. In SO(3), the quaternion metric and the more natural geodesic metric yield identical nearest neighbors. They are functionally equivalent because there exists a positive, continuous, strictly increasing function that maps one metric onto the other (and vice versa)^{40}. The quaternion metric does not require evaluation of the inverse cosine function and is thus computationally more efficient; it was therefore preferred over the geodesic metric.
Metrics in SO(3)^2 and SO(3)^3 were obtained by combining the quaternion metric with the Euclidean norms in R^2 and R^3, respectively. When combined with the Euclidean norms, the quaternion metric and the more natural geodesic metric are not functionally equivalent and hence yield, in general, different nearest neighbors. For small distances, i.e., for high sampling, the metrics are asymptotically identical.
To test whether this choice of metrics impacts the accuracy of the MI results, we compared our choice to the composite metric built from the quaternion metric and the maximum norm, which is functionally equivalent to the geodesic composite metric but slightly less efficient to evaluate than the Euclidean norm. For the tested number of frames, no significant difference between the MI values was seen.
2.2.5 Volumes of balls in SO(3)^N
The ball volumes (dark green in Figure 1B), enclosed by the kNN radius, read
for orientations in SO(3)^3. The respective volumes in SO(3)^2 and SO(3) reduce to
respectively. The integrals were solved numerically for equally spaced values of the radius using the software Mathematica 10.0^{44}
and its multidimensional integration rule; the results were stored in a lookup table, and cubic interpolation was used to obtain values in between.
3 Methods
3.1 Nearestneighbor search
Nearest-neighbor searches were performed using the Non-Metric Space Library 1.7.3.6^{45} (NMSLIB, https://github.com/nmslib/nmslib) with the above metrics. Each data set was indexed in a vantage-point tree^{46, 47} (VP-tree), which rests on the triangle inequality. Our version of the NMSLIB, modified to include the orientational metrics, is available online (https://gitlab.gwdg.de/lheinz/nmslib_quaternion).
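For orientation, the following is a brute-force reference of what the VP-tree accelerates, using the quaternion metric in vectorized form (our own sketch; NMSLIB replaces the O(N^2) distance matrix with a tree traversal):

```python
import numpy as np

def kth_neighbor_distances(quats, k):
    """Distance of every sample to its k-th nearest neighbor under the
    quaternion metric, by brute force. quats: (N, 4) unit quaternions."""
    dots = np.abs(quats @ quats.T)        # |q_i . q_j| handles q ~ -q
    np.clip(dots, 0.0, 1.0, out=dots)
    dist = np.sqrt(np.maximum(2.0 - 2.0 * dots, 0.0))
    np.fill_diagonal(dist, np.inf)        # exclude self-matches
    dist.sort(axis=1)
    return dist[:, k - 1]
```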
3.2 Accuracy assessment
3.2.1 Test distributions
To assess the accuracy of our method, we used analytical test distributions in SO(3), SO(3)^2, and SO(3)^3, derived from
with a quaternion, its first component, the first azimuthal angle in spherical coordinates of the 3-sphere, and the appropriate normalization constant (Figure 2A). The analytical expression for the configurational entropy reads
using the gamma function for simpler notation. As illustrated in Figure 2A, the distribution depends on the localization parameter: for its smallest value the distribution is uniform, and larger values yield increasingly narrower distributions. Suitable localization parameters were used to obtain uncorrelated distributions with known analytical entropies. To also assess the accuracy for correlated distributions, the test distribution
was used, which was designed such that its two marginals equal the single-molecule test distribution. The localization parameter here controls the degree of correlation between the two rotations, ranging from an uncorrelated uniform distribution to strongly correlated distributions for larger values. The entropy of this distribution is
where the entropy of a free rotor enters as a reference.
Samples were obtained using a rejection method: first, a random point was drawn from the uniform distribution on the 3-sphere. Next, a random number was drawn from a uniform distribution between zero and the maximum of the distribution, and the point was accepted if the number was below the distribution value and rejected otherwise. This process was repeated until the desired number of samples was obtained.
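The sampling procedure can be sketched as follows; `density` and `density_max` stand in for the (unnormalized) test distribution and its maximum, which we leave generic here:

```python
import numpy as np

def uniform_quaternions(n, rng):
    """Uniform samples on the 3-sphere via normalized 4D Gaussians."""
    q = rng.standard_normal((n, 4))
    return q / np.linalg.norm(q, axis=1, keepdims=True)

def rejection_sample(density, density_max, n, rng):
    """Rejection sampling of an (unnormalized) density over unit
    quaternions; `density` must accept an (m, 4) array."""
    out = []
    while len(out) < n:
        q = uniform_quaternions(n, rng)
        u = rng.uniform(0.0, density_max, size=n)
        out.extend(q[u <= density(q)])   # keep points under the curve
    return np.asarray(out[:n])
```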
The accuracy of our method was assessed for each test distribution over a range of localization parameters, nearest-neighbor values k, and numbers of frames. The computed entropy and MI values were compared to the analytical results. To obtain statistical error estimates, the calculations for each parameter set were repeated several times.
3.3 Molecular dynamics simulations
All MD simulations were carried out using a modified version (https://github.com/Tsjerk/gromacs/tree/rtc2018) of the software package Gromacs 2018^{48, 49, 50, 51, 52} with an additional center-of-mass (COM) motion removal^{53} method, used to individually constrain all oxygen atoms. We furthermore made small additional changes to apply COM removal to individual atoms and to overcome the limit on the number of COM removal groups (https://gitlab.gwdg.de/lheinz/gromacsrtc2018_modif). The CHARMM36m force field^{54, 55, 56, 57, 58} and the CHARMM-TIP3P water model^{59} were used. All water molecules were subjected to SETTLE^{60} constraints, and the leap-frog integrator with a time step of fs was used. Electrostatic forces were calculated using the Particle-Mesh Ewald (PME) method^{61} with a nm real space cutoff; the same cutoff was used for Lennard-Jones potentials^{62}. In all simulations, the V-rescale thermostat^{63} with a time constant of ps, and, if applicable, the Parrinello-Rahman barostat^{64, 65} with a time constant of ps and bar pressure were used.
A total of water molecules was placed within a cubic simulation box, and the system was equilibrated for ns at K as an NPT ensemble. From the equilibrated system, resulting in a box size of approximately nm, three production runs were started, as shown in the first column of Figure 3: run m ("mobile") was carried out as described above. To benchmark our method against the established method of thermodynamic integration (TI), a system with only rotational degrees of freedom was constructed. To this end, in run p ("pinned"), all oxygens were position-constrained using COM removal, as shown in the first column of Figure 3, allowing only rotational movements around the oxygen atom under NVT conditions.
Because at K, the water molecules formed an almost rigid, ice-like hydrogen bond network with only very little dynamics, the temperature was increased to K. System sp ("sliced & pinned") was simulated as p, but all water molecules within a slice of nm width were removed to create a water-vacuum interface.
3.3.1 Entropy calculation
For all three test systems, the entropy of rotation was calculated as described in section 2, each using a trajectory with frames. For the MI terms, a cutoff on the distance between average molecule positions was used. Whereas including the MIs of many molecule pairs by using a large cutoff distance gives rise to a more accurate MIE, it also introduces larger noise due to limited sampling. For pairwise MI terms, the cutoff was chosen as nm, because for larger distances the MI terms vanish within statistical errors (see Figure 3B and D). Similarly, triple MI contributions were cut off at nm.
Because the water molecules in system m were mobile, average positions across the obtained trajectory were unusable to define a cutoff. Therefore, the water molecules were relabeled in each frame, such that they remained as close as possible to a simple-cubic reference structure, using permutation reduction^{66, 67}, which leaves the physics of the system unchanged. In systems p and sp, the molecules were immobilized and the oxygen positions were used for applying the cutoff.
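Per frame, permutation reduction amounts to a linear assignment problem between current molecule positions and the reference lattice. A minimal per-frame sketch follows (our own, using SciPy's linear assignment solver rather than the cited implementation):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def permutation_reduce(positions, reference):
    """Relabel molecules of one frame so that molecule i stays close to
    reference site i. positions, reference: (n, 3) arrays."""
    # cost matrix: squared distance of every molecule to every site
    cost = ((positions[:, None, :] - reference[None, :, :]) ** 2).sum(axis=-1)
    molecules, sites = linear_sum_assignment(cost)
    perm = np.empty_like(sites)
    perm[sites] = molecules          # molecule assigned to each site
    return positions[perm]
```

Because relabeling identical molecules is a physical symmetry, this changes nothing thermodynamically, but it keeps each label spatially localized, so a distance cutoff based on average positions becomes meaningful.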
To quantify the precision of the method, the MD simulations and the subsequent entropy analyses were repeated in 10 independent calculations.
3.3.2 Thermodynamic integration reference
Reference entropy values for systems p and sp were obtained using thermodynamic integration^{10, 11, 66} (TI). Interactions between water molecules were gradually switched off in a stepwise fashion to obtain the entropy difference between real water and noninteracting water. The absolute rotational entropy was obtained as the sum of the excess entropy, obtained via TI, and the ideal gas contribution,
with the eigenvalues of the moment-of-inertia tensor of a water molecule.
Both TI calculations were performed using soft-core^{68} parameters. Coulomb interactions were linearly switched off in windows of ns each, and further windows were used to subsequently switch off the van der Waals interactions. The first nanosecond of each window was discarded.
4 Results and discussion
4.1 Test distributions
We first assessed the accuracy of our method for three uncorrelated analytical test distributions (defined in SO(3), SO(3)^2, and SO(3)^3) and one correlated analytical test distribution, as described in section 3.2.1. The distributions depend on the localization parameter, which, for the uncorrelated distributions, determines their width, as demonstrated in Figure 2A, and, for the correlated distribution, controls the strength of the correlation.
As can be seen in Figure 2B, the kNN estimator largely agrees with the analytic results (dashed lines) for the uncorrelated distributions, from the uniform to the strongly peaked case, and for all tested values of k. The graphs for the three distributions are scaled and offset as indicated in the figure. We find that for the two lower-dimensional distributions, our method accurately reproduces the true entropy for all tested localization values within statistical errors, even for small numbers of frames, with statistical errors of a few nats (natural units of information) or less. For the highest-dimensional distribution, the analytical result is also matched within statistical errors for small localization values, whereas larger values lead to overestimated entropies, caused by the limited sampling and the increased dimensionality compared to the other tested distributions.
Next, we assessed the accuracy for the correlated test distribution. Figure 2C and D show the entropy (calculated via the MIE as defined in eq 1b) and the MI, respectively. For the uniform distribution, the algorithm yields the analytic values of both entropy and mutual information. With increasing correlation, the entropy is increasingly overestimated as the MI is underestimated. Both effects are more pronounced for larger values of k: whereas for small k the algorithm yields accurate values within statistical errors up to intermediate correlation strengths, the results for larger k deviate significantly even for very small correlations. Overall, small values of k yield high accuracy but reduced precision (i.e., larger statistical errors) compared to large values, which give rise to smaller statistical errors but reduced accuracy.
To further assess this trade-off and the convergence properties of our method, we calculated the relative entropy errors for different amounts of sampling, shown in Figure 2E. For the smallest tested value of k and few frames, the method overestimates the true entropy, but the error quickly drops with increasing sampling. For larger values of k, the entropy errors increase and the convergence becomes slower. Overall, the smallest value yields somewhat lower precision but significantly faster convergence compared to larger values, which becomes even more pronounced in higher dimensions. We therefore find this value the optimal choice for the systems at hand and used it for all subsequent analyses.
The kNN entropy estimator rests on the assumption that the density is approximately constant and isotropic within each nearest-neighbor ball (see Figure 1B). This assumption implies that features of the true distribution that are smaller than the average distance between sample points are not resolved, which, in the case of poor sampling, inevitably leads to an overestimated entropy, as seen for large localization values and as shown in Figure 2E. The assumption of isotropy no longer holds for highly correlated datasets. In this case, the nearest neighbors of each sample point are also correlated and thus not isotropically distributed, which is not reflected by an isotropic kernel, i.e., a ball. For Euclidean spaces, this problem has been addressed by using anisotropic kernels^{69, 70}. Although this idea could also be applied to rotations, we find that the correlation of water molecules at standard conditions is weak enough (see Figure 3A) to allow for sufficiently accurate results under the isotropy assumption.
The tradeoff between accuracy and precision with respect to the value is a general property of kNN entropy estimators, which has been characterized previously^{71, 70} and is intuitively accessible: Whereas averaging over an increasing number of neighbors reduces statistical uncertainties and thus improves precision, the assumptions of approximately constant isotropic densities are applied to increasingly larger balls, resulting in increasingly overestimated entropies for distributions with small scale features or strong correlations.
Overall, the kNN method with the smallest tested k yields the most accurate results while being only slightly less precise than estimators with larger k. It retrieves the analytical entropies within statistical errors for the uncorrelated distributions, as well as for the correlated distribution at moderate correlation strengths, using only a limited number of frames.
4.2 Entropy calculated from molecular dynamics simulations
Having assessed our rotational entropy method against analytic test distributions, we tested its accuracy for more realistic systems of up to 1728 interacting water molecules. To this end, we simulated three atomistic water MD systems (Figure 3, left column), as described in section 3.3. System m ("mobile") comprised 1728 unconstrained water molecules. In system p ("pinned"), the position of the oxygen atom of each individual molecule was fixed, permitting only rotations, in order to allow a direct entropy comparison using TI^{10, 11}. In system sp ("sliced & pinned"), a nm slice of molecules was removed from the center of the simulation box (see Figure 3) in addition to pinning the oxygen atoms, to assess the accuracy of the method also for systems with spatial variation. For all systems, 10 independent MD simulations were performed and for each, entropies were calculated via a MIE as explained in section 3.3.1. For the pinned systems, reference entropies were obtained by TI, as described in section 3.3.2.
As shown in Figure 3A, an absolute rotational entropy of J mol^{-1} K^{-1} per molecule was obtained for system m, to which the first, second, and third MI orders contribute J mol^{-1} K^{-1}, J mol^{-1} K^{-1}, and J mol^{-1} K^{-1}, respectively. Note that the provided values are averages and standard deviations of the 10 independent calculations and that the errors are too small to be visible in Figure 3. The pair mutual information terms, shown in Figure 3B, reach a maximum of J mol^{-1} K^{-1} for very close water molecules and vanish monotonically for molecules that are, after permutation reduction and on average, separated by more than nm. Note that the discrete nature of the distances in Figure 3B is due to the choice of a simple cubic reference structure for permutation reduction. To compare the obtained absolute entropies to TI, the water movement was restricted to the rotational degrees of freedom in system p by pinning each molecule as described in section 3.3. Here, the rotational entropy, shown in panel C, is reduced to J mol^{-1} K^{-1}. The 2nd and 3rd order mutual information terms contribute J mol^{-1} K^{-1} and J mol^{-1} K^{-1}, respectively. Compared to the results from TI shown in gray, the entropy is underestimated by % due to the limited sampling of the strongly correlated system. Similar to what we observed for the analytical test case of Figure 2D, the MI terms are underestimated for strong correlations, of which the 3rd order is most severely affected due to the high dimensionality of the sampling space.
The terms, illustrated in Figure 3D, show a maximum of JmolK and indicate that water molecules decorrelate beyond nm. The distribution shows secondary and tertiary peaks around nm and nm, which arise from indirect coupling via one or two mediating water molecules, as indicated by the structures shown in Figure 3D. In this case, the correlations between the molecule pairs are not due to direct interactions; instead, mediating water molecules (orange) enhance distant orientation correlations via short hydrogenbonded chains (shown in red). This finding demonstrates that the method is able to identify regions of locally coupled water molecules and to quantify the resulting entropy losses, thus providing a spatially resolved picture of entropy changes.
To further assess and demonstrate the accuracy of the method for systems with spatial features, we included position-dependent properties by introducing a vacuum slice in system sp, for which the water molecules close to the surface show a different dynamic behavior than those within the bulk. For system sp, the accuracy of our entropy estimation relative to TI improved slightly to %, whereas the contributions by higher MI orders remained almost identical (see Figure 3E). We assume that the improved accuracy is due to the slightly smaller number of molecules (1728 vs. 1493 with slice). It may further be due to the vacuum slice limiting the range of many-particle correlations, which would not be captured by a 3rd order approximation.
Figure 3F shows the pair MI terms of molecule pairs that are closer than nm, i.e., within their first hydration shells, as a function of their distance to the slice, yielding increased correlations of J mol^{-1} K^{-1} on average for pairs close to the vacuum interface, compared to J mol^{-1} K^{-1} in bulk. The increased correlations at the surface and the associated entropy loss contribute to the thermodynamic unfavorability of water at a (hydrophobic) vacuum interface.
The MIE reproduces the TI values for systems p and sp to within % and %, respectively, and additionally yields information about individual correlations and their associated entropy losses, thus providing spatial resolution. Remarkably, about 25-fold less computer time was required for the MIE than for TI for the shown examples.
The large 2nd and 3rd order contributions, illustrated in Figure 3C and E, show that both systems with pinned water exhibit strong correlations between water molecules. As for the test distributions illustrated in Figure 2, strong correlations result in systematically underestimated MI values. Due to their high dimensionality, and thus low sampling density, we expect the 3rd order MIE contributions for systems p and sp to be mostly affected, contributing to their overall underestimated entropy. For the same reason, we expect entropies calculated from more loosely coupled mobile water to yield markedly more accurate results.
Although a direct comparison to TI is impossible for system m, we expect the errors due to the truncation of higher-order MI terms, observed for the more tightly correlated systems p and sp, to be larger than for unconstrained water. The truncated MIE approximation should therefore yield more accurate results for realistic solute systems. Taken together, the accuracies obtained for the more strongly correlated pinned water systems provide upper bounds for the expected errors.
5 Conclusion
We have developed an estimator for spatially resolved rotational solvent entropies based on a truncated mutual information expansion and the k-nearest-neighbor algorithm on SO(3)^N. Accuracy and computational efficiency were assessed both for analytical test distributions and for systems of up to 1728 water molecules, described by atomistic MD simulations.
For the uncorrelated test distributions in SO(3), SO(3)^2, and SO(3)^3, the estimator yielded accurate entropies for remarkably few sample points. For the correlated test distribution, the entropies were increasingly overestimated with increasing coupling, caused by underestimated mutual information terms. The latter effect is especially pronounced for large values of k. Precision increased only marginally for larger k at the cost of decreased accuracy, which led us to conclude that the smallest tested value represents the best trade-off for the problem at hand. We furthermore demonstrated convergence for a correlated distribution within a moderate number of frames and therefore expect our approach to accurately describe the correlations of water molecules already in relatively short MD trajectories.
For the considered MD systems, we find agreement within % and % with TI for pinned waters in systems p and sp, respectively, corresponding to deviations of kJ mol^{-1} and kJ mol^{-1} per water molecule at K. The obtained rotational entropies are precise to within kJ mol^{-1} and kJ mol^{-1}, respectively. For the binding of a small ligand that displaces water molecules at the binding pocket, we therefore expect to obtain absolute rotational entropies with an accuracy of at least kJ mol^{-1} and to resolve rotational entropy differences of at least kJ mol^{-1}. As seen in the second column of Figure 3, fully mobile water exhibits considerably smaller correlations than pinned water, rendering the tests using pinned water quite a tough benchmark compared to realistic solute systems. For a protein/water system, we would therefore expect markedly smaller error margins.
The algorithm provides spatial resolution by assessing the mutual information contributions at the level of individual molecules, distinguishing it from, e.g., GIST^{17, 18, 19, 20, 21, 22}. For example, for the hydrophobic vacuum interface, we calculated an entropy loss due to an increase in mutual information close to the surface. This ability to resolve the origin of entropy changes renders the method a promising tool to enhance our understanding of processes like the hydrophobic effect and of the thermodynamics of solvated complex heterogeneous biomolecules in general.
Work on assessing the contributions by the translational entropy and the translationrotation correlation to the overall entropy is in progress and will be published elsewhere.
Although here we restricted the application and assessment of our approach to water only, generalization to other solvents is straightforward as long as the internal dynamics of the solvent molecules do not couple to their rotational dynamics. An implementation is available for download (https://gitlab.gwdg.de/lheinz/hydration_entropy) as a Python package with a C++ backend for fast neighbor search.
L.P.H. thanks the International Max Planck Research School for Physics of Biological and Complex Systems for support through a PhD fellowship.
References
 BenNaim 1975 BenNaim, A. Hydrophobic interaction and structural changes in the solvent. Biopolymers: Original Research on Biomolecules 1975, 14, 1337–1355.
 Chandler 2005 Chandler, D. Interfaces and the driving force of hydrophobic assembly. Nature 2005, 437, 640.
 Berne et al. 2009 Berne, B. J.; Weeks, J. D.; Zhou, R. Dewetting and hydrophobic interaction in physical and biological systems. Annual review of physical chemistry 2009, 60.
 Dias et al. 2010 Dias, C. L.; Ala-Nissila, T.; Wong-ekkabut, J.; Vattulainen, I.; Grant, M.; Karttunen, M. The hydrophobic effect and its role in cold denaturation. Cryobiology 2010, 60, 91–99.
 Grdadolnik et al. 2017 Grdadolnik, J.; Merzel, F.; Avbelj, F. Origin of hydrophobicity and enhanced water hydrogen bond strength near purely hydrophobic solutes. Proceedings of the National Academy of Sciences 2017, 114, 322–327.
 Cheng and Rossky 1998 Cheng, Y.-K.; Rossky, P. J. Surface topography dependence of biomolecular hydrophobic hydration. Nature 1998, 392, 696.
 Tarek and Tobias 2000 Tarek, M.; Tobias, D. J. The dynamics of protein hydration water: a quantitative comparison of molecular dynamics simulations and neutron-scattering experiments. Biophysical journal 2000, 79, 3244–3257.
 Sugita and Okamoto 1999 Sugita, Y.; Okamoto, Y. Replica-exchange molecular dynamics method for protein folding. Chemical physics letters 1999, 314, 141–151.
 De Vries et al. 2004 De Vries, A. H.; Mark, A. E.; Marrink, S. J. Molecular dynamics simulation of the spontaneous formation of a small DPPC vesicle in water in atomistic detail. Journal of the American Chemical Society 2004, 126, 4488–4489.
 Kirkwood 1935 Kirkwood, J. G. Statistical mechanics of fluid mixtures. The Journal of Chemical Physics 1935, 3, 300–313.
 Peter et al. 2004 Peter, C.; Oostenbrink, C.; van Dorp, A.; van Gunsteren, W. F. Estimating entropies from molecular dynamics simulations. The Journal of chemical physics 2004, 120, 2652–2661.
 Errington and Debenedetti 2001 Errington, J. R.; Debenedetti, P. G. Relationship between structural order and the anomalies of liquid water. Nature 2001, 409, 318.
 Errington et al. 2002 Errington, J. R.; Debenedetti, P. G.; Torquato, S. Cooperative origin of low-density domains in liquid water. Physical review letters 2002, 89, 215503.
 Yan et al. 2007 Yan, Z.; Buldyrev, S. V.; Kumar, P.; Giovambattista, N.; Debenedetti, P. G.; Stanley, H. E. Structure of the first- and second-neighbor shells of simulated water: Quantitative relation to translational and orientational order. Physical Review E 2007, 76, 051201.
 Godec et al. 2011 Godec, A.; Smith, J. C.; Merzel, F. Increase of both order and disorder in the first hydration shell with increasing solute polarity. Physical review letters 2011, 107, 267801.
 Heyden 2019 Heyden, M. Heterogeneity of water structure and dynamics at the protein-water interface. The Journal of chemical physics 2019, 150, 094701.
 Wallace 1987 Wallace, D. C. On the role of density fluctuations in the entropy of a fluid. The Journal of chemical physics 1987, 87, 2282–2284.
 Baranyai and Evans 1989 Baranyai, A.; Evans, D. J. Direct entropy calculation from computer simulation of liquids. Physical Review A 1989, 40, 3817.
 Lazaridis 1998 Lazaridis, T. Inhomogeneous fluid approach to solvation thermodynamics. 1. Theory. The Journal of Physical Chemistry B 1998, 102, 3531–3541.
 Lazaridis 1998 Lazaridis, T. Inhomogeneous fluid approach to solvation thermodynamics. 2. Applications to simple fluids. The Journal of Physical Chemistry B 1998, 102, 3542–3550.
 Nguyen et al. 2012 Nguyen, C. N.; Kurtzman Young, T.; Gilson, M. K. Grid inhomogeneous solvation theory: hydration structure and thermodynamics of the miniature receptor cucurbit[7]uril. The Journal of chemical physics 2012, 137, 044101.
 Nguyen et al. 2012 Nguyen, C. N.; Young, T. K.; Gilson, M. K. Erratum: “Grid inhomogeneous solvation theory: Hydration structure and thermodynamics of the miniature receptor cucurbit[7]uril” [J. Chem. Phys. 137, 044101 (2012)]. The Journal of chemical physics 2012, 137, 149901.
 Lin et al. 2003 Lin, S.-T.; Blanco, M.; Goddard III, W. A. The two-phase model for calculating thermodynamic properties of liquids from molecular dynamics: Validation for the phase diagram of Lennard-Jones fluids. The Journal of chemical physics 2003, 119, 11792–11805.
 Lin et al. 2010 Lin, S.-T.; Maiti, P. K.; Goddard III, W. A. Two-phase thermodynamic model for efficient and accurate absolute entropy of water from molecular dynamics simulations. The Journal of Physical Chemistry B 2010, 114, 8191–8198.
 Persson et al. 2017 Persson, R. A.; Pattni, V.; Singh, A.; Kast, S. M.; Heyden, M. Signatures of solvation thermodynamics in spectra of intermolecular vibrations. Journal of chemical theory and computation 2017, 13, 4467–4481.

 Kozachenko and Leonenko 1987 Kozachenko, L.; Leonenko, N. N. Sample estimate of the entropy of a random vector. Problemy Peredachi Informatsii 1987, 23, 9–16.
 Singh et al. 2003 Singh, H.; Misra, N.; Hnizdo, V.; Fedorowicz, A.; Demchuk, E. Nearest neighbor estimates of entropy. American journal of mathematical and management sciences 2003, 23, 301–321.
 Kraskov et al. 2004 Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Physical review E 2004, 69, 066138.
 Matsuda 2000 Matsuda, H. Physical nature of higherorder mutual information: Intrinsic correlations and frustration. Physical Review E 2000, 62, 3096.
 Hnizdo et al. 2007 Hnizdo, V.; Darian, E.; Fedorowicz, A.; Demchuk, E.; Li, S.; Singh, H. Nearest-neighbor nonparametric method for estimating the configurational entropy of complex molecules. Journal of computational chemistry 2007, 28, 655–668.
 Hnizdo et al. 2008 Hnizdo, V.; Tan, J.; Killian, B. J.; Gilson, M. K. Efficient calculation of configurational entropy from molecular simulations by combining the mutual-information expansion and nearest-neighbor methods. Journal of computational chemistry 2008, 29, 1605–1614.
 Fengler 2011 Fengler, M. Estimating Orientational Water Entropy at Protein Interfaces. Ph.D. thesis, Georg-August-Universität Göttingen, 2011.

 Goethe et al. 2017 Goethe, M.; Fita, I.; Rubi, J. M. Testing the mutual information expansion of entropy with multivariate Gaussian distributions. The Journal of chemical physics 2017, 147, 224102.
 Tsybakov and Van der Meulen 1996 Tsybakov, A. B.; Van der Meulen, E. Root-n consistent estimators of entropy for densities with unbounded support. Scandinavian Journal of Statistics 1996, 75–83.
 Evans 2008 Evans, D. A computationally efficient estimator for mutual information. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences. 2008; pp 1203–1215.
 Hensen et al. 2010 Hensen, U.; Lange, O. F.; Grubmüller, H. Estimating absolute configurational entropies of macromolecules: the minimally coupled subspace approach. PloS one 2010, 5, e9179.
 Hopf 1964 Hopf, H. Selecta Heinz Hopf; Springer, 1964; pp 38–63.
 Yershova et al. 2010 Yershova, A.; Jain, S.; Lavalle, S. M.; Mitchell, J. C. Generating uniform incremental grids on SO(3) using the Hopf fibration. The International journal of robotics research 2010, 29, 801–812.
 Karney 2007 Karney, C. F. Quaternions in molecular modeling. Journal of Molecular Graphics and Modelling 2007, 25, 595–604.
 Huynh 2009 Huynh, D. Q. Metrics for 3D rotations: Comparison and analysis. Journal of Mathematical Imaging and Vision 2009, 35, 155–164.
 Huggins 2014 Huggins, D. J. Comparing distance metrics for rotation using the k-nearest neighbors algorithm for entropy estimation. Journal of computational chemistry 2014, 35, 377–385.
 Singh and Póczos 2016 Singh, S.; Póczos, B. Analysis of k-nearest neighbor distances with application to entropy estimation. arXiv preprint arXiv:1603.08578 2016.
 Ravani and Roth 1983 Ravani, B.; Roth, B. Motion synthesis using kinematic mappings. Journal of mechanisms, Transmissions, and Automation in Design 1983, 105, 460–467.
 Wolfram Research 2014 Wolfram Research, Inc. Mathematica, Version 10.0. Champaign, IL, 2014.
 Boytsov and Naidan 2013 Boytsov, L.; Naidan, B. Engineering efficient and effective non-metric space library. International Conference on Similarity Search and Applications. 2013; pp 280–293.
 Uhlmann 1991 Uhlmann, J. K. Satisfying general proximity/similarity queries with metric trees. Information processing letters 1991, 40, 175–179.
 Yianilos 1993 Yianilos, P. N. Data structures and algorithms for nearest neighbor search in general metric spaces. SODA. 1993; pp 311–321.
 Berendsen et al. 1995 Berendsen, H. J.; van der Spoel, D.; van Drunen, R. GROMACS: a message-passing parallel molecular dynamics implementation. Computer Physics Communications 1995, 91, 43–56.
 Van Der Spoel et al. 2005 Van Der Spoel, D.; Lindahl, E.; Hess, B.; Groenhof, G.; Mark, A. E.; Berendsen, H. J. GROMACS: fast, flexible, and free. Journal of computational chemistry 2005, 26, 1701–1718.
 Hess et al. 2008 Hess, B.; Kutzner, C.; Van Der Spoel, D.; Lindahl, E. GROMACS 4: algorithms for highly efficient, load-balanced, and scalable molecular simulation. Journal of chemical theory and computation 2008, 4, 435–447.
 Pronk et al. 2013 Pronk, S.; Páll, S.; Schulz, R.; Larsson, P.; Bjelkmar, P.; Apostolov, R.; Shirts, M. R.; Smith, J. C.; Kasson, P. M.; van der Spoel, D.; Hess, B.; Lindahl, E. GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics 2013, 845–854.
 Páll et al. 2014 Páll, S.; Abraham, M. J.; Kutzner, C.; Hess, B.; Lindahl, E. Tackling exascale software challenges in molecular dynamics simulations with GROMACS. International Conference on Exascale Applications and Software. 2014; pp 3–27.
 Amadei et al. 2000 Amadei, A.; Chillemi, G.; Ceruso, M.; Grottesi, A.; Di Nola, A. Molecular dynamics simulations with constrained roto-translational motions: theoretical basis and statistical mechanical consistency. The Journal of Chemical Physics 2000, 112, 9–23.
 Brooks et al. 1983 Brooks, B. R.; Bruccoleri, R. E.; Olafson, B. D.; States, D. J.; Swaminathan, S.; Karplus, M. CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. Journal of computational chemistry 1983, 4, 187–217.
 MacKerell Jr et al. 1998 MacKerell Jr, A. D.; Bashford, D.; Bellott, M.; Dunbrack Jr, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher III, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiórkiewicz-Kuczera, J.; Yin, D.; Karplus, M. All-atom empirical potential for molecular modeling and dynamics studies of proteins. The journal of physical chemistry B 1998, 102, 3586–3616.
 Brooks et al. 2009 Brooks, B. R.; Brooks, C. L.; MacKerell, A. D.; Nilsson, L.; Petrella, R. J.; Roux, B.; Won, Y.; Archontis, G.; Bartels, C.; Boresch, S.; Caflisch, A.; Caves, L. S. D.; Cui, Q.; Dinner, A. R.; Feig, M.; Fischer, S.; Gao, J.; Hodošček, M.; Im, W.; Kuczera, K.; Lazaridis, T.; Jun, M.; Ovchinnikov, V.; Paci, E.; Pastor, R. W.; Post, C. B.; Pu, J. Z.; Schaefer, M.; Tidor, B.; Venable, R. M.; Woodcock, H. L.; Wu, X.; Yang, W.; York, D. M.; Karplus, M. CHARMM: the biomolecular simulation program. Journal of computational chemistry 2009, 30, 1545–1614.
 Huang and MacKerell 2013 Huang, J.; MacKerell, A. D. CHARMM36 allatom additive protein force field: Validation based on comparison to NMR data. Journal of computational chemistry 2013, 34, 2135–2145.
 Huang et al. 2017 Huang, J.; Rauscher, S.; Nawrocki, G.; Ran, T.; Feig, M.; de Groot, B. L.; Grubmüller, H.; MacKerell Jr, A. D. CHARMM36m: an improved force field for folded and intrinsically disordered proteins. nature methods 2017, 14, 71.
 Jorgensen et al. 1983 Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of simple potential functions for simulating liquid water. The Journal of chemical physics 1983, 79, 926–935.
 Miyamoto and Kollman 1992 Miyamoto, S.; Kollman, P. A. Settle: An analytical version of the SHAKE and RATTLE algorithm for rigid water models. Journal of computational chemistry 1992, 13, 952–962.
 Darden et al. 1993 Darden, T.; York, D.; Pedersen, L. Particle mesh Ewald: An N·log(N) method for Ewald sums in large systems. The Journal of chemical physics 1993, 98, 10089–10092.
 Jones 1924 Jones, J. E. On the determination of molecular fields.—II. From the equation of state of a gas. Proc. R. Soc. Lond. A 1924, 106, 463–477.
 Bussi et al. 2007 Bussi, G.; Donadio, D.; Parrinello, M. Canonical sampling through velocity rescaling. The Journal of chemical physics 2007, 126, 014101.
 Andersen 1980 Andersen, H. C. Molecular dynamics simulations at constant pressure and/or temperature. The Journal of chemical physics 1980, 72, 2384–2393.
 Parrinello and Rahman 1981 Parrinello, M.; Rahman, A. Polymorphic transitions in single crystals: A new molecular dynamics method. Journal of Applied physics 1981, 52, 7182–7190.
 Reinhard and Grubmüller 2007 Reinhard, F.; Grubmüller, H. Estimation of absolute solvent and solvation shell entropies via permutation reduction. The Journal of chemical physics 2007, 126, 014102.
 Reinhard et al. 2009 Reinhard, F.; Lange, O. F.; Hub, J. S.; Haas, J.; Grubmüller, H. g_permute: Permutation-reduced phase space density compaction. Computer Physics Communications 2009, 180, 455–458.
 Beutler et al. 1994 Beutler, T. C.; Mark, A. E.; van Schaik, R. C.; Gerber, P. R.; Van Gunsteren, W. F. Avoiding singularities and numerical instabilities in free energy calculations based on molecular simulations. Chemical physics letters 1994, 222, 529–539.
 Hensen et al. 2009 Hensen, U.; Grubmüller, H.; Lange, O. F. Adaptive anisotropic kernels for nonparametric estimation of absolute configurational entropies in high-dimensional configuration spaces. Physical Review E 2009, 80, 011913.

 Gao et al. 2015 Gao, S.; Ver Steeg, G.; Galstyan, A. Efficient estimation of mutual information for strongly dependent variables. Artificial Intelligence and Statistics. 2015; pp 277–286.
 Khan et al. 2007 Khan, S.; Bandyopadhyay, S.; Ganguly, A. R.; Saigal, S.; Erickson III, D. J.; Protopopescu, V.; Ostrouchov, G. Relative performance of mutual information estimation methods for quantifying the dependence among short and noisy data. Physical Review E 2007, 76, 026209.