Optimal measurement network of pairwise differences

by   Huafeng Xu, et al.

When both the difference between two quantities and their individual values can be measured or computationally predicted, multiple quantities can be determined from the measurements or predictions of select individual quantities and select pairwise differences. These measurements and predictions form a network connecting the quantities through their differences. Here, I analyze the optimization of such networks, where the trace (A-optimal), the largest eigenvalue (E-optimal), or the determinant (D-optimal) of the covariance matrix associated with the estimated quantities is minimized with respect to the allocation of the measurement (or computational) cost to different measurements (or predictions). My statistical analysis of the performance of such optimal measurement networks -- based on large sets of simulated data -- suggests that they substantially accelerate the determination of the quantities, and that they may be useful in applications such as the computational prediction of binding free energies of candidate drug molecules.




I Introduction

It is quite common in a scientific study that multiple quantities are to be determined from the measurements of their individual values and their pairwise differences. (I will refer to both experimental measurements and computational predictions as measurements unless otherwise distinguished.) An important example is the determination of the binding affinities of a set of molecules for a target receptor of pharmaceutical interest, which is instrumental to ranking and selecting candidate molecules in all drug discovery projects. The binding free energy of an individual molecule may be measured by titration experiments  or computationally predicted by absolute binding free energy calculations Boresch et al. (2003); the difference between the binding free energies of any two molecules can be measured experimentally Annis et al. (2004); Lin et al. (2013) or computationally predicted by relative binding free energy calculations Cournia, Allen, and Sherman (2017); Wang et al. (2015). Binding free energy calculations have demonstrated sufficient accuracy Harder et al. (2016) and are increasingly adopted in the pharmaceutical industry Wang et al. (2015). The binding free energies of all the molecules in the set can be estimated from the appropriate combinations of the individual values and the pairwise differences (see below).

Compared to measuring only the individual quantities, including measurements of pairwise differences may substantially improve the statistical precision of the estimated quantities. This is especially true when the statistical uncertainties in the measurements of the differences are much lower than those in the measurements of the individual quantities. For example, the calculation of absolute binding free energies tends to have larger statistical uncertainty than that of relative binding free energies at the same computational cost. A related example is the computation of the normalization constants of multivariate probability models, a common problem in machine learning. The individual normalization constants are all but impossible to compute as they involve multidimensional integrals, but the ratio between two normalization constants can be computed using the technique of bridge sampling Meng and Wong (1996); Gronau et al. (2017), which is equivalent to the free energy perturbation technique used in binding free energy calculations Bennett (1976). In experiments, measurements of differences have higher statistical precision than measurements of individual quantities when the difference between two quantities lies within the detection range of the experimental technique but the individual values are outside the detection limits.

The statistical uncertainty associated with the measurement of each quantity or pairwise difference depends on the inherent fluctuation in the outcome of the measurement and on the resources allocated to it, i.e. the number of repetitions in the case of experimental measurements or the number of uncorrelated data points sampled in the case of computational predictions. The optimal estimator of the quantities from the individual values and the pairwise differences should take into account the statistical uncertainty of each measurement. Given these statistical uncertainties, a maximum likelihood estimator can be readily constructed Stephen Boyd (2004); Pukelsheim (2006); Wang et al. (2013) (Eqs. 2-4 below).

The overall statistical precision of the estimated quantities, characterized by the covariance matrix, depends on the resources allocated to the different measurements. Given fixed total measurement resources (i.e. cost), what is the optimal allocation to a combination of measurements of individual quantities and pairwise differences so as to minimize the overall statistical error, and how much does such an optimal allocation decrease the statistical error compared to naive allocations?

Here, I apply the mathematical results from optimal experimental design Pukelsheim (2006); Stephen Boyd (2004) to the special case of optimizing the allocation of measurement resources to the measurements of individual quantities and their pairwise differences, and I characterize the statistical performance of such optimizations in terms of the reduction in the statistical error in the estimated quantities at fixed total measurement cost. The A-optimal (minimizing the trace of the covariance matrix) and the D-optimal (minimizing the determinant) are found by iterative numerical minimization. For the E-optimal (minimizing the largest eigenvalue), I present a new mathematical theorem that enables the minimization to be solved by construction. My results suggest that optimal designs of the measurements can substantially reduce the statistical error in the estimated quantities, allowing the same statistical precision to be achieved, on average, at less than half the measurement cost of naive allocations. The Python code for generating the optimal designs is made available as free open source software (https://github.com/forcefield/DiffNet).

II Theory

Suppose that we want to measure a set of quantities $x_i$, $i = 1, 2, \ldots, n$. We can either measure each individual quantity $x_i$ with the estimator $\hat{x}_i = x_i + \epsilon_i$, or the difference between any pair of quantities $x_i$ and $x_j$ with the estimator $\hat{x}_{ij} = x_i - x_j + \epsilon_{ij}$, where $\epsilon_i$ and $\epsilon_{ij}$ are the respective statistical errors in the measurements. For each measurement, the statistical variance of the estimate decreases with $n_{ij}$, the resource allocated to the measurement, as

$$\sigma_{ij}^2 = s_{ij}^2 / n_{ij}, \quad (1)$$

where $s_{ij}$ is the statistical fluctuation in the corresponding experimental measurement or computer sampling.

The quantities and the measurements form a network, which I will refer to as the difference network, represented by a graph of $n+1$ vertices and up to $n(n+1)/2$ edges, where the vertices $1, \ldots, n$ stand for the quantities $\{x_i\}$ (vertex 0 representing a fixed origin $x_0 \equiv 0$), the edge between vertices $i$ and $j$ stands for the difference measurement $\hat{x}_{ij}$, and the edge between vertices $i$ and $0$ stands for the individual measurement $\hat{x}_i \equiv \hat{x}_{i0}$. Each edge can be associated either with the corresponding fluctuation $s_{ij}$ (the resulting weighted graph will be denoted as $G(s)$) or with the number of samples $n_{ij}$ ($G(n)$). I will denote $s_{i0} \equiv s_i$ and $n_{i0} \equiv n_i$.

For a given set of $\{n_{ij}\}$ and the corresponding $\{\sigma_{ij}\}$ per Eq. 1, the maximum likelihood estimator for $\{x_i\}$ is Stephen Boyd (2004) (see Appendix A for a derivation)

$$\hat{x} = F^{-1} z, \quad (2)$$

where, with the convention $\hat{x}_{ji} \equiv -\hat{x}_{ij}$,

$$z_i = \frac{n_{i0}}{s_{i0}^2} \hat{x}_{i0} + \sum_{j \neq i} \frac{n_{ij}}{s_{ij}^2} \hat{x}_{ij}, \quad (3)$$

and $F$ is the Fisher information matrix:

$$F_{ii} = \frac{n_{i0}}{s_{i0}^2} + \sum_{j \neq i} \frac{n_{ij}}{s_{ij}^2}, \qquad F_{ij} = -\frac{n_{ij}}{s_{ij}^2} \ (i \neq j). \quad (4)$$

The covariance in the estimates of $\{x_i\}$ is given by the inverse of the Fisher information matrix:

$$C = F^{-1}. \quad (5)$$
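Eqs. 2-5 translate directly into a few lines of NumPy. The sketch below is illustrative code of my own (not the DiffNet package): measurements are given as an edge list, with an individual measurement of $x_i$ encoded as $j = -1$.

```python
import numpy as np

def ml_estimate(measurements, K):
    """Maximum-likelihood estimate of K quantities from a difference network.

    measurements: list of (i, j, value, variance), where each entry estimates
    x_i - x_j; j == -1 encodes an individual measurement of x_i alone.
    Returns (x_hat, C): the ML estimates and their covariance matrix.
    """
    F = np.zeros((K, K))  # Fisher information matrix (Eq. 4)
    z = np.zeros(K)       # precision-weighted data vector (Eq. 3)
    for i, j, value, var in measurements:
        w = 1.0 / var     # precision n_ij / s_ij^2 of this measurement
        F[i, i] += w
        z[i] += w * value
        if j >= 0:
            F[j, j] += w
            F[i, j] -= w
            F[j, i] -= w
            z[j] -= w * value
    C = np.linalg.inv(F)  # covariance of the estimates (Eq. 5)
    return C @ z, C       # x_hat = F^{-1} z (Eq. 2)

# Mutually consistent, noise-free data are recovered exactly:
meas = [(0, -1, 1.0, 0.2),   # x_0 ~ 1.0
        (1, 0, 1.0, 0.1),    # x_1 - x_0 ~ 1.0
        (2, 1, 1.5, 0.1)]    # x_2 - x_1 ~ 1.5
x_hat, C = ml_estimate(meas, 3)
```

Because the three measurements above are mutually consistent, the estimator returns $\hat{x} = (1.0, 2.0, 3.5)$ exactly; with noisy data, the diagonal of $C$ reports the variance of each estimate.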
In the classic theory of optimal design of experiments Pukelsheim (2006), three objectives of minimization of the statistical errors with respect to $\{n_{ij}\}$ are commonly sought:

  • A-optimal: minimize $\mathrm{tr}\, C$,

  • D-optimal: minimize $\det C$,

  • E-optimal: minimize $\|C\|_2$,

where $\mathrm{tr}\, C$ denotes the trace of the matrix $C$, $\det C$ denotes the determinant of $C$, and $\|C\|_2$ denotes the spectral norm, or, equivalently, the largest eigenvalue, of $C$, subject to the constraints of non-negativity

$$n_{ij} \geq 0, \quad (6)$$

and of total fixed measurement cost

$$\sum n_{ij} = N, \quad (7)$$

where the sum runs over all the measurements in the network. The A-optimal design minimizes the total variance of the estimated quantities, the D-optimal design minimizes the volume of the confidence ellipsoid at a fixed confidence level, and the E-optimal design minimizes the diameter of the confidence ellipsoid.

Although $n_{ij}$ are integers, the above minimizations are more conveniently performed if $n_{ij}$ are allowed to be real numbers. They can then be rounded up to the next integers: $n_{ij} \leftarrow \lceil n_{ij} \rceil$. This is a reasonable approximation when $n_{ij} \gg 1$.

All three objectives are convex functions of $\{n_{ij}\}$ (see Appendix B), and the minimization can be performed by standard algorithms in convex optimization Stephen Boyd (2004). For the A-optimal design, the minimization can be cast by duality as a semidefinite programming problem (SDP):

minimize $\mathbf{1}^T u$ (8)
subject to $\begin{pmatrix} F(n) & e_k \\ e_k^T & u_k \end{pmatrix} \succeq 0, \ k = 1, \ldots, n; \quad n_{ij} \geq 0; \quad \sum n_{ij} = N,$

where $A \succeq 0$ signifies that the symmetric matrix $A$ is positive semidefinite, and $e_k$ is the $k$'th column of the $n \times n$ identity matrix.
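In practice the A-optimal allocation can also be obtained with a general-purpose solver instead of an SDP, by minimizing $\mathrm{tr}\, F(n)^{-1}$ directly over the allocations. Below is a hedged sketch using SciPy's SLSQP; the edge-list encoding and function names are illustrative choices of mine, not the paper's code.

```python
import numpy as np
from scipy.optimize import minimize

def fisher(n_alloc, edges, s, K):
    """Fisher information F(n); edges are (i, j), with j == -1 an individual measurement."""
    F = np.zeros((K, K))
    for n_e, (i, j), s_e in zip(n_alloc, edges, s):
        w = n_e / s_e**2          # per-edge precision, sigma^2 = s^2 / n (Eq. 1)
        F[i, i] += w
        if j >= 0:
            F[j, j] += w
            F[i, j] -= w
            F[j, i] -= w
    return F

def a_optimal(edges, s, N, K):
    """Allocations n >= 0 with sum(n) = N minimizing tr F(n)^{-1}."""
    m = len(edges)
    obj = lambda n: np.trace(np.linalg.inv(fisher(n, edges, s, K)))
    res = minimize(obj, np.full(m, N / m), method="SLSQP",
                   bounds=[(1e-6 * N, None)] * m,
                   constraints=[{"type": "eq", "fun": lambda n: n.sum() - N}])
    return res.x

edges = [(0, -1), (1, -1), (2, -1), (0, 1), (1, 2), (0, 2)]
s = [3.0, 3.0, 3.0, 1.0, 1.0, 1.0]   # differences fluctuate less than individuals
n_opt = a_optimal(edges, s, 100.0, 3)
```

The lower bounds keep every $n_{ij}$ slightly positive so that $F$ remains invertible during the search; in this toy network, one expects the cheaper pairwise measurements to receive most of the budget.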

For the E-optimal design, the minimization can be similarly cast as an SDP. But for the difference network, I present the following theorem (see Appendix C for a proof), which allows the minimization problem to be solved by construction, with a time complexity of $O(n^2)$ and a space complexity of $O(n)$, and which, to my knowledge, has not appeared in any previous publication.

Theorem 1

Let $P_i$ be the shortest path from vertex 0 to vertex $i$ in the graph $G(s)$ (i.e., the path with the smallest $\sum_{(j,k) \in P_i} s_{jk}$), and $T = \cup_i P_i$ be the tree rooted at vertex 0 from the resulting union. Denoting $d_i = \sum_{(j,k) \in P_i} s_{jk}$ and $D = \sum_{i=1}^{n} d_i^2$, the spectral norm, bounded below by

$$\|C\|_2 \geq D/N, \quad (9)$$

is minimized, with the bound attained, by the following set of $\{n_{ij}\}$ (all other allocations being zero):

$$n_{\alpha(i),i} = \frac{N}{D}\, s_{\alpha(i),i} \sum_{j \in T(i)} d_j, \quad (10)$$

where $T(i)$ is the subtree rooted at vertex $i$, and $\alpha(i)$ is the vertex immediately preceding $i$ in the path from 0 to $i$.

The shortest paths can be constructed by the single-source multiple-destination Dijkstra algorithm (see, e.g., Ref. Kurt Mehlhorn, 2008), which guarantees that $T$ is a tree. The E-optimal by construction is substantially faster than by minimization using SDP: for randomly generated $\{s_{ij}\}$, the construction outpaces the SDP solution (implemented in CVXOPT Andersen, Dahl, and Vandenberghe) by a margin that grows rapidly with the size of the network. In Appendix D, I suggest a special class of difference networks in which the A- and the D-optimals may also be solved by construction.
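Theorem 1 reduces the E-optimal design to a shortest-path computation. Below is an illustrative implementation of my own (not the DiffNet code): Dijkstra from the origin vertex 0, followed by subtree sums of $d$ accumulated leaves-first, with $s$ given as a dense symmetric matrix over the $n+1$ vertices.

```python
import heapq
import numpy as np

def e_optimal(s, N):
    """E-optimal allocations per Theorem 1.

    s: (n+1) x (n+1) symmetric matrix of fluctuations; vertex 0 is the origin.
    Returns ({(parent, child): n_ij} supported on the shortest-path tree, d).
    """
    V = s.shape[0]
    d = np.full(V, np.inf)        # shortest-path distances d_i from vertex 0
    parent = np.full(V, -1)
    d[0] = 0.0
    heap, done = [(0.0, 0)], set()
    while heap:                   # Dijkstra on the complete graph
        du, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v in range(V):
            if v != u and du + s[u, v] < d[v]:
                d[v] = du + s[u, v]
                parent[v] = u
                heapq.heappush(heap, (d[v], v))
    # subtree sums of d over the shortest-path tree, accumulated leaves-first
    sub = d.copy()
    for v in sorted(range(1, V), key=lambda v: -d[v]):
        sub[parent[v]] += sub[v]
    D = float(np.sum(d[1:] ** 2))
    # n_{alpha(i), i} = (N / D) * s_{alpha(i), i} * sum_{j in T(i)} d_j  (Eq. 10)
    alloc = {(int(parent[v]), v): N / D * s[parent[v], v] * sub[v]
             for v in range(1, V)}
    return alloc, d

s_mat = np.array([[0.0, 1.0, 3.0],
                  [1.0, 0.0, 1.0],
                  [3.0, 1.0, 0.0]])
alloc, d = e_optimal(s_mat, 100.0)
```

Here the shortest-path tree is 0-1-2 with $d = (1, 2)$, so the whole budget lands on the edges $(0,1)$ and $(1,2)$, and the smallest eigenvalue of the resulting Fisher matrix equals $N/D$.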

Often, there is a cost $c_{ij}$ associated with generating each sampling point in the estimator $\hat{x}_{ij}$, and the total cost is $\sum c_{ij} n_{ij}$. We can, however, introduce $m_{ij} = c_{ij} n_{ij}$ and $s'_{ij} = \sqrt{c_{ij}}\, s_{ij}$, and solve the same problem as above for $\{m_{ij}\}$, with the parameters $\{s'_{ij}\}$.
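On one consistent reading of this reparameterization (an assumption on my part): with a per-sample cost $c_{ij}$, taking $m_{ij} = c_{ij} n_{ij}$ and $s'_{ij} = \sqrt{c_{ij}}\, s_{ij}$ leaves each measurement variance unchanged, which can be checked numerically:

```python
import numpy as np

# Cost-weighted variables: m = c * n (cost spent on the edge),
# s_prime = sqrt(c) * s, so that s^2 / n == s_prime^2 / m (Eq. 1 is preserved).
s, c, n = 2.0, 4.0, 10.0
m, s_prime = c * n, np.sqrt(c) * s
```

The optimization over $\{m_{ij}\}$ with fluctuations $\{s'_{ij}\}$ then allocates cost rather than sample counts.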

III Results

Figure 1: Optimal allocation of computational sampling to the binding free energy calculations for COX-2 inhibitors. A The 8 inhibitors are generated by different combinations of 3 rings, with 2 different options for each ring. The weighted graph $G(s)$ is shown. Each filled vertex represents an inhibitor. The edge between any two such vertices $i$ and $j$ represents the relative binding free energy calculation between inhibitors $i$ and $j$, and the thickness of the edge is proportional to the corresponding $s_{ij}$. The edge between the unfilled vertex at the bottom and a vertex $i$ represents the absolute binding free energy calculation of inhibitor $i$, and its thickness is proportional to $s_{i0}$. B The A-, D-, and E-optimals of the corresponding difference network, where the thickness of each edge is proportional to the optimal allocation $n_{ij}$. C The optimals when the binding free energies of Celecoxib and Rofecoxib are known and used as references, and only relative binding free energy calculations are used to compute the binding free energies of the other 6 inhibitors. For comparison, the minimum spanning tree (MST) of $G(s)$ is also shown, where $n_{ij}$ is constant for each edge in the tree.

I consider the potential application of the optimal difference network to the problem of binding free energy calculations, where the binding free energies of a set of molecules for a target receptor are predicted, so as to prioritize candidate molecules in drug discovery projects. The binding free energy of an individual molecule can be computed by the technique of absolute binding free energy calculations Boresch et al. (2003); Mobley et al. (2007); Woo and Roux (2005); Aldeghi et al. (2015), and the difference in the binding free energies of two molecules can be computed by the technique of relative binding free energy calculations Tembe and McCammon (1984); Radmer and Kollman (1997); Wang et al. (2015). Optimal estimators for individual absolute or relative binding free energy calculations have been developed Bennett (1976); Shirts and Chodera (2008). At fixed computational cost, the statistical errors of free energy calculations are bounded by the thermodynamic length between the thermodynamic end states Shenfeld et al. (2009). Typically, the end states are more dissimilar in absolute binding free energy calculations than in relative binding free energy calculations. As a result, the statistical errors in the former are often substantially larger than those in the latter at the same computational cost. Optimal allocation of computational resources to different absolute and relative binding free energy calculations may thus improve the efficiency of predicting the binding free energies of a set of molecules. Past efforts aimed at providing redundancy and consistency checks in the calculations Liu et al. (2013). In contrast, the optimals in this work focus on minimizing the overall statistical error.

I first illustrate the optimal difference networks using the example of the calculation of the binding free energies of inhibitors of the COX-2 protein Yamakawa et al. (2014). The optimals depend on the statistical fluctuations $s_{ij}$, which are unknown a priori. I make the simple assumption that the fluctuation in the absolute binding free energy calculation of molecule $i$ is proportional to the number of heavy atoms in that molecule, and that the fluctuation in the relative binding free energy calculation between molecules $i$ and $j$ is proportional to the number of heavy atoms in molecule $i$ that do not transform into atoms in molecule $j$. More realistic values of $s_{ij}$ can be dynamically determined during iterative rounds of binding free energy calculations; they may even be predicted more accurately by machine learning models trained on a large collection of binding free energy calculations. The optimal allocations are shown in Fig. 1B.

In a typical drug discovery project, there will be a few molecules whose binding free energies have already been experimentally determined, and, using these molecules as references, relative binding free energy calculations can be used to predict the binding free energies of other, similar molecules. The A-, D-, and E-optimals for such relative binding free energy calculations are shown in Fig. 1C for the COX-2 inhibitors, using Celecoxib and Rofecoxib as the reference molecules. Because the values of $s_{ij}$ above reflect the assumption that relative binding free energy calculations have substantially lower statistical errors than absolute binding free energy calculations, this network of relative binding free energy calculations, which takes advantage of the known binding free energies of the reference molecules, yields much lower overall statistical errors than the previous network in Fig. 1B.

Next, I characterize the statistical performance of the optimal difference networks in comparison to naive choices of $\{n_{ij}\}$. The A-, D-, and E-optimals and various naive allocations are applied to randomly generated sets of $\{s_{ij}\}$, and the resulting covariance matrices are compared in terms of their traces, determinants, and spectral norms (Fig. 2). The optimal allocations consistently lead to substantial reductions in the statistical errors in comparison to all the naive allocations.

Figure 2: Statistical performance of different allocations. Here the $s_{ij}$ are uniformly distributed between 1 and 5. As the reference, I chose constant $n_{ij}$ for the edges $(i,j)$ in $T$, where $T$ is the minimum spanning tree (MST) of the weighted graph $G(s)$. The covariances of the various allocations are computed for randomly generated sets of $\{s_{ij}\}$, and the ratios of $\mathrm{tr}\, C$, $\det C$, and $\|C\|_2$ to the corresponding reference values are shown here in boxplots, with the median ratios and the quartiles indicated.

For the randomly generated sets of $\{s_{ij}\}$, the A-optimal outperforms all tested schemes of naive allocation by all three criteria ($\mathrm{tr}\, C$, $\det C$, and $\|C\|_2$); it also yields a $\det C$ close to that of the D-optimal and a $\|C\|_2$ close to that of the E-optimal (Fig. 2). Compared to the E-optimal, which has the second lowest average $\mathrm{tr}\, C$, the A-optimal further reduces $\mathrm{tr}\, C$; compared to the naive allocation with the lowest average $\mathrm{tr}\, C$ among those tested, the reduction is larger still. These observations suggest that the A-optimal, which has the simple interpretation of minimizing the total variance of the measured quantities, may be a good default choice in designing difference networks.

Figure 3: Statistical distributions of $n_{ij}$ in the optimal allocations, for the same random sets of $\{s_{ij}\}$ as in Fig. 2. The fractions of edges in the weighted graph $G(n)$ with different ratios $n_{ij}/N$ are shown on the left. The right panels show the distributions of the minimum number of edges that need to be added to the graph $G(n)$, after eliminating the edges with $n_{ij} = 0$, to make it 2-connected (i.e. there are at least 2 distinct paths connecting any two vertices in $G(n)$). The majority of the A-optimal networks are 2-connected.

The majority of the A- and D-optimal networks are densely connected (the E-optimal is a tree and hence sparsely connected). For example, in the A-optimal allocations for the above randomly generated sets of $\{s_{ij}\}$, on average individual measurements are applied to 11.8% of the individual quantities, and difference measurements are applied to 86.7% of the pairs (Fig. 3). Thus the implementation of the A- and D-optimal measurement networks will entail massively parallel measurement tasks.

It is desirable that the measurement networks be 2-connected, i.e., that at least 2 edges need to be removed in order to disconnect the graph, such that for any two quantities there are two paths through which their difference can be computed. This allows a self-consistency check: the differences computed both ways should be approximately equal. Such checks can reveal potential measurement errors and outliers. The minimum number of edges that need to be added to each optimal network above to make it 2-connected is computed, and its distributions are shown in Fig. 3. The majority (98.5%) of the A-optimal networks are 2-connected, and all of them become 2-connected with the addition of at most 1 edge. This property of the A-optimal allocation also makes it a good choice in designing difference networks.
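Whether a given network is 2-connected in this sense can be checked by searching for bridges (edges whose removal disconnects the graph): a connected network is 2-connected exactly when it has no bridges. A self-contained sketch of the standard lowest-reachable-ancestor DFS (my own illustrative code, not the paper's):

```python
import sys

def find_bridges(num_vertices, edges):
    """Return the bridge edges of an undirected graph given as (u, v) pairs."""
    adj = [[] for _ in range(num_vertices)]
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    disc = [-1] * num_vertices   # DFS discovery times
    low = [0] * num_vertices     # lowest discovery time reachable from the subtree
    bridges, timer = [], [0]
    sys.setrecursionlimit(max(10000, 10 * num_vertices))

    def dfs(u, parent_edge):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        for v, idx in adj[u]:
            if idx == parent_edge:
                continue              # don't reuse the edge we arrived by
            if disc[v] == -1:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:  # no back edge bypasses (u, v)
                    bridges.append(edges[idx])
            else:
                low[u] = min(low[u], disc[v])

    for u in range(num_vertices):
        if disc[u] == -1:
            dfs(u, -1)
    return bridges
```

A triangle has no bridges, while every edge of a path graph is one; in a difference network, each bridge marks a measurement whose error cannot be caught by a two-path consistency check.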

IV Discussions

This work explored the application of optimal experimental design to the determination of multiple quantities by the measurements of select individual quantities and select pairwise differences, such as in the computational prediction of binding free energies for candidate drug molecules. A related work was recently published by Yang et al. Yang et al. (2019), in which the authors proposed using discrete A- or D-optimals to select the pairs of relative binding free energy calculations for predicting the binding free energies of a set of molecules. Whereas Yang et al. addressed the question of between which pairs of molecules relative binding free energy calculations should be performed, given a fixed number of calculations (i.e. $n_{ij} \in \{0, 1\}$, under the constraint $\sum n_{ij} = K$, $K$ being the total number of calculations), I addressed here the question of how much sampling should be allocated to the relative binding free energy calculations between each possible pair of molecules, given a fixed total amount of sampling (i.e. $n_{ij} \geq 0$, under the constraint $\sum n_{ij} = N$, $N$ being the total number of samples in all the calculations). The expanded domain of $n_{ij}$ in the latter allows additional control in the design of the calculations, which further reduces the resulting statistical errors.

In this work, I have demonstrated that optimal designs of difference networks can substantially reduce the statistical errors in determining a set of quantities at fixed measurement cost. It is worth noting that such optimal designs are applicable only when two conditions, discussed below, are met: first, the total measurement cost $N$ must be large in comparison to the number of pairs; second, the statistical fluctuations in the measurements, $s_{ij}$, must be available.

Because the optimal designs in this work have been derived for real-valued $n_{ij}$, the total measurement cost has to be sufficiently large ($N \gg E$) for the integer values $\lceil n_{ij} \rceil$ not to deviate significantly from the optimal design and not to affect the total measurement cost (since $\sum \lceil n_{ij} \rceil \leq N + E$, where $E$ is the number of edges with $n_{ij} > 0$ in the difference network $G(n)$). This condition is usually satisfied in sampling-based computational predictions such as binding free energy calculations, because in such cases $N$ is the total number of independent sampling points, which can be in the range of tens of millions. Automated experiments permitting large numbers of repetitions may also satisfy this condition and thus benefit from such optimal designs.

Design of the optimal difference network requires as input the statistical fluctuation $s_{ij}$ associated with each measurement, which is usually unknown a priori. Thus the application of the optimal difference networks needs to proceed in an iterative manner. Starting with an initial guess of $s_{ij}$ (e.g. the statistical fluctuation in relative or absolute binding free energy calculations may be predicted by a machine-learning model trained on a collection of past free energy calculations), the quantities can be measured with the corresponding optimal network at a small total measurement cost. For any measurement performed, $s_{ij}$ is then updated to the statistical fluctuation actually observed, and the network is re-optimized accordingly. This process is repeated at increasingly larger total cost until sufficiently precise estimates of the quantities are reached.
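The iterative protocol can be sketched as a loop around two caller-supplied functions. The names, the geometric budget growth, and the variance-based update rule below are illustrative assumptions of mine: `allocate` stands in for an A-, D-, or E-optimal solver, and `measure` wraps the actual experiments or free energy calculations.

```python
import numpy as np

def iterative_diffnet(s_guess, allocate, measure, rounds=3, budget=100.0, growth=4.0):
    """Iteratively refine the fluctuations s_ij and re-optimize the network.

    allocate(s, N) -> {edge: n_e}   stand-in for an optimal-design solver
    measure(edge, n) -> (estimate, sample_variance) of that measurement
    """
    s = dict(s_guess)
    estimates = {}
    for _ in range(rounds):
        for edge, n_e in allocate(s, budget).items():
            n_int = int(round(n_e))
            if n_int < 1:
                continue
            est, var = measure(edge, n_int)
            estimates[edge] = est
            s[edge] = np.sqrt(var * n_int)   # invert sigma^2 = s^2 / n (Eq. 1)
        budget *= growth                     # spend more once s is trustworthy
    return estimates, s
```

In use, the first rounds mainly calibrate $s_{ij}$ at low cost, and the later, larger rounds are spent close to the true optimal allocation.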

The optimal difference network may help accelerate a variety of scientific inquiries. Improving the statistical precision of binding free energy predictions, for instance, allows better selection of candidate drug molecules in drug discovery projects, as suggested by Yang et al. Yang et al. (2019). The optimal difference network may find immediate use in the parameterization of computational models, where the same set of quantities is computed repeatedly in search of the model parameter values that best fit benchmark data. For example, binding free energy calculations for sets of inhibitors binding to their receptor targets Harder et al. (2016) and solvation free energy calculations for a large number of solutes Mobley and Guthrie (2014) are routinely used to test the accuracy of molecular force fields. The increased statistical efficiency conferred by the optimal difference networks will allow faster assessment of the quantitative accuracy of the models and thus shorten their development cycles.

Appendix A Derivation of the estimator and the covariance in the difference network

The set of quantities $\{x_i\}$ can be estimated from the measurements by the maximum likelihood (ML) estimator. Let $\sigma_{ij}^2$ be the variance for the measurement $\hat{x}_{ij}$; the likelihood that the measurements produce the set of values $\{\hat{x}_{ij}\}$, given the true values of the quantities $\{x_i\}$, is

$$L(\{\hat{x}_{ij}\} \mid \{x_i\}) \propto \prod_{i} \exp\left(-\frac{(\hat{x}_{i0} - x_i)^2}{2 \sigma_{i0}^2}\right) \prod_{i < j} \exp\left(-\frac{(\hat{x}_{ij} - (x_i - x_j))^2}{2 \sigma_{ij}^2}\right). \quad (11)$$

The above assumes that the samples in the measurements are independent.

Maximizing $L$ with respect to $\{x_i\}$ yields Eqs. 2-4. The covariance matrix for the ML estimator is given by the inverse of the Fisher information matrix, hence Eq. 5.

Appendix B Proof of convexity of $\mathrm{tr}\, C$ and $\|C\|_2$

The objective functions $\mathrm{tr}\, C$, $\det C$, and $\|C\|_2$ have all been previously shown to be convex functions of $\{n_{ij}\}$ Stephen Boyd (2004). Here I include simple proofs for the convexity of $\mathrm{tr}\, C$ and $\|C\|_2$.

The Fisher information matrix $F$ is a linear function of $\{n_{ij}\}$. Both $F$ and $C = F^{-1}$ are symmetric and positive definite. Consider a perturbation $F \to F + \epsilon \Delta$, where $\Delta$ is an arbitrary symmetric matrix. The perturbed covariance matrix $C(\epsilon)$ satisfies

$$C(\epsilon)\,(F + \epsilon \Delta) = I, \quad (12)$$

where $I$ is the identity matrix. Differentiating Eq. 12 with respect to $\epsilon$ twice, we have

$$\frac{d^2 C}{d\epsilon^2}\bigg|_{\epsilon=0} = 2\, C \Delta C \Delta C, \quad (13)$$

which is positive definite. Hence

$$\frac{d^2}{d\epsilon^2} \mathrm{tr}\, C \,\bigg|_{\epsilon=0} = \mathrm{tr}\left(2\, C \Delta C \Delta C\right) > 0, \quad (14)$$

proving the convexity of $\mathrm{tr}\, C$ in $\{n_{ij}\}$.

To prove that $\|C\|_2$ is convex in $\{n_{ij}\}$, note that

$$\|C\|_2 = 1/\lambda_1, \quad (15)$$

where $\lambda_1$ is the smallest eigenvalue of $F$.

According to second-order perturbation theory, for the perturbed matrix $F + \epsilon \Delta$, the corresponding smallest eigenvalue is

$$\lambda_1(\epsilon) = \lambda_1 + \epsilon\, v_1^T \Delta v_1 + \epsilon^2 \sum_{k > 1} \frac{|v_k^T \Delta v_1|^2}{\lambda_1 - \lambda_k} + O(\epsilon^3), \quad (16)$$

where $\lambda_k$ is the $k$'th eigenvalue of $F$ ($\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$), and $v_k$ is the corresponding normalized eigenvector. The second derivative of $1/\lambda_1(\epsilon)$ is

$$\frac{d^2}{d\epsilon^2} \frac{1}{\lambda_1(\epsilon)}\bigg|_{\epsilon=0} = \frac{2\,(v_1^T \Delta v_1)^2}{\lambda_1^3} - \frac{2}{\lambda_1^2} \sum_{k > 1} \frac{|v_k^T \Delta v_1|^2}{\lambda_1 - \lambda_k}. \quad (17)$$

Because $\lambda_1 \leq \lambda_k$ and $\lambda_1 > 0$, both terms are non-negative, i.e.

$$\frac{d^2}{d\epsilon^2} \frac{1}{\lambda_1(\epsilon)}\bigg|_{\epsilon=0} \geq 0, \quad (18)$$

proving the convexity of $\|C\|_2$ in $\{n_{ij}\}$.

Appendix C Proof that Eq. 10 is the E-optimal

The largest eigenvalue of $C$ is the inverse of the smallest eigenvalue $\lambda_1$ of $F$ (Eq. 15), so the problem of minimizing $\|C\|_2$ is equivalent to maximizing the smallest eigenvalue of $F$:

$$\max_{\{n_{ij}\}} \lambda_1(F(n)). \quad (19)$$

The smallest eigenvalue of $F$ satisfies

$$\lambda_1 \leq v^T F\, v \quad (20)$$

for all normalized vectors $v$; the equality holds if and only if $v$ is the eigenvector of $F$ corresponding to $\lambda_1$. Our problem is thus to find

$$\max_{\{n_{ij}\}} \min_{\|v\|=1} v^T F(n)\, v. \quad (21)$$
I have

$$F(n) = \sum_{(i,j)} \frac{n_{ij}}{s_{ij}^2} (e_i - e_j)(e_i - e_j)^T, \quad (22)$$

where the sum runs over all the edges, $e_i$ is the $i$'th column of the $n \times n$ identity matrix, and $e_0 \equiv 0$ for the edges to the origin. Plugging in Eq. 22, I have

$$v^T F(n)\, v = \sum_{(i,j)} \frac{n_{ij}}{s_{ij}^2} (v_i - v_j)^2, \quad (23)$$

with the convention $v_0 \equiv 0$. Given any vector $v$, eq. 23 is maximized with respect to $\{n_{ij}\}$, subject to $\sum n_{ij} = N$, when the only non-zero $n_{ij}$'s are the ones corresponding to the largest value of $(v_i - v_j)^2/s_{ij}^2$. There may be a degenerate set of edges with the largest values:

$$B(v) = \left\{ (i, j) \;\middle|\; \frac{(v_i - v_j)^2}{s_{ij}^2} = \max_{(k,l)} \frac{(v_k - v_l)^2}{s_{kl}^2} \right\}. \quad (24)$$

The maximum can be achieved by different values of $\{n_{ij}\}$, so long as $n_{ij} > 0$ only if $(i,j) \in B(v)$.

If I can determine a set $B$ and a set of $\{n_{ij}\}$ such that $n_{ij} > 0$ only if $(i,j) \in B$ and $\sum n_{ij} = N$, and the corresponding Fisher information matrix has an eigenvector of the lowest eigenvalue satisfying eq. 24 (i.e. $B = B(v)$ for that eigenvector $v$), I have

$$\lambda_1(F(n)) = \min_{\|v\|=1} v^T F(n)\, v = \max_{\{n'_{ij}\}} \min_{\|v\|=1} v^T F(n')\, v. \quad (25)$$

Such a set of $\{n_{ij}\}$ would thus be a solution of eq. 21. There may be degenerate solutions of eq. 21 and to our problem.

Next I show how to construct such a set of $\{n_{ij}\}$ by constructing its corresponding eigenvector that satisfies eq. 24.

Consider a complete graph consisting of $n+1$ vertices, indexed as $0, 1, \ldots, n$. The edge between vertex $i$ and vertex $j$ is assigned weight $s_{ij}$, and the edge between vertex $i$ and the vertex $0$ is assigned weight $s_{i0}$. For each vertex $i$, find the shortest path $P_i$ from $0$ (i.e., with the smallest sum of the edge weights along the path). We also denote $d_i = \sum_{(j,k) \in P_i} s_{jk}$, $d_0 \equiv 0$, and $T = \cup_i P_i$. I will show that the vector $(d_1, d_2, \ldots, d_n)$ is the sought eigenvector, and that $T$ comprises the edges of a tree, rooted at vertex $0$, that is the union of the shortest paths to each vertex $i$.

Consider the union of the shortest paths: $U = \cup_i P_i$. If there is any loop in $U$, an arbitrary edge in the loop can be removed and the resulting graph will still contain a shortest path from $0$ to every vertex $i$. Thus the tree $T$ can be constructed by removing all the loops in $U$. In fact, if the $P_i$ are found by Dijkstra's algorithm, $U$ will not contain any loops. This is taken to be the case below.

First, I prove that $v = (d_1, \ldots, d_n)$ (up to normalization) and the corresponding edge set $T$ satisfy eq. 24. Clearly

$$\frac{(d_i - d_{\alpha(i)})^2}{s_{\alpha(i),i}^2} = 1 \text{ for every edge } (\alpha(i), i) \in T, \quad (26)$$

and

$$\frac{(d_i - d_j)^2}{s_{ij}^2} \leq 1 \text{ for every pair } (i, j). \quad (27)$$

If there existed a pair $(i,j)$ with $d_i - d_j > s_{ij}$, it would imply that the shortest path to $j$ followed by the edge $(j, i)$ reaches $i$ with the sum $d_j + s_{ij} < d_i$, which is a contradiction.

I can now construct the set $\{n_{ij}\}$, supported on $T$, so that $v$ is the eigenvector of $F$, i.e.,

$$F v = \lambda v. \quad (28)$$

Note that $n_{ij} > 0$ only if $(i,j) \in T$. Denote the set of vertices whose edges to vertex $i$ are in $T$ as $A_i$ (which may include the vertex 0); the elements of $F$ can be written as

$$F_{ii} = \sum_{j \in A_i} \frac{n_{ij}}{s_{ij}^2}, \quad (29)$$

$$F_{ij} = -\frac{n_{ij}}{s_{ij}^2} \text{ if } j \in A_i,\ j \neq 0, \text{ and } F_{ij} = 0 \text{ otherwise,} \quad (30)$$

and Eq. 28 becomes

$$\sum_{j \in A_i} \frac{n_{ij}}{s_{ij}^2} (d_i - d_j) = \lambda d_i, \quad (31)$$

with $d_0 \equiv 0$. Let $\alpha(i)$ be the vertex immediately preceding $i$ in the path connecting $0$ to $i$. $\alpha(i)$ is the only vertex in $A_i$ for which $d_{\alpha(i)} < d_i$. Otherwise, let $j \in A_i$ be another vertex with $d_j < d_i$; then both the path through $\alpha(i)$ and the path through $j$ connect $0$ to $i$ within $T$, in contradiction to the fact that $T$ is without any loop. Another corollary is that the edge $(\alpha(i), i)$ must be in the path $P_j$ for every $j$ in the subtree $T(i)$ rooted at vertex $i$.

Using $d_i - d_{\alpha(i)} = s_{\alpha(i),i}$ and $d_j - d_i = s_{ij}$ for every other $j \in A_i$, I can write Eq. 31 as

$$\frac{n_{\alpha(i),i}}{s_{\alpha(i),i}} - \sum_{j \in A_i,\, j \neq \alpha(i)} \frac{n_{ij}}{s_{ij}} = \lambda d_i. \quad (32)$$

Eq. 32 can be solved by starting from the leaves of the tree and working backwards toward vertex $0$. The solution is

$$n_{\alpha(i),i} = \lambda\, s_{\alpha(i),i} \sum_{j \in T(i)} d_j, \quad (33)$$

where the sum runs over the set of vertices in the subtree $T(i)$ rooted at $i$, including $i$ itself.

The eigenvalue $\lambda$ can be determined by applying the constraint $\sum n_{ij} = N$:

$$N = \sum_{i=1}^{n} n_{\alpha(i),i} = \lambda \sum_{i=1}^{n} s_{\alpha(i),i} \sum_{j \in T(i)} d_j. \quad (34)$$

In both $\sum_i s_{\alpha(i),i} \sum_{j \in T(i)} d_j$ and $\sum_j d_j^2 = \sum_j d_j \sum_{(\alpha(i),i) \in P_j} s_{\alpha(i),i}$, the product $s_{\alpha(i),i}\, d_j$ appears once and only once if and only if $j$ is in the subtree $T(i)$, or, equivalently, the edge $(\alpha(i), i)$ is in the path $P_j$. This implies that the two sums are equal:

$$\sum_{i=1}^{n} s_{\alpha(i),i} \sum_{j \in T(i)} d_j = \sum_{j=1}^{n} d_j^2 = D, \quad (35)$$

and I have

$$\lambda = N/D. \quad (36)$$
To prove that this $\lambda$ in eq. 36 is the smallest eigenvalue of $F$, I show that $1/\lambda$ is the largest eigenvalue of the corresponding covariance matrix $C$.

$x_i$ is estimated, because $T$ is a tree rooted at $0$, by

$$\hat{x}_i = \sum_{(\alpha(j), j) \in P_i} \hat{x}_{j \alpha(j)}, \quad (37)$$

thus the covariance between $\hat{x}_i$ and $\hat{x}_j$ is

$$C_{ij} = \sum_{(\alpha(k), k) \in P_i \cap P_j} \frac{s_{\alpha(k),k}^2}{n_{\alpha(k),k}}. \quad (38)$$

Without loss of generality, let's assume that within $T$, vertices $1, \ldots, m$ have edges to vertex $0$. $C$ can be rearranged into a block diagonal form, where each block $C^{(b)}$, $b = 1, \ldots, m$, corresponds to the covariances between pairs of vertices both within the subtree $T(b)$. Any eigenvalue of $C$ must be an eigenvalue of some $C^{(b)}$. Since $C_{ij} > 0$ for any two vertices $i$ and $j$ in the same subtree (their paths $P_i$ and $P_j$ share at least the edge $(0, b)$), each $C^{(b)}$ is a positive matrix, and by the Perron-Frobenius theorem it has only one eigenvector with all positive components, and the corresponding eigenvalue has the largest magnitude. Since $(d_1, \ldots, d_n)$ is an eigenvector of $C$ with all positive components, its corresponding eigenvalue $1/\lambda$ must be the eigenvalue of the largest magnitude for each $C^{(b)}$, and thus the eigenvalue of the largest magnitude for $C$. Q.E.D.

Appendix D The special case of constant relative error

In the special case where the relative error of any measurement is the same, i.e., $s_{ij}/(x_j - x_i)$ is a constant, the A- and D-optimals may also be solved by construction. Without loss of generality, the quantities can be ordered as $x_1 < x_2 < \cdots < x_n$ (which is often easily done by qualitative comparison without quantitative measurement), and it is apparent that $s_{ij} < s_{ik}$ for $|x_j - x_i| < |x_k - x_i|$, so that the minimum spanning tree of $G(s)$ is the chain connecting consecutive quantities. I conjecture that in such cases of constant relative error, the A- and D-optimals can be solved by the following construction:

Conjecture 1

If the relative error is a constant and the quantities are ordered as above, $\mathrm{tr}\, C$ is minimized by an allocation supported on the chain edges $(i, i+1)$, $i = 0, 1, \ldots, n-1$ (vertex 0 being the origin), with

$$n_{i,i+1} = \frac{N\, s_{i,i+1} \sqrt{n - i}}{\sum_{j=0}^{n-1} s_{j,j+1} \sqrt{n - j}}.$$

Conjecture 2

If the relative error is a constant and the quantities are ordered as above, $\det C$ is minimized by the uniform allocation over the same chain,

$$n_{i,i+1} = N/n, \qquad i = 0, 1, \ldots, n-1,$$

where $n$ is the number of quantities. The minimum value is $\det C = (n/N)^n \prod_{i=0}^{n-1} s_{i,i+1}^2$.

Figure 4: The A-, D-, and E-optimals of difference networks where the relative errors in the measurements are a constant. The width of each edge in the graph $G(n)$ is proportional to the allocation $n_{ij}$.

In the A- and D-optimals, the subgraph of $G(n)$ consisting of only the edges with weights $n_{ij} > 0$ is the minimum spanning tree (MST) of $G(s)$, connecting vertices $i$ and $i+1$ for $i = 0, 1, \ldots, n-1$ (Fig. 4). The above conjectures have been corroborated by comparing the constructive solutions with the results of numerical minimizations for many sets of randomly generated quantities, but their rigorous proofs have so far defied me.

