Learning protein conformational space by enforcing physics with convolutions and latent interpolations

10/10/2019 ∙ by Venkata K. Ramaswamy, et al.

Determining the different conformational states of a protein and the transition paths between them is a central challenge in protein biochemistry, and is key to better understanding the relationship between biomolecular structure and function. This task is typically accomplished by sampling the protein conformational space with microseconds of molecular dynamics (MD) simulations. Despite advances in both computing hardware and enhanced sampling techniques, MD will always yield a discretized representation of this space, with transition states undersampled proportionally to their associated energy barrier. We design a convolutional neural network capable of learning a continuous and physically plausible representation of the conformational space, given example conformations generated by experiments and simulations. We show that this network, trained with MD simulations of two distinct protein states, can correctly predict a possible transition path between them, without being provided any example of the transition state. We then show that our network, having a protein-independent architecture, can be trained in a transfer learning scenario, leading to performance superior to that of a network trained from scratch.







1 Introduction

Proteins are dynamic systems that undergo conformational transitions essential for their biological functions. Stimulated by either environmental conditions or ligand binding, proteins hop between many different configurations. Proteins are often composed of thousands of atoms, which could in principle adopt an enormous number of possible arrangements. However, only a small, energetically favourable subset of these arrangements is accessible [1], and only a subset of these accessible states, those of lowest energy, can be observed at atomic resolution by experimental techniques. Molecular dynamics (MD) simulations are a common approach to characterize transition paths between known states. However, these transitions can sometimes take place on the timescale of milliseconds or longer (e.g. the full cycle of the GroEL chaperonin from its closed to open state takes 15 s [2]), which is beyond the reach of conventional MD simulations.

Deep neural networks are able to learn continuous representations that capture the structure of a dataset. In particular, generative models such as variational autoencoders [3] and generative adversarial networks [4] show a remarkable ability to synthesize new plausible dataset examples. While most generative models sample from an assumed prior distribution, recent architectures improve latent interpolations through additional adversarial components [5]. With protein data, we have direct access to additional prior information about the physics of atomic interactions, which we can impose on the interpolations outside of the known states in the dataset.

Many successful generative architectures utilise convolutional neural networks (CNNs) [6], which are more computationally efficient than regular neural networks and have fewer parameters due to shared weights mimicking local connectivity in the visual cortex of the brain. While most applications are in the fields of image, video, audio, and speech recognition [6], variants of CNNs have also been applied in bioinformatics, ranging from gene expression regulation [7, 8, 9] and anomaly classification [10, 11, 12] to prediction of protein secondary structure [13, 14] and protein folds [15, 16, 17, 18].

Figure 1:

Neural network design. (A) The generative architecture encodes and interpolates between different conformational states, applying convolutional loss functions to enforce physical characteristics at random midpoints between given conformations. (B) Protein atoms can be sorted in a list so that atoms adjacent in the list are also adjacent in Cartesian space. The convolutional neural network operates on this list, which can be of arbitrary size. (C) The first 1D convolution layer learns feature detectors, each with a fixed kernel size. The stride is set to 2, so the output sequence is half the input size for any input length. Each subsequent layer further reduces the spatial length of the molecule, warping the input such that it becomes progressively deeper and thicker (more ribbon-like) as well as more abstract.

We present a 1D CNN architecture (Figure 1A) that directly processes molecular structures. The network takes as input two different conformations, encodes them, and interpolates between their latent representations, which are then decoded back into molecular structures. New physics-based convolutional loss functions are introduced to enforce physical properties of the generated results, in particular along the path between the known conformational states.

This architecture has several significant advantages over conventional networks taking atomic coordinates [19] or molecular features as input. First, being fully convolutional, it features a small number of parameters and is therefore easy to train. Second, it can handle input molecules with arbitrary numbers of atoms, enabling network training with different molecules, either simultaneously or via transfer learning. Third, it does not assume that the data is Gaussian-distributed around the observations used for training. In order to ensure that our network identifies plausible transition paths between known states, we apply a new physics-based convolutional loss function, leading to better generalisation to biologically relevant intermediate configurations.

2 Network architecture

Proteins are defined by their amino acid sequence, and each sequence maps onto an ensemble of possible three-dimensional atomic arrangements (conformations). The space of possible conformations associated with a specific sequence may be extremely reduced for a protein adopting a single well-defined state, or broad for a flexible protein capable of interconverting between multiple states. As the proteome is vast and many proteins are resistant to most forms of experimental interrogation, only a relatively small collection of proteins has had at least one of its possible conformations revealed at atomic resolution. Furthermore, as the techniques characterizing molecular structures typically report on low-energy conformations, transition states are undersampled proportionally to their associated energy barrier.

From a machine learning perspective, we can define the entire proteome as a distribution $P(p, x)$, where $p$ is a protein and $x$ a conformation of that protein, determined either experimentally (e.g. nuclear magnetic resonance spectroscopy, X-ray crystallography) or computationally (e.g. Monte Carlo or MD simulations). We wish to learn a low-dimensional embedding $z = E(x)$ that maps conformations onto a latent space, where sampling any point $z$ and taking the inverse $D(z)$ yields a continuous space of physically plausible molecular structures. However, as the observed conformations relax into a subset of low-energy states [20], the behaviour at the valley regions of the manifold (maxima in the energy landscape) is difficult to capture explicitly from the observations.

Let $E$ be an encoder function with parameters $\phi$, and $D$ be a decoder function with parameters $\psi$ that approximates the inverse accordingly. The conventional approach is a simple reconstructive autoencoder that minimises the mean squared error loss $L_{MSE}$, where:

$$L_{MSE} = \lVert x - D(E(x)) \rVert^2$$

This follows the geometry and probability distribution from which the dataset was collected and therefore fails to generalise, especially in undersampled regions associated with transition states.
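As a minimal sketch of this baseline, the reconstruction loss can be written as follows (a hypothetical `mse_loss` helper operating on NumPy arrays, rather than the PyTorch tensors used in the actual implementation):

```python
import numpy as np

def mse_loss(x, x_hat):
    """Mean squared reconstruction error over all atomic coordinates."""
    return np.mean((x - x_hat) ** 2)

# Toy "conformation": 5 atoms x 3 Cartesian coordinates.
x = np.arange(15, dtype=float).reshape(5, 3)
x_hat = x + 0.1            # a slightly perturbed reconstruction
loss = mse_loss(x, x_hat)  # uniform 0.1 A offset -> MSE of 0.01
```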

Proteins undergo conformational changes following lower energy paths in their energy landscape, where transition states are expected to be saddle points. The protein's expected energy, as determined by its atomic interactions, can be expressed as a loss function $L_{phys}$, defined as:

$$L_{phys} = e\big(D(E(x)),\, T\big)$$

where $e$ is the error between the physical properties of the decoded structure (bond lengths, angles, etc.) and the target properties $T$. However, as with the naïve autoencoder, this also fails to generalise in regions far from the conformations provided as examples. In principle, it is possible to enforce physics in regions outside the known conformational space, for example with a Gaussian prior on the latent encoding; however, this makes an assumption about the distribution of the latent space.

Any midpoint along the geodesic (i.e. the shortest path on the learned manifold) between any two protein conformations will also be a protein of the same connectivity and composition. Assuming a degree of convexity of the latent space, we can enforce our physics-based loss function at random midpoints between $z_1 = E(x_1)$ and $z_2 = E(x_2)$, sampled linearly from a uniform distribution:

$$z_s = (1 - s)\, z_1 + s\, z_2, \qquad s \sim U(0, 1)$$

where $z_s$ are midpoints in the latent space between two protein conformations. This enables physical characteristics to be enforced on the manifold between two points, but outside of the known sampled conformational space.
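The midpoint sampling can be sketched as follows (NumPy, with hypothetical 2-D latent codes; in the actual network, the physics-based loss is then evaluated on the decoded midpoints):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_midpoints(z_a, z_b, n=8):
    """Sample n points uniformly along the segment joining two latent codes."""
    s = rng.uniform(0.0, 1.0, size=(n, 1))   # interpolation coefficients
    return (1.0 - s) * z_a + s * z_b

z_a = np.array([1.0, -1.0])   # hypothetical latent codes of two known states
z_b = np.array([-1.0, 1.0])
z_s = sample_midpoints(z_a, z_b)
# Every sampled point lies on the segment between z_a and z_b, in regions
# of the latent space where no training conformation may exist.
```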

We note that, in principle, this approach allows for the training of a neural network with simulations of multiple proteins, whereby pairs (to interpolate between) are picked from the same simulation.

Putting this together, we can assemble the final loss as a weighted sum of the geometric term and the path term (which itself comprises various individually weighted physics terms):

$$L = \alpha\, L_{MSE} + \beta\, L_{phys}(z_s)$$
2.1 Masked bonded loss

Atoms in a protein structure file are listed residue-by-residue, so the positions of atoms throughout the list are spatially coherent: atoms involved in a common bond or angle are most likely adjacent in the list. We developed an efficient way to assess bonds and angles within a protein by expressing $L_{phys}$ as differentiable 1D convolutions with a small number of pre-defined kernels that extract specific vectors relative to each atom in the list. The output of the convolution operator is then multiplied by a binary mask to filter only those vectors describing covalently interacting atoms.

For a given protein $p$, the function returns sets of binary masks $M_i$, target properties $T_i$, and pre-defined convolution kernels $K_i$. The error function is defined as the sum of squared differences between the current properties $P_i$ and the target properties, multiplied by the mask and normalised by the sum of the masked elements:

$$e_i = \frac{\sum M_i \odot (P_i - T_i)^2}{\sum M_i}$$

where the final loss is the mean over each case $i$.

To assess bond lengths, we convolve each (x, y, z) component of the atomic positions with a finite-difference kernel, yielding the vector between consecutive atoms in the list:

$$b_i = x_{i+1} - x_i$$

Similarly, for angles we take the dot product of the two vectors flanking the central atom, using convolutions on the inner components:

$$\theta = \arccos\left(\frac{u \cdot v}{\lVert u \rVert\, \lVert v \rVert}\right)$$

where the norms in the denominator are taken over the three components. For instance, to calculate the N-CA-C angle from atomic positions sorted as [N, CA, CB, C], the kernels and masks are set so as to extract the N-CA and CA-C vectors while skipping CB. A list of all masks adopted in this work is provided in the Supplementary Material (Table S1).
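The bond-length part of this masked loss can be sketched as follows (NumPy in place of the PyTorch implementation; the forward-difference kernel, mask values and target length are illustrative, not those of Table S1):

```python
import numpy as np

def bond_length_error(pos, mask, target):
    """Bond-length term: a finite-difference kernel over the atom list,
    masked so that only covalently bonded neighbours contribute."""
    # Convolving each (x, y, z) component with [1, -1] yields the vector
    # between consecutive atoms in the list (N-1 vectors for N atoms).
    vecs = np.stack([np.convolve(pos[:, d], [1, -1], mode="valid")
                     for d in range(3)], axis=1)
    lengths = np.linalg.norm(vecs, axis=1)
    err = mask * (lengths - target) ** 2
    return err.sum() / mask.sum()

# Toy backbone: 4 atoms on a line, 1.5 A apart; the third consecutive
# pair is flagged as non-bonded and therefore masked out.
pos = np.array([[0.0, 0, 0], [1.5, 0, 0], [3.0, 0, 0], [4.5, 0, 0]])
mask = np.array([1.0, 1.0, 0.0])
```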

2.2 Warped non-bonded loss

The non-bonded potential consists of a sum of two terms, describing van der Waals and electrostatic interactions, evaluated for any pair of atoms not involved in a mutual bond, angle or dihedral. Let $r$ be the distance between two atoms; we describe their van der Waals interaction as a 12-6 Lennard-Jones potential $V_{LJ}$:

$$V_{LJ}(r) = \epsilon \left[ \left( \frac{r_m}{r} \right)^{12} - 2 \left( \frac{r_m}{r} \right)^{6} \right]$$

where $r_m$ is the equilibrium interatomic distance and $\epsilon$ the depth of the potential well. However, since side-chain atoms (except CB) are not used in the network training, only the repulsive term of the Lennard-Jones potential is applied. This prevents the protein structure from packing excessively, filling the voids left by the missing side-chain atoms. In this work, we set $\epsilon$ to a constant and adopted the combination rule on the van der Waals radii to determine $r_m$ for each atomic pair. Thus, the Lennard-Jones potential is defined as:

$$V_{LJ}(r) = \epsilon \left( \frac{r_m}{r} \right)^{12}$$
Electrostatic interactions are described by a Coulombic potential $V_C$:

$$V_C(r) = \frac{q_i q_j}{4 \pi \varepsilon_0 r}$$

where $q_i q_j$ is the product of the respective atomic charges. As $r \to 0$, gradient descent becomes unstable at short inter-atomic distances, while clamping causes the gradients to get stuck in the corners of the hypercube. Therefore, we approximate the non-bonded potentials by warping the input space piecewise with an exponential. This is achieved equivalently and efficiently with the ELU [21] intrinsic activation function for some offset constant $c$, where:

$$\tilde{r} = c + \mathrm{ELU}(r - c)$$

such that the potentials now tend to high positive or negative values, without significantly altering the profile of the potential well (Figure S1). We choose one offset for $V_{LJ}$ and another for $V_C$, giving large positive or negative y-intercepts for the expected upper and lower bounds of the two potentials accordingly.
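One plausible reading of this warping can be sketched in NumPy (the constants `c`, `r_m` and `eps` are illustrative, not the paper's values): the distance is left unchanged above the offset `c` and decays exponentially towards a positive floor below it, so the repulsive term stays finite, though large, as the distance approaches zero.

```python
import numpy as np

def elu(x, alpha=1.0):
    """Exponential Linear Unit: identity above 0, exponential below."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def warped_repulsion(r, r_m=3.0, eps=1e-4, c=1.0):
    """Repulsive Lennard-Jones term evaluated on an ELU-warped distance."""
    r_warp = c + elu(r - c)        # piecewise: r for r > c, exp decay below
    return eps * (r_m / r_warp) ** 12
```

At `r = 0` the warped distance bottoms out at `c + (exp(-c) - 1) > 0`, giving a large but finite, differentiable penalty instead of a singularity.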

3 Results

MurD (UDP-N-acetylmuramoyl-L-alanine:D-glutamate ligase) [22] is a ligase playing a key role in the peptidoglycan biosynthesis of almost all bacterial species by catalyzing the addition of D-glutamic acid to UDP-N-acetylmuramoyl-L-alanine. The protein consists of three globular domains, one of which (residues 299-437) undergoes a large-scale rearrangement (from open to closed state) triggered by substrate binding to activate the catalytic site. The open (PDB: 1E0D [23]) and closed (PDB: 3UAG [24]) states, as well as a few intermediates (PDB: 5A5E and 5A5F [25]), have been crystallized, providing key experimental evidence about the protein's possible mode of action. MD simulations of the open and closed states, and of the transition between them, have been previously carried out [19], providing a useful dataset for network training and performance evaluation. Such extensive data, together with the importance of MurD as a potential antibacterial drug target [26, 27], make this protein a particularly interesting test case for this study.

Here, we train our neural network with MD-generated conformations of MurD open and closed states (training set), and assess the network capacity of predicting a possible transition path between the two. We assess the quality of the predicted path in terms of its structural quality as well as matching with the closed-to-open MD simulation and available intermediate crystal structures. We then evaluate the capacity of our network trained on MurD to adapt to another protein in a transfer learning scenario.

3.1 Force field-based loss functions improve network accuracy

Figure 2:

Performance of the neural network trained with four different loss functions featuring an increasing number of physical terms (labelled on the left). At every epoch, each network was asked to generate 20 conformations interpolating from the MurD closed to open state. (A) At each epoch, we calculate the total error of each conformation, defined as the sum of the differences between all measured bonds and angles and their accepted (force field-based) values. Mean values are reported on the vertical axes, standard deviations in colour. Physics-based loss functions lead to interpolations with lower errors. (B) The network-predicted protein conformations of the open (left), intermediate (centre) and closed (right) states at the last epoch, shown in sausage representation with the thickness and colour corresponding to the percentage error at the residue level.

We first assessed whether training the neural network using information on the physical properties of proteins is beneficial. To this end, we trained our neural network with four different loss functions featuring an increasing amount of physics-based constraints (Figure 2). Each network was trained with conformations produced by the MD simulations of MurD closed and open states, and then challenged to produce 20 intermediates along the predicted transition path between the two states. In order to assess the quality of intermediate conformations throughout the training, we measured each of their bonds and angles, and compared these measures to the expected equilibrium values within the Amber ff14SB force field [28]. Thus, for each intermediate conformation at every training epoch, we determined a total error in the bonded parameters (see Methods section and Figures S2 and S3).
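The bonded error metric can be sketched as follows (a hypothetical `percent_error` helper on bond lengths only; the paper's full definition also aggregates angle deviations, which we do not reproduce here):

```python
import numpy as np

def percent_error(measured, reference):
    """Mean absolute deviation from force-field equilibrium values, in %."""
    measured, reference = np.asarray(measured), np.asarray(reference)
    return 100.0 * np.mean(np.abs(measured - reference) / reference)

# Hypothetical C-N bond lengths measured in a generated model, compared
# against an equilibrium value of 1.335 A.
err = percent_error([1.30, 1.40, 1.335], [1.335] * 3)
```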

The network trained without any physics-based constraints (using just the Mean Square Error, MSE) could only poorly reproduce MurD's expected bonded parameters (average error of intermediate structures equal to 22.9% ± 2.3). Notably, the interpolation quality degraded the further the intermediate structure was from examples within the training set (errors of up to 26.6%). The regions of the protein featuring the highest error were concentrated in loops and on the mobile domain of the protein (Figure 2B: MSE).

Coupling bond information with MSE in the loss function only slightly improved the quality of the generated protein structures (Figure 2: Bonds), with average errors equal to 23.3% ± 2.3. A substantial improvement was then obtained when training a network with MSE, bonds and angles: intermediate conformations featured an average error of 12.9% ± 0.8, with a worst case of 14.4% (Figure 2: Bonds + Angles). Mapping bond and angle errors onto the MurD structure revealed that the only regions of lower quality were within a few loops.

Figure 3: Latent space landscapes of all physical loss function terms combine into a clearly defined minimum. White points indicate the projection in the latent space of all training examples, including the generated midpoints.

In addition to bonded terms (bonds and angles), protein conformations are also modulated by non-bonded interactions such as electrostatics and van der Waals forces. Including the non-bonded potentials in the loss function along with the MSE and bonded terms further improved the quality of the transition state conformations (total errors of 12.6% ± 0.9, Figure 2: Bonded + Non-bonded). Mapping the training set and the associated random midpoints onto the latent space revealed that these lie in the centre of a clearly defined, near-convex minimum (Figure 3).

In order to obtain more detailed information on the structural quality of each intermediate model, we assessed their Discrete Optimized Protein Energy (DOPE) score [29]. The DOPE score is an atomic distance-dependent statistical potential commonly used to assess experimentally determined and computationally predicted protein structures. Models generated by the network featuring non-bonded potentials in its loss function achieved a slightly better score (-34173 ± 1648) than those of the network without this term (-34104 ± 1703 at best), and the residue-level quality of the models generated by the best network was consistent with the profile of the MurD crystal structure (PDB: 3UAG, see Figure S4 and Supplementary Material).

Overall, training our 1D CNN with a loss function featuring a combination of MSE, bonded and non-bonded terms yielded interpolations of good structural quality through regions not sampled in the training set.

3.2 Neural network predicts a possible state transition path

Figure 4: State transition path prediction by the neural network. (A) Side view of MurD showing the transition of its mobile domain from the closed to the open state (with the time evolution marked as beads coloured from yellow to violet). (B) The conformational change of MurD can be described in terms of spherical coordinates between its three domains (see Methods). We report the opening angle of each conformation in the training set (light green), test set (dark green) and intermediate crystal structures (palatinate stars), as well as interpolations generated by PCA (grey circles), a purely geometry-based neural network (MSE, grey triangles) and a neural network combining geometry and physics (DNN, grey squares). The interpolation produced by the DNN best reproduces the path described by the test set, and transits in the vicinity of the intermediate crystal structures. (C) Superimposition of the intermediate MurD conformation 5A5E (in palatinate) and an intermediate conformation generated by the neural network (in white), with an RMSD of 1.2 Å.

The switch of MurD between closed and open conformations involves the rigid-body rearrangement of one domain (residues 299-437) with respect to the rest of the protein (residues 1-298). We can readily characterize this movement by tracking the position of the center of mass of this domain with respect to its connection to the rest of the protein, reported in spherical coordinates (Figure 4A). Describing the closed and open MurD MD simulations according to this metric reveals that the conformations of these two states are clearly distinct (Figure 4B). The MD simulation of MurD switching from the closed to the open state (hereon, test set) follows an irregular path: first, a concerted increase in elevation and azimuth opens the domain, leading to conformations closely resembling the crystal structures of the intermediates (RMSD of secondary structure elements equal to 1.16 and 1.12 Å versus PDBs 5A5E and 5A5F, respectively); then, an increase in azimuth and radius leads the domain to its final equilibrium position. Methods relying on purely geometric interpolations should be unable to predict such a two-step process. Indeed, the transition path between open and closed states generated by principal component analysis (PCA, as a linear interpolation within the simulations' eigenspace) traces a near-uniform path far from what was observed in the test set. Considering more than 2 eigenvectors brought no significant improvement in the interpolations generated by PCA (Figure S5).
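The opening-angle metric used here can be sketched as follows (NumPy, assuming equal atomic masses and a hypothetical hinge point; the paper's exact reference frame is described in the Methods):

```python
import numpy as np

def domain_spherical(coords, hinge):
    """Centre of mass of a domain in spherical coordinates (radius,
    azimuth, elevation) relative to a hinge point; equal masses assumed."""
    v = coords.mean(axis=0) - hinge        # hinge -> centre-of-mass vector
    r = float(np.linalg.norm(v))
    azimuth = float(np.arctan2(v[1], v[0]))
    elevation = float(np.arcsin(v[2] / r))
    return r, azimuth, elevation

hinge = np.zeros(3)                                    # hypothetical hinge
domain = np.array([[1.0, 1.0, 0.0], [3.0, 3.0, 0.0]])  # toy mobile domain
r, az, el = domain_spherical(domain, hinge)
```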

The transition path predicted by the neural network trained solely on MSE is less uniform, but nevertheless only poorly recapitulates the test set. The interpolation produced by the network trained with MSE, bonded and non-bonded terms instead traces a path closely resembling the test set. The additional physics-based terms in the loss function not only helped in generating structures with correct bond lengths and angles, but also prevented an interpolation that would have required the protein domains to slightly interpenetrate. Remarkably, one of the interpolated conformations had a backbone RMSD of 1.18 Å from the intermediate crystal structure (PDB: 5A5E), a value comparable to that of the test set (Figure S6).

Our results thus show that, in this application, our 1D CNN trained with a combination of MSE and physics-based terms was capable of correctly identifying a possible non-linear conformational change between two distinct states.

3.3 Transfer learning improves convergence

Training a neural network on a prohibitively small dataset can be possible if the network is pre-trained on a similar, larger dataset, a process called transfer learning. The convolutional nature of our network enables this training approach, as its architecture (and thus the number of its parameters) is independent of the number of atoms in the dataset under study. In order to assess the transferability of our trained network, we leveraged the MD simulation of the HIV-1 capsid protein p24. p24 is about half the size of MurD (24 kDa vs. 47 kDa, respectively) and consists of two rigid domains connected by a flexible linker, allowing for a broad range of conformations (Figure 5A). We clustered its simulation and selected 100 and 50 representative cluster centroids (see Methods). For both datasets, we then trained our neural network twice: once from scratch, and once by initializing its parameters with those of the best network trained on the larger MurD dataset using our physics-based loss function. Each neural network was assessed according to its capacity of interpolating between the two most different conformations (largest RMSD) in the training set.
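The transfer-learning initialisation amounts to copying the pre-trained parameters into a freshly built network, which the fully-convolutional design makes possible regardless of protein size. A toy PyTorch sketch (the two-layer network below is illustrative, not the paper's architecture):

```python
import torch
import torch.nn as nn

def make_net():
    # Toy fully-convolutional encoder: its parameter count is independent
    # of the input length, so weights transfer between different proteins.
    return nn.Sequential(
        nn.Conv1d(3, 8, kernel_size=4, stride=2, padding=1),
        nn.ReLU(),
        nn.Conv1d(8, 16, kernel_size=4, stride=2, padding=1),
    )

pretrained = make_net()   # stands in for the MurD-trained network
fresh = make_net()        # network to be trained on the new protein (p24)
fresh.load_state_dict(pretrained.state_dict())  # transfer-learning init

# The same weights now process a protein with a different number of atoms.
out = fresh(torch.randn(1, 3, 500))             # toy 500-atom protein
```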

In both cases, the networks having their parameters initialized by transfer converged faster than those trained from scratch, and generated interpolations with lower percentage errors in terms of both mean and standard deviation (Figure 5B). Furthermore, in both cases there was no significant loss in the structural quality of the p24 intermediates generated by the network trained by transfer learning (total percentage errors of 15.0% ± 3.5 and 16.3% ± 4.2 with training set sizes of 50 and 100, respectively). The residue-level quality of the intermediate conformations was also reflected in their DOPE score profile (Figure S7), with the only poor-quality region (positive score) located at a long flexible loop (residues P75-R87).

These results demonstrate that our network can be trained with proteins of arbitrary size, and indicate that transfer learning enables faster training convergence, even when training data is limited.

Figure 5: Transfer learning of our best network trained on MurD to the HIV-1 capsid protein p24. (A) Side view of p24 showing its mobile domain in yellow, with exemplar alternative positions shown in transparency. (B) Moving average (window of 3) of the mean and standard deviation of the total error percentage, comparing the performance of the pre-trained network with its counterpart trained from scratch, both using the same loss function featuring both MSE and physics-based terms. The pre-trained network features a lower mean and standard deviation. The slight increase in mean error at 150 epochs is caused by the triggering of the non-bonded terms evaluation in the loss function (see Methods).

4 Discussion and conclusion

Conformational dynamics are an intrinsic property of any protein, their magnitude depending on the required biological function. Several computational methods have been designed to sample the protein conformational landscape and predict existing conformational states. Methods leveraging MD with advanced sampling to overcome energy barriers between different conformational states are limited by the timescales at which the transitions occur and the associated high demand for computational power. We have designed and trained a generative neural network with collections of protein conformations as a means to predict plausible intermediates between any of them. While here the network has been trained with structures from MD simulations, any source of structural information can in principle be used. Since as few as 50 structures were sufficient for our trained network to produce physically correct models, this opens the door to directly leveraging collections of experimentally determined atomic structures.

Generative networks [3, 4, 5] are effective when interpolating between existing data but, unless additional information is provided, their extrapolation capabilities are typically insufficient to generate plausible molecular structures. An additional difficulty when training a neural network with protein atomic coordinates is the very high number of degrees of freedom to be handled (often >1000 individual x, y, and z coordinates), making the training process arduous. To overcome these limitations, we have designed a 1D CNN architecture and introduced convolutional physics-based terms in the loss function. The convolutional architecture, already commonplace for image and text processing as well as bioinformatics tasks, is associated with a smaller number of trainable parameters, and is independent of the number of degrees of freedom (and thus atoms) within the training set. Our physics-based loss terms directly constrain learning towards protein structures obeying essential force field potentials and, being fully convolutional, make the neural network faster to train. To further facilitate the training process, we have introduced a warping of the non-bonded potential terms, leading to better gradients encouraging convergence in otherwise high Lipschitz constant conditions. To test our network, we purposely designed a set of hard tasks by selecting the most different conformations from protein simulations featuring large conformational changes, and asking the neural network to generate suitable interpolations by leveraging only a 2-dimensional latent space and a limited number of training examples.

Challenged with the flexible protein MurD, our network could produce a suitable non-linear transition path between two distinct states, closed and open. Remarkably, conformations along this path were both physically plausible and compatible with an existing MD simulation featuring a closed-to-open transition, as well as with two crystal structures of MurD locked in intermediate conformations. While the MSE term in the loss function was key for the network to learn the global protein shape, the addition of physics-based terms was crucial to generating low-energy conformations, associated with a low DOPE score (Figure S4). The bonded terms (bonds and angles) helped fine-tune local atomic arrangements, whereas the non-bonded ones (electrostatics [30] and Lennard-Jones [31]) played a significant role in identifying a suitable transition path within the conformational space. The structures produced by our new network turned out to be both of lower energy and of greater biological relevance than those obtained by PCA-based interpolations or by a neural network trained solely with an MSE-based loss function (Figure 4B and C). By analyzing the loss function values within the 2-dimensional latent space of our physics-based neural network, we observed a clearly defined and near-convex minimum, despite the network having been trained with clearly distinct conformational states (Figure 3).

The convolutional nature of our network allows its training with conformations of proteins of arbitrary amino acid sequences. This also enables transfer learning, whereby a pre-trained network is re-purposed to tackle a new though related task, expected to lead to improved generalisation and faster network optimization. In this context, we noted that different proteins still share common features (e.g. typical bond lengths and angles, as well as the same sequence of atoms in the backbone) that should not be re-learned. We transferred the network trained on MurD to a different protein, the HIV-1 capsid protein p24, for which we provided only a very limited number of training examples (as few as 50). After a few epochs, the total error associated with structures generated by the pre-trained network dropped below that of models produced by the network trained from scratch.

The intermediate structures generated by our neural network will feature an overall low energy according to a subset of typical terms associated with molecular structures, namely their bond, angle, van der Waals and electrostatic interactions. Still, a more detailed analysis indicated that our neural network may generate suboptimal loop regions when these are highly flexible and the protein movement is dominated by larger domain-level conformational changes. Under those premises, the energy of predicted intermediates will not be an accurate estimate of the energy barriers associated with a conformational change, and the network should not be expected to predict new, completely unseen states. Nevertheless, knowledge of possible transition paths can provide guidance for the definition of appropriate reaction coordinates/collective variables in MD-based enhanced sampling schemes. Furthermore, as the neural network effectively transforms a discrete collection of protein conformations into a "conformational continuum", it can find applications in flexible protein-protein docking scenarios where, under the conformational selection paradigm, the ability of fine-tuning protein conformations to maximize their compatibility with a binding partner is desirable.


Future work will include incorporating torsional angles and solvation energy terms in the scoring function. We expect this to further enhance the predictive power of the network, as these terms play an important role in the correct folding of proteins. A current limitation of our approach is that linearly interpolating between pairs of latent points assumes a degree of convexity in the latent space. In the future, we will consider alternative interpolation approaches suitable for non-convex latent spaces, and train our network on larger datasets featuring conformations of multiple proteins.

5 Method

5.1 Implementation

The atoms (points) of a molecular structure (such as in a PDB file, the standard representation for molecular 3D structure data; Figure 1B) can be sorted into a list so that covalently connected atoms appear close to each other.
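This ordering can be sketched with a minimal PDB reader: because PDB files list atoms residue by residue, simply keeping the selected atoms in file order places covalently bonded atoms near each other in the list. The field offsets follow the fixed-column PDB format, but the helper itself and the embedded example record are illustrative, not the paper's code.

```python
BACKBONE = {"N", "CA", "C", "O", "CB"}  # backbone plus side-chain C-beta

def ordered_atoms(pdb_text):
    """Return (atom_name, residue_number, xyz) tuples in file order.

    PDB files list atoms residue by residue, so covalently bonded atoms
    end up close to each other in the resulting list."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            name = line[12:16].strip()       # atom name, columns 13-16
            if name in BACKBONE:
                resid = int(line[22:26])     # residue number, columns 23-26
                xyz = (float(line[30:38]),   # x, y, z coordinates
                       float(line[38:46]),
                       float(line[46:54]))
                atoms.append((name, resid, xyz))
    return atoms

sample = (
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00\n"
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00\n"
    "ATOM      3  C   ALA A   1      10.751   6.809  -4.148  1.00  0.00\n"
    "ATOM      4  O   ALA A   1      10.824   8.034  -4.013  1.00  0.00\n"
    "ATOM      5  CB  ALA A   1      13.062   6.609  -5.100  1.00  0.00\n"
    "ATOM      6  H   ALA A   1      11.104   5.134  -7.004  1.00  0.00\n"
)
```

The flattened coordinate list of such a sequence is what the 1D convolutions below operate on, one channel per Cartesian axis.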

The 1D CNN architecture was implemented in PyTorch with 12 layers (6 in the encoder and 6 in the decoder components). The first layer has 3 input channels (one per atomic 3D coordinate) and 32 output channels, with subsequent layers increasing the number of channels with depth, each followed by batch normalisation and ReLU activations. Convolutions use strided kernels with padding chosen so that the spatial dimension is halved at each layer. This means the molecule becomes like a progressively thicker but shorter ‘ribbon’, whose activations correspond to deeper and more abstract characteristics of the molecule (Figure 1C). The latent space was fixed to two dimensions using a 1D adaptive mean pooling layer to handle input sequences of arbitrary length.
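The halving behaviour follows from the standard output-length formula of a strided 1D convolution. A small sketch below traces the atom-axis length through the encoder; only the number of layers and the halving come from the text, while the kernel size, stride and padding (4, 2 and 1, a common combination that halves even-length inputs exactly) are assumptions.

```python
def conv1d_out_len(length, kernel, stride, padding):
    """Standard output-length formula for a strided 1D convolution."""
    return (length + 2 * padding - kernel) // stride + 1

def encoder_lengths(n_atoms, n_layers=6, kernel=4, stride=2, padding=1):
    """Spatial length of the atom axis after each encoder layer.

    Kernel/stride/padding values are assumptions; the adaptive mean
    pooling that follows the last layer removes any dependence of the
    latent dimension on the residual length."""
    lengths = [n_atoms]
    for _ in range(n_layers):
        lengths.append(conv1d_out_len(lengths[-1], kernel, stride, padding))
    return lengths

# e.g. a 2048-atom representation shrinks by half at every layer
lengths = encoder_lengths(2048)
```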

We found that increasing the dimension of the latent space, adding an adversarial discriminating component, or adding residual layers led only to negligible generalisation improvements. The total number of parameters is 11,892,767, optimised using Adam with fixed learning rate, weight decay and batch size for 200 epochs of 1,000 optimisation steps each. Training the model takes 7 hours on an NVIDIA TITAN Xp graphics card. In preliminary tests, we found that including non-bonded terms in the loss function only towards the end of the convergence of the bonded terms (here, after 150 epochs) was beneficial for optimizing the atomic arrangements.
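The staged loss schedule, in which non-bonded terms are switched on only once the bonded terms have largely converged, can be sketched as a simple epoch-dependent weighting. The function names, the hard 0/1 switch and the unit weights are illustrative assumptions; only the switch-on epoch (150) comes from the text.

```python
def loss_weights(epoch, switch_epoch=150):
    """Return (bonded_weight, nonbonded_weight) for a given epoch.

    Bonded terms (bonds, angles) are optimised from the start; non-bonded
    terms (van der Waals, electrostatics) are enabled only after
    switch_epoch. The hard 0/1 switch is an assumption; a gradual ramp
    would serve the same purpose."""
    bonded = 1.0
    nonbonded = 1.0 if epoch >= switch_epoch else 0.0
    return bonded, nonbonded

def total_loss(bonded_term, nonbonded_term, epoch):
    """Combine the physics terms according to the current schedule."""
    wb, wn = loss_weights(epoch)
    return wb * bonded_term + wn * nonbonded_term
```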

Each neural network presented in this work was trained 10 independent times; all reported values (loss function means and standard deviations, as well as DOPE scores of the resulting interpolated structures) are averages over these 10 repeats.

5.2 Datasets

MurD has been crystallized in its closed (PDB: 3UAG [24]) and open (PDB: 1E0D [23]) states, as well as in intermediates between the two (PDB: 5A5E and 5A5F [25], with a backbone RMSD over secondary structure elements of 1.12 Å between them). The difference between these states arises from a large-scale conformational change of one of its three globular domains (residues 299-437) caused by substrate binding. Conformations from MD simulations of the closed and open states performed by some of the authors [19] were used here as the training dataset (4420 conformations in total, comprising 2507 and 1913 from the closed and open simulations, respectively). To evaluate the predictive performance of our network in interpolating between these conformational states, a third set of MD simulations was performed with the ligand removed from the closed state (closed-apo) [19]. Two additional repeats of this simulation were performed by us with the same simulation protocol as [19]. This produced a total of 1513 conformations representing a closed-to-open transition unseen in either the closed- or open-state simulations.

The dataset of 1E6J was obtained from [32] by clustering the simulation trajectory with average-linkage hierarchical agglomerative clustering. Applying a heavy-atom RMSD cutoff of 3 Å, 100 centroids were selected as representative structures; 50 of these were randomly chosen to test the performance of transfer learning on a smaller dataset.

For all datasets, we selected the backbone atoms (N, Cα, C, O) as well as the side-chain Cβ atoms as representatives of the protein structure. These are sufficient to describe the global conformation (fold) of a protein and the orientation of each side chain. The two extreme conformations in both cases (MurD and p24) were selected by calculating a backbone RMSD matrix over all conformations of the training set and picking the pair with the highest RMSD. We generated 20 conformations interpolating between these extrema.
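Selecting the most distant pair from the RMSD matrix can be sketched as below. The sketch assumes conformations are already superposed on a common frame (a real pipeline would first align them, e.g. with the Kabsch algorithm); the function names are illustrative.

```python
import numpy as np

def rmsd(a, b):
    """RMSD between two (n_atoms, 3) coordinate arrays.

    Assumes the structures are already superposed on a common frame."""
    return np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1)))

def most_distant_pair(conformations):
    """Indices of the two conformations with the highest pairwise RMSD,
    i.e. the extrema between which interpolants are generated."""
    n = len(conformations)
    best, pair = -1.0, (0, 0)
    for i in range(n):
        for j in range(i + 1, n):
            d = rmsd(conformations[i], conformations[j])
            if d > best:
                best, pair = d, (i, j)
    return pair
```

The two returned conformations are then encoded, and the path between their latent codes is sampled at 20 evenly spaced points.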

5.3 Percentage error calculation

We used the equilibrium values of bonds and angles from the Amber ff14SB force field [28] to estimate the percentage error in the corresponding bonded parameters modelled by our network. The errors over all bonds and angles present in a conformation were summed to obtain a total error, E_tot = Σ_i 100 |v_i − v_i^eq| / v_i^eq, as a measure of the accuracy of the network in generating physically plausible protein structures. Similarly, per-residue errors were calculated as the sum of the errors associated with interactions involving any atom of a residue.
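This error measure can be sketched directly; the percentage form below is a plausible reconstruction consistent with the text (the original inline formula was lost in extraction), and the function names are illustrative.

```python
def percentage_error(value, equilibrium):
    """Percentage deviation of a modelled bond length or angle from its
    force-field equilibrium value."""
    return 100.0 * abs(value - equilibrium) / equilibrium

def total_error(measured, equilibria):
    """Sum of percentage errors over all bonds and angles of a conformation.

    Summation (rather than averaging) follows the text; the per-residue
    variant simply restricts the sum to interactions touching one residue."""
    return sum(percentage_error(v, v0) for v, v0 in zip(measured, equilibria))
```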

5.4 Analysis of MurD opening angle

The opening angles (azimuth and elevation) of MurD were calculated by aligning the protein along its stable domain (residues 1 to 298), centring the resulting alignment on the hinge between the mobile and stable domains (centre of mass of residues 230 to 299), and reporting the position of the centre of mass of the mobile domain (residues 299 to 437) in spherical coordinates. A schematic representation of the calculated angles is shown in Figure 4A.
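The final spherical-coordinate step can be sketched as follows; the input is the mobile-domain centre of mass expressed in a frame already aligned on the stable domain and centred on the hinge, and the exact axis convention is an assumption.

```python
import math

def opening_angles(com_mobile):
    """Azimuth and elevation (in degrees) of the mobile-domain centre of
    mass, given coordinates in the hinge-centred, stable-domain-aligned
    frame. Axis convention (azimuth in the xy-plane, elevation towards z)
    is an assumption."""
    x, y, z = com_mobile
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation
```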

Visual inspection of the protein structures and preparation of the related figures were done with VMD 1.9.2 [33] and PyMOL [34]. Graphs were plotted using matplotlib [35].

6 Acknowledgments

The work was supported by the Engineering and Physical Sciences Research Council (EP/P016499/1).

7 Author contributions

V.K.R., C.G.W. and M.T.D. designed the research, developed the software, carried out the experiments, interpreted the results and wrote the manuscript.

8 Declaration of interests

The authors declare no competing interests.


  • [1] Robert Zwanzig, Attila Szabo, and Biman Bagchi. Levinthal’s paradox. Proceedings of the National Academy of Sciences, 89(1):20–22, 1992.
  • [2] Martin Karplus and J Andrew McCammon. Molecular dynamics simulations of biomolecules. Nature Structural & Molecular Biology, 9(9):646, 2002.
  • [3] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. International Conference on Learning Representations, abs/1312.6114, 2013.
  • [4] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014.
  • [5] David Berthelot*, Colin Raffel*, Aurko Roy, and Ian Goodfellow. Understanding and improving interpolation in autoencoders via an adversarial regularizer. In International Conference on Learning Representations, 2019.
  • [6] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.
  • [7] Babak Alipanahi, Andrew Delong, Matthew T Weirauch, and Brendan J Frey. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nature biotechnology, 33(8):831, 2015.
  • [8] Jian Zhou and Olga G Troyanskaya. Predicting effects of noncoding variants with deep learning–based sequence model. Nature methods, 12(10):931, 2015.
  • [9] Olgert Denas and James Taylor. Deep modeling of gene expression regulation in an erythropoiesis model. In Representation Learning, ICML Workshop. ACM New York, USA, 2013.
  • [10] H. R. Roth, L. Lu, J. Liu, J. Yao, A. Seff, K. Cherry, L. Kim, and R. M. Summers. Improving computer-aided detection using convolutional neural networks and random view aggregation. IEEE Transactions on Medical Imaging, 35(5):1170–1181, May 2016.
  • [11] Petros-Pavlos Ypsilantis, Musib Siddique, Hyon-Mok Sohn, Andrew Davies, Gary Cook, Vicky Goh, and Giovanni Montana. Predicting response to neoadjuvant chemotherapy with pet imaging using convolutional neural networks. PloS one, 10(9):e0137036, 2015.
  • [12] Tao Zeng, Rongjian Li, Ravi Mukkamala, Jieping Ye, and Shuiwang Ji. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC bioinformatics, 16(1):147, 2015.
  • [13] Buzhong Zhang, Jinyan Li, and Qiang Lü. Prediction of 8-state protein secondary structures by a novel deep learning architecture. BMC bioinformatics, 19(1):293, 2018.
  • [14] Shiyang Long and Pu Tian. Protein secondary structure prediction with context convolutional neural network. BioRxiv, page 633172, 2019.
  • [15] Jie Hou, Badri Adhikari, and Jianlin Cheng. DeepSF: deep convolutional neural network for mapping protein sequences to folds. Bioinformatics, 34(8):1295–1303, 12 2017.
  • [16] Rafael Zamora-Resendiz and Silvia Crivelli. Structural learning of proteins using graph convolutional neural networks. bioRxiv, page 610444, 2019.
  • [17] Mu Gao, Hongyi Zhou, and Jeffrey Skolnick. DESTINI: a deep-learning approach to contact-driven protein structure prediction. Scientific reports, 9(1):3514, 2019.
  • [18] R Evans, J Jumper, J Kirkpatrick, L Sifre, TFG Green, C Qin, A Zidek, A Nelson, A Bridgland, H Penedones, et al. De novo structure prediction with deeplearning based scoring. Annu Rev Biochem, 77:363–382, 2018.
  • [19] Matteo T Degiacomi. Coupling molecular dynamics and deep learning to mine protein conformational space. Structure, 27(6):1034–1040, 2019.
  • [20] Andrew R Leach and Andrew R Leach. Molecular modelling: principles and applications. Pearson education, 2001.
  • [21] Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and accurate deep network learning by exponential linear units (elus). International Conference on Learning Representations, abs/1511.07289, 2015.
  • [22] Jay A Bertrand, Geneviève Auger, Eric Fanchon, Lydie Martin, Didier Blanot, Jean van Heijenoort, and Otto Dideberg. Crystal structure of UDP-N-acetylmuramoyl-L-alanine:D-glutamate ligase from Escherichia coli. The EMBO journal, 16(12):3416–3425, 1997.
  • [23] Jay A Bertrand, Eric Fanchon, Lydie Martin, Laurent Chantalat, Geneviève Auger, Didier Blanot, Jean van Heijenoort, and Otto Dideberg. “Open” structures of MurD: domain movements and structural similarities with folylpolyglutamate synthetase. Journal of molecular biology, 301(5):1257–1266, 2000.
  • [24] Jay A Bertrand, Geneviève Auger, Lydie Martin, Eric Fanchon, Didier Blanot, Dominique Le Beller, Jean van Heijenoort, and Otto Dideberg. Determination of the MurD mechanism through crystallographic analysis of enzyme complexes. Journal of molecular biology, 289(3):579–590, 1999.
  • [25] Roman Šink, Miha Kotnik, Anamarija Zega, Hélène Barreteau, Stanislav Gobec, Didier Blanot, Andréa Dessen, and Carlos Contreras-Martel. Crystallographic study of the peptidoglycan biosynthesis enzyme MurD: domain movement revisited. PloS one, 11(3):e0152075, 2016.
  • [26] Roman Šink, Hélène Barreteau, Delphine Patin, Dominique Mengin-Lecreulx, Stanislav Gobec, and Didier Blanot. MurD enzymes: some recent developments. Biomolecular concepts, 4(6):539–556, 2013.
  • [27] Tafere Mulaw Belete. Novel targets to develop new antibacterial agents and novel alternatives to antibacterial agents. Human Microbiome Journal, 11:100052, 2019.
  • [28] James A Maier, Carmenza Martinez, Koushik Kasavajhala, Lauren Wickstrom, Kevin E Hauser, and Carlos Simmerling. ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB. Journal of chemical theory and computation, 11(8):3696–3713, 2015.
  • [29] Min-yi Shen and Andrej Sali. Statistical potential for assessment and prediction of protein structures. Protein science, 15(11):2507–2524, 2006.
  • [30] Zhe Zhang, Shawn Witham, and Emil Alexov. On the role of electrostatics in protein–protein interactions. Physical biology, 8(3):035001, 2011.
  • [31] JB Adams. Bonding energy models. Encyclopedia of Materials: Science and Technology, pages 763–767, 2001.
  • [32] Matteo T Degiacomi and Matteo Dal Peraro. Macromolecular symmetric assembly prediction using swarm intelligence dynamic modeling. Structure, 21(7):1097–1106, 2013.
  • [33] William Humphrey, Andrew Dalke, and Klaus Schulten. VMD: visual molecular dynamics. Journal of molecular graphics, 14(1):33–38, 1996.
  • [34] Schrödinger, LLC. The PyMOL molecular graphics system, version 1.8. November 2015.
  • [35] J. D. Hunter. Matplotlib: A 2d graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007.