Log In Sign Up

Normalizing flows for atomic solids

by   Peter Wirnsberger, et al.

We present a machine-learning approach, based on normalizing flows, for modelling atomic solids. Our model transforms an analytically tractable base distribution into the target solid without requiring ground-truth samples for training. We report Helmholtz free energy estimates for cubic and hexagonal ice modelled as monatomic water as well as for a truncated and shifted Lennard-Jones system, and find them to be in excellent agreement with literature values and with estimates from established baseline methods. We further investigate structural properties and show that the model samples are nearly indistinguishable from the ones obtained with molecular dynamics. Our results thus demonstrate that normalizing flows can provide high-quality samples and free energy estimates of solids, without the need for multi-staging or for imposing restrictions on the crystal geometry.


page 1

page 2

page 3

page 4


Graph Dynamical Networks: Unsupervised Learning of Atomic Scale Dynamics in Materials

Understanding the dynamical processes that govern the performance of fun...

Deep neural network for Wannier function centers

We introduce a deep neural network (DNN) model that assigns the position...

Flow Annealed Importance Sampling Bootstrap

Normalizing flows are tractable density models that can approximate comp...

Copula-Based Normalizing Flows

Normalizing flows, which learn a distribution by transforming the data t...

Energy Flows: Towards Determinant-Free Training of Normalizing Flows

Normalizing flows are a popular approach for constructing probabilistic ...

MLSolv-A: A Novel Machine Learning-Based Prediction of Solvation Free Energies from Pairwise Atomistic Interactions

Recent advances in machine learning technologies and their chemical appl...

Efficient CDF Approximations for Normalizing Flows

Normalizing flows model a complex target distribution in terms of a bije...


I Supplemental Material

i.1 Model details

Our normalizing flow is an improved version of the implementation proposed by Wirnsberger et al. [20], which is a sequence of permutation-equivariant coupling layers based on circular splines [26] whose parameters are the output of transformers [27]. We refer the reader to Section V or Ref. [20] for a full description of the flow architecture. The improvements we made in this paper are the following.

  • The coupling layers as used in Ref. [20] have the property that particle coordinates at the edge of the box remain fixed, which limits the flexibility of the flow. To overcome this limitation, we interleave the coupling layers with learned circular shifts, defined by


    were is a learned parameter corresponding to dimension . The parameter is a constant—it does not depend on any of the coordinates—so the Jacobian determinant of the above transformation is equal to .

  • Reference [20] uses a circular encoding of the coordinates prior to feeding them to the transformer in order to encode their periodicity due to periodic boundary conditions. Specifically, a coordinate is encoded as


    where . Here we use a richer encoding that also includes higher-order frequencies, and is defined by



    is the total number of frequencies, a hyperparameter to be tuned. Although the higher frequency inputs could be deduced from the lowest frequency, we found that the higher frequencies significantly helped the network express more flexible functions.

Table 1 lists the hyperparameters we used for the experiments in this paper.

Normalizing flow
Number of layers ()
Number of blocks
Number of heads
Embedding dimension
Number of frequencies in circular embedding () ( particles), ( and particles), 24 ( and particles)
Circular rational-quadratic spline
Number of segments
Base distribution
Standard deviation of truncated Gaussian noise (in reduced units) (LJ), (mW)
Supplementary Table 1: Model hyperparameters.

i.2 Optimization details

All models were trained using the Adam optimizer [41]. Prior to applying the Adam update rule, we clip the norm of the gradient to be below a maximum value, to avoid potential instability due to large gradients. Also, we reduce the learning rate during training at pre-specified training steps by a constant factor.

The pairwise potentials we use in the experiments (the Lennard-Jones potential and the two-body term of the monatomic Water potential) have the property that they diverge for zero pairwise distance. This can cause numerical problems during training, as it can cause the gradients of the loss function to become very large. To avoid this problem, we train on a linearized version of these potentials, defined by:


where is the original pairwise potential and is a distance threshold below which it is linearized. In practice, we set smaller than the typical distance between particles so linearization has a negligible effect on the target Boltzmann distribution.

Table 2 lists the training hyperparameters we used in our experiments.

Batch size
Maximum gradient norm
Learning-rate schedule
Initial learning rate
Learning-rate decay steps k, k
Learning-rate decay factor
Energy linearization
Distance threshold to linearize below () (LJ), (mW)
Adam optimizer
Supplementary Table 2: Training hyperparameters.

i.3 Physical systems under study

We apply our method to two particular physical systems in the solid state: truncated and shifted Lennard-Jones (LJ) and monatomic Water (mW). We now summarize the energy functions of these systems.

i.3.1 Lennard-Jones

The pairwise LJ potential is given by [2]


where is the distance between two particles, defines the unit of energy and the particle diameter the unit of length. To compare free energy estimates with literature values reported in Ref. [29], we employed a spherically truncated and shifted version of the above potential given by [2]


with being a radial cutoff set to . The total potential energy comprises contributions over all pairs of the -particle system described by , where defines the position of the -th atom with coordinates , and is given by


where and the function pbc is applied elementwise yielding the shortest, signed distance for the component of its argument, that is . Finally, we switch to reduced units by choosing as the unit of length and as the unit of energy, and establish the connection with and that we introduced in the main text. See Ref. [2] for further details on reduced units.

i.3.2 Monatomic Water

Monatomic Water [25] models water as point particles interacting via two-body and three-body potentials. The total energy is given by


where is an angle computed for each triplet, and the functions




with , , , , and . The remaining two parameters and set the scales for energy and length, similarly to the case of LJ above.

i.4 MD simulation details

We used molecular dynamics (MD) to simulate the above systems in the canonical ensemble: particles in a fully periodic simulation box of volume and temperature .

All MD simulations were carried out with the package LAMMPS [31], defined by the following parameters (see Table 3 for values). To simulate at constant temperature, we used a Langevin thermostat with damping constant and subtracted a force for the centre of mass to remain stationary (keyword “zero yes”). The equations of motion were integrated using the velocity Verlet algorithm and discretized using a timestep . We initialized particle positions with the target lattice and performed an equilibration run of length followed by a production run of the same length during which we sampled particle positions every timesteps.

LJ 2 1.28 0.2 100
mW 0.1 fs 200 K 1 ps 10 ns 200
Supplementary Table 3: Simulation parameters. For LJ, all quantities are reported in reduced units.

i.5 MBAR details

For comparison with the flow-based free energy estimates, we compute the value of the solid free energy using the multistate Bennett acceptance ratio (MBAR) method [17]. To this end, we decomposed the free energy into two contributions, . The first term is the free energy of an Einstein crystal, which is approximated as [42, 43]


where is the thermal de Broglie wavelength, which we set to the particle diameter following Ref. [34], is the inverse temperature and is the Boltzmann constant. The quantity defines the spring constant of the Einstein crystal with energy


where denotes the position of the lattice site with which particle is associated. To estimate , we introduced intermediate energies

for each system (see below) to interpolate between the ideal Einstein crystal defined by

and the target solid based on a scalar parameters . We then discretized uniformly into values and performed intermediate simulations for each value of during which we collected 10k samples per simulation. The energy matrix computed from the samples was then used as input to MBAR [17]. We repeated this procedure to obtain estimates for a minimum of five different random seeds, and reported the mean and two standard deviation across the seed estimates in Tab. 1 of the main text.

i.5.1 Interpolation stages for Lennard-Jones system

We use a softcore lambda version of the LJ potential to interpolate between and  [44]. We then constructed the energy for the simulations.

i.5.2 Interpolation stages for monatomic Water

For ice, we employed a simple linear interpolation

i.6 Supplementary figures

Supplementary Figure 1: Energy histograms (a) and radial distribution functions (b) of the base distribution, the fully trained model and MD simulation data, for the 512-particle hexagonal ice system.
Supplementary Figure 2: Histograms of work values () per particle from base and model samples for 512-particle hexagonal ice; the vertical line marks the MBAR free energy estimate (enlarged in inset). Top right inset: scatter plot of model density vs approximately normalized target density computed from model samples, where the model is either the base distribution or the fully trained model.
Supplementary Figure 3: Energy histograms (a) and radial distribution functions (b) of the base distribution, the fully trained model and MD simulation data, for the 500-particle Lennard-Jones system.
Supplementary Figure 4: Histograms of work values () per particle from base and model samples for 500 Lennard-Jones particles; the vertical line marks the MBAR free energy estimate (enlarged in inset). Top right inset: scatter plot of model density vs approximately normalized target density computed from model samples, where the model is either the base distribution or the fully trained model.