A framework for a generalisation analysis of machine-learned interatomic potentials
Machine-learned interatomic potentials (MLIPs) and force fields (i.e. interaction laws for atoms and molecules) are typically trained on limited datasets that cover only a very small region of the full space of possible input structures. MLIPs are nevertheless capable of making accurate predictions of forces and energies in simulations involving (seemingly) much more complex structures. In this article we propose a framework within which this kind of generalisation can be rigorously understood. As a prototypical example, we apply the framework to the simulation of point defects in a crystalline solid. Here, we demonstrate how the accuracy of the simulation depends explicitly on the size of the training structures, on the kind of observations (e.g., energies, forces, force constants, virials) to which the model has been fitted, and on the fit accuracy. The new theoretical insights we gain partially justify current best practices in the MLIP literature and, in addition, suggest a new approach to the collection of training data and the design of loss functions.
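For illustration only, and not as the specific objective analysed in the paper, a typical MLIP fit minimises a weighted least-squares loss over the training structures; the weights $w_E$, $w_F$, $w_V$ introduced here are assumed for the sketch and control how strongly energies, forces, and virials are matched:

\[
  \mathcal{L}(\theta) \;=\; \sum_{R \in \mathcal{D}_{\mathrm{train}}}
  \Big( w_E \,\big| E_\theta(R) - E(R) \big|^2
      \;+\; w_F \sum_{i} \big| F_{\theta,i}(R) - F_i(R) \big|^2
      \;+\; w_V \,\big\| V_\theta(R) - V(R) \big\|^2 \Big),
\]

where $E_\theta$, $F_{\theta,i}$, and $V_\theta$ denote the model's predicted total energy, force on atom $i$, and virial for a training structure $R$, and $E$, $F_i$, $V$ the corresponding reference observations. The choice of which observation types to include and how to weight them is the loss-design question the abstract alludes to.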