Accurate and prompt property prediction plays an essential role in discovering and designing novel materials. First principle (viz., ab initio) methods RN4 are commonly used to calculate the properties of candidate materials. However, ab initio methods based on either Density-Functional Theory (DFT) or wavefunction theories are relatively time-consuming. An alternative method is to predict these properties using machine learning (ML) technologies. By modeling the material structure and atomic interactions, ML algorithms are much faster than DFT and are expected to provide similar accuracy RN34 .
Graph Neural Networks (GNNs) have attracted great attention recently because of their flexibility and effectiveness in describing 3D structures formed by discrete particles. Graph Convolutional Neural Networks (GCNNs) model atom distribution using an undirected graph, and they simulate interactions among the atoms by passing messages among vertices. Although most models only handle inputs with fixed format, GCNNs are capable of modeling highly variable structures with varying number of nodes. This flexibility makes GCNNs have wide applications in different research fields such as computer vision and recommend systems. There have been several applications in chemistry with excellent performance on property predictionRN8; RN29; RN14; RN18 . However, currently these methods have some noticeable disadvantages. In particularly, the graph description abandons most 3D structural information, except the relative distance between connected vertices. In chemistry terminology, this means that only bond lengths and coordination numbers are taken into consideration. This overly simplification naturally fails to model atomic interactions accurately. It is reasonable to assume that adding 3D information will improve model’s capability of making accurate predictions.
This paper presents a K-Nearest Atoms Graph Convolution Neural Network (KAGCN) for materials properties prediction. The proposed KAGCN replaces the “summing up” operation by a novel convolutional operation that integrates information from surrounding atoms simultaneously. We applied the model to two distinctly challenging problems on predicting material properties. The first is Henry’s constant for gas adsorption in Metal-Organic Frameworks (MOFs), which is notoriously difficult because of its highly sensitive on inter-atomic separations. The second is the ion conductivity of solid-state crystal materials, which is difficult due to the lack of labeled data available for training. The new model achieves state-of-the-art performance on both data sets, suggesting that some important three-dimensional geometric information is indeed captured by the new model.
2 Related Work
The performance of a machine learning algorithm improves with experience regarding to the task RN6 . The emergence of large-scale data sets of materials, such as the Inorganic Crystal Structure Database (ICSD) hellenbrandt2004inorganic, the Open Quantum Materials Database (OQMD) kirklin2015open, and the Material Project (MP) jain2013commentary, greatly empowers the applications of novel machine learning technologies in the field of materials science. For materials science, an algorithm typically addresses a given problem by optimizing model parameters based on existing material samples that have been experimentally validated. The rich information contained in large-scale data sets makes the complex model feasible with minimal human interference.
The representation of materials plays a critical role in building an effective machine learning system, and it is challenging due to the complexity of the chemical structures. One current strategy is to describe the materials by a group of physical properties that are collected from computational simulations and/or experiments. For example, RN7
designed a transfer learning framework that generalizes a feature-based predictor into a formula-based predictor and screened billions of materials. Their feature-based predictor employed 30 features.RN8 obtained electronic properties using DFT computations and then applied a neural network to predict other properties. These methods require professional knowledge to develop useful features and thus can barely benefit from the increasing sample size. End-to-end optimization is not feasible for these methods either.
Contrary to the concise properties, some researchers preferred to describe the chemical structure in detail. RN5 , for instance, constructed a smooth and continuous three-dimensional density representation of each crystal based on different atomic positions and designed an encoding/decoding system for molecules. RN10 presented the atomic configuration as a two-dimensional periodic table and normalized the table to mimic a digital image. They trained a multitask CNN model that outputs the lattice parameters and enthalpy of formation simultaneously. The experimental results presented that their method achieved competitive performance on OQMD and ICSD.
Graph-based models could handle variant inputs and thus have attracted much attention in chemistry. RN14 described a molecule by a set of n atoms with nuclear charges and atomic positions and proposed to use continuous filter convolutional layers to model local correlations. The Schnet presented excellent performance on molecules property prediction. RN18 described a crystal using an undirected graph, where the vertices and edges correspond to the atoms and bonds, respectively. The atomic interactions were captured by a specific convolution layer, and a pooling layer embedded the graph into the feature space. The CGCNN model presented excellent performance on a broad array of properties, including formation energy, bandgap, Fermi energy, and elastic properties. RN16 adapted the original CGCNN using multitask learning RN17 , and their experiments showed that sharing parameters in the lower level improved the prediction performance. Some researchers tried to integrate additional knowledge in the model for better performance. RN11 developed an universal model to predict property for both molecules and crystals by incorporating state variables, such as temperature, pressure, and entropy. RN19 added a relative position matrix into the model and provided a mechanism to handle the feature update process. Their 3DGCN model showed good performance on predicting molecular properties and biochemical activities.
In theoretical perspective of ML, predicting material property is straightforward. Here, we are looking for a function that takes material features as input, and outputs the property of interest. The input should contain features that reflect observable characteristics of the materials. There are a variety of features and functions used in existing researchRN29 . In this article, the input contains a feature descriptor and the position of each atom in the materials. The parameters of the function are determined using properties of interest of known materials. The neural network is expected to have great generalization capabilities and thus, could effectively predict the property of interest for new materials.
3.1 Data Representation
Effective and concise representation of atomic structures of materials is a challenging topic in the field, and a variety of methods have been proposed RN30 . Among these methods, graph-based methods present excellent computational efficiency and have superiority in modeling the inner complexity of the structures. Models based on GCN have been successfully applied to property prediction for crystals and molecules RN18; RN16 , and the results demonstrated that these models are effective in general.
We represent the material by encoding three pieces of information. The first is the prior chemistry knowledge about the atoms that was extracted from existing information about the materials. In this article, an matrix was calculated using the adaptive ‘atom2vector’ method RN31 , where l is a hyper-parameter that reflects the dimension of the feature of each atom, and is the number of different elements in the data. Each column of
is an l-dimensional feature vector of the corresponding element.summarizes all the priori known knowledge about the materials in the data set and is independent from the specific structure. The second piece is the element type of the atoms in the structure, which is noted by a matrix with column vectors , where is the number of atoms in the unit cell. Each column vector is an -dimensional vector that represents the atom type. The matrix
is the one-hot encodingRN35 for the atoms in the structure: equals to 1 if the -th atom is the -th element in , and otherwise. The third is the coordinates of the atoms in the material, which is a matrix . Each column in is a 3D vector representing the atomic position. With these three pieces of information as input, we aim to find an optimal function F that minimizes the distance between ground truth and the prediction , where is the predicted value of the property of interest.
3.2 Convolution Operation
Like commonly applied GCN models, we use an undirected graph to store the input information, where is the set of vertices and is the set of edges. There is a one-to-one correspondence between the vertices and the atoms. If there is a bond between two atoms at vertices and , an edge is included in the edge set of the graph RN32 . Each vertex in the graph is associated with an element type represented by the one-hot matrix A. Thus, the atom features can be calculated using . We denote the feature vector associating with vertex as . Similarly, each edge associates with a feature vector , which is a vector depending on the length of the edge . In practical, we used , where are prechosen numbers. One can view
as a probability that the information will be passed through the edge.
To simulates the atomic interaction, the associated feature of each vertex in the structure is updated using specific convolution mechanism. A common approach is integrating the effect of all bonded vertices. For vertex , the updated feature is calculated according to equation 1.
is a nonlinear transformation consists of linear transformation and non-linear activation function. The network contains a sequence of convolutions layers and each layer updates the feature vectors for all atoms. The convolution layers are used to simulate atomic interactions, and the finial feature vectors are expected to reflect the stable atomic states.
The convolution described in equation 1 could effectively accumulate the interaction among immediately connected vertices. and has been successful applied in some material property prediction tasks RN18 . However, it does not seem to fit well for metal-organic framework materials, whose properties are usually highly dependent on the geometric structure of the atoms of the framework in a large neighborhood, instead of merely among the bound atoms. Reducing the 3D spatial relation to pairwise distance between bound atoms neglects the interactions among the atoms in the same neighborhood but not bounded. Also, the parameter-sharing design of GCNs forces the model to calculate the interaction sequentially. In the real world, however, atoms interact with its bonded neighbors simultaneously. This drawback pushes the GCN model further away from accurately modeling the true local chemical environment of the atoms.
To tackle the above disadvantages, we involve the position matrix in the calculation, and the convolution is defined to integrate all neighbor atoms simultaneously. The atomic feature attached to the th vertex is updated according to equation 2.
where is a combination of linear transformation and nonlinear activation function, and is the set indices of the vertices that are closest to . The value is a hyper-parameter that decides how many neighbor atoms should be involved in the convolution. The new convolution mechanism considers the nearest neighbor atoms simultaneously in 3D space, and thus, is expected to achieve better performance.
3.3 Rotational Invariant Description
There is one significant difference between convolution mechanism described in equation 1 and equation 2. The former doesn’t care about the order of the vertices due to the ‘summing up’ mechanism. By contrast, it is very important to keep the atoms consequence identical even the configuration rotated for the proposed convolution mechanism. Assuming and the two neighbor vertices of vertex is noted as and . Then the input of is . Use as the center, we could rotate the geometric composed of , , and until and switch the place with each other. Then the input of becomes and the output of will change accordingly. However, the chemical feature of atom shouldn’t change same even the orientation is different. Theoretically, the equivalency of the rotated coordinates could be learned if a large group of training samples are provided. Unfortunately, we don’t have enough training samples under most circumstances, and it is computational expensive. In this paper, we developed a simple and effective method to achieve rotational invariant for the atom sequences.
There are existing researches devoted to invariant descriptions. Most of these descriptions have a high computational cost and could not fit deep neural network structures. The scale-invariant feature transform (SIFT) RN21 is the most widely used algorithm that generates rotation invariant descriptors for local patches in images. For each located keypoint, the SIFT descriptor calculates the gradient distribution and decides the principal direction. Then the algorithm rotates the principal direction to the x-axis, and the description vector was calculated. The SIFT algorithm has been extensively investigated and applied widely. There are a few adaptions of the original SIFT algorithm RN22; RN23; RN24
that extended the algorithm to 3D space. Comparing to images, the calculation in 3D space is more complex due to the additional degree of freedom. However, the chemical structure data under our consideration are discrete and sparse, and thus, the calculation could be significantly simplified. We transform the coordinate system of the surrounding atoms according to Algorithm1.
For each atom in the structure, we calculated the distance between and the rest atoms , where. Then we sorted all according to the distance and find the nearest atoms based on the hyper-parameter . Denote by the position of atom , and by and the positions of the closest and the second closest atoms to . We first find transform that translates to the origin. Next, we calculate a matrix that rotates to positive -axis according to Rodrigues’ rotation formula RN33 . Then, we calculated the matrix that rotate to the -plane, keeping fixed. Finally, new coordinate for every atom in the nearest neighborhood. It is readily seen that this distance-based sort and rotation guarantee the rotational invariance.
4 Experimental Data
The experiments involve two data sets. The main experimental data is CoRE2019 MOF data set that contains 10136 MOF structures. For the MOF data, the property of interest is Henry’s constant RN26 . Appendix A presents the details of the calculation of Henry’s constant.
The original CoRE2019 MOF database contains 12763 MOF structures. According to our calculation, the range of the Henry’s constant covers more than 30 orders of magnitude. After examining the MOF structure files, we found the board range of Henry’s constant is due to some unreasonable MOF structures. A closer examination reveals that in a small portion of MOFs with exceptionally high Henry’s constant, atoms are unreasonably close to each other. The overlapped atom position would lead to unreasonably high attraction and unreasonably high Henry’s constant. We got rid of MOF structures with potentially overlapped atoms using a threshold on smallest atomic distance. MOFs with atomic distance smaller than were filtered out .
Meanwhile, there are some MOFS with extremely low Henry’s constant (at the order of 10-10). For the low Henry’s constant for those MOFs is due to the relative confined geometry. Compared to other MOF with similar void fraction, it has relative thin structure and therefore most interaction inside the MOF are repulsive which results in a low Henry’s constant. Similarly, we filtered out the MOFs with extremely low Henry’s constant by removing structures with surface area less than 50 (meaning that there will be very little space and very unlikely for surface adsorption to happen). The cleaned data set contains 10136 MOF structures. The Henry’s constants still cover the several order of magnitudes. However, we believe they may be reachable. The details of this sample selection process could be found in Appendix B.
The proposed algorithm was also tested on ion conductivity data that is a subset of the ICSD database. The properties of interest of the ICSD data includes minimum bond length, large Li site, and percolation radius. These properties were calculated through topological analysis and could be used as indicators for high potential conductors. Readers could refer to RN25 for the details of the calculation of these properties.
5 Computational Experiments
The values of
for MOF data in the database range across several order of magnitudes. The tremendous values would cause issues during the backpropagation of the gradient and make the training infeasible. For example, it could be hard to find a reasonable learning rate because of the significant variances. Existing researchRN27 predicted the instead of the widely varied original value. In this paper, we projected the original value to both and . Both terms vary are close when the is large, but the later term put less weight on materials with extremely low Henry’s constants. We introduce because high Henry’s constant would be favored for gas adsorption, while discovering materials with extremely low Henry’s constant is not of practical interest.
10-folded validation was used through all experiments. For each experiment, samples were randomly divided into three groups. of the samples for training, of samples for validation, and the rest were used for testing. The parameters were selected based on the performance on validation data, and then the model was evaluated on the rest of the samples for testing. Mean Absolute Error (MAE) was picked as the measurement. The above process was performed 10 times, and the algorithm’s performance was evaluated using the average of MAE.
For the algorithm in RN18 , the network structure includes one feature embedding layer, three graph convolution layers, and three fully-connected layers. More details could be found in the original literature. For the 3DGCN in RN19
, we prefer the max pooling for comparison. Similarly, the details of the 3DGCN could be found in the original literature. The proposed algorithm employs three convolution layers and three fully connected layers, similar to the CGCNN. The following parameters were shared for all tasks. The batch size was set as, and a weight decay was . The above setting was helpful to enhance the generalization of the prediction model. Besides, we used Adam as the optimizer, and the initial learning rate was set as for all tasks. The experiments were performed on a virtual machine with 24 vCPU (2.5GHz), 112 G ram, and 1 V100 GPU (5120 CUDA Cores and 32Gb ram).
We first tested the proposed method on MOF data, and the property of interest is . The structures of MOF include more atoms and are more complex than the crystals. We have 10136 MOF structures for the experiments. As mentioned above, we predicted both and . The results are summarized in Figure 1. It could be seen that the proposed method achieved better MAE comparing to the classical GCN model. The results proved that the proposed method could handle complex structures and further validated our hypothesis. It also shows that achieve better MAE. Considering both terms are one to one mapping, the is a better pre-processing choice for prediction task.
There is one crucial hyper-parameter is the number of neighbors while calculating the convolution (see equation 2
). It is similar to the scale of the convolutional kernel and the scale in computer vision. To check the effect of the parameter, we repeated the experiments using the same setup with different scale parameters, which are 4, 8, 12, and 16. Specifically, we use two convolutional layers and one fully connected layer. The batch size was set to 32 to suppress the overfit. The samples were divided into training, validation, and testing as we described in the above section. The training processes converged in 100 epochs. The performance was summarized in Figure2.
It could be found that the number of neighbors plays an essential role in the model. The MAE is significantly higher when only a few neighbor atoms, e.g., four, were involved in the convolution. The MAE reduced while increasing the number and put more atoms into the calculation. However, when we increased the size of the neighborhood to a specific range (e.g., 12), a future increment didn’t increase the performance. The newly involved atoms have no interaction with the target atom and are not helpful for the model because of the distances. The result suggested that the calculation of the connection matrix was not needed if we picked the proper scale parameter. On the other hand, if the scale was too small, the convolution didn’t consider the target atom’s whole local environment, and the model underperformed.
The network structure is an important part that affects the performance of the network. However, the simplified structure was used for the proposed method due to its superiority in describing the materials’ physical features. We used two convolutional layers and one fully connected layer through the experiments, and the increment of the convolutional layers led to overfitting on the training data, even we have plenty of MOF data.
One concern about the proposed method is the computational cost. There are two important parts when discussing about computational efficiency, the graph building and convolution calculation. Our algorithm added extra steps to generate rotation invariant coordination, which cost more time. Meanwhile, the new convolution layer has much parameters than the existing methods. Assuming the convolution involves four neighbor atoms, and the atom feature and the edge feature are 128 and 32 respectively. The new convolution has slightly over 100 thousands parameters, versus 36 thousands parameters for the convolution in RN18 . The average time consumed for samples used in the experiments were summarized in Table 1.
As expected, it could be found that the extra process during graph building increased the computational cost. For MOF data, the average processing time increased from 335.89 milliseconds to 493.15 milliseconds, and from 113.06 milliseconds to 120.7 milliseconds. It is noticeable that the computational cost of training (forward and backward) reduced significantly. The reason is that the method in RN18 needs to sum up the effect of the neighbor atoms and multiple convolutions are needed. Improvement could be made by introducing more computation units and compute the convolutions in parallel. But the possible improvement is by no means without limitations. Comparingly, the proposed method improves the overall computational efficiency by reducing the training time.
The 3DGCN algorithm also involves the coordination of the atoms. However, the performance of the 3DGCN is only slightly better comparing to the classical GCN model and was outperformed by the proposed method. By analyzing the model structure, one could discover that the 3DGCN could only model the pairwise interaction, but the surrounding atoms interact with the target atom simultaneously. The experimental results prove that the proposed method that models multiple atoms at the same time could better fit the physics laws.
On the other hand, the performance of the proposed method is better than the classical GCN model, but the advantage is not as significant as the other two properties. The calculation of bond length involves the atom type and the distance between atoms, which is involved in the two-dimensional graph description, and thus the introduction of the three-dimensional topology didn’t benefit much.
We also tested the new algorithm on the ICSD crystal data. MAE was picked as measurement, and the results are summarized in Table 2.
|Database||# of training data||Unit||RN18||RN19||KAGCN|
|Large Li-Site||863||(0.1 nm)||0.390||0.375||0.329|
|Percolation Radius||2524||(0.1 nm)||0.063||0.060||0.052|
|Minimum Bond Length||1760||(0.1 nm)||0.697||0.700||0.653|
The results in present that the proposed method outperformed two GCN based method on prediction topology-based predictor. According to RN25 , the enlarged Li site originates from the large local space within the crystal structural framework, and the percolation radius is defined as the maximum radius of a sphere that can percolate across at least one direction of the structure. Both definitions involve the three-dimensional topology information, and the proposed algorithm benefited from the extra information and thus outperformed the classical GCN model. The ICSD data set experiments validated our hypotheses that the introduction of three-dimensional information into the GCN model improves the model performance.
The paper investigated three-dimensional topological information usage in the GCN model and proposed the KAGCN model. The experimental results presented that the 3D topology plays a vital role in modeling the interactions among atoms and the proposed network structure better reflected the real-world interaction comparing to existing graph-based models. Meanwhile, we developed an algorithm that generated rotation invariant descriptors for the distribution of local atoms. The invariant descriptor enhances the generalization of the prediction model by eliminating the ambiguity caused by rotation. Due to the benefits brought by our improvements, the KAGCN model outperformed the existing methods on MOF and crystal property prediction. The GCN models present excellent flexibility and effectiveness in modeling material structures. Our experiments proved that model material structure as a 3D point cloud is feasible. The proposed model could be extended to more property prediction tasks and could be easily embedded in other models such as graph autoencoders and graph generative networks. Meanwhile, the introduction of the scale parameter helps discover the most significant local environment for the atoms and may lead to novel meaningful chemistry structures.
Accurate and rapid prediction of the material property is a challenging problem. The recent major advance of deep learning and the emerge of large-scale material databases empower the development of the application of machine learning technologies in the field. However, the variant material structure still stands in the way hindering immense success of these technologies. Graph representation is concise and useful in describing variant structures. The GCN built on graph representation could effectively handle most materials analysis tasks. The undirected graph could hardly carry the higher dimensional topology, which is vital to understand the interaction between atoms. This paper defines a new convolutional operation that involves three-dimensional topological information.
The three-dimensional information makes the process more complex than the operation on the graph. For materials, one of the core problems is that the operation should be rotation invariant in the three-dimensional space. Inspired by the classic SIFT algorithm, we propose to rotate each atom’s neighborhood in 3D space according to the nearest neighbor’s coordinates. The new description is rotation invariant. Moreover, Both RN19 and RN5 reported reduced calculation efficiency after involved the 3D information into consideration. The additional computational cost of calculating the new description depends on the total number of atoms in the structure and the neighborhood scales s, which is a constant. Thus, the computational complexity is . The increased computational cost doesn’t affect the forward/backward process and doesn’t slow down the training process. Besides, the experiments present that the new convolutional operation performs faster than looping over all bonded atoms.
The concept of ’scale’ should also be considered in the new model. The existing methods loop over all the atom pairs that ’bonded’ to sum up all neighbor atoms’ effects. We can eliminate the connected matrix in the new network structure and consider the proper scale parameter to prospect the corresponding local area. The re-introduction of the scale aspect helps the future applications of existing deep neural networks in chemistry.
Appendix A Appendix: The calculation of Henry’s Contant
The Henry’s constant RN26 is calculated as follows,
where , represents gas constant, stands for Boltzmann constant. is the absolute temperature and is the external potential. In this work, we consider hydrogen adsorption in MOF where the interaction is mainly contributed by van der Waals interaction. Lennard-Jones 12-6 potential is therefore used to describe the external potential of hydrogen inside MOF with the cutoff distance of 12.9 .
where is the pairwise distance between gas molecules and atoms in the MOFs, and and are the size and energy parameters. In this work, universal force field (UFF) is used for all atoms in MOFs. For hydrogen molecule, nm and K. For interactions between different kinds of atoms, Lorentz-Berthelot mixing rules are used.
Appendix B Appendix: The removal of MOF structures with extreme values
The original CoRE2019 MOF database contains 12763 MOF structures. According to our calculation of Henry’s constant, the range of the Henry’s constant covers more than 30 orders of magnitude (as shown in Figure B1). After examining the MOF structure files, we found the board range of Henry’s constant is due to some unreasonable MOF structures. In Figure B1, we can see that most MOFs distribute around unit Henry’s constant in the unit of . There are a small portion of MOFs with exceptionally high Henry’s constant in Figure B1.
A closer examination reveals that in these structures, atoms are unreasonably close to each other. For example, in MOF (RUBLEH) with the highest Henry’s constant, the distance of oxygen-oxygen there is only , which is even shorter than the shortest covalent bond (hydrogen-hydrogen ). The overlapped atom position would lead to unreasonably high attraction and unreasonably high Henry’s constant.
After getting rid of MOF structure with potentially overlapped atoms (with smallest atom distance smaller than ), the distribution of Henry’s constant is more reasonable as shown in Figure B2. Most MOFs with exceptionally high Henry’s constant are filtered out.
It is also worth noticing that some Henry’s constant is also extremely low (at the order of 10-10). For the low Henry’s constant for those MOFs is due to the relative confined geometry (XEKUO). Compared to other MOF with similar void fraction, it has relative thin structure and therefore most interaction inside the MOF are repulsive which results in a low Henry’s constant. For example, compared with XEJJOM (void volume of 253 ), XEKUO (void volume of 185 3) has 0 surface area while XEJJOM has surface of 9.40 when using test particle with diameter of 3.68 (size of nitrogen molecule). After getting rid of MOF with surface area less than 50 (meaning that there will be very little space and very unlikely for surface adsorption to happen), all the MOFs with extremely low Henry’s constant were filtered out. The cleaned data set contains 10136 MOF structures. The Henry’s constants still cover the several order of magnitudes. However, we believe they may be reachable. The distribution of Henry’s constants of the cleaned data set is shown in Figure B3.