Integrating Crystal Graph Convolutional Neural Network with Multitask Learning for Material Property Prediction
Developing accurate, transferable and computationally inexpensive machine learning models can rapidly accelerate the discovery and development of new materials. Two of the major challenges in developing such models are (i) the limited availability of materials data compared to other fields, and (ii) the lack of a universal descriptor of materials that can be used to predict their various properties. The limited availability of materials data can be addressed through transfer learning, while the generic representation was recently addressed by Xie and Grossman (2018), who developed a crystal graph convolutional neural network (CGCNN) that provides a unified representation of crystals. In this work, we develop a new model (MT-CGCNN) by integrating CGCNN with transfer learning based on multi-task (MT) learning. We demonstrate the effectiveness of MT-CGCNN by simultaneous prediction of various material properties such as Formation Energy, Band Gap and Fermi Energy for a wide range of inorganic crystals (46774 materials). MT-CGCNN is able to reduce the test error when employed on correlated properties by up to 8% compared to CGCNN, even when the training data is reduced by 10%. We also demonstrate our model's better performance through prediction of an end-user scenario related to metal/non-metal classification. These results encourage further development of machine learning approaches that leverage multi-task learning to address the aforementioned challenges in the discovery of new materials. We make MT-CGCNN's source code available to encourage reproducible research.
The discovery, design and development of new materials with required properties underpin the development of various next-generation energy, medical and electronic technologies. Historically, new materials have been discovered through a trial-and-error process, leading to slow development cycles Tom Kalil and Cyrus Wadia. The advent of data-driven modeling techniques has provided a new approach to developing computationally inexpensive and accurate models that enable us to rapidly screen large material search spaces and select potential candidates with desired properties. These approaches have recently been employed to predict new materials for various functionalities such as thermoelectrics Gaultois et al. (2016), photovoltaics Lu et al. (2018), molecular light-emitting diodes Gómez-Bombarelli et al. (2016) and shape memory alloys Xue et al. (2016), among others.
One of the major challenges in developing data-driven models for materials discovery is the limited availability of materials datasets compared to other fields, which makes it difficult to apply conventional machine learning tools to materials data. Recent works have proposed transfer learning Hutchinson et al. (2017) and augmenting the model with pre-existing physical knowledge Narendra Kumar et al. to overcome this data constraint. Multi-task learning (MTL) is an important class of transfer learning algorithms that enables us to overcome such data scarcity challenges. MTL is the procedure of learning several tasks at the same time with the objective of mutually benefiting the performance of the individual tasks. In this way, MTL is able to learn generalized representations (embeddings) that can explain multiple aspects of the data, and it can mitigate data limitations by co-learning multiple tasks simultaneously. Multi-task learning has shown improvements in various fields of machine learning, from natural language processing (Collobert and Weston, 2008) to drug discovery (Ramsundar et al., 2015) and pharmaceuticals Ramsundar et al. (2017), among others.
The other major challenge in materials science is to come up with a universal material descriptor that can be used to predict various material properties. Until recently, most of the work in the literature has focused on developing hand-crafted descriptors based on domain expertise Huang and von Lilienfeld (2016); Bartók and Csányi (2015). However, these descriptors are typically difficult to generalize beyond the tasks (properties) for which they were designed. Molecules and crystals can be defined by their chemical composition (atoms) and structure (bonding), so they are naturally amenable to a generalized graph representation. Recent progress in geometric deep learning (Bronstein et al., 2017) has led to the formulation of graph-based deep neural networks for graph-structured data (Gori et al., 2005; Scarselli et al., 2009; Kipf and Welling, 2017; Bruna et al., 2014). These deep learning approaches can automatically learn the best representation (embedding) from raw atom and bond features for different property predictions. They have been successfully applied to molecules for various tasks such as molecular feature extraction (Duvenaud et al., 2015; Kearnes et al., 2016; Gilmer et al., 2017) and drug discovery (Altae-Tran et al., 2017). Recently, Xie and Grossman (2018) developed a graph convolutional network (GCN) based approach for inorganic crystals, called the crystal graph convolutional neural network (CGCNN), to predict various properties of inorganic crystals.
In this work, we bridge the two approaches by augmenting the CGCNN model with multi-task learning (MTL) to jointly predict multiple material properties. Predicting different properties simultaneously allows the generic model to automatically transfer what it learns about one property to another, which results in better performance. We demonstrate this approach through simultaneous prediction of various material properties such as Formation Energy ($E_f$), Band Gap ($E_g$) and Fermi Energy ($E_F$) for a wide range of inorganic crystals (46774 materials). We also systematically explore the impact of our approach on test errors for different MTL experiments with varying amounts of training data. Finally, we evaluate the impact of our method on an end-user scenario related to metal/non-metal classification.
The work by Xie and Grossman (2018) focuses on building a generalized crystal graph convolutional network to represent crystals and to predict their properties with the accuracy of ab initio physics models. A crystal graph is an undirected multigraph whose nodes represent atoms and whose edges represent bonds in a crystal. It allows multiple edges between the same pair of end nodes, which represent the different bonds between the atoms. Thus, the graph is defined as $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{v_1, v_2, \dots, v_N\}$ is the set of atoms in the crystal structure, $\mathcal{E} = \{(i,j)_k \mid i, j \in \{1, \dots, N\}\}$ is the set of undirected edges and $N$ is the number of atoms in the crystal graph. $v_i$ contains the features of the $i$-th atom, encoding properties of the atom, and $u_{(i,j)_k}$ is the feature vector for the $k$-th bond between atoms $i$ and $j$. The authors propose a simple convolution function as

$$v_i^{(t+1)} = g\Big(\sum_{j,k} \big(v_j^{(t)} \oplus u_{(i,j)_k}\big) W_c^{(t)} + b_c^{(t)} + v_i^{(t)} W_s^{(t)} + b_s^{(t)}\Big), \tag{1}$$

where $v_j^{(t)} \oplus u_{(i,j)_k}$ denotes the concatenation of the atom and bond feature vectors of the neighbors of atom $i$; $W_c^{(t)}$, $W_s^{(t)}$, $b_c^{(t)}$ and $b_s^{(t)}$ are the convolution weight matrix, self weight matrix, convolution bias and self bias of the $t$-th GCN layer, respectively; and $g$ is a non-linear activation function between layers.
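The simple convolution above can be sketched in NumPy for a single crystal. This is a minimal illustration, not the CGCNN implementation: it assumes every atom has a fixed number `K` of neighbors stored in an index array, uses softplus for the non-linearity `g`, and the function and argument names are hypothetical.

```python
import numpy as np


def softplus(x):
    """Smooth non-linearity used here for g (an assumption for this sketch)."""
    return np.logaddexp(0.0, x)


def simple_conv(v, u, nbr_idx, W_c, b_c, W_s, b_s):
    """One un-gated crystal graph convolution layer (hypothetical helper).

    v       : (N, d)    atom features v_i at layer t
    u       : (N, K, c) bond features u_(i,j)_k for the K neighbors of each atom
    nbr_idx : (N, K)    index j of the k-th neighbor of atom i
    W_c, b_c: (d + c, d), (d,)  convolution weights shared by all neighbors
    W_s, b_s: (d, d), (d,)      self weights
    """
    # Concatenate each neighbor's atom features with the bond features: v_j ⊕ u_(i,j)_k.
    z = np.concatenate([v[nbr_idx], u], axis=-1)   # (N, K, d + c)
    # The same W_c is applied to every neighbor, then summed over (j, k).
    neighbor_sum = (z @ W_c).sum(axis=1) + b_c      # (N, d)
    self_term = v @ W_s + b_s                       # (N, d)
    return softplus(neighbor_sum + self_term)       # v_i at layer t + 1
```

Because `W_c` is shared, every neighbor is weighted identically, which is the shortcoming the authors address next.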
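As noted by the authors, this formulation has a shortcoming. Since the weight matrix $W_c^{(t)}$ is shared across all neighbors, equal importance is given to all of them, which neglects the differences in interaction strength between neighbors. To overcome this, the authors use the standard edge-gating technique (Marcheggiani and Titov, 2017): the new convolution function first concatenates the neighbor feature vectors, $z_{(i,j)_k}^{(t)} = v_i^{(t)} \oplus v_j^{(t)} \oplus u_{(i,j)_k}$, and then performs the convolution

$$v_i^{(t+1)} = v_i^{(t)} + \sum_{j,k} \sigma\big(z_{(i,j)_k}^{(t)} W_f^{(t)} + b_f^{(t)}\big) \odot g\big(z_{(i,j)_k}^{(t)} W_c^{(t)} + b_c^{(t)}\big), \tag{2}$$

where $\odot$ denotes element-wise multiplication, $\sigma$ denotes a sigmoid function, and $W_f^{(t)}, b_f^{(t)}$ and $W_c^{(t)}, b_c^{(t)}$ are the gate (filter) and convolution weights and biases of the $t$-th layer. The gate $\sigma\big(z_{(i,j)_k}^{(t)} W_f^{(t)} + b_f^{(t)}\big)$ acts as a learned weighting that incorporates the different interaction strengths between neighbors.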
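A corresponding NumPy sketch of the edge-gated update in Eq. 2, under the same assumptions as the previous sketch (a fixed number `K` of neighbors per atom, softplus for `g`, hypothetical helper names). The sigmoid gate produces a separate weight in (0, 1) for every neighbor and feature channel, so neighbors no longer contribute equally.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softplus(x):
    return np.logaddexp(0.0, x)


def gated_conv(v, u, nbr_idx, W_f, b_f, W_c, b_c):
    """One edge-gated convolution layer with a residual connection (hypothetical helper).

    v: (N, d) atom features, u: (N, K, c) bond features, nbr_idx: (N, K) neighbor indices.
    W_f, W_c: (2d + c, d) gate and convolution weights; b_f, b_c: (d,) biases.
    """
    N, K = nbr_idx.shape
    # z_(i,j)_k = v_i ⊕ v_j ⊕ u_(i,j)_k for every (atom, neighbor) pair.
    v_self = np.broadcast_to(v[:, None, :], (N, K, v.shape[1]))
    z = np.concatenate([v_self, v[nbr_idx], u], axis=-1)   # (N, K, 2d + c)
    gate = sigmoid(z @ W_f + b_f)                          # learned interaction strength
    message = softplus(z @ W_c + b_c)                      # per-neighbor convolution output
    return v + (gate * message).sum(axis=1)                # residual update of v_i
```

Driving the gate bias strongly negative closes every gate and the layer reduces to the identity, which shows how the gate controls each neighbor's contribution.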
The atom features are then pooled using average pooling (Duvenaud et al., 2015) to obtain a vector representation of the crystal, $v_c$. This is then used as the input to a network of fully-connected layers with non-linearities that learns to predict a property value for the crystal. More concretely,

$$v_c = \frac{1}{N} \sum_{i=1}^{N} v_i, \tag{3}$$

$$\hat{p} = g_{\mathrm{fc}}\big(v_c W_{\mathrm{fc}} + b_{\mathrm{fc}}\big), \tag{4}$$

where $v_i$ is the learned feature representation of atom $i$ using Eq. 2, $v_c$ is the crystal representation obtained from pooling, and $\hat{p}$ is the predicted value of the crystal property. $W_{\mathrm{fc}}$, $b_{\mathrm{fc}}$ and $g_{\mathrm{fc}}$ are the weight matrix, bias and non-linearities of the fully-connected network, respectively.
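The average pooling of Eq. 3 followed by a fully-connected readout can be sketched as below. A single softplus hidden layer with a linear output is an assumption for illustration (the number of fully-connected layers is a tuned hyperparameter), and the helper name is hypothetical.

```python
import numpy as np


def softplus(x):
    return np.logaddexp(0.0, x)


def crystal_readout(v, W_h, b_h, w_out, b_out):
    """Pool atom features into v_c and map it to one property value (hypothetical helper).

    v: (N, d) final atom features; W_h: (d, h), b_h: (h,); w_out: (h,), b_out: float.
    """
    v_c = v.mean(axis=0)                   # average pooling over the N atoms
    hidden = softplus(v_c @ W_h + b_h)     # hidden fully-connected layer
    return float(hidden @ w_out + b_out)   # predicted property value p_hat
```

Because the pooling is an average, the prediction does not depend on the order in which atoms are listed, and the same readout applies to crystals with different numbers of atoms.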
The fundamental motivation for multi-task learning is to achieve better generalization performance. As summarized by Caruana (1997), "MTL improves generalization by leveraging the domain-specific information contained in the training signals of related tasks". The two main architectures for MTL in the deep learning context (Ruder, 2017) are:
Hard parameter sharing: This is the simplest approach to MTL. The architecture shares a common set of layers across all tasks, followed by task-specific output layers for each individual task. The key motivation is to force the model to learn better representations that can be used to learn multiple related tasks at the same time.
Soft parameter sharing: Here, each task has an independent model with its own set of parameters. The distance between the parameters of the different models (e.g., the $\ell_2$ distance) is then regularized to encourage similar parameters across models. This indirectly leads to a generalized representation while retaining the flexibility of task-specific parameters.
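To make the distinction concrete, the sketch below computes a soft-sharing penalty as the sum of squared $\ell_2$ distances between corresponding parameters of every pair of task models, scaled by a strength `lam`. The function name, the dictionary layout and the value of `lam` are assumptions for illustration. Under hard parameter sharing the shared layers are literally the same arrays, so this penalty is always zero.

```python
import itertools

import numpy as np


def soft_sharing_penalty(task_params, lam=1e-3):
    """Sum of squared L2 distances between matching parameters of all task pairs.

    task_params: list with one dict per task, mapping a layer name to its weight array;
                 every dict must have the same keys and array shapes.
    lam:         regularization strength, added to the training loss.
    """
    penalty = 0.0
    for params_a, params_b in itertools.combinations(task_params, 2):
        for name, w_a in params_a.items():
            penalty += float(np.sum((w_a - params_b[name]) ** 2))
    return lam * penalty
```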
Fig. 1 shows a schematic of the MT-CGCNN model setup. Every atom and every bond between atoms in a crystal has an initial vector representation (Xie and Grossman, 2018). The feature embeddings for atoms ($v_i$) and bonds ($u_{(i,j)_k}$) are the input to the GCN layers. Stacked GCN layers encode these atomic representations using Eq. 2. This is followed by a pooling layer (Eq. 3), which gives a vector representation $v_c$ of the crystal structure. We then use hard parameter sharing MTL: for each crystal property being learned, there is an independent fully-connected network that takes $v_c$ and predicts the property value as

$$\hat{p}_m = g_m\big(v_c W_m + b_m\big), \tag{5}$$

where $\hat{p}_m$ is the predicted value of the $m$-th crystal property, and $W_m$, $b_m$ and $g_m$ are the weight matrix, bias and non-linear mapping of the $m$-th fully-connected network, respectively. Each task therefore shares the crystal representation $v_c$, and the task-specific networks learn functions that predict the set of $M$ crystal properties. In this work, we employ a mean squared error loss for each property. The total loss of the network is a weighted linear sum of the individual losses from the task-specific parts of the network, a common setup for multi-task problems (Zhao Chen and Rabinovich, 2018; Kendall et al., 2018). Mathematically,

$$\mathcal{L} = \sum_{m=1}^{M} w_m \mathcal{L}_m, \tag{6}$$
where $\mathcal{L}$ is the total loss of the network, $\mathcal{L}_m$ are the individual losses from each of the task-specific layers and $w_m$ are the weights for the individual losses. A trivial setup is $w_m = 1/M$, which gives the average loss across tasks. For our experiments, each $\mathcal{L}_m$ is the mean squared error,

$$\mathcal{L}_m = \frac{1}{B} \sum_{n=1}^{B} \big(\hat{p}_m^{(n)} - p_m^{(n)}\big)^2, \tag{7}$$

where $B$ is the mini-batch size during an iteration, $\hat{p}_m^{(n)}$ is the model-predicted value and $p_m^{(n)}$ is the target value of the $m$-th property for the $n$-th crystal in the mini-batch. Finally, the model is trained with back-propagation using gradient descent (Rumelhart et al., 1988). The source code for MT-CGCNN is available at https://github.com/soumyasanyal/mt-cgcnn.
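The hard-parameter-sharing heads and the weighted loss of Eq. 6 can be sketched in NumPy for one mini-batch: each task gets its own head on the shared crystal vectors, and each per-task loss is the mean squared error over the mini-batch. The task names, the single-hidden-layer heads and the helper names are illustrative assumptions, and this forward-only code leaves gradients to an autodiff framework.

```python
import numpy as np


def softplus(x):
    return np.logaddexp(0.0, x)


def multitask_heads(v_c, heads):
    """Apply one independent fully-connected head per task to shared crystal vectors.

    v_c:   (B, h) pooled crystal representations for a mini-batch of B crystals.
    heads: dict mapping task name -> (W_hidden, b_hidden, w_out, b_out).
    Returns a dict mapping task name -> (B,) predictions.
    """
    preds = {}
    for task, (W_hidden, b_hidden, w_out, b_out) in heads.items():
        hidden = softplus(v_c @ W_hidden + b_hidden)
        preds[task] = hidden @ w_out + b_out
    return preds


def weighted_multitask_loss(preds, targets, weights):
    """Total loss = sum_m w_m * L_m, where each L_m is a mini-batch mean squared error."""
    per_task = {t: float(np.mean((preds[t] - targets[t]) ** 2)) for t in preds}
    total = sum(weights[t] * per_task[t] for t in per_task)
    return total, per_task
```

Setting every weight to 1/M recovers the plain average loss across tasks.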
MT-CGCNN is trained and validated on the inorganic crystal dataset of 46774 materials used by Xie and Grossman (2018), which is obtained from the Materials Project (MP) (Jain et al., 2013). In our experiments, we focus on three correlated properties, namely Formation Energy ($E_f$), Band Gap ($E_g$) and Fermi Energy ($E_F$).
One of the crucial problems in multi-tasking is understanding which tasks are likely to help each other in an MTL setup Caruana (1997); Ruder (2017). While there have been advances towards understanding this problem Xu et al. (2017); Bingel and Søgaard (2017), in our setup we select tasks that are significantly correlated. The Pearson correlation coefficients (Benesty et al., 2009) between the three properties, $E_f$, $E_g$ and $E_F$, are shown in Fig. 2.
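The pairwise Pearson correlations used for task selection can be computed with NumPy's `np.corrcoef`, which treats each row as one variable. The helper and its dictionary input are hypothetical, and the check below uses synthetic numbers rather than Materials Project data.

```python
import numpy as np


def property_correlations(properties):
    """Pearson correlation matrix between material properties.

    properties: dict mapping property name -> 1-D array of values, one entry per crystal
                (all arrays aligned on the same crystals).
    Returns (names, matrix) where matrix[a, b] correlates names[a] with names[b].
    """
    names = list(properties)
    data = np.vstack([np.asarray(properties[n], dtype=float) for n in names])
    return names, np.corrcoef(data)
```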
The weighted loss defined in Eq. 6 is useful when we want to give more importance to one task over another. This may be needed when a specific task is harder to learn than the rest and hence would not be trained as well as the others Zhao Chen and Rabinovich (2018). In our current setup, we treat these weights as hyperparameters of the model and search for the best weights.
To evaluate MT-CGCNN, we run the set of experiments detailed in Table 1. The results are summarized in Table 2 and Table 3. Unless specified otherwise, we report the mean absolute error (MAE) over 5 runs with random 60/20/20 splits into train, validation and test sets. To obtain the numbers for the CGCNN model, we used the code provided by the authors (https://github.com/txie-93/cgcnn) with the hyperparameters reported in their work.
|Experiment||Properties learned jointly|
|E1||Formation Energy ($E_f$) and Band Gap ($E_g$)|
|E2||Formation Energy ($E_f$) and Fermi Energy ($E_F$)|
|E3||Band Gap ($E_g$) and Fermi Energy ($E_F$)|
|E4||Formation Energy ($E_f$), Band Gap ($E_g$) and Fermi Energy ($E_F$)|
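The evaluation protocol (random 60/20/20 train/validation/test splits repeated over 5 runs, with the test MAE reported as a mean and spread) can be sketched with the standard library. The helper names and the seeding scheme are assumptions, and using `statistics.stdev` (the sample standard deviation) for the spread is one possible choice.

```python
import random
import statistics


def random_split(n, seed, train_frac=0.6, val_frac=0.2):
    """Shuffle indices 0..n-1 and cut them into train/validation/test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train, n_val = int(train_frac * n), int(val_frac * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def mae(pred, true):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def summarize(maes):
    """Mean and sample standard deviation of per-run test MAEs."""
    return statistics.mean(maes), statistics.stdev(maes)
```

For the 46774 crystals, `random_split(46774, seed)` yields 28064 training, 9354 validation and 9356 test crystals per run.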
In Table 2, the average MAE (the average of the MAEs for the individual properties) is tabulated together with the relative improvement over the baseline due to multi-tasking. Multi-task learning outperforms the single-task CGCNN model across all the experiments. In Table 3, we show how our model performs on the individual properties compared to the single-task setup (CGCNN). For example, we observe a strong reduction in the MAE of $E_g$ when we multi-task using $E_f$ and $E_g$. A similar trend is observed for $E_F$ when we multi-task using $E_f$ and $E_F$. These observations indicate that multi-tasking is more helpful for specific combinations of tasks. We also observe from Table 3 that $E_f$ prediction degrades during multi-task learning, likely due to the strong constraints of hard parameter sharing.
Further, we run another set of experiments in which we systematically reduce the training data available to the different models and check their performance on the reduced training sets. The results are shown in Table 4. We observe that MT-CGCNN outperforms CGCNN for the same amount of training data. Specifically, MT-CGCNN trained on 50% of the data achieves lower MAE values than CGCNN trained on 60% of the data, a reduction of approximately 4.5k training samples for the current setup. This result indicates that multi-tasking achieves comparable performance even with less training data. It also indirectly suggests that multi-tasking leads to faster learning of the crystal embedding space.
|Model||Experiment||$E_f$||$E_g$||$E_F$|
|MT-CGCNN||E1||0.043 ± 0.001||0.290 ± 0.004||-|
|MT-CGCNN||E2||0.041 ± 0.001||-||0.363 ± 0.003|
|MT-CGCNN||E3||-||0.319 ± 0.004||0.373 ± 0.003|
|MT-CGCNN||E4||0.050 ± 0.002||0.295 ± 0.004||0.363 ± 0.006|
Beyond test error evaluation, we also evaluate our model on scenarios that are useful for the end users. In the case of material scientists and chemists, this translates into obtaining chemical insights from the predicted data. This, in turn, provides another framework to compare the two approaches. Here, we analyze two scenarios that can provide some chemical insights.
For the first scenario, we compare the ordering of different materials based on formation energy. Differences in formation energy help in understanding the relative stability of different materials. Hence, from the end-user standpoint, it is more important to rank the crystals correctly by $E_f$ than to predict $E_f$ accurately. To quantify this ordering (ranking) of materials, we calculate Spearman's rank correlation coefficient Myers and Well (2003) between the predicted and true $E_f$ using MT-CGCNN and CGCNN for different amounts of training data, as shown in Fig. 3(c). The correlation values of both approaches are very high and comparable, which suggests that the ordering of the crystals by $E_f$ is mostly preserved.
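Spearman's coefficient is the Pearson correlation of the ranks. A minimal NumPy sketch that gives tied values their average rank is shown below (`scipy.stats.spearmanr` provides a library implementation); the helper names are hypothetical.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks of x, giving tied values the average of the ranks they span."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    start = 0
    while start < len(x):
        end = start
        # Extend the group while the next sorted value ties with the current one.
        while end + 1 < len(x) and sorted_x[end + 1] == sorted_x[start]:
            end += 1
        ranks[order[start:end + 1]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def spearman_rho(pred, true):
    """Spearman's rank correlation: the Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(pred), average_ranks(true))[0, 1])
```

A value of 1 means the predicted $E_f$ orders the crystals exactly as the true $E_f$ does, even if the predicted magnitudes are off.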
In the second scenario, based on $E_g$, we classify the materials into two classes: (i) metals, which conduct electrons easily, and (ii) non-metals such as semiconductors and insulators, where electron conduction is constrained. The thermal energy of a physical system maintained at temperature $T$ is $k_B T$, where $k_B$ is the Boltzmann constant; at room temperature this value is approximately 0.025 eV. Hence, crystals with $E_g$ less than 0.025 eV are considered metals, while the rest are considered non-metals, comprising semiconductors and insulators. Fig. 3(d) shows the area under the curve (AUC) for the metal/non-metal classification using MT-CGCNN and CGCNN for different amounts of training data. MT-CGCNN achieves much better classification performance than CGCNN as measured by the AUC metric. In fact, as a function of training data, the lowest AUC of MT-CGCNN is still higher than the highest AUC of CGCNN.
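A sketch of this evaluation: true labels come from thresholding the true band gap, and the predicted band gap is turned into a "metal" score by negating it (a smaller predicted gap means more metallic), which is an assumption of this sketch. The AUC is computed as the probability that a randomly chosen metal outscores a randomly chosen non-metal, with ties counted as one half; `sklearn.metrics.roc_auc_score` computes the same quantity. Helper names are hypothetical.

```python
import numpy as np

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K (CODATA 2018)


def is_metal(band_gap_ev, threshold_ev=0.025):
    """Label crystals as metals (True) when their band gap is below the threshold."""
    return np.asarray(band_gap_ev, dtype=float) < threshold_ev


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half; labels must contain both classes.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[labels][:, None] - scores[~labels][None, :]  # all positive/negative pairs
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))
```

Usage: `roc_auc(is_metal(true_gap), -pred_gap)`. At 300 K, `K_B_EV_PER_K * 300` is about 0.026 eV, the same order as the 0.025 eV cut-off.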
We divide the dataset into train, validation and test splits. To tune the hyperparameters, we train the model on the training set and evaluate the error on the validation set. We perform a grid search with early stopping over the hyperparameter space listed in Table 5. For training, we use the Adam optimizer (Kingma and Ba, 2015) with a learning rate of 0.01.
|Hyperparameter||Values|
|Number of convolutional layers||1, 2, 3, 4, 5|
|Length of learned atom feature vector||16, 32, 64, 128|
|Length of graph hidden representation||16, 32, 64, 128|
|Number of hidden fully-connected layers per task||1, 2, 3, 4|
|Step size of the Adam optimizer||, , ,|
|Weights in the weighted loss (Eq. 6)||1, 2, 3, 4, 5, 6, 7|
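The tuning procedure can be sketched as an exhaustive grid search in which each configuration is trained with early stopping on the validation loss. Everything here is an assumption for illustration: the configuration key names, the patience value, the loss-weight grid for a hypothetical two-task experiment, and the `train_and_validate` callable, which stands in for actually training MT-CGCNN and must yield one validation loss per epoch. The Adam step sizes are omitted because their values are not listed in the table.

```python
import itertools
import math

# Grid from the table above (step sizes omitted); loss weights for two tasks (w_1, w_2).
GRID = {
    "n_conv": [1, 2, 3, 4, 5],
    "atom_fea_len": [16, 32, 64, 128],
    "h_fea_len": [16, 32, 64, 128],
    "n_h": [1, 2, 3, 4],
    "loss_weights": list(itertools.product(range(1, 8), repeat=2)),
}


def best_val_loss(val_losses, patience=5):
    """Consume per-epoch validation losses; stop after `patience` epochs without improvement."""
    best, epochs_since_best = math.inf, 0
    for loss in val_losses:
        if loss < best:
            best, epochs_since_best = loss, 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                break  # early stopping: validation loss has stopped improving
    return best


def grid_search(train_and_validate, grid, patience=5):
    """Return the configuration with the lowest early-stopped validation loss.

    train_and_validate(config) must yield one validation loss per training epoch.
    """
    keys = list(grid)
    best_config, best_loss = None, math.inf
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        loss = best_val_loss(train_and_validate(config), patience)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss
```

The loss-weight grid alone has 7^M entries for M tasks, so the full search grows quickly (15,680 configurations for two tasks with the grid above), which is one motivation for the dynamically weighted losses mentioned in the conclusion.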
In summary, we propose MT-CGCNN, an effective multi-tasking framework that uses crystal graph convolutions to predict different material properties ($E_f$, $E_g$, $E_F$) by exploiting the correlation between them. We also show that MT-CGCNN can achieve accuracy comparable to CGCNN with fewer training samples. Additionally, we demonstrate the effectiveness of MT-CGCNN on end-user scenarios relating to the ordering of crystals based on $E_f$ and the classification of materials based on $E_g$. The ability to predict multiple properties shows that the learned material representation generalizes well. This work opens up new research directions for machine learning in materials science, where we can build upon the MT-CGCNN framework (e.g., by including soft parameter sharing) to predict other functional properties of materials with limited input data. Dynamically weighted losses also have the advantage of not requiring extensive hyperparameter tuning; integrating them with MT-CGCNN is left for future work (Zhao Chen and Rabinovich, 2018; Kendall et al., 2018). We make MT-CGCNN's source code available to encourage reproducible research (https://github.com/soumyasanyal/mt-cgcnn).
This work was funded by Shell. We would like to thank Professor Umesh Waghmare from Jawaharlal Nehru Centre for Advanced Scientific Research and Professor Arnab Bhattacharyya from Indian Institute of Science for their insightful discussions. We would also like to thank Tian Xie for providing clarifications on various aspects of the CGCNN code.
Kendall, A.; Gal, Y.; Cipolla, R. Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018.