MT-CGCNN: Integrating Crystal Graph Convolutional Neural Network with Multitask Learning for Material Property Prediction

11/14/2018 ∙ by Soumya Sanyal, et al. ∙ Indian Institute of Science ∙ Shell

Developing accurate, transferable and computationally inexpensive machine learning models can rapidly accelerate the discovery and development of new materials. Two major challenges in developing such models are (i) the limited availability of materials data compared to other fields, and (ii) the lack of a universal descriptor of materials from which their various properties can be predicted. The limited availability of materials data can be addressed through transfer learning, while the generic representation was recently addressed by Xie and Grossman [1], who developed a crystal graph convolutional neural network (CGCNN) that provides a unified representation of crystals. In this work, we develop a new model (MT-CGCNN) by integrating CGCNN with transfer learning based on multi-task (MT) learning. We demonstrate the effectiveness of MT-CGCNN by simultaneous prediction of various material properties such as Formation Energy, Band Gap and Fermi Energy for a wide range of inorganic crystals (46774 materials). MT-CGCNN reduces the test error on correlated properties by up to 8% compared to CGCNN, and its test error remains lower than that of CGCNN even when the training data is reduced by 10%. We also demonstrate our model's better performance on an end user scenario related to metal/non-metal classification. These results encourage further development of machine learning approaches which leverage multi-task learning to address the aforementioned challenges in the discovery of new materials. We make MT-CGCNN's source code available to encourage reproducible research.




1 Introduction

The discovery, design and development of new materials with required properties underpin the development of various next generation energy, medical and electronic technologies. Historically, new materials have been discovered through a trial and error process, leading to slow development cycles (Kalil and Wadia). The advent of data driven modeling techniques has provided a new approach to developing computationally inexpensive and accurate models that enable us to rapidly screen large material search spaces and select candidate materials with desired properties. These approaches have recently been employed to predict new materials for various functionalities such as thermoelectrics (Gaultois et al., 2016), photovoltaics (Lu et al., 2018), molecular light emitting diodes (Gómez-Bombarelli et al., 2016) and shape memory alloys (Xue et al., 2016), among others.

One of the major challenges in developing data driven models for material discovery is the limited availability of material datasets compared to other fields. This makes it difficult to apply conventional machine learning tools to materials data. Recent works have proposed transfer learning (Hutchinson et al., 2017) and augmenting the model with pre-existing physical knowledge (Kumar et al.) to overcome this data constraint. Multi-task learning (MTL) is an important class of transfer learning algorithms that enables us to overcome such data scarcity challenges. MTL is the procedure of learning several tasks at the same time with the objective of mutually benefiting the performance of the individual tasks. In this way, MTL is able to learn generalized representations (embeddings) that can explain multiple aspects of the data. It is also able to overcome data limitations by co-learning multiple tasks simultaneously. Multi-task learning has shown improvements in various fields of machine learning, from natural language processing (Collobert and Weston, 2008) and computer vision (Girshick, 2015) to drug discovery (Ramsundar et al., 2015) and pharmaceuticals (Ramsundar et al., 2017), among others.

The other major challenge in material science is to come up with a universal material descriptor that can be used to predict various material properties. Until recently, most work in the literature has focused on developing hand-crafted descriptors based on domain expertise (Huang and von Lilienfeld, 2016; Bartók and Csányi, 2015). However, these approaches are typically difficult to generalize beyond the tasks (properties) for which they were trained. Molecules and crystals can be defined by their chemical composition (atoms) and structure (bonding). Hence, they are naturally amenable to a generalized graph representation. Recent progress in geometric deep learning (Bronstein et al., 2017) has led to the formulation of graph based deep neural networks for graphical structures (Gori et al., 2005; Scarselli et al., 2009; Kipf and Welling, 2017; Bruna et al., 2014). These deep learning based approaches can automatically learn the best representation (embedding) from raw atom and bond features for different property predictions. They have been successfully applied to molecules for tasks such as molecular feature extraction (Duvenaud et al., 2015; Kearnes et al., 2016; Gilmer et al., 2017) and drug discovery (Altae-Tran et al., 2017). Recently, Xie and Grossman (2018) developed a GCN based approach for inorganic crystals, called crystal graph convolutional neural network (CGCNN), to predict various properties of inorganic crystals.

In this work, we bridge the two approaches by augmenting the CGCNN model with multi-task learning (MTL) to jointly predict multiple material properties. Simultaneous prediction of different properties lets the generic model automatically transfer what it learns about one property to another, which results in better performance. We demonstrate this approach through simultaneous prediction of various material properties such as Formation Energy ($E_f$), Band Gap ($E_g$) and Fermi Energy ($E_F$) for a wide range of inorganic crystals (46774 materials). We also systematically explore the impact of our approach on test errors for different MTL experiments with varying amounts of training data. Finally, we study the impact of our method on an end user scenario related to metal/non-metal classification.

2 Background

2.1 Crystal Graph Convolution Neural Network (CGCNN)

The work by Xie and Grossman (2018) focuses on building a generalized crystal graph convolutional network to represent crystals and to predict their properties with the accuracy of ab initio physics models. A crystal graph is an undirected multigraph defined by nodes representing atoms and edges representing bonds in a crystal. It allows multiple edges between the same pair of end nodes, which represent the different bonds between the atoms. Thus, the graph is defined as $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathcal{X})$, where $\mathcal{V}$ is the set of atoms in the crystal structure, $|\mathcal{V}| = N$, $\mathcal{E}$ is the set of undirected edges and $N$ is the number of atoms in the crystal graph. $x_i \in \mathcal{X}$ contains the features of the $i$-th atom encoding properties of the atom. $u_{(i,j)_k}$ is the feature vector for the $k$-th bond between atoms $i$ and $j$. The authors propose a simple convolution function as,

$$v_i^{(t+1)} = g\left[\left(\sum_{j,k} v_j^{(t)} \oplus u_{(i,j)_k}\right) W_c^{(t)} + b_c^{(t)} + v_i^{(t)} W_s^{(t)} + b_s^{(t)}\right] \qquad (1)$$

where $v_i^{(t)}$ is the feature vector of the $i$-th atom after $t$ convolution layers (with $v_i^{(0)} = x_i$), $\oplus$ denotes the concatenation of atom and bond feature vectors of the neighbors of the $i$-th atom, $W_c^{(t)}$, $W_s^{(t)}$, $b_c^{(t)}$ and $b_s^{(t)}$ are the convolution weight matrix, self weight matrix, convolution bias and self bias of the $t$-th layer of GCN respectively, and $g$ is some non-linear activation function between layers.

As noted by the authors, this formulation has a shortcoming. Since the weight matrix $W_c^{(t)}$ is shared across all neighbors, equal importance is given to all the neighbors. This inherently neglects the differences in interaction strength between neighbors. To overcome this, the authors use the standard edge-gating technique (Marcheggiani and Titov, 2017), where the new convolution function first concatenates neighbor feature vectors, $z_{(i,j)_k}^{(t)} = v_i^{(t)} \oplus v_j^{(t)} \oplus u_{(i,j)_k}$, and then performs convolution by,

$$v_i^{(t+1)} = v_i^{(t)} + \sum_{j,k} \sigma\left(z_{(i,j)_k}^{(t)} W_f^{(t)} + b_f^{(t)}\right) \odot g\left(z_{(i,j)_k}^{(t)} W_s^{(t)} + b_s^{(t)}\right) \qquad (2)$$

where $\odot$ denotes element-wise multiplication and $\sigma$ denotes a sigmoid function. The $\sigma(\cdot)$ term acts as a learned weight matrix to incorporate different interaction strengths between neighbors.

The atom features are then pooled (using average pooling (Duvenaud et al., 2015)) to get a vector representation of the crystal, $v_c$. This is then used as an input to a network of fully-connected layers with non-linearities which learn to predict a property value for the crystal. More concretely,

$$v_c = \frac{1}{N} \sum_{i=1}^{N} v_i \qquad (3)$$

$$\hat{y} = f\left(v_c W + b\right) \qquad (4)$$

where $v_i$ is the learned feature representation of atom $i$ using Eq. 2, $v_c$ is the crystal representation learned from pooling and $\hat{y}$ is the predicted value of the crystal property. $W$, $b$ and $f$ are the weight matrix, bias and non-linearities of the fully-connected network respectively.
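To make the data flow of the gated convolution (Eq. 2) and the pooling step concrete, the following is a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the choice of softplus for the non-linearity, the single shared parameter set and the toy shapes are assumptions made for illustration, following the general form of the gated update in Xie and Grossman (2018).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softplus(x):
    # Assumed choice for the non-linearity g; any smooth activation would do.
    return math.log1p(math.exp(x))


def affine(z, W, b):
    """Row-vector convention: returns z W + b for a list z and a nested-list W."""
    return [sum(z[r] * W[r][c] for r in range(len(z))) + b[c] for c in range(len(b))]


def gated_conv(v, edges, W_f, b_f, W_s, b_s):
    """One edge-gated convolution step.

    v     : list of atom feature vectors (each a list of length d)
    edges : list of (i, j, u) tuples, where u is the bond feature vector of one
            bond from neighbour j to atom i; repeating an (i, j) pair models the
            multigraph, and an undirected bond is listed in both directions
    """
    new_v = [list(v_i) for v_i in v]      # residual term: start from v_i
    for i, j, u in edges:
        z = v[i] + v[j] + u               # list concatenation of the three vectors
        gate = [sigmoid(a) for a in affine(z, W_f, b_f)]
        core = [softplus(a) for a in affine(z, W_s, b_s)]
        for c in range(len(new_v[i])):
            new_v[i][c] += gate[c] * core[c]
    return new_v


def average_pool(v):
    """Average the atom vectors into a single crystal vector."""
    n = len(v)
    return [sum(v_i[c] for v_i in v) / n for c in range(len(v[0]))]


# Toy crystal: two atoms (d = 2) joined by one bond with a 1-dimensional feature.
atoms = [[1.0, 1.0], [2.0, 2.0]]
bonds = [(0, 1, [0.0]), (1, 0, [0.0])]
zero_W = [[0.0, 0.0] for _ in range(5)]   # shape (2d + bond_dim) x d
zero_b = [0.0, 0.0]
updated = gated_conv(atoms, bonds, zero_W, zero_b, zero_W, zero_b)
crystal_vector = average_pool(updated)
```

With all weights set to zero, every bond contributes `sigmoid(0) * softplus(0)` to each component of the receiving atom, which makes the toy example easy to check by hand.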

2.2 Multi-task learning

The fundamental motivation for multi-task learning is to achieve better generalization performance. As summarized by Caruana (1997), "MTL improves generalization by leveraging the domain-specific information contained in the training signals of related tasks". The two main architectures for MTL in the deep learning context (Ruder, 2017) are:

  • Hard parameter sharing: This is the simplest approach to MTL. The architecture shares a common set of layers across all tasks and then some task-specific output layers are present for each individual task. The key motivation is to force the model to learn better representations that can be used to learn multiple related tasks at the same time.

  • Soft parameter sharing: Here, each task being learned has an independent model with its own set of parameters. The distance between the parameters of the different models (e.g., the $L_2$ distance) is then regularized to encourage the models to learn similar parameters. This indirectly leads to a generalized representation while retaining the flexibility of unique parameters for each task.

A more detailed discussion of various aspects of multi-task learning can be found in (Caruana, 1997; Ruder, 2017).
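As an illustration of the soft parameter sharing idea, the sketch below adds a penalty on the pairwise distance between task-specific parameters to the sum of task losses. The squared L2 distance, the flattened parameter lists, the function names and the regularization strength `lam` are all assumptions chosen for illustration; the source only states that a distance between parameters is regularized.

```python
def squared_l2_distance(params_a, params_b):
    """Squared L2 distance between two flattened parameter lists."""
    return sum((a - b) ** 2 for a, b in zip(params_a, params_b))


def soft_sharing_objective(task_losses, task_params, lam):
    """Sum of task losses plus a penalty that pulls task parameters together.

    task_losses : list of scalar losses, one per task
    task_params : list of flattened parameter lists, one per task
    lam         : hypothetical regularization strength
    """
    penalty = 0.0
    for p in range(len(task_params)):
        for q in range(p + 1, len(task_params)):
            penalty += squared_l2_distance(task_params[p], task_params[q])
    return sum(task_losses) + lam * penalty


# Two toy tasks whose parameter vectors are a distance of 5 apart.
objective = soft_sharing_objective([1.0, 2.0], [[0.0, 0.0], [3.0, 4.0]], lam=0.1)
```

In hard parameter sharing there is no such penalty: the shared layers are literally the same parameters, and only the task-specific output layers differ.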

3 Proposed method (MT-CGCNN)

Fig. 1 shows the schematics of the MT-CGCNN model setup. Every atom and every bond between atoms in a crystal has some initial vector representation (Xie and Grossman, 2018). The feature embeddings for atoms ($x_i$) and bonds ($u_{(i,j)_k}$) are the input to the GCN layers. Stacked GCN layers are used to encode these atomic representations using Eq. 2. This is then followed by a pooling layer (Eq. 3) which gives a vector representation for the crystal structure, $v_c$. We then use hard parameter sharing MTL, where for each crystal property being learned, there is an independent fully-connected network which takes $v_c$ and predicts the property value as,

$$\hat{y}^{(p)} = f^{(p)}\left(v_c W^{(p)} + b^{(p)}\right) \qquad (5)$$

where $\hat{y}^{(p)}$ is the crystal property value for the $p$-th property. $W^{(p)}$, $b^{(p)}$ and $f^{(p)}$ are the weight matrix, bias and non-linear mapping of the $p$-th fully-connected network respectively. So, each task essentially shares the crystal representation $v_c$ and tries to learn functions that can predict a set of crystal properties. In this work, we employ the mean squared loss function for each property. The total loss function for the network is the weighted linear sum of individual losses from the task-specific parts of the network. This formulation of the total loss function is a common setup for the multi-tasking problem (Zhao Chen and Rabinovich, 2018; Kendall et al., 2018). Mathematically,

$$L = \sum_{p=1}^{P} \alpha_p L_p \qquad (6)$$

where $L$ is the total loss of the network, $L_p$ are the individual losses from each of the $P$ task-specific layers and $\alpha_p$ are the weights for the individual losses. A trivial setup is $\alpha_p = 1/P$, which gives an average loss across tasks. For our experiments, each $L_p$ is the mean squared error defined by

$$L_p = \frac{1}{B} \sum_{n=1}^{B} \left(\hat{y}_n^{(p)} - y_n^{(p)}\right)^2 \qquad (7)$$

where $B$ is the mini-batch size during an iteration, $\hat{y}_n^{(p)}$ is the model predicted property value and $y_n^{(p)}$ is the target property value of the $n$-th crystal for the $p$-th property. Finally, back-propagation using gradient descent (Rumelhart et al., 1988) is done to train the model. The source code for MT-CGCNN is available at
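The weighted multi-task loss described above can be sketched in a few lines. This is an illustrative sketch rather than the released MT-CGCNN code: the function names, the dictionary layout and the toy numbers are assumptions, and each task head is reduced to a single linear layer on the shared crystal vector.

```python
def linear_head(crystal_vector, weights, bias):
    """A task-specific head reduced to one linear layer (illustrative only)."""
    return sum(x * w for x, w in zip(crystal_vector, weights)) + bias


def mse(predictions, targets):
    """Mean squared error over a mini-batch."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


def total_loss(predictions, targets, loss_weights):
    """Weighted linear sum of per-task MSE losses.

    predictions, targets : dict mapping a property name to a list over the mini-batch
    loss_weights         : dict mapping a property name to its loss weight
    """
    return sum(
        loss_weights[name] * mse(predictions[name], targets[name])
        for name in predictions
    )


# Toy mini-batch of two crystals and two tasks (made-up numbers).
preds = {"E_f": [1.0, 2.0], "E_g": [0.0, 0.0]}
truth = {"E_f": [1.0, 4.0], "E_g": [1.0, 1.0]}
equal_weights = {"E_f": 0.5, "E_g": 0.5}   # averages the two task losses
loss = total_loss(preds, truth, equal_weights)
```

Because every head reads the same crystal vector, the gradient of this single scalar flows back into the shared convolution layers from all tasks at once, which is what hard parameter sharing relies on.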

Figure 1: (best viewed in color) Overview of MT-CGCNN: Given a crystal structure, a crystal graph is created from it. Note that the graph created can have multiple edges between the atoms representing different atomic bonds. Next, CGCNN is used to extract the crystal representation using Graph Convolutional Networks. The crystal representation is then used as input for different task-specific fully connected layers, each of which predicts some property of the crystal. Refer to section 3 for more details.

4 Experiments and results

4.1 Dataset

MT-CGCNN is trained and validated on the inorganic crystal data, comprising 46774 materials, used by Xie and Grossman (2018), which is obtained from the Materials Project (MP) (Jain et al., 2013). In our experiments, we focus on three correlated properties, namely Formation Energy ($E_f$), Band Gap ($E_g$) and Fermi Energy ($E_F$).

4.2 Correlation between properties

One of the crucial problems in multi-tasking is to understand which tasks are likely to help each other in an MTL setup (Caruana, 1997; Ruder, 2017). While there have been advancements towards understanding that problem (Xu et al., 2017; Bingel and Søgaard, 2017), in our setup we select tasks which have significant correlation. The Pearson correlation coefficients (Benesty et al., 2009) for the three properties, Formation Energy ($E_f$), Band Gap ($E_g$) and Fermi Energy ($E_F$), are shown in Fig. 2.

Figure 2: Correlation plots between different properties.
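For reference, the Pearson correlation coefficient used to select correlated tasks can be computed as in the short sketch below. The function name and the toy property values are made up for illustration and are not taken from the dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Made-up values for two properties of four hypothetical crystals.
property_a = [-2.1, -1.4, -0.3, 0.2]
property_b = [3.0, 2.2, 0.4, 0.0]
r = pearson(property_a, property_b)   # strongly negative for this toy data
```

A coefficient close to +1 or -1 indicates a strong linear relationship between two properties, while a value near 0 suggests that a linear relationship is weak or absent.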

4.3 Weighted loss

Weighted loss as defined in Eq. 6 is useful when we want to give more importance to one task over another. This may be needed when a specific task is harder to learn than the rest and hence would not be trained as well as the others (Zhao Chen and Rabinovich, 2018). In our current setup, we consider these weights as hyperparameters of the model and search for the best weights.

4.4 Model evaluation

To evaluate MT-CGCNN, we run a set of experiments with the setup detailed in Table 1. The results from our experiments are summarized in Table 2 and Table 3. We report the mean absolute error (MAE) over 5 runs with random 60/20/20 splits into train, validation and test sets, unless specified otherwise. To get the numbers for the CGCNN model, we used the code provided by the authors with the hyperparameters reported in their work.
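The evaluation protocol (random 60/20/20 splits, MAE averaged over several runs) can be sketched as follows. The helper names, the use of seeds 0 to 4 and the stand-in per-run errors are assumptions for illustration; in a real run each seed would drive one full training and testing cycle.

```python
import random
import statistics


def split_indices(n, seed, ratios=(0.6, 0.2, 0.2)):
    """Randomly split range(n) into train, validation and test index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # whatever remains goes to the test set
    return train, val, test


def mae(predictions, targets):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(predictions)


# One split per run; the seeds are arbitrary.
splits = [split_indices(10, seed) for seed in range(5)]

# Stand-in test errors for five runs (made-up numbers, not results from the paper).
per_run_mae = [0.20, 0.22, 0.19, 0.21, 0.18]
reported_mae = statistics.mean(per_run_mae)
```

Fixing the seed per run makes every split reproducible, so a single-task baseline and a multi-task model can be compared on identical partitions.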

| Experiment | Setup |
|---|---|
| E1 | Formation Energy ($E_f$) and Band Gap ($E_g$) |
| E2 | Formation Energy ($E_f$) and Fermi Energy ($E_F$) |
| E3 | Band Gap ($E_g$) and Fermi Energy ($E_F$) |
| E4 | Formation Energy ($E_f$), Band Gap ($E_g$) and Fermi Energy ($E_F$) |

Table 1: Experimental setup for evaluation.

In Table 2, the average MAE (the average of MAEs for individual properties) is tabulated along with the relative improvement over the baseline due to multi-tasking. Multi-task learning outperforms the single-task CGCNN model in all four experiments. In Table 3 we show how our model performs on individual properties compared to the single-task setup (CGCNN). For example, we observe a strong reduction in the MAE of Band Gap ($E_g$), from 0.323 eV to 0.290 eV, when we do multi-tasking using Formation Energy ($E_f$) and $E_g$. A similar trend is observed for Fermi Energy ($E_F$), from 0.380 eV to 0.363 eV, when we do multi-tasking using $E_f$ and $E_F$. These observations indicate that multi-tasking is more helpful when done with a specific combination of tasks. We also observe from Table 3 that $E_f$ prediction degrades during multi-task learning, likely due to the strong constraints of hard parameter sharing.

Further, we run another set of experiments in which we systematically reduce the training data available to the different models and check the model performance on the reduced training dataset. The results are shown in Table 4. We observe that MT-CGCNN outperforms CGCNN for the same amount of input data. Specifically, we note that the average MAE of MT-CGCNN using 50% training data is lower than that of CGCNN using 60% training data. This is a reduction of approximately 4.5k training samples for the current setup. This result verifies that multi-tasking leads to comparable performance even with less training data. It also indirectly shows that multi-tasking leads to faster learning of the crystal embedding space.

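| Experiment | CGCNN | MT-CGCNN | Improvement (%) |
|---|---|---|---|
| E1 | 0.181 | 0.166 | 8.3% |
| E2 | 0.210 | 0.202 | 3.8% |
| E3 | 0.352 | 0.346 | 1.7% |
| E4 | 0.247 | 0.236 | 4.4% |

Table 2: Average MAE values with percentage of improvement for different experiments on Formation Energy ($E_f$), Band Gap ($E_g$) and Fermi Energy ($E_F$). Our model performs consistently better than the baseline (CGCNN). Refer to section 4.4 for more details.

| Method | Experiment | $E_f$ (eV/atom) | $E_g$ (eV) | $E_F$ (eV) |
|---|---|---|---|---|
| CGCNN | - | 0.039 ± 0.0003 | - | - |
| CGCNN | - | - | 0.323 ± 0.006 | - |
| CGCNN | - | - | - | 0.380 ± 0.006 |
| MT-CGCNN | E1 | 0.043 ± 0.001 | 0.290 ± 0.004 | - |
| MT-CGCNN | E2 | 0.041 ± 0.001 | - | 0.363 ± 0.003 |
| MT-CGCNN | E3 | - | 0.319 ± 0.004 | 0.373 ± 0.003 |
| MT-CGCNN | E4 | 0.050 ± 0.002 | 0.295 ± 0.004 | 0.363 ± 0.006 |

Table 3: Individual MAE of three properties, Formation Energy ($E_f$), Band Gap ($E_g$) and Fermi Energy ($E_F$), using CGCNN and MT-CGCNN models. Our model performs better for $E_g$ and $E_F$ prediction. Refer to section 4.4 for more details.

| Property | CGCNN 20% | CGCNN 30% | CGCNN 40% | CGCNN 50% | CGCNN 60% | MT-CGCNN 20% | MT-CGCNN 30% | MT-CGCNN 40% | MT-CGCNN 50% | MT-CGCNN 60% |
|---|---|---|---|---|---|---|---|---|---|---|
| $E_f$ (eV/atom) | 0.062 | 0.052 | 0.046 | 0.043 | 0.039 | 0.062 | 0.053 | 0.049 | 0.046 | 0.043 |
| $E_g$ (eV) | 0.424 | 0.385 | 0.356 | 0.332 | 0.323 | 0.388 | 0.346 | 0.326 | 0.301 | 0.290 |
| Avg MAE | 0.243 | 0.218 | 0.201 | 0.188 | 0.181 | 0.225 | 0.200 | 0.188 | 0.174 | 0.166 |

Table 4: MAE values of Formation Energy ($E_f$) and Band Gap ($E_g$) with the training data split increasing from 20% to 60%. Our model performs better with 50% training data (average MAE 0.174) than the baseline with 60% training data (average MAE 0.181). Refer to section 4.4 for more details.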
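The relationship between the per-property MAEs and the average MAE and improvement figures can be checked with a few lines of arithmetic. The sketch below uses only the rounded values printed in the tables for experiment E1, so results may differ in the last digit from figures computed on unrounded errors; the function names are illustrative.

```python
def average_mae(per_property_mae):
    """Average of the MAEs of the individual properties."""
    return sum(per_property_mae) / len(per_property_mae)


def improvement_pct(baseline, ours):
    """Relative reduction of the error with respect to the baseline, in percent."""
    return 100.0 * (baseline - ours) / baseline


# Experiment E1, using the rounded per-property MAEs from Table 3.
cgcnn_e1 = average_mae([0.039, 0.323])      # single-task baselines
mt_cgcnn_e1 = average_mae([0.043, 0.290])   # multi-task model

# Improvement computed from the rounded averages printed in Table 2.
e1_improvement = improvement_pct(0.181, 0.166)
```

The same two functions apply to E2 to E4 by substituting the corresponding rows of the tables.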

4.5 End user scenarios (chemical insights)

Beyond test error evaluation, we also evaluate our model on scenarios that are useful for the end users. In the case of material scientists and chemists, this translates into obtaining chemical insights from the predicted data. This, in turn, provides another framework to compare the two approaches. Here, we analyze two scenarios that can provide some chemical insights.

For the first scenario, we compare the ordering of different materials based on Formation Energy ($E_f$). The difference in $E_f$ helps to understand the relative stability of different materials. Hence, from the end user standpoint, it is more important to rank the crystals correctly by $E_f$ than to predict its value accurately. To quantify this ordering (ranking) of materials, we calculate the Spearman's rank correlation coefficient ($\rho$) (Myers and Well, 2003) between the predicted and true $E_f$ using MT-CGCNN and CGCNN for different amounts of training data, as shown in Fig. 3(c). The $\rho$ values of both approaches are very high and comparable. This suggests that the ordering between the crystals based on their $E_f$ is mostly preserved.
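Spearman's rank correlation is the Pearson correlation of the ranks, which is why it measures ordering rather than absolute accuracy. The sketch below shows one way to compute it; the function names and the toy energies are illustrative assumptions, and ties are handled by average ranks.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up true and predicted energies: the values differ but the order agrees.
true_energy = [-3.0, -1.5, -0.7, 0.1]
predicted_energy = [-2.6, -1.9, -0.2, 0.0]
rho = spearman(true_energy, predicted_energy)
```

In this toy example the predictions are off in value yet rank the four crystals identically, so the coefficient equals 1, illustrating why a model can preserve stability ordering even when its MAE is not the lowest.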

In the second scenario, based on Band Gap ($E_g$), we classify the materials into two classes, namely (i) metals, which can easily conduct electrons, and (ii) non-metals, such as semiconductors and insulators, where electron conduction is constrained. The energy equivalent of a physical system maintained at temperature $T$ is calculated as $k_B T$, where $k_B$ is the Boltzmann constant. At room temperature ($T \approx 300$ K), this value is about 0.025 eV. Hence, crystals with $E_g$ less than 0.025 eV are considered metals, while the rest are considered non-metals, comprising semiconductors and insulators. Fig. 3(d) shows the area under the curve (AUC) for crystal classification into metal/non-metal using MT-CGCNN and CGCNN for different amounts of training data. MT-CGCNN classifies much more accurately than CGCNN as measured by the AUC metric. In fact, across all training data sizes, the lowest AUC of MT-CGCNN is still higher than the highest AUC of CGCNN.
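The thresholding and the AUC metric can be sketched as below. Several details are assumptions made for illustration rather than the procedure used in the experiments: the true class is taken from the true band gap, the negated predicted band gap is used as the score for the metal class, and the AUC is computed by direct pairwise comparison. The band gap values are made up.

```python
BOLTZMANN_EV_PER_K = 8.617333262e-5   # Boltzmann constant in eV/K


def thermal_energy_ev(temperature_k):
    """Thermal energy k_B * T in eV."""
    return BOLTZMANN_EV_PER_K * temperature_k


def is_metal(band_gap_ev, threshold_ev=0.025):
    """A crystal counts as a metal if its band gap is below the threshold."""
    return band_gap_ev < threshold_ev


def auc(labels, scores):
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    in which the positive has the higher score; ties count as one half.
    Quadratic in the number of samples, which is fine for a small sketch."""
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up true and predicted band gaps (eV) for four hypothetical crystals.
true_gap = [0.0, 0.01, 1.2, 3.0]
predicted_gap = [0.02, 1.0, 0.9, 2.5]
labels = [is_metal(gap) for gap in true_gap]
scores = [-gap for gap in predicted_gap]   # smaller predicted gap = more metal-like
classification_auc = auc(labels, scores)
```

Using a ranking metric such as AUC avoids committing to a single decision threshold on the predicted band gap, which matters because regression errors near 0.025 eV can easily flip a hard classification.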

Figure 3: (best viewed in color) (a) and (b) Predicted versus true property values for 60% training data. (c) Spearman's rank correlation coefficient of predicted and true Formation Energy ($E_f$) for MT-CGCNN and CGCNN as a function of training data. Our model is comparable with the baseline. (d) Area under the curve (AUC) of metal/non-metal classification for MT-CGCNN and CGCNN as a function of training data. The lowest AUC of our model is higher than the highest AUC of the baseline. Refer to section 4.5 for more details.

4.6 Hyperparameters

We divide the dataset into train, validation and test splits. To tune the hyperparameters, we train the model on the training set and then check the error on the validation set. We perform grid search with early stopping over the hyperparameter space listed in Table 5. For training, we use the Adam optimizer (Kingma and Ba, 2015) with a learning rate of 0.01.

| Hyperparameter | Values |
|---|---|
| Number of convolutional layers | 1, 2, 3, 4, 5 |
| Length of learned atom feature vector | 16, 32, 64, 128 |
| Length of graph hidden representation | 16, 32, 64, 128 |
| Number of hidden fully-connected layers per task | 1, 2, 3, 4 |
| Regularization term | |
| Step size of the Adam optimizer | |
| Weights in the weighted loss (Eq. 6) | 1, 2, 3, 4, 5, 6, 7 |

Table 5: A list of hyperparameters with values on which grid search is performed.
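A grid search with early stopping over such a space can be sketched as follows. The dictionary keys, the `patience` value, the single scalar loss weight and the toy evaluation function are hypothetical; the regularization term and optimizer step size are left out of the sketch because their values are not listed above.

```python
import itertools

# Hypothetical key names; the values mirror the rows of Table 5 that list values.
SEARCH_SPACE = {
    "n_conv_layers": [1, 2, 3, 4, 5],
    "atom_feature_len": [16, 32, 64, 128],
    "hidden_feature_len": [16, 32, 64, 128],
    "n_hidden_layers_per_task": [1, 2, 3, 4],
    "loss_weight": [1, 2, 3, 4, 5, 6, 7],
}


def grid(space):
    """Yield every combination of hyperparameter values as a dict."""
    keys = list(space)
    for combination in itertools.product(*(space[key] for key in keys)):
        yield dict(zip(keys, combination))


def best_with_early_stopping(validation_errors, patience=3):
    """Return the best validation error, stopping once it has failed to
    improve for `patience` consecutive epochs."""
    best, stale_epochs = float("inf"), 0
    for error in validation_errors:
        if error < best:
            best, stale_epochs = error, 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    return best


def grid_search(space, evaluate):
    """Return the configuration with the lowest validation error."""
    best_config, best_error = None, float("inf")
    for config in grid(space):
        error = evaluate(config)
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error


def toy_evaluate(config):
    # Stand-in for "train with early stopping, return the validation error".
    return abs(config["n_conv_layers"] - 3) + abs(config["atom_feature_len"] - 64) / 64


best_config, best_error = grid_search(SEARCH_SPACE, toy_evaluate)
```

Even without the two omitted rows this grid contains 2240 configurations, which illustrates why treating the loss weights as grid-searched hyperparameters is costly.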

5 Conclusion

In summary, we propose MT-CGCNN, an effective multi-tasking framework that uses crystal graph convolutions to predict different material properties (Formation Energy $E_f$, Band Gap $E_g$ and Fermi Energy $E_F$) by exploiting the correlation between them. We also show that MT-CGCNN can achieve accuracy comparable to CGCNN with fewer training samples. Additionally, we demonstrate the effectiveness of MT-CGCNN by testing some end user scenarios relating to the ordering of crystals based on $E_f$ and the classification of materials based on $E_g$. The ability to predict multiple properties shows that the learned material representation generalizes well. This work opens up new research directions for machine learning in material science, where we can continue to build upon the framework of MT-CGCNN (e.g., by including soft parameter sharing) to predict other functional properties of materials with limited input data. Exploring a dynamic weighted loss (Zhao Chen and Rabinovich, 2018; Kendall et al., 2018) also has the advantage of not requiring extensive hyperparameter tuning. Integrating this with MT-CGCNN is left for future work. We make MT-CGCNN's source code available to encourage reproducible research.


This work was funded by Shell. We would like to thank Professor Umesh Waghmare from Jawaharlal Nehru Centre for Advanced Scientific Research and Professor Arnab Bhattacharyya from Indian Institute of Science for their insightful discussions. We would also like to thank Tian Xie for providing clarifications on various aspects of the CGCNN code.


  • Xie and Grossman (2018) Xie, T.; Grossman, J. C. Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties. Phys. Rev. Lett. 2018, 120, 145301.
  • (2) Kalil, T.; Wadia, C. Materials Genome Initiative for Global Competitiveness.
  • Gaultois et al. (2016) Gaultois, M. W.; Oliynyk, A. O.; Mar, A.; Sparks, T. D.; Mulholland, G. J.; Meredig, B. Perspective: Web-Based Machine Learning Models for Real-Time Screening of Thermoelectric Materials Properties. APL Materials 2016, 4, 053213.
  • Lu et al. (2018) Lu, S.; Zhou, Q.; Ouyang, Y.; Guo, Y.; Li, Q.; Wang, J. Accelerated Discovery of Stable Lead-Free Hybrid Organic-Inorganic Perovskites via Machine Learning. Nature Communications 2018, 9.
  • Gómez-Bombarelli et al. (2016) Gómez-Bombarelli, R. et al. Design of Efficient Molecular Organic Light-Emitting Diodes by a High-Throughput Virtual Screening and Experimental Approach. Nature Materials 2016, 15, 1120–1127.
  • Xue et al. (2016) Xue, D.; Balachandran, P. V.; Hogden, J.; Theiler, J.; Xue, D.; Lookman, T. Accelerated Search for Materials with Targeted Properties by Adaptive Design. Nature Communications 2016, 7, 11241.
  • Hutchinson et al. (2017) Hutchinson, M. L.; Antono, E.; Gibbons, B. M.; Paradiso, S.; Ling, J.; Meredig, B. Overcoming data scarcity with transfer learning. CoRR 2017, abs/1711.05099.
  • (8) Kumar, N.; Rajagopalan, P.; Pankajakshan, P.; Bhattacharyya, A.; Sanyal, S.; Balachandran, J.; Waghmare, U. V. Machine Learning Constrained with Dimensional Analysis and Scaling Laws: Simple, Transferable and Interpretable Models of Materials from Small Datasets. (in review)
  • Collobert and Weston (2008) Collobert, R.; Weston, J. A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. Proceedings of the 25th International Conference on Machine Learning. New York, NY, USA, 2008; pp 160–167.
  • Girshick (2015) Girshick, R. B. Fast R-CNN. 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015. 2015; pp 1440–1448.
  • Ramsundar et al. (2015) Ramsundar, B.; Kearnes, S.; Riley, P.; Webster, D.; Konerding, D.; Pande, V. Massively Multitask Networks for Drug Discovery. ArXiv e-prints 2015.
  • Ramsundar et al. (2017) Ramsundar, B.; Liu, B.; Wu, Z.; Verras, A.; Tudor, M.; Sheridan, R. P.; Pande, V. Is Multitask Deep Learning Practical for Pharma? Journal of Chemical Information and Modeling 2017, 57, 2068–2076, PMID: 28692267.
  • Huang and von Lilienfeld (2016) Huang, B.; von Lilienfeld, O. A. Communication: Understanding Molecular Representations in Machine Learning: The Role of Uniqueness and Target Similarity. The Journal of Chemical Physics 2016, 145, 161102.
  • Bartók and Csányi (2015) Bartók, A. P.; Csányi, G. Gaussian Approximation Potentials: A Brief Tutorial Introduction. International Journal of Quantum Chemistry 2015, 115, 1051–1057.
  • Bronstein et al. (2017) Bronstein, M. M.; Bruna, J.; LeCun, Y.; Szlam, A.; Vandergheynst, P. Geometric Deep Learning: Going beyond Euclidean data. IEEE Signal Process. Mag. 2017.
  • Gori et al. (2005) Gori, M.; Monfardini, G.; Scarselli, F. A new model for learning in graph domains. Proceedings. 2005 IEEE International Joint Conference on Neural Networks (IJCNN). 2005; pp 729–734.
  • Scarselli et al. (2009) Scarselli, F.; Gori, M.; Tsoi, A. C.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network Model. Trans. Neur. Netw. 2009, 20, 61–80.
  • Kipf and Welling (2017) Kipf, T. N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. International Conference on Learning Representations (ICLR). 2017.
  • Bruna et al. (2014) Bruna, J.; Zaremba, W.; Szlam, A.; LeCun, Y. Spectral Networks and Locally Connected Networks on Graphs. International Conference on Learning Representations (ICLR). 2014.
  • Duvenaud et al. (2015) Duvenaud, D. K.; Maclaurin, D.; Iparraguirre, J.; Bombarell, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R. P. Convolutional Networks on Graphs for Learning Molecular Fingerprints. Advances in Neural Information Processing Systems (NIPS) 28; Curran Associates, Inc., 2015; pp 2224–2232.
  • Kearnes et al. (2016) Kearnes, S.; McCloskey, K.; Berndl, M.; Pande, V.; Riley, P. Molecular graph convolutions: moving beyond fingerprints. Journal of Computer-Aided Molecular Design (CAMD) 2016, 30, 595–608.
  • Gilmer et al. (2017) Gilmer, J.; Schoenholz, S. S.; Riley, P. F.; Vinyals, O.; Dahl, G. E. Neural Message Passing for Quantum Chemistry. Proceedings of the 34th International Conference on Machine Learning (ICML). 2017; pp 1263–1272.
  • Altae-Tran et al. (2017) Altae-Tran, H.; Ramsundar, B.; Pappu, A. S.; Pande, V. Low Data Drug Discovery with One-Shot Learning. ACS Central Science 2017, 3, 283–293.
  • Marcheggiani and Titov (2017) Marcheggiani, D.; Titov, I. Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017; pp 1506–1515.
  • Caruana (1997) Caruana, R. Multitask Learning. Machine Learning 1997, 28, 41–75.
  • Ruder (2017) Ruder, S. An Overview of Multi-Task Learning in Deep Neural Networks. CoRR 2017, abs/1706.05098.
  • Zhao Chen and Rabinovich (2018) Chen, Z.; Badrinarayanan, V.; Lee, C.-Y.; Rabinovich, A. GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks. ICML. 2018.
  • Kendall et al. (2018) Kendall, A.; Gal, Y.; Cipolla, R. Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018.

  • Rumelhart et al. (1988) Rumelhart, D. E.; Hinton, G. E.; Williams, R. J. In Neurocomputing: Foundations of Research; Anderson, J. A., Rosenfeld, E., Eds.; MIT Press: Cambridge, MA, USA, 1988; Chapter Learning Representations by Back-propagating Errors, pp 696–699.
  • Jain et al. (2013) Jain, A.; Ong, S. P.; Hautier, G.; Chen, W.; Richards, W. D.; Dacek, S.; Cholia, S.; Gunter, D.; Skinner, D.; Ceder, G.; Persson, K. a. The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials 2013, 1, 011002.
  • Xu et al. (2017) Xu, Y.; Ma, J.; Liaw, A.; Sheridan, R. P.; Svetnik, V. Demystifying Multitask Deep Neural Networks for Quantitative Structure–Activity Relationships. Journal of Chemical Information and Modeling 2017, 57, 2490–2504, PMID: 28872869.
  • Bingel and Søgaard (2017) Bingel, J.; Søgaard, A. Identifying beneficial task relations for multi-task learning in deep neural networks. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. 2017; pp 164–169.
  • Benesty et al. (2009) Benesty, J.; Chen, J.; Huang, Y.; Cohen, I. Noise reduction in speech processing; Springer, 2009; pp 1–4.
  • Myers and Well (2003) Myers, J.; Well, A. Research Design and Statistical Analysis; Research Design and Statistical Analysis v. 1; Lawrence Erlbaum Associates, 2003.
  • Kingma and Ba (2015) Kingma, D. P.; Ba, J. Adam: A Method for Stochastic Optimization. International Conference on Learning Representations (ICLR). 2015.